AbstractThe EURODELTA III exercise has facilitated a comprehensive intercomparison and evaluation of chemistry transport model performances. Participating models performed calculations for four 1-month periods in different seasons in the years 2006 to 2009, allowing the influence of different meteorological conditions on model performances to be evaluated. The exercise was performed with strict requirements for the input data, with few exceptions. As a consequence, most of differences in the outputs will be attributed to the differences in model formulations of chemical and physical processes. The models were evaluated mainly for background rural stations in Europe. The performance was assessed in terms of bias, root mean square error and correlation with respect to the concentrations of air pollutants (NO2, O3, SO2, PM10 and PM2.5), as well as key meteorological variables. Though most of meteorological parameters were prescribed, some variables like the planetary boundary layer (PBL) height and the vertical diffusion coefficient were derived in the model preprocessors and can partly explain the spread in model results. In general, the daytime PBL height is underestimated by all models. The largest variability of predicted PBL is observed over the ocean and seas. For ozone, this study shows the importance of proper boundary conditions for accurate model calculations and then on the regime of the gas and particle chemistry. The models show similar and quite good performance for nitrogen dioxide, whereas they struggle to accurately reproduce measured sulfur dioxide concentrations (for which the agreement with observations is the poorest). In general, the models provide a close-to-observations map of particulate matter (PM2.5 and PM10) concentrations over Europe rather with correlations in the range 0.4–0.7 and a systematic underestimation reaching −10 µg m−3 for PM10. The highest concentrations are much more underestimated, particularly in wintertime. Further evaluation of the mean diurnal cycles of PM reveals a general model tendency to overestimate the effect of the PBL height rise on PM levels in the morning, while the intensity of afternoon chemistry leads formation of secondary species to be underestimated. This results in larger modelled PM diurnal variations than the observations for all seasons. The models tend to be too sensitive to the daily variation of the PBL. All in all, in most cases model performances are more influenced by the model setup than the season. The good representation of temporal evolution of wind speed is the most responsible for models' skillfulness in reproducing the daily variability of pollutant concentrations (e.g. the development of peak episodes), while the reconstruction of the PBL diurnal cycle seems to play a larger role in driving the corresponding pollutant diurnal cycle and hence determines the presence of systematic positive and negative biases detectable on daily basis.