Abstract
The lack of readily available methods for estimating high-resolution near-surface relative humidity (RH) and the inability of weather stations to fully capture its spatiotemporal variability can lead to exposure misclassification in environmental epidemiology studies. We therefore aimed to predict daily mean RH across Germany at a 1 × 1 km resolution for 2000–2021. RH observations, longitude and latitude, modelled air temperature, precipitation and wind speed, as well as remote sensing information on topographic elevation, vegetation, and the true color band composite, were incorporated into a Random Forest (RF) model, together with the date to capture temporal variation in the relationship between the response and the explanatory variables. In ten-fold cross-validation, the model achieved high accuracy (R² = 0.83) and low errors (Root Mean Square Error (RMSE) of 5.07%, Mean Absolute Percentage Error (MAPE) of 5.19% and Mean Percentage Error (MPE) of −0.53%). A comparison of our RH predictions with measurements from a dense monitoring network in the city of Augsburg, southern Germany, confirmed the good performance (R² ≥ 0.86, RMSE ≤ 5.45%, MAPE ≤ 5.59%, MPE ≤ 3.11%). The predictions showed high RH across Germany (22-year average of 79.00%) and high spatial variability across the country, exceeding 12% in yearly averages. Our findings indicate that the proposed RF model is suitable for estimating RH for a whole country at high resolution, and they provide a reliable RH dataset for epidemiological analyses and other environmental research.
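
The four performance measures reported above can be illustrated with a minimal sketch. This is not the authors' code: the function name `error_metrics` and the sample values are hypothetical, R² is assumed to be the coefficient of determination (the study may instead use the squared correlation), and the MPE sign convention (predicted minus observed) is an assumption. RMSE is expressed in the units of RH itself (percentage points), whereas MAPE and MPE are errors relative to the observed value.

```python
import math


def error_metrics(observed, predicted):
    """Return R2, RMSE, MAPE and MPE for paired observed/predicted RH values.

    Assumptions (not taken from the source): R2 is the coefficient of
    determination, and residuals are defined as predicted minus observed.
    Observed values must be non-zero because MAPE and MPE divide by them.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n

    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mape": 100.0 * sum(abs(r) / o for r, o in zip(residuals, observed)) / n,
        "mpe": 100.0 * sum(r / o for r, o in zip(residuals, observed)) / n,
    }


# Hypothetical daily mean RH values in percent, for illustration only.
observed = [80.0, 75.0, 90.0, 60.0]
predicted = [82.0, 72.0, 90.0, 63.0]
metrics = error_metrics(observed, predicted)
print(metrics)
```

Under this sign convention, a negative MPE means the predictions are on average slightly below the observations, while MAPE ignores the direction of the error.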