Abstract
To support alternative energy resources, the prediction of global incident solar radiation (I_rad) is critical for establishing the efficacy of solar energy as a free and clean resource and for identifying and screening sites for solar power. Solar radiation data for energy feasibility studies are unavailable in many locations because meteorological stations are absent, especially at remote or regional sites. To overcome this challenge in solar energy site identification, globally gridded data integrated into predictive models that generate reliable I_rad forecasts can be considered a viable medium for future energy utilization. The objective of this paper is to review, develop and evaluate a suite of machine learning (ML) models based on the artificial neural network (ANN) versus several other data-driven models, namely support vector regression (SVR), Gaussian process machine learning (GPML) and genetic programming (GP), for the prediction of daily I_rad from European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis fields. The performance of the ML models is benchmarked against several statistical tools: the autoregressive integrated moving average (ARIMA), Temperature Model (TM), and Time Series and Fourier Series (TSFS) models. To train these models, 87 different predictor variables from the ERA-Interim reanalysis dataset (1 January 1979 to 31 December 2015) were extracted for 5 solar-rich metropolitan sites (Brisbane, Gold Coast, Sunshine Coast, Ipswich and Toowoomba, Australia) and targeted against surface-level I_rad from the measured Scientific Information for Land Owners (SILO) dataset. For the daily forecast models, the 20 most important predictors of I_rad were screened with neighbourhood component analysis (the "fsrnca" feature selection routine) and partitioned into training (70%), validation (15%) and testing (15%) sets for model design.
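The 70/15/15 partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say whether the split was chronological or shuffled, so the sketch assumes daily rows in time order split into contiguous blocks, and the function name `split_70_15_15` is hypothetical.

```python
import numpy as np


def split_70_15_15(X, y):
    """Partition rows into training (70%), validation (15%) and testing (15%) sets.

    Assumption (not stated in the source): rows are ordered in time and are
    split into contiguous blocks without shuffling, so no future day leaks
    into the training set.
    """
    n = len(y)
    n_train = int(round(0.70 * n))
    n_val = int(round(0.15 * n))
    train = slice(0, n_train)
    val = slice(n_train, n_train + n_val)
    test = slice(n_train + n_val, n)  # remainder goes to testing
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])


# Hypothetical example: 100 days of the 20 screened predictors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 20))
y_demo = rng.normal(loc=18.0, scale=4.0, size=100)  # synthetic I_rad values
(train_X, train_y), (val_X, val_y), (test_X, test_y) = split_70_15_15(X_demo, y_demo)
```

A contiguous split is a common choice for time series because it keeps the test period strictly after the training period; a shuffled split would instead mix seasons across the three sets.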
To benchmark the ANN, the TSFS and TM models were developed with Fourier series and regression analysis, respectively, and statistical performance was assessed with the root mean square error (RMSE), mean absolute error (MAE), Nash-Sutcliffe efficiency (E_NS), Willmott's index (WI), mean bias error (MBE), Legates and McCabe index (E_1), relative MAE and RMSE, and diagnostic plots. The ANN performed significantly better than the other models (SVR, GPML, GP, TM), with lower RMSE (1.715–2.27 MJ m⁻² day⁻¹ vs. 2.14–5.90 MJ m⁻² day⁻¹), relative RMSE (9.07–12.47% vs. 10.98–29.15%) and relative MAE (7.97–11.74% vs. 9.27–33.96%), and larger WI (0.938–0.967 vs. 0.462–0.955), E_NS (0.872–0.935 vs. 0.355–0.915) and E_1 (0.672–0.783 vs. 0.252–0.740). Additionally, models assessed with predictors grouped into El Niño and La Niña periods and into the positive, negative and neutral phases of the Indian Ocean Dipole affirmed the merits of the ANN model (RRMSE ≤ 11%). Seasonal analysis showed that the ANN outperformed SVR, GPML and GP for I_rad prediction. The study concludes that an ANN approach integrated with ECMWF fields, which incorporates the physical interactions of I_rad with atmospheric data, is an efficacious alternative for forecasting solar energy and for supporting energy modelling at solar-rich sites with diverse climatic conditions, further supporting clean energy utilization.

Highlights
• ANN, SVR, GPML and GP models were designed to predict daily solar radiation from reanalysis data.
• Deterministic models (ARIMA, a temperature model, and a time series model with Fourier transform) were used for benchmarking.
• Neighbourhood component analysis (fsrnca) was used for feature selection to obtain optimum inputs.
• ANN outperformed all other ML models and the deterministic models.
• Solar energy potential is assessed for 5 cities of Queensland, Australia.
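The performance metrics listed in the abstract can be computed for one observed/predicted series as sketched below. The abstract names the metrics but does not give their formulas, so this sketch uses common textbook definitions and labels its conventions as assumptions: MBE is taken as mean(predicted − observed), and relative RMSE and relative MAE are expressed as a percentage of the mean observed value (some studies instead average the point-wise relative error). The function name `skill_scores` is hypothetical.

```python
import numpy as np


def skill_scores(obs, pred):
    """Return the metrics named in the abstract for one forecast series.

    Assumed conventions (not given in the source):
    - MBE = mean(pred - obs), so positive values mean over-prediction.
    - RRMSE and RMAE are percentages of the mean observed value.
    """
    o = np.asarray(obs, dtype=float)
    p = np.asarray(pred, dtype=float)
    err = p - o
    o_mean = o.mean()
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    return {
        "RMSE": rmse,
        "MAE": mae,
        "MBE": np.mean(err),
        # Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares of obs
        "E_NS": 1.0 - np.sum(err ** 2) / np.sum((o - o_mean) ** 2),
        # Willmott's index of agreement (d)
        "WI": 1.0 - np.sum(err ** 2) / np.sum(
            (np.abs(p - o_mean) + np.abs(o - o_mean)) ** 2
        ),
        # Legates and McCabe index: absolute-error analogue of E_NS
        "E_1": 1.0 - np.sum(np.abs(err)) / np.sum(np.abs(o - o_mean)),
        "RRMSE_%": 100.0 * rmse / o_mean,
        "RMAE_%": 100.0 * mae / o_mean,
    }


# Toy example with made-up values (not study data):
scores = skill_scores([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

E_NS and E_1 share the same structure (1 minus model error over the spread of observations around their mean), differing only in squared versus absolute deviations; E_1 is therefore less sensitive to a few large errors.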