Abstract
This paper proposes a practical implementation of robust ensemble learning models for accurate prediction of the internal corrosion rate in oil and gas pipelines. Correct assessment of the corrosion rate in pipelines carrying flowing oil and gas has a significant influence on system safety and on the ability to control operation. The developed data-driven predictive models comprise four ensemble learning approaches: random forest, adaptive boosting, gradient boosting regression tree, and extreme gradient boosting. The models are built on a comprehensive database of eight system descriptors extracted from the literature, and k-fold cross-validation is employed to promote high performance and generalization. The predicted internal corrosion rates are then subjected to rigorous statistical and graphical analysis to evaluate the models' performance and compare their capabilities. Based on the calculated single and global metrics, the extreme gradient boosting model achieves the highest performance in predicting the internal corrosion rate in oil and gas pipelines, with a root-mean-square error (RMSE) of 0.031 mm/y and a performance index of PI = 0.61. In addition, the significance of the input variables is determined through a sensitivity analysis based on feature importance criteria; for the dataset used, the corrosion rate shows the strongest dependence on temperature and pressure, alongside the CO2 contribution. Overall, the ensemble learning models perform well in predicting the internal corrosion rate, and the extreme gradient boosting model is the most suitable of the four for modeling the internal corrosion rate in oil and gas pipelines owing to its high performance.
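As an illustration of the evaluation procedure described above, the following minimal Python sketch computes a k-fold cross-validated RMSE. It is not the authors' implementation: the helper names (`rmse`, `k_fold_indices`, `cross_val_rmse`, `fit_mean_baseline`) are hypothetical, the default of five folds is an assumption, and the mean-value baseline is only a placeholder for the slot that an ensemble regressor such as extreme gradient boosting would occupy.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Predictor = Callable[[Row], float]
Fitter = Callable[[List[Row], List[float]], Predictor]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the target (e.g. mm/y)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle the indices 0..n-1 and return k (train, test) index pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]


def fit_mean_baseline(X: List[Row], y: List[float]) -> Predictor:
    """Placeholder model (assumption): always predicts the training mean."""
    mean = sum(y) / len(y)
    return lambda row: mean


def cross_val_rmse(fit: Fitter, X: List[Row], y: List[float], k: int = 5) -> float:
    """Average the held-out RMSE over k folds."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [model(X[i]) for i in test]
        scores.append(rmse([y[i] for i in test], preds))
    return sum(scores) / len(scores)
```

In this sketch each row of `X` would hold the eight system descriptors and `y` the measured corrosion rate. Any function that takes training data and returns a predictor can be passed as `fit`, so the four ensemble models could be compared on identical folds.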