
A Novel Heterogeneous Ensemble Approach to Variable Selection For Gas-Liquid Two-Phase CO\(_2\) Flow Metering

Sun, Caiying, Wang, Lijuan, Yan, Yong, Zhang, Wenbiao, Shao, Ding (2021) A Novel Heterogeneous Ensemble Approach to Variable Selection For Gas-Liquid Two-Phase CO\(_2\) Flow Metering. International Journal of Greenhouse Gas Control, 110. Article Number 103418. ISSN 1750-5836. (doi:10.1016/j.ijggc.2021.103418) (Access to this publication is currently restricted. You may be able to access a copy if URLs are provided) (KAR id:89452)

PDF Author's Accepted Manuscript
Language: English

Restricted to Repository staff only until 31 July 2022.



Variable selection is an important preprocessing step in the development of effective data-driven models for CO\(_2\) flow measurement in carbon capture and storage systems. To quantify the importance of potential input variables to the desired output, ensemble learning is incorporated into the variable selection methodology. This paper presents a tree-based heterogeneous ensemble approach to variable selection and its application to gas-liquid two-phase CO\(_2\) flow measurement. The importance of each variable is determined by combining the importance scores from four tree-based algorithms: decision tree regression, bootstrap aggregating of regression trees, gradient boosting decision tree and gradient boosting random forest. The backward elimination algorithm is then applied to remove the relatively less important variables, yielding a small set of input variables for data-driven models. The selection results demonstrate that the significant variables for CO\(_2\) mass flow rate measurement include apparent mass flow rate, time shift, differential pressure and pressure drop, while those for gas volume fraction prediction include observed density, density drop, observed flow velocity and outlet temperature. To assess the validity of the selected variables, data-driven models based on gradient boosting random forest are developed. Results suggest that, with the selected variables as model inputs, the relative error of the model output is mostly within 1% for CO\(_2\) mass flow rate measurement and within 5% for gas volume fraction prediction.
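
The two-stage procedure described in the abstract (combine importance scores from several models, then eliminate the least important variables) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The combination rule (normalise each model's scores to sum to 1, then average), the stopping rule (stop when the error rises by more than a tolerance), the variable names `x1` to `x4`, the score values, and the `toy_error` function are all assumptions made for illustration; in practice the scores would come from the fitted tree-based models and the error from retraining a data-driven model on each candidate subset.

```python
from typing import Callable, Dict, List


def combine_importances(per_model: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average variable importances across models.

    Each model's scores are first normalised to sum to 1 so that no single
    model dominates (an assumed combination rule, not taken from the paper).
    Every model is assumed to score the same set of variables.
    """
    combined: Dict[str, float] = {}
    n_models = len(per_model)
    for scores in per_model.values():
        total = sum(scores.values())
        for var, score in scores.items():
            combined[var] = combined.get(var, 0.0) + score / total / n_models
    return combined


def backward_eliminate(
    importance: Dict[str, float],
    error_of: Callable[[List[str]], float],
    tolerance: float,
) -> List[str]:
    """Drop the least important variable repeatedly.

    Stops as soon as removing a variable raises the error by more than
    `tolerance` above the best error seen so far.
    """
    kept = sorted(importance, key=importance.get, reverse=True)
    best = error_of(kept)
    while len(kept) > 1:
        candidate = kept[:-1]  # remove the currently least important variable
        err = error_of(candidate)
        if err > best + tolerance:
            break
        kept = candidate
        best = min(best, err)
    return kept


# Hypothetical importance scores from two models (illustrative values only).
toy_scores = {
    "model_a": {"x1": 6.0, "x2": 3.0, "x3": 1.0, "x4": 0.0},
    "model_b": {"x1": 0.5, "x2": 0.25, "x3": 0.2, "x4": 0.05},
}


def toy_error(subset: List[str]) -> float:
    """Stand-in for a validation error: penalise each missing informative variable."""
    return sum(1.0 for var in ("x1", "x2") if var not in subset)


combined = combine_importances(toy_scores)
selected = backward_eliminate(combined, toy_error, tolerance=0.01)
print(selected)
```

In this toy setting only `x1` and `x2` affect the stand-in error, so elimination stops once removing either of them would increase it.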

Item Type: Article
DOI/Identification number: 10.1016/j.ijggc.2021.103418
Uncontrolled keywords: carbon capture and storage, gas-liquid two-phase CO2, variable selection, heterogeneous ensemble approach, data-driven models
Subjects: T Technology > TA Engineering (General). Civil engineering (General) > TA165 Engineering instruments, meters etc. Industrial instrumentation
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Engineering and Digital Arts
Depositing User: Yong Yan
Date Deposited: 26 Jul 2021 12:06 UTC
Last Modified: 12 Aug 2021 08:38 UTC