Machine Learning for Process Control. Part 2: NGL Extraction Unit Optimisation with Surrogate Models.
- Ivan Nemov
- 2 days ago
- 12 min read
Introduction
Process optimisation is a mainstay activity of process control engineers, aimed at increasing production and improving product quality at industrial plants. If the underlying process can be well described (or closely approximated) by linear models, its optimisation can be posed as a Linear Programming problem, which is typically solved as part of Model Predictive Control (MPC).
In some cases, process behaviour is strongly non-linear and underlying physical relationships are too complex to be represented by linear models. In such cases Real-Time Optimisation (RTO) comes up as a powerful tool. RTO assumes direct use of steady-state process optimisation results by the plant control system. The steady-state process model is updated with current plant conditions and then optimised. The optimal values of manipulated variables are then communicated to the plant control system.
Practicalities of steady-state simulation of first-principles process models used in RTO include specialised simulation software, licensing considerations, software reliability and optimisation convergence problems. RTO deployment can be greatly simplified if the process optimisation is performed offline for a wide range of process conditions, and the optimisation results are then implemented in the control system as a look-up table. This does not work well, though, if some parameters of the model or elements of the objective function are not static. For example, optimisation results obtained for certain product prices do not hold for other prices. Is there a method that combines the advantages of both approaches without their drawbacks? Fortunately, yes.
Surrogate ML models can be trained on results from rigorous first-principles process simulation and then used in place of the simulation in online deployments. This post describes how to generate such ML models so that they accurately capture non-linear process behaviour, and how to apply them in optimisation. The work covers steady-state process simulation of the NGL extraction unit of an LNG plant, training of MLP surrogate models to predict key process variables, and application of Bayesian optimisation to find optimal operating conditions. The objective of this post is to provide a starting point for development and deployment of an RTO application based on ML techniques.
Abbreviations used in the post are explained in the Abbreviations section at the end.

Process Description
Natural Gas contains various amounts of heavy hydrocarbons depending on the source gas field [1]. In the LNG process these hydrocarbons, including C5 (pentane) and heavier, must be removed from the feed NG to prevent them from freezing and solidifying during cryogenic liquefaction. Exactly how this is achieved depends on the LNG process technology. For example, in the C3MR LNG process [2], C5+ hydrocarbons are removed in the Scrub column (Fig. 1).
![Fig. 1 – C3MR process flow scheme (from [2] with modifications).](https://static.wixstatic.com/media/b2efc2_8611c0b0614b4952912d3203ac35b8c5~mv2.png/v1/fill/w_49,h_33,al_c,q_85,usm_0.66_1.00_0.01,blur_2,enc_avif,quality_auto/b2efc2_8611c0b0614b4952912d3203ac35b8c5~mv2.png)
In the Scrub column, pre-cooled feed NG is supplied under the bottom tray and passes up through multiple trays to the top of the column, while cold reflux runs down from the top tray, absorbing heavy hydrocarbons. The Scrub column top product is then cooled to a lower temperature, and the condensed hydrocarbons are separated in the reflux drum. Part of that liquid is returned to the Scrub column as reflux, and the balance is reinjected into the dry NG sent on to liquefaction. The Scrub column bottom product is a mixture of NGL and condensate with dissolved methane. It is fed to a series of fractionation columns where NGL is recovered and then reinjected into the dry NG for liquefaction. This NGL reinjection stream (also called “long” reinjection) is returned to the NG at a higher temperature than the NGL reinjected from the Scrub column reflux drum. The bottom product of the last fractionation column (debutaniser) is the condensate, or C5+, which is stored and offloaded as a separate product.
Process Optimisation Problem Statement
Provided that the C5+ concentration in LNG meets specification, there still remains operational flexibility either to extract more C5+ and produce more condensate or, alternatively, to maximise the C5+ content in LNG up to the specification limit at the expense of producing less condensate. Every extra unit of condensate requires progressively more reflux and a lower temperature in the Scrub column. As a result, a greater amount of NGL is reinjected into the NG at a warmer temperature, competing for cooling capacity and causing a slight decrease in LNG production. This relationship is non-linear, and for every combination of LNG and condensate market prices there is an optimal level of C5+ extraction. Optimising the Scrub column operating conditions to maximise the objective function with respect to C5+ extraction is the problem this work attempts to solve.
Process Simulation Software
The DWSIM open-source chemical process simulator [3] was used for process model development and simulation. DWSIM is a flowsheet-type simulator supporting both steady-state and dynamic modes. In this work, only the steady-state simulation mode was used. DWSIM is based on thermodynamic models including commonly used EOS. A process is modelled on the simulation flowsheet using unit operations such as streams, heat exchangers, pumps, splitters etc., which are calculated sequentially following the process flow direction. DWSIM has integrated Python support with a code development interface. This makes it possible to create custom unit operations and to interact with the simulation flowsheet programmatically. The CAPE-OPEN interface allows the DWSIM modelling environment to be integrated with third-party modelling components [4].
The Scrub column was modelled in ChemSep, which is column simulation software for distillation, absorption, and extraction operations [5]. A classic equilibrium-stage column model was used as opposed to a nonequilibrium (rate-based) model. ChemSep has been tested with many industrial columns such as demethanisers, debutanisers, refluxed absorbers, and azeotropic and extractive distillation columns. ChemSep is CAPE-OPEN compliant, which allows the column models to be integrated with the DWSIM flowsheet simulation environment.
Model Description
NG Composition and Inlet Conditions
The NGL extraction unit feed composition is provided in Table 1 and is based on one of the real LNG mixtures [1] with some extra C5+ components being added to better represent raw NG feed. The feed is supplied at 57.8 bara and 35°C.
Table 1 – Process model NG composition
Component | Concentration, mol % |
Methane | 92.20 |
Ethane | 5.49 |
Propane | 2.20 |
n-Butane | 0.25 |
i-Butane | 0.20 |
n-Pentane | 0.15 |
i-Pentane | 0.15 |
n-Hexane | 0.14 |
n-Heptane | 0.14 |
Nitrogen | 0.10 |
Unit Operations
E-100 and E-200 are cooler units with 100% efficiency, 0.75 bar pressure drop each and outlet temperature specifications. These units cool down the feed NG and the Scrub column top product (Fig. 2). Their outlet temperatures are measured by TI-050 and TI-150 respectively, which are manipulated variables (Table 2).
E-300 is a heater added for modelling purposes to bring the long reinjection stream temperature to +25°C before reinjection into the dry NG. It has no pressure drop and 100% efficiency. In reality, the depropaniser and debutaniser top distillate products would be cooled to ambient temperature before being pumped to the reinjection point.
E-400 is a heater added to bring the condensate product to a standard temperature of 15°C, normally used as the reference temperature for density measurement. The heater pressure drop specification ensures that the condensate pressure is reduced to 3 bar.
Scrub column T-100 has 10 trays with 35% efficiency. Operating as an absorber, it receives feed NG under the bottom tray and reflux on the top tray. Dry NG is produced from the top and NGL liquid from the bottom of the column. The column operates at 54.5 bar top pressure and 57.0 bar bottom pressure.
T-200 is a compound separator representing operation of the fractionation unit. It splits the Scrub column bottom product into NGL for reinjection, containing no C5+ components, and condensate.
V-100 is a two-phase separator that separates dry NG from reflux liquid. The liquid is pumped by P-100 at 54.5 bar and then split in SPL-1 into the reflux stream returned to the Scrub column and the NGL reinjection stream. NGL reinjection from SPL-1, long reinjection from E-300 and the V-100 gas product are mixed in MIX-1 to make the LNG stream.

Manipulated Variables, Process Variables and Model Constraints
Manipulated Variables are external inputs and handles used to adjust the operating conditions of the process unit. The MVs are defined here in the model context, i.e. not all of them can be used for control and optimisation. Each of these MVs can take any value within specified variation range as per Table 2. The variation ranges shall include normal operating ranges with some margin to ensure that ML models training data covers all realistic unit operation scenarios.
Table 2 – Model manipulated variables
Manipulated (input) variable | Description | Variation range | Variation magnitude |
FI-050 | Feed flow rate, kg/s | 90 to 110 | 1 |
FI-150 / FI-050 | Recycle ratio | 0.025 to 0.175 | 0.01 |
TI-050 | E-100 outlet temperature, °C | -20 to 0 | 1 |
TI-150 | E-200 outlet temperature, °C | -55 to -40 | 1 |
Process Variables are outputs or results of the process model (Table 3). They cannot be manipulated directly but instead require adjustments of MVs to get desired PV values. Later in this work ML models will be trained to predict each PV based on the MVs input.
Table 3 – Model process variables
Process (output) variable | Description |
FI-250 | Condensate flow rate, kg/s |
QI-250 | Condensate density, kg/m3 |
FI-075 | LNG flow rate, kg/s |
QI-075 | LNG C5+ concentration, mol fraction |
FI-250 | Long reinjection mass flow, kg/s |
FI-175 | Reinjection mass flow, kg/s |
QI-075 LNG C5+ concentration is calculated as follows:
QI-075 = C_i-Pentane + C_n-Pentane + C_n-Hexane + C_n-Heptane
A modelling constraint was imposed on variation of FI-150 / FI-050 Recycle ratio MV such that resulting FI-175 Reinjection mass flow PV is always positive.
Data Collection
To generate ML model training data, the process model is run many times with slightly different MV values each time, always respecting the variation ranges. Before every run, each MV value is incremented by a random value generated independently for each MV and limited by the corresponding variation magnitude (Table 2). This approach covers the whole variation range with a reasonably uniform sampling density. The advantages of this method are:
- gradual evolution of process conditions between runs lets the process model converge faster and reduces the computational load;
- MV changes in a direction that would violate modelling constraints can be limited dynamically;
- sampling redundancy and randomness allow the data set to be split into training and test parts.
The results of 12,000 model runs are recorded in CSV format for later model training.
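The sampling scheme described above can be sketched as a bounded random walk. This is a minimal illustration rather than the script used for this work: the ranges and step sizes come from Table 2, while `run_flowsheet` is a hypothetical placeholder with toy algebra standing in for a converged DWSIM run, and the names `MV_SPECS`, `next_point` and `collect` are assumptions made for the example.

```python
import random

# MV name -> (low, high, max step per run), taken from Table 2.
MV_SPECS = {
    "FI-050": (90.0, 110.0, 1.0),    # feed flow rate, kg/s
    "RATIO": (0.025, 0.175, 0.01),   # FI-150 / FI-050 recycle ratio
    "TI-050": (-20.0, 0.0, 1.0),     # E-100 outlet temperature, deg C
    "TI-150": (-55.0, -40.0, 1.0),   # E-200 outlet temperature, deg C
}


def run_flowsheet(mvs):
    """Hypothetical stand-in for one DWSIM run (toy algebra, not the real model)."""
    drum_liquid = 0.01 * mvs["FI-050"] * (-mvs["TI-150"] - 35.0)
    reflux = mvs["RATIO"] * mvs["FI-050"]
    return {"FI-175": drum_liquid - reflux}  # reinjection mass flow, kg/s


def next_point(current, rng):
    """Move every MV by an independent random step, clipped to its range."""
    new = {}
    for name, (low, high, step) in MV_SPECS.items():
        candidate = current[name] + rng.uniform(-step, step)
        new[name] = min(max(candidate, low), high)
    return new


def collect(n_runs, seed=0):
    """Random walk through the MV space, rejecting steps that break the constraint."""
    rng = random.Random(seed)
    # Start from the middle of every range (an assumption for this sketch).
    current = {name: (low + high) / 2.0 for name, (low, high, _) in MV_SPECS.items()}
    rows = []
    while len(rows) < n_runs:
        candidate = next_point(current, rng)
        pvs = run_flowsheet(candidate)
        if pvs["FI-175"] <= 0.0:
            continue  # constraint violated: stay put and draw another step
        current = candidate
        rows.append({**candidate, **pvs})
    return rows


rows = collect(200)
```

Each accepted row differs from the previous one by at most the variation magnitude, which is what keeps consecutive flowsheet solutions close to each other. The rows could then be written out with `csv.DictWriter`.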
MLP Regression Models Training
Process data generated in the previous step using the first-principles model is saved in CSV format, so it is first read into a pandas DataFrame [6].
Prior to model training, the collected data is visualised to confirm the non-linear relationships and the correlation between PVs and MVs. Fig. 3 indicates non-linearity between the C5+ concentration in LNG and the long reinjection flow. The data confirms that C5+ extraction requires progressively larger NGL turnover to reduce the C5+ concentration in LNG below 1,000 ppm. At some point the LNG production losses caused by warmer reinjected NGL would outweigh the value of the extra condensate produced.

Fig. 4 confirms that strong correlation exists between the MVs and PVs (this area is highlighted by green frame). At the same time, MVs do not correlate with each other. Both are positive observations for the ML models training.
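A correlation check of this kind can be reproduced with the pandas `DataFrame.corr` method. The sketch below uses synthetic data, not the simulation results: the column names and the toy PV formula are assumptions made purely to show the two observations (MVs correlate with the PV, MVs do not correlate with each other).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000

# Two independently sampled MVs over their Table 2 ranges.
df = pd.DataFrame({
    "ratio": rng.uniform(0.025, 0.175, n),   # recycle ratio
    "ti_050": rng.uniform(-20.0, 0.0, n),    # E-100 outlet temperature, deg C
})
# Hypothetical PV that depends on both MVs plus a little noise (toy formula).
df["pv"] = 3.0 * df["ratio"] - 0.02 * df["ti_050"] + rng.normal(0.0, 0.01, n)

corr = df.corr()  # Pearson correlation matrix

mv_cols = ["ratio", "ti_050"]
mv_to_mv = corr.loc["ratio", "ti_050"]   # expected to be close to zero
mv_to_pv = corr.loc[mv_cols, "pv"]       # expected to be clearly non-zero
```

Note that Pearson correlation only captures the linear part of a relationship, so a scatter plot such as Fig. 3 remains useful for spotting non-linearity.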
The original 12,000-point data set is split into a 67% part for training and a 33% part for testing, with data points picked at random.
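One way to perform such a split is scikit-learn's `train_test_split`. The snippet is a sketch on random placeholder data; the column names and the `random_state` value are assumptions, and in practice the DataFrame would come from the CSV file.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows as the real data set.
rng = np.random.default_rng(1)
mv_cols = ["feed", "ratio", "ti_050", "ti_150"]   # assumed column names
data = pd.DataFrame(rng.random((12_000, 5)), columns=mv_cols + ["pv"])

# Shuffle and hold out 33% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    data[mv_cols], data["pv"], test_size=0.33, shuffle=True, random_state=42
)
```

Fixing `random_state` makes the split reproducible, which helps when comparing models trained for different PVs.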
The MLP regressor [8] of the scikit-learn library [7] is used to train a model for each of the PVs (refer to Table 3). Some MLP hyperparameters were selected intentionally to improve generalisation and avoid overfitting:
- Activation function: hyperbolic tangent. It is a common alternative to the more widely used ReLU (Rectified Linear Unit). In some applications its smoother output outweighs its higher computational cost compared to ReLU.
- L2 regularisation alpha is set to 0.1, higher than the default 0.0001, to increase the penalty on large weights during model training.
MLP model training requires data scaling. The features have different measurement units and different amplitudes of variation, while the MLP weights are initialised in the same way for every input regardless of its scale. To prevent features with a larger range of values from dominating the training, all features need to be scaled. The scaling is performed using the StandardScaler class of the scikit-learn library. The scaler coefficients are saved using the joblib library as a serialised object file to be used in future model deployment.
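The training steps described above can be sketched as follows. The `tanh` activation, `alpha=0.1`, `StandardScaler` and joblib come from the text; everything else is an assumption for illustration, including the hidden layer sizes, `max_iter`, the file name and the toy target that stands in for a simulated PV.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 3000
feed = rng.uniform(90.0, 110.0, n)      # FI-050, kg/s
ratio = rng.uniform(0.025, 0.175, n)    # FI-150 / FI-050
ti_050 = rng.uniform(-20.0, 0.0, n)     # deg C
ti_150 = rng.uniform(-55.0, -40.0, n)   # deg C
X = np.column_stack([feed, ratio, ti_050, ti_150])
# Toy non-linear target standing in for one PV; NOT the DWSIM model.
y = feed * ratio**2 - 0.05 * ti_050 + 0.02 * ti_150

X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]

# Fit the scaler on training data only, then reuse it for every prediction.
scaler = StandardScaler().fit(X_train)

model = MLPRegressor(
    hidden_layer_sizes=(50, 50),  # assumed; not stated in the post
    activation="tanh",
    alpha=0.1,                    # L2 penalty, default is 0.0001
    max_iter=2000,                # assumed
    random_state=0,
)
model.fit(scaler.transform(X_train), y_train)
r2 = model.score(scaler.transform(X_test), y_test)  # R2 on held-out data

# Persist the scaler so deployment applies exactly the same scaling.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scaler.joblib")  # hypothetical file name
    joblib.dump(scaler, path)
    restored = joblib.load(path)
```

A trained model object can be saved and restored with the same `joblib.dump` and `joblib.load` calls. The scaler and the model must always be deployed as a pair.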

After training, model performance is evaluated on the testing dataset using the mean error, the standard deviation of the error and the regression score R2 (Table 4). Models can be exported using the joblib library, which serialises the model objects and saves them as files. Serialised model files can then be imported in a similar environment and restored into model objects [9].
The high accuracy and good repeatability of the first-principles model results, together with the universal fitting capability of the MLP regressor, contribute to the high model performance shown in Fig. 5.
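The three metrics can be computed directly with NumPy, as in the sketch below. The function name `evaluate` and the sign convention for the error (prediction minus reference) are assumptions; the R2 formula is the standard coefficient of determination, which is also what scikit-learn's `score` method returns for regressors.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Return R2, mean error and standard deviation of the error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    error = y_pred - y_true                       # assumed sign convention
    ss_res = np.sum((y_true - y_pred) ** 2)       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mean_error": error.mean(),
        "std_error": error.std(),
    }


# Small made-up example, not data from Table 4.
metrics = evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

A mean error close to zero indicates no systematic bias, while the standard deviation shows the spread of individual prediction errors in the units of the PV.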
Table 4 – Models performance
Process (output) variable | Description | Regression score R2 | Mean error | STD of error |
FI-250 | Condensate flow rate, kg/s | 0.9975 | 0.0007 | 0.0180 |
QI-250 | Condensate density, kg/m3 | 0.9976 | 0.0161 | 0.2060 |
FI-075 | LNG flow rate, kg/s | 0.9999 | -0.0015 | 0.0565 |
QI-075 | LNG C5+ concentration, mol fraction | 0.9986 | 4.733e-09 | 3.211e-05 |
FI-250 | Long reinjection mass flow, kg/s | 0.9973 | -0.0003 | 0.0214 |
FI-175 | Reinjection mass flow, kg/s | 0.9988 | 0.0056 | 0.0808 |

Objective Function
Scrub column operation efficiency can be expressed as a function of products value and losses generated over a time period:
G = G_CND + G_C5LNG + G_LNG [$/sec]
where G_CND – value of condensate produced;
G_C5LNG – value of C5+ components in LNG;
G_LNG – value of LNG losses due to warmer NGL reinjection.
Value of condensate produced is calculated as follows:
G_CND = (m_CND∙C_CND)/(ρ_CND∙a_1 ) [$/sec]
where m_CND is FI-250 condensate flow rate, [kg/s];
C_CND – condensate price normally in range of 65-85 [$/bbl] [11];
ρ_CND – QI-250 condensate density, [kg/m3];
a_1 = 0.1590 – conversion constant [m3/bbl] [10].
Value of C5+ in LNG is calculated as follows:
G_C5LNG = (m_LNG∙ω_C5∙LHV_C5∙C_LNG)/a_2 [$/sec]
where m_LNG is FI-075 LNG flow rate, [kg/s];
ω_C5 - mass fraction of C5+ components in LNG;
LHV_C5 = 44,938 - lower heating value of C5+ components [kJ/kg] (based on petroleum naphtha [12]);
C_LNG – LNG price normally in range 8.8-15.1 [$/MMBtu] [13];
a_2 = 1,055,056 – conversion constant [kJ/MMBtu] [10].
Mass fraction of C5+ components in LNG is related to molar fraction:
ω_C5 = c_C5∙MW_C5/MW_LNG
where c_C5 is QI-075 molar fraction of C5+ components in LNG;
MW_C5 = 67.15 – C5+ components molecular weight [kg/kmol];
MW_LNG = 17.87 – LNG molecular weight [kg/kmol].
Value of LNG production loss due to reinjection of warmer NGL:
G_LNG = -(m_long_NGL∙LHV_LNG∙a_3∙C_LNG)/a_2 [$/sec]
where m_long_NGL is FI-250 Long reinjection mass flow, [kg/s];
LHV_LNG = 48,632 - lower heating value of LNG [kJ/kg] [12];
a_3 = 0.1 – assumed loss factor defining how much LNG [kg] could be produced using the cooling duty required to cool 1 kg of NGL reinjected at 25°C down to the average temperature downstream of the E-200 reflux cooler, -47.5°C.
Fig. 6 illustrates the behaviour of the objective function and its elements with respect to one of the MVs, the reflux ratio. Depending on input variables such as LNG and condensate prices, optimal process conditions can be found within the MV ranges or on their boundaries.
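The objective function defined above translates into a few lines of Python. The constants are those listed in the text; the function name, argument names and the input values in the example call are hypothetical and chosen only to show the order of magnitude of each term.

```python
# Constants as given in the text.
A1 = 0.1590           # m3/bbl
A2 = 1_055_056.0      # kJ/MMBtu
A3 = 0.1              # assumed LNG loss factor, kg LNG per kg long reinjection
LHV_C5 = 44_938.0     # kJ/kg
LHV_LNG = 48_632.0    # kJ/kg
MW_C5 = 67.15         # kg/kmol
MW_LNG = 17.87        # kg/kmol


def objective(m_cnd, rho_cnd, m_lng, c_c5, m_long_ngl, price_lng, price_cnd):
    """Return G and its three terms, all in $/sec.

    m_cnd: condensate flow, kg/s; rho_cnd: condensate density, kg/m3;
    m_lng: LNG flow, kg/s; c_c5: C5+ molar fraction in LNG;
    m_long_ngl: long reinjection flow, kg/s;
    price_lng: $/MMBtu; price_cnd: $/bbl.
    """
    g_cnd = m_cnd * price_cnd / (rho_cnd * A1)
    w_c5 = c_c5 * MW_C5 / MW_LNG  # molar fraction -> mass fraction
    g_c5lng = m_lng * w_c5 * LHV_C5 * price_lng / A2
    g_lng = -m_long_ngl * LHV_LNG * A3 * price_lng / A2  # loss, hence negative
    return g_cnd + g_c5lng + g_lng, (g_cnd, g_c5lng, g_lng)


# Hypothetical operating point, not a result from the model.
total, (g_cnd, g_c5lng, g_lng) = objective(
    m_cnd=1.0, rho_cnd=650.0, m_lng=98.0, c_c5=0.001,
    m_long_ngl=3.0, price_lng=14.0, price_cnd=65.0,
)
```

In an online application the five process inputs would be the predictions of the surrogate models for a given set of MVs, while the two prices stay as free parameters that can change without retraining anything.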

Bayesian Optimisation
Bayesian optimisation is a sequential optimisation method that makes no assumptions about the form of the optimised function. It is well suited to black-box, multi-dimensional, complex functions that are expensive to evaluate, e.g. where a single run takes a long time.
In this method, the optimised function is sampled sequentially with different input values each time, and every next sample is chosen using information from all previous samples. Every next move is the result of an inner optimisation loop that maximises an acquisition function so as to minimise the number of black-box function calls. There are multiple acquisition functions to choose from, but in general they all provide some trade-off between exploration and exploitation [14]. Exploration aims to reduce uncertainty about less known areas of the function by sampling further away from previous samples. Exploitation, on the other hand, continues to sample in the area that previous samples suggest is most likely to contain the optimum.
In this work, the Bayesian optimisation implementation in the scikit-optimize library [15] is used with the Expected Improvement acquisition function and the number of function calls limited to 30. Two MVs are set as fixed inputs:
FI-050 Feed flow rate is set at 100 kg/s;
TI-150 E-200 outlet temperature is set at -48 °C.
Remaining two MVs are optimised for maximum objective function (refer to Table 2):
FI-150 / FI-050 Recycle ratio in range [0.025, 0.175];
TI-050 E-100 outlet temperature in range [-20, 0] °C.
Product prices used for the optimisation are fixed:
C_LNG = 14$/MMBtu;
C_CND = 65$/bbl.
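To make the mechanics of the method concrete, the sketch below hand-rolls a small Bayesian optimisation loop with a Gaussian process from scikit-learn and the Expected Improvement acquisition function. It is not the scikit-optimize implementation used in this work and not the actual objective: `toy_objective` is a hypothetical smooth function standing in for G evaluated through the surrogate models, and the kernel, the number of initial points, the candidate count and `xi` are all assumptions. Only the search bounds and the budget of 30 calls are taken from the text.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Search space from the text: recycle ratio and TI-050 (deg C).
BOUNDS = np.array([[0.025, 0.175], [-20.0, 0.0]])


def toy_objective(x):
    """Hypothetical stand-in for G; NOT the surrogate-based objective."""
    ratio, ti_050 = x
    return 1.0 - 40.0 * (ratio - 0.12) ** 2 - 0.002 * (ti_050 + 8.0) ** 2


def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for maximisation, from the GP posterior mean and std."""
    sigma = np.maximum(sigma, 1e-12)  # avoid division by zero
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)


def to_unit(x):
    """Scale points to the unit square so both MVs share one length scale."""
    return (x - BOUNDS[:, 0]) / (BOUNDS[:, 1] - BOUNDS[:, 0])


def optimise(n_calls=30, n_initial=10, seed=0):
    rng = np.random.default_rng(seed)

    def sample(n):
        return rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(n, 2))

    X = sample(n_initial)  # initial random exploration
    y = np.array([toy_objective(x) for x in X])
    gp = GaussianProcessRegressor(
        kernel=Matern(nu=2.5), normalize_y=True, alpha=1e-6, random_state=seed
    )
    while len(y) < n_calls:
        gp.fit(to_unit(X), y)  # update the model of the objective
        candidates = sample(2000)  # crude inner search over random candidates
        mu, sigma = gp.predict(to_unit(candidates), return_std=True)
        ei = expected_improvement(mu, sigma, y.max())
        x_next = candidates[np.argmax(ei)]
        X = np.vstack([X, x_next])
        y = np.append(y, toy_objective(x_next))
    best = int(np.argmax(y))
    return X[best], y[best], X


best_x, best_y, history = optimise()
```

With scikit-optimize the same setup is expressed through `gp_minimize` with `acq_func="EI"` and `n_calls=30`. Because `gp_minimize` minimises, the objective has to be negated to find a maximum.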
The optimisation run resulted in a maximum objective function value of 1.4921 $/sec, reached at a recycle ratio of 0.1750 and TI-050 of -3.5°C, as shown in Fig. 7. The partial dependence plot indicates that both optimised MVs have significant influence on the objective function. The samples are evenly distributed, leaving no unexplored areas.

At the optimal point, the recycle ratio is constrained by its upper limit of 0.1750. The preference for using the recycle ratio first to maximise C5+ extraction can be explained by the fact that it results in a smaller NGL reinjection increase, due to the effect of multiple column trays. In contrast, TI-050 reduction is less selective with respect to C5+ extraction because the liquid produced in E-100 is knocked out in a single separation stage.
Abbreviations
Abbreviation | Full description |
CSV | Comma Separated Values |
EOS | Equation of State |
LNG | Liquefied Natural Gas |
ML | Machine Learning |
MLP | Multi-Layer Perceptron neural network |
MPC | Model Predictive Control |
MV | Manipulated Variable |
NG | Natural Gas |
NGL | Natural Gas Liquids |
PV | Process Variable |
RTO | Real-Time Optimisation |
References
[1] Augusto Veiga. An Introduction to the Marine LNG Transportation.
[2] Ghorbani, B.; Zendehboudi, S.; Saady, N.M.C. Advancing Hybrid Cryogenic Natural Gas Systems: A Comprehensive Review of Processes and Performance Optimization. Energies 2025, 18, 1443.
[3] DWSIM chemical process simulator.
[4] CAPE-OPEN Interface Specification.
[5] ChemSep column simulator for distillation, absorption, and extraction operations.
[6] Pandas Python library.
[7] Scikit Learn. An introduction to machine learning with scikit-learn.
[8] Scikit Learn. Neural Networks.
[9] Scikit Learn. Model persistence.
[10] UnitConverters.net.
[11] Oil Sands Magazine.
[12] Lower and Higher Heating Values of Gas, Liquid and Solid Fuels.
[13] Global price of LNG, Asia.
[14] Acquisition functions in Bayesian Optimization.
[15] Scikit Optimize. Sequential model-based optimization in Python.