Machine Learning for Process Control. Part 1: Ambient Temperature Prediction.
- Ivan Nemov
- Apr 2
Updated: Apr 4
Introduction
In process control, if a measured disturbance is known to be changing, this information can be used to adjust the control handles in advance, so that the disturbance is compensated before it starts affecting the controlled variables. This concept is known as feedforward control and requires precise timing. Generally, the sooner a disturbance is detected, the better, as this provides more time to take corrective action. In some cases, it is beneficial to use anticipated or predicted disturbance variables to gain such a timing advantage.
Making educated guesses about an outcome using historical data is called Predictive Analytics, with Machine Learning being one of its most powerful tools. Machine Learning predictive algorithms can therefore play an important role in feedforward process control. This post shows how to leverage various ML techniques to predict ambient temperature changes over a forecast horizon. It outlines the journey from data preprocessing and model training to real-time deployment, covering concepts such as #feature_engineering, #ensemble_learning, #hyperparameters_tuning, #statistical_models etc. The objective is to give the reader a clear roadmap of ML application design and its deployment in production, with all the necessary practical details.

Understanding the Problem
The objective of the ML application is to predict the temperature for a selected location, in this case Perth WA, based on current weather conditions. Application inputs are intentionally limited to the minimum required, so that the application can be deployed along with a basic weather station. The prediction horizon is limited to 3 hrs, with prediction points at 1 hr, 2 hr and 3 hr. This is a sufficient look-ahead for most control problems. Interpolating between the current temperature and the three forecasted points makes it possible to customise the prediction time for a specific use case.
Architecture of ML Application
Two supervised learning algorithms from the scikit-learn Python library, the Gradient Boosting (GB) and Multi-Layer Perceptron (MLP) regressors, were used in the project to predict the temperature change over the next period based on the current weather conditions.
The MLP and GB models are trained to predict the temperature change from a set of chosen features (predictors). Model training relies on historical data for the location, which was split into day and night parts and further into training and test datasets.
In addition to the ML models, the SARIMA time-series forecasting algorithm was employed to extract information from past temperature dynamics. Unlike MLP and GB, where a single pre-trained model is used to make all predictions, a SARIMA model is trained each time before a prediction is made, by fitting the data of the preceding 7-day window with automatically selected tuning parameters.
Finally, a basic ensemble learning method combines the prediction outputs from GB, MLP and SARIMA by taking their median (Fig. 1). The temperature prediction is then calculated as the sum of the current temperature and the predicted change for each point of the horizon.
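A minimal sketch of this combination step, assuming each model returns its predicted temperature changes for the 1 hr, 2 hr and 3 hr points as an array. The function name `ensemble_temperature` and the sample numbers are hypothetical, not taken from the project code:

```python
import numpy as np


def ensemble_temperature(current_temp, gb_dt, mlp_dt, sarima_dt):
    """Median-combine predicted temperature changes and add them to the current temperature.

    Each *_dt argument holds predicted changes (degC) for the 1, 2 and 3 hr points.
    """
    # Stack model outputs as rows: shape (3 models, 3 horizon points)
    stacked = np.vstack([gb_dt, mlp_dt, sarima_dt])
    # Median across the models, taken separately for each horizon point
    median_dt = np.median(stacked, axis=0)
    # Absolute temperature prediction = current temperature + predicted change
    return current_temp + median_dt


# Illustrative inputs only: SARIMA disagrees strongly at every point
pred = ensemble_temperature(18.0, [0.5, 1.1, 1.6], [0.4, 0.9, 1.8], [1.5, 2.5, 0.2])
print(pred)
```

With three models, the median discards the most extreme output at each horizon point, so one model going astray (here the SARIMA values) does not drag the combined result with it.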
Tip:
ML models need a large amount of data to learn patterns. Some strong patterns based on physics knowledge can be embedded into the application explicitly, through the choice of architecture or feature engineering techniques. As a result, higher model quality can be achieved with less data.
The architecture shown in Fig. 1 features a split between "day" and "night" models. During the day, temperature rises and falls following the amount of solar heat received, while at night it predominantly decreases as heat is radiated back to the sky. Because different patterns prevail during the day and at night, it makes sense to specialise the models for each case.
Another architectural choice is to predict the temperature increment and add it explicitly to the known current temperature, rather than feeding the current temperature to a model in the hope that the model will learn to add it internally.
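The post does not state the exact criterion used to separate day rows from night rows. One plausible sketch, assuming a row counts as "day" whenever the inferred solar radiation (the 'solarradiation' feature derived in the Feature Engineering section) is above zero; the function name `split_day_night` is hypothetical:

```python
import pandas as pd


def split_day_night(df: pd.DataFrame, radiation_col: str = "solarradiation"):
    """Return (day_rows, night_rows) split on inferred solar radiation.

    Assumption: 'day' means the sun is above the horizon (radiation > 0).
    """
    is_day = df[radiation_col] > 0
    return df[is_day], df[~is_day]


# Tiny illustrative frame (made-up values)
example = pd.DataFrame({"solarradiation": [0.0, 0.3, 0.8, 0.0],
                        "temp": [12.0, 15.0, 20.0, 16.0]})
day_rows, night_rows = split_day_night(example)
print(len(day_rows), len(night_rows))
```

Each subset is then used to train its own set of models.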
Offline Model Development
Data Collection and Preprocessing
Weather data is downloaded from Visual Crossing in CSV format and loaded into a pandas DataFrame. Non-essential fields (like “feelslike” or “icon”) are removed, and the ‘datetime’ column is used as the index to organise the time series. The pre-processed DataFrame includes the following time series relevant to this project:
‘datetime’
‘temp’
‘humidity’
‘windspeed’
‘winddir’
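A minimal sketch of this loading step. The CSV excerpt below is a made-up stand-in for a Visual Crossing hourly export (only the column names mentioned above are assumed), and `load_weather` is a hypothetical helper name:

```python
import io

import pandas as pd

# Hypothetical excerpt of an hourly CSV export; values are illustrative only
csv_text = """datetime,temp,feelslike,humidity,windspeed,winddir,icon
2023-01-01T00:00:00,21.4,21.4,68.2,11.2,225,clear-night
2023-01-01T01:00:00,20.9,20.9,71.0,9.4,230,clear-night
2023-01-01T02:00:00,,20.1,73.5,8.1,235,clear-night
"""

KEEP = ["temp", "humidity", "windspeed", "winddir"]


def load_weather(source):
    """Load the CSV, index it by 'datetime' and keep only the fields used in this project."""
    df = pd.read_csv(source, parse_dates=["datetime"], index_col="datetime")
    return df[KEEP].sort_index()


df = load_weather(io.StringIO(csv_text))
print(df.head())
```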
Empty data (NaN) can easily be dropped row-wise, either across the whole DataFrame or in specific columns only. This operation needs to be repeated as new features are derived, since shifted features introduce new NaNs at the edges of the series.
df = df.dropna() # drop across all dataframe
df = df.dropna(subset=['col1', 'col2']) # drop for col1 and col2 only
Capturing Past Temperature Dynamics
The 'dt_-1hr', 'dt_-2hr' and 'dt_-3hr' features are created to capture the known past temperature change by subtracting the time-offset ‘temp’ column from the original ‘temp’ column. For example, ‘dt_-1hr’ is the current ‘temp’ minus the ‘temp’ 1 hour ago:
df['temp_-1hr'] = df['temp'].shift(periods=1) # temperature 1 hr ago
df['dt_-1hr'] = df['temp'] - df['temp_-1hr'] # change over past 1 hr
These features capture information about the past dynamics that can be used for future temperature prediction.
For example, adding 'dt_-1hr' to ‘temp’ extrapolates the last two known temperatures 1 hr into the future: 'temp_linear_2pts_1hr'. If a line is fitted through the last three known temperatures, a linear extrapolation from 3 points is obtained: 'temp_linear_3pts_1hr'. Probably the best method to extract information from the past dynamics is a SARIMA fit over the last 7 days of the temperature time series: ‘dt_sarima_7d_1hr’.
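The two linear extrapolations can be written in closed form for hourly data. A minimal sketch, assuming an hourly ‘temp’ series without gaps; the helper name `linear_extrapolations` is hypothetical. For three equally spaced points at t = -2, -1, 0, the least-squares slope is (T0 - T-2) / 2 and the fitted line passes through the mean at t = -1, so its value at t = +1 is the mean plus (T0 - T-2):

```python
import numpy as np
import pandas as pd


def linear_extrapolations(temp: pd.Series) -> pd.DataFrame:
    """One-hour-ahead linear extrapolations from the last 2 and last 3 hourly temperatures."""
    out = pd.DataFrame(index=temp.index)
    # 2 points: repeat the last 1 hr change (temp + dt_-1hr)
    out["temp_linear_2pts_1hr"] = temp + (temp - temp.shift(1))
    # 3 points: least-squares line through t = -2, -1, 0 evaluated at t = +1
    out["temp_linear_3pts_1hr"] = temp.rolling(3).mean() + (temp - temp.shift(2))
    return out


s = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0])
ext = linear_extrapolations(s)
# Cross-check the closed form against an explicit least-squares fit on the last 3 points
coef = np.polyfit([-2, -1, 0], s.iloc[-3:].to_numpy(), 1)
print(ext.iloc[-1].to_dict(), np.polyval(coef, 1))
```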
Results from these three methods are evaluated on the overall dataset and compared in Fig. 2. The mean hourly temperature change is roughly 1.0°C, with a standard deviation of 0.8°C. The mean error between the 'temp_linear_2pts_1hr', 'temp_linear_3pts_1hr' and ‘dt_sarima_7d_1hr’ predictions and the actual temperature change is near 0°C for all methods. As expected, SARIMA performs best (lowest error standard deviation, 0.85°C). However, the simple assumption that the temperature will change over the next 1 hr by the same amount as it changed over the last 1 hr ('temp_linear_2pts_1hr') is not far behind SARIMA performance-wise (standard deviation 1.05°C). The more complex linear extrapolation from 3 points ('temp_linear_3pts_1hr') does not result in a better prediction (standard deviation 1.12°C).

Another key performance metric is the probability of a prediction in the wrong direction (Table 1), i.e. how frequently the predicted change is positive while the actual change is negative, and vice versa.
Table 1 – Probability of prediction in a wrong direction
Metric | dt_linear_3pts_1hr | dt_linear_2pts_1hr | dt_sarima_7d_1hr |
---|---|---|---|
Wrong direction | 24% | 21% | 18% |
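The three metrics used throughout this post (mean error, error standard deviation and wrong-direction probability) can be computed together. A minimal sketch, assuming predicted and actual changes are given as arrays; `prediction_metrics` is a hypothetical name, and zero changes are treated as having no direction and excluded from the wrong-direction count:

```python
import numpy as np


def prediction_metrics(predicted_dt, actual_dt):
    """Mean error, error standard deviation and wrong-direction rate (%) of predicted changes."""
    predicted_dt = np.asarray(predicted_dt, dtype=float)
    actual_dt = np.asarray(actual_dt, dtype=float)
    err = predicted_dt - actual_dt
    # +1 when the signs agree, -1 when they disagree, 0 if either change is exactly zero
    agreement = np.sign(predicted_dt) * np.sign(actual_dt)
    decided = agreement != 0
    wrong_pct = 100.0 * np.mean(agreement[decided] < 0) if decided.any() else float("nan")
    # ddof=1 matches pandas Series.std(), used in the evaluation code later in this post
    return {"mean_error": err.mean(), "error_std": err.std(ddof=1), "wrong_direction_pct": wrong_pct}


m = prediction_metrics([1.0, -1.0, 0.5, 2.0], [0.5, 1.0, 0.0, 1.0])
print(m)
```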
The conclusion from these results is that SARIMA can be used as a separate prediction algorithm, while 'dt_-1hr' has sufficiently high predictive power to be used as a feature for the GB and MLP models.
Feature Engineering
The ML models also need target variables to be trained against, which do not exist in the original dataset. The targets 'dt_1hr', 'dt_2hr' and 'dt_3hr' are created by subtracting the original ‘temp’ column from the time-offset (future) ‘temp’ column.
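A minimal sketch of the target construction, assuming hourly rows without gaps so that a shift of h rows equals h hours; `add_targets` is a hypothetical helper name:

```python
import pandas as pd


def add_targets(df: pd.DataFrame, hours=(1, 2, 3)) -> pd.DataFrame:
    """Add future temperature-change targets dt_1hr, dt_2hr, dt_3hr."""
    for h in hours:
        # Temperature h hours ahead minus the current temperature; the last h rows become NaN
        df[f"dt_{h}hr"] = df["temp"].shift(periods=-h) - df["temp"]
    return df


frame = add_targets(pd.DataFrame({"temp": [10.0, 11.0, 13.0, 12.0]}))
print(frame)
```

The NaN rows at the end are then removed with `dropna`, as described above.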
Apart from the past temperature dynamics, the future temperature change can be predicted from the current time, wind direction and speed, and humidity. Time of day and wind direction are cyclic rather than continuous: they increase linearly up to the range maximum (24 hr, 360°) and then drop back to 0. Such variables require a trigonometric (sine/cosine) transform into cyclic variables so that regression algorithms can use them as circular predictors.
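A minimal sketch of such a transform for the time of day, assuming a DatetimeIndex; the column names `hour_sin` and `hour_cos` are hypothetical. After the transform, 23:00 and 00:00 sit next to each other on a unit circle instead of 23 units apart:

```python
import numpy as np
import pandas as pd


def add_cyclic_time(df: pd.DataFrame) -> pd.DataFrame:
    """Encode the hour of day as a point on the unit circle (sine and cosine)."""
    hours = df.index.hour.to_numpy() + df.index.minute.to_numpy() / 60.0
    angle = 2 * np.pi * hours / 24.0
    df["hour_sin"] = np.sin(angle)
    df["hour_cos"] = np.cos(angle)
    return df


idx = pd.date_range("2023-01-01 22:00", periods=4, freq=pd.Timedelta(hours=1))  # 22, 23, 0, 1 h
enc = add_cyclic_time(pd.DataFrame(index=idx))
print(enc.round(3))
```

Wind direction can be encoded the same way with a period of 360° instead of 24 hr.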
Solar radiation gives a more physically meaningful view of how the time of day affects temperature change. Using the pysolar Python library, the sun altitude can be obtained from a timestamp and the location coordinates, and the inferred solar radiation can be derived from it.
The future change of the inferred solar radiation over the next 1 hr, ‘dradiation_1hr’, requires nothing but the timestamp to calculate. This feature adds strong predictive power to the model.
import math
from pysolar import solar

def get_solar_radiation(timestamp):
    """Inferred solar radiation (0..1) in Perth for a timezone-aware timestamp."""
    latitude = -31.9514  # Perth coordinates (Southern Hemisphere: negative latitude)
    longitude = 115.8617  # Perth coordinates
    # Sun altitude above the horizon in degrees (pysolar needs a timezone-aware datetime)
    sun_altitude = solar.get_altitude(latitude, longitude, timestamp)
    if sun_altitude <= 0:
        return 0.0  # sun at or below the horizon: no direct radiation
    return math.sin(math.radians(sun_altitude))

# 'datetime' is the DataFrame index and holds naive local Perth times
df['datetime_tz'] = df.index.tz_localize('Australia/Perth')
df['solarradiation'] = df['datetime_tz'].apply(get_solar_radiation)
# Change of inferred radiation over the next 1 hr (known in advance from the clock alone)
df['dradiation_1hr'] = df['solarradiation'].shift(periods=-1) - df['solarradiation']
Wind direction 'winddir' and speed 'windspeed' can be combined into continuous variables 'wind_x' and 'wind_y' describing the wind as a vector:
import numpy as np
# Polar-to-Cartesian conversion: wind speed and direction (degrees) as a 2D vector
df['wind_x'] = df['windspeed'] * np.cos(np.radians(df['winddir']))
df['wind_y'] = df['windspeed'] * np.sin(np.radians(df['winddir']))
Tip:
Different ML algorithms handle non-linearity with different levels of success. Linear regression cannot capture it by definition. As the number of fitting parameters grows, reaching thousands in ANNs, the chances of capturing non-linear behaviour increase. However, linearisation techniques applied to features benefit complex ML algorithms too, by reducing the amount of data needed for training and increasing prediction quality.
Correlation Analysis
Before training the models, it is important to evaluate how the individual features correlate with each other and which features should be selected as predictors. This helps to avoid redundant or mutually dependent predictors, which can lead to inflated variance and unreliable model estimates.
Correlation between all pairs of parameters is evaluated using the Pearson method and presented in matrix form. The correlation matrix (Fig. 3) indicates that the variables most strongly correlated with the temperature change are ‘humidity’, ‘dradiation_1hr’, ‘wind_x’, ‘wind_y’ and ‘temp_-1hr’.
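In pandas, the full Pearson matrix is `df.corr(method="pearson")`. A minimal sketch that also ranks the features by the strength of their correlation with a target, using synthetic data; `target_correlations` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str = "dt_1hr") -> pd.Series:
    """Pearson correlation of every column with the target, sorted by absolute strength."""
    corr = df.corr(method="pearson")  # full correlation matrix, as plotted in Fig. 3
    return corr[target].drop(target).sort_values(key=np.abs, ascending=False)


# Synthetic data: 'a' drives the target, 'b' is noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = pd.DataFrame({"a": x, "b": rng.normal(size=200), "dt_1hr": 2 * x + 0.1 * rng.normal(size=200)})
ranking = target_correlations(data)
print(ranking)
```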
An alternative way to visualise how the features relate to the target variables is an XY plot (Fig. 4).


Dataset splitting and scaling
The DataFrames and target time series for day and night are split into a 67% part for training and a 33% part for testing, with data points picked at random. The training dataset is used to fit the models, and the testing dataset is used to evaluate model performance. Random selection of data points avoids the impact of seasonality in the original data: the aim is to represent every hour of the day and every month equally in the training and testing datasets. An example for the day data is shown below:
from sklearn.model_selection import train_test_split
day_features = ['dradiation_1hr', 'wind_x', 'wind_y', 'humidity', 'dt_-1hr']
# df_slice: day-time rows of the DataFrame
# Drop irrelevant columns and keep only the predictors
X = df_slice.drop(columns=[feature for feature in list(df_slice.columns.values) if feature not in day_features])
y = df_slice['dt_1hr']  # target: temperature change over the next 1 hr
# Split dataset to training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
MLP model training requires data scaling. The features have different measurement units and ranges of variation, while the MLP weights are initialised randomly within the same small range for every input. Without scaling, features with larger numeric ranges would dominate training, so all of them need to be scaled. The scikit-learn library provides several scaling methods; the one applied here is StandardScaler. StandardScaler centres each feature by subtracting its mean (so the scaled data has zero mean) and then divides the result by the feature's standard deviation (so the scaled data has unit variance). The scaled feature matrix and scaled target vector are then used to train the regressor model. For later deployment, the fitted scalers must be saved as serialised object files using the joblib library. The scaler objects can then be loaded in the production environment to scale the inputs and reverse-scale the model output (Fig. 5).

import joblib
from sklearn.preprocessing import StandardScaler
# Scale data for using by certain models
scaler_X = StandardScaler().fit(X_train.values)
scaler_y = StandardScaler().fit(y_train.values.reshape(-1, 1))
X_train_scaled = scaler_X.transform(X_train.values)
y_train_scaled = scaler_y.transform(y_train.values.reshape(-1, 1))
# Save the fitted scalers for deployment (suffix and y_name distinguish the model variants)
joblib.dump(scaler_X, r".\all_models\temp_pred\scaler_X_" + suffix + '_' + y_name + ".pkl")
joblib.dump(scaler_y, r".\all_models\temp_pred\scaler_y_" + suffix + '_' + y_name + ".pkl")
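What StandardScaler computes can be reproduced in a few lines of NumPy, which also shows the reverse scaling applied to the model output in production. A minimal sketch with hypothetical helper names; like StandardScaler, it uses the population standard deviation (ddof=0) and leaves constant features unscaled:

```python
import numpy as np


def standardise(X):
    """Column-wise z-scores z = (x - mean) / std, returning the fitted mean and scale too."""
    mean = X.mean(axis=0)
    scale = X.std(axis=0)  # population std (ddof=0)
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero for constant columns
    return (X - mean) / scale, mean, scale


def inverse_standardise(Z, mean, scale):
    """Map scaled values (e.g. an MLP output) back to original units: x = z * std + mean."""
    return Z * scale + mean


X_demo = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
Z_demo, mu, sd = standardise(X_demo)
print(Z_demo)
```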
Models Training and Evaluation
Model implementation and training in the scikit-learn library is straightforward. There are various regressor classes, such as MLPRegressor and GradientBoostingRegressor. Each regressor has a number of tuning parameters, called hyperparameters, which define the model structure and control the training process. The regressors come with default hyperparameter values; however, to reach maximum performance for a given use case, they should be optimised. The scikit-learn library provides the GridSearchCV class, which evaluates every combination in the provided parameter space with cross-validation and selects the best-performing one.
import datetime
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GridSearchCV
parameter_space = {'hidden_layer_sizes': [(200,), (300,), (400,)], # also tested (100, 50), (100, 100), (300, 200), (300, 300)
'activation': ['relu'], # also tested 'tanh', 'logistic'
'solver': ['adam'], # also tested 'sgd'
'alpha': [1e-4], # also tested 1e-3, 1e-2, 0.1, 1
'learning_rate': ['constant', 'adaptive'],
'early_stopping': [True, False],
'validation_fraction': [0.1, 0.2]
}
start_time = datetime.datetime.now()
mlp_model = MLPRegressor(random_state=1, max_iter=500)
mlp_reg = GridSearchCV(mlp_model, parameter_space, n_jobs=-1, cv=5).fit(X_train_scaled, y_train_scaled.ravel())
duration = datetime.datetime.now() - start_time
print('MLP best parameters found:\n', mlp_reg.best_params_)
mlp_reg_score = mlp_reg.score(scaler_X.transform(X_test.values), scaler_y.transform(y_test.values.reshape(-1, 1)))
joblib.dump(mlp_reg, r".\all_models\temp_pred\mlp_reg_model_" + suffix + '_' + y_name + ".pkl")
print("MLP build time:\n", duration)
print("MLP regression score:\n", mlp_reg_score)
pred_dt_mlp_scaled = mlp_reg.predict(scaler_X.transform(X_test.values))
# Reverse-scale predictions back to °C
pred_dt_mlp = pd.Series(scaler_y.inverse_transform(pred_dt_mlp_scaled.reshape(-1, 1))[:, 0], index=X_test.index)
err_mlp = pred_dt_mlp - y_test
# Sign agreement: +1 same direction, -1 opposite direction, 0 if either value is exactly zero
sign = np.sign(y_test) * np.sign(pred_dt_mlp)
neg = int((sign < 0).sum())  # predictions in the wrong direction
pos = int((sign > 0).sum())  # predictions in the right direction
print('Mean error for MLP:\n', err_mlp.mean())
print("STD of error for MLP:\n", err_mlp.std())
print("Wrong direction for MLP:\n", str(int(neg * 100 / max(neg + pos, 1))) + "%")
After fitting with the optimal hyperparameters, model performance can be evaluated and compared on the testing dataset in terms of mean error, error standard deviation and probability of a prediction in the wrong direction. The model can be exported by serialising the model object with the joblib library and saving it as a file. The serialised model file can then be loaded in a similar environment and restored into a model object.
Tip:
Optimising hyperparameters with GridSearchCV can be time-consuming if the parameter space includes multiple options for each hyperparameter and all of them are optimised at once, because the number of fits grows with the product of the option counts. It is often a reasonable approximation to assume that hyperparameters are independent of each other. This allows each hyperparameter to be optimised separately, progressing through them one by one while applying the previous tuning results.
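A minimal sketch of this one-at-a-time search, under the independence assumption above. `coordinate_search` is a hypothetical helper, and the `score` callable is an assumption: any function returning a higher-is-better score for a full parameter dictionary:

```python
def coordinate_search(param_grid, score, start):
    """Tune one hyperparameter at a time, keeping earlier winners fixed.

    param_grid: {name: [candidate values]}, tuned in dict order.
    score: callable(params_dict) -> float, higher is better.
    start: initial values for every hyperparameter.
    """
    best = dict(start)
    n_evals = 0
    for name, values in param_grid.items():
        results = []
        for value in values:
            results.append((score({**best, name: value}), value))
            n_evals += 1
        best[name] = max(results, key=lambda r: r[0])[1]  # keep the best value for this parameter
    return best, n_evals


# Toy, separable score with its optimum at a=2, b=20
grid = {"a": [1, 2, 3], "b": [10, 20]}
best, n_evals = coordinate_search(grid, lambda p: -(p["a"] - 2) ** 2 - (p["b"] - 20) ** 2, {"a": 1, "b": 10})
print(best, n_evals)
```

The number of evaluations is the sum of the option counts (5 here) rather than their product (6 here, and far more for realistic grids). In practice, `score` could wrap scikit-learn's `cross_val_score(MLPRegressor(**params), X_train_scaled, y_train_scaled.ravel(), cv=5).mean()`.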
In this project, GB, MLP and SARIMA models were built for temperature prediction separately for the day and night periods, with 1 hr, 2 hr and 3 hr look-ahead, and their performance is compared in the tables below. The results of the three models are then used to calculate the median predicted temperature change.
A technique in which several weaker ML models are combined to produce a better result than any of them can produce separately is called an ensemble. It can be as simple as averaging or taking the median result, or more complex, such as a weighted average with weights proportional to the performance of the corresponding models. Sometimes the individual model outputs are fed into yet another model. For an ensemble to work, each model must be better than a random guess, and the models' errors should not be strongly correlated with each other.
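As an illustration of the weighted-average variant, the sketch below weights each model by the inverse of its error variance. This particular weighting is an assumption (one common way to make weights reflect performance), not the method used in this project, which takes the median; `weighted_ensemble` is a hypothetical name:

```python
import numpy as np


def weighted_ensemble(predictions, error_stds):
    """Weighted average of model predictions with weights proportional to 1 / error variance."""
    preds = np.asarray(predictions, dtype=float)  # shape (n_models, n_points)
    w = 1.0 / np.asarray(error_stds, dtype=float) ** 2
    w = w / w.sum()  # normalise weights to sum to 1
    return w @ preds


# Error STDs from Table 2 (day, 1 hr) for GB, MLP, SARIMA; the predictions are made up
combined = weighted_ensemble([[0.6], [0.5], [1.4]], [0.743, 0.746, 1.005])
print(combined)
```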
Table 2 – Performance of different models (1 hr prediction point)
Model | Day mean error, °C | Day error STD, °C | Day wrong direction | Night mean error, °C | Night error STD, °C | Night wrong direction |
---|---|---|---|---|---|---|
GB | -0.036 | 0.743 | 13% | -0.026 | 0.594 | 14% |
MLP | -0.007 | 0.746 | 12% | -0.055 | 0.593 | 14% |
SARIMA | -0.024 | 1.005 | 15% | 0.0002 | 0.708 | 18% |
Ensemble (median) | -0.024 | 0.725 | 12% | -0.033 | 0.598 | 14% |
Table 3 – Performance of different models (2 hr prediction point)
Model | Day mean error, °C | Day error STD, °C | Day wrong direction | Night mean error, °C | Night error STD, °C | Night wrong direction |
---|---|---|---|---|---|---|
GB | -0.078 | 1.100 | 8% | 0.024 | 0.877 | 14% |
MLP | 0.011 | 1.098 | 8% | 0.011 | 0.872 | 15% |
SARIMA | 0.223 | 1.730 | 15% | -0.407 | 0.847 | 19% |
Ensemble (median) | -0.034 | 1.07 | 8% | 0.021 | 0.866 | 14% |
Table 4 – Performance of different models (3 hr prediction point)
Model | Day mean error, °C | Day error STD, °C | Day wrong direction | Night mean error, °C | Night error STD, °C | Night wrong direction |
---|---|---|---|---|---|---|
GB | -0.104 | 1.369 | 6% | 0.003 | 1.332 | 21% |
MLP | -0.161 | 1.369 | 5% | -0.033 | 1.298 | 21% |
SARIMA | 0.298 | 2.697 | 14% | -0.565 | 1.310 | 22% |
Ensemble (median) | -0.111 | 1.332 | 5% | -0.017 | 1.260 | 19% |
Online Application Deployment
ML deployment in production requires some infrastructure: a database, connectivity to data sources, the ability to write results somewhere to make them useful, and a way to schedule computational tasks repeatedly. To apply and maintain ML algorithms at scale, this list needs to be extended with visualisation software, a management system, version control, health monitoring, etc.
ARTData covers all of this and more, giving the deployment process a clear structure, facilitating code audits and ensuring reliable execution. The sections below cover the ML application deployment in the ARTData lab environment, but the principles can be applied elsewhere as well.
Reading Weather Data via REST API
If an ML application is to be integrated with a Process Control System (PCS), it must be able to communicate with it over one of the industrial data protocols. ARTData has a built-in OPC UA communication interface capable of reading data from and writing data to a PCS OPC UA server. In the lab environment, however, weather data is pulled not from a PCS but from Visual Crossing.
Visual Crossing provides access to its historical weather data, current weather and weather forecasts via a REST API. ARTData does not need a dedicated REST API connector to read the API data; instead, an HTTP request can be made from within a data processing item. In ARTData, a data processing item is a group of commands bound together to query data, process it and write the results into the database.
To hide the complexity of making the request and parsing the returned JSON, these operations are put inside the get_weather_forecast function of the visualcrossing plugin (Fig. 6). A plugin is a custom Python file (a module in Python terminology) built offline; it can contain classes and functions, which become available in ARTData once the plugin is saved. Note that the visualcrossing plugin is not provided by Visual Crossing; it was created internally to make the implementation more convenient.
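The plugin source is not shown in the post. The sketch below is a hypothetical stand-in for such a function, using only the Python standard library. The Timeline API endpoint, the query parameters and the 'currentConditions' field names are assumptions to be verified against the Visual Crossing documentation:

```python
import json
import urllib.parse
import urllib.request

# Assumed Visual Crossing Timeline API base URL (verify in the official docs)
BASE_URL = ("https://weather.visualcrossing.com/VisualCrossingWebServices/"
            "rest/services/timeline/")


def build_url(location, api_key):
    """Request URL for current conditions in metric units (assumed parameters)."""
    query = urllib.parse.urlencode({"unitGroup": "metric", "include": "current",
                                    "key": api_key, "contentType": "json"})
    return BASE_URL + urllib.parse.quote(location) + "?" + query


def parse_current(payload):
    """Pick the fields used by the models from the assumed 'currentConditions' block."""
    cur = payload["currentConditions"]
    return {k: cur.get(k) for k in ("datetimeEpoch", "temp", "humidity", "windspeed", "winddir")}


def get_current_weather(location, api_key, timeout=10):
    """Fetch and parse current conditions (hypothetical, not the actual plugin code)."""
    with urllib.request.urlopen(build_url(location, api_key), timeout=timeout) as resp:
        return parse_current(json.load(resp))


# weather = get_current_weather("Perth,WA", API_STR)  # needs network access and a valid key
```

Keeping URL building and JSON parsing in separate functions lets both be tested without network access.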

The API request and the recording of data into the time-series and real-time database tables are performed every 120 sec in the WEATHER_API_DATA data processing item, i.e. weather data is updated every 2 min. This amounts to 24 × 60 / 2 = 720 requests per day, which stays within the limit of 1000 requests per day imposed by Visual Crossing.
The API key string provided by Visual Crossing is saved in the API_STR variable. The next variable, API_RSP, calls the get_weather_forecast function of the visualcrossing plugin with API_STR as a parameter. The function returns a dictionary of weather parameters, which are then read into dedicated variables, e.g. timestamp, temp, humidity etc. (Fig. 7). Finally, the variable values are assigned to the corresponding metrics of the WEATHER_API_DATA tag and recorded in the database.

ML Algorithms Deployment
Before they can be referenced in a data processing item, the trained and saved ML models and scalers need to be imported into the ARTData file system. The serialised GB and MLP model files for the day and night periods, as well as the data scalers, are imported into a dedicated folder (Fig. 8). The files then become available to data processing under the local path:
/adtdata/data/mpjt_000_adml_mdl/file_name.pkl

An ml_model plugin is added, containing the following functions:
get_solar_radiation(timestamp) # returns inferred solar radiation based on date and time argument
gb_predict(predictors, quality, model_type, logging_active) # returns GB model prediction for 1hr, 2hr and 3hr points using an array of input features, predictors
mlp_predict(predictors, quality, model_type, logging_active) # returns MLP model prediction for 1hr, 2hr and 3hr points using an array of input features, predictors
get_temp_list(data, end_ts) # returns list of temperatures evenly spaced in time with interval 1 hr
get_temp_from_sarimax(data, f_periods=[1]) # returns SARIMA temperature prediction based on past temperatures
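Because weather data arrives every 2 min, the SARIMA step needs temperatures resampled onto an hourly grid. A minimal sketch of what a function like get_temp_list might do, assuming timestamps in epoch seconds sorted in ascending order and linear interpolation between samples; `hourly_temps` is a hypothetical name, not the plugin's actual code:

```python
import numpy as np


def hourly_temps(timestamps_s, temps, end_ts_s, n_hours):
    """Temperatures at end_ts - (n_hours - 1) h, ..., end_ts, oldest first.

    Values are linearly interpolated between samples; np.interp clamps outside the data range.
    """
    grid = end_ts_s - 3600.0 * np.arange(n_hours - 1, -1, -1)
    return np.interp(grid, timestamps_s, temps)


ts = np.arange(0, 3 * 3600 + 1, 120)  # one sample every 2 min over 3 hours
temps = ts / 3600.0  # synthetic temperature rising 1 degC per hour
out = hourly_temps(ts, temps, 3 * 3600, 3)
print(out)
```

For a 7-day SARIMA window, n_hours would be 7 × 24 = 168.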
The gb_predict and mlp_predict functions restore the GB and MLP models into objects and run predictions on the provided predictors. The functions take an array of predictors, the data quality, the model type and a logging activation switch as arguments. When the provided data quality is good, the prediction runs using the ML model; otherwise, the default bad value -999 is returned. When logging is activated, predictor values and prediction results are logged in plain text format in the logs directory. If an exception occurs, its details are logged.
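The behaviour described above can be sketched as follows. This is a hypothetical illustration, not the plugin's code: `predict_horizon` is an invented name, and each horizon is assumed to have its own fitted model plus optional X and y scalers (already restored, e.g. with `joblib.load`), matching the per-target scaler files saved during training:

```python
import logging

import numpy as np

BAD_VALUE = -999.0  # returned when input quality is bad or prediction fails
logger = logging.getLogger("ml_model")


def predict_horizon(predictors, quality_good, model_sets, log=False):
    """Predicted temperature changes for each horizon point, or BAD_VALUE for all points.

    model_sets: one (model, scaler_X or None, scaler_y or None) tuple per horizon point.
    """
    n = len(model_sets)
    if not quality_good:
        return [BAD_VALUE] * n
    try:
        x_raw = np.asarray(predictors, dtype=float).reshape(1, -1)
        results = []
        for model, scaler_X, scaler_y in model_sets:
            x = scaler_X.transform(x_raw) if scaler_X is not None else x_raw
            y = np.asarray(model.predict(x), dtype=float).reshape(-1, 1)
            if scaler_y is not None:
                y = scaler_y.inverse_transform(y)  # back to degC (MLP models only)
            results.append(float(y[0, 0]))
        if log:
            logger.info("predictors=%s predictions=%s", x_raw.ravel().tolist(), results)
        return results
    except Exception:
        logger.exception("prediction failed; returning bad values")
        return [BAD_VALUE] * n
```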

The data processing item TEMP_PREDICT is configured to run the ML models using the functions of the above plugin. Prediction results are then assigned to the metrics HR1, HR2 and HR3 (Fig. 10). The database is updated with new values only if they have Good status. From there, the new data can be visualised or passed on via the OPC UA interface.

Similarly to the previous data processing items, a new TEMP_INTERP item is configured. Its objective is to interpolate between the last recorded temperature and the prediction results using the current timestamp. In this way, a continuously updated Process Variable can be obtained.
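A minimal sketch of such an interpolation, assuming the latest prediction set consists of the temperature recorded at prediction time (t = 0) plus the 1, 2 and 3 hr predictions, and that the item knows how long ago that set was produced. `interpolated_prediction` and its parameters are hypothetical:

```python
import numpy as np


def interpolated_prediction(current_temp, predicted_temps, lookahead_h, elapsed_h=0.0):
    """Temperature expected lookahead_h hours from now.

    current_temp: temperature recorded when the predictions were made (t = 0).
    predicted_temps: temperatures predicted for t = 1, 2 and 3 hr.
    elapsed_h: hours since those predictions were made.
    """
    horizon = np.array([0.0, 1.0, 2.0, 3.0])
    values = np.concatenate(([current_temp], predicted_temps))
    t = elapsed_h + lookahead_h
    if not 0.0 <= t <= horizon[-1]:
        raise ValueError("requested time is outside the 0 to 3 hr prediction horizon")
    return float(np.interp(t, horizon, values))


print(interpolated_prediction(20.0, [21.0, 23.0, 24.0], lookahead_h=1.5))
```

Changing `lookahead_h` moves the effective look-ahead without retraining any model.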
Conclusion
Fig. 11 shows the current temperature (green trend) and the predictions for the corresponding look-ahead periods, as well as the interpolation result (yellow trend). As expected, the shorter the horizon, the more accurate the predictions. Major mismatches between the predicted values and the reference temperature occur around 12:00 PM and are caused by glitches in the live data from Visual Crossing. The models clearly have predictive power, which is most evident in the morning hours when the temperature rise is steeper. Overall, the quality is good for relatively unstable coastal weather.
Accuracy could possibly be improved further by employing Recurrent Neural Networks (RNNs), especially their Long Short-Term Memory (LSTM) subclass, which is designed to handle time-series data. However, the overall ML design and deployment workflow would not change much.
Interpolated temperature prediction makes it possible to deliver a real-time Process Variable and fine-tune the prediction look-ahead without changing the underlying ML models, which is useful for integration with feedforward process control.
