Talks and presentations

Using Linear Programming for Route Planning and Job Scheduling

September 05, 2024

Talk, EARL 2024 (Enterprise Application of the R Language), Brighton, United Kingdom

Efficiently managing travel and job scheduling for multiple workers across various locations presents a significant operational challenge. We use a Linear Programming (LP) model to optimise route planning and job allocation among multiple workers, aiming to minimise travel time and adhere to individual working-hours constraints. Utilising variables such as travel costs, job durations, and resource capacities, we construct a framework that accommodates each worker’s starting location and contractual obligations. This approach not only enhances operational efficiency but also contributes to the broader field of operations research by providing a scalable solution for multi-location, multi-personnel scheduling problems.
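
The objective and constraints described above can be illustrated with a toy example. The sketch below is not the LP model from the talk: it swaps the LP solver for exhaustive enumeration so that it runs with the standard library alone, and it makes the simplifying assumption that every job is a separate trip from the worker's starting location, so route sequencing is ignored. The function name, the data layout and all numbers are hypothetical.

```python
from itertools import product


def best_assignment(travel, duration, max_hours):
    """Exhaustively search job-to-worker assignments (toy stand-in for an LP solver).

    travel[w][j] : travel time in hours for worker w to reach job j
    duration[j]  : time in hours needed to complete job j
    max_hours[w] : contracted working hours for worker w

    Returns (total_travel, assignment), where assignment[j] is the worker
    chosen for job j, or None if no assignment respects the hours limits.
    """
    n_workers, n_jobs = len(travel), len(duration)
    best = None
    for assignment in product(range(n_workers), repeat=n_jobs):
        load = [0.0] * n_workers
        total_travel = 0.0
        for j, w in enumerate(assignment):
            load[w] += travel[w][j] + duration[j]
            total_travel += travel[w][j]
        # Discard assignments that break any worker's hours constraint.
        if any(load[w] > max_hours[w] for w in range(n_workers)):
            continue
        if best is None or total_travel < best[0]:
            best = (total_travel, assignment)
    return best


# Hypothetical data: two workers, three jobs.
travel = [[0.5, 1.0, 2.0],
          [1.5, 0.5, 0.5]]
duration = [2.0, 3.0, 2.0]
max_hours = [8.0, 4.0]

result = best_assignment(travel, duration, max_hours)
print(result)
```

Enumeration grows exponentially with the number of jobs, which is why a solver-based formulation is the practical choice at realistic problem sizes.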

Large-Scale Time Series Forecasting in Apache Spark

September 11, 2019

Talk, EARL 2019 (Enterprise Application of the R Language), London, United Kingdom

Accurately forecasting power demand is important for securing energy supply. Time series forecasting methods and other machine learning algorithms can be used to create energy forecasts. We have developed a forecasting framework based on a multi-model approach at customer account level. The framework uses a wide range of algorithms (e.g. GLM, ElasticNet, Seasonal ARIMA-X, Decision Tree, Random Forest and Gradient Boosting Machine). Models are pre-trained on an AWS EMR cluster using Spark/SparklyR. The process is run at massively parallel scale (>3000 vCores). Once model training has completed, the model objects are persisted on AWS S3 so that they can be reused at a later date. To trigger a forecast, the deployment pipeline loads the pre-trained model object from S3 and creates a forecast based on the prevailing inputs. The output is stored as partitioned Parquet files on S3, which can be exposed as a table view through AWS Athena.

Signal Analysis using Deep Learning

December 19, 2018

Talk, Hong Kong Machine Learning Meetup, Hong Kong

Deep learning models can be used to extract representations for multidimensional time series data. We have used a sensor dataset collected from a large-scale industrial facility to illustrate this problem. Real-valued sensor signals were treated as multidimensional time series and fed through a recurrent auto-encoder model. The extracted representations can be projected to a low-dimensional space and reflect the temporal behaviour of the underlying time series. In this way, the change of time series features over time can be summarised as a smooth trajectory path. The fixed-length vectors are further analysed using additional visualisation and unsupervised clustering techniques.

Modelling Field Operation Capacity using Generalised Additive Model and Random Forest

July 11, 2018

Talk, The Conference for Users of R (useR!) 2018, Brisbane, Australia

In any customer-facing business, accurately predicting demand ahead of time is of paramount importance. Workforce capacity can then be flexibly scheduled at local area level. In this way, we can ensure a sufficient workforce to meet volatile demand. In this case study, we focus on the gas boiler repair field operation in the UK. We have developed a prototype capacity forecasting procedure which uses a mixture of machine learning techniques to achieve its goal. Firstly, it uses a Generalised Additive Model approach to estimate the number of incoming work requests. It takes into account the non-linear effects of multiple predictor variables. The next stage uses a large random forest to estimate the expected number of appointments for each work request by feeding in various ordinal and categorical inputs. At this stage, the training set is considerably large and does not fit fully in memory. In light of this, the random forest model was trained in chunks and in parallel to enhance computational performance. Once all previous steps have been completed, probabilistic input such as the ECMWF Ensemble weather forecast is fed in to give a view of all predicted scenarios.

Generalised Additive Model for Gas Boiler Breakdown Demand Prediction

May 15, 2018

Lightning Talk, European R User Meeting 2018 (eRum), Budapest, Hungary

At British Gas, we operate a service and repair business with more than 6,000 qualified engineers ready to serve customers who are urgently in need across the country. Predicting demand accurately ahead of time allows us to schedule the workforce optimally. Additional workforce can be scheduled if demand is forecast to increase substantially. We have developed a prototype demand forecasting procedure which uses a mixture of machine learning techniques. The key component uses a Generalised Additive Model (GAM) to estimate the number of incoming work requests. It takes into account the non-linear effects of multiple predictor variables. The models were trained at patch level in order to capture local behaviour. Planning operators can then use the model output to fine-tune workforce assignment at the local level to meet changing demand.

Deep Neural Network Training and Applications

January 29, 2018

Talk, Bristol Data Scientists Meeting, Bristol, United Kingdom

Deep learning models can be used to extract representations for multidimensional time series data. We have used a sensor dataset collected from an industrial-scale compressor unit to illustrate this problem. Real-valued sensor signals were treated as multidimensional time series and fed through a recurrent auto-encoder model. The extracted representations can be projected to a low-dimensional space and reflect the temporal behaviour of the underlying time series. Specific signals can be isolated for detailed analysis using partial reconstruction of the original input.

Text Mining for Preventative Maintenance

December 15, 2017

Talk, Köln R User Group, Cologne, Germany

Large-scale industrial processes normally comprise thousands of individual components which are vulnerable to breakdown. Maintenance of these components is key to reducing unplanned outages. The repair log dataset contains unstructured, free-format text descriptions detailing the issues. We applied text mining algorithms to this dataset and turned it into an analysable format. A combination of techniques was used, including the tf-idf scheme and an n-gram approach. Groups of vulnerable components can be visualised as a graph network.
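
As a rough illustration of how tf-idf weighting and n-grams combine, the sketch below scores unigrams and bigrams in a handful of log entries using only the Python standard library. It is a minimal sketch, not the pipeline from the talk: the log entries are invented, the function names are hypothetical, and the weighting is the plain textbook form (term frequency multiplied by the log of inverse document frequency) without smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams in a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n_max=2):
    """Compute tf-idf weights for all 1..n_max-grams in each document."""
    term_lists = []
    for text in docs:
        tokens = text.lower().split()
        terms = []
        for n in range(1, n_max + 1):
            terms.extend(ngrams(tokens, n))
        term_lists.append(terms)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    n_docs = len(docs)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical repair log entries, invented for illustration only.
logs = [
    "pump seal leaking replaced seal",
    "valve stuck replaced valve actuator",
    "pump bearing noisy replaced bearing",
]

weights = tfidf(logs)
top_terms = [max(w, key=w.get) for w in weights]
print(top_terms)
```

A term that occurs in every entry, such as "replaced" here, receives a weight of zero, which is what lets component-specific terms stand out.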

Analysing High-Frequency Industrial Component Failure using Text Mining Techniques

September 13, 2017

Talk, EARL 2017 (Effective Application of the R Language), London, United Kingdom

Centrica plc is an energy service company and its Exploration and Production (E&P) division currently operates several gas production assets across the world. A large-scale production asset usually contains thousands of components which require regular inspection and maintenance. Understanding the pattern of component failure is key to managing large-scale assets successfully.

Parallelised Time Series Spike Detection using R on the Hadoop Platform

July 04, 2017

Poster, The Conference for Users of R (useR!) 2017, Brussels, Belgium

Smart meters record a continuous stream of electricity consumption for each and every supply point across the United Kingdom. Energy suppliers are interested in understanding customers’ consumption patterns in order to provide a better service. FlexiScore (F) is a new concept which British Gas has developed. It is a single numeric value ranging between 0 and 1 which quantifies the amount of flexible energy load for each electricity supply point. A high F value suggests the presence of erratic spikes, while a low F value indicates prolonged consistency and non-spiky behaviour. The algorithm has been productionised on the Hadoop platform (on premise) using Microsoft R Server 8.0 as a fully scalable analytics framework. The large-scale distributed process contains an array of Markov Chain Monte Carlo (MCMC) procedures for missing data imputation. A layer of Fourier transformation has been applied to create a seasonal time series model. Afterwards, simple heuristics are applied to isolate erratic consumption spikes. The F score is then computed as output alongside other descriptive statistics.
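
The Fourier and heuristic steps can be sketched on a single synthetic series. The example below is an illustration under stated assumptions, not the productionised algorithm: it keeps the strongest Fourier component as a seasonal baseline and flags readings whose residual exceeds a robust threshold based on the median absolute deviation. The threshold rule, the function names and the data are all hypothetical, and the F score itself is not computed because its definition is not given here.

```python
import cmath
import math
import statistics


def dft(x):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def seasonal_baseline(x, n_harmonics=1):
    """Rebuild x from its mean plus the n_harmonics strongest frequencies."""
    n = len(x)
    coeffs = dft(x)
    # Rank the positive frequencies below Nyquist by magnitude.
    candidates = range(1, (n - 1) // 2 + 1)
    strongest = sorted(candidates, key=lambda k: abs(coeffs[k]), reverse=True)
    keep = strongest[:n_harmonics]
    baseline = []
    for t in range(n):
        value = coeffs[0].real / n
        for k in keep:
            # Each kept frequency stands for a conjugate pair, hence the 2.
            value += 2 * (coeffs[k] * cmath.exp(2j * math.pi * k * t / n)).real / n
        baseline.append(value)
    return baseline


def find_spikes(x, n_harmonics=1, threshold=5.0):
    """Return indices whose residual is far above the seasonal baseline."""
    baseline = seasonal_baseline(x, n_harmonics)
    residuals = [xi - bi for xi, bi in zip(x, baseline)]
    centre = statistics.median(residuals)
    mad = statistics.median(abs(r - centre) for r in residuals)
    scale = 1.4826 * mad or 1e-12  # guard against a zero spread
    return [t for t, r in enumerate(residuals) if (r - centre) / scale > threshold]


# Synthetic half-hourly readings for one day: a smooth cycle plus two spikes.
readings = [1.0 + 0.5 * math.sin(2 * math.pi * t / 24) for t in range(48)]
readings[10] += 3.0
readings[30] += 3.0

spikes = find_spikes(readings)
print(spikes)
```

Using the median absolute deviation rather than the standard deviation keeps the threshold from being inflated by the spikes it is meant to detect.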

Multi-seasonal Time Series Modelling using Recurrent Neural Nets

September 14, 2016

Talk, EARL 2016 (Effective Application of the R Language), London, United Kingdom

The ability to foresee what’s about to happen is crucial to the success of energy companies like Centrica. For instance, predicting the number of boiler breakdowns on any given day allows us to ensure that a sufficient number of gas engineers are on duty.