Talks and presentations

Modelling Field Operation Capacity using Generalised Additive Model and Random Forest

July 11, 2018

Talk, The Conference for Users of R (useR!) 2018, Brisbane, Australia

In any customer-facing business, accurately predicting demand ahead of time is of paramount importance. Workforce capacity can then be scheduled flexibly in each local area, ensuring there is a sufficient workforce to meet volatile demand. In this case study, we focus on the gas boiler repair field operation in the UK. We have developed a prototype capacity forecasting procedure which uses a mixture of machine learning techniques. Firstly, it uses a Generalised Additive Model (GAM) to estimate the number of incoming work requests, taking into account the non-linear effects of multiple predictor variables. The next stage uses a large random forest to estimate the expected number of appointments for each work request from various ordinal and categorical inputs. At this stage, the training set is considerably large and does not fit fully in memory, so the random forest was trained in chunks in parallel to improve computational performance. Finally, probabilistic inputs such as the ECMWF ensemble weather forecast are fed in to give a view of all predicted scenarios.
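The two-stage idea can be sketched in a few lines. This is an illustrative Python sketch, not the production pipeline (which was built in R): the predictor names and coefficients are invented, the GAM smooth is stood in for by a truncated-power cubic spline basis fitted by least squares, and the random forest is stood in for by bagged regression stumps trained on bootstrap "chunks".

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1: GAM-style smooth of work requests vs a predictor ---
# Hypothetical predictor: daily mean temperature; response: request counts.
temp = rng.uniform(-5, 25, size=500)
requests = 200 - 6 * temp + 0.1 * temp**2 + rng.normal(0, 5, size=500)

def spline_basis(x, knots):
    """Truncated-power cubic spline basis: [1, x, x^2, x^3, (x-k)^3_+ ...]."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)

knots = np.quantile(temp, [0.25, 0.5, 0.75])
B = spline_basis(temp, knots)
coef, *_ = np.linalg.lstsq(B, requests, rcond=None)          # fit the smooth
pred_requests = spline_basis(np.array([0.0, 20.0]), knots) @ coef

# --- Stage 2: bagged stumps standing in for the random forest ---
# Each request has features (here one ordinal job-complexity band) and a
# number of appointments; many stumps trained on bootstrap samples are averaged.
x2 = rng.integers(0, 5, size=400).astype(float)
appts = 1 + 0.5 * x2 + rng.normal(0, 0.2, size=400)

def stump_predict(x_train, y_train, x_new, split):
    left, right = y_train[x_train <= split], y_train[x_train > split]
    return np.where(x_new <= split, left.mean(), right.mean())

preds = []
for _ in range(50):                                # trees trained in "chunks"
    idx = rng.integers(0, len(x2), len(x2))        # bootstrap sample
    split = rng.uniform(0.5, 3.5)
    preds.append(stump_predict(x2[idx], appts[idx], np.array([0.0, 4.0]), split))
expected_appts = np.mean(preds, axis=0)
print(pred_requests.round(1), expected_appts.round(2))
```

In the real procedure each bootstrap chunk can be trained on a separate worker, which is what makes the forest stage tractable when the training set does not fit in memory.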

Generalised Additive Model for Gas Boiler Breakdown Demand Prediction

May 15, 2018

Lightning Talk, European R User Meeting 2018 (eRum), Budapest, Hungary

At British Gas, we operate a service and repair business with more than 6,000 qualified engineers ready to serve customers who are urgently in need across the country. Predicting demand accurately ahead of time allows us to schedule the workforce optimally; additional engineers can be scheduled when demand is forecast to increase substantially. We have developed a prototype demand forecasting procedure which uses a mixture of machine learning techniques. The key component uses a Generalised Additive Model (GAM) to estimate the number of incoming work requests, taking into account the non-linear effects of multiple predictor variables. The models were trained at patch level in order to capture local behaviour. Planning operators can then use the model output to fine-tune workforce assignment at the local level to meet changing demand.
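The patch-level training idea is simply one independent model per local area. A minimal Python sketch, with invented patch names and a quadratic fit standing in for the GAM smooth:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical demand history for three engineer patches: demand responds
# to temperature with a different non-linear shape in each patch.
patches = {"patch_a": -1.5, "patch_b": -0.8, "patch_c": -2.2}
models = {}
for name, slope in patches.items():
    temp = rng.uniform(-5, 25, 300)
    demand = 50 + slope * temp + 0.05 * temp**2 + rng.normal(0, 2, 300)
    # One independent smooth per patch (quadratic stand-in for a GAM term)
    models[name] = np.polyfit(temp, demand, deg=2)

# Planners query each local model separately, e.g. for a cold-snap scenario.
cold_snap = {name: np.polyval(c, -3.0) for name, c in models.items()}
print({k: round(v, 1) for k, v in cold_snap.items()})
```

Training per patch rather than nationally lets each area's distinct demand curve drive its own staffing decision.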

Deep Neural Network Training and Applications

January 29, 2018

Talk, Bristol Data Scientists Meeting, Bristol, United Kingdom

Deep learning models can be used to extract representations from multidimensional time series data. We used a sensor dataset collected from an industrial-scale compressor unit to illustrate this approach. Real-valued sensor signals were treated as multidimensional time series and fed through a recurrent autoencoder model. The extracted representations can be projected into a low-dimensional space and reflect the temporal behaviour of the underlying time series. Specific signals can be isolated for detailed analysis using partial reconstruction of the original input.
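The shape of the recurrent autoencoder can be illustrated with a forward pass in plain numpy. This is a toy sketch with randomly initialised weights and invented dimensions; a real model would learn these weights end-to-end by minimising reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy multidimensional sensor window: T timesteps, D channels.
T, D, H = 20, 6, 4            # H = size of the learned representation

def rnn_step(x_t, h, W_x, W_h):
    return np.tanh(x_t @ W_x + h @ W_h)

# Randomly initialised weights (training omitted in this sketch).
W_enc_x, W_enc_h = rng.normal(0, 0.3, (D, H)), rng.normal(0, 0.3, (H, H))
W_dec_h, W_out = rng.normal(0, 0.3, (H, H)), rng.normal(0, 0.3, (H, D))

window = rng.normal(size=(T, D))          # one sensor window

# Encoder: run the RNN over the window; the final hidden state is the
# low-dimensional representation of the whole window.
h = np.zeros(H)
for t in range(T):
    h = rnn_step(window[t], h, W_enc_x, W_enc_h)
context = h

# Decoder: unroll from the context vector and reconstruct every timestep.
recon = np.zeros((T, D))
h = context
for t in range(T):
    h = np.tanh(h @ W_dec_h)
    recon[t] = h @ W_out

print(context.shape, recon.shape)
```

Partial reconstruction then amounts to reading off only the columns of `recon` for the channels of interest, which is what lets a specific signal be isolated for inspection.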

Text Mining for Preventative Maintenance

December 15, 2017

Talk, Köln R User Group, Cologne, Germany

Large-scale industrial processes are normally composed of thousands of individual components which are vulnerable to breakdown. Maintenance of these components is key to reducing unplanned outages. The repair log dataset contains unstructured, free-format text descriptions detailing the issues. We applied text mining algorithms to this dataset and turned it into an analysable format, using a combination of techniques including the tf-idf weighting scheme and an n-gram approach. Groups of vulnerable components can then be visualised as a graph network.
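The core tf-idf-over-n-grams step can be sketched with the standard library alone. The log entries below are invented stand-ins for real repair text:

```python
import math
from collections import Counter

# Toy repair-log entries (hypothetical); real logs are free-format text.
logs = [
    "gas valve leaking replaced gas valve",
    "pump seal worn replaced pump seal",
    "gas valve stuck cleaned valve",
]

def ngrams(text, n=2):
    toks = text.split()
    return [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]

docs = [ngrams(entry) for entry in logs]          # bigram tokens per log
N = len(docs)
df = Counter(g for d in docs for g in set(d))     # document frequency

def tfidf(doc):
    # term frequency times inverse document frequency
    tf = Counter(doc)
    return {g: (c / len(doc)) * math.log(N / df[g]) for g, c in tf.items()}

scores = tfidf(docs[0])
top = max(scores, key=scores.get)
print(top, round(scores[top], 3))
```

Bigrams such as "gas valve" that co-occur across many logs are exactly the component pairs that end up as edges in the graph-network view.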

Analysing High-Frequency Industrial Component Failure using Text Mining Techniques

September 13, 2017

Talk, EARL 2017 (Effective Application of the R Language), London, United Kingdom

Centrica plc is an energy services company whose Exploration and Production (E&P) division currently operates several gas production assets across the world. A large-scale production asset usually contains thousands of components which require regular inspection and maintenance. Understanding the pattern of component failure is key to managing large-scale assets successfully.

Parallelised Time Series Spike Detection using R on the Hadoop Platform

July 04, 2017

Poster, The Conference for Users of R (useR!) 2017, Brussels, Belgium

Smart meters record a continuous stream of electricity consumption for every supply point across the United Kingdom. Energy suppliers are interested in understanding customers' consumption patterns in order to serve them better. FlexiScore (F) is a new concept which British Gas has developed: a single numeric value ranging between 0 and 1 which quantifies the amount of flexible energy load at each electricity supply point. A high F value suggests the presence of erratic spikes, while a low F value indicates prolonged consistency and non-spiky behaviour. The algorithm has been productionised on the (on-premise) Hadoop platform using Microsoft R Server 8.0 as a fully scalable analytics framework. The large-scale distributed process contains an array of Markov chain Monte Carlo (MCMC) routines for missing data imputation. A layer of Fourier transformation is applied to create a seasonal time series model, after which a simple heuristic is applied to isolate erratic consumption spikes. The F score is then computed as output alongside other descriptive statistics.
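The seasonal-model-plus-heuristic core can be sketched for a single supply point. This Python sketch is illustrative only: the data are simulated, the MCMC imputation stage is omitted, and the spike rule (a residual beyond four robust sigmas) is an invented stand-in for the production heuristic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical half-hourly consumption: daily seasonality plus a few spikes.
n = 48 * 14                                     # two weeks of readings
t = np.arange(n)
load = 1.0 + 0.4 * np.sin(2 * np.pi * t / 48) + rng.normal(0, 0.05, n)
load[rng.choice(n, 10, replace=False)] += 2.0   # inject erratic spikes

# Seasonal model via a small Fourier design matrix (one daily harmonic pair).
X = np.column_stack([np.ones(n),
                     np.sin(2 * np.pi * t / 48),
                     np.cos(2 * np.pi * t / 48)])
coef, *_ = np.linalg.lstsq(X, load, rcond=None)
resid = load - X @ coef

# Heuristic: flag a reading as a spike if its residual exceeds 4 sigma,
# with sigma estimated robustly from the median absolute deviation.
sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))
spikes = resid > 4 * sigma

# FlexiScore-style output: fraction of readings flagged as erratic, in [0, 1].
f_score = spikes.mean()
print(round(float(f_score), 4), int(spikes.sum()))
```

In production this per-supply-point computation is embarrassingly parallel, which is what makes it a good fit for a distributed Hadoop job.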

Multi-seasonal Time Series Modelling using Recurrent Neural Nets

September 14, 2016

Talk, EARL 2016 (Effective Application of the R Language), London, United Kingdom

The ability to foresee what’s about to happen is crucial to the success of energy companies like Centrica. For instance, predicting the number of boiler breakdowns on any given day allows us to ensure that a sufficient number of gas engineers are staffed.