Demand forecasting in supermarkets (part 1)

 

The Problem

Striking a good balance between supply and demand has always caused massive headaches for retailers. How much inventory is enough to have on hand to accommodate future sales? Accurate demand forecasting removes the uncertainty and puts retailers one step ahead of the curve. Some organizations are antifragile: they thrive on uncertainty. Information security companies, for instance, become more resilient as they face hackers. Less agile organizations, retailers and in particular supermarkets among them, are fragile to the uncertainty of future demand. This prevents them from being proactive, worsens profits and hurts the cohesion of their business. We highlight here some of the negative implications of the lack of accurate forecasts.


By far the most damaging is stockouts, that is, empty shelves. On average, at any point in time, around 8% of supermarket shelves worldwide are empty. Missed sales lead to lower profit margins, while brands also take a hit in terms of customer preference. At the other end of the spectrum there is overstocking. Too much inventory means blocked stock, and for perishable items this often leads to waste. In business language, it means higher operational costs and additional pressure on cash flows. The end result is similar to stockouts: lower margins.

Food waste, apart from eating away a yearly sum of roughly 200 million USD from supermarkets, is a major source of environmental damage. So much so that in 2015 the United Nations General Assembly made addressing irresponsible food consumption one of its goals. Some countries (France, Italy, the UK) were quick to pass laws against food waste by supermarkets, putting additional pressure on supermarkets to stock adequately (compare that to the 19th century, when there was almost no food waste because everything was produced and consumed locally).

Stockouts, overstocking and food waste

Traditionally, supermarkets rely solely on past history, human input and gut feeling (heuristics) for future demand projections. At most, traditional statistical models are employed in this process. However, the forecast accuracy remains disappointing, mainly because so many things change from year to year. Over-reliance on past sales leads to repeating past mistakes. Subtle effects, such as the impact of promotions on demand, the effect of new products on the sales of other products, or the importance of external data (geography, competition, economy, social activity etc.), fail to be captured by traditional approaches. To make things even worse, demand forecasting needs to be done at store level, which further increases the complexity of the problem.

Machine learning vs Traditional methods

This is where Neurolabs’ solution comes in, by leveraging the power of specialized machine learning models and deep learning to integrate all the above factors (and many more) and increase the accuracy of demand forecasting.

 

Demand forecasting in supermarkets (part 2)

 

The techie bit

This is where machine learning and, in particular, a branch of machine learning called deep learning comes in. These specialised artificial intelligence models are designed to deal with vast amounts of data. As opposed to static traditional methods, these algorithms respond dynamically to changes in the data and improve as more data becomes available. Properly built, they can find meaningful insights in an ocean of data.

External factors influencing supermarket demand

At Neurolabs we focus extensively on developing cutting-edge deep learning algorithms to predict future demand for retailers. Our edge consists of augmenting customers' internal data with vast amounts of relevant external data. Specifically, we look at more than 35 external factors that can influence future demand. Amongst these there are some obvious ones (e.g. seasonality, weather) but also some less obvious ones that are harder to adjust for with traditional methods (e.g. competitor proximity, consumer trends, competitor prices/promotions, social activity etc.). We showed a significant additional improvement in prediction accuracy when relevant external data was taken into account.
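To make this concrete, here is a minimal, hypothetical sketch (Python with pandas) of what joining external signals onto internal sales history can look like. The column names and the two signals below are invented for illustration and only hint at the full feature set:

```python
import pandas as pd

# Hypothetical weekly sales history (internal data).
sales = pd.DataFrame({
    "region":     ["north", "north", "south"],
    "week":       pd.to_datetime(["2019-01-07", "2019-01-14", "2019-01-07"]),
    "units_sold": [120, 95, 210],
})

# Two invented external signals: regional weather and public holidays.
weather = pd.DataFrame({
    "region":   ["north", "north", "south"],
    "week":     pd.to_datetime(["2019-01-07", "2019-01-14", "2019-01-07"]),
    "avg_temp": [4.5, 6.1, 9.0],
})
holidays = pd.DataFrame({
    "week":       pd.to_datetime(["2019-01-07"]),
    "is_holiday": [1],
})

# Left-join the external signals onto the sales history to build model features.
features = (
    sales.merge(weather, on=["region", "week"], how="left")
         .merge(holidays, on="week", how="left")
)
features["is_holiday"] = features["is_holiday"].fillna(0).astype(int)
print(features)
```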

Our technical expertise, combined with our business acumen, allows us to meet and surpass our clients’ expectations in terms of delivered results.

The results

We tested our algorithms in partnership with a top supermarket chain in Southern Europe. Running across a large and diverse set of stores (81) and products (14,000 unique SKUs), our solution delivered substantial improvements in forecasting accuracy compared to the supermarket chain's existing baseline method. More importantly, we brought the average stockout rate down by 3% and reduced the average amount of overstocking by up to 40%. Had the supermarket relied fully on our predictions for a period of 3 months and always ordered as much inventory as our solution predicted, profit margins would have increased by up to 19% for most products. Additionally, the food waste bill would have been 7 figures lower over the same period.

Demand prediction: performance comparison for a product


In practice, however, there are various barriers to fully unlocking this margin potential, such as a disconnected or constrained supply chain. Regardless, we showed the asymmetric business impact of our solution: a small investment in the existing inventory management system can pay off handsomely. Technically, we demonstrated the importance of incorporating external data in demand forecasting, as can be seen on our demo page. Moreover, such results can be achieved while always aligning with supermarkets' top priorities: security and control of data. The future will always be uncertain, but for supermarkets partnering with Neurolabs it can be less uncertain and less costly, unleashing enormous benefits.

 

Brain Tumour Data Analysis

 

One of the biggest challenges in the age of Machine Learning is to effectively impact the healthcare domain, bringing benefits to patient treatments and providing a toolset that doctors can use to aid their understanding in situations where AI can provide an advantage.

Introduction

Statistics and medicine have a joint history spanning more than a century. Statistical models started being successfully applied to medical datasets as far back as the late 1960s. Machine Learning methods have been applied to clinical and genetic datasets from the 1980s onwards, with a pick-up in pace over the past couple of years mirroring that in other fields. This has been due to progress in modelling and computational capabilities in the Machine Learning field, to the explosion in medical data, both clinical and unstructured, and to the falling cost of sequencing genetic data [1, 2, 3].

One of the previous research projects done by our co-founders involved partnering with the neuro-oncology department of the Pitie Salpetriere University Hospital/Pierre et Marie Curie University in Paris. Over several decades, this department has collected one of the largest, and certainly most unique, clinical datasets on brain tumors.

In 2018, brain tumours remain one of the worst types of cancer. They are difficult to treat surgically, as well as to reach with radio- or chemotherapy, and survival expectancy is bleak. For glioblastoma (GBM), survival expectancy from diagnosis is 3 months without treatment and 14 months with treatment (Gallego 2015; WHO 2016). The vast majority of studies applying Machine Learning to medical datasets focus on the most common cancers, particularly breast cancer. The aim of our study was to give insight into the patterns found in this dataset for glioblastoma, our goals being three-fold:

  1. Apply supervised learning to build survival prediction models and compare them to survival analysis methods.

  2. Perform unsupervised learning for patient clustering & data exploration.

  3. Develop data visualizations and tools that give insight into data patterns and can be of use to clinicians.

Background

There has been significant progress in recent years on applications of Machine Learning methods to the diagnosis of brain cancer, mostly focused on medical imagery (Dr. Bradley Erikson; NVIDIA's 2017 Global Impact Award; V. Panca and Z. Rustam). The recent work of J. Lao et al., published in Nature, is a perfect example of applying Deep Learning to MRI data (~75 observations) for prediction of survival. However, work on applying Machine Learning to brain cancer survival prediction and clustering from clinical data is almost non-existent. We came across one such study (BS MA et al.), which mixed molecular and clinical data for GBM cases from the Cancer Genome Atlas database in order to predict survival. The study achieved Area Under the Curve (AUC) scores of 0.82 when mixing clinical and somatic copy-number alteration data, and of 0.98 when mixing microRNA data with clinical data. There have been several meta-studies of applications of Machine Learning to oncological datasets, notably Cruz & Wishart (2007) and Kourou et al. (2015); PH Abreu et al. (2016) focused on breast cancer studies specifically. Cruz & Wishart found more than 1,500 papers on the topic of Machine Learning and cancer.

Dataset

Our dataset consisted of a series of subjects diagnosed with brain tumours over several decades. It comprised 7,630 observations, including, in some cases, several observations taken over time for the same subject. The three main components were categorical variables such as gene mutations, tumour type, grade, location and surgery type; binary variables such as gender or whether a patient underwent a certain treatment (i.e. radiotherapy or chemotherapy); and continuous variables such as age at surgery and life expectancy.

As with any real-world dataset, ours had a lot of missing data, with several genetic indicators missing for between 50% and 83% of observations. Furthermore, only about 30% of observations had a known death date. Preprocessing the dataset and dealing with the missing data was one of the most challenging aspects of our study. In the end, we chose two methods of handling missing data, both by imputation: MICE and Amelia. Because this is such a pervasive problem, we believed the best approach for sharing our knowledge was to write a hands-on tutorial on how to deal with missing data, recently presented by our collaborator Alex at ODSC Europe.
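MICE and Amelia are R packages; as a rough Python analogue, the sketch below uses scikit-learn's IterativeImputer, which implements a similar chained-equations style of imputation. The columns are invented placeholders, not fields from the actual dataset:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (exposes IterativeImputer)
from sklearn.impute import IterativeImputer

# Hypothetical numeric slice of a clinical table with missing values.
df = pd.DataFrame({
    "age_at_surgery": [54.0, 61.0, np.nan, 47.0],
    "marker_a":       [1.0, np.nan, 0.0, np.nan],
    "marker_b":       [np.nan, 3.2, 2.9, 3.0],
})

# Each column with missing values is modelled from the others, iteratively.
imputer = IterativeImputer(max_iter=10, random_state=0)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(imputed)
```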

Methods

We present here a quick summary of some of the methods we used in our analysis. At the time of writing, we are in the process of publishing our work, which will contain the detailed steps for the entire workflow.

Clustering

dendogram.png
cobweb.png

Another useful algorithm for clustering is COBWEB3 (above), which performs incremental concept formation. For partition-based clustering we used Partitioning Around Medoids (PAM) based on Gower distances, which better capture similarities between mixed-type variables.
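For illustration, here is a minimal sketch of the PAM step on mixed-type data, using a simplified hand-rolled Gower distance and the KMedoids implementation from scikit-learn-extra; the columns are invented placeholders and the real pipeline may differ:

```python
import numpy as np
import pandas as pd
from sklearn_extra.cluster import KMedoids

# Hypothetical mixed-type patient table.
df = pd.DataFrame({
    "age_at_surgery": [54, 61, 38, 47, 72],
    "tumour_grade":   ["IV", "IV", "II", "III", "IV"],
    "chemotherapy":   [1, 0, 1, 1, 0],
})

def gower_distance(frame: pd.DataFrame) -> np.ndarray:
    """Simplified Gower distance: range-normalised difference for numeric columns,
    0/1 mismatch for categorical columns, averaged over all columns."""
    n = len(frame)
    dist = np.zeros((n, n))
    for col in frame.columns:
        values = frame[col]
        if pd.api.types.is_numeric_dtype(values):
            rng = values.max() - values.min() or 1.0
            d = np.abs(values.values[:, None] - values.values[None, :]) / rng
        else:
            d = (values.values[:, None] != values.values[None, :]).astype(float)
        dist += d
    return dist / frame.shape[1]

D = gower_distance(df)
labels = KMedoids(n_clusters=2, metric="precomputed", random_state=0).fit_predict(D)
print(labels)
```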

Supervised Classification

The purpose of this analysis was to determine whether we can classify patients based on their date of death, how well we can predict on a held-out test set of patients, and how useful that is for clinicians. We used out-of-the-box classifiers such as Decision Trees, Random Forests, and several variants of Neural Networks to capture the nonlinearity in the dataset. We are also building a tool that allows clinicians to inspect where the algorithms got “fooled”. Below are confusion matrices from Random Forests and Neural Networks on a test set of 418 patients.

confusion.png
confusion_nn.png

Neural networks provide marginally better results than Random Forests. The downside of this approach is that it is a black box method and we cannot make direct statements about the importance of each feature in our dataset.
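For readers who want to reproduce this kind of comparison, here is a minimal sketch using scikit-learn on synthetic stand-in data; it is not the exact pipeline used in the study:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed (imputed, encoded) clinical features
# and survival classes; the real study uses the dataset described above.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)
print(confusion_matrix(y_test, clf.predict(X_test)))

# Unlike the neural network, the forest exposes per-feature importances,
# which partly offsets its slightly lower accuracy.
print(clf.feature_importances_.round(3))
```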

Conclusion

We believe that stand-alone clinical data does not have enough predictive power to allow for high accuracy in a classification or regression setting. Regardless of that, a study developing supervised and unsupervised Machine Learning methods on clinical data can give clinicians another idea of how patients are related to one another. Our goal is to provide clinicians with a simple software tool where they can interpret the results of the various methods presented.

References

1. MIT Review
2. Watson Health
3. Deepmind Health

 

Time Series Analysis

 

Sales Forecasting

In the retail industry, demand forecasting is a hot topic. Supermarkets in particular face both an economic and an ethical problem, as each forecasting mistake translates into lost revenue and, most importantly, food waste. Globally, $120B worth of food waste could be avoided by optimising inventory levels alone.

We have been working with data from one of the biggest supermarket chains in Portugal and South America in order to improve the statistical algorithms used for stock prediction.

Introduction

Demand forecasting gives businesses the ability to use historical data on markets to help plan for future trends. Without accurate demand forecasting, it is close to impossible to have the right amount of stock on hand at any given time.

In a sense, demand forecasting is attempting to replicate human knowledge of consumers once found in a local store. Long ago, retailers could rely on the instinct and intuition of shopkeepers. They knew their customers by name, but, more importantly, they also knew buying preferences, seasonal trends, product affinities and likely future purchases.

Too much merchandise in the warehouse means more capital tied up in inventory, and not enough could lead to out-of-stocks — and push customers to seek solutions from your competitors.

Dataset

The dataset from a top Portuguese supermarket contains sales data for 12 months, ending in January 2018. There are 1,175 different items, 98 store locations in 15 regions and 4 possible assortment types. The stockout rate is 12.1%, meaning that in any given week there is a 12.1% chance that a store will run out of a given product.

In the graph below, we visualise the sales values for a single item, in two stores, marking the weeks when a stockout occurred.

sales_stockout.png

Next, we want to see how the different data points correlate with each other. In the correlation matrix below, we can observe the impact of the Assortment on total sales: the better an item is positioned on the shelf, the more it sells.

table_01_squarespace_V03.png
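As a hint of how such a matrix can be produced, here is a minimal sketch on an invented slice of data; an explicit ordinal encoding of the assortment type is assumed so that it can enter a numeric correlation matrix:

```python
import pandas as pd

# Hypothetical slice of the weekly sales table.
df = pd.DataFrame({
    "total_sales": [120, 95, 210, 180, 60],
    "price":       [1.99, 1.99, 1.49, 1.49, 2.49],
    "assortment":  ["basic", "basic", "premium", "premium", "minimal"],
})

# Invented ordering of assortment types, from least to most prominent shelf position.
assortment_rank = {"minimal": 0, "basic": 1, "premium": 2}
df["assortment_code"] = df["assortment"].map(assortment_rank)

print(df[["total_sales", "price", "assortment_code"]].corr().round(2))
```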


Methods

We have used a few off-the-shelf methods as benchmarks against our own model; a minimal sketch of fitting two of them appears after the list below.

1. ARIMA models are, in theory, the most general class of models for forecasting a time series which can be made to be “stationary”.

The ARIMA forecasting equation for a stationary time series is a linear (i.e., regression-type) equation in which the predictors consist of lags of the dependent variable and/or lags of the forecast errors. That is:

ŷ(t) = μ + φ_1·y(t−1) + … + φ_p·y(t−p) − θ_1·e(t−1) − … − θ_q·e(t−q)

where y(t−i) are lagged sales values, e(t−i) are lagged forecast errors (the differences between predicted and actual sales), and the φ and θ coefficients weight the autoregressive and moving-average terms respectively.

2. Facebook Prophet is an algorithm for time series forecasting based on an additive model. Trends are fit with yearly, weekly and daily seasonality, and it also accounts for holidays. The model is designed for time series with seasonal effects, making it a good candidate for our problem. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.

3. Amazon DeepAR is a supervised learning algorithm for forecasting scalar (that is, one-dimensional) time series using recurrent neural networks (RNN). Classical forecasting methods, such as Autoregressive Integrated Moving Average (ARIMA) or Exponential Smoothing (ETS), fit a single model to each individual time series, and then use that model to extrapolate the time series into the future. In many applications, however, you encounter many similar time series across a set of cross-sectional units. Examples of such time series groupings are demand for different products, server loads, and requests for web pages. In this case, it can be beneficial to train a single model jointly over all of these time series. DeepAR takes this approach, outperforming the standard ARIMA and ETS methods when your dataset contains hundreds of related time series. The trained model can also be used for generating forecasts for new time series that are similar to the ones it has been trained on.
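For reference, here is a minimal sketch of fitting the first two benchmarks on a synthetic weekly series, assuming the statsmodels and prophet packages are available (DeepAR runs as a managed SageMaker algorithm and is not sketched here):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from prophet import Prophet

# Synthetic weekly sales series standing in for one item/store pair.
weeks = pd.date_range("2017-02-05", periods=52, freq="W")
sales = 100 + 10 * np.sin(np.arange(52) * 2 * np.pi / 52) \
        + np.random.default_rng(0).normal(0, 5, 52)

# ARIMA benchmark: fit on all but the last week, forecast one week ahead.
arima_fit = ARIMA(sales[:-1], order=(2, 1, 1)).fit()
arima_forecast = arima_fit.forecast(steps=1)

# Prophet benchmark: expects a frame with columns "ds" (date) and "y" (value).
train = pd.DataFrame({"ds": weeks[:-1], "y": sales[:-1]})
m = Prophet(weekly_seasonality=True)
m.fit(train)
future = m.make_future_dataframe(periods=1, freq="W")
prophet_forecast = m.predict(future)[["ds", "yhat"]].tail(1)

print(arima_forecast, prophet_forecast, sep="\n")
```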


Stock predictions

Our model is built in Python using TensorFlow. We use a neural network with Long Short-Term Memory (LSTM) cells and fully connected layers to produce our predictions.
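A minimal sketch of this kind of architecture in Keras is shown below; the layer sizes, window length and feature count are illustrative, not the production configuration:

```python
import tensorflow as tf

# Each sample is a short window of weekly feature vectors; the target is
# next week's sales for one item/store pair.
WINDOW, N_FEATURES = 4, 16

model = tf.keras.Sequential([
    tf.keras.Input(shape=(WINDOW, N_FEATURES)),
    tf.keras.layers.LSTM(64),                    # summarises the recent weeks
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),                    # predicted sales for week T+1
])
model.compile(optimizer="adam", loss="mae")
model.summary()
# model.fit(X_train, y_train, epochs=..., validation_data=(X_val, y_val))
```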

We considered two ways to train our model:

  1. Split the whole sales dataset by period: We use everything before a set date (2017-09-10) as training data and everything afterwards as test data. Given the limited amount of data, this approach does not capture special events such as winter holidays.

  2. Take a percentage of all items for the training set (80%) leaving the rest (20%) for the test set.

    • With this approach, we are unsure if we have a balanced distribution of items in the training set and the test set.

In both scenarios, we take data from week T alone and predict sales for week T+1. We assume that we do not know whether a promotion is coming in the following week (a scenario which is unlikely in the real world). A rough sketch of the two split strategies is shown below.
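The sketch assumes a DataFrame with at least `item` and `week` columns; it is illustrative rather than the exact code we run:

```python
import pandas as pd

def split_by_date(df: pd.DataFrame, cutoff: str = "2017-09-10"):
    """Everything up to the cutoff date is training data, the rest is test data."""
    cutoff_ts = pd.Timestamp(cutoff)
    return df[df["week"] <= cutoff_ts], df[df["week"] > cutoff_ts]

def split_by_item(df: pd.DataFrame, train_frac: float = 0.8, seed: int = 0):
    """80% of the items (all their history) for training, 20% held out for testing."""
    items = df["item"].drop_duplicates().sample(frac=1.0, random_state=seed)
    train_items = set(items.iloc[: int(len(items) * train_frac)])
    return df[df["item"].isin(train_items)], df[~df["item"].isin(train_items)]
```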

baseline_vs_neurolabs.png

We measured the model error using the symmetric mean absolute percentage error (SMAPE) between the actual sales value and either the supermarket's baseline or our prediction. In both cases, our model fits the total sold items better than the supermarket's existing baseline, as shown in the table below.

                   Random split    Split by date
Baseline SMAPE         13.4%           14.8%
Neurolabs SMAPE        11.1%           12.6%
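For reference, here is a minimal implementation of one common SMAPE variant; the exact formula used in the evaluation may differ slightly:

```python
import numpy as np

def smape(actual, forecast) -> float:
    """Symmetric mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return float(np.mean(np.abs(forecast - actual) / denom) * 100)

print(round(smape([100, 80, 120], [90, 85, 130]), 1))   # ~8.2
```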
 

Early diabetes detection

 

Deep learning has shown great success in various machine learning tasks. In image classification, the ability of deep convolutional neural networks (CNNs) to deal with complex image data has proved to be unrivalled. Deep CNNs, however, require large amounts of labelled training data to reach their full potential. In specialised domains such as healthcare, labelled data can be difficult and expensive to obtain. One way to alleviate this problem is active learning, a technique that aims to reduce the amount of labelled data needed for a specific task while still delivering satisfactory performance. We designed a new method that exhibits significantly improved performance over the state-of-the-art Bayesian method in active learning for various image classification tasks.
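For intuition only, here is a minimal sketch of a generic uncertainty-based acquisition step (entropy sampling); it is not our method, nor the Bayesian baseline:

```python
import numpy as np

def select_for_labelling(probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k unlabelled images whose predicted class distribution has the
    highest entropy, i.e. where the current model is least certain.

    probs: (n_unlabelled, n_classes) softmax outputs from the current model.
    Returns indices into the unlabelled pool to send to annotators.
    """
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-k:]

# Toy usage: 5 pool images, 3 classes; the two most uncertain are selected.
pool_probs = np.array([
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],
    [0.90, 0.05, 0.05],
])
print(select_for_labelling(pool_probs, k=2))   # -> indices 1 and 3
```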

The image classification task is particularly relevant to the medical industry. IBM researchers estimate that at least 90 percent of all medical data comes in the form of medical images [1], making imaging the largest data source in the healthcare industry. According to a recent study by McKinsey [2], the potential value of deep learning in the medical domain is enormous, mainly due to machine learning's potential to enhance diagnostic accuracy. AI solutions are already prevalent in the medical imaging industry, with applications ranging from detection of anatomical and cellular structures to tissue segmentation, radiology, and disease diagnosis and prognosis. However, most of these techniques rely on rich labelled datasets incorporating image and video inputs, including MRIs. Our work demonstrated the practical application of active learning, with a focus on the medical domain. Specifically, we developed a new active learning method and used it to successfully detect signs of diabetic retinopathy, an eye disease associated with long-standing diabetes, in retinal images.

References
[1] HealthcareInformatics (2016). IBM unveils Watson-powered imaging solutions at RSNA.
[2] Chui, M. (2017). Artificial intelligence: the next digital frontier? McKinsey Global Institute.