2. Data Pre-processing – This step mainly involved cleansing the data to get it ready for machine learning.
3. Feature Engineering – This is one of the most important steps in the process. It uses domain knowledge to engineer features from the given data and to measure their relative importance with respect to sales, both to understand the data better and to make the machine learning model more accurate. For this solution, we derived features such as ‘Month-over-Month Sales Difference’, ‘Quarterly Average Sales’ and ‘Average Seasonal Sales’.
A sample of how feature importance is derived and used to make decisions is shown below.
Figure 2: Feature Engineering for Sales Forecasting
From Figure 2 above, it can be inferred that sales-based features are the most important, followed by location-based features and, lastly, the kind of product. Ranking features this way helps ensure that the right data goes into building the model.
The feature engineering itself combines several methods, including correlation analysis and random forests.
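To make the correlation part of this step concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the pipeline used in this solution: the function names, the exact feature definitions (previous month-over-month difference, trailing three-month average) and the choice of absolute Pearson correlation as the importance score are all hypothetical. Each feature for a month is computed only from earlier months, so nothing from the month being predicted leaks into its own features.

```python
from math import sqrt


def build_features(monthly_sales):
    """Build one row per month (from the 4th month on) using only past sales.

    Feature definitions are assumptions made for this sketch.
    """
    rows = []
    for t in range(3, len(monthly_sales)):
        rows.append({
            # Change between the two most recent completed months.
            "mom_diff": monthly_sales[t - 1] - monthly_sales[t - 2],
            # Average of the three months preceding month t.
            "trailing_quarter_avg": sum(monthly_sales[t - 3:t]) / 3,
            # Value to be predicted: sales in month t.
            "target": monthly_sales[t],
        })
    return rows


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(rows, feature_names, target="target"):
    """Rank features by absolute correlation with the target, highest first."""
    target_col = [row[target] for row in rows]
    scores = {
        name: abs(pearson([row[name] for row in rows], target_col))
        for name in feature_names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_features(build_features(sales), ["mom_diff", "trailing_quarter_avg"])` on a list of monthly sales figures returns the features ordered by score. Correlation only captures linear relationships, which is one reason to combine it with a tree-based method such as a random forest.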
4. Machine Learning – The machine learning process was iterative and involved many rounds of tuning and optimization to arrive at the best possible model, one that takes into account all the data and business rules. Various regression and time series models were fitted to the data, but the Advanced Multi-Seasonal-Multi Variate ARIMA (Autoregressive Integrated Moving Average) model was the best suited to this problem. It is an ensemble of the traditional ARIMA model and custom machine learning elements.
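For readers unfamiliar with the ARIMA family, the sketch below shows two of its building blocks on a single series: seasonal differencing (the "integrated" part) and an autoregressive term fitted by least squares. It is a deliberately reduced illustration, roughly a seasonal-difference-plus-AR(1) model with one season, no moving-average terms and no external variables. It is not the multi-seasonal, multivariate ensemble described above, and all function names are hypothetical.

```python
def seasonal_difference(series, period):
    """Subtract the value one season earlier: d[t] = y[t] - y[t - period]."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


def fit_ar1(values):
    """Least-squares AR(1) coefficient without intercept: d[t] ~ phi * d[t-1]."""
    numerator = sum(cur * prev for prev, cur in zip(values, values[1:]))
    denominator = sum(prev * prev for prev in values[:-1])
    return numerator / denominator if denominator else 0.0


def forecast(series, period, steps):
    """Forecast `steps` values ahead by modelling the seasonal differences."""
    diffs = seasonal_difference(series, period)
    if len(diffs) < 2:
        raise ValueError("need at least period + 2 observations")
    phi = fit_ar1(diffs)

    history = list(series)
    last_diff = diffs[-1]
    predictions = []
    for _ in range(steps):
        # Predict the next seasonal difference, then undo the differencing
        # by adding back the value observed (or forecast) one season earlier.
        last_diff = phi * last_diff
        next_value = history[-period] + last_diff
        history.append(next_value)
        predictions.append(next_value)
    return predictions
```

For example, `forecast(quarterly_sales, period=4, steps=4)` projects one further year of quarterly figures. A production model would typically come from a dedicated time series library and would add moving-average terms, multiple seasonal periods and explanatory variables.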
Why Multi-Seasonal-Multi Variate ARIMA?
Once the candidate model was chosen and optimized, it was deployed for use.
The output of the Machine Learning Forecasting Model is a forecast of product sales volumes at different levels of granularity: Region, Factory and Salesperson. On average, the forecast is within 5% of the actuals, which is a very low margin of error.
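One common way to express "within an average of 5% of the actuals" is the mean absolute percentage error (MAPE), computed separately for each level of granularity. The sketch below assumes that interpretation. The record layout and the field names (`region`, `actual`, `forecast`) are hypothetical and are not taken from the solution itself.

```python
from collections import defaultdict


def mape_by_level(records, level):
    """Mean absolute percentage error per group, in percent.

    `records` is a list of dicts holding an 'actual' value, a 'forecast'
    value and a grouping key named by `level` (for example 'region').
    """
    errors = defaultdict(list)
    for record in records:
        actual = record["actual"]
        if actual == 0:
            # Percentage error is undefined when the actual is zero; skip it.
            continue
        errors[record[level]].append(abs(record["forecast"] - actual) / abs(actual))
    return {group: 100 * sum(errs) / len(errs) for group, errs in errors.items()}
```

Running the same function with `level="factory"` or `level="salesperson"` gives the error at those granularities, provided the records carry those keys. MAPE is sensitive to very small actuals, so low-volume products may need a different metric.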
The sales manager and the salesperson can now have a more reliable view of the forecast from different dimensions and make the appropriate business decisions.
The solution built was very successful in addressing the problem and benefiting the business. Such accurate and scalable models can be applied to similar sales scenarios which involve complexity and large amounts of data.