I am a beginner in Python programming and machine learning.
I have a dataset with sales per product on monthly level.
The dataset has data from 2015 up till 2019.
With the help of Python I would like to make a prediction model that predicts the sales of the next month.
I followed this tutorial:
Sales prediction
This gave me a prediction of the last 6 months and lined them up with the actual sales. I managed to get a pretty accurate prediction, but my problem is that I need the predictions per product, and if possible I would also like to include the influence of weather. For example, if the weather data says rainy, the model should take this into account.
Does anyone know a way of doing this?
Every tip on which model to use or article to read is much appreciated!!
The most basic way of doing that would be to run an ARIMA model with external regressors (the weather measured in terms of temperature, humidity or any other feature that is expected to influence the monthly sales).
Importantly, before fitting the model, the sales data should be transformed into log monthly changes with something like np.log(df.column).diff().
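A minimal sketch of that approach with statsmodels, assuming a monthly DataFrame with a 'sales' column and a 'temperature' weather regressor (the file name, column names and ARIMA order are placeholders for illustration):
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Assumed layout: one row per month with a 'sales' column and a 'temperature' weather regressor.
df = pd.read_csv('monthly_sales.csv', index_col='month', parse_dates=True)

y = np.log(df['sales']).diff().dropna()    # log monthly changes, as suggested above
exog = df[['temperature']].loc[y.index]    # align the weather regressor with the transformed target

model = ARIMA(y, exog=exog, order=(1, 0, 1)).fit()

# To forecast the next month, a value for the regressor in that month must be supplied.
forecast = model.forecast(steps=1, exog=[[15.0]])
next_sales = df['sales'].iloc[-1] * np.exp(forecast.iloc[0])   # back-transform to the sales level
print(next_sales)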
I've recently written a Python micro-package salesplansuccess, which predicts the current (or next) year's annual sales for individual products from historic monthly sales data. A major assumption of that model, however, is quarterly seasonality (more specifically, a repeating drift from the 2nd to the 3rd month of each quarter), which is more characteristic of wholesalers (who strive to achieve their formal quarterly sales goals by pushing sales at the end of each quarter).
The package is installed as usual with pip install salesplansuccess.
You can modify its source code so that it better fits your needs. It uses both ARIMA (maximum likelihood estimates) and linear regression (least squares estimates) techniques under the hood.
The minimalistic use case is below:
import pandas as pd
from salesplansuccess.api import SalesPlanSuccess
myHistoricalData = pd.read_excel('myfile.xlsx')
myAnnualPlan = 1000
sps = SalesPlanSuccess(data=myHistoricalData, plan=myAnnualPlan)
sps.fit()
sps.simulate()
sps.plot()
For a more detailed illustration of its use, you may want to refer to the Jupyter Notebook illustration file in its GitHub repository.
Related
I have a dataset that I am trying to analyze for a project.
The first step of the project is to basically model the data, and I am running into some issues. The data is on house sales within the past 5 years, covering buyers, cost of house, income, age, year purchased, years in loan, years at current job, and whether or not the house was foreclosed on (YES or NO).
The goal is to train a model to make predictions using machine learning, but I am stuck on part 1 - describing the data. I am using Jupyter notebooks to analyze the data and trying to put together a linear or multilinear regression model, and I am failing. When I throw together a scatter plot, my data is all over the chart with no way to really "group" the data at an intersection point and cast a prediction line. This makes it difficult to figure out what is actually happening; perhaps the data I am comparing is not correlated in any way.
The problem also comes in with the YES or NO data. I was thinking this might need to be converted into 0s and 1s, but then my linear regression model would have an incredible weight on both ends of the spectrum. Perhaps regression is not the best choice?
I'm just struggling to figure out what to do and how to do it. I am kind of new to data analysis, so perhaps I am thinking of this all wrong. If anyone has any insight it would be much appreciated.
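For the YES/NO column specifically, one common route is to encode it as 0/1 and treat the problem as classification rather than linear regression; a minimal sketch, assuming hypothetical file and column names:
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv('house_sales.csv')                       # hypothetical file and column names
df['foreclosed'] = df['foreclosed'].map({'YES': 1, 'NO': 0})

X = df[['cost_of_house', 'income', 'age', 'years_in_loan']]
y = df['foreclosed']

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))                                    # in-sample accuracy, just as a sanity check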
I'm currently trying to build a good forecast on hourly data with Python and Prophet.
After cleaning all the data and resampling missing values, I already got a much better result.
I also included cap and floor and my own changepoint_prior_scale.
When I plot the prediction result over the actual data, it fits well until there are peaks.
Could anybody give me tips to make Prophet predict these peaks in July better?
It seems like the peaks are there but too low.
Here is the part of my code to generate the model and predict the future:
df['cap']=130
df['floor']=0
model = Prophet(changepoint_prior_scale=0.1, growth='logistic').fit(df)
future = model.make_future_dataframe(periods=15*24, freq='H')
future['cap']=130
future['floor']=0
forecast = model.predict(future)
plot1 = model.plot(forecast)
And here is an image of the plot:
I suggest you start by looking at the components of the forecast (trend, seasonality) to get an overview of the model. You can do this using:
model.plot_components(forecast)
In your model you are only using a changepoint_prior_scale greater than the default (0.05), which means you allow the trend to be more flexible.
It is also worth playing with the seasonality parameters - for example, make the yearly seasonality more flexible by adding yearly_seasonality=20 to the model parameters (the default is 10).
Knowing that you always have a peak in July, you can also try Prophet's holiday effect, which is well described here.
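As a rough sketch of how these two suggestions could be wired into the model from your question (the dates and windows of the 'july_peak' event are made-up placeholders to adapt to your data):
import pandas as pd
from prophet import Prophet   # or fbprophet, depending on the installed version

# Hypothetical recurring event covering the July peaks, modelled via Prophet's holiday effect.
july_peaks = pd.DataFrame({
    'holiday': 'july_peak',
    'ds': pd.to_datetime(['2018-07-15', '2019-07-15']),
    'lower_window': -14,   # let the effect start two weeks before the anchor date...
    'upper_window': 14,    # ...and last two weeks after it
})

model = Prophet(
    growth='logistic',
    changepoint_prior_scale=0.1,
    yearly_seasonality=20,   # more Fourier terms than the default 10
    holidays=july_peaks,
).fit(df)                    # df with 'ds', 'y', 'cap' and 'floor' columns as in the question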
Which modelling strategy (time frame, features, technique) would you recommend to forecast 3-month sales for total customer base?
At my company, we often analyse the effect of e.g. marketing campaigns that run at the same time for the total customer base. In order to get an idea of the true incremental impact of the campaign, we - among other things - want to use predicted sales as a counterfactual for the campaign, i.e. what sales were expected to be assuming no marketing campaign.
Time frame used to train the model - I'm currently considering 2 options (static time frame and rolling window) - let me know what you think.
1. Static: Use the same period last year as the dependent variable to build a specific model for this particular 3-month time frame. Data from the 12 months before are used to generate features.
2. Rolling window: Use a rolling window logic of 3 months, dynamically creating dependent time frames and features. Not yet sure what the benefit of that would be. It uses more recent data for model creation, but feels less specific because it uses any 3-month period in the year as the dependent variable. Not sure what the theory says for this particular example. Experiences, thoughts?
Features - I'm currently building features per customer from one year of pre-period data, e.g. sales in individual months, sales 90/180/365 days prior, max/min/avg per customer, # of months with sales, tenure, etc. This takes quite a lot of time - any libraries/packages you would recommend for this?
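As a rough illustration of the kind of per-customer features described above, a pandas group-by sketch (the column names 'customer_id', 'date', 'sales' and the cutoff date are hypothetical):
import pandas as pd

# Hypothetical layout: one row per customer per transaction, with 'customer_id', 'date', 'sales'.
df = pd.read_csv('transactions.csv', parse_dates=['date'])

cutoff = pd.Timestamp('2019-01-01')        # start of the 3-month window being predicted
pre = df[df['date'] < cutoff]

def window_sum(g, days):
    # total sales in the last `days` days before the cutoff
    return g.loc[g['date'] >= cutoff - pd.Timedelta(days=days), 'sales'].sum()

features = pre.groupby('customer_id').apply(
    lambda g: pd.Series({
        'sales_90d': window_sum(g, 90),
        'sales_180d': window_sum(g, 180),
        'sales_365d': window_sum(g, 365),
        'avg_monthly_sales': g.set_index('date')['sales'].resample('M').sum().mean(),
        'months_with_sales': g['date'].dt.to_period('M').nunique(),
        'tenure_days': (cutoff - g['date'].min()).days,
    })
)
print(features.head())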
Modelling technique - currently considering GLM, XGBoost, S/ARIMA, and LSTM networks. Any experience here?
To clarify, even though I'm considering e.g. ARIMA, I do not need to predict any seasonal patterns of the 3 month window. As a first stab, a single number, total predicted sales of customer base for these 3 months, would be sufficient.
Any experience or comment would be highly appreciated.
Thanks,
F
I have a monthly time series which I want to forecast using Prophet. I also have external regressors which are only available on a quarterly basis.
I have thought of the following possibilities:
repeat the quarterly values to make them monthly and then include them as regressors
linearly interpolate the values for the months in between
What other options I can evaluate?
Which would be the most sensible thing to do in this situation?
You have to evaluate based on your business problem, but there are some questions you can ask yourself.
How are the external regressors making their predictions? Are they trained on completely different data?
If not, are they worth including?
How quickly do we expect those regressors to get "stale"? How far in the future are their predictions available? How well do they perform more than one quarter into the future?
Interpolation can be reasonable based on these factors...but don't leak information about the future to your model at training time.
Do they relate to subsets of your features?
If so, some feature engineering could be fun - combine the external regressor's output with your other data in meaningful ways.
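For what it's worth, both upsampling options from the question (repeating the quarterly value vs. linear interpolation) are one-liners in pandas; a small sketch with made-up numbers:
import pandas as pd

# Made-up quarterly regressor values.
quarterly = pd.Series(
    [100.0, 104.0, 101.0, 107.0],
    index=pd.period_range('2019Q1', periods=4, freq='Q').to_timestamp(),
)

monthly_repeated = quarterly.resample('MS').ffill()                            # option 1: repeat within the quarter
monthly_interpolated = quarterly.resample('MS').interpolate(method='linear')   # option 2: linear interpolation

print(monthly_repeated)
print(monthly_interpolated)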
Using Python, I am trying to predict the future sales count of a product, using historical sales data. I am also trying to predict these counts for various groups of products.
For example, my columns looks like this:
Date, Sales_count, Department, Item, Color
8/1/2018, 50, Homegoods, Hats, Red_hat
If I want to build a model that predicts the sales_count for each Department/Item/Color combo using historical data (time), what is the best model to use?
If I do Linear regression on time against sales, how do I account for various categories? Can I group them?
Would I instead use multilinear regression, treating the various categories as independent variables?
The best way I have come across for forecasting in Python is the SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous variables) model in the statsmodels library. Here is the link to a very good tutorial on SARIMAX using Python:
Also, if you are able to group the data frame by your Department/Item/Color combo, you can put the groups in a loop and apply the same model.
Maybe you can create a key for each unique combination and forecast the sales for each key.
For example,
import pandas as pd

df = pd.read_csv('your_file.csv')
df['key'] = df['Department'] + '_' + df['Item'] + '_' + df['Color']
for key in df['key'].unique():
    temp = df.loc[df['key'] == key]  # filter only the specific group
    temp = temp.groupby('Date')['Sales_count'].sum().reset_index()
    # aggregate the sum of sales on each date; ignore if not required
    # write the forecasting code here from the tutorial
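If it helps, here is a rough sketch of what the forecasting step inside that loop could look like with SARIMAX from statsmodels; the model orders, the monthly frequency, and the 3-step horizon are assumptions you would adapt to your data:
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

forecasts = {}
for key in df['key'].unique():
    temp = df.loc[df['key'] == key].copy()
    temp['Date'] = pd.to_datetime(temp['Date'])
    series = temp.groupby('Date')['Sales_count'].sum().sort_index()
    series = series.asfreq('MS').fillna(0)   # assuming monthly data at month starts; fill gaps with zero sales
    # Illustrative orders only; in practice choose them per series (e.g. by AIC).
    results = SARIMAX(series, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
    forecasts[key] = results.forecast(steps=3)   # forecast the next 3 periods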