I have searched extensively on Google and here but cannot seem to find the answer I am looking for, or at least something I understand. Is it possible to use EWMA in pandas for forecasting? For example, if I had daily data of website clicks for 2 months (1st Feb to 31st Mar) and didn't see any trend or seasonality in the data, it seems like I should be able to use EWMA to "predict" the number of clicks at a later date, say on 10th April. In Excel, I can imagine just filling approximately 10 dates or rows after 31st March and computing a moving average, where the 5-day EWMA for 10th April would be based on weighted forecasts of prior days. Is there a way I can do this in Python?
Thanks!
It's a one-liner to implement, but you may find EWMA's predictions of the future a little boring: the forecast for every future date is simply the last smoothed value. If you'd like a Python package that lets you experiment with EWMA level, trend and seasonality, try my Holt-Winters implementation:
https://github.com/welch/seasonal
https://pypi.python.org/pypi/seasonal
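As a rough sketch of that one-liner in plain pandas: the click counts below are made up, the year 2023 is an arbitrary assumption, and span=5 simply mirrors the "5-day EWMA" from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical daily click counts for 1 Feb to 31 Mar (made-up data, no trend or seasonality).
rng = np.random.default_rng(0)
dates = pd.date_range("2023-02-01", "2023-03-31", freq="D")
clicks = pd.Series(rng.poisson(100, size=len(dates)), index=dates, name="clicks")

# The "one-liner": exponentially weighted mean with a 5-day span.
smoothed = clicks.ewm(span=5, adjust=False).mean()

# Simple exponential smoothing has no trend or seasonal term, so every
# future date gets the same forecast: the last smoothed level.
future = pd.date_range("2023-04-01", "2023-04-10", freq="D")
forecast = pd.Series(smoothed.iloc[-1], index=future, name="forecast")

print(forecast.loc[pd.Timestamp("2023-04-10")])
```

The forecast for 10th April is identical to the one for 1st April, which is why a model with trend or seasonal terms is usually more interesting.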
I have 30 years of daily temperature data and I am having trouble determining the seasonal period. I can't do it with a lag of 365 days in the Python implementation,
so can I use 90 days (one season) instead? I don't know whether that makes sense or not.
A seasonality of 365 does not really make sense for three reasons:
Such a long seasonal period often leads to long runtimes or, as in your case, causes the process to break.
When you use exactly 365 days you ignore leap years, so your model drifts over longer periods (over 30 years the drift already amounts to about 7 days).
The most important reason is the correlation itself. A seasonality of 365 would mean that the weather today is somehow correlated with the weather on the same day last year, which is not really the case. Of course, both days will be somewhat close to one another due to the meteorological seasons, but I would not rely on this correlation. Imagine last year was relatively cold and this year is relatively warm. Would you use the days immediately before today, or would you base your forecast on some day a year ago? The latter is not very reliable.
Your forecast has to be based on the last immediate lags before the actual day – not the days of last year. However, to improve your model you have to factor in the meteorological seasons (spring, summer, fall, winter). There are multiple ways to do so. One for example would be to use SARIMAX and pass the model an exogenous feature, e.g. the monthly average temperature or a moving average over the last years. That way you can teach the model that days in summer are generally hotter than in winter, and for the precise prediction you use the days immediately before.
See here for the documentation of statsmodels SARIMAX:
https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html
There are plenty of examples on how to create these models with exogenous variables on the web.
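To make the exogenous-feature idea a bit more concrete, here is a minimal pandas sketch. The temperatures are synthetic, the 1990 to 2019 range is an assumption, and the feature (long-run average per calendar month) is just one of the options mentioned above. The SARIMAX call itself is only indicated in a comment, with an arbitrarily chosen order.

```python
import numpy as np
import pandas as pd

# Synthetic daily temperatures (made-up data): an annual cycle plus noise.
idx = pd.date_range("1990-01-01", "2019-12-31", freq="D")
rng = np.random.default_rng(0)
cycle = 10 * np.sin(2 * np.pi * (idx.dayofyear - 100) / 365.25)
temp = pd.Series(10 + cycle + rng.normal(0, 2, len(idx)), index=idx, name="temp")

# Leap days in the sample: a fixed 365-day period drifts by this many days.
leap_days = int(((idx.month == 2) & (idx.day == 29)).sum())

# Exogenous feature: the long-run average temperature of each calendar month,
# repeated for every day that falls in that month.
exog = temp.groupby(temp.index.month).transform("mean").rename("monthly_avg")

# With statsmodels installed, the feature would be passed along the lines of
#   SARIMAX(temp, exog=exog, order=(1, 0, 1))   # order chosen arbitrarily here
# and future values of the feature must be supplied when forecasting.
```

The short-term lags then handle the day-to-day prediction, while the feature tells the model which part of the year it is in.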
I am currently using Prophet to forecast usage over a one-year period. This is my first time using this algorithm and I have some questions in mind.
I am using the code attached below. I am wondering if anyone has included holidays as a parameter before, and how to do so while including holidays from other calendars (lunar/Islamic etc). Also since February may have 1 more day in a leap year, would be great as well to know if the algorithm take this into consideration?
from prophet import Prophet  # the package was named fbprophet before version 1.0

m = Prophet(
    growth='logistic',  # logistic growth needs a 'cap' column in the fitting and future dataframes
    seasonality_mode='multiplicative',
    seasonality_prior_scale=1.5,
    mcmc_samples=5,
    n_changepoints=25,
    changepoint_range=0.8,
    yearly_seasonality='auto',
    weekly_seasonality='auto',
    daily_seasonality='auto',
    holidays=None,
    holidays_prior_scale=10.0,
    changepoint_prior_scale=0.05,
    interval_width=0.8,
    stan_backend=None,
)
The holidays parameter takes a dataframe. At minimum, that dataframe needs two columns: ds (the date) and holiday (the holiday name).
The important thing to note here is that you provide both historical and future holidays in this dataframe.
Apart from the 2 columns mentioned above, the following columns are optional:
lower_window, upper_window (int) - to extend holiday effect around the date of holiday.
prior_scale (float) - to set a different prior scale for each holiday.
Also to answer your second question i.e.
Also since February may have 1 more day in a leap year, would be great
as well to know if the algorithm take this into consideration?
It depends on the data you model. Prophet works from the actual calendar dates you supply, so if your history already includes leap years, 29 February is simply one more dated row and Prophet takes it into consideration.
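A minimal sketch of such a holidays dataframe, built with pandas only. The holiday names are arbitrary labels, the window values are made up, and the dates are illustrative (observed dates for lunar and Islamic holidays vary by country, so check them against an authoritative calendar).

```python
import pandas as pd

# Illustrative dates only; include both historical and future occurrences.
lunar_new_year = pd.DataFrame({
    "holiday": "lunar_new_year",
    "ds": pd.to_datetime(["2023-01-22", "2024-02-10", "2025-01-29"]),
    "lower_window": -1,  # effect assumed to start one day before
    "upper_window": 2,   # and to last two days after
})

eid_al_fitr = pd.DataFrame({
    "holiday": "eid_al_fitr",
    "ds": pd.to_datetime(["2023-04-21", "2024-04-10", "2025-03-30"]),
    "lower_window": 0,
    "upper_window": 1,
})

# One dataframe for all holidays, whatever calendar they come from.
holidays = pd.concat([lunar_new_year, eid_al_fitr], ignore_index=True)

# This is what would replace holidays=None in the constructor:
#   m = Prophet(holidays=holidays, ...)
print(holidays)
```

Because the dataframe just lists dates, holidays from any calendar can be mixed freely as long as they are converted to Gregorian dates first.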
I am a beginner in Python programming and machine learning.
I have a dataset with sales per product on monthly level.
The dataset has data from 2015 up till 2019.
With the help of Python I would like to make a prediction model that predicts the sales of the next month.
I followed this tutorial:
Sales prediction
This gave me a prediction of the last 6 months and lined it up with the actual sales. I managed to get a pretty accurate prediction, but my problem is that I need the predictions per product and, if possible, I would like to include the influence of weather as well. For example, if the weather data says it was rainy, the model has to take this into account.
Does anyone know a way of doing this?
Every tip on which model to use or article to read is much appreciated!!
The most basic way of doing that would be to run an ARIMA model with external regressors (the weather measured in terms of temperature, humidity or any other feature that is expected to influence the monthly sales).
What is important is that, before fitting the model, the sales data should be transformed into log monthly changes with something like np.log(df.column).diff().
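Since the question asks for predictions per product, the transform can be applied within each product group. This is only a sketch: the column names product, month and sales and all the numbers are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format sales table: one row per product and month.
df = pd.DataFrame({
    "product": ["A"] * 4 + ["B"] * 4,
    "month": list(pd.period_range("2019-01", periods=4, freq="M")) * 2,
    "sales": [120.0, 132.0, 125.0, 150.0, 40.0, 38.0, 44.0, 55.0],
})

# Log monthly change, computed separately for each product so that the
# first month of product B is not differenced against the last month of A.
df["log_change"] = df.groupby("product")["sales"].transform(
    lambda s: np.log(s).diff()
)

print(df)
```

The first month of each product has no previous value, so its log change is NaN and would be dropped before fitting.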
I've recently written a Python micro-package, salesplansuccess, which deals with predicting the current (or next) year's annual sales for individual products from historic monthly sales data. But a major assumption of that model is quarterly seasonality (more specifically, a repeating drift from the 2nd to the 3rd month in each quarter), which is more characteristic of wholesalers (who strive to achieve their formal quarterly sales goals by pushing sales at the end of each quarter).
The package is installed as usual with pip install salesplansuccess.
You can modify its source code to better fit your needs. It uses both ARIMA (maximum likelihood estimates) and linear regression (least squares estimates) techniques under the hood.
The minimalistic use case is below:
import pandas as pd
from salesplansuccess.api import SalesPlanSuccess
myHistoricalData = pd.read_excel('myfile.xlsx')
myAnnualPlan = 1000
sps = SalesPlanSuccess(data=myHistoricalData, plan=myAnnualPlan)
sps.fit()
sps.simulate()
sps.plot()
For more detailed illustration of its use, you may want to refer to a Jupyter Notebook illustration file at its GitHub repository.
I have a data file with one column full of timestamps, and I have aggregated the times into 10-minute intervals. I am trying to visualize them to find underlying patterns in the demand. I have looked at a histogram of this information, and the heat map did not return good results.
My information is just one column full of timestamps like this:
2017-08-28 14:37:00
I have 100,000 rows and I am trying to use pandas for forecasting. I don't know if I should use linear regression or a Kalman filter. So far this is my visualization:
plt.figure()
df["time"].apply(lambda x: x.hour).plot.hist(bins=24)
I am trying to make it more granular, on a 10-minute interval, and then look at patterns and implement a forecasting technique.
I'm not sure I understand your question precisely. From what I understood, you have a one-dimensional time series of "demand" and you want to develop a prediction algorithm.
For the data exploration and identification of patterns, I understand you are having difficulty with visualization. First, to increase the granularity of your histogram, you may want to group your data on a daily basis (by time of day) and plot a histogram with 24*6=144 bins, one per 10-minute interval. If you want to try more visualizations, here are some basic ones:
you could try a simple line plot, as your data seems to be one-dimensional
another option is to build heatmaps whose axes are the hour of the day, the day of the week (Monday, Tuesday, etc.) or the month of the year
a scatterplot with the hour of the day (0h to 23h) on the x-axis, ...
You should find many different options.
For the prediction algorithm, you did not provide enough information for us to give a hint. Try to be more specific, or do a quick search for "time series prediction".
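The 144-bin idea and the day-of-week heatmap can be prepared along these lines. This is a sketch on synthetic timestamps; only the column name time comes from the question, everything else (date range, variable names) is assumed.

```python
import numpy as np
import pandas as pd

# Synthetic timestamps standing in for the single "time" column (made-up data).
rng = np.random.default_rng(0)
start = pd.Timestamp("2017-08-01")
offsets = rng.integers(0, 28 * 24 * 60, size=100_000)  # minutes within 4 weeks
df = pd.DataFrame({"time": start + pd.to_timedelta(offsets, unit="min")})

# Demand per 10-minute interval: one event per timestamp, summed per bin.
events = pd.Series(1, index=pd.DatetimeIndex(df["time"].sort_values()))
demand = events.resample("10min").sum()

# Fold onto day of week x time of day (144 ten-minute slots per day).
dow = pd.Index(demand.index.dayofweek, name="day_of_week")
slot = pd.Index(demand.index.hour * 6 + demand.index.minute // 10, name="slot")
profile = demand.groupby([dow, slot]).mean().unstack("slot")

# profile has one row per weekday and one column per slot; it can be shown
# as a heatmap (e.g. with matplotlib's imshow) or as lines via profile.T.plot().
print(profile.shape)
```

The demand series is also the natural input for whatever forecasting method is tried afterwards, since it is a regular time series rather than a list of raw events.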
I've been trying to take the average of a month's worth of data, but I wanted to check that:
df=df.resample('M').mean()
does give the monthly mean and NOT the mean of the last calendar day of the month.
Also, I've seen W-Mon, which would give an average of the Mondays at a weekly frequency. What would be the equivalent for comparing the monthly average of October over multiple years?
I thought it would be this, but it doesn't seem to recognise the command:
df=df.resample("M-OCT").mean()
Try this:
df.assign(y=df.index.year, m=df.index.month).query('m==10').groupby(['y', 'm']).mean()
PS if you need a neat and tested answer please post sample data set and desired output in your question...
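On the first part of the question: resampling by month averages every row that falls in the month and only labels the result with the month-end date, so it is the monthly mean. A small self-contained check on made-up data (the value column is hypothetical, and the grouping uses monthly periods rather than the 'M' resample alias so that it does not depend on the pandas version):

```python
import numpy as np
import pandas as pd

# Made-up daily data spanning three years; the value is just the day counter.
idx = pd.date_range("2017-01-01", "2019-12-31", freq="D")
df = pd.DataFrame({"value": np.arange(len(idx), dtype=float)}, index=idx)

# Mean over all rows of each calendar month.
monthly = df.groupby(df.index.to_period("M")).mean()

# October only: one row per year, ready to compare across years.
october = monthly[monthly.index.month == 10]

# The October 2017 mean differs from the value on the last day of that month.
last_day_value = df.loc[pd.Timestamp("2017-10-31"), "value"]
print(october)
```

As far as I know there is no anchored monthly frequency such as "M-OCT", which is why filtering on the month after aggregating is the simpler route.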