How to set resample length in block bootstrap in ARCH package - Python

I want to use the CircularBlockBootstrap() in the arch package, but I can't find a way to set the output length. My code snippet is below:
from arch.bootstrap import CircularBlockBootstrap
error_1y = forecast_1y - actual_1y  # series of forecast residuals: one year of half-hourly data
bs = CircularBlockBootstrap(48 * 7, error_1y, seed=8)
samples = []
for data in bs.bootstrap(100):
    samples.append(data[0][0].reset_index().iloc[:, 1])
Each sample's length is the same as my input, but I would like to generate 3 years' data. Could anyone please help? Thank you!
I've looked through the documentation but don't seem to find the answer.
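As far as I can tell, CircularBlockBootstrap always resamples to the length of its input. One workaround is to implement the circular block scheme yourself, which lets you pick any output length. The helper below is a hypothetical sketch (not part of arch): it draws random block start points, wraps indices around the end of the series, and concatenates blocks until the requested length is reached.

```python
import numpy as np

def circular_block_resample(data, block_size, out_len, rng=None):
    """Draw a circular-block-bootstrap sample of arbitrary length.

    Blocks of `block_size` consecutive observations are drawn with
    uniformly random start points; indexing wraps around the end of
    the series (the "circular" part), and blocks are concatenated
    until `out_len` observations have been produced.
    """
    rng = np.random.default_rng(rng)
    data = np.asarray(data)
    n = len(data)
    n_blocks = int(np.ceil(out_len / block_size))
    starts = rng.integers(0, n, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_size)) % n  # wrap around
    return data[idx].ravel()[:out_len]

# One year of half-hourly residuals -> three years of bootstrapped data
one_year = np.random.default_rng(8).normal(size=48 * 365)
sample = circular_block_resample(one_year, block_size=48 * 7,
                                 out_len=3 * 48 * 365, rng=8)
```

You could get a similar effect by concatenating three independent bootstrap replications per sample, at the cost of extra block joins at the seams.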

Related

Splitting a single large csv file to resample by two columns

I am doing a machine learning project with phone sensor (accelerometer) data. I need to preprocess the dataset before I export it to the ML model. I have 25 classes (the alphabets in the dataset) and 20 subjects (recordings of each alphabet) per class. Since the lengths differ across classes and subjects, I have to resample. I want to split a single csv file by class and subject so that I can resample. I have tried groupby() and a few other things, but they did not work. I would be glad if you could share your thoughts on this problem. This is my first time asking a question on this site, so if I have made a mistake I would appreciate it if you pointed it out. Thank you in advance.
I share some code and outputs to help you understand my question better.
This is what I got when I tried groupby(), but it is not exactly what I wanted.
This is how my csv file looks; it contains more than 300,000 rows.
Some code snippet:
import pandas as pd
import numpy as np
def read_data(file_path):
    data = pd.read_csv(file_path)
    return data
# read csv file
dataset = read_data('raw_data.csv')
df1 = pd.DataFrame(dataset.groupby(['alphabet', 'subject'])['x_axis'].count())
df1['x_axis'].head(20)
I also need to do this for every x_axis, y_axis and z_axis so what can I use other than groupby() function? I do not want to use only the lengths but also the values of all three to be able to resample.
First, find the smallest group size, i.e. the minimum sample count over all (alphabet, subject) groups:
num_sample = df.groupby(['alphabet', 'subject'])['x_axis'].count().min()
Now you can draw that many rows from each group:
df.groupby(['alphabet', 'subject']).sample(num_sample)
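Putting the two steps together on a toy dataset (the frame below is a made-up stand-in for raw_data.csv): sampling whole rows keeps x_axis, y_axis and z_axis aligned, so you do not need a separate pass per axis.

```python
import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for raw_data.csv: two alphabets, two subjects, unequal lengths
df = pd.DataFrame({
    "alphabet": ["A"] * 5 + ["A"] * 3 + ["B"] * 4 + ["B"] * 6,
    "subject":  [1] * 5 + [2] * 3 + [1] * 4 + [2] * 6,
    "x_axis": rng.normal(size=18),
    "y_axis": rng.normal(size=18),
    "z_axis": rng.normal(size=18),
})

# Smallest group size across all (alphabet, subject) pairs
num_sample = df.groupby(["alphabet", "subject"])["x_axis"].count().min()

# Downsample every group to that size; whole rows are sampled,
# so all three axes stay aligned
resampled = df.groupby(["alphabet", "subject"]).sample(num_sample,
                                                       random_state=0)
```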

Accumulated Distribution Line

I'm trying to learn some Python and am currently working through a few stock market examples. However, I ran across something called an Accumulated Distribution Line (a technical indicator) and tried to follow its mathematical expression until I reached the following line:
ADL[i] = ADL[i-1] + money flow volume[i]
Now, I have the money flow volume at index 8 and an empty column for the ADL at index 9 (column positions in the csv file). How would I actually compute the mathematical expression above in Python? (I am currently using Python with pandas.)
Currently tried using the range function such as:
for i in range(1, len(stock["Money flow volume"])):
    stock.iloc[i, 9] = stock.iloc[i - 1, 9] + stock.iloc[i, 8]
But I think I'm doing something wrong.
That just looks like a cumulative sum with an unspecified base case, so I'd just use the built-in cumsum functionality.
import pandas as pd
df = pd.DataFrame(dict(mfv=range(10)))
df['adl'] = df['mfv'].cumsum()
should do what you want relatively efficiently.
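If your ADL column should start from a known value rather than zero (the base case mentioned above), you can still avoid the loop by adding the base to the running total. A minimal sketch with made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({"mfv": range(10)})

base = 100  # hypothetical ADL value carried in from before the first row
df["adl"] = base + df["mfv"].cumsum()  # ADL[i] = ADL[i-1] + mfv[i]
```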

Exponential Moving Average df.ewm() function

I am following, step by step, Analytics Vidhya's time series forecasting tutorial posted a while ago. I am at the step where we calculate the exponential moving average.
https://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/?
Link for the article.
Here is vidhya's code :
expwighted_avg = pd.ewma(ts_log, halflife=12)
plt.plot(ts_log)
plt.plot(expwighted_avg, color='red')
My code:
expwavg = a.ewm(span=12, adjust=True).mean()
plt.plot(a)
plt.plot(expwavg, color='red')
a is my dataset. I believe the function has changed and I am using the most recent one. Any help solving this would be appreciated. The error is:
AttributeError: 'list' object has no attribute 'ewm' (and likewise for ewma)
Thanks,
I suspect that a is not actually a DataFrame. You may want to try this first:
# assuming you have previously done:
# import pandas as pd
adf = pd.DataFrame.from_records(a)
adf.head()
If the data appears to be structured as you intend, then your command will likely work:
expwavg = adf.ewm(span=12, adjust=True).mean()
plt.plot(adf)
plt.plot(expwavg, color='red')
If this does not work, you will likely need to post some of the code that precedes the three lines you have already posted.
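One further detail: the deprecated pd.ewma call in the article used halflife=12, not span=12, so once a is wrapped in a pandas object you will likely want halflife to reproduce the article's numbers. A minimal sketch on synthetic data (the series here is made up):

```python
import pandas as pd
import numpy as np

# Made-up stand-in for the article's log-transformed series
ts_log = pd.Series(np.log(np.arange(1, 25, dtype=float)))

# Modern equivalent of the deprecated pd.ewma(ts_log, halflife=12)
expwighted_avg = ts_log.ewm(halflife=12).mean()
```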

Multivariate Time series autoregressive series using StatsModels in Python: What to do after fitting the model?

Good day
This is my maiden Stack Overflow question so I hope I get it right and don't break any rules.
I work as a Fund Manager so do not have computer science background. I am however learning python at the moment.
I am trying to fit historical data which includes multiple time series. I think I have managed to do this. The thing I need to do next is to use this data to predict values into the future for these time series. I have looked at the StatsModels documentation but can't quite make heads or tails of it.
I am using xlwings and linking to excel. My code is as follows:
import numpy as np
from xlwings import Workbook, Range
import statsmodels.api as sm
import statsmodels
import pandas
def Fit_the_AR():
    dataRange = Range('Sheet1', 'rDataToFit').value
    dateRange = Range('Sheet1', 'rDates').value
    titleRange = Range('Sheet1', 'rTitles').value
    ARModel = statsmodels.tsa.vector_ar.var_model.VAR(dataRange, dateRange, titleRange, freq='m')
    statsmodels.tsa.vector_ar.var_model.VAR.fit(ARModel, 1, 'ols', None, 'c', True)
    Range('Sheet2', 'B2').value = ARModel.endog_names
    Range('Sheet2', 'B3').value = ARModel.endog
I thought I would have to use the predict method, but I am not sure how to get all the parameters it requires.
Any help or pointing in the right direction would be much appreciated. I can provide an excel file of the data if need be. Thank you.

Graphing the number of elements down based on timestamps start/end

I am trying to graph alarm counts in Python to give an idea of the peak number of network elements down between two timestamps. Our alarm report delivers the data as CSV like this:
Name,Alarm Start,Alarm Clear
NE1,15:42 08/09/11,15:56 08/09/11
NE2,15:42 08/09/11,15:57 08/09/11
NE3,15:42 08/09/11,16:31 08/09/11
NE4,15:42 08/09/11,15:59 08/09/11
I am trying to graph, between those two points, how many NEs were down at each moment, including the maximum number and when the count crossed a certain threshold. An example is below:
15:42 08/09/11 - 4 Down
15:56 08/09/11 - 3 Down
etc.
Any advice where to start on this would be great. Thanks in advance, you guys and gals have been a big help in the past.
I'd start by parsing your input data into a map indexed by timestamp, with counts as values: just increase the count for each row with the same timestamp you encounter.
After that, use some plotting module, for instance matplotlib, to plot the keys of the map against the values. That should cover it!
Do you need any more detailed ideas?
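A sketch of that idea with pandas and the sample rows above: mark +1 at every alarm start and -1 at every clear, sort by timestamp, and take a running total. (The timestamp format %H:%M %d/%m/%y is an assumption; swap day and month if your report uses mm/dd.)

```python
import io
import pandas as pd

csv = io.StringIO("""Name,Alarm Start,Alarm Clear
NE1,15:42 08/09/11,15:56 08/09/11
NE2,15:42 08/09/11,15:57 08/09/11
NE3,15:42 08/09/11,16:31 08/09/11
NE4,15:42 08/09/11,15:59 08/09/11
""")
fmt = "%H:%M %d/%m/%y"  # assumed; adjust if your dates are mm/dd
df = pd.read_csv(csv)
df["Alarm Start"] = pd.to_datetime(df["Alarm Start"], format=fmt)
df["Alarm Clear"] = pd.to_datetime(df["Alarm Clear"], format=fmt)

# +1 at each start, -1 at each clear, then a running total
events = pd.concat([
    pd.Series(1, index=df["Alarm Start"]),
    pd.Series(-1, index=df["Alarm Clear"]),
]).sort_index()
down = events.cumsum()  # number of NEs down after each event
```

down then holds the count after each event (here it peaks at 4 and returns to 0), and something like down.plot(drawstyle="steps-post") gives the step chart.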