Resample daily data into weekday data - python

I'm a beginner and can't find a way to use resample in pandas to turn daily OHLC data into separate weekdays, so that I can analyze the return per weekday over a period of time and how it changed (for example, turn it into weekly data with Tuesday as the "week start", using only first so that I don't see the data for the remaining days).
I hope this question is not too stupid, but I've been trying to figure it out for a few days and haven't found a working solution.
Thanks in advance!
import pandas as pd
# Daily OHLCV data, indexed by the parsed "time" column
df = pd.read_csv("ethusdt.csv", parse_dates=["time"], index_col="time")
# How each column is aggregated within one week
ohlc_dict = {
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum',
    'daily_change': 'sum'
}
# Resample the daily data once into weekly bars ('W' means weeks ending on Sunday)
weeklydf = df.resample('W').agg(ohlc_dict)
# Percentage change from the weekly open to the weekly close
weeklydf['weekly_change'] = (weeklydf['close'] / weeklydf['open'] - 1) * 100
EDIT:
Since I probably haven't explained my thought correctly (my English is not very good, sorry), I'll try to explain the problem again.
In the DataFrame I have the following data: open, high, low, close, and volume. My goal is to resample it to weekly data, which I already have, but with different starting days. Right now the week always starts on Monday, but I want to change it to each of the other weekdays to find out how the cryptocurrency market slowly "frontruns" itself. I have seen this for a long time but would like to have statistical proof.
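One way to get weeks that start on a chosen weekday is to use pandas' anchored weekly rules. The following is a minimal sketch, assuming synthetic daily data in place of ethusdt.csv; the names WEEK_RULE_FOR_START and weekly_from are made up for illustration. A rule such as 'W-MON' closes each bin on Monday, so the bin runs from Tuesday through Monday and 'first' picks Tuesday's open.

```python
import numpy as np
import pandas as pd

# Synthetic daily OHLCV data (hypothetical stand-in for ethusdt.csv).
idx = pd.date_range("2021-01-04", periods=28, freq="D")  # 2021-01-04 is a Monday
close = pd.Series(np.arange(100.0, 128.0), index=idx)
daily = pd.DataFrame({
    "open": close - 1.0,
    "high": close + 2.0,
    "low": close - 2.0,
    "close": close,
    "volume": 10.0,
})

ohlc_dict = {"open": "first", "high": "max", "low": "min",
             "close": "last", "volume": "sum"}

# A week that starts on day X must end on the day before X.
# Illustrative mapping from start day to the anchored weekly rule.
WEEK_RULE_FOR_START = {
    "MON": "W-SUN", "TUE": "W-MON", "WED": "W-TUE", "THU": "W-WED",
    "FRI": "W-THU", "SAT": "W-FRI", "SUN": "W-SAT",
}

def weekly_from(daily_df, start_day):
    """Weekly bars whose weeks begin on start_day (hypothetical helper)."""
    weekly = daily_df.resample(WEEK_RULE_FOR_START[start_day]).agg(ohlc_dict)
    weekly["weekly_change"] = (weekly["close"] / weekly["open"] - 1) * 100
    return weekly

tuesday_weeks = weekly_from(daily, "TUE")
monday_weeks = weekly_from(daily, "MON")
```

Each row is labelled with the last day of its week, and the first and last rows can cover partial weeks, so they may need to be dropped before comparing weekly_change across start days.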

Related

Pandas datetime64 problem (datetime introduces spikes in data)

This is my first question on Stack Overflow, so be kind :)
I work with imported csv files and pandas, and I really like the pandas datetime features for working with and filtering dataframes. But I have serious problems plotting the data in a neat way when the dates are datetime64, with both pandas plots and seaborn plots.
My csv looks like this:
date time Flux_ConNT_C Flux_ConB1 Flux_ConB2 Flux_ConB3 Flux_ConB4 Flux_ConB4
0 01.01.2015 00:30 2.552032129 2.193558665 1.0093326 1.013124869 1.159512896 1.159512896
1 01.01.2015 01:00 2.553308464 2.195533756 1.01003938 1.013935693 1.160672989 1.160672989
2 01.01.2015 01:30 2.554585438 2.197510626 1.010746655 1.014747166 1.161834243 1.161834243
3 01.01.2015 02:00 2.55586305 2.199489276 1.011454426 1.015559289 1.162996658 1.162996658
4 01.01.2015 02:30 2.557141301 2.201469707 1.012162692 1.016372061 1.164160236 1.164160236
When I plot the data with
df.plot(figsize=(15,8))
my output is right (image: right output).
But when I change the "date time" column to datetime64 with
df['date time'] = pd.to_datetime(df['date time'])
and use the same code to plot, the data is plotted with these spikes and is not usable (image: false output).
There seems to be a problem with matplotlib, but I can't find anything other than putting register_matplotlib_converters() before the plot, which doesn't change anything.
I'm working with the Spyder IDE and Python 3.7, and all libraries are up to date.
Thanks for your help!
Your problem is no miracle, it's simply not reproducible.
Are you sure your csv doesn't have a header for the first index column 0..4?
Are you sure that column 8 in the csv is a duplicate of column 7?
How did you actually import this csv and construct your dataframe?
The first plot only works after replacing the range index 0..4 with the "date time" column. What other transformations did you apply to the dataframe before calling the plot method?
Your to_datetime conversion only works on a column, not an index. Why don't you share all the code that you've been using?
In the 2 plots the first 5 rows don't differ. Why don't you share the data rows that are actually different in the 2 plots?
I will give you credit for trying to abstract the problem properly. Unfortunately, you omitted important information. Based on the limited information you've shown here, there is no problem at all.
To make my point clear: What you observed is not related to the datetime64[ns] conversion, but to something probably very simple that you didn't consider important enough to share with us.
Have a look at How to create a Minimal, Reproducible Example. The idea is: When you're able to prepare your problem in a reproducible way, you'll probably be able to solve it yourself.
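The post does not show how the csv was loaded, so the cause of the spikes cannot be confirmed from it. One thing worth ruling out when building a reproducible example is day/month ambiguity: the dates in the sample are written day first (01.01.2015), and passing an explicit format removes any guessing. A minimal sketch, assuming date and time are already joined into one "date time" string column; the last two rows and their values are made up for illustration.

```python
import pandas as pd

# First two rows follow the posted sample; the last two are hypothetical.
df = pd.DataFrame({
    "date time": ["01.01.2015 00:30", "01.01.2015 01:00",
                  "02.01.2015 00:30", "13.01.2015 00:30"],
    "Flux_ConNT_C": [2.552032129, 2.553308464, 2.6, 2.7],
})

# An explicit day-first format avoids reading 02.01.2015 as 1 February.
df["date time"] = pd.to_datetime(df["date time"], format="%d.%m.%Y %H:%M")

# A sorted DatetimeIndex keeps the plotted line from jumping back and forth in time.
df = df.set_index("date time").sort_index()
```

If the index is not monotonic after parsing (df.index.is_monotonic_increasing is False before sorting), that is a strong hint that some dates were parsed in the wrong order.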

Alphavantage API in Python - no Date on X axis

Some basics first: I'm running Python 3.x (Jupyter Notebook) with Alpha Vantage.
I'm running this simple code. It all works well; the only problem is that it does not show the date on the x axis. Does anyone have an idea why it is doing this?
from alpha_vantage.timeseries import TimeSeries
import matplotlib.pyplot as plt
ts = TimeSeries(key='06VFCKNZ709V6XFG', output_format='pandas')
data, meta_data = ts.get_intraday(symbol='MSFT',interval='1min', outputsize='full')
data['4. close'].plot()
plt.title('Intraday Times Series for the MSFT stock (1 min)')
plt.show()
So it needed this line in order to show the dates:
import pandas as pd
data.index = pd.to_datetime(data.index)  # convert the string index to a DatetimeIndex
The remaining problem is that the plot also shows the time after the stock market closes, even though the API does not return data for those times. I'll post the answer if I find it.

Dummy variable for first of the month

The crux of this is that it's stock data, so the first trading day of the month might not always be the 1st. I have found a way of isolating these days, but I don't know how to then edit the dataframe to put a "1" next to each of them.
Hopefully this makes sense.
import pandas_datareader as web
import pandas as pd
import numpy as np
import datetime as dt
df = web.DataReader('AAPL', 'google')
df = df.set_index(pd.to_datetime(df['Date']))
df.sort_index(inplace=True)
print(df.groupby(pd.Grouper(freq='MS')).nth(0))
This is the code I'm using. Currently it prints the first day of each month correctly, but I'm not sure how to make a new column (D_FoM) with a 1 at every one of these dates.
I'm sure it's something easy but I can't work it out. R is much easier for this sort of thing, I feel.
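One possible way to build the D_FoM column is to mark the first row of each calendar month. This is a minimal sketch, assuming a sorted DatetimeIndex; the business-day range and the Close values are synthetic stand-ins for the AAPL data.

```python
import pandas as pd

# Synthetic business-day index standing in for real trading days.
idx = pd.bdate_range("2017-01-02", "2017-04-28")
df = pd.DataFrame({"Close": range(len(idx))}, index=idx)
df.sort_index(inplace=True)  # the approach relies on chronological order

# Calendar month of every row, e.g. Period('2017-04', 'M').
month = df.index.to_period("M")

# duplicated() is False for the first row of each month, True for the rest.
df["D_FoM"] = (~month.duplicated()).astype(int)
```

In this synthetic range, 1 April 2017 falls on a Saturday, so the flag for April lands on Monday 3 April.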

Multivariate Time series autoregressive series using StatsModels in Python: What to do after fitting the model?

Good day
This is my maiden Stack Overflow question so I hope I get it right and don't break any rules.
I work as a Fund Manager, so I do not have a computer science background. I am, however, learning Python at the moment.
I am trying to fit historical data which includes multiple time series. I think I have managed to do this. The thing I need to do next is to use this data to predict values into the future for these time series. I have looked at the StatsModels documentation but can't quite make heads or tails of it.
I am using xlwings and linking to excel. My code is as follows:
import numpy as np
from xlwings import Workbook, Range
import statsmodels.api as sm
import statsmodels
import pandas
def Fit_the_AR():
    dataRange = Range('Sheet1','rDataToFit').value
    dateRange = Range('Sheet1', 'rDates').value
    titleRange = Range('Sheet1', 'rTitles').value
    ARModel = statsmodels.tsa.vector_ar.var_model.VAR(dataRange,dateRange,titleRange,freq='m')
    statsmodels.tsa.vector_ar.var_model.VAR.fit(ARModel,1, 'ols', None, 'c', True)
    Range('Sheet2','B2').value = ARModel.endog_names
    Range('Sheet2','B3').value = ARModel.endog
I thought I would have to use the predict method, but I am not sure how to get all the parameters it requires.
Any help or pointing in the right direction would be much appreciated. I can provide an Excel file of the data if need be. Thank you.

How to remove day from datetime index in pandas?

The idea behind this question is that when I'm working with full datetime stamps and data from different days, I sometimes want to compare the hourly behavior across days.
But because the days are different, I cannot directly plot two 1-hour data sets on top of each other.
My naive idea would be that I need to remove the day from the datetime index on both sets and then plot them on top of each other. What's the best way to do that?
Or, alternatively, what's the better approach to my problem?
This may not be exactly it, but it should help you along, assuming ts is your time series:
hourly = ts.resample('H').mean()  # hourly means; resample() alone returns a Resampler object in current pandas
hourly.index = pd.MultiIndex.from_arrays([hourly.index.hour, hourly.index.normalize()])
hourly.unstack().plot()
If you don't care about the day AT ALL, just hourly.index = hourly.index.hour should work.
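The same idea as a self-contained sketch, assuming a synthetic 15-minute series over three days; the level names "hour" and "day" are chosen here for readability. The result has one row per hour of day and one column per date, so the days can be overlaid directly.

```python
import numpy as np
import pandas as pd

# Synthetic series: three days of 15-minute readings (assumption).
idx = pd.date_range("2020-01-01", periods=3 * 24 * 4, freq="15min")
ts = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Aggregate to hourly means.
hourly = ts.resample("60min").mean()

# Split the timestamp into hour of day and calendar day.
hourly.index = pd.MultiIndex.from_arrays(
    [hourly.index.hour, hourly.index.normalize()],
    names=["hour", "day"],
)

# One column per day, indexed by hour of day.
by_hour = hourly.unstack("day")
```

Calling by_hour.plot() then draws one line per day on a shared 0 to 23 hour axis.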
