Error when resampling a datetime64 time series dataframe - python

I have a Pandas v1.0.3 dataframe where the index is a timestamp and the one column is a numeric value. The index has a constant spacing of 30 minutes. The single column represents a count, and I want to resample it so the index has a 2-day interval.
It seems that I would need to do something like this:
df.index = pd.to_datetime(df.index)
df.resample('2D', on='column_name').sum()
However, I am getting the following error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
If I print out df.index I get:
DatetimeIndex(['2016-12-31 23:59:59', '2017-01-01 00:29:59',
...
dtype='datetime64[ns]', length=500, freq='30T')
Which seems to be compatible. Does anyone know what I'm doing wrong here?

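The index is already a DatetimeIndex, which is what resample needs. The problem is on='column_name': it tells pandas to resample on that column instead of the index, and that column holds integers (hence the Int64Index in the error). Resample on the index directly: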
df.resample('2D').sum()
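A minimal sketch of the difference, assuming hypothetical data shaped like the question's (500 half-hourly counts, column name `count` made up here). The index-based call needs no `on=`; `on=` only makes sense when the timestamps live in a regular column, which is simulated below with `reset_index`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 500 half-hourly counts, mirroring the question's index
idx = pd.date_range('2016-12-31 23:59:59', periods=500, freq='30min')
df = pd.DataFrame({'count': np.ones(500, dtype=int)}, index=idx)

# Resample on the DatetimeIndex itself (no `on=` argument)
two_day = df.resample('2D').sum()

# `on=` is for timestamps stored in a column; move the index into column 'ts'
df_col = df.reset_index().rename(columns={'index': 'ts'})
two_day_col = df_col.resample('2D', on='ts').sum()
```

Both calls give the same 2-day sums; only the source of the timestamps differs.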

Related

How to convert "month-day" object to date format in pandas dataframe

I have a df with a column:
column
Dec-01
The column datatype is an object.
I am trying to change it to a datatype date.
This is what I've tried:
df['column'] = pd.to_datetime(df['column']).dt.strftime('%d-%b')
This is the error I receive:
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-12-01 00:00:00
Would really appreciate your help 🙏
You can use:
df['date'] = pd.to_datetime(df['column'], format='%b-%d').dt.strftime('%m/%d/22')
NB. You must hardcode the year, as it is missing from your original data; when the format contains no year, pandas falls back to a default year (1900).
output:
   column      date
0  Dec-01  12/01/22
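Note that `.dt.strftime` turns the result back into strings, while the question asked for a date dtype. A minimal sketch that keeps a real datetime64 column, assuming the year is 2022 (an assumption, since the data has none) and prepending it before parsing so no default year is involved:

```python
import pandas as pd

df = pd.DataFrame({'column': ['Dec-01', 'Jan-15']})

# Prepend an assumed year so the parsed dates don't depend on a default year
df['date'] = pd.to_datetime('2022-' + df['column'], format='%Y-%b-%d')

# Optional: a string version for display only (object dtype, not datetime)
df['date_str'] = df['date'].dt.strftime('%m/%d/%y')
```

Keep `date` for filtering or arithmetic and use `date_str` only when printing.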

Cannot do positional indexing on DatetimeIndex with these indexers

I have a time series dataframe "his" with a datetime index, from which I want to extract a certain range of timesteps. The starting point "selected_var_start" is a DatetimeIndex (extracted from another dataframe). I want to extract 24 hours in total.
I tried it like this: his = his.iloc[selected_var_start:(selected_var_start+pd.DateOffset(hours=24))]
I get the following error TypeError: cannot do positional indexing on DatetimeIndex with these indexers [DatetimeIndex(['2010-11-12 19:00:00'], dtype='datetime64[ns]', name='Datetime', freq=None)] of type DatetimeIndex
What am I doing wrong?
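There are two problems. First, iloc is for integer positions, so slicing by timestamps needs loc. Second, selected_var_start is a one-element DatetimeIndex, not a single value. Take its first element with selected_var_start[0]; this turns it into a plain Timestamp, which works with loc.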
his.loc[selected_var_start[0]:(selected_var_start[0]+pd.DateOffset(hours=24))]
Alternatively, use loc with date strings:
pd.DataFrame(index=pd.date_range('11/01/2010', '12/01/2010')).loc['2010-11-12': '2010-11-13']
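A minimal sketch comparing the label-based and position-based approaches, assuming a hypothetical hourly series in place of `his`. One detail worth knowing: loc slicing includes both endpoints, so start to start+24h returns 25 hourly rows; if you want exactly 24 rows, look up the start position and slice with iloc.

```python
import pandas as pd

# Hypothetical hourly series standing in for `his`
idx = pd.date_range('2010-11-12 00:00', periods=72, freq='60min', name='Datetime')
his = pd.DataFrame({'value': range(72)}, index=idx)

# `selected_var_start` as a one-element DatetimeIndex, as in the question
selected_var_start = pd.DatetimeIndex(['2010-11-12 19:00:00'], name='Datetime')
start = selected_var_start[0]  # a scalar Timestamp

# Label-based: loc includes both endpoints, so this yields 25 hourly rows
window_loc = his.loc[start:start + pd.Timedelta(hours=24)]

# Position-based: find the integer position of start, then take 24 rows
pos = his.index.get_loc(start)
window_iloc = his.iloc[pos:pos + 24]
```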

How do I group date by month using pd.Grouper?

I've searched Stack Overflow to find out how to group DateTime by month, and for some reason I keep receiving this error, even after I pass the column through pd.to_datetime:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
df['Date'] = pd.to_datetime(df['Date'])
df['Date'].groupby(pd.Grouper(freq='M'))
When I check the dtype of df['Date'], it shows datetime64[ns]. Any ideas why I keep getting this error?
The reason is simple: you didn't pass a groupby key to groupby.
What you want is to group the entire dataframe by the month values of the contents of df['Date'].
However, what df['Date'].groupby(pd.Grouper(freq='M')) actually does is first extract a pd.Series from the DataFrame's Date column. Then it attempts to perform a groupby on that Series; without a specified key, the Grouper defaults to the index, which here is the default integer index (hence the Int64Index in the error).
This will work:
df.groupby(pd.Grouper(key='Date', freq='M'))
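The line above returns a groupby object, so you still need an aggregation such as `.sum()`. A minimal sketch of an equivalent alternative, assuming hypothetical data with an `amount` column: group by monthly periods derived from the column itself, which sidesteps the Grouper key entirely. (Recent pandas versions also prefer the alias 'ME' over 'M' for month-end in Grouper and resample; period conversion still uses 'M'.)

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-01-05', '2021-01-20', '2021-02-03', '2021-03-15'],
    'amount': [10, 20, 5, 7],
})
df['Date'] = pd.to_datetime(df['Date'])

# Group the whole frame by the calendar month of each date, then aggregate
monthly = df.groupby(df['Date'].dt.to_period('M'))['amount'].sum()
```

Each group label is a monthly Period (e.g. 2021-01), which also prints more readably than a month-end timestamp.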

Select Pandas dataframe rows based on 'hour' datetime

I have a pandas dataframe 'df' with a column 'DateTimes' of type datetime.time.
The entries of that column are hours of a single day:
00:00:00
...
23:59:00
The seconds are always zero; the values step by one minute.
How can I choose rows by hour, for example the rows between 00:00:00 and 00:01:00?
If I try this:
df.between_time('00:00:00', '00:00:10')
I get an error that the index must be a DatetimeIndex.
I set the index as such with:
df=df.set_index(keys='DateTime')
but I get the same error.
I can't seem to get 'loc' to work either. Any suggestions?
Here is a working example of what you are trying to do:
import numpy as np
import pandas as pd
times = pd.date_range('3/6/2012 00:00', periods=100, freq='S', tz='UTC')
df = pd.DataFrame(np.random.randint(10, size=(100, 1)), index=times)
df.between_time('00:00:00', '00:00:30')
Note the index has to be of type DatetimeIndex.
I understand you have a column with your dates/times. The problem is probably that your column is not of this type, so you have to convert it first, before setting it as the index:
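# Method A: convert the column, then promote it to the index
df['column_name'] = pd.to_datetime(df['column_name'])
df = df.set_index('column_name')  # drop=True is the default
# Method B
df.index = pd.to_datetime(df['column_name'])
df = df.drop('column_name', axis=1)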
(In Method B, the drop is only necessary if you want to remove the original column after using it as the index; set_index in Method A already drops it.)
Check out these links:
convert column to date type: Convert DataFrame column type from string to datetime
filter dataframe on dates: Filtering Pandas DataFrames on dates
Hope this helps
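Since the question's column holds datetime.time objects (no date part), pd.to_datetime may not accept them directly. A minimal sketch of two workarounds, assuming hypothetical minute-by-minute data: compare the time objects directly, or attach an arbitrary date (2000-01-01 here, purely an assumption) so pandas can build a DatetimeIndex for between_time.

```python
from datetime import time

import numpy as np
import pandas as pd

# Hypothetical frame: one row per minute of a day, stored as datetime.time
times = [time(h, m) for h in range(24) for m in range(60)]
df = pd.DataFrame({'DateTimes': times, 'value': np.arange(len(times))})

# Option 1: compare the time objects directly; no DatetimeIndex needed
mask = (df['DateTimes'] >= time(0, 0)) & (df['DateTimes'] <= time(0, 1))
first_two = df[mask]

# Option 2: prepend an arbitrary date, build a DatetimeIndex, use between_time
stamps = pd.to_datetime('2000-01-01 ' + df['DateTimes'].astype(str))
by_time = df.set_index(stamps).between_time('00:00:00', '00:01:00')
```

Both select the rows at 00:00 and 00:01, since both bounds are inclusive here.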

Resampling pandas columns datetime

(I think) I have a dataset with the columns representing datetime intervals
The columns were converted to datetime with:
for col in df.columns:
    df.rename({col: pd.to_datetime(col, infer_datetime_format=True)}, inplace=True)
Then I need to resample the columns (labelled by year and month, e.g. '2001-01') into quarters using the mean.
I tried
df = df.resample('1q', how='mean', axis=1)
The DataFrame also has a MultiIndex set on ['RegionName', 'County'].
But I get the error:
Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
Is the problem in the to_datetime function or in the wrong sampling?
(I think) you are renaming each column header one at a time rather than making the entire columns object a DatetimeIndex.
Try this instead:
df.columns = pd.to_datetime(df.columns)
Then run your resample
note:
I'd use periods after converting to a DatetimeIndex. That way, you get the period (e.g. 2011Q1) in your column header rather than the end date of the quarter.
df.groupby(df.columns.to_period('Q'), axis=1).mean()
demo
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(12).reshape(2, -1),
                  columns=['2011-01-31', '2011-02-28', '2011-03-31',
                           '2011-04-30', '2011-05-31', '2011-06-30'])
df.columns = pd.to_datetime(df.columns)
print(df.groupby(df.columns.to_period('Q'), axis=1).mean())
   2011Q1  2011Q2
0     1.0     4.0
1     7.0    10.0
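groupby(..., axis=1) is deprecated since pandas 2.1. A minimal sketch of the same quarterly mean without axis=1, reusing the demo data: transpose so the dates become the row index, group the rows by quarter, then transpose back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(2, -1),
                  columns=['2011-01-31', '2011-02-28', '2011-03-31',
                           '2011-04-30', '2011-05-31', '2011-06-30'])
df.columns = pd.to_datetime(df.columns)

# Dates become rows, rows are grouped by quarter, quarters become columns again
quarterly = df.T.groupby(df.columns.to_period('Q')).mean().T
```

This yields the same 2011Q1/2011Q2 table as the demo above.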
