Cannot do positional indexing on DatetimeIndex with these indexers - python

I have a time series dataframe "his" with a datetime index, from which I want to extract a certain range of timesteps. The starting point "selected_var_start" is a DatetimeIndex (extracted from another dataframe). I want to extract 24 h in total.
I tried to do it like this: his = his.iloc[selected_var_start:(selected_var_start+pd.DateOffset(hours=24))]
I get the following error: TypeError: cannot do positional indexing on DatetimeIndex with these indexers [DatetimeIndex(['2010-11-12 19:00:00'], dtype='datetime64[ns]', name='Datetime', freq=None)] of type DatetimeIndex
What am I doing wrong?

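iloc is purely positional and only accepts integer positions, so slicing by dates needs loc instead. You also need to access the first element of selected_var_start: selected_var_start[0] turns your one-element DatetimeIndex into a plain Timestamp, which works as a slice bound with loc.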
his.loc[selected_var_start[0]:(selected_var_start[0]+pd.DateOffset(hours=24))]
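A minimal, self-contained sketch of the loc approach, assuming hourly data; the toy frame his, its value column and the timestamps are made up for illustration. Label slicing with loc includes both endpoints, so a span from 19:00 to 19:00 the next day returns 25 hourly rows; a boolean mask gives a half-open window of exactly 24 rows.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data standing in for "his"
idx = pd.date_range('2010-11-12', periods=72, freq='60min', name='Datetime')
his = pd.DataFrame({'value': np.arange(72)}, index=idx)

# A one-element DatetimeIndex, like the one extracted from the other dataframe
selected_var_start = pd.DatetimeIndex(['2010-11-12 19:00:00'], name='Datetime')

start = selected_var_start[0]           # plain Timestamp, usable with loc
end = start + pd.DateOffset(hours=24)

# Label-based slice: both endpoints are included (25 hourly rows)
window = his.loc[start:end]

# Half-open window [start, end): exactly 24 hourly rows
window_24 = his[(his.index >= start) & (his.index < end)]

print(len(window), len(window_24))
```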

Alternatively, use loc with date strings:
pd.DataFrame(index=pd.date_range('11/01/2010', '12/01/2010')).loc['2010-11-12': '2010-11-13']
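With intraday data, the same string slicing relies on partial string indexing: a date-only string stands for the whole day. The sketch below assumes hourly rows over the same month as the answer's example (the value column is made up) and shows how the day-level, two-day and time-qualified slices differ in length.

```python
import pandas as pd

# Hypothetical hourly index over the same month as the example above
idx = pd.date_range('2010-11-01', '2010-12-01', freq='60min')
df = pd.DataFrame({'value': range(len(idx))}, index=idx)

one_day = df.loc['2010-11-12']                            # every hour of Nov 12
two_days = df.loc['2010-11-12':'2010-11-13']              # end string covers all of Nov 13
from_19h = df.loc['2010-11-12 19:00':'2010-11-13 19:00']  # both endpoints included

print(len(one_day), len(two_days), len(from_19h))
```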

Related

Error when resampling a datetime64 time series dataframe

I have a Pandas v1.0.3 dataframe where the index is a timestamp and the single column is a numeric value. The index has a constant 30-minute interval. The column represents a count, and I want to resample it so that the index has a 2-day interval.
It seems that I would need to do something like this:
df.index = pd.to_datetime(df.index)
df.resample('2D', on='column_name').sum()
However, I am getting the following error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
If I print out df.index I get:
DatetimeIndex(['2016-12-31 23:59:59', '2017-01-01 00:29:59',
...
dtype='datetime64[ns]', length=500, freq='30T')
Which seems to be compatible. Does anyone know what I'm doing wrong here?
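The index is compatible, but on='column_name' tells pandas to resample on that column instead of the index, and that column is not datetime-like. Since the DatetimeIndex is already in place, just drop the on argument: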
df.resample('2D').sum()
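A runnable sketch with made-up half-hourly counts (the column name count and the timestamp column are assumptions). It resamples on the DatetimeIndex directly, and for contrast shows the case where on= is appropriate: when the timestamps live in a regular column rather than in the index.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the question: 500 half-hourly counts
idx = pd.date_range('2016-12-31 23:59:59', periods=500, freq='30min')
df = pd.DataFrame({'count': np.arange(500)}, index=idx)

# Resample on the DatetimeIndex itself: no on= needed
by_index = df.resample('2D').sum()

# If the timestamps were a column instead, on= names that column
df_col = df.rename_axis('timestamp').reset_index()
by_column = df_col.resample('2D', on='timestamp')['count'].sum()

print(by_index.head())
```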

Length between two dates in a time series in pandas data frame

I have a time series of weekdays with anomalous/unpredictable holidays. For any given row, I want to know the number of rows up to the date given in column 'date1'. For example:
len(df.loc['2019-10-18':'2019-11-15']) returns the correct answer.
I am trying to create a column 'shift' that will calculate the above.
Both DatetimeIndex and the 'date1' are dtype 'datetime64[ns]'
df['shift'] = len(df.loc[df.index : df['date1']]) clearly doesn't work, but might there be a solution that does?
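IIUC, use:
df['len'] = (df['date1'] - df.index).dt.days
Note that this counts calendar days between the two dates, not rows, so weekends and holidays are included in the count.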
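If you need the number of rows (as len(df.loc[...]) gives) rather than calendar days, one possible approach, swapped in here and not taken from the answer, is DatetimeIndex.searchsorted: for each row it finds the position just past the last index entry on or before date1. The sketch assumes a sorted, unique index; the holidays and the 28-day date1 offset are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical weekday index with two made-up holidays removed
idx = pd.bdate_range('2019-10-14', '2019-11-29')
idx = idx.drop(pd.to_datetime(['2019-10-25', '2019-11-11']))
df = pd.DataFrame(index=idx)
df['date1'] = df.index + pd.Timedelta(days=28)   # made-up target dates

# Position of each row, and position just past the last row <= date1
# (requires the index to be sorted and unique)
start_pos = np.arange(len(df))
end_pos = df.index.searchsorted(df['date1'].to_numpy(), side='right')
df['shift'] = end_pos - start_pos   # same count as len(df.loc[row_date:date1])

# Calendar-day difference, for comparison
df['days'] = (df['date1'] - df.index.to_series()).dt.days

print(df.loc['2019-10-18'])
```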

How do I group date by month using pd.Grouper?

I've searched Stack Overflow to find out how to group datetimes by month, and for some reason I keep receiving this error, even after I pass the dataframe through pd.to_datetime:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
df['Date'] = pd.to_datetime(df['Date'])
df['Date'].groupby(pd.Grouper(freq='M'))
When I check the dtype of df['Date'], it shows dtype: datetime64[ns]. Any ideas why I keep getting this error?
The reason is simple: you didn't pass a groupby key to groupby.
What you want is to group the entire dataframe by the month values of the contents of df['Date'].
However, what df['Date'].groupby(pd.Grouper(freq='M')) actually does is first extract a pd.Series from the DataFrame's Date column. Then, it attempts to perform a groupby on that Series; without a specified key, it defaults to attempting to group by the index, which is of course numeric.
This will work:
df.groupby(pd.Grouper(key='Date', freq='M'))
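A small end-to-end sketch of the fix, with made-up Date and Sales columns. groupby with pd.Grouper only forms the groups, so an aggregation such as sum() is still needed to get one row per month. It uses freq='MS' (month start) so that it runs without deprecation warnings across pandas versions; the answer's 'M' (month end) is spelled 'ME' from pandas 2.2 on.

```python
import pandas as pd

# Hypothetical data: Date arrives as strings, as in the question
df = pd.DataFrame({
    'Date': ['2021-01-05', '2021-01-20', '2021-02-03', '2021-03-15', '2021-03-30'],
    'Sales': [10, 20, 5, 7, 8],
})
df['Date'] = pd.to_datetime(df['Date'])

# Group the whole frame by the month of the Date column, then aggregate
monthly = df.groupby(pd.Grouper(key='Date', freq='MS'))['Sales'].sum()

print(monthly)
```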

Python cleaning dates for conversion to year only in Pandas

I have a large data set that users entered into a CSV. I converted the CSV into a dataframe with pandas. The column has over 1000 entries; here is a sample:
datestart
5/5/2013
6/12/2013
11/9/2011
4/11/2013
10/16/2011
6/15/2013
6/19/2013
6/16/2013
10/1/2011
1/8/2013
7/15/2013
7/22/2013
7/22/2013
5/5/2013
7/12/2013
7/29/2013
8/1/2013
7/22/2013
3/15/2013
6/17/2013
7/9/2013
3/5/2013
5/10/2013
5/15/2013
6/30/2013
6/30/2013
1/1/2006
00/00/0000
7/1/2013
12/21/2009
8/14/2013
Feb 1 2013
Then I tried converting the dates into years using
df['year']=df['datestart'].astype('timedelta64[Y]')
But it gave me an error:
ValueError: Value cannot be converted into object Numpy Time delta
Using datetime64:
df['year']=pd.to_datetime(df['datestart']).astype('datetime64[Y]')
it gave:
ValueError: Error parsing datetime string "03/13/2014" at position 2
Since that column was filled in by users, most entries are in the format MM/DD/YYYY, but some were entered like this: Feb 10 2013, and there was one entry like this: 00/00/0000. I am guessing the different formats broke the processing.
Is there a try block, an if statement, or something else that lets me skip over problems like these?
If datetime parsing fails, I will be forced to use a str.extract script, which also works:
year = df['datestart'].str.extract(r"(?P<month>[0-9]+)(-|\/)(?P<day>[0-9]+)(-|\/)(?P<year>[0-9]+)")
del year['month'], year['day']
and use concat to take the year out.
With df['year']=pd.to_datetime(df['datestart'],coerce=True, errors ='ignore').astype('datetime64[Y]'), the error message is:
Traceback (function, file, line):
<module> C:\Users\0\Desktop\python\Example.py 23
astype C:\Python33\lib\site-packages\pandas\core\generic.py 2062
astype C:\Python33\lib\site-packages\pandas\core\internals.py 2491
apply C:\Python33\lib\site-packages\pandas\core\internals.py 3728
astype C:\Python33\lib\site-packages\pandas\core\internals.py 1746
_astype C:\Python33\lib\site-packages\pandas\core\internals.py 470
_astype_nansafe C:\Python33\lib\site-packages\pandas\core\common.py 2222
TypeError: cannot astype a datetimelike from [datetime64[ns]] to [datetime64[Y]]
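You first have to convert the column with the date values to datetimes with to_datetime():
df['datestart'] = pd.to_datetime(df['datestart'], errors='coerce')
This should normally parse the different formats flexibly. errors='coerce' (spelled coerce=True in very old pandas versions) is important here, because it converts invalid dates such as 00/00/0000 to NaT. In pandas 2.0 and later, also pass format='mixed'; otherwise the format inferred from the first entry is applied to every entry, and values like Feb 1 2013 become NaT.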
If you then want only the year part of the dates, you can do the following (calling astype directly on the pandas column seems to give an error, but .values gives you the underlying NumPy array):
df['datestart'].values.astype('datetime64[Y]')
The problem is that assigning this to a column again raises an error because of the NaT values (this seems to be a bug; you can work around it with df = df.dropna()). Also, when you assign it to a column, it gets converted back to datetime64[ns], because that is how pandas stores datetimes. So if you want a column with the years, I think you are better off doing the following:
df['year'] = pd.DatetimeIndex(df['datestart']).year
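This last one returns the year as an integer (or as a float with NaN if any dates were coerced to NaT). In current pandas, df['datestart'].dt.year gives the same result.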
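A sketch of a stricter alternative, swapped in here rather than taken from the answer: parse the two formats seen in the sample explicitly, one after the other, so that anything matching neither (such as 00/00/0000) ends up as NaT instead of raising. The six strings are a subset of the question's sample; the nullable Int64 dtype keeps the years as integers despite the missing value.

```python
import pandas as pd

# A few entries from the question's sample, including the problem cases
df = pd.DataFrame({'datestart': ['5/5/2013', '11/9/2011', '1/1/2006',
                                 '00/00/0000', 'Feb 1 2013', '12/21/2009']})

# Majority format first; entries that do not match become NaT
parsed = pd.to_datetime(df['datestart'], format='%m/%d/%Y', errors='coerce')

# Fill the gaps with the "Feb 1 2013" style; anything still unparseable stays NaT
parsed = parsed.fillna(pd.to_datetime(df['datestart'], format='%b %d %Y', errors='coerce'))

# Nullable integer years, with <NA> where no date could be parsed
df['year'] = parsed.dt.year.astype('Int64')

print(df)
```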

Frequency of a data frame

I have a data frame indexed with a date (Python datetime object). How could I find the frequency as the number of months of data in the data frame?
I tried the attribute data_frame.index.freq, but it returns None. I also tried the asfreq function, using data_frame.asfreq('M', how={'start','end'}), but it does not return the expected results. Please advise how I can get the expected results.
You want to convert your index of datetimes to a DatetimeIndex; the easiest way is to use to_datetime:
df.index = pd.to_datetime(df.index)
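Now you can do time series operations on the frame, like resample or groupby with pd.Grouper (the older TimeGrouper has since been removed from pandas).
If your data has a consistent frequency, df.index.inferred_freq (or pd.infer_freq(df.index)) will report it; if it doesn't (e.g. if some days are missing), it will be None. df.index.freq itself stays None after to_datetime; it is only set when the index is created with a frequency, e.g. by date_range.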
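A short sketch, assuming 60 consecutive daily rows built from Python datetime objects (made-up data). In current pandas, to_datetime leaves freq unset, so the sketch reads the frequency from inferred_freq and then counts the distinct calendar months, which is what the question asks for.

```python
import datetime as dt
import pandas as pd

# Hypothetical daily data indexed by Python datetime objects (2012 is a leap year)
dates = [dt.datetime(2012, 1, 1) + dt.timedelta(days=i) for i in range(60)]
df = pd.DataFrame({'x': range(60)}, index=dates)
df.index = pd.to_datetime(df.index)

print(df.index.freq)            # None: to_datetime does not attach a frequency
print(df.index.inferred_freq)   # frequency inferred from the spacing of the dates

# With a missing day the spacing is irregular, so nothing can be inferred
gappy = df.drop(df.index[10])

# Number of distinct calendar months covered by the data
n_months = df.index.to_period('M').nunique()
print(n_months)
```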
You probably want to use pandas Timestamps for your index instead of datetime objects in order to use freq. See the example below:
import pandas as pd
dates = pd.date_range('2012-1-1','2012-2-1')
df = pd.DataFrame(index=dates)
print(df.index.freq)
yields:
<Day>
You can easily convert your dataframe like so:
df.index = [pd.Timestamp(d) for d in df.index]
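Converting the entries to Timestamps gives a DatetimeIndex, but in current pandas it still has freq None. A minimal sketch, reusing the date_range example above, of one way to attach the frequency afterwards: the DatetimeIndex constructor's freq='infer' option.

```python
import pandas as pd

dates = pd.date_range('2012-1-1', '2012-2-1')
df = pd.DataFrame(index=dates)

# Rebuilding the index from a list of Timestamps drops the frequency
df.index = [pd.Timestamp(d) for d in df.index]
before = df.index.freq

# Ask pandas to infer the frequency from the (regular) spacing
df.index = pd.DatetimeIndex(df.index, freq='infer')
after = df.index.freqstr

print(before, after)
```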
