I've searched Stack Overflow to find out how to group a DateTime column by month, but I keep receiving this error, even after I pass the column through pd.to_datetime:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
df['Date'] = pd.to_datetime(df['Date'])
df['Date'].groupby(pd.Grouper(freq='M'))
When I check the datatype of df['Date'], it shows dtype: datetime64[ns]. Any ideas why I keep getting this error?
The reason is simple: you didn't pass a groupby key to groupby.
What you want is to group the entire dataframe by the month values of the contents of df['Date'].
However, what df['Date'].groupby(pd.Grouper(freq='M')) actually does is first extract a pd.Series from the DataFrame's Date column. Then, it attempts to perform a groupby on that Series; without a specified key, it defaults to attempting to group by the index, which is of course numeric.
This will work:
df.groupby(pd.Grouper(key='Date', freq='M'))
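To see the fix end to end, here is a minimal sketch on a made-up frame (the column name value and the sample dates are assumptions for illustration). It uses freq='MS', which labels each group by the first day of the month, only because the month-end alias changed from 'M' to 'ME' in newer pandas releases; the key= argument is the part that matters.

```python
import pandas as pd

# Hypothetical sample data; "value" is an illustrative column name.
df = pd.DataFrame({
    "Date": ["2021-01-05", "2021-01-20", "2021-02-03", "2021-03-15"],
    "value": [1, 2, 3, 4],
})
df["Date"] = pd.to_datetime(df["Date"])

# key="Date" tells the Grouper to bin on the column, not on the integer index.
monthly = df.groupby(pd.Grouper(key="Date", freq="MS"))["value"].sum()
print(monthly)
# 2021-01-01 -> 3, 2021-02-01 -> 3, 2021-03-01 -> 4
```

Calling df.set_index('Date') first and then grouping with pd.Grouper(freq=...) should give the same bins, since the Grouper then finds a DatetimeIndex to work on.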
I have a Pandas v1.0.3 dataframe where the index is a timestamp and the one column is a numeric value. The index is at a constant time interval difference of 30 minutes. The single column represents a count, and I want to resample it so the index is a 2 day interval.
It seems that I would need to do something like this:
df.index = pd.to_datetime(df.index)
df.resample('2D', on='column_name').sum()
However, I am getting the following error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
If I print out df.index I get:
DatetimeIndex(['2016-12-31 23:59:59', '2017-01-01 00:29:59',
...
dtype='datetime64[ns]', length=500, freq='30T')
Which seems to be compatible. Does anyone know what I'm doing wrong here?
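The index is compatible, but the on='column_name' argument tells pandas to resample on that column instead of the index, and that column is numeric rather than datetime-like. Just drop the argument: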
df.resample('2D').sum()
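A small sketch of both variants, assuming a synthetic 30-minute series (the column names count and timestamp are made up for illustration): resampling on the index needs no on= argument, while on= is meant for the case where the timestamps live in a regular column.

```python
import pandas as pd

# Four days of 30-minute samples, each with a count of 1 (synthetic data).
idx = pd.date_range("2017-01-01", periods=192, freq="30min")
df = pd.DataFrame({"count": 1}, index=idx)

# Timestamps are in the index: resample directly.
two_day = df.resample("2D").sum()

# Timestamps are in a column: name that column with on=.
as_column = df.rename_axis("timestamp").reset_index()
two_day_from_column = as_column.resample("2D", on="timestamp").sum()

print(two_day)  # two rows, each summing 96 half-hour samples
```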
I am trying to reduce the code bloat in my project for the process of creating various date columns (weekday, business day, day index, week index) and I was wondering how I can take the index of my dataframe and build datetime attribute columns from the index.
I thought I could access the .index or index.values and then reference the datetime attributes like month, weekday, etc., but it doesn't appear that Index has those attributes. Would I need to convert the index values to a new list and then build the columns off of that?
Here is my code:
historicals = pd.read_csv("2018-2019_sessions.csv", index_col="date", na_values=0)
type(historicals)
# date format = 2018-01-01, 2018-01-02, etc.
# Additional Date Fields
date_col = historicals.index
date_col.weekday
# AttributeError: 'Index' object has no attribute 'weekday'
Your index is in string format. Your historicals.index probably looks like this:
print(historicals.index)
Index(['2018-01-01', '2018-01-02'], dtype='object')
You need to convert it to a DatetimeIndex, then read its weekday attribute and assign it to a new column:
historicals['weekday'] = pd.to_datetime(historicals.index).weekday
Or
date_col = pd.to_datetime(historicals.index)
print(date_col.weekday)
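Since the goal was to build several date columns at once, a sketch along these lines may help. The frame below is a stand-in for the CSV (the sessions column and the is_business_day name are assumptions); converting the index once lets every attribute be read straight off it.

```python
import pandas as pd

# Stand-in for the CSV: a string index named "date" and one data column.
historicals = pd.DataFrame(
    {"sessions": [10, 12, 9]},
    index=pd.Index(["2018-01-01", "2018-01-02", "2018-01-06"], name="date"),
)

# Convert once, then derive as many columns as needed from the DatetimeIndex.
historicals.index = pd.to_datetime(historicals.index)
historicals["weekday"] = historicals.index.weekday          # Monday == 0
historicals["month"] = historicals.index.month
historicals["is_business_day"] = historicals.index.weekday < 5

print(historicals)
```

Passing parse_dates=True to pd.read_csv together with index_col="date" should make the conversion step unnecessary, because the index is then parsed while the file is read.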
I have a pandas dataframe 'df' with a column 'DateTimes' of type datetime.time.
The entries of that column are hours of a single day:
00:00:00
.
.
.
23:59:00
Seconds are skipped, it counts by minutes.
How can I choose rows by hour, for example the rows between 00:00:00 and 00:01:00?
If I try this:
df.between_time('00:00:00', '00:00:10')
I get an error that the index must be a DatetimeIndex.
I set the index as such with:
df=df.set_index(keys='DateTime')
but I get the same error.
I can't seem to get 'loc' to work either. Any suggestions?
Here is a working example of what you are trying to do:
import numpy as np
import pandas as pd
times = pd.date_range('3/6/2012 00:00', periods=100, freq='S', tz='UTC')
df = pd.DataFrame(np.random.randint(10, size=(100,1)), index=times)
df.between_time('00:00:00', '00:00:30')
Note the index has to be of type DatetimeIndex.
I understand you have a column with your dates/times. The problem is probably that the column is not of a datetime type, so you have to convert it before setting it as the index:
# Method A
df = df.set_index(pd.to_datetime(df['column_name']), drop=True)
# Method B
df.index = pd.to_datetime(df['column_name'])
df = df.drop('column_name', axis=1)
(The drop is only necessary if you want to remove the original column after setting it as index)
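For the case in the question, where the column holds datetime.time values, one possible sketch is to turn the times into strings and parse them with an explicit format. The sample rows and the value column are assumptions; the parsed timestamps all land on a placeholder date, which is harmless because between_time only looks at the time of day.

```python
import datetime

import pandas as pd

# Hypothetical frame with a column of datetime.time objects.
df = pd.DataFrame({
    "DateTime": [
        datetime.time(0, 0),
        datetime.time(0, 1),
        datetime.time(0, 2),
        datetime.time(12, 30),
    ],
    "value": [1, 2, 3, 4],
})

# time objects -> "HH:MM:SS" strings -> timestamps on a placeholder date.
stamps = pd.to_datetime(df["DateTime"].astype(str), format="%H:%M:%S")
df.index = pd.DatetimeIndex(stamps)
df = df.drop("DateTime", axis=1)

# Both endpoints are included by default.
first_minute = df.between_time("00:00:00", "00:01:00")
print(first_minute)
```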
Check out these links:
convert column to date type: Convert DataFrame column type from string to datetime
filter dataframe on dates: Filtering Pandas DataFrames on dates
Hope this helps
I've got an imported csv file which has multiple columns with dates in the format "5 Jan 2001 10:20". (Note not zero-padded day)
If I check df.dtypes, the columns show as object rather than string or datetime. I need to be able to subtract two column values to work out the difference, so I'm trying to get them into a state where I can do that.
At the moment if I try the test subtraction at the end I get the error unsupported operand type(s) for -: 'str' and 'str'.
I've tried multiple methods but have run into a problem every way I've tried.
Any help would be appreciated. If I need to give any more information then I will.
As suggested by @MaxU, you can use the pd.to_datetime() method to convert the values of the given column to the appropriate dtype, like this:
df['datetime'] = pd.to_datetime(df.datetime)
You would have to do this for every column that needs to be transformed to the right dtype.
Alternatively, you can use the parse_dates argument of the pd.read_csv() method, like this:
df = pd.read_csv(path, parse_dates=[1,2,3])
where columns 1,2,3 are expected to contain data that can be interpreted as dates.
I hope this helps.
Convert the column to datetime using this approach:
df["Date"] = pd.to_datetime(df["Date"])
If the column has empty or unparseable values, pass errors='coerce' so that they become NaT instead of raising an error:
df["Date"] = pd.to_datetime(df["Date"], errors='coerce')
After which you should be able to subtract two dates.
example:
import pandas
df = pandas.DataFrame(columns=['to','fr','ans'])
df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')]
df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')]
(df.fr - df.to) / pandas.Timedelta(hours=1)  # difference in (fractional) hours; astype('timedelta64[h]') is rejected by newer pandas
consult this answer for more details:
Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes
If you want to load the column directly as datetime while reading from CSV, consider this example:
Pandas read csv dateint columns to datetime
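I found that the problem was to do with missing values within the column. Coercing them, so df["Date"] = pd.to_datetime(df["Date"], errors='coerce'), solves the problem. (Very old pandas versions spelled this coerce=True; current versions only accept errors='coerce'.)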
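Putting the answers above together for the "5 Jan 2001 10:20" format, a sketch might look like this. The column names start, end and hours and the sample rows are made up; the explicit format string is an assumption about the data, and errors='coerce' turns the missing entry into NaT so the subtraction still runs.

```python
import pandas as pd

# Hypothetical columns in the "5 Jan 2001 10:20" style, one value missing.
df = pd.DataFrame({
    "start": ["5 Jan 2001 10:20", "12 Feb 2001 08:05", None],
    "end": ["5 Jan 2001 12:50", "13 Feb 2001 08:05", "1 Mar 2001 09:00"],
})

for col in ["start", "end"]:
    # %d also accepts a day without a leading zero.
    df[col] = pd.to_datetime(df[col], format="%d %b %Y %H:%M", errors="coerce")

# Subtracting datetime columns gives timedeltas; convert to hours.
df["hours"] = (df["end"] - df["start"]).dt.total_seconds() / 3600
print(df)
```

Rows where either side is NaT come out as NaN in hours, so they are easy to spot or drop afterwards.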
I have a dataframe with Timestamp entries in one column, created from strings like so:
df = pd.DataFrame({"x": pd.to_datetime("MARCH2016")})
Now I want to select from df based on month, cutting across years, by accessing the .month attribute of the datetime object. However, to_datetime actually created a Timestamp object from the string, and I can't seem to coerce it to datetime. The following works as expected:
type(df.x[0].to_datetime()) # gives datetime object
but using apply (which in my real life example of course I want to do given that I have more than one row) doesn't:
type(df.x.apply(pd.to_datetime)[0]) # returns Timestamp
What am I missing?
The fact that it's a Timestamp is irrelevant here. You can still access the month attribute using the .dt accessor:
In [79]:
df = pd.DataFrame({"x": [pd.to_datetime("MARCH2016")]})
df['x'].dt.month
Out[79]:
0 3
Name: x, dtype: int64
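To select rows by month across years, the same accessor can feed a boolean mask. This is a minimal sketch with made-up dates; it also checks that a pandas Timestamp already is a datetime.datetime (it is a subclass), so no conversion is needed.

```python
import datetime

import pandas as pd

# Hypothetical dates spanning two years.
df = pd.DataFrame({"x": pd.to_datetime(["2015-03-10", "2016-03-01", "2016-07-04"])})

# A Timestamp is a subclass of datetime.datetime.
is_datetime = isinstance(df["x"].iloc[0], datetime.datetime)

# Keep every March row, regardless of year.
march_rows = df[df["x"].dt.month == 3]
print(march_rows)
```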