How to group a datetime column by year? - python

I have a dataframe with a column of dtype datetime64[ns].
I want to group the rows by year. The dates look like this: 2019-01-08 02:27:17
I tried the following without success:
df1=df.groupby([(df.modification_datetime.year)]).sum()
AttributeError: 'Series' object has no attribute 'year'
Do you know how to solve that?
EDIT :
The solution is
df1=df.groupby(df.modification_datetime.dt.year).sum()
We don't need the brackets!

You don't need all of those brackets, and you need to use the dt accessor to extract the year from the date:
df1 = df.groupby(df.modification_datetime.dt.year).sum()
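A minimal sketch of the same idea on made-up data. Only the column name modification_datetime comes from the question; the value column and its contents are assumptions. Selecting the numeric column before summing avoids asking pandas to sum the datetime column itself.

```python
import pandas as pd

# Hypothetical sample data; only the column name modification_datetime
# is taken from the question.
df = pd.DataFrame({
    "modification_datetime": pd.to_datetime(
        ["2019-01-08 02:27:17", "2019-06-30 10:00:00", "2020-03-15 08:45:00"]
    ),
    "value": [1, 2, 5],
})

# .dt exposes datetime components of a Series; .dt.year is the grouping key.
df1 = df.groupby(df["modification_datetime"].dt.year)["value"].sum()
print(df1)
```

The result is indexed by year, so df1.loc[2019] gives the total for 2019.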

Related

How to convert "month-day" object to date format in pandas dataframe

I have a df with a column:
column
Dec-01
The column datatype is an object.
I am trying to change it to a datatype date.
This is what I've tried:
df['column'] = pd.to_datetime(df['column']).dt.strftime('%d-%b')
This is the error I receive:
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-12-01 00:00:00
Would really appreciate your help 🙏
You can use:
df['date'] = pd.to_datetime(df['column'], format='%b-%d').dt.strftime('%m/%d/22')
NB: you must hardcode the year because it is undefined in your original data; when the format contains no year, pandas defaults to 1900.
output:
column date
0 Dec-01 12/01/22
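If a real datetime column is wanted rather than a formatted string, one option is to append the year to the text before parsing. This is only a sketch: the year 2022 is an assumption (the data carries no year), the second sample row is made up, and %b expects English month abbreviations.

```python
import pandas as pd

df = pd.DataFrame({"column": ["Dec-01", "Jan-15"]})

# Assumption: every row belongs to 2022; the source data has no year.
assumed_year = 2022

# Append the year to each string, then parse with a matching format.
df["date"] = pd.to_datetime(df["column"] + f"-{assumed_year}", format="%b-%d-%Y")
print(df)
print(df.dtypes)
```

The date column stays a datetime dtype, so .dt accessors and date arithmetic keep working on it.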

How to offset Date to the beginning of the month?

I have the data frame that goes more or less like this:
Date x y z
1998-01-30 000445 Abbey National Plc 2.24455118179321
1998-01-30 001097 Mytravel Group 1.55792689323425
The 'Date' column is of type datetime64[ns], and I would like to shift each date to the beginning of its month. I tried this:
df['New Date'] = df['Date'].offsets.MonthBegin()
But returns an error:
AttributeError: 'Series' object has no attribute 'offsets'
Why is that? A single df column is a Series, right?
type(df['Date'])
Out[83]: pandas.core.series.Series
You could try
df['New_date'] = df.set_index('Date').index.to_period('M').to_timestamp('D')
This assumes that Date is already a datetime column. If it isn't, convert it first with:
df['Date'] = pd.to_datetime(df['Date'])
It's not essential, but it is good practice to use an underscore instead of a space in column names:
New_date instead of New Date. Consider making it lowercase as well.
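The same result can also be reached without touching the index, by going through the .dt accessor on the column. A minimal sketch, assuming Date is already datetime64; the sample dates other than 1998-01-30 are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"Date": pd.to_datetime(["1998-01-30", "1998-02-14", "1998-03-01"])}
)

# Convert each date to its monthly period, then back to a timestamp.
# to_timestamp() defaults to the start of the period, i.e. the first of the month.
df["New_date"] = df["Date"].dt.to_period("M").dt.to_timestamp()
print(df)
```

A date that already falls on the first of the month is left unchanged by this round trip.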

Using DataFrame Index Dates for Date Column Creation

I am trying to reduce the code bloat in my project for the process of creating various date columns (weekday, business day, day index, week index) and I was wondering how I can take the index of my dataframe and build datetime attribute columns from the index.
I thought I could access the .index or index.values and then reference the datetime attributes like month, weekday, etc., but it doesn't appear that Index has those attributes. Would I need to convert the index values to a new list and then build the columns off of that?
Here is my code:
historicals = pd.read_csv("2018-2019_sessions.csv", index_col="date", na_values=0)
type(historicals)
# date format = 2018-01-01, 2018-01-02, etc.
# Additional Date Fields
date_col = historicals.index
date_col.weekday
# AttributeError: 'Index' object has no attribute 'weekday'
Your index is in string format. Your historicals.index probably looks like this:
print(historicals.index)
Index(['2018-01-01', '2018-01-02'], dtype='object')
You need to convert it to a DatetimeIndex, read its weekday attribute, and assign the result to a new column:
historicals['weekday'] = pd.to_datetime(historicals.index).weekday
Or
date_col = pd.to_datetime(historicals.index)
print(date_col.weekday)
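Another option is to have read_csv parse the index as dates up front, so that every date column can be built straight from the index. This sketch uses an in-memory CSV in place of the real file; the sessions column and the is_business_day rule (Monday to Friday, ignoring holidays) are assumptions.

```python
import io

import pandas as pd

# Stand-in for "2018-2019_sessions.csv"; the sessions column is hypothetical.
csv_text = "date,sessions\n2018-01-01,10\n2018-01-02,12\n2018-01-06,7\n"

# parse_dates=True asks pandas to parse the index column as dates.
historicals = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

# A DatetimeIndex exposes datetime attributes directly (no .dt needed).
historicals["weekday"] = historicals.index.weekday  # Monday=0 ... Sunday=6
historicals["month"] = historicals.index.month
# Simplified rule: weekdays only, public holidays are not considered.
historicals["is_business_day"] = historicals.index.weekday < 5
print(historicals)
```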

How do I group date by month using pd.Grouper?

I've searched Stack Overflow to find out how to group a datetime column by month, and for some reason I keep receiving this error, even after I pass the column through pd.to_datetime:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or
PeriodIndex, but got an instance of 'Int64Index'
df['Date'] = pd.to_datetime(df['Date'])
df['Date'].groupby(pd.Grouper(freq='M'))
When I pull the datatype for df['Date'] it shows dtype: datetime64[ns]. Any ideas why I keep getting this error?
The reason is simple: you didn't pass a groupby key to groupby.
What you want is to group the entire dataframe by the month values of the contents of df['Date'].
However, what df['Date'].groupby(pd.Grouper(freq='M')) actually does is first extract a pd.Series from the DataFrame's Date column. Then, it attempts to perform a groupby on that Series; without a specified key, it defaults to attempting to group by the index, which is of course numeric.
This will work:
df.groupby(pd.Grouper(key='Date', freq='M'))
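A small end-to-end sketch of grouping by month with key=. The amount column and the sample rows are assumptions. It uses freq="MS" (month start) instead of "M", so groups are labelled by the first day of each month; newer pandas releases also deprecate the "M" alias in favor of "ME" for month end, so "MS" sidesteps that difference.

```python
import pandas as pd

# Hypothetical data; only the Date column name comes from the question.
df = pd.DataFrame({
    "Date": ["2021-01-05", "2021-01-20", "2021-02-03"],
    "amount": [10, 20, 5],
})
df["Date"] = pd.to_datetime(df["Date"])

# key="Date" tells the Grouper to bin on that column rather than on the
# default integer index.
monthly = df.groupby(pd.Grouper(key="Date", freq="MS"))["amount"].sum()
print(monthly)
```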

AttributeError: 'Series' object has no attribute 'days'

I have a column 'delta' in a dataframe, dtype: timedelta64[ns], calculated by subtracting one date from another. I am trying to return the number of days as a float by using this code:
from datetime import datetime
from datetime import date
df['days'] = float(df['delta'].days)
but I receive this error:
AttributeError: 'Series' object has no attribute 'days'
Any ideas why?
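A DataFrame column is a Series, and on a Series you need the .dt accessor to get the days (if you are using a newer pandas version). Calling float() on a whole Series also fails, so convert with astype(float) instead. See the pandas documentation for the .dt accessor.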
So, you need to change:
df['days'] = float(df['delta'].days)
To
df['days'] = df['delta'].dt.days.astype(float)
Alternatively, when subtracting the dates you can convert the result directly:
df = pd.DataFrame([ pd.Timestamp('20010101'), pd.Timestamp('20040605') ])
(df.loc[0]-df.loc[1]).astype('timedelta64[D]')
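So basically use .astype('timedelta64[D]') on the subtracted column. Note that this relies on older pandas behavior; as far as I know, pandas 2.0 and later accept only 's', 'ms', 'us', and 'ns' as timedelta units in astype and raise an error for 'D'.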
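For a version-independent way to get days as a float, a sketch along these lines may help. The start and end columns and their values are made up; only the delta and days column names come from the question. Note the difference between .dt.days, which keeps only whole days, and dividing by a one-day Timedelta, which keeps the fractional part.

```python
import pandas as pd

# Hypothetical start/end columns used to build the delta column.
df = pd.DataFrame({
    "start": pd.to_datetime(["2001-01-01 00:00", "2004-06-05 00:00"]),
    "end": pd.to_datetime(["2001-01-03 12:00", "2004-06-06 00:00"]),
})
df["delta"] = df["end"] - df["start"]  # dtype: timedelta64

# Whole days only, cast to float (2 days 12 hours becomes 2.0).
df["whole_days"] = df["delta"].dt.days.astype(float)

# Fractional days: divide by a one-day Timedelta (2 days 12 hours becomes 2.5).
df["days"] = df["delta"] / pd.Timedelta(days=1)
print(df[["delta", "whole_days", "days"]])
```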
