Using DataFrame Index Dates for Date Column Creation - python

I am trying to reduce the code bloat in my project for the process of creating various date columns (weekday, business day, day index, week index) and I was wondering how I can take the index of my dataframe and build datetime attribute columns from the index.
I thought I could access the .index or index.values and then reference the datetime attributes like month, weekday, etc., but it doesn't appear that Index has those attributes. Would I need to convert the index values to a new list and then build the columns off of that?
Here is my code:
import pandas as pd
historicals = pd.read_csv("2018-2019_sessions.csv", index_col="date", na_values=0)
type(historicals)
# date format = 2018-01-01, 2018-01-02, etc.
# Additional Date Fields
date_col = historicals.index
date_col.weekday
# AttributeError: 'Index' object has no attribute 'weekday'

Your index is in string format. Your historicals.index probably looks like this:
print(historicals.index)
Index(['2018-01-01', '2018-01-02'], dtype='object')
You need to convert it to a DatetimeIndex, read its weekday attribute, and assign the result to a new column:
historicals['weekday'] = pd.to_datetime(historicals.index).weekday
Or
date_col = pd.to_datetime(historicals.index)
print(date_col.weekday)
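The same idea extends to the other date columns mentioned in the question. The following is a minimal sketch, not the asker's actual data: the inline CSV text and the "sessions" column are made up for illustration, and the definitions of "business day", "day index" and "week index" are assumptions (weekdays only, position in the frame, and whole weeks since the first date).

```python
import io

import pandas as pd

# Stand-in for "2018-2019_sessions.csv"; the "sessions" column is hypothetical.
csv_text = "date,sessions\n2018-01-01,10\n2018-01-02,12\n2018-01-06,7\n"

# parse_dates=True asks read_csv to parse the index column as dates,
# so the index arrives as a DatetimeIndex instead of plain strings.
historicals = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

idx = historicals.index
historicals["weekday"] = idx.weekday                    # Monday=0 ... Sunday=6
historicals["month"] = idx.month
historicals["is_business_day"] = idx.weekday < 5        # assumption: Mon-Fri, holidays ignored
historicals["day_index"] = range(len(historicals))      # assumption: row position
historicals["week_index"] = (idx - idx[0]).days // 7    # assumption: whole weeks since first date

print(historicals)
```

Parsing the dates once at load time means every later attribute lookup works directly on historicals.index without another pd.to_datetime call.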

Related

How to offset Date to the beginning of the month?

I have a data frame that looks more or less like this:
Date x y z
1998-01-30 000445 Abbey National Plc 2.24455118179321
1998-01-30 001097 Mytravel Group 1.55792689323425
The 'Date' column is of type datetime64[ns], and I would like to shift each date to the beginning of its month. I expected this to work:
df['New Date'] = df['Date'].offsets.MonthBegin()
But returns an error:
AttributeError: 'Series' object has no attribute 'offsets'
Why is that? A single df column is a Series, right?
type(df['Date'])
Out[83]: pandas.core.series.Series
You could try
df['New_date'] = df.set_index('Date').index.to_period('M').to_timestamp('D')
This assumes that Date is already a datetime column. If it isn't, convert it first:
df['Date'] = pd.to_datetime(df['Date'])
It's not essential, but it is good practice to use an underscore instead of a space between words in column names,
so New_date instead of New Date. You might also make it lowercase.
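As for why the original attempt failed: offsets such as MonthBegin live in pd.offsets and are not attributes of a Series, while datetime methods on a Series are reached through the .dt accessor. A minimal sketch of the same period round-trip using .dt, so no temporary index is needed (the two sample rows are illustrative, and the second date is made up):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["1998-01-30", "1998-02-15"]),
    "x": ["000445", "001097"],
})

# Convert each date to its monthly period, then back to a timestamp.
# to_timestamp() defaults to the start of the period, i.e. the first of the month.
df["New_date"] = df["Date"].dt.to_period("M").dt.to_timestamp()

print(df)
```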

How do I extract a DateTimeIndex for use in a new column?

I've extracted the dates from filenames in a set of Excel files into a list of DatetimeIndex objects. I now need to write the extracted date from each to a new date column for the dataframes I've created from each Excel sheet. My code works in that it writes the new 'Date' column to each dataframe, but the column ends up holding a generator object rather than the DatetimeIndex value, and I'm unable to get it into a %Y-%m-%d format.
Link to code creating the list of DatetimeIndex objects from the filenames:
How do I turn datefinder output into a list?
Code to write each list entry to a new 'Date' column in each dataframe created from the spreadsheets:
for i in range(0, len(df)):
    df[i]['Date'] = (event_dates_dto[i] for frames in df)
The involved objects:
type(event_dates_dto)
<class 'list'>
type(event_dates_dto[0])
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
event_dates_dto
[DatetimeIndex(['2019-03-29'], dtype='datetime64[ns]', freq=None), DatetimeIndex(['2019-04-13'], dtype='datetime64[ns]', freq=None), DatetimeIndex(['2019-05-11'], dtype='datetime64[ns]', freq=None)]
The dates were extracted using datefinder: http://www.blog.pythonlibrary.org/2016/02/04/python-the-datefinder-package/
I've tried using methods here that seemed like they could make sense but none of them are the right ticket: Keep only date part when using pandas.to_datetime
Again, the simple for loop is working correctly, but I'm unsure how to coerce the generator object into the correct format so that it not only writes to the new 'Date' column but is also in a useful '%Y-%m-%d' format that makes sense within the dataframe. Any help is greatly appreciated.
1) Force evaluation of the generator with a one-line loop like dates = [_ for _ in matches].
2) Convert the index to a column using .index (or .reset_index() if you don't need to keep the index).
3) Convert the column to datetime using pd.to_datetime().
4) Use the .dt.date accessor of the datetime column to keep only the Y-m-d date.
Here's a sample
import datefinder
import pandas as pd
data = '''Your appointment is on July 14th, 2016 15:24. Your bill is due 05/05/2016 16:00'''
matches = datefinder.find_dates(data)
# force evaluation with 1 line loop
dates = [_ for _ in matches] # 'dates = list(matches)' also works
df = pd.DataFrame({'dt_index':dates,'value':['appointment','bill']}).set_index('dt_index')
df['date'] = df.index
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.date
df
which gives
                           value        date
dt_index
2016-07-14 15:24:00  appointment  2016-07-14
2016-05-05 16:00:00         bill  2016-05-05
Edit: Edited to account for forced evaluation
A minor fix got it working; I was trying to do too much at once and overthinking it.
# Create an empty list and append the first (only) date from each DatetimeIndex.
event_dates_transfer = []
# Call .strftime('%Y-%m-%d') on event_dates_dto[i][0] here if you want a string instead of a Timestamp.
for i in range(0, len(event_dates_dto)):
    event_dates_transfer.append(event_dates_dto[i][0])
# Create a 'Date' column in each dataframe from the date in its filename and set it as the index.
for i in range(0, len(df)):
    new_date = event_dates_transfer[i]
    df[i]['Date'] = new_date
    df[i].set_index('Date', inplace=True)
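For comparison, here is a compact sketch of the same fix that pairs each dataframe with its date using zip and writes a '%Y-%m-%d' string. The two small frames and their "value" column are hypothetical stand-ins for the dataframes read from the Excel sheets; only event_dates_dto mirrors the shape shown in the question.

```python
import pandas as pd

# Same shape as in the question: a list of one-element DatetimeIndex objects.
event_dates_dto = [
    pd.DatetimeIndex(["2019-03-29"]),
    pd.DatetimeIndex(["2019-04-13"]),
]

# Hypothetical stand-ins for the per-spreadsheet dataframes.
frames = [
    pd.DataFrame({"value": [1, 2]}),
    pd.DataFrame({"value": [3]}),
]

for frame, dates in zip(frames, event_dates_dto):
    # dates[0] is a single Timestamp; strftime turns it into a plain string,
    # and assigning a scalar broadcasts it to every row of the frame.
    frame["Date"] = dates[0].strftime("%Y-%m-%d")

print(frames[0])
```

Drop the strftime call if the column should stay a datetime type rather than text.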

How do I group date by month using pd.Grouper?

I've searched Stack Overflow to find out how to group a DateTime column by month, and for some reason I keep receiving this error, even after I pass the column through pd.to_datetime:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or
PeriodIndex, but got an instance of 'Int64Index'
df['Date'] = pd.to_datetime(df['Date'])
df['Date'].groupby(pd.Grouper(freq='M'))
When I pull the datatype for df['Date'] it shows dtype: datetime64[ns]. Any ideas why I keep getting this error?
The reason is simple: you didn't pass a groupby key to groupby.
What you want is to group the entire dataframe by the month values of the contents of df['Date'].
However, what df['Date'].groupby(pd.Grouper(freq='M')) actually does is first extract a pd.Series from the DataFrame's Date column. Then, it attempts to perform a groupby on that Series; without a specified key, it defaults to attempting to group by the index, which is of course numeric.
This will work:
df.groupby(pd.Grouper(key='Date', freq='M'))
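A small runnable sketch of grouping by a date column through the key argument. The "amount" column and the sample rows are made up. It uses freq="MS" (month start) so that each group is labelled with the first day of its month; freq='M' labels groups by month end, and recent pandas releases spell that alias 'ME'.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-05", "2020-01-20", "2020-02-03"]),
    "amount": [10, 20, 5],  # hypothetical value column
})

# key="Date" tells the Grouper to bin on the column instead of the index.
monthly = df.groupby(pd.Grouper(key="Date", freq="MS"))["amount"].sum()

print(monthly)
```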

Select Pandas dataframe rows based on 'hour' datetime

I have a pandas dataframe 'df' with a column 'DateTimes' of type datetime.time.
The entries of that column are hours of a single day:
00:00:00
.
.
.
23:59:00
Seconds are skipped, it counts by minutes.
How can I choose rows by hour, for example the rows between 00:00:00 and 00:01:00?
If I try this:
df.between_time('00:00:00', '00:00:10')
I get an error saying the index must be a DatetimeIndex.
I set the index as such with:
df=df.set_index(keys='DateTime')
but I get the same error.
I can't seem to get 'loc' to work either. Any suggestions?
Here is a working example of what you are trying to do:
import numpy as np
import pandas as pd
times = pd.date_range('3/6/2012 00:00', periods=100, freq='s', tz='UTC')
df = pd.DataFrame(np.random.randint(10, size=(100,1)), index=times)
df.between_time('00:00:00', '00:00:30')
Note the index has to be of type DatetimeIndex.
I understand you have a column with your dates/times. The problem probably is that your column is not of this type, so you have to convert it first, before setting it as index:
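# Method A: convert the column, then make it the index (drop=True removes the column)
df['column_name'] = pd.to_datetime(df['column_name'])
df = df.set_index('column_name', drop=True)
# Method B: assign the converted values to the index, then drop the original column
df.index = pd.to_datetime(df['column_name'])
df = df.drop('column_name', axis=1)
(The drop is only necessary if you want to remove the original column after setting it as the index.)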
Check out these links:
convert column to date type: Convert DataFrame column type from string to datetime
filter dataframe on dates: Filtering Pandas DataFrames on dates
Hope this helps
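If the column really holds datetime.time objects, as in the question, one possible route is to attach a date to each time so that a proper DatetimeIndex can be built. This is only a sketch under assumptions: the anchor date 2012-03-06 is arbitrary, the "value" column is made up, and the five sample rows stand in for the full day of minutes.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame({
    "DateTimes": [dt.time(0, minute) for minute in range(5)],  # 00:00 ... 00:04
    "value": range(5),                                          # hypothetical data column
})

# Turn each time into text, prefix an arbitrary date, and parse the result.
stamps = pd.to_datetime("2012-03-06 " + df["DateTimes"].astype(str))

# Use the parsed timestamps as the index and drop the original time column.
df = df.set_index(stamps).drop(columns="DateTimes")

# between_time includes both endpoints by default.
selected = df.between_time("00:00:00", "00:01:00")
print(selected)
```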

How can a DataFrame change from having two columns (a "from" datetime and a "to" datetime) to having a single column for a date?

I've got a DataFrame that looks like this:
It has two columns, one of them being a "from" datetime and one of them being a "to" datetime. I would like to change this DataFrame such that it has a single column or index for the date (e.g. 2015-07-06 00:00:00 in datetime form) with the variables of the other columns (like deep) split proportionately into each of the days. How might one approach this problem? I've meddled with groupby tricks and I'm not sure how to proceed.
I don't have time to work through your specific problem at the moment, but the way to approach this is to use DataFrame.resample(). Here are the steps I would take: 1) Resample your "to" date column by minute. 2) Populate the other columns out over that resample. 3) Add the date column back in as an index.
If this doesn't work or is being tricky to work with I would create a date range from your earliest date to your latest date (at the smallest interval you want - so maybe hourly?) and then run some conditional statements over your other columns to fill in the data.
Here is roughly what your code may look like for the resample portion (replace day with hour or whatever you need); note that data needs a DatetimeIndex for resample to work:
drange = pd.date_range('01-01-1970', '01-20-2018', freq='D')
data = data.resample('D').ffill()
data.index.name = 'date'
Hope this helps!
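As an alternative to the resample route, the proportional split can also be computed directly from how much of each interval falls on each calendar day. The sketch below is not the resample approach described above; it is a plain loop under assumptions: the columns are named "from" and "to", "deep" is the only value column, the sample rows are invented, and every interval has a positive length.

```python
import pandas as pd

# Invented sample data; "deep" is the value to split across days.
intervals = pd.DataFrame({
    "from": pd.to_datetime(["2015-07-06 22:00", "2015-07-07 10:00"]),
    "to": pd.to_datetime(["2015-07-07 02:00", "2015-07-07 12:00"]),
    "deep": [4.0, 2.0],
})

rows = []
for start, end, value in intervals.itertuples(index=False):
    total_seconds = (end - start).total_seconds()
    day = start.normalize()  # midnight of the day the interval starts on
    while day < end:
        next_day = day + pd.Timedelta(days=1)
        # Portion of the interval that lies inside this calendar day.
        overlap = (min(end, next_day) - max(start, day)).total_seconds()
        rows.append({"date": day, "deep": value * overlap / total_seconds})
        day = next_day

# Sum the pieces that landed on the same day.
daily = pd.DataFrame(rows).groupby("date")["deep"].sum()
print(daily)
```

Here the first interval spans midnight with two hours on each side, so its value is split evenly between 2015-07-06 and 2015-07-07.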
