Adding a holidays column to a DataFrame in Python

I am trying to add a holidays column for France to a DataFrame using the workalendar package, but it gives me the error
'Series' object has no attribute 'weekday'
Below is my code:
import pandas as pd
from workalendar.europe import France
df1 = pd.read_csv(r'C:\Users\ABC.csv')
df1['Date'] = pd.to_datetime(df1['Date'], format= '%d/%m/%Y %H:%M:%S')
df1['Date1'] = df1.Date.dt.date
dr = df1['Date1']
cal = France()
df1['Holiday'] = cal.is_working_day(df1['Date1'])
df1.head()
The original data in the file looks like this:
Date Value
17/10/2012 19:00:00 0
17/10/2012 20:00:00 0.1
17/10/2012 21:00:00 0
17/10/2012 22:00:00 0
17/10/2012 23:00:00 0
18/10/2012 00:00:00 0
18/10/2012 01:00:00 0
18/10/2012 02:00:00 0
18/10/2012 03:00:00 0.1
18/10/2012 04:00:00 0
18/10/2012 05:00:00 0
18/10/2012 06:00:00 0
18/10/2012 07:00:00 0
18/10/2012 08:00:00 0.2
18/10/2012 09:00:00 0.5

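Try this:
df1['Holiday'] = df1.Date.apply(lambda x: cal.is_working_day(x.to_pydatetime()))
cal.is_working_day() works on a single date, not on a whole Series, so apply it element by element, converting each pandas Timestamp to a Python datetime with to_pydatetime().
By the way, a non-working day is not necessarily a holiday: is_working_day() also returns False on weekends, so if you only want public holidays, cal.is_holiday() may be closer to what you need.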
See also: Converting between datetime, Timestamp and datetime64
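To see why the original call fails and why apply() fixes it, here is a minimal sketch that does not need workalendar. is_weekday() is a hypothetical stand-in that, like cal.is_working_day, expects a single date and calls .weekday() on it. The timestamps use the question's format; the second one is made up so that it falls on a weekend.

```python
import pandas as pd


def is_weekday(day):
    """Hypothetical stand-in for a scalar-only API such as cal.is_working_day:
    it calls day.weekday(), which datetime.date has but a pandas Series does not."""
    return day.weekday() < 5


df1 = pd.DataFrame({"Date": ["17/10/2012 19:00:00", "20/10/2012 08:00:00"]})
df1["Date"] = pd.to_datetime(df1["Date"], format="%d/%m/%Y %H:%M:%S")

# Passing the whole column reproduces the question's AttributeError
error_message = None
try:
    is_weekday(df1["Date"])
except AttributeError as exc:
    error_message = str(exc)

# apply() calls the function once per element; .date() turns each Timestamp into a datetime.date
df1["Weekday"] = df1["Date"].apply(lambda ts: is_weekday(ts.date()))
print(error_message)
print(df1)
```

17/10/2012 is a Wednesday and 20/10/2012 a Saturday, so the new column holds True and False.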

Related

DataFrame Pandas datetime error: hour must be in 0..23

I have the following time series and want to convert it to datetime in a DataFrame using pd.to_datetime. I get the following error: "hour must be in 0..23: 2017/ 01/01 24:00:00". How can I get around this error?
DateTime
0 2017/ 01/01 01:00:00
1 2017/ 01/01 02:00:00
2 2017/ 01/01 03:00:00
3 2017/ 01/01 04:00:00
...
22 2017/ 01/01 23:00:00
23 2017/ 01/01 24:00:00
Given:
DateTime
0 2017/01/01 01:00:00
1 2017/01/01 02:00:00
2 2017/01/01 03:00:00
3 2017/01/01 04:00:00
4 2017/01/01 23:00:00
5 2017/01/01 24:00:00
As the error says, 24:00:00 isn't a valid time. Depending on what it actually means, we can salvage it like this:
# Split up your Date and Time Values into separate Columns:
df[['Date', 'Time']] = df.DateTime.str.split(expand=True)
# Convert them separately, one as datetime, the other as timedelta.
df.Date = pd.to_datetime(df.Date)
df.Time = pd.to_timedelta(df.Time)
# Fix your DateTime Column, Drop the helper Columns:
df.DateTime = df.Date + df.Time
df = df.drop(['Date', 'Time'], axis=1)
print(df)
print(df.dtypes)
Output:
DateTime
0 2017-01-01 01:00:00
1 2017-01-01 02:00:00
2 2017-01-01 03:00:00
3 2017-01-01 04:00:00
4 2017-01-01 23:00:00
5 2017-01-02 00:00:00
DateTime datetime64[ns]
dtype: object
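Alternatively, if you would rather discard the invalid rows than repair them, pass a format that matches the data together with errors='coerce', so that unparseable values such as 24:00:00 become NaT instead of raising:
df['DateTime'] = pd.to_datetime(df['DateTime'], format='%Y/%m/%d %H:%M:%S', errors='coerce')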
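A quick check of what errors='coerce' does on two rows from the question (a sketch, assuming the slash-separated format without the stray space): the valid row parses, and the 24:00:00 row becomes NaT, so that value is lost rather than moved to the next day.

```python
import pandas as pd

df = pd.DataFrame({"DateTime": ["2017/01/01 23:00:00", "2017/01/01 24:00:00"]})

# A format that matches the data; the invalid hour 24 is coerced to NaT instead of raising
parsed = pd.to_datetime(df["DateTime"], format="%Y/%m/%d %H:%M:%S", errors="coerce")
print(parsed)

# Count how many rows were lost to coercion before deciding to drop them
n_lost = int(parsed.isna().sum())
print(n_lost)
```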

datetime difference between dates

I have a df like so:
firstdate seconddate
0 2011-01-01 13:00:00 2011-01-01 13:00:00
1 2011-01-02 14:00:00 2011-01-01 11:00:00
2 2011-01-02 16:00:00 2011-01-02 13:00:00
3 2011-01-04 12:00:00 2011-01-03 15:00:00
...
seconddate is always before firstdate. I want to compute the difference between firstdate and seconddate in days and store it in a new column: if both fall on the same day, difference=0; if seconddate is the day before firstdate, difference=1; and so on, up to a week. How would I do this?
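df['firstdate'] = pd.to_datetime(df['firstdate'])
df['seconddate'] = pd.to_datetime(df['seconddate'])
# normalize() sets both times to midnight, so the result counts calendar days
# rather than whole 24-hour periods (21 hours across midnight gives 1, not 0)
df['diff'] = (df['firstdate'].dt.normalize() - df['seconddate'].dt.normalize()).dt.days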
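This adds a column with the difference. You can then filter rows based on it. Use df['diff'] rather than df.diff, because df.diff is the DataFrame.diff method, not the column:
df = df.drop(df[df['diff'] < 0].index)
# or
df = df[df['diff'] >= 0]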
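To show why dropping the time of day matters for the "same calendar day" rule in the question, here is a small sketch rebuilt from the sample rows. Plain .dt.days counts whole 24-hour periods, so the last row (21 hours apart, on consecutive days) gives 0 instead of 1. The filter to 0..7 days is one reading of "and so on until a week".

```python
import pandas as pd

# Rows copied from the question
df = pd.DataFrame({
    "firstdate": ["2011-01-01 13:00:00", "2011-01-02 14:00:00",
                  "2011-01-02 16:00:00", "2011-01-04 12:00:00"],
    "seconddate": ["2011-01-01 13:00:00", "2011-01-01 11:00:00",
                   "2011-01-02 13:00:00", "2011-01-03 15:00:00"],
})
df = df.apply(pd.to_datetime)

# Whole 24-hour periods between the two timestamps
elapsed_days = (df["firstdate"] - df["seconddate"]).dt.days
# Calendar days: set both timestamps to midnight first
calendar_days = (df["firstdate"].dt.normalize() - df["seconddate"].dt.normalize()).dt.days

df["diff"] = calendar_days
# Keep differences from 0 up to a week
df = df[df["diff"].between(0, 7)]
print(elapsed_days.tolist(), calendar_days.tolist())
```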

How to round date time index in a pandas data frame?

There is a pandas dataframe like this:
index
2018-06-01 02:50:00 R 45.48 -2.8
2018-06-01 07:13:00 R 45.85 -2.0
...
2018-06-01 08:37:00 R 45.87 -2.7
I would like to round the index to the hour like this:
index
2018-06-01 02:00:00 R 45.48 -2.8
2018-06-01 07:00:00 R 45.85 -2.0
...
2018-06-01 08:00:00 R 45.87 -2.7
I am trying the following code:
df = df.date_time.apply ( lambda x : x.round('H'))
but it returns a Series instead of a DataFrame with the modified index.
Try using floor (the expected output truncates, e.g. 02:50 becomes 02:00, whereas round would give 03:00):
df.index.floor('H')
Setup:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(25),index=pd.date_range('2018-01-01 01:12:50','2018-01-02 01:12:50',freq='H'),columns=['Value'])
df.head()
Value
2018-01-01 01:12:50 0
2018-01-01 02:12:50 1
2018-01-01 03:12:50 2
2018-01-01 04:12:50 3
2018-01-01 05:12:50 4
df.index = df.index.floor('H')
df.head()
Value
2018-01-01 01:00:00 0
2018-01-01 02:00:00 1
2018-01-01 03:00:00 2
2018-01-01 04:00:00 3
2018-01-01 05:00:00 4
Try my method:
Add a new column holding the index rounded to the hour:
df['E'] = df.index.round('H')
Set it as the index:
df1 = df.set_index('E')
Remove the index name ('E' here):
df1.index.name = None
df1 is now a new DataFrame whose index is df's index rounded to the nearest hour. To truncate instead, as in the expected output, use floor('H') in place of round('H').
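Or try this, which truncates each timestamp by rebuilding it without minutes and seconds (it assumes the timestamps are in a column named 'index', e.g. after df.reset_index()):
import datetime
df['index'].apply(lambda dt: datetime.datetime(dt.year, dt.month, dt.day, dt.hour))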
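A sketch comparing floor and round on the three index values from the question. The column names are hypothetical, since the question does not show them, and the code uses the lowercase "h" alias, which recent pandas versions prefer over "H".

```python
import pandas as pd

# Index values from the question; column names are hypothetical
df = pd.DataFrame(
    {"flag": ["R", "R", "R"], "x": [45.48, 45.85, 45.87], "y": [-2.8, -2.0, -2.7]},
    index=pd.to_datetime(["2018-06-01 02:50:00", "2018-06-01 07:13:00",
                          "2018-06-01 08:37:00"]),
)

floored = df.index.floor("h")   # truncate: 02:50 -> 02:00
rounded = df.index.round("h")   # nearest hour: 02:50 -> 03:00

print(floored.strftime("%H:%M").tolist())
print(rounded.strftime("%H:%M").tolist())

# Assigning to df.index keeps the result a DataFrame
df.index = floored
print(df)
```

Only floor reproduces the expected output (02:00, 07:00, 08:00); round gives 03:00, 07:00, 09:00.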

Python: Grouping by time interval

I have a dataframe that looks like this (I'm using Python 3.6.5, and the index holds datetime.time objects):
print(sum_by_time)
Trips
Time
00:00:00 10
01:00:00 10
02:00:00 10
03:00:00 10
04:00:00 20
05:00:00 20
06:00:00 20
07:00:00 20
08:00:00 30
09:00:00 30
10:00:00 30
11:00:00 30
How can I group this dataframe by time interval to get something like this:
Trips
Time
00:00:00 - 03:00:00 40
04:00:00 - 07:00:00 80
08:00:00 - 11:00:00 120
I think you need to convert the index values to timedeltas with to_timedelta and then resample:
df.index = pd.to_timedelta(df.index.astype(str))
df = df.resample('4H').sum()
print (df)
Trips
00:00:00 40
04:00:00 80
08:00:00 120
EDIT:
For your exact output format, you need:
df['d'] = pd.to_datetime(df.index.astype(str))
df = df.groupby(pd.Grouper(freq='4H', key='d')).agg({'Trips':'sum', 'd':['first','last']})
df.columns = df.columns.map('_'.join)
df = df.set_index(df['d_first'].dt.strftime('%H:%M:%S') + ' - ' + df['d_last'].dt.strftime('%H:%M:%S'))[['Trips_sum']]
print (df)
Trips_sum
00:00:00 - 03:00:00 40
04:00:00 - 07:00:00 80
08:00:00 - 11:00:00 120
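If the index stays as datetime.time objects, a simpler alternative is to bin by integer hour with groupby instead of resampling. This is a different technique from the answers above, shown as a sketch: it assumes whole-hour rows and labels each bin by its nominal bounds rather than by the first and last rows actually present, which is what the Grouper version reports.

```python
import datetime

import pandas as pd

# Rebuild the question's data: hourly datetime.time index with trip counts
sum_by_time = pd.DataFrame(
    {"Trips": [10] * 4 + [20] * 4 + [30] * 4},
    index=pd.Index([datetime.time(h) for h in range(12)], name="Time"),
)

# Bucket number 0, 1, 2, ... for each 4-hour block
buckets = [t.hour // 4 for t in sum_by_time.index]
result = sum_by_time.groupby(buckets)[["Trips"]].sum()

# Label each bucket with its first and last hour
result.index = [f"{4 * b:02d}:00:00 - {4 * b + 3:02d}:00:00" for b in result.index]
result.index.name = "Time"
print(result)
```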

Remove 'seconds' and 'minutes' from a Pandas dataframe column

Given a dataframe like:
import numpy as np
import pandas as pd
df = pd.DataFrame(
{'Date' : pd.date_range('1/1/2011', periods=5, freq='3675S'),
'Num' : np.random.rand(5)})
Date Num
0 2011-01-01 00:00:00 0.580997
1 2011-01-01 01:01:15 0.407332
2 2011-01-01 02:02:30 0.786035
3 2011-01-01 03:03:45 0.821792
4 2011-01-01 04:05:00 0.807869
I would like to remove the 'minutes' and 'seconds' information.
The following (mostly stolen from: How to remove the 'seconds' of Pandas dataframe index?) works okay,
df = df.assign(Date = lambda x: pd.to_datetime(x['Date'].dt.strftime('%Y-%m-%d %H')))
Date Num
0 2011-01-01 00:00:00 0.580997
1 2011-01-01 01:00:00 0.407332
2 2011-01-01 02:00:00 0.786035
3 2011-01-01 03:00:00 0.821792
4 2011-01-01 04:00:00 0.807869
but it feels strange to convert a datetime to a string then back to a datetime. Is there a way to do this more directly?
dt.round
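This is how it should be done: use dt.round. If you want to truncate rather than round to the nearest hour (which is what the strftime approach does), use dt.floor('H') instead.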
df.assign(Date=df.Date.dt.round('H'))
Date Num
0 2011-01-01 00:00:00 0.577957
1 2011-01-01 01:00:00 0.995748
2 2011-01-01 02:00:00 0.864013
3 2011-01-01 03:00:00 0.468762
4 2011-01-01 04:00:00 0.866827
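On the question's five rows, round and floor agree because no timestamp is past the half hour. A sketch with more periods (30 instead of 5, otherwise the question's construction) checks that dt.floor reproduces the strftime round trip exactly, while dt.round starts rounding up once the minutes pass :30.

```python
import numpy as np
import pandas as pd

# Same construction as the question, with more periods so the minutes drift past :30
df = pd.DataFrame({
    "Date": pd.date_range("1/1/2011", periods=30, freq=pd.Timedelta(seconds=3675)),
    "Num": np.random.rand(30),
})

# The string round trip from the question, written with an explicit format
via_strings = pd.to_datetime(df["Date"].dt.strftime("%Y-%m-%d %H:00:00"),
                             format="%Y-%m-%d %H:%M:%S")
floored = df["Date"].dt.floor("h")
rounded = df["Date"].dt.round("h")

# floor matches the string approach; round differs once minutes reach :30
print((floored == via_strings).all(), (rounded != floored).any())
```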
OLD ANSWER
One approach is to set the index and use resample
df.set_index('Date').resample('H').last().reset_index()
Date Num
0 2011-01-01 00:00:00 0.577957
1 2011-01-01 01:00:00 0.995748
2 2011-01-01 02:00:00 0.864013
3 2011-01-01 03:00:00 0.468762
4 2011-01-01 04:00:00 0.866827
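The resample approach only matches the row-by-row result when every hour has exactly one row. A sketch with made-up data shows the difference: resample emits one row per hour across the whole range (an empty hour becomes NaN) and keeps only the last value in each hour, whereas dt.floor keeps every original row.

```python
import pandas as pd

# Hypothetical data: two readings in the same hour and no reading at 01:xx
df = pd.DataFrame({
    "Date": pd.to_datetime(["2011-01-01 00:10", "2011-01-01 00:40", "2011-01-01 02:05"]),
    "Num": [1.0, 2.0, 3.0],
})

# One row per hour in the full range, keeping the last value in each bin
resampled = df.set_index("Date").resample("h").last().reset_index()

# Every original row is kept; only the timestamps change
floored = df.assign(Date=df["Date"].dt.floor("h"))

print(resampled)
print(floored)
```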
Another alternative is to extract the date and hour components and recombine them:
df.assign(
Date=pd.to_datetime(df.Date.dt.date) +
pd.to_timedelta(df.Date.dt.hour, unit='H'))
Date Num
0 2011-01-01 00:00:00 0.577957
1 2011-01-01 01:00:00 0.995748
2 2011-01-01 02:00:00 0.864013
3 2011-01-01 03:00:00 0.468762
4 2011-01-01 04:00:00 0.866827
Another solution could be this:
from datetime import datetime
df.Date = pd.to_datetime(df.Date)
df.Date = df.Date.apply(lambda x: datetime(x.year, x.month, x.day, x.hour))
