A B C
0 2001-01-13 10:00:00 Saturday
1 2001-01-14 12:33:00 Sunday
2 2001-01-20 15:10:00 Saturday
3 2001-01-24 13:15:00 Wednesday
4 2001-01-24 16:56:00 Wednesday
5 2001-01-24 19:09:00 Wednesday
6 2001-01-28 19:14:00 Sunday
7 2001-01-29 11:00:00 Monday
8 2001-01-29 23:50:00 Monday
9 2001-01-30 11:50:00 Tuesday
10 2001-01-30 13:00:00 Tuesday
11 2001-02-02 16:14:00 Wednesday
12 2001-02-02 09:25:00 Friday
I want to create a new df containing rows between all periods from Mondays at 12:00:00 to Wednesdays at 17:00:00
The output would be:
A B C
3 2001-01-24 13:15:00 Wednesday
4 2001-01-24 16:56:00 Wednesday
8 2001-01-29 23:50:00 Monday
9 2001-01-30 11:50:00 Tuesday
10 2001-01-30 13:00:00 Tuesday
11 2001-02-02 16:14:00 Wednesday
I tried with
df[(df["B"] >= "12:00:00") & (df["B"] <= "17:00:00")] & df[(df["C"] >= "Monday") & (df["C"] <= "Wednesday")]
But this is not what I want.
Thank you.
You can create three boolean masks and filter with boolean indexing: the first for the first day from the start time onward, the second for all full days in between, and the third for the last day up to the end time:
import pandas as pd
from datetime import time
#if necessary convert to datetime
df['A'] = pd.to_datetime(df['A'])
#if necessary convert to times
df['B'] = pd.to_datetime(df['B']).dt.time
m1 = (df['B']>=time(12)) & (df['C'] == 'Monday')
m2 = (df['C'] == 'Tuesday')
m3 = (df['B']<=time(17)) & (df['C'] == 'Wednesday')
df = df[m1 | m2 | m3]
print (df)
A B C
3 2001-01-24 13:15:00 Wednesday
4 2001-01-24 16:56:00 Wednesday
8 2001-01-29 23:50:00 Monday
9 2001-01-30 11:50:00 Tuesday
10 2001-01-30 13:00:00 Tuesday
11 2001-02-02 16:14:00 Wednesday
Another solution, for the same times but from Monday to Friday:
import pandas as pd
from datetime import time
df['A'] = pd.to_datetime(df['A'])
df['B'] = pd.to_datetime(df['B']).dt.time
m1 = (df['B']>=time(12)) & (df['C'] == 'Monday')
# all full days between Monday and Friday
m2 = df['C'].isin(['Tuesday', 'Wednesday', 'Thursday'])
m3 = (df['B']<=time(17)) & (df['C'] == 'Friday')
df = df[m1 | m2 | m3]
print (df)
A B C
3 2001-01-24 13:15:00 Wednesday
4 2001-01-24 16:56:00 Wednesday
5 2001-01-24 19:09:00 Wednesday
8 2001-01-29 23:50:00 Monday
9 2001-01-30 11:50:00 Tuesday
10 2001-01-30 13:00:00 Tuesday
11 2001-02-02 16:14:00 Wednesday
12 2001-02-02 09:25:00 Friday
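As a variation on the masks above, the whole window can be written as one numeric comparison. This is only a minimal sketch: it assumes date and time are combined into a single datetime column (named ts here, a hypothetical name) and that the day of week is derived from the date itself rather than read from column C.

```python
import pandas as pd

# Hypothetical sample data; "ts" is an assumed column name.
df = pd.DataFrame({"ts": pd.to_datetime([
    "2001-01-22 11:00:00",  # Monday before 12:00    -> excluded
    "2001-01-22 12:30:00",  # Monday after 12:00     -> kept
    "2001-01-23 03:00:00",  # Tuesday                -> kept
    "2001-01-24 16:56:00",  # Wednesday before 17:00 -> kept
    "2001-01-24 19:09:00",  # Wednesday after 17:00  -> excluded
    "2001-01-27 10:00:00",  # Saturday               -> excluded
])})

# Seconds elapsed since Monday 00:00 of the same week (Monday is weekday 0).
seconds = (df["ts"].dt.weekday * 86400
           + df["ts"].dt.hour * 3600
           + df["ts"].dt.minute * 60
           + df["ts"].dt.second)

start = 0 * 86400 + 12 * 3600  # Monday 12:00:00
end = 2 * 86400 + 17 * 3600    # Wednesday 17:00:00

# between() is inclusive on both ends.
window = df[seconds.between(start, end)]
print(window)
```

This form only works when the window does not wrap around the end of the week (for example Friday to Monday); that case would need two comparisons joined with |.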
Use the OR (|) operator with equality (==) for the day names, instead of & with <= and >=. Hope it helps. Thanks.
old: df[(df["B"] >= "12:00:00") & (df["B"] <= "17:00:00")] & df[(df["C"] >= "Monday") & (df["C"] <= "Wednesday")]
New: df[(df["B"] >= "12:00:00") & (df["B"] <= "17:00:00") & ((df["C"] == "Monday") | (df["C"] == "Tuesday") | (df["C"] == "Wednesday"))]
My DataFrame:
start_trade week_day
0 2021-01-16 09:30:00 Saturday
1 2021-01-19 14:30:00 Tuesday
2 2021-01-25 22:00:00 Monday
3 2021-01-29 12:15:00 Friday
4 2021-01-31 12:35:00 Sunday
There are no trades on the exchange on Saturday and Sunday. Therefore, if my trading signal falls on the weekend, I want to open a trade on Friday 23:50.
Expected output:
start_trade week_day
0 2021-01-15 23:50:00 Friday
1 2021-01-19 14:30:00 Tuesday
2 2021-01-25 22:00:00 Monday
3 2021-01-29 12:15:00 Friday
4 2021-01-29 23:50:00 Friday
How to do it?
You can use to_timedelta to move the date back to the Friday of the same week, and then set the time with Timedelta. Apply this only to the weekend rows, selected with a mask:
# mask for weekend dates (Saturday=5, Sunday=6)
mask = df['start_trade'].dt.weekday.isin([5,6])
df.loc[mask, 'start_trade'] = (df['start_trade'].dt.normalize() # to get midnight
- pd.to_timedelta(df['start_trade'].dt.weekday-4, unit='D') # to get the friday date
+ pd.Timedelta(hours=23, minutes=50)) # set 23:50 for time
df.loc[mask, 'week_day'] = 'Friday'
print(df)
start_trade week_day
0 2021-01-15 23:50:00 Friday
1 2021-01-19 14:30:00 Tuesday
2 2021-01-25 22:00:00 Monday
3 2021-01-29 12:15:00 Friday
4 2021-01-29 23:50:00 Friday
Try:
weekend = df['week_day'].isin(['Saturday', 'Sunday'])
df.loc[weekend, 'week_day'] = 'Friday'
Or np.where along with str.contains, and | operator:
df['week_day'] = np.where(df['week_day'].str.contains(r'Saturday|Sunday'),'Friday',df['week_day'])
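Relabelling week_day alone leaves start_trade on the weekend date. A minimal sketch that shifts the timestamp first and then recomputes the label from the shifted date, assuming start_trade is already a datetime column (the sample values are copied from the question):

```python
import pandas as pd

df = pd.DataFrame({"start_trade": pd.to_datetime([
    "2021-01-16 09:30:00", "2021-01-19 14:30:00", "2021-01-25 22:00:00",
    "2021-01-29 12:15:00", "2021-01-31 12:35:00",
])})

wd = df["start_trade"].dt.weekday  # Monday=0 ... Sunday=6
weekend = wd >= 5

# Midnight of the same day, moved back to Friday, plus 23:50.
friday = (df["start_trade"].dt.normalize()
          - pd.to_timedelta(wd - 4, unit="D")
          + pd.Timedelta(hours=23, minutes=50))

# Keep weekday timestamps as they are; replace weekend ones with the Friday value.
df["start_trade"] = df["start_trade"].where(~weekend, friday)

# Derive the label from the date so the two columns cannot disagree.
df["week_day"] = df["start_trade"].dt.day_name()
print(df)
```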
I have a pandas dataframe with a lot of columns, some of which have values on weekends.
I'm now trying to remove all weekend rows, but need to add the values I remove to the respective following Monday.
Thu: 4
Fri: 5
Sat: 2
Sun: 1
Mon: 4
Tue: 3
needs to become
Thu: 4
Fri: 5
Mon: 7
Tue: 3
I have figured out how to slice only the weekdays (using df.index.dayofweek), but can't think of a clever way to aggregate before doing so.
Here's some dummy code to start:
import datetime
import numpy as np
import pandas as pd

index = pd.date_range(datetime.datetime.now().date() -
                      datetime.timedelta(20),
                      periods = 20,
                      freq = 'D')
df = pd.DataFrame({
    'Val_1': np.random.rand(20),
    'Val_2': np.random.rand(20),
    'Val_3': np.random.rand(20)
    },
    index = index)
df['Weekday'] = df.index.dayofweek
Any help on this would be much appreciated!
Setup
I included a random seed
np.random.seed([3, 1415])
index = pd.date_range(datetime.datetime.now().date() -
                      datetime.timedelta(20),
                      periods = 20,
                      freq = 'D')
df = pd.DataFrame({
    'Val_1': np.random.rand(20),
    'Val_2': np.random.rand(20),
    'Val_3': np.random.rand(20)
    },
    index = index)
df['day_name'] = df.index.day_name()
df.head(6)
Val_1 Val_2 Val_3 day_name
2018-07-18 0.444939 0.278735 0.651676 Wednesday
2018-07-19 0.407554 0.609862 0.136097 Thursday
2018-07-20 0.460148 0.085823 0.544838 Friday
2018-07-21 0.465239 0.836997 0.035073 Saturday
2018-07-22 0.462691 0.739635 0.275079 Sunday
2018-07-23 0.016545 0.866059 0.706685 Monday
Solution
I fill in a series of dates with the subsequent Monday for Saturdays and Sundays. That gets used in a group by operation.
weekdays = df.index.to_series().mask(df.index.dayofweek >= 5).bfill()
d_ = df.groupby(weekdays).sum()
d_
Val_1 Val_2 Val_3
2018-07-18 0.444939 0.278735 0.651676
2018-07-19 0.407554 0.609862 0.136097
2018-07-20 0.460148 0.085823 0.544838
2018-07-23 0.944475 2.442691 1.016837
2018-07-24 0.850445 0.691271 0.713614
2018-07-25 0.817744 0.377185 0.776050
2018-07-26 0.777962 0.225146 0.542329
2018-07-27 0.757983 0.435280 0.836541
2018-07-30 2.645824 2.198333 1.375860
2018-07-31 0.926879 0.018688 0.746060
2018-08-01 0.721535 0.700566 0.373741
2018-08-02 0.117642 0.900749 0.603536
2018-08-03 0.145906 0.764869 0.775801
2018-08-06 0.738110 1.580137 1.266593
Compare
df.join(d_, rsuffix='_')
Val_1 Val_2 Val_3 day_name Val_1_ Val_2_ Val_3_
2018-07-18 0.444939 0.278735 0.651676 Wednesday 0.444939 0.278735 0.651676
2018-07-19 0.407554 0.609862 0.136097 Thursday 0.407554 0.609862 0.136097
2018-07-20 0.460148 0.085823 0.544838 Friday 0.460148 0.085823 0.544838
2018-07-21 0.465239 0.836997 0.035073 Saturday NaN NaN NaN
2018-07-22 0.462691 0.739635 0.275079 Sunday NaN NaN NaN
2018-07-23 0.016545 0.866059 0.706685 Monday 0.944475 2.442691 1.016837
2018-07-24 0.850445 0.691271 0.713614 Tuesday 0.850445 0.691271 0.713614
2018-07-25 0.817744 0.377185 0.776050 Wednesday 0.817744 0.377185 0.776050
2018-07-26 0.777962 0.225146 0.542329 Thursday 0.777962 0.225146 0.542329
2018-07-27 0.757983 0.435280 0.836541 Friday 0.757983 0.435280 0.836541
2018-07-28 0.934829 0.700900 0.538186 Saturday NaN NaN NaN
2018-07-29 0.831104 0.700946 0.185523 Sunday NaN NaN NaN
2018-07-30 0.879891 0.796487 0.652151 Monday 2.645824 2.198333 1.375860
2018-07-31 0.926879 0.018688 0.746060 Tuesday 0.926879 0.018688 0.746060
2018-08-01 0.721535 0.700566 0.373741 Wednesday 0.721535 0.700566 0.373741
2018-08-02 0.117642 0.900749 0.603536 Thursday 0.117642 0.900749 0.603536
2018-08-03 0.145906 0.764869 0.775801 Friday 0.145906 0.764869 0.775801
2018-08-04 0.199844 0.253200 0.091238 Saturday NaN NaN NaN
2018-08-05 0.437564 0.548054 0.504035 Sunday NaN NaN NaN
2018-08-06 0.100702 0.778883 0.671320 Monday 0.738110 1.580137 1.266593
Setup data using a simple series so that the weekend roll value is obvious:
index = pd.date_range(start='2018-07-18', periods = 20, freq = 'D')
df = pd.DataFrame({
    'Val_1': [1] * 20,
    'Val_2': [2] * 20,
    'Val_3': [3] * 20,
    },
    index = index)
You can take the cumulative sum of the relevant columns in your dataframe, and then difference the results using a weekday boolean filter. You need to apply some special logic to correctly account for the first day(s) depending on whether it is a weekday, a Saturday or a Sunday.
The correct roll behavior can be observed using an index start date of July 21st (Saturday) and the 22nd (Sunday).
In addition, you may need to account for the situation where the last day or two falls on a weekend. As is, those values would be lost. Depending on the situation, you may wish to roll them forwards to the following Monday (in which case you would need to extend your index) or else roll them back to the preceding Friday.
weekdays = df.index.dayofweek < 5
df2 = df.iloc[:, :].cumsum()[weekdays].diff()
if weekdays[0]:
    # First day is a weekday, so just use its value.
    df2.iloc[0, :] = df.iloc[0, :]
elif weekdays[1]:
    # First day must be a Sunday.
    df2.iloc[0, :] = df.iloc[0:2, :].sum()
else:
    # First day must be a Saturday.
    df2.iloc[0, :] = df.iloc[0:3, :].sum()
>>> df2.head(14)
Val_1 Val_2 Val_3
2018-07-18 1 2 3
2018-07-19 1 2 3
2018-07-20 1 2 3
2018-07-23 3 6 9
2018-07-24 1 2 3
2018-07-25 1 2 3
2018-07-26 1 2 3
2018-07-27 1 2 3
2018-07-30 3 6 9
2018-07-31 1 2 3
2018-08-01 1 2 3
2018-08-02 1 2 3
2018-08-03 1 2 3
2018-08-06 3 6 9
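For the trailing-weekend case mentioned above, one option is to map every weekend date directly to the following Monday and group on that, which extends the index past the last row when needed. This is a sketch under that assumption (roll forward, never back); the names shift, target and rolled are illustrative only.

```python
import numpy as np
import pandas as pd

# Starts on a Wednesday and ends on a Sunday, so the last weekend is trailing.
index = pd.date_range(start="2018-07-18", periods=12, freq="D")
df = pd.DataFrame({"Val_1": [1] * 12}, index=index)

dow = df.index.dayofweek  # Monday=0 ... Sunday=6

# Days to add: Saturday -> 2, Sunday -> 1, weekdays -> 0.
shift = np.where(dow == 5, 2, np.where(dow == 6, 1, 0))
target = df.index + pd.to_timedelta(shift, unit="D")

# Weekend rows now share a key with the following Monday and are summed into it.
rolled = df.groupby(target).sum()
print(rolled)
```

Here the final Saturday and Sunday land on 2018-07-30, a Monday that is not present in the original index, so no values are dropped.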
I have the following data:
Date X
...
2014-12-30 23:00:00 2
2014-12-30 23:15:00 0
2014-12-30 23:30:00 1
2014-12-30 23:45:00 1
2014-12-31 00:00:00 22
...
2015-01-01 00:00:00 0
2015-01-02 00:00:00 2
2015-01-03 00:00:00 2
2015-01-04 00:00:00 2
2015-01-04 00:00:00 2
2015-01-05 00:00:00 2
...
I want to split this time series (dataframe) into many time series (dataframe). I would like to have one time series for each Monday, one for all Tuesdays, Wednesdays ... etc.
How can I do that with pandas?
You can create a dictionary of DataFrames with groupby and day_name() (the older weekday_name attribute was removed in pandas 1.0):
dfs = dict(tuple(df.groupby(df['Date'].dt.day_name())))
#select by days
print (dfs['Friday'])
Date X
6 2015-01-02 2
print (dfs['Thursday'])
Date X
5 2015-01-01 0
Detail:
print (df['Date'].dt.day_name())
0 Tuesday
1 Tuesday
2 Tuesday
3 Tuesday
4 Wednesday
5 Thursday
6 Friday
7 Saturday
8 Sunday
9 Sunday
10 Monday
Name: Date, dtype: object
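A self-contained version of the same idea, as a sketch: the sample rows are a small subset of the data in the question, and by_day is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2014-12-30 23:00:00",  # Tuesday
        "2014-12-31 00:00:00",  # Wednesday
        "2015-01-01 00:00:00",  # Thursday
        "2015-01-02 00:00:00",  # Friday
        "2015-01-05 00:00:00",  # Monday
    ]),
    "X": [2, 22, 0, 2, 2],
})

# One DataFrame per weekday name; days that never occur get no key.
by_day = {name: group for name, group in df.groupby(df["Date"].dt.day_name())}

print(by_day["Friday"])
```

Looking up a day that is absent from the data (Saturday in this sample) raises KeyError, so by_day.get("Saturday") may be the safer access pattern.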
I need to convert 0 days 08:00:00 to 08:00:00.
code:
import pandas as pd
df = pd.DataFrame({
'Slot_no':[1,2,3,4,5,6,7],
'start_time':['0:01:00','8:01:00','10:01:00','12:01:00','14:01:00','18:01:00','20:01:00'],
'end_time':['8:00:00','10:00:00','12:00:00','14:00:00','18:00:00','20:00:00','0:00:00'],
'location_type':['not considered','Food','Parks & Outdoors','Food',
'Arts & Entertainment','Parks & Outdoors','Food']})
# reindex_axis was removed in pandas 1.0; reindex(columns=...) does the same job
df = df.reindex(columns=['Slot_no','start_time','end_time','location_type','loc_set'])
df['start_time'] = pd.to_timedelta(df['start_time'])
df['end_time'] = pd.to_timedelta(df['end_time'].replace('0:00:00', '24:00:00'))
output:
print (df)
Slot_no start_time end_time location_type loc_set
0 1 00:01:00 0 days 08:00:00 not considered NaN
1 2 08:01:00 0 days 10:00:00 Food NaN
2 3 10:01:00 0 days 12:00:00 Parks & Outdoors NaN
3 4 12:01:00 0 days 14:00:00 Food NaN
4 5 14:01:00 0 days 18:00:00 Arts & Entertainment NaN
5 6 18:01:00 0 days 20:00:00 Parks & Outdoors NaN
6 7 20:01:00 1 days 00:00:00 Food NaN
You can use to_datetime with dt.time:
df['end_time_times'] = pd.to_datetime(df['end_time']).dt.time
print (df)
Slot_no start_time end_time location_type loc_set \
0 1 00:01:00 0 days 08:00:00 not considered NaN
1 2 08:01:00 0 days 10:00:00 Food NaN
2 3 10:01:00 0 days 12:00:00 Parks & Outdoors NaN
3 4 12:01:00 0 days 14:00:00 Food NaN
4 5 14:01:00 0 days 18:00:00 Arts & Entertainment NaN
5 6 18:01:00 0 days 20:00:00 Parks & Outdoors NaN
6 7 20:01:00 1 days 00:00:00 Food NaN
end_time_times
0 08:00:00
1 10:00:00
2 12:00:00
3 14:00:00
4 18:00:00
5 20:00:00
6 00:00:00
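Depending on the pandas version, passing a timedelta column straight to to_datetime may not be accepted. A hedged alternative sketch anchors the durations to an arbitrary midnight (1970-01-01 here, chosen only for illustration) and keeps just the clock time:

```python
import pandas as pd

# Timedelta values as in the question; "24:00:00" parses to 1 day.
end_time = pd.to_timedelta(pd.Series(["8:00:00", "20:00:00", "24:00:00"]))

# Timestamp + timedelta Series gives a datetime Series; .dt.time drops the date part.
as_time = (pd.Timestamp("1970-01-01") + end_time).dt.time
print(as_time)
```

As in the answer above, a duration of exactly one day wraps around to 00:00:00, because the date part is discarded.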
Given a df of this kind, where we have DateTime Index:
DateTime A
2007-08-07 18:00:00 1
2007-08-08 00:00:00 2
2007-08-08 06:00:00 3
2007-08-08 12:00:00 4
2007-08-08 18:00:00 5
2007-11-02 18:00:00 6
2007-11-03 00:00:00 7
2007-11-03 06:00:00 8
2007-11-03 12:00:00 9
2007-11-03 18:00:00 10
I would like to subset observations using the attributes of the index, like:
First business day of the month
Last business day of the month
First Friday of the month 'WOM-1FRI'
Third Friday of the month 'WOM-3FRI'
I'm specifically interested to know if this can be done using something like:
df.loc[(df['A'] < 5) & (df.index == 'WOM-3FRI'), 'Signal'] = 1
Thanks
You could try...
# FIRST DAY OF MONTH
df.loc[df[1:][df.index.month[:-1]!=df.index.month[1:]].index]
# LAST DAY OF MONTH
df.loc[df[:-1][df.index.month[:-1]!=df.index.month[1:]].index]
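# 1st Friday: a Friday (weekday 4) that falls on days 1-7 of the month
fr1 = df.groupby(df.index.year*100+df.index.month).apply(lambda x: x[((x.index.day - 1) // 7 == 0) & (x.index.weekday == 4)])
# 3rd Friday: a Friday that falls on days 15-21 of the month
fr3 = df.groupby(df.index.year*100+df.index.month).apply(lambda x: x[((x.index.day - 1) // 7 == 2) & (x.index.weekday == 4)])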
If you want to remove extra-levels in the index of fr1 and fr3:
fr1.index=fr1.index.droplevel(0)
fr3.index=fr3.index.droplevel(0)
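Closer to the form asked for in the question, the 'WOM-3FRI' offset alias can be used to generate the third Fridays with date_range and then test index membership. A minimal sketch, with made-up sample rows and the hypothetical names third_fridays and is_third_friday:

```python
import pandas as pd

# Made-up rows: 2007-08-17 and 2007-11-16 are third Fridays, 2007-11-02 is a first Friday.
index = pd.to_datetime([
    "2007-08-07 18:00", "2007-08-17 06:00", "2007-08-17 18:00",
    "2007-11-02 18:00", "2007-11-16 12:00",
])
df = pd.DataFrame({"A": [1, 2, 6, 3, 4]}, index=index)

# All third Fridays (at midnight) covering the span of the index.
third_fridays = pd.date_range(df.index.min().normalize(),
                              df.index.max().normalize(),
                              freq="WOM-3FRI")

# Compare on the date only, so intraday timestamps still match.
is_third_friday = df.index.normalize().isin(third_fridays)

df.loc[(df["A"] < 5) & is_third_friday, "Signal"] = 1
print(df)
```

Swapping the alias for 'WOM-1FRI' gives first Fridays in the same way; rows that do not match keep NaN in Signal.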