In the dataframe below, the 'end_of_week' column does not exist yet. I want to set it so that if 'date' is on or before the Thursday of its week, 'end_of_week' is that Thursday. How do I do this?
What I'm trying to do:
df['end_of_week'] = Thursday of same week if df['date'] <= Thursday
Example:
0 date end_of_week
1 2015-08-31 2015-09-03 #if <= Thursday of that week
2 2015-09-01 2015-09-03
3 2015-09-07 2015-09-10
4 2015-09-09 2015-09-10
5 2015-09-16 2015-09-17
6 2015-09-17 2015-09-17
Thanks.
How about:
df['end_of_week'] = df['date'].map(lambda x: next_weekday(x, 3) if x.weekday() < 3 else x)
where next_weekday comes from @phihag's answer to "Find the date for the first Monday after a given date":
def next_weekday(d, weekday):
    days_ahead = weekday - d.weekday()
    if days_ahead <= 0:
        days_ahead += 7
    return d + datetime.timedelta(days_ahead)
Doing a test:
import pandas as pd
import datetime
xl3 = pd.ExcelFile('test2.xlsx')
df3 = xl3.parse("Sheet1")
df3
Out[71]:
x y date
0 1 fum 2015-06-01
1 2 fo 2015-06-02
2 3 fi 2015-06-03
3 4 fee 2015-06-04
4 5 dumbledum 2015-06-05
5 6 dumbledee 2015-06-06
df3['end_of_week'] = df3['date'].map(lambda x: next_weekday(x, 3) if x.weekday() < 3 else x)
df3
x y date end_of_week
0 1 fum 2015-06-01 2015-06-04
1 2 fo 2015-06-02 2015-06-04
2 3 fi 2015-06-03 2015-06-04
3 4 fee 2015-06-04 2015-06-04
4 5 dumbledum 2015-06-05 2015-06-05
5 6 dumbledee 2015-06-06 2015-06-06
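The same rule can also be written without a per-row lambda. The following is only a sketch, assuming the 'date' column already has a datetime dtype; the helper name thursday_of_week is made up for illustration. Like the answer above, it leaves dates after Thursday unchanged.

```python
import pandas as pd

THURSDAY = 3  # pandas numbers weekdays from Monday = 0

def thursday_of_week(dates: pd.Series) -> pd.Series:
    """Return the Thursday of each date's week; dates after Thursday are kept as-is."""
    # Offset in days from each date to the Thursday of the same week (may be negative).
    offset = pd.to_timedelta(THURSDAY - dates.dt.weekday, unit="D")
    thursday = dates + offset
    # Keep the Thursday only where the date is on or before it.
    return thursday.where(dates.dt.weekday <= THURSDAY, dates)

df = pd.DataFrame({"date": pd.to_datetime(
    ["2015-08-31", "2015-09-01", "2015-09-09", "2015-09-17", "2015-09-18"])})
df["end_of_week"] = thursday_of_week(df["date"])
print(df)
```

What to do with Friday through Sunday is a design choice the question leaves open; replacing the second argument of where changes that behaviour.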
You can use arrow to find the nearest Thursday. Arrow considers Monday the zeroth day of the week.
import arrow
THURSDAY = 3
arw = arrow.get("2015-08-31")
arw = arw.shift(days=THURSDAY - arw.weekday()).format('YYYY-MM-DD')
#=> arw = '2015-09-03'
To address your issue directly:
import arrow
THURSDAY = 3
dates = [
"2015-08-31",
"2015-09-01",
"2015-09-07",
"2015-09-09",
"2015-09-16",
"2015-09-17",
]
end_of_week_dates = []
for date in dates:
    arw = arrow.get(date)
    if arw.weekday() <= THURSDAY:
        end_of_week = arw.shift(days=THURSDAY - arw.weekday()).format('YYYY-MM-DD')
        end_of_week_dates.append(end_of_week)
    else:
        # handle the case where the date is after the Thursday of that week
        pass
# end_of_week_dates = [
# '2015-09-03',
# '2015-09-03',
# '2015-09-10',
# '2015-09-10',
# '2015-09-17',
# '2015-09-17'
# ]
how to convert time to week number
year_start = '2019-05-21'
year_end = '2020-02-22'
How do I get the week number based on the date that I set as first week?
For example 2019-05-21 should be Week 1 instead of 2019-01-01
If you do not have dates outside of year_start/year_end, use isocalendar().week and perform a simple subtraction with modulo:
year_start = pd.to_datetime('2019-05-21')
#year_end = pd.to_datetime('2020-02-22')
df = pd.DataFrame({'date': pd.date_range('2019-05-21', '2020-02-22', freq='30D')})
df['week'] = (df['date'].dt.isocalendar().week.astype(int)-year_start.isocalendar()[1])%52+1
Output:
date week
0 2019-05-21 1
1 2019-06-20 5
2 2019-07-20 9
3 2019-08-19 14
4 2019-09-18 18
5 2019-10-18 22
6 2019-11-17 26
7 2019-12-17 31
8 2020-01-16 35
9 2020-02-15 39
Try the following code.
import numpy as np
import pandas as pd
year_start = '2019-05-21'
year_end = '2020-02-22'
# Create a sample dataframe
df = pd.DataFrame(pd.date_range(year_start, year_end, freq='D'), columns=['date'])
# Add the week number
df['week_number'] = (((df['date'] - pd.to_datetime(year_start)).dt.days - df['date'].dt.day_of_week + 7) // 7 + 1).astype(np.int64)
date        week_number
2019-05-21  1
2019-05-22  1
2019-05-23  1
2019-05-24  1
2019-05-25  1
2019-05-26  1
2019-05-27  2
2019-05-28  2
...         ...
2020-02-18  40
2020-02-19  40
2020-02-20  40
2020-02-21  40
2020-02-22  40
If you just need a function to calculate week no, based on given start and end date:
import pandas as pd
import numpy as np
start_date = "2019-05-21"
end_date = "2020-02-22"
start_datetime = pd.to_datetime(start_date)
end_datetime = pd.to_datetime(end_date)
def get_week_no(date):
    given_datetime = pd.to_datetime(date)
    # if date in range
    if start_datetime <= given_datetime <= end_datetime:
        x = given_datetime - start_datetime
        # adding 1 as it will return 0 for 1st week
        return int(x / np.timedelta64(1, 'W')) + 1
    raise ValueError(f"Date is not in range {start_date} - {end_date}")
print(get_week_no("2019-05-21"))
The function computes the week number as the difference between the given date and the start date, measured in whole weeks, plus 1.
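For a whole column, the same 7-day-block counting can be done without calling a function per row. This is a rough sketch, not a drop-in replacement: the name week_numbers is hypothetical, and it does no range checking.

```python
import pandas as pd

def week_numbers(dates: pd.Series, start: str) -> pd.Series:
    """Number weeks as consecutive 7-day blocks counted from `start` (week 1)."""
    elapsed_days = (dates - pd.Timestamp(start)).dt.days
    return elapsed_days // 7 + 1

df = pd.DataFrame({"date": pd.to_datetime(
    ["2019-05-21", "2019-05-27", "2019-05-28", "2020-02-22"])})
df["week"] = week_numbers(df["date"], "2019-05-21")
print(df)
```

Note that these blocks start on the weekday of the start date, so they do not line up with Monday-based calendar weeks.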
I have a pandas dataframe with a date column.
I'm trying to write a function and apply it to the dataframe to create a column holding the number of days in the month/year of each date.
So far I have:
from calendar import monthrange
def dom(x):
    m = dfs["load_date"].dt.month
    y = dfs["load_date"].dt.year
    monthrange(y,m)
    days = monthrange[1]
    return days
However, this does not work when I try to apply it to the date column.
Additionally, I would like to detect whether a date falls in the current month and, if so, return the number of days up to the current date in that month rather than the number of days in the entire month.
I am not sure of the best way to do this. All I can think of is to check the month/year against datetime's today and then use a delta.
Thanks in advance.
For part 1 of your question, you can convert each timestamp to a pd.Period and read its days_in_month:
import pandas as pd
# create a sample df:
df = pd.DataFrame({'date': pd.date_range('2020-01', '2021-01', freq='M')})
df['daysinmonths'] = df['date'].apply(lambda t: pd.Period(t, freq='S').days_in_month)
# df['daysinmonths']
# 0 31
# 1 29
# 2 31
# ...
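If the column already has a datetime dtype, the .dt accessor exposes the same attribute directly, which avoids the per-row apply. A small sketch with made-up sample dates:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(
    ["2020-01-15", "2020-02-10", "2021-02-10", "2020-04-30"])})
# days_in_month accounts for leap years (February 2020 vs. February 2021).
df["daysinmonths"] = df["date"].dt.days_in_month
print(df)
```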
For part 2, you can take the timestamp of 'now' and create a boolean mask for your date column that is True where the year/month is earlier than "now". Then calculate the cumsum of the daysinmonths column over the rows where the mask is True, and reverse the order of that series to get the days until now.
now = pd.Timestamp('now')
m = (df['date'].dt.year <= now.year) & (df['date'].dt.month < now.month)
df['daysuntilnow'] = df['daysinmonths'][m].cumsum().iloc[::-1].reset_index(drop=True)
Update after comment: to get the elapsed days per month, you can do
df['dayselapsed'] = df['daysinmonths']
m = (df['date'].dt.year == now.year) & (df['date'].dt.month == now.month)
if m.any():
    df.loc[m, 'dayselapsed'] = now.day
df.loc[(df['date'].dt.year >= now.year) & (df['date'].dt.month > now.month), 'dayselapsed'] = 0
output
df
Out[13]:
date daysinmonths daysuntilnow dayselapsed
0 2020-01-31 31 213.0 31
1 2020-02-29 29 182.0 29
2 2020-03-31 31 152.0 31
3 2020-04-30 30 121.0 30
4 2020-05-31 31 91.0 31
5 2020-06-30 30 60.0 30
6 2020-07-31 31 31.0 31
7 2020-08-31 31 NaN 27
8 2020-09-30 30 NaN 0
9 2020-10-31 31 NaN 0
10 2020-11-30 30 NaN 0
11 2020-12-31 31 NaN 0
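One thing to watch in the masks above: comparing year and month separately can misclassify dates once the data spans more than one calendar year (for example, January 2021 seen from August 2020). A possible way around this, sketched here under the assumption that comparing monthly periods is acceptable, is shown below; the function name days_elapsed is hypothetical.

```python
import pandas as pd

def days_elapsed(dates: pd.Series, now: pd.Timestamp) -> pd.Series:
    """Days elapsed per month: full month for the past, now.day for the current month, 0 for the future."""
    months = dates.dt.to_period("M")
    current = now.to_period("M")
    elapsed = dates.dt.days_in_month.astype("int64")
    elapsed = elapsed.mask(months == current, now.day)  # current month: days so far
    elapsed = elapsed.mask(months > current, 0)          # future months: nothing elapsed
    return elapsed

# Fixed "now" so the example is reproducible; pd.Timestamp("now") would be used in practice.
now = pd.Timestamp("2020-08-27")
dates = pd.Series(pd.to_datetime(["2019-12-31", "2020-08-31", "2020-09-30", "2021-01-31"]))
result = days_elapsed(dates, now)
print(result)
```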
I have a dataframe df:
0 2003-01-02
1 2015-10-31
2 2015-11-01
16 2015-11-02
33 2015-11-03
44 2015-11-04
and I want to trim the outliers in the dates. In this example I want to delete the row with the date 2003-01-02. In bigger data frames I want to delete the dates that do not lie in the interval containing 95% or 99% of the values. Is there a function that can do this?
You could use quantile() on Series or DataFrame.
import datetime
import pandas as pd
dates = [datetime.date(2003,1,2),
datetime.date(2015,10,31),
datetime.date(2015,11,1),
datetime.date(2015,11,2),
datetime.date(2015,11,3),
datetime.date(2015,11,4)]
df = pd.DataFrame({'DATE': [pd.Timestamp(x) for x in dates]})
print(df)
qa = df['DATE'].quantile(0.1) #lower 10%
qb = df['DATE'].quantile(0.9) #higher 10%
print(qa, qb)
#remove outliers
xf = df[(df['DATE'] >= qa) & (df['DATE'] <= qb)]
print(xf)
The output is:
DATE
0 2003-01-02
1 2015-10-31
2 2015-11-01
3 2015-11-02
4 2015-11-03
5 2015-11-04
2009-06-01 12:00:00 2015-11-03 12:00:00
DATE
1 2015-10-31
2 2015-11-01
3 2015-11-02
4 2015-11-03
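To get the 95% or 99% interval the question asks about, the two quantiles can be derived from a single "keep" fraction. This is only a sketch; trim_dates and its keep parameter are names invented for illustration.

```python
import pandas as pd

def trim_dates(dates: pd.Series, keep: float = 0.95) -> pd.Series:
    """Keep only the dates inside the central `keep` fraction of the distribution."""
    tail = (1.0 - keep) / 2.0
    lower = dates.quantile(tail)
    upper = dates.quantile(1.0 - tail)
    return dates[dates.between(lower, upper)]

dates = pd.Series(pd.to_datetime(
    ["2003-01-02", "2015-10-31", "2015-11-01", "2015-11-02", "2015-11-03", "2015-11-04"]))
trimmed = trim_dates(dates, keep=0.8)  # same 10%/90% cut as in the answer above
print(trimmed)
```

Be aware that a quantile cut removes values from both ends regardless of whether they are real outliers; in the small example the perfectly ordinary 2015-11-04 is dropped as well.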
Assuming you have your column converted to datetime format:
import pandas as pd
import datetime as dt
df = pd.DataFrame(data)
df = pd.to_datetime(df[0])
you can do:
include = df[df.dt.year > 2003]
print(include)
[out]:
1 2015-10-31
2 2015-11-01
3 2015-11-02
4 2015-11-03
5 2015-11-04
Name: 0, dtype: datetime64[ns]
Have a look here
... regarding your answer (it's basically the same idea; be creative, my friend):
s = pd.Series(df)
s10 = s.quantile(.10)
s90 = s.quantile(.90)
my_filtered_data = df[df.dt.year >= s10.year]
my_filtered_data = my_filtered_data[my_filtered_data.dt.year <= s90.year]
We have a CSV file containing predefined time slots.
Given a start time and end time provided by the user, we want the time slots that fall between the start time and end time.
E.g.:
start time =11:00:00
end time=19:00:00
output- slot_no 2,3,4,5
I think you need boolean indexing with loc and between to select the Slot_no column. First convert both time columns with to_timedelta, replacing midnight with 24:00:00 in end_time:
df = pd.DataFrame(
{'Slot_no':[1,2,3,4,5,6,7],
'start_time':['0:01:00','8:01:00','10:01:01','12:01:00','14:01:00','18:01:01','20:01:00'],
'end_time':['8:00:00','10:00:00','12:00:00','14:00:00','18:00:00','20:00:00','0:00:00']})
df = df.reindex(columns=['Slot_no','start_time','end_time'])
df['start_time'] = pd.to_timedelta(df['start_time'])
df['end_time'] = pd.to_timedelta(df['end_time'].replace('0:00:00', '24:00:00'))
print (df)
Slot_no start_time end_time
0 1 00:01:00 0 days 08:00:00
1 2 08:01:00 0 days 10:00:00
2 3 10:01:01 0 days 12:00:00
3 4 12:01:00 0 days 14:00:00
4 5 14:01:00 0 days 18:00:00
5 6 18:01:01 0 days 20:00:00
6 7 20:01:00 1 days 00:00:00
start = pd.to_timedelta('11:00:00')
end = pd.to_timedelta('19:00:00')
mask = df['start_time'].between(start, end) | df['end_time'].between(start, end)
s = df.loc[mask, 'Slot_no']
print (s)
2 3
3 4
4 5
5 6
Name: Slot_no, dtype: int64
L = df.loc[mask, 'Slot_no'].tolist()
print (L)
[3, 4, 5, 6]
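The between-based mask matches slots whose start or end falls inside the requested window, so a slot that completely contains the window would be missed. If that case matters, a plain interval-overlap test may be preferable. The sketch below uses made-up slot data and a hypothetical helper name, overlapping_slots.

```python
import pandas as pd

def overlapping_slots(slots: pd.DataFrame, start: str, end: str) -> list:
    """Return Slot_no values whose [start_time, end_time] interval overlaps [start, end]."""
    start_td = pd.to_timedelta(start)
    end_td = pd.to_timedelta(end)
    # Two intervals overlap when each one starts before the other one ends.
    mask = (slots["start_time"] <= end_td) & (slots["end_time"] >= start_td)
    return slots.loc[mask, "Slot_no"].tolist()

slots = pd.DataFrame({
    "Slot_no": [1, 2, 3],
    "start_time": pd.to_timedelta(["0:00:00", "8:00:00", "16:00:00"]),
    "end_time": pd.to_timedelta(["8:00:00", "16:00:00", "24:00:00"]),
})
found = overlapping_slots(slots, "10:00:00", "12:00:00")
print(found)
```

Here the 10:00 to 12:00 window lies entirely inside slot 2, which neither between test would catch.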
I'm trying to get the week of the month; some months have four weeks, some have five.
For each date I would like to know which week it belongs to. I'm mostly interested in the last week of the month.
data = pd.DataFrame(pd.date_range('1/1/2000', periods = 100, freq ='D'))
0 2000-01-01
1 2000-01-02
2 2000-01-03
3 2000-01-04
4 2000-01-05
5 2000-01-06
6 2000-01-07
See this answer and decide which week of month you want.
There's nothing built-in, so you'll need to calculate it with apply. For example, for an easy 'how many 7 day periods have passed' measure.
data['wom'] = data[0].apply(lambda d: (d.day-1) // 7 + 1)
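Since the question is mostly about the last week of the month, one simple reading of "last week" is "the final 7 days of the month". The sketch below assumes that reading and a column named 'date' (the question's frame uses the default column 0 instead).

```python
import pandas as pd

data = pd.DataFrame({"date": pd.date_range("2000-01-01", periods=100, freq="D")})
day = data["date"].dt.day
# Number of full 7-day periods that have passed, starting at 1.
data["wom"] = (day - 1) // 7 + 1
# True for the final 7 days of each month, whatever the month length.
data["in_last_7_days"] = day > data["date"].dt.days_in_month - 7
print(data.tail(10))
```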
For a more complicated, calendar-based measure, use the function from that answer:
import datetime
import calendar
def week_of_month(tgtdate):
    tgtdate = tgtdate.to_pydatetime()
    days_this_month = calendar.mdays[tgtdate.month]
    for i in range(1, days_this_month):
        d = datetime.datetime(tgtdate.year, tgtdate.month, i)
        if d.day - d.weekday() > 0:
            startdate = d
            break
    # now we can use the modulo 7 approach
    return (tgtdate - startdate).days // 7 + 1
data['calendar_wom'] = data[0].apply(week_of_month)
I've used the code below when dealing with dataframes that have a datetime index.
import pandas as pd
import math
def add_week_of_month(df):
    df['week_in_month'] = pd.to_numeric(df.index.day/7)
    df['week_in_month'] = df['week_in_month'].apply(lambda x: math.ceil(x))
    return df
If you run this example:
df = test = pd.DataFrame({'count':['a','b','c','d','e']},
                         index = ['2018-01-01', '2018-01-08','2018-01-31','2018-02-01','2018-02-28'])
df.index = pd.to_datetime(df.index)
df = add_week_of_month(df)
you should get the following dataframe
count week_in_month
2018-01-01 a 1
2018-01-08 b 2
2018-01-31 c 5
2018-02-01 d 1
2018-02-28 e 4
TL;DR
import pandas as pd
def weekinmonth(dates):
    """Get week number in a month.

    Parameters:
        dates (pd.Series): Series of dates.
    Returns:
        pd.Series: Week number in a month.
    """
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit='d')
    return (dates.dt.day-1 + firstday_in_month.dt.weekday) // 7 + 1
df = pd.DataFrame(pd.date_range('1/1/2000', periods = 100, freq ='D'), columns=['Date'])
weekinmonth(df['Date'])
0 1
1 1
2 2
3 2
4 2
..
95 2
96 2
97 2
98 2
99 2
Name: Date, Length: 100, dtype: int64
Explanation
First, calculate the first day of the month (from the answer to "How floor a date to the first date of that month?"):
df = pd.DataFrame(pd.date_range('1/1/2000', periods = 100, freq ='D'), columns=['Date'])
df['MonthFirstDay'] = df['Date'] - pd.to_timedelta(df['Date'].dt.day - 1, unit='d')
df
Date MonthFirstDay
0 2000-01-01 2000-01-01
1 2000-01-02 2000-01-01
2 2000-01-03 2000-01-01
3 2000-01-04 2000-01-01
4 2000-01-05 2000-01-01
.. ... ...
95 2000-04-05 2000-04-01
96 2000-04-06 2000-04-01
97 2000-04-07 2000-04-01
98 2000-04-08 2000-04-01
99 2000-04-09 2000-04-01
[100 rows x 2 columns]
Obtain weekday from first day:
df['FirstWeekday'] = df['MonthFirstDay'].dt.weekday
df
Date MonthFirstDay FirstWeekday
0 2000-01-01 2000-01-01 5
1 2000-01-02 2000-01-01 5
2 2000-01-03 2000-01-01 5
3 2000-01-04 2000-01-01 5
4 2000-01-05 2000-01-01 5
.. ... ... ...
95 2000-04-05 2000-04-01 5
96 2000-04-06 2000-04-01 5
97 2000-04-07 2000-04-01 5
98 2000-04-08 2000-04-01 5
99 2000-04-09 2000-04-01 5
[100 rows x 3 columns]
Now the week number in the month follows from an integer division by the 7 days of a week:
Get the day of the month with df['Date'].dt.day and subtract 1 so that it starts at 0: df['Date'].dt.day-1.
Add the weekday of the month's first day to account for where in the week the month starts: + df['FirstWeekday'].
Integer-divide by 7 and add 1 so that week numbers in the month start at 1: // 7 + 1.
The whole calculation:
df['WeekInMonth'] = (df['Date'].dt.day-1 + df['FirstWeekday']) // 7 + 1
df
Date MonthFirstDay FirstWeekday WeekInMonth
0 2000-01-01 2000-01-01 5 1
1 2000-01-02 2000-01-01 5 1
2 2000-01-03 2000-01-01 5 2
3 2000-01-04 2000-01-01 5 2
4 2000-01-05 2000-01-01 5 2
.. ... ... ... ...
95 2000-04-05 2000-04-01 5 2
96 2000-04-06 2000-04-01 5 2
97 2000-04-07 2000-04-01 5 2
98 2000-04-08 2000-04-01 5 2
99 2000-04-09 2000-04-01 5 2
[100 rows x 4 columns]
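As a sanity check, the formula can be compared with the row of the standard library's calendar.monthcalendar grid in which each date appears, which also uses Monday-based weeks by default. The helper calendar_week below is hypothetical and exists only for this comparison.

```python
import calendar
import pandas as pd

def weekinmonth(dates: pd.Series) -> pd.Series:
    """Week number in the month, using Monday-based weeks (same formula as above)."""
    first_day = dates - pd.to_timedelta(dates.dt.day - 1, unit="D")
    return (dates.dt.day - 1 + first_day.dt.weekday) // 7 + 1

def calendar_week(ts: pd.Timestamp) -> int:
    """1-based row of calendar.monthcalendar that contains the given day."""
    weeks = calendar.monthcalendar(ts.year, ts.month)
    return next(i for i, week in enumerate(weeks, start=1) if ts.day in week)

dates = pd.Series(pd.date_range("2000-01-01", periods=366, freq="D"))
computed = weekinmonth(dates)
expected = dates.apply(calendar_week)
print((computed == expected).all())
```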
This seems to do the trick for me
df_dates = pd.DataFrame({'date':pd.bdate_range(df['date'].min(),df['date'].max())})
df_dates_tues = df_dates[df_dates['date'].dt.weekday==1].copy()  # Monday is 0, so Tuesday is 1
df_dates_tues['week']=np.mod(df_dates_tues['date'].dt.strftime('%W').astype(int),4)
You can get it subtracting the current week and the week of the first day of the month, but extra logic is needed to handle first and last week of the year:
def get_week(s):
    week = s.dt.isocalendar().week.astype(int)
    prev_week = (s - pd.to_timedelta(7, unit='d')).dt.isocalendar().week.astype(int)
    return (
        week
        .where((s.dt.month != 1) | (week < 50), 0)
        .where((s.dt.month != 12) | (week > 1), prev_week + 1)
    )
def get_week_of_month(s):
    first_day_of_month = s - pd.to_timedelta(s.dt.day - 1, unit='d')
    first_week_of_month = get_week(first_day_of_month)
    current_week = get_week(s)
    return current_week - first_week_of_month
My logic for getting the week of the month depends on the week of the year.
First, calculate the week of the year in a data frame.
Then, if the month is not January, get the maximum week of the year of the previous month; if the month is January, return the week of the year.
If the current week of the year equals the maximum week of the previous month, return the difference between the two plus 1.
Otherwise, return the difference between the current week of the year and the maximum week of the previous month.
The approaches above have limitations that I hope this one avoids; the function below implements this logic. Here temp is the data frame whose week of the year is calculated using dt.isocalendar().week.
def weekofmonth(dt1):
    if dt1.month == 1:
        return (dt1.weekofyear)
    else:
        pmth = dt1.month - 1
        year = dt1.year
        pmmaxweek = temp[(temp['timestamp_utc'].dt.month == pmth) & (temp['timestamp_utc'].dt.year == year)]['timestamp_utc'].dt.isocalendar().week.max()
        if dt1.weekofyear == pmmaxweek:
            return (dt1.weekofyear - pmmaxweek + 1)
        else:
            return (dt1.weekofyear - pmmaxweek)
import pandas as pd
def week_of_month(dt: pd.Timestamp):
    return (dt.day - 1) // 7 + 1
df["what_you_need"] = df["dt_col_name"].apply(week_of_month)
This gives a week number from 1 to 5; days after the 28th count as the 5th week.