I have a df with a column day_name. How do I get the number of times each weekday occurred in the last month?
I'm trying to get the number of occurrences of each weekday in the last month.
For example, there are 4 Fridays and 5 Thursdays in April 2020.
df
day_name
0 Friday
1 Sunday
2 Thursday
3 Wednesday
4 Monday
In plain Python, for a single weekday:
import calendar
year = 2020
month = 4
day_to_count = calendar.WEDNESDAY
matrix = calendar.monthcalendar(year,month)
num_days = sum(1 for x in matrix if x[day_to_count] != 0)
How do I use this with a DataFrame? Any other suggestions are welcome.
expected output
day_name last_months_count
0 Friday 4
1 Sunday 4
2 Thursday 5
3 Wednesday 5
4 Monday 4
For last month:
import pandas as pd
year, month = 2020, 4
# every day of the month; using periods= avoids building a
# "month + 1" end date, which would fail for December
start = pd.Timestamp(year=year, month=month, day=1)
last_month = pd.date_range(start, periods=start.days_in_month, freq='D')
df['last_month_count'] = df['day_name'].map(last_month.day_name().value_counts())
Output:
day_name last_month_count
0 Friday 4
1 Sunday 4
2 Thursday 5
3 Wednesday 5
4 Monday 4
Bonus: to extract last month programmatically:
from datetime import datetime
now = datetime.now()
# step back one month from the current one
year, month = now.year, now.month - 1
# January wraps around to December of the previous year
if month == 0:
    year, month = year - 1, 12
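The two steps above (find the previous month, then count its weekdays) can also be combined without pandas. This is a minimal sketch, assuming English weekday names as returned by calendar.day_name; the helper names previous_month and weekday_counts are illustrative and not part of the answer.

```python
import calendar
from collections import Counter
from datetime import date


def previous_month(today):
    """Return (year, month) of the month before `today`."""
    if today.month == 1:
        return today.year - 1, 12
    return today.year, today.month - 1


def weekday_counts(year, month):
    """Count how often each weekday name occurs in the given month."""
    first_weekday, n_days = calendar.monthrange(year, month)
    return Counter(
        calendar.day_name[(first_weekday + offset) % 7]
        for offset in range(n_days)
    )


# A fixed date is used here so the result is reproducible (April 2020)
counts = weekday_counts(*previous_month(date(2020, 5, 15)))
print(counts["Thursday"], counts["Friday"])
```

The resulting Counter can be passed straight to df['day_name'].map(counts).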
Here you go:
from datetime import date, timedelta
from calendar import day_name
import pandas as pd
today = date.today()
dt = date(today.year, today.month, 1) - timedelta(days=1)
day_to_count = {}
month = dt.month
while dt.month == month:
    key = day_name[dt.weekday()]
    day_to_count[key] = day_to_count.get(key, 0) + 1
    dt -= timedelta(days=1)
df = pd.DataFrame({
    'day_name': ['Friday', 'Sunday', 'Thursday', 'Wednesday', 'Monday']
})
df['last_months_count'] = df['day_name'].apply(lambda day : day_to_count[day])
print(df)
Output:
day_name last_months_count
0 Friday 4
1 Sunday 4
2 Thursday 5
3 Wednesday 5
4 Monday 4
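One detail of the final lookup: indexing the dictionary inside apply raises a KeyError for any value that is not a valid weekday name. A hedged alternative is Series.map, which leaves NaN for unknown values instead. The sketch below hard-codes the April 2020 counts and includes a deliberately misspelled name for illustration.

```python
import pandas as pd

# Weekday counts for April 2020, written out by hand for this example
day_to_count = {
    "Monday": 4, "Tuesday": 4, "Wednesday": 5, "Thursday": 5,
    "Friday": 4, "Saturday": 4, "Sunday": 4,
}

# "Frday" is a deliberate typo to show how unknown names are handled
df = pd.DataFrame({"day_name": ["Friday", "Thursday", "Frday"]})

# map() looks each value up in the dict and yields NaN when it is missing
df["last_months_count"] = df["day_name"].map(day_to_count)
print(df)
```

Because of the NaN, the new column becomes float; whether that or a hard failure is preferable depends on how clean the input is expected to be.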
Related
I have the timeseries dataframe as:
timestamp            signal_value
2017-08-28 00:00:00  10
2017-08-28 00:05:00  3
2017-08-28 00:10:00  5
2017-08-28 00:15:00  5
I am trying to get, for each month, the percentage of the time during which "signal_value" is greater than 5. Something like:
Month     metric
January   16%
February  2%
March     8%
April     10%
I tried the following code, which gives the result for the whole dataset, but how can I summarize it per month?
total, count = 0, 0
for index, row in df.iterrows():
    total += 1
    if row["signal_value"] >= 5:
        count += 1
print((count/total)*100)
Thank you in advance.
Let us first generate some random data (generate random dates taken from here):
import pandas as pd
import numpy as np
import datetime
def randomtimes(start, end, n):
    frmt = '%d-%m-%Y %H:%M:%S'
    stime = datetime.datetime.strptime(start, frmt)
    etime = datetime.datetime.strptime(end, frmt)
    td = etime - stime
    # n uniformly distributed timestamps between start and end
    dtimes = [np.random.random() * td + stime for _ in range(n)]
    return [d.strftime(frmt) for d in dtimes]
# Recreate some fake data
timestamp = randomtimes("01-01-2021 00:00:00", "01-01-2023 00:00:00", 10000)
signal_value = np.random.random(len(timestamp)) * 10
df = pd.DataFrame({"timestamp": timestamp, "signal_value": signal_value})
Now we can transform the timestamp column to pandas timestamps to extract month and year per timestamp:
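# the strings are day-first, so pass the format explicitly to avoid day/month being swapped
df.timestamp = pd.to_datetime(df.timestamp, format='%d-%m-%Y %H:%M:%S')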
df["month"] = df.timestamp.dt.month
df["year"] = df.timestamp.dt.year
We add a boolean column indicating whether signal_value is larger than some threshold (here 5):
df["is_larger5"] = df.signal_value > 5
Finally, we can get the value for every month with DataFrame.groupby. The mean of a boolean column is the fraction of True values, so multiply by 100 to get a percentage:
>>> df.groupby(["year", "month"])['is_larger5'].mean()
year month
2021 1 0.509615
2 0.488189
3 0.506024
4 0.519362
5 0.498778
6 0.483709
7 0.498824
8 0.460396
9 0.542918
10 0.463043
11 0.492500
12 0.519789
2022 1 0.481663
2 0.527778
3 0.501139
4 0.527322
5 0.486936
6 0.510638
7 0.483370
8 0.521253
9 0.493639
10 0.495349
11 0.474886
12 0.488372
Name: is_larger5, dtype: float64
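To get closer to the Month/metric layout asked for in the question, the same idea can be applied to a small sample. This is a sketch only: the first four rows come from the question, the two September rows are made up for illustration, and grouping by dt.to_period("M") is one possible choice that keeps year and month together in a single key.

```python
import pandas as pd

# Four rows from the question plus two made-up September rows
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-08-28 00:00:00", "2017-08-28 00:05:00",
        "2017-08-28 00:10:00", "2017-08-28 00:15:00",
        "2017-09-01 00:00:00", "2017-09-01 00:05:00",
    ]),
    "signal_value": [10, 3, 5, 5, 7, 9],
})

# True where the signal is strictly greater than the threshold
above = df["signal_value"] > 5
# One key per calendar month, e.g. 2017-08
month = df["timestamp"].dt.to_period("M").rename("Month")
# Mean of booleans = share of True values; scale to percent
metric = above.groupby(month).mean().mul(100).rename("metric")
print(metric)
```

Note that this is the share of samples above the threshold, which equals the share of time only if the samples are evenly spaced.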
I have a csv-file: https://data.rivm.nl/covid-19/COVID-19_aantallen_gemeente_per_dag.csv
I want to use it to provide insight into the corona deaths per week.
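import pandas as pd
# on_bad_lines="skip" replaces error_bad_lines=False, which was deprecated in pandas 1.3 and removed in 2.0
df = pd.read_csv("covid.csv", on_bad_lines="skip", sep=";")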
df = df.loc[df['Deceased'] > 0]
df["Date_of_publication"] = pd.to_datetime(df["Date_of_publication"])
df["Week"] = df["Date_of_publication"].dt.isocalendar().week
df["Year"] = df["Date_of_publication"].dt.year
df = df[["Week", "Year", "Municipality_name", "Deceased"]]
df = df.groupby(by=["Week", "Year", "Municipality_name"]).agg({"Deceased" : "sum"})
df = df.sort_values(by=["Year", "Week"])
print(df)
Everything seems to be working fine except for the first 3 days of 2021. The first 3 days of 2021 are part of the last week (53) of 2020: http://week-number.net/calendar-with-week-numbers-2021.html.
When I print the dataframe this is the result:
53 2021 Winterswijk 1
Woudenberg 1
Zaanstad 1
Zeist 2
Zutphen 1
So basically what I'm looking for is a way where this line returns the year of the week number and not the year of the date:
df["Year"] = df["Date_of_publication"].dt.year
You can use dt.isocalendar().year to set up df["Year"]:
df["Year"] = df["Date_of_publication"].dt.isocalendar().year
This gives year 2020 for the date 2021-01-01 and switches to year 2021 from 2021-01-04 onward.
This mirrors how you used dt.isocalendar().week to set up df["Week"]. Both values come from the same (year, week, day) triple returned by dt.isocalendar(), so they always stay in sync.
Demo
date_s = pd.Series(pd.date_range(start='2021-01-01', periods=5, freq='1D'))
date_s
0
0 2021-01-01
1 2021-01-02
2 2021-01-03
3 2021-01-04
4 2021-01-05
date_s.dt.isocalendar()
year week day
0 2020 53 5
1 2020 53 6
2 2020 53 7
3 2021 1 1
4 2021 1 2
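Applied to the weekly aggregation, grouping by the ISO year and ISO week keeps the first days of January 2021 in week 53 of 2020. The following is a minimal sketch with three made-up rows; only the column names Date_of_publication and Deceased are taken from the question.

```python
import pandas as pd

# Made-up rows around the year boundary, for illustration only
df = pd.DataFrame({
    "Date_of_publication": pd.to_datetime(
        ["2020-12-31", "2021-01-02", "2021-01-04"]
    ),
    "Deceased": [1, 2, 3],
})

# isocalendar() returns a DataFrame with columns year, week and day
iso = df["Date_of_publication"].dt.isocalendar()

# Group on ISO year and ISO week together so both come from the same calendar
weekly = df.groupby([iso["year"], iso["week"]])["Deceased"].sum()
print(weekly)
```

Here 2020-12-31 and 2021-01-02 end up in the same group (2020, week 53), while 2021-01-04 starts (2021, week 1).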
You can simply subtract the two dates and then divide the days attribute of the resulting timedelta by 7.
For example, to find the week we are in now, counted from the start of 2021:
import datetime as dt
time_delta = dt.datetime.today() - dt.datetime(2021, 1, 1)
The result is a datetime.timedelta object, for example:
datetime.timedelta(days=75, seconds=84904, microseconds=144959)
For your problem, you'd do something like this (assuming df["Year"] holds the calendar year of each date):
weeks = (df["Date_of_publication"] - pd.to_datetime(df["Year"].astype(str) + "-01-01")).dt.days // 7
The output is the number of whole weeks elapsed between 1 January of that year and Date_of_publication. Note that this is not the same as the ISO week number.
I have a column with many dates; a sample is shown below:
Dates
1 2019-02-01
2 2018-03-10
3 2019-08-01
4 2020-02-07
I would like to be able to input a date from any year and get its week number.
However, the fiscal year starts on Aug 1 of any given year.
I tried just shifting the date to Jan 1 but it's different for every year due to leap years.
data['Dates'] = pd.to_datetime(data['Dates'])
data['Week'] = (data['Dates'] - timedelta(days=215)).week
print(data)
How can I get a result similar to the one below?
Dates Week
1 2019-02-01 27
2 2018-03-10 32
3 2019-08-01 1
4 2020-02-07 28
Note: the week numbers above are probably incorrect.
The other answer ignores the fiscal year part of the question. I am leaving the fiscal year start date calculation to the reader, but this will calculate the week number (where Monday is the start of the week) from an arbitrary start date.
from dateutil import relativedelta
from datetime import date, datetime, timedelta
NEXT_MONDAY = relativedelta.relativedelta(weekday=relativedelta.MO)
LAST_MONDAY = relativedelta.relativedelta(weekday=relativedelta.MO(-1))
ONE_WEEK = timedelta(weeks=1)
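def week_in_fiscal_year(d: date, fiscal_year_start: date) -> int:
    # First Monday strictly after the fiscal year start; the extra day keeps
    # a fiscal year that starts on a Monday from beginning at week 2.
    fy_week_2_monday = fiscal_year_start + timedelta(days=1) + NEXT_MONDAY
    if d < fy_week_2_monday:
        return 1
    else:
        cur_week_monday = d + LAST_MONDAY
        return int((cur_week_monday - fy_week_2_monday) / ONE_WEEK) + 2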
adapted from this post
Convert it to a datetime, then call strftime("%V"), for example datetime.date(2010, 6, 16).strftime("%V"), which returns the ISO week number as a string.
You can also use isocalendar(), which returns a tuple rather than a string: datetime.date(2010, 6, 16).isocalendar()[1]
How to get week number in Python?
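For the fiscal-year question above, the start date that one answer leaves to the reader can be derived from the date itself. The sketch below uses only the standard library and makes two assumptions: the fiscal year starts on 1 August, and week 1 is the (possibly partial) Monday-based week that contains that day. The function names fiscal_year_start and fiscal_week are illustrative.

```python
from datetime import date, timedelta


def fiscal_year_start(d, start_month=8):
    """Return the first day of the fiscal year that contains `d`."""
    year = d.year if d.month >= start_month else d.year - 1
    return date(year, start_month, 1)


def fiscal_week(d, start_month=8):
    """Week number within the fiscal year, with weeks starting on Monday."""
    start = fiscal_year_start(d, start_month)
    # Monday on or before the fiscal year start marks the beginning of week 1
    week1_monday = start - timedelta(days=start.weekday())
    return (d - week1_monday).days // 7 + 1


for d in [date(2019, 2, 1), date(2018, 3, 10), date(2019, 8, 1), date(2020, 2, 7)]:
    print(d, fiscal_week(d))
```

Under these assumptions the four sample dates give weeks 27, 32, 1 and 28, the same values as in the question's table. With pandas, the function can be applied to the column via data['Dates'].dt.date.map(fiscal_week).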
I need to calculate a price based on the weekdays in a month, counted from a given date.
This is what I'm currently working with:
month = time.month
year = time.year
weekdays = 0
cal = calendar.Calendar()
for day in cal.itermonthdates(year, month):
    if day.weekday() == 6 and day.month == month:
        weekdays += 1
But this does not take a given date into account.
I want this to return 6 for the date 10.01.2020, or 6 for 03.01.2020 or 4 for 06.01.2020.
Any help would be very nice.
The following can be a DRY approach:
import calendar
import datetime
import itertools
# ...
# `time` is the given date (or datetime) from the question
start = datetime.date(time.year, time.month, time.day)
cal = calendar.Calendar()
# skip every date before the given one, then count from it onward
days_iterator = itertools.dropwhile(lambda d: d < start, cal.itermonthdates(start.year, start.month))
weekdays = 0
for d in days_iterator:
    if d.weekday() == 6 and d.month == start.month:
        weekdays += 1
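Since only the days from the given date to the end of the month matter, the count can also be written as a small function over that range. This is a sketch, not the asker's final pricing logic: the name remaining_weekday_count is made up, the given date is counted inclusively, and Sunday is used as the default because the code above tests weekday() == 6.

```python
import calendar
from datetime import date


def remaining_weekday_count(start, weekday=calendar.SUNDAY):
    """Count occurrences of `weekday` from `start` (inclusive) to month end."""
    last_day = calendar.monthrange(start.year, start.month)[1]
    return sum(
        1
        for day in range(start.day, last_day + 1)
        if date(start.year, start.month, day).weekday() == weekday
    )


# January 2020 has Sundays on the 5th, 12th, 19th and 26th
print(remaining_weekday_count(date(2020, 1, 3)))
print(remaining_weekday_count(date(2020, 1, 10)))
```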
Try this:
import datetime
d=datetime.date(2020, 1, 10) #Format YYYY, MM, DD
print(d.isoweekday())
This prints 5, not 6, because 10 January 2020 is a Friday and isoweekday() counts from Monday = 1 (weekday() counts from Monday = 0). If you want the week to begin on Sunday, so that Sunday = 1 and Saturday = 7, take the result modulo 7 and add 1:
print(d.isoweekday() % 7 + 1)
https://docs.python.org/3/library/datetime.html
How do I sort the DataFrame by weekday names? I cannot use the pd.to_datetime() method because my dates aren't numbers.
Date Transactions
0 Friday 140.652174
1 Monday 114.000000
2 Saturday 208.826087
3 Sunday 140.565217
4 Thursday 118.217391
5 Tuesday 107.826087
6 Wednesday 105.608696
You can convert column values to ordered categoricals, so it is possible to use sort_values:
cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df['Date'] = pd.Categorical(df['Date'], categories=cats, ordered=True)
df = df.sort_values('Date')
print (df)
Date Transactions
1 Monday 114.000000
5 Tuesday 107.826087
6 Wednesday 105.608696
4 Thursday 118.217391
0 Friday 140.652174
2 Saturday 208.826087
3 Sunday 140.565217
Or create an index from the Date column with set_index, then reindex and lastly reset_index:
Note: this solution only works if the column values are unique.
df = df.set_index('Date').reindex(cats).reset_index()
print (df)
Date Transactions
0 Monday 114.000000
1 Tuesday 107.826087
2 Wednesday 105.608696
3 Thursday 118.217391
4 Friday 140.652174
5 Saturday 208.826087
6 Sunday 140.565217
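When the weekday column contains repeated values, the reindex approach does not apply. One alternative, not taken from the answer above, is the key argument of sort_values (available since pandas 1.1), which maps each name to its position before sorting. The sample values below are made up, and mergesort is chosen so that rows with the same weekday keep their original order.

```python
import pandas as pd

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
# Position of each weekday name in the desired order
order = {name: position for position, name in enumerate(cats)}

# Made-up data with a repeated weekday
df = pd.DataFrame({
    "Date": ["Friday", "Monday", "Friday", "Sunday"],
    "Transactions": [1.0, 2.0, 3.0, 4.0],
})

# key= receives the column as a Series and must return a Series of sort keys
df_sorted = df.sort_values("Date", key=lambda s: s.map(order), kind="mergesort")
print(df_sorted)
```

Unlike the categorical approach, this leaves the Date column as plain strings.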
Use calendar.day_name with categorical data:
from calendar import day_name
df['Date'] = pd.Categorical(df['Date'], categories=day_name, ordered=True)
df = df.sort_values('Date')
print(df)
Date Transactions
1 Monday 114.000000
5 Tuesday 107.826087
6 Wednesday 105.608696
4 Thursday 118.217391
0 Friday 140.652174
2 Saturday 208.826087
3 Sunday 140.565217
If in your culture Monday is not considered the first day of the week, you can rotate your days of the week by n days. For example:
from collections import deque
days = deque(day_name)
days.rotate(1)
print(days)
deque(['Sunday', 'Monday', 'Tuesday', 'Wednesday',
'Thursday', 'Friday', 'Saturday'])
Then feed categories=days as an argument to pd.Categorical.
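Put together, the Sunday-first ordering can be sketched as follows. The three rows are a subset of the question's data with rounded values, and the deque is converted to a list before being passed as categories, which is a cautious choice rather than a documented requirement.

```python
from calendar import day_name
from collections import deque

import pandas as pd

# Rotate Monday..Sunday by one position so that Sunday comes first
days = deque(day_name)
days.rotate(1)

df = pd.DataFrame({
    "Date": ["Friday", "Monday", "Sunday"],
    "Transactions": [140.65, 114.00, 140.57],
})

# An ordered categorical makes sort_values follow the category order
df["Date"] = pd.Categorical(df["Date"], categories=list(days), ordered=True)
df = df.sort_values("Date")
print(df)
```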