Sorting pandas dataframe by weekdays - python

How do I sort the DataFrame by weekday names? I cannot use the pd.to_datetime() method because my dates aren't numbers.
Date Transactions
0 Friday 140.652174
1 Monday 114.000000
2 Saturday 208.826087
3 Sunday 140.565217
4 Thursday 118.217391
5 Tuesday 107.826087
6 Wednesday 105.608696

You can convert column values to ordered categoricals, so it is possible to use sort_values:
cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df['Date'] = pd.Categorical(df['Date'], categories=cats, ordered=True)
df = df.sort_values('Date')
print (df)
Date Transactions
1 Monday 114.000000
5 Tuesday 107.826087
6 Wednesday 105.608696
4 Thursday 118.217391
0 Friday 140.652174
2 Saturday 208.826087
3 Sunday 140.565217
Or create an index from the Date column with set_index, then reindex and finally reset_index.
Note that this solution only works if the column values are unique:
df = df.set_index('Date').reindex(cats).reset_index()
print (df)
Date Transactions
0 Monday 114.000000
1 Tuesday 107.826087
2 Wednesday 105.608696
3 Thursday 118.217391
4 Friday 140.652174
5 Saturday 208.826087
6 Sunday 140.565217
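When weekday names repeat, the reindex approach no longer applies. A minimal sketch of one alternative, assuming pandas 1.1 or later (where sort_values accepts key=): map each name to its position in the week and sort on that. The order dictionary and the sample data here are made up for illustration.

```python
import pandas as pd

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
# hypothetical lookup table: weekday name -> position in the week
order = {day: pos for pos, day in enumerate(cats)}

# made-up sample data with a repeated weekday
df = pd.DataFrame({'Date': ['Friday', 'Monday', 'Friday', 'Sunday'],
                   'Transactions': [140.65, 114.0, 99.5, 140.57]})

# key= receives the column as a Series and returns the values to sort by;
# mergesort is stable, so equal weekdays keep their original relative order
df_sorted = df.sort_values('Date', key=lambda s: s.map(order), kind='mergesort')
print(df_sorted)
```

Unlike the categorical approach, this leaves the Date column as plain strings.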

Use calendar.day_name with categorical data:
from calendar import day_name
df['Date'] = pd.Categorical(df['Date'], categories=day_name, ordered=True)
df = df.sort_values('Date')
print(df)
Date Transactions
1 Monday 114.000000
5 Tuesday 107.826087
6 Wednesday 105.608696
4 Thursday 118.217391
0 Friday 140.652174
2 Saturday 208.826087
3 Sunday 140.565217
If in your culture Monday is not considered the first day of the week, you can rotate your days of the week by n days. For example:
from collections import deque
days = deque(day_name)
days.rotate(1)
print(days)
deque(['Sunday', 'Monday', 'Tuesday', 'Wednesday',
'Thursday', 'Friday', 'Saturday'])
Then feed categories=days as an argument to pd.Categorical.
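Put together, a Sunday-first sort might look like the sketch below. The sample data is made up, and the deque is converted with list() on the assumption that a plain list is the safest input for categories.

```python
from calendar import day_name
from collections import deque

import pandas as pd

days = deque(day_name)   # Monday ... Sunday
days.rotate(1)           # Sunday, Monday, ... Saturday

# made-up sample data
df = pd.DataFrame({'Date': ['Friday', 'Monday', 'Sunday', 'Wednesday'],
                   'Transactions': [140.65, 114.0, 140.57, 105.61]})

# list(days) turns the deque into a plain list of category labels
df['Date'] = pd.Categorical(df['Date'], categories=list(days), ordered=True)
df = df.sort_values('Date')
print(df['Date'].tolist())
```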

Related

Find sum of values between two dates of a single date column in Pandas dataframe

The dataframe contains a date column, a revenue column (for that specific date) and the name of the day.
This is the code for creating the df:
pd.DataFrame({'Date':['2015-01-08','2015-01-09','2015-01-10','2015-02-10','2015-08-09','2015-08-13','2015-11-09','2015-11-15'],
'Revenue':[15,4,15,13,16,20,12,9],
'Weekday':['Monday','Tuesday','Wednesday','Monday','Friday','Saturday','Monday','Sunday']})
I want to find the sum of revenue between Mondays:
2015-02-10 34 Monday
2015-11-09 49 Monday etc.
The first idea is to build groups from the Weekday column: compare it with Monday, take the cumulative sum so that every Monday starts a new group, and aggregate per group:
df1 = (df.groupby(df['Weekday'].eq('Monday').cumsum())
.agg({'Date':'first','Revenue':'sum', 'Weekday':'first'}))
print (df1)
Date Revenue Weekday
Weekday
1 2015-01-08 34 Monday
2 2015-02-10 49 Monday
3 2015-11-09 21 Monday
However, the Weekday column does not seem to match the dates in the sample data, so DataFrame.resample with weekly bins anchored on Mondays ('W-Mon') returns different output:
df['Date'] = pd.to_datetime(df['Date'])
df2 = df.resample('W-Mon', on='Date').agg({'Revenue':'sum', 'Weekday':'first'}).dropna()
print (df2)
Revenue Weekday
Date
2015-01-12 34 Monday
2015-02-16 13 Monday
2015-08-10 16 Friday
2015-08-17 20 Saturday
2015-11-09 12 Monday
2015-11-16 9 Sunday
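The mismatch between the Weekday column and the real dates can be checked directly. A small sketch, assuming the sample data from the question; the Actual column name is introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2015-01-08', '2015-01-09', '2015-01-10', '2015-02-10',
             '2015-08-09', '2015-08-13', '2015-11-09', '2015-11-15'],
    'Revenue': [15, 4, 15, 13, 16, 20, 12, 9],
    'Weekday': ['Monday', 'Tuesday', 'Wednesday', 'Monday',
                'Friday', 'Saturday', 'Monday', 'Sunday'],
})
df['Date'] = pd.to_datetime(df['Date'])

# weekday name derived from the actual calendar date
df['Actual'] = df['Date'].dt.day_name()

# rows where the supplied label disagrees with the calendar
mismatch = df[df['Weekday'] != df['Actual']]
print(mismatch[['Date', 'Weekday', 'Actual']])
```

Only the last two rows (2015-11-09 and 2015-11-15) carry a label that agrees with the calendar.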
First convert your Date column from string to datetime type:
df.Date = pd.to_datetime(df.Date)
Then generate the result:
result = df.groupby(pd.Grouper(key='Date', freq='W-MON', label='left')).Revenue.sum()\
    .reset_index()
This result does not contain day of week and in my opinion this is OK,
as they will be all Mondays.
If you want to see only weeks with non-zero result, you can get it as:
result[result.Revenue != 0]
For your source data the result is:
Date Revenue
0 2015-01-05 34
5 2015-02-09 13
30 2015-08-03 16
31 2015-08-10 20
43 2015-11-02 12
44 2015-11-09 9

Combine weekday with hours in Pandas

I have a data frame with a weekday column that contains the name of the weekdays and a time column that contains hours on these days. How can I combine these 2 columns, so they can be also sortable?
I have tried the string version but it is not sortable based on weekdays and hours.
This is what the sample table looks like.
weekday  time
Monday   12:00
Monday   13:00
Tuesday  20:00
Friday   10:00
This is what I want to get.
weekday_hours
Monday 12:00
Monday 13:00
Tuesday 20:00
Friday 10:00
Assuming that df is your initial dataframe:
import json
import pandas as pd
datas = json.loads(df.to_json(orient="records"))
final_data = {"weekday_hours": []}
for data in datas:
    final_data["weekday_hours"].append(data['weekday'] + ' ' + data['time'])
final_df = pd.DataFrame(final_data)
final_df
Output:
You first need to create a datetime column covering 7 days at an hourly level to sort by. In a data warehousing setting you normally have a calendar and a time dimension with all the different representations of your date data that you can merge and sort by; this is an adaptation of that methodology.
import pandas as pd
df1 = pd.DataFrame({'date' : pd.date_range('01 Jan 2021', '08 Jan 2021',freq='H')})
df1['str_date'] = df1['date'].dt.strftime('%A %H:%M')
print(df1.head(5))
date str_date
0 2021-01-01 00:00:00 Friday 00:00
1 2021-01-01 01:00:00 Friday 01:00
2 2021-01-01 02:00:00 Friday 02:00
3 2021-01-01 03:00:00 Friday 03:00
4 2021-01-01 04:00:00 Friday 04:00
Then create your column to merge on.
df['str_date'] = df['weekday'] + ' ' + df['time']
df2 = pd.merge(df[['str_date']],df1,on=['str_date'],how='left')\
    .sort_values('date').drop(columns='date')
print(df2)
str_date
3 Friday 10:00
0 Monday 12:00
1 Monday 13:00
2 Tuesday 20:00
Based on my understanding of the question, you want a single column, "weekday_hours," but you also want to be able to sort the data based on this column. This is a bit tricky because "Monday" doesn't provide enough information to define a valid datetime. Parsing with pd.to_datetime(df['weekday_hours'], format='%A %H:%M'), for example, will return 1900-01-01 <hour:minute:second> if given just a weekday and a time. When sorted, this only sorts by time.
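The same effect can be seen with the standard library alone. This sketch swaps in datetime.strptime for pd.to_datetime, purely as an illustration of why a bare weekday name cannot drive the sort:

```python
from datetime import datetime

# The weekday name is parsed, but no calendar date can be derived from it,
# so the date part falls back to the default 1900-01-01.
friday = datetime.strptime('Friday 10:00', '%A %H:%M')
monday = datetime.strptime('Monday 12:00', '%A %H:%M')

print(friday)
print(monday)
# Both values share the same date, so only the time decides the order
print(friday < monday)
```

Friday 10:00 therefore sorts before Monday 12:00, which is not the weekday order the question asks for.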
One workaround is to use dateutil to parse the dates. In lieu of a date, it will return the next date corresponding to the day of the week. For example, today (9 April 2021) dateutil.parser.parse('Friday 10:00') returns datetime.datetime(2021, 4, 9, 10, 0) and dateutil.parser.parse('Monday 10:00') returns datetime.datetime(2021, 4, 12, 10, 0). Therefore, we need to set the "default" date to something corresponding to our "first" day of the week. Here is an example starting with unsorted dates:
import datetime
import dateutil.parser  # import the parser submodule explicitly
import pandas as pd
weekdays = ['Friday', 'Monday', 'Monday', 'Tuesday']
times = ['10:00', '13:00', '12:00', '20:00', ]
df = pd.DataFrame({'weekday' : weekdays, 'time' : times})
df2 = pd.DataFrame()
df2['weekday_hours'] = df[['weekday', 'time']].agg(' '.join, axis=1)
amonday = datetime.datetime(2021, 2, 1, 0, 0) # assuming week starts monday
sorter = lambda t: [dateutil.parser.parse(ti, default=amonday) for ti in t]
print(df2.sort_values('weekday_hours', key=sorter))
Produces the output:
weekday_hours
2 Monday 12:00
1 Monday 13:00
3 Tuesday 20:00
0 Friday 10:00
Note there are probably more computationally efficient ways if you are working with a lot of data, but this should illustrate the idea of a sortable weekday/time pair.
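One possibly cheaper route, sketched here under the assumption that times are zero-padded HH:MM strings, is to sort on an ordered categorical weekday plus the time string and only then build the combined column. The cats list is an assumption (Monday-first week).

```python
import pandas as pd

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

df = pd.DataFrame({'weekday': ['Friday', 'Monday', 'Monday', 'Tuesday'],
                   'time': ['10:00', '13:00', '12:00', '20:00']})

# an ordered categorical gives the weekday a Monday-first sort order
df['weekday'] = pd.Categorical(df['weekday'], categories=cats, ordered=True)

# zero-padded HH:MM strings already sort chronologically
df = df.sort_values(['weekday', 'time'])

# build the display column after sorting
df['weekday_hours'] = df['weekday'].astype(str) + ' ' + df['time']
print(df['weekday_hours'].tolist())
```

This avoids parsing each row with dateutil, at the cost of keeping the two source columns around for sorting.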

Python Dataframe: Get number of week days present in last month?

I have a df with a column day_name. I'm trying to get the number of times each weekday occurred in the last month.
For example: there are 4 Fridays and 5 Thursdays in April.
df
day_name
0 Friday
1 Sunday
2 Thursday
3 Wednesday
4 Monday
As per python for a single day:
import calendar
year = 2020
month = 4
day_to_count = calendar.WEDNESDAY
matrix = calendar.monthcalendar(year,month)
num_days = sum(1 for x in matrix if x[day_to_count] != 0)
How do I use this in a dataframe, or are there any other suggestions?
expected output
day_name last_months_count
0 Friday 4
1 Sunday 4
2 Thursday 5
3 Wednesday 5
4 Monday 4
For last month:
year, month = 2020, 4
start,end = f'{year}/{month}/1', f'{year}/{month+1}/1'
# we exclude the last day
# which is first day of next month
last_month = pd.date_range(start,end,freq='D')[:-1]
df['last_month_count'] = df['day_name'].map(last_month.day_name().value_counts())
Output:
day_name last_month_count
0 Friday 4
1 Sunday 4
2 Thursday 5
3 Wednesday 5
4 Monday 4
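The month+1 in the snippet above would produce month 13 for December. A minimal standard-library sketch that sidesteps the end date entirely; the helper name weekday_counts is hypothetical and not part of the answer:

```python
import calendar
from collections import Counter
from datetime import date


def weekday_counts(year, month):
    """Count how often each weekday name occurs in the given month."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    return Counter(
        calendar.day_name[date(year, month, day).weekday()]
        for day in range(1, days_in_month + 1)
    )


counts = weekday_counts(2020, 4)
print(counts['Thursday'], counts['Friday'])

# December works as well, since no "next month" is ever constructed
december = weekday_counts(2020, 12)
print(december['Tuesday'])
```

The resulting counter could then be passed to df['day_name'].map(counts) in the same way as the value_counts() result.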
Bonus: to extract last month programmatically:
from datetime import datetime
now = datetime.now()
year, month = now.year, now.month - 1
# in January, last month is December of the previous year
if month == 0:
    year, month = year-1, 12
Here you go:
from datetime import date, timedelta
from calendar import day_name
import pandas as pd
today = date.today()
dt = date(today.year, today.month, 1) - timedelta(days=1)
day_to_count = {}
month = dt.month
while dt.month == month:
    key = day_name[dt.weekday()]
    day_to_count[key] = day_to_count.get(key, 0) + 1
    dt -= timedelta(days = 1)
df = pd.DataFrame({
    'day_name': ['Friday', 'Sunday', 'Thursday', 'Wednesday', 'Monday']
})
df['last_months_count'] = df['day_name'].apply(lambda day : day_to_count[day])
print(df)
Output:
day_name last_months_count
0 Friday 4
1 Sunday 4
2 Thursday 5
3 Wednesday 5
4 Monday 4

How to iterate a list starting from different index, and also wrap around

Similar question asked here (Start index for iterating Python list), but I need one more thing.
Assume I have a list [Sunday, Monday, ...Saturday],
and I want to iterate the list starting from different position, wrap around and complete the loop.
For example
a = [Sunday, Monday, ...Saturday]
for i in range(7):
    print("----")
    for j in (SOMETHING):
        print(j)
OUTPUT:
----
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
----
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday
----
Tuesday
.
.
.
Friday
How could I approach this?
One approach would be using collections.deque:
from collections import deque
from itertools import repeat
d = deque(['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'])
n = 7
for i in repeat(d, n):
    print(*i, sep='\n')
    print('-----')
    i.rotate(-1)
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
-----
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday
-----
Tuesday
.
.
.
Though you might find it more interesting to create a nested list:
n = 7
l = []
for i in repeat(d, n):
    sl = []
    for j in i:
        sl.append(j)
    l.append(sl)
    i.rotate(-1)
print(l)
# [['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'],
# ['Monday', 'Tuesday', 'Wednesday'...
It can be done by:
a[i:]+a[:i]
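Spelled out as a full loop, the slicing idea might look like this. The generator name rotations is introduced here only for illustration:

```python
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']


def rotations(items):
    """Yield every rotation of items, starting one position later each time."""
    for i in range(len(items)):
        # everything from index i to the end, then everything before index i
        yield items[i:] + items[:i]


for week in rotations(days):
    print('----')
    print('\n'.join(week))
```

Each rotation is a new list, so the original days list is left unchanged.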
You could pop the start item off and add it to the end.
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
for _ in range(7):
    print("----")
    print("\n".join(days))
    days.append(days.pop(0))
You can use collections.deque, which has a rotate method. However, if you want to make it on your own you can do it like this:
>>> a = ['a','b','c','d']
>>> counter = 0
>>> start_index=2
>>> while counter < len(a):
...     print(a[start_index])
...     start_index+=1
...     counter += 1
...     if start_index==len(a):
...         start_index=0
...
c
d
a
b
This is quite optimal, because you do not need to make any copy or create a new list, just iterate.
Use itertools.cycle
from itertools import cycle
counter = 1
days = ['Sunday', 'Monday', 'Tuesday']
# cycle() never runs out, so this loop repeats until interrupted
for day in cycle(days):
    print(day)
    counter += 1
    if counter == 7:
        print('-----')
        counter = 1
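Because cycle() on its own never terminates, a bounded variant might pair it with itertools.islice to take exactly one week from a chosen start. This is a sketch; the helper name week_from is hypothetical:

```python
from itertools import cycle, islice

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']


def week_from(start):
    """Return one full week beginning at index start, wrapping around."""
    # skip the first `start` items of the endless cycle, then take len(days) items
    return list(islice(cycle(days), start, start + len(days)))


for i in range(len(days)):
    print('-----')
    print('\n'.join(week_from(i)))
```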
Use the following function:
def cycle_list(l, i):
    for element in l[i:]:
        yield element
    for element in l[:i]:
        yield element
If you don't want to import any libraries.
DAYS = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
for i in range(7):
    print("----")
    for j in range(len(DAYS)):
        print(DAYS[(j+i) % len(DAYS)])
You can chain the elements starting from your current index (in your case the current index is i) with the elements before the current index using generators. This avoids building a combined list for each rotation, although the slices a[i:] and a[:i] are still copies:
a = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
from itertools import chain
for i in range(7):
    print("----")
    for j in chain((e for e in a[i:]), (e for e in a[:i])):
        print(j)
output:
----
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
----
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday
----
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday
Monday
----
Wednesday
Thursday
Friday
Saturday
Sunday
Monday
Tuesday
----
Thursday
Friday
Saturday
Sunday
Monday
Tuesday
Wednesday
----
Friday
Saturday
Sunday
Monday
Tuesday
Wednesday
Thursday
----
Saturday
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
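If the copies made by a[i:] and a[:i] are a concern, itertools.islice can stand in for the slices. A sketch, with the helper name rotated chosen here for illustration:

```python
from itertools import chain, islice

a = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']


def rotated(items, i):
    """Iterate items starting at index i and wrapping around, without slicing."""
    # islice(items, i, None) walks from index i to the end;
    # islice(items, i) walks the first i items
    return chain(islice(items, i, None), islice(items, i))


print(list(rotated(a, 3)))
```

Note that islice still steps over the skipped items internally, so this saves memory rather than time.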

correct way to sum values of second column for all unique values of first column pandas dataframe

I am new to pandas. I have a dataframe that has the days of the week in the first column and a list of values in the second. I wish to sum up the total value for each weekday, so:
day values
0 Thursday 3
1 Thursday 0
2 Friday 0
3 Friday 1
4 Saturday 3
5 Saturday 1
etc...
would become :
day values
0 Thursday 3
1 Friday 1
2 Saturday 4
etc...
Using the approach from "summing the number of occurrences per day pandas" I achieved what I wanted
(where the original df is called value_frame):
values_on_day = pd.DataFrame(value_frame.groupby(value_frame.day).apply(lambda subf: subf['values'].sum()))
However, the values and the weekdays are stuffed into one cell, so that:
print(dict(values_on_day))
equals:
{0: day
Friday 3
Monday 4
Saturday 7
Sunday 22
Thursday 26
Tuesday 2
Wednesday 4
Name: 0, dtype: int64}
I have coded a workaround by converting columns into dicts then lists and then back into a dict and converting back into a df but obviously this is not the way to do it.
Please would you show me the correct way to achieve total values for each day of the week in the original dataframe?
I agree with @Primer. This is the right way to code what you want to do.
I have updated my answer to add an index being the weekday number.
import pandas as pd
import time
df = pd.DataFrame({'day': ['Thursday', 'Thursday', 'Friday', 'Friday', 'Saturday', 'Saturday'], 'values': [3,0,0,1,3,1]})
result = df.groupby('day').sum()
# Resetting the index
result.reset_index(inplace=True)
# Creating a new index as the weekday number for each day
result.index = result['day'].apply(lambda x: time.strptime(x, '%A').tm_wday)
# Renaming the index
result.index.names = ['weekday']
# Sorting by index
result.sort_index(inplace=True)
print(result)
Gives:
day values
weekday
3 Thursday 3
4 Friday 1
5 Saturday 4
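If the goal is just the flat two-column table from the question, a shorter variant may be enough. A sketch assuming the same sample data: sort=False keeps the days in order of first appearance rather than sorting them by calendar position.

```python
import pandas as pd

df = pd.DataFrame({'day': ['Thursday', 'Thursday', 'Friday', 'Friday', 'Saturday', 'Saturday'],
                   'values': [3, 0, 0, 1, 3, 1]})

# sort=False keeps groups in order of first appearance;
# as_index=False keeps 'day' as a regular column instead of the index
result = df.groupby('day', sort=False, as_index=False)['values'].sum()
print(result)
```

For input that is not already in weekday order, the weekday-number index from the answer above (or an ordered categorical) is still needed.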
