How to set the columns in pandas - python

Here is my dataframe:
Dec-18 Jan-19 Feb-19 Mar-19 Apr-19 May-19
Saturday 2540.0 2441.0 3832.0 4093.0 1455.0 2552.0
Sunday 1313.0 1891.0 2968.0 2260.0 1454.0 1798.0
Monday 1360.0 1558.0 2967.0 2156.0 1564.0 1752.0
Tuesday 1089.0 2105.0 2476.0 1577.0 1744.0 1457.0
Wednesday 1329.0 1658.0 2073.0 2403.0 1231.0 874.0
Thursday 798.0 1195.0 2183.0 1287.0 1460.0 1269.0
I have tried some pandas ops but I am not able to do that.
This is what I want to do:
items
Saturday 2540.0
Sunday 1313.0
Monday 1360.0
Tuesday 1089.0
Wednesday 1329.0
Thursday 798.0
Saturday 2441.0
Sunday 1891.0
Monday 1558.0
Tuesday 2105.0
Wednesday 1658.0
Thursday 1195.0 ............ and so on
I want to stack the monthly columns one below another into a single column, keeping the day names as the row labels. How can I do that?

df.reset_index().melt(id_vars='index').drop(columns='variable')
Output:
index value
0 Saturday 2540.0
1 Sunday 1313.0
2 Monday 1360.0
3 Tuesday 1089.0
4 Wednesday 1329.0
5 Thursday 798.0
6 Saturday 2441.0
7 Sunday 1891.0
8 Monday 1558.0
9 Tuesday 2105.0
10 Wednesday 1658.0
11 Thursday 1195.0
12 Saturday 3832.0
13 Sunday 2968.0
14 Monday 2967.0
15 Tuesday 2476.0
16 Wednesday 2073.0
17 Thursday 2183.0
18 Saturday 4093.0
19 Sunday 2260.0
20 Monday 2156.0
21 Tuesday 1577.0
22 Wednesday 2403.0
23 Thursday 1287.0
24 Saturday 1455.0
25 Sunday 1454.0
26 Monday 1564.0
27 Tuesday 1744.0
28 Wednesday 1231.0
29 Thursday 1460.0
30 Saturday 2552.0
31 Sunday 1798.0
32 Monday 1752.0
33 Tuesday 1457.0
34 Wednesday 874.0
35 Thursday 1269.0
Note: I just noticed a comment suggesting the same thing. I will delete my post if requested :)
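For anyone who wants to reproduce this, here is a minimal self-contained sketch of the melt approach. The frame is rebuilt from the first three months in the question, and the final set_index/rename step is my own addition (not part of the answer above) to get the single items column the question asks for.

```python
import pandas as pd

days = ["Saturday", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday"]
df = pd.DataFrame(
    {
        "Dec-18": [2540.0, 1313.0, 1360.0, 1089.0, 1329.0, 798.0],
        "Jan-19": [2441.0, 1891.0, 1558.0, 2105.0, 1658.0, 1195.0],
        "Feb-19": [3832.0, 2968.0, 2967.0, 2476.0, 2073.0, 2183.0],
    },
    index=days,
)

# melt stacks the month columns one below another, in column order.
# reset_index() names the former (unnamed) index column 'index'.
long_df = df.reset_index().melt(id_vars="index").drop(columns="variable")

# Optional: put the day names back in the index and call the column 'items'.
items = long_df.set_index("index").rename(columns={"value": "items"})
items.index.name = None
print(items)
```

If the index of your frame already has a name, reset_index() will use that name instead of 'index', so id_vars has to be adjusted accordingly.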

Create it with numpy by reshaping the data.
import pandas as pd
import numpy as np
pd.DataFrame(df.to_numpy().flatten('F'),
index=np.tile(df.index, df.shape[1]),
columns=['items'])
Output:
items
Saturday 2540.0
Sunday 1313.0
Monday 1360.0
Tuesday 1089.0
Wednesday 1329.0
Thursday 798.0
Saturday 2441.0
...
Sunday 1798.0
Monday 1752.0
Tuesday 1457.0
Wednesday 874.0
Thursday 1269.0

You can do:
df = df.stack().sort_index(level=1).reset_index(level = 1, drop=True).to_frame('items')
It is interesting that this method got overlooked even though it was the fastest in my quick test:
import time
start = time.time()
df.stack().sort_index(level=1).reset_index(level = 1, drop=True).to_frame('items')
end = time.time()
print("time taken {}".format(end-start))
yields: time taken 0.006181955337524414
while this:
start = time.time()
df.reset_index().melt(id_vars='days').drop(columns='variable')  # assumes the index is named 'days'
end = time.time()
print("time taken {}".format(end-start))
yields: time taken 0.010072708129882812
And my output format matches the OP's requested format exactly.
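A single time.time() measurement on a frame this small is mostly noise, so the timings above should be read loosely. Below is a hedged sketch that repeats each call with timeit instead. The helper names (via_melt, via_numpy, via_stack, via_unstack) are mine and the frame is rebuilt from the question. One caveat: sort_index(level=1) sorts by the column labels themselves, so with plain string labels such as 'Dec-18' and 'Jan-19' the rows may come out in alphabetical order rather than in the original column order. df.unstack() is included as an alternative that keeps the original order.

```python
import timeit

import numpy as np
import pandas as pd

days = ["Saturday", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday"]
df = pd.DataFrame(
    {
        "Dec-18": [2540.0, 1313.0, 1360.0, 1089.0, 1329.0, 798.0],
        "Jan-19": [2441.0, 1891.0, 1558.0, 2105.0, 1658.0, 1195.0],
        "Feb-19": [3832.0, 2968.0, 2967.0, 2476.0, 2073.0, 2183.0],
    },
    index=days,
)


def via_melt(frame):
    return frame.reset_index().melt(id_vars="index").drop(columns="variable")


def via_numpy(frame):
    # flatten('F') reads the values column by column.
    return pd.DataFrame(
        frame.to_numpy().flatten("F"),
        index=np.tile(frame.index, frame.shape[1]),
        columns=["items"],
    )


def via_stack(frame):
    # Sorts by the level-1 labels, so the row order can differ from the original.
    return frame.stack().sort_index(level=1).reset_index(level=1, drop=True).to_frame("items")


def via_unstack(frame):
    # unstack() on a flat frame gives a Series indexed by (column, row), column by column.
    return frame.unstack().reset_index(level=0, drop=True).to_frame("items")


runs = 200
for func in (via_melt, via_numpy, via_stack, via_unstack):
    seconds = timeit.timeit(lambda: func(df), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```

Which variant wins will depend on the pandas version and the size of the frame, so it is worth running this on data that looks like yours.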

Related

Pandas Time Series: Count weekdays with min value in annual data

This is my first question on Stack Overflow, and I hope I describe my problem in enough detail.
I'm starting to learn data analysis with Pandas and I've created a time series with daily data for gas prices at a certain station. I've already grouped the hourly data into daily data.
I've been successful with a simple scatter plot over the year with plotly. As the next step, I would like to find which weekday is the cheapest or most expensive in each week, count the day names, and then check whether there is a pattern over the whole year.
count mean std min 25% 50% 75% max \
2022-01-01 35.0 1.685000 0.029124 1.649 1.659 1.689 1.6990 1.749
2022-01-02 27.0 1.673444 0.024547 1.649 1.649 1.669 1.6890 1.729
2022-01-03 28.0 1.664000 0.040597 1.599 1.639 1.654 1.6890 1.789
2022-01-04 31.0 1.635129 0.045069 1.599 1.599 1.619 1.6490 1.779
2022-01-05 33.0 1.658697 0.048637 1.599 1.619 1.649 1.6990 1.769
2022-01-06 35.0 1.658429 0.050756 1.599 1.619 1.639 1.6940 1.779
2022-01-07 30.0 1.637333 0.039136 1.599 1.609 1.629 1.6565 1.759
2022-01-08 41.0 1.655829 0.041740 1.619 1.619 1.639 1.6790 1.769
2022-01-09 35.0 1.647857 0.031602 1.619 1.619 1.639 1.6590 1.769
2022-01-10 31.0 1.634806 0.041374 1.599 1.609 1.619 1.6490 1.769
...
week weekday
2022-01-01 52 Saturday
2022-01-02 52 Sunday
2022-01-03 1 Monday
2022-01-04 1 Tuesday
2022-01-05 1 Wednesday
2022-01-06 1 Thursday
2022-01-07 1 Friday
2022-01-08 1 Saturday
2022-01-09 1 Sunday
2022-01-10 2 Monday
...
I tried with grouping and resampling but unfortunately I didn't get the result I was hoping for.
Can someone suggest a way how to deal with this problem? Thanks!
Here's a way to do what I believe your question asks:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'count':[35,27,28,31,33,35,30,41,35,31]*40,
'mean':
[1.685,1.673444,1.664,1.635129,1.658697,1.658429,1.637333,1.655829,1.647857,1.634806]*40
},
index=pd.Series(pd.to_datetime(pd.date_range("2022-01-01", periods=400, freq="D"))))
print( '','input df:',df,sep='\n' )
df_date = df.reset_index()['index']
df['weekday'] = list(df_date.dt.day_name())
df['year'] = df_date.dt.year.to_numpy()
df['week'] = df_date.dt.isocalendar().week.to_numpy()
df['year_week_started'] = df.year - np.where((df.week>=52)&(df.week.shift(-7)==1),1,0)
print( '','input df with intermediate columns:',df,sep='\n' )
cols = ['year_week_started', 'week']
dfCheap = df.loc[df.groupby(cols)['mean'].idxmin(),:].set_index(cols)
dfCheap = ( dfCheap.groupby(['year_week_started', 'weekday'])['mean'].count()
.rename('freq').to_frame().set_index('freq', append=True)
.reset_index(level='weekday').sort_index(ascending=[True,False]) )
print( '','dfCheap:',dfCheap,sep='\n' )
dfExpensive = df.loc[df.groupby(cols)['mean'].idxmax(),:].set_index(cols)
dfExpensive = ( dfExpensive.groupby(['year_week_started', 'weekday'])['mean'].count()
.rename('freq').to_frame().set_index('freq', append=True)
.reset_index(level='weekday').sort_index(ascending=[True,False]) )
print( '','dfExpensive:',dfExpensive,sep='\n' )
Sample input:
input df:
count mean
2022-01-01 35 1.685000
2022-01-02 27 1.673444
2022-01-03 28 1.664000
2022-01-04 31 1.635129
2022-01-05 33 1.658697
... ... ...
2023-01-31 35 1.658429
2023-02-01 30 1.637333
2023-02-02 41 1.655829
2023-02-03 35 1.647857
2023-02-04 31 1.634806
[400 rows x 2 columns]
input df with intermediate columns:
count mean weekday year week year_week_started
2022-01-01 35 1.685000 Saturday 2022 52 2021
2022-01-02 27 1.673444 Sunday 2022 52 2021
2022-01-03 28 1.664000 Monday 2022 1 2022
2022-01-04 31 1.635129 Tuesday 2022 1 2022
2022-01-05 33 1.658697 Wednesday 2022 1 2022
... ... ... ... ... ... ...
2023-01-31 35 1.658429 Tuesday 2023 5 2023
2023-02-01 30 1.637333 Wednesday 2023 5 2023
2023-02-02 41 1.655829 Thursday 2023 5 2023
2023-02-03 35 1.647857 Friday 2023 5 2023
2023-02-04 31 1.634806 Saturday 2023 5 2023
[400 rows x 6 columns]
Sample output:
dfCheap:
weekday
year_week_started freq
2021 1 Monday
2022 11 Tuesday
10 Thursday
10 Wednesday
6 Sunday
5 Friday
5 Monday
5 Saturday
2023 2 Thursday
1 Saturday
1 Sunday
1 Wednesday
dfExpensive:
weekday
year_week_started freq
2021 1 Saturday
2022 16 Monday
10 Tuesday
6 Sunday
5 Friday
5 Saturday
5 Thursday
5 Wednesday
2023 2 Monday
1 Friday
1 Thursday
1 Tuesday
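A more compact sketch of the same idea, assuming a daily frame with a DatetimeIndex and a mean column as in the question. It groups by ISO year and ISO week (so a week that spans New Year stays together), takes the date of the weekly minimum with idxmin, and counts the weekday names. The toy prices and the names iso, cheapest_dates and counts are illustrative only.

```python
import pandas as pd

# Toy daily prices: four full ISO weeks starting on Monday 2022-01-03.
idx = pd.date_range("2022-01-03", periods=28, freq="D")
week_a = [1.66, 1.63, 1.65, 1.65, 1.64, 1.66, 1.65]  # cheapest on Tuesday
week_b = [1.60, 1.63, 1.65, 1.65, 1.64, 1.66, 1.65]  # cheapest on Monday
df = pd.DataFrame({"mean": week_a * 2 + week_b * 2}, index=idx)

# isocalendar() returns a frame with 'year', 'week' and 'day' columns.
iso = df.index.isocalendar()

# Date of the cheapest day within each (ISO year, ISO week) group.
cheapest_dates = df.groupby([iso["year"], iso["week"]])["mean"].idxmin()

# How often each weekday was the cheapest day of its week.
counts = pd.DatetimeIndex(cheapest_dates).day_name().value_counts()
print(counts)
```

Swapping idxmin for idxmax gives the most expensive weekday per week. To get the counts per year, group cheapest_dates by its year before counting.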

How to split a dataframe by week on a particular starting weekday (e.g, Thursday)?

I'm using Python, and I have a DataFrame that lists every date together with its weekday.
I want to split it into weeks that run from Thursday to Thursday, in this format:
Date Weekday
0 2021-01-07 Thursday
1 2021-01-08 Friday
2 2021-01-09 Saturday
3 2021-01-10 Sunday
4 2021-01-11 Monday
5 2021-01-12 Tuesday
6 2021-01-13 Wednesday
7 2021-01-14 Thursday,
Date Weekday
0 2021-01-14 Thursday
1 2021-01-15 Friday
2 2021-01-16 Saturday
3 2021-01-17 Sunday
4 2021-01-18 Monday
5 2021-01-19 Tuesday
6 2021-01-20 Wednesday
7 2021-01-21 Thursday,
Date Weekday
0 2021-01-21 Thursday
1 2021-01-22 Friday
2 2021-01-23 Saturday
3 2021-01-24 Sunday
4 2021-01-25 Monday
5 2021-01-26 Tuesday
6 2021-01-27 Wednesday
7 2021-01-28 Thursday,
Date Weekday
0 2021-01-28 Thursday
1 2021-01-29 Friday
2 2021-01-30 Saturday.
That is the format I want, but I don't know how to split the DataFrame this way.
You can use pandas.to_datetime if Date is not yet a datetime type, then group by the ISO week number (dt.isocalendar().week; the older dt.week accessor was deprecated and removed in pandas 2.0):
dfs = [g for _,g in df.groupby(pd.to_datetime(df['Date']).dt.isocalendar().week)]
Alternatively, if you have several years, use dt.to_period:
dfs = [g for _,g in df.groupby(pd.to_datetime(df['Date']).dt.to_period('W'))]
output:
[ Date Weekday
0 2021-01-07 Thursday
1 2021-01-08 Friday
2 2021-01-09 Saturday
3 2021-01-10 Sunday,
Date Weekday
4 2021-01-11 Monday
5 2021-01-12 Tuesday
6 2021-01-13 Wednesday
7 2021-01-14 Thursday
8 2021-01-14 Thursday
9 2021-01-15 Friday
10 2021-01-16 Saturday
11 2021-01-17 Sunday,
Date Weekday
12 2021-01-18 Monday
13 2021-01-19 Tuesday
14 2021-01-20 Wednesday
15 2021-01-21 Thursday
16 2021-01-21 Thursday
17 2021-01-22 Friday
18 2021-01-23 Saturday
19 2021-01-24 Sunday,
Date Weekday
20 2021-01-25 Monday
21 2021-01-26 Tuesday
22 2021-01-27 Wednesday
23 2021-01-28 Thursday
24 2021-01-28 Thursday
25 2021-01-29 Friday
26 2021-01-30 Saturday]
variants
As dictionary:
{k:g for k,g in df.groupby(pd.to_datetime(df['Date']).dt.to_period('W'))}
reset_index of subgroups:
[g.reset_index() for _,g in df.groupby(pd.to_datetime(df['Date']).dt.to_period('W'))]
weeks ending on Wednesday/starting on Thursday with anchor offsets:
[g.reset_index() for _,g in df.groupby(pd.to_datetime(df['Date']).dt.to_period('W-WED'))]
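Here is a self-contained sketch of the anchored-offset variant, assuming one row per day (no duplicated Thursdays in the input). The date range is taken from the question. The optional second step, which appends the following Thursday to each chunk so that every piece runs Thursday to Thursday, is my own addition to match the layout the question shows.

```python
import pandas as pd

dates = pd.date_range("2021-01-07", "2021-01-30", freq="D")
df = pd.DataFrame({"Date": dates, "Weekday": dates.day_name()})

# 'W-WED' periods end on Wednesday, so each period runs Thursday..Wednesday.
periods = df["Date"].dt.to_period("W-WED")
weeks = [g for _, g in df.groupby(periods)]

# Optional: also include the next Thursday at the end of each chunk.
chunks = []
for i, week in enumerate(weeks):
    if i + 1 < len(weeks):
        week = pd.concat([week, weeks[i + 1].head(1)])
    chunks.append(week.reset_index(drop=True))

for chunk in chunks:
    print(chunk, end="\n\n")
```

Use weeks if each date should appear exactly once, and chunks if the boundary Thursday should be shared between neighbouring pieces.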

Date Offset in pandas data range

I have the following formula, which gets me the end-of-month (EOM) date every 3 months starting in February 1990.
dates = pd.date_range(start="1990-02-01", end="2029-09-30", freq="3M")
I am looking for a concise way to get the same dates offset by x business days.
That is, if x = 2, I want the date 2 business days before each EOM date calculated every 3 months starting in February 1990.
Thanks for the help.
from pandas.tseries.offsets import BDay
x = 2
dates = pd.date_range(start="1990-02-01", end="2029-09-30", freq="3M") - BDay(x)
>>> dates
DatetimeIndex(['1990-02-26', '1990-05-29', '1990-08-29', '1990-11-28',
'1991-02-26', '1991-05-29', '1991-08-29', '1991-11-28',
'1992-02-27', '1992-05-28',
...
'2027-05-27', '2027-08-27', '2027-11-26', '2028-02-25',
'2028-05-29', '2028-08-29', '2028-11-28', '2029-02-26',
'2029-05-29', '2029-08-29'],
dtype='datetime64[ns]', length=159, freq=None)
Example
x = 2
dti1 = pd.date_range(start="1990-02-01", end="2029-09-30", freq="3M")
dti2 = pd.date_range(start="1990-02-01", end="2029-09-30", freq="3M") - BDay(x)
df = pd.DataFrame({"dti1": dti1.day_name(), "dti2": dti2.day_name()})
>>> df.head(20)
dti1 dti2
0 Wednesday Monday
1 Thursday Tuesday
2 Friday Wednesday
3 Friday Wednesday
4 Thursday Tuesday
5 Friday Wednesday
6 Saturday Thursday
7 Saturday Thursday
8 Saturday Thursday
9 Sunday Thursday
10 Monday Thursday
11 Monday Thursday
12 Sunday Thursday
13 Monday Thursday
14 Tuesday Friday
15 Tuesday Friday
16 Monday Thursday
17 Tuesday Friday
18 Wednesday Monday
19 Wednesday Monday
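A note for newer pandas: the "M" frequency alias was deprecated in pandas 2.2 in favour of "ME". The sketch below sidesteps the alias by passing a MonthEnd offset object to date_range, which should behave the same as "3M" in the answer above. The variable names n_bdays, eom and shifted are just for illustration.

```python
import pandas as pd
from pandas.tseries.offsets import BDay, MonthEnd

n_bdays = 2  # number of business days to step back from each month end

# Month-end dates every 3 months, starting with the first month end on or after start.
eom = pd.date_range(start="1990-02-01", end="2029-09-30", freq=MonthEnd(3))

# Subtracting a BDay offset moves each date back by that many business days.
shifted = eom - BDay(n_bdays)
print(shifted[:4])
```

Note that BDay only skips Saturdays and Sundays. If exchange holidays matter, a CustomBusinessDay offset with a holiday calendar would be needed instead.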

Change Saturdays and Sundays to Fridays

My DataFrame:
start_trade week_day
0 2021-01-16 09:30:00 Saturday
1 2021-01-19 14:30:00 Tuesday
2 2021-01-25 22:00:00 Monday
3 2021-01-29 12:15:00 Friday
4 2021-01-31 12:35:00 Sunday
There are no trades on the exchange on Saturday and Sunday. Therefore, if my trading signal falls on the weekend, I want to open a trade on Friday 23:50.
Expected output:
start_trade week_day
0 2021-01-15 23:50:00 Friday
1 2021-01-19 14:30:00 Tuesday
2 2021-01-25 22:00:00 Monday
3 2021-01-29 12:15:00 Friday
4 2021-01-29 23:50:00 Friday
How to do it?
You can use to_timedelta to move the date back to the Friday of that week, then set the time with Timedelta. Apply this only to the weekend rows by using a mask:
# mask for weekend dates (Saturday=5, Sunday=6)
mask = df['start_trade'].dt.weekday.isin([5,6])
df.loc[mask, 'start_trade'] = (df['start_trade'].dt.normalize() # to get midnight
- pd.to_timedelta(df['start_trade'].dt.weekday-4, unit='D') # to get the friday date
+ pd.Timedelta(hours=23, minutes=50)) # set 23:50 for time
df.loc[mask, 'week_day'] = 'Friday'
print(df)
start_trade week_day
0 2021-01-15 23:50:00 Friday
1 2021-01-19 14:30:00 Tuesday
2 2021-01-25 22:00:00 Monday
3 2021-01-29 12:15:00 Friday
4 2021-01-29 23:50:00 Friday
Try:
weekend = df['week_day'].isin(['Saturday', 'Sunday'])
df.loc[weekend, 'week_day'] = 'Friday'
Or np.where along with str.contains, and | operator:
df['week_day'] = np.where(df['week_day'].str.contains(r'Saturday|Sunday'),'Friday',df['week_day'])
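Note that the last two snippets only relabel week_day and leave start_trade untouched. Here is a self-contained sketch that changes both, assuming start_trade is already a datetime column. It follows the same arithmetic as the first answer, but derives week_day from the shifted timestamp instead of hard-coding 'Friday'. The names weekday, is_weekend and friday_close are mine.

```python
import pandas as pd

df = pd.DataFrame({
    "start_trade": pd.to_datetime([
        "2021-01-16 09:30:00", "2021-01-19 14:30:00", "2021-01-25 22:00:00",
        "2021-01-29 12:15:00", "2021-01-31 12:35:00",
    ])
})

weekday = df["start_trade"].dt.weekday  # Monday=0 ... Sunday=6
is_weekend = weekday >= 5

# Midnight of the same day, minus 1 day (Saturday) or 2 days (Sunday), plus 23:50.
friday_close = (
    df["start_trade"].dt.normalize()
    - pd.to_timedelta(weekday - 4, unit="D")
    + pd.Timedelta(hours=23, minutes=50)
)

# Keep weekday rows as they are; replace weekend rows with Friday 23:50.
df["start_trade"] = df["start_trade"].where(~is_weekend, friday_close)
df["week_day"] = df["start_trade"].dt.day_name()
print(df)
```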

Transforming pandas data frame using stack function

I have the following pandas dataframe with me
import pandas as pd
import numpy as np
np.random.seed(1)
N = 5
data = pd.DataFrame(np.random.rand(N, 3), columns=['Monday', 'Wednesday', 'Friday'])
data['State'] = 'ST' + pd.Series((np.arange(N) % 19).astype(str))
print(data)
Monday Wednesday Friday State
0 0.417022 0.720324 0.000114 ST0
1 0.302333 0.146756 0.092339 ST1
2 0.186260 0.345561 0.396767 ST2
3 0.538817 0.419195 0.685220 ST3
4 0.204452 0.878117 0.027388 ST4
I want to convert this dataframe to
0 ST0 Monday 0.417022
Wednesday 0.7203245
Friday 0.0001143748
1 ST1 Monday 0.3023326
Wednesday 0.1467559
Friday 0.09233859
2 ST2 Monday 0.1862602
Wednesday 0.3455607
Friday 0.3967675
State ST2
3 ST3 Monday 0.5388167
Wednesday 0.4191945
Friday 0.6852195
State ST3
4 ST4 Monday 0.2044522
Wednesday 0.8781174
Friday 0.02738759
State ST4
If use data.stack() alone, it will give something like,
0 Monday 0.417022
Wednesday 0.7203245
Friday 0.0001143748
State ST0
1 Monday 0.3023326
Wednesday 0.1467559
Friday 0.09233859
State ST1
2 Monday 0.1862602
Wednesday 0.3455607
Friday 0.3967675
State ST2
3 Monday 0.5388167
Wednesday 0.4191945
Friday 0.6852195
State ST3
4 Monday 0.2044522
Wednesday 0.8781174
Friday 0.02738759
State ST4
How can I make the State column the first level of the MultiIndex and the other columns the second level?
You just need to move the State column into the index before stacking:
data.set_index('State', append=True).stack()
Out[4]:
State
0 ST0 Monday 0.417022
Wednesday 0.720324
Friday 0.000114
1 ST1 Monday 0.302333
Wednesday 0.146756
Friday 0.092339
2 ST2 Monday 0.186260
Wednesday 0.345561
Friday 0.396767
3 ST3 Monday 0.538817
Wednesday 0.419195
Friday 0.685220
4 ST4 Monday 0.204452
Wednesday 0.878117
Friday 0.027388
dtype: float64
Note that this doesn't exactly match the output you posted: I haven't included State alongside the days, as I think it's more sensible this way. If you really want it like your original output, it would be: data.set_index('State', append=True, drop=False).stack()
You could also use melt with State as the id column:
In [24]: pd.melt(df, id_vars=['State'])
Out[24]:
State variable value
0 ST0 Monday 0.417022
1 ST1 Monday 0.302333
2 ST2 Monday 0.186260
3 ST3 Monday 0.538817
4 ST4 Monday 0.204452
5 ST0 Wednesday 0.720324
6 ST1 Wednesday 0.146756
7 ST2 Wednesday 0.345561
8 ST3 Wednesday 0.419195
9 ST4 Wednesday 0.878117
10 ST0 Friday 0.000114
11 ST1 Friday 0.092339
12 ST2 Friday 0.396767
13 ST3 Friday 0.685220
14 ST4 Friday 0.027388
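To compare the two answers side by side, here is a small self-contained sketch. The values are copied from the first three rows of the question rather than generated randomly, and the sort at the end is my own addition to group the melted rows per state.

```python
import pandas as pd

data = pd.DataFrame({
    "Monday": [0.417022, 0.302333, 0.186260],
    "Wednesday": [0.720324, 0.146756, 0.345561],
    "Friday": [0.000114, 0.092339, 0.396767],
    "State": ["ST0", "ST1", "ST2"],
})

# State joins the row index first, so stack() only pivots the weekday columns.
stacked = data.set_index("State", append=True).stack()

# melt gives the same values as a flat table; a stable sort groups them per state
# while keeping the Monday, Wednesday, Friday order within each state.
melted = pd.melt(data, id_vars=["State"]).sort_values("State", kind="mergesort")

print(stacked)
print(melted)
```

The stacked version keeps everything in a three-level index, which is handy for lookups such as stacked.loc[(1, 'ST1', 'Wednesday')]. The melted version is a flat table, which is usually easier to filter or plot.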
