performance issues with pandas apply - python

I have the following dataframe df
                                    time       u10  ...        av  kont
latitude  longitude
51.799999 -3.2       2011-01-07 09:00:00 -2.217477  ...  0.008106  None
          -3.1       2011-01-07 09:00:00 -2.137205  ...  0.008202  None
51.900002 -3.1       2011-01-07 09:00:00 -2.276076  ...  0.008310  None
          -3.1       2011-01-07 10:00:00 -1.548405  ...  0.006344  None
          -3.0       2011-01-07 09:00:00 -2.200620  ...  0.008537  None
52.200001 -3.9       2011-01-05 23:00:00  1.393586  ...  0.005413  None
          -3.8       2011-01-05 21:00:00  1.972752  ...  0.007624  None
          -3.8       2011-01-05 22:00:00  1.732336  ...  0.006696  None
          -3.8       2011-01-05 23:00:00  1.551723  ...  0.005837  None
          -3.8       2011-01-06 00:00:00  1.377130  ...  0.004979  None
          -3.7       2011-01-05 21:00:00  2.124066  ...  0.008008  None
          -3.7       2011-01-05 22:00:00  1.892480  ...  0.007125  None
          -3.7       2011-01-05 23:00:00  1.710662  ...  0.006296  None
          -3.6       2011-01-05 21:00:00  2.259727  ...  0.008230  None
          -3.6       2011-01-05 22:00:00  2.044596  ...  0.007428  None
          -3.6       2011-01-05 23:00:00  1.865990  ...  0.006652  None
52.299999 -3.8       2011-01-05 23:00:00  1.652063  ...  0.006964  None
The entire dataframe can be downloaded from here.
I need to sum groups within latitude, longitude and kont. I am doing this with the following function, run through apply:
def summarize(group):
    s = group['kont'].eq('from').cumsum()
    return group.groupby(s).agg(
        t2m=('t2m', 'mean'),
        av=('av', 'sum'),
        ah=('tp', 'sum'),
        d1=('time', 'min'),
        d2=('time', 'max')
    )

df = df.groupby(['latitude', 'longitude']).apply(summarize).reset_index(level=-1, drop=True)
The output is given here.
However, I need to run this on a large dataframe, and it takes hours to finish these operations, probably because of the use of apply.
Is there any pure-pandas way of speeding this up? Or any other approach, e.g. Dask?

You can change the code as follows, without using .apply():
s = df['kont'].eq('from').cumsum()
df = (df.groupby(['latitude', 'longitude', s])
        .agg(
            t2m=('t2m', 'mean'),
            av=('av', 'sum'),
            ah=('tp', 'sum'),
            d1=('time', 'min'),
            d2=('time', 'max')
        )
     ).reset_index(level=-1, drop=True)
Result (the same as running the original code with .apply()):
print(df)
                          t2m        av        ah              d1              d2
latitude  longitude
51.799999 -3.2       0.099451  0.008106  0.010043   1/7/2011 9:00   1/7/2011 9:00
          -3.1       0.343713  0.008202  0.010375   1/7/2011 9:00   1/7/2011 9:00
51.900002 -3.1       0.097055  0.014654  0.020506  1/7/2011 10:00   1/7/2011 9:00
          -3.0       0.261560  0.008537  0.010545   1/7/2011 9:00   1/7/2011 9:00
52.200001 -3.9       0.292841  0.005413  0.010704  1/5/2011 23:00  1/5/2011 23:00
          -3.8       0.207666  0.025135  0.042585  1/5/2011 21:00   1/6/2011 0:00
          -3.7       0.354354  0.021428  0.031826  1/5/2011 21:00  1/5/2011 23:00
          -3.6       0.333602  0.022311  0.031084  1/5/2011 21:00  1/5/2011 23:00
52.299999 -3.8       0.012537  0.012992  0.024472  1/5/2011 23:00   1/6/2011 0:00
          -3.7      -0.146262  0.030848  0.047126  1/5/2011 21:00   1/6/2011 0:00
          -3.6       0.150072  0.031348  0.044772  1/5/2011 21:00   1/6/2011 0:00
52.400002 -3.8       0.240045  0.007225  0.013877   1/6/2011 0:00   1/6/2011 0:00
          -3.7       0.286981  0.015497  0.025990  1/5/2011 23:00   1/6/2011 0:00
          -3.6       0.167067  0.024722  0.036369  1/5/2011 22:00   1/6/2011 0:00
          -3.5       0.199080  0.024500  0.033631  1/5/2011 22:00   1/6/2011 0:00
          -3.4       0.258915  0.024050  0.030358  1/5/2011 22:00   1/6/2011 0:00
          -2.8       0.359186  0.009324  0.010351  1/7/2011 11:00  1/7/2011 11:00
          -2.7       0.241022  0.011714  0.010068  1/7/2011 10:00  1/7/2011 10:00
52.700001 -2.8       0.378778  0.009083  0.010874   1/6/2011 0:00   1/6/2011 0:00
          -2.7       0.314325  0.019510  0.022723  1/5/2011 23:00   1/6/2011 0:00
52.799999 -3.7       0.214777  0.007146  0.011296   1/6/2011 0:00   1/6/2011 0:00
          -3.6       0.294733  0.007325  0.010927   1/6/2011 0:00   1/6/2011 0:00
          -3.6       0.300104  0.005927  0.010070  1/7/2011 17:00  1/7/2011 17:00
          -3.5       0.314325  0.007460  0.010498   1/6/2011 0:00   1/6/2011 0:00
          -3.5       0.271021  0.005504  0.010115  1/7/2011 17:00  1/7/2011 17:00
52.900002 -3.9       0.204980  0.006496  0.011364   1/6/2011 0:00   1/6/2011 0:00
          -3.8       0.378778  0.006653  0.011136   1/6/2011 0:00   1/6/2011 0:00
          -3.6       0.370264  0.005485  0.010155  1/7/2011 18:00  1/7/2011 18:00
          -3.5       0.269434  0.007051  0.010269   1/6/2011 0:00   1/6/2011 0:00
          -3.5       0.372156  0.005216  0.010152  1/7/2011 18:00  1/7/2011 18:00
53.000000 -3.9       0.050775  0.006166  0.010510   1/6/2011 0:00   1/6/2011 0:00
53.200001 -1.9       0.396478  0.017476  0.012246  1/5/2011 23:00  1/5/2011 23:00
54.200001 -2.3       0.380670  0.014101  0.010786   1/6/2011 0:00   1/6/2011 0:00
54.299999 -2.4       0.183496  0.011351  0.010115   1/6/2011 0:00   1/6/2011 0:00
          -2.3       0.122034  0.025713  0.020119  1/5/2011 23:00   1/6/2011 0:00
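To confirm mechanically (rather than by eye) that the refactor preserves the output, pandas ships a testing helper. A quick sketch, assuming the apply-based and vectorized results were saved to the hypothetical names res_apply and res_vec:
import pandas.testing as tm

# Raises an AssertionError describing the difference if the frames
# differ; returns silently when they match.
tm.assert_frame_equal(res_apply, res_vec)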
Performance Comparison:
Original code using .apply():
%%timeit
def summarize(group):
    s = group['kont'].eq('from').cumsum()
    return group.groupby(s).agg(
        t2m=('t2m', 'mean'),
        av=('av', 'sum'),
        ah=('tp', 'sum'),
        d1=('time', 'min'),
        d2=('time', 'max')
    )
df.groupby(['latitude', 'longitude']).apply(summarize).reset_index(level=-1, drop=True)
303 ms ± 33.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Modified code without using .apply():
%%timeit
s = df['kont'].eq('from').cumsum()
(df.groupby(['latitude', 'longitude', s])
   .agg(
       t2m=('t2m', 'mean'),
       av=('av', 'sum'),
       ah=('tp', 'sum'),
       d1=('time', 'min'),
       d2=('time', 'max')
   )
).reset_index(level=-1, drop=True)
15.8 ms ± 236 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
303 ms vs 15.8 ms: ~19.2 times faster.
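If the frame is too large for memory, the same segmented aggregation can be expressed in Dask. A minimal sketch, assuming latitude/longitude have been reset to ordinary columns; the npartitions value and the seg column name are illustrative, and the dict-style agg spec should be checked against your Dask version:
import dask.dataframe as dd

ddf = dd.from_pandas(df.reset_index(), npartitions=8)

# Build the segment key first; cumsum works across partitions,
# mirroring the pandas version above.
ddf = ddf.assign(seg=(ddf['kont'] == 'from').cumsum())

result = (ddf.groupby(['latitude', 'longitude', 'seg'])
             .agg({'t2m': 'mean', 'av': 'sum', 'tp': 'sum', 'time': ['min', 'max']})
             .compute())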

Related

Splitting Dataframe time into morning and evening

I have a df that looks like this (shortened):
DateTime Value Date Time
0 2022-09-18 06:00:00 5.4 18/09/2022 06:00
1 2022-09-18 07:00:00 6.0 18/09/2022 07:00
2 2022-09-18 08:00:00 6.5 18/09/2022 08:00
3 2022-09-18 09:00:00 6.7 18/09/2022 09:00
8 2022-09-18 14:00:00 7.9 18/09/2022 14:00
9 2022-09-18 15:00:00 7.8 18/09/2022 15:00
10 2022-09-18 16:00:00 7.6 18/09/2022 16:00
11 2022-09-18 17:00:00 6.8 18/09/2022 17:00
12 2022-09-18 18:00:00 6.4 18/09/2022 18:00
13 2022-09-18 19:00:00 5.7 18/09/2022 19:00
14 2022-09-18 20:00:00 4.8 18/09/2022 20:00
15 2022-09-18 21:00:00 5.4 18/09/2022 21:00
16 2022-09-18 22:00:00 4.7 18/09/2022 22:00
17 2022-09-18 23:00:00 4.3 18/09/2022 23:00
18 2022-09-19 00:00:00 4.1 19/09/2022 00:00
19 2022-09-19 01:00:00 4.4 19/09/2022 01:00
22 2022-09-19 04:00:00 3.5 19/09/2022 04:00
23 2022-09-19 05:00:00 2.8 19/09/2022 05:00
24 2022-09-19 06:00:00 3.8 19/09/2022 06:00
I want to create a new column where I split between day and night like this:
00:00 - 05:00 night
06:00 - 18:00 day
19:00 - 23:00 night
But apparently one can't use the same label twice? How can I solve this problem? Here is my code:
df['period'] = pd.cut(pd.to_datetime(df.DateTime).dt.hour,
                      bins=[0, 5, 17, 23],
                      labels=['night', 'morning', 'night'],
                      include_lowest=True)
It's returning
ValueError: labels must be unique if ordered=True; pass ordered=False for duplicate labels
If I understood correctly: if the time is between 00:00 - 05:00 or 19:00 - 23:00, you want your new column to say 'night', else 'day'. Here's that code:
df['day/night'] = df['Time'].apply(lambda x: 'night' if '00:00' <= x <= '05:00' or '19:00' <= x <= '23:00' else 'day')
Or you can add the ordered=False parameter using your method.
Input:
df = pd.DataFrame(columns=['DateTime', 'Value', 'Date', 'Time'], data=[
    ['2022-09-18 06:00:00', 5.4, '18/09/2022', '06:00'],
    ['2022-09-18 07:00:00', 6.0, '18/09/2022', '07:00'],
    ['2022-09-18 08:00:00', 6.5, '18/09/2022', '08:00'],
    ['2022-09-18 09:00:00', 6.7, '18/09/2022', '09:00'],
    ['2022-09-18 14:00:00', 7.9, '18/09/2022', '14:00'],
    ['2022-09-18 15:00:00', 7.8, '18/09/2022', '15:00'],
    ['2022-09-18 16:00:00', 7.6, '18/09/2022', '16:00'],
    ['2022-09-18 17:00:00', 6.8, '18/09/2022', '17:00'],
    ['2022-09-18 18:00:00', 6.4, '18/09/2022', '18:00'],
    ['2022-09-18 19:00:00', 5.7, '18/09/2022', '19:00'],
    ['2022-09-18 20:00:00', 4.8, '18/09/2022', '20:00'],
    ['2022-09-18 21:00:00', 5.4, '18/09/2022', '21:00'],
    ['2022-09-18 22:00:00', 4.7, '18/09/2022', '22:00'],
    ['2022-09-18 23:00:00', 4.3, '18/09/2022', '23:00'],
    ['2022-09-19 00:00:00', 4.1, '19/09/2022', '00:00'],
    ['2022-09-19 01:00:00', 4.4, '19/09/2022', '01:00'],
    ['2022-09-19 04:00:00', 3.5, '19/09/2022', '04:00'],
    ['2022-09-19 05:00:00', 2.8, '19/09/2022', '05:00'],
    ['2022-09-19 06:00:00', 3.8, '19/09/2022', '06:00']])
Output:
DateTime Value Date Time day/night
0 2022-09-18 06:00:00 5.4 18/09/2022 06:00 day
1 2022-09-18 07:00:00 6.0 18/09/2022 07:00 day
2 2022-09-18 08:00:00 6.5 18/09/2022 08:00 day
3 2022-09-18 09:00:00 6.7 18/09/2022 09:00 day
4 2022-09-18 14:00:00 7.9 18/09/2022 14:00 day
5 2022-09-18 15:00:00 7.8 18/09/2022 15:00 day
6 2022-09-18 16:00:00 7.6 18/09/2022 16:00 day
7 2022-09-18 17:00:00 6.8 18/09/2022 17:00 day
8 2022-09-18 18:00:00 6.4 18/09/2022 18:00 day
9 2022-09-18 19:00:00 5.7 18/09/2022 19:00 night
10 2022-09-18 20:00:00 4.8 18/09/2022 20:00 night
11 2022-09-18 21:00:00 5.4 18/09/2022 21:00 night
12 2022-09-18 22:00:00 4.7 18/09/2022 22:00 night
13 2022-09-18 23:00:00 4.3 18/09/2022 23:00 night
14 2022-09-19 00:00:00 4.1 19/09/2022 00:00 night
15 2022-09-19 01:00:00 4.4 19/09/2022 01:00 night
16 2022-09-19 04:00:00 3.5 19/09/2022 04:00 night
17 2022-09-19 05:00:00 2.8 19/09/2022 05:00 night
18 2022-09-19 06:00:00 3.8 19/09/2022 06:00 day
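Note that the string comparison in the lambda above only works because the Time values are zero-padded HH:MM strings. If you'd rather avoid the row-wise apply, the same rule can be vectorized on the hour; a sketch, assuming DateTime parses cleanly:
import numpy as np
import pandas as pd

hours = pd.to_datetime(df['DateTime']).dt.hour
# Day runs from 06:00 through 18:00 inclusive; everything else is night.
df['day/night'] = np.where(hours.between(6, 18), 'day', 'night')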
You have two options.
Either you don't care about the order and you can set ordered=False as parameter of cut:
df['period'] = pd.cut(pd.to_datetime(df.DateTime).dt.hour,
                      bins=[0, 5, 17, 23],
                      labels=['night', 'morning', 'night'],
                      ordered=False,
                      include_lowest=True)
Or you care to have night and morning ordered, in which case you can further convert to ordered Categorical:
df['period'] = pd.Categorical(df['period'], categories=['night', 'morning'], ordered=True)
output:
DateTime Value Date Time period
0 2022-09-18 06:00:00 5.4 18/09/2022 06:00 morning
1 2022-09-18 07:00:00 6.0 18/09/2022 07:00 morning
2 2022-09-18 08:00:00 6.5 18/09/2022 08:00 morning
3 2022-09-18 09:00:00 6.7 18/09/2022 09:00 morning
8 2022-09-18 14:00:00 7.9 18/09/2022 14:00 morning
9 2022-09-18 15:00:00 7.8 18/09/2022 15:00 morning
10 2022-09-18 16:00:00 7.6 18/09/2022 16:00 morning
11 2022-09-18 17:00:00 6.8 18/09/2022 17:00 morning
12 2022-09-18 18:00:00 6.4 18/09/2022 18:00 night
13 2022-09-18 19:00:00 5.7 18/09/2022 19:00 night
14 2022-09-18 20:00:00 4.8 18/09/2022 20:00 night
15 2022-09-18 21:00:00 5.4 18/09/2022 21:00 night
16 2022-09-18 22:00:00 4.7 18/09/2022 22:00 night
17 2022-09-18 23:00:00 4.3 18/09/2022 23:00 night
18 2022-09-19 00:00:00 4.1 19/09/2022 00:00 night
19 2022-09-19 01:00:00 4.4 19/09/2022 01:00 night
22 2022-09-19 04:00:00 3.5 19/09/2022 04:00 night
23 2022-09-19 05:00:00 2.8 19/09/2022 05:00 night
24 2022-09-19 06:00:00 3.8 19/09/2022 06:00 morning
column:
df['period']
0 morning
1 morning
2 morning
...
23 night
24 morning
Name: period, dtype: category
Categories (2, object): ['morning', 'night']
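Why bother with the ordered conversion? On an ordered categorical, comparisons and sorting follow the declared category order instead of alphabetical order. A small illustrative sketch (hypothetical usage, assuming the pd.Categorical conversion above, where night < morning):
df.sort_values('period')   # 'night' rows sort before 'morning' rows
df['period'] > 'night'     # boolean mask selecting the 'morning' rows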

Resample a time-series data at the end of the month and at the end of the day

I have time-series data in the following format.
DateShort (%d/%m/%Y)  TimeFrom  TimeTo  Value
1/1/2018              0:00      1:00    6414
1/1/2018              1:00      2:00    6153
...                   ...       ...     ...
1/1/2018              23:00     0:00    6317
2/1/2018              0:00      1:00    6046
...                   ...       ...     ...
I would like to re-sample data at the end of the month and at the end of the day.
The dataset could be retrieved from https://pastebin.com/raw/NWdigN97
pandas.DataFrame.resample() provides the 'M' rule to select data at the end of the month, but it picks the row at the beginning of that day.
See https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
Do you have a better solution to accomplish this?
I have the following sample code:
import numpy as np
import pandas as pd

ds_url = 'https://pastebin.com/raw/NWdigN97'
df = pd.read_csv(ds_url, header=0)
df['DateTime'] = pd.to_datetime(
    df['DateShort'] + ' ' + df['TimeFrom'],
    format='%d/%m/%Y %H:%M'
)
df.drop('DateShort', axis=1, inplace=True)
df.set_index('DateTime', inplace=True)
df.resample('M').asfreq()
The output is
TimeFrom TimeTo Value
DateTime
2018-01-31 0:00 1:00 7215
2018-02-28 0:00 1:00 8580
2018-03-31 0:00 1:00 6202
2018-04-30 0:00 1:00 5369
2018-05-31 0:00 1:00 5840
2018-06-30 0:00 1:00 5730
2018-07-31 0:00 1:00 5979
2018-08-31 0:00 1:00 6009
2018-09-30 0:00 1:00 5430
2018-10-31 0:00 1:00 6587
2018-11-30 0:00 1:00 7948
2018-12-31 0:00 1:00 6193
However, the correct output should be
TimeFrom TimeTo Value
DateTime
2018-01-31 23:00 0:00 7605
2018-02-28 23:00 0:00 8790
2018-03-31 23:00 0:00 5967
2018-04-30 23:00 0:00 5595
2018-05-31 23:00 0:00 5558
2018-06-30 23:00 0:00 5153
2018-07-31 23:00 0:00 5996
2018-08-31 23:00 0:00 5757
2018-09-30 23:00 0:00 5785
2018-10-31 23:00 0:00 6437
2018-11-30 23:00 0:00 7830
2018-12-31 23:00 0:00 6767
Try this:
df.groupby(pd.Grouper(freq='M')).last()
Output:
TimeFrom TimeTo Value
DateTime
2018-01-31 23:00 0:00 7605
2018-02-28 23:00 0:00 8790
2018-03-31 23:00 0:00 5967
2018-04-30 23:00 0:00 5595
2018-05-31 23:00 0:00 5558
2018-06-30 23:00 0:00 5153
2018-07-31 23:00 0:00 5996
2018-08-31 23:00 0:00 5757
2018-09-30 23:00 0:00 5785
2018-10-31 23:00 0:00 6437
2018-11-30 23:00 0:00 7830
2018-12-31 23:00 0:00 6707
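Since the frame already has a DatetimeIndex, the same thing can be spelled with resample, which is equivalent to grouping with pd.Grouper here. A sketch (note that recent pandas deprecates the 'M' alias in favour of 'ME'):
df.resample('M').last()   # same result; use freq 'ME' on pandas >= 2.2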

How to sample data from Pandas DataFrame where data is present for every hour of a given day

I wish to create a DataFrame where each row is one day, and the columns provide the date, hourly data, and maximum minimum of the day's data. Here is an example (I provide the input data further down in the question):
Date_time 00:00 01:00 02:00 03:00 04:00 05:00 06:00 07:00 08:00 09:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00 18:00 19:00 20:00 21:00 22:00 23:00 Max Min
0 2019-02-03 18.6 18.6 18.2 18.0 18.0 18.3 18.7 20.1 21.7 23.3 23.7 24.6 25.1 24.5 23.9 19.6 19.2 19.8 19.6 19.3 19.2 19.3 18.8 19.0 25.7 17.9
1 2019-02-04 18.9 18.8 18.6 18.4 18.7 18.8 19.0 19.7 21.4 23.5 25.8 25.4 22.1 21.8 21.0 18.9 18.8 18.9 18.8 18.8 18.9 27.8 18.1
My input DataFrame has a row for each hour, with the date & time, mean, max, and min for each hour as its columns.
I wish to iterate through each day in the input DataFrame and do the following:
Check that there is a row for each hour of the day
Check that there is both maximum and minimum data for each hour of the day
If the conditions above are met, I wish to:
Add a row to the output DataFrame for the given date
Use the date to fill the 'Date_time' cell for the row
Transpose the hourly data to the hourly cells
Find the max of the hourly max data, and use it to fill the max cell for the row
Find the min of the hourly min data, and use it to fill the min cell for the row
Examples of daily input data follow.
Example 1
All hours for day available
Max & min available for each hour
Proceed to create row in output DataFrame
Date_time Mean_temp Max_temp Min_temp
0 2019-02-03 00:00:00 18.6 18.7 18.5
1 2019-02-03 01:00:00 18.6 18.7 18.5
2 2019-02-03 02:00:00 18.2 18.5 18.0
3 2019-02-03 03:00:00 18.0 18.0 17.9
4 2019-02-03 04:00:00 18.0 18.1 17.9
5 2019-02-03 05:00:00 18.3 18.4 18.1
6 2019-02-03 06:00:00 18.7 19.1 18.4
7 2019-02-03 07:00:00 20.1 21.3 19.1
8 2019-02-03 08:00:00 21.7 22.9 21.0
9 2019-02-03 09:00:00 23.2 23.9 22.8
10 2019-02-03 10:00:00 23.7 24.1 23.3
11 2019-02-03 11:00:00 24.6 25.5 24.0
12 2019-02-03 12:00:00 25.1 25.7 24.7
13 2019-02-03 13:00:00 24.5 25.0 24.2
14 2019-02-03 14:00:00 23.9 25.3 21.2
15 2019-02-03 15:00:00 19.6 21.2 18.8
16 2019-02-03 16:00:00 19.2 19.5 18.7
17 2019-02-03 17:00:00 19.8 19.9 19.4
18 2019-02-03 18:00:00 19.6 19.8 19.5
19 2019-02-03 19:00:00 19.3 19.4 19.1
20 2019-02-03 20:00:00 19.2 19.4 19.1
21 2019-02-03 21:00:00 19.3 19.4 18.9
22 2019-02-03 22:00:00 18.8 19.0 18.7
23 2019-02-03 23:00:00 19.0 19.1 18.9
Example 2
All hours for day available
Max & min available for each hour
NaN values for some Mean_temp entries
Proceed to create row in output DataFrame
Date_time Mean_temp Max_temp Min_temp
24 2019-02-04 00:00:00 18.9 19.0 18.9
25 2019-02-04 01:00:00 18.8 18.9 18.7
26 2019-02-04 02:00:00 18.6 18.8 18.4
27 2019-02-04 03:00:00 18.4 18.6 18.1
28 2019-02-04 04:00:00 18.7 18.9 18.4
29 2019-02-04 05:00:00 18.8 18.8 18.7
30 2019-02-04 06:00:00 19.0 19.3 18.8
31 2019-02-04 07:00:00 19.7 20.4 19.3
32 2019-02-04 08:00:00 21.4 22.8 20.3
33 2019-02-04 09:00:00 23.5 23.9 22.8
34 2019-02-04 10:00:00 25.7 23.6
35 2019-02-04 11:00:00 26.5 25.4
36 2019-02-04 12:00:00 27.1 26.1
37 2019-02-04 13:00:00 25.8 26.8 24.8
38 2019-02-04 14:00:00 25.4 27.8 23.7
39 2019-02-04 15:00:00 22.1 24.1 20.2
40 2019-02-04 16:00:00 21.8 22.6 20.2
41 2019-02-04 17:00:00 20.9 22.4 19.6
42 2019-02-04 18:00:00 18.9 19.6 18.6
43 2019-02-04 19:00:00 18.8 18.9 18.6
44 2019-02-04 20:00:00 18.9 19.0 18.8
45 2019-02-04 21:00:00 18.8 18.9 18.7
46 2019-02-04 22:00:00 18.8 18.9 18.7
47 2019-02-04 23:00:00 18.9 19.2 18.7
Example 3
Not all hours of the day are available
Do not create row in output DataFrame
Date_time Mean_temp Max_temp Min_temp
48 2019-02-05 00:00:00 19.2 19.3 19.0
49 2019-02-05 01:00:00 19.3 19.4 19.3
50 2019-02-05 02:00:00 19.3 19.4 19.2
51 2019-02-05 03:00:00 19.4 19.5 19.4
52 2019-02-05 04:00:00 19.5 19.6 19.3
53 2019-02-05 05:00:00 19.3 19.5 19.1
54 2019-02-05 06:00:00 20.1 20.6 19.2
55 2019-02-05 07:00:00 21.1 21.7 20.6
56 2019-02-05 08:00:00 22.3 23.2 21.7
57 2019-02-05 15:00:00 25.3 25.8 25.0
58 2019-02-05 16:00:00 25.8 26.0 25.2
59 2019-02-05 17:00:00 24.3 25.2 23.3
60 2019-02-05 18:00:00 22.5 23.3 22.1
61 2019-02-05 19:00:00 21.6 22.1 21.1
62 2019-02-05 20:00:00 21.1 21.3 20.9
63 2019-02-05 21:00:00 21.2 21.3 20.9
64 2019-02-05 22:00:00 20.9 21.0 20.6
65 2019-02-05 23:00:00 19.9 20.6 19.7
Example 4
All hours of the day are available
Max and/or min have at least one NaN value
Do not create row in output DataFrame
Date_time Mean_temp Max_temp Min_temp
66 2019-02-06 00:00:00 19.7 19.8 19.7
67 2019-02-06 01:00:00 19.6 19.7 19.3
68 2019-02-06 02:00:00 19.0 19.3 18.6
69 2019-02-06 03:00:00 18.5 18.6 18.4
70 2019-02-06 04:00:00 18.6 18.7 18.4
71 2019-02-06 05:00:00 18.5 18.6
72 2019-02-06 06:00:00 19.0 19.6 18.5
73 2019-02-06 07:00:00 20.3 21.2 19.6
74 2019-02-06 08:00:00 21.5 21.7 21.2
75 2019-02-06 09:00:00 21.4 22.3 20.9
76 2019-02-06 10:00:00 23.5 24.4 22.3
77 2019-02-06 11:00:00 24.7 25.4 24.3
78 2019-02-06 12:00:00 24.9 25.5 23.9
79 2019-02-06 13:00:00 23.4 24.0 22.9
80 2019-02-06 14:00:00 23.3 23.8 22.9
81 2019-02-06 15:00:00 24.4 23.7
82 2019-02-06 16:00:00 24.9 25.1 24.7
83 2019-02-06 17:00:00 24.4 24.9 23.8
84 2019-02-06 18:00:00 22.5 23.8 21.7
85 2019-02-06 19:00:00 20.8 21.8 19.6
86 2019-02-06 20:00:00 19.1 19.6 18.9
87 2019-02-06 21:00:00 19.0 19.1 18.9
88 2019-02-06 22:00:00 19.1 19.1 19.0
89 2019-02-06 23:00:00 19.1 19.1 19.0
Just to recap, the above inputs would create the following output:
Date_time 00:00 01:00 02:00 03:00 04:00 05:00 06:00 07:00 08:00 09:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00 18:00 19:00 20:00 21:00 22:00 23:00 Max Min
0 2019-02-03 18.6 18.6 18.2 18.0 18.0 18.3 18.7 20.1 21.7 23.3 23.7 24.6 25.1 24.5 23.9 19.6 19.2 19.8 19.6 19.3 19.2 19.3 18.8 19.0 25.7 17.9
1 2019-02-04 18.9 18.8 18.6 18.4 18.7 18.8 19.0 19.7 21.4 23.5 25.8 25.4 22.1 21.8 21.0 18.9 18.8 18.9 18.8 18.8 18.9 27.8 18.1
I've had a really good think about this, and I can only come up with a horrible set of if statements that I know will be terribly slow and will take ages to write (apologies, this is due to me being bad at coding)!
Does anyone have any pointers to Pandas functions that could begin to deal with this problem efficiently?
You can use a groupby on the day of the Date_time column, and build each row of your final_df from each group (moving to the next iteration of the groupby whenever there are any missing values in the Max_temp or Min_temp columns, or whenever the length of the group is less than 24).
Note that I am assuming your Date_time column is of type datetime64[ns]. If it isn't, you should first run: df['Date_time'] = pd.to_datetime(df['Date_time'])
all_hours = list(pd.date_range(start='1/1/22 00:00:00', end='1/1/22 23:00:00', freq='h').strftime('%H:%M'))
final_df = pd.DataFrame(columns=['Date_time'] + all_hours + ['Max', 'Min'])

## construct final_df by using a groupby on the day of the 'Date_time' column
for group, df_group in df.groupby(df['Date_time'].dt.date):
    ## check if NaN is in either 'Max_temp' or 'Min_temp' columns
    new_df_data = {}
    if (df_group[['Max_temp', 'Min_temp']].isnull().sum().sum() == 0) & (len(df_group) == 24):
        ## create a dictionary for the new row of the final_df
        new_df_data['Date_time'] = group
        new_df_data.update(dict(zip(all_hours, [[val] for val in df_group['Mean_temp']])))
        new_df_data['Max'], new_df_data['Min'] = df_group['Max_temp'].max(), df_group['Min_temp'].min()
        final_df = pd.concat([final_df, pd.DataFrame(new_df_data)])
    else:
        continue
Output:
>>> final_df
Date_time 00:00 01:00 02:00 03:00 04:00 05:00 06:00 07:00 08:00 09:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00 18:00 19:00 20:00 21:00 22:00 23:00 Max Min
0 2019-02-03 18.6 18.6 18.2 18.0 18.0 18.3 18.7 20.1 21.7 23.2 23.7 24.6 25.1 24.5 23.9 19.6 19.2 19.8 19.6 19.3 19.2 19.3 18.8 19.0 25.7 17.9
0 2019-02-04 18.9 18.8 18.6 18.4 18.7 18.8 19.0 19.7 21.4 23.5 NaN NaN NaN 25.8 25.4 22.1 21.8 20.9 18.9 18.8 18.9 18.8 18.8 18.9 27.8 18.1
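If the explicit loop becomes a bottleneck on larger inputs, the same row-building can be sketched without it: filter keeps only the complete days, and a pivot spreads the hourly means into columns. A sketch under the same column assumptions as above (the complete/wide names are illustrative):
# Keep only days with 24 rows and no missing Max_temp/Min_temp values
complete = df.groupby(df['Date_time'].dt.date).filter(
    lambda g: len(g) == 24 and g[['Max_temp', 'Min_temp']].notna().all().all()
)

date = complete['Date_time'].dt.date
hour = complete['Date_time'].dt.strftime('%H:%M')

# Spread hourly means across columns, then append the daily extremes
wide = complete.pivot_table(index=date, columns=hour, values='Mean_temp')
wide['Max'] = complete.groupby(date)['Max_temp'].max()
wide['Min'] = complete.groupby(date)['Min_temp'].min()
final_df = wide.reset_index()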

pandas merge/rearrange/sum single dataframe

I have following dataframe:
latitude longitude d1 d2 ar merge_time
0 15 10.0 12/1/1981 0:00 12/4/1981 3:00 2.317681391 1981-12-04 04:00:00
1 15 10.1 12/1/1981 0:00 12/1/1981 3:00 2.293604127 1981-12-01 04:00:00
2 15 10.2 12/1/1981 0:00 12/1/1981 2:00 2.264552161 1981-12-01 03:00:00
3 15 10.3 12/1/1981 0:00 12/4/1981 2:00 2.278556423 1981-12-04 03:00:00
4 15 10.1 12/1/1981 4:00 12/1/1981 22:00 2.168275766 1981-12-01 23:00:00
5 15 10.2 12/1/1981 3:00 12/1/1981 21:00 2.114636628 1981-12-01 22:00:00
6 15 10.4 12/1/1981 0:00 12/2/1981 17:00 1.384415903 1981-12-02 18:00:00
7 15 10.1 12/2/1981 8:00 12/2/1981 11:00 2.293604127 1981-12-01 12:00:00
I want to group and rearrange the above dataframe (the value of column ar) based on the following criteria:
1. Values latitude and longitude are equal, and
2. Values d2 and merge_time are equal within the groups from 1
Here is desired output:
latitude longitude d1 d2 ar
15 10 12/1/1981 0:00 12/4/1981 3:00 2.317681391
15 10.1 12/1/1981 0:00 12/1/1981 22:00 4.461879893
15 10.2 12/1/1981 0:00 12/1/1981 21:00 4.379188789
15 10.3 12/1/1981 0:00 12/4/1981 2:00 2.278556423
15 10.4 12/1/1981 0:00 12/2/1981 17:00 1.384415903
15 10.1 12/2/1981 8:00 12/2/1981 11:00 2.293604127
How can I achieve this?
Any help is appreciated.
After expressing your requirements in the comments:
group by location (longitude & latitude)
find rows within this grouping that are contiguous in time
group and aggregate these contiguous sections
import io
import pandas as pd

df = pd.read_csv(io.StringIO("""latitude  longitude  d1  d2  ar  merge_time
0  15  10.0  12/1/1981 0:00  12/4/1981 3:00  2.317681391  1981-12-04 04:00:00
1  15  10.1  12/1/1981 0:00  12/1/1981 3:00  2.293604127  1981-12-01 04:00:00
2  15  10.2  12/1/1981 0:00  12/1/1981 2:00  2.264552161  1981-12-01 03:00:00
3  15  10.3  12/1/1981 0:00  12/4/1981 2:00  2.278556423  1981-12-04 03:00:00
4  15  10.1  12/1/1981 4:00  12/1/1981 22:00  2.168275766  1981-12-01 23:00:00
5  15  10.2  12/1/1981 3:00  12/1/1981 21:00  2.114636628  1981-12-01 22:00:00
6  15  10.4  12/1/1981 0:00  12/2/1981 17:00  1.384415903  1981-12-02 18:00:00
7  15  10.1  12/2/1981 8:00  12/2/1981 11:00  2.293604127  1981-12-01 12:00:00"""), sep="\s\s+", engine="python")

df = df.assign(**{c: pd.to_datetime(df[c]) for c in ["d1", "d2", "merge_time"]})

df.groupby(["latitude", "longitude"]).apply(
    lambda d: d.groupby(
        (d["d1"] != (d["d2"].shift() + pd.Timedelta("1H"))).cumsum(), as_index=False
    ).agg({"d1": "min", "d2": "max", "ar": "sum"})
).droplevel(2, 0).reset_index()
Output:
   latitude  longitude                   d1                   d2       ar
0        15       10.0  1981-12-01 00:00:00  1981-12-04 03:00:00  2.31768
1        15       10.1  1981-12-01 00:00:00  1981-12-01 22:00:00  4.46188
2        15       10.1  1981-12-02 08:00:00  1981-12-02 11:00:00   2.2936
3        15       10.2  1981-12-01 00:00:00  1981-12-01 21:00:00  4.37919
4        15       10.3  1981-12-01 00:00:00  1981-12-04 02:00:00  2.27856
5        15       10.4  1981-12-01 00:00:00  1981-12-02 17:00:00  1.38442
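The heart of this is the grouping key (d["d1"] != (d["d2"].shift() + pd.Timedelta("1H"))).cumsum(): a row continues the previous block exactly when its d1 is one hour after the previous row's d2, so the comparison is False there and the running sum does not start a new group. A tiny illustrative sketch with made-up timestamps:
import pandas as pd

d1 = pd.Series(pd.to_datetime(['1981-12-01 00:00', '1981-12-01 04:00']))
d2 = pd.Series(pd.to_datetime(['1981-12-01 03:00', '1981-12-01 22:00']))

# The second row starts exactly one hour after the first row ends, so the
# comparison is False there and both rows share the same group id.
key = (d1 != (d2.shift() + pd.Timedelta('1H'))).cumsum()
print(key.tolist())   # [1, 1] -> one contiguous block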

How to change the value of a day column based on time column?

I have a df
Time Samstag Sonntag Werktag
00:15:00 95.3 87.8 94.7
00:30:00 95.5 88.3 94.1
00:45:00 96.2 89.0 94.1
01:00:00 97.4 90.1 95.0
01:15:00 98.9 91.3 96.6
01:30:00 100.3 92.4 98.4
01:45:00 101.0 92.9 99.8
02:00:00 100.4 92.5 99.8
02:15:00 98.2 91.0 98.0
02:30:00 95.1 88.7 95.1
02:45:00 91.9 86.4 91.9
03:00:00 89.5 84.7 89.5
03:15:00 88.6 84.0 88.4
03:30:00 88.6 84.0 88.3
03:45:00 88.7 84.0 88.3
04:00:00 88.3 83.5 87.7
04:15:00 86.8 82.1 86.1
04:30:00 85.1 80.6 84.3
04:45:00 84.2 80.1 83.5
05:00:00 85.3 81.6 84.7
05:15:00 89.0 85.9 88.5
05:30:00 94.1 91.6 94.0
05:45:00 99.3 97.0 99.5
06:00:00 102.8 100.4 103.4
06:15:00 103.7 100.7 104.7
06:30:00 102.6 98.8 104.0
06:45:00 100.7 96.2 102.4
07:00:00 99.2 94.3 101.0
07:15:00 99.1 94.4 100.8
07:30:00 100.8 95.7 102.1
07:45:00 104.4 97.6 105.3
08:00:00 110.1 99.2 110.7
08:15:00 117.7 99.7 118.2
08:30:00 126.1 99.6 126.7
08:45:00 133.9 99.2 134.7
09:00:00 139.7 99.2 140.9
09:15:00 142.4 99.8 144.2
09:30:00 142.9 100.9 145.4
09:45:00 142.4 102.1 145.5
10:00:00 142.1 102.8 145.8
10:15:00 142.9 102.9 147.0
10:30:00 144.5 102.5 149.0
10:45:00 146.3 101.8 151.2
11:00:00 147.6 101.0 153.0
11:15:00 147.9 100.4 154.0
11:30:00 147.5 100.0 154.3
11:45:00 146.8 99.8 154.3
12:00:00 146.4 99.8 154.2
12:15:00 146.3 100.0 154.3
12:30:00 146.5 100.5 154.5
12:45:00 146.2 101.0 154.3
13:00:00 145.1 101.6 153.6
13:15:00 142.8 102.2 152.2
13:30:00 139.3 102.4 149.9
13:45:00 134.6 102.1 147.0
14:00:00 128.8 101.0 143.3
14:15:00 122.3 98.9 139.2
14:30:00 115.5 96.3 135.2
14:45:00 109.4 93.8 132.1
15:00:00 104.6 91.9 130.6
15:15:00 101.8 91.1 131.3
15:30:00 100.5 91.2 133.5
15:45:00 100.2 91.8 136.2
16:00:00 100.4 92.5 138.5
16:15:00 100.6 93.1 139.8
16:30:00 101.0 93.4 140.3
16:45:00 101.9 93.6 140.5
17:00:00 103.4 93.7 140.9
17:15:00 105.8 93.9 142.0
17:30:00 108.7 94.3 143.7
17:45:00 111.5 95.2 145.8
18:00:00 113.7 96.8 148.2
18:15:00 115.0 99.1 150.6
18:30:00 115.7 102.2 152.5
18:45:00 116.3 105.7 153.3
19:00:00 117.3 109.5 152.4
19:15:00 119.0 113.2 149.3
19:30:00 120.6 116.3 144.4
19:45:00 121.4 117.9 138.4
20:00:00 120.4 117.3 131.8
20:15:00 117.0 114.2 125.3
20:30:00 112.1 109.4 119.3
20:45:00 106.8 104.2 114.3
21:00:00 102.2 99.8 110.7
21:15:00 99.2 97.1 108.8
21:30:00 97.4 95.9 108.1
21:45:00 96.4 95.4 108.0
22:00:00 95.6 95.0 107.7
22:15:00 94.5 94.1 106.6
22:30:00 93.3 92.8 104.9
22:45:00 92.0 91.2 103.0
23:00:00 90.7 89.5 101.0
23:15:00 89.6 87.8 99.3
23:30:00 88.6 86.4 97.8
23:45:00 88.0 85.7 96.6
00:00:00 87.7 85.9 95.6
I did:
td = pd.to_timedelta(df['Time'].astype(str))
df1 = df.assign(Time=td.mask(td == pd.Timedelta(0), td + pd.Timedelta('1 days 00:00:00')), a=1)
df2 = pd.DataFrame({'dates': pd.date_range('01.01.2020', '31.12.2020'), 'a': 1})
df = df2.merge(df1, how='outer').drop('a', axis=1)
df['dates'] = df['dates'].add(df.pop('Time')).dt.strftime('%d.%m.%Y %H:%M')
df['dates'] = pd.to_datetime(df['dates'], dayfirst=True)
df['day'] = df['dates'].dt.day_name()
It gave the following output:
dates Samstag Sonntag Werktag day
2020-01-01 00:15:00 95.3 87.8 94.7 Wednesday
2020-01-01 00:30:00 95.5 88.3 94.1 Wednesday
2020-01-01 00:45:00 96.2 89.0 94.1 Wednesday
2020-01-01 01:00:00 97.4 90.1 95.0 Wednesday
2020-01-01 01:15:00 98.9 91.3 96.6 Wednesday
2020-01-01 01:30:00 100.3 92.4 98.4 Wednesday
2020-01-01 01:45:00 101.0 92.9 99.8 Wednesday
2020-01-01 02:00:00 100.4 92.5 99.8 Wednesday
2020-01-01 02:15:00 98.2 91.0 98.0 Wednesday
2020-01-01 02:30:00 95.1 88.7 95.1 Wednesday
2020-01-01 02:45:00 91.9 86.4 91.9 Wednesday
2020-01-01 03:00:00 89.5 84.7 89.5 Wednesday
2020-01-01 03:15:00 88.6 84.0 88.4 Wednesday
2020-01-01 03:30:00 88.6 84.0 88.3 Wednesday
2020-01-01 03:45:00 88.7 84.0 88.3 Wednesday
2020-01-01 04:00:00 88.3 83.5 87.7 Wednesday
2020-01-01 04:15:00 86.8 82.1 86.1 Wednesday
2020-01-01 04:30:00 85.1 80.6 84.3 Wednesday
2020-01-01 04:45:00 84.2 80.1 83.5 Wednesday
2020-01-01 05:00:00 85.3 81.6 84.7 Wednesday
2020-01-01 05:15:00 89.0 85.9 88.5 Wednesday
2020-01-01 05:30:00 94.1 91.6 94.0 Wednesday
2020-01-01 05:45:00 99.3 97.0 99.5 Wednesday
2020-01-01 06:00:00 102.8 100.4 103.4 Wednesday
2020-01-01 06:15:00 103.7 100.7 104.7 Wednesday
2020-01-01 06:30:00 102.6 98.8 104.0 Wednesday
2020-01-01 06:45:00 100.7 96.2 102.4 Wednesday
2020-01-01 07:00:00 99.2 94.3 101.0 Wednesday
2020-01-01 07:15:00 99.1 94.4 100.8 Wednesday
2020-01-01 07:30:00 100.8 95.7 102.1 Wednesday
2020-01-01 07:45:00 104.4 97.6 105.3 Wednesday
2020-01-01 08:00:00 110.1 99.2 110.7 Wednesday
2020-01-01 08:15:00 117.7 99.7 118.2 Wednesday
2020-01-01 08:30:00 126.1 99.6 126.7 Wednesday
2020-01-01 08:45:00 133.9 99.2 134.7 Wednesday
2020-01-01 09:00:00 139.7 99.2 140.9 Wednesday
2020-01-01 09:15:00 142.4 99.8 144.2 Wednesday
2020-01-01 09:30:00 142.9 100.9 145.4 Wednesday
2020-01-01 09:45:00 142.4 102.1 145.5 Wednesday
2020-01-01 10:00:00 142.1 102.8 145.8 Wednesday
2020-01-01 10:15:00 142.9 102.9 147.0 Wednesday
2020-01-01 10:30:00 144.5 102.5 149.0 Wednesday
2020-01-01 10:45:00 146.3 101.8 151.2 Wednesday
2020-01-01 11:00:00 147.6 101.0 153.0 Wednesday
2020-01-01 11:15:00 147.9 100.4 154.0 Wednesday
2020-01-01 11:30:00 147.5 100.0 154.3 Wednesday
2020-01-01 11:45:00 146.8 99.8 154.3 Wednesday
2020-01-01 12:00:00 146.4 99.8 154.2 Wednesday
2020-01-01 12:15:00 146.3 100.0 154.3 Wednesday
2020-01-01 12:30:00 146.5 100.5 154.5 Wednesday
2020-01-01 12:45:00 146.2 101.0 154.3 Wednesday
2020-01-01 13:00:00 145.1 101.6 153.6 Wednesday
2020-01-01 13:15:00 142.8 102.2 152.2 Wednesday
2020-01-01 13:30:00 139.3 102.4 149.9 Wednesday
2020-01-01 13:45:00 134.6 102.1 147.0 Wednesday
2020-01-01 14:00:00 128.8 101.0 143.3 Wednesday
2020-01-01 14:15:00 122.3 98.9 139.2 Wednesday
2020-01-01 14:30:00 115.5 96.3 135.2 Wednesday
2020-01-01 14:45:00 109.4 93.8 132.1 Wednesday
2020-01-01 15:00:00 104.6 91.9 130.6 Wednesday
2020-01-01 15:15:00 101.8 91.1 131.3 Wednesday
2020-01-01 15:30:00 100.5 91.2 133.5 Wednesday
2020-01-01 15:45:00 100.2 91.8 136.2 Wednesday
2020-01-01 16:00:00 100.4 92.5 138.5 Wednesday
2020-01-01 16:15:00 100.6 93.1 139.8 Wednesday
2020-01-01 16:30:00 101.0 93.4 140.3 Wednesday
2020-01-01 16:45:00 101.9 93.6 140.5 Wednesday
2020-01-01 17:00:00 103.4 93.7 140.9 Wednesday
2020-01-01 17:15:00 105.8 93.9 142.0 Wednesday
2020-01-01 17:30:00 108.7 94.3 143.7 Wednesday
2020-01-01 17:45:00 111.5 95.2 145.8 Wednesday
2020-01-01 18:00:00 113.7 96.8 148.2 Wednesday
2020-01-01 18:15:00 115.0 99.1 150.6 Wednesday
2020-01-01 18:30:00 115.7 102.2 152.5 Wednesday
2020-01-01 18:45:00 116.3 105.7 153.3 Wednesday
2020-01-01 19:00:00 117.3 109.5 152.4 Wednesday
2020-01-01 19:15:00 119.0 113.2 149.3 Wednesday
2020-01-01 19:30:00 120.6 116.3 144.4 Wednesday
2020-01-01 19:45:00 121.4 117.9 138.4 Wednesday
2020-01-01 20:00:00 120.4 117.3 131.8 Wednesday
2020-01-01 20:15:00 117.0 114.2 125.3 Wednesday
2020-01-01 20:30:00 112.1 109.4 119.3 Wednesday
2020-01-01 20:45:00 106.8 104.2 114.3 Wednesday
2020-01-01 21:00:00 102.2 99.8 110.7 Wednesday
2020-01-01 21:15:00 99.2 97.1 108.8 Wednesday
2020-01-01 21:30:00 97.4 95.9 108.1 Wednesday
2020-01-01 21:45:00 96.4 95.4 108.0 Wednesday
2020-01-01 22:00:00 95.6 95.0 107.7 Wednesday
2020-01-01 22:15:00 94.5 94.1 106.6 Wednesday
2020-01-01 22:30:00 93.3 92.8 104.9 Wednesday
2020-01-01 22:45:00 92.0 91.2 103.0 Wednesday
2020-01-01 23:00:00 90.7 89.5 101.0 Wednesday
2020-01-01 23:15:00 89.6 87.8 99.3 Wednesday
2020-01-01 23:30:00 88.6 86.4 97.8 Wednesday
2020-01-01 23:45:00 88.0 85.7 96.6 Wednesday
2020-01-02 00:00:00 87.7 85.9 95.6 Thursday
2020-01-02 00:15:00 95.3 87.8 94.7 Thursday
2020-01-02 00:30:00 95.5 88.3 94.1 Thursday
2020-01-02 00:45:00 96.2 89.0 94.1 Thursday
2020-01-02 01:00:00 97.4 90.1 95.0 Thursday
2020-01-02 01:15:00 98.9 91.3 96.6 Thursday
2020-01-02 01:30:00 100.3 92.4 98.4 Thursday
2020-01-02 01:45:00 101.0 92.9 99.8 Thursday
2020-01-02 02:00:00 100.4 92.5 99.8 Thursday
2020-01-02 02:15:00 98.2 91.0 98.0 Thursday
2020-01-02 02:30:00 95.1 88.7 95.1 Thursday
2020-01-02 02:45:00 91.9 86.4 91.9 Thursday
2020-01-02 03:00:00 89.5 84.7 89.5 Thursday
2020-01-02 03:15:00 88.6 84.0 88.4 Thursday
2020-01-02 03:30:00 88.6 84.0 88.3 Thursday
2020-01-02 03:45:00 88.7 84.0 88.3 Thursday
2020-01-02 04:00:00 88.3 83.5 87.7 Thursday
2020-01-02 04:15:00 86.8 82.1 86.1 Thursday
2020-01-02 04:30:00 85.1 80.6 84.3 Thursday
2020-01-02 04:45:00 84.2 80.1 83.5 Thursday
2020-01-02 05:00:00 85.3 81.6 84.7 Thursday
2020-01-02 05:15:00 89.0 85.9 88.5 Thursday
2020-01-02 05:30:00 94.1 91.6 94.0 Thursday
2020-01-02 05:45:00 99.3 97.0 99.5 Thursday
2020-01-02 06:00:00 102.8 100.4 103.4 Thursday
2020-01-02 06:15:00 103.7 100.7 104.7 Thursday
2020-01-02 06:30:00 102.6 98.8 104.0 Thursday
2020-01-02 06:45:00 100.7 96.2 102.4 Thursday
2020-01-02 07:00:00 99.2 94.3 101.0 Thursday
2020-01-02 07:15:00 99.1 94.4 100.8 Thursday
2020-01-02 07:30:00 100.8 95.7 102.1 Thursday
2020-01-02 07:45:00 104.4 97.6 105.3 Thursday
2020-01-02 08:00:00 110.1 99.2 110.7 Thursday
2020-01-02 08:15:00 117.7 99.7 118.2 Thursday
2020-01-02 08:30:00 126.1 99.6 126.7 Thursday
2020-01-02 08:45:00 133.9 99.2 134.7 Thursday
2020-01-02 09:00:00 139.7 99.2 140.9 Thursday
2020-01-02 09:15:00 142.4 99.8 144.2 Thursday
2020-01-02 09:30:00 142.9 100.9 145.4 Thursday
2020-01-02 09:45:00 142.4 102.1 145.5 Thursday
2020-01-02 10:00:00 142.1 102.8 145.8 Thursday
2020-01-02 10:15:00 142.9 102.9 147.0 Thursday
2020-01-02 10:30:00 144.5 102.5 149.0 Thursday
2020-01-02 10:45:00 146.3 101.8 151.2 Thursday
2020-01-02 11:00:00 147.6 101.0 153.0 Thursday
2020-01-02 11:15:00 147.9 100.4 154.0 Thursday
2020-01-02 11:30:00 147.5 100.0 154.3 Thursday
2020-01-02 11:45:00 146.8 99.8 154.3 Thursday
2020-01-02 12:00:00 146.4 99.8 154.2 Thursday
2020-01-02 12:15:00 146.3 100.0 154.3 Thursday
2020-01-02 12:30:00 146.5 100.5 154.5 Thursday
2020-01-02 12:45:00 146.2 101.0 154.3 Thursday
2020-01-02 13:00:00 145.1 101.6 153.6 Thursday
2020-01-02 13:15:00 142.8 102.2 152.2 Thursday
2020-01-02 13:30:00 139.3 102.4 149.9 Thursday
2020-01-02 13:45:00 134.6 102.1 147.0 Thursday
2020-01-02 14:00:00 128.8 101.0 143.3 Thursday
2020-01-02 14:15:00 122.3 98.9 139.2 Thursday
2020-01-02 14:30:00 115.5 96.3 135.2 Thursday
2020-01-02 14:45:00 109.4 93.8 132.1 Thursday
2020-01-02 15:00:00 104.6 91.9 130.6 Thursday
2020-01-02 15:15:00 101.8 91.1 131.3 Thursday
2020-01-02 15:30:00 100.5 91.2 133.5 Thursday
2020-01-02 15:45:00 100.2 91.8 136.2 Thursday
2020-01-02 16:00:00 100.4 92.5 138.5 Thursday
2020-01-02 16:15:00 100.6 93.1 139.8 Thursday
2020-01-02 16:30:00 101.0 93.4 140.3 Thursday
2020-01-02 16:45:00 101.9 93.6 140.5 Thursday
2020-01-02 17:00:00 103.4 93.7 140.9 Thursday
2020-01-02 17:15:00 105.8 93.9 142.0 Thursday
2020-01-02 17:30:00 108.7 94.3 143.7 Thursday
2020-01-02 17:45:00 111.5 95.2 145.8 Thursday
2020-01-02 18:00:00 113.7 96.8 148.2 Thursday
2020-01-02 18:15:00 115.0 99.1 150.6 Thursday
2020-01-02 18:30:00 115.7 102.2 152.5 Thursday
2020-01-02 18:45:00 116.3 105.7 153.3 Thursday
2020-01-02 19:00:00 117.3 109.5 152.4 Thursday
2020-01-02 19:15:00 119.0 113.2 149.3 Thursday
2020-01-02 19:30:00 120.6 116.3 144.4 Thursday
2020-01-02 19:45:00 121.4 117.9 138.4 Thursday
2020-01-02 20:00:00 120.4 117.3 131.8 Thursday
2020-01-02 20:15:00 117.0 114.2 125.3 Thursday
2020-01-02 20:30:00 112.1 109.4 119.3 Thursday
2020-01-02 20:45:00 106.8 104.2 114.3 Thursday
2020-01-02 21:00:00 102.2 99.8 110.7 Thursday
2020-01-02 21:15:00 99.2 97.1 108.8 Thursday
2020-01-02 21:30:00 97.4 95.9 108.1 Thursday
2020-01-02 21:45:00 96.4 95.4 108.0 Thursday
2020-01-02 22:00:00 95.6 95.0 107.7 Thursday
2020-01-02 22:15:00 94.5 94.1 106.6 Thursday
2020-01-02 22:30:00 93.3 92.8 104.9 Thursday
2020-01-02 22:45:00 92.0 91.2 103.0 Thursday
2020-01-02 23:00:00 90.7 89.5 101.0 Thursday
2020-01-02 23:15:00 89.6 87.8 99.3 Thursday
2020-01-02 23:30:00 88.6 86.4 97.8 Thursday
2020-01-02 23:45:00 88.0 85.7 96.6 Thursday
2020-01-03 00:00:00 87.7 85.9 95.6 Friday
2020-01-03 00:15:00 95.3 87.8 94.7 Friday
2020-01-03 00:30:00 95.5 88.3 94.1 Friday
2020-01-03 00:45:00 96.2 89.0 94.1 Friday
What I would like to do is change the value of day at 2020-01-02 00:00:00 from Thursday to Wednesday, similarly the value of day at 2020-01-03 00:00:00 from Friday to Thursday, and so on.
In other words: the value of day for the next day at 00:00:00 should be the same as the value of the previous day, and from 00:15:00 a new day should begin.
Expected output
dates Samstag Sonntag Werktag day
2020-01-01 00:15:00 95.3 87.8 94.7 Wednesday
2020-01-01 00:30:00 95.5 88.3 94.1 Wednesday
2020-01-01 00:45:00 96.2 89.0 94.1 Wednesday
2020-01-01 01:00:00 97.4 90.1 95.0 Wednesday
2020-01-01 01:15:00 98.9 91.3 96.6 Wednesday
2020-01-01 01:30:00 100.3 92.4 98.4 Wednesday
2020-01-01 01:45:00 101.0 92.9 99.8 Wednesday
2020-01-01 02:00:00 100.4 92.5 99.8 Wednesday
2020-01-01 02:15:00 98.2 91.0 98.0 Wednesday
2020-01-01 02:30:00 95.1 88.7 95.1 Wednesday
2020-01-01 02:45:00 91.9 86.4 91.9 Wednesday
2020-01-01 03:00:00 89.5 84.7 89.5 Wednesday
2020-01-01 03:15:00 88.6 84.0 88.4 Wednesday
2020-01-01 03:30:00 88.6 84.0 88.3 Wednesday
2020-01-01 03:45:00 88.7 84.0 88.3 Wednesday
2020-01-01 04:00:00 88.3 83.5 87.7 Wednesday
2020-01-01 04:15:00 86.8 82.1 86.1 Wednesday
2020-01-01 04:30:00 85.1 80.6 84.3 Wednesday
2020-01-01 04:45:00 84.2 80.1 83.5 Wednesday
2020-01-01 05:00:00 85.3 81.6 84.7 Wednesday
2020-01-01 05:15:00 89.0 85.9 88.5 Wednesday
2020-01-01 05:30:00 94.1 91.6 94.0 Wednesday
2020-01-01 05:45:00 99.3 97.0 99.5 Wednesday
2020-01-01 06:00:00 102.8 100.4 103.4 Wednesday
2020-01-01 06:15:00 103.7 100.7 104.7 Wednesday
2020-01-01 06:30:00 102.6 98.8 104.0 Wednesday
2020-01-01 06:45:00 100.7 96.2 102.4 Wednesday
2020-01-01 07:00:00 99.2 94.3 101.0 Wednesday
2020-01-01 07:15:00 99.1 94.4 100.8 Wednesday
2020-01-01 07:30:00 100.8 95.7 102.1 Wednesday
2020-01-01 07:45:00 104.4 97.6 105.3 Wednesday
2020-01-01 08:00:00 110.1 99.2 110.7 Wednesday
2020-01-01 08:15:00 117.7 99.7 118.2 Wednesday
2020-01-01 08:30:00 126.1 99.6 126.7 Wednesday
2020-01-01 08:45:00 133.9 99.2 134.7 Wednesday
2020-01-01 09:00:00 139.7 99.2 140.9 Wednesday
2020-01-01 09:15:00 142.4 99.8 144.2 Wednesday
2020-01-01 09:30:00 142.9 100.9 145.4 Wednesday
2020-01-01 09:45:00 142.4 102.1 145.5 Wednesday
2020-01-01 10:00:00 142.1 102.8 145.8 Wednesday
2020-01-01 10:15:00 142.9 102.9 147.0 Wednesday
2020-01-01 10:30:00 144.5 102.5 149.0 Wednesday
2020-01-01 10:45:00 146.3 101.8 151.2 Wednesday
2020-01-01 11:00:00 147.6 101.0 153.0 Wednesday
2020-01-01 11:15:00 147.9 100.4 154.0 Wednesday
2020-01-01 11:30:00 147.5 100.0 154.3 Wednesday
2020-01-01 11:45:00 146.8 99.8 154.3 Wednesday
2020-01-01 12:00:00 146.4 99.8 154.2 Wednesday
2020-01-01 12:15:00 146.3 100.0 154.3 Wednesday
2020-01-01 12:30:00 146.5 100.5 154.5 Wednesday
2020-01-01 12:45:00 146.2 101.0 154.3 Wednesday
2020-01-01 13:00:00 145.1 101.6 153.6 Wednesday
2020-01-01 13:15:00 142.8 102.2 152.2 Wednesday
2020-01-01 13:30:00 139.3 102.4 149.9 Wednesday
2020-01-01 13:45:00 134.6 102.1 147.0 Wednesday
2020-01-01 14:00:00 128.8 101.0 143.3 Wednesday
2020-01-01 14:15:00 122.3 98.9 139.2 Wednesday
2020-01-01 14:30:00 115.5 96.3 135.2 Wednesday
2020-01-01 14:45:00 109.4 93.8 132.1 Wednesday
2020-01-01 15:00:00 104.6 91.9 130.6 Wednesday
2020-01-01 15:15:00 101.8 91.1 131.3 Wednesday
2020-01-01 15:30:00 100.5 91.2 133.5 Wednesday
2020-01-01 15:45:00 100.2 91.8 136.2 Wednesday
2020-01-01 16:00:00 100.4 92.5 138.5 Wednesday
2020-01-01 16:15:00 100.6 93.1 139.8 Wednesday
2020-01-01 16:30:00 101.0 93.4 140.3 Wednesday
2020-01-01 16:45:00 101.9 93.6 140.5 Wednesday
2020-01-01 17:00:00 103.4 93.7 140.9 Wednesday
2020-01-01 17:15:00 105.8 93.9 142.0 Wednesday
2020-01-01 17:30:00 108.7 94.3 143.7 Wednesday
2020-01-01 17:45:00 111.5 95.2 145.8 Wednesday
2020-01-01 18:00:00 113.7 96.8 148.2 Wednesday
2020-01-01 18:15:00 115.0 99.1 150.6 Wednesday
2020-01-01 18:30:00 115.7 102.2 152.5 Wednesday
2020-01-01 18:45:00 116.3 105.7 153.3 Wednesday
2020-01-01 19:00:00 117.3 109.5 152.4 Wednesday
2020-01-01 19:15:00 119.0 113.2 149.3 Wednesday
2020-01-01 19:30:00 120.6 116.3 144.4 Wednesday
2020-01-01 19:45:00 121.4 117.9 138.4 Wednesday
2020-01-01 20:00:00 120.4 117.3 131.8 Wednesday
2020-01-01 20:15:00 117.0 114.2 125.3 Wednesday
2020-01-01 20:30:00 112.1 109.4 119.3 Wednesday
2020-01-01 20:45:00 106.8 104.2 114.3 Wednesday
2020-01-01 21:00:00 102.2 99.8 110.7 Wednesday
2020-01-01 21:15:00 99.2 97.1 108.8 Wednesday
2020-01-01 21:30:00 97.4 95.9 108.1 Wednesday
2020-01-01 21:45:00 96.4 95.4 108.0 Wednesday
2020-01-01 22:00:00 95.6 95.0 107.7 Wednesday
2020-01-01 22:15:00 94.5 94.1 106.6 Wednesday
2020-01-01 22:30:00 93.3 92.8 104.9 Wednesday
2020-01-01 22:45:00 92.0 91.2 103.0 Wednesday
2020-01-01 23:00:00 90.7 89.5 101.0 Wednesday
2020-01-01 23:15:00 89.6 87.8 99.3 Wednesday
2020-01-01 23:30:00 88.6 86.4 97.8 Wednesday
2020-01-01 23:45:00 88.0 85.7 96.6 Wednesday
2020-01-02 00:00:00 87.7 85.9 95.6 Wednesday
2020-01-02 00:15:00 95.3 87.8 94.7 Thursday
2020-01-02 00:30:00 95.5 88.3 94.1 Thursday
2020-01-02 00:45:00 96.2 89.0 94.1 Thursday
2020-01-02 01:00:00 97.4 90.1 95.0 Thursday
2020-01-02 01:15:00 98.9 91.3 96.6 Thursday
2020-01-02 01:30:00 100.3 92.4 98.4 Thursday
2020-01-02 01:45:00 101.0 92.9 99.8 Thursday
2020-01-02 02:00:00 100.4 92.5 99.8 Thursday
2020-01-02 02:15:00 98.2 91.0 98.0 Thursday
2020-01-02 02:30:00 95.1 88.7 95.1 Thursday
2020-01-02 02:45:00 91.9 86.4 91.9 Thursday
2020-01-02 03:00:00 89.5 84.7 89.5 Thursday
2020-01-02 03:15:00 88.6 84.0 88.4 Thursday
2020-01-02 03:30:00 88.6 84.0 88.3 Thursday
2020-01-02 03:45:00 88.7 84.0 88.3 Thursday
2020-01-02 04:00:00 88.3 83.5 87.7 Thursday
2020-01-02 04:15:00 86.8 82.1 86.1 Thursday
2020-01-02 04:30:00 85.1 80.6 84.3 Thursday
2020-01-02 04:45:00 84.2 80.1 83.5 Thursday
2020-01-02 05:00:00 85.3 81.6 84.7 Thursday
2020-01-02 05:15:00 89.0 85.9 88.5 Thursday
2020-01-02 05:30:00 94.1 91.6 94.0 Thursday
2020-01-02 05:45:00 99.3 97.0 99.5 Thursday
2020-01-02 06:00:00 102.8 100.4 103.4 Thursday
2020-01-02 06:15:00 103.7 100.7 104.7 Thursday
2020-01-02 06:30:00 102.6 98.8 104.0 Thursday
2020-01-02 06:45:00 100.7 96.2 102.4 Thursday
2020-01-02 07:00:00 99.2 94.3 101.0 Thursday
2020-01-02 07:15:00 99.1 94.4 100.8 Thursday
2020-01-02 07:30:00 100.8 95.7 102.1 Thursday
2020-01-02 07:45:00 104.4 97.6 105.3 Thursday
2020-01-02 08:00:00 110.1 99.2 110.7 Thursday
2020-01-02 08:15:00 117.7 99.7 118.2 Thursday
2020-01-02 08:30:00 126.1 99.6 126.7 Thursday
2020-01-02 08:45:00 133.9 99.2 134.7 Thursday
2020-01-02 09:00:00 139.7 99.2 140.9 Thursday
2020-01-02 09:15:00 142.4 99.8 144.2 Thursday
2020-01-02 09:30:00 142.9 100.9 145.4 Thursday
2020-01-02 09:45:00 142.4 102.1 145.5 Thursday
2020-01-02 10:00:00 142.1 102.8 145.8 Thursday
2020-01-02 10:15:00 142.9 102.9 147.0 Thursday
2020-01-02 10:30:00 144.5 102.5 149.0 Thursday
2020-01-02 10:45:00 146.3 101.8 151.2 Thursday
2020-01-02 11:00:00 147.6 101.0 153.0 Thursday
2020-01-02 11:15:00 147.9 100.4 154.0 Thursday
2020-01-02 11:30:00 147.5 100.0 154.3 Thursday
2020-01-02 11:45:00 146.8 99.8 154.3 Thursday
2020-01-02 12:00:00 146.4 99.8 154.2 Thursday
2020-01-02 12:15:00 146.3 100.0 154.3 Thursday
2020-01-02 12:30:00 146.5 100.5 154.5 Thursday
2020-01-02 12:45:00 146.2 101.0 154.3 Thursday
2020-01-02 13:00:00 145.1 101.6 153.6 Thursday
2020-01-02 13:15:00 142.8 102.2 152.2 Thursday
2020-01-02 13:30:00 139.3 102.4 149.9 Thursday
2020-01-02 13:45:00 134.6 102.1 147.0 Thursday
2020-01-02 14:00:00 128.8 101.0 143.3 Thursday
2020-01-02 14:15:00 122.3 98.9 139.2 Thursday
2020-01-02 14:30:00 115.5 96.3 135.2 Thursday
2020-01-02 14:45:00 109.4 93.8 132.1 Thursday
2020-01-02 15:00:00 104.6 91.9 130.6 Thursday
2020-01-02 15:15:00 101.8 91.1 131.3 Thursday
2020-01-02 15:30:00 100.5 91.2 133.5 Thursday
2020-01-02 15:45:00 100.2 91.8 136.2 Thursday
2020-01-02 16:00:00 100.4 92.5 138.5 Thursday
2020-01-02 16:15:00 100.6 93.1 139.8 Thursday
2020-01-02 16:30:00 101.0 93.4 140.3 Thursday
2020-01-02 16:45:00 101.9 93.6 140.5 Thursday
2020-01-02 17:00:00 103.4 93.7 140.9 Thursday
2020-01-02 17:15:00 105.8 93.9 142.0 Thursday
2020-01-02 17:30:00 108.7 94.3 143.7 Thursday
2020-01-02 17:45:00 111.5 95.2 145.8 Thursday
2020-01-02 18:00:00 113.7 96.8 148.2 Thursday
2020-01-02 18:15:00 115.0 99.1 150.6 Thursday
2020-01-02 18:30:00 115.7 102.2 152.5 Thursday
2020-01-02 18:45:00 116.3 105.7 153.3 Thursday
2020-01-02 19:00:00 117.3 109.5 152.4 Thursday
2020-01-02 19:15:00 119.0 113.2 149.3 Thursday
2020-01-02 19:30:00 120.6 116.3 144.4 Thursday
2020-01-02 19:45:00 121.4 117.9 138.4 Thursday
2020-01-02 20:00:00 120.4 117.3 131.8 Thursday
2020-01-02 20:15:00 117.0 114.2 125.3 Thursday
2020-01-02 20:30:00 112.1 109.4 119.3 Thursday
2020-01-02 20:45:00 106.8 104.2 114.3 Thursday
2020-01-02 21:00:00 102.2 99.8 110.7 Thursday
2020-01-02 21:15:00 99.2 97.1 108.8 Thursday
2020-01-02 21:30:00 97.4 95.9 108.1 Thursday
2020-01-02 21:45:00 96.4 95.4 108.0 Thursday
2020-01-02 22:00:00 95.6 95.0 107.7 Thursday
2020-01-02 22:15:00 94.5 94.1 106.6 Thursday
2020-01-02 22:30:00 93.3 92.8 104.9 Thursday
2020-01-02 22:45:00 92.0 91.2 103.0 Thursday
2020-01-02 23:00:00 90.7 89.5 101.0 Thursday
2020-01-02 23:15:00 89.6 87.8 99.3 Thursday
2020-01-02 23:30:00 88.6 86.4 97.8 Thursday
2020-01-02 23:45:00 88.0 85.7 96.6 Thursday
2020-01-03 00:00:00 87.7 85.9 95.6 Thursday
2020-01-03 00:15:00 95.3 87.8 94.7 Friday
2020-01-03 00:30:00 95.5 88.3 94.1 Friday
2020-01-03 00:45:00 96.2 89.0 94.1 Friday
How can this be done?
Edit 1
import pandas as pd

df = pd.DataFrame({'dates': ['2020-01-01 22:15:00',
                             '2020-01-01 22:35:00',
                             '2020-01-01 22:45:00',
                             '2020-01-01 23:00:00',
                             '2020-01-01 23:15:00',
                             '2020-01-01 23:30:00',
                             '2020-01-01 23:45:00',
                             '2020-01-02 00:00:00',
                             '2020-01-02 22:15:00',
                             '2020-01-02 22:35:00',
                             '2020-01-02 22:45:00',
                             '2020-01-02 23:00:00',
                             '2020-01-02 23:15:00',
                             '2020-01-02 23:30:00',
                             '2020-01-02 23:45:00',
                             '2020-01-03 00:00:00'],
                   'expected_output': ['Wednesday',
                                       'Wednesday',
                                       'Wednesday',
                                       'Wednesday',
                                       'Wednesday',
                                       'Wednesday',
                                       'Wednesday',
                                       'Wednesday',
                                       'Thursday',
                                       'Thursday',
                                       'Thursday',
                                       'Thursday',
                                       'Thursday', 'Thursday', 'Thursday', 'Thursday']})
Just check the minutes of the Timestamp using apply.
# df = pd.DataFrame({'dates': ['2020-01-01 22:15:00', .....]})
# convert str date into Timestamp
df['dates'] = pd.to_datetime(df['dates'])

def calculate_day(x):
    # get previous day
    if x.hour == 0 and x.minute < 15:
        return (x - pd.DateOffset(days=1)).day_name()
    return x.day_name()

df['day'] = df['dates'].apply(calculate_day)
print(df)
# dates day
#0 2020-01-01 22:15:00 Wednesday
#...
JFYI: weekday_name is deprecated. Use day_name().
Hope this helps.
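A vectorized alternative (a sketch): shifting every timestamp back by 15 minutes moves exactly the rows with times earlier than 00:15 onto the previous date, which is the same rule as calculate_day, without a Python-level loop.
# Times 00:00-00:14 land on the previous day after the shift;
# everything from 00:15 onward keeps its calendar day.
df['day'] = (df['dates'] - pd.Timedelta(minutes=15)).dt.day_name()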
