Python: Grouping by time interval

I'm using Python 3.6.5, and my dataframe uses datetime.time objects for the index. It looks like this:
print(sum_by_time)
Trips
Time
00:00:00 10
01:00:00 10
02:00:00 10
03:00:00 10
04:00:00 20
05:00:00 20
06:00:00 20
07:00:00 20
08:00:00 30
09:00:00 30
10:00:00 30
11:00:00 30
How can I group this dataframe by time interval to get something like this:
Trips
Time
00:00:00 - 03:00:00 40
04:00:00 - 07:00:00 80
08:00:00 - 11:00:00 120

I think you need to convert the index values to timedeltas with to_timedelta and then resample:
df.index = pd.to_timedelta(df.index.astype(str))
df = df.resample('4H').sum()
print (df)
Trips
00:00:00 40
04:00:00 80
08:00:00 120
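EDIT:
To get the interval labels from your expected output, group on a datetime column and build each label from the first and last time in the group: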
df['d'] = pd.to_datetime(df.index.astype(str))
df = df.groupby(pd.Grouper(freq='4H', key='d')).agg({'Trips':'sum', 'd':['first','last']})
df.columns = df.columns.map('_'.join)
df = df.set_index(df['d_first'].dt.strftime('%H:%M:%S') + ' - ' + df['d_last'].dt.strftime('%H:%M:%S'))[['Trips_sum']]
print (df)
Trips_sum
00:00:00 - 03:00:00 40
04:00:00 - 07:00:00 80
08:00:00 - 11:00:00 120
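For reference, here is a self-contained sketch of the timedelta approach that also builds the interval labels. The frame is rebuilt from the sample values in the question; the names as_delta, binned and fmt are only illustrative, and the end of each label is taken as "bin start plus three hours", which assumes the rows are hourly. A pd.Timedelta is passed to resample instead of the '4H' string, because recent pandas releases warn about the uppercase hour alias.

```python
import datetime as dt

import pandas as pd

# Rebuild the sample frame from the question: hourly datetime.time index.
times = [dt.time(hour) for hour in range(12)]
trips = [10] * 4 + [20] * 4 + [30] * 4
sum_by_time = pd.DataFrame({"Trips": trips}, index=pd.Index(times, name="Time"))

# datetime.time values cannot be resampled directly, so convert them to timedeltas.
as_delta = sum_by_time.copy()
as_delta.index = pd.to_timedelta([t.isoformat() for t in sum_by_time.index])

# Sum the trips in 4-hour bins.
binned = as_delta.resample(pd.Timedelta(hours=4)).sum()


def fmt(td):
    """Format a Timedelta shorter than one day as HH:MM:SS."""
    total = int(td.total_seconds())
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


# Label each bin "start - last hour in the bin" (assumes hourly rows).
three_hours = pd.Timedelta(hours=3)
binned.index = [f"{fmt(start)} - {fmt(start + three_hours)}" for start in binned.index]
binned.index.name = "Time"
print(binned)
```

If the rows were not hourly, the label end would have to come from the data itself (for example the last time in each group), as in the groupby variant.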

Related

How to get odd hours to even hours in pandas dataframe?

I have a dataframe with "normal" steps of two hours between the timestamps, but unfortunately there are sometimes gaps in my data. Because of that, I would like to round timestamps with odd hours (01:00, 03:00 etc.) to even hours (02:00, 04:00 etc.). Time is my index column.
My dataframe looks like this:
Time Values
2021-10-24 22:00:00 2
2021-10-25 00:00:00 4
2021-10-25 02:00:00 78
2021-10-25 05:00:00 90
2021-10-25 07:00:00 1
How can I get a dataframe like this?
Time Values
2021-10-24 22:00:00 2
2021-10-25 00:00:00 4
2021-10-25 02:00:00 78
2021-10-25 06:00:00 90
2021-10-25 08:00:00 1
Use DatetimeIndex.floor or DatetimeIndex.ceil with the frequency string 2H, depending on whether you want to round down or up:
df.index = df.index.ceil('2H')
>>> df
Values
Time
2021-10-24 22:00:00 2
2021-10-25 00:00:00 4
2021-10-25 02:00:00 78
2021-10-25 06:00:00 90
2021-10-25 08:00:00 1
If "Time" is a column (and not the index), you can use dt.ceil:
df["Time"] = df["Time"].dt.ceil("2H")
>>> df
Time Values
0 2021-10-24 22:00:00 2
1 2021-10-25 00:00:00 4
2 2021-10-25 02:00:00 78
3 2021-10-25 06:00:00 90
4 2021-10-25 08:00:00 1
Alternatively, if you want to ensure that the data contains every 2-hour interval, you could resample:
df = df.resample("2H", on="Time", closed="right", label="right").sum()
>>> df
Values
Time
2021-10-24 22:00:00 2
2021-10-25 00:00:00 4
2021-10-25 02:00:00 78
2021-10-25 04:00:00 0
2021-10-25 06:00:00 90
2021-10-25 08:00:00 1
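To see how the two rounding directions differ, here is a small sketch on the timestamps from the question. The variable names are illustrative, and a pd.Timedelta is passed instead of the "2H" string because recent pandas releases warn about the uppercase hour alias.

```python
import pandas as pd

# Timestamps from the question, including two odd hours (05:00 and 07:00).
idx = pd.to_datetime([
    "2021-10-24 22:00", "2021-10-25 00:00", "2021-10-25 02:00",
    "2021-10-25 05:00", "2021-10-25 07:00",
])
df = pd.DataFrame({"Values": [2, 4, 78, 90, 1]}, index=idx)

two_hours = pd.Timedelta(hours=2)
floored = df.index.floor(two_hours)  # odd hours move back: 05:00 -> 04:00
ceiled = df.index.ceil(two_hours)    # odd hours move forward: 05:00 -> 06:00

print(list(floored.hour))
print(list(ceiled.hour))
```

Timestamps that already sit on an even hour are left unchanged by both methods, so only the rows inside a gap move.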

datetime difference between dates

I have a df like so:
firstdate seconddate
0 2011-01-01 13:00:00 2011-01-01 13:00:00
1 2011-01-02 14:00:00 2011-01-01 11:00:00
2 2011-01-02 16:00:00 2011-01-02 13:00:00
3 2011-01-04 12:00:00 2011-01-03 15:00:00
...
seconddate is always before firstdate. I want to compute the difference between firstdate and seconddate in number of days and store it in a new column. If firstdate and seconddate fall on the same day, the difference is 0; if seconddate is the day before firstdate, the difference is 1; and so on, up to a week. How would I do this?
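df['firstdate'] = pd.to_datetime(df['firstdate'])
df['seconddate'] = pd.to_datetime(df['seconddate'])
# normalize() drops the time of day, so the result counts calendar days
df['diff'] = (df['firstdate'].dt.normalize() - df['seconddate'].dt.normalize()).dt.days
This adds a column with the difference in days. You can then filter rows based on it. Note that df.diff is a DataFrame method, so the column has to be accessed as df['diff']:
df = df.drop(df[df['diff'] < 0].index)
# or
df = df[df['diff'] >= 0]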
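The rule in the question counts calendar days, which is not the same as whole elapsed 24-hour periods. A short sketch on the sample rows shows the difference; the column names elapsed_days and calendar_days are made up for this illustration.

```python
import pandas as pd

# The four sample rows from the question.
df = pd.DataFrame({
    "firstdate": pd.to_datetime([
        "2011-01-01 13:00", "2011-01-02 14:00",
        "2011-01-02 16:00", "2011-01-04 12:00",
    ]),
    "seconddate": pd.to_datetime([
        "2011-01-01 13:00", "2011-01-01 11:00",
        "2011-01-02 13:00", "2011-01-03 15:00",
    ]),
})

# Whole elapsed days: timestamps 21 hours apart count as 0.
df["elapsed_days"] = (df["firstdate"] - df["seconddate"]).dt.days

# Calendar days: strip the time of day before subtracting.
df["calendar_days"] = (
    df["firstdate"].dt.normalize() - df["seconddate"].dt.normalize()
).dt.days

print(df)
```

The last row (2011-01-04 12:00 and 2011-01-03 15:00) is the one where the two columns disagree: fewer than 24 hours have passed, but seconddate is on the previous day.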

How to use pandas Grouper to get sum of values within each hour

I have the following table:
Hora_Retiro count_uses
0 00:00:18 1
1 00:00:34 1
2 00:02:27 1
3 00:03:13 1
4 00:06:45 1
... ... ...
748700 23:58:47 1
748701 23:58:49 1
748702 23:59:11 1
748703 23:59:47 1
748704 23:59:56 1
And I want to group all values within each hour, so I can see the total number of uses per hour (00:00:00 - 23:00:00)
I have the following code:
hora_pico_aug= hora_pico.groupby(pd.Grouper(key="Hora_Retiro",freq='H')).count()
The Hora_Retiro column is of type timedelta64[ns].
The code gives the following output:
count_uses
Hora_Retiro
00:00:02 2566
01:00:02 602
02:00:02 295
03:00:02 5
04:00:02 10
05:00:02 4002
06:00:02 16075
07:00:02 39410
08:00:02 76272
09:00:02 56721
10:00:02 36036
11:00:02 32011
12:00:02 33725
13:00:02 41032
14:00:02 50747
15:00:02 50338
16:00:02 42347
17:00:02 54674
18:00:02 76056
19:00:02 57958
20:00:02 34286
21:00:02 22509
22:00:02 13894
23:00:02 7134
However, the index starts at 00:00:02, and I want it to start at 00:00:00 and then continue in one-hour intervals, something like this:
count_uses
Hora_Retiro
00:00:00 2565
01:00:00 603
02:00:00 295
03:00:00 5
04:00:00 10
05:00:00 4002
06:00:00 16075
07:00:00 39410
08:00:00 76272
09:00:00 56721
10:00:00 36036
11:00:00 32011
12:00:00 33725
13:00:00 41032
14:00:00 50747
15:00:00 50338
16:00:00 42347
17:00:00 54674
18:00:00 76056
19:00:00 57958
20:00:00 34286
21:00:00 22509
22:00:00 13894
23:00:00 7134
How can I make it start at 00:00:00?
Thanks for the help!
You can create an hour column from the Hora_Retiro column. Note that .dt.hour only exists for datetime columns; for a timedelta64[ns] column, use .dt.components.hours instead:
df['hour'] = df['Hora_Retiro'].dt.components.hours
Then group by that hour column:
gpby_df = df.groupby('hour')['count_uses'].sum().reset_index()
gpby_df['hour'] = pd.to_datetime(gpby_df['hour'], format='%H').dt.time
gpby_df.columns = ['Hora_Retiro', 'sum_count_uses']
gpby_df
On a small test sample this gives:
Hora_Retiro sum_count_uses
0 00:00:00 14
1 09:00:00 1
2 10:00:00 2
3 20:00:00 2
I assume that the Hora_Retiro column in your DataFrame is of Timedelta type. It is not datetime, as in that case the date part would also be printed.
Indeed, your code creates groups starting at the minute / second taken from the first row.
To group by "full hours":
floor each element in this column to the hour (rounding to the nearest hour would move times after the half hour into the next hour),
then group (just by this floored value).
The code to do it is:
hora_pico.groupby(hora_pico.Hora_Retiro.dt.floor('H')).count_uses.count()
However, I advise you to decide what you want to count: rows, or the values in the count_uses column.
In the second case, replace the count function with sum.
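A minimal sketch of grouping a timedelta column into bins that start on the full hour, using a small made-up sample rather than the real table. It uses integer division by one hour instead of a frequency alias, and the names one_hour, hour_start and per_hour are illustrative.

```python
import pandas as pd

# Small made-up sample; the real table has one row per use.
hora_pico = pd.DataFrame({
    "Hora_Retiro": pd.to_timedelta(
        ["00:00:18", "00:45:00", "01:10:00", "01:59:59", "23:59:56"]
    ),
    "count_uses": [1, 1, 1, 1, 1],
})

one_hour = pd.Timedelta(hours=1)

# Integer division gives the hour number; multiplying back gives the bin start.
hour_start = (hora_pico["Hora_Retiro"] // one_hour) * one_hour

# Sum the uses per full hour.
per_hour = hora_pico.groupby(hour_start)["count_uses"].sum()
print(per_hour)
```

With this flooring logic 00:45:00 stays in the 00:00:00 bin, whereas rounding to the nearest hour would put it in the 01:00:00 bin. Only hours that occur in the data appear in the result.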

How to iterate over hours of a given day in python?

I have the following time series data of temperature readings:
DT Temperature
01/01/2019 0:00 41
01/01/2019 1:00 42
01/01/2019 2:00 44
......
01/01/2019 23:00 41
01/02/2019 0:00 44
I am trying to write a function that compares the hourly change in temperature for a given day. Any change greater than 3 will increment quickChange counter. Something like this:
def countChange(day):
    for dt in day:
        if dt+1 - dt > 3: quickChange = quickChange+1
I can call the function for a given day, e.g. countChange(df.loc['2018-01-01'])
Use Series.diff, compare the result with 3, and count the True values with sum:
np.random.seed(2019)
rng = (pd.date_range('2018-01-01', periods=10, freq='H').tolist() +
pd.date_range('2018-01-02', periods=10, freq='H').tolist())
df = pd.DataFrame({'Temperature': np.random.randint(100, size=20)}, index=rng)
print (df)
Temperature
2018-01-01 00:00:00 72
2018-01-01 01:00:00 31
2018-01-01 02:00:00 37
2018-01-01 03:00:00 88
2018-01-01 04:00:00 62
2018-01-01 05:00:00 24
2018-01-01 06:00:00 29
2018-01-01 07:00:00 15
2018-01-01 08:00:00 12
2018-01-01 09:00:00 16
2018-01-02 00:00:00 48
2018-01-02 01:00:00 71
2018-01-02 02:00:00 83
2018-01-02 03:00:00 12
2018-01-02 04:00:00 80
2018-01-02 05:00:00 50
2018-01-02 06:00:00 95
2018-01-02 07:00:00 5
2018-01-02 08:00:00 24
2018-01-02 09:00:00 28
# if DT is a column in your data, convert it and set it as the DatetimeIndex first:
# df['DT'] = pd.to_datetime(df['DT'])
# df = df.set_index('DT')
def countChange(day):
    return (day['Temperature'].diff() > 3).sum()
print (countChange(df.loc['2018-01-01']))
4
print (countChange(df.loc['2018-01-02']))
6
Try pandas.Series.diff:
df = pd.DataFrame({'dt': ["01/01/2019 0:00","01/01/2019 1:00","01/01/2019 2:00","01/01/2019 23:00","01/02/2019 0:00"],
                   'Temperature': [41, 42, 44, 41, 44]})
# parse the strings so that rows can be selected by date
df["dt"] = pd.to_datetime(df["dt"], format="%m/%d/%Y %H:%M")
df = df.sort_values("dt")
df = df.set_index("dt")
def countChange(df):
    # count the hour-to-hour rises greater than 3
    diff = df["Temperature"].diff()
    return (diff > 3).sum()
quickchange = countChange(df.loc["2019-01-01"])
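If the count is needed for every day at once, the same diff logic can be applied per calendar day with a groupby. This is only a sketch on made-up readings; count_change, threshold and per_day are illustrative names.

```python
import pandas as pd

# Made-up hourly readings for two days (values are illustrative only).
hourly = pd.Timedelta(hours=1)
index = pd.date_range("2019-01-01", periods=6, freq=hourly).append(
    pd.date_range("2019-01-02", periods=6, freq=hourly)
)
df = pd.DataFrame(
    {"Temperature": [41, 42, 46, 44, 49, 41, 44, 44, 50, 51, 47, 49]},
    index=index,
)


def count_change(day, threshold=3):
    """Count hour-to-hour rises greater than `threshold` in one day's rows."""
    return int((day["Temperature"].diff() > threshold).sum())


# One day at a time, selected by date string.
print(count_change(df.loc["2019-01-01"]))

# All days at once: take the diff within each calendar day, then count per day.
days = df.index.normalize()
diffs = df["Temperature"].groupby(days).diff()
per_day = (diffs > 3).groupby(days).sum()
print(per_day)
```

Both versions count only increases. If drops of more than 3 should count as well, take the absolute value of the diff before comparing.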

How to add a timedelta to each time separately in one column using Python

I have a dataset with an input column, a date and a time. I want to set the time to 00:00:00 for rows where the input column contains a specific value, and leave the other times as they are, so I wrote code for that. Next, I want to select only the rows whose time is 00:00:00, so I wrote code for that as well.
Here is my code:
data['time_diff']= pd.to_datetime(data['date'] + " " + data['time'],
format='%d/%m/%Y %H:%M:%S', dayfirst=True)
data['duration'] = np.where(data['X3'].eq(5), np.timedelta64(0), pd.to_timedelta(data['time']))
print (data['duration'].dtype)
def f(x):
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return ('{:02d}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))
data['duration'] = data['duration'].apply(f)
match_time="00:00:00"
T = data.loc[data['duration'] == match_time, 'duration']
Then I got the output.
What I want to do next is add 1 to 6 hours to each of these times. I wrote the code below for it, but it just gave me 0 values instead of separate times.
My code:
def time (y):
    S=[]
    row=0
    for row in range(len(T)):
        y = "00:00:00"
        while row >0:
            S = np.array(y + np.timedelta(hours=i) for i in range(6))
            row += 1
            break
        else:
            continue
        #break
    return
A= T.apply(time)
print(A)
The output I got was not what I wanted. What I expected is:
T          expected output (T plus 1 hr up to 6 hrs)
00:00:00   01:00:00
           02:00:00
           03:00:00
           04:00:00
           05:00:00
           06:00:00
00:00:00   01:00:00
           02:00:00
           03:00:00
           04:00:00
           05:00:00
           06:00:00
00:00:00   01:00:00
           02:00:00
           03:00:00
           04:00:00
           05:00:00
           06:00:00
Maybe this is what you had in mind.
My test data frame:
import datetime as dt
import numpy as np
import pandas as pd
T = pd.DataFrame({"T": ["00:00:00" for i in range(3)]}, index=np.random.randint(0, 100, 3))
T
8 00:00:00
96 00:00:00
44 00:00:00
tims=[ dt.time(i).strftime("%H:%M:%S") for i in range(1,7)]
['01:00:00', '02:00:00', '03:00:00', '04:00:00', '05:00:00', '06:00:00']
dd=T.apply(lambda r: pd.Series({"T":"00:00:00", "Hours":tims}), axis=1)
T Hours
8 00:00:00 [01:00:00, 02:00:00, 03:00:00, 04:00:00, 05:00...
96 00:00:00 [01:00:00, 02:00:00, 03:00:00, 04:00:00, 05:00...
44 00:00:00 [01:00:00, 02:00:00, 03:00:00, 04:00:00, 05:00...
dd.explode("Hours")
T Hours
8 00:00:00 01:00:00
8 00:00:00 02:00:00
8 00:00:00 03:00:00
8 00:00:00 04:00:00
8 00:00:00 05:00:00
8 00:00:00 06:00:00
44 00:00:00 01:00:00
44 00:00:00 02:00:00
44 00:00:00 03:00:00
44 00:00:00 04:00:00
44 00:00:00 05:00:00
44 00:00:00 06:00:00
96 00:00:00 01:00:00
96 00:00:00 02:00:00
96 00:00:00 03:00:00
96 00:00:00 04:00:00
96 00:00:00 05:00:00
96 00:00:00 06:00:00
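If the times should stay real timedeltas, so that the offsets are actually added and not just written as strings, a variant along these lines may help. It is only a sketch: the frame is made up with the same arbitrary index labels, and offsets and expanded are illustrative names.

```python
import pandas as pd

# Three made-up rows that all hold the matched time 00:00:00.
T = pd.DataFrame({"T": pd.to_timedelta(["00:00:00"] * 3)}, index=[8, 96, 44])

# Offsets of 1 to 6 hours, kept as timedeltas so they can be added.
offsets = [pd.Timedelta(hours=h) for h in range(1, 7)]

# Repeat every row once per offset, keeping the original index labels.
expanded = T.loc[T.index.repeat(len(offsets))].copy()

# Add the offsets row by row: 1h..6h for the first label, then again for the next.
hours = pd.to_timedelta(offsets * len(T))
expanded["Hours"] = expanded["T"].to_numpy() + hours.to_numpy()

print(expanded)
```

Because the addition is done on the values, this also works when the base times in T are not all 00:00:00.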
