How to get odd hours to even hours in pandas dataframe? - python

I have a dataframe whose timestamps are normally two hours apart, but sometimes there are gaps in the data. Because of that, I would like to round timestamps with odd hours (01:00, 03:00, etc.) up to even hours (02:00, 04:00, etc.). Time is my index column.
My dataframe looks like this:
Time Values
2021-10-24 22:00:00 2
2021-10-25 00:00:00 4
2021-10-25 02:00:00 78
2021-10-25 05:00:00 90
2021-10-25 07:00:00 1
How can I get a dataframe like this?
Time Values
2021-10-24 22:00:00 2
2021-10-25 00:00:00 4
2021-10-25 02:00:00 78
2021-10-25 06:00:00 90
2021-10-25 08:00:00 1

Use DatetimeIndex.floor or DatetimeIndex.ceil with the frequency string 2H, depending on whether you want to round down or up. (pandas 2.2 and later prefer the lowercase alias 2h.)
df.index = df.index.ceil('2H')
>>> df
Values
Time
2021-10-24 22:00:00 2
2021-10-25 00:00:00 4
2021-10-25 02:00:00 78
2021-10-25 06:00:00 90
2021-10-25 08:00:00 1
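As a minimal, self-contained sketch of the difference between the two methods, the snippet below rebuilds the sample data from the question and rounds the index both ways. The variable names rounded_up and rounded_down are only illustrative, and the lowercase alias "2h" is used on the assumption that it is accepted by the installed pandas version.

```python
import pandas as pd

# Sample data from the question, with "Time" as the index.
df = pd.DataFrame(
    {"Values": [2, 4, 78, 90, 1]},
    index=pd.DatetimeIndex(
        [
            "2021-10-24 22:00",
            "2021-10-25 00:00",
            "2021-10-25 02:00",
            "2021-10-25 05:00",
            "2021-10-25 07:00",
        ],
        name="Time",
    ),
)

# ceil rounds up to the next even hour; even hours are left unchanged.
rounded_up = df.index.ceil("2h")

# floor rounds down to the previous even hour instead.
rounded_down = df.index.floor("2h")

print(rounded_up)
print(rounded_down)
```

Note that if two neighbouring timestamps round to the same slot (for example 05:00 and 06:00 with ceil), the resulting index contains duplicates, so it may be worth checking df.index.is_unique afterwards.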

If "Time" is a column (and not the index), you can use dt.ceil:
df["Time"] = df["Time"].dt.ceil("2H")
>>> df
Time Values
0 2021-10-24 22:00:00 2
1 2021-10-25 00:00:00 4
2 2021-10-25 02:00:00 78
3 2021-10-25 06:00:00 90
4 2021-10-25 08:00:00 1
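Alternatively, if you want to ensure that the data contains every 2-hour interval, you could resample. Both closed and label should be "right" so that 05:00 falls into the (04:00, 06:00] bin and that bin is labelled 06:00:
df = df.resample("2H", on="Time", closed="right", label="right").sum()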
>>> df
Values
Time
2021-10-24 22:00:00 2
2021-10-25 00:00:00 4
2021-10-25 02:00:00 78
2021-10-25 04:00:00 0
2021-10-25 06:00:00 90
2021-10-25 08:00:00 1
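The resampling idea can be sketched end to end as follows. This assumes "Time" is an ordinary column and that bins should be closed and labelled on their right edge, so that an odd hour lands in the next even slot; the name filled is only illustrative.

```python
import pandas as pd

# Sample data from the question, with "Time" as a regular column.
df = pd.DataFrame(
    {
        "Time": pd.to_datetime(
            [
                "2021-10-24 22:00",
                "2021-10-25 00:00",
                "2021-10-25 02:00",
                "2021-10-25 05:00",
                "2021-10-25 07:00",
            ]
        ),
        "Values": [2, 4, 78, 90, 1],
    }
)

# Bins are (start, end]; each bin is labelled with its right edge,
# so 05:00 is counted under 06:00 and 07:00 under 08:00.
filled = df.resample("2h", on="Time", closed="right", label="right").sum()

print(filled)
```

The empty 04:00 slot shows up with a sum of 0. If a missing slot should stay empty rather than become 0, an aggregation such as first() or mean() would leave NaN there instead.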

Related

How to create 2-3 hrs from multiple columns

I have a df with multiple time series that looks like this:
df_observ
id     Date       start time  Action 1 time  end of action2  end Time  Observ1 time  observ2 time  observ1 value  observ2 value
indv1  3-2017     00:00:00    02:40:00       04:25:00        04:38:00  00:05:00      01:45:00      57             111
indv2  11-6-2019  00:00:00    00:46:00       02:16:00        02:40:00  01:01:00      02:37:00      68             113
indv2  13-4-2017  00:00:00    02:22:00       04:25:00        04:38:00  00:05:00      03:10:06      82             125
indv3  23-5-2022  00:00:00    01:34:00       02:22:00        03:34:00  02:24:00      03:25:00      67             101
indv4  8-11-2021  00:00:00    00:05:00       03:16:00        03:52:14  01:01:00      02:11:00      63             108
All time columns are measured relative to the start time. Is there a way to plot how the observation values change at the different time points?
Thanks!

How to use pandas Grouper to get sum of values within each hour

I have the following table:
Hora_Retiro count_uses
0 00:00:18 1
1 00:00:34 1
2 00:02:27 1
3 00:03:13 1
4 00:06:45 1
... ... ...
748700 23:58:47 1
748701 23:58:49 1
748702 23:59:11 1
748703 23:59:47 1
748704 23:59:56 1
And I want to group all values within each hour, so I can see the total number of uses per hour (00:00:00 - 23:00:00)
I have the following code:
hora_pico_aug= hora_pico.groupby(pd.Grouper(key="Hora_Retiro",freq='H')).count()
Hora_Retiro column is of timedelta64[ns] type
Which gives the following output:
count_uses
Hora_Retiro
00:00:02 2566
01:00:02 602
02:00:02 295
03:00:02 5
04:00:02 10
05:00:02 4002
06:00:02 16075
07:00:02 39410
08:00:02 76272
09:00:02 56721
10:00:02 36036
11:00:02 32011
12:00:02 33725
13:00:02 41032
14:00:02 50747
15:00:02 50338
16:00:02 42347
17:00:02 54674
18:00:02 76056
19:00:02 57958
20:00:02 34286
21:00:02 22509
22:00:02 13894
23:00:02 7134
However, the index starts at 00:00:02, and I want it to start at 00:00:00 and then proceed in one-hour intervals. Something like this:
count_uses
Hora_Retiro
00:00:00 2565
01:00:00 603
02:00:00 295
03:00:00 5
04:00:00 10
05:00:00 4002
06:00:00 16075
07:00:00 39410
08:00:00 76272
09:00:00 56721
10:00:00 36036
11:00:00 32011
12:00:00 33725
13:00:00 41032
14:00:00 50747
15:00:00 50338
16:00:00 42347
17:00:00 54674
18:00:00 76056
19:00:00 57958
20:00:00 34286
21:00:00 22509
22:00:00 13894
23:00:00 7134
How can I make it start at 00:00:00?
Thanks for the help!
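You can create an hour column from the Hora_Retiro column. Since the column is timedelta64[ns], take the hour from dt.components (dt.hour is only available on datetime columns):
df['hour'] = df['Hora_Retiro'].dt.components.hours
Then group by hour: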
gpby_df = df.groupby('hour')['count_uses'].sum().reset_index()
gpby_df['hour'] = pd.to_datetime(gpby_df['hour'], format='%H').dt.time
gpby_df.columns = ['Hora_Retiro', 'sum_count_uses']
gpby_df
gives
Hora_Retiro sum_count_uses
0 00:00:00 14
1 09:00:00 1
2 10:00:00 2
3 20:00:00 2
I assume that the Hora_Retiro column in your DataFrame is of Timedelta type. It is not datetime, because in that case the date part would be printed as well.
Your code creates hourly groups anchored at the minute and second taken from the first row.
To group by "full hours":
floor each element in this column to the hour (round would push 00:45:00 into the 01:00:00 group),
then group by this floored value.
The code to do it is:
hora_pico.groupby(hora_pico.Hora_Retiro.dt.floor('H')).count_uses.count()
However, I advise you to decide what you want to count: rows, or the values in the count_uses column.
In the second case, replace the count function with sum.
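A small sketch of grouping timedeltas by full hours is shown below. The five sample rows are made up for illustration (they are not the asker's data), and the reindex step is an optional addition that assumes every hour from 00:00:00 to 23:00:00 should appear even when it has no rows.

```python
import pandas as pd

# Hypothetical sample: a timedelta column and one use per row.
hora_pico = pd.DataFrame(
    {
        "Hora_Retiro": pd.to_timedelta(
            ["00:00:18", "00:45:34", "01:02:27", "01:59:59", "03:06:45"]
        ),
        "count_uses": [1, 1, 1, 1, 1],
    }
)

# Floor each timedelta to the start of its hour, then sum per hour.
hour_start = hora_pico["Hora_Retiro"].dt.floor("h")
per_hour = hora_pico.groupby(hour_start)["count_uses"].sum()

# Optional: make sure all 24 hours are present, filling gaps with 0.
all_hours = pd.timedelta_range(start="0h", end="23h", freq="h")
per_hour_full = per_hour.reindex(all_hours, fill_value=0)

print(per_hour)
print(per_hour_full.head())
```

Flooring keeps 00:45:34 in the 00:00:00 group, which is what a per-hour total needs; rounding would move it to 01:00:00.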

How do I display a subset of a pandas dataframe?

I have a dataframe df that contains datetimes for every hour of a day between 2003-02-12 to 2017-06-30 and I want to delete all datetimes between 24th Dec and 1st Jan of EVERY year.
An extract of my data frame is:
...
7505,2003-12-23 17:00:00
7506,2003-12-23 18:00:00
7507,2003-12-23 19:00:00
7508,2003-12-23 20:00:00
7509,2003-12-23 21:00:00
7510,2003-12-23 22:00:00
7511,2003-12-23 23:00:00
7512,2003-12-24 00:00:00
7513,2003-12-24 01:00:00
7514,2003-12-24 02:00:00
7515,2003-12-24 03:00:00
7516,2003-12-24 04:00:00
7517,2003-12-24 05:00:00
7518,2003-12-24 06:00:00
...
7723,2004-01-01 19:00:00
7724,2004-01-01 20:00:00
7725,2004-01-01 21:00:00
7726,2004-01-01 22:00:00
7727,2004-01-01 23:00:00
7728,2004-01-02 00:00:00
7729,2004-01-02 01:00:00
7730,2004-01-02 02:00:00
7731,2004-01-02 03:00:00
7732,2004-01-02 04:00:00
7733,2004-01-02 05:00:00
7734,2004-01-02 06:00:00
7735,2004-01-02 07:00:00
...
and my expected output is:
...
7505,2003-12-23 17:00:00
7506,2003-12-23 18:00:00
7507,2003-12-23 19:00:00
7508,2003-12-23 20:00:00
7509,2003-12-23 21:00:00
7510,2003-12-23 22:00:00
7511,2003-12-23 23:00:00
...
7728,2004-01-02 00:00:00
7729,2004-01-02 01:00:00
7730,2004-01-02 02:00:00
7731,2004-01-02 03:00:00
7732,2004-01-02 04:00:00
7733,2004-01-02 05:00:00
7734,2004-01-02 06:00:00
7735,2004-01-02 07:00:00
...
Sample dataframe:
dates
0 2003-12-23 23:00:00
1 2003-12-24 05:00:00
2 2004-12-27 05:00:00
3 2003-12-13 23:00:00
4 2002-12-23 23:00:00
5 2004-01-01 05:00:00
6 2014-12-24 05:00:00
Solution:
If you want to exclude these dates for every year, first extract the month and the day:
df['month'] = df['dates'].dt.month
df['day'] = df['dates'].dt.day
And now put the condition check:
dec_days = [24, 25, 26, 27, 28, 29, 30, 31]
## if the month is dec, then check for these dates
## if the month is jan, then just check for the day to be 1 like below
df = df[~(((df.month == 12) & (df.day.isin(dec_days))) | ((df.month == 1) & (df.day == 1)))]
Sample output:
dates month day
0 2003-12-23 23:00:00 12 23
3 2003-12-13 23:00:00 12 13
4 2002-12-23 23:00:00 12 23
This takes advantage of the fact that datetime-strings in the form mm-dd are sortable. Read everything in from the CSV file then filter for the dates you want:
df = pd.read_csv('...', parse_dates=['DateTime'])
s = df['DateTime'].dt.strftime('%m-%d')
excluded = (s == '01-01') | (s >= '12-24') # Jan 1 or >= Dec 24
df[~excluded]
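To see the mm-dd string comparison in action without a CSV file, here is a sketch that uses the sample dataframe from the first answer. The column name dates comes from that sample; month_day and kept are illustrative names.

```python
import pandas as pd

# Sample dataframe from the answer above.
df = pd.DataFrame(
    {
        "dates": pd.to_datetime(
            [
                "2003-12-23 23:00",
                "2003-12-24 05:00",
                "2004-12-27 05:00",
                "2003-12-13 23:00",
                "2002-12-23 23:00",
                "2004-01-01 05:00",
                "2014-12-24 05:00",
            ]
        )
    }
)

# Zero-padded "mm-dd" strings sort in calendar order, so plain string
# comparison is enough to express "Dec 24 or later" and "Jan 1".
month_day = df["dates"].dt.strftime("%m-%d")
excluded = (month_day == "01-01") | (month_day >= "12-24")
kept = df[~excluded]

print(kept)
```

Only rows 0, 3 and 4 remain, which matches the sample output of the month/day approach.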
You can try dropping rows conditionally, for example by pattern-matching the date string or by parsing the date as a number and removing rows based on it.
datesIdontLike = df[df['colname'] == <stringPattern>].index
# drop returns a new DataFrame; with inplace=True it would return None
newDF = df.drop(datesIdontLike)
Check this out: https://thispointer.com/python-pandas-how-to-drop-rows-in-dataframe-by-conditions-on-column-values/
(If you have issues, let me know.)
You can use pandas and boolean filtering with strftime
# version 0.23.4
import pandas as pd
# make df
df = pd.DataFrame(pd.date_range('20181223', '20190103', freq='H'), columns=['date'])
# string format the date to only include the month and day
# then set it strictly less than '12-24' AND greater than or equal to `01-02`
df = df.loc[
(df.date.dt.strftime('%m-%d') < '12-24') &
(df.date.dt.strftime('%m-%d') >= '01-02')
].copy()
print(df)
date
0 2018-12-23 00:00:00
1 2018-12-23 01:00:00
2 2018-12-23 02:00:00
3 2018-12-23 03:00:00
4 2018-12-23 04:00:00
5 2018-12-23 05:00:00
6 2018-12-23 06:00:00
7 2018-12-23 07:00:00
8 2018-12-23 08:00:00
9 2018-12-23 09:00:00
10 2018-12-23 10:00:00
11 2018-12-23 11:00:00
12 2018-12-23 12:00:00
13 2018-12-23 13:00:00
14 2018-12-23 14:00:00
15 2018-12-23 15:00:00
16 2018-12-23 16:00:00
17 2018-12-23 17:00:00
18 2018-12-23 18:00:00
19 2018-12-23 19:00:00
20 2018-12-23 20:00:00
21 2018-12-23 21:00:00
22 2018-12-23 22:00:00
23 2018-12-23 23:00:00
240 2019-01-02 00:00:00
241 2019-01-02 01:00:00
242 2019-01-02 02:00:00
243 2019-01-02 03:00:00
244 2019-01-02 04:00:00
245 2019-01-02 05:00:00
246 2019-01-02 06:00:00
247 2019-01-02 07:00:00
248 2019-01-02 08:00:00
249 2019-01-02 09:00:00
250 2019-01-02 10:00:00
251 2019-01-02 11:00:00
252 2019-01-02 12:00:00
253 2019-01-02 13:00:00
254 2019-01-02 14:00:00
255 2019-01-02 15:00:00
256 2019-01-02 16:00:00
257 2019-01-02 17:00:00
258 2019-01-02 18:00:00
259 2019-01-02 19:00:00
260 2019-01-02 20:00:00
261 2019-01-02 21:00:00
262 2019-01-02 22:00:00
263 2019-01-02 23:00:00
264 2019-01-03 00:00:00
This will work with multiple years because we are only filtering on the month and day.
# change range to include 2017
df = pd.DataFrame(pd.date_range('20171223', '20190103', freq='H'), columns=['date'])
df = df.loc[
(df.date.dt.strftime('%m-%d') < '12-24') &
(df.date.dt.strftime('%m-%d') >= '01-02')
].copy()
print(df)
date
0 2017-12-23 00:00:00
1 2017-12-23 01:00:00
2 2017-12-23 02:00:00
3 2017-12-23 03:00:00
4 2017-12-23 04:00:00
5 2017-12-23 05:00:00
6 2017-12-23 06:00:00
7 2017-12-23 07:00:00
8 2017-12-23 08:00:00
9 2017-12-23 09:00:00
10 2017-12-23 10:00:00
11 2017-12-23 11:00:00
12 2017-12-23 12:00:00
13 2017-12-23 13:00:00
14 2017-12-23 14:00:00
15 2017-12-23 15:00:00
16 2017-12-23 16:00:00
17 2017-12-23 17:00:00
18 2017-12-23 18:00:00
19 2017-12-23 19:00:00
20 2017-12-23 20:00:00
21 2017-12-23 21:00:00
22 2017-12-23 22:00:00
23 2017-12-23 23:00:00
240 2018-01-02 00:00:00
241 2018-01-02 01:00:00
242 2018-01-02 02:00:00
243 2018-01-02 03:00:00
244 2018-01-02 04:00:00
245 2018-01-02 05:00:00
... ...
8779 2018-12-23 19:00:00
8780 2018-12-23 20:00:00
8781 2018-12-23 21:00:00
8782 2018-12-23 22:00:00
8783 2018-12-23 23:00:00
9000 2019-01-02 00:00:00
9001 2019-01-02 01:00:00
9002 2019-01-02 02:00:00
9003 2019-01-02 03:00:00
9004 2019-01-02 04:00:00
9005 2019-01-02 05:00:00
9006 2019-01-02 06:00:00
9007 2019-01-02 07:00:00
9008 2019-01-02 08:00:00
9009 2019-01-02 09:00:00
9010 2019-01-02 10:00:00
9011 2019-01-02 11:00:00
9012 2019-01-02 12:00:00
9013 2019-01-02 13:00:00
9014 2019-01-02 14:00:00
9015 2019-01-02 15:00:00
9016 2019-01-02 16:00:00
9017 2019-01-02 17:00:00
9018 2019-01-02 18:00:00
9019 2019-01-02 19:00:00
9020 2019-01-02 20:00:00
9021 2019-01-02 21:00:00
9022 2019-01-02 22:00:00
9023 2019-01-02 23:00:00
9024 2019-01-03 00:00:00
Since you want this to happen for every year, we can first define a series in which the year is replaced by a static value (2000, for example). Let date be the column that stores the date; we can generate such a series as:
dt = pd.to_datetime({'year': 2000, 'month': df['date'].dt.month, 'day': df['date'].dt.day})
For the given sample data, we get:
>>> dt
0 2000-12-23
1 2000-12-23
2 2000-12-23
3 2000-12-23
4 2000-12-23
5 2000-12-23
6 2000-12-23
7 2000-12-24
8 2000-12-24
9 2000-12-24
10 2000-12-24
11 2000-12-24
12 2000-12-24
13 2000-12-24
14 2000-01-01
15 2000-01-01
16 2000-01-01
17 2000-01-01
18 2000-01-01
19 2000-01-02
20 2000-01-02
21 2000-01-02
22 2000-01-02
23 2000-01-02
24 2000-01-02
25 2000-01-02
26 2000-01-02
dtype: datetime64[ns]
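Next we can filter the rows. Compare against pd.Timestamp values, since recent pandas versions may refuse to compare a datetime64 series with datetime.date objects:
df[(dt >= pd.Timestamp(2000, 1, 2)) & (dt < pd.Timestamp(2000, 12, 24))]
This gives us the following data for your sample data:
>>> df[(dt >= pd.Timestamp(2000, 1, 2)) & (dt < pd.Timestamp(2000, 12, 24))]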
id dt
0 7505 2003-12-23 17:00:00
1 7506 2003-12-23 18:00:00
2 7507 2003-12-23 19:00:00
3 7508 2003-12-23 20:00:00
4 7509 2003-12-23 21:00:00
5 7510 2003-12-23 22:00:00
6 7511 2003-12-23 23:00:00
19 7728 2004-01-02 00:00:00
20 7729 2004-01-02 01:00:00
21 7730 2004-01-02 02:00:00
22 7731 2004-01-02 03:00:00
23 7732 2004-01-02 04:00:00
24 7733 2004-01-02 05:00:00
25 7734 2004-01-02 06:00:00
26 7735 2004-01-02 07:00:00
So regardless of the year, we only keep dates between the 2nd of January and the 23rd of December (both inclusive).
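The fixed-year idea can be sketched as a runnable example like the one below. The six sample dates are made up to cover several years, including a 29 February; the names anchored, keep and kept are illustrative. The year 2000 is a leap year, which is why 29 February survives the conversion.

```python
import pandas as pd

# Hypothetical sample spanning several years.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2003-12-23 23:00",
                "2003-12-24 00:00",
                "2004-01-01 23:00",
                "2004-01-02 00:00",
                "2016-02-29 12:00",
                "2016-12-31 08:00",
            ]
        )
    }
)

# Rebuild every date with the same (leap) year so that only the
# month and day take part in the comparison.
anchored = pd.to_datetime(
    pd.DataFrame(
        {
            "year": 2000,
            "month": df["date"].dt.month,
            "day": df["date"].dt.day,
        }
    )
)

# Keep Jan 2 through Dec 23, both inclusive.
keep = anchored.between(pd.Timestamp("2000-01-02"), pd.Timestamp("2000-12-23"))
kept = df[keep]

print(kept)
```

Choosing a non-leap year as the static value would make 29 February an invalid date, so a leap year is the safer assumption here.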

Python: Grouping by time interval

I have a dataframe that looks like the one below. I'm using Python 3.6.5 and datetime.time objects for the index.
print(sum_by_time)
Trips
Time
00:00:00 10
01:00:00 10
02:00:00 10
03:00:00 10
04:00:00 20
05:00:00 20
06:00:00 20
07:00:00 20
08:00:00 30
09:00:00 30
10:00:00 30
11:00:00 30
How can I group this dataframe by time interval to get something like this:
Trips
Time
00:00:00 - 03:00:00 40
04:00:00 - 07:00:00 80
08:00:00 - 11:00:00 120
I think you need to convert the index values to timedeltas with to_timedelta and then resample:
df.index = pd.to_timedelta(df.index.astype(str))
df = df.resample('4H').sum()
print (df)
Trips
00:00:00 40
04:00:00 80
08:00:00 120
EDIT:
For your format you need:
df['d'] = pd.to_datetime(df.index.astype(str))
df = df.groupby(pd.Grouper(freq='4H', key='d')).agg({'Trips':'sum', 'd':['first','last']})
df.columns = df.columns.map('_'.join)
df = df.set_index(df['d_first'].dt.strftime('%H:%M:%S') + ' - ' + df['d_last'].dt.strftime('%H:%M:%S'))[['Trips_sum']]
print (df)
Trips_sum
00:00:00 - 03:00:00 40
04:00:00 - 07:00:00 80
08:00:00 - 11:00:00 120
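For reference, here is one self-contained way to reach the "start - end" labels while staying with timedeltas throughout. It is a sketch under the assumption that the rows are hourly, so the last row of each 4-hour bin sits 3 hours after the bin start; the helper clock and the names as_timedelta and grouped are made up for this example.

```python
import datetime

import pandas as pd

# Rebuild the question's frame: datetime.time index, hourly rows.
sum_by_time = pd.DataFrame(
    {"Trips": [10] * 4 + [20] * 4 + [30] * 4},
    index=pd.Index([datetime.time(h) for h in range(12)], name="Time"),
)

# time objects cannot be resampled directly, so go through strings
# ("00:00:00", "01:00:00", ...) to a TimedeltaIndex.
as_timedelta = sum_by_time.copy()
as_timedelta.index = pd.to_timedelta(as_timedelta.index.astype(str))

# Sum the trips in 4-hour bins starting at 00:00:00.
grouped = as_timedelta.resample("4h").sum()


def clock(td):
    """Format a Timedelta below one day as HH:MM:SS."""
    return str(td).split()[-1]


# Assumption: hourly rows, so each bin's last row is start + 3 hours.
last_slot = pd.Timedelta(hours=3)
grouped.index = [
    f"{clock(start)} - {clock(start + last_slot)}" for start in grouped.index
]

print(grouped)
```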

Converting pandas date column into seconds elapsed

I have a pandas dataframe of multiple columns with a column of datetime64[ns] data. Time is in HH:MM:SS format. How can I convert this column of dates into a column of seconds elapsed? Like if the time said 10:00:00 in seconds that would be 36000. The seconds should be in a float64 type format.
Example data column
New Answer
Convert your text to Timedelta, then take the total seconds:
df['Origin Time(Local)'] = pd.to_timedelta(df['Origin Time(Local)'])
df['Seconds'] = df['Origin Time(Local)'].dt.total_seconds()
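A quick check of this approach, assuming the column holds HH:MM:SS strings. The three sample values are made up; the column name is the one used in the answer above.

```python
import pandas as pd

# Hypothetical sample of time-of-day strings.
df = pd.DataFrame({"Origin Time(Local)": ["10:00:00", "00:00:30", "23:59:59"]})

# Parse the strings as durations since midnight ...
elapsed = pd.to_timedelta(df["Origin Time(Local)"])

# ... and express each duration as float seconds.
df["Seconds"] = elapsed.dt.total_seconds()

print(df)
print(df["Seconds"].dtype)
```

The result is a float64 column, and 10:00:00 maps to 36000.0 as the question expects.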
Old Answer
Consider the dataframe df
df = pd.DataFrame(dict(Date=pd.date_range('2017-03-01', '2017-03-02', freq='2H')))
Date
0 2017-03-01 00:00:00
1 2017-03-01 02:00:00
2 2017-03-01 04:00:00
3 2017-03-01 06:00:00
4 2017-03-01 08:00:00
5 2017-03-01 10:00:00
6 2017-03-01 12:00:00
7 2017-03-01 14:00:00
8 2017-03-01 16:00:00
9 2017-03-01 18:00:00
10 2017-03-01 20:00:00
11 2017-03-01 22:00:00
12 2017-03-02 00:00:00
Subtract midnight of the same day from each timestamp and use total_seconds. total_seconds is a method of Timedelta (available on a Series through the dt accessor). We get a series of Timedeltas by taking the difference between two series of Timestamps.
(df.Date - df.Date.dt.floor('D')).dt.total_seconds()
# equivalent to
# (df.Date - pd.to_datetime(df.Date.dt.date)).dt.total_seconds()
0 0.0
1 7200.0
2 14400.0
3 21600.0
4 28800.0
5 36000.0
6 43200.0
7 50400.0
8 57600.0
9 64800.0
10 72000.0
11 79200.0
12 0.0
Name: Date, dtype: float64
Put it in a new column
df.assign(seconds=(df.Date - df.Date.dt.floor('D')).dt.total_seconds())
Date seconds
0 2017-03-01 00:00:00 0.0
1 2017-03-01 02:00:00 7200.0
2 2017-03-01 04:00:00 14400.0
3 2017-03-01 06:00:00 21600.0
4 2017-03-01 08:00:00 28800.0
5 2017-03-01 10:00:00 36000.0
6 2017-03-01 12:00:00 43200.0
7 2017-03-01 14:00:00 50400.0
8 2017-03-01 16:00:00 57600.0
9 2017-03-01 18:00:00 64800.0
10 2017-03-01 20:00:00 72000.0
11 2017-03-01 22:00:00 79200.0
12 2017-03-02 00:00:00 0.0
This works if the column already holds timedeltas (dt.total_seconds is not available on a datetime64 column):
df['time'].dt.total_seconds()
