Pandas Series of hour values to Series of dates - python

I have a time series covering January 1979 in 6-hour steps. The time values are a continuous count of hours:
1
7
13
18
25
31
.
.
.
739
Is it possible to convert these ints to dates? For instance:
1979/01/01 - 1:00
1979/01/01 - 7:00
1979/01/01 - 13:00
1979/01/01 - 18:00
1979/01/02 - 1:00
Thank you so much!

Setup
import pandas as pd
df = pd.DataFrame({'hour': [1,7,13,18,25,31]})
Use pd.to_datetime with the unit parameter, and set the origin parameter to the beginning of your desired year:
pd.to_datetime(df.hour, unit='h', origin='1979-01-01')
0 1979-01-01 01:00:00
1 1979-01-01 07:00:00
2 1979-01-01 13:00:00
3 1979-01-01 18:00:00
4 1979-01-02 01:00:00
5 1979-01-02 07:00:00
Name: hour, dtype: datetime64[ns]
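The same call handles the whole month at once. As a minimal sketch, assuming a regular 6-hour step from hour 1 to hour 739 (a hypothetical stand-in for the real data, since the question only lists a few values):

```python
import pandas as pd

# Hypothetical stand-in for the real data: hours 1, 7, 13, ..., 739
hours = pd.Series(range(1, 740, 6), name="hour")

# unit='h' reads each integer as a number of hours;
# origin is the timestamp that hour 0 corresponds to
dates = pd.to_datetime(hours, unit="h", origin="1979-01-01")

print(dates.head())
print(dates.iloc[-1])  # hour 739 = 30 days + 19 hours after the origin
```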

Here is another way:
import pandas as pd
s = pd.Series([1,7,13])
# hours -> nanoseconds, plus 1979-01-01 expressed in nanoseconds since the epoch
s = pd.to_datetime(s * 3600 * 10**9 + pd.Timestamp(1979, 1, 1).value)
print(s)
Returns:
0 1979-01-01 01:00:00
1 1979-01-01 07:00:00
2 1979-01-01 13:00:00
dtype: datetime64[ns]

Could also just do this:
import pandas as pd
from datetime import datetime, timedelta
s = pd.Series([1,7,13,18,25])
s = s.apply(lambda h: datetime(1979, 1, 1) + timedelta(hours=h))
print(s)
Returns:
0 1979-01-01 01:00:00
1 1979-01-01 07:00:00
2 1979-01-01 13:00:00
3 1979-01-01 18:00:00
4 1979-01-02 01:00:00
dtype: datetime64[ns]
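apply calls the lambda once per element in Python. A vectorized sketch of the same idea (a swapped-in technique, not the answer's own code) converts the integers to Timedeltas in one call and adds the start date:

```python
import pandas as pd

s = pd.Series([1, 7, 13, 18, 25])

# Integers -> hour-based Timedeltas in one vectorized call,
# then shift them all by the start of January 1979
dates = pd.to_timedelta(s, unit="h") + pd.Timestamp(1979, 1, 1)

print(dates)
```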

Related

How to use pandas Grouper to get sum of values within each hour

I have the following table:
Hora_Retiro count_uses
0 00:00:18 1
1 00:00:34 1
2 00:02:27 1
3 00:03:13 1
4 00:06:45 1
... ... ...
748700 23:58:47 1
748701 23:58:49 1
748702 23:59:11 1
748703 23:59:47 1
748704 23:59:56 1
I want to group all values within each hour, so I can see the total number of uses per hour (00:00:00 - 23:00:00).
I have the following code:
hora_pico_aug= hora_pico.groupby(pd.Grouper(key="Hora_Retiro",freq='H')).count()
(The Hora_Retiro column is of type timedelta64[ns].)
This gives the following output:
count_uses
Hora_Retiro
00:00:02 2566
01:00:02 602
02:00:02 295
03:00:02 5
04:00:02 10
05:00:02 4002
06:00:02 16075
07:00:02 39410
08:00:02 76272
09:00:02 56721
10:00:02 36036
11:00:02 32011
12:00:02 33725
13:00:02 41032
14:00:02 50747
15:00:02 50338
16:00:02 42347
17:00:02 54674
18:00:02 76056
19:00:02 57958
20:00:02 34286
21:00:02 22509
22:00:02 13894
23:00:02 7134
However, the index starts at 00:00:02, and I want it to start at 00:00:00 and then proceed in one-hour intervals, like this:
count_uses
Hora_Retiro
00:00:00 2565
01:00:00 603
02:00:00 295
03:00:00 5
04:00:00 10
05:00:00 4002
06:00:00 16075
07:00:00 39410
08:00:00 76272
09:00:00 56721
10:00:00 36036
11:00:00 32011
12:00:00 33725
13:00:00 41032
14:00:00 50747
15:00:00 50338
16:00:00 42347
17:00:00 54674
18:00:00 76056
19:00:00 57958
20:00:00 34286
21:00:00 22509
22:00:00 13894
23:00:00 7134
How can I make it start at 00:00:00?
Thanks for the help!
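You can create an hour column from the Hora_Retiro column. Since that column is timedelta64, take the hours from .dt.components (a datetime64 column would use .dt.hour instead):
df['hour'] = df['Hora_Retiro'].dt.components['hours']
Then group by that hour column: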
gpby_df = df.groupby('hour')['count_uses'].sum().reset_index()
gpby_df['hour'] = pd.to_datetime(gpby_df['hour'], format='%H').dt.time
gpby_df.columns = ['Hora_Retiro', 'sum_count_uses']
gpby_df
gives
Hora_Retiro sum_count_uses
0 00:00:00 14
1 09:00:00 1
2 10:00:00 2
3 20:00:00 2
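A minimal sketch of the hour-column idea on made-up timedelta data (the sample values are hypothetical), showing that .dt.components gives the hour within the day:

```python
import pandas as pd

# Hypothetical sample data: times of day stored as timedelta64[ns]
df = pd.DataFrame({
    "Hora_Retiro": pd.to_timedelta(["00:00:18", "00:59:59", "01:02:27", "09:30:00"]),
    "count_uses": [1, 1, 1, 1],
})

# A timedelta column has no .dt.hour; .dt.components splits each value
# into days, hours, minutes, ... and 'hours' is the hour within the day
df["hour"] = df["Hora_Retiro"].dt.components["hours"]

per_hour = df.groupby("hour")["count_uses"].sum()
print(per_hour)
```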
I assume that the Hora_Retiro column in your DataFrame is of
Timedelta type. It is not datetime, because in that case the
date part would also be printed.
Indeed, your code creates groups starting at the minute/second
taken from the first row.
To group by "full hours":
truncate (floor) each element in this column to the hour, so that e.g. 00:45 stays in the 00:00 group (rounding would move it to 01:00),
then group just by this truncated value.
The code to do it is:
hora_pico.groupby(hora_pico.Hora_Retiro.dt.floor('h')).count_uses.count()
However, I advise you to decide what you want to count:
rows, or the values in the count_uses column.
In the second case, replace count with sum.
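Whether each time is rounded or floored to the hour changes which group it lands in. A short sketch on made-up data (the values are hypothetical) contrasts the two:

```python
import pandas as pd

# Hypothetical sample of times of day as timedelta64[ns]
times = pd.Series(pd.to_timedelta(["00:00:18", "00:40:00", "01:10:00", "01:45:00"]))
hora_pico = pd.DataFrame({"Hora_Retiro": times, "count_uses": [1, 1, 1, 1]})

# floor('h') truncates to the start of the hour: 00:40 -> 00:00
floored = hora_pico.groupby(hora_pico.Hora_Retiro.dt.floor("h")).count_uses.count()

# round('h') goes to the nearest hour: 00:40 -> 01:00, 01:45 -> 02:00
rounded = hora_pico.groupby(hora_pico.Hora_Retiro.dt.round("h")).count_uses.count()

print(floored)
print(rounded)
```

Only floor keeps every value in the hour it actually falls within, which matches "all values within each hour".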

Converting to datetime in python

I have time data in a column and am trying to figure out how to get it into datetime format:
2000
2100
2300
2355
0
1
5
10
100
105
330
My question is: how can I get these into datetime format?
The output should be:
20:00:00
21:00:00
23:00:00
23:55:00
00:00:00
00:01:00
00:05:00
00:10:00
01:00:00
01:05:00
03:30:00
I tried:
1. da = pd.to_datetime(330, format='%H%M')
output: '03:30:00'
2. d= str(datetime.timedelta(minutes = 55 ))
output : '0:55:00'
But if I apply approach 1 to 100, it gives 10 hours:
e.g. da = pd.to_datetime(100, format='%H%M')
output: '10:00:00'
Try:
pd.to_datetime(df['time'].astype(str).str.zfill(4), format = '%H%M').dt.time
0 20:00:00
1 21:00:00
2 23:00:00
3 23:55:00
4 00:00:00
5 00:01:00
6 00:05:00
7 00:10:00
8 01:00:00
9 01:05:00
10 03:30:00
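The padding is what fixes the 100 case from the question: the unpadded string "100" was read as 10:00, while "0100" can only be 01:00. A self-contained sketch with the question's values (the column name time follows the answer above):

```python
import pandas as pd

df = pd.DataFrame({"time": [2000, 2100, 2300, 2355, 0, 1, 5, 10, 100, 105, 330]})

# Pad to 4 digits so that e.g. 100 -> "0100" (01:00) and 5 -> "0005" (00:05)
padded = df["time"].astype(str).str.zfill(4)

# Parse as hour+minute, then keep only the time-of-day part
times = pd.to_datetime(padded, format="%H%M").dt.time
print(times)
```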
IIUC, use str.rjust:
pd.to_datetime(s.astype(str).str.rjust(4,'0'),format='%H%M').dt.time
Out[41]:
0 20:00:00
1 21:00:00
2 23:00:00
3 23:55:00
4 00:00:00
5 00:01:00
6 00:05:00
7 00:10:00
8 01:00:00
9 01:05:00
10 03:30:00
Name: x, dtype: object
Since this is novice code, I am making things more explicit and adding information about the format codes %H and %M:
df['cname'] = pd.to_datetime(df['cname'].astype(str).str.zfill(4), format = '%H%M').dt.time
print(df['cname'])
# %H Hour (24-hour clock) as a zero-padded decimal number. 07
# %M Minute as a zero-padded decimal number. 06
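If you would rather avoid string parsing, an alternative (a swapped-in technique, not from the answers above) is to split each HHMM integer with // and % and build a Timedelta since midnight. This sketch assumes every value is a valid HHMM integer, and note the result is timedelta64 rather than datetime.time:

```python
import pandas as pd

cname = pd.Series([2000, 2355, 0, 5, 100, 330])

# HHMM integer -> hours (integer division) and minutes (remainder)
hours = cname // 100
minutes = cname % 100

# Duration since midnight as timedelta64
since_midnight = pd.to_timedelta(hours, unit="h") + pd.to_timedelta(minutes, unit="min")
print(since_midnight)
```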

Converting pandas date column into seconds elapsed

I have a pandas DataFrame with multiple columns, one of which holds datetime64[ns] data. The time is in HH:MM:SS format. How can I convert this column into a column of seconds elapsed? For example, 10:00:00 would be 36000 seconds. The seconds should be of type float64.
New Answer
Convert your text to Timedelta
df['Origin Time(Local)'] = pd.to_timedelta(df['Origin Time(Local)'])
df['Seconds'] = df['Origin Time(Local)'].dt.total_seconds()
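A minimal sketch of the new answer, assuming the column holds HH:MM:SS strings (the values below are made up; the column name is taken from the answer):

```python
import pandas as pd

df = pd.DataFrame({"Origin Time(Local)": ["00:00:00", "10:00:00", "23:59:59"]})

# Parse "HH:MM:SS" strings as durations since midnight
df["Origin Time(Local)"] = pd.to_timedelta(df["Origin Time(Local)"])

# total_seconds() returns float64 seconds
df["Seconds"] = df["Origin Time(Local)"].dt.total_seconds()
print(df)
```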
Old Answer
Consider the dataframe df
df = pd.DataFrame(dict(Date=pd.date_range('2017-03-01', '2017-03-02', freq='2H')))
Date
0 2017-03-01 00:00:00
1 2017-03-01 02:00:00
2 2017-03-01 04:00:00
3 2017-03-01 06:00:00
4 2017-03-01 08:00:00
5 2017-03-01 10:00:00
6 2017-03-01 12:00:00
7 2017-03-01 14:00:00
8 2017-03-01 16:00:00
9 2017-03-01 18:00:00
10 2017-03-01 20:00:00
11 2017-03-01 22:00:00
12 2017-03-02 00:00:00
Subtract the start of each day (midnight, obtained with floor('D')) from the timestamps and use total_seconds. total_seconds is a Timedelta method, available on a Series through the .dt accessor. We get a Series of Timedeltas by taking the difference between two Series of Timestamps.
(df.Date - df.Date.dt.floor('D')).dt.total_seconds()
# equivalent to
# (df.Date - pd.to_datetime(df.Date.dt.date)).dt.total_seconds()
0 0.0
1 7200.0
2 14400.0
3 21600.0
4 28800.0
5 36000.0
6 43200.0
7 50400.0
8 57600.0
9 64800.0
10 72000.0
11 79200.0
12 0.0
Name: Date, dtype: float64
Put it in a new column
df.assign(seconds=(df.Date - df.Date.dt.floor('D')).dt.total_seconds())
Date seconds
0 2017-03-01 00:00:00 0.0
1 2017-03-01 02:00:00 7200.0
2 2017-03-01 04:00:00 14400.0
3 2017-03-01 06:00:00 21600.0
4 2017-03-01 08:00:00 28800.0
5 2017-03-01 10:00:00 36000.0
6 2017-03-01 12:00:00 43200.0
7 2017-03-01 14:00:00 50400.0
8 2017-03-01 16:00:00 57600.0
9 2017-03-01 18:00:00 64800.0
10 2017-03-01 20:00:00 72000.0
11 2017-03-01 22:00:00 79200.0
12 2017-03-02 00:00:00 0.0
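For a datetime64 column, the same numbers can also be sketched from the clock components (a swapped-in alternative, shown on the same date_range as above). Unlike the floor('D') subtraction, this drops any sub-second part:

```python
import pandas as pd

df = pd.DataFrame(dict(Date=pd.date_range("2017-03-01", "2017-03-02", freq="2h")))

# Seconds since midnight from hour, minute and second
from_parts = (df.Date.dt.hour * 3600 + df.Date.dt.minute * 60 + df.Date.dt.second).astype("float64")

# Same quantity via the floor('D') subtraction
from_floor = (df.Date - df.Date.dt.floor("D")).dt.total_seconds()

print((from_parts == from_floor).all())
```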
If the column is already of timedelta64 dtype, this would work:
df['time'].dt.total_seconds()
regards

How to rearrange a date in python

I have a column in a pandas DataFrame that looks like:
test1.Received
Out[9]:
0 01/01/2015 17:25
1 02/01/2015 11:43
2 04/01/2015 18:21
3 07/01/2015 16:17
4 12/01/2015 20:12
5 14/01/2015 11:09
6 15/01/2015 16:05
7 16/01/2015 21:02
8 26/01/2015 03:00
9 27/01/2015 08:32
10 30/01/2015 11:52
This represents a timestamp as Day Month Year Hour Minute. I would like to rearrange the date as Year Month Day Hour Minute, so that it looks like:
test1.Received
Out[9]:
0 2015/01/01 17:25
1 2015/01/02 11:43
...
Just use pd.to_datetime:
In [33]:
import pandas as pd
pd.to_datetime(df['date'])
Out[33]:
index
0 2015-01-01 17:25:00
1 2015-02-01 11:43:00
2 2015-04-01 18:21:00
3 2015-07-01 16:17:00
4 2015-12-01 20:12:00
5 2015-01-14 11:09:00
6 2015-01-15 16:05:00
7 2015-01-16 21:02:00
8 2015-01-26 03:00:00
9 2015-01-27 08:32:00
10 2015-01-30 11:52:00
Name: date, dtype: datetime64[ns]
In your case:
pd.to_datetime(test1['Received'])
should just work
If you want to change the display format, you need to parse as a datetime and then apply datetime.strftime:
In [35]:
import datetime as dt
pd.to_datetime(df['date']).apply(lambda x: dt.datetime.strftime(x, '%m/%d/%y %H:%M:%S'))
Out[35]:
index
0 01/01/15 17:25:00
1 02/01/15 11:43:00
2 04/01/15 18:21:00
3 07/01/15 16:17:00
4 12/01/15 20:12:00
5 01/14/15 11:09:00
6 01/15/15 16:05:00
7 01/16/15 21:02:00
8 01/26/15 03:00:00
9 01/27/15 08:32:00
10 01/30/15 11:52:00
Name: date, dtype: object
The above now shows month/day/year; in your case, the following should work:
pd.to_datetime(test1['Received']).apply(lambda x: dt.datetime.strftime(x, '%Y/%m/%d %H:%M'))
EDIT
It looks like you need to pass dayfirst=True to to_datetime:
In [45]:
pd.to_datetime(df['date'], dayfirst=True).apply(lambda x: dt.datetime.strftime(x, '%m/%d/%y %H:%M:%S'))
Out[45]:
index
0 01/01/15 17:25:00
1 01/02/15 11:43:00
2 01/04/15 18:21:00
3 01/07/15 16:17:00
4 01/12/15 20:12:00
5 01/14/15 11:09:00
6 01/15/15 16:05:00
7 01/16/15 21:02:00
8 01/26/15 03:00:00
9 01/27/15 08:32:00
10 01/30/15 11:52:00
Name: date, dtype: object
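A vectorized sketch that reaches the Year/Month/Day layout from the question, assuming the strings are always DD/MM/YYYY HH:MM. It swaps in an explicit format instead of dayfirst and the .dt.strftime accessor instead of apply (the sample values are the first three rows from the question):

```python
import pandas as pd

received = pd.Series(["01/01/2015 17:25", "02/01/2015 11:43", "04/01/2015 18:21"])

# An explicit format removes any day/month ambiguity
parsed = pd.to_datetime(received, format="%d/%m/%Y %H:%M")

# Format back to strings as Year/Month/Day Hour:Minute
rearranged = parsed.dt.strftime("%Y/%m/%d %H:%M")
print(rearranged)
```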
Pandas has this built in: you can specify your datetime format (see
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html),
or use infer_datetime_format (deprecated since pandas 2.0, where format inference is the default).
>>> import pandas as pd
>>> i = pd.date_range('20000101',periods=100)
>>> df = pd.DataFrame(dict(year = i.year, month = i.month, day = i.day))
>>> pd.to_datetime(df.year*10000 + df.month*100 + df.day, format='%Y%m%d')
0 2000-01-01
1 2000-01-02
...
98 2000-04-08
99 2000-04-09
Length: 100, dtype: datetime64[ns]
You can use the datetime functions to convert from and to strings.
from datetime import datetime
# converts to a datetime
dt_obj = datetime.strptime(date_string, '%d/%m/%Y %H:%M')
and
# converts to your requested string format
datetime.strftime(dt_obj, '%Y/%m/%d %H:%M')

pandas time delta from grouped neighbors

I have a group of dates. I would like to subtract each one from its forward neighbor to get the delta between them. My code looks like this:
import pandas, numpy, StringIO
txt = '''ID,DATE
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-05-07 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-06-03 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-13 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-27 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2001-02-01 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2008-01-20 00:00:00
'''
df = pandas.read_csv(StringIO.StringIO(txt))
df = df.sort('DATE')
df.DATE = pandas.to_datetime(df.DATE)
grouped = df.groupby('ID')
df['X_SEQUENCE_GAP'] = pandas.concat([g['DATE'].sub(g['DATE'].shift(), fill_value=0) for title,g in grouped])
I am getting pretty incomprehensible results, so I assume I have a logic error.
The results I get are as follows:
ID DATE X_SEQUENCE_GAP
0 002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 12277 days, 00:00:00
1 002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 00:00:00
3 0088f218a1f00e0fe1b94919dc68ec33 2006-06-03 00:00:00 27 days, 00:00:00
2 0088f218a1f00e0fe1b94919dc68ec33 2006-05-07 00:00:00 13275 days, 00:00:00
5 00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 13216 days, 00:00:00
4 00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 00:00:00
6 0101d3286dfbd58642a7527ecbddb92e 2007-10-13 00:00:00 13799 days, 00:00:00
7 0101d3286dfbd58642a7527ecbddb92e 2007-10-27 00:00:00 14 days, 00:00:00
9 0103bd73af66e5a44f7867c0bb2203cc 2008-01-20 00:00:00 2544 days, 00:00:00
8 0103bd73af66e5a44f7867c0bb2203cc 2001-02-01 00:00:00 11354 days, 00:00:00
I was expecting, for example, that rows 0 and 1 would both have a result of 0. Any help is much appreciated.
This is in 0.11rc1 (I don't think it will work on a prior version).
When you shift dates, the first one is NaT (like NaN, but for datetimes/timedeltas).
In [27]: df['X_SEQUENCE_GAP'] = grouped.apply(lambda g: g['DATE']-g['DATE'].shift())
In [30]: df.sort()
Out[30]:
ID DATE X_SEQUENCE_GAP
0 002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 NaT
1 002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 00:00:00
2 0088f218a1f00e0fe1b94919dc68ec33 2006-05-07 00:00:00 NaT
3 0088f218a1f00e0fe1b94919dc68ec33 2006-06-03 00:00:00 27 days, 00:00:00
4 00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 NaT
5 00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 00:00:00
6 0101d3286dfbd58642a7527ecbddb92e 2007-10-13 00:00:00 NaT
7 0101d3286dfbd58642a7527ecbddb92e 2007-10-27 00:00:00 14 days, 00:00:00
8 0103bd73af66e5a44f7867c0bb2203cc 2001-02-01 00:00:00 NaT
9 0103bd73af66e5a44f7867c0bb2203cc 2008-01-20 00:00:00 2544 days, 00:00:00
You can then use fillna (but you have to do this awkward type conversion because of a numpy bug, which will be fixed in 0.12).
In [57]: df['X_SEQUENCE_GAP'].sort_index().astype('timedelta64[ns]').fillna(0)
Out[57]:
0 00:00:00
1 00:00:00
2 00:00:00
3 27 days, 00:00:00
4 00:00:00
5 00:00:00
6 00:00:00
7 14 days, 00:00:00
8 00:00:00
9 2544 days, 00:00:00
Name: X_SEQUENCE_GAP, dtype: timedelta64[ns]
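In current pandas versions the same result can be sketched without apply or the type-conversion workaround, using groupby(...).diff() and filling NaT with a zero Timedelta. This is a modern alternative rather than the 0.11-era code above, run on a subset of the question's data:

```python
import io
import pandas as pd

txt = """ID,DATE
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-05-07 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-06-03 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2001-02-01 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2008-01-20 00:00:00
"""
df = pd.read_csv(io.StringIO(txt), parse_dates=["DATE"])
df = df.sort_values(["ID", "DATE"])

# diff() within each ID: current DATE minus the previous DATE of the same ID;
# the first row of each group has no predecessor, so it gets NaT
gaps = df.groupby("ID")["DATE"].diff()

# Replace NaT with a zero-length Timedelta rather than the integer 0
df["X_SEQUENCE_GAP"] = gaps.fillna(pd.Timedelta(0))
print(df)
```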
