How can I create a datetime column without the 'date' part?

I have a dataframe with a column named 'Time' in the format HH:MM:SS:fffff, as shown below.
>>> df['Time']
0 09:42:29:75284
1 09:42:29:95584
2 09:42:31:15036
3 09:42:35:15138
4 09:42:35:95491
5 09:42:43:55414
6 09:42:45:35866
7 09:42:46:74638
8 09:42:47:35582
9 09:42:47:74774
10 09:42:48:94582
...
Name: Time, Length: 18924, dtype: object
I want to convert it to a datetime type to make calculations easier. Is it possible to use pandas.to_datetime to convert it to a datetime without the date part?

You can convert it to timedelta64[ns] dtype:
Source DF:
In [164]: df
Out[164]:
Time
0 09:42:29:75284
1 09:42:29:95584
2 09:42:31:15036
3 09:42:35:15138
4 09:42:35:95491
5 09:42:43:55414
6 09:42:45:35866
7 09:42:46:74638
8 09:42:47:35582
9 09:42:47:74774
10 09:42:48:94582
In [165]: df.dtypes
Out[165]:
Time object # <-------- NOTE!
dtype: object
Converted:
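In [166]: df.Time = pd.to_timedelta(df.Time.str.replace(r'\:(\d+)$', r'.\1', regex=True),
errors='coerce') # regex=True is required in pandas 2.0+, where str.replace defaults to literal matching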
In [167]: df
Out[167]:
Time
0 09:42:29.752840
1 09:42:29.955840
2 09:42:31.150360
3 09:42:35.151380
4 09:42:35.954910
5 09:42:43.554140
6 09:42:45.358660
7 09:42:46.746380
8 09:42:47.355820
9 09:42:47.747740
10 09:42:48.945820
In [168]: df.dtypes
Out[168]:
Time timedelta64[ns] # <-------- NOTE!
dtype: object
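As a minimal sketch of why the timedelta dtype helps with calculations, the snippet below rebuilds three of the sample values and computes row-to-row gaps. The variable names (cleaned, gaps, elapsed_seconds) are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample values in the question's HH:MM:SS:fffff layout
df = pd.DataFrame({'Time': ['09:42:29:75284', '09:42:29:95584', '09:42:31:15036']})

# Turn the last ':' into '.' so the trailing digits are read as fractional seconds
cleaned = df['Time'].str.replace(r':(\d+)$', r'.\1', regex=True)
df['Time'] = pd.to_timedelta(cleaned, errors='coerce')

# Timedeltas support arithmetic directly, e.g. the gap between consecutive rows
gaps = df['Time'].diff()

# Seconds elapsed since the first row, as plain floats
elapsed_seconds = (df['Time'] - df['Time'].iloc[0]).dt.total_seconds()
```

The first entry of gaps is NaT because the first row has no predecessor.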

Please refer to the pandas to_datetime documentation.
import pandas as pd
df = pd.DataFrame({'Time': ['09:42:29:75284','09:42:29:95584','09:42:31:15036']})
df
Out[]:
Time
0 09:42:29:75284
1 09:42:29:95584
2 09:42:31:15036
You can convert this into datetime format by specifying format as follows:
pd.to_datetime(df['Time'], format='%H:%M:%S:%f')
Out[]:
0 1900-01-01 09:42:29.752840
1 1900-01-01 09:42:29.955840
2 1900-01-01 09:42:31.150360
Name: Time, dtype: datetime64[ns]
However, this also adds the placeholder date 1900-01-01.
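If the placeholder date is unwanted, two possible ways to drop it are sketched here. This is an illustration under the assumption that the strings parse with '%H:%M:%S:%f' as above; the names as_time and as_delta are made up for the example.

```python
import pandas as pd

times = pd.Series(['09:42:29:75284', '09:42:29:95584', '09:42:31:15036'])
parsed = pd.to_datetime(times, format='%H:%M:%S:%f')

# Option 1: plain datetime.time objects (object dtype, no vectorized arithmetic)
as_time = parsed.dt.time

# Option 2: offset from midnight as a timedelta (keeps vectorized arithmetic)
as_delta = parsed - parsed.dt.normalize()
```

Option 2 ends up with the same kind of column as pd.to_timedelta, so it is mainly useful when the data has already been parsed with to_datetime.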

Related

Convert int64 column to time column

I have a pandas dataframe with a "time" column, which currently looks like:
ab['ZEIT'].unique()
array([ 0, 165505, 203355, ..., 73139, 75211, 74244], dtype=int64)
How can I get the German time format hh:mm:ss out of it, so that it looks like this:
array([00:00:00, 16:55:05, 20:33:55, ..., 07:31:39, 07:52:11, 07:42:44], dtype=?)
Use pd.to_datetime after converting the values to strings zero-padded to six digits, then use the dt accessor to extract the time:
In [12]: pd.to_datetime(df['ZEIT'].astype(str).str.zfill(6), format='%H%M%S').dt.time
Out[12]:
0 00:00:00
1 16:55:05
2 20:33:55
3 07:31:39
4 07:52:11
5 07:42:44
Name: ZEIT, dtype: object
Details
In [13]: df
Out[13]:
ZEIT
0 0
1 165505
2 203355
3 73139
4 75211
5 74244
In [14]: df['ZEIT'].astype(str).str.zfill(6)
Out[14]:
0 000000
1 165505
2 203355
3 073139
4 075211
5 074244
Name: ZEIT, dtype: object
In [15]: pd.to_datetime(df['ZEIT'].astype(str).str.zfill(6), format='%H%M%S')
Out[15]:
0 1900-01-01 00:00:00
1 1900-01-01 16:55:05
2 1900-01-01 20:33:55
3 1900-01-01 07:31:39
4 1900-01-01 07:52:11
5 1900-01-01 07:42:44
Name: ZEIT, dtype: datetime64[ns]
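An alternative sketch, not from the original answer: if arithmetic on the times is needed, the HHMMSS integers can be split numerically and combined into a timedelta column instead of datetime.time objects. The column name ZEIT_td is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'ZEIT': [0, 165505, 203355, 73139, 75211, 74244]})

# Split the HHMMSS integer into its parts with integer arithmetic
hours = df['ZEIT'] // 10000
minutes = (df['ZEIT'] // 100) % 100
seconds = df['ZEIT'] % 100

# Combine the parts into a timedelta column (offset from midnight)
df['ZEIT_td'] = (pd.to_timedelta(hours, unit='h')
                 + pd.to_timedelta(minutes, unit='min')
                 + pd.to_timedelta(seconds, unit='s'))
```

This avoids the string round trip, at the cost of not validating that the minutes and seconds parts stay below 60.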

pandas to_datetime convert datetime string to 0

I have a column in a df which contains datetime strings,
inv_date
24/01/2008
15/06/2007 14:55:22
08/06/2007 18:26:12
15/08/2007 14:53:25
15/02/2008
07/03/2007
13/08/2007
I used pd.to_datetime with format %d%m%Y for converting the strings into datetime values;
pd.to_datetime(df.inv_date, errors='coerce', format='%d%m%Y')
I got
inv_date
24/01/2008
0
0
0
15/02/2008
07/03/2007
13/08/2007
The format is inferred from inv_date as the most common datetime format. How can I convert 15/06/2007 14:55:22, 08/06/2007 18:26:12 and 15/08/2007 14:53:25 to 15/06/2007, 08/06/2007 and 15/08/2007 instead of to 0s?
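Use the regular pd.to_datetime call, then use .dt.date. Note that without dayfirst=True, ambiguous strings such as 08/06/2007 are parsed month-first (see rows 2 and 5 below); pass dayfirst=True to read them as day/month/year: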
>>> pd.to_datetime(df.inv_date).dt.date
0 2008-01-24
1 2007-06-15
2 2007-08-06
3 2007-08-15
4 2008-02-15
5 2007-07-03
6 2007-08-13
Name: inv_date, dtype: object
>>>
Or, as @ChrisA mentioned, you can slice off the time part first. The default pandas parsing already handles these strings, so I skipped the format argument:
>>> pd.to_datetime(df.inv_date.str[:10], errors='coerce')
0 2008-01-24
1 2007-06-15
2 2007-08-06
3 2007-08-15
4 2008-02-15
5 2007-07-03
6 2007-08-13
Name: inv_date, dtype: datetime64[ns]
>>>
You can also try this:
df = pd.read_csv('myfile.csv', parse_dates=['inv_date'], dayfirst=True)
df['inv_date'].dt.strftime('%d/%m/%Y')
0 24/01/2008
1 15/06/2007
2 08/06/2007
3 15/08/2007
4 15/02/2008
5 07/03/2007
6 13/08/2007
Hope this will help too.
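Since the data is day-first, a more explicit variant is sketched below: slice the first ten characters and parse them with the format '%d/%m/%Y' (with slashes, unlike the '%d%m%Y' in the question). The variable names dates and as_text are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'inv_date': [
    '24/01/2008', '15/06/2007 14:55:22', '08/06/2007 18:26:12',
    '15/08/2007 14:53:25', '15/02/2008', '07/03/2007', '13/08/2007',
]})

# Keep only the leading DD/MM/YYYY part, then parse with an explicit format
dates = pd.to_datetime(df['inv_date'].str[:10], format='%d/%m/%Y', errors='coerce')

# Render back in the original day-first layout if strings are needed
as_text = dates.dt.strftime('%d/%m/%Y')
```

With the explicit format, 08/06/2007 is read as 8 June 2007 rather than 6 August 2007.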

How to plot timedelta data from a pandas DataFrame?

I am trying to plot a Series (a column from a dataframe, to be precise). It seems to hold valid data in the format hh:mm:ss (timedelta64).
In [14]: x5.task_a.describe()
Out[14]:
count 165
mean 0 days 03:35:41.121212
std 0 days 07:07:40.950819
min 0 days 00:00:06
25% 0 days 00:37:13
50% 0 days 01:28:17
75% 0 days 03:41:32
max 2 days 12:32:26
Name: task_a, dtype: object
In [15]: x5.task_a.head()
Out[15]:
wbdqueue_id
26868 00:26:11
26869 02:08:28
26872 00:26:07
26874 00:48:22
26875 00:26:17
Name: task_a, dtype: timedelta64[ns]
But when I try to plot it, I get an error saying there is no numeric data in the Empty 'DataFrame'.
I've tried:
x5.task_a.plot.kde()
and
x5.plot()
where x5 is the DataFrame with several Series of such timedelta data.
TypeError: Empty 'DataFrame': no numeric data to plot
I see that one can generate series of random values and plot it.
What am I doing wrong?
Convert to a meaningful numeric unit, such as hours or minutes, and then use .plot.kde():
(x5.task_a / np.timedelta64(1, 'h')).plot.kde()
Details
In [149]: x5
Out[149]:
task_a
0 0 days 22:27:46.684800
1 1 days 00:20:43.036800
2 0 days 12:16:24.873600
3 1 days 11:10:14.880000
4 1 days 03:31:05.548800
5 1 days 05:20:52.944000
6 1 days 00:09:09.590400
7 0 days 13:53:50.179200
8 1 days 04:08:57.695999
9 0 days 14:14:53.088000
In [150]: x5.task_a / np.timedelta64(1, 'h') # convert to hours
Out[150]:
0 22.462968
1 24.345288
2 12.273576
3 35.170800
4 27.518208
5 29.348040
6 24.152664
7 13.897272
8 28.149360
9 14.248080
Name: task_a, dtype: float64
Or to minutes
In [151]: x5.task_a / np.timedelta64(1, 'm')
Out[151]:
0 1347.77808
1 1460.71728
2 736.41456
3 2110.24800
4 1651.09248
5 1760.88240
6 1449.15984
7 833.83632
8 1688.96160
9 854.88480
Name: task_a, dtype: float64
Another way, using total_seconds:
In [153]: x5.task_a.dt.total_seconds() / 60
Out[153]:
0 1347.77808
1 1460.71728
2 736.41456
3 2110.24800
4 1651.09248
5 1760.88240
6 1449.15984
7 833.83632
8 1688.96160
9 854.88480
Name: task_a, dtype: float64
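The question also calls x5.plot() on a frame with several timedelta columns. A minimal sketch of converting every column at once follows; the frame contents and the name x5_hours are hypothetical stand-ins, and it assumes all columns are timedelta64.

```python
import pandas as pd

# Hypothetical frame standing in for x5: every column holds timedeltas
x5 = pd.DataFrame({
    'task_a': pd.to_timedelta(['00:26:11', '02:08:28', '00:26:07']),
    'task_b': pd.to_timedelta(['01:00:00', '00:30:00', '1 days 00:00:00']),
})

# Convert each timedelta column to float hours
x5_hours = x5.apply(lambda col: col.dt.total_seconds() / 3600)

# x5_hours holds only floats, so x5_hours.plot() has numeric data to draw
# (plot.kde() additionally requires SciPy to be installed)
```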
You can also take the differences of a DatetimeIndex and convert them to seconds with total_seconds:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
idx = pd.date_range('20140101', '20140201')
df = pd.DataFrame(index=idx)
df['col0'] = np.random.randn(len(idx))
# an index cannot be diffed row by row directly, so wrap it in a Series first
diff_idx = pd.Series(idx, index=idx).diff().fillna(pd.Timedelta(0))
df['diff_dt'] = diff_idx.dt.total_seconds() # timedelta -> float seconds
df['diff_dt'].plot()

How to replace python data frame values and concatenate another string with where condition

I want to replace values in the "Time Period" column and append another string, as shown below.
Value: 2017M12
Replace M with - and append '-01'.
Final result: 2017-12-01
Frequency,Time Period,Date
3,2016M12
3,2016M1
3,2016M8
3,2016M7
3,2016M11
3,2016M10
dt['Date'] = dt.loc[dt['Frequency']=='3',replace('Time Period','M','-')]+'-01'
In [18]: df.loc[df.Frequency==3,'Date'] = \
pd.to_datetime(df.loc[df.Frequency==3, 'Time Period'],
format='%YM%m', errors='coerce')
In [19]: df
Out[19]:
Frequency Time Period Date
0 3 2016M12 2016-12-01
1 3 2016M1 2016-01-01
2 3 2016M8 2016-08-01
3 3 2016M7 2016-07-01
4 3 2016M11 2016-11-01
5 3 2016M10 2016-10-01
In [20]: df.dtypes
Out[20]:
Frequency int64
Time Period object
Date datetime64[ns] # <--- NOTE
dtype: object
You can use apply :
dt['Date'] = dt[ dt['Frequency'] ==3]['Time Period'].apply(lambda x: x.replace('M','-')+"-01")
output
Frequency Time Period Date
0 3 2016M12 2016-12-01
1 3 2016M1 2016-1-01
2 3 2016M8 2016-8-01
3 3 2016M7 2016-7-01
4 3 2016M11 2016-11-01
5 3 2016M10 2016-10-01
Also, you don't need to create an empty 'Date' column first; the assignment dt['Date'] = ... creates it automatically.
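The apply version leaves single-digit months unpadded (2016-1-01). If zero-padded strings are wanted rather than a datetime64 column, one possible sketch is to parse first and format afterwards. The extra row with Frequency 1 is invented here to show that unmatched rows stay empty.

```python
import pandas as pd

df = pd.DataFrame({
    'Frequency': [3, 3, 3, 1],
    'Time Period': ['2016M12', '2016M1', '2016M8', '2016'],
})

# Only rows with Frequency == 3 use the 'YYYYMm' layout
mask = df['Frequency'] == 3

# Parse 'YYYYMm' as a real date, then format it with a zero-padded month
parsed = pd.to_datetime(df.loc[mask, 'Time Period'], format='%YM%m', errors='coerce')
df.loc[mask, 'Date'] = parsed.dt.strftime('%Y-%m-%d')
```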

Is the year-month part of a datetime variable still a time object in Pandas?

consider this
df=pd.DataFrame({'A':['20150202','20150503','20150503'],'B':[3, 3, 1],'C':[1, 3, 1]})
df.A=pd.to_datetime(df.A)
df['month']=df.A.dt.to_period('M')
df
Out[59]:
A B C month
0 2015-02-02 3 1 2015-02
1 2015-05-03 3 3 2015-05
2 2015-05-03 1 1 2015-05
and my month variable is:
df.month
Out[82]:
0 2015-02
1 2015-05
2 2015-05
Name: month, dtype: object
Now if I index my dataset by df.month, it seems that Pandas understands this is a date. In other words, I can draw a plot without having to sort my index first.
But is this actually correct? The dtype object (instead of some datetime format) worries me. Is there a proper date object type for this kind of monthly date?
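Each value is a pandas Period object, which is the proper type for monthly dates. In recent pandas versions the column itself reports dtype period[M] rather than object.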
In [5]: df.month.map(type)
Out[5]:
0 <class 'pandas._period.Period'>
1 <class 'pandas._period.Period'>
2 <class 'pandas._period.Period'>
Name: month, dtype: object
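A short sketch of what the Period type allows, assuming the same sample data as in the question. The names is_period, in_order and month_start are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['20150202', '20150503', '20150503'], 'B': [3, 3, 1]})
df['A'] = pd.to_datetime(df['A'], format='%Y%m%d')
df['month'] = df['A'].dt.to_period('M')

# Each element is a pandas Period
is_period = isinstance(df['month'].iloc[0], pd.Period)

# Periods compare and sort chronologically
in_order = df['month'].is_monotonic_increasing

# Convert back to timestamps (start of each month) when a datetime column is needed
month_start = df['month'].dt.to_timestamp()
```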
