Convert int64 column to time column - python

I have a pandas dataframe with a "time" column, which currently looks like:
ab['ZEIT'].unique()
array([ 0, 165505, 203355, ..., 73139, 75211, 74244], dtype=int64)
How can I get the German time format hh:mm:ss out of it, so that it looks like:
array([00:00:00, 16:55:05, 20:33:55, ..., 07:31:39, 07:52:11, 07:42:44], dtype=?)

Use pd.to_datetime after converting the values to strings zero-padded to six digits, then use the .dt accessor to extract the time:
In [12]: pd.to_datetime(df['ZEIT'].astype(str).str.zfill(6), format='%H%M%S').dt.time
Out[12]:
0 00:00:00
1 16:55:05
2 20:33:55
3 07:31:39
4 07:52:11
5 07:42:44
Name: ZEIT, dtype: object
Details
In [13]: df
Out[13]:
ZEIT
0 0
1 165505
2 203355
3 73139
4 75211
5 74244
In [14]: df['ZEIT'].astype(str).str.zfill(6)
Out[14]:
0 000000
1 165505
2 203355
3 073139
4 075211
5 074244
Name: ZEIT, dtype: object
In [15]: pd.to_datetime(df['ZEIT'].astype(str).str.zfill(6), format='%H%M%S')
Out[15]:
0 1900-01-01 00:00:00
1 1900-01-01 16:55:05
2 1900-01-01 20:33:55
3 1900-01-01 07:31:39
4 1900-01-01 07:52:11
5 1900-01-01 07:42:44
Name: ZEIT, dtype: datetime64[ns]
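As a self-contained sketch of the steps above, using the six sample values from the question: the column names "time" and "time_str" are only illustrative and not part of the original data. The strftime variant is included in case plain strings are wanted instead of datetime.time objects.

```python
import pandas as pd

# Sample data taken from the question; HHMMSS stored as integers
df = pd.DataFrame({"ZEIT": [0, 165505, 203355, 73139, 75211, 74244]})

# Left-pad to six digits so that e.g. 73139 becomes "073139"
padded = df["ZEIT"].astype(str).str.zfill(6)

# Parse as hour/minute/second; the date part defaults to 1900-01-01
parsed = pd.to_datetime(padded, format="%H%M%S")

# Hypothetical output columns: datetime.time objects and formatted strings
df["time"] = parsed.dt.time
df["time_str"] = parsed.dt.strftime("%H:%M:%S")

print(df)
```

Both new columns have object dtype, since pandas has no dedicated time-of-day dtype.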

Related

Convert string to_datetime with wrong place day and month

I have a dataset like this
df = pd.DataFrame({'time': ('08.02.2020', '21.02.2020', '2020.05.04')})
df
I do
pd.to_datetime(df['time'])
0 2020-08-02
1 2020-02-21
2 2020-05-04
Name: time, dtype: datetime64[ns]
But the first row must be
0 2020-02-08
If I do
pd.to_datetime(df['time']).dt.strftime('%d-%m-%Y')
0 02-08-2020
1 21-02-2020
2 04-05-2020
Name: time, dtype: object
Again I get 02-08-2020 instead of 08-02-2020.

How to find a partial numeric value in column in Pandas?

I have a data frame created with Pandas. It has 3 columns. One of them has the date in the format %Y%m%d%H. I need to find the rows that match a date with the format %Y%m%d.
I tried
df.loc[df["MESS_DATUM"] == 20170807]
which doesn't work. Only when I do
df.loc[df["MESS_DATUM"] == 2017080723]
it works for that single line. But I need all the lines for that date, regardless of the hour. I know there is something like .str.contains(""). Is there something similar for numeric values, or a way to use wildcards in the lines above?
We can "integer divide" MESS_DATUM column by 100:
df.loc[df["MESS_DATUM"]//100 == 20170807]
Demo:
In [29]: df
Out[29]:
MESS_DATUM
0 2017080719
1 2017080720
2 2017080721
3 2017080722
4 2017080723
In [30]: df.dtypes
Out[30]:
MESS_DATUM int64
dtype: object
In [31]: df["MESS_DATUM"]//100
Out[31]:
0 20170807
1 20170807
2 20170807
3 20170807
4 20170807
Name: MESS_DATUM, dtype: int64
But I would consider converting it to datetime dtype:
df["MESS_DATUM"] = pd.to_datetime(df["MESS_DATUM"].astype(str), format='%Y%m%d%H')
If df["MESS_DATUM"] is of float dtype, then we can use the following trick:
In [41]: pd.to_datetime(df["MESS_DATUM"].astype(str).str.split('.').str[0],
format='%Y%m%d%H')
Out[41]:
0 2017-08-07 19:00:00
1 2017-08-07 20:00:00
2 2017-08-07 21:00:00
3 2017-08-07 22:00:00
4 2017-08-07 23:00:00
Name: MESS_DATUM, dtype: datetime64[ns]
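To tie the two approaches together, here is a minimal sketch that selects all rows for one day both ways. The sample values are made up for illustration (two of them fall on neighbouring days), and the variable names are assumptions.

```python
import pandas as pd

# Illustrative data in %Y%m%d%H form; only rows 1 and 2 fall on 2017-08-07
df = pd.DataFrame({"MESS_DATUM": [2017080622, 2017080719, 2017080723, 2017080800]})

# Approach 1: drop the two hour digits with integer division
by_int = df.loc[df["MESS_DATUM"] // 100 == 20170807]

# Approach 2: parse to datetime, then compare the date part only
ts = pd.to_datetime(df["MESS_DATUM"].astype(str), format="%Y%m%d%H")
by_date = df.loc[ts.dt.normalize() == pd.Timestamp("2017-08-07")]

print(by_int)
print(by_date)
```

The datetime route takes one more step, but it also makes date ranges and resampling available later.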

How can I create a datetime column without 'date' part?

I have a dataframe with a column named 'Time' in the format HH:MM:SS:fffff, like the one below.
>>> df['Time']
0 09:42:29:75284
1 09:42:29:95584
2 09:42:31:15036
3 09:42:35:15138
4 09:42:35:95491
5 09:42:43:55414
6 09:42:45:35866
7 09:42:46:74638
8 09:42:47:35582
9 09:42:47:74774
10 09:42:48:94582
...
Name: Time, Length: 18924, dtype: object
I want to change its type to datetime to make calculations easier. Is it possible, using pandas.to_datetime, to convert it to a datetime without the date part?
You can convert it to timedelta64[ns] dtype:
Source DF:
In [164]: df
Out[164]:
Time
0 09:42:29:75284
1 09:42:29:95584
2 09:42:31:15036
3 09:42:35:15138
4 09:42:35:95491
5 09:42:43:55414
6 09:42:45:35866
7 09:42:46:74638
8 09:42:47:35582
9 09:42:47:74774
10 09:42:48:94582
In [165]: df.dtypes
Out[165]:
Time object # <-------- NOTE!
dtype: object
Converted:
In [166]: df.Time = pd.to_timedelta(df.Time.str.replace(r':(\d+)$', r'.\1', regex=True),
errors='coerce')
In [167]: df
Out[167]:
Time
0 09:42:29.752840
1 09:42:29.955840
2 09:42:31.150360
3 09:42:35.151380
4 09:42:35.954910
5 09:42:43.554140
6 09:42:45.358660
7 09:42:46.746380
8 09:42:47.355820
9 09:42:47.747740
10 09:42:48.945820
In [168]: df.dtypes
Out[168]:
Time timedelta64[ns] # <-------- NOTE!
dtype: object
Please refer to the pandas to_datetime documentation.
import pandas as pd
df = pd.DataFrame({'Time': ['09:42:29:75284','09:42:29:95584','09:42:31:15036']})
df
Out[]:
Time
0 09:42:29:75284
1 09:42:29:95584
2 09:42:31:15036
You can convert this into datetime format by specifying format as follows:
pd.to_datetime(df['Time'], format='%H:%M:%S:%f')
Out[]:
0 1900-01-01 09:42:29.752840
1 1900-01-01 09:42:29.955840
2 1900-01-01 09:42:31.150360
Name: Time, dtype: datetime64[ns]
Note that doing this also adds the date 1900-01-01.
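For comparison, a short sketch that runs both conversions on the same three sample values and computes the gap between consecutive rows. The variable names are illustrative. Passing regex=True explicitly is an assumption about the pandas version in use: newer releases treat the str.replace pattern as a literal string by default.

```python
import pandas as pd

times = pd.Series(["09:42:29:75284", "09:42:29:95584", "09:42:31:15036"])

# Option 1: turn the last ':' into '.' and parse as a duration since midnight
as_timedelta = pd.to_timedelta(times.str.replace(r":(\d+)$", r".\1", regex=True))

# Option 2: parse with an explicit format; the date defaults to 1900-01-01
as_datetime = pd.to_datetime(times, format="%H:%M:%S:%f")

# Either way, differences between rows are timedeltas
gaps_td = as_timedelta.diff().dt.total_seconds()
gaps_dt = as_datetime.diff().dt.total_seconds()

print(gaps_td)
print(gaps_dt)
```

Both give the same gaps, so the choice mainly depends on whether the placeholder date would get in the way downstream.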

How to replace python data frame values and concatenate another string with where condition

I want to replace values in the "Time Period" column and append another string, as shown below.
value: 2017M12
Replace M with - and append '-01'
Final result: 2017-12-01
Frequency,Time Period,Date
3,2016M12
3,2016M1
3,2016M8
3,2016M7
3,2016M11
3,2016M10
dt['Date'] = dt.loc[dt['Frequency']=='3',replace('Time Period','M','-')]+'-01'
In [18]: df.loc[df.Frequency==3,'Date'] = \
pd.to_datetime(df.loc[df.Frequency==3, 'Time Period'],
format='%YM%m', errors='coerce')
In [19]: df
Out[19]:
Frequency Time Period Date
0 3 2016M12 2016-12-01
1 3 2016M1 2016-01-01
2 3 2016M8 2016-08-01
3 3 2016M7 2016-07-01
4 3 2016M11 2016-11-01
5 3 2016M10 2016-10-01
In [20]: df.dtypes
Out[20]:
Frequency int64
Time Period object
Date datetime64[ns] # <--- NOTE
dtype: object
You can use apply:
dt['Date'] = dt.loc[dt['Frequency'] == 3, 'Time Period'].apply(lambda x: x.replace('M', '-') + "-01")
output
Frequency Time Period Date
0 3 2016M12 2016-12-01
1 3 2016M1 2016-1-01
2 3 2016M8 2016-8-01
3 3 2016M7 2016-7-01
4 3 2016M11 2016-11-01
5 3 2016M10 2016-10-01
Also, you don't need to create an empty 'Date' column first; dt['Date'] = will create it automatically.
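A minimal sketch combining the two answers, assuming the goal is zero-padded strings such as 2016-01-01. Rows with a different Frequency are masked out before parsing; the third row and the "Date_str" column are made up for illustration.

```python
import pandas as pd

# Two rows from the question plus one hypothetical row with another frequency
df = pd.DataFrame({
    "Frequency": [3, 3, 1],
    "Time Period": ["2016M12", "2016M1", "2016"],
})

mask = df["Frequency"] == 3

# Rows outside the mask become NaN before parsing, hence NaT afterwards
df["Date"] = pd.to_datetime(df["Time Period"].where(mask), format="%YM%m")

# Optional: zero-padded string form of the parsed dates
df["Date_str"] = df["Date"].dt.strftime("%Y-%m-%d")

print(df)
```

Unlike the plain string replacement, parsing first yields 2016-01-01 rather than 2016-1-01.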

Dividing a series containing datetime by a series containing an integer in Pandas

I have a series s1 which is of type datetime and has a time which represents a range between a start time and an end time - typical values are 7 days, 4 hours 5 mins etc. I have series s2 which contains integers for the number of events that happened in that time range.
I want to calculate the event frequency by:
event_freq = s1 / s2
I get the error:
cannot operate on a series with out a rhs of a series/ndarray of type datetime64[ns] or a timedelta
What's the best way to fix this?
Thanks in advance!
EXAMPLE of s1 is:
some_id
1 2012-09-02 09:18:40
3 2012-04-02 09:36:39
4 2012-02-02 09:58:02
5 2013-02-09 14:31:52
6 2012-01-09 12:59:20
EXAMPLE of s2 is:
some_id
1 3
3 1
4 1
5 2
6 1
8 1
10 3
12 2
This might be a bug, but what works is to operate on the underlying numpy array, like so:
import pandas as pd
from pandas import Series
startdate = Series(pd.date_range('2013-01-01', '2013-01-03'))
enddate = Series(pd.date_range('2013-03-01', '2013-03-03'))
s1 = enddate - startdate
s2 = Series([2, 3, 4])
event_freq = Series(s1.values / s2)
Here are the Series:
>>> s1
0 59 days, 00:00:00
1 59 days, 00:00:00
2 59 days, 00:00:00
dtype: timedelta64[ns]
>>> s2
0 2
1 3
2 4
dtype: int64
>>> event_freq
0 29 days, 12:00:00
1 19 days, 16:00:00
2 14 days, 18:00:00
dtype: timedelta64[ns]
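As far as I can tell, recent pandas versions allow dividing a timedelta Series by an integer Series directly, so the .values workaround may no longer be needed. Treat that as an assumption about your installed version. The sketch below also keeps a some_id-style index to show that the division aligns on index labels; the dates and ids are made up.

```python
import pandas as pd

# Hypothetical start/end timestamps indexed by some_id
start = pd.Series(pd.to_datetime(["2013-01-01", "2013-01-02"]), index=[1, 3])
end = pd.Series(pd.to_datetime(["2013-03-01", "2013-03-02"]), index=[1, 3])

# Subtracting timestamps gives a timedelta Series (59 days for each id)
s1 = end - start

# Event counts; id 4 has no matching duration in s1
s2 = pd.Series([2, 3, 4], index=[1, 3, 4])

# Division aligns on the index; ids missing from s1 come out as NaT
event_freq = s1 / s2

print(event_freq)
```

Note that s1 has to hold durations (timedelta64), not timestamps, for the division to be meaningful.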
