I have 3 dataframes with multiple columns. In 2 of them the datetimes are in UTC, and in the other one they are in 'Europe/Amsterdam'. However, they are all still timezone-naive.
How do I make these datasets timezone-aware, and convert the 'Europe/Amsterdam' one to UTC?
The datetimes are in the index of each dataset.
If you're using pandas DataFrames and Python 3, you can do it like this:
import pandas as pd
values = {'dates': ['20190902101010', '20190913202020', '20190921010101'],
          'status': ['Opened', 'Opened', 'Closed']}
df = pd.DataFrame(values, columns=['dates', 'status'])
df['dates_datetime'] = pd.to_datetime(df['dates'], format='%Y%m%d%H%M%S')
df['dates_datetime_tz'] = df.dates_datetime.dt.tz_localize('UTC').dt.tz_convert('Asia/Kolkata')
print(df)
print(df.dtypes)
Result:
dates status dates_datetime dates_datetime_tz
0 20190902101010 Opened 2019-09-02 10:10:10 2019-09-02 15:40:10+05:30
1 20190913202020 Opened 2019-09-13 20:20:20 2019-09-14 01:50:20+05:30
2 20190921010101 Closed 2019-09-21 01:01:01 2019-09-21 06:31:01+05:30
I've converted from UTC to a specific timezone here; you can choose whichever one you need.
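Since the question says the datetimes live in the index, here is a minimal sketch of the same idea applied to a DatetimeIndex (the frames and values are made up for illustration):
import pandas as pd

# Stand-ins for the three frames: two hold UTC wall times,
# one holds Europe/Amsterdam wall times; all indexes are naive.
idx = pd.to_datetime(['2019-09-02 10:10:10', '2019-09-13 20:20:20'])
df_utc = pd.DataFrame({'value': [1, 2]}, index=idx)
df_ams = pd.DataFrame({'value': [3, 4]}, index=idx)

# Localize each naive index to the zone it was recorded in...
df_utc.index = df_utc.index.tz_localize('UTC')
# ...and for the Amsterdam frame, convert to UTC after localizing.
df_ams.index = df_ams.index.tz_localize('Europe/Amsterdam').tz_convert('UTC')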
I have a column with timestamps (strings) which look like the following:
2017-10-25T09:57:00.319Z
2017-10-25T09:59:00.319Z
2017-10-27T11:03:00.319Z
Tbh I do not know the meaning of Z but I guess it is not that important.
How to convert the above strings into correct timestamp to calculate the difference/delta (e.g. in seconds or minutes)?
I want a column listing the deltas from one timestamp to the next.
The trailing Z is the ISO 8601 designator for UTC ("Zulu" time), so pd.to_datetime() will parse these strings as timezone-aware UTC timestamps. Then get the time difference/delta with .diff(). Finally, convert the timedelta to seconds with .dt.total_seconds(), as follows (assuming your column of strings is named Date):
df['Date'] = pd.to_datetime(df['Date'])
df['TimeDelta'] = df['Date'].diff().dt.total_seconds()
Result:
Time delta in seconds:
print(df)
Date TimeDelta
0 2017-10-25 09:57:00.319000+00:00 NaN
1 2017-10-25 09:59:00.319000+00:00 120.0
2 2017-10-27 11:03:00.319000+00:00 176640.0
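For completeness, a self-contained sketch reproducing the above (the column name Date and the extra minutes column are assumptions for illustration):
import pandas as pd

df = pd.DataFrame({'Date': ['2017-10-25T09:57:00.319Z',
                            '2017-10-25T09:59:00.319Z',
                            '2017-10-27T11:03:00.319Z']})
df['Date'] = pd.to_datetime(df['Date'])                  # 'Z' parses as UTC
df['TimeDelta'] = df['Date'].diff().dt.total_seconds()   # delta in seconds
df['TimeDeltaMinutes'] = df['TimeDelta'] / 60            # same delta in minutes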
I have a column of messy data where the date formats differ, and I want them to be coherent datetimes in pandas.
df:
Date
0 1/05/2015
1 15 Jul 2009
2 1-Feb-15
3 12/08/2019
When I run this part:
df['Date'] = pd.to_datetime(df['Date'], format='%d %b %Y', errors='coerce')
I get
Date
0 NaT
1 2009-07-15
2 NaT
3 NaT
How do I convert it all to date time in pandas?
pd.to_datetime is capable of handling multiple date formats in the same column. Specifying a format will hinder its ability to determine the format dynamically, so if there are multiple formats, do not specify one:
import pandas as pd
df = pd.DataFrame({
    'Date': ['1/05/2015', '15 Jul 2009', '1-Feb-15', '12/08/2019']
})
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
print(df)
Date
0 2015-01-05
1 2009-07-15
2 2015-02-01
3 2019-12-08
*There are limits to the ability to handle multiple datetime formats. Mixed timezone-aware and timezone-naive datetimes will not process correctly. Likewise, mixed day-first and month-first notations will not always parse as intended; note that rows 0 and 3 above were read month-first.
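Note also that in pandas 2.0 and later, to_datetime infers a single format from the first value and raises on strings that don't match it; per-element inference has to be requested explicitly. A minimal sketch, assuming pandas >= 2.0:
import pandas as pd

df = pd.DataFrame({'Date': ['1/05/2015', '15 Jul 2009', '1-Feb-15', '12/08/2019']})
# format='mixed' restores per-element format inference (pandas >= 2.0)
df['Date'] = pd.to_datetime(df['Date'], format='mixed', errors='coerce')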
I get a date in my data which looks like this: "2014-12-19T05:00:00". I want to convert it to a Date or String object formatted like "01-04-2018", i.e. "dd-MM-yyyy", in the dataframe. How can I do it?
The result will be used for a time series. So far, my time series result looks wrong, perhaps because the date format isn't detected (the x-axis is not in datetime). [Screenshots of the plot and the Date column omitted.]
For a pandas dataframe column/series:
Convert a string column (dtype of object) to a datetime column (dtype of datetime64[ns]) using to_datetime. Then if you want another column with your datetimes back in a string format of your choosing, use dt.strftime.
An example:
df = pd.DataFrame({
    "Date": ["2014-12-19T05:00:00", "2014-12-20T05:00:00", "2014-12-21T05:00:00"],
    "Value": [0, 2, 4]})
df['DateTime'] = pd.to_datetime(df['Date'])
df['MyDateTimeString'] = df['DateTime'].dt.strftime('%Y-%m-%d')
print(df)
# Date Value DateTime MyDateTimeString
# 0 2014-12-19T05:00:00 0 2014-12-19 05:00:00 2014-12-19
# 1 2014-12-20T05:00:00 2 2014-12-20 05:00:00 2014-12-20
# 2 2014-12-21T05:00:00 4 2014-12-21 05:00:00 2014-12-21
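To get the "dd-MM-yyyy" shape asked for, just change the format string:
df['MyDateTimeString'] = df['DateTime'].dt.strftime('%d-%m-%Y')   # e.g. '19-12-2014'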
In general:
To read your strings into datetime objects, use strptime:
import datetime
d = datetime.datetime.strptime("2014-12-19T05:00:00", "%Y-%m-%dT%H:%M:%S")
Then to get a string representation of those datetime objects, use strftime:
d.strftime("%d-%m-%Y")
For more general string-to-datetime parsing, the dateparser library is handy:
import dateparser
dateparser.parse("2014-12-19T05:00:00").strftime("%d-%m-%Y")
# '19-12-2014'
dateparser.parse("December 19, 2014 at 5am").strftime("%d-%m-%Y")
# '19-12-2014'
I recommend using https://pypi.org/project/python-dateutil/
(Install with pip install python-dateutil.)
>>> import dateutil.parser
>>> d = dateutil.parser.isoparse('2014-12-19T05:00:00')
>>> print(d.strftime('%m-%d-%Y'))
12-19-2014
When converting a pandas dataframe column from object to datetime using the astype function, the behavior differs depending on whether the strings have a time component or not. What is the correct way of converting the column?
df = pd.DataFrame({'Date': ['12/07/2013 21:50:00','13/07/2013 00:30:00','15/07/2013','11/07/2013']})
df['Date'] = pd.to_datetime(df['Date'], format="%d/%m/%Y %H:%M:%S", exact=False, dayfirst=True, errors='ignore')
Output:
Date
0 12/07/2013 21:50:00
1 13/07/2013 00:30:00
2 15/07/2013
3 11/07/2013
but the dtype is still object. When doing:
df['Date'] = df['Date'].astype('datetime64')
it becomes of datetime dtype but the day and month are not parsed correctly on rows 0 and 3.
Date
0 2013-12-07 21:50:00
1 2013-07-13 00:30:00
2 2013-07-15 00:00:00
3 2013-11-07 00:00:00
The expected result is:
Date
0 2013-07-12 21:50:00
1 2013-07-13 00:30:00
2 2013-07-15 00:00:00
3 2013-07-11 00:00:00
If we look at the source code, if you pass both format= and dayfirst= arguments, dayfirst= will never be read, because passing format= calls a C function (np_datetime_strings.c) that doesn't use dayfirst= to make conversions. On the other hand, if you pass only dayfirst=, it will be used first to guess the format, falling back on dateutil.parser.parse to make conversions. So, use only one of them.
In most cases,
df['Date'] = pd.to_datetime(df['Date'])
does the job.
In the specific example in the OP, passing dayfirst=True does the job.
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
That said, passing format= makes the conversion run ~25x faster (see this post for more info), so if your frame is anything larger than 10k rows, it's better to pass format=. Now since the format is mixed, one way is to perform the conversion in two steps (the errors='coerce' argument will be useful):
1. convert the datetimes with a time component;
2. fill in the NaT values (the "coerced" rows) with a Series converted using the date-only format.
with_time = pd.to_datetime(df['Date'], format='%d/%m/%Y %H:%M:%S', errors='coerce')
date_only = pd.to_datetime(df['Date'], format='%d/%m/%Y', errors='coerce')
df['Date'] = with_time.fillna(date_only)
(Both passes must parse the original strings, so df['Date'] is only overwritten at the end.)
This method (of performing two or more passes) can be used to convert any column with "weirdly" formatted datetimes.
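The same idea generalizes. Below is a minimal sketch of a hypothetical helper (parse_multi is not a pandas function) that applies a list of formats in order, each pass filling the NaT gaps left by the previous one:
import pandas as pd

def parse_multi(s, formats):
    # Try each format in turn; later passes only fill the NaT gaps
    # left by earlier ones. Assumes s still holds the raw strings.
    out = pd.Series(pd.NaT, index=s.index, dtype='datetime64[ns]')
    for fmt in formats:
        out = out.fillna(pd.to_datetime(s, format=fmt, errors='coerce'))
    return out

df['Date'] = parse_multi(df['Date'], ['%d/%m/%Y %H:%M:%S', '%d/%m/%Y'])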
I am working with Python 3.5.2, pandas 0.18.1 and sqlite3.
In my database, I have a column unix_time with INTs for seconds since 1970. Ideally I want to read my dataframe from sqlite, and then create a time column corresponding to the datetime or pandas.tslib.Timestamp conversion of the unix_time column, which I would only use for some processing and then drop before saving the dataframe back.
The issue is that when parsing the unix_time column using:
df = pd.read_sql_query("SELECT * FROM test", con, parse_dates=['unix_time'])
I obtain pandas.tslib.Timestamp types, which is fine for my processing, but then I have to recreate my original unix_time column using:
df['unix_time'][i] = (df['unix_time'][i] - datetime(1970,1,1)).total_seconds()
which is really 'dirty'.
First question : Do you have a better way?
I thought about giving up the unix time format and only using the datetime format, but pandas' to_datetime method in fact returns pandas.tslib.Timestamp... And anyway, doing so would force me to iterate over all rows, which is a bad solution. (It is impossible to apply to_datetime to anything other than a view over a single cell of the dataframe.)
Second question : Is it possible to apply it on a series?
My last try was to use df['time'] = datetime.datetime.fromtimestamp(df['unix_time']) directly but, surprisingly, it also returns pandas.tslib.Timestamp.
In the end, knowing that I can only save unix timestamps or datetimes, my only choices for the moment are:
- parsing, but then having to convert them back to unix timestamps one by one, or
- not parsing, but having to convert them to pandas.tslib.Timestamp one by one.
It would be great if I could convert a whole series.
Last question: Is there a way to convert a series of unix timestamps to datetime (or at least pandas.tslib.Timestamp), or a series of pandas.tslib.Timestamps (or datetimes) to unix timestamps?
Thanks
EDIT:
During my processing, I extract a row that I want to append to my dataset. Apparently, the conversion to pandas.tslib.Timestamp happens implicitly when going from a dataframe to a series:
import numpy as np
import pandas as pd

df = pd.DataFrame({'UNX': pd.date_range('2016-01-01', freq='9999S', periods=10).astype(np.int64)//10**9})
df['Date'] = pd.to_datetime(df.UNX, unit='s')
print(df.Date.dtypes)
print(type(df['Date'][0]))
test = df.iloc[0]
print(type(test.Date))
new_df = test.to_frame().transpose()  # from here, new_df.to_sql("test", con) fails because the type for 'Date' is not supported
print(new_df.Date.dtypes)
returns
datetime64[ns]
<class 'pandas.tslib.Timestamp'>
<class 'pandas.tslib.Timestamp'>
object
Is there a way to convert the 'Date' in new_df from pandas.tslib.Timestamp to datetime64[ns] or datetime.datetime (or simply str)?
IIUC you can do it this way:
In [96]: df = pd.DataFrame({'UNX':pd.date_range('2016-01-01', freq='9999S', periods=10).astype(np.int64)//10**9})
In [97]: df
Out[97]:
UNX
0 1451606400
1 1451616399
2 1451626398
3 1451636397
4 1451646396
5 1451656395
6 1451666394
7 1451676393
8 1451686392
9 1451696391
Convert UNIX epoch to Python datetime:
In [98]: df['Date'] = pd.to_datetime(df.UNX, unit='s')
In [99]: df
Out[99]:
UNX Date
0 1451606400 2016-01-01 00:00:00
1 1451616399 2016-01-01 02:46:39
2 1451626398 2016-01-01 05:33:18
3 1451636397 2016-01-01 08:19:57
4 1451646396 2016-01-01 11:06:36
5 1451656395 2016-01-01 13:53:15
6 1451666394 2016-01-01 16:39:54
7 1451676393 2016-01-01 19:26:33
8 1451686392 2016-01-01 22:13:12
9 1451696391 2016-01-02 00:59:51
Convert datetime to UNIX epoch:
In [100]: df['UNX2'] = df.Date.astype('int64')//10**9
In [101]: df
Out[101]:
UNX Date UNX2
0 1451606400 2016-01-01 00:00:00 1451606400
1 1451616399 2016-01-01 02:46:39 1451616399
2 1451626398 2016-01-01 05:33:18 1451626398
3 1451636397 2016-01-01 08:19:57 1451636397
4 1451646396 2016-01-01 11:06:36 1451646396
5 1451656395 2016-01-01 13:53:15 1451656395
6 1451666394 2016-01-01 16:39:54 1451666394
7 1451676393 2016-01-01 19:26:33 1451676393
8 1451686392 2016-01-01 22:13:12 1451686392
9 1451696391 2016-01-02 00:59:51 1451696391
Check:
In [102]: df.UNX.eq(df.UNX2).all()
Out[102]: True
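As for the EDIT: after test.to_frame().transpose(), the 'Date' column comes back as object dtype. A minimal sketch of one way to recover datetime64[ns] before writing (assuming the new_df from the question):
new_df['Date'] = pd.to_datetime(new_df['Date'])  # object -> datetime64[ns]
new_df.to_sql("test", con) should then accept the column.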
Round trip between Pandas Timestamp and Unix Seconds (since 1970-01-01):
date_in = pd.to_datetime("2021-04-07")
# type(date_in) is: pandas._libs.tslibs.timestamps.Timestamp
unix_seconds = date_in.value // 10**9   # .value is nanoseconds since the epoch
date_out = pd.to_datetime(unix_seconds, unit="s")
Output:
date_in
Out[1]: Timestamp('2021-04-07 00:00:00')
unix_seconds
Out[2]: 1617753600
date_out
Out[3]: Timestamp('2021-04-07 00:00:00')
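An equivalent way to get the seconds, which avoids relying on .value being in nanoseconds, is the subtraction idiom from the pandas docs:
unix_seconds = (date_in - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")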