I'm looking to convert a UNIX timestamp object to pandas date time. I'm importing the timestamps from a separate source, which displays a date time of 21-01-22 00:01 for the first timepoint and 21-01-22 00:15 for the second time point. Yet my conversion is 10 hours behind these two. Is this related to the +1000 at the end of each string?
import pandas as pd
df = pd.DataFrame({
    'Time': ['/Date(1642687260000+1000)/', '/Date(1642688100000+1000)/'],
})
df['Time'] = df['Time'].str.split('+').str[0]
df['Time'] = df['Time'].str.split('(').str[1]
df['Time'] = pd.to_datetime(df['Time'], unit = 'ms')
Out:
Time
0 2022-01-20 14:01:00
1 2022-01-20 14:15:00
Other source:
Time
0 2022-01-21 00:01:00
1 2022-01-21 00:15:00
You could use a regex to extract Unix time and UTC offset, then parse Unix time to datetime and add the UTC offset as a timedelta, e.g.
import pandas as pd
df = pd.DataFrame({
'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/', None],
})
df[['unix', 'offset']] = df['Time'].str.extract(r'(\d+)([+-]\d+)')
# datetime from unix first, leaves NaT for invalid values
df['datetime'] = pd.to_datetime(df['unix'], unit='ms')
# where datetime is not NaT, add the offset:
df.loc[~df['datetime'].isnull(), 'datetime'] += (
    pd.to_datetime(df['offset'][~df['datetime'].isnull()], format='%z').apply(lambda t: t.utcoffset())
)
# or without the apply, but by using an underscored method:
# df['datetime'] = (pd.to_datetime(df['unix'], unit='ms') +
# pd.to_datetime(df['offset'], format='%z').dt.tz._offset)
df['datetime']
# 0 2022-01-21 00:01:00
# 1 2022-01-21 00:15:00
# 2 NaT
# Name: datetime, dtype: datetime64[ns]
Unfortunately, you'll have to use an underscored ("private") method, if you want to avoid the apply. This also only works if you have a constant offset, i.e. if it's the same offset throughout the whole series.
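If the offsets are not constant, one alternative is to pull the sign, hours and minutes out of the string yourself and build a timedelta from them, which avoids both the apply and the underscored attribute. This is only a sketch: the `-0530` offset in the second row is made up to show a varying offset, the column names `unix`, `sign`, `hh`, `mm` are arbitrary, and it assumes offsets are always written as four digits.

```python
import pandas as pd

df = pd.DataFrame({
    # second offset (-0530) is hypothetical, only to show a non-constant offset
    'Time': ['/Date(1642687260000+1000)/', '/Date(1642688100000-0530)/', None],
})

# capture Unix milliseconds, sign, offset hours and offset minutes;
# rows that do not match (e.g. None) are dropped here
parts = df['Time'].str.extract(r'(\d+)([+-])(\d{2})(\d{2})').dropna()
parts.columns = ['unix', 'sign', 'hh', 'mm']

# cast to integers before passing unit='ms'
utc = pd.to_datetime(parts['unix'].astype('int64'), unit='ms')

# signed offset in minutes
minutes = parts['hh'].astype(int) * 60 + parts['mm'].astype(int)
minutes = minutes.where(parts['sign'] == '+', -minutes)

# index alignment leaves NaT in the rows that were dropped above
df['datetime'] = utc + pd.to_timedelta(minutes, unit='min')
print(df['datetime'])
```

The result is still a naive (timezone-unaware) datetime column holding local wall-clock times, the same as in the answer above.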
I have a column with timestamps (strings) which look like the following:
2017-10-25T09:57:00.319Z
2017-10-25T09:59:00.319Z
2017-10-27T11:03:00.319Z
Tbh I do not know the meaning of Z but I guess it is not that important.
How to convert the above strings into correct timestamp to calculate the difference/delta (e.g. in seconds or minutes)?
I want to have a column that lists the deltas between each timestamp and the previous one.
You can use pd.to_datetime() to convert the string to datetime format. Then get the time difference/delta by .diff(). Finally, convert the timedelta to seconds by .dt.total_seconds(), as follows:
(Assuming your column of string is named Date):
df['Date'] = pd.to_datetime(df['Date'])
df['TimeDelta'] = df['Date'].diff().dt.total_seconds()
Result:
Time delta in seconds:
print(df)
Date TimeDelta
0 2017-10-25 09:57:00.319000+00:00 NaN
1 2017-10-25 09:59:00.319000+00:00 120.0
2 2017-10-27 11:03:00.319000+00:00 176640.0
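For reference, here is the same approach as a self-contained snippet using the three timestamps from the question. The trailing Z stands for UTC, which is why the parsed column is timezone-aware (+00:00). The `TimeDeltaMin` column name is just an illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': [
        '2017-10-25T09:57:00.319Z',
        '2017-10-25T09:59:00.319Z',
        '2017-10-27T11:03:00.319Z',
    ]
})

# the trailing "Z" marks UTC, so the result is a tz-aware datetime column
df['Date'] = pd.to_datetime(df['Date'])

# difference to the previous row; the first row has no predecessor, hence NaN
df['TimeDelta'] = df['Date'].diff().dt.total_seconds()

# the same deltas in minutes
df['TimeDeltaMin'] = df['TimeDelta'] / 60
print(df)
```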
I am struggling with datetime format... This is my dataframe in pandas:
Datetime Date Field
2020-01-12 00:00:00 2020-12-01 6.543916
2020-01-12 00:10:00 2020-12-01 6.505547
2020-01-12 00:20:00 2020-12-01 7.047578
2020-01-12 00:30:00 2020-12-01 6.070998
2020-01-12 00:40:00 2020-12-01 6.452112
df.dtypes
Datetime object
Date datetime64[ns]
Field float64
I need to convert Datetime to datetime64 and swap months with days to get values in the format %Y-%m-%d %H:%M:%S, e.g. 2020-12-01 00:00:00.
import pandas as pd
from datetime import datetime
df["Datetime"] = pd.to_datetime(df["Datetime"])
df["Datetime"] = df["Datetime"].apply(lambda x: datetime.strftime(x, "%Y-%m-%d %H:%M:%S"))
Still I get the same dataframe as shown above...
Consider passing the errors parameter, so that values that cannot be parsed become NaT instead of raising an error:
df["Datetime"] = pd.to_datetime(df["Datetime"], errors='coerce')
See if it helps you!
I think you'll get what you want with "%Y-%d-%m %H:%M:%S" instead of "%Y-%m-%d %H:%M:%S" on your last line.
EDIT: Or better even, simply replace the last 2 lines of your code by the following:
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%Y-%d-%m %H:%M:%S")
That way you won't get a ParserError: month must be in 1..12 from pd.to_datetime in the case where your Datetime column contains something like "2020-30-12 00:00:00"
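A small sketch of the explicit-format approach, assuming the strings really are year-day-month as described in the question. The first two values come from the question and the third is the "2020-30-12" case mentioned above.

```python
import pandas as pd

raw = pd.Series([
    "2020-01-12 00:00:00",
    "2020-01-12 00:10:00",
    "2020-30-12 00:00:00",
])

# tell pandas explicitly that the middle field is the day and the last one the month
parsed = pd.to_datetime(raw, format="%Y-%d-%m %H:%M:%S")
print(parsed)

# only needed if you want strings back; the datetime64 column itself has no format
as_text = parsed.dt.strftime("%Y-%m-%d %H:%M:%S")
print(as_text)
```

Note that a datetime64 column is always displayed as year-month-day; the format string only controls how the input text is read.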
I have a dataset with two columns: Actual Time and Promised Time (representing the actual and promised start times of some process).
For example:
import pandas as pd
example_df = pd.DataFrame(columns = ['Actual Time', 'Promised Time'],
data = [
('2016-6-10 9:00', '2016-6-10 9:00'),
('2016-6-15 8:52', '2016-6-15 9:52'),
('2016-6-19 8:54', '2016-6-19 9:02')]).applymap(pd.Timestamp)
So as we can see, sometimes Actual Time = Promised Time, but there are also cases where Actual Time < Promised Time.
I defined a column that shows the difference between these two columns (example_df['Actual Time']-example_df['Promised Time']), but the problem is that for the third row it returned -1 day +23:52:00 instead of - 00:08:00.
Sample:
print (df)
Actual Time Promised Time
0 2016-6-10 9:00 2016-6-10 9:00
1 2016-6-15 10:52 2016-6-15 9:52 <- changed datetimes
2 2016-6-19 8:54 2016-6-19 9:02
def format_timedelta(x):
    ts = x.total_seconds()
    if ts >= 0:
        hours, remainder = divmod(ts, 3600)
        minutes, seconds = divmod(remainder, 60)
        return ('{}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))
    else:
        hours, remainder = divmod(-ts, 3600)
        minutes, seconds = divmod(remainder, 60)
        return ('-{}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))
First create datetimes:
df['Actual Time'] = pd.to_datetime(df['Actual Time'])
df['Promised Time'] = pd.to_datetime(df['Promised Time'])
And then timedeltas:
df['diff'] = (df['Actual Time'] - df['Promised Time'])
Converting negative timedeltas to seconds with Series.dt.total_seconds works nicely:
df['diff1'] = df['diff'].dt.total_seconds()
But if you want negative timedeltas as strings, you need a custom function, because strftime is not yet implemented for timedeltas:
df['diff2'] = df['diff'].apply(format_timedelta)
print (df)
Actual Time Promised Time diff diff1 diff2
0 2016-06-10 09:00:00 2016-06-10 09:00:00 00:00:00 0.0 0:00:00
1 2016-06-15 10:52:00 2016-06-15 09:52:00 01:00:00 3600.0 1:00:00
2 2016-06-19 08:54:00 2016-06-19 09:02:00 -1 days +23:52:00 -480.0 -0:08:00
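If the apply becomes slow on a large frame, the same sign-aware formatting can be sketched with vectorized Series operations. This is an alternative under assumptions, not part of the answer above: it assumes whole-second differences and no missing values (NaT), and all variable names are illustrative.

```python
import pandas as pd

# the three differences from the sample above: 0, +1 hour, -8 minutes
diff = pd.Series(pd.to_timedelta([0, 3600, -480], unit='s'))

total = diff.dt.total_seconds().astype('int64')   # signed seconds
sign = total.lt(0).map({True: '-', False: ''})    # '-' only for negative values
mag = total.abs()                                 # work with the magnitude

hours = (mag // 3600).astype(str)
minutes = ((mag % 3600) // 60).astype(str).str.zfill(2)
seconds = (mag % 60).astype(str).str.zfill(2)

formatted = sign + hours + ':' + minutes + ':' + seconds
print(formatted.tolist())
```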
I assume your dataframe is already in datetime dtype. abs works just fine:
Without abs
df['Actual Time'] - df['Promised Time']
Out[526]:
0 00:00:00
1 -1 days +23:00:00
2 -1 days +23:52:00
dtype: timedelta64[ns]
With abs
abs(df['Promised Time'] - df['Actual Time'])
Out[529]:
0 00:00:00
1 01:00:00
2 00:08:00
dtype: timedelta64[ns]
The result of the subtraction is a timedelta, which is stored at nanosecond resolution by default.
You need to convert the result to the unit you want:
import pandas as pd
df=pd.DataFrame(data={
'Actual Time':['2016-6-10 9:00','2016-6-15 8:52','2016-6-19 8:54'],
'Promised Time':['2016-6-10 9:00','2016-6-15 9:52','2016-6-19 9:02']
},dtype='datetime64[ns]')
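# convert to the unit you want; total_seconds() / 60 gives minutes as floats
# (older pandas also accepted .astype('timedelta64[m]'), but pandas 2.x only supports s, ms, us and ns resolutions)
df['diff']=(df['Actual Time']-df['Promised Time']).dt.total_seconds() / 60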
I have a variable in my dataframe that looks like
Day
2015-01-01
2015-02-01
I would like to generate a time variable that gives me the UTC timestamp corresponding to 9:00 am EST on the day in Day (always 9 am, don't ask why...).
Precisely, considering the output for row one, I mean going from 2015-01-01 to 2015-01-01 9:00 EST to 2015-01-01 14:00 UTC. How can I do that in Pandas?
Thanks!
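import pandas as pd
df = pd.DataFrame({'Day': pd.date_range('1/1/2015 09:00', periods=10, freq='D')})
# on a Series the tz methods live under the .dt accessor;
# localize to the zone the times are in, then convert to the target zone
df['Day'] = df['Day'].dt.tz_localize('US/Eastern')
df['Day_converted'] = df['Day'].dt.tz_convert('UTC')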
After creating the timeseries you can localize it to a zone with .tz_localize and then convert it.
thats probably the better solution
df['time']=df['Day'].apply(lambda x: x + ' 9:00')
df['time_utc']=df['time'].apply(lambda x: pd.Timestamp(x).tz_localize('US/Eastern').tz_convert('UTC'))
If you'd like to do it with individual variables:
import pandas as pd
from datetime import timedelta
ts = pd.Timestamp('2015-01-01') # Going from 2015-01-01 to ...
ts = ts + timedelta(hours=9) # 2015-01-01 9:00 ...
ts = ts.tz_localize('US/Eastern') # EST to
ts = ts.tz_convert('UTC') # 2015-01-01 14:00 UTC
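The same steps can also be applied to the whole column at once through the .dt accessor. A sketch, assuming Day holds date strings as in the question; the third row (2015-07-01) is added here only to show that daylight saving time is handled, since 'US/Eastern' switches between EST and EDT.

```python
import pandas as pd

# first two days are from the question, the July date is an added illustration
df = pd.DataFrame({'Day': ['2015-01-01', '2015-02-01', '2015-07-01']})

# midnight of each day plus nine hours gives 9:00 local wall-clock time
local_9am = pd.to_datetime(df['Day']) + pd.Timedelta(hours=9)

# attach the Eastern zone to the naive times, then convert to UTC
df['time_utc'] = local_9am.dt.tz_localize('US/Eastern').dt.tz_convert('UTC')
print(df)
```

In winter the result is 14:00 UTC, while for the July row it is 13:00 UTC because Eastern time is then four hours behind UTC.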
I have a Pandas Dataframe df:
a date
1 2014-06-29 00:00:00
df.dtypes returns:
a object
date object
I want to convert the date column to a date without the time, but:
df['date']=df['date'].astype('datetime64[s]')
returns:
a date
1 2014-06-28 22:00:00
df.dtypes returns:
a object
date datetime64[ns]
But the value is wrong.
I'd like to have:
a date
1 2014-06-29
or:
a date
1 2014-06-29 00:00:00
I would start by converting your dates with pd.to_datetime:
df['date'] = pd.to_datetime(df.date)
Now, you can see that the time component is still there:
df.date.values
array(['2014-06-28T19:00:00.000000000-0500'], dtype='datetime64[ns]')
If you are OK with having an object column (strings) again, you want:
df['date'] = [x.strftime("%Y-%m-%d") for x in df.date]
This version ends with datetime.date objects instead:
df['date'] = [x.date() for x in df.date]
df.date
datetime.date(2014, 6, 29)
Here you go. Just use this pattern:
pd.to_datetime(df['date']).dt.date
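To compare the two outcomes asked for in the question, here is a short sketch assuming the column holds strings like the one shown there. The column names `midnight` and `date_only` are made up for the example.

```python
import datetime

import pandas as pd

df = pd.DataFrame({'a': ['1'], 'date': ['2014-06-29 00:00:00']})
df['date'] = pd.to_datetime(df['date'])

# keeps the datetime64 dtype, with the time part set to 00:00:00
df['midnight'] = df['date'].dt.normalize()

# object column holding plain datetime.date values (no time part at all)
df['date_only'] = df['date'].dt.date

print(df.dtypes)
print(df)
```

Keeping the datetime64 dtype is usually more convenient if you still want to use the .dt accessor or do date arithmetic afterwards.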