I am struggling with datetime format... This is my dataframe in pandas:
Datetime Date Field
2020-01-12 00:00:00 2020-12-01 6.543916
2020-01-12 00:10:00 2020-12-01 6.505547
2020-01-12 00:20:00 2020-12-01 7.047578
2020-01-12 00:30:00 2020-12-01 6.070998
2020-01-12 00:40:00 2020-12-01 6.452112
df.dtypes
Datetime object
Date datetime64[ns]
Field float64
I need to convert Datetime to datetime64 and swap months with days to get values in the format %Y-%m-%d %H:%M:%S, e.g. 2020-12-01 00:00:00.
import pandas as pd
from datetime import datetime
df["Datetime"] = pd.to_datetime(df["Datetime"])
df["Datetime"] = df["Datetime"].apply(lambda x: datetime.strftime(x, "%Y-%m-%d %H:%M:%S"))
Still I get the same dataframe as shown above...
Consider passing the errors parameter:
df["Datetime"] = pd.to_datetime(df["Datetime"], errors='coerce')
See if it helps you!
I think you'll get what you want with "%Y-%d-%m %H:%M:%S" instead of "%Y-%m-%d %H:%M:%S" on your last line.
EDIT: Or, even better, simply replace the last two lines of your code with the following:
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%Y-%d-%m %H:%M:%S")
That way, pd.to_datetime won't raise ParserError: month must be in 1..12 when your Datetime column contains something like "2020-30-12 00:00:00".
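To illustrate the suggestion above, here is a minimal sketch. The three sample strings are assumptions modeled on the question's data. Keep in mind that a datetime64 column stores no display format, so calling strftime on it only produces strings again; the day/month swap has to happen at parse time.

```python
import pandas as pd

# Hypothetical sample: strings stored as year-day-month
df = pd.DataFrame({"Datetime": ["2020-01-12 00:00:00",
                                "2020-01-12 00:10:00",
                                "2020-30-12 00:20:00"]})

# Tell pandas explicitly that the day comes before the month
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%Y-%d-%m %H:%M:%S")

print(df["Datetime"].iloc[0])  # 2020-12-01 00:00:00
print(df["Datetime"].iloc[2])  # 2020-12-30 00:20:00
```

If the values still need to be shown in a particular layout later, format them only at output time, for example with df["Datetime"].dt.strftime(...), and keep the column itself as datetime64.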
Related
I'm trying to convert a UNIX timestamp to a pandas datetime. The timestamps come from a separate source, which displays 21-01-22 00:01 for the first time point and 21-01-22 00:15 for the second. Yet my converted values are 10 hours behind these two. Is this related to the +1000 at the end of each string?
df = pd.DataFrame({
'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/'],
})
df['Time'] = df['Time'].str.split('+').str[0]
df['Time'] = df['Time'].str.split('(').str[1]
df['Time'] = pd.to_datetime(df['Time'], unit = 'ms')
Out:
Time
0 2022-01-20 14:01:00
1 2022-01-20 14:15:00
Other source:
Time
0 2022-01-21 00:01:00
1 2022-01-21 00:15:00
You could use a regex to extract Unix time and UTC offset, then parse Unix time to datetime and add the UTC offset as a timedelta, e.g.
import pandas as pd
df = pd.DataFrame({
'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/', None],
})
df[['unix', 'offset']] = df['Time'].str.extract(r'(\d+)([+-]\d+)')
# datetime from unix first, leaves NaT for invalid values
df['datetime'] = pd.to_datetime(df['unix'], unit='ms')
# where datetime is not NaT, add the offset:
df.loc[~df['datetime'].isnull(), 'datetime'] += (
pd.to_datetime(df['offset'][~df['datetime'].isnull()], format='%z').apply(lambda t: t.utcoffset())
)
# or without the apply, but by using an underscored method:
# df['datetime'] = (pd.to_datetime(df['unix'], unit='ms') +
# pd.to_datetime(df['offset'], format='%z').dt.tz._offset)
df['datetime']
# 0 2022-01-21 00:01:00
# 1 2022-01-21 00:15:00
# 2 NaT
# Name: datetime, dtype: datetime64[ns]
Unfortunately, you'll have to use an underscored ("private") method, if you want to avoid the apply. This also only works if you have a constant offset, i.e. if it's the same offset throughout the whole series.
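If the offset can differ from row to row, one possible alternative is to convert the offset to a timedelta yourself, which avoids both the apply and the private attribute. This is only a sketch: the group names unix, sign, hh and mm are made up for illustration, and it assumes the offset always has the form +HHMM or -HHMM.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["/Date(1642687260000+1000)/", "/Date(1642688100000+1000)/", None],
})

# Split into Unix milliseconds, sign, offset hours and offset minutes
parts = df["Time"].str.extract(
    r"(?P<unix>\d+)(?P<sign>[+-])(?P<hh>\d{2})(?P<mm>\d{2})"
)

# Unix time -> naive UTC datetime; missing rows become NaT
utc = pd.to_datetime(pd.to_numeric(parts["unix"], errors="coerce"), unit="ms")

# Offset in minutes, computed per row (NaN where the input was missing)
sign = parts["sign"].map({"+": 1, "-": -1})
minutes = sign * (pd.to_numeric(parts["hh"]) * 60 + pd.to_numeric(parts["mm"]))
offset = pd.to_timedelta(minutes, unit="m")

# Local wall time = UTC + offset
df["datetime"] = utc + offset
print(df["datetime"])
```

For the two sample rows this gives 2022-01-21 00:01:00 and 2022-01-21 00:15:00, and NaT for the missing value.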
I am trying to convert the "Time" column in my pandas DataFrame, which has the form HH:MM:SS, from object to datetime64, because I want to filter by hour.
I tried new['Time'] = pd.to_datetime(new['Time'], format='%H:%M:%S').dt.time which has no effect at all (it is still an object).
I also tried new['Time'] = pd.to_datetime(new['Time'],infer_datetime_format=True)
which gets the error message: TypeError: <class 'datetime.time'> is not convertible to datetime
I want to be able to sort my DataFrame by hour.
How do I convert the object column to the hour?
Can I then filter by hour (for example, everything after 8 am), or do I have to enter the exact value with minutes and seconds to filter on it?
Thank you
If you want your df['Time'] to be of type datetime64 just use
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S')
print(df['Time'])
This will result in the following column
0 1900-01-01 00:00:00
1 1900-01-01 00:01:00
2 1900-01-01 00:02:00
3 1900-01-01 00:03:00
4 1900-01-01 00:04:00
...
1435 1900-01-01 23:55:00
1436 1900-01-01 23:56:00
1437 1900-01-01 23:57:00
1438 1900-01-01 23:58:00
1439 1900-01-01 23:59:00
Name: Time, Length: 1440, dtype: datetime64[ns]
If you just want to extract the hour from the timestamp, extend pd.to_datetime(...) with .dt.hour.
If you want to group your values on an hourly basis you can also use (after converting the df['Time'] to datetime):
new_df = df.groupby(pd.Grouper(key='Time', freq='H'))['Value'].agg({pd.Series.to_list})
This will return all values grouped by hour.
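As for filtering "everything after 8 am": once the column is datetime64 you can compare the hour directly, without spelling out minutes and seconds. A small sketch, with made-up sample rows and a hypothetical Value column:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Time": ["07:59:59", "08:00:00", "13:45:10"],
    "Value": [1, 2, 3],
})

# Parse the strings; pandas fills in the dummy date 1900-01-01
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S")

# Keep rows at or after 08:00 by comparing the hour component
after_8 = df[df["Time"].dt.hour >= 8]
print(after_8)
```

Here after_8 keeps the rows with Value 2 and 3.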
IIUC, your column already holds time objects from the datetime module, which is why pd.to_datetime raises the TypeError.
Suppose this dataframe:
from datetime import time
df = pd.DataFrame({'Time': [time(10, 39, 23), time(8, 47, 59), time(9, 21, 12)]})
print(df)
# Output:
Time
0 10:39:23
1 08:47:59
2 09:21:12
A few operations:
# Check whether you really have `time` instances
>>> df['Time'].iloc[0]
datetime.time(10, 39, 23)
# Sort values by time
>>> df.sort_values('Time')
Time
1 08:47:59
2 09:21:12
0 10:39:23
# Extract rows between 08:00 and 09:00
>>> df[df['Time'].between(time(8), time(9))]
Time
1 08:47:59
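If you still want the hour as a number, or a real datetime64 column, here is a sketch of two options, assuming the column holds datetime.time objects without microseconds as in the sample above. The column name Hour and the variable as_dt are just illustrative.

```python
from datetime import time

import pandas as pd

df = pd.DataFrame({"Time": [time(10, 39, 23), time(8, 47, 59), time(9, 21, 12)]})

# Option 1: read the hour straight from each time object
df["Hour"] = df["Time"].map(lambda t: t.hour)

# Option 2: go through strings to get a datetime64 column
# (the date part becomes the dummy date 1900-01-01)
as_dt = pd.to_datetime(df["Time"].astype(str), format="%H:%M:%S")

print(df[df["Hour"] >= 9])
print(as_dt.dt.hour.tolist())  # [10, 8, 9]
```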
Below is the sample data
Datetime
11/19/2020 9:48:50 AM
12/17/2020 2:41:02 PM
2020-02-11 14:44:58
2020-28-12 10:41:02
2020-05-12 06:31:39
11/19/2020 is in mm/dd/yyyy whereas 2020-28-12 is yyyy-dd-mm.
After applying pd.to_datetime below is the output that I am getting.
Date
2020-11-19 09:48:50
2020-12-17 22:41:02
2020-02-11 14:44:58
2020-28-12 10:41:02
2020-05-12 06:31:39
If the input value contains slashes (/), e.g. 11/19/2020, its format is mm/dd/yyyy; if it contains dashes (-), e.g. 2020-02-11, its format is yyyy-dd-mm. But after applying pd.to_datetime, the day and month are interchanged.
The first two outputs are correct. The bottom three need to be corrected to:
2020-11-02 14:44:58
2020-12-28 10:41:02
2020-12-05 06:31:39
Please suggest how to get a common format, i.e. yyyy-mm-dd.
Call to_datetime once per format, passing errors='coerce' so that values that do not match become NaT, and then fill those missing values from the other Series with Series.fillna:
d1 = pd.to_datetime(df['datetime'], format='%Y-%d-%m %H:%M:%S', errors='coerce')
d2 = pd.to_datetime(df['datetime'], format='%m/%d/%Y %I:%M:%S %p', errors='coerce')
df['datetime'] = d1.fillna(d2)
print (df)
datetime
0 2020-11-19 09:48:50
1 2020-12-17 14:41:02
2 2020-11-02 14:44:58
3 2020-12-28 10:41:02
4 2020-12-05 06:31:39
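Since the question says the separator determines the format, another option is to pick the format per row based on the separator instead of relying on errors='coerce'. The sketch below is one way to do that (the names s, has_slash and out are illustrative); unlike the version above, a malformed value raises an error rather than silently turning into NaT.

```python
import pandas as pd

s = pd.Series([
    "11/19/2020 9:48:50 AM",
    "12/17/2020 2:41:02 PM",
    "2020-02-11 14:44:58",
    "2020-28-12 10:41:02",
    "2020-05-12 06:31:39",
])

# Rows containing a slash use mm/dd/yyyy with a 12-hour clock
has_slash = s.str.contains("/", regex=False)

# Parse each group with its own format; the other rows are masked to NaN/NaT
slash_part = pd.to_datetime(s.where(has_slash), format="%m/%d/%Y %I:%M:%S %p")
dash_part = pd.to_datetime(s.where(~has_slash), format="%Y-%d-%m %H:%M:%S")

# Combine the two partial results
out = slash_part.fillna(dash_part)
print(out)
```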
I am doing a project on cab fare prediction using Python. While changing the datetime column from object to datetime type, I am getting an error. Please help. The values look like this: 2009-06-15 17:26:21 UTC
I tried the code below:
df_train["pickup_datetime"]= pd.to_datetime(df_train["pickup_datetime"], format= "%Y-%m-%d %H:%M:%S UTC")
ValueError: time data '43' does not match format '%Y-%m-%d %H:%M:%S UTC' (match)
I am able to convert your datetime strings without any issues using pandas, so there might be something else going on:
import pandas as pd
li = ['2009-06-15 17:26:21 UTC', '2010-01-05 16:52:16 UTC', '2011-08-18 00:35:00 UTC', '2012-04-21 04:30:42 UTC', '2010-03-09 07:51:00 UTC',
'2011-01-06 09:50:45 UTC', '2012-11-20 20:35:00 UTC', '2012-01-04 17:22:00 UTC']
df = pd.Series(li)
df = pd.to_datetime(df, format= "%Y-%m-%d %H:%M:%S UTC")
print(df)
I get the output
0 2009-06-15 17:26:21
1 2010-01-05 16:52:16
2 2011-08-18 00:35:00
3 2012-04-21 04:30:42
4 2010-03-09 07:51:00
5 2011-01-06 09:50:45
6 2012-11-20 20:35:00
7 2012-01-04 17:22:00
dtype: datetime64[ns]
I am also doing the same project.
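The column contains an invalid value, '43', which is what the error message is reporting. Drop that row first:
df_train.drop(df_train[df_train['pickup_datetime'] == '43'].index, inplace=True)
Then the conversion will work.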
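If there may be other malformed values besides '43', a more general sketch is to let pandas mark everything that does not match the format as NaT and then inspect or drop those rows. The three sample rows below are made up, and the names parsed and bad_rows are illustrative.

```python
import pandas as pd

# Hypothetical sample with one malformed entry
df_train = pd.DataFrame({
    "pickup_datetime": [
        "2009-06-15 17:26:21 UTC",
        "43",
        "2010-01-05 16:52:16 UTC",
    ],
})

# Values that do not match the format become NaT instead of raising
parsed = pd.to_datetime(
    df_train["pickup_datetime"],
    format="%Y-%m-%d %H:%M:%S UTC",
    errors="coerce",
)

# Look at the offending rows before dropping them
bad_rows = df_train[parsed.isna()]
print(bad_rows)

# Keep only the rows that parsed successfully
df_train = df_train.assign(pickup_datetime=parsed).dropna(subset=["pickup_datetime"])
```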
I have a column which contains values like this
04-04-2007
14-03-2008
14-03-2008
2011-10-10 00:00:00
2011-10-10 00:00:00
27-04-2012
27-04-2012
28-03-2014
2014-03-28 00:00:00
2017-03-31 00:00:00
2017-03-31 00:00:00
2018-04-02 00:00:00
As can be seen, some values are datetimes and some are strings. I want all of them to be converted to datetime.
pd.to_datetime(df['Event Date'], format='%d-%m-%Y') throws an error when it encounters datetime type values
Does errors='ignore' help? This should skip the values it can't parse and return the original input (so, in theory, returning the datetime value):
pd.to_datetime(df['Event Date'], format='%d-%m-%Y', errors='ignore')
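If errors='ignore' does not give the result you want (to my knowledge it is also deprecated in recent pandas versions), another sketch is to convert everything to strings first and then try both formats with errors='coerce'. The sample values and the names as_text, d1 and d2 are illustrative; the sample mixes strings with an actual Timestamp to mimic the column described in the question.

```python
import pandas as pd

# Hypothetical sample: day-first strings mixed with real datetime values
df = pd.DataFrame({
    "Event Date": [
        "04-04-2007",
        "14-03-2008",
        pd.Timestamp("2011-10-10"),
        "27-04-2012",
        "2014-03-28 00:00:00",
    ],
})

# Bring everything to text so both kinds of values can be parsed uniformly
as_text = df["Event Date"].astype(str)

# Try each format; non-matching values become NaT
d1 = pd.to_datetime(as_text, format="%d-%m-%Y", errors="coerce")
d2 = pd.to_datetime(as_text, format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Fill the gaps of the first attempt with the second
df["Event Date"] = d1.fillna(d2)
print(df)
```

Any value that matches neither format stays NaT, so checking df["Event Date"].isna() afterwards shows whether something was left unparsed.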