How do you convert a column of dates of the form "2020-06-30 15:20:13.078196+00:00" to datetime in pandas?
This is what I have done:
pd.concat([df, df.date_string.apply(lambda s: pd.Series({'date':datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f%z')}))], axis=1)
pd.concat([df, df.file_created.apply(lambda s: pd.Series({'date':datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f.%z')}))], axis=1)
pd.concat([df, df.file_created.apply(lambda s: pd.Series({'date':datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f:%z')}))], axis=1)
In all cases I get the error: time data '2020-06-30 15:20:13.078196+00:00' does not match format.
Any help is appreciated.
None of the formats you tried matches your sample: each one has a literal T between the date and the time, while your strings have a space there.
Try this:
"%Y-%m-%d %H:%M:%S.%f%z" (notice the space before %H).
The easiest thing to do is let pd.to_datetime auto-infer the format. That works very well for standard formats like this (ISO 8601):
import pandas as pd
dti = pd.to_datetime(["2020-06-30 15:20:13.078196+00:00"])
print(dti)
# DatetimeIndex(['2020-06-30 15:20:13.078196+00:00'], dtype='datetime64[ns, UTC]', freq=None)
+00:00 is a UTC offset of zero hours, thus can be interpreted as UTC.
btw., pd.to_datetime also works very well for mixed formats, see e.g. here.
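If you would rather spell the format out than rely on inference, here is a minimal sketch that checks the space-separated format against the sample string and then applies it to a whole column. The DataFrame and its date_string column are made up for illustration, and parsing an offset written with a colon via %z assumes Python 3.7 or later.

```python
from datetime import datetime, timedelta

import pandas as pd

fmt = "%Y-%m-%d %H:%M:%S.%f%z"  # note the space (not a T) before %H
sample = "2020-06-30 15:20:13.078196+00:00"

# Plain-Python check that the format matches the sample string
parsed = datetime.strptime(sample, fmt)

# Hypothetical frame standing in for the asker's data
df = pd.DataFrame({"date_string": [sample, "2020-07-01 08:00:00.000000+00:00"]})

# Explicit format and automatic inference should agree for ISO 8601 input
explicit = pd.to_datetime(df["date_string"], format=fmt)
inferred = pd.to_datetime(df["date_string"])
df["date"] = explicit
```

Both calls should give a timezone-aware column here; the explicit format mainly documents what you expect the input to look like.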
Related
How can I convert the following string format of datetime into datetime object to be used in pandas Dataframe? I tried many examples, but it seems my format is different from the standard Pandas datetime object. I know this could be a repetition, but I tried solutions on the Stackexchange, but they don't work!
The code below converts it into the appropriate format:
df = pd.DataFrame({'datetime':['2013-11-1_00:00','2013-11-1_00:10','2013-11-1_00:20']})
df['datetime_changed'] = pd.to_datetime(df['datetime'].str.replace('_','T'))
df.head()
You can use pd.to_datetime with format
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%d_%H:%M')
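To see what the explicit format buys you, here is a small sketch using the same three sample strings. The variable names minutes and step are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'datetime': ['2013-11-1_00:00', '2013-11-1_00:10', '2013-11-1_00:20']})
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%d_%H:%M')

# With a real datetime column, components and differences are available directly
minutes = df['datetime'].dt.minute.tolist()
step = df['datetime'].diff().iloc[1]  # gap between the first two rows
```

Once the column is datetime, there is no need for the string replace step at all.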
Is there a way in pandas to convert my column date which has the following format '1997-01-31' to '199701', without including any information about the day?
I tried solution of the following form:
df['DATE'] = df['DATE'].apply(lambda x: datetime.strptime(x, '%Y%m'))
but I obtain this error : 'ValueError: time data '1997-01-31' does not match format '%Y%m''
Probably the reason is that I am not including the day in the format. Is there a better way to go from YYYY-MM-DD format to YYYYMM in pandas?
One way is to convert the column to datetime and then use strftime. Note that the result is a column of strings, so you lose the datetime functionality.
df = pd.DataFrame({'date':['1997-01-31' ]})
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y%m')
date
0 199701
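If losing the datetime functionality is a concern, one possible alternative (not part of the answer above) is to keep a monthly period column and only render the YYYYMM text when needed. This is a sketch; the month and yyyymm column names are made up.

```python
import pandas as pd

df = pd.DataFrame({'date': ['1997-01-31', '1997-03-31']})
dates = pd.to_datetime(df['date'])

# Monthly periods drop the day but still support date arithmetic
df['month'] = dates.dt.to_period('M')
next_month = df['month'] + 1

# Render as YYYYMM text only when a string is actually required
df['yyyymm'] = df['month'].dt.strftime('%Y%m')
```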
You might not need to go through the datetime conversion if the data are sufficiently clean (no incorrect strings like 'foo' or '001231'):
df = pd.DataFrame({'date':['1997-01-31', '1997-03-31', '1997-12-18']})
df['date'] = [''.join(x.split('-')[0:2]) for x in df.date]
# date
#0 199701
#1 199703
#2 199712
Or if you have null values:
df['date'] = df.date.str.replace('-', '').str[0:6]
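A quick sketch of why the .str version copes with nulls: missing values simply stay missing instead of raising. The sample series is made up, and regex=False is added here only to make the literal replacement explicit.

```python
import pandas as pd

s = pd.Series(['1997-01-31', None, '1997-12-18'])

# regex=False treats '-' as a literal character; missing values stay missing
yyyymm = s.str.replace('-', '', regex=False).str[0:6]
```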
The dates in my csv are in dd/mm/yyyy format. When I convert the column to datetime with df['Date']=pd.to_datetime(df['Date']), it changes the format to mm/dd/yyyy.
Then I used df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%d/%m/%Y')
to convert back to dd/mm/yyyy, but the values are then strings (object dtype), and I need them as datetime. When I run df['Date']=pd.to_datetime(df['Date']) again, it goes back to the previous format. I need your help.
You can use the parse_dates and dayfirst arguments of pd.read_csv, see: the docs for read_csv()
df = pd.read_csv('myfile.csv', parse_dates=['Date'], dayfirst=True)
This will read the Date column as datetime values, correctly taking the first part of the date input as the day. Note that in general you will want your dates to be stored as datetime objects.
Then, if you need to output the dates as a string you can call dt.strftime():
df['Date'].dt.strftime('%d/%m/%Y')
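Here is a self-contained sketch of that workflow. It uses an in-memory string in place of myfile.csv, and the file contents are made up for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for 'myfile.csv' (contents are made up)
csv_text = "Date,value\n31/01/2020,1\n05/02/2020,2\n"
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], dayfirst=True)

# The column holds real datetimes; 05/02/2020 is read as 5 February
months = df['Date'].dt.month.tolist()

# Produce dd/mm/yyyy text only for display or export
as_text = df['Date'].dt.strftime('%d/%m/%Y').tolist()
```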
When I use again this: df['Date'] = pd.to_datetime(df['Date']), it gets back to the previous format.
No, you cannot simultaneously have the string format of your choice and keep your series of type datetime. As remarked here:
datetime series are stored internally as integers. Any
human-readable date representation is just that, a representation,
not the underlying integer. To access your custom formatting, you can
use methods available in Pandas. You can even store such a text
representation in a pd.Series variable:
formatted_dates = df['datetime'].dt.strftime('%m/%d/%Y')
The dtype of formatted_dates will be object, which indicates
that the elements of your series point to arbitrary Python types. In
this case, those arbitrary types happen to be all strings.
Lastly, I strongly recommend you do not convert a datetime series
to strings until the very last step in your workflow. This is because
as soon as you do so, you will no longer be able to use efficient,
vectorised operations on such a series.
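To illustrate why converting to strings early hurts, here is a small sketch with made-up dates: sorting the datetime series is chronological, while sorting the formatted strings is merely alphabetical.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2020-02-05', '2019-12-31', '2020-01-31']))
formatted = dates.dt.strftime('%d/%m/%Y')

# Sort first, format last: the order is chronological
chronological = dates.sort_values().dt.strftime('%d/%m/%Y').tolist()

# Format first, sort last: the order is alphabetical and no longer by date
alphabetical = formatted.sort_values().tolist()
```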
This solution will work for all cases where a column has mixed date formats. Add more conditions to the function if needed. Pandas to_datetime() function was not working for me, but this seems to work well.
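import datetime
import pandas as pd

def format(val):  # note: this name shadows Python's built-in format()
    # Let pandas parse the value, then render it as a month-first string
    a = pd.to_datetime(val, errors='coerce', cache=False).strftime('%m/%d/%Y')
    try:
        # Try to read that string as day-first ...
        date_time_obj = datetime.datetime.strptime(a, '%d/%m/%Y')
    except ValueError:
        # ... and fall back to month-first when the day-first reading is invalid
        date_time_obj = datetime.datetime.strptime(a, '%m/%d/%Y')
    return date_time_obj.date()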
Saving the changes to the same column.
df['Date'] = df['Date'].apply(lambda x: format(x))
Saving as CSV.
df.to_csv(f'{file_name}.csv', index=False, date_format='%s')
How can I convert a DataFrame column of strings (in dd/mm/yyyy format) to datetime dtype?
The easiest way is to use to_datetime:
df['col'] = pd.to_datetime(df['col'])
It also offers a dayfirst argument for European times (but beware this isn't strict).
Here it is in action:
In [11]: pd.to_datetime(pd.Series(['05/23/2005']))
Out[11]:
0 2005-05-23 00:00:00
dtype: datetime64[ns]
You can pass a specific format:
In [12]: pd.to_datetime(pd.Series(['05/23/2005']), format="%m/%d/%Y")
Out[12]:
0 2005-05-23
dtype: datetime64[ns]
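If you need day-first parsing to be strict, an explicit format is one way to get it. The sketch below uses made-up values and errors='coerce' so that the value that does not fit becomes NaT instead of raising.

```python
import pandas as pd

s = pd.Series(['23/05/2005', '05/23/2005'])

# An explicit day-first format is strict: the month-first value cannot match
strict = pd.to_datetime(s, format='%d/%m/%Y', errors='coerce')
```

Without errors='coerce' the second value would raise an error, which is often what you want when validating input.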
If your date column is a string of the format '2017-01-01'
you can use pandas astype to convert it to datetime.
df['date'] = df['date'].astype('datetime64[ns]')
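or use datetime64[D] if you want day precision and not nanoseconds (recent pandas versions only support second, millisecond, microsecond and nanosecond resolutions, so this may not be accepted there)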
print(type(df['date'].iloc[0]))
yields
<class 'pandas._libs.tslib.Timestamp'>
the same as when you use pandas.to_datetime
You can try it with formats other than '%Y-%m-%d', but at least this works.
You can use the following if you want to specify tricky formats:
df['date_col'] = pd.to_datetime(df['date_col'], format='%d/%m/%Y')
More details on format here:
Python 2 https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior
Python 3 https://docs.python.org/3.7/library/datetime.html#strftime-strptime-behavior
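If you have a mixture of formats in your date column, setting infer_datetime_format=True can make life easier (note that this argument is deprecated as of pandas 2.0, where format inference is the default behaviour).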
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)
Source: pd.to_datetime
or if you want a customized approach:
from datetime import datetime

def autoconvert_datetime(value):
    formats = ['%m/%d/%Y', '%m-%d-%y']  # formats to try
    result_format = '%d-%m-%Y'  # output format
    for dt_format in formats:
        try:
            dt_obj = datetime.strptime(value, dt_format)
            return dt_obj.strftime(result_format)
        except Exception:  # raised when the format doesn't match
            pass
    return value  # let it be if it doesn't match

df['date'] = df['date'].apply(autoconvert_datetime)
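A vectorized variant of the same idea, as a sketch: try each candidate format on the whole column with errors='coerce' and fill the gaps from the next attempt. Unlike the function above, this returns datetimes (with NaT for anything unmatched) rather than reformatted strings. The sample values are made up.

```python
import pandas as pd

raw = pd.Series(['01/31/2020', '02-15-20', 'not a date'])
formats = ['%m/%d/%Y', '%m-%d-%y']  # formats to try, in order

# Start with the first format; values that do not match become NaT
parsed = pd.to_datetime(raw, format=formats[0], errors='coerce')
for fmt in formats[1:]:
    # Fill only the still-missing entries from the next format
    parsed = parsed.fillna(pd.to_datetime(raw, format=fmt, errors='coerce'))
```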
Try this solution:
Change '2022-12-31 00:00:00' to '2022-12-31 00:00:01'
Then run this code: pandas.to_datetime(pandas.Series(['2022-12-31 00:00:01']))
Output: 2022-12-31 00:00:01
Multiple datetime columns
If you want to convert multiple string columns to datetime, then using apply() would be useful.
df[['date1', 'date2']] = df[['date1', 'date2']].apply(pd.to_datetime)
You can pass parameters to to_datetime as kwargs.
df[['start_date', 'end_date']] = df[['start_date', 'end_date']].apply(pd.to_datetime, format="%m/%d/%Y")
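A runnable sketch of the multi-column case with made-up data; duration_days is just an illustrative name for something you can only compute once both columns are datetime.

```python
import pandas as pd

df = pd.DataFrame({
    'start_date': ['01/15/2021', '02/01/2021'],
    'end_date': ['01/20/2021', '03/01/2021'],
})
cols = ['start_date', 'end_date']

# Keyword arguments given to apply() are forwarded to pd.to_datetime for each column
df[cols] = df[cols].apply(pd.to_datetime, format="%m/%d/%Y")

# Date arithmetic now works across the converted columns
duration_days = (df['end_date'] - df['start_date']).dt.days.tolist()
```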
Use format= to speed up
If the column contains a time component and you know the format of the datetime/time, then passing the format explicitly can significantly speed up the conversion. There's barely any difference if the column is only a date, though. In my project, for a column with 5 million rows, the difference was huge: ~2.5 min vs 6s.
It turns out explicitly specifying the format is about 25x faster. The following runtime plot shows that there's a huge gap in performance depending on whether you passed format or not.
The code used to produce the plot:
import random

import pandas as pd
import perfplot

# value ranges for month, day, year, hour and minute
mdYHM = range(1, 13), range(1, 29), range(2000, 2024), range(24), range(60)

perfplot.show(
    kernels=[lambda x: pd.to_datetime(x), lambda x: pd.to_datetime(x, format='%m/%d/%Y %H:%M')],
    labels=['pd.to_datetime(x)', "pd.to_datetime(x, format='%m/%d/%Y %H:%M')"],
    n_range=[2**k for k in range(19)],
    setup=lambda n: pd.Series([f"{m}/{d}/{Y} {H}:{M}"
                               for m,d,Y,H,M in zip(*[random.choices(e, k=n) for e in mdYHM])]),
    equality_check=pd.Series.equals,
    xlabel='len(df)'
)
I have a Pandas dataframe df containing datetimes and their respective values. Now I want to make some format changes to each datetime in the dataframe, but noticed that a normal for loop doesn't actually change anything in the dataframe.
This is what I tried, and also shows what I'm trying to do:
#original format of the datetimes: sunnuntai 1.1.2017 00:00
for i in df["Datetime"]:
    #removes the string containing the weekday from the beginning
    i = re.sub("^[^ ]* ","", i)
    #converts 1.1.2017 00:00 into 2017-01-01 00:00
    i = datetime.datetime.strptime(i, "%d.%m.%Y %H:%M").strftime("%Y-%m-%d %H:%M")
How should I go about doing these format changes permanently? Thank you.
Ditch the loop and aim to vectorize. The steps are:
1. use str.split to get rid of the leading text,
2. pd.to_datetime with dayfirst=True for the datetime conversion, and
3. dt.strftime to convert the result to your format.
df['Datetime'] = pd.to_datetime(
df['Datetime'].str.split(n=1).str[1], dayfirst=True
).dt.strftime("%Y-%m-%d %H:%M")
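The same steps, broken out on a small made-up sample so each intermediate result can be inspected. As an assumption of this sketch, an explicit format is used in place of dayfirst=True, which avoids relying on format inference for the unpadded day and month.

```python
import pandas as pd

df = pd.DataFrame({'Datetime': ['sunnuntai 1.1.2017 00:00', 'maanantai 2.1.2017 13:30']})

# 1. drop the leading weekday name (everything up to the first space)
without_weekday = df['Datetime'].str.split(n=1).str[1]

# 2. parse the day-first dates; an explicit format is used here instead of dayfirst=True
parsed = pd.to_datetime(without_weekday, format='%d.%m.%Y %H:%M')

# 3. write the reformatted strings back to the column, which makes the change permanent
df['Datetime'] = parsed.dt.strftime('%Y-%m-%d %H:%M')
```

The assignment back to df['Datetime'] is what the original for loop was missing: rebinding the loop variable never touches the DataFrame.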