pandas to_datetime converts non-zero padded month and day into datetime - python

I am using pd.to_datetime to convert strings into datetime;
df = pd.DataFrame(data={'id':['DD-83']})
pd.to_datetime(df['id'].str.replace(r'\D+', ''), errors='coerce', format='%d%m')
%d%m defines zero-padded day and month, but the code still converts the above string into
0 1900-03-08
Name: id, dtype: datetime64[ns]
I am wondering how to avoid it being converted into datetime (e.g. convert to NaT in this case), if the month and day in a string are not 0-padded. So
DD0306
DD0706
DD-83
will convert to
1900-06-03
1900-06-07
NaT

You need to look for - and only pass strings without -.
Setup:
df = pd.DataFrame(data={'id':['DD-83', 'DD0706', 'DD0306']})
Code:
df['date'] = pd.to_datetime(df['id'].loc[~df['id'].str.contains('-')].str.replace(r'\D+', ''), errors='coerce', format='%d%m')
Output:
id date
0 DD-83 NaT
1 DD0706 1900-06-07
2 DD0306 1900-06-03

Related

Convert date column (string) to datetime and match the format

I'm trying to covert the next date column (str) to datetime64 and say that format doesn't match, can anyone help me pleas :)
Column:
df["Date"]
0 15/7/21
...
2541 13/9/21
dtype: object
What I try:
pd.to_datetime(df["Date"], format = "%d/%m/%Y")
ValueError: time data '15/7/21' does not match format '%d/%m/%Y' (match)
I also try:
pd.to_datetime(df["Date"].astype("datetime64"), format='%d/%m/%Y')
And it convert it as datetime but there is some date the day is in the month.
Anyone know what to do ?
%Y expects a 4-digit year. Use %y for a 2-digit year (See the docs):
>>> import pandas as pd
>>> df = pd.DataFrame({'Date':['15/7/21','13/9/21']})
>>> df['Date']
0 15/7/21
1 13/9/21
Name: Date, dtype: object
>>> pd.to_datetime(df['Date'].astype('datetime64'),format='%d/%m/%y')
0 2021-07-15
1 2021-09-13
Name: Date, dtype: datetime64[ns]
Note that pandas is pretty good at guessing the format:
>>> pd.to_datetime(df['Date'])
0 2021-07-15
1 2021-09-13

Split Date Time string (not in usual format) and pull out month

I have a dataframe that has a date time string but is not in traditional date time format. I would like to separate out the date from the time into two separate columns. And then eventually also separate out the month.
This is what the date/time string looks like: 2019-03-20T16:55:52.981-06:00
>>> df.head()
Date Score
2019-03-20T16:55:52.981-06:00 10
2019-03-07T06:16:52.174-07:00 9
2019-06-17T04:32:09.749-06:003 1
I tried this but got a type error:
df['Month'] = pd.DatetimeIndex(df['Date']).month
This can be done just using pandas itself. You can first convert the Date column to datetime by passing utc = True:
df['Date'] = pd.to_datetime(df['Date'], utc = True)
And then just extract the month using dt.month:
df['Month'] = df['Date'].dt.month
Output:
Date Score Month
0 2019-03-20 22:55:52.981000+00:00 10 3
1 2019-03-07 13:16:52.174000+00:00 9 3
2 2019-06-17 10:32:09.749000+00:00 1 6
From the documentation of pd.to_datetime you can see a parameter:
utc : boolean, default None
Return UTC DatetimeIndex if True (converting any tz-aware datetime.datetime objects as well).

Convert strings with and without time (mixed format) to datetime in a pandas dataframe

When converting a pandas dataframe column from object to datetime using astype function, the behavior is different depending on if the strings have the time component or not. What is the correct way of converting the column?
df = pd.DataFrame({'Date': ['12/07/2013 21:50:00','13/07/2013 00:30:00','15/07/2013','11/07/2013']})
df['Date'] = pd.to_datetime(df['Date'], format="%d/%m/%Y %H:%M:%S", exact=False, dayfirst=True, errors='ignore')
Output:
Date
0 12/07/2013 21:50:00
1 13/07/2013 00:30:00
2 15/07/2013
3 11/07/2013
but the dtype is still object. When doing:
df['Date'] = df['Date'].astype('datetime64')
it becomes of datetime dtype but the day and month are not parsed correctly on rows 0 and 3.
Date
0 2013-12-07 21:50:00
1 2013-07-13 00:30:00
2 2013-07-15 00:00:00
3 2013-11-07 00:00:00
The expected result is:
Date
0 2013-07-12 21:50:00
1 2013-07-13 00:30:00
2 2013-07-15 00:00:00
3 2013-07-11 00:00:00
If we look at the source code, if you pass format= and dayfirst= arguments, dayfirst= will never be read because passing format= calls a C function (np_datetime_strings.c) that doesn't use dayfirst= to make conversions. On the other hand, if you pass only dayfirst=, it will be used to first guess the format and falls back on dateutil.parser.parse to make conversions. So, use only one of them.
In most cases,
df['Date'] = pd.to_datetime(df['Date'])
does the job.
In the specific example in the OP, passing dayfirst=True does the job.
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
That said, passing the format= makes the conversion run ~25x faster (see this post for more info), so if your frame is anything larger than 10k rows, then it's better to pass the format=. Now since the format is mixed, one way is to perform the conversion in two steps (errors='coerce' argument will be useful)
convert the datetimes with time component
fill in the NaT values (the "coerced" rows) by a Series converted with a different format.
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y %H:%M:%S', errors='coerce')
df['Date'] = df['Date'].fillna(pd.to_datetime(df['Date'], format='%d/%m/%Y', errors='coerce'))
This method (of performing or more conversions) can be used to convert any column with "weirdly" formatted datetimes.

How to convert pandas column to date when column is something like "Jan-18"?

what is the efficient way to convert the column values into dates "DD-MM-YYYY" when the values given like "Feb-15" which needs to be "01-02-2015". if it's "Dec-46" it must return "01-12-1946".
You can pass the format '%b-%y' to to_datetime:
In[42]:
df = pd.DataFrame({'date':["Feb-15","Dec-46"]})
df['new_date'] = pd.to_datetime(df['date'], format='%b-%y')
df
Out[42]:
date new_date
0 Feb-15 2015-02-01
1 Dec-46 2046-12-01
Note that the new dtype is datetime64, you cannot control the display output, if you insist on DD-MM-YYYY then you would have to convert to a string using dt.strftime:
In[43]:
df['str_date'] = df['new_date'].dt.strftime('%d-%m-%Y')
df
Out[43]:
date new_date str_date
0 Feb-15 2015-02-01 01-02-2015
1 Dec-46 2046-12-01 01-12-2046
but then you have strings which is not that useful if you need to perform arithmetic operations or filtering
EDIT
You cannot store dates earlier than 1970 so '01-01-1946' is not a valid datetime that can be represented by datetime64

Pandas Dataframe convert string to data without time

I have a Pandas Dataframe df:
a date
1 2014-06-29 00:00:00
df.types return:
a object
date object
I want convert column data to data without time but:
df['date']=df['date'].astype('datetime64[s]')
return:
a date
1 2014-06-28 22:00:00
df.types return:
a object
date datetime64[ns]
But value is wrong.
I'd have:
a date
1 2014-06-29
or:
a date
1 2014-06-29 00:00:00
I would start by putting your dates in pd.datetime:
df['date'] = pd.to_datetime(df.date)
Now, you can see that the time component is still there:
df.date.values
array(['2014-06-28T19:00:00.000000000-0500'], dtype='datetime64[ns]')
If you are ok having a date object again, you want:
df['date'] = [x.strftime("%y-%m-%d") for x in df.date]
Here would be ending with a datetime:
df['date'] = [x.date() for x in df.date]
df.date
datetime.date(2014, 6, 29)
Here you go. Just use this pattern:
df.to_datetime().date()

Categories