Converting date formats in pandas dataframe - python

I have a dataframe whose Date column contains two different date formats,
e.g. 1983-11-10 00:00:00 and 10/11/1983
I want them all to be the same type, how can I iterate through the Date column of my dataframe and convert the dates to one format?

I believe you need parameter dayfirst=True in to_datetime:
df = pd.DataFrame({'Date': {0: '1983-11-10 00:00:00', 1: '10/11/1983'}})
print (df)
Date
0 1983-11-10 00:00:00
1 10/11/1983
df['Date'] = pd.to_datetime(df.Date, dayfirst=True)
print (df)
Date
0 1983-11-10
1 1983-11-10
because without dayfirst=True the second value is read month-first:
df['Date'] = pd.to_datetime(df.Date)
print (df)
Date
0 1983-11-10
1 1983-10-11
Or you can specify both formats and then use combine_first:
d1 = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S', errors='coerce')
d2 = pd.to_datetime(df.Date, format='%d/%m/%Y', errors='coerce')
df['Date'] = d1.combine_first(d2)
print (df)
Date
0 1983-11-10
1 1983-11-10
General solution for multiple formats:
from functools import reduce
def convert_formats_to_datetimes(col, formats):
    out = [pd.to_datetime(col, format=x, errors='coerce') for x in formats]
    return reduce(lambda l,r: pd.Series.combine_first(l,r), out)
formats = ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y']
df['Date'] = df['Date'].pipe(convert_formats_to_datetimes, formats)
print (df)
Date
0 1983-11-10
1 1983-11-10

I want them all to be the same type, how can I iterate through the
Date column of my dataframe and convert the dates to one format?
Your input data is ambiguous: is 10 / 11 10th November or 11th October? You need to specify logic to determine which is appropriate. A function is useful if you wish to try multiple date formats sequentially:
def date_apply_formats(s, form_lst):
    # Always parse the original strings; rows that are already filled keep their value.
    out = pd.to_datetime(s, format=form_lst[0], errors='coerce')
    for form in form_lst[1:]:
        out = out.fillna(pd.to_datetime(s, format=form, errors='coerce'))
    return out
df['Date'] = date_apply_formats(df['Date'], ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y'])
Priority is given to the first item in form_lst: a value that matches several formats is parsed with the earliest one. The solution extends to an arbitrary number of formats.
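To see what that priority means in practice, here is a small sketch with one ambiguous value. The helper name parse_with_formats and the sample strings are made up for illustration; the helper follows the same try-each-format idea and is not the only way to write it.

```python
import pandas as pd

def parse_with_formats(strings, formats):
    """Try each format in order; earlier formats win when several match."""
    result = pd.to_datetime(strings, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        # Only rows that are still NaT get filled by later formats.
        result = result.fillna(pd.to_datetime(strings, format=fmt, errors="coerce"))
    return result

# "03/04/2020" is valid under both formats, "25/12/2020" only day-first.
raw = pd.Series(["03/04/2020", "25/12/2020"])

day_first = parse_with_formats(raw, ["%d/%m/%Y", "%m/%d/%Y"])
month_first = parse_with_formats(raw, ["%m/%d/%Y", "%d/%m/%Y"])

print(day_first.dt.strftime("%Y-%m-%d").tolist())
print(month_first.dt.strftime("%Y-%m-%d").tolist())
```

The ambiguous row follows whichever format is listed first, while the unambiguous row ends up the same either way, so the order of the list is where you encode your assumption about the data.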

Input data is
NSECODE Date Close
1 NSE500 20000103 1291.5500
2 NSE500 20000104 1335.4500
3 NSE500 20000105 1303.8000
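# Name the format explicitly; astype(str) also covers the case where Date is stored as integers
history_nseindex_df["Date"] = pd.to_datetime(history_nseindex_df["Date"].astype(str), format="%Y%m%d")
history_nseindex_df["Date"] = history_nseindex_df["Date"].dt.strftime("%Y-%m-%d")
Output is now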
NSECode Date Close
1 NSE500 2000-01-03 1291.5500
2 NSE500 2000-01-04 1335.4500
3 NSE500 2000-01-05 1303.8000
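One thing to watch for with values like 20000103: the post does not say whether the column holds strings or integers. If they are integers, pd.to_datetime without a format treats them as nanoseconds since the epoch. A small sketch of the difference, assuming integer input (the sample Series is made up):

```python
import pandas as pd

dates = pd.Series([20000103, 20000104, 20000105])

# Without a format, integers are read as nanoseconds since 1970-01-01.
as_epoch = pd.to_datetime(dates)

# Converting to strings and naming the format gives the intended dates.
parsed = pd.to_datetime(dates.astype(str), format="%Y%m%d")
formatted = parsed.dt.strftime("%Y-%m-%d")

print(as_epoch.dt.year.tolist())
print(formatted.tolist())
```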

Related

Pandas Mixed Date Format Values in One Column

df = pd.Series('''18-04-2022
2016-10-05'''.split('\n') , name='date'
).to_frame()
df['post_date'] = pd.to_datetime(df['date'])
print (df)
date post_date
0 18-04-2022 2022-04-18
1 2016-10-05 2016-10-05
When I try to align the date column into one consistent format, I get the wrong result shown above.
The problem is that the values have mixed date formats: dd-mm-yyyy (18-04-2022) and yyyy-dd-mm (2016-10-05).
What I want to have is below (yyyy-mm-dd) for both of the above inconsistent formats:
date post_date
0 18-04-2022 2022-04-18
1 2016-10-05 2016-05-10
Appreciate it in advance.
You can be explicit and parse the two possible formats one after the other:
df['post_date'] = (
    pd.to_datetime(df['date'], format='%d-%m-%Y', errors='coerce')
    .fillna(
        pd.to_datetime(df['date'], format='%Y-%d-%m', errors='coerce')
    )
)
Output:
date post_date
0 18-04-2022 2022-04-18
1 2016-10-05 2016-05-10
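Because errors='coerce' silently turns anything unparseable into NaT, it can be worth checking which rows matched neither format. A minimal sketch, assuming the same two formats; the extra "not a date" row and the name unparsed are added here for illustration:

```python
import pandas as pd

df = pd.DataFrame({"date": ["18-04-2022", "2016-10-05", "not a date"]})

day_first = pd.to_datetime(df["date"], format="%d-%m-%Y", errors="coerce")
year_first = pd.to_datetime(df["date"], format="%Y-%d-%m", errors="coerce")
df["post_date"] = day_first.fillna(year_first)

# Rows that had a value but matched neither format.
unparsed = df[df["post_date"].isna() & df["date"].notna()]
print(unparsed)
```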

Pandas DateTime for Month

I have month column with values formatted as: 2019M01
To find the seasonality I need this formatted into Pandas DateTime format.
How to format 2019M01 into datetime so that I can use it for my seasonality plotting?
Thanks.
Use to_datetime with format parameter:
print (df)
date
0 2019M01
1 2019M03
2 2019M04
df['date'] = pd.to_datetime(df['date'], format='%YM%m')
print (df)
date
0 2019-01-01
1 2019-03-01
2 2019-04-01
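For the seasonality part, a rough sketch of what you could do once the column is parsed. The value column and its numbers are invented for illustration; dt.month and dt.to_period are regular pandas accessors:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2019M01", "2019M03", "2019M04"],
    "value": [10.0, 30.0, 40.0],  # hypothetical measurements
})
df["date"] = pd.to_datetime(df["date"], format="%YM%m")

# Month number (1-12) is handy for grouping the same month across years.
df["month"] = df["date"].dt.month
# A monthly Period keeps year and month without the artificial day.
df["period"] = df["date"].dt.to_period("M")

monthly_mean = df.groupby("month")["value"].mean()
print(monthly_mean)
```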

Applying date to both %y and %Y

I have a dataframe where one of the columns called 'date', containing objects, looks like:
df =
|Date
|Mar-24
|Aug-22
|Sep-25
|...
I want to convert that column into dates, so that for example Mar-24 would become 2024-03-01. So far I have tried
df['Date'] = pd.to_datetime(df['Date'], format= '%b-%y')
which I think should work, but among the few thousand rows I have found some that contain the full year, such as 'Apr 2023', which won't be picked up by %y. Is there a way to find those rows and change them to the short year before applying the code above, or to give the code both %y and %Y?
Use the parameter errors='coerce' in combination with combine_first:
Minimal example:
import pandas as pd
series = pd.Series(['Mar-24', 'Aug-22', 'Sep-2025'], [0, 1, 2])
date1 = pd.to_datetime(series, format='%b-%y', errors='coerce')
date2 = pd.to_datetime(series, format='%b-%Y', errors='coerce')
date1.combine_first(date2)
Output:
0 2024-03-01
1 2022-08-01
2 2025-09-01
dtype: datetime64[ns]
Or for your specific case in one line:
df['Date'] = pd.to_datetime(df['Date'], format='%b-%y', errors='coerce').combine_first(pd.to_datetime(df['Date'], format='%b-%Y', errors='coerce'))
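Note that the question mentions 'Apr 2023' with a space, which neither '%b-%y' nor '%b-%Y' would match. A possible sketch, assuming the only variation is space versus hyphen, is to normalize the separator first (the sample values are made up, and %b depends on an English locale):

```python
import pandas as pd

raw = pd.Series(["Mar-24", "Aug-22", "Apr 2023", "Sep-2025"])

# Make the separator uniform so two formats are enough.
normalized = raw.str.strip().str.replace(" ", "-", regex=False)

short_year = pd.to_datetime(normalized, format="%b-%y", errors="coerce")
long_year = pd.to_datetime(normalized, format="%b-%Y", errors="coerce")
dates = short_year.combine_first(long_year)

print(dates.dt.strftime("%Y-%m-%d").tolist())
```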

Identifying a dateformat and change it into another

I am working with the following piece of data which has a different format of dates and which creates confusion later in the process. The data is like:
S. No DateTime Area
1 03/05/2019 6:33 A
2 06/03/2019 07:23:45 AM B
The first row is in the format %m/%d/%Y h:mm and the second row is in the format %d/%m/%Y hh:mm:ss AM/PM. The first date value is ambiguous, though: is it 5th March or 3rd May? To give everything the same format, I want my code to detect the date format and convert it to the desired one.
I have tried doing this:
df['Detection Date'] = pd.to_datetime(df['Detection Date & Time'], errors = 'coerce').dt.datetime
col = df['Detection Date'].apply(str)
for i in df.index:
    if datetime.datetime.strptime(col, '%m/%d/%Y h:mm'):
        ColDate = datetime.datetime.strftime(col, '%d/%m/%Y hh:mm:ss AM/PM')
But I am getting an error saying:
TypeError: strptime() argument 1 must be str, not Series
How should this be done?
Thanks
If it is OK to install a dependency, you can use dateparser:
import pandas as pd
import dateparser
df = pd.DataFrame({'Detection Date & Time': ['03/05/2019 6:33', '06/03/2019 07:23:45 AM']})
df['Date & time'] = df['Detection Date & Time'].apply(dateparser.parse)
You can pass each possible format to to_datetime separately. Values that do not match a format become missing values (NaT), so you can then combine the results with Series.fillna:
date1 = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%m/%d/%Y %H:%M')
date2 = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%d/%m/%Y %I:%M:%S %p')
df['DateTime'] = date1.fillna(date2)
print (df)
S. No DateTime Area
0 1 2019-03-05 06:33:00 A
1 2 2019-03-06 07:23:45 B
Finally, if you want a specific output format, add Series.dt.strftime. The advantage of this solution is that both formats are verified:
df['DateTime'] = date1.fillna(date2).dt.strftime('%d/%m/%Y %I:%M:%S %p')
print (df)
S. No DateTime Area
0 1 05/03/2019 06:33:00 AM A
1 2 06/03/2019 07:23:45 AM B
Details:
print (date1)
0 2019-03-05 06:33:00
1 NaT
Name: DateTime, dtype: datetime64[ns]
print (date2)
0 NaT
1 2019-03-06 07:23:45
Name: DateTime, dtype: datetime64[ns]
Another possible solution that does not verify the other format: only values in the format %m/%d/%Y %H:%M are converted to %d/%m/%Y %I:%M:%S %p, and everything else is left unchanged:
date1 = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%m/%d/%Y %H:%M')
df['DateTime'] = date1.dt.strftime('%d/%m/%Y %I:%M:%S %p').where(date1.notna(), df['DateTime'])
print (df)
S. No DateTime Area
0 1 05/03/2019 06:33:00 AM A
1 2 06/03/2019 07:23:45 AM B
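As for the TypeError in the question: datetime.strptime works on one string at a time, so it has to be applied per element rather than to the whole Series. A minimal sketch of that approach, assuming the two formats described in the question (FORMATS and parse_one are names made up here):

```python
from datetime import datetime

import pandas as pd

# Formats are tried in this order; the first one that fits wins.
FORMATS = ["%m/%d/%Y %H:%M", "%d/%m/%Y %I:%M:%S %p"]

def parse_one(text):
    """Return the first successful parse of a single string, or NaT."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return pd.NaT

raw = pd.Series(["03/05/2019 6:33", "06/03/2019 07:23:45 AM"])
parsed = raw.apply(parse_one)  # called once per string, not on the Series
print(parsed)
```

This is slower than the vectorized to_datetime calls on large columns, but it makes the per-value logic explicit.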

Comparing today date with date in dataframe

Sample Data
id date
1 1/2/2018
2 1/5/2019
3 5/3/2018
4 23/11/2018
Desired output
id date
2 1/5/2019
4 23/11/2018
My current code
dfdateList = pd.DataFrame()
dfDate= self.df[["id", "date"]]
today = datetime.datetime.now()
today = today.strftime("%d/%m/%Y").lstrip("0").replace(" 0", "")
expList = []
for dates in dfDate["date"]:
    if dates <= today:
        expList.append(dates)
dfdateList = pd.DataFrame(expList)
Currently my code prints every single row regardless of the condition. Can anyone guide me? Thanks
Pandas has native support for a large class of operations on datetimes, so one solution here would be to use pd.to_datetime to convert your dates from strings to pandas' representation of datetimes, pd.Timestamp, then just create a mask based on the current date:
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df[df['date'] > pd.Timestamp.now()]
For example:
In [34]: df['date'] = pd.to_datetime(df['date'], dayfirst=True)
In [36]: df
Out[36]:
id date
0 1 2018-02-01
1 2 2019-05-01
2 3 2018-03-05
3 4 2018-11-23
In [37]: df[df['date'] > pd.Timestamp.now()]
Out[37]:
id date
1 2 2019-05-01
3 4 2018-11-23
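The loop in the question compares strings, and strings are compared character by character rather than as dates, which is likely why the condition does not filter as intended. A small sketch of the difference, using a fixed reference date instead of the real current date so the result is reproducible (the name reference and its value are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "date": ["1/2/2018", "1/5/2019", "5/3/2018", "23/11/2018"],
})

# As strings, "2" sorts before "5", so 23 November looks "earlier" than 5 March.
string_compare = "23/11/2018" < "5/3/2018"

df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Hypothetical stand-in for "today"; in real code use pd.Timestamp.now().normalize().
reference = pd.Timestamp("2018-06-01")
later = df[df["date"] > reference]

print(string_compare)
print(later)
```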
