Pandas Mixed Date Format Values in One Column

import pandas as pd

df = pd.Series('''18-04-2022
2016-10-05'''.split('\n'), name='date').to_frame()
df['post_date'] = pd.to_datetime(df['date'])
print (df)
date post_date
0 18-04-2022 2022-04-18
1 2016-10-05 2016-10-05
When I try to convert the date column to one consistent format, I get the incorrect result shown above.
The problem is that the values use two different formats: dd-mm-yyyy (18-04-2022) and yyyy-dd-mm (2016-10-05).
What I want is yyyy-mm-dd for both of these inconsistent formats, as below:
date post_date
0 18-04-2022 2022-04-18
1 2016-10-05 2016-05-10
Appreciate it in advance.

You can be explicit and parse the two possible formats one after the other:
df['post_date'] = (
    pd.to_datetime(df['date'], format='%d-%m-%Y', errors='coerce')
    .fillna(
        pd.to_datetime(df['date'], format='%Y-%d-%m', errors='coerce')
    )
)
Output:
date post_date
0 18-04-2022 2022-04-18
1 2016-10-05 2016-05-10
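
As a small, self-contained check of this approach, the sketch below applies the same two formats to the sample values plus one deliberately invalid string (an assumption added here, not part of the question). It suggests that anything matching neither format is left as NaT rather than being guessed, so such rows can be inspected separately:

```python
import pandas as pd

# The last value is a made-up invalid entry, added only for illustration.
dates = pd.Series(["18-04-2022", "2016-10-05", "not a date"], name="date")

# First pass: day-month-year; strings that do not match become NaT.
day_first = pd.to_datetime(dates, format="%d-%m-%Y", errors="coerce")
# Second pass: year-day-month, used only where the first pass failed.
year_first = pd.to_datetime(dates, format="%Y-%d-%m", errors="coerce")

post_date = day_first.fillna(year_first)

# Rows that matched neither format are still NaT and can be reviewed by hand.
unparsed = dates[post_date.isna()]
print(post_date)
print(unparsed)
```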

Related

Pandas DateTime for Month

I have a month column with values formatted like 2019M01.
To analyze seasonality, I need to convert it to the pandas datetime format.
How can I parse 2019M01 as a datetime so that I can use it in my seasonality plot?
Thanks.

Use to_datetime with the format parameter:
print (df)
date
0 2019M01
1 2019M03
2 2019M04
df['date'] = pd.to_datetime(df['date'], format='%YM%m')
print (df)
date
0 2019-01-01
1 2019-03-01
2 2019-04-01
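
Once the column is a datetime, the month number is available through the .dt accessor, which is often what a seasonality plot groups on. A minimal sketch, assuming a hypothetical value column that does not appear in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2019M01", "2019M03", "2019M04", "2020M01"],
    "value": [10.0, 12.0, 9.0, 14.0],  # hypothetical data for illustration
})

# The literal "M" in the format matches the "M" in strings such as 2019M01.
df["date"] = pd.to_datetime(df["date"], format="%YM%m")

# Average the values per calendar month across all years.
monthly_mean = df.groupby(df["date"].dt.month)["value"].mean()
print(monthly_mean)
```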

Converting date formats in pandas dataframe

I have a dataframe and the Date column has two different types of date formats going on.
eg. 1983-11-10 00:00:00 and 10/11/1983
I want them all to be the same type, how can I iterate through the Date column of my dataframe and convert the dates to one format?

I believe you need parameter dayfirst=True in to_datetime:
df = pd.DataFrame({'Date': {0: '1983-11-10 00:00:00', 1: '10/11/1983'}})
print (df)
Date
0 1983-11-10 00:00:00
1 10/11/1983
df['Date'] = pd.to_datetime(df.Date, dayfirst=True)
print (df)
Date
0 1983-11-10
1 1983-11-10
because:
df['Date'] = pd.to_datetime(df.Date)
print (df)
Date
0 1983-11-10
1 1983-10-11
Or you can specify both formats and then use combine_first:
d1 = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S', errors='coerce')
d2 = pd.to_datetime(df.Date, format='%d/%m/%Y', errors='coerce')
df['Date'] = d1.combine_first(d2)
print (df)
Date
0 1983-11-10
1 1983-11-10
General solution for multiple formats:
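from functools import reduce

def convert_formats_to_datetimes(col, formats):
    # parse the column once per format; values that do not match become NaT
    out = [pd.to_datetime(col, format=x, errors='coerce') for x in formats]
    # fill the gaps left by earlier formats with results from later ones
    return reduce(lambda l, r: l.combine_first(r), out)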
formats = ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y']
df['Date'] = df['Date'].pipe(convert_formats_to_datetimes, formats)
print (df)
Date
0 1983-11-10
1 1983-11-10

I want them all to be the same type, how can I iterate through the
Date column of my dataframe and convert the dates to one format?
Your input data is ambiguous: is 10/11 the 10th of November or the 11th of October? You need to specify logic to determine which is appropriate. A function is useful if you wish to try multiple date formats sequentially:
def date_apply_formats(s, form_lst):
    out = pd.to_datetime(s, format=form_lst[0], errors='coerce')
    for form in form_lst[1:]:
        # parse the original strings again; re-parsing the converted result would be a no-op
        out = out.fillna(pd.to_datetime(s, format=form, errors='coerce'))
    return out
df['Date'] = date_apply_formats(df['Date'], ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y'])
Priority is given to the first item in form_lst. The solution is extendible to an arbitrary number of provided formats.
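
The sketch below is one way to check the sequential approach end to end on the two sample values from the question. The helper name parse_with_formats is hypothetical; it mirrors the idea above and is not a pandas API:

```python
import pandas as pd

def parse_with_formats(raw, formats):
    """Try each format in order; earlier formats take priority."""
    parsed = pd.to_datetime(raw, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        # Always parse the raw strings, filling only the values still missing.
        parsed = parsed.fillna(pd.to_datetime(raw, format=fmt, errors="coerce"))
    return parsed

dates = pd.Series(["1983-11-10 00:00:00", "10/11/1983"])
result = parse_with_formats(dates, ["%Y-%m-%d %H:%M:%S", "%d/%m/%Y"])
print(result)
```

Because each format is explicit, 10/11/1983 is read as day/month/year here regardless of any default inference rules.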

Input data is
NSECODE Date Close
1 NSE500 20000103 1291.5500
2 NSE500 20000104 1335.4500
3 NSE500 20000105 1303.8000
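# Cast to str and pass the format explicitly: integer values such as 20000103
# would otherwise be read as nanoseconds since the epoch.
history_nseindex_df["Date"] = pd.to_datetime(history_nseindex_df["Date"].astype(str), format="%Y%m%d")
history_nseindex_df["Date"] = history_nseindex_df["Date"].dt.strftime("%Y-%m-%d")
Output is now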
NSECode Date Close
1 NSE500 2000-01-03 1291.5500
2 NSE500 2000-01-04 1335.4500
3 NSE500 2000-01-05 1303.8000
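
Whether a bare to_datetime call works here depends on how the Date column was loaded. The sketch below assumes the column holds integers (a guess, since the answer does not say) and compares the result with and without an explicit format:

```python
import pandas as pd

# Assumption: Date was read as integers, e.g. from a CSV without dtype hints.
df = pd.DataFrame({"Date": [20000103, 20000104, 20000105]})

# Without a format, integers are treated as nanoseconds since 1970-01-01.
as_epoch = pd.to_datetime(df["Date"])

# Casting to str and giving the format reads them as calendar dates.
as_dates = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")
formatted = as_dates.dt.strftime("%Y-%m-%d")

print(as_epoch.dt.year.tolist())
print(formatted.tolist())
```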

Comparing today date with date in dataframe

Sample Data
id date
1 1/2/2018
2 1/5/2019
3 5/3/2018
4 23/11/2018
Desired output
id date
2 1/5/2019
4 23/11/2018
My current code
dfdateList = pd.DataFrame()
dfDate= self.df[["id", "date"]]
today = datetime.datetime.now()
today = today.strftime("%d/%m/%Y").lstrip("0").replace(" 0", "")
expList = []
for dates in dfDate["date"]:
    if dates <= today:
        expList.append(dates)
dfdateList = pd.DataFrame(expList)
Currently my code prints every row regardless of the condition. Can anyone guide me? Thanks.

Pandas has native support for a large class of operations on datetimes, so one solution here would be to use pd.to_datetime to convert your dates from strings to pandas' representation of datetimes, pd.Timestamp, then just create a mask based on the current date:
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df[df['date'] > pd.Timestamp.now()]
For example:
In [34]: df['date'] = pd.to_datetime(df['date'], dayfirst=True)
In [36]: df
Out[36]:
id date
0 1 2018-02-01
1 2 2019-05-01
2 3 2018-03-05
3 4 2018-11-23
In [37]: df[df['date'] > pd.Timestamp.now()]
Out[37]:
id date
1 2 2019-05-01
3 4 2018-11-23
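
The original loop compares strings, and strings are ordered character by character rather than by calendar date. The sketch below illustrates that, then repeats the mask with a fixed reference date (an assumption, used instead of pd.Timestamp.now() so that the result does not change from day to day):

```python
import pandas as pd

# String comparison is lexicographic: "2" sorts before "9", so this is True
# even though 23 November is later than 9 June.
string_compare = "23/11/2018" < "9/6/2018"

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "date": ["1/2/2018", "1/5/2019", "5/3/2018", "23/11/2018"],
})
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Hypothetical reference date standing in for "today".
reference = pd.Timestamp("2018-06-01")
future = df[df["date"] > reference]
print(future)
```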

Cleaning up pandas column of datetime strings

I currently have some data in the form of datestrings that I would like to standardize into a zero-padded %H:%M:%S string. In its original form, the data deviates from the standard format in the following ways:
The time is not zero padded (e.g. '2:05:00')
There can be leading or trailing whitespace (e.g., ' 2:05:00')
There can be times over 24H displayed (e.g., '25:00:00')
Currently, this is what I have:
df['arrival_time'] = pd.to_datetime(df['arrival_time'].map(lambda x: x.strip()), format='%H:%M:%S').dt.strftime('%H:%M:%S')
But I get an error on the times that are over 24H. Is there a good way to transform this dataframe column into the proper format?

I believe you need:
df = pd.DataFrame({'arrival_time':['2:05:00','2:05:00','25:00:00'],})
df['arrival_time'] = df['arrival_time'].str.strip().str.zfill(8)
print (df)
arrival_time
0 02:05:00
1 02:05:00
2 25:00:00
Or:
df['arrival_time'] = (pd.to_datetime(df['arrival_time'].str.strip(), format='%H:%M:%S', errors='coerce')
                        .dt.strftime('%H:%M:%S'))
print (df)
arrival_time
0 02:05:00
1 02:05:00
2 NaT
Or:
# keep only HH:MM:SS; the day part is dropped, so 25:00:00 becomes 01:00:00
df['arrival_time'] = (pd.to_timedelta(df['arrival_time'].str.strip())
                        .astype(str)
                        .str.extract(r'(\d{2}:\d{2}:\d{2})', expand=False))
print (df)
arrival_time
0 02:05:00
1 02:05:00
2 01:00:00
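
If times past 24 hours should be kept as they are (25:00:00 rather than 01:00:00 or NaT), one possible variation is to go through total seconds and rebuild the string. This is only a sketch under that assumption, not part of the answer above:

```python
import pandas as pd

times = pd.Series(["2:05:00", " 2:05:00 ", "25:00:00"])

# Timedeltas have no 24-hour limit, so "25:00:00" parses as 1 day 1 hour.
total = pd.to_timedelta(times.str.strip()).dt.total_seconds().astype(int)

# Rebuild zero-padded HH:MM:SS, letting the hour field exceed 23.
formatted = (
    (total // 3600).astype(str).str.zfill(2) + ":"
    + (total % 3600 // 60).astype(str).str.zfill(2) + ":"
    + (total % 60).astype(str).str.zfill(2)
)
print(formatted.tolist())
```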

Reformat Dataframe column to date only format

I have a dataframe (df) with a 'Date of birth' column:
Date of birth
0 1957-04-30 00:00:00
1 1966-11-10 00:00:00
2 1966-11-10 00:00:00
3 1962-03-28 00:00:00
4 1958-10-28 00:00:00
5 1958-06-04 00:00:00
How can I reformat the column to a date only format? After I reformat I'm going to work out age from a specific date:
Date of birth
0 1957-04-30
1 1966-11-10
2 1966-11-10
3 1962-03-28
4 1958-10-28
5 1958-06-04
I have tried using
df["Date of birth"] = pd.to_datetime(df['Date of birth'], format='%d%b%Y')
df["Date of birth"] = df["Date of birth"].dt.strftime('%m/%d/%Y')
but with no joy.

Once the column has been converted to datetime, use the .dt.date accessor to keep only the date part:
df["Date of birth"] = pd.to_datetime(df['Date of birth']).dt.date
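
One thing to be aware of: .dt.date returns plain Python date objects in an object column, so the .dt accessor is no longer available afterwards. The sketch below, using two of the sample values, contrasts it with .dt.normalize(), which keeps the datetime64 dtype and may be more convenient for later date arithmetic such as working out an age:

```python
import datetime
import pandas as pd

df = pd.DataFrame({"Date of birth": ["1957-04-30 00:00:00", "1966-11-10 00:00:00"]})
parsed = pd.to_datetime(df["Date of birth"], format="%Y-%m-%d %H:%M:%S")

# Object column holding datetime.date values; prints as 1957-04-30.
as_dates = parsed.dt.date

# Still datetime64, with the time part set to midnight.
normalized = parsed.dt.normalize()

print(as_dates.dtype, normalized.dtype)
```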
