Date formatting in Pandas - python

I'm trying to format a column of dates to 'Month Year' format without changing the non-date values.
input_df = pd.DataFrame({'Period' :['2017-11-01 00:00:00', '2019-02-01 00:00:00', 'Mar 2020', 'Pre-Nov 2017', '2019-10-01 00:00:00' , 'Nov 17-Nov 18'] } )
I tried the code below, which didn't work:
output_df['Period'] = input_df['Period'].apply(lambda x: x.strftime('%m %Y') if isinstance(x, datetime.date) else x)
Please help.

You can do this with errors='coerce' and fillna:
input_df['new_period'] = (pd.to_datetime(input_df['Period'], errors='coerce')
                            .dt.strftime('%b %Y')
                            .fillna(input_df['Period'])
                         )
Output:
Period new_period
0 2017-11-01 00:00:00 Nov 2017
1 2019-02-01 00:00:00 Feb 2019
2 Mar 2020 Mar 2020
3 Pre-Nov 2017 Pre-Nov 2017
4 2019-10-01 00:00:00 Oct 2019
5 Nov 17-Nov 18 Nov 17-Nov 18
Update: Second, safer option:
s = pd.to_datetime(input_df['Period'], errors='coerce')
input_df['new_period'] = np.where(s.isna(), input_df['Period'],
                                  s.dt.strftime('%b %Y'))
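The lambda in the question never fires because every value in the column is a plain string, so isinstance(x, datetime.date) is always False. As a minimal sketch of the same fallback idea applied one value at a time with the standard library, the helper below reformats strings that parse and returns everything else unchanged. The name to_month_year and the assumed input format '%Y-%m-%d %H:%M:%S' are illustrative and not from the original posts, and the month abbreviations produced by %b assume an English locale.

```python
from datetime import datetime

# Hypothetical helper; the input format is an assumption based on the sample data.
def to_month_year(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Return 'Mon YYYY' if value parses with fmt, otherwise return value unchanged."""
    try:
        return datetime.strptime(value, fmt).strftime("%b %Y")
    except (TypeError, ValueError):
        # Not a string, or a string that does not match fmt: leave it alone.
        return value

periods = ["2017-11-01 00:00:00", "2019-02-01 00:00:00", "Mar 2020",
           "Pre-Nov 2017", "2019-10-01 00:00:00", "Nov 17-Nov 18"]
converted = [to_month_year(p) for p in periods]
print(converted)
```

With a DataFrame this could be applied as input_df['Period'].apply(to_month_year).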

Related

Creating a function to execute it on entire Dataframe

I have a data that includes columns with dates:
col_1 col_2
'may 2021 - 2023' 'nov 2020 - feb 2021'
'jan 2022 - 2023' 'sep 2021- 2023'
With the code below I can create the required output, but I am looking to create a function which takes a dataframe as input and produces the expected output:
s = df['col_1'].str.split(r'\s*-\s*')
df['year_1'] = (pd
                .to_datetime(s.str[1])
                .sub(pd.to_datetime(s.str[0])))
t = df['col_2'].str.split(r'\s*-\s*')
df['year_2'] = (pd
                .to_datetime(t.str[1])
                .sub(pd.to_datetime(t.str[0])))
To prepare the output below, I need to rerun the code with a different variable for each column. As explained, I need to turn this into a function. Please note that there can be more columns, so the code should work for any number of them.
Expected Output
col_1 Year_1 col_2 Year_2
'may 2021 - 2023' 610 days 'sep 2017-dec 2017' 91 days
'jan 2022 - 2023' 365 days 'sep 2021- 2023' 487 days
You can use:
def compute_days(sr):
    parts = sr.str.strip("'").str.split('-', expand=True)
    start = pd.to_datetime(parts[0])
    end = pd.to_datetime(parts[1])
    return end - start
days = df.apply(compute_days).rename(columns=lambda x: f"Year_{x.split('_')[1]}")
out = pd.concat([df, days], axis=1)
Output:
col_1 col_2 Year_1 Year_2
0 'may 2021 - 2023' 'nov 2020 - feb 2021' 610 days 92 days
1 'jan 2022 - 2023' 'sep 2021- 2023' 365 days 487 days
2 '03/2017 - 08/2021' '2022 - 2023' 1614 days 365 days
3 '' '' NaT NaT
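If automatic format detection is a concern, the endpoints can also be parsed with explicit formats. The sketch below uses only the standard library and assumes each endpoint is either a month abbreviation plus a year ('may 2021') or a bare year ('2023'). The names parse_endpoint and span_days are illustrative. It does not handle the '03/2017' style or the empty strings shown in the output above.

```python
from datetime import datetime

# Assumed endpoint formats, tried in order; extend as needed for other data.
FORMATS = ("%b %Y", "%Y")

def parse_endpoint(text):
    """Parse one side of a range such as 'may 2021' or '2023'."""
    text = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised endpoint: {text!r}")

def span_days(period):
    """Return the number of days between the two endpoints of 'start - end'."""
    start, end = period.strip("'").split("-")
    return (parse_endpoint(end) - parse_endpoint(start)).days

for period in ["may 2021 - 2023", "nov 2020 - feb 2021", "sep 2021- 2023"]:
    print(period, "->", span_days(period))
```

On a DataFrame, something like df[col].map(span_days) for each column would give integer day counts rather than Timedelta values.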

how to convert a column with string datetime to datetime format

I want to convert a column of string dates such as '19 Desember 2022' (the month name is in Indonesian) to a supported datetime format without translating it. How do I do that?
I already tried this:
df_train['date'] = pd.to_datetime(df_train['date'], format='%d %B %Y')
but got the error:
time data '19 Desember 2022' does not match format '%d %B %Y' (match)
Try using dateparser:
import dateparser
df_train = pd.DataFrame(['19 Desember 2022', '20 Desember 2022', '21 Desember 2022', '22 Desember 2022'], columns = ['date'])
df_train['date'] = [dateparser.parse(x) for x in df_train['date']]
df_train
Output:
date
0 2022-12-19
1 2022-12-20
2 2022-12-21
3 2022-12-22
Pandas doesn't recognize Bahasa Indonesia (the Indonesian language) month names. Try replacing the spelling of December (as pointed out, you can do this in one line and create a new column):
df_train["formatted_date"] = pd.to_datetime(df_train["date"].str.replace("Desember", "December"), format="%d %B %Y")
print(df_train)
Output:
user_type date formatted_date
0 Anggota 19 Desember 2022 2022-12-19
1 Anggota 19 Desember 2022 2022-12-19
2 Anggota 19 Desember 2022 2022-12-19
3 Anggota 19 Desember 2022 2022-12-19
4 Anggota 19 Desember 2022 2022-12-19
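Replacing a single month only covers December. A minimal sketch that covers all twelve Indonesian month names maps each name to its month number and builds the date directly, so it does not depend on the system locale. The function name parse_indonesian_date is illustrative, and the input is assumed to always have the form 'day month year'.

```python
from datetime import date

# Indonesian month names mapped to month numbers.
MONTHS_ID = {
    "januari": 1, "februari": 2, "maret": 3, "april": 4,
    "mei": 5, "juni": 6, "juli": 7, "agustus": 8,
    "september": 9, "oktober": 10, "november": 11, "desember": 12,
}

def parse_indonesian_date(text):
    """Parse a string such as '19 Desember 2022' into a datetime.date."""
    day, month_name, year = text.split()
    return date(int(year), MONTHS_ID[month_name.lower()], int(day))

print(parse_indonesian_date("19 Desember 2022"))
```

The results can then be turned into a datetime column with pd.to_datetime(df_train['date'].map(parse_indonesian_date)).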

Convert DD MM YY to Datetime Object in a Pandas Series

In a current pandas series, I have numerous rows in which the date is formatted in the format YY MM DD, with the months being the text abbreviation of the month. For example, 19 JA 02 represents January 2, 2019. Is there a way to convert and parse this series to a datetime object?
I have currently tried the following:
chemical_19['Manufacture Date(MM/DD/YYYY)'].apply(lambda x : datetime.datetime.strptime(x, '%Y/%M/%d'))
If the months were written as Jan, Feb and so on, you could use %b to match the month, and '%y %b %d' to match 19 Jan 02.
In pandas you can use a dictionary {'JA': 'Jan', 'FE':'Feb', ...} with .replace(..., regex=True) to change the names, and then use pd.to_datetime() with '%y %b %d'.
import pandas as pd
df = pd.DataFrame({
'date': ["19 JA 02", "06 FE 14"],
})
print(df)
df['date'] = df['date'].replace({'JA': 'Jan', 'FE':'Feb'}, regex=True)
print(df)
df['date'] = pd.to_datetime(df['date'], format='%y %b %d')
print(df)
Result:
date
0 19 JA 02
1 06 FE 14
date
0 19 Jan 02
1 06 Feb 14
date
0 2019-01-02
1 2006-02-14
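The format string in the question, '%Y/%M/%d', cannot match this data: %Y expects a four-digit year, %M means minutes rather than month, and the separators are slashes while the data uses spaces. A short standard-library sketch of the directives involved follows. It uses only the two month codes given above, because the remaining codes are not listed in the post, and the function name parse_coded_date is illustrative.

```python
from datetime import datetime

# Only the two codes shown in the answer; the full mapping is not known here.
CODE_TO_ABBR = {"JA": "Jan", "FE": "Feb"}

def parse_coded_date(text):
    """Parse 'YY CODE DD', e.g. '19 JA 02', into a datetime."""
    year, code, day = text.split()
    # %y = two-digit year, %b = month abbreviation, %d = day of month
    return datetime.strptime(f"{year} {CODE_TO_ABBR[code]} {day}", "%y %b %d")

print(parse_coded_date("19 JA 02"))
print(parse_coded_date("06 FE 14"))
```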

Need help to convert str date to datetime64 pandas python

Is there a way to convert this type (String) of date format below:
Wed Feb 24 18:04:49 SGT 2021
To datetime64 ns
2021-02-24
I tried the pandas code below and it does not work:
data = {'UpdateTime': [
'Thu May 28 01:24:38 SGT 2020',
'Wed Feb 24 18:04:49 SGT 2021',
'Mon Mar 01 20:34:49 SGT 2021',
'Fri Sep 18 21:29:35 SGT 2020',
'Tue Feb 09 14:21:56 SGT 2021',
'Thu Jan 01 07:30:00 SGT 1970',
]}
df = pd.DataFrame(data)
df['UpdateTime']=pd.to_datetime(df['UpdateTime'].str.split(' ',1).str[0])
and got error
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-04 00:00:00
I'm pretty sure this is regex issue and I'm not familiar with it. Please help.
Thanks!
I don't think a regex is necessary here; you can specify the format in to_datetime:
df['UpdateTime']=pd.to_datetime(df['UpdateTime'], format='%a %b %d %H:%M:%S SGT %Y')
print (df)
UpdateTime
0 2020-05-28 01:24:38
1 2021-02-24 18:04:49
2 2021-03-01 20:34:49
3 2020-09-18 21:29:35
4 2021-02-09 14:21:56
5 1970-01-01 07:30:00
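The attempt in the question keeps only the first token of each string (the weekday name, such as 'Thu'), which is not a full date. To get the date-only result asked for, the parsed timestamps can be truncated to midnight with .dt.normalize(). The following is a minimal sketch on a shortened sample. 'SGT' is matched as literal text here, so the result carries no time zone information, and the UpdateDate column name is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"UpdateTime": [
    "Wed Feb 24 18:04:49 SGT 2021",
    "Thu Jan 01 07:30:00 SGT 1970",
]})

# 'SGT' in the format string is literal text, not a time zone directive.
parsed = pd.to_datetime(df["UpdateTime"], format="%a %b %d %H:%M:%S SGT %Y")

# Drop the time of day but keep a datetime64 dtype.
df["UpdateDate"] = parsed.dt.normalize()
print(df["UpdateDate"])
```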

Turning an object into a datetime raising errors

This is a two-part question.
I'm attempting to convert a column to datetime, an easy task I assumed, as I've done it before on other DataFrames using the documentation without much issue.
df = pd.DataFrame({'date' : ['24 October 2018', '23 April 2018', '18 January 2018']})
print(df)
date
0 24 October 2018
1 23 April 2018
2 18 January 2018
I was going through the datetime docs and I thought this piece of code would convert this column (which is an object) into a datetime
df.date = pd.to_datetime(df['date'], format="%d-%m-%Y",errors='ignore')
which gives the error :
ValueError: time data '24 April 2018' does not match format '%d-%m-%Y' (match)
I've attempted playing with formulas and going through documentation to no avail!
You are using the wrong format. '24 October 2018' uses format="%d %B %Y". The format specifiers are listed here.
edit: -Demo-
>>> import pandas as pd
>>> df = pd.DataFrame({'date':['24 October 2018', '23 April 2018', '18 January 2018']})
>>> df.date = pd.to_datetime(df['date'], format="%d %B %Y")
>>>
>>> df
date
0 2018-10-24
1 2018-04-23
2 2018-01-18
>>>
>>> df['date'][0]
Timestamp('2018-10-24 00:00:00')
>>> df['date'][0].month
10
edit 2: second question
>>> df['status'] = ['complete', 'complete', 'requested']
>>> df
date status
0 2018-10-24 complete
1 2018-04-23 complete
2 2018-01-18 requested
>>>
>>> df[df['status'] != 'complete']
date status
2 2018-01-18 requested
You can use pd.to_datetime or the datetime library:
import datetime as dt
df['date'].apply(lambda x: dt.datetime.strptime(x,'%d %B %Y'))
