How to convert a column of date strings to datetime format - Python

I want to convert a column of date strings such as '19 Desember 2022' (the month name is in Indonesian) to a supported datetime format without translating it. How do I do that?
I already tried this:
df_train['date'] = pd.to_datetime(df_train['date'], format='%d %B %Y')
but got the error:
time data '19 Desember 2022' does not match format '%d %B %Y' (match)

Try using the dateparser library:
import dateparser
import pandas as pd
df_train = pd.DataFrame(['19 Desember 2022', '20 Desember 2022', '21 Desember 2022', '22 Desember 2022'], columns = ['date'])
df_train['date'] = [dateparser.parse(x) for x in df_train['date']]
df_train
Output:
date
0 2022-12-19
1 2022-12-20
2 2022-12-21
3 2022-12-22

pandas does not recognize the Indonesian (Bahasa Indonesia) month name here. Try replacing the Indonesian spelling of December with the English one. As pointed out, you can do this in one line and create a new column:
df_train["formatted_date"] = pd.to_datetime(df_train["date"].str.replace("Desember", "December"), format="%d %B %Y")
print(df_train)
Output:
user_type date formatted_date
0 Anggota 19 Desember 2022 2022-12-19
1 Anggota 19 Desember 2022 2022-12-19
2 Anggota 19 Desember 2022 2022-12-19
3 Anggota 19 Desember 2022 2022-12-19
4 Anggota 19 Desember 2022 2022-12-19
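If the column can contain months other than December, the same idea can be extended with a lookup table. This is only a sketch: the name ID_TO_EN_MONTHS and the sample rows are made up for illustration, and it assumes an English (or C) locale so that %B matches English month names.

```python
import pandas as pd

# Indonesian month names that differ from their English spelling.
# ID_TO_EN_MONTHS is an illustrative name, not part of pandas.
ID_TO_EN_MONTHS = {
    "Januari": "January",
    "Februari": "February",
    "Maret": "March",
    "Mei": "May",
    "Juni": "June",
    "Juli": "July",
    "Agustus": "August",
    "Oktober": "October",
    "Desember": "December",
}

# Sample rows invented for this sketch.
df_train = pd.DataFrame(
    {"date": ["19 Desember 2022", "03 Mei 2021", "17 Agustus 2020"]}
)

# Wrap each name in \b so that only whole words are replaced.
patterns = {rf"\b{name}\b": english for name, english in ID_TO_EN_MONTHS.items()}
translated = df_train["date"].replace(patterns, regex=True)

# With English month names in place, the original format string matches.
df_train["formatted_date"] = pd.to_datetime(translated, format="%d %B %Y")
print(df_train)
```

April, September and November are spelled the same in Indonesian and English, so they are left out of the table.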

Related

Convert DD MM YY to Datetime Object in a Pandas Series

In a pandas Series, I have numerous rows in which the date is in the format YY MM DD, where the month is a text abbreviation. For example, 19 JA 02 represents January 2, 2019. Is there a way to parse this Series and convert it to datetime objects?
I have tried the following so far:
chemical_19['Manufacture Date(MM/DD/YYYY)'].apply(lambda x : datetime.datetime.strptime(x, '%Y/%M/%d'))
If the months were written as Jan, Feb, and so on, you could use %b to match the month, and '%y %b %d' to match 19 Jan 02.
In pandas, you can pass a dictionary {'JA': 'Jan', 'FE':'Feb', ...} to .replace(..., regex=True) to change the names, and then use pd.to_datetime() with '%y %b %d'.
import pandas as pd
df = pd.DataFrame({
    'date': ["19 JA 02", "06 FE 14"],
})
print(df)
df['date'] = df['date'].replace({'JA': 'Jan', 'FE':'Feb'}, regex=True)
print(df)
df['date'] = pd.to_datetime(df['date'], format='%y %b %d')
print(df)
Result:
date
0 19 JA 02
1 06 FE 14
date
0 19 Jan 02
1 06 Feb 14
date
0 2019-01-02
1 2006-02-14
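Because the dictionary above lists only some of the codes, it can help to check which rows still fail to parse after the replacement. A minimal sketch, assuming an English locale for %b: the row '21 XX 05' and its code 'XX' are invented here to stand in for an abbreviation that has not been mapped yet, and the \b word boundaries are an optional precaution so that only standalone codes are replaced.

```python
import pandas as pd

# "21 XX 05" is a made-up row whose month code is not in the mapping.
df = pd.DataFrame({"date": ["19 JA 02", "06 FE 14", "21 XX 05"]})

# Only the two codes given in the answer; extend with the real codes as needed.
month_map = {r"\bJA\b": "Jan", r"\bFE\b": "Feb"}
cleaned = df["date"].replace(month_map, regex=True)

# errors="coerce" turns rows that do not match the format into NaT
# instead of raising, so they can be inspected afterwards.
parsed = pd.to_datetime(cleaned, format="%y %b %d", errors="coerce")

unparsed = df.loc[parsed.isna(), "date"]
print(unparsed.tolist())  # ['21 XX 05']
```

Any value that shows up in unparsed points at a code that still needs an entry in the dictionary.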

Need help to convert str date to datetime64 pandas python

Is there a way to convert a string date in the format below:
Wed Feb 24 18:04:49 SGT 2021
to datetime64[ns]:
2021-02-24
I tried the code below using pandas, but it does not work:
data = {'UpdateTime': [
    'Thu May 28 01:24:38 SGT 2020',
    'Wed Feb 24 18:04:49 SGT 2021',
    'Mon Mar 01 20:34:49 SGT 2021',
    'Fri Sep 18 21:29:35 SGT 2020',
    'Tue Feb 09 14:21:56 SGT 2021',
    'Thu Jan 01 07:30:00 SGT 1970',
]}
df = pd.DataFrame(data)
df['UpdateTime']=pd.to_datetime(df['UpdateTime'].str.split(' ',1).str[0])
and got this error:
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-04 00:00:00
I'm pretty sure this is a regex issue, and I'm not familiar with regex. Please help.
Thanks!
I think a regex is not necessary here; you can specify the format in to_datetime:
df['UpdateTime']=pd.to_datetime(df['UpdateTime'], format='%a %b %d %H:%M:%S SGT %Y')
print (df)
UpdateTime
0 2020-05-28 01:24:38
1 2021-02-24 18:04:49
2 2021-03-01 20:34:49
3 2020-09-18 21:29:35
4 2021-02-09 14:21:56
5 1970-01-01 07:30:00
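The question asked for the date only (2021-02-24). One possible follow-up, sketched here with a hypothetical UpdateDate column, is to parse as above and then set the time part to midnight with .dt.normalize(), which keeps the column as a datetime64 column. Note that 'SGT' in the format string is matched as literal text, so the result carries no timezone information.

```python
import pandas as pd

df = pd.DataFrame({"UpdateTime": [
    "Wed Feb 24 18:04:49 SGT 2021",
    "Thu Jan 01 07:30:00 SGT 1970",
]})

# "SGT" is matched literally; the parsed timestamps are timezone-naive.
parsed = pd.to_datetime(df["UpdateTime"], format="%a %b %d %H:%M:%S SGT %Y")

# normalize() resets the time to 00:00:00 but keeps the datetime64 dtype.
# "UpdateDate" is an illustrative column name.
df["UpdateDate"] = parsed.dt.normalize()
print(df["UpdateDate"])
```

If plain Python date objects are acceptable instead, parsed.dt.date is an alternative, but the column then has object dtype rather than datetime64.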

convert month_year value to month name and year columns in python

I've a sample dataframe
year_month
202004
202005
202011
202012
How can I append a month_name + year column to the dataframe?
year_month month_name
202004 April 2020
202005 May 2020
202011 Nov 2020
202012 Dec 2020
You can use datetime.strptime to convert your string into a datetime object, then use datetime.strftime to convert it back into a string with a different format.
>>> import datetime as dt
>>> import pandas as pd
>>> df = pd.DataFrame(['202004', '202005', '202011', '202012'], columns=['year_month'])
>>> df['month_name'] = df['year_month'].apply(lambda x: dt.datetime.strptime(x, '%Y%m').strftime('%b %Y'))
>>> df
year_month month_name
0 202004 Apr 2020
1 202005 May 2020
2 202011 Nov 2020
3 202012 Dec 2020
You can see the full list of format codes here.
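As a small standalone illustration of the strptime/strftime round trip (the helper name month_label is hypothetical, and the month names come out in English only under an English or C locale): %b gives the abbreviated month name and %B the full one.

```python
import datetime as dt


def month_label(year_month, fmt="%b %Y"):
    """Turn a 'YYYYMM' value such as '202004' (str or int) into a label."""
    # str() lets the helper accept integer columns as well as strings.
    parsed = dt.datetime.strptime(str(year_month), "%Y%m")
    return parsed.strftime(fmt)


print(month_label("202004"))         # Apr 2020
print(month_label(202012, "%B %Y"))  # December 2020
```

With a dataframe this could be applied as df['year_month'].apply(month_label).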

Date formating in Pandas

I'm trying to format a column containing dates to 'Month Year' format without changing the non-date values.
input_df = pd.DataFrame({'Period' :['2017-11-01 00:00:00', '2019-02-01 00:00:00', 'Mar 2020', 'Pre-Nov 2017', '2019-10-01 00:00:00' , 'Nov 17-Nov 18'] } )
I tried the code below, which didn't work:
output_df['Period'] = input_df['Period'].apply(lambda x: x.strftime('%m %Y') if isinstance(x, datetime.date) else x)
Please help.
You can do this with errors='coerce' and fillna:
input_df['new_period'] = (pd.to_datetime(input_df['Period'], errors='coerce')
                            .dt.strftime('%b %Y')
                            .fillna(input_df['Period'])
                         )
Output:
Period new_period
0 2017-11-01 00:00:00 Nov 2017
1 2019-02-01 00:00:00 Feb 2019
2 Mar 2020 Mar 2020
3 Pre-Nov 2017 Pre-Nov 2017
4 2019-10-01 00:00:00 Oct 2019
5 Nov 17-Nov 18 Nov 17-Nov 18
Update: Second, safer option:
import numpy as np
s = pd.to_datetime(input_df['Period'], errors='coerce')
input_df['new_period'] = np.where(s.isna(), input_df['Period'],
                                  s.dt.strftime('%b %Y'))
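Another variant, sketched under the assumption that every real date in the column looks like 'YYYY-MM-DD HH:MM:SS', passes that format explicitly so nothing has to be inferred, and keeps the original text wherever parsing fails:

```python
import pandas as pd

input_df = pd.DataFrame({"Period": [
    "2017-11-01 00:00:00", "2019-02-01 00:00:00", "Mar 2020",
    "Pre-Nov 2017", "2019-10-01 00:00:00", "Nov 17-Nov 18",
]})

# Rows that do not match the explicit format become NaT.
parsed = pd.to_datetime(
    input_df["Period"], format="%Y-%m-%d %H:%M:%S", errors="coerce"
)

# Keep the formatted value where parsing succeeded, the original text otherwise.
input_df["new_period"] = parsed.dt.strftime("%b %Y").where(
    parsed.notna(), input_df["Period"]
)
print(input_df)
```

Selecting on parsed.notna() rather than on the formatted strings means the result does not depend on how missing dates are rendered by strftime.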

Turning an object into a datetime raising errors

A two-part question.
I'm attempting to convert a column to datetime, which I assumed would be an easy task, as I've done it before on different dataframes using the documentation without much issue.
df = pd.DataFrame({'date' : ['24 October 2018', '23 April 2018', '18 January 2018']})
print(df)
date
0 24 October 2018
1 23 April 2018
2 18 January 2018
I was going through the datetime docs and I thought this piece of code would convert this column (which is an object) into a datetime
df.date = pd.to_datetime(df['date'], format="%d-%m-%Y",errors='ignore')
which gives the error:
ValueError: time data '24 October 2018' does not match format '%d-%m-%Y' (match)
I've attempted playing with formulas and going through documentation to no avail!
You are using the wrong format. '24 October 2018' uses format="%d %B %Y". The format specifiers are listed here.
edit: -Demo-
>>> import pandas as pd
>>> df = pd.DataFrame({'date':['24 October 2018', '23 April 2018', '18 January 2018']})
>>> df.date = pd.to_datetime(df['date'], format="%d %B %Y")
>>>
>>> df
date
0 2018-10-24
1 2018-04-23
2 2018-01-18
>>>
>>> df['date'][0]
Timestamp('2018-10-24 00:00:00')
>>> df['date'][0].month
10
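To see why the original format string fails, here is a small sketch using only the standard library's datetime.strptime (the variable names are illustrative, and it assumes an English or C locale for %B). The format has to mirror the text exactly, including separators: '%d-%m-%Y' expects something like 24-10-2018, while '24 October 2018' needs spaces and a full month name.

```python
from datetime import datetime

text = "24 October 2018"

# The dash-separated numeric format does not describe this string.
try:
    datetime.strptime(text, "%d-%m-%Y")
    matched_dashes = True
except ValueError:
    matched_dashes = False

# Spaces as separators and %B for the full month name do.
parsed = datetime.strptime(text, "%d %B %Y")

print(matched_dashes)  # False
print(parsed.date())   # 2018-10-24
```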
edit 2: second question
>>> df['status'] = ['complete', 'complete', 'requested']
>>> df
date status
0 2018-10-24 complete
1 2018-04-23 complete
2 2018-01-18 requested
>>>
>>> df[df['status'] != 'complete']
date status
2 2018-01-18 requested
You can use pd.to_datetime, or the datetime library:
import datetime as dt
df['date'] = df['date'].apply(lambda x: dt.datetime.strptime(x, '%d %B %Y'))
