Convert YY MM DD to Datetime Object in a Pandas Series - python

I have a pandas Series with many rows in which the date is written as YY MM DD, with the month given as a two-letter text abbreviation. For example, 19 JA 02 represents January 2, 2019. Is there a way to parse this Series into datetime objects?
I have tried the following:
chemical_19['Manufacture Date(MM/DD/YYYY)'].apply(lambda x : datetime.datetime.strptime(x, '%Y/%M/%d'))

If the months were written as Jan, Feb, and so on, you could use %b to match the month
and '%y %b %d' to match 19 Jan 02.
In pandas you can use a dictionary such as {'JA': 'Jan', 'FE': 'Feb', ...}
with .replace(..., regex=True) to rewrite the month names,
and then call pd.to_datetime() with format='%y %b %d'.
import pandas as pd
df = pd.DataFrame({
'date': ["19 JA 02", "06 FE 14"],
})
print(df)
df['date'] = df['date'].replace({'JA': 'Jan', 'FE':'Feb'}, regex=True)
print(df)
df['date'] = pd.to_datetime(df['date'], format='%y %b %d')
print(df)
Result:
date
0 19 JA 02
1 06 FE 14
date
0 19 Jan 02
1 06 Feb 14
date
0 2019-01-02
1 2006-02-14
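If the full set of two-letter codes is known, mapping each code to a month number avoids regex replacement and turns unknown codes into NaT instead of raising. A minimal sketch, assuming a hypothetical `MONTH_CODES` table (only JA and FE appear in the question, so the remaining entries depend on your data) and a hypothetical helper name `parse_yy_code_dd`:

```python
import pandas as pd

# Hypothetical lookup table: only 'JA' and 'FE' are confirmed by the question.
# Extend it with the other codes that occur in your data.
MONTH_CODES = {"JA": "01", "FE": "02"}


def parse_yy_code_dd(series: pd.Series) -> pd.Series:
    """Parse strings like '19 JA 02' (YY code DD) into datetimes."""
    parts = series.str.split(expand=True)   # columns: 0 = YY, 1 = code, 2 = DD
    month = parts[1].map(MONTH_CODES)       # unknown codes become NaN
    joined = parts[0] + " " + month + " " + parts[2]
    # Rows with an unknown code are NaN here and end up as NaT.
    return pd.to_datetime(joined, format="%y %m %d", errors="coerce")


dates = parse_yy_code_dd(pd.Series(["19 JA 02", "06 FE 14", "19 XX 02"]))
print(dates)
```

The third value uses a made-up code (XX) to show that unmapped codes come back as NaT, which makes them easy to find with `dates.isna()`.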

Related

how to convert a column with string datetime to datetime format

I want to convert a column of string dates such as '19 Desember 2022' (the month name is in Indonesian) to a supported datetime format without translating it. How do I do that?
I already tried this:
df_train['date'] = pd.to_datetime(df_train['date'], format='%d %B %Y')
but got the error: time data '19 Desember 2022' does not match format '%d %B %Y' (match)
Try using dateparser
import dateparser
import pandas as pd
df_train = pd.DataFrame(['19 Desember 2022', '20 Desember 2022', '21 Desember 2022', '22 Desember 2022'], columns = ['date'])
df_train['date'] = [dateparser.parse(x) for x in df_train['date']]
df_train
Output:
date
0 2022-12-19
1 2022-12-20
2 2022-12-21
3 2022-12-22
With the default English locale, pandas does not recognize Indonesian month names. Try replacing the Indonesian spelling of December (as pointed out, you can do this in one line and create a new column):
df_train["formatted_date"] = pd.to_datetime(df_train["date"].str.replace("Desember", "December"), format="%d %B %Y")
print(df_train)
Output:
user_type date formatted_date
0 Anggota 19 Desember 2022 2022-12-19
1 Anggota 19 Desember 2022 2022-12-19
2 Anggota 19 Desember 2022 2022-12-19
3 Anggota 19 Desember 2022 2022-12-19
4 Anggota 19 Desember 2022 2022-12-19
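The same replacement idea can be extended to every month. A minimal sketch, assuming an English locale for %B and a hypothetical lookup table `ID_TO_EN` (April, September and November are spelled the same in both languages, so they are omitted):

```python
import pandas as pd

# Hypothetical lookup table: Indonesian month names -> English month names.
ID_TO_EN = {
    "Januari": "January",
    "Februari": "February",
    "Maret": "March",
    "Mei": "May",
    "Juni": "June",
    "Juli": "July",
    "Agustus": "August",
    "Oktober": "October",
    "Desember": "December",
}

# Wrap each name in word boundaries so only whole words are replaced.
patterns = {rf"\b{name}\b": english for name, english in ID_TO_EN.items()}

dates_id = pd.Series(["19 Desember 2022", "3 Mei 2021", "17 Agustus 1945"])
translated = dates_id.replace(patterns, regex=True)
parsed = pd.to_datetime(translated, format="%d %B %Y")
print(parsed)
```

This keeps everything inside pandas and avoids an extra dependency, at the cost of maintaining the table by hand.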

Timestamp format not being matched in pandas to_datetime format specifier

I have a pandas dataframe with some timestamp values in a column. I wish to get the sum of values grouped by every hour.
Date_and_Time Frequency
0 Jan 08 15:54:39 NaN
1 Jan 09 10:48:13 NaN
2 Jan 09 10:42:24 NaN
3 Jan 09 20:18:46 NaN
4 Jan 09 12:08:23 NaN
I started off removing the leading days in the column and then typed the following to convert the values to date_time compliant format:
dateTimeValues['Date_and_Time'] = pd.to_datetime(dateTimeValues['Date_and_Time'], format='%b %d %H:%M:%S')
After doing so, I receive the following error:
ValueError: time data 'Jan 08 12:41:' does not match format '%b %d %H:%M:%S' (match)
On checking my input CSV, I can confirm that none of the values in this column are incomplete.
I'd like to know how to resolve this issue and successfully process my timestamps to their desired output format.
I suggest you define your own function that selects the appropriate format string for each value.
You may have to adapt the function to your data:
df = pd.DataFrame({'Date_and_Time':['Jan 08 15:54:39', 'Jan 09 10:48:']})
df
>>>
Date_and_Time
0 Jan 08 15:54:39
1 Jan 09 10:48:
Note the truncated value in row 1.
Now select the format string for every item with the function:
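def my_lambda(x):
    # Default format: full timestamp including seconds
    f = '%b %d %H:%M:%S'
    # Truncated values end with a colon and have no seconds
    if x.endswith(':'):
        f = '%b %d %H:%M:'
    return pd.to_datetime(x, format=f)

df['Date_and_Time'] = df['Date_and_Time'].apply(my_lambda)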
>>> df
Date_and_Time
0 1900-01-08 15:54:39
1 1900-01-09 10:48:00
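Before choosing a per-row format, it can help to see exactly which rows fail. A minimal sketch, assuming the same column name as above and using `errors='coerce'` so that non-matching values become NaT instead of raising (the names `sample`, `parsed_ts` and `bad_rows` are only illustrative):

```python
import pandas as pd

sample = pd.DataFrame(
    {"Date_and_Time": ["Jan 08 15:54:39", "Jan 09 10:48:", "Jan 09 10:42:24"]}
)

# Values that do not match the format become NaT instead of raising ValueError.
parsed_ts = pd.to_datetime(
    sample["Date_and_Time"], format="%b %d %H:%M:%S", errors="coerce"
)

# Show the original strings that could not be parsed.
bad_rows = sample[parsed_ts.isna()]
print(bad_rows)
```

Inspecting `bad_rows` shows whether the problem is in the file itself or was introduced by an earlier processing step.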

ValueError in dataframe while trying to extract day, month and year using datetime python library

I have three columns in my dataframe: Tweet Posted Time (UTC), Tweet Content, and Tweet Location. The "Tweet Posted Time (UTC)" column has date object in the format: 31 Mar 2020 10:49:01
My objective is to reformat the dataframe in such a way that the 'Tweet Posted Time (UTC)' column displays only the day, month and the year alone (such as 31-03-2020), to be able to plot a time-series graph, but my attempts result in the error below.
ValueError: time data '0 31 Mar 2020 10:49:01\n1 31 Mar 2020 05:48:43\n2 30 Mar 2020 05:38:50\n3 29 Mar 2020 21:19:23\n4 29 Mar 2020 20:28:22\n ... \n2488 02 Jan 2018 13:36:07\n2489 02 Jan 2018 10:33:21\n2490 01 Jan 2018 12:23:47\n2491 01 Jan 2018 06:03:51\n2492 01 Jan 2018 02:09:15\nName: Tweet Posted Time (UTC), Length: 2451, dtype: object' does not match format '%d %b %Y %H:%M:%S'
My code is below, can you tell me what I am doing wrong, please?
from datetime import datetime
import pandas as pd
import re #regular expression
from textblob import TextBlob
import string
import preprocessor as p
pd.set_option("expand_frame_repr", False)
df1 = pd.read_csv("C:/tweet_data.csv")
dataType = df1.dtypes
print(dataType)
# convert datetime object to string
old_formatDate = str(df1['Tweet Posted Time (UTC)'])
# extract day, month, and year and convert back to datetime object
date_TimeObject = datetime.strptime(old_formatDate, '%d %b %Y %H:%M:%S')
new_formatDate = date_TimeObject.strftime('%d-%m-%Y')
print(new_formatDate)
I researched and solved the problem by extracting the column as a pandas Series, converting it to datetime, and then applying dt.strftime.
df.columns = ['Tweet_Posted_Time', 'Tweet_Content', 'Tweet_Location']
print(df)
# Extract the date and time column (Tweet_Posted_Time) from the DataFrame as a pandas Series
df1 = pd.Series(df['Tweet_Posted_Time'])
print(df1)
# Convert the Series to datetime format
df1 = pd.to_datetime(df1)
print(df1)
# Convert the dates to the new date format (the result is a Series of strings)
df1 = df1.dt.strftime('%d-%m-%Y')
print(df1)
# Replace the "Tweet_Posted_Time" column in the original DataFrame with the reformatted Series;
# assign() returns a new DataFrame, so keep the result
df = df.assign(Tweet_Posted_Time=df1)
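The original error came from calling `str()` on the whole column, which produces one long string rather than one string per row. A minimal sketch of a column-wise version, assuming the column name from the question and two hypothetical new columns, `date_label` and `date`:

```python
import pandas as pd

tweets = pd.DataFrame(
    {"Tweet Posted Time (UTC)": ["31 Mar 2020 10:49:01", "02 Jan 2018 13:36:07"]}
)

# Parse the whole column at once with an explicit format.
posted = pd.to_datetime(
    tweets["Tweet Posted Time (UTC)"], format="%d %b %Y %H:%M:%S"
)

# String labels such as 31-03-2020, useful for display.
tweets["date_label"] = posted.dt.strftime("%d-%m-%Y")

# Datetime values with the time set to midnight, useful for plotting.
tweets["date"] = posted.dt.normalize()
print(tweets)
```

For a time-series plot, the datetime column is usually the better choice, because strings in DD-MM-YYYY form do not sort chronologically.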

Date formating in Pandas

I'm trying to format a date column to 'Month Year' format without changing the non-date values.
input_df = pd.DataFrame({'Period' :['2017-11-01 00:00:00', '2019-02-01 00:00:00', 'Mar 2020', 'Pre-Nov 2017', '2019-10-01 00:00:00' , 'Nov 17-Nov 18'] } )
I tried the code below, which didn't work:
output_df['Period'] = input_df['Period'].apply(lambda x: x.strftime('%m %Y') if isinstance(x, datetime.date) else x)
Please help.
You can do this with errors='coerce' and fillna:
input_df['new_period'] = (pd.to_datetime(input_df['Period'], errors='coerce')
.dt.strftime('%b %Y')
.fillna(input_df['Period'])
)
Output:
Period new_period
0 2017-11-01 00:00:00 Nov 2017
1 2019-02-01 00:00:00 Feb 2019
2 Mar 2020 Mar 2020
3 Pre-Nov 2017 Pre-Nov 2017
4 2019-10-01 00:00:00 Oct 2019
5 Nov 17-Nov 18 Nov 17-Nov 18
Update: Second, safer option:
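import numpy as np

# Values that cannot be parsed become NaT
s = pd.to_datetime(input_df['Period'], errors='coerce')
# Keep the original text where parsing failed, otherwise use the formatted date
input_df['new_period'] = np.where(s.isna(), input_df['Period'],
                                  s.dt.strftime('%b %Y'))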
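How mixed-format strings are inferred can differ between pandas versions, so passing an explicit format makes it clear which rows count as dates. A minimal sketch, assuming only the 'YYYY-MM-DD HH:MM:SS' rows should be reformatted and an English locale for %b (the names `periods` and `parsed_period` are illustrative):

```python
import pandas as pd

periods = pd.DataFrame({"Period": [
    "2017-11-01 00:00:00", "2019-02-01 00:00:00", "Mar 2020",
    "Pre-Nov 2017", "2019-10-01 00:00:00", "Nov 17-Nov 18",
]})

# Only strings matching this exact format are parsed; everything else becomes NaT.
parsed_period = pd.to_datetime(
    periods["Period"], format="%Y-%m-%d %H:%M:%S", errors="coerce"
)

# Use the formatted date where parsing succeeded, the original text otherwise.
periods["new_period"] = parsed_period.dt.strftime("%b %Y").where(
    parsed_period.notna(), periods["Period"]
)
print(periods)
```

`Series.where` keeps the formatted value where the condition holds and falls back to the original column elsewhere, which mirrors the np.where version without needing NumPy.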

Turning an object into a datetime raising errors

A two-part question.
I'm attempting to convert a column to datetime, which I assumed would be an easy task, as I've done it before on other DataFrames using the documentation without much issue.
df = pd.DataFrame({'date' : ['24 October 2018', '23 April 2018', '18 January 2018']})
print(df)
date
0 24 October 2018
1 23 April 2018
2 18 January 2018
I was going through the datetime docs and I thought this piece of code would convert this column (which is an object) into a datetime
df.date = pd.to_datetime(df['date'], format="%d-%m-%Y",errors='ignore')
which gives the error :
ValueError: time data '24 April 2018' does not match format '%d-%m-%Y' (match)
I've attempted playing with formulas and going through documentation to no avail!
You are using the wrong format. '24 October 2018' uses format="%d %B %Y". The format specifiers are listed in the Python strftime/strptime documentation.
edit: -Demo-
>>> import pandas as pd
>>> df = pd.DataFrame({'date':['24 October 2018', '23 April 2018', '18 January 2018']})
>>> df.date = pd.to_datetime(df['date'], format="%d %B %Y")
>>>
>>> df
date
0 2018-10-24
1 2018-04-23
2 2018-01-18
>>>
>>> df['date'][0]
Timestamp('2018-10-24 00:00:00')
>>> df['date'][0].month
10
edit 2: second question
>>> df['status'] = ['complete', 'complete', 'requested']
>>> df
date status
0 2018-10-24 complete
1 2018-04-23 complete
2 2018-01-18 requested
>>>
>>> df[df['status'] != 'complete']
date status
2 2018-01-18 requested
You can use pd.to_datetime or the datetime library
import datetime as dt
df['date'].apply(lambda x: dt.datetime.strptime(x,'%d %B %Y'))
