Is there a way to convert a date string in the format below:
Wed Feb 24 18:04:49 SGT 2021
to datetime64[ns], like this:
2021-02-24
I tried the pandas code below, but it does not work:
import pandas as pd

data = {'UpdateTime': [
'Thu May 28 01:24:38 SGT 2020',
'Wed Feb 24 18:04:49 SGT 2021',
'Mon Mar 01 20:34:49 SGT 2021',
'Fri Sep 18 21:29:35 SGT 2020',
'Tue Feb 09 14:21:56 SGT 2021',
'Thu Jan 01 07:30:00 SGT 1970',
]}
df = pd.DataFrame(data)
df['UpdateTime']=pd.to_datetime(df['UpdateTime'].str.split(' ',1).str[0])
and got this error:
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-04 00:00:00
I'm pretty sure this is a regex issue, and I'm not familiar with regex. Please help.
Thanks!
I think regex is not necessary here; you can specify the format in to_datetime and match SGT as literal text:
df['UpdateTime']=pd.to_datetime(df['UpdateTime'], format='%a %b %d %H:%M:%S SGT %Y')
print (df)
UpdateTime
0 2020-05-28 01:24:38
1 2021-02-24 18:04:49
2 2021-03-01 20:34:49
3 2020-09-18 21:29:35
4 2021-02-09 14:21:56
5 1970-01-01 07:30:00
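The question asks for a date-only value such as 2021-02-24. Here is a minimal sketch of one way to get that while keeping a datetime dtype, assuming the time of day can be discarded. The column name UpdateDate is hypothetical and not from the original post.

```python
import pandas as pd

# Two sample strings copied from the question
df = pd.DataFrame({"UpdateTime": [
    "Wed Feb 24 18:04:49 SGT 2021",
    "Thu Jan 01 07:30:00 SGT 1970",
]})

# "SGT" is matched as literal text, so the parsed values are timezone-naive
parsed = pd.to_datetime(df["UpdateTime"], format="%a %b %d %H:%M:%S SGT %Y")

# normalize() resets the time to midnight but keeps a datetime dtype
df["UpdateDate"] = parsed.dt.normalize()  # hypothetical column name
print(df["UpdateDate"])
```

If plain strings are enough, parsed.dt.strftime('%Y-%m-%d') would give the same dates as text instead.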
I want to convert a column of date strings such as '19 Desember 2022' (the month name is in Indonesian) to a supported datetime format without translating it. How do I do that?
I already tried this:
df_train['date'] = pd.to_datetime(df_train['date'], format='%d %B %Y')
but got the error:
time data '19 Desember 2022' does not match format '%d %B %Y' (match)
Try using dateparser:
import dateparser
import pandas as pd
df_train = pd.DataFrame(['19 Desember 2022', '20 Desember 2022', '21 Desember 2022', '22 Desember 2022'], columns = ['date'])
df_train['date'] = [dateparser.parse(x) for x in df_train['date']]
df_train
Output:
date
0 2022-12-19
1 2022-12-20
2 2022-12-21
3 2022-12-22
Pandas doesn't recognize Indonesian month names. Try replacing the Indonesian spelling of December with the English one (as pointed out, you can do this in one line and create a new column):
df_train["formatted_date"] = pd.to_datetime(df_train["date"].str.replace("Desember", "December"), format="%d %B %Y")
print(df_train)
Output:
user_type date formatted_date
0 Anggota 19 Desember 2022 2022-12-19
1 Anggota 19 Desember 2022 2022-12-19
2 Anggota 19 Desember 2022 2022-12-19
3 Anggota 19 Desember 2022 2022-12-19
4 Anggota 19 Desember 2022 2022-12-19
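The replacement above only covers December. Below is a rough sketch that extends the same idea to the other Indonesian month names that differ from English. The mapping dictionary and the sample rows are assumptions added for illustration, and it presumes the default English locale for %B.

```python
import pandas as pd

# Indonesian month names that differ from English
# (April, September and November are spelled the same)
ID_TO_EN_MONTHS = {
    "Januari": "January", "Februari": "February", "Maret": "March",
    "Mei": "May", "Juni": "June", "Juli": "July",
    "Agustus": "August", "Oktober": "October", "Desember": "December",
}

# Hypothetical sample data
df_train = pd.DataFrame({"date": ["19 Desember 2022", "05 Mei 2023", "17 Agustus 2021"]})

# Swap each Indonesian name for its English spelling, then parse
english = df_train["date"]
for id_name, en_name in ID_TO_EN_MONTHS.items():
    english = english.str.replace(id_name, en_name, regex=False)

df_train["formatted_date"] = pd.to_datetime(english, format="%d %B %Y")
print(df_train)
```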
I have a list of date strings, formatted like
Fri Apr 23 12:38:07 +0000 2021
How can I change the format? I want to take only the hours. I checked other sources, but they all require changing the date format first, which is exactly what I'm struggling with.
As far as I know, you can write code like
ds['waktu'] = pd.to_datetime(ds['tanggal'], format='%A %b %d %H:%M:%S %z %Y')
to change the format, but I don't know what +0000 means.
If you only want to take the hours from the date strings, you can use .dt.strftime() after the pd.to_datetime() call, as follows:
ds['waktu'] = pd.to_datetime(ds['tanggal'], format='%a %b %d %H:%M:%S %z %Y').dt.strftime('%H:%M:%S')
Note that your format string for pd.to_datetime() is not correct: you need to replace %A (full weekday name) with %a (abbreviated weekday name).
+0000 is the UTC offset (time zone), which you can parse with %z in the format string.
Demo:
ds = pd.DataFrame({'tanggal': ['Fri Apr 23 12:38:07 +0000 2021', 'Thu Apr 22 11:28:17 +0000 2021']})
ds['waktu'] = pd.to_datetime(ds['tanggal'], format='%a %b %d %H:%M:%S %z %Y').dt.strftime('%H:%M:%S')
print(ds)
tanggal waktu
0 Fri Apr 23 12:38:07 +0000 2021 12:38:07
1 Thu Apr 22 11:28:17 +0000 2021 11:28:17
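If "only the hours" means the hour number rather than the full HH:MM:SS time, the .dt.hour accessor is one option. This is a small sketch under that assumption; the column name hour is hypothetical.

```python
import pandas as pd

ds = pd.DataFrame({"tanggal": [
    "Fri Apr 23 12:38:07 +0000 2021",
    "Thu Apr 22 11:28:17 +0000 2021",
]})

# Parse once, keeping the UTC offset given by %z
parsed = pd.to_datetime(ds["tanggal"], format="%a %b %d %H:%M:%S %z %Y")

# .dt.hour returns the hour of day as an integer
ds["hour"] = parsed.dt.hour  # hypothetical column name
print(ds)
```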
Currently I have a datetime column in this format:
Datetime
Thu Jun 18 23:04:19 +0000 2020
Thu Jun 18 23:04:18 +0000 2020
Thu Jun 18 23:04:14 +0000 2020
Thu Jun 18 23:04:13 +0000 2020
I want to change it to:
Datetime
2020-06-18 23:04:19
2020-06-18 23:04:18
2020-06-18 23:04:14
2020-06-18 23:04:13
Assuming you have loaded your pandas DataFrame, you can convert the Datetime column to the desired format with the function below (rename it as you like).
import datetime

def modify_datetime(dtime):
    # Parse the original string, then render it in the target format
    my_time = datetime.datetime.strptime(dtime, '%a %b %d %H:%M:%S %z %Y')
    return my_time.strftime('%Y-%m-%d %H:%M:%S')
The first argument to strptime is the date string and the second is the format.
Directive  Description
%a         Abbreviated weekday name
%b         Abbreviated month name
%d         Day of the month
%H         Hour (24-hour format)
%M         Minute with zero padding
%S         Second with zero padding
%z         UTC offset
%Y         Full year
Once you have converted the date string to a datetime object, you can convert it back to a string in the desired format with strftime. You can read more about the format codes in the Python datetime documentation.
Finally, just modify the Datetime column
df['Datetime'] = df['Datetime'].apply(modify_datetime)
You can use pandas.to_datetime and pandas.Series.dt.strftime appropriately:
>>> import pandas as pd
>>> from datetime import datetime
>>> datetime_strs = ["Thu Jun 18 23:04:19 +0000 2020", "Thu Jun 18 23:04:18 +0000 2020", "Thu Jun 18 23:04:14 +0000 2020", "Thu Jun 18 23:04:13 +0000 2020"]
>>> d = {'Datetimes': datetime_strs}
>>> df = pd.DataFrame(data=d)
>>> df
Datetimes
0 Thu Jun 18 23:04:19 +0000 2020
1 Thu Jun 18 23:04:18 +0000 2020
2 Thu Jun 18 23:04:14 +0000 2020
3 Thu Jun 18 23:04:13 +0000 2020
>>> df['Datetimes'] = pd.to_datetime(df['Datetimes'], format='%a %b %d %H:%M:%S %z %Y')
>>> df
Datetimes
0 2020-06-18 23:04:19+00:00
1 2020-06-18 23:04:18+00:00
2 2020-06-18 23:04:14+00:00
3 2020-06-18 23:04:13+00:00
>>> df['Datetimes'] = df['Datetimes'].dt.strftime('%Y-%m-%d %H:%M:%S')
>>> df
Datetimes
0 2020-06-18 23:04:19
1 2020-06-18 23:04:18
2 2020-06-18 23:04:14
3 2020-06-18 23:04:13
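Note that strftime returns strings. If the column should stay a datetime while losing the +00:00 suffix, one possible approach is to drop the timezone with tz_localize(None). This is a sketch that assumes every value carries the same +0000 offset, as in the question.

```python
import pandas as pd

df = pd.DataFrame({"Datetimes": [
    "Thu Jun 18 23:04:19 +0000 2020",
    "Thu Jun 18 23:04:18 +0000 2020",
]})

# Timezone-aware result because the format contains %z
aware = pd.to_datetime(df["Datetimes"], format="%a %b %d %H:%M:%S %z %Y")

# tz_localize(None) removes the timezone and keeps the wall-clock time
df["Datetimes"] = aware.dt.tz_localize(None)
print(df)
```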
The dates returned by imaplib are in the following format:
dates = [
'Mon, 27 May 2019 13:13:02 -0300 (ART)',
'Tue, 28 May 2019 00:28:31 +0800 (CST)',
'Mon, 27 May 2019 18:32:13 +0200',
'Mon, 27 May 2019 18:43:13 +0200',
'Mon, 27 May 2019 19:00:11 +0200',
'27 May 2019 18:54:58 +0100',
'27 May 2019 18:56:02 +0100',
'Mon, 03 Jun 2019 10:19:56 GMT',
'4 Jun 2019 07:46:30 +0100',
'Mon, 03 Jun 2019 18:48:01 +0200',
'5 Jun 2019 10:39:19 +0100'
]
How can I convert these into say, BST datetimes?
Here's what I've tried so far:
def date_parse(date):
    try:
        return datetime.strptime(date, '%a, %d %b %Y %H:%M:%S %z')
    except ValueError:
        try:
            return datetime.strptime(date[:-6], '%a, %d %b %Y %H:%M:%S %z')
        except ValueError:
            try:
                return datetime.strptime(date[:-6], '%d %b %Y %H:%M:%S')
            except ValueError:
                return datetime.strptime(date[:-4], '%a, %d %b %Y %H:%M:%S')

for date in dates:
    print(date)
    parsed_date = date_parse(date)
    print(parsed_date, type(parsed_date))
    print('')
However, I get the dates repeated, followed by a Traceback (most recent call last): error.
What is the best way to clean these dates?
Is there an imaplib/email function that returns clean dates automatically?
The parse function from dateutil.parser did the trick:
from dateutil.parser import parse
dates = [
'Mon, 27 May 2019 13:13:02 -0300 (ART)',
'Tue, 28 May 2019 00:28:31 +0800 (CST)',
'Mon, 27 May 2019 18:32:13 +0200',
'Mon, 27 May 2019 18:43:13 +0200',
'Mon, 27 May 2019 19:00:11 +0200',
'27 May 2019 18:54:58 +0100',
'27 May 2019 18:56:02 +0100',
'Mon, 03 Jun 2019 10:19:56 GMT',
'4 Jun 2019 07:46:30 +0100',
'Mon, 03 Jun 2019 18:48:01 +0200',
'5 Jun 2019 10:39:19 +0100'
]
for date in dates:
    print(date, type(date))
    print(parse(date), type(parse(date)))
    print('')
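On the question of an email-module function: the standard library's email.utils.parsedate_to_datetime parses RFC 2822 style date headers like these. The sketch below uses it on three of the sample dates and converts the results to BST. The fixed UTC+1 offset is an assumption that fits these May/June dates; a real timezone database entry such as Europe/London would be needed to handle daylight-saving changes.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumption: BST modelled as a fixed UTC+1 offset (no DST rules)
BST = timezone(timedelta(hours=1), "BST")

dates = [
    'Mon, 27 May 2019 13:13:02 -0300 (ART)',
    '27 May 2019 18:54:58 +0100',
    'Mon, 03 Jun 2019 10:19:56 GMT',
]

# Parse each header into an aware datetime, then shift it to BST
converted = [parsedate_to_datetime(d).astimezone(BST) for d in dates]

for original, dt in zip(dates, converted):
    print(original, '->', dt.isoformat())
```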
I'm trying to format a column of dates as 'Month Year' without changing the non-date values.
input_df = pd.DataFrame({'Period' :['2017-11-01 00:00:00', '2019-02-01 00:00:00', 'Mar 2020', 'Pre-Nov 2017', '2019-10-01 00:00:00' , 'Nov 17-Nov 18'] } )
input_df is
expected output is:
I tried the code below, which didn't work:
output_df['Period'] = input_df['Period'].apply(lambda x: x.strftime('%m %Y') if isinstance(x, datetime.date) else x)
Pls help..
You can do this with errors='coerce' and fillna:
input_df['new_period'] = (pd.to_datetime(input_df['Period'], errors='coerce')
                            .dt.strftime('%b %Y')
                            .fillna(input_df['Period'])
                         )
Output:
Period new_period
0 2017-11-01 00:00:00 Nov 2017
1 2019-02-01 00:00:00 Feb 2019
2 Mar 2020 Mar 2020
3 Pre-Nov 2017 Pre-Nov 2017
4 2019-10-01 00:00:00 Oct 2019
5 Nov 17-Nov 18 Nov 17-Nov 18
Update: a second, safer option:
import numpy as np

s = pd.to_datetime(input_df['Period'], errors='coerce')
input_df['new_period'] = np.where(s.isna(), input_df['Period'],
                                  s.dt.strftime('%b %Y'))
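How pandas treats loosely formatted strings such as 'Mar 2020' can depend on the pandas version. A variation worth considering is to pass an explicit format, so that only full timestamps are converted and everything else is left untouched. This is a sketch assuming that all real dates in the column look like '2017-11-01 00:00:00'.

```python
import pandas as pd

input_df = pd.DataFrame({"Period": [
    "2017-11-01 00:00:00", "2019-02-01 00:00:00", "Mar 2020",
    "Pre-Nov 2017", "2019-10-01 00:00:00", "Nov 17-Nov 18",
]})

# Only strings matching the explicit format are parsed; the rest become NaT
s = pd.to_datetime(input_df["Period"], format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Format the parsed dates and fall back to the original text elsewhere
input_df["new_period"] = s.dt.strftime("%b %Y").fillna(input_df["Period"])
print(input_df)
```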