Parsing dates with multiple formats - python

The dates returned by imaplib are in the following format:
dates = [
'Mon, 27 May 2019 13:13:02 -0300 (ART)',
'Tue, 28 May 2019 00:28:31 +0800 (CST)',
'Mon, 27 May 2019 18:32:13 +0200',
'Mon, 27 May 2019 18:43:13 +0200',
'Mon, 27 May 2019 19:00:11 +0200',
'27 May 2019 18:54:58 +0100',
'27 May 2019 18:56:02 +0100',
'Mon, 03 Jun 2019 10:19:56 GMT',
'4 Jun 2019 07:46:30 +0100',
'Mon, 03 Jun 2019 18:48:01 +0200',
'5 Jun 2019 10:39:19 +0100'
]
How can I convert these into, say, BST datetimes?
Here's what I've tried so far:
from datetime import datetime
def date_parse(date):
    try:
        return datetime.strptime(date, '%a, %d %b %Y %H:%M:%S %z')
    except ValueError:
        try:
            return datetime.strptime(date[:-6], '%a, %d %b %Y %H:%M:%S %z')
        except ValueError:
            try:
                return datetime.strptime(date[:-6], '%d %b %Y %H:%M:%S')
            except ValueError:
                return datetime.strptime(date[:-4], '%a, %d %b %Y %H:%M:%S')
for date in dates:
    print(date)
    parsed_date = date_parse(date)
    print(parsed_date, type(parsed_date))
    print('')
However, the dates are printed repeatedly, followed by a Traceback (most recent call last): error.
What is the best way to clean these dates?
Is there an imaplib/email function that returns clean dates automatically?

The parse function from dateutil.parser did the trick:
from dateutil.parser import parse
dates = [
'Mon, 27 May 2019 13:13:02 -0300 (ART)',
'Tue, 28 May 2019 00:28:31 +0800 (CST)',
'Mon, 27 May 2019 18:32:13 +0200',
'Mon, 27 May 2019 18:43:13 +0200',
'Mon, 27 May 2019 19:00:11 +0200',
'27 May 2019 18:54:58 +0100',
'27 May 2019 18:56:02 +0100',
'Mon, 03 Jun 2019 10:19:56 GMT',
'4 Jun 2019 07:46:30 +0100',
'Mon, 03 Jun 2019 18:48:01 +0200',
'5 Jun 2019 10:39:19 +0100'
]
for date in dates:
    print(date, type(date))
    print(parse(date), type(parse(date)))
    print('')
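On the question of whether the email package can do this by itself: the standard library's email.utils.parsedate_to_datetime parses this style of Date header, including the optional weekday and trailing comments such as (ART). The sketch below is one possible approach, not the only one. The helper name to_bst and the fixed +01:00 offset labelled BST are assumptions made for illustration; a fixed offset is only right for summer dates, and zoneinfo.ZoneInfo('Europe/London') would follow the GMT/BST switch instead.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumption: a fixed +01:00 offset stands in for BST (valid for summer dates only).
BST = timezone(timedelta(hours=1), 'BST')

dates = [
    'Mon, 27 May 2019 13:13:02 -0300 (ART)',
    '27 May 2019 18:54:58 +0100',
    'Mon, 03 Jun 2019 10:19:56 GMT',
]

def to_bst(date_header):
    """Parse an email Date header and convert it to the fixed BST offset."""
    # parsedate_to_datetime returns an aware datetime when an offset is present.
    return parsedate_to_datetime(date_header).astimezone(BST)

for date in dates:
    print(date, '->', to_bst(date).isoformat())
```

Because every parsed value is timezone-aware, astimezone can shift all of them to the same offset without any string slicing.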

Related

Need help to convert str date to datetime64 pandas python

Is there a way to convert a date string in the format below:
Wed Feb 24 18:04:49 SGT 2021
to datetime64[ns], like this:
2021-02-24
I tried the code below using pandas, but it does not work:
data = {'UpdateTime': [
'Thu May 28 01:24:38 SGT 2020',
'Wed Feb 24 18:04:49 SGT 2021',
'Mon Mar 01 20:34:49 SGT 2021',
'Fri Sep 18 21:29:35 SGT 2020',
'Tue Feb 09 14:21:56 SGT 2021',
'Thu Jan 01 07:30:00 SGT 1970',
]}
df = pd.DataFrame(data)
df['UpdateTime']=pd.to_datetime(df['UpdateTime'].str.split(' ',1).str[0])
and got this error:
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-04 00:00:00
I'm pretty sure this is a regex issue, and I'm not familiar with regex. Please help.
Thanks!
I don't think a regex is necessary here. You can specify the format in to_datetime:
df['UpdateTime']=pd.to_datetime(df['UpdateTime'], format='%a %b %d %H:%M:%S SGT %Y')
print(df)
UpdateTime
0 2020-05-28 01:24:38
1 2021-02-24 18:04:49
2 2021-03-01 20:34:49
3 2020-09-18 21:29:35
4 2021-02-09 14:21:56
5 1970-01-01 07:30:00
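The format string above works because SGT is matched as literal text rather than through a timezone directive, so the parsed values carry no offset. The same idea can be sketched with the standard library alone. The names SGT and parse_update_time are hypothetical, and treating SGT as a fixed UTC+8 offset is an assumption that may not hold for historical dates.

```python
from datetime import datetime, timedelta, timezone

# Assumption: SGT is treated as a fixed UTC+8 offset.
SGT = timezone(timedelta(hours=8), 'SGT')

def parse_update_time(text):
    """Parse 'Wed Feb 24 18:04:49 SGT 2021' and attach the assumed SGT offset."""
    # 'SGT' in the format is literal text, so strptime returns a naive datetime.
    naive = datetime.strptime(text, '%a %b %d %H:%M:%S SGT %Y')
    return naive.replace(tzinfo=SGT)

stamp = parse_update_time('Wed Feb 24 18:04:49 SGT 2021')
print(stamp.date())                    # the date part only
print(stamp.astimezone(timezone.utc))  # the same instant in UTC
```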

Change datetime format to hours only in Pandas

I have a list of date strings, formatted like this:
Fri Apr 23 12:38:07 +0000 2021
How can I change the format? I only want the hours. I checked other sources, but they all require changing the date format first, which is the part I'm struggling with.
As far as I know, you can write code like
ds['waktu'] = pd.to_datetime(ds['tanggal'], format='%A %b %d %H:%M:%S %z %Y')
to change the format, but I don't know what +0000 means.
If you only want to take the hours from the date strings, you can use .dt.strftime() after the pd.to_datetime() call, as follows:
ds['waktu'] = pd.to_datetime(ds['tanggal'], format='%a %b %d %H:%M:%S %z %Y').dt.strftime('%H:%M:%S')
Note that your format string for pd.to_datetime() is not correct: you need to replace %A (full weekday name) with %a (abbreviated weekday name).
+0000 is the time zone, which you can parse with %z in the format string.
Demo
ds = pd.DataFrame({'tanggal': ['Fri Apr 23 12:38:07 +0000 2021', 'Thu Apr 22 11:28:17 +0000 2021']})
ds['waktu'] = pd.to_datetime(ds['tanggal'], format='%a %b %d %H:%M:%S %z %Y').dt.strftime('%H:%M:%S')
print(ds)
                          tanggal     waktu
0  Fri Apr 23 12:38:07 +0000 2021  12:38:07
1  Thu Apr 22 11:28:17 +0000 2021  11:28:17
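The two points above (%A versus %a, and what +0000 means) can be checked with the standard library's datetime.strptime, which uses the same directives. This is a small sketch; the variable names are chosen only for illustration.

```python
from datetime import datetime, timedelta

text = 'Fri Apr 23 12:38:07 +0000 2021'

# %A expects the full weekday name ("Friday"), so the abbreviated "Fri" is rejected.
try:
    datetime.strptime(text, '%A %b %d %H:%M:%S %z %Y')
    full_name_ok = True
except ValueError:
    full_name_ok = False

# %a matches the abbreviated name, and %z reads +0000 as a UTC offset of zero.
parsed = datetime.strptime(text, '%a %b %d %H:%M:%S %z %Y')
print(full_name_ok)
print(parsed.utcoffset())
print(parsed.hour)
```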

Changing datetime format which includes weekday?

Currently I have a datetime column in this format:
Datetime
Thu Jun 18 23:04:19 +0000 2020
Thu Jun 18 23:04:18 +0000 2020
Thu Jun 18 23:04:14 +0000 2020
Thu Jun 18 23:04:13 +0000 2020
I want to change it to:
Datetime
2020-06-18 23:04:19
2020-06-18 23:04:18
2020-06-18 23:04:14
2020-06-18 23:04:13
Assuming you have already loaded your pandas DataFrame, you can convert the Datetime column to the desired format with the function below (rename it as you like).
import datetime
def modify_datetime(dtime):
    # Parse the original string, then format it without the weekday and offset.
    my_time = datetime.datetime.strptime(dtime, '%a %b %d %H:%M:%S %z %Y')
    return my_time.strftime('%Y-%m-%d %H:%M:%S')
The first argument to strptime is the date string and the second is the format.
Directive  Description
%a         Abbreviated weekday name
%b         Abbreviated month name
%d         Day of the month (zero-padded)
%H         Hour (24-hour clock)
%M         Minute (zero-padded)
%S         Second (zero-padded)
%z         UTC offset
%Y         Full year
Once you have converted the date string to a datetime object, you can convert it back to a string in the desired format with strftime. You can read more about formats here.
Finally, apply the function to the Datetime column:
df['Datetime'] = df['Datetime'].apply(modify_datetime)
You can use pandas.to_datetime and pandas.Series.dt.strftime appropriately:
>>> import pandas as pd
>>> from datetime import datetime
>>> datetime_strs = ["Thu Jun 18 23:04:19 +0000 2020", "Thu Jun 18 23:04:18 +0000 2020", "Thu Jun 18 23:04:14 +0000 2020", "Thu Jun 18 23:04:13 +0000 2020"]
>>> d = {'Datetimes': datetime_strs}
>>> df = pd.DataFrame(data=d)
>>> df
Datetimes
0 Thu Jun 18 23:04:19 +0000 2020
1 Thu Jun 18 23:04:18 +0000 2020
2 Thu Jun 18 23:04:14 +0000 2020
3 Thu Jun 18 23:04:13 +0000 2020
>>> df['Datetimes'] = pd.to_datetime(df['Datetimes'], format='%a %b %d %H:%M:%S %z %Y')
>>> df
Datetimes
0 2020-06-18 23:04:19+00:00
1 2020-06-18 23:04:18+00:00
2 2020-06-18 23:04:14+00:00
3 2020-06-18 23:04:13+00:00
>>> df['Datetimes'] = df['Datetimes'].dt.strftime('%Y-%m-%d %H:%M:%S')
>>> df
Datetimes
0 2020-06-18 23:04:19
1 2020-06-18 23:04:18
2 2020-06-18 23:04:14
3 2020-06-18 23:04:13

getting date from datetime data

I have datetime data in this format:
08:15:54:012 12 03 2016 +0000 GMT+00:00
I need to extract only the date, that is 12 03 2016, in Python.
I have tried
datetime_object=datetime.strptime('08:15:54:012 12 03 2016 +0000 GMT+00:00','%H:%M:%S:%f %d %m %Y')
but I get:
ValueError: unconverted data remains: +0000 GMT+00:00
If you don't mind using an external library, I find the dateparser module much more intuitive than Python's built-in datetime. It can parse pretty much anything if you just do
>>> import dateparser
>>> dateparser.parse('08:15:54:012 12 03 2016 +0000 GMT+00:00')
It claims to handle timezone offsets, though I haven't tested that.
If you need this as a string, use slicing:
text = '08:15:54:012 12 03 2016 +0000 GMT+00:00'
print(text[13:23])
# 12 03 2016
You can also convert it to a datetime:
from datetime import datetime
text = '08:15:54:012 12 03 2016 +0000 GMT+00:00'
datetime_object = datetime.strptime(text[13:23],'%d %m %Y')
print(datetime_object)
# 2016-03-12 00:00:00
BTW:
In your original version you have to remove +0000 GMT+00:00 using the slice [:-16]:
datetime.strptime('08:15:54:012 12 03 2016 +0000 GMT+00:00'[:-16], '%H:%M:%S:%f %d %m %Y')
You can also use split() and join():
>>> x = '08:15:54:012 12 03 2016 +0000 GMT+00:00'.split()
>>> x
['08:15:54:012', '12', '03', '2016', '+0000', 'GMT+00:00']
>>> x[1:4]
['12', '03', '2016']
>>> ' '.join(x[1:4])
'12 03 2016'
You can do it like this:
from datetime import datetime
d = '08:15:54:012 12 03 2016 +0000 GMT+00:00'
d = d[:23]  # remove the timezone details
d = datetime.strptime(d, "%H:%M:%S:%f %d %m %Y")  # parse the string (day before month, as in the question)
d.strftime('%d %m %Y')  # format the date part
You get:
'12 03 2016'
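The slicing answers above depend on fixed character positions. A variant that splits on whitespace and also keeps the +0000 offset could look like the sketch below. It assumes, as the question's own format string does, that the fields are day then month, and that the trailing GMT+00:00 token is redundant and can be dropped.

```python
from datetime import datetime

text = '08:15:54:012 12 03 2016 +0000 GMT+00:00'

# Keep time, day, month, year and the numeric offset; drop the trailing 'GMT+00:00'.
fields = text.split()[:5]

# %z parses the +0000 offset, so the result is a timezone-aware datetime.
parsed = datetime.strptime(' '.join(fields), '%H:%M:%S:%f %d %m %Y %z')

print(parsed.date())                # date object
print(parsed.strftime('%d %m %Y'))  # same layout as the input
```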

How to get time in GMT?

So far I have:
>>> import time
>>> time.strftime("%a, %d %b %Y %H:%M:%S %Z", time.localtime())
'Tue, 10 Sep 2013 22:55:08 Mitteleurop\xe4ische Sommerzeit'
But what I need is:
'Tue, 10 Sep 2013 22:55:08 GMT'
Use time.gmtime() instead of time.localtime(), and write GMT literally in the format string:
>>> import time
>>> time.strftime("%a, %d %b %Y %H:%M:%S GMT", time.gmtime())
'Tue, 10 Sep 2013 20:08:51 GMT'
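One caveat with the strftime approach is that %a and %b follow the current locale, so on a non-English system the day and month names may be localized as well. As an alternative sketch, the standard library's email.utils.formatdate builds the same layout with fixed English names; the variable name epoch_header is only for illustration.

```python
from email.utils import formatdate

# Current time in the 'Tue, 10 Sep 2013 20:08:51 GMT' layout.
print(formatdate(usegmt=True))

# A fixed timestamp (seconds since the epoch) gives a reproducible result.
epoch_header = formatdate(0, usegmt=True)
print(epoch_header)
```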
