I've converted strings to datetimes in columns numerous times. However, in each of those cases the string format was consistent. Now I have a dataframe with mixed formats to convert. An example is below, but this occurs throughout hundreds of thousands of rows.
   date
0  30 Jan 2018
1  January 30 2018
I could convert each format individually, but is there an easy way to convert df['date'] to datetime when the formats are mixed?
The dateparser module can do this for you:
from dateparser import parse
print(repr(parse('2018-04-18 22:33:40')))
print(repr(parse('Wed 11 Jul 2018 23:00:00 GMT')))
Output:
datetime.datetime(2018, 4, 18, 22, 33, 40)
datetime.datetime(2018, 7, 11, 23, 0, tzinfo=<StaticTzInfo 'GMT'>)
Here is a way to do it using datetime.strptime:
from datetime import datetime

def IsNumber(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

def ConvertToDatetime(date):
    date = date.split(" ")  # split on spaces
    if IsNumber(date[0]):  # day first: "dd month yyyy"
        if len(date[1]) == 3:  # abbreviated month: Jan, Feb, ...
            datetime_object = datetime.strptime(" ".join(date), '%d %b %Y')
        else:  # full month name: January, February, ...
            datetime_object = datetime.strptime(" ".join(date), '%d %B %Y')
    else:  # month first: "month dd yyyy"
        if len(date[0]) == 3:  # abbreviated month: Jan, Feb, ...
            datetime_object = datetime.strptime(" ".join(date), '%b %d %Y')
        else:  # full month name: January, February, ...
            datetime_object = datetime.strptime(" ".join(date), '%B %d %Y')
    return datetime_object
You can add more cases using the format codes listed in the datetime documentation.
Here are examples for the two formats in your question:
print(ConvertToDatetime("30 Jan 2018"))
2018-01-30 00:00:00
print(ConvertToDatetime("January 30 2018"))
2018-01-30 00:00:00
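Generalizing the idea above, here is a minimal sketch that tries a list of formats in turn instead of branching on each case. FORMATS and parse_mixed are hypothetical names, and the list only covers the two layouts from the question; it assumes English month names (the default C locale).

```python
from datetime import datetime

# Hypothetical list of formats to try, in order; extend it as new
# variants turn up in the data.
FORMATS = ['%d %b %Y', '%d %B %Y', '%b %d %Y', '%B %d %Y']


def parse_mixed(text, formats=FORMATS):
    """Return a datetime for the first format that matches, else raise."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError('no known format matches %r' % text)


for s in ['30 Jan 2018', 'January 30 2018']:
    print(parse_mixed(s))
```

With pandas this could be applied as df['date'] = df['date'].map(parse_mixed), assuming every value is a string. On pandas 2.0 or later, pd.to_datetime(df['date'], format='mixed'), which infers the format for each element individually, may also be worth trying.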
Related
I have a dataset with a column "date" with values like "Jul 31, 2014", "Sep 23, 2018"...
I want to put the months in a separate column, convert them to integers using pd.to_datetime(df.MONTH, format='%b').dt.month, and then go back and sort by the date index.
How can I choose only the first 3 letters from the cells?
You can try to_datetime with the date format %b %d, %Y:
df["date"] = pd.to_datetime(df["date"], format='%b %d, %Y')
df["month"] = df["date"].dt.month
Code:
import pandas as pd
df = pd.DataFrame({"date": ["Jul 31, 2014", "Sep 23, 2018"]})
print(df)
# date
# 0 Jul 31, 2014
# 1 Sep 23, 2018
df["date"] = pd.to_datetime(df["date"], format='%b %d, %Y')
df["month"] = df["date"].dt.month
print(df)
# date month
# 0 2014-07-31 7
# 1 2018-09-23 9
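To address the first-three-letters part directly, here is a plain-Python sketch; the list rows is an assumption standing in for the column. In pandas the slicing step would be df['date'].str[:3].

```python
from datetime import datetime

rows = ['Sep 23, 2018', 'Jul 31, 2014']

# The first three characters are the abbreviated month name, e.g. 'Sep'
abbrevs = [s[:3] for s in rows]

# Convert each abbreviation to a month number by parsing it alone with %b
months = [datetime.strptime(a, '%b').month for a in abbrevs]

# Sort the full strings chronologically by parsing the whole date
ordered = sorted(rows, key=lambda s: datetime.strptime(s, '%b %d, %Y'))

print(abbrevs)   # ['Sep', 'Jul']
print(months)    # [9, 7]
print(ordered)   # ['Jul 31, 2014', 'Sep 23, 2018']
```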
For more detail on the format codes, refer to the strftime() and strptime() format codes in the Python datetime documentation.
I have a dataframe with column date which looks like this:
Feb 24, 2020 # 12:47:31.616
I would like it to become this:
2020-02-24
I can do this with slicing, since I'm only dealing with one week's data and so every month is Feb.
Is there a neat pandas way to change the datestamp to the date format I want?
Thank you for your suggestions.
Use to_datetime with the format %b %d, %Y # %H:%M:%S.%f and then, if necessary, convert to dates with Series.dt.date or to datetimes with Series.dt.floor:
import pandas as pd
df = pd.DataFrame({'dates':['Feb 24, 2020 # 12:47:31.616','Feb 24, 2020 # 12:47:31.616']})
fmt = '%b %d, %Y # %H:%M:%S.%f'
# dates (Python date objects); kept in a separate variable so the
# original strings are still available for the second option
dates = pd.to_datetime(df['dates'], format=fmt).dt.date
# datetimes truncated to midnight
df['dates'] = pd.to_datetime(df['dates'], format=fmt).dt.floor('D')
print (df)
dates
0 2020-02-24
1 2020-02-24
Using pd.to_datetime with Series.str.split:
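import pandas as pd
df = pd.DataFrame({'date':['Feb 24, 2020 # 12:47:31.616']})
print(df)
# date
# 0 Feb 24, 2020 # 12:47:31.616
# keep only the text before ' # ' (raw string so '\s' is not treated as an escape)
df['date'] = pd.to_datetime(df['date'].str.split(r'\s#\s').str[0], format='%b %d, %Y')
print(df)
# date
# 0 2020-02-24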
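The same two ideas work with the standard library alone. This sketch parses a single string (raw is an illustrative value taken from the question) either with the full format or by splitting off the time first.

```python
from datetime import datetime

raw = 'Feb 24, 2020 # 12:47:31.616'

# Option 1: parse everything; %f accepts 1-6 digits, so '616' -> 616000 us
full = datetime.strptime(raw, '%b %d, %Y # %H:%M:%S.%f')
print(full.date())  # 2020-02-24

# Option 2: keep only the part before ' # ' and parse the date alone
date_part = raw.split(' # ')[0]
print(datetime.strptime(date_part, '%b %d, %Y').date())  # 2020-02-24
```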
I am getting the following error:
ValueError: time data 'Feb 1, 2017 0:03 pm' does not match format '%b %d, %Y %I:%M %p'
Here is the code :
from datetime import datetime
latest_datetime = 'Feb 1, 2017 0:03 pm'
datetime_obj = datetime.strptime(latest_datetime, "%b %d, %Y %I:%M %p")
I'm unable to figure out why I get the error.
A 12-hour clock has no 0 hour; %I only matches 1 through 12. Your timestamp contains an impossible time:
0:03 pm
From the strftime() and strptime() Behavior documentation:
%I
Hour (12-hour clock) as a zero-padded decimal number.
01, 02, ..., 12
Assuming 0 really means 12, you could repair this by replacing ' 0:' with ' 12:' (note the leading space in both strings, so that only a lone 0 hour is replaced and not the 0 in, say, 10:):
>>> from datetime import datetime
>>> latest_datetime = 'Feb 1, 2017 0:03 pm'
>>> datetime.strptime(latest_datetime.replace(' 0:', ' 12:'), "%b %d, %Y %I:%M %p")
datetime.datetime(2017, 2, 1, 12, 3)
It doesn't matter whether you have one or two spaces between the year and the hour: whitespace in a strptime format matches one or more whitespace characters, so the string is parsed either way.
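If the strings vary more, a regular expression can target the hour field specifically. This is a sketch under the same assumption that a 0 hour means 12; fix_zero_hour is a hypothetical helper name.

```python
import re
from datetime import datetime


def fix_zero_hour(text):
    # Assumption: an hour of 0 on a 12-hour clock means 12.
    # Only rewrite a '0' that is not preceded by a digit and is directly
    # followed by ':MM' and an am/pm marker.
    return re.sub(r'(?<!\d)0(?=:\d\d\s*[ap]m\b)', '12', text, flags=re.IGNORECASE)


fixed = fix_zero_hour('Feb 1, 2017 0:03 pm')
print(fixed)  # Feb 1, 2017 12:03 pm
print(repr(datetime.strptime(fixed, '%b %d, %Y %I:%M %p')))
```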
I am parsing emails through Gmail API and have got the following date format:
Sat, 21 Jan 2017 05:08:04 -0800
I want to convert it into ISO 2017-01-21 (yyyy-mm-dd) format for MySQL storage. I am not able to do it through strftime()/strptime() and am missing something. Can someone please help?
TIA
Use dateutil's parser, then call isoformat() or date() on the resulting datetime:
import dateutil.parser as parser
text = 'Sat, 21 Jan 2017 05:08:04 -0800'
date = parser.parse(text)
print(date.isoformat())
print(date.date())
Output :
2017-01-21T05:08:04-08:00
2017-01-21
You can do it with strptime():
import datetime
datetime.datetime.strptime('Sat, 21 Jan 2017 05:08:04 -0800', '%a, %d %b %Y %H:%M:%S %z')
That gives you:
datetime.datetime(2017, 1, 21, 5, 8, 4, tzinfo=datetime.timezone(datetime.timedelta(-1, 57600)))
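Because Gmail returns RFC 2822 dates, the standard library's email.utils.parsedate_to_datetime (Python 3.3+) can also parse them without a format string. A minimal sketch, assuming you want the calendar date as sent (in the sender's offset) for a MySQL DATE column:

```python
from email.utils import parsedate_to_datetime

header = 'Sat, 21 Jan 2017 05:08:04 -0800'

# Parses RFC 2822 dates, including the numeric UTC offset
dt = parsedate_to_datetime(header)

# Local calendar date as sent, formatted as yyyy-mm-dd
print(dt.date().isoformat())  # 2017-01-21
print(dt.utcoffset())         # -1 day, 16:00:00
```

If you would rather store the UTC date, convert first with dt.astimezone(datetime.timezone.utc); near midnight the two can differ.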
You can even do it manually with a simple split and a dictionary. That way, you have more control over the formatting.
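def dateconvertor(date):
    # 'Sat, 21 Jan 2017 05:08:04 -0800' -> ['Sat,', '21', 'Jan', '2017', ...]
    parts = date.split(' ')
    month = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
             'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
    # yyyy-mm-dd with zero-padded month and day
    print('{}-{:02d}-{:02d}'.format(parts[3], month[parts[2]], int(parts[1])))

def main():
    dt = "Sat, 21 Jan 2017 05:08:04 -0800"
    dateconvertor(dt)

if __name__ == '__main__':
    main()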
Keep it simple.
from datetime import datetime
s="Sat, 21 Jan 2017 05:08:04 -0800"
# %z parses the numeric UTC offset (-0800)
d = datetime.strptime(s, "%a, %d %b %Y %H:%M:%S %z")
print(d.strftime("%Y-%m-%d"))
Output : 2017-01-21
from datetime import datetime

def deadlines(t):
    '''shows pretty time to deadlines'''
    fmt = '%a %d %m %Y %I:%M %p %Z'
    dt = datetime.strptime(t, fmt)
    print('dt ', repr(dt))

first = 'Sun 11 May 2014 05:00 PM PDT'
deadlines(first)
ValueError: time data 'Sun 11 May 2014 02:00 PM PDT' does not match format ' %a %d %m %Y %I:%M %p %Z '
What's wrong with this?
%m matches months represented as a decimal number (01 to 12). Use %b for abbreviated month names, or %B for full month names instead:
fmt = '%a %d %b %Y %I:%M %p %Z'
A table of the date format directives and their meanings is in the "strftime() and strptime() Behavior" section of the datetime documentation.
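A quick sketch of the difference, leaving out the %Z part (which is platform-dependent, as discussed next); s is an illustrative value from the question:

```python
from datetime import datetime

s = 'Sun 11 May 2014 05:00 PM'

# %b matches the abbreviated month name 'May'
ok = datetime.strptime(s, '%a %d %b %Y %I:%M %p')
print(repr(ok))  # datetime.datetime(2014, 5, 11, 17, 0)

# %m expects a month number such as 05, so the same string is rejected
try:
    datetime.strptime(s, '%a %d %m %Y %I:%M %p')
except ValueError as exc:
    print('ValueError:', exc)
```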
If you're having trouble parsing PDT using %Z:
Per the time.strptime docs:
Support for the %Z directive is based on the values contained in
tzname and whether daylight is true. Because of this, it is
platform-specific except for recognizing UTC and GMT which are always
known (and are considered to be non-daylight savings timezones).
So, if parsing the date string without PDT works:
In [73]: datetime.strptime('Sun 11 May 2014 05:00 PM', '%a %d %b %Y %I:%M %p')
Out[73]: datetime.datetime(2014, 5, 11, 17, 0)
but
datetime.strptime('Sun 11 May 2014 05:00 PM PDT', '%a %d %b %Y %I:%M %p %Z')
raises a ValueError, then you may need to strip off the timezone name (timezone abbreviations are, in general, ambiguous anyway):
In [10]: datestring = 'Sun 11 May 2014 05:00 PM PDT'
In [11]: datestring, _ = datestring.rsplit(' ', 1)
In [12]: datestring
Out[12]: 'Sun 11 May 2014 05:00 PM'
In [13]: datetime.strptime(datestring, '%a %d %b %Y %I:%M %p')
Out[13]: datetime.datetime(2014, 5, 11, 17, 0)
or use dateutil:
In [1]: import dateutil.parser as parser
In [2]: parser.parse('Sun 11 May 2014 05:00 PM PDT')
Out[2]: datetime.datetime(2014, 5, 11, 17, 0)
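Both approaches above drop the zone. If you do know what the abbreviations mean in your data, one option is to strip the name and attach an offset from your own table. TZ_ABBREV and parse_with_abbrev are hypothetical names, and the offsets are assumptions you should set for your data. (dateutil's parse also accepts a tzinfos mapping for the same purpose.)

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table: abbreviations are ambiguous, so you decide
# what each one means for your data.
TZ_ABBREV = {
    'PDT': timezone(timedelta(hours=-7), 'PDT'),
    'PST': timezone(timedelta(hours=-8), 'PST'),
    'UTC': timezone.utc,
}


def parse_with_abbrev(text):
    # Split off the trailing zone name, parse the rest, then attach the offset
    stamp, abbrev = text.rsplit(' ', 1)
    naive = datetime.strptime(stamp, '%a %d %b %Y %I:%M %p')
    return naive.replace(tzinfo=TZ_ABBREV[abbrev])


dt = parse_with_abbrev('Sun 11 May 2014 05:00 PM PDT')
print(dt.isoformat())  # 2014-05-11T17:00:00-07:00
```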