Convert date type object to datetime - python

I have a dataframe with Date_Birth in the following format: July 1, 1991 (as type object).
How can I change the entire column to datetime?
Thanks

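Use the pandas.to_datetime function. You can write out a format specifier in strptime syntax or let pandas infer the format. In your case, inference works. (Since pandas 2.0 the infer_datetime_format argument is deprecated and has no effect, because a consistent format is inferred by default.)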
>>> import pandas as pd
>>> df=pd.DataFrame({"Date":["July 1, 1991"]})
>>> pd.to_datetime(df["Date"], format="%B %d, %Y")
0 1991-07-01
Name: Date, dtype: datetime64[ns]
>>> pd.to_datetime(df["Date"], infer_datetime_format=True)
0 1991-07-01
Name: Date, dtype: datetime64[ns]
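To change the column itself rather than just display the result, assign the parsed values back. A minimal sketch, assuming the column is named Date_Birth as in the question; the sample rows and the errors="coerce" choice (unparseable entries become NaT instead of raising) are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

# Hypothetical sample data; the last row is deliberately malformed.
df = pd.DataFrame({"Date_Birth": ["July 1, 1991", "March 15, 1988", "unknown"]})

# Parse with an explicit strptime-style format and overwrite the object column.
df["Date_Birth"] = pd.to_datetime(
    df["Date_Birth"], format="%B %d, %Y", errors="coerce"
)

print(df["Date_Birth"].dtype)         # a datetime64 dtype
print(df["Date_Birth"].isna().sum())  # number of rows that failed to parse
```

Counting the NaT values afterwards is a cheap way to see how many rows did not match the format.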

Related

Convert date column (string) to datetime and match the format

I'm trying to convert the following date column (str) to datetime64, but pandas says the format doesn't match. Can anyone help me, please? :)
Column:
df["Date"]
0 15/7/21
...
2541 13/9/21
dtype: object
What I tried:
pd.to_datetime(df["Date"], format = "%d/%m/%Y")
ValueError: time data '15/7/21' does not match format '%d/%m/%Y' (match)
I also tried:
pd.to_datetime(df["Date"].astype("datetime64"), format='%d/%m/%Y')
That converts the column to datetime, but in some dates the day and the month are swapped.
Does anyone know what to do?
%Y expects a 4-digit year. Use %y for a 2-digit year (See the docs):
>>> import pandas as pd
>>> df = pd.DataFrame({'Date':['15/7/21','13/9/21']})
>>> df['Date']
0 15/7/21
1 13/9/21
Name: Date, dtype: object
>>> pd.to_datetime(df['Date'], format='%d/%m/%y')
0 2021-07-15
1 2021-09-13
Name: Date, dtype: datetime64[ns]
Note that pandas can often guess the format, although an explicit format is safer when day and month could be confused:
>>> pd.to_datetime(df['Date'])
0 2021-07-15
1 2021-09-13
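When the first values in a column are ambiguous (both numbers could be a month), guessing can pick month-first. A short sketch of the two ways to force day-first parsing; the sample values are made up for illustration:

```python
import pandas as pd

# "3/5/21" is ambiguous on its own: 3 May or 5 March.
dates = pd.Series(["3/5/21", "15/7/21"])

# Option 1: spell out the format (%y is the 2-digit year).
explicit = pd.to_datetime(dates, format="%d/%m/%y")

# Option 2: keep inference but tell pandas the day comes first.
day_first = pd.to_datetime(dates, dayfirst=True)

print(explicit)
print(day_first)
```

Both calls read the first value as 3 May 2021. The explicit format is stricter, because any value that does not match raises an error instead of being reinterpreted.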

Identifying a dateformat and change it into another

I am working with the following piece of data which has a different format of dates and which creates confusion later in the process. The data is like:
S. No  DateTime                Area
1      03/05/2019 6:33         A
2      06/03/2019 07:23:45 AM  B
The first row is in the format %m/%d/%Y h:mm and the second row is in the format %d/%m/%Y hh:mm:ss AM/PM. The first date value can be confusing, though: is it 5th March or 3rd May? To put everything in the same format, I want my code to detect the date format and convert it to the desired one.
I have tried doing this:
df['Detection Date'] = pd.to_datetime(df['Detection Date & Time'], errors = 'coerce').dt.datetime
col = df['Detection Date'].apply(str)
for i in df.index:
    if datetime.datetime.strptime(col, '%m/%d/%Y h:mm'):
        ColDate = datetime.datetime.strftime(col, '%d/%m/%Y hh:mm:ss AM/PM')
But I am getting an error saying:
TypeError: strptime() argument 1 must be str, not Series
How should this be done?
Thanks
If it is OK to install a dependency, you can use dateparser:
import pandas as pd
import dateparser
df = pd.DataFrame({'Detection Date & Time': ['03/05/2019 6:33', '06/03/2019 07:23:45 AM']})
df['Date & time'] = df['Detection Date & Time'].apply(dateparser.parse)
You can try both possible formats with to_datetime. With errors='coerce', values that do not match a format become missing (NaT), so the two results can be combined with Series.fillna. Note that %I is the 12-hour field that pairs with %p:
date1 = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%m/%d/%Y %H:%M')
date2 = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%d/%m/%Y %I:%M:%S %p')
df['DateTime'] = date1.fillna(date2)
print (df)
S. No DateTime Area
0 1 2019-03-05 06:33:00 A
1 2 2019-03-06 07:23:45 B
Finally, if you want a specific output format, add Series.dt.strftime. The advantage of this solution is that both formats are verified:
df['DateTime'] = date1.fillna(date2).dt.strftime('%d/%m/%Y %I:%M:%S %p')
print (df)
S. No DateTime Area
0 1 05/03/2019 06:33:00 AM A
1 2 06/03/2019 07:23:45 AM B
Details:
print (date1)
0 2019-03-05 06:33:00
1 NaT
Name: DateTime, dtype: datetime64[ns]
print (date2)
0 NaT
1 2019-03-06 07:23:45
Name: DateTime, dtype: datetime64[ns]
Another possible solution that does not verify the second format: only values matching %m/%d/%Y %H:%M are converted to %d/%m/%Y %I:%M:%S %p, and the rest are left unchanged:
date1 = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%m/%d/%Y %H:%M').dt.strftime('%d/%m/%Y %I:%M:%S %p')
# rows that did not match are missing after strftime, so fall back to the original strings
df['DateTime'] = date1.fillna(df['DateTime'])
print (df)
S. No DateTime Area
0 1 05/03/2019 06:33:00 AM A
1 2 06/03/2019 07:23:45 AM B
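The coerce-and-fill idea generalizes to any number of candidate formats. A minimal sketch, assuming the formats are tried in priority order; parse_with_formats is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

def parse_with_formats(values, formats):
    """Try each strptime format in order; the first one that matches a row wins."""
    result = None
    for fmt in formats:
        # Rows that do not match this format become NaT.
        parsed = pd.to_datetime(values, format=fmt, errors="coerce")
        # Keep earlier matches and fill only the rows that are still missing.
        result = parsed if result is None else result.fillna(parsed)
    return result

raw = pd.Series(["03/05/2019 6:33", "06/03/2019 07:23:45 AM"])
parsed = parse_with_formats(
    raw, ["%m/%d/%Y %H:%M", "%d/%m/%Y %I:%M:%S %p"]
)
print(parsed)
```

Rows that match none of the formats stay NaT, so parsed.isna() shows which inputs still need attention. Order matters only when a value could match more than one format.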

Pandas changing date to a shorter format

G'day!
In my limited time working with Python and Pandas one question comes up time and time again - what if my input data has date/time in a long format, how to change it to a shorter version?
For example, the date in the input file would be:
10/10/2019 5:52:30 AM
If I want to perform date/time operations with it, I'll need to convert it to datetime:
df['date'] = pd.to_datetime(df['date'], format="%d/%m/%Y %I:%M:%S %p")
So now I have datetime objects in full long format. But what if I only need the day/month/year?
I could of course convert them to strings and then convert those back into datetime format.
df['date'] = df['date'].dt.strftime("%d/%m/%Y")
df['date'] = pd.to_datetime(df['date'], format="%d/%m/%Y")
It hurts my eyes to look at this... There should be a simpler way of doing this, right?
The pandas dt.floor or dt.round methods can do the job (floor truncates to midnight of the same day, round goes to the nearest midnight):
import pandas as pd

# Generate the data
df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5],
                   'hour': [2, 23]})
df['Date'] = pd.to_datetime(df)
# Floor and round datetime
df['Date'].dt.floor('d')
df['Date'].dt.round('d')
The output for dt.floor is:
0 2015-02-04
1 2016-03-05
Name: Date, dtype: datetime64[ns]
and for dt.round:
0 2015-02-04
1 2016-03-06
Name: Date, dtype: datetime64[ns]
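Two related accessors are worth knowing alongside floor. A short sketch with made-up timestamps: dt.normalize() also resets the time to midnight and keeps the datetime64 dtype, while dt.date returns plain datetime.date objects in an object column:

```python
import datetime
import pandas as pd

stamps = pd.Series(pd.to_datetime(["2019-10-10 05:52:30", "2019-10-11 23:10:00"]))

floored = stamps.dt.floor("D")       # midnight of the same day, still datetime64
normalized = stamps.dt.normalize()   # same result as floor("D")
plain_dates = stamps.dt.date         # datetime.date objects, dtype object

print(floored)
print(plain_dates)
```

Staying in datetime64 (floor or normalize) keeps the .dt accessor and fast date arithmetic available. dt.date is mainly useful for display or when handing values to code that expects datetime.date.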

pandas to_datetime converts non-zero padded month and day into datetime

I am using pd.to_datetime to convert strings into datetime;
df = pd.DataFrame(data={'id':['DD-83']})
pd.to_datetime(df['id'].str.replace(r'\D+', '', regex=True), errors='coerce', format='%d%m')
%d%m defines zero-padded day and month, but the code still converts the above string into
0 1900-03-08
Name: id, dtype: datetime64[ns]
I am wondering how to avoid it being converted into datetime (e.g. convert to NaT in this case), if the month and day in a string are not 0-padded. So
DD0306
DD0706
DD-83
will convert to
1900-06-03
1900-06-07
NaT
You need to check for a hyphen ('-') and pass only the strings that do not contain one.
Setup:
df = pd.DataFrame(data={'id':['DD-83', 'DD0706', 'DD0306']})
Code:
df['date'] = pd.to_datetime(df['id'].loc[~df['id'].str.contains('-')].str.replace(r'\D+', '', regex=True), errors='coerce', format='%d%m')
Output:
id date
0 DD-83 NaT
1 DD0706 1900-06-07
2 DD0306 1900-06-03
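If the real rule is "exactly four digits after the prefix" rather than "no hyphen", a stricter variant is to extract the digits with a pattern that only matches the padded form. This is a sketch under that assumption; the regular expression is illustrative and not taken from the original answer:

```python
import pandas as pd

df = pd.DataFrame(data={'id': ['DD-83', 'DD0706', 'DD0306']})

# Capture exactly four trailing digits preceded only by non-digits.
# 'DD-83' has two digits, so it does not match and yields NaN.
digits = df['id'].str.extract(r'^\D*(\d{4})$', expand=False)

# NaN inputs become NaT; matched strings are parsed as day then month.
df['date'] = pd.to_datetime(digits, format='%d%m', errors='coerce')
print(df)
```

This also rejects values such as 'DD83' that have no hyphen but are still not zero-padded.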

Convert pandas series cell to string and datetime object

I have sliced the pandas dataframe.
end_date = df[-1:]['end']
type(end_date)
Out[4]: pandas.core.series.Series
end_date
Out[3]:
48173 2017-09-20 04:47:59
Name: end, dtype: datetime64[ns]
How to get rid of end_date's index value 48173 and get only 2017-09-20 04:47:59 string? I have to call REST API with 2017-09-20 04:47:59 as a parameter, so I have to get string from pandas datetime64 series.
How to get rid of end_date's index value 48173 and get only datetime object [something like datetime.datetime.strptime('2017-09-20 04:47:59', '%Y-%m-%d %H:%M:%S')]. I need it because, later I will have to check if '2017-09-20 04:47:59' < datetime.datetime(2017,1,9)
I need to convert just a single cell value, not a whole column.
How to do these conversions?
It seems you need:
import pandas as pd
data = ['2017-09-20 04:47:59','2017-10-20 04:47:59','2017-09-30 04:47:59']
df = pd.DataFrame(data,columns=['end'])
df['end'] = pd.to_datetime(df['end'])
df
df will be:
end
0 2017-09-20 04:47:59
1 2017-10-20 04:47:59
2 2017-09-30 04:47:59
After that you can use the code below to drop the index and work with the value as a Timestamp object:
import datetime

end_date = df['end'].iloc[-1] # get the last row of column end
print(type(end_date)) # pandas Timestamp (a subclass of datetime.datetime)
end_date_str = end_date.strftime('%Y-%m-%d %H:%M:%S') # convert to str
print(end_date_str) # '2017-09-30 04:47:59'
print(end_date < datetime.datetime(2017,1,9)) # False
Simply cast the result to a string, and recover it using .values[0]:
In [38]: end_date
Out[38]:
48173 2017-09-20 04:47:59
Name: end, dtype: datetime64[ns]
In [39]: end_date.astype(str).values[0]
Out[39]: '2017-09-20 04:47:59'
If you want a datetime object, one route is to go through the integer timestamp (nanoseconds since the epoch) and convert that back to a datetime object:
In [42]: end_date.values[0].item()
Out[42]: 1505882879000000000
In [43]: datetime.datetime.fromtimestamp(end_date.values[0].item()/10**9)
Out[43]: datetime.datetime(2017, 9, 20, 6, 47, 59)
Otherwise, you can strptime the string recovered in step 1:
In [48]: datetime.datetime.strptime(end_date.astype(str).values[0], '%Y-%m-%d %H:%M:%S')
Out[48]: datetime.datetime(2017, 9, 20, 4, 47, 59)
You may wonder why there is a 2-hour difference between the results. This is because datetime.datetime.fromtimestamp treats the number as seconds since the UTC epoch and converts it to my local timezone (currently CEST, which is UTC+2).
On the other hand, parsing a string to a datetime object doesn't involve any timezone information: strptime naively parses the timestamp without regard for the timezone, which leads to the 2-hour discrepancy.
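The timezone detour can be avoided entirely, because a single element of a datetime64 Series is a pandas Timestamp, which offers strftime and to_pydatetime directly. A minimal sketch; the one-row Series is rebuilt here from the values shown in the question:

```python
import datetime
import pandas as pd

# Rebuild the one-row slice from the question (index 48173, column name 'end').
end_date = pd.Series(
    pd.to_datetime(["2017-09-20 04:47:59"]), index=[48173], name="end"
)

stamp = end_date.iloc[0]                        # pandas Timestamp, index dropped
as_text = stamp.strftime("%Y-%m-%d %H:%M:%S")   # string for the REST call
as_datetime = stamp.to_pydatetime()             # naive datetime.datetime

print(as_text)
print(as_datetime < datetime.datetime(2017, 1, 9))
```

Since no epoch conversion is involved, the wall-clock time stays 04:47:59 regardless of the machine's timezone.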
