Pandas changing date to a shorter format - python

G'day!
In my limited time working with Python and Pandas one question comes up time and time again - what if my input data has date/time in a long format, how to change it to a shorter version?
For example, the date in the input file would be:
10/10/2019 5:52:30 AM
If I want to perform date/time operations with it, I'll need to convert it to datetime:
df['date'] = pd.to_datetime(df['date'], format="%d/%m/%Y %I:%M:%S %p")
So now I have datetime objects in full long format. But what if I only need the day/month/year?
I could of course convert them to strings and then back into datetime format:
df['date'] = df['date'].dt.strftime("%d/%m/%Y")
df['date'] = pd.to_datetime(df['date'], format="%d/%m/%Y")
It hurts my eyes to look at this... There should be a simpler way of doing this, right?

Pandas floor or round functions can do the job:
#Generate the data
df = pd.DataFrame({'year': [2015, 2016],
'month': [2, 3],
'day': [4, 5],
'hour': [2, 23]})
df['Date']=pd.to_datetime(df)
#Floor and round datetime
df['Date'].dt.floor('d')
df['Date'].dt.round('d')
The output for dt.floor is:
0 2015-02-04
1 2016-03-05
Name: Date, dtype: datetime64[ns]
and for dt.round:
0 2015-02-04
1 2016-03-06
Name: Date, dtype: datetime64[ns]
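Applied to the sample value from the question, this might look like the sketch below. It assumes the column is called date as in the question and uses %I (12-hour clock) so that the AM/PM marker is honoured. The second row is made up purely to show the PM case.

```python
import pandas as pd

# Sample values in the question's day/month/year 12-hour format.
# The second (PM) row is an illustrative addition, not from the question.
df = pd.DataFrame({'date': ['10/10/2019 5:52:30 AM', '11/10/2019 5:52:30 PM']})

# %I is the 12-hour clock directive; %p only affects the hour when paired with it.
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y %I:%M:%S %p')

# Truncate every value to midnight; the column stays a datetime column.
df['day'] = df['date'].dt.floor('D')
print(df)
```

The day column can still be used for date arithmetic and filtering, since no round trip through strings is involved.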

Related

Applying date to both %y and %Y

I have a dataframe where one of the columns called 'date', containing objects, looks like:
df =
|Date
|Mar-24
|Aug-22
|Sep-25
|...
I want to convert that column into dates, so that for example Mar-24 becomes 2024-03-01. So far I have tried
df['Date'] = pd.to_datetime(df['Date'], format= '%b-%y')
which I think should work, but among the few thousand rows I've found some that contain the full year, such as 'Apr 2023', which won't be picked up by %y. Is there a way to find those rows and change them to the short year before applying the above code, or to give the code both %y and %Y?
Use the parameter errors='coerce' in combination with combine_first:
Minimal example:
import pandas as pd
series = pd.Series(['Mar-24', 'Aug-22', 'Sep-2025'], [0, 1, 2])
date1 = pd.to_datetime(series, format='%b-%y', errors='coerce')
date2 = pd.to_datetime(series, format='%b-%Y', errors='coerce')
date1.combine_first(date2)
Output:
0 2024-03-01
1 2022-08-01
2 2025-09-01
dtype: datetime64[ns]
Or for your specific case in one line:
df['Date'] = pd.to_datetime(df['Date'], format='%b-%y', errors='coerce').combine_first(pd.to_datetime(df['Date'], format='%b-%Y', errors='coerce'))
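If more than two layouts turn up (the question mentions 'Apr 2023' with a space instead of a hyphen), the same idea can be wrapped in a loop. This is only a sketch: parse_with_formats is a hypothetical helper name, the list of formats is an assumption about what the data contains, and %b relies on English month abbreviations being the active locale.

```python
import pandas as pd

def parse_with_formats(values, formats):
    """Try each format in turn; later formats only fill rows still unparsed.

    Hypothetical helper for illustration, not part of pandas.
    """
    result = pd.to_datetime(values, format=formats[0], errors='coerce')
    for fmt in formats[1:]:
        # errors='coerce' turns non-matching rows into NaT,
        # combine_first keeps existing values and fills only the gaps.
        result = result.combine_first(
            pd.to_datetime(values, format=fmt, errors='coerce'))
    return result

series = pd.Series(['Mar-24', 'Aug-22', 'Apr 2023', 'Sep-2025'])
parsed = parse_with_formats(series, ['%b-%y', '%b-%Y', '%b %Y'])
print(parsed)

# Rows that matched none of the formats stay NaT and can be inspected.
print(series[parsed.isna()])
```

Checking the rows that are still NaT afterwards is a cheap way to discover formats that were not anticipated.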

How do I convert 2014-12-19T05:00:00 to proper datetime, 2014-12 in Python

I get a date in my data which looks like this: "2014-12-19T05:00:00". I want to convert it to a Date or String object and get something like "01-04-2018", that is, "dd-MM-YYYY", in a dataframe. How can I do it?
The result will be used for time series. So far, my time series result is like this, perhaps because the date format is not detected (the x-axis is not in datetime).
Date column:
For a pandas dataframe column/series:
Convert a string column (dtype of object) to a datetime column (dtype of datetime64[ns]) using to_datetime. Then if you want another column with your datetimes back in a string format of your choosing, use dt.strftime.
An example:
df = pd.DataFrame({
"Date": ["2014-12-19T05:00:00", "2014-12-20T05:00:00", "2014-12-21T05:00:00"],
"Value": [0, 2, 4]})
df['DateTime'] = pd.to_datetime(df['Date'])
df['MyDateTimeString'] = df['DateTime'].dt.strftime('%Y-%m-%d')
print(df)
# Date Value DateTime MyDateTimeString
# 0 2014-12-19T05:00:00 0 2014-12-19 05:00:00 2014-12-19
# 1 2014-12-20T05:00:00 2 2014-12-20 05:00:00 2014-12-20
# 2 2014-12-21T05:00:00 4 2014-12-21 05:00:00 2014-12-21
In general:
To read your strings into datetime objects, use strptime:
import datetime
d = datetime.datetime.strptime("2014-12-19T05:00:00", "%Y-%m-%dT%H:%M:%S")
Then to get a string representation of those datetime objects, use strftime:
d.strftime("%d-%m-%Y")
For more general string-to-datetime parsing, the dateparser library is handy:
import dateparser
dateparser.parse("2014-12-19T05:00:00").strftime("%d-%m-%Y")
# '19-12-2014'
dateparser.parse("December 19, 2014 at 5am").strftime("%d-%m-%Y")
# '19-12-2014'
I recommend using https://pypi.org/project/python-dateutil/
(Install with pip install python-dateutil.)
>>> import dateutil.parser
>>> d = dateutil.parser.isoparse('2014-12-19T05:00:00')
>>> print(d.strftime('%m-%d-%Y'))
12-19-2014

How to convert pandas column to date when column is something like "Jan-18"?

What is an efficient way to convert the column values into dates in "DD-MM-YYYY" form when the values are given like "Feb-15", which needs to become "01-02-2015"? If it's "Dec-46" it must return "01-12-1946".
You can pass the format '%b-%y' to to_datetime:
In[42]:
df = pd.DataFrame({'date':["Feb-15","Dec-46"]})
df['new_date'] = pd.to_datetime(df['date'], format='%b-%y')
df
Out[42]:
date new_date
0 Feb-15 2015-02-01
1 Dec-46 2046-12-01
Note that the new dtype is datetime64, whose display format you cannot control. If you insist on DD-MM-YYYY, you would have to convert to a string using dt.strftime:
In[43]:
df['str_date'] = df['new_date'].dt.strftime('%d-%m-%Y')
df
Out[43]:
date new_date str_date
0 Feb-15 2015-02-01 01-02-2015
1 Dec-46 2046-12-01 01-12-2046
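but then you have strings, which are not that useful if you need to perform arithmetic operations or filtering.
EDIT
The two-digit year directive %y follows the POSIX convention: values 00-68 are mapped to 2000-2068 and values 69-99 to 1969-1999, so 'Dec-46' is parsed as 2046 rather than 1946. datetime64[ns] itself can represent 1946 (its range runs roughly from 1677 to 2262), so the century has to be corrected after parsing.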
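One possible way to get 1946 out of "Dec-46" is to parse with %y first and then move implausibly late dates back by a century. The sketch below assumes that anything after a chosen cutoff must belong to the 1900s. The cutoff value is a hypothetical choice and has to be picked to suit the actual data.

```python
import pandas as pd

df = pd.DataFrame({'date': ["Feb-15", "Dec-46"]})

# %y maps 00-68 to 2000-2068, so "Dec-46" is parsed as 2046 at this point.
parsed = pd.to_datetime(df['date'], format='%b-%y')

# Hypothetical cutoff: dates after it are assumed to be in the wrong century.
cutoff = pd.Timestamp('2030-01-01')

# Keep values up to the cutoff, shift the rest back by 100 years.
df['new_date'] = parsed.where(parsed <= cutoff, parsed - pd.DateOffset(years=100))
df['str_date'] = df['new_date'].dt.strftime('%d-%m-%Y')
print(df)
```

With this cutoff "Feb-15" stays in 2015 while "Dec-46" becomes December 1946.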

Convert pandas series cell to string and datetime object

I have sliced the pandas dataframe.
end_date = df[-1:]['end']
type(end_date)
Out[4]: pandas.core.series.Series
end_date
Out[3]:
48173 2017-09-20 04:47:59
Name: end, dtype: datetime64[ns]
How to get rid of end_date's index value 48173 and get only 2017-09-20 04:47:59 string? I have to call REST API with 2017-09-20 04:47:59 as a parameter, so I have to get string from pandas datetime64 series.
How to get rid of end_date's index value 48173 and get only datetime object [something like datetime.datetime.strptime('2017-09-20 04:47:59', '%Y-%m-%d %H:%M:%S')]. I need it because, later I will have to check if '2017-09-20 04:47:59' < datetime.datetime(2017,1,9)
I need to convert just a single cell value, not a whole column.
How to do these conversions?
It seems you need:
import pandas as pd
data = ['2017-09-20 04:47:59','2017-10-20 04:47:59','2017-09-30 04:47:59']
df = pd.DataFrame(data,columns=['end'])
df['end'] = pd.to_datetime(df['end'])
df
df will be:
end
0 2017-09-20 04:47:59
1 2017-10-20 04:47:59
2 2017-09-30 04:47:59
After that you can use the code below to get rid of the index and work with the value as a Timestamp object:
import datetime
end_date = df['end'].iloc[-1] #get last row of column end
print(type(end_date)) # pandas Timestamp
end_date_str = end_date.strftime('%Y-%m-%d %H:%M:%S') #convert to str
print(end_date_str) # '2017-09-30 04:47:59'
print(end_date < datetime.datetime(2017,1,9)) #False
Simply cast the result to a string, and recover it using .values[0]:
In [38]: end_date
Out[38]:
48173 2017-09-20 04:47:59
Name: end, dtype: datetime64[ns]
In [39]: end_date.astype(str).values[0]
Out[39]: '2017-09-20 04:47:59'
If you want a datetime object, one option is to convert it to a Unix timestamp and then back to a datetime object:
In [42]: end_date.values[0].item()
Out[42]: 1505882879000000000
In [43]: datetime.datetime.fromtimestamp(end_date.values[0].item()/10**9)
Out[43]: datetime.datetime(2017, 9, 20, 6, 47, 59)
Otherwise, you can strptime the string recovered in step 1:
In [48]: datetime.datetime.strptime(end_date.astype(str).values[0], '%Y-%m-%d %H:%M:%S')
Out[48]: datetime.datetime(2017, 9, 20, 4, 47, 59)
You may wonder why there is a 2-hour difference between the results. This is because datetime.datetime.fromtimestamp treats the number as seconds since the UTC epoch and converts it to my local timezone (currently CEST, which is UTC+2).
On the other hand, parsing a string with strptime yields a naive datetime with no timezone information, so the wall-clock time is kept as is. That is where the 2-hour discrepancy comes from.
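The timezone detour can be avoided altogether, because a single cell of a datetime64 column is a pandas Timestamp that offers both conversions directly. A minimal sketch, rebuilding the one-row series from the question by hand (the index 48173 and the name end are copied from the question's output):

```python
import datetime
import pandas as pd

# Rebuild the one-row series shown in the question.
end_date = pd.Series(pd.to_datetime(['2017-09-20 04:47:59']),
                     index=[48173], name='end')

# .iloc[0] selects by position, so the index label does not matter.
ts = end_date.iloc[0]

# String for the REST call.
as_string = ts.strftime('%Y-%m-%d %H:%M:%S')

# Plain datetime.datetime, with the wall-clock time unchanged.
as_datetime = ts.to_pydatetime()

print(as_string)
print(as_datetime < datetime.datetime(2017, 1, 9))
```

Since no epoch conversion happens, the local timezone never enters the picture.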

Pandas Dataframe convert string to date without time

I have a Pandas Dataframe df:
a date
1 2014-06-29 00:00:00
df.dtypes returns:
a object
date object
I want to convert the date column to dates without the time, but:
df['date']=df['date'].astype('datetime64[s]')
returns:
a date
1 2014-06-28 22:00:00
df.dtypes returns:
a object
date datetime64[ns]
But the value is wrong.
I'd like to have:
a date
1 2014-06-29
or:
a date
1 2014-06-29 00:00:00
I would start by converting your dates with pd.to_datetime:
df['date'] = pd.to_datetime(df.date)
Now, you can see that the time component is still there:
df.date.values
array(['2014-06-28T19:00:00.000000000-0500'], dtype='datetime64[ns]')
If you are OK with having strings again, you want:
df['date'] = [x.strftime("%Y-%m-%d") for x in df.date]
This one ends up with datetime.date objects instead:
df['date'] = [x.date() for x in df.date]
df.date
datetime.date(2014, 6, 29)
Here you go. Just use this pattern:
pd.to_datetime(df['date']).dt.date
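To compare the options side by side, here is a small sketch built on the single row from the question. The column names as_date, as_midnight and as_string are made up for illustration. Which variant fits depends on whether the column should remain usable for datetime operations.

```python
import datetime
import pandas as pd

df = pd.DataFrame({'a': ['1'], 'date': ['2014-06-29 00:00:00']})
parsed = pd.to_datetime(df['date'])

# Python datetime.date objects: no time part, but no longer a datetime64 column.
df['as_date'] = parsed.dt.date

# Still a datetime64 column, with the time set to midnight.
df['as_midnight'] = parsed.dt.normalize()

# Plain strings, for display or export only.
df['as_string'] = parsed.dt.strftime('%Y-%m-%d')

print(df[['as_date', 'as_midnight', 'as_string']])
print(df.dtypes)
```

The as_midnight variant is the only one that keeps the .dt accessor available on the resulting column.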
