I have one column on my dataframe that follows this date format:
17 MAY2016
I've tried to follow this reference: http://strftime.org/ and pandas.to_datetime reference: http://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.to_datetime.html
My code is as follows:
df1 =df1.apply(pandas.to_datetime, errors='ignore', format='%d %b%Y')
I also tried format='%d/%b%Y' and format='%d /%b%Y', and it still doesn't work. The date column type is still object.
Any ideas? Thanks in advance
You can use to_datetime only:
df = pd.DataFrame({'date':['17 MAY2016']})
df['date'] = pd.to_datetime(df['date'])
print (df)
date
0 2016-05-17
If you want to pass the format parameter:
df['date'] = pd.to_datetime(df['date'], format='%d %b%Y')
print (df)
date
0 2016-05-17
If some values are not dates, add errors='coerce' to convert them to NaT:
df['date'] = pd.to_datetime(df['date'], errors='coerce')
EDIT:
To check, use dtypes:
print (df.dtypes)
date datetime64[ns]
dtype: object
You don't need to use .apply; the to_datetime function works natively on pandas Series objects.
df1['date column'] = pd.to_datetime(df1['date column'], errors='ignore')
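When the column stays object after conversion, it can help to see which values failed to parse. The following is a minimal sketch, assuming a hypothetical sample with one bad value; it uses errors='coerce' so that failures show up as NaT instead of being silently left as strings.

```python
import pandas as pd

# Hypothetical sample: one valid date and one value that does not match the format.
df1 = pd.DataFrame({'date column': ['17 MAY2016', 'not a date']})

# errors='coerce' turns anything that does not match the format into NaT.
parsed = pd.to_datetime(df1['date column'], format='%d %b%Y', errors='coerce')

# Values that were present originally but became NaT are the ones that failed to parse.
bad_rows = df1.loc[parsed.isna() & df1['date column'].notna(), 'date column']
print(bad_rows.tolist())

# Assign the parsed result back; the column now has a datetime64 dtype.
df1['date column'] = parsed
print(df1.dtypes)
```

Inspecting bad_rows first makes it easier to decide whether the format string or the data needs fixing.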
I have a dataframe where one of the columns, called 'Date' and containing objects, looks like:
df =
|Date
|Mar-24
|Aug-22
|Sep-25
|...
I want to convert that column to dates, so that, for example, Mar-24 becomes 2024-03-01. So far I have tried:
df['Date'] = pd.to_datetime(df['Date'], format= '%b-%y')
which I think should work, but among the few thousand rows I've found some that contain the full year, such as 'Apr 2023', which won't be picked up by %y. Is there a way to find those rows and change them to the short year before applying the above code, or to give the code both %y and %Y?
Use the parameter errors='coerce' in combination with combine_first:
Minimal example:
import pandas as pd
series = pd.Series(['Mar-24', 'Aug-22', 'Sep-2025'], [0, 1, 2])
date1 = pd.to_datetime(series, format='%b-%y', errors='coerce')
date2 = pd.to_datetime(series, format='%b-%Y', errors='coerce')
date1.combine_first(date2)
Output:
0 2024-03-01
1 2022-08-01
2 2025-09-01
dtype: datetime64[ns]
Or for your specific case in one line:
df['Date'] = pd.to_datetime(df['Date'], format='%b-%y', errors='coerce').combine_first(pd.to_datetime(df['Date'], format='%b-%Y', errors='coerce'))
I want to convert this Timestamp object to a datetime. The object was obtained after using asfreq on a dataframe; it is the last index value:
Timestamp('2018-12-01 00:00:00', freq='MS')
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
wanted output
2018-12-01
Do you want this?
import pandas as pd
# freq is omitted here: the Timestamp constructor no longer accepts it in pandas 2.0+
ts = pd.Timestamp('2018-12-01 00:00:00')
date_time = ts.to_pydatetime()
And if you just want a string then you can do this:
print(str(ts).split()[0])
out:
2018-12-01
You should be able to floor the timestamp to the date part (or any other part), which in this example gets rid of the hour-minute-second level detail.
df = pd.DataFrame({'ts': [pd.Timestamp('2019-01-01 00:10:10')]})
df.ts.dt.floor('d')
0 2019-01-01
Name: ts, dtype: datetime64[ns]
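For a single Timestamp rather than a column, a few built-in methods give the date directly. This is a small sketch, assuming the same example value as in the question; the variable names are illustrative only.

```python
import datetime
import pandas as pd

ts = pd.Timestamp('2018-12-01 00:00:00')

# A plain datetime.date object, with no time component at all.
as_date = ts.date()

# A formatted string, if only the text representation is needed.
as_text = ts.strftime('%Y-%m-%d')

# Still a Timestamp, but with the time reset to midnight.
as_midnight = ts.normalize()

print(as_date, as_text, as_midnight)
print(isinstance(as_date, datetime.date))
```

Which one to use depends on whether later code needs date arithmetic (keep a Timestamp or date) or just display (use the string).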
I have a dataframe and the Date column has two different types of date formats going on.
eg. 1983-11-10 00:00:00 and 10/11/1983
I want them all to be the same type, how can I iterate through the Date column of my dataframe and convert the dates to one format?
I believe you need parameter dayfirst=True in to_datetime:
df = pd.DataFrame({'Date': {0: '1983-11-10 00:00:00', 1: '10/11/1983'}})
print (df)
Date
0 1983-11-10 00:00:00
1 10/11/1983
df['Date'] = pd.to_datetime(df.Date, dayfirst=True)
print (df)
Date
0 1983-11-10
1 1983-11-10
because:
df['Date'] = pd.to_datetime(df.Date)
print (df)
Date
0 1983-11-10
1 1983-10-11
Or you can specify both formats and then use combine_first:
d1 = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S', errors='coerce')
d2 = pd.to_datetime(df.Date, format='%d/%m/%Y', errors='coerce')
df['Date'] = d1.combine_first(d2)
print (df)
Date
0 1983-11-10
1 1983-11-10
General solution for multiple formats:
from functools import reduce
def convert_formats_to_datetimes(col, formats):
    out = [pd.to_datetime(col, format=x, errors='coerce') for x in formats]
    return reduce(lambda l,r: pd.Series.combine_first(l,r), out)
formats = ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y']
df['Date'] = df['Date'].pipe(convert_formats_to_datetimes, formats)
print (df)
Date
0 1983-11-10
1 1983-11-10
I want them all to be the same type, how can I iterate through the
Date column of my dataframe and convert the dates to one format?
Your input data is ambiguous: is 10 / 11 10th November or 11th October? You need to specify logic to determine which is appropriate. A function is useful if you wish to try multiple date formats sequentially:
def date_apply_formats(s, form_lst):
    # Keep the original strings in s; re-parsing the already-converted result would leave NaT unchanged.
    out = pd.to_datetime(s, format=form_lst[0], errors='coerce')
    for form in form_lst[1:]:
        out = out.fillna(pd.to_datetime(s, format=form, errors='coerce'))
    return out
df['Date'] = date_apply_formats(df['Date'], ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y'])
Priority is given to the first item in form_lst. The solution is extendible to an arbitrary number of provided formats.
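To see what the priority order means in practice, here is a minimal sketch using a hypothetical helper, parse_with_formats, and made-up sample values. It parses the same strings twice with the two formats in opposite order.

```python
import pandas as pd

def parse_with_formats(raw, formats):
    """Try each format in order against the original strings; earlier formats win."""
    result = pd.to_datetime(raw, format=formats[0], errors='coerce')
    for fmt in formats[1:]:
        # Only fill the positions that are still NaT after the earlier formats.
        result = result.fillna(pd.to_datetime(raw, format=fmt, errors='coerce'))
    return result

# Hypothetical sample: the first value is ambiguous, the second is only valid day-first.
raw = pd.Series(['10/11/1983', '25/12/1983'])

day_first = parse_with_formats(raw, ['%d/%m/%Y', '%m/%d/%Y'])
month_first = parse_with_formats(raw, ['%m/%d/%Y', '%d/%m/%Y'])

print(day_first.tolist())
print(month_first.tolist())
```

With month-first priority, the ambiguous value is read as 11th October while the unambiguous one still falls back to day-first, so one column can end up with mixed interpretations. That is why the order of the formats has to reflect what you know about the data.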
Input data is
NSECODE Date Close
1 NSE500 20000103 1291.5500
2 NSE500 20000104 1335.4500
3 NSE500 20000105 1303.8000
history_nseindex_df["Date"] = pd.to_datetime(history_nseindex_df["Date"], format="%Y%m%d")
history_nseindex_df["Date"] = history_nseindex_df["Date"].dt.strftime("%Y-%m-%d")
output is now
NSECode Date Close
1 NSE500 2000-01-03 1291.5500
2 NSE500 2000-01-04 1335.4500
3 NSE500 2000-01-05 1303.8000
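A self-contained sketch of the same conversion follows, assuming the Date column holds integers such as 20000103 (the original does not say whether they are integers or strings). Without an explicit format, pandas would read integers as nanoseconds since the epoch, so the values are cast to strings and the format is given explicitly.

```python
import pandas as pd

# Sample rows taken from the input above; Date is assumed to be stored as integers.
df = pd.DataFrame({
    'NSECODE': ['NSE500', 'NSE500', 'NSE500'],
    'Date': [20000103, 20000104, 20000105],
    'Close': [1291.55, 1335.45, 1303.80],
})

# Cast to str and give the format so 20000103 is parsed as 2000-01-03.
df['Date'] = pd.to_datetime(df['Date'].astype(str), format='%Y%m%d')

# Optional: a separate text column, keeping the datetime column for arithmetic.
df['Date_text'] = df['Date'].dt.strftime('%Y-%m-%d')
print(df)
```

Keeping the datetime column and adding the string as a second column avoids losing the ability to filter or sort by date.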
What is an efficient way to convert column values into dates in "DD-MM-YYYY" format when the values are given like "Feb-15", which needs to be "01-02-2015"? If it's "Dec-46", it must return "01-12-1946".
You can pass the format '%b-%y' to to_datetime:
In[42]:
df = pd.DataFrame({'date':["Feb-15","Dec-46"]})
df['new_date'] = pd.to_datetime(df['date'], format='%b-%y')
df
Out[42]:
date new_date
0 Feb-15 2015-02-01
1 Dec-46 2046-12-01
Note that the new dtype is datetime64 and you cannot control how it is displayed. If you insist on DD-MM-YYYY, you would have to convert to a string using dt.strftime:
In[43]:
df['str_date'] = df['new_date'].dt.strftime('%d-%m-%Y')
df
Out[43]:
date new_date str_date
0 Feb-15 2015-02-01 01-02-2015
1 Dec-46 2046-12-01 01-12-2046
But then you have strings, which are not that useful if you need to perform arithmetic operations or filtering.
EDIT
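The year comes out as 2046 rather than 1946 because %y follows the usual two-digit-year convention: 69-99 map to 1969-1999 and 00-68 map to 2000-2068. datetime64[ns] itself can represent dates from roughly 1677 to 2262, so 1946 can be stored; you have to shift the century yourself after parsing.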
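One way to get 1946 from "Dec-46" is to subtract 100 years from any parsed date that lands after a chosen cutoff. This is a minimal sketch; the cutoff value is an assumption and should be set to the latest date that can legitimately appear in the data.

```python
import pandas as pd

df = pd.DataFrame({'date': ['Feb-15', 'Dec-46']})

# %y maps 46 to 2046, so the raw parse puts "Dec-46" in the future.
parsed = pd.to_datetime(df['date'], format='%b-%y')

# Hypothetical cutoff: anything after this is assumed to belong to the previous century.
cutoff = pd.Timestamp('2025-12-31')
in_future = parsed > cutoff

# Replace only the future dates with the same date 100 years earlier.
df['new_date'] = parsed.mask(in_future, parsed - pd.DateOffset(years=100))
df['str_date'] = df['new_date'].dt.strftime('%d-%m-%Y')
print(df)
```

A fixed cutoff keeps the result reproducible; using today's date instead would change the outcome as time passes.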
I have a Pandas Dataframe df:
a date
1 2014-06-29 00:00:00
df.dtypes returns:
a object
date object
I want to convert the date column to dates without the time component, but:
df['date']=df['date'].astype('datetime64[s]')
returns:
a date
1 2014-06-28 22:00:00
df.dtypes returns:
a object
date datetime64[ns]
But the value is wrong.
I'd like to have:
a date
1 2014-06-29
or:
a date
1 2014-06-29 00:00:00
I would start by converting your dates with pd.to_datetime:
df['date'] = pd.to_datetime(df.date)
Now, you can see that the time component is still there:
df.date.values
array(['2014-06-28T19:00:00.000000000-0500'], dtype='datetime64[ns]')
If you are OK with having strings (object dtype) again, you want:
df['date'] = [x.strftime("%Y-%m-%d") for x in df.date]
Alternatively, starting again from the datetime column, this gives datetime.date objects:
df['date'] = [x.date() for x in df.date]
df.date
datetime.date(2014, 6, 29)
Here you go. Just use this pattern:
pd.to_datetime(df['date']).dt.date
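The two ways of dropping the time differ in the dtype they leave behind. The sketch below, using the sample row from the question, compares .dt.normalize() (stays datetime64) with .dt.date (becomes object dtype holding datetime.date values).

```python
import datetime
import pandas as pd

df = pd.DataFrame({'a': [1], 'date': ['2014-06-29 00:00:00']})
df['date'] = pd.to_datetime(df['date'])

# Stays a datetime64 column; the time is set to midnight, so date arithmetic still works.
normalized = df['date'].dt.normalize()

# Becomes an object column of datetime.date values; the time component is gone entirely.
plain_dates = df['date'].dt.date

print(normalized.dtype, plain_dates.dtype)
print(isinstance(plain_dates.iloc[0], datetime.date))
```

If the column will be used for resampling, filtering by range, or other datetime operations, the normalized datetime64 column is usually the more convenient of the two.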