Why does time data not match format? - python

I have a dataframe with strings that I am converting to datetimes. They all look like "12/20/17 5:45:30" (month/day/year hour:minute:second). This is my code:
for col in cols:
    df[col] = pd.to_datetime(df[col], format='%m/%d/%Y %H:%M:%S')
But I get the following error:
ValueError: time data '4/19/16 1:05:30' does not match format '%m/%d/%Y %H:%M:%S'
The date shown in the error is the very first date in the dataframe, so it is not working at all. Can someone explain what's wrong with my datetime format? How does that datetime not match the format? By the way, before I was doing this with a file that had no seconds, and my format was %m/%d/%Y %H:%M, which worked fine, but now with seconds it does not.

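Your format string does not work because it uses %Y (four-digit year) where the data needs %y (two-digit year), so '%m/%d/%y %H:%M:%S' would match. Pandas can also often figure this out for you if you pass the infer_datetime_format parameter to pandas.to_datetime(). In pandas 2.0 and later this inference is the default and the parameter is deprecated, so you can simply omit format.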
Code:
df[col] = pd.to_datetime(df[col], infer_datetime_format=True)
Test Code:
df = pd.DataFrame(["12/20/17 5:45:30", "4/19/16 1:05:30"], columns=['date'])
print(df)
for col in df.columns:
    df[col] = pd.to_datetime(df[col], infer_datetime_format=True)
print(df)
Results:
date
0 12/20/17 5:45:30
1 4/19/16 1:05:30
date
0 2017-12-20 05:45:30
1 2016-04-19 01:05:30
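
The difference between %Y and %y can be checked without pandas. This is a minimal sketch using the standard library's datetime.strptime, which uses the same directives; the variable names are illustrative only.

```python
from datetime import datetime

sample = "4/19/16 1:05:30"

# %Y expects a four-digit year, so the two-digit "16" cannot match.
try:
    datetime.strptime(sample, "%m/%d/%Y %H:%M:%S")
    four_digit_ok = True
except ValueError:
    four_digit_ok = False

# %y expects a two-digit year and maps "16" to 2016.
parsed = datetime.strptime(sample, "%m/%d/%y %H:%M:%S")

print(four_digit_ok)
print(parsed)
```

The earlier file without seconds presumably parsed with %Y because its years were written with four digits.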

Related

Convert df column to date

I have the following sample data to convert to date.
data = {'Dates':['20030430', '20010131', '20190805', '20191115']}
df = pd.DataFrame(data)
The code I am using is
df['Converted Date'] = pd.to_datetime(df['Dates'], format='%Y%m%d')
It gives the following error:
ValueError: time data '2011-10-13 00:00:00' does not match format '%Y%m%d' (match)
I tried
df['Converted Date'] = pd.to_datetime(df['Dates'], format='%Y%m%d%f')
which results in below error.
ValueError: time data '20190805' does not match format '%Y%m%d%f' (match)
Kindly help to resolve.

Use `pandas.to_timedelta` with decimal periods

In my CSV I have a column with time durations written as 13:08.4 and 13:06.20.
I would like to convert this column into a pandas timedelta.
However when I try df['Time'] = pd.to_timedelta(df['Time']) I get an error ValueError: expected hh:mm:ss format before .
What is going on and how do I fix it?
You can replace '.' with ':', as the error message suggests, to put the values in hh:mm:ss form:
df['Time'] = pd.to_timedelta(df['Time'].str.replace('.', ':', regex=False))
#13:08.4 becomes: 0 days 13:08:04
Or, if the format of 'Time' is mm:ss with fractional seconds, use:
df['Time'] = pd.to_timedelta('00:'+df['Time'])
#13:08.4 becomes: 0 days 00:13:08.400000
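
To make the mm:ss reading explicit, here is a small standard-library sketch. It assumes every value has exactly one colon and means minutes, then seconds with an optional fraction; parse_minutes_seconds is a hypothetical helper, not a pandas function.

```python
from datetime import timedelta

def parse_minutes_seconds(text):
    """Parse 'mm:ss.f' text into a timedelta (hypothetical helper)."""
    minutes, seconds = text.split(":")
    # The fractional part stays attached to the seconds field.
    return timedelta(minutes=int(minutes), seconds=float(seconds))

durations = [parse_minutes_seconds(t) for t in ["13:08.4", "13:06.20"]]
for d in durations:
    print(d)
```

Under this reading the ".4" is four tenths of a second, whereas replacing '.' with ':' turns it into four whole seconds.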

Convert date string YYYY-MM-DD to YYYYMM in pandas

Is there a way in pandas to convert my column date which has the following format '1997-01-31' to '199701', without including any information about the day?
I tried solution of the following form:
df['DATE'] = df['DATE'].apply(lambda x: datetime.strptime(x, '%Y%m'))
but I obtain this error : 'ValueError: time data '1997-01-31' does not match format '%Y%m''
Probably the reason is that I am not including the day in the format. Is there a better way to go from YYYY-MM-DD format to YYYYMM in pandas?
One way is to convert the column to datetime and then use strftime. Note that the result is a column of strings, so you lose the datetime functionality.
df = pd.DataFrame({'date':['1997-01-31' ]})
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y%m')
date
0 199701
Might not need to go through the datetime conversion if the data are sufficiently clean (no incorrect strings like 'foo' or '001231'):
df = pd.DataFrame({'date':['1997-01-31', '1997-03-31', '1997-12-18']})
df['date'] = [''.join(x.split('-')[0:2]) for x in df.date]
# date
#0 199701
#1 199703
#2 199712
Or if you have null values:
df['date'] = df.date.str.replace('-', '').str[0:6]
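
The original attempt failed because the format passed to strptime has to describe the whole input string, while the output layout belongs in strftime. A minimal standard-library sketch of that two-step idea, for a single value:

```python
from datetime import datetime

raw = "1997-01-31"

# Step 1: parse with a format that matches the input exactly.
stamp = datetime.strptime(raw, "%Y-%m-%d")

# Step 2: write it back out with the layout you want.
year_month = stamp.strftime("%Y%m")
print(year_month)
```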

Python ValueError: time data '2001-11-03 ' %Y:%m %d %H:%M:%S' when dates in csv file are month/day/year

I'm having an issue where the date format is not matching up. In my .csv file the dates are in %m/%d/%Y form (e.g. 11/3/2001), but the error suggests %Y/%m/%d or %Y/%d/%m. I've tried all the possible permutations of year, month and day, and I continue to receive the same error: ValueError: time data '2001-11-03 ' %Y:%m %d %H:%M:%S'. Below is my code. Thanks.
df = pd.read_excel('.xlsx', header=None)
df.to_csv('.csv', header=None, index=False)
df= pd.read_csv('.csv', index_col[5,8,9,12], date_parser=lambda x: datetime.datetime.strptime(x, '%Y/%m/$d %H:%M:%S').strptime('%m/%d/%Y))
Note: What I'm trying to do is convert an .xlsx file to .csv and then remove the trailing 0:00 from multiple columns within the .csv file. Hope this helps.
Use parse from dateutil.parser to parse the dates. It is a convenient way to handle dates without spelling out the format.
from dateutil.parser import parse
df = pd.read_csv('filename.csv', date_parser = parse, index_..)
Or you can use to_datetime, which is native to pandas:
pd.to_datetime(df['Date Col'])
In order to format the date properly, you should use the following:
date_parser=lambda x: parse(x)
#parse from dateutil.parser
df['Date Col'] = df['Date Col'].dt.strftime('%m/%d/%Y')
df.to_csv('New File.csv')
You can use to_datetime since you are using pandas.
import pandas as pd
df = pd.DataFrame({"a": ["11/3/2001", '2001-11-03']})
df["a"] = pd.to_datetime(df["a"])
print(df["a"])
Output:
0 2001-11-03
1 2001-11-03
Name: a, dtype: datetime64[ns]
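
If you prefer explicit formats over inference, one option is to try a short list of candidate layouts in order. The sketch below uses only the standard library; parse_any and CANDIDATE_FORMATS are hypothetical names, and the list of formats is an assumption about what the file contains.

```python
from datetime import datetime

# Candidate layouts, tried in order (an assumed list; extend it for your data).
CANDIDATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%Y-%m-%d %H:%M:%S")

def parse_any(text):
    """Return a datetime for the first candidate format that matches."""
    cleaned = text.strip()  # drop stray whitespace around the value
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    raise ValueError(f"no candidate format matches {text!r}")

values = ["11/3/2001", "2001-11-03", "2001-11-03 00:00:00"]
reformatted = [parse_any(v).strftime("%m/%d/%Y") for v in values]
print(reformatted)
```

Writing the dates back with strftime('%m/%d/%Y') also drops the trailing midnight time, at the cost of turning the values into strings.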

Split Datetime Column into a Date and Time Python

Hey, so I have seen several questions about this; however, I have yet to successfully solve my problem.
I have a single column Time in the format:
2014-07-17 00:59:27.400189+00
I want to split this into two columns, Date and Hour.
I used
posts['Date']=pd.to_datetime(posts['Time'],format='%Y-%m-%d %H:%M:%S')
However, I get an error
ValueError: unconverted data remains: 400189+00
I am not sure what to label the last bit of information. I tried adding %o but received another error
ValueError: 'o' is a bad directive in format '%Y-%m-%d %H:%M:%S.%o'
Any ideas on how I can split these two values into two columns?
Thanks!
The following worked for me:
In [18]:
import pandas as pd
df = pd.DataFrame({'Date':['2014-07-17 00:59:27.400189+00']})
df.dtypes
Out[18]:
Date object
dtype: object
In [19]:
df['Date'] = pd.to_datetime(df['Date'])
df.dtypes
Out[19]:
Date datetime64[ns]
dtype: object
In [20]:
df['Time'],df['Date']= df['Date'].apply(lambda x:x.time()), df['Date'].apply(lambda x:x.date())
df
Out[20]:
Date Time
0 2014-07-17 00:59:27.400189
[1 rows x 2 columns]
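
As for what to call the trailing part: in strptime terms, the six digits after the seconds are microseconds (%f), and "+00" looks like a UTC offset (%z). A minimal standard-library sketch follows. It assumes "+00" is an hours-only offset and pads it to "+0000", because %z expects at least hours and minutes.

```python
from datetime import datetime

raw = "2014-07-17 00:59:27.400189+00"

# Assumption: "+00" is an hours-only UTC offset; pad it so %z accepts it.
stamp = datetime.strptime(raw + "00", "%Y-%m-%d %H:%M:%S.%f%z")

date_part = stamp.date()  # calendar date only
time_part = stamp.time()  # wall-clock time, without the offset

print(date_part)
print(time_part)
```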
This worked for me:
import pandas as pd
data = pd.DataFrame({'Date':['2014-07-17 00:59:27.400189+00']})
# Let pandas work out the format; '%Y:%M:%D' does not describe this string
parsed = pd.to_datetime(data['Date'])
data['Dates'] = parsed.dt.date
data['Hours'] = parsed.dt.time
Printing the result:
print(data)
Dates Hours
2014-07-17 00:59:27.400189+00
This gives me object type Date and Time. The expected column should be in date format
