How to convert non-traditional formatted Pandas date object to datetime - python

I have an interesting scenario where a date object looks like the following:
'6/7/2018 7:59:11 PM'
in the format m/d/yyyy h:mm:ss PM (or AM). Note that the month and hour are not padded with a zero. I have tried the following lines of code using a Pandas date object:
data = pd.read_csv('file.txt', sep="\t", header=None, dtype = 'str')
data.columns = ['A', 'B', 'C', ...]
The data.columns provides a look at the format of the file, all tab-delimited (note that is not an actual line of code, just an arbitrary way to show how the columns were labeled). The time series are in Column A. I attempted the conversion using:
time = pd.to_datetime(pd.Series(data['A']), format = '%-m/%-d/%Y %-H/%M/%S %p')
The return is:
ValueError: '-' is a bad directive in format '%-m/%-d/%Y %-H/%M/%S %p'
Any suggestions on how to go about resolving this issue would be greatly appreciated!

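Your format string should be '%m/%d/%Y %I:%M:%S %p'. The %-m style directives are a platform-specific strftime extension and are not accepted when parsing; plain %m, %d and %I already accept values without a leading zero. Use %I (12-hour clock) rather than %H so that %p is applied to the hour, and : rather than / between the time fields.
Ex:
import pandas as pd
data = pd.DataFrame({"A": ['6/7/2018 7:59:11 PM']})
time = pd.to_datetime(data['A'], format='%m/%d/%Y %I:%M:%S %p')
print(time)
Output:
0 2018-06-07 19:59:11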
Name: A, dtype: datetime64[ns]
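To make the role of %p concrete, here is a small sketch comparing the 12-hour directive %I with the 24-hour directive %H on the same kind of string. The second sample value ('12/25/2018 12:05:00 AM') is made up for illustration and is not from the question.

```python
from datetime import datetime
import pandas as pd

# Second value is a hypothetical sample added to show the midnight case.
raw = pd.Series(['6/7/2018 7:59:11 PM', '12/25/2018 12:05:00 AM'], name='A')

# %I reads the hour on a 12-hour clock, so %p (AM/PM) shifts it as expected.
parsed = pd.to_datetime(raw, format='%m/%d/%Y %I:%M:%S %p')
print(parsed.dt.hour.tolist())  # [19, 0]

# With %H the AM/PM marker is matched but does not change the hour.
with_H = datetime.strptime('6/7/2018 7:59:11 PM', '%m/%d/%Y %H:%M:%S %p')
with_I = datetime.strptime('6/7/2018 7:59:11 PM', '%m/%d/%Y %I:%M:%S %p')
print(with_H.hour, with_I.hour)  # 7 19
```

Note that neither format needs a special "no padding" flag: the parser accepts '6' for %m and '7' for %d or %I as they are.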

Related

Pandas converting one set of columns and other to timestamp on read csv

I am currently trying to read a CSV with two columns that have date variables in them. The problem that I am seeing is that the DateA column values are coming back as a datetime type, while the DateB column values are a Pandas Timestamp type. Any ideas why this could be? I can't convert from Timestamp to datetime individually either. I customized the read_csv call as follows:
pd.read_csv("filename.csv", parse_dates=['DateA', 'DateB'], date_parser=self.date_function)
def date_function(self, date_list):
    for i in range(len(date_list)):
        # Date formats can be 1/10/21 or 1/10/21 12:45
        formats = ['%m/%d/%y %H:%M', '%Y-%m-%d %H:%M:%S', '%m/%d/%y']
        for format in formats:
            try:
                date_list[i] = dt.strptime(date_list[i], format)
                break
            except Exception as e:
                print("Error parsing: {}. lets keep trying other formats!".format(e))
                continue
    return date_list
CSV:
DateA,Summary,DateB
10/21/21 19:00,Some Summary,10/26/21
Pandas version 1.3.4
I don't see where your problem is. I slightly modified your code because it was not reproducible:
import pandas as pd
from datetime import datetime as dt
def date_function(date_list):
    for i in range(len(date_list)):
        # Date formats can be 1/10/21 or 1/10/21 12:45
        formats = ['%m/%d/%y %H:%M', '%Y-%m-%d %H:%M:%S', '%m/%d/%y']
        for format in formats:
            try:
                date_list[i] = dt.strptime(date_list[i], format)
                break
            except Exception as e:
                print("Error parsing: {}. lets keep trying other formats!".format(e))
                continue
    return date_list
df = pd.read_csv('filename.csv', parse_dates=['DateA', 'DateB'], date_parser=date_function)
# Output:
Error parsing: time data '10/26/21' does not match format '%m/%d/%y %H:%M'. lets keep trying other formats!
Error parsing: time data '10/26/21' does not match format '%Y-%m-%d %H:%M:%S'. lets keep trying other formats!
Result:
>>> df.dtypes
DateA datetime64[ns]
Summary object
DateB datetime64[ns]
dtype: object
>>> df.loc[0, 'DateA']
Timestamp('2021-10-21 19:00:00')
>>> df.loc[0, 'DateB']
Timestamp('2021-10-26 00:00:00')
But you can get the same result without your date_parser function:
df = pd.read_csv('filename.csv', parse_dates=['DateA', 'DateB'])
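If you still want explicit control over the accepted formats, one possible approach (a sketch, not the only way) is to read the date columns as strings and try each format in turn with pd.to_datetime. The helper name parse_with_formats is hypothetical, and the CSV is inlined through io.StringIO so the example runs without a file.

```python
import io
import pandas as pd

# Same content as the CSV in the question, inlined for a runnable example.
csv_text = "DateA,Summary,DateB\n10/21/21 19:00,Some Summary,10/26/21\n"
formats = ['%m/%d/%y %H:%M', '%Y-%m-%d %H:%M:%S', '%m/%d/%y']

def parse_with_formats(column, formats):
    """Hypothetical helper: try each format, keep the first one that parses."""
    result = pd.to_datetime(column, format=formats[0], errors='coerce')
    for fmt in formats[1:]:
        # Rows that are still NaT are filled from the next format's attempt.
        result = result.fillna(pd.to_datetime(column, format=fmt, errors='coerce'))
    return result

df = pd.read_csv(io.StringIO(csv_text), dtype={'DateA': str, 'DateB': str})
for col in ['DateA', 'DateB']:
    df[col] = parse_with_formats(df[col], formats)

print(df.dtypes)
```

Both columns end up with a datetime64 dtype, and individual elements come back as Timestamp, which is a subclass of datetime.datetime.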

csv Pandas datetime convert time to seconds

I work with data from a datalogger, and the timestamp is not supported by datetime in the Pandas DataFrame.
I would like to convert this timestamp into a format pandas knows and then convert the datetime into seconds, starting with 0.
>>>df.time
0 05/20/2019 19:20:27:374
1 05/20/2019 19:20:28:674
2 05/20/2019 19:20:29:874
3 05/20/2019 19:20:30:274
Name: time, dtype: object
I tried to convert it from object to datetime64[ns], with %m or %b for the month:
df_time = pd.to_datetime(df["time"], format = '%m/%d/%y %H:%M:%S:%MS')
df_time = pd.to_datetime(df["time"], format = '%b/%d/%y %H:%M:%S:%MS')
with error: redefinition of group name 'M' as group 7; was group 5 at position 155
I tried to reduce the data set and remove the milliseconds without success.
df['time'] = pd.to_datetime(df['time'],).str[:-3]
ValueError: ('Unknown string format:', '05/20/2019 19:20:26:383')
Or is it possible to just subtract the first timestamp from all the other values in the time column?
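Use '%m/%d/%Y %H:%M:%S:%f' as the format instead of '%m/%d/%y %H:%M:%S:%MS'. %y expects a two-digit year, and %MS is not a directive: it is read as %M followed by a literal S, which is why the error complains about the group name 'M' being redefined. %f parses the fractional seconds.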
Here is the format documentation for future reference
I am not exactly sure what you are looking for, but you can use the format above to parse your data. You can then drop parts of the result, such as the microseconds, by slicing the string:
from datetime import datetime
date = str(datetime.now())
print(date)
2019-07-28 14:04:28.986601
print(date[11:-7])
14:04:28
time = date[11:-7]
print(time)
14:04:28
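For the second part of the question (seconds starting at 0), a minimal sketch is to parse the column with the %f format and subtract the first timestamp. The column name seconds is an assumption added for illustration; the sample values are the ones shown in the question.

```python
import pandas as pd

df = pd.DataFrame({'time': ['05/20/2019 19:20:27:374',
                            '05/20/2019 19:20:28:674',
                            '05/20/2019 19:20:29:874',
                            '05/20/2019 19:20:30:274']})

# %f accepts one to six digits, so ':374' is read as 374 milliseconds.
parsed = pd.to_datetime(df['time'], format='%m/%d/%Y %H:%M:%S:%f')

# Subtracting the first timestamp gives timedeltas; convert them to float seconds.
df['seconds'] = (parsed - parsed.iloc[0]).dt.total_seconds()
print(df['seconds'].tolist())
```

The first row is 0.0 and the following rows are the elapsed seconds since the first sample (about 1.3, 2.5 and 2.9 for this data).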

Converting pandas Column to datetime

I am trying to convert a pandas column to datetime. This is my error message.
ValueError: time data '01-JUN-17 00:00:00' does not match format
'%d-%b-%y %H.%M.%S' (match)
This is my code :
df['dayofservice'] = pd.to_datetime(df['dayofservice'], format = '%d-%b-%y %H.%M.%S')
I have read this documentation to ensure my format is correct : https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
It's still not working for me.
pandas is capable of parsing this without a format argument:
In[90]:
pd.to_datetime('01-JUN-17 00:00:00')
Out[90]: Timestamp('2017-06-01 00:00:00')
So this should work:
df['dayofservice'] = pd.to_datetime(df['dayofservice'])
Change the . separators to : in the time part of the format, since the data uses colons:
df['dayofservice'] = pd.to_datetime(df['dayofservice'], format = '%d-%b-%y %H:%M:%S')
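As a quick check of the point about separators, the sketch below parses the value from the error message with the colon format and shows that the dot format is rejected. The second sample value and the flag name dot_format_failed are made up for illustration.

```python
import pandas as pd

# First value is from the question; the second is a hypothetical extra sample.
values = pd.Series(['01-JUN-17 00:00:00', '15-DEC-17 13:30:00'])

# Colons in the format match the colons in the data; %b matches 'JUN'
# regardless of case.
parsed = pd.to_datetime(values, format='%d-%b-%y %H:%M:%S')
print(parsed)

# The original format used dots, which do not appear in the data.
try:
    pd.to_datetime(values, format='%d-%b-%y %H.%M.%S')
    dot_format_failed = False
except ValueError:
    dot_format_failed = True
print(dot_format_failed)
```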

Python ValueError: time data '2001-11-03 ' %Y:%m %d %H:%M:%S' when dates in csv file are month/day/year

I'm having an issue where the date format is not matching up. In my .csv file the dates are in the form %m/%d/%Y (e.g. 11/3/2001), but the error says %Y/%m/%d or %Y/%d/%m. I've tried all the possible permutations of year, month and day and I continue to receive the same error: ValueError: time data '2001-11-03 ' %Y:%m %d %H:%M:%S'. Below is my code. Thanks.
df = pd.read_excel('.xlsx', header=None)
df.to_csv('.csv', header=None, index=False)
df= pd.read_csv('.csv', index_col[5,8,9,12], date_parser=lambda x: datetime.datetime.strptime(x, '%Y/%m/$d %H:%M:%S').strptime('%m/%d/%Y))
Note: What I'm trying to do is convert an .xlsx file to .csv and then remove the trailing 0:00 from multiple columns within the .csv file. Hope this helps.
Use parse from dateutil.parser to parse the dates. It is a convenient way to handle date strings without spelling out a format.
from dateutil.parser import parse
df = pd.read_csv('filename.csv', date_parser = parse, index_..)
Or you can use to_datetime, which is native to Pandas:
pd.to_datetime(df['Date Col'])
In order to format the date properly, you should use the following:
date_parser=lambda x: parse(x)
#parse from dateutil.parser
df['Date Col'] = df['Date Col'].dt.strftime('%m/%d/%Y')
df.to_csv('New File.csv')
You can use to_datetime since you are using pandas. MoreInfo
import pandas as pd
df = pd.DataFrame({"a": ["11/3/2001", '2001-11-03']})
df["a"] = pd.to_datetime(df["a"])
print(df["a"])
Output:
0 2001-11-03
1 2001-11-03
Name: a, dtype: datetime64[ns]
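Since the stated goal is to drop the trailing 0:00 when writing the CSV, one option is to parse the column and let to_csv format it through its date_format parameter. This is a sketch under assumptions: the column names and sample values are invented, and the output goes to an in-memory buffer instead of a file.

```python
import io
import pandas as pd

# Hypothetical data in the shape described in the question.
df = pd.DataFrame({'Date Col': ['11/3/2001 0:00', '12/15/2001 0:00'],
                   'Value': [1, 2]})

# Parse the strings, including the unwanted time part.
df['Date Col'] = pd.to_datetime(df['Date Col'], format='%m/%d/%Y %H:%M')

# date_format controls how datetime columns are written out.
buffer = io.StringIO()
df.to_csv(buffer, index=False, date_format='%m/%d/%Y')
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

The written dates are zero-padded (11/03/2001), because %m and %d always pad on output.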

Why does time data not match format?

I have a dataframe with strings that I am converting to datetimes. They all look like "12/20/17 5:45:30" (month/day/year hour:minute:second). This is my code:
for col in cols:
    df[col] = pd.to_datetime(df[col], format='%m/%d/%Y %H:%M:%S')
But I get the following error:
ValueError: time data '4/19/16 1:05:30' does not match format '%m/%d/%Y %H:%M:%S'
The date shown in the error is the very first date in the dataframe, so it is not working at all. Can someone explain what's wrong with my datetime format? How does that datetime not match the format? By the way, before I was doing this with a file that had no seconds, and my format was %m/%d/%Y %H:%M, which worked fine, but now with seconds it does not.
Your format string is not working because it uses %Y (four-digit year) where the data needs %y (two-digit year). pandas can also often figure this out for you through the infer_datetime_format parameter of pandas.to_datetime():
Code:
df[col] = pd.to_datetime(df[col], infer_datetime_format=True)
Test Code:
df = pd.DataFrame(["12/20/17 5:45:30", "4/19/16 1:05:30"], columns=['date'])
print(df)
for col in df.columns:
    df[col] = pd.to_datetime(df[col], infer_datetime_format=True)
print(df)
Results:
date
0 12/20/17 5:45:30
1 4/19/16 1:05:30
date
0 2017-12-20 05:45:30
1 2016-04-19 01:05:30
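If you prefer to keep an explicit format rather than relying on inference, a minimal sketch of the one-character fix (%y in place of %Y) on the same test data looks like this:

```python
import pandas as pd

df = pd.DataFrame(["12/20/17 5:45:30", "4/19/16 1:05:30"], columns=['date'])

# %y matches the two-digit year; %m, %d and %H accept unpadded values.
for col in df.columns:
    df[col] = pd.to_datetime(df[col], format='%m/%d/%y %H:%M:%S')

print(df)
```

An explicit format also fails loudly if a row does not match, which can be useful when the input is expected to be uniform.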
