I'm trying to convert a date column in my Pandas DataFrame to datetime format. If I don't specify a date format it works fine, but further along in the code I run into issues because of the mixed time formats.
The original dates look like this: 10/10/2019 6:00, in European (day-first) format.
I tried specifying format like so:
df['PeriodStartDate'] = pd.to_datetime(df['PeriodStartDate'],
format="%d/%m/%Y")
which results in an error: ValueError: unconverted data remains: 6:00
I then tried updating the format directive to format="%d/%m/%Y %-I/%H", which raises another error: '-' is a bad directive in format '%d/%m/%Y %-I/%H', even though I thought to_datetime used the same directives as strftime, where %-I is allowed.
In frustration I then decided to chop off the end of the string since I don't really need hours and minutes:
df['PeriodStartDate'] = df['PeriodStartDate'].str[:10]
df['PeriodStartDate'] = pd.to_datetime(df['PeriodStartDate'],
format="%d/%m/%Y")
But this once again results in an error: ValueError: unconverted data remains: , which of course comes from the fact that some dates are only 9 characters long, like 3/10/2019 6:00, so the slice leaves part of the time string attached.
Not quite sure where to go from here.
The format %H:%M would work (don't forget the : in between):
pd.to_datetime('10/10/2019 6:00', format="%m/%d/%Y %H:%M")
Out[1049]: Timestamp('2019-10-10 06:00:00')
pd.to_datetime('3/10/2019 18:00', format="%d/%m/%Y %H:%M")
Out[1064]: Timestamp('2019-10-03 18:00:00')
Oh, I feel so dumb. I figured out what the issue was: for some reason I thought the hours were in 12-hour format, but they were in fact in 24-hour format, so changing the directive to "%d/%m/%Y %H:%M" solved it.
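For reference, here is the working fix end-to-end, as a minimal sketch on a throwaway frame built from the sample values above:

```python
import pandas as pd

# European day-first dates with 24-hour times, as in the question.
df = pd.DataFrame({"PeriodStartDate": ["10/10/2019 6:00", "3/10/2019 18:00"]})
df["PeriodStartDate"] = pd.to_datetime(df["PeriodStartDate"],
                                       format="%d/%m/%Y %H:%M")
print(df["PeriodStartDate"].tolist())
# [Timestamp('2019-10-10 06:00:00'), Timestamp('2019-10-03 18:00:00')]
```

Note that %H happily accepts the non-zero-padded "6", so no string surgery is needed.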
Related
I'm trying to read some csv files that contain a column called 'timestamp' in this format:
7/6/2022 7:30:00 PM, which should translate to (mm/dd/yyyy hh:mm:ss AM/PM). After reading the csv file, I tried:
df['timestamp']= pd.to_datetime(df['timestamp'],format='%m/%d/%Y %I:%M:%S %p')
And it fails with this error:
ValueError: time data '07-06 19:30' does not match format '%m/%d/%Y %I:%M:%S %p' (match)
The value '07-06 19:30' is the same one that appears when reading the csv directly with no formatting, which is strange, because when I open the csv the full date is there. I'm a bit lost on this case; it seems I cannot convert the date.
Thanks
The format='%m/%d/%Y %I:%M:%S %p' should work; make sure you read your data as strings.
That said, pandas is advanced enough to figure out the format semi-automatically, the only ambiguity to resolve is to specify that the first digits are not days:
df['new_timestamp'] = pd.to_datetime(df['timestamp'], dayfirst=False)
example:
timestamp new_timestamp
0 7/6/2022 7:30:00 PM 2022-07-06 19:30:00
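A runnable sketch of that advice, using an inline CSV as a stand-in for the real file (the column names here are hypothetical):

```python
import io
import pandas as pd

# Force the timestamp column to be read as strings before parsing.
csv_text = "timestamp,value\n7/6/2022 7:30:00 PM,1\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={"timestamp": str})
df["timestamp"] = pd.to_datetime(df["timestamp"],
                                 format="%m/%d/%Y %I:%M:%S %p")
print(df["timestamp"].iloc[0])  # 2022-07-06 19:30:00
```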
I'm trying to convert this date '2021-09-29 00:05:00+00:00' into "str" using the following code:
date1 = '2021-09-29 00:05:00+00:00'
date1 = datetime.datetime.strptime(date1,'%Y-%m-%d %H:%M:%S+%f')
but I get the error:
"ValueError: unconverted data remains: :00".
I don't know how to deal with the microseconds. Any help to use strptime with that date format would be more than appreciated!
Thanks in advance.
The +00:00 offset is a timezone offset in hours and minutes. Per the strftime() and strptime() Format Codes documentation, use %z to parse:
Directive  %z
Meaning    UTC offset in the form ±HHMM[SS[.ffffff]] (empty string if the object is naive)
Example    (empty), +0000, -0400, +1030, +063415, -030712.345216
Notes      (6)
Syntax with the colon (:) as a separator wasn't supported until Python 3.7, per a detail in note 6:
Changed in version 3.7: When the %z directive is provided to the strptime() method, the UTC offsets can have a colon as a separator between hours, minutes and seconds. For example, '+01:00:00' will be parsed as an offset of one hour. In addition, providing 'Z' is identical to '+00:00'.
from datetime import datetime
s = '2021-09-29 00:05:00+00:00'
t = datetime.strptime(s,'%Y-%m-%d %H:%M:%S%z')
print(t)
Output:
2021-09-29 00:05:00+00:00
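As an aside (not part of the original answer): on Python 3.7+, datetime.fromisoformat parses this exact string directly, with no format spec at all:

```python
from datetime import datetime

# fromisoformat understands the "+00:00" offset natively (Python 3.7+).
t = datetime.fromisoformat('2021-09-29 00:05:00+00:00')
print(t)  # 2021-09-29 00:05:00+00:00
```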
I have a .csv file with a date format I am unfamiliar with (from reading around, I think it's a unix timestamp).
1607299200000,string,int,float
1607385600000,string,int,float
1606953600000,string,int,float
I have been trying to convert it into '%Y-%m-%d' using Python but keep getting various errors. I am fairly new to Python, but this is what I have so far:
outRow.append(datetime.datetime.strptime(row[0],'%B %d %Y').strftime('%Y-%m-%d'))
Any help would be appreciated.
import datetime
timestamp = datetime.datetime.fromtimestamp(1500000000)
print(timestamp.strftime('%Y-%m-%d %H:%M:%S'))
Output (note that fromtimestamp converts to your local timezone, so the exact hour depends on the machine; this one was UTC+05:30):
2017-07-14 08:10:00
The tricky part here is that you have a unix timestamp in milliseconds.
AFAIK there's no direct option to convert a unix timestamp with milliseconds to datetime.
So first you have to drop them (integer-divide by 1000), then add them back as microseconds if needed.
from datetime import datetime

ts = int(row[0])  # csv values are read as strings, so convert first
dt = datetime.utcfromtimestamp(ts // 1000).replace(microsecond=ts % 1000 * 1000)
then you can strftime to whichever format you need.
Though if you need to do this for the entire csv, you'd better look into pandas, but that's out of the scope of this question.
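For completeness, a sketch of that pandas route using the sample values from the question: to_datetime with unit="ms" interprets the 13-digit values as millisecond unix timestamps in one vectorised call.

```python
import pandas as pd

# The first column of the sample csv, as millisecond unix timestamps.
ts = pd.Series([1607299200000, 1607385600000, 1606953600000])
dates = pd.to_datetime(ts, unit="ms").dt.strftime("%Y-%m-%d")
print(dates.tolist())  # ['2020-12-07', '2020-12-08', '2020-12-03']
```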
I'm trying to convert a Unix UTC timestamp to a local date format in Pandas. I've been looking through a few solutions but I can't quite get my head around how to do this properly.
I have a dataframe with multiple UTC timestamp columns which all need to be converted to a local timezone. Let's say EU/Berlin.
I first convert all the timestamp columns into valid datetime columns with the following adjustments:
df['date'] = pd.to_datetime(df['date'], unit='s')
This works and gives me, for example, 2019-01-18 15:58:25. If I now try to adjust the timezone for this datetime, I have tried both:
df['date'].tz_localize('UTC').tz_convert('Europe/Berlin')
and
df['date'].tz_convert('Europe/Berlin')
In both cases the error is: TypeError: index is not a valid DatetimeIndex or PeriodIndex and I don't understand why.
The problem must be that the datetime column is not the index. But even when I use df.set_index('date') and then try the options above, I get the same error.
Also, even if it worked, this approach seems to adjust only the indexed datetime. How would I then adjust the other columns that need converting?
Looking to find some information on how to best approach these issues once and for all! Thanks
You should reach the datetime methods through the .dt accessor, since the column is not the index:
df['date'] = df['date'].dt.tz_localize('UTC').dt.tz_convert('Europe/Berlin')
This should be used if the column is not the index column.
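A self-contained sketch of the whole flow, using a made-up epoch value that matches the timestamp shown in the question:

```python
import pandas as pd

# Hypothetical unix-seconds value corresponding to 2019-01-18 15:58:25 UTC.
df = pd.DataFrame({"date": [1547827105]})
df["date"] = pd.to_datetime(df["date"], unit="s")  # naive UTC datetimes
# Localize to UTC, then convert, via the .dt accessor on a plain column.
df["date"] = df["date"].dt.tz_localize("UTC").dt.tz_convert("Europe/Berlin")
print(df["date"].iloc[0])
```

In January, Berlin is UTC+01:00, so the result lands one hour later than the naive UTC value.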
I have ISO 8601 formatted timestamp strings in a column of a Pandas dataframe. I am looking for the most efficient way to convert this column of strings to Pandas datatime objects. pd.to_datetime() works but my dataframe is about 7.5 million rows so it is very slow.
I can specify the format using strftime syntax to avoid auto format detection and (I assume) substantially speed up the conversion.
import pandas as pd
pd.to_datetime('2013-04-27 08:27:30.000001540', format='%Y-%m-%d %H:%M:%S.%f')
Gives me:
ValueError: unconverted data remains: 540
If I chop off the last three characters of the timestamp, this works perfectly. It seems the elegant solution would be to find the Python-compatible strftime directive for nanoseconds. I tried %9N, %N, and %9. Any ideas?
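For what it's worth (this wasn't answered in the thread): the strftime mini-language tops out at %f (microseconds), but pandas stores datetime64[ns], and its own parser keeps the trailing nanosecond digits when you omit the format:

```python
import pandas as pd

# pandas' parser retains the nanosecond digits that %f cannot express.
t = pd.to_datetime('2013-04-27 08:27:30.000001540')
print(t.microsecond, t.nanosecond)  # 1 540
```

On pandas 2.0+, passing format='ISO8601' to pd.to_datetime is reported to give fast parsing of ISO-formatted columns without per-row format inference; check your version before relying on it.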