Issue with converting string to a datetime in Python - python

I have a datetime column stored as int64 in my pandas DataFrame.
I'm trying to convert its value of 201903250428 to a datetime value.
The values in this column only go down to minute resolution, in 24-hour format.
I tried various methods such as strptime and to_datetime, but with no luck.
pd.datetime.strptime('201903250428','%y%m%d%H%M')
I get this error when I use the above code:
ValueError: unconverted data remains: 0428
I want this value to be converted to something like '25-03-2019 04:28:00'.

Lower-case %y means a two-digit year only, so this parses "20" as the year, "1" as the month, "9" as the day, and "03:25" as the time, leaving "0428" unconverted.
You need to use %Y (four-digit year), which works fine:
pd.datetime.strptime('201903250428','%Y%m%d%H%M')
http://strftime.org/ is a handy reference for time formatting/parsing parameters.
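The `pd.datetime` alias has been deprecated and is not available in recent pandas versions, so a column-wise variant with `pd.to_datetime` may be more convenient. The following is a minimal sketch, assuming a hypothetical column named `ts` that holds the int64 values; the frame and column names are illustrative and not taken from the question.

```python
import pandas as pd

# Hypothetical frame: 'ts' stands in for the int64 column from the question.
df = pd.DataFrame({"ts": [201903250428, 201912312359]})

# Cast to string so the digits are parsed by position, then use %Y (four-digit year).
df["ts_dt"] = pd.to_datetime(df["ts"].astype(str), format="%Y%m%d%H%M")

# Render the day-first text form requested in the question.
df["ts_text"] = df["ts_dt"].dt.strftime("%d-%m-%Y %H:%M:%S")

print(df)
```

Keeping the parsed `ts_dt` column alongside the formatted text is a design choice: the datetime column stays usable for arithmetic, while the string column is only for display.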

Related

Standardizing date format in Python

I have a dataset and I realized the values in the date column are in two different formats.
I tried resolving this issue using date parser but it confuses the day of the month with the month.
For example :
'27/02/21 13:40' is converted correctly to '2021-02-27 13:40:00', BUT '01/03/21 15:09' is converted to '2021-01-03 15:09:00' (instead of March 1st, it becomes January 3rd).
I really don't understand why in the first case the conversion is correct, while in the second it's not. Both dates are in the same column and have the same format.
This is a preview of the dataset with the two different dates:
This is the date that was converted correctly
This is the date that was not converted correctly
These are the steps I followed:
I converted the date columns to a list of strings
I created this function:
date parser function
I previewed my converted list and noticed that not all dates had been converted in the same way:
This date was converted correctly
Here, month and day of the month are switched
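One likely explanation, offered as an assumption because the parser function itself is not shown: parsers that guess the format usually try month-first and fall back to day-first only when the first number cannot be a month (27 cannot, 01 can). Passing an explicit format removes the guesswork. A minimal sketch, assuming every value follows day/month/two-digit-year; the list `raw_dates` is made up from the two examples above.

```python
from datetime import datetime

# The two example strings from the question.
raw_dates = ["27/02/21 13:40", "01/03/21 15:09"]

# Explicit day-first format: day, month, two-digit year, 24-hour time.
DAY_FIRST_FORMAT = "%d/%m/%y %H:%M"

parsed = [datetime.strptime(text, DAY_FIRST_FORMAT) for text in raw_dates]

for original, value in zip(raw_dates, parsed):
    print(original, "->", value)
```

With pandas, the same format string can be passed as `pd.to_datetime(column, format="%d/%m/%y %H:%M")`. If the column really does mix two formats, each format would need its own pass.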

Pandas.to_datetime doesn't recognize the format of the string to be converted to datetime

I am trying to load some data from a .txt file into a DataFrame to use for analysis.
The data in the .txt file has the following form:
DATE_TIME VELOC MEASURE
[m/s] [l/h]
A 09.01.2023 12:45:20 ??? ???
A 09.01.2023 12:46:20 0,048 52,67
A 09.01.2023 12:47:20 0,049 53,77
A 09.01.2023 12:48:20 0,050 54,86
I load the data into a DataFrame without problems and convert the string values of the measurements to float; everything looks good, as shown in the image.
The problem arises when I try to convert the date-time column from string to the pandas datetime format using this line of code:
volume_flow['DATE_TIME'] = pd.to_datetime(volume_flow['DATE_TIME'], format = '%d.%m.%Y %H:%M:S')
and I get the following error:
ValueError: time data '09.01.2023 12:46:20' does not match format '%d.%m.%Y %H:%M:S' (match)
but I don't see how the format is off.
I am really lost as to why this happens, as I have used the same code with other datetime formats before without problems.
Furthermore, I tried format = '%dd.%mm.%yyyy %H:%M:S' as well, with the same result, and when I let pandas.to_datetime infer the format automatically it confuses the day and the month. The data spans 09.01 to 12.01, so you can't tell which part is the month and which is the day just from the values.
I think you should go from this
(..., format='%d.%m.%Y %H:%M:S')
to this
(..., format='%d.%m.%Y %H:%M:%S')
You forgot the percent character before the S!
Check the documentation for the correct format codes. You will see that the directive %S represents the seconds:
Second as a decimal number [00,61].
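To make the difference concrete, here is a small sketch using sample values in the shape shown in the question. The Series `raw` is a stand-in for `volume_flow['DATE_TIME']`, which is an assumption about how the file was loaded; the third value is invented to show a day greater than the month.

```python
import pandas as pd

# Stand-in for volume_flow['DATE_TIME']; values follow the dd.mm.YYYY HH:MM:SS layout.
raw = pd.Series(["09.01.2023 12:45:20", "09.01.2023 12:46:20", "12.01.2023 08:00:00"])

# '%S' (with the percent sign) is the seconds directive; a bare 'S' is a literal letter S.
parsed = pd.to_datetime(raw, format="%d.%m.%Y %H:%M:%S")

# Day and month come from their positions in the format string, not from guessing.
print(parsed.dt.day.tolist(), parsed.dt.month.tolist())
```

Because the format is explicit, the ambiguity between 09.01 as 9 January and 1 September does not arise.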

convert string to datetime for old years (out of bound problem)

I am working with a dataset of historical subjects, some of which date from the 1500s. I need to convert some columns to datetime so I can calculate differences in days. I tried pandas.to_datetime to convert the strings in those columns to datetime, but it raised an out-of-bounds error.
The issue can be reproduced by the following code:
datestring = '01-04-1595'
datenew = pd.to_datetime(datestring,format='%d-%m-%Y')
and the output error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1595-04-01 00:00:00
I learned that the limits of Timestamp are min 1677-09-21 and max 2262-04-11, but what would be a workaround for this? The date range needed to accommodate my dataset is 1500 to 1900.
I would like to apply the string to datetime conversion for all entries of a column.
Thank you.
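One possible workaround, sketched under the assumption that only whole-day differences are needed: Python's built-in `datetime` type covers years 1 through 9999, so the strings can be parsed without pandas' nanosecond `Timestamp`. The helper name `parse_old_date` and the second date are made up for illustration.

```python
from datetime import date, datetime


def parse_old_date(text: str) -> date:
    """Parse a 'dd-mm-YYYY' string into a datetime.date (years 1 to 9999)."""
    return datetime.strptime(text, "%d-%m-%Y").date()


start = parse_old_date("01-04-1595")  # the date from the question
end = parse_old_date("01-05-1595")    # hypothetical second date

# Subtracting two dates yields a timedelta; .days is the difference in days.
days_between = (end - start).days
print(days_between)
```

For a whole column, the same helper can be applied with `df['col'].apply(parse_old_date)`, which gives an object-dtype column of `datetime.date` values rather than `datetime64[ns]`. Note that Python uses the proleptic Gregorian calendar, so dates originally recorded in the Julian calendar may need separate handling.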

Group by month in a Pandas dataframe when there is no year in the datetime object

I have a large dataset with a date_time field (object) that is in this format: 01/01 01:00:00 (month/day hour:minute:second). There is no year. I want to be able to group the dataset by month in a Pandas dataframe.
Whatever I try, I either get an error like, "Error parsing datetime string " 01/01 01:00:00" at position 3" or an out-of-bounds error. I'm a bit of a newbie here. I suspect it is a datetime formatting issue because there is no year...but I cannot figure it out.
If you don't have a year, you don't really have a date. But you can still group by month, just treat it like a string!
Something along the lines of this should work:
# create a month string column, called month_str
# the lambda function just turns the col with the yearless 'dates' into a str
# and takes only the first two characters
df['month_str'] = df['datetime'].apply(lambda x: str(x)[0:2])
df.groupby('month_str')
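A slightly fuller sketch of the same idea follows. The error message quoted in the question shows a leading space in the string, so this version strips whitespace before slicing; whether the real data has that space is an assumption. The frame, the `value` column, and the sum aggregation are invented for illustration.

```python
import pandas as pd

# Hypothetical data in the month/day hour:minute:second layout, with a leading space.
df = pd.DataFrame({
    "date_time": [" 01/01 01:00:00", " 01/15 02:00:00", " 02/01 01:00:00"],
    "value": [10.0, 20.0, 5.0],
})

# Strip stray whitespace, then keep the two-character month prefix.
df["month_str"] = df["date_time"].str.strip().str[:2]

# groupby alone returns a lazy GroupBy object; an aggregation produces the result.
monthly_total = df.groupby("month_str")["value"].sum()
print(monthly_total)
```

Without the `strip`, slicing the first two characters of " 01/01 01:00:00" would yield " 0" rather than "01".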

Passing chopped down datetimes

I have been stumped for the past few hours trying to solve the following.
In a large data set from an automated system, there is a DATE_TIME value which, for rows just after midnight, does not have a two-digit hour, like: 12-MAY-2017 0:16:20
When I try to convert this to a date (so that it's usable for conversions) as follows:
df['DATE_TIME'].astype('datetime64[ns]')
I get the following error:
Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
I tried writing some regex to pull out each piece but couldn't get anything working, given that the hour can be either one or two characters. Writing a regex for each piece also doesn't seem like an ideal solution.
Any ideas on this?
Try to use pandas.to_datetime() method:
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'], errors='coerce')
The parameter errors='coerce' takes care of strings that can't be converted to the datetime dtype by turning them into NaT.
I think you need pandas.to_datetime only:
df = pd.DataFrame({'DATE_TIME':['12-MAY-2017 0:16:20','12-MAY-2017 0:16:20']})
print (df)
DATE_TIME
0 12-MAY-2017 0:16:20
1 12-MAY-2017 0:16:20
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'])
print (df)
DATE_TIME
0 2017-05-12 00:16:20
1 2017-05-12 00:16:20
Converting in NumPy via astype seems problematic, because it needs strings in ISO 8601 date or datetime format:
df['DATE_TIME'].astype('datetime64[ns]')
ValueError: Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
EDIT:
If some datetimes are broken (stray strings or ints), then use MaxU's answer.
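The two suggestions can also be combined with an explicit format, so that nothing is left to inference. This is a minimal sketch, assuming an English locale for the `%b` month abbreviation; the Series `s` and its invalid third entry are invented for illustration.

```python
import pandas as pd

# Hypothetical values: a single-digit hour, a two-digit hour, and one broken entry.
s = pd.Series(["12-MAY-2017 0:16:20", "12-MAY-2017 13:05:00", "not a date"])

# %b matches the abbreviated month name; %H accepts one- or two-digit hours.
# errors='coerce' turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(s, format="%d-%b-%Y %H:%M:%S", errors="coerce")

print(parsed)
```

Counting `parsed.isna()` afterwards is a quick way to see how many rows failed to parse.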
