Convert string to datetime for old years (out-of-bounds problem) - Python

I am working with a dataset of historical subjects, some of whom date from the 1500s. I need to convert some columns to datetime so I can calculate differences in days. I tried pandas.to_datetime to convert the strings in those columns, but it raised an out-of-bounds error.
The issue can be reproduced by the following code:
import pandas as pd
datestring = '01-04-1595'
datenew = pd.to_datetime(datestring, format='%d-%m-%Y')
and the output error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1595-04-01 00:00:00
I learned that pandas timestamps are limited to the range 1677-09-21 to 2262-04-11, but what would be a workaround? My dataset needs to cover roughly 1500 to 1900.
I would like to apply the string to datetime conversion for all entries of a column.
Thank you.
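A minimal sketch of one workaround, using only the standard library: Python's own datetime type covers years 1 through 9999, so it can hold these dates even though pandas' nanosecond timestamps (a signed 64-bit count of nanoseconds since 1970, roughly ±292 years) cannot. The helper names parse_dates and days_between and the sample dates are illustrative, not from the question. Note that datetime uses the proleptic Gregorian calendar, so dates originally recorded in the Julian calendar would need converting first if exact day counts matter.

```python
from datetime import datetime

# Same layout as the question's strings, e.g. '01-04-1595'.
FMT = "%d-%m-%Y"


def parse_dates(strings):
    """Parse 'DD-MM-YYYY' strings into datetime objects (years 1..9999)."""
    return [datetime.strptime(s, FMT) for s in strings]


def days_between(start, end):
    """Whole days from start to end; negative if end is earlier."""
    return (end - start).days


# Hypothetical sample values standing in for two date columns.
starts = parse_dates(["01-04-1595", "15-06-1700"])
ends = parse_dates(["01-04-1596", "15-06-1899"])

diffs = [days_between(a, b) for a, b in zip(starts, ends)]
print(diffs)  # [366, 72683]
```

To use it on a whole column, the column can be passed straight in, e.g. parse_dates(df['date']) where 'date' is a placeholder column name; the resulting list of day counts can then be assigned back as a new column.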

Related

Pandas.to_datetime doesn't recognize the format of the string to be converted to datetime

I am trying to convert some data from a .txt file to a dataframe to use it for some analysis.
The data in the .txt file looks as follows:
DATE_TIME VELOC MEASURE
[m/s] [l/h]
A 09.01.2023 12:45:20 ??? ???
A 09.01.2023 12:46:20 0,048 52,67
A 09.01.2023 12:47:20 0,049 53,77
A 09.01.2023 12:48:20 0,050 54,86
I load the data into a dataframe without a problem and convert the measurement strings to floats; everything looks fine.
The problem comes when I try to convert the date-time column, which holds strings, to pandas datetime with this line of code:
volume_flow['DATE_TIME'] = pd.to_datetime(volume_flow['DATE_TIME'], format = '%d.%m.%Y %H:%M:S')
and I get the following error:
ValueError: time data '09.01.2023 12:46:20' does not match format '%d.%m.%Y %H:%M:S' (match)
but I don't see how the format is off.
I am really lost as to why this happens, as I have used the same code with other datetime formats before without any problem.
Furthermore, I tried format = '%dd.%mm.%yyyy %H:%M:S' with the same result, and when I let pandas.to_datetime infer the format automatically, it confuses the day and the month. The data runs from 09.01 to 12.01, so the values alone don't show which field is the day and which is the month.
I think you should go from this
(..., format='%d.%m.%Y %H:%M:S')
to this
(..., format='%d.%m.%Y %H:%M:%S')
You forgot the percentage character!
Check the documentation for the correct format codes; you will see that the directive %S represents the seconds:
Second as a decimal number [00,61].
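A small check of this fix with the standard library's datetime.strptime, which understands the same %-directives that pandas' format argument uses. The sample string is the one from the error message.

```python
from datetime import datetime

sample = "09.01.2023 12:46:20"

# Broken format: the last "S" lacks its "%", so strptime expects a literal
# letter S where the seconds "20" are, and raises ValueError.
try:
    datetime.strptime(sample, "%d.%m.%Y %H:%M:S")
    broken_ok = True
except ValueError:
    broken_ok = False

# Fixed format: "%S" reads the seconds. Because %d comes first, "09.01"
# is read as 9 January, which also settles the day/month ambiguity.
parsed = datetime.strptime(sample, "%d.%m.%Y %H:%M:%S")
print(broken_ok, parsed)  # False 2023-01-09 12:46:20
```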

Issue with converting string to a datetime in Python

I have a datetime column stored as int64 in my pandas DataFrame.
I'm trying to convert a value such as 201903250428 to a datetime.
The values only go down to the minute and use the 24-hour clock.
I tried various methods like strptime and to_datetime, but no luck.
pd.datetime.strptime('201903250428','%y%m%d%H%M')
I get this error when I use the above code:
ValueError: unconverted data remains: 0428
I want this value to be converted to something like '25-03-2019 04:28:00'.
Lower-case y means two-digit years only, so this is trying to parse "20" as the year, 1 as the month, 9 the day, and 03:25 as the time, leaving "0428" unconverted.
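You need to use %Y, which will work fine (pd.datetime was only an alias for the standard library class and is deprecated, so import it directly):
from datetime import datetime
datetime.strptime('201903250428', '%Y%m%d%H%M')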
http://strftime.org/ is a handy reference for time formatting/parsing parameters.
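For the int64 column itself, a sketch assuming pandas is available: turn the integers into digit strings and let pd.to_datetime parse them with the four-digit %Y. The column names and the second sample value are made up for illustration.

```python
import pandas as pd

# Hypothetical frame holding the int64 values from the question.
df = pd.DataFrame({"datetime": [201903250428, 201912311959]})

# Integers -> 12-digit strings, then parse year, month, day, hour, minute.
df["parsed"] = pd.to_datetime(df["datetime"].astype(str), format="%Y%m%d%H%M")

# Optional: render in the layout the question asked for.
df["display"] = df["parsed"].dt.strftime("%d-%m-%Y %H:%M:%S")
print(df)
```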

Date Comparison in python pandas

I have to compare two columns containing date values and find the difference between the two dates.
Of the two columns, one is of datetime type, but the other is of object type.
When trying to convert the object type to datetime using:
final['Valid to']=pd.to_datetime(final['Valid to'])
I am getting an error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00
How to convert the column of object type to datetime so that i can compare and get the required result?
Use the format parameter to give the exact layout of the string, so that to_datetime knows how to parse it into a datetime object.
In your case it would be something like:
pd.to_datetime(s, format='%Y-%m-%d %H:%M:%S')
Please post some sample data, as someone already asked in the comments; that would help in giving a correct answer.
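A format string alone may not be enough here: in pandas versions that store datetimes at nanosecond resolution, 9999-12-31 lies past the 2262-04-11 limit whatever the format. A sketch of one option, assuming 9999-12-31 is a "no end date" sentinel (an assumption, since no sample data was posted): coerce unparseable values to NaT with errors='coerce', then fill them with pd.Timestamp.max so comparisons and differences still work. The frame and the days_valid column are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'Valid from' is already datetime, 'Valid to' is text.
final = pd.DataFrame({
    "Valid from": pd.to_datetime(["2019-01-01", "2020-06-15"]),
    "Valid to": ["2019-12-31", "9999-12-31"],
})

# Out-of-range strings become NaT instead of raising OutOfBoundsDatetime.
valid_to = pd.to_datetime(final["Valid to"], format="%Y-%m-%d", errors="coerce")

# Treat the sentinel as open-ended by clamping it to the latest timestamp.
final["Valid to"] = valid_to.fillna(pd.Timestamp.max)

# Both columns are datetimes now, so they can be compared and subtracted.
final["days_valid"] = (final["Valid to"] - final["Valid from"]).dt.days
print(final)
```

Clamping keeps the row usable for comparisons; if the exact far-future date matters, keeping the column as plain Python datetime objects is an alternative.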

Pandas.to_datetime giving an error when given 15-Jan-0001 is there a way around this?

I've got a dataset which goes back to 15-Jan-0001 (yes, that is 1 CE!). It originally started at 0 CE, but since that year doesn't exist I cut those 12 months out of the data.
I am trying to get pandas to convert the date-time strings in my data to internal datetime objects with df.datetime = pd.to_datetime(df.datetime).
I tried:
import pandas as pd
df = pd.read_csv(file)
df.datetime = pd.to_datetime(df.datetime)
and got:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-15 00:00:00
the first two lines of the csv file are:
datenum,year,month,day,datetime,data_mean_global,data_mean_nh,data_mean_sh
381,1,1,15,15-Jan-0001 00:00:00,277.876675965034,278.555895908363,277.197456021705
One way is to convert these problematic values to NaT:
df.datetime = pd.to_datetime(df.datetime, errors='coerce')
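errors='coerce' throws those dates away as NaT. If the dates themselves are needed, one alternative is the standard library: datetime.strptime accepts year 1, and %b matches English month abbreviations such as "Jan" (it is locale-dependent, so this assumes an English or C locale). The helper name parse_ce is illustrative.

```python
from datetime import datetime

# Layout of the 'datetime' column in the question's CSV.
FMT = "%d-%b-%Y %H:%M:%S"


def parse_ce(text):
    """Parse e.g. '15-Jan-0001 00:00:00'; datetime supports years 1..9999."""
    return datetime.strptime(text, FMT)


first = parse_ce("15-Jan-0001 00:00:00")
later = parse_ce("15-Feb-0001 00:00:00")
print(first, (later - first).days)  # 0001-01-15 00:00:00 31
```

Since the CSV also has separate year, month and day columns, building datetime(year, month, day) from those would work just as well.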

Python cleaning dates for conversion to year only in Pandas

I have a large dataset that users entered into a CSV file, which I converted into a DataFrame with pandas. The column has over 1000 entries; here is a sample:
datestart
5/5/2013
6/12/2013
11/9/2011
4/11/2013
10/16/2011
6/15/2013
6/19/2013
6/16/2013
10/1/2011
1/8/2013
7/15/2013
7/22/2013
7/22/2013
5/5/2013
7/12/2013
7/29/2013
8/1/2013
7/22/2013
3/15/2013
6/17/2013
7/9/2013
3/5/2013
5/10/2013
5/15/2013
6/30/2013
6/30/2013
1/1/2006
00/00/0000
7/1/2013
12/21/2009
8/14/2013
Feb 1 2013
Then I tried converting the dates into years using
df['year']=df['datestart'].astype('timedelta64[Y]')
But it gave me an error:
ValueError: Value cannot be converted into object Numpy Time delta
Using Datetime64
df['year']=pd.to_datetime(df['datestart']).astype('datetime64[Y]')
it gave:
"ValueError: Error parsing datetime string ""03/13/2014"" at position 2"
Since that column was filled in by users, most entries are in MM/DD/YYYY format, but some were entered like Feb 10 2013, and one entry is 00/00/0000. I am guessing the different formats screwed up the processing.
Is there a try loop, if statement, or something that I can skip over problems like these?
If datetime fails, I will be forced to use a str.extract script, which also works:
year = df['datestart'].str.extract(r"(?P<month>[0-9]+)(-|\/)(?P<day>[0-9]+)(-|\/)(?P<year>[0-9]+)")
del year['month'], year['day']
and use concat to take the year out.
With df['year']=pd.to_datetime(df['datestart'],coerce=True, errors ='ignore').astype('datetime64[Y]') the error message is:
Message File Name Line Position
Traceback
<module> C:\Users\0\Desktop\python\Example.py 23
astype C:\Python33\lib\site-packages\pandas\core\generic.py 2062
astype C:\Python33\lib\site-packages\pandas\core\internals.py 2491
apply C:\Python33\lib\site-packages\pandas\core\internals.py 3728
astype C:\Python33\lib\site-packages\pandas\core\internals.py 1746
_astype C:\Python33\lib\site-packages\pandas\core\internals.py 470
_astype_nansafe C:\Python33\lib\site-packages\pandas\core\common.py 2222
TypeError: cannot astype a datetimelike from [datetime64[ns]] to [datetime64[Y]]
You first have to convert the column with the date values to datetime's with to_datetime():
df['datestart'] = pd.to_datetime(df['datestart'], errors='coerce')
This should normally parse the different formats flexibly (errors='coerce' is important here, because it converts invalid dates to NaT; in pandas 2.0 and later, a column with mixed layouts also needs format='mixed').
If you then want the year part of the dates, you can do the following (calling astype directly on the pandas column seems to give an error, but .values gives you the underlying NumPy array):
df['datestart'].values.astype('datetime64[Y]')
The problem with this is that assigning the result to a column again raises an error because of the NaT value (this seems to be a bug; you can work around it with df = df.dropna()). Also, when you assign this to a column, it gets converted back to datetime64[ns], as this is how pandas stores datetimes. So if you want a column with the years, I think it is better to do the following:
df['year'] = pd.DatetimeIndex(df['datestart']).year
This last one will return the year as an integer.
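The question also asks for a try loop that skips problem entries. A sketch using only the standard library, assuming the two layouts seen in the sample (MM/DD/YYYY and "Feb 1 2013") are the only valid ones; the FORMATS tuple and the parse_year helper are illustrative, and %b assumes English month abbreviations.

```python
from datetime import datetime

# Layouts seen in the sample: '5/5/2013' and 'Feb 1 2013'.
# %b matches English month abbreviations (locale-dependent).
FORMATS = ("%m/%d/%Y", "%b %d %Y")


def parse_year(text):
    """Return the year of a date string, or None if no known layout matches."""
    if not isinstance(text, str):
        return None  # e.g. NaN for an empty cell
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).year
        except ValueError:
            continue  # not this layout; try the next one
    return None  # e.g. '00/00/0000' is not a real date


samples = ["5/5/2013", "10/16/2011", "00/00/0000", "Feb 1 2013", "12/21/2009"]
years = [parse_year(s) for s in samples]
print(years)  # [2013, 2011, None, 2013, 2009]
```

It could be applied to the column with df['datestart'].map(parse_year); unparseable entries such as 00/00/0000 come back as missing values.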
