I have to compare two columns containing date values and find the difference between the 2 dates.
Of the 2 columns, one is of datetime type but the other is of object type.
When trying to convert the object column to datetime using:
final['Valid to']=pd.to_datetime(final['Valid to'])
I am getting an error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00
How can I convert the object column to datetime so that I can compare the two columns and get the required result?
Use the format parameter to tell to_datetime exactly how the strings are laid out, so that it does not have to guess.
In your case it would be something like:
pd.to_datetime(s, format='%Y-%m-%d %H:%M:%S')
Note that format alone does not lift the range limit of nanosecond timestamps, so a value such as 9999-12-31 still needs errors='coerce' or separate handling.
Please post some sample data, as someone has already asked in the comments; that would make a correct answer easier.
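A minimal sketch of the errors='coerce' route, assuming 9999-12-31 is an "open-ended" placeholder rather than a real date. The 'Valid from' column and the sample values are made up for illustration; only 'Valid to' comes from the question.

```python
import pandas as pd

# Hypothetical sample data; 'Valid from' is an assumed column name.
final = pd.DataFrame({
    "Valid from": pd.to_datetime(["2020-01-01", "2021-06-15"]),
    "Valid to": ["2020-03-01", "9999-12-31"],
})

# Values that cannot be represented are turned into NaT instead of raising.
# Under nanosecond resolution, 9999-12-31 is such a value.
final["Valid to"] = pd.to_datetime(final["Valid to"], errors="coerce")

# Difference in whole days; rows where 'Valid to' is NaT give NaN.
final["days"] = (final["Valid to"] - final["Valid from"]).dt.days
print(final)
```

Rows with the placeholder can then be filtered or filled separately, depending on what "no end date" should mean for the comparison.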
I am working with a dataset of historical subjects, some of them from the 1500s. I need to convert some columns to datetime so I can calculate the difference in days. I tried pandas.to_datetime to convert the strings in those columns to datetime, but it raised an out-of-bounds error.
The issue can be reproduced by the following code:
datestring = '01-04-1595'
datenew = pd.to_datetime(datestring,format='%d-%m-%Y')
and the output error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1595-04-01 00:00:00
I learned that the limits of a timestamp are min 1677-09-21 and max 2262-04-11, but what would be the workaround for this? The range that would accommodate my dataset is 1500 to 1900.
I would like to apply the string to datetime conversion for all entries of a column.
Thank you.
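One possible workaround, sketched under the assumption that only day differences are needed: Python's own datetime supports years 1 to 9999, so the strings can be parsed with datetime.strptime and subtracted outside of pandas' nanosecond timestamps. The column names and sample values below are hypothetical.

```python
from datetime import datetime

import pandas as pd

# Hypothetical columns holding day-month-year strings from the 1500s and 1600s.
df = pd.DataFrame({
    "start": ["01-04-1595", "15-08-1620"],
    "end": ["01-05-1595", "15-08-1621"],
})

def parse(value):
    # Plain Python datetimes are not limited to the 1677-2262 range.
    return datetime.strptime(value, "%d-%m-%Y")

# Parse and subtract row by row, keeping only the integer day count.
df["days"] = [(parse(e) - parse(s)).days for s, e in zip(df["start"], df["end"])]
print(df)
```

The parsed values are kept out of the dataframe on purpose, so pandas never has to fit them into a nanosecond timestamp column.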
I have a vaex dataframe that reads from a hdf5 file. It has a date column which is read as string. I converted it into datetime. However, I am not able to do any date comparisons. I can extract day,month,year, etc from the date so the conversion is correct. But how do I perform operations like date is between x and y?
import vaex
import datetime
vaex_df=vaex.open('filename.hdf5')
vaex_df['pDate']=vaex_df.Date.values.astype('datetime64[ns]')
The datatypes are as expected
print(vaex_df.dtypes)
## Date <class 'str'>
## pDate datetime64[ns]
Now I need to filter out rows based on some date
start_date=datetime.date(2019,10,1)
vaex_df=vaex_df[(vaex_df.pDate.dt>=start_date)]
print(vaex_df) # throws SyntaxError: invalid token
I get an invalid token when I try to look at the new dataframe.
I can extract the month and year separately and apply the filter. But that would give a wrong result
vaex_df=vaex_df[(vaex_df.pDate.dt.month>int(str(start_date)[5:7]))&(vaex_df.pDate.dt.year>=int(str(start_date)[:4]))]
How do I do date range comparison operations in vaex?
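Using NumPy's datetime64 instead of datetime.date for the comparison works:
import numpy as np
# Instead of
start_date=datetime.date(2019,10,1)
# use
start_date=np.datetime64('2019-10-01')
Then filter the vaex dataframe on the column directly, without .dt:
vaex_df=vaex_df[(vaex_df.pDate>=start_date)]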
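For a "between x and y" filter, the two bounds can be combined with &. The sketch below shows the idea on a plain NumPy array rather than a vaex dataframe, so it runs without the HDF5 file; the sample dates and end_date are made up for illustration.

```python
import numpy as np

# Made-up sample dates standing in for the pDate column.
dates = np.array(
    ["2019-09-30", "2019-10-01", "2019-11-15", "2020-01-01"],
    dtype="datetime64[ns]",
)

start_date = np.datetime64("2019-10-01")
end_date = np.datetime64("2019-12-31")  # hypothetical upper bound

# Element-wise comparisons give boolean arrays; & combines them.
mask = (dates >= start_date) & (dates <= end_date)
print(dates[mask])
```

On the vaex side the equivalent expression would presumably be vaex_df[(vaex_df.pDate>=start_date) & (vaex_df.pDate<=end_date)], with parentheses around each comparison.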
I have the following problem:
I'm trying to convert a column of a DataFrame from string to datetime with the following code:
df = pd.read_csv('data.csv')
df['Time (CET)'] = pd.to_datetime(df['Time (CET)'])
This should be the standard pandas way to do it. But the dtype of the column doesn't change: it stays object, and no error or exception is raised.
The entries look like 2018-12-31 17:47:14+01:00.
If I apply pd.to_datetime with utc=True, it works completely fine, dtype changes from object to datetime64[ns, UTC]. Unfortunately I don't want to convert the time to UTC, only converting the string to a datetime object without any time zone changes.
Thanks a lot!
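One possible explanation, stated here as an assumption, is that the column mixes +01:00 and +02:00 offsets across daylight saving changes, so no single fixed offset fits the whole column. A sketch of one way around that: parse with utc=True and then convert back to a named zone, which keeps the local wall-clock times. The zone name Europe/Berlin and the second sample value are assumptions.

```python
import pandas as pd

# First value is from the question; the summer value is made up.
s = pd.Series(["2018-12-31 17:47:14+01:00", "2018-07-01 12:00:00+02:00"])

# Parse everything to UTC first, then convert to a named time zone.
# 'Europe/Berlin' is an assumed stand-in for "CET" with daylight saving.
local = pd.to_datetime(s, utc=True).dt.tz_convert("Europe/Berlin")

print(local)
print(local.dtype)
```

The displayed times match the original strings, and the column has a proper timezone-aware datetime dtype instead of object.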
I have been stumped for the past few hours trying to solve the following.
In a large data set I have from an automated system, there is a DATE_TIME value which, for rows just after midnight, has an hour that is not zero-padded, like: 12-MAY-2017 0:16:20
When I try to convert this to a datetime (so that it's usable for conversions) as follows:
df['DATE_TIME'].astype('datetime64[ns]')
I get the following error:
Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
I tried writing some regex to pull out each piece but couldn't get anything working, given that the hour could be either one or two characters. It also doesn't seem like an ideal solution to write a regex for each piece.
Any ideas on this?
Try to use pandas.to_datetime() method:
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'], errors='coerce')
The parameter errors='coerce' turns any string that can't be converted to the datetime dtype into NaT instead of raising an error.
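A small sketch of what errors='coerce' does, with one made-up unparseable value. The explicit format string is an assumption about how the column is laid out, based on the single sample in the question.

```python
import pandas as pd

# First value is from the question; the second is a made-up bad entry.
s = pd.Series(["12-MAY-2017 0:16:20", "not a date"])

# %b matches the abbreviated month name and %H accepts a one-digit hour.
parsed = pd.to_datetime(s, format="%d-%b-%Y %H:%M:%S", errors="coerce")

print(parsed)
# Count how many values could not be parsed.
print(parsed.isna().sum())
```

Checking the NaT count afterwards is a cheap way to see whether coerce is silently hiding a formatting problem.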
I think you need pandas.to_datetime only:
df = pd.DataFrame({'DATE_TIME':['12-MAY-2017 0:16:20','12-MAY-2017 0:16:20']})
print (df)
DATE_TIME
0 12-MAY-2017 0:16:20
1 12-MAY-2017 0:16:20
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'])
print (df)
DATE_TIME
0 2017-05-12 00:16:20
1 2017-05-12 00:16:20
Converting with astype goes through NumPy, which seems to be the problem here because it needs strings in ISO 8601 date or datetime format:
df['DATE_TIME'].astype('datetime64[ns]')
ValueError: Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
EDIT:
If some of the datetimes are broken (arbitrary strings or ints), then use MaxU's answer.
I have a column in pandas df with string datetime like below
a,dtime
1,2017-07-06 09:15:00
1,2017-07-06 10:15:00
I am writing a script that needs to compare time
I need to compare like df[df.dtime < "10:15:00"] (without date)
So I converted df['dtime']=pd.to_datetime(df['dtime'])
If I do
df[df.dtime < "10:15:00"]
it takes today's date as the default and always compares with today's "10:15:00", which I don't want.
So I created another column and then did it like below
df['ts']=df.dtime.apply(lambda x: x.time())
df[df.ts<"09:16:00"]
TypeError: can't compare datetime.time to str
df[df.ts<pd.to_datetime("09:16:00").time()] #this works well
Is there a better way to do this simple task? I don't see any point in creating a new ts column.
When I do
df['dtime']=pd.to_datetime(df['dtime']) I would like to extract only the time part. But df['dtime']=pd.to_datetime(df['dtime']).time() gives the error AttributeError: 'Series' object has no attribute 'time'
You need to compare against a time or a timedelta instead of a datetime. You can get the time part via the .dt accessor:
t = pd.to_datetime('10:15:00').time()
df['dtime'].dt.time < t
0 True
1 False
Name: dtime, dtype: bool
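The timedelta variant mentioned above could look like the sketch below: subtracting midnight of the same day leaves the time of day as a Timedelta, which can be compared without creating an extra column. The data mirrors the sample in the question; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1],
    "dtime": ["2017-07-06 09:15:00", "2017-07-06 10:15:00"],
})
df["dtime"] = pd.to_datetime(df["dtime"])

# normalize() sets the time to midnight, so the difference is the time of day.
time_of_day = df["dtime"] - df["dtime"].dt.normalize()

# Keep rows strictly before 10:15:00, regardless of the date.
early = df[time_of_day < pd.Timedelta("10:15:00")]
print(early)
```

This keeps the comparison vectorized, whereas .dt.time produces a column of Python time objects.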