I have a pandas dataframe with a column of type int64 that represents a date, e.g. 20180501. I'd like to convert this column to datetime. I have the following code, but it returns an error message:
df['new_date'] = pd.to_datetime(df['old_date'].astype('str'), format = '%y%m%d')
I'm getting the following error message
ValueError: unconverted data remains: 0501
How can I fix my code?
You need a capital Y: %Y matches a four-digit year, whereas %y matches only a two-digit year. See Python's strftime directives for a complete reference.
df = pd.DataFrame({'old_date': [20180501, 20181230, 20181001]})
df['new_date'] = pd.to_datetime(df['old_date'].astype(str), format='%Y%m%d')
print(df)
old_date new_date
0 20180501 2018-05-01
1 20181230 2018-12-30
2 20181001 2018-10-01
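To see why the lowercase directive fails, here is a minimal sketch. The names `s`, `failed` and `parsed` are illustrative only and not from the original post. With %y only the first two digits are read as the year, so some characters are left over at the end of the string:

```python
import pandas as pd

s = pd.Series([20180501, 20181230]).astype(str)

# %y reads only two digits ("20") as the year, so trailing characters
# stay unparsed and pandas raises a ValueError.
try:
    pd.to_datetime(s, format="%y%m%d")
    failed = False
except ValueError:
    failed = True

# %Y reads the full four-digit year, so the whole string is consumed.
parsed = pd.to_datetime(s, format="%Y%m%d")
print(failed)
print(parsed.dt.year.tolist())
```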
The problem could also come from malformed entries somewhere in the dataframe.
You could try setting the parameter errors="coerce", which sets entries that cannot be parsed to NaT instead of raising an error.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
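As a rough illustration of that suggestion, the sketch below uses made-up sample values (`raw` is not the asker's data). It coerces unparseable entries to NaT and then lists the offending rows so they can be inspected:

```python
import pandas as pd

# Hypothetical sample: one valid entry and two malformed ones
raw = pd.Series(["20180501", "2018-13-45", "not a date"])

# Entries that do not match the format become NaT instead of raising
parsed = pd.to_datetime(raw, format="%Y%m%d", errors="coerce")

# Show the original values that could not be parsed
bad_rows = raw[parsed.isna()]
print(parsed)
print(bad_rows)
```

Checking `bad_rows` before dropping anything is usually worthwhile, since coercion silently hides every kind of parsing problem.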
Related
I have a df with a column:
column
Dec-01
The column's dtype is object.
I am trying to change it to a date dtype.
This is what I've tried:
df['column'] = pd.to_datetime(df['column']).dt.strftime('%d-%b')
This is the error I receive:
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-12-01 00:00:00
Would really appreciate your help 🙏
You can use:
df['date'] = pd.to_datetime(df['column'], format='%b-%d').dt.strftime('%m/%d/22')
NB. You must hardcode the year because it is undefined in your original data; when the format contains no year, pandas uses 1900 by default.
output:
column date
0 Dec-01 12/01/22
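If you want a real datetime column rather than formatted strings, one alternative is to append the year to each string before parsing. This is only a sketch: the year 2022 and the second sample row are assumptions, and %b expects English month abbreviations under the default locale.

```python
import pandas as pd

df = pd.DataFrame({"column": ["Dec-01", "Feb-14"]})

year = 2022  # assumed year; the source data does not contain one

# Append the year so the parsed result is a complete date
df["date"] = pd.to_datetime(df["column"] + f"-{year}", format="%b-%d-%Y")
print(df)
```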
I have a dataframe called pomi that looks like this
date time sub
2019-09-20 00:00:00 25.0 org
I want to convert the values in the column 'date' to datetime.date, so that I'm left with only the dates (i.e. '2019-09-20').
I have tried:
pomi['date'] = pd.to_datetime(pomi['date'])
pomi['just_date'] = pomi['date'].dt.date
pomi.date = pd.to_datetime(pomi.date,dayfirst=True)
pomi['date'] = pd.to_datetime(pomi["date"].astype(str)).dt.time
pomi['date'] = pd.to_datetime(pomi['date']).dt.date
pomi['date'] = pd.to_datetime(pomi['date']).dt.normalize()
None of them have worked.
Most often I get the error message "TypeError: <class 'datetime.time'> is not convertible to datetime"
All help appreciated. Thanks.
Full disclosure: I am not 100% sure what the issue is, as your code works fine on my end. One thing you can try is converting the values to Timestamp first and then checking the result. Both this and your code work on my end and give the required output.
import pandas as pd
df = pd.DataFrame({'date': ['2019-09-20 00:00:00'], 'time':[25], 'sub':['org']})
df['date'] = df['date'].apply(pd.Timestamp)
df['just_date'] = df['date'].dt.date
df
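One plausible cause of the TypeError, though it cannot be confirmed from the post alone, is that the line using .dt.time had already overwritten the column with datetime.time objects, so the later to_datetime calls no longer received date strings. Starting again from the raw data, a minimal sketch of the two usual options looks like this (the second row is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"date": ["2019-09-20 00:00:00", "2019-09-21 13:30:00"]})
df["date"] = pd.to_datetime(df["date"])

# Option 1: plain datetime.date objects (the column is no longer datetime64)
df["just_date"] = df["date"].dt.date

# Option 2: keep a datetime64 column but reset the time part to midnight
df["midnight"] = df["date"].dt.normalize()

print(df.dtypes)
```

Option 2 keeps the .dt accessor and date arithmetic available, which is often more convenient than a column of Python date objects.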
I am trying to convert a column in string format to DateTime format. However, I am getting the following error. Could somebody please help?
The error: time data '42:53.700' does not match format '%H:%M:%S.%f' (match)
Code:
Merge_df['Time'] = pd.to_datetime(Merge_df['Time'], format='%H:%M:%S.%f')
You'll need to clean the data to get a common format before you can parse to data type 'datetime'. For example you can remove the colons and fill with zeros, then parse with the appropriate directive:
import pandas as pd
df = pd.DataFrame({'time': ["1:45.333", "45:22.394", "4:55:23.444", "23:44:01.004"]})
df['time'] = pd.to_datetime(df['time'].str.replace(':', '').str.zfill(10), format="%H%M%S.%f")
df['time']
0 1900-01-01 00:01:45.333
1 1900-01-01 00:45:22.394
2 1900-01-01 04:55:23.444
3 1900-01-01 23:44:01.004
Name: time, dtype: datetime64[ns]
Since the data actually looks more like a duration to me, here's how to convert it to data type 'timedelta' instead, starting again from the original strings. You'll need to ensure HH:MM:SS.fff format, which is a bit more work:
# ensure common string length
df['time'] = df['time'].str.zfill(12)
# ensure HH:MM:SS.fff format
df['time'] = df['time'].str[:2] + ":" + df['time'].str[3:5] + ":" + df['time'].str[6:]
df['timedelta'] = pd.to_timedelta(df['time'])
df['timedelta']
0 0 days 00:01:45.333000
1 0 days 00:45:22.394000
2 0 days 04:55:23.444000
3 0 days 23:44:01.004000
Name: timedelta, dtype: timedelta64[ns]
The advantage of using timedelta is that you can now also handle hours greater than 23.
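To illustrate that last point, here is a small sketch with made-up values. A duration of 25 hours cannot be parsed with the %H directive, but to_timedelta accepts it:

```python
import pandas as pd

# %H only covers 00-23, so a 25-hour value is rejected by to_datetime
try:
    pd.to_datetime("25:10:00.500", format="%H:%M:%S.%f")
    datetime_failed = False
except ValueError:
    datetime_failed = True

# to_timedelta carries the extra hours over into days
durations = pd.to_timedelta(pd.Series(["25:10:00.500", "00:45:22.394"]))
total_seconds = durations.dt.total_seconds()

print(datetime_failed)
print(durations)
print(total_seconds)
```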
I have tried many things and cannot seem to get this to work. In essence, I want to do this because an error occurs when I'm trying to convert this ndarray to a DataFrame. The following error occurs when finding missing Datetime64 values within the Dataframe:
"Out of bounds nanosecond timestamp: 1-01-01 00:00:00"
Therefore I wish to convert these DateTime64 columns into Strings and Recode '1-01-01 00:00:00' within the ndarray, then convert them back to DateTime variables in a DataFrame in order to avoid facing the error shown above.
with sRW.SavReaderNp('C:/Users/Sam/Downloads/data.sav') as reader:
    record = reader.all()
prints:
[(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', 250000., '2019-08-05T00:00:00.000000',
(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', 250000., '2019-08-05T00:00:00.000000',
(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', 250000., '0001-01-01T00:00:00.000000',)]
First of all please check if your post is valid, i.e. contains runnable code.
Your example returns a syntax error and the code where you tried what you explained is simply not there.
However, I assume your data looks like
arr = [(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', 250000., '2019-08-05T00:00:00.000000'),
(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', 250000., '2019-08-05T00:00:00.000000'),
(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', 250000., '0001-01-01T00:00:00.000000')]
which, converted to a dataframe, looks like
df = pd.DataFrame(arr, columns=['ID', 'value', 'date'])
# ID ... date
# 0 b'61D8894E-7FB0-3DE6-E053-6C04A8C01207' ... 2019-08-05T00:00:00.000000
# 1 b'61D8894E-7FB0-3DE6-E053-6C04A8C01207' ... 2019-08-05T00:00:00.000000
# 2 b'61D8894E-7FB0-3DE6-E053-6C04A8C01207' ... 0001-01-01T00:00:00.000000
Then your attempt to convert the date strings into datetime objects was probably
df.date = pd.to_datetime(df.date)
# OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00
which results in the error message you posted in your question.
You can catch these parsing errors with the errors kwarg of pd.to_datetime:
df.date = pd.to_datetime(df.date, errors='coerce')
# ID value date
# 0 b'61D8894E-7FB0-3DE6-E053-6C04A8C01207' 250000.0 2019-08-05
# 1 b'61D8894E-7FB0-3DE6-E053-6C04A8C01207' 250000.0 2019-08-05
# 2 b'61D8894E-7FB0-3DE6-E053-6C04A8C01207' 250000.0 NaT
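If you would rather recode the placeholder explicitly, as the question describes, a possible sketch is to blank out that one known value before parsing. The name `SENTINEL` is hypothetical, and the assumption is that '0001-01-01T00:00:00.000000' is the only placeholder for a missing date in the data.

```python
import pandas as pd

# Assumed placeholder that the source file uses for missing dates
SENTINEL = "0001-01-01T00:00:00.000000"

dates = pd.Series(["2019-08-05T00:00:00.000000", SENTINEL])

# Keep real dates, replace the sentinel with a missing value, then parse
cleaned = dates.where(dates != SENTINEL)
parsed = pd.to_datetime(cleaned)

print(parsed)
# The nanosecond-resolution Timestamp range starts here
print(pd.Timestamp.min)
```

Unlike errors='coerce', this only turns the known placeholder into NaT, so any other malformed value would still raise and get noticed.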
I am trying to find the day difference between today, and dates in my dataframe.
Below is my conversion of dates in my dataframe
df['Date']=pd.to_datetime(df['Date'])
Below is my code to get today
today1=dt.datetime.today().strftime('%Y-%m-%d')
today1=pd.to_datetime(today1)
Both are converted with pd.to_datetime, but when I subtract them, the error below appears.
ValueError: Cannot add integral value to Timestamp without offset.
Can someone help to advise? Thanks!
Here is a simple example of how you can do this:
import pandas as pd
import datetime as dt
First, you have to get today's date.
today1=dt.datetime.today().strftime('%Y-%m-%d')
today1=pd.to_datetime(today1)
Then, you can construct the data frame, parsing the date string so that both columns hold datetimes:
df = pd.DataFrame({'Date': pd.to_datetime('2016-11-24 11:03:10.050000'), 'today1': today1 }, index = [0])
In this example I just have 2 columns, each with one value.
Next, you should check the data types:
print(df.dtypes)
Date datetime64[ns]
today1 datetime64[ns]
If both data types are datetime64[ns], you can then subtract df.Date from df.today1.
print(df.today1 - df.Date)
The output:
0 19 days 12:56:49.950000
dtype: timedelta64[ns]
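If you only need whole days, a sketch along these lines may help. The reference date is fixed here so the result is reproducible; in practice you would probably use pd.Timestamp.today().normalize() instead. The sample rows and the column name `days_ago` are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"Date": pd.to_datetime(["2016-11-24 11:03:10", "2016-12-01 00:00:00"])}
)

# Fixed reference date for reproducibility (assumption for this example)
today = pd.Timestamp("2016-12-14")

# Drop the time of day, subtract, and extract whole days from the timedelta
df["days_ago"] = (today - df["Date"].dt.normalize()).dt.days
print(df)
```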