np.where: can't compare datetime.date to unicode - python

I am trying to use np.where on a dataframe to separate the dates from 2017 onwards from those before 2017.
I need to compare a column "Creation_Date" (date format "%d/%m/%Y") with the value '01/01/2017'.
I keep getting the same error: can't compare datetime.date to unicode.
I converted the "Creation_Date" column to a date format using strftime, then converted the value '01/01/2017' to a date so I could compare it with the values in the "Creation_Date" column.
Here is the actual code :
my_df['temp_date'] = pd.to_datetime(my_df['Creation_Date'], dayfirst=True).dt.strftime('%Y-%m-%d')
t1 = my_df['temp_date'] >= dt.date(2017, 1, 1)
my_df['Final_Date'] = np.where(t1,'2017 or more','Below 2017')
Also tried :
my_df['temp_date'] = pd.to_datetime(my_df['Creation_Date'], dayfirst=True).dt.strftime('%Y-%m-%d')
t1 = my_df['temp_date'] >= dt.datetime.strptime('01/01/2017','%d/%m/%Y')
my_df['Final_Date'] = np.where(t1,'2017 or more','Below 2017')
I still can't get the two sides of the comparison into matching formats: can't compare datetime.date to unicode.
I need a Final_Date column that distinguishes the Creation_Date values from 2017 onwards from those before 2017.
Can you help me, please ?
Best regards.

From your code,
my_df['temp_date'] = pd.to_datetime(my_df['Creation_Date'], dayfirst=True).dt.strftime('%Y-%m-%d')
the values in my_df['temp_date'] are strings, so you can't compare them to dt.date(2017, 1, 1) or dt.datetime.strptime('01/01/2017', '%d/%m/%Y'), which are both datetime types.
On the other hand, pandas allows comparison between a datetime64 column and a date string, so you can drop the dt.strftime call and compare directly:
my_df['temp_date'] = pd.to_datetime(my_df['Creation_Date'], dayfirst=True)
my_df['temp_date'] >= '2017-01-01'
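A minimal end-to-end sketch of that fix, using sample dd/mm/yyyy strings in place of the question's real data:

```python
import numpy as np
import pandas as pd

# sample frame with dd/mm/yyyy strings, as in the question
my_df = pd.DataFrame({'Creation_Date': ['15/03/2016', '01/01/2017', '20/07/2018']})

# parse once into real datetimes; no strftime back to strings
my_df['temp_date'] = pd.to_datetime(my_df['Creation_Date'], dayfirst=True)

# pandas coerces the string '2017-01-01' to a Timestamp for the comparison
t1 = my_df['temp_date'] >= '2017-01-01'
my_df['Final_Date'] = np.where(t1, '2017 or more', 'Below 2017')
```

The key point is that the comparison happens between a datetime64 column and a string, which pandas handles, rather than between strings and datetime objects.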

Related

Get the average number of days between two dates in Python

I have a data frame with only dates in one column.
My objective is to calculate the number of days between each date, in order to get the average number of days between two dates (each date corresponds to an operation).
I tried doing something like this :
L = []
for i in range(len(df)):
    if i != 0:
        d0 = df.iloc[i-1, 1]
        d1 = df.iloc[i, 1]
        L.append((d1 - d0).days)
But I got the error message: unsupported operand type(s) for -: 'str' and 'str'
You can subtract a date from another if they're in proper format. Maybe a timestamp in seconds, maybe a proper datetime object. As you might've noticed, you can't subtract strings.
If the date is in the ISO format, it's the easiest, just do this before the subtraction:
from datetime import datetime
d0 = datetime.fromisoformat(d0)
d1 = datetime.fromisoformat(d1)
The result will be in a datetime.timedelta format with a .total_seconds() method and even a .days attribute, so you get days like this:
difference_in_days = (d1 - d0).days
If it's something else than an ISO string, check out this question:
Converting string into datetime
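If the dates already sit in a DataFrame column, the loop isn't needed at all; a minimal sketch, assuming the column is named date:

```python
import pandas as pd

df = pd.DataFrame({'date': ['2023-01-01', '2023-01-05', '2023-01-11']})
df['date'] = pd.to_datetime(df['date'])  # strings -> datetime64

# diff() yields the Timedelta between consecutive rows; .dt.days extracts days
gaps = df['date'].diff().dt.days.dropna()
average_gap = gaps.mean()  # (4 + 6) / 2
```

If the dates are not guaranteed to be in order, sort the column first with df['date'].sort_values() before taking diff().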

How do I read an Excel column as datetime.datetime instead of the auto datetime.time

I am trying to select rows of a pandas DataFrame within a date range. The dataframe is loaded from an Excel file, and the date is automatically read as datetime.time, which causes problems when comparing it to datetime.datetime.
I tried converting the datetime.time values to datetime.datetime using pd.to_datetime, but it didn't work, maybe because they are inside a DataFrame. I also tried forcing the column to datetime.datetime while the file is being read. None of these worked. The column is named Sub_End and holds a 5-digit number with a date format; for example, 42636 is 9/23/2016.
Here are some of the upload attempts I have made:
Subadvisory_Advisor_Fires=pd.read_excel('SOLO_Advisor_Data.xlsx',sheetname='Advisor_Fires', dtype={'Sub_End': date})
This read the file with no issue but the column was still datetime.time
Subadvisory_Advisor_Fires=pd.read_excel('SOLO_Advisor_Data.xlsx',sheetname='Advisor_Fires', converters= {'Sub_End': pd.to_datetime})
I got an error on this one:
TypeError: is not convertible to datetime
Subadvisory_Advisor_Fires=pd.read_excel('SOLO_Advisor_Data.xlsx',sheetname='Advisor_Fires', dtype={'Sub_End': datetime.datetime})
This read the file with no issue but the column was still datetime.time
The code that is having the error is:
Advisor_Fires=Subadvisory_Advisor_Fires
Start_Datetime = datetime.datetime(2016, 12, 31)
End_Datetime = datetime.datetime(2018, 12, 31)
Advisor_Fires = Advisor_Fires[(Advisor_Fires['Sub_End']).between(Start_Datetime, End_Datetime)]
The error I get is:
TypeError: can't compare datetime.time to datetime.datetime
I am just trying to limit the rows to include one where they are between these two dates. Nothing I have tried has allowed the date in the Excel file to be read properly as dates.
I'm sure there is an easier way to do this, but I got this to work:
Subadvisory_Advisor_Fires = pd.read_excel('SOLO_Advisor_Data.xlsx', sheetname='Advisor_Fires', converters={'Sub_End': str})
# slice the year, month and day out of the string form of the column
Year = Subadvisory_Advisor_Fires['Sub_End'].str.slice(0, 4)
Month = Subadvisory_Advisor_Fires['Sub_End'].str.slice(5, 7)
Day = Subadvisory_Advisor_Fires['Sub_End'].str.slice(8, 10)
Year = pd.to_numeric(Year, errors='coerce')
Month = pd.to_numeric(Month, errors='coerce')
Day = pd.to_numeric(Day, errors='coerce')
# recombine as YYYYMMDD and parse into real datetimes
Dates = pd.to_datetime((Year*10000 + Month*100 + Day).apply(str), format='%Y%m%d')
Subadvisory_Advisor_Fires['Sub_End_Converted'] = Dates
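Since Sub_End is an Excel serial day count (42636 corresponds to 9/23/2016), a shorter alternative is to let pd.to_datetime do the conversion directly. This is a sketch under the assumption that the column is read in as plain integers (e.g. with converters={'Sub_End': int}):

```python
import pandas as pd

# Excel stores dates as serial day counts. origin='1899-12-30' is the
# conventional offset that also absorbs Excel's 1900 leap-year quirk.
df = pd.DataFrame({'Sub_End': [42636, 42735]})
df['Sub_End_Converted'] = pd.to_datetime(df['Sub_End'], unit='D',
                                         origin='1899-12-30')
```

After this, comparisons like .between(Start_Datetime, End_Datetime) work, because the column is datetime64 rather than datetime.time.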

Comparison between datetime and datetime64[ns] in pandas

I'm writing a program that checks an Excel file; if today's date is in the file's date column, I parse it.
I'm using:
cur_date = datetime.today()
for today's date. I'm checking if today is in the column with:
bool_val = cur_date in df['date'] #evaluates to false
I do know for a fact that today's date is in the file in question. The dtype of the series is datetime64[ns]
Also, I am only checking the date itself, not the timestamp that follows it, if that matters. I'm doing this to make the timestamp 00:00:00:
cur_date = datetime.strptime(cur_date.strftime('%Y_%m_%d'), '%Y_%m_%d')
And after printing, the type of that object is datetime as well.
For anyone who also stumbled across this when comparing a dataframe date to a variable date, and this did not exactly answer your question; you can use the code below.
Instead of:
self.df["date"] = pd.to_datetime(self.df["date"])
You can import datetime and then add .dt.date to the end like:
self.df["date"] = pd.to_datetime(self.df["date"]).dt.date
You can use
pd.Timestamp('today')
or
pd.to_datetime('today')
But both of those give the date and time for 'now'.
Try this instead:
pd.Timestamp('today').floor('D')
or
pd.to_datetime('today').floor('D')
You could also have passed the datetime object to pandas.to_datetime, but I like the other options more.
pd.to_datetime(datetime.datetime.today()).floor('D')
Pandas also has a Timedelta object
pd.Timestamp('now').floor('D') + pd.Timedelta(-3, unit='D')
Or you can use the offsets module
pd.Timestamp('now').floor('D') + pd.offsets.Day(-3)
To check for membership, try one of these
cur_date in df['date'].tolist()
Or
df['date'].eq(cur_date).any()
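A small sketch of those membership checks, with the column normalized to midnight first. (A bare `cur_date in df['date']` tests membership in the Series *index*, not the values, which is why the original check evaluated to False.)

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2024-05-01 09:30:00',
                                           '2024-05-02 14:00:00'])})
cur_date = pd.Timestamp('2024-05-02')

# strip the time-of-day so a date-only comparison works
dates_only = df['date'].dt.normalize()

in_list = cur_date in dates_only.tolist()   # membership among the values
any_eq = dates_only.eq(cur_date).any()      # element-wise equality check
```

Both checks compare Timestamp against Timestamp, sidestepping the datetime/datetime64 mismatch.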
When converting a datetime64 value with pd.Timestamp(), it is important to note that you should compare it to another Timestamp, not a datetime.date.
Convert a date string to numpy.datetime64:
date = '2022-11-20 00:00:00'
date64 = np.datetime64(date)
Seven days ago, as a Timestamp (timedelta comes from the datetime module):
sevenDaysAgoTs = pd.to_datetime('today') - timedelta(days=7)
Convert date64 to a Timestamp and check whether it falls within the last seven days:
print(pd.Timestamp(pd.to_datetime(date64)) >= sevenDaysAgoTs)

ValueError when converting String to datetime

I have a dataframe as follows, and I am trying to reduce it to only the rows whose Date is greater than a variable curve_enddate. df['Date'] is already in datetime, so I'm trying to convert curve_enddate[i][0], which gives a string of the form 2015-06-24, to datetime as well, but I get the error ValueError: time data '2015-06-24' does not match format '%Y-%b-%d'.
   Date        Maturity  Yield_pct     Currency
0  2015-06-24  0.25      na            CAD
1  2015-06-25  0.25      0.0948511020  CAD
The line where I get the Error:
df = df[df['Date'] > time.strptime(curve_enddate[i][0], '%Y-%b-%d')]
Thank You
You are using the wrong date format: %b is for abbreviated month names (like Jan or Feb); use %m for the month as a zero-padded number.
Code -
df = df[df['Date'] > time.strptime(curve_enddate[i][0], '%Y-%m-%d')]
You cannot compare a time.struct_time tuple, which is what time.strptime returns, to a Timestamp, so you also need to change that call, in addition to using '%Y-%m-%d' with %m, the month as a decimal number. You can use pd.to_datetime to create an object you can compare, passing the format as a keyword argument:
df = df[df['Date'] > pd.to_datetime(curve_enddate[i][0], format='%Y-%m-%d')]
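A quick sketch of that comparison, reusing the question's sample rows; note that format must be passed as a keyword, since pd.to_datetime's second positional argument is errors, not format:

```python
import pandas as pd

# the two rows from the question's dataframe
df = pd.DataFrame({'Date': pd.to_datetime(['2015-06-24', '2015-06-25']),
                   'Yield_pct': ['na', '0.0948511020']})

curve_end = pd.to_datetime('2015-06-24', format='%Y-%m-%d')

# Timestamp vs datetime64 comparison works; struct_time would not
later = df[df['Date'] > curve_end]
```

Here `later` keeps only the 2015-06-25 row.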

Selecting Data from Last Week in Python

I have a large database and I am looking to read only the last week for my python code.
My first problem is that the column with the received date and time (column 15) is not in a format pandas recognizes as datetime. My input looks like this:
recvd_dttm
1/1/2015 5:18:32 AM
1/1/2015 6:48:23 AM
1/1/2015 13:49:12 PM
From the Time Series / Date functionality in the pandas library I am looking at basing my code off of the "Week()" function shown in the example below:
In [87]: d
Out[87]: datetime.datetime(2008, 8, 18, 9, 0)
In [88]: d - Week()
Out[88]: Timestamp('2008-08-11 09:00:00')
I have tried ordering the date this way:
df =pd.read_csv('MYDATA.csv')
orderdate = datetime.datetime.strptime(df['recvd_dttm'], '%m/%d/%Y').strftime('%Y %m %d')
however I am getting this error
TypeError: must be string, not Series
Does anyone know a simpler way to do this, or how to fix this error?
Edit: The dates are not necessarily in order. AND sometimes there is a faulty error in the database like a date that is 9/03/2015 (in the future) someone mistyped. I need to be able to ignore those.
import datetime as dt
# convert strings to datetimes
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'])
# get first and last datetime for final week of data
range_max = df['recvd_dttm'].max()
range_min = range_max - dt.timedelta(days=7)
# take slice with final week of data
sliced_df = df[(df['recvd_dttm'] >= range_min) &
(df['recvd_dttm'] <= range_max)]
You may convert the dates by iterating with a list comprehension, splitting off the time portion first so each value matches the '%m/%d/%Y' format:
orderdate = [datetime.datetime.strptime(ttm.split()[0], '%m/%d/%Y').strftime('%Y %m %d') for ttm in df['recvd_dttm']]
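Putting the pieces together, with errors='coerce' to skip unparseable entries and a filter for the mistyped future dates mentioned in the edit; a sketch where "now" is pinned to a fixed date only for reproducibility:

```python
import pandas as pd

df = pd.DataFrame({'recvd_dttm': ['1/1/2015 5:18:32 AM',
                                  '1/6/2015 6:48:23 AM',
                                  '9/03/2015 1:00:00 PM',   # mistyped future date
                                  'garbage']})

# errors='coerce' turns unparseable entries into NaT instead of raising
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'], errors='coerce')

# drop NaT rows and anything after "now" (faulty future dates)
now = pd.Timestamp('2015-01-07')
valid = df[df['recvd_dttm'].notna() & (df['recvd_dttm'] <= now)]

# keep only the final week of data
last_week = valid[valid['recvd_dttm'] >= now - pd.Timedelta(days=7)]
```

In real use, `now` would be pd.Timestamp('now') or the column maximum, as in the range_max answer above.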
