I am trying to format the due date column of my dataframe from strings to dates from the datetime class. It seems to work within my for-loop, however, when I leave the loop, the values in my dataframe do not change.
df.replace() is obsolete, and .iat and .at do not work either. Code below. Thanks!
for x in df['due date']:
    if type(x) == str:
        date = x.split('/')
        month = int(date[0])
        day = int(date[1])
        year = int(date[2].split(' ')[0])
        formattedDate = dt.date(year, month, day)
        df.at[x, "due date"] = formattedDate
Unless I'm missing something here, you can just pass the column to the built in 'to_datetime' function.
df['due date'] = pd.to_datetime(df['due date'],format="%m/%d/%Y")
That is assuming your date format is something like: 02/24/2021
If you need to change the date format, see the following:
strftime and strptime behavior
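A minimal sketch of the same idea on made-up data, assuming (as the question's split(' ') suggests) that some values carry a time after the date. Note that in the original loop, x is the cell value rather than the row label, so df.at[x, ...] does not address the row being read. The sample values and the errors="coerce" choice are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: bare dates, a date with a trailing time, and junk.
df = pd.DataFrame({"due date": ["02/24/2021", "03/05/2021 14:30", "not a date"]})

# Keep only the text before the first space, then parse the whole column at once.
date_part = df["due date"].str.split(" ").str[0]

# errors="coerce" turns anything that does not match the format into NaT.
df["due date"] = pd.to_datetime(date_part, format="%m/%d/%Y", errors="coerce")

print(df)
```

Assigning the result back to df['due date'] is what makes the change stick; no loop is needed.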
I am working with a Dataframe containing date in string format. Dates look like this: 19620201 so with year first, then month, then day.
I want to convert those dates into Datetime. I tried to use this:
pd.to_datetime(df.Date)
But it doesn't work because some dates have the day set to "00"; sometimes it's the month, and sometimes even the year.
I don't want to drop those dates because I still want the year or month.
So I tried to write a function like this one:
def handle_the_00_case(date):
    try:
        if date.endswith("0000"):
            return pd.to_datetime(date[:-4], format="%Y")
        elif date.endswith("00"):
            return pd.to_datetime(date[:-2], format="%Y%m")
        return pd.to_datetime(date, format="%Y%m%d")
    except ValueError:
        return
And use the following statement:
df.Date.apply(handle_the_00_case)
But this takes far too long to compute.
Do you have an idea of how I can improve the speed of this?
I tried np.vectorize() and the swifter library, but neither works. I know I should change the way I wrote the function, but I don't know how.
Thank you if you can help me! :)
You should first convert the column to valid dates, and then convert to datetime only once:
date = df['Date'].str.replace('0000$', '0101', regex=True)
date = date.str.replace('00$', '01', regex=True)
date = pd.to_datetime(date, format="%Y%m%d")
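A small runnable check of the replace-then-parse idea, on made-up values in the question's YYYYMMDD layout. It passes regex=True explicitly, because recent pandas versions treat the pattern as a literal string by default and "$" would then not anchor at the end. The Date_out column name and errors="coerce" are assumptions for the sketch.

```python
import pandas as pd

# Hypothetical sample: a full date, a missing day, a missing month and day,
# and an October date with a missing day (ends in "1000", not "0000").
df = pd.DataFrame({"Date": ["19620201", "19620200", "19620000", "19621000"]})

# Replace a missing month+day first, then a missing day.
date = df["Date"].str.replace("0000$", "0101", regex=True)
date = date.str.replace("00$", "01", regex=True)

# A single vectorized parse; anything still invalid becomes NaT.
df["Date_out"] = pd.to_datetime(date, format="%Y%m%d", errors="coerce")

print(df)
```

The order of the two replacements matters: running the "00$" rule first would turn "19620000" into "19620001", which still has an invalid month.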
A first idea is a vectorized solution: pass the column to to_datetime for each format and build the output column with numpy.where:
d1 = pd.to_datetime(df['Date'].str[:-4], format="%Y", errors='coerce')
d2 = pd.to_datetime(df['Date'].str[:-2], format="%Y%m", errors='coerce')
d3 = pd.to_datetime(df['Date'], format="%Y%m%d", errors='coerce')
m1 = df['Date'].str.endswith("0000")
m2 = df['Date'].str.endswith("00")
df['Date_out'] = np.where(m1, d1, np.where(m2, d2, d3))
I have a simple DataFrame in Pandas in which one of the columns contains dates in the format day-month-year.
I need to add another column that contains the weekday of each date. I wrote this function, which works with a simple argument like '12-3-1999':
def convert_date_to_weekday(date_string):
    # convert string to date object
    date_object = datetime.strptime(date_string, '%d-%m-%Y').date()
    # convert date object to weekday string
    print(date_object.strftime("%A"))
Unfortunately this doesn't work:
df['Weekday'] = convert_date_to_weekday(df['Date'])
How to make it work?
You can use apply, which is also a recommended way to do this with pandas:
def convert_date_to_weekday(date_string):
    # convert string to date object
    date_object = datetime.strptime(date_string, '%d-%m-%Y').date()
    # convert date object to weekday string
    # print(date_object.strftime("%A"))
    return date_object.strftime("%A")
df['Weekday'] = df['Date'].apply(lambda x: convert_date_to_weekday(x))
try
df['Weekday'] = df['Date'].apply(lambda x: convert_date_to_weekday(x))
and use return in your function instead of print
Try:
def convert_date_to_weekday(date_string):
    # convert string to date object
    date_object = datetime.strptime(date_string, '%d-%m-%Y').date()
    # convert date object to weekday string
    return date_object.strftime("%A")
Along with your apply line.
You probably want to use apply() method of the pandas dataframe:
df['Weekday'] = df['Date'].apply(convert_date_to_weekday)
But your convert_date_to_weekday should return the result instead of printing.
You can use the apply() function.
df['Weekday'] = df['Date'].apply(lambda d: convert_date_to_weekday(d))
Also, change your print(date_object.strftime("%A")) to return date_object.strftime("%A")
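As an alternative to the apply-based answers above (not something any of them proposed), the column can be parsed once with pd.to_datetime and the weekday names read through the .dt accessor. This is a sketch on made-up data in the question's day-month-year format.

```python
import pandas as pd

# Hypothetical sample data in day-month-year format.
df = pd.DataFrame({"Date": ["12-3-1999", "25-12-2013"]})

# Parse the whole column in one call instead of row by row.
parsed = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# day_name() returns the weekday name for each parsed date.
df["Weekday"] = parsed.dt.day_name()

print(df)
```

This avoids calling a Python function per row, which tends to matter on large frames.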
I'm writing a program that checks an Excel file and, if today's date is in the file's date column, parses it.
I'm using:
cur_date = datetime.today()
for today's date. I'm checking if today is in the column with:
bool_val = cur_date in df['date'] #evaluates to false
I do know for a fact that today's date is in the file in question. The dtype of the series is datetime64[ns]
Also, I am only checking the date itself and not the timestamp afterwards, if that matters. I'm doing this to make the timestamp 00:00:00:
cur_date = datetime.strptime(cur_date.strftime('%Y_%m_%d'), '%Y_%m_%d')
And the type of that object after printing is datetime as well
For anyone who stumbled across this while comparing a dataframe date to a date variable and did not find their exact answer here, you can use the code below.
Instead of:
self.df["date"] = pd.to_datetime(self.df["date"])
You can import datetime and then add .dt.date to the end like:
self.df["date"] = pd.to_datetime(self.df["date"]).dt.date
You can use
pd.Timestamp('today')
or
pd.to_datetime('today')
But both of those give the date and time for 'now'.
Try this instead:
pd.Timestamp('today').floor('D')
or
pd.to_datetime('today').floor('D')
You could also have passed the datetime object to pandas.to_datetime, but I like the other option more.
pd.to_datetime(datetime.datetime.today()).floor('D')
Pandas also has a Timedelta object
pd.Timestamp('now').floor('D') + pd.Timedelta(-3, unit='D')
Or you can use the offsets module
pd.Timestamp('now').floor('D') + pd.offsets.Day(-3)
To check for membership, try one of these
cur_date in df['date'].tolist()
Or
df['date'].eq(cur_date).any()
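The reason the original check evaluates to False is that the in operator on a Series tests the index labels, not the values. A minimal sketch with a fixed, made-up date (so the result does not depend on when it is run):

```python
import pandas as pd

# Hypothetical data: a datetime64 column with a default integer index.
df = pd.DataFrame({"date": pd.to_datetime(["2022-11-19", "2022-11-20"])})

# Fixed stand-in for "today at midnight", chosen so the example is reproducible.
cur_date = pd.Timestamp("2022-11-20")

# `in` on a Series looks at the index labels (0 and 1 here), not the values.
in_index = cur_date in df["date"]

# These two look at the values instead.
in_values = df["date"].eq(cur_date).any()
in_isin = df["date"].isin([cur_date]).any()

print(in_index, in_values, in_isin)
```

With real data, cur_date would be something like pd.Timestamp('today').floor('D'), as shown above.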
When converting a datetime64 value with pd.Timestamp(), note that you should compare it to another Timestamp, not to a datetime.date.
Convert a date to numpy.datetime64
date = '2022-11-20 00:00:00'
date64 = np.datetime64(date)
Seven days ago - timestamp type
sevenDaysAgoTs = (pd.to_datetime('today')-timedelta(days=7))
convert date64 to Timestamp and see if it was in the last 7 days
print(pd.Timestamp(pd.to_datetime(date64)) >= sevenDaysAgoTs)
I have a dataframe full of dates and I would like to select all dates where the month==12 and the day==25 and replace the zero in the xmas column with a 1.
Any way to do this? The second line of my code errors out.
df = DataFrame({'date':[datetime(2013,1,1).date() + timedelta(days=i) for i in range(0,365*2)], 'xmas':np.zeros(365*2)})
df[df['date'].month==12 and df['date'].day==25] = 1
Pandas Series with datetime now behaves differently. See .dt accessor.
This is how it should be done now:
df.loc[(df['date'].dt.day==25) & (df['date'].dt.month==12), 'xmas'] = 1
Basically what you tried won't work as you need to use the & to compare arrays, additionally you need to use parentheses due to operator precedence. On top of this you should use loc to perform the indexing:
df.loc[(df['date'].dt.month==12) & (df['date'].dt.day==25), 'xmas'] = 1
An update was needed in reply to this question. As of today, there's a slight difference in how you extract months from datetime objects in a pd.Series.
So from the very start, in case you have a raw date column, first convert it to datetime objects by using a simple function:
import datetime as dt
def read_as_datetime(str_date):
    # replace %Y-%m-%d with your own date format
    return dt.datetime.strptime(str_date,'%Y-%m-%d')
then apply this function to your dates column and save results in a new column namely datetime:
df['datetime'] = df.dates.apply(read_as_datetime)
finally, to select dates by day and month, use the same piece of code that @Shayan RC explained; note the .dt accessor after the datetime column:
df.loc[(df['datetime'].dt.month==12) & (df['datetime'].dt.day==25), 'xmas'] = 1
        Date Description
0  6/09/2012      Amazon
1  6/09/2012      iTunes
2  6/08/2012      iTunes
3  6/08/2012    Building
4  6/08/2012   Slicehost
I have a DataFrame like the above. I can pick out the day part of the above datestring using a function get_day() like this:
def get_day(date_string):
    d = datetime.strptime(date_string, '%m/%d/%Y')
    return d.day
Now how do I pass this function to the above DataFrame to get groupby going on the day rather than the datestring itself. Couldn't figure it out from looking at the docs. Any help appreciated.
df.groupby(df['Date'].apply(get_day))
but I would convert the date strings to datetime objects first anyway.
Another problem is that you're calling .day which returns a day of month (number 1-31). You probably want to call .date():
def get_day(date_string):
    return datetime.strptime(date_string, '%m/%d/%Y').date()
or directly
df.groupby(df['Date'].apply(lambda x: datetime.strptime(x, '%m/%d/%Y').date()))
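Following the suggestion to convert the strings to datetime objects first, here is a minimal sketch on the question's sample rows. It groups by day of month via .dt.day; swapping in .dt.date would group by full calendar date instead. The counts variable and the choice of count() as the aggregation are assumptions for illustration.

```python
import pandas as pd

# The sample rows from the question.
df = pd.DataFrame({
    "Date": ["6/09/2012", "6/09/2012", "6/08/2012", "6/08/2012", "6/08/2012"],
    "Description": ["Amazon", "iTunes", "iTunes", "Building", "Slicehost"],
})

# Parse once, then group on a derived key rather than on the raw strings.
parsed = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Passing a Series as the key groups rows by that Series' values.
counts = df.groupby(parsed.dt.day)["Description"].count()

print(counts)
```

Passing a bare function to groupby applies it to the index labels, which is why the key is built from the Date column here.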