I would like to extract a week number from a date column in a pandas DataFrame, but counting from a SPECIFIC date.
For example, counting from 20th April:
20/04/2010 --> 1
27/04/2010 --> 2
04/05/2010 --> 3
and so on..
Any idea?
Thank you in advance!
Just calculate the difference in days between the two dates, integer-divide by 7 and add 1:
from datetime import date
origin = date(2010, 4, 20)
def week_number_from(my_date, origin):
    # floor division keeps the result an integer week number
    return (my_date - origin).days // 7 + 1
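The same arithmetic can be applied to a whole DataFrame column at once. This is a minimal sketch, assuming the dates are day-first strings as in the question; the column names `date` and `week_number` and the fourth sample date are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: the three dates from the question plus one extra.
df = pd.DataFrame({"date": ["20/04/2010", "27/04/2010", "04/05/2010", "10/05/2010"]})
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

origin = pd.Timestamp("2010-04-20")

# Whole weeks elapsed since the origin, plus 1 so the origin week is week 1.
df["week_number"] = (df["date"] - origin).dt.days // 7 + 1
print(df["week_number"].tolist())  # [1, 2, 3, 3]
```

Dates before the origin would give zero or negative week numbers with this formula, so filter them first if that matters.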
Use pandas to_datetime to parse your date column if it is not already in datetime format.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html
Then use the weekday attribute of the .dt accessor.
https://docs.python.org/2/library/datetime.html
my_dataframe['week_nums'] = pandas.to_datetime(my_dataframe['date_col']).dt.weekday
Sorry, I now see that you want the week number counted from a specific date; I will update the answer. It is easy to calculate the difference between two dates.
Related
I have a column in the format "YYYY-WW" which contains a future week. How can I calculate the number of weeks from the present week until this future week in Python?
You can create a new column that converts the "YYYY-WW" format to the Monday of that week, like so:
df['MondayDate'] = pd.to_datetime(df['YYYY-WW'] + '-1', format='%Y-%W-%w')
Then you can use this new column to calculate the difference and convert it to weeks:
df['WeeksRemaining'] = (df['MondayDate'] - pd.Timestamp.today()) / pd.Timedelta(weeks=1)
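To make the two steps concrete, here is a small self-contained sketch. The sample weeks and the fixed reference date are assumptions chosen so that the result is reproducible; in real use the reference would be `pd.Timestamp.today()`.

```python
import pandas as pd

# Hypothetical data; the column name "YYYY-WW" follows the answer above.
df = pd.DataFrame({"YYYY-WW": ["2021-10", "2021-12"]})

# Appending "-1" selects Monday (%w == 1) of each %W week.
df["MondayDate"] = pd.to_datetime(df["YYYY-WW"] + "-1", format="%Y-%W-%w")

# Fixed reference date (assumed here instead of today's date); it is a Monday.
today = pd.Timestamp("2021-02-22")

# Whole weeks between the reference date and each future Monday.
df["WeeksRemaining"] = (df["MondayDate"] - today).dt.days // 7
print(df)
```

Note that `%W` counts weeks starting on Monday, with the days before the first Monday of the year falling in week 0. This differs from ISO week numbering, so check which convention your data uses.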
Currently my script subtracts the times in a DataFrame column called "Creation" from the current time, generating a new column with the difference in days. I get the difference with this code:
df['Creation'] = pandas.to_datetime(df["Creation"], dayfirst=True)
# Generates a new column with the difference in days.
df['Difference'] = pandas.Timestamp.now() - df['Creation']
What I want now is the same difference in days, but without counting Saturdays and Sundays. How can I do that?
You can make use of numpy's busday_count, for example:
import pandas as pd
import numpy as np
# some dummy data
df = pd.DataFrame({'Creation': ['2021-03-29', '2021-03-30']})
# make sure we have datetime
df['Creation'] = pd.to_datetime(df['Creation'])
# set now to a fixed date
now = pd.Timestamp('2021-04-05')
# difference in business days, excluding weekends
# need to cast to datetime64[D] dtype so that np.busday_count works
df['busday_diff'] = np.busday_count(df['Creation'].values.astype('datetime64[D]'),
np.repeat(now, df['Creation'].size).astype('datetime64[D]'))
df['busday_diff'] # since I didn't define holidays, a potential Easter holiday is still counted as a business day:
0 5
1 4
Name: busday_diff, dtype: int64
If you need the output to be of dtype timedelta, you can easily cast to that via
df['busday_diff'] = pd.to_timedelta(df['busday_diff'], unit='d')
df['busday_diff']
0 5 days
1 4 days
Name: busday_diff, dtype: timedelta64[ns]
Note: np.busday_count also allows you to set a custom weekmask (exclude days other than Saturday and Sunday) or a list of holidays. See the docs I linked on top.
Related: Calculate difference between two dates excluding weekends in python?, how to use (np.busday_count) with pandas.core.series.Series
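The note about holidays and custom weekmasks can be sketched as follows. The holiday list (Good Friday 2021 only) and the Monday-to-Thursday weekmask are assumptions chosen for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Creation": pd.to_datetime(["2021-03-29", "2021-03-30"])})
start = df["Creation"].values.astype("datetime64[D]")
end = np.datetime64("2021-04-05")  # fixed "now", as in the answer above

# Assumed holiday list: Good Friday 2021 only.
holidays = ["2021-04-02"]
df["busday_diff"] = np.busday_count(start, end, holidays=holidays)

# Custom weekmask in Mon..Sun order: here Friday is also a non-working day.
df["mon_to_thu"] = np.busday_count(start, end, weekmask="1111000")

print(df)
```

With the holiday defined, the counts drop from 5 and 4 to 4 and 3. The end date itself is not counted, because `np.busday_count` treats the range as half-open.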
I am trying to convert Julian codes to calendar dates in pandas using :
pd.to_datetime(43390, unit = 'D', origin = 'Julian')
This is giving me ValueError: origin Julian cannot be converted to a Timestamp
You need to set origin = 'julian'
pd.to_datetime(43390, unit = 'D', origin = 'julian')
but this number (43390) throws
OutOfBoundsDatetime: 43390 is Out of Bounds for origin='julian'
because the bounds are from 2333836 to 2547339
(Timestamp('1677-09-21 12:00:00') to Timestamp('2262-04-11 12:00:00'))
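For comparison, a value inside those bounds converts without error. This is a minimal sketch; the Julian day number below was picked by hand to correspond to midnight on 2018-10-17 and is not taken from the question.

```python
import pandas as pd

# Julian days start at noon, so midnight falls on a .5 value.
julian_day = 2458408.5

converted = pd.to_datetime(julian_day, unit="D", origin="julian")
print(converted)  # 2018-10-17 00:00:00
```

A number as small as 43390 is therefore almost certainly not a true Julian day number but a day count from some other epoch.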
Method 1 - using 'julian' as the origin didn't work for this value.
Method 2 - use Excel's start date as the origin, so that every value is counted in days from Excel's default start date.
Finally this worked for me.
pd.to_datetime(43390, unit = 'D', origin=pd.Timestamp("1899-12-30"))
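The same call works on a whole column of serial numbers. This is a sketch under the assumption that the values are Excel serial dates from the 1900 date system and later than February 1900, which is where the 1899-12-30 origin applies. `EXCEL_ORIGIN` is just an illustrative name.

```python
import pandas as pd

# Assumed origin for Excel's 1900 date system (valid for serials after Feb 1900).
EXCEL_ORIGIN = pd.Timestamp("1899-12-30")

serials = pd.Series([43390, 43101])
dates = pd.to_datetime(serials, unit="D", origin=EXCEL_ORIGIN)
print(dates)
# 0   2018-10-17
# 1   2018-01-01
```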
The code below works only for 6-digit Julian values. It handles both leap and non-leap years.
Here a Julian date has the form "CYYDDD", where C represents the century, YY the year, and DDD the day of the year, which is then resolved into a month and a day.
import pandas as pd
from datetime import datetime, timedelta
jul_date = '120075'
add_days = int(jul_date[3:6])
cal_date = pd.to_datetime(datetime.strptime(str(19+int(jul_date[0:1]))+jul_date[1:3]+'-01-01','%Y-%m-%d'))-timedelta(1)+pd.DateOffset(days= add_days)
print(cal_date.strftime('%Y-%m-%d'))
output: 2020-03-15
without timedelta(1): 2020-03-16
Here datetime.strptime is used to parse the string into a date.
%Y represents the 4-digit year (e.g. 1980).
%m and %d represent the month and the day as digits.
strftime('%Y-%m-%d') is used to drop the time component from the output.
timedelta(1): subtracts one day from the date, because the year was concatenated with '01-01'. Without it, adding the DDD day count to 1 January would overshoot by one day.
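An alternative that avoids the one-day correction is to let `strptime` parse the day of the year directly with `%j`. This is a minimal sketch, assuming the same "CYYDDD" layout with C = 0 meaning the 1900s and C = 1 the 2000s; the function name `cyyddd_to_date` is made up for illustration.

```python
from datetime import datetime

def cyyddd_to_date(code):
    """Convert a 6-character 'CYYDDD' string to a datetime.date."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError("expected a 6-digit CYYDDD string, got %r" % code)
    # Century digit 0 -> 19xx, 1 -> 20xx (assumed convention).
    year = 1900 + 100 * int(code[0]) + int(code[1:3])
    # %j is the zero-padded day of the year (001-366).
    return datetime.strptime(f"{year}{code[3:]}", "%Y%j").date()

print(cyyddd_to_date("120075"))  # 2020-03-15
```

Because `%j` is validated against the year, a day number of 366 in a non-leap year raises a `ValueError` instead of silently rolling over into the next year.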
I want to create a function that returns the difference between two dates, excluding weekends and holidays.
E.g. the difference between 01/07/2019 and 08/07/2019 should be 5 days, excluding the weekend on 06/07/2019 and 07/07/2019.
What would be the best way to achieve this?
Convert the strings to dates with pd.to_datetime(), passing the format of your dates.
Then use np.busday_count to find the difference in days, excluding weekends.
import pandas as pd
import numpy as np
date1 = "01/07/2019"
date2 = "08/07/2019"
date1 = pd.to_datetime(date1,format="%d/%m/%Y").date()
date2 = pd.to_datetime(date2,format="%d/%m/%Y").date()
days = np.busday_count( date1 , date2)
print(days)
5
In case you want to provide holidays:
holidays = pd.to_datetime("04/07/2019",format="%d/%m/%Y").date()
days = np.busday_count( date1 , date2, holidays=[holidays] )
print(days)
4
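Since the question asks for a function, the two snippets above can be wrapped up as follows. This is a sketch, not a definitive implementation: the name `business_days_between` and its parameters are hypothetical, and the date format is assumed to be day-first as in the question.

```python
import numpy as np
import pandas as pd

def business_days_between(start, end, holidays=(), fmt="%d/%m/%Y"):
    """Count weekdays in [start, end), skipping the given holidays."""
    def to_day(text):
        # Parse a date string and truncate it to day precision.
        return np.datetime64(pd.to_datetime(text, format=fmt).date(), "D")

    holiday_days = np.array([to_day(h) for h in holidays], dtype="datetime64[D]")
    return int(np.busday_count(to_day(start), to_day(end), holidays=holiday_days))

print(business_days_between("01/07/2019", "08/07/2019"))                  # 5
print(business_days_between("01/07/2019", "08/07/2019", ["04/07/2019"]))  # 4
```

The end date is exclusive, which matches the expected result of 5 in the question.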
I am trying to find the number of months between two dates. Some solutions are off by 1 month and others are off by several months. I found this solution on SO, but the answers there are either too complicated or incorrect.
For example, given the starting date of 04/30/12 and ending date of 03/31/16,
def diff_month(d1, d2):
    return (d1.year - d2.year)*12 + d1.month - d2.month
returns 47 months, not 48
and
dates = [dt for dt in rrule(MONTHLY, dtstart=strt_dt, until=end_dt)]
returns 44 (Reason being that February does not have a day # 30 so it does not see it as a valid date)
I can of course fix that by doing
dates = [dt for dt in rrule(MONTHLY, dtstart=strt_dt.replace(day=2), until=end_dt.replace(day=1))]
But this does not seem like a proper solution (I mean the answer is right but the method sucks).
Is there a proper way of calculating the # of months so that given my example dates, it would return 48?
I realize this post doesn't have a pandas tag, but if you are willing to use it, you can simply take the difference between two monthly periods:
>>> import pandas as pd
>>> pd.Period('2016-3-31', 'M') - pd.Period('2012-4-30', 'M')
47
(Recent pandas versions return an offset object, shown as <47 * MonthEnds>; its .n attribute holds the integer.)
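Whether the answer should be 47 or 48 depends on whether both the starting and the ending month are counted. A minimal sketch that makes the choice explicit is shown below; the function name `months_between` and the `inclusive` flag are hypothetical. It uses period ordinals so that the result is a plain integer regardless of pandas version.

```python
import pandas as pd

def months_between(start, end, inclusive=False):
    """Difference in calendar months; optionally count both end months."""
    start_month = pd.Period(start, freq="M")
    end_month = pd.Period(end, freq="M")
    # Ordinals are integer month counts, so the difference is a plain int.
    diff = end_month.ordinal - start_month.ordinal
    return diff + 1 if inclusive else diff

print(months_between("2012-04-30", "2016-03-31"))                  # 47
print(months_between("2012-04-30", "2016-03-31", inclusive=True))  # 48
```

Only the year and month are considered here. The day of the month is ignored, which is what avoids the February problem seen with rrule.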