I need a function to count the total number of days in the 'days' column between a start date of 1st Jan 1995 and an end date of 31st Dec 2019 in a dataframe taking leap years into account as well.
Example: 1st Jan 1995 is Day 1, 1st Feb 1995 is Day 32, and so on all the way to 31st Dec 2019.
If you want to filter a pandas dataframe to the rows that fall between two dates, you can do this:
start_date = '1995/01/01'
end_date = '1995/02/01'
df = df[ (df['days']>=start_date) & (df['days']<=end_date) ]
and len(df) then gives you the number of rows in the filtered dataframe.
If you instead want to calculate the number of days between two dates, you can do that without pandas, using datetime:
from datetime import datetime
start_date = '1995/01/01'
end_date = '1995/02/01'
delta = datetime.strptime(end_date, '%Y/%m/%d') - datetime.strptime(start_date, '%Y/%m/%d')
print(delta.days)
Output:
31
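Note that this does take leap years into account, because the subtraction works on real calendar dates. The result excludes the end date itself, so add 1 if both the start and end dates should be counted.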
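To get the day numbers from the question directly in the dataframe, one option is to subtract the start date from the 'days' column. This is a minimal sketch, assuming the column holds datetime values; the sample rows and the `day_number` column name are made up for illustration:

```python
import pandas as pd

start = pd.Timestamp("1995-01-01")
end = pd.Timestamp("2019-12-31")

# Hypothetical sample data; the real 'days' column would come from your dataframe
df = pd.DataFrame(
    {"days": pd.to_datetime(["1995-01-01", "1995-02-01", "1996-03-01", "2019-12-31"])}
)

# Day 1 is the start date itself, hence the +1
df["day_number"] = (df["days"] - start).dt.days + 1

# Total number of days in the range, counting both endpoints
total_days = (end - start).days + 1

print(df)
print(total_days)
```

Subtracting timestamps works on calendar dates, so leap days such as 29 Feb 1996 are included in the count.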
I want to produce a list of dataframes split by day (the day-month date) and ordered by date. At the moment the code below splits them by date, e.g. 1-11, 2-11, but the groups for 30-10 and 31-10 come after all my November dates.
ResultSet2 = ResultProxy2.fetchall()
df2 = pd.DataFrame(ResultSet2)
resultsrecovery = [group[1] for group in df2.groupby(["day"])]
I basically want the grouped dataframes for the 30th and 31st of October to come before all the ones for November.
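One possible approach is to group on a real date instead of the text label, since groupby sorts its keys and datetime keys sort chronologically. This is only a sketch: it assumes the "day" column holds day-month strings such as '30-10', and the sample rows, the `value` column, the `day_date` column and the year 2020 are all made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the query result
df2 = pd.DataFrame(
    {
        "day": ["1-11", "30-10", "2-11", "31-10", "1-11"],
        "value": [10, 20, 30, 40, 50],
    }
)

# Parse the day-month labels into dates; the year is an arbitrary assumption
df2["day_date"] = pd.to_datetime(df2["day"] + "-2020", format="%d-%m-%Y")

# groupby sorts by key, so the groups now come out in date order
resultsrecovery = [group for _, group in df2.groupby("day_date")]

ordered_labels = [group["day"].iloc[0] for group in resultsrecovery]
print(ordered_labels)
```

If the table already has a full date column, grouping on that column directly would avoid having to guess the year.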
I have a dataframe with multiple columns, one of which is a date column. I'm interested in creating a new column that contains the number of months between the date column and a preset date. For example, one of the dates in the 'start date' column is '2019-06-30 00:00:00'. I want to calculate the number of months between that date and the end of 2021 (2021-12-31), place the answer into a new column, and do this for the entire date column in the dataframe. I haven't been able to work out how to go about this, but with a predetermined end date of 2021-12-31 I would like the result to look like this:
df =
|start date months
0|2019-06-30 30
1|2019-08-12 28
2|2020-01-24 23
You can do this using np.timedelta64:
import numpy as np
import pandas as pd
end_date = pd.to_datetime('2021-12-31')
df['start date'] = pd.to_datetime(df['start date'])
df['month'] = ((end_date - df['start date'])/np.timedelta64(1, 'M')).astype(int)
print(df)
start date month
0 2019-06-30 30
1 2019-08-12 28
2 2020-01-24 23
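Newer pandas versions may raise an error for the 'M' unit in this kind of division, because a month is not a fixed-length duration. A hedged alternative with the same idea is to divide the elapsed days by an average month length. The constant below (365.2425 / 12 days) and the sample rows are assumptions for illustration:

```python
import pandas as pd

# Average month length in days, assuming a Gregorian year of 365.2425 days
AVG_DAYS_PER_MONTH = 365.2425 / 12

# Sample rows taken from the question
df = pd.DataFrame(
    {"start date": pd.to_datetime(["2019-06-30", "2019-08-12", "2020-01-24"])}
)
end_date = pd.Timestamp("2021-12-31")

# Elapsed whole days, then converted to whole average-length months
elapsed_days = (end_date - df["start date"]).dt.days
df["month"] = (elapsed_days / AVG_DAYS_PER_MONTH).astype(int)

print(df)
```

This counts elapsed time rather than calendar months, so results near month boundaries can differ from a calendar-based count.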
Assume that the start date column is of datetime type (not string)
and that the reference date is defined as follows:
refDate = pd.to_datetime('2021-12-31')
or any other date of your choice.
Then you can compute the number of months as:
df['months'] = (refDate.to_period('M') - df['start date']\
.dt.to_period('M')).apply(lambda x: x.n)
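The same calendar-month count can also be sketched with plain year and month arithmetic, which may be easier to read. The sample rows come from the question; everything else follows the names used above:

```python
import pandas as pd

df = pd.DataFrame(
    {"start date": pd.to_datetime(["2019-06-30", "2019-08-12", "2020-01-24"])}
)
refDate = pd.to_datetime("2021-12-31")

start = df["start date"]
# Number of calendar-month boundaries between each start date and refDate
df["months"] = (refDate.year - start.dt.year) * 12 + (refDate.month - start.dt.month)

print(df)
```

Note that this ignores the day of the month: 2021-11-30 to 2021-12-01 counts as one month, because the two dates fall in different calendar months.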
With the datetime data below I would like to set a "deadline" relative to the 'Date' column. In essence, the deadline should be 2 business days after the 'Date'. However, there are some specific criteria:
If a 'Date' falls on a holiday or a weekend, the deadline should be the 2nd following business day at 17:00.
If a 'Date' is on Monday-Friday, not a holiday, AND its hour is between 0 and 8, the deadline should be the next non-holiday business day at 17:00.
If a 'Date' is on Monday-Friday, not a holiday, AND its hour is between 9 and 17, the deadline should be the 2nd following non-holiday business day at the same time.
If a 'Date' is on Monday-Friday, not a holiday, AND its hour is between 18 and 23, the deadline should be the 2nd following non-holiday business day at 17:00.
Below is the data:
import pandas as pd
from datetime import date
Holidays = [date(2018,1,1),date(2018,1,15),date(2018,2,19),date(2018,3,9)]
df = pd.DataFrame({'Date': ['2018-01-01 18:47','2018-01-08 06:11','2018-01-12 10:05','2018-02-10 09:22','2018-02-20 14:14','2018-03-08 16:17','2018-03-25 17:35'],
'Weekday': [0,0,4,5,1,3,6],
'Hour': [18,6,10,9,14,16,17]})
df['Date'] = pd.to_datetime(df['Date'])
The result should be as follows:
df = pd.DataFrame({'Date': ['2018-01-01 18:47','2018-01-08 06:11','2018-01-12 10:05','2018-02-10 09:22','2018-02-21 14:14','2018-03-08 16:17','2018-03-25 17:35'],
'Deadline': ['2018-01-03 17:00','2018-01-09 17:00','2018-01-17 10:05','2018-02-13 17:00','2018-02-23 14:14','2018-03-13 16:17','2018-03-27 17:00']})
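One way to sketch these rules is with pandas' CustomBusinessDay offset, which skips weekends and a supplied holiday list. The `deadline` function and the `one_bday` / `two_bdays` names are hypothetical helpers, not part of the question, and the dates used are the ones listed in the expected result:

```python
from datetime import date

import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

holidays = [date(2018, 1, 1), date(2018, 1, 15), date(2018, 2, 19), date(2018, 3, 9)]
one_bday = CustomBusinessDay(n=1, holidays=holidays)
two_bdays = CustomBusinessDay(n=2, holidays=holidays)
five_pm = pd.Timedelta(hours=17)


def deadline(ts):
    """Hypothetical helper applying the four rules to one timestamp."""
    day = ts.normalize()  # midnight of the same date
    is_business_day = one_bday.is_on_offset(day)
    if is_business_day and ts.hour <= 8:
        # Early on a working day: next business day at 17:00
        return day + one_bday + five_pm
    if is_business_day and 9 <= ts.hour <= 17:
        # Working hours: two business days later at the same time of day
        return day + two_bdays + (ts - day)
    # Evening, weekend or holiday: second following business day at 17:00
    return day + two_bdays + five_pm


df = pd.DataFrame({"Date": pd.to_datetime([
    "2018-01-01 18:47", "2018-01-08 06:11", "2018-01-12 10:05", "2018-02-10 09:22",
    "2018-02-21 14:14", "2018-03-08 16:17", "2018-03-25 17:35",
])})
df["Deadline"] = df["Date"].apply(deadline)
print(df)
```

Applying the function row by row is simple but not vectorised; for large frames the three cases could instead be computed with boolean masks.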
I am trying to create a date range using pd.date_range from 2018-01-01 to 2018-01-31 for days of the week Monday, Tuesday, Wednesday, from 6 AM - 6 PM at every 12 minutes.
Basically, I need an array of datetime objects or strings with a value every 12 minutes for particular days of the week, between particular business hours for the given range of dates. I am not able to use CustomBusinessDay, CustomBusinessHour and freq together to get the desired range of datetime objects.
Any suggestions?
You could build the full 12-minute range first and then filter it with a boolean mask:
index = pd.date_range('2018-01-01', '2018-02-01', freq='12min')
index[(index.dayofweek <= 2) & (index.hour >= 6) & (index.hour < 18)]
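A self-contained version of this filter, with the result stored in a variable, might look as follows. The names `mask`, `slots` and `slots_incl_6pm` are illustrative, and the second mask is an assumption about whether the 6 PM slot itself should be kept:

```python
import pandas as pd

# Every 12 minutes across January 2018
index = pd.date_range("2018-01-01", "2018-01-31 23:59", freq="12min")

# Monday=0, Tuesday=1, Wednesday=2; hours 6 through 17 cover 06:00 to 17:48
mask = (index.dayofweek <= 2) & (index.hour >= 6) & (index.hour < 18)
slots = index[mask]

# Variant that also keeps the 18:00 slot on those days
at_6pm = (index.dayofweek <= 2) & (index.hour == 18) & (index.minute == 0)
slots_incl_6pm = index[mask | at_6pm]

print(slots[:3])
print(len(slots), len(slots_incl_6pm))
```

If strings are needed instead of timestamps, `slots.strftime('%Y-%m-%d %H:%M')` converts the index to formatted text.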