Work with data in DataFrame in Python Pandas? - python

I have a DataFrame like the one below:
date = pd.DataFrame({'inputDates': ['2015-01-07', '2015-12-02',
                                    '2005-01-03', '2016-11-13',
                                    '2020-06-03']})
For each of these dates I need to find:
the day of the month - for example, 07.01.2015 is the 7th day of the month
the week of the year - for example, 07.01.2015 is in the 1st week of the year
the month of the year - for example, 07.01.2015 is in the 1st month of the year
the day of the year - for example, 07.01.2015 is the 7th day of the year
the quarter of the year - for example, 07.01.2015 is in the 1st quarter of the year

Try the following (see the pandas documentation on the .dt accessor for more):
date['inputDates'] = pd.to_datetime(date['inputDates'])
# day in month
date.inputDates.dt.day
# week in year
date.inputDates.dt.isocalendar().week
# month in year
date.inputDates.dt.month
# day in year
date.inputDates.dt.dayofyear
# quarter
date.inputDates.dt.quarter
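As a rough sketch, the accessors above can be collected into a single table. The column names are illustrative and not part of the original answer. One caveat: `isocalendar().week` follows ISO 8601, where week 1 is the week containing the year's first Thursday, so 2015-01-07 falls in ISO week 2 rather than week 1. If "week 1" should simply mean the first seven days of the year (an assumption about what the question intends), `(dayofyear - 1) // 7 + 1` is one possible alternative.

```python
import pandas as pd

date = pd.DataFrame({'inputDates': ['2015-01-07', '2015-12-02',
                                    '2005-01-03', '2016-11-13',
                                    '2020-06-03']})
s = pd.to_datetime(date['inputDates'])

# Column names below are illustrative only.
parts = pd.DataFrame({
    'day_of_month': s.dt.day,
    'iso_week': s.dt.isocalendar().week,           # ISO 8601 week number
    'simple_week': (s.dt.dayofyear - 1) // 7 + 1,  # assumed: days 1-7 form week 1
    'month': s.dt.month,
    'day_of_year': s.dt.dayofyear,
    'quarter': s.dt.quarter,
})
print(parts)
```

For 2015-01-07 the two week columns differ (ISO week 2, simple week 1), so it is worth deciding which definition is wanted before relying on either.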

Related

How to convert date format (dd/mm/yyyy) to days in python csv

I need a function to count the total number of days in the 'days' column between a start date of 1st Jan 1995 and an end date of 31st Dec 2019 in a dataframe taking leap years into account as well.
Example: 1st Jan 1995 is day 1, 1st Feb 1995 is day 32, and so on, all the way to 31st Dec 2019.
If you want to filter a pandas DataFrame to the rows between two dates, you can do it like this:
start_date = '1995/01/01'
end_date = '1995/02/01'
df = df[ (df['days']>=start_date) & (df['days']<=end_date) ]
len(df) then gives the number of rows in the filtered DataFrame.
If you instead want the number of days between two dates, you can compute it without pandas, using datetime:
from datetime import datetime
start_date = '1995/01/01'
end_date = '1995/02/01'
delta = datetime.strptime(end_date, '%Y/%m/%d') - datetime.strptime(start_date, '%Y/%m/%d')
print(delta.days)
Output:
31
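Subtracting datetime objects works on real calendar dates, so leap years are already taken into account. Note that the result is the difference between the two dates; add 1 if the start date itself should count as day 1.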
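A minimal pandas sketch of the day numbering the question describes, assuming the 'days' column holds dd/mm/yyyy strings as the title suggests. The sample rows and the `day_number` column name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; 'days' holds dd/mm/yyyy strings.
df = pd.DataFrame({'days': ['01/01/1995', '01/02/1995',
                            '29/02/1996', '31/12/2019']})

start = pd.Timestamp('1995-01-01')
dates = pd.to_datetime(df['days'], format='%d/%m/%Y')

# Timestamp subtraction counts real calendar days, so leap days are included.
# Adding 1 makes the start date itself day 1.
df['day_number'] = (dates - start).dt.days + 1
print(df)
```

With this numbering, 1st Feb 1995 is day 32 and 31st Dec 2019 is day 9131 (25 years of 365 days plus 6 leap days).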

How to get year,month,quarter from the date in pandas

I have a function that takes a start date and an end date and builds a dataframe from them. The columns I want in my dataframe are the month, the year, the quarter, and two more columns.
I want to build the year column based on the fiscal year, which runs July-June. So if the date is Jan 1, 2021 it is fiscal year 2021, but August 1, 2021 is fiscal year 2022. A quarter is 3 months, so July-Sept is Q1, Oct-Dec is Q2, Jan-March is Q3 and April-June is Q4. How do I add this quarter column?
Next I want a year context and a month context: if the year is 2022 it is the current year, if it is 2021 it is 1 year ago, if it is 2020 it is 2 years ago, and so on. For the month, if the date is August 2021 it is the current month, if it is July 2021 it is 1 month ago, and so on. Similarly, I want a quarter context column.
Here is my code for all of this. When I run it, January is labelled Q1, which is not correct, and I am not sure how to change it.
It's a long post, but to sum up, here's what I need help with:
the quarter column, the year context column, the month context column, and creating a quarter context column
def create_date_table3(start, end):
    df = pd.DataFrame({"date": pd.date_range(start, end)})
    df = df.assign(Year=(df.date - pd.offsets.MonthBegin(7)).dt.year + 1)
    current_month = datetime.now().month
    current_quarter = df["date"].dt.quarter
    current_year = datetime.now().year if datetime.now().month <= 6 else datetime.now().year + 1
    df["Month"] = df.date.dt.month
    df["Year_Month"] = df.Year.astype(str) + '_' + df.Month.astype(str)
    df = df.assign(Quarter=(df.date - pd.offsets.MonthBegin(7)).dt.quarter)
    df["YearContext"] = ["Current Year" if x == current_year else str(current_year - x) + ' Yr Ago' for x in df["Year"]]
    df["MonthContext"] = ["Current Month" if x == current_month else str(current_month - x) + ' Mo Ago' for x in df["Month"]]
    return df
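One possible way to get the fiscal columns described above is plain month arithmetic rather than date offsets. This is only a sketch under the July-June fiscal year stated in the question; the function name `add_fiscal_columns`, the `today` parameter, and the output column names are hypothetical. Dates later than `today` would produce negative counts, which this sketch does not handle.

```python
import pandas as pd

def add_fiscal_columns(df, date_col="date", today=None):
    """Add fiscal year/quarter and context columns (fiscal year runs July-June)."""
    today = pd.Timestamp.today() if today is None else pd.Timestamp(today)
    out = df.copy()
    month = out[date_col].dt.month
    year = out[date_col].dt.year

    # July-December belong to the next calendar year's fiscal year.
    out["FiscalYear"] = year + (month >= 7).astype(int)
    # Shift months so July becomes 0, then bucket into groups of three.
    out["FiscalQuarter"] = ((month - 7) % 12) // 3 + 1

    current_fy = today.year + int(today.month >= 7)
    years_ago = current_fy - out["FiscalYear"]
    out["YearContext"] = ["Current Year" if n == 0 else f"{n} Yr Ago"
                          for n in years_ago]

    # Count whole calendar months between each date and today.
    months_ago = (today.year - year) * 12 + (today.month - month)
    out["MonthContext"] = ["Current Month" if n == 0 else f"{n} Mo Ago"
                           for n in months_ago]
    return out

sample = pd.DataFrame({"date": pd.to_datetime(
    ["2021-01-01", "2021-08-01", "2021-07-15", "2021-04-30"])})
result = add_fiscal_columns(sample, today="2021-08-15")
print(result)
```

Counting months across years, instead of subtracting month numbers directly, keeps the month context meaningful when the date and today fall in different calendar years. A quarter context column could be built the same way from `FiscalYear * 4 + FiscalQuarter`.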

Find week number from a date but choosing a start date [duplicate]

I'm using pandas and have 3 columns of data, containing a day, a month, and a year. I want to input my numbers into a loop so that I can create a new column in my dataframe that shows the week number. My data also starts from October 1, and I need this to be my first week.
I've tried using this code:
for (a, b, c) in zip(year, month, day):
    print(datetime.date(a, b, c).strftime("%U"))
But this assumes that the first week is in January.
I'm also unsure how to assign what's in the loop to a new column. I was just printing what was in the for loop to test it out.
Thanks
I think this is what you want:
import pandas as pd
import datetime
# week number of a date, counted from January (%U: weeks start on Sunday)
get_week_number = lambda y, m, d: int(datetime.date(y, m, d).strftime('%U'))
# week number of October 1st, used as the offset
offset = get_week_number(2021, 10, 1)
def compute_week_number(year, month, day):
    """
    Compute the week number with an offset,
    so that October 1st falls in week number 1.
    """
    return get_week_number(year, month, day) - offset + 1
df = pd.DataFrame({'year': [2021, 2021, 2021],
                   'month': [10, 10, 10],
                   'day': [1, 6, 29]})
df['week_number'] = df.apply(lambda x: compute_week_number(x['year'],
                                                           x['month'],
                                                           x['day']),
                             axis=1)
apply with axis=1 calls a function once per row of the DataFrame and returns the value of the new column for that row.
I subtracted the week number of October 1st (the offset) and added 1 to get the new week number you asked for.
Week 39 becomes week 1, week 40 becomes week 2, and so on.
This gives:
year  month  day  week_number
2021     10    1            1
2021     10    6            2
2021     10   29            5
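The offset above is computed for 2021 only, so dates from January onward would come out negative. As a hedged alternative (a different technique from the answer's `%U` offset), the week can be counted in 7-day blocks from the most recent October 1st. The names `season_year` and `season_start` are made up for this sketch, and the extra January row is sample data added for illustration. Under this definition October 6 is still in week 1, unlike the Sunday-based result above.

```python
import pandas as pd

df = pd.DataFrame({'year': [2021, 2021, 2021, 2022],
                   'month': [10, 10, 10, 1],
                   'day': [1, 6, 29, 5]})

# pandas can assemble datetimes from year/month/day columns directly.
dates = pd.to_datetime(df[['year', 'month', 'day']])

# Dates before October belong to the season that started the previous year.
season_year = dates.dt.year - (dates.dt.month < 10).astype(int)
season_start = pd.to_datetime(
    pd.DataFrame({'year': season_year, 'month': 10, 'day': 1}))

# Week 1 covers October 1-7, week 2 covers October 8-14, and so on.
df['week_number'] = (dates - season_start).dt.days // 7 + 1
print(df)
```

This also avoids the row-by-row `apply`, since the whole computation is vectorised.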

Pandas read date from CSV and groupby total business days per month

data.CSV
ID Activity Month Activity Date
0 04/2019 04-01-2019
1 05/2019 05-13-2019
2 05/2019 05-25-2019
3 06/2019 06-10-2019
4 06/2019 06-19-2019
5 07/2019 07-15-2019
6 07/2019 07-18-2019
7 07/2019 07-29-2019
8 08/2019 06-03-2019
9 08/2019 06-15-2019
10 08/2019 06-20-2019
MY PLAN
Read csv:
df = pd.read_csv('data.CSV')
Convert to datetime:
df['Activity Date'] = pd.to_datetime(df['Activity Date'], format='%m-%d-%Y')
Groupby the Activity Month column:
grouped = df.groupby(['Activity Month'])['Activity Date'].count()
print(grouped)
Activity Month
04/2019 15532
05/2019 13924
06/2019 12822
07/2019 14067
08/2019 10939
Name: Activity Date, dtype: int64
While the data is grouped, perform the business-day calculation:
This is the part where I'm lost; I'm not sure what to do.
CODE I USED TO CALCULATE BUSINESS DAYS
import calendar
import datetime
x = datetime.date(2019, 4, 1)
cal = calendar.Calendar()
working_days = len([x for x in cal.itermonthdays2(x.year, x.month) if x[0] !=0 and x[1] < 5])
print ("Total business days for month (" + str(x.month) + ") is " + str(working_days) + " days")
OUTPUT THAT I WANTED
Total business days for month (4) is 22 days
Total business days for month (5) is 23 days
Total business days for month (6) is 20 days
Total business days for month (7) is 23 days
Total business days for month (8) is 22 days
I'm not entirely clear on the problem statement here, but if you want to calculate the number of business days for each Activity Month, you can wrap your calculation in a function and apply that function over your Activity Month column (apply is essentially a for loop over each value in the column).
grouped = df.groupby(['Activity Month'])['Activity Date'].count().reset_index()
def get_business_days(x):
    # x is an 'MM/YYYY' string; build the first day of that month
    x = datetime.date(int(x.split('/')[1]), int(x.split('/')[0]), 1)
    cal = calendar.Calendar()
    # itermonthdays2 yields (day, weekday); day 0 is padding, weekday < 5 is Mon-Fri
    working_days = len([d for d in cal.itermonthdays2(x.year, x.month) if d[0] != 0 and d[1] < 5])
    return ("Total business days for month (" + str(x.month) + ") is " + str(working_days) + " days")
grouped['Activity Month'].apply(get_business_days)
The output is a Series that has your text output.
0 Total business days for month (4) is 22 days
1 Total business days for month (5) is 23 days
2 Total business days for month (6) is 20 days
3 Total business days for month (7) is 23 days
4 Total business days for month (8) is 22 days
But, it's a bad idea to store repeated information in every cell. It'd be preferable to simply return working_days instead of having it embedded in a string.
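A sketch of that suggestion, returning a plain integer per month. It swaps the `calendar` loop for `numpy.busday_count`, which by default counts Monday-Friday and, like the original calculation, ignores public holidays. The function name `business_days_in_month` is hypothetical.

```python
import numpy as np
import pandas as pd

def business_days_in_month(activity_month):
    """Count Mon-Fri days in a month given as an 'MM/YYYY' string."""
    month, year = activity_month.split('/')
    start = np.datetime64(f'{int(year):04d}-{int(month):02d}', 'M')
    end = start + np.timedelta64(1, 'M')  # first day of the following month
    # busday_count counts weekdays in the half-open range [start, end).
    return int(np.busday_count(start.astype('datetime64[D]'),
                               end.astype('datetime64[D]')))

months = pd.Series(['04/2019', '05/2019', '06/2019', '07/2019', '08/2019'])
counts = months.apply(business_days_in_month)
print(counts.tolist())
```

Keeping the count numeric means it can be joined back onto the grouped data or used in further arithmetic, and the sentence can be formatted only when printing.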

Get next and last ISO week number in a year

This gives me the current week number:
datetime.date(2014, 9, 30).isocalendar()[1]
but I want to get the next ISO week number and the maximum ISO week number in the current year in Python.
I can't simply add 1 to the current week number, because the year may not have that week number.
To get next week's week number, add a timedelta:
>>> import datetime
>>> today = datetime.date(2014, 9, 30)
>>> next_week = today + datetime.timedelta(days=7)
>>> next_week.isocalendar()[1]
41
To get the last week number in the year, note the following rule.
These years have 53 ISO weeks (all others have 52):
years starting with Thursday
leap years starting with Wednesday
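Rather than encoding that rule by hand, one common shortcut relies on the fact that 28 December always falls in the last ISO week of its year. The helper names below are illustrative, not part of the original answer.

```python
import datetime

def last_iso_week(year):
    """Return the number of the last ISO week (52 or 53) of the given year."""
    # 28 December is always in the final ISO week of its year.
    return datetime.date(year, 12, 28).isocalendar()[1]

def next_iso_week(day):
    """Return the ISO week number of the week after the one containing `day`."""
    # Adding seven days lets the calendar handle the year rollover.
    return (day + datetime.timedelta(days=7)).isocalendar()[1]

print(last_iso_week(2014))
print(next_iso_week(datetime.date(2014, 9, 30)))
```

2015 began on a Thursday and 2020 was a leap year that began on a Wednesday, so `last_iso_week` returns 53 for both, in line with the rule above.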
