filter dates using pandas from dataframe - python

I have a column of dates. I need to flag those dates that fall between today's date and the end of the current month. If a date falls in that range, the next column shows "Y"
Date        Column
01/02/2021
03/02/2021
31/03/2021  Y
01/03/2021
07/03/2021  Y
08/03/2021  Y
Since today's date is 07/03/2021, three dates fall between 07/03/2021 and 31/03/2021.

Convert the column to datetime with an explicit format, then compare it with today's normalized timestamp and the end of the month:
import numpy as np
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
today = pd.to_datetime('today').normalize()
end_of_month = today + pd.tseries.offsets.MonthEnd(1)
df['Column'] = np.where((df['Date'] >= today) & (df['Date'] <= end_of_month), 'Y', '')
Output
Date Column
0 2021-02-01
1 2021-02-03
2 2021-03-31 Y
3 2021-03-01
4 2021-03-07 Y
5 2021-03-08 Y
6 2021-04-02
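An equivalent, arguably more readable filter uses Series.between, which is inclusive on both ends by default. A minimal sketch, with "today" pinned to the question's date so the result is reproducible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['01/02/2021', '03/02/2021', '31/03/2021',
                            '01/03/2021', '07/03/2021', '08/03/2021']})
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# Pin "today" to the question's date for reproducibility;
# in practice use pd.Timestamp.now().normalize()
today = pd.Timestamp('2021-03-07')
end_of_month = today + pd.offsets.MonthEnd(1)

# between() is inclusive on both ends by default
df['Column'] = np.where(df['Date'].between(today, end_of_month), 'Y', '')
print(df)
```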

Related

Creating year week based on date with different start date

I have a df
date
2021-03-12
2021-03-17
...
2022-05-21
2022-08-17
I am trying to add a column year_week, but my year week starts at 2021-06-28, the Monday of the ISO week containing 1 July.
I tried:
from datetime import datetime, timedelta

import pandas as pd

df['date'] = pd.to_datetime(df['date'])
df['year_week'] = (df['date'] - timedelta(days=datetime(2021, 6, 24).timetuple().tm_yday)).dt.isocalendar().week
I played around with the timedelta days values so that the 2021-06-28 has a value of 1.
But then I got problems with dates before the start date and dates beyond the start date plus one year:
2021-03-12 has a value of 38
2022-08-17 has a value of 8
So it looks like the valid period is 2021-06-28 plus one year.
date year_week
2021-03-12 38 # LY38
2021-03-17 39 # LY39
2021-06-28 1 # correct
...
2022-05-21 47 # correct
2022-08-17 8 # NY8
Is there a way around this? Because I aggregate the data by year week, the past and upcoming dates give incorrect results. I would like either negative values, or a label such as LY38 denoting the year week of the last year, for days before 2021-06-28, and either year weeks above 52, or a label such as NY8 denoting the 8th week of the next year, for days after the period ends.
Here is a way (two dates more than a year away were added to the sample). Take the isocalendar of the difference between the date column and the day-of-year of your anchor date. Then select the different scenarios depending on how many years each date is from the anchor, using np.select to choose the result format.
import numpy as np
import pandas as pd

# dummy dataframe
df = pd.DataFrame(
    {'date': ['2020-03-12', '2021-03-12', '2021-03-17', '2021-06-28',
              '2022-05-21', '2022-08-17', '2023-08-17']}
)

# define the anchor date
d = pd.to_datetime('2021-6-24')

# subtract the anchor's day-of-year from each date
s = (pd.to_datetime(df['date']) - pd.Timedelta(days=d.day_of_year)
     ).dt.isocalendar()

# difference in years from the anchor
m = (s['year'].astype('int32') - d.year)

# one condition per year-difference scenario
conds = [m.eq(0), m.eq(-1), m.eq(1), m.lt(-1), m.gt(1)]
choices = ['', 'LY', 'NY', (m + 1).astype(str) + 'LY', '+' + (m - 1).astype(str) + 'NY']

# build the result column
df['res'] = np.select(conds, choices) + s['week'].astype(str)
print(df)
date res
0 2020-03-12 -1LY38
1 2021-03-12 LY38
2 2021-03-17 LY39
3 2021-06-28 1
4 2022-05-21 47
5 2022-08-17 NY8
6 2023-08-17 +1NY8
I think pandas period_range can be of some help:
pd.Series(pd.period_range("6/28/2017", freq="W", periods=n))  # n = the number of weeks you want
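If exact ISO week labels are not required, a simpler scheme (a sketch, not the answer's method) is to count days from the anchor Monday and floor-divide by 7. This naturally yields week 1 at the anchor, negative values before it, and values above 52 after a year:

```python
import pandas as pd

df = pd.DataFrame({'date': ['2021-03-12', '2021-06-28', '2022-05-21', '2022-08-17']})
df['date'] = pd.to_datetime(df['date'])

# Anchor Monday of the custom year (an assumption: week 1 starts here)
start = pd.Timestamp('2021-06-28')

# Floor division makes dates before the anchor negative, e.g. week -15
df['year_week'] = (df['date'] - start).dt.days // 7 + 1
print(df)
```

Here 2022-08-17 comes out as week 60, i.e. week 8 of the next year (60 - 52 = 8), and 2021-03-12 as week -15, so past and future dates stay distinguishable when aggregating.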

String dates into unixtime in a pandas dataframe

I have a dataframe with a column like this:
Date
3 mins
2 hours
9-Feb
13-Feb
The type of the dates is string for every row. What is the easiest way to convert these dates into integer Unix time?
One idea is to convert the column to datetimes and to timedeltas separately:
import numpy as np
import pandas as pd

df['dates'] = pd.to_datetime(df['Date'] + '-2020', format='%d-%b-%Y', errors='coerce')
times = df['Date'].replace({r'(\d+)\s+mins': '00:\\1:00',
                            r'\s+hours': ':00:00'}, regex=True)
df['times'] = pd.to_timedelta(times, errors='coerce')
# remove rows with missing values in both dates and times
df = df[df['dates'].notna() | df['times'].notna()]
# Series.append was removed in pandas 2.0, so combine with pd.concat
df['all'] = pd.concat([df['dates'].dropna().astype(np.int64),
                       df['times'].dropna().astype(np.int64)])
print (df)
Date dates times all
0 3 mins NaT 00:03:00 180000000000
1 2 hours NaT 02:00:00 7200000000000
2 9-Feb 2020-02-09 NaT 1581206400000000000
3 13-Feb 2020-02-13 NaT 1581552000000000000
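The all column above holds nanoseconds (pandas' native resolution); conventional Unix time is in seconds, so divide by 10**9. A sketch assuming the same dataframe (the unix_seconds column name is just illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['3 mins', '2 hours', '9-Feb', '13-Feb']})
df['dates'] = pd.to_datetime(df['Date'] + '-2020', format='%d-%b-%Y', errors='coerce')

# Datetimes cast to int64 give nanoseconds since the epoch;
# integer-divide by 10**9 for Unix seconds
ns = df['dates'].dropna().astype(np.int64)
df.loc[ns.index, 'unix_seconds'] = ns // 10**9
print(df)
```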

Days before end of month in pandas

I would like to get the number of days before the end of the month, from a string column representing a date.
I have the following pandas dataframe :
df = pd.DataFrame({'date':['2019-11-22','2019-11-08','2019-11-30']})
df
date
0 2019-11-22
1 2019-11-08
2 2019-11-30
I would like the following output :
df
date days_end_month
0 2019-11-22 8
1 2019-11-08 22
2 2019-11-30 0
The offset pd.tseries.offsets.MonthEnd with rollforward seemed a good pick, but I can't figure out how to use it to transform a whole column.
Subtract the day of the month (Series.dt.day) from the total number of days in the month (Series.dt.daysinmonth):
df['date'] = pd.to_datetime(df['date'])
df['days_end_month'] = df['date'].dt.daysinmonth - df['date'].dt.day
Or use offsets.MonthEnd, subtract, and convert the resulting timedeltas to days with Series.dt.days:
df['days_end_month'] = (df['date'] + pd.offsets.MonthEnd(0) - df['date']).dt.days
print (df)
date days_end_month
0 2019-11-22 8
1 2019-11-08 22
2 2019-11-30 0
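The rollforward idea the asker mentions also works column-wide: applied element-wise, rollforward maps each date to its month's last day while leaving dates already on a month end untouched. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'date': ['2019-11-22', '2019-11-08', '2019-11-30']})
df['date'] = pd.to_datetime(df['date'])

# rollforward moves a date to the next month end,
# unless it is already on one (2019-11-30 stays put)
month_end = pd.to_datetime(df['date'].apply(pd.offsets.MonthEnd(1).rollforward))
df['days_end_month'] = (month_end - df['date']).dt.days
print(df)
```

The element-wise apply is slower than the vectorized daysinmonth subtraction above, but it shows how a single-offset method extends to a whole column.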

Return DataFrame rows for max date of every month and only if it falls in the last 2 weeks of that month

I want to return rows by finding the maximum date of each month and then checking whether that date falls in the last 2 weeks of that particular month. Below is the DataFrame that I'm using:
finalPrize date high low
1777.44 2018-07-31 1801.83 1739.32
1797.17 2018-06-27 1798.44 1776.02
1834.33 2018-05-28 1836.56 1786.00
1823.29 2018-04-03 1841.00 1821.50
1847.75 2018-03-29 1847.77 1818.92
I have referred other answers and found a way to find the max date from the 'date' column. Here is the code:
df.index = df['date']
print(df.groupby(df.index.month).apply(lambda x: x.index.max()))
But this results in:
date
1 2019-07-31
2 2019-06-27
3 2019-05-28
4 2019-04-03
5 2019-03-29
Rather, I want to return all the values from the rows where these dates occur, but only if the date falls in the last 2 weeks of its month. I'm not able to figure out how to do that!
So expected output is:
finalPrize date high low
1777.44 2018-07-31 1801.83 1739.32
1797.17 2018-06-27 1798.44 1776.02
1834.33 2018-05-28 1836.56 1786.00
1847.75 2018-03-29 1847.77 1818.92
import calendar

df.index = pd.to_datetime(df.index)
df['day'] = pd.to_numeric(df.index.day)
# total number of days in each row's month
df['days_in_month'] = df.apply(lambda row: calendar.monthrange(row.name.year, row.name.month)[1], axis=1)
# weekday of the first day of each row's month (Monday == 0)
df['first_day'] = df.apply(lambda row: calendar.monthrange(row.name.year, row.name.month)[0], axis=1)
df['days_in_last_week'] = (df['days_in_month'] % 7 + df['first_day']) % 7
df[df['day'] > (df['days_in_month'] - df['days_in_last_week'])]
Hope this works! Do this after you set the date column as the index.
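A vectorized alternative (a sketch, not the answer above): take each month's maximum date with groupby/idxmax, then keep it only if it falls within the final 14 days of its calendar month, using Series.dt.daysinmonth:

```python
import pandas as pd

df = pd.DataFrame({
    'finalPrize': [1777.44, 1797.17, 1834.33, 1823.29, 1847.75],
    'date': pd.to_datetime(['2018-07-31', '2018-06-27', '2018-05-28',
                            '2018-04-03', '2018-03-29']),
    'high': [1801.83, 1798.44, 1836.56, 1841.00, 1847.77],
    'low': [1739.32, 1776.02, 1786.00, 1821.50, 1818.92],
})

# Row index of the max date per (year, month) group
idx = df.groupby([df['date'].dt.year, df['date'].dt.month])['date'].idxmax()
last = df.loc[idx]

# Keep only dates in the last 14 days of their month
mask = last['date'].dt.day > last['date'].dt.daysinmonth - 14
print(last[mask])
```

Grouping on (year, month) rather than month alone also avoids mixing the same month from different years.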

Comparing today date with date in dataframe

Sample Data
id date
1 1/2/2018
2 1/5/2019
3 5/3/2018
4 23/11/2018
Desired output
id date
2 1/5/2019
4 23/11/2018
My current code:
dfdateList = pd.DataFrame()
dfDate = self.df[["id", "date"]]
today = datetime.datetime.now()
today = today.strftime("%d/%m/%Y").lstrip("0").replace(" 0", "")
expList = []
for dates in dfDate["date"]:
    if dates <= today:
        expList.append(dates)
dfdateList = pd.DataFrame(expList)
Currently my code keeps every single row despite the condition. Can anyone guide me? Thanks
Pandas has native support for a large class of operations on datetimes, so one solution here would be to use pd.to_datetime to convert your dates from strings to pandas' representation of datetimes, pd.Timestamp, then just create a mask based on the current date:
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df[df['date'] > pd.Timestamp.now()]
For example:
In [34]: df['date'] = pd.to_datetime(df['date'], dayfirst=True)
In [36]: df
Out[36]:
id date
0 1 2018-02-01
1 2 2019-05-01
2 3 2018-03-05
3 4 2018-11-23
In [37]: df[df['date'] > pd.Timestamp.now()]
Out[37]:
id date
1 2 2019-05-01
3 4 2018-11-23
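If the time of day should be ignored, so that today itself counts as a match, normalize the timestamp before comparing. A minimal sketch of that variation:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'date': ['1/2/2018', '1/5/2019', '5/3/2018', '23/11/2018']})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# normalize() drops the time component, so the comparison
# behaves like a pure date comparison and >= includes today
today = pd.Timestamp.now().normalize()
future = df[df['date'] >= today]
print(future)
```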
