How do I slice data before and after a certain date?

How do I slice data before and after a certain date? - python

I have a pandas Series object with dates as index and values as a share price of a company. I would like to slice the data, so that I have let´s say a date 10.01.2022, and I want a slice from 3 previous dates and 5 next days from this date. Is that easily done? Or do I have to convert it, add/subtract those numbers from that date, and convert back? I´m a bit lost in all that datetime, strptime, to_datetime,...
Something like this:
date = "10.01.2022"
share_price = [date - 3 : date + 5]
Thank you

You can use .loc[]. Both ends will be inclusive.
Example:
s = pd.Series([1,2,3,4,5,6],
index = pd.to_datetime([
'07.01.2022', '09.01.2022', '10.01.2022',
'12.01.2022', '15.01.2022', '16.01.2022'
], dayfirst=True))
date = pd.to_datetime("10.01.2022", dayfirst=True)
s:
2022-01-07 1
2022-01-09 2
2022-01-10 3
2022-01-12 4
2022-01-15 5
2022-01-16 6
dtype: int64
date:
Timestamp('2022-01-10 00:00:00')
s.loc[date - pd.Timedelta('3d') : date + pd.Timedelta('5d')]
2022-01-07 1
2022-01-09 2
2022-01-10 3
2022-01-12 4
2022-01-15 5
dtype: int64
Edit:
To add business days:
from pandas.tseries.offsets import BDay
s.loc[date - BDay(3) : date + BDay(5)]

Related

How to calculate the quantity of business days between two dates using Pandas

I created a pandas df with columns named start_date and current_date. Both columns have a dtype of datetime64[ns]. What's the best way to find the quantity of business days between the current_date and start_date column?
I've tried:
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
us_bd = CustomBusinessDay(calendar=USFederalHolidayCalendar())
projects_df['start_date'] = pd.to_datetime(projects_df['start_date'])
projects_df['current_date'] = pd.to_datetime(projects_df['current_date'])
projects_df['days_count'] = len(pd.date_range(start=projects_df['start_date'], end=projects_df['current_date'], freq=us_bd))
I get the following error message:
Cannot convert input....start_date, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp
I'm using Python version 3.10.4.

pd.date_range's parameters need to be datetimes, not series.
For this reason, we can use df.apply to apply the function to each row.
In addition, pandas has bdate_range which is just date_range with freq defaulting to business days, which is exactly what you need.
Using apply and a lambda function, we can create a new Series calculating business days between each start and current date for each row.
projects_df['start_date'] = pd.to_datetime(projects_df['start_date'])
projects_df['current_date'] = pd.to_datetime(projects_df['current_date'])
projects_df['days_count'] = projects_df.apply(lambda row: len(pd.bdate_range(row['start_date'], row['current_date'])), axis=1)
Using a random sample of 10 date pairs, my output is the following:
start_date current_date bdays
0 2022-01-03 17:08:04 2022-05-20 00:53:46 100
1 2022-04-18 09:43:02 2022-06-10 16:56:16 40
2 2022-09-01 12:02:34 2022-09-25 14:59:29 17
3 2022-04-02 14:24:12 2022-04-24 21:05:55 15
4 2022-01-31 02:15:46 2022-07-02 16:16:02 110
5 2022-08-02 22:05:15 2022-08-17 17:25:10 12
6 2022-03-06 05:30:20 2022-07-04 08:43:00 86
7 2022-01-15 17:01:33 2022-08-09 21:48:41 147
8 2022-06-04 14:47:53 2022-12-12 18:05:58 136
9 2022-02-16 11:52:03 2022-10-18 01:30:58 175

First week of year considering the first day last year

I have the following df:
time_series date sales
store_0090_item_85261507 1/2020 1,0
store_0090_item_85261501 2/2020 0,0
store_0090_item_85261500 3/2020 6,0
Being 'date' = Week/Year.
So, I tried use the following code:
df['date'] = df['date'].apply(lambda x: datetime.strptime(x + '/0', "%U/%Y/%w"))
But, return this df:
time_series date sales
store_0090_item_85261507 2020-01-05 1,0
store_0090_item_85261501 2020-01-12 0,0
store_0090_item_85261500 2020-01-19 6,0
But, the first day of the first week of 2020 is 2019-12-29, considering sunday as first day. How can I have the first day 2020-12-29 of the first week of 2020 and not 2020-01-05?

From the datetime module's documentation:
%U: Week number of the year (Sunday as the first day of the week) as a zero padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0.
Edit: My originals answer doesn't work for input 1/2023 and using ISO 8601 date values doesn't work for 1/2021, so I've edited this answer by adding a custom function
Here is a way with a custom function
import pandas as pd
from datetime import datetime, timedelta
##############################################
# to demonstrate issues with certain dates
print(datetime.strptime('0/2020/0', "%U/%Y/%w")) # 2019-12-29 00:00:00
print(datetime.strptime('1/2020/0', "%U/%Y/%w")) # 2020-01-05 00:00:00
print(datetime.strptime('0/2021/0', "%U/%Y/%w")) # 2020-12-27 00:00:00
print(datetime.strptime('1/2021/0', "%U/%Y/%w")) # 2021-01-03 00:00:00
print(datetime.strptime('0/2023/0', "%U/%Y/%w")) # 2023-01-01 00:00:00
print(datetime.strptime('1/2023/0', "%U/%Y/%w")) # 2023-01-01 00:00:00
#################################################
df = pd.DataFrame({'date':["1/2020", "2/2020", "3/2020", "1/2021", "2/2021", "1/2023", "2/2023"]})
print(df)
def get_first_day(date):
date0 = datetime.strptime('0/' + date.split('/')[1] + '/0', "%U/%Y/%w")
date1 = datetime.strptime('1/' + date.split('/')[1] + '/0', "%U/%Y/%w")
date = datetime.strptime(date + '/0', "%U/%Y/%w")
return date if date0 == date1 else date - timedelta(weeks=1)
df['new_date'] = df['date'].apply(lambda x:get_first_day(x))
print(df)
Input
date
0 1/2020
1 2/2020
2 3/2020
3 1/2021
4 2/2021
5 1/2023
6 2/2023
Output
date new_date
0 1/2020 2019-12-29
1 2/2020 2020-01-05
2 3/2020 2020-01-12
3 1/2021 2020-12-27
4 2/2021 2021-01-03
5 1/2023 2023-01-01
6 2/2023 2023-01-08

You'll want to use ISO week parsing directives, Ex:
import pandas as pd
date = pd.Series(["1/2020", "2/2020", "3/2020"])
pd.to_datetime(date+"/1", format="%V/%G/%u")
0 2019-12-30
1 2020-01-06
2 2020-01-13
dtype: datetime64[ns]
you can also shift by one day if the week should start on Sunday:
pd.to_datetime(date+"/1", format="%V/%G/%u") - pd.Timedelta('1d')
0 2019-12-29
1 2020-01-05
2 2020-01-12
dtype: datetime64[ns]

parse multiple date format pandas

I 've got stuck with the following format:
0 2001-12-25
1 2002-9-27
2 2001-2-24
3 2001-5-3
4 200510
5 20078
What I need is the date in a format %Y-%m
What I tried was
def parse(date):
if len(date)<=5:
return "{}-{}".format(date[:4], date[4:5], date[5:])
else:
pass
df['Date']= parse(df['Date'])
However, I only succeeded in parse 20078 to 2007-8, the format like 2001-12-25 appeared as None.
So, how can I do it? Thank you!

we can use the pd.to_datetime and use errors='coerce' to parse the dates in steps.
assuming your column is called date
s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')
s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))
df['date_fixed'] = s
print(df)
date date_fixed
0 2001-12-25 2001-12-25
1 2002-9-27 2002-09-27
2 2001-2-24 2001-02-24
3 2001-5-3 2001-05-03
4 200510 2005-10-01
5 20078 2007-08-01
In steps,
first we cast the regular datetimes to a new series called s
s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')
print(s)
0 2001-12-25
1 2002-09-27
2 2001-02-24
3 2001-05-03
4 NaT
5 NaT
Name: date, dtype: datetime64[ns]
as you can can see we have two NaT which are null datetime values in our series, these correspond with your datetimes which are missing a day,
we then reapply the same datetime method but with the opposite format, and apply those to the missing values of s
s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))
print(s)
0 2001-12-25
1 2002-09-27
2 2001-02-24
3 2001-05-03
4 2005-10-01
5 2007-08-01
then we re-assign to your dataframe.

You could use a regex to pull out the year and month, and convert to datetime :
df = pd.read_clipboard("\s{2,}",header=None,names=["Dates"])
pattern = r"(?P<Year>\d{4})[-]*(?P<Month>\d{1,2})"
df['Dates'] = pd.to_datetime([f"{year}-{month}" for year, month in df.Dates.str.extract(pattern).to_numpy()])
print(df)
Dates
0 2001-12-01
1 2002-09-01
2 2001-02-01
3 2001-05-01
4 2005-10-01
5 2007-08-01
Note that pandas automatically converts the day to 1, since only year and month was supplied.

String dates into unixtime in a pandas dataframe

i got dataframe with column like this:
Date
3 mins
2 hours
9-Feb
13-Feb
the type of the dates is string for every row. What is the easiest way to get that dates into integer unixtime ?

One idea is convert columns to datetimes and to timedeltas:
df['dates'] = pd.to_datetime(df['Date']+'-2020', format='%d-%b-%Y', errors='coerce')
times = df['Date'].replace({'(\d+)\s+mins': '00:\\1:00',
'\s+hours': ':00:00'}, regex=True)
df['times'] = pd.to_timedelta(times, errors='coerce')
#remove rows if missing values in dates and times
df = df[df['Date'].notna() | df['times'].notna()]
df['all'] = df['dates'].dropna().astype(np.int64).append(df['times'].dropna().astype(np.int64))
print (df)
Date dates times all
0 3 mins NaT 00:03:00 180000000000
1 2 hours NaT 02:00:00 7200000000000
2 9-Feb 2020-02-09 NaT 1581206400000000000
3 13-Feb 2020-02-13 NaT 1581552000000000000

How can I change the datetime values in a pandas df?

I have a pandas dataframe and datetime is used as an index in the following format: datetime.date(2018, 12, 31).
Each datetime represents the fiscal year end, i.e. 31/12/2018, 31/12/2017, 31/12/2016 etc.
However, for some companies the fiscal year end may be 30/11/2018 or 31/10/2018 and etc. instead of the last date of each year.
Is there any quick way in changing the non-standardized datetime to the last date of each year?
i.e. from 30/11/2018 to 30/12/2018 and 31/10/2018 to 31/12/2018 an so on.....

df = pd.DataFrame({'datetime': ['2019-01-02','2019-02-01', '2019-04-01', '2019-06-01', '2019-11-30','2019-12-30'],
'data': [1,2,3,4,5,6]})
df['datetime'] = pd.to_datetime(df['datetime'])
df['quarter'] = df['datetime'] + pd.tseries.offsets.QuarterEnd(n=0)
df
datetime data quarter
0 2019-01-02 1 2019-03-31
1 2019-02-01 2 2019-03-31
2 2019-04-01 3 2019-06-30
3 2019-06-01 4 2019-06-30
4 2019-11-30 5 2019-12-31
5 2019-12-30 6 2019-12-31
We have a datetime column with random dates I picked. Then we add a timeseries offset to the end of each date to make it quarter end and standardize the times.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How do I slice data before and after a certain date? - python

Related

How to calculate the quantity of business days between two dates using Pandas

First week of year considering the first day last year

parse multiple date format pandas

String dates into unixtime in a pandas dataframe

How can I change the datetime values in a pandas df?

Categories

Resources