Pandas dataframe get next (trading) day in dataframe - python

I have a date given that may or may not be a trading day, and I have a pandas dataframe indexed by trading days that has returns of each trading day.
This is my date
dt_query = datetime.datetime(2006, 12, 31, 16)
And I want to do something like this (returns is a pandas dataframe)
returns.ix[pd.Timestamp(dt_query + datetime.timedelta(days = 1))]
However, that may not work as one day ahead may or may not be a trading day. I could make a try block that loops and tries until we find something, but I'm wondering if there's a more elegant way that just uses pandas.

This might not be the most elegant solution, but it works.
Here's the idea: from any date dt_query, within a number of calendar days (say 10), there must be at least one trading day, and your next trading day is just the first among them. So you can find all days in returns between dt_query and dt_query + timedelta(days = 10), and then take the first one.
Using your example, it should look like
next_trading_date = returns.index[(returns.index > dt_query) & (returns.index <= dt_query + timedelta(days = 10))][0]

You can check the timedelta of the whole column doing:
delta = returns.column - dt_query
then use np.timedelta64() to define a tolerance used to check which rows you want to select:
tol = np.timedelta64(2, 'D')
and:
returns[delta < tol]
will return the rows within the desired range...

Thank you! That's been plaguing me for hours.
I altered it a bit:
try:
    date_check = dja[start_day]
except KeyError:
    print("Start date not a trading day, fetching next trading day...")
    test = dja.index.searchsorted(start_day)
    next_date = dja.index[(dja.index > start_day)]
    start_date = next_date[0]
    print("New date:", start_date)
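The searchsorted call in the snippet above is computed but never used. On a sorted DatetimeIndex it can give the position of the next trading day directly, with no boolean mask and no try/except. A minimal sketch, assuming the index is sorted; the sample frame (business days standing in for trading days) and the helper name next_trading_day are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data: business days stand in for real trading days.
index = pd.bdate_range("2006-12-25", "2007-01-10")
returns = pd.DataFrame({"ret": range(len(index))}, index=index)


def next_trading_day(index, when):
    """Return the first entry of a sorted DatetimeIndex strictly after `when`."""
    # side="right" skips an entry that is exactly equal to `when`.
    pos = index.searchsorted(pd.Timestamp(when), side="right")
    if pos == len(index):
        raise KeyError(f"no trading day after {when}")
    return index[pos]


dt_query = pd.Timestamp(2006, 12, 31, 16)
next_day = next_trading_day(returns.index, dt_query)
print(next_day, returns.loc[next_day, "ret"])
```

Because the lookup is a binary search, it stays cheap even for long histories.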

Related

Filter Django query by week of month?

In a Django query, how would you filter by a timestamp's week within a month?
There's a built-in week accessor, but that refers to week-of-the-year, e.g. 1-52. As far as I can tell, there's no other built-in option.
The only way I see to do this is to calculate the start and end date range for the week, and then filter on that using the conventional means.
So I'm using a function like:
def week_of_month_date(year, month, week):
    """
    Returns the date of the first day in the week of the given date's month,
    where Monday is the first day of the week.
    e.g. week_of_month_date(year=2022, month=8, week=2) -> date(2022, 8, 7)
    """
    assert 1 <= week <= 5
    assert 1 <= month <= 12
    for i in range(1, 32):
        dt = date(year, month, i)
        _week = week_of_month(dt)
        if _week == week:
            return dt
and then to calculate for, say, the 3rd week of July, 2022, I'd do:
start_date = week_of_month_date(2022, 7, 3)
end_date = week_of_month_date(2022, 7, 3) + timedelta(days=7)
qs = MyModel.objects.filter(created__gte=start_date, created__lte=end_date)
Is there an easier or more efficient way to do this with the Django ORM or SQL?
The easiest way to do this using datetime objects is to subtract the ISO week number of the 1st day of the month from the ISO week number of the current date, then add 1.
You can use the .isocalendar() function to achieve this:
dt.isocalendar()[1] - dt.replace(day=1).isocalendar()[1] + 1
For example, if the date falls in week 46 and the first day of the month falls in week 45, the result is 2.
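One caveat with the ISO-week subtraction: in January the 1st can belong to the last ISO week of the previous year (for example, 2022-01-01 falls in ISO week 52 of 2021), which makes the difference negative. A minimal sketch that avoids ISO weeks altogether, assuming weeks start on Monday as in the question. The week_of_month below is a hypothetical helper and not necessarily the asker's own function of that name:

```python
from datetime import date


def week_of_month(dt):
    """1-based week of the month for `dt`, with weeks starting on Monday."""
    first = dt.replace(day=1)
    # first.weekday() is 0 for Monday, so adding it shifts the day count by
    # the number of days missing from the first (possibly partial) week.
    return (dt.day - 1 + first.weekday()) // 7 + 1


print(week_of_month(date(2022, 8, 8)))   # second week of August 2022
print(week_of_month(date(2022, 1, 10)))  # January, where ISO weeks wrap
```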
UPDATE
I misunderstood the question, the answer is clear below. However, you may want to consider revising your function based on my above comments.
Come to think of it, if you have a datetime object, you can get the isocalendar week and filter using that like so:
MyModel.objects.filter(created__week=dt.isocalendar()[1])
dt.isocalendar() returns essentially a tuple of 3 integers: [0] is the ISO year, [1] is the ISO week (1-52 or 53), and [2] is the day of the week (1-7).
As per the docs here:
https://docs.djangoproject.com/en/4.1/ref/models/querysets/#week
There is a built-in filter for isoweek out of the box :)
However, filtering by "week of month" is not possible within the realms of "out of the box".
You might consider writing your own query expression object which accepts an isocalendar object and converts that? But I think you would be better off converting a datetime object and use the isoweek filter.
There's a neat little blog post here to get you started if you really want to do that:
https://dev.to/idrisrampurawala/writing-custom-django-database-functions-4dmb
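If a plain date-range filter is enough, the bounds of a given week of the month can be computed arithmetically instead of looping over the days. A minimal sketch, assuming Monday-first weeks and that a partial first or last week should be clipped to the month; week_of_month_range is a hypothetical helper name, while MyModel and created come from the question:

```python
from datetime import date, timedelta


def week_of_month_range(year, month, week):
    """Return (start, end) for the given week of the month, end exclusive."""
    first = date(year, month, 1)
    # Monday of the calendar week that contains the 1st of the month.
    week1_monday = first - timedelta(days=first.weekday())
    start = week1_monday + timedelta(weeks=week - 1)
    end = start + timedelta(days=7)
    # First day of the following month, used to clip the last week.
    next_month = date(year + month // 12, month % 12 + 1, 1)
    return max(start, first), min(end, next_month)


start_date, end_date = week_of_month_range(2022, 7, 3)
print(start_date, end_date)
# In Django this would then be used with a half-open range, e.g.:
# qs = MyModel.objects.filter(created__gte=start_date, created__lt=end_date)
```

Using created__lt with an exclusive end avoids counting the first day of the following week twice.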

Find the time difference between days

Dataframe
I have different machines running for different numbers of hours, and a run might cross over midnight. I want to split each run's hours by calendar day.
Example: Machine A runs for 8 hours, from 12-Aug 9pm to 13-Aug 5am.
I can't get the correct split, which should be 3 hours on 12-Aug and 5 hours on 13-Aug.
I suspect that's because I'm using datetime.now.
How do I make the date the same as the Start date/End date in Python?
Here is my code:
endoftoday = datetime.now()
endoftoday = endoftoday.replace(hour=23,minute=59,second=59)
dt['Start_Date']=dt['Start_Time'].dt.strftime('%d/%m/%Y')
dt['End_Date']=dt['Finish_Time'].dt.strftime('%d/%m/%Y')
if (dt['Start_Date'].str == dt['End_Date'].str):
    dt['Tested_Time_Today']= endoftoday-dt['Start_Time']
    dt['Tested_Time_NextDay']= dt['Finish_Time'] - endoftoday
Here is my attempt:
import pandas as pd
import datetime

def get_times(args):
    start_time, end_time, start_date, end_date = args
    hours = {}
    for day in pd.date_range(start_date, end_date, freq='D'):
        # Overlap between the run and the 24-hour window starting at midnight of `day`
        hours[day] = min(end_time, day + datetime.timedelta(hours=24)) - max(start_time, day)
    return hours

df = pd.DataFrame({'Start_Time': [datetime.datetime(2021,8,21,6,2), datetime.datetime(2021,8,21,7,19)], 'Finish_Time': [datetime.datetime(2021,8,22,5,12), datetime.datetime(2021,8,21,16,50)], 'Start_Date': [datetime.date(2021,8,21), datetime.date(2021,8,21)], 'End_Date': [datetime.date(2021,8,22), datetime.date(2021,8,21)]})
df['hours'] = df.apply(get_times, axis=1)
print(df)
This is probably not exactly what you are looking for, since I don't understand your question well enough. But what you get is a new column which contains in each row a dictionary with the dates as keys and the time run during each day as values.
If you let us know what exactly you are after, I might be able to improve the answer.
Edit: Because each day's window is clamped to the run's start and finish, this also covers runs that span more than two days. If you have more columns than the ones the calculation uses, please change the penultimate line to df['hours'] = df[['Start_Time', 'Finish_Time', 'Start_Date', 'End_Date']].apply(get_times, axis=1)

How to count days belonging to a given month in the range of two Python datetimes?

I have two Python datetime and I want to count the days between those dates, counting ONLY the days belonging to the month I choose. The range might overlap multiple months/years.
Example:
If I have 2017-10-29 & 2017-11-04 and I chose to count the days in October, I get 3 (29, 30 & 31 Oct.).
I can't find a straightforward way to do this so I think I'm going to iterate over the days using datetime.timedelta(days=1), and increment a count each time the day belongs to the month I chose.
Do you know a more performant method?
I'm using Python 2.7.10 with the Django framework.
Iterating over the days would be the most straightforward way to do it. Otherwise, you would need to know how many days are in a given month and you would need different code for different scenarios:
The given month is the month of the first date
The given month is the month of the second date
The given month is between the first and the second date (if dates span more than two months)
If you want to support dates spanning more than one year then you would need the input to include month and year.
Your example fits scenario #1, which I guess you could do like this:
>>> from datetime import datetime, timedelta
>>>
>>> first_date = datetime(2017, 10, 29)
>>>
>>> first_day_of_next_month = first_date.replace(month=first_date.month + 1, day=1)
>>> last_day_of_this_month = first_day_of_next_month - timedelta(1)
>>> number_of_days_in_this_month = last_day_of_this_month.day
>>> number_of_days_in_this_month - first_date.day + 1
3
This is why I would suggest implementing it the way you originally intended and only turning to this if there's a performance concern.
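The three scenarios listed above collapse into one if the date range is simply intersected with the chosen month. A minimal sketch using only the standard library, assuming both end points are inclusive and that the caller passes the year along with the month; count_days_in_month is a hypothetical helper name:

```python
import calendar
from datetime import date, datetime


def count_days_in_month(start, end, year, month):
    """Count the days of [start, end] (inclusive) that fall in year/month."""
    # Accept datetimes as well as dates by dropping the time part.
    if isinstance(start, datetime):
        start = start.date()
    if isinstance(end, datetime):
        end = end.date()
    month_start = date(year, month, 1)
    month_end = date(year, month, calendar.monthrange(year, month)[1])
    lo = max(start, month_start)
    hi = min(end, month_end)
    # A negative span means the range and the month do not overlap.
    return max((hi - lo).days + 1, 0)


print(count_days_in_month(datetime(2017, 10, 29), datetime(2017, 11, 4), 2017, 10))
```

calendar.monthrange supplies the month length, so December and leap years need no special handling.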
You can get the difference between two datetime objects by simply subtracting them.
So, we start by getting the number of days between the two dates.
And then we generate all the dates between the two using
gen = (start_date + datetime.timedelta(days = e) for e in range(diff + 1))
And since we only want the dates in the chosen month (October here), we apply a filter.
filter(lambda x : x.month == 10 , gen)
Then we count them.
And the final code is this:
diff = (end_date - start_date).days
gen = (start_date + datetime.timedelta(days = e) for e in range(diff + 1))
filtered_dates = filter(
    lambda x : x.month == 10 ,
    gen
)
count = sum(1 for e in filtered_dates)
You can also use reduce but sum() is a lot more readable.
A potential method of achieving this is to first compare whether your start or end dates you are comparing have the same month that you want to choose.
For example:
start = datetime(2017, 10, 29)
end = datetime(2017, 11, 4)
We create a function to compare the dates like so:
def daysofmonth(start, end, monthsel):
    if start.month == monthsel:
        days = (datetime(start.year, monthsel+1, 1) - start).days
    elif end.month == monthsel:
        days = (end - datetime(end.year, monthsel, 1)).days
    elif not ((monthsel > start.month) and (end.month > monthsel)):
        return 0
    else:
        days = (datetime(start.year, monthsel+1, 1) - datetime(start.year, monthsel, 1)).days
    return days
So, in our example, setting monthsel to 10 gives:
>>> daysofmonth(start, end, 10)
3
Using pandas with your dates:
import pandas as pd
from datetime import datetime
first_date = datetime(2017, 10, 29)
second_date = datetime(2017, 11, 4)
days_count = (second_date - first_date).days
month_date = first_date.strftime("%Y-%m")
values = pd.date_range(start=first_date,periods=days_count,freq='D').to_period('M').value_counts()
print(values)
print(values[month_date])
outputs
2017-10 3
2017-11 3
3

Mapping Values in a pandas Dataframe column?

I am trying to filter out some data and seem to be running into some errors.
Below this statement is a replica of the following code I have:
url = "http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv"
source = requests.get(url).text
s = StringIO(source)
election_data = pd.DataFrame.from_csv(s, index_col=None).convert_objects(
convert_dates="coerce", convert_numeric=True)
election_data.head(n=3)
last_day = max(election_data["Start Date"])
filtered = election_data[((last_day-election_data['Start Date']).days <= 5)]
As you can see last_day is the max within the column election_data
I would like to filter out the data in which the difference between
the max and x is less than or equal to 5 days
I have tried using for - loops, and various combinations of list comprehension.
filtered = election_data[map(lambda x: (last_day - x).days <= 5, election_data["Start Date"]) ]
This line would normally work however, python3 gives me the following error:
<map object at 0x10798a2b0>
Your first attempt has it almost right. The issue is
(last_day - election_data['Start Date']).days
which should instead be
(last_day - election_data['Start Date']).dt.days
Series objects do not have a days attribute; on a Series, the timedelta components are reached through the .dt accessor (only Timedelta and TimedeltaIndex objects expose days directly). A fully working example is below.
data = pd.read_csv(url, parse_dates=['Start Date', 'End Date', 'Entry Date/Time (ET)'])
data.loc[(data['Start Date'].max() - data['Start Date']).dt.days <= 5]
Note that I've used Series.max which is more performant than the built-in max. Also, data.loc[mask] is slightly faster than data[mask] since it is less-overloaded (has a more specialized use case).
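To see the difference without downloading the CSV, here is a small self-contained sketch. The frame below is made-up stand-in data; only the "Start Date" column name comes from the question. It shows the .dt.days mask, an equivalent comparison against a pd.Timedelta, and the Python 3 fix for the map attempt (map now returns a lazy iterator, so it has to be turned into a list before it can serve as a mask):

```python
import pandas as pd

# Hypothetical stand-in for the poll data.
election_data = pd.DataFrame({
    "Start Date": pd.to_datetime(
        ["2012-10-20", "2012-10-30", "2012-11-01", "2012-11-04"]),
    "Obama": [47, 48, 49, 50],
})

last_day = election_data["Start Date"].max()
age = last_day - election_data["Start Date"]  # Series of timedeltas

# Option 1: whole days through the .dt accessor.
recent = election_data.loc[age.dt.days <= 5]

# Option 2: compare the timedeltas themselves.
recent_alt = election_data.loc[age <= pd.Timedelta(days=5)]

# Option 3: the original map idea, materialised into a list of booleans.
mask = list(map(lambda x: (last_day - x).days <= 5, election_data["Start Date"]))
recent_map = election_data[mask]

print(recent)
```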
If I understand your question correctly, you just want to keep the rows whose Start Date value is at most 5 days before the last day. This sounds like something pandas indexing could easily handle, using .loc.
If you want an entirely new DataFrame object with the filtered data:
election_data # your frame
last_day = max(election_data["Start Date"])
date = last_day - pd.Timedelta(days=5) # the date 5 days before the last day
new_df = election_data.loc[election_data["Start Date"] >= date]
Or if you just want the Start Date column post-filtering:
last_day = max(election_data["Start Date"])
date = last_day - pd.Timedelta(days=5) # the date 5 days before the last day
filtered_dates = election_data.loc[election_data["Start Date"] >= date, "Start Date"]
Note that your date variable needs to be comparable with the values in Start Date (that is, a datetime rather than a string). If you would rather hard-code it, just print(last_day) and count 5 days back.

Divide two timespans by eachother in Python/dateutil

please consider this problem:
I have a date in the past from where I start adding periods. When adding these periods results in a date greater than today, I want to stop, and check what the last date is.
This is a functionality for calculating debit dates in a membership. A member joins, say, 2007-01-31. He is debited every month. Let's say today is 2013-03-29 (it actually is atm). So I need to start counting months since 2007-01-31 and when I get past today's date, I need to stop. I can then see that the next debit date is 2013-03-31.
I am using the dateutil library to implement this, adding relativedeltas in a while loop until I surpass the current date. (I know it's probably not the best way, but I'm still quite new at Python and this is a proof-of-concept.) The problem is that when I add a month to 2007-01-31, the next date is 2007-02-28, which is correct. But on the next iteration the date is 2007-03-28, because dateutil doesn't recognize the 28th as the last day of the month, so it doesn't move on to the last day of March. Of course, that's a perfectly valid implementation. I then experimented with dateutil's rrule object, but it follows the same principles. It outputs a list of dates, but it simply skips the months that don't have enough days.
period = rrule(MONTHLY, interval=1, dtstart=datetime.date(2012, 5, 31), until=datetime.date(2013, 3, 29))
print(list(period))
Then I thought of a different approach:
If I could count the number of periods in the timespan between 2007-01-31 and 2013-03-29, I can add those number of periods to the startdate, and dateutil would return the right date.
The problem there is that the period isn't always one month. It can also be four weeks, a quarter or a year, for example.
I couldn't find a way to take a relativedelta and divide it with another relativedelta, to get a number of times the latter goes in the first.
If anyone can point me in the right direction I would appreciate it. Is there, for example, a library that can do this (divide timespans by each other, and output the result in a given timeblock, like months or weeks)? Is there perhaps a datediff function that accepts a period as an input (I know for example in vbscript you can get the difference between two dates in whatever period you want, be it weeks, months, days, whatever). Is there perhaps a totally different solution?
For completeness, I will include the code, but I think the text explains it all already:
def calculate(self, testdate=datetime.date.today()):
    self._testdate = testdate
    while self.next < self._testdate:
        self.next += self._ddinterval
    self.previous = self.next - self._ddinterval
    return self.next
Thanks,
Erik
edit: I now have a solution that does what it's supposed to, but it's hardly Pythonic, elegant or speedy. So the question remains the same, if anyone can come up with a better solution, please do. Here's what I came up with:
def calculate(self, testdate=datetime.date.today()):
    self._testdate = testdate
    start = self.next
    count = 0
    while self.next < self._testdate:
        count += 1
        self.next = start + (count * self._ddinterval)
    self.previous = self.next - self._ddinterval
    return self.next
Instead of using the new value after adding your period as the starting point for the next loop, create an ever-increasing delta instead:
today = datetime.date.today()
start = datetime.date(2007, 1, 31)
period = relativedelta(months=1)
delta = period
while start + delta < today:
    delta += period
next = start + delta
This results in:
>>> import datetime
>>> from dateutil.relativedelta import relativedelta
>>> today = datetime.date.today()
>>> start = datetime.date(2007, 1, 31)
>>> period = relativedelta(months=1)
>>> delta = period
>>> while start + delta < today:
...     delta += period
...
>>> delta
relativedelta(years=+6, months=+3)
>>> start + delta
datetime.date(2013, 4, 30)
This works for any variable period lengths including quarters and years. For periods measured in exact weeks or days you can use a timedelta() object too, but this is a generic solution.
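For monthly periods the same idea (always add to the original start date, never to the previous result) can also be written without a loop and without dateutil. A minimal sketch using only the standard library; add_months and next_debit_date are hypothetical helper names, and the day-clamping rule is meant to mirror what relativedelta(months=n) does:

```python
import calendar
from datetime import date


def add_months(d, n):
    """Add n calendar months to d, clamping the day to the month's length."""
    year, month_index = divmod(d.year * 12 + d.month - 1 + n, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))


def next_debit_date(start, today):
    """First monthly debit date on or after `today`, counted from `start`."""
    # Number of whole calendar months between the two dates, never negative.
    n = max((today.year - start.year) * 12 + (today.month - start.month), 0)
    candidate = add_months(start, n)
    if candidate < today:
        # This month's debit date has already passed; take next month's.
        candidate = add_months(start, n + 1)
    return candidate


print(next_debit_date(date(2007, 1, 31), date(2013, 3, 29)))
```

With the dates from the question this lands on 2013-03-31, the debit date the asker expected.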
You cannot 'divide' time periods when using variable-width periods such as months and quarters. For time periods measured in whole days you can simply convert the difference between two dates to days (from the timedelta), take the modulus, and add whatever is left of the current period to today:
period = datetime.timedelta(days=28) # 4 weeks
delta = today - start
remainder = delta.days % period.days # days already elapsed in the current period
end = today + datetime.timedelta(days=(period.days - remainder) % period.days)
which gives:
>>> period = datetime.timedelta(days=28) # 4 weeks
>>> delta = today - start
>>> remainder = delta.days % period.days
>>> today + datetime.timedelta(days=(period.days - remainder) % period.days)
datetime.date(2013, 4, 17)
If your delta is variable with respect to base time (i.e., 1 month can mean any of 28-31 days depending), then you're stuck with a loop.
If delta is a constant day count, however, you can sidestep iteration by converting to integers and doing a modulo operation.
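For the constant-day-count case, the modulo trick amounts to a ceiling division: count how many whole periods are needed to reach or pass today, then add that many periods to the start date. A minimal sketch using only the standard library; next_fixed_period_date is a hypothetical helper name:

```python
from datetime import date, timedelta


def next_fixed_period_date(start, today, period_days):
    """First date on or after `today` that is a whole number of periods from `start`."""
    elapsed = (today - start).days
    # Ceiling division with integers: -(-a // b) rounds a / b upwards.
    periods = -(-elapsed // period_days)
    return start + timedelta(days=periods * period_days)


print(next_fixed_period_date(date(2007, 1, 31), date(2013, 3, 29), 28))
```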
