django number of consecutive years - python

I am working on a page with stocks and dividends. So to simplify, my model is something like this:
class Stock(models.Model):
    name = models.CharField("Stock's name", max_length=200)

class Dividend(models.Model):
    date = models.DateField('pay date')
    amount = models.DecimalField(max_digits=20, decimal_places=10)
    stock = models.ForeignKey(Stock)
I want to calculate, for each stock, the number of consecutive years in which a dividend was paid, counting back from the most recent year. The mere existence of a dividend in a given year is enough. If a company paid dividends in 2000-2005 and 2008-2014, I want to get 7. What is the best way to calculate it? I came up with:
1. making query for each year if there is any dividend (too many requests)
2. using values() or values_list() to get only list of distinct ordered years and then iterating over that list
I would go with number 2. Is there a better way to use a queryset to calculate this value?
Edit:
3. I have only just noticed the queryset method dates().

I have thought about your comments and this is what I think.
remram: Raw SQL seems unnecessarily difficult.
dylrei: At first I thought it was a great idea, but now I am not so sure. There might be one dividend 70 years ago and then 69 consecutive years with no dividends. There are a lot of stocks that do not pay dividends, and I should be able to get 0 years quickly. Also, I do not know how to get the years in which no dividends were paid in Django.
This is what I came up with:
from datetime import date

def get_years_paying(symbol):
    """
    Get the number of consecutive years paying dividends for a Stock by symbol.
    :param symbol: stock symbol
    :return: number of years, or None if there are none
    """
    num_years = 0
    iteration_year = date.today().year
    dividend_years = Dividend.objects.filter(
        stock__symbol=symbol, spinoff=False, special=False, split=False
    ).dates(
        "date", "year", order='DESC'
    )
    dividend_years_list = [i.year for i in dividend_years]
    while True:
        if iteration_year in dividend_years_list:
            num_years += 1
        elif iteration_year == date.today().year:
            pass  # current year is optional, there will be no dividends on 1 January
        else:
            return num_years if num_years != 0 else None
        iteration_year -= 1
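The counting step can also be separated from the query so that it is easy to test on its own. The following is a minimal sketch rather than the asker's code: count_consecutive_years and its parameters are hypothetical names, and it assumes the distinct years have already been fetched (for example with dates("date", "year")). Unlike the function above, it returns 0 rather than None when there is no streak.

```python
def count_consecutive_years(years, current_year):
    """Length of the most recent unbroken run of years with a dividend.

    years: iterable of ints, the years in which any dividend was paid.
    current_year: the year to count back from. It is optional in the
    streak, because early in the year nothing may have been paid yet.
    """
    paid = set(years)  # set membership is O(1), unlike a list
    # Start from the current year if it already has a dividend,
    # otherwise from the previous year.
    year = current_year if current_year in paid else current_year - 1
    streak = 0
    while year in paid:
        streak += 1
        year -= 1
    return streak


# Example from the question: dividends in 2000-2005 and 2008-2014.
example_years = list(range(2000, 2006)) + list(range(2008, 2015))
print(count_consecutive_years(example_years, 2014))
```

Because the loop stops at the first missing year, a stock with a single dividend 70 years ago costs one set lookup, not 70.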

Related

Pandas calculating sales for recurring monthly payments

I have a dataset with millions of records just like below
CustomerID  StartTime   EndTime
1111        2015-7-10   2016-3-7
1112        2016-1-5    2016-1-19
1113        2015-10-18  2020-9-1
This dataset contains the information for different subscription contracts, and it is assumed that:
1. If the contract is active, the customer needs to pay a monthly fee in advance. The first payment is collected on the start date.
2. If the contract ends before the next payment date, which is exactly one month after the last payment date, the customer does not need to pay the next subscription fee. For instance, customer 1112 only needs to pay once.
3. The monthly payment fee is $10.
In this situation, I need to calculate the monthly/quarterly/annual sales between 2015 and 2020. It is ideal to also show the breakdown of sales by different customer IDs so that subsequent machine learning tasks can be performed.
Importing data (I saved your table as a .csv in Excel, which is the reason for the specific formatting of the pd.to_datetime):
import pandas as pd
import numpy as np
df = pd.read_csv("Data.csv", header=0)
# convert columns "to_datetime"
df["StartTime"] = pd.to_datetime(df["StartTime"], format="%d/%m/%Y")
df["EndTime"] = pd.to_datetime(df["EndTime"], format="%d/%m/%Y")
Calculate the number of months between the start and end dates (+1 at the end because there will be a payment even if the contract is not active for a whole month, because it is in advance):
df["Months"] = ((df["EndTime"] - df["StartTime"])/np.timedelta64(1, 'M')).astype(int) + 1
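Note that dividing by np.timedelta64(1, 'M') relies on an approximate, fixed month length, and, as far as I know, newer pandas versions may not accept the 'M' unit for this at all, so it is worth verifying on your installation. As a cross-check, here is a small standard-library sketch that counts payments by calendar month. The names add_months and count_payments are hypothetical, and it assumes that a payment is due when the contract is still active on the payment date itself (the question does not say what happens on the exact boundary).

```python
import calendar
from datetime import date


def add_months(start, n):
    """Return the date n calendar months after start, clamping the day
    to the last day of the target month (e.g. Jan 31 + 1 month -> Feb 28/29)."""
    month_index = start.month - 1 + n
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def count_payments(start, end):
    """Number of monthly payments collected in advance between start and end.

    Assumption: a payment is collected if its due date is on or before end.
    """
    n = 0
    while add_months(start, n) <= end:
        n += 1
    return n


# The three customers from the question.
print(count_payments(date(2015, 7, 10), date(2016, 3, 7)))
print(count_payments(date(2016, 1, 5), date(2016, 1, 19)))
print(count_payments(date(2015, 10, 18), date(2020, 9, 1)))
```

For millions of rows a loop like this would be slow; it is meant for spot-checking the vectorised result, not for replacing it.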
Generate a list of payment dates (from the start date, for the given number of months, one month apart). The pd.tseries.offsets.DateOffset(months=1) will ensure that the payment date is on the same day every month, rather than the default end-of-month if freq="M".
df["PaymentDates"] = df.apply(lambda x: list(pd.date_range(start=x["StartTime"], periods=x["Months"], freq=pd.tseries.offsets.DateOffset(months=1)).date), axis=1)
Create a new row for each payment date, add a payment column of 10, then pivot so that the CustomerID is the column, and the date is the row:
df = df.explode("PaymentDates").reset_index(drop=True)
df["PaymentDates"] = pd.to_datetime(df["PaymentDates"])
df["Payment"] = 10
df = pd.pivot_table(df, index="PaymentDates", columns="CustomerID", values="Payment")
Aggregate for month, quarter, and year sales (this will be an aggregation for each individual CustomerID). You can then sum by row to get a total amount:
months = df.groupby([df.index.year, df.index.month]).sum()
quarters = df.groupby([df.index.year, df.index.quarter]).sum()
years = df.groupby(df.index.year).sum()
# total sales
months["TotalSales"] = months.sum(axis=1)
quarters["TotalSales"] = quarters.sum(axis=1)
years["TotalSales"] = years.sum(axis=1)
I realise this may be slow for the df.apply if you have millions of records, and there may be other ways to complete this, but this is what I have thought of.
You will also have a lot of columns if there are many millions of customers, but this way you will keep all the CustomerID values separate and be able to know which customers made payments in a given month.
After the number of months is calculated in df["Months"], you could then multiply this by 10 to get the total sales for each customer.
If this is the only data you need for each customer individually, you would not need to pivot the data at all, just aggregate on the "PaymentDates" column, count the number of rows and multiply by 10 to get the sales for month, quarter, year.
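A minimal sketch of that non-pivoted aggregation follows. The payments frame here is made-up illustration data standing in for the exploded frame (one row per payment date), and sales_by_period and FEE are hypothetical names.

```python
import pandas as pd

FEE = 10  # monthly fee from the question

# Hypothetical long-format data: one row per customer per payment date.
payments = pd.DataFrame({
    "CustomerID": [1111, 1111, 1112, 1113],
    "PaymentDates": pd.to_datetime(
        ["2015-12-10", "2016-01-10", "2016-01-05", "2016-02-18"]
    ),
})


def sales_by_period(frame, freq):
    """Count payment rows per period ("M", "Q" or "Y") and multiply by the fee."""
    periods = frame["PaymentDates"].dt.to_period(freq)
    return frame.groupby(periods).size() * FEE


monthly = sales_by_period(payments, "M")
quarterly = sales_by_period(payments, "Q")
yearly = sales_by_period(payments, "Y")
print(monthly)
```

Grouping on ["CustomerID", periods] instead would keep the per-customer breakdown in long form without creating one column per customer.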

getting the age from RFC Mex

I have a data frame from which I'm trying to get the age of each user. The problem is that there is no birth date column. However, in my country there is a kind of tax ID (the RFC) from which this can be derived:
ABCD971021XZY or ABCD971021
The first 4 letters represent the name and last name, and the digits are the date of birth.
In the case above that would be 1997/10/21.
At this point I've already tried this:
# To slice the RFC
df_v['new_column'] = df_v['RFC'].apply(lambda x: x[4:10])
# Trying to get the date
from datetime import datetime, timedelta
s = "971021"
date = datetime(year=int(s[0:2]), month=int(s[2:4]), day=int(s[4:6]))
OUT: 0097-10-21
What I'm looking for is something like this:
1997-10-21
The problem is that the millennium and century are not given explicitly in the tax ID, and there is no single way to convert a two-digit year to a four-digit year.
For example, 971021 tells you that the birth year is xx97, but for all datetime knows, that could mean the year 1597 or 1097 or 2397.
You as the programmer will have to decide how to encode your assumptions about which millennium and century a person was most likely born in. For example, a simplistic (untested) solution could be:
year_last_two = int(s[0:2])
# If the year given is less than 20, this person was most likely born in the 2000's
if year_last_two < 20:
    year = 2000 + year_last_two
# Otherwise, the person was most likely born in the 1900's
else:
    year = 1900 + year_last_two
date = datetime(year=year, month=int(s[2:4]), day=int(s[4:6]))
Of course, this solution only applies in 2019, and also assumes no one is more than 100 years old. You could make it better by using the current year as the splitting point.
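Using the current date as the splitting point could look roughly like this. It is a sketch under the same assumption that nobody is more than 100 years old; birth_date_from_rfc and age_on are hypothetical helper names, and the optional today parameter exists only to make the result reproducible.

```python
from datetime import date


def birth_date_from_rfc(rfc, today=None):
    """Parse the YYMMDD part of an RFC (characters 4 to 9) into a date.

    Assumption: the person is less than 100 years old, so the birth date
    is the most recent matching date that is not in the future.
    """
    today = today or date.today()
    digits = rfc[4:10]
    yy, mm, dd = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    year = today.year // 100 * 100 + yy  # try the current century first
    if (year, mm, dd) > (today.year, today.month, today.day):
        year -= 100  # would be in the future, so use the previous century
    return date(year, mm, dd)


def age_on(birth, today):
    """Age in whole years on the given day."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


reference_day = date(2019, 6, 1)
born = birth_date_from_rfc("ABCD971021XZY", today=reference_day)
print(born, age_on(born, reference_day))
```

On a data frame this could be applied with df_v['RFC'].apply(birth_date_from_rfc), at the cost of a Python-level loop over the rows.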

Django queryset : Calculate monthly average

I have a sales model and I want to calculate (number of transactions)/(number of days) when grouped by month, week, or year.
class SaleItem(models.Model):
    id = models.UUIDField(default=uuid4, primary_key=True)
    bill = models.ForeignKey()
    item = models.ForeignKey('item')
    quantity = models.PositiveSmallIntegerField()
    price = models.DecimalField(max_digits=13, decimal_places=3, default=0)
So if sales are grouped by month, this becomes (# transactions / # days in that month) for each month. If sales are grouped by year, this becomes (# transactions / # days in that year).
Currently I can get the number of transactions:
aggregate = 'month' # parameter
# get number of transactions
SaleItem.objects.annotate(date=Trunc('bill__date', aggregate)).values('date').annotate(sales=Count('bill', distinct=True))
But how can I divide each count by the number of days in that group?
Doing it in SQL is possible (and not even that difficult). Getting the number of days in a month is RDBMS-specific though, and there is no generic Django database function to shield you from the various SQL implementations.
Django makes it very easy to wrap your own functions around SQL functions. For instance, for SQLite, you can define
from django.db.models import Func, IntegerField

class DaysInMonth(Func):
    output_field = IntegerField()

    def as_sqlite(self, compiler, connection):
        # Last day of the month: start of month, plus one month, minus one day
        return super().as_sql(
            compiler,
            connection,
            function='strftime',
            template='''
                %(function)s("%%%%d",
                    %(expressions)s,
                    "start of month",
                    "+1 month",
                    "-1 day")
            ''',
        )
Then you can use DaysInMonth() to divide your count by the number of days:
qs = (
    SaleItem.objects
    .annotate(date=Trunc('bill__date', aggregate))
    .values('date')
    .annotate(
        sales=Count('bill', distinct=True),
        sales_per_day=F('sales') / DaysInMonth('date')
    )
)
If a rounded-down integer is not sufficient and you need a decimal result, this is another hoop to jump through:
sales_per_day=ExpressionWrapper(
    Cast('sales', FloatField()) / DaysInMonth(F('date')),
    DecimalField()
)
If, heaven forbid, you want to round in the database rather than in your template, you need another custom function:
class Round(Func):
    function = 'ROUND'
    output_field = FloatField()
    arity = 2

sales_per_day=Round(
    Cast('sales', FloatField()) / DaysInMonth(F('date')),
    2  # decimal precision
)
So Django is really flexible but, as Willem said, doing it in Python would save you some pain without losing significant performance (if any at all).
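For the Python route, a minimal sketch could post-process the rows returned by the counting query from the question. The helper names days_in_period and add_sales_per_day are hypothetical, and the sketch assumes each row is a dict with 'date' (the truncated period start) and 'sales' keys, as .values('date') followed by the annotation would produce.

```python
import calendar
from datetime import date


def days_in_period(start, kind):
    """Number of days in the week, month or year that begins at start."""
    if kind == 'week':
        return 7
    if kind == 'month':
        return calendar.monthrange(start.year, start.month)[1]
    if kind == 'year':
        return 366 if calendar.isleap(start.year) else 365
    raise ValueError("unsupported period: %r" % kind)


def add_sales_per_day(rows, kind, ndigits=2):
    """Return copies of the rows with a rounded 'sales_per_day' entry added."""
    result = []
    for row in rows:
        days = days_in_period(row['date'], kind)
        result.append(dict(row, sales_per_day=round(row['sales'] / days, ndigits)))
    return result


# Stand-in for the queryset rows (made-up counts).
rows = [
    {'date': date(2020, 2, 1), 'sales': 58},
    {'date': date(2020, 3, 1), 'sales': 31},
]
print(add_sales_per_day(rows, 'month'))
```

This works for any database backend, since the day count no longer depends on RDBMS-specific date functions.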

Django filter using two conditions

I am developing a web app using Django. Among other tables, I have a table called GeneralContract, which has issueDate and Expiration Date as date fields.
I want to find out the profit that an insurance agent would get from these contracts within a given period. For example, if the date range is 15 January 2015 - 25 February 2015, I would like to filter all GeneralContract objects that were issued in ANY year between these two calendar dates.
(i.e. issueDate__month = Date1.month AND IssueDate.day_gte= Date1.day) AND (IssueDate__month = Date2.month AND IssueDate.day_lte = Date2.day) ??
I tried the following but it is not giving me the results I wanted and I am not sure if I am writing this in the correct syntax or if my logic is wrong.
criterion1 = Q(issuedate__month=date1.month)
criterion2 = Q(issuedate__day__gte=date1.day)
criterion3 = Q(issuedate__month=date2.month)
criterion4 = Q(issuedate__day__lte=date2.day)
criterionA = criterion1 & criterion2
criterionB = criterion3 & criterion4
criterionC = criterionA & criterionB
currentGenProfits = GeneralContract.objects.filter(criterionC, cancelled=False)
Is this the right way of doing this filtering logic?
The syntax is right but the logic is wrong. Combining all four criteria with AND requires the month to equal both date1.month and date2.month, and the day to be both >= date1.day and <= date2.day. If date1.day = 5 and date2.day = 4, for example, you will get no results. You must check the month and the day together.
I suggest starting with the largest set and then narrowing it down.
First filter on contracts issued between the two months, then exclude those in the first month that fall before the minimum day and those in the last month that fall after the maximum day. I think that should do the job.
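The idea of checking month and day together can be illustrated in plain Python by comparing (month, day) pairs. This is only a sketch of the logic, not a queryset: in_annual_window is a hypothetical name, and the same conditions would still have to be translated into Q objects (month strictly between the two months, OR first month with day >= the start day, OR last month with day <= the end day).

```python
from datetime import date


def in_annual_window(d, start, end):
    """True if d falls between start and end when the year is ignored.

    Comparing (month, day) tuples checks month and day together.
    A window that wraps over New Year (e.g. 15 Dec to 10 Jan) is handled too.
    """
    key = (d.month, d.day)
    lo = (start.month, start.day)
    hi = (end.month, end.day)
    if lo <= hi:
        return lo <= key <= hi
    return key >= lo or key <= hi  # window crosses the year boundary


window_start, window_end = date(2015, 1, 15), date(2015, 2, 25)
print(in_annual_window(date(2012, 2, 1), window_start, window_end))
print(in_annual_window(date(2012, 1, 10), window_start, window_end))
```

When both dates fall in the same month, the tuple comparison reduces to a simple day-range check, which is a case the "first month OR last month" wording has to treat separately.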

Django/python date aggregation by month and quarter

I'm writing a feature that requires the average price of an item over different times (week, month, quarter etc.) Here's the model:
class ItemPrice(models.Model):
    item = models.ForeignKey(Item)
    date = models.DateField()
    price = models.FloatField()
This model tracks the price of the item over time, with new Items being added at frequent, but not regular, intervals.
Finding the average price over the last week is easy enough:
ItemPrice.objects.filter(item__id=1) \
    .filter(date__lt=TODAY) \
    .filter(date__gte=TODAY_MINUS_7_DAYS) \
    .aggregate(Avg('price'))
A week always has 7 days, but what about a month or a quarter? They have different numbers of days...?
Thanks!
EDIT:
The app is for a finance org, so 30-day months won't cut it, I'm afraid. Thanks for the suggestion!
The solution has two parts: the first uses the aggregation functions of the Django ORM, the second uses python-dateutil.
from dateutil.relativedelta import relativedelta
A_MONTH = relativedelta(months=1)
month = ItemPrice.objects \
    .filter(date__gte=date - A_MONTH) \
    .filter(date__lt=date) \
    .aggregate(month_average=Avg('price'))
month equals:
{'month_average': 40}
It's worth noticing that you can change the key of the month dictionary by changing the .aggregate() param.
dateutil's relativedelta can handle days, weeks, years and lots more. An excellent package, I'll be re-writing my home-grown hax.
import datetime
from dateutil import relativedelta, rrule

obj = self.get_object()
datenow = datetime.datetime.now()
quarters = rrule.rrule(
    rrule.MONTHLY,
    bymonth=(1, 4, 7, 10),
    bysetpos=-1,
    dtstart=datetime.datetime(datenow.year, 1, 1),
    count=5)
first_day = quarters.before(datenow)
last_day = (quarters.after(datenow) - relativedelta.relativedelta(days=1))
quarter = Payment.objects.filter(
    operation__department__cashbox__id=obj.pk,
    created__range=(first_day, last_day)).aggregate(count=Sum('amount'))
inspiration from there
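For comparison, the boundaries of the calendar quarter containing a given day can also be computed with the standard library alone. This is a hedged sketch, not part of the answer above; quarter_bounds is a hypothetical name, and it returns plain dates rather than datetimes.

```python
import calendar
from datetime import date


def quarter_bounds(day):
    """Return (first_day, last_day) of the calendar quarter containing day."""
    first_month = 3 * ((day.month - 1) // 3) + 1  # 1, 4, 7 or 10
    last_month = first_month + 2
    last_dom = calendar.monthrange(day.year, last_month)[1]
    return date(day.year, first_month, 1), date(day.year, last_month, last_dom)


first_day, last_day = quarter_bounds(date(2021, 5, 17))
print(first_day, last_day)
```

The resulting pair could then be passed to a filter such as created__range=(first_day, last_day). If the field is a DateTimeField, note that an end value without a time component would leave out most of the last day, so an exclusive upper bound on the next quarter's first day may be the safer choice.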
I would go for the 360-day calendar and not worry about these little inaccuracies. Just use the last 30 days for your "last month average" and the last 90 days for your "last quarter average".
First of all, are you interested in the past 7 days or the last week? If the answer is the last week, your query is not correct.
If it is past "n" days that concerns you, then your query is correct and I suppose you can just relax and use 30 days for a month and 90 days for a quarter.
