Comparing two lists to find the average

The list should hold the average profit of all Januaries, all Februaries, and so on.
The way I thought you could do this was to compare two lists: the year 1 list would have values for January, February, March, ..., December, and from there I could find the average profit for each month across the years. This hasn't been working and I'm not sure where to go from here. Any suggestions?
MONTHS = 12
def average_profit(years):
    assert years >= 0, "Years cannot be negative"
    total = 0.0
    monthly_average = 0.0
    total_months = years * MONTHS
    total_list = []
    average_list=[]
    percentage_list = []
    for i in range(0, years):
        yearly_total = 0.0
        for j in range(1, 13):
            monthly_profit = float(input("Please enter the profit made in month {0}: ".format(j)).strip())
            monthly_average = monthly_average + monthly_profit
            month_average = monthly_average/j
            total_list.append(monthly_profit)
            average_list.append(month_average)
            yearly_total = yearly_total + monthly_profit
            total_percent = (monthly_profit/12)*100
            percentage_list.append(total_percent)
        print("Total this year was ${0:.2f}".format(yearly_total))
        total = total + yearly_total
    average_per_month = total / total_months
    return total, average_per_month, total_months, total_list, average_list, percentage_list

Your problem is most likely that for i in range(0, years) should be changed to for i in range(0, years). You do this right with the months, but getting it right with the years is just as important.

It seems to me that a better data structure could help with this a good bit. It's hard to tell what your best bet will be, but one suggestion could be to use a dict (defaultdict is even easier):
from collections import defaultdict
d = defaultdict(list)
for i in range(0, years):
    for month in range(1, 13):
        d[month].append(float(input()))
# now find the average for each month:
for i in range(1, 13):
    print(sum(d[i]) / len(d[i]))
Of course, we could use a list instead of a dictionary, but the dictionary would allow you to use month names instead of numbers (which may be kind of nice -- and I bet you could get their names pretty easily from the calendar module.)
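To illustrate the month-name idea, here is a minimal sketch that keys the defaultdict by names taken from calendar.month_name. The function name monthly_averages and the sample profit figures are made up for illustration, and the profits are passed in as lists instead of being read with input() so the example runs on its own.

```python
import calendar
from collections import defaultdict

def monthly_averages(yearly_profits):
    """Return {month name: average profit} from a list of 12-value yearly lists."""
    by_month = defaultdict(list)
    for year in yearly_profits:
        for month_number, profit in enumerate(year, start=1):
            # calendar.month_name[1] is the first month; index 0 is an empty string
            by_month[calendar.month_name[month_number]].append(profit)
    return {name: sum(values) / len(values) for name, values in by_month.items()}

# Two hypothetical years of made-up monthly profits
year_1 = [100.0 + m for m in range(12)]
year_2 = [300.0 + m for m in range(12)]
averages = monthly_averages([year_1, year_2])
for name, average in averages.items():
    print("{0}: {1:.2f}".format(name, average))
```

Each key collects one value per year, so the average for a month is simply the sum of its list divided by the number of years entered.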

Related

Paradox python algorithm

I am trying to solve a version of the birthday paradox question where I have a probability of 0.5 but I need to find the number of people n where at least 4 have their birthdays within a week of each other.
I have written code that is able to simulate where 2 people have their birthdays on the same day.
import numpy
import matplotlib.pylab as plt
no_of_simulations = 1000
milestone_probabilities = [50, 75, 90, 99]
milestone_current = 0
def birthday_paradox(no_of_people, simulations):
    global milestone_probabilities, milestone_current
    same_birthday_four_people = 0
    #We assume that there are 365 days in all years.
    for sim in range(simulations):
        birthdays = numpy.random.choice(365, no_of_people, replace=True)
        unique_birthdays = set(birthdays)
        if len(unique_birthdays) < no_of_people:
            same_birthday_four_people += 1
    success_fraction = same_birthday_four_people/simulations
    if milestone_current < len(milestone_probabilities) and success_fraction*100 > milestone_probabilities[milestone_current]:
        print("P(Four people sharing birthday in a room with " + str(no_of_people) + " people) = " + str(success_fraction))
        milestone_current += 1
    return success_fraction
def main():
    day = []
    success = []
    for i in range(1, 366): #Executing for all possible cases where can have unique birthdays, i.e. from 1 person to a maximum of 365 people in a room
        day.append(i)
        success.append(birthday_paradox(i, no_of_simulations))
    plt.plot(day, success)
    plt.show()
main()
I am looking to modify the code to look for groups of 4 instead of 2, and then check that the difference between their birthdays is less than or equal to 7, in order to answer the question.
Am I going down the right path or should I approach the question differently?

The key part of your algorithm is in these lines:
unique_birthdays = set(birthdays)
if len(unique_birthdays) < no_of_people:
    same_birthday_four_people += 1
Comparing the number of unique birthdays to the number of people did the job when you tested whether two different people had the same birthday, but it won't do for your new test.
Define a new function that receives the birthday array and returns True or False after checking whether 4 different people have birthdays within a range of 7 days:
def four_birthdays_same_week(birthdays):
    # fill in this function's code
    ...
def birthday_paradox(no_of_people, simulations):
    ...
(this function can be defined outside the birthday_paradox function)
Then switch this code:
if len(unique_birthdays) < no_of_people:
    same_birthday_four_people += 1
into:
if four_birthdays_same_week(birthdays):
    same_birthday_four_people += 1
Regarding the algorithm for checking whether 4 different birthdays fall within the same week: a basic idea is to sort the array of birthdays, then for every group of 4 consecutive birthdays in the sorted array check whether the range of days between them is less than or equal to 7.
If it is, the function can immediately return True.
If after scanning the whole array we didn't return True, the function can return False.
(I am sure this algorithm can be vastly improved.)
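One possible way to fill in four_birthdays_same_week, following the sort-and-scan idea above, is sketched here. The parameters group_size, max_span and days_in_year are names introduced for illustration, and treating late-December and early-January birthdays as close to each other (the wrap-around step) is an assumption that the question does not spell out.

```python
def four_birthdays_same_week(birthdays, group_size=4, max_span=7, days_in_year=365):
    """Return True if group_size birthdays fall within max_span days of each other."""
    days = sorted(int(d) for d in birthdays)
    if len(days) < group_size:
        return False
    # Assumption: the year wraps around, so the earliest birthdays are appended
    # again, shifted by one year, to catch groups that span New Year.
    extended = days + [d + days_in_year for d in days[:group_size - 1]]
    for i in range(len(days)):
        # In a sorted list, the tightest group starting at i is the next
        # group_size - 1 entries, so only consecutive windows need checking.
        if extended[i + group_size - 1] - extended[i] <= max_span:
            return True
    return False

print(four_birthdays_same_week([10, 12, 14, 17, 200]))
print(four_birthdays_same_week([10, 12, 14, 18, 200]))
```

Sorting costs O(n log n) per simulation and the scan is linear, so this should not change the overall running time of the simulation loop much.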

KeyError: 20, not sure what is wrong

I am new to python and am currently trying to create a program that will create a list of yearly percentage changes in revenue.
This is what I have so far:
revs = {}
for year in range(14, 20):
    revs[year] = float(input("Revenue in {0}: ".format(year)))
revs_change = []
for year in range(14, 20):
    next_year = year + 1
    revs_change.append((revs[next_year] - revs[year])/revs[year])
print(revs_change[0])
The error comes on the 8th line and it has something to do with using the variable next_year.
Thanks!

If you print out the values of year and next_year, you will see the problem: when year is 19, next_year is 20, and there is no value for revs[20].
One way to fix it is this:
start = 14
end = 21
revs = {}
for year in range(start, end):
    revs[year] = float(input("Revenue in {0}: ".format(year)))
revs_change = []
for year in range(start, end-1):
    next_year = year + 1
    print(f"Year: {year}, Next Year: {next_year}\n")
    revs_change.append( (revs[next_year] - revs[year])/revs[year] )
print(revs_change[0])

In Python, the upper limit of a range is excluded: if you set the upper limit to 20, the range stops at 19. Give the upper limit as 21 when collecting the revenues, so that revs[20] exists and you get the desired output.
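As a small illustration of the exclusive upper limit, the sketch below pairs each year with the following one using zip, so no key beyond the last stored year is ever looked up. The helper name yearly_changes and the revenue figures are made up for the example.

```python
def yearly_changes(revs):
    """Return the fractional change between each pair of consecutive years."""
    years = sorted(revs)
    # zip stops at the shorter sequence, so the last year is never used as "current"
    return [(revs[nxt] - revs[cur]) / revs[cur] for cur, nxt in zip(years, years[1:])]

# range(14, 17) yields 14, 15, 16: the upper limit 17 is excluded
print(list(range(14, 17)))

# Hypothetical revenues keyed by year
revs = {14: 100.0, 15: 110.0, 16: 99.0}
changes = yearly_changes(revs)
print(changes)
```

With n years of revenue there are n - 1 changes, which is why the second loop in the fix above runs to end-1.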

Calculating monthly growth percentage from cumulative total growth

I am trying to calculate a constant for month-to-month growth rate from an annual growth rate (goal) in Python.
My question has arithmetic similarities to this question, but was not completely answered.
For example, if total annual sales for 2018 are $5,600,000.00 and I have an expected 30% increase for the next year, I would expect total annual sales for 2019 to be $7,280,000.00.
BV_2018 = 5600000.00
Annual_GR = 0.3
EV_2019 = (BV_2018 * Annual_GR) + BV_2018
I am using the last month of 2018 to forecast the first month of 2019
Last_Month_2018 = 522000.00
Month_01_2019 = (Last_Month_2018 * CONSTANT) + Last_Month_2018
For the second month of 2019 I would use
Month_02_2019 = (Month_01_2019 * CONSTANT) + Month_01_2019
...and so on and so forth
The cumulative sum of Month_01_2019 through Month_12_2019 needs to be equal to EV_2019.
Does anyone know how to go about calculating the constant in Python? I am familiar with the np.cumsum function, so that part is not an issue. My problem is I cannot solve for the constant I need.
Thank you in advance and please do not hesitate to ask for further clarification.
More clarification:
# get beginning value (BV)
BV = 522000.00
# get desired end value (EV)
EV = 7280000.00
We are trying to get from BV to EV (which is a cumulative sum) by calculating the cumulative sum of the [12] monthly totals. Each monthly total will have a % increase from the previous month that is constant across months. It is this % increase that I want to solve for.
Keep in mind, BV is the last month of the previous year. It is from BV that our forecast (i.e., Months 1 through 12) will be calculated. So, I'm thinking that it makes sense to go from BV to the EV plus the BV. Then, just remove BV and its value from the list, giving us EV as the cumulative total of Months 1 through 12.
I imagine using this constant in a function like this:
import numpy as np
import pandas as pd
def supplier_forecast_calculator(sales_at_cost_prior_year, sales_at_cost_prior_month, year_pct_growth_expected):
    """
    Calculates monthly supplier forecast
    Example:
    monthly_forecast = supplier_forecast_calculator(sales_at_cost_prior_year = 5600000,
                                                    sales_at_cost_prior_month = 522000,
                                                    year_pct_growth_expected = 0.30)
    monthly_forecast.all_metrics
    """
    # get monthly growth rate (CONSTANT is the unknown value I am asking about)
    monthly_growth_expected = CONSTANT
    # get first month sales at cost
    month1_sales_at_cost = (sales_at_cost_prior_month*monthly_growth_expected)+sales_at_cost_prior_month
    # instantiate lists
    month_list = ['Month 1'] # for months
    sales_at_cost_list = [month1_sales_at_cost] # for sales at cost
    # start loop
    for i in list(range(2,13)):
        # Append month to list
        month_list.append(str('Month ') + str(i))
        # get sales at cost and append to list
        month1_sales_at_cost = (month1_sales_at_cost*monthly_growth_expected)+month1_sales_at_cost
        # append month1_sales_at_cost to sales at cost list
        sales_at_cost_list.append(month1_sales_at_cost)
    # add total to the end of month_list
    month_list.insert(len(month_list), 'Total')
    # add the total to the end of sales_at_cost_list
    sales_at_cost_list.insert(len(sales_at_cost_list), np.sum(sales_at_cost_list))
    # put the metrics into a df
    all_metrics = pd.DataFrame({'Month': month_list,
                                'Sales at Cost': sales_at_cost_list}).round(2)
    # return the df
    return all_metrics

Let r = 1 + monthly_rate. Then, the problem we are trying to solve is
r + ... + r**12 = EV/BV. We can use numpy to get the numeric solution. This should be relatively fast in practice. We are solving the polynomial equation r**12 + ... + r - EV/BV = 0 and recovering the monthly rate from r. There will be twelve roots in total (counting complex ones), but only one of them is real and positive, which is the one we want.
import numpy as np
# get beginning value (BV)
BV = 522000.00
# get desired end value (EV)
EV = 7280000.00
def get_monthly(BV, EV):
    coefs = np.ones(13)
    coefs[-1] -= EV / BV + 1
    # there will be a unique positive real root
    roots = np.roots(coefs)
    return roots[(roots.imag == 0) & (roots.real > 0)][0].real - 1
rate = get_monthly(BV, EV)
print(rate)
# 0.022913299846925694
Some comments:
roots.imag == 0 may be problematic in some cases since roots uses a numeric algorithm. As an alternative, we can pick a root with the least imaginary part (in absolute value) among all roots with a positive real part.
We can use the same method to get rates for other time intervals. For example, for weekly rates, we can replace 13 == 12 + 1 with 52 + 1.
The above polynomial has a solution by radicals, as outlined here.
Update on performance. We could also frame this as a fixed point problem. Adding 1 to both sides of r + ... + r**12 = EV/BV and summing the geometric series gives r**13 = (EV/BV + 1) * r - EV/BV, so with x = r**13 we look for a fixed point of the function
x = (EV/BV + 1) * x ** (1/13) - EV/BV
The fixed point x will be equal to (1 + rate)**13.
The following pure-Python implementation is roughly four times faster than the above numpy version on my machine.
# TOLERANCE was not defined in the original snippet; 1e-10 is an assumed value
TOLERANCE = 1e-10
def get_monthly_fix(BV, EV, periods=12):
    n = periods + 1
    ratio = EV / BV + 1
    r = guess = ratio
    while True:
        r = ratio * r ** (1 / n) - ratio + 1
        if abs(r - guess) < TOLERANCE:
            return r ** (1 / n) - 1
        guess = r
We can make this run even faster with the help of numba.jit.
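As a cross-check that needs no third-party packages, the monthly rate can also be found by bisection, since the twelve-month total grows steadily as the rate grows. This is a different technique from the root-finding and fixed-point approaches in this answer, and the names cumulative_sales and solve_monthly_rate are introduced only for this sketch.

```python
def cumulative_sales(base, rate, periods=12):
    """Sum of `periods` monthly values, each (1 + rate) times the previous one."""
    total = 0.0
    month = base
    for _ in range(periods):
        month *= 1 + rate
        total += month
    return total

def solve_monthly_rate(base, target, periods=12, tol=1e-12):
    """Bisection on the rate; assumes target > 0 and base > 0."""
    lo, hi = -0.99, 1.0
    # Widen the upper bound until the forecast total exceeds the target
    while cumulative_sales(base, hi, periods) < target:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cumulative_sales(base, mid, periods) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

rate = solve_monthly_rate(522000.00, 7280000.00)
print(rate)
print(cumulative_sales(522000.00, rate))
```

The second print should come out very close to the target of 7280000.00, which is a quick way to confirm that a rate from any of the methods satisfies the original requirement.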

I am not sure if this works (tell me if it doesn't) but try this.
def get_value(start, end, times, trials=100, _amount=None, _last=-1, _increase=None):
    #don't call with _amount, _last, or _increase! Only start, end and times
    if _amount is None:
        _amount = start / times
    if _increase is None:
        _increase = start / times
    attempt = 1
    for n in range(times):
        attempt = (attempt * _amount) + attempt
    if attempt > end:
        if _last != 0:
            _increase /= 2
            _last = 0
        _amount -= _increase
    elif attempt < end:
        if _last != 1:
            _increase /= 2
            _last = 1
        _amount += _increase
    else:
        return _amount
    if trials <= 0:
        return _amount
    return get_value(start, end, times, trials=trials-1,
                     _amount=_amount, _last=_last, _increase=_increase)
Tell me if it works.
Used like this:
get_value(522000.00, 7280000.00, 12)

python tuple over writing previous data

I am trying to create a function that starts a loop and adds a day to the current day count. It asks 3 questions, then combines that data into Total_Output. I then want 'n' to represent the end of the tuple, and in the next step add Total_Output to the end of the tuple. But when I run the function, it seems to create a new tuple each time.
Example:
Good Morninghi
This is Day: 1
How much weight did you use?40
How many reps did you do?20
How many sets did you do?6
Day: 1
[4800.0]
This is Day: 2
How much weight did you use?50
How many reps did you do?20
How many sets did you do?6
Day: 2
[6000.0, 6000.0]
This is Day: 3
How much weight did you use?40
How many reps did you do?20
How many sets did you do?6
Day: 3
[4800.0, 4800.0, 4800.0]
failed
Here is the function:
def Start_Work(x):
    Num_Days = 0
    Total_Output = 0
    Wght = 0
    Reps = 0
    Sets = 0
    Day = []
    while x == 1 and Num_Days < 6: ##will be doing in cycles of 6 days
        Num_Days += 1 ##increase day count with each loop
        print "This is Day:",Num_Days
        Wght = float(raw_input("How much weight did you use?"))
        Reps = float(raw_input("How many reps did you do?"))
        Sets = float(raw_input("How many sets did you do?"))
        Total_Output = Wght * Reps * Sets
        n = Day[:-1] ##go to end of tuple
        Day = [Total_Output for n in range(Num_Days)] ##add data (Total_Output to end of tuple
        print "Day:",Num_Days
        print Day
    else:
        print "failed"
Input = raw_input("Good Morning")
if Input.lower() == str('hi') or str('start') or str('good morning'):
    Start_Work(1)
else:
    print "Good Bye"

n = Day[:-1] ##go to end of tuple
Day = [Total_Output for n in range(Num_Days)] ##add data (Total_Output to end of tuple
This does not do what you think it does. You assign n but never use it (the n in the comprehension is reassigned by the for n in), and Day[:-1] only holds a copy of Day without its last element.
You then set Day to the equivalent of [Total_Output] * Num_Days, so you make a new list of Num_Days occurrences of Total_Output.
You want:
Day.append(Total_Output)
to replace both of those lines.
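To see the effect of append in isolation, here is a minimal sketch written in Python 3 syntax, with the user prompts replaced by a hard-coded list so it runs on its own. The function name total_outputs and the sessions list are made up for the example; the numbers are the ones from the sample run in the question.

```python
def total_outputs(sessions):
    """Return one total (weight * reps * sets) per day, in order."""
    day = []
    for weight, reps, sets in sessions:
        # append adds to the existing list instead of rebuilding it
        day.append(weight * reps * sets)
    return day

# (weight, reps, sets) for days 1 to 3, taken from the sample run
sessions = [(40.0, 20.0, 6.0), (50.0, 20.0, 6.0), (40.0, 20.0, 6.0)]
print(total_outputs(sessions))
# prints [4800.0, 6000.0, 4800.0]
```

Each day keeps its own total, instead of every entry being overwritten with the latest one.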

Loop For working too long

I have two lists of dicts: prices_distincts and prices.
They are connected through hash_brand_artnum, and both of them are sorted by hash_brand_artnum.
I do not understand why the loop runs for so long:
If the length of prices_distincts is 100,000, it runs for 30 min.
But if the length of prices_distincts is 10,000, it runs for 10 sec.
Code:
prices_distincts = [{'hash_brand_artnum':1202},...,..]
prices = [{'hash_brand_artnum':1202,'price':12.077},...,...]
for prices_distinct in prices_distincts:
    for price in list(prices):
        if prices_distinct['hash_brand_artnum'] == price['hash_brand_artnum']:
            print price['hash_brand_artnum']
            #print prices
            del prices[0]
        else:
            continue
I need to look for items with the same prices. The relation between prices_distincts and prices is one to many, and I need to group the prices with equal price['hash_brand_artnum'].

It's running so long because your algorithm is O(N^2): 100000 ^ 2 = 10000000000 and 10000 ^ 2 = 100000000. So the factor between the two numbers is 100, and the factor between 30 min and 10 sec is 180, which is the same order of magnitude.
EDIT: It's hard to say from your code and such a small amount of data, and I don't know what your task is, but I think that your dictionaries are not very useful.
Maybe try this:
>>> prices_distincts = [{'hash_brand_artnum':1202}, {'hash_brand_artnum':14}]
>>> prices = [{'hash_brand_artnum':1202, 'price':12.077}, {'hash_brand_artnum':14, 'price':15}]
# turning the first list of dicts into a simple list of numbers
>>> dist = [x['hash_brand_artnum'] for x in prices_distincts]
# turning the second list of dicts into a dict where the number is the key and the price is the value
>>> pr = {x['hash_brand_artnum']:x["price"] for x in prices}
Now you can iterate through your numbers and get the prices:
>>> for d in dist:
...     print d, pr[d]
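Note that the pr dict keeps only one price per hash_brand_artnum. Since the question describes a one-to-many relation, a variation that collects every price under its key might look like the sketch below. It is written in Python 3 syntax, the helper name group_prices is made up, and the extra 11.5 entry is invented sample data.

```python
from collections import defaultdict

def group_prices(prices):
    """Map each hash_brand_artnum to the list of all its prices in one pass."""
    grouped = defaultdict(list)
    for entry in prices:
        grouped[entry['hash_brand_artnum']].append(entry['price'])
    return grouped

prices_distincts = [{'hash_brand_artnum': 1202}, {'hash_brand_artnum': 14}]
prices = [
    {'hash_brand_artnum': 1202, 'price': 12.077},
    {'hash_brand_artnum': 1202, 'price': 11.5},  # invented second price for the same key
    {'hash_brand_artnum': 14, 'price': 15},
]

grouped = group_prices(prices)
for item in prices_distincts:
    key = item['hash_brand_artnum']
    # dict lookups take constant time on average, so the whole loop is linear
    print(key, grouped.get(key, []))
```

Building the dict is one pass over prices and the lookups are one pass over prices_distincts, so the work grows linearly with the input instead of quadratically.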

As @RomanPekar mentioned, your algorithm is running slowly because its complexity is O(n^2). To fix it, you should write it as an O(n) algorithm:
import itertools as it
for price, prices_distinct in it.izip(prices, prices_distincts):
    if prices_distinct['hash_brand_artnum'] == price['hash_brand_artnum']:
        # do stuff
        pass

If prices grows more or less with prices_distincts, then when you multiply the size of prices_distincts by 10, your original 10 seconds will be multiplied by 10, then again by 10 (the second for loop), and then by ~2 because of the list(prices) copy (which, by the way, should definitely be done outside the loop):
10sec*10*10*2 = 2000sec = 33min
This conversion is usually expensive.
prices_distincts = [{'hash_brand_artnum':1202},...,..]
prices = [{'hash_brand_artnum':1202,'price':12.077},...,...]
list_prices = list(prices)
for prices_distinct in prices_distincts:
    for price in list_prices:
        if prices_distinct['hash_brand_artnum'] == price['hash_brand_artnum']:
            print price['hash_brand_artnum']
            #print prices
            del prices[0]
        else:
            continue
