How to find the average of results in a linear regression equation - python

I have the equation and I've been asked to find the average of x from 2010 to 2015. I started a loop to first get the values for 2010-2015 but I'm stuck on how to get the average of those values. Below is what I have so far:
a = -22562.8
b = 11.24
i = 2010
while i <= 2015:
    sum_estimated_riders = (a + (i * b)) * 100000
    print(sum_estimated_riders)
    i = i + 1

You can use numpy.mean() for this.
Build a list, append each value to it, then average the list.
import numpy as np
estimated_riders = []
a = -22562.8
b = 11.24
i = 2010
while i <= 2015:
    sum_estimated_riders = (a + (i * b)) * 100000
    estimated_riders.append(sum_estimated_riders)
    i = i + 1
avg = np.mean(estimated_riders)
print(avg)

You overwrite sum_estimated_riders every time. Instead, initialize it to 0 before the loop and add to it inside the loop. Then divide by the number of iterations.
a = -22562.8
b = 11.24
i = 2010
sum_estimated_riders = 0
num_years = 0
while i <= 2015:
    sum_estimated_riders += (a + (i * b)) * 100000
    num_years += 1
    i = i + 1
mean_estimated_riders = sum_estimated_riders / num_years
print(mean_estimated_riders)
Alternatively, you could create a list of estimated_riders for each year. Then, use sum() to calculate the sum and divide by the length of the list.
estimated_riders = []
i = 2010  # start again from the first year
while i <= 2015:
    estimated_riders.append((a + (i * b)) * 100000)
    i = i + 1  # without this the loop never ends
mean_estimated_riders = sum(estimated_riders) / len(estimated_riders)
Or, as a list comprehension:
estimated_riders = [(a + (i * b)) * 100000 for i in range(2010, 2016)] # 2016 because range() excludes the end
mean_estimated_riders = sum(estimated_riders) / len(estimated_riders)
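Because the estimate is linear in the year, the mean over evenly spaced years should equal the estimate at the mean year. Here is a minimal sketch of that shortcut, using the a and b from the question; the helper name estimate is made up for illustration.

```python
a = -22562.8
b = 11.24


def estimate(year):
    """Estimated riders for one year (hypothetical helper, not from the question)."""
    return (a + year * b) * 100000


years = range(2010, 2016)

# Mean computed the long way: evaluate every year, then average.
mean_by_loop = sum(estimate(y) for y in years) / len(years)

# Shortcut: evaluate the linear equation once, at the mean year.
mean_year = sum(years) / len(years)
mean_by_shortcut = estimate(mean_year)

print(mean_by_loop, mean_by_shortcut)
```

The two values agree up to floating-point rounding, which is a quick way to sanity-check the loop-based answers.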

Related

How to loop multiple appends for python

num_pixels_per_cell_one_axis = 5
num_cells_per_module_one_axis = 3
inter_cell_sep = 4
max_items_in_list = num_cells_per_module_one_axis * num_pixels_per_cell_one_axis + (num_cells_per_module_one_axis-1) * inter_cell_sep
print(max_items_in_list)
indices_to_retain = list(range(max_items_in_list))
indices_to_remove = indices_to_retain[num_pixels_per_cell_one_axis :: num_pixels_per_cell_one_axis + inter_cell_sep]
if inter_cell_sep == 2:
    for k in range(0, len(indices_to_remove)):
        indices_to_remove.append(indices_to_remove[k]+1)
if inter_cell_sep == 3:
    for k in range(0, len(indices_to_remove)):
        indices_to_remove.append(indices_to_remove[k]+1)
        indices_to_remove.append(indices_to_remove[k]+2)
for k in indices_to_remove:
    indices_to_retain.remove(k)
print(indices_to_remove)
print(indices_to_retain)
I want this to work for any positive inter_cell_sep, so that the number of appended offsets grows with it instead of needing a new if block for each value. The expected answer should be [0,1,2,3,4,9,10,11,12,17,18,19,20]
I think instead of using if statements for each value of inter_cell_sep you could loop through a range of inter_cell_sep. Here is what I came up with.
num_pixels_per_cell_one_axis = 5
num_cells_per_module_one_axis = 3
inter_cell_sep = 3
max_items_in_list = num_cells_per_module_one_axis * num_pixels_per_cell_one_axis + (num_cells_per_module_one_axis - 1) * inter_cell_sep
print(max_items_in_list)
indices_to_retain = list(range(max_items_in_list))
indices_to_remove = indices_to_retain[num_pixels_per_cell_one_axis:: num_pixels_per_cell_one_axis + inter_cell_sep]
for k in range(0, len(indices_to_remove)):
    indices_to_remove.extend([indices_to_remove[k] + x for x in range(1, inter_cell_sep)])
for k in indices_to_remove:
    indices_to_retain.remove(k)
print(indices_to_remove)
print(indices_to_retain)
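As an alternative sketch, the retained indices can be computed directly with a modulo test instead of building a removal list first. The function and parameter names here are hypothetical, and the layout is assumed to be cells of equal width separated by gaps of equal width, as in the code above.

```python
def retained_indices(pixels_per_cell, cells_per_axis, sep):
    """Return the indices that fall inside a cell rather than inside a gap."""
    period = pixels_per_cell + sep  # one cell plus the gap that follows it
    total = cells_per_axis * pixels_per_cell + (cells_per_axis - 1) * sep
    # Within each period, the first pixels_per_cell positions belong to the cell.
    return [i for i in range(total) if i % period < pixels_per_cell]


print(retained_indices(5, 3, 3))
# [0, 1, 2, 3, 4, 8, 9, 10, 11, 12, 16, 17, 18, 19, 20]
```

This works for any non-negative separation without adding a branch per value.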

vectorizing a double for loop

This is a performance question. I am trying to optimize the following double for loop. Here is an MWE:
import numpy as np
from timeit import default_timer as tm
# L1 and L2 will range from 0 to 3 typically, sometimes up to 5
# all of the following are dummy values but match correct `type`
L1, L2, x1, x2, fac = 2, 3, 2.0, 4.5, 2.3
saved_values = np.random.uniform(high=75.0, size=[max(L1,L2) + 1, max(L1,L2) + 1])
facts = np.random.uniform(high=65.0, size=[L1 + L2 + 1])
val = 0
start = tm()
for i in range(L1+1):
    sf = saved_values[L1][i] * x1 ** (L1 - i)
    for j in range(L2 + 1):
        m = i + j
        if m % 2 == 0:
            num = sf * facts[m] / (2 * fac) ** (m / 2)
            val += saved_values[L2][j] * x1 ** (L1 - j) * num
end = tm()
time = end-start
print("Long way: time taken was {} and value is {}".format(time, val))
My idea for a solution is to take out the if m % 2 == 0: statement and then calculate all i and j combinations i.e., a matrix, which I should be able to vectorize, and then use something like np.where() to add up all of the elements meeting the requirement of if m % 2 == 0: where m= i+j.
Even if this is not faster than the explicit for loops, it should be vectorized because in reality I will be sending arrays to a function containing the double for loops, so being able to do that part vectorized, should get me the speed gains I am after, even if vectorizing this double for loop does not.
I am stuck on how to broadcast while still accounting for the sf factor from the outer loop and the m-dependent factor in the inner loop.
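One possible way to express the idea above with broadcasting is sketched below. It mirrors the loop exactly as written in the question (including the x1 ** (L1 - j) term), builds the full (i, j) grid, and then sums only the entries where i + j is even. The function names are made up for illustration, and this is not benchmarked.

```python
import numpy as np


def double_loop(L1, L2, x1, fac, saved_values, facts):
    """Reference version: the explicit double loop from the question."""
    val = 0.0
    for i in range(L1 + 1):
        sf = saved_values[L1][i] * x1 ** (L1 - i)
        for j in range(L2 + 1):
            m = i + j
            if m % 2 == 0:
                num = sf * facts[m] / (2 * fac) ** (m / 2)
                val += saved_values[L2][j] * x1 ** (L1 - j) * num
    return val


def vectorized(L1, L2, x1, fac, saved_values, facts):
    """Same sum, computed on an (L1 + 1) x (L2 + 1) grid via broadcasting."""
    i = np.arange(L1 + 1)[:, None]   # column vector of i values
    j = np.arange(L2 + 1)[None, :]   # row vector of j values
    m = i + j                        # broadcasts to the full grid
    row = saved_values[L1, :L1 + 1][:, None] * x1 ** (L1 - i)  # the sf factor
    col = saved_values[L2, :L2 + 1][None, :] * x1 ** (L1 - j)
    terms = row * col * facts[m] / (2 * fac) ** (m / 2)
    return terms[m % 2 == 0].sum()   # keep only even i + j


rng = np.random.default_rng(0)
L1, L2, x1, fac = 2, 3, 2.0, 2.3
size = max(L1, L2) + 1
saved_values = rng.uniform(high=75.0, size=(size, size))
facts = rng.uniform(high=65.0, size=L1 + L2 + 1)
print(double_loop(L1, L2, x1, fac, saved_values, facts))
print(vectorized(L1, L2, x1, fac, saved_values, facts))
```

For grids this small the loop may well be faster; the broadcast form mainly helps if x1 and fac later become arrays.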

How to calculate sum of terms of Taylor series of sin(x) without using inner loops or if-else?

I can't use inner loops
I can't use if-else
I need to compute the following series:
x - x^3/3! + x^5/5! - x^7/7! + x^9/9! ...
I am thinking something like the following:
n =1
x =0.3
one=1
fact1=1
fact2=1
term =0
sum =0
for i in range(1, n+1, 2):
    one = one * (-1)
    fact1 = fact1*i
    fact2 = fact2*i+1
    fact = fact1*fact2
    x = x * x
    term = x/fact
    sum = sum + term
But I am having a hard time keeping track of the running products for both fact and x.
You want to compute a sum of terms. Each term is the previous term multiplied by -1 * x * x and divided by (n+1) * (n+2), where n is the exponent of the previous term. Just write it:
import math  # only needed for the demo below

def func(x):
    eps = 1e-6  # the expected precision order
    term = x
    sum = term
    n = 1
    while True:
        term *= -x * x
        term /= (n+1) * (n+2)
        if abs(term) < eps: break
        sum += term
        n += 2
    return sum
Demo:
>>> func(math.pi / 6)
0.4999999918690232
giving as expected 0.5 with a precision of 1e-6
Note: the series is the well-known expansion of the sin function...
Isn't that a Taylor series for sin(x)? And can you use list comprehension? With list comprehension that could be something like
x = 0.3
numOfTerms = 5  # example value
sum([(-1)**(n+1) * x**(2*n-1) / fact(2*n-1) for n in range(1, numOfTerms + 1)])
If you can't use list comprehension you could simply loop that like this
x = 0.3
terms = []
for n in range(1, numOfTerms + 1):
    term = (-1)**(n+1) * x**(2*n-1) / fact(2*n-1)
    terms.append(term)
sumOfTerms = sum(terms)
Then calculating the factorial by recursion:
def fact(k):
    if k <= 1:
        return 1
    else:
        return fact(k-1)*k
Calculating the factorial using Stirling's approximation:
fact(k) ≈ sqrt(2*pi*k) * k**k * e**(-k)
No if-else here, nor inner loops. But the approximation introduces precision errors, and you either need the math lib to get the constants or accept even more error by hard-coding values for pi and e.
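To get a feel for how large that precision error is, here is a small sketch comparing Stirling's approximation with math.factorial. The function name stirling is made up for illustration.

```python
import math


def stirling(k):
    """Stirling's approximation of k! (no branches, no loops)."""
    return math.sqrt(2 * math.pi * k) * k**k * math.exp(-k)


for k in (1, 3, 5, 10):
    exact = math.factorial(k)
    rel_error = abs(stirling(k) - exact) / exact
    print(k, exact, stirling(k), rel_error)
```

The relative error shrinks as k grows, but it is largest for exactly the small factorials (1!, 3!, 5!) that dominate the first terms of the series, so the resulting sum will be noticeably off.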
Hope this can help!
n = NUMBER_OF_TERMS
x = VALUE_OF_X
m = -1
sum = x # Final sum
def fact(i):
    f = 1
    while i >= 1:
        f = f * i
        i = i - 1
    return f
for i in range(1, n):
    r = 2 * i + 1
    a = pow(x, r)
    term = a * m / fact(r)
    sum = sum + term
    m = m * (-1)
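Putting the constraints together, here is a minimal sketch that uses a single loop, no if-else, and no separate factorial: each term is derived from the previous one. The function name and the default number of terms are assumptions chosen for illustration.

```python
import math


def sin_series(x, num_terms=10):
    """Sum the first num_terms terms of x - x^3/3! + x^5/5! - ..."""
    term = x
    total = x
    for k in range(1, num_terms):
        # Going from x^(2k-1)/(2k-1)! to x^(2k+1)/(2k+1)! multiplies by
        # x*x and divides by (2k)*(2k+1); the minus sign flips the sign.
        term *= -x * x / ((2 * k) * (2 * k + 1))
        total += term
    return total


print(sin_series(0.3), math.sin(0.3))
```

Using a fixed number of terms rather than a tolerance check is what avoids the if statement; ten terms is more than enough for small x.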

faster way to calculate row values based on values of previous rows, pandas dataframe

I need to calculate the value of an expression (as in the code below) for each row of the dataframe.
The current code works, but it takes too long to compute.
I need a faster way to implement the same calculation.
code:
num = 0
den = 0
for i in range(1,2000):
    p1 = p[i]
    t1 = tx[i]
    num = num * pow(numpy.e,-1*t1) + p1
    den = den * pow(numpy.e,-1*t1) + 1
    t["s"][i] = num/den
All values in the dataframe are of float datatype.
The code above takes approx. 80 sec for 2000 rows.
The actual dataframe has over a million rows.
Please suggest.
Thanks!
If t is the only variable you need to evaluate, you don't need pow(numpy.e, -t1)
num = 0
den = 0
for i in range(1,2000):
    p1 = p[i]
    #t1 = tx[i]
    num += p1
    den += 1
    t["s"][i] = num / den
This would be enough.
EDIT:
import numpy as np

def mytest(a, b):
    t = []
    num, den = 0, 0
    for i in range(1, 2000):
        num = num * pow(np.e, -b[i]) + a[i]
        den = den * pow(np.e, -b[i]) + 1
        t.append(num / den)
    return t

def mytest2(a, b):
    t = []
    num, den = 0, 0
    neck = pow(np.e, -b)  # bottleneck: computed once for the whole array
    for i in range(1, 2000):
        num = num * neck[i] + a[i]
        den = den * neck[i] + 1
        t.append(num / den)
    return t
Output:
%timeit mytest(random.rand(2000), random.rand(2000))
100 loops, best of 3: 3.26 ms per loop
%timeit mytest2(random.rand(2000), random.rand(2000))
100 loops, best of 3: 1.54 ms per loop
Unfortunately, I cannot reproduce your (HUGE) 80 secs. The bottleneck is probably somewhere else in your code.
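One likely candidate for that bottleneck is the element-by-element write t["s"][i] = ..., since each such assignment goes through pandas indexing. A hedged sketch of the same recurrence on plain NumPy arrays follows; the function name is hypothetical, and unlike the question's range(1, 2000) it starts at index 0 and covers the whole input.

```python
import numpy as np


def decayed_running_mean(p, tx):
    """Compute num/den for every row, with both sums decayed by exp(-tx[i])."""
    p = np.asarray(p, dtype=float)
    decay = np.exp(-np.asarray(tx, dtype=float))  # all decay factors at once
    out = np.empty(len(p))
    num = den = 0.0
    for i in range(len(p)):
        num = num * decay[i] + p[i]
        den = den * decay[i] + 1.0
        out[i] = num / den
    return out


result = decayed_running_mean([1.0, 2.0, 3.0], [0.5, 0.1, 0.2])
print(result)
```

With a dataframe, the inputs could be passed as the underlying arrays of the p and tx columns, and the result assigned back to the "s" column in a single step after the loop.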

How to loop a number so that the number iterates through every possible value?

I was doing Problem 9 in Project Euler when I ran into a problem. My program was taking way too long to run: more than half an hour. Here is my code.
def Problem9():
    a = 1
    b = 1
    c = 1
    x = []
    while(a + b + c != 1000):
        a = a + 1
        for i in range(0,1000):
            c = 1000 - (a + b)
            if a < b < c:
                if (a*a) + (b*b) == (c*c):
                    x.append(a*b*c)
                    print(a*b*c)
            b = b + 1
    print(x)
Problem9()
This is supposed to find all the Pythagorean triplets which add up to one thousand (link to the problem so that you can understand it better: https://projecteuler.net/problem=9). Is there a problem in my code which I can fix, or is my code fundamentally wrong?
Since you know that the three numbers must add up to 1000, and a < b < c, you can take advantage of that fact to loop much more efficiently (and cleanly).
def Problem9():
    for a in range(1, 1000):
        for b in range(a + 1, 1000):
            if a**2 + b**2 == (1000 - a - b)**2:
                return a*b*(1000 - a - b)
Here, you loop over a from 1 to 999. Since b must be greater than a, you then loop over b from a + 1 to 999. Then, since you know that 1000 = a + b + c, you have c = 1000 - a - b, and you can test your Pythagorean condition without any more looping. Starting a at 1 rather than 0 matters: a = 0, b = 500 would otherwise pass the test and return 0.
A Pythagorean triplet is a set of three natural numbers, a < b < c, for which a² + b² = c².
There exists exactly one Pythagorean triplet for which a + b + c = 1000.
This will work:
import math

def pythagorean_tiplet():
    a = 1
    while a < 1000:
        b = a + 1  # b starts from a + 1, since smaller values of b are useless and only add to the running time
        while b < 1000:
            result = a**2 + b**2
            c = math.sqrt(result)
            if (a + b + c) == 1000 and (a < b < c):  # test for conditions
                return a * b * c
            b += 1
        a += 1
print(pythagorean_tiplet())
This algorithm is definitely unsuitable for perimeters s > 1 000 000.
There is a faster algorithm that can be used to solve it: you can search for the parametrisation of Pythagorean triplets.
You have the system
(*1) a + b + c = 1000
(*2) a² + b² = c²
If
a + b + c = 1000
then
a + b = 1000 - c
(a + b)² = (1000 - c)²
a² + 2ab + b² = 1000² - 2000c + c²
( a² + b² ) + 2ab = 1000² - 2000c + c²
but, by the (*2), ( a² + b² ) = c², and then
c² + 2ab = 1000² - 2000c + c²
2ab = 1000² - 2000c
2000c = 1000² - 2ab
then
c = 500 - ab/(1000)
So, now, you have the new system:
(*3) a + b + 500 - ab/(1000) = 1000
(*4) c = 500 - ab/(1000)
Besides, a, b, and c are whole numbers, and a < b < c.
If a > 332, then a would be at least 333, b at least 334, and c at least 335,
but 333 + 334 + 335 = 1002 > 1000. So a can be at most 332.
With more math, you can do this even more easily.
def p():
    for a in range(1, 333):
        # b < c and b + c = 1000 - a, so b stays below (1000 - a) / 2
        for b in range(a + 1, (1000 - a) // 2):
            if 1000*a + 1000*b + 500000 - a*b == 1000000:
                c = 500 - (a*b) // 1000
                print((a, b, c))
                print(a*b*c)
                return
p()
Result:
time python Special_Pythagorean_triplet.py
(200, 375, 425)
31875000
real 0m0.041s user 0m0.036s sys 0m0.000s
In the if statement:
if ( 1000*a + 1000*b + 500000 - a*b == 1000000 )
you could use:
if ( a + b + 500 - (a*b)/1000 == 1000 )
but since only whole numbers matter here, the first form is preferable:
it avoids the division and its rounding problems.
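Another way is to use itertools, though product() over three ranges tests every (a, b, c) combination, so expect it to run much longer than the two-loop versions: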
https://docs.python.org/3.4/library/itertools.html
from itertools import product
def ff1():
    for r in product(range(1,1000),repeat=3):
        a,b,c=r
        if a+b+c==1000:
            if a<b<c:
                if a**2+b**2==c**2:
                    print(a,b,c)
                    print(a*b*c)
ff1()
This code is really awkward. The while condition itself is wrong: the loop would stop at the first three numbers that sum to 1000. Another problem is that B is never reset. You can do something similar to what Ibukun suggested, but that is not the best way to take this direct approach. You DON'T need to check whether they sum to 1000. It's way simpler:
Iterate A from 3 to 997
Iterate B from A+1 to 999-A
Do C = 1000 - A - B (that way you don't need to check the sum, because it holds by construction)
Check if they are a triplet; when they are, you are done!
There are other great approaches you can check out once you enter the right answer; they are way more interesting.
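The steps above can be sketched as follows. The function name and the total parameter are made up for illustration; the loop bounds follow the listed ranges.

```python
def special_triplet(total=1000):
    """Return (a, b, c) with a < b, a + b + c == total and a^2 + b^2 == c^2."""
    for a in range(3, total - 2):          # A from 3 to total - 3
        for b in range(a + 1, total - a):  # B from A + 1 to total - 1 - A
            c = total - a - b              # the sum holds by construction
            if a * a + b * b == c * c:
                return a, b, c
    return None  # no triplet with this perimeter


print(special_triplet())
# (200, 375, 425)
```

Since a < b and a² + b² = c² force c to be the largest value, there is no need for a separate b < c check.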
