Total Sum of Squares (TSS) in python - python

I'm trying to calculate the total sum of squares (TSS) using Python.
I know that the formula for TSS is:
[image: TSS formula][1]
I wrote this code to do it:
from statistics import mean
x = ([3,1,3,1,3,13])
def tss(a):
    m = mean(a)
    for i in a:
        i += ((i-m)**2)
    return (i)
print(tss(x))
The problem is that it keeps returning 94, but I know the correct answer is 102. I don't have a clue what I did wrong. Can anybody help me?
[1]: https://i.stack.imgur.com/Alx6r.png

The loop variable i is reassigned on every iteration, so whatever you added to it on the previous iteration is lost. On the last iteration i is set to 13, and then the squared difference between 13 and the mean (4) is added to it: 13 + 81 = 94. You need a separate variable to accumulate the sum, so it doesn't get lost on each iteration. You want:
from statistics import mean
x = ([3,1,3,1,3,13])
def tss(a):
    m = mean(a)
    n = 0
    for i in a:
        n += ((i-m)**2)
    return (n)
print(tss(x))
@mateen's answer is more Pythonic and will perform better than an explicit loop, but I don't think you'd get the same understanding from it. Welcome to Python!
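To see where the 94 comes from, the two loop patterns can be run side by side. This is only a small sketch; the names buggy and total are made up for illustration.

```python
from statistics import mean

x = [3, 1, 3, 1, 3, 13]
m = mean(x)  # 4 for this list

# Buggy pattern: the loop variable is overwritten at the start of every
# iteration, so only the last element contributes to the final value.
for i in x:
    i += (i - m) ** 2
buggy = i  # 13 + (13 - 4)**2

# Correct pattern: a separate accumulator survives across iterations.
total = 0
for i in x:
    total += (i - m) ** 2

print(buggy, total)
```

The first number is the last element plus its own squared deviation; the second is the sum over all elements.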

If you want to keep your initial script, just do:
from statistics import mean
x = ([3, 1, 3, 1, 3, 13])
def tss(a):
    m = mean(a)  # compute the mean once, not on every iteration
    total = 0
    for i in a:
        total = total + ((i-m)**2)
    return total

Without numpy:
def tss(xs):
    m = sum(xs) / len(xs)
    return sum((x - m)**2 for x in xs)
With numpy:
import numpy as np
def tss(x):
    return ((x - np.mean(x))**2).sum()
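One way to sanity-check any of these versions, assuming only the standard library, is to use the fact that TSS equals n times the population variance. The sketch below compares a plain-Python tss against statistics.pvariance for the list from the question.

```python
from statistics import mean, pvariance

def tss(values):
    """Total sum of squares: squared deviations from the mean, summed."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

x = [3, 1, 3, 1, 3, 13]
total = tss(x)

# Cross-check: population variance is TSS / n, so n * pvariance should match.
check = len(x) * pvariance(x)

print(total, check)
```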

Related

How do I optimize a simple multivariate function in economics

I apologize if this is simple, but I have looked for over an hour and nothing has worked. I am attempting to use Python to find the optimal level of L (labor) and the resulting output (profit), given the objective function 5(12*L*K - 0.1*L^2*K) - 5*L - 5*K with K fixed at 10.
I have tried to use the following code from other answers to similar questions (using the '-' to maximize).
def micro(L):
    return 5(12*L*10 - 0.1*L**2*10) - 5*L - 5*10
results = minimize(-micro, 0)
I'm still new to python so I could just be completely off base. Thanks for the help!
scipy.optimize.fmin can do this. With K = 10 the objective simplifies to -5*L**2 + 595*L - 50, and since fmin minimizes, the function returns its negation:
>>> from scipy.optimize import fmin
>>> def fn(x):
...     return -(-5*x*x+595*x-50)
...
>>> fmin(fn,0)
Optimization terminated successfully.
         Current function value: -17651.250000
         Iterations: 37
         Function evaluations: 74
[59.5]
>>>
Using np.arange, we create a NumPy array from 0 up to (but not including) 100 in steps of 0.1, plug it into the profit equation, and use np.argmax to get the index of the largest profit, which tells us how much labor is needed to reach it.
import numpy as np
def find_optimal_L():
    L = np.arange(0, 100, 0.1)
    profit = 5*(12*L*10 - 0.1*L**2*10) - 5*L - 5*10
    return L[np.argmax(profit)], np.max(profit)
L,profit = find_optimal_L()
print(L,profit)
Here is another option:
import numpy as np
my_results = []
def micro(L):
    result = 5*(12*L*10 - 0.1*L**2*10) - 5*L - 5*10
    my_results.append(result)
    return np.amax(my_results)
You can change the range here:
L = np.linspace(-100, 100, 1000)
micro(L)
17651.232263294325 # Output
If you want to minimise, just change the return:
    return np.amin(my_results) # -109550.0 Output
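Since K is fixed, the objective is just a quadratic in L, so the numerical answers can also be cross-checked by hand using the vertex of the parabola. This is a sketch of that check; the coefficient names a, b, c and the name best_L are introduced here for illustration and are not from the question.

```python
K = 10

def profit(L):
    """Objective from the question with K fixed."""
    return 5 * (12 * L * K - 0.1 * L**2 * K) - 5 * L - 5 * K

# Expanding profit(L) gives a*L**2 + b*L + c with:
a = -0.5 * K      # from 5 * (-0.1 * L**2 * K)
b = 60 * K - 5    # from 5 * 12 * L * K and -5 * L
c = -5 * K        # from -5 * K

# a < 0, so the parabola opens downward and its vertex is the maximum.
best_L = -b / (2 * a)

print(best_L, profit(best_L), a * best_L**2 + b * best_L + c)
```

Both the original expression and the expanded form should agree at best_L, up to floating-point rounding.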

Any easy way to transform a missing number sequence to its range?

Suppose I have a list like this:
[1,2,3,4,9,10,11,20]
I need the result to be:
[[4,9],[11,20]]
I have defined a function like this:
def get_range(lst):
    i=0
    seqrange=[]
    for new in lst:
        a=[]
        start=new
        end=new
        if i==0:
            i=1
            old=new
        else:
            if new - old >1:
                a.append(old)
                a.append(new)
        old=new
        if len(a):
            seqrange.append(a)
    return seqrange
Is there an easier, more efficient way to do it? I need to do this on the scale of millions of elements.
You can use numpy arrays and the diff function that comes along with them. Numpy is so much more efficient than looping when you have millions of rows.
Slight aside:
Why are numpy arrays so fast? Because they are arrays of data instead of arrays of pointers to data (which is what Python lists are), because they offload a whole bunch of computations to a backend written in C, and because they leverage the SIMD paradigm to run a Single Instruction on Multiple Data simultaneously.
Now back to the problem at hand:
The diff function gives us the difference between consecutive elements of the array. Pretty convenient, given that we need to find where this difference is greater than a known threshold!
import numpy as np
threshold = 1
arr = np.array([1,2,3,4,9,10,11,20])
deltas = np.diff(arr)
# There's a gap wherever the delta is greater than our threshold
gaps = deltas > threshold
gap_indices = np.argwhere(gaps)
gap_starts = arr[gap_indices]
gap_ends = arr[gap_indices + 1]
# Finally, stack the two arrays horizontally
all_gaps = np.hstack((gap_starts, gap_ends))
print(all_gaps)
# Output:
# [[ 4  9]
#  [11 20]]
You can access all_gaps like a 2D matrix: all_gaps[0, 1] would give you 9, for example. If you really need the answer as a list-of-lists, simply convert it like so:
all_gaps_list = all_gaps.tolist()
print(all_gaps_list)
# Output: [[4, 9], [11, 20]]
Comparing the runtime of the iterative method from @happydave's answer with the numpy method:
import random
import timeit
import numpy as np
def gaps1(arr, threshold):
    deltas = np.diff(arr)
    gaps = deltas > threshold
    gap_indices = np.argwhere(gaps)
    gap_starts = arr[gap_indices]
    gap_ends = arr[gap_indices + 1]
    all_gaps = np.hstack((gap_starts, gap_ends))
    return all_gaps
def gaps2(lst, thr):
    seqrange = []
    for i in range(len(lst)-1):
        if lst[i+1] - lst[i] > thr:
            seqrange.append([lst[i], lst[i+1]])
    return seqrange
test_list = [i for i in range(100000)]
for i in range(100):
    # pop() removes by index; remove() deletes by value and can raise ValueError on a repeat
    test_list.pop(random.randint(0, len(test_list) - 1))
test_arr = np.array(test_list)
# Make sure both give the same answer:
assert np.all(gaps1(test_arr, 1) == gaps2(test_list, 1))
t1 = timeit.timeit('gaps1(test_arr, 1)', setup='from __main__ import gaps1, test_arr', number=100)
t2 = timeit.timeit('gaps2(test_list, 1)', setup='from __main__ import gaps2, test_list', number=100)
print(f"t1 = {t1}s; t2 = {t2}s; Numpy gives ~{t2 // t1}x speedup")
On my laptop, this gives:
t1 = 0.020834800001466647s; t2 = 1.2446780000027502s; Numpy gives ~59.0x speedup
My word that's fast!
There is also an iterator-based solution, which lets you get the intervals one at a time:
flist = [1,2,3,4,9,10,11,20]
def get_range(lst):
    start_idx = lst[0]
    for current_idx in lst[1:]:
        if current_idx > start_idx+1:
            yield [start_idx, current_idx]
        start_idx = current_idx
for interval in get_range(flist):
    print(interval)
I don't think there's anything inefficient about the solution, but you can clean up the code quite a bit:
seqrange = []
for i in range(len(lst)-1):
    if lst[i+1] - lst[i] > 1:
        seqrange.append([lst[i], lst[i+1]])
Here is another approach. Note that each `i in lst` test scans the whole list, so it is unlikely to be faster on large inputs:
def func(lst):
    ans=0
    final=[]
    sol=[]
    for i in range(1,lst[-1]+1):
        if(i not in lst):
            ans+=1
            final.append(i)
        elif(i in lst and ans>0):
            final=[final[0]-1,i]
            sol.append(final)
            ans=0
            final=[]
        else:
            final=[]
    return(sol)
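For completeness, the pairwise comparison used in several of the answers above can also be written with zip, without numpy and without index arithmetic. This is just a sketch that assumes the input list is already sorted; find_gaps and threshold are names made up for this example.

```python
def find_gaps(values, threshold=1):
    """Return [previous, next] pairs where consecutive values differ by more than threshold.

    Assumes values is sorted in ascending order.
    """
    # zip pairs each element with its successor: (v0, v1), (v1, v2), ...
    return [[a, b] for a, b in zip(values, values[1:]) if b - a > threshold]

gaps = find_gaps([1, 2, 3, 4, 9, 10, 11, 20])
print(gaps)
```

It is still a single pass over the list, so it scales linearly; the slice values[1:] does make one copy of the list.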

Write a code that Calculates the average number of non-zero ratings per individual in our data set

So I've written my code below, but I am having a hard time getting it to exclude the zeros. It runs, but unfortunately not the way I want it to. Can anyone shed some light?
The variable movies is basically a list of movies we were given. The code will be right if the output is an average of 21.4.
all_ratings =[
[5,5,4,4,3,1,2,3,4,4,4,3,4,0,0,0,1,2,3,4,4,4,1,4,0,0,0,1,2,5],
[5,0,1,2,3,1,2,3,4,4,4,5,4,2,1,0,1,2,0,5,0,4,1,4,2,0,0,1,0,5],
[5,2,3,4,4,0,0,0,4,5,0,3,0,0,0,3,4,0,1,4,4,4,0,4,0,3,0,1,2,5],
[5,0,4,0,0,4,2,3,0,0,4,0,3,0,1,0,1,2,3,0,2,0,1,0,0,0,4,0,1,5],
[5,4,3,2,1,1,2,3,4,3,4,3,4,0,3,0,1,2,4,4,4,4,1,4,0,0,0,1,2,5],
]
total=[]
average=[]
for index in range (len(all_ratings)):
    total+=[sum(all_ratings[index])]
for index in range(len(all_ratings)):
    average = average + [total[index]/30]
for index in range(len(movies)):
    print(average)
Define:
def mean(l):
    return sum(l)/len(l)
and now:
[mean([y for y in x if y > 0]) for x in all_ratings]
You can use a NumPy array to select the nonzero elements and numpy.mean to get the average:
from numpy import array, mean
answ = [mean(array(el)[array(el)!=0]) for el in all_ratings]
print (answ)
Output:
[3.2083333333333335, 2.869565217391304, 3.4210526315789473, 2.8125, 2.96]
Perhaps this might be a solution.
all_ratings =[
[5,5,4,4,3,1,2,3,4,4,4,3,4,0,0,0,1,2,3,4,4,4,1,4,0,0,0,1,2,5],
[5,0,1,2,3,1,2,3,4,4,4,5,4,2,1,0,1,2,0,5,0,4,1,4,2,0,0,1,0,5],
[5,2,3,4,4,0,0,0,4,5,0,3,0,0,0,3,4,0,1,4,4,4,0,4,0,3,0,1,2,5],
[5,0,4,0,0,4,2,3,0,0,4,0,3,0,1,0,1,2,3,0,2,0,1,0,0,0,4,0,1,5],
[5,4,3,2,1,1,2,3,4,3,4,3,4,0,3,0,1,2,4,4,4,4,1,4,0,0,0,1,2,5],
]
def get_average(vals): # without zero
    counter = 0
    total = 0
    for i in range(len(vals)):
        if (vals[i] != 0):
            counter += 1
            total += vals[i]
    return round(total / counter, 2)
for i in range(len(all_ratings)):
    print(get_average(all_ratings[i]))
Update: I guess 21.4 is the average number of movies rated per individual, i.e. the average count of non-zero entries in each list of ratings. The code below returns 21.4:
import numpy as np
counter = 0
for index, item in np.ndenumerate(all_ratings):
    if item != 0:
        counter += 1
print(counter / 5)
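The same count can be done without numpy. The sketch below assumes, as the update does, that the target is the number of non-zero ratings per individual averaged over all individuals; rated_counts and average_rated are names chosen for this example.

```python
all_ratings = [
    [5,5,4,4,3,1,2,3,4,4,4,3,4,0,0,0,1,2,3,4,4,4,1,4,0,0,0,1,2,5],
    [5,0,1,2,3,1,2,3,4,4,4,5,4,2,1,0,1,2,0,5,0,4,1,4,2,0,0,1,0,5],
    [5,2,3,4,4,0,0,0,4,5,0,3,0,0,0,3,4,0,1,4,4,4,0,4,0,3,0,1,2,5],
    [5,0,4,0,0,4,2,3,0,0,4,0,3,0,1,0,1,2,3,0,2,0,1,0,0,0,4,0,1,5],
    [5,4,3,2,1,1,2,3,4,3,4,3,4,0,3,0,1,2,4,4,4,4,1,4,0,0,0,1,2,5],
]

# Number of non-zero ratings for each individual (one row per individual).
rated_counts = [sum(1 for r in person if r != 0) for person in all_ratings]

# Average of those counts across individuals.
average_rated = sum(rated_counts) / len(all_ratings)

print(rated_counts, average_rated)
```

Dividing by len(all_ratings) rather than a hard-coded 5 keeps the result correct if more individuals are added.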

How to find highest power of 2 less than n in a list?

I have a list like
lst = [20, 40, 110]
I want to find, for each element, the highest power of 2 that satisfies the following:
For the first number, the input is the first element of the list itself, so the result is 16 (the highest power of 2 not exceeding 20).
For each subsequent number, the input is the sum of the previous result (i.e. 16) and the current number (i.e. 40), so the result is 32 (the highest power of 2 not exceeding 40 + 16 = 56).
So the output I expect is
lst_pow2 = [16, 32, 128]
This is my current code to find the highest power of 2 for a single number, but for my problem it needs to change because my input is a list. Any suggestions? Thanks
# Python3 program to find highest
# power of 2 smaller than or
# equal to n.
import math
def highestPowerof2(n):
    p = int(math.log(n, 2));
    return int(pow(2, p));
This is what I tried, but it does not do the summation:
lst_power2 = [highestPowerof2(lst[i]) for i in range(len(lst))]
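You can perhaps use the following (note that it adds the highest power of 2 of the previous element rather than of the previous result; the two happen to coincide for this list):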
lst_power2 = [highestPowerof2(lst[i]+((i>0) and highestPowerof2(lst[i-1]))) for i in range(len(lst))]
instead of
lst_power2 = [highestPowerof2(lst[i]) for i in range(len(lst))]
You may want to modify your approach thus:
Modify your function to take 2 integers. prev_power and curr_num (this was n in your code)
Calculate the power of 2 for the first number and add to a result list
Now pass this number and the next number in the list to your highestPowerof2 function
Use an extra variable that keeps track of the value to be added, and build your logic while iterating.
lst = [20, 40, 110]
import math
def highestPowerof2(n):
    p = int(math.log(n, 2)) #you do not need semi colons in python
    return int(pow(2, p))
acc = 0 #to keep track of what was the last highest* power
result = []
for n in lst:
    result.append(highestPowerof2(n + acc))
    acc = result[-1]
print(result)
print(result)
#Output:
[16, 32, 128]
This question has an accepted answer, but I thought this would be a good problem that could also be solved by using a generator. The accepted answer is definitely compact, but I thought it would be fun to give this solution as well.
lst = [20, 40, 110]
import math
def highestPowerof2(lst):
    last = 0
    for element in lst:
        p = int(math.log(element + last, 2))
        last = int(pow(2, p)) # Remember the last value
        yield last
lst_power2 = [i for i in highestPowerof2(lst)]
print(lst_power2)
You could use reduce() too:
functools.reduce(lambda res,n:res+[highestPowerof2(n+res[-1])],lst,[0])[1:]
which is short, just the [1:] is ugly at the end
Or as:
functools.reduce(lambda res,n:res+[highestPowerof2(n+(len(res) and res[-1]))],lst,[])
which does not need the slicing, but it is less readable inside.
Full example:
import math,functools
def highestPowerof2(n):
    p = int(math.log(n, 2))
    return int(pow(2, p))
lst = [20, 40, 110]
print(functools.reduce(lambda res,n:res+[highestPowerof2(n+res[-1])],lst,[0])[1:])
print(functools.reduce(lambda res,n:res+[highestPowerof2(n+(len(res) and res[-1]))],lst,[]))
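All of the versions above rely on math.log(n, 2), which works in floating point and can be slightly off for large integers. As a possible alternative, here is a sketch using int.bit_length, which stays in integer arithmetic. It assumes every input is a positive integer; highest_power_of_2 and running_powers are names made up for this example.

```python
def highest_power_of_2(n):
    """Largest power of two that is <= n, for a positive integer n."""
    # bit_length() is the position of the highest set bit, counted from 1.
    return 1 << (n.bit_length() - 1)

def running_powers(values):
    """Apply highest_power_of_2 to each value plus the previous result."""
    result = []
    prev = 0  # nothing is carried into the first element
    for v in values:
        prev = highest_power_of_2(v + prev)
        result.append(prev)
    return result

lst_pow2 = running_powers([20, 40, 110])
print(lst_pow2)
```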

Multiplying two arrays in Python with different lengths

I want to know if it's possible to solve this problem. I have these values:
yf = (0.23561643, 0.312328767, 0.3506849315, 0.3890410958, 0.4273972602, 0.84931506)
z = (4.10592285e-05, 0.0012005020, 0.00345332906, 0.006367483, 0.0089151571, 0.01109750, 0.01718827)
I want to use this function (discount factor), but it doesn't work because z and yf have different lengths.
def f(x):
    res = 1/( 1 + x * yf)
    return res
f(z)
output: ValueError: cannot evaluate a numeric op with unequal lengths
My question is whether there is a way to solve this. The approximate output values are:
res = (0.99923, 0.99892, 0.99837, 0.99802, 0.99763, 0.99175)
Any help would be great, and thanks in advance to everyone who takes the time to read this or tries to help.
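Do you want to truncate both arrays to the length of whichever is shorter? You can do this:
import numpy as np
def f(x):
    leng = min(len(x), len(yf))
    x = np.asarray(x[:leng])
    new_yf = np.asarray(yf[:leng]) # Slice into a new variable; don't modify the global.
    res = 1/( 1 + x * new_yf)
    return res
The conversion to NumPy arrays is what makes x * new_yf an element-wise product; plain tuples cannot be multiplied that way.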
Find the minimum length and slice both sequences to it. Converting to NumPy arrays also avoids an explicit loop:
import numpy as np
yf = (0.23561643, 0.312328767, 0.3506849315, 0.3890410958, 0.4273972602, 0.84931506)
z = (4.10592285e-05, 0.0012005020, 0.00345332906, 0.006367483, 0.0089151571, 0.01109750, 0.01718827)
x=min(len(yf),len(z))
res = 1/( 1 + np.array(z[:x]) * np.array(yf[:x]))
Or, using numpy.multiply:
res = 1/( 1 + np.multiply(np.array(z[:x]),np.array(yf[:x])))
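If numpy is not available, the same truncate-to-the-shorter behaviour falls out of zip, which stops at the end of the shorter input. This is only a sketch: the function name discount_factors and the parameter names rates and year_fractions are my reading of z and yf, not something stated in the question.

```python
yf = (0.23561643, 0.312328767, 0.3506849315, 0.3890410958, 0.4273972602, 0.84931506)
z = (4.10592285e-05, 0.0012005020, 0.00345332906, 0.006367483, 0.0089151571, 0.01109750, 0.01718827)

def discount_factors(rates, year_fractions):
    """Element-wise 1 / (1 + rate * year_fraction) over the common length."""
    # zip() stops at the shorter input, so the extra trailing value is ignored.
    return [1 / (1 + r * t) for r, t in zip(rates, year_fractions)]

res = discount_factors(z, yf)
print(res)
```

Whether silently dropping the last element of z is the right thing depends on how the two series are meant to line up, so it is worth checking that before relying on the result.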
