Weighted averaging a list

Thanks for your responses. Yes, I was looking for the weighted average.
rate = [14.424, 14.421, 14.417, 14.413, 14.41]
amount = [3058.0, 8826.0, 56705.0, 30657.0, 12984.0]
I want the weighted average of the top list based on each item of the bottom list.
So, if the first bottom-list item is small (such as 3,058 compared to the total 112,230), then the first top-list item should have less of an effect on the top-list average.
Here is some of what I have tried. It gives me an answer that looks right, but I am not sure if it follows what I am looking for.
for g in range(len(rate)):
    rate[g] = rate[g] * (amount[g] / sum(amount))
rate = sum(rate)
EDIT:
After comparing other responses with my code, I decided to use the zip-based code to keep it as short as possible.

You could use numpy.average to calculate the weighted average.
In [13]: import numpy as np
In [14]: rate = [14.424, 14.421, 14.417, 14.413, 14.41]
In [15]: amount = [3058.0, 8826.0, 56705.0, 30657.0, 12984.0]
In [17]: weighted_avg = np.average(rate, weights=amount)
In [19]: weighted_avg
Out[19]: 14.415602815646439

for g in range(len(rate)):
    rate[g] = rate[g] * amount[g] / sum(amount)
rate = sum(rate)
is the same as:
sum(rate[g] * amount[g] / sum(amount) for g in range(len(rate)))
which is the same as:
sum(rate[g] * amount[g] for g in range(len(rate))) / sum(amount)
which is the same as:
sum(x * y for x, y in zip(rate, amount)) / sum(amount)
Result:
14.415602815646439
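One way to convince yourself of these equivalences is to compute two of the forms side by side and compare them. This is only a sketch: the names `form1`, `form2` and `total` are made up for illustration, and the comparison uses `math.isclose` because floating-point rounding can make the last few bits differ.

```python
import math

rate = [14.424, 14.421, 14.417, 14.413, 14.41]
amount = [3058.0, 8826.0, 56705.0, 30657.0, 12984.0]

total = sum(amount)  # computed once instead of once per element

# Form 1: scale each rate by its share of the total, then add up
form1 = sum(r * (a / total) for r, a in zip(rate, amount))

# Form 2: add up rate * amount, then divide by the total once
form2 = sum(r * a for r, a in zip(rate, amount)) / total

# The two may differ in the last few bits because of floating-point rounding
print(math.isclose(form1, form2))
print(round(form2, 6))
```

Note that the original loop also overwrites `rate` in place, while both forms here leave the input lists untouched.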

This looks like a weighted average.
values = [1, 2, 3, 4, 5]
weights = [2, 8, 50, 30, 10]
s = 0
for x, y in zip(values, weights):
    s += x * y
average = s / sum(weights)
print(average) # 3.38
This outputs 3.38, which indeed tends more toward the values with the highest weights.

Let's use Python's zip function:
zip(*iterables)
In Python 3, zip returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument iterables. It stops as soon as the shortest argument is exhausted. With a single iterable argument, it yields 1-tuples. With no arguments, it yields nothing. (In Python 2, zip returned a list of tuples instead.)
values = [14.424, 14.421, 14.417, 14.413, 14.41]
weights = [3058.0, 8826.0, 56705.0, 30657.0, 12984.0]
weighted_average = sum(value * weight for value, weight in zip(values, weights)) / sum(weights)
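The pairing and truncation behaviour of zip is easy to see on a tiny input. A minimal sketch, assuming Python 3; the lists `letters` and `numbers` are invented for the illustration.

```python
letters = ["a", "b"]
numbers = [1, 2, 3]

pairs = zip(numbers, letters)   # in Python 3 this is a lazy iterator, not a list
as_list = list(pairs)           # materialize it to look at the contents

print(as_list)                  # the unmatched 3 is dropped
```

Because the extra item is dropped silently, a rate list and an amount list of different lengths would still produce a number. On Python 3.10 and later, `zip(numbers, letters, strict=True)` raises a ValueError instead.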

As a documented and tested function:
def weighted_average(values, weights=None):
    """
    Returns the weighted average of `values` with weights `weights`
    Returns the simple arithmetic average if `weights` is None.
    >>> weighted_average([3, 9], [1, 2])
    7.0
    >>> 7 == (3*1 + 9*2) / (1 + 2)
    True
    """
    if weights is None:
        weights = [1 for _ in range(len(values))]
    normalization = 0
    val = 0
    for value, weight in zip(values, weights):
        val += value * weight
        normalization += weight
    return val / normalization
For completeness another version where the values and weights are stored in tuples:
def weighted_average(values_and_weights):
    """
    The input is expected in the form:
    [(value_1, weight_1), (value_2, weight_2), ...(value_n, weight_n)]
    >>> weighted_average([(3,1), (9,2)])
    7.0
    >>> 7 == (3*1 + 9*2) / (1 + 2)
    True
    """
    normalization = 0
    val = 0
    for value, weight in values_and_weights:
        val += value * weight
        normalization += weight
    return val / normalization
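Both versions above divide by the total weight and pair items with zip or tuple unpacking, so weights that sum to zero or lists of different lengths are worth guarding against. The following is a sketch of one possible guard; `checked_weighted_average` is a hypothetical name, not something from the answers above.

```python
def checked_weighted_average(values, weights):
    """Weighted average that rejects inputs the plain version mishandles.

    `checked_weighted_average` is a hypothetical name used for this sketch.
    """
    values = list(values)
    weights = list(weights)
    if len(values) != len(weights):
        # zip() would silently drop the extra items
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        # avoids a ZeroDivisionError with a less helpful message
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

print(checked_weighted_average([3, 9], [1, 2]))  # 7.0, as in the doctest above
```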

Related

Removing a row of data that contains values more than two standard deviations away from the mean

So I have a list of lists as my dataset X. I computed the means and standard deviations of each row and stored them each into their own list. My goal is finding which rows of X have outliers (values that are more than two standard deviations away from the mean) and deleting the entire row. I have only accomplished removing the outliers from a single test list and not a list of lists:
from math import sqrt
def std_dev(lst): # standard deviation function
    mean = float(sum(lst)) / len(lst)
    return sqrt(sum((x - mean)**2 for x in lst) / len(lst))
def compute_std(X):
    std = []
    std.append([std_dev(char) for char in X])
    return std
std = compute_std(X)
def means(lst):
    return float(sum(lst)) / len(lst)
def compute_mean(X):
    mean = []
    mean.append([means(chars) for chars in X])
    return mean
mean = compute_mean(X)
final_list1 = [x for x in X if (x > mean - 2 * std)]
final_list = [x for x in final_list1 if (x < mean + 2 * std)]
The last two lines of code have only worked on a single list, and I want them to iterate through each list in X. I am new to Python and list comprehensions.

I did not use a list comprehension to get the valid rows of X, and I'm not sure I have found the qualifying rows correctly, (with the stdev and mean).
Perhaps someone else can create a comprehension, but I thought there were too many calculations (mean, stddev, and the comparison) to make a succinct list comprehension.
from statistics import mean, stdev
from random import randint, seed
#seed(1)
X = []
final_list = []
for i in range(10):
    M = []
    for j in range(25):
        M.append(randint(-25, 1000))
    X.append(M)
for row in X:
    sd = stdev(row)
    avg = mean(row)
    #print(avg, sd)
    high = avg + 2 * sd
    low = avg - 2 * sd
    if all(low < num < high for num in row):
        final_list.append(row)
print(len(final_list))
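One detail to be aware of: `statistics.stdev` is the sample standard deviation (it divides by n - 1), while the `std_dev` function in the question divides by `len(lst)`, which corresponds to `statistics.pstdev`. The sketch below wraps the same row filter in a function using `pstdev` and a small deterministic dataset; `rows_without_outliers` and `n_sigma` are names invented for this example.

```python
from statistics import mean, pstdev

def rows_without_outliers(X, n_sigma=2):
    """Keep only rows in which every value lies within n_sigma
    population standard deviations of that row's mean."""
    kept = []
    for row in X:
        avg = mean(row)
        sd = pstdev(row)  # divides by len(row), like std_dev() in the question
        if all(abs(x - avg) <= n_sigma * sd for x in row):
            kept.append(row)
    return kept

X = [
    [1, 2, 3, 4, 5],     # evenly spread, nothing unusual
    [10] * 9 + [100],    # 100 is three standard deviations above the mean
]
print(rows_without_outliers(X))
```

Here only the first row survives: the second row has mean 19 and population standard deviation 27, so 100 lies outside the range 19 ± 54.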

Code to maximize the sum of squares modulo m

Inputs:
k -> number of lists
m -> modulo
Constraints:
1 <= k <= 7
1 <= m <= 1000
1 <= magnitude of elements in list <= 10^9
1 <= number of elements in each list <= 7
This snippet of code is responsible for maximizing (x1^2 + x2^2 + ...) % m where x1, x2, ... are chosen from lists X1, X2, ...
k,m=map(int,input().split())
Sum=0
s=[]
for _ in range(k):
    s.append(max(map(int,input().split())))
    Sum+=int(s[_])**2
print(Sum%m)
So for instance if inputs are :
3 1000
2 5 4
3 7 8 9
5 5 7 8 9 10
The output would be 206, owing to selecting highest element in each list, square that element, take the sum and perform modulus operation using m
So, it would be (5^2+9^2+10^2)%1000=206
If I provide input like,
3 998
6 67828645 425092764 242723908 669696211 501122842 438815206
4 625649397 295060482 262686951 815352670
3 100876777 196900030 523615865
The expected output is 974, but I am getting 624
I would like to know how you would approach this problem or how to correct existing code.

You have to find max((sum of squares) modulo m). That's not the same as max(sum of squares) modulo m.
It may be that you find a sum of squares that's not in absolute terms as large as possible, but is maximum when you take it modulo m.
For example:
m=100
[10, 9],
[10, 5]
Here, the maximum sum of squares is 100 + 100 = 200, which is 0 modulo 100. The maximum (sum of squares modulo 100) comes from 81 + 100 = 181, which is 81 modulo 100.
Given that m is forced to be small, there's a fast dynamic programming solution that runs in O(m * N) time, where N is the total number of items in all the lists.
def solve(m, xxs):
    # r[i] counts the ways to reach residue i using the lists seen so far
    r = [1] + [0] * (m - 1)
    for xs in xxs:
        s = [0] * m
        for i in range(m):
            for x in xs:
                xx = (x * x) % m
                s[i] += r[(i - xx) % m]
        r = s
    return max(i for i in range(m) if r[i])
m = 998
xxs = [
    [67828645, 425092764, 242723908, 669696211, 501122842, 438815206],
    [625649397, 295060482, 262686951, 815352670],
    [100876777, 196900030, 523615865]]
print(solve(m, xxs))
This outputs 974 as required.
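The same idea can be written with sets, since only which residues are reachable matters, not how many ways there are to reach them. This is a sketch of a variant, not the answer's exact method; `max_sum_of_squares_mod` and `reachable` are names chosen for the illustration.

```python
def max_sum_of_squares_mod(m, lists):
    """Largest value of (x1**2 + x2**2 + ...) % m, picking one x per list."""
    reachable = {0}  # residues obtainable before any list has been used
    for xs in lists:
        squares = {(x * x) % m for x in xs}
        # every old residue combined with every square residue of this list
        reachable = {(r + sq) % m for r in reachable for sq in squares}
    return max(reachable)

# The small counterexample from this answer: 81 + 100 = 181 beats 100 + 100 = 200
print(max_sum_of_squares_mod(100, [[10, 9], [10, 5]]))                      # 81
# The first sample from the question, with the leading counts removed
print(max_sum_of_squares_mod(1000, [[5, 4], [7, 8, 9], [5, 7, 8, 9, 10]]))  # 206
```

The set never holds more than m residues, so the work per list stays bounded by m times the number of distinct squares in that list.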

One important logical problem here is that the first number on each input line is the count of items in that list, so you have to skip it when looking for the max element in your for loop.
For example, the input line is
6 67828645 425092764 242723908 669696211 501122842 438815206
but your data is
67828645 425092764 242723908 669696211 501122842 438815206
That is, instead of
input().split()
you have to use
input().split()[1:]
As pointed out by Paul Hankin, you basically need to find max(sum of squares % m).
You have to find the combination, one element from each list, whose sum % m is maximal.
So, this is basically:
Scan the input, split on whitespace, drop the first element (the number of values on that line), and map the rest to integers. Then square them and append them to a list s. With that, take the Cartesian product using product from the itertools module. For example, product([1,2],[3,4,5]) yields (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5). Now you can compute the sum of each such tuple % m and take the max value!
That is,
k,m=map(int,input().split())
from itertools import product
s=[]
for _ in range(k):
    s.append(map(lambda x:x**2,map(int,input().split()[1:])))
print(max([sum(i)%m for i in product(*s)]))
This will give you the desired output!
Hope it helps!
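To see what `itertools.product` produces and why the largest sum is not necessarily the best one, here is a small sketch on the two toy lists mentioned above. The modulus `m = 7` is an arbitrary value picked for the illustration.

```python
from itertools import product

pairs = list(product([1, 2], [3, 4, 5]))
print(pairs)
# [(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)]

m = 7  # arbitrary modulus for the illustration
best = max(sum(combo) % m for combo in pairs)
print(best)
```

The pair (2, 5) has the largest sum, 7, but 7 % 7 is 0; the best residue, 6, comes from (1, 5) and (2, 4).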

Your question is not very clear. However, if I understand it correctly, you have lists of possible values for f(X1), ..., f(Xn) (probably obtained by applying f to all possible values for X1, ..., Xn), and you want to maximize f(X1)^2 + ... + f(Xn)^2 ?
If so, your code seems good, I get the same result:
lists = [[6, 67828645, 425092764, 242723908, 669696211, 501122842, 438815206],
[4, 625649397, 295060482, 262686951, 815352670],
[3, 100876777, 196900030, 523615865]]
sum = 0
for l in lists:
    sum += max(l)**2
print(sum%998)
This prints 624, just like your code. Where are you getting the 974 from?

Not going to win any codegolf with this but here was my solution:
from functools import reduce
def get_input():
    """
    gets input from stdin.
    input format:
    3 1000
    2 5 4
    3 7 8 9
    5 5 7 8 9 10
    """
    k, m = [int(i) for i in input().split()]
    lists = []
    for _ in range(k):
        lists.append([int(i) for i in input().split()[1:]])
    return m, k, lists
def maximise(m, k, lists):
    """
    m is the number by which the sum of squares is modulo'd
    k is the number of lists in the list of lists
    lists is the list of lists containing vals to be sum of squared
    maximise aims to maximise S for:
    S = (f(x1) + f(x2)...+ f(xk)) % m
    where:
    f(x) = x**2
    """
    max_value = reduce(lambda x,y: x+y**2, [max(l) for l in lists], 0)
    # if the max sum of squares is less than m, no sum can wrap around,
    # so the max sum itself has to be the answer
    if max_value < m:
        print(max_value)
        return
    results = []
    for product in cartesian_product(lists):
        S = reduce(lambda x, y: x + y**2, product, 0) % m
        # m - 1 is the largest possible residue, so stop early
        if S == m-1:
            print(S)
            return
        results.append(S)
    print(max(results))
def cartesian_product(ll, accum=None):
    """
    all combinations of lists made by combining one element from
    each list in a list of lists (cartesian product)
    """
    if not accum:
        accum = []
    for i in range(len(ll[0])):
        if len(ll) == 1:
            yield accum + [ll[0][i]]
        else:
            yield from cartesian_product(ll[1:], accum + [ll[0][i]])
if __name__ == "__main__":
    maximise(*get_input())

how to execute sigma operation in python

I am trying to create a function that computes this formula:
distance = sigma * (( observed - expected)**2 / expected )
This is my current code:
def distance(observed, expected):
    num = (observed - expected)**2
    den = (expected)
    dist = sigma * (num/den)
    return dist
I have no idea how I would compute sigma, so I appreciate any help/feedback!
Thanks!

Sigma here means the sum over multiple observed and expected pairs.
For example:
If observed is a list of numbers [ 1,1,3,3,...]
and expected is a list of expected values corresponding to the observed values, say, [1.2,1.3,3.1,3.2...]
Then you are required to find the sum over their individual distances.
def distance(observed, expected):
    res = 0
    for o, e in zip(observed,expected):
        res += (o-e)**2/e
    return res

Sigma means summation over the range, not multiplication.
Observed and expected must be lists of numbers of the same length.
def distance(observed, expected):
    sample_space_length = len(observed)
    distance = 0
    for x in range(sample_space_length):
        distance += ((observed[x] - expected[x]) ** 2) / expected[x]
    return distance
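The same sum can be written more compactly with the built-in `sum` and a generator expression. This is a sketch that also enforces the same-length requirement stated above; the sample numbers are made up for the illustration.

```python
def distance(observed, expected):
    """Sum of (o - e)**2 / e over paired observed and expected values."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# (1-1)**2/1 + (2-1)**2/1 + (5-4)**2/4 = 0 + 1 + 0.25
print(distance([1, 2, 5], [1, 1, 4]))  # 1.25
```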

observed should be a list of numbers:
def distance(observed, expected):
    return sum((item - expected)**2*1.0/ expected for item in observed )
observed = [1,3,45,56,3,2,4,5,6,7]
expected = sum(observed)//len(observed)  # integer division (13), which is what the result below is based on
print(distance(observed,expected))
274.461538462

How to optimize data generation for numpy call

I'd like to know how to make the following code shorter and/or more efficient. Could I (or should I) get rid of the for loop by using a functional method, or is there method I should be using from numpy?
The code calculates the expected value of an array of integers.
vals = np.arange(self.n+1)
# array of probability of each value in vals
parr = np.ones(len(vals))
for i in range(len(vals)):
    parr[i] *= self.prob(vals[i])
return np.dot(vals,parr)
As requested in comments, the implementation of the method prob():
def prob(self, x):
    """Computes probability of removing x items
    :param x: number of items to remove
    :returns: probability of removing x items
    """
    # p is the probability of removing an item
    # sl.choose computes n choose x
    return sl.choose(self.n, x) * (self.p**x) * \
        (1-self.p)**(self.n-x)

I think this will be faster:
vals = np.arange(self.n+1)
# array of probability of each value in vals
parr = self.prob(vals)
return np.dot(vals,parr)
and function:
def prob(self, list_of_x):
    """Computes probability of removing x items
    :param list_of_x: numpy array of numbers of items to remove
    :returns: array of probabilities of removing x items
    """
    # p is the probability of removing an item
    # sl.choose computes n choose x
    return np.asarray([sl.choose(self.n, e) for e in list_of_x]) * (self.p ** list_of_x) * \
        (1-self.p)**(self.n - list_of_x)
Because numpy is faster:
import timeit
import numpy as np
list_a = [1, 2, 3] * 1000
list_b = [4, 5, 6] * 1000
np_list_a = np.asarray(list_a)
np_list_b = np.asarray(list_b)
print(timeit.timeit('[a * b for a, b in zip(list_a, list_b)]', 'from __main__ import list_a, list_b', number=1000))
print(timeit.timeit('np_list_a * np_list_b', 'from __main__ import np_list_a, np_list_b', number=1000))
Result:
0.19378583212707723
0.004333830584755033

The loop can be reduced to a list comprehension:
vals = np.arange(self.n+1)
# array of probability of each value in vals
parr = [self.prob(v) for v in vals]
return np.dot(vals, parr)
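Since `prob` as shown in the question is the binomial probability mass function, the expected value being computed has the closed form n * p, which makes a handy check for any of the versions above. The sketch below uses only the standard library; `binomial_pmf` and `expected_removed` are hypothetical names, and `math.comb` (Python 3.8+) stands in for `sl.choose`, which is assumed to compute n choose x as the comment in the question says.

```python
import math

def binomial_pmf(n, p, x):
    """Probability of exactly x successes in n trials (stand-in for prob())."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def expected_removed(n, p):
    """Expected value computed the long way, as in the question."""
    return sum(x * binomial_pmf(n, p, x) for x in range(n + 1))

n, p = 10, 0.3
# The long sum should agree with the closed form n * p up to rounding
print(math.isclose(expected_removed(n, p), n * p))
```

If the class only ever needs the expected value, returning `self.n * self.p` directly would avoid building the probability array at all.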

Evaluation of lists: AvgP@K and R@K are they same?

My goal is to understand Average Precision at K, and Recall at K. I have two lists, one is predicted and other is actual (ground truth)
Let's call these two lists predicted and actual. Now I want to compute precision@k and recall@k.
Using Python, I implemented average precision at K as follows:
def apk(actual, predicted, k=10):
    """
    Computes the average precision at k.
    This function computes the average precision at k between two lists of items.
    Parameters
    ----------
    actual: list
        A list of elements that are to be predicted (order doesn't matter)
    predicted : list
        A list of predicted elements (order does matter)
    k: int, optional
    Returns
    -------
    score : double
        The average precision at k over the input lists
    """
    if len(predicted) > k:
        predicted = predicted[:k]
    score = 0.0
    num_hits = 0.0
    for i,p in enumerate(predicted):
        if p in actual and p not in predicted[:i]:
            num_hits += 1.0
            score += num_hits / (i + 1.0)
    if not actual:
        return 1.0
    if min(len(actual), k) == 0:
        return 0.0
    else:
        return score / min(len(actual), k)
Let's assume that predicted has 5 strings in the following order:
predicted = ['b','c','a','e','d'] and actual = ['a','b','e']. Since we are doing @k, would precision@k be the same as recall@k? If not, how would I do recall@k?
If I want to compute the F-measure (F-score), what would be the best route for the lists mentioned above?

I guess you've already checked the wiki. Based on its formula, the third and biggest one (after the words 'This finite sum is equivalent to:'), let's look at your example for each iteration:
i=1 p = 1
i=2 rel = 0
i=3 p = 2/3
i=4 p = 3/4
i=5 rel = 0
So, avp@4 = avp@5 = (1 + 0.66 + 0.75) / 3 = 0.805; avp@3 = (1 + 0.66) / 3 and so on.
Recall@5 = Recall@4 = 3/3 = 1; Recall@3 = 2/3; Recall@2 = Recall@1 = 1/3
Below is the code for precision@k and recall@k. I kept your notation, although it seems to be more common to use actual for the observed/returned value and expected for the ground truth (see for example JUnit defaults).
def precision(actual, predicted, k):
    act_set = set(actual)
    pred_set = set(predicted[:k])
    result = len(act_set & pred_set) / float(k)
    return result
def recall(actual, predicted, k):
    act_set = set(actual)
    pred_set = set(predicted[:k])
    result = len(act_set & pred_set) / float(len(act_set))
    return result
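The question also asks about the F-measure. One common choice is F1, the harmonic mean of precision and recall, which can be evaluated at the same cutoff k. The sketch below is self-contained and uses hypothetical function names (`precision_at_k`, `recall_at_k`, `f1_at_k`); it assumes a non-empty `actual` list and k > 0.

```python
def precision_at_k(actual, predicted, k):
    hits = len(set(actual) & set(predicted[:k]))
    return hits / k

def recall_at_k(actual, predicted, k):
    hits = len(set(actual) & set(predicted[:k]))
    return hits / len(set(actual))

def f1_at_k(actual, predicted, k):
    """Harmonic mean of precision at k and recall at k (0.0 when both are 0)."""
    p = precision_at_k(actual, predicted, k)
    r = recall_at_k(actual, predicted, k)
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)

predicted = ['b', 'c', 'a', 'e', 'd']
actual = ['a', 'b', 'e']
for k in (3, 5):
    print(k, precision_at_k(actual, predicted, k),
          recall_at_k(actual, predicted, k),
          f1_at_k(actual, predicted, k))
```

For this example, precision and recall coincide at k = 3 (both 2/3) only because k happens to equal the number of actual items; at k = 5 they differ (3/5 versus 1), giving an F1 of 0.75.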
