Getting mean values out of a for loop


Fairly new to Python, and I have a for loop that resembles this (I won't include the contents since they seem irrelevant):
for i, (A, B) in enumerate(X):
    # ...
    # arbitrary calculations
    # ...
    print(s1)
    print(s2)
This cycles through ten times (although it does vary occasionally), giving me 10 values for s1 and 10 for s2. Is there an efficient way of finding the means of these values?

You would need to either append each number to a list, or add them up on the fly before finding the mean.
Using a list:
s1_list = []
s2_list = []
for i, (A, B) in enumerate(X):
    # ...
    # arbitrary calculations
    # ...
    s1_list.append(s1)
    s2_list.append(s2)
s1_mean = sum(s1_list)/float(len(s1_list))
s2_mean = sum(s2_list)/float(len(s2_list))
Adding them up on the fly:
s1_total = 0
s2_total = 0
for i, (A, B) in enumerate(X):
    # ...
    # arbitrary calculations
    # ...
    s1_total += s1
    s2_total += s2
s1_mean = s1_total/float(len(X))
s2_mean = s2_total/float(len(X))
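Use float(); otherwise, in Python 2, dividing one integer by another truncates the result, so the mean is rounded down whenever it has a fractional part. (In Python 3, / always returns a float.)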
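On Python 3.4 and later, the standard library's statistics.mean can do the division for you, and it raises a clear StatisticsError if the loop produced no values. A minimal sketch, where compute_pair is a hypothetical stand-in for the question's arbitrary calculations and X is made-up sample data:

```python
from statistics import mean

def compute_pair(A, B):
    # Hypothetical stand-in for the question's "arbitrary calculations"
    return A + B, A * B

X = [(1, 2), (3, 4), (5, 6)]  # made-up sample data

s1_values = []
s2_values = []
for i, (A, B) in enumerate(X):
    s1, s2 = compute_pair(A, B)
    s1_values.append(s1)
    s2_values.append(s2)

s1_mean = mean(s1_values)  # raises StatisticsError if the list is empty
s2_mean = mean(s2_values)
print(s1_mean, s2_mean)
```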

I would not allocate lists as in the other answer; just sum inside the loop and divide by the total number of elements afterwards:
sum1 = 0
sum2 = 0
for i, (A, B) in enumerate(X):
    # ...
    # arbitrary calculations
    # ...
    sum1 += s1
    sum2 += s2
n = i + 1  # number of iterations (raises NameError if X is empty)
print(sum1 / float(n))
print(sum2 / float(n))
Building lists costs memory, which becomes significant if they grow large.
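If X is a generator (so len(X) is not available) or might be empty, a sketch that keeps running totals and counts the iterations with enumerate(..., start=1) avoids both the lists and the i + 1 bookkeeping. As before, compute_pair and the pairs generator are hypothetical placeholders, not part of the original question:

```python
def compute_pair(A, B):
    # Hypothetical stand-in for the question's "arbitrary calculations"
    return A + B, A * B

def pairs():
    # Hypothetical generator input: len() would fail on this
    yield (1, 2)
    yield (3, 4)
    yield (5, 6)

sum1 = 0.0
sum2 = 0.0
count = 0  # stays 0 if the loop body never runs
for count, (A, B) in enumerate(pairs(), start=1):
    s1, s2 = compute_pair(A, B)
    sum1 += s1
    sum2 += s2

if count:
    mean1 = sum1 / count
    mean2 = sum2 / count
else:
    mean1 = mean2 = None  # no iterations, so no mean
print(mean1, mean2)
```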

Sure, you can store the values in lists and average them afterwards.
lst_s1, lst_s2 = [], []
for i, (A, B) in enumerate(X):
    ...
    lst_s1.append(s1)
    lst_s2.append(s2)
    print(s1)
    print(s2)
avg_s1 = sum(lst_s1) / len(lst_s1)  # on Python 2, wrap len() in float() to avoid integer division
avg_s2 = sum(lst_s2) / len(lst_s2)

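Try the following snippet to calculate the mean of a list. Dividing each element by a large factor before summing keeps the intermediate sum small, so it avoids overflow when the values are very large floats (Python integers never overflow). Divide by the length before scaling back up, since multiplying the full sum by factor first could overflow again.
X = [9, 9, 9, 9, 9, 9, 9, 9]
factor = 1000000
xlen = len(X)
# Divide by xlen before multiplying back by factor so no intermediate value grows past the inputs
mean = sum(float(x) / factor for x in X) / xlen * factor
print(mean)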
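A simpler way to keep the intermediate values small is to divide each element by the count before adding, so the running total stays no larger in magnitude than the largest element (up to rounding). This is a sketch (the name mean_without_overflow is made up), and dividing first can cost a little precision compared with a plain sum:

```python
def mean_without_overflow(values):
    values = list(values)
    if not values:
        raise ValueError("mean of empty sequence")
    n = len(values)
    # Divide first: each term is at most max(|x|) / n, so the running
    # total stays within the range of the inputs
    return sum(float(x) / n for x in values)

big = [1e308] * 4
print(sum(big) / len(big))         # inf: the plain sum overflows
print(mean_without_overflow(big))  # about 1e308
```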

Related

Compare a list with multiple lists inside a list

I would like to compare a list with multiple lists stored in a list and get the average correctness of my data. I want to compare one list with 35 lists (for my project), but I've simplified it to comparing one list with three lists to make it easier to understand.
Here's what I've done so far:
def get_accuracy(a, b):
    # Initialize variable to get sum
    total = 0.0
    # Range of 35 because I have 35 lines of data stored in a CSV file
    for i in range(35):
        # Get the number of matching zeros and ones between 2 lists
        f = sum(a != b for a, b in zip(a, b))
        # Divide the number of matched zeros and ones by the length of the shorter list
        if len(a) > len(b):
            percentage = f / len(b) * 100
        else:
            percentage = f / len(a) * 100
        total += percentage
    # Return total/35 to get the average correctness after comparing with 35 lists
    return total / 35

l1 = [1,0,1,0,0]
l2 = [[1,0,1,1,0,1],[1,0,1,1,1,0,1,0,0],[1,0,1,1,0,1,0]]
res = get_accuracy(l1, l2)
# Expected answer should be 73.33%
print(res)
I've explained what each line of code does to complete my comparison. What changes do I have to make to compare l1 with every list in l2 to get an average matching correctness?

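I have found a simpler example to get list similarity as a percentage. Note that it compares sets, so it ignores order and repeated values; for the 0/1 lists in the question, every comparison would report 100%: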
# initialize lists
test_list1 = [1, 4, 6, 8, 9, 10, 7]
test_list2 = [7, 11, 12, 8, 9]
# printing original lists
print("The original list 1 is : " + str(test_list1))
print("The original list 2 is : " + str(test_list2))
# Percentage similarity of lists
# using "|" operator + "&" operator + set()
res = len(set(test_list1) & set(test_list2)) / float(len(set(test_list1) | set(test_list2))) * 100
# printing result
print("Percentage similarity among lists is : " + str(res))
If you are OK with using a library, difflib's SequenceMatcher (part of the standard library) makes it even easier to get a similarity ratio:
import difflib
sm=difflib.SequenceMatcher(None,a,b)
sm.ratio()
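A final version using difflib could look like this. Note that ratio() returns a value between 0 and 1 computed from matching subsequences rather than position by position, so it measures a somewhat different kind of similarity than the question's code:
import difflib

def get_accuracy(a, b):
    result = 0.0
    for list_contained in b:
        sm = difflib.SequenceMatcher(None, a, list_contained)
        result += sm.ratio()
    return result / len(b)

l1 = [1,0,1,0,0]
l2 = [[1,0,1,1,0,1],[1,0,1,1,1,0,1,0,0],[1,0,1,1,0,1,0]]
res = get_accuracy(l1, l2)
print(res)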

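This should do it, provided the loop runs over range(len(b)) instead of range(35), so that i indexes each inner list, and the comparison counts matches (==) rather than mismatches (!=):
f = sum(x == y for x, y in zip(a, b[i]))
The length check should then use len(b[i]) in place of len(b).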

Assuming your code works for a single list, this should work. A variable named sum would shadow the built-in sum() and make the call inside the loop fail, and repeating the same calculation in a range(35) loop does not change the result, so neither is used here; it also counts matches (==) rather than mismatches (!=):
def get_accuracy(a, b):
    # Running total of the per-list percentages (not called sum, so the built-in stays usable)
    overall = 0.0
    for list_in_b in b:
        # Get the number of matching zeros and ones between the 2 lists
        f = sum(x == y for x, y in zip(a, list_in_b))
        # Divide the number of matches by the length of the shorter list
        percentage = f / float(min(len(a), len(list_in_b))) * 100
        overall += percentage
    # Return the average correctness after comparing with every list in b
    return overall / len(b)
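A compact variant of the same idea, splitting the per-list percentage into a helper (positional_match is a made-up name) and averaging with statistics.mean. It reuses the question's get_accuracy name and also guards against empty inner lists, which would otherwise cause a division by zero:

```python
from statistics import mean

def positional_match(a, row):
    # Percentage of positions (up to the shorter length) where a and row agree
    shorter = min(len(a), len(row))
    if shorter == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, row))
    return matches * 100 / shorter

def get_accuracy(a, b):
    # Average the per-row percentages over every list in b
    return mean(positional_match(a, row) for row in b)

l1 = [1, 0, 1, 0, 0]
l2 = [[1, 0, 1, 1, 0, 1], [1, 0, 1, 1, 1, 0, 1, 0, 0], [1, 0, 1, 1, 0, 1, 0]]
print(round(get_accuracy(l1, l2), 2))  # 73.33
```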

Making a list of a geometric progression when the ratio and range are given

Given the positive integer ratio greater than 1, and the non-negative integer n, create a list consisting of the geometric progression of numbers between (and including) 1 and n with a common ratio of ratio. For example, if ratio is 2 and n is 8, the list would be [1, 2, 4, 8].
Associate the list with the variable geom_prog.
I have tried the following code:
r = ratio
geom_prog = []
for i in range(1, n+1):
    i *= r
    geom_prog.append(i)
For ratio 2 and n = 8:
Expected result: [1, 2, 4, 8]
What I got: [2, 4, 6, 8, 10, 12, 14, 16]
More than anything I'm just wondering what the correct algorithm for getting the correct elements would be. Or if there is a more efficient way to do this problem.

If I understand correctly:
r = 2  # set the ratio here
geom_prog = []
x = 1  # first element, updated on each pass
n = 8  # upper limit
while x <= n:
    geom_prog.append(x)
    x *= r
EDIT:
Or, more Pythonically, if you know the number of terms (count) in advance:
[start * ratio**i for i in range(count)]
ref: Python: Generate a geometric progression using list comprehension

r = ratio
geom_prog = []
x = 1
while x <= n:
    geom_prog.append(x)
    x *= r

The problem is that instead of restricting the values to at most n (8 here),
for i in range(1, n+1):
tells the program to run the loop n times.
Try this instead:
n = 8
r = 2
geom_prog = []
i = 1
while i <= n:
    geom_prog.append(i)
    i *= r
# print(geom_prog)  # returns [1, 2, 4, 8]

Use a simple while loop:
>>> r = 2
>>> n = 8
>>> e = 1
>>> geom_prog = []
>>> while e <= n:
...     geom_prog.append(e)
...     e *= r
...
>>> geom_prog
[1, 2, 4, 8]

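A few good answers have already been posted, but I'm adding this one as well.
You can use the math library to calculate the upper limit of the for loop without changing your logic too much. Since math.log works in floating point, the exponent is checked against exact integer powers afterwards, and r ** i keeps the elements as integers (math.pow would give 1.0, 2.0, 4.0, 8.0).
import math
r = 2
geom_prog = []
n = 8  # assumes n >= 1, since math.log(0) raises ValueError
m = int(math.log(n, r))  # highest exponent; float rounding can leave it off by one
if r ** (m + 1) <= n:
    m += 1
elif r ** m > n:
    m -= 1
for i in range(0, m + 1):
    k = r ** i  # integer power
    geom_prog.append(k)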

Suppose you want to know in advance how many terms will be involved.
Think of it as the following question: we want ratio^m <= n, and we solve for m.
Then m <= log(n)/log(ratio), and since m is an integer, m <= int(log(n)/log(ratio)). Keep in mind that floating-point rounding in math.log can make this value one too small when n is an exact power of ratio.
import math
n = 8
ratio = 2
r = ratio
geom_prog = [1]
for i in range(1, int(math.log(n)/math.log(ratio))+1):
    geom_prog.append(geom_prog[-1] * r)
print(geom_prog)
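A sketch that counts the terms with exact integer arithmetic instead of math.log, so exact powers such as n = 1000 with ratio = 10 are handled without relying on float rounding, and then builds the list with the comprehension form. The helper names num_terms and geometric_list are hypothetical, and ratio > 1 is assumed:

```python
def num_terms(ratio, n):
    # Count the powers ratio**0, ratio**1, ... that are <= n,
    # using only integer arithmetic (assumes ratio > 1)
    count = 0
    power = 1
    while power <= n:
        count += 1
        power *= ratio
    return count

def geometric_list(ratio, n):
    return [ratio ** k for k in range(num_terms(ratio, n))]

print(geometric_list(2, 8))      # [1, 2, 4, 8]
print(geometric_list(10, 1000))  # [1, 10, 100, 1000]
print(geometric_list(3, 0))      # []
```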

By looping over range(1, n+1), you're making n passes (8 in your case). The termination criterion you're looking for is when the next element of your series would exceed n. Try a while loop:
>>> def geom(ratio, n):
...     series = [1]
...     while series[-1] * ratio <= n:
...         series.append(series[-1] * ratio)
...     return series
...
>>>
>>> geom(2, 8)
[1, 2, 4, 8]
You'll probably want to add some code to check that this terminates for your parameters (e.g. a ratio of 1 with n >= 1 will never terminate), but this should get you started.
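The while-loop answers above can also be wrapped in a generator that validates its inputs first, which addresses the non-termination concern for ratios of 1 or less. A sketch; geometric_progression is a made-up name, and raising ValueError is a design choice rather than a requirement of the exercise:

```python
def geometric_progression(ratio, n):
    """Yield 1, ratio, ratio**2, ... while the term is <= n."""
    if not isinstance(ratio, int) or ratio <= 1:
        raise ValueError("ratio must be an integer greater than 1")
    term = 1
    while term <= n:
        yield term
        term *= ratio

geom_prog = list(geometric_progression(2, 8))
print(geom_prog)  # [1, 2, 4, 8]
```

Because it is a generator, the validation runs when iteration starts (here, inside list()).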

How to generate and filter efficiently all combinations of a list of list product

Hello guys, here is the problem. I have input like this: [[1,2,3],[4,5,6],[7,8,9]]... etc.
I want to generate all possible combinations from the product of those lists, then multiply the elements of each resulting combination together, and finally filter the results to an interval.
So first, input n lists, e.g. [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]... etc.
Which will then give (1,4,7,10)
(1,4,7,11)
(1,4,7,12)
and so on.
Then take the combinations of k out of n for those results, like (1,4,7), (1,4,10), (1,7,10) for the first row.
Multiplying each x gives 1*4*7 = 28, 1*4*10 = 40, 1*7*10 = 70.
From this I want only the unique combinations whose result lies in the interval chosen beforehand: if x > 50 and x < 100, I will get (1,7,10) : 70.
I did try:
import itertools

def mult(lst):  # A function mult I'm using later
    r = 1
    for element in lst:
        r *= element
    return round(r)

s = []  # Where I add my list of lists
for i in range(int(input1)):
    b = input("This is line %s : " % (i+1)).split()
    for i in range(len(b)):
        b[i] = float(b[i])
    s.append(b)

low_result = input("Expected low_result : ")
high_result = input("Expected high_result : ")
combine = []
my_list = []
for element in itertools.product(*s):
    l = [float(x) for x in element]
    comb = itertools.combinations([*l], int(input2))
    for i in list(comb):
        combine.append(i)
        res = mult(i)
        if res >= int(low_result) and res <= int(high_result):
            my_list.append(res)
            f = open("list_result.txt", "a+")
            f.write("%s : result is %s\n" % (i, res))
            f.close()
And it always results in a memory error, because there are too many variations for what I'm seeking.
What I would like is a way to generate, from a list of 20 or more lists, all the products and the resulting k-of-n combinations whose result falls in the interval I need.

As suggested above, I think this can be done without exhausting your memory by never holding the full array in memory at any time. The main issue then becomes runtime.
The maths
As written, we are:
Producing every combination of m rows of n items: n ** m
Then taking a choice of c items from those m values: C(m, c)
This is very large. If we have m=25 rows of n=3 items each, and pick c=3 items from them, we get:
= n ** m * C(m, c)
= 3 ** 25 * 2300
= 1.948763802×10¹⁵
If instead we:
Choose c rows from the m rows: C(m, c) as before
Then pick every combination of n items from these c rows: n ** c
With m=25 rows of n=3 items each, picking c=3 items, we get:
= n ** c * C(m, c)
= 3 ** 3 * 2300
= 62100
This is now a solvable problem.
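The two counts can be checked with math.comb (Python 3.8+), and on a small input a brute-force comparison confirms that choosing the rows first yields exactly the same set of tuples, just without generating the duplicates. The helper names below are made up for this sketch:

```python
from itertools import combinations, product
from math import comb  # Python 3.8+

# Counts from the formulas, for m=25 rows of n=3 items, choosing c=3
m, n, c = 25, 3, 3
print(n ** m * comb(m, c))  # 1948763801718900 tuples the original way
print(n ** c * comb(m, c))  # 62100 tuples choosing rows first

def product_then_choose(rows, c):
    # Original approach: full product first, then every c-subset of it
    seen = set()
    for full in product(*rows):
        seen.update(combinations(full, c))
    return seen

def choose_then_product(rows, c):
    # Reordered approach: pick c rows, then take their product
    return {combo for chosen in combinations(rows, c) for combo in product(*chosen)}

small = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(product_then_choose(small, 3) == choose_then_product(small, 3))  # True
print(len(choose_then_product(small, 3)))  # 108
```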
The code
from itertools import product, combinations

def mult(values, min_value, max_value):
    """
    Multiply together the values, but return None if we get too big or too
    small
    """
    output = 1
    for value in values:
        output *= value
        # Early return if we go too big (safe for values >= 1, where the
        # product can never shrink again)
        if output > max_value:
            return None
        # Early return if we go to zero (from which we never return)
        if output == 0 and min_value != 0:
            return None
    if output < min_value:
        return None
    return output

def yield_valid_combos(values, choose, min_value, max_value):
    # No doubt an even fancier list comprehension would get this too
    for rows in combinations(values, choose):
        for combos in product(*rows):
            value = mult(combos, min_value, max_value)
            if value is not None:
                yield combos, value

values = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
with open('list_result.txt', 'w') as fh:
    for selection, value in yield_valid_combos(
            values, choose=3, min_value=50, max_value=100):
        fh.write('{}: result is {}\n'.format(selection, value))
This solution also returns no duplicate answers (unless the same value appears in multiple rows).
As an optimisation the multiplication method attempts to return early if we detect the result will be too big or small. We also only open the file once and then keep adding rows to it as they come.
Further optimisation
You can also optimise your set of values ahead of time by screening out values which cannot contribute to a solution. But for smaller values of c, you may find this is not even necessary.
The smallest possible combination of values is c items from the set of the smallest values in each row. If we take the c - 1 smallest items from the set of smallest values, multiply them together and then divide the maximum by this number, it gives us an upper bound for the largest value which can be in a solution (assuming all values are positive). We can then screen out all values above this bound, cutting down on permutations.
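A sketch of that screening step, assuming all values are positive (the bound does not hold with zeros or negatives); prune_rows is a hypothetical helper, and its output can be passed straight to yield_valid_combos:

```python
from math import prod  # Python 3.8+

def prune_rows(rows, choose, max_value):
    """Drop values that cannot appear in any combination with product <= max_value.

    Assumes every value is positive and 1 <= choose <= len(rows).
    """
    row_mins = sorted(min(row) for row in rows if row)
    # Smallest product that any choose - 1 partner values could have
    smallest_partners = prod(row_mins[:choose - 1])
    bound = max_value / smallest_partners
    return [[v for v in row if v <= bound] for row in rows]

values = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(prune_rows(values, choose=3, max_value=20))   # [[1, 2, 3], [4, 5], [], []]
print(prune_rows(values, choose=3, max_value=100))  # nothing removed
```

A row that ends up empty simply contributes no tuples when product() is taken over it.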

most efficient way to find a sum of two numbers

I am looking into a problem: given an arbitrary list, in this case [9,15,1,4,2,3,6], find any two numbers that sum to a given result (in this case 10). What would be the most efficient way to do this? My solution is O(n²) in big-O notation, and even though I have filtered and sorted the numbers, I am sure there is a way to do this more efficiently. Thanks in advance.
myList = [9,15,1,4,2,3,6]
myList.sort()
result = 10
myList = filter(lambda x:x < result,myList)
total = 0
for i in myList:
    total = total + 1
    for j in myList[total:]:
        if i + j == result:
            print i,j
            break

O(n log n) solution
Sort your list. For each number x, binary search for S - x in the list.
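A sketch of the sort-plus-binary-search idea using the standard bisect module; find_pair_sorted is a made-up name. Searching only to the right of the current index stops a number from being paired with itself:

```python
from bisect import bisect_left

def find_pair_sorted(nums, target):
    ordered = sorted(nums)  # O(n log n)
    for idx, x in enumerate(ordered):
        want = target - x
        # Binary search for want, only to the right of idx
        pos = bisect_left(ordered, want, idx + 1)
        if pos < len(ordered) and ordered[pos] == want:
            return x, want
    return None

print(find_pair_sorted([9, 15, 1, 4, 2, 3, 6], 10))  # (1, 9)
```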
O(n) solution
For each number x, see if you have S - x in a hash table. Add x to the hash table.
Note that, if your numbers are really small, the hash table can be a simple array where h[i] = true if i exists in the hash table and false otherwise.
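A sketch of the hash-table version, plus the array variant for small non-negative integers (both function names are made up). Checking for target - x before adding x lets a value pair with an earlier copy of itself but never with itself:

```python
def find_pair_hashed(nums, target):
    # One pass: a set gives average O(1) membership tests
    seen = set()
    for x in nums:
        if target - x in seen:
            return target - x, x
        seen.add(x)
    return None

def find_pair_small(nums, target):
    # Array-as-hash-table variant; assumes non-negative ints, and
    # ignores values above target since they cannot be part of a pair
    present = [False] * (target + 1)
    for x in nums:
        if 0 <= x <= target:
            if present[target - x]:
                return target - x, x
            present[x] = True
    return None

print(find_pair_hashed([9, 15, 1, 4, 2, 3, 6], 10))  # (9, 1)
print(find_pair_small([9, 15, 1, 4, 2, 3, 6], 10))   # (9, 1)
```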

Use a dictionary for this, and for each item in the list look for total_required - item in the dictionary. I have used collections.Counter here because a set can fail if total_required - item is equal to the current item from the list. Overall complexity is O(N):
>>> from collections import Counter
>>> def find_nums(total, seq):
...     c = Counter(seq)
...     for x in seq:
...         rem = total - x
...         if rem in c:
...             if rem == x and c[rem] > 1:
...                 return x, rem
...             elif rem != x:
...                 return x, rem
...
>>> find_nums(2, [1, 1])
(1, 1)
>>> find_nums(2, [1])
>>> find_nums(24, [9,15,1,4,2,3,6])
(9, 15)
>>> find_nums(9, [9,15,1,4,2,3,6])
(3, 6)

I think this solution would work:
nums = [9,15,1,4,2,3,6]
result = 10
nums.sort()
# Dropping values >= result assumes all numbers are positive
nums = [x for x in nums if x < result]
myMap = {}
for i in nums:
    if i in myMap:
        print(myMap[i], i)
        break
    myMap[result - i] = i

what is a pythonic way to get the number of times list1[i] < list2[i] and vice versa

I have two lists with values, the expected result is a tuple (a,b) where a is the number of i values which list1[i] < list2[i], and b is the number of i values where list1[i] > list2[i] (equalities are not counted at all).
I have this solution, and it works perfectly:
x = (0,0)
for i in range(len(J48)):
    if J48[i] < useAllAttributes7NN[i]:
        x = (x[0]+1, x[1])
    elif J48[i] > useAllAttributes7NN[i]:
        x = (x[0], x[1]+1)
However, I am trying to improve my python skills, and it seems very non-pythonic way to achieve it.
What is a pythonic way to achieve the same result?
FYI, this is done to achieve the required input for binom_test() that tries to prove two algorithms are not statistically identical.
I don't believe this information has any additional value to the specific question though.

One way is to build a list of scores and then add them up.
scores = [(a < b, a > b) for (a, b) in zip(J48, useAllAttributes7NN)]
x = (sum(a for (a, _) in scores), sum(b for (_, b) in scores))
# Or, as per @agf's suggestion (though I prefer comprehensions to map)...
x = tuple(sum(s) for s in zip(*scores))
Another is to zip them once, then count scores separately:
zipped = list(zip(J48, useAllAttributes7NN))
x = (sum(a < b for (a, b) in zipped), sum(a > b for (a, b) in zipped))
Note that without the list() call this doesn't work in Python 3 (thanks @Darthfett), because zip() there returns an iterator that the first sum() exhausts.

Just for the sake of fun, here is a solution using complex numbers. It's not Pythonic, but it's quite mathematical :-)
Think of this problem as plotting the result on a two-dimensional complex plane:
result = sum((x < y) + (x > y)*1j for x, y in zip(list1, list2))
(result.real, result.imag)

import itertools
x = [0, 0, 0]
for a, b in itertools.izip(J48, useAllAttributes7NN):
    x[cmp(a, b)] += 1
and then take just x[2] and x[1]: for numbers, cmp() returns -1, 0 or 1, so x[-1] (that is, x[2]) counts a < b, x[1] counts a > b, and x[0] counts the equalities. This is Python 2 only, since izip and cmp no longer exist in Python 3.
Another way (iterates over the lists twice):
first = sum(1 for a, b in itertools.izip(J48, useAllAttributes7NN) if a > b)
second = sum(1 for a, b in itertools.izip(J48, useAllAttributes7NN) if a < b)
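In Python 3, where cmp() and itertools.izip no longer exist, a common replacement for cmp(a, b) is (a > b) - (a < b). A single-pass sketch that counts those signs with collections.Counter (count_less_greater is a made-up name), returning the counts in the order the question asks for:

```python
from collections import Counter

def count_less_greater(list1, list2):
    # (a > b) - (a < b) is -1, 0 or 1, like Python 2's cmp(a, b)
    signs = Counter((a > b) - (a < b) for a, b in zip(list1, list2))
    # -1 means list1[i] < list2[i]; 1 means list1[i] > list2[i]
    return signs[-1], signs[1]

print(count_less_greater([1, 3, 7, 15, 22, 27], [2, 5, 10, 12, 20, 30]))  # (4, 2)
```

A missing key in a Counter reads as 0, so lists with no strict differences give (0, 0).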

A simple solution is:
list1 = range(10)
list2 = reversed(range(10))
x = [0, 0]
for a, b in zip(list1, list2):
    x[0] += 1 if a < b else 0
    x[1] += 1 if a > b else 0
x = tuple(x)
Giving us:
(5, 5)
zip() is the best way to iterate over two lists at once. If you are using Python 2.x, you might want to use itertools.izip for performance reasons (it's lazy, like Python 3.x's zip()).
It's also easier to work on a list until you are done changing it, as lists are mutable (tuples are not).
Edit:
A Python 3-compatible version of the approaches that use cmp:
def add_tuples(*args):
    return tuple(sum(z) for z in zip(*args))

add_tuples(*[(1, 0) if a < b else ((0, 1) if a > b else (0, 0)) for a, b in zip(list1, list2)])

Well, what about:
def func(list1, list2):
    x, y = 0, 0
    for a, b in zip(list1, list2):
        if a < b:
            x += 1
        elif a > b:
            y += 1
    return x, y

list1 = [1, 3, 7, 15, 22, 27]
list2 = [2, 5, 10, 12, 20, 30]
x = 0
y = 0
for index, value in enumerate(list1):
    if value < list2[index]:
        x += 1
    elif value > list2[index]:
        y += 1
print(x, y)
