Compare a list with multiple lists inside a list - python

I would like to compare a list with multiple lists stored in a list and get the average correctness of my data. I want to compare one list with 35 lists (for my project), but I've simplified this to comparing one list with three lists to make it easier to understand.
Here's what I've done so far:
def get_accuracy(a, b):
    # Initialize variable to get sum
    total = 0.0
    # Range of 35 because i have 35 lines of data stored in csv file
    for i in range(35):
        # Get the number of matching zeros and ones between 2 lists
        f = sum(a != b for a, b in zip(a, b))
        # Divide the number of matched zeros and ones with length of the shorter list
        if len(a) > len(b):
            percentage = f / len(b) * 100
        else:
            percentage = f / len(a) * 100
        total += percentage
    #Return total/35 to get the average correctness after comparing with 35 lists
    return total / 35
l1=[1,0,1,0,0]
l2=[[1,0,1,1,0,1],[1,0,1,1,1,0,1,0,0],[1,0,1,1,0,1,0]]
res=get_accuracy(l1,l2)
#Expected answer should be 73.33%
print(res)
I've explained in the comments what each line of code does. What changes do I have to make to compare l1 with every list in l2 and get the average matching correctness?

I have found a simpler example to get list similarity in percentage for you:
# initialize lists
test_list1 = [1, 4, 6, 8, 9, 10, 7]
test_list2 = [7, 11, 12, 8, 9]
# printing original lists
print("The original list 1 is : " + str(test_list1))
print("The original list 2 is : " + str(test_list2))
# Percentage similarity of lists
# using "|" operator + "&" operator + set()
res = len(set(test_list1) & set(test_list2)) / float(len(set(test_list1) | set(test_list2))) * 100
# printing result
print("Percentage similarity among lists is : " + str(res))
If you are OK with using a library, difflib's SequenceMatcher makes it even easier to get a similarity ratio:
import difflib
sm=difflib.SequenceMatcher(None,a,b)
sm.ratio()
A final version using difflib could look like this:
import difflib
def get_accuracy(a,b):
    result = 0.0
    for list_contained in b:
        sm = difflib.SequenceMatcher(None, a, list_contained)
        result += sm.ratio()
    return result / len(b)
l1=[1,0,1,0,0]
l2=[[1,0,1,1,0,1],[1,0,1,1,1,0,1,0,0],[1,0,1,1,0,1,0]]
res=get_accuracy(l1,l2)
print(res)

This should do:
f = sum(x != y for x, y in zip(a, b[i]))

Assuming your code works for a single list, this should work:
def get_accuracy(a, b):
    # Named avg_sum so that the built-in sum() used below is not shadowed
    avg_sum = 0
    length = len(b)
    for list_in_b in b:
        # Initialize variable to get sum
        total = 0.0
        # Range of 35 because i have 35 lines of data stored in csv file
        for i in range(35):
            # Get the number of matching zeros and ones between 2 lists
            f = sum(x != y for x, y in zip(a, list_in_b))
            # Divide the number of matched zeros and ones with length of the shorter list
            if len(a) > len(list_in_b):
                percentage = f / len(list_in_b) * 100
            else:
                percentage = f / len(a) * 100
            total += percentage
        avg_sum += total / 35
    # Return the average correctness over all lists in b
    return avg_sum / length
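For reference, here is a minimal sketch of the per-position comparison, assuming "correctness" means the share of positions where both lists hold the same value, measured over the length of the shorter list. The name average_match_percentage is made up for this example, and the code assumes no list is empty. Note that it counts equal positions (==), which is what produces the 73.33% expected in the question.

```python
def average_match_percentage(reference, candidates):
    """Average, over all candidates, of the share of matching positions."""
    total = 0.0
    for candidate in candidates:
        # zip() stops at the shorter list, so only overlapping positions count
        overlap = min(len(reference), len(candidate))
        matches = sum(x == y for x, y in zip(reference, candidate))
        total += matches / overlap * 100
    # Divide by the actual number of lists instead of a hard-coded 35
    return total / len(candidates)


l1 = [1, 0, 1, 0, 0]
l2 = [[1, 0, 1, 1, 0, 1], [1, 0, 1, 1, 1, 0, 1, 0, 0], [1, 0, 1, 1, 0, 1, 0]]
print(round(average_match_percentage(l1, l2), 2))  # 73.33
```

The three lists match l1 in 4, 3 and 4 of the 5 compared positions, so the result is (80 + 60 + 80) / 3.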

Related

Python list average value

I have two lists:
A=[100, 200, 300, 400,......]
B=[50, 110, 150, 210, 250,.........]
I want to average the elements of list B that fall between each pair of adjacent elements A[i] and A[i+1], counting only the elements within those bounds.
For example, for the range 100 to 200 only 110 and 150 from list B should be added, so the average is (110 + 150) / 2, and so on for 200 to 300, etc.
I have written the code, but it seems lengthy. Please help me with a shorter method.
from statistics import mean
for lower, upper in zip(A, A[1:]):
    average = mean(x for x in B if lower <= x <= upper)
This will raise a statistics.StatisticsError if there are no elements within one of the windows, because you can't take the average of an empty list. If you want to handle that case, you need to catch the error:
from statistics import mean, StatisticsError
for lower, upper in zip(A, A[1:]):
    try:
        average = mean(x for x in B if lower <= x <= upper)
    except StatisticsError:
        average = None
    print(average)
This will print:
130
230
None
So far, all the other solutions have a time complexity of O(mn) where A has size m and B has size n, due to iterating over B for each adjacent pair of elements in A.
So here's a solution in O(m + n log m), iterating over B just once and using binary search to find the interval(s) which each number sits in:
from bisect import bisect_left
def average_bins(a, b):
    num_bins = len(a) - 1
    sums = [0] * num_bins
    counts = [0] * num_bins
    for x in b:
        i = bisect_left(a, x)
        if i > 0 and i <= num_bins:
            sums[i-1] += x
            counts[i-1] += 1
        if i < num_bins and a[i] == x:
            sums[i] += x
            counts[i] += 1
    return [ (s/c if c else None) for s, c in zip(sums, counts) ]
If it's known that A is evenly spaced, this can be improved to O(m + n) by eliminating the binary search: import math and replace i = bisect_left(a, x) with i = math.ceil((x - a[0]) / (a[1] - a[0])).
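A minimal sketch of that evenly spaced variant, assuming a is sorted, has at least two elements and a constant step. The function name average_bins_even and the explicit range check are additions for this example; the check skips values outside [a[0], a[-1]] so the computed index can never go out of bounds. With non-integer spacing, floating-point rounding near bin edges could matter.

```python
import math


def average_bins_even(a, b):
    """Average the values of b per bin [a[k], a[k+1]] for evenly spaced a."""
    num_bins = len(a) - 1
    step = a[1] - a[0]
    sums = [0] * num_bins
    counts = [0] * num_bins
    for x in b:
        if x < a[0] or x > a[-1]:
            continue  # outside every bin
        # Index of the first boundary >= x, computed without a search
        i = math.ceil((x - a[0]) / step)
        if 0 < i <= num_bins:
            sums[i - 1] += x
            counts[i - 1] += 1
        # A value sitting exactly on a boundary also belongs to the next bin
        if i < num_bins and a[i] == x:
            sums[i] += x
            counts[i] += 1
    return [(s / c if c else None) for s, c in zip(sums, counts)]


print(average_bins_even([100, 200, 300, 400], [50, 110, 150, 210, 250]))
# [130.0, 230.0, None]
```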
You can do it like so,
avg = []
for j in range(0, len(A)-1):
    total = 0
    count = 0
    for element in B:
        if(element>=A[j] and element<=A[j+1]):
            total+=element
            count+=1
    if(count!=0):
        avg.append(total/count)
    else:
        avg.append(None)
from functools import reduce
for i in range(0, len(a) - 1):
    # Keep only the values strictly between a[i] and a[i+1]
    lst = list(filter(lambda x: x > a[i] and x < a[i+1], b))
    # Guard against an empty window, where reduce() and the division would fail
    avg = reduce(lambda x, y: x + y, lst) / len(lst) if lst else None
    print(avg)
The concept is to take two variables at a time: a[i] and a[i+1].
The filter call builds lst, the list of values which are greater than a[i] and less than a[i+1]. The avg variable then holds the sum of the values in lst divided by the number of values, which gives the average.
Let me know if you want more clarity on the lambda functions.

Similarity Measure in Python

I am working on a coding challenge named Similarity Measure. My code works fine for some test cases but fails others with a Time Limit Exceeded error. The code is not wrong, but it takes more than 25 seconds for inputs in the range of 10^4.
I need to know what I can do to make it more efficient; I cannot think of any better solution than my code.
The question goes like this:
The problem states that we are given an array of positive integers and have to answer Q queries.
Query: Given two indices L, R, determine the maximum absolute difference between the indices of two equal elements that lie between L and R.
If no two elements in the range are equal, return 0.
INPUT FORMAT
The first line contains N, no. of elements in the array A
The Second line contains N space separated integers that are elements of the array A
The third line contains Q the number of queries
Each of the Q lines contains L, R
CONSTRAINTS
1 <= N, Q <= 10^4
1 <= Ai <= 10^4
1 <= L, R <= N
OUTPUT FORMAT
For each query, print the ans in a new line
Sample Input
5
1 1 2 1 2
5
2 3
3 4
2 4
3 5
1 5
Sample Output
0
0
2
2
3
Explanation
[2,3] - No two elements are same
[3,4] - No two elements are same
[2,4] - there are two 1's so ans = |4-2| = 2
[3,5] - there are two 2's so ans = |5-3| = 2
[1,5] - there are three 1's and two 2's so ans = max(|4-2|, |5-3|, |4-1|, |2-1|) = 3
Here is my algorithm:
Take the input and test each range in a separate function
The inputs are L, R and the array
If the difference between L and R is 1, check whether the two elements are equal; return 1 if so, else return 0
If the difference is more than 1, loop through the array
Use a nested loop to look for equal elements; when a pair is found, store the index difference in the maxVal variable if it is larger
Return maxVal
My Code:
def ansArray(L, R, arr):
    maxVal = 0
    if abs(R - L) == 1:
        if arr[L-1] == arr[R-1]: return 1
        else: return 0
    else:
        for i in range(L-1, R):
            for j in range(i+1, R):
                if arr[i] == arr[j]:
                    if (j-i) > maxVal: maxVal = j-i
        return maxVal
if __name__ == '__main__':
    input()
    arr = (input().split())
    for i in range(int(input())):
        L, R = input().split()
        print(ansArray(int(L), int(R), arr))
Please help me with this. I really want to learn a different and a more efficient way to solve this problem. Need to pass all the TEST CASES. :)
You can try this code:
import collections
def ansArray(L, R, arr):
    dct = collections.defaultdict(list)
    for index in range(L - 1, R):
        dct[arr[index]].append(index)
    return max(lst[-1] - lst[0] for lst in dct.values())
if __name__ == '__main__':
    input()
    arr = (input().split())
    for i in range(int(input())):
        L, R = input().split()
        print(ansArray(int(L), int(R), arr))
Explanation:
dct is a dictionary that keeps a list of indices for every number seen. Each list is sorted because indices are appended in increasing order, so lst[-1] - lst[0] gives the maximum absolute difference for that number. Applying max to all these differences gives the answer. The complexity is O(R - L) per query.
This can be solved in approximately O(N) per query in the following way:
from collections import defaultdict
def ansArray(L, R, arr) :
    # collect the positions and save them into the dictionary
    # (L and R are 1-based and inclusive, hence the slice arr[L-1:R])
    positions = defaultdict(list)
    for i,j in enumerate(arr[L-1:R]) :
        positions[j].append(i)
    # create the list of the max differences in index
    max_diff = list()
    for vals in positions.values() :
        max_diff.append( max(vals) - min(vals) )
    # now return the max element from the list we have just created
    if len(max_diff) :
        return max(max_diff)
    else :
        return 0
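As a cross-check, here is a small sketch of the same dictionary idea that stores only the first index seen for each value, run against the sample input from the question. The name max_same_value_gap is made up for this example; L and R are taken as 1-based and inclusive, as in the problem statement. It is still O(R - L) per query, so it is not a guaranteed fix for the time limit.

```python
def max_same_value_gap(left, right, arr):
    """Largest index distance between two equal values in arr[left-1:right]."""
    first_seen = {}  # value -> first index at which it appears in the range
    best = 0
    for index in range(left - 1, right):
        value = arr[index]
        if value in first_seen:
            # The distance to the first occurrence is the widest for this value
            best = max(best, index - first_seen[value])
        else:
            first_seen[value] = index
    return best


arr = [1, 1, 2, 1, 2]
queries = [(2, 3), (3, 4), (2, 4), (3, 5), (1, 5)]
for left, right in queries:
    print(max_same_value_gap(left, right, arr))
# prints 0, 0, 2, 2, 3 (one per line), matching the sample output
```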

How to generate and filter efficiently all combinations of a list of list product

Hello guys, here is the problem. I have something like this as input: [[1,2,3],[4,5,6],[7,8,9]]...etc
I want to generate all possible combinations from the product of those lists, then multiply the elements of each resulting combination together, and finally filter the results to an interval.
So the first input is a list of n lists: [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]...etc
The product then gives (1,4,7,10)
(1,4,7,11)
(1,4,7,12)
and so on.
Then I take the combinations of k elements out of n from each result, like (1,4,7) (1,4,10) (1,7,10) for the first row.
The multiplication gives x: 1*4*7 = 28, 1*4*10 = 40, 1*7*10 = 70
From this I want only the unique combinations whose result lies in an interval chosen beforehand: if x > 50 and x < 100, I will get (1,7,10) : 70
I did try
import itertools
def mult(lst): #A function mult i'm using later
    r = 1
    for element in lst:
        r *= element
    return round(r)
s = [] #Where i add my list of list
for i in range(int(input1)):
    b = input("This is line %s : " % (i+1)).split()
    for i in range(len(b)):
        b[i] = float(b[i])
    s.append(b)
low_result = input("Expected low_result : ")
high_result = input("Expected high_result : ")
combine = []
my_list = []
for element in itertools.product(*s):
    l= [float(x) for x in element]
    comb = itertools.combinations([*l], int(input2))
    for i in list(comb):
        combine.append(i)
        res = mult(i)
        if res >= int(low_result) and res <= int(high_result):
            my_list.append(res)
            f = open("list_result.txt","a+")
            f.write("%s : result is %s\n" % (i, res))
            f.close()
It always results in a memory error because there are too many variations of what I'm seeking.
What I would like is a way to generate, from a list of 20 or more lists, all the products and the resulting combinations of k in n whose result falls in the interval that I need.
As suggested above, I think this can be done without exploding your memory by never holding an array in memory at any time. But the main issue is then runtime.
The maths
As written we are:
Producing every combination of one item from each of the m rows of n items: n ** m
Then taking a choice of c items from those m values: C(m, c)
This is very large. If we have m=25 rows of n=3 items each and pick c=3 items from them, we get:
= n ** m * C(m, c)
= 3 ** 25 * 2300
= 1.948763802×10¹⁵
If instead we:
Choose c rows from the m rows: C(m, c) as before
Then pick every combination of one item from each of these c rows: n ** c
With m=25 rows of n=3 items each and c=3 picked items, we get:
= n ** c * C(m, c)
= 3 ** 3 * 2300
= 62100
This is now a solvable problem.
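The two counts can be checked with a few lines of Python. This is only a sketch of the arithmetic, using math.comb (available from Python 3.8) for C(m, c); the variable names are chosen for this example.

```python
from math import comb

m, n, c = 25, 3, 3  # rows, items per row, items picked

row_choices = comb(m, c)                  # ways to choose c of the m rows
full_product_count = n ** m * row_choices  # product over all rows first, then choose
rows_first_count = n ** c * row_choices    # choose rows first, then product

print(row_choices)         # 2300
print(full_product_count)  # about 1.95e15
print(rows_first_count)    # 62100
```

The rows-first order is smaller by a factor of n ** (m - c), which is why it stays tractable as the number of rows grows.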
The code
from itertools import product, combinations
def mult(values, min_value, max_value):
    """
    Multiply together the values, but return None if we get too big or too
    small
    """
    output = 1
    for value in values:
        output *= value
        # Early return if we go too big
        if output > max_value:
            return None
        # Early return if we go to zero (from which we never return)
        if output == 0 and min_value != 0:
            return None
    if output < min_value:
        return None
    return output
def yield_valid_combos(values, choose, min_value, max_value):
    # No doubt an even fancier list comprehension would get this too
    for rows in combinations(values, choose):
        for combos in product(*rows):
            value = mult(combos, min_value, max_value)
            if value is not None:
                yield combos, value
values = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
with open('list_result.txt', 'w') as fh:
    for selection, value in yield_valid_combos(
            values, choose=3, min_value=50, max_value=100):
        fh.write('{}: result is {}\n'.format(selection, value))
This solution also returns no duplicate answers (unless the same value appears in multiple rows).
As an optimisation the multiplication method attempts to return early if we detect the result will be too big or small. We also only open the file once and then keep adding rows to it as they come.
Further optimisation
You can also optimise your set of values ahead of time by screening out values which cannot contribute to a solution. But for smaller values of c, you may find this is not even necessary.
The smallest possible combination of values is c items from the set of the smallest values in each row. If we take the c - 1 smallest items from the set of smallest values, multiply them together and then divide the maximum by this number, we get an upper bound for the largest value which can be in a solution. We can then screen out all values above this bound (cutting down on permutations).
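A rough sketch of that screening step, assuming every value is positive (otherwise the bound does not hold). The function name screen_values is made up for this example, and math.prod needs Python 3.8 or later. The bound uses the minimums of all rows, including the row of the value being tested, so it can be looser than strictly necessary but never removes a value that could be part of a solution.

```python
from math import prod


def screen_values(rows, choose, max_value):
    """Drop values too large to appear in any product <= max_value."""
    # Smallest value of each row, in ascending order
    row_minimums = sorted(min(row) for row in rows)
    # Smallest possible product of the other (choose - 1) factors
    smallest_partners = prod(row_minimums[:choose - 1])
    upper_bound = max_value / smallest_partners
    return [[v for v in row if v <= upper_bound] for row in rows]


rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(screen_values(rows, choose=3, max_value=40))
# [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
```

Here the two smallest row minimums are 1 and 4, so any value above 40 / 4 = 10 cannot be part of a product of three values that stays at or below 40.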

How to use random to print integers whose sum is less than or equal to another integer in Python?

I have a list of integers and I would like to print a random selection of them whose sum is less than or equal to a variable. The sum of my list below is 38; how do I randomly return values from the list so that their sum is less than or equal to 15? I have tried to adapt the function below, but it doesn't work.
j=[4,5,6,7,1,3,7,5]
x = 15
jSum = sum(j)
def decomposition(i):
    while i <= x:
        n = random.randint(j, i)
        yield n
        i -= n
        print i
decomposition(jSum)
Let's create a list of the possible lists with sums <= x. This can be done with a list comprehension containing two for clauses and itertools.combinations (after import itertools and import random):
ops=[list(c) for l in range(1,len(j)) for c in itertools.combinations(j,l) if sum(c) <= x]
Then just randomly select one with random.choice:
random.choice(ops)
When I ran this with j = [4,5,6,7,1,3,7,5] and x = 15, the random output I got was:
[6, 1, 3]
Which works! (The sum is <= 15 and all elements are in j.)
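If the list is long, building every combination gets expensive. A possible alternative, sketched here under the assumption that all values are non-negative, is to shuffle the list and greedily keep whatever still fits. The function name random_subset_within is made up for this example, and the selection is not uniformly distributed over all valid subsets.

```python
import random


def random_subset_within(values, limit):
    """Pick a random selection of values whose sum is <= limit."""
    shuffled = list(values)   # copy so the caller's list keeps its order
    random.shuffle(shuffled)
    chosen = []
    total = 0
    for value in shuffled:
        # Keep the value only if the running sum stays within the limit
        if total + value <= limit:
            chosen.append(value)
            total += value
    return chosen


j = [4, 5, 6, 7, 1, 3, 7, 5]
picked = random_subset_within(j, 15)
print(picked, sum(picked))
```

Because of the shuffle, each run can return a different selection; the sum is always at most 15.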

Python Algorithm To Maximize Number of Equal Elements in List

I am trying to make an algorithm in Python that takes a list of random numbers from 0 to 1,000,000, no more than 100 elements in length, and evens the array out as much as possible, giving me the maximum number of equal elements. This is what I have so far:
def answer(x):
    diff = max(x) - min(x)
    while diff > 1:
        # // keeps the elements integers (plain / on ints in Python 2)
        x[x.index(max(x))] = x[x.index(max(x))] - (diff // 2)
        x[x.index(min(x))] = x[x.index(min(x))] + (diff // 2)
        diff = max(x) - min(x)
    return count(x)
def count(x):
    from collections import Counter
    c = Counter(x)
    return max(c.values())
This will take an array such as [0,50], create the array [25,25] and return the integer 2, because there are two equal elements in the array. I know for a fact that this algorithm works in most cases; however, it doesn't work in all of them.
Can anyone please point out an array of integers for which this would not yield the correct answer? Thanks
Edit:
For those who don't want to read the while loop: the code finds the range of the entire list, splits the range in half, adds half to the min and subtracts half from the max. It is trying to equalize the entire list while keeping the same sum.
[1,4,1] = [2,3,1] = [2,2,2] = (number of equal elements) 3
[2,1,4,9] = [2,5,4,5] = [3,4,4,5] = [4,4,4,4] = (number of equal elements) 4 (all)
What about this?
l = [1, 2, 5, 10]
# "best" possible case
l_m = [sum(l) // len(l)] * len(l)
# see if lists fit (integer division can drop a remainder)
if sum(l_m) != sum(l):
    # if they don't this means we can only have len(l) - 1 similar items
    print(len(l) - 1)
else:
    # if sums fit the first list can be spread like this
    print(len(l))
I can imagine that you're trying to make as many elements in the array equal as possible, while keeping their sum, and keeping the elements integer.
For N elements, you can get N - 1 elements equal, and, with some luck, all N equal.
This is a bit of pseudocode for you:
average = sum(elements) / length(elements) # a float
best_approximation = trunc(average) # round() would also work
discrepancy = sum(elements) - best_approximation * length(elements)
discrepant_value = best_approximation + discrepancy
result = [discrepant_value] + the rest of list having best_approximation value
By construction, you get length(elements) - 1 of equal values and one discrepant_value.
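The pseudocode above can be turned into a short runnable sketch. It assumes a non-empty list of non-negative integers, so floor division gives the same result as trunc; the function name equalize_keeping_sum is made up for this example.

```python
def equalize_keeping_sum(elements):
    """Return a list with the same sum and at least len(elements) - 1 equal values."""
    n = len(elements)
    total = sum(elements)
    best_approximation = total // n          # integer part of the average
    discrepancy = total - best_approximation * n
    discrepant_value = best_approximation + discrepancy
    # One element absorbs the whole remainder, the rest are all equal
    return [discrepant_value] + [best_approximation] * (n - 1)


print(equalize_keeping_sum([1, 2, 5, 10]))  # [6, 4, 4, 4]
print(equalize_keeping_sum([0, 50]))        # [25, 25]
```

When the sum divides evenly by the length, the discrepancy is 0 and all N elements come out equal.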
What you're really doing is normalizing your input to an integer average and distributing the remainder among the result.
L = [1,2,3,4,5,7]
# Calc the integer average
avg = sum(L) // len(L)
# Find the remainder
mod = sum(L) % len(L)
# Create a new list length of original
# populating it first with the average
L2 = [avg] * len(L)
# Then add 1 to each element for as many
# as the remainder
for n in range(mod):
    L2[n] += 1
def count(x):
    from collections import Counter
    c = Counter(x)
    return max(c.values())
print(count(L2))
4
You don't need to morph the original list or create a new one (negating the need for your import):
L = [1,2,3,4,5,7]
# Don't even need to figure the average if you
# find the remainder of the sum of your list
# divided by the length of your list
mod = sum(L)%len(L)
result = mod if mod >= len(L)/2 else len(L) - mod
print(result)
4
This is the final solution I have come to.
After it reduces the range of the entire array to no more than 1, it checks whether the number of equal elements in the array is the same as its length. If so, the array looks something like [4,4,4,4], and the count of equal elements (4) is returned immediately. If the count of the most common element is less than the length, it equalizes the list. So a list like [4,4,3,3] is better turned into [4,4,4,2]. This is what the equalize function does.
def answer(x):
    diff = max(x) - min(x)
    while diff > 1:
        # // keeps the elements integers (plain / on ints in Python 2)
        x[x.index(max(x))] = x[x.index(max(x))] - (diff // 2)
        x[x.index(min(x))] = x[x.index(min(x))] + (diff // 2)
        diff = max(x) - min(x)
    print(x)
    if count(x) == len(x):
        return count(x)
    return equalize(x)
def equalize(x):
    from collections import Counter
    eq = Counter(x)
    eq = min(eq.values())
    operations = eq - 1
    for i in range(0,operations):
        x[x.index(min(x))] = x[x.index(min(x))] + 1
    return count(x)
def count(x):
    from collections import Counter
    c = Counter(x)
    return max(c.values())
http://repl.it/6bA/1
