I have two lists:
A=[100, 200, 300, 400,......]
B=[50, 110, 150, 210, 250,.........]
I want to average the elements of list B that fall between each pair of adjacent elements A[i] and A[i+1], counting only the elements within those bounds.
For example, for the window 100 to 200 it should add only 110 + 150 from list B, so the average is (110 + 150) / 2, and so on for 200 to 300, etc.
I have written the code, but it seems lengthy. Please help me with a shorter method.
from statistics import mean
for lower, upper in zip(A, A[1:]):
    average = mean(x for x in B if lower <= x <= upper)
This will raise a statistics.StatisticsError if there are no elements within one of the windows, because you can't take the average of an empty list. If you want to handle that case, you need to catch the error:
from statistics import mean, StatisticsError
for lower, upper in zip(A, A[1:]):
    try:
        average = mean(x for x in B if lower <= x <= upper)
    except StatisticsError:
        average = None
    print(average)
will print
130
230
None
So far, all the other solutions have a time complexity of O(mn) where A has size m and B has size n, due to iterating over B for each adjacent pair of elements in A.
So here's a solution in O(m + n log m), iterating over B just once and using binary search to find the interval(s) which each number sits in:
from bisect import bisect_left
def average_bins(a, b):
    num_bins = len(a) - 1
    sums = [0] * num_bins
    counts = [0] * num_bins
    for x in b:
        i = bisect_left(a, x)
        if i > 0 and i <= num_bins:
            sums[i-1] += x
            counts[i-1] += 1
        if i < num_bins and a[i] == x:
            sums[i] += x
            counts[i] += 1
    return [ (s/c if c else None) for s, c in zip(sums, counts) ]
If it's known that A is evenly spaced, this can be improved to O(m + n) by eliminating the need to do binary search; replace i = bisect_left(a, x) with i = math.ceil((x - a[0]) / (a[1] - a[0])).
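As a rough sketch of that evenly spaced variant, assuming a is sorted with a constant positive step: the function name average_bins_even is made up for illustration, and the explicit range check is an addition, since the computed index can be negative for values below a[0].

```python
import math

def average_bins_even(a, b):
    # Assumption: a is sorted and evenly spaced, so one step size fits every bin.
    step = a[1] - a[0]
    num_bins = len(a) - 1
    sums = [0] * num_bins
    counts = [0] * num_bins
    for x in b:
        # Arithmetic replacement for bisect_left(a, x)
        i = math.ceil((x - a[0]) / step)
        # Added guard: skip values entirely outside [a[0], a[-1]]
        if i < 0 or i > num_bins:
            continue
        if i > 0:
            sums[i - 1] += x
            counts[i - 1] += 1
        # A value equal to an inner boundary also belongs to the bin on its right
        if i < num_bins and a[i] == x:
            sums[i] += x
            counts[i] += 1
    return [(s / c if c else None) for s, c in zip(sums, counts)]

print(average_bins_even([100, 200, 300, 400], [50, 110, 150, 210, 250]))
```

With non-integer data the division may round a boundary value into the neighbouring bin, so the bisect version is the safer choice there.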
You can do it like so,
avg = []
for j in range(0, len(A)-1):
    sum = 0
    count = 0
    for element in B:
        if(element>=A[j] and element<=A[j+1]):
            sum+=element
            count+=1
    if(count!=0):
        avg.append(sum/count)
    else:
        avg.append(None)
from functools import reduce
for i in range(0, len(a) - 1):
    # keep the values strictly between a[i] and a[i+1]
    lst = list(filter(lambda x: x > a[i] and x < a[i+1], b))
    # reduce() raises TypeError on an empty list, so guard against empty windows
    avg = reduce(lambda x, y: x + y, lst) / len(lst) if lst else None
    print(avg)
The concept is to take two values at a time: a[i] and a[i+1].
The filter function keeps only the values that are greater than a[i] and less than a[i+1] and stores them in lst. The avg variable then holds the sum of the values in lst divided by the number of values, which gives the average.
Let me know if you want more clarity on the lambda functions.
Related
Given a list of numbers A, the goal is to find the minimum possible number of divisions by 2 you can apply to its items so that the resulting sum x satisfies x <= sum(A)/2.
For example,
if A = [10,10] you can divide A[0] by 2 and A[1] by 2 as well to get half of the sum of the list, so the minimum possible number of divisions equals 2,
if A = [200, 25,25,25] you can divide A[0] by 2^2 so that A = [50,25,25,25], which sums to 125 <= 275, so the minimum possible number of divisions equals 3.
I tried the following approach and it works fine, but when I input larger lists such as get_f([200,25,25,25,490,99999, 2000, 43002]) I get a timeout error.
import itertools
def get_f(A):
    A = sorted(A)[::-1]
    Goal = sum(A)/2
    filters = []
    x = itertools.product(range(len(A)), repeat=len(A))
    for i in x:
        i = list(i)
        C = A.copy()
        x = 0
        for Index in i:
            C[Index]/=2
            x+=1
            if sum(C) <= Goal and x not in filters:
                filters.append(x)
    return sorted(filters)[0]
The currently accepted solution is incorrect: it will, for instance, never return a number larger than the length of the input list, regardless of how large any of those numbers are.
The optimal way to approach this problem is with a priority queue, which lets you avoid recomputing the max() (as done in another answer) over and over.
Code:
from heapq import heapify, heappop, heappush
from itertools import count
from math import isclose
def solve(nums):
    heap = [-x for x in nums]
    heapify(heap)
    total = sum(nums)
    target = total / 2
    for divs in count():
        if total < target or isclose(total, target):
            return divs
        delta = heappop(heap) / 2
        total += delta
        heappush(heap, delta)
Demo:
>>> print(solve([200, 25, 25, 25]))
2
>>> print(solve([200, 25, 25, 25, 0]))
2
>>> print(solve([100, 25, 100, 25]))
3
Note: the original example in the problem statement is wrong. You can divide 200 by 2 twice, and the resulting sum [50, 25, 25, 25] = 125 is less than half of the original sum 275 (that is, 137.5), making the answer 2, not 3.
Here is my version of the code. It works just fine on the tests. Check it out.
def get_f(A):
    """
    A: list of numbers
    return: int
    """
    x = sum(A) // 2
    if x == 0:
        return 0
    count = 0
    for i in A:
        if i <= x:
            count += 1
    return count
print(get_f([10,10]))
print(get_f([200, 25,25,25]))
print(get_f([10,10,10]))
print(get_f([10,10,10,10]))
print(get_f([200,25,25,25,490,99999, 2000, 43002]))
Hope this is what you were looking for
I would like to compare a list with multiple lists stored in a list and get the average correctness of my data. I want to compare one list with 35 lists (for my project), but I've simplified this to comparing one list with three lists to make it easier to understand.
Here's what I've done so far:
def get_accuracy(a, b):
    # Initialize variable to get sum
    total = 0.0
    # Range of 35 because i have 35 lines of data stored in csv file
    for i in range(35):
        # Get the number of matching zeros and ones between 2 lists
        f = sum(a != b for a, b in zip(a, b))
        # Divide the number of matched zeros and ones with length of the shorter list
        if len(a) > len(b):
            percentage = f / len(b) * 100
        else:
            percentage = f / len(a) * 100
        total += percentage
    #Return total/35 to get the average correctness after comparing with 35 lists
    return total / 35
l1=[1,0,1,0,0]
l2=[[1,0,1,1,0,1],[1,0,1,1,1,0,1,0,0],[1,0,1,1,0,1,0]]
res=get_accuracy(l1,l2)
#Expected answer should be 73.33%
print(res)
I've explained what each line of code does in my comparison. What changes do I have to make to compare l1 with every list in l2 to get an average matching correctness?
I have found a simpler example to get list similarity in percentage for you:
# initialize lists
test_list1 = [1, 4, 6, 8, 9, 10, 7]
test_list2 = [7, 11, 12, 8, 9]
# printing original lists
print("The original list 1 is : " + str(test_list1))
print("The original list 2 is : " + str(test_list2))
# Percentage similarity of lists
# using "|" operator + "&" operator + set()
res = len(set(test_list1) & set(test_list2)) / float(len(set(test_list1) | set(test_list2))) * 100
# printing result
print("Percentage similarity among lists is : " + str(res))
If it is OK for you to use a library, difflib's SequenceMatcher makes it even easier to get a similarity ratio:
import difflib
sm=difflib.SequenceMatcher(None,a,b)
sm.ratio()
A final version using difflib could look like this:
import difflib
def get_accuracy(a,b):
    result = 0.0
    for list_contained in b:
        sm = difflib.SequenceMatcher(None, a, list_contained)
        result += sm.ratio()
    return result / len(b)
l1=[1,0,1,0,0]
l2=[[1,0,1,1,0,1],[1,0,1,1,1,0,1,0,0],[1,0,1,1,0,1,0]]
res=get_accuracy(l1,l2)
print(res)
This should do:
f = sum(i != j for i, j in zip(a, b[i]))
assuming your code works for a single list, this should work.
def get_accuracy(a, b):
    # Named "overall" rather than "sum" so the built-in sum() stays callable below
    overall = 0
    length = len(b)
    for list_in_b in b:
        # Initialize variable to get sum
        total = 0.0
        # Range of 35 because i have 35 lines of data stored in csv file
        for i in range(35):
            # Get the number of matching zeros and ones between 2 lists
            f = sum(a != b for a, b in zip(a, list_in_b))
            # Divide the number of matched zeros and ones with length of the shorter list
            if len(a) > len(list_in_b):
                percentage = f / len(list_in_b) * 100
            else:
                percentage = f / len(a) * 100
            total += percentage
        overall += total/35
    #Return the average correctness over all lists in b
    return overall / length
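For comparison, here is a minimal sketch of the averaging itself, assuming "correctness" means the share of positions where the two lists agree, measured over the length of the shorter list. That reading reproduces the 73.33% the question expects for its sample data. The function name average_match_percentage is hypothetical.

```python
def average_match_percentage(reference, candidates):
    # Assumption: "correctness" = matching positions / length of the shorter list
    total = 0.0
    for candidate in candidates:
        shorter = min(len(reference), len(candidate))
        # zip() stops at the shorter list, so only overlapping positions are compared
        matches = sum(x == y for x, y in zip(reference, candidate))
        total += matches / shorter * 100
    # Average over however many lists were supplied (3 here, 35 in the project)
    return total / len(candidates)

l1 = [1, 0, 1, 0, 0]
l2 = [[1, 0, 1, 1, 0, 1], [1, 0, 1, 1, 1, 0, 1, 0, 0], [1, 0, 1, 1, 0, 1, 0]]
print(round(average_match_percentage(l1, l2), 2))  # 73.33
```

Looping over the candidates directly avoids hard-coding 35, so the same function works for any number of lists.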
I am trying to create a function where:
The output list is generated from random numbers from the input list
The output list is a specified length and adds to a specified sum
ex. I specify that I want a list that is 4 in length and adds up to 10. random numbers are pulled from the input list until the criteria is satisfied.
I feel like I am approaching this problem all wrong trying to use recursion. Any help will be greatly appreciated!!!
EDIT: for more context on this problem: it's going to be a random enemy generator.
The end goal is for the input list to come from a column in a CSV called XP. (I plan to use the pandas module.) This CSV will have a list of enemy names in one column, XP in another, Health in another, etc. So the end goal is to be able to specify the total number of enemies and what the sum of XP should be between those enemies, and have the list generated with the appropriate information. For example, 5 enemies with a total of 200 XP between them. The result is maybe -> Apprentice Wizard(50 xp), Apprentice Wizard(50 xp), Grung(50), Xvart(25 xp), Xvart(25 xp). The output list will actually need to include all of the row information for the selected items. And it is totally fine to have duplicates in the output, as seen in this example. That will actually make more sense in the narrative of the game that this is for.
The csv --> https://docs.google.com/spreadsheets/d/1PjnN00bikJfY7mO3xt4nV5Ua1yOIsh8DycGqed6hWD8/edit?usp=sharing
import random
from random import *
lis = [1,2,3,4,5,6,7,8,9,10]
output = []
def query (total, numReturns, myList, counter):
    random_index = randrange(len(myList)-1)
    i = myList[random_index]
    h = myList[i]
    # if the problem hasn't been solved yet...
    if len(output) != numReturns and sum(output) != total:
        print(output)
        # if the length of the list is 0 (if we just started), then go ahead and add h to the output
        if len(output) == 0 and sum(output) + h != total:
            output.append(h)
            query (total, numReturns, myList, counter)
        #if the length of the output is greater than 0
        if len(output) > 0:
            # if the length plus 1 is less than or equal to the number numReturns
            if len(output) +1 <= numReturns:
                print(output)
                #if the sum of list plus h is greater than the total..then h is too big. We need to try another number
                if sum(output) + h > total:
                    # start counter
                    for i in myList:# try all numbers in myList...
                        print(output)
                        print ("counter is ", counter, " and i is", i)
                        counter += 1
                        print(counter)
                        if sum(output) + i == total:
                            output.append(i)
                            counter = 0
                            break
                        if sum(output) + i != total:
                            pass
                        if counter == len(myList):
                            del(output[-1]) #delete last item in list
                            print(output)
                            counter = 0 # reset the counter
                        else:
                            pass
                #if the sum of list plus h is less than the total
                if sum(output) + h < total:
                    output.append(h) # add h to the list
                    print(output)
                    query (total, numReturns, myList, counter)
    if len(output) == numReturns and sum(output) == total:
        print(output, 'It worked')
    else:
        print ("it did not work")
query(10, 4, lis, 0)
I think it would be better to first collect all n-size combinations of the given array that add up to the specified number, and then randomly select one of them. Randomly selecting items and checking whether the sum equals the specified value can, in the worst case, run indefinitely.
from itertools import combinations as comb
from random import choice
x = [1,1,2,4,3,1,5,2,6]
def query(arr, total, size):
    combs = [c for c in comb(arr, size) if sum(c) == total]
    # choice() always picks a valid index; randint(0, len(combs)) could run one past the end
    return choice(combs) if combs else None
#example 4-item array with items from x, which adds to 10
print(query(x, 10, 4))
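The question allows the same item to be picked more than once, which itertools.combinations does not do for a single list entry. A possible variation, sketched here with itertools.combinations_with_replacement, enumerates multisets instead. The name pick_with_repeats is hypothetical, and the choice is uniform over multisets, not over ordered lists.

```python
import random
from itertools import combinations_with_replacement

def pick_with_repeats(values, total, size):
    # Deduplicate first so each multiset of values is enumerated once
    distinct = sorted(set(values))
    matches = [c for c in combinations_with_replacement(distinct, size)
               if sum(c) == total]
    if not matches:
        return None  # no selection of this size reaches the requested sum
    picked = list(random.choice(matches))
    random.shuffle(picked)  # combinations come out sorted; randomize the order
    return picked

print(pick_with_repeats([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 10, 4))
```

Like the version above, this enumerates every candidate, so it is only practical for small input lists and small sizes.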
If the numbers in your input list are consecutive numbers, then this is equivalent to the problem of choosing a uniform random output list of N integers in the range [min, max], where the output list is ordered randomly and min and max are the smallest and largest number in the input list. The Python code below shows how this can be solved. It has the following advantages:
It does not use rejection sampling.
It chooses uniformly at random from among all combinations that meet the requirements.
It's based on an algorithm by John McClane, which he posted as an answer to another question. I describe the algorithm in another answer.
import random # Or secrets
def _getSolTable(n, mn, mx, sum):
    t = [[0 for i in range(sum + 1)] for j in range(n + 1)]
    t[0][0] = 1
    for i in range(1, n + 1):
        for j in range(0, sum + 1):
            jm = max(j - (mx - mn), 0)
            v = 0
            for k in range(jm, j + 1):
                v += t[i - 1][k]
            t[i][j] = v
    return t
def intsInRangeWithSum(numSamples, numPerSample, mn, mx, sum):
    """ Generates one or more combinations of
    'numPerSample' numbers each, where each
    combination's numbers sum to 'sum' and are listed
    in any order, and each
    number is in the interval '[mn, mx]'.
    The combinations are chosen uniformly at random.
    'mn', 'mx', and
    'sum' may not be negative. Returns an empty
    list if 'numSamples' is zero.
    The algorithm is thanks to a _Stack Overflow_
    answer (`questions/61393463`) by John McClane.
    Raises an error if there is no solution for the given
    parameters. """
    adjsum = sum - numPerSample * mn
    # Min, max, sum negative
    if mn < 0 or mx < 0 or sum < 0:
        raise ValueError
    # No solution
    if numPerSample * mx < sum:
        raise ValueError
    if numPerSample * mn > sum:
        raise ValueError
    if numSamples == 0:
        return []
    # One solution
    if numPerSample * mx == sum:
        return [[mx for i in range(numPerSample)] for i in range(numSamples)]
    if numPerSample * mn == sum:
        return [[mn for i in range(numPerSample)] for i in range(numSamples)]
    samples = [None for i in range(numSamples)]
    table = _getSolTable(numPerSample, mn, mx, adjsum)
    for sample in range(numSamples):
        s = adjsum
        ret = [0 for i in range(numPerSample)]
        for ib in range(numPerSample):
            i = numPerSample - 1 - ib
            # Or secrets.randbelow(table[i + 1][s])
            v = random.randint(0, table[i + 1][s] - 1)
            r = mn
            v -= table[i][s]
            while v >= 0:
                s -= 1
                r += 1
                v -= table[i][s]
            ret[i] = r
        samples[sample] = ret
    return samples
Example:
weights=intsInRangeWithSum(
# One sample
1,
# Count of numbers per sample
4,
# Range of the random numbers
1, 5,
# Sum of the numbers
10)
# Divide by the requested sum (10) to get weights that sum to 1
weights=[x/10.0 for x in weights[0]]
I am trying to write an algorithm in Python that takes a list of at most 100 random numbers, each from 0 to 1,000,000, and evens this array out as much as possible, giving me the maximum number of equal elements. This is what I have so far:
def answer(x):
    diff = max(x) - min(x)
    while diff > 1:
        x[x.index(max(x))] = x[x.index(max(x))] - (diff / 2)
        x[x.index(min(x))] = x[x.index(min(x))] + (diff / 2)
        diff = max(x) - min(x)
    return count(x)
def count(x):
    from collections import Counter
    c = Counter(x)
    return max(c.values())
This will take an array such as [0,50] and create an array [25,25] and return the integer 2 because there are two equal elements in the array. I know for a fact this algorithm works in most cases however it doesn't in all.
Can anyone please point out any array of integers this would not yield the correct answer for? Thanks
Edit:
For those who don't want to read the while loop: the code finds the range of the entire list, splits the range in half, adds half to the min, and subtracts half from the max. It is trying to equalize the entire list while keeping the same sum.
[1,4,1] = [2,3,1] = [2,2,2] = (number of equal elements) 3
[2,1,4,9] = [2,5,4,5] = [3,4,4,5] = [4,4,4,4] = (number of equal elements) all 4
What about this?
l = [1, 2, 5, 10]
# "best" possible case (integer division, so the average is truncated)
l_m = [sum(l) // len(l)] * len(l)
# see if lists fit (division can cause rounding errors)
if sum(l_m) != sum(l):
    # if they don't this means we can only have len(l) - 1 similar items
    print(len(l) - 1)
else:
    # if sums fit the first list can be spread like this
    print(len(l))
I can imagine that you're trying to make as many elements in the array equal as possible, while keeping their sum, and keeping the elements integer.
For N elements, you can get N - 1 elements equal, and, with some luck, all N equal.
This is a bit of pseudocode for you:
average = sum(elements) / length(elements) # a float
best_approximation = trunc(average) # round() would also work
discrepancy = sum(elements) - best_approximation * length(elements)
discrepant_value = best_approximation + discrepancy
result = [discrepant_value] + the rest of list having best_approximation value
By construction, you get length(elements) - 1 of equal values and one discrepant_value.
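The pseudocode above translates fairly directly into Python. This is only a sketch for non-negative integer input; the function name spread_evenly is made up for illustration.

```python
def spread_evenly(elements):
    n = len(elements)
    total = sum(elements)
    # Integer average, truncated (matches trunc() for non-negative input)
    best_approximation = total // n
    # Whatever the truncated average leaves over is carried by one element
    discrepancy = total - best_approximation * n
    discrepant_value = best_approximation + discrepancy
    return [discrepant_value] + [best_approximation] * (n - 1)

print(spread_evenly([1, 2, 5, 10]))  # [6, 4, 4, 4]
print(spread_evenly([2, 1, 4, 9]))   # [4, 4, 4, 4]
```

The sum is preserved in both cases, and all N values come out equal exactly when the sum is divisible by N.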
What you're really doing is normalizing your input to an integer average and distributing the remainder among the result.
L = [1,2,3,4,5,7]
# Calc the integer average
avg = sum(L) // len(L)
# Find the remainder
mod = sum(L) % len(L)
# Create a new list length of original
# populating it first with the average
L2 = [avg] * len(L)
# Then add 1 to each element for as many
# as the remainder
for n in range(mod):
    L2[n] += 1
def count(x):
    from collections import Counter
    c = Counter(x)
    return max(c.values())
count(L2)
4
You don't need to morph the original list or create a new one (negating the need for your import):
L = [1,2,3,4,5,7]
# Don't even need to figure the average if you
# find the remainder of the sum of your list
# divided by the length of your list
mod = sum(L) % len(L)
result = mod if mod >= len(L)/2 else len(L) - mod
print(result)
4
This is the final solution I have come to.
After it reduces the entire array's range to no greater than 1, it checks whether the number of equal numbers in the array is the same as the length. If so, the array looks something like [4,4,4,4], and it returns the number of equal numbers immediately (4). If the count of the most common number in the list is less than the length, it equalizes the list. So if the list is something like [4,4,3,3], it is better to turn it into [4,4,4,2]. This is what the equalize function does.
def answer(x):
    diff = max(x) - min(x)
    while diff > 1:
        x[x.index(max(x))] = x[x.index(max(x))] - (diff / 2)
        x[x.index(min(x))] = x[x.index(min(x))] + (diff / 2)
        diff = max(x) - min(x)
    print(x)
    if count(x) == len(x):
        return count(x)
    return equalize(x)
def equalize(x):
    from collections import Counter
    eq = Counter(x)
    eq = min(eq.values())
    operations = eq - 1
    for i in range(0,operations):
        x[x.index(min(x))] = x[x.index(min(x))] + 1
    return count(x)
def count(x):
    from collections import Counter
    c = Counter(x)
    return max(c.values())
http://repl.it/6bA/1
The longest arithmetic progression subsequence problem is as follows. Given an array of integers A, devise an algorithm to find the longest arithmetic progression in it. In other words find a sequence i1 < i2 < … < ik, such that A[i1], A[i2], …, A[ik] form an arithmetic progression, and k is maximal. The following code solves the problem in O(n^2) time and space. (Modified from http://www.geeksforgeeks.org/length-of-the-longest-arithmatic-progression-in-a-sorted-array/ . )
#!/usr/bin/env python
import sys
def arithmetic(arr):
    n = len(arr)
    if (n<=2):
        return n
    llap = 2
    L = [[0]*n for i in range(n)]
    for i in range(n):
        L[i][n-1] = 2
    for j in range(n-2,0,-1):
        i = j-1
        k = j+1
        while (i >=0 and k <= n-1):
            if (arr[i] + arr[k] < 2*arr[j]):
                k = k + 1
            elif (arr[i] + arr[k] > 2*arr[j]):
                L[i][j] = 2
                i -= 1
            else:
                L[i][j] = L[j][k] + 1
                llap = max(llap, L[i][j])
                i = i - 1
                k = j + 1
        while (i >=0):
            L[i][j] = 2
            i -= 1
    return llap
arr = [1,4,5,7,8,10]
print(arithmetic(arr))
This outputs 4.
However I would like to be able to find arithmetic progressions where up to one value is missing. So if arr = [1,4,5,8,10,13] I would like it to report that there is a progression of length 5 with one value missing.
Can this be done efficiently?
Adapted from my answer to Longest equally-spaced subsequence. n is the length of A, and d is the range, i.e. the largest item minus the smallest item.
A = [1, 4, 5, 8, 10, 13] # in sorted order
Aset = set(A)
for d in range(1, 13):
    already_seen = set()
    for a in A:
        if a not in already_seen:
            b = a
            count = 1
            while b + d in Aset:
                b += d
                count += 1
                already_seen.add(b)
            # if there is a hole to jump over:
            if b + 2 * d in Aset:
                b += 2 * d
                count += 1
                while b + d in Aset:
                    b += d
                    count += 1
                    # don't record in already_seen here
            print("found %d items in %d .. %d" % (count, a, b))
            # collect here the largest 'count'
I believe that this solution is still O(n*d), simply with larger constants than looking without a hole, despite the two "while" loops inside the two nested "for" loops. Indeed, fix a value of d: then we are in the "a" loop that runs n times; but each of the inner two while loops run at most n times in total over all values of a, giving a complexity O(n+n+n) = O(n) again.
Like the original, this solution is adaptable to the case where you're not interested in the absolute best answer but only in subsequences with a relatively small step d: e.g. n might be 1'000'000, but you're only interested in subsequences of step at most 1'000. Then you can make the outer loop stop at 1'000.
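To make the "collect the largest count" and "stop the outer loop early" remarks concrete, here is one way the loop could be wrapped in a function. It is a sketch only: the name longest_ap_with_one_gap and the max_step parameter are illustrative additions, and the function returns the number of items found (not counting the hole), the starting value, and the step.

```python
def longest_ap_with_one_gap(values, max_step=None):
    items = sorted(set(values))
    if not items:
        return (0, None, None)
    present = set(items)
    if max_step is None:
        max_step = items[-1] - items[0]  # the full range d
    best = (1, items[0], 0)  # (items found, first item, step)
    for d in range(1, max_step + 1):
        already_seen = set()
        for a in items:
            if a in already_seen:
                continue
            b = a
            count = 1
            while b + d in present:
                b += d
                count += 1
                already_seen.add(b)
            # Jump over at most one missing value
            if b + 2 * d in present:
                b += 2 * d
                count += 1
                while b + d in present:
                    b += d
                    count += 1
            if count > best[0]:
                best = (count, a, d)
    return best

print(longest_ap_with_one_gap([1, 4, 5, 8, 10, 13]))  # (4, 1, 3)
```

For the question's example this finds 1, 4, 10, 13 with step 3, which is the length-5 progression 1, 4, 7, 10, 13 with 7 missing. Passing max_step=1000 would restrict the search to small steps, as described above.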