Python - evaluate elements in list of lists - python

Trying to learn Python and got this task to find if there is a consecutive year-over-year improvement (higher score) from 2003-2018 for any of the countries.
Given list:
lst=[[;2018;2015;2012;2009;2006;2003],[Country1;558;544;538;525;525;527],[Country2;551;548;561;555;547;550],[Country3;545;538;536;534;533;529],[Country4;526;524;554;546;547;542]]
The list is a lot longer than the sample. The countries with consecutive year-over-year improvement shall be presented in a table. The second task is to do the same, but for countries with a year-over-year lower score.
No imports are allowed.
I think I need some if-checks inside a for-loop, but I'm drawing a blank here. I can't wrap my head around it.
Any strategy tip or code samples are much appreciated.

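For starters, I'd use the differences between consecutive years, like so:
lst=[[2018,2015,2012,2009,2006,2003],["Country1",558,544,538,525,525,527],["Country2",551,548,561,555,547,550],["Country3",545,538,536,534,533,529],["Country4",526,524,554,546,547,542]]
for row in lst[1:]:
    nums = row[1:]
    # The columns run from 2018 back to 2003, so each difference is (older score - newer score)
    print([nextnum - num for num, nextnum in zip(nums, nums[1:])])
Then you can make a simple loop checking whether the differences all stay positive or all stay negative. Because the years are listed newest first, all-negative differences mean the score improved in every period.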
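The check described above could look like the following sketch. It assumes the sample layout (header row first, scores listed from 2018 back to 2003); the helper name `trend` is made up for illustration, and no imports are needed.

```python
lst = [
    ["", 2018, 2015, 2012, 2009, 2006, 2003],
    ["Country1", 558, 544, 538, 525, 525, 527],
    ["Country2", 551, 548, 561, 555, 547, 550],
    ["Country3", 545, 538, 536, 534, 533, 529],
    ["Country4", 526, 524, 554, 546, 547, 542],
]

def trend(scores_newest_first):
    """Return 'improving', 'declining' or None (hypothetical helper)."""
    # Reverse so the scores run oldest -> newest.
    scores = scores_newest_first[::-1]
    diffs = [later - earlier for earlier, later in zip(scores, scores[1:])]
    if all(d > 0 for d in diffs):
        return "improving"
    if all(d < 0 for d in diffs):
        return "declining"
    return None

improving = [row[0] for row in lst[1:] if trend(row[1:]) == "improving"]
declining = [row[0] for row in lst[1:] if trend(row[1:]) == "declining"]
print("Improving:", improving)
print("Declining:", declining)
```

A period with an unchanged score counts as neither improving nor declining here; relax the comparisons to `>=` / `<=` if ties should be allowed.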

I'm not sure how you want to represent this as a table, but to calculate the year-over-year (YoY) change you can do it like this:
def yoy(y1, y2):
    return (y2 - y1) / y1
And then apply it on your list to find out which countries have improved year by year:
for row in lst[1:]:
    # The columns run from 2018 back to 2003, so reverse them to go oldest -> newest
    scores = row[1:][::-1]
    # calculate YoY
    country_yoys = []
    for y1, y2 in zip(scores, scores[1:]):
        gain = yoy(y1, y2)
        country_yoys.append(gain)
    # Check that every YoY change is an improvement
    consecutive_yoy_improv = all(gain > 0 for gain in country_yoys)
    # Print results
    if consecutive_yoy_improv:
        print(f"Country \"{row[0]}\" has been doing good!")
    else:
        print(f"Country \"{row[0]}\" has been doing bad!")

Try this.
lst = [
    ['',2018,2015,2012,2009,2006,2003],
    ["Country1",558,544,538,525,525,524],
    ["Country2",551,548,561,555,547,550],
    ["Country3",545,538,536,534,533,529],
    ["Country4",526,524,554,546,547,542]
]
_, *year_data = lst.pop(0)
countries = lst
def is_decreasing(lst) -> bool:
    last_checked, *rest = lst
    for year_data in rest:
        if year_data < last_checked:
            last_checked = year_data
        else:
            return False
    return True
def is_increasing(lst) -> bool:
    last_checked, *rest = lst
    for year_data in rest:
        if year_data > last_checked:
            last_checked = year_data
        else:
            return False
    return True
def find_yoy():
    increasing_yoy = []
    decreasing_yoy = []
    for country in countries:
        country_name, *country_data = country
        assert len(year_data) == len(country_data)
        sorted_year_data, sorted_country_data = zip(*sorted(zip(year_data, country_data)))
        if is_increasing(sorted_country_data):
            increasing_yoy.append(country_name)
        elif is_decreasing(sorted_country_data):
            decreasing_yoy.append(country_name)
    print("countries with increasing yoy:", ', '.join(increasing_yoy), '.')
    print("countries with decreasing yoy:", ','.join(decreasing_yoy), '.')
find_yoy()
If performance matters, merge is_decreasing and is_increasing into one function.
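One way the merge suggested above might look is sketched below: a single pass that reports the direction instead of two separate boolean checks. The name `direction`, the labels and the extra "Country5" row are invented for illustration; the scores are assumed to be ordered oldest to newest, as they are after the sort in find_yoy().

```python
def direction(values):
    """Return 1 if strictly increasing, -1 if strictly decreasing, else 0."""
    sign = 0
    for previous, current in zip(values, values[1:]):
        if current == previous:
            return 0                      # a flat step breaks both trends
        step = 1 if current > previous else -1
        if sign and step != sign:
            return 0                      # the trend changed direction
        sign = step
    return sign

rows = [
    ["Country1", 524, 525, 525, 538, 544, 558],  # oldest -> newest
    ["Country3", 529, 533, 534, 536, 538, 545],
    ["Country5", 560, 550, 540, 530, 520, 510],  # invented row, for illustration
]
labels = {1: "improving", -1: "declining", 0: "mixed"}
for name, *scores in rows:
    # Simple two-column text table, no imports needed.
    print(f"{name:<10}| {labels[direction(scores)]}")
```

The loop stops at the first step that contradicts the trend, so each country's scores are scanned at most once.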


Is it possible to make this algorithm recursive?

Background
We have a family tradition where my and my siblings' Christmas presents are identified by a code that can be solved using only numbers related to us. For example, the code could be birth month * age + graduation year (This is a simple one). If the numbers were 8 * 22 + 2020 = 2196, the number 2196 would be written on all my Christmas presents.
I've already created a Python class that solves the code with certain constraints, but I'm wondering if it's possible to do it recursively.
Current Code
The first function returns a result set for all possible combinations of numbers and operations that produce a value in target_values
#Master algorithm (Get the result set of all combinations of numbers and cartesian products of operations that reach a target_value, using only the number_of_numbers_in_solution)
#Example: sibling1.results[1] = [(3, 22, 4), (<built-in function add>, <built-in function add>), 29]. This means that 3 + 22 + 4 = 29, and 29 is in target_values
import operator
from itertools import product
from itertools import combinations
NUMBER_OF_OPERATIONS_IN_SOLUTION = 2 #Total numbers involved is this plus 1
NUMBER_OF_NUMBERS_IN_SOLUTION = NUMBER_OF_OPERATIONS_IN_SOLUTION + 1
TARGET_VALUES = {22,27,29,38,39}
def getresults( list ):
    #Add the cartesian product of all possible operations to a variable ops
    ops = []
    opslist = [operator.add, operator.sub, operator.mul, operator.truediv]
    for val in product(opslist, repeat=NUMBER_OF_OPERATIONS_IN_SOLUTION):
        ops.append(val)
    #Get the result set of all combinations of numbers and cartesian products of operations that reach a target_value
    results = []
    for x in combinations(list, NUMBER_OF_NUMBERS_IN_SOLUTION):
        for y in ops:
            result = 0
            for z in range(len(y)):
                #On the first iteration, do the operation on the first two numbers (x[z] and x[z+1])
                if (z == 0):
                    #print(y[z], x[z], x[z+1])
                    result = y[z](x[z], x[z+1])
                #For all other iterations, do the operation on the current result and x[z+1])
                else:
                    #print(y[z], result, x[z+1])
                    result = y[z](result, x[z+1])
            if result in TARGET_VALUES:
                results.append([x, y, result])
                #print (x, y)
    print(len(results))
    return results
Then a class that takes in personal parameters for each person and gets the result set
def getalpha( str, inverse ):
    "Converts string to alphanumeric array of chars"
    array = []
    for i in range(0, len(str)):
        alpha = ord(str[i]) - 96
        if inverse:
            array.append(27 - alpha)
        else:
            array.append(alpha)
    return array;
class Person:
    def __init__(self, name, middlename, birthmonth, birthday, birthyear, age, orderofbirth, gradyear, state, zip, workzip, cityfirst3):
        #final list
        self.listofnums = []
        self.listofnums.extend((birthmonth, birthday, birthyear, birthyear - 1900, age, orderofbirth, gradyear, gradyear - 2000, zip, workzip))
        self.listofnums.extend(getalpha(cityfirst3, False))
        self.results = getresults(self.listofnums)
Finally, a "solve code" method that takes from the result sets and finds any possible combinations that produce the full list of target_values.
#Compares the values of two sets
def compare(l1, l2):
    result = all(map(lambda x, y: x == y, l1, l2))
    return result and len(l1) == len(l2)
#Check every result in sibling2 with a different result target_value and equal operation sets
def comparetwosiblings(current_values, sibling1, sibling2, a, b):
    if sibling2.results[b][2] not in current_values and compare(sibling1.results[a][1], sibling2.results[b][1]):
        okay = True
        #If the indexes aren't alphanumeric, ensure they're the same before adding to new result set
        for c in range(0, NUMBER_OF_NUMBERS_IN_SOLUTION):
            indexintersection = set([index for index, value in enumerate(sibling1.listofnums) if value == sibling1.results[a][0][c]]) & set([index for index, value in enumerate(sibling2.listofnums) if value == sibling2.results[b][0][c]])
            if len(indexintersection) > 0:
                okay = True
            else:
                okay = False
                break
    else:
        okay = False
    return okay
#For every result, we start by adding the result number to the current_values list for sibling1, then cycle through each person and see if a matching operator list leads to a different result number. (Matching indices as well)
#If there's a result set for everyone that leads to five different numbers in the code, the values will be added to the newresult set
def solvecode( sibling1, sibling2, sibling3, sibling4, sibling5 ):
    newresults = []
    current_values = []
    #For every result in sibling1
    for a in range(len(sibling1.results)):
        current_values = []
        current_values.append(sibling1.results[a][2])
        for b in range(len(sibling2.results)):
            if comparetwosiblings(current_values, sibling1, sibling2, a, b):
                current_values.append(sibling2.results[b][2])
                for c in range(len(sibling3.results)):
                    if comparetwosiblings(current_values, sibling1, sibling3, a, c):
                        current_values.append(sibling3.results[c][2])
                        for d in range(len(sibling4.results)):
                            if comparetwosiblings(current_values, sibling1, sibling4, a, d):
                                current_values.append(sibling4.results[d][2])
                                for e in range(len(sibling5.results)):
                                    if comparetwosiblings(current_values, sibling1, sibling5, a, e):
                                        newresults.append([sibling1.results[a][0], sibling2.results[b][0], sibling3.results[c][0], sibling4.results[d][0], sibling5.results[e][0], sibling1.results[a][1]])
                                current_values.remove(sibling4.results[d][2])
                        current_values.remove(sibling3.results[c][2])
                current_values.remove(sibling2.results[b][2])
    print(len(newresults))
    print(newresults)
It's the last "solvecode" method that I'm wondering if I can optimize and make into a recursive algorithm. In some cases it can be helpful to add or remove a sibling, which would look nice recursively (My mom sometimes makes a mistake with one sibling, or we get a new brother/sister-in-law)
Thank you for any and all help! I hope you at least get a laugh out of my weird family tradition.
Edit: In case you want to test the algorithm, here's an example group of siblings that result in exactly one correct solution
#ALL PERSONAL INFO CHANGED FOR STACKOVERFLOW
sibling1 = Person("sibling1", "horatio", 7, 8, 1998, 22, 5, 2020, "ma", 11111, 11111, "red")
sibling2 = Person("sibling2", "liem", 2, 21, 1995, 25, 4, 2018, "ma", 11111, 11111, "pho")
sibling3 = Person("sibling3", "kyle", 4, 21, 1993, 26, 3, 2016, "ma", 11111, 11111, "okl")
sibling4 = Person("sibling4", "jamal", 4, 7, 1991, 29, 2, 2014, "ma", 11111, 11111, "pla")
sibling5 = Person("sibling5", "roberto", 9, 23, 1990, 30, 1, 2012, "ma", 11111, 11111, "boe")
I just spent a while improving the code. A few things I need to mention:
It's not good practice to use the names of Python built-ins (like list, str and zip) as variable names: it shadows the built-in, which causes problems and makes the code harder to debug.
I feel like you should use the permutations function instead: combinations yields unordered selections, while permutations yields ordered ones, which are more numerous and will give more results. For example, for the sibling info you gave, combinations gives only 1 solution through solvecode() while permutations gives 12.
Because you are working with operators, there can be more cases once brackets are involved. To solve that problem and to make the getresults() function a bit more optimized, I suggest you explore Reverse Polish notation. Computerphile has an excellent video on it.
You don't need a compare function. list1==list2 works.
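To make the Reverse Polish notation suggestion concrete, here is a minimal stack-based evaluator. It is only a sketch: the `eval_rpn` name and the token format (numbers mixed with operator strings) are assumptions, not part of the code in this thread.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def eval_rpn(tokens):
    """Evaluate a Reverse Polish (postfix) token list with a stack."""
    stack = []
    for token in tokens:
        if token in OPS:
            # An operator consumes the two most recent operands.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(token)
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

# 8 * 22 + 2020, the example from the question
print(eval_rpn([8, 22, "*", 2020, "+"]))   # 2196
# The same numbers bracketed differently: 8 * (22 + 2020)
print(eval_rpn([8, 22, 2020, "+", "*"]))   # 16336
```

Every bracketing of an expression corresponds to a different postfix ordering, so enumerating valid token orders covers the bracketed cases without any special handling of parentheses.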
Here's the optimized code:
import operator
from itertools import product
from itertools import permutations
NUMBER_OF_OPERATIONS_IN_SOLUTION = 2 #Total numbers involved is this plus 1
NUMBER_OF_NUMBERS_IN_SOLUTION = NUMBER_OF_OPERATIONS_IN_SOLUTION + 1
TARGET_VALUES = {22,27,29,38,39}
def getresults(listofnums):
    #Add the cartesian product of all possible operations to a variable ops
    ops = []
    opslist = [operator.add, operator.sub, operator.mul, operator.truediv]
    for val in product(opslist, repeat=NUMBER_OF_OPERATIONS_IN_SOLUTION):
        ops.append(val)
    #Get the result set of all combinations of numbers and cartesian products of operations that reach a target_value
    results = []
    for x in permutations(listofnums, NUMBER_OF_NUMBERS_IN_SOLUTION):
        for y in ops:
            result = y[0](x[0], x[1])
            if NUMBER_OF_OPERATIONS_IN_SOLUTION>1:
                for z in range(1, len(y)):
                    result = y[z](result, x[z+1])
            if result in TARGET_VALUES:
                results.append([x, y, result])
    return results
def getalpha(string, inverse):
    "Converts string to alphanumeric array of chars"
    array = []
    for i in range(0, len(string)):
        alpha = ord(string[i]) - 96
        array.append(27-alpha if inverse else alpha)
    return array
class Person:
    def __init__(self, name, middlename, birthmonth, birthday, birthyear, age, orderofbirth, gradyear, state, zipcode, workzip, cityfirst3):
        #final list
        self.listofnums = [birthmonth, birthday, birthyear, birthyear - 1900, age, orderofbirth, gradyear, gradyear - 2000, zipcode, workzip]
        self.listofnums.extend(getalpha(cityfirst3, False))
        self.results = getresults(self.listofnums)
#Check every result in sibling2 with a different result target_value and equal operation sets
def comparetwosiblings(current_values, sibling1, sibling2, a, b):
    if sibling2.results[b][2] not in current_values and sibling1.results[a][1]==sibling2.results[b][1]:
        okay = True
        #If the indexes aren't alphanumeric, ensure they're the same before adding to new result set
        for c in range(0, NUMBER_OF_NUMBERS_IN_SOLUTION):
            indexintersection = set([index for index, value in enumerate(sibling1.listofnums) if value == sibling1.results[a][0][c]]) & set([index for index, value in enumerate(sibling2.listofnums) if value == sibling2.results[b][0][c]])
            if len(indexintersection) > 0:
                okay = True
            else:
                okay = False
                break
    else:
        okay = False
    return okay
And now, the million dollar function, or should I say two functions:
# var contains the loop variables (a, b, c, ...) chosen so far; depth keeps track of the sibling number
def rec(arg, var, current_values, newresults, depth):
    for i in range(len(arg[depth].results)):
        if comparetwosiblings(current_values, arg[0], arg[depth], var[0], i):
            if depth<len(arg)-1:
                current_values.append(arg[depth].results[i][2])
                rec(arg, var[:depth]+[i], current_values, newresults, depth+1)
                current_values.remove(arg[depth].results[i][2])
            else:
                # build a fresh index list per match; extending var in place would leave var[depth] stuck at the first match
                chosen = var[:depth]+[i]
                # works for any number of siblings, not just five
                newresults.append([arg[k].results[chosen[k]][0] for k in range(len(arg))] + [arg[0].results[chosen[0]][1]])
def solvecode(*arg):
    newresults = []
    for a in range(len(arg[0].results)):
        current_values = [arg[0].results[a][2]]
        rec(arg, var=[a], current_values=current_values, newresults=newresults, depth=1)
    print(len(newresults))
    print(newresults)
Two functions are needed: the first is the recursive one and the second is a wrapper around it. I've also fulfilled your second wish, which was being able to pass a variable number of siblings into the new solvecode function. I've checked the new functions and they work together exactly like the original solvecode function. Note that there is no significant difference in runtime between the two versions, although the second one has 8 fewer lines of code. Hope this helped. lmao took me 3 hours.

Most frequently overlapping range - Python3.x

I'm a beginner, trying to write code listing the most frequently overlapping ranges in a list of ranges.
So, input is various ranges (#1 through #7 in the example figure; https://prntscr.com/kj80xl) and I would like to find the most common range (in the example 3,000- 4,000 in 6 out of 7 - 86 %). Actually, I would like to find top 5 most frequent.
Not all ranges overlap. Ranges are always positive and given as integers with 1 distance (standard range).
What I have now is only code comparing one sequence to another and returning the overlap, but after that I'm stuck.
def range_overlap(range_x,range_y):
    x = (range_x[0], (range_x[-1])+1)
    y = (range_y[0], (range_y[-1])+1)
    overlap = (max(x[0],y[0]),min(x[-1],(y[-1])))
    if overlap[0] <= overlap[1]:
        return range(overlap[0], overlap[1])
    else:
        return "Out of range"
I would be very grateful for any help.
Better solution
I came up with a simpler solution (at least IMHO) so here it is:
def get_abs_min(ranges):
    return min([min(r) for r in ranges])
def get_abs_max(ranges):
    return max([max(r) for r in ranges])
def count_appearances(i, ranges):
    return sum([1 for r in ranges if i in r])
def create_histogram(ranges):
    keys = [str(i) for i in range(len(ranges) + 1)]
    histogram = dict.fromkeys(keys)
    results = []
    # use the ranges argument, and avoid shadowing the built-in min/max
    lo = get_abs_min(ranges)
    hi = get_abs_max(ranges)
    for i in range(lo, hi + 1):  # hi itself lies inside a range, so include it
        count = str(count_appearances(i, ranges))
        if histogram[count] is None:
            histogram[count] = dict(start=i, end=None)
        elif histogram[count]['end'] is None:
            histogram[count]['end'] = i
        elif histogram[count]['end'] == i - 1:
            histogram[count]['end'] = i
        else:
            start = histogram[count]['start']
            end = histogram[count]['end']
            results.append((range(start, end + 1), count))
            histogram[count]['start'] = i
            histogram[count]['end'] = None
    for count, d in histogram.items():
        if d is not None and d['start'] is not None and d['end'] is not None:
            results.append((range(d['start'], d['end'] + 1), count))
    return results
def main(ranges, top):
    appearances = create_histogram(ranges)
    # counts are stored as strings, so compare them numerically
    return sorted(appearances, key=lambda t: int(t[1]), reverse=True)[:top]
The idea here is simply to iterate over every integer covered by the combined ranges and build a histogram of appearances (i.e. the number of original ranges the current i appears in).
After that just sort and slice according to the chosen size of the results.
Just call main with the ranges and the top number you want (or None if you want to see all results).
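For very wide ranges, visiting every integer can get slow. The sketch below is an alternative, not the method above: a standard boundary sweep that only looks at the points where coverage changes. It assumes non-empty, step-1 range objects; the function names and sample data are illustrative.

```python
def coverage_segments(ranges):
    """Return (range, count) pairs for each stretch of constant coverage."""
    events = {}
    for r in ranges:
        events[r.start] = events.get(r.start, 0) + 1   # coverage rises at start
        events[r.stop] = events.get(r.stop, 0) - 1     # and falls at stop (exclusive)
    # Keep only the points where the coverage actually changes.
    points = sorted(p for p, delta in events.items() if delta)
    segments = []
    depth = 0
    for left, right in zip(points, points[1:]):
        depth += events[left]
        if depth > 0:
            segments.append((range(left, right), depth))
    return segments

def most_overlapped(ranges, top=None):
    """Return the `top` segments covered by the most input ranges."""
    segments = coverage_segments(ranges)
    return sorted(segments, key=lambda s: s[1], reverse=True)[:top]

sample = [range(1, 4000), range(3000, 6000), range(2500, 4000)]
for segment, count in most_overlapped(sample, top=2):
    print(segment, count)
```

The work grows with the number of ranges (dominated by the sort) rather than with their width, and dividing a count by `len(ranges)` gives the percentage asked for in the question.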
OLDER EDITS BELOW
I (almost) agree with @Kasramvd's answer.
Here is my take on it:
from collections import Counter
from itertools import combinations
def range_overlap(x, y):
    common_part = set(x) & set(y)
    if common_part:
        # sets are unordered, so take min/max explicitly
        return range(min(common_part), max(common_part) + 1)
    else:
        return False
def get_most_common(range_list, top_frequent):
    overlaps = Counter(range_overlap(i, j) for i, j in
                       combinations(range_list, 2))
    return [(r, i) for (r, i) in overlaps.most_common(top_frequent) if r]
You need to pass in the range_list and the number of top_frequent results you want.
EDIT
The previous answer only considered combinations of two ranges from the range list.
This edit is tested against your input and gives the correct answer:
from collections import Counter
from itertools import combinations
def range_overlap(*args):
    sets = [set(r) for r in args]
    common_part = set(args[0]).intersection(*sets)
    if common_part:
        # sets are unordered, so take min/max explicitly
        return range(min(common_part), max(common_part) + 1)
    else:
        return False
def get_all_possible_combinations(range_list):
    all_combos = []
    for i in range(2, len(range_list)):
        all_combos.append(combinations(range_list, i))
    all_combos = [list(combo) for combo in all_combos]
    return all_combos
def get_most_common_for_combo(combo):
    return list(filter(None, [range_overlap(*option) for option in combo]))
def get_most_common(range_list, top_frequent):
    all_overlaps = []
    combos = get_all_possible_combinations(range_list)
    for combo in combos:
        all_overlaps.extend(get_most_common_for_combo(combo))
    return [r for (r, i) in Counter(all_overlaps).most_common(top_frequent) if r]
And to get the results just run get_most_common(range_list, top_frequent)
Tested on my machine (Ubuntu 16.04 with Python 3.5.2) with your input range_list and top_frequent = 5, with the results:
[range(3000, 4000), range(2500, 4000), range(1500, 4000), range(3000, 6000), range(1, 4000)]
You can first change your function to return a hashable value in both cases (a (start, stop) tuple, or None when there is no overlap) so that you can use it in a set of comparisons. Also, since Python's range objects are not materialized lists but lazy objects that only store start, stop and step and produce values on demand, you can simplify your function as well.
def range_overlap(range_x,range_y):
    # stop is already exclusive, so no +1 is needed
    rng = range(max(range_x.start, range_y.start),
                min(range_x.stop, range_y.stop))
    if rng.start < rng.stop:
        return rng.start, rng.stop
Now, if you have a set of ranges and you want to compare all the pairs, you can use itertools.combinations to get the pairs; then, using range_overlap and collections.Counter, you can count how often each overlap occurs.
from collections import Counter
from itertools import combinations
overlaps = Counter(range_overlap(i,j) for i, j in
                   combinations(list_of_ranges, 2))
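Put together, the pairwise approach might run as in the sketch below. The sample `list_of_ranges` is made up, and this `range_overlap` variant returns None explicitly for pairs that do not overlap so they can be discarded afterwards.

```python
from collections import Counter
from itertools import combinations

def range_overlap(range_x, range_y):
    """Return the shared (start, stop) of two step-1 ranges, or None."""
    start = max(range_x.start, range_y.start)
    stop = min(range_x.stop, range_y.stop)   # stop is exclusive
    return (start, stop) if start < stop else None

# Made-up sample input
list_of_ranges = [range(1, 4000), range(3000, 6000), range(2500, 4000), range(7000, 8000)]

overlaps = Counter(range_overlap(i, j) for i, j in combinations(list_of_ranges, 2))
overlaps.pop(None, None)  # discard the pairs that do not overlap

for (start, stop), pairs in overlaps.most_common(5):
    print(f"{start}-{stop}: shared by {pairs} pair(s)")
```

Note that this counts pairs of ranges sharing exactly that overlap, which is not the same as the number of input ranges covering it.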

Comparing median and sum in python

I have data for businesses that has categories and review counts. I have grouped the categories together for each business and I want to separate out those businesses that have review counts that are above the median number of review counts within each category and those that are below the median number of review counts. Essentially, I need to return a Series of median values indexed by category and use that to find out if a given business is greater than the median for its category. I have to compare its review count to the median for its category.
My code is throwing errors and I can't figure out why. Suggestions? I've tried both of the below.
n = df.groupby('category')['review_count'].size()
def cats_median_split(n):
    s = df.groupby('category')['review_count'].median()
    if n > s:
        return True
    else:
        return False
df.groupby('category')['review_count'].apply(cats_median_split)
OR:
n = df.groupby('category')['review_count'].sum()
def cats_median_split(n):
    s = n.median()
    if n > s:
        return True
    else:
        return False
df.groupby('category')['review_count'].apply(cats_median_split)
If I understood correctly, you wish to compute the median and then split the records around it:
def median(seq, index=0):
    # sort by the chosen field (sorted() takes a key function in Python 3)
    seq = sorted(seq, key=lambda x: x[index])
    l = len(seq)
    if l%2==0:
        return (seq[l//2-1][index]+seq[l//2][index])/2.0
    return seq[l//2][index]
def split(seq, index=0, threshold=0):
    left = []; right = []
    for element in seq:
        if element[index]<threshold:
            left.append(element)
        else:
            right.append(element)
    return left, right
cats = [(123, 345), (99, 258), (9753, 36754), (234, 216), (123456, 76543)]
m = median(cats, 1)
split(cats, 0, m)
For the median you'd better use numpy, but for smaller sequences this implementation will do.
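Since the question is about a median per category, here is a small standard-library sketch of that grouping step. The records are invented sample data and the variable names are assumptions. In pandas the analogous step is usually a groupby followed by transform('median'), which is not shown here.

```python
from statistics import median

# Hypothetical sample data: (business, category, review_count)
businesses = [
    ("A", "cafe", 10), ("B", "cafe", 30), ("C", "cafe", 50),
    ("D", "bar", 5), ("E", "bar", 15),
]

# Group the review counts by category.
counts_by_category = {}
for _, category, reviews in businesses:
    counts_by_category.setdefault(category, []).append(reviews)

# One median per category.
category_median = {cat: median(vals) for cat, vals in counts_by_category.items()}

# Compare each business with the median of its own category.
above = [name for name, cat, reviews in businesses if reviews > category_median[cat]]
below = [name for name, cat, reviews in businesses if reviews < category_median[cat]]
print(category_median)
print("above:", above, "below:", below)
```

A business whose count equals its category median lands in neither list; change one comparison to `>=` or `<=` if it should be assigned to a side.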

How to reduce a collection of ranges to a minimal set of ranges [duplicate]

I'm trying to remove overlapping values from a collection of ranges.
The ranges are represented by a string like this:
499-505 100-115 80-119 113-140 500-550
I want the above to be reduced to two ranges: 80-140 499-550. That covers all the values without overlap.
Currently I have the following code.
cr = "100-115 115-119 113-125 80-114 180-185 500-550 109-120 95-114 200-250".split(" ")
ar = []
br = []
for i in cr:
    (left,right) = i.split("-")
    ar.append(left);
    br.append(right);
inc = 0
for f in br:
    i = int(f)
    vac = []
    jnc = 0
    for g in ar:
        j = int(g)
        if(i >= j):
            vac.append(j)
            del br[jnc]
        jnc += jnc
    print vac
    inc += inc
I split the array by - and store the range limits in ar and br. I iterate over these limits pairwise and if the i is at least as great as the j, I want to delete the element. But the program doesn't work. I expect it to produce this result: 80-125 500-550 200-250 180-185
For a quick and short solution,
from operator import itemgetter
from itertools import groupby
cr = "499-505 100-115 80-119 113-140 500-550".split(" ")
fullNumbers = []
for i in cr:
    a = int(i.split("-")[0])
    b = int(i.split("-")[1])
    fullNumbers+=range(a,b+1)
# Remove duplicates and sort it
fullNumbers = sorted(list(set(fullNumbers)))
# Taken From http://stackoverflow.com/questions/2154249
def convertToRanges(data):
    result = []
    # consecutive numbers share the same (index - value) key
    for k, g in groupby(enumerate(data), lambda pair: pair[0]-pair[1]):
        group = list(map(itemgetter(1), g))
        result.append(str(group[0])+"-"+str(group[-1]))
    return result
print(convertToRanges(fullNumbers))
#Output: ['80-140', '499-550']
For the given set in your program, the output is ['80-125', '180-185', '200-250', '500-550']
Main possible drawback of this solution: it may not scale, because it materializes every integer in every range, so time and memory grow with the total span of the ranges.
Let me offer another solution whose running time does not depend on the sizes of the ranges. It depends only on the number of ranges: O(n log n) for the sort, followed by a single linear pass.
def reduce(range_text):
    parts = range_text.split()
    if parts == []:
        return ''
    ranges = [ tuple(map(int, part.split('-'))) for part in parts ]
    ranges.sort()
    new_ranges = []
    left, right = ranges[0]
    for next_left, next_right in ranges[1:]:
        if right + 1 < next_left: # Is the next range to the right?
            new_ranges.append((left, right)) # Close the current range.
            left, right = next_left, next_right # Start a new range.
        else:
            right = max(right, next_right) # Extend the current range.
    new_ranges.append((left, right)) # Close the last range.
    return ' '.join([ '-'.join(map(str, r)) for r in new_ranges ])
This function works by sorting the ranges, then looking at them in order and merging consecutive ranges that intersect or touch (so 1-5 and 6-9 become 1-9).
Examples:
print(reduce('499-505 100-115 80-119 113-140 500-550'))
# => 80-140 499-550
print(reduce('100-115 115-119 113-125 80-114 180-185 500-550 109-120 95-114 200-250'))
# => 80-125 180-185 200-250 500-550

Determine which numbers in list add up to specified value

I have a quick (hopefully) accounting problem. I just started a new job and the books are a bit of a mess. The books have these lump sums logged, while the bank account lists each and every individual deposit. I need to determine which deposits belong to each lump sum in the books. So, I have these four lump sums:
[6884.41, 14382.14, 2988.11, 8501.60]
I then have this larger list of individual deposits (sorted):
[98.56, 98.56, 98.56, 129.44, 160.0, 242.19, 286.87, 290.0, 351.01, 665.0, 675.0, 675.0, 677.45, 677.45, 695.0, 695.0, 695.0, 695.0, 715.0, 720.0, 725.0, 730.0, 745.0, 745.0, 750.0, 750.0, 750.0, 750.0, 758.93, 758.93, 763.85, 765.0, 780.0, 781.34, 781.7, 813.79, 824.97, 827.05, 856.28, 874.08, 874.44, 1498.11, 1580.0, 1600.0, 1600.0]
In Python, how can I determine which sub-set of the longer list sums to one of the lump sum values?
(NOTE: these numbers have the additional problem that the sum of the lump sums is $732.70 more than the sum of the individual accounts. I'm hoping that this doesn't make this problem completely unsolvable)
Here's a pretty good start at a solution:
import datetime as dt
from itertools import groupby
from math import ceil
def _unique_subsets_which_sum_to(target, value_counts, max_sums, index):
    value, count = value_counts[index]
    if index:
        # more values to be considered; solve recursively
        index -= 1
        rem = max_sums[index]
        # find the minimum amount that this value must provide,
        # and the minimum occurrences that will satisfy that value
        if target <= rem:
            min_k = 0
        else:
            min_k = (target - rem + value - 1) // value # rounded up to next int
        # find the maximum occurrences of this value
        # which result in <= target
        max_k = min(count, target // value)
        # iterate across min..max occurrences
        for k in range(min_k, max_k+1):
            new_target = target - k*value
            if new_target:
                # recurse
                for solution in _unique_subsets_which_sum_to(new_target, value_counts, max_sums, index):
                    yield ((solution + [(value, k)]) if k else solution)
            else:
                # perfect solution, no need to recurse further
                yield [(value, k)]
    else:
        # this must finish the solution
        if target % value == 0:
            yield [(value, target // value)]
def find_subsets_which_sum_to(target, values):
    """
    Find all unique subsets of values which sum to target
    target integer >= 0, total to be summed to
    values sequence of integer > 0, possible components of sum
    """
    # this function is basically a shell which prepares
    # the input values for the recursive solution
    # turn sequence into sorted list
    values = sorted(values)
    value_sum = sum(values)
    if value_sum >= target:
        # count how many times each value appears
        value_counts = [(value, len(list(it))) for value,it in groupby(values)]
        # running total to each position
        total = 0
        max_sums = [0]
        for val,num in value_counts:
            total += val * num
            max_sums.append(total)
        start = dt.datetime.utcnow()
        for sol in _unique_subsets_which_sum_to(target, value_counts, max_sums, len(value_counts) - 1):
            yield sol
        end = dt.datetime.utcnow()
        elapsed = end - start
        seconds = elapsed.days * 86400 + elapsed.seconds + elapsed.microseconds * 0.000001
        print(" -> took {:0.1f} seconds.".format(seconds))
# I multiplied each value by 100 so that we can operate on integers
# instead of floating-point; this will eliminate any rounding errors.
values = [
    9856, 9856, 9856, 12944, 16000, 24219, 28687, 29000, 35101, 66500,
    67500, 67500, 67745, 67745, 69500, 69500, 69500, 69500, 71500, 72000,
    72500, 73000, 74500, 74500, 75000, 75000, 75000, 75000, 75893, 75893,
    76385, 76500, 78000, 78134, 78170, 81379, 82497, 82705, 85628, 87408,
    87444, 149811, 158000, 160000, 160000
]
sum_to = [
    298811,
    688441,
    850160 #,
    # 1438214
]
def main():
    subset_sums_to = []
    for target in sum_to:
        print("\nSolutions which sum to {}".format(target))
        res = list(find_subsets_which_sum_to(target, values))
        print(" {} solutions found".format(len(res)))
        subset_sums_to.append(res)
    return subset_sums_to
if __name__=="__main__":
    subsetsA, subsetsB, subsetsC = main()
which on my machine results in
Solutions which sum to 298811
-> took 0.1 seconds.
2 solutions found
Solutions which sum to 688441
-> took 89.8 seconds.
1727 solutions found
Solutions which sum to 850160
-> took 454.0 seconds.
6578 solutions found
# Solutions which sum to 1438214
# -> took 7225.2 seconds.
# 87215 solutions found
The next step is to cross-compare solution subsets and see which ones can coexist together. I think the fastest approach would be to store subsets for the smallest three lump sums, iterate through them and (for compatible combinations) find the remaining values and plug them into the solver for the last lump sum.
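The compatibility test described here can be sketched with collections.Counter used as a multiset. This is an illustrative variant rather than the code that follows; the helper name `remaining_after` and the tiny data set are made up.

```python
from collections import Counter

def remaining_after(available, subset):
    """Return what is left of `available` after removing `subset`,
    or None if `subset` needs more copies of a value than are available."""
    need = Counter(dict(subset))
    if any(available[value] < count for value, count in need.items()):
        return None
    # Counter subtraction drops values whose count reaches zero.
    return available - need

# Hypothetical deposits (in cents) and three candidate solutions, each a
# list of (value, count) pairs like those yielded by the solver above.
deposits = Counter([100, 100, 250, 400])
solution_a = [(100, 1), (250, 1)]
solution_b = [(100, 1), (400, 1)]
solution_c = [(250, 1), (400, 1)]

left = remaining_after(deposits, solution_a)
print(left)                                  # what solution_a leaves behind
print(remaining_after(left, solution_b))     # compatible with solution_a
print(remaining_after(left, solution_c))     # needs the 250 again, so None
```

Two subsets can coexist exactly when the second still fits into what the first leaves behind, which is the property the nested loops rely on.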
Continuing from where I left off (+ a few changes to the above code to grab the return lists for subsums to the first three values).
I wanted a way to easily get the remaining value-coefficients each time;
class NoNegativesDict(dict):
    def __sub__(self, other):
        if set(other) - set(self):
            raise ValueError
        else:
            res = NoNegativesDict()
            # dict.items() replaces the Python 2 iteritems()
            for key,sv in self.items():
                ov = other.get(key, 0)
                if sv < ov:
                    raise ValueError
                # elif sv == ov:
                #     pass
                elif sv > ov:
                    res[key] = sv - ov
            return res
then I apply it as
value_counts = [(value, len(list(it))) for value,it in groupby(values)]
vc = NoNegativesDict(value_counts)
nna = [NoNegativesDict(a) for a in subsetsA]
nnb = [NoNegativesDict(b) for b in subsetsB]
nnc = [NoNegativesDict(c) for c in subsetsC]
# this is kind of ugly; with some more effort
# I could probably make it a recursive call also
b_tries = 0
c_tries = 0
sol_count = 0
start = dt.datetime.utcnow()
for a in nna:
    try:
        res_a = vc - a
        sa = str(a)
        for b in nnb:
            try:
                res_b = res_a - b
                b_tries += 1
                sb = str(b)
                for c in nnc:
                    try:
                        res_c = res_b - c
                        c_tries += 1
                        #unpack remaining values
                        res_values = [val for val,num in res_c.items() for i in range(num)]
                        for sol in find_subsets_which_sum_to(1438214, res_values):
                            sol_count += 1
                            print("\n================")
                            print("a =", sa)
                            print("b =", sb)
                            print("c =", str(c))
                            print("d =", str(sol))
                    except ValueError:
                        pass
            except ValueError:
                pass
    except ValueError:
        pass
print("{} solutions found in {} b-tries and {} c-tries".format(sol_count, b_tries, c_tries))
end = dt.datetime.utcnow()
elapsed = end - start
seconds = elapsed.days * 86400 + elapsed.seconds + elapsed.microseconds * 0.000001
print(" -> took {:0.1f} seconds.".format(seconds))
and the final output:
0 solutions found in 1678 b-tries and 93098 c-tries
-> took 73.0 seconds.
So the final answer is there is no solution for your given data.
Hope that helps ;-)
