Largest subset whose sum is less than or equal to a given sum - Python

A list is defined as follows: [1, 2, 3]
and the sub-lists of this are:
[1], [2], [3],
[1,2]
[1,3]
[2,3]
[1,2,3]
Given K, for example 3, the task is to find the length of the longest sub-list whose sum of elements is less than or equal to K.
I am aware of itertools in Python, but enumerating every combination results in a segmentation fault for larger lists. Is there a more efficient algorithm to achieve this? Any help would be appreciated.
My code is as follows:
from itertools import combinations
def maxLength(a, k):
    #print a,k
    l= []
    i = len(a)
    while(i>=0):
        lst= list(combinations(sorted(a),i))
        for j in lst:
            #print list(j)
            lst = list(j)
            #print sum(lst)
            sum1=0
            sum1 = sum(lst)
            if sum1<=k:
                return len(lst)
        i=i-1

You can use the dynamic programming solution that @Apy linked to. Here's a Python example; note that it returns the largest subset sum that does not exceed k, not the number of elements:
def largest_subset(items, k):
    res = 0
    # We can form subset with value 0 from empty set,
    # items[0], items[0...1], items[0...2]
    arr = [[True] * (len(items) + 1)]
    for i in range(1, k + 1):
        # Subset with value i can't be formed from empty set
        cur = [False] * (len(items) + 1)
        for j, val in enumerate(items, 1):
            # cur[j] is True if we can form a set with value of i from
            # items[0...j-1]
            # There are two possibilities
            # - Set can be formed already without even considering item[j-1]
            # - There is a subset with value i - val formed from items[0...j-2]
            cur[j] = cur[j-1] or ((i >= val) and arr[i-val][j-1])
        if cur[-1]:
            # If subset with value of i can be formed store
            # it as current result
            res = i
        arr.append(cur)
    return res
ITEMS = [5, 4, 1]
for i in range(sum(ITEMS) + 1):
    print('{} -> {}'.format(i, largest_subset(ITEMS, i)))
Output:
0 -> 0
1 -> 1
2 -> 1
3 -> 1
4 -> 4
5 -> 5
6 -> 6
7 -> 6
8 -> 6
9 -> 9
10 -> 10
In the code above, arr[i][j] is True if a subset with a sum of i can be chosen from items[0...j-1]. Naturally, arr[0] contains only True values, since the empty set can always be chosen. Similarly, in all the successive rows the first cell is False, since the empty set cannot have a non-zero sum.
For the rest of the cells there are two options:
If there is already a subset with a sum of i even without considering items[j-1], the value is True.
If there is a subset of items[0...j-2] with a sum of i - items[j-1], then we can add items[j-1] to it and get a subset with a sum of i.
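The same recurrence can also be sketched more compactly by tracking only the set of reachable sums instead of a full table. This is an illustrative variant, not the answer's code: the name largest_sum_at_most is made up here, and it assumes non-negative integer items and k >= 0.

```python
def largest_sum_at_most(items, k):
    """Largest subset sum that does not exceed k (assumes non-negative ints)."""
    # reachable holds every subset sum <= k seen so far; the empty set gives 0.
    reachable = {0}
    for val in items:
        # Either skip val (keep the old sums) or add it to an existing sum.
        reachable |= {s + val for s in reachable if s + val <= k}
    return max(reachable)


if __name__ == "__main__":
    for k in range(11):
        print(k, "->", largest_sum_at_most([5, 4, 1], k))
```

The set never holds more than k + 1 values, so the work is still bounded by O(len(items) * k), as in the table version.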

As far as I can see (since you treat any selection of items from the initial array as a sub array), you can use a greedy algorithm with O(N*log(N)) complexity (you have to sort the array):
1. Assign the entire array to the sub array
2. If sum(sub array) <= k then stop and return the sub array
3. Remove the maximum item from the sub array
4. goto 2
Example
[1, 2, 3, 5, 10, 25]
k = 12
Solution
sub array = [1, 2, 3, 5, 10, 25], sum = 46 > 12, remove 25
sub array = [1, 2, 3, 5, 10], sum = 21 > 12, remove 10
sub array = [1, 2, 3, 5], sum = 11 <= 12, stop and return
As an alternative, you can start with an empty sub array and add items from minimum to maximum while the sum is less than or equal to k:
sub array = [], sum = 0 <= 12, add 1
sub array = [1], sum = 1 <= 12, add 2
sub array = [1, 2], sum = 3 <= 12, add 3
sub array = [1, 2, 3], sum = 6 <= 12, add 5
sub array = [1, 2, 3, 5], sum = 11 <= 12, add 10
sub array = [1, 2, 3, 5, 10], sum = 21 > 12, stop,
return prior one: [1, 2, 3, 5]
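The second variant (adding items from smallest to largest) could be sketched in Python roughly as follows. The function name longest_sublist is only illustrative, and the sketch assumes all items are non-negative, as in the example.

```python
def longest_sublist(items, k):
    """Greedy sketch: take the smallest items while the sum stays <= k.

    Assumes non-negative items; sorting dominates, so this is O(N*log(N)).
    """
    chosen = []
    total = 0
    for value in sorted(items):
        if total + value > k:
            break  # every remaining item is at least as large, so stop
        chosen.append(value)
        total += value
    return chosen


if __name__ == "__main__":
    print(longest_sublist([1, 2, 3, 5, 10, 25], 12))
```

len(longest_sublist(a, k)) then gives the length the question asks for.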

Look, generating the power set takes O(2^n) time. It's pretty bad. You can instead use the dynamic programming approach.
See here for the algorithm:
http://www.geeksforgeeks.org/dynamic-programming-subset-sum-problem/
And yes, https://www.youtube.com/watch?v=s6FhG--P7z0 (Tushar explains everything well) :D

Assume everything is positive. (Handling negatives is a simple extension of this and is left to the reader as an exercise). There exists an O(n) algorithm for the described problem. Using the O(n) median select, we partition the array based on the median. We find the sum of the left side. If that is greater than k, then we cannot take all elements, we must thus recur on the left half to try to take a smaller set. Otherwise, we subtract the sum of the left half from k, then we recur on the right half to see how many more elements we can take.
Partitioning the array based on median select and recurring on only one of the halves yields a runtime of n + n/2 + n/4 + n/8 + ..., a geometric series that sums to O(n).
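A rough sketch of this partition-and-recurse idea is below. Note the swap: it uses a random pivot (quickselect style) rather than a true O(n) median select, so the O(n) bound holds only in expectation. The function name max_items_within_budget is hypothetical, and the sketch assumes positive integers.

```python
import random


def max_items_within_budget(items, k):
    """Count how many items fit in budget k (assumes positive integers).

    Random-pivot stand-in for median select: expected O(n), not worst case.
    """
    remaining = list(items)
    count = 0
    while remaining:
        pivot = random.choice(remaining)
        smaller = [v for v in remaining if v < pivot]
        equal = [v for v in remaining if v == pivot]
        larger = [v for v in remaining if v > pivot]
        smaller_sum = sum(smaller)
        if smaller_sum > k:
            # Not even the smaller part fits: the answer lies inside it.
            remaining = smaller
            continue
        # The whole smaller part fits; take it and shrink the budget.
        count += len(smaller)
        k -= smaller_sum
        # Take as many copies of the pivot value as still fit.
        fit = min(len(equal), k // pivot)
        count += fit
        k -= fit * pivot
        if fit < len(equal):
            break  # anything larger than the pivot cannot fit either
        remaining = larger
    return count


if __name__ == "__main__":
    print(max_items_within_budget([1, 2, 3, 5, 10, 25], 12))
```

Each round discards the part that has been decided, which is what produces the geometric series in the running time.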

Related

Error when trying to implement MERGE algorithm merging to sorted lists of integers in python?

I'm new to both algorithms AND programming.
As an intro to the merge algorithms, the chapter first introduces the MERGE algorithm by itself. It merges an array consisting of 2 sorted sub-arrays into one sorted array.
I did the pseudocode on paper according to the book:
Source: "Introduction to Algorithms, Third Edition" by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein
Since I am implementing it in Python 3, I had to change some lines, given that indexing in Python starts at 0, unlike in the book's pseudocode.
Keep in mind that the input is one array that contains 2 SORTED sub-arrays which are then merged and sorted, and returned. I kept the prints in my code, so you can see my checks...
#!/anaconda3/bin/python3
import math
import argparse
# For now only MERGE slides ch 2 -- Im defining p q and r WITHIN the function
# But for MERGE_SORT p,q and r are defined as parameters!
def merge(ar):
    '''
    Takes as input an array. This array consists of 2 subarrays that ARE ALREADY sorted
    (small to large). When splitting the array into half, the left
    part will be longer by one if not divisible by 2. These subarrays will be
    called left and right. Each of the subarrays must already be sorted. Merge() then
    merges these sorted arrays into one big sorted array. The sorted array is returned.
    '''
    print(ar)
    p=0 # for now defining always as 0
    if len(ar)%2==0:
        q=len(ar)//2-1 # because indexing starts from ZERO in py
    else:
        q=len(ar)//2 # left sub array will be 1 item longer
    r=len(ar)-1 # again -1 because indexing starts from ZERO in py
    print('p', p, 'q', q, 'r', r)
    # lets see if n1 and n2 check out
    n_1 = q-p+1 # length of left subarray
    n_2 = r-q # length of right subarray
    print('n1 is: ', n_1)
    print('n2 is: ', n_2)
    left = [0]*(n_1+1) # initiating zero list of length n1 + 1
    right=[0]*(n_2+1)
    print(left, len(left))
    print(right, len(right))
    # filling left and right
    for i in range(n_1):# because last value will always be infinity
        left[i] = ar[p+i]
    for j in range(n_2):
        right[j] = ar[q+j+1]
        #print(ar[q+j+1])
        #print(right[j])
    # inserting infinity at last index for each subarray
    left[n_1]=math.inf
    right[n_2]=math.inf
    print(left)
    print(right)
    # merging: initiating indexes at 0
    i=0
    j=0
    print('p', p)
    print('r', r)
    for k in range(p,r):
        if left[i] <= right[j]:
            ar[k]=left[i]
            # increase i
            i += 1
        else:
            ar[k]=right[j]
            #increase j
            j += 1
    print(ar)
#############################################################################################################################
# Adding parser
#############################################################################################################################
parser = argparse.ArgumentParser(description='MERGE algorithm from ch 2')
parser.add_argument('-a', '--array', type=str, metavar='', required=True, help='One List of integers composed of 2 sorted halves. Sorting must start from smallest to largest for each of the halves.')
args = parser.parse_args()
args_list_st=args.array.split(',') # list of strings
args_list_int=[]
for i in args_list_st:
    args_list_int.append(int(i))
if __name__ == "__main__":
    merge(args_list_int)
The problem:
When I try to sort the array as shown in the book, the merged array that is printed contains two 6s and the 7 is lost.
$ ./2.merge.py -a=2,4,5,7,1,2,3,6
[2, 4, 5, 7, 1, 2, 3, 6]
p 0 q 3 r 7
n1 is: 4
n2 is: 4
[0, 0, 0, 0, 0] 5
[0, 0, 0, 0, 0] 5
[2, 4, 5, 7, inf]
[1, 2, 3, 6, inf]
p 0
r 7
[1, 2, 2, 3, 4, 5, 6, 6]
However, this does not happen when the final 6 is replaced by any higher number.
$ ./2.merge.py -a=2,4,5,7,1,2,3,8
[2, 4, 5, 7, 1, 2, 3, 8]
p 0 q 3 r 7
n1 is: 4
n2 is: 4
[0, 0, 0, 0, 0] 5
[0, 0, 0, 0, 0] 5
[2, 4, 5, 7, inf]
[1, 2, 3, 8, inf]
p 0
r 7
[1, 2, 2, 3, 4, 5, 7, 8]
I showed it to a colleague in my class without success. And I've walked through it manually with numbers on paper snippets, also without success. I hope someone can find my silly mistake because I'm completely stuck.
Thanks
As r is the index of the last value in ar, you need to add one to it to make a range that also includes that final index:
for k in range(p, r + 1):
#                 ^^^^^
Note that your code could be greatly reduced if you would use list slicing.
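As an illustration of the slicing remark, a shorter merge could look roughly like the sketch below. It is one possible shape, not the book's pseudocode: it drops the infinity sentinels, returns a new list instead of overwriting ar, and keeps the question's convention that the left half is one item longer for odd lengths.

```python
def merge(ar):
    """Merge a list whose two halves are each already sorted (sketch)."""
    mid = (len(ar) + 1) // 2          # left half gets the extra item
    left, right = ar[:mid], ar[mid:]  # slicing replaces the index bookkeeping
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side is exhausted; the rest of the other side is already sorted.
    merged += left[i:]
    merged += right[j:]
    return merged


if __name__ == "__main__":
    print(merge([2, 4, 5, 7, 1, 2, 3, 6]))
```

Because the leftovers are appended with slices, no sentinel values and no range over k are needed, which removes the place where the off-by-one occurred.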
Brother, you made a very small mistake in this line:
for k in range(p,r):
Here your loop runs from p to r-1, so the last index, r, is never visited.
So you have to use
for k in range(p,r+1):
In the second test case, a=[2,4,5,7,1,2,3,8], you get the correct output even with the wrong code because you overwrite the values in ar in place: your code merges correctly up to index r-1, and the number at index r stays whatever it was before your merge function ran, i.e. 8.
Try this test case: [2, 4, 5, 8, 1, 2, 3, 7]
Your output will be [1, 2, 2, 3, 4, 5, 7, 7]
Hope this helped

Count how many permutations of a list are possible such that each one 'fits' into another list

I'm trying to find how many arrangements of a list are possible with each arrangement 'fitting' into another list (i.e. all elements of the arrangement have to be less than or equal to the corresponding element). For example, the list [1, 2, 3, 4] has to fit in the list [2, 4, 3, 4].
There are 8 possible arrangements in this case:
[1, 2, 3, 4]
[1, 4, 2, 3]
[1, 3, 2, 4]
[1, 4, 3, 2]
[2, 1, 3, 4]
[2, 4, 1, 3]
[2, 3, 1, 4]
[2, 4, 3, 1]
Because 3 and 4 cannot fit into the first slot of the list, all arrangements that start with 3 or 4 are cut out. Additionally, 4 cannot fit into the third slot, so any remaining arrangements with 4 in the third slot are removed.
This is my current code trying to brute-force the problem:
from itertools import permutations
x = [1, 2, 3, 4]
box = [2, 4, 3, 4] # this is the list we need to fit our arrangements into
counter = 0
for permutation in permutations(x):
    foo = True
    for i in range(len(permutation)):
        if permutation[i] > box[i]:
            foo = False
            break
    if foo:
        counter += 1
print(counter)
It works, but because I'm generating all the possible permutations of the first list, it's very slow, and I just can't find a better algorithm for it. I realize that it's basically a math problem, but I'm bad at math.
If you sort x in reverse, you can count the spots in the box that each element can fit in, one element at a time.
In your example:
4 has 2 spots it can go
3 has 3 spots, but you have to account for already placing the "4",
so you have 3 - 1 = 2 available
2 has 4 spots, but you have to account for already placing two things
(the "4" and "3"), so you have 4 - 2 = 2 available
1 has 4 spots, but you have already placed 3... so 4 - 3 = 1
The product 2 * 2 * 2 * 1 is 8.
Here's one way you can do that:
import numpy as np
counter = 1
for i, val in enumerate(reversed(sorted(x))):
    counter *= ( (val <= np.array(box)).sum() - i)
print(counter)
...or without numpy (and faster, actually):
counter = 1
for i, val in enumerate(reversed(sorted(x))):
    counter *= ( sum( ( val <= boxval for boxval in box)) - i)
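For longer lists, the per-element scan of box can be replaced by a binary search on a sorted copy. The following is a sketch of that variation, not part of the answer above: the function name count_fitting_arrangements is made up, and it assumes x and box have the same length.

```python
from bisect import bisect_left


def count_fitting_arrangements(x, box):
    """Count arrangements of x with every element <= its slot in box (sketch)."""
    slots = sorted(box)
    total = 1
    # Place the largest values first: they have the fewest candidate slots.
    for placed, val in enumerate(sorted(x, reverse=True)):
        # Number of slots whose capacity is at least val.
        fitting = len(slots) - bisect_left(slots, val)
        # Slots already taken by larger values are all among these.
        total *= max(fitting - placed, 0)
    return total


if __name__ == "__main__":
    print(count_fitting_arrangements([1, 2, 3, 4], [2, 4, 3, 4]))
```

The max(..., 0) guard keeps the product at zero when some value has no free slot left, instead of letting two negative factors cancel out.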
I've experimented a bit with timings and here's what I found:
Your original code
for permutation in permutations(x):
    foo = True
    for i in range(len(permutation)):
        if permutation[i] > box[i]:
            foo = False
            break
    if foo:
        counter += 1
Took about 13569 ns per run
Filtering the permutations
for i in range(100):
    res = len(list(filter(lambda perm: all([perm[i] <= box[i] for i in range(len(box))]), permutations(x))))
Took slightly longer at 16717 ns
Rick M's solution
counter = 1
for i, val in enumerate(reversed(sorted(x))):
    counter *= ((val <= np.array(box)).sum() - i)
Took even longer at 20146 ns
Recursive list comprehension
def findPossiblities(possibleValues, box):
    return not box or sum([findPossiblities([rem for rem in possibleValues if rem != val], box[1:]) for val in [val for val in possibleValues if val <= box[0]]])
findPossiblities(x, box)
Even longer at 27052 ns.
In conclusion, at least for an input as small as this one, using itertools and filtering is probably the best option

How to count the number of items in several bins using a loop in Python? Details shown in picture

Question details are shown in the picture. Thanks for your help.
Write a function histogram(values, dividers) that takes as argument a sequence of values and a sequence of bin dividers, and returns the histogram as a sequence of a suitable type (say, an array) with the counts in each bin. The number of bins is the number of dividers + 1; the first bin has no lower limit and the last bin has no upper limit. As in (a), elements that are equal to one of the dividers are counted in the bin below.
For example, suppose the sequence of values is the numbers 1,..,10 and the bin dividers are array(2, 5, 7); the histogram should be array(2, 3, 2, 3).
Here is my code
def histogram(values, dividers):
    count=0
    for element in values:
        index=0
        i=0
        count[i]=0
        while index < len(dividers) - 2:
            if element <= dividers[index]:
                i=dividers[index]
                count[i] += 1
                index=len(dividers)
            elif element > dividers[index] and element <= dividers[index+1]:
                i=dividers[index]
                count[i] += 1
                index= len(dividers)
            index += 1
    return count[i]
from bisect import bisect_left
# Using Python builtin to find where value is in dividers
# (this is O(log n) for each value)
def histogram(values, dividers):
    count = [0]*(1+len(dividers))
    for element in values:
        i = bisect_left(dividers, element)
        count[i] += 1
    return count
values = list(range(1, 11)) # list from 1 through 10
bins = [2, 5, 7]
c = histogram(values, bins) # Result [2, 3, 2, 3]
Explanation of histogram
1. bisect_left finds the index of the bin the value belongs in (the position where it would be inserted into the sorted dividers)
2. We update the count array at this index. The count array has size
(1+len(bins)), to allow for values > bins[-1]
A simple implementation would be to prepare a list of counters of size len(dividers)+1.
Go through all the numbers provided:
if the current number is bigger than the largest bin divider, increment the last bin's counter
else go through the dividers until your number is no longer bigger than the current one, and increment that bin's counter by 1
This leads to:
def histogram(values, dividers):
    bins = [0 for _ in range(len(dividers)+1)]
    for num in values:
        if num > dividers[-1]:
            bins[-1] += 1
        else:
            k = 0
            while num > dividers[k]:
                k+=1
            bins[k] += 1
    return bins
print(histogram(range(20),[2,4,9]))
Output:
# counts
[3, 2, 5, 10]
Explanation
Dividers: [2,4,9]
Bins: [ 2 and less | 4 | 9 | 10 and more ]
Numbers: 0..19
0, 1, 2 -> not bigger than 9, smaller/equal 2
3, 4 -> not bigger than 9, smaller/equal 4
5, 6, 7, 8, 9 -> not bigger than 9, smaller/equal 9
10, 11, 12, 13, 14, 15, 16, 17, 18, 19 -> bigger than 9
This is a naive implementation, and there are faster ones that use tree-like data structures. Consider dividers of [5,6,7] and a list of [7,7,7,7,7,7]: the loop would run 6 times (once per 7), testing against the dividers 3 times each (bigger than 5, bigger than 6, not bigger than 7), for 6 * 3 == 18 comparisons.
More efficient algorithms are possible with better suited data structures.

Python3 - mergeSort implementation

I'm trying to implement mergeSort on my own, but the order of the returned list is still not correct. I must be missing something (especially in the merge step). Could someone please help me find what it is?
Here is my code:
def merge(left_half, right_half):
    """
    Merge 2 sorted lists.
    :param left_half: sorted list C
    :param right_half: sorted list D
    :return: sorted list B
    """
    i = 0
    j = 0
    B = []
    for item in range(len(left_half) + len(right_half)):
        while i < len(left_half) and j < len(right_half):
            if left_half[i] <= right_half[j]:
                B.insert(item, left_half[i])
                i += 1
            else:
                B.insert(item, right_half[j])
                j += 1
    B += left_half[i:]
    B += right_half[j:]
    print("result: ", B)
    return B
def mergeSort(A):
    """
    Input: list A of n distinct integers.
    Output: list with the same integers, sorted from smallest to largest.
    :return: Output
    """
    # base case
    if len(A) < 2:
        return A
    # divide the list into two
    mid = len(A) // 2
    print(mid)
    left = A[:mid] # recursively sort first half of A
    right = A[mid:] # recursively sort second half of A
    x = mergeSort(left)
    y = mergeSort(right)
    return merge(x, y)
print(mergeSort([1, 3, 2, 4, 6, 5]))
Before the last merge I receive the two lists [1, 2, 3] and [4, 5, 6] correctly, but my final result is [3, 2, 1, 4, 5, 6].
In the first iteration of your for-loop, the inner while-loop traverses one of the lists entirely, but always inserts at index 0.
You do not want to insert an element at a given index; you always want to append it. This also makes the for-loop unnecessary.
Here is a fixed version of your code:
def merge(left_half, right_half):
    """
    Merge 2 sorted arrays.
    :param left_half: sorted array C
    :param right_half: sorted array D
    :return: sorted array B
    """
    i = 0
    j = 0
    B = []
    while i < len(left_half) and j < len(right_half):
        if left_half[i] <= right_half[j]:
            B.append(left_half[i])
            i += 1
        else:
            B.append(right_half[j])
            j += 1
    B += left_half[i:]
    B += right_half[j:]
    print("result: ", B)
    return B
merge([1, 2, 3], [4, 5, 6])
# result: [1, 2, 3, 4, 5, 6]

Remove elements that appear more often than once from numpy array

The question is, how can I remove elements that appear more often than once in an array completely. Below you see an approach that is very slow when it comes to bigger arrays.
Any idea of doing this the numpy-way? Thanks in advance.
import numpy as np
count = 0
result = []
input = np.array([[1,1], [1,1], [2,3], [4,5], [1,1]]) # array with points [x, y]
# count appearance of elements with same x and y coordinate
# append to result if element appears just once
for i in input:
    for j in input:
        if (j[0] == i [0]) and (j[1] == i[1]):
            count += 1
    if count == 1:
        result.append(i)
    count = 0
print np.array(result)
UPDATE: BECAUSE OF FORMER OVERSIMPLIFICATION
Again, to be clear: how can I remove elements that appear more than once with respect to a certain attribute from an array/list? Here: a list with elements of length 6; if the first and second entries of an element appear together in more than one element of the list, remove all of those elements from the list. Hope I'm not too confusing. Eumiro helped me a lot with this, but I can't manage to flatten the output list as it should be :(
import numpy as np
import collections
input = [[1,1,3,5,6,6],[1,1,4,4,5,6],[1,3,4,5,6,7],[3,4,6,7,7,6],[1,1,4,6,88,7],[3,3,3,3,3,3],[456,6,5,343,435,5]]
# here, from input there should be removed input[0], input[1] and input[4] because
# first and second entry appears more than once in the list, got it? :)
d = {}
for a in input:
    d.setdefault(tuple(a[:2]), []).append(a[2:])
outputDict = [list(k)+list(v) for k,v in d.iteritems() if len(v) == 1 ]
result = []
def flatten(x):
    if isinstance(x, collections.Iterable):
        return [a for i in x for a in flatten(i)]
    else:
        return [x]
# I took flatten(x) from http://stackoverflow.com/a/2158522/1132378
# And I need it, because output is a nested list :(
for i in outputDict:
    result.append(flatten(i))
print np.array(result)
So, this works, but it's impracticable with big lists.
First I got
RuntimeError: maximum recursion depth exceeded in cmp
and after applying
sys.setrecursionlimit(10000)
I got
Segmentation fault
How could I implement Eumiro's solution for big lists (> 100000 elements)?
np.array(list(set(map(tuple, input))))
returns
array([[4, 5],
[2, 3],
[1, 1]])
UPDATE 1: If you want to remove the [1, 1] too (because it appears more than once), you can do:
from collections import Counter
np.array([k for k, v in Counter(map(tuple, input)).iteritems() if v == 1])
returns
array([[4, 5],
[2, 3]])
UPDATE 2: with input=[[1,1,2], [1,1,3], [2,3,4], [4,5,5], [1,1,7]]:
input=[[1,1,2], [1,1,3], [2,3,4], [4,5,5], [1,1,7]]
d = {}
for a in input:
    d.setdefault(tuple(a[:2]), []).append(a[2])
d is now:
{(1, 1): [2, 3, 7],
(2, 3): [4],
(4, 5): [5]}
so we want to take all key-value pairs, that have single values and re-create the arrays:
np.array([k+tuple(v) for k,v in d.iteritems() if len(v) == 1])
returns:
array([[4, 5, 5],
[2, 3, 4]])
UPDATE 3: For larger arrays, you can adapt my previous solution to:
import numpy as np
input = [[1,1,3,5,6,6],[1,1,4,4,5,6],[1,3,4,5,6,7],[3,4,6,7,7,6],[1,1,4,6,88,7],[3,3,3,3,3,3],[456,6,5,343,435,5]]
d = {}
for a in input:
    d.setdefault(tuple(a[:2]), []).append(a)
np.array([v for v in d.itervalues() if len(v) == 1])
returns:
array([[[456, 6, 5, 343, 435, 5]],
[[ 1, 3, 4, 5, 6, 7]],
[[ 3, 4, 6, 7, 7, 6]],
[[ 3, 3, 3, 3, 3, 3]]])
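The snippets above are Python 2 (dict.iteritems and dict.itervalues no longer exist in Python 3). A Python 3 sketch of the same idea, counting each key once and then filtering, might look like this; the function name drop_repeated_keys and the key_len parameter are illustrative only. It returns flat rows and keeps the original row order.

```python
from collections import Counter


def drop_repeated_keys(rows, key_len=2):
    """Keep only rows whose first key_len entries occur in exactly one row."""
    # First pass: count how often each key (leading entries) appears.
    counts = Counter(tuple(row[:key_len]) for row in rows)
    # Second pass: keep rows whose key is unique; order is preserved.
    return [row for row in rows if counts[tuple(row[:key_len])] == 1]


if __name__ == "__main__":
    data = [[1, 1, 3, 5, 6, 6], [1, 1, 4, 4, 5, 6], [1, 3, 4, 5, 6, 7],
            [3, 4, 6, 7, 7, 6], [1, 1, 4, 6, 88, 7], [3, 3, 3, 3, 3, 3],
            [456, 6, 5, 343, 435, 5]]
    for row in drop_repeated_keys(data):
        print(row)
```

Both passes are linear in the number of rows and involve no recursion, so the recursion-limit problem from the flatten approach does not arise. The result can be wrapped in np.array(...) if an array is needed.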
This is a corrected, faster version of Hooked's answer. count_unique counts the number of occurrences of each unique key in keys.
import numpy as np
input = np.array([[1,1,3,5,6,6],
[1,1,4,4,5,6],
[1,3,4,5,6,7],
[3,4,6,7,7,6],
[1,1,4,6,88,7],
[3,3,3,3,3,3],
[456,6,5,343,435,5]])
def count_unique(keys):
    """Finds an index to each unique key (row) in keys and counts the number of
    occurrences for each key"""
    order = np.lexsort(keys.T)
    keys = keys[order]
    diff = np.ones(len(keys)+1, 'bool')
    diff[1:-1] = (keys[1:] != keys[:-1]).any(-1)
    count = np.where(diff)[0]
    count = count[1:] - count[:-1]
    ind = order[diff[1:]]
    return ind, count
key = input[:, :2]
ind, count = count_unique(key)
print key[ind]
#[[ 1 1]
# [ 1 3]
# [ 3 3]
# [ 3 4]
# [456 6]]
print count
#[3 1 1 1 1]
ind = ind[count == 1]
output = input[ind]
print output
#[[ 1 3 4 5 6 7]
# [ 3 3 3 3 3 3]
# [ 3 4 6 7 7 6]
# [456 6 5 343 435 5]]
Updated Solution:
From the comments below, the new solution is:
idx = argsort(A[:, 0:2], axis=0)[:,1]
kidx = where(sum(A[idx,:][:-1,0:2]!=A[idx,:][1:,0:2], axis=1)==0)[0]
kidx = unique(concatenate((kidx,kidx+1)))
for n in arange(0,A.shape[0],1):
    if n not in kidx:
        print A[idx,:][n]
> [1 3 4 5 6 7]
[3 3 3 3 3 3]
[3 4 6 7 7 6]
[456 6 5 343 435 5]
kidx is an index list of the elements you don't want. This preserves the rows whose first two elements do not match the first two elements of any other row. Since everything is done with indexing, it should be fast(ish), though it requires a sort on the first two elements. Note that the original row order is not preserved, though I don't think this is a problem.
Old Solution:
If I understand it correctly, you simply want to filter out the results of a list of lists where the first element of each inner list is equal to the second element.
With your input from your update A=[[1,1,3,5,6,6],[1,1,4,4,5,6],[1,3,4,5,6,7],[3,4,6,7,7,6],[1,1,4,6,88,7],[3,3,3,3,3,3],[456,6,5,343,435,5]], the following line removes A[0],A[1] and A[4]. A[5] is also removed since that seems to match your criteria.
[x for x in A if x[0]!=x[1]]
If you can use numpy, there is a really slick way of doing the above. Assume that A is an array, then
A[A[:,0] != A[:,1]]
will pull out the same rows. This is probably faster than the list comprehension above, since it avoids looping in Python.
Why not create another array to hold the output?
Iterate through your main list and for each i check if i is in your other array and if not append it.
This way, your new array will not contain more than one of each element
