I have a 3x3 array with numbers and zeroes. I need to take the absolute difference between the next point, ls[i+1], and the point before it, ls[i]. Here is an example of my list:
ls=[(98.6,99,0),(98.2,98.4,97.1),(97.6,0,98.3)]
The zeroes are faulty data. I need a loop that will:
1. take the absolute difference between the next number and the current number in each row,
2. make any difference greater than the max difference zero (max diff = 1.9 in this case, given that the zeroes are faulty data),
3. sum together the differences in each row so that I'm left with a list of the sums.
As it stands now, the end result will be:
result=[(0.4,99),(0.2,1.3),(97.6,98.3)]
Given that the zeroes are not good data, differences greater than 1.9 are not an accurate result.
If you're happy with setting differences over a given maximum difference value to 0, perhaps implement that logic in a 2nd step:
ls = [(98.6, 99, 0), (98.2, 98.4, 97.1), (97.6, 0, 98.3)]
unfiltered = [tuple(abs(x1 - x2) for x1, x2 in zip(tup, tup[1:]))
              for tup in ls]
max_diff = 1.9
results = [tuple((x if x < max_diff else 0) for x in tup)
           for tup in unfiltered]
If you have objects that are not native python lists/tuples but do support indexing, it might be better to do this:
ls = [(98.6, 99, 0), (98.2, 98.4, 97.1), (97.6, 0, 98.3)]
unfiltered = [tuple(abs(item[i] - item[i+1]) for i in range(len(item) - 1))
              for item in ls]
max_diff = 1.9
results = [tuple((x if x < max_diff else 0) for x in tup)
           for tup in unfiltered]
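If you also need the per-row sums the question asks for, one more step along the same lines would do it (a sketch; sums is just an illustrative name):

sums = [sum(tup) for tup in results]
print(sums)  # roughly [0.4, 1.5, 0] for the sample data, up to float noise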
Not sure why the numbers get all messed up when doing the absolute difference, probably something to do with floating point numbers...
ls = [(98.6, 99, 0), (98.2, 98.4, 97.1), (97.6, 0, 98.3)]

def abs_diff(lst, max_diff=1.9):
    n = len(lst)
    if n < 2:
        return lst
    res = []
    for i in range(n - 1):
        diff = abs(lst[i] - lst[i+1])
        if diff > max_diff:
            res.append(0)
        else:
            res.append(diff)
    return res

result = map(tuple, map(abs_diff, ls))
print result
# [(0.40000000000000568, 0), (0.20000000000000284, 1.3000000000000114), (0, 0)]
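If the trailing float noise is a problem, one option (a sketch, not part of the code above) is to round each difference to the data's precision, e.g. one decimal place:

rounded = [tuple(round(x, 1) for x in abs_diff(row)) for row in ls]
print(rounded)  # [(0.4, 0), (0.2, 1.3), (0, 0)]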
This should do you. I've broken out your awkward subtraction/clearing of bad values, but you can tail recursively move through the list, building the needed values as you go, filtering out 0s.
def awkward_subtract(a, b):
    if (a is None) or (b is None) or (a == 0) or (b == 0):
        return 0
    else:
        return abs(a - b)

def compare_lists(ls):
    head, *tail = ls
    if not tail:
        return [list(filter(int(0).__ne__, head))]
    else:
        values = [awkward_subtract(head[x], tail[0][x]) for x in range(0, len(head))]
        return [list(filter(int(0).__ne__, values))] + compare_lists(tail)
You can test it in the REPL*:
>>> ls = [[98.6,99,0],[98.2,98.4,97.1],[97.6,0,98.3]]
>>> compare_lists(ls)
[[0.3999999999999915, 0.5999999999999943], [0.6000000000000085, 1.2000000000000028], [97.6, 98.3]]
(*) I think your test is not quite right, btw.
Note that this uses embedded lists for ease, but it is dead simple to fix that:
ts = [(98.6,99,0),(98.2,98.4,97.1),(97.6,0,98.3)]
ls = [list(t) for t in ts]
I am trying to extract all subsets from a list of elements which add up to a certain value.
Example -
List = [1,3,4,5,6]
Sum - 9
Output Expected = [[3,6],[5,4]]
I have tried different approaches and I get the expected output, but on a huge list of elements it takes a significant amount of time.
Can this be optimized using Dynamic Programming or any other technique?
Approach-1
def subset(array, num):
    result = []
    def find(arr, num, path=()):
        if not arr:
            return
        if arr[0] == num:
            result.append(path + (arr[0],))
        else:
            find(arr[1:], num - arr[0], path + (arr[0],))
            find(arr[1:], num, path)
    find(array, num)
    return result

numbers = [2, 2, 1, 12, 15, 2, 3]
x = 7
subset(numbers, x)
Approach-2
def isSubsetSum(arr, subset, N, subsetSize, subsetSum, index, sum):
    global flag
    if (subsetSum == sum):
        flag = 1
        for i in range(0, subsetSize):
            print(subset[i], end=" ")
        print("")
    else:
        for i in range(index, N):
            subset[subsetSize] = arr[i]
            isSubsetSum(arr, subset, N, subsetSize + 1,
                        subsetSum + arr[i], i + 1, sum)
If you want to output all subsets you can't do better than a sluggish O(2^n) complexity, because in the worst case that will be the size of your output and time complexity is lower-bounded by output size (this is a known NP-Complete problem). But, if rather than returning a list of all subsets, you just want to return a boolean value indicating whether achieving the target sum is possible, or just one subset summing to target (if it exists), you can use dynamic programming for a pseudo-polynomial O(nK) time solution, where n is the number of elements and K is the target integer.
The DP approach involves filling in an (n+1) x (K+1) table, with the sub-problems corresponding to the entries of the table being:
DP[i][k] = subset(A[i:], k) for 0 <= i <= n, 0 <= k <= K
That is, subset(A[i:], k) asks, 'Can I sum to (little) k using the suffix of A starting at index i?' Once you fill in the whole table, the answer to the overall problem, subset(A[0:], K), will be at DP[0][K].
The base cases are for i=n: they indicate that you can't sum to anything except for 0 if you're working with the empty suffix of your array:
subset(A[n:], k>0) = False, subset(A[n:], k=0) = True
The recursive cases to fill in the table are:
subset(A[i:], k) = subset(A[i+1:], k) OR (A[i] <= k AND subset(A[i+1:], k - A[i]))
This simply relates the idea that you can use the current array suffix to sum to k either by skipping over the first element of that suffix and using the answer you already had in the previous row (when that first element wasn't in your array suffix), or by using A[i] in your sum and checking if you could make the reduced sum k-A[i] in the previous row. Of course, you can only use the new element if it doesn't itself exceed your target sum.
ex: subset(A[i:] = [3,4,1,6], k = 8)
would check: could I already sum to 8 with the previous suffix (A[i+1:] = [4,1,6])? No. Or, could I use the 3 which is now available to me to sum to 8? That is, could I sum to k = 8 - 3 = 5 with [4,1,6]? Yes. Because at least one of the conditions was true, I set DP[i][8] = True
Because all the base cases are for i=n, and the recurrence relation for subset(A[i:], k) relies on the answers to the smaller sub-problems subset(A[i+1:], ...), you start at the bottom of the table, where i = n, fill out every k value from 0 to K for each row, and work your way up to row i = 0, ensuring you have the answers to the smaller sub-problems when you need them.
def subsetSum(A: list[int], K: int) -> bool:
    N = len(A)
    DP = [[None] * (K+1) for x in range(N+1)]
    DP[N] = [True if x == 0 else False for x in range(K+1)]
    for i in range(N-1, -1, -1):
        Ai = A[i]
        DP[i] = [DP[i+1][k] or (Ai <= k and DP[i+1][k-Ai]) for k in range(0, K+1)]
    # print result
    print(f"A = {A}, K = {K}")
    print('Ai,k:', *range(0, K+1), sep='\t')
    for (i, row) in enumerate(DP):
        print(A[i] if i < N else None, *row, sep='\t')
    print(f"DP[0][K] = {DP[0][K]}")
    return DP[0][K]

subsetSum([1, 4, 3, 5, 6], 9)
If you want to return an actual possible subset alongside the bool indicating whether or not it's possible to make one, then for every True flag in your DP you should also store the k index for the previous row that got you there (it will either be the current k index or k-A[i], depending on which table lookup returned True, which will indicate whether or not A[i] was used). Then you walk backwards from DP[0][K] after the table is filled to get a subset. This makes the code messier but it's definitely do-able. You can't get all subsets this way though (at least not without increasing your time complexity again) because the DP table compresses information.
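A possible sketch of that reconstruction (illustrative only; subsetSumWithSubset is a made-up name, and instead of storing back-pointers it re-derives each choice from the filled table, which is equivalent):

def subsetSumWithSubset(A, K):
    N = len(A)
    DP = [[False] * (K + 1) for _ in range(N + 1)]
    DP[N] = [k == 0 for k in range(K + 1)]
    for i in range(N - 1, -1, -1):
        for k in range(K + 1):
            DP[i][k] = DP[i + 1][k] or (A[i] <= k and DP[i + 1][k - A[i]])
    if not DP[0][K]:
        return False, []
    subset, k = [], K
    for i in range(N):          # walk from DP[0][K] down the rows
        if DP[i + 1][k]:        # k was already reachable without A[i]: skip it
            continue
        subset.append(A[i])     # otherwise A[i] must have been used
        k -= A[i]
    return True, subset

print(subsetSumWithSubset([1, 4, 3, 5, 6], 9))  # (True, [3, 6])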
Here is the optimized solution to the problem with a complexity of O(n^2).
def get_subsets(data: list, target: int):
    # initialize final result which is a list of all subsets summing up to target
    subsets = []
    # records the difference between the target value and a group of numbers
    differences = {}
    for number in data:
        prospects = []
        # iterate through every record in differences
        for diff in differences:
            # the number complements a record in differences, i.e. a desired subset is found
            if number - diff == 0:
                new_subset = [number] + differences[diff]
                new_subset.sort()
                if new_subset not in subsets:
                    subsets.append(new_subset)
            # the number fell short to reach the target; add to prospects instead
            elif number - diff < 0:
                prospects.append((number, diff))
        # update the differences record
        for prospect in prospects:
            new_diff = target - sum(differences[prospect[1]]) - prospect[0]
            differences[new_diff] = differences[prospect[1]] + [prospect[0]]
        differences[target - number] = [number]
    return subsets
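For reference, a quick run on the example from the question returns the two expected subsets:

print(get_subsets([1, 3, 4, 5, 6], 9))  # [[4, 5], [3, 6]]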
I'm trying to design a function that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.
This code works fine yet has a high order of complexity; is there another solution that reduces the order of complexity?
Note: the 10000000 number is the range of integers in array A. I tried the sort function, but does it reduce the complexity?
def solution(A):
    for i in range(10000000):
        if A.count(i) <= 0:
            return i
The following is O(n log n):
a = [2, 1, 10, 3, 2, 15]
a.sort()
if a[0] > 1:
    print(1)
else:
    for i in range(1, len(a)):
        if a[i] > a[i - 1] + 1:
            print(a[i - 1] + 1)
            break
    else:  # no gap found: the answer is one past the largest element
        print(a[-1] + 1)
If you don't like the special handling of 1, you could just append zero to the array and have the same logic handle both cases:
a = sorted(a + [0])
for i in range(1, len(a)):
    if a[i] > a[i - 1] + 1:
        print(a[i - 1] + 1)
        break
else:  # no gap found: the answer is one past the largest element
    print(a[-1] + 1)
Caveats (both trivial to fix and both left as an exercise for the reader):
Neither version handles empty input.
The code assumes there are no negative numbers in the input.
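That said, one possible way to handle both caveats with the same sort-and-scan idea looks like this (a sketch; first_missing is just an illustrative name):

def first_missing(a):
    a = sorted(x for x in a if x > 0)  # drop non-positive values
    a = [0] + a                        # zero-prepend trick; also covers empty input
    for i in range(1, len(a)):
        if a[i] > a[i - 1] + 1:
            return a[i - 1] + 1
    return a[-1] + 1                   # no gap: one past the largest element

print(first_missing([]))               # 1
print(first_missing([-5, 1, 2, 2]))    # 3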
O(n) time and O(n) space:
def solution(A):
    count = [0] * len(A)
    for x in A:
        if 0 < x <= len(A):
            count[x-1] = 1  # count[0] is to count 1
    for i in range(len(count)):
        if count[i] == 0:
            return i+1
    return len(A)+1  # only if A = [1, 2, ..., len(A)]
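For example:

print(solution([2, 1, 10, 3, 2, 15]))  # 4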
This should be O(n). Utilizes a temporary set to speed things along.
a = [2, 1, 10, 3, 2, 15]

# use a set of only the positive numbers for lookup
temp_set = set()
for i in a:
    if i > 0:
        temp_set.add(i)

# iterate from 1 up to length of set + 1 (to ensure the edge case is handled)
for i in range(1, len(temp_set) + 2):
    if i not in temp_set:
        print(i)
        break
My proposal is a recursive function inspired by quicksort.
Each step divides the input sequence into two sublists (lt = less than pivot; ge = greater or equal than pivot) and decides, which of the sublists is to be processed in the next step. Note that there is no sorting.
The idea is that a set of integers such that lo <= n < hi contains "gaps" only if it has less than (hi - lo) elements.
The input sequence must not contain dups. A set can be passed directly.
# all cseq items > 0 assumed, no duplicates!
def find(cseq, cmin=1):
    # cmin = possible minimum not ruled out yet
    size = len(cseq)
    if size <= 1:
        return cmin+1 if cmin in cseq else cmin
    lt = []
    ge = []
    pivot = cmin + size // 2
    for n in cseq:
        (lt if n < pivot else ge).append(n)
    return find(lt, cmin) if cmin + len(lt) < pivot else find(ge, pivot)
test = set(range(1,100))
print(find(test)) # 100
test.remove(42)
print(find(test)) # 42
test.remove(1)
print(find(test)) # 1
Inspired by various solutions and comments above, about 20%-50% faster in my (simplistic) tests than the fastest of them (though I'm sure it could be made faster), and handling all the corner cases mentioned (non-positive numbers, duplicates, and empty list):
import numpy

def firstNotPresent(l):
    positive = numpy.fromiter(set(l), dtype=int)  # deduplicate
    positive = positive[positive > 0]             # only keep positive numbers
    positive.sort()
    top = positive.size + 1
    if top == 1:  # empty list
        return 1
    sequence = numpy.arange(1, top)
    try:
        # first 0-based position where the 1-based index falls below the value;
        # the missing number is that position + 1
        return numpy.where(sequence < positive)[0][0] + 1
    except IndexError:  # no numbers are missing, top is next
        return top
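For example:

print(firstNotPresent([2, 1, 10, 3, 2, 15]))  # 4
print(firstNotPresent([1, 2, 3]))             # 4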
The idea is: if you enumerate the positive, deduplicated, sorted list starting from one, the first time the index is less than the list value, the index value is missing from the list, and hence is the lowest positive number missing from the list.
This and the other solutions I tested against (those from adrtam, Paritosh Singh, and VPfB) all appear to be roughly O(n), as expected. (It is, I think, fairly obvious that this is a lower bound, since every element in the list must be examined to find the answer.) Edit: looking at this again, of course the big-O for this approach is at least O(n log(n)), because of the sort. It's just that the sort is so fast comparatively speaking that it looked linear overall.
I am looking into a problem: given an arbitrary list, in this case [9,15,1,4,2,3,6], find any two numbers that would sum to a given result (in this case 10). What would be the most efficient way to do this? My solution is O(n^2), and even though I have filtered and sorted the numbers, I am sure there is a way to do this more efficiently. Thanks in advance.
myList = [9, 15, 1, 4, 2, 3, 6]
myList.sort()
result = 10
myList = filter(lambda x: x < result, myList)
total = 0
for i in myList:
    total = total + 1
    for j in myList[total:]:
        if i + j == result:
            print i, j
            break
O(n log n) solution
Sort your list. For each number x, binary search for S - x in the list.
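A sketch of that approach using bisect (the function name two_sum_sorted is just illustrative):

from bisect import bisect_left

def two_sum_sorted(nums, S):
    a = sorted(nums)
    for i, x in enumerate(a):
        target = S - x
        j = bisect_left(a, target)  # binary search for S - x
        if j < len(a) and a[j] == target and j != i:
            return x, target
        if j + 1 < len(a) and a[j + 1] == target:  # a duplicate covers the j == i case
            return x, target
    return None

print(two_sum_sorted([9, 15, 1, 4, 2, 3, 6], 10))  # (1, 9)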
O(n) solution
For each number x, see if you have S - x in a hash table. Add x to the hash table.
Note that, if your numbers are really small, the hash table can be a simple array where h[i] = true if i exists in the hash table and false otherwise.
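A sketch of that array variant for small non-negative numbers (two_sum_small is an illustrative name; it assumes only values in the range 0 <= x <= S matter):

def two_sum_small(nums, S):
    seen = [False] * (S + 1)  # seen[i] is True once i has appeared
    for x in nums:
        if 0 <= x <= S:
            if seen[S - x]:
                return S - x, x
            seen[x] = True
    return None

print(two_sum_small([9, 15, 1, 4, 2, 3, 6], 10))  # (9, 1)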
Use a dictionary for this, and for each item in the list look for total_required - item in the dictionary. I have used collections.Counter here because a set can fail if total_required - item is equal to the current item from the list. Overall complexity is O(N):
>>> from collections import Counter
>>> def find_nums(total, seq):
...     c = Counter(seq)
...     for x in seq:
...         rem = total - x
...         if rem in c:
...             if rem == x and c[rem] > 1:
...                 return x, rem
...             elif rem != x:
...                 return x, rem
...
>>> find_nums(2, [1, 1])
(1, 1)
>>> find_nums(2, [1])
>>> find_nums(24, [9,15,1,4,2,3,6])
(9, 15)
>>> find_nums(9, [9,15,1,4,2,3,6])
(3, 6)
I think this solution would work:
list = [9, 15, 1, 4, 2, 3, 6]
result = 10
list.sort()
list = filter(lambda x: x < result, list)
myMap = {}
for i in list:
    if i in myMap:
        print myMap[i], i
        break
    myMap[result - i] = i
I am trying to make an algorithm in Python that will take a list of random numbers from 0 to 1,000,000, no more than 100 elements in length, and will even this array out as much as possible, giving me the maximum number of equal elements. This is what I have so far:
def answer(x):
    diff = max(x) - min(x)
    while diff > 1:
        x[x.index(max(x))] = x[x.index(max(x))] - (diff / 2)
        x[x.index(min(x))] = x[x.index(min(x))] + (diff / 2)
        diff = max(x) - min(x)
    return count(x)

def count(x):
    from collections import Counter
    c = Counter(x)
    return max(c.values())
This will take an array such as [0,50], create the array [25,25], and return the integer 2 because there are two equal elements in the array. I know for a fact this algorithm works in most cases; however, it doesn't in all.
Can anyone please point out any array of integers this would not yield the correct answer for? Thanks
Edit:
For those who don't want to read the while loop: the code finds the range of the entire list, splits that range in half, adds half to the min, and subtracts half from the max. It is trying to equalize the entire list while keeping the same sum.
[1,4,1] = [2,3,1] = [2,2,2] = (number of equal elements) 3
[2,1,4,9] = [2,5,4,5] = [3,4,4,5] = [4,4,4,4] = (number of equal elements) all 4
What about this?
l = [1, 2, 5, 10]

# "best" possible case
l_m = [sum(l) / len(l)] * len(l)

# see if lists fit (division can cause rounding errors)
if sum(l_m) != sum(l):
    # if they don't this means we can only have len(l) - 1 similar items
    print len(l) - 1
else:
    # if sums fit the first list can be spread like this
    print len(l)
I can imagine that you're trying to make as many elements in the array equal as possible, while keeping their sum, and keeping the elements integer.
For N elements, you can get N - 1 elements equal, and, with some luck, all N equal.
This is a bit of pseudocode for you:
average = sum(elements) / length(elements) # a float
best_approximation = trunc(average) # round() would also work
discrepancy = sum(elements) - best_approximation * length(elements)
discrepant_value = best_approximation + discrepancy
result = [discrepant_value] + the rest of list having best_approximation value
By construction, you get length(elements) - 1 of equal values and one discrepant_value.
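A runnable version of that pseudocode might look like this (a sketch; equalize_keep_sum is an illustrative name, and integer inputs are assumed):

def equalize_keep_sum(elements):
    average = sum(elements) / len(elements)      # a float
    best_approximation = int(average)            # trunc(average)
    discrepancy = sum(elements) - best_approximation * len(elements)
    discrepant_value = best_approximation + discrepancy
    # one discrepant value plus N - 1 copies of the best approximation
    return [discrepant_value] + [best_approximation] * (len(elements) - 1)

print(equalize_keep_sum([2, 1, 4, 9]))   # [4, 4, 4, 4]
print(equalize_keep_sum([1, 2, 5, 10]))  # [6, 4, 4, 4] -> 3 equal elements, same sum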
What you're really doing is normalizing your input to an integer average and distributing the remainder among the result.
L = [1, 2, 3, 4, 5, 7]

# Calc the integer average
avg = sum(L) / len(L)
# Find the remainder
mod = sum(L) % len(L)

# Create a new list length of original
# populating it first with the average
L2 = [avg] * len(L)
# Then add 1 to each element for as many
# as the remainder
for n in range(mod):
    L2[n] += 1

def count(x):
    from collections import Counter
    c = Counter(x)
    return max(c.values())

count(L2)  # returns 4
You don't need to morph the original list or create a new one (negating the need for your import):
L = [1, 2, 3, 4, 5, 7]
# Don't even need to figure the average if you
# find the remainder of the sum of your list
# divided by the length of your list
mod = sum(L) % len(L)
result = mod if mod >= len(L) / 2 else len(L) - mod
print result  # 4
This is the final solution I have come to.
After it minimizes the entire array's range to no greater than 1, it checks whether the number of equal numbers in the array is the same as the length; this means the array looks something like [4,4,4,4], and in that case it spits out the number of equal numbers immediately (4). If the count of the majority value in the list is less than the length, then it equalizes the list. So if the list is something like [4,4,3,3], it is more optimal if it can be turned into [4,4,4,2]. This is what the equalize function does.
def answer(x):
    diff = max(x) - min(x)
    while diff > 1:
        x[x.index(max(x))] = x[x.index(max(x))] - (diff / 2)
        x[x.index(min(x))] = x[x.index(min(x))] + (diff / 2)
        diff = max(x) - min(x)
    print(x)
    if count(x) == len(x):
        return count(x)
    return equalize(x)

def equalize(x):
    from collections import Counter
    eq = Counter(x)
    eq = min(eq.values())
    operations = eq - 1
    for i in range(0, operations):
        x[x.index(min(x))] = x[x.index(min(x))] + 1
    return count(x)

def count(x):
    from collections import Counter
    c = Counter(x)
    return max(c.values())
http://repl.it/6bA/1
I had a quick search but couldn't find anything that helped my problem.
I'm trying to make a program that takes each run of 5 consecutive digits, computes their product, and if that product is the largest found thus far, records it as such.
My code is:
string = str(integer)
x = 0
largest = 0
stringlength = len(string)
while x < stringlength:
    a = int(string[x])
    b = int(string[x+1])
    c = int(string[x+2])
    d = int(string[x+3])
    e = int(string[x+4])
    if (a*b*c*d*e > largest):
        largest = a*b*c*d*e
        print(largest)
    x += 1
print(largest)
I excluded the integer value itself, but for reference it is 1000 digits long. Whenever I try to run this code I get "IndexError: string index out of range". Can anyone help?
string = str(integer)
x = 0
largest = 0
stringlength = len(string)
while x < stringlength - 4:  # stop 4 early so string[x+4] stays in range
    a = int(string[x])
    b = int(string[x+1])
    c = int(string[x+2])
    d = int(string[x+3])
    e = int(string[x+4])
    if (a*b*c*d*e > largest):
        largest = a*b*c*d*e
        print(largest)
    x += 1
print(largest)
This is a classic off-by-one error (or, in this case, off-by-4 error).
When x reaches stringlength-4, x+4 is stringlength, which is past the end of string. So, you need x < stringlength-4, not x < stringlength.
But you might want to consider rewriting your code to use higher-level abstractions, to make these problems harder to run into and easier to think about.
First, instead of this:
x = 0
while x < stringlength:
    # ...
    x += 1
Just do this:
for x in range(stringlength):
You could then solve your problem with this:
for x in range(stringlength-4):
But let's take it farther.
If you slice the string, you won't get an IndexError:
for x in range(len(string)):
    a, b, c, d, e = map(int, string[x:x+5])
However, now you'll get a ValueError in the unpacking. But really, you have no need to unpack into 5 separate variables here. Just keep the sequence and multiply it out. (You can do that with a loop, but in my opinion, this is one of the few cases reduce is the most readable way to write something in Python.)
import operator
from functools import reduce  # reduce lives in functools in Python 3

for x in range(len(string)):
    values = map(int, string[x:x+5])
    prod = reduce(operator.mul, values)
    if prod > largest:
        largest = prod
        print(largest)
Now there are no more errors—but that's because you're multiplying together the last 4, 3, 2, and 1 numbers. And that's exactly the problem: you never decided what should happen there.
So, now, you can make the decision explicit. Do you want to count them as batches, or skip them?
If you want to push even further forward, you can write sliding-window grouper functions using itertools, one version that acts like zip (stopping when the right edge of the window goes off the end of the list), one that acts like zip_longest (stopping only when the left edge of the window goes off):
import itertools

def groupwise(iterable, n):
    groups = itertools.tee(iterable, n)
    for i, group in enumerate(groups):
        next(itertools.islice(group, i, i), None)  # advance the i-th copy by i
    return zip(*groups)

def groupwise_longest(iterable, n, fillvalue=None):
    groups = itertools.tee(iterable, n)
    for i, group in enumerate(groups):
        next(itertools.islice(group, i, i), None)  # advance the i-th copy by i
    return itertools.zip_longest(*groups, fillvalue=fillvalue)
Now, you can just do this:
for group_of_five in groupwise_longest(string, 5, 1):
    values = map(int, group_of_five)
    prod = reduce(operator.mul, values)
    if prod > largest:
        largest = prod
        print(largest)
Then, if you decide you'd rather not compare the incomplete groups at the end, just change the first line to:
for group_of_five in groupwise(string, 5):
Then you can move all the work outside the for loop:
groups = groupwise_longest(string, 5, 1)
intgroups = (map(int, group) for group in groups)
prods = (reduce(operator.mul, group) for group in intgroups)
And now that we have a sequence of products, it should be obvious that to find the highest one, that's just:
print(max(prods))
For example:
>>> string = '12345678987654321'
>>> groups = groupwise(string, 5)
>>> intgroups = (map(int, group) for group in groups)
>>> prods = (reduce(operator.mul, group) for group in intgroups)
>>> max(prods)
28224
And notice that there's nowhere you could make an off-by-one error, or any other "small" error. Of course you could still get something completely wrong, or just have no idea how to write it, but at least your errors will be obvious big errors, which are easier to debug.