When we sort a list, like
a = [1,2,3,3,2,2,1]
sorted(a) => [1, 1, 2, 2, 2, 3, 3]
equal elements are always adjacent in the resulting list.
How can I achieve the opposite task - shuffle the list so that equal elements are never (or as seldom as possible) adjacent?
For example, for the above list one of the possible solutions is
p = [1,3,2,3,2,1,2]
More formally, given a list a, generate a permutation p of it that minimizes the number of pairs p[i]==p[i+1].
Since the lists are large, generating and filtering all permutations is not an option.
Bonus question: how to generate all such permutations efficiently?
This is the code I'm using to test the solutions: https://gist.github.com/gebrkn/9f550094b3d24a35aebd
UPD: Choosing a winner here was a tough choice, because many people posted excellent answers. #VincentvanderWeele, #David Eisenstat, #Coady, #enrico.bacis and #srgerg provided functions that generate the best possible permutation flawlessly. #tobias_k and David also answered the bonus question (generate all permutations). Additional points to David for the correctness proof.
The code from #VincentvanderWeele appears to be the fastest.
This is along the lines of Thijser's currently incomplete pseudocode. The idea is to take the most frequent of the remaining item types unless it was just taken. (See also Coady's implementation of this algorithm.)
import collections
import heapq
class Sentinel:
pass
def david_eisenstat(lst):
counts = collections.Counter(lst)
heap = [(-count, key) for key, count in counts.items()]
heapq.heapify(heap)
output = []
last = Sentinel()
while heap:
minuscount1, key1 = heapq.heappop(heap)
if key1 != last or not heap:
last = key1
minuscount1 += 1
else:
minuscount2, key2 = heapq.heappop(heap)
last = key2
minuscount2 += 1
if minuscount2 != 0:
heapq.heappush(heap, (minuscount2, key2))
output.append(last)
if minuscount1 != 0:
heapq.heappush(heap, (minuscount1, key1))
return output
Proof of correctness
For two item types, with counts k1 and k2, the optimal solution has k2 - k1 - 1 defects if k1 < k2, 0 defects if k1 = k2, and k1 - k2 - 1 defects if k1 > k2. The = case is obvious. The others are symmetric; each instance of the minority element prevents at most two defects out of a total of k1 + k2 - 1 possible.
This greedy algorithm returns optimal solutions, by the following logic. We call a prefix (partial solution) safe if it extends to an optimal solution. Clearly the empty prefix is safe, and if a safe prefix is a whole solution then that solution is optimal. It suffices to show inductively that each greedy step maintains safety.
The only way that a greedy step introduces a defect is if only one item type remains, in which case there is only one way to continue, and that way is safe. Otherwise, let P be the (safe) prefix just before the step under consideration, let P' be the prefix just after, and let S be an optimal solution extending P. If S extends P' also, then we're done. Otherwise, let P' = Px and S = PQ and Q = yQ', where x and y are items and Q and Q' are sequences.
Suppose first that P does not end with y. By the algorithm's choice, x is at least as frequent in Q as y. Consider the maximal substrings of Q containing only x and y. If the first substring has at least as many x's as y's, then it can be rewritten without introducing additional defects to begin with x. If the first substring has more y's than x's, then some other substring has more x's than y's, and we can rewrite these substrings without additional defects so that x goes first. In both cases, we find an optimal solution T that extends P', as needed.
Suppose now that P does end with y. Modify Q by moving the first occurrence of x to the front. In doing so, we introduce at most one defect (where x used to be) and eliminate one defect (the yy).
Generating all solutions
This is tobias_k's answer plus efficient tests to detect when the choice currently under consideration is globally constrained in some way. The asymptotic running time is optimal, since the overhead of generation is on the order of the length of the output. The worst-case delay unfortunately is quadratic; it could be reduced to linear (optimal) with better data structures.
from collections import Counter
from itertools import permutations
from operator import itemgetter
from random import randrange
def get_mode(count):
return max(count.items(), key=itemgetter(1))[0]
def enum2(prefix, x, count, total, mode):
prefix.append(x)
count_x = count[x]
if count_x == 1:
del count[x]
else:
count[x] = count_x - 1
yield from enum1(prefix, count, total - 1, mode)
count[x] = count_x
del prefix[-1]
def enum1(prefix, count, total, mode):
if total == 0:
yield tuple(prefix)
return
if count[mode] * 2 - 1 >= total and [mode] != prefix[-1:]:
yield from enum2(prefix, mode, count, total, mode)
else:
defect_okay = not prefix or count[prefix[-1]] * 2 > total
mode = get_mode(count)
for x in list(count.keys()):
if defect_okay or [x] != prefix[-1:]:
yield from enum2(prefix, x, count, total, mode)
def enum(seq):
count = Counter(seq)
if count:
yield from enum1([], count, sum(count.values()), get_mode(count))
else:
yield ()
def defects(lst):
return sum(lst[i - 1] == lst[i] for i in range(1, len(lst)))
def test(lst):
perms = set(permutations(lst))
opt = min(map(defects, perms))
slow = {perm for perm in perms if defects(perm) == opt}
fast = set(enum(lst))
print(lst, fast, slow)
assert slow == fast
for r in range(10000):
test([randrange(3) for i in range(randrange(6))])
Pseudocode:
Sort the list
Loop over the first half of the sorted list and fill all even indices of the result list
Loop over the second half of the sorted list and fill all odd indices of the result list
You will only have p[i]==p[i+1] if more than half of the input consists of the same element, in which case there is no other choice than putting the same element in consecutive spots (by the pidgeon hole principle).
As pointed out in the comments, this approach may have one conflict too many in case one of the elements occurs at least n/2 times (or n/2+1 for odd n; this generalizes to (n+1)/2) for both even and odd). There are at most two such elements and if there are two, the algorithm works just fine. The only problematic case is when there is one element that occurs at least half of the time. We can simply solve this problem by finding the element and dealing with it first.
I don't know enough about python to write this properly, so I took the liberty to copy the OP's implementation of a previous version from github:
# Sort the list
a = sorted(lst)
# Put the element occurring more than half of the times in front (if needed)
n = len(a)
m = (n + 1) // 2
for i in range(n - m + 1):
if a[i] == a[i + m - 1]:
a = a[i:] + a[:i]
break
result = [None] * n
# Loop over the first half of the sorted list and fill all even indices of the result list
for i, elt in enumerate(a[:m]):
result[2*i] = elt
# Loop over the second half of the sorted list and fill all odd indices of the result list
for i, elt in enumerate(a[m:]):
result[2*i+1] = elt
return result
The algorithm already given of taking the most common item left that isn't the previous item is correct. Here's a simple implementation, which optimally uses a heap to track the most common.
import collections, heapq
def nonadjacent(keys):
heap = [(-count, key) for key, count in collections.Counter(a).items()]
heapq.heapify(heap)
count, key = 0, None
while heap:
count, key = heapq.heapreplace(heap, (count, key)) if count else heapq.heappop(heap)
yield key
count += 1
for index in xrange(-count):
yield key
>>> a = [1,2,3,3,2,2,1]
>>> list(nonadjacent(a))
[2, 1, 2, 3, 1, 2, 3]
You can generate all the 'perfectly unsorted' permutations (that have no two equal elements in adjacent positions) using a recursive backtracking algorithm. In fact, the only difference to generating all the permutations is that you keep track of the last number and exclude some solutions accordingly:
def unsort(lst, last=None):
if lst:
for i, e in enumerate(lst):
if e != last:
for perm in unsort(lst[:i] + lst[i+1:], e):
yield [e] + perm
else:
yield []
Note that in this form the function is not very efficient, as it creates lots of sub-lists. Also, we can speed it up by looking at the most-constrained numbers first (those with the highest count). Here's a much more efficient version using only the counts of the numbers.
def unsort_generator(lst, sort=False):
counts = collections.Counter(lst)
def unsort_inner(remaining, last=None):
if remaining > 0:
# most-constrained first, or sorted for pretty-printing?
items = sorted(counts.items()) if sort else counts.most_common()
for n, c in items:
if n != last and c > 0:
counts[n] -= 1 # update counts
for perm in unsort_inner(remaining - 1, n):
yield [n] + perm
counts[n] += 1 # revert counts
else:
yield []
return unsort_inner(len(lst))
You can use this to generate just the next perfect permutation, or a list holding all of them. But note, that if there is no perfectly unsorted permutation, then this generator will consequently yield no results.
>>> lst = [1,2,3,3,2,2,1]
>>> next(unsort_generator(lst))
[2, 1, 2, 3, 1, 2, 3]
>>> list(unsort_generator(lst, sort=True))
[[1, 2, 1, 2, 3, 2, 3],
... 36 more ...
[3, 2, 3, 2, 1, 2, 1]]
>>> next(unsort_generator([1,1,1]))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
To circumvent this problem, you could use this together with one of the algorithms proposed in the other answers as a fallback. This will guarantee to return a perfectly unsorted permutation, if there is one, or a good approximation otherwise.
def unsort_safe(lst):
try:
return next(unsort_generator(lst))
except StopIteration:
return unsort_fallback(lst)
In python you could do the following.
Consider you have a sorted list l, you can do:
length = len(l)
odd_ind = length%2
odd_half = (length - odd_ind)/2
for i in range(odd_half)[::2]:
my_list[i], my_list[odd_half+odd_ind+i] = my_list[odd_half+odd_ind+i], my_list[i]
These are just in place operations and should thus be rather fast (O(N)).
Note that you will shift from l[i] == l[i+1] to l[i] == l[i+2] so the order you end up with is anything but random, but from how I understand the question it is not randomness you are looking for.
The idea is to split the sorted list in the middle then exchange every other element in the two parts.
For l= [1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5] this leads to l = [3, 1, 4, 2, 5, 1, 3, 1, 4, 2, 5]
The method fails to get rid of all the l[i] == l[i + 1] as soon as the abundance of one element is bigger than or equal to half of the length of the list.
While the above works fine as long as the abundance of the most frequent element is smaller than half the size of the list, the following function also handles the limit cases (the famous off-by-one issue) where every other element starting with the first one must be the most abundant one:
def no_adjacent(my_list):
my_list.sort()
length = len(my_list)
odd_ind = length%2
odd_half = (length - odd_ind)/2
for i in range(odd_half)[::2]:
my_list[i], my_list[odd_half+odd_ind+i] = my_list[odd_half+odd_ind+i], my_list[i]
#this is just for the limit case where the abundance of the most frequent is half of the list length
if max([my_list.count(val) for val in set(my_list)]) + 1 - odd_ind > odd_half:
max_val = my_list[0]
max_count = my_list.count(max_val)
for val in set(my_list):
if my_list.count(val) > max_count:
max_val = val
max_count = my_list.count(max_val)
while max_val in my_list:
my_list.remove(max_val)
out = [max_val]
max_count -= 1
for val in my_list:
out.append(val)
if max_count:
out.append(max_val)
max_count -= 1
if max_count:
print 'this is not working'
return my_list
#raise Exception('not possible')
return out
else:
return my_list
Here is a good algorithm:
First of all count for all numbers how often they occur. Place the answer in a map.
sort this map so that the numbers that occur most often come first.
The first number of your answer is the first number in the sorted map.
Resort the map with the first now being one smaller.
If you want to improve efficiency look for ways to increase the efficiency of the sorting step.
In answer to the bonus question: this is an algorithm which finds all permutations of a set where no adjacent elements can be identical. I believe this to be the most efficient algorithm conceptually (although others may be faster in practice because they translate into simpler code). It doesn't use brute force, it only generates unique permutations, and paths not leading to solutions are cut off at the earliest point.
I will use the term "abundant element" for an element in a set which occurs more often than all other elements combined, and the term "abundance" for the number of abundant elements minus the number of other elements.
e.g. the set abac has no abundant element, the sets abaca and aabcaa have a as the abundant element, and abundance 1 and 2 respectively.
Start with a set like:
aaabbcd
Seperate the first occurances from the repeats:
firsts: abcd
repeats: aab
Find the abundant element in the repeats, if any, and calculate the abundance:
abundant element: a
abundance: 1
Generate all permutations of the firsts where the number of elements after the abundant element is not less than the abundance: (so in the example the "a" cannot be last)
abcd, abdc, acbd, acdb, adbc, adcb, bacd, badc, bcad, bcda, bdac, bdca,
cabd, cadb, cbad, cbda, cdab, cdba, dabc, dacb, abac, dbca, dcab, dcba
For each permutation, insert the set of repeated characters one by one, following these rules:
5.1. If the abundance of the set is greater than the number of elements after the last occurance of the abundant element in the permutation so far, skip to the next permutation.
e.g. when permutation so far is abc, a set with abundant element a can only be inserted if the abundance is 2 or less, so aaaabc is ok, aaaaabc isn't.
5.2. Select the element from the set whose last occurance in the permutation comes first.
e.g. when permutation so far is abcba and set is ab, select b
5.3. Insert the selected element at least 2 positions to the right of its last occurance in the permutation.
e.g. when inserting b into permutation babca, results are babcba and babcab
5.4. Recurse step 5 with each resulting permutation and the rest of the set.
EXAMPLE:
set = abcaba
firsts = abc
repeats = aab
perm3 set select perm4 set select perm5 set select perm6
abc aab a abac ab b ababc a a ababac
ababca
abacb a a abacab
abacba
abca ab b abcba a -
abcab a a abcaba
acb aab a acab ab a acaba b b acabab
acba ab b acbab a a acbaba
bac aab b babc aa a babac a a babaca
babca a -
bacb aa a bacab a a bacaba
bacba a -
bca aab -
cab aab a caba ab b cabab a a cababa
cba aab -
This algorithm generates unique permutations. If you want to know the total number of permutations (where aba is counted twice because you can switch the a's), multiply the number of unique permutations with a factor:
F = N1! * N2! * ... * Nn!
where N is the number of occurances of each element in the set. For a set abcdabcaba this would be 4! * 3! * 2! * 1! or 288, which demonstrates how inefficient an algorithm is that generates all permutations instead of only the unique ones. To list all permutations in this case, just list the unique permutations 288 times :-)
Below is a (rather clumsy) implementation in Javascript; I suspect that a language like Python may be better suited for this sort of thing. Run the code snippet to calculate the seperated permutations of "abracadabra".
// FIND ALL PERMUTATONS OF A SET WHERE NO ADJACENT ELEMENTS ARE IDENTICAL
function seperatedPermutations(set) {
var unique = 0, factor = 1, firsts = [], repeats = [], abund;
seperateRepeats(set);
abund = abundance(repeats);
permutateFirsts([], firsts);
alert("Permutations of [" + set + "]\ntotal: " + (unique * factor) + ", unique: " + unique);
// SEPERATE REPEATED CHARACTERS AND CALCULATE TOTAL/UNIQUE RATIO
function seperateRepeats(set) {
for (var i = 0; i < set.length; i++) {
var first, elem = set[i];
if (firsts.indexOf(elem) == -1) firsts.push(elem)
else if ((first = repeats.indexOf(elem)) == -1) {
repeats.push(elem);
factor *= 2;
} else {
repeats.splice(first, 0, elem);
factor *= repeats.lastIndexOf(elem) - first + 2;
}
}
}
// FIND ALL PERMUTATIONS OF THE FIRSTS USING RECURSION
function permutateFirsts(perm, set) {
if (set.length > 0) {
for (var i = 0; i < set.length; i++) {
var s = set.slice();
var e = s.splice(i, 1);
if (e[0] == abund.elem && s.length < abund.num) continue;
permutateFirsts(perm.concat(e), s, abund);
}
}
else if (repeats.length > 0) {
insertRepeats(perm, repeats);
}
else {
document.write(perm + "<BR>");
++unique;
}
}
// INSERT REPEATS INTO THE PERMUTATIONS USING RECURSION
function insertRepeats(perm, set) {
var abund = abundance(set);
if (perm.length - perm.lastIndexOf(abund.elem) > abund.num) {
var sel = selectElement(perm, set);
var s = set.slice();
var elem = s.splice(sel, 1)[0];
for (var i = perm.lastIndexOf(elem) + 2; i <= perm.length; i++) {
var p = perm.slice();
p.splice(i, 0, elem);
if (set.length == 1) {
document.write(p + "<BR>");
++unique;
} else {
insertRepeats(p, s);
}
}
}
}
// SELECT THE ELEMENT FROM THE SET WHOSE LAST OCCURANCE IN THE PERMUTATION COMES FIRST
function selectElement(perm, set) {
var sel, pos, min = perm.length;
for (var i = 0; i < set.length; i++) {
pos = perm.lastIndexOf(set[i]);
if (pos < min) {
min = pos;
sel = i;
}
}
return(sel);
}
// FIND ABUNDANT ELEMENT AND ABUNDANCE NUMBER
function abundance(set) {
if (set.length == 0) return ({elem: null, num: 0});
var elem = set[0], max = 1, num = 1;
for (var i = 1; i < set.length; i++) {
if (set[i] != set[i - 1]) num = 1
else if (++num > max) {
max = num;
elem = set[i];
}
}
return ({elem: elem, num: 2 * max - set.length});
}
}
seperatedPermutations(["a","b","r","a","c","a","d","a","b","r","a"]);
The idea is to sort the elements from the most common to the least common, take the most common, decrease its count and put it back in the list keeping the descending order (but avoiding putting the last used element first to prevent repetitions when possible).
This can be implemented using Counter and bisect:
from collections import Counter
from bisect import bisect
def unsorted(lst):
# use elements (-count, item) so bisect will put biggest counts first
items = [(-count, item) for item, count in Counter(lst).most_common()]
result = []
while items:
count, item = items.pop(0)
result.append(item)
if count != -1:
element = (count + 1, item)
index = bisect(items, element)
# prevent insertion in position 0 if there are other items
items.insert(index or (1 if items else 0), element)
return result
Example
>>> print unsorted([1, 1, 1, 2, 3, 3, 2, 2, 1])
[1, 2, 1, 2, 1, 3, 1, 2, 3]
>>> print unsorted([1, 2, 3, 2, 3, 2, 2])
[2, 3, 2, 1, 2, 3, 2]
Sort the list.
Generate a "best shuffle" of the list using this algorithm
It will give the minimum of items from the list in their original place (by item value) so it will try, for your example, to put the 1's, 2's and 3's away from their sorted positions.
Start with the sorted list of length n.
Let m=n/2.
Take the values at 0, then m, then 1, then m+1, then 2, then m+2, and so on.
Unless you have more than half of the numbers the same, you'll never get equivalent values in consecutive order.
Please forgive my "me too" style answer, but couldn't Coady's answer be simplified to this?
from collections import Counter
from heapq import heapify, heappop, heapreplace
from itertools import repeat
def srgerg(data):
heap = [(-freq+1, value) for value, freq in Counter(data).items()]
heapify(heap)
freq = 0
while heap:
freq, val = heapreplace(heap, (freq+1, val)) if freq else heappop(heap)
yield val
yield from repeat(val, -freq)
Edit: Here's a python 2 version that returns a list:
def srgergpy2(data):
heap = [(-freq+1, value) for value, freq in Counter(data).items()]
heapify(heap)
freq = 0
result = list()
while heap:
freq, val = heapreplace(heap, (freq+1, val)) if freq else heappop(heap)
result.append(val)
result.extend(repeat(val, -freq))
return result
Count number of times each value appears
Select values in order from most frequent to least frequent
Add selected value to final output, incrementing the index by 2 each time
Reset index to 1 if index out of bounds
from heapq import heapify, heappop
def distribute(values):
counts = defaultdict(int)
for value in values:
counts[value] += 1
counts = [(-count, key) for key, count in counts.iteritems()]
heapify(counts)
index = 0
length = len(values)
distributed = [None] * length
while counts:
count, value = heappop(counts)
for _ in xrange(-count):
distributed[index] = value
index = index + 2 if index + 2 < length else 1
return distributed
Related
I am trying to extract all subsets from a list of elements which add up to a certain value.
Example -
List = [1,3,4,5,6]
Sum - 9
Output Expected = [[3,6],[5,4]]
Have tried different approaches and getting the expected output but on a huge list of elements it is taking a significant amount of time.
Can this be optimized using Dynamic Programming or any other technique.
Approach-1
def subset(array, num):
result = []
def find(arr, num, path=()):
if not arr:
return
if arr[0] == num:
result.append(path + (arr[0],))
else:
find(arr[1:], num - arr[0], path + (arr[0],))
find(arr[1:], num, path)
find(array, num)
return result
numbers = [2, 2, 1, 12, 15, 2, 3]
x = 7
subset(numbers,x)
Approach-2
def isSubsetSum(arr, subset, N, subsetSize, subsetSum, index , sum):
global flag
if (subsetSum == sum):
flag = 1
for i in range(0, subsetSize):
print(subset[i], end = " ")
print("")
else:
for i in range(index, N):
subset[subsetSize] = arr[i]
isSubsetSum(arr, subset, N, subsetSize + 1,
subsetSum + arr[i], i + 1, sum)
If you want to output all subsets you can't do better than a sluggish O(2^n) complexity, because in the worst case that will be the size of your output and time complexity is lower-bounded by output size (this is a known NP-Complete problem). But, if rather than returning a list of all subsets, you just want to return a boolean value indicating whether achieving the target sum is possible, or just one subset summing to target (if it exists), you can use dynamic programming for a pseudo-polynomial O(nK) time solution, where n is the number of elements and K is the target integer.
The DP approach involves filling in an (n+1) x (K+1) table, with the sub-problems corresponding to the entries of the table being:
DP[i][k] = subset(A[i:], k) for 0 <= i <= n, 0 <= k <= K
That is, subset(A[i:], k) asks, 'Can I sum to (little) k using the suffix of A starting at index i?' Once you fill in the whole table, the answer to the overall problem, subset(A[0:], K) will be at DP[0][K]
The base cases are for i=n: they indicate that you can't sum to anything except for 0 if you're working with the empty suffix of your array
subset(A[n:], k>0) = False, subset(A[n:], k=0) = True
The recursive cases to fill in the table are:
subset(A[i:], k) = subset(A[i+1:, k) OR (A[i] <= k AND subset(A[i+i:], k-A[i]))
This simply relates the idea that you can use the current array suffix to sum to k either by skipping over the first element of that suffix and using the answer you already had in the previous row (when that first element wasn't in your array suffix), or by using A[i] in your sum and checking if you could make the reduced sum k-A[i] in the previous row. Of course, you can only use the new element if it doesn't itself exceed your target sum.
ex: subset(A[i:] = [3,4,1,6], k = 8)
would check: could I already sum to 8 with the previous suffix (A[i+1:] = [4,1,6])? No. Or, could I use the 3 which is now available to me to sum to 8? That is, could I sum to k = 8 - 3 = 5 with [4,1,6]? Yes. Because at least one of the conditions was true, I set DP[i][8] = True
Because all the base cases are for i=n, and the recurrence relation for subset(A[i:], k) relies on the answers to the smaller sub-problems subset(A[i+i:],...), you start at the bottom of the table, where i = n, fill out every k value from 0 to K for each row, and work your way up to row i = 0, ensuring you have the answers to the smaller sub-problems when you need them.
def subsetSum(A: list[int], K: int) -> bool:
N = len(A)
DP = [[None] * (K+1) for x in range(N+1)]
DP[N] = [True if x == 0 else False for x in range(K+1)]
for i in range(N-1, -1, -1):
Ai = A[i]
DP[i] = [DP[i+1][k] or (Ai <=k and DP[i+1][k-Ai]) for k in range(0, K+1)]
# print result
print(f"A = {A}, K = {K}")
print('Ai,k:', *range(0,K+1), sep='\t')
for (i, row) in enumerate(DP): print(A[i] if i < N else None, *row, sep='\t')
print(f"DP[0][K] = {DP[0][K]}")
return DP[0][K]
subsetSum([1,4,3,5,6], 9)
If you want to return an actual possible subset alongside the bool indicating whether or not it's possible to make one, then for every True flag in your DP you should also store the k index for the previous row that got you there (it will either be the current k index or k-A[i], depending on which table lookup returned True, which will indicate whether or not A[i] was used). Then you walk backwards from DP[0][K] after the table is filled to get a subset. This makes the code messier but it's definitely do-able. You can't get all subsets this way though (at least not without increasing your time complexity again) because the DP table compresses information.
Here is the optimized solution to the problem with a complexity of O(n^2).
def get_subsets(data: list, target: int):
# initialize final result which is a list of all subsets summing up to target
subsets = []
# records the difference between the target value and a group of numbers
differences = {}
for number in data:
prospects = []
# iterate through every record in differences
for diff in differences:
# the number complements a record in differences, i.e. a desired subset is found
if number - diff == 0:
new_subset = [number] + differences[diff]
new_subset.sort()
if new_subset not in subsets:
subsets.append(new_subset)
# the number fell short to reach the target; add to prospect instead
elif number - diff < 0:
prospects.append((number, diff))
# update the differences record
for prospect in prospects:
new_diff = target - sum(differences[prospect[1]]) - prospect[0]
differences[new_diff] = differences[prospect[1]] + [prospect[0]]
differences[target - number] = [number]
return subsets
I'm tryin to design a function that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.
This code works fine yet has a high order of complexity, is there another solution that reduces the order of complexity?
Note: The 10000000 number is the range of integers in array A, I tried the sort function but does it reduces the complexity?
def solution(A):
for i in range(10000000):
if(A.count(i)) <= 0:
return(i)
The following is O(n logn):
a = [2, 1, 10, 3, 2, 15]
a.sort()
if a[0] > 1:
print(1)
else:
for i in range(1, len(a)):
if a[i] > a[i - 1] + 1:
print(a[i - 1] + 1)
break
If you don't like the special handling of 1, you could just append zero to the array and have the same logic handle both cases:
a = sorted(a + [0])
for i in range(1, len(a)):
if a[i] > a[i - 1] + 1:
print(a[i - 1] + 1)
break
Caveats (both trivial to fix and both left as an exercise for the reader):
Neither version handles empty input.
The code assumes there no negative numbers in the input.
O(n) time and O(n) space:
def solution(A):
count = [0] * len(A)
for x in A:
if 0 < x <= len(A):
count[x-1] = 1 # count[0] is to count 1
for i in range(len(count)):
if count[i] == 0:
return i+1
return len(A)+1 # only if A = [1, 2, ..., len(A)]
This should be O(n). Utilizes a temporary set to speed things along.
a = [2, 1, 10, 3, 2, 15]
#use a set of only the positive numbers for lookup
temp_set = set()
for i in a:
if i > 0:
temp_set.add(i)
#iterate from 1 upto length of set +1 (to ensure edge case is handled)
for i in range(1, len(temp_set) + 2):
if i not in temp_set:
print(i)
break
My proposal is a recursive function inspired by quicksort.
Each step divides the input sequence into two sublists (lt = less than pivot; ge = greater or equal than pivot) and decides, which of the sublists is to be processed in the next step. Note that there is no sorting.
The idea is that a set of integers such that lo <= n < hi contains "gaps" only if it has less than (hi - lo) elements.
The input sequence must not contain dups. A set can be passed directly.
# all cseq items > 0 assumed, no duplicates!
def find(cseq, cmin=1):
# cmin = possible minimum not ruled out yet
size = len(cseq)
if size <= 1:
return cmin+1 if cmin in cseq else cmin
lt = []
ge = []
pivot = cmin + size // 2
for n in cseq:
(lt if n < pivot else ge).append(n)
return find(lt, cmin) if cmin + len(lt) < pivot else find(ge, pivot)
test = set(range(1,100))
print(find(test)) # 100
test.remove(42)
print(find(test)) # 42
test.remove(1)
print(find(test)) # 1
Inspired by various solutions and comments above, about 20%-50% faster in my (simplistic) tests than the fastest of them (though I'm sure it could be made faster), and handling all the corner cases mentioned (non-positive numbers, duplicates, and empty list):
import numpy
def firstNotPresent(l):
positive = numpy.fromiter(set(l), dtype=int) # deduplicate
positive = positive[positive > 0] # only keep positive numbers
positive.sort()
top = positive.size + 1
if top == 1: # empty list
return 1
sequence = numpy.arange(1, top)
try:
return numpy.where(sequence < positive)[0][0]
except IndexError: # no numbers are missing, top is next
return top
The idea is: if you enumerate the positive, deduplicated, sorted list starting from one, the first time the index is less than the list value, the index value is missing from the list, and hence is the lowest positive number missing from the list.
This and the other solutions I tested against (those from adrtam, Paritosh Singh, and VPfB) all appear to be roughly O(n), as expected. (It is, I think, fairly obvious that this is a lower bound, since every element in the list must be examined to find the answer.) Edit: looking at this again, of course the big-O for this approach is at least O(n log(n)), because of the sort. It's just that the sort is so fast comparitively speaking that it looked linear overall.
Hello guys here is the problem. I have something like this in input [[1,2,3],[4,5,6],[7,8,9]]...etc
And i want to generate all possible combination of product of those list and then multiply each elements of the resulting combination beetween them to finally filter the result in a interval.
So first input a n list [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]...etc
Which will then give (1,4,7,10)
(1,4,7,11)
(1,4,7,12)
and so on
Then combination of those result for k in n like (1,4,7)(1,4,10)(1,7,10) for the first row
The multiplication of x as 1*4*7 = 28, 1*4*10 = 40, 1*7*10 = 70
And from this get only the unique combination and the result need in the interval choosed beforehand : if x > 50 and x < 100 i will get (1,7,10) : 70
I did try
def mult(lst): #A function mult i'm using later
r = 1
for element in lst:
r *= element
return round(r)
s = [] #Where i add my list of list
for i in range(int(input1)):
b = input("This is line %s : " % (i+1)).split()
for i in range(len(b)):
b[i] = float(b[i])
s.append(b)
low_result = input("Expected low_result : ")
high_result = input("Expected high_result : ")
combine = []
my_list = []
for element in itertools.product(*s):
l= [float(x) for x in element]
comb = itertools.combinations([*l], int(input2))
for i in list(comb):
combine.append(i)
res = mult(i)
if res >= int(low_result) and res <= int(high_result):
my_list.append(res)
f = open("list_result.txt","a+")
f.write("%s : result is %s\n" % (i, res))
f.close()
And it always result in memory error cause there is too many variation with what i'm seeking.
What i would like is a way to generate from a list of list of 20 elements or more all the product and resulting combination of k in n for the result(interval) that i need.
As suggested above, I think this can be done without exploding your memory by never holding an array in memory at any time. But the main issue is then runtime.
The maths
As written we are:
Producing every combination of m rows of n items n ** m
Then taking a choice of c items from those m values C(m, c)
This is very large. If we have m=25 rows, of n=3 items each and pick c=3 items in them we get:
= n ** m * C(m, c)
= 3 ** 25 * 2300 - n Choose r calculator
= 1.948763802×10¹⁵
If instead we:
Choose c rows from the m rows: C(m, c) as before
Then pick every combination of n items from these c rows: n ** c
With m=25 rows, of n=3 items each and pick c=3 items in them we get:
= n ** c * C(m, c)
= 3 ** 3 * 2300
= 20700
This is now a solvable problem.
The code
from itertools import product, combinations
def mult(values, min_value, max_value):
"""
Multiply together the values, but return None if we get too big or too
small
"""
output = 1
for value in values:
output *= value
# Early return if we go too big
if output > max_value:
return None
# Early return if we goto zero (from which we never return)
if output == 0 and min_value != 0:
return None
if output < min_value:
return None
return output
def yield_valid_combos(values, choose, min_value, max_value):
# No doubt an even fancier list compression would get this too
for rows in combinations(values, choose):
for combos in product(*rows):
value = mult(combos, min_value, max_value)
if value is not None:
yield combos, value
values = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
with open('list_result.txt', 'w') as fh:
for selection, value in yield_valid_combos(
values, choose=3, min_value=50, max_value=100):
fh.write('{}: result is {}\n'.format(selection, value))
This solution also returns no duplicate answers (unless the same value appears in multiple rows).
As an optimisation the multiplication method attempts to return early if we detect the result will be too big or small. We also only open the file once and then keep adding rows to it as they come.
Further optimisation
You can also optimise your set of values ahead of time by screening out values which cannot contribute to a solution. But for smaller values of c, you may find this is not even necessary.
The smallest possible combination of values is c items from the set of the smallest values in each row. If we take the c - 1 smallest items from the set of smallest values, mutliply them together and then divide the maximum by this number, it gives us an upper bound for the largest value which can be in a solution. We can then then screen out all values above this value (cutting down on permutations)
I want to test if a list contains consecutive integers and no repetition of numbers.
For example, if I have
l = [1, 3, 5, 2, 4, 6]
It should return True.
How should I check if the list contains up to n consecutive numbers without modifying the original list?
I thought about copying the list and removing each number that appears in the original list and if the list is empty then it will return True.
Is there a better way to do this?
For the whole list, it should just be as simple as
sorted(l) == list(range(min(l), max(l)+1))
This preserves the original list, but making a copy (and then sorting) may be expensive if your list is particularly long.
Note that in Python 2 you could simply use the below because range returned a list object. In 3.x and higher the function has been changed to return a range object, so an explicit conversion to list is needed before comparing to sorted(l)
sorted(l) == range(min(l), max(l)+1))
To check if n entries are consecutive and non-repeating, it gets a little more complicated:
def check(n, l):
subs = [l[i:i+n] for i in range(len(l)) if len(l[i:i+n]) == n]
return any([(sorted(sub) in range(min(l), max(l)+1)) for sub in subs])
The first code removes duplicates but keeps order:
from itertools import groupby, count
l = [1,2,4,5,2,1,5,6,5,3,5,5]
def remove_duplicates(values):
output = []
seen = set()
for value in values:
if value not in seen:
output.append(value)
seen.add(value)
return output
l = remove_duplicates(l) # output = [1, 2, 4, 5, 6, 3]
The next set is to identify which ones are in order, taken from here:
def as_range(iterable):
l = list(iterable)
if len(l) > 1:
return '{0}-{1}'.format(l[0], l[-1])
else:
return '{0}'.format(l[0])
l = ','.join(as_range(g) for _, g in groupby(l, key=lambda n, c=count(): n-next(c)))
l outputs as: 1-2,4-6,3
You can customize the functions depending on your output.
We can use known mathematics formula for checking consecutiveness,
Assuming min number always start from 1
sum of consecutive n numbers 1...n = n * (n+1) /2
def check_is_consecutive(l):
maximum = max(l)
if sum(l) == maximum * (maximum+1) /2 :
return True
return False
Once you verify that the list has no duplicates, just compute the sum of the integers between min(l) and max(l):
def check(l):
total = 0
minimum = float('+inf')
maximum = float('-inf')
seen = set()
for n in l:
if n in seen:
return False
seen.add(n)
if n < minimum:
minimum = n
if n > maximum:
maximum = n
total += n
if 2 * total != maximum * (maximum + 1) - minimum * (minimum - 1):
return False
return True
import numpy as np
import pandas as pd
(sum(np.diff(sorted(l)) == 1) >= n) & (all(pd.Series(l).value_counts() == 1))
We test both conditions, first by finding the iterative difference of the sorted list np.diff(sorted(l)) we can test if there are n consecutive integers. Lastly, we test if the value_counts() are all 1, indicating no repeats.
I split your query into two parts part A "list contains up to n consecutive numbers" this is the first line if len(l) != len(set(l)):
And part b, splits the list into possible shorter lists and checks if they are consecutive.
def example (l, n):
if len(l) != len(set(l)): # part a
return False
for i in range(0, len(l)-n+1): # part b
if l[i:i+3] == sorted(l[i:i+3]):
return True
return False
l = [1, 3, 5, 2, 4, 6]
print example(l, 3)
def solution(A):
counter = [0]*len(A)
limit = len(A)
for element in A:
if not 1 <= element <= limit:
return False
else:
if counter[element-1] != 0:
return False
else:
counter[element-1] = 1
return True
The input to this function is your list.This function returns False if the numbers are repeated.
The below code works even if the list does not start with 1.
def check_is_consecutive(l):
"""
sorts the list and
checks if the elements in the list are consecutive
This function does not handle any exceptions.
returns true if the list contains consecutive numbers, else False
"""
l = list(filter(None,l))
l = sorted(l)
if len(l) > 1:
maximum = l[-1]
minimum = l[0] - 1
if minimum == 0:
if sum(l) == (maximum * (maximum+1) /2):
return True
else:
return False
else:
if sum(l) == (maximum * (maximum+1) /2) - (minimum * (minimum+1) /2) :
return True
else:
return False
else:
return True
1.
l.sort()
2.
for i in range(0,len(l)-1)))
print(all((l[i+1]-l[i]==1)
list must be sorted!
lst = [9,10,11,12,13,14,15,16]
final = True if len( [ True for x in lst[:-1] for y in lst[1:] if x + 1 == y ] ) == len(lst[1:]) else False
i don't know how efficient this is but it should do the trick.
With sorting
In Python 3, I use this simple solution:
def check(lst):
lst = sorted(lst)
if lst:
return lst == list(range(lst[0], lst[-1] + 1))
else:
return True
Note that, after sorting the list, its minimum and maximum come for free as the first (lst[0]) and the last (lst[-1]) elements.
I'm returning True in case the argument is empty, but this decision is arbitrary. Choose whatever fits best your use case.
In this solution, we first sort the argument and then compare it with another list that we know that is consecutive and has no repetitions.
Without sorting
In one of the answers, the OP commented asking if it would be possible to do the same without sorting the list. This is interesting, and this is my solution:
def check(lst):
if lst:
r = range(min(lst), max(lst) + 1) # *r* is our reference
return (
len(lst) == len(r)
and all(map(lst.__contains__, r))
# alternative: all(x in lst for x in r)
# test if every element of the reference *r* is in *lst*
)
else:
return True
In this solution, we build a reference range r that is a consecutive (and thus non-repeating) sequence of ints. With this, our test is simple: first we check that lst has the correct number of elements (not more, which would indicate repetitions, nor less, which indicates gaps) by comparing it with the reference. Then we check that every element in our reference is also in lst (this is what all(map(lst.__contains__, r)) is doing: it iterates over r and tests if all of its elements are in lts).
l = [1, 3, 5, 2, 4, 6]
from itertools import chain
def check_if_consecutive_and_no_duplicates(my_list=None):
return all(
list(
chain.from_iterable(
[
[a + 1 in sorted(my_list) for a in sorted(my_list)[:-1]],
[sorted(my_list)[-2] + 1 in my_list],
[len(my_list) == len(set(my_list))],
]
)
)
)
Add 1 to any number in the list except for the last number(6) and check if the result is in the list. For the last number (6) which is the greatest one, pick the number before it(5) and add 1 and check if the result(6) is in the list.
Here is a really short easy solution without having to use any imports:
range = range(10)
L = [1,3,5,2,4,6]
L = sorted(L, key = lambda L:L)
range[(L[0]):(len(L)+L[0])] == L
>>True
This works for numerical lists of any length and detects duplicates.
Basically, you are creating a range your list could potentially be in, editing that range to match your list's criteria (length, starting value) and making a snapshot comparison. I came up with this for a card game I am coding where I need to detect straights/runs in a hand and it seems to work pretty well.
I have a small problem within a bigger problem.
I have an array of positive integers. I need to find a position i in the array such that all the numbers which are smaller than the element at position i should appear after it.
Example:
(let's assume array is indexed at 1)
2, 3, 4, 1, 9,3, 2 => 3rd pos // 1,2,3 are less than 4 and are occurring after it.
5, 2, 1, 5 => 2nd pos
1,2,1 => 2nd pos
1, 4, 6, 7, 2, 3 => doesn't exist
I'm thinking of using a hashtable but I don't know exactly how. Or sorting would be better? Any ideas for an efficient idea?
We can start by creating a map (or hash table or whatever), which records the last occurence for each entry:
for i from 1 to n
lastOccurrence[arr[i]] = i
next
We know that if j is a valid answer, then every number smaller than j is also a valid answer. So we want to find the maximum j. The minimum j is obviously 1 because then the left sublist is empty.
We can then iterate all possible js and check their validity.
maxJ = n
for j from 1 to n
if j > maxJ
return maxJ
if lastOccurrence[arr[j]] == j
return j
maxJ = min(maxJ, lastOccurrence[arr[j]] - 1)
next
from sets import Set
def findMaxIndex(array):
lastSet = Set()
size = len(array)
maxIndex = size
for index in range(size-1,-1,-1):
if array[index] in lastSet:
continue
else:
lastSet.add(array[index])
maxIndex = index + 1
if maxIndex == 1:
return 0 # don't exist
else:
return maxIndex
from the last element to the first, use a set to keep elements having met, if iterate element(index i) is not in set, then the max index is i, and update the set