I'm trying to get n random and non-overlapping slices of a sequence where each subsequence is of length l, preferably in the order they appear.
This is the code I have so far and it's gotten more and more messy with each attempt to make it work, needless to say it doesn't work.
def rand_parts(seq, n, l):
"""
return n random non-overlapping partitions each of length l.
If n * l > len(seq) raise error.
"""
if n * l > len(seq):
raise Exception('length of seq too short for given n, l arguments')
if not isinstance(seq, list):
seq = list(seq)
gaps = [0] * (n + 1)
for g in xrange(len(seq) - (n * l)):
gaps[random.randint(0, len(gaps) - 1)] += 1
result = []
for i, g in enumerate(gaps):
x = g + (i * l)
result.append(seq[x:x+l])
if i < len(gaps) - 1:
gaps[i] += x
return result
For example if we say rand_parts([1, 2, 3, 4, 5, 6], 2, 2) there are 6 possible results that it could return from the following diagram:
[1, 2, 3, 4, 5, 6]
____ ____
[1, 2, 3, 4, 5, 6]
____ ____
[1, 2, 3, 4, 5, 6]
____ ____
[1, 2, 3, 4, 5, 6]
____ ____
[1, 2, 3, 4, 5, 6]
____ ____
[1, 2, 3, 4, 5, 6]
____ ____
So [[3, 4], [5, 6]] would be acceptable but [[3, 4], [4, 5]] wouldn't because it's overlapping and [[2, 4], [5, 6]] also wouldn't because [2, 4] isn't contiguous.
I encountered this problem while doing a little code golfing so for interests sake it would also be nice to see both a simple solution and/or an efficient one, not so much interested in my existing code.
def rand_parts(seq, n, l):
indices = xrange(len(seq) - (l - 1) * n)
result = []
offset = 0
for i in sorted(random.sample(indices, n)):
i += offset
result.append(seq[i:i+l])
offset += l - 1
return result
To understand this, first consider the case l == 1. Then it's basically just returning a random.sample() of the input data in sorted order; in this case the offset variable is always 0.
The case where l > 1 is an extension of the previous case. We use random.sample() to pick up positions, but maintain an offset to shift successive results: in this way, we make sure that they are non-overlapping ranges --- i.e. they start at a distance of at least l of each other, rather than 1.
Many solutions can be hacked for this problem, but one has to be careful if the sequences are to be strictly random. For example, it's wrong to begin by picking a random number between 0 and len(seq)-n*l and say that the first sequence will start there, then work recursively.
The problem is equivalent to selecting randomly n+1 integer numbers such that their sum is equal to len(seq)-l*n. (These numbers will be the "gaps" between your sequences.) To solve it, you can see this question.
This worked for me in Python 3.3.2. It should be backwards compatible with Python 2.7.
from random import randint as r
def greater_than(n, lis, l):
for element in lis:
if n < element + l:
return False
return True
def rand_parts(seq, n, l):
"""
return n random non-overlapping partitions each of length l.
If n * l > len(seq) raise error.
"""
if n * l > len(seq):
raise(Exception('length of seq too short for given n, l arguments'))
if not isinstance(seq, list):
seq = list(seq)
# Setup
left_to_do = n
tried = []
result = []
# The main loop
while left_to_do > 0:
while True:
index = r(0, len(seq) - 1)
if greater_than(index, tried, l) and index <= len(seq) - left_to_do * l:
tried.append(index)
break
left_to_do -= 1
result.append(seq[index:index+l])
# Done
return result
a = [1, 2, 3, 4, 5, 6]
print(rand_parts(a, 3, 2))
The above code will always print [[1, 2], [3, 4], [5, 6]]
If you do it recursively it's much simpler. Take the first part from (so the rest will fit):
[0:total_len - (numer_of_parts - 1) * (len_of_parts)]
and then recurse with what left to do:
rand_parts(seq - begining _to_end_of_part_you_grabbed, n - 1, l)
First of all, I think you need to clarify what you mean by the term random.
How can you generate a truly random list of sub-sequences when you are placing specific restrictions on the sub-sequences themselves?
As far as I know, the best "randomness" anyone can achieve in this context is generating all lists of sub-sequences that satisfy your criteria, and selecting from the pool however many you need in a random fashion.
Now based on my experience from an algorithms class that I've taken a few years ago, your problem seems to be a typical example which could be solved using a greedy algorithm making these big (but likely?) assumptions about what you were actually asking in the first place:
What you actually meant by random is not that a list of sub-sequence should be generated randomly (which is kind of contradictory as I said before), but that any of the solutions that could be produced is just as valid as the rest (e.g. any of the 6 solutions is valid from input [1,2,3,4,5,6] and you don't care which one)
Restating the above, you just want any one of the possible solutions that could be generated, and you want an algorithm that can output one of these valid answers.
Assuming the above here is a greedy algorithm which generates one of the possible lists of sub-sequences in linear time (excluding sorting, which is O(n*log(n))):
def subseq(seq, count, length):
s = sorted(list(set(seq)))
result = []
subseq = []
for n in s:
if len(subseq) == length:
result.append(subseq)
if len(result) == count:
return result
subseq = [n]
elif len(subseq) == 0:
subseq.append(n)
elif subseq[-1] + 1 == n:
subseq.append(n)
elif subseq[-1] + 1 < n:
subseq = [n]
print("Impossible!")
The gist of the algorithm is as follows:
One of your requirements is that there cannot be any overlaps, and this ultimately implies you need to deal with unique numbers and unique numbers only. So I use the set() operation to get rid all the duplicates. Then I sort it.
Rest is pretty straight forward imo. I just iterate over the sorted list and form sub-sequences greedily.
If the algorithm can't form enough number of sub-sequences then print "Impossible!"
Hope this was what you were looking for.
EDIT: For some reason I wrongly assumed that there couldn't be repeating values in a sub-sequence, this one allows it.
def subseq2(seq, count, length):
s = sorted(seq)
result = []
subseq = []
for n in s:
if len(subseq) == length:
result.append(subseq)
if len(result) == count:
return result
subseq = [n]
elif len(subseq) == 0:
subseq.append(n)
elif subseq[-1] + 1 == n or subseq[-1] == n:
subseq.append(n)
elif subseq[-1] + 1 < n:
subseq = [n]
print("Impossible!")
Related
I'm wanting to create a list of permutations or cartesian products (not sure which one applies here) where the sum of values in each permutation totals to a provided value.
There should be three parameters required for the function.
Sample Size: The number of items in each permutation
Desired Sum: The total that each permutation should add up to
Set of Numbers: The set of numbers that can be included with repetition in the permutations
I have an implementation working below but it seems quite slow I would prefer to use an iterator to stream the results but I would also need a function that would be able to calculate the total number of items that the iterator would produce.
def buildPerms(sample_size, desired_sum, set_of_number):
blank = [0] * sample_size
return recurseBuildPerms([], blank, set_of_number, desired_sum)
def recurseBuildPerms(perms, blank, values, desired_size, search_index = 0):
for i in range(0, len(values)):
for j in range(search_index, len(blank)):
if(blank[j] == 0):
new_blank = blank.copy()
new_blank[j] = values[i]
remainder = desired_size - sum(new_blank)
new_values = list(filter(lambda x: x <= remainder, values))
if(len(new_values) > 0):
recurseBuildPerms(perms, new_blank, new_values, desired_size, j)
elif(sum(new_blank) <= desired_size):
perms.append( new_blank)
return perms
perms = buildPerms(4, 10, [1,2,3])
print(perms)
## Output
[[1, 3, 3, 3], [2, 2, 3, 3], [2, 3, 2, 3],
[2, 3, 3, 2], [3, 1, 3, 3], [3, 2, 2, 3],
[3, 2, 3, 2], [3, 3, 1, 3], [3, 3, 2, 2],
[3, 3, 3, 1]]
https://www.online-python.com/9cmOev3zlg
Questions:
Can someone help me convert my solution into an iterator?
Is it possible to have a calculation to know the total number of items without seeing the full list?
Here is one way to break this down into two subproblems:
Find all restricted integer partitions of target_sum into sample_size summands s.t. all summands come from set_of_number.
Compute multiset permutations for each partition (takes up most of the time).
Problem 1 can be solved with dynamic programming. I used multiset_permutations from sympy for part 2, although you might be able to get better performance by writing your own numba code.
Here is the code:
from functools import lru_cache
from sympy.utilities.iterables import multiset_permutations
#lru_cache(None)
def restricted_partitions(n, k, *xs):
'partitions of n into k summands using only elements in xs (assumed positive integers)'
if n == k == 0:
# case of unique empty partition
return [[]]
elif n <= 0 or k <= 0 or not xs:
# case where no partition is possible
return []
# general case
result = list()
x = xs[0] # element x we consider including in a partition
i = 0 # number of times x should be included
while True:
i += 1
if i > k or x * i > n:
break
for rest in restricted_partitions(n - x * i, k - i, *xs[1:]):
result.append([x] * i + rest)
result.extend(restricted_partitions(n, k, *xs[1:]))
return result
def buildPerms2(sample_size, desired_sum, set_of_number):
for part in restricted_partitions(desired_sum, sample_size, *set_of_number):
yield from multiset_permutations(part)
# %timeit sum(1 for _ in buildPerms2(8, 16, [1, 2, 3, 4])) # 16 ms
# %timeit sum(1 for _ in buildPerms (8, 16, [1, 2, 3, 4])) # 604 ms
The current solution requires computing all restricted partitions before iteration can begin, but it may still be practical if restricted partitions can be computed quickly. It may be possible to compute partitions iteratively as well, although this may require more work.
On the second question, you can indeed count the number of such permutations without generating them all:
# present in the builtin math library for Python 3.8+
#lru_cache(None)
def binomial(n, k):
if k == 0:
return 1
if n == 0:
return 0
return binomial(n - 1, k) + binomial(n - 1, k - 1)
#lru_cache(None)
def perm_counts(n, k, *xs):
if n == k == 0:
# case of unique empty partition
return 1
elif n <= 0 or k <= 0 or not xs:
# case where no partition is possible
return 0
# general case
result = 0
x = xs[0] # element x we consider including in a partition
i = 0 # number of times x should be included
while True:
i += 1
if i > k or x * i > n:
break
result += binomial(k, i) * perm_counts(n - x * i, k - i, *xs[1:])
result += perm_counts(n, k, *xs[1:])
return result
# assert perm_counts(15, 6, *[1,2,3,4]) == sum(1 for _ in buildPerms2(6, 15, [1,2,3,4])) == 580
# perm_counts(1000, 100, *[1,2,4,8,16,32,64])
# 902366143258890463230784240045750280765827746908124462169947051257879292738672
The function used to count all restricted permutations looks very similar to the function that generates partitions above. The only significant change is in the following line:
result += binomial(k, i) * perm_counts(n - x * i, k - i, *xs[1:])
There are i copies of x to include and k possible positions where x's may end up. To account for this multiplicity, the number of ways to resolve the recursive sub-problem is multiplied by k choose i.
Function to find minimum number of eliminations such that sum of all adjacent elements is even:
def min_elimination(n, arr):
countOdd = 0
# Stores the new value
for i in range(n):
# Count odd numbers
***if (arr[i] % 2):
countOdd += 1***
# Return the minimum of even and
# odd count
return min(countOdd, n - countOdd)
# Driver code
if __name__ == '__main__':
arr = [1, 2, 3, 7, 9]
n = len(arr)
print(min_elimination(n, arr))
Please help me with the if condition. When the code does if(number%2) then control is going inside the if since the first element of list is an odd number. Is there any difference between if(number%2) and if(number%2==0). Because when I tried if(number%2==0) control didn't go inside the if as the number was odd (check first element of the list).
This is a simple version of the above.
def min_elimination(arr):
lst1 = [n for n in arr if n%2] # List of all odd numbers
lst2 = [n for n in arr if not n%2] # List of all even numbers
lst = max(lst1, lst2, key=lambda x: len(x))
return lst
print(min_elimination([1, 2, 3, 7, 9]))
I believe your code works fine as-is. It correctly returns the minimum number of eliminations necessary to achieve the result, not the result itself.
is there any difference between if(number%2) and if(number%2==0)
Yes, if number % 2 is the same as saying if number % 2 == 1, which is the opposite of saying if number % 2 == 0. So switching one for the other will break your program's logic.
I might simplify your code as follows:
def min_elimination(array):
odd = 0
# Stores the new value
for number in array:
# Count odd numbers
odd += number % 2
return min(odd, len(array) - odd)
Where min_elimination([1, 2, 3, 7, 9]) returns 1, there is one (1) elimination necessary to make the sum of all adjacent elements even.
The only way two integers can add up to be even is that both integers should be odd, or both integers should be even. So your question basically wants to find whether there are more odd number in the array, or even numbers:
def min_elimination(arr):
return len(min([n for n in arr if n%2],[n for n in arr if not n%2],key=len))
print(min_elimination([1, 2, 3, 7, 9]))
Output:
1
You can use numpy to compare the number of odd numbers in the array to the number of even numbers in the array:
import numpy as np
def min_elimination(arr):
return len(min(arr[arr%2],arr[any(arr%2)],key=len))
print(min_elimination(np.array([1, 2, 3, 7, 9])))
Output:
1
I am trying to do a challenge in Python, the challenge consists of :
Given an array X of positive integers, its elements are to be transformed by running the following operation on them as many times as required:
if X[i] > X[j] then X[i] = X[i] - X[j]
When no more transformations are possible, return its sum ("smallest possible sum").
Basically you pick two non-equal numbers from the array, and replace the largest of them with their subtraction. You repeat this till all numbers in array are same.
I tried a basic approach by using min and max but there is another constraint which is time. I always get timeout because my code is not optimized and takes too much time to execute. Can you please suggest some solutions to make it run faster.
def solution(array):
while len(set(array)) != 1:
array[array.index(max(array))] = max(array) - min(array)
return sum(array)
Thank you so much !
EDIT
I will avoid to spoil the challenge... because I didn't find the solution in Python. But here's the general design of an algorithm that works in Kotlin (in 538 ms). In Python I'm stuck at the middle of the performance tests.
Some thoughts:
First, the idea to remove the minimum from the other elements is good: the modulo (we remove the minimum as long as it is possible) will be small.
Second, if this minimum is 1, the array will be soon full of 1s and the result is N (the len of the array).
Third, if all elements are equal, the result is N times the value of one element.
The algorithm
The idea is to keep two indices: i is the current index that cycles on 0..N and k is the index of the current minimum.
At the beginning, k = i = 0 and the minimum is m = arr[0]. We advance i until one of the following happen:
i == k => we made a full cycle without updating k, return N*m;
arr[i] == 1 => return N;
arr[i] < m => update k and m;
arr[i] > m => compute the new value of arr[i] (that is arr[i] % m or m if arr[i] is a multiple of m). If thats not m, thats arr[i] % m < m: update k and m;
arr[i] == m => pass.
Bascially, we use a rolling minimum and compute the modulos on the fly until all element are the same. That spares the computation of a min of the array periodically.
PREVIOUS ANSWER
As #BallpointBen wrote, you'll get the n times the GCD of all numbers. But that's cheating ;)! If you want to find a solution by hand, you can optimize your code.
While you don't find N identical numbers, you use the set, max (twice!), min and index functions on array. Those functions are pretty expensive. The number of iterations depend on the array.
Imagine the array is sorted in reverse order: [22, 14, 6, 2]. You can replace 22 by 22-14, 14 by 14-6, ... and get: [8, 12, 4, 2]. Sort again: [12, 8, 4, 2], replace again: [4, 4, 4, 2]. Sort again, replace again (if different): [4, 4, 2, 2], [4, 2, 2, 2], [2, 2, 2, 2]. Actually, in the first pass 14 could be replaced by 14-2*6 = 2 (as in the classic GCD computation), giving the following sequence:
[22, 14, 6, 2]
[8, 2, 2, 2]
[2, 2, 2, 2]
The convergence is fast.
def solution2(arr):
N = len(arr)
end = False
while not end:
arr = sorted(arr, reverse=True)
end = True
for i in range(1, N):
while arr[i-1] > arr[i]:
arr[i-1] -= arr[i]
end = False
return sum(arr)
A benchmark:
import random
import timeit
arr = [4*random.randint(1, 100000) for _ in range(100)] # GCD will be 4 or a multiple of 4
assert solution(list(arr)) == solution2(list(arr))
print(timeit.timeit(lambda: solution(list(arr)), number=100))
print(timeit.timeit(lambda: solution2(list(arr)), number=100))
Output:
2.5396839629975148
0.029025810996245127
def solution(a):
N = len(a)
end = False
while not end:
a = sorted(a, reverse=True)
small = min(a)
end = True
for i in range(1, N):
if a[i-1] > small:
a[i-1] = a[i-1]%small if a[i-1]%small !=0 else small
end = False
return sum(a)
made it faster with a slight change
This solution worked for me. I iterated on the list only once. initially I find the minimum and iterating over the list I replace the element with the rest of the division. If I find a rest equal to 1 the result will be trivially 1 multiplied by the length of the list otherwise if it is less than the minimum, i will replace the variable m with the minimum found and continue. Once the iteration is finished, the result will be the minimum for the length of the list.
Here the code:
def solution(a):
L = len(a)
if L == 1:
return a[0]
m=min(a)
for i in range(L):
if a[i] != m:
if a[i] % m != 0:
a[i] = a[i]%m
if a[i]<m:
m=a[i]
elif a[i] % m == 0:
a[i] -= m * (a[i] // m - 1)
if a[i]==1:
return 1*L
return m*L
When I was struggling to do Problem 14 in Project Euler, I discovered that I could use a thing called memoization to speed up my process (I let it run for a good 15 minutes, and it still hadn't returned an answer). The thing is, how do I implement it? I've tried to, but I get a keyerror(the value being returned is invalid). This bugs me because I am positive I can apply memoization to this and get this faster.
lookup = {}
def countTerms(n):
arg = n
count = 1
while n is not 1:
count += 1
if not n%2:
n /= 2
else:
n = (n*3 + 1)
if n not in lookup:
lookup[n] = count
return lookup[n], arg
print max(countTerms(i) for i in range(500001, 1000000, 2))
Thanks.
There is also a nice recursive way to do this, which probably will be slower than poorsod's solution, but it is more similar to your initial code, so it may be easier for you to understand.
lookup = {}
def countTerms(n):
if n not in lookup:
if n == 1:
lookup[n] = 1
elif not n % 2:
lookup[n] = countTerms(n / 2)[0] + 1
else:
lookup[n] = countTerms(n*3 + 1)[0] + 1
return lookup[n], n
print max(countTerms(i) for i in range(500001, 1000000, 2))
The point of memoising, for the Collatz sequence, is to avoid calculating parts of the list that you've already done. The remainder of a sequence is fully determined by the current value. So we want to check the table as often as possible, and bail out of the rest of the calculation as soon as we can.
def collatz_sequence(start, table={}): # cheeky trick: store the (mutable) table as a default argument
"""Returns the Collatz sequence for a given starting number"""
l = []
n = start
while n not in l: # break if we find ourself in a cycle
# (don't assume the Collatz conjecture!)
if n in table:
l += table[n]
break
elif n%2 == 0:
l.append(n)
n = n//2
else:
l.append(n)
n = (3*n) + 1
table.update({n: l[i:] for i, n in enumerate(l) if n not in table})
return l
Is it working? Let's spy on it to make sure the memoised elements are being used:
class NoisyDict(dict):
def __getitem__(self, item):
print("getting", item)
return dict.__getitem__(self, item)
def collatz_sequence(start, table=NoisyDict()):
# etc
In [26]: collatz_sequence(5)
Out[26]: [5, 16, 8, 4, 2, 1]
In [27]: collatz_sequence(5)
getting 5
Out[27]: [5, 16, 8, 4, 2, 1]
In [28]: collatz_sequence(32)
getting 16
Out[28]: [32, 16, 8, 4, 2, 1]
In [29]: collatz_sequence.__defaults__[0]
Out[29]:
{1: [1],
2: [2, 1],
4: [4, 2, 1],
5: [5, 16, 8, 4, 2, 1],
8: [8, 4, 2, 1],
16: [16, 8, 4, 2, 1],
32: [32, 16, 8, 4, 2, 1]}
Edit: I knew it could be optimised! The secret is that there are two places in the function (the two return points) that we know l and table share no elements. While previously I avoided calling table.update with elements already in table by testing them, this version of the function instead exploits our knowledge of the control flow, saving lots of time.
[collatz_sequence(x) for x in range(500001, 1000000)] now times around 2 seconds on my computer, while a similar expression with #welter's version clocks in 400ms. I think this is because the functions don't actually compute the same thing - my version generates the whole sequence, while #welter's just finds its length. So I don't think I can get my implementation down to the same speed.
def collatz_sequence(start, table={}): # cheeky trick: store the (mutable) table as a default argument
"""Returns the Collatz sequence for a given starting number"""
l = []
n = start
while n not in l: # break if we find ourself in a cycle
# (don't assume the Collatz conjecture!)
if n in table:
table.update({x: l[i:] for i, x in enumerate(l)})
return l + table[n]
elif n%2 == 0:
l.append(n)
n = n//2
else:
l.append(n)
n = (3*n) + 1
table.update({x: l[i:] for i, x in enumerate(l)})
return l
PS - spot the bug!
This is my solution to PE14:
memo = {1:1}
def get_collatz(n):
if n in memo : return memo[n]
if n % 2 == 0:
terms = get_collatz(n/2) + 1
else:
terms = get_collatz(3*n + 1) + 1
memo[n] = terms
return terms
compare = 0
for x in xrange(1, 999999):
if x not in memo:
ctz = get_collatz(x)
if ctz > compare:
compare = ctz
culprit = x
print culprit
If I need to divide for example 7 into random number of elements of random size, how would I do this?
So that sometimes I would get [3,4], sometimes [2,3,1] and sometimes [2,2,1,1,0,1]?
I guess it's quite simple, but I can't seem to get the results. Here what I am trying to do code-wise (does not work):
def split_big_num(num):
partition = randint(1,int(4))
piece = randint(1,int(num))
result = []
for i in range(partition):
element = num-piece
result.append(element)
piece = randint(0,element)
#What's next?
if num - piece == 0:
return result
return result
EDIT: Each of the resulting numbers should be less than initial number and the number of zeroes should be no less than number of partitions.
I'd go for the next:
>>> def decomposition(i):
while i > 0:
n = random.randint(1, i)
yield n
i -= n
>>> list(decomposition(7))
[2, 4, 1]
>>> list(decomposition(7))
[2, 1, 3, 1]
>>> list(decomposition(7))
[3, 1, 3]
>>> list(decomposition(7))
[6, 1]
>>> list(decomposition(7))
[5, 1, 1]
However, I am not sure if this random distribution is perfectly uniform.
You have to define what you mean by "random". If you want an arbitrary integer partition, you can generate all integer partitions, and use random.choice. See python: Generating integer partitions This would give no results with 0. If you allow 0, you will have to allow results with a potentially infinite number of 0s.
Alternatively if you just want to take random chunks off, do this:
def arbitraryPartitionLessThan(n):
"""Returns an arbitrary non-random partition where no number is >=n"""
while n>0:
x = random.randrange(1,n) if n!=1 else 1
yield x
n -= x
It is slightly awkward due to the problem constraints that each number should be less than the original number; it would be more elegant if you allowed the original number. You can do randrange(n) if you want 0s but it wouldn't make sense unless there is a hidden reason you are not sharing.
edit in response to question edit: Since you desire the "the number of zeroes should be no less than number of partitions" you can arbitrarily add 0s to the end:
def potentiallyInfiniteCopies(x):
while random.random()<0.5:
yield x
x = list(arbitraryPartitionLessThan(n))
x += [0]*len(x) + list(potentiallyInfiniteCopies(0))
The question is quite arbitrary, and I highly recommend that you choose this instead as your answer:
def arbitraryPartition(n):
"""Returns an arbitrary non-random partition"""
while n>0:
x = random.randrange(1,n+1)
yield x
n -= x
Recursion to the rescue:
import random
def splitnum(num, lst=[]):
if num == 0:
return lst
n = random.randint(0, num)
return splitnum(num - n, lst + [n])
for i in range(10):
print splitnum(7)
Result:
[1, 6]
[6, 0, 0, 1]
[5, 1, 1]
[6, 0, 1]
[2, 0, 3, 1, 1]
[7]
[2, 1, 0, 4]
[7]
[3, 4]
[2, 0, 4, 1]
This solution does not insert 0s (I do not understand what your description of your zeros rule is supposed to be), and is equally likely to generate every possible combination other than the original number by itself.
def split (n):
answer = [1]
for i in range(n - 1):
if random.random() < 0.5:
answer[-1] += 1
else:
answer.append(1)
if answer == [n]:
return split(n)
else:
return answer