Python - Memoization and Collatz Sequence

When I was struggling with Problem 14 in Project Euler, I discovered that I could use a technique called memoization to speed up my process (I let it run for a good 15 minutes, and it still hadn't returned an answer). The thing is, how do I implement it? I've tried to, but I get a KeyError (the value being returned is invalid). This bugs me because I am positive I can apply memoization here and make it faster.
lookup = {}

def countTerms(n):
    arg = n
    count = 1
    while n is not 1:
        count += 1
        if not n % 2:
            n /= 2
        else:
            n = (n*3 + 1)
        if n not in lookup:
            lookup[n] = count
    return lookup[n], arg

print max(countTerms(i) for i in range(500001, 1000000, 2))
Thanks.

There is also a nice recursive way to do this. It will probably be slower than poorsod's solution, but it is more similar to your initial code, so it may be easier for you to understand.
lookup = {}

def countTerms(n):
    if n not in lookup:
        if n == 1:
            lookup[n] = 1
        elif not n % 2:
            lookup[n] = countTerms(n / 2)[0] + 1
        else:
            lookup[n] = countTerms(n*3 + 1)[0] + 1
    return lookup[n], n

print max(countTerms(i) for i in range(500001, 1000000, 2))

The point of memoising, for the Collatz sequence, is to avoid calculating parts of the list that you've already done. The remainder of a sequence is fully determined by the current value. So we want to check the table as often as possible, and bail out of the rest of the calculation as soon as we can.
def collatz_sequence(start, table={}):  # cheeky trick: store the (mutable) table as a default argument
    """Returns the Collatz sequence for a given starting number"""
    l = []
    n = start
    while n not in l:  # break if we find ourself in a cycle
                       # (don't assume the Collatz conjecture!)
        if n in table:
            l += table[n]
            break
        elif n % 2 == 0:
            l.append(n)
            n = n // 2
        else:
            l.append(n)
            n = (3*n) + 1
    table.update({n: l[i:] for i, n in enumerate(l) if n not in table})
    return l
Is it working? Let's spy on it to make sure the memoised elements are being used:
class NoisyDict(dict):
    def __getitem__(self, item):
        print("getting", item)
        return dict.__getitem__(self, item)

def collatz_sequence(start, table=NoisyDict()):
    # etc
In [26]: collatz_sequence(5)
Out[26]: [5, 16, 8, 4, 2, 1]
In [27]: collatz_sequence(5)
getting 5
Out[27]: [5, 16, 8, 4, 2, 1]
In [28]: collatz_sequence(32)
getting 16
Out[28]: [32, 16, 8, 4, 2, 1]
In [29]: collatz_sequence.__defaults__[0]
Out[29]:
{1: [1],
2: [2, 1],
4: [4, 2, 1],
5: [5, 16, 8, 4, 2, 1],
8: [8, 4, 2, 1],
16: [16, 8, 4, 2, 1],
32: [32, 16, 8, 4, 2, 1]}
Edit: I knew it could be optimised! The secret is that there are two places in the function (the two return points) where we know l and table share no elements. While previously I avoided calling table.update with elements already in table by testing them, this version of the function instead exploits our knowledge of the control flow, saving lots of time.
[collatz_sequence(x) for x in range(500001, 1000000)] now times around 2 seconds on my computer, while a similar expression with @welter's version clocks in at 400 ms. I think this is because the functions don't actually compute the same thing - my version generates the whole sequence, while @welter's just finds its length. So I don't think I can get my implementation down to the same speed.
def collatz_sequence(start, table={}):  # cheeky trick: store the (mutable) table as a default argument
    """Returns the Collatz sequence for a given starting number"""
    l = []
    n = start
    while n not in l:  # break if we find ourself in a cycle
                       # (don't assume the Collatz conjecture!)
        if n in table:
            table.update({x: l[i:] for i, x in enumerate(l)})
            return l + table[n]
        elif n % 2 == 0:
            l.append(n)
            n = n // 2
        else:
            l.append(n)
            n = (3*n) + 1
    table.update({x: l[i:] for i, x in enumerate(l)})
    return l
PS - spot the bug!

This is my solution to PE14:
memo = {1: 1}

def get_collatz(n):
    if n in memo:
        return memo[n]
    if n % 2 == 0:
        terms = get_collatz(n / 2) + 1
    else:
        terms = get_collatz(3*n + 1) + 1
    memo[n] = terms
    return terms

compare = 0
for x in xrange(1, 999999):
    if x not in memo:
        ctz = get_collatz(x)
        if ctz > compare:
            compare = ctz
            culprit = x

print culprit

Related

Obtain a strictly increasing sequence python

Question:
"Given a sequence of integers as an array, determine whether it is possible to obtain a strictly increasing sequence by removing no more than one element from the array.
Note: sequence a0, a1, ..., an is considered to be a strictly increasing if a0 < a1 < ... < an. Sequence containing only one element is also considered to be strictly increasing.
Examples:
For sequence = [1, 3, 2, 1], the output should be
solution(sequence) = false. There is no one element in this array that can be removed in order to get a strictly increasing sequence.
For sequence = [1, 3, 2], the output should be
solution(sequence) = true. You can remove 3 from the array to get the strictly increasing sequence [1, 2]. Alternately, you can remove 2 to get the strictly increasing sequence [1, 3]."
Here's my code:
def solution(sequence):
    if len(sequence) == 1:
        return True
    else:
        count = 0
        for i in range(0, len(sequence) - 1):
            if sequence[i] >= sequence[i + 1]:
                count = count + 1
        for i in range(0, len(sequence) - 2):
            if sequence[i] >= sequence[i + 2]:
                count = count + 1
        return count <= 1
My code covers three cases:
Case 1: when the sequence is just one element long. I caught that in the first if statement.
Case 2: If there's more than one down-step – a case where the next element is less than the element being considered – then there is no way to fix the sequence with just one removal, and so we get false (count > 1). I caught this with the first for loop.
Case 3: There are some cases, however, where there is only one down-step but there is still no way to remove just one element. This happens when the second element along is also less than the element being considered. For example, with [1,4,3,2] even if you removed the 3, you would still get a downstep. Now I covered this case by doing a second check, which checked whether the element two along is less, and if it is, then we add to the count.
Case 4: There is a case my code doesn't cover, which seems to be the only one remaining: when an element's neighbour and the next element along are both smaller than the element under consideration, but we could solve the issue just by getting rid of the element under consideration. So, with [1,4,2,3] both 2 and 3 are smaller than 4, but if we just got rid of the 4, then we're good. This case can occur whether the problem element is first in the sequence or not. I'm not sure how to capture this properly. I suppose you might add a conditional which looks at whether i-2 is less than i+1, but this won't work when i indexes the first element, and it's quite cumbersome. I'm not sure how to sort this out.
I'm quite sure I've overcomplicated things and what really is needed is to step back and think of a less piece-meal solution, but I'm stuck. Could anyone help? Note that we don't have to actually obtain the strictly increasing sequence; we just have to see whether we could.
Here is an idea you can take a look at (edited after comments):
def distinct(L: list[int]) -> bool:
    return len(L) == len(set(L))

def almost_increasing(L: list[int]) -> bool:
    # some trivial cases
    if len(L) <= 2: return True
    if L[1:] == sorted(L[1:]) and distinct(L[1:]): return True
    if L[:-1] == sorted(L[:-1]) and distinct(L[:-1]): return True
    return any(
        L[:i] == sorted(L[:i]) and distinct(L[:i]) and
        L[i+1:] == sorted(L[i+1:]) and distinct(L[i+1:]) and
        L[i-1] < L[i+1]
        for i in range(1, len(L)-1)
    )
And here is a nice way you can test it with hypothesis and pytest:
from hypothesis import given, strategies as st

@given(L=st.lists(st.integers(), min_size=2, max_size=6))
def test_almost_increasing(L: list[int]):
    expected = False
    for i in range(len(L)):
        Lcopy = L.copy()
        del Lcopy[i]
        expected |= (Lcopy == sorted(Lcopy) and distinct(Lcopy))
    received = almost_increasing(L)
    assert received == expected
Let's split the input into at most two increasing subsequences. If this is not possible, return false.
If there's only one sequence, return true.
If the length of either sequence is 1, the answer is true - we simply remove this element.
Otherwise we can join two sequences by either removing the last item from the first sequence, or the first item of the second one. That is,
if a[j-2] < a[j] -> ok, remove a[j - 1]
if a[j-1] < a[j + 1] -> ok, remove a[j]
where j is the index where the second sequence starts.
Code:
def check(a):
    j = 0
    for i in range(1, len(a)):
        if a[i - 1] >= a[i]:
            if j > 0:
                return None
            j = i
    if j == 0:
        return a
    if j == 1:
        return a[1:]
    if j == len(a) - 1:
        return a[:-1]
    if a[j - 2] < a[j]:
        return a[:(j - 1)] + a[j:]
    if a[j - 1] < a[j + 1]:
        return a[:j] + a[(j + 1):]
    return None

assert check([2, 4, 6, 8]) == [2, 4, 6, 8], 'j == 0'
assert check([9, 4, 6, 8]) == [4, 6, 8], 'j == 1'
assert check([4, 6, 8, 1]) == [4, 6, 8], 'j == len-1'
assert check([2, 4, 9, 6, 8]) == [2, 4, 6, 8], 'j-2 < j'
assert check([2, 4, 1, 6, 8]) == [2, 4, 6, 8], 'j-1 < j+1'
assert check([2, 2, 2, 2]) is None, 'early return'
assert check([2, 8, 9, 6, 1]) is None, 'early return'
assert check([2, 4, 9, 3, 5]) is None, 'last return'
assert check([2]) == [2]
Try this
def solution(sequence):
    n = len(sequence)
    for i in range(n):
        count = 0
        trail = sequence[:]
        del trail[i]
        m = len(trail)
        for j in range(m - 1):
            if trail[j] >= trail[j + 1]:
                count += 1
        if count == 0:
            return True
    return False
This is neither efficient nor optimized, but it does the job.
Try this:
def is_solution(list_to_check):
    if len(list_to_check) == 1:
        return True
    for i in range(1, len(list_to_check)):
        if list_to_check[i] <= list_to_check[i - 1]:
            new_list = list_to_check[:i - 1] + list_to_check[i:]
            if (list_to_check[i] > list_to_check[i - 2]
                    and new_list == sorted(new_list)):
                return True
            elif (i == len(list_to_check) - 1
                    and list_to_check[:-1] == sorted(list_to_check[:-1])):
                return True
    return False

if __name__ == '__main__':
    list_to_check = [1, 2, 1]
    print(is_solution(list_to_check))
def solution(sequence):
    """determine strict increase"""
    sequence_length = len(sequence)
    if (
        sequence_length == 0
        or not len(set(sequence)) + 1 >= sequence_length
    ):
        return False
    return True

Optimizing permutation generator where total of each permutation totals to same value

I'm wanting to create a list of permutations or cartesian products (not sure which one applies here) where the sum of values in each permutation totals to a provided value.
There should be three parameters required for the function.
Sample Size: The number of items in each permutation
Desired Sum: The total that each permutation should add up to
Set of Numbers: The set of numbers that can be included with repetition in the permutations
I have an implementation working below, but it seems quite slow. I would prefer to use an iterator to stream the results, but I would also need a function that can calculate the total number of items the iterator would produce.
def buildPerms(sample_size, desired_sum, set_of_number):
    blank = [0] * sample_size
    return recurseBuildPerms([], blank, set_of_number, desired_sum)

def recurseBuildPerms(perms, blank, values, desired_size, search_index=0):
    for i in range(0, len(values)):
        for j in range(search_index, len(blank)):
            if blank[j] == 0:
                new_blank = blank.copy()
                new_blank[j] = values[i]
                remainder = desired_size - sum(new_blank)
                new_values = list(filter(lambda x: x <= remainder, values))
                if len(new_values) > 0:
                    recurseBuildPerms(perms, new_blank, new_values, desired_size, j)
                elif sum(new_blank) <= desired_size:
                    perms.append(new_blank)
    return perms

perms = buildPerms(4, 10, [1, 2, 3])
print(perms)
## Output
[[1, 3, 3, 3], [2, 2, 3, 3], [2, 3, 2, 3],
 [2, 3, 3, 2], [3, 1, 3, 3], [3, 2, 2, 3],
 [3, 2, 3, 2], [3, 3, 1, 3], [3, 3, 2, 2],
 [3, 3, 3, 1]]
https://www.online-python.com/9cmOev3zlg
Questions:
Can someone help me convert my solution into an iterator?
Is it possible to have a calculation to know the total number of items without seeing the full list?
Here is one way to break this down into two subproblems:
Find all restricted integer partitions of target_sum into sample_size summands s.t. all summands come from set_of_number.
Compute multiset permutations for each partition (takes up most of the time).
Problem 1 can be solved with dynamic programming. I used multiset_permutations from sympy for part 2, although you might be able to get better performance by writing your own numba code.
Here is the code:
from functools import lru_cache
from sympy.utilities.iterables import multiset_permutations

@lru_cache(None)
def restricted_partitions(n, k, *xs):
    'partitions of n into k summands using only elements in xs (assumed positive integers)'
    if n == k == 0:
        # case of unique empty partition
        return [[]]
    elif n <= 0 or k <= 0 or not xs:
        # case where no partition is possible
        return []
    # general case
    result = list()
    x = xs[0]  # element x we consider including in a partition
    i = 0      # number of times x should be included
    while True:
        i += 1
        if i > k or x * i > n:
            break
        for rest in restricted_partitions(n - x * i, k - i, *xs[1:]):
            result.append([x] * i + rest)
    result.extend(restricted_partitions(n, k, *xs[1:]))
    return result

def buildPerms2(sample_size, desired_sum, set_of_number):
    for part in restricted_partitions(desired_sum, sample_size, *set_of_number):
        yield from multiset_permutations(part)

# %timeit sum(1 for _ in buildPerms2(8, 16, [1, 2, 3, 4]))  # 16 ms
# %timeit sum(1 for _ in buildPerms (8, 16, [1, 2, 3, 4]))  # 604 ms
The current solution requires computing all restricted partitions before iteration can begin, but it may still be practical if restricted partitions can be computed quickly. It may be possible to compute partitions iteratively as well, although this may require more work.
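If streaming the partitions matters, here is a minimal generator sketch of the same recursion (my own adaptation, not part of the answer above; the name iter_restricted_partitions is mine, and it trades the lru_cache memoisation for lazy evaluation):
def iter_restricted_partitions(n, k, xs):
    """Lazily yield partitions of n into k summands drawn (with repetition) from xs."""
    if n == 0 and k == 0:
        yield []          # the unique empty partition
        return
    if n <= 0 or k <= 0 or not xs:
        return            # no partition possible
    x, rest = xs[0], xs[1:]
    i = 0                 # number of times x is included (0 means "skip x entirely")
    while i <= k and x * i <= n:
        for tail in iter_restricted_partitions(n - x * i, k - i, rest):
            yield [x] * i + tail
        i += 1
buildPerms2 could then loop over this generator instead of the cached list.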
On the second question, you can indeed count the number of such permutations without generating them all:
# present in the builtin math library for Python 3.8+
@lru_cache(None)
def binomial(n, k):
    if k == 0:
        return 1
    if n == 0:
        return 0
    return binomial(n - 1, k) + binomial(n - 1, k - 1)

@lru_cache(None)
def perm_counts(n, k, *xs):
    if n == k == 0:
        # case of unique empty partition
        return 1
    elif n <= 0 or k <= 0 or not xs:
        # case where no partition is possible
        return 0
    # general case
    result = 0
    x = xs[0]  # element x we consider including in a partition
    i = 0      # number of times x should be included
    while True:
        i += 1
        if i > k or x * i > n:
            break
        result += binomial(k, i) * perm_counts(n - x * i, k - i, *xs[1:])
    result += perm_counts(n, k, *xs[1:])
    return result

# assert perm_counts(15, 6, *[1,2,3,4]) == sum(1 for _ in buildPerms2(6, 15, [1,2,3,4])) == 580
# perm_counts(1000, 100, *[1,2,4,8,16,32,64])
# 902366143258890463230784240045750280765827746908124462169947051257879292738672
The function used to count all restricted permutations looks very similar to the function that generates partitions above. The only significant change is in the following line:
result += binomial(k, i) * perm_counts(n - x * i, k - i, *xs[1:])
There are i copies of x to include and k possible positions where x's may end up. To account for this multiplicity, the number of ways to resolve the recursive sub-problem is multiplied by k choose i.
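As a small sanity check of that reasoning (assuming the functions above are defined as shown, and reusing the OP's original example of sample size 4, target sum 10, values [1, 2, 3]):
# 4 orderings of the partition 1+3+3+3 plus 6 orderings of 2+2+3+3 gives 10,
# matching the 10 permutations in the OP's sample output
assert perm_counts(10, 4, *[1, 2, 3]) == 10
assert sum(1 for _ in buildPerms2(4, 10, [1, 2, 3])) == 10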

Find The Parity Outlier, CodeWars question on python

I am trying to solve this problem but it doesn't give me the correct answer. Here is the problem:
You are given an array (which will have a length of at least 3, but could be very large) containing integers. The array is either entirely comprised of odd integers or entirely comprised of even integers except for a single integer N. Write a method that takes the array as an argument and returns this "outlier" N.
Here is my code:
a = [2, 4, 6, 8, 10, 3]
b = [2, 4, 0, 100, 4, 11, 2602, 36]
c = [160, 3, 1719, 19, 11, 13, -21]

def find_outlier(list_integers):
    for num in list_integers:
        if num % 2 != 0:
            odd = num
        elif num % 2 == 0:
            even = num
    for num in list_integers:
        if len([odd]) < len([even]):
            return odd
        else:
            return even

print(find_outlier(a))
print(find_outlier(b))
print(find_outlier(c))
It spits out 10, 36, 160 and obviously only the last one is correct. Can anyone help me out with it?
Thanks!
You could analyze the first three and find the outlier if it is there.
If it is there, you are done. If not, you know the expected parity and can test each subsequent element accordingly.
Creating lists for odd/even numbers, while in principle leading to a result, is unnecessarily memory inefficient.
In code this could look something like:
def find_outlier(seq):
    par0 = seq[0] % 2
    par1 = seq[1] % 2
    if par0 != par1:
        return seq[1] if seq[2] % 2 == par0 else seq[0]
    # the checks on the first 2 elements are redundant, but avoids copying
    # for x in seq[2:]: would do less iteration but will copy the input
    for x in seq:
        if x % 2 != par0:
            return x

a = [2, 4, 6, 8, 10, 3]
b = [2, 4, 0, 100, 4, 11, 2602, 36]
c = [160, 3, 1719, 19, 11, 13, -21]

print(find_outlier(a))
# 3
print(find_outlier(b))
# 11
print(find_outlier(c))
# 160
Your code could not work in its current form:
this block:
for num in list_integers:
    if num % 2 != 0:
        odd = num
    elif num % 2 == 0:
        even = num
will just leave the last odd number seen in odd and the last even number seen in even, with no information about how many of each were seen. You would need to count how many even/odd numbers there are, and store the first value encountered for each parity.
this second block
for num in list_integers:
    if len([odd]) < len([even]):
        return odd
    else:
        return even
is always comparing the lengths of two single-element lists, so it will always return even.
I see no simple way of fixing your code to give it efficiency comparable to the above solution. But you could adapt your code to make it reasonably efficient (O(n) in time -- but without short-circuiting, and O(1) in memory):
def find_outlier_2(seq):
    odd = None
    even = None
    n_odd = n_even = 0
    for x in seq:
        if x % 2 == 0:
            if even is None:  # save first occurrence
                even = x
            n_even += 1
        else:  # no need to compute the modulus again
            if odd is None:  # save first occurrence
                odd = x
            n_odd += 1
    if n_even > 1:
        return odd
    else:
        return even
The above is significantly more efficient than some of the other answers in that it does not create unnecessary lists.
For example, these solutions are unnecessarily memory-consuming (being O(n) in time and O(n) in memory):
def find_outlier_3(list_integers):
    odd = []
    even = []
    for num in list_integers:
        if num % 2 != 0:
            odd.append(num)
        elif num % 2 == 0:
            even.append(num)
    if len(odd) < len(even):
        return odd[0]
    else:
        return even[0]

def find_outlier_4(lst):
    odds = [el % 2 for el in lst]
    if odds.count(0) == 1:
        return lst[odds.index(0)]
    else:
        return lst[odds.index(1)]
Simple benchmarks show that these solutions are also slower:
%timeit [find_outlier(x) for x in (a, b, c) * 100]
# 10000 loops, best of 3: 128 µs per loop
%timeit [find_outlier_2(x) for x in (a, b, c) * 100]
# 1000 loops, best of 3: 229 µs per loop
%timeit [find_outlier_3(x) for x in (a, b, c) * 100]
# 1000 loops, best of 3: 341 µs per loop
%timeit [find_outlier_4(x) for x in (a, b, c) * 100]
# 1000 loops, best of 3: 248 µs per loop
You can nicely use list comprehensions for this:
a = [2, 4, 6, 8, 10, 3]
b = [2, 4, 0, 100, 4, 11, 2602, 36]
c = [160, 3, 1719, 19, 11, 13, -21]

def outlier(lst):
    odds = [el % 2 for el in lst]  # list with 1's when odd, 0's when even
    print(odds)  # just to show what odds contains
    if odds.count(0) == 1:  # if the amount of zeros (even numbers) = 1 in this list
        print(lst[odds.index(0)])  # find the index of the 'zero' and use it to read the value from the input lst
    else:
        print(lst[odds.index(1)])  # find the index of the 'one' and use it to read the value from the input lst

outlier(a)
outlier(b)
outlier(c)
Output
[0, 0, 0, 0, 0, 1] # only 1 'one' so use the position of that 'one'
3
[0, 0, 0, 0, 0, 1, 0, 0] # only 1 'one' so use the position of that 'one'
11
[0, 1, 1, 1, 1, 1, 1] # only 1 'zero' so use the position of that 'zero'
160
Count the number of odd values among the first three items in the list. This can be done using a sum(). If the sum > 1, the list has mostly odd numbers, so find the even outlier. Otherwise find the odd outlier.
def find_outlier(sequence):
    if sum(x & 1 for x in sequence[:3]) > 1:
        # find the even outlier
        for n in sequence:
            if not n & 1:
                return n
    else:
        # find the odd outlier
        for n in sequence:
            if n & 1:
                return n
I imagine it would be a bit more efficient to first determine if the outlier is odd or even by looking at a small sample, then return just the outlier using list comprehension. This way, if the list is massive, you won't have timeout issues.
Here's what I would do:
def findoutlier(yourlist):
    if ((yourlist[0] % 2 == 0 and yourlist[1] % 2 == 0)
            or (yourlist[0] % 2 == 0 and yourlist[2] % 2 == 0)
            or (yourlist[1] % 2 == 0 and yourlist[2] % 2 == 0)):
        oddeven = "even"
    else:
        oddeven = "odd"
    if oddeven == "even":
        return [i for i in yourlist if i % 2 != 0][0]
    else:
        return [i for i in yourlist if i % 2 == 0][0]

a = [2, 4, 6, 8, 10, 3]
b = [2, 4, 0, 100, 4, 11, 2602, 36]
c = [160, 3, 1719, 19, 11, 13, -21]

print(findoutlier(a))
print(findoutlier(b))
print(findoutlier(c))
This will return 3, 11, and 160 as expected.
You want to use a list to store your odd/even numbers. Right now you're storing them as int and they're getting replaced on your loop's next iteration.
def find_outlier(list_integers):
    odd = []
    even = []
    for num in list_integers:
        if num % 2 != 0:
            odd.append(num)
        elif num % 2 == 0:
            even.append(num)
    if len(odd) < len(even):
        return odd[0]
    else:
        return even[0]

Find the number of consecutively increasing elements in a list

I got a problem on TalentBuddy which goes like this:
A student's performance in lab activities should always improve, but that is not always the case.
Since progress is one of the most important metrics for a student, let’s write a program that computes the longest period of increasing performance for any given student.
For example, if his grades for all lab activities in a course are: 9, 7, 8, 2, 5, 5, 8, 7 then the longest period would be 4 consecutive labs (2, 5, 5, 8).
So far, I'm too confused to work out the code. The only thing I have is:
def longest_improvement(grades):
    res = 0
    for i in xrange(len(grades) - 2):
        while grades[i] <= grades[i + 1]:
            res += 1
            i += 1
    print res
But that prints 17, rather than 6 when grades = [1, 7, 2, 5, 6, 9, 11, 11, 1, 6, 1].
How to work out the rest of the code? Thanks
Solved with some old-fashioned tail-recursion:
grades = [1, 7, 2, 5, 6, 9, 11, 11, 1, 6, 1]

def streak(grades):
    def streak_rec(longest, challenger, previous, rest):
        if rest == []:  # Base case
            return max(longest, challenger)
        elif previous <= rest[0]:  # Streak continues
            return streak_rec(longest, challenger + 1, rest[0], rest[1:])
        else:  # Streak is reset
            return streak_rec(max(longest, challenger), 1, rest[0], rest[1:])
    return streak_rec(0, 0, 0, grades)

print streak(grades)  # => 6
print streak([2])     # => 1
Since the current solution involves yield and maps and additional memory overhead, it's probably a good idea to at least mention the simple solution:
def length_of_longest_sublist(lst):
    max_length, cur_length = 1, 1
    prev_val = lst[0]
    for val in lst[1:]:
        if val >= prev_val:
            cur_length += 1
        else:
            max_length = max(max_length, cur_length)
            cur_length = 1
        prev_val = val
    return max(max_length, cur_length)
We could reduce that code by getting the previous value directly:
def length_of_longest_sublist2(lst):
    max_length, cur_length = int(bool(lst)), int(bool(lst))
    for prev_val, val in zip(lst, lst[1:]):
        if val >= prev_val:
            cur_length += 1
        else:
            max_length = max(max_length, cur_length)
            cur_length = 1
    return max(max_length, cur_length)
which is a nice trick to know (and allows it to easily return the right result for an empty list), but confusing to people who don't know the idiom.
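For anyone who hasn't seen the idiom, a quick illustration with throwaway values of my own choosing:
lst = [9, 7, 8, 2, 5]
print(list(zip(lst, lst[1:])))  # [(9, 7), (7, 8), (8, 2), (2, 5)] -- adjacent pairs
empty = []
print(list(zip(empty, empty[1:])))  # [] -- no pairs at all, so the int(bool(lst))
                                    # initialisation keeps the result at 0 for []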
This method uses fairly basic Python, and the return statement can be quickly modified so that you get a list of all the streak lengths.
def longest_streak(grades):
    if len(grades) < 2:
        return len(grades)
    else:
        start, streaks = -1, []
        for idx, (x, y) in enumerate(zip(grades, grades[1:])):
            if x > y:
                streaks.append(idx - start)
                start = idx
            else:
                streaks.append(idx - start + 1)
        return max(streaks)
I would solve it this way:
from itertools import groupby
from funcy import pairwise, ilen

def streak(grades):
    if len(grades) <= 1:
        return len(grades)
    orders = (x <= y for x, y in pairwise(grades))
    return max(ilen(l) for asc, l in groupby(orders) if asc) + 1
Very explicit: orders is an iterator of Trues for ascending pairs and Falses for descending ones. Then we just need to find the longest run of ascending pairs and add 1.
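A quick walk-through on the OP's grades to make the two steps concrete (using plain zip in place of funcy's pairwise, which iterates the same adjacent pairs):
grades = [1, 7, 2, 5, 6, 9, 11, 11, 1, 6, 1]
orders = [x <= y for x, y in zip(grades, grades[1:])]
# [True, False, True, True, True, True, True, False, True, False]
# groupby clumps the consecutive True runs (lengths 1, 5 and 1);
# the longest run is 5 ascending pairs, i.e. 5 + 1 = 6 elements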
You're using the same res variable in each iteration of the inner while loop. You probably want to reset it, and keep the highest intermediate result in a different variable.
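To make that concrete, here is one possible rework of the OP's loop along those lines -- a sketch with my own names (longest_improvement, run, best), keeping the nested-loop shape but resetting a per-run counter and tracking the best run seen so far:
def longest_improvement(grades):
    best = 0
    i = 0
    while i < len(grades):
        run = 1                                    # fresh counter for each run
        while i + 1 < len(grades) and grades[i] <= grades[i + 1]:
            run += 1
            i += 1
        best = max(best, run)                      # keep the highest run seen so far
        i += 1
    return best

print(longest_improvement([1, 7, 2, 5, 6, 9, 11, 11, 1, 6, 1]))  # 6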
A little late, but here's my updated version:
from funcy import ilen, ireductions

def streak(last, x):
    if last and x >= last[-1]:
        last.append(x)
        return last
    return [x]

def longest_streak(grades):
    xs = map(ilen, ireductions(streak, grades, None))
    return xs and max(xs) or 1

grades = [1, 7, 2, 5, 6, 9, 11, 11, 1, 6, 1]
print longest_streak(grades)
print longest_streak([2])
I decided in the end not only to produce a correct version without bugs, but also to use a library I quite like: funcy :)
Output:
6
1
Maybe not as efficient as previous answers, but it's short :P
import numpy as np
from itertools import groupby

diffgrades = np.diff(grades)
maxlen = max([len(list(g)) for k, g in groupby(diffgrades, lambda x: x >= 0) if k]) + 1
Building on the idea of @M4rtini to use itertools.groupby.
def longest_streak(grades):
    from itertools import groupby
    if len(grades) > 1:
        streak = [x <= y for x, y in zip(grades, grades[1:])]
        return max([sum(g, 1) for k, g in groupby(streak) if k])
    else:
        return len(grades)

N random, contiguous and non-overlapping subsequences each of length

I'm trying to get n random and non-overlapping slices of a sequence where each subsequence is of length l, preferably in the order they appear.
This is the code I have so far; it's gotten more and more messy with each attempt to make it work, and needless to say it doesn't work.
import random

def rand_parts(seq, n, l):
    """
    return n random non-overlapping partitions each of length l.
    If n * l > len(seq) raise error.
    """
    if n * l > len(seq):
        raise Exception('length of seq too short for given n, l arguments')
    if not isinstance(seq, list):
        seq = list(seq)
    gaps = [0] * (n + 1)
    for g in xrange(len(seq) - (n * l)):
        gaps[random.randint(0, len(gaps) - 1)] += 1
    result = []
    for i, g in enumerate(gaps):
        x = g + (i * l)
        result.append(seq[x:x+l])
        if i < len(gaps) - 1:
            gaps[i] += x
    return result
For example if we say rand_parts([1, 2, 3, 4, 5, 6], 2, 2) there are 6 possible results that it could return, one for each way the two length-2 slices can be placed:
[[1, 2], [3, 4]], [[1, 2], [4, 5]], [[1, 2], [5, 6]],
[[2, 3], [4, 5]], [[2, 3], [5, 6]], [[3, 4], [5, 6]]
So [[3, 4], [5, 6]] would be acceptable but [[3, 4], [4, 5]] wouldn't because it's overlapping and [[2, 4], [5, 6]] also wouldn't because [2, 4] isn't contiguous.
I encountered this problem while doing a little code golfing so for interests sake it would also be nice to see both a simple solution and/or an efficient one, not so much interested in my existing code.
import random

def rand_parts(seq, n, l):
    indices = xrange(len(seq) - (l - 1) * n)
    result = []
    offset = 0
    for i in sorted(random.sample(indices, n)):
        i += offset
        result.append(seq[i:i+l])
        offset += l - 1
    return result
To understand this, first consider the case l == 1. Then it's basically just returning a random.sample() of the input data in sorted order; in this case the offset variable is always 0.
The case where l > 1 is an extension of the previous case. We use random.sample() to pick up positions, but maintain an offset to shift successive results: in this way, we make sure that they are non-overlapping ranges --- i.e. they start at a distance of at least l of each other, rather than 1.
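To make the offset bookkeeping concrete, here is a hypothetical trace (not output from actually running the code) for rand_parts([1, 2, 3, 4, 5, 6], 2, 2):
# indices = xrange(4); suppose random.sample returns [0, 2] after sorting
#   i = 0, offset = 0 -> take seq[0:2] == [1, 2]; offset becomes 1
#   i = 2, offset = 1 -> i shifts to 3, take seq[3:5] == [4, 5]; offset becomes 2
# shifting each later pick by offset (l - 1 per slice already taken) is what keeps
# the slices from overlapping while preserving their left-to-right order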
Many solutions can be hacked for this problem, but one has to be careful if the sequences are to be strictly random. For example, it's wrong to begin by picking a random number between 0 and len(seq)-n*l and say that the first sequence will start there, then work recursively.
The problem is equivalent to selecting randomly n+1 integer numbers such that their sum is equal to len(seq)-l*n. (These numbers will be the "gaps" between your sequences.) To solve it, you can see this question.
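For completeness, here is a minimal sketch of that gap-based formulation (my own code and names, not taken from the linked question): draw the n+1 gaps uniformly with a stars-and-bars sample, then read the slices off.
import random

def rand_parts_gaps(seq, n, l):
    """Pick n non-overlapping, in-order slices of length l by first choosing
    the n+1 gaps between them, uniformly over all valid configurations."""
    m = len(seq) - n * l                       # total slack to spread across the gaps
    if m < 0:
        raise ValueError('sequence too short for given n, l')
    # stars and bars: a sorted sample of n cut points from m + n slots
    # corresponds one-to-one with a composition of m into n + 1 gaps
    cuts = sorted(random.sample(range(m + n), n))
    bounds = [-1] + cuts + [m + n]
    gaps = [hi - lo - 1 for lo, hi in zip(bounds, bounds[1:])]
    result, pos = [], 0
    for g in gaps[:-1]:                        # the last gap is trailing slack
        pos += g
        result.append(seq[pos:pos + l])
        pos += l
    return result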
This worked for me in Python 3.3.2. It should be backwards compatible with Python 2.7.
from random import randint as r

def greater_than(n, lis, l):
    for element in lis:
        if n < element + l:
            return False
    return True

def rand_parts(seq, n, l):
    """
    return n random non-overlapping partitions each of length l.
    If n * l > len(seq) raise error.
    """
    if n * l > len(seq):
        raise(Exception('length of seq too short for given n, l arguments'))
    if not isinstance(seq, list):
        seq = list(seq)
    # Setup
    left_to_do = n
    tried = []
    result = []
    # The main loop
    while left_to_do > 0:
        while True:
            index = r(0, len(seq) - 1)
            if greater_than(index, tried, l) and index <= len(seq) - left_to_do * l:
                tried.append(index)
                break
        left_to_do -= 1
        result.append(seq[index:index+l])
    # Done
    return result

a = [1, 2, 3, 4, 5, 6]
print(rand_parts(a, 3, 2))
The above code will always print [[1, 2], [3, 4], [5, 6]]
If you do it recursively it's much simpler. Take the first part from (so the rest will fit):
[0:total_len - (number_of_parts - 1) * (len_of_parts)]
and then recurse with what's left to do:
rand_parts(seq - beginning_to_end_of_part_you_grabbed, n - 1, l)
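A short sketch of that recursive idea (my own wording and names; note the earlier answer's caveat that picking the first start uniformly like this does not give a uniform distribution over all valid configurations):
import random

def rand_parts_rec(seq, n, l):
    if n == 0:
        return []
    # the first slice may start anywhere that still leaves room for the other n-1 slices
    start = random.randint(0, len(seq) - n * l)
    first = seq[start:start + l]
    # recurse on what is left after the part we grabbed
    return [first] + rand_parts_rec(seq[start + l:], n - 1, l)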
First of all, I think you need to clarify what you mean by the term random.
How can you generate a truly random list of sub-sequences when you are placing specific restrictions on the sub-sequences themselves?
As far as I know, the best "randomness" anyone can achieve in this context is generating all lists of sub-sequences that satisfy your criteria, and selecting from the pool however many you need in a random fashion.
Now, based on my experience from an algorithms class I took a few years ago, your problem seems to be a typical example that could be solved with a greedy algorithm, making these big (but likely?) assumptions about what you were actually asking in the first place:
What you actually meant by random is not that a list of sub-sequence should be generated randomly (which is kind of contradictory as I said before), but that any of the solutions that could be produced is just as valid as the rest (e.g. any of the 6 solutions is valid from input [1,2,3,4,5,6] and you don't care which one)
Restating the above, you just want any one of the possible solutions that could be generated, and you want an algorithm that can output one of these valid answers.
Assuming the above, here is a greedy algorithm which generates one of the possible lists of sub-sequences in linear time (excluding sorting, which is O(n*log(n))):
def subseq(seq, count, length):
    s = sorted(list(set(seq)))
    result = []
    subseq = []
    for n in s:
        if len(subseq) == length:
            result.append(subseq)
            if len(result) == count:
                return result
            subseq = [n]
        elif len(subseq) == 0:
            subseq.append(n)
        elif subseq[-1] + 1 == n:
            subseq.append(n)
        elif subseq[-1] + 1 < n:
            subseq = [n]
    print("Impossible!")
The gist of the algorithm is as follows:
One of your requirements is that there cannot be any overlaps, and this ultimately implies you need to deal with unique numbers only. So I use the set() operation to get rid of all the duplicates. Then I sort it.
The rest is pretty straightforward imo. I just iterate over the sorted list and form sub-sequences greedily.
If the algorithm can't form enough number of sub-sequences then print "Impossible!"
Hope this was what you were looking for.
EDIT: For some reason I wrongly assumed that there couldn't be repeating values in a sub-sequence, this one allows it.
def subseq2(seq, count, length):
    s = sorted(seq)
    result = []
    subseq = []
    for n in s:
        if len(subseq) == length:
            result.append(subseq)
            if len(result) == count:
                return result
            subseq = [n]
        elif len(subseq) == 0:
            subseq.append(n)
        elif subseq[-1] + 1 == n or subseq[-1] == n:
            subseq.append(n)
        elif subseq[-1] + 1 < n:
            subseq = [n]
    print("Impossible!")
