Python: parsing a string of concatenated ascending integers

The objective is to parse the output of an ill-behaving program which concatenates a list of numbers, e.g., 3, 4, 5, into a string "345", without any separator between the numbers. I also know that the list is sorted in ascending order.
I came up with the following solution which reconstructs the list from a string:
a = '3456781015203040'
numlist = []
numlist.append(int(a[0]))
i = 1
while True:
    j = 1
    while True:
        if int(a[i:i+j]) <= numlist[-1]:
            j = j + 1
        else:
            numlist.append(int(a[i:i+j]))
            i = i + j
            break
    if i >= len(a):
        break
This works, but I have a feeling that the solution reflects too strongly the fact that I was trained in Pascal decades ago. Is there a better or more pythonic way to do it?
I am aware that the problem is ill-posed, i.e., I could start with '34' as the initial element and get a different solution (or possibly end up with trailing digits that don't form a valid next element of the list).

This finds solutions for all possible initial number lengths:
a = '3456781015203040'
def numbers(a, n):
    current_num, i = 0, 0
    while True:
        while i+n <= len(a) and int(a[i:i+n]) <= current_num:
            n += 1
        if i+n <= len(a):
            current_num = int(a[i:i+n])
            yield current_num
            i += n
        else:
            return

for n in range(1, len(a)):
    l = list(numbers(a, n))
    # print only solutions that use up all digits of a
    if ''.join(map(str, l)) == a:
        print(l)
Output:
[3, 4, 5, 6, 7, 8, 10, 15, 20, 30, 40]
[34, 56, 78, 101, 520, 3040]
[34567, 81015, 203040]
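Both loops above follow a single greedy path per starting length. A different technique, recursive backtracking, enumerates every way to split the string into strictly increasing numbers. This is a sketch, not the poster's method: `ascending_splits` is a hypothetical name, and it assumes no number is written with a leading zero (a lone "0" is allowed).

```python
def ascending_splits(s, prev=-1):
    """Yield every split of digit string s into strictly increasing integers.

    Assumption: no number is written with a leading zero (a lone "0" is allowed).
    """
    if not s:
        yield []  # all digits consumed: one complete split
        return
    for end in range(1, len(s) + 1):
        head = s[:end]
        if len(head) > 1 and head[0] == '0':
            break  # every longer prefix would also have a leading zero
        value = int(head)
        if value > prev:  # must be strictly larger than the previous number
            for rest in ascending_splits(s[end:], value):
                yield [value] + rest


for split in ascending_splits('345'):
    print(split)  # [3, 4, 5], then [3, 45], then [345]

text = '3456781015203040'
print(max(ascending_splits(text), key=len))  # the split with the most numbers
```

Taking `max(..., key=len)` is one way to pick a "best" answer; the number of candidate splits can grow exponentially with the string length, so this suits short inputs.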

A small modification that allows parsing data such as "7000000000001" and gives the best output (max list size):
a = '30000001'
def numbers(a, n):
    current_num, i = 0, 0
    while True:
        while i+n <= len(a) and int(a[i:i+n]) <= current_num:
            n += 1
        if i+2*n > len(a):
            # not enough digits left for two more numbers: take the rest
            current_num = int(a[i:])
            yield current_num
            return
        elif i+n <= len(a):
            current_num = int(a[i:i+n])
            yield current_num
            i += n
        else:
            return

for n in range(1, len(a)):
    l = list(numbers(a, n))
    if ''.join(map(str, l)) == a:
        print(l)

Related

Find large number in a list, where all previous numbers are also in the list

I am trying to implement a Yellowstone Integer calculation, a sequence for which "Every number appears exactly once: this is a permutation of the positive numbers". The code I have implemented to derive the values is as follows:
import math
yellowstone_list = []
item_list = []
i = 0
while i <= 1000:
    if i <= 3:
        yellowstone_list.append(i)
    else:
        j = 1
        inList = 1
        while inList == 1:
            minus_1 = math.gcd(j, yellowstone_list[i-1])
            minus_2 = math.gcd(j, yellowstone_list[i-2])
            if minus_1 == 1 and minus_2 > 1:
                if j in yellowstone_list:
                    inList = 1
                else:
                    inList = 0
            j += 1
        yellowstone_list.append(j - 1)
    item_list.append(i)
    i += 1
The problem is that as i increases, the time the loop takes to find j also increases (naturally, since the next value is increasingly far from j's starting point of 1).
What I would like to do is determine the largest value of j in yellowstone_list such that all the values from 0 to j are already in the list.
As an example, in the list below j would be 9, as all the values 0 to 9 are in the list:
yellowstone_list = [0, 1, 2, 3, 4, 9, 8, 15, 14, 5, 6, 25, 12, 35, 16, 7]
Any suggestions on how to implement this in an efficient manner?
For the "standalone" problem as stated, the algorithm would be:
1. Sort the list.
2. Run a counter from 0 while traversing the list in parallel. As soon as the counter differs from the list element, the counter is one past the wanted value.
Something like the following:
x = [0, 1, 2, 3, 4, 9, 8, 15, 14, 5, 6, 25, 12, 35, 16, 7]
y = sorted(x)
for i in range(1, len(y)):
    if y[i] != i:
        print(i - 1)
        break
But in your case the list appears to be built gradually. Each time a number is added, it can be inserted in sorted position and compared against its neighbour, and the scan can resume from the last position found instead of starting again from 0, which is more efficient.
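The incremental idea can be sketched as follows. This is a variant under stated assumptions, not the answer's exact method: it swaps the sorted list for a set plus a pointer `j` that only moves forward, and keeps the question's convention of starting the sequence with 0, 1, 2, 3. `yellowstone_with_prefix` is a hypothetical name.

```python
import math


def yellowstone_with_prefix(count):
    """Return (terms, j): the first `count` terms and the largest j such that
    every value 0..j appears among them.

    Assumptions: the sequence starts 0, 1, 2, 3 as in the question, and a set
    plus a forward-only pointer replaces the sorted-list scan.
    """
    terms = [0, 1, 2, 3][:count]
    seen = set(terms)
    j = -1
    while j + 1 in seen:
        j += 1
    while len(terms) < count:
        last, before_last = terms[-1], terms[-2]
        # every value <= j is already used, so the search can start at j + 1
        candidate = j + 1
        while (candidate in seen
               or math.gcd(candidate, last) != 1
               or math.gcd(candidate, before_last) == 1):
            candidate += 1
        terms.append(candidate)
        seen.add(candidate)
        # j never moves backwards, so this costs O(1) amortised per term
        while j + 1 in seen:
            j += 1
    return terms, j


terms, j = yellowstone_with_prefix(16)
print(terms)  # [0, 1, 2, 3, 4, 9, 8, 15, 14, 5, 6, 25, 12, 35, 16, 7]
print(j)      # 9
```

Starting the search at `j + 1` also addresses the slowdown described in the question, since the candidate loop no longer re-tests small values that are known to be used.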
This is how I would do it:
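# lst is the list being checked, e.g. yellowstone_list
lst.sort()
for c, i in enumerate(lst):
    if c + 1 < len(lst) and lst[c + 1] != i + 1:
        j = i
        break
else:
    # no gap found: every value up to the last element is present
    j = i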
The list is sorted first; the loop then walks through each value, checking whether the next value is exactly 1 greater than the current one.
After sitting down to think about it, and using the suggestions to sort the list, I came up with two solutions:
Sorting
I implemented @Eugene Sh.'s solution within the while i <= 1000 loop as follows:
while i <= 1000:
    m = sorted(yellowstone_list)
    for n in range(1, len(m)):
        if m[n] != n:
            break
    if i == 0:
        ....
In List
I incremented a counter and checked whether each value was in the list using the "in" operator, also within the while i <= 1000 loop, as follows:
while i <= 1000:
    while k in yellowstone_list:
        k += 1
    if i == 0:
        ....
Running each version 100 times gave the following:
Sorting: total 1:56.403527 (min:s), average 1.164035 seconds.
In List: total 1:14.225230 (min:s), average 0.742252 seconds.

Automatically generate list from math function?

My idea is to run the 3n + 1 process (Collatz conjecture) on numbers ending in 1, 3, 7, and 9 within an arbitrary range, and to have the code store the length of each sequence in a list, so I can run functions on that list separately.
So far I select the unit digits 1, 3, 7 and 9 with if n % 10 == 1, if n % 10 == 3, etc., and I think my plan needs some form of nested for loops. For the list appending, I have temp = [] and leng = [] and am looking for a way for the code to call temp.clear() automatically before each new entry in leng. I assume there are different ways to do this, and I'm open to any ideas.
leng = []
temp = []
def col(n):
    while n != 1:
        print(n)
        temp.append(n)
        if n % 2 == 0:
            n = n // 2
        else:
            n = n * 3 + 1
    temp.append(n)
    print(n)
It's unclear what specifically you're asking, so this is only a guess. Since you only want to know the lengths of the sequences, there's no need to actually save the numbers in each one, which means only one list is created.
def collatz(n):
    """ Return the number of steps in the Collatz sequence beginning with
        positive integer "n".
    """
    count = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else n*3 + 1
        count += 1
    return count

def process_range(start, stop):
    """ Return a list of the results of calling collatz() on all the numbers
        in the closed interval [start...stop] that end with a digit in the
        set {1, 3, 7, 9}.
    """
    return [collatz(n) for n in range(start, stop+1) if n % 10 in {1, 3, 7, 9}]

print(process_range(1, 42))
Output:
[0, 7, 16, 19, 14, 9, 12, 20, 7, 15, 111, 18, 106, 26, 21, 34, 109]
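If the full sequences are wanted as well (the question's temp list), building a fresh list inside the function removes the need for temp.clear(). A sketch; collatz_terms and sequences are hypothetical names. Note that len() counts terms, including the start value and the final 1, so each entry in leng is one more than the step count returned by collatz above.

```python
def collatz_terms(n):
    """Return the Collatz sequence starting at positive integer n, ending with 1."""
    terms = [n]  # a new list on every call, so nothing has to be cleared
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        terms.append(n)
    return terms


# Start values in [1, 42] whose last digit is 1, 3, 7 or 9.
sequences = {n: collatz_terms(n) for n in range(1, 43) if n % 10 in {1, 3, 7, 9}}
leng = [len(seq) for seq in sequences.values()]
print(sequences[3])  # [3, 10, 5, 16, 8, 4, 2, 1]
print(leng)
```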

if a number is divisible by all the entries of a list then

This came up while attempting Problem 5 of Project Euler. I'm sorry if this is vague or obvious; I am new to programming.
Suppose I have a list of integers
v = range(1, n+1) = [1, ..., n]
What I want to do is this:
if m is divisible by all the entries of v, then I want to set
m = m/v[i], where v[i] starts at 2 and moves up through the list,
then I want to keep repeating this process until I eventually get something which is not divisible by all the entries of v.
Here is a specific example:
let v=[1,2,3,4] and m = 24
m is divisible by 1, 2, 3, and 4, so we divide m by 2 giving us
m=12 which is divisible by 1, 2, 3, and 4 , so we divide by 3
giving us m=4, which is not divisible by all of 1, 2, 3, and 4 (it is not divisible by 3). So we stop here.
Is there a way to do this in python using a combination of loops?
I think this code will solve your problem:
i = 1
while True:
    w = [x for x in v if m % x == 0]
    # continue only while every entry divides m and there is a next divisor
    if w == list(v) and i < len(v):
        m //= v[i]
        i += 1
    else:
        break
Try this on for size; I have a feeling this is what you were asking for:
v = [1, 2, 3, 4]
m = 24
cont = True
c = 1
d = m
while cont:
    d = d // c
    for i in v:
        if d % i != 0:
            cont = False
            result = d
            break
    c += 1
print(d)
Got an output of 4.
I think this piece of code should do what you're asking for:
v = [1, 2, 3, 4]
m = 24
index = 1
done = False
while not done:
    if all([m % x == 0 for x in v]):
        m = m // v[index]
        if index + 1 == len(v):
            print('Exhausted v')
            done = True
        else:
            index += 1
    else:
        done = True
        print('Not all elements in v evenly divide m')
That said, this is not the best way to go about solving Project Euler problem 5. A more straightforward and faster approach would be:
solved = False
num = 2520
while not solved:
    num += 2520
    if all([num % x == 0 for x in [11, 13, 14, 16, 17, 18, 19, 20]]):
        solved = True
print(num)
In this approach, we know that the answer must be a multiple of 2520 (the smallest number divisible by each of 1 through 10), so we increment the value we're checking by that amount. We also only need to check the values in [11, 13, 14, 16, 17, 18, 19, 20], because every other number in [1, 20] (1 through 10, 12 and 15) divides 2520, so any multiple of 2520 is already divisible by it.
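Instead of stepping through multiples of 2520, the answer can also be computed directly as the least common multiple of 1..20, folding lcm(a, b) = a * b // gcd(a, b) over the range. This is a sketch with illustrative helper names (lcm, smallest_multiple), not code from the thread; on Python 3.9+ math.lcm(*range(1, 21)) does the same in one call.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def smallest_multiple(n):
    """Smallest positive number evenly divisible by every integer 1..n."""
    return reduce(lcm, range(1, n + 1), 1)


print(smallest_multiple(10))  # 2520, the example from the problem statement
print(smallest_multiple(20))  # 232792560
```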

Is this Longest Increasing Subsequence Correct?

I just wrote this implementation to find the length of the longest increasing subsequence (LIS) using dynamic programming. For the input [10, 22, 9, 33, 21, 50, 41, 60, 80] the LIS has length 6, and one such subsequence is [10, 22, 33, 50, 60, 80].
When I run the code below I get the correct answer, 6, with O(n) complexity. Is it correct?
def lis(a):
    dp_lis = []
    curr_index = 0
    prev_index = 0
    for i in range(len(a)):
        prev_index = curr_index
        curr_index = i
        print 'if: %d < %d and %d < %d' % (prev_index, curr_index, a[prev_index], a[curr_index])
        if prev_index < curr_index and a[prev_index] < a[curr_index]:
            print '\tadd ELEMENT: ', a[curr_index]
            new_lis = 1 + max(dp_lis)
            dp_lis.append(new_lis)
        else:
            print '\telse ELEMENT: ', a[curr_index]
            dp_lis.append(1)
    print "DP LIST: ", dp_lis
    return max(dp_lis)

if __name__ == '__main__':
    a = [10, 22, 9, 33, 21, 50, 41, 60, 80]
    print lis(a)
Use this correct, proven but inefficient implementation of the algorithm to check against your results - it's the standard recursive solution, it doesn't use dynamic programming:
def lis(nums):
    def max_length(i):
        if i == -1:
            return 0
        maxLen, curLen = 0, 0
        for j in xrange(i-1, -1, -1):
            if nums[j] < nums[i]:
                curLen = max_length(j)
                if curLen > maxLen:
                    maxLen = curLen
        return 1 + maxLen
    if not nums:
        return 0
    return max(max_length(x) for x in xrange(len(nums)))
Check that your_lis(nums) == my_lis(nums) for as many input lists of different sizes as possible; they should be equal. For long lists, my implementation will eventually be far slower than yours.
As a further comparison point, here's my own optimized dynamic programming solution. It runs in O(n log k) time (where k is the length of the LIS) and O(n) space, and returns an actual longest increasing subsequence rather than just its length:
def an_lis(nums):
    table, lis = lis_table(nums), []
    for i in xrange(len(table)):
        lis.append(nums[table[i]])
    return lis

def lis_table(nums):
    if not nums:
        return []
    table, preds = [0], [0] * len(nums)
    for i in xrange(1, len(nums)):
        if nums[table[-1]] < nums[i]:
            preds[i] = table[-1]
            table.append(i)
            continue
        minIdx, maxIdx = 0, len(table)-1
        while minIdx < maxIdx:
            mid = (minIdx + maxIdx) // 2
            if nums[table[mid]] < nums[i]:
                minIdx = mid + 1
            else:
                maxIdx = mid
        if nums[i] < nums[table[minIdx]]:
            if minIdx > 0:
                preds[i] = table[minIdx-1]
            table[minIdx] = i
    current, i = table[-1], len(table)
    while i:
        i -= 1
        table[i], current = current, preds[current]
    return table
I implement dynamic programming algorithms fairly often.
I have found that the best way to check for correctness is to write a brute-force version of the algorithm and compare the output with the dynamic programming implementation on small examples.
If the output of the two versions agree, then you have reasonable confidence of correctness.
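Following that advice, a small randomized cross-check might look like this. It is a sketch: lis_brute_force and lis_dp are illustrative names, the DP is the standard O(n^2) formulation rather than either implementation above, and it is written for Python 3. A check like this quickly exposes the question's function: for [5, 6, 1, 2] it returns 3, while the longest increasing subsequence has length 2.

```python
import itertools
import random


def lis_brute_force(nums):
    """Length of the longest strictly increasing subsequence, by trying every subset."""
    for size in range(len(nums), 0, -1):
        # combinations() keeps the original order of the elements
        for combo in itertools.combinations(nums, size):
            if all(x < y for x, y in zip(combo, combo[1:])):
                return size
    return 0


def lis_dp(nums):
    """Standard O(n^2) DP: best[i] is the LIS length ending at index i."""
    best = []
    for i, x in enumerate(nums):
        best.append(1 + max((best[j] for j in range(i) if nums[j] < x), default=0))
    return max(best, default=0)


random.seed(0)
for _ in range(200):
    nums = [random.randint(0, 9) for _ in range(random.randint(0, 8))]
    assert lis_dp(nums) == lis_brute_force(nums), nums

print(lis_dp([10, 22, 9, 33, 21, 50, 41, 60, 80]))  # 6
```

Small value ranges (0 to 9 here) are deliberate: they produce many duplicates, which is where "strictly increasing" bugs tend to hide.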

Python - Memoization and Collatz Sequence

When I was struggling with Problem 14 of Project Euler, I discovered that I could use a technique called memoization to speed up my program (I let it run for a good 15 minutes, and it still hadn't returned an answer). The thing is, how do I implement it? I've tried, but I get a KeyError (the value being returned is invalid). This bugs me because I am positive I can apply memoization here and make it faster.
lookup = {}
def countTerms(n):
    arg = n
    count = 1
    while n is not 1:
        count += 1
        if not n%2:
            n /= 2
        else:
            n = (n*3 + 1)
        if n not in lookup:
            lookup[n] = count
    return lookup[n], arg

print max(countTerms(i) for i in range(500001, 1000000, 2))
Thanks.
There is also a nice recursive way to do this, which probably will be slower than poorsod's solution, but it is more similar to your initial code, so it may be easier for you to understand.
lookup = {}
def countTerms(n):
    if n not in lookup:
        if n == 1:
            lookup[n] = 1
        elif not n % 2:
            lookup[n] = countTerms(n / 2)[0] + 1
        else:
            lookup[n] = countTerms(n*3 + 1)[0] + 1
    return lookup[n], n

print max(countTerms(i) for i in range(500001, 1000000, 2))
The point of memoising, for the Collatz sequence, is to avoid calculating parts of the list that you've already done. The remainder of a sequence is fully determined by the current value. So we want to check the table as often as possible, and bail out of the rest of the calculation as soon as we can.
def collatz_sequence(start, table={}):  # cheeky trick: store the (mutable) table as a default argument
    """Returns the Collatz sequence for a given starting number"""
    l = []
    n = start
    while n not in l:  # break if we find ourself in a cycle
        # (don't assume the Collatz conjecture!)
        if n in table:
            l += table[n]
            break
        elif n%2 == 0:
            l.append(n)
            n = n//2
        else:
            l.append(n)
            n = (3*n) + 1
    table.update({n: l[i:] for i, n in enumerate(l) if n not in table})
    return l
Is it working? Let's spy on it to make sure the memoised elements are being used:
class NoisyDict(dict):
    def __getitem__(self, item):
        print("getting", item)
        return dict.__getitem__(self, item)

def collatz_sequence(start, table=NoisyDict()):
    # etc
In [26]: collatz_sequence(5)
Out[26]: [5, 16, 8, 4, 2, 1]
In [27]: collatz_sequence(5)
getting 5
Out[27]: [5, 16, 8, 4, 2, 1]
In [28]: collatz_sequence(32)
getting 16
Out[28]: [32, 16, 8, 4, 2, 1]
In [29]: collatz_sequence.__defaults__[0]
Out[29]:
{1: [1],
2: [2, 1],
4: [4, 2, 1],
5: [5, 16, 8, 4, 2, 1],
8: [8, 4, 2, 1],
16: [16, 8, 4, 2, 1],
32: [32, 16, 8, 4, 2, 1]}
Edit: I knew it could be optimised! The secret is that there are two places in the function (the two return points) where we know l and table share no elements. Previously I avoided calling table.update with elements already in table by testing each one; this version of the function instead exploits our knowledge of the control flow, saving lots of time.
[collatz_sequence(x) for x in range(500001, 1000000)] now runs in about 2 seconds on my computer, while a similar expression with @welter's version clocks in at 400 ms. I think this is because the functions don't compute the same thing: my version generates the whole sequence, while @welter's only finds its length. So I don't think I can get my implementation down to the same speed.
def collatz_sequence(start, table={}):  # cheeky trick: store the (mutable) table as a default argument
    """Returns the Collatz sequence for a given starting number"""
    l = []
    n = start
    while n not in l:  # break if we find ourself in a cycle
        # (don't assume the Collatz conjecture!)
        if n in table:
            table.update({x: l[i:] for i, x in enumerate(l)})
            return l + table[n]
        elif n%2 == 0:
            l.append(n)
            n = n//2
        else:
            l.append(n)
            n = (3*n) + 1
    table.update({x: l[i:] for i, x in enumerate(l)})
    return l
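PS - spot the bug! (Answer: at the early return, each x is stored as l[i:] without the memoised tail table[n], so those table entries are truncated; storing l[i:] + table[n] fixes it.)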
This is my solution to PE14:
memo = {1:1}
def get_collatz(n):
    if n in memo: return memo[n]
    if n % 2 == 0:
        terms = get_collatz(n/2) + 1
    else:
        terms = get_collatz(3*n + 1) + 1
    memo[n] = terms
    return terms

compare = 0
for x in xrange(1, 999999):
    if x not in memo:
        ctz = get_collatz(x)
        if ctz > compare:
            compare = ctz
            culprit = x
print culprit
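The bail-out idea described earlier (stop as soon as the chain reaches a value whose length is already known) can also be applied to lengths alone, without recursion. A sketch in Python 3; longest_collatz_start is a hypothetical name, only start values below the limit are cached (in a plain list), and lengths count terms, so the chain for 1 has length 1. It assumes limit is at least 2.

```python
def longest_collatz_start(limit):
    """Return (start, length) for the start value below `limit` whose Collatz
    chain has the most terms; ties keep the smaller start.

    Sketch: lengths are memoised only for values below `limit`, in a list.
    """
    cache = [0] * limit  # cache[n] == 0 means "length not known yet"
    cache[1] = 1
    best_start, best_len = 1, 1
    for start in range(2, limit):
        n, steps = start, 0
        # walk until we reach a value whose chain length is already cached
        while n >= limit or cache[n] == 0:
            n = n // 2 if n % 2 == 0 else 3 * n + 1
            steps += 1
        length = steps + cache[n]
        cache[start] = length
        if length > best_len:
            best_start, best_len = start, length
    return best_start, best_len


print(longest_collatz_start(10))  # (9, 20)
```

Because every start below the current one is already cached, each walk ends as soon as the chain drops below its start value.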
