Is this Longest Common Subsequence Correct?

I just wrote this implementation to find the length of the longest increasing subsequence (LIS) using dynamic programming. For the input [10, 22, 9, 33, 21, 50, 41, 60, 80], the length of the LIS is 6, and one such subsequence is [10, 22, 33, 50, 60, 80].
When I run the code below I get the correct answer, 6, with O(n) complexity. Is it correct?
def lis(a):
    dp_lis = []
    curr_index = 0
    prev_index = 0
    for i in range(len(a)):
        prev_index = curr_index
        curr_index = i
        print 'if: %d < %d and %d < %d' % (prev_index, curr_index, a[prev_index], a[curr_index])
        if prev_index < curr_index and a[prev_index] < a[curr_index]:
            print '\tadd ELEMENT: ', a[curr_index]
            new_lis = 1 + max(dp_lis)
            dp_lis.append(new_lis)
        else:
            print '\telse ELEMENT: ', a[curr_index]
            dp_lis.append(1)
    print "DP LIST: ", dp_lis
    return max(dp_lis)
if __name__ == '__main__':
    a = [10, 22, 9, 33, 21, 50, 41, 60, 80]
    print lis(a)

Use this correct, proven but inefficient implementation of the algorithm to check against your results - it's the standard recursive solution, it doesn't use dynamic programming:
def lis(nums):
    def max_length(i):
        if i == -1:
            return 0
        maxLen, curLen = 0, 0
        for j in xrange(i-1, -1, -1):
            if nums[j] < nums[i]:
                curLen = max_length(j)
                if curLen > maxLen:
                    maxLen = curLen
        return 1 + maxLen
    if not nums:
        return 0
    return max(max_length(x) for x in xrange(len(nums)))
Check that your_lis(nums) == my_lis(nums) for as many input lists of different sizes as possible; the results should be equal. For long lists, my implementation will be far slower than yours.
As a further comparison point, here's my own optimized dynamic programming solution. It runs in O(n log k) time and O(n) space, where k is the length of the longest increasing subsequence, and it returns an actual longest increasing subsequence that it finds along the way:
def an_lis(nums):
    table, lis = lis_table(nums), []
    for i in xrange(len(table)):
        lis.append(nums[table[i]])
    return lis
def lis_table(nums):
    if not nums:
        return []
    table, preds = [0], [0] * len(nums)
    for i in xrange(1, len(nums)):
        if nums[table[-1]] < nums[i]:
            preds[i] = table[-1]
            table.append(i)
            continue
        minIdx, maxIdx = 0, len(table)-1
        while minIdx < maxIdx:
            mid = (minIdx + maxIdx) / 2
            if nums[table[mid]] < nums[i]:
                minIdx = mid + 1
            else:
                maxIdx = mid
        if nums[i] < nums[table[minIdx]]:
            if minIdx > 0:
                preds[i] = table[minIdx-1]
            table[minIdx] = i
    current, i = table[-1], len(table)
    while i:
        i -= 1
        table[i], current = current, preds[current]
    return table

I implement dynamic programming algorithms fairly often.
I have found that the best way to check for correctness is to write a brute-force version of the algorithm and compare the output with the dynamic programming implementation on small examples.
If the output of the two versions agree, then you have reasonable confidence of correctness.
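For the longest increasing subsequence problem in the question, the comparison described above might look like the following. This is only a minimal Python 3 sketch: the names lis_brute_force, lis_dp and check, as well as the sizes of the random inputs, are illustrative assumptions and do not come from the posts above.

```python
import random
from itertools import combinations


def lis_brute_force(nums):
    """Try every subsequence, longest first; exponential, small inputs only."""
    for size in range(len(nums), 0, -1):
        for combo in combinations(nums, size):
            # combinations() keeps the elements in their original order
            if all(x < y for x, y in zip(combo, combo[1:])):
                return size
    return 0


def lis_dp(nums):
    """Classic O(n^2) dynamic programming: best[i] = LIS length ending at i."""
    best = []
    for i, value in enumerate(nums):
        longest_before = max(
            (best[j] for j in range(i) if nums[j] < value), default=0
        )
        best.append(longest_before + 1)
    return max(best, default=0)


def check(trials=200, max_len=8, seed=0):
    """Compare both versions on many small random lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(0, 9) for _ in range(rng.randint(0, max_len))]
        assert lis_brute_force(nums) == lis_dp(nums), nums
    return True
```

Both functions return 6 for the list in the question. A short input such as [5, 6, 1, 2], whose LIS has length 2, is the kind of case such a check catches: tracing the code from the question by hand on it gives 3.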

Related

What is the problem in this binary search?

What is the problem in this binary search Python code?
I've tried using this binary search code, with highs and lows, but I couldn't get it to work. Please tell me where I am going wrong.
def binsearch(arr, n):
    t = len(arr) // 2
    if arr[t] == n:
        print("number found at %d"%(t))
    elif arr[t] > n:
        binsearch(arr[:t-1], n)
    elif arr[t] < n:
        binsearch(arr[t+1:], n)
    else:
        print("num not found")
arr = [12, 24, 32, 39, 45, 50, 54]
n = 32
binsearch(arr, n)

Your first elif condition binsearch(arr[:t-1],n) will omit the t-1 index. That is not what you want.
You should use binsearch(arr[:t],n)
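A quick illustration of the slicing difference, using the array from the question (a small sketch; the names dropped and kept are only for illustration):

```python
arr = [12, 24, 32, 39, 45, 50, 54]
t = len(arr) // 2          # t == 3, arr[t] == 39

dropped = arr[:t - 1]      # stops before index t-1, so 32 is lost
kept = arr[:t]             # everything strictly left of the midpoint

print(dropped)             # [12, 24]
print(kept)                # [12, 24, 32]
```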

There are several problems here:
The else block is unreachable, because if/elif/elif covers all possibilities.
print("number found at %d"%(t)) will not print the correct index, because the function receives a sliced array, not the initial one.
An error appears when you try to get the first element of an empty array: in
t=len(arr)//2
if arr[t]==n:
we have t == 0 and arr == [].
I do not recommend recursion for binary search. Here is how I suggest writing it:
def bin_search(arr, n):
    l = 0
    r = len(arr)
    m = (l+r)//2
    while l < r:
        if arr[m] > n:
            r = m
        elif arr[m] < n:
            l = m+1
        else:
            return m
        m = (l+r)//2
arr=[12, 24, 32, 39, 45, 50, 54]
n=32
print(bin_search(arr,n))
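If you would still like to keep the recursive shape of the original, one possible way around the problems listed above is to pass index bounds instead of slices. This is a sketch under assumptions: the low and high parameters and the convention of returning -1 for a missing value are my own choices, not something from the question.

```python
def binsearch(arr, n, low=0, high=None):
    """Return the index of n in the sorted list arr, or -1 if it is absent."""
    if high is None:
        high = len(arr) - 1
    if low > high:              # empty range: also covers an empty list
        return -1
    mid = (low + high) // 2
    if arr[mid] == n:
        return mid              # index into the original list, not a slice
    if arr[mid] > n:
        return binsearch(arr, n, low, mid - 1)
    return binsearch(arr, n, mid + 1, high)


arr = [12, 24, 32, 39, 45, 50, 54]
print(binsearch(arr, 32))   # 2
print(binsearch(arr, 13))   # -1
```

Because the list is never sliced, the reported position refers to the original list and no copies are made on each call.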

Python: parsing a string of concatenated ascending integers

The objective is to parse the output of an ill-behaving program which concatenates a list of numbers, e.g., 3, 4, 5, into a string "345", without any non-number separating the numbers. I also know that the list is sorted in ascending order.
I came up with the following solution which reconstructs the list from a string:
a = '3456781015203040'
numlist = []
numlist.append(int(a[0]))
i = 1
while True:
    j = 1
    while True:
        if int(a[i:i+j]) <= numlist[-1]:
            j = j + 1
        else:
            numlist.append(int(a[i:i+j]))
            i = i + j
            break
    if i >= len(a):
        break
This works, but I have a feeling that the solution reflects too much the fact that I have been trained in Pascal, decades ago. Is there a better or more pythonic way to do it?
I am aware that the problem is ill-posed, i.e., I could start with '34' as the initial element and get a different solution (or possibly end up with remaining trailing numeral characters which don't form the next element of the list).

This finds solutions for all possible initial number lengths:
a = '3456781015203040'
def numbers(a,n):
    current_num, i = 0, 0
    while True:
        while i+n <= len(a) and int(a[i:i+n]) <= current_num:
            n += 1
        if i+n <= len(a):
            current_num = int(a[i:i+n])
            yield current_num
            i += n
        else:
            return
for n in range(1,len(a)):
    l = list(numbers(a,n))
    # print only solutions that use up all digits of a
    if ''.join(map(str,l)) == a:
        print(l)
Output:
[3, 4, 5, 6, 7, 8, 10, 15, 20, 30, 40]
[34, 56, 78, 101, 520, 3040]
[34567, 81015, 203040]

A small modification that allows parsing data such as "7000000000001" and gives the best output (maximum list size):
a = '30000001'  # must be a string so that len() and slicing work
def numbers(a,n):
    current_num, i = 0, 0
    while True:
        while i+n <= len(a) and int(a[i:i+n]) <= current_num:
            n += 1
        if i+2*n > len(a):
            # fewer than 2*n digits are left: take the rest as one number
            current_num = int(a[i:])
            yield current_num
            return
        elif i+n <= len(a):
            current_num = int(a[i:i+n])
            yield current_num
            i += n
        else:
            return
        print(current_num)
for n in range(1,len(a)):
    l = list(numbers(a,n))
    if "".join(map(str,l)) == a:
        print(l)

Basic prime number generator in Python

Just wanted some feedback on my prime number generator, e.g. is it OK, does it use too many resources, etc. It uses no libraries, it's fairly simple, and it is a reflection of my current state of programming skills, so don't hold back, as I want to learn.
def prime_gen(n):
    primes = [2]
    a = 2
    while a < n:
        counter = 0
        for i in primes:
            if a % i == 0:
                counter += 1
        if counter == 0:
            primes.append(a)
        else:
            counter = 0
        a = a + 1
    print primes

There are a few optimizations that are common:
Cover the base cases
Only iterate up to the square root of n
Example:
def prime(x):
    if x in [0, 1]:
        return False
    if x == 2:
        return True
    # start at 2 so that even numbers are rejected as well
    for n in xrange(2, int(x ** 0.5 + 1)):
        if x % n == 0:
            return False
    return True
The above example doesn't generate prime numbers but tests them. You could adapt the same optimizations to your code :)
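As a rough idea of what adapting these optimizations to a generator-style function could look like, here is a minimal Python 3 sketch. The name primes_below is hypothetical, and testing only against previously found primes is my own addition on top of the two points above.

```python
def primes_below(n):
    """Return all primes < n by trial division against earlier primes."""
    primes = []
    for candidate in range(2, n):
        is_prime = True
        for p in primes:
            if p * p > candidate:
                # a composite number always has a prime factor <= its
                # square root, so there is nothing left to test
                break
            if candidate % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(candidate)
    return primes


print(primes_below(30))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Comparing p * p with the candidate avoids floating-point square roots altogether.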
One of the more efficient algorithms I've found written in Python is in the following question and answer (using a sieve):
Simple Prime Generator in Python
My own adaptation of the sieve algorithm:
from itertools import islice
def primes():
    if hasattr(primes, "D"):
        D = primes.D
    else:
        primes.D = D = {}
    def sieve():
        q = 2
        while True:
            if q not in D:
                yield q
                D[q * q] = [q]
            else:
                for p in D[q]:
                    D.setdefault(p + q, []).append(p)
                del D[q]
            q += 1
    return sieve()
print list(islice(primes(), 0, 1000000))
On my hardware I can generate the first million primes pretty quickly (given that this is written in Python):
prologic#daisy
Thu Apr 23 12:58:37
~/work/euler
$ time python foo.py > primes.txt
real 0m19.664s
user 0m19.453s
sys 0m0.241s
prologic#daisy
Thu Apr 23 12:59:01
~/work/euler
$ du -h primes.txt
8.9M primes.txt

Here is the standard method of generating primes adapted from the C# version at: Most Elegant Way to Generate Prime Number
def prime_gen(n):
    primes = [2]
    # start at 3 because 2 is already in the list
    nextPrime = 3
    while nextPrime < n:
        isPrime = True
        i = 0
        # the optimization here is that you're checking from
        # the number in the prime list to the square root of
        # the number you're testing for primality
        squareRoot = int(nextPrime ** .5)
        while primes[i] <= squareRoot:
            if nextPrime % primes[i] == 0:
                isPrime = False
            i += 1
        if isPrime:
            primes.append(nextPrime)
        # only checking for odd numbers so add 2
        nextPrime += 2
    print primes

You start from this:
def prime_gen(n):
    primes = [2]
    a = 2
    while a < n:
        counter = 0
        for i in primes:
            if a % i == 0:
                counter += 1
        if counter == 0:
            primes.append(a)
        else:
            counter = 0
        a = a + 1
    print primes
Do you really need the else branch? No.
def prime_gen(n):
    primes = [2]
    a = 2
    while a < n:
        counter = 0
        for i in primes:
            if a % i == 0:
                counter += 1
        if counter == 0:
            primes.append(a)
        a = a + 1
    print primes
Do you need the counter? No!
def prime_gen(n):
    primes = [2]
    a = 2
    while a < n:
        for i in primes:
            if a % i == 0:
                primes.append(a)
                break
        a = a + 1
    print primes
Do you need to check for i larger than sqrt(a)? No.
from math import sqrt
def prime_gen(n):
    primes = [2]
    a = 3
    while a < n:
        sqrta = sqrt(a+1)
        for i in primes:
            if i >= sqrta:
                break
            if a % i == 0:
                primes.append(a)
                break
        a = a + 1
    print primes
Do you really want to manually increase a?
def prime_gen(n):
    primes = [2]
    for a in range(3,n):
        sqrta = sqrt(a+1)
        for i in primes:
            if i >= sqrta:
                break
            if a % i == 0:
                primes.append(a)
                break
This is some basic refactoring that should automatically flow out of your fingers.
Then you test the refactored code, see that it is buggy and fix it:
def prime_gen(n):
    primes = [2]
    for a in range(3,n):
        sqrta = sqrt(a+1)
        isPrime = True
        for i in primes:
            if i >= sqrta:
                break
            if a % i == 0:
                isPrime = False
                break
        if(isPrime):
            primes.append(a)
    return primes
And finally you get rid of the isPrime flag:
def prime_gen(n):
    primes = [2]
    for a in range(3,n):
        sqrta = sqrt(a+1)
        for i in primes:
            if i >= sqrta:
                primes.append(a)
                break
            if a % i == 0:
                break
    return primes
Now you believe you're done. Then suddenly a friend of yours points out that for even a you are checking i >= sqrta for no reason. (Similarly for numbers with a mod 3 == 0, but then branch prediction comes to the rescue.)
Your friend suggests that you check a % i == 0 first:
def prime_gen(n):
    primes = [2]
    for a in range(3,n):
        sqrta = sqrt(a+1)
        for i in primes:
            if a % i == 0:
                break
            if i >= sqrta:
                primes.append(a)
                break
    return primes
Now you're done and grateful to your brilliant friend!
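Since one of the intermediate refactorings turned out to be buggy, it is worth testing the last version too. A minimal sketch in Python 3, assuming a deliberately naive reference function (is_prime_naive is a hypothetical helper, not part of the walkthrough above):

```python
from math import sqrt


def prime_gen(n):
    """Python 3 transcription of the last version above."""
    primes = [2]
    for a in range(3, n):
        sqrta = sqrt(a + 1)
        for i in primes:
            if a % i == 0:
                break
            if i >= sqrta:
                primes.append(a)
                break
    return primes


def is_prime_naive(a):
    """Deliberately simple reference: try every possible divisor."""
    return a >= 2 and all(a % d for d in range(2, a))


expected = [a for a in range(2, 500) if is_prime_naive(a)]
assert prime_gen(500) == expected
```

The final assert passes silently when both approaches agree for every number below 500.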

You can use the Python yield statement to generate one item at a time. So instead of getting all items at once, you iterate over the generator and get one item at a time. This minimizes resource usage.
Here is an example:
from math import sqrt
from typing import Generator
def gen(num: int) -> Generator[int, None, None]:
    if 2 <= num:
        yield 2
    yield from (
        i
        for i in range(3, num + 1, 2)
        if all(i % x != 0 for x in range(3, int(sqrt(i) + 1)))
    )
for x in gen(100):
    print(x, end=", ")
Output:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,

I made improvements on the solution proposed by jimifiki:
import math #for finding the square root of the candidate number
def primes(n):
    test = [3] #list of primes new candidates are tested against
    found = [5] #list of found primes, which are not being tested against
    c = 5 #candidate number, starting at five
    while c < n: #upper bound, largest candidate will be this or 1 bigger
        p = True #notes the possibility of c to be prime
        c += 2 #increase candidate by 2, avoiding all even numbers
        for a in test: #for each item in test
            if c % a == 0: #check if candidate is divisible
                p = False #since divisible cannot be prime
                break #since divisible no need to continue checking
        if p: #true only if not divisible
            if found[0] > math.sqrt(c): #is smallest in found > sqrt of c
                found.append(c) #if so c is a prime, add it to the list
            else: #if not, it's equal and we need to start checking for it
                test.append(found.pop(0)) #move pos 0 of found to last in test
    return([2] + test + found) #after reaching limit returns 2 and both lists
The biggest improvement is not checking even numbers and checking the square root only if the number is not divisible; the latter really adds up when numbers get bigger. The reason we don't need to check for the square root is that the test list only contains numbers smaller than the square root. This is because we add the next number only when we get to the first non-prime not divisible by any of the numbers in test. This number is always the square of the next biggest prime, which is also the smallest number in found. The use of the boolean "p" feels kind of spaghetti-like to me, so there might be room for improvement.

Here's a pretty efficient prime number generator that I wrote a while back that uses the Sieve of Eratosthenes:
#!/usr/bin/env python2.7
def primeslt(n):
    """Finds all primes less than n"""
    if n < 3:
        return []
    A = [True] * n
    A[0], A[1] = False, False
    for i in range(2, int(n**0.5)+1):
        if A[i]:
            j = i**2
            while j < n:
                A[j] = False
                j += i
    return [num for num in xrange(n) if A[num]]
def main():
    i = ''
    while not i.isdigit():
        i = raw_input('Find all prime numbers less than... ')
    print primeslt(int(i))
if __name__ == '__main__':
    main()
The Wikipedia article (linked above) explains how it works better than I could, so I'm just going to recommend that you read that.

I have some optimizations for the first code which can be used when the argument is negative:
def is_prime(x):
    if x <=1:
        return False
    else:
        for n in xrange(2, int(x ** 0.5 + 1)):
            if x % n == 0:
                return False
        return True
print is_prime(-3)

Being Python, it is usually better to return a generator that will return an infinite sequence of primes rather than a list.
ActiveState has a list of older Sieve of Eratosthenes recipes.
Here is one of them, updated to Python 2.7 using itertools count with a step argument, which did not exist when the original recipe was written:
import itertools as it
def sieve():
    """ Generate an infinite sequence of prime numbers.
    """
    yield 2
    D = {}
    for q in it.count(3, 2): # start at 3 and step by odds
        p = D.pop(q, 0)
        if p:
            x = q + p
            while x in D: x += p
            D[x] = p # new composite found. Mark that
        else:
            yield q # q is a new prime since no composite was found
            D[q*q] = 2*q
Since it is a generator, it is much more memory efficient than generating an entire list. Since it locates composites, it is computationally efficient as well.
Run this:
>>> g=sieve()
Then each subsequent call returns the next prime:
>>> next(g)
2
>>> next(g)
3
# etc
You can then get a list between boundaries (i.e., the Xth prime from the first to the X+Y prime...) by using islice:
>>> tgt=0
>>> tgt, list(it.islice(sieve(), tgt, tgt+10))
(0, [2, 3, 5, 7, 11, 13, 17, 19, 23, 29])
>>> tgt=1000000
>>> tgt, list(it.islice(sieve(), tgt, tgt+10))
(1000000, [15485867, 15485917, 15485927, 15485933, 15485941, 15485959, 15485989, 15485993, 15486013, 15486041])

To get the 100th prime number:
import itertools
n=100
# start counting at 2: 1 is not prime, but all([]) would accept it
x = (i for i in itertools.count(2) if all([i%d for d in xrange(2,i)]))
print list(itertools.islice(x,n-1,n))[0]
To get the prime numbers below 100:
import itertools
n=100
x = (i for i in xrange(2,n) if all([i%d for d in xrange(2,i)]))
for n in x:
    print n

You can also do it this way to get the primes in a dictionary in Python:
def is_prime(a):
    count = 0
    counts = 0
    k = dict()
    for i in range(2, a - 1):
        k[count] = a % i
        count += 1
    for j in range(len(k)):
        if k[j] == 0:
            counts += 1
    if counts == 0:
        return True
    else:
        return False
def find_prime(f, g):
    prime = dict()
    count = 0
    for i in range(int(f), int(g)):
        if is_prime(i) is True:
            prime[count] = i
            count += 1
    return prime
a = find_prime(20,110)
print(a)
{0: 23, 1: 29, 2: 31, 3: 37, 4: 41, 5: 43, 6: 47, 7: 53, 8: 59, 9: 61, 10: 67, 11: 71, 12: 73, 13: 79, 14: 83, 15: 89, 16: 97, 17: 101, 18: 103, 19: 107, 20: 109}

Finding the rank of an element using modified quicksort

I am trying to find the element of a given rank k in an unsorted list, where the element of rank k is the k-th lowest value in the list.
E.G.
Given a list:
[5,4,1,10,8,3,2]
Where k is 1, value is 1
Where k is 3, value is 3
Where k is 6, value is 8
Where k is 7, value is 10
I have to use a modified quicksort partition function provided below.
def partition(a_list, first, last):
    pivot = a_list[last]
    i = first - 1
    for j in range(first, last):
        if a_list[j] <= pivot:
            i += 1
            a_list[i], a_list[j] = a_list[j], a_list[i]
    a_list[i + 1], a_list[last] = a_list[last], a_list[i + 1]
    return i + 1
I'm looking for a function with an expected running time of O(n), I'm trying to recursively navigate through the list by calling partition and then deciding whether to go through the right half or left half, but I'm having trouble actually getting the results that are expected. Here is my selection method.
def selection(a_list, first, last, k):
    intReturn = partition(a_list, first, last)
    print(a_list)
    print(intReturn)
    if intReturn == k:
        return a_list[intReturn - 1]
    else:
        if intReturn < k:
            return selection(a_list, first + intReturn, last, k) #- (first + intReturn))
        elif intReturn > k:
            return a_list[k]
            #return selection(a_list, first, last - intReturn, k)
The selection function should be called as follows:
print(selection([5,4,1,10,8,3,2], 0, 6, 1))
print(selection([5,4,1,10,8,3,2], 0, 6, 3))
print(selection([5,4,1,10,8,3,2], 0, 6, 6))
print(selection([5,4,1,10,8,3,2], 0, 6, 7))
print(selection([46, 50, 16, 88, 79, 77, 17, 2, 43, 13, 86, 12, 68, 33, 81, \
74, 19, 52, 98, 70, 61, 71, 93, 5, 55], 0, 24, 19))
So, how do I recursively find the element of a given rank in expected O(n) running time, without having to sort the whole list, while being restricted to this particular way of partitioning?

The Python Cookbook has at least two worked-out examples here and here.
This is my version:
import random
def select(data, n):
    "Find the nth rank ordered element (the least value has rank 0)."
    data = list(data)
    if not 0 <= n < len(data):
        raise ValueError('not enough elements for the given rank')
    while True:
        pivot = random.choice(data)
        pcount = 0
        under, over = [], []
        uappend, oappend = under.append, over.append
        for elem in data:
            if elem < pivot:
                uappend(elem)
            elif elem > pivot:
                oappend(elem)
            else:
                pcount += 1
        if n < len(under):
            data = under
        elif n < len(under) + pcount:
            return pivot
        else:
            data = over
            n -= len(under) + pcount
As you requested, here is a recursive version of the same code:
def select(data, n):
    "Find the nth rank ordered element (the least value has rank 0)."
    if not 0 <= n < len(data):
        raise ValueError('not enough elements for the given rank')
    pivot = random.choice(data)
    pcount = 0
    under, over = [], []
    uappend, oappend = under.append, over.append
    for elem in data:
        if elem < pivot:
            uappend(elem)
        elif elem > pivot:
            oappend(elem)
        else:
            pcount += 1
    if n < len(under):
        return select(under, n)
    elif n < len(under) + pcount:
        return pivot
    else:
        return select(over, n - len(under) - pcount)

Try this? One observation that might help is that the partition function returns the index (starting from zero) of the pivot element.
def selection(a_list, first, last, k):
    assert (k - 1 >= first)
    assert (k - 1 <= last)
    intReturn = partition(a_list, first, last)
    if intReturn + 1 == k:
        return a_list[intReturn]
    if intReturn + 1 < k:
        return selection(a_list, intReturn + 1, last, k)
    if intReturn + 1 > k:
        return selection(a_list, first, intReturn - 1, k)
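The same idea can also be written as a loop that narrows the [first, last] window around rank k. The following is a self-contained sketch: partition is copied from the question, while the two-argument signature of selection and the ValueError check are my own assumptions and differ from the call style requested in the question.

```python
def partition(a_list, first, last):
    """Partition function from the question: the pivot is a_list[last]."""
    pivot = a_list[last]
    i = first - 1
    for j in range(first, last):
        if a_list[j] <= pivot:
            i += 1
            a_list[i], a_list[j] = a_list[j], a_list[i]
    a_list[i + 1], a_list[last] = a_list[last], a_list[i + 1]
    return i + 1


def selection(a_list, k):
    """Return the k-th lowest value (k starts at 1); reorders a_list in place."""
    if not 1 <= k <= len(a_list):
        raise ValueError("k must be between 1 and len(a_list)")
    first, last = 0, len(a_list) - 1
    while True:
        p = partition(a_list, first, last)   # p is an absolute index
        if p == k - 1:
            return a_list[p]
        if p < k - 1:
            first = p + 1                    # answer lies right of the pivot
        else:
            last = p - 1                     # answer lies left of the pivot


data = [5, 4, 1, 10, 8, 3, 2]
for k in (1, 3, 6, 7):
    print(k, selection(list(data), k))
```

For the example list this prints the values 1, 3, 8 and 10 for k = 1, 3, 6 and 7. Note that with the last element as a fixed pivot, the running time is linear only on average for randomly ordered input; an already sorted list leads to quadratic behaviour.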

Efficient way of counting number of elements smaller (larger) than cutoff in a sorted list

Let's assume we have a sorted list:
lst = [1,3,4,89,456,543] # a long one
and what we'd like to do is find the number of elements in the list which are smaller than a cutoff, mx.
Easy:
n = len([x for x in lst if x < mx])
or with generator:
n = sum(1 for x in lst if x < mx)
I assume the second approach should be slightly quicker, but still, the problem here is that we are going through all the elements of a list while we could stop early. It doesn't use the fact that the list is sorted.
Yep, I can do it with a loop:
s = 0
for x in lst:
    if x >= mx:
        break
    s += 1
But, I have a feeling there must be a better (shorter and / or quicker) way to do the same thing, maybe with some generator or an external module function?

We can do even better with a binary search, which is handily implemented for us in the bisect module:
import bisect
n = bisect.bisect_left(lst, mx)
This takes time logarithmic in the length of lst, whereas a linear search with early termination is linear in n. This will generally be faster.
If you want to use a linear search, the takewhile function from itertools can stop the iteration early:
import itertools
n = sum(1 for _ in itertools.takewhile(lambda x: x < mx, lst))
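The title also asks about elements larger than the cutoff. A short sketch of how the bisect module might cover both directions, using the list from the question and an arbitrarily chosen cutoff of 89 (the variable names are only for illustration):

```python
import bisect

lst = [1, 3, 4, 89, 456, 543]   # must already be sorted
mx = 89

smaller = bisect.bisect_left(lst, mx)               # elements < mx
not_larger = bisect.bisect_right(lst, mx)           # elements <= mx
larger = len(lst) - bisect.bisect_right(lst, mx)    # elements > mx

print(smaller, not_larger, larger)   # 3 4 2
```

bisect_left and bisect_right only differ when mx itself occurs in the list, which is exactly what separates the strict from the non-strict counts.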

I am trying to solve it using binary search:
#!/usr/bin/python
lst = range(12, 100)
mx = 30
def binary_search(data, target, low, high):
    if low > high:
        # target is absent: low is its insertion point,
        # i.e. the number of elements smaller than target
        return low
    else:
        mid = (low + high) // 2
        if target == data[mid]:
            return mid
        elif target < data[mid]:
            return binary_search(data, target, low, (mid - 1))
        else:
            return binary_search(data, target, mid + 1, high)
if __name__ == '__main__':
    # high is the last valid index, so the search never reads past the list
    index = binary_search(lst, mx, 0, len(lst) - 1)
    print 'Count: %d' % len(lst[:index])
    print lst[:index]
Output:
Count: 18
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
