I'm trying to find the number of positive numbers in a list using Python, and I want the code to run in O(log(n)).
The list is of type Positive-Negative: if there's a negative number, it will be on the right-hand side of the list, and all the positives are on the left side.
For example, if the input is:
lst=[2,6,0,1,-3,-19]
the output is
4
I've tried some methods, but I didn't get the result that I wanted.
This is the last version that I have so far:
def pos_neg(lst):
    if len(lst) == 0 or (len(lst) == 1 and lst[0] < 0):
        return 0
    elif len(lst) == 1 and lst[0] >= 0:
        return 1
    leng = len(lst) // 2
    counter = 0
    index = 0
    while leng != 0:
        if lst[leng] >= 0 and lst[leng + 1] < 0:
            index = leng
            break
        if lst[leng] >= 0 and lst[leng + 1] >= 0:
            if len(lst) < leng + len(lst):
                return 0
            else:
                leng = (leng + len(lst)) // 2
        if lst[leng] < 0 and lst[leng + 1] < 0:
            leng = leng // 2
    return index
You can use recursion. The list is halved on every call, so the complexity is O(log n).
def find_pos(start, end, l, count):
    if start > end:
        return count
    middle = start + (end - start) // 2
    if l[middle] >= 0:  # that means the whole left side is positive
        count += (middle - start) + 1
        return find_pos(middle + 1, end, l, count)
    else:  # that means I am in the wrong region
        return find_pos(start, middle - 1, l, count)

>>> lst = [2, 6, 0, 1, -3, -19]
>>> find_pos(0, len(lst) - 1, lst, 0)
4
Update:
If you want a single function that takes only lst:
def find_positives(l):
    return find_pos(0, len(l) - 1, l, 0)

find_positives(lst)
What you describe is a bisection algorithm. The standard library provides value search bisection for ascending sequences, but it can be easily adjusted to search for positive/negative in sign-descending sequences:
def bisect_pos(nms):
    """
    Do a right-bisect for positive numbers

    :param nms: numbers where positive numbers are left-aligned

    This returns the index *after* the right-most positive number.
    This is equivalent to the count of positive numbers.
    """
    if not nms:
        return 0
    # the leftmost/rightmost search index
    low_idx, high_idx = 0, len(nms)
    while low_idx < high_idx:
        mid_idx = (low_idx + high_idx) // 2
        if nms[mid_idx] < 0:  # negative numbers – search to the left
            high_idx = mid_idx
        else:  # positive numbers – search to the right
            low_idx = mid_idx + 1
    return low_idx
I usually find binary searches that repeatedly compute the middle difficult to confirm as correct, whereas constructing the index we're looking for bit-by-bit is always clear to me. So:
def find_first_negative(l):
    if not l or l[0] < 0:
        return 0
    idx = 0  # Last positive index.
    pot = 1 << len(l).bit_length()
    while pot:
        if idx + pot < len(l) and l[idx + pot] >= 0:
            idx += pot
        pot >>= 1
    return idx + 1
The idea of the algorithm is as follows:
you assign two variables, left = 0 and right = len(lst) - 1,
then loop, calculating the middle index as middle = (left + right) // 2
and checking the sign of lst[middle]. Then assign left = middle + 1 or right = middle depending on this check (if you prefer left = middle, round middle up instead, otherwise the loop can get stuck once right = left + 1). The loop repeats while left < right.
As a result, the final index points to the last positive or the first negative element, depending on how you make the comparison, so check lst[left] once at the end.
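A minimal sketch of the left/right variant that ends on the last positive element, assuming you want that index (the count is then index + 1). The function name is hypothetical. Because it keeps left = middle, it rounds the midpoint up: with (left + right) // 2, the update left = middle stops making progress once right = left + 1.

```python
def last_non_negative_index(lst):
    """Index of the last value >= 0 in a Positive-Negative list, or -1 if none."""
    if not lst or lst[0] < 0:
        return -1
    left, right = 0, len(lst) - 1  # invariant: lst[left] >= 0
    while left < right:
        # Round up so that "left = middle" always shrinks the range.
        middle = (left + right + 1) // 2
        if lst[middle] >= 0:
            left = middle       # answer is at middle or to its right
        else:
            right = middle - 1  # answer is strictly to the left of middle
    return left


lst = [2, 6, 0, 1, -3, -19]
print(last_non_negative_index(lst) + 1)  # 4
```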
The question is from here https://leetcode.com/problems/contiguous-array/
Actually, I came up with a DP solution for this question.
However, it won't pass one test case.
Any thoughts?
DP[i][j] == 1 means that the subarray from nums[i] to nums[j] is valid (it contains equal numbers of 0s and 1s).
Divide the question into smaller subproblems:
DP[i][j] == 1
- if DP[i+2][j] == 1 and DP[i][i+1] == 1
- else if DP[i][j-2] == 1 and DP[j-1][j] == 1
- else if {nums[i], nums[j]} == {0, 1} and DP[i+1][j-1] == 1
```
def findMaxLength(nums):
    current_max_len = 0
    if not nums:
        return current_max_len
    dp = [[None] * len(nums) for _ in range(len(nums))]
    for thisLen in range(2, len(nums) + 1, 2):
        for i in range(len(nums)):
            last_index = i + thisLen - 1
            if i + thisLen > len(nums):
                continue
            if thisLen == 2:
                if set(nums[i:i+2]) == set([0, 1]):
                    dp[i][last_index] = 1
            elif dp[i][last_index-2] and dp[last_index-1][last_index]:
                dp[i][last_index] = 1
            elif dp[i][i + 1] and dp[i + 2][last_index]:
                dp[i][last_index] = 1
            elif dp[i + 1][last_index-1] and set([nums[i], nums[last_index]]) == set([0, 1]):
                dp[i][last_index] = 1
            else:
                dp[i][last_index] = 0
            if dp[i][last_index] == 1:
                current_max_len = max(current_max_len, thisLen)
    return current_max_len
```
Here is a counterexample: [1, 1, 0, 0, 0, 0, 1, 1]. The problem with your solution is that it requires a valid list to be composed of smaller valid lists of size n-2; in this counterexample the whole list is valid, but it only splits into two valid lists of length 4 ([1, 1, 0, 0] and [0, 0, 1, 1]). -- SPOILER ALERT -- You can solve the problem with another DP technique: for every i, j you can find the number of ones and zeroes between them in constant time. To do that, just store the number of ones from the start of the list up to every index i.
Here is Python code:
def func(nums):
    track, has = 0, {0: -1}
    length = len(nums)
    ress_max = 0
    for i in range(0, length):
        track += (1 if nums[i] == 1 else -1)
        if track not in has:
            has[track] = i
        elif ress_max < i - has[track]:
            ress_max = i - has[track]
    return ress_max

l = list(map(int, input().strip().split()))
print(func(l))
Since the length of the binary array may be up to 50000, an O(n * n) algorithm may exceed the time limit. I would suggest solving it in O(n) time and space. The idea is:
If we take any valid contiguous subarray and sum its numbers, treating 0 as -1, the total is always zero.
If we keep track of prefix sums, the range L to R sums to zero exactly when the prefix sum up to L - 1 and the prefix sum up to R are equal.
Since we are looking for the maximum length, we keep only the first index at which each prefix sum occurs: the first time a sum appears, we put it into a hash map with the current index as its value, and that entry is never overwritten.
Every time we compute the cumulative sum, we check whether it has occurred before. If it has, we compute the length of the range and update the maximum; otherwise this is its first occurrence, and we store it in the hash map with the current index.
Note: to handle ranges that start at index 0, the map must initially contain the sum 0 paired with the index -1.
The sample code in C++ is as follows:
#include <algorithm>
#include <unordered_map>
#include <vector>
using namespace std;

int findMaxLength(vector<int>& nums) {
    unordered_map<int, int> lastIndex;
    lastIndex[0] = -1;
    int cumulativeSum = 0;
    int maxLen = 0;
    for (int i = 0; i < (int)nums.size(); ++i) {
        cumulativeSum += (nums[i] == 0 ? -1 : 1);
        if (lastIndex.find(cumulativeSum) != lastIndex.end()) {
            maxLen = max(maxLen, i - lastIndex[cumulativeSum]);
        } else {
            lastIndex[cumulativeSum] = i;
        }
    }
    return maxLen;
}
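The same prefix-sum idea in Python, as a sketch that also reports where the longest balanced subarray lies. The function name and the (length, start, end) return shape are my own choices, not part of the LeetCode signature:

```python
def longest_balanced(nums):
    """Return (length, start, end) of the longest subarray of 0s and 1s with
    equally many of each; end is exclusive, and (0, 0, 0) means none exists."""
    first_seen = {0: -1}  # prefix sum -> first index where it occurred
    best = (0, 0, 0)
    total = 0
    for i, x in enumerate(nums):
        total += 1 if x == 1 else -1  # treat 0 as -1
        if total in first_seen:
            # nums[first_seen[total] + 1 .. i] sums to zero
            length = i - first_seen[total]
            if length > best[0]:
                best = (length, first_seen[total] + 1, i + 1)
        else:
            first_seen[total] = i  # keep only the first occurrence
    return best


print(longest_balanced([1, 1, 0, 0, 0, 0, 1, 1]))  # (8, 0, 8)
```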
I have a problem where I need to (pretty sure at least) go through the entire list to solve. The question is to figure out the largest number of consecutive numbers in a list that add up to another (greater) element in that list. If there aren't any then we just take the largest value in the list as the candidate summation and 1 as the largest consecutive number of elements.
My general code works, but not too well for large lists (>500,000 elements). I am just looking for tips as to how I could approach the problem differently. My current approach:
L = [1,2,3,4,5,6,7,8,9,10]
candidate_sum = L[-1]
largest_count = 1
N = len(L)
i = 0
while i < N - 1:
    s = L[i]
    j = 0
    while s <= (N - L[i + j + 1]):
        j += 1
        s += L[i + j]
        if s in L and (j + 1) > largest_count:
            largest_count = j + 1
            candidate_sum = s
    i += 1
In this case, the answer would be [1,2,3,4] as they add up to 10 and the length is 4 (obviously this example L is a very simple example).
I then made it faster by changing the initial while loop condition to:
while i < (N-1)/largest_count
It's not a great assumption, but the basic thinking is that the distribution of numbers is somewhat uniform, so two numbers from the second half of the list will on average add up to more than the final number in the list, and are therefore disqualified.
I'm just looking for:
- possible bottlenecks
- suggestions as to different approaches to try
Assuming the input is:
- Strictly ascending: no duplication of elements or subsequences, single possible solution
- Arbitrary-spaced: no arithmetical shortcuts, has to operate by brute force
An efficient C implementation using pointer arithmetic, quasi-polymorphic over numeric types:
#define TYPE int

int max_subsum(TYPE arr [], int size) {
    int max_length = 1;
    TYPE arr_fst = * arr;
    TYPE* num_ptr = arr;
    while (size --) {
        TYPE num = * num_ptr++;
        TYPE* lower = arr;
        TYPE* upper = arr;
        TYPE sum = arr_fst;
        int length = 1;
        for (;;) {
            if (sum > num) {
                sum -= * lower++;
                -- length;
            }
            else if (sum < num) {
                sum += * ++upper;
                ++ length;
            }
            else {
                if (length > max_length) {
                    max_length = length;
                }
                break;
            }
        }
    }
    return max_length;
}
The main loop over the numbers is parallelizable. A relatively straightforward translation into Python 3, using a list (Python's dynamic array type) for arr and a for-each loop:
def max_subsum(arr):
    max_len = 1
    arr_fst = arr[0]
    for n in arr:
        lower = 0
        upper = 0
        sum = arr_fst
        while True:
            if sum > n:
                sum -= arr[lower]
                lower += 1
            elif sum < n:
                upper += 1
                sum += arr[upper]
            else:
                sum_len = upper - lower + 1
                if sum_len > max_len:
                    max_len = sum_len
                break
    return max_len
This max_subsum is a partial function: it fails on an empty list, and Python lists can be empty. The algorithm is appropriate for C-like compiled imperative languages offering fast indexing and statically typed arithmetic; both are comparatively expensive in Python. A (totally defined) algorithm closer to yours, using a set for fast membership tests, may be interpreted more efficiently:
def max_subsum(arr):
    size = len(arr)
    max_len = 0
    arr_set = set(arr)
    arr_max = max(arr) if arr else 0
    for i in range(size):
        sum_ = 0
        for j in range(i, size):
            sum_ += arr[j]
            if sum_ > arr_max:  # assumes positive numbers: the sum can only grow from here
                break
            if sum_ in arr_set and j - i + 1 > max_len:
                max_len = j - i + 1
    return max_len
I'm going to ignore the possibility of a changing target value, and let you figure that out, but to answer your question "is there a faster way to do it?" Yes: by using cumulative sums and some math to eliminate one of your loops.
import numpy as np

L = np.random.randint(0, 100, 100)
L.sort()
cum_sum = np.concatenate(([0], np.cumsum(L)))  # cum_sum[k] == sum(L[:k])

start = 0
end = 0
target = 200
while True:
    total = cum_sum[end] - cum_sum[start]  # sum of L[start:end]
    if total == target:
        break
    elif total < target:
        end += 1
        if end > len(L):
            raise ValueError('something informative')
    else:
        start += 1
#!/usr/bin/python2
"""
Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:
1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.
"""
odd, even = 0, 1
total = 0
while True:
    odd = odd + even  # Odd
    even = odd + even  # Even
    if even < 4000000:
        total += even
    else:
        break
print(total)
My algorithm:
If I take the first two numbers as 0 and 1, the number that I find first in the while loop will be an odd number and the first number of the Fibonacci series.
This way I calculate the even number and add the value of even to total each time.
If the value of even is greater than 4e6, I break out of the infinite loop.
I have tried a lot, but my answer is always wrong. Googling says the answer should be 4613732, but I always seem to get 5702886.
Basically, what you're doing here is adding every second element of the Fibonacci sequence, while the question asks you to sum only the even-valued elements.
What you should do instead is iterate over all the Fibonacci values below 4000000 and, for each one, do: if value % 2 == 0: total += value. The % is the remainder (modulo) operator: if the remainder when dividing by 2 equals 0, the number is even.
E.g.:
prev, cur = 0, 1
total = 0
while True:
    prev, cur = cur, prev + cur
    if cur >= 4000000:
        break
    if cur % 2 == 0:
        total += cur
print(total)
def fibonacci_iter(limit):
    a, b = 0, 1
    while a < limit:
        yield a
        a, b = b, a + b

print(sum(a for a in fibonacci_iter(4e6) if not (a & 1)))
Here is a simple solution in C:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int i = 1, j = 1, sum = 0;
    while (i < 4000000)
    {
        i = i + j;
        j = i - j;
        if (i % 2 == 0)
            sum += i;
    }
    printf("Sum is: %d", sum);
}
Your code includes every other term, not the even-valued ones. To see what's going on, print even just before total += even: you'll see odd numbers. What you need to do instead is check the number you're adding to the total for evenness with the modulo operator:
total = 0
x, y = 0, 1
while y < 4000000:
    x, y = y, x + y
    if x % 2:
        continue
    total += x
print(total)
Code in Python 3:
sum = 2
a = 1
b = 2
c = 0
while c <= 4000000:
    c = a + b
    if c % 2 == 0:
        sum += c
    a, b = b, c
print(sum)
Output: 4613732
You confused even-positioned terms with even-valued terms.
Example: 1, 2, 3, 5, 8, 13, 21
In the above sequence, we need to pick the even values 2 and 8, not every second term (2, 5, 13).
Here is the solution in Java:
public static void main(String[] args) {
    int sum = 2; // Starts with 1, 2: So 2 is added
    int n1 = 1;
    int n2 = 2;
    int n = 0;
    while (n < 4000000) {
        n = n1 + n2;
        n1 = n2;
        n2 = n;
        if (n % 2 == 0) {
            sum = sum + n;
        }
    }
    System.out.println("Sum: " + sum);
}
Output:
Sum: 4613732
def fibLessThan(lim):
    a, b = 1, 2
    total = 0
    while b < lim:
        if b % 2 == 0:
            total += b
        a, b = b, a + b
    return total
I tried this, and it gives exactly the right answer. Most of us add numbers after applying the Fibonacci formula, which misses the 2. My code adds 2 first and then applies the formula. This gives the exact answer for the Euler problem.
This is the second problem in the Project Euler series.
It can be proven that every third Fibonacci number is even (originally, zero was not part of the series). So I start with a, b, c being 0, 1, 1, and the sum is built from the first element of each iteration.
The values of my variables are updated so that each is the sum of the preceding two:
a = b + c, b = c + a, c = a + b.
The variable a will always be even. This way I can avoid the parity check.
In code:
def euler2():
    a, b, c, sum = 0, 1, 1, 0
    while True:
        print(a, b, c)
        a, b, c = (b + c), (2 * c + b), (2 * b + 3 * c)
        if a >= 4_000_000:
            break
        sum += a
    return sum

print(euler2())
It should be:
odd, even = 1,0
Also, every third number is even (even + odd + odd = even).
If you add every second value of the Fibonacci sequence, you get the next Fibonacci value after the last added value. For example, with the sequence 1, 1, 2, 3, 5, 8, ...:
f(0) + f(2) + f(4) = f(5)
1 + 2 + 5 = 8
But your code currently does not add the first value of even, 1.
Other answers are correct, but note that to add just the even numbers in an array, you can do
myarray = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
sum(map(lambda k: 0 if k % 2 else k, myarray))
or
sum([0 if k % 2 else k for k in [1, 2, 3, 4, 5]])
Every third item in the Fibonacci sequence is even. So, you could have this:
prev, cur = 0, 1
count = 1
total = 0
while True:
    prev, cur = cur, prev + cur
    count = count + 1
    if cur >= 4000000:
        break
    if count % 3 == 0:
        total += cur
print(total)
or this (changing your code as little as possible):
even, odd = 0, 1  # this line was corrected
total = 0
while True:
    secondOdd = even + odd  # this line was changed
    even = odd + secondOdd  # Even  # this line was changed
    if even < 4000000:
        total += even
        odd = secondOdd + even  # this line was added
    else:
        break
print(total)
Another way (using some simple math) is to note that, numbering the terms as in the problem statement (a1 = 1, a2 = 2, a3 = 3, ...), the sum a2 + a5 + a8 + a11 + ... + a(3N+2) (the sum of the even Fibonacci values) is equal to (a(3N+4) - 1)/2. So, if you can calculate that number directly, there is no need to calculate all the previous Fibonacci numbers.
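A sketch of that idea, assuming the problem's numbering (a(1) = 1, a(2) = 2, so a(k) is the standard Fibonacci number F(k+1)). To compute a single term without the previous ones, it uses the fast-doubling identities F(2m) = F(m)(2F(m+1) - F(m)) and F(2m+1) = F(m)^2 + F(m+1)^2. The helper names are hypothetical, and finding N is still done with a simple loop:

```python
def fib_pair(k):
    """Return (F(k), F(k+1)) with F(0) = 0, F(1) = 1, using fast doubling."""
    if k == 0:
        return 0, 1
    f, g = fib_pair(k // 2)   # F(m), F(m+1) with m = k // 2
    f2m = f * (2 * g - f)     # F(2m)
    f2m1 = f * f + g * g      # F(2m+1)
    return (f2m1, f2m + f2m1) if k % 2 else (f2m, f2m1)


def a(k):
    # Project Euler numbering: a(1) = 1, a(2) = 2, a(3) = 3, ...
    return fib_pair(k + 1)[0]


def even_sum_closed_form(N):
    # a(2) + a(5) + ... + a(3N+2) == (a(3N+4) - 1) / 2
    return (a(3 * N + 4) - 1) // 2


def even_fib_sum_up_to(limit):
    # Largest N with a(3N+2) <= limit; assumes limit >= 2.
    N = 0
    while a(3 * (N + 1) + 2) <= limit:
        N += 1
    return even_sum_closed_form(N)


print(even_fib_sum_up_to(4_000_000))  # 4613732
```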
Not sure if your question is already answered or you've found a solution, but here's what you're doing wrong. The problem asks you to find even-valued terms, which means that you need to find every value in the Fibonacci sequence that can be divided by 2 without a remainder. The problem does not ask you to find every even-indexed value. Here's a solution to your problem, which gives the correct answer:
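def fib(n):
    # n-th Fibonacci number with fib(1) == fib(2) == 1 (helper the loop below relies on)
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

i = 1
total = 0
t = fib(i)
while t <= 4000000:
    if t % 2 == 0:
        total += t
    i += 1
    t = fib(i)  # check the bound before this term is added
print(total)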
Basically, you loop through every value in the Fibonacci sequence, check whether the value is even by using 'mod' (the % operator) to get the remainder, and, if it is even, add it to the sum.
Here is how I was able to solve this using plain JavaScript:
var sum = 0,
    x = 1,
    y = 2,
    z = 0;
while (z < 4000000) {
    if (y % 2 == 0) {
        sum += y;
    }
    z = x + y;
    x = y;
    y = z;
}
console.log(sum);
I did it differently.
def fibLessThan(lim):
    #################
    # Initial Setup #
    #################
    fibArray = [1, 1, 2]
    i = 3
    #####################
    # While loop begins #
    #####################
    while True:
        tempNum = fibArray[i-2] + fibArray[i-1]
        if tempNum <= lim:
            fibArray.append(tempNum)
            i += 1
        else:
            break
    print(fibArray)
    return fibArray

limit = 4000000
fibList = fibLessThan(limit)

#############
# summation #
#############
evenNum = [x for x in fibList if x % 2 == 0]
evenSum = sum(evenNum)
print("evensum=", evenSum)
Here is my Python code:
even_sum = 0
x = [1, 1]  # Fibonacci sequence starts with 1,1...
while (x[-2] + x[-1]) < 4000000:  # Check if the coming number is smaller than 4 million
    if (x[-2] + x[-1]) % 2 == 0:  # Check if the number is even
        even_sum += (x[-2] + x[-1])
    x.append(x[-2] + x[-1])  # Compose the Fibonacci sequence
print(even_sum)
Although it's hard to believe that a question with 17 answers needs yet another, nearly all previous answers have problems in my view: first, they use the modulus operator (%) aka division to solve an addition problem; second, they calculate all the numbers in the sequence and toss the odd ones; finally, many of them look like C programs, using little of Python's advantages.
Since we know that every third number of the Fibonacci sequence is even, we can generate every third number starting from 2 and sum the result:
def generate_even_fibonacci(limit):
    previous, current = 0, 2
    while current < limit:
        yield current
        previous, current = current, current * 4 + previous

print(sum(generate_even_fibonacci(4_000_000)))
OUTPUT
> python3 test.py
4613732
>
So much code for such a simple series. It can be easily shown that f(i+3) = f(i-3) + 4*f(i) so you can simply start from 0,2 which are f(0),f(3) and progress directly through the even values striding by 3 as you would for the normal series:
s,a,b = 0,0,2
while a <= 4000000: s,a,b = s+a,b,a+4*b
print(s)
I solved it this way:
fib = [1, 2]
total = 2
while fib[-1] + fib[-2] < 4000000:  # stop before the next term reaches 4 million
    fib.append(fib[-1] + fib[-2])
    if fib[-1] % 2 == 0:
        total += fib[-1]
print(total)
long sum = 2;
int start = 1;
int second = 2;
int newValue = 0;
do {
    newValue = start + second;
    if (newValue % 2 == 0) {
        sum += newValue;
    }
    start = second;
    second = newValue;
} while (newValue < 4000000);
System.out.println("Finding the total sum of: " + sum);
The first mistake is that you started the Fibonacci sequence with 0 and 1 instead of 1 and 2. The sum should therefore be initialized to 2:
#!/usr/bin/python2
firstNum, lastNum = 1, 2
n = 0
sum = 2  # Initialize sum to 2 since 2 is already even
maxRange = input("Enter the final number")
max = int(maxRange)
while n < max:
    n = firstNum + lastNum
    firstNum = lastNum
    lastNum = n
    if n % 2 == 0:
        sum = sum + n
print(sum)
I did it this way:
n = int(input())
f = [0, 1]
while f[-1] + f[-2] <= n:  # generate only the terms up to n, not n terms
    f.append(f[-1] + f[-2])
sum = 0
for i in f:
    if i > n:
        break
    elif i % 2 == 0:
        sum += i
print(sum)
There are many great answers here. Nobody has posted a recursive solution, so here's one in C:
#include <stdio.h>

int filt(int n) {
    return (n % 2 == 0);
}

int fib_func(int n0, int n1, int acc) {
    if (n0 + n1 > 4000000)
        return acc;
    else
        return fib_func(n1, n0 + n1, acc + filt(n0 + n1) * (n0 + n1));
}

int main(int argc, char* argv[]) {
    printf("%d\n", fib_func(0, 1, 0));
    return 0;
}
This is a Python implementation:
import math

sum = 0
summation = 0
first, second = 1, 2
summation += second
print(first, second)
while sum < 4 * math.pow(10, 6):
    sum = first + second
    first = second
    second = sum
    if sum > 4 * math.pow(10, 6):
        break
    elif sum % 2 == 0:
        summation += sum
print("The final summation is %d" % (summation))
The problem in your code is basically related to the looping style and the timing of the condition check. In the Java code below, the (second + first) < 4000000 check runs before the next term is computed, so you only get terms that are less than 4000000. Happy coding!
int first = 0, second = 1, pivot = 0;
do {
    if ((second + first) < 4000000) { // this is the point which makes your solution correct
        pivot = second + first;
        first = second;
        second = pivot;
        System.out.println(pivot);
    } else {
        break;
    }
} while (true);
I am trying to find the longest common subsequence of 3 or more strings. The Wikipedia article has a great description of how to do this for 2 strings, but I'm a little unsure of how to extend this to 3 or more strings.
There are plenty of libraries for finding the LCS of 2 strings, so I'd like to use one of them if possible. If I have 3 strings A, B and C, is it valid to find the LCS of A and B as X, and then find the LCS of X and C, or is this the wrong way to do it?
I've implemented it in Python as follows:
import difflib
from functools import reduce

def lcs(str1, str2):
    sm = difflib.SequenceMatcher()
    sm.set_seqs(str1, str2)
    matching_blocks = [str1[m.a:m.a+m.size] for m in sm.get_matching_blocks()]
    return "".join(matching_blocks)

print(reduce(lcs, ['abacbdab', 'bdcaba', 'cbacaa']))
This outputs "ba", however it should be "baa".
Just generalize the recurrence relation.
For three strings:
dp[i, j, k] = 1 + dp[i - 1, j - 1, k - 1] if A[i] = B[j] = C[k]
max(dp[i - 1, j, k], dp[i, j - 1, k], dp[i, j, k - 1]) otherwise
Should be easy to generalize to more strings from this.
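A sketch of that generalization for any number of strings, written as a memoized recursion over one prefix length per string (functools.lru_cache does the memoization; the name lcs_n is my own). The number of states is the product of (length + 1) over all strings, so this is only practical for a few short strings. Because it also works for two strings, it can be used to test the reduce idea from the question: on the small input below, the pairwise step picks "xyz" for the first two strings and then has nothing in common with the third, while the true three-way LCS is "ab".

```python
from functools import lru_cache, reduce


def lcs_n(*strings):
    """Longest common subsequence of any number of strings (memoized sketch)."""
    if not strings:
        return ""

    @lru_cache(maxsize=None)
    def best(idx):
        # idx holds one prefix length per string; best(idx) is the LCS of those prefixes.
        if 0 in idx:
            return ""
        if len({s[i - 1] for s, i in zip(strings, idx)}) == 1:
            # All prefixes end with the same character: 1 + dp[i-1, j-1, k-1, ...]
            return best(tuple(i - 1 for i in idx)) + strings[0][idx[0] - 1]
        # Otherwise: the longest result from dropping the last character of one string.
        shorter = (idx[:k] + (idx[k] - 1,) + idx[k + 1:] for k in range(len(idx)))
        return max((best(t) for t in shorter), key=len)

    return best(tuple(len(s) for s in strings))


A, B, C = "xyzab", "abxyz", "ab"
print(lcs_n(A, B, C))                  # ab
print(repr(reduce(lcs_n, [A, B, C])))  # '' because lcs_n(A, B) == "xyz"
```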
I just had to do this for homework, so here is my dynamic programming solution in Python, which is pretty efficient. It is O(nml), where n, m and l are the lengths of the three sequences.
The solution works by creating a 3D array and then enumerating all three sequences to calculate the path of the longest subsequence. Then you can backtrack through the array to reconstruct the actual subsequence from its path.
So, you initialize the array to all zeros, and then enumerate the three sequences. At each step of the enumeration, you either add one to the length of the longest subsequence (if there's a match) or just carry forward the longest subsequence from the previous step of the enumeration.
Once the enumeration is complete, you can now trace back through the array to reconstruct the subsequence from the steps you took. i.e. as you travel backwards from the last entry in the array, each time you encounter a match you look it up in any of the sequences (using the coordinate from the array) and add it to the subsequence.
def lcs3(a, b, c):
    m = len(a)
    l = len(b)
    n = len(c)
    subs = [[[0 for k in range(n+1)] for j in range(l+1)] for i in range(m+1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            for k, z in enumerate(c):
                if x == y and y == z:
                    subs[i+1][j+1][k+1] = subs[i][j][k] + 1
                else:
                    subs[i+1][j+1][k+1] = max(subs[i+1][j+1][k],
                                              subs[i][j+1][k+1],
                                              subs[i+1][j][k+1])
    # return subs[-1][-1][-1]  # if you only need the length of the lcs
    lcs = ""
    while m > 0 and l > 0 and n > 0:
        step = subs[m][l][n]
        if step == subs[m-1][l][n]:
            m -= 1
        elif step == subs[m][l-1][n]:
            l -= 1
        elif step == subs[m][l][n-1]:
            n -= 1
        else:
            lcs += str(a[m-1])
            m -= 1
            l -= 1
            n -= 1
    return lcs[::-1]
To find the Longest Common Subsequence (LCS) of 2 strings A and B, you can traverse a 2-dimensional array diagonally, as shown in the link you posted. Every element in the array corresponds to the problem of finding the LCS of the substrings A' and B' (A cut by its row number, B cut by its column number). This problem can be solved by calculating the value of all elements in the array. You must be certain that when you calculate the value of an array element, all sub-problems required to calculate that value have already been solved. That is why you traverse the 2-dimensional array diagonally.
This solution can be scaled to finding the longest common subsequence between N strings, but this requires a general way to iterate an array of N dimensions such that any element is reached only when all the sub-problems it depends on have been solved.
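One general iteration order that satisfies this is plain lexicographic order of the index tuples, which itertools.product produces: every sub-problem of a cell has at least one coordinate decreased and none increased, so it comes earlier in that order. A sketch that computes only the length (the function name is hypothetical, and a dict stands in for the N-dimensional array):

```python
from itertools import product


def lcs_length_n(strings):
    """Length of the LCS of any number of strings, filled in bottom-up."""
    if not strings:
        return 0
    table = {}
    # product() yields index tuples in lexicographic order, so every cell
    # this one depends on (some coordinates decreased) is already in table.
    for idx in product(*(range(len(s) + 1) for s in strings)):
        if 0 in idx:
            table[idx] = 0  # an empty prefix has an empty LCS
        elif len({s[i - 1] for s, i in zip(strings, idx)}) == 1:
            table[idx] = table[tuple(i - 1 for i in idx)] + 1
        else:
            table[idx] = max(
                table[idx[:k] + (idx[k] - 1,) + idx[k + 1:]] for k in range(len(idx))
            )
    return table[tuple(len(s) for s in strings)]


print(lcs_length_n(["abacbdab", "bdcaba", "cbacaa"]))  # 3
```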
Instead of iterating the N-dimensional array in a special order, you can also solve the problem recursively. With recursion it is important to save the intermediate solutions, since many branches will require the same intermediate solutions. I have written a small example in C# that does this:
string lcs(string[] strings)
{
    if (strings.Length == 0)
        return "";
    if (strings.Length == 1)
        return strings[0];
    int max = -1;
    int cacheSize = 1;
    for (int i = 0; i < strings.Length; i++)
    {
        cacheSize *= strings[i].Length;
        if (strings[i].Length > max)
            max = strings[i].Length;
    }
    string[] cache = new string[cacheSize];
    int[] indexes = new int[strings.Length];
    for (int i = 0; i < indexes.Length; i++)
        indexes[i] = strings[i].Length - 1;
    return lcsBack(strings, indexes, cache);
}

string lcsBack(string[] strings, int[] indexes, string[] cache)
{
    for (int i = 0; i < indexes.Length; i++)
        if (indexes[i] == -1)
            return "";
    bool match = true;
    for (int i = 1; i < indexes.Length; i++)
    {
        if (strings[0][indexes[0]] != strings[i][indexes[i]])
        {
            match = false;
            break;
        }
    }
    if (match)
    {
        int[] newIndexes = new int[indexes.Length];
        for (int i = 0; i < indexes.Length; i++)
            newIndexes[i] = indexes[i] - 1;
        string result = lcsBack(strings, newIndexes, cache) + strings[0][indexes[0]];
        cache[calcCachePos(indexes, strings)] = result;
        return result;
    }
    else
    {
        string[] subStrings = new string[strings.Length];
        for (int i = 0; i < strings.Length; i++)
        {
            if (indexes[i] <= 0)
                subStrings[i] = "";
            else
            {
                int[] newIndexes = new int[indexes.Length];
                for (int j = 0; j < indexes.Length; j++)
                    newIndexes[j] = indexes[j];
                newIndexes[i]--;
                int cachePos = calcCachePos(newIndexes, strings);
                if (cache[cachePos] == null)
                    subStrings[i] = lcsBack(strings, newIndexes, cache);
                else
                    subStrings[i] = cache[cachePos];
            }
        }
        string longestString = "";
        int longestLength = 0;
        for (int i = 0; i < subStrings.Length; i++)
        {
            if (subStrings[i].Length > longestLength)
            {
                longestString = subStrings[i];
                longestLength = longestString.Length;
            }
        }
        cache[calcCachePos(indexes, strings)] = longestString;
        return longestString;
    }
}

int calcCachePos(int[] indexes, string[] strings)
{
    int factor = 1;
    int pos = 0;
    for (int i = 0; i < indexes.Length; i++)
    {
        pos += indexes[i] * factor;
        factor *= strings[i].Length;
    }
    return pos;
}
My code example can be optimized further. Many of the strings being cached are duplicates, and some are duplicates with just one additional character added. This uses more space than necessary when the input strings become large.
On input: "666222054263314443712", "5432127413542377777", "6664664565464057425"
The LCS returned is "54442"
The code below finds the longest common subsequence of N strings. It uses itertools to generate the required index combinations and then uses these indexes to find the common subsequence.
Example Execution:
Input:
Enter the number of sequences: 3
Enter sequence 1 : 83217
Enter sequence 2 : 8213897
Enter sequence 3 : 683147
Output:
837
from itertools import product
import numpy as np

def neighbors(index):
    N = len(index)
    for relative_index in product((0, -1), repeat=N):
        if not all(i == 0 for i in relative_index):
            yield tuple(i + i_rel for i, i_rel in zip(index, relative_index))

def longestCommonSubSequenceOfN(sqs):
    numberOfSequences = len(sqs)
    lengths = np.array([len(sequence) for sequence in sqs])
    incrLengths = lengths + 1
    lengths = tuple(lengths)
    inverseDistances = np.zeros(incrLengths)
    ranges = [tuple(range(1, length+1)) for length in lengths[::-1]]
    for tupleIndex in product(*ranges):
        tupleIndex = tupleIndex[::-1]
        neighborIndexes = list(neighbors(tupleIndex))
        operationsWithMisMatch = np.array([])
        for neighborIndex in neighborIndexes:
            operationsWithMisMatch = np.append(operationsWithMisMatch, inverseDistances[neighborIndex])
        operationsWithMatch = np.copy(operationsWithMisMatch)
        operationsWithMatch[-1] = operationsWithMatch[-1] + 1
        chars = [sqs[i][neighborIndexes[-1][i]] for i in range(numberOfSequences)]
        if all(elem == chars[0] for elem in chars):
            inverseDistances[tupleIndex] = max(operationsWithMatch)
        else:
            inverseDistances[tupleIndex] = max(operationsWithMisMatch)

    subString = ""
    mainTupleIndex = lengths
    while all(ind > 0 for ind in mainTupleIndex):
        neighborsIndexes = list(neighbors(mainTupleIndex))
        anyOperation = False
        for tupleIndex in neighborsIndexes:
            current = inverseDistances[mainTupleIndex]
            if current == inverseDistances[tupleIndex]:
                mainTupleIndex = tupleIndex
                anyOperation = True
                break
        if not anyOperation:
            subString += str(sqs[0][mainTupleIndex[0] - 1])
            mainTupleIndex = neighborsIndexes[-1]
    return subString[::-1]

numberOfSequences = int(input("Enter the number of sequences: "))
sequences = [input("Enter sequence {} : ".format(i)) for i in range(1, numberOfSequences + 1)]
print(longestCommonSubSequenceOfN(sequences))
Here is a solution. Its output is: Length of LCS is 2
# Python program to find
# LCS of three strings

# Returns length of LCS
# for X[0..m-1], Y[0..n-1]
# and Z[0..o-1]
def lcsOf3(X, Y, Z, m, n, o):
    L = [[[0 for i in range(o+1)] for j in range(n+1)]
         for k in range(m+1)]

    ''' Following steps build L[m+1][n+1][o+1] in
    bottom up fashion. Note that L[i][j][k]
    contains length of LCS of X[0..i-1] and
    Y[0..j-1] and Z[0.....k-1] '''
    for i in range(m+1):
        for j in range(n+1):
            for k in range(o+1):
                if (i == 0 or j == 0 or k == 0):
                    L[i][j][k] = 0
                elif (X[i-1] == Y[j-1] and
                      X[i-1] == Z[k-1]):
                    L[i][j][k] = L[i-1][j-1][k-1] + 1
                else:
                    L[i][j][k] = max(max(L[i-1][j][k],
                                         L[i][j-1][k]),
                                     L[i][j][k-1])

    # L[m][n][o] contains length of LCS for
    # X[0..m-1] and Y[0..n-1] and Z[0..o-1]
    return L[m][n][o]

# Driver program to test above function
X = 'AGGT12'
Y = '12TXAYB'
Z = '12XBA'
m = len(X)
n = len(Y)
o = len(Z)
print('Length of LCS is', lcsOf3(X, Y, Z, m, n, o))

# This code is contributed by Soumen Ghosh.