String divisibility without search tables? - python

Given two strings s and t, determine whether s is divisible by t. For
example, "abab" is divisible by "ab", but "ababab" is not divisible by
"abab". If s isn't divisible by t, return -1. If it is, return the length
of the smallest common divisor. So, for "abababab" and "abab", return
2, because s is divisible by t and the smallest common divisor is "ab",
with length 2.
The way I thought it through was: I take the lengths of the two strings and find their greatest common divisor. If t divides s, then the smallest common divisor is just the smallest divisor of t, which can be found with the Knuth-Morris-Pratt algorithm: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.
Is there any simpler solution?

To test for divisibility:
1. test that the length of s is a multiple of the length of t (otherwise not divisible);
2. divide s into chunks of length len(t); and
3. check that all the chunks are the same, i.e. that each one equals t.
To find the smallest common divisor, you need to find the shortest repeating substring of t that makes up the whole of t. One approach is:
1. Find the factors of the length of t (the crude approach of searching from 1 up to sqrt(len(t)) should be fine for strings of any reasonable length);
2. For each factor (start with the smallest):
i. divide t into chunks of length factor;
ii. check if all the chunks are the same, and return factor if they are.
Using a Python set is a neat way to check if all the chunks in a list are equal. len(set(chunks)) == 1 tells you they are.
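The steps above can be sketched roughly as follows. The function names are made up for illustration, and for brevity the factor search simply tries every length from 1 to len(t) instead of stopping at sqrt(len(t)); treat it as one possible reading of the approach, not a tuned implementation.

```python
def is_divisible(s, t):
    """Return True if s is made of one or more whole copies of t."""
    if not s or not t or len(s) % len(t) != 0:
        return False
    # Cut s into chunks of len(t) and require every chunk to equal t.
    chunks = [s[i:i + len(t)] for i in range(0, len(s), len(t))]
    return all(chunk == t for chunk in chunks)


def smallest_divisor_length(t):
    """Length of the shortest block that, repeated, makes up all of t."""
    n = len(t)
    for size in range(1, n + 1):  # smallest candidate first
        if n % size != 0:
            continue  # not a factor of len(t)
        chunks = {t[i:i + size] for i in range(0, n, size)}
        if len(chunks) == 1:  # all chunks identical
            return size
    return n


def solve(s, t):
    """Hypothetical wrapper: -1 if t does not divide s, else the length."""
    return smallest_divisor_length(t) if is_divisible(s, t) else -1
```

With the examples from the question, solve("abababab", "abab") gives 2 and solve("ababab", "abab") gives -1.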

The GCD of the string lengths doesn't help you.
A string t "divides" a string s if len(s) is a multiple of len(t) and s == t * (len(s)//len(t)).
As for finding the length of the smallest divisor of t, the classic trick for that is (t+t).index(t, 1): find the first occurrence of t in t+t that starts at index 1 or later. I've used the built-in index method here, but depending on why you're doing this and what runtime properties you want, KMP string search may be a better choice. Manually implementing KMP in Python is going to have a massive constant-factor overhead, but it'll have a much better worst-case runtime than index.
KMP isn't particularly hard to implement, as far as string searches go. You're not going to get anything significantly "simpler" without delegating to a library routine or sacrificing asymptotic complexity properties (or both).
If you want to avoid a search table and keep asymptotic complexity guarantees, you can use something like two-way string matching, but it's not going to be any easier to implement than KMP.
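Putting the length check and the (t+t).index(t, 1) trick together might look like the sketch below. The function names are illustrative only, and the built-in index is used, so none of the worst-case guarantees discussed above apply.

```python
def divides(t, s):
    """True if s is t repeated a whole number of times (t non-empty)."""
    return bool(t) and len(s) % len(t) == 0 and s == t * (len(s) // len(t))


def smallest_period(t):
    """Length of the smallest divisor of a non-empty string t.

    t always occurs in t + t at index len(t), so index() cannot fail here;
    an earlier hit at index p >= 1 means t is built from copies of t[:p].
    """
    return (t + t).index(t, 1)


def smallest_common_divisor_length(s, t):
    """Return -1 if t does not divide s, else the smallest divisor length."""
    return smallest_period(t) if divides(t, s) else -1
```

For "abababab" and "abab" this returns 2, and for "ababab" and "abab" it returns -1.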

Related

Maximum number irrespective of sign from positive and negative numbers in a list

I want to find the maximum number from a list that has both positive, negative number and irrespective of sign. For example:
arr = [2,3,-6,5]
## output: -6
arr = [2,3,6,-5]
## output: 6
I've the following code which is working:
def max_number(l):
    abs_maxval = max(l,key=abs)
    maxval = max(l)
    minval = min(l)
    if maxval == abs_maxval:
        return maxval
    else:
        return minval
Though this works and the time complexity is O(N), I'm wondering if there is a way to find the number faster or to optimize the code. From my understanding, I'm scanning the list 3 times, which might be slow for a large list, and for my problem I'm going through hundreds of thousands of large lists.
Any suggestion will be helpful. Thanks!
You should be able to just use:
max(arr, key=abs)
Linear scaling is not bad, and there is very little practical difference between O(3N) and O(N). Since it would be impossible to determine (in an unordered list) that you've found the biggest or smallest without searching the entire list, O(N) is the best you can ask for.
That being said you could find what you're looking for in one pass by comparing the absolute value of each number (as you iterate) to the absolute value of the biggest or smallest number you've found so far.
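A one-pass version along those lines could look like this. The name max_by_magnitude is invented for the example; in practice max(arr, key=abs) already does a single pass and is likely faster, since its loop runs in C.

```python
def max_by_magnitude(numbers):
    """Return the element with the largest absolute value, in one pass."""
    iterator = iter(numbers)
    try:
        best = next(iterator)
    except StopIteration:
        raise ValueError("max_by_magnitude() got an empty sequence") from None
    for value in iterator:
        # Strict '>' keeps the first of several equally large values,
        # which matches the tie-breaking of max(numbers, key=abs).
        if abs(value) > abs(best):
            best = value
    return best
```

For the lists in the question, max_by_magnitude([2, 3, -6, 5]) returns -6 and max_by_magnitude([2, 3, 6, -5]) returns 6.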

Time complexity of string permutation algorithm

I wrote a simple algorithm to return a list of all possible permutations of a string, as follows:
def get_permutations(sequence):
    '''
    Enumerate all permutations of a given string
    sequence (string): an arbitrary string to permute. Assume that it is a
    non-empty string.
    Returns: a list of all permutations of sequence
    '''
    if len(sequence) <= 1:
        return list(sequence)
    else:
        return_list = get_permutations(sequence[1:])
        new_list = []
        for e in return_list:
            for pos in range(len(e) + 1):
                new_list.append(e[:pos] + sequence[0] + e[pos:])
        return new_list
From this code I see a time complexity of O(n · n!): O(n!) is the growth in the number of elements e in return_list, and the nested loop grows linearly with each new recursion, so O(n) from my understanding. My conclusion is that the algorithm as a whole has O(n · n!) complexity.
However, when searching for similar solutions I found many threads saying the optimal case for this type of algorithm should be only O(n!), so my question is:
Am I missing something on my complexity analysis or is my code not optimal? And if it isn't, how can I properly correct it?
Take any algorithm that generates and then prints out all permutations of a sequence of n different elements. Then, since there are n! different permutations and each one has n elements, simply printing out all the permutations will take time Θ(n · n!). That's worth keeping in mind as you evaluate the cost of generating permutations - even if you could generate all permutations in time O(n!), you couldn't then visit all those permutations without doing O(n · n!) work to view them all.
That being said - the recursive permutation-generating code you have up above does indeed run in time Θ(n · n!). There are some other algorithms for generating permutations that can generate but not print the permutations in time Θ(n!), but they work on different principles.
I have found, empirically, that unless you see a careful runtime analysis of a permutation-generating algorithm, you should be skeptical that the runtime is Θ(n!). Most algorithms don't hit this runtime, and in the cases of the ones that do, the analysis is somewhat subtle. Stated differently - you're not missing anything; there's just lots of "on the right track but incorrect" claims made out there. :-)
I think your algorithm is O(n * n!) because to calculate a permutation of a string x of length n, your algorithm will use the permutations of a sub string of x, which is x without the first character. I'll call this sub string y. But to calculate the permutations of y, the permutations of a sub string of y will need to be calculated. This will continue until the sub string to have its permutations calculated is of length 1. This means that to calculate the permutations of x you will need to calculate the permutations of n - 1 other strings.
Here is an example. Let's say the input string was "pie". Your algorithm takes "pie" and calls itself again with "ie", after which it calls itself with "e". Because "e" is of length 1, it returns, and all the permutations of "ie" are found, which are "ie" and "ei". That function call then returns the permutations of "ie", and it is only at this point that the permutations of "pie" are calculated, using the permutations of "ie".
I looked up a permutation generating algorithm called Heap's algorithm that has a time complexity of O(n!). The reason it has a time complexity of n! is because it generates permutations using swaps and each swap that it does on an array generates a unique permutation for the input string. Your algorithm however, generates permutations of the n-1 sub strings of the input string which is where the time complexity of O(n * n!) comes from.
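For reference, here is a sketch of Heap's algorithm in its commonly presented non-recursive form. The generator name is made up, and note one caveat: each permutation is reached with a single swap, but yielding a tuple copy of the list costs O(n) per permutation, so consuming every permutation this way is still Θ(n · n!) overall.

```python
def heap_permutations(items):
    """Yield every permutation of items, one swap apart (Heap's algorithm)."""
    a = list(items)
    n = len(a)
    c = [0] * n          # c[i] counts the swaps done so far at level i
    yield tuple(a)       # the copy made here is O(n), not O(1)
    i = 0
    while i < n:
        if c[i] < i:
            if i % 2 == 0:
                a[0], a[i] = a[i], a[0]        # even level: swap with first
            else:
                a[c[i]], a[i] = a[i], a[c[i]]  # odd level: swap with c[i]
            yield tuple(a)
            c[i] += 1
            i = 0        # restart from the lowest level
        else:
            c[i] = 0     # this level is exhausted, move up
            i += 1
```

Calling list(heap_permutations("abc")) produces 6 distinct tuples, in an order that differs from the recursive function in the question.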
I hope this helps and sorry if I'm being overly verbose.

Is there a faster way to count non-overlapping occurrences in a string than count()?

Given a minimum length N and a string S of 1's and 0's (e.g. "01000100"), I am trying to return the number of non-overlapping occurrences of a sub-string of length N consisting entirely of '0's. For example, given N=2 and the string "01000100", the number of non-overlapping "00"s is 2.
This is what I have done:
def myfunc(S,N):
    return S.count('0'*N)
My question: is there a faster way of performing this for very long strings? This is from an online coding practice site and my code passes all but one of the test cases, which fails due to not being able to finish within a time limit. Doing some research it seems I can only find that count() is the fastest method for this.
This might be faster:
>>> s = "01000100"
>>> def my_count( a, n ) :
...     parts = a.split('1')
...     return sum( len(p)//n for p in parts )
...
>>> my_count(s, 2)
2
>>>
The worst-case scenario for count() is O(N^2), whereas the function above is strictly linear, O(N). Here's the discussion where the O(N^2) figure came from: What's the computational cost of count operation on strings Python?
Also, you can always do this manually, without using split(): just loop over the string, increasing a counter on each 0 and resetting it on each 1 (after adding counter // n to a running total). This is strictly O(N) and avoids building the intermediate list of parts.
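A minimal sketch of that manual loop, with an invented function name, might be:

```python
def count_zero_blocks(a, n):
    """Count non-overlapping runs of n consecutive '0' characters in a."""
    total = 0
    run = 0                      # length of the current run of zeros
    for ch in a:
        if ch == '0':
            run += 1
        else:
            total += run // n    # a run of length L holds L // n blocks
            run = 0
    return total + run // n      # account for a run that ends the string
```

For "01000100" and n = 2 this returns 2, the same as the split() version.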
Finally, for relatively large values of n (n > 10 ?), there might be a sub-linear algorithm (or one that is still linear, but with a smaller constant). It starts by comparing a[n-1] to 0 and then works back towards the beginning. Chances are there is going to be a 1 somewhere, so we don't have to analyse the beginning of the string if a[n-1] is 1, simply because there's no way to fit enough zeros in there. Assuming we have found a 1 at position k, the next position to compare would be a[k+n], again working back towards position k+1.
This way we can effectively skip most of the string during the search.
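One way to read that skipping idea is sketched below; the function name is hypothetical. Each window of length n is scanned from its right end, and as soon as a '1' is found the search jumps to just past it, so long stretches may never be inspected. The worst case (a string of all zeros) is still linear.

```python
def count_zero_blocks_skipping(a, n):
    """Count non-overlapping blocks of n zeros, scanning windows backwards."""
    count = 0
    start = 0
    while start + n <= len(a):
        k = start + n - 1
        # Walk from the right end of the window towards its start.
        while k >= start and a[k] == '0':
            k -= 1
        if k < start:
            count += 1       # the whole window is zeros
            start += n       # next block must not overlap this one
        else:
            start = k + 1    # no block can contain the '1' at position k
    return count
```

For "01000100" and n = 2 the windows examined start at indices 0, 2, 4 and 6, and the result is 2.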
lenik posted a very good response that worked well. I also found another method faster than count() that I will post here as well. It uses the findall() method from the regex library:
import re
def my_count(a, n):
    return len(re.findall('0'*n, a))

Time complexity of integer comparison in python

What is the time complexity of integer comparison in Python for very large integers? For example, if we calculate factorial of 1000 using 2 functions, then check equality, is it O(1)?
def fact(n):
    prod = 1
    for i in range(n):
        prod = prod * (i + 1)
    return prod
i = fact(1000)
j = fact(1000)
# Complexity of this check?
if i == j:
    print("Equal")
There isn't a simple answer, but the answer is nevertheless obvious ;-)
That is, if two integers are in fact equal, it's impossible to know that without comparing all their bits. So in case of equality, the time needed is proportional to the number of bits (which is proportional to log(abs(N)) if N is one of the comparands).
If they're not in fact equal, there are several cases, all related to implementation internals. Long ints are stored as a vector of "digits" in a power-of-2 base. If the vectors don't have the same lengths, then the ints aren't equal, and that takes constant time.
But if they do have the same lengths, then the "digits" have to be compared until finding the first (if any) mismatching pair. That takes time proportional to the number of digits that need to be compared.
Then complicate all the above to account for possible mixtures of signs.
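To get a feel for the sizes involved, the sketch below estimates how many internal "digits" an integer occupies. It is CPython-specific: it relies on sys.int_info.bits_per_digit, the helper name is made up, and the figure is only an approximation of the internal layout.

```python
import math
import sys


def internal_digit_count(value):
    """Rough number of internal 'digits' CPython uses to store value."""
    bits = abs(value).bit_length()
    per_digit = sys.int_info.bits_per_digit   # bits held by one 'digit'
    return max(1, -(-bits // per_digit))      # ceiling division


i = math.factorial(1000)
j = math.factorial(1000)

# Equal values have the same digit count, so every digit must be compared.
same_length = internal_digit_count(i) == internal_digit_count(j)
# A much smaller number has a different length, so inequality is cheap.
different_length = internal_digit_count(i) != internal_digit_count(12345)
```

Here i.bit_length() is in the thousands, so the equality check i == j has to walk a few hundred internal digits, while comparing i with a small integer can stop after looking at the lengths.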

Determine whether N is fibonacci or not, if not find the largest fibonacci number smaller than N

How can I determine whether a given number N is a Fibonacci number or not? If it is not a Fibonacci number, how can I determine the largest Fibonacci number smaller than N?
I found the solution via generating the series of fibonacci number with limit N.
Is there any better way to do this in Python?
To those considering downvotes: I've accepted the solution provided here. I don't think downvoting is worthwhile, since I posted what I needed and got a solution from you.
Thank You.
A simple test for whether some integer N is a Fibonacci number is as follows:
N is a Fibonacci number iff either (5 * N^2 + 4) or (5 * N^2 - 4) is a square number.
See here for the ingenious proof (page 417): http://www.fq.math.ca/Scanned/10-4/advanced10-4.pdf
If it turns out that N is not a Fibonacci number, then the simplest method is just to keep trying with smaller numbers until you find one, although this could take a very long time for large N.
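The square test and the "keep trying smaller numbers" fallback could be sketched like this. The function names are illustrative, and math.isqrt is assumed to be available (Python 3.8 or later); it keeps the square check exact for large integers, where a floating-point sqrt would not be.

```python
import math


def is_perfect_square(x):
    """Exact integer test for whether x is a square number."""
    if x < 0:
        return False
    root = math.isqrt(x)
    return root * root == x


def is_fibonacci(n):
    """True if n is a Fibonacci number, using the 5*n^2 +/- 4 test."""
    if n < 0:
        return False
    return is_perfect_square(5 * n * n + 4) or is_perfect_square(5 * n * n - 4)


def largest_fibonacci_below(n):
    """Largest Fibonacci number strictly smaller than n, or None."""
    candidate = n - 1
    while candidate >= 0 and not is_fibonacci(candidate):
        candidate -= 1       # naive downward search, slow for large n
    return candidate if candidate >= 0 else None
```

For example, is_fibonacci(144) is True, is_fibonacci(10) is False, and largest_fibonacci_below(10) returns 8.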
Here's a general algorithm:
The naive way is to solve it with recursion - but in terms of run time complexity it's not useful at all.
Create a new list; let's call it fibArr.
Insert 1, 1 into the list.
Then the value at the i'th index of the list is fibArr[i-1] + fibArr[i-2] (i >= 3).
In every iteration, check whether the new value inserted into fibArr equals N.
If true, return.
Else, check whether the inserted value is bigger than N.
If true, assuming fibArr now has k values, return the (k-1)th value.
Else, keep iterating :)
*With Python it's even easier to do, but notice that Python has lists rather than arrays.
It's easier with Python because you don't have to set the list length, like in Java.
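A short sketch of that iteration follows. The function name and the (flag, value) return shape are choices made for this example, and N is assumed to be at least 1.

```python
def check_fibonacci(n):
    """Return (True, n) if n is a Fibonacci number, otherwise
    (False, largest Fibonacci number smaller than n). Assumes n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    fib_arr = [1, 1]
    # Keep appending until the newest value reaches or passes n.
    while fib_arr[-1] < n:
        fib_arr.append(fib_arr[-1] + fib_arr[-2])
    if fib_arr[-1] == n:
        return True, n
    # The newest value overshot n, so the one before it is the answer.
    return False, fib_arr[-2]
```

For example, check_fibonacci(13) returns (True, 13) and check_fibonacci(10) returns (False, 8). Only the last two values are ever needed, so the list could be replaced by two variables.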
