Python - Finding Index Location Function

What is the complexity of the following function?
def find_set(string, chars):
    schars = set(chars)
    for i, c in enumerate(string):
        if c in schars:
            return i
    return -1

print(find_set("Happy birthday", "py"))
In this instance, 2 is returned, since 'p' is the first character of "Happy birthday" that appears in the set of chars.
Is it possible to further optimize this function?

Your worst-case time complexity is O(len(string) * len(set)): set membership is O(1) on average, but can degrade to O(len(set)) with pathological hash collisions. Yes, you can do better (at least from an algorithms perspective).
def find_set(string, chars):
    schars = set(chars)
    return next((i for i, c in enumerate(string) if c in schars), -1)
This should execute in O(len(chars) + len(string)) in the expected case. Of course, when it comes to "optimization", usually you should forget what you think you know and profile. Just because mine has better algorithmic complexity doesn't mean it will perform better on your real-world data.
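For instance, a minimal timeit sketch (the input here is an illustrative near-worst case where the only match is the final character; your numbers will vary):

import timeit

def find_set_loop(string, chars):
    schars = set(chars)
    for i, c in enumerate(string):
        if c in schars:
            return i
    return -1

def find_set_next(string, chars):
    schars = set(chars)
    return next((i for i, c in enumerate(string) if c in schars), -1)

s = "abcdefgh" * 1000 + "y"  # only the trailing 'y' is in the char set
for fn in (find_set_loop, find_set_next):
    print(fn.__name__, timeit.timeit(lambda: fn(s, "xyz"), number=1000))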

Related

What is the algorithmic complexity of this recursive function?

I have the following recursive Python function:
def f(n):
    if n <= 2:
        result = n
    else:
        result = f(n-2) + f(n-3)
    return result
What would you say is the algorithmic complexity of it?
I was thinking this function is really similar to the Fibonacci recursive function which has a complexity of O(2^N) but I am not sure if that would be correct for this specific case.
Just write the complexity function.
Assuming every atomic operation here costs 1 (they are all O(1); even if they are not literally equal in cost, that is equivalent for big-O purposes), the complexity function is
def complexity(n):
    if n <= 2:
        return 1
    else:
        return 1 + complexity(n-2) + complexity(n-3)
So the cost of computing f is essentially the same function as f itself!
Which is not surprising: the only values f can return directly are the base cases (at most 2). All other values are sums of other calls to f. In the absence of any mechanism to avoid recomputing redundant values, if f(n) evaluates to, say, 2354964531714, then it took on the order of 2354964531714/2 additions of base-case results to reach it.
So the number of additions needed to compute f(n) is O(f(n)).
And f(n) grows exponentially, with a base somewhere between 1.32 and 1.33 (f(n)/1.32ⁿ → ∞ while f(n)/1.33ⁿ → 0), so the complexity is exponential.
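As a quick sanity check (an illustrative sketch, not part of the original answer), computing f bottom-up makes large n feasible, and the ratio of consecutive values settles on the base:

def f_bottom_up(n):
    # same f as above, computed iteratively instead of recursively
    vals = [0, 1, 2]
    for i in range(3, n + 1):
        vals.append(vals[i - 2] + vals[i - 3])
    return vals

vals = f_bottom_up(60)
print(vals[-1] / vals[-2])  # ~1.3247, indeed between 1.32 and 1.33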
Let's suppose, guessing wildly, that T(n) ~ rT(n-1) for large values of n. Then our recurrence relation...
T(n) = c + T(n-2) + T(n-3)
Can be rewritten...
r^3T(n-3) ~ c + rT(n-3) + T(n-3)
Letting x = T(n-3), we have
(r^3)x ~ c + rx + x
Rearranging, we get
x(r^3 - r - 1) ~ c
And finally:
r^3 - r - 1 ~ c/x
Assuming again that x is very large, c/x ~ 0, so
r^3 - r - 1 ~ 0
I didn't have much luck solving this analytically; however, as chrslg finds and Wolfram Alpha confirms, it has a real root around 1.32-1.33, so there is a real value of r that works, and the time complexity is bounded by an exponential function with that base. If I am able to find an analytical solution, I will post the details (or if anybody else can do it, please leave it here).
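For a quick numeric check (an illustrative sketch, not part of the original answer), simple bisection recovers the root of r^3 - r - 1 = 0; the value it finds, about 1.3247, is the so-called plastic number:

def bisect_root(f, lo, hi, eps=1e-12):
    # assumes f changes sign exactly once on [lo, hi]
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

print(bisect_root(lambda r: r**3 - r - 1, 1.0, 2.0))  # ~1.324717957...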

Recursion with memory vs loop

I've made two functions for computing the Fibonacci sequence, one using recursion with memory (memoization) and one using a loop:
def fib_rec(n, dic = {0 : 0, 1 : 1}):
    if n in dic:
        return dic[n]
    else:
        fib = fib_rec(n - 2, dic) + fib_rec(n - 1, dic)
        dic[n] = fib
        return fib

def fib_loop(n):
    if n == 0 or n == 1:
        return n
    else:
        smaller = 0
        larger = 1
        for i in range(1, n):
            smaller, larger = larger, smaller + larger
        return larger
I've heard that the Fibonacci sequence is often solved using recursion, but I'm wondering why. Both my algorithms have linear time complexity, but the one using a loop doesn't have to carry a dictionary of all past Fibonacci numbers, and it won't exceed Python's recursion depth limit.
Is this problem solved using recursion only to teach recursion or am I missing something?
The usual recursive O(N) Fibonacci implementation is more like this:
def fib(n, a=0, b=1):
    if n == 0: return a
    if n == 1: return b
    return fib(n - 1, b, a + b)
The advantage with this approach (aside from the fact that it uses O(1) memory) is that it is tail-recursive: some compilers and/or runtimes can take advantage of that to secretly convert it to a simple JUMP instruction. This is called tail-call optimization.
Python, sadly, doesn't use this strategy, so it will use extra memory for the call stack, which as you noted quickly runs into Python's recursion depth limit.
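For illustration (a hand-rolled sketch of what that optimization does mechanically; Python itself will not do this for you), the tail call can be rewritten as parameter rebinding inside a loop:

def fib(n, a=0, b=1):
    # each iteration plays the role of one tail call: rebind and "jump" to the top
    while True:
        if n == 0: return a
        if n == 1: return b
        n, a, b = n - 1, b, a + b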
The Fibonacci sequence is mostly a toy problem, used for teaching people how to write algorithms and about big Oh notation. It has elegant functional solutions as well as showing the strengths of dynamic programming (basically your dictionary-based solution), but it's also practically a solved problem.
We can also go a lot faster. The page https://www.nayuki.io/page/fast-fibonacci-algorithms describes how. It includes a fast doubling algorithm written in Python:
#
# Fast doubling Fibonacci algorithm (Python)
# by Project Nayuki, 2015. Public domain.
# https://www.nayuki.io/page/fast-fibonacci-algorithms
#

# (Public) Returns F(n).
def fibonacci(n):
    if n < 0:
        raise ValueError("Negative arguments not implemented")
    return _fib(n)[0]

# (Private) Returns the tuple (F(n), F(n+1)).
def _fib(n):
    if n == 0:
        return (0, 1)
    else:
        a, b = _fib(n // 2)
        c = a * (b * 2 - a)
        d = a * a + b * b
        if n % 2 == 0:
            return (c, d)
        else:
            return (d, c + d)
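A quick way to convince yourself the fast-doubling version is correct (an illustrative check, not part of the Nayuki code) is to compare it against a plain loop for small n:

def fib_check(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert all(fibonacci(n) == fib_check(n) for n in range(100))
print(fibonacci(100))  # 354224848179261915075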

find index of first occurrence of str from given str

Google or Amazon might ask the following question in an interview; would my solution be accepted?
Problem: find the index of the first occurrence of a given word in a given string.
Note: the problem above is from a website, and the following code passed all of its test cases. However, I am not sure whether this is the most optimal solution, and so whether it would be accepted by the big giants.
def strStr(A, B):
    if len(A) == 0 or len(B) == 0:
        return -1
    for i in range(len(A)):
        c = A[i:i+len(B)]
        if c == B:
            return i
    else:
        return -1
There are a few algorithms you can learn on this topic, like the Rabin-Karp algorithm, the Z algorithm, and the KMP algorithm, which all run in O(n + m) time, where n is the string length and m is the pattern length. Your algorithm runs in O(n*m) time. I would suggest starting with the Rabin-Karp algorithm; I personally found it the easiest to grasp.
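For example, here is a minimal Rabin-Karp sketch (rolling hash plus an explicit character comparison to guard against collisions; the base and modulus are arbitrary illustrative choices, not canonical constants):

def rabin_karp(text, pattern):
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return -1
    base, mod = 256, 1_000_000_007
    ph = th = 0  # hashes of the pattern and of the current text window
    for i in range(m):
        ph = (ph * base + ord(pattern[i])) % mod
        th = (th * base + ord(text[i])) % mod
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    for i in range(n - m + 1):
        # compare actual characters only when the hashes agree
        if th == ph and text[i:i+m] == pattern:
            return i
        if i < n - m:
            # roll the window: drop text[i], append text[i+m]
            th = ((th - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return -1

print(rabin_karp("hello world", "world"))  # 6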
There are also some more advanced topics, like searching for many patterns in one string with the Aho-Corasick algorithm, which is good to read about. I think this is what grep uses when searching for multiple patterns.
Hope it helps :)
Python actually has a built-in function for this, which is why this question doesn't seem like a great fit for interviews in Python. Something like this would suffice:
def strStr(A, B):
    return A.find(B)
Otherwise, as commenters have mentioned, inputs/outputs and tests are important. You could add some checks that make it slightly more performant (i.e. check that B is not longer than A), but in general you won't do better than O(n).
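A hedged sketch of that guard (it just short-circuits cases where a match is impossible before delegating to the built-in):

def strStr(A, B):
    # bail out early when a match is impossible
    if not A or not B or len(B) > len(A):
        return -1
    return A.find(B)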
If you want to match the entire word against the words in the string, your code would not work.
E.g. if the arguments are print(strStr('world hello world', 'wor')), your code returns 0, but it should return -1.
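If whole-word matching is actually the requirement, a regular expression with word boundaries is one way to express it (an illustrative sketch, not part of the original interview problem):

import re

def find_word(text, word):
    # \b anchors the match to word boundaries, so 'wor' won't match inside 'world'
    m = re.search(r'\b' + re.escape(word) + r'\b', text)
    return m.start() if m else -1

print(find_word('world hello world', 'wor'))    # -1
print(find_word('world hello world', 'world'))  # 0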
I checked your function; it works well in Python 3.6. Note that both your function and the built-in str.find return 0-based indices:
print(strStr('abcdef', 'bcd'))   # your function: prints 1
print("adbcdef".find('bcd'))     # built-in str.find: prints 2
To get the first occurrence index, use index() or find():
text = 'hello i am homer simpson'
index = text.index('homer')
print(index)
index = text.find('homer')
print(index)
output:
11
11
It is always better to go for the built-in Python functions.
But sometimes in interviews they will ask you to implement it yourself. The best thing to do is to start with the simplest version, then think about corner cases and improvements.
Here you have a test with your version, a slightly improved one that avoids slicing a new string at every index, and the Python built-in:
A = "aaa foo baz fooz bar aaa"
B = "bar"

def strInStr1(A, B):
    if len(A) == 0 or len(B) == 0:
        return -1
    for i in range(len(A)):
        c = A[i:i+len(B)]
        if c == B:
            return i
    else:
        return -1

def strInStr2(A, B):
    size = len(B)
    for i in range(len(A)):
        if A[i] == B[0]:
            if A[i:i+size] == B:
                return i
    return -1

def strInStr3(A, B):
    return A.index(B)

import timeit
setup = '''from __main__ import strInStr1, strInStr2, strInStr3, A, B'''
for f in ("strInStr1", "strInStr2", "strInStr3"):
    result = timeit.timeit(f"{f}(A, B)", setup=setup)
    print(f"{f}: ", result)
The results speak for themselves (time in seconds):
strInStr1: 15.809420814999612
strInStr2: 7.687011377005547
strInStr3: 0.8342400040055509

Recursion, memoization and mutable default arguments in Python

"Base" meaning without just using lru_cache. All of these are "fast enough" -- I'm not looking for the fastest algorithm -- but the timings surprised me so I was hoping I could learn something about how Python "works".
Simple loop (/tail recursion):
def fibonacci(n):
    a, b = 0, 1
    if n in (a, b): return n
    for _ in range(n - 1):
        a, b = b, a + b
    return b
Simple memoized:
def fibonacci(n, memo={0:0, 1:1}):
    if len(memo) <= n:
        memo[n] = fibonacci(n - 1) + fibonacci(n - 2)
    return memo[n]
Using a generator:
def fib_seq():
    a, b = 0, 1
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b

def fibonacci(n):
    return next(x for (i, x) in enumerate(fib_seq()) if i == n)
I expected the first, being dead simple, to be the fastest. It's not. The second is by far the fastest, despite the recursion and lots of function calls. The third is cool, and uses "modern" features, but is even slower, which is disappointing. (I was tempted to think of generators as in some ways an alternative to memoization -- since they remember their state -- and since they're implemented in C I was hoping they'd be faster.)
Typical results:
loop: about 140 μs
memo: about 430 ns
genr: about 250 μs
So can anyone explain, in particular, why memoization is an order of magnitude faster than a simple loop?
EDIT:
Clear now that I have (like many before me) simply stumbled upon Python's mutable default arguments. This behavior explains the real and the apparent gains in execution speeds.
What you're seeing is the whole point of memoization. The first time you call the function, the memo cache is empty and it has to recurse. But the next time you call it with the same or a lower argument, the answer is already in the cache, so it returns immediately. If you perform thousands of calls, you're amortizing that first call's cost over all the other calls. That's what makes memoization such a useful optimization: you only pay the cost the first time.
If you want to see how long it takes when the cache is fresh and you have to do all the recursions, you can pass the initial cache as an explicit argument in the benchmark call:
fibonacci(100, {0:0, 1:1})
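You can observe the shared state directly (an illustrative sketch, reusing the question's function): the default dict lives on the function object, so it persists between calls.

def fibonacci(n, memo={0: 0, 1: 1}):
    if len(memo) <= n:
        memo[n] = fibonacci(n - 1) + fibonacci(n - 2)
    return memo[n]

fibonacci(50)
# the same dict object is reused by every call that omits the memo argument
print(len(fibonacci.__defaults__[0]))  # 51 -- values 0..50 are now cached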

Low Autocorrelation Binary Sequence problem: Python troubleshooting

I'm trying to model this problem (for details, see http://www.mpi-hd.mpg.de/personalhomes/bauke/LABS/index.php).
I've seen that the proven minimum for a sequence of 10 digits is 13. However, my application seems to be getting 12 quite frequently. This implies some kind of error in my program. Is there an obvious error in the way I've modeled those summations in this code?
def evaluate(self):
    self.fitness = 10000000000  # horrible practice, I know..
    h = 0
    for g in range(1, len(self.chromosome) - 1):
        c = self.evaluateHelper(g)
        h += c**2
    self.fitness = h

def evaluateHelper(self, g):
    """
    Helper for evaluate function. The c sub g function.
    """
    totalSum = 0
    for i in range(len(self.chromosome) - g - 1):
        product = self.chromosome[i] * self.chromosome[(i + g) % (len(self.chromosome))]
        totalSum += product
    return totalSum
I can't spot any obvious bug offhand, but you're making things really complicated, so maybe a bug's lurking and hiding somewhere. What about
def evaluateHelper(self, g):
    return sum(a*b for a, b in zip(self.chromosome, self.chromosome[g:]))
this should return the same values you're computing in that subtle loop (where I think the % len... part is provably redundant). Similarly, the evaluate method seems ripe for a similar 1-liner. But, anyway...
There's a potential off-by-one issue: the formulas in the article you point to are summing for g from 1 to N-1 included -- you're using range(1, len(...)-1), whereby N-1 is excluded. Could that be the root of the problem you observe?
Your bug was here:
for i in range(len(self.chromosome) - g - 1):
The maximum value for i will be len(self.chromosome) - g - 2, because range is exclusive. Thus, you don't consider the last pair. It's basically the same as your other bug, just in a different place.
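Putting both off-by-one fixes together (a hedged sketch assuming the same class context, and using the zip-based helper suggested above):

def evaluate(self):
    # E = sum of c_g squared for g = 1 .. N-1 inclusive
    self.fitness = sum(self.evaluateHelper(g) ** 2
                       for g in range(1, len(self.chromosome)))

def evaluateHelper(self, g):
    # c_g: correlation of the sequence with a copy shifted by g;
    # zip stops at the shorter argument, covering i = 0 .. N-g-1 inclusive
    return sum(a * b for a, b in zip(self.chromosome, self.chromosome[g:]))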
