Trying to understand the space complexity of concatenated string output - python

I had this problem in a coding interview:
# AAABB should return A3B2
This is a classic algorithm interview question. I said that I can solve this in O(n) time and O(1) space.
def compress(s):
    output = ''
    count = 1
    for i in range(len(s) - 1):
        if s[i] == s[i + 1]:
            count += 1
        else:
            output = output + s[i] + str(count)
            count = 1
    output = output + s[i + 1] + str(count)
    return output

compress('AAABB')  # returns 'A3B2'
I understand that O(n) space means that it grows proportionally with the size of input. So I was thinking that O(n) space would look something like
[(A,3),(B,2)].
I am under the impression that A3B2 takes O(1) space, since it's not being split up into multiple strings.
I now realize that n == len(s) and my output grows more slowly than my input, so is it correct to say that the space is O(log n)?

The length of the output string you store must be counted. In the worst case (no consecutive characters match), it’s actually twice as long as the input. So clearly it’s O(n) in general: it would only be asymptotically better if somehow you knew that long inputs always contained very long runs. (In the best case, all characters are the same, and the length of the one number is O(log n).)
That said, it's sometimes useful to treat your output as a stream (like print), and then your auxiliary space (for count and perhaps the current input character) is constant. Of course, even then it's technically logarithmic, since the number of bits needed to store count grows with the length of the longest run, but that's often disregarded in practical analyses.
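To make the streaming idea concrete, here is a sketch of the same run-length encoding written as a generator (my own rewrite, not the code from the question): only prev and count live in memory at any moment, and each finished piece goes straight to the consumer.

```python
def compress_stream(chars):
    # O(1) auxiliary space (ignoring the log-size counter): we only
    # remember the previous character and the length of its current run
    prev, count = None, 0
    for c in chars:
        if c == prev:
            count += 1
        else:
            if prev is not None:
                yield prev + str(count)  # emit the finished run
            prev, count = c, 1
    if prev is not None:
        yield prev + str(count)  # emit the final run
```

''.join(compress_stream('AAABB')) gives 'A3B2'; printing each piece as it is yielded keeps the program's own footprint constant.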

Related

String divisibility without search tables?

Given two strings s & t, determine if s is divisible by t. For
example: "abab" is divisible by "ab" But "ababab" is not divisible by
"abab". If it isn't divisible, return -1. If it is, return the length
of the smallest common divisor: So, for "abababab" and "abab", return
2 as s is divisible by t and the smallest common divisor is "ab" with
length 2.
The way I thought it through was: take the lengths of the two strings and find their greatest common divisor. If t divides s, then the smallest common divisor is just the smallest divisor of t. And then one can use this algorithm: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.
Is there any simpler solution?
To test for divisibility:
test that the length of s is a multiple of the length of t (otherwise not divisible);
divide s into chunks of length len(t); and
check that all the chunks are the same.
To find the smallest common divisor, you need to find the shortest repeating substring of t that makes up the whole of t. One approach is:
Find the factors of the length of t (the crude approach of searching from 1 up to sqrt(len(t)) should be fine for strings of any reasonable length);
For each factor (start with the smallest):
i. divide t into chunks of length factor;
ii. check if all the chunks are the same, and return factor if they are.
Using a Python set is a neat way to check if all the chunks in a list are equal. len(set(chunks)) == 1 tells you they are.
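Putting the steps above together, a sketch of that approach (the function names here are mine):

```python
from math import isqrt

def divides(s, t):
    # t "divides" s if s is t repeated a whole number of times
    return len(s) % len(t) == 0 and s == t * (len(s) // len(t))

def smallest_divisor_length(t):
    # try every factor of len(t), smallest first; the chunks of that
    # length must all be identical for the prefix to divide t
    n = len(t)
    factors = set()
    for f in range(1, isqrt(n) + 1):
        if n % f == 0:
            factors.update((f, n // f))
    for f in sorted(factors):
        if len({t[i:i + f] for i in range(0, n, f)}) == 1:
            return f
    return n

def smallest_common_divisor(s, t):
    return smallest_divisor_length(t) if divides(s, t) else -1
```
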
The GCD of the string lengths doesn't help you.
A string t "divides" a string s if len(s) is a multiple of len(t) and s == t * (len(s)//len(t)).
As for finding the length of the smallest divisor of t, the classic trick for that is (t+t).index(t, 1): find the first nonzero index of t in t+t. I've used the built-in index method here, but depending on why you're doing this and what runtime properties you want, KMP string search may be a better choice. Manually implementing KMP in Python is going to have a massive constant-factor overhead, but it'll have a much better worst-case runtime than index.
KMP isn't particularly hard to implement, as far as string searches go. You're not going to get anything significantly "simpler" without delegating to a library routine or sacrificing asymptotic complexity properties (or both).
If you want to avoid a search table and keep asymptotic complexity guarantees, you can use something like two-way string matching, but it's not going to be any easier to implement than KMP.
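For comparison, a sketch of the (t+t).index(t, 1) trick (names are mine; this uses the built-in index, so the worst-case caveat above applies):

```python
def smallest_period(t):
    # the first nonzero index of t inside t+t is the length of the
    # shortest u such that t == u * k
    return (t + t).index(t, 1)

def divisor_via_rotation(s, t):
    # -1 if t doesn't divide s, else the length of t's smallest divisor
    if len(s) % len(t) or s != t * (len(s) // len(t)):
        return -1
    return smallest_period(t)
```
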

How to suppress output in python without using characters and spaces?

I found out I can solve this problem with better space complexity if I enter the inputs one at a time. However, Python is defeating the purpose of my script, because it echoes the O(n) elements that I just gave as input:
0
2
no
See that 0 and 2 from the inputs? I only need one line, and it can only show the current input for that one particular line (not 0, 2, ...). Otherwise, the computer is technically still using O(n) space just for the graphics card to remember which pixels to output.
I tried using os.devnull and other methods, but the computer still used O(N) space by simply outputting none or null. Outputting space characters still uses O(N) space, as does every other possible character you can think of. Output must be 100% suppressed, excluding the yes or no outputs.
This isn't impossible, because I guarantee you that the algorithm works by hand with auxiliary space better than O(N).
# Decision Problem: Is N in index M?
import sys
import os

M = 2
N = 2
index = -1
while True:
    # We are going to enter each element from our list ONE AT A TIME.
    # This will improve our space-complexity that is better than O(n)
    # Be careful not to enter the elements out of order.
    a = int(input(os.devnull))
    index = index + 1
    if a == N:
        if index == M:
            print('yes')
            break
    if index > M:
        print('no')
        break
    if index < M:
        if a == N:
            print('no')
            break
Question
How do I suppress output completely without losing my "yes" or "no" outputs?

Is there a faster way to count non-overlapping occurrences in a string than count()?

Given a minimum length N and a string S of 1's and 0's (e.g. "01000100"), I am trying to return the number of non-overlapping occurrences of a sub-string of length n containing all '0's. For example, given n=2 and the string "01000100", the number of non-overlapping "00"s is 2.
This is what I have done:
def myfunc(S, N):
    return S.count('0' * N)
My question: is there a faster way of performing this for very long strings? This is from an online coding practice site and my code passes all but one of the test cases, which fails due to not being able to finish within a time limit. Doing some research it seems I can only find that count() is the fastest method for this.
This might be faster:
>>> s = "01000100"
>>> def my_count(a, n):
...     parts = a.split('1')
...     return sum(len(p) // n for p in parts)
...
>>> my_count(s, 2)
2
>>>
The worst case for count() is O(N^2); the function above is strictly linear, O(N). Here's the discussion where the O(N^2) figure came from: What's the computational cost of count operation on strings Python?
Also, you can always do this manually, without split(): just loop over the string, increasing a counter on every '0' and, on every '1', adding counter // n to the total and resetting the counter. That is also strictly O(N), and it avoids building the intermediate list of parts.
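That manual loop might look like this (my naming; it counts the same greedy non-overlapping occurrences as count('0' * n)):

```python
def my_count_loop(s, n):
    # single pass: track the current run of zeros and harvest
    # run // n whenever the run ends (and once more at the end)
    total = run = 0
    for c in s:
        if c == '0':
            run += 1
        else:
            total += run // n
            run = 0
    return total + run // n
```
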
Finally, for relatively large values of n (n > 10 ?), there might be a sub-linear (or still linear, but with a smaller constant) algorithm, which starts with comparing a[n-1] to 0, and going back to beginning. Chances are, there going to be a 1 somewhere, so we don't have to analyse the beginning of the string if a[n-1] is 1 -- simply because there's no way to fit enough zeros in there. Assuming we have found 1 at position k, the next position to compare would be a[k+n-1], again going back to the beginning of the string.
This way we can effectively skip most of the string during the search.
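The skipping idea might be sketched like this (my implementation of the description above; the worst case is still O(N), but on strings with frequent 1s most positions are never examined):

```python
def my_count_skip(s, n):
    # probe every n-th position: a '1' there rules out any all-zero
    # window of length n ending at or just before it
    total, i = 0, n - 1
    while i < len(s):
        if s[i] == '1':
            i += n  # no run of n zeros fits entirely before position i+n
            continue
        lo = i  # expand to the maximal zero run containing position i
        while lo > 0 and s[lo - 1] == '0':
            lo -= 1
        hi = i
        while hi + 1 < len(s) and s[hi + 1] == '0':
            hi += 1
        total += (hi - lo + 1) // n
        i = hi + n  # next candidate window ends n past this run
    return total
```
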
lenik posted a very good response that worked well. I also found another method faster than count() that I will post here as well. It uses the findall() method from the regex library:
import re
def my_count(a, n):
    return len(re.findall('0' * n, a))

Regarding Time complexity of program

I did this question in a competitive exam, but I am struggling to work out the time complexity of the program, i.e. whether it is O(n) or O(n^2), in Python 3. Can anyone help me?
I asked some of my friends: some of them said it is O(n) and some said it is O(n^2), so I am totally confused by their answers.
s = input()  # reading base string
b = input()  # reading reference string
for i in s:
    if i in b:
        print(i, end='')
Sample Input:
polikujmnhytgbvfredcxswqaz  # base string
abcd  # reference string
Sample output:
bdca
The misconception here, which in my experience is common among beginners, is that you can only have one variable in your big-O notation. This seems to happen because most introductory examples are shown with a single input. When you have multiple independent inputs, you can have multiple variables, since the complexity will scale independently when either input changes.
A ubiquitous example of this is graphs. Graphs have nodes and edges. The number of nodes might set an upper bound on the number of edges, but the two are really quite independent. Most graph algorithms are therefore analyzed in terms of V and E, rather than a single variable N.
What this means for you is that you have two independent quantities. Let's say S = len(s) and B = len(b). The outer loop performs S iterations. The operator in b performs B operations in the worst case. If you assume that print runs in constant time for a single character, the result is O(S * B).
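To see the S * B term concretely, compare the original test with a set-based variant (a common follow-up optimization, not part of the question's code; these return the result rather than printing it):

```python
def common_chars(s, b):
    # O(S * B): 'in' against the string b rescans b for every character of s
    return ''.join(c for c in s if c in b)

def common_chars_fast(s, b):
    # O(S + B): build the set once; each lookup is then O(1) on average
    b_set = set(b)
    return ''.join(c for c in s if c in b_set)
```
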
Where
n = len(s)
m = len(b)
your code will scale with time complexity
O(m*n)
since in the worst case you loop through the whole base string and, for each character, perform up to m constant-time comparisons for the in check. N is often used as a placeholder in theory, but it has no fixed meaning with respect to your code.

Fastest way to compute e^x?

What is the fastest way to compute e^x, given x can be a floating point value.
Right now I have used Python's math library to compute this. Below is the complete code, in which result = -0.490631 + 0.774275 * math.exp(0.474907 * sum) is the main logic; the rest is file-handling code, which the question demands.
import math
import sys

def sum_digits(n):
    r = 0
    while n:
        r, n = r + n % 10, n // 10
    return r

def _print(string):
    fo = open("output.txt", "w+")
    fo.write(string)
    fo.close()

try:
    f = open('input.txt')
except IOError:
    _print("error")
    sys.exit()

data = f.read()
num = data.split('\n', 1)[0]

try:
    val = int(num)
except ValueError:
    _print("error")
    sys.exit()

sum = sum_digits(int(num))
f.close()

if sum == 2:
    _print("1")
else:
    result = -0.490631 + 0.774275 * math.exp(0.474907 * sum)
    _print(str(math.ceil(result)))
The rvalue of result is the equation of a curve (which is the solution to a programming problem) that I derived in Wolfram Mathematica from my own data set.
But this doesn't seem to pass the par criteria of the assessment !
I have also tried the Newton-Raphson way, but convergence for larger x is the problem there; beyond that, calculating the natural log ln(x) is a challenge again!
I don't have any language constraint, so any solution is acceptable. Also, if Python's math library is the fastest, as some of the comments say, can anyone give an insight into the time complexity and execution time of this program, in short, the efficiency of the program?
I don't know if the exponential curve math is accurate in this code, but it certainly isn't the slow point.
First, you read the input data in one read call. It does have to be read, but that loads the entire file. The next step takes the first line only, so it would seem more appropriate to use readline. That split itself is O(n) where n is the file size, at least, which might include data you were ignoring since you only process one line.
Second, you convert that line into an int. This probably requires Python's long integer support, but the operation could be O(n) or O(n^2). A single pass algorithm would multiply the accumulated number by 10 for each digit, allocating one or two new (longer) longs each time.
Third, sum_digits breaks that long int down into digits again. It does so using division, which is expensive, and two operations as well, rather than using divmod. That's O(n^2), because each division has to process every higher digit for each digit. And it's only needed because of the conversion you just did.
Summing the digits found in a string is likely easier done with something like sum(int(c) for c in l if c.isdigit()) where l is the input line. It's not particularly fast, as there's quite a bit of overhead in the digit conversions and the sum might grow large, but it does make a single pass with a fairly tight loop; it's somewhere between O(n) and O(n log n), depending on the length of the data, because the sum might grow large itself.
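That suggestion in code (digit_sum_line is my name for it):

```python
def digit_sum_line(line):
    # sum the digits straight from the text, skipping the big-int round trip
    return sum(int(c) for c in line if c.isdigit())
```
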
As for the unknown exponential curve, the existence of an exception for a low number is concerning. There's likely some other option that's both faster and more accurate if the answer's an integer anyway.
Lastly, you have at least four distinct output data formats: error, 2, 3.0, 3e+20. Do you know which of these is expected? Perhaps you should be using formatted output rather than str to convert your numbers.
One extra note: If the data is really large, processing it in chunks will definitely speed things up (instead of running out of memory, needing to swap, etc). As you're looking for a digit sum your size complexity can be reduced from O(n) to O(log n).
