How to optimize short string lexing in Python for speed

I'm trying to lex (i.e., tokenize) escaped strings in pure CPython fast (without resorting to C code).
The best I have been able to come up with is the following:
def bench(s, c, i, n):
    m = 0
    iteration = 0
    while iteration < n:
        # How do I optimize this part?
        # Inputs: string s, index i
        k = i
        while True:
            j = s.index(c, k, n)
            sub = s[k:j]
            if '\\' not in sub: break
            k += sub.index('\\') + 2
        # Outputs: substring s[i:j], index j
        m += j - i
        iteration += 1
    return m
def test():
    from time import clock
    start = clock()
    s = 'sd;fa;sldkfjas;kdfj;askjdf;askjd;fasdjkfa, "abcdefg", asdfasdfas;dfasdl;fjas;dfjk'
    m = bench(s, '"', s.index('"') + 1, 3000000)
    print "%.0f chars/usec" % (m / (clock() - start) / 1000000,)
test()
However, it's still somewhat slow for my taste. It seems that the invocation of .index is taking a lot of time in my actual project, though it doesn't seem to happen quite as often in this benchmark.
Most strings that it needs to lex can be assumed to be relatively short (say, 7 characters) and are unlikely to contain backslashes. I've already optimized for that somewhat. My question is:
Are there any optimizations I could make to speed up this code? If so, what?
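For reference, the inner loop can be pulled out into a standalone Python 3 function. This is only a sketch: the name scan_string is hypothetical, and it uses s.find with bounds in place of slicing, which is one variant that might be worth timing. It has not been benchmarked here.

```python
def scan_string(s, quote, i):
    """Return the index j of the first unescaped `quote` at or after i.

    The lexed substring is then s[i:j]. Raises ValueError (from str.index)
    if there is no closing quote, like the loop in the question.
    """
    k = i
    while True:
        j = s.index(quote, k)
        # Look for a backslash between k and the candidate quote
        # without building a temporary substring.
        backslash = s.find('\\', k, j)
        if backslash < 0:
            return j
        # Skip the backslash and the character it escapes.
        k = backslash + 2


text = r'x, "a\"b", y'
start = text.index('"') + 1
end = scan_string(text, '"', start)
print(text[start:end])
```

Skipping two characters after each backslash mirrors the `k += sub.index('\\') + 2` step in the original loop.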

Related

Python - "Fast Exponention Algorithm" performance outperformed by "worse" algorithm

Can anyone explain to me how this code:
def pow1(a, n):
    DP = [None] * (n+1)
    DP[1] = a
    i = [1]
    while not i[-1] == n:
        if(i[-1]*2 <= n):
            DP[i[-1]*2] = DP[i[-1]]*DP[i[-1]]
            i.append(i[-1]+i[-1])
        else:
            missing = n-i[-1]
            low = 0
            high = len(i) - 1
            mid = 0
            while low <= high:
                mid = (high + low) // 2
                if i[mid] < missing:
                    if i[mid+1] > missing:
                        break
                    low = mid + 1
                elif i[mid] > missing:
                    high = mid - 1
                else:
                    break
            DP[i[-1]+i[mid]] = DP[i[-1]]*DP[i[mid]]
            i.append(i[mid]+i[-1])
    return DP[n]
out-performs this code:
def pow2(a, n):
    res = 1
    while (n > 0):
        if (n & 1):
            res = res * a
        a = a * a
        n >>= 1
    return res
Here is how I check them:
import timeit
a = 34 # just arbitrary
n = 2487665 # just arbitrary
starttime = timeit.default_timer()
pow1(a, n)
print("pow1: The time difference is :", (
    timeit.default_timer() - starttime))
starttime = timeit.default_timer()
pow2(a, n)
print("pow2: The time difference is :",
      (timeit.default_timer() - starttime))
This is the result on my MacBook Air m1:
# pow1: The time difference is : 3.71763225
# pow2: The time difference is : 6.091892
As far as I can tell they work very similarly, except that the first one (pow1) stores all its intermediate results, so it needs something like O(log(n)) space, and then has to find the required factors (sub-products) to assemble the final result. That is O(log(n)) multiplications to compute them, plus in the worst case log(n) binary searches, each over a list of at most log(n) entries, for a runtime of O(logN*log(logN)). pow2, on the other hand, never has to search through previous results. So pow2 has time complexity O(logN) and auxiliary space O(1), versus pow1 with time complexity O(logN*log(logN)) and auxiliary space O(logN).
Maybe (probably) I'm missing something essential, but I don't see how the algorithm, the hardware (ARM), Python, or the testing could have this impact.
Also I just realised the space complexity for pow1 is O(n) the way I did it.
OK, so I figured it out. pow2 (which I implemented from cp-algorithms.com, and which can also be found on geeksforgeeks.org in Python) has a bug.
The problem is that this line gets executed one time too many:
res = 1
while (n > 0):
    if (n & 1):
        res = res * a
    a = a * a #<--- THIS LINE
    n >>= 1
return res
It gets executed even though the result has already been calculated, so the function does one more unnecessary multiplication, which has a big impact with big numbers. Here is a very quick fix:
def pow2(a, n):
    res = 1
    while (n > 0):
        if n & 1:
            res = res * a
        if n == 1:
            return res
        a = a * a
        n >>= 1
    return res
New measurements:
# a:34 n:2487665
# pow1(): The time difference is : 3.749621834
# pow2(): The time difference is : 3.072042833
# a**n: The time difference is : 2.119430791000001
Great catch. It's a weakness of while loops that they lead to wasted computations at the bottom of the loop body. Usually it doesn't matter.
Languages like Ada provide an unconditional loop expecting that an internal break (exit in Ada) will be used to leave for this reason.
So fwiw, a cleaner code using this "break in the middle" style would be:
def pow2(a, n):
    res = 1
    while True:
        if (n & 1):
            res *= a
        n >>= 1
        if n == 0:
            return res
        a *= a
With other algorithms, you might need to guard the loop for the case n = 0. But with this one, that's optional. You could check for that and return 1 explicitly before the loop if desired.
Overall, this avoids a comparison per loop wrt your solution. Maybe with big numbers this is worthwhile.
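The wasted squaring discussed above can be made visible by counting squarings in both loop shapes. This is a minimal sketch; the names pow_naive and pow_mid_break and the counter are hypothetical additions for illustration, and the loop bodies otherwise follow the versions shown in this thread.

```python
def pow_naive(a, n):
    """Square-and-multiply with the test at the top of the loop."""
    res, squarings = 1, 0
    while n > 0:
        if n & 1:
            res *= a
        a *= a          # also runs on the last pass, where it is unused
        squarings += 1
        n >>= 1
    return res, squarings


def pow_mid_break(a, n):
    """Same algorithm, leaving the loop before the final squaring."""
    res, squarings = 1, 0
    while True:
        if n & 1:
            res *= a
        n >>= 1
        if n == 0:
            return res, squarings
        a *= a
        squarings += 1


print(pow_naive(3, 13))
print(pow_mid_break(3, 13))
```

For n = 13 the first version squares four times and the second three times. The squaring that is skipped is the last one, which operates on the largest value of a, so it is also the most expensive one when a is a big integer.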

Why is str.replace so fast?

I'm currently learning and practicing algorithms on strings. Specifically I was toying with replacing patterns in strings based on KMP with some modifications, which has O(N) complexity (my implementation below).
def replace_string(s, p, c):
    """
    Replace pattern p in string s with c
    :param s: initial string
    :param p: pattern to replace
    :param c: replacing string
    """
    pref = [0] * len(p)
    s_p = p + '#' + s
    p_prev = 0
    shift = 0
    for i in range(1, len(s_p)):
        k = p_prev
        while k > 0 and s_p[i] != s_p[k]:
            k = pref[k - 1]
        if s_p[i] == s_p[k]:
            k += 1
        if i < len(p):
            pref[i] = k
        p_prev = k
        if k == len(p):
            s = s[:i - 2 * len(p) + shift] + c + s[i - len(p) + shift:]
            shift += len(c) - k
    return s
Then I wrote the same program using the built-in Python str.replace function:
def replace_string_python(s, p, c):
    return s.replace(p, c)
and compared performance for various strings. I'll attach just one example, for a string of length 1e5:
import time
if __name__ == '__main__':
    initial_string = "a" * 100000
    pattern = "a"
    replace = "ab"
    start = time.time()
    res = replace_string(initial_string, pattern, replace)
    print(time.time() - start)
Output (my implementation):
total time: 1.1617710590362549
Output (python built-in):
total time: 0.0015637874603271484
As you can see, the implementation via Python's str.replace is light-years ahead of KMP. So my question is: why is that? What algorithm does Python's C code use?
While the algorithm might be O(N), your implementation does not seem linear, at least not with respect to multiple repetitions of the pattern, because of
s = s[:i - 2 * len(p) + shift] + c + s[i - len(p) + shift:]
which is O(N) itself. Thus, if your pattern occurs N times in a string, your implementation is in fact O(N^2).
The following timings show how your algorithm scales, and they confirm the quadratic shape:
LENGTH TIME
------------
100000 1s
200000 8s
300000 31s
400000 76s
500000 134s
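One way to avoid rebuilding the string on every match is to collect the output pieces in a list and join them once at the end. The following is a sketch under stated assumptions, not the algorithm CPython uses: replace_linear is a hypothetical name, the matcher is a plain KMP prefix function, and an empty pattern simply returns s unchanged (str.replace behaves differently in that case).

```python
def replace_linear(s, p, c):
    """Replace non-overlapping occurrences of p in s with c, left to right."""
    if not p:
        return s  # assumption: treat an empty pattern as "nothing to do"

    # Prefix function of the pattern.
    pref = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = pref[k - 1]
        if p[i] == p[k]:
            k += 1
        pref[i] = k

    pieces = []
    last = 0  # index in s just past the previous match
    k = 0
    for i, ch in enumerate(s):
        while k > 0 and ch != p[k]:
            k = pref[k - 1]
        if ch == p[k]:
            k += 1
        if k == len(p):
            start = i - len(p) + 1
            pieces.append(s[last:start])  # text between matches
            pieces.append(c)
            last = i + 1
            k = 0  # restart so that matches do not overlap
    pieces.append(s[last:])
    return ''.join(pieces)


print(replace_linear("abababa", "aba", "X"))
```

Each character of s is copied into the output once, so the total work stays proportional to the input plus the output size, regardless of how many matches there are.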

Calculating a^b mod p for a large prime p

I'm trying to write a python code that calculates a^b mod p, where p = 10^9+7 for a list of pairs (a,b). The challenge is that the code has to finish the calculation and output the result in < 1 second. I've implemented successive squaring to calculate a^b mod p quickly. Please see my code below:
from sys import stdin, stdout
rl = stdin.readline
wo = stdout.write
m = 10**9+7
def fn(a,n):
    t = 1
    while n > 0:
        if n%2 != 0: #exponent is odd
            t = t*a %m
        a = a*a %m
        n = int(n/2)
    return t%m
t = int(rl()) # number of pairs
I = stdin.read().split() # reading all pairs
I = list(map(int,I)) # making pairs a list of integers
# calculating a^b mod p. I used map because I read its faster than a for loop
s = list(map(fn,I[0:2*t:2],I[1:2*t:2]))
stdout.write('\n'.join(map(str,s))) # printing output
For 200000 pairs (a,b) with a,b<10^9, my code takes > 1 second. I'm new to Python and was hoping someone could help me identify the time bottleneck in my code. Is it reading input and printing output, or the calculation itself? Thanks for the help!
I don't see anything wrong with your code from an efficiency standpoint; it's just unnecessarily complicated.
Here's what I'd call the straightforward solution:
n = int(input())
for _ in range(n):
    a, b = map(int, input().split())
    print(pow(a, b, 10**9 + 7))
That did get accepted with PyPy3 but not with CPython3. And with PyPy3 it still took 0.93 seconds.
I'd say their time limit is inappropriate for Python. But try yours with PyPy3 if you haven't yet.
In case someone's wondering whether the map wastes time, the following got accepted in 0.92 seconds:
n = int(input())
for _ in range(n):
    a, b = input().split()
    print(pow(int(a), int(b), 10**9 + 7))
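To see whether I/O or arithmetic dominates, it can help to separate parsing from computation so each part is easy to time on its own. A minimal sketch, assuming the input format described in the question (a count followed by that many pairs); solve is a hypothetical name, and no claim is made that this meets the 1 second limit on any particular judge.

```python
MOD = 10**9 + 7


def solve(text):
    """Take the whole input as one string, return the whole output as one string."""
    tokens = text.split()
    count = int(tokens[0])
    values = tokens[1:1 + 2 * count]
    # Three-argument pow does modular exponentiation in C.
    results = [
        pow(int(values[k]), int(values[k + 1]), MOD)
        for k in range(0, 2 * count, 2)
    ]
    return '\n'.join(map(str, results))


# On a judge this would be driven by:
#     import sys
#     sys.stdout.write(solve(sys.stdin.read()))
print(solve("2\n2 10\n3 4\n"))
```

Reading all input at once and writing all output at once combines the batching idea from the question with the built-in pow from the answer.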

Homework: Implementing the Z algorithm in python, it's really slow, slower than naive string search

I have to implement the Z algorithm and use it to search a target text for a specific pattern. I've implemented what I thought was the correct algorithm and search function using it but it's really slow. For the naive implementation of string search I consistently got times lower than 1.5 seconds and for the z string search I consistently got times over 3 seconds (for my biggest test case) so I have to be doing something wrong. The results seem to be correct, or were at least for the few test cases we were given. The code for the functions mentioned in my rant is below:
import sys
import time
# z algorithm a.k.a. the fundamental preprocessing algorithm
def z(P, start=1, max_box_size=sys.maxsize):
    n = len(P)
    boxes = [0] * n
    l = -1
    r = -1
    for k in range(start, n):
        if k > r:
            i = 0
            while k + i < n and P[i] == P[k + i] and i < max_box_size:
                i += 1
            boxes[k] = i
            if i:
                l = k
                r = k + i - 1
        else:
            kp = k - l
            Z_kp = boxes[kp]
            if Z_kp < r - k + 1:
                boxes[k] = Z_kp
            else:
                i = r + 1
                while i < n and P[i] == P[i - k] and i - k < max_box_size:
                    i += 1
                boxes[k] = i - k
                l = k
                r = i - 1
    return boxes
# a simple string search
def naive_string_search(P, T):
    m = len(T)
    n = len(P)
    indices = []
    for i in range(m - n + 1):
        if P == T[i: i + n]:
            indices.append(i)
    return indices
# string search using the z algorithm.
# The pattern you're searching for is simply prepended to the target text
# and then the z algorithm is run on that concatenation
def z_string_search(P, T):
    PT = P + T
    n = len(P)
    boxes = z(PT, start=n, max_box_size=n)
    return list(map(lambda x: x[0]-n, filter(lambda x: x[1] >= n, enumerate(boxes))))
Your implementation of the Z-function, def z(..), is algorithmically and asymptotically fine.
It has O(m + n) time complexity in the worst case, while the naive string search has O(m*n) time complexity in the worst case, so I think the problem is in your test cases.
For example if we take this test case:
T = ['a'] * 1000000
P = ['a'] * 1000
we will get for z-function:
real 0m0.650s
user 0m0.606s
sys 0m0.036s
and for naive string matching:
real 0m8.235s
user 0m8.071s
sys 0m0.085s
PS: Keep in mind that there are many test cases where naive string matching runs in linear time too, for example:
T = ['a'] * 1000000
P = ['a'] * 1000000
The worst case for naive string matching is when the function has to apply the pattern and compare again and again. In this case it does only one comparison because of the input lengths (it cannot apply the pattern from index 1, so it won't continue).
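For checking results on small inputs, a textbook-style Z-array search can be compared against a naive scan. This is a sketch, not the code from the question: z_array and z_search are hypothetical names, a separator character is placed between pattern and text (assumed to occur in neither), and Z values are computed for the whole concatenation, including the pattern prefix, because later positions look those values up.

```python
def z_array(s):
    """z[k] = length of the longest common prefix of s and s[k:]."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n
    l = r = 0  # s[l:r] is the rightmost prefix match found so far
    for k in range(1, n):
        if k < r:
            # Reuse the value computed for the mirrored position.
            z[k] = min(r - k, z[k - l])
        while k + z[k] < n and s[z[k]] == s[k + z[k]]:
            z[k] += 1
        if k + z[k] > r:
            l, r = k, k + z[k]
    return z


def z_search(pattern, text, sep='\0'):
    """Return all start indices of pattern in text (overlaps included)."""
    m = len(pattern)
    if m == 0:
        return []
    z = z_array(pattern + sep + text)
    return [k - m - 1 for k in range(m + 1, len(z)) if z[k] >= m]


print(z_search("aa", "aaaa"))
print(z_search("aba", "abababa"))
```

Inputs with many overlapping matches, such as a long run of a single character, are a useful cross-check between a Z-based search and the naive one.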

Python Swap two digits in a number?

What is the fastest way to swap two digits in a number in Python? I am given the numbers as strings, so it'd be nice if I could have something as fast as
string[j] = string[j] ^ string[j+1]
string[j+1] = string[j] ^ string[j+1]
string[j] = string[j] ^ string[j+1]
Everything I've seen has been much more expensive than it would be in C, and involves making a list and then converting the list back or some variant thereof.
This is faster than you might think, at least faster than Jon Clements' current answer in my timing test:
i, j = (i, j) if i < j else (j, i) # make sure i < j
s = s[:i] + s[j] + s[i+1:j] + s[i] + s[j+1:]
Here's my test bed should you want to compare any other answers you get:
import timeit
import types
N = 10000
R = 3
SUFFIX = '_test'
SUFFIX_LEN = len(SUFFIX)
def setup():
    import random
    global s, i, j
    s = 'abcdefghijklmnopqrstuvwxyz'
    i = random.randrange(len(s))
    while True:
        j = random.randrange(len(s))
        if i != j: break
def swapchars_martineau(s, i, j):
    i, j = (i, j) if i < j else (j, i) # make sure i < j
    return s[:i] + s[j] + s[i+1:j] + s[i] + s[j+1:]
def swapchars_martineau_test():
    global s, i, j
    swapchars_martineau(s, i, j)
def swapchars_clements(text, fst, snd):
    ba = bytearray(text)
    ba[fst], ba[snd] = ba[snd], ba[fst]
    return str(ba)
def swapchars_clements_test():
    global s, i, j
    swapchars_clements(s, i, j)
# find all the functions named *SUFFIX in the global namespace
funcs = tuple(value for id,value in globals().items()
              if id.endswith(SUFFIX) and type(value) is types.FunctionType)
# run the timing tests and collect results
timings = [(f.func_name[:-SUFFIX_LEN],
            min(timeit.repeat(f, setup=setup, repeat=R, number=N))
           ) for f in funcs]
timings.sort(key=lambda x: x[1]) # sort by speed
fastest = timings[0][1] # time fastest one took to run
longest = max(len(t[0]) for t in timings) # len of longest func name (w/o suffix)
print 'fastest to slowest *_test() function timings:\n' \
      ' {:,d} chars, {:,d} timeit calls, best of {:d}\n'.format(len(s), N, R)
def times_slower(speed, fastest):
    return speed/fastest - 1.0
for i in timings:
    print "{0:>{width}}{suffix}() : {1:.4f} ({2:.2f} times slower)".format(
        i[0], i[1], times_slower(i[1], fastest), width=longest, suffix=SUFFIX)
Addendum:
For the special case of swapping digit characters in a positive decimal number given as a string, the following also works and is a tiny bit faster than the general version at the top of my answer.
The somewhat involved conversion back to a string at the end with the format() method is to deal with cases where a zero got moved to the front of the string. I present it mainly as a curiosity, since it's fairly incomprehensible unless you grasp what it does mathematically. It also doesn't handle negative numbers.
n = int(s)
len_s = len(s)
ord_0 = ord('0')
di = ord(s[i])-ord_0
dj = ord(s[j])-ord_0
pi = 10**(len_s-(i+1))
pj = 10**(len_s-(j+1))
s = '{:0{width}d}'.format(n + (dj-di)*pi + (di-dj)*pj, width=len_s)
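The two approaches from this answer can be wrapped as functions and checked against each other. A minimal Python 3 sketch: the function names are hypothetical, and the guard for i == j is an addition, since the slicing expression assumes two distinct positions.

```python
def swap_slices(s, i, j):
    """Swap the characters at positions i and j using slicing."""
    if i == j:
        return s  # added guard: nothing to swap
    i, j = (i, j) if i < j else (j, i)  # make sure i < j
    return s[:i] + s[j] + s[i+1:j] + s[i] + s[j+1:]


def swap_digits_arith(s, i, j):
    """Swap two digits of a non-negative decimal string arithmetically."""
    n = int(s)
    width = len(s)
    di = int(s[i])
    dj = int(s[j])
    pi = 10 ** (width - (i + 1))  # place value of position i
    pj = 10 ** (width - (j + 1))  # place value of position j
    # Zero-padding restores a leading zero that the swap moved to the front.
    return '{:0{width}d}'.format(n + (dj - di) * pi + (di - dj) * pj,
                                 width=width)


print(swap_slices('1203', 0, 2))
print(swap_digits_arith('1203', 0, 2))
```

Swapping positions 0 and 2 of '1203' moves a zero to the front, which is the case the width-padded format call is there to handle.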
It has to be of a mutable type of some sort, the best I can think of is (can't make any claims as to performance though):
def swapchar(text, fst, snd):
    ba = bytearray(text)
    ba[fst], ba[snd] = ba[snd], ba[fst]
    return ba
>>> swapchar('thequickbrownfox', 3, 7)
bytearray(b'thekuicqbrownfox')
You can still utilise the result as a str/list - or explicitly convert it to a str if needs be.
>>> int1 = 2
>>> int2 = 3
>>> eval(str(int1)+str(int2))
23
I know you've already accepted an answer, so I won't bother coding it in Python, but here's how you could do it in JavaScript which also has immutable strings:
function swapchar(string, j)
{
    return string.replace(RegExp("(.{" + j + "})(.)(.)"), "$1$3$2");
}
Obviously if j isn't in an appropriate range then it just returns the original string.
Given an integer n and two zero-based indexes i and j of digits to swap (counted from the least significant digit, as the examples below show), this can be done using powers of ten to locate the digits, division and modulo operations to extract them, and subtraction and addition to perform the swap.
def swapDigits(n, i, j):
    # These powers of 10 encode the locations i and j in n.
    power_i = 10 ** i
    power_j = 10 ** j
    # Retrieve digits [i] and [j] from n.
    digit_i = (n // power_i) % 10
    digit_j = (n // power_j) % 10
    # Remove digits [i] and [j] from n.
    n -= digit_i * power_i
    n -= digit_j * power_j
    # Insert digit [i] in position [j] and vice versa.
    n += digit_i * power_j
    n += digit_j * power_i
    return n
For example:
>>> swapDigits(9876543210, 4, 0)
9876503214
>>> swapDigits(9876543210, 7, 2)
9826543710
