Memory limit exceeded in Python - python

I am solving a problem that needs either a list of integers or a dictionary of size 10^18. Upon running the code the compiler throws an error message saying "Memory Limit Exceeded".
Here is my code:
def fun(l, r, p):
#f = [None, 1, 1]
f = {0:0, 1:1, 2:1}
su = 0
for i in range(1, r):
if i%2 == 0:
f[i+2] = 2*f[i+1] - f[i] + 2
#f.append(2*f[i+1] - f[i] + 2)
else:
f[i+2] = 3*f[i]
#f.append(3*f[i])
for k in range(l, r):
su = su + f[k]
su = (su + f[r]) % p
print(su)
t, p = input().split()
p = int(p)
t = int(t)
#t = 3
#p = 100000007
for i in range(t):
l , r = input().split()
l = int(l)
r = int(r)
fun(l, r, p)
It is showing memory limit exceeded with a maximum memory usage of 306612 KiB.

Two observations here:
You don't need to store all numbers simultaneously. You can use the deque and generator functions to generate the numbers by keeping track of only the last three digits generated instead of the entire sequence.
import itertools
from collections import deque
def infinite_fun_generator():
seed = [0, 1, 1]
dq = deque(maxlen=2)
dq.extend(seed)
yield from seed
for i in itertools.count(1):
if i % 2 == 0:
dq.append(2 * dq[-1] - dq[-2] + 2)
else:
dq.append(3 * dq[-2])
yield dq[-1]
def fun(l, r, p):
funs = itertools.islice(infinite_fun_generator(), l, r + 1)
summed_funs = itertools.accumulate(funs, lambda a, b: (a + b) % p)
return deque(summed_funs, maxlen=1)[-1]
You might have a better chance asking this in Math.SE since I don't want to do the math right now, but just like with the Fibonacci sequence there's likely an analytic solution that you can use to compute the nth member of the sequence analytically without having to iteratively compute the intermediate numbers and it may even be possible to analytically derive a formula to compute the sums in constant time.

Related

Let n be a square number. Using Python, how we can efficiently calculate natural numbers y up to a limit l such that n+y^2 is again a square number?

Using Python, I would like to implement a function that takes a natural number n as input and outputs a list of natural numbers [y1, y2, y3, ...] such that n + y1*y1 and n + y2*y2 and n + y3*y3 and so forth is again a square.
What I tried so far is to obtain one y-value using the following function:
def find_square(n:int) -> tuple[int, int]:
if n%2 == 1:
y = (n-1)//2
x = n+y*y
return (y,x)
return None
It works fine, eg. find_square(13689) gives me a correct solution y=6844. It would be great to have an algorithm that yields all possible y-values such as y=44 or y=156.
Simplest slow approach is of course for given N just to iterate all possible Y and check if N + Y^2 is square.
But there is a much faster approach using integer Factorization technique:
Lets notice that to solve equation N + Y^2 = X^2, that is to find all integer pairs (X, Y) for given fixed integer N, we can rewrite this equation to N = X^2 - Y^2 = (X + Y) * (X - Y) which follows from famous school formula of difference of squares.
Now lets rename two factors as A, B i.e. N = (X + Y) * (X - Y) = A * B, which means that X = (A + B) / 2 and Y = (A - B) / 2.
Notice that A and B should be of same odditiy, either both odd or both even, otherwise in last formulas above we can't have whole division by 2.
We will factorize N into all possible pairs of two factors (A, B) of same oddity. For fast factorization in code below I used simple to implement but yet quite fast algorithm Pollard Rho, also two extra algorithms were needed as a helper to Pollard Rho, one is Fermat Primality Test (which allows fast checking if number is probably prime) and second is Trial Division Factorization (which helps Pollard Rho to factor out small factors, which could cause Pollard Rho to fail).
Pollard Rho for composite number has time complexity O(N^(1/4)) which is very fast even for 64-bit numbers. Any faster factorization algorithm can be chosen if needed a bigger space to be searched. My fast algorithm time is dominated by speed of factorization, remaining part of algorithm is blazingly fast, just few iterations of loop with simple formulas.
If your N is a square itself (hence we know its root easily), then Pollard Rho can factor N even much faster, within O(N^(1/8)) time. Even for 128-bit numbers it means very small time, 2^16 operations, and I hope you're solving your task for less than 128 bit numbers.
If you want to process a range of possible N values then fastest way to factorize them is to use techniques similar to Sieve of Erathosthenes, using set of prime numbers, it allows to compute all factors for all N numbers within some range. Using Sieve of Erathosthenes for the case of range of Ns is much faster than factorizing each N with Pollard Rho.
After factoring N into pairs (A, B) we compute (X, Y) based on (A, B) by formulas above. And output resulting Y as a solution of fast algorithm.
Following code as an example is implemented in pure Python. Of course one can use Numba to speed it up, Numba usually gives 30-200 times speedup, for Python it achieves same speed as optimized C++. But I thought that main thing here is to implement fast algorithm, Numba optimizations can be done easily afterwards.
I added time measurement into following code. Although it is pure Python still my fast algorithm achieves 8500x times speedup compared to regular brute force approach for limit of 1 000 000.
You can change limit variable to tweak amount of searched space, or num_tests variable to tweak amount of different tests.
Following code implements both solutions - fast solution find_fast() described above plus very tiny brute force solution find_slow() which is very slow as it scans all possible candidates. This slow solution is only used to compare correctness in tests and compare speedup.
Code below uses nothing except few standard Python library modules, no external modules were used.
Try it online!
def find_slow(N):
import math
def is_square(x):
root = int(math.sqrt(float(x)) + 0.5)
return root * root == x, root
l = []
for y in range(N):
if is_square(N + y ** 2)[0]:
l.append(y)
return l
def find_fast(N):
import itertools, functools
Prod = lambda it: functools.reduce(lambda a, b: a * b, it, 1)
fs = factor(N)
mfs = {}
for e in fs:
mfs[e] = mfs.get(e, 0) + 1
fs = sorted(mfs.items())
del mfs
Ys = set()
for take_a in itertools.product(*[
(range(v + 1) if k != 2 else range(1, v)) for k, v in fs]):
A = Prod([p ** t for (p, _), t in zip(fs, take_a)])
B = N // A
assert A * B == N, (N, A, B, take_a)
if A < B:
continue
X = (A + B) // 2
Y = (A - B) // 2
assert N + Y ** 2 == X ** 2, (N, A, B, X, Y)
Ys.add(Y)
return sorted(Ys)
def trial_div_factor(n, limit = None):
# https://en.wikipedia.org/wiki/Trial_division
fs = []
while n & 1 == 0:
fs.append(2)
n >>= 1
all_checked = False
for d in range(3, (limit or n) + 1, 2):
if d * d > n:
all_checked = True
break
while True:
q, r = divmod(n, d)
if r != 0:
break
fs.append(d)
n = q
if n > 1 and all_checked:
fs.append(n)
n = 1
return fs, n
def fermat_prp(n, trials = 32):
# https://en.wikipedia.org/wiki/Fermat_primality_test
import random
if n <= 16:
return n in (2, 3, 5, 7, 11, 13)
for i in range(trials):
if pow(random.randint(2, n - 2), n - 1, n) != 1:
return False
return True
def pollard_rho_factor(n):
# https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
import math, random
fs, n = trial_div_factor(n, 1 << 7)
if n <= 1:
return fs
if fermat_prp(n):
return sorted(fs + [n])
for itry in range(8):
failed = False
x = random.randint(2, n - 2)
for cycle in range(1, 1 << 60):
y = x
for i in range(1 << cycle):
x = (x * x + 1) % n
d = math.gcd(x - y, n)
if d == 1:
continue
if d == n:
failed = True
break
return sorted(fs + pollard_rho_factor(d) + pollard_rho_factor(n // d))
if failed:
break
assert False, f'Pollard Rho failed! n = {n}'
def factor(N):
import functools
Prod = lambda it: functools.reduce(lambda a, b: a * b, it, 1)
fs = pollard_rho_factor(N)
assert N == Prod(fs), (N, fs)
return sorted(fs)
def test():
import random, time
limit = 1 << 20
num_tests = 20
t0, t1 = 0, 0
for i in range(num_tests):
if (round(i / num_tests * 1000)) % 100 == 0 or i + 1 >= num_tests:
print(f'test {i}, ', end = '', flush = True)
N = random.randrange(limit)
tb = time.time()
r0 = find_slow(N)
t0 += time.time() - tb
tb = time.time()
r1 = find_fast(N)
t1 += time.time() - tb
assert r0 == r1, (N, r0, r1, t0, t1)
print(f'\nTime slow {t0:.05f} sec, fast {t1:.05f} sec, speedup {round(t0 / max(1e-6, t1))} times')
if __name__ == '__main__':
test()
Output:
test 0, test 2, test 4, test 6, test 8, test 10, test 12, test 14, test 16, test 18, test 19,
Time slow 26.28198 sec, fast 0.00301 sec, speedup 8732 times
For the easiest solution, you can try this:
import math
n=13689 #or we can ask user to input a square number.
for i in range(1,9999):
if math.sqrt(n+i**2).is_integer():
print(i)

Python program seems to run forever without outputting in terminal

I have been working on a library to implement the RSA encryption method, and this file (along with others I have been working on) do not output, but instead after executing the script only output a blank line in terminal. I have run it through an autograder, and it times out. Below is the code for the library, but something tells me my issue could be an interpreter issue or something outside of the file itself. It looks like it could be getting stuck before reaching a return or output statement. I've also included a screenshot of the terminal output.
import stdio
import stdrandom
import sys
# Generates and returns the public/private keys as a tuple (n, e, d). Prime numbers p and q
# needed to generate the keys are picked from the interval [lo, hi).
def keygen(lo, hi):
primes = []
for i in range(lo, hi):
if _primes(0, i):
primes += [i]
ptemp = stdrandom.uniformInt(0, len(primes))
qtemp = stdrandom.uniformInt(0, len(primes))
p = primes[ptemp]
q = primes[qtemp]
n = p * q
m = (p - 1) * (q - 1)
while True:
e = stdrandom.uniformInt(2, m)
if e % m == 0 and m % e != 0:
break
d = 0
for a in range(1, m):
if (e * a) % m == 1:
d = a
break
return n, e, d
# Encrypts x (int) using the public key (n, e) and returns the encrypted value.
def encrypt(x, n, e):
return (x ** e) % n
# Decrypts y (int) using the private key (n, d) and returns the decrypted value.
def decrypt(y, n, d):
return (y ** d) % n
# Returns the least number of bits needed to represent n.
def bitLength(n):
return len(bin(n)) - 2
# Returns the binary representation of n expressed in decimal, having the given width, and padded
# with leading zeros.
def dec2bin(n, width):
return format(n, '0%db' % (width))
# Returns the decimal representation of n expressed in binary.
def bin2dec(n):
return int(n, 2)
# Returns a list of primes from the interval [lo, hi).
def _primes(lo, hi):
primes = []
for p in range(lo, hi + 1):
j = 2
f = 1
while(j * j <= p):
if(p % j == 0):
f = 0
break
j = j + 1
if(f == 1):
primes += [p]
return primes
# Returns a list containing a random sample (without replacement) of k items from the list a.
def _sample(a, k):
b = a.copy()
c = b[0:k]
stdrandom.shuffle(c)
return c
# Returns a random item from the list a.
def _choice(a):
random = stdrandom.uniformInt(0, len(a))
return random
# Unit tests the library [DO NOT EDIT].
def _main():
x = ord(sys.argv[1])
n, e, d = keygen(25, 100)
encrypted = encrypt(x, n, e)
stdio.writef('encrypt(%c) = %d\n', x, encrypted)
decrypted = decrypt(encrypted, n, d)
stdio.writef('decrypt(%d) = %c\n', encrypted, decrypted)
width = bitLength(x)
stdio.writef('bitLength(%d) = %d\n', x, width)
xBinary = dec2bin(x, width)
stdio.writef('dec2bin(%d) = %s\n', x, xBinary)
stdio.writef('bin2dec(%s) = %d\n', xBinary, bin2dec(xBinary))
if __name__ == '__main__':
_main()
As #iz_ suggests, you have an infinite loop in your code. This code:
while True:
e = stdrandom.uniformInt(2, m)
if e % m == 0 and m % e != 0:
break
will never exit because e will always be less than m, and therefore e % m == 0 will always be False. The loop won't exit until e % m == 0 is True, which will never happen.

What do I get from Queue.get() (Python)

Overall question: How do I know what I am getting from a Queue object when I call Queue.get()? How do I sort it, or identify it? Can you get specific items from the Queue and leave others?
Context:
I wanted to learn a little about multi-proccessing (threading?) to make solving a matrix equation more efficient.
To illustrate, below is my working code for solving the matrix equation Ax = b without taking advantage of multiple cores. The solution is [1,1,1].
def jacobi(A, b, x_k):
N = len(x_k)
x_kp1 = np.copy(x_k)
E_rel = 1
iteration = 0
if (N != A.shape[0] or N != A.shape[1]):
raise ValueError('Matrix/vector dimensions do not match.')
while E_rel > ((10**(-14)) * (N**(1/2))):
for i in range(N):
sum = 0
for j in range(N):
if j != i:
sum = sum + A[i,j] * x_k[j]
x_kp1[i] =(1 / A[i,i]) * (b[i] - sum)
E_rel = 0
for n in range(N):
E_rel = E_rel + abs(x_kp1[n] - x_k[n]) / ((abs(x_kp1[n]) + abs(x_k[n])) / 2)
iteration += 1
# print("relative error for this iteration:", E_rel)
if iteration < 11:
print("iteration ", iteration, ":", x_kp1)
x_k = np.copy(x_kp1)
return x_kp1
if __name__ == '__main__':
A = np.matrix([[12.,7,3],[1,5,1],[2,7,-11]])
b = np.array([22.,7,-2])
x = np.array([1.,2,1])
print("Jacobi Method:")
x_1 = jacobi(A, b, x)
Ok, so I wanted to convert this code following this nice example: https://p16.praetorian.com/blog/multi-core-and-distributed-programming-in-python
So I got some code that runs and converges to the correct solution in the same number of iterations! That's really great, but what is the guarantee that this happens? It seems like Queue.get() just grabs whatever result from whatever process finished first (or last?). I was actually very surprised when my code ran, as I expected
for i in range(N):
x_update[i] = q.get(True)
to jumble up the elements of the vector.
Here is my code updated using the multi-processing library:
import numpy as np
import multiprocessing as mu
np.set_printoptions(precision=15)
def Jacobi_step(index, initial_vector, q):
N = len(initial_vector)
sum = 0
for j in range(N):
if j != i:
sum = sum + A[i, j] * initial_vector[j]
# this result is the updated element at given index of our solution vector.
q.put((1 / A[index, index]) * (b[index] - sum))
if __name__ == '__main__':
A = np.matrix([[12.,7,3],[1,5,1],[2,7,-11]])
b = np.array([22.,7,-2])
x = np.array([1.,2,1])
q = mu.Queue()
N = len(x)
x_update = np.copy(x)
p = []
error = 1
iteration = 0
while error > ((10**(-14)) * (N**(1/2))):
# assign a process to each element in the vector x,
# update one element with a single Jacobi step
for i in range(N):
process = mu.Process(target=Jacobi_step(i, x, q))
p.append(process)
process.start()
# fill in the updated vector with each new element aquired by the last step
for i in range(N):
x_update[i] = q.get(True)
# check for convergence
error = 0
for n in range(N):
error = error + abs(x_update[n] - x[n]) / ((abs(x_update[n]) + abs(x[n])) / 2)
p[i].join()
x = np.copy(x_update)
iteration += 1
print("iteration ", iteration, ":", x)
del p[:]
A Queue is first-in-first-out which means the first element inserted is the first element retrieved, in order of insertion.
Since you have no way to control that, I suggest you insert tuples in the Queue, containing the value and some identifying object that can be used to sort/relate to the original computation.
result = (1 / A[index, index]) * (b[index] - sum)
q.put((index, result))
This example puts the index in the Queue together with the result, so that when you .get() later you get the index too and use it to know which computation this is for:
i, x_i = q.get(True)
x_update[i] = x_i
Or something like that.

Optimizing Python Code for Competitions

I'm new to Python, and I'm trying to get familiar with it by solving problems on CodeChef. I'm attempting to solve the Easy problem Number Game. The issue is that the execution time is too long for my code.
I have translated the Python solution I wrote into C++, and the submission was accepted, so I know I have a correct answer, and it's just off by a constant multiple.
Is it possible to solve this problem in Python 3 in the allotted time? Can you help me speed up my code to accomplish this?
import time
def getStartValues(A, M):
startVals = [0]*M
b = [0]*len(A)
for i in range(len(A)-1):
b[i+1] = (10*b[i] + A[i]) % M
f = 0
power = 1
for i in range(len(A)-1,0,-1):
startVals[(b[i]*power + f) % M] += 1
f = (A[i]*power + f) % M
power = (power*10 % M)
startVals[f] += 1
return startVals, power
def checkValues(i, startVals, M, powNm1, checked, chklst):
if checked[i] == 1:
return startVals[i]
q = [i]
chk = [0]*M
chk[i] = 1
while len(q) > 0:
val = q.pop(0)
for j in chklst:
val2 = (powNm1*val + j) % M
if checked[val2] > 0:
checked[i] = 1
return startVals[i]
elif chk[val2] == 0:
q.append(val2)
chk[val2] = 1
return 0
def compute(A, M):
startVals, power = getStartValues(A, M)
checked = [0]*M
checked[0] = 1
chklst = [j for j in range(M) if startVals[j] > 0]
total = 0
for i in chklst:
c = checkValues(i, startVals, M, power, checked, chklst)
total += c
return total
start = time.time()
file = open('numbgame.in', 'r')
#T = int(input())
T = int(file.readline())
for i in range(T):
#A, M = input().split()
A, M = file.readline().split()
A = list(map(int,A))
M = int(M)
print(compute(A, M))
tDiff = time.time() - start
print('Total time: %s' % tDiff)
Note that I have modified the code to read from a file and to display execution time, as a convenience, and some small alterations are needed before it can be submitted.
getStartValues takes in the (big) list of digits of the input A and the (small) integer M and returns the values modulo M that can be generated from A by removing a single digit.
checkValues takes an index i, the list startValues, the integer M, the integer powNm1 (which is the value 10^(n-1) mod M, where n is the number of digits in A, a list checked that keeps track of whether a value has already been determined to be solvable, and the list chklst (which contains the indices i such that startValues[i] > 0).
The majority of the time is spent in the function getStartValues, since A could be up to 10^6 digits long. On my desktop, the getStartValues function call takes about 1.2s, while the rest of the compute function takes about 0.04s (for worst case inputs).

Homework: Implementing the Z algorithm in python, it's really slow, slower than naive string search

I have to implement the Z algorithm and use it to search a target text for a specific pattern. I've implemented what I thought was the correct algorithm and search function using it but it's really slow. For the naive implementation of string search I consistently got times lower than 1.5 seconds and for the z string search I consistently got times over 3 seconds (for my biggest test case) so I have to be doing something wrong. The results seem to be correct, or were at least for the few test cases we were given. The code for the functions mentioned in my rant is below:
import sys
import time
# z algorithm a.k.a. the fundemental preprocessing algorithm
def z(P, start=1, max_box_size=sys.maxsize):
n = len(P)
boxes = [0] * n
l = -1
r = -1
for k in range(start, n):
if k > r:
i = 0
while k + i < n and P[i] == P[k + i] and i < max_box_size:
i += 1
boxes[k] = i
if i:
l = k
r = k + i - 1
else:
kp = k - l
Z_kp = boxes[kp]
if Z_kp < r - k + 1:
boxes[k] = Z_kp
else:
i = r + 1
while i < n and P[i] == P[i - k] and i - k < max_box_size:
i += 1
boxes[k] = i - k
l = k
r = i - 1
return boxes
# a simple string search
def naive_string_search(P, T):
m = len(T)
n = len(P)
indices = []
for i in range(m - n + 1):
if P == T[i: i + n]:
indices.append(i)
return indices
# string search using the z algorithm.
# The pattern you're searching for is simply prepended to the target text
# and than the z algorithm is run on that concatenation
def z_string_search(P, T):
PT = P + T
n = len(P)
boxes = z(PT, start=n, max_box_size=n)
return list(map(lambda x: x[0]-n, filter(lambda x: x[1] >= n, enumerate(boxes))))
Your's implementation of z-function def z(..) is algorithmically ok and asymptotically ok.
It has O(m + n) time complexity in worst case while implementation of naive string search has O(m*n) time complexity in worst case, so I think that the problem is in your test cases.
For example if we take this test case:
T = ['a'] * 1000000
P = ['a'] * 1000
we will get for z-function:
real 0m0.650s
user 0m0.606s
sys 0m0.036s
and for naive string matching:
real 0m8.235s
user 0m8.071s
sys 0m0.085s
PS: You should understand that there are a lot of test cases where naive string matching works in linear time too, for example:
T = ['a'] * 1000000
P = ['a'] * 1000000
Thus the worst case for a naive string matching is where function should apply pattern and check again and again. But in this case it will do only one check because of the lengths of the input (it cannot apply pattern from index 1 so it won't continue).

Categories