Overall question: How do I know what I am getting from a Queue object when I call Queue.get()? How do I sort it, or identify it? Can you get specific items from the Queue and leave others?
Context:
I wanted to learn a little about multi-processing (threading?) to make solving a matrix equation more efficient.
To illustrate, below is my working code for solving the matrix equation Ax = b without taking advantage of multiple cores. The solution is [1,1,1].
import numpy as np

def jacobi(A, b, x_k):
    N = len(x_k)
    x_kp1 = np.copy(x_k)
    E_rel = 1
    iteration = 0
    if (N != A.shape[0] or N != A.shape[1]):
        raise ValueError('Matrix/vector dimensions do not match.')
    while E_rel > ((10**(-14)) * (N**(1/2))):
        for i in range(N):
            sum = 0
            for j in range(N):
                if j != i:
                    sum = sum + A[i,j] * x_k[j]
            x_kp1[i] = (1 / A[i,i]) * (b[i] - sum)
        E_rel = 0
        for n in range(N):
            E_rel = E_rel + abs(x_kp1[n] - x_k[n]) / ((abs(x_kp1[n]) + abs(x_k[n])) / 2)
        iteration += 1
        # print("relative error for this iteration:", E_rel)
        if iteration < 11:
            print("iteration ", iteration, ":", x_kp1)
        x_k = np.copy(x_kp1)
    return x_kp1

if __name__ == '__main__':
    A = np.matrix([[12.,7,3],[1,5,1],[2,7,-11]])
    b = np.array([22.,7,-2])
    x = np.array([1.,2,1])
    print("Jacobi Method:")
    x_1 = jacobi(A, b, x)
Ok, so I wanted to convert this code following this nice example: https://p16.praetorian.com/blog/multi-core-and-distributed-programming-in-python
So I got some code that runs and converges to the correct solution in the same number of iterations! That's really great, but what is the guarantee that this happens? It seems like Queue.get() just grabs whatever result from whatever process finished first (or last?). I was actually very surprised when my code ran, as I expected
for i in range(N):
    x_update[i] = q.get(True)
to jumble up the elements of the vector.
Here is my code updated using the multi-processing library:
import numpy as np
import multiprocessing as mu

np.set_printoptions(precision=15)

def Jacobi_step(index, initial_vector, q):
    N = len(initial_vector)
    sum = 0
    for j in range(N):
        if j != i:
            sum = sum + A[i, j] * initial_vector[j]
    # this result is the updated element at the given index of our solution vector.
    q.put((1 / A[index, index]) * (b[index] - sum))

if __name__ == '__main__':
    A = np.matrix([[12.,7,3],[1,5,1],[2,7,-11]])
    b = np.array([22.,7,-2])
    x = np.array([1.,2,1])
    q = mu.Queue()
    N = len(x)
    x_update = np.copy(x)
    p = []
    error = 1
    iteration = 0
    while error > ((10**(-14)) * (N**(1/2))):
        # assign a process to each element in the vector x,
        # update one element with a single Jacobi step
        for i in range(N):
            process = mu.Process(target=Jacobi_step(i, x, q))
            p.append(process)
            process.start()
        # fill in the updated vector with each new element acquired by the last step
        for i in range(N):
            x_update[i] = q.get(True)
        # check for convergence
        error = 0
        for n in range(N):
            error = error + abs(x_update[n] - x[n]) / ((abs(x_update[n]) + abs(x[n])) / 2)
        p[i].join()
        x = np.copy(x_update)
        iteration += 1
        print("iteration ", iteration, ":", x)
        del p[:]
A Queue is first-in-first-out which means the first element inserted is the first element retrieved, in order of insertion.
Since you have no way to control that, I suggest you insert tuples in the Queue, containing the value and some identifying object that can be used to sort/relate to the original computation.
result = (1 / A[index, index]) * (b[index] - sum)
q.put((index, result))
This example puts the index in the Queue together with the result, so that when you .get() later you get the index too and use it to know which computation this is for:
i, x_i = q.get(True)
x_update[i] = x_i
Or something like that.
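For instance, here is a minimal, self-contained sketch of that idea applied to a single Jacobi sweep. It is only an illustration: the worker function jacobi_worker and the explicit args are my own names and choices, not taken from your code.

import multiprocessing as mu
import numpy as np

def jacobi_worker(index, A, b, x, q):
    # compute the updated element and tag it with its index
    s = sum(A[index, j] * x[j] for j in range(len(x)) if j != index)
    q.put((index, (1.0 / A[index, index]) * (b[index] - s)))

if __name__ == '__main__':
    A = np.array([[12., 7, 3], [1, 5, 1], [2, 7, -11]])
    b = np.array([22., 7, -2])
    x = np.array([1., 2, 1])
    q = mu.Queue()
    procs = [mu.Process(target=jacobi_worker, args=(i, A, b, x, q)) for i in range(len(x))]
    for proc in procs:
        proc.start()
    x_update = np.copy(x)
    for _ in range(len(x)):
        i, x_i = q.get(True)   # arrival order no longer matters
        x_update[i] = x_i
    for proc in procs:
        proc.join()
    print(x_update)

Because every result carries its own index, the parent can place it directly into x_update regardless of which process finished first.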
Related
I am trying to find the number of ways to construct an array such that consecutive positions contain different values.
Specifically, I need to construct an array with n elements such that each element is between 1 and k, inclusive. I also want the first and last elements of the array to be 1 and x.
Here is what I tried:
def countArray(n, k, x):
    # Return the number of ways to fill in the array.
    if x > k:
        return 0
    if x == 1:
        return 0
    def fact(n):
        if n == 0:
            return 1
        fact_range = n+1
        T = [1 for i in range(fact_range)]
        for i in range(1, fact_range):
            T[i] = i * T[i-1]
        return T[fact_range-1]
    ways = fact(k) / (fact(n-2)*fact(k-(n-2)))
    return int(ways)
In short, I computed C(k, n-2) (k choose n-2) to find the number of ways. How could I solve this?
It passes one of the base cases, with input countArray(4, 3, 2), but fails for 16 other cases.
Let X(n) be the number of ways of constructing an array of length n, starting with 1 and ending in x (and not repeating any numbers). Let Y(n) be the number of ways of constructing an array of length n, starting with 1 and NOT ending in x (and not repeating any numbers).
Then there's these recurrence relations (for n>1)
X(n+1) = Y(n)
Y(n+1) = X(n)*(k-1) + Y(n)*(k-2)
In words: If you want an array of length n+1 ending in x, then you need an array of length n not ending in x. And if you want an array of length n+1 not ending in x, then you can either add any of the k-1 symbols to an array of length n ending in x, or you can take an array of length n not ending in x, and add any of the k-2 symbols that aren't x and don't repeat the last value.
For the base case, n=1: if x is 1 then X(1)=1 and Y(1)=0; otherwise X(1)=0 and Y(1)=1.
This gives you an O(n)-time method of computing the result.
def ways(n, k, x):
    M = 10**9 + 7
    wx = (x == 1)
    wnx = (x != 1)
    for _ in range(n-1):
        wx, wnx = wnx, wx * (k-1) + wnx * (k-2)
        wnx = wnx % M
    return wx

print(ways(100, 5, 2))
In principle you can reduce this to O(log n) by expressing the recurrence relations as a matrix and computing the matrix power (mod M), but it's probably not necessary for the question.
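For completeness, here is a minimal sketch of that O(log n) idea, assuming the same recurrence and base cases as above; mat_mult, mat_pow and ways_logn are just illustrative helper names, not part of any library.

MOD = 10**9 + 7

def mat_mult(A, B, m=MOD):
    # 2x2 matrix product mod m
    return [[(A[0][0]*B[0][0] + A[0][1]*B[1][0]) % m, (A[0][0]*B[0][1] + A[0][1]*B[1][1]) % m],
            [(A[1][0]*B[0][0] + A[1][1]*B[1][0]) % m, (A[1][0]*B[0][1] + A[1][1]*B[1][1]) % m]]

def mat_pow(M, e, m=MOD):
    # exponentiation by squaring: O(log e) matrix multiplications
    R = [[1, 0], [0, 1]]  # identity
    while e:
        if e & 1:
            R = mat_mult(R, M, m)
        M = mat_mult(M, M, m)
        e >>= 1
    return R

def ways_logn(n, k, x):
    # [X(n), Y(n)] = T^(n-1) applied to [X(1), Y(1)], with
    # X(m+1) = Y(m) and Y(m+1) = X(m)*(k-1) + Y(m)*(k-2)
    T = [[0, 1], [k - 1, k - 2]]
    X1, Y1 = (1, 0) if x == 1 else (0, 1)
    P = mat_pow(T, n - 1)
    return (P[0][0] * X1 + P[0][1] * Y1) % MOD

For small inputs this should agree with ways(n, k, x) above, e.g. ways_logn(4, 3, 2) == ways(4, 3, 2) == 3.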
[Additional working]
We have the recurrence relations:
X(n+1) = Y(n)
Y(n+1) = X(n)*(k-1) + Y(n)*(k-2)
Using the first, we can replace the Y(_) in the second with X(_+1) to reduce it down to a single variable. Then:
X(n+2) = X(n)*(k-1) + X(n+1)*(k-2)
Using standard techniques, we can solve this linear recurrence relation exactly.
In the case x!=1, we have:
X(n) = ((k-1)^(n-1) + (-1)^n) / k
And in the case x=1, we have:
X(n) = ((k-1)^(n-1) - (k-1)*(-1)^n) / k
We can compute these mod M using Fermat's little theorem because M is prime. So 1/k = k^(M-2) mod M.
Thus we have (with a little bit of optimization) this short program that solves the problem and runs in O(log n) time:
def ways2(n, k, x):
    M = 10**9 + 7
    S = -1 if n % 2 else 1
    return ((pow(k-1, n-1, M) + S) * pow(k, M-2, M) - S*(x == 1)) % M
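As a quick, hypothetical sanity check, the closed form should agree with the O(n) recurrence version ways() above:

# both calls should print the same residue mod 10**9 + 7
print(ways(100, 5, 2))
print(ways2(100, 5, 2))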
Could you try this DP version? It passed all the tests. It is inspired by @PaulHankin's answer and takes a DP approach; I will run a performance comparison later to see the difference for large inputs.
def countArray(n, k, x):
    # Return the number of ways to fill in the array.
    big_mod = 10 ** 9 + 7
    if x == 1:
        dp = [[1], [0]]
    else:
        dp = [[1], [1]]
    for _ in range(n-2):
        dp[0].append(dp[0][-1] * (k - 1) % big_mod)
        dp[1].append((dp[0][-1] - dp[1][-1]) % big_mod)
    return dp[1][-1]
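Quick check against the case mentioned in the question (hypothetical usage):

print(countArray(4, 3, 2))  # expected: 3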
Given :
I : a positive integer
n : a positive integer
nth term of the sequence for input I:
F(I,1) = (I * (I+1)) / 2
F(I,2) = F(I,1) + F(I-1,1) + F(I-2,1) + .... F(2,1) + F(1,1)
F(I,3) = F(I,2) + F(I-1,2) + F(I-2,2) + .... F(2,2) + F(1,2)
..
..
F(I,n) = F(I,n-1) + F(I-1,n-1) + F(I-2,n-1) + .... F(2,n-1) + F(1,n-1)
nth term --> F(I,n)
Approach 1: Used recursion to find the above:
def recursive_sum(I, n):
    if n == 1:
        return (I * (I + 1)) // 2
    else:
        return sum(recursive_sum(j, n - 1) for j in range(I, 0, -1))
Approach 2: Iterated and stored reusable values in a dictionary, then used this dictionary to get the nth term:
def non_recursive_sum_using_data(I, n):
    global data
    if n == 1:
        return (I * (I + 1)) // 2
    else:
        return sum(data[j][n - 1] for j in range(I, 0, -1))

def iterate(I, n):
    global data
    data = {}
    i = 1
    j = 1
    for i in range(n+1):
        for j in range(I+1):
            if j not in data:
                data[j] = {}
            data[j][i] = recursive_sum(j, i)
    return data[I][n]
The recursion approach is obviously not efficient, due to the maximum recursion depth. The next approach's time and space complexity will also be poor.
Is there a better way to recurse, or a different approach than recursion?
I am curious whether we can find a formula for the nth term.
You could just cache your recursive results:
from functools import lru_cache

@lru_cache(maxsize=None)
def recursive_sum(I, n):
    if n == 1:
        return (I * (I + 1)) // 2
    return sum(recursive_sum(j, n - 1) for j in range(I, 0, -1))
That way you can get the readability and brevity of the recursive approach without most of the performance issues since the function is only called once for each argument combination (I, n).
Using the usual binomial(n,k) = n!/(k!*(n-k)!), you have
F(I,n) = binomial(I+n, n+1).
Then you can choose the method you like most to compute binomial coefficients.
Here is an example:
def binomial(n, k):
    numerator = denominator = 1
    t = max(k, n-k)
    for low, high in enumerate(range(t+1, n+1), 1):
        numerator *= high
        denominator *= low
    return numerator // denominator

def F(I, n): return binomial(I+n, n+1)
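As a quick sanity check (hypothetical usage, assuming recursive_sum from the question and F above are both in scope):

# F(3, 2) = binomial(5, 3) = 10, which matches the recursive definition:
# recursive_sum(3, 2) = F(3,1) + F(2,1) + F(1,1) = 6 + 3 + 1 = 10
print(F(3, 2), recursive_sum(3, 2))  # both values should be 10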
The formula for the nth term of the sequence is the one you have already mentioned.
You have also rightly identified that it will lead to an inefficient algorithm and a stack overflow.
You can look into a dynamic programming approach, where you calculate F(I,N) just once and reuse the value.
For example, this is how the nth Fibonacci number is calculated.
[just-example] https://www.geeksforgeeks.org/program-for-nth-fibonacci-number/
You need to find the same pattern and cache the values.
I have an example of this in a small program written in Go:
https://play.golang.org/p/vRi-QMj7z2v
the standard DP
One can do a (tiny) bit of math to rewrite your function:
F(i,n) = sum_{k=0}^{i-1} F(i-k, n-1) = sum_{k=1}^{i} F(k, n-1)
Now notice that if you consider a matrix F_{i x n}, to compute F(i,n) we just need to add the elements of the previous column.
x----+---
| + |
|----+ |
|----+-F(i,n)
We conclude that we can build the first layer (aka column), then the second one, and so forth until we get to the n-th layer.
Finally we take the last element of our final layer, which is F(i,n).
The computation time is about O(I*n).
More math based but faster
Another way is to consider our layer as a vector variable X.
We can write the recurrence relation as
X_n = MX_{n-1}
where M is a triangular matrix with 1 in the lower part.
We then want to compute the general term of X_n so we want to compute M^n.
By following Yves Daoust
(I just copy from the link above)
Coefficients should be indexed _{n+1} and _n, but here they are written as _1 and unsubscripted, for readability.
Moreover the matrix is upper triangular but we can just take the transpose afterwards...
a_1 b_1 c_1 d_1       1 1 1 1     a b c d
    a_1 b_1 c_1   =   0 1 1 1  *  0 a b c
        a_1 b_1       0 0 1 1     0 0 a b
            a_1       0 0 0 1     0 0 0 a
by going from last row to first:
a = 1
from b_1 = a+b = 1 + b (b grows by 1 at each step), b = n
from c_1 = a+b+c = 1+n+c, c = n(n+1)/2
from d_1 = a+b+c+d = 1+n+n(n+1)/2 +d, d = n(n+1)(n+2)/6
I have not proved it, but I would guess that e = n(n+1)(n+2)(n+3)/24 (so basically these are binomial coefficients, C(n+k-1, k))
(I think the proof lies more in the fact that F(i,n) = F(i,n-1) + F(i-1,n) )
More generally, instead of taking variables a, b, c, ..., take X_n(0), X_n(1), ...
X_n(0) = 1
X_n(i) = n*...*(n+i-1) / i!
And by applying recursion to compute X:
X_n(0) = 1
X_n(i) = X_n(i-1)*(n+i-1)/i
Finally we deduce F(i,n) as the scalar product Y_{n-1} * X_1, where Y_n is the reversed vector of X_n and X_1(n) = n*(n+1)/2.
from functools import lru_cache

# this is copypasted from schwobaseggl
@lru_cache(maxsize=None)
def recursive_sum(I, n):
    if n == 1:
        return (I * (I + 1)) // 2
    return sum(recursive_sum(j, n - 1) for j in range(I, 0, -1))

def iterative_sum(I, n):
    layer = [i*(i+1)//2 for i in range(1, I+1)]
    x = 2
    while x <= n:
        next_layer = [layer[0]]
        for i in range(1, I):
            # we don't need to recompute the whole sum every time:
            # take the previous partial sum and add the new number
            next_layer.append(next_layer[i-1] + layer[i])
        layer = next_layer
        x += 1
    return layer[-1]

def brutus(I, n):
    if n == 1:
        return I*(I+1)//2
    X_1 = [i*(i+1)//2 for i in range(1, I+1)]
    X_n = [1]
    for i in range(1, I):
        X_n.append(X_n[-1] * (n-1 + i-1) / i)
    X_n.reverse()
    s = 0
    for i in range(0, I):
        s += X_1[i]*X_n[i]
    return s

def do(k, n):
    print('rec', recursive_sum(k, n))
    print('it ', iterative_sum(k, n))
    print('bru', brutus(k, n))
    print('---')

do(1, 4)
do(2, 1)
do(3, 2)
do(4, 7)
do(7, 4)
I am solving a problem that needs either a list of integers or a dictionary of size 10^18. Upon running the code, the online judge rejects it with the message "Memory Limit Exceeded".
Here is my code:
def fun(l, r, p):
    #f = [None, 1, 1]
    f = {0:0, 1:1, 2:1}
    su = 0
    for i in range(1, r):
        if i % 2 == 0:
            f[i+2] = 2*f[i+1] - f[i] + 2
            #f.append(2*f[i+1] - f[i] + 2)
        else:
            f[i+2] = 3*f[i]
            #f.append(3*f[i])
    for k in range(l, r):
        su = su + f[k]
    su = (su + f[r]) % p
    print(su)

t, p = input().split()
p = int(p)
t = int(t)
#t = 3
#p = 100000007
for i in range(t):
    l, r = input().split()
    l = int(l)
    r = int(r)
    fun(l, r, p)
It is showing memory limit exceeded with a maximum memory usage of 306612 KiB.
Two observations here:
You don't need to store all the numbers simultaneously. You can use a deque and a generator function to produce the numbers while keeping track of only the last couple of values generated, instead of the entire sequence.
import itertools
from collections import deque

def infinite_fun_generator():
    seed = [0, 1, 1]
    dq = deque(maxlen=2)
    dq.extend(seed)
    yield from seed
    for i in itertools.count(1):
        if i % 2 == 0:
            dq.append(2 * dq[-1] - dq[-2] + 2)
        else:
            dq.append(3 * dq[-2])
        yield dq[-1]

def fun(l, r, p):
    funs = itertools.islice(infinite_fun_generator(), l, r + 1)
    summed_funs = itertools.accumulate(funs, lambda a, b: (a + b) % p)
    return deque(summed_funs, maxlen=1)[-1]
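As a quick, hypothetical sanity check against the original implementation (small made-up values for l, r and p):

# the sequence starts 0, 1, 1, 3, 7, 9, ...  so (f[1] + ... + f[5]) % 100 == 21
print(fun(1, 5, 100))  # expected: 21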
You might have a better chance asking this on Math.SE, since I don't want to do the math right now, but just as with the Fibonacci sequence there is likely an analytic solution that lets you compute the nth member of the sequence directly, without iteratively computing the intermediate numbers. It may even be possible to derive a formula that computes the sums in constant time.
I'm new to Python, and I'm trying to get familiar with it by solving problems on CodeChef. I'm attempting to solve the Easy problem Number Game. The issue is that the execution time is too long for my code.
I have translated the Python solution I wrote into C++, and the submission was accepted, so I know I have a correct answer, and it's just off by a constant multiple.
Is it possible to solve this problem in Python 3 in the allotted time? Can you help me speed up my code to accomplish this?
import time

def getStartValues(A, M):
    startVals = [0]*M
    b = [0]*len(A)
    for i in range(len(A)-1):
        b[i+1] = (10*b[i] + A[i]) % M
    f = 0
    power = 1
    for i in range(len(A)-1, 0, -1):
        startVals[(b[i]*power + f) % M] += 1
        f = (A[i]*power + f) % M
        power = (power*10 % M)
    startVals[f] += 1
    return startVals, power

def checkValues(i, startVals, M, powNm1, checked, chklst):
    if checked[i] == 1:
        return startVals[i]
    q = [i]
    chk = [0]*M
    chk[i] = 1
    while len(q) > 0:
        val = q.pop(0)
        for j in chklst:
            val2 = (powNm1*val + j) % M
            if checked[val2] > 0:
                checked[i] = 1
                return startVals[i]
            elif chk[val2] == 0:
                q.append(val2)
                chk[val2] = 1
    return 0

def compute(A, M):
    startVals, power = getStartValues(A, M)
    checked = [0]*M
    checked[0] = 1
    chklst = [j for j in range(M) if startVals[j] > 0]
    total = 0
    for i in chklst:
        c = checkValues(i, startVals, M, power, checked, chklst)
        total += c
    return total

start = time.time()
file = open('numbgame.in', 'r')
#T = int(input())
T = int(file.readline())
for i in range(T):
    #A, M = input().split()
    A, M = file.readline().split()
    A = list(map(int, A))
    M = int(M)
    print(compute(A, M))
tDiff = time.time() - start
print('Total time: %s' % tDiff)
Note that I have modified the code to read from a file and to display execution time, as a convenience, and some small alterations are needed before it can be submitted.
getStartValues takes in the (big) list of digits of the input A and the (small) integer M and returns the values modulo M that can be generated from A by removing a single digit.
checkValues takes an index i, the list startVals, the integer M, the integer powNm1 (which is the value 10^(n-1) mod M, where n is the number of digits in A), a list checked that keeps track of whether a value has already been determined to be solvable, and the list chklst (which contains the indices i such that startVals[i] > 0).
The majority of the time is spent in the function getStartValues, since A could be up to 10^6 digits long. On my desktop, the getStartValues function call takes about 1.2s, while the rest of the compute function takes about 0.04s (for worst case inputs).
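To make that description concrete, here is a small, hypothetical brute-force cross-check of what getStartValues computes (remove each digit in turn and reduce the remaining number modulo M); it is only an illustration for tiny inputs, not part of the fast solution:

def brute_start_values(A, M):
    # A is a list of digits; count, for each residue mod M, how many
    # single-digit removals of A leave a number with that residue
    counts = [0] * M
    for i in range(len(A)):
        rest = A[:i] + A[i+1:]
        counts[int(''.join(map(str, rest))) % M] += 1
    return counts

# e.g. brute_start_values([1, 2, 3], 7) should equal getStartValues([1, 2, 3], 7)[0]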
I have to implement the Z algorithm and use it to search a target text for a specific pattern. I've implemented what I thought was the correct algorithm and a search function that uses it, but it's really slow. For the naive implementation of string search I consistently got times below 1.5 seconds, while for the Z-based string search I consistently got times over 3 seconds (for my biggest test case), so I have to be doing something wrong. The results seem to be correct, or at least they were for the few test cases we were given. The code for the functions mentioned in my rant is below:
import sys
import time

# z algorithm a.k.a. the fundamental preprocessing algorithm
def z(P, start=1, max_box_size=sys.maxsize):
    n = len(P)
    boxes = [0] * n
    l = -1
    r = -1
    for k in range(start, n):
        if k > r:
            i = 0
            while k + i < n and P[i] == P[k + i] and i < max_box_size:
                i += 1
            boxes[k] = i
            if i:
                l = k
                r = k + i - 1
        else:
            kp = k - l
            Z_kp = boxes[kp]
            if Z_kp < r - k + 1:
                boxes[k] = Z_kp
            else:
                i = r + 1
                while i < n and P[i] == P[i - k] and i - k < max_box_size:
                    i += 1
                boxes[k] = i - k
                l = k
                r = i - 1
    return boxes

# a simple string search
def naive_string_search(P, T):
    m = len(T)
    n = len(P)
    indices = []
    for i in range(m - n + 1):
        if P == T[i: i + n]:
            indices.append(i)
    return indices

# string search using the z algorithm.
# The pattern you're searching for is simply prepended to the target text
# and then the z algorithm is run on that concatenation
def z_string_search(P, T):
    PT = P + T
    n = len(P)
    boxes = z(PT, start=n, max_box_size=n)
    return list(map(lambda x: x[0]-n, filter(lambda x: x[1] >= n, enumerate(boxes))))
Your implementation of the z-function, def z(..), is algorithmically OK and asymptotically OK.
It has O(m + n) time complexity in the worst case, while the naive string search has O(m*n) time complexity in the worst case, so I think the problem is in your test cases.
For example if we take this test case:
T = ['a'] * 1000000
P = ['a'] * 1000
we will get for z-function:
real 0m0.650s
user 0m0.606s
sys 0m0.036s
and for naive string matching:
real 0m8.235s
user 0m8.071s
sys 0m0.085s
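If you want to reproduce a comparison like this from inside Python rather than with the shell's time, a rough sketch (assuming z_string_search and naive_string_search from the question are in scope) could look like:

import time

T = ['a'] * 1000000
P = ['a'] * 1000

t0 = time.perf_counter()
z_matches = z_string_search(P, T)
t1 = time.perf_counter()
naive_matches = naive_string_search(P, T)
t2 = time.perf_counter()

print('z-based search: %.2f s, %d matches' % (t1 - t0, len(z_matches)))
print('naive search:   %.2f s, %d matches' % (t2 - t1, len(naive_matches)))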
PS: You should understand that there are a lot of test cases where naive string matching works in linear time too, for example:
T = ['a'] * 1000000
P = ['a'] * 1000000
Thus the worst case for naive string matching is one where the function has to apply the pattern and check it again and again. But in this last case it does only one check, because of the lengths of the inputs (the pattern cannot be applied starting from index 1, so the search stops immediately).