Numba does not cache a function when numpy.zeros() is called inside it; caching works properly with numpy.zeros_like().
I cannot get Numba to cache the function below when numpy.zeros() is called inside it. The same function is cached successfully when numpy.zeros() is replaced by numpy.zeros_like() (with the argument changed accordingly).
import numpy as np
from numba import jit, prange

@jit('UniTuple(float64[:],4)(int32[:],int32[:],int32[:],int32[:],float64[:])',
     nopython=True, parallel=True, cache=True, fastmath=True, nogil=True)
def fun_rec(a: np.ndarray, b: np.ndarray, m: np.ndarray, n: np.ndarray, x: np.ndarray):
    l = int(len(x))
    rec_num = np.zeros_like(x)
    # not caching with rec_num = np.zeros(l)
    rec_avg = np.zeros_like(x)
    rec_err = np.zeros_like(x)
    rec_fnd = np.zeros_like(x)
    for i in range(l):
        if rec_num[i] != 0:
            continue
        for j in prange(i + 1, l):
            if (a[i] == m[j] and b[i] == n[j] and m[i] == a[j] and n[i] == b[j]):
                avg = (x[i] + x[j]) / 2
                err = abs(x[i] - x[j]) / abs(avg)
                rec_num[i] = j + 1
                rec_num[j] = i + 1
                rec_avg[i] = avg
                rec_avg[j] = avg
                rec_err[i] = err
                rec_err[j] = err
                rec_fnd[i] = 1
                rec_fnd[j] = 1
                break
    return (rec_num, rec_avg, rec_err, rec_fnd)
I was expecting Numba to support numpy.zeros() as described in the documentation; no difference between numpy.zeros() and numpy.zeros_like() is mentioned there. I would like to know whether the failure to cache is related to a possible error in my code.
I'm trying to solve LeetCode problem 4, which requires you to do binary search.
Given two sorted arrays nums1 and nums2 of size m and n respectively, return the median of the two sorted arrays.
The overall run time complexity should be O(log (m+n)).
This is my code:
class Solution:
    def findMedianSortedArrays(self, nums1, nums2):
        if len(nums1) > len(nums2):
            return self.findMedianSortedArrays(nums2, nums1)
        A, B = nums1, nums2
        total = len(A) + len(B)
        half = total // 2
        l, r = 0, len(A) - 1
        while l < r:
            i = l + (r - l) // 2
            j = half - i - 1 - 1
            Aleft = A[i] if i >= 0 else float("-inf")
            Aright = A[i + 1] if (i + 1) < len(A) else float("inf")
            Bleft = B[j] if j >= 0 else float("-inf")
            Bright = B[j + 1] if (j + 1) < len(B) else float("inf")
            if Aleft <= Bright and Bleft <= Aright:
                if total % 2:
                    return min(Aright, Bright)
                else:
                    return (max(Aleft, Bleft) + min(Aright, Bright)) / 2
            elif Aleft > Bright:
                r = i - 1
            elif Bleft > Aright:
                l = i + 1
It gives this error:
TypeError: None is not valid value for the expected return type double
raise TypeError(str(ret) + " is not valid value for the expected return type double");
Line 52 in _driver (Solution.py)
_driver()
Line 59 in <module> (Solution.py)
During handling of the above exception, another exception occurred:
TypeError: must be real number, not NoneType
Line 18 in _serialize_float (./python3/__serializer__.py)
Line 61 in _serialize (./python3/__serializer__.py)
out = ser._serialize(ret, 'double')
Line 50 in _driver (Solution.py)
However, if I change the while loop condition to just while True:
class Solution:
    def findMedianSortedArrays(self, nums1, nums2):
        if len(nums1) > len(nums2):
            return self.findMedianSortedArrays(nums2, nums1)
        A, B = nums1, nums2
        total = len(A) + len(B)
        half = total // 2
        l, r = 0, len(A) - 1
        while True:
            i = l + (r - l) // 2
            j = half - i - 1 - 1
            Aleft = A[i] if i >= 0 else float("-inf")
            Aright = A[i + 1] if (i + 1) < len(A) else float("inf")
            Bleft = B[j] if j >= 0 else float("-inf")
            Bright = B[j + 1] if (j + 1) < len(B) else float("inf")
            if Aleft <= Bright and Bleft <= Aright:
                if total % 2:
                    return min(Aright, Bright)
                else:
                    return (max(Aleft, Bleft) + min(Aright, Bright)) / 2
            elif Aleft > Bright:
                r = i - 1
            elif Bleft > Aright:
                l = i + 1
It runs perfectly.
Why the difference? How do these two differ in logic?
Any help will be much appreciated.
The error you see is a little obscure because LeetCode takes your solution code and wraps it up for execution, but one hint you can take from it is that something is returning None where a double is expected.
The difference is that if l >= r, none of the code in the while block executes and the function returns an implicit None. If you instead change the condition to while True:, the body always executes at least once, and once the code reaches the Aleft <= Bright and Bleft <= Aright branch, the function returns a non-None value.
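Here's a minimal sketch of the same pitfall outside of LeetCode (the function is made up for illustration):

def first_positive(nums):
    i = 0
    while i < len(nums):
        if nums[i] > 0:
            return nums[i]
        i += 1
    # no return here: falling off the end of a function yields None

print(first_positive([3, 5]))  # 3
print(first_positive([]))      # None -- the loop body never ran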
If you add a return <some-number-here> after the while loop, at the end of the function, you won't get the same error, but you won't have the right answer either, since the failing test case will just return whatever value you hardcoded.
You can see this happen with the default test case: add a print(l, r) at the end of the function (outside of the while loop), followed by the dummy return value, and you'll see l, r = 0, 0 at some point.
My advice (and I know some IDEs warn about this as well) is to use explicit returns when a function is expected to return something, and to avoid forcing consumers of your function to handle None if you can; consistent types make for happy programmers.
So I'm trying to make a program that solves various normal distribution questions in pure Python (no modules other than math), to 4 decimal places only, for A Levels. There is a problem that occurs in the call get_z_less_than_a_equal(0.75): apparently, without the assert statement in the except clause, the variables all get messed up and change. The error I'm catching is the recursion error. Anyway, if there is an easier and more efficient way to do things, it'd be appreciated.
import math

mean = 0
standard_dev = 1

percentage_points = {0.5000: 0.0000, 0.4000: 0.2533, 0.3000: 0.5244, 0.2000: 0.8416, 0.1000: 1.2816, 0.0500: 1.6440, 0.0250: 1.9600, 0.0100: 2.3263, 0.0050: 2.5758, 0.0010: 3.0902, 0.0005: 3.2905}

def get_z_less_than(x):
    """
    P(Z < x)
    """
    return round(0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * standard_dev**2))), 4)

def get_z_greater_than(x):
    """
    P(Z > x)
    """
    return round(1 - get_z_less_than(x), 4)

def get_z_in_range(lower_bound, upper_bound):
    """
    P(lower_bound < Z < upper_bound)
    """
    return round(get_z_less_than(upper_bound) - get_z_less_than(lower_bound), 4)

def get_z_less_than_a_equal(x):
    """
    P(Z < a) = x
    acquires a, given x
    """
    # first trial: brute forcing
    for i in range(401):
        a = i / 100
        p = get_z_less_than(a)
        if x == p:
            return a
        elif p > x:
            break
    # second trial: using symmetry
    try:
        res = -get_z_less_than_a_equal(1 - x)
    except RecursionError:
        # third trial: using estimation
        assert a, "error"
        prev = get_z_less_than(a - 0.01)
        p = get_z_less_than(a)
        if abs(x - prev) > abs(x - p):
            res = a
        else:
            res = a - 0.01
    return res

def get_z_greater_than_a_equal(x):
    """
    P(Z > a) = x
    """
    if x in percentage_points:
        return percentage_points[x]
    else:
        return get_z_less_than_a_equal(1 - x)

print(get_z_in_range(-1.20, 1.40))
print(get_z_less_than_a_equal(0.7517))
print(get_z_greater_than_a_equal(0.1000))
print(get_z_greater_than_a_equal(0.0322))
print(get_z_less_than_a_equal(0.1075))
print(get_z_less_than_a_equal(0.75))
Since Python 3.8, the statistics module in the standard library has a NormalDist class, so we could use that to implement our functions "with pure python", or at least for testing:
import math
from statistics import NormalDist

normal_dist = NormalDist(mu=0, sigma=1)

for i in range(-2000, 2000):
    test_val = i / 1000
    assert get_z_less_than(test_val) == round(normal_dist.cdf(test_val), 4)
This doesn't throw an error, so that part probably works fine.
Your get_z_less_than_a_equal seems to be the equivalent of NormalDist.inv_cdf.
There are very efficient ways to compute it accurately using the inverse of the error function (see Wikipedia and Python implementation), but we don't have that in the standard library.
Since you only care about the first few digits and get_z_less_than is monotonic, we can use a simple bisection method to find our solution.
Newton's method would be much faster, and not too hard to implement since we know that the derivative of the cdf is just the pdf, but it is still probably more complex than what we need (a sketch is included at the end of this answer).
def get_z_less_than_a_equal(x):
    """
    P(Z < a) = x
    acquires a, given x
    """
    if x <= 0.0 or x >= 1.0:
        raise ValueError("x must be >0.0 and <1.0")
    min_res, max_res = -10, 10
    while max_res - min_res > 1e-7:
        mid = (max_res + min_res) / 2
        if get_z_less_than(mid) < x:
            min_res = mid
        else:
            max_res = mid
    return round((max_res + min_res) / 2, 4)
Let's test this:
for i in range(1, 2000):
    test_val = i / 2000
    left_val = get_z_less_than_a_equal(test_val)
    right_val = round(normal_dist.inv_cdf(test_val), 4)
    assert left_val == right_val, f"{left_val} != {right_val}"
# AssertionError: -3.3201 != -3.2905
We see that we are losing some precision. That's because the error introduced by get_z_less_than (which rounds to 4 digits) gets propagated and amplified when we use it to estimate its inverse (see Wikipedia - error propagation for details).
So let's add a "digits" parameter to get_z_less_than and change our functions slightly:
def get_z_less_than(x, digits=4):
    """
    P(Z < x)
    """
    res = 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * standard_dev ** 2)))
    return round(res, digits)

def get_z_less_than_a_equal(x, digits=4):
    """
    P(Z < a) = x
    acquires a, given x
    """
    if x <= 0.0 or x >= 1.0:
        raise ValueError("x must be >0.0 and <1.0")
    min_res, max_res = -10, 10
    while max_res - min_res > 10 ** -(digits * 2):
        mid = (max_res + min_res) / 2
        if get_z_less_than(mid, digits * 2) < x:
            min_res = mid
        else:
            max_res = mid
    return round((max_res + min_res) / 2, digits)
And now we can run the same test again and see that it passes.
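For completeness, here is a rough sketch of the Newton's method idea mentioned earlier. It iterates on the unrounded cdf (via math.erf) rather than on get_z_less_than, and inv_cdf_newton is a name I made up; it converges quickly for x away from 0 and 1:

import math

def inv_cdf_newton(x, tol=1e-10, max_iter=100):
    """Find a such that P(Z < a) = x, via Newton's method."""
    a = 0.0  # start at the center of the distribution
    for _ in range(max_iter):
        cdf = 0.5 * (1 + math.erf(a / math.sqrt(2)))
        pdf = math.exp(-a * a / 2) / math.sqrt(2 * math.pi)  # the derivative of the cdf
        step = (cdf - x) / pdf
        a -= step
        if abs(step) < tol:
            break
    return round(a, 4)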
I am trying to solve the following question. I have written code implementing the direct method and Horner's rule, but past that I am having some problems figuring out the rest. I'm looking for some help with this; all help is greatly appreciated!
Here is the code I have produced for Horner's rule, which I believe I have done correctly.
def poly_horner(A, x):
    p = A[-1]
    i = len(A) - 2
    while i >= 0:
        p = p * x + A[i]
        i -= 1
    return p
And here is the code I have produced for the direct method:
def poly_naive(A, x):
    p = 0
    for i, a in enumerate(A):
        p += (x ** i) * a
    return p
How can I put this code together and finish the rest?
Using global as suggested in the paper:

flops = 0

def add(x1, x2):
    global flops
    flops += 1
    return x1 + x2

def multiply(x1, x2):
    global flops
    flops += 1
    return x1 * x2

def poly_horner(A, x):
    global flops
    flops = 0
    p = A[-1]
    i = len(A) - 2
    while i >= 0:
        p = add(multiply(p, x), A[i])
        i -= 1
    return p

def poly_naive(A, x):
    global flops
    flops = 0
    p = 0
    for i, a in enumerate(A):
        xp = a
        for _ in range(i):
            xp = multiply(xp, x)
        p = add(p, xp)
    return p
To run the above code, for example:
>>> poly_horner([1,2,3,4,5], 2)
129
>>> print(flops)
8
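For comparison, running the naive version the same way should, by my count, do 10 multiplications and 5 additions for this degree-4 polynomial, versus Horner's 4 and 4:

>>> poly_naive([1,2,3,4,5], 2)
129
>>> print(flops)
15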
Compare to numpy's polyval:
>>> import numpy as np
>>> np.polyval([5,4,3,2,1], 2)
129
Overall question: How do I know what I am getting from a Queue object when I call Queue.get()? How do I sort it, or identify it? Can you get specific items from the Queue and leave others?
Context:
I wanted to learn a little about multiprocessing (threading?) to make solving a matrix equation more efficient.
To illustrate, below is my working code for solving the matrix equation Ax = b without taking advantage of multiple cores. The solution is [1,1,1].
import numpy as np

def jacobi(A, b, x_k):
    N = len(x_k)
    x_kp1 = np.copy(x_k)
    E_rel = 1
    iteration = 0
    if N != A.shape[0] or N != A.shape[1]:
        raise ValueError('Matrix/vector dimensions do not match.')
    while E_rel > ((10**(-14)) * (N**(1/2))):
        for i in range(N):
            sum = 0
            for j in range(N):
                if j != i:
                    sum = sum + A[i, j] * x_k[j]
            x_kp1[i] = (1 / A[i, i]) * (b[i] - sum)
        E_rel = 0
        for n in range(N):
            E_rel = E_rel + abs(x_kp1[n] - x_k[n]) / ((abs(x_kp1[n]) + abs(x_k[n])) / 2)
        iteration += 1
        # print("relative error for this iteration:", E_rel)
        if iteration < 11:
            print("iteration ", iteration, ":", x_kp1)
        x_k = np.copy(x_kp1)
    return x_kp1

if __name__ == '__main__':
    A = np.matrix([[12., 7, 3], [1, 5, 1], [2, 7, -11]])
    b = np.array([22., 7, -2])
    x = np.array([1., 2, 1])
    print("Jacobi Method:")
    x_1 = jacobi(A, b, x)
OK, so I wanted to convert this code following this nice example: https://p16.praetorian.com/blog/multi-core-and-distributed-programming-in-python
I got some code that runs and converges to the correct solution in the same number of iterations! That's really great, but what guarantees that this happens? It seems like Queue.get() just grabs whatever result from whatever process finished first (or last?). I was actually very surprised when my code ran, as I expected
for i in range(N):
    x_update[i] = q.get(True)
to jumble up the elements of the vector.
Here is my code updated using the multi-processing library:
import numpy as np
import multiprocessing as mu
np.set_printoptions(precision=15)
def Jacobi_step(index, initial_vector, q):
N = len(initial_vector)
sum = 0
for j in range(N):
if j != i:
sum = sum + A[i, j] * initial_vector[j]
# this result is the updated element at given index of our solution vector.
q.put((1 / A[index, index]) * (b[index] - sum))
if __name__ == '__main__':
A = np.matrix([[12.,7,3],[1,5,1],[2,7,-11]])
b = np.array([22.,7,-2])
x = np.array([1.,2,1])
q = mu.Queue()
N = len(x)
x_update = np.copy(x)
p = []
error = 1
iteration = 0
while error > ((10**(-14)) * (N**(1/2))):
# assign a process to each element in the vector x,
# update one element with a single Jacobi step
for i in range(N):
process = mu.Process(target=Jacobi_step(i, x, q))
p.append(process)
process.start()
# fill in the updated vector with each new element aquired by the last step
for i in range(N):
x_update[i] = q.get(True)
# check for convergence
error = 0
for n in range(N):
error = error + abs(x_update[n] - x[n]) / ((abs(x_update[n]) + abs(x[n])) / 2)
p[i].join()
x = np.copy(x_update)
iteration += 1
print("iteration ", iteration, ":", x)
del p[:]
A Queue is first-in-first-out: the first element inserted is the first element retrieved.
Since you have no way to control which process finishes first, I suggest you insert tuples into the Queue, containing the value and some identifying object that can be used to sort the results or relate them to the original computation:
result = (1 / A[index, index]) * (b[index] - sum)
q.put((index, result))
This example puts the index in the Queue together with the result, so that when you .get() later you also receive the index and can use it to know which computation the result is for:
i, x_i = q.get(True)
x_update[i] = x_i
Or something like that.
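A minimal, self-contained sketch of the pattern (the worker and its task here are made up for illustration):

import multiprocessing as mu

def worker(index, value, q):
    # tag the result with its index so the consumer can place it
    # correctly no matter which process finishes first
    q.put((index, value * value))

if __name__ == '__main__':
    q = mu.Queue()
    procs = [mu.Process(target=worker, args=(i, i + 1, q)) for i in range(3)]
    for proc in procs:
        proc.start()
    results = [None] * 3
    for _ in range(3):
        i, val = q.get(True)
        results[i] = val
    for proc in procs:
        proc.join()
    print(results)  # [1, 4, 9], in index order regardless of arrival order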
Say I have a 1D array x with positive and negative values in Python, e.g.:
x = np.random.rand(10) * 10 - 5
For a given positive value of K, I would like to find the offset c that makes the sum of positive elements of the array y = x + c equal to K.
How can I solve this problem efficiently?
How about binary search to determine which elements of x + c will contribute to the sum, followed by solving a linear equation for c? The running time of this code is O(n log n), but only O(log n) of the work is done in Python. The running time could be dropped to O(n) via a more complicated partitioning strategy, though I'm not sure a practical improvement would result.
import numpy as np

def findthreshold(x, K):
    x = np.sort(np.array(x))[::-1]
    z = np.cumsum(np.array(x))
    l = 0
    u = x.size
    while u - l > 1:
        m = (l + u) // 2
        if z[m] - (m + 1) * x[m] >= K:
            u = m
        else:
            l = m
    return (K - z[l]) / (l + 1)

def test():
    x = np.random.rand(10)
    K = np.random.rand() * x.size
    c = findthreshold(x, K)
    assert np.abs(K - np.sum(np.clip(x + c, 0, np.inf))) / K <= 1e-8
Here's a randomized expected O(n) variant. It's faster (on my machine, for large inputs), but not dramatically so. Watch out for catastrophic cancellation in both versions.
def findthreshold2(x, K):
    sumincluded = 0
    includedsize = 0
    while x.size > 0:
        pivot = x[np.random.randint(x.size)]
        above = x[x > pivot]
        if sumincluded + np.sum(above) - (includedsize + above.size) * pivot >= K:
            x = above
        else:
            notbelow = x[x >= pivot]
            sumincluded += np.sum(notbelow)
            includedsize += notbelow.size
            x = x[x < pivot]
    return (K - sumincluded) / includedsize
You can sort x in descending order, then loop over x and compute the required c so far. If the next element plus c is still positive, it should be included in the sum, and c gets smaller.
Note that there might be no solution: if you include elements up to index m, c may be such that element m+1 should also be included, but once you include m+1, c decreases and x[m+1] + c might become negative.
In pseudocode:
sortDescending(x)
i = 0, c = 0, sum = 0
while i < x.length and x[i] + c >= 0
    sum += x[i]
    c = (K - sum) / (i + 1)
    i++
if i == 0 or x[i-1] + c < 0
    # no solution
The running time is obviously O(n log n) because it is dominated by the initial sort.
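Here is the pseudocode translated into a runnable Python sketch (find_offset is a made-up name; it assumes a numpy array input, like the other answer):

import numpy as np

def find_offset(x, K):
    xs = np.sort(np.asarray(x, dtype=float))[::-1]  # descending order
    total = 0.0
    c = 0.0
    i = 0
    while i < xs.size and xs[i] + c >= 0:
        total += xs[i]
        c = (K - total) / (i + 1)  # offset making the i+1 included elements sum to K
        i += 1
    if i == 0 or xs[i - 1] + c < 0:
        raise ValueError("no solution")
    return c

x = np.array([3.0, -1.0, 2.0, -4.0])
c = find_offset(x, 2.0)
print(c)  # -1.5; the positive part of x + c sums to 2.0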