I found that my program fails to overwrite an np.ndarray (the X variable) inside a for loop using an assignment like "X[i] = another np.ndarray with a matching shape". I have no idea how this can happen...
Code:
import numpy as np

def qr_tridiagonal(T: np.ndarray):
    m, n = T.shape
    X = T.copy()
    Qt = np.identity(m)
    for i in range(n-1):
        ai = X[i, i]
        ak = X[i+1, i]
        c = ai/(ai**2 + ak**2)**.5
        s = ak/(ai**2 + ak**2)**.5
        # Givens rotation
        tmp1 = c*X[i] + s*X[i+1]
        tmp2 = c*X[i+1] - s*X[i]
        print("tmp1 before:", tmp1)
        print("X[i] before:", X[i])
        X[i] = tmp1
        X[i+1] = tmp2
        print("tmp1 after:", tmp1)
        print("X[i] after:", X[i])
        print()
    print(X)
    return Qt.T, X

A = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]])
Q, R = qr_tridiagonal(A)
Output (the first 4 lines):
tmp1 before: [1.41421356 1.41421356 0.70710678 0. ]
X[i] before: [1 1 0 0]
tmp1 after: [1.41421356 1.41421356 0.70710678 0. ]
X[i] after: [1 1 0 0]
Though X[i] is assigned tmp1, the values in X[i] (equivalently X[i, :]) remain unchanged. I hope somebody can help me out.
Other info: the above is a function that computes the QR factorization of a tridiagonal matrix using Givens rotations.
I did check that assigning constant values to X[i] works, e.g. X[i] = 10, and the printed results reflect that assignment. But if X[i] = someArray, then in my code it fails. I am not sure whether this is a particular issue triggered by the algorithm I was implementing above, because such a scenario has never happened to me before.
I also tried creating a fresh environment with conda to make sure my Python installation is not the problem. The strange output above should be reproducible on other devices.
Many thanks to @hpaulj.
It turns out to be a problem of datatype. The program is fine, but the input dtype is int, which results in intermediate truncation errors.
A lesson learned: be aware of the dtype of np.ndarray!
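For future readers, a minimal sketch of the fix (assuming the function above is otherwise unchanged): cast the input to a float dtype before the rotations, so assigning float rows back into X no longer truncates.

import numpy as np

A = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]])

# A.dtype is an integer type, so writing float rows into an int array
# silently truncates them. Casting once up front avoids this:
X = A.astype(np.float64)  # e.g. X = T.copy().astype(np.float64) inside qr_tridiagonal
X[0] = [1.41421356, 1.41421356, 0.70710678, 0.0]
print(X[0])  # the float values are preserved now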
I am trying to simulate a grid of spins in python that can change their orientation (represented by the sign):
>>> import numpy as np
>>> spin_values = np.random.choice([-1, 1], (2, 2))
>>> spin_values
array([[-1, 1],
[ 1, 1]])
I then draw two sets of random indices into that grid for spins that have a certain probability of switching their orientation, say:
>>> i = np.array([1, 1])
>>> j = np.array([0, 0])
>>> switches = np.array([-1, -1])
i and j here contain the indices that might change, and switches states whether each spin does switch (-1) or keeps its orientation (1). My idea for calculating the new orientations was:
>>> spin_values[i, j] *= switches
When a spin orientation only changes once, this works fine. However, when it is supposed to change twice (as with the example values), it only changes once, giving me a wrong result.
>>> spin_values
array([[-1, 1],
[-1, 1]])
How could I get the right results while having a short run time (this has to be done many times on a bigger grid)?
I would use numpy.unique to count how often each pair of indices occurs and multiply by (-1) ** count (this assumes, as in your example, that every listed pair actually switches; if switches can also contain 1, mask i and j first):
idx, cnt = np.unique(np.vstack([i, j]), axis=1, return_counts=True)
spin_values[tuple(idx)] *= (-1) ** cnt
Updated spin_values:
array([[-1, 1],
[ 1, 1]])
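An alternative worth knowing about: the reason spin_values[i, j] *= switches only applies once per repeated index is that fancy-indexed in-place operations are buffered. NumPy's ufunc.at exists precisely for this case; a short sketch using the arrays from the question:

import numpy as np

spin_values = np.array([[-1, 1], [1, 1]])
i = np.array([1, 1])
j = np.array([0, 0])
switches = np.array([-1, -1])

# np.multiply.at performs the multiplication unbuffered, so each repeated
# (i, j) pair takes effect, and switches == 1 entries are naturally no-ops.
np.multiply.at(spin_values, (i, j), switches)
print(spin_values)  # [[-1  1] [ 1  1]]

Note that ufunc.at is typically slower than the np.unique approach on large inputs, which matters given the run-time constraint in the question.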
I am working on an implementation of the Gauss elimination algorithm in Python using NumPy. While working on it, I have noticed some weird behaviour. Here is what I've done so far:
def gauss_elimination(a, b):
    n = len(b)
    print(a)
    for k in range(0, n-1):
        for i in range(k+1, n):
            lam = a[i, k] / a[k, k]
            a[i, k:n] = a[i, k:n] - (lam * a[k, k:n])
            b[i] = b[i] - lam * b[k]
    print(a)
    return b
With this code, considering the following arrays:
a = np.array([[4, -2, 1], [-2, 4, -2], [1, -2, 4]])
b = np.array([11, -16, 17])
The result will be:
array([ 11, -10, 10])
This is how the algorithm changed the array a:
array([[ 4, -2, 1],
[ 0, 3, -1],
[ 0, 0, 2]])
Which is wrong. For some reason, the value in the second row and third column is -1, when it should be -1.5. I've inserted some print statements along the way to see what was actually happening, and for some reason NumPy is truncating the result. This is the changed code:
def gauss_elimination(a, b):
    n = len(b)
    print(a)
    for k in range(0, n-1):
        for i in range(k+1, n):
            lam = a[i, k] / a[k, k]
            print(a[i, k:n])
            print(lam * a[k, k:n])
            print(a[i, k:n] - lam * a[k, k:n])
            a[i, k:n] = a[i, k:n] - (lam * a[k, k:n])
            b[i] = b[i] - lam * b[k]
    print(a)
    return b
Considering the same arrays defined earlier, the results are:
[[ 4 -2 1]
[-2 4 -2]
[ 1 -2 4]]
[ 0. 3. -1.5] # This shows that the value is being calculated as presumed
[[ 4 -2 1]
[ 0 3 -1] # But when the object a is updated, the value is -1 and not -1.5
[ 1 -2 4]]
[ 0. -1.5 3.75]
[[ 4 -2 1]
[ 0 3 -1]
[ 0 -1 3]]
[0. 2.6666666666666665]
[[ 4 -2 1]
[ 0 3 -1]
[ 0 0 2]]
I am a little bit confused. Perhaps I have made a mistake, but the print statements show that everything is being calculated as it should be. Any tips?
The problem
Your array's dtype is "int32":
>>> import numpy as np
>>> x = np.array([1, 2, 3])
>>> x[1] / x[2]
0.6666666666666666
>>> x[0] = x[1] / x[2]
>>> x
array([0, 2, 3])
>>> x.dtype
dtype('int32')
The "solution"
You can use dtype="float64" at instantiation to fix this issue:
>>> x = np.array([1, 2, 3], dtype="float64")
>>> x[0] = x[1] / x[2]
>>> x
array([0.66666667, 2. , 3. ])
Of course, you could also do this without having to explicitly specify the dtype by instantiating your array with floats in the first place:
>>> x = np.array([1., 2., 3.])
>>> x.dtype
dtype('float64')
However, whenever dealing with numerical methods in programming, you should be cognizant of the limitations of floating point arithmetic. Take a look at the note at the bottom of this answer for more details.
Explanation
NumPy arrays are homogeneous, meaning that every element is of the same type. The dtype, or "data type", determines how much memory each element in an array requires, which in turn dictates the memory footprint of the entire array.
Because NumPy arrays are homogeneous, an array is stored in a contiguous chunk of memory, which makes accessing elements very fast and easy since all elements take up the same number of bytes. This is part of the reason NumPy is so fast.
Just FYI, this is a very simplified explanation of what's going on here. Essentially, NumPy arrays are arrays of a single type ("int8", or "int32", or "float64", etc.).
An operation like division on integers always results in a float, and since NumPy doesn't want to assume that you wanted to change the dtype of the entire array (which would require allocating an entirely new array in memory), it quietly maintains the dtype of the existing array, truncating the float back to an integer on assignment.
Note
Anytime you're dealing with numerical computations, you need to be aware of the limitations of floating point arithmetic. Floating point arithmetic is prone to inaccuracy due to the nature of how these numbers are stored in memory.
In the case of solving systems of equations using linear algebra, even after solving for a particular vector x in Ax = b using a straightforward Gaussian elimination algorithm based on floating point arithmetic, the expression Ax will not necessarily be exactly equal to b. I recommend reading further: https://en.wikipedia.org/wiki/Floating-point_arithmetic
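A quick illustration of that caveat, using the arrays from the question (np.linalg.solve stands in for the hand-rolled elimination):

import numpy as np

A = np.array([[4., -2., 1.], [-2., 4., -2.], [1., -2., 4.]])
b = np.array([11., -16., 17.])
x = np.linalg.solve(A, b)

print(A @ x - b)              # the residual is tiny, but may not be exactly zero
print(np.allclose(A @ x, b))  # so compare with a tolerance rather than ==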
You've stumbled upon a discipline known as numerical analysis, by the way! More specifically numerical linear algebra. This is a deep topic. Gaussian elimination is a lovely algorithm, but I also hope this answer galvanizes future readers to look into more advanced algorithms for solving systems of equations.
I am trying to use a for loop for a binary classification function using numpy. The problem that I am having is that the loop needs to continue until all values in Y are greater than 0 when multiplied by the updated value ak.
However, this loop is returning a value of ak that does not ensure all values of Y are greater than 0. For example, the current output, [0, 0, 7], when multiplied by the fourth array in Y1, [-1, -1, 0], is equal to 0.
What can I do to fix this issue? I have posted my code below. Thanks in advance.
import numpy as np

def percep(a1, epochs, Y):
    sum = Y1.sum(axis=0)
    print(sum)
    ak = a1
    for k in range(1, epochs):
        if np.multiply(ak, Y).all() > 0:
            return ak
        else:
            rho = np.multiply(1/k, sum)
            ak = np.add(ak, rho)
    return ak

a1 = np.array([0, 0, 0])
Y1 = np.array([[1, 1, 1], [1, 1, 2], [1, 2, 2], [-1, -1, 0], [-1, -1, 1], [-1, -2, 1]])
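A note on the likely culprit: np.multiply(ak, Y).all() first collapses the whole product array to a single boolean (are all element-wise products nonzero?) and then compares that boolean to 0, which is not the per-sample test described above. Assuming the intended criterion is the perceptron condition a . y > 0 for every row y of Y, the check would look like this sketch:

# Y1 @ ak gives one dot product per sample; require all of them to be positive.
# (Sketch only -- this would replace np.multiply(ak, Y).all() > 0 in the loop.)
converged = (Y1 @ ak > 0).all()

With that test, [0, 0, 7] correctly fails on the fourth row [-1, -1, 0], whose dot product is 0.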
I'm working on some code for dehazing images, based on this paper, and I started with an abandoned Py2.7 implementation. Since then, particularly with Numba, I've made some real performance improvements (important since I'll have to run this on 8K images).
I'm pretty convinced my last significant performance bottleneck is in performing the box filter step (I've already shaved off almost a minute per image, but this last slow step is ~30s/image), and I'm close to getting it to run as nopython in Numba:
import numpy as np
from numba import njit, jit, prange
import cupy as cp  # GPU path; hasGPU is defined elsewhere in my code

@njit  # Row dependencies means can't be parallel
def yCumSum(a):
    """
    Numba-based computation of y-direction
    cumulative sum. Can't be parallel!
    """
    out = np.empty_like(a)
    out[0, :] = a[0, :]
    for i in prange(1, a.shape[0]):
        out[i, :] = a[i, :] + out[i - 1, :]
    return out

@njit(parallel=True)
def xCumSum(a):
    """
    Numba-based parallel computation
    of x-direction cumulative sum
    """
    out = np.empty_like(a)
    for i in prange(a.shape[0]):
        out[i, :] = np.cumsum(a[i, :])
    return out

@jit
def _boxFilter(m, r, gpu=hasGPU):
    if gpu:
        m = cp.asnumpy(m)
    out = __boxfilter__(m, r)
    if gpu:
        return cp.asarray(out)
    return out

@jit(fastmath=True)
def __boxfilter__(m, r):
    """
    Fast box filtering implementation, O(1) time.

    Parameters
    ----------
    m: a 2-D matrix of data normalized to [0.0, 1.0]
    r: radius of the window considered

    Returns
    -------
    The filtered matrix m'.
    """
    # H: height, W: width
    H, W = m.shape
    # the output matrix m'
    mp = np.empty(m.shape)
    # cumulative sum over the y axis
    ySum = yCumSum(m)  # np.cumsum(m, axis=0)
    # copy the accumulated values of the windows in y
    mp[0:r+1, :] = ySum[r:(2*r)+1, :]
    # differences in the y axis
    mp[r+1:H-r, :] = ySum[(2*r)+1:, :] - ySum[:H-(2*r)-1, :]
    mp[-r:, :] = np.tile(ySum[-1, :], (r, 1)) - ySum[H-(2*r)-1:H-r-1, :]
    # cumulative sum over the x axis
    xSum = xCumSum(mp)  # np.cumsum(mp, axis=1)
    # copy the accumulated values of the windows in x
    mp[:, 0:r+1] = xSum[:, r:(2*r)+1]
    # differences over the x axis
    mp[:, r+1:W-r] = xSum[:, (2*r)+1:] - xSum[:, :W-(2*r)-1]
    mp[:, -r:] = np.tile(xSum[:, -1][:, None], (1, r)) - xSum[:, W-(2*r)-1:W-r-1]
    return mp
There's plenty to do around the edges, but if I can get the tile operation as a nopython call, I can nopython the whole boxfilter step and get a big performance boost. I'm not super inclined to do something really really specific as I'd love to reuse this code elsewhere, but I wouldn't particularly object to it being limited to a 2D scope. For whatever reason I'm just staring at this and not really sure where to start.
np.tile is a bit too complicated to reimplement in full, but unless I'm misreading it looks like you only need to take a vector and then repeat it along a different axis r times.
A Numba-compatible way to do this is to write
y = x.repeat(r).reshape((-1, r))
Then x will be repeated r times along the second dimension, so that y[i, j] == x[i].
Example:
In [2]: x = np.arange(5)
In [3]: x.repeat(3).reshape((-1, 3))
Out[3]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4]])
If you want x to be repeated along the first dimension instead, just take the transpose y.T.
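Applied to the two np.tile calls in __boxfilter__ above, that would look something like this (a sketch under the assumption that ySum and xSum have shape (H, W); I haven't benchmarked it inside Numba):

# replaces np.tile(ySum[-1, :], (r, 1)): the last row repeated r times, shape (r, W)
lastRow = ySum[-1, :].repeat(r).reshape((-1, r)).T
mp[-r:, :] = lastRow - ySum[H-(2*r)-1:H-r-1, :]

# replaces np.tile(xSum[:, -1][:, None], (1, r)): the last column repeated r times, shape (H, r)
lastCol = xSum[:, -1].repeat(r).reshape((-1, r))
mp[:, -r:] = lastCol - xSum[:, W-(2*r)-1:W-r-1]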
I am looking into the NumPy correlation function
numpy.correlate(a, v, mode='valid')
Cross-correlation of two 1-dimensional sequences.
This function computes the correlation as generally defined in signal processing texts:
c_{av}[k] = sum_n a[n+k] * conj(v[n])
Then for the example:
a = [1, 2, 3]
v = [0, 1, 0.5]
np.correlate([1, 2, 3], [0, 1, 0.5], "full")
array([ 0.5, 2. , 3.5, 3. , 0. ])
So the k in the output array runs from 0 to 4 in this example. However, I am wondering how a[n+k] is defined when (n+k) > 2 in this case?
Also, how is conj(v[n]) defined, and how is each element in the output array computed?
The formula c_{av}[k] = sum_n a[n+k] * conj(v[n]) is a little misleading because k on the left is not necessarily the Python index of the output array. In the 'full' mode, the possible values of k are those for which there exists at least one n such that a[n+k] * conj(v[n]) is defined (that is, both n+k and n fall in the ranges of respective arrays).
In your example, k in sum_n a[n+k] * conj(v[n]) can be -2, -1, 0, 1, 2. These generate the 5 values that you see. For example, k = -2 leaves only n = 2 in range, giving a[2-2]*conj(v[2]) = a[0]*v[2] = 0.5, and so on.
In general, the range of k in the 'full' mode is from 1-len(a) to len(v)-1 inclusive. So, if k is really understood as the Python index, then the formula should be
c_{av}[k] = sum_n a[n+k+1-len(a)] * conj(v[n])
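A quick numeric check of the re-indexed formula, using only the arrays defined above:

import numpy as np

a = np.array([1, 2, 3])
v = np.array([0, 1, 0.5])
full = np.correlate(a, v, "full")  # [0.5, 2., 3.5, 3., 0.]

# Python index k = 0 corresponds to the shift k + 1 - len(a) = -2:
# only n = 2 keeps both indices in range, so c[0] = a[0] * conj(v[2])
print(full[0], a[0] * np.conj(v[2]))  # 0.5 0.5

# k = 1 -> shift -1: n in {1, 2}, so c[1] = a[0]*conj(v[1]) + a[1]*conj(v[2])
print(full[1], a[0] * np.conj(v[1]) + a[1] * np.conj(v[2]))  # 2.0 2.0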