How to prevent accidental assignment into empty NumPy views - python

Consider the following Python + NumPy code that executes without error:
a = np.array((1, 2, 3))
a[13:17] = 23
Using a slice beyond the limits of the array truncates the slice and even returns an empty view if start and stop are beyond the limits. Assigning to such a slice just drops the input.
In my use case the indices are calculated in a non-trivial way and are used to manipulate selected parts of an array. The above behavior means that I might silently skip parts of that manipulation if the indices are miscalculated. That can be hard to detect and can lead to "almost correct" results, i.e. the worst kind of programming errors.
For that reason I'd like to have strict checking for slices so that a start or stop outside the array bounds triggers an error. Is there a way to enable that in NumPy?
As additional information, the arrays are large and the operation is performed very often, i.e. there should be no performance penalty. Furthermore, the arrays are often multidimensional, including multidimensional slicing.

You could use np.put_along_axis instead, which seems to fit your needs:
>>> a = np.array((1, 2, 3))
>>> np.put_along_axis(a, indices=np.arange(13, 17), axis=0, values=23)
The above will raise the following error:
IndexError: index 13 is out of bounds for axis 0 with size 3
Parameter values can either be a scalar value or another NumPy array.
Or in a shorter form:
>>> np.put_along_axis(a, np.r_[13:17], 23, 0)
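For multidimensional arrays, put_along_axis expects indices with the same number of dimensions as the target; a small sketch (the shapes below are chosen purely for illustration):
import numpy as np
a = np.zeros((2, 4))
idx = np.array([[0, 2], [1, 3]])   # one row of column indices per row of a
np.put_along_axis(a, idx, 23, axis=1)
# an out-of-bounds entry in idx (e.g. 4) raises IndexError, just as above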
Edit: Alternatively np.put has a mode='raise' option (which is set by default):
np.put(a, ind, v, mode='raise')
a: ndarray - Target array.
ind: array_like - Target indices, interpreted as integers.
v: array_like - Values to place in a at target indices. [...]
mode: {'raise', 'wrap', 'clip'} optional - Specifies how out-of-bounds indices will behave.
'raise' – raise an error (default)
'wrap' – wrap around
'clip' – clip to the range
The default behavior will be:
>>> np.put(a, np.r_[13:17], 23)
IndexError: index 13 is out of bounds for axis 0 with size 3
while with mode='clip', it remains silent:
>>> np.put(a, np.r_[13:17], 23, mode='clip')
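Keep in mind that np.put addresses the flattened array, so for the multidimensional case you would first convert per-axis indices to flat ones. A sketch using np.ravel_multi_index, which itself raises on out-of-bounds entries by default:
import numpy as np
a = np.zeros((4, 5))
rows = np.array([1, 2])
cols = np.array([3, 7])  # 7 is out of bounds for an axis of size 5
flat = np.ravel_multi_index((rows, cols), a.shape)  # raises ValueError here
np.put(a, flat, 23)  # mode='raise' is the default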

Depending on how complicated your indices are (read: how much pain in the backside it is to predict shapes after slicing), you may want to compute the expected shape directly and then reshape to it. If the size of your actual sliced array doesn't match, this will raise an error. The overhead is minor:
import numpy as np
from timeit import timeit

def use_reshape(a, idx, val):
    # expected length of each sliced axis; a non-slice index selects a single element
    expected_shape = ((s.stop - s.start - 1) // (s.step or 1) + 1 if isinstance(s, slice) else 1 for s in idx)
    a[idx].reshape(*expected_shape)[...] = val

def no_check(a, idx, val):
    a[idx] = val

val = 23
idx = np.s_[13:1000:2, 14:20]

for f in (no_check, use_reshape):
    a = np.zeros((1000, 1000))
    print(f.__name__)
    # total seconds for 1000 runs, which equals the per-call time in ms
    print(timeit(lambda: f(a, idx, val), number=1000), 'ms')
    assert (a[idx] == val).all()

# check it works
print("\nThis should raise an exception:\n")
use_reshape(a, np.s_[1000:1001, 10], 0)
Please note that this is proof-of-concept code. To make it safe you'd have to check for unexpected index kinds, matching numbers of dimensions and, importantly, for indices that select a single element.
Running it anyway:
no_check
0.004587646995787509 ms
use_reshape
0.006306983006652445 ms
This should raise an exception:
Traceback (most recent call last):
File "check.py", line 22, in <module>
use_reshape(a,np.s_[1000:1001,10],0)
File "check.py", line 7, in use_reshape
a[idx].reshape(*expected_shape)[...] = val
ValueError: cannot reshape array of size 0 into shape (1,1)

One way to achieve the behavior you want is to use ranges instead of slices:
a = np.array((1, 2, 3))
a[np.arange(13, 17)] = 23
I think NumPy's behavior here is consistent with the behavior of pure Python lists and should be expected. Instead of workarounds, it might be better for code readability to explicitly add asserts:
index_1, index_2 = ... # a complex computation
assert 0 <= index_1 < index_2 <= a.shape[0]
a[index_1:index_2] = 23
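If the computed indices arrive as a tuple of slices, the assert idea can also be wrapped into a small helper. The checked_assign function below is only an illustrative sketch (plain slices with non-negative bounds and integer indices only), not an existing NumPy feature:
import numpy as np

def checked_assign(a, idx, val):
    # raise if any slice bound or integer index falls outside the array
    for axis, s in enumerate(idx):
        n = a.shape[axis]
        if isinstance(s, slice):
            for bound in (s.start, s.stop):
                if bound is not None and not 0 <= bound <= n:
                    raise IndexError(f"slice bound {bound} out of range for axis {axis} with size {n}")
        elif not 0 <= s < n:
            raise IndexError(f"index {s} out of range for axis {axis} with size {n}")
    a[idx] = val

a = np.zeros((1000, 1000))
checked_assign(a, np.s_[13:1000:2, 14:20], 23)  # works
checked_assign(a, np.s_[1000:1001, 10], 0)      # raises IndexError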

Related

Upsampling using Numpy

I want to upsample a given 1d array by adding 'k-1' zeros between the elements for a given upsampling factor 'k'.
k=2
A = np.array([1,2,3,4,5])
B = np.insert(A,np.arange(1,len(A)), values=np.zeros(k-1))
The above code works for k=2.
Output: [1 0 2 0 3 0 4 0 5]
k=3
A = np.array([1,2,3,4,5])
B = np.insert(A,np.arange(1,len(A)), values=np.zeros(k-1))
For k=3, it throws an error.
The output I desire has k-1, i.e. 3-1 = 2, zeros between the elements.
Output: [1,0,0,2,0,0,3,0,0,4,0,0,5]
I want to add k-1 zeros between the elements of the 1d array.
ValueError Traceback (most recent call last)
Cell In [98], line 4
1 k = 3
3 A = np.array([1,2,3,4,5])
----> 4 B = np.insert(A, np.arange(1,len(A)), values=np.zeros(k-1))
6 print(k,'\n')
7 print(A,'\n')
File <__array_function__ internals>:180, in insert(*args, **kwargs)
File c:\Users\Naruto\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\lib\function_base.py:5325, in insert(arr, obj, values, axis)
5323 slobj[axis] = indices
5324 slobj2[axis] = old_mask
-> 5325 new[tuple(slobj)] = values
5326 new[tuple(slobj2)] = arr
5328 if wrap:
ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (4,)
Would this be what you are looking for?
k=3
A=np.array([1,2,3,4,5])
B=np.insert(A, list(range(1,len(A)+1))*(k-1), 0)
I just duplicate the indices in the obj array. Plus, there is no need to build an array of zeros; a single 0 scalar will do for the values argument.
Note that there are certainly better ways than a list to create that index array (the code above actually builds a Python list). I can't think of a one-liner right now (one option is sketched at the end of this answer), but if that list is big, it might be a good idea to avoid materializing it.
I am not sure (I've never asked myself this question before) whether this insert is optimal either.
For example
B=np.zeros((len(A)*k,), dtype=int)
B[::k]=A
also does the trick. Which one is better memory-wise (I would say this one, but just at first glance, because it doesn't create the obj list) and CPU-wise, I am not sure.
EDIT: in fact, I've just tried. The second solution is way faster (27 ms vs 1586 ms, for A with 50000 values and k=100), which is not surprising. It is quite easy to figure out what it does (in C, I mean, in the NumPy code, not in Python): just an allocation and then a loop to copy some values. It could hardly be simpler, whereas insert probably has to compute shifts and the like.
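As for the one-liner index construction mentioned above, one candidate is np.repeat, which duplicates each insertion position k-1 times without building a Python list (a sketch; like the list version it leaves k-1 trailing zeros):
import numpy as np

k = 3
A = np.array([1, 2, 3, 4, 5])
# np.repeat(np.arange(1, len(A) + 1), k - 1) -> [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
B = np.insert(A, np.repeat(np.arange(1, len(A) + 1), k - 1), 0)
print(B)  # [1 0 0 2 0 0 3 0 0 4 0 0 5 0 0]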
A simple and fast method: use np.zeros to create B, then assign the values from A.
k = 3
A = np.array([1,2,3,4,5])
B = np.zeros(k*len(A)-k+1, dtype=A.dtype)
B[::k] = A

Error creating very large sparse matrix from sub-blocks using scipy.sparse.bmat

I am trying to create a matrix using scipy.sparse.bmat from smaller csr matrices - my function call is sparse.bmat(HList, format='csr'). The resulting matrix will be a square matrix with ~2.6 billion columns/rows. However, I get the following error when I attempt to construct this matrix:
Traceback (most recent call last):
[...]/lib/python3.7/site-packages/scipy/sparse/construct.py", line 623, in bmat
return coo_matrix((data, (row, col)), shape=shape).asformat(format)
[...]/lib/python3.7/site-packages/scipy/sparse/coo.py", line 192, in __init__
self._check()
[...]/lib/python3.7/site-packages/scipy/sparse/coo.py", line 283, in _check
raise ValueError('negative row index found')
ValueError: negative row index found
The problem appears to occur when the combined matrix is converted into coo format. I believe the problem has something to do with indices overflowing, as the indices of the full matrix wouldn't fit in a 32 bit format (2.6 billion > 2^31). I have tested my matrix construction script for a smaller version of this problem, and it worked correctly.
This post has a very similar problem to mine; however, the solutions listed there didn't work for my situation. Running the test described there,
>>> scipy.sparse.sputils.get_index_dtype((np.array(10**10),))
<class 'numpy.int64'>
I can confirm that numpy is using 64-bit indices.
Is there some other part of my program causing overflow? Or is the source of the problem something else entirely?
Any help is greatly appreciated!
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix, bmat
a = coo_matrix(([1], ([int(1e9)], [int(1e9)])))
blocks = [a.copy() for i in range(200)]
blocks = [blocks for i in range(20)]
arr = bmat(blocks, format='coo')
First things first - this is definitely reproducible (I'm using a COO array because I don't want to allocate a 1e11 indptr array).
ValueError: negative row index found
It also doesn't help to convert the a array indices from int32 to int64. In fact, it looks like the problem is entirely internal to the bmat function:
# convert everything to COO format
for i in range(M):
    for j in range(N):
        if blocks[i,j] is not None:
            A = coo_matrix(blocks[i,j])
First, it converts all your blocks to COO matrices. If the row and column indices fit in int32s, it will use int32s (and I assume your indices do). Later on it calculates new row values by adding in an offset (based on where the blocks are). Unfortunately, this is where it overflows:
for i, j in zip(ii, jj):
    B = blocks[i, j]
    ...
    row[idx] = B.row + row_offsets[i]
    col[idx] = B.col + col_offsets[j]
>>> blocks[2, 0].row
array([1000000000], dtype=int32)
>>> blocks[2, 0].row + 2000000002
array([-1294967294], dtype=int32)
Because of that overflow (and because it's in code within bmat that you can't access externally), this is a scipy bug. That said, you can fix it very simply if you copy the scipy bmat function and retype the block index arrays as follows:
for i, j in zip(ii, jj):
    B = blocks[i, j]
    ...
    row[idx] = B.row.astype(idx_dtype) + row_offsets[i]
    col[idx] = B.col.astype(idx_dtype) + col_offsets[j]
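If patching scipy is not an option, another workaround is to assemble the COO triplets yourself with 64-bit offsets. The helper below is only a rough sketch (bmat_coo64 is hypothetical and assumes all blocks share the same shape):
import numpy as np
from scipy.sparse import coo_matrix

def bmat_coo64(blocks, block_shape):
    # blocks: dict mapping (i, j) -> sparse block; block_shape: (rows, cols) of every block
    br, bc = block_shape
    rows, cols, data = [], [], []
    for (i, j), blk in blocks.items():
        blk = blk.tocoo()
        rows.append(blk.row.astype(np.int64) + np.int64(i) * br)  # 64-bit offsets, no overflow
        cols.append(blk.col.astype(np.int64) + np.int64(j) * bc)
        data.append(blk.data)
    n_rows = (1 + max(i for i, _ in blocks)) * br
    n_cols = (1 + max(j for _, j in blocks)) * bc
    return coo_matrix((np.concatenate(data),
                       (np.concatenate(rows), np.concatenate(cols))),
                      shape=(n_rows, n_cols))
For the repro above, blocks would be {(i, j): a for i in range(20) for j in range(200)} and block_shape would be (int(1e9) + 1, int(1e9) + 1).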

sum in python where list of list are in exponential

I would like to write the following summation in Python: y(t) = Σ_n cn * exp(-vn * (t - gn)^2).
The coefficients are given as the following lists:
cn=[1.2,0.4,0.6]
vn=[1e-6,5e-5,1e-6]
gn=[4.5e3,6.5e3,9e3]
t=np.linspace(0,10,100)
I tried the following
import numpy as np
cn=[1.2,0.4,0.6]
vn=[1e-6,5e-5,1e-6]
gn=[4.5e3,6.5e3,9e3]
t=np.linspace(0,10,100)
yt=np.sum(cn *np.exp(-vn*(t-gn)**2))
but I am getting the error
TypeError: bad operand type for unary -: 'list'
I would like to know where I am getting it wrong, or how to do this task.
This runs:
import numpy as np
cn=np.array([1.2,0.4,0.6])
vn=np.array([1e-6,5e-5,1e-6])
gn=np.array([4.5e3,6.5e3,9e3])
t=np.linspace(0,10,3)
yt=np.sum(cn * np.exp(-vn * (t - gn)**2))
Transform the lists into NumPy arrays.
Make sure the array sizes are compatible (i.e. you can't add arrays of different lengths).
Example:
Add int to python list:
cn=[1.2,0.4,0.6]
cn+1
# output: TypeError: can only concatenate list (not "int") to list
Add int to numpy array:
cn=np.array([1.2,0.4,0.6])
cn+1
# output: array([2.2, 1.4, 1.6])
Add numpy arrays with different dimensions:
cn = np.arange(1,3)
t = np.arange(1,100)
cn + t
# output: ValueError: operands could not be broadcast together with shapes (2,) (99,)
Add numpy arrays with the same dimensions:
cn = np.arange(1,3)
t = np.arange(3,5)
cn + t
# output: array([4, 6])
Here is a lazy way of fixing it:
yt=np.sum(cn *np.exp(0-vn*(np.c_[t]-gn)**2), 1)
                     ^     ^------^        ^-^
I've highlighted the changes. The most important change is the np.c_, which does two things:
1. It converts t to an array.
2. It makes t a column vector.
1) serves as a "germ" for converting all the other lists to arrays via the overloaded arithmetic operators.
Exception: the unary - in front of vn hits vn before it gets the chance to become an array. We put a zero in front of the - to make it binary, thereby reducing its precedence and closing the array coercion chain. This is not the most obvious fix, but it is the one involving the least typing.
2) separates the time dimension from the summation dimension, which is likely the correct interpretation. We have to add an explicit axis argument to the sum, which is the 1 inserted at the very end of the expression.
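A spelled-out version of the same idea, keeping the original 100 time points and using t[:, None] in place of np.c_ (a sketch, with the explicit axis argument on the sum):
import numpy as np

cn = np.array([1.2, 0.4, 0.6])
vn = np.array([1e-6, 5e-5, 1e-6])
gn = np.array([4.5e3, 6.5e3, 9e3])
t = np.linspace(0, 10, 100)

# t[:, None] has shape (100, 1); broadcasting against the length-3 coefficient
# arrays gives a (100, 3) array, which is then summed over the n axis
yt = np.sum(cn * np.exp(-vn * (t[:, None] - gn) ** 2), axis=1)
print(yt.shape)  # (100,)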
I found two issues, which I fixed, but I am not sure this is what you intended.
You don't need to convert the lists to NumPy arrays, because arithmetic between an ndarray and a list results in an ndarray.
The two errors found are:
1. the shape of t did not match the other arrays;
2. you were trying to negate a Python list, which doesn't support it.
Also, as you haven't put tn in your mathematical expression of the summation above, I doubt you want the length of t to be 3.
import numpy as np
cn=[1.2,0.4,0.6]
vn=[1e-6,5e-5,1e-6]
gn=[4.5e3,6.5e3,9e3]
t=np.linspace(0,10,3) # the shape of t was 100 and did not match the other arrays
yt=np.sum(cn *np.exp(-(vn*(t-gn)**2))) # vn*(t-gn)**2 is now wrapped in brackets before negating

Make the matrix multiplication operator @ work for scalars in numpy

In Python 3.5, the @ operator was introduced for matrix multiplication, following PEP 465. This is implemented e.g. in NumPy as the matmul operator.
However, as proposed by the PEP, the numpy operator throws an exception when called with a scalar operand:
>>> import numpy as np
>>> np.array([[1,2],[3,4]]) @ np.array([[1,2],[3,4]]) # works
array([[ 7, 10],
[15, 22]])
>>> 1 @ 2 # doesn't work
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: unsupported operand type(s) for @: 'int' and 'int'
This is a real turnoff for me, since I'm implementing numerical signal processing algorithms that should work for both scalars and matrices. The equations for both cases are mathematically exactly equivalent, which is no surprise, since "1-D x 1-D matrix multiplication" is equivalent to scalar multiplication. The current state however forces me to write duplicate code in order to handle both cases correctly.
So, given that the current state is not satisfactory, is there any reasonable way I can make the # operator work for scalars? I thought about adding a custom __matmul__(self, other) method to scalar data types, but this seems like a lot of hassle considering the number of involved internal data types. Could I change the implementation of the __matmul__ method for numpy array data types to not throw an exception for 1x1 array operands?
And, on a sidenote, which is the rationale behind this design decision? Off the top of my head, I cannot think of any compelling reasons not to implement that operator for scalars as well.
As ajcr suggested, you can work around this issue by forcing some minimal dimensionality on the objects being multiplied. There are two reasonable options: atleast_1d and atleast_2d, which have different results in regard to the type returned by @: a scalar versus a 1-by-1 2D array.
x = 3
y = 5
z = np.atleast_1d(x) @ np.atleast_1d(y) # returns 15
z = np.atleast_2d(x) @ np.atleast_2d(y) # returns array([[15]])
However:
Using atleast_2d will lead to an error if x and y are 1-D arrays that would otherwise be multiplied normally.
Using atleast_1d will result in a product that is either a scalar or a matrix, and you don't know which.
Both of these are more verbose than np.dot(x, y) which would handle all of those cases.
Also, the atleast_1d version suffers from the same flaw that would also be shared by having scalar @ scalar = scalar: you don't know what can be done with the output. Will z.T or z.shape throw an error? These work for 1-by-1 matrices but not for scalars. In the setting of Python, one simply cannot ignore the distinction between scalars and 1-by-1 arrays without also giving up all the methods and properties that the latter have.
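If you only need a drop-in at a few call sites, a tiny wrapper is another option; mm below is purely an illustrative sketch (np.dot, as noted above, already covers these cases):
import numpy as np

def mm(x, y):
    # behave like @ for arrays, fall back to plain * when both operands are 0-d scalars
    if np.ndim(x) == 0 and np.ndim(y) == 0:
        return x * y
    return x @ y

print(mm(3, 5))                      # 15
print(mm(np.eye(2), np.arange(2.)))  # [0. 1.]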

numpy: Invalid value encountered in true_divide

I have two numpy arrays and I am trying to divide one by the other; at the same time, I want to make sure that the entries where the divisor is 0 are simply replaced with 0.
So, I do something like:
log_norm_images = np.where(b_0 > 0, np.divide(diff_images, b_0), 0)
This gives me a run time warning of:
RuntimeWarning: invalid value encountered in true_divide
Now, I wanted to see what was going on and I did the following:
xx = np.isfinite(diff_images)
print (xx[xx == False])
xx = np.isfinite(b_0)
print (xx[xx == False])
However, both of these return empty arrays meaning that all the values in the arrays are finite. So, I am not sure where the invalid value is coming from. I am assuming checking b_0 > 0 in the np.where function takes care of the divide by 0.
The shapes of the two arrays are (96, 96, 55, 64) and (96, 96, 55, 1).
You may have a NAN, INF, or NINF floating around somewhere. Try this:
np.isfinite(diff_images).all()
np.isfinite(b_0).all()
If one or both of those returns False, that's likely the cause of the runtime warning.
The reason you get the runtime warning when running this:
log_norm_images = np.where(b_0 > 0, np.divide(diff_images, b_0), 0)
is that the inner expression
np.divide(diff_images, b_0)
gets evaluated first, and is run on all elements of diff_images and b_0 (even though you end up ignoring the elements that involve division-by-zero). In other words, the warning happens before the code that ignores those elements. That is why it's a warning and not an error: there are legitimate cases like this one where the division-by-zero is not a problem because it's being handled in a later operation.
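If you want to skip evaluating the division on the excluded elements altogether, np.divide's out and where arguments do exactly that; a small sketch on toy arrays standing in for diff_images and b_0:
import numpy as np

diff_images = np.array([1.0, 2.0, 3.0])
b_0 = np.array([2.0, 0.0, 4.0])

# divide only where b_0 > 0; other entries keep the value supplied via `out`,
# so the invalid division is never evaluated and no warning is raised
log_norm_images = np.divide(diff_images, b_0,
                            out=np.zeros_like(diff_images),
                            where=b_0 > 0)
print(log_norm_images)  # [0.5  0.   0.75]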
Another useful NumPy function is np.nan_to_num(diff_images).
By default it replaces, in a NumPy array, NaN with zero, -INF with a very large negative number and +INF with a very large positive number.
You can change the defaults; see https://numpy.org/doc/stable/reference/generated/numpy.nan_to_num.html
As @drammock pointed out, the cause of the warning is that some of the values in b_0 are 0, and the runtime warning is generated before the np.where is evaluated. While @Luca's suggestion of running the np.where inside a with np.errstate(invalid='ignore', divide='ignore'): block will prevent the warning in this case, there may be other legitimate cases where this warning could be generated. For instance, if corresponding elements of b_0 and diff_images are both set to np.inf, the division would return np.nan.
So to prevent warnings for known cases (i.e. b_0 = 0) and allow for warnings of unknown cases, evaluate the np.where first then evaluate the arithmetic:
#First, create log_norm_images
log_norm_images = np.zeros(b_0.shape)
#Now get the valid indexes
valid = np.where(b_0 > 0)
#Lastly, evaluate the division problem at the valid indexes
log_norm_images[valid] = np.divide(diff_images[valid], b_0[valid])
num = np.array([1,2,3,4,5])
den = np.array([1,1,0,1,1])
res = np.array([None]*5)
ix = (den!=0)
res[ix] = np.divide( num[ix], den[ix] )
print(res)
[1.0 2.0 None 4.0 5.0]
