I want to slice an array so that I can use it to perform an operation with another array of arbitrary dimension. In other words, I am doing the following:
A = np.random.rand(5)
B = np.random.rand(5,2,3,4)
slicer = tuple([slice(None)] + [None]*(len(B.shape)-1))
result = B*A[slicer]
Is there some syntax that I can use so that I do not have to construct slicer?
In this specific case you can use np.einsum with an ellipsis.
result2 = np.einsum('i,i...->i...', A, B)
np.allclose(result, result2)
# True
Although, as @hpaulj points out, this only works for multiplication (or division if you use 1/B).
Since broadcasting normally aligns shapes from the trailing end, you can use np.transpose twice to get the axes in the right order.
result3 = np.transpose(np.transpose(B) * A)
But that's also not a fully general solution.
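If the goal is simply to avoid building slicer by hand, a fully general alternative (a sketch, not from the answers above) is to append the required number of length-1 axes to A with reshape; the result then broadcasts against B for any operation, not just multiplication:
import numpy as np
A = np.random.rand(5)
B = np.random.rand(5, 2, 3, 4)
# Append (B.ndim - 1) length-1 axes so A's shape becomes (5, 1, 1, 1)
A_expanded = A.reshape(A.shape + (1,) * (B.ndim - 1))
result = B * A_expanded   # addition, subtraction, etc. work the same way
In recent NumPy versions np.expand_dims also accepts a tuple of axes, e.g. np.expand_dims(A, axis=tuple(range(1, B.ndim))), which achieves the same thing.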
I have code that generates a boolean array that acts as a mask on numpy arrays, along the lines of:
def func():
    a = numpy.arange(10)
    mask = a % 2 == 0
    return a[mask]
Now, I need to separate this into a case where the mask is created, and one where it is not created and all values are used instead. This could be achieved as follows:
def func(use_mask):
    a = numpy.arange(10)
    if use_mask:
        mask = a % 2 == 0
    else:
        mask = numpy.ones(10, dtype=bool)
    return a[mask]
However, this becomes extremely wasteful for large arrays, since an equally large boolean array must first be created.
My question is thus: Is there something I can pass as an "index" to recreate the behavior of such an everywhere-true array?
Systematically changing occurrences of a[mask] to something else involving some indexing magic etc. is a valid solution, but just avoiding the masking entirely via an expanded case distinction or something else that changes the structure of the code is not desired, as it would impair readability and maintainability (see next paragraph).
For the sake of completeness, here's what I'm currently considering doing, though this makes the code messier and less streamlined since it expands the if/else beyond where it technically needs to be (in reality, the mask is used more than once, hence every occurrence would need to be contained within the case distinction; I used f1 and f2 as examples here):
def func(use_mask):
    a = numpy.arange(10)
    if use_mask:
        mask = a % 2 == 0
        r = f1(a[mask])
        q = f2(a[mask], r)
        return q
    else:
        r = f1(a)
        q = f2(a, r)
        return q
Recall that a[:] selects all of a (even if a is multidimensional). We cannot store the bare : in the mask variable, but we can store the equivalent slice object:
def func(use_mask):
    a = numpy.arange(10)
    if use_mask:
        mask = a % 2 == 0
    else:
        mask = slice(None)
    return a[mask]
This does not use any memory to create the index array. I'm not sure what the CPU usage of the a[slice(None)] operation is, though.
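For what it's worth, a quick check (a sketch) shows that a[slice(None)] is a basic slice and therefore returns a view, so no data is copied and no index array is allocated:
import numpy
a = numpy.arange(10)
full = a[slice(None)]          # same as a[:]
print(full.base is a)          # True: a view of a, nothing copied
masked = a[a % 2 == 0]
print(masked.base is a)        # False: boolean indexing copies the selected data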
I would like to slice a numpy array to obtain the i-th index in the last dimension. For a 3D array, this would be:
slice = myarray[:, :, i]
But I am writing a function where I can take an array of arbitrary dimensions, so for a 4D array I'd need myarray[:, :, :, i], and so on. Is there a way I can obtain this slice for any array without explicitly having to write the array dimensions?
There is ... or Ellipsis, which does exactly this:
slice = myarray[..., i]
Ellipsis is the Python object to use, should you want it outside the square-bracket notation.
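For example (a small illustration with made-up shapes):
import numpy as np
myarray = np.random.rand(2, 3, 4)
i = 1
print(np.array_equal(myarray[..., i], myarray[Ellipsis, i]))  # True: ... and Ellipsis are the same object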
Actually, just found the answer. As stated in numpy's documentation this can be done with the slice object. In my particular case, this would do it:
idx = tuple([slice(None)] * (myarray.ndim - 1) + [i])  # must be a tuple; indexing with a list of slices is no longer allowed
my_slice = myarray[idx]
The slice(None) is equivalent to choosing all elements in that index, and the last [i] selects a specific index for the last dimension.
In terms of slicing an arbitrary dimension, the previous excellent answers can be extended to:
indx = [slice(None)]*myarray.ndim
indx[slice_dim] = i
sliced = myarray[tuple(indx)]  # convert to a tuple before indexing
This returns the slice along any dimension slice_dim; slice_dim = -1 reproduces the previous answers.
For completeness - the first two lines of the above listing can be condensed to:
indx = [slice(None)]*(slice_dim) + [i] + [slice(None)]*(myarray.ndim-slice_dim-1)
though I find the previous version more readable.
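As a quick sanity check (a sketch with made-up shapes), the construction gives the same result as writing the slice by hand:
import numpy as np
myarray = np.random.rand(2, 3, 4)
i, slice_dim = 1, 1
indx = [slice(None)] * myarray.ndim
indx[slice_dim] = i
sliced = myarray[tuple(indx)]
print(np.array_equal(sliced, myarray[:, i, :]))  # True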
If I have a.shape = (3,4,5) and b.shape = (3,5), using np.einsum() makes broadcasting then multiplying the two arrays super easy and explicit:
result = np.einsum('abc, ac -> abc', a, b)
But if I want to add the two arrays, so far as I can tell, I need two separate steps so that the broadcasting happens properly, and the code feels less explicit.
b = np.expand_dims(b, 1)
result = a + b
Is there way out there that allows me to do this array addition with the clarity of np.einsum()?
Broadcasting aligns shapes starting from the trailing dimensions, so the missing middle axis has to be inserted explicitly. For adding these two arrays, that can be done in a one-liner as follows:
import numpy as np
a = np.random.rand(3,4,5); b = np.random.rand(3,5);
c = a + b[:, None, :] # c is shape of a, broadcasting occurs along 2nd dimension
Note this is no different from c = a + np.expand_dims(b, 1). In terms of clarity it is a personal style thing: I prefer broadcasting with None, others prefer einsum.
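For example, the two spellings are interchangeable (a quick check):
import numpy as np
a = np.random.rand(3, 4, 5)
b = np.random.rand(3, 5)
c1 = a + b[:, None, :]
c2 = a + np.expand_dims(b, 1)
print(np.array_equal(c1, c2))  # True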
I found this task in a book by my professor:
def f(z):
    return log(1 + exp(z))
def problem(M: List):
    return np.array([f(x) for x in M])
How do I implement a solution?
Numpy is all about performing operations on entire arrays. Your professor is expecting you to use that functionality.
Start by converting your list M into array z:
z = np.array(M)
Now you can do elementwise operations like exp and log:
e = np.exp(z)
f = 1 + e
g = np.log(f)
The functions np.exp and np.log are applied to each element of an array. If the input is not an array, it will be converted into one.
Operations like 1 + e work on an entire array as well, in this case using the magic of broadcasting. Since 1 is a scalar, it can be unambiguously expanded to the same shape as e and added as if by np.add.
Normally, this sequence of operations can be collapsed into a single line, similar to your initial attempt. You can reduce the number of operations slightly by using np.log1p:
def f(x):
    return np.log1p(np.exp(x))
Notice that I did not convert x to an array first since np.exp will do that for you.
A fundamental problem with this naive approach is that np.exp overflows for moderately large inputs, even though the final result is perfectly representable. This can be solved using the technique in this answer:
def f(x):
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0)
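As a quick usage check (a sketch), the stable version handles inputs that would overflow the naive one:
import numpy as np
def f(x):
    # the numerically stable version from above
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0)
x = np.array([-1000.0, 0.0, 1000.0])
print(f(x))  # approximately [0., 0.6931, 1000.]; the naive np.exp(x) would overflow at x = 1000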
I calculated the elements with a double for loop as follows.
N, l = 20, 10
a = np.random.rand(N, l)
b = np.random.rand(N, l)
r = np.zeros((N, N, l))
for i in range(N):
    for j in range(N):
        r[i, j] = a[i] * a[j] * (b[i] - b[j]) - a[i] / a[j]
Question:
How can I vectorize this calculation with broadcasting?
I also want to require the index i not equal to j, i.e. leave the diagonal elements as zero. Can that also be done with vectorization?
You can broadcast all of the arithmetic and remove the loops.
r2 = (a[:,None]*a) * (b[:,None]-b) - (a[:,None]/a)
# Verify the correctness
np.array_equal(r, r2)
# True
Finally, to set the diagonal entries to zero, use in-place index assignment:
r2[(np.arange(N),)*2] = 0
(numpy.fill_diagonal also fills a diagonal in-place, but for arrays with more than two dimensions it requires all dimensions to be equal, so it cannot be used on this (N, N, l) result.)
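Putting it together (a sketch), comparing a loop that skips i == j against the vectorized version with the diagonal zeroed afterwards:
import numpy as np
N, l = 20, 10
a = np.random.rand(N, l)
b = np.random.rand(N, l)
# Loop version that leaves the diagonal (i == j) as zero
r = np.zeros((N, N, l))
for i in range(N):
    for j in range(N):
        if i != j:
            r[i, j] = a[i] * a[j] * (b[i] - b[j]) - a[i] / a[j]
# Vectorized version: compute everything, then zero the diagonal in-place
r2 = (a[:, None] * a) * (b[:, None] - b) - (a[:, None] / a)
r2[np.arange(N), np.arange(N)] = 0
print(np.array_equal(r, r2))  # True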