How can we extract the rows of a matrix given a batch of indices (in Python)?
i = [[0,1],[1,2],[2,3]]
a = jnp.array([[1,2,3,4],[2,3,4,5]])
def extract(A,idx):
    A = A[:,idx]
    return A
B = extract(a,i)
I expect to get this result (where the matrices are stacked):
B = [[[1,2],
      [2,3]],
     [[2,3],
      [3,4]],
     [[3,4],
      [4,5]]]
And NOT:
B_ = [[[1, 2],
       [2, 3],
       [3, 4]],
      [[2, 3],
       [3, 4],
       [4, 5]]]
In this case, the rows are stacked, but I want to stack the different matrices.
I tried using
jax.vmap(extract)(a,i),
but this gives me an error since a and i don't have the same dimensions. Is there an alternative that avoids loops?
You can do this with vmap if you specify in_axes in the right way, and convert your index list into an index array:
jax.vmap(extract, in_axes=(None, 0))(a, jnp.array(i))
# DeviceArray([[[1, 2],
# [2, 3]],
#
# [[2, 3],
# [3, 4]],
#
# [[3, 4],
# [4, 5]]], dtype=int32)
When you say in_axes=(None, 0), it specifies that you want the first argument to be unmapped, and you want the second argument to be mapped along its leading axis.
You need to convert i from a list to an array because JAX only maps over array arguments: if vmap encounters a collection like a list, tuple, dict, or a general pytree, it attempts to map over each array-like value within the collection.
You can also index the transposed matrix a directly:
a.T[i,:]
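One caveat worth checking, sketched below under the assumption that the index list is first converted to a jnp array: indexing the transpose puts the selected columns of a on the second-to-last axis, so in general it matches the vmap result only after swapping the last two axes. For the a in the question the two agree because every 2x2 block happens to be symmetric. The matrix m is a hypothetical example chosen to show the difference.

```python
import jax
import jax.numpy as jnp

def extract(A, idx):
    return A[:, idx]

i = jnp.array([[0, 1], [1, 2], [2, 3]])

# Matrix from the question: every extracted 2x2 block is symmetric.
a = jnp.array([[1, 2, 3, 4], [2, 3, 4, 5]])
a_vmap = jax.vmap(extract, in_axes=(None, 0))(a, i)
a_indexed = a.T[i]

# Hypothetical asymmetric matrix: the two approaches now differ.
m = jnp.array([[1, 2, 3, 4], [10, 20, 30, 40]])
m_vmap = jax.vmap(extract, in_axes=(None, 0))(m, i)
m_indexed = m.T[i]

# Swapping the last two axes makes the indexed result match vmap in general.
m_indexed_fixed = jnp.swapaxes(m_indexed, -1, -2)
```

If the orientation of each block matters for your data, the swapaxes form (or the vmap version) is the safer choice.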
I want to return an array of the elements whose indexes are specified in another array.
>>> a = [[[1,2],[3,4]],[[5,6],[7,8]]]
>>> ar = np.array(a)
>>> p=[0,0]
>>> pr = np.array(p)
>>> ar[:,pr[:]]
array([[[1, 2],
[1, 2]],
[[5, 6],
[5, 6]]])
I understand why I got this output: it returned the 0th element of each item along the 0th (outermost) dimension of ar, twice, since p=[0,0]. What I want instead is to pair each entry of pr with the matching item along the 0th dimension of ar.
So, above should return
[[1,2],[5,6]]
So here, we used p[0] (which is 0) to index inside ar[0], returning [1,2], and p[1] (which is also 0) to index inside ar[1], returning [5,6].
Similarly I want following outputs:
For p=[0,1], I want [[1,2],[7,8]]
For p=[1,0], I want [[3,4],[5,6]]
How can I do this?
You want something like:
>>> ar[np.indices(ar.shape[:1]), pr]
array([[[1, 2],
[5, 6]]])
Note, np.indices simply returns an array of indices of a given shape, in this case:
>>> np.indices(ar.shape[:1])
array([[0, 1]])
If you don't want the extra leading dimension, squeeze it:
>>> ar[np.indices(ar.shape[:1]), pr].squeeze()
array([[1, 2],
[5, 6]])
Or, more simply, use arange:
>>> ar[np.arange(ar.shape[0]), pr]
array([[1, 2],
[5, 6]])
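As a quick check of the arange approach against the other cases in the question, here is a small sketch. The helper name pick_per_block is made up for illustration and is not part of NumPy.

```python
import numpy as np

ar = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

def pick_per_block(arr, idx):
    """Select arr[k, idx[k]] for every k along the first axis."""
    idx = np.asarray(idx)
    # Pair position k along axis 0 with idx[k] along axis 1.
    return arr[np.arange(arr.shape[0]), idx]

result_00 = pick_per_block(ar, [0, 0])
result_01 = pick_per_block(ar, [0, 1])
result_10 = pick_per_block(ar, [1, 0])
```

The two index arrays have the same length, so NumPy pairs them element by element instead of forming every combination.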
I want to write a function that takes a numpy array and I want to check if it meets the requirements. One thing that confuses me is that:
np.array([1,2,3]).shape == np.array([[1,2,3],[2,3],[2,43,32]]).shape == (3,)
[1,2,3] should be allowed, while [[1,2,3],[2,3],[2,43,32]] shouldn't.
Allowed shapes:
[0, 1, 2, 3, 4]
[0, 1, 2]
[[1],[2]]
[[1, 2], [2, 3], [3, 4]]
Not Allowed:
[] (empty array is not allowed)
[[0], [1, 2]] (inner dimensions must have the same size, 1 != 2)
[[[4,5,6],[4,3,2]],[[2,3,2],[2,3,4]]] (more than 2 dimensions)
You should start with defining what you want in terms of shape. I tried to understand it from the question, please add more details if it is not correct.
So here we have (1) empty array is not allowed and (2) no more than two dimensions. It translates the following way:
def is_allowed(arr):
    return arr.shape != (0, ) and len(arr.shape) <= 2
The first condition compares your array's shape with the shape of an empty array. The second condition checks that the array has no more than two dimensions.
Inner dimensions are a problem. Some of the lists you give as examples cannot be turned into regular NumPy arrays. In older NumPy versions, np.array([[1,2,3],[2,3],[2,43,32]]) gives you a 1-D array in which each element is a Python list (NumPy 1.24 and later raise a ValueError instead unless you pass dtype=object). It is not a "real" NumPy array with direct access to all the elements. See example:
>>> np.array([[1,2,3],[2,3],[2,43,32]])
array([list([1, 2, 3]), list([2, 3]), list([2, 43, 32])], dtype=object)
>>> np.array([[1,2,3],[2,3, None],[2,43,32]])
array([[1, 2, 3],
[2, 3, None],
[2, 43, 32]], dtype=object)
So I would recommend (if you are operating on ordinary lists) checking that all inner lists have the same length without NumPy.
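A minimal sketch of such a check on plain Python lists follows. It is one reading of the rules in the question, not a definitive validator: the function name is_allowed_list is hypothetical, and it assumes inputs are built from lists and scalars only (no tuples or arrays).

```python
def is_allowed_list(data):
    """True for a non-empty flat list, or a non-empty rectangular list of lists."""
    if not isinstance(data, list) or len(data) == 0:
        return False  # empty input is not allowed
    row_flags = [isinstance(row, list) for row in data]
    if not any(row_flags):
        return True  # flat list of scalars, i.e. one dimension
    if not all(row_flags):
        return False  # mixture of scalars and lists
    if any(isinstance(item, list) for row in data for item in row):
        return False  # more than two dimensions
    lengths = {len(row) for row in data}
    # All rows must share one length, and that length must not be zero.
    return len(lengths) == 1 and 0 not in lengths
```

Once a list passes this check, np.array(data) should produce a regular 1-D or 2-D array.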
I have a large NumPy array which I want to fill with new data on each iteration of a loop. The array is filled with data repeated along axis 0, for example:
[[1, 5],
[1, 5],
[1, 5],
[1, 5]]
I know how to create this array from scratch in each iteration:
x = np.repeat([[1, 5]], 4, axis=0)
However, I don't want to create a new array every time, because it's a very large array (much larger than 4x2). Instead, I want to create the array in advance using the above code, and then just fill the array with new data on each iteration.
But np.repeat() returns a new array, rather than acting on an existing array. Is there an equivalent of np.repeat() for filling an existing array?
As we noted in comments, you can use a broadcasting assignment to fill your 2d array with a 1d array-like of the appropriate size:
x[...] = [1, 5]
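To illustrate, here is a small sketch of the preallocate-then-fill pattern. The 4x2 shape and the row values are placeholders for the real data; the address check is only there to show that no new buffer is created.

```python
import numpy as np

# Allocate the array once, outside the loop.
x = np.empty((4, 2), dtype=np.int64)
buffer_address = x.ctypes.data

for row in ([1, 5], [2, 7], [3, 9]):
    # Broadcasting assignment: the 1d row is copied into every row of x in place.
    x[...] = row
    # The underlying buffer is the same one on every iteration.
    assert x.ctypes.data == buffer_address
```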
If by any chance your large array always contains the same items in each row (i.e. you won't change these preset values later), you can almost certainly use broadcasting in later parts of your code and just work with an initial x such as
x = np.array([[1, 5]])
This array has shape (1, 2) which is broadcast-compatible with other arrays of shape (4, 2) you might have in the above example.
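As a rough illustration of that point, the sketch below combines the (1, 2) array with a (4, 2) array and compares the result with the explicitly repeated version. The array named other is a made-up stand-in for whatever the rest of the code works with.

```python
import numpy as np

x = np.array([[1, 5]])                # shape (1, 2), a single stored row
other = np.arange(8).reshape(4, 2)    # hypothetical array of shape (4, 2)

# Broadcasting stretches x along axis 0 without materializing 4 rows.
total = other + x

# Same computation with the explicitly repeated array, for comparison.
explicit = other + np.repeat(x, 4, axis=0)
```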
If you always need the same values in each row and for some reason you can't use broadcasting (both cases are highly unlikely), you can use broadcast_to to create an array with an explicit 2d shape without copying memory:
x_bc = np.broadcast_to([1, 5], (4, 2)) # broadcast 1d [1, 5] to shape (4, 2)
This might work because it has the right shape with only 2 unique elements in memory:
>>> x_bc
array([[1, 5],
[1, 5],
[1, 5],
[1, 5]])
>>> x_bc.strides
(0, 8)
However you can't mutate it, because it's a read-only view:
>>> x_bc[0, :] = [2, 4]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-35-ae12ecfe3c5e> in <module>
----> 1 x_bc[0, :] = [2, 4]
ValueError: assignment destination is read-only
So, if you only need the same values in each row and you can't use broadcasting and you want to mutate those same rows later, you can use stride tricks to map the same 1d data to a 2d array:
>>> x_in = np.array([1, 5])
>>> x_strided = np.lib.stride_tricks.as_strided(x_in, shape=(4,) + x_in.shape,
...                                             strides=(0,) + x_in.strides[-1:])
>>> x_strided
array([[1, 5],
[1, 5],
[1, 5],
[1, 5]])
>>> x_strided[0, :] = [2, 4]
>>> x_strided
array([[2, 4],
[2, 4],
[2, 4],
[2, 4]])
This gives you a 2d array of fixed shape that always contains one unique row, and mutating any of the rows mutates the rest (since the underlying data corresponds to only a single row). Handle with care, because if you ever want to have two different rows you'll have to do something else.
For example I have an array:
[[[[1 2][3 4]]][[[1 2][3 4]]]]
How would I set 4 equal to 1? I used
array[-1][-1][-1][-1] = array[0][0][0][0]
but I got an error because of it later on. Is there a more general way of doing this?
You can "cheat" by updating the flattened array:
a = np.array([[[1,2],[3,4]],[[1,2],[3,4]]])
a.flat[-1] = a.flat[0]
a
array([[[1, 2],
[3, 4]],
[[1, 2],
[3, 1]]])
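Another option, sketched here as an alternative rather than as part of the answer above, is to build the index tuples from the number of dimensions, so the same two lines work for any depth of nesting. The names first and last are just illustrative.

```python
import numpy as np

a = np.array([[[1, 2], [3, 4]], [[1, 2], [3, 4]]])

first = (0,) * a.ndim    # (0, 0, 0): index of the very first element
last = (-1,) * a.ndim    # (-1, -1, -1): index of the very last element

# A tuple index addresses a single element directly, in place.
a[last] = a[first]
```

Indexing with a single tuple avoids the chained a[-1][-1][-1][-1] form.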
I have two numpy arrays acting as lower and upper boundaries of a range of vectors that I want to generate.
In a similar way to how arange() works, I would like to generate the intermediate members as in the example:
lower_boundary = np.array([1,1])
upper_boundary = np.array([3,3])
expected_result = [[1,1], [1,2], [1,3], [2,1], [2,2], [2,3], [3,1], [3,2], [3,3]]
The result can be a list or another numpy array. So far I have managed to work around this scenario with nested loops, but the dimensions of 'lower_boundary' and 'upper_boundary' may vary, so my approach does not generalize.
In a typical scenario, both boundaries have at least 4 dimensions.
You can use np.indices to get the index values for your desired range (upper_boundary - lower_boundary + 1), reshape it to your needs (reshape(len(upper_boundary),-1)), and add lower_boundary to the values, resulting in:
>>> np.indices(upper_boundary - lower_boundary + 1).reshape(len(upper_boundary),-1).T + lower_boundary
array([[1, 1],
[1, 2],
[1, 3],
[2, 1],
[2, 2],
[2, 3],
[3, 1],
[3, 2],
[3, 3]])
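Since the boundaries may have any length, it can help to wrap the expression in a function and check it against itertools.product. The sketch below does that; the name vector_range is invented for illustration, and inclusive integer boundaries are assumed.

```python
import itertools
import numpy as np

def vector_range(lower, upper):
    """All integer vectors v with lower <= v <= upper, elementwise."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    # Number of values along each dimension, as a plain tuple of ints.
    extent = tuple(int(n) for n in upper - lower + 1)
    grid = np.indices(extent)                 # shape (ndim, *extent)
    # One row per grid point, then shift by the lower boundary.
    return grid.reshape(len(lower), -1).T + lower

result_2d = vector_range([1, 1], [3, 3])

# Same idea with three components, compared against itertools.product.
result_3d = vector_range([0, 1, 2], [1, 2, 2])
expected_3d = list(itertools.product(range(0, 2), range(1, 3), range(2, 3)))
```

The output has one row per combination, so its size grows as the product of the per-dimension extents.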
Edit: I forgot to correct the code before posting, it should be like this.
Thanks @Divakar for the fix.