I have a problem using multi-dimensional vectors as indices for multi-dimensional vectors. Say I have C.ndim == idx.shape[0], then I want C[idx] to give me a single element. Allow me to explain with a simple example:
A = arange(0,10)
B = 10+A
C = array([A.T, B.T])
C = C.T
idx = array([3,1])
Now, C[3] gives me the row at index 3, and C[1] gives me the row at index 1. C[idx] then gives me a vstack of both rows. However, I need to get C[3,1]. How would I achieve that given arrays C, idx?
/edit:
An answer suggested tuple(idx). This works perfectly for a single idx. But:
Let's take it to the next level: say INDICES is a vector where I have stacked vertically arrays of shape idx. tuple(INDICES) will give me one long tuple, so C[tuple(INDICES)] won't work. Is there a clean way of doing this or will I need to iterate over the rows?
If you convert idx to a tuple, it'll be interpreted as basic and not advanced indexing:
>>> C[3,1]
13
>>> C[tuple(idx)]
13
For the vector case:
>>> idx
array([[3, 1],
[7, 0]])
>>> C[3,1], C[7,0]
(13, 7)
>>> C[tuple(idx.T)]
array([13, 7])
>>> C[idx[:,0], idx[:,1]]
array([13, 7])
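To see the difference side by side, here is a minimal self-contained sketch. It rebuilds C from the question; the names rows, single and picked are only illustrative, not from the original posts.

```python
import numpy as np

# Rebuild C from the question: column 0 is 0..9, column 1 is 10..19.
A = np.arange(0, 10)
C = np.array([A, 10 + A]).T

idx = np.array([3, 1])
rows = C[idx]            # advanced indexing on axis 0: rows 3 and 1, shape (2, 2)
single = C[tuple(idx)]   # same as C[3, 1]: one element

# Several index pairs stacked vertically, one pair per row.
INDICES = np.array([[3, 1], [7, 0]])
picked = C[tuple(INDICES.T)]  # same as C[[3, 7], [1, 0]]

print(rows.shape, single, picked)
```

Transposing first means the tuple holds one index array per axis of C, which is what NumPy expects for element-wise lookups.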
I have a NumPy array of tuples like np.array([(0,1), (2,5),...]).
Now I want to search for the index of a certain value. But I just know the left side of the tuple. The approach I have found to get the index of a value (if you have both) is the following:
x = np.array(list(map(lambda x: x== (2, 5), groups)))
print(np.where(x))
But how can I search if I only know x==(2,) but not the right number?
As stated in https://numpy.org/doc/stable/reference/generated/numpy.where.html#numpy.where, it is preferred to use np.nonzero directly. I would also recommend reading up on NumPy's use of boolean masking. The answer to your question is this:
import numpy as np
i = 0 # column to search (0 = left element of each pair)
j = 2 # number we're looking for
# x is the 2-D array of pairs here (called groups in the question), not the boolean result
mask = x[:,i] == j # Generate a boolean mask for this column and this comparison
indices = np.nonzero(mask) # Find indices where x==(2,_)
print(indices)
In NumPy it's generally preferred to avoid loops like the one you use above. Instead, you should use vectorized operations. So, try to avoid the list(map()) construction you used here.
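As a rough, self-contained sketch of that masking idea: the sample data below is made up for illustration and only stands in for the question's array, and matches is a name chosen here.

```python
import numpy as np

# Sample data standing in for the question's array of pairs (an assumption).
groups = np.array([(0, 1), (2, 5), (3, 2), (2, 9)])

mask = groups[:, 0] == 2          # True where the left element equals 2
matches = np.flatnonzero(mask)    # 1-D array of matching row indices
print(matches)                    # row indices whose left element is 2
print(groups[mask])               # the matching pairs themselves
```

np.flatnonzero returns a plain 1-D index array, which is often handier than the tuple that np.nonzero gives back for a 1-D mask.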
It might just be easier than you think.
Example:
# Create a sample array.
a = np.array([(0, 1), (2, 5), (3, 2), (4, 6)])
# Use slicing to select column 0 of all rows, and find the first row where it equals 2.
(a[:,0] == 2).argmax()
>>> 1
# Test the returned index against the array to verify.
a[1]
>>> array([2, 5])
Another way to look at the array is shown below, and will help put the concept of rows/columns into perspective for the mentioned array:
>>> a
array([[0, 1],
[2, 5],
[3, 2],
[4, 6]])
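One caveat with argmax worth sketching: on a mask with no True entries it still returns 0, which looks like a valid hit. The helper first_index below is hypothetical (not part of NumPy) and the -1 sentinel is just one possible convention.

```python
import numpy as np

a = np.array([(0, 1), (2, 5), (3, 2), (4, 6)])

def first_index(arr, value):
    """Return the first row index whose column 0 equals value, or -1 if none."""
    mask = arr[:, 0] == value
    # argmax on an all-False mask returns 0, so check that a match exists.
    return int(mask.argmax()) if mask.any() else -1

found = first_index(a, 2)
missing = first_index(a, 99)
print(found, missing)
```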
Example of what I want to do:
import numpy as np
values = np.array([7, 7, 5, 2, 3, 9])
indices = np.array([
    np.array([3,5]),
    np.array([4]),
    np.array([1,2,3])
], dtype=object)  # ragged sub-arrays need dtype=object in recent NumPy versions
>>> values[indices]
array([
array([2,9]),
array([3]),
array([7,5,2]),
])
Is it possible to achieve this using vectorization?
Right now I'm doing it with a for loop, but it can get slow.
Thanks!
We could concatenate the indices, index into values with those and finally split back -
idx = np.concatenate(indices)
all_out = values[idx]
lens = list(map(len,indices))
ssidx = np.r_[0,lens].cumsum()
out = [all_out[i:j] for (i,j) in zip(ssidx[:-1],ssidx[1:])]
For completeness, here's the straightforward indexing-based version -
[values[i] for i in indices]
So, the proposed method indexes once and then only slices inside the loop, which reduces the per-iteration workload. It does add a step to build idx by concatenating all the indices, so it makes the most sense when indices holds many small indexing arrays.
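A compact variant of the same concatenate-then-split idea, sketched with np.split. Here indices is kept as a plain Python list of arrays, and the names flat and cuts are made up for this example.

```python
import numpy as np

values = np.array([7, 7, 5, 2, 3, 9])
indices = [np.array([3, 5]), np.array([4]), np.array([1, 2, 3])]

flat = values[np.concatenate(indices)]            # one fancy-indexing call
cuts = np.cumsum([len(i) for i in indices])[:-1]  # split points between groups
out = np.split(flat, cuts)                        # list of 1-D arrays

print([o.tolist() for o in out])
```

The result is a list of arrays rather than a single array, since the groups have different lengths.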
Let's say that I have n numpy arrays of the same length. I would now like to create a numpy matrix such that each column of the matrix is one of the numpy arrays. How can I achieve this? Right now I'm doing this in a loop and it produces the wrong results.
Note: I have to be able to stack them next to each other one by one iteratively.
My code looks like the following. Assume that get_array is a function that returns a certain array based on its argument. I don't know until after the loop how many columns I'm going to have.
matrix = np.empty((n_rows,))
for item in sorted_arrays:
    array = get_array(item)
    matrix = np.vstack((matrix,array))
any help would be appreciated
You could try putting all your arrays (or lists) into a matrix and then transposing it. This will work if all arrays are the same length.
mymatrix = np.asmatrix((array1, array2, array3)) #... putting arrays into matrix.
mymatrix = mymatrix.transpose()
This should output a matrix with each array as a column. Hope this helps.
Time and again, we recommend collecting the arrays in a list, and making the final array with one call. That's more efficient, and usually easier to get right.
alist = []
for item in sorted_arrays:
    alist.append(get_array(item))
or
alist = [get_array(item) for item in sorted_arrays]
There are various ways of assembling the list. Since you want columns, and assuming get_array produces equal sized 1d arrays:
arr = np.column_stack(alist)
Collecting them in rows and transposing that works too:
arr = np.array(alist).T
arr = np.vstack(alist).T
arr = np.stack(alist).T
arr = np.stack(alist, axis=1)
If the arrays are already 2d:
arr = np.concatenate(alist, axis=1)
All the stack variations use concatenate, just varying in how they tweak the shape(s) of the input arrays. The key to using concatenate is to understand the dimensions and shapes, and how to add dimensions as needed. You should, sooner or later, become fluent in that kind of coding.
If they vary in shape or dimensions, things get messier.
Equally good is to put the arrays in a pre-allocated array. But you need to know the desired final shape:
arr = np.zeros((m,n), dtype)
for i, item in enumerate(sorted_arrays):
    arr[:,i] = get_array(item)
n is len(sorted_arrays), and m is the length of one of get_array(item). You also need to know the expected dtype (int, float etc).
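A small sketch that puts the list-based and pre-allocated approaches next to each other. The arrays in alist are made-up stand-ins for whatever get_array would return, and the variable names are illustrative only.

```python
import numpy as np

# Stand-ins for the question's get_array results (illustrative data only).
alist = [np.array([1, 2, 3, 4]), np.array([5, 6, 7, 8]), np.array([2, 3, 4, 5])]

by_column_stack = np.column_stack(alist)
by_stack = np.stack(alist, axis=1)

# Pre-allocated version: m rows (array length), n columns (number of arrays).
m, n = len(alist[0]), len(alist)
prealloc = np.zeros((m, n), dtype=alist[0].dtype)
for i, col in enumerate(alist):
    prealloc[:, i] = col

print(by_column_stack)
```

All three give the same (m, n) array; the list-based calls have the advantage that the number of columns does not need to be known before the loop.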
If a, b, c and d are NumPy arrays of the same length, the following code will accomplish what you want:
out_matrix = np.vstack([a, b, c, d]).transpose()
An example:
In [3]: a = np.array([1, 2, 3, 4])
In [4]: b = np.array([5, 6, 7, 8])
In [5]: c = np.array([2, 3, 4, 5])
In [6]: d = np.array([6, 8, 2, 4])
In [10]: np.vstack([a, b, c, d]).transpose()
Out[10]:
array([[1, 5, 2, 6],
[2, 6, 3, 8],
[3, 7, 4, 2],
[4, 8, 5, 4]])
I have two arrays, one is a matrix of index pairs,
a = array([[[0,0],[1,1]],[[2,0],[2,1]]], dtype=int)
and another which is a matrix of data to access at these indices
b = array([[1,2,3],[4,5,6],[7,8,9]])
and I want to be able to use the indices of a to get the entries of b. Just doing:
>>> b[a]
does not work, as it gives one row of b for each entry in a, i.e.
array([[[[1,2,3],
[1,2,3]],
[[4,5,6],
[4,5,6]]],
[[[7,8,9],
[1,2,3]],
[[7,8,9],
[4,5,6]]]])
when I would like to use the index pair in the last axis of a to give the two indices of b:
array([[1,5],[7,8]])
Is there a clean way of doing this, or do I need to reshape b and combine the columns of a in a corresponding manner?
In my actual problem a has about 5 million entries, and b is 100-by-100, I'd like to avoid for loops.
Actually, this works:
b[a[:, :, 0],a[:, :, 1]]
Gives array([[1, 5],
[7, 8]]).
For this case, this works
tmp = a.reshape(-1,2)
b[tmp[:,0], tmp[:,1]]
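If a could have any number of leading dimensions, one possible generalisation (a sketch, not something from the answers above) is to move the index-pair axis to the front and unpack it. Note also that the reshape route returns a flat result, which can be reshaped back; the names out, flat and restored are illustrative.

```python
import numpy as np

a = np.array([[[0, 0], [1, 1]], [[2, 0], [2, 1]]])
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Move the last axis (the index pair) to the front, then unpack it into
# one index array per axis of b.
out = b[tuple(np.moveaxis(a, -1, 0))]
print(out)

# The reshape route returns a flat result; reshape it back to a's leading shape.
tmp = a.reshape(-1, 2)
flat = b[tmp[:, 0], tmp[:, 1]]
restored = flat.reshape(a.shape[:-1])
```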
A more general solution, whenever you want to use a 2D array of indices of shape (n,m) with arbitrary large dimension m, named inds, in order to access elements of another 2D array of shape (n,k), named B:
# start of each row of B in the flattened array, to be added to each row of inds
offset = np.arange(0, B.size, B.shape[1])
# with no axis argument, numpy.take(B, C) indexes into the flattened B; the result has the shape of C
Result = np.take(B, offset[:,np.newaxis]+inds)
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.
You can test this by using, e.g.:
B = 1/(np.arange(n*m).reshape(n,-1) + 1)
inds = np.random.randint(0,B.shape[1],(B.shape[0],B.shape[1]))
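For a quick check that these row-wise lookups agree, here is a sketch that also tries np.take_along_axis (available in NumPy 1.15 and later). The sizes n, k, m and the seed are arbitrary choices for the example; the offsets are computed from B, so inds may have a different number of columns than B.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 4, 5, 3                      # B is (n, k); inds is (n, m)
B = 1 / (np.arange(n * k).reshape(n, k) + 1)
inds = rng.integers(0, k, size=(n, m))

# Start of each row of B in its flattened form.
offset = np.arange(0, B.size, B.shape[1])
via_take = np.take(B, offset[:, np.newaxis] + inds)
via_index = B[np.arange(n)[:, np.newaxis], inds]
via_along = np.take_along_axis(B, inds, axis=1)

print(np.array_equal(via_take, via_index), np.array_equal(via_index, via_along))
```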
Say that I have 4 numpy arrays
[1,2,3]
[2,3,1]
[3,2,1]
[1,3,2]
In this case, I've determined [1,2,3] is the "minimum array" for my purposes, as it is one of two arrays with the lowest value at index 0, and of those two arrays it has the lowest value at index 1. If there were more arrays with similar values, I would need to compare the next index values, and so on.
How can I extract the array [1,2,3] in that same order from the pile?
How can I extend that to x arrays of size n?
Thanks
Using the python non-numpy .sort() or sorted() on a list of lists (not numpy arrays) automatically does this e.g.
a = [[1,2,3],[2,3,1],[3,2,1],[1,3,2]]
a.sort()
gives
[[1,2,3],[1,3,2],[2,3,1],[3,2,1]]
The numpy sort seems to only sort the subarrays recursively so it seems the best way would be to convert it to a python list first. Assuming you have an array of arrays you want to pick the minimum of you could get the minimum as
sorted(a.tolist())[0]
As someone pointed out you could also do min(a.tolist()) which uses the same type of comparisons as sort, and would be faster for large arrays (linear vs n log n asymptotic run time).
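A tiny sketch of the min() route, assuming a is a 2-D NumPy array as in the question; smallest is just a name picked here.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 1], [3, 2, 1], [1, 3, 2]])

# Python compares lists lexicographically, so min() finds the smallest row.
smallest = np.array(min(a.tolist()))
print(smallest)
```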
Here's an idea using numpy:
import numpy
a = numpy.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
col = 0
# Keep only the rows that hold the smallest value in the current column
while a.shape[0] > 1 and col < a.shape[1]:
    a = a[a[:,col] == a[:,col].min()]
    col += 1
print(a)
This checks column by column until only one row is left.
numpy's lexsort is close to what you want. It sorts on the last key first, but that's easy to get around:
>>> a = np.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
>>> order = np.lexsort(a[:, ::-1].T)
>>> order
array([0, 3, 1, 2])
>>> a[order]
array([[1, 2, 3],
[1, 3, 2],
[2, 3, 1],
[3, 2, 1]])
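If only the minimum row is needed rather than the full sorted array, the first entry of the lexsort order is enough. A short sketch along those lines, with min_row as an illustrative name:

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 1], [3, 2, 1], [1, 3, 2]])

# lexsort treats the LAST key as primary, so feed the columns in reverse order.
order = np.lexsort(a[:, ::-1].T)
min_row = a[order[0]]
print(min_row)
```

This stays entirely in NumPy, at the cost of a full sort where a single pass would do.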