location of array of values in numpy array - python

Here is a small code to illustrate the problem.
A = array([[1,2], [1,0], [5,3]])
f_of_A = f(A) # this is precomputed and expensive
values = array([[1,2], [1,0]])
# location of values in A
# if I just had 1d values I could use numpy.in1d here
indices = array([0, 1])
# example of operation type I need (recalculating f_of_A as needed is not an option)
f_of_A[ indices ]
So, basically I think I need some equivalent to in1d for higher dimensions. Does such a thing exist? Or is there some other approach?
Looks like there is also a searchsorted() function, but that seems to work only for 1d arrays as well. In this example I used 2d points, but any solution would need to work for 3d points also.

Okay, this is what I came up with.
To find the location of one multi-dimensional value (one row), let's say ii = np.array([1,2]), we can do:
np.where((A == ii).all(axis=1))[0]
Let's break this down, we have A == ii, which will give element-wise comparisons with ii for each row of A. We want an entire row to be true, so we add .all(axis=1) to collapse them. To find where these indices happen, we plug this into np.where and get the first value of the tuple.
Now, I don't have a fast way to do this with multiple indices yet (although I have a feeling there is one). However, this will get the job done:
np.hstack([np.where((A == v).all(axis=1))[0] for v in values])
This basically just calls the above, for each value of values, and concatenates the result.
Update:
Here is for the multi-dimensional case (all in one go, should be fairly fast):
np.where((np.expand_dims(A, -1) == values.T).all(axis=1).any(axis=1))[0]
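Note that the one-liner above reports which rows of A occur anywhere in values, in the order of A. If you instead want, for each row of values, its position in A, a minimal sketch along the same broadcasting lines could look like this (the names matches, found and indices are mine, and it assumes the full len(values) x len(A) comparison fits in memory):

```python
import numpy as np

A = np.array([[1, 2], [1, 0], [5, 3]])
values = np.array([[1, 2], [1, 0]])

# Compare every row of values with every row of A.
# Intermediate shape: (len(values), len(A), n_columns)
matches = (values[:, None, :] == A[None, :, :]).all(axis=2)

# True where a row of values was found somewhere in A
found = matches.any(axis=1)

# Index of the first matching row in A for each row of values
# (only meaningful where found is True)
indices = matches.argmax(axis=1)

print(indices)
print(found)
```

Rows of values that do not occur in A get index 0 from argmax, so check found before using indices.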

You can use np.in1d over a view of your original array with all coordinates collapsed into a single variable of dtype np.void:
import numpy as np
A = np.array([[1,2], [1,0], [5,3]])
values = np.array([[1,2], [1,0]])
# Make sure both arrays are contiguous and have common dtype
common_dtype = np.common_type(A, values)
a = np.ascontiguousarray(A, dtype=common_dtype)
vals = np.ascontiguousarray(values, dtype=common_dtype)
a_view = a.view((np.void, a.dtype.itemsize * a.shape[1])).ravel()
values_view = vals.view((np.void, vals.dtype.itemsize * vals.shape[1])).ravel()
Now each item of a_view and values_view is all coordinates for one point packed together, so you can do whatever 1D magic you would use. I don't see how to use np.in1d to find indices though, so I would go the np.searchsorted route:
sort_idx = np.argsort(a_view)
locations = np.searchsorted(a_view, values_view, sorter=sort_idx)
locations = sort_idx[locations]
>>> locations
array([0, 1], dtype=int64)
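If the byte-level view feels too fragile, a plainly different technique is a dictionary keyed on row tuples. This is only a sketch, not the approach described above: it loops in Python, assumes the rows are hashable once converted to tuples, and uses -1 as a hypothetical "not found" marker.

```python
import numpy as np

A = np.array([[1, 2], [1, 0], [5, 3]])
values = np.array([[1, 2], [1, 0], [7, 7]])

# Map each distinct row of A to the index of its first occurrence
row_to_index = {}
for i, row in enumerate(A.tolist()):
    row_to_index.setdefault(tuple(row), i)

# Look up every row of values; -1 marks rows that are not in A
locations = np.array([row_to_index.get(tuple(v), -1) for v in values.tolist()])

print(locations)
```

Building the dictionary costs one pass over A, after which each lookup is a constant-time hash access.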

Related

Access elements of a Matrix by a list of indices in Python to apply a max(val, 0.5) to each value without a for loop

I know how to access elements in a vector by indices doing:
test = numpy.array([1,2,3,4,5,6])
indices = list([1,3,5])
print(test[indices])
which gives the correct answer : [2 4 6]
But I am trying to do the same thing using a 2D matrix, something like:
currentGrid = numpy.array( [[0, 0.1],
[0.9, 0.9],
[0.1, 0.1]])
indices = list([(0,0),(1,1)])
print(currentGrid[indices])
I expected this to display "[0.0 0.9]", the values at (0,0) and (1,1) in the matrix, but instead it displays "[ 0.1 0.1]". Also, if I try to use 3 indices with:
indices = list([(0,0),(1,1),(0,2)])
I now get the following error:
Traceback (most recent call last):
File "main.py", line 43, in <module>
print(currentGrid[indices])
IndexError: too many indices for array
I ultimately need to apply a simple max() operation on all the elements at these indices and need the fastest way to do that for optimization purposes.
What am I doing wrong ? How can I access specific elements in a matrix to do some operation on them in a very efficient way (not using list comprehension nor a loop).
The problem is the arrangement of the indices you're passing to the array. If your array is two-dimensional, your indices must be two lists, one containing the vertical indices and the other one the horizontal ones. For instance:
idx_i, idx_j = zip(*[(0, 0), (1, 1), (0, 2)])
print(currentGrid[idx_j, idx_i])
# [0.0, 0.9, 0.1]
Note that the first index of a 2D array is the row and the second is the column, i.e. (y, x). I assume you defined your pairs as (x, y), which is why they are swapped above; indexing as currentGrid[idx_i, idx_j] would raise an IndexError for the pair (0, 2).
There are already some great answers to your problem. Here just a quick and dirty solution for your particular code:
for i in indices:
    print(currentGrid[i[0],i[1]])
Edit:
If you do not want to use a for loop you need to do the following:
Assume you have 3 values of your 2D-matrix (with the dimensions x1 and x2) that you want to access. The values have the "coordinates" (indices) V1(x11|x21), V2(x12|x22), V3(x13|x23). Then, for each dimension of your matrix (2 in your case) you need to create a list with the indices for this dimension of your points. In this example, you would create one list with the x1 indices: [x11,x12,x13] and one list with the x2 indices of your points: [x21,x22,x23]. Then you combine these lists and use them as index for the matrix:
indices = ([x11,x12,x13],[x21,x22,x23])
or, in the style you wrote it:
indices = tuple([(x11,x12,x13),(x21,x22,x23)])
Now with the points that you used ((0,0),(1,1),(2,0)) - please note you need to use (2,0) instead of (0,2), because it would be out of range otherwise:
indices = tuple([(0,1,2),(0,1,0)])
print(currentGrid[indices])
This will give you 0, 0.9, 0.1. Note that the index must be a tuple: newer NumPy versions treat a plain list of lists as a single array index rather than as one index list per dimension. On this result you can then apply the max() command if you like (just to consider your whole question):
maxValue = max(currentGrid[indices])
Edit2:
Here an example how you can transform your original index list to get it into the correct shape:
originalIndices = [(0,0),(1,1),(2,0)]
x1 = []
x2 = []
for i in originalIndices:
    x1.append(i[0])
    x2.append(i[1])
# a tuple of index lists, one per dimension
newIndices = (x1, x2)
print(currentGrid[newIndices])
Edit3:
I don't know if you can apply max(x,0.5) to a numpy array without using a loop. But you could use Pandas instead. You can cast your list into a pandas Series and then apply a lambda function:
import pandas as pd
maxValues = pd.Series(currentGrid[newIndices]).apply(lambda x: max(x,0.5))
This will give you a pandas Series containing 0.5, 0.9, 0.5, which you can simply cast back to a list with maxValues = list(maxValues).
Just one note: in the background you will always have some kind of loop running, also with this command, so I doubt that you will get much better performance this way. If you really want to boost performance, use a for loop together with numba (you simply need to add a decorator to your function) and execute it in parallel. Or you can use the multiprocessing library and its Pool class. Just to give you some inspiration.
Edit4:
By accident I came across a page today that shows how to do exactly what you want with NumPy. The solution to your problem (considering the newIndices vector from my Edit2) is:
maxfunction = numpy.vectorize(lambda i: max(i,0.5))
print(maxfunction(currentGrid[newIndices]))
2D indices have to be accessed like this (with indices as a NumPy array of shape (k, 2)):
print(currentGrid[indices[:,0], indices[:,1]])
The row indices and the column indices have to be passed separately.
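For the max(val, 0.5) part of the question, NumPy's element-wise np.maximum avoids both the loop and np.vectorize. A minimal sketch, assuming the (row, column) pairs (0,0), (1,1), (2,0) used above and that writing the result back into the grid is what is wanted:

```python
import numpy as np

currentGrid = np.array([[0, 0.1],
                        [0.9, 0.9],
                        [0.1, 0.1]])

# (row, column) pairs as an array of shape (k, 2)
indices = np.array([(0, 0), (1, 1), (2, 0)])
rows, cols = indices[:, 0], indices[:, 1]

# Pick the selected elements in one vectorized step
selected = currentGrid[rows, cols]

# Element-wise max(val, 0.5) without a Python-level loop
clipped = np.maximum(selected, 0.5)

# Optionally write the result back into the grid
currentGrid[rows, cols] = clipped

print(clipped)
```

The same row and column arrays are reused for reading and for writing, so the grid is only touched at the listed positions.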

Python compute a specific inner product on vectors

Assume two arrays of shape m x 6 and n x 6:
import numpy as np
a = np.random.random((m,6))
b = np.random.random((n,6))
using np.inner works as expected and yields
np.inner(a,b).shape
(m,n)
with every element being the scalar product of each combination. I now want to compute a special inner product (namely Plucker). Right now I'm using
def pluckerSide(a,b):
    a0,a1,a2,a3,a4,a5 = a
    b0,b1,b2,b3,b4,b5 = b
    return a0*b4+a1*b5+a2*b3+a4*b0+a5*b1+a3*b2
with a and b sliced in a for loop, which is way too slow. All my attempts at vectorizing fail, mostly with broadcast errors due to wrong shapes. I can't get np.vectorize to work either.
Maybe someone can help here?
The function pluckerSide multiplies elements of the two inputs pairwise according to a fixed index pattern and then sums the products. So, I would list out those indices, index into the arrays with them, and finally use matrix multiplication with np.dot to perform the sum-reduction.
Thus, one approach would be like this -
a_idx = np.array([0,1,2,4,5,3])
b_idx = np.array([4,5,3,0,1,2])
out = a[a_idx].dot(b[b_idx])
If you are doing this in a loop across all rows of a and b and thus generating an output array of shape (m,n), we can vectorize that, like so -
out_all = a[:,a_idx].dot(b[:,b_idx].T)
To make things a bit easier, we can re-arrange a_idx such that it becomes range(6) and re-arrange b_idx with that pattern. So, we would have :
a_idx = np.array([0,1,2,3,4,5])
b_idx = np.array([4,5,3,2,0,1])
Thus, we can skip indexing into a and the solution would be simply -
a.dot(b[:,b_idx].T)
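To check that the reordered matrix product really matches the original function, here is a small sketch that compares both on random data (m = 4, n = 3 and the seeded generator are arbitrary choices for illustration):

```python
import numpy as np

def pluckerSide(a, b):
    # Original element-wise definition from the question
    a0, a1, a2, a3, a4, a5 = a
    b0, b1, b2, b3, b4, b5 = b
    return a0*b4 + a1*b5 + a2*b3 + a4*b0 + a5*b1 + a3*b2

rng = np.random.default_rng(0)
m, n = 4, 3
a = rng.random((m, 6))
b = rng.random((n, 6))

# Column of b that pairs with columns 0..5 of a
b_idx = np.array([4, 5, 3, 2, 0, 1])

# Vectorized: all m x n Plucker products in one matrix multiplication
vectorized = a.dot(b[:, b_idx].T)

# Reference: the slow double loop
looped = np.array([[pluckerSide(ai, bj) for bj in b] for ai in a])

print(np.allclose(vectorized, looped))
```

The output has shape (m, n), just like np.inner(a, b).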

Python: return the row index of the minimum in a matrix

I wanna print the index of the row containing the minimum element of the matrix
my matrix is matrix = [[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]]
and the code
matrix = [[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]]
a = np.array(matrix)
buff_min = matrix.argmin(axis = 0)
print(buff_min) #index of the row containing the minimum element
min = np.array(matrix[buff_min])
print(str(min.min(axis=0))) #print the minium of that row
print(min.argmin(axis = 0)) #index of the minimum
print(matrix[buff_min]) # print all row containing the minimum
after running, my result is
1
3
1
[22, 3, 4, 12]
the first number should be 2, because the minimum is 2 in the third list ([34,6,4,5,8,2]), but it returns 1. It returns 3 as minimum of the matrix.
What's the error?
I am not sure which version of Python you are using; I tested it with Python 2.7 and 3.2. As mentioned, your syntax for argmin is not correct: argmin belongs to NumPy, not to Python lists, so it should be in the format
import numpy as np
np.argmin(array_name,axis)
Next, while NumPy knows about arrays of arbitrary objects, it's optimized for homogeneous arrays of numbers with fixed dimensions. If you really need arrays of arrays, better use a nested list. But depending on the intended use of your data, different data structures might be even better, e.g. a masked array if you have some invalid data points.
If you really want flexible Numpy arrays, use something like this:
np.array([[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]], dtype=object)
However this will create a one-dimensional array that stores references to lists, which means that you will lose most of the benefits of Numpy (vector processing, locality, slicing, etc.).
Also, if you can resize your rows so that the NumPy array becomes rectangular, things might work. I haven't tested it, but conceptually that should be an easy solution. Still, I would prefer a nested list for this kind of input matrix.
Does this work?
np.where(a == a.min())[0][0]
Note that all rows of the matrix need to contain the same number of elements.
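Both situations can be sketched briefly. For the ragged nested list from the question, plain Python is enough; for a rectangular array, argmin plus np.unravel_index gives row and column at once. The rectangular array rect below is a made-up example, not the asker's data:

```python
import numpy as np

# Ragged nested list from the question: stay in plain Python
matrix = [[22, 33, 44, 55], [22, 3, 4, 12], [34, 6, 4, 5, 8, 2]]
row_index = min(range(len(matrix)), key=lambda i: min(matrix[i]))
print(row_index, matrix[row_index])

# Rectangular data (hypothetical): argmin works on the flattened array,
# unravel_index converts the flat position back to (row, column)
rect = np.array([[22, 33, 44, 55],
                 [22, 3, 4, 12],
                 [34, 6, 4, 2]])
row, col = np.unravel_index(rect.argmin(), rect.shape)
print(row, col)
```

If the minimum occurs more than once, both variants report the first occurrence.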

Numpy/Python: Array iteration without for-loop

So it's another n-dimensional array question:
I want to be able to compare each value in an n-dimensional arrays with its neighbours. For example if a is the array which is 2-dimensional i want to be able to check:
a[y][x]==a[y+1][x]
for all elements. So basically check all neighbours in all dimensions. Right now I'm doing it via:
for x in range(1,a.shape[0]-1):
    do.something(a[x])
The shape of the array is used, so that I don't run into an index out of range at the edges. So if I want to do something like this in n-D for all elements in the array, I do need n for-loops which seems to be untidy. Is there a way to do so via slicing? Something like a==a[:,-1,:] or am I understanding this fully wrong? And is there a way to tell a slice to stop at the end? Or would there be another idea of getting things to work in a totally other way? Masked arrays?
Greets Joni
Something like:
a = np.array([1,2,3,4,4,5])
a == np.roll(a,1)
which returns
array([False, False, False, False, True, False], dtype=bool)
You can specify an axis too for higher dimensions, though as others have said you'll need to handle the edges somehow as the values wrap around (as you can guess from the name)
For a fuller example in 2D:
# generate 2d data
a = np.array((np.random.rand(5,5)) * 10, dtype=np.uint8)
# check all neighbours
for ax in range(len(a.shape)):
    for i in [-1,1]:
        print(a == np.roll(a, i, axis=ax))
This might also be useful, this will compare each element to the following element, along axis=1. You can obviously adjust the axis or the distance. The trick is to make sure that both sides of the == operator have the same shape.
a[:, :-1, :] == a[:, 1:, :]
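The slicing idea can be generalized to any number of dimensions by building the slices programmatically. A minimal sketch (the helper name equal_to_next is made up for illustration); unlike np.roll it does not wrap around, so the result is one element shorter along the compared axis:

```python
import numpy as np

def equal_to_next(a, axis):
    """Compare each element with its successor along `axis` (no wrap-around)."""
    head = [slice(None)] * a.ndim
    tail = [slice(None)] * a.ndim
    head[axis] = slice(None, -1)  # everything except the last element
    tail[axis] = slice(1, None)   # everything except the first element
    return a[tuple(head)] == a[tuple(tail)]

a = np.array([[1, 1, 2],
              [1, 3, 2]])

# One comparison per axis, however many dimensions a has
for ax in range(a.ndim):
    print(ax, equal_to_next(a, ax))
```

Both slices have the same shape by construction, which is what the == operator needs.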
How about just:
np.diff(a) != 0
?
If you need the neighbours along another axis, pass the axis argument (e.g. np.diff(a, axis=0)), or diff the result of np.swapaxes(a, 0, 1), and merge the results together somehow?

Select cells randomly from NumPy array - without replacement

I'm writing some modelling routines in NumPy that need to select cells randomly from a NumPy array and do some processing on them. All cells must be selected without replacement (as in, once a cell has been selected it can't be selected again, but all cells must be selected by the end).
I'm transitioning from IDL where I can find a nice way to do this, but I assume that NumPy has a nice way to do this too. What would you suggest?
Update: I should have stated that I'm trying to do this on 2D arrays, and therefore get a set of 2D indices back.
How about using numpy.random.shuffle or numpy.random.permutation if you still need the original array?
If you need to change the array in-place then you can create an index array like this:
your_array = <some numpy array>
index_array = numpy.arange(your_array.size)
numpy.random.shuffle(index_array)
print(your_array[index_array[:10]])
All of these answers seemed a little convoluted to me.
I'm assuming that you have a multi-dimensional array from which you want to generate an exhaustive list of indices. You'd like these indices shuffled so you can then access each of the array elements in a random order.
The following code will do this in a simple and straight-forward manner:
#!/usr/bin/python
import numpy as np
#Define a two-dimensional array
#Use any number of dimensions, and dimensions of any size
d=np.zeros(30).reshape((5,6))
#Get a list of indices for an array of this shape
indices=list(np.ndindex(d.shape))
#Shuffle the indices in-place
np.random.shuffle(indices)
#Access array elements using the indices to do cool stuff
for i in indices:
    d[i]=5
print(d)
Printing d verified that all elements have been accessed.
Note that the array can have any number of dimensions and that the dimensions can be of any size.
The only downside to this approach is that if d is large, then indices may become pretty sizable. Therefore, it would be nice to have a generator. Sadly, I can't think of how to build a shuffled iterator offhand.
Extending the nice answer from @WoLpH
For a 2D array I think it will depend on what you want or need to know about the indices.
You could do something like this:
data = np.arange(25).reshape((5,5))
x, y = np.where(data == data)
idx = list(zip(x,y))
np.random.shuffle(idx)
OR
data = np.arange(25).reshape((5,5))
grid = np.indices(data.shape)
idx = list(zip( grid[0].ravel(), grid[1].ravel() ))
np.random.shuffle(idx)
You can then use the list idx to iterate over randomly ordered 2D array indices as you wish, and to get the values at that index out of the data which remains unchanged.
Note: You could also generate the randomly ordered indices via itertools.product too, in case you are more comfortable with this set of tools.
Use random.sample to generate ints in 0 .. A.size with no duplicates,
then split them into index pairs:
import random
import numpy as np
def randint2_nodup( nsample, A ):
    """ uniform int pairs, no dups:
        r = randint2_nodup( nsample, A )
        A[r]
        for jk in zip(*r):
            ... A[jk]
    """
    assert A.ndim == 2
    sample = np.array( random.sample( range( A.size ), nsample )) # nodup ints
    return sample // A.shape[1], sample % A.shape[1] # pairs
if __name__ == "__main__":
    import sys
    nsample = 8
    ncol = 5
    exec( "\n".join( sys.argv[1:] )) # run this.py N= ...
    A = np.arange( 0, 2*ncol ).reshape((2,ncol))
    r = randint2_nodup( nsample, A )
    print("r:", r)
    print("A[r]:", A[r])
    for jk in zip(*r):
        print(jk, A[jk])
Let's say you have an array of data points of size 8x3
data = np.arange(50,74).reshape(8,-1)
If you truly want to sample, as you say, all the indices as 2d pairs, the most compact way to do this that i can think of, is:
#generate a permutation of data's size, coerced to data's shape
idxs = divmod(np.random.permutation(data.size),data.shape[1])
#iterate over it
for x,y in zip(*idxs):
    #do something to data[x,y] here
    pass
More generally, though, one often does not need to access 2d arrays as 2d arrays simply to shuffle them, in which case one can be yet more compact. Just make a 1d view onto the array and save yourself some index-wrangling.
flat_data = data.ravel()
flat_idxs = np.random.permutation(flat_data.size)
for i in flat_idxs:
    #do something to flat_data[i] here
    pass
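Because ravel() returns a view here (the array is contiguous), changes made through flat_data still show up in the 2d "original" array, as you'd like. To see this, try:
flat_data[12] = 1000000
print(data[4,0])
#returns 1000000
People using NumPy version 1.7 or later can also use the built-in function numpy.random.choice (with replace=False to sample without replacement).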
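As a sketch of how these pieces fit together for the 2D case, one can permute the flat positions and convert them back to (row, column) pairs with np.unravel_index. The seeded Generator and the 8x3 example array are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded only to make the sketch repeatable
data = np.arange(50, 74).reshape(8, 3)

# Every flat position exactly once, in random order
flat_order = rng.permutation(data.size)

# Convert flat positions to 2D (row, column) indices
rows, cols = np.unravel_index(flat_order, data.shape)

# Visit each cell exactly once, in random order
visited = data[rows, cols]

# To draw only k cells without replacement, choice does the same job
k = 5
subset = rng.choice(data.size, size=k, replace=False)
sub_rows, sub_cols = np.unravel_index(subset, data.shape)

print(list(zip(rows[:3], cols[:3])))
```

np.unravel_index works for any number of dimensions, so the same lines cover 3D arrays as well.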
