numpy.ndindex and array slices - python

I have a multidimensional array, in which I want to get 1D slices, something like mega_array[:, i, j, k, .....]
To do it, I try numpy.ndindex:
for idx in np.ndindex(mega_array.shape[1:]):
    print(mega_array[:, idx])
But alas: this still gives me multidimensional slices, in which every dimension other than the first has length one.
I want to use the slices as l-values, so a simple ravel() is not suitable here.
What should I use to get normal, 1D slices?
UPD: Here's a small example:
in_array = np.asarray([[7, 40], [777, 440]])
for index in np.ndindex(in_array.shape[1:]):
    print("---")
    print(index)
    print(in_array[:, index])  # gives 2D array
UPD: Here's a 3D example:
in_array = np.asarray([[[7, 40, 5], [777, 440, 0]], [[8, 41, 6], [778, 441, 1]]])
print(in_array)
print(in_array.shape)
# print(in_array[:, 0, 2])
for index in np.ndindex(in_array.shape[1:]):
    print(index)
    print(in_array[:, index])  # FAILS
# expected [7, 8], [40, 41], [5, 6], [777, 778] and so on.

You need to add a slice to the index.
In:
in_array = np.asarray([[7, 40], [777, 440]])
for index in np.ndindex(in_array.shape[1:]):
    print("---")
    print(index)
    print(in_array[:, index])  # gives 2D array
Here index takes values like (0,) and (1,), i.e. tuples.
in_array[:, (1,)] is not the same as in_array[:, 1]: the former indexes the second axis with a sequence and keeps that axis, giving a 2D result. To get the latter you need to use in_array[(slice(None), 1)]. The slice must be part of the index tuple. We can do that by concatenating tuples.
in_array = np.asarray([[7, 40], [777, 440]])
for index in np.ndindex(in_array.shape[1:]):
    print("---")
    index = (slice(None),) + index
    print(index)
    print(in_array[index])
printing:
---
(slice(None, None, None), 0)
[ 7 777]
---
(slice(None, None, None), 1)
[ 40 440]
The same adjustment should work in the nD array case.
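As a rough sketch of that nD case, here is the same tuple concatenation applied to the 3D array from the question. The final assignment is only an assumption about how the slices would be used as l-values; the names slices and full_index are illustrative.

```python
import numpy as np

in_array = np.asarray([[[7, 40, 5], [777, 440, 0]],
                       [[8, 41, 6], [778, 441, 1]]])

slices = []
for index in np.ndindex(in_array.shape[1:]):
    # Prepend a full slice so the first axis is kept whole.
    full_index = (slice(None),) + index
    slices.append(in_array[full_index].tolist())

print(slices)

# The same kind of index tuple can be used on the left-hand side.
in_array[(slice(None), 0, 2)] = -1
print(in_array[:, 0, 2])
```

Each entry of slices is a 1D slice along the first axis, and the assignment writes straight into in_array.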

You can use np.dstack, which stacks arrays in sequence depth-wise (along the third axis):
>>> a
array([[[  7,  40,   5],
        [777, 440,   0]],
       [[  8,  41,   6],
        [778, 441,   1]]])
>>> np.dstack(a)
array([[[  7,   8],
        [ 40,  41],
        [  5,   6]],
       [[777, 778],
        [440, 441],
        [  0,   1]]])
Also, depending on your dimensions, you can use other NumPy joining functions: http://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html#joining-arrays

Do you need to transpose the matrix to get the rows?
In [5]: in_array = numpy.asarray([[7, 40], [777, 440]])
In [6]: in_array.transpose()[0]
Out[6]: array([ 7, 777])

After you posted your 3D example, now I see what you wish to do. The following should work for arrays with more than 3 dimensions, too:
Save the number of items in the dimensions after the first:
nitems = np.prod(in_array.shape[1:])
Reshape the array (similar to the np.dstack approach pointed out by Kasra):
new_array = np.reshape(in_array, [in_array.shape[0], nitems])
Loop over the new array:
for i in range(new_array.shape[1]):
    print(new_array[:, i])
For me, that gives the expected output.
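Since the question asks for slices that can be used as l-values, it may be worth checking that the reshaped array is a view of the original rather than a copy. A minimal sketch, assuming in_array is C-contiguous (true for a freshly created array, not guaranteed for arbitrary inputs):

```python
import numpy as np

in_array = np.asarray([[[7, 40, 5], [777, 440, 0]],
                       [[8, 41, 6], [778, 441, 1]]])

# -1 lets NumPy infer the product of the remaining dimensions.
new_array = in_array.reshape(in_array.shape[0], -1)

# True when the reshape did not copy the data.
print(np.shares_memory(new_array, in_array))

# Writing to a column of the view changes the original array.
new_array[:, 2] = 0
print(in_array[:, 0, 2])
```

If np.shares_memory returns False for your input, assignments to new_array will not propagate back to in_array.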

Related

Find index of a row in numpy array

Given m x n numpy array
X = np.array([
[1, 2],
[10, 20],
[100, 200]
])
How can I find the index of a row, i.e. [10, 20] -> 1?
n could be any value (2, 3, ...), so I can also have m x 3 arrays
Y = np.array([
[1, 2, 3],
[10, 20, 30],
[100, 200, 300]
])
So I need to pass a vector of size n (in this case n = 3), e.g. [10, 20, 30], to get its row index 1. Again, n could be any value, like 100 or 1000.
NumPy arrays could be big, so I don't want to convert them to lists to use .index().
In case the array contains duplicates of the row you are looking for, the function below returns all matching indices.
def find_rows(source, target):
    return np.where((source == target).all(axis=1))[0]
looking = [10, 20, 30]
Y = np.array([[1, 2, 3],
[10, 20, 30],
[100, 200, 300],
[10, 20, 30]])
print(find_rows(source=Y, target=looking)) # [1, 3]
You can use numpy.equal, which broadcasts the target vector and compares it against each row of the original array. If all elements of a row are equal to the target, the row is identical to the target:
import numpy as np
np.flatnonzero(np.equal(X, [10, 20]).all(1))
# [1]
np.flatnonzero(np.equal(Y, [10, 20, 30]).all(1))
# [1]
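To see what the broadcasted comparison produces at each step, here is a small sketch using Y from the question. The intermediate names elementwise, row_matches and indices are only illustrative.

```python
import numpy as np

Y = np.array([[1, 2, 3],
              [10, 20, 30],
              [100, 200, 300]])
target = [10, 20, 30]

# Broadcasting compares target against every row, element by element.
elementwise = np.equal(Y, target)  # boolean array with the shape of Y
# A row matches only if all of its elements match.
row_matches = elementwise.all(axis=1)
# Indices of the matching rows.
indices = np.flatnonzero(row_matches)

print(elementwise.shape)
print(row_matches)
print(indices)
```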
You can make a function as follows:
def get_index(seq, *arrays):
    for array in arrays:
        try:
            # require every element of the row to match, not just one
            return np.where((array == seq).all(axis=1))[0][0]
        except IndexError:
            pass
then:
>>> get_index([10, 20, 30], Y)
1
Or with just indexing:
>>> np.where((Y == [10, 20, 30]).all(axis=1))[0]
array([1])

Sum of rows based on index with Numpy

I have a 2D array composed of 2D vectors and a 1D array of indices.
How can I add / sum the rows of the 2D array that share the same index, using numpy?
Example:
arr = np.array([[48, -51], [-15, -55], [26, -49], [-13, -17], [-67, -7], [23, -48], [-29, -64], [37, 68]])
idx = np.array([0, 1, 1, 2, 2, 3, 3, 4])
#desired output
array([[48, -51],
[11, -104],
[-80, -24],
[-6, -112],
[ 37, 68]])
Notice how the original array arr is of shape (8, 2), and the result of the operation is (5, 2).
If the indices are not always grouped, apply np.argsort first:
order = np.argsort(idx)
You can compute where each group starts using np.diff followed by np.flatnonzero. np.diff is nonzero wherever the index changes; prepending a 1 marks position 0 as the start of the first group and shifts the remaining positions by 1:
breaks = np.flatnonzero(np.concatenate(([1], np.diff(idx[order]))))
breaks can now be used as an argument to np.add.reduceat:
result = np.add.reduceat(arr[order, :], breaks, axis=0)
If the indices are already grouped, you don't need to use order at all:
breaks = np.flatnonzero(np.concatenate(([1], np.diff(idx))))
result = np.add.reduceat(arr, breaks, axis=0)
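Putting the ungrouped case together, a minimal end-to-end sketch might look like this. The permutation perm is a made-up shuffle, used here only to simulate indices that are not already grouped.

```python
import numpy as np

arr = np.array([[48, -51], [-15, -55], [26, -49], [-13, -17],
                [-67, -7], [23, -48], [-29, -64], [37, 68]])
idx = np.array([0, 1, 1, 2, 2, 3, 3, 4])

# Hypothetical shuffle of rows and labels to simulate ungrouped indices.
perm = np.array([7, 2, 5, 0, 4, 1, 6, 3])
arr_shuffled, idx_shuffled = arr[perm], idx[perm]

# Sort the labels so that equal labels become adjacent.
order = np.argsort(idx_shuffled, kind="stable")
sorted_idx = idx_shuffled[order]

# Start position of each group in the sorted labels.
breaks = np.flatnonzero(np.concatenate(([1], np.diff(sorted_idx))))

# Sum each run of rows between consecutive break positions.
result = np.add.reduceat(arr_shuffled[order], breaks, axis=0)
print(result)
```

The result has one row per distinct label, in increasing label order, regardless of how the input rows were shuffled.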
You can also use pandas for this (assuming import pandas as pd):
pd.DataFrame(arr).groupby(idx).sum().to_numpy()
Output:
array([[ 48, -51],
[ 11, -104],
[ -80, -24],
[ -6, -112],
[ 37, 68]])

split ndarray into chunks - whilst maintaining order

I am trying to implement a function which can split a 3-dimensional numpy array into 8 pieces, whilst keeping the order intact. Essentially I need the splits to be:
G[:21, :18,:25]
G[21:, :18,:25]
G[21:, 18:,:25]
G[:21, 18:,:25]
G[:21, :18,25:]
G[21:, :18,25:]
G[21:, 18:,25:]
G[:21, 18:,25:]
where the original size of this particular matrix would have been (42, 36, 50). How can I generalise these 8 "slices" so that I do not have to hardcode all of them? Essentially, I want to move the : into every possible position.
Thanks!
You could apply the 1d slice to successive (lists of) dimensions.
With a smaller 3d array
In [147]: X=np.arange(4**3).reshape(4,4,4)
A compound list comprehension produces a nested list. Here I'm using the simplest double split
In [148]: S=[np.split(z,2,0) for y in np.split(X,2,2) for z in np.split(y,2,1)]
In this case, all sublists have the same size, so I can convert it to an array for convenient viewing:
In [149]: SA=np.array(S)
In [150]: SA.shape
Out[150]: (4, 2, 2, 2, 2)
There are your 8 subarrays, but grouped (4,2).
In [153]: SAA = SA.reshape(8,2,2,2)
In [154]: SAA[0]
Out[154]:
array([[[ 0, 1],
[ 4, 5]],
[[16, 17],
[20, 21]]])
In [155]: SAA[1]
Out[155]:
array([[[32, 33],
[36, 37]],
[[48, 49],
[52, 53]]])
Is the order right? I can change it by changing the axis in the 3 split operations.
Another approach is to write your indexing expressions as tuples:
In [156]: x,y,z = 2,2,2 # define the split points
In [157]: ind = [(slice(None,x), slice(None,y), slice(None,z)),
                 (slice(x,None), slice(None,y), slice(None,z)),]
          # and so on
In [158]: S1=[X[i] for i in ind]
In [159]: S1[0]
Out[159]:
array([[[ 0, 1],
[ 4, 5]],
[[16, 17],
[20, 21]]])
In [160]: S1[1]
Out[160]:
array([[[32, 33],
[36, 37]],
[[48, 49],
[52, 53]]])
Looks like the same order I got before.
That ind list of tuples can be produced with some sort of iteration and/or list comprehension. Maybe even using itertools.product or np.mgrid to generate the permutations.
An itertools.product version could look something like
In [220]: def foo(i):
              # return a tuple: recent NumPy versions reject a list of slices as an index
              return tuple((slice(None,x) if j else slice(x,None))
                           for j,x in zip(i,[2,2,2]))
In [221]: SAA = np.array([X[foo(i)] for i in
                          itertools.product(range(2),range(2),range(2))])
In [222]: SAA[-1]
Out[222]:
array([[[ 0, 1],
[ 4, 5]],
[[16, 17],
[20, 21]]])
product iterates the last value fastest, so the list is reversed (compared to your target).
To generate a particular order it may be easier to list the tuples explicitly, e.g.:
In [227]: [X[foo(i)] for i in [(1,1,1),(0,1,1),(0,0,1)]]
Out[227]:
[array([[[ 0, 1],
[ 4, 5]],
[[16, 17],
[20, 21]]]), array([[[32, 33],
[36, 37]],
[[48, 49],
[52, 53]]]), array([[[40, 41],
[44, 45]],
[[56, 57],
[60, 61]]])]
This highlights the fact that there are 2 distinct issues - generating the iteration pattern, and splitting the array based on this pattern.
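A possible generalisation of the itertools.product idea to an array of any shape is sketched below. The function name split_halves and its points argument are hypothetical, and G is filled with np.arange purely as stand-in data for the (42, 36, 50) array in the question.

```python
import itertools
import numpy as np

def split_halves(arr, points):
    """Split arr at one point per axis; returns 2**arr.ndim sub-arrays."""
    pieces = []
    # Each flag picks the lower (0) or upper (1) part along one axis.
    for flags in itertools.product((0, 1), repeat=arr.ndim):
        index = tuple(slice(p, None) if f else slice(None, p)
                      for f, p in zip(flags, points))
        pieces.append(arr[index])
    return pieces

# Stand-in data with the shape mentioned in the question.
G = np.arange(42 * 36 * 50).reshape(42, 36, 50)
pieces = split_halves(G, (21, 18, 25))
print(len(pieces), pieces[0].shape)
```

Note that product varies the last axis fastest, so the pieces do not come out in the order listed in the question; if that exact order matters, iterating over an explicit list of flag tuples is probably simpler.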

2D numpy argsort index returns 3D when used in the original matrix

I am trying to obtain the top 2 values from each row in a matrix using argsort. The indexing is working, as in argsort is returning the correct values. However, when I put the argsort result as an index, it returns a 3 dimensional result.
For example:
test_mat = np.matrix([[0 for i in range(5)] for j in range(5)])
for i in range(5):
    for j in range(5):
        test_mat[i, j] = i * j
test_mat[range(2,3)] = test_mat[range(2,3)] * -1
last_two = range(-1, -3, -1)
index = np.argsort(test_mat, axis=1)
index = index[:, last_k]
This gives:
index.shape
Out[402]: (5L, 5L)
test_mat[index].shape
Out[403]: (5L, 5L, 5L)
Python is new to me and I find indexing to be very confusing in general even after reading the various array manuals. I spend more time trying to get the right values out of objects than actually solving problems. I'd welcome any tips on where to properly learn what is going on. Thanks.
You can use linear indexing to solve your case, like so -
# Say A is your 2D input array
# Get sort indices for the top 2 values in each row
idx = A.argsort(1)[:,::-1][:,:2]
# Get row offset numbers
row_offset = A.shape[1]*np.arange(A.shape[0])[:,None]
# Add row offsets with top2 sort indices giving us linear indices of
# top 2 elements in each row. Index into input array with those for output.
out = np.take( A, idx + row_offset )
Here's a step-by-step sample run -
In [88]: A
Out[88]:
array([[34, 45, 16, 20, 24],
[37, 13, 49, 37, 21],
[42, 36, 35, 24, 18],
[26, 28, 21, 13, 44]])
In [89]: idx = A.argsort(1)[:,::-1][:,:2]
In [90]: idx
Out[90]:
array([[1, 0],
[2, 3],
[0, 1],
[4, 1]])
In [91]: row_offset = A.shape[1]*np.arange(A.shape[0])[:,None]
In [92]: row_offset
Out[92]:
array([[ 0],
[ 5],
[10],
[15]])
In [93]: np.take( A, idx + row_offset )
Out[93]:
array([[45, 34],
[49, 37],
[42, 36],
[44, 28]])
You can directly get the top 2 values from each row with just sorting along the second axis and some slicing, like so -
out = np.sort(A,1)[:,:-3:-1]
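If both the indices and the values are needed, np.take_along_axis (available in NumPy 1.15 and later) is another option that avoids building linear indices by hand. This is a different technique from the one in the answer above, sketched here on the same sample array and assuming a plain ndarray rather than np.matrix:

```python
import numpy as np

A = np.array([[34, 45, 16, 20, 24],
              [37, 13, 49, 37, 21],
              [42, 36, 35, 24, 18],
              [26, 28, 21, 13, 44]])

# Column indices of the two largest values per row, largest first.
idx = np.argsort(A, axis=1)[:, :-3:-1]

# Pick A[i, idx[i, j]] for every row i.
top2 = np.take_along_axis(A, idx, axis=1)
print(top2)
```

For the np.matrix in the question, converting with np.asarray(test_mat) first should make the same approach applicable.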

python iterate over and select values from list of zipped arrays

I have several multidimensional arrays that have been zipped into a single list, and I am trying to remove values from the list according to a selection criterion applied to a single array. Specifically, I have 4 arrays, all of the same shape, that have been zipped into one list of arrays:
in: array1.shape
out: (5,3)
...
in: array4.shape
out: (5,3)
in: array1
out: ([[0, 1, 1],
[0, 0, 1],
[0, 0, 1],
[0, 1, 1],
[0, 0, 0]])
in: array4
out: ([[20, 16, 20],
[15, 19, 17],
[21, 24, 23],
[22, 22, 26],
[27, 24, 23]])
in: fullarray = zip(array1,...,array4)
in: fullarray[0]
out: (array([0, 1, 1]), array([3, 4, 5]), array([33, 34, 35]), array([20, 16, 20]))
I am trying to iterate over the values from a single target array within each set of arrays and select the values from each array with the same index as the target array when the value equals 20. I doubt I explained that clearly so I'll give an example.
in: fullarray[0]
out: (array([0, 1, 1]), array([3, 4, 5]), array([33, 34, 35]), array([20, 16, 20]))
What I want is to iterate over the values of the fourth array in the list for
fullarray[x] and, where the value equals 20, take the value of each array at
the same index and append them to a new list as an array.
So the output for fullarray[0] would be [[0, 3, 33, 20], [1, 5, 35, 20]].
My previous attempts have all generated a variety of error messages (example below). Any help would be appreciated.
in: for i in g:
        for n in i:
            if n == 3:
                for k in n:
                    if k == 0:
                        newlist.append(i[k])
out: for i in fullarray:
2 for n in i:
----> 3 if n == 3:
4 for k in n:
5 if k == 0:
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()
Edit for modified question:
Here's a piece of code that does what you want. The time complexity could probably be improved though.
from numpy import array
fullarray = [(array([0, 1, 1]), array([3, 4, 5]), array([33, 34, 35]), array([20, 16, 20]))]
newlist = []
for arrays in fullarray:
    for idx, value in enumerate(arrays[3]):
        if value == 20:
            # arr avoids shadowing the imported array function
            newlist.append([arr[idx] for arr in arrays])
print(newlist)
Old answer: Assuming all your arrays are the same size you could do the following:
full[idx] contains a tuple with the values of all your arrays at the index idx in the order you zipped them.
import numpy as np
ar1 = np.array([1] * 8)
ar2 = np.array([2] * 8)
full = list(zip(ar1, ar2))  # list() so that full[idx] also works on Python 3
print(full)
newlist = []
for idx, v in enumerate(ar1):
    if v != 0:
        newlist.append(full[idx])  # Here you get a tuple such as (ar1[idx], ar2[idx])
But if len(ar1) > len(ar2), it is going to throw an exception, so keep that in mind and adjust your code accordingly.
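For larger inputs, the inner Python loop can probably be replaced by a boolean mask. The following is only a sketch of that idea, not the method from the answers above; it uses one group from the question, and the names group, stacked, mask and selected are illustrative.

```python
import numpy as np

# One group from the question: four arrays of the same length.
group = (np.array([0, 1, 1]), np.array([3, 4, 5]),
         np.array([33, 34, 35]), np.array([20, 16, 20]))

# Stack into shape (4, 3): one row per array, one column per position.
stacked = np.stack(group)

# True at the positions where the fourth array equals 20.
mask = stacked[3] == 20

# Keep the matching columns, then transpose so each row is one selection.
selected = stacked[:, mask].T
print(selected)
```

This relies on all arrays in a group having the same length and a compatible dtype, as np.stack requires.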
