NumPy/Python array - python

I created a NumPy array,
a = numpy.array([[1,2,3][4,5,6]])
I want to make the array look like this [[1,4],[2,5],[3,6]] and also after I make change I want to return to the original structure.
Is there a NumPy command to run a function on all values, like a[0] * 2?
The result should be
[[2,8][2,5][3,6]

You want to transpose the array (think matrices). Numpy arrays have a method for that:
a = np.array([[1,2,3],[4,5,6]])
b = a.T # or a.transpose()
But note that b is now a view of a; if you change b, a changes as well (this saves memory and time otherwise spent copying).
You can change the first column of b with
b[0] *= 2
Which gives you the result you want, but a has also changed! If you don't want that, use
b = a.T.copy()
If you do want to change a, note that you can also immediately change the values you want in a itself:
a[:, 0] *= 2

You can use zip on the ndarray and pass it to numpy.array:
In [36]: a = np.array([[1,2,3], [4,5,6]])
In [37]: b = np.array(zip(*a))
In [38]: b
Out[38]:
array([[1, 4],
[2, 5],
[3, 6]])
In [39]: b*2
Out[39]:
array([[ 2, 8],
[ 4, 10],
[ 6, 12]])
Use numpy.column_stack for a pure NumPy solution:
In [44]: b = np.column_stack(a)
In [45]: b
Out[45]:
array([[1, 4],
[2, 5],
[3, 6]])
In [46]: b*2
Out[46]:
array([[ 2, 8],
[ 4, 10],
[ 6, 12]])

Related

Add elements to 2-D array

I have the following Arrays
import numpy as np
A = np.array([[1,2], [3,4]])
B = np.array([5,6])
Now I want to add the second element of A to B using the append function:
B = np.append(B, A[1])
What I want to get is:
B = np.array([[5, 6],[3,4]])
But what I get is:
B = np.array([5, 6, 3, 4])
How can I solve this? Should I use another function instead of numpy.append?
import numpy as np
A = np.array([[1,2], [3,4]])
B = np.array([[5,6]])
B = np.append(B, [A[1]], axis=0)
print(B)
Output
array([[5, 6],
[3, 4]])
This would be one of the way using np.append() specifying the axis = 0.
You can use np.stack for this instead of np.append:
>>> np.stack([B, A[1]])
array([[5, 6],
[3, 4]])
You could also add [:, None] to your two arrays and append with axis=1:
>>> np.append(B[:, None], A[1][:, None], axis=1)
array([[5, 3],
[6, 4]])

Numpy slice to indices

Let's say I have a 3x3 matrix. The 1D indices of this matrix are:
0 1 2
3 4 5
6 7 8
Is there a function that receives a slice and returns the 1D indices, wathever the dimension? Something like:
m = np.ones((3, 3))
id1 = some_function(m, (1, :)) # [3, 4, 5]
id2 = some_function(m, (:, 1)) # [1, 4, 7]
# Use the indices together
m[id1 + id2] = wathever
m[~(id1 + id2)] = wathever else
I don't want to code it because I'm sure it exists somewhere in numpy! For those who wonder why I want that, it's because I want to merge several slices together, use not (~) on the indices, etc.
ravel_multi_index returns the 1d equivalent of n-d indexing tuple:
In [208]: np.ravel_multi_index(([1],[0,1,2]),(3,3))
Out[208]: array([3, 4, 5], dtype=int32)
In [209]: np.ravel_multi_index(([0,1,2],[1]),(3,3))
Out[209]: array([1, 4, 7], dtype=int32)
For more complex indexing we may need to use ix_ to get index broadcasting right:
In [214]: np.ravel_multi_index((np.ix_([0,1,2],[1,2])),(3,3))
Out[214]:
array([[1, 2],
[4, 5],
[7, 8]], dtype=int32)
Now we just need to turn [1,:] in to that tuple. Something in indexing_tricks should do that.
In [222]: np.ravel_multi_index((np.ix_(np.r_[0:3],[1,2])),(3,3))
Out[222]:
array([[1, 2],
[4, 5],
[7, 8]], dtype=int32)
In [223]: np.ravel_multi_index((np.ix_([1],np.r_[0:3])),(3,3))
Out[223]: array([[3, 4, 5]], dtype=int32)
In a more general case we'd want to use m.shape instead of (3,3).
~ works on boolean masks, not indices. So to 'delete' the [1] element from a array, we can do:
In [225]: mask = np.ones((3,),bool)
In [226]: mask[1] = False # index to delete
In [227]: np.arange(3)[mask]
Out[227]: array([0, 2])
This is essentially what np.delete does.

Numpy Vectorization While Indexing Two Arrays

I'm trying to vectorize the following function using numpy and am completely lost.
A = ndarray: Z x 3
B = ndarray: Z x 3
C = integer
D = ndarray: C x 3
Pseudocode:
entries = []
means = []
For i in range(C):
for p in range(len(B)):
if B[p] == D[i]:
entries.append(A[p])
means.append(columnwise_means(entries))
return means
an example would be :
A = [[1,2,3],[1,2,3],[4,5,6],[4,5,6]]
B = [[9,8,7],[7,6,5],[1,2,3],[3,4,5]]
C = 2
D = [[1,2,3],[4,5,6]]
Returns:
[average([9,8,7],[7,6,5]), average(([1,2,3],[3,4,5])] = [[8,7,6],[2,3,4]]
I've tried using np.where, np.argwhere, np.mean, etc but can't seem to get the desired effect. Any help would be greatly appreciated.
Thanks!
Going by the expected output of the question, I am assuming that in the actual code, you would have :
IF conditional statement as : if A[p] == D[i], and
Entries would be appended from B : entries.append(B[p]).
So, here's one vectorized approach with NumPy broadcasting and dot-product -
mask = (D[:,None,:] == A).all(-1)
out = mask.dot(B)/(mask.sum(1)[:,None])
If the input arrays are integer arrays, then you can save on memory and boost up performance, considering the arrays as indices of a n-dimensional array and thus create the 2D mask without going 3D like so -
dims = np.maximum(A.max(0),D.max(0))+1
mask = np.ravel_multi_index(D.T,dims)[:,None] == np.ravel_multi_index(A.T,dims)
Sample run -
In [107]: A
Out[107]:
array([[1, 2, 3],
[1, 2, 3],
[4, 5, 6],
[4, 5, 6]])
In [108]: B
Out[108]:
array([[9, 8, 7],
[7, 6, 5],
[1, 2, 3],
[3, 4, 5]])
In [109]: mask = (D[:,None,:] == A).all(-1)
...: out = mask.dot(B)/(mask.sum(1)[:,None])
...:
In [110]: out
Out[110]:
array([[8, 7, 6],
[2, 3, 4]])
I see two hints :
First, comparing array by rows. A way to do that is to simplify you index system in 1D :
def indexer(M,base=256):
return (M*base**arange(3)).sum(axis=1)
base is an integer > A.max() . Then the selection can be done like that :
indices=np.equal.outer(indexer(D),indexer(A))
for :
array([[ True, True, False, False],
[False, False, True, True]], dtype=bool)
Second, each group can have different length, so vectorisation is difficult for the last step. Here a way to do achieve the job.
B=array(B)
means=[B[i].mean(axis=0) for i in indices]

argsort for a multidimensional ndarray

I'm trying to get the indices to sort a multidimensional array by the last axis, e.g.
>>> a = np.array([[3,1,2],[8,9,2]])
And I'd like indices i such that,
>>> a[i]
array([[1, 2, 3],
[2, 8, 9]])
Based on the documentation of numpy.argsort I thought it should do this, but I'm getting the error:
>>> a[np.argsort(a)]
IndexError: index 2 is out of bounds for axis 0 with size 2
Edit: I need to rearrange other arrays of the same shape (e.g. an array b such that a.shape == b.shape) in the same way... so that
>>> b = np.array([[0,5,4],[3,9,1]])
>>> b[i]
array([[5,4,0],
[9,3,1]])
Solution:
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You got it right, though I wouldn't describe it as cheating the indexing.
Maybe this will help make it clearer:
In [544]: i=np.argsort(a,axis=1)
In [545]: i
Out[545]:
array([[1, 2, 0],
[2, 0, 1]])
i is the order that we want, for each row. That is:
In [546]: a[0, i[0,:]]
Out[546]: array([1, 2, 3])
In [547]: a[1, i[1,:]]
Out[547]: array([2, 8, 9])
To do both indexing steps at once, we have to use a 'column' index for the 1st dimension.
In [548]: a[[[0],[1]],i]
Out[548]:
array([[1, 2, 3],
[2, 8, 9]])
Another array that could be paired with i is:
In [560]: j=np.array([[0,0,0],[1,1,1]])
In [561]: j
Out[561]:
array([[0, 0, 0],
[1, 1, 1]])
In [562]: a[j,i]
Out[562]:
array([[1, 2, 3],
[2, 8, 9]])
If i identifies the column for each element, then j specifies the row for each element. The [[0],[1]] column array works just as well because it can be broadcasted against i.
I think of
np.array([[0],
[1]])
as 'short hand' for j. Together they define the source row and column of each element of the new array. They work together, not sequentially.
The full mapping from a to the new array is:
[a[0,1] a[0,2] a[0,0]
a[1,2] a[1,0] a[1,1]]
def foo(a):
i = np.argsort(a, axis=1)
return (np.arange(a.shape[0])[:,None], i)
In [61]: foo(a)
Out[61]:
(array([[0],
[1]]), array([[1, 2, 0],
[2, 0, 1]], dtype=int32))
In [62]: a[foo(a)]
Out[62]:
array([[1, 2, 3],
[2, 8, 9]])
The above answers are now a bit outdated, since new functionality was added in numpy 1.15 to make it simpler; take_along_axis (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.take_along_axis.html) allows you to do:
>>> a = np.array([[3,1,2],[8,9,2]])
>>> np.take_along_axis(a, a.argsort(axis=-1), axis=-1)
array([[1 2 3]
[2 8 9]])
I found the answer here, with someone having the same problem. They key is just cheating the indexing to work properly...
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You can also use linear indexing, which might be better with performance, like so -
M,N = a.shape
out = b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
So, a.argsort(1)+(np.arange(M)[:,None]*N) basically are the linear indices that are used to map b to get the desired sorted output for b. The same linear indices could also be used on a for getting the sorted output for a.
Sample run -
In [23]: a = np.array([[3,1,2],[8,9,2]])
In [24]: b = np.array([[0,5,4],[3,9,1]])
In [25]: M,N = a.shape
In [26]: b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
Out[26]:
array([[5, 4, 0],
[1, 3, 9]])
Rumtime tests -
In [27]: a = np.random.rand(1000,1000)
In [28]: b = np.random.rand(1000,1000)
In [29]: M,N = a.shape
In [30]: %timeit b[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
10 loops, best of 3: 133 ms per loop
In [31]: %timeit b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
10 loops, best of 3: 96.7 ms per loop

Select a submatrix based on diagonal value

I want to select a submatrix of a numpy matrix based on whether the diagonal is less than some cutoff value. For example, given the matrix:
Test = array([[1,2,3,4,5],
[2,3,4,5,6],
[3,4,5,6,7],
[4,5,6,7,8],
[5,6,7,8,9]])
I want to select the rows and columns where the diagonal value is less than, say, 6. In this example, the diagonal values are sorted, so that I could just take Test[:3,:3], but in the general problem I want to solve this isn't the case.
The following snippet works:
def MatrixCut(M,Ecut):
D = diag(M)
indices = D<Ecut
n = sum(indices)
NewM = zeros((n,n),'d')
ii = -1
for i,ibool in enumerate(indices):
if ibool:
ii += 1
jj = -1
for j,jbool in enumerate(indices):
if jbool:
jj += 1
NewM[ii,jj] = M[i,j]
return NewM
print MatrixCut(Test,6)
[[ 1. 2. 3.]
[ 2. 3. 4.]
[ 3. 4. 5.]]
However, this is fugly code, with all kinds of dangerous things like initializing the ii/jj indices to -1, which won't cause an error if somehow I get into the loop and take M[-1,-1].
Plus, there must be a numpythonic way of doing this. For a one-dimensional array, you could do:
D = diag(A)
A[D<Ecut]
But the analogous thing for a 2d array doesn't work:
D = diag(Test)
Test[D<6,D<6]
array([1, 3, 5])
Is there a good way to do this? Thanks in advance.
This also works when the diagonals are not sorted:
In [7]: Test = array([[1,2,3,4,5],
[2,3,4,5,6],
[3,4,5,6,7],
[4,5,6,7,8],
[5,6,7,8,9]])
In [8]: d = np.argwhere(np.diag(Test) < 6).squeeze()
In [9]: Test[d][:,d]
Out[9]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
Alternately, to use a single subscript call, you could do:
In [10]: d = np.argwhere(np.diag(Test) < 6)
In [11]: Test[d, d.flat]
Out[11]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
[UPDATE]: Explanation of the second form.
At first, it may be tempting to just try Test[d, d] but that will only extract elements from the diagonal of the array:
In [75]: Test[d, d]
Out[75]:
array([[1],
[3],
[5]])
The problem is that d has shape (3, 1) so if we use d in both subscripts, the output array will have the same shape as d. The d.flat is equivalent to using d.flatten() or d.ravel() (except flat just returns an iterator instead of an array). The effect is that the result has shape (3,):
In [76]: d
Out[76]:
array([[0],
[1],
[2]])
In [77]: d.flatten()
Out[77]: array([0, 1, 2])
In [79]: print d.shape, d.flatten().shape
(3, 1) (3,)
The reason Test[d, d.flat] works is because numpy's general broadcasting rules cause the last dimension of d (which is 1) to be broadcast to the last (and only) dimension of d.flat (which is 3). Similarly, d.flat is broadcast to match the first dimension of d. The result is two (3,3) index arrays, which are equivalent to the following arrays i and j:
In [80]: dd = d.flatten()
In [81]: i = np.hstack((d, d, d)
In [82]: j = np.vstack((dd, dd, dd))
In [83]: print i
[[0 0 0]
[1 1 1]
[2 2 2]]
In [84]: print j
[[0 1 2]
[0 1 2]
[0 1 2]]
And just to make sure they work:
In [85]: Test[i, j]
Out[85]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
The only way I found to solve your task is somewhat tricky
>>> Test[[[i] for i,x in enumerate(D<6) if x], D<6]
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
possibly not the best one. Based on this answer.
Or (thanks to #bogatron or reminding me argwhere):
>>> Test[np.argwhere(D<6), D<6]
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])

Categories