How do I insert a single element into an array in NumPy? I know how to insert an entire column or row using insert and the axis parameter, but how do I insert/expand by one element?
For example, say I have an array:
1 1 1
1 1 1
1 1 1
How do I insert 0 on the same row, say at location (1, 1), to get:
1 1 1
1 0 1 1
1 1 1
Is this doable? If so, how do you do the opposite (on the same column), say:
1 1 1
1 0 1
1 1 1
1
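NumPy has something that looks like ragged arrays, but those are arrays of objects, and probably not what you want. (Recent NumPy versions, 1.24 and later, raise a ValueError for ragged input unless you pass dtype=object explicitly.) Note the difference in the following: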
In [27]: np.array([[1, 2], [3]])
Out[27]: array([[1, 2], [3]], dtype=object)
In [28]: np.array([[1, 2], [3, 4]])
Out[28]:
array([[1, 2],
[3, 4]])
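A small sketch of the difference, assuming a reasonably recent NumPy: ragged input only becomes an array if you ask for dtype=object, and even then it is one-dimensional, with each element an ordinary Python list.

```python
import numpy as np

# Ragged rows: dtype=object builds a 1-D array whose elements are the
# original Python lists (explicit dtype=object is required in NumPy >= 1.24).
ragged = np.array([[1, 2], [3]], dtype=object)
print(ragged.shape)   # (2,)
print(ragged.dtype)   # object

# Equal-length rows give a regular 2-D integer array.
regular = np.array([[1, 2], [3, 4]])
print(regular.shape)  # (2, 2)
```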
If you want to insert v into row i at column j, you can do so by padding the other rows. This is easy to do:
In [29]: a = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
In [30]: i, j, v = 1, 1, 3
In [31]: np.array([np.append(a[i_], [0]) if i_ != i else np.insert(a[i_], j, v) for i_ in range(a.shape[0])])
Out[31]:
array([[1, 1, 1, 0],
[1, 3, 1, 1],
[1, 1, 1, 0]])
To pad along columns, not rows, first transpose a, then perform this operation, then transpose again.
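The same idea wrapped in a function, with the transpose trick for columns. This is a sketch; insert_padded and its pad parameter are hypothetical names, not part of NumPy.

```python
import numpy as np

def insert_padded(a, i, j, v, pad=0):
    """Hypothetical helper (not part of NumPy): insert v at column j of row i
    and append `pad` to every other row, so the result stays rectangular."""
    rows = [np.insert(row, j, v) if r == i else np.append(row, pad)
            for r, row in enumerate(a)]
    return np.array(rows)

a = np.ones((3, 3), dtype=int)

# Insert 0 at (1, 1) along the row; the other rows get a trailing pad value.
row_result = insert_padded(a, 1, 1, 0)
print(row_result)

# Same operation along a column: transpose, insert, transpose back.
col_result = insert_padded(a.T, 1, 1, 0).T
print(col_result)
```

With pad=0 the padding looks just like an inserted 0, so passing a sentinel such as pad=-1 can make the result easier to read.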
I think you should use append() on regular Python lists (not NumPy arrays).
Here is a short example:
A = [[1,1,1],
[1,1,1],
[1,1,1]]
A[1].append(1)
The result is
[[1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1]] # like in your example
Extending a single column by one element is impossible, because the values are stored by rows. Technically you could do something like A.append([None, 1, None]), but this is bad practice.
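If you want the 0 in the middle of the row, as in the question, rather than at the end, list.insert(index, value) does that on a plain list:

```python
A = [[1, 1, 1],
     [1, 1, 1],
     [1, 1, 1]]

# Put 0 at position 1 of the middle row; the rest of that row shifts right.
A[1].insert(1, 0)
print(A)  # [[1, 1, 1], [1, 0, 1, 1], [1, 1, 1]]
```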
I want to merge two equal adjacent elements in an array. Let's say I have an array like this:
np.array([[0,1,1,2,2],
[0,1,1,2,2],
[0,2,2,2,2]])
I want to produce something like this if I am moving it right:
np.array([[0,0,2,0,4],
[0,0,2,0,4],
[0,0,4,0,4]])
And this if I am moving it up
np.array([[0,2,2,4,4],
[0,0,0,0,0],
[0,2,2,2,2]])
My current code simply loops through a normal list:
for i in range(len(matrix)):
    for j in range(len(matrix[0]) - 1):
        if matrix[i][j] == matrix[i][j+1] and matrix[i][j] != 0:
            matrix[i][j] *= 2
            matrix[i][j+1] = 0
I'd prefer NumPy and no loops, if possible.
This task is deceptively difficult to do without loops! You'll need a bunch of high-level numpy tricks to make it work. I sort of fly through them here, but I will try to link to other resources where I can.
The best way to do a row-wise comparison is to view each row as a single np.void item:
a = np.array([[0,1,1,2,2],
[0,1,1,2,2],
[0,2,2,2,2]])
b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
b
array([[[0 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 2 0 0 0]],
[[0 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 2 0 0 0]],
[[0 0 0 0 2 0 0 0 2 0 0 0 2 0 0 0 2 0 0 0]]],
dtype='|V20')
b.shape
(3, 1)
Notice that the innermost brackets are not an additional dimension, but an np.void object that can be compared with things like np.unique.
Still, getting the indices you want to keep isn't really easy, but here's the one-liner:
c = np.flatnonzero(np.r_[1, np.diff(np.unique(b.ravel(), return_inverse=True)[1])])
Eech. It's kinda messy. Basically you're looking for the indices where the rows change, plus the first row. Normally you wouldn't need the np.unique call and could just do np.diff(b), but you can't subtract np.void. np.r_ is a shortcut for np.concatenate that's a bit more readable. And np.flatnonzero gives you the indices where your new array isn't zero (i.e. the indices you want to keep).
c
array([0, 2], dtype=int32)
There, now you can use some fancy ufunc.reduceat math to do your addition:
d = np.add.reduceat(a, c, axis = 0)
d
array([[0, 2, 2, 4, 4],
[0, 2, 2, 2, 2]], dtype=int32)
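To see what reduceat does with the indices in c, here is a one-dimensional sketch with made-up numbers: each index starts a segment that runs up to the next index (or to the end), and each segment is summed.

```python
import numpy as np

x = np.array([10, 1, 2, 3, 4])
# Segments are x[0:2] and x[2:], so the sums are 10 + 1 and 2 + 3 + 4.
sums = np.add.reduceat(x, [0, 2])
print(sums)  # [11  9]
```

With axis=0 on a 2-D array, as above, the same thing happens row-wise: rows 0 and 1 are summed into one row and row 2 is kept on its own.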
OK, now to add the zeros, we'll just plug that into an array from np.zeros_like using advanced indexing:
e = np.zeros_like(a)
e[c] = d
e
array([[0, 2, 2, 4, 4],
[0, 0, 0, 0, 0],
[0, 2, 2, 2, 2]])
And there we go! You can go in other directions by transposing or flipping the matrix at the beginning and the end.
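def reduce_duplicates(a):
    # View each row as one np.void item so whole rows can be compared
    b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
    # First row of every run of identical consecutive rows
    c = np.flatnonzero(np.r_[1, np.diff(np.unique(b.ravel(), return_inverse=True)[1])])
    # Sum each run into its first row
    d = np.add.reduceat(a, c, axis=0)
    e = np.zeros_like(a)
    e[c] = d
    return e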
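A sketch of a variant that skips the np.void view and np.unique, assuming all you need is to know where consecutive rows differ: compare each row with the previous one directly. The name reduce_duplicates_simple is hypothetical. Like the function above, it only merges rows that are identical along their whole length, and a run of three or more identical rows is summed into a single row.

```python
import numpy as np

def reduce_duplicates_simple(a):
    """Hypothetical variant of reduce_duplicates: sum each run of identical
    consecutive rows into the run's first row and set the other rows to 0."""
    # True where a row differs from the row above it
    changed = np.any(a[1:] != a[:-1], axis=1)
    # First row of every run (row 0 always starts one)
    c = np.flatnonzero(np.r_[True, changed])
    e = np.zeros_like(a)
    e[c] = np.add.reduceat(a, c, axis=0)
    return e

a = np.array([[0, 1, 1, 2, 2],
              [0, 1, 1, 2, 2],
              [0, 2, 2, 2, 2]])

up = reduce_duplicates_simple(a)
right = reduce_duplicates_simple(a.T[::-1, :])[::-1, :].T
left = reduce_duplicates_simple(a.T).T
print(up)
print(right)
```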
reduce_duplicates(a.T[::-1,:])[::-1,:].T #reducing right
array([[0, 0, 2, 0, 4],
[0, 0, 2, 0, 4],
[0, 0, 4, 0, 4]])
I don't have numba, so I can't test speed against the other suggestion (knowing numba, this is probably slower), but it is loop-free and pure NumPy.
A "vectorized" version of your function would be pretty messy, since the merges can happen at both even or odd indexes in each row/column, depending on preceding values in that row/column.
To illustrate, see how this vectorized version works on your (horizontal) example, which happens to have every pair start at an odd index (so the merged values land on even indexes):
>>> x
array([[0, 1, 1, 2, 2],
[0, 1, 1, 2, 2],
[0, 2, 2, 2, 2]])
>>> y=x==np.roll(x, 1, axis=1); y[:,1::2]=False; x*y*2
array([[0, 0, 2, 0, 4],
[0, 0, 2, 0, 4],
[0, 0, 4, 0, 4]])
But if I shift one of the rows by 1, it no longer works:
>>> x2
array([[0, 1, 1, 2, 2],
[0, 0, 1, 1, 2],
[0, 2, 2, 2, 2]])
>>> y=x2==np.roll(x2, 1, axis=1); y[:,1::2]=False; x2*y*2
array([[0, 0, 2, 0, 4],
[0, 0, 0, 0, 0],
[0, 0, 4, 0, 4]])
I'm not sure what strategy I would take next; if it is possible to implement this in a vectorized fashion, it wouldn't be very clean.
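One possible next step, sketched under the assumption that equal neighbours should be paired from the left within each run of equal values (which matches the examples in the question): instead of fixed even/odd columns, use each element's position inside its own run. merge_right_vectorized is a hypothetical name. It handles the shifted row of x2 as well; other directions can be obtained by flipping and transposing, although flipping also changes which end of an odd-length run is left unpaired.

```python
import numpy as np

def merge_right_vectorized(x):
    """Hypothetical helper: in every row, pair equal non-zero neighbours from
    the left within each run of equal values, double the right element of
    each pair and zero the left one. Returns a new array."""
    n_cols = x.shape[1]
    cols = np.arange(n_cols)

    # True where a new run of equal values starts (column 0 always does)
    starts = np.ones(x.shape, dtype=bool)
    starts[:, 1:] = x[:, 1:] != x[:, :-1]

    # Column at which the current run started, carried forward along each row
    run_start = np.maximum.accumulate(np.where(starts, cols, 0), axis=1)
    pos_in_run = cols - run_start

    # Odd positions inside a run receive the doubled value...
    receivers = (pos_in_run % 2 == 1) & (x != 0)
    # ...and the element just before each receiver is zeroed
    donors = np.zeros(x.shape, dtype=bool)
    donors[:, :-1] = receivers[:, 1:]

    out = x.copy()
    out[receivers] *= 2
    out[donors] = 0
    return out

x = np.array([[0, 1, 1, 2, 2],
              [0, 1, 1, 2, 2],
              [0, 2, 2, 2, 2]])
x2 = np.array([[0, 1, 1, 2, 2],
               [0, 0, 1, 1, 2],
               [0, 2, 2, 2, 2]])
print(merge_right_vectorized(x))
print(merge_right_vectorized(x2))
```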
I would suggest using numba for something like this. It will keep your code readable and should make it faster. Just add the @jit decorator to your function and evaluate how much it improves performance.
EDIT: I did some timing for you. Also there is a small fix to your function to make it coincide with your example.
>>> def foo(matrix):
...     for i in range(matrix.shape[0]):
...         for j in range(matrix.shape[1]-1):
...             if matrix[i][j]==matrix[i][j+1] and matrix[i][j]!=0:
...                 matrix[i][j+1]*=2
...                 matrix[i][j]=0
...
>>> from numba import jit
>>> @jit
... def foo2(matrix):
...     for i in range(matrix.shape[0]):
...         for j in range(matrix.shape[1]-1):
...             if matrix[i][j]==matrix[i][j+1] and matrix[i][j]!=0:
...                 matrix[i][j+1]*=2
...                 matrix[i][j]=0
...
>>> import time
>>> z=np.random.random((1000,1000)); start=time.time(); foo(z); print(time.time()-start)
1.0277159214
>>> z=np.random.random((1000,1000)); start=time.time(); foo2(z); print(time.time()-start)
0.00354909896851
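One caveat worth checking: because foo compares an already-doubled value with the next element, merges can chain. For the first example row it produces [0, 0, 0, 4, 2] rather than [0, 0, 2, 0, 4]. Below is a sketch of a loop that jumps past a merged pair, so each element merges at most once; merge_right_loop is a hypothetical name, and it is the kind of function you would then decorate with numba's @jit (not timed here).

```python
import numpy as np

def merge_right_loop(matrix):
    """Merge equal non-zero neighbours toward the right, in place.
    After merging the pair (j, j + 1) the scan jumps to j + 2, so each
    element takes part in at most one merge."""
    n_rows, n_cols = matrix.shape
    for i in range(n_rows):
        j = 0
        while j < n_cols - 1:
            if matrix[i, j] == matrix[i, j + 1] and matrix[i, j] != 0:
                matrix[i, j + 1] *= 2
                matrix[i, j] = 0
                j += 2  # skip past the pair that was just merged
            else:
                j += 1
    return matrix

m = np.array([[0, 1, 1, 2, 2],
              [0, 1, 1, 2, 2],
              [0, 2, 2, 2, 2]])
print(merge_right_loop(m))
```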
For example, I have a NumPy array like this:
a =
array([[1, 2, 3],
[4, 3, 2]])
and an index array like this that selects the max values:
max_idx =
array([[0, 2],
[1, 0]])
How can I access these positions at the same time, to modify them?
Something like "a[max_idx] = 0", giving the following:
array([[1, 2, 0],
[0, 3, 2]])
Simply use subscripted-indexing -
a[max_idx[:,0],max_idx[:,1]] = 0
If you are working with higher dimensional arrays and don't want to type out slices of max_idx for each axis, you can use linear-indexing to assign zeros, like so -
a.ravel()[np.ravel_multi_index(max_idx.T,a.shape)] = 0
Sample run -
In [28]: a
Out[28]:
array([[1, 2, 3],
[4, 3, 2]])
In [29]: max_idx
Out[29]:
array([[0, 2],
[1, 0]])
In [30]: a[max_idx[:,0],max_idx[:,1]] = 0
In [31]: a
Out[31]:
array([[1, 2, 0],
[0, 3, 2]])
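Two more ways to write the same assignment, sketched below. Building max_idx from a row-wise argmax is an assumption about where the indices come from. tuple(max_idx.T) gives one index array per axis, which works for any number of dimensions, and .flat writes into the array even when it is not contiguous, whereas a.ravel() may return a copy in that case, so the assignment would be lost.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 3, 2]])

# Assumption: max_idx holds the (row, column) of each row's maximum.
max_idx = np.column_stack((np.arange(a.shape[0]), a.argmax(axis=1)))

# Option 1: one index array per axis, as a tuple.
b = a.copy()
b[tuple(max_idx.T)] = 0

# Option 2: linear indices, written through the .flat iterator.
c = a.copy()
c.flat[np.ravel_multi_index(tuple(max_idx.T), c.shape)] = 0
print(b)
print(c)
```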
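NumPy supports advanced indexing like this:
a[b[:, 0], b[:, 1]] = 0
The code above fits your requirement.
If a has more than two dimensions (so b has more than two columns), a more general way is:
a[tuple(np.split(b, b.shape[1], axis=1))]
np.split splits b into its columns, one per dimension of a. Wrapping the result in tuple() matters: recent NumPy versions treat a list of index arrays as a single array index, which gives a different result.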
I'd like to get the index of a value for every column in a matrix M. For example:
M = matrix([[0, 1, 0],
[4, 2, 4],
[3, 4, 1],
[1, 3, 2],
[2, 0, 3]])
In pseudocode, I'd like to do something like this:
for col in M:
    idx = numpy.where(M[col]==0) # Only for columns!
and have idx be 0, 4, 0 for each column.
I have tried to use where, but I don't understand the return value, which is a tuple of matrices.
The tuple of matrices is a collection of items suited for indexing. The output will have the shape of the indexing matrices (or arrays), and each item in the output will be selected from the original array using the first array as the index of the first dimension, the second as the index of the second dimension, and so on. In other words, this:
>>> numpy.where(M == 0)
(matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
>>> row, col = numpy.where(M == 0)
>>> M[row, col]
matrix([[0, 0, 0]])
>>> M[numpy.where(M == 0)] = 1000
>>> M
matrix([[1000, 1, 1000],
[ 4, 2, 4],
[ 3, 4, 1],
[ 1, 3, 2],
[ 2, 1000, 3]])
The sequence may be what's confusing you. It proceeds in flattened order -- so M[0,2] appears second, not third. If you need to reorder them, you could do this:
>>> row[0,col.argsort()]
matrix([[0, 4, 0]])
You also might be better off using arrays instead of matrices. That way you can manipulate the shape of the arrays, which is often useful! Also note ajcr's transpose-based trick, which is probably preferable to using argsort.
Finally, there is also a nonzero method that does the same thing as where in this case. Using the transpose trick now:
>>> (M == 0).T.nonzero()
(matrix([[0, 1, 2]]), matrix([[0, 4, 0]]))
As an alternative to np.where, you could perhaps use np.argwhere to return an array of indexes where the array meets the condition:
>>> np.argwhere(M == 0)
array([[[0, 0]],
[[0, 2]],
[[4, 1]]])
This gives you each index where the condition was met, in the format [row, column].
If you'd prefer the format of this output array to be grouped by column rather than row, (that is, [column, row]), just use the method on the transpose of the array:
>>> np.argwhere(M.T == 0).squeeze()
array([[0, 0],
[1, 4],
[2, 0]])
I also used np.squeeze here to get rid of axis 1, so that we are left with a 2D array. The sequence you want is the second column, i.e. np.argwhere(M.T == 0).squeeze()[:, 1].
The result of np.where(M == 0) looks something like this:
(matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
The first matrix tells you the rows where the 0s are, and the second matrix tells you the columns where the 0s are.
In [4]: M
Out[4]:
matrix([[0, 1, 0],
[4, 2, 4],
[3, 4, 1],
[1, 3, 2],
[2, 0, 3]])
In [5]: np.where(M == 0)
Out[5]: (matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
In [6]: M[0,0]
Out[6]: 0
In [7]: M[0,2] #0th row 2nd column
Out[7]: 0
In [8]: M[4,1] #4th row 1st column
Out[8]: 0
This isn't anything new on what's been already suggested, but a one-line solution is:
>>> np.where(np.array(M.T)==0)[-1]
array([0, 4, 0])
(I agree that NumPy matrix objects are more trouble than they're worth).
>>> M = np.array([[0, 1, 0],
... [4, 2, 4],
... [3, 4, 1],
... [1, 3, 2],
... [2, 0, 3]])
>>> [np.where(M[:,i]==0)[0][0] for i in range(M.shape[1])]
[0, 4, 0]
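A different technique from the answers above, sketched on a plain array (use np.asarray(M) first if M is an np.matrix): argmax on the boolean mask returns the index of the first True in each column. It also returns 0 for a column that contains no zero at all, so check that case separately.

```python
import numpy as np

M = np.array([[0, 1, 0],
              [4, 2, 4],
              [3, 4, 1],
              [1, 3, 2],
              [2, 0, 3]])

mask = M == 0
# Index of the first True (first zero) in each column
first_zero = mask.argmax(axis=0)
# argmax cannot tell "zero in row 0" from "no zero", so record which columns have one
has_zero = mask.any(axis=0)
print(first_zero)  # [0 4 0]
print(has_zero)    # [ True  True  True]
```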
I am trying to optimise some code by removing for loops and using numpy arrays only as I am working with large data sets.
I would like to take a 1D numpy array, for example:
a = [1, 2, 3, 4, 5]
and produce a 2D numpy array whereby the value in each column shifts along a place, for example in the case above for a I wish to have a function which returns:
[[1 2 3 4 5]
[0 1 2 3 4]
[0 0 1 2 3]
[0 0 0 1 2]
[0 0 0 0 1]]
I have found examples that use strides to produce something similar, for example:
[[1 2 3]
[2 3 4]
[3 4 5]]
However, I am trying to shift each of my columns in the other direction. Alternatively, one can view the problem as putting the first element of a on the first diagonal, the second element on the second diagonal, and so on. However, I would like to stress again that I want to avoid for and while loops, and if statements, entirely. Any help would be greatly appreciated.
Such a matrix is an example of a Toeplitz matrix. You could use scipy.linalg.toeplitz to create it:
In [32]: from scipy.linalg import toeplitz
In [33]: a = range(1,6)
In [34]: toeplitz(a, np.zeros_like(a)).T
Out[34]:
array([[1, 2, 3, 4, 5],
[0, 1, 2, 3, 4],
[0, 0, 1, 2, 3],
[0, 0, 0, 1, 2],
[0, 0, 0, 0, 1]])
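For comparison, a sketch without SciPy that builds the same matrix with np.indices and broadcasting: cell (i, j) holds a[j - i] on and above the diagonal and 0 below it. It allocates a few n x n temporaries, so it is not lighter on memory than toeplitz.

```python
import numpy as np

a = np.arange(1, 6)
n = a.size

# Row and column index of every cell
i, j = np.indices((n, n))
k = j - i  # which element of a belongs in each cell

# np.clip keeps the lookup in range for negative k; np.where then replaces
# those below-diagonal cells with 0.
t = np.where(k >= 0, a[np.clip(k, 0, None)], 0)
print(t)
```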
Inspired by @EelcoHoogendoorn's answer, here's a variation that doesn't use as much memory as scipy.linalg.toeplitz:
In [47]: from numpy.lib.stride_tricks import as_strided
In [48]: a
Out[48]: array([1, 2, 3, 4, 5])
In [49]: t = as_strided(np.r_[a[::-1], np.zeros_like(a)], shape=(a.size,a.size), strides=(a.itemsize, a.itemsize))[:,::-1]
In [50]: t
Out[50]:
array([[1, 2, 3, 4, 5],
[0, 1, 2, 3, 4],
[0, 0, 1, 2, 3],
[0, 0, 0, 1, 2],
[0, 0, 0, 0, 1]])
The result should be treated as a "read only" array. Otherwise, you'll be in for some surprises when you change an element. For example:
In [51]: t[0,2] = 99
In [52]: t
Out[52]:
array([[ 1, 2, 99, 4, 5],
[ 0, 1, 2, 99, 4],
[ 0, 0, 1, 2, 99],
[ 0, 0, 0, 1, 2],
[ 0, 0, 0, 0, 1]])
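If you want NumPy to enforce the "read only" rule rather than rely on discipline, as_strided accepts writeable=False in recent NumPy versions, and an explicit copy gives an ordinary matrix you can modify safely. A sketch:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([1, 2, 3, 4, 5])
buf = np.r_[a[::-1], np.zeros_like(a)]

# writeable=False makes NumPy refuse writes instead of silently changing
# every cell that shares the same memory.
t = as_strided(buf, shape=(a.size, a.size),
               strides=(buf.itemsize, buf.itemsize),
               writeable=False)[:, ::-1]

raised = False
try:
    t[0, 2] = 99
except ValueError:
    raised = True  # NumPy rejects the write to a read-only view

# An explicit copy has its own memory, so writes touch only one element.
t2 = t.copy()
t2[0, 2] = 99
```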
Here is the indexing-tricks based solution. Not nearly as elegant as the toeplitz solution already posted, but should memory consumption or performance be a concern, it is to be preferred. As demonstrated, this also makes it easy to subsequently manipulate the entries of the matrix in a consistent manner.
import numpy as np
a = np.arange(5)+1
def toeplitz_view(a):
    b = np.concatenate((np.zeros_like(a),a))
    i = a.itemsize
    # as_strided lives in numpy.lib.stride_tricks
    v = np.lib.stride_tricks.as_strided(b,
            shape=(len(b),)*2,
            strides=(-i, i))
    #return a view on the 'original' data as well, for manipulation
    return v[:len(a), len(a):], b[len(a):]
v, a = toeplitz_view(a)
print(v)
a[0] = 10
v[2,1] = -1
print(v)
I have the following 3 x 3 x 3 numpy array called a (the comments will make sense after you read the rest of the question):
array([[[8, 1, 0], # irrelevant 1 (is at position 1 rather than 0)
[1, 7, 5], # the 1 on this line is what I am after!
[1, 4, 9]], # irrelevant 1 (out of the "cross")
[[4, 0, 1], # irrelevant 1 (is at position 2 rather than 0)
[1, 0, 1], # I'm only after the first 1 on this line!
[6, 2, 1]], # irrelevant 1 (is at position 2 rather than 0)
[[0, 2, 2],
[0, 6, 7],
[3, 4, 9]]])
Furthermore, I have this list of indexes, called idx, that refers to the "central cross" of said matrix:
[array([0, 1, 1, 1, 2]), array([1, 0, 1, 2, 1])]
EDIT: I call it "cross" as it marks the central column and row in the following:
>>> a[..., 0]
array([[8, 1, 1],
[4, 1, 6],
[0, 0, 3]])
What I would like to obtain is the indexes of all those arrays located at idx whose first value is 1, but I'm struggling to understand how to use numpy.where() in the right way. Since...
>>> a[..., 0][idx]
array([1, 4, 1, 6, 0])
...I tried...
>>> np.where(a[..., 0][idx] == 1)
(array([0, 2]),)
...but as you can see it returns the index of the sliced array, not of a, while I would like to get:
[array([0, 1]), array([1, 1])] # as a[0, 1, 0] and a[1, 1, 0] are equal to 1.
Thank you in advance for your help!
PS: In the comments it was suggested that I give a broader scenario of applicability. Although it is not what I am using it for, I suppose this could be used to process images as many 2D libraries do, with a source layer, a destination layer and a mask (see for example cairo). In this case the mask would be the idx array, and one might imagine working with the R channel of RGB colors (a[..., 0]).
You can translate the indices back using idx:
>>> w = np.where(a[..., 0][tuple(idx)] == 1)[0]
>>> np.array(idx).T[w]
array([[0, 1],
[1, 1]])
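To get exactly the [array(...), array(...)] format asked for in the question (one index array per axis, rather than one row per position), you can select the same positions w from each array in idx. A self-contained sketch; tuple(idx) is used because recent NumPy no longer treats a list of index arrays as one index per axis.

```python
import numpy as np

a = np.array([[[8, 1, 0], [1, 7, 5], [1, 4, 9]],
              [[4, 0, 1], [1, 0, 1], [6, 2, 1]],
              [[0, 2, 2], [0, 6, 7], [3, 4, 9]]])
idx = [np.array([0, 1, 1, 1, 2]), np.array([1, 0, 1, 2, 1])]

# Positions (within idx) whose first value is 1
w = np.flatnonzero(a[..., 0][tuple(idx)] == 1)

# Keep those positions in every per-axis index array
result = [axis_idx[w] for axis_idx in idx]
print(result)  # [array([0, 1]), array([1, 1])]
```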