I'm working with 2D NumPy arrays of variable size (both the number of rows and the number of columns vary). I'd like to pad such an array with zeros before the first row and after the last row, but with the number of leading and trailing zeros offset differently for each column.
So the original 2D array:
1 2 3
4 5 6
7 8 9
A normal example of padding:
0 0 0
0 0 0
1 2 3
4 5 6
7 8 9
0 0 0
Modified Padding with offsets (what I'm trying to do):
0 0 0
1 0 0
4 0 3
7 2 6
0 5 9
0 8 0
Does NumPy have any function that can reproduce the last example in an extendable manner for a variable number of rows/columns, while avoiding for loops and other computationally slow approaches?
Here's a vectorized one with broadcasting and boolean-indexing -
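import numpy as np

def create_padded_array(a, row_start, n_rows):
    r = np.arange(n_rows)[:, None]
    row_start = np.asarray(row_start)
    # True where output row r falls inside the window of each column
    mask = (r >= row_start) & (r < row_start + a.shape[0])
    out = np.zeros(mask.shape, dtype=a.dtype)
    # fill column by column, so take a's values in column-major order
    out.T[mask.T] = a.ravel('F')
    return out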
Sample run -
In [184]: a
Out[184]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [185]: create_padded_array(a, row_start=[1,3,2], n_rows=6)
Out[185]:
array([[0, 0, 0],
[1, 0, 0],
[4, 0, 3],
[7, 2, 6],
[0, 5, 9],
[0, 8, 0]])
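The same result can also be reached by scattering the values with integer index arrays instead of a boolean mask. The following is only a sketch: the name pad_with_offsets is made up for illustration, and it assumes every offset is non-negative and that row_start[c] + a.shape[0] <= n_rows for each column (a larger offset raises an IndexError, a negative one would silently wrap).

```python
import numpy as np

def pad_with_offsets(a, row_start, n_rows):
    """Hypothetical helper: place column c of `a` starting at row row_start[c]."""
    a = np.asarray(a)
    row_start = np.asarray(row_start)
    # rows[i, c] is the output row that receives a[i, c]
    rows = row_start[None, :] + np.arange(a.shape[0])[:, None]
    # cols broadcasts against rows, so each value stays in its own column
    cols = np.arange(a.shape[1])[None, :]
    out = np.zeros((n_rows, a.shape[1]), dtype=a.dtype)
    out[rows, cols] = a
    return out

a = np.arange(1, 10).reshape(3, 3)
print(pad_with_offsets(a, row_start=[1, 3, 2], n_rows=6))
```

With the sample input above this should reproduce the array shown in the sample run.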
Sorry for the trouble, but I think I found the answer that I was looking for.
I can use numpy.pad to create an arbitrary number of filler zeros at the end of my original array. There is also a function called numpy.roll which can then be used to shift all array elements along a given axis by a set number of positions down the column.
After a quick test, it looks like this is extendable for an arbitrary number of matrix elements and allows a unique offset along each column.
Thanks to everyone for their responses to this question!
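For reference, here is a rough sketch of how the numpy.pad plus numpy.roll idea could look. The function name pad_and_roll and its arguments are assumptions made for illustration. Note that numpy.roll accepts one shift per axis, not one per column, so this sketch calls it once per column in a short loop; it also assumes offsets[c] + a.shape[0] <= n_rows, because otherwise the rolled values wrap around to the top.

```python
import numpy as np

def pad_and_roll(a, offsets, n_rows):
    """Hypothetical helper: zero-pad at the bottom, then roll each column down."""
    a = np.asarray(a)
    # append filler zeros after the last row only
    padded = np.pad(a, ((0, n_rows - a.shape[0]), (0, 0)), mode="constant")
    out = np.empty_like(padded)
    # np.roll has no per-column shift, so shift one column at a time
    for col, shift in enumerate(offsets):
        out[:, col] = np.roll(padded[:, col], shift)
    return out

a = np.arange(1, 10).reshape(3, 3)
print(pad_and_roll(a, offsets=[1, 3, 2], n_rows=6))
```

The loop runs over columns only, so its cost grows with the number of columns rather than the number of elements.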
To my knowledge there is no NumPy function that meets those exact requirements. However, what you can do is start with your array:
In [10]: arr = np.array([(1,2,3),(4,5,6),(7,8,9)])
In [11]: arr
Out[11]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Then pad it:
In [12]: arr = np.pad(arr, ((2,1),(0,0)), 'constant', constant_values=(0))
In [13]: arr
Out[13]:
array([[0, 0, 0],
[0, 0, 0],
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[0, 0, 0]])
Then you can randomize it with shuffle (which I assume is what you want to do).
Note that np.random.shuffle only shuffles whole rows. If that is satisfactory for your needs, then:
In [14]: np.random.shuffle(arr)
In [15]: arr
Out[15]:
array([[7, 8, 9],
[4, 5, 6],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[1, 2, 3]])
If this is not satisfactory you can do this:
First create a 1D array:
In [16]: arr = np.arange(1,10)
In [17]: arr
Out[17]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Then pad your array with zeros:
In [18]: arr = np.pad(arr, (6,3), 'constant', constant_values = (0))
In [19]: arr
Out[19]: array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0])
Then you shuffle the array:
In [20]: np.random.shuffle(arr)
In [21]: arr
Out[21]: array([4, 0, 0, 5, 0, 0, 3, 0, 0, 0, 8, 0, 7, 2, 1, 6, 0, 9])
Finally you reshape to the desired format:
In [22]: np.reshape(arr,[6,3])
Out[22]:
array([[4, 0, 0],
[5, 0, 0],
[3, 0, 0],
[0, 8, 0],
[7, 2, 1],
[6, 0, 9]])
Although this may seem lengthy, for large data sets it is much quicker than using for loops or other Python control structures. If by offsets you mean controlling the amount of randomness, you can shuffle only portions of the 1D array and then combine them with the rest of the data, so that only the portion you choose is shuffled rather than the whole data set.
(If what you mean by offsets is different from my assumption above, please clarify in a comment.)
Related
I have a 2D array:
[[1,2,0,0],
[4,0,9,4],
[0,0,1,0],
[4,6,9,0]]
Is there an efficient way (without using loops) to replace the first 0 in each row with a 1:
[[1,2,1,0],
[4,1,9,4],
[1,0,1,0],
[4,6,9,1]]
?
Thanks a lot !
Here is a one-liner inspired by the accepted answer of this question:
a = np.array([
[1, 2, 0, 0],
[4, 0, 9, 4],
[0, 0, 1, 0],
[4, 6, 9, 0]
])
a[range(len(a)), np.argmax(a == 0, axis=1)] = 1
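One caveat with the one-liner: np.argmax returns 0 for a row that contains no zero at all, so the first element of such a row would be overwritten. A guarded variant might look like the sketch below. The function name replace_first_zero is hypothetical, and unlike the one-liner it returns a modified copy instead of changing the input in place.

```python
import numpy as np

def replace_first_zero(a, value=1):
    """Hypothetical helper: set the first 0 of each row to `value`, skipping rows without zeros."""
    out = np.array(a, copy=True)
    is_zero = out == 0
    # only rows that actually contain a zero take part in the assignment
    rows = np.flatnonzero(is_zero.any(axis=1))
    # argmax on a boolean row gives the position of its first True
    cols = is_zero[rows].argmax(axis=1)
    out[rows, cols] = value
    return out

a = np.array([[1, 2, 0, 0],
              [4, 0, 9, 4],
              [0, 0, 1, 0],
              [4, 6, 9, 0]])
print(replace_first_zero(a))
```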
So, you can use np.where to get the indices of the rows and columns where the array is 0:
In [45]: arr = np.array(
...: [[1,2,0,0],
...: [4,0,9,4],
...: [0,0,1,0],
...: [4,6,9,0]]
...: )
In [46]: r, c = np.where(arr == 0)
Then, use np.unique on the row indices r. Because np.where returns its hits in row-major order, the first occurrence of each row index corresponds to the first 0 in that row; use return_index to get the positions needed to extract the corresponding column indices:
In [47]: uniq_val, uniq_idx = np.unique(r, return_index=True)
In [48]: arr[uniq_val, c[uniq_idx]] = 1
In [49]: arr
Out[49]:
array([[1, 2, 1, 0],
[4, 1, 9, 4],
[1, 0, 1, 0],
[4, 6, 9, 1]])
If performance is really an issue, you could write a Numba function; I suspect this problem would be very amenable to Numba.
I have a 2D numpy array that I want to extract a submatrix from.
I get the submatrix by slicing the array as below.
Here I want a 3x3 submatrix around the item at index (2,3).
>>> import numpy as np
>>> a = np.array([[0, 1, 2, 3],
... [4, 5, 6, 7],
... [8, 9, 0, 1],
... [2, 3, 4, 5]])
>>> a[1:4, 2:5]
array([[6, 7],
[0, 1],
[4, 5]])
But what I want is that for indexes that are out of range, it goes back to the beginning of array and continues from there. This is the result I want:
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
I know that I can do things like getting mod of the index to the width of the array; but I'm looking for a numpy function that does that.
Also, for a one-dimensional array this causes an index-out-of-range error, which is not really useful...
This is one way using np.pad with wraparound mode.
>>> a = np.array([[0, 1, 2, 3],
...               [4, 5, 6, 7],
...               [8, 9, 0, 1],
...               [2, 3, 4, 5]])
>>> pad_width = 1
>>> i, j = 2, 3
>>> startrow, endrow = i-1+pad_width, i+2+pad_width # for 3 x 3 submatrix
>>> startcol, endcol = j-1+pad_width, j+2+pad_width
>>> np.pad(a, (pad_width, pad_width), 'wrap')[startrow:endrow, startcol:endcol]
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
Depending on the shape of your patch (e.g. 5 x 5 instead of 3 x 3), you can increase pad_width and adjust the start and end row and column indices accordingly.
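As a sketch of that generalization, the padding width can be derived from the patch size. The helper name wrapped_patch and its size parameter are assumptions for illustration; the sketch assumes an odd size and a half-width no larger than either dimension of the array.

```python
import numpy as np

def wrapped_patch(a, center, size=3):
    """Hypothetical helper: size x size patch around `center`, wrapping at the edges."""
    a = np.asarray(a)
    half = size // 2
    i, j = center
    # pad every side by half the patch size, filling with wrapped-around values
    padded = np.pad(a, half, mode="wrap")
    # original index i sits at i + half in the padded array, so the patch starts at i
    return padded[i:i + size, j:j + size]

a = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 0, 1],
              [2, 3, 4, 5]])
print(wrapped_patch(a, (2, 3)))
```

Padding copies the whole array, so for a single small patch on a large array, index arithmetic may be the cheaper choice.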
np.take does have a mode parameter which can wrap-around out of bound indices. But it's a bit hacky to use np.take for multidimensional arrays since the axis must be a scalar.
However, in your particular case you could do this:
a = np.array([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 0, 1],
[2, 3, 4, 5]])
np.take(a, np.r_[2:5], axis=1, mode='wrap')[1:4]
Output:
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
EDIT
This function might be what you are looking for (?)
def select3x3(a, idx):
    x, y = idx
    return np.take(np.take(a, np.r_[x-1:x+2], axis=0, mode='wrap'), np.r_[y-1:y+2], axis=1, mode='wrap')
But in retrospect, I recommend using modulo and fancy indexing for this kind of operation (it's basically what mode='wrap' does internally anyway):
def select3x3(a, idx):
    x, y = idx
    return a[np.r_[x-1:x+2][:,None] % a.shape[0], np.r_[y-1:y+2][None,:] % a.shape[1]]
The solution above also works for a 2D array a of any shape.
I am working with python 3.7 and I would like to get all the odd columns of a matrix.
To give an example, I have a 4x4 matrix of this style right now.
[[0, 9, 1, 6], [0, 3, 1, 5], [0, 2, 1, 7], [0, 6, 1, 2]]
That is...
0 9 1 6
0 3 1 5
0 2 1 7
0 6 1 2
And I would like to get:
9 6
3 5
2 7
6 2
The numbers and the size of the matrix will change but the structure will always be
[[0, (int), 1, (int), 2...], [0, (int), 1, (int), 2 ...], [0, (int), 1, (int), 2...], [0, (int), 1, (int), 2...], ...]
To get every other row I can do [::2], but that does not work for the columns here. I tried to walk through the matrix with:
for i in matrix:
    for j in matrix:
But that doesn't work either.
How can I solve it?
Thank you.
Without using numpy, you can use something similar to your indexing scheme ([1::2]) in a list comprehension:
>>> [i[1::2] for i in mat]
[[9, 6], [3, 5], [2, 7], [6, 2]]
Using numpy, you can do something similar:
>>> import numpy as np
>>> np.array(mat)[:,1::2]
array([[9, 6],
[3, 5],
[2, 7],
[6, 2]])
If you can't use NumPy for whatever reason, write a custom implementation:
def getColumns(matrix, columns):
    return {c: [matrix[r][c] for r in range(len(matrix))] for c in columns}
It takes a 2D array and a list of columns, and it returns a dictionary where the column indexes are keys and the actual columns are values. Note that if you passed all indices you would get a transposed matrix.
In your case,
M = [[0, 9, 1, 6],
[0, 3, 1, 5],
[0, 2, 1, 7],
[0, 6, 1, 2]]
The columns you want sit at the odd indices 1, 3, ... (because the index of the first column is 0). Therefore:
L = list(range(1, len(M[0]), 2))
And then you would do:
myColumns = getColumns(M, L)
print(list(myColumns.values()))
#result: [[9, 3, 2, 6], [6, 5, 7, 2]]
But since you showed the values as if they were in rows:
def f(matrix, columns):
    return [[matrix[row][i] for i in columns] for row in range(len(matrix))]
print(f(M, L))
#result: [[9, 6], [3, 5], [2, 7], [6, 2]]
And I believe that the latter is what you wanted.
I am trying to optimise some code by removing for loops and using numpy arrays only as I am working with large data sets.
I would like to take a 1D numpy array, for example:
a = [1, 2, 3, 4, 5]
and produce a 2D numpy array whereby the value in each column shifts along a place, for example in the case above for a I wish to have a function which returns:
[[1 2 3 4 5]
[0 1 2 3 4]
[0 0 1 2 3]
[0 0 0 1 2]
[0 0 0 0 1]]
I have found examples which use the strides function to do something similar to produce, for example:
[[1 2 3]
[2 3 4]
[3 4 5]]
However, I am trying to shift each of my columns in the other direction. Alternatively, one can view the problem as putting the first element of a on the first diagonal, the second element on the second diagonal, and so on. I would like to stress again that I want to avoid for loops, while loops and if statements entirely. Any help would be greatly appreciated.
Such a matrix is an example of a Toeplitz matrix. You could use scipy.linalg.toeplitz to create it:
In [32]: from scipy.linalg import toeplitz
In [33]: a = range(1,6)
In [34]: toeplitz(a, np.zeros_like(a)).T
Out[34]:
array([[1, 2, 3, 4, 5],
[0, 1, 2, 3, 4],
[0, 0, 1, 2, 3],
[0, 0, 0, 1, 2],
[0, 0, 0, 0, 1]])
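If SciPy is not available, the same matrix can be built with plain NumPy broadcasting. This is only a sketch with a hypothetical function name (shifted_rows); it allocates a full n x n index array, so it saves no memory compared with toeplitz.

```python
import numpy as np

def shifted_rows(a):
    """Hypothetical helper: row i holds `a` shifted right by i places, zero-filled on the left."""
    a = np.asarray(a)
    n = a.size
    # offset[i, j] = j - i is the index into `a` that belongs at position (i, j)
    offset = np.arange(n)[None, :] - np.arange(n)[:, None]
    # clip keeps the lookup in bounds; np.where then zeroes the entries below the diagonal
    return np.where(offset >= 0, a[np.clip(offset, 0, n - 1)], 0)

print(shifted_rows([1, 2, 3, 4, 5]))
```

The result is an ordinary, independent array, so writing to one element does not affect any other.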
Inspired by @EelcoHoogendoorn's answer, here's a variation that doesn't use as much memory as scipy.linalg.toeplitz:
In [47]: from numpy.lib.stride_tricks import as_strided
In [48]: a
Out[48]: array([1, 2, 3, 4, 5])
In [49]: t = as_strided(np.r_[a[::-1], np.zeros_like(a)], shape=(a.size,a.size), strides=(a.itemsize, a.itemsize))[:,::-1]
In [50]: t
Out[50]:
array([[1, 2, 3, 4, 5],
[0, 1, 2, 3, 4],
[0, 0, 1, 2, 3],
[0, 0, 0, 1, 2],
[0, 0, 0, 0, 1]])
The result should be treated as a "read only" array. Otherwise, you'll be in for some surprises when you change an element. For example:
In [51]: t[0,2] = 99
In [52]: t
Out[52]:
array([[ 1, 2, 99, 4, 5],
[ 0, 1, 2, 99, 4],
[ 0, 0, 1, 2, 99],
[ 0, 0, 0, 1, 2],
[ 0, 0, 0, 0, 1]])
Here is the indexing-tricks based solution. Not nearly as elegant as the toeplitz solution already posted, but should memory consumption or performance be a concern, it is to be preferred. As demonstrated, this also makes it easy to subsequently manipulate the entries of the matrix in a consistent manner.
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(5) + 1

def toeplitz_view(a):
    # zero block first, then the data
    b = np.concatenate((np.zeros_like(a), a))
    i = a.itemsize
    # v[r, c] aliases b[c - r]; only the slice returned below stays inside b's buffer
    v = as_strided(b, shape=(len(b),) * 2, strides=(-i, i))
    # return a view on the 'original' data as well, for manipulation
    return v[:len(a), len(a):], b[len(a):]

v, a = toeplitz_view(a)
print(v)
a[0] = 10
v[2, 1] = -1
print(v)
I have the following 3 x 3 x 3 numpy array called a (the comments will make sense after you read the rest of the question):
array([[[8, 1, 0], # irrelevant 1 (is at position 1 rather than 0)
[1, 7, 5], # the 1 on this line is what I am after!
[1, 4, 9]], # irrelevant 1 (out of the "cross")
[[4, 0, 1], # irrelevant 1 (is at position 2 rather than 0)
[1, 0, 1], # I'm only after the first 1 on this line!
[6, 2, 1]], # irrelevant 1 (is at position 2 rather than 0)
[[0, 2, 2],
[0, 6, 7],
[3, 4, 9]]])
furthermore I have this list of indexes that refers to the "central cross" of said matrix, called idx
[array([0, 1, 1, 1, 2]), array([1, 0, 1, 2, 1])]
EDIT: I call it "cross" as it marks the central column and row in the following:
>>> a[..., 0]
array([[8, 1, 1],
[4, 1, 6],
[0, 0, 3]])
What I would like to obtain is the indexes of all those arrays located at idx whose first value is 1, but I'm struggling to understand how to use numpy.where() in the right way. Since...
>>> a[..., 0][idx]
array([1, 4, 1, 6, 0])
...I tried...
>>> np.where(a[..., 0][idx] == 1)
(array([0, 2]),)
...but as you can see it returns the index of the sliced array, not of a, while I would like to get:
[array([0, 1]), array([1, 1])] # as a[0, 1, 0] and a[1, 1, 0] are equal to 1.
Thank you in advance for your help!
PS: In the comments it was suggested that I give a broader scenario of applicability. Although it is not what I am using it for, I suppose this could be used to process images as many 2D libraries do, with a source layer, a destination layer and a mask (see for example cairo). In this case the mask would be the idx array, and one might imagine working with the R channel of RGB colors (a[..., 0]).
You can translate the indices back using idx:
>>> w = np.where(a[..., 0][idx] == 1)[0]
>>> np.array(idx).T[w]
array([[0, 1],
[1, 1]])
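If the result should keep the same layout as idx (one array of row indices, one of column indices), a boolean mask can be applied to each index array directly. The sketch below uses a hypothetical function name and parameters (matching_indices, value, channel), and it converts idx to a tuple first, since recent NumPy versions no longer treat a list of arrays as a multi-dimensional index.

```python
import numpy as np

def matching_indices(a, idx, value=1, channel=0):
    """Hypothetical helper: keep the positions in `idx` where a[..., channel] equals `value`."""
    idx = tuple(np.asarray(i) for i in idx)
    # one boolean per (row, col) pair listed in idx
    mask = a[..., channel][idx] == value
    # apply the same mask to the row indices and to the column indices
    return [i[mask] for i in idx]

a = np.array([[[8, 1, 0], [1, 7, 5], [1, 4, 9]],
              [[4, 0, 1], [1, 0, 1], [6, 2, 1]],
              [[0, 2, 2], [0, 6, 7], [3, 4, 9]]])
idx = [np.array([0, 1, 1, 1, 2]), np.array([1, 0, 1, 2, 1])]
print(matching_indices(a, idx))
```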