I am working with Python 3.7 and I would like to get all the odd columns of a matrix.
To give an example, I have a 4x4 matrix like this right now.
[[0, 9, 1, 6], [0, 3, 1, 5], [0, 2, 1, 7], [0, 6, 1, 2]]
That is...
0 9 1 6
0 3 1 5
0 2 1 7
0 6 1 2
And I would like to get:
9 6
3 5
2 7
6 2
The numbers and the size of the matrix will change but the structure will always be
[[0, (int), 1, (int), 2...], [0, (int), 1, (int), 2 ...], [0, (int), 1, (int), 2...], [0, (int), 1, (int), 2...], ...]
To get the rows I can do [::2], but that wonderful solution does not work for columns. I tried to access the matrix with:
for i in matrix:
    for j in matrix:
But that doesn't work either.
How can I solve it?
Thank you.
Without using numpy, you can use something similar to your indexing scheme ([1::2]) in a list comprehension:
>>> [i[1::2] for i in mat]
[[9, 6], [3, 5], [2, 7], [6, 2]]
Using numpy, you can do something similar:
>>> import numpy as np
>>> np.array(mat)[:,1::2]
array([[9, 6],
[3, 5],
[2, 7],
[6, 2]])
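Since the question says the size of the matrix will change, here is a small sketch that applies the same two slices to a wider matrix. The sample values are made up for illustration; only the `[1::2]` slicing comes from the answer above.

```python
import numpy as np

# Made-up 2x6 matrix following the question's pattern: 0, (int), 1, (int), 2, (int)
mat = [[0, 9, 1, 6, 2, 4],
       [0, 3, 1, 5, 2, 8]]

# Pure Python: slice every row, starting at index 1 and stepping by 2.
odd_cols_list = [row[1::2] for row in mat]

# NumPy: ":" keeps every row, "1::2" keeps columns 1, 3, 5, ...
odd_cols_np = np.array(mat)[:, 1::2]

print(odd_cols_list)
print(odd_cols_np.tolist())
```

Both results should contain the same values; the NumPy version returns an array rather than a list of lists.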
If you can't use NumPy for whatever reason, write a custom implementation:
def getColumns(matrix, columns):
    return {c: [matrix[r][c] for r in range(len(matrix))] for c in columns}
It takes a 2D array and a list of columns, and it returns a dictionary where the column indexes are keys and the actual columns are values. Note that if you passed all indices you would get a transposed matrix.
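As a quick check of the transposition remark, here is a minimal sketch. The name `get_columns` and the sample matrix are illustrative assumptions; the function body mirrors the dictionary comprehension above. It relies on dictionaries preserving insertion order, which Python 3.7 guarantees.

```python
def get_columns(matrix, columns):
    # Map each requested column index to that column's values, top to bottom.
    return {c: [row[c] for row in matrix] for c in columns}

m = [[1, 2, 3],
     [4, 5, 6]]

# Passing every column index yields the columns in order, i.e. the transpose.
all_columns = get_columns(m, range(len(m[0])))
transposed = list(all_columns.values())

print(transposed)
```

The same transpose can also be obtained with `zip(*m)`, which is a handy way to cross-check the result.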
In your case,
M = [[0, 9, 1, 6],
[0, 3, 1, 5],
[0, 2, 1, 7],
[0, 6, 1, 2]]
The columns you want (the second, the fourth, ...) sit at odd indices, because the index of the first column is 0. Therefore:
L = list(range(1, len(M[0]), 2))
And then you would do:
myColumns = getColumns(M, L)
print(list(myColumns.values()))
#result: [[9, 3, 2, 6], [6, 5, 7, 2]]
But since you showed the values as if they were in rows:
def f(matrix, columns):
    return [[matrix[row][i] for i in columns] for row in range(len(matrix))]
print(f(M, L))
#result: [[9, 6], [3, 5], [2, 7], [6, 2]]
And I believe that the latter is what you wanted.
Related
I have an n x m x 3 numpy array. This represents a middle-step towards an RGB representation of a complex-function plotter. When the function being plotted takes infinite values or has singularities, parts of the RGB data become NaNs.
I'm looking for an efficient way to replace a row containing a NaN with a row of my choice, perhaps [0, 0, 0] or [1, 1, 1]. In terms of the RGB values, this has the effect of replacing poorly-behaving pixels with white or black pixels. By efficient, I mean some way that takes advantage of numpy's vectorization and speed.
Please note that I am not looking to merely replace the NaN values with 0 (which I know how to do with numpy.where); if a row contains a NaN, I want to replace the whole row. I suspect this can be done nicely in numpy, but I'm not sure how.
Concrete Question
Suppose we are given a 2 x 2 x 3 array arr. If a row contains a 5, I want to replace the row with [0, 0, 0]. Trivial code that does this slowly is as follows.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 3, 5], [2, 4, 6]]])
# so arr is
# array([[[1, 2, 3],
# [4, 5, 6]],
#
# [[1, 3, 5],
# [2, 4, 6]]])
# Trivial and slow version to replace rows containing 5 with [0,0,0]
for i in range(len(arr)):
    for j in range(len(arr[i])):
        if 5 in arr[i][j]:
            arr[i][j] = np.array([0, 0, 0])
# Now arr is
#
# array([[[1, 2, 3],
# [0, 0, 0]],
#
# [[0, 0, 0],
# [2, 4, 6]]])
How can we accomplish this taking advantage of numpy?
A simpler way would be -
arr[np.isin(arr,5).any(-1)] = 0
If it's just a single value that you are looking for, then we could simplify to -
arr[(arr==5).any(-1)] = 0
If you are looking to match against NaN, we need to do the comparison differently and use np.isnan instead -
arr[np.isnan(arr).any(-1)] = 0
If you are looking to assign array values, instead of just 0, the solutions stay the same. Hence it would be -
arr[(arr==5).any(-1)] = new_array
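To tie this back to the original NaN use case, here is a minimal sketch with a small float array. The array values and the choice of white, `[1.0, 1.0, 1.0]`, as the replacement pixel are assumptions made for illustration.

```python
import numpy as np

# Made-up 2 x 2 x 3 "RGB" array with two pixels containing a NaN.
rgb = np.array([[[0.1, 0.2, 0.3], [np.nan, 0.5, 0.6]],
                [[0.7, np.nan, 0.9], [1.0, 0.0, 0.5]]])

# True for every pixel (last-axis row) that contains at least one NaN.
bad = np.isnan(rgb).any(axis=-1)

# rgb[bad] has shape (number_of_bad_pixels, 3); the replacement broadcasts over it.
rgb[bad] = [1.0, 1.0, 1.0]

print(bad)
print(rgb)
```

The mask has the shape of the image without the channel axis, so the same line should work for any n x m x 3 array.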
Using np.broadcast_to
arr[np.broadcast_to((arr == 5).any(-1)[..., None], arr.shape)] = 0
array([[[1, 2, 3],
[0, 0, 0]],
[[0, 0, 0],
[2, 4, 6]]])
Just as FYI, based on your description, if you want to find np.nans instead of integers like 5, you shouldn't use ==, but rather np.isnan
arr[np.broadcast_to((np.isnan(arr)).any(-1)[..., None], arr.shape)] = 0
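You can also do it with the in1d function, as shown below. (np.isin is the newer equivalent and preserves the input's shape, so it needs no reshape.)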
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 3, 5], [2, 4, 6]]])
arr[np.in1d(arr,5).reshape(arr.shape).any(axis=2)] = [0,0,0]
arr
I'm working with 2D numpy arrays which exhibit variable sizes, in terms of the number of rows and columns. I'd like to pad this array with zeros both before the start of the first row and at the end of the last row, but I'd like the start/end of the zeros to be offset in a different way for each column of data.
So the original 2D array:
1 2 3
4 5 6
7 8 9
A normal example of padding:
0 0 0
0 0 0
1 2 3
4 5 6
7 8 9
0 0 0
Modified Padding with offsets (what I'm trying to do):
0 0 0
1 0 0
4 0 3
7 2 6
0 5 9
0 8 0
Does numpy possess any functions which can replicate the last example in an extendable manner for variable numbers of rows/columns, and which avoid for loops and other computationally slow approaches?
Here's a vectorized one with broadcasting and boolean-indexing -
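import numpy as np

def create_padded_array(a, row_start, n_rows):
    # Column vector of output row indices, broadcast against the per-column offsets
    r = np.arange(n_rows)[:,None]
    row_start = np.asarray(row_start)
    # True where output row r falls inside the window occupied by each column's data
    mask = (r >= row_start) & (r < row_start+a.shape[0])
    out = np.zeros(mask.shape, dtype=a.dtype)
    # Fill column by column, hence the transposes and the Fortran-order ravel
    out.T[mask.T] = a.ravel('F')
    return out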
Sample run -
In [184]: a
Out[184]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [185]: create_padded_array(a, row_start=[1,3,2], n_rows=6)
Out[185]:
array([[0, 0, 0],
[1, 0, 0],
[4, 0, 3],
[7, 2, 6],
[0, 5, 9],
[0, 8, 0]])
Sorry for the trouble, but I think I found the answer that I was looking for.
I can use numpy.pad to create an arbitrary number of filler zeros at the end of my original array. There is also a function called numpy.roll which can then be used to shift all array elements along a given axis by a set number of positions down the column.
After a quick test, it looks like this is extendable for an arbitrary number of matrix elements and allows a unique offset along each column.
Thanks to everyone for their responses to this question!
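For readers who want to see that idea spelled out, here is a rough sketch of the pad-then-roll approach. The helper name `pad_and_roll` and its parameters are hypothetical, and it assumes every offset plus the original row count fits within `n_rows`, since np.roll wraps values around otherwise. Note that it still loops over columns (not rows), because np.roll applies one shift per call.

```python
import numpy as np

def pad_and_roll(a, row_start, n_rows):
    # Hypothetical helper: append zero rows, then shift each column down.
    a = np.asarray(a)
    # Pad only at the bottom of axis 0 so the result has n_rows rows.
    out = np.pad(a, ((0, n_rows - a.shape[0]), (0, 0)), mode="constant")
    for col, shift in enumerate(row_start):
        # np.roll returns a shifted copy; trailing zeros wrap to the top.
        out[:, col] = np.roll(out[:, col], shift)
    return out

a = np.arange(1, 10).reshape(3, 3)
result = pad_and_roll(a, row_start=[1, 3, 2], n_rows=6)
print(result)
```

With the offsets `[1, 3, 2]` this should reproduce the "modified padding" layout from the question.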
To my knowledge there is no numpy function with those exact requirements. However, what you can do is start with your array:
In [10]: arr = np.array([(1,2,3),(4,5,6),(7,8,9)])
In [11]: arr
Out[11]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Then pad it:
In [12]: arr = np.pad(arr, ((2,1),(0,0)), 'constant', constant_values=(0))
In [13]: arr
Out[13]:
array([[0, 0, 0],
[0, 0, 0],
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[0, 0, 0]])
Then you can randomize with shuffle (which I assume is what you want to do).
Note that np.random.shuffle only shuffles rows. If this is satisfactory for your needs, then:
In [14]: np.random.shuffle(arr)
In [15]: arr
Out[15]:
array([[7, 8, 9],
[4, 5, 6],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[1, 2, 3]])
If this is not satisfactory you can do this:
First create a 1D array:
In [16]: arr = np.arange(1,10)
In [17]: arr
Out[17]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Then pad your array with zeros:
In [18]: arr = np.pad(arr, (6,3), 'constant', constant_values = (0))
In [19]: arr
Out[19]: array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0])
Then you shuffle the array:
In [20]: np.random.shuffle(arr)
In [21]: arr
Out[21]: array([4, 0, 0, 5, 0, 0, 3, 0, 0, 0, 8, 0, 7, 2, 1, 6, 0, 9])
Finally you reshape to the desired format:
In [22]: np.reshape(arr,[6,3])
Out[22]:
array([[4, 0, 0],
[5, 0, 0],
[3, 0, 0],
[0, 8, 0],
[7, 2, 1],
[6, 0, 9]])
Although this may seem lengthy, it is much quicker for large data sets than using for loops or any other Python control structures. If by offsets you mean changing the amount of randomness, you can shuffle only portions of the 1D array and then combine them with the rest of the data, so that only the portion you want is shuffled rather than the whole data set.
(If what you mean by offsets is different from my assumption above please clarify in a comment)
Is there a way in python/numpy/scipy to dynamically create a list of integers in a specific range, which can vary, and in which the numbers are ordered according to a distribution, like normal (Gaussian), exponential, or linear? I imagine something
like this for range 3:
[1,2,3]
[2,1,2]
[1,2,1]
[3,2,1]
for range 4:
[1,2,3,4]
[2,1,1,2]
[1,2,2,1]
[4,3,2,1]
for range 5:
[1,2,3,4,5]
[2,1,0,1,2]
[1,2,3,2,1]
[5,4,3,2,1]
We could use a bit of trickery with np.minimum to generate the symmetrical version in the third row. The second row is just the third row subtracted from 3. The first and last rows are just the range from 1 to n and its flipped version, respectively.
Thus, we would have one approach after row-stacking those rows to have a 2D array, like so -
def ranged_arr(n):
    r = np.arange(n)+1
    row3 = np.minimum(r,r[::-1])
    return np.c_[r, 3-row3, row3, r[::-1]].T
We could also use np.row_stack to do the stacking -
np.row_stack((r, 3-row3, row3, r[::-1]))
Sample runs -
In [106]: ranged_arr(n=3)
Out[106]:
array([[1, 2, 3],
[2, 1, 2],
[1, 2, 1],
[3, 2, 1]])
In [107]: ranged_arr(n=4)
Out[107]:
array([[1, 2, 3, 4],
[2, 1, 1, 2],
[1, 2, 2, 1],
[4, 3, 2, 1]])
In [108]: ranged_arr(n=5)
Out[108]:
array([[1, 2, 3, 4, 5],
[2, 1, 0, 1, 2],
[1, 2, 3, 2, 1],
[5, 4, 3, 2, 1]])
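For completeness, here is a self-contained sketch of the stacking variant using np.vstack. np.row_stack is documented as an alias of np.vstack, and I believe recent NumPy releases deprecate the alias, so np.vstack may be the safer spelling. The function name `ranged_arr_vstack` is made up for this example.

```python
import numpy as np

def ranged_arr_vstack(n):
    r = np.arange(1, n + 1)            # 1, 2, ..., n
    row3 = np.minimum(r, r[::-1])      # symmetric ramp: rises to the middle, then falls
    # Stack the four 1D rows directly into a (4, n) array.
    return np.vstack((r, 3 - row3, row3, r[::-1]))

out5 = ranged_arr_vstack(5)
print(out5)
```

This should give the same arrays as the sample runs above, without the extra transpose that np.c_ requires.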
I am trying to optimise some code by removing for loops and using numpy arrays only as I am working with large data sets.
I would like to take a 1D numpy array, for example:
a = [1, 2, 3, 4, 5]
and produce a 2D numpy array whereby the value in each column shifts along a place, for example in the case above for a I wish to have a function which returns:
[[1 2 3 4 5]
[0 1 2 3 4]
[0 0 1 2 3]
[0 0 0 1 2]
[0 0 0 0 1]]
I have found examples which use the strides function to do something similar to produce, for example:
[[1 2 3]
[2 3 4]
[3 4 5]]
However, I am trying to shift each of my columns in the other direction. Alternatively, one can view the problem as putting the first element of a on the first diagonal, the second element on the second diagonal, and so on. I would like to stress again that I want to avoid for and while loops and if statements entirely. Any help would be greatly appreciated.
Such a matrix is an example of a Toeplitz matrix. You could use scipy.linalg.toeplitz to create it:
In [32]: from scipy.linalg import toeplitz
In [33]: a = range(1,6)
In [34]: toeplitz(a, np.zeros_like(a)).T
Out[34]:
array([[1, 2, 3, 4, 5],
[0, 1, 2, 3, 4],
[0, 0, 1, 2, 3],
[0, 0, 0, 1, 2],
[0, 0, 0, 0, 1]])
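If SciPy is not available, the same matrix can be sketched with plain NumPy index arithmetic: entry (i, j) holds a[j - i] when j >= i and 0 otherwise. This is an alternative technique, not what scipy.linalg.toeplitz does internally, and the function name `shifted_rows` is an assumption for illustration.

```python
import numpy as np

def shifted_rows(a):
    a = np.asarray(a)
    n = a.size
    # offset[i, j] = j - i, built by broadcasting a row against a column.
    offset = np.arange(n)[None, :] - np.arange(n)[:, None]
    # Clip so the fancy index is always valid, then zero out entries below the diagonal.
    return np.where(offset >= 0, a[np.clip(offset, 0, n - 1)], 0)

m = shifted_rows([1, 2, 3, 4, 5])
print(m)
```

Like the toeplitz call, this allocates the full n x n result, so it trades memory for simplicity.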
Inspired by @EelcoHoogendoorn's answer, here's a variation that doesn't use as much memory as scipy.linalg.toeplitz:
In [47]: from numpy.lib.stride_tricks import as_strided
In [48]: a
Out[48]: array([1, 2, 3, 4, 5])
In [49]: t = as_strided(np.r_[a[::-1], np.zeros_like(a)], shape=(a.size,a.size), strides=(a.itemsize, a.itemsize))[:,::-1]
In [50]: t
Out[50]:
array([[1, 2, 3, 4, 5],
[0, 1, 2, 3, 4],
[0, 0, 1, 2, 3],
[0, 0, 0, 1, 2],
[0, 0, 0, 0, 1]])
The result should be treated as a "read only" array. Otherwise, you'll be in for some surprises when you change an element. For example:
In [51]: t[0,2] = 99
In [52]: t
Out[52]:
array([[ 1, 2, 99, 4, 5],
[ 0, 1, 2, 99, 4],
[ 0, 0, 1, 2, 99],
[ 0, 0, 0, 1, 2],
[ 0, 0, 0, 0, 1]])
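One way to guard against such accidental writes is to ask as_strided for a read-only view through its `writeable` parameter. The sketch below rebuilds the same view under that assumption and checks that the assignment is rejected; the variable names `buf` and `write_blocked` are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([1, 2, 3, 4, 5])
buf = np.r_[a[::-1], np.zeros_like(a)]

# writeable=False makes NumPy refuse assignments through the view.
t = as_strided(buf, shape=(a.size, a.size),
               strides=(buf.itemsize, buf.itemsize),
               writeable=False)[:, ::-1]

try:
    t[0, 2] = 99
    write_blocked = False
except ValueError:
    # Raised because the assignment destination is read-only.
    write_blocked = True

print(write_blocked)
print(t)
```

If an independent, modifiable matrix is needed, `t.copy()` gives one at the cost of the memory the view was saving.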
Here is the indexing-tricks based solution. Not nearly as elegant as the toeplitz solution already posted, but should memory consumption or performance be a concern, it is to be preferred. As demonstrated, this also makes it easy to subsequently manipulate the entries of the matrix in a consistent manner.
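import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(5)+1

def toeplitz_view(a):
    b = np.concatenate((np.zeros_like(a),a))
    i = a.itemsize
    # the full strided view reaches outside b; only the slice returned below is safe to use
    v = as_strided(b,
        shape=(len(b),)*2,
        strides=(-i, i))
    # return a view on the 'original' data as well, for manipulation
    return v[:len(a), len(a):], b[len(a):]

v, a = toeplitz_view(a)
print(v)
a[0] = 10     # changes the whole main diagonal
v[2,1] = -1   # changes the whole first sub-diagonal
print(v)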
I have the following 3 x 3 x 3 numpy array called a (the comments will make sense after you read the rest of the question):
array([[[8, 1, 0], # irrelevant 1 (is at position 1 rather than 0)
[1, 7, 5], # the 1 on this line is what I am after!
[1, 4, 9]], # irrelevant 1 (out of the "cross")
[[4, 0, 1], # irrelevant 1 (is at position 2 rather than 0)
[1, 0, 1], # I'm only after the first 1 on this line!
[6, 2, 1]], # irrelevant 1 (is at position 2 rather than 0)
[[0, 2, 2],
[0, 6, 7],
[3, 4, 9]]])
Furthermore, I have this list of indexes that refers to the "central cross" of said matrix, called idx:
[array([0, 1, 1, 1, 2]), array([1, 0, 1, 2, 1])]
EDIT: I call it "cross" as it marks the central column and row in the following:
>>> a[..., 0]
array([[8, 1, 1],
[4, 1, 6],
[0, 0, 3]])
What I would like to obtain is the indexes of all those arrays located at idx whose first value is 1, but I'm struggling to understand how to use numpy.where() in the right way. Since...
>>> a[..., 0][idx]
array([1, 4, 1, 6, 0])
...I tried...
>>> np.where(a[..., 0][idx] == 1)
(array([0, 2]),)
...but as you can see it returns the index of the sliced array, not of a, while I would like to get:
[array([0, 1]), array([1, 1])] #as a[0, 1, 0] and a [1, 1, 0] are equal to 1.
Thank you in advance for your help!
PS: In the comments it was suggested that I give a broader scenario of applicability. Although it is not what I am using it for, I suppose this could be used to process images as many 2D libraries do, with a source layer, a destination layer and a mask (see for example cairo). In this case the mask would be the idx array, and one might imagine working with the R channel of RGB colors (a[..., 0]).
You can translate the indices back using idx:
>>> w = np.where(a[..., 0][tuple(idx)] == 1)[0]
>>> np.array(idx).T[w]
array([[0, 1],
[1, 1]])
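A variation on the same idea, sketched here with the data from the question, is to build a boolean mask over the positions in idx and apply it to each index array. This returns the result in the same "list of index arrays" form that the question asks for. The names `rows`, `cols`, and `hit` are illustrative.

```python
import numpy as np

a = np.array([[[8, 1, 0], [1, 7, 5], [1, 4, 9]],
              [[4, 0, 1], [1, 0, 1], [6, 2, 1]],
              [[0, 2, 2], [0, 6, 7], [3, 4, 9]]])
idx = [np.array([0, 1, 1, 1, 2]), np.array([1, 0, 1, 2, 1])]

rows, cols = idx
# First value of each array on the cross, compared against 1.
hit = a[rows, cols, 0] == 1
# Keep only the (row, col) pairs where the comparison holds.
result = [rows[hit], cols[hit]]

print(result)
```

Because the mask is applied to the index arrays themselves, the output refers to positions in a rather than to positions in the sliced array.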