numpy select members with a certain shifted window - python

I want to extract some members from many large numpy arrays. A simple example is
A = np.arange(36).reshape(6,6)
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
I want to extract the members in a shifted windows in each row with minimum stride 2 and maximum stride 4. For example, in the first row, I would like to have
[2, 3, 4] # A[i,i+2:i+4+1] where i == 0
In the second row, I want to have
[9, 10, 11] # A[i,i+2:i+4+1] where i == 1
In the third row, I want to have
[16, 17, 0]
[[2, 3, 4],
[9, 10, 11],
[16 17, 0],
[23, 0, 0]]
I want to know efficient ways to do this. Thanks.

you can extract values in an array by providing a list of indices for each dimension. For example, if you want the second diagonal, you can use arr[np.arange(0, len(arr)-1), np.arange(1, len(arr))]
For your example, I would do smth like what's in the code below. although, I did not account for different strides. If you want to account for strides, you can change the behaviour of the index list creation. If you struggle with adding the stride functionality, comment and I'll edit this answer.
import numpy as np
def slide(arr, window_len = 3, start = 0):
# pad array to get zeros out of bounds
arr_padded = np.zeros((arr.shape[0], arr.shape[1]+window_len-1))
arr_padded[:arr.shape[0], :arr.shape[1]] = arr
# compute the number of window moves
repeats = min(arr.shape[0], arr.shape[1]-start)
# create index lists
idx0 = np.arange(repeats).repeat(window_len)
idx1 = np.concatenate(
[np.arange(start+i, start+i+window_len)
for i in range(repeats)])
return arr_padded[idx0, idx1].reshape(-1, window_len)
A = np.arange(36).reshape(6,6)
print(f'A =\n{A}')
print(f'B =\n{slide(A, start = 2)}')
output:
A =
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]
[30 31 32 33 34 35]]
B =
[[ 2. 3. 4.]
[ 9. 10. 11.]
[16. 17. 0.]
[23. 0. 0.]]

Related

replace missing values in a 3d array

I have a 3d array containing missing values:
arr = np.array([[[ 1, 13],[ 2, 14],[ 3, np.nan]],[[ 4, 16],[ 5, 17],[ 6, 18]],[[ np.nan, 19],[ 8, 20],[ 9, 21]],[[10, 22],[11, 23],[12, np.nan]]])
I would like to perform imputation to replace those missing values, preferably using nearest neighbors. I tried looking into sklearn.impute module but none of the functions accept a 3d array. I know I could flatten the array, but that will result in loss of spatial information. Are there any alternatives?
EDIT:
The array has a 3d spatial configuration and in the real world might look like this:
layer 2
13 14 nan
16 17 18
19 20 21
22 23 nan
layer 1
1 2 3
4 5 6
nan 8 9
10 11 12
for example, value 1 is a neighbor of 2 and 4 in layer 1.
By flattening arr,
[[ 1, 13],
[ 2, 14],
[ 3, np.nan],
[ 4, 16],
[ 5, 17],
[ 6, 18],
[ np.nan, 19],
[ 8, 20],
[ 9, 21],
[10, 22],
[11, 23],
[12, np.nan]]
it looks as though 4 is farther away from 1 than 2 is, but it isn't. 1 is just as close to 2 as it is to 4, just in different dimensions.

How to swap values in a 3-d array in Python?

I have a matrix (3x5) where a number is randomly selected in this matrix. I want to swap the selected number with the one down-right. I'm able to locate the index of the randomly selected number but not sure how to replace it with the one that is down then right. For example, given the matrix:
[[169 107 229 317 236]
[202 124 114 280 106]
[306 135 396 218 373]]
and the selected number is 280 (which is in position [1,3]), needs to be swapped with 373 on [2,4]. I'm having issues on how to move around with the index. I can hard-code it but it becomes a little more complex when the number to swap is randomly selected.
If the selected number is on [0,0], then hard-coded would look like:
selected_task = tard_generator1[0,0]
right_swap = tard_generator1[1,1]
tard_generator1[1,1] = selected_task
tard_generator1[0,0] = right_swap
Any suggestions are welcome!
How about something like
chosen = (1, 2)
right_down = chosen[0] + 1, chosen[1] + 1
matrix[chosen], matrix[right_down] = matrix[right_down], matrix[chosen]
will output:
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
>>> index = (1, 2)
>>> right_down = index[0] + 1, index[1] + 1
>>> a[index], a[right_down] = a[right_down], a[index]
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 13, 8, 9],
[10, 11, 12, 7, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
There should be a boundary check but its omitted
Try this:
import numpy as np
def swap_rdi(mat, index):
row, col = index
rows, cols = mat.shape
assert(row + 1 != rows and col + 1 != cols)
mat[row, col], mat[row+1, col+1] = mat[row+1, col+1], mat[row, col]
return
Example:
mat = np.matrix([[1,2,3], [4,5,6]])
print('Before:\n{}'.format(mat))
print('After:\n{}'.format(swap_rdi(mat, (0,1))))
Outputs:
Before:
[[1 2 3]
[4 5 6]]
After:
[[1 6 3]
[4 5 2]]

Does Tensorflow.unstack() Remove or Rearrange data?

I've been trying to understand how Tensorflow.Unstack() works. I've read the documentation a few times here: https://www.tensorflow.org/api_docs/python/tf/unstack
According to the Tensorflow documentation "the dimension unpacked along is gone". It sounds like unstacking a tensor removes data from the original tensor. Is this true? Or does it only rearrange the data?
In my code example, in Y, it appears that it has removed the fourth row of X. What confuses me, is why does it leave the row on the side of matrix? Is the function actually removing the row or leaving it there? I'm not quite sure what to make of the output.
import tensorflow as tf
X = tf.constant(np.array(range(24)).reshape(2, 3, 4))
Y = tf.unstack(X, axis=0)
with tf.Session() as sess:
print("X ", sess.run(X))
print("Y ", sess.run(Y))
#Ouput
X [[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
Y [array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]), array([[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])]
If we perform unstack on tensor it wont remove the data but it will rearrange it.
Syntax for tf.unstack as below:
tf.unstack(
value, num=None, axis=0, name='unstack'
)
Unstack: split the value(i.e. input) according to the specified axis, and output the list containing num elements.
Here X.shape is (2,3,4),
If axis=0, num must be filled with 2. After transformation, the list (i.e. output) has 2 elements, and the shape of the element is (3,4).
import tensorflow as tf
import numpy as np
print("Tensorflow Version:",tf.__version__)
X = tf.constant(np.array(range(24)).reshape(2, 3, 4))
Y = tf.unstack(X, axis=0)
with tf.Session() as sess:
print("\n")
print("Shape:", X)
print("\n")
print("X ", sess.run(X))
print("\n")
print("Shape:",Y)
print("\n")
print("Y ", sess.run(Y))
Output:
Tensorflow Version: 1.15.0
Shape: Tensor("Const_1:0", shape=(2, 3, 4), dtype=int64)
X [[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
Shape: [<tf.Tensor 'unstack_1:0' shape=(3, 4) dtype=int64>, <tf.Tensor 'unstack_1:1' shape=(3, 4) dtype=int64>]
Y [array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]), array([[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])]

Pythonic way to get both diagonals passing through a matrix entry (i,j)

What is the Pythonic way to get a list of diagonal elements in a matrix passing through entry (i,j)?
For e.g., given a matrix like:
[1 2 3 4 5]
[6 7 8 9 10]
[11 12 13 14 15]
[16 17 18 19 20]
[21 22 23 24 25]
and an entry, say, (1,3) (representing element 9) how can I get the elements in the diagonals passing through 9 in a Pythonic way? Basically, [3,9,15] and [5,9,13,17,21] both.
Using np.diagonal with a little offset logic.
import numpy as np
lst = np.array([[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
i, j = 1, 3
major = np.diagonal(lst, offset=(j - i))
print(major)
array([ 3, 9, 15])
minor = np.diagonal(np.rot90(lst), offset=-lst.shape[1] + (j + i) + 1)
print(minor)
array([ 5, 9, 13, 17, 21])
The indices i and j are the row and column. By specifying the offset, numpy knows from where to begin selecting elements for the diagonal.
For the major diagonal, You want to start collecting from 3 in the first row. So you need to take the current column index and subtract it by the current row index, to figure out the correct column index at the 0th row. Similarly for the minor diagonal, where the array is flipped (rotated by 90˚) and the process repeats.
As another alternative method, with raveling the array and for matrix with shape (n*n):
array = np.array([[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
x, y = 1, 3
a_mod = array.ravel()
size = array.shape[0]
if y >= x:
diag = a_mod[y-x:(x+size-y)*size:size+1]
else:
diag = a_mod[(x-y)*size::size+1]
if x-(size-1-y) >= 0:
reverse_diag = array[:, ::-1].ravel()[(x-(size-1-y))*size::size+1]
else:
reverse_diag = a_mod[x:x*size+1:size-1]
# diag --> [ 3 9 15]
# reverse_diag --> [ 5 9 13 17 21]
The correctness of the resulted arrays must be checked further. This can be developed to handle matrices with other shapes e.g. (n*m).

Transforming multiindex to row-wise multi-dimensional NumPy array.

Suppose I have a MultiIndex DataFrame similar to an example from the MultiIndex docs.
>>> df
0 1 2 3
first second
bar one 0 1 2 3
two 4 5 6 7
baz one 8 9 10 11
two 12 13 14 15
foo one 16 17 18 19
two 20 21 22 23
qux one 24 25 26 27
two 28 29 30 31
I want to generate a NumPy array from this DataFrame with a 3-dimensional structure like
>>> desired_arr
array([[[ 0, 4],
[ 1, 5],
[ 2, 6],
[ 3, 7]],
[[ 8, 12],
[ 9, 13],
[10, 14],
[11, 15]],
[[16, 20],
[17, 21],
[18, 22],
[19, 23]],
[[24, 28],
[25, 29],
[26, 30],
[27, 31]]])
How can I do so?
Hopefully it is clear what is happening here - I am effectively unstacking the DataFrame by the first level and then trying to turn each top level in the resulting column MultiIndex to its own 2-dimensional array.
I can get half way there with
>>> df.unstack(1)
0 1 2 3
second one two one two one two one two
first
bar 0 4 1 5 2 6 3 7
baz 8 12 9 13 10 14 11 15
foo 16 20 17 21 18 22 19 23
qux 24 28 25 29 26 30 27 31
but then I am struggling to find a nice way to turn each column into a 2-dimensional array and then join them together, beyond doing so explicitly with loops and lists.
I feel like there should be some way for me to specify the shape of my desired NumPy array beforehand, fill it with np.nan and then use a specific iterating order to fill the values with my DataFrame, but I have not managed to solve the problem with this approach yet .
To generate the sample DataFrame
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8*4).reshape((8, 4)), index=ind)
Some reshape and swapaxes magic -
df.values.reshape(4,2,-1).swapaxes(1,2)
Generalizable to -
m,n = len(df.index.levels[0]), len(df.index.levels[1])
arr = df.values.reshape(m,n,-1).swapaxes(1,2)
Basically splitting the first axis into two of lengths 4 and 2 creating a 3D array and then swapping the last two axes, i.e. pushing in the axis of length 2 to the back (as the last one).
Sample output -
In [35]: df.values.reshape(4,2,-1).swapaxes(1,2)
Out[35]:
array([[[ 0, 4],
[ 1, 5],
[ 2, 6],
[ 3, 7]],
[[ 8, 12],
[ 9, 13],
[10, 14],
[11, 15]],
[[16, 20],
[17, 21],
[18, 22],
[19, 23]],
[[24, 28],
[25, 29],
[26, 30],
[27, 31]]])
to complete the answer of #divakar, for a multidimensionnal generalisation :
# sort values by index
A = df.sort_index()
# fill na
for idx in A.index.names:
A = A.unstack(idx).fillna(0).stack(1)
# create a tuple with the rights dimensions
reshape_size = tuple([len(x) for x in A.index.levels])
# reshape
arr = np.reshape(A.values, reshape_size ).swapaxes(0,1)

Categories