Columns of each row in 2D Numpy array do not shuffle [duplicate] - python

Suppose I have a matrix A with some arbitrary values:
array([[ 2, 4, 5, 3],
[ 1, 6, 8, 9],
[ 8, 7, 0, 2]])
And a matrix B which contains indices of elements in A:
array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
How do I select values from A pointed by B, i.e.:
A[B] = [[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]]

EDIT: np.take_along_axis is a built-in function for this use case, available since NumPy 1.15. See @hpaulj's answer below for how to use it.
You can use NumPy's advanced indexing -
A[np.arange(A.shape[0])[:,None],B]
One can also use linear indexing -
m,n = A.shape
out = np.take(A,B + n*np.arange(m)[:,None])
Sample run -
In [40]: A
Out[40]:
array([[2, 4, 5, 3],
[1, 6, 8, 9],
[8, 7, 0, 2]])
In [41]: B
Out[41]:
array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
In [42]: A[np.arange(A.shape[0])[:,None],B]
Out[42]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
In [43]: m,n = A.shape
In [44]: np.take(A,B + n*np.arange(m)[:,None])
Out[44]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
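To make the two approaches above a bit more concrete, here is a minimal sketch of what each one computes. The names rows, flat, by_index and by_flat are only illustrative and not part of the original answer.

```python
import numpy as np

A = np.array([[2, 4, 5, 3],
              [1, 6, 8, 9],
              [8, 7, 0, 2]])
B = np.array([[0, 0, 1, 2],
              [0, 3, 2, 1],
              [3, 2, 1, 0]])

m, n = A.shape

# Advanced indexing: a (m, 1) column of row numbers broadcasts against
# the (m, n) column indices in B, pairing every B[i, j] with row i.
rows = np.arange(m)[:, None]
by_index = A[rows, B]

# Linear indexing: element (i, j) of a C-ordered (m, n) array sits at
# flat position i*n + j, so adding n*i to row i of B gives flat offsets.
flat = B + n * rows
by_flat = np.take(A, flat)

print(by_index)
print(by_flat)
```

Both results should be the same array as in the sample run.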

More recent versions have added a take_along_axis function that does the job:
A = np.array([[ 2, 4, 5, 3],
[ 1, 6, 8, 9],
[ 8, 7, 0, 2]])
B = np.array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
np.take_along_axis(A, B, 1)
Out[]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
There's also a put_along_axis.
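As a rough illustration of put_along_axis (the example itself is made up, not from the question): it writes values at per-row positions in place, so it pairs naturally with argmax or argsort. Here the largest entry of each row is overwritten with -1.

```python
import numpy as np

A = np.array([[2, 4, 5, 3],
              [1, 6, 8, 9],
              [8, 7, 0, 2]])

C = A.copy()                          # put_along_axis modifies its target in place
idx = A.argmax(axis=1)[:, None]       # column of the per-row maximum, shape (3, 1)
np.put_along_axis(C, idx, -1, axis=1) # returns None, C is updated

print(C)
```

The index array must have the same number of dimensions as the target, which is why the argmax result gets the extra [:, None] axis.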

I know this is an old question, but another way of doing it using indices is:
A[np.indices(B.shape)[0], B]
output:
[[2 2 4 5]
[1 9 8 6]
[2 0 7 8]]
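For what it's worth, np.indices(B.shape)[0] is just the fully materialized grid of row numbers, i.e. the same thing the broadcast np.arange(...)[:, None] stands for. A small check, assuming B has as many rows as A (variable names are mine):

```python
import numpy as np

A = np.array([[2, 4, 5, 3],
              [1, 6, 8, 9],
              [8, 7, 0, 2]])
B = np.array([[0, 0, 1, 2],
              [0, 3, 2, 1],
              [3, 2, 1, 0]])

rows_grid = np.indices(B.shape)[0]        # full (3, 4) array of row numbers
rows_col = np.arange(B.shape[0])[:, None] # (3, 1) column, broadcast on use

print(rows_grid)
print(np.array_equal(A[rows_grid, B], A[rows_col, B]))
```

The np.indices form allocates an index array as large as B, while the broadcast form does not, so the latter is usually the lighter option for big inputs.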

Following is the solution using a for loop:
outlist = []
for i in range(len(B)):
    lst = []
    for j in range(len(B[i])):
        lst.append(A[i][B[i][j]])
    outlist.append(lst)
outarray = np.asarray(outlist)
print(outarray)
The above can also be written more succinctly as a list comprehension:
outlist = [[A[i][B[i][j]] for j in range(len(B[i]))]
           for i in range(len(B))]
outarray = np.asarray(outlist)
print(outarray)
Output:
[[2 2 4 5]
[1 9 8 6]
[2 0 7 8]]

Related

Given indexes, get values from numpy matrix

Let's say I have this numpy matrix:
>>> mat = np.matrix([[3,4,5,2,1], [1,2,7,6,5], [8,9,4,5,2]])
>>> mat
matrix([[3, 4, 5, 2, 1],
[1, 2, 7, 6, 5],
[8, 9, 4, 5, 2]])
Now let's say I have some indexes in this form:
>>> ind = np.matrix([[0,2,3], [0,4,2], [3,1,2]])
>>> ind
matrix([[0, 2, 3],
[0, 4, 2],
[3, 1, 2]])
What I would like to do is to get three values from each row of the matrix, specifically values at columns 0, 2, and 3 for the first row, values at columns 0, 4 and 2 for the second row, etc. This is the expected output:
matrix([[3, 5, 2],
[1, 5, 7],
[5, 9, 4]])
I've tried using np.take but it doesn't seem to work. Any suggestion?
This is take_along_axis.
>>> np.take_along_axis(mat, ind, axis=1)
matrix([[3, 5, 2],
[1, 5, 7],
[5, 9, 4]])
This will do it: mat[np.arange(3).reshape(-1, 1), ind]
In [245]: mat[np.arange(3).reshape(-1, 1), ind]
Out[245]:
matrix([[3, 5, 2],
[1, 5, 7],
[5, 9, 4]])
(but take_along_axis in @user3483203's answer is simpler).
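As to why plain np.take "doesn't seem to work": without an axis argument it indexes the flattened array, so every index lands in the first row. A small sketch with a regular ndarray instead of np.matrix (arr and the other names are only for illustration):

```python
import numpy as np

arr = np.array([[3, 4, 5, 2, 1],
                [1, 2, 7, 6, 5],
                [8, 9, 4, 5, 2]])
ind = np.array([[0, 2, 3],
                [0, 4, 2],
                [3, 1, 2]])

flat_pick = np.take(arr, ind)                 # indexes arr.ravel(), ignores rows
row_pick = np.take_along_axis(arr, ind, axis=1)  # row i of ind picks from row i of arr

print(flat_pick)
print(row_pick)
```

Only row_pick matches the expected output from the question. Note that ind does not need as many columns as arr, only the same number of rows.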

Python - Reshape matrix by taking n consecutive rows every n rows

There is a bunch of questions regarding reshaping of matrices using NumPy here on stackoverflow. I have found one that is closely related to what I am trying to achieve. However, this answer is not general enough for my application. So here we are.
I have got a matrix with millions of lines (shape m x n) that looks like this:
[[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4],
[5, 5, 5, 5],
[6, 6, 6, 6],
[7, 7, 7, 7],
[...]]
From this I would like to go to a shape m/2 x 2n like it can be seen below. For that one has to take n consecutive rows every n rows (in this example n = 2). The blocks of consecutively taken rows are then horizontally stacked to the untouched rows. In this example that would mean:
The first two rows stay like they are.
Take row two and three and horizontally concatenate them to row zero and one.
Take row six and seven and horizontally concatenate them to row four and five. This concatenated block then becomes row two and three.
...
[[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7],
[...]]
How would I most efficiently (in terms of the least computation time possible) do that using Numpy? And would it make sense to speed the process up using Numba? Or is there not much to speed up?
Assuming your array's length is divisible by 4, here is one way to do it using numpy.hstack, after creating the correct indices for selecting the rows for the "left" and "right" parts of the resulting array:
import numpy as np
# Create the array
N = 1000*4
a = np.hstack([np.arange(0, N)[:, None]]*4) #shape (4000, 4)
a
array([[ 0, 0, 0, 0],
[ 1, 1, 1, 1],
[ 2, 2, 2, 2],
...,
[3997, 3997, 3997, 3997],
[3998, 3998, 3998, 3998],
[3999, 3999, 3999, 3999]])
left_idx = np.array([np.array([0,1]) + 4*i for i in range(N//4)]).reshape(-1)
right_idx = np.array([np.array([2,3]) + 4*i for i in range(N//4)]).reshape(-1)
r = np.hstack([a[left_idx], a[right_idx]]) #shape (2000, 8)
r
array([[ 0, 0, 0, ..., 2, 2, 2],
[ 1, 1, 1, ..., 3, 3, 3],
[ 4, 4, 4, ..., 6, 6, 6],
...,
[3993, 3993, 3993, ..., 3995, 3995, 3995],
[3996, 3996, 3996, ..., 3998, 3998, 3998],
[3997, 3997, 3997, ..., 3999, 3999, 3999]])
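The two index arrays can also be built without Python-level list comprehensions, which may matter with millions of rows. A sketch under the same assumption (row count divisible by 4), using a small N so the result is easy to inspect:

```python
import numpy as np

N = 4 * 4
a = np.repeat(np.arange(N)[:, None], 4, axis=1)   # shape (16, 4), row i is all i

# Lay the row numbers out in groups of 4: columns 0-1 are the "left"
# rows of each group, columns 2-3 the "right" rows.
idx = np.arange(N).reshape(-1, 4)
left_idx = idx[:, :2].ravel()
right_idx = idx[:, 2:].ravel()

r = np.hstack([a[left_idx], a[right_idx]])        # shape (8, 8)
print(r)
```

This should give the same rows as the list-comprehension version, just with the index construction done in NumPy.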
Here's an application of the swapaxes answer in your link.
In [11]: x=np.array([[0, 0, 0, 0],
...: [1, 1, 1, 1],
...: [2, 2, 2, 2],
...: [3, 3, 3, 3],
...: [4, 4, 4, 4],
...: [5, 5, 5, 5],
...: [6, 6, 6, 6],
...: [7, 7, 7, 7]])
break the array into 'groups' with a reshape, keeping the number of columns (4) unchanged.
In [17]: x.reshape(2,2,2,4)
Out[17]:
array([[[[0, 0, 0, 0],
[1, 1, 1, 1]],
[[2, 2, 2, 2],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[5, 5, 5, 5]],
[[6, 6, 6, 6],
[7, 7, 7, 7]]]])
swap the 2 middle dimensions, regrouping rows:
In [18]: x.reshape(2,2,2,4).transpose(0,2,1,3)
Out[18]:
array([[[[0, 0, 0, 0],
[2, 2, 2, 2]],
[[1, 1, 1, 1],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[6, 6, 6, 6]],
[[5, 5, 5, 5],
[7, 7, 7, 7]]]])
Then back to the target shape. This final step creates a copy of the original (the previous steps were views):
In [19]: x.reshape(2,2,2,4).transpose(0,2,1,3).reshape(4,8)
Out[19]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7]])
It's hard to generalize this, since there are different ways of rearranging blocks. For example my first try produced:
In [16]: x.reshape(4,2,4).transpose(1,0,2).reshape(4,8)
Out[16]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[4, 4, 4, 4, 6, 6, 6, 6],
[1, 1, 1, 1, 3, 3, 3, 3],
[5, 5, 5, 5, 7, 7, 7, 7]])
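For the specific arrangement asked about in the question, the reshape/transpose chain can be wrapped in a small helper. This is only a sketch: regroup_rows and its block parameter are names I made up, and it assumes the number of rows is a multiple of 2*block.

```python
import numpy as np

def regroup_rows(x, block):
    """Place each run of `block` rows next to the `block` rows that follow it."""
    m, c = x.shape
    if m % (2 * block):
        raise ValueError("number of rows must be a multiple of 2*block")
    # (groups, left/right, row within block, columns)
    grouped = x.reshape(-1, 2, block, c)
    # Swap the middle axes so each output row holds its left and right halves.
    return grouped.transpose(0, 2, 1, 3).reshape(-1, 2 * c)

x = np.repeat(np.arange(8)[:, None], 4, axis=1)   # the 8x4 example array
out = regroup_rows(x, 2)
print(out)
```

With block=2 this reproduces Out[19] above. The final reshape still has to copy, so the cost is roughly one pass over the data.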

How to add a new column in a empty list from another list in python

I have a list
L=[[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
another list
col=[0,2,3]
and an empty list M = [].
The col list holds the indices of the columns of L that have to be copied to M.
So M should be [[1,3,0],[4,6,0],[7,9,0]].
How can I do this?
I want M as a dataframe.
>>> L=[[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
>>> col=[0,2,3]
>>> M = [[nums[i] for i in col] for nums in L]
>>> M
[[1, 3, 0], [4, 6, 0], [7, 9, 0]]
With numpy you can use a list as list index:
>>> import numpy as np
>>> L=np.array([[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]])
>>> col=[0,2,3]
>>> M = [row[col] for row in L]
>>> M
[array([1, 3, 0]), array([4, 6, 0]), array([7, 9, 0])]
>>> M = [list(row[col]) for row in L]
>>> M
[[1, 3, 0], [4, 6, 0], [7, 9, 0]]
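Once L is a NumPy array, the per-row loop is not strictly needed: indexing the column axis with the list selects those columns for all rows at once. A minimal sketch:

```python
import numpy as np

L = np.array([[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]])
col = [0, 2, 3]

M = L[:, col]        # all rows, only the listed columns (this is a copy)
print(M)
print(M.tolist())    # back to nested Python lists if needed
```

The result is a 2D array rather than a list of arrays.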
You can use operator.itemgetter along with simple list comprehension to fetch your desired elements
>>> from operator import itemgetter
>>> L = [[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
>>> col = [0,2,3]
>>> M = [list(itemgetter(*col)(i)) for i in L]
>>> M
[[1, 3, 0], [4, 6, 0], [7, 9, 0]]
To Convert it to DataFrame you can do
>>> import pandas as pd
>>> df = pd.DataFrame(M)
>>> df
0 1 2
0 1 3 0
1 4 6 0
2 7 9 0
Just to put it out there, the enumerate solution:
L=[[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
col=[0,2,3]
solution = [[j for i, j in enumerate(sub) if i in col] for sub in L]
#[[1, 3, 0], [4, 6, 0], [7, 9, 0]]
This would work faster if col was a set:
col={0,2,3}
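One caveat with the enumerate version: it filters columns in their original order, so it ignores the order (and any repeats) in col. That makes no difference for col=[0,2,3], but it does for something like [3, 0]. A quick comparison, with variable names of my own choosing:

```python
L = [[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
col = [3, 0]

# Indexing follows the order given in col.
by_index = [[row[i] for i in col] for row in L]

# Filtering with enumerate keeps the columns in their original order.
wanted = set(col)
by_filter = [[v for i, v in enumerate(row) if i in wanted] for row in L]

print(by_index)
print(by_filter)
```

So the set-based speedup is fine as long as the column order in the result does not matter.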

Cross-concatenate two matrices in tensorflow

Given two matrices, for example
[1 2] and [5 6]
[3 4] [7 8],
is there a way to concatenate them to get the following matrix?
[1 2 1 2]
[3 4 3 4]
[5 5 6 6]
[7 7 8 8]
NumPy based approach
Using array-initialization with broadcasted-assignment -
def assign_as_blocks(a,b):
    m1,n1 = a.shape
    m2,n2 = b.shape
    out = np.empty((m1+m2,n2,n1),dtype=int)
    out[:m1] = a[:,None,:]
    out[m1:] = b[:,:,None]
    return out.reshape(m1+m2,-1)
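To use TensorFlow tools, a modified version would be:
def assign_as_blocks_v2(a,b):
    shape1 = tf.shape(a)
    shape2 = tf.shape(b)
    m1 = shape1[0]
    n1 = shape1[1]
    m2 = shape2[0]
    n2 = shape2[1]
    # Repeat each row of a n2 times side by side: [1 2] -> [1 2 1 2]
    p1 = tf.tile(a,[1,n2])
    # Repeat each element of b n1 times: [5 6] -> [5 5 6 6].
    # The new axis goes last (axis 2) so the copies of an element stay adjacent.
    p2 = tf.reshape(tf.tile(tf.expand_dims(b, 2),[1,1,n1]), [m2,-1])
    out = tf.concat((p1,p2),axis=0)
    return out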
Sample runs
Case #1 (Sample from question) :
In [95]: a
Out[95]:
array([[1, 2],
[3, 4]])
In [96]: b
Out[96]:
array([[5, 6],
[7, 8]])
In [97]: assign_as_blocks(a, b)
Out[97]:
array([[1, 2, 1, 2],
[3, 4, 3, 4],
[5, 5, 6, 6],
[7, 7, 8, 8]])
Case #2 (Generic shaped random array) :
In [106]: np.random.seed(0)
...: a = np.random.randint(0,9,(2,3))
...: b = np.random.randint(0,9,(4,5))
In [107]: a
Out[107]:
array([[5, 0, 3],
[3, 7, 3]])
In [108]: b
Out[108]:
array([[5, 2, 4, 7, 6],
[8, 8, 1, 6, 7],
[7, 8, 1, 5, 8],
[4, 3, 0, 3, 5]])
In [109]: assign_as_blocks(a, b)
Out[109]:
array([[5, 0, 3, 5, 0, 3, 5, 0, 3, 5, 0, 3, 5, 0, 3],
[3, 7, 3, 3, 7, 3, 3, 7, 3, 3, 7, 3, 3, 7, 3],
[5, 5, 5, 2, 2, 2, 4, 4, 4, 7, 7, 7, 6, 6, 6],
[8, 8, 8, 8, 8, 8, 1, 1, 1, 6, 6, 6, 7, 7, 7],
[7, 7, 7, 8, 8, 8, 1, 1, 1, 5, 5, 5, 8, 8, 8],
[4, 4, 4, 3, 3, 3, 0, 0, 0, 3, 3, 3, 5, 5, 5]])
>>> a = [[1,2],[3,4]]
>>> b = [[5,6],[7,8]]
>>> np.r_[np.kron([1,1],a),np.kron(b, [1,1])]
array([[1, 2, 1, 2],
[3, 4, 3, 4],
[5, 5, 6, 6],
[7, 7, 8, 8]])
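The same block layout can also be spelled with np.tile and np.repeat, which some may find easier to read than np.kron. This is just a sketch, and cross_concat is a made-up name:

```python
import numpy as np

def cross_concat(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    n1 = a.shape[1]
    n2 = b.shape[1]
    top = np.tile(a, (1, n2))           # whole rows of a repeated: [1 2] -> [1 2 1 2]
    bottom = np.repeat(b, n1, axis=1)   # each element of b repeated: [5 6] -> [5 5 6 6]
    return np.vstack([top, bottom])

print(cross_concat([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Both halves end up with n1*n2 columns, so this also works when a and b have different shapes, as in Case #2 of the other answer.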

Efficient way of making a list of pairs from an array in Numpy

I have a numpy array x (with (n,4) shape) of integers like:
[[0 1 2 3],
[1 2 7 9],
[2 1 5 2],
...]
I want to transform the array into an array of pairs:
[0,1]
[0,2]
[0,3]
[1,2]
...
so first element makes a pair with other elements in the same sub-array. I have already a for-loop solution:
y=np.array([[x[j,0],x[j,i]] for i in range(1,4) for j in range(0,n)],dtype=int)
but since looping over numpy array is not efficient, I tried slicing as the solution. I can do the slicing for every column as:
y[1]=np.array([x[:,0],x[:,1]]).T
# [[0,1],[1,2],[2,1],...]
I can repeat this for all columns. My questions are:
How can I append y[2] to y[1],... such that the shape is (N,2)?
If number of columns is not small (in this example 4), how can I find y[i] elegantly?
What are the alternative ways to achieve the final array?
The cleanest way of doing this I can think of would be:
>>> x = np.arange(12).reshape(3, 4)
>>> x
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> n = x.shape[1] - 1
>>> y = np.repeat(x, (n,)+(1,)*n, axis=1)
>>> y
array([[ 0, 0, 0, 1, 2, 3],
[ 4, 4, 4, 5, 6, 7],
[ 8, 8, 8, 9, 10, 11]])
>>> y.reshape(-1, 2, n).transpose(0, 2, 1).reshape(-1, 2)
array([[ 0, 1],
[ 0, 2],
[ 0, 3],
[ 4, 5],
[ 4, 6],
[ 4, 7],
[ 8, 9],
[ 8, 10],
[ 8, 11]])
This will make two copies of the data, so it will not be the most efficient method. That would probably be something like:
>>> y = np.empty((x.shape[0], n, 2), dtype=x.dtype)
>>> y[..., 0] = x[:, 0, None]
>>> y[..., 1] = x[:, 1:]
>>> y.shape = (-1, 2)
>>> y
array([[ 0, 1],
[ 0, 2],
[ 0, 3],
[ 4, 5],
[ 4, 6],
[ 4, 7],
[ 8, 9],
[ 8, 10],
[ 8, 11]])
Like Jaimie, I first tried a repeat of the 1st column followed by reshaping, but then decided it was simpler to make 2 intermediary arrays, and hstack them:
x=np.array([[0,1,2,3],[1,2,7,9],[2,1,5,2]])
m,n=x.shape
x1=x[:,0].repeat(n-1)[:,None]
x2=x[:,1:].reshape(-1,1)
np.hstack([x1,x2])
producing
array([[0, 1],
[0, 2],
[0, 3],
[1, 2],
[1, 7],
[1, 9],
[2, 1],
[2, 5],
[2, 2]])
There probably are other ways of doing this sort of rearrangement. The result will copy the original data in one way or other. My guess is that as long as you are using compiled functions like reshape and repeat, the time differences won't be significant.
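One such alternative, sketched here with a helper name of my own (pair_first_with_rest), is to broadcast the first column against the remaining ones and stack the two along a new last axis:

```python
import numpy as np

def pair_first_with_rest(x):
    rest = x[:, 1:]                                 # shape (m, n-1)
    first = np.broadcast_to(x[:, :1], rest.shape)   # read-only view, no copy yet
    # Stack into (m, n-1, 2) and flatten the first two axes into pairs.
    return np.stack([first, rest], axis=-1).reshape(-1, 2)

x = np.array([[0, 1, 2, 3], [1, 2, 7, 9], [2, 1, 5, 2]])
pairs = pair_first_with_rest(x)
print(pairs)
```

This works for any number of columns and produces the same row-by-row ordering as the hstack version above.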
Suppose the numpy array is
arr = np.array([[0, 1, 2, 3],
[1, 2, 7, 9],
[2, 1, 5, 2]])
You can get the array of pairs as
import itertools
m, n = arr.shape
# Pair the first element of each row with every other element of that row
new_arr = np.array([x for i in range(m)
                    for x in itertools.product(arr[i, 0 : 1], arr[i, 1 : n])])
The output would be
array([[0, 1],
[0, 2],
[0, 3],
[1, 2],
[1, 7],
[1, 9],
[2, 1],
[2, 5],
[2, 2]])
