Let's say I have this numpy matrix:
>>> mat = np.matrix([[3,4,5,2,1], [1,2,7,6,5], [8,9,4,5,2]])
>>> mat
matrix([[3, 4, 5, 2, 1],
[1, 2, 7, 6, 5],
[8, 9, 4, 5, 2]])
Now let's say I have some indexes in this form:
>>> ind = np.matrix([[0,2,3], [0,4,2], [3,1,2]])
>>> ind
matrix([[0, 2, 3],
[0, 4, 2],
[3, 1, 2]])
What I would like to do is to get three values from each row of the matrix, specifically values at columns 0, 2, and 3 for the first row, values at columns 0, 4 and 2 for the second row, etc. This is the expected output:
matrix([[3, 5, 2],
[1, 5, 7],
[5, 9, 4]])
I've tried using np.take but it doesn't seem to work. Any suggestion?
This is what np.take_along_axis does.
>>> np.take_along_axis(mat, ind, axis=1)
matrix([[3, 5, 2],
[1, 5, 7],
[5, 9, 4]])
This will do it: mat[np.arange(3).reshape(-1, 1), ind]
In [245]: mat[np.arange(3).reshape(-1, 1), ind]
Out[245]:
matrix([[3, 5, 2],
[1, 5, 7],
[5, 9, 4]])
(but take_along_axis in @user3483203's answer is simpler).
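To see why the two answers agree, here is a minimal sketch using plain ndarrays instead of np.matrix (that substitution, and the names rows, picked_by_index and picked_by_take, are assumptions of this example). The column vector of row numbers broadcasts against ind, so each entry of ind is paired with the number of the row it sits in.

```python
import numpy as np

# Same data as in the question, but as plain ndarrays (an assumption of this sketch).
mat = np.array([[3, 4, 5, 2, 1], [1, 2, 7, 6, 5], [8, 9, 4, 5, 2]])
ind = np.array([[0, 2, 3], [0, 4, 2], [3, 1, 2]])

# Row numbers as a column vector, shape (3, 1); broadcasts against ind, shape (3, 3).
rows = np.arange(mat.shape[0])[:, None]

picked_by_index = mat[rows, ind]                       # integer-array indexing
picked_by_take = np.take_along_axis(mat, ind, axis=1)  # same selection, one call

print(picked_by_index)
print(np.array_equal(picked_by_index, picked_by_take))
```

Using mat.shape[0] rather than a hard-coded 3 keeps the indexing version valid for any number of rows.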
There is a bunch of questions regarding reshaping of matrices using NumPy here on stackoverflow. I have found one that is closely related to what I am trying to achieve. However, this answer is not general enough for my application. So here we are.
I have got a matrix with millions of lines (shape m x n) that looks like this:
[[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4],
[5, 5, 5, 5],
[6, 6, 6, 6],
[7, 7, 7, 7],
[...]]
From this I would like to get to a shape of m/2 x 2n, as shown below. To do that, one has to take n consecutive rows every n rows (in this example n = 2). The blocks of consecutively taken rows are then horizontally stacked onto the untouched rows. In this example that would mean:
The first two rows stay like they are.
Take row two and three and horizontally concatenate them to row zero and one.
Take row six and seven and horizontally concatenate them to row four and five. This concatenated block then becomes row two and three.
...
[[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7],
[...]]
How would I most efficiently (in terms of the least computation time possible) do that using Numpy? And would it make sense to speed the process up using Numba? Or is there not much to speed up?
Assuming your array's length is divisible by 4, here is one way you can do it using numpy.hstack after creating the correct indices for selecting the rows for the "left" and "right" parts of the resulting array:
import numpy as np
# Create the array
N = 1000*4
a = np.hstack([np.arange(0, N)[:, None]]*4) #shape (4000, 4)
a
array([[ 0, 0, 0, 0],
[ 1, 1, 1, 1],
[ 2, 2, 2, 2],
...,
[3997, 3997, 3997, 3997],
[3998, 3998, 3998, 3998],
[3999, 3999, 3999, 3999]])
left_idx = np.array([np.array([0,1]) + 4*i for i in range(N//4)]).reshape(-1)
right_idx = np.array([np.array([2,3]) + 4*i for i in range(N//4)]).reshape(-1)
r = np.hstack([a[left_idx], a[right_idx]]) #shape (2000, 8)
r
array([[ 0, 0, 0, ..., 2, 2, 2],
[ 1, 1, 1, ..., 3, 3, 3],
[ 4, 4, 4, ..., 6, 6, 6],
...,
[3993, 3993, 3993, ..., 3995, 3995, 3995],
[3996, 3996, 3996, ..., 3998, 3998, 3998],
[3997, 3997, 3997, ..., 3999, 3999, 3999]])
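The two index arrays can also be built without a Python-level loop. The following is only a sketch of that idea, on a smaller array (N = 12 is chosen here for readability, and the helper name idx is hypothetical): reshape the row numbers into groups of four, then split each group into its first and second half.

```python
import numpy as np

N = 4 * 3                                   # small example; must be divisible by 4
a = np.hstack([np.arange(N)[:, None]] * 4)  # shape (12, 4), row i holds the value i

# Row numbers grouped in fours: [[0, 1, 2, 3], [4, 5, 6, 7], ...]
idx = np.arange(N).reshape(-1, 4)
left_idx = idx[:, :2].ravel()    # first two rows of every group
right_idx = idx[:, 2:].ravel()   # last two rows of every group

r = np.hstack([a[left_idx], a[right_idx]])  # shape (6, 8)
print(r)
```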
Here's an application of the swapaxes answer in your link.
In [11]: x=np.array([[0, 0, 0, 0],
...: [1, 1, 1, 1],
...: [2, 2, 2, 2],
...: [3, 3, 3, 3],
...: [4, 4, 4, 4],
...: [5, 5, 5, 5],
...: [6, 6, 6, 6],
...: [7, 7, 7, 7]])
Break the array into 'groups' with a reshape, keeping the number of columns (4) unchanged.
In [17]: x.reshape(2,2,2,4)
Out[17]:
array([[[[0, 0, 0, 0],
[1, 1, 1, 1]],
[[2, 2, 2, 2],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[5, 5, 5, 5]],
[[6, 6, 6, 6],
[7, 7, 7, 7]]]])
Swap the 2 middle dimensions, regrouping rows:
In [18]: x.reshape(2,2,2,4).transpose(0,2,1,3)
Out[18]:
array([[[[0, 0, 0, 0],
[2, 2, 2, 2]],
[[1, 1, 1, 1],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[6, 6, 6, 6]],
[[5, 5, 5, 5],
[7, 7, 7, 7]]]])
Then back to the target shape. This final step creates a copy of the original (the previous steps were views):
In [19]: x.reshape(2,2,2,4).transpose(0,2,1,3).reshape(4,8)
Out[19]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7]])
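The view-versus-copy behaviour can be checked with np.shares_memory. This is a small sketch on the same 8 x 4 example; the variable names grouped, swapped and result are introduced here for illustration only.

```python
import numpy as np

# Rebuild the 8 x 4 example: row i is filled with the value i.
x = np.repeat(np.arange(8), 4).reshape(8, 4)

grouped = x.reshape(2, 2, 2, 4)          # view: only the shape metadata changes
swapped = grouped.transpose(0, 2, 1, 3)  # view: only the strides change
result = swapped.reshape(4, 8)           # copy: this layout cannot be expressed with strides

print(np.shares_memory(x, grouped))  # True
print(np.shares_memory(x, swapped))  # True
print(np.shares_memory(x, result))   # False
```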
It's hard to generalize this, since there are different ways of rearranging blocks. For example my first try produced:
In [16]: x.reshape(4,2,4).transpose(1,0,2).reshape(4,8)
Out[16]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[4, 4, 4, 4, 6, 6, 6, 6],
[1, 1, 1, 1, 3, 3, 3, 3],
[5, 5, 5, 5, 7, 7, 7, 7]])
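For the specific rearrangement asked about in the question, the reshape/transpose steps can be wrapped in a function. This is a sketch under stated assumptions: the function name stack_row_blocks and the parameter name block (the number of consecutive rows per block, 2 in the question) are hypothetical, and the row count must be divisible by 2 * block.

```python
import numpy as np

def stack_row_blocks(x, block):
    """Place every second block of `block` rows to the right of the block before it."""
    m, c = x.shape
    if m % (2 * block) != 0:
        raise ValueError("number of rows must be divisible by 2 * block")
    # Axes: (pair of blocks, left/right block, row within block, column).
    grouped = x.reshape(m // (2 * block), 2, block, c)
    # Move the left/right axis next to the column axis, then merge those two axes.
    return grouped.transpose(0, 2, 1, 3).reshape(m // 2, 2 * c)

x = np.repeat(np.arange(8), 4).reshape(8, 4)  # the 8 x 4 example from the question
print(stack_row_blocks(x, 2))
```

Only the final reshape copies data, so the cost should be roughly one pass over the array.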
I have a list
L=[[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
another list
col=[0,2,3]
and an empty list M = [].
The col list holds the indices of the columns of L that have to be copied to M.
So M should be [[1,3,0],[4,6,0],[7,9,0]].
How can I do this?
I want M as a DataFrame.
>>> L=[[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
>>> col=[0,2,3]
>>> M = [[nums[i] for i in col] for nums in L]
>>> M
[[1, 3, 0], [4, 6, 0], [7, 9, 0]]
With NumPy you can use a list as an index:
>>> import numpy as np
>>> L=np.array([[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]])
>>> col=[0,2,3]
>>> M = [row[col] for row in L]
>>> M
[array([1, 3, 0]), array([4, 6, 0]), array([7, 9, 0])]
>>> M = [list(row[col]) for row in L]
>>> M
[[1, 3, 0], [4, 6, 0], [7, 9, 0]]
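When every row has the same length, the per-row loop can be dropped by indexing the column axis directly. A minimal sketch, using the data from the question (the name arr is introduced here):

```python
import numpy as np

L = [[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
col = [0, 2, 3]

arr = np.array(L)
M = arr[:, col].tolist()  # all rows, only the columns listed in col
print(M)                  # [[1, 3, 0], [4, 6, 0], [7, 9, 0]]
```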
You can use operator.itemgetter along with simple list comprehension to fetch your desired elements
>>> from operator import itemgetter
>>> L = [[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
>>> col = [0,2,3]
>>> M = [list(itemgetter(*col)(i)) for i in L]
>>> M
[[1, 3, 0], [4, 6, 0], [7, 9, 0]]
To convert it to a DataFrame, you can do:
>>> import pandas as pd
>>> df = pd.DataFrame(M)
>>> df
0 1 2
0 1 3 0
1 4 6 0
2 7 9 0
Just to put it out there, the enumerate solution:
L=[[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
col=[0,2,3]
solution = [[j for i, j in enumerate(sub) if i in col] for sub in L]
#[[1, 3, 0], [4, 6, 0], [7, 9, 0]]
This would work faster if col was a set:
col={0,2,3}
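One caveat worth checking: the enumerate version returns values in the order they appear in each row, not in the order given in col, and it ignores repeated indices. The sketch below uses a reordered col (an assumption made here to show the difference) and compares it with plain indexing.

```python
L = [[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
col = [3, 0]  # deliberately not sorted

# Indexing follows the order of col.
by_index = [[row[i] for i in col] for row in L]

# Membership testing follows the order of the row; a set makes each test O(1).
col_set = set(col)
by_position = [[v for i, v in enumerate(row) if i in col_set] for row in L]

print(by_index)     # [[0, 1], [0, 4], [0, 7]]
print(by_position)  # [[1, 0], [4, 0], [7, 0]]
```

For a sorted col without duplicates, such as [0, 2, 3], both versions give the same result.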
Given two matrices, for example
[1 2] and [5 6]
[3 4] [7 8],
is there a way to concatenate them to get the following matrix?
[1 2 1 2]
[3 4 3 4]
[5 5 6 6]
[7 7 8 8]
NumPy based approach
Using array-initialization with broadcasted-assignment -
def assign_as_blocks(a,b):
    m1,n1 = a.shape
    m2,n2 = b.shape
    out = np.empty((m1+m2,n2,n1),dtype=int)
    out[:m1] = a[:,None,:]
    out[m1:] = b[:,:,None]
    return out.reshape(m1+m2,-1)
To use TensorFlow tools, a modified version would be:
def assign_as_blocks_v2(a,b):
    shape1 = tf.shape(a)
    shape2 = tf.shape(b)
    m1 = shape1[0]
    n1 = shape1[1]
    m2 = shape2[0]
    n2 = shape2[1]
    p1 = tf.tile(a,[1,n2])
    p2 = tf.reshape(tf.tile(tf.expand_dims(b, 1),[1,1,n1]), [m2,-1])
    out = tf.concat((p1,p2),axis=0)
    return out
Sample runs
Case #1 (Sample from question) :
In [95]: a
Out[95]:
array([[1, 2],
[3, 4]])
In [96]: b
Out[96]:
array([[5, 6],
[7, 8]])
In [97]: assign_as_blocks(a, b)
Out[97]:
array([[1, 2, 1, 2],
[3, 4, 3, 4],
[5, 5, 6, 6],
[7, 7, 8, 8]])
Case #2 (Generic shaped random array) :
In [106]: np.random.seed(0)
...: a = np.random.randint(0,9,(2,3))
...: b = np.random.randint(0,9,(4,5))
In [107]: a
Out[107]:
array([[5, 0, 3],
[3, 7, 3]])
In [108]: b
Out[108]:
array([[5, 2, 4, 7, 6],
[8, 8, 1, 6, 7],
[7, 8, 1, 5, 8],
[4, 3, 0, 3, 5]])
In [109]: assign_as_blocks(a, b)
Out[109]:
array([[5, 0, 3, 5, 0, 3, 5, 0, 3, 5, 0, 3, 5, 0, 3],
[3, 7, 3, 3, 7, 3, 3, 7, 3, 3, 7, 3, 3, 7, 3],
[5, 5, 5, 2, 2, 2, 4, 4, 4, 7, 7, 7, 6, 6, 6],
[8, 8, 8, 8, 8, 8, 1, 1, 1, 6, 6, 6, 7, 7, 7],
[7, 7, 7, 8, 8, 8, 1, 1, 1, 5, 5, 5, 8, 8, 8],
[4, 4, 4, 3, 3, 3, 0, 0, 0, 3, 3, 3, 5, 5, 5]])
>>> a = [[1,2],[3,4]]
>>> b = [[5,6],[7,8]]
>>> np.r_[np.kron([1,1],a),np.kron(b, [1,1])]
array([[1, 2, 1, 2],
[3, 4, 3, 4],
[5, 5, 6, 6],
[7, 7, 8, 8]])
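The np.kron one-liner above hard-codes [1,1], so it fits inputs with two columns. A sketch that should work for other shapes uses np.tile and np.repeat instead; the function name concat_tiled_repeated is hypothetical.

```python
import numpy as np

def concat_tiled_repeated(a, b):
    """Stack a (tiled across) on top of b (each element repeated)."""
    a = np.asarray(a)
    b = np.asarray(b)
    n1 = a.shape[1]
    n2 = b.shape[1]
    top = np.tile(a, (1, n2))           # a side by side n2 times, width n1 * n2
    bottom = np.repeat(b, n1, axis=1)   # each column of b repeated n1 times
    return np.vstack([top, bottom])

print(concat_tiled_repeated([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Both halves end up n1 * n2 columns wide, which is what lets them be stacked vertically.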
I have a numpy array x (with (n,4) shape) of integers like:
[[0 1 2 3],
[1 2 7 9],
[2 1 5 2],
...]
I want to transform the array into an array of pairs:
[0,1]
[0,2]
[0,3]
[1,2]
...
so first element makes a pair with other elements in the same sub-array. I have already a for-loop solution:
y=np.array([[x[j,0],x[j,i]] for i in range(1,4) for j in range(0,n)],dtype=int)
but since looping over numpy array is not efficient, I tried slicing as the solution. I can do the slicing for every column as:
y[1]=np.array([x[:,0],x[:,1]]).T
# [[0,1],[1,2],[2,1],...]
I can repeat this for all columns. My questions are:
How can I append y[2] to y[1],... such that the shape is (N,2)?
If number of columns is not small (in this example 4), how can I find y[i] elegantly?
What are the alternative ways to achieve the final array?
The cleanest way of doing this I can think of would be:
>>> x = np.arange(12).reshape(3, 4)
>>> x
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> n = x.shape[1] - 1
>>> y = np.repeat(x, (n,)+(1,)*n, axis=1)
>>> y
array([[ 0, 0, 0, 1, 2, 3],
[ 4, 4, 4, 5, 6, 7],
[ 8, 8, 8, 9, 10, 11]])
>>> y.reshape(-1, 2, n).transpose(0, 2, 1).reshape(-1, 2)
array([[ 0, 1],
[ 0, 2],
[ 0, 3],
[ 4, 5],
[ 4, 6],
[ 4, 7],
[ 8, 9],
[ 8, 10],
[ 8, 11]])
This will make two copies of the data, so it will not be the most efficient method. That would probably be something like:
>>> y = np.empty((x.shape[0], n, 2), dtype=x.dtype)
>>> y[..., 0] = x[:, 0, None]
>>> y[..., 1] = x[:, 1:]
>>> y.shape = (-1, 2)
>>> y
array([[ 0, 1],
[ 0, 2],
[ 0, 3],
[ 4, 5],
[ 4, 6],
[ 4, 7],
[ 8, 9],
[ 8, 10],
[ 8, 11]])
Like Jaimie, I first tried a repeat of the 1st column followed by reshaping, but then decided it was simpler to make 2 intermediary arrays, and hstack them:
x=np.array([[0,1,2,3],[1,2,7,9],[2,1,5,2]])
m,n=x.shape
x1=x[:,0].repeat(n-1)[:,None]
x2=x[:,1:].reshape(-1,1)
np.hstack([x1,x2])
producing
array([[0, 1],
[0, 2],
[0, 3],
[1, 2],
[1, 7],
[1, 9],
[2, 1],
[2, 5],
[2, 2]])
There probably are other ways of doing this sort of rearrangement. The result will copy the original data in one way or other. My guess is that as long as you are using compiled functions like reshape and repeat, the time differences won't be significant.
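One of those other ways, sketched here as an assumption rather than a benchmarked recommendation, is to broadcast the first column against the remaining columns and stack the two along a new last axis. The names first, rest and pairs are introduced for this example.

```python
import numpy as np

x = np.array([[0, 1, 2, 3], [1, 2, 7, 9], [2, 1, 5, 2]])
m, n = x.shape

first = np.broadcast_to(x[:, :1], (m, n - 1))  # first column, repeated virtually
rest = x[:, 1:]                                # the remaining columns

# Pair the elements up along a new last axis, then flatten to (m * (n - 1), 2).
pairs = np.stack([first, rest], axis=-1).reshape(-1, 2)
print(pairs)
```

np.broadcast_to returns a read-only view, so the only copy is the one made by np.stack.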
Suppose the numpy array is
arr = np.array([[0, 1, 2, 3],
[1, 2, 7, 9],
[2, 1, 5, 2]])
You can get the array of pairs as
import itertools
m, n = arr.shape
new_arr = np.array([x for i in range(m)
                    for x in itertools.product(arr[i, 0 : 1], arr[i, 1 : n])])
The output would be
array([[0, 1],
[0, 2],
[0, 3],
[1, 2],
[1, 7],
[1, 9],
[2, 1],
[2, 5],
[2, 2]])