What would be the best way of converting a 2D numpy array into a list of 1D columns?
For instance, for an array:
array([[ 0, 5, 10],
[ 1, 6, 11],
[ 2, 7, 12],
[ 3, 8, 13],
[ 4, 9, 14]])
I would like to get:
[array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9]), array([10, 11, 12, 13, 14])]
This works:
[a[:, i] for i in range(a.shape[1])]
but I was wondering if there is a better solution using pure Numpy functions?
I can't think of any reason you would need
[array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9]), array([10, 11, 12, 13, 14])]
Instead of
array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]])
Which you can get simply with a.T
If you really need a list, then you can use list(a.T)
Here is one way:
In [20]: list(a.T)
Out[20]: [array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9]), array([10, 11, 12, 13, 14])]
Here is another one:
In [40]: [*map(np.ravel, np.hsplit(a, 3))]
Out[40]: [array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9]), array([10, 11, 12, 13, 14])]
Simply call the list function:
list(a.T)
Well, here's another way. (I assume x is your original 2d array.)
[row for row in x.T]
Still uses a Python list generator expression, however.
Related
What happens to the original numpy array when we slice it and set it to the same variable?
>>> a = np.arange(15).reshape([3,5])
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> a = a[2:,:]
>>> a
array([[10, 11, 12, 13, 14]])
What happened to the original a array, was it garbage collected? However, we need the original array to refer to, so where is it stored?
In [69]: a=np.arange(15).reshape(3,5)
Arrays have a base attribute; in this case it references the 1d arange:
In [70]: a.base
Out[70]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
In [71]: a = a[2:,:]
In [72]: a
Out[72]: array([[10, 11, 12, 13, 14]])
Same base. The (3,5) view is not available:
In [73]: a.base
Out[73]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
But if the 2d array is a copy, not just a view:
In [74]: a=np.arange(15).reshape(3,5).copy()
In [75]: a.base
In [76]: a = a[2:,:]
In [77]: a.base
Out[77]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
I thought I understood the reshape function in Numpy until I was messing around with it and came across this example:
a = np.arange(16).reshape((4,4))
which returns:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
This makes sense to me, but then when I do:
a.reshape((2,8), order = 'F')
it returns:
array([[0, 8, 1, 9, 2, 10, 3, 11],
[4, 12, 5, 13, 6, 14, 7, 15]])
I would expect it to return:
array([[0, 4, 8, 12, 1, 5, 9, 13],
[2, 6, 10, 14, 3, 7, 11, 15]])
Can someone please explain what is happening here?
The elements of a in order 'F'
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
are [0,4,8,12,1,5,9 ...]
Now rearrange them in a (2,8) array.
I think the reshape docs talks about raveling the elements, and then reshaping them. Evidently the ravel is done first.
Experiment with a.ravel(order='F').reshape(2,8).
Oops, I get what you expected:
In [208]: a = np.arange(16).reshape(4,4)
In [209]: a
Out[209]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [210]: a.ravel(order='F')
Out[210]: array([ 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15])
In [211]: _.reshape(2,8)
Out[211]:
array([[ 0, 4, 8, 12, 1, 5, 9, 13],
[ 2, 6, 10, 14, 3, 7, 11, 15]])
OK, I have to keep the 'F' order during the reshape
In [214]: a.ravel(order='F').reshape(2,8, order='F')
Out[214]:
array([[ 0, 8, 1, 9, 2, 10, 3, 11],
[ 4, 12, 5, 13, 6, 14, 7, 15]])
In [215]: a.ravel(order='F').reshape(2,8).flags
Out[215]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
...
In [216]: a.ravel(order='F').reshape(2,8, order='F').flags
Out[216]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
From np.reshape docs
You can think of reshaping as first raveling the array (using the given
index order), then inserting the elements from the raveled array into the
new array using the same kind of index ordering as was used for the
raveling.
The notes on order are fairly long, so it's not surprising that the topic is confusing.
Given two arrays (A and B) of different shapes, I've like to produce an array containing the concatenation of every row from A with every row from B.
E.g. given:
A = np.array([[1, 2],
[3, 4],
[5, 6]])
B = np.array([[7, 8, 9],
[10, 11, 12]])
would like to produce the array:
[[1, 2, 7, 8, 9],
[1, 2, 10, 11, 12],
[3, 4, 7, 8, 9],
[3, 4, 10, 11, 12],
[5, 6, 7, 8, 9],
[5, 6, 10, 11, 12]]
I can do this with iteration, but it's very slow, so looking for some combination of numpy functions that can recreate the above as efficiently as possible (the input arrays A and B will be up to 10,000 rows in size, hence looking to avoid nested loops).
Perfect problem to learn about slicing and broadcasted-indexing.
Here's a vectorized solution using those tools -
def concatenate_per_row(A, B):
m1,n1 = A.shape
m2,n2 = B.shape
out = np.zeros((m1,m2,n1+n2),dtype=A.dtype)
out[:,:,:n1] = A[:,None,:]
out[:,:,n1:] = B
return out.reshape(m1*m2,-1)
Sample run -
In [441]: A
Out[441]:
array([[1, 2],
[3, 4],
[5, 6]])
In [442]: B
Out[442]:
array([[ 7, 8, 9],
[10, 11, 12]])
In [443]: concatenate_per_row(A, B)
Out[443]:
array([[ 1, 2, 7, 8, 9],
[ 1, 2, 10, 11, 12],
[ 3, 4, 7, 8, 9],
[ 3, 4, 10, 11, 12],
[ 5, 6, 7, 8, 9],
[ 5, 6, 10, 11, 12]])
Reference: numpy.concatenate on record arrays fails when array has different length strings
import numpy as np
from numpy.lib.recfunctions import stack_arrays
from pprint import pprint
A = np.array([[1, 2],
[3, 4],
[5, 6]])
B = np.array([[7, 8, 9],
[10, 11, 12]])
cartesian = [stack_arrays((a, b), usemask=False) for a in A
for b in B]
pprint(cartesian)
Output:
[array([1, 2, 7, 8, 9]),
array([ 1, 2, 10, 11, 12]),
array([3, 4, 7, 8, 9]),
array([ 3, 4, 10, 11, 12]),
array([5, 6, 7, 8, 9]),
array([ 5, 6, 10, 11, 12])]
I have two arrays.
"a", a 2d numpy array.
import numpy.random as npr
a = array([[5,6,7,8,9],[10,11,12,14,15]])
array([[ 5, 6, 7, 8, 9],
[10, 11, 12, 14, 15]])
"idx", a 3d numpy array constituting three index variants I want to use to index "a".
idx = npr.randint(5, size=(nsamp,shape(a)[0], shape(a)[1]))
array([[[1, 2, 1, 3, 4],
[2, 0, 2, 0, 1]],
[[0, 0, 3, 2, 0],
[1, 3, 2, 0, 3]],
[[2, 1, 0, 1, 4],
[1, 1, 0, 1, 0]]])
Now I want to index "a" three times with the indices in "idx" to obtain an object as follows:
array([[[6, 7, 6, 8, 9],
[12, 10, 12, 10, 11]],
[[5, 5, 8, 7, 5],
[11, 14, 12, 10, 14]],
[[7, 6, 5, 6, 9],
[11, 11, 10, 11, 10]]])
The naive "a[idx]" does not work. Any ideas as to how to do this? (I use Python 3.4 and numpy 1.9)
You can use choose to make the selection from a:
>>> np.choose(idx, a.T[:,:,np.newaxis])
array([[[ 6, 7, 6, 8, 9],
[12, 10, 12, 10, 11]],
[[ 5, 5, 8, 7, 5],
[11, 14, 12, 10, 14]],
[[ 7, 6, 5, 6, 9],
[11, 11, 10, 11, 10]]])
As you can see, a has to be reshaped from an array with shape (2, 5) to an array with shape (5, 2, 1) first. This is essentially so that it is broadcastable with idx, which has shape (3, 2, 5).
(I learned this method from #immerrr's answer here: https://stackoverflow.com/a/26225395/3923281)
You can use take array method:
import numpy
a = numpy.array([[5,6,7,8,9],[10,11,12,14,15]])
idx = numpy.random.randint(5, size=(3, a.shape[0], a.shape[1]))
print a.take(idx)
The native python codes are like this:
>>> a=[1,2,3,4,5,6]
>>> [[i+j for i in a] for j in a]
[[2, 3, 4, 5, 6, 7],
[3, 4, 5, 6, 7, 8],
[4, 5, 6, 7, 8, 9],
[5, 6, 7, 8, 9, 10],
[6, 7, 8, 9, 10, 11],
[7, 8, 9, 10, 11, 12]]
However, I have to use numpy to do this job as the array is very large.
Does anyone have ideas about how to do the same work in numpy?
Many NumPy binary operators have an outer method which can be used to form the equivalent of a multiplication (or in this case, addition) table:
In [260]: import numpy as np
In [255]: a = np.arange(1,7)
In [256]: a
Out[256]: array([1, 2, 3, 4, 5, 6])
In [259]: np.add.outer(a,a)
Out[259]:
array([[ 2, 3, 4, 5, 6, 7],
[ 3, 4, 5, 6, 7, 8],
[ 4, 5, 6, 7, 8, 9],
[ 5, 6, 7, 8, 9, 10],
[ 6, 7, 8, 9, 10, 11],
[ 7, 8, 9, 10, 11, 12]])