I have a 2D numpy array of shape (N,2) which is holding N points (x and y coordinates). For example:
array([[3, 2],
[6, 2],
[3, 6],
[3, 4],
[5, 3]])
I'd like to sort it such that my points are ordered by x-coordinate, and then by y in cases where the x coordinate is the same. So the array above should look like this:
array([[3, 2],
[3, 4],
[3, 6],
[5, 3],
[6, 2]])
If this was a normal Python list, I would simply define a comparator to do what I want, but as far as I can tell, numpy's sort function doesn't accept user-defined comparators. Any ideas?
EDIT: Thanks for the ideas! I set up a quick test case with 1000000 random integer points, and benchmarked the ones that I could run (sorry, can't upgrade numpy at the moment).
Mine: 4.078 secs
mtrw: 7.046 secs
unutbu: 0.453 secs
Using lexsort:
import numpy as np
a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])
ind = np.lexsort((a[:,1],a[:,0]))
a[ind]
# array([[3, 2],
# [3, 4],
# [3, 6],
# [5, 3],
# [6, 2]])
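A note on the key order, as a minimal sketch: np.lexsort treats the last key in the tuple as the primary one, which is why the y column is passed before the x column above. The names by_x_then_y and by_y_then_x are only illustrative.

```python
import numpy as np

a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])

# The LAST key passed to lexsort is the primary key: pass y first, x last.
ind = np.lexsort((a[:, 1], a[:, 0]))
by_x_then_y = a[ind]

# Swapping the keys sorts by y first and breaks ties on x instead.
by_y_then_x = a[np.lexsort((a[:, 0], a[:, 1]))]

print(by_x_then_y.tolist())
print(by_y_then_x.tolist())
```

If the result looks sorted by the wrong column, the key order is the first thing to check.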
a.ravel() returns a view if a is C_CONTIGUOUS. If that is true,
@ars's method, slightly modified by using ravel instead of flatten, yields a nice way to sort a in-place:
a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])
dt = [('col1', a.dtype),('col2', a.dtype)]
assert a.flags['C_CONTIGUOUS']
b = a.ravel().view(dt)
b.sort(order=['col1','col2'])
Since b is a view of a, sorting b sorts a as well:
print(a)
# [[3 2]
# [3 4]
# [3 6]
# [5 3]
# [6 2]]
The title says "sorting 2D arrays". Although the questioner uses an (N,2)-shaped array, it's possible to generalize unutbu's solution to work with any (N,M) array, as that's what people might actually be looking for.
One could transpose the array and use slice notation with negative step to pass all the columns to lexsort in reversed order:
>>> import numpy as np
>>> a = np.random.randint(1, 6, (10, 3))
>>> a
array([[4, 2, 3],
[4, 2, 5],
[3, 5, 5],
[1, 5, 5],
[3, 2, 1],
[5, 2, 2],
[3, 2, 3],
[4, 3, 4],
[3, 4, 1],
[5, 3, 4]])
>>> a[np.lexsort(np.transpose(a)[::-1])]
array([[1, 5, 5],
[3, 2, 1],
[3, 2, 3],
[3, 4, 1],
[3, 5, 5],
[4, 2, 3],
[4, 2, 5],
[4, 3, 4],
[5, 2, 2],
[5, 3, 4]])
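The same idea can be wrapped in a small helper. This is only a sketch: sort_rows is a hypothetical name, and the check against Python's own list ordering is there just to show the two agree on integer data.

```python
import numpy as np

def sort_rows(arr):
    """Return a copy of a 2D array with its rows in lexicographic order."""
    arr = np.asarray(arr)
    # lexsort treats the last key as primary, so feed the columns reversed.
    return arr[np.lexsort(arr.T[::-1])]

rng = np.random.default_rng(0)
a = rng.integers(1, 6, size=(10, 3))

result = sort_rows(a)
# Python lists compare element by element, which gives the same ordering.
expected = np.array(sorted(a.tolist()))
print(np.array_equal(result, expected))
```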
The numpy_indexed package (disclaimer: I am its author) can be used to solve these kinds of processing-on-nd-array problems in an efficient, fully vectorized manner:
import numpy_indexed as npi
npi.sort(a) # by default along axis=0, but configurable
You can use np.sort_complex. This has the side effect of converting your data to floating point, which I hope is not a problem:
>>> a = np.array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])
>>> atmp = np.sort_complex(a[:,0] + a[:,1]*1j)
>>> b = np.array([[np.real(x), np.imag(x)] for x in atmp])
>>> b
array([[ 3., 2.],
[ 3., 4.],
[ 3., 6.],
[ 5., 3.],
[ 6., 2.]])
I was struggling with the same thing and just got help and solved the problem. It works smoothly if your array has column names (a structured array), and I think this is a very simple way to sort using the same logic that Excel does:
array_name[array_name[['colname1','colname2']].argsort()]
Note the double brackets enclosing the sorting criteria. And of course, you can use more than 2 columns as sorting criteria.
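For a concrete structured-array version, here is a small sketch. It uses np.argsort with the order= argument rather than the double-bracket indexing, and the field names 'x' and 'y' are made up for the example.

```python
import numpy as np

# Hypothetical field names 'x' and 'y', chosen only for this example.
pts = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)],
               dtype=[('x', 'i8'), ('y', 'i8')])

# order= sorts by 'x' first and breaks ties with 'y'.
idx = np.argsort(pts, order=['x', 'y'])
sorted_pts = pts[idx]

print(sorted_pts['x'].tolist())
print(sorted_pts['y'].tolist())
```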
EDIT: removed bad answer.
Here's one way to do it using an intermediate structured array:
from numpy import array
a = array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])
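b = a.flatten()  # flatten() returns a copy, so a itself is left untouched
b.dtype = [('x', a.dtype), ('y', a.dtype)]  # reuse a's dtype instead of hard-coding '<i4'
b.sort()  # structured arrays sort field by field: x first, then y
b.dtype = a.dtype
b.shape = a.shape
print(b)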
which gives the desired output:
[[3 2]
[3 4]
[3 6]
[5 3]
[6 2]]
Not sure if this is quite the best way to go about it though.
I found one way to do it:
from numpy import array
a = array([(3,2),(6,2),(3,6),(3,4),(5,3)])
array(sorted(sorted(a,key=lambda e:e[1]),key=lambda e:e[0]))
It's pretty terrible to have to sort twice (and use the plain python sorted function instead of a faster numpy sort), but it does fit nicely on one line.
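If you stay with plain Python sorting, one pass should be enough, because Python compares lists element by element. A minimal sketch (still slower than a native numpy sort on large inputs):

```python
import numpy as np

a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])

# Each row becomes a Python list; lists compare by first element,
# then by second, so a single sorted() call orders by x and then y.
result = np.array(sorted(a.tolist()))
print(result.tolist())
```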
I need to create a 2-D numpy array using only list comprehension, but it has to follow the following format:
[[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7]]
So far, all I've managed to figure out is:
two_d_array = np.array([[x+1 for x in range(3)] for y in range(5)])
Giving:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
Just not very sure how to change the incrementation. Any help would be appreciated, thanks!
EDIT: Accidentally left out [3, 4, 5] in example. Included it now.
Here's a quick one-liner that will do the job:
np.array([np.arange(i, i+3) for i in range(1, 6)])
Where 3 is the number of columns (elements in each row), and range(1, 6) supplies the starting values 1 through 5, one per row, which is why there are 5 rows in the output.
Output:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7]])
Change the expression inside the comprehension; something like this can work:
two_d_array = np.array([[(y*3)+x+1 for x in range(3)] for y in range(5)])
# [[1,2,3],[4,5,6],...]
two_d_array = np.array([[y+x+1 for x in range(3)] for y in range(5)])
# [[1,2,3],[2,3,4],...]
You've got a couple of good comprehension answers, so here are a couple of numpy solutions.
Simple addition:
np.arange(1, 6)[:, None] + np.arange(3)
Crazy stride tricks:
base = np.arange(1, 8)
np.lib.stride_tricks.as_strided(base, shape=(5, 3), strides=base.strides * 2).copy()
Reshaped cumulative sum:
base = np.ones(15)
base[3::3] = -1
np.cumsum(base).reshape(5, 3)
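A quick sketch to compare the approaches. It swaps the as_strided call for np.lib.stride_tricks.sliding_window_view, which assumes NumPy 1.20 or newer and builds the same overlapping windows without hand-written strides. The variable names are illustrative.

```python
import numpy as np

# Broadcasting: a (5, 1) column of row starts plus a (3,) row of offsets.
by_broadcast = np.arange(1, 6)[:, None] + np.arange(3)

# Overlapping windows of length 3 over 1..7 (requires NumPy >= 1.20).
base = np.arange(1, 8)
by_windows = np.lib.stride_tricks.sliding_window_view(base, 3)

# The cumulative-sum variant starts from np.ones, so its result is float.
steps = np.ones(15)
steps[3::3] = -1
by_cumsum = np.cumsum(steps).reshape(5, 3)

print(np.array_equal(by_broadcast, by_windows))
print(np.array_equal(by_broadcast, by_cumsum), by_cumsum.dtype)
```

Note that the cumulative-sum result would need an astype(int) if an integer array is required.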
I have an array: [1, 2, 3, 4, 5, 6]. I would like to use the numpy.reshape() function so that I end up with this array:
[[1, 4],
[2, 5],
[3, 6]
]
I'm not sure how to do this. I keep ending up with this, which is not what I want:
[[1, 2],
[3, 4],
[5, 6]
]
These do the same thing:
In [57]: np.reshape([1,2,3,4,5,6], (3,2), order='F')
Out[57]:
array([[1, 4],
[2, 5],
[3, 6]])
In [58]: np.reshape([1,2,3,4,5,6], (2,3)).T
Out[58]:
array([[1, 4],
[2, 5],
[3, 6]])
Normally values are 'read' across the rows in Python/numpy. This is called row-major or 'C' order. Reading down the columns is 'F' order, for Fortran, and is common in MATLAB, which has Fortran roots.
If you take the 'F' order, make a new copy and string it out, you'll get a different order:
In [59]: np.reshape([1,2,3,4,5,6], (3,2), order='F').copy().ravel()
Out[59]: array([1, 4, 2, 5, 3, 6])
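To make the two reading orders concrete, here is a small sketch (the variable names are mine):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

c_order = arr.reshape(3, 2)             # fills row by row (default 'C')
f_order = arr.reshape(3, 2, order='F')  # fills column by column

# Reading the 'F' result back in 'F' order recovers the original sequence;
# reading it in the default 'C' order gives the interleaved one.
back_f = f_order.ravel(order='F')
back_c = f_order.ravel()

print(c_order.tolist(), f_order.tolist())
print(back_f.tolist(), back_c.tolist())
```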
You can set the order in np.reshape; in your case you can use 'F'. See the docs for details.
>>> arr
array([1, 2, 3, 4, 5, 6])
>>> arr.reshape(-1, 2, order = 'F')
array([[1, 4],
[2, 5],
[3, 6]])
The reason that you are getting that particular result is that arrays are normally allocated in C order. That means that reshaping by itself is not sufficient. You have to tell numpy to change the order of the axes when it steps along the array. Any number of operations will allow you to do that:
Set the axis order to F. F is for Fortran, which, like MATLAB, conventionally uses column-major order:
a.reshape(3, 2, order='F')
Swap the axes after reshaping:
np.swapaxes(a.reshape(2, 3), 0, 1)
Transpose the result:
a.reshape(2, 3).T
Roll the second axis forward:
np.rollaxis(a.reshape(2, 3), 1)
Notice that all but the first case require you to reshape to the transpose.
You can even manually arrange the data
np.stack((a[:3], a[3:]), axis=1)
Note that this will make many unnecessary copies. If you want the data copied, just do
a.reshape(3, 2, order='F').copy()
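A quick check that the variants agree, assuming a = np.arange(1, 7). This sketch uses np.moveaxis in place of np.rollaxis, and reshapes directly to (3, 2) for the order='F' case since no transpose follows it.

```python
import numpy as np

a = np.arange(1, 7)
target = np.array([[1, 4], [2, 5], [3, 6]])

variants = {
    "order='F'": a.reshape(3, 2, order='F'),
    "swapaxes": np.swapaxes(a.reshape(2, 3), 0, 1),
    "transpose": a.reshape(2, 3).T,
    "moveaxis": np.moveaxis(a.reshape(2, 3), 1, 0),
    "stack": np.stack((a[:3], a[3:]), axis=1),
}

# Every variant should match the target array.
checks = {name: np.array_equal(res, target) for name, res in variants.items()}
print(checks)
```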
I have split a numpy array like so:
x = np.random.randn(10,3)
x_split = np.split(x,5)
which splits x equally into five numpy arrays each with shape (2,3) and puts them in a list. What is the best way to combine a subset of these back together (e.g. x_split[:k] and x_split[k+1:]) so that the resulting shape is similar to the original x i.e. (something,3)?
I found that for k > 0 this is possible if you do:
np.vstack((np.vstack(x_split[:k]),np.vstack(x_split[k+1:])))
but this does not work when k = 0 as x_split[:0] = [] so there must be a better and cleaner way. The error message I get when k = 0 is:
ValueError: need at least one array to concatenate
The comment by Paul Panzer is right on target, but since NumPy now gently discourages vstack, here is the concatenate version:
x = np.random.randn(10, 3)
x_split = np.split(x, 5, axis=0)
k = 0
np.concatenate(x_split[:k] + x_split[k+1:], axis=0)
Note the explicit axis argument passed both times (it has to be the same); this makes it easy to adapt the code to work for other axes if needed. E.g.,
x_split = np.split(x, 3, axis=1)
k = 0
np.concatenate(x_split[:k] + x_split[k+1:], axis=1)
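Wrapped up as a helper, this might look like the sketch below. concat_without is a hypothetical name, and k is assumed to be a non-negative index into the list.

```python
import numpy as np

def concat_without(chunks, k, axis=0):
    """Concatenate every chunk except chunks[k] along the given axis.

    Assumes 0 <= k < len(chunks).
    """
    kept = chunks[:k] + chunks[k + 1:]  # plain list concatenation, fine for k == 0
    return np.concatenate(kept, axis=axis)

x = np.arange(30).reshape(10, 3)
x_split = np.split(x, 5, axis=0)

first_dropped = concat_without(x_split, 0)
middle_dropped = concat_without(x_split, 2)
print(first_dropped.shape, middle_dropped.shape)
```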
np.r_ can turn several slices into a list of indices.
In [20]: np.r_[0:3, 4:5]
Out[20]: array([0, 1, 2, 4])
In [21]: np.vstack([xsp[i] for i in _])
Out[21]:
array([[9, 7, 5],
[6, 4, 3],
[9, 8, 0],
[1, 2, 2],
[3, 3, 0],
[8, 1, 4],
[2, 2, 5],
[4, 4, 5]])
In [22]: np.r_[0:0, 1:5]
Out[22]: array([1, 2, 3, 4])
In [23]: np.vstack([xsp[i] for i in _])
Out[23]:
array([[9, 8, 0],
[1, 2, 2],
[3, 3, 0],
[8, 1, 4],
[3, 2, 0],
[0, 3, 8],
[2, 2, 5],
[4, 4, 5]])
Internally np.r_ has a lot of ifs and loops to handle the slices and their boundaries, but it hides it all from us.
If the xsp (your x_split) was an array, we could do xsp[np.r_[...]], but since it is a list we have to iterate. Well we could also hide that iteration with an operator.itemgetter object.
In [26]: operator.itemgetter(*Out[22])
Out[26]: operator.itemgetter(1, 2, 3, 4)
In [27]: np.vstack(operator.itemgetter(*Out[22])(xsp))
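For the "if it was an array" case, a minimal sketch: since the chunks all have the same shape, they can be put in one 3D array and indexed directly. The variable names keep and combined are illustrative.

```python
import numpy as np

x = np.arange(30).reshape(10, 3)
xsp = np.array(np.split(x, 5))       # shape (5, 2, 3): one block per chunk

keep = np.r_[0:0, 1:5]               # indices of the chunks to keep
combined = xsp[keep].reshape(-1, 3)  # index the blocks, then merge them

print(keep.tolist(), combined.shape)
```

This only works when every chunk has the same shape; otherwise the list-based versions are the way to go.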
So I have a 4 by 4 matrix: [[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6,7]]
I need to subtract [1,2,3,4] from the second row.
No numpy if possible. I'm a beginner and don't know how to use it.
Thanks.
With regular Python loops:
a = [[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6,7]]
b = [1,2,3,4]
for i in range(4):
a[1][i] -= b[i]
Simply loop over the entries in the b list and subtract each from the corresponding entry in a[1], the second list (i.e. row) of the a matrix.
However, NumPy can do this for you faster and easier and isn't too hard to learn:
In [47]: import numpy as np
In [48]: a = np.array([[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6,7]])
In [49]: a
Out[49]:
array([[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6],
[4, 5, 6, 7]])
In [50]: a[1] -= [1,2,3,4]
In [51]: a
Out[51]:
array([[1, 2, 3, 4],
[1, 1, 1, 1],
[3, 4, 5, 6],
[4, 5, 6, 7]])
Note that NumPy vectorizes many of its operations (such as subtraction), so the loops involved are handled for you (in fast, pre-compiled C-code).
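Since the question asked for no NumPy, here is one more plain-Python sketch that avoids index bookkeeping by pairing the elements with zip:

```python
a = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]]
b = [1, 2, 3, 4]

# Pair each element of the second row with the matching element of b
# and replace the row with the differences.
a[1] = [x - y for x, y in zip(a[1], b)]
print(a[1])
```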
I have a numpy array that looks like this
[
[[1,2,3], [4,5,6]],
[[3,8,9], [2,9,4]],
[[7,1,3], [1,3,6]]
]
I want it like this after deleting first column
[
[[2,3], [5,6]],
[[8,9], [9,4]],
[[1,3], [3,6]]
]
so currently the dimension is 3*3*3, after removing the first column it should be 3*3*2
You can slice it like so, where 1: signifies that you only want the second and all remaining columns from the innermost array (i.e. you 'delete' its first column).
>>> a[:, :, 1:]
array([[[2, 3],
[5, 6]],
[[8, 9],
[9, 4]],
[[1, 3],
[3, 6]]])
Since you are using numpy, I'll mention the numpy way of doing this. First of all, the dimensions you specified in the question seem wrong. See below:
x = np.array([
[[1,2,3], [4,5,6]],
[[3,8,9], [2,9,4]],
[[7,1,3], [1,3,6]]
])
The shape of x is
x.shape
(3, 2, 3)
You can use numpy.delete to remove a column as shown below
a = np.delete(x, 0, 2)
a
array([[[2, 3],
[5, 6]],
[[8, 9],
[9, 4]],
[[1, 3],
[3, 6]]])
To find the shape of a
a.shape
(3, 2, 2)
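One difference between slicing and np.delete is worth keeping in mind, shown here as a small sketch: basic slicing gives a view onto the original data, while np.delete returns a new array.

```python
import numpy as np

x = np.array([
    [[1, 2, 3], [4, 5, 6]],
    [[3, 8, 9], [2, 9, 4]],
    [[7, 1, 3], [1, 3, 6]],
])

sliced = x[:, :, 1:]               # basic slicing: a view onto x's data
deleted = np.delete(x, 0, axis=2)  # np.delete: a new, independent array

same_values = np.array_equal(sliced, deleted)
slice_is_view = np.shares_memory(sliced, x)
delete_is_view = np.shares_memory(deleted, x)
print(same_values, slice_is_view, delete_is_view)
```

So writing into the sliced result also changes x, whereas the np.delete result can be modified freely.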