Concatenate and sort data in Python

I have a few numpy arrays like so:
import numpy as np
a = np.array([[1, 2, 3, 4, 5], [14, 16, 17, 27, 38]])
b = np.array([[1, 2, 3, 4, 5], [.4, .2, .5, .1, .6]])
I'd like to 1) combine these arrays into a single new array and 2) sort the data so that the result is as follows:
data = [[1, 1, 2, 2, 3, 3, 4, 4, 5, 5], [14, .4, 16, .2, 17, .5, 27, .1, 38, .6]]
Or, in other words, I need all columns from the original array to be the same, just in an ascending order. I tried this:
data = np.hstack((a,b))
That gave me the concatenated data, but I'm not sure how to sort it. I tried np.sort(), but it didn't keep the columns intact. Thanks!

Stack them horizontally (as you already did), then get the argsort indices of the first row and use them to reorder all columns of the stacked array.
Thus, we need to add one more step, like so -
ab = np.hstack((a,b))
out = ab[:,ab[0].argsort()]
Sample run -
In [370]: a
Out[370]:
array([[ 1, 2, 3, 4, 5],
[14, 16, 17, 27, 38]])
In [371]: b
Out[371]:
array([[ 1. , 2. , 3. , 4. , 5. ],
[ 0.4, 0.2, 0.5, 0.1, 0.6]])
In [372]: ab = np.hstack((a,b))
In [373]: print(ab[:,ab[0].argsort()])
[[ 1. 1. 2. 2. 3. 3. 4. 4. 5. 5. ]
[ 14. 0.4 16. 0.2 17. 0.5 27. 0.1 38. 0.6]]
Please note that to keep the original order for identical elements, we need to pass kind='mergesort' (or kind='stable') to argsort, as described in the docs.
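To illustrate why the stable sort matters here: with a stable kind, ties in row 0 keep the column from a ahead of the matching column from b. A minimal sketch using the arrays from the question:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5], [14, 16, 17, 27, 38]])
b = np.array([[1, 2, 3, 4, 5], [.4, .2, .5, .1, .6]])

ab = np.hstack((a, b))
# A stable sort guarantees that for equal keys in row 0, the column
# coming from `a` stays ahead of the matching column from `b`.
order = ab[0].argsort(kind='stable')
out = ab[:, order]
print(out[0])  # [1. 1. 2. 2. 3. 3. 4. 4. 5. 5.]
```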

If you'd like something short (in Python 3, zip returns an iterator, so wrap it in list):
np.array(list(zip(*sorted(zip(*np.hstack((a, b)))))))
>>> array([[ 1. , 1. , 2. , 2. , 3. , 3. , 4. , 4. , 5. , 5. ],
[ 0.4, 14. , 0.2, 16. , 0.5, 17. , 0.1, 27. , 0.6, 38. ]])
A version that preserves the original order of the second elements (sorted is stable when given a key):
np.array(list(zip(*sorted(zip(*np.hstack((a, b))), key=lambda x: x[0]))))
>>> array([[ 1. , 1. , 2. , 2. , 3. , 3. , 4. , 4. , 5. , 5. ],
[ 14. , 0.4, 16. , 0.2, 17. , 0.5, 27. , 0.1, 38. , 0.6]])


Python3: Remove array elements with same coordinate (x,y)

I have this array (x,y,f(x,y)):
a = np.array([[ 1, 5,  3],
              [ 4, 5,  6],
              [ 4, 5,  6.1],
              [ 1, 3, 42]])
I want to remove the duplicates with same x,y. In my array I have (4,5,6) and (4,5,6.1) and I want to remove one of them (no criterion).
If I had 2 columns (x,y) I could use
np.unique(a[:,:2], axis = 0)
But my array has 3 columns and I don't see how to do this in a simple way.
I can do a loop but my arrays can be very large.
Is there a way to do this more efficiently?
If I understand correctly, you need this:
a[np.unique(a[:,:2],axis=0,return_index=True)[1]]
output:
[[ 1. 3. 42.]
[ 1. 5. 3.]
[ 4. 5. 6.]]
Please be mindful that it does not keep the original order of rows in a. If you want to keep the order, simply sort the indices:
a[np.sort(np.unique(a[:,:2],axis=0,return_index=True)[1])]
output:
[[ 1. 5. 3.]
[ 4. 5. 6.]
[ 1. 3. 42.]]
I think you want to do this? np.rint rounds your numbers to the nearest integer, so 6 and 6.1 collapse into identical rows:
import numpy as np
a = np.array([
    [ 1, 5,  3],
    [ 4, 5,  6],
    [ 4, 5,  6.1],
    [ 1, 3, 42]
])
a = np.unique(np.rint(a), axis = 0)
print(a)
# result:
[[ 1. 3. 42.]
[ 1. 5. 3.]
[ 4. 5. 6.]]
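Note that np.rint changes the data before deduplicating: 6.1 is rounded to 6.0, which is the only reason the two rows collapse here. A quick sketch of that side effect:

```python
import numpy as np

a = np.array([[1, 5, 3],
              [4, 5, 6],
              [4, 5, 6.1],
              [1, 3, 42]])

rounded = np.rint(a)                 # 6.1 becomes 6.0, rows 1 and 2 are now identical
deduped = np.unique(rounded, axis=0)  # rows sorted lexicographically, duplicates dropped
print(deduped)
```

So the third column in the result holds 6.0 rather than either of the original values 6 or 6.1, which may or may not be acceptable.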

Why does numpy's mgrid use complex numbers as an indexing convention?

I'm having some trouble understanding this. I read the following docstring:
class MGridClass(nd_grid):
"""
`nd_grid` instance which returns a dense multi-dimensional "meshgrid".
An instance of `numpy.lib.index_tricks.nd_grid` which returns a dense
(or fleshed out) mesh-grid when indexed, so that each returned argument
has the same shape. The dimensions and number of the output arrays are
equal to the number of indexing dimensions. If the step length is not a
complex number, then the stop is not inclusive.
However, if the step length is a **complex number** (e.g. 5j), then
the integer part of its magnitude is interpreted as specifying the
number of points to create between the start and stop values, where
the stop value **is inclusive**.
So if I give real step values, the range is divided arange-style, with the stop excluded:
>>> numpy.mgrid[0:4:1, 10:15:2]
array([[[ 0, 0, 0],
[ 1, 1, 1],
[ 2, 2, 2],
[ 3, 3, 3]],
[[10, 12, 14],
[10, 12, 14],
[10, 12, 14],
[10, 12, 14]]])
And with complex numbers (an integer with a j suffix; Python uses j instead of i for technical reasons), it is the number of resulting values along the corresponding axis.
>>> numpy.mgrid[0:4:3j, 10:15:5j]
array([[[ 0. , 0. , 0. , 0. , 0. ],
[ 2. , 2. , 2. , 2. , 2. ],
[ 4. , 4. , 4. , 4. , 4. ]],
[[10. , 11.25, 12.5 , 13.75, 15. ],
[10. , 11.25, 12.5 , 13.75, 15. ],
[10. , 11.25, 12.5 , 13.75, 15. ]]])
But what's so special about complex numbers that makes them an appropriate way to signal this change of behavior, rather than a simple flag? Is this another bit of numpy fanciness?
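One way to see the convention: a complex step nj makes each axis behave like np.linspace(start, stop, n) (stop inclusive), while a real step behaves like np.arange(start, stop, step) (stop exclusive). A small check of that equivalence:

```python
import numpy as np

# Real step: arange semantics, stop excluded.
g = np.mgrid[0:4:1]
assert np.array_equal(g, np.arange(0, 4, 1))

# Complex step 3j: linspace semantics, 3 points, stop included.
h = np.mgrid[0:4:3j]
assert np.allclose(h, np.linspace(0, 4, 3))
print(h)  # [0. 2. 4.]
```

The complex step is a trick to smuggle a second meaning through slice syntax: a slice only carries three numbers, so there is no room for an extra flag, but the type of the step can be inspected.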

Get more than one dimension with numpy take function

I've got a one-dimensional array of shape (n,) called edges and want to select rows from the (n, 3) vertices array by those indices:
vertices = [[ 1.25, 4.321, -4], [2, -5, 3.32], [23.3, 43, 12], [32, 4, -23]]
edges = [1, 3, 2, 0]
result = [[2, -5, 3.32], [32, 4, -23], [23.3, 43, 12], [ 1.25, 4.321, -4]]
I tried np.take(vertices, edges), but it doesn't work for multi-dimensional arrays.
take with the axis parameter works:
In [313]: vertices=np.array(vertices)
In [314]: edges=[1,3,2,0]
In [315]: np.take(vertices, edges,0)
Out[315]:
array([[ 2. , -5. , 3.32 ],
[ 32. , 4. , -23. ],
[ 23.3 , 43. , 12. ],
[ 1.25 , 4.321, -4. ]])
In [316]: vertices[edges,:]
Out[316]:
array([[ 2. , -5. , 3.32 ],
[ 32. , 4. , -23. ],
[ 23.3 , 43. , 12. ],
[ 1.25 , 4.321, -4. ]])
You can simply use indexing here:
vertices[edges]
# ^ ^ indexing
If you index with a list, numpy reorders the original matrix along its first axis so that the rows follow the indices specified by edges, like:
>>> vertices = np.array([[ 1.25, 4.321, -4], [2, -5, 3.32], [23.3, 43, 12], [32, 4, -23]])
>>> edges = [1, 3, 2, 0]
>>> vertices[edges]
array([[ 2. , -5. , 3.32 ],
[ 32. , 4. , -23. ],
[ 23.3 , 43. , 12. ],
[ 1.25 , 4.321, -4. ]])
>>> vertices[edges].base is None
True
The fact that base is None means that this does not produce a view: it makes a copy of the matrix (with filtered/reordered rows). Changes you later make to the elements of vertices will therefore not change the result of vertices[edges] (given you made the copy before altering vertices, of course).
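A small sketch of those copy-vs-view semantics, contrasting fancy indexing (always a copy) with basic slicing (a view):

```python
import numpy as np

vertices = np.array([[1.25, 4.321, -4], [2, -5, 3.32], [23.3, 43, 12], [32, 4, -23]])
edges = [1, 3, 2, 0]

picked = vertices[edges]  # fancy indexing: always a fresh copy
sliced = vertices[1:3]    # basic slicing: a view into the same buffer

vertices[1, 0] = 99.0
print(picked[0, 0])  # still 2.0 -- the copy is unaffected
print(sliced[0, 0])  # 99.0 -- the view sees the change
```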

Delete columns based on repeat value in one row in numpy array

I'm hoping to delete columns in my array that have repeat entries in row 1, as shown below (row 1 has repeats of the values 1 and 2.5, so one of each of those values has been deleted, together with the column it lies in).
initial_array =
row 0 [[ 1, 1, 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2.5, 2, 1, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 3, 2.5, 1.5, 4,]
row 3 [228, 314, 173, 452, 168, 351, 300, 396]]
final_array =
row 0 [[ 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 2.5, 4,]
row 3 [228, 314, 173, 452, 351, 396]]
Ways I was thinking of included using some function that checked for repeats, giving a True response for the second (or more) time a value turned up in the dataset, then using that response to delete the row. That or possibly using the return indices function within numpy.unique. I just can't quite find a way through it or find the right function though.
If I could find a way to set row 3 of each retained column to the mean of the retained value and the deleted one, that would be even better (see below).
final_array_averaged =
row 0 [[ 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 2.5, 4,]
row 3 [228, 307, 170.5, 452, 351, 396]]
Thanks in advance for any help you can give to a beginner who is stumped!
You can use the optional arguments that come with np.unique and then use np.bincount to use the last row as weights to get the final averaged output, like so -
_,unqID,tag,C = np.unique(arr[1],return_index=1,return_inverse=1,return_counts=1)
out = arr[:,unqID]
out[-1] = np.bincount(tag,arr[3])/C
Sample run -
In [212]: arr
Out[212]:
array([[ 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2.5, 4. , 2.5, 2. , 1. , 3.5],
[ 1. , 1.5, 3. , 4.5, 3. , 2.5, 1.5, 4. ],
[ 228. , 314. , 173. , 452. , 168. , 351. , 300. , 396. ]])
In [213]: out
Out[213]:
array([[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2. , 2.5, 3.5, 4. ],
[ 1. , 1.5, 2.5, 3. , 4. , 4.5],
[ 228. , 307. , 351. , 170.5, 396. , 452. ]])
As can be seen, the output is now ordered, with the second row sorted. If you are looking to keep the original order, use np.argsort on unqID, like so -
In [221]: out[:,unqID.argsort()]
Out[221]:
array([[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2.5, 4. , 2. , 3.5],
[ 1. , 1.5, 3. , 4.5, 2.5, 4. ],
[ 228. , 307. , 170.5, 452. , 351. , 396. ]])
You can find the indices of wanted columns using unique:
>>> indices = np.sort(np.unique(A[1], return_index=True)[1])
Then use a simple indexing to get the desire columns:
>>> A[:,indices]
array([[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2.5, 4. , 2. , 3.5],
[ 1. , 1.5, 3. , 4.5, 2.5, 4. ],
[ 228. , 314. , 173. , 452. , 351. , 396. ]])
This is a typical grouping problem, which can be solved elegantly and efficiently using the numpy_indexed package (disclaimer: I am its author):
import numpy_indexed as npi
unique, final_array = npi.group_by(initial_array[1]).mean(initial_array, axis=1)
Note that there are many reductions other than mean; if you want the original behavior you described, you could replace 'mean' with 'first', for instance.

Prepending 1d array onto each 2d array of a 3d array

Say I have the shape-(2, 3, 2) array a and the shape-(2,) array b below.
import numpy as np
a = np.array([[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]])
b = np.array([0.2, 0.8])
I'd like to use numpy routines to prepend b as a new first row of each 2-D sub-array of a, giving a (2, 4, 2) array. I can't seem to make concatenate, vstack, append, etc. work.
Try this:
np.concatenate(([[b]]*2,a),axis=1)
# Result:
array([[[ 0.2, 0.8],
[ 1. , 2. ],
[ 3. , 4. ],
[ 5. , 6. ]],
[[ 0.2, 0.8],
[ 7. , 8. ],
[ 9. , 10. ],
[ 11. , 12. ]]])
This works:
np.insert(a.astype(float), 0, b, 1)
Output:
array([[[ 0.2, 0.8],
[ 1. , 2. ],
[ 3. , 4. ],
[ 5. , 6. ]],
[[ 0.2, 0.8],
[ 7. , 8. ],
[ 9. , 10. ],
[ 11. , 12. ]]])
If you don't cast with astype() first, you just end up prepending [0, 0], because b is converted to a's integer dtype.
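To make the casting pitfall concrete, here is a quick side-by-side (np.insert converts the inserted values to the array's dtype, so floats are silently truncated when the target holds integers):

```python
import numpy as np

a = np.array([[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]])
b = np.array([0.2, 0.8])

bad = np.insert(a, 0, b, axis=1)                 # int array: b truncates to [0, 0]
good = np.insert(a.astype(float), 0, b, axis=1)  # float array: b kept as-is

print(bad[0, 0])   # [0 0]
print(good[0, 0])  # [0.2 0.8]
```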
Note, this is slower than the concatenate():
$ python test.py
m1: 8.20246601105 sec
m2: 43.8010189533 sec
Code:
#!/usr/bin/python
import numpy as np
import timeit

a = np.array([[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]])
b = np.array([0.2, 0.8])

def m1():
    np.concatenate(([[b]] * 2, a), axis=1)

def m2():
    np.insert(a.astype(float), 0, b, 1)

print("m1: %s sec" % timeit.timeit(m1))
print("m2: %s sec" % timeit.timeit(m2))
