I need the N minimum (index) values in a numpy array


Hi, I have an array with X values in it and I would like to locate the indices of the ten smallest values. In this linked question they computed the maximum efficiently: How to get indices of N maximum values in a numpy array?
However, I can't comment on other posts yet, so I'm having to repost the question.
I'm not sure which indices I need to change to get the minimum rather than the maximum values.
This is their code:
In [1]: import numpy as np
In [2]: arr = np.array([1, 3, 2, 4, 5])
In [3]: arr.argsort()[-3:][::-1]
Out[3]: array([4, 3, 1])

If you call
arr.argsort()[:3]
it will give you the indices of the 3 smallest elements:
array([0, 2, 1], dtype=int64)
So, for n, you should call
arr.argsort()[:n]
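Wrapped as a small function, the slicing idea looks like this. A minimal sketch; the name n_smallest_indices is hypothetical, and it assumes a 1-D array.

```python
import numpy as np

def n_smallest_indices(arr, n):
    """Return the indices of the n smallest values, smallest first (hypothetical helper)."""
    # argsort returns the indices that would sort arr in ascending order,
    # so the first n of them point at the n smallest values.
    return np.argsort(arr)[:n]

arr = np.array([1, 3, 2, 4, 5])
idx = n_smallest_indices(arr, 3)
print(idx)        # [0 2 1]
print(arr[idx])   # [1 2 3]
```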

Since this question was posted, NumPy has added a faster way of selecting the smallest elements of an array: argpartition. It was first included in NumPy 1.8.
Using snarly's answer as inspiration, we can quickly find the k=3 smallest elements:
In [1]: import numpy as np
In [2]: arr = np.array([1, 3, 2, 4, 5])
In [3]: k = 3
In [4]: ind = np.argpartition(arr, k)[:k]
In [5]: ind
Out[5]: array([0, 2, 1])
In [6]: arr[ind]
Out[6]: array([1, 2, 3])
This runs in O(n) time because it does not need to do a full sort. If you need your answers sorted (note: in this case the output array happened to be in sorted order, but that is not guaranteed), you can sort the output:
In [7]: np.sort(arr[ind])
Out[7]: array([1, 2, 3])
This runs in O(n + k log k) because the sorting takes place on the smaller
output array.
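If you need the indices (not just the values) ordered by value, one option is to argsort only the k candidates that argpartition returns. A minimal sketch; the helper name is hypothetical and it assumes 0 <= k < len(arr), since argpartition requires kth to be a valid index.

```python
import numpy as np

def k_smallest_sorted_indices(arr, k):
    """Indices of the k smallest values, ordered by value (assumes k < len(arr))."""
    # argpartition moves the k smallest values in front of position k,
    # in no particular order, in O(n) time.
    part = np.argpartition(arr, k)[:k]
    # Sort only those k candidates by their values: O(k log k).
    return part[np.argsort(arr[part])]

arr = np.array([1, 3, 2, 4, 5])
ind = k_smallest_sorted_indices(arr, 3)
print(ind)       # [0 2 1]
print(arr[ind])  # [1 2 3]
```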

I don't guarantee that this will be faster, but an algorithm with better asymptotic complexity would rely on heapq:
import heapq
indices = heapq.nsmallest(10, range(len(arr)), key=arr.__getitem__)
This takes O(N log k) operations for the k = 10 smallest values (close to O(N) when k is small), whereas argsort takes O(N log N) operations. However, argsort runs in highly optimized C code, so it might still perform better. To know for sure, you'd need to run some tests on your actual data.
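A quick way to check that the heap version agrees with argsort before timing both on your own data. A sketch assuming distinct values: with ties, heapq.nsmallest keeps the original order of equal items, while the default argsort is not stable, so the two could list tied indices differently.

```python
import heapq
import numpy as np

def n_smallest_indices_heapq(arr, n):
    """Indices of the n smallest values via a heap, smallest first."""
    # Iterate over indices (not values) and rank each index i by arr[i].
    return heapq.nsmallest(n, range(len(arr)), key=arr.__getitem__)

rng = np.random.default_rng(0)
arr = rng.random(1000)
heap_idx = n_smallest_indices_heapq(arr, 10)
sort_idx = np.argsort(arr)[:10]
print(heap_idx == sort_idx.tolist())  # True when values are distinct
```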

Just don't reverse the sort results.
In [164]: a = numpy.random.random(20)
In [165]: a
Out[165]:
array([ 0.63261763, 0.01718228, 0.42679479, 0.04449562, 0.19160089,
0.29653725, 0.93946388, 0.39915215, 0.56751034, 0.33210873,
0.17521395, 0.49573607, 0.84587652, 0.73638224, 0.36303797,
0.2150837 , 0.51665416, 0.47111993, 0.79984964, 0.89231776])
Indices in sorted order:
In [166]: a.argsort()
Out[166]:
array([ 1, 3, 10, 4, 15, 5, 9, 14, 7, 2, 17, 11, 16, 8, 0, 13, 18,
12, 19, 6])
First ten:
In [168]: a.argsort()[:10]
Out[168]: array([ 1, 3, 10, 4, 15, 5, 9, 14, 7, 2])

This code saves the indices of the 20 largest elements of split_list in Twenty_Maximum (in ascending order of value):
Twenty_Maximum = split_list.argsort()[-20:]
whereas this code saves the indices of the 20 smallest elements of split_list in Twenty_Minimum:
Twenty_Minimum = split_list.argsort()[:20]

Related

Delete array of values from numpy array

This post is an extension of this question.
I would like to delete multiple elements that have certain values from a numpy array. For example, given
import numpy as np
a = np.array([1, 1, 2, 5, 6, 8, 8, 8, 9])
How do I delete one instance of each value of [1,5,8], such that the output is [1,2,6,8,8,9]? All I have found in the documentation for removing array elements is np.setdiff1d, but this removes all instances of each number. How can this be adapted?

Use an outer comparison and argmax to remove each value only once. For large arrays this will be memory intensive, since the created mask has a.size * r.size elements.
r = np.array([1, 5, 8])
m = (a == r[:, None]).argmax(1)
np.delete(a, m)
array([1, 2, 6, 8, 8, 9])
This does assume that each value in r appears in a at least once; otherwise the element at index 0 will get deleted, since argmax finds no match and returns 0.
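One way to lift that assumption is to check which rows of the mask contain any match and drop only those first-match indices. A minimal sketch; delete_first_occurrences is a hypothetical name. Note that a value repeated in r still maps to the same index and is removed only once.

```python
import numpy as np

def delete_first_occurrences(a, r):
    """Remove one instance of each value in r from a; values absent from a are ignored."""
    mask = a == r[:, None]        # shape (len(r), len(a)): row i marks where a == r[i]
    found = mask.any(axis=1)      # which values of r actually occur in a
    first = mask.argmax(axis=1)   # index of the first match (0 when there is none)
    return np.delete(a, first[found])

a = np.array([1, 1, 2, 5, 6, 8, 8, 8, 9])
print(delete_first_occurrences(a, np.array([1, 5, 8])))      # [1 2 6 8 8 9]
print(delete_first_occurrences(a, np.array([1, 5, 8, 42])))  # [1 2 6 8 8 9]
```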

delNums = [np.where(a == x)[0][0] for x in [1,5,8]]
a = np.delete(a, delNums)
Here, delNums contains the index of the first occurrence of each of the values 1, 5 and 8, and np.delete() deletes the values at those indices. (np.where(a == x)[0][0] raises an IndexError if a value is missing from a.)
OUTPUT:
[1 2 6 8 8 9]

deleting rows based on value found in specific column

I am attempting to write code that searches a numpy array for rows where the value in the fifth column is not 50. If it is not, I wish to remove that row.
This is what I have so far:
for rows in range(len(b)):
    if b[:,4].any() != 50:
        b = np.delete(b, b[rows])
However, I keep getting the following error:
too many indices for array

Let's run the calculation with some diagnostic prints. Note where the error occurs. That's important! (We shouldn't just keep trying things without isolating the problem!)
In [2]: b=np.array([[0,1,2],[1,2,3],[2,1,2]])
In [3]: for row in range(len(b)):
   ...:     print(row)
   ...:     if b[:,2].any() !=2:
   ...:         print(b[row])
   ...:         b = np.delete(b, b[row])
   ...:
0
[0 1 2]
1
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-3-04dc188d9a2b> in <module>()
      1 for row in range(len(b)):
      2     print(row)
----> 3     if b[:,2].any() !=2:
      4         print(b[row])
      5         b = np.delete(b, b[row])
IndexError: too many indices for array
So the error occurs on the 2nd iteration (row 1). Something is wrong with the b after the delete. What is the new value of b?
In [4]: b
Out[4]: array([1, 2, 3, 2, 1, 2])
b is a 1d array, not the 2d one we started with. That explains the error, right? Something must be wrong with the use of delete. Maybe we need to check its documentation.
Look at the axis parameter:
axis : int, optional
The axis along which to delete the subarray defined by `obj`.
If `axis` is None, `obj` is applied to the flattened array.
We didn't specify an axis, so the delete was applied to the flattened array, and result was flattened - 1d.
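The difference is easy to reproduce on the same small array: without axis, the values of b[0] are treated as flat indices, and with axis=0 a whole row is removed and the result stays 2-D.

```python
import numpy as np

b = np.array([[0, 1, 2], [1, 2, 3], [2, 1, 2]])

# No axis: b[0] == [0, 1, 2] is used as flat indices, so the result is 1-D.
flat = np.delete(b, b[0])
print(flat)      # [1 2 3 2 1 2]

# axis=0: obj is a row number, and the result keeps its 2-D shape.
by_row = np.delete(b, 0, axis=0)
print(by_row)    # [[1 2 3]
                 #  [2 1 2]]
```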
But even if I specify an axis I get an error (I won't get into that), which prompts me to look more carefully at the if condition:
In [10]: b[:,2]
Out[10]: array([2, 3, 2])
In [11]: b[:,2].any()
Out[11]: True
In [12]: b[:,2]!=2
Out[12]: array([False, True, False])
Applying any to the column doesn't make sense; it just checks whether any values in the column are nonzero. Instead we want to test the column against the target, getting a boolean array that matches the column in size.
We can use that boolean array directly as a row selection mask (_ is IPython's reference to the previous output, Out[12]):
In [13]: b[_,:]
Out[13]: array([[1, 2, 3]])
No need to iterate.
Another problem with your iteration: you iterate on range(3), i.e. [0,1,2], but inside the loop you try to remove a row from b, changing the size of b. That is going to cause problems when you try to index b[row] by number, right? When iterating, in Python or numpy, be careful about modifying the object that you are iterating over.
Sorry to be long winded about this, but it looks like you need some basic debugging guidance.
Here's a basic list approach:
In [15]: [row for row in b if row[2]!=2]
Out[15]: [array([1, 2, 3])]
I'm iterating on the rows, not their indices, and for each row checking the column value, and keeping that row if the check is True. We could do that with np.delete, but a list comprehension is clearer (and faster).
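For the original question (drop rows whose fifth column is not 50), np.delete can do it in one call if it gets the row numbers and axis=0. A sketch on a small hypothetical 3-row array; the boolean mask gives the same result.

```python
import numpy as np

b = np.array([[50, 2, 3, 4, 5, 6],
              [4, 50, 6, 7, 8, 9],
              [1, 1, 1, 1, 50, 9]])

# Row numbers whose fifth column (index 4) is not 50.
bad_rows = np.flatnonzero(b[:, 4] != 50)
# Delete them all at once; axis=0 keeps the result 2-D.
kept = np.delete(b, bad_rows, axis=0)
print(kept)   # [[ 1  1  1  1 50  9]]

# Equivalent boolean-mask selection, without np.delete.
print((kept == b[b[:, 4] == 50]).all())  # True
```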

It would be better to provide b and the desired output, but if I understand it correctly, you could use:
import numpy as np
b = np.array([[50, 2, 3, 4, 5, 6],
[4, 50, 6, 7, 8, 9],
[1, 1, 1, 1, 50, 9]])
array([[50, 2, 3, 4, 5, 6],
[ 4, 50, 6, 7, 8, 9],
[ 1, 1, 1, 1, 50, 9]])
Then you can check which rows contain 50 in the 5th column using
b[:, 4] == 50
array([False, False, True])
and feed this Boolean array back to b to select the desired rows:
b[b[:, 4] == 50]
which leaves you with one row in this case
array([[ 1, 1, 1, 1, 50, 9]])

Computing a "moving sum of counts" on a NumPy array

I have the following arrays:
# input
In [77]: arr = np.array([23, 45, 23, 0, 12, 45, 45])
# result
In [78]: res = np.zeros_like(arr)
Now, I want to compute a running count of each unique element and store it in the res array.
Concretely, res array should be:
In [79]: res
Out[79]: array([1, 1, 2, 1, 1, 2, 3])
[23, 45, 23, 0, 12, 45, 45]
[1, 1, 2, 1, 1, 2, 3]
We start counting each element and increment its count each time it re-appears, until we reach the end of the array. These element-specific counts should be returned as the result.
How can we achieve this using NumPy built-in functions? I tried using numpy.bincount but it gives undesired results.

Not sure you'll find a builtin, so here is a homebrew using argsort.
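def running_count(arr):
    # Stable sort, so equal elements keep their original relative order.
    idx = arr.argsort(kind='mergesort')
    sarr = arr[idx]
    # Positions in the sorted array where a new value starts.
    neq = np.where(sarr[1:] != sarr[:-1])[0] + 1
    # Diff of the target saw tooth: ones, with a negative jump at each new value.
    run = np.ones(arr.shape, int)
    if neq.size:  # neq is empty when all elements are equal
        run[neq[0]] -= neq[0]
        run[neq[1:]] -= np.diff(neq)
    # cumsum rebuilds the saw tooth; scatter it back to the original order.
    res = np.empty_like(run)
    res[idx] = run.cumsum()
    return res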
For example:
>>> running_count(arr)
array([1, 1, 2, 1, 1, 2, 3])
>>> running_count(np.array(list("xabaaybeeetz")))
array([1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1])
Explainer:
We first sort using argsort because we need the indices to go back to the original order at the end. Here it is important to have a stable sort, hence the use of the slower mergesort.
Once the elements are sorted, the running counts form a "saw tooth" pattern. The vectorized way to create this is to observe that the diff of a saw tooth has "jump" values where a new tooth starts and ones everywhere else. So that is straightforward to construct.
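When testing a vectorized version like this, a plain-Python loop makes a handy reference: it is much slower, but its logic (count how often each value has been seen so far) is easy to verify by eye. A sketch; running_count_loop is a hypothetical name.

```python
from collections import defaultdict
import numpy as np

def running_count_loop(arr):
    """Plain-Python reference: how many times each element has been seen so far."""
    seen = defaultdict(int)
    out = np.empty(len(arr), dtype=int)
    for i, value in enumerate(arr):
        seen[value] += 1
        out[i] = seen[value]
    return out

print(running_count_loop(np.array([23, 45, 23, 0, 12, 45, 45])))  # [1 1 2 1 1 2 3]
```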

Map a number to an id in python

Suppose I have a numpy array like: [11, 30, 25]. These numbers represent categories of the objects corresponding to the indices. I know there are just 20 categories, but for some reason they are numbered from 11 to 30. I'd like to convert them to numbers in 0:19 and back. What would be a pythonic way to do this? Preferably in NumPy.
EDIT: this is just a small example of a bigger problem, where the number of categories is in the thousands and some categories are never represented, so the maximum id will be the number of unique existing categories.

Let's say arr is the input array of categories.
Forward Process/Encoding : From categories to IDs
To perform the encoding, use np.unique along with its optional return_inverse argument to give us IDs with values from 0 to N-1, where N is the number of categories in arr, like so -
unq,idx = np.unique(arr,return_inverse=True)
Backward Process/Decoding : From IDs to categories
To go back to the original categories from the IDs (idx), just index into unique categories saved earlier as unq, like so -
arr_out = unq[idx]
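Putting the two steps together on the question's numbers, plus one way to encode new values later with np.searchsorted. That last step is an addition beyond the answer and assumes every new value already appears in unq; an unseen value would silently get a wrong ID.

```python
import numpy as np

arr = np.array([11, 30, 25, 11])

# Encode: unq holds the sorted distinct categories, ids index into unq.
unq, ids = np.unique(arr, return_inverse=True)
print(unq)   # [11 25 30]
print(ids)   # [0 2 1 0]

# Decode: index the unique values with the IDs.
print((unq[ids] == arr).all())  # True

# Encode new values against the saved unq (assumes they all occur in unq).
new_ids = np.searchsorted(unq, [30, 11])
print(new_ids)  # [2 0]
```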
Sample run -
In [40]: arr # Input array of categories
Out[40]: array([7, 1, 1, 3, 8, 2, 7, 7, 0, 2])
In [41]: unq,idx = np.unique(arr,return_inverse=True)
In [42]: idx # ID array with values from 0 to 5 (6 categories)
Out[42]: array([4, 1, 1, 3, 5, 2, 4, 4, 0, 2])
In [43]: unq[idx] # Get back original array of categories
Out[43]: array([7, 1, 1, 3, 8, 2, 7, 7, 0, 2])

To be able to easily convert back-and-forth, I would use the sklearn.preprocessing module LabelEncoder:
In [7]: from sklearn.preprocessing import LabelEncoder
In [8]: encoder = LabelEncoder()
In [9]: encoder.fit(range(11,31))
Out[9]: LabelEncoder()
In [10]: encoder.transform([11,30,25])
Out[10]: array([ 0, 19, 14])
In [11]: encoder.inverse_transform([18, 1, 15])
Out[11]: array([29, 12, 26])

Change a 1D NumPy array from (implicit) row major to column major order

I have a 1D array in NumPy that implicitly represents some 2D data in row-major order. Here's a trivial example:
import numpy as np
# My data looks like [[1,2,3,4], [5,6,7,8]]
a = np.array([1,2,3,4,5,6,7,8])
I want to get a 1D array in column-major order (i.e. b = [1,5,2,6,3,7,4,8] in the example above).
Normally, I would just do the following:
mat = np.reshape(a, (-1,4))
b = mat.flatten('F')
Unfortunately, the length of my input array is not an exact multiple of the row length I want (i.e. a = [1,2,3,4,5,6,7]), so I can't call reshape. I want to keep that extra data, though, which might be quite a lot since my rows are pretty long. Is there any straightforward way to do this in NumPy?

The simplest way I can think of is not to try to use reshape with methods such as ravel('F'), but just to concatenate sliced views of your array.
For example:
>>> cols = 4
>>> a = np.array([1,2,3,4,5,6,7])
>>> np.concatenate([a[i::cols] for i in range(cols)])
array([1, 5, 2, 6, 3, 7, 4])
This works for any length of array and any number of columns:
>>> cols = 5
>>> b = np.arange(17)
>>> np.concatenate([b[i::cols] for i in range(cols)])
array([ 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 3, 8, 13, 4, 9, 14])
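The same idea as a reusable function. A sketch; column_major is a hypothetical name, and it assumes a 1-D input.

```python
import numpy as np

def column_major(a, cols):
    """Read a 1-D row-major array column by column; the last row may be short."""
    # a[i::cols] is column i of the implicit 2-D layout (a view, no copy).
    return np.concatenate([a[i::cols] for i in range(cols)])

print(column_major(np.array([1, 2, 3, 4, 5, 6, 7, 8]), 4))  # [1 5 2 6 3 7 4 8]
print(column_major(np.array([1, 2, 3, 4, 5, 6, 7]), 4))     # [1 5 2 6 3 7 4]
```

When the length is an exact multiple of cols, this matches the reshape/flatten('F') approach from the question.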
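Alternatively, use as_strided to reshape. The array a is too small to fit the (2, 4) shape, so the view reads past the end of the array's buffer: you'll get junk (i.e. whatever is in memory) in the last place, and in the worst case reading invalid memory like this can crash the interpreter: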
>>> np.lib.stride_tricks.as_strided(a, shape=(2, 4))
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 168430121]])
>>> _.flatten('F')[:7]
array([1, 5, 2, 6, 3, 7, 4])
In the general case, given an array b and a desired number of columns cols you can do this:
>>> x = np.lib.stride_tricks.as_strided(b, shape=(len(b)//cols + 1, cols)) # reshape to min 2d array needed to hold array b
>>> np.concatenate((x[:,:len(b)%cols].ravel('F'), x[:-1, len(b)%cols:].ravel('F')))
This unravels the "good" part of the array (those columns not containing junk values) and the bad part (except for the junk values which lie in the bottom row) and concatenates the two unraveled arrays. For example:
>>> cols = 5
>>> b = np.arange(17)
>>> x = np.lib.stride_tricks.as_strided(b, shape=(len(b)//cols + 1, cols))
>>> np.concatenate((x[:,:len(b)%cols].ravel('F'), x[:-1, len(b)%cols:].ravel('F')))
array([ 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 3, 8, 13, 4, 9, 14])

Use some value to represent null so that the array length becomes a multiple of the row length. If casting to float is acceptable, you could use NaNs to represent the added elements. Then reshape to 2D, transpose, and reshape to 1D. Then eliminate the nulls.
import numpy as np
a = np.array([1,2,3,4,5,6,7]) # input
b = np.concatenate((a, [np.nan])) # add a NaN to make the length 8 = 2x4
c = b.reshape(2,4).transpose().reshape(8) # reshape to 2x4, transpose to 4x2, flatten to length 8
d = c[~np.isnan(c)] # remove NaN
print(d)
[1. 5. 2. 6. 3. 7. 4.]
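A variant of the same pad, transpose, strip idea that avoids casting to float (and works even if the data already contain NaN): pad an index array with -1 instead of padding the data, then gather from a. A sketch; column_major_padded is a hypothetical name and the -1 sentinel is an assumption of this variant, not part of the answer above.

```python
import numpy as np

def column_major_padded(a, cols):
    """Pad, transpose, strip, using -1 in an index array as the 'null' marker."""
    n = len(a)
    pad = (-n) % cols                         # slots needed to complete the last row
    idx = np.concatenate([np.arange(n), np.full(pad, -1)])
    order = idx.reshape(-1, cols).T.ravel()   # column-major walk over the padded grid
    return a[order[order >= 0]]               # drop the padding, then gather from a

print(column_major_padded(np.array([1, 2, 3, 4, 5, 6, 7]), 4))  # [1 5 2 6 3 7 4]
```

Because only indices are padded, the result keeps a's original dtype.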
