This post is an extension of this question.
I would like to delete multiple elements from a numpy array that have certain values. That is, for
import numpy as np
a = np.array([1, 1, 2, 5, 6, 8, 8, 8, 9])
How do I delete one instance of each of the values [1, 5, 8], so that the output is [1, 2, 6, 8, 8, 9]? All I have found in the documentation for removing array elements is np.setdiff1d, but that removes all instances of each number. How can this be adapted?
You can use an outer comparison and argmax to remove only the first occurrence of each value. For large arrays this will be memory intensive, since the intermediate mask has a.shape[0] * r.shape[0] elements.
r = np.array([1, 5, 8])
m = (a == r[:, None]).argmax(1)  # index of the first occurrence in a of each value in r
np.delete(a, m)
array([1, 2, 6, 8, 8, 9])
This does assume that each value in r appears in a at least once; otherwise the value at index 0 will get deleted, since argmax finds no match and returns 0.
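A minimal sketch of a guard for that case (my own addition, assuming values absent from a should simply be skipped):

import numpy as np

a = np.array([1, 1, 2, 5, 6, 8, 8, 8, 9])
r = np.array([1, 5, 7])            # 7 does not occur in a

mask = a == r[:, None]             # shape (len(r), len(a))
present = mask.any(axis=1)         # which values of r occur in a
m = mask.argmax(axis=1)[present]   # first-occurrence indices for present values only
print(np.delete(a, m))             # [1 2 6 8 8 8 9]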
delNums = [np.where(a == x)[0][0] for x in [1,5,8]]
a = np.delete(a, delNums)
Here, delNums contains the indices of the first occurrences of the values 1, 5, 8, and np.delete() removes the values at those indices.
OUTPUT:
[1 2 6 8 8 9]
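This approach has the same caveat as the previous answer: np.where(a == x)[0][0] raises an IndexError when x does not occur in a. A minimal guard (my own addition, assuming absent values should be skipped):

import numpy as np

a = np.array([1, 1, 2, 5, 6, 8, 8, 8, 9])
delNums = []
for x in [1, 5, 7]:                # 7 does not occur in a
    hits = np.where(a == x)[0]
    if hits.size:                  # only record an index when x is present
        delNums.append(hits[0])
print(np.delete(a, delNums))       # [1 2 6 8 8 8 9]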
a = np.array([5,8,3,4,2,5,7,8,1,9,1,3,4,7])
b = np.array([3,4,7,8,1,3])
I have two arrays of integers, each grouped by every 2 consecutive items (i.e. indices [0, 1], [2, 3] and so on).
No pair of items appears twice in either array, in either the same or the reverse order.
One array is significantly larger than, and inclusive of, the other.
I am trying to figure out an efficient way to get the indices of the larger array's grouped items that are also in the smaller one.
The desired output in the example above should be:
[2,3,6,7,10,11] #indices
Notice that, as an example, the first group of b ([3,4]) should not get indices 11,12 as a match, because there 3 is the second element of the pair [1,3] and 4 the first element of [4,7].
Since your arrays are grouped in pairs, you can reshape them into two columns for comparison. You can then compare each row of the shorter array to the longer array and reduce the resulting boolean arrays. From there it is a simple matter to get the indices using a reshaped np.arange.
import numpy as np
from functools import reduce
a = np.array([5,8,3,4,2,5,7,8,1,9,1,3,4,7])
b = np.array([3,4,7,8,1,3])
# reshape a and b into columns
a2 = a.reshape((-1,2))
b2 = b.reshape((-1,2))
# create a generator of bools for the row of a2 that holds b2
b_in_a_generator = (np.all(a2==row, axis=1) for row in b2)
# reduce the generator to get an array of boolean that is True for each row
# of a2 that equals one of the rows of b2
ix_bool = reduce(lambda x,y: x+y, b_in_a_generator)
# grab the indices by slicing a reshaped np.arange array
ix = np.arange(len(a)).reshape((-1,2))[ix_bool]
ix
# returns:
array([[ 2,  3],
       [ 6,  7],
       [10, 11]])
If you want a flat array, simply ravel ix:
ix.ravel()
# returns
array([ 2, 3, 6, 7, 10, 11])
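As an aside, the reduce over + works because adding boolean arrays acts as a logical OR. An equivalent fully broadcast version (my own variant, not from the original answer) avoids the Python-level loop:

import numpy as np

a = np.array([5,8,3,4,2,5,7,8,1,9,1,3,4,7])
b = np.array([3,4,7,8,1,3])

a2 = a.reshape(-1, 2)
b2 = b.reshape(-1, 2)
# compare every row of b2 against every row of a2 at once: shape (len(b2), len(a2), 2)
ix_bool = (a2 == b2[:, None]).all(axis=2).any(axis=0)
ix = np.arange(len(a)).reshape(-1, 2)[ix_bool].ravel()
print(ix)   # [ 2  3  6  7 10 11]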
Here's one approach making use of a NumPy view of groups of elements -
# Taken from https://stackoverflow.com/a/45313353/
def view1D(a, b):  # a, b are 2D arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)  # the void view requires contiguous memory
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()
def grouped_indices(a, b):
    a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
    sidx = a0v.argsort()
    idx = sidx[np.searchsorted(a0v, b0v, sorter=sidx)]
    return ((idx*2)[:,None] + [0,1]).ravel()
If some group from b has no membership in a, we could filter those out using a mask: a0v[idx] == b0v.
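A minimal sketch of that filtering (my assumption being that unmatched groups should simply be dropped):

def grouped_indices_filtered(a, b):  # hypothetical variant with the mask applied
    a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
    sidx = a0v.argsort()
    pos = np.searchsorted(a0v, b0v, sorter=sidx)
    pos[pos == len(a0v)] = 0         # clamp out-of-range positions before indexing
    idx = sidx[pos]
    idx = idx[a0v[idx] == b0v]       # keep only genuine matches
    return ((idx*2)[:,None] + [0,1]).ravel()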
Sample run -
In [345]: a
Out[345]: array([5, 8, 3, 4, 2, 5, 7, 8, 1, 9, 1, 3, 4, 7])
In [346]: b
Out[346]: array([3, 4, 7, 8, 1, 3])
In [347]: grouped_indices(a, b)
Out[347]: array([ 2, 3, 6, 7, 10, 11])
Another one using np.in1d to replace np.searchsorted -
def grouped_indices_v2(a, b):
    a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
    return (np.flatnonzero(np.in1d(a0v, b0v))[:,None]*2 + [0,1]).ravel()
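Assuming the same a and b as before, this version should produce the same result -
In [348]: grouped_indices_v2(a, b)
Out[348]: array([ 2,  3,  6,  7, 10, 11])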
How can I convert numpy array a to numpy array b in a (num)pythonic way? The solution should ideally work for arbitrary dimensions and array lengths.
import numpy as np
a=np.arange(12).reshape(2,3,2)
b=np.empty((2,3),dtype=object)
b[0,0]=np.array([0,1])
b[0,1]=np.array([2,3])
b[0,2]=np.array([4,5])
b[1,0]=np.array([6,7])
b[1,1]=np.array([8,9])
b[1,2]=np.array([10,11])
For a start:
In [638]: a=np.arange(12).reshape(2,3,2)
In [639]: b=np.empty((2,3),dtype=object)
In [640]: for index in np.ndindex(b.shape):
   .....:     b[index]=a[index]
   .....:
In [641]: b
Out[641]:
array([[array([0, 1]), array([2, 3]), array([4, 5])],
       [array([6, 7]), array([8, 9]), array([10, 11])]], dtype=object)
It's not ideal, since it uses iteration. But I wonder whether it is even possible to access the elements of b in any other way. By using dtype=object you break the basic vectorization that numpy is known for. b is essentially a list with a numpy multiarray shape overlay. dtype=object puts an impenetrable wall around those size-2 arrays.
For example, a[:,:,0] gives me all the even numbers, in a (2,3) array. I can't get those numbers from b with just indexing. I have to use iteration:
[b[index][0] for index in np.ndindex(b.shape)]
# [0, 2, 4, 6, 8, 10]
np.array tries to make the highest-dimensional array that it can, given the regularity of the data. To fool it into making an array of objects, we have to give it an irregular list of lists or objects. For example, we could:
mylist = list(a.reshape(-1,2)) # list of arrays
mylist.append([]) # make the list irregular
b = np.array(mylist) # array of objects
b = b[:-1].reshape(2,3) # cleanup
The last solution suggests that my first one can be cleaned up a bit:
b = np.empty((6,),dtype=object)
b[:] = list(a.reshape(-1,2))
b = b.reshape(2,3)
I suspect that under the covers, the list() call does an iteration like
[x for x in a.reshape(-1,2)]
So time-wise it might not be much different from the ndindex time.
One thing that I wasn't expecting about b is that I can do math on it, with nearly the same generality as on a:
b-10
b += 10
b *= 2
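For illustration (my own example, using the b built above), the operation is applied to each element array and a new object array is returned:

c = b - 10        # object array whose elements are the shifted arrays
print(c[0, 0])    # [-10  -9]
b += 10           # in-place variants also work elementwise
b *= 2
print(b[0, 0])    # [20 22]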
An alternative to an object dtype would be a structured dtype, e.g.
In [785]: b1=np.zeros((2,3),dtype=[('f0',int,(2,))])
In [786]: b1['f0'][:]=a
In [787]: b1
Out[787]:
array([[([0, 1],), ([2, 3],), ([4, 5],)],
       [([6, 7],), ([8, 9],), ([10, 11],)]],
      dtype=[('f0', '<i4', (2,))])
In [788]: b1['f0']
Out[788]:
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]]])
In [789]: b1[1,1]['f0']
Out[789]: array([8, 9])
And b and b1 can be added: b+b1 (producing an object dtype). Curiouser and curiouser!
Based on hpaulj's answer, I provide a little more generic solution: a is an array of dimension N which shall be converted to an array b of dimension N1 with dtype object, holding arrays of dimension N-N1.
In the example, N equals 5 and N1 equals 3.
import numpy as np
N=5
N1=3
#create array a with dimension N
a=np.random.random(np.random.randint(2,20,size=N))
a_shape=a.shape
b_shape=a_shape[:N1] # shape of array b
b_arr_shape=a_shape[N1:] # shape of arrays in b
#Solution 1 with list() method (faster)
b=np.empty(np.prod(b_shape),dtype=object) #init b
b[:]=list(a.reshape((-1,)+b_arr_shape))
b=b.reshape(b_shape)
print("Dimension of b: {}".format(len(b.shape))) # dim of b
print("Dimension of array in b: {}".format(len(b[0,0,0].shape))) # dim of arrays in b
#Solution 2 with ndindex loop (slower)
b=np.empty(b_shape,dtype=object)
for index in np.ndindex(b_shape):
    b[index]=a[index]
print("Dimension of b: {}".format(len(b.shape))) # dim of b
print("Dimension of array in b: {}".format(len(b[0,0,0].shape))) # dim of arrays in b
I have a 1D array in NumPy that implicitly represents some 2D data in row-major order. Here's a trivial example:
import numpy as np
# My data looks like [[1,2,3,4], [5,6,7,8]]
a = np.array([1,2,3,4,5,6,7,8])
I want to get a 1D array in column-major order (i.e. b = [1,5,2,6,3,7,4,8] in the example above).
Normally, I would just do the following:
mat = np.reshape(a, (-1,4))
b = mat.flatten('F')
Unfortunately, the length of my input array is not an exact multiple of the row length I want (i.e. a = [1,2,3,4,5,6,7]), so I can't call reshape. I want to keep that extra data, though, which might be quite a lot since my rows are pretty long. Is there any straightforward way to do this in NumPy?
The simplest way I can think of is not to try and use reshape with methods such as ravel('F'), but just to concatenate sliced views of your array.
For example:
>>> cols = 4
>>> a = np.array([1,2,3,4,5,6,7])
>>> np.concatenate([a[i::cols] for i in range(cols)])
array([1, 5, 2, 6, 3, 7, 4])
This works for any length of array and any number of columns:
>>> cols = 5
>>> b = np.arange(17)
>>> np.concatenate([b[i::cols] for i in range(cols)])
array([ 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 3, 8, 13, 4, 9, 14])
Alternatively, use as_strided to reshape. The fact that the array a is too small to fit the (2, 4) shape doesn't matter: you'll just get junk (i.e. whatever's in memory) in the last place:
>>> np.lib.stride_tricks.as_strided(a, shape=(2, 4))
array([[ 1, 2, 3, 4],
       [ 5, 6, 7, 168430121]])
>>> _.flatten('F')[:7]
array([1, 5, 2, 6, 3, 7, 4])
In the general case, given an array b and a desired number of columns cols you can do this:
>>> x = np.lib.stride_tricks.as_strided(b, shape=(len(b)//cols + 1, cols)) # reshape to min 2d array needed to hold array b
>>> np.concatenate((x[:,:len(b)%cols].ravel('F'), x[:-1, len(b)%cols:].ravel('F')))
This unravels the "good" part of the array (those columns not containing junk values) and the bad part (except for the junk values which lie in the bottom row) and concatenates the two unraveled arrays. For example:
>>> cols = 5
>>> b = np.arange(17)
>>> x = np.lib.stride_tricks.as_strided(b, shape=(len(b)//cols + 1, cols))
>>> np.concatenate((x[:,:len(b)%cols].ravel('F'), x[:-1, len(b)%cols:].ravel('F')))
array([ 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 3, 8, 13, 4, 9, 14])
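If reading past the buffer with as_strided feels too risky, an index-based variant gives the same result without touching out-of-bounds memory (a hypothetical helper, not from the original answer):

import numpy as np

def rowmajor_to_colmajor(a, cols):
    pad = (-len(a)) % cols                    # slots needed to complete the last row
    idx = np.arange(len(a) + pad)             # indices over a virtual padded array
    order = idx.reshape(-1, cols).ravel('F')  # column-major traversal of those indices
    return a[order[order < len(a)]]           # drop the virtual padding positions

b = np.arange(17)
print(rowmajor_to_colmajor(b, 5))
# [ 0  5 10 15  1  6 11 16  2  7 12  3  8 13  4  9 14]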
Pad the array with a sentinel value so that its length becomes a multiple of the row length you want. If casting to float is acceptable, you can use NaN to represent the added elements. Then reshape to 2D, transpose, reshape back to 1D, and finally eliminate the NaNs.
import numpy as np
a = np.array([1,2,3,4,5,6,7])               # input
b = np.concatenate((a, [np.nan]))           # pad with one NaN to make the length 8 = 2x4
c = b.reshape(2,4).transpose().reshape(8,)  # reshape to 2x4, transpose, flatten
d = c[~np.isnan(c)]                         # remove the NaN padding
print(d)
[ 1. 5. 2. 6. 3. 7. 4.]
Hi, I have an array with X values in it and I would like to locate the indices of the ten smallest values. In this question they calculated the maximum effectively: How to get indices of N maximum values in a numpy array?
However, I can't comment on links yet, so I'm having to repost the question.
I'm not sure which indices I need to change to get the minimum rather than the maximum values.
This is their code:
In [1]: import numpy as np
In [2]: arr = np.array([1, 3, 2, 4, 5])
In [3]: arr.argsort()[-3:][::-1]
Out[3]: array([4, 3, 1])
If you call
arr.argsort()[:3]
It will give you the indices of the 3 smallest elements.
array([0, 2, 1], dtype=int64)
So, for n, you should call
arr.argsort()[:n]
Since this question was posted, NumPy has been updated to include a faster way of selecting the smallest elements from an array, using argpartition. It was first included in NumPy 1.8.
Using snarly's answer as inspiration, we can quickly find the k=3 smallest elements:
In [1]: import numpy as np
In [2]: arr = np.array([1, 3, 2, 4, 5])
In [3]: k = 3
In [4]: ind = np.argpartition(arr, k)[:k]
In [5]: ind
Out[5]: array([0, 2, 1])
In [6]: arr[ind]
Out[6]: array([1, 2, 3])
This will run in O(n) time because it does not need to do a full sort. If you need your answers sorted (note: in this case the output array happened to be in sorted order, but that is not guaranteed), you can sort the output; np.sort keeps the result a NumPy array, whereas the built-in sorted would return a plain list:
In [7]: np.sort(arr[ind])
Out[7]: array([1, 2, 3])
This runs in O(n + k log k) because the sorting takes place on the smaller output list.
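If you want the indices themselves ordered by value rather than the values, you can sort just the k partitioned indices (my own continuation of the session above):

In [8]: ind[np.argsort(arr[ind])]
Out[8]: array([0, 2, 1])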
I don't guarantee that this will be faster, but a better algorithm would rely on heapq.
import heapq
indices = heapq.nsmallest(10, range(len(arr)), key=arr.__getitem__)  # iterate over positions, keyed by value
This should take approximately O(N) operations, whereas using argsort would take O(N log N) operations. However, argsort is pushed into highly optimized C, so it might still perform better. To know for sure, you'd need to run some tests on your actual data.
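For a quick self-contained check (my own example, with k = 3 on the toy array from the earlier answers):

import heapq
import numpy as np

arr = np.array([1, 3, 2, 4, 5])
# iterate over positions, comparing by the value stored at each position
indices = heapq.nsmallest(3, range(len(arr)), key=arr.__getitem__)
print(indices)   # [0, 2, 1]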
Just don't reverse the sort results.
In [164]: a = numpy.random.random(20)
In [165]: a
Out[165]:
array([ 0.63261763,  0.01718228,  0.42679479,  0.04449562,  0.19160089,
        0.29653725,  0.93946388,  0.39915215,  0.56751034,  0.33210873,
        0.17521395,  0.49573607,  0.84587652,  0.73638224,  0.36303797,
        0.2150837 ,  0.51665416,  0.47111993,  0.79984964,  0.89231776])
Sorted:
In [166]: a.argsort()
Out[166]:
array([ 1,  3, 10,  4, 15,  5,  9, 14,  7,  2, 17, 11, 16,  8,  0, 13, 18,
       12, 19,  6])
First ten:
In [168]: a.argsort()[:10]
Out[168]: array([ 1, 3, 10, 4, 15, 5, 9, 14, 7, 2])
This code saves the indices of the 20 largest elements of split_list in Twenty_Maximum:
Twenty_Maximum = split_list.argsort()[-20:]
while this code saves the indices of the 20 smallest elements of split_list in Twenty_Minimum:
Twenty_Minimum = split_list.argsort()[:20]