How to get items that belong to all numpy arrays? - python

I need some analogue of the numpy.in1d() function; my task is to get the list of items that all of the arrays (more than two of them) contain. For example, I have 3 arrays:
a = np.array((1,2,5,6,12))
b = np.array((1,3,7,8,5,14,19))
c = np.array((2,6,9,5,1,22))
the result should be [1, 5]
Is there any faster way than a plain loop that uses np.in1d to compare the first array with all the rest? Perhaps some union of arrays or some smart subindexing?

You can use np.intersect1d. For example:
In [15]: np.intersect1d(a, np.intersect1d(b, c))
Out[15]: array([1, 5])
or with reduce (in Python 3, reduce must be imported from functools):
In [16]: reduce(np.intersect1d, (a, b, c))
Out[16]: array([1, 5])
If you know the elements within each array are unique, use the argument assume_unique=True:
In [21]: reduce(lambda x, y: np.intersect1d(x, y, assume_unique=True), (a, b, c))
Out[21]: array([1, 5])
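In Python 3, reduce lives in functools; a self-contained version of the reduce approach above:

```python
from functools import reduce
import numpy as np

a = np.array((1, 2, 5, 6, 12))
b = np.array((1, 3, 7, 8, 5, 14, 19))
c = np.array((2, 6, 9, 5, 1, 22))

# Fold np.intersect1d across all arrays; assume_unique=True is safe here
# because each input contains no duplicates (and intersect1d's output is unique).
common = reduce(lambda x, y: np.intersect1d(x, y, assume_unique=True), (a, b, c))
print(common)  # [1 5]
```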

If the elements within each array are unique and are non-negative integers (np.bincount requires that), you can try:
>>> total = np.concatenate((a, b, c))
>>> np.where(np.bincount(total) > 2)
(array([1, 5]),)
It might be faster to index with a boolean mask instead of np.where:
>>> bins = np.bincount(total)
>>> np.arange(bins.shape[0])[bins > 2]
array([1, 5])
If these arrays are large, or contain large values (which would make bincount wasteful), deduplicate each array and compress the values to a small range with np.unique first:
>>> tmp = np.concatenate((np.unique(a), np.unique(b), np.unique(c)))
>>> tmp
array([ 1,  2,  5,  6, 12,  1,  3,  5,  7,  8, 14, 19,  1,  2,  5,  6,  9, 22])
>>> ulist, uindices = np.unique(tmp, return_inverse=True)
>>> ulist
array([ 1,  2,  3,  5,  6,  7,  8,  9, 12, 14, 19, 22])
>>> uindices
array([ 0,  1,  3,  4,  8,  0,  2,  3,  5,  6,  9, 10,  0,  1,  3,  4,  7, 11])
>>> np.bincount(uindices)
array([3, 2, 1, 3, 2, 1, 1, 1, 1, 1, 1, 1])
>>> ulist[np.bincount(uindices) > 2]
array([1, 5])
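The bincount-of-uniques idea generalizes to any number of arrays; here is a small sketch (the helper name common_to_all is made up for illustration):

```python
import numpy as np

def common_to_all(*arrays):
    # Deduplicate each array, concatenate, then count how many of the
    # arrays each value appears in; a value common to all of them
    # appears exactly len(arrays) times.
    tmp = np.concatenate([np.unique(arr) for arr in arrays])
    vals, counts = np.unique(tmp, return_counts=True)
    return vals[counts == len(arrays)]

a = np.array((1, 2, 5, 6, 12))
b = np.array((1, 3, 7, 8, 5, 14, 19))
c = np.array((2, 6, 9, 5, 1, 22))
print(common_to_all(a, b, c))  # [1 5]
```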

Related

How to remap list of values from an array to another list of values in NumPy?

Let's say we have initial array:
test_array = np.array([1, 4, 2, 5, 7, 4, 2, 5, 6, 7, 7, 2, 5])
What is the best way to remap elements in this array using two other arrays: one that lists the values we want to replace, and a second that gives the new values to replace them with:
map_from = np.array([2, 4, 5])
map_to = np.array([9, 0, 3])
So the result should be:
remapped_array = [1, 0, 9, 3, 7, 0, 9, 3, 6, 7, 7, 9, 3]
There might be a more succinct way of doing this, but this should work by using a mask.
mask = test_array[:,None] == map_from
val = map_to[mask.argmax(1)]
np.where(mask.any(1), val, test_array)
output:
array([1, 0, 9, 3, 7, 0, 9, 3, 6, 7, 7, 9, 3])
If your original array contains only non-negative integers whose maximum is not very large, it is easiest to use a lookup (mapping) array:
>>> a = np.array([1, 4, 2, 5, 7, 4, 2, 5, 6, 7, 7, 2, 5])
>>> mapping = np.arange(a.max() + 1)
>>> map_from = np.array([2, 4, 5])
>>> map_to = np.array([9, 0, 3])
>>> mapping[map_from] = map_to
>>> mapping[a]
array([1, 0, 9, 3, 7, 0, 9, 3, 6, 7, 7, 9, 3])
Here is another general method:
>>> vals, inv = np.unique(a, return_inverse=True)
>>> vals[np.searchsorted(vals, map_from)] = map_to
>>> vals[inv]
array([1, 0, 9, 3, 7, 0, 9, 3, 6, 7, 7, 9, 3])
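If the values are not small integers, a searchsorted-based variant avoids building a full lookup table (a sketch; it assumes map_from is sorted, as it is in this example):

```python
import numpy as np

test_array = np.array([1, 4, 2, 5, 7, 4, 2, 5, 6, 7, 7, 2, 5])
map_from = np.array([2, 4, 5])   # must be sorted for searchsorted
map_to = np.array([9, 0, 3])

# For each element, find where it would be inserted into map_from.
idx = np.searchsorted(map_from, test_array)
idx[idx == len(map_from)] = 0          # clamp out-of-range positions
mask = map_from[idx] == test_array     # True only where a real match exists
out = np.where(mask, map_to[idx], test_array)
print(out)  # [1 0 9 3 7 0 9 3 6 7 7 9 3]
```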

Slicing list with respect to limits

I have the following problem regarding slicing arrays.
I have an array of the form:
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
and I would like to extract an array like this:
b = [8, 9, 10, 0, 1, 2, 3].
I tried using the usual slice notation, b = a[-3 : 3], but it returns an empty list, since the negative start resolves to index 8, which is past the stop index.
I can't figure out how I can extract this array in correct permutation.
You can just concatenate the list slices you are looking for:
b = a[-3:] + a[:4]
In your example you keep all of the array elements, just rotated. For that, NumPy has a function called np.roll:
>>> import numpy as np
>>> a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> np.roll(a, 3)
array([8, 9, 10, 0, 1, 2, 3, 4, 5, 6, 7])
You can also use a negative number to roll the other way:
>>> np.roll(a, -3)
array([ 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2])
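If you only want the seven-element wrapped slice from the question rather than a full rotation, np.take with mode='wrap' interprets indices modulo the array length:

```python
import numpy as np

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Indices -3..3 wrap around the end of the list: -3 maps to index 8.
b = np.take(a, np.arange(-3, 4), mode='wrap')
print(b)  # [ 8  9 10  0  1  2  3]
```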

Skip every nth index of numpy array

In order to do K-fold validation I would like to slice a numpy array such that a view of the original array is made, but with every nth element removed.
For example:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
If n = 4 then the result would be
[1, 2, 4, 5, 6, 8, 9]
Note: the numpy requirement is due to this being used for a machine learning assignment where the dependencies are fixed.
Approach #1 with modulus
a[np.mod(np.arange(a.size),4)!=0]
Sample run -
In [255]: a
Out[255]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [256]: a[np.mod(np.arange(a.size),4)!=0]
Out[256]: array([1, 2, 3, 5, 6, 7, 9])
Approach #2 with masking : Requirement as a view
Considering the views requirement, if the idea is to save memory, we could store an equivalent boolean mask that occupies 8 times less memory on a typical 64-bit system (1 byte per element versus 8 for int64). Such a mask-based approach would look like this -
# Create mask
mask = np.ones(a.size, dtype=bool)
mask[::4] = 0
Here's the memory requirement stat -
In [311]: mask.itemsize
Out[311]: 1
In [312]: a.itemsize
Out[312]: 8
Then, we could use boolean-indexing as a view -
In [313]: a
Out[313]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [314]: a[mask] = 10
In [315]: a
Out[315]: array([ 0, 10, 10, 10, 4, 10, 10, 10, 8, 10])
Approach #3 with NumPy array strides : Requirement as a view
You can use np.lib.stride_tricks.as_strided to create such a view, given that the length of the input array is a multiple of n. If it's not a multiple, it would still work, but it isn't a safe practice, as we would be reading beyond the memory allocated for the input array. Please note that the view thus created would be 2D.
Thus, an implementation to get such a view would be -
def skipped_view(a, n):
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    return strided(a, shape=((a.size + n - 1) // n, n), strides=(n * s, s))[:, 1:]
Sample run -
In [50]: a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) # Input array
In [51]: a_out = skipped_view(a, 4)
In [52]: a_out
Out[52]:
array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11]])
In [53]: a_out[:] = 100 # Let's prove output is a view indeed
In [54]: a
Out[54]: array([ 0, 100, 100, 100, 4, 100, 100, 100, 8, 100, 100, 100])
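When the array length is an exact multiple of n, the same kind of 2D view can be had without stride tricks: reshaping a contiguous array returns a view, and so does basic slicing of it (a sketch under that exact-multiple assumption):

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
n = 4
# Each row holds one group of n elements; dropping column 0 skips
# every n-th element of the original array.
a_out = a.reshape(-1, n)[:, 1:]
a_out[:] = 100   # writes through to a, proving a_out is a view
print(a)         # [  0 100 100 100   4 100 100 100   8 100 100 100]
```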
numpy.delete :
In [18]: arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [19]: arr = np.delete(arr, np.arange(0, arr.size, 4))
In [20]: arr
Out[20]: array([1, 2, 3, 5, 6, 7, 9])
The slickest answer that I found uses del on a plain Python list (this works on lists, not numpy arrays), with i being the step of the indices you want to skip:
del lst[i-1::i]
Example:
In [1]: a = list([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [2]: del a[4-1::4]
In [3]: print(a)
[0, 1, 2, 4, 5, 6, 8, 9]
If you also want to skip the first value, use a[1:].

Vectorized search of element indices

I have two integer numpy arrays, let's say, arr1 and arr2, that are permutations of range(some_length)
I want to get the third one, where
arr3[idx] = arr1.get_index_of(arr2[idx]) for all idx = 0,1,2,..., some_length-1
here get_index_of method is a pseudo-method of getting index of some element in the collection.
That can be done by naively looping through all the indices, searching for the corresponding element, assigning its index, and so on.
But that is slow -- O(n^2). Can it be done faster (in at least O(n*log(n)) complexity)? Can it be done with nice numpy methods? Maybe some sorting with a non-trivial key= parameter? Surely there is some elegant solution.
Thank you in advance.
say, a is a permutation of 0..9:
>>> a = np.random.permutation(10)
>>> a
array([3, 7, 1, 8, 2, 4, 6, 0, 9, 5])
then, the indexer array is:
>>> i = np.empty(len(a), dtype='i8')
>>> i[a] = np.arange(len(a))
>>> i
array([7, 2, 4, 0, 5, 9, 6, 1, 3, 8])
this means that the index of, say, 0 in a is i[0] == 7, which is true since a[7] == 0.
So, in your example, say if you have an extra vector b, you can do as in below:
>>> b
array([5, 9, 4, 8, 6, 1, 7, 2, 3, 0])
>>> i[b]
array([9, 8, 5, 3, 6, 2, 1, 4, 0, 7])
which means that, say, b[0] == 5 and index of 5 in a is i[b][0] == 9, which is true, since a[9] = 5 = b[0].
Let's try a test case:
In [166]: arr1=np.random.permutation(10)
In [167]: arr2=np.random.permutation(10)
In [168]: arr1
Out[168]: array([4, 3, 2, 9, 7, 8, 5, 1, 6, 0])
In [169]: arr2
Out[169]: array([9, 2, 6, 4, 0, 3, 1, 7, 8, 5])
np.where(arr1==i) performs your get_index_of method, so your iterative solution is:
In [170]: np.array([np.where(arr1==i)[0] for i in arr2]).flatten()
Out[170]: array([3, 2, 8, 0, 9, 1, 7, 4, 5, 6], dtype=int32)
A vectorized approach is to do an 'outer' comparison between the 2 arrays. This produces a (10,10) boolean array, to which we can apply where to get the indices. It is still an O(n^2) method, but it is mostly compiled; at this problem size it is about 5x faster.
In [171]: np.where(arr1==arr2[:,None])
Out[171]:
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32),
array([3, 2, 8, 0, 9, 1, 7, 4, 5, 6], dtype=int32))
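Since both arrays are permutations of range(n), the inverse-index trick above can also be written with argsort, which for a permutation computes exactly its inverse, giving an O(n log n) one-liner:

```python
import numpy as np

arr1 = np.array([4, 3, 2, 9, 7, 8, 5, 1, 6, 0])
arr2 = np.array([9, 2, 6, 4, 0, 3, 1, 7, 8, 5])

# For a permutation p, np.argsort(p) is its inverse: argsort(p)[p[i]] == i,
# so argsort(arr1)[v] is the index of value v within arr1.
arr3 = np.argsort(arr1)[arr2]
print(arr3)  # [3 2 8 0 9 1 7 4 5 6]
```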

Fast way to select n items (drawn from a Poisson distribution) for each element in array x

I am having some trouble with solving a problem I encountered.
I have an array with prices:
>>> x = np.random.randint(10, size=10)
>>> x
array([6, 1, 7, 6, 9, 0, 8, 2, 1, 8])
And a (randomly) generated array of Poisson distributed arrivals:
>>> arrivals = np.random.poisson(1, size=10)
>>> arrivals
array([4, 0, 1, 1, 3, 2, 1, 3, 2, 1])
Each single arrival should be associated with the price at the same index. So in the case above, the first price ( x[0] ) should be selected 4 times ( arrivals[0] ), the second price ( x[1] ) should be selected 0 times ( arrivals[1] ), and so on. The result thus should be:
array([6, 6, 6, 6, 7, 6, 9, 9, 9, 0, 0, 8, 2, 2, 2, 1, 1, 8])
Is there any (fast) way to accomplish this, without iterating over the arrays? Any help would be greatly appreciated.
You could use np.repeat:
In [43]: x = np.array([6, 1, 7, 6, 9, 0, 8, 2, 1, 8])
In [44]: arrivals = np.array([4, 0, 1, 1, 3, 2, 1, 3, 2, 1])
In [45]: np.repeat(x, arrivals)
Out[45]: array([6, 6, 6, 6, 7, 6, 9, 9, 9, 0, 0, 8, 2, 2, 2, 1, 1, 8])
but note that for certain calculations, it might be possible to avoid having to form this intermediate array. See for example, scipy.stats.binned_statistic.
I don't really see how you could do that without looping at all.
What you could do is create the result array prior to looping; that way you don't need to concatenate afterwards.
Result = np.empty( arrivals.sum(), dtype='i' )
and then change the values of that array blockwise:
Result_position = np.r_[ [0], arrivals.cumsum() ]
for i, xx in enumerate(x):
    Result[Result_position[i]:Result_position[i+1]] = xx
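Made self-contained with the question's example data, the preallocation loop agrees with np.repeat:

```python
import numpy as np

x = np.array([6, 1, 7, 6, 9, 0, 8, 2, 1, 8])
arrivals = np.array([4, 0, 1, 1, 3, 2, 1, 3, 2, 1])

result = np.empty(arrivals.sum(), dtype=x.dtype)
positions = np.r_[0, arrivals.cumsum()]   # block boundaries in the output
for i, xx in enumerate(x):
    # Fill the slice belonging to price x[i] with that price.
    result[positions[i]:positions[i + 1]] = xx

print(result)
assert np.array_equal(result, np.repeat(x, arrivals))
```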
