Skip every nth index of numpy array - python

In order to do K-fold validation I would like to slice a numpy array so that every nth element is removed, ideally as a view of the original array.
For example:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
If n = 4 then the result would be
[1, 2, 3, 5, 6, 7, 9]
Note: the numpy requirement is due to this being used for a machine learning assignment where the dependencies are fixed.

Approach #1 with modulus
a[np.mod(np.arange(a.size),4)!=0]
Sample run -
In [255]: a
Out[255]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [256]: a[np.mod(np.arange(a.size),4)!=0]
Out[256]: array([1, 2, 3, 5, 6, 7, 9])
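Wrapped as a small helper (the name skip_every_nth is just illustrative), the same idiom reads -

```python
import numpy as np

def skip_every_nth(a, n):
    # Keep only positions whose index is not a multiple of n
    return a[np.arange(a.size) % n != 0]

a = np.arange(10)
print(skip_every_nth(a, 4))  # removes indices 0, 4, 8 -> [1 2 3 5 6 7 9]
```

Note that fancy/boolean indexing returns a copy, not a view, so this does not satisfy the view requirement on its own.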
Approach #2 with masking : Requirement as a view
Considering the view requirement, if the idea is to save memory, we could store the equivalent boolean array, which occupies 8 times less memory when the input holds 8-byte integers (the default on a 64-bit Linux system). Such a mask-based approach would be like so -
# Create mask
mask = np.ones(a.size, dtype=bool)
mask[::4] = 0
Here's the memory requirement stat -
In [311]: mask.itemsize
Out[311]: 1
In [312]: a.itemsize
Out[312]: 8
Then, boolean-index assignment through the mask writes to the original array in place, behaving like a view for updates -
In [313]: a
Out[313]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [314]: a[mask] = 10
In [315]: a
Out[315]: array([ 0, 10, 10, 10, 4, 10, 10, 10, 8, 10])
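Putting both uses of the mask together: indexing on the right-hand side gives a copy of the skipped elements, while assignment through the same mask modifies the original. A minimal sketch -

```python
import numpy as np

a = np.arange(10)

# Build the mask once; False at every 4th position
mask = np.ones(a.size, dtype=bool)
mask[::4] = False

b = a[mask]    # copy of the kept elements: [1, 2, 3, 5, 6, 7, 9]
a[mask] = 10   # in-place assignment through the same mask
```

The mask itself can be reused across folds, which is what makes this convenient for K-fold splits.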
Approach #3 with NumPy array strides : Requirement as a view
You can use np.lib.stride_tricks.as_strided to create such a view, provided the length of the input array is a multiple of n. If it's not a multiple, it would still appear to work, but it isn't a safe practice, as we would be reading beyond the memory allocated for the input array. Please note that the view thus created is 2D.
Thus, an implementation to get such a view would be -
def skipped_view(a, n):
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    return strided(a, shape=((a.size + n - 1) // n, n), strides=(n * s, s))[:, 1:]
Sample run -
In [50]: a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) # Input array
In [51]: a_out = skipped_view(a, 4)
In [52]: a_out
Out[52]:
array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11]])
In [53]: a_out[:] = 100 # Let's prove output is a view indeed
In [54]: a
Out[54]: array([ 0, 100, 100, 100, 4, 100, 100, 100, 8, 100, 100, 100])
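When the array length is an exact multiple of n, a plain reshape gives the same 2D view without stride tricks and without the out-of-bounds caveat - a safer sketch of the same idea:

```python
import numpy as np

a = np.arange(12)
n = 4

# reshape of a contiguous array is a view; slicing off column 0 keeps it a view
view = a.reshape(-1, n)[:, 1:]

view[:] = 100   # writes through to a, proving it is a view
```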

numpy.delete :
In [18]: arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [19]: arr = np.delete(arr, np.arange(0, arr.size, 4))
In [20]: arr
Out[20]: array([1, 2, 3, 5, 6, 7, 9])

The slickest answer I found uses del on a slice, with i being the step whose multiples you want to skip (note this works on Python lists, not numpy arrays):
del lst[i-1::i]
Example:
In [1]: a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [2]: del a[4-1::4]
In [3]: print(a)
[0, 1, 2, 4, 5, 6, 8, 9]
If you also want to skip the first value, use a[1:].

Related

Slicing list with respect to limits

I have the following problem regarding slicing arrays.
I have an array of the form:
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
and I would like to extract an array like this:
b = [8, 9, 10, 0, 1, 2, 3].
I tried the usual slicing, b = a[-3 : 3], but it returns an empty array since the slice wraps past the array limits.
I can't figure out how to extract this array in the correct order.
You can just concatenate the list slices you are looking for:
b = a[-3:] + a[:4]
If instead you want to keep all of the array elements, just rotated, there is a function called roll.
>>> import numpy as np
>>> a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> np.roll(a, 3)
array([8, 9, 10, 0, 1, 2, 3, 4, 5, 6, 7])
You can also use a negative number to roll the other way:
>>> np.roll(a, -3)
array([ 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2])

Easiest way to create a matrix with pre-determined dimension and values

I have a matrix with dimension (2,5) and a vector of values to fill that matrix with. What is the best way? I can think of three methods, but I have trouble using np.empty & fill and np.full without loops:
x=np.array(range(0,10))
mat=x.reshape(2,5)
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
mat=np.empty((2,5))
newMat=mat.fill(x) # Error: The x has to be scalar
mat=np.full((2,5),x) # Error: The x has to be scalar
full and fill are for setting all elements the same
In [557]: np.full((2,5),10)
Out[557]:
array([[10, 10, 10, 10, 10],
[10, 10, 10, 10, 10]])
Assigning an array works provided the shapes match (in the broadcasting sense); here arr is a (2,5) integer array:
In [558]: arr[...] = x.reshape(2,5) # make source the same shape as target
In [559]: arr
Out[559]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
In [560]: arr.flat = x # make target same shape as source
In [561]: arr
Out[561]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
arr.flat and arr.ravel() are equivalent. Well, not quite:
In [562]: arr.flat = x.reshape(2,5) # don't need the [:] with flat #wim
In [563]: arr
Out[563]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
In [564]: arr.ravel()[:] = x.reshape(2,5)
ValueError: could not broadcast input array from shape (2,5) into shape (10)
In [565]: arr.ravel()[:] = x.reshape(2,5).flat
flat works with any shape source, even ones that require replication
In [570]: arr.flat = [1,2,3]
In [571]: arr
Out[571]:
array([[1, 2, 3, 1, 2],
[3, 1, 2, 3, 1]])
More broadcasted inputs
In [572]: arr[...] = np.ones((2,1))
In [573]: arr
Out[573]:
array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
In [574]: arr[...] = np.arange(5)
In [575]: arr
Out[575]:
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
An example of the problem Eric mentioned. The ravel (or other reshape) of a transpose is (often) a copy. So writing to that does not modify the original.
In [578]: arr.T.ravel()[:]=10
In [579]: arr
Out[579]:
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
In [580]: arr.T.flat=10
In [581]: arr
Out[581]:
array([[10, 10, 10, 10, 10],
[10, 10, 10, 10, 10]])
ndarray.flat returns an object which can modify the contents of the array by direct assignment:
>>> array = np.empty((2,5), dtype=int)
>>> vals = range(10)
>>> array.flat = vals
>>> array
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
If that seems kind of magical to you, then read about the descriptor protocol.
Warning: assigning to flat does not raise exceptions for size mismatch. If there are not enough values on the right hand side of the assignment, the data will be rolled/repeated. If there are too many values, only the first few will be used.
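A quick sketch of that warning's behavior, on both sides of the mismatch:

```python
import numpy as np

arr = np.zeros((2, 5), dtype=int)

arr.flat = [1, 2, 3]     # too few values: the sequence is repeated to fill
print(arr)               # [[1 2 3 1 2]
                         #  [3 1 2 3 1]]

arr.flat = range(100)    # too many values: only the first 10 are used
print(arr)               # [[0 1 2 3 4]
                         #  [5 6 7 8 9]]
```

Neither assignment raises, so size bugs can pass silently - worth an explicit shape check in real code.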
If you want a 10x2 matrix of 5:
np.ones((10,2))*5
If you have a list of values and just want them in a particular shape:
datavalues = [1,2,3,4,5,6,7,8,9,10]
np.reshape(datavalues,(2,5))
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10]])

Numpy Broadcasting

What happens when i make this operation in Numpy?
a = np.ones([500,1])
b = np.ones([5000,])/2
c = a + b
# a.shape (500,1)
# b.shape (5000, )
# c.shape (500, 5000)
I'm having a hard time to figure out what is actually happening in this broadcast.
NumPy broadcasting prepends singleton dimensions, so your 1-dimensional array of shape (5000,) is treated as a row vector of shape (1, 5000); summing shapes (500, 1) and (1, 5000) broadcasts both to (500, 5000), like an outer sum.
Since this is not very clear, you should extend your dimensions explicitly:
>>> np.arange(5)[:, None] + np.arange(8)[None, :]
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 1, 2, 3, 4, 5, 6, 7, 8],
[ 2, 3, 4, 5, 6, 7, 8, 9],
[ 3, 4, 5, 6, 7, 8, 9, 10],
[ 4, 5, 6, 7, 8, 9, 10, 11]])
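The original (500,1) + (5000,) case can be checked the same way; each entry c[i, j] equals a[i, 0] + b[j]:

```python
import numpy as np

a = np.ones([500, 1])
b = np.ones([5000]) / 2

c = a + b          # (500, 1) broadcasts against (1, 5000)
print(c.shape)     # (500, 5000)
# every element is 1.0 + 0.5 == 1.5
```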

Vectorized search of element indices

I have two integer numpy arrays, let's say, arr1 and arr2, that are permutations of range(some_length)
I want to get the third one, where
arr3[idx] = arr1.get_index_of(arr2[idx]) for all idx = 0,1,2,..., some_length-1
here get_index_of method is a pseudo-method of getting index of some element in the collection.
That can be done by naively looping through all the indices, searching for the corresponding element and assigning its index, etc.
But that is slow -- O(n^2). Can it be done faster (at least n*log(n) complexity)? Can it be done via pretty numpy methods? Maybe some sorting with a non-trivial key= parameter? Surely there is some elegant solution.
Thank you in advance.
say, a is a permutation of 0..9:
>>> a = np.random.permutation(10)
>>> a
array([3, 7, 1, 8, 2, 4, 6, 0, 9, 5])
then, the indexer array is:
>>> i = np.empty(len(a), dtype='i8')
>>> i[a] = np.arange(len(a))
>>> i
array([7, 2, 4, 0, 5, 9, 6, 1, 3, 8])
this means that, index of say 0 in a is i[0] == 7, which is true since a[7] == 0.
So, in your example, say if you have an extra vector b, you can do as in below:
>>> b
array([5, 9, 4, 8, 6, 1, 7, 2, 3, 0])
>>> i[b]
array([9, 8, 5, 3, 6, 2, 1, 4, 0, 7])
which means that, say, b[0] == 5 and index of 5 in a is i[b][0] == 9, which is true, since a[9] = 5 = b[0].
Lets try a test case
In [166]: arr1=np.random.permutation(10)
In [167]: arr2=np.random.permutation(10)
In [168]: arr1
Out[168]: array([4, 3, 2, 9, 7, 8, 5, 1, 6, 0])
In [169]: arr2
Out[169]: array([9, 2, 6, 4, 0, 3, 1, 7, 8, 5])
np.where(arr1==i) performs your get_index_of method, so your iterative solution is:
In [170]: np.array([np.where(arr1==i)[0] for i in arr2]).flatten()
Out[170]: array([3, 2, 8, 0, 9, 1, 7, 4, 5, 6], dtype=int32)
A vectorized approach is to do an 'outer' comparison between the 2 arrays. This produces a (10,10) array, to which we can apply where to get the indices. Still an O(n^2) method, but it is mostly compiled; on this size of problem it is 5x faster.
In [171]: np.where(arr1==arr2[:,None])
Out[171]:
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32),
array([3, 2, 8, 0, 9, 1, 7, 4, 5, 6], dtype=int32))
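The n*log(n) bound the question asks about is reachable with np.argsort: for a permutation of range(n), argsort is exactly its inverse, so the whole lookup is two vectorized steps. Using the same test arrays as above:

```python
import numpy as np

arr1 = np.array([4, 3, 2, 9, 7, 8, 5, 1, 6, 0])
arr2 = np.array([9, 2, 6, 4, 0, 3, 1, 7, 8, 5])

inv = np.argsort(arr1)   # inv[v] == index of value v in arr1
arr3 = inv[arr2]
print(arr3)              # [3 2 8 0 9 1 7 4 5 6]
```

The empty-array-plus-assignment trick from the first answer (i[a] = np.arange(len(a))) is even better, O(n), but argsort is the direct answer to the sorting-based phrasing of the question.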

Fast way to select n items (drawn from a Poisson distribution) for each element in array x

I am having some trouble with solving a problem I encountered.
I have an array with prices:
>>> x = np.random.randint(10, size=10)
array([6, 1, 7, 6, 9, 0, 8, 2, 1, 8])
And a (randomly) generated array of Poisson distributed arrivals:
>>> arrivals = np.random.poisson(1, size=10)
array([4, 0, 1, 1, 3, 2, 1, 3, 2, 1])
Each single arrival should be associated with the price at the same index. So in the case above, the first element ( x[0] ) should be selected 4 times ( arrivals[0] ). The second element ( x[1] ) should be selected 0 times ( arrivals[1] )... The result thus should be:
array([6, 6, 6, 6, 7, 6, 9, 9, 9, 0, 0, 8, 2, 2, 2, 1, 1, 8])
Is there any (fast) way to accomplish this, without iterating over the arrays? Any help would be greatly appreciated.
You could use np.repeat:
In [43]: x = np.array([6, 1, 7, 6, 9, 0, 8, 2, 1, 8])
In [44]: arrivals = np.array([4, 0, 1, 1, 3, 2, 1, 3, 2, 1])
In [45]: np.repeat(x, arrivals)
Out[45]: array([6, 6, 6, 6, 7, 6, 9, 9, 9, 0, 0, 8, 2, 2, 2, 1, 1, 8])
but note that for certain calculations, it might be possible to avoid having to form this intermediate array. See for example, scipy.stats.binned_statistic.
I don't really see how you could do that without looping at all.
What you could do is create the result array prior to looping; that way you don't need to concatenate afterwards.
Result = np.empty(arrivals.sum(), dtype='i')
and then change the values of that array blockwise:
Result_position = np.r_[[0], arrivals.cumsum()]
for i, xx in enumerate(x):
    Result[Result_position[i]:Result_position[i+1]] = xx
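The blockwise loop can be checked against np.repeat; both produce the same expansion:

```python
import numpy as np

x = np.array([6, 1, 7, 6, 9, 0, 8, 2, 1, 8])
arrivals = np.array([4, 0, 1, 1, 3, 2, 1, 3, 2, 1])

# Preallocate, then fill each block [start, end) with the matching price
result = np.empty(arrivals.sum(), dtype=x.dtype)
positions = np.r_[[0], arrivals.cumsum()]
for i, xx in enumerate(x):
    result[positions[i]:positions[i + 1]] = xx

assert np.array_equal(result, np.repeat(x, arrivals))
```

For large inputs np.repeat avoids the Python-level loop entirely and is the one to prefer.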
