Sampling unique column indexes for each row of a numpy array - python

I want to generate a fixed number of random column indexes (without replacement) for each row of a numpy array.
A = np.array([[3, 5, 2, 3, 3],
[1, 3, 3, 4, 5],
[3, 5, 4, 2, 1],
[1, 2, 3, 5, 3]])
If I fixed the required column number to 2, I want something like
np.array([[1,3],
[0,4],
[1,4],
[2,3]])
I am looking for a loop-free NumPy-based solution. I tried np.random.choice, but with replace=False I get this error:
ValueError: Cannot take a larger sample than population when 'replace=False'

Here's one vectorized approach inspired by this post -
def random_unique_indexes_per_row(A, N=2):
    m, n = A.shape
    return np.random.rand(m, n).argsort(1)[:, :N]
Sample run -
In [146]: A
Out[146]:
array([[3, 5, 2, 3, 3],
[1, 3, 3, 4, 5],
[3, 5, 4, 2, 1],
[1, 2, 3, 5, 3]])
In [147]: random_unique_indexes_per_row(A, N=2)
Out[147]:
array([[4, 0],
[0, 1],
[3, 2],
[2, 0]])
In [148]: random_unique_indexes_per_row(A, N=3)
Out[148]:
array([[2, 0, 1],
[3, 4, 2],
[3, 2, 1],
[4, 3, 0]])
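The same idea can be sketched with NumPy's newer Generator API. This is only an illustration, not the answer author's code: the optional rng parameter and the N > n check are additions made here, and the column indexes returned depend on the random draws.

```python
import numpy as np

def random_unique_indexes_per_row(A, N=2, rng=None):
    """Return N distinct random column indexes for each row of A."""
    # rng is an assumed optional argument so that runs can be made reproducible
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    if N > n:
        raise ValueError("N cannot exceed the number of columns")
    # Sorting independent uniform draws gives a random permutation of
    # 0..n-1 in every row; the first N entries are therefore distinct.
    keys = rng.random((m, n))
    return np.argsort(keys, axis=1)[:, :N]

A = np.array([[3, 5, 2, 3, 3],
              [1, 3, 3, 4, 5],
              [3, 5, 4, 2, 1],
              [1, 2, 3, 5, 3]])
idx = random_unique_indexes_per_row(A, N=3, rng=np.random.default_rng(0))
print(idx)
```

Sorting costs O(n log n) per row, which is usually fine for a modest number of columns.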

Like this?
B = np.random.randint(5, size=(len(A), 2))

You can use random.choice() as following:
def random_indices(arr, n):
    x, y = arr.shape
    return np.random.choice(np.arange(y), (x, n))
    # or return np.random.randint(low=0, high=y, size=(x, n))
Demo:
In [34]: x, y = A.shape
In [35]: np.random.choice(np.arange(y), (x, 2))
Out[35]:
array([[0, 2],
[0, 1],
[0, 1],
[3, 1]])
As an experimental approach, here is a way that will give unique index rows in most cases:
In [60]: def random_ind(arr, n):
    ...:     x, y = arr.shape
    ...:     ind = np.random.randint(low=0, high=y, size=(x * 2, n))
    ...:     _, index = np.unique(ind.dot(np.random.rand(ind.shape[1])), return_index=True)
    ...:     return ind[index][:4]
In [61]: random_ind(A, 2)
Out[61]:
array([[0, 1],
[1, 0],
[1, 1],
[1, 4]])
In [62]: random_ind(A, 2)
Out[62]:
array([[1, 0],
[2, 0],
[2, 1],
[3, 1]])
In [64]: random_ind(A, 3)
Out[64]:
array([[0, 0, 0],
[1, 1, 2],
[0, 4, 1],
[2, 3, 1]])
In [65]: random_ind(A, 4)
Out[65]:
array([[0, 4, 0, 3],
[1, 0, 1, 4],
[0, 4, 1, 2],
[3, 0, 1, 0]])
Note that the slice in return ind[index][:4] does not raise an error when fewer than 4 unique rows are left; it simply returns fewer rows. In that case you can call the function again until you get the desired result.
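Several of the outputs above, such as [1, 1] or [0, 0, 0], repeat an index within a row, because randint and choice (with its default replace=True) sample with replacement. A small check like the one below can confirm whether a result meets the "no repeats per row" requirement. The helper name rows_have_unique_entries is made up for this sketch.

```python
import numpy as np

def rows_have_unique_entries(idx):
    """Return a boolean per row: True if that row contains no repeated value."""
    # After sorting each row, any duplicate ends up next to its twin
    s = np.sort(idx, axis=1)
    return (s[:, 1:] != s[:, :-1]).all(axis=1)

with_repeats = np.array([[0, 1],
                         [1, 1],
                         [1, 4]])
print(rows_have_unique_entries(with_repeats))  # [ True False  True]
```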

Related

Downsampling 3D array with numpy

Given array:
A = array([[[1, 2, 3, 1],
[4, 5, 6, 2],
[7, 8, 9, 3]]])
I obtain the following array in the forward pass with a downsampling factor of k-1:
k = 3
B = A[...,::k]
#output
array([[[1, 1],
[4, 2],
[7, 3]]])
In the backward pass I want to be able to come back to my original shape, with an output of:
array([[[1, 0, 0, 1],
[4, 0, 0, 2],
[7, 0, 0, 3]]])
You can use numpy.zeros to initialize the output and indexing:
shape = list(B.shape)
shape[-1] = k*(shape[-1]-1)+1
# [1, 3, 4]
A2 = np.zeros(shape, dtype=B.dtype)
A2[..., ::k] = B
print(A2)
output:
array([[[1, 0, 0, 1],
[4, 0, 0, 2],
[7, 0, 0, 3]]])
using A:
A2 = np.zeros_like(A)
A2[..., ::k] = B
# or directly
# A2[..., ::k] = A[..., ::k]
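The zero-fill step can also be wrapped in a small function. This is a sketch under the assumption that the original length of the last axis (called out_len here, a name chosen for the example) is known in the backward pass, which removes the need to reconstruct it from k.

```python
import numpy as np

def upsample_with_zeros(B, k, out_len):
    """Scatter B back onto every k-th position of a zero array along the last axis."""
    # out_len is assumed to be the last-axis length before downsampling
    out = np.zeros(B.shape[:-1] + (out_len,), dtype=B.dtype)
    out[..., ::k] = B
    return out

A = np.array([[[1, 2, 3, 1],
               [4, 5, 6, 2],
               [7, 8, 9, 3]]])
k = 3
B = A[..., ::k]
A2 = upsample_with_zeros(B, k, A.shape[-1])
print(A2)
```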

Python: How to find indices of elements that satisfy conditions in each row, and transformed them to a dict?

An example:
import numpy as np
np.random.seed(20211021)
myarray = np.random.randint(0, 5, size=(5, 4))
>>> myarray
array([[2, 3, 0, 1],
[3, 3, 3, 1],
[1, 0, 0, 0],
[3, 2, 4, 0],
[4, 1, 4, 0]])
Here I use np.argwhere to find the indices of elements that are greater than 0 in each row.
g0 = np.argwhere(myarray > 0)
>>> g0
array([[0, 0],
[0, 1],
[0, 3],
[1, 0],
[1, 1],
[1, 2],
[1, 3],
[2, 0],
[3, 0],
[3, 1],
[3, 2],
[4, 0],
[4, 1],
[4, 2]], dtype=int64)
The index array g0 is two-dimensional. The form of the indices that I intend to create looks like this:
{
0: [0, 1, 3],
1: [0, 1, 2, 3],
2: [0],
3: [0, 1, 2],
4: [0, 1, 2]
}
Is there any way in which g0 can be transformed to a dict?
(Other than applying a function to each row of myarray, I haven't found an efficient method.)
np.unique can be used with indexes to get both the dictionary keys and locations, then use np.split to divide the array, then zip together the keys and the arrays to build the dictionary from the tuples:
g0 = np.argwhere(myarray > 0)
keys, locs = np.unique(g0[:, 0], return_index=True)
d = dict(zip(keys, np.split(g0[:, 1], locs[1:])))
np.nonzero may be faster than np.argwhere in this case:
i, v = np.nonzero(myarray > 0)
keys, locs = np.unique(i, return_index=True)
d = dict(zip(keys, np.split(v, locs[1:])))
However, a simple dictionary comprehension is likely the fastest option on smaller arrays:
d = {i: np.nonzero(r > 0)[0] for i, r in enumerate(myarray)}
All options produce d:
{0: array([0, 1, 3]),
1: array([0, 1, 2, 3]),
2: array([0]),
3: array([0, 1, 2]),
4: array([0, 1, 2])}
Setup and imports:
import numpy as np
np.random.seed(20211021)
myarray = np.random.randint(0, 5, size=(5, 4))
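One thing to be aware of: the np.unique variants only create keys for rows that contain at least one match. If every row should get a key, and plain lists are wanted as in the question, a sketch along the following lines may help. The function name indices_by_row and its threshold parameter are made up for this example, and the array is written out literally instead of relying on the seed.

```python
import numpy as np

def indices_by_row(arr, threshold=0):
    """Map each row number to the list of column indexes where arr > threshold."""
    mask = arr > threshold
    # np.nonzero walks the array in row-major order, so cols is grouped by row
    _, cols = np.nonzero(mask)
    # Cumulative per-row match counts give the split points between rows
    split_points = np.cumsum(mask.sum(axis=1))[:-1]
    return {i: part.tolist() for i, part in enumerate(np.split(cols, split_points))}

myarray = np.array([[2, 3, 0, 1],
                    [3, 3, 3, 1],
                    [1, 0, 0, 0],
                    [3, 2, 4, 0],
                    [4, 1, 4, 0]])
d = indices_by_row(myarray)
print(d)
```

Rows without any match map to an empty list instead of being dropped.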

How to expand the elements of a numpy matrix into sub matrices [duplicate]

Let's say I have a numpy array:
x = np.array([[1, 2],
[3, 4]])
What is the easiest way to expand the elements into submatrices?
An intermediary result could look like this:
x = np.array([[[[1, 1],[1, 1]], [[2, 2],[2, 2]]],
[[[3, 3],[3, 3]], [[4, 4],[4, 4]]]])
And the desired result:
x = np.array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
You can use two repeats over the desired axes:
In [34]: np.repeat(np.repeat(x, 2, 1), 2, 0)
Out[34]:
array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
Or as a faster approach (more suitable for larger arrays and repeat numbers) you can use as_strided:
In [43]: from numpy.lib.stride_tricks import as_strided
In [44]: m, n = x.shape
In [45]: ms, ns = x.strides
In [46]: result = as_strided(x, (m, 2, n, 2), (ms, 0, ns, 0))
In [47]: result.reshape(m*2, n*2)
Out[47]:
array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
You can use numpy.repeat for the task. It has an axis argument.
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
[3, 4]])
>>> np.repeat(a, 2)
array([1, 1, 2, 2, 3, 3, 4, 4])
>>> np.repeat(a, 2, axis=1)
array([[1, 1, 2, 2],
[3, 3, 4, 4]])
>>> np.repeat(np.repeat(a, 2, axis=1), 2, axis=0)
array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
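As another option, not mentioned in the answers above, the Kronecker product with a block of ones gives the same expansion in a single call. A minimal sketch, assuming a 2x2 block is wanted:

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

# Each element of x is multiplied by the whole block of ones,
# so every value is expanded into a 2x2 sub-matrix.
block = np.ones((2, 2), dtype=x.dtype)
expanded = np.kron(x, block)
print(expanded)
```

Changing the shape of block changes the size of the sub-matrices.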

np.choose not giving desired result after broadcasting

I would like to pick the nth elements as specified in maxsuit from suitCounts. I did broadcast the maxsuit array so I do get a result, but not the desired one. Any suggestions what I'm doing conceptually wrong is appreciated. I don't understand the result of np.choose(self.maxsuit[:,:,None]-1, self.suitCounts), which is not what I'm looking for.
>>> self.maxsuit
Out[38]:
array([[3, 3],
[1, 1],
[1, 1]], dtype=int64)
>>> self.maxsuit[:,:,None]-1
Out[33]:
array([[[2],
[2]],
[[0],
[0]],
[[0],
[0]]], dtype=int64)
>>> self.suitCounts
Out[34]:
array([[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[4, 1, 2, 0],
[3, 0, 3, 0]],
[[2, 2, 0, 0],
[1, 1, 1, 0]]])
>>> np.choose(self.maxsuit[:,:,None]-1, self.suitCounts)
Out[35]:
array([[[2, 2, 0, 0],
[1, 1, 1, 0]],
[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[2, 1, 3, 0],
[1, 0, 3, 0]]])
The desired result would be:
[[3,3],[4,3],[2,1]]
You could use advanced-indexing for a broadcasted way to index into the array, like so -
In [415]: val # Data array
Out[415]:
array([[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[4, 1, 2, 0],
[3, 0, 3, 0]],
[[2, 2, 0, 0],
[1, 1, 1, 0]]])
In [416]: idx # Indexing array
Out[416]:
array([[3, 3],
[1, 1],
[1, 1]])
In [417]: m,n = val.shape[:2]
In [418]: val[np.arange(m)[:,None],np.arange(n),idx-1]
Out[418]:
array([[3, 3],
[4, 3],
[2, 1]])
A bit cleaner way with np.ogrid to use open range arrays -
In [424]: d0,d1 = np.ogrid[:m,:n]
In [425]: val[d0,d1,idx-1]
Out[425]:
array([[3, 3],
[4, 3],
[2, 1]])
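For reference, np.take_along_axis (available since NumPy 1.15) expresses the same lookup without building the row and column ranges by hand. This is a sketch using the val and idx names from the session above; the 1-based indexes are shifted by one as before.

```python
import numpy as np

val = np.array([[[2, 1, 3, 0],
                 [1, 0, 3, 0]],
                [[4, 1, 2, 0],
                 [3, 0, 3, 0]],
                [[2, 2, 0, 0],
                 [1, 1, 1, 0]]])
idx = np.array([[3, 3],
                [1, 1],
                [1, 1]])

# The index array needs the same number of dimensions as val,
# so add a trailing axis, pick along axis 2, then drop that axis again.
picked = np.take_along_axis(val, (idx - 1)[..., None], axis=2)[..., 0]
print(picked)
```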
This is the best I can do with choose
In [23]: np.choose([[1,2,0],[1,2,0]], suitcounts[:,:,:3])
Out[23]:
array([[4, 2, 3],
[3, 1, 3]])
choose prefers that we use a list of arrays rather than a single array. That is supposed to prevent misuse. So the problem could be written as:
In [24]: np.choose([[1,2,0],[1,2,0]], [suitcounts[0,:,:3], suitcounts[1,:,:3], suitcounts[2,:,:3]])
Out[24]:
array([[4, 2, 3],
[3, 1, 3]])
The idea is to select items from the 3 subarrays, based on an index array like:
In [25]: np.array([[1,2,0],[1,2,0]])
Out[25]:
array([[1, 2, 0],
[1, 2, 0]])
The output will match the indexing array in shape. The choice arrays have to match in shape as well, hence my use of [:,:,:3].
Values for the first column are selected from suitcounts[1,:,:3], for the 2nd column from suitcounts[2...] etc.
choose is limited to 32 choices; this is a limitation imposed by the broadcasting mechanism.
Speaking of broadcasting I could simplify the expression
In [26]: np.choose([1,2,0], suitcounts[:,:,:3])
Out[26]:
array([[4, 2, 3],
[3, 1, 3]])
This broadcasts [1,2,0] to match the 2x3 shape of the subarrays.
I could get the target order by reordering the columns:
In [27]: np.choose([0,1,2], suitcounts[:,:,[2,0,1]])
Out[27]:
array([[3, 4, 2],
[3, 3, 1]])
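To make the selection rule concrete: for every output position, choose looks up the index value there and takes the element at the same position from the choice array with that number. The explicit loop below is only meant as an illustration of that rule, using the suitcounts data from the question.

```python
import numpy as np

suitcounts = np.array([[[2, 1, 3, 0],
                        [1, 0, 3, 0]],
                       [[4, 1, 2, 0],
                        [3, 0, 3, 0]],
                       [[2, 2, 0, 0],
                        [1, 1, 1, 0]]])
choices = suitcounts[:, :, :3]          # three 2x3 choice arrays
index = np.array([[1, 2, 0],
                  [1, 2, 0]])

# out[i, j] = choices[index[i, j]][i, j]
manual = np.empty_like(index)
for i in range(index.shape[0]):
    for j in range(index.shape[1]):
        manual[i, j] = choices[index[i, j]][i, j]

built_in = np.choose(index, choices)
print(manual)
```

Seen this way, the index array selects which choice array to read from, not a position along the last axis, which is why the call in the question returned whole subarrays.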

replicating borders in 2d numpy arrays

I am trying to replicate the border of a 2d numpy array:
>>> from numpy import *
>>> test = array(range(9)).reshape(3,3)
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Is there an easy way to replicate a border in any direction?
for example:
>>> replicate(test, idx=0, axis=0, n=3)
array([[0, 1, 2],
[0, 1, 2],
[0, 1, 2],
[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
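Edit: the following function did the job:
import numpy
def replicate(a, xy, se, n):
    # one repeat count per row (xy == 'X') or per column (xy == 'Y')
    rptIdx = numpy.ones(a.shape[0 if xy == 'X' else 1], dtype=int)
    # the first or last row/column is kept once plus n extra copies
    rptIdx[0 if se == 'start' else -1] = n + 1
    return numpy.repeat(a, rptIdx, axis=0 if xy == 'X' else 1)
with xy in ['X', 'Y'] and se in ['start', 'end'].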
You can use np.repeat:
In [5]: np.repeat(test, [4, 1, 1], axis=0)
Out[5]:
array([[0, 1, 2],
[0, 1, 2],
[0, 1, 2],
[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
But for larger/variable arrays it will be more difficult to define the repeats argument ([4, 1, 1], which is in this case how many times you want to repeat each row).
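One way around building the repeats list by hand is np.pad with mode='edge', which replicates the outermost row or column. The wrapper below is a sketch; the function name replicate_border and its axis, side and n parameters are chosen for this example and are not part of NumPy.

```python
import numpy as np

def replicate_border(a, axis=0, side="start", n=1):
    """Add n extra copies of the first or last slice of a along the given axis."""
    # No padding on any axis except the requested one
    pad_width = [(0, 0)] * a.ndim
    pad_width[axis] = (n, 0) if side == "start" else (0, n)
    # mode="edge" fills the padded area with the nearest edge values
    return np.pad(a, pad_width, mode="edge")

test = np.arange(9).reshape(3, 3)
top = replicate_border(test, axis=0, side="start", n=3)
print(top)
```

Because the pad widths are computed from a.ndim, this works for arrays of any size without spelling out one repeat count per row.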
