Histogramming boolean numpy arrays by giving each array a unique label

Histogramming boolean numpy arrays by giving each array a unique label - python

I have a large sample (M) of boolean arrays of length 'N'. So there are 2^N unique boolean arrays possible.
I would like to know how many arrays are duplicates of each other and create a histogram.
One possibility is to create a unique integer (a[0] + a[1]*2 + a[3]*4 + ...+a[N]*2^(N-1)) for each unique array and histogram that integer.
But this is going to be O(M*N). What is the best way to do this?

numpy.ravel_multi_index is able to do this for you:
arr = np.array([[True, True, True],
[True, False, True],
[True, False, True],
[False, False, False],
[True, False, True]], dtype=int)
nums = np.ravel_multi_index(arr.T, (2,) * arr.shape[1])
>>> nums
array([7, 5, 5, 0, 5], dtype=int64)
And since you need a histogram, use
>>> np.histogram(nums, bins=np.arange(2**arr.shape[1]+1))
(array([1, 0, 0, 0, 0, 3, 0, 1], dtype=int64),
array([0, 1, 2, 3, 4, 5, 6, 7, 8]))
Another option is to use np.unique:
>>> np.unique(arr, return_counts=True, axis=0)
(array([[0, 0, 0],
[1, 0, 1],
[1, 1, 1]]),
array([1, 3, 1], dtype=int64))

With vectorized operation, the creation of a key is much more faster than a[0] + a[1]x2 + a[3]x4 + ...+a[N]*2^(N-1). I think that's no a better solution... in any case you need to almost "read" one time each value, and this require MxN step.
N = 3
M = 5
sample = [
np.array([True, True, True]),
np.array([True, False, True]),
np.array([True, False, True]),
np.array([False, False, False]),
np.array([True, False, True]),
]
multipliers = [2<<i for i in range(N-2, -1, -1)]+[1]
buckets = {}
buck2vec = {}
for s in sample:
key = sum(s*multipliers)
if key not in buckets:
buckets[key] = 0
buck2vec[key] = s
buckets[key]+=1
for key in buckets:
print(f"{buck2vec[key]} -> {buckets[key]} occurency")
Results:
[False False False] -> 1 occurency
[ True False True] -> 3 occurency
[ True True True] -> 1 occurency

Related

is there a method for finding the indexes of a 2d-array based on a given array

suppose we have two arrays like these two:
A=np.array([[1, 4, 3, 0, 5],[6, 0, 7, 12, 11],[20, 15, 34, 45, 56]])
B=np.array([[4, 5, 6, 7]])
I intend to write a code in which I can find the indexes of an array such as A based on values in
the array B
for example, I want the final results to be something like this:
C=[[0 1]
[0 4]
[1 0]
[1 2]]
can anybody provide me with a solution or a hint?

Do you mean?
In [375]: np.isin(A,B[0])
Out[375]:
array([[False, True, False, False, True],
[ True, False, True, False, False],
[False, False, False, False, False]])
In [376]: np.argwhere(np.isin(A,B[0]))
Out[376]:
array([[0, 1],
[0, 4],
[1, 0],
[1, 2]])
B shape of (1,4) where the initial 1 isn't necessary. That's why I used B[0], though isin, via in1d ravels it anyways.
where is result is often more useful
In [381]: np.where(np.isin(A,B))
Out[381]: (array([0, 0, 1, 1]), array([1, 4, 0, 2]))
though it's a bit harder to understand.
Another way to get the isin array:
In [383]: (A==B[0,:,None,None]).any(axis=0)
Out[383]:
array([[False, True, False, False, True],
[ True, False, True, False, False],
[False, False, False, False, False]])

You can try in this way by using np.where().
index = []
for num in B:
for nums in num:
x,y = np.where(A == nums)
index.append([x,y])
print(index)
>>array([[0,1],
[0,4],
[1,0],
[1,2]])

With zip and np.where:
>>> list(zip(*np.where(np.in1d(A, B).reshape(A.shape))))
[(0, 1), (0, 4), (1, 0), (1, 2)]
Alternatively:
>>> np.vstack(np.where(np.isin(A,B))).transpose()
array([[0, 1],
[0, 4],
[1, 0],
[1, 2]], dtype=int64)

Generate three non-overlapping mask for 2-D matrix that covers all of it

I have a 2-d array and I want to divide it into 3 non-overlapping and random sub-matrix by mask generation. For example I have a matrix like follow:
input = [[1,2,3],
[4,5,6],
[7,8,9]]
I want three random zero-one masks like follow:
mask1 = [[0,1,0],
[1,0,1],
[0,0,0]]
mask2 = [[1,0,0],
[0,1,0],
[1,0,0]]
mask3 =[[0,0,1],
[0,0,0],
[0,1,1]]
But my input matrix is too large and I need to do it in a fast way. I also want to determine the ratio of ones for every mask as input. In the above example the ratio is equal for all masks.
To produce one random mask, I use following code:
np.random.choice([0, 1],size=(size of matrix[0],size of matrix[1]))
My problem is how to produce non-overlapping masks.

IIUC, you can make a random matrix of 0, 1, and 2, and then extract the m == 0, m == 1, and m == 2 values:
groups = np.random.randint(0, 3, (5,5))
masks = (groups[...,None] == np.arange(3)[None,:]).T
However, this wouldn't guarantee an equal number of elements in each mask. To achieve that, you could permute a balanced allocation:
a = np.arange(25).reshape(5,5) # dummy input
groups = np.random.permutation(np.arange(a.size) % 3).reshape(a.shape)
masks = (groups[...,None] == np.arange(3)[None,:]).T
If you wanted a random probability to be in a group:
groups = np.random.choice([0,1,2], p=[0.3, 0.6, 0.1], size=a.shape)
or something. All you need to do is decide how you want to assign cells to groups, and then you can build your masks.
For example:
In [431]: groups = np.random.permutation(np.arange(a.size) % 3).reshape(a.shape)
In [432]: groups
Out[432]:
array([[1, 0, 0, 2, 0],
[1, 2, 0, 0, 1],
[2, 0, 2, 0, 2],
[1, 1, 2, 1, 0],
[2, 2, 1, 1, 0]], dtype=int32)
In [433]: masks = (groups[...,None] == np.arange(3)[None,:]).T
In [434]: masks
Out[434]:
array([[[False, False, False, False, False],
[ True, False, True, False, False],
[ True, True, False, False, False],
[False, True, True, False, False],
[ True, False, False, True, True]],
[[ True, True, False, True, False],
[False, False, False, True, False],
[False, False, False, False, True],
[False, False, False, True, True],
[False, True, False, False, False]],
[[False, False, True, False, True],
[False, True, False, False, True],
[False, False, True, True, False],
[ True, False, False, False, False],
[False, False, True, False, False]]])
which gives me a full mask:
In [450]: masks.sum(axis=0)
Out[450]:
array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
and reasonably balanced. If the number of cells were a multiple of 3, these numbers would all agree.
In [451]: masks.sum(2).sum(1)
Out[451]: array([9, 8, 8])
You can use .astype(int) to convert from a bool array to an int array of 0s and 1s if you like.

Numpy element-wise in operation

Suppose I have a column vector y with length n, and I have a matrix X of size n*m. I want to check for each element i in y, whether the element is in the corresponding row in X. What is the most efficient way of doing this?
For example:
y = [1,2,3,4].T
and
X =[[1, 2, 3],[3, 4, 5],[4, 3, 2],[2, 2, 2]]
Then the output should be
[1, 0, 1, 0] or [True, False, True, False]
which ever is easier.
Of course we can use a for loop to iterate through both y and X, but is there any more efficient way of doing this?

Vectorized approach using broadcasting -
((X == y[:,None]).any(1)).astype(int)
Sample run -
In [41]: X # Input 1
Out[41]:
array([[1, 2, 3],
[3, 4, 5],
[4, 3, 2],
[2, 2, 2]])
In [42]: y # Input 2
Out[42]: array([1, 2, 3, 4])
In [43]: X == y[:,None] # Broadcasted comparison
Out[43]:
array([[ True, False, False],
[False, False, False],
[False, True, False],
[False, False, False]], dtype=bool)
In [44]: (X == y[:,None]).any(1) # Check for any match along each row
Out[44]: array([ True, False, True, False], dtype=bool)
In [45]: ((X == y[:,None]).any(1)).astype(int) # Convert to 1s and 0s
Out[45]: array([1, 0, 1, 0])

How can you turn an index array into a mask array in Numpy?

Is it possible to convert an array of indices to an array of ones and zeros, given the range?
i.e. [2,3] -> [0, 0, 1, 1, 0], in range of 5
I'm trying to automate something like this:
>>> index_array = np.arange(200,300)
array([200, 201, ... , 299])
>>> mask_array = ??? # some function of index_array and 500
array([0, 0, 0, ..., 1, 1, 1, ... , 0, 0, 0])
>>> train(data[mask_array]) # trains with 200~299
>>> predict(data[~mask_array]) # predicts with 0~199, 300~499

Here's one way:
In [1]: index_array = np.array([3, 4, 7, 9])
In [2]: n = 15
In [3]: mask_array = np.zeros(n, dtype=int)
In [4]: mask_array[index_array] = 1
In [5]: mask_array
Out[5]: array([0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0])
If the mask is always a range, you can eliminate index_array, and assign 1 to a slice:
In [6]: mask_array = np.zeros(n, dtype=int)
In [7]: mask_array[5:10] = 1
In [8]: mask_array
Out[8]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
If you want an array of boolean values instead of integers, change the dtype of mask_array when it is created:
In [11]: mask_array = np.zeros(n, dtype=bool)
In [12]: mask_array
Out[12]:
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False], dtype=bool)
In [13]: mask_array[5:10] = True
In [14]: mask_array
Out[14]:
array([False, False, False, False, False, True, True, True, True,
True, False, False, False, False, False], dtype=bool)

For a single dimension, try:
n = (15,)
index_array = [2, 5, 7]
mask_array = numpy.zeros(n)
mask_array[index_array] = 1
For more than one dimension, convert your n-dimensional indices into one-dimensional ones, then use ravel:
n = (15, 15)
index_array = [[1, 4, 6], [10, 11, 2]] # you may need to transpose your indices!
mask_array = numpy.zeros(n)
flat_index_array = np.ravel_multi_index(
index_array,
mask_array.shape)
numpy.ravel(mask_array)[flat_index_array] = 1

There's a nice trick to do this as a one-liner, too - use the numpy.in1d and numpy.arange functions like this (the final line is the key part):
>>> x = np.linspace(-2, 2, 10)
>>> y = x**2 - 1
>>> idxs = np.where(y<0)
>>> np.in1d(np.arange(len(x)), idxs)
array([False, False, False, True, True, True, True, False, False, False], dtype=bool)
The downside of this approach is that it's ~10-100x slower than the appropch Warren Weckesser gave... but it's a one-liner, which may or may not be what you're looking for.

As requested, here it is in an answer. The code:
[x in index_array for x in range(500)]
will give you a mask like you asked for, but it will use Bools instead of 0's and 1's.

How to check if all values in the columns of a numpy matrix are the same?

I want to check if all values in the columns of a numpy array/matrix are the same.
I tried to use reduce of the ufunc equal, but it doesn't seem to work in all cases:
In [55]: a = np.array([[1,1,0],[1,-1,0],[1,0,0],[1,1,0]])
In [56]: a
Out[56]:
array([[ 1, 1, 0],
[ 1, -1, 0],
[ 1, 0, 0],
[ 1, 1, 0]])
In [57]: np.equal.reduce(a)
Out[57]: array([ True, False, True], dtype=bool)
In [58]: a = np.array([[1,1,0],[1,0,0],[1,0,0],[1,1,0]])
In [59]: a
Out[59]:
array([[1, 1, 0],
[1, 0, 0],
[1, 0, 0],
[1, 1, 0]])
In [60]: np.equal.reduce(a)
Out[60]: array([ True, True, True], dtype=bool)
Why does the middle column in the second case also evaluate to True, while it should be False?
Thanks for any help!

In [45]: a
Out[45]:
array([[1, 1, 0],
[1, 0, 0],
[1, 0, 0],
[1, 1, 0]])
Compare each value to the corresponding value in the first row:
In [46]: a == a[0,:]
Out[46]:
array([[ True, True, True],
[ True, False, True],
[ True, False, True],
[ True, True, True]], dtype=bool)
A column shares a common value if all the values in that column are True:
In [47]: np.all(a == a[0,:], axis = 0)
Out[47]: array([ True, False, True], dtype=bool)
The problem with np.equal.reduce can be seen by micro-analyzing what happens when it is applied to [1, 0, 0, 1]:
In [49]: np.equal.reduce([1, 0, 0, 1])
Out[50]: True
The first two items, 1 and 0 are tested for equality and the result is False:
In [51]: np.equal.reduce([False, 0, 1])
Out[51]: True
Now False and 0 are tested for equality and the result is True:
In [52]: np.equal.reduce([True, 1])
Out[52]: True
But True and 1 are equal, so the total result is True, which is not the desired outcome.
The problem is that reduce tries to accumulate the result "locally", while we want a "global" test like np.all.

Given ubuntu's awesome explanation, you can use reduce to solve your problem, but you have to apply it to bitwise_and and bitwise_or rather than equal. As a consequence, this will not work with floating point arrays:
In [60]: np.bitwise_and.reduce(a) == a[0]
Out[60]: array([ True, False, True], dtype=bool)
In [61]: np.bitwise_and.reduce(b) == b[0]
Out[61]: array([ True, False, True], dtype=bool)
Basically, you are comparing the bits of each element in the column. Identical bits are unchanged. Different bits are set to zero. This way, any number that has a zero instead of a one bit will change the reduced value. bitwise_and will not trap the case where bits are introduced rather than removed:
In [62]: c = np.array([[1,0,0],[1,0,0],[1,0,0],[1,1,0]])
In [63]: c
Out[63]:
array([[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 1, 0]])
In [64]: np.bitwise_and.reduce(c) == c[0]
Out[64]: array([ True, True, True], dtype=bool)
The second coumn is clearly wrong. We need to use bitwise_or to trap new bits:
In [66]: np.bitwise_or.reduce(c) == c[0]
Out[66]: array([ True, False, True], dtype=bool)
Final Answer
In [69]: np.logical_and(np.bitwise_or.reduce(a) == a[0], np.bitwise_and.reduce(a) == a[0])
Out[69]: array([ True, False, True], dtype=bool)
In [70]: np.logical_and(np.bitwise_or.reduce(b) == b[0], np.bitwise_and.reduce(b) == b[0])
Out[70]: array([ True, False, True], dtype=boo
In [71]: np.logical_and(np.bitwise_or.reduce(c) == c[0], np.bitwise_and.reduce(c) == c[0])
Out[71]: array([ True, False, True], dtype=bool)
This method is more restrictive and less elegant than ubunut's suggestion of using all, but it has the advantage of not creating enormous temporary arrays if your input is enormous. The temporary arrays should only be as big as the first row of your matrix.
EDIT
Based on this Q/A and the bug I filed with numpy, the solution provided only works because your array contains zeros and ones. As it happens, the bitwise_and.reduce() operations shown can only ever return zero or one because bitwise_and.identity is 1, not -1. I am keeping this answer in the hope that numpy gets fixed and the answer becomes valid.
Edit
Looks like there will in fact be a change to numpy soon. Certainly to bitwise_and.identity, and also possibly an optional parameter to reduce.
Edit
Good news everyone. The identity for np.bitwise_and has been set to -1 as of version 1.12.0.

Not as elegant but may also work in the example above.
a = np.array([[1,1,0],[1,-1,0],[1,0,0],[1,1,0]])
take the difference between the each row and the one above it
np.diff(a,axis=0)==0
array([[ True, False, True],
[ True, False, True],
[ True, False, True]])

A general solution that allows for equality comparison across any arbitrary axis/combination of axes, as well as floating point tolerances as in np.isclose with the keyword arguments rtol and atol:
def isconst(x, axis=None, **kwargs):
if axis is None:
x = x.reshape(-1)
else:
if isinstance(axis, int):
axis = [axis]
axis = sorted([d % x.ndim for d in axis])[::-1]
for d in axis:
x = np.moveaxis(x, d, -1)
x = x.reshape(*x.shape[:-len(axis)],-1)
return np.isclose(x[...,:-1], x[...,1:], **kwargs).all(axis=-1)
Example:
>>> a = np.array([[[1.0,1.0,0.9],
... [1.0,1.1,1.1]],
... [[1.0,1.0,1.0],
... [0.99999,1.0,1.09999]]])
>>> isconst(a, axis=0)
array([[ True, True, False],
[ True, False, True]])
>>> isconst(a, axis=0, rtol=1.0e-8)
array([[ True, True, False],
[False, False, False]])

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Histogramming boolean numpy arrays by giving each array a unique label - python

Related

is there a method for finding the indexes of a 2d-array based on a given array

Generate three non-overlapping mask for 2-D matrix that covers all of it

Numpy element-wise in operation

How can you turn an index array into a mask array in Numpy?

How to check if all values in the columns of a numpy matrix are the same?

Categories

Resources