Comparing two numpy arrays for compliance with two conditions - python

Consider two numpy arrays having the same shape, A and B, composed of 1s and 0s. A small example is shown:
A = [[1 0 0 1]        B = [[0 0 0 0]
     [0 0 1 0]             [0 0 0 0]
     [0 0 0 0]             [1 1 0 0]
     [0 0 0 0]             [0 0 1 0]
     [0 0 1 1]]            [0 1 0 1]]
I now want to assign values to the two Boolean variables test1 and test2 as follows:
test1: Is there at least one instance where a 1 in an A column and a 1 in the SAME B column have row differences of exactly 1 or 2? If so, then test1 = True, otherwise False.
In the example above, column 0 of both arrays has 1s that are 2 rows apart, so test1 = True. (There are other instances in column 2 as well, but that doesn't matter; we only require one instance.)
test2: Do the 1 values in A and B all have different array addresses? If so, then test2 = True, otherwise False.
In the example above, both arrays have [4,3] = 1, so test2 = False.
I'm struggling to find an efficient way to do this and would appreciate some assistance.

Here is a simple way to test whether the two arrays have 1s one row apart in the same column (only in one direction):
(A[1:, :] * B[:-1, :]).any(axis=None)
So you can do
test1 = ((A[1:, :] * B[:-1, :] + A[:-1, :] * B[1:, :]).any(axis=None)
         or (A[2:, :] * B[:-2, :] + A[:-2, :] * B[2:, :]).any(axis=None))
The second test can be done by converting the locations to indices, stacking them together, and using np.unique to count the number of duplicates. Duplicates can only come from the same index in two arrays since an array will never have duplicate indices. We can further speed up the calculation by using flatnonzero instead of nonzero:
test2 = np.all(np.unique(np.concatenate((np.flatnonzero(A), np.flatnonzero(B))), return_counts=True)[1] == 1)
A more efficient test would use np.intersect1d in a similar manner:
test2 = not np.intersect1d(np.flatnonzero(A), np.flatnonzero(B)).size

You can use masked arrays. For the second task you can do:
A_m = np.ma.masked_equal(A, 0)
B_m = np.ma.masked_equal(B, 0)
test2 = not np.any((A_m == B_m).compressed())  # the comparison is unmasked only where both arrays hold a 1
And a naive way of doing the first task is:
test1 = np.any((np.ma.vstack((A_m[:-1], A_m[:-2], A_m[1:], A_m[2:]))
             == np.ma.vstack((B_m[1:], B_m[2:], B_m[:-1], B_m[:-2]))).compressed())
(np.ma.vstack is needed here; plain np.vstack does not preserve the masks.)
output:
test2 = False
test1 = True

For test2: you could just check whether any index holding a 1 is shared between the two arrays.
A = np.array([[1, 0, 0, 1],[0, 0, 1, 0],[0, 0, 0, 0],[0, 0, 0, 0],[0, 0, 1, 1]])
B = np.array([[0, 0, 0, 0],[0, 0, 0, 0],[1, 1, 0, 0],[0, 0, 1, 0],[0, 1, 0, 1]])
print(len(np.intersect1d(np.flatnonzero(A == 1), np.flatnonzero(B == 1))) > 0)  # prints True here: a shared address exists, so test2 is the negation of this

Related

Counting elements inside an array/matrix

I am struggling with what is hopefully a simple problem; I haven't been able to find a clear-cut answer online.
The given program asks for a user input (n) and then produces an n-sized square matrix made only of 0s and 1s. I am attempting to count the rows (I have called this count x) that contain a 1, i.e. those that do not consist only of 0s.
Example output:
n = 3
[0, 0, 0] [1, 0, 0] [0, 1, 0]
In this case, x should = 2.
n = 4
[0, 0, 0, 0] [1, 0, 0, 0] [0, 1, 0, 0] [0, 0, 0, 0]
In this case, x should also be 2.
def xArrayCount(MX):
    x = 0
    count = 0
    for i in MX:
        if i in MX[0 + count] == 0:
            x += 0
            count += 1
        else:
            x += 1
            count += 1
    return(x)
I am trying to count the number of 0s/1s at each index of the matrix but am going wrong somewhere; could someone explain how this should work?
(Use of extra python modules is disallowed)
Thank you
You need to count all the lists that contain the number one at least once. You don't need any other module to do that:
def count_none_zero_items(matrix):
    count = 0
    for row in matrix:
        if 1 in row:
            count += 1
    return count

x = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
count_none_zero_items(x)
Also notice that I changed the function name to lower case, as this is the convention in Python. Read more about it here:
Python3 Conventions
It's also worth mentioning that in Python this data structure is called a list, not an array.
Those look like separate lists rather than one nested list. I tried this myself, and if I change them into a nested list (adding outer brackets and commas between the individual lists), this function works:
def xArrayCount(MX):
    x = 0
    count = 0
    for i in MX:
        if 1 not in MX[count]:  # the row contains only zeros
            count += 1
        else:
            x += 1
            count += 1
    return x

Comparing elements at specific positions in numpy.ndarray

I don't know if the title describes my question well. I have the following list of floats, obtained from a sigmoid activation function.
outputs = [[0.015161413699388504, 0.6720218658447266, 0.0024502829182893038,
            0.21356457471847534, 0.002232735510915518, 0.026410426944494247],
           [0.006432057358324528, 0.0059209042228758335, 0.9866275191307068,
            0.004609372932463884, 0.007315939292311668, 0.010821194387972355],
           [0.02358204871416092, 0.5838017225265503, 0.005475651007145643,
            0.012086033821106, 0.540218658447266, 0.010054176673293114]]
To calculate my metrics, I would like to say that if any neuron's output value is at least 0.5, the comment is assumed to belong to that class (a multi-label problem). I could easily do that using
outputs = np.where(np.array(outputs) >= 0.5, 1, 0)
However, I would like to add a condition to consider only the bigger value if class #5 and any other class both have values >= 0.5 (as class #5 cannot occur together with other classes). How do I write that condition?
In my example the output should be:
[[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 1 0 0 0 0]]
instead of:
[[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 1 0 0 1 0]]
Thanks,
You can write a custom function that you can then apply to each sub-array in outputs using the np.apply_along_axis() function:
def choose_class(a):
    if (len(np.argwhere(a >= 0.5)) > 1) & (a[4] >= 0.5):
        return np.where(a == a.max(), 1, 0)
    return np.where(a >= 0.5, 1, 0)

outputs = np.apply_along_axis(choose_class, 1, outputs)
outputs
# array([[0, 1, 0, 0, 0, 0],
#        [0, 0, 1, 0, 0, 0],
#        [0, 1, 0, 0, 0, 0]])
For the simple mask, you don't need np.where
mask = outputs >= 0.5
If you want an integer instead of a boolean:
mask = (outputs >= 0.5).view(np.uint8)
To check the fifth column, you need to keep a reference to the original data around. You can get the column of the largest above-threshold value in each relevant row with
rows = np.flatnonzero(mask[:, 4])
keep = (outputs[rows] * mask[rows]).argmax(axis=1)
Then you can blank out the rows and set only the maximum value:
mask[rows] = 0
mask[rows, keep] = 1
One other solution:
# Your example input array
out = np.array([[0.015, 0.672, 0.002, 0.213, 0.002, 0.026],
                [0.006, 0.005, 0.986, 0.004, 0.007, 0.010],
                [0.023, 0.583, 0.005, 0.012, 0.540, 0.010]])
# We get the desired result
val = (out >= 0.5) * out // out.max(axis=1)[:, None]
This solution does the following operations:
Set to zero all the values < 0.5.
Set to 1 the maximum value of each row (if and only if this value is >= 0.5).
Note that it keeps only the largest value in any row that has several values >= 0.5, not just in rows where class #5 fires, which is sufficient for the example above.

How can I get exactly the same number of elements replaced in a numpy 2D matrix?

I have a symmetrical 2D numpy matrix; it contains only ones and zeros, and the diagonal elements are always 0.
I want to change part of the elements from one to zero, and the result needs to stay symmetrical. How many elements are selected depends on the parameter replace_rate.
Since it's a symmetrical matrix, I take the upper half of the matrix, randomly select elements whose value is 1, and change them from 1 to 0. Then, with a mirror operation, I make sure the whole matrix is still symmetrical.
For example
com = np.array([[0, 1, 1, 1, 1],
                [1, 0, 1, 1, 1],
                [1, 1, 0, 1, 1],
                [1, 1, 1, 0, 1],
                [1, 1, 1, 1, 0]])
replace_rate = 0.1
com = np.triu(com)
mask = np.random.choice([0, 1], size=com.shape, p=(1 - replace_rate, replace_rate)).astype(bool)
r1 = np.random.rand(*com.shape)
com[mask] = r1[mask]  # floats < 1 assigned into an int array truncate to 0
com += com.T - np.diag(com.diagonal())
com is a (5, 5) symmetrical matrix, and 10% of its elements (only those whose value is 1; the diagonal elements are excluded) should be randomly replaced with 0.
The question is: how can I make sure the number of elements changed stays the same each time?
Keeping the same replace_rate = 0.1, sometimes I get a result like:
com = np.array([[0 1 1 1 1]
                [1 0 1 1 1]
                [1 1 0 1 1]
                [1 1 1 0 1]
                [1 1 1 1 0]])
Actually nothing changed this time, and when I repeated it, I got 2 elements changed:
com = np.array([[0 1 1 1 1]
                [1 0 1 1 1]
                [1 1 0 1 0]
                [1 1 1 0 1]
                [1 1 0 1 0]])
I want to know how to fix the number of elements changed while keeping the same replace_rate.
Thanks in advance!!
How about something like this:
import random

def make_transform(m, replace_rate):
    changed = []  # keep track of indices we already changed
    def get_random():
        # get a random pair of indices which are not equal (i.e. not on the diagonal)
        c1, c2 = random.choices(range(len(m)), k=2)
        if c1 == c2 or (c1, c2) in changed or (c2, c1) in changed:
            return get_random()  # recurse until we find an i, j pair with i != j that hasn't been changed yet
        else:
            changed.append((c1, c2))
            return c1, c2
    n_changes = int(m.shape[0]**2 * replace_rate)  # the number of changes to make
    for _ in range(n_changes):
        i, j = get_random()  # get a valid index pair
        m[i][j] = m[j][i] = 0
    return m
This is the solution I suggest:
def rand_zero(mat, replace_rate):
    triu_mat = np.triu(mat)
    _ind = np.where(triu_mat != 0)  # gets indices of all non-zero elements (the diagonal is zero already)
    ind = [x for x in zip(*_ind)]
    chng = np.random.choice(range(len(ind)),           # select some indices, at rate 'replace_rate'
                            size=int(replace_rate * mat.size),
                            replace=False)             # do not select duplicates
    mod_mat = triu_mat
    for c in chng:
        mod_mat[ind[c]] = 0
    mod_mat = mod_mat + mod_mat.T
    return mod_mat
I use int() to truncate to an integer in size, but you can use round() if that's what you desire.
Hope this gives consistent results!

Numpy: Storing standard basis vector in a memory efficient way

This was the only question I found about standard basis vectors in numpy but it's not really related to my question.
I have a numpy array of integers and I want to determine the co-occurrence matrix, which stores the number of times indices have the same value in the same column. This question describes the problem in more detail.
I have a method of solving my problem but it doesn't scale well.
My question then is this:
Is it possible to store standard basis vectors in a numpy array in a memory efficient manner?
I want to be able to do the following:
Given an array
M = e1 e2 e1
    e1 e2 e2
    e3 e1 e3
    e2 e3 e3
where ei is the transposed i-th standard basis vector of the vector space (R3 in this case), perform matrix multiplication with the transpose of M, i.e. determine np.dot(M, M.T). To be clear, the matrix M above could be written as:
M = 1 0 0  0 1 0  1 0 0
    1 0 0  0 1 0  0 1 0
    0 0 1  1 0 0  0 0 1
    0 1 0  0 0 1  0 0 1
(extra spaces added for emphasis).
The issue with representing the matrix like this is that it isn't scalable in memory with the number of rows and dimension of the vector space.
EDIT: I should mention that the number of columns can increase as well. The memory complexity is D * R * C where D is the dimension of the vector space, R is the number of rows and C is the number of columns. In an average working example I have roughly D == 150, R == 2000 and C == 1000 though R can go up to 20,000 and C is unbounded (though 10,000 is a reasonable estimate).
The rules for standard basis vector multiplication are simple (ei * ei.T == 1, ei * ej.T == 0 if i != j) so I was wondering if it's possible to store these rules in a numpy array to save memory.
Let's encode the basis vectors with numbers: e1 -> 1, e2 -> 2, ... This allows a very memory-efficient storage.
M = np.array([[1, 2, 1], [1, 2, 2], [3, 1, 3], [2, 3, 3]], dtype=np.uint8)
# if more than 255 basis vectors, use uint16.
Now we only need to implement a special dot product that works with these basis vectors. Basically we only replace the multiplication with a comparison:
def basis_dot(a, b):
    return np.sum(a[:, :, np.newaxis] == b[np.newaxis, :, :], axis=1)

print(basis_dot(M, M.T))
# [[3 2 0 0]
#  [2 3 0 0]
#  [0 0 3 1]
#  [0 0 1 3]]
Let's verify the result:
M = np.array([[1, 0, 0, 0, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 1, 0, 0, 1, 0],
              [0, 0, 1, 1, 0, 0, 0, 0, 1],
              [0, 1, 0, 0, 0, 1, 0, 0, 1]])
np.dot(M, M.T)
# array([[3, 2, 0, 0],
#        [2, 3, 0, 0],
#        [0, 0, 3, 1],
#        [0, 0, 1, 3]])
A potential drawback with the approach is the large temporary array required in basis_dot. The memory requirement can be reduced by explicitly coding the loops, at the cost of performance (unless you use a jit compiler).
# slower but more memory friendly
def basis_dot(a, b):
    out = np.empty((a.shape[0], b.shape[1]))
    for i in range(a.shape[0]):
        for j in range(b.shape[1]):
            out[i, j] = np.sum(a[i, :] == b[:, j])
    return out
So, my assumption based on your example is that you're actually working with a higher dimensionality than just 3. My other assumption is that you're not computing any basis vectors, but just auto-generating basis vectors for RN. I'll ignore the question of exactly what you're trying to accomplish or why you're storing vectors that you can easily auto-generate for now.
If all of the above assumptions are accurate then you can likely gain a lot of benefit by storing in a sparse data format. This will only improve storage if you've got a preponderance of zeroes, but that seems like a reasonable assumption. There are a large number of sparse formats which you can view here. My best guess for you would be the coo_matrix class.
from scipy.sparse import coo_matrix
new_matrix = coo_matrix(<your_matrix>)
Then save the new matrix in your format of choice.

In order to generate all combinations of 1's and 0's we use a simple binary table. How can I easily create this binary table in an array?

For example, the binary table for 3 bits:
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
And I want to store this into an n*n*2 array, so it would be:
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
For generating the combinations automatically, you can use itertools.product from the standard library, which generates all possible combinations of the supplied sequences, i.e. the Cartesian product of the input iterables. The repeat argument comes in handy here, as all of our sequences are identical ranges.
from itertools import product
x = [i for i in product(range(2), repeat=3)]
Now, if we want an array instead of a list of tuples, we can just pass this to numpy.array.
import numpy as np
x = np.array(x)
# [[0 0 0]
#  [0 0 1]
#  [0 1 0]
#  [0 1 1]
#  [1 0 0]
#  [1 0 1]
#  [1 1 0]
#  [1 1 1]]
If you want all elements in a single list, so you could index them with a single index, you could chain the iterable:
from itertools import chain, product
x = list(chain.from_iterable(product(range(2), repeat=3)))
result: [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1]
Most people would expect 2^n x n as in
np.c_[tuple(i.ravel() for i in np.mgrid[:2, :2, :2])]
# array([[0, 0, 0],
#        [0, 0, 1],
#        [0, 1, 0],
#        [0, 1, 1],
#        [1, 0, 0],
#        [1, 0, 1],
#        [1, 1, 0],
#        [1, 1, 1]])
Explanation: np.mgrid as used here creates the coordinates of the corners of a unit cube, which happen to be all combinations of 0 and 1. The individual coordinates are then ravelled and joined as columns by np.c_.
Here's a recursive, native python (no libraries) version of it:
def allBinaryPossiblities(maxLength, s=""):
if len(s) == maxLength:
return s
else:
temp = allBinaryPossiblities(maxLength, s + "0") + "\n"
temp += allBinaryPossiblities(maxLength, s + "1")
return temp
print (allBinaryPossiblities(3))
It prints all the possibilities:
000
001
010
011
100
101
110
111
