Count Instances in One Array Based on Another Array - python

I am wondering if there is an efficient way to perform the following. I have two (numpy) arrays, and I would like to count the number of instances of a value occurring in one based on a criterion applied to the other. For example:
a = np.array([1,-1,1,1,-1,-1])
b = np.array([.75,.35,.7,.8,.2,.6])
I would like to calculate c as the number of 1's in a that occur when b > .5, so in this case `c = 3`. My current solution is ugly and I would appreciate any suggestions.

You can use numpy.sum for this:
import numpy as np

a = np.array([1,-1,1,1,-1,-1])
b = np.array([.75,.35,.7,.8,.2,.6])
np.sum((a == 1) & (b > .5)) # 3
This works because bool is a subclass of int.

Alternatively, numpy.count_nonzero counts the True entries directly:
np.count_nonzero((a == 1) & (b > .5))
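A minimal sketch of why this works: True behaves as 1 and False as 0 under arithmetic, so summing the boolean condition array counts the matches.

```python
import numpy as np

a = np.array([1, -1, 1, 1, -1, -1])
b = np.array([.75, .35, .7, .8, .2, .6])

cond = (a == 1) & (b > .5)     # boolean array: [True, False, True, True, False, False]
print(cond.sum())              # True counts as 1, so this prints 3
print(np.count_nonzero(cond))  # same count, often a bit faster
```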


Use a for loop to check each element in a matrix in Python

import numpy as np

A = np.arange(2,42).reshape(5,8)
B = np.arange(4,68).reshape(8,8)
C = np.dot(A, B)
How can I use a for loop to check whether each element in C is larger than 100 or not, so that the output is True or False?
I have no idea how to do this, because C is a matrix, not a single number.
Could someone help me, please?
Do you want to return True if EVERY element of C is > 100, or do you want to create a matrix whose entries are True or False depending on whether the corresponding entries of C are > 100 or < 100?
In both cases I would recommend not using for-loops. For case 1, you can try:
print(min(C.flatten()) < 100)
which will print False if all elements of C are bigger than 100, and True otherwise. (Note that the .flatten call just rewrites the 2D array into a 1D one temporarily; the shape of C stays in its original state.)
For case 2, you can just type
print(C < 100)
and it will print a matrix with entries True or False, based on whether C is > or < 100 at that entry.
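As a side note, NumPy's boolean reductions express both cases directly (a sketch using the same A, B, C as above; note the asker's condition is > 100, whereas the answer prints C < 100, so flip the comparison as needed):

```python
import numpy as np

A = np.arange(2, 42).reshape(5, 8)
B = np.arange(4, 68).reshape(8, 8)
C = np.dot(A, B)

print((C > 100).all())  # case 1: True only if every element exceeds 100
print(C > 100)          # case 2: elementwise True/False matrix
```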
If you want to use for-loops: first note that the shape of C is (5,8), meaning that C is a 2D object. In order to access all entries of C via for-loops, you can write something like this:
import numpy as np

A = np.arange(2,42).reshape(5,8)
B = np.arange(4,68).reshape(8,8)
C = np.dot(A, B)
D = np.zeros(C.shape, dtype=bool)
for i in range(C.shape[0]):      # meaning in range 5
    for j in range(C.shape[1]):  # meaning in range 8
        if C[i,j] > 100:
            D[i,j] = False
        else:
            D[i,j] = True
print(D)
where I introduced a new matrix D, with the same shape as C, which we consecutively fill with True or False, based on whether C is > or < 100 at that entry. This code is equivalent to, but slower and more complicated than, the one I proposed above.
I hope this answers your question sufficiently. If you have any more questions on details etc., don't hesitate to ask ;).
You can also use NumPy's boolean-mask filtering; the loop-based approaches are slow, while NumPy's vectorized filtering is optimal:
import numpy as np

filter_arr = C > 100
newarr = C[filter_arr]
print(newarr)
Note that this returns a 1D array of the elements of C that satisfy the condition, rather than a True/False matrix.

Is there an efficient way to pass "all" as a numpy index?

I have code that generates a boolean array that acts as a mask on numpy arrays, along the lines of:
import numpy

def func():
    a = numpy.arange(10)
    mask = a % 2 == 0
    return a[mask]
Now, I need to separate this into a case where the mask is created, and one where it is not created and all values are used instead. This could be achieved as follows:
def func(use_mask):
    a = numpy.arange(10)
    if use_mask:
        mask = a % 2 == 0
    else:
        mask = numpy.ones(10, dtype=bool)
    return a[mask]
However, this becomes extremely wasteful for large arrays, since an equally large boolean array must first be created.
My question is thus: Is there something I can pass as an "index" to recreate the behavior of such an everywhere-true array?
Systematically changing occurrences of a[mask] to something else involving some indexing magic etc. is a valid solution, but just avoiding the masking entirely via an expanded case distinction or something else that changes the structure of the code is not desired, as it would impair readability and maintainability (see next paragraph).
For the sake of completeness, here's what I'm currently considering doing, though this makes the code messier and less streamlined since it expands the if/else beyond where it technically needs to be (in reality, the mask is used more than once, hence every occurrence would need to be contained within the case distinction; I used f1 and f2 as examples here):
def func(use_mask):
    a = numpy.arange(10)
    if use_mask:
        mask = a % 2 == 0
        r = f1(a[mask])
        q = f2(a[mask], r)
        return q
    else:
        r = f1(a)
        q = f2(a, r)
        return q
Recall that a[:] returns the contents of a (even if a is multidimensional). We cannot store the : in the mask variable, but we can use a slice object equivalently:
def func(use_mask):
    a = numpy.arange(10)
    if use_mask:
        mask = a % 2 == 0
    else:
        mask = slice(None)
    return a[mask]
This does not use any memory to create the index array. I'm not sure what the CPU usage of the a[slice(None)] operation is, though.
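A small check (a sketch) confirming that slice(None) indexes exactly like [:], and that basic slicing returns a view of the original data rather than a copy, so no index array is allocated:

```python
import numpy as np

a = np.arange(10)
assert np.array_equal(a[slice(None)], a[:])

# Basic slicing returns a view, not a copy:
view = a[slice(None)]
assert view.base is a

# Writing through the view mutates the original array:
view[0] = 99
assert a[0] == 99
```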

Use z3Py to prove the equivalence/difference of ranges of two expressions

from z3 import *
s = Solver()
a, b = Ints("a b")
s.add(a > 2)
s.add(b > 0)
s.add(Or(Exists(a, ForAll(b, a != b)), Exists(b, ForAll(a, a != b))))
print(s.check()) # prints unsat
I am trying to prove the difference of ranges of a and b. This can be done by locating an assignment to b of value 1 which is beyond the range of a.
However, the program above gives an unexpected unsat. I wonder why, and whether there is a more efficient way to achieve this goal.
ForAll means exactly that: all numbers, i.e. Exists(a, ForAll(b, a != b)) is always false because there is no Int that's different from all Ints and thus the third assertion is unsatisfiable all by itself. You want something like s.add(Exists(a, (Exists (b, And(Not(a > 2), b > 0))))).
Also, note that you use two different a and b. Exists(a, ...) does not quantify over an existing variable, but introduces a new variable that's accidentally called by the same name as your global (existential) a, b.

How to return a numpy array with values where the values at common indices of 2 arrays are both greater than 0

I want the first array to display its values only when the values at common indices of both arrays are greater than zero, and otherwise make them zero. I'm not really sure how to frame the question; hopefully the expected output provides better insight.
I tried playing around with np.where, but I can't seem to make it work when 2 arrays are provided.
a = np.array([0,2,1,0,4])
b = np.array([1,1,3,4,0])
# Expected Output
a = ([0,2,1,0,0])
The zip function, which pairs up elements of the two arrays, is useful here. You don't necessarily need a numpy function.
import numpy as np
a = np.array([0,2,1,0,4])
b = np.array([1,1,3,4,0])
c = np.array([x if x * y > 0 else 0 for x,y in zip(a, b)])
print(c)
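A vectorized alternative avoids the Python-level loop entirely (a sketch; it tests that both values are > 0 rather than testing their product, which matters if negative values can appear):

```python
import numpy as np

a = np.array([0, 2, 1, 0, 4])
b = np.array([1, 1, 3, 4, 0])

# keep a's value where both a and b are positive, else 0
c = np.where((a > 0) & (b > 0), a, 0)
print(c)  # [0 2 1 0 0]
```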

Element-wise equality in 3-D array with arbitrarily sized axes

I am looking for an efficient way to find element-wise equality in a 3D or N-D array, for example, equal value on RGB pixels of an image. Some test data:
import numpy

a = numpy.arange(100).reshape((10,10))
b = a.copy()
c = a.copy()
b[5,5] = 1
c[6,6] = 100
d = numpy.array([a,b,c])
I can think of three options, the first of which does not generalize well to more dimensions:
equal_mask = (d[0] == d[1]) & (d[0] == d[2])
or
equal_mask = d.min(axis=0) == d.max(axis=0)
or, maybe better:
equal_mask = numpy.logical_and.reduce(d == d[0])
Is there a more efficient solution?
EDIT: I should clarify what I meant by n-D: not arbitrarily many dimensions, but 3-D with a different length on the first axis, for example d = numpy.array([a,b,c,a,b,c]).
Maybe this one:
np.logical_and(*(d[0,:]==d[1:,:]))
Here's an approach for nD array cases that looks for all-zero differences along the first axis -
(np.diff(d,axis=0)==0).all(0)
Sample run to verify results -
In [46]: d = np.random.randint(0,9,(3,3,5,2,3,4,2))
In [47]: out = (np.diff(d,axis=0)==0).all(0)
In [48]: np.allclose(out,(d[0] == d[1]) & (d[0] == d[2]))
Out[48]: True
As it turns out, this method seems to be slower than the numpy.logical_and.reduce based method listed in the question. So, at this point, sticking with that might be the way to go.
Divakar and Colonel Beauvel's solutions both hint that I can make my solution a bit faster by skipping the check d[0] == d[0]:
numpy.logical_and.reduce(d[1:] == d[0])
In terms of efficiency and the ability to work on arbitrarily sized axes, this still appears to be the best solution so far...
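A sketch running the EDIT's longer first axis through this form: d[1:] == d[0] broadcasts the comparison of every later slice against the first, and reduce collapses the results into one mask.

```python
import numpy as np

a = np.arange(100).reshape((10, 10))
b = a.copy(); b[5, 5] = 1
c = a.copy(); c[6, 6] = 100
d = np.array([a, b, c, a, b, c])  # first axis has length 6, not just 3

mask = np.logical_and.reduce(d[1:] == d[0])
print(mask.shape)              # (10, 10)
print(mask[5, 5], mask[6, 6])  # False False -- the positions where b or c differ
```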
