2 different specified elements from 2 numpy arrays - python

I have two numpy arrays with 0s and 1s in them. How can I find the indexes with 1 in the first array and 0 in the second?
I tried np.logical_and but got the error message: 'builtin_function_or_method' object is not subscriptable

Use np.where(arr1==1) and np.where(arr2==0)

import numpy as np
array1 = np.array([0,0,0,1,1,0,1])
array2 = np.array([0,1,0,0,1,0,1])
ones = np.where(array1 == 1)
zeroes = np.where(array2 == 0)
print("array 1 has 1 at",ones)
print("array 2 has 0 at",zeroes)
returns:
array 1 has 1 at (array([3, 4, 6]),)
array 2 has 0 at (array([0, 2, 3, 5]),)
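Since the question asks for positions where both conditions hold at once, the two tests can also be combined into one boolean mask before calling np.where. A minimal sketch, assuming both arrays have the same shape (the name both is only for illustration):

```python
import numpy as np

array1 = np.array([0, 0, 0, 1, 1, 0, 1])
array2 = np.array([0, 1, 0, 0, 1, 0, 1])

# Elementwise AND of the two conditions. The parentheses are required
# because & binds more tightly than ==.
both = np.where((array1 == 1) & (array2 == 0))[0]
print(both)  # [3]
```

np.logical_and(array1 == 1, array2 == 0) should give the same mask; note that it is called with parentheses, not indexed with square brackets, which is the likely cause of the "not subscriptable" error.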

I'm not sure if there's some built-in numpy function that will do this for you, since it's a fairly specific problem. EDIT: there is, see bottom
Nonetheless, if one were to exist, it would have to be a linear time algorithm if you're passing in a bare numpy array, so writing your own isn't difficult.
If I have any numpy array (or Python list) myarray, and I want a collection of indices where some object myobject appears, I can do this in one line using a list comprehension:
indices = [i for i in range(len(myarray)) if myarray[i] == myobject]
So what's going on here?
A list comprehension works in the following format:
[<output> for <input> in <iterable> if <condition>]
In our case, <input> and <output> are the indices of myarray, and the <condition> block checks if the value at the current index is equal to that of our desired value.
Edit: as White_Sirilo helpfully pointed out, numpy.where does the same thing, I stand corrected
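For the original two-array question, the same list-comprehension idea can be stretched to check both arrays at once. This is only a sketch in plain Python; the sample values are borrowed from the other answer, and zip assumes the two sequences have the same length (it silently stops at the shorter one otherwise):

```python
array1 = [0, 0, 0, 1, 1, 0, 1]
array2 = [0, 1, 0, 0, 1, 0, 1]

# Walk both sequences in lockstep and keep the positions
# where the first holds 1 and the second holds 0
indices = [i for i, (x, y) in enumerate(zip(array1, array2)) if x == 1 and y == 0]
print(indices)  # [3]
```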

Let's say your arrays are called j and k. The following code returns all indices where j[index] = 1 and k[index] = 0 if both arrays are 1-dimensional. It also works if j and k are different sizes.
idx_1 = np.where(j == 1)[0]
idx_2 = np.where(k == 0)[0]
final_indices = np.intersect1d(idx_1, idx_2, return_indices=False)
If your array is 2-dimensional, you can use the above code in a function and then go row-by-row. There are almost definitely better ways to do this, but this works in a pinch.
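One possible alternative for the 2-dimensional case, sketched here under the assumption that j and k have the same shape (unlike the intersect1d approach, which tolerates different sizes): combine the two conditions into one mask and let np.argwhere list the (row, column) pairs. The sample data is made up for illustration.

```python
import numpy as np

j = np.array([[1, 0, 1],
              [0, 1, 1]])
k = np.array([[0, 0, 1],
              [1, 0, 0]])

# Each row of the result is one (row, col) position where j is 1 and k is 0
pairs = np.argwhere((j == 1) & (k == 0))
print(pairs.tolist())  # [[0, 0], [1, 1], [1, 2]]
```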

Two numpy arrays are given in the problem, array1 and array2.
Just use
one_index=np.where(array1==1)
and
zero_index=np.where(array2==0)

Related

Can I get a similar behavior with numpy broadcasting to the following array assignment in matlab

Given an array of ones of size 5, I'm trying to assign to a set of indices that includes some larger than 5 (beyond the end of the array). The following is the MATLAB behavior; is there an equivalent behavior in Python?
a = ones(5,1)
a =
1
1
1
1
1
a([2,10,7]) = [2,3,5]
a =
1
2
1
1
1
0
5
0
0
3
To my knowledge it is not possible to use anything close to the matlab syntax with numpy. You either have to allocate the right size from the start, or append a new array/list to the previous one. Here are the two ideas that come to mind:
A: Right size from the start
This is the method I would prefer since it uses less dynamic allocation and thus is faster.
import numpy as np
a = np.zeros(10) # First allocate the array at its final size
a[:5] = np.ones(5) # Set the first five elements to 1
a[[1, 9, 6]] = [2, 3, 5] # Assign the values
Note the difference in the indexing since matlab's array start from 1 and python's from 0.
B: Dynamically append to the array
This method takes longer and is more complex. I would only use it if you have no way of knowing the complete size of your array from the start:
import numpy as np
# Create the original array
a = np.ones(5)
# Define beforehand which indices and values should be added
idx_to_add = [1, 9, 6]
values = [2, 3, 5]
# Loop over index-value pairs to add them
for i, val in zip(idx_to_add, values):
    # If the index is inside the array, perform a normal assignment
    if i < len(a):
        a[i] = val
    # Otherwise extend the array
    else:
        extension = np.zeros(i - len(a) + 1)  # Build the array to append
        extension[i - len(a)] = val  # Assign the new value
        a = np.append(a, extension)  # Append the extension to a
As you can see, this is more complex to write but should still work. There might be some extra tricks to shave off a few lines of code here and there (I didn't use them for clarity), but the general idea stays the same.
Final note
I'll just finish by mentioning that, although the matlab syntax you're using is very short, it is slower. It is good practice to apply something similar to the first thing I suggest in python, where you create the whole array at first and then simply fill it. Your code will be much faster if you can do so.
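To make the "allocate once, then fill" idea reusable, it could be wrapped in a small helper. This is only a sketch: assign_with_growth is a made-up name, not a numpy function, and it assumes a 1-D array and non-negative 0-based indices.

```python
import numpy as np

def assign_with_growth(a, indices, values):
    """Return a copy of 1-D array a, zero-padded as needed, with values set at indices."""
    indices = np.asarray(indices)
    # Final size: big enough for the original data and for the largest index
    needed = max(len(a), int(indices.max()) + 1)
    out = np.zeros(needed, dtype=a.dtype)
    out[:len(a)] = a       # copy the original contents
    out[indices] = values  # then do the assignment in one step
    return out

a = assign_with_growth(np.ones(5), [1, 9, 6], [2, 3, 5])
print(a)  # [1. 2. 1. 1. 1. 0. 5. 0. 0. 3.]
```

This does a single allocation no matter how many out-of-range indices there are, which is the main difference from the loop in method B.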

comparing numpy arrays element-wise setting an element-wise result

Possibly this has been asked before, but I'm having a hard time finding a solution, since I can't find the right keywords to search for.
One huge advantage of numpy arrays that I like is that they are transparent for many operations.
However in my code I have a function that has a conditional statement in the form (minimal working example):
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 1, 3])
def func(x, y):
    if x > y:
        z = 1
    else:
        z = 2
    return z
func(arr1, arr2) obviously results in an error message:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I do understand what the problem here is and why it can't work like this.
What I would like is for x > y to be evaluated for each element, and for an array z to be returned with the corresponding result for each comparison. (Of course this requires the arrays to be of equal length, but that's not a problem here.)
I know that I could do this by changing func such that it iterates over the elements, but since I'm trying to improve my numpy skills: is there a way to do this without explicit iteration?
arr1 > arr2 does exactly what you'd hope it does: it compares each element of the two arrays and builds a boolean array with the result. The result can be used to index into either of the two arrays, should you need to. The equivalent of your function, however, can be done with np.where:
res = np.where(arr1 > arr2, 1, 2)
or equivalently (but slightly less efficiently), using the boolean array directly:
res = np.ones(arr1.shape)
res[arr1 <= arr2] = 2 # note that I have to invert the condition to select the cells where you want the 2
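Putting the pieces together with the arrays from the question, here is a small sketch of what the intermediate boolean array and the final result look like. The rewritten func is just one way to express it.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 1, 3])

def func(x, y):
    # 1 where x > y, 2 everywhere else, evaluated element by element
    return np.where(x > y, 1, 2)

mask = arr1 > arr2
print(mask)              # [False  True False]
print(func(arr1, arr2))  # [2 1 2]
```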

Walk through each column in a numpy matrix efficiently in Python

I have a very big two-dimensional array in Python, using the numpy library. I want to walk through each column efficiently and count how many elements in each column are different from 0.
Suppose I have the following matrix.
M = array([[1,2], [3,4]])
The following code enables us to walk through each row efficiently, for example (it is not what I intend to do of course!):
for row_idx, row in enumerate(M):
    print("row_idx", row_idx, "row", row)
    for col_idx, element in enumerate(row):
        print("col_idx", col_idx, "element", element)
        # update the matrix M: square each element
        M[row_idx, col_idx] = element ** 2
However, in my case I want to walk through each column efficiently, since I have a very big matrix.
I've heard that there is a very efficient way to achieve this using numpy, instead of my current code:
curr_col, curr_row = 0, 0
while (curr_col < numb_colonnes):
    result = 0
    while (curr_row < numb_rows):
        # If different from 0
        if (M[curr_row][curr_col] != 0):
            result += 1
        curr_row += 1
    .... using result value ...
    curr_col += 1
    curr_row = 0
Thanks in advance!
In the code you showed us, you treat numpy arrays as lists and, as far as you can see, it works! But arrays are not lists, and if you only ever treat them as lists there is little point in using arrays, or even numpy.
To really exploit the usefulness of numpy you have to operate directly on arrays, writing, e.g.,
M = M*M
when you want to square the elements of an array and using the rich set of numpy functions to operate directly on arrays.
That said, I'll try to get a bit closer to your problem...
If your intent is to count the elements of an array that are different from zero, you can use the numpy function sum.
Using sum, you can obtain the sum of all the elements in an array, or you can sum across a particular axis.
import numpy as np
a = np.array(((3,4),(5,6)))
print(np.sum(a)) # 18
print(np.sum(a, axis=0)) # [ 8 10]
print(np.sum(a, axis=1)) # [ 7 11]
Now you are protesting: I don't want to sum the elements, I want to count the non-zero elements... But if you write a logical test on an array, you obtain an array of booleans. For example, to test which elements of a are even:
print(a%2==0)
# [[False  True]
#  [False  True]]
False counts as zero and True as one, at least when we sum them...
print(np.sum(a%2==0)) # 2
or, if you want to sum down each column, i.e., the index that changes is the 0-th:
print(np.sum(a%2==0, axis=0)) # [0 2]
or sum across each row:
print(np.sum(a%2==0, axis=1)) # [1 1]
To summarize, for your particular use case
by_col = np.sum(M!=0, axis=0)
# use the counts of non-zero terms in each column, stored in an array
...
# if you need the grand total, use sum again
total = np.sum(by_col)
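As a side note, numpy also has np.count_nonzero, which accepts an axis argument in reasonably recent versions and expresses the same count directly. A small sketch with made-up data, checked against the np.sum approach:

```python
import numpy as np

M = np.array([[1, 0, 3],
              [0, 0, 4],
              [5, 0, 6]])

# Number of non-zero entries in each column
by_col = np.count_nonzero(M, axis=0)
print(by_col)  # [2 0 3]

# Same result as summing the boolean mask
same = np.array_equal(by_col, np.sum(M != 0, axis=0))
print(same)  # True
```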

function to get number of columns in a NumPy array that returns 1 if it is a 1D array

I have defined operations on 3xN NumPy arrays, and I want to loop over each column of the array.
I tried:
for i in range(nparray.shape[1]):
However, if nparray.ndim == 1, this fails.
Is there a clean way to ascertain the number of columns of a NumPy array, for example, to get 1 if it is a 1D array (like MATLAB's size operation does)?
Otherwise, I have implemented:
if nparray.ndim == 1:
    num_points = 1
else:
    num_points = nparray.shape[1]
for i in range(num_points):
If you're just looking for something less verbose, you could do this:
num_points = np.atleast_2d(nparray).shape[1]
That will, of course, make a new temporary array just to take its shape, which is a little silly… but it'll be pretty cheap, because it's just a view of the same memory.
However, I think your explicit code is more readable, except that I might do it with a try:
try:
    num_points = nparray.shape[1]
except IndexError:
    num_points = 1
If you're doing this repeatedly, whatever you do, you should wrap it in a function. For example:
def num_points(arr, axis):
    try:
        return arr.shape[axis]
    except IndexError:
        return 1
Then all you have to write is:
for i in range(num_points(nparray, 1)):
And of course it means you can change things everywhere by just editing one place, e.g.:
def num_points(arr, axis):
    return arr[:, ..., np.newaxis].shape[axis]
If you want to keep the one-liner, how about using conditional expressions:
for i in range(nparray.shape[1] if nparray.ndim > 1 else 1):
    pass
By default, to iterate a np.array means to iterate over the rows. If you have to iterate over columns, just iterate through the transposed array:
>>> a2 = array(range(12)).reshape((3,4))
>>> for col in a2.T:
...     print(col)
...
[0 4 8]
[1 5 9]
[ 2  6 10]
[ 3  7 11]
What's the intended behavior for an array like array([1,2,3]): is it treated as having one column or three? Since you mentioned that the arrays are all 3xN, I assume it should be treated as having just 1 column:
>>> a1 = array(range(3))
>>> for col in a1.reshape((3,-1)).T:
...     print(col)
...
[0 1 2]
So, a general solution:
for col in your_array.reshape((3,-1)).T:
    # do something
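The reshape trick can be wrapped so the leading dimension is not hard-coded to 3. A minimal sketch, assuming that a 1-D input should count as a single column; iter_columns is a made-up helper name, not part of numpy:

```python
import numpy as np

def iter_columns(arr):
    """Return a 2-D array whose rows are the columns of arr (a 1-D arr is one column)."""
    arr = np.asarray(arr)
    # Keep the first axis, fold everything else into the second, then transpose
    return arr.reshape((arr.shape[0], -1)).T

cols_1d = [col.tolist() for col in iter_columns(np.arange(3))]
cols_2d = [col.tolist() for col in iter_columns(np.arange(12).reshape((3, 4)))]
print(cols_1d)  # [[0, 1, 2]]
print(cols_2d)  # [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
```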
I think the easiest way is to use the len function:
for i in range(len(nparray)):
    ...
Why? Because len returns the size of the first axis. If nparray is a one-dimensional vector, that is the number of elements.
nparray = numpy.ones(10)
print(len(nparray))
Out: 10
If nparray is a matrix, len still returns the size of the first axis, which is the number of rows (10 here) rather than the number of columns. For the column count, use len(nparray.T) or nparray.shape[1].
nparray = numpy.ones((10, 5))
print(len(nparray))
Out: 10
If you have a list of numpy arrays with different sizes, just use len inside a loop based on your list.

python printing a generator list after vectorization?

I am new to vectorization and generators. So far I have created the following function:
import numpy as np
def ismember(a,b):
    for i in a:
        if len(np.where(b==i)[0]) == 0:
            lv_var = 0
        else:
            lv_var = np.int(np.where(b==i)[0])
        yield lv_var
vect = np.vectorize(ismember)
A = np.array(xrange(700000))
B = np.array(xrange(700000))
lv_result = vect(A,B)
When I try to cast lv_result as a list or loop through the resulting numpy array, I get a list of generator objects. I need to somehow get the actual result. How do I print the actual results from this function? Calling .next() on the generator doesn't seem to do the job.
Could someone tell me what I am doing wrong, or how I could reconfigure the code to achieve the end goal?
---------------------------------------------------
OK so I understand the vectorize part now (thanks Viet Nguyen for the example).
I was also able to print the generator object results. The code has been modified. Please see below.
For the generator part:
What I am trying to do is mimic a MATLAB function called ismember (the one that is formatted as: [Lia,Locb] = ismember(A,B)). I am just trying to get the Locb part.
From MATLAB: Locb contains the lowest index in B for each value in A that is a member of B. The output array, Locb, contains 0 wherever A is not a member of B.
One of the main problems is that I need to perform this operation as efficiently as possible. For testing I have two arrays of 700k elements. Creating a generator and going through its values doesn't seem to perform any better.
To print the Generator, I have created function f() below.
import numpy as np
def ismember(a,b):
    for i in a:
        index = np.where(b==i)[0]
        if len(index) == 0:
            yield 0
        else:
            yield index[0]  # lowest matching index in b
def f(A, gen_obj):
    my_array = np.arange(len(A))
    for i in my_array:
        my_array[i] = next(gen_obj)
    return my_array
A = np.arange(700000)
B = np.arange(700000)
gen_obj = ismember(A,B)
f(A, gen_obj)
print('done')
Note: if we were to try the above code with smaller arrays, let's say:
A = np.array([3,4,4,3,6])
B = np.array([2,5,2,6,3])
The result will be an array of : [4 0 0 4 3]
Just like MATLAB's function, the goal is to get the lowest index in B for each value in A that is a member of B. The output array, Locb, contains 0 wherever A is not a member of B.
Numpy's intersection function doesn't help me achieve the goal. Also, the returned array needs to be the same size as array A.
So far this process is taking forever (for arrays of 700k elements). Unfortunately I haven't been able to find the best solution yet. Any input on how I could reconfigure the code to achieve the end goal, with the best performance, will be much appreciated.
Optimization Problem solved in:
python-run-generator-using-multiple-cores-for-optimization
I believe you've misunderstood the inputs to a numpy.vectorize function. The "vectorized" function operates on the arrays on a per-element basis (see numpy.vectorize reference). Your function ismember seems to presume that the inputs a and b are arrays. Instead, think of the function as something you would use with built-in map().
> import numpy as np
> def mask(a, b):
> return 1 if a == b else 0
> a = np.array([1, 2, 3, 4])
> b = np.array([1, 3, 4, 5])
> maskv = np.vectorize(mask)
> maskv(a, b)
array([1, 0, 0, 0])
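For what it's worth, this particular mask doesn't need np.vectorize at all: comparing the arrays directly gives a boolean array that can be cast to integers. A short sketch using the same a and b:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([1, 3, 4, 5])

# Elementwise comparison, then cast True/False to 1/0
mask = (a == b).astype(int)
print(mask)  # [1 0 0 0]
```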
Also, if I'm understanding your intention correctly, NumPy comes with an intersection function.
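If the goal really is the Locb output of MATLAB's ismember, one possible loop-free approach is to combine np.unique with np.searchsorted. This is only a sketch under some assumptions: ismember_locb is a made-up name, b must be non-empty, the indices are 0-based, and -1 (rather than MATLAB's 0) marks values that are missing, because 0 is a valid index in Python.

```python
import numpy as np

def ismember_locb(a, b):
    """For each element of a, the lowest index in b where it occurs, or -1 if absent."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Sorted unique values of b, plus the index of each value's first occurrence
    b_unique, first_idx = np.unique(b, return_index=True)
    # Where each element of a would be inserted in the sorted unique values
    pos = np.searchsorted(b_unique, a)
    # Clamp so that values larger than everything in b still index safely
    pos = np.clip(pos, 0, len(b_unique) - 1)
    found = b_unique[pos] == a
    return np.where(found, first_idx[pos], -1)

A = np.array([3, 4, 4, 3, 6])
B = np.array([2, 5, 2, 6, 3])
locb = ismember_locb(A, B)
print(locb)  # [ 4 -1 -1  4  3]
```

Because the work is a sort plus a binary search instead of a full scan of B for every element of A, this should scale much better to arrays of 700k elements, though I haven't timed it.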
