python printing a generator list after vectorization? - python

I am new with vectorization and generators. So far I have created the following function:
import numpy as np
def ismember(a,b):
    for i in a:
        if len(np.where(b==i)[0]) == 0:
            lv_var = 0
        else:
            lv_var = np.int(np.where(b==i)[0])
        yield lv_var
vect = np.vectorize(ismember)
A = np.array(xrange(700000))
B = np.array(xrange(700000))
lv_result = vect(A,B)
When I try to cast lv_result to a list or loop through the resulting numpy array, I get a list of generator objects. I need to get the actual result somehow. How do I print the actual results from this function? Calling .next() on the generator doesn't seem to do the job.
Could someone tell me what I am doing wrong, or how I could reconfigure the code to achieve the end goal?
---------------------------------------------------
OK so I understand the vectorize part now (thanks Viet Nguyen for the example).
I was also able to print the generator object results. The code has been modified. Please see below.
For the generator part:
What I am trying to do is mimic a MATLAB function called ismember (the one that is called as [Lia,Locb] = ismember(A,B)). I am only trying to get the Locb part.
From the MATLAB documentation: Locb contains the lowest index in B for each value in A that is a member of B. The output array, Locb, contains 0 wherever A is not a member of B.
One of the main problems is that I need to perform this operation as efficiently as possible. For testing I have two arrays of 700k elements. Creating a generator and going through its values doesn't seem to improve performance.
To print the Generator, I have created function f() below.
import numpy as np
def ismember(a,b):
    for i in a:
        index = np.where(b==i)[0]
        if len(index) == 0:
            yield 0
        else:
            yield index
def f(A, gen_obj):
    my_array = np.arange(len(A))
    for i in my_array:
        my_array[i] = gen_obj.next()
    return my_array
A = np.arange(700000)
B = np.arange(700000)
gen_obj = ismember(A,B)
f(A, gen_obj)
print 'done'
Note: if we were to try the above code with smaller arrays, let's say:
A = np.array([3,4,4,3,6])
B = np.array([2,5,2,6,3])
The result will be an array of : [4 0 0 4 3]
Just like MATLAB's function, the goal is to get the lowest index in B for each value in A that is a member of B. The output array, Locb, contains 0 wherever A is not a member of B.
NumPy's intersection function doesn't help me achieve this goal. Also, the returned array needs to have the same size as array A.
So far this process is taking forever (for arrays of 700k elements). Unfortunately I haven't been able to find the best solution yet. Any input on how I could reconfigure the code to achieve the end goal, with the best performance, will be much appreciated.
Optimization Problem solved in:
python-run-generator-using-multiple-cores-for-optimization

I believe you've misunderstood the inputs to a numpy.vectorize function. The "vectorized" function operates on the arrays on a per-element basis (see numpy.vectorize reference). Your function ismember seems to presume that the inputs a and b are arrays. Instead, think of the function as something you would use with built-in map().
> import numpy as np
> def mask(a, b):
>     return 1 if a == b else 0
> a = np.array([1, 2, 3, 4])
> b = np.array([1, 3, 4, 5])
> maskv = np.vectorize(mask)
> maskv(a, b)
array([1, 0, 0, 0])
Also, if I'm understanding your intention correctly, NumPy comes with an intersection function.
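For the Locb part specifically, one possible vectorized approach is to sort the unique values of B once and then look up every element of A with a binary search. This is only a sketch, not a drop-in replacement for MATLAB's ismember: the function name ismember_locb is made up for illustration, the inputs are assumed to be 1-D arrays of comparable values, and indices are returned 1-based (as in MATLAB) so that 0 can unambiguously mean "not found".

```python
import numpy as np

def ismember_locb(a, b):
    """Hypothetical helper: lowest 1-based index in b for each value of a, 0 if absent."""
    a = np.asarray(a)
    b = np.asarray(b)
    locb = np.zeros(a.shape, dtype=np.intp)
    # Sorted unique values of b and the index of each value's first occurrence.
    b_unique, first_idx = np.unique(b, return_index=True)
    if b_unique.size == 0:
        return locb
    # Binary search: where would each element of a sit in the sorted values?
    pos = np.searchsorted(b_unique, a)
    # Clip so that positions past the end still index a valid element.
    pos = np.clip(pos, 0, b_unique.size - 1)
    # An element of a is a member of b only if the value at that position matches.
    found = b_unique[pos] == a
    locb[found] = first_idx[pos[found]] + 1  # shift to 1-based indices
    return locb

A = np.array([3, 4, 4, 3, 6])
B = np.array([2, 5, 2, 6, 3])
locb = ismember_locb(A, B)
print(locb)  # [5 0 0 5 4]
```

With the small example from the question this gives the same positions as the generator version, shifted by one because of the 1-based convention. The work is dominated by one sort of B plus one binary search per element of A, instead of a full scan of B for every element of A.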

Related

Can I get a similar behavior with numpy broadcasting to the following array assignment in matlab

Given an array of ones of size 5, I'm trying to assign values to it at indices that include some larger than 5 (the size of the assigned array). The following is the MATLAB behavior. Is there an equivalent behavior in Python?
a = ones(5,1)
a =
1
1
1
1
1
a([2,10,7]) = [2,3,5]
a =
1
2
1
1
1
0
5
0
0
3
To my knowledge it is not possible to use anything close to the MATLAB syntax with numpy. You either have to allocate the right size from the start, or append a new array/list to the previous one. Here are the two ideas that come to mind:
A: Right size from the start
This is the method I would prefer since it uses less dynamic allocation and is therefore faster.
import numpy as np
a = np.zeros(10) # First assign the array
a[:5] = np.ones(5) # Put the first few elements at 1
a[[1, 9, 6]] = [2, 3, 5] # Assign the values
Note the difference in the indexing, since MATLAB's arrays start from 1 and Python's from 0.
B: Dynamically append to the array
This method takes longer and is more complex. I would only use it if you have no way of knowing the complete size of your array from the start:
import numpy as np
# Create the original array
a = np.ones(5)
# Define beforehand which indices and values should be added
idx_to_add = [1, 9, 6]
values = [2, 3, 5]
# Loop over index-value pairs to add them
for i, val in zip(idx_to_add, values):
    # If the index is in the array, perform a normal assignment
    if i < len(a):
        a[i] = val
    # Otherwise extend the array
    else:
        extension = np.zeros(i - len(a) + 1)  # Build the array to append
        extension[i - len(a)] = val  # Assign the new value
        a = np.append(a, extension)  # Append the extension to a
As you see this is more complex to write but should still work. There might be some extra tricks to reduce a bit the number of lines of code here and there (I didn't use them for clarity) but the general idea stays the same.
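If the indices are all known before the assignment, the two ideas can be combined into a small helper that works out the required size once and then fills the array. This is only a sketch: the name assign_with_growth and its fill parameter are made up for illustration, and it assumes a 1-D array with non-negative 0-based integer indices.

```python
import numpy as np

def assign_with_growth(a, idx, values, fill=0):
    """Hypothetical helper: assign values at idx, growing a copy of a if needed."""
    a = np.asarray(a)
    idx = np.asarray(idx, dtype=np.intp)
    if idx.size == 0:
        return a.copy()
    # The result must hold both the old contents and the largest index.
    needed = max(len(a), int(idx.max()) + 1)
    out = np.full(needed, fill, dtype=a.dtype)
    out[:len(a)] = a   # copy the existing values
    out[idx] = values  # then perform the assignment
    return out

a = np.ones(5)
result = assign_with_growth(a, [1, 9, 6], [2, 3, 5])
print(result)  # [1. 2. 1. 1. 1. 0. 5. 0. 0. 3.]
```

Unlike the loop in method B, this allocates the final array only once, which is the same idea as method A.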
Final note
I'll just finish by mentioning that, although the MATLAB syntax you're using is very short, it is slower. It is good practice to apply something similar to the first thing I suggest in Python, where you create the whole array first and then simply fill it. Your code will be much faster if you can do so.

2 different specified elements from 2 numpy arrays

I have two numpy arrays with 0s and 1s in them. How can I find the indexes with 1 in the first array and 0 in the second?
I tried np.logical_and
But I got the error message: 'builtin_function_or_method' object is not subscriptable
Use np.where(arr1==1) and np.where(arr2==0)
import numpy as np
array1 = np.array([0,0,0,1,1,0,1])
array2 = np.array([0,1,0,0,1,0,1])
ones = np.where(array1 == 1)
zeroes = np.where(array2 == 0)
print("array 1 has 1 at",ones)
print("array 2 has 0 at",zeroes)
returns:
array 1 has 1 at (array([3, 4, 6]),)
array 2 has 0 at (array([0, 2, 3, 5]),)
I'm not sure if there's some built-in numpy function that will do this for you, since it's a fairly specific problem. EDIT: there is, see bottom
Nonetheless, if one were to exist, it would have to be a linear time algorithm if you're passing in a bare numpy array, so writing your own isn't difficult.
If I have any numpy array (or python array) myarray, and I want a collection of indices where some object myobject appears, we can do this in one line using a list comprehension:
indices = [i for i in range(len(myarray)) if myarray[i] == myobject]
So what's going on here?
A list comprehension works in the following format:
[<output> for <input> in <iterable> if <condition>]
In our case, <input> and <output> are the indices of myarray, and the <condition> block checks if the value at the current index is equal to that of our desired value.
Edit: as White_Sirilo helpfully pointed out, numpy.where does the same thing, I stand corrected
Let's say your arrays are called j and k. The following code returns all indices where j[index] = 1 and k[index] = 0 if both arrays are 1-dimensional. It also works if j and k are different sizes.
idx_1 = np.where(j == 1)[0]
idx_2 = np.where(k == 0)[0]
final_indices = np.intersect1d(idx_1, idx_2, return_indices=False)
If your array is 2-dimensional, you can use the above code in a function and then go row-by-row. There are almost definitely better ways to do this, but this works in a pinch.
Two numpy arrays are given in the problem, array1 and array2.
Just use
one_index=np.where(array1==1)
and
zero_index=np.where(array2==0)
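The two conditions can also be combined into a single boolean mask before asking for indices, which gives the positions where both hold at once. A minimal sketch, assuming both arrays have the same shape and reusing the example arrays from the answer above:

```python
import numpy as np

array1 = np.array([0, 0, 0, 1, 1, 0, 1])
array2 = np.array([0, 1, 0, 0, 1, 0, 1])

# Element-wise AND of the two conditions. Note that np.logical_and is
# called with parentheses, not indexed with square brackets.
mask = np.logical_and(array1 == 1, array2 == 0)

# flatnonzero returns the flat indices where the mask is True.
indices = np.flatnonzero(mask)
print(indices)  # [3]
```

Indexing np.logical_and with square brackets instead of calling it is one way to get the "not subscriptable" error mentioned in the question.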

How to return a numpy array with values where, the common indices values for 2 arrays are both greater than 0

I want the first array to keep its values only where the values at the same index in both arrays are greater than zero, and to be zero everywhere else. I'm not really sure how to frame the question. Hopefully the expected output provides better insight.
I tried playing around with np.where, but I can't seem to make it work when 2 arrays are provided.
a = np.array([0,2,1,0,4])
b = np.array([1,1,3,4,0])
# Expected Output
a = ([0,2,1,0,0])
The zip function, which takes elements of two arrays side by side, is useful here. You don't necessarily need an np/numpy function.
import numpy as np
a = np.array([0,2,1,0,4])
b = np.array([1,1,3,4,0])
c = np.array([x if x * y > 0 else 0 for x,y in zip(a, b)])
print(c)
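A vectorized alternative is the three-argument form of np.where, which picks from one of two sources depending on a condition. This is a sketch assuming both arrays have the same shape; it tests each array against zero separately, so it also behaves as described when negative values are present (where the product test x * y > 0 would not).

```python
import numpy as np

a = np.array([0, 2, 1, 0, 4])
b = np.array([1, 1, 3, 4, 0])

# Keep a's value where both a and b are positive, otherwise use 0.
c = np.where((a > 0) & (b > 0), a, 0)
print(c)  # [0 2 1 0 0]
```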

How to find last "K" indexes of vector satisfying condition (Python) ? (Analogue of Matlab's "find" )

Consider some vector:
import numpy as np
v = np.arange(10)
Assume we need to find last 2 indexes satisfying some condition.
For example, in MATLAB it would be written as:
find(v <5 , 2,'last')
answer = [ 3 , 4 ] (Note: Matlab indexing from 1)
Question: What would be the clearest way to do that in Python?
A "nice" solution should STOP searching once it finds the 2 desired results; it should NOT search over all elements of the vector.
So np.where does not seem to be "nice" in that sense.
We can easily write that using "for", but is there an alternative way?
I am wary of using "for" since it might be slow (at least it is very much so in MATLAB).
This attempt doesn't use numpy, and it is probably not very idiomatic.
Nevertheless, if I understand it correctly, zip, filter and reversed are all lazy iterators that take only the elements that they really need. Therefore, you could try this:
x = list(range(10))
from itertools import islice
res = reversed(list(map(
    lambda xi: xi[1],
    islice(
        filter(
            lambda xi: xi[0] < 5,
            zip(reversed(x), reversed(range(len(x))))
        ),
        2
    )
)))
print(list(res))
Output:
[3, 4]
What it does (from inside to outside):
create index range
reverse both array and indices
zip the reversed array with indices
filter the two (value, index)-pairs that you need, extract them by islice
Throw away the values, retain only indices with map
reverse again
Even though it looks somewhat monstrous, it should all be lazy, and stop after it finds the first two elements that you are looking for. I haven't compared it with a simple loop, maybe just using a loop would be both simpler and faster.
Any solution you'd find will iterate over the list even if the loop is 'hidden' inside a function.
The solution to your problem depends on the assumptions you can make e.g. is the list sorted?
For the general case I'd iterate over the list starting at the end:
def find(condition, k, v):
    indices = []
    for i, var in enumerate(reversed(v)):
        if condition(var):
            indices.append(len(v) - i - 1)
        if len(indices) >= k:
            break
    return indices
The condition should then be passed as a function, so you can use a lambda:
v = range(10)
find(lambda x: x < 5, 3, v)
will output
[4, 3, 2]
I'm not aware of a "good" numpy solution to short-circuiting.
The most principled way to go would be using something like Cython which, to brutally oversimplify, adds fast loops to Python. Once you have set that up it would be easy.
If you do not want to do that you'd have to employ some gymnastics like:
import numpy as np
def find_last_k(vector, condition, k, minchunk=32):
    if k > minchunk:
        minchunk = k
    l, r = vector.size - minchunk, vector.size
    found = []
    n_found = 0
    while r > 0:
        if l <= 0:
            l = 0
        found.append(l + np.where(condition(vector[l:r]))[0])
        n_found += len(found[-1])
        if n_found >= k:
            break
        l, r = 3 * l - 2 * r, l
    return np.concatenate(found[::-1])[-k:]
This tries balancing loop overhead and numpy "inflexibility" by searching in chunks, which we grow exponentially until enough hits are found.
Not exactly pretty, though.
This is what I've found that seems to do this job for the example described (using argwhere which returns all indices that meet the criteria and then we find the last two of these as a numpy array):
ind = np.argwhere(v<5)
ind[-2:]
This searches through the entire array so is not optimal but is easy to code.
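Note that np.argwhere returns one row per match, so for a 1-D vector the result above has shape (2, 1). If a flat array of indices is preferred, np.flatnonzero is one option. A small sketch using the same v as in the question; like argwhere, it still scans the whole array and does not short-circuit.

```python
import numpy as np

v = np.arange(10)

# Flat indices of every element satisfying the condition, then the last two.
last_two = np.flatnonzero(v < 5)[-2:]
print(last_two)  # [3 4]
```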

Evaluating a function using numpy

What is the significance of the return part when evaluating functions? Why is this necessary?
Your assumption is right: dfdx[0] is indeed the first value in that array, so according to your code that would correspond to evaluating the derivative at x=-1.0.
To know the correct index where x is equal to 0, you will have to look for it in the x array.
One way to find this is the following, where we find the index of the value where |x-0| is minimal (so essentially where x=0 but float arithmetic requires taking some precautions) using argmin :
index0 = np.argmin(np.abs(x-0))
And we then get what we want, dfdx at the index where x is 0 :
print dfdx[index0]
Another way, though less robust with respect to floating-point arithmetic, is the following:
# we make a boolean array that is True where x is zero and False everywhere else
bool_array = (x==0)
# Numpy allows using a boolean array as a way to index an array
# Doing so will get you all the values of dfdx where bool_array is True
# In our case that will hopefully give us dfdx where x=0
print dfdx[bool_array]
# same thing as oneliner
print dfdx[x==0]
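To make the two lookups concrete, here is a self-contained sketch. The original x and dfdx are not shown in the question, so both are assumptions here: x is built with np.linspace over [-1, 1], the function is taken to be sin(pi * exp(-x)) as in the answer below, and dfdx is a numerical derivative from np.gradient.

```python
import math
import numpy as np

# Assumed setup, not taken from the question: 201 points on [-1, 1],
# so that x = 0 sits exactly in the middle of the grid.
x = np.linspace(-1.0, 1.0, 201)
f = np.sin(math.pi * np.exp(-x))
dfdx = np.gradient(f, x)  # numerical derivative on the grid x

# Robust lookup: index of the grid point closest to zero.
index0 = np.argmin(np.abs(x - 0))
print(index0, dfdx[index0])

# Tolerance-based boolean mask, a safer variant of the exact test x == 0.
near_zero = np.isclose(x, 0.0)
print(dfdx[near_zero])
```

The analytic derivative at x = 0 is -pi * cos(pi) = pi, so the printed value should be close to 3.14 up to the finite-difference error.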
You gave the answer yourself: x[0] is -1.0, and you want the value at the middle of the array. np.linspace is a good function for building such a series of values:
import math
import numpy as np
def f1(x):
    g = np.sin(math.pi*np.exp(-x))
    return g
n = 1001 # odd !
x=np.linspace(-1,1,n) #x[n//2] is 0
f1x=f1(x)
df1=np.diff(f1(x),1)
dx=np.diff(x)
df1dx = (- math.pi*np.exp(-x)*np.cos(math.pi*np.exp(-x)))[:-1] # to discard last element
# In [3]: np.allclose(df1/dx,df1dx,atol=dx[0])
# Out[3]: True
As another tip, numpy arrays are used more efficiently and more readably without loops.
