'Remove' command for ND arrays in Python

I have two arrays
A = np.array([[2, 0],
              [3, 4],
              [5, 6]])
and
B = np.array([[4, 3],
              [6, 7],
              [3, 4],
              [2, 0]])
I essentially want to subtract B from A and obtain the indices of the rows of A that are also present in B. How do I accomplish this? In this example, I need answers like:
C = [0, 1]            # indices in A of the rows repeated in B
D = [[2, 0], [3, 4]]  # their values
E = [3, 2]            # positions of these rows in B
Several of the usual commands, like nonzero, remove, filter, etc., seem unusable for ND arrays. Can someone please help me out?

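You can define a data type that is the concatenation of your columns, so that each row is viewed as a single opaque value; this lets you use the 1d set operations (both arrays must have the same dtype for the rows to compare correctly):
a = np.ascontiguousarray(A).view(np.dtype((np.void, A.dtype.itemsize * A.shape[1]))).ravel()
b = np.ascontiguousarray(B).view(np.dtype((np.void, B.dtype.itemsize * B.shape[1]))).ravel()
check = np.isin(a, b)
C = np.where(check)[0]
D = A[check]
check = np.isin(b, a)
E = np.where(check)[0]  # positions in B, in B's own order: [2, 3] here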
If you wanted only D, for example, you could have done:
D = np.intersect1d(a, b).view(A.dtype).reshape(-1, A.shape[1])
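Note in the last example how the original dtype is recovered with .view(A.dtype) and the row structure with .reshape(-1, A.shape[1]). The result is deduplicated and, like any intersect1d output, sorted, so it need not follow A's row order.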
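The question also asks for E aligned with C ([3, 2]), whereas the void-view approach returns the matching positions in B's own order ([2, 3] here). For small arrays, a minimal sketch using broadcasting produces all three results directly. This is a swapped-in technique, not the answer above, and it assumes len(A) * len(B) * A.shape[1] booleans fit comfortably in memory, since it builds the full pairwise comparison.

```python
import numpy as np

A = np.array([[2, 0],
              [3, 4],
              [5, 6]])
B = np.array([[4, 3],
              [6, 7],
              [3, 4],
              [2, 0]])

# matches[i, j] is True when row i of A equals row j of B
matches = (A[:, None, :] == B[None, :, :]).all(axis=2)

C = np.nonzero(matches.any(axis=1))[0]  # rows of A that also appear in B
D = A[C]                                # their values
E = matches[C].argmax(axis=1)           # first matching row of B, in the order of C

print(C.tolist(), D.tolist(), E.tolist())  # [0, 1] [[2, 0], [3, 4]] [3, 2]
```

Because the comparison matrix grows as len(A) * len(B), the void-view approach is the better fit for large inputs.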

Related

Assign values to an array using two values

I am trying to generate an array that is the element-wise sum of two previous arrays, e.g.:
c = [A + B for A in a and B in b]
Here, I get the error message
NameError: name 'B' is not defined
where
len(a) = len(b) = len(c)
Please can you let me know what I am doing wrong. Thanks.
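The boolean and operator does not wire iterables together; it evaluates the truthiness of its two operands. Your comprehension actually iterates over the single expression a and B in b, and when a is non-empty Python evaluates B in b, which raises the NameError because B has not been assigned yet.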
What you're looking for is zip:
c = [A + B for A, B in zip(a, b)]
Items from the two iterables are assigned to A and B in turn until the shorter one is exhausted, so B is now defined.
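A quick check of the zip version, with made-up lists standing in for a and b. Note that zip stops silently at the shorter input, so mismatched lengths are truncated rather than reported (Python 3.10 added zip(..., strict=True), which raises ValueError instead).

```python
a = [1, 2, 3]
b = [10, 20, 30]

# pair up corresponding elements and add them
c = [A + B for A, B in zip(a, b)]
print(c)  # [11, 22, 33]

# zip stops at the shorter iterable: the 3 has no partner and is dropped
truncated = list(zip([1, 2, 3], [10, 20]))
print(truncated)  # [(1, 10), (2, 20)]
```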
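Replacing and with for makes the comprehension valid:
c = [A + B for A in a for B in b]
Note, however, that this nested form adds every element of a to every element of b, giving len(a) * len(b) results rather than the element-wise sum; for the latter, pair the elements with zip. You might also consider using numpy, where you can add two arrays directly and more efficiently.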
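A minimal sketch of the numpy route, with example lists assumed for a and b. It also shows that the nested for-for comprehension forms every pairwise sum, so its length is len(a) * len(b) rather than len(a).

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

# element-wise sum, no Python-level loop
c = np.asarray(a) + np.asarray(b)
print(c.tolist())  # [5, 7, 9]

# nested comprehension: every element of a added to every element of b
pairwise = [A + B for A in a for B in b]
print(len(pairwise))  # 9
```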
'for' does not work the way you want it to work.
You could use zip().
A = [1,2,3]
B = [4,5,6]
c = [ a + b for a,b in zip(A,B)]
zip iterates through A and B in parallel and produces tuples of corresponding elements.
To see what this looks like, try:
[x for x in zip(A, B)]

Remove elements from lists when the value at that index in one list is NaN

Suppose I have three lists, one of which contains NaNs (I think they're NaNs; they get printed as '--' after a previous masked array operation):
a = [1,2,3,4,5]
b = [6,7,--,9,--]
c = [6,7,8,9,10]
I'd like to perform an operation that iterates through b and deletes the entry at index i from all lists wherever b[i] is NaN. I'm thinking of something like this:
for i in range(0, len(b)):
    if b[i] == NaN:
        del a[i]  # etc.
b is generated by masking c under some condition earlier in my code, something like this:
b = np.ma.MaskedArray(c, condition)
Thanks!
This is easy to do using numpy:
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([6,7,np.nan,9,np.nan])
c = np.array([6,7,8,9,10])
where_are_nans = np.isnan(b)
filtered_array = a[~where_are_nans] #note the ~ negation
print(filtered_array)
As you can see, it prints:
[1 2 4]
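Since b in the question is a masked array, the '--' entries are masked values rather than real NaNs, so testing for NaN will not find them. A sketch that uses the mask itself; the masking condition below is made up so that the same two positions as in the question get masked.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
c = np.array([6, 7, 8, 9, 10])

# hypothetical condition: masks 8 and 10, printed as '--' like in the question
b = np.ma.MaskedArray(c, mask=(c >= 8) & (c % 2 == 0))

keep = ~np.ma.getmaskarray(b)   # True where b is NOT masked
print(a[keep].tolist())         # [1, 2, 4]
print(c[keep].tolist())         # [6, 7, 9]
print(b.compressed().tolist())  # [6, 7, 9], the unmasked values of b
```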

How to map one matrix value to another in theano function

I want to implement the following function as a Theano function:
a = numpy.array([[b_row[dictx[idx]] if idx in dictx else 0 for idx in range(len(b_row))]
                 for b_row in b])
where a and b are ndarrays and dictx is a dictionary.
I got the error: TensorType does not support iteration
Do I have to use scan, or is there a simpler way?
Thanks!
Since b is an ndarray, I'll assume every b_row has the same length.
If I understand correctly, the code reorders the columns of b according to dictx and pads the unmapped columns with zeros.
The main problem is that Theano doesn't have a dictionary-like data structure (please let me know if there is one).
Because in your example the dictionary keys and values are integers within range(len(b_row)), one way to work around this is to construct a vector that uses the indices as keys (if an index is not contained in the dictionary, set its value to -1).
The same idea should apply to mapping matrix elements in general, and there are certainly other (better) ways of doing this.
Here is the code.
Numpy:
dictx = {0:1,1:2}
b = numpy.asarray([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
a = numpy.array([[b_row[dictx[idx]] if idx in dictx else 0 for idx in range(len(b_row))] for b_row in b])
print(a)
Theano:
dictx = theano.shared(numpy.asarray([1,2,-1]))
b = tensor.matrix()
a = tensor.switch(tensor.eq(dictx, -1), tensor.zeros_like(b), b[:,dictx])
fn = theano.function([b],a)
print(fn(numpy.asarray([[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]])))
They both print:
[[2 3 0]
[5 6 0]
[8 9 0]]
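The same index-vector trick in plain NumPy can be handy for checking the Theano graph against. In this sketch the dictionary is converted to the index vector with dict.get, using -1 for unmapped columns, as in the answer.

```python
import numpy as np

dictx = {0: 1, 1: 2}
b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# idx[i] is the column of b that feeds column i of a, or -1 if i is not in dictx
idx = np.array([dictx.get(i, -1) for i in range(b.shape[1])])

# b[:, idx] gathers columns (-1 picks the last column, which is then replaced by 0)
a = np.where(idx == -1, 0, b[:, idx])
print(a.tolist())  # [[2, 3, 0], [5, 6, 0], [8, 9, 0]]
```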

Find the non-intersecting values of two arrays

If I have two numpy arrays and want to find the non-intersecting values, how do I do it?
Here's a short example of what I can't figure out.
a = ['Brian', 'Steve', 'Andrew', 'Craig']
b = ['Andrew','Steve']
I want to find the non-intersecting values. In this case I want my output to be:
['Brian','Craig']
The opposite of what I want is done with this:
c=np.intersect1d(a,b)
which returns
['Andrew' 'Steve']
You can use setxor1d. According to the documentation:
Find the set exclusive-or of two arrays.
Return the sorted, unique values that are in only one (not both) of the input arrays.
Usage is as follows:
import numpy
a = ['Brian', 'Steve', 'Andrew', 'Craig']
b = ['Andrew','Steve']
c = numpy.setxor1d(a, b)
Executing this will result in c having a value of array(['Brian', 'Craig']).
Given that none of the objects shown in your question are Numpy arrays, you don't need Numpy to achieve this:
c = list(set(a).symmetric_difference(b))
If you have to have a Numpy array as the output, it's trivial to create one (convert the set to a list first, since np.array of a set gives a 0-d object array):
c = np.array(list(set(a).symmetric_difference(b)))
(This assumes that the order in which elements appear in c does not matter. If it does, you need to state what the expected order is.)
P.S. There is also a pure Numpy solution, but personally I find it hard to read:
c = np.setdiff1d(np.union1d(a, b), np.intersect1d(a, b))
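If order does matter, one option (a sketch, not taken from the answers above) is to keep the elements of a that are missing from b, followed by the elements of b that are missing from a, each in its original order. Unlike setxor1d, this keeps duplicates. ordered_symdiff is a hypothetical helper name.

```python
import numpy as np

def ordered_symdiff(a, b):
    """Elements of a not in b, then elements of b not in a, in input order."""
    a = np.asarray(a)
    b = np.asarray(b)
    return np.concatenate([a[~np.isin(a, b)], b[~np.isin(b, a)]])

a = ['Brian', 'Steve', 'Andrew', 'Craig']
b = ['Andrew', 'Steve']
c = ordered_symdiff(a, b)
print(c.tolist())  # ['Brian', 'Craig']
```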
np.setdiff1d(a,b)
This returns the values of the first argument that are not in the second, so it is a one-sided difference and the argument order matters.
Example:
a = [1, 2, 3]
b = [1, 3]
np.setdiff1d(a, b)  # returns array([2])
np.setdiff1d(b, a)  # returns an empty array
This should do it for Python lists:
c = [x for x in a if x not in b] + [x for x in b if x not in a]
It first collects all the elements from a that are not in b and then adds all those elements from b that are not in a. This way you get all elements that are in a or b, but not in both.
import numpy as np
a = np.array(['Brian', 'Steve', 'Andrew', 'Craig'])
b = np.array(['Andrew','Steve'])
you can use
set(a) - set(b)
Output:
set(['Brian', 'Craig'])
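Note: set operations return unique values and do not preserve order. Also, set(a) - set(b) only gives the values of a that are missing from b; set(a) ^ set(b) gives the symmetric difference.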

large array searching with numpy

I have two arrays of integers
a = numpy.array([1109830922873, 2838383, 839839393, ..., 29839933982])
b = numpy.array([2838383, 555555555, 2839474582, ..., 29839933982])
where len(a) ~ 15,000 and len(b) ~ 2 million.
What I want is to find the indices of the elements of array b that match those in array a. Currently I'm using a list comprehension and numpy.argwhere() to achieve this:
bInds = [ numpy.argwhere(b == c)[0] for c in a ]
However, this obviously takes a long time to complete, and array a will grow larger too, so this is not a sensible route to take.
Is there a better way to achieve this result, considering the large arrays I'm dealing with? It currently takes about 5 minutes. Any speed-up would help!
More info: I want the indices to match the order of array a too. (Thanks Charles)
Unless I'm mistaken, your approach searches the entire array b again for each element of a.
Alternatively, you could create a dictionary mapping the individual elements of b to their indices:
indices = {}
for i, e in enumerate(b):
    indices[e] = i  # if elements in b are unique
    # otherwise, collect every position in a list instead:
    # indices.setdefault(e, []).append(i)
Then you can use this mapping to quickly find where the elements of a occur in b (a KeyError is raised for any element of a that is not in b):
bInds = [indices[c] for c in a]
This takes about a second to run:
import numpy
#make some fake data...
a = (numpy.random.random(15000) * 2**16).astype(int)
b = (numpy.random.random(2000000) * 2**16).astype(int)
#find indices of b whose values are contained in a.
set_a = set(a)
result = set()
for i, val in enumerate(b):
    if val in set_a:
        result.add(i)
result = numpy.array(list(result))
result.sort()
print(result)
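A fully vectorized alternative (a swapped-in technique, not either answer above) sorts b once and binary-searches it with np.searchsorted, returning one index per element of a, in a's order. This sketch assumes b is non-empty; match_indices is a hypothetical name, and -1 marks values of a that do not occur in b. With a stable argsort, duplicated values in b resolve to their first occurrence.

```python
import numpy as np

def match_indices(a, b):
    """For each element of a, return the index of its first occurrence in b, or -1."""
    sorter = np.argsort(b, kind="stable")       # b[sorter] is sorted; ties keep original order
    pos = np.searchsorted(b, a, sorter=sorter)  # insertion points in the sorted view
    pos = np.clip(pos, 0, len(b) - 1)           # values larger than max(b) would be out of range
    idx = sorter[pos]                           # map back to positions in the original b
    return np.where(b[idx] == a, idx, -1)       # -1 where the value is absent from b

a = np.array([29839933982, 2838383, 42])
b = np.array([2838383, 555555555, 2839474582, 29839933982])
print(match_indices(a, b).tolist())  # [3, 0, -1]
```

Sorting b costs O(n log n) once, and each lookup is O(log n), so b is no longer rescanned for every element of a.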
