If I have two numpy arrays and want to find the non-intersecting values, how do I do it?
Here's a short example of what I can't figure out.
a = ['Brian', 'Steve', 'Andrew', 'Craig']
b = ['Andrew','Steve']
I want to find the non-intersecting values. In this case I want my output to be:
['Brian','Craig']
The opposite of what I want is done with this:
c=np.intersect1d(a,b)
which returns
['Andrew' 'Steve']
You can use setxor1d. According to the documentation:
Find the set exclusive-or of two arrays.
Return the sorted, unique values that are in only one (not both) of the input arrays.
Usage is as follows:
import numpy
a = ['Brian', 'Steve', 'Andrew', 'Craig']
b = ['Andrew','Steve']
c = numpy.setxor1d(a, b)
Executing this will result in c having a value of array(['Brian', 'Craig']).
Given that none of the objects shown in your question are Numpy arrays, you don't need Numpy to achieve this:
c = list(set(a).symmetric_difference(b))
If you have to have a Numpy array as the output, it's trivial to create one:
c = np.array(list(set(a).symmetric_difference(b)))  # list() first; np.array() on a set makes a 0-d object array
(This assumes that the order in which elements appear in c does not matter. If it does, you need to state what the expected order is.)
P.S. There is also a pure Numpy solution, but personally I find it hard to read:
c = np.setdiff1d(np.union1d(a, b), np.intersect1d(a, b))
np.setdiff1d(a,b)
This returns the values in the first argument that are not present in the second argument. Note that it is one-directional; see the sketch after the example for combining both directions.
Example:
a = [1,2,3]
b = [1,3]
np.setdiff1d(a,b) -> returns [2]
np.setdiff1d(b,a) -> returns []
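Since setdiff1d is one-directional, getting the symmetric difference the question asks for takes both directions; a minimal sketch:
import numpy as np

a = ['Brian', 'Steve', 'Andrew', 'Craig']
b = ['Andrew', 'Steve']

# union of the two one-directional differences gives the symmetric difference
c = np.union1d(np.setdiff1d(a, b), np.setdiff1d(b, a))
# c -> ['Brian' 'Craig']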
This should do it for plain Python lists:
c=[x for x in a if x not in b]+[x for x in b if x not in a]
It first collects all the elements from a that are not in b and then adds all those elements from b that are not in a. This way you get all elements that are in a or b, but not in both.
import numpy as np
a = np.array(['Brian', 'Steve', 'Andrew', 'Craig'])
b = np.array(['Andrew','Steve'])
You can use:
set(a) - set(b)
Output:
set(['Brian', 'Craig'])
Note: set operations return only unique values, and set(a) - set(b) is one-directional (elements of a not in b); see the sketch below for the symmetric difference.
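As with setdiff1d above, a single set difference misses values that appear only in b; Python's symmetric-difference operator covers both directions:
set(a) ^ set(b)   # elements in exactly one of the two sets
# -> set(['Brian', 'Craig']) here, since b adds no new names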
Suppose I have three lists, one of which contains NaNs (I think they're NaNs; they get printed as '--' by a previous masked-array operation):
a = [1,2,3,4,5]
b = [6,7,--,9,--]
c = [6,7,8,9,10]
I'd like to perform an operation that iterates through b and deletes the indices from all lists where b[i] is NaN. I'm thinking something like this:
for i in range(0, len(b)):
    if b[i] == NaN:
        del a[i]  # etc.
b is generated by masking c under some condition earlier on in my code, something like this:
b = np.ma.MaskedArray(c, condition)
Thanks!
This is easy to do using numpy:
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([6,7,np.NaN,9,np.NaN])
c = np.array([6,7,8,9,10])
where_are_nans = np.isnan(b)
filtered_array = a[~where_are_nans] #note the ~ negation
print(filtered_array)
As you can easily see, it returns:
[1 2 4]
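Since the question asks to drop these positions from all the lists, the same boolean mask can be applied to each array so they stay aligned (reusing where_are_nans from above):
a_clean = a[~where_are_nans]   # [1 2 4]
b_clean = b[~where_are_nans]   # [ 6.  7.  9.]
c_clean = c[~where_are_nans]   # [ 6  7  9]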
I want to implement the following function as a Theano function:
a = numpy.array([[b_row[dictx[idx]] if idx in dictx else 0 for idx in range(len(b_row))]
                 for b_row in b])
where a and b are ndarrays, and dictx is a dictionary.
I got the error TensorType does not support iteration
Do I have to use scan? or is there any simpler way?
Thanks!
Since b is of type ndarray, I'll assume every b_row has the same length.
If I understood correctly the code swaps the order of columns in b according to dictx, and pads the non-specified columns with zeros.
The main problem is Theano doesn't have a dictionary-like data structure (please let me know if there's one).
Because in your example the dictionary keys and values are integers within range(len(b_row)), one way to work around this is to construct a vector that uses indices as keys (if some index should not be contained in the dictionary, make its value -1).
The same idea should apply for mapping elements of a matrix in general, and there're certainly other (better) ways of doing this.
Here is the code.
Numpy:
import numpy

dictx = {0:1, 1:2}
b = numpy.asarray([[1,2,3],
[4,5,6],
[7,8,9]])
a = numpy.array([[b_row[dictx[idx]] if idx in dictx else 0 for idx in range(len(b_row))] for b_row in b])
print a
Theano:
import theano
from theano import tensor

dictx = theano.shared(numpy.asarray([1, 2, -1]))
b = tensor.matrix()
a = tensor.switch(tensor.eq(dictx, -1), tensor.zeros_like(b), b[:,dictx])
fn = theano.function([b],a)
print fn(numpy.asarray([[1,2,3],
[4,5,6],
[7,8,9]]))
They both print:
[[2 3 0]
[5 6 0]
[8 9 0]]
Say I have 3 integer arrays: {1,2,3}, {2,3}, {1}
I must take exactly one element from each array, to form a new array where all numbers are unique. In this example, the correct answers are: {2,3,1} and {3,2,1}. (Since I must take one element from the 3rd array, and I want all numbers to be unique, I must never take the number 1 from the first array.)
What I have done:
for a in array1:
    for b in array2:
        for c in array3:
            if a != b and a != c and b != c:
                AddAnswer(a,b,c)
This is brute force, which works, but it doesn't scale well. What if we are now dealing with 20 arrays instead of just 3? I don't think writing 20 nested for-loops is a good idea. Is there a clever way to do this?
What about:
import itertools
arrs = [[1,2,3], [2,3], [1]]
for x in itertools.product(*arrs):
    if len(set(x)) < len(arrs):
        continue
    AddAnswer(x)
AddAnswer(x) is called twice, with the tuples:
(2, 3, 1)
(3, 2, 1)
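The same idea wrapped in a generator for an arbitrary number of arrays (a sketch; AddAnswer is the asker's function, and distinct_selections is just a name I made up):
import itertools

def distinct_selections(arrs):
    # yield each way of picking one element per array with all picks distinct
    for x in itertools.product(*arrs):
        if len(set(x)) == len(arrs):
            yield x

for answer in distinct_selections([[1,2,3], [2,3], [1]]):
    AddAnswer(*answer)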
You can think of this as finding a matching in a bipartite graph.
You are trying to select one element from each set, but are not allowed to select the same element twice, so you are trying to match sets to numbers.
You can use the matching function in the graph library NetworkX to do this efficiently.
Python example code:
import networkx as nx

A = [[1,2,3], [2,3], [1]]

# collect every distinct number appearing in the sets
numbers = set()
for s in A:
    for n in s:
        numbers.add(n)

B = nx.Graph()
for n in numbers:
    B.add_node(n, bipartite=1)         # integer nodes on one side
for i, s in enumerate(A):
    set_name = 's%d' % i
    B.add_node(set_name, bipartite=0)  # one string node per input set
    for n in s:
        B.add_edge(set_name, n)

matching = nx.maximal_matching(B)

if len(matching) != len(A):
    print 'No complete matching'
else:
    for u, v in matching:
        # edge endpoints come back in arbitrary order
        set_name, number = (u, v) if isinstance(u, str) else (v, u)
        print 'choose', number, 'from', set_name
This is a simple, efficient way to find a single matching. One caveat: nx.maximal_matching is greedy and returns a maximal matching, which is not necessarily a maximum one, so it can report 'No complete matching' even when one exists; a guaranteed maximum matching is sketched below.
If you want to enumerate all matchings, you may wish to read 'Algorithms for Enumerating All Perfect, Maximum and Maximal Matchings in Bipartite Graphs' by Takeaki Uno, which gives O(V) complexity per matching.
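If a guaranteed maximum matching is needed (see the caveat above), NetworkX's bipartite module has a Hopcroft-Karp implementation; a sketch, assuming B and A are built as in the code above:
from networkx.algorithms import bipartite

set_nodes = ['s%d' % i for i in range(len(A))]
# top_nodes pins down one side of the bipartition explicitly
match = bipartite.maximum_matching(B, top_nodes=set_nodes)

# the returned dict maps matched nodes on both sides to their partners
for set_name in set_nodes:
    if set_name in match:
        print 'choose', match[set_name], 'from', set_name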
A recursive solution (not tested):
def max_sets(list_of_sets, excluded=[]):
    if not list_of_sets:
        return [set()]
    else:
        res = []
        for x in list_of_sets[0]:
            if x not in excluded:
                for candidate in max_sets(list_of_sets[1:], excluded + [x]):
                    candidate.add(x)
                    res.append(candidate)
        return res
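A quick check of the fixed version on the arrays from the question (echoing the answer's own "not tested" hedge; note that because the results are sets, the two selections (2,3,1) and (3,2,1) come back as identical-looking sets):
max_sets([[1,2,3], [2,3], [1]])
# -> [set([1, 2, 3]), set([1, 2, 3])]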
(You could probably dispense with the set but it's not clear if it was in the question or not...)
I have two arrays
A=np.array([[2,0],
[3,4],
[5,6]])
and
B=np.array([[4,3],
[6,7],
[3,4],
[2,0]])
I want to essentially subtract B from A and obtain indices of elements in A which are present in B. How do I accomplish this? In this example, I need answers like:
C = [0,1]            # index in A of elements repeated in B
D = [[2,0], [3,4]]   # their value
E = [3,2]            # index location of these in B
Several of the usual commands like nonzero, remove, filter, etc. seem unusable for N-D arrays. Can someone please help me out?
You can define a data type that will be the concatenation of your columns, allowing you to use the 1d set operations:
a = np.ascontiguousarray(A).view(np.dtype((np.void, A.shape[1]*min(A.strides))))
b = np.ascontiguousarray(B).view(np.dtype((np.void, B.shape[1]*min(B.strides))))
check = np.in1d(a, b)
C = np.where(check)[0]
D = A[check]
check = np.in1d(b, a)
E = np.where(check)[0]
If you wanted only D, for example, you could have done:
D = np.intersect1d(a, b).view(A.dtype).reshape(-1, A.shape[1])
Note in the last example how the original dtype can be recovered.
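For the arrays in the question, a quick run-through of the code above (a sketch; note E comes out in B's row order rather than matched to D's order, so it is [2 3] rather than the [3,2] written in the question):
print C   # [0 1]
print D   # [[2 0]
          #  [3 4]]
print E   # [2 3]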
I have a two arrays of integers
a = numpy.array([1109830922873, 2838383, 839839393, ..., 29839933982])
b = numpy.array([2838383, 555555555, 2839474582, ..., 29839933982])
where len(a) ~ 15,000 and len(b) ~ 2 million.
What I want is to find the indices of array b elements which match those in array a. Now, I'm using list comprehension and numpy.argwhere() to achieve this:
bInds = [ numpy.argwhere(b == c)[0] for c in a ]
however, obviously, it is taking a long time to complete this. And array a will become larger too, so this is not a sensible route to take.
Is there a better way to achieve this result, considering the large arrays I'm dealing with here? It currently takes around ~5 minutes to do this. Any speed up is needed!
More info: I want the indices to match the order of array a too. (Thanks Charles)
Unless I'm mistaken, your approach searches the entire array b for each element of a again and again.
Alternatively, you could create a dictionary mapping the individual elements from b to their indices.
indices = {}
for i, e in enumerate(b):
    indices[e] = i  # if elements in b are unique
    # indices.setdefault(e, []).append(i)  # otherwise, use lists to collect every index
Then you can use this mapping for quickly finding the indices where elements from a can be found in b.
bInds = [ indices[c] for c in a ]
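One caveat: if some element of a never appears in b, indices[c] raises a KeyError. dict.get sidesteps that (a small sketch):
bInds = [ indices.get(c) for c in a ]  # None for elements of a absent from b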
This takes about a second to run.
import numpy
#make some fake data...
a = (numpy.random.random(15000) * 2**16).astype(int)
b = (numpy.random.random(2000000) * 2**16).astype(int)
#find indices of b that are contained in a.
set_a = set(a)
result = set()
for i, val in enumerate(b):
    if val in set_a:
        result.add(i)
result = numpy.array(list(result))
result.sort()
print result
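For comparison, a vectorized sketch using numpy.in1d, which computes the same membership mask without the explicit Python loop and should be considerably faster on arrays this size:
mask = numpy.in1d(b, a)            # True where b's value also appears in a
result = numpy.nonzero(mask)[0]    # indices into b, already in sorted order
print result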