I have two arrays
A=np.array([[2,0],
[3,4],
[5,6]])
and
B=np.array([[4,3],
[6,7],
[3,4],
[2,0]])
I want to essentially subtract B from A and obtain indices of elements in A which are present in B. How do I accomplish this? In this example, I need answers like:
C = [0, 1]            # indices in A of rows also present in B
D = [[2, 0], [3, 4]]  # their values
E = [3, 2]            # indices of those rows in B
The usual commands like nonzero, remove, filter, etc. don't seem to work for N-D arrays. Can someone please help me out?
You can define a data type that is the concatenation of your columns, which lets you use the 1-D set operations:
# view each row as a single opaque (void) item so the 1-D set routines apply
a = np.ascontiguousarray(A).view(np.dtype((np.void, A.dtype.itemsize * A.shape[1])))
b = np.ascontiguousarray(B).view(np.dtype((np.void, B.dtype.itemsize * B.shape[1])))
check = np.in1d(a, b)   # True for rows of A that also appear in B
C = np.where(check)[0]  # their indices in A
D = A[check]            # their values
check = np.in1d(b, a)
E = np.where(check)[0]  # indices in B of rows that also appear in A
If you wanted only D, for example, you could have done:
D = np.intersect1d(a, b).view(A.dtype).reshape(-1, A.shape[1])
Note how, in the last example, the original dtype can be recovered.
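For small arrays, a more direct (if more memory-hungry) sketch is to compare rows via broadcasting; this builds a len(A) x len(B) boolean matrix, so it assumes both arrays fit comfortably in memory and that each row of A matches at most one row of B:

import numpy as np

A = np.array([[2, 0], [3, 4], [5, 6]])
B = np.array([[4, 3], [6, 7], [3, 4], [2, 0]])

# compare every row of A against every row of B: shape (len(A), len(B))
matches = (A[:, None, :] == B[None, :, :]).all(axis=-1)

C, E = np.where(matches)  # C: row indices in A, E: matching row indices in B
D = A[C]                  # the matching rows themselves
print(C, E)               # [0 1] [3 2]
print(D)                  # rows [2 0] and [3 4]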
I am trying to generate an array that is the sum of two previous arrays, e.g.
c = [A + B for A in a and B in b]
Here, I get the error message
NameError: name 'B' is not defined
where
len(a) = len(b) = len(c)
Please can you let me know what I am doing wrong. Thanks.
The boolean and operator does not pair two iterables together; it evaluates the truthiness (or falsiness) of its two operands.
What you're looking for is zip:
c = [A + B for A, B in zip(a, b)]
Items from the two iterables are successively assigned to A and B until one of the two is exhausted. B is now defined!
It should be
c = [A + B for A in a for B in b]
i.e. for instead of and. Note, though, that this nested form produces every pairwise sum (len(a) * len(b) items) rather than an elementwise sum. You might want to consider using numpy, where you can add two arrays directly, which is also more efficient.
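If the goal really is elementwise addition (so that len(c) == len(a) == len(b)), a minimal numpy sketch would be:

import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

c = (np.array(a) + np.array(b)).tolist()  # elementwise sum: [5, 7, 9]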
'for' does not work the way you want it to work.
You could use zip().
A = [1,2,3]
B = [4,5,6]
c = [ a + b for a,b in zip(A,B)]
zip iterates through A & B and produces tuples.
To see what this looks like try:
[ x for x in zip(A,B)]
Suppose I have three lists, one of which contains NaNs (I think they're NaNs; they get printed as '--' by a previous masked-array operation):
a = [1,2,3,4,5]
b = [6,7,--,9,--]
c = [6,7,8,9,10]
I'd like to perform an operation that iterates through b and, wherever b[i] is NaN, deletes the entry at that index from all three lists. I'm thinking something like this:
for i in range(0, len(b)):
    if b[i] == NaN:
        del a[i]  # etc.
b is generated by masking c under some condition earlier in my code, something like this:
b = np.ma.MaskedArray(c, condition)
Thanks!
This is easy to do using numpy:
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([6, 7, np.nan, 9, np.nan])
c = np.array([6,7,8,9,10])
where_are_nans = np.isnan(b)
filtered_array = a[~where_are_nans] #note the ~ negation
print(filtered_array)
As you can see, it returns:
[1 2 4]
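Since the question says b actually comes from np.ma.MaskedArray, here is a sketch that filters all three arrays by the mask itself rather than by NaN; the condition array below is just an illustrative mask matching the example:

import numpy as np

a = np.array([1, 2, 3, 4, 5])
c = np.array([6, 7, 8, 9, 10])
condition = np.array([False, False, True, False, True])  # hypothetical mask
b = np.ma.MaskedArray(c, condition)

keep = ~np.ma.getmaskarray(b)         # True where b is NOT masked
a_f, b_f, c_f = a[keep], b.compressed(), c[keep]
print(a_f, b_f, c_f)                  # [1 2 4] [6 7 9] [6 7 9]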
I want to implement the following function as a Theano function:
a = numpy.array([[b_row[dictx[idx]] if idx in dictx else 0 for idx in range(len(b_row))]
                 for b_row in b])
where a and b are ndarrays and dictx is a dictionary.
I got the error TensorType does not support iteration
Do I have to use scan? or is there any simpler way?
Thanks!
Since b is of type ndarray, I'll assume every b_row has the same length.
If I understood correctly, the code reorders the columns of b according to dictx and fills the unspecified columns with zeros.
The main problem is Theano doesn't have a dictionary-like data structure (please let me know if there's one).
Because in your example the dictionary keys and values are integers within range(len(b_row)), one way to work around this is to construct a vector that uses indices as keys (if some index should not be contained in the dictionary, make its value -1).
The same idea should apply for mapping elements of a matrix in general, and there're certainly other (better) ways of doing this.
Here is the code.
Numpy:
import numpy

dictx = {0: 1, 1: 2}
b = numpy.asarray([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
a = numpy.array([[b_row[dictx[idx]] if idx in dictx else 0 for idx in range(len(b_row))]
                 for b_row in b])
print a
Theano:
import numpy
import theano
from theano import tensor

dictx = theano.shared(numpy.asarray([1, 2, -1]))  # index vector; -1 marks "key not in dict"
b = tensor.matrix()
a = tensor.switch(tensor.eq(dictx, -1), tensor.zeros_like(b), b[:, dictx])
fn = theano.function([b], a)
print fn(numpy.asarray([[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]]))
They both print:
[[2 3 0]
[5 6 0]
[8 9 0]]
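For comparison, the same index-vector idea expressed in plain numpy (no Theano), where -1 again marks a key that is not in the dictionary:

import numpy as np

b = np.asarray([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
idx = np.asarray([1, 2, -1])           # dictx = {0: 1, 1: 2} as an index vector

a = np.where(idx == -1, 0, b[:, idx])  # zero out columns whose key is missing
print(a)                               # [[2 3 0] [5 6 0] [8 9 0]]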
If I have two numpy arrays and want to find the non-intersecting values, how do I do it?
Here's a short example of what I can't figure out.
a = ['Brian', 'Steve', 'Andrew', 'Craig']
b = ['Andrew','Steve']
I want to find the non-intersecting values. In this case I want my output to be:
['Brian','Craig']
The opposite of what I want is done with this:
c=np.intersect1d(a,b)
which returns
['Andrew' 'Steve']
You can use setxor1d. According to the documentation:
Find the set exclusive-or of two arrays.
Return the sorted, unique values that are in only one (not both) of the input arrays.
Usage is as follows:
import numpy
a = ['Brian', 'Steve', 'Andrew', 'Craig']
b = ['Andrew','Steve']
c = numpy.setxor1d(a, b)
Executing this will result in c having a value of array(['Brian', 'Craig']).
Given that none of the objects shown in your question are Numpy arrays, you don't need Numpy to achieve this:
c = list(set(a).symmetric_difference(b))
If you have to have a Numpy array as the output, it's trivial to create one; convert the set to a list first, since calling np.array on a set produces a 0-d object array:
c = np.array(list(set(a).symmetric_difference(b)))
(This assumes that the order in which elements appear in c does not matter. If it does, you need to state what the expected order is.)
P.S. There is also a pure Numpy solution, but personally I find it hard to read:
c = np.setdiff1d(np.union1d(a, b), np.intersect1d(a, b))
np.setdiff1d(a,b)
This returns the values of the first argument that are not present in the second argument.
Example:
a = [1,2,3]
b = [1,3]
np.setdiff1d(a,b) -> returns [2]
np.setdiff1d(b,a) -> returns []
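If you need the values unique to either array (as in the original example above), a quick way is to combine both one-way differences, which for already-unique inputs gives the same elements as setxor1d:

import numpy as np

a = ['Brian', 'Steve', 'Andrew', 'Craig']
b = ['Andrew', 'Steve']

c = np.concatenate([np.setdiff1d(a, b), np.setdiff1d(b, a)])
print(c)  # ['Brian' 'Craig']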
This should do it for plain Python lists:
c=[x for x in a if x not in b]+[x for x in b if x not in a]
It first collects all the elements from a that are not in b and then adds all those elements from b that are not in a. This way you get all elements that are in a or b, but not in both.
import numpy as np
a = np.array(['Brian', 'Steve', 'Andrew', 'Craig'])
b = np.array(['Andrew','Steve'])
You can use
set(a) - set(b)
Output:
set(['Brian', 'Craig'])
Note: set operations return only unique values.
I have two arrays of integers
a = numpy.array([1109830922873, 2838383, 839839393, ..., 29839933982])
b = numpy.array([2838383, 555555555, 2839474582, ..., 29839933982])
where len(a) ~ 15,000 and len(b) ~ 2 million.
What I want is to find the indices of the elements of array b that match those in array a. Currently I'm using a list comprehension and numpy.argwhere() to achieve this:
bInds = [ numpy.argwhere(b == c)[0] for c in a ]
however, obviously, it is taking a long time to complete this. And array a will become larger too, so this is not a sensible route to take.
Is there a better way to achieve this result, considering the large arrays I'm dealing with here? It currently takes around ~5 minutes to do this. Any speed up is needed!
More info: I want the indices to match the order of array a too. (Thanks Charles)
Unless I'm mistaken, your approach searches the entire array b for each element of a again and again.
Alternatively, you could create a dictionary mapping the individual elements from b to their indices.
indices = {}
for i, e in enumerate(b):
    indices[e] = i                        # if elements in b are unique
    # indices.setdefault(e, []).append(i) # otherwise, collect lists of indices
Then you can use this mapping to quickly find, for each element of a, the index at which it occurs in b.
bInds = [ indices[c] for c in a ]
This takes about a second to run.
import numpy
#make some fake data...
a = (numpy.random.random(15000) * 2**16).astype(int)
b = (numpy.random.random(2000000) * 2**16).astype(int)
# find indices of b whose values are contained in a
set_a = set(a)
result = set()
for i, val in enumerate(b):
    if val in set_a:
        result.add(i)
result = numpy.array(list(result))
result.sort()
print result
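If the indices have to come back in the order of a (as asked in the question), here is a vectorized sketch based on argsort/searchsorted; it assumes you only need one matching index per value and first drops the values of a that never occur in b:

import numpy

a = (numpy.random.random(15000) * 2**16).astype(int)
b = (numpy.random.random(2000000) * 2**16).astype(int)

a = a[numpy.in1d(a, b)]        # keep only the values that do occur in b
sorter = numpy.argsort(b)
# index of one occurrence of each element of a within b, in the order of a
bInds = sorter[numpy.searchsorted(b, a, sorter=sorter)]
assert (b[bInds] == a).all()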