How to map one matrix value to another in theano function - python

I want to implement the following computation as a Theano function:
a=numpy.array([ [b_row[dictx[idx]] if idx in dictx else 0 for idx in range(len(b_row))]
for b_row in b])
where a and b are ndarrays, and dictx is a dictionary.
I got the error "TensorType does not support iteration".
Do I have to use scan, or is there a simpler way?
Thanks!

Since b is of type ndarray, I'll assume every b_row has the same length.
If I understood correctly, the code reorders the columns of b according to dictx and fills the unspecified columns with zeros.
The main problem is that Theano doesn't have a dictionary-like data structure (please let me know if there is one).
Because in your example the dictionary keys and values are integers within range(len(b_row)), one way to work around this is to construct a vector whose indices act as the keys (if some index is not a key in the dictionary, set its value to -1).
The same idea should apply to mapping elements of a matrix in general, and there are certainly other (better) ways of doing this.
Here is the code.
Numpy:
import numpy
dictx = {0:1,1:2}
b = numpy.asarray([[1,2,3],
                   [4,5,6],
                   [7,8,9]])
a = numpy.array([[b_row[dictx[idx]] if idx in dictx else 0 for idx in range(len(b_row))] for b_row in b])
print(a)
Theano:
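import numpy
import theano
from theano import tensor
# index vector standing in for the dictionary; -1 marks columns with no mapping
dictx = theano.shared(numpy.asarray([1,2,-1]))
b = tensor.matrix()
a = tensor.switch(tensor.eq(dictx, -1), tensor.zeros_like(b), b[:,dictx])
fn = theano.function([b],a)
print(fn(numpy.asarray([[1,2,3],
                        [4,5,6],
                        [7,8,9]])))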
They both print:
[[2 3 0]
[5 6 0]
[8 9 0]]
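As a minimal sketch of the index-vector idea in plain NumPy (not Theano), the dictionary can be converted into the vector automatically instead of being written by hand. The names index and n_cols are introduced here for illustration only, and the sketch assumes all keys and values of dictx are valid column positions of b:

```python
import numpy as np

dictx = {0: 1, 1: 2}
b = np.asarray([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

n_cols = b.shape[1]
# index[k] holds dictx[k]; -1 marks keys that are absent from the dictionary
index = np.full(n_cols, -1)
for key, value in dictx.items():
    index[key] = value

# b[:, index] gathers the mapped columns (the -1 entries pick the last column),
# and np.where overwrites those placeholder columns with zeros
a = np.where(index == -1, 0, b[:, index])
print(a)
```

The resulting index vector is the same [1, 2, -1] that is passed to theano.shared in the answer.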

Related

Searching index position in python

cols = [2,4,6,8,10,12,14,16,18] # selected the columns i want to work with
df = pd.read_csv('mywork.csv')
df1 = df.iloc[:, cols]
b= np.array(df1)
b
Output:
b = [['WV5 6NY' 'RE4 9VU' 'BU4 N90' 'TU3 5RE' 'NE5 4F']
 ['SA8 7TA' 'BA31 0PO' 'DE3 2FP' 'LR98 4TS' 0]
 ['MN0 4NU' 'RF5 5FG' 'WA3 0MN' 'EA15 8RE' 'BE1 4RE']
 ['SB7 0ET' 'SA7 0SB' 'BT7 6NS' 'TA9 0LP' 'BA3 1OE']]
a = np.concatenate(b) #concatenated to get a single array, this worked well
a = np.array([x for x in a if x != 'nan'])
a = a[np.where(a != '0')] #removed the nan
print(np.sort(a)) # to sort alphabetically
#Sorted array
['BA3 1OE' 'BA31 0PO' 'BE1 4RE' 'BT7 6NS' 'BU4 N90'
 'DE3 2FP' 'EA15 8RE' 'LR98 4TS' 'MN0 4NU' 'NE5 4F' 'RE4 9VU'
 'RF5 5FG' 'SA7 0SB' 'SA8 7TA' 'SB7 0ET' 'TA9 0LP' 'TU3 5RE'
 'WA3 0MN' 'WV5 6NY']
#Find the index position of all elements of b in a(sorted array)
def findall_index(b, a )
    result = []
    for i in range(len(a)):
        for j in range(len(a[i])):
            if b[i][j] == a:
                result.append((i, j))
    return result
print(findall_index(0,result))
I am still very new to Python. I tried to find the index positions of all elements of b in a (the sorted array) above, but the code block above doesn't give me any result. Can someone please help me?
Thank you in advance.
One way to approach this is to zip (pair up) the indices of the elements in b with the elements themselves, and then sort these pairs by element only. This gives you a mapping between positions in the original array and positions in the sorted array: loop over the sorted pairs, and each pair's position in the sorted order corresponds to the original index stored in the pair.
I highly suggest you code this yourself, since it will help you learn!
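For reference, the pairing-and-sorting approach described above could be sketched roughly as follows. This is only one possible reading of the answer; the small 2x2 array and the names pairs and mapping are made up for illustration, and the filtering of 0/nan entries from the question is left out:

```python
import numpy as np

# A small stand-in for the array b from the question
b = np.array([["WV5 6NY", "RE4 9VU"],
              ["SA8 7TA", "BA31 0PO"]])

flat = b.ravel()
# Pair each original flat index with its element, then sort by element only
pairs = sorted(enumerate(flat), key=lambda pair: pair[1])

# Map each position in the sorted order back to the (row, column) in b
mapping = {
    sorted_pos: divmod(orig_index, b.shape[1])
    for sorted_pos, (orig_index, _) in enumerate(pairs)
}

for sorted_pos, (orig_index, value) in enumerate(pairs):
    print(sorted_pos, value, mapping[sorted_pos])
```

Here divmod converts a flat index into a (row, column) pair, which works because ravel walks the array row by row.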

Remove elements from lists when the index in one list is NaN

Suppose I have three lists where one contains NaN's (I think they're 'NaNs', they get printed as '--' from a previous masked array operation):
a = [1,2,3,4,5]
b = [6,7,--,9,--]
c = [6,7,8,9,10]
I'd like to perform an operation that iterates through b, and deletes the indexes from all lists where b[i]=NaN. I'm thinking something like this:
for i in range(0,len(b):
if b[i] = NaN:
del.a[i] etc
b is generated by masking c under some condition earlier in my code, something like this:
b = np.ma.MaskedArray(c, condition)
Thanks!
This is easy to do using numpy:
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([6,7,np.nan,9,np.nan])
c = np.array([6,7,8,9,10])
where_are_nans = np.isnan(b)
filtered_array = a[~where_are_nans] #note the ~ negation
print(filtered_array)
And as you can easily see it returns:
[1 2 4]
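Since the question says b comes from np.ma.MaskedArray, the '--' entries are masked values rather than NaNs, so the mask itself can be used for filtering. The following is a sketch under that assumption; the masking condition below is hypothetical and only stands in for the asker's unspecified condition:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
c = np.array([6, 7, 8, 9, 10])
# Hypothetical condition: mask the entries of c equal to 8 or 10
b = np.ma.MaskedArray(c, mask=(c == 8) | (c == 10))

# True where b is NOT masked; getmaskarray always returns a full boolean array
keep = ~np.ma.getmaskarray(b)

a_kept = a[keep]
b_kept = b.compressed()  # unmasked values of b as a plain ndarray
c_kept = c[keep]
print(a_kept, b_kept, c_kept)
```

The same boolean array keep is applied to every list, so the three results stay aligned.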

'Remove' command for ND arrays in Python

I have two arrays
A=np.array([[2,0],
[3,4],
[5,6]])
and
B=np.array([[4,3],
[6,7],
[3,4],
[2,0]])
I want to essentially subtract B from A and obtain indices of elements in A which are present in B. How do I accomplish this? In this example, I need answers like:
C=[0,1] //index in A of elements repeated in B
D=[[2,0], [3,4]] //their value
E=[3,2] //index location of these in B
Several usual commands like nonzero, remove, filter, etc seem unusable for ND arrays. Can someone please help me out?
You can define a data type that will be the concatenation of your columns, allowing you to use the 1d set operations:
a = np.ascontiguousarray(A).view(np.dtype((np.void, A.shape[1]*min(A.strides))))
b = np.ascontiguousarray(B).view(np.dtype((np.void, B.shape[1]*min(B.strides))))
check = np.in1d(a, b)
C = np.where(check)[0]
D = A[check]
check = np.in1d(b, a)
E = np.where(check)[0]
If you wanted only D, for example, you could have done:
D = np.intersect1d(a, b).view(A.dtype).reshape(-1, A.shape[1])
Note in the last example how the original dtype can be recovered.
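As an alternative sketch (a different technique from the void-view approach above), the rows can be compared directly with broadcasting. This builds a len(A) x len(B) boolean matrix, so it is only reasonable for small arrays, and it assumes each row of A appears at most once in B; the name matches is introduced for illustration:

```python
import numpy as np

A = np.array([[2, 0],
              [3, 4],
              [5, 6]])
B = np.array([[4, 3],
              [6, 7],
              [3, 4],
              [2, 0]])

# matches[i, j] is True when row i of A equals row j of B
matches = (A[:, None, :] == B[None, :, :]).all(axis=2)

# Row indices in A and the corresponding row indices in B
C, E = np.nonzero(matches)
D = A[C]
print(C, D, E)
```

With this approach E is reported in the order of the matching rows of A, which is the ordering used in the question.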

Find the non-intersecting values of two arrays

If I have two numpy arrays and want to find the non-intersecting values, how do I do it?
Here's a short example of what I can't figure out.
a = ['Brian', 'Steve', 'Andrew', 'Craig']
b = ['Andrew','Steve']
I want to find the non-intersecting values. In this case I want my output to be:
['Brian','Craig']
The opposite of what I want is done with this:
c=np.intersect1d(a,b)
which returns
['Andrew' 'Steve']
You can use setxor1d. According to the documentation:
Find the set exclusive-or of two arrays.
Return the sorted, unique values that are in only one (not both) of the input arrays.
Usage is as follows:
import numpy
a = ['Brian', 'Steve', 'Andrew', 'Craig']
b = ['Andrew','Steve']
c = numpy.setxor1d(a, b)
Executing this will result in c having a value of array(['Brian', 'Craig']).
Given that none of the objects shown in your question are Numpy arrays, you don't need Numpy to achieve this:
c = list(set(a).symmetric_difference(b))
If you have to have a Numpy array as the output, it's trivial to create one:
c = np.array(list(set(a).symmetric_difference(b)))
(This assumes that the order in which elements appear in c does not matter. If it does, you need to state what the expected order is.)
P.S. There is also a pure Numpy solution, but personally I find it hard to read:
c = np.setdiff1d(np.union1d(a, b), np.intersect1d(a, b))
np.setdiff1d(a,b)
This returns the values of the first argument that are not present in the second argument.
Example:
a = [1,2,3]
b = [1,3]
np.setdiff1d(a,b) -> returns [2]
np.setdiff1d(b,a) -> returns []
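Because np.setdiff1d is one-sided, it only matches the symmetric result when the second array is a subset of the first. A small sketch of the difference, using made-up integer data in which each array has a value the other lacks:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 3, 4])

only_in_a = np.setdiff1d(a, b)   # values of a that are not in b
only_in_b = np.setdiff1d(b, a)   # values of b that are not in a
in_one_only = np.setxor1d(a, b)  # values that are in exactly one of the arrays

print(only_in_a, only_in_b, in_one_only)
```

Concatenating the two one-sided results gives the same values as np.setxor1d here.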
This should do it for Python lists:
c=[x for x in a if x not in b]+[x for x in b if x not in a]
It first collects all the elements from a that are not in b and then adds all those elements from b that are not in a. This way you get all elements that are in a or b, but not in both.
import numpy as np
a = np.array(['Brian', 'Steve', 'Andrew', 'Craig'])
b = np.array(['Andrew','Steve'])
you can use
set(a) - set(b)
Output:
set(['Brian', 'Craig'])
Note: set operation returns unique values

large array searching with numpy

I have two arrays of integers
a = numpy.array([1109830922873, 2838383, 839839393, ..., 29839933982])
b = numpy.array([2838383, 555555555, 2839474582, ..., 29839933982])
where len(a) ~ 15,000 and len(b) ~ 2 million.
What I want is to find the indices of array b elements which match those in array a. Now, I'm using list comprehension and numpy.argwhere() to achieve this:
bInds = [ numpy.argwhere(b == c)[0] for c in a ]
However, this obviously takes a long time to complete, and array a will become larger too, so this is not a sensible route to take.
Is there a better way to achieve this result, considering the large arrays I'm dealing with here? It currently takes around 5 minutes to do this. Any speed-up is welcome!
More info: I want the indices to match the order of array a too. (Thanks Charles)
Unless I'm mistaken, your approach searches the entire array b for each element of a again and again.
Alternatively, you could create a dictionary mapping the individual elements from b to their indices.
indices = {}
for i, e in enumerate(b):
    indices[e] = i # if elements in b are unique
    # indices.setdefault(e, []).append(i) # otherwise, use lists instead of the line above
Then you can use this mapping for quickly finding the indices where elements from a can be found in b.
bInds = [ indices[c] for c in a ]
This takes about a second to run.
import numpy
# make some fake data...
a = (numpy.random.random(15000) * 2**16).astype(int)
b = (numpy.random.random(2000000) * 2**16).astype(int)
# find indices of b that are contained in a.
set_a = set(a)
result = set()
for i,val in enumerate(b):
    if val in set_a:
        result.add(i)
result = numpy.array(list(result))
result.sort()
print(result)
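If the indices need to follow the order of array a, as the question requests, a vectorized sketch with numpy.argsort and numpy.searchsorted may also work. It assumes every element of a occurs in b (otherwise the lookup returns a wrong or out-of-range position), and the small arrays below are made up for illustration:

```python
import numpy as np

a = np.array([2838383, 29839933982, 839839393])
b = np.array([2838383, 555555555, 2839474582, 839839393, 29839933982])

# Sort b once, keeping track of where each sorted value came from
order = np.argsort(b, kind="stable")

# Binary-search each element of a in the sorted view of b
pos = np.searchsorted(b, a, sorter=order)

# Translate positions in the sorted view back to indices in the original b
# (assumption: every element of a is present in b)
b_inds = order[pos]
print(b_inds)
```

Checking that b[b_inds] equals a is a cheap way to confirm the assumption holds for real data.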
