I have an N x N sparse matrix in Matlab whose values are indexed by (r, c) pairs, where r and c are unique IDs.
The problem is that after converting this matrix into Python, all of the index values are decremented by 1.
For example:
Before: (210058, 10326) = 1
After:  (210057, 10325) = 1
Currently, I am doing the following to counter this:
from scipy import io, sparse

mat_contents = io.loadmat(filename)
G = mat_contents['G']
I, J = G.nonzero()
I += 1
J += 1
V = G.data
G = sparse.csr_matrix((V, (I, J)))
I have also tried using different options in scipy.io.loadmat (matlab_compatible, mat_dtype), but neither worked.
I am looking for a solution that will give me the same indices as the Matlab matrix. Solutions that do not require reconstructing the matrix would be ideal, but I am also curious how others have gotten around this problem.
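For reference, the most compact reconstruction I've found (just a sketch) goes through COO format, shifts both index arrays by one, and lets the matrix grow by one row and column so that MATLAB-style (r, c) lookups work directly:

import numpy as np
from scipy import io, sparse

mat_contents = io.loadmat(filename)     # same .mat file as above
G = mat_contents['G'].tocoo()

# Shift to 1-based indices; the extra row/column 0 simply stays empty.
G1 = sparse.csr_matrix((G.data, (G.row + 1, G.col + 1)),
                       shape=(G.shape[0] + 1, G.shape[1] + 1))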
Thank you all for the good advice.
I decided to stick with Python. I do most of my data transfers between Matlab and Python using text files now.
Related
I was writing code in Matlab and had to return a matrix which gave a 0 or 1 to represent elements in the original matrix.
I wanted to know if there is a python equivalent of the above without running nested loops to achieve the same result.
c = [2; 1; 3]
temp = eye(3,3)
d = temp(c,:)
The d matrix needs to tell me what number was present in my original matrix: for example, d(1,2) = 1 tells me that the first element of the original matrix was 2.
The "direct" equivalent of that code is this (note the 0-indexing, compared to matlab's 1-indexing)
import numpy
c = numpy.array( [1, 0, 2] )
temp = numpy.eye( 3 )
d = temp[c, :]
See the section on indexing with "index arrays" in the official NumPy documentation.
However, in general what you are doing above is called "one hot" encoding (or "one-of-K", as per Bishop2006). There are specialised methods for one hot encoding in the various machine learning toolkits, which confer some advantages, so you may prefer to look those up instead.
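As an illustration (not from the original answer, and assuming scikit-learn is available), the same one-hot matrix can be produced with sklearn.preprocessing.OneHotEncoder:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

labels = np.array([[2], [1], [3]])        # the original c, as a column vector
enc = OneHotEncoder()                      # output is a scipy sparse matrix by default
d = enc.fit_transform(labels).toarray()    # same 3x3 matrix as temp(c, :) above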
I've occasionally but not frequently used numpy. I'm now needing to do some summations where the sums involve the row/column indices.
I have an m x n array S. I want to create a new m x n array whose (s, i) entry is
-c i S[s,i] + g (i+1)S[s,i+1] + (s+1)S[s+1,i-1]
So say S = np.array([[1,2],[3,4],[5,6]]); the result I want is
-c*np.array([[0*1, 1*2],[0*3, 1*4],[0*5, 1*6]])
+ g*np.array([[1*2, 2*0],[1*4, 2*0],[1*6, 2*0]])
+ np.array([[1*0, 1*3],[2*0, 2*5],[3*0, 3*0]])
(that's not all the terms in my equation, but I feel like knowing how to do this would be enough to complete what I'm after).
I think what I will need to do is create a new array whose rows are just the index of the rows and another corresponding for columns. Then do some component-wise multiplication. But this is well outside what I normally do in my research, so I've taken a few wrong steps already.
note: It is understood that where the indices refer to something outside my array the value is zero.
Is there a clean way to do the summation I've described above?
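For concreteness, here is a sketch of the index arrays I have in mind (np.indices is my best guess at the right tool):

import numpy as np

S = np.array([[1, 2], [3, 4], [5, 6]])
ss, ii = np.indices(S.shape)   # ss[s, i] == s and ii[s, i] == i
# e.g. the first term of the sum would then be just -c * ii * S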
I would do it in several steps, due to your possible out-of-bounds indexing:
import numpy as np
S = np.array([[1,2],[3,4], [5,6]])
c = np.random.rand()
g = np.random.rand()
m,n = S.shape
Stmp1 = S*np.arange(0,n) # i*S[s,i]
Stmp2 = S*np.arange(0,m)[:,None] # s*S[s,i]
# the answer:
Sout = -c*Stmp1
Sout[:,:-1] = Sout[:,:-1] + g*Stmp1[:,1:]
Sout[:-1,1:] = Sout[:-1,1:] + Stmp2[1:,:-1]
# only for control:
Sout2 = -c*np.array([[0*1, 1*2],[0*3, 1*4],[0*5, 1*6]]) \
+ g*np.array([[1*2, 2*0],[1*4, 2*0],[1*6, 2*0]]) \
+ np.array([[1*0, 1*3],[2*0, 2*5],[3*0, 3*0]])
Check:
In [431]: np.all(Sout==Sout2)
Out[431]: True
I introduced auxiliary arrays for i*S[s,i] and s*S[s,i]. While this is clearly not necessary, it makes the code easier to read. We could've easily sliced into the np.arange(0,n) calls directly, but unless memory is an issue, I find this approach much more straightforward.
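If you prefer a single expression without the in-place slice updates, here is a zero-padded variant (just a sketch; it relies on np.pad's default zero padding, available in NumPy 1.17+):

import numpy as np

S = np.array([[1, 2], [3, 4], [5, 6]])
c, g = np.random.rand(), np.random.rand()
m, n = S.shape

ii = np.arange(n)                       # column indices i
ss = np.arange(m)[:, None]              # row indices s, as a column
Spad = np.pad(S, ((0, 1), (1, 1)))      # zero-pad: one row below, one column on each side

Sout = (-c * ii * S
        + g * (ii + 1) * Spad[:m, 2:]   # S[s, i+1], zero past the right edge
        + (ss + 1) * Spad[1:, :n])      # S[s+1, i-1], zero past the bottom/left edges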
I'm currently porting a code base, originally implemented in Perl, to Python. The following short piece of code takes up about 90% of the runtime when I run on the whole dataset.
def equate():
    for i in range(row):
        for j in range(row):
            if adj_matrix[i][j] != adj_matrix[mapping[i]][mapping[j]]:
                return False
    return True
Where equate is a closure inside of another method, row is an integer, adj_matrix is a list of lists representing a matrix and mapping is a list representing a vector.
The equivalent Perl code is as follows:
sub equate
{
    for ( 0..$row )
    {
        my ($smrow, $omrow) = ($$adj_matrix[$_], $$adj_matrix[$$mapping[$_]]); #DEREF LINE
        for (0..$row)
        {
            return 0 if $$smrow[$_] != $$omrow[$$mapping[$_]];
        }
    }
    return 1;
}
This is encapsulated as a sub ref in the outer subroutine, so I don't have to pass variables to the subroutine.
In short, the Perl version is much much faster and my testing indicates that it is due to the dereferencing in "DEREF LINE". I have tried what I believed was the equivalent in Python:
def equate():
    for i in range(row):
        row1 = adj_matrix[i]
        row2 = adj_matrix[mapping[i]]
        for j in range(row):
            if row1[j] != row2[mapping[j]]:
                return False
    return True
But this gave only an insignificant improvement. Additionally, I tried using a NumPy matrix to represent adj_matrix, but again the improvement was small, probably because adj_matrix is typically a small matrix, so the NumPy overhead outweighs any gain and I'm not really doing any matrix math operations.
I welcome any suggestion to improve the runtime of the Python equate method and an explanation why my "improved" Python equate method is not much better. While I consider myself a competent Perl programmer, I am a Python novice.
ADDITIONAL DETAILS:
I am using Python 3.4, although similar behavior was observed when I initially implemented it in 2.7. I switched to 3.4 since the lab I work in uses 3.4.
As for the contents of the vectors, allow me to provide some background so the following details make sense. This is part of an algorithm to identify subgraph isomorphisms between two chemical compounds (a and b), represented by the graphs A and B respectively, where each atom is a node and each bond an edge. The above code is for the simplified case where A = B, so I am looking for symmetrical transformations of the compound (planes of symmetry), and the size of A in number of atoms is N. Each atom is assigned a unique index beginning at zero.
mapping is a 1D vector of dimensions 1xN where each element is an integer. mapping[i] = j means that the atom with index i (which I'll refer to as atom i) is currently mapped to atom j. The absence of a mapping is indicated by j = -1.
adj_matrix is a 2D matrix of dimensions NxN where each element adj_matrix[i][j] = k is a natural number representing the presence and order of an edge between atoms i and j in compound A. If k = 0 there is no such edge (i.e., no bond between i and j); otherwise k > 0 gives the order of the bond between atoms i and j.
When A != B, two different adj_matrices are compared in equate, and the sizes of a and b in atoms are Na and Nb. Na does not have to equal Nb, but Na <= Nb. I only mention this because optimizations are possible for the special case that are not valid in the general case, but any advice would be helpful.
With numpy you could vectorize your whole code as follows, assuming adj_matrix and mapping are numpy arrays:
def equate():
    sub = adj_matrix[:row, :row]                                  # adj_matrix[i][j] for i, j < row
    permuted = adj_matrix[np.ix_(mapping[:row], mapping[:row])]   # adj_matrix[mapping[i]][mapping[j]]
    return np.all(sub == permuted)
It doesn't break out of the loop early when it finds a mismatch, but unless your arrays are huge, the speed of NumPy is going to dominate.
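For illustration, here is a tiny worked example with made-up data (not from the post): a three-atom chain, where swapping the two end atoms is a symmetry:

import numpy as np

adj_matrix = np.array([[0, 1, 0],
                       [1, 0, 1],
                       [0, 1, 0]])      # a linear chain: atom 0 - atom 1 - atom 2
mapping = np.array([2, 1, 0])           # swap the end atoms, keep the middle one
row = 3

def equate():
    sub = adj_matrix[:row, :row]
    permuted = adj_matrix[np.ix_(mapping[:row], mapping[:row])]
    return np.all(sub == permuted)

print(equate())   # True: the permuted adjacency matrix matches the original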
I am trying to write a function that takes two m x n matrices as input and gives a binary matrix as output, where an element (m, n) is 0 if the corresponding value is less than zero and 1 otherwise. I want these binary values to replace the values that were evaluated, in an array or matrix format. Here is my code, which has produced errors thus far:
def rdMatrix(mat1, mat2):
    mat3 = np.dot(mat1, mat2)
    arr = [[]]
    k = mat3(m,n)
    for k in mat3:
        if k < 0:
            arr.append[0]
        else:
            arr.append[1]
I am having difficulty telling the function to map a new value to each element in the matrix and then store it in an array. I'm also having trouble defining what a specific element of m x n is in the for loop. I am new to programming, so please forgive any obvious mistakes or errors that would be easy to fix. Also, please let me know if anything needs clarification.
Any help is greatly appreciated, thank you!
This is NumPy, so you can obtain binary matrices using comparison operators.
For example, your code can be implemented very simply as
mat3 = np.dot(mat1, mat2)
return mat3 >= 0
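If you need literal 0/1 integers rather than booleans, a small usage sketch (with made-up inputs) could be:

import numpy as np

mat1 = np.array([[1.0, -2.0], [3.0, 0.5]])    # made-up example inputs
mat2 = np.array([[0.5, 1.0], [-1.0, 2.0]])

mat3 = np.dot(mat1, mat2)
binary = (mat3 >= 0).astype(int)    # boolean mask converted to 0/1 integers
print(binary)                       # [[1 0]
                                    #  [1 1]]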
I'm a bit of a newbie to both Matlab and Python, so many apologies if this question is a bit dumb...
I'm trying to convert some Matlab code over to Python using numpy and scipy and things were going fine until I reached the sparse matrix that someone wrote. The Matlab code goes like:
unwarpMatrix = sparse(phaseOrigin, ceil([1:nRead*nSlice*nPhaseDmap]/expan), 1, numPoints, numPoints)/expan;
Here's my python code (with my thought process) leading up to my attempt at conversion. For a given dataset I was testing with (in both Matlab and Python):
nread = 64
nslice = 28
nphasedmap = 3200
expan = 100
numpoints = 57344
Thus, the phaseorigin, s, and j arrays each have length 5734400 (and I've confirmed that the functions that create my phaseorigin array output exactly the same result that Matlab does).
#Matlab sparse takes: S = sparse(i,j,s,m,n)
#Generates an m by n sparse matrix such that: S(i(k),j(k)) = s(k)
#scipy csc matrix takes: csc_matrix((data, ij), shape=(M, N))
#Matlab code is: unwarpMatrix = sparse(phaseOrigin, ceil([1:nRead*nSlice*nPhaseDmap]/expan), 1, numPoints, numPoints)/expan;
size = nread*nslice*nphasedmap
#i would be phaseOrigin variable
j = np.ceil(np.arange(1,size+1, dtype=np.double)/expan)
#Matlab apparently treats '1' as a scalar so I should be tiling 1 to the same size as j and phaseorigin
s = np.tile(1,size)
unwarpmatrix = csc_matrix((s,(phaseorigin, j)), shape=(numpoints,numpoints))/expan
So when I try to run my Python code I get:
ValueError: column index exceedes matrix dimensions
This doesn't occur when I run the Matlab code even though the array sizes are larger than the defined matrix size...
What am I doing wrong? I've obviously screwed something up... Thanks very much in advance for any help!
The problem is that Python indices start from 0, whereas Matlab indices start from 1. So for an array of size 57344, in Python the first element would be arr[0] and the last element would be arr[57343].
Your variable j has values from 1 to 57344, so you can probably see the problem. Creating your j like this would solve it:
j = np.floor(np.arange(0, size, dtype=np.double) / expan)
Still, better to check this before using...
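Putting that together with the original snippet, a sketch of the corrected construction could look like this (note the assumption that phaseorigin still holds the 1-based MATLAB values and therefore needs the same shift; the question doesn't say so explicitly):

import numpy as np
from scipy.sparse import csc_matrix

size = nread * nslice * nphasedmap
i = phaseorigin - 1                                                  # assumed shift from 1-based MATLAB row indices
j = np.floor(np.arange(size, dtype=np.double) / expan).astype(int)   # 0-based column indices
s = np.ones(size)                                                    # the scalar 1 expanded to match i and j
unwarpmatrix = csc_matrix((s, (i, j)), shape=(numpoints, numpoints)) / expan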