Array operations in NumPy vs MATLAB - Python

I have MATLAB code that I have to convert into Python. There is one operation that I am currently struggling with. I have a MATLAB array "edof" which has the dimensions 262144 x 24, and a second array "dofVector" which has the dimensions 823875 x 1. The operation performed is:
edof = dofVector(edof);
which updates my variable: the resulting dimensions of "edof" are the same as before, i.e. 262144 x 24, but the values are changed.
I am trying to convert this line of code to numpy but have been unable to do so.
edof = dofVector[edof]
I get the following error.
Can someone please help me here?
I am not including the description of the arrays here as this is highly technical and specific to my field.
Thank you!

Numpy indices are zero-based. MATLAB indices are one-based. So if edof is identical between MATLAB and Python up to that step, you'll want to do
edof = dofVector[edof - 1]
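For example, a minimal sketch with small hypothetical stand-ins for the real arrays:
import numpy as np

dofVector = np.array([10., 20., 30., 40., 50.])
edof = np.array([[1, 3],
                 [5, 2]])       # 1-based indices carried over from MATLAB

edof = dofVector[edof - 1]      # shift to 0-based before fancy indexing
print(edof)                     # [[10. 30.]
                                #  [50. 20.]]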


Difference in outputs between numpy.sum() in Python and sum() in MATLAB

I'm converting MATLAB code to Python
This is my code in Python:
import numpy as np
import math

n = 150
L = 1
inter = L / n
y = np.linspace(inter / 2, L - inter / 2, n).transpose()
E = 210000000000 * np.ones(n)
Rho = 7800 * np.ones(n)
PI = math.pi
A = np.exp(5 + 2 * y * np.sin(2 * PI * y / L)) * 0.000001
This works fine, with no difference in values or other issues, until I have to reproduce this piece of MATLAB code:
Mass=sum(Rho*inter.*A)
I tried the same using np.sum(Rho*inter*A) and just Rho*inter*A.
In the first case I got a single answer, 1.0626206716847877, but MATLAB returns a 150-element array.
In the second case I got an ndarray like I wanted, but the values were not the same as what I got in MATLAB.
Values I got in MATLAB : matlab values pastebin
Values I got in python : python values pastebin
What am I doing wrong?
(Rho[:,None]*inter*A).sum(axis=0)
matches your MATLAB pastebin.
Or using einsum to sort out the axes:
np.einsum('i,j->j', Rho,inter*A)
which just reduces to:
Rho.sum() * inter*A
Is that really what you are trying to do in MATLAB?
It might help if you showed the actual MATLAB code used to create Rho, A etc.
Mass=sum(Rho*inter.*A)
What's the size of Rho and A in MATLAB? One may be [1x150], but what about the other? Is Rho [1x150] as well, or [150x150]? The * is matrix multiplication, like @ in numpy, but .* is elementwise.
In the numpy code y, Rho and A all have shape (150,). The transpose on y does nothing. Rho*inter*A is elementwise multiplication producing a (150,) as well.
NumPy sums all elements of a matrix by default. MATLAB's default is column-based, i.e. each of your 150 columns sums to its own total, hence the array. Use sum(matrix,'all') in MATLAB to sum over all elements of a matrix. If you have a MATLAB older than R2018b, use sum(matrix(:)) instead, i.e. flatten the matrix to a column before summing.
To sum over columns in Python, specify the axis, being 0: np.sum(matrix,axis=0)
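For example, with a small hypothetical 2 x 3 matrix:
import numpy as np

matrix = np.arange(6).reshape(2, 3)   # [[0 1 2]
                                      #  [3 4 5]]
print(np.sum(matrix))                 # 15: NumPy's default, all elements
print(np.sum(matrix, axis=0))         # [3 5 7]: column sums, like MATLAB's sum(matrix)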
numpy.sum():
Axis or axes along which a sum is performed. The default, axis=None, will sum all of the elements of the input array.
sum() from MATLAB:
S = sum(A) returns the sum of the elements of A along the first array dimension whose size does not equal 1.
If A is a matrix, then sum(A) returns a row vector containing the sum of each column.
S = sum(A,'all') computes the sum of all elements of A. This syntax is valid for MATLAB® versions R2018b and later.
To prevent this kind of ambiguity, I prefer to always specify which dimension to sum over, i.e. sum(matrix,1) for MATLAB and np.sum(matrix,axis=0) for NumPy, regardless of the default.
I think that in MATLAB, using sum on a matrix gives the sum of its individual columns, so you end up with an array whose number of elements equals the number of columns. Use one more sum command in MATLAB: sum(sum(M)), which is the equivalent of np.sum(M) in Python.

Computation difference between function and manual computation

I am facing a mystery right now. I get strange results in some program, and I think it may be related to the computation, since my functions give different results compared to manual computation.
This is from my program, I am printing the values pre-computation :
print("\nPrecomputation:\nmatrix\n:", matrix)
tmp = likelihood_left * likelihood_right
print("\nconditional_dep:", tmp)
print("\nfinal result:", matrix # tmp)
I got the following output:
Precomputation:
matrix:
[array([0.08078721, 0.5802404 , 0.16957052, 0.09629893, 0.07310294])
array([0.14633129, 0.45458744, 0.20096238, 0.02142105, 0.17669784])
array([0.41198731, 0.06197812, 0.05934063, 0.23325626, 0.23343768])
array([0.15686545, 0.29516415, 0.20095091, 0.14720275, 0.19981674])
array([0.15965914, 0.18383683, 0.10606946, 0.14234812, 0.40808645])]
conditional_dep: [0.01391123 0.01388155 0.17221067 0.02675524 0.01033257]
final result: [0.07995043 0.03485223 0.02184015 0.04721548 0.05323298]
The thing is, when I compute the following code:
matrix = [np.array([0.08078721, 0.5802404 , 0.16957052, 0.09629893, 0.07310294]),
np.array([0.14633129, 0.45458744, 0.20096238, 0.02142105, 0.17669784]),
np.array([0.41198731, 0.06197812, 0.05934063, 0.23325626, 0.23343768]),
np.array([0.15686545, 0.29516415, 0.20095091, 0.14720275, 0.19981674]),
np.array([0.15965914, 0.18383683, 0.10606946, 0.14234812, 0.40808645])]
tmp = np.asarray([0.01391123, 0.01388155, 0.17221067, 0.02675524, 0.01033257])
matrix @ tmp
The values used are exactly the same as in the computation before, but I get the following result:
array([0.04171218, 0.04535276, 0.02546353, 0.04688848, 0.03106443])
This result is obviously different from the previous one, and it is the correct one (I computed the dot product by hand).
I have been facing this problem the whole day and have not found anything useful online. If any of you has even a tiny idea where it could come from, I'd be really happy :D
Thanks in advance
Yann
PS: I can show more of the code if needed.
PS2: I don't know if it is relevant but this is used in a dynamic programming algorithm.
To recap our discussion in the comments, in the first part ("pre-computation"), the following is true about the matrix object:
>>> matrix.shape
(5,)
>>> matrix.dtype
dtype('O') # aka object
And as you say, this is due to matrix being a slice of a larger, non-uniform array. Let's recreate this situation:
>>> matrix = np.array([[], np.array([0.08078721, 0.5802404 , 0.16957052, 0.09629893, 0.07310294]), np.array([0.14633129, 0.45458744, 0.20096238, 0.02142105, 0.17669784]), np.array([0.41198731, 0.06197812, 0.05934063, 0.23325626, 0.23343768]), np.array([0.15686545, 0.29516415, 0.20095091, 0.14720275, 0.19981674]), np.array([0.15965914, 0.18383683, 0.10606946, 0.14234812, 0.40808645])])[1:]
It is now not a matrix with scalars in rows and columns, but a column vector of column vectors. Technically, matrix @ tmp is an operation between two 1-D arrays and hence NumPy should, according to the documentation, calculate the inner product of the two. This is true in this case, with the convention that the sum be over the first axis:
>>> np.array([matrix[i] * tmp[i] for i in range(5)]).sum(axis=0)
array([0.07995043, 0.03485222, 0.02184015, 0.04721548, 0.05323298])
>>> matrix # tmp
array([0.07995043, 0.03485222, 0.02184015, 0.04721548, 0.05323298])
This is essentially the same as taking the transpose of the proper 2-D matrix before the multiplication:
>>> np.stack(matrix).T # tmp
array([0.07995043, 0.03485222, 0.02184015, 0.04721548, 0.05323298])
Equivalently, as noted by @jirasssimok:
>>> tmp # np.stack(matrix)
array([0.07995043, 0.03485222, 0.02184015, 0.04721548, 0.05323298])
Hence the erroneous or unexpected result.
As you have already resolved to do in the comments, this can be avoided in the future by ensuring all matrices are proper 2-D arrays.
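A minimal sketch of that fix, reusing the matrix and tmp from above:
>>> matrix2d = np.stack(matrix)   # object array of row vectors -> proper 2-D float array
>>> matrix2d.shape, matrix2d.dtype
((5, 5), dtype('float64'))
>>> matrix2d @ tmp                # now an ordinary matrix-vector product
array([0.04171218, 0.04535276, 0.02546353, 0.04688848, 0.03106443])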
It looks like you got the operands switched in one of your matrix multiplications.
Using the same values of matrix and tmp that you provided, matrix @ tmp and tmp @ matrix provide the two results you showed.[1]
matrix = [np.array([0.08078721, 0.5802404 , 0.16957052, 0.09629893, 0.07310294]),
np.array([0.14633129, 0.45458744, 0.20096238, 0.02142105, 0.17669784]),
np.array([0.41198731, 0.06197812, 0.05934063, 0.23325626, 0.23343768]),
np.array([0.15686545, 0.29516415, 0.20095091, 0.14720275, 0.19981674]),
np.array([0.15965914, 0.18383683, 0.10606946, 0.14234812, 0.40808645])]
tmp = np.asarray([0.01391123, 0.01388155, 0.17221067, 0.02675524, 0.01033257])
print(matrix @ tmp)  # [0.04171218 0.04535276 0.02546353 0.04688848 0.03106443]
print(tmp @ matrix)  # [0.07995043 0.03485222 0.02184015 0.04721548 0.05323298]
To make it a little more obvious what your code is doing, you might also consider using np.dot instead of @. If you pass matrix as the first argument and tmp as the second, it will have the result you want, and make it more clear that you're conceptually calculating dot products rather than multiplying matrices.
As an additional note, if you're performing matrix operations on matrix, it might be better if it were a single two-dimensional array instead of a list of 1-dimensional arrays. This will prevent errors of the sort you'll see right now if you try to run matrix @ matrix. This would also let you say matrix.dot(tmp) instead of np.dot(matrix, tmp) if you wanted to.
(I'd guess that you can use np.stack or a similar function to create matrix, or you can call np.stack on matrix after creating it.)
[1] Because tmp has only one dimension and matrix has two, NumPy will treat tmp as whichever type of vector makes the multiplication work. So tmp is treated as a column vector in matrix @ tmp and a row vector in tmp @ matrix.
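A quick illustration of that rule with small hypothetical arrays:
>>> M = np.arange(6).reshape(2, 3)
>>> v = np.ones(3)
>>> (M @ v).shape   # v is treated as a column vector
(2,)
>>> w = np.ones(2)
>>> (w @ M).shape   # w is treated as a row vector
(3,)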

Why do I keep getting this error 'RuntimeWarning: overflow encountered in int_scalars'

I am trying to multiply all the row values and column values of a 2 dimensional numpy array with an explicit for-loop:
product_0 = 1
product_1 = 1
for x in arr:
    product_0 *= x[0]
    product_1 *= x[1]
I realize the product will blow up to become an extremely large number, but from my previous experience Python has had no memory problem dealing with extremely large numbers.
So from what I can tell this is a problem with NumPy, except that I am not storing the gigantic product in a NumPy array or any NumPy data type for that matter; it's just a normal Python variable.
Any idea how to fix this?
Using non-inplace multiplication hasn't helped either: product_0 = x[0]*product_0
Python ints have arbitrary precision, so they cannot overflow. But numpy uses C under the hood, where signed 64-bit integers have fixed precision with a maximum of 2^63 - 1. Your number is far beyond this value, being on average about ((716-1)/2)^86507.
When you extract x[0] in the for loop, it is still a numpy object. To use the full power of Python integers you need to explicitly convert it to a Python int, like this:
product_0 = 1
product_1 = 1
for x in arr:
    t = int(x[0])
    product_0 = product_0 * t
and it will not overflow.
Following your comment, which makes your question more specific: your original problem is to calculate the geometric mean of the array for each row/column. Here is the solution.
First I generate an array that has the same properties as your array:
arr = np.resize(np.random.randint(1, 716, 86507*2), (86507, 2))
Then, calculate the geometric mean for each column/row:
from scipy import stats
gm_0 = stats.mstats.gmean(arr, axis = 0)
gm_1 = stats.mstats.gmean(arr, axis = 1)
gm_0 will be an array that contains the geometric mean of the x and y coordinates. gm_1 instead contains the geometric mean of the rows.
Hope this solves your problem!
You say
So from what I can tell this is a problem with NumPy, except that I am not storing the gigantic product in a NumPy array or any NumPy data type for that matter; it's just a normal Python variable.
Your product may not be a NumPy array, but it is using a NumPy data type. x[0] and x[1] are NumPy scalars, and multiplying a Python int by a NumPy scalar produces a NumPy scalar. NumPy integers have a finite range.
While you technically could call int on x[0] and x[1] to get a Python int, it'd probably be better to avoid needing such huge ints. You say you're trying to perform this multiplication to compute a geometric mean; in that case, it'd be better to compute the geometric mean by transforming to and from logarithms, or to use scipy.stats.mstats.gmean, which uses logarithms under the hood.
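A minimal sketch of the logarithm route, assuming all entries are positive (which holds for values drawn from randint(1, 716)):
import numpy as np

arr = np.resize(np.random.randint(1, 716, 86507 * 2), (86507, 2))

# exp(mean(log(x))) is the geometric mean and never forms the huge product
gm_cols = np.exp(np.log(arr.astype(np.float64)).mean(axis=0))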
NumPy integers are fixed-width rather than arbitrary precision, so while Python can handle this, NumPy will overflow at much smaller values; on builds where the default integer is 32-bit the overflow comes even sooner, so if you want to use NumPy, make sure you are on a 64-bit build.
Edit
After some testing with
import numpy as np

x = np.abs(np.random.randn(1000, 2) * 1000)
np.max(x)
prod1 = np.dtype('int32').type(1)
prod2 = np.dtype('int32').type(1)
k = 0
for i, j in x:
    prod1 *= i
    prod2 *= j
    k += 1
print(k, " ", prod1, prod2)
1.797693134e308 is the max value my numpy scalar was able to take (the products are promoted to float64 in the loop above).
If you run this you will see that numpy is able to handle quite large values, but since you said your max value is around 700, even with only 1000 values my scalar overflowed.
As for how to fix this: rather than doing it manually, the answer using scipy seems more viable and gets the result, so I suggest you go forward with that.
from scipy.stats.mstats import gmean
x = np.abs(np.random.randn(1000, 2) * 1000)
print(gmean(x, axis=0))
You can achieve what you want with the following in numpy:
import numpy as np
products = np.prod(arr.astype(np.float64), axis=0)  # one product per column
It can still reach np.inf if your numbers are large enough, but that can happen with any floating-point type.

How to assign column values in large numpy array?

I am new to Python programming and I have a problem in assigning specific values to the first column of a very large numpy.array.
This is the code I use:
import numpy as np
a = np.zeros((365343020, 9), dtype=np.float32)
for n in range(0, 36534302):
    a[n*10:(n+1)*10, 0] = n
where the second line creates an array of 365343020 rows and 9 columns filled with zeros, while the following for loop is meant to replace the first column of the array with a vector whose elements are 36534302 sequential integers, each repeated 10 times (e.g. [0,0,…,0,1,1,…,1,2,2,…,36534301,36534301,…,36534301]).
The code behaves as desired until around row 168000000 of the array; after that, the 10 repetitions of each odd number are replaced by a second set of repetitions of the (even) number before it.
I have looked for explanations regarding the difference between views and copies. However, even when I try to manually set the content of a specific cell of the array (where the loop defines it wrongly), it does not change.
Could you please help me in solving this problem?
Thanks
Maybe your program is consuming too much memory. Here is some basic math for your code:
Data type: float32
Bits used: 32 bits
Size of array: 3288087180 elements (365343020 * 9)
Total memory consumed: 105218789760 bits (13.15 GB)
1. Try using a smaller data type, such as float16, if the values being stored are not large.
2. Try to decrease your array size.
3. Both 1 and 2.
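Note also that memory is probably not the whole story: float32 represents integers exactly only up to 2^24 = 16777216. Around row 168000000 the stored value n is roughly 16.8 million, just past that limit, so odd values get rounded to a neighbouring even value, which matches the symptom you describe. A minimal sketch that sidesteps both the loop and the rounding, assuming the first column only ever holds these integer labels, is to build it with an integer dtype:
import numpy as np

n_labels = 36534302

# int32 represents every integer up to 2**31 - 1 exactly,
# so 36534301 is stored without rounding
first_col = np.repeat(np.arange(n_labels, dtype=np.int32), 10)

# first_col is [0, 0, ..., 0, 1, 1, ..., 1, ..., 36534301, ..., 36534301]
# with 365343020 entries; keep it alongside the float32 data rather than
# assigning it into a[:, 0], which would reintroduce the rounding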

Issue converting Matlab sparse() code to numpy/scipy with csc_matrix()

I'm a bit of a newbie to both Matlab and Python, so many apologies if this question is a bit dumb...
I'm trying to convert some Matlab code over to Python using numpy and scipy and things were going fine until I reached the sparse matrix that someone wrote. The Matlab code goes like:
unwarpMatrix = sparse(phaseOrigin, ceil([1:nRead*nSlice*nPhaseDmap]/expan), 1, numPoints, numPoints)/expan;
Here's my python code (with my thought process) leading up to my attempt at conversion. For a given dataset I was testing with (in both Matlab and Python):
nread = 64
nslice = 28
nphasedmap = 3200
expan = 100
numpoints = 57344
Thus, the lengths of the phaseorigin, s, and j arrays are all 5734400 (and I've confirmed that the functions that create my phaseorigin array output exactly the same result that Matlab does)
#Matlab sparse takes: S = sparse(i,j,s,m,n)
#Generates an m by n sparse matrix such that: S(i(k),j(k)) = s(k)
#scipy csc matrix takes: csc_matrix((data, ij), shape=(M, N))
#Matlab code is: unwarpMatrix = sparse(phaseOrigin, ceil([1:nRead*nSlice*nPhaseDmap]/expan), 1, numPoints, numPoints)/expan;
size = nread*nslice*nphasedmap
#i would be phaseOrigin variable
j = np.ceil(np.arange(1,size+1, dtype=np.double)/expan)
#Matlab apparently treats '1' as a scalar so I should be tiling 1 to the same size as j and phaseorigin
s = np.tile(1,size)
unwarpmatrix = csc_matrix((s,(phaseorigin, j)), shape=(numpoints,numpoints))/expan
so when I try to run my python code I get:
ValueError: column index exceedes matrix dimensions
This doesn't occur when I run the Matlab code even though the array sizes are larger than the defined matrix size...
What am I doing wrong? I've obviously screwed something up... Thanks very much in advance for any help!
The problem is: Python indices start from 0, whereas Matlab indices start from 1. So for an array of size 57344, the first element in Python is arr[0] and the last element is arr[57343].
Your variable j has values from 1 to 57344. You probably see the problem. Creating your j like this would solve the problem:
j = np.floor(np.arange(0,size, dtype=np.double)/expan)
Still, better to check this before using...
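Assuming phaseorigin is also carried over from MATLAB's 1-based indexing, it needs the same shift. A minimal sketch of the whole conversion, reusing the variables defined in the question:
import numpy as np
from scipy.sparse import csc_matrix

size = nread * nslice * nphasedmap   # 5734400
j = np.arange(size) // expan         # 0-based column indices, 0 .. 57343
s = np.ones(size)

# shift the 1-based MATLAB row indices to 0-based as well
unwarpmatrix = csc_matrix((s, (phaseorigin - 1, j)),
                          shape=(numpoints, numpoints)) / expan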
