I am calling a Python module that I wrote from MATLAB (R2015b). I noticed that we can only pass a 1xN vector to Python.
So I flattened the matrix in MATLAB first.
Matlab Code:
a = ones(3, 3);
a = a(:).';
Then I passed a as a parameter to the Python function:
m = py.computeCoreset.computecoreset(a, obj.coresetSize);
My problem now is that Python doesn't return a MATLAB matrix.
While debugging I noticed that I am returning an ndarray.
This is my Python code:
import numpy as np
def computecoreset(mat, coresetSize):
    return np.random.choice(mat, coresetSize)
I guess I need to turn the ndarray back into a matrix.
But how do I convert it?
Thanks in advance!
https://www.mathworks.com/matlabcentral/answers/157347-convert-python-numpy-array-to-double
The accepted answer suggests the py.array.array function:
data = double(py.array.array('d',py.numpy.nditer(x)));
Which is also listed on
https://www.mathworks.com/help/matlab/matlab_external/handling-data-returned-from-python.html
This is a shot in the dark, because I don't have Matlab to test it, but I suspect you'll have to return a python array object, not a numpy array.
So something like this:
import numpy as np
import array
def computecoreset(mat, coresetSize):
    c = np.random.choice(mat, coresetSize)
    return array.array('d', c)
Just for completeness, the other way goes like this:
rnd = rand(5);
py.numpy.asarray(rnd)
Python ndarray:
0.3112 0.6541 0.2290 0.9961 0.0046
0.5285 0.6892 0.9133 0.0782 0.7749
0.1656 0.7482 0.1524 0.4427 0.8173
0.6020 0.4505 0.8258 0.1067 0.8687
0.2630 0.0838 0.5383 0.9619 0.0844
Use details function to view the properties of the Python object.
Use double function to convert to a MATLAB array.
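Going back to the Python side of the original question: a(:).' flattens the matrix in column-major order, so if the Python function needs the 3x3 layout it can rebuild it with order='F' before working on it. The following is only a sketch; the helper name restore_and_flatten and the sample values are assumptions made for illustration, not part of the original module.

```python
import array
import numpy as np

def restore_and_flatten(flat_values, rows, cols):
    """Rebuild a column-major matrix from a flat vector, then flatten it again."""
    # order="F" matches the column-major order produced by MATLAB's a(:)
    mat = np.asarray(flat_values, dtype=float).reshape((rows, cols), order="F")
    # array.array("d", ...) is the plain Python container suggested in the answer above
    flat_again = array.array("d", mat.ravel(order="F"))
    return mat, flat_again

# Hypothetical input: the values 1..9, as a(:).' would send a = [1 4 7; 2 5 8; 3 6 9]
mat, flat_again = restore_and_flatten(range(1, 10), 3, 3)
```

On the MATLAB side the returned vector would still need a reshape to get the matrix back.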
I am trying to write a code which reads this file and gives the inverse of square root of each term in a matrix. This is the file I am using:
1.659999999999999963e-04
3.970000000000000005e-04
-8.014499999999999402e-02
-2.274299999999999933e-02
-7.559999999999999880e-03
-3.156229999999999869e-01
5.650100000000000261e-02
2.350100000000000106e-02
-4.383999999999999876e-03
-4.878299999999999997e-02
1.207599999999999993e-02
-5.254199999999999843e-02
1.123500000000000019e-02
1.614240000000000119e-01
1.954900000000000040e-02
-2.614100000000000104e-02
1.534899999999999980e-02
5.446000000000000320e-03
-6.210299999999999848e-02
-9.615000000000000283e-03
1.687800000000000064e-02
6.460999999999999729e-03
-9.490999999999999437e-03
1.676700000000000065e-02
-2.308000000000000156e-03
-1.412399999999999940e-02
8.978899999999999382e-02
1.848960000000000048e-01
5.956000000000000356e-03
-5.592300000000000049e-02
1.114599999999999966e-02
-5.689600000000000213e-02
-6.731000000000000004e-03
2.572999999999999940e-02
1.512000000000000106e-03
-3.237999999999999993e-03
-4.068999999999999700e-03
-1.234000000000000071e-03
2.378109999999999946e-01
-1.128000000000000096e-03
-3.534999999999999948e-03
-4.550000000000000008e-04
1.479999999999999925e-04
5.220000000000000031e-04
3.718099999999999877e-02
1.104580000000000006e-01
1.965000000000000167e-03
4.266999999999999960e-03
-5.140999999999999737e-03
1.648640000000000105e-01
1.776220000000000021e-01
1.922000000000000097e-03
3.250600000000000017e-02
4.402899999999999869e-02
-8.430999999999999259e-03
4.409999999999999858e-04
1.389999999999999905e-04
1.374209999999999876e-01
-2.431860000000000133e-01
-1.727000000000000019e-03
-2.280000000000000126e-04
8.100000000000000375e-05
-7.480999999999999803e-03
8.000000000000000654e-05
-3.939999999999999817e-04
1.441000000000000007e-03
-7.290000000000000473e-04
-3.663000000000000284e-02
-1.657999999999999969e-03
-8.369999999999999619e-04
-6.904999999999999680e-03
1.593100000000000072e-02
-3.393000000000000183e-03
1.495999999999999934e-03
-7.368999999999999682e-03
1.436199999999999977e-02
-1.319700000000000040e-02
-4.557000000000000287e-03
8.123700000000000365e-02
2.447399999999999923e-02
-1.295199999999999997e-02
-8.722100000000000686e-02
-5.232999999999999804e-03
-1.255940000000000112e-01
1.291999999999999963e-03
-1.382999999999999898e-03
4.989999999999999644e-03
1.508000000000000009e-03
2.304399999999999851e-02
2.819400000000000031e-02
3.119999999999999944e-04
-8.781999999999999876e-03
6.794500000000000539e-02
6.198999999999999649e-03
-2.058879999999999877e-01
9.219999999999999680e-04
-1.618800000000000100e-02
-3.415860000000000007e-01
-1.660999999999999933e-03
-4.889999999999999599e-04
1.759999999999999954e-04
-3.763999999999999985e-03
-6.566600000000000215e-02
-7.680000000000000195e-04
-1.231799999999999978e-01
7.047999999999999578e-03
1.425000000000000051e-02
-2.900799999999999906e-02
1.187499999999999944e-01
-1.449199999999999933e-01
-1.106999999999999911e-03
-1.557999999999999923e-03
-2.236999999999999839e-03
-7.270699999999999386e-02
-5.140000000000000254e-04
2.246999999999999865e-03
-1.778949999999999976e-01
1.669599999999999904e-02
-1.277799999999999943e-02
-2.379040000000000044e-01
-2.207999999999999893e-03
1.925000000000000062e-03
7.750100000000000044e-02
-5.004100000000000215e-02
1.704999999999999918e-03
3.272400000000000309e-02
1.957499999999999865e-02
-1.514620000000000133e-01
-3.288999999999999823e-03
-3.605699999999999877e-02
3.648999999999999900e-03
3.459799999999999681e-02
-1.859999999999999945e-04
3.300000000000000253e-05
3.000000000000000076e-06
9.999999999999999547e-07
4.800000000000000122e-05
1.361999999999999929e-03
-6.057300000000000184e-02
5.689999999999999529e-04
-5.000000000000000409e-06
2.984699999999999853e-02
6.999999999999999387e-05
4.600000000000000004e-05
-1.294499999999999991e-02
-2.318000000000000182e-03
-0.000000000000000000e+00
1.858200000000000129e-01
-5.969959999999999711e-01
6.000000000000000152e-06
The code I have tried to write is this:
from ast import Num
from cmath import sqrt
import math
import numpy as np
from numpy import append
f1= open('diagonal.txt', 'r')
k=f1.readlines()
#def mkstr(s):
# str1=" "
# return (str1.join(s))
#print(float(mkstr(k)))
#for j in f1:
#inverse_root(j)
#def mkstr2():
# f1=open("diagonal.txt", "r")
# k=f1.readlines()
# fl=[float(item) for item in k]
# inverse_root(fl)
#for j in mkstr2():
#inverse_root(j)
f=[float(x) for x in k]
sa=np.array(f)
ca=sa.astype(np.float)
def inverse_root(j):
    return [1/(sqrt(j))]
j=[inverse_root(ca)]
I am getting the output like this:
DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
ca=sa.astype(np.float)
Traceback (most recent call last):
File "/home/gian-2018/prateek/python/read_file/27/5/rmwpd.py", line 34, in <module>
j=[inverse_root(ca)]
File "/home/gian-2018/prateek/python/read_file/27/5/rmwpd.py", line 33, in inverse_root
return [1/(sqrt(j))]
TypeError: only length-1 arrays can be converted to Python scalars
So, to get the inverse square root of each term in the file, I converted the list of strings into a list of floats, but I still get an error when I pass the whole list to the inverse-sqrt formula instead of a single real number.
Now I am trying to write a version of the formula that works on an array: it should take each term of the original matrix, compute its inverse square root, and store the result in a new array.
Please tell me either how to convert this list into real numbers, or how to convert each term of this matrix into its inverse square root and save it into a new matrix.
Or, if you understand the task, tell me any other way to implement it.
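One possible way to do this is to let NumPy apply the square root element-wise instead of calling cmath.sqrt on the whole array, which is what raises the TypeError. The sketch below makes two assumptions that the question does not state: negative entries should give complex results (the question imports cmath, which suggests that), and zero entries should be reported as NaN rather than dividing by zero.

```python
import numpy as np

def inverse_root(values):
    """Return 1/sqrt(v) for every element of values."""
    arr = np.asarray(values, dtype=float)
    # Cast to complex so negative entries give imaginary roots instead of NaN
    roots = np.sqrt(arr.astype(complex))
    # Assumption: zero entries have no finite inverse root, so mark them as NaN
    out = np.full(arr.shape, np.nan, dtype=complex)
    nonzero = arr != 0
    out[nonzero] = 1.0 / roots[nonzero]
    return out

# Small made-up sample standing in for the contents of the file
sample = np.array([4.0, 0.25, -4.0, 0.0])
result = inverse_root(sample)
```

For the real file, np.loadtxt("diagonal.txt") should give the float array directly, so inverse_root(np.loadtxt("diagonal.txt")) would replace the manual readlines/float conversion.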
I have a MATLAB function :
Bits=30
NBits= ceil(fzero(@(x)2^(x) - x -1 - Bits, max(log2(Bits),1)))
I want to convert it to python, I wrote something like this so far:
from numpy import log, log2
from scipy.optimize import root_scalar
def func(x,Bits):
    return ((x)2^(x)-x-1-Bits, max(log2(Bits)))
However, Python says it needs to be (x)*2^
First, is this conversion from MATLAB to Python correct? Second, does the * have to be added?
Upon suggestion I wrote this lambda function:
lambda x: (2^(x) -x -1 -Bits) , max(log2(Bits))
but I get this error:
TypeError: 'numpy.float64' object is not iterable
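The two problems in the attempts above can be looked at in isolation. In Python, ^ is bitwise XOR and ** is exponentiation, and max() called with a single number tries to iterate over it, which is where the "not iterable" error comes from (the MATLAB original passes two arguments, max(log2(Bits),1)). A small sketch using only the standard library; the variable names are made up for illustration:

```python
import math

# ^ is bitwise XOR in Python, ** is exponentiation
xor_result = 2 ^ 3
pow_result = 2 ** 3

bits = 30
try:
    # A single float is not iterable, so max() raises TypeError
    max(math.log2(bits))
    max_error = None
except TypeError as exc:
    max_error = exc

# With the second argument restored, max() works as in the MATLAB code
start = max(math.log2(bits), 1)
```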
I don't have numpy or scipy on this computer, so here is my best attempt at an answer.
import math
from scipy.optimize import root_scalar
def log2(x):
    return math.log(x, 2)
def YourFunc(Bits):
    x0 = max(log2(Bits), 1)
    # root_scalar returns a result object; the root itself is in .root
    sol = root_scalar(lambda x: 2**x - x - 1 - Bits, x0=x0, x1=x0 + 1)
    return math.ceil(sol.root)
Bits = 30
NBits = YourFunc(Bits)
print(NBits)
I used this function for log2 rather than the one from numpy. Try it.
I am new to cython and have the following code for a numpy for loop that I am trying to optimize. So far, this Cython code isn't much faster than the numpy for loop.
# cython: infer_types = True
import numpy as np
cimport numpy
DTYPE = np.double
def hdcfTransfomation(scanData):
    cdef Py_ssize_t Position
    scanLength = scanData.shape[0]
    hdcfFunction_np = np.zeros(scanLength, dtype = DTYPE)
    cdef double [::1] hdcfFunction = hdcfFunction_np
    for position in range(scanLength - 1):
        topShift = scanData[1 + position:]
        bottomShift = scanData[:-(position + 1)]
        arrayDiff = np.subtract(topShift, bottomShift)
        arraySquared = np.square(arrayDiff)
        arrayMean = np.mean(arraySquared, axis = 0)
        hdcfFunction[position] = arrayMean
    return hdcfFunction
I know that using C math library functions would be better than calling back into NumPy (subtract, square, mean), but I am not sure where to find a list of functions that can be called this way.
I have been trying to optimize this code by using different types, etc., but nothing gives the performance that I think a fully optimized Cython implementation should achieve.
Here is a working example of the numpy for-loop:
import numpy as np
def hdcfTransfomation(scanData):
    scanLength = scanData.shape[0]
    hdcfFunction = np.zeros(scanLength)
    for position in range(scanLength - 1):
        topShift = scanData[1 + position:]
        bottomShift = scanData[:-(position + 1)]
        arrayDiff = np.subtract(topShift, bottomShift)
        arraySquared = np.square(arrayDiff)
        arrayMean = np.mean(arraySquared, axis = 0)
        hdcfFunction[position] = arrayMean
    return hdcfFunction
scanDataArray = np.random.rand(80000, 1)
transformedScan = hdcfTransfomation(scanDataArray)
Always provide as much information as possible (some example data, Python/Cython version, compiler version/settings, and CPU model).
Without that it is quite hard to compare any timings. For example, this problem benefits quite well from SIMD vectorization. It makes quite a difference which compiler you use, or whether you want to redistribute a compiled version that should also run on low-end or quite old CPUs (e.g. no AVX).
I am not very familiar with Cython, but I think your main problem is the missing declaration for scanData. Maybe the C compiler needs additional flags like march=native, but the exact syntax is compiler dependent. I am also not sure how Cython or the C compiler optimizes this part:
arrayDiff = np.subtract(topShift, bottomShift)
arraySquared = np.square(arrayDiff)
arrayMean = np.mean(arraySquared, axis = 0)
If those loops (all vectorized commands are actually loops) are not fused, and temporary arrays are created instead, as in pure Python, this will slow down the code. It is a good idea to create a 1D array first (e.g. scanData = scanData[:, 0]).
As said, I am not that familiar with Cython, so I tried what is possible with Numba. At least it shows what should also be possible with a reasonably good Cython implementation.
Maybe easier for the compiler to optimize:
import numba as nb
import numpy as np
@nb.njit(fastmath=True,error_model='numpy',parallel=True)
#scanData is a 1D-array here
def hdcfTransfomation(scanData):
    scanLength = scanData.shape[0]
    hdcfFunction = np.zeros(scanLength, dtype = scanData.dtype)
    for position in nb.prange(scanLength - 1):
        topShift = scanData[1 + position:]
        bottomShift = scanData[:scanData.shape[0]-(position + 1)]
        sum=0.
        jj=0
        for i in range(scanLength-(position + 1)):
            jj+=1
            sum+=(topShift[i]-bottomShift[i])**2
        hdcfFunction[position] = sum/jj
    return hdcfFunction
I also used parallelization here, because the problem is embarrassingly parallel. At least with a size of 80_000 and Numba it doesn't matter if you use a slightly modified version of your code (1D-array), or the code above.
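For reference, the "slightly modified version of your code (1D-array)" mentioned above might look like the following in plain NumPy. This is a sketch: the name hdcf_1d is made up, and the Numba decorator is left out so that it runs without Numba installed. It can also serve as a reference when checking the output of a compiled version.

```python
import numpy as np

def hdcf_1d(scan_data):
    """Mean squared difference between the signal and itself shifted by each lag."""
    # Flatten an (N, 1) input to 1D so every slice below is a contiguous vector
    scan_data = np.ravel(scan_data)
    n = scan_data.shape[0]
    out = np.zeros(n, dtype=scan_data.dtype)
    for position in range(n - 1):
        diff = scan_data[position + 1:] - scan_data[:n - (position + 1)]
        out[position] = np.mean(diff * diff)
    # The last element stays 0, as in the original loop
    return out

# Tiny made-up input to illustrate the result
example = hdcf_1d(np.array([0.0, 1.0, 3.0]))
```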
Timings
#Quadcore Core i7-4th gen,Numba 0.4dev,Python 3.6
scanData=np.random.rand(80_000)
#The first call to the function isn't measured (compilation overhead),but the following calls.
Pure Python: 5900ms
Numba single-threaded: 947ms
Numba parallel: 260ms
Especially for arrays larger than np.random.rand(80_000) there may be better approaches (loop tiling for better cache usage), but for this size this should be more or less OK (at least it fits in the L3 cache).
Naive GPU Implementation
from numba import cuda, float32
@cuda.jit('void(float32[:], float32[:])')
def hdcfTransfomation_gpu(scanData,out_data):
    scanLength = scanData.shape[0]
    position = cuda.grid(1)
    if position < scanLength - 1:
        sum= float32(0.)
        offset=1 + position
        for i in range(scanLength-offset):
            sum+=(scanData[i+offset]-scanData[i])**2
        out_data[position] = sum/(scanLength-offset)
hdcfTransfomation_gpu[scanData.shape[0]//64,64](scanData,res_3)
This takes about 400 ms on a GT640 with float32 and 970 ms with float64. For a good implementation, shared arrays should be considered.
Putting Cython aside, does this do the same thing as your current code but without a for loop? We can tighten it up and correct any inaccuracies, but the first port of call is to try applying NumPy operations to 2D arrays before turning to Cython for loops. It's too long to put in a comment.
import numpy as np
# Setup
arr = np.random.choice(np.arange(10), 100).reshape(10, 10)
top_shift = arr[:, :-1]
bottom_shift = arr[:, 1:]
arr_diff = top_shift - bottom_shift
arr_squared = np.square(arr_diff)
arr_mean = arr_squared.mean(axis=1)
Calling MATLAB from Python is bound to give some performance reduction that I could avoid by rewriting (a lot of) code in Python. However, this isn't a realistic option for me, but it annoys me that a huge loss of efficiency lies in the simple conversion from a numpy array to a MATLAB double.
I'm talking about the following conversion from data1 to data1m, where
data1 = np.random.uniform(low = 0.0, high = 30000.0, size = (1000000,))
data1m = matlab.double(list(data1))
Here matlab.double comes from MathWorks' own MATLAB package / engine. The second line of code takes 20 s on my system, which seems like too much for a conversion that doesn't really do anything other than make the numbers 'edible' for MATLAB.
So basically I'm looking for a trick opposite to the one given here that works for converting MATLAB output back to Python.
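One detail of the conversion above that can be checked without the MATLAB engine: list(data1) produces a list of NumPy scalar objects, while data1.tolist() produces built-in Python floats. The sketch below only shows that difference; whether switching to tolist() noticeably reduces the 20 s is an assumption that was not measured here.

```python
import numpy as np

data1 = np.random.uniform(low=0.0, high=30000.0, size=(1000,))

# list() keeps every element as a numpy.float64 object
as_numpy_scalars = list(data1)
# tolist() converts every element to a built-in Python float
as_python_floats = data1.tolist()

numpy_scalar_type = type(as_numpy_scalars[0])
python_float_type = type(as_python_floats[0])
```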
Passing numpy arrays efficiently
Take a look at the file mlarray_sequence.py in the folder PYTHONPATH\Lib\site-packages\matlab\_internal. There you will find the construction of the MATLAB array object. The performance problem comes from copying data with loops within the generic_flattening function.
To avoid this behavior we will edit the file a bit. This fix should work on complex and non-complex datatypes.
Make a backup of the original file in case something goes wrong.
Add import numpy as np to the other imports at the beginning of the file
In line 38 you should find:
init_dims = _get_size(initializer)
replace this with:
try:
    init_dims=initializer.shape
except:
    init_dims = _get_size(initializer)
In line 48 you should find:
if is_complex:
    complex_array = flat(self, initializer,
                         init_dims, typecode)
    self._real = complex_array['real']
    self._imag = complex_array['imag']
else:
    self._data = flat(self, initializer, init_dims, typecode)
Replace this with:
if is_complex:
    try:
        self._real = array.array(typecode,np.ravel(initializer, order='F').real)
        self._imag = array.array(typecode,np.ravel(initializer, order='F').imag)
    except:
        complex_array = flat(self, initializer,init_dims, typecode)
        self._real = complex_array['real']
        self._imag = complex_array['imag']
else:
    try:
        self._data = array.array(typecode,np.ravel(initializer, order='F'))
    except:
        self._data = flat(self, initializer, init_dims, typecode)
Now you can pass a numpy array directly to the MATLAB array creation method.
data1 = np.random.uniform(low = 0.0, high = 30000.0, size = (1000000,))
#faster
data1m = matlab.double(data1)
#or slower method
data1m = matlab.double(data1.tolist())
data2 = np.random.uniform(low = 0.0, high = 30000.0, size = (1000000,)).astype(np.complex128)
#faster
data1m = matlab.double(data2,is_complex=True)
#or slower method
data1m = matlab.double(data2.tolist(),is_complex=True)
The performance in MATLAB array creation increases by a factor of 15 and the interface is easier to use now.
While awaiting better suggestions, I'll post the best trick I've come up with so far. It comes down to saving the file with `scipy.io.savemat` and then loading this file in MATLAB.
This is not the prettiest hack and it requires some care to ensure different processes relying on the same script don't end up writing and loading each other's .mat files, but the performance gain is worth it for me.
As a test case I wrote two simple, almost identical MATLAB functions that require 2 numpy arrays (I tested with length 1000000) and one int as input.
function d = test(x, y, fs_signal)
d = sum((x + y))./double(fs_signal);
function d = test2(path)
load(path)
d = sum((x + y))./double(fs_signal);
The function test requires conversion, while test2 requires saving.
Testing test: Converting the two numpy arrays takes about 40 s on my system. The total time to prepare for and run test comes to 170 s.
Testing test2: Saving the arrays and the int takes about 0.35 s on my system. Surprisingly, loading the .mat file in MATLAB is extremely efficient (or, more surprisingly, it is extremely inefficient at dealing with its doubles)... The total time to prepare for and run test2 comes to 0.38 s.
That's a performance gain of almost 450x...
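The saving step of this workaround could be sketched as follows. The helper name save_inputs and the use of tempfile.mkstemp are assumptions added here to address the collision concern mentioned above (each call gets its own file name); the variable names x, y and fs_signal match what test2 expects to find after load(path).

```python
import os
import tempfile
import numpy as np
from scipy.io import savemat, loadmat

def save_inputs(x, y, fs_signal):
    """Write the inputs for test2 to a uniquely named .mat file and return its path."""
    # mkstemp creates a fresh file name per call, so parallel processes do not collide
    fd, path = tempfile.mkstemp(suffix=".mat")
    os.close(fd)
    savemat(path, {"x": x, "y": y, "fs_signal": fs_signal})
    return path

x = np.random.uniform(low=0.0, high=30000.0, size=1000)
y = np.random.uniform(low=0.0, high=30000.0, size=1000)
path = save_inputs(x, y, 44100)

# Read the file back in Python only to confirm what was written
loaded = loadmat(path)
os.remove(path)
```

The returned path is what would be handed to test2 on the MATLAB side; the file should be deleted once MATLAB has loaded it.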
My situation was a bit different (Python script called from MATLAB), but for me converting the ndarray into an array.array massively sped up the process. It is very similar to Alexandre Chabot's solution but without the need to alter any files:
# untested, i.e. only deduced from my "matlab calls python" situation
import numpy
import array
data1 = numpy.random.uniform(low = 0.0, high = 30000.0, size = (1000000,))
ar = array.array('d',data1.flatten('F').tolist())
p = matlab.double(ar)
C = matlab.reshape(p,data1.shape) #this part I am definitely not sure about if it will work like that
At least when done from MATLAB, the combination of "array.array" and "double" is relatively fast. Tested with MATLAB 2016b + Python 3.5.4 64-bit.
I want to pass a numpy array into some(one else's) C++ code via CFFI. Assume I cannot (in any sense) change the C++ code, whose interface is:
double CompactPD_LH(int Nbins, double * DataArray, void * ParamsArray ) {
...
}
I pass Nbins as a Python integer and ParamsArray as a dict -> structure, but DataArray (shape = 3 x NBins), which needs to be populated from a numpy array, is giving me a headache. (cast_matrix from "Why is cffi so much quicker than numpy?" isn't helping here.)
Here's one attempt that fails:
from blah import ffi,lib
data=np.loadtxt(histof)
DataArray=cast_matrix(data,ffi) # see https://stackoverflow.com/questions/23056057/why-is-cffi-so-much-quicker-than-numpy/23058665#23058665
result=lib.CompactPD_LH(Nbins,DataArray,ParamsArray)
For reference, cast_matrix was:
def cast_matrix(matrix, ffi):
    ap = ffi.new("double* [%d]" % (matrix.shape[0]))
    ptr = ffi.cast("double *", matrix.ctypes.data)
    for i in range(matrix.shape[0]):
        ap[i] = ptr + i*matrix.shape[1]
    return ap
Also:
How to pass a Numpy array into a cffi function and how to get one back out?
https://gist.github.com/arjones6/5533938
Thanks @morningsun!
dd=np.ascontiguousarray(data.T)
DataArray = ffi.cast("double *",dd.ctypes.data)
result=lib.CompactPD_LH(Nbins,DataArray,ParamsArray)
works!
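Why the transpose plus np.ascontiguousarray matters can be seen by looking at the raw buffer the way a function taking double * would. The sketch below uses ctypes instead of CFFI so it runs without the compiled library, and it assumes (this is a guess from the fix, not stated in the question) that the text file holds Nbins rows of 3 columns.

```python
import ctypes
import numpy as np

# Hypothetical stand-in for np.loadtxt(histof): Nbins = 4 rows, 3 columns
data = np.arange(12, dtype=np.float64).reshape(4, 3)

# data.T is only a strided view; ascontiguousarray makes a real 3 x Nbins C-ordered copy
transposed_is_contiguous = data.T.flags["C_CONTIGUOUS"]
dd = np.ascontiguousarray(data.T)

# Read the memory at dd.ctypes.data as a flat run of doubles, as the C++ code would
flat = (ctypes.c_double * dd.size).from_address(dd.ctypes.data)
as_seen_by_c = list(flat)
```

Each of the 3 rows of dd now sits in memory as one contiguous run of Nbins doubles, which is the layout a flat double * of shape 3 x NBins implies. Note that dd must stay alive for as long as the pointer is in use.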