Following the answer in How to efficiently convert Matlab engine arrays to numpy ndarray?, it seems much more efficient to access a MATLAB engine array through its _data property.
However, it appears that there is no _data property when the array returned by MATLAB is a 'complex single' one. Is there an equivalently fast way to access the array of complex numbers?
A possible workaround is to return two real arrays from MATLAB (one containing the real part, the other the imaginary part) and rebuild the complex array in Python:
import numpy as np

M_real, M_imag = myMatlabFunction()
M_real_np = np.array(M_real._data)
M_imag_np = np.array(M_imag._data)
M_np = M_real_np + 1j * M_imag_np
Then we can benefit from the fast access to the _data member of each array.
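For a 2-D result, a fuller sketch of the same workaround (the engine call below is hypothetical; it assumes the usual behaviour where _data is a flat buffer in column-major order and size holds the MATLAB dimensions):

import numpy as np

# eng is a started matlab.engine session; nargout=2 asks for both outputs
M_real, M_imag = eng.myMatlabFunction(nargout=2)

shape = tuple(M_real.size)                                      # MATLAB dimensions, e.g. (rows, cols)
M_real_np = np.array(M_real._data).reshape(shape, order='F')    # column-major buffer
M_imag_np = np.array(M_imag._data).reshape(shape, order='F')
M_np = M_real_np + 1j * M_imag_np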
I am still interested in a more straightforward solution.
I have two questions that I have been dealing with for two days:
If I determine the memory address of a numpy ndarray object and of its elements, once with the numpy .data attribute and once with the normal Python functions hex(id()), I get different addresses.
With hex(id()) it gets really weird: sometimes the elements get the same addresses, sometimes different ones.
import numpy as np
y = np.array([0,1,2,3])
print(y.data)
print(y[0].data)
print(y[1].data)
print(y[2].data)
print(y[3].data)
print(hex(id(y[0])))
print(hex(id(y[1])))
print(hex(id(y[2])))
print(hex(id(y[3])))
The results are:
<memory at 0x7f9aaa22d870>
<memory at 0x7f9aaa1bd940>
<memory at 0x7f9aaa1bd940>
<memory at 0x7f9aaa1bd940>
<memory at 0x7f9aaa1bd940>
with hex(id()):
0x7f9aaa31e030
0x7f9aaa1c0750
0x7f9aaa1c0730
0x7f9aaa1c0130
0x7f9aaa1c0750
Most of these results don't mean what you think they mean, because NumPy's memory layout doesn't work the way you're imagining.
A NumPy array object is not its data buffer. The data buffer is separate. With all the metadata an array needs, it would not be possible for an array to literally be its data buffer, and with how NumPy makes heavy use of array views, it would not be possible for an array to directly contain its buffer either. Many arrays can share the same data buffer, or have overlapping data buffers.
A NumPy array object contains some metadata and a number of pointers, one of which points to its buffer. If you had done print(hex(id(y))), you would have gotten the address of the array object itself. With print(y.data), you print a memoryview object representing the array's data buffer, and the "at 0x..." gives the address of the buffer.
When you do y[0], that's not really an array element. It's a new array scalar object, representing an immutable scalar with value taken from the first index of y. It does not directly refer to the memory used for y's first element, because when someone does
x = y[0]
y[0] = 1
they don't want the y[0] = 1 assignment to affect x.
The array scalar has its own address and its own data buffer, separate from the original array and its buffer. The array scalar has a very short lifetime, so y[0] and y[1] may end up using the same memory if y[0]'s lifetime ends before you retrieve y[1]. They don't have to use the same memory, but they can.
When you do print(hex(id(y[0]))), you're printing the address of the array scalar. When you do print(y[0].data), you're printing a memoryview representing the array scalar's data buffer.
With all that said, there is almost nothing useful you can do with any of these memory addresses, especially if you're not writing a C extension. If you are writing a C extension, you still probably shouldn't be using any of these addresses directly. Cython is much more convenient than writing C code directly. If you do want to write C to interact with NumPy, you're going to want a much deeper understanding of how NumPy arrays work under the hood, and you should go read the NumPy C API docs.
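For what it's worth, you can see the object-versus-buffer distinction (and the copy behaviour of array scalars) with plain NumPy, no C extension required:

import numpy as np

y = np.array([0, 1, 2, 3])

print(hex(id(y)))                             # address of the array *object*
print(hex(y.__array_interface__['data'][0]))  # address of its data *buffer*

v = y[1:]                      # a view: new array object, same underlying buffer
print(np.shares_memory(y, v))  # True

s = y[0]                       # an array scalar: an independent copy of the value
y[0] = 99
print(s)                       # still 0 -- the assignment did not affect s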
I have a table that looks as follows:
City          Value
<String>      <String>
Chicago       12
Detriot       15
Jersery City  20
This table is stuck in this format:
import numpy as np
x = np.array([('Chicago', '12'),('Detriot', '15'),('Jersery City', '20')])
I did some research on Stack Overflow and came across this post. However, I don't know why it is not working. I tried the following code:
x[:,1] = x[:,1].astype(int)
I also tried the following, and it did not work either:
x[:,[1]] = x[:,[1]].astype(int)
However, running this line returns the following:
type(x[0,1])
numpy.str_
NumPy arrays only support uniform types. Thus, all the items of an array must be of the same type (you can retrieve it using x.dtype), like np.float64 or np.int64 for example.
The type of the items in x cannot change at runtime. x[:,1] = x[:,1].astype(int) performs an implicit conversion so that the types match. If you need a different type, then you have to create a new array.
Note that this type can be object. In such a situation, any Python object can be stored in the NumPy array. However, it is generally a bad idea to use object types since they are stored inefficiently in memory, defeat any low-level vectorization (i.e. they are slow) and cause performance issues in parallel code (because of the GIL).
Note also that NumPy provides structured types to store (quite) complex data structures in each array item.
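As a small sketch of both options for the table above (a separate numeric array, or a structured array with one field per column):

import numpy as np

x = np.array([('Chicago', '12'), ('Detriot', '15'), ('Jersery City', '20')])

# Option 1: build a new integer array from the second column
values = x[:, 1].astype(int)      # array([12, 15, 20])

# Option 2: a structured array, so each column keeps its own type
table = np.array([('Chicago', 12), ('Detriot', 15), ('Jersery City', 20)],
                 dtype=[('City', 'U20'), ('Value', 'i8')])
print(table['Value'] + 1)         # [13 16 21]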
I have what I thought would be a simple task in numpy, but I'm having trouble.
I have a function which takes an index in the array and returns the value that belongs at that index. I would like to, efficiently, write the values into a numpy array.
I have found numpy.fromfunction, but it doesn't behave remotely like the documentation suggests. It seems to "vectorise" the function, which means that instead of passing the actual indices it passes a numpy array of indices:
def vsin(i):
    return float(round(A * math.sin((2 * pi * wf) * i)))

numpy.fromfunction(vsin, (len,), dtype=numpy.int16)
# TypeError: only length-1 arrays can be converted to Python scalars
(if we use a debugger to inspect i, it is a numpy.array instance.)
So, if we try to use numpy's vectorised sin function:
def vsin(i):
    return (A * numpy.sin((2 * pi * wf) * i)).astype(numpy.int16)

numpy.fromfunction(vsin, (len,), dtype=numpy.int16)
We don't get a type error, but if len > 2**15 we get discontinuities chopping across our oscillator, because numpy is using int16_t to represent the index!
The point here isn't about sin in particular: I want to be able to write arbitrary python functions like this (whether a numpy vectorised version exists or not) and be able to run them inside a tight C loop (rather than a roundabout python one), and not have to worry about integer wraparound.
Do I really have to write my own cython extension in order to be able to do this? Doesn't numpy have support for running python functions once per item in an array, with access to the index?
It doesn't have to be a creation function: I can use numpy.empty (or indeed, reuse an existing array from somewhere else.) So a vectorised transformation function would also do.
I think the issue of integer wraparound is unrelated to numpy's vectorized sin implementation and even the use of python or C.
If you use a 2-byte signed integer and try to generate an array of integer values ranging from 0 to above 32767, you will get a wrap-around error. The array will look like:
[0, 1, 2, ... , 32767, -32768, -32767, ...]
The simplest solution, assuming memory is not too tight, is to use more bytes for your integer array generated by fromfunction so you don't have a wrap-around problem in the first place (up to a few billion):
numpy.fromfunction(vsin, (len,), dtype=numpy.int32)
numpy is optimized to work fast on arrays by passing the whole array around between vectorized functions. I think in general the numpy tools are inconvenient for trying to run scalar functions once per array element.
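If you would rather sidestep fromfunction entirely, one option is to build the index array yourself with a wide-enough integer type and only narrow to int16 at the very end (a sketch; A, wf and length here are hypothetical placeholders):

import numpy as np

A, wf, length = 1000, 440.0 / 44100.0, 2**17   # hypothetical amplitude/frequency/size

i = np.arange(length, dtype=np.int64)          # indices can never wrap around
samples = np.round(A * np.sin(2 * np.pi * wf * i)).astype(np.int16)

# or write into an existing buffer instead of creating a new array
out = np.empty(length, dtype=np.int16)
out[:] = np.round(A * np.sin(2 * np.pi * wf * i))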
How would I translate the following into Python from Matlab? I'm still trying to wrap my head around lists/matrices and arrays in numpy, etc.
outframe(:,[4:4:nout-1]) = 0.25*inframe(:,[1:n-1]) + 0.75*inframe(:,[2:n])
pos=(beamnum>0)*(beamnum<=nbeams)*(binnum>0)*(binnum<=nbins)*((beamnum-1)*nbins+binnum)
for index = 1:512
    outarray(index,:) = uint8(interp1([1:n], inarray64(index,:), [1:.25:n], method))
end
(There's other stuff; these are just the particular statements I'm not sure how to make sense of. I have numpy imported.)
The main workhorse in numpy is the ndarray (or array). It will for the most part replace matlab matrices when you translate code. Like a matlab matrix, the ndarray stores homogeneous data (ie float64) and is optimized for numerical operations.
The numpy matrix is a subclass of the ndarray which can be convenient for some linear algebra intensive applications. Here is more info about the differences between the two.
The python list is more like a matlab cell array (though not exactly the same). It's one of the basic python data structures, but in scientific applications I find that it comes up most often when you need to hold heterogeneous data. (Or when you're doing something very simple and don't want to go to the trouble of creating a numpy array).
Your code above can be converted almost verbatim to Python using the ndarray, replacing () with [] for indexing, and taking into account that indexing starts at 1 in MATLAB and at 0 in Python (i.e. the first element in MATLAB is element 1, and in Python it is element 0).
Let's try this line by line:
outframe(:,[4:4:nout-1]) = 0.25*inframe(:,[1:n-1]) + 0.75*inframe(:,[2:n])
would translate into "English" as: all rows of outframe, but only every 4th column starting from 4 up to nout-1 (i.e. 4, 8, ...). I assume you understand what the inframe references mean.
pos=(beamnum>0)*(beamnum<=nbeams)*(binnum>0)*(binnum<=nbins)*((beamnum-1)*nbins+binnum)
Possibly beamnum is a vector, and (beamnum > 0) returns a vector of {0,1} such that the elements are 1 where the respective beamnum element is > 0, else 0. The rest of it is clear, I hope.
The second-to-last line is a for loop, and the last line should hopefully be clear.
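Putting that together, a rough NumPy sketch of the three statements might look like the following (it assumes inframe, outframe, inarray64, outarray, beamnum, binnum, n, nout, nbeams and nbins already exist as NumPy arrays/integers with compatible shapes, and that linear interpolation is acceptable, since np.interp only does linear; other methods would need scipy.interpolate):

import numpy as np

# outframe(:,[4:4:nout-1]) = 0.25*inframe(:,[1:n-1]) + 0.75*inframe(:,[2:n])
# MATLAB column 4 is Python index 3, and MATLAB's end index is inclusive
outframe[:, 3:nout-1:4] = 0.25 * inframe[:, 0:n-1] + 0.75 * inframe[:, 1:n]

# pos=(beamnum>0)*(beamnum<=nbeams)*(binnum>0)*(binnum<=nbins)*((beamnum-1)*nbins+binnum)
# boolean comparisons multiply like 0/1, just as in MATLAB
pos = ((beamnum > 0) * (beamnum <= nbeams) * (binnum > 0) * (binnum <= nbins)
       * ((beamnum - 1) * nbins + binnum))

# for index = 1:512, outarray(index,:) = uint8(interp1(1:n, inarray64(index,:), 1:.25:n, method))
# note: MATLAB's uint8() rounds and saturates, astype() truncates -- add np.round/np.clip if needed
old_x = np.arange(1, n + 1)
new_x = np.linspace(1, n, 4 * (n - 1) + 1)     # same points as 1:.25:n
for index in range(512):
    outarray[index, :] = np.interp(new_x, old_x, inarray64[index, :]).astype(np.uint8)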
Maybe this is a simple issue, but I could not find any information about it so far.
For an optimization in numpy I need an array of functions. The number of functions I need depends on the current object which shall be optimized.
I have already figured out how to create these functions dynamically, but now I would like to store them in an array like this:
myArray = zeros(x)

for i in range(x):
    myArray[i] = createFunction(i)
If I run this I get a type mismatch:
float() argument must be a string or a number, not 'function'
Creating the array directly works well:
myArray = array([createFunction(0)...])
But because I don't know the number of functions I need, this is exactly what I want to prevent.
Ah, I get it. You really do mean an array of functions.
The type mismatch error arises because the call to zeros creates an array of floats by default. So your original would work if instead you did myArray = numpy.empty(x, dtype=object) (note that empty makes more sense than zeros here). The slightly more pythonic version is to use a list comprehension:
myArray = numpy.array([createFunction(i) for i in range(x)])
But you might not need to create a numpy array at all, depending on what you want to do with it:
myArray = [createFunction(i) for i in range(x)]
If you want to avoid the list, it might be better to use numpy.fromfunction along with numpy.vectorize:
myArray = numpy.fromfunction(numpy.vectorize(createFunction),
                             shape=(x,), dtype=object)
where (x,) is a tuple giving the shape of the array. The call to vectorize is needed because fromfunction assumes that the function can work on an array of inputs and return an array of scalars, and vectorize converts a function to do exactly that. The dtype=object is needed since otherwise numpy tries to create an array of floats.
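As a tiny sanity check of that approach (createFunction here is just a hypothetical stand-in returning a closure):

import numpy

def createFunction(i):
    return lambda: i * i      # hypothetical: each function just returns i squared

x = 5
myArray = numpy.fromfunction(numpy.vectorize(createFunction), (x,), dtype=object)

print(myArray.dtype)           # object
print([f() for f in myArray])  # [0, 1, 4, 9, 16]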
Maybe you can use
myArray = array([createFunction(i) for i in range(x)])
If you need an array of functions, is it possible to not use NumPy? NumPy arrays have C-style element types, and the dtype defaults to float. If you can, just use a standard Python list. But if you absolutely must use NumPy, try defining the array like so:
import numpy as np
a = np.empty([x], dtype=np.dtype(np.object_))
Or however you need it to be with that dtype.
Numpy arrays are homogeneous. That is, all elements of a numpy array are of the same type -- python is duck-typed, numpy isn't. This is part of what makes matrix operations on numpy arrays and matrices so fast. However, because of this a data type must be known when the array is first created. Numpy is generally very good at inferring the data type. The problem comes when creating an empty or zeroed array. Since there are no elements to examine, numpy must guess the data type. Numpy defaults to numpy.float64 if it isn't given a data type at array creation time. This is a decent choice as numpy is typically used in scientific or engineering areas where floating point numbers are required. This is also why numpy is complaining -- because it can't store your functions as 64-bit floating point numbers.
The quick solution is to let numpy know the data type you want, e.g.
myArray = numpy.zeros(x, dtype=object)
Note that the data type cannot be any class, but must be an instance of numpy.dtype (for advanced use you can create additional dtypes at runtime that numpy can then manipulate). For functions, numpy will store them with the object dtype (which means any generic python object). I do not think you will get any performance benefit from using numpy to store arrays of functions. Perhaps you would be better off creating generator functions and chaining them, converting to a numpy array once you know the result will be a number.
funcs = [createFunction(i) for i in range(x)]

def getItemFromEachFunction(i):
    return funcs[i]()

# fromfunction passes whole index arrays, so vectorize the per-index call
# and ask for integer indices
arr = numpy.fromfunction(numpy.vectorize(getItemFromEachFunction), (x,), dtype=int)