Is it advisable, when working with arrays of symbolic expressions, to use numpy arrays?
Something like
u0=numpy.array([Number(1.0), Number(1.0), Number(1.0)])
I mean, is it faster to use numpy arrays instead of python lists?
If so, certain operations on numpy arrays seem to automatically convert the symbolic expressions to floats, for example:
u0=np.array([Number(1.0), Number(1.0), Number(1.0)])
u = np.zeros((10, 3))
u[0] = u0
Now, while
type(u0[0]) >> sympy.core.numbers.Float,
type(u[0][0]) >> numpy.float64
How can I prevent numpy from converting the copied symbolic expressions to float64?
I doubt there's much speed difference vs. a list, since using any non-NumPy data type (i.e., any SymPy data type) in a NumPy array results in dtype=object, meaning the array is just an array of pointers (which a list is too).
It's really unclear why you want to use a NumPy array in the first place.
The first question is: why don't you want to use float64? Presumably you are using
Symbolic expressions (such as x**2 or pi),
Rational numbers, or
sympy.Float objects with higher precision
Those are the only reasons I can think of that you would want to prefer a SymPy type over a NumPy one.
The main advantage of using a NumPy array would be NumPy's superior indexing syntax. As Stelios pointed out, you can get much of this by using SymPy's tensor module. That is really the only reason to use NumPy arrays here, and you have to be careful and aware of which NumPy methods/functions will work and which won't.
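For instance, a minimal illustration of that SymPy-side alternative (sympy.Array is the N-dimensional array type exposed by SymPy's tensor module; the entries here are just illustrative):

import sympy as sp

x, y = sp.symbols('x y')
A = sp.Array([[x, y, 1], [x**2, y**2, 2]])   # keeps exact symbolic entries
A[1, 0]                                      # x**2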
The reason for the caveat is that NumPy mathematical functions will not work on arrays of SymPy objects (or at best will convert the array to float64 first): NumPy functions are designed to work on NumPy data types and don't know about the types above. To get exact values (symbolic expressions or rational numbers), or higher-precision floating point values (in the case of sympy.Float), you need to use SymPy functions, which do not work on NumPy arrays.
If on the other hand (again, it's not clear what exactly you are trying to do), you want to do calculations in SymPy and then use NumPy functions to numerically evaluate the expressions, you should use SymPy to create your expressions, and then lambdify (or ufuncify if performance becomes an issue) to convert the expressions to equivalent NumPy functions, which can operate on NumPy arrays of NumPy dtypes.
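For example, a minimal sketch of that lambdify route (the expression and names here are just illustrative):

import numpy as np
import sympy as sp

x = sp.symbols('x')
expr = sp.sin(x) * sp.exp(-x**2)         # build the expression symbolically
f = sp.lambdify(x, expr, 'numpy')        # convert it into a NumPy-aware function
values = f(np.linspace(0.0, 1.0, 5))     # evaluates on a plain float64 array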
I think it is ok to work with numpy arrays, if necessary. You should bear in mind that arrays are fundamentally different from lists. Most importantly,
all array elements have to be of the same type and you cannot change the type.
In particular, you define the array u, which is by default an array of floats.
That is why any sympy objects you assign to it are converted to float64.
I myself use numpy arrays to accommodate sympy expressions. Most notably, in cases where I need more than 2 dimensions and therefore cannot use Sympy matrices.
If the only reason to use arrays instead of lists is speed, it might not be advisable. Especially since you have to be a bit careful with types (as you found out), and there should be fewer surprises when using lists or, rather, sympy.Matrix.
In your example, you can fix the problem by defining a proper data type:
u = np.zeros((10, 3), dtype=sp.Symbol)
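A quick check of that fix (a sketch; dtype=object gives the same kind of object array that dtype=sp.Symbol aims for, so the sympy values survive the assignment):

import numpy as np
import sympy as sp

u0 = np.array([sp.Number(1.0), sp.Number(1.0), sp.Number(1.0)])
u = np.zeros((10, 3), dtype=object)
u[0] = u0
type(u[0][0])   # sympy.core.numbers.Float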
Related
I am working on an algorithm that leverages Cython and C++ code to speed up a computation. Part of the computation involves keeping track of a 2D matrix, vecs, that has D x D' dimensions (e.g. 1000 x 100). The algorithm parallelizes within Cython to set values per column. I am then interested in obtaining the vecs values as a NumPy array in Python.
Modifying vecs in Cython
The pseudocode for setting each column of vecs is something like:
# this occurs in a Cython/C++ function
for icolumn in range(D_prime):   # D' columns
    for irow in range(D):        # D rows
        vecs[irow, icolumn] = val
Data structure for vecs
To represent such a matrix, I am using a pointer of pointers of type npy_float32 (which I think is just numpy's float32 type). The pointer-to-pointer array now looks like this:
ctypedef np.npy_float32 DTYPE_t
cdef DTYPE_t** vecs # (D, D') array of vectors
Goal to obtain the vecs in NumPy Array at the Python Level
I am interested in converting this vecs variable into a numpy array. This is my attempt, but it doesn't work; I'm fairly new to C++ and Cython.
numpy_vec_arr = np.asarray(<DTYPE_t[:,:]> proj_vecs_arr)
I'm not an expert in NumPy nor Cython, but I do know some about C++ and about interoperating C++ with other languages.
What problem are you trying to solve?
Answering that will prevent the XY problem, and might allow folks here to better help you.
Now, answering your original question. I see two ways to do this.
Use an available constructor of the NumPy array to construct a NumPy array. This will clearly give you a NumPy array, problem solved.
Create an object with the same in-memory structure as the NumPy array that Python expects, then convert that object to a NumPy array. This is tricky to do, bug prone, and depends on many implementation details. For those reasons, I would not suggest taking this approach.
I wrote a program using normal Python, and I now think it would be a lot better to use numpy instead of standard lists. The problem is there are a number of things where I'm confused how to use numpy, or whether I can use it at all.
In general, how do np.arrays work? Are they dynamic in size like a C++ vector, or do I have to declare their length and type beforehand like a standard C++ array? In my program I've got a lot of cases where I create a list
ex_list = [] and then cycle through something and append to it with ex_list.append(some_lst). Can I do something like that with a numpy array? What if I knew the size of ex_list, could I declare an empty one and then add to it?
If I can't, and let's say I only read from this list afterwards, would it be worth converting it to numpy, i.e. is accessing a numpy array faster?
Can I do more complicated operations on each element using a numpy array (not just adding 5 to each, etc.)? Example below:
full_pallete = [(int(1+i*(255/127.5)),0,0) for i in range(0,128)]
full_pallete += [col for col in right_palette if col[1]!=0 or col[2]!=0 or col==(0,0,0)]
In other words, does it make sense to convert to a numpy array and then cycle through it using something other than for loop?
Numpy arrays can be appended to (see http://docs.scipy.org/doc/numpy/reference/generated/numpy.append.html), although in general calling the append function many times in a loop has a heavy performance cost - it is generally better to pre-allocate a large array and then fill it as necessary. This is because the arrays themselves do have fixed size under the hood, but this is hidden from you in python.
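A small sketch of that pre-allocation pattern (sizes and values here are just illustrative):

import numpy as np

n = 1000
out = np.empty(n)         # allocate once, with a known size and dtype
for i in range(n):
    out[i] = i ** 2       # fill in place instead of calling np.append in a loop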
Yes, Numpy is well designed for many operations similar to these. In general, however, you don't want to be looping through numpy arrays (or arrays in general in python) if they are very large. By using inbuilt numpy functions, you basically make use of all sorts of compiled speed up benefits. As an example, rather than looping through and checking each element for a condition, you would use numpy.where().
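For instance, a minimal example of replacing an explicit loop-and-check with numpy.where:

import numpy as np

a = np.arange(10)
b = np.where(a % 2 == 0, a, -a)   # the condition is applied to every element at once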
The real reason to use numpy is to benefit from pre-compiled mathematical functions and data processing utilities on large arrays - both those in the core numpy library as well as many other packages that use them.
Short version
Given a built-in quaternion data type, how can I view a numpy array of quaternions as a numpy array of floats with an extra dimension of size 4 (without copying memory)?
Long version
Numpy has built-in support for floats and complex floats. I need to use quaternions -- which generalize complex numbers, but rather than having two components, they have four. There's already a very nice package that uses the C API to incorporate quaternions directly into numpy, which seems to do all the operations perfectly fast. There are a few more quaternion functions that I need to add to it, but I think I can mostly handle those.
However, I would also like to be able to use these quaternions in other functions that I need to write using the awesome numba package. Unfortunately, numba cannot currently deal with custom types. But I don't need the fancy quaternion functions in those numba-ed functions; I just need the numbers themselves. So I'd like to be able to just re-cast an array of quaternions as an array of floats with one extra dimension (of size 4). In particular, I'd like to just use the data that's already in the array without copying, and view it as a new array. I've found the PyArray_View function, but I don't know how to implement it.
(I'm pretty confident the data are held contiguously in memory, which I assume would be required for having a simple view of them. Specifically, elsize = 8*4 and alignment = 8 in the quaternion package.)
Turns out that was pretty easy. The magic of numpy means it's already possible. While thinking about this, I just tried the following with complex numbers:
import numpy as np
a = np.array([1+2j, 3+4j, 5+6j])
a.view(np.float64).reshape(a.shape[0], 2)
And this gave exactly what I was looking for. Somehow the same basic idea works with the quaternion type. I guess the internals just rely on that elsize, divide by sizeof(float), and use that to set the new size in the last dimension?
To answer my own question then, the same idea can be applied to the quaternion module:
import numpy as np, quaternion
a = np.array([np.quaternion(1,2,3,4), np.quaternion(5,6,7,8), np.quaternion(9,0,1,2)])
a.view(np.float64).reshape(a.shape[0], 4)
The view transformation and reshaping combined seem to take about 1 microsecond on my laptop, independent of the size of the input array (presumably because there's no memory copying, other than a few members in some basic python object).
The above is valid for simple 1-d arrays of quaternions. To apply it to general shapes, I just write a function inside the quaternion namespace:
def as_float_array(a):
    "View the quaternion array as an array of floats with one extra dimension of size 4"
    return a.view(np.float64).reshape(a.shape + (4,))
Different shapes don't seem to slow the function down significantly.
Also, it's easy to convert back from a float array to a quaternion array:
def as_quat_array(a):
    "View a float array as an array of quaternions, absorbing the last dimension"
    if a.shape[-1] == 4:
        return a.view(np.quaternion).reshape(a.shape[:-1])
    return a.view(np.quaternion).reshape(a.shape[:-1] + (a.shape[-1] // 4,))
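A quick round-trip check of the two helpers above (assuming the quaternion package is imported as before):

q = np.array([np.quaternion(1, 2, 3, 4), np.quaternion(5, 6, 7, 8)])
f = as_float_array(q)    # shape (2, 4), dtype float64
q2 = as_quat_array(f)    # back to shape (2,), dtype quaternion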
I'm using python + numpy + scipy to do some convolution filtering over a complex-number array.
field = np.zeros((field_size, field_size), dtype=complex)
...
field = scipy.signal.convolve(field, kernel, 'same')
So, when I want to use a complex array in numpy, all I need to do is pass the dtype=complex parameter.
For my research I need to implement two other types of complex numbers: dual (i*i=0) and double (i*i=1). It's not a big deal - I just take the python source code for complex numbers and change the multiplication function.
The problem: how do I make a numpy array of those exotic numeric types?
It looks like you are trying to create a new dtype for e.g. dual numbers. It is possible to do this with the following code:
dual_type = np.dtype([("a", np.float64), ("b", np.float64)])
dual_array = np.zeros((10,), dtype=dual_type)
However this is just a way of storing the data type, and doesn't tell numpy anything about the special algebra which it obeys.
You can partially achieve the desired effect by subclassing numpy.ndarray and overriding the relevant member functions, such as __mul__ for multiply and so on. This should work fine for any python code, but I am fairly sure that any C or fortran-based routines (i.e. most of numpy and scipy) would multiply the numbers directly, rather than calling the __mul__. I suspect that convolve would fall into this basket, therefore it would not respect the rules which you define unless you wrote your own pure python version.
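As a rough sketch of that subclassing idea for dual numbers (this stores the two components along a trailing axis of size 2 rather than in the named fields above; the class and names are purely illustrative, and C-level routines such as convolve will still bypass __mul__, as noted):

import numpy as np

class DualArray(np.ndarray):
    "Array whose last axis holds the (real, dual) components of dual numbers."

    def __new__(cls, input_array):
        return np.asarray(input_array, dtype=float).view(cls)

    def __mul__(self, other):
        a = np.asarray(self)    # drop the subclass so plain float arithmetic is used below
        c = np.asarray(other)
        out = np.empty_like(a)
        out[..., 0] = a[..., 0] * c[..., 0]                          # real part
        out[..., 1] = a[..., 0] * c[..., 1] + a[..., 1] * c[..., 0]  # dual part, since eps*eps == 0
        return out.view(type(self))

x = DualArray([1.0, 2.0])   # 1 + 2*eps
y = DualArray([3.0, 4.0])   # 3 + 4*eps
x * y                       # DualArray([3., 10.]) -- the dual-number product, not elementwise multiply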
Here's my solution:
from iComplex import SplitComplex as c_split
...
ctype = c_split
constructor = np.vectorize(ctype, otypes=[object])
field = constructor(np.zeros((field_size, field_size)))
That is the easy way to create a numpy object array.
As for scipy.signal.convolve - it doesn't seem to work with my custom complex numbers, so I had to write my own convolution, which is terribly slow. So now I am looking for ways to speed it up.
Would it work to turn things inside-out? I mean, instead of an array as the outer container holding small containers that each hold a couple of floating point values as a complex number, turn that around so that your complex number is the outer container. You'd have two arrays, one of plain floats for the real part and another for the imaginary part. The basic super-fast convolver can then do its job, although you'd have to call it four times, for all combinations of real/imaginary parts of the two factors.
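A sketch of that idea for the asker's "double" numbers (j*j = +1), assuming the components of the field and kernel are held in separate float arrays with the names below:

from scipy.signal import convolve

def convolve_double(field_r, field_i, kernel_r, kernel_i):
    # four real convolutions replace one convolution over the exotic type
    real = convolve(field_r, kernel_r, 'same') + convolve(field_i, kernel_i, 'same')
    imag = convolve(field_r, kernel_i, 'same') + convolve(field_i, kernel_r, 'same')
    return real, imag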
In color image processing, I have often refactored my code from using arrays of RGB values to three arrays of scalar values, and found a good speed-up due to simpler convolutions and other operations working much faster on arrays of bytes or floats.
YMMV, since locality of the components of the complex (or color) can be important.
I want to create a MATLAB-like cell array in Numpy. How can I accomplish this?
Matlab cell arrays are most similar to Python lists, since they can hold any object - but scipy.io.loadmat imports them as numpy object arrays, that is, arrays with dtype=object.
To be honest, though, you are just as well off using Python lists - if you are holding general objects you will lose almost all of the advantages of numpy arrays (which are designed to hold a sequence of values that each take the same amount of memory).
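For completeness, a tiny illustration of the object-array analogue mentioned above:

import numpy as np

cells = np.empty(3, dtype=object)   # each slot can hold any Python object, like a cell
cells[0] = [1, 2, 3]
cells[1] = 'some text'
cells[2] = np.eye(2)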