Subclassing numpy scalar types

Subclassing numpy scalar types - python

I'm trying to subclass numpy.complex64 in order to make use of the way numpy stores the data, (contiguous, alternating real and imaginary part) but use my own __add__, __sub__, ... routines.
My problem is that when I make a numpy.ndarray, setting dtype=mysubclass, I get a numpy.ndarray with dtype='numpy.complex64' in stead, which results in numpy not using my own functions for additions, subtractions and so on.
Example:
import numpy as np
class mysubclass(np.complex64):
pass
a = mysubclass(1+1j)
A = np.empty(2, dtype=mysubclass)
print type(a)
print repr(A)
Output:
<class '__main__.mysubclass'>
array([ -2.07782988e-20 +4.58546896e-41j, -2.07782988e-20 +4.58546896e-41j], dtype=complex64)'
Does anyone know how to do this?
Thanks in advance - Soren

The NumPy type system is only designed to be extended from C, via the PyArray_RegisterDataType function. It may be possible to access this functionality from Python using ctypes but I wouldn't recommend it; better to write an extension in C or Cython, or subclass ndarray as #seberg describes.
There's a simple example dtype in the NumPy source tree: newdtype_example/floatint.c. If you're into Pyrex, reference.pyx in the pytables source may be worth a look.

Note that scalars and arrays are quite different in numpy. np.complex64 (this is 32-bit float, just to note, not double precision). You will not be able to change the array like that, you will need to subclass the array instead and then override its __add__ and __sub__.
If that is all you want to do, it should just work otherwise look at http://docs.scipy.org/doc/numpy/user/basics.subclassing.html since subclassing an array is not that simple.
However if you want to use this type also as a scalar. For example you want to index scalars out, it gets more difficult at least currently. You can get a little further by defining __array_wrap__ to convert to scalars to your own scalar type for some reduce functions, for indexing to work in all cases it appears to me that you may have define your own __getitem__ currently.
In all cases with this approach, you still use the complex datatype, and all functions that are not explicitly overridden will still behave the same. #ecatmur mentioned that you can create new datatypes from the C side, if that is really what you want.

Related

Substitute numpy functions with Python only

I have a python function that employs the numpy package. It uses numpy.sort and numpy.array functions as shown below:
def function(group):
pre_data = np.sort(np.array(
[c["data"] for c in group[1]],
dtype = np.float64
))
How can I re-write the sort and array functions using only Python in such a way that I no longer need the numpy package?

It really depends on the code after this. pre_data will be a numpy.ndarray which means that it has array methods which will be really hard to replicate without numpy. If those methods are being called later in the code, you're going to have a hard time and I'd advise you to just bite the bullet and install numpy. It's popularity is a testament to it's usefulness...
However, if you really just want to sort a list of floats and put it into a sequence-like container:
def function(group):
pre_data = sorted(float(c['data']) for c in group[1])
should do the trick.

Well, it's not strictly possible because the return type is an ndarray. If you don't mind to use a list instead, try this:
pre_data = sorted(float(c["data"]) for c in group[1])

That's not actually using any useful numpy functions anyway
def function(group):
pre_data = sorted(float(c["data"]) for c in group[1])

Casting python list to cython ptr

For given ctype array or a python list, how does one cast python object to cython void ptr?
The way I'm doing it now is something like this(_arr is python list):
int *int_t = <int*> malloc(cython.sizeof(int) * len(_arr))
if int_t is NULL:
raise MemoryError()
for i in xrange(len(_arr)):
int_t[i] = _arr[i]
After this I have int_t, in which I have entire array. But I wan't the thing to be more general and support other types, not just int. Do I have to do the same thing for each type or is there any generic way in which this can be done?

First of all, you should be aware that numpy arrays support quite a range of data types. Here is a general introduction on how to work with numpy arrays in cython. If you want to do numerical stuff that should do the trick. Also, numpy does support arrays of Python objects. But then again, the array lookups are optimized but interaction with the objects is not, since they are still Python objects.
If you are thinking about doing this with arbitrary types, i.e. converting arbitrary Python objects into some kind of C type or C++ object, you have to do it manually for every type. And this makes sense, too, since automatic casting between a potentially dynamic data structure and a static one is not always obvious, especially not to a compiler.

numpy.array() exceptions thrown (sorry for the initially horrible title)

I am learning Python and numpy, and am new to the idea of duck typing. I'm writing functions into which something/someone should pass a numpy array. Trying to embrace duck typing, I'm writing my code to use numpy.array with the copy= and ndmin= options to convert array_likes or 1d/0d arrays into the shape I need. Specifically, I use the ndmin= option in cases where I can accept either a (p,p) array or a scalar; the scalar can be coded as an int, (1,) array, (1,1) array, [1] list, etc...
So to take care of this, I'm using something like S = numpy.array(S,copy=False,ndmin=2) to get an array (if possible) with the right ndim, then test for the shape as I need. I know I should embed this in a Try-Except block, but can't find any documentation about what kind of exception numpy.array() is likely to throw. Thus I currently just have this:
# duck covariance matrix into a 2d matrix
try:
S = numpy.array(S, ndmin = 2, copy=False)
except Exception as e:
raise e
What specific exception(s) should I try to catch here? Thanks.

Document your function as accepting an array_like object and leave handling of exceptions to a caller.
numpy.array() is very permissive function it will convert to an array almost anything.

Try using np.asarray to convert inputs into arrays instead. It's guaranteed not to copy anything if the input is already a Numpy array. If you expect to receive array subclasses, use np.asanyarray.
Note that a lot of Numpy interfaces don't care if the input is 1- or 2-dimensional -- for example, np.dot works with both 1- and 2-dimensional input. It's probably best to leave it that way -- that way, things like scalar multiplication just work.

Python-numpy test for ndarray using ndim

I'm working on a project in Python requiring a lot of numerical array calculations. Unfortunately (or fortunately, depending on your POV), I'm very new to Python, but have been doing MATLAB and Octave programming (APL before that) for years. I'm very used to having every variable automatically typed to a matrix float, and still getting used to checking input types.
In many of my functions, I require the input S to be a numpy.ndarray of size (n,p), so I have to both test that type(S) is numpy.ndarray and get the values (n,p) = numpy.shape(S). One potential problem is that the input could be a list/tuple/int/etc..., another problem is that the input could be an array of shape (): S.ndim = 0. It occurred to me that I could simultaneously test the variable type, fix the S.ndim = 0problem, then get my dimensions like this:
# first simultaneously test for ndarray and get proper dimensions
try:
if (S.ndim == 0):
S = S.copy(); S.shape = (1,1);
# define dimensions p, and p2
(p,p2) = numpy.shape(S);
except AttributeError: # got here because input is not something array-like
raise AttributeError("blah blah blah");
Though it works, I'm wondering if this is a valid thing to do? The docstring for ndim says
If it is not already an ndarray, a conversion is
attempted.
and we surely know that numpy can easily convert an int/tuple/list to an array, so I'm confused why an AttributeError is being raised for these types inputs, when numpy should be doing this
numpy.array(S).ndim;
which should work.

When doing input validation for NumPy code, I always use np.asarray:
>>> np.asarray(np.array([1,2,3]))
array([1, 2, 3])
>>> np.asarray([1,2,3])
array([1, 2, 3])
>>> np.asarray((1,2,3))
array([1, 2, 3])
>>> np.asarray(1)
array(1)
>>> np.asarray(1).shape
()
This function has the nice feature that it only copies data when necessary; if the input is already an ndarray, the data is left in-place (only the type may be changed, because it also gets rid of that pesky np.matrix).
The docstring for ndim says
That's the docstring for the function np.ndim, not the ndim attribute, which non-NumPy objects don't have. You could use that function, but the effect would be that the data might be copied twice, so instead do:
S = np.asarray(S)
(p, p2) = S.shape
This will raise a ValueError if S.ndim != 2.
[Final note: you don't need ; in Python if you just follow the indentation rules. In fact, Python programmers eschew the semicolon.]

Given the comments to #larsmans answer, you could try:
if not isinstance(S, np.ndarray):
raise TypeError("Input not a ndarray")
if S.ndim == 0:
S = np.reshape(S, (1,1))
(p, p2) = S.shape
First, you check explicitly whether S is a (subclass of) ndarray. Then, you use the np.reshape to copy your data (and reshaping it, of course) if needed. At last, you get the dimension.
Note that in most cases, the np functions will first try to access the corresponding method of a ndarray, then attempt to convert the input to a ndarray (sometimes keeping it a subclass, as in np.asanyarray, sometimes not (as in np.asarray(...)). In other terms, it's always more efficient to use the method rather than the function: that's why we're using S.shape and not np.shape(S).
Another point: the np.asarray, np.asanyarray, np.atleast_1D... are all particular cases of the more generic function np.array. For example, asarray sets the optional copy argument of array to False, asanyarray does the same and sets subok=True, atleast_1D sets ndmin=1, atleast_2d sets ndmin=2... In other terms, it's always easier to use np.array with the appropriate arguments. But as mentioned in some comments, it's a matter of style. Shortcuts can often improve readability, which is always an objective to keep in mind.
In any case, when you use np.array(..., copy=True), you're explicitly asking for a copy of your initial data, a bit like doing a list([....]). Even if nothing else changed, your data will be copied. That has the advantages of its drawbacks (as we say in French), you could for example change the order from row-first C to column-first F. But anyway, you get the copy you wanted.
With np.array(input, copy=False), a new array is always created. It will either point to the same block of memory as input if this latter was already a ndarray (that is, no waste of memory), or will create a new one "from scratch" if input wasn't. The interesting case is of course if input was a ndarray.
Using this new array in a function may or may not change the original input, depending on the function. You have to check the documentation of the function you want to use to see whether it returns a copy or not. The NumPy developers try hard to limit unnecessary copies (following the Python example), but sometimes it can't be avoided. The documentation should tell explicitly what happens, if it doesn't or it's unclear, please mention it.
np.array(...) may raise some exceptions if something goes awry. For example, trying to use a dtype=float with an input like ["STRING", 1] will raise a ValueError. However, I must admit I can't remember which exceptions in all the cases, please edit this post accordingly.

Welcome to stack-overflow. This comes down to almost a style choice, but the most common way I've seen to deal with this kind of situation is to convert the input to an array. Numpy provides some useful tools for this. numpy.asarray has already been mentioned, but here are a few more. numpy.at_least1d is similar to asarray, but reshapes () arrays to be (1,) numpy.at_least2d is the same as above but reshapes 0d and 1d arrays to be 2d, ie (3,) to (1, 3). The reason we convert "array_like" inputs to arrays is partly just because we're lazy, for example sometimes it can be easier to write foo([1, 2, 3]) than foo(numpy.array([1, 2, 3])), but this is also the design choice made within numpy itself. Notice that the following works:
>>> numpy.mean([1., 2., 3.])
>>> 2.0
In the docs for numpy.mean we can see that x should be "array_like".
Parameters
----------
a : array_like
Array containing numbers whose mean is desired. If `a` is not an
array, a conversion is attempted.
That being said, there are situations when you want to only accept arrays as arguments and not all "array_like" types.

Custom data types in numpy arrays

I'm creating a numpy array which is to be filled with objects of a particular class I've made. I'd like to initialize the array such that it will only ever contain objects of that class. For example, here's what I'd like to do, and what happens if I do it.
class Kernel:
pass
>>> L = np.empty(4,dtype=Kernel)
TypeError: data type not understood
I can do this:
>>> L = np.empty(4,dtype=object)
and then assign each element of L as a Kernel object (or any other type of object). It would be so neat were I able to have an array of Kernels, though, from both a programming point of view (type checking) and a mathematical one (operations on sets of functions).
Is there any way for me to specify the data type of a numpy array using an arbitrary class?

If your Kernel class has a predictable amount of member data, then you could define a dtype for it instead of a class. e.g. if it's parameterized by 9 floats and an int, you could do
kerneldt = np.dtype([('myintname', np.int32), ('myfloats', np.float64, 9)])
arr = np.empty(dims, dtype=kerneldt)
You'll have to do some coercion to turn them into objects of class Kernel every time you want to manipulate methods of a single kernel but that's one way to store the actual data in a NumPy array. If you want to only store a reference, then the object dtype is the best you can do without subclassing ndarray.

It has to be a Numpy scalar type:
http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#arrays-scalars-built-in
or a subclass of ndarray:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray

As far as I know, enforcing a single type for elements in a numpy.ndarray has to be done manually (unless the array contains Numpy scalars): there is no built-in checking mechanism (your array has dtype=object). If you really want to enforce a single type, you have to subclass ndarray and implement the checks in the appropriate methods (__setitem__, etc.).
If you want to implement operations on a set of functions (Kernel objects), you might be able to do so by defining the proper operations directly in your Kernel class. This is what I did for my uncertainties.py module, which handles numpy.ndarrays of numbers with uncertainties.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Subclassing numpy scalar types - python

Related

Substitute numpy functions with Python only

Casting python list to cython ptr

numpy.array() exceptions thrown (sorry for the initially horrible title)

Python-numpy test for ndarray using ndim

Custom data types in numpy arrays

Categories

Resources