How do I properly create type hints if the value isn't assigned yet?
For example:
import numpy as np

class foo:
    def __init__(self):
        self.data: np.ndarray = None   # warning: expected ndarray, got None

    def load_data(self):
        self.data = np.loadtxt(...)
Now I obviously get a warning that type ndarray is expected, not None. What is an elegant solution to this? Do I just make up some ndarray, like data: np.ndarray = np.array([])? That seems just wrong to me, and I'm sure there is a better way of doing it.
I still prefer the None version, because if there is an error reading the numpy array, I will get an error like "can't calculate ... with type None". Then I immediately know it didn't read the file. Whereas if the array is just empty, I might get weird errors that I don't understand.
SOLUTION:
Thanks to the commenters for pointing this out. The solution is to import Optional from typing and then use Optional[np.ndarray] instead of np.ndarray.
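For reference, a minimal sketch of the class with this fix applied (on Python 3.10+ you could also write np.ndarray | None):

import numpy as np
from typing import Optional

class foo:
    def __init__(self):
        # None is now an allowed value for the attribute
        self.data: Optional[np.ndarray] = None

    def load_data(self):
        self.data = np.loadtxt(...)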
Consider using data: np.ndarray = np.empty([])
Or, if you know the dimensions of the array, initialize it with those dimensions.
See numpy.empty for more information.
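For example, a small sketch assuming you already know the shape (the 3x4 here is made up for illustration):

import numpy as np

# allocates the right amount of memory but leaves the values uninitialized
data: np.ndarray = np.empty((3, 4))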
Good Luck,
Ben
I am fairly new to Cython and I am wondering why the following takes very long:
# test.pyx
cimport numpy as np

cpdef test(a):
    cdef np.ndarray[dtype=int] b
    for i in range(10):
        b = a

# timing code (a separate Python module)
import functools, timeit, test
import numpy as np

a = np.array([1, 2, 3], dtype=int)
t = timeit.Timer(functools.partial(test.test, a))
print(t.timeit(1000000))
-> 0.5446977 seconds
If I comment out the cdef declaration, this is done in no time. If I declare "a" as np.ndarray in the function header, nothing changes. Also, id(a) == id(b), so no new objects are created.
Similar behaviour can be observed when calling a function that takes many ndarrays as args, e.g.
cpdef foo(np.ndarray a, np.ndarray b,np.ndarray c, ..... )
Can anybody help me? What am I missing here?
Edit:
I noticed the following:
This is slow:
cpdef foo(np.ndarray[dtype=int, ndim=1] a, np.ndarray[dtype=int, ndim=1] b, np.ndarray[dtype=int, ndim=1] c):
    return
This is faster:
def foo(np.ndarray[dtype=int, ndim=1] a, np.ndarray[dtype=int, ndim=1] b, np.ndarray[dtype=int, ndim=1] c):
    return
This is the fastest:
cpdef foo(a, b, c):
    return
The function foo() is called very frequently (many millions of times) in my project from many different locations and does some calculations with the three numpy arrays (however, it doesn't change their content).
I basically need the speed of knowing the data type inside the arrays while also having a very low function-call overhead. What would be the most adequate solution for this?
b = a generates a bunch of type checking that needs to identify whether the type of a is actually an ndarray and makes sure it exports the buffer protocol with an appropriate element type. In exchange for this one-off cost you get fast indexing of single elements.
If you're not doing indexing of single elements then typing as np.ndarray is literally pointless and you're pessimizing your code. If you are doing this indexing then you can get significant optimizations.
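A rough sketch of where that trade-off shows up; the function names and the long dtype are made up for illustration, and the typed version only pays off when the loop actually indexes elements:

# compare.pyx
cimport numpy as np

cpdef long sum_typed(np.ndarray[long, ndim=1] x):
    cdef long total = 0
    cdef Py_ssize_t i
    for i in range(x.shape[0]):
        total += x[i]      # raw element access, no Python overhead
    return total

cpdef long sum_untyped(x):
    cdef long total = 0
    for i in range(len(x)):
        total += x[i]      # every access goes through the Python C-API
    return total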
If I comment out the cdef declaration this is done in no time.
This is often a sign that the C compiler has realized the entire function does nothing and optimized it out completely. And therefore your measurement may be meaningless.
cpdef foo(np.ndarray a, np.ndarray b,np.ndarray c, ..... )
Just specifying the type as np.ndarray without specifying the element dtype usually gains you very little, and is probably not worthwhile.
If you have a function that you're calling millions of times then it is likely that the input arrays come from somewhere and can be pre-typed, probably with less frequency. For example, they might come from taking slices of a larger array?
The newer memoryview syntax (int[:]) is quick to slice, so for example if you already have a 2D memoryview (int[:,:] x) it's very quick to generate a 1D memoryview from it (e.g. x[:,0]), and it's quick to pass existing memoryviews into a cdef function with memoryview arguments. (Note that (a) I'm unsure whether all of this applies to np.ndarray too, and (b) setting up a fresh memoryview is likely to be about the same cost as an np.ndarray, so I'm only suggesting them because I know slicing is quick.)
Therefore my main suggestion is to move the typing outwards to try to reduce the number of fresh initializations of these typed arrays. If that isn't possible then I think you may be stuck.
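A rough sketch of what moving the typing outwards could look like with memoryviews (the names here are made up for illustration):

# outer.pyx: acquire the typed view once, then slice and pass it cheaply
cdef double row_sum(double[:] row):
    cdef double total = 0
    cdef Py_ssize_t i
    for i in range(row.shape[0]):
        total += row[i]
    return total

cpdef double column_sums(double[:, :] table):
    cdef double acc = 0
    cdef Py_ssize_t j
    for j in range(table.shape[1]):
        acc += row_sum(table[:, j])   # slicing a memoryview is quick
    return acc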
I have a list of type objects and want to construct a typing.Tuple object. Is there a way to do this?
tys = [int, str] # This is known only at runtime
x = typing.Tuple(tys) # TypeError: Type Tuple cannot be instantiated; use tuple() instead
Any help will be appreciated. Thanks.
Edit: For clarification about the question-
What I am trying to do: visit a Tuple node in the Python AST and annotate its type. The list tys is built up in a loop (so I could not start out with a tuple). Now, I definitely have to annotate it with a typing.Tuple object, hence the question.
I got the solution.
x = typing.Tuple[tuple(tys)] # This works
Edit: This works for all typing constructs. For example, while typing.Union(tys) and typing.Union[tys] will give an error, typing.Union[tuple(tys)] works. I am not sure if this is a general Python thing or whether it is special to the typing module. I will update this answer once I know.
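A small sketch of the pattern, assuming the element types are only known at runtime:

import typing

tys = [int, str]                    # known only at runtime
tup_t = typing.Tuple[tuple(tys)]    # typing.Tuple[int, str]
uni_t = typing.Union[tuple(tys)]    # typing.Union[int, str]
print(tup_t, uni_t)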
I'm trying to build a code that checks whether a given object is an np.array() in python.
if isinstance(obj,np.array()) doesn't seem to work.
I would truly appreciate any help.
isinstance(obj, numpy.ndarray) may work
The code below seems to work; use numpy.ndarray.
import numpy as np

l = [1, 2, 3, 4]
l_arr = np.array(l)

if isinstance(l_arr, np.ndarray):
    print("Type is np.array")
else:
    print("Type is not np.array")
Output:
Type is np.array
You could compare the type of the object being passed to the checking function with np.ndarray to check whether the given object is indeed an np.ndarray.
The sample code snippet for this should look something like:
if isinstance(obj, np.ndarray):
    pass  # proceed -> obj is an np.ndarray
else:
    pass  # not an np.ndarray
The type of what numpy.array returns is numpy.ndarray. You can determine that in the REPL by calling type(numpy.array([])). Note that this trick works even for things where the raw class is not publicly accessible. It's generally better to use the direct reference, but storing the return from type(someobj) for later comparison does have its place.
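A quick sketch comparing the two approaches:

import numpy as np

arr = np.array([1, 2, 3])

# direct reference to the class
print(isinstance(arr, np.ndarray))   # True

# discovering the class at runtime, handy when it is not publicly exposed
ArrayType = type(np.array([]))
print(isinstance(arr, ArrayType))    # True, since ArrayType is np.ndarray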
I'm trying to subclass numpy.complex64 in order to make use of the way numpy stores the data (contiguous, alternating real and imaginary part), but use my own __add__, __sub__, ... routines.
My problem is that when I make a numpy.ndarray, setting dtype=mysubclass, I get a numpy.ndarray with dtype='numpy.complex64' instead, which results in numpy not using my own functions for additions, subtractions and so on.
Example:
import numpy as np

class mysubclass(np.complex64):
    pass

a = mysubclass(1 + 1j)
A = np.empty(2, dtype=mysubclass)
print(type(a))
print(repr(A))
Output:
<class '__main__.mysubclass'>
array([ -2.07782988e-20 +4.58546896e-41j, -2.07782988e-20 +4.58546896e-41j], dtype=complex64)
Does anyone know how to do this?
Thanks in advance - Soren
The NumPy type system is only designed to be extended from C, via the PyArray_RegisterDataType function. It may be possible to access this functionality from Python using ctypes but I wouldn't recommend it; better to write an extension in C or Cython, or subclass ndarray as @seberg describes.
There's a simple example dtype in the NumPy source tree: newdtype_example/floatint.c. If you're into Pyrex, reference.pyx in the pytables source may be worth a look.
Note that scalars and arrays are quite different in numpy. np.complex64 is a scalar type (each component is a 32-bit float, just to note, not double precision). You will not be able to change the array like that; you will need to subclass the array instead and then override its __add__ and __sub__.
If that is all you want to do, it should just work; otherwise look at http://docs.scipy.org/doc/numpy/user/basics.subclassing.html, since subclassing an array is not that simple.
However, if you want to use this type also as a scalar, for example to index scalars out, it gets more difficult, at least currently. You can get a little further by defining __array_wrap__ to convert scalars to your own scalar type for some reduce functions; for indexing to work in all cases, it appears to me that you may have to define your own __getitem__ currently.
In all cases with this approach, you still use the complex datatype, and all functions that are not explicitly overridden will still behave the same. @ecatmur mentioned that you can create new datatypes from the C side, if that is really what you want.
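A rough sketch of the ndarray-subclass route mentioned above (a minimal illustration only; real subclasses usually also need __array_finalize__, see the subclassing docs linked earlier):

import numpy as np

class MyComplexArray(np.ndarray):
    def __new__(cls, input_array):
        # store the data as complex64 and view it as this subclass
        return np.asarray(input_array, dtype=np.complex64).view(cls)

    def __add__(self, other):
        # custom addition hook; here it just delegates to the normal ufunc
        return np.ndarray.__add__(self, other)

a = MyComplexArray([1 + 1j, 2 + 2j])
b = MyComplexArray([3 + 3j, 4 + 4j])
print(a + b)   # dispatches through MyComplexArray.__add__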
Is there a simple way to create an immutable NumPy array?
If one has to derive a class from ndarray to do this, what's the minimum set of methods that one has to override to achieve immutability?
You can make a numpy array unwriteable:
a = np.arange(10)
a.flags.writeable = False
a[0] = 1
# Gives: ValueError: assignment destination is read-only
Also see the discussion in this thread:
http://mail.scipy.org/pipermail/numpy-discussion/2008-December/039274.html
and the documentation:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flags.html
I have a subclass of Array at this gist: https://gist.github.com/sfaleron/9791418d7023a9985bb803170c5d93d8
It makes a copy of its argument and marks that as read-only, so you should only be able to shoot yourself in the foot if you are very deliberate about it. My immediate need was for it to be hashable, so I could use them in sets, so that works too. It isn't a lot of code, but about 70% of the lines are for testing, so I won't post it directly.
Note that it's not a drop-in replacement; it won't accept any keyword args like a normal Array constructor. Instances will behave like Arrays, though.
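For readers who don't want to open the gist, here is my own rough sketch of that idea (not the gist's actual code): copy the input, mark it read-only, and add a hash based on the contents:

import numpy as np

class FrozenArray(np.ndarray):
    def __new__(cls, data):
        # copy so later changes to the source cannot leak in
        arr = np.array(data).view(cls)
        arr.flags.writeable = False
        return arr

    def __hash__(self):
        # equal contents hash equally; fine as long as the data never changes
        return hash((self.shape, self.tobytes()))

a = FrozenArray([1, 2, 3])
s = {a}          # usable as a set member
# a[0] = 9       # would raise ValueError: assignment destination is read-only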
Setting the flag directly didn't work for me, but using ndarray.setflags did work:
a = np.arange(10)
a.setflags(write=False)
a[0] = 1 # ValueError