I've got some code I'm porting to Cython which had a line like
my_list.sort(key=lambda x: x.attr[item])
Is there a nice pythonic way of avoiding the closure with some combination of itemgetter and attrgetter?
The key is to use the third-party functional package:
from functional import compose
from operator import attrgetter, itemgetter
my_list.sort(key=compose(itemgetter(item), attrgetter('attr')))
The accepted self answer feels like a mistake/de-optimization to me.
My guess is that the unstated problem is that closures aren't supported inside cpdef functions. However, closures are supported inside both cdef and def functions.
My view is that there's rarely a reason to use cpdef functions: they have all the disadvantages of def functions and all the disadvantages of cdef functions (plus a few more unique ones, like no closures), so I usually treat them as the worst of all worlds. Ideally you'd just decide whether something should be a Cython/C interface (cdef) or a Python interface (def) and use that. Also remember that the kind of function makes little to no difference to how the code inside it is compiled, so a blanket find/replace of def with cpdef really is pointless when porting to Cython.
Therefore for this case I would keep the closure as written in the original Python code and pick one of:
Just keep it as a def function.
If you really need to call it from both Python and Cython, and need the speed of a cdef call when calling it from Cython, then write a cdef function plus a really small def wrapper for it (see the sketch below).
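A minimal sketch of that second option, with illustrative names (the body is just a stand-in for whatever you're porting):
cdef double fast_impl(double x):
    # the real work; other Cython code calls this via a cheap C call
    return x * x

def fast(double x):
    # tiny Python-visible wrapper around the cdef function
    return fast_impl(x)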
The code in the accepted answer de-optimizes an attribute lookup and an index into at least(?) three Python function calls, plus that same attribute lookup and index.
I am rephrasing my original question based on some of the comments (I recognize that, as posted, the question was not as precise as it could have been) in the hope of finding an answer.
I have the following cdef function which, among various other parameters, accepts a function f() declared with the ftype signature shown below:
%%cython
ctypedef double (*ftype) (double)

cdef cy_myfunc(int a, ..., double x, ftype f):
    ...
    cdef double result
    result = f(x)
    return result
and I define a python function such as this:
def py_myfunc(a, ..., x, f):
    return cy_myfunc(a, ..., x, f)
I want to be able to call py_myfunc() and pass a user-defined Python function as the f parameter (the only prerequisite is that it should accept a double and return a double). Is this possible? If so, how can it be achieved in Cython? The f() function is called repeatedly within a loop, so I would like it to be as fast as possible.
I haven't tinkered with Cython in a while, but conceptually your approach is flawed. By hard-coding a C function type that can only ever be a plain pointer value, you limit yourself too much. A Python function can never be that: it will ALWAYS be represented by a PyObject, as is any callable.
Just don't declare f as an ftype; declare it as a normal Python object (or give it no type at all), and then simply call it. You need the full semantics of a passed PyObject* anyway, since Cython has no way to guarantee that the passed function isn't some completely different callable.
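A minimal sketch of that approach (the names here are illustrative, not the question's actual code):
%%cython
cdef double cy_apply(double x, object f):
    # f is an ordinary Python callable; every call goes through the
    # Python C API, which works for any callable but costs more than
    # a raw C function pointer would
    return f(x)

def py_apply(x, f):
    return cy_apply(x, f)

# usage from Python, e.g.: py_apply(2.0, math.sqrt)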
I was told the following extension type may not be very efficient due to the need for a Python object type-declaration for the use of DefaultDict. Can someone explain why this is, and whether DefaultDict could still be worth using (as opposed to dict)?
cdef class CythonClass(object):
    cdef int var1, var2
    cdef object defaultdict

    def __init__(self, a, b):
        self.var1 = a
        self.var2 = b
        self.defaultdict = DefaultDict(DefaultDict([]))
I may have overstated the efficiency part in my other answer. What I meant was: don't expect huge speedups (more than 1.5-2x) when you have to use Python objects.
You can use them, and it won't be slower than using them in Python code (except in very rare cases). However, the great power of Cython is that you can use native C types and homogeneous data structures like C arrays (which can be much faster than Python lists or even dictionaries), or, if you go the C++ route, vector, unordered_map and the like.
One point to remember when dealing with Python objects is that they are all pointers to structs, so each one adds a layer of indirection - that's true even for Python ints. A Cython int, however, is a C integer without that indirection. That's one of the main reasons why for-loops in Cython are much faster. (The trade-off is that C integers are limited to a 64-bit range instead of Python's unlimited precision.)
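For illustration, in a function like the following the loop compiles down to plain C arithmetic because i and total are declared as C ints (a minimal sketch):
def sum_below(int n):
    cdef int i
    cdef int total = 0
    for i in range(n):
        total += i  # no Python int objects are created or destroyed here
    return total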
Another point is that operating on Python objects means going through Python's lookups, Python's operations, and so on. For built-in types, though, Cython can use the Python C API to gain additional speed by avoiding the generic Python lookups (DefaultDict is not among them, I guess). For example, with declared dicts the following code compiles differently:
def dumb_func(dict dct, str key):
    return dct[key]  # translates to: __Pyx_PyDict_GetItem(__pyx_v_dct, __pyx_v_key)

def dumb_func(object dct, object key):  # different signature
    return dct[key]  # translates to: PyObject_GetItem(__pyx_v_dct, __pyx_v_key)
You can probably guess which one is faster: the one that addresses the dict directly (__Pyx_PyDict_GetItem is probably a sophisticated wrapper around PyDict_GetItem and PyDict_GetItemString) or the one that just addresses a generic Python object with PyObject_GetItem (going through the full Python lookup machinery). This won't be a huge speedup either, but it's noticeable.
In the end, I would say that normal (and declared) dicts will definitely be faster than DefaultDict in Cython code (unless it's some C or C++ class).
I know Python does not allow us to overload functions. However, does it have built-in overloaded methods?
Consider this:
setattr(object_name,'variable', 'value')
setattr(class_name,'method','function')
The first statement dynamically adds a variable to an object at run time, while the second attaches an external function to a class at run time.
The same function does different things based on its arguments. Is this function overload?
The function setattr(foo, 'bar', baz) is always the same as foo.bar = baz, regardless of the type of foo. There is no overloading here.
In Python 3, limited overloading is possible with functools.singledispatch, but setattr is not implemented with that.
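For example, singledispatch picks an implementation based on the type of its first argument (a small illustration; setattr itself is not implemented this way):
from functools import singledispatch

@singledispatch
def describe(value):
    return "something else"

@describe.register(int)
def _(value):
    return "an int"

@describe.register(str)
def _(value):
    return "a string"

print(describe(3))       # an int
print(describe("hi"))    # a string
print(describe([1, 2]))  # something else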
A far more interesting example, in my opinion, is type(). type() does two entirely different things depending on how you call it:
If called with a single argument, it returns the type of that argument.
If called with three arguments (of the correct types), it dynamically creates a new class.
Nevertheless, type() is not overloaded. Why not? Because it is implemented as one function that counts how many arguments it got and then decides what to do. In pure Python, this is done with the variadic *args syntax, but type() is implemented in C, so it looks rather different. It's doing the same thing, though.
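To illustrate both call forms:
print(type(42))                      # <class 'int'> - one argument: query the type
Point = type('Point', (), {'x': 0})  # three arguments: create a class dynamically
print(Point().x)                     # 0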
Python, in some sense, doesn't need function overloading where other languages do. Consider the following example in C:
int add(int x, int y) {
    return x + y;
}
If you wish to extend the notion to things that are not integers, you need to write another function:
float add(float x, float y) {
    return x + y;
}
In Python, all you need is:
def add(x, y):
    return x + y
It works fine for both, and it isn't considered function overloading. You can also handle different argument types explicitly using isinstance. The major issue, as pointed out by this question, is the number of arguments; but in your case you pass the same number of arguments, and even then there are ways around this without function overloading.
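For instance, a single function can branch on argument types itself (a minimal sketch):
def add(x, y):
    # numbers add directly; anything else falls back to string
    # concatenation, purely to show type-based branching
    if isinstance(x, (int, float)) and isinstance(y, (int, float)):
        return x + y
    return str(x) + str(y)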
Overloading methods is tricky in Python. However, you can get similar behaviour by passing dicts, lists or primitive values as keyword arguments.
I have tried something like this for my use cases; it may help illustrate how methods can be "overloaded".
Let's take an example: one method that different callers invoke with different argument shapes.
def add_bullet(sprite=None, start=None, headto=None, speed=None, acceleration=None):
    ...
Then pass different combinations of arguments from the calling code:
add_bullet(sprite='test', start=True, headto={'lat': 10.6666, 'long': 10.6666}, acceleration=10.6)
or
add_bullet(sprite='test', start=True, headto={'lat': 10.6666, 'long': 10.6666}, speed=['10', '20', '30'])
So a single method handles lists, dictionaries and primitive values, which approximates method overloading.
Try it out in your own code.
In Python, for a simple function foo(x, y) there are at least three ways I know of to bind the argument y to some value:
# defining a nested function:
def foobar(x):
    return foo(x, y=yval)

# using lambda
foobar = lambda x: foo(x, y=yval)

# using functools
from functools import partial
foobar = partial(foo, y=yval)
While I am doubtful that the list above is exhaustive, I also wonder which one I should go with. Are they all equivalent in terms of performance, safety and namespace handling, or are there extra overheads and caveats with each method? Why should functools define partial when the other methods are already there?
No, they're not all equivalent -- in particular, a lambda cannot be pickled and a functools.partial can, IIRC, be pickled only in recent Python versions (I can't find which exact version in the docs; it doesn't work in 2.6, but it does in 3.1). Neither can functions defined inside of other functions (neither in 2.6 nor 3.1).
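A quick illustration of the pickling difference (Python 3, run as a script so that foo is a module-level name):
import pickle
from functools import partial

def foo(x, y):
    return x + y

p = partial(foo, y=2)
print(pickle.loads(pickle.dumps(p))(3))  # 5 - the partial round-trips fine

try:
    pickle.dumps(lambda x: foo(x, 2))
except pickle.PicklingError as e:
    print("lambda cannot be pickled:", e)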
The reason for partial's appearance in the library is that it gives you an explicit idiom to partially apply a function inline. A definition (def) cannot appear in the middle of an expression such as
map(partial(foo, y=yval), xs)
Also, from a definition or lambda, it's not immediately clear that partial application is what's going on, since you can put an arbitrary expression in a lambda and arbitrary statements in a definition.
I suggest you go with partial unless you have a reason not to use it.
[And indeed, the list is not exhaustive. The first alternative that comes to mind is a callable object:
class foobar:
    def __init__(self, yval):
        self.__yval = yval

    def __call__(self, x):
        return foo(x, self.__yval)
but those are heavy-weights for such a simple problem.]
I have two functions like the following:
def fitnesscompare(x, y):
    if x.fitness > y.fitness:
        return 1
    elif x.fitness == y.fitness:
        return 0
    else:  # x.fitness < y.fitness
        return -1
that are used with 'sort' to sort on different attributes of class instances.
These are used from within other functions and methods in the program.
Can I make them visible everywhere rather than having to pass them to each object in which they are used?
Thanks
The best approach (to get the visibility you ask about) is to put this def statement in a module (say fit.py), import fit from any other module that needs access to items defined in this one, and use fit.fitnesscompare in any of those modules as needed.
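For example (a minimal sketch; the module and list names are just placeholders):
# fit.py
def fitnesscompare(x, y):
    ...  # the comparison function from the question

# any other module in the program
import fit
somelist.sort(cmp=fit.fitnesscompare)  # note: cmp= exists in Python 2 only, see below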
What you ask, and what you really need, may actually be different...:
as I explained in another post earlier today, custom comparison functions are not the best way to customize sorting in Python (which is why they're not even allowed any more in Python 3): rather, a custom key-extraction function will serve you much better (it's future-proof, more general, and faster). I.e., instead of calling, say
somelist.sort(cmp=fit.fitnesscompare)
call
somelist.sort(key=fit.fitnessextract)
where
def fitnessextract(x):
    return x.fitness
or, for really blazing speed,
import operator
somelist.sort(key=operator.attrgetter('fitness'))
Defining a function with def makes that function available within whatever scope you've defined it in. At module level, using def will make that function available to any other function inside that module.
Can you perhaps post an example of what is not working for you? The code you've posted appears to be unrelated to your actual problem.