Unexpected performance loss when calling a Cython function from a Python script

So I have a time-critical section of code within a Python script, and I decided to write a Cython module (with one function -- all I need) to replace it. Unfortunately, the execution speed of the function I'm calling from the Cython module (which I'm calling within my Python script) isn't nearly as fast as I tested it to be in a variety of other scenarios. Note that I CANNOT share the code itself because of contract law! See the following cases, and take them as an initial description of my issue:
(1) Execute Cython function by using the Python interpreter to import the module and run the function. Runs relatively quickly (~0.04 sec on ~100 separate tests, versus original ~0.24 secs).
(2) Call Cython function within Python script at 'global' level (i.e. not inside any function). Same speed as case (1).
(3) Call Cython function within Python script, with Cython function inside my Python script's main function; tested with the Cython function in global and local namespaces, all with the same speed as case (1).
(4) Same as (3), but inside a simple for-loop within said Python function. Same speed as case (1).
(5) Problem! Same as (4), but inside yet another for-loop: the Cython function's execution time (whether called globally or locally) balloons to ~10 times that of the other cases, and this is exactly where I need the function to be called. There is nothing odd to report about this loop, and I tested all of its components (adjusting/removing what I could). I also tried using a 'while' loop for giggles, to no avail.
One thing I've yet to try is making this innermost loop a function and going from there. EDIT: Just tried this, no luck.
Thanks for any suggestions you have. I deeply regret not being able to share my code... it hurts my soul a little, but my client just can't have this code floating around. Let me know if there is any other information that I can provide!
-The Real Problem and an Initial (ugly) Solution-
It turns out that the best hint in this scenario was the obvious one (as usual): it wasn't the for-loop that was causing the problem; why would it? After a few more tests, it became obvious that something about the way I was calling my Cython function was wrong, because I could call it elsewhere (using an input variable different from the one going to the 'real' Cython function) without the performance loss issue.
The underlying issue: data types. I wrote my Cython function to expect a list full of standard floats. Unfortunately, my code did this:
function_input = list(numpy_array_containing_npfloat64_data)  # yuck.
# type(function_input[0]) is numpy.float64
output = Cython_Function(function_input)
inside the Cython function:
def Cython_Function(list function_input):
    cdef many_vars  # placeholder for the real cdef declarations
    """process lots of vars expecting C floats"""  # slowness from converting numpy.float64 --> C float???
    # type(output) is list
    return output
I'm aware that I can play around more with types in the Cython function, which I very well may do to prevent having to 'list' an existing numpy array. Anyway, here is my current solution:
function_input = [float(x) for x in function_input]
I welcome any feedback and suggestions for improvement. The function_input numpy array doesn't really need the precision of numpy.float64, but it does get used a few times before getting passed to my Cython function.
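For example, something along these lines (a rough sketch, not my actual code) would accept the numpy array directly via a typed memoryview, avoiding both the list() call and the per-element float() conversion:
# rough sketch only -- the body just stands in for the real work
def Cython_Function_typed(double[:] function_input):
    cdef Py_ssize_t i, n = function_input.shape[0]
    cdef double total = 0.0
    for i in range(n):
        total += function_input[i]   # elements arrive as C doubles, no boxing
    return total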

It could be that, while each individual call to the Cython implementation is faster than the corresponding Python function, there is more overhead per call because the function's name has to be looked up in the module namespace every time. You can try assigning the function to a local name first, for example:
from module import function

def main():
    my_func = function   # bind once to a local name
    for i in sequence:
        my_func()
If possible, you should try to include the loops within the Cython function, which would reduce the overhead of a Python loop to the (very minimal) overhead of a compiled C loop. I understand that it might not be possible (i.e. need references from a global/larger scope), but it's worth some investigation on your part. Good luck!
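For instance (purely illustrative, since the real per-item work isn't shown in the question), pushing the loop down into Cython might look like:
# illustrative sketch only -- the squaring stands in for the real per-item work
def process_all(double[:] values):
    cdef Py_ssize_t i
    cdef double v
    results = []
    for i in range(values.shape[0]):   # this loop runs as compiled C
        v = values[i] * values[i]      # stand-in for the real computation
        results.append(v)
    return results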

function_input = list(numpy_array_containing_npfloat64_data)

def Cython_Function(list function_input):
    cdef many_vars

I think the problem is in using the numpy array as a list... can't you pass the np.ndarray directly as the input to the Cython function?
def Cython_Function(np.ndarray[np.float64_t, ndim=1] function_input):
    ....
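A slightly fuller sketch of that idea; the body here is invented just to show the typing (the real computation isn't in the question):
# typed_input.pyx -- sketch only
import numpy as np
cimport numpy as np

def Cython_Function(np.ndarray[np.float64_t, ndim=1] function_input):
    cdef Py_ssize_t i
    cdef double acc = 0.0
    for i in range(function_input.shape[0]):
        acc += function_input[i]   # read directly as C doubles
    return acc
You would then call it as Cython_Function(numpy_array_containing_npfloat64_data), with no list() conversion at all.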

Related

How to remove function implementation from a python file?

I am designing Python assignments for a class. I define functions, write docstrings, and then implement them. Afterward, I'd like to remove all my implementations and replace only the function bodies (not the docstrings, function names, or arguments) with a raise NotImplementedError.
Is there any tool (e.g. an IDE feature) that removes all the code for me automatically, so that I don't have to gut each implemented function myself? I was thinking about writing a small script, but I thought I might ask here before I do...
If anyone has written something similar or knows of a quick way to do this, I would appreciate it a lot.
Here's a minimal example of what I'd like to achieve:
test.py
def add(a,b):
    """
    Adds two numbers
    """
    return a+b

def multiply(a,b):
    """
    Multiplies two numbers
    """
    return a*b
should become in an automated fashion (and of course for much larger files):
test.py
def add(a,b):
    """
    Adds two numbers
    """
    raise NotImplementedError

def multiply(a,b):
    """
    Multiplies two numbers
    """
    raise NotImplementedError
I don't know of a tool to do specifically this, but Python provides great AST manipulation tools within its own standard library via the ast module. You'll need a third-party module to "unparse" the result after transformation back into regular Python code, and after a quick search I found that astunparse seems to do the trick, although there are many others.
Here's a bit of sample code to get you in the right direction. Obviously, you'll need to tweak this to get the exact behavior you want, especially if you want to provide classes instead of just top-level functions (as I've written nothing to handle that use case). But with a bit of Python knowledge, it should be possible to automate.
Also, this is Python 3 (which, as of the start of 2020, is the only supported Python version). If you're still on Python 2, it may require some modifications.
import ast
import astunparse

# Read our file using the built-in Python AST module.
with open('filename.py') as f:
    data = ast.parse(f.read(), 'filename.py')

# Loop through all declarations.
for decl in data.body:
    # Only modify functions
    if isinstance(decl, ast.FunctionDef):
        # The docstring is the first statement of the body, so we don't
        # want to change it. Instead, replace the rest of the body with
        # our pre-built "raise" call. Note that I figured out what "raise"
        # looked like in AST form by running
        #
        #     ast.dump(ast.parse("raise NotImplementedError()"))
        #
        decl.body[1:] = [ast.Raise(ast.Call(ast.Name('NotImplementedError'), [], []), None)]

# Use astunparse to pretty print the result as Python code.
print(astunparse.unparse(data))
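On Python 3.9 and later you can drop the third-party dependency, since the standard library gained ast.unparse; a minimal variant of the same idea (same placeholder file name):
import ast

with open('filename.py') as f:
    tree = ast.parse(f.read(), 'filename.py')

for decl in tree.body:
    if isinstance(decl, ast.FunctionDef):
        # Keep the docstring (first statement); replace the rest of the body.
        decl.body[1:] = [ast.Raise(
            exc=ast.Call(func=ast.Name(id='NotImplementedError', ctx=ast.Load()),
                         args=[], keywords=[]),
            cause=None)]

ast.fix_missing_locations(tree)
print(ast.unparse(tree))   # ast.unparse is available from Python 3.9 onwards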
It's definitely possible to automate, if you're willing to take the time to do it. If you're planning to do this for several assignments, or even over several semesters, you may consider making a script for it. But if you're just doing it once, it may be more worth your time to just do it by hand.

How can I use a decorator to wrap the result of my function inside multiple external library functions?

I've only recently learned about decorators, and despite reading nearly every search result I can find about this question, I cannot figure this out. All I want to do is define some function "calc(x,y)", and wrap its result with a series of external functions, without changing anything inside of my function, nor its calls in the script, such as:
@tan
@sqrt
def calc(x,y):
    return (x+y)

### calc(x,y) = tan(sqrt(calc(x,y)))
### Goal is to have every call of calc in the script automatically nest like that.
After reading about decorators for almost 10 hours yesterday, I got the strong impression this is what they were used for. I do understand that there are various ways to modify how the functions are passed to one another, but I can't find any obvious guide on how to achieve this. I read that maybe functools wraps can be used for this purpose, but I cannot figure that out either.
Most of the desire here is to be able to quickly and easily test how different functions modify the results of others, without having to tediously wrap functions in parentheses... That is, to avoid having to mess with parentheses at all, keeping my modifier test functions defined on their own lines.
A decorator is simply a function that takes a function and returns another function.
def tan(f):
    import math
    def g(x,y):
        return math.tan(f(x,y))
    return g
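To get the tan(sqrt(calc(x, y))) nesting from the question, you would define a second decorator the same way and stack both on calc; a minimal sketch (calc's body is taken from the question):
import math

def tan(f):
    def g(x, y):
        return math.tan(f(x, y))
    return g

def sqrt(f):
    def g(x, y):
        return math.sqrt(f(x, y))
    return g

@tan
@sqrt
def calc(x, y):
    return x + y

print(calc(1, 3))   # equivalent to math.tan(math.sqrt(1 + 3))
Decorators apply bottom-up, so calc becomes tan(sqrt(calc)), and every call of calc in the script nests automatically.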

Reset global variables in timeit.repeat

Scenario
Let test be the module we run as __main__. This module contains one global variable named primes, which is initialized in the module with the following assignment.
primes = []
The module also contains a function named pi, which alters this global variable:
def pi(n):
    global primes
    """Some code that modifies the global 'primes' variable"""
I then want to time said function using the builtin timeit module. I want to use the timeit.repeat function and get the minimum value of the timing, as a way of improving the measurement's accuracy (instead of measuring just one time, which may be subject to slow-down due to unrelated processes).
print(min(timeit.repeat('test.pi(50000)',
                        setup="import test",
                        number=1, repeat=10)) * 1000)
The problem is that the pi function behaves differently depending on the value of primes: I expected that, for each repetition, the import test statement in the setup parameter would re-run the primes = [] statement in the test, thus 'resetting' primes so that the code being executed would be identical for each repetition. But, instead, the value of primes that resulted from the previous execution is used, so I had to add the statement test.primes = [] to the setup parameter:
print(min(timeit.repeat('test.pi(50000)',
                        setup="import test \n" + "test.primes = []",
                        number=1, repeat=10)) * 1000)
Question
This leads me to the question: is there a direct way (i.e. in one statement) to 'reset' the values of all the global variables to what they were when they were first assigned in the module?
In this specific scenario adding that one statement to manually 'reset' primes works fine, but consider a case in which there are a lot of global variables, and you want to 'reset' all of them.
Side quest-ion
Why doesn't the statement import test re-run the initial primes = [] assignment?
Let's start with your side question, because it turns out that it's actually central to everything:
Why doesn't the statement import test re-run the initial primes = [] assignment?
Because, as explained in the docs on the import system and the import statement, what import test does is, loosely, this pseudocode:
if 'test' not in sys.modules:
    find, load (compiling if needed), and exec the module
    sys.modules['test'] = result
test = sys.modules['test']
OK, but why does it do that?
If you have two modules that both import the same module, they expect to see the same globals. And remember that types, functions, etc. defined at the top level of a module are all globals. For example, if sortedlist.py imports collections.abc in order to define class SortedList(collections.abc.Sequence):, and scraper.py imports collections.abc in order to check isinstance(something, collections.abc.Sequence), you'd want a SortedList to pass that test. But it won't if the two are completely independent types that came from two different module objects that happen to have the same name.
If imports weren't cached and you have 12 modules that all import pandas as pd, you'd be running all of the Pandas initialization code 12 times. And since some of your modules probably also import each other, they'd each be run multiple times, importing Pandas each time. How long do you think it would take to run all of the Pandas initialization 60 times?
So, reusing existing modules is almost always what you want.
And when you don't, that's usually a sign that there's something wrong with your design (which may well be the case here).
But "almost always" isn't "always". So there are ways around it. None of them are usually a good idea for live code, but for things like unit tests and benchmarking, there are three basic options that are all fine, as long as the tradeoffs are the ones you want:
1. del sys.modules['test']. This is obviously pretty hacky, but it actually does exactly what you want here. Any existing references to the old module are left untouched, but the next time anyone does import test, they get a brand-new test module.
2. importlib.reload(test). This sounds great, but it may be overkill on the one hand (notice that it forces the module source to be recompiled, which you don't need), while on the other hand it may not be sufficient, because it re-executes the module's code in the existing namespace rather than a fresh one. If your code does primes = [] at the top level, that line gets re-run and you don't care; but if your code instead does, say, globals().setdefault('primes', []) inside the pi function, you do care.
3. Instead of import test, manually do all the steps up through executing the module (see the examples in the importlib docs, and the sketch below), but don't store it in sys.modules['test'] or in test; just store it in a local variable you discard after each test. This is probably the cleanest option, although it does mean 6 lines of code instead of 1.
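A rough sketch of option 3, assuming test.py sits next to the benchmark script (the file path and helper name are illustrative):
import importlib.util
import timeit

def load_fresh_test():
    # Build a brand-new module object from test.py without registering it
    # in sys.modules, so every repetition starts from freshly-run globals.
    spec = importlib.util.spec_from_file_location('test', 'test.py')
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# The setup statement runs once per repetition, so each timing gets its own module.
print(min(timeit.repeat('mod.pi(50000)',
                        setup='mod = load_fresh_test()',
                        globals=globals(),
                        number=1, repeat=10)) * 1000)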

GridSearchCV: print some expression each time a function completes a loop

Assume you have some function function in Python that works by looping: for example, it could be a function that evaluates a certain mathematical expression, e.g. x**2, for every element of an array, e.g. [1, 2, ..., 100] (obviously this is a toy example). Would it be possible to write code such that, each time function goes through a loop iteration and obtains a result, some code is executed, e.g. print("Loop %s has been executed" % i)? So, in our example, when 1**2 has been computed, the program prints Loop 1 has been executed, then when 2**2 has been computed, it prints Loop 2 has been executed, and so on.
Note that the difficulty comes from the fact that I did not write the function; it is a preexisting function from some package (more specifically, the function I am interested in is GridSearchCV from the scikit-learn package).
The easiest way to do this would be to just copy the function's code into your own function, tweak it, and then use it. In your case, you would have to subclass GridSearchCV and override the _fit method. The problem with this approach is that it may not survive a package upgrade.
In your case, that's not necessary. You can just specify a verbosity level when creating the object:
GridSearchCV(verbose=100)
I'm not entirely sure what the verbosity number itself means. Here's the documentation from the package used internally that does the printing:
The verbosity level: if non zero, progress messages are printed. Above 50, the output is sent to stdout. The frequency of the messages increases with the verbosity level. If it more than 10, all iterations are reported.
You can look at the source code if you really want to know what the verbosity number does. I can't tell.
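For context, a minimal construction might look like the following; the estimator and parameter grid here are placeholders for illustration, not taken from the question:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder estimator and grid, just to show where verbose goes.
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
search = GridSearchCV(SVC(), param_grid, verbose=3)
# search.fit(X, y)   # with verbose > 0, a progress line is printed as each fit completes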
You could potentially use monkey-patching ("monkey" because it's hacky)
Assuming the library function is
def function(f):
    for i in range(100):
        i**2
and you want to enter a print statement, you would need to copy the entire function into your own file, and make your tiny edit:
def my_function(f):
    for i in range(100):
        i**2
        print("Loop %s" % i)
Now you overwrite the library function:
from library import module
module.existing_function = my_function
Obviously this is not an easily maintainable solution (if your target library is upgraded, you might have to go through this process again), so make sure you use it only for temporary debugging purposes.

Will every Python statement involving a dictionary perform a hash table lookup?

Python is a kind of "script" programming language.
In this situation:
def dic_test():
    a={}
    a[0]=[0,0,0]
    for i in range(10000000):
        a[0][0]+=1
        a[0][1]+=1
        a[0][2]+=1
    print(a)

def no_dic_test():
    a={}
    a[0]=[0,0,0]
    target=a[0]
    for i in range(10000000):
        target[0]+=1
        target[1]+=1
        target[2]+=1
    print(a)
Will no_dic_test() be faster than dic_test()?
I thought yes, because Python is dynamic and each statement is translated separately.
I used profile to benchmark. The first function was slower than the second one, but the difference was slight.
First function: 5 function calls in 26.113 seconds
Second function: 5 function calls in 23.835 seconds
That is an extreme case. In my own code, with around 10k keys and 10k operations, direct use of the dictionary is faster. I am quite surprised.
To conclude: is there something like a static compiler (as in C) or a caching optimisation for dictionaries in Python, or are Python hash tables simply too fast for this to matter?
Thanks!
It's pretty obvious that the second function is doing far less work on each loop iteration.
The first function has to do a dict lookup (plus the list indexing) on every iteration, whereas the second function does the dict lookup once and keeps the result in a local variable.
There are runtimes like PyPy that spot the hot loop and JIT compile them for added performance, but the CPython runtime doesn't do this kind of optimisation yet.
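If you want to isolate the per-statement cost, a quick micro-benchmark of just the two access patterns can be run with timeit (a sketch; the loop counts are arbitrary):
import timeit

setup = "a = {0: [0, 0, 0]}; target = a[0]"

# The first statement pays for the a[0] dict lookup on every execution;
# the second only indexes a list that is already bound to a name.
print(timeit.timeit("a[0][0] += 1", setup=setup, number=10**6))
print(timeit.timeit("target[0] += 1", setup=setup, number=10**6))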
