Assume you have some function, call it function, in Python that works by looping: for example, it could be a function that evaluates a certain mathematical expression, e.g. x**2, for all elements of an array, e.g. [1, 2, ..., 100] (obviously this is a toy example). Would it be possible to write code such that, each time function goes through the loop and obtains a result, some other code is executed, e.g. print("Loop %s has been executed" % i)? So, in our example, when 1**2 has been computed, the program prints Loop 1 has been executed, then when 2**2 has been computed, it prints Loop 2 has been executed, and so on.
Note that the difficulty comes from the fact that I did not write the function; it is a preexisting function from some package (more specifically, the function I am interested in is GridSearchCV from the package scikit-learn).
The easiest way to do this would be to just copy the function's code into your own function, tweak it, and then use it. In your case, you would have to subclass GridSearchCV and override the _fit method. The problem with this approach is that it may not survive a package upgrade.
In your case, that's not necessary. You can just specify a verbosity level when creating the object:
GridSearchCV(verbose=100)
I'm not entirely sure what the verbosity number itself means. Here's the documentation from the package used internally that does the printing:
The verbosity level: if non zero, progress messages are printed. Above 50, the output is sent to stdout. The frequency of the messages increases with the verbosity level. If it is more than 10, all iterations are reported.
You can look at the source code if you really want to know what the verbosity number does in detail; I can't tell you precisely.
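For instance, a minimal sketch (the estimator and parameter grid here are made up for illustration):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {'C': [1, 10, 100]}
search = GridSearchCV(SVC(), param_grid, verbose=100)
# search.fit(X, y) will now print progress messages for each candidate
# fit instead of running silently (X and y being your training data).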
You could potentially use monkey-patching ("monkey" because it's hacky).
Assuming the library function is
def function(f):
    for i in range(100):
        i**2
and you want to add a print statement, you would need to copy the entire function into your own file and make your tiny edit:
def my_function(f):
    for i in range(100):
        i**2
        print("Loop %s" % i)
Now you overwrite the library function:
from library import module
module.function = my_function
Obviously this is not an easily maintainable solution (if your target library is upgraded, you might have to go through this process again), so make sure you use it only for temporary debugging purposes.
Related
I am designing Python assignments for a class. I define functions, write docstrings, and then implement them. Afterward, I'd like to remove all my implementations of the functions and replace only the code (not the docstrings, function names, or arguments) with a raise NotImplementedError.
Is there any tool (e.g. IDE) which removes all the code for me automatically, so that I don't have to replace the implemented function by myself? I was thinking about writing a small script, but I thought I might ask here before I do this ...
If anyone has written something similar or knows of a quick way to do this, I would appreciate it a lot.
Here's a minimal example of what I'd like to achieve:
test.py
def add(a,b):
    """
    Adds two numbers
    """
    return a+b

def multiply(a,b):
    """
    Multiplies two numbers
    """
    return a*b
should become in an automated fashion (and of course for much larger files):
test.py
def add(a,b):
    """
    Adds two numbers
    """
    raise NotImplementedError

def multiply(a,b):
    """
    Multiplies two numbers
    """
    raise NotImplementedError
I don't know of a tool that does specifically this, but Python provides great AST manipulation tools within its own standard library via the ast module. You'll need a third-party module to "unparse" the result after transformation back into regular Python code; after a quick search, astunparse seems to do the trick, although there seem to be many others.
Here's a bit of sample code to get you in the right direction. Obviously, you'll need to tweak this to get the exact behavior you want, especially if you want to provide classes instead of just top-level functions (as I've written nothing to handle that use case). But with a bit of Python knowledge, it should be possible to automate.
Also, this is Python 3 (which, as of the start of 2020, is the only supported Python version). If you're still on Python 2, it may require some modifications.
import ast
import astunparse

# Read our file using the built-in Python AST module.
with open('filename.py') as f:
    data = ast.parse(f.read(), 'filename.py')

# Loop through all top-level declarations.
for decl in data.body:
    # Only modify functions.
    if isinstance(decl, ast.FunctionDef):
        # The docstring is the first statement of the body, so we don't
        # want to change it. Instead, replace the rest of the body with
        # our pre-built "raise" call. Note that I figured out what "raise"
        # looked like in AST form by running
        #
        #     ast.dump(ast.parse("raise NotImplementedError()"))
        #
        decl.body[1:] = [ast.Raise(ast.Call(ast.Name('NotImplementedError'), [], []), None)]

# Use astunparse to pretty-print the result as Python code.
print(astunparse.unparse(data))
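(Side note: on Python 3.9 and later, the standard library includes ast.unparse, so the final line can be print(ast.unparse(data)) and the third-party astunparse dependency can be dropped.)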
It's definitely possible to automate, if you're willing to take the time to do it. If you're planning to do this for several assignments, or even over several semesters, you may consider making a script for it. But if you're just doing it once, it may be more worth your time to just do it by hand.
I've only recently learned about decorators, and despite reading nearly every search result I can find about this question, I cannot figure this out. All I want to do is define some function calc(x, y) and wrap its result with a series of external functions, without changing anything inside my function or its calls in the script, such as:
@tan
@sqrt
def calc(x,y):
    return (x+y)

### calc(x,y) = tan(sqrt(calc(x,y)))
### Goal is to have every call of calc in the script automatically nest like that.
After reading about decorators for almost 10 hours yesterday, I got the strong impression that this is what they are used for. I do understand that there are various ways to modify how the functions are passed to one another, but I can't find any obvious guide on how to achieve this. I read that maybe functools.wraps could be used for this purpose, but I cannot figure that out either.
Most of the desire here is to be able to quickly and easily test how different functions modify the results of others, without having to tediously wrap function calls in parentheses... That is, to avoid having to mess with parentheses at all, by having my modifier test functions defined on their own lines.
A decorator is simply a function that takes a function and returns another function.
def tan(f):
    import math
    def g(x,y):
        return math.tan(f(x,y))
    return g
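Extending the same pattern, a sqrt decorator looks identical, and the two can then be stacked exactly as in the question (a minimal sketch; math.tan and math.sqrt stand in for whatever modifier functions you want to test):

import math

def sqrt(f):
    def g(x, y):
        return math.sqrt(f(x, y))
    return g

@tan
@sqrt
def calc(x, y):
    return x + y

# Every call of calc in the script now computes tan(sqrt(x + y)):
print(calc(2, 2))  # tan(sqrt(4)) == tan(2.0)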
What is the difference between a coroutine, a continuation, and a generator?
I'll start with generators, seeing as they're the simplest case. As @zvolkov mentioned, they're functions/objects that can be repeatedly resumed without running to completion; when called, they will return (yield) a value and then suspend their execution. When called again, they will start up from where they last suspended execution and do their thing again.
A generator is essentially a cut-down (asymmetric) coroutine. The difference between a coroutine and a generator is that a coroutine can accept arguments after it's been initially called, whereas a generator can't.
It's a bit difficult to come up with a trivial example of where you'd use coroutines, but here's my best try. Take this (made up) Python code as an example.
def my_coroutine_body(*args):
    while True:
        # Do some funky stuff
        *args = yield value_im_returning
        # Do some more funky stuff

my_coro = make_coroutine(my_coroutine_body)

x = 0
while True:
    # The coroutine does some funky stuff to x, and returns a new value.
    x = my_coro(x)
    print(x)
An example of where coroutines are used is lexers and parsers. Without coroutines in the language or emulated somehow, lexing and parsing code needs to be mixed together even though they're really two separate concerns. But using a coroutine, you can separate out the lexing and parsing code.
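For a concrete flavor of that separation, here's a minimal sketch using ordinary Python generators as (asymmetric) coroutines; the token format is made up for illustration:

def parser():
    # Receives tokens one at a time via send(), with no lexing logic mixed in.
    while True:
        token = yield
        print("parsed:", token)

def lexer(text, target):
    # Produces tokens and hands each one to the parsing coroutine.
    for token in text.split():
        target.send(token)

p = parser()
next(p)  # prime the coroutine so it's waiting at its first yield
lexer("let x = 42", p)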
(I'm going to brush over the difference between symmetric and asymmetric coroutines. Suffice it to say that they're equivalent, you can convert from one to the other, and asymmetric coroutines--which are the most like generators--are the easier to understand. I was outlining how one might implement asymmetric coroutines in Python.)
Continuations are actually quite simple beasts. All they are, are functions representing another point in the program which, if you call it, will cause execution to automatically switch to the point that function represents. You use very restricted versions of them every day without even realising it. Exceptions, for instance, can be thought of as a kind of inside-out continuation. I'll give you a Python based pseudocode example of a continuation.
Say Python had a function called callcc(), and this function took two arguments, the first being a function, and the second being a list of arguments to call it with. The only restriction on that function would be that the last argument it takes will be a function (which will be our current continuation).
def foo(x, y, cc):
    cc(max(x, y))

biggest = callcc(foo, [23, 42])
print(biggest)
What would happen is that callcc() would in turn call foo() with the current continuation (cc), that is, a reference to the point in the program at which callcc() was called. When foo() calls the current continuation, it's essentially the same as telling callcc() to return with the value you're calling the current continuation with, and when it does that, it rolls back the stack to where the current continuation was created, i.e., when you called callcc().
The result of all of this would be that our hypothetical Python variant would print '42'.
A coroutine is one of several procedures that take turns doing their job and then pause to give control to the other coroutines in the group.
A continuation is a "pointer to a function" you pass to some procedure, to be executed ("continued with") when that procedure is done.
A generator (in .NET) is a language construct that can spit out a value, "pause" execution of the method, and then proceed from the same point when asked for the next value.
In newer versions of Python, you can send values to generators with generator.send(), which makes Python generators effectively coroutines.
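A minimal sketch of that send() mechanism, using a running-total accumulator purely for illustration:

def accumulator():
    total = 0
    while True:
        # yield hands the current total back and waits for the next send()
        value = yield total
        total += value

acc = accumulator()
next(acc)            # prime the generator: run up to the first yield
print(acc.send(10))  # 10
print(acc.send(5))   # 15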
The main difference between Python generators and other coroutine implementations, such as greenlet, is that in Python a yielded value can only go back to the caller, while in greenlet target.switch(value) can take you to a specific target coroutine and yield a value there, where the target would continue to run.
Scenario
Let test be the module we run as __main__. This module contains one global variable named primes, which is initialized in the module with the following assignment.
primes = []
The module also contains a function named pi, which alters this global variable:
def pi(n):
    global primes
    """Some code that modifies the global 'primes' variable"""
I then want to time said function using the built-in timeit module. I want to use the timeit.repeat function and take the minimum of the timings, as a way of improving the measurement's accuracy (instead of measuring just once, which may be subject to slow-down due to unrelated processes).
print(min(timeit.repeat('test.pi(50000)',
                        setup="import test",
                        number=1, repeat=10)) * 1000)
The problem is that the pi function behaves differently depending on the value of primes: I expected that, for each repetition, the import test statement in the setup parameter would re-run the primes = [] statement in test, thus 'resetting' primes so that the executed code would be identical for each repetition. But instead, the value of primes left over from the previous execution is reused, so I had to add the statement test.primes = [] to the setup parameter:
print(min(timeit.repeat('test.pi(50000)',
                        setup="import test \n" + "test.primes = []",
                        number=1, repeat=10)) * 1000)
Question
This leads me to the question: is there a direct way (i.e. in one statement) to 'reset' the values of all the global variables to what they were when they were first assigned in the module?
In this specific scenario adding that one statement to manually 'reset' primes works fine, but consider a case in which there are a lot of global variables, and you want to 'reset' all of them.
Side question
Why doesn't the statement import test re-run the initial primes = [] assignment?
Let's start with your side question, because it turns out that it's actually central to everything:
Why doesn't the statement import test re-run the initial primes = [] assignment?
Because, as explained in the docs on the import system and the import statement, what import test does is, loosely, this pseudocode:
if 'test' not in sys.modules:
    find, load (compiling if needed), and exec the module
    sys.modules['test'] = result
test = sys.modules['test']
OK, but why does it do that?
If you have two modules that both import the same module, they expect to see the same globals. And remember that types, functions, etc. defined at the top level of a module are all globals. For example, if sortedlist.py imports collections.abc to define class SortedList(collections.abc.Sequence), and scraper.py imports collections.abc to call isinstance(something, collections.abc.Sequence), you'd want a SortedList to pass that test, but it won't if the two are completely independent types that came from two different module objects that happen to have the same name.
If you have 12 modules that all import pandas as pd, you'd be running all the Pandas initialization code 12 times. Except that some of your modules also probably import each other, so they'd each be run multiple times, and import Pandas each time. How long do you think it would take to run all the Pandas initialization 60 times?
So, reusing existing modules is almost always what you want.
And when you don't, that's usually a sign that there's something wrong with your design (which may well be the case here).
But "almost always" isn't "always". So there are ways around it. None of them are usually a good idea for live code, but for things like unit tests and benchmarking, there are three basic options that are all fine, as long as the tradeoffs are the ones you want:
del sys.modules['test']. This is obviously pretty hacky, but it actually does exactly what you want here. Any existing references to the old module are completely untouched, but the next time anyone does import test, they're going to get a brand-new test module.
importlib.reload(test). This sounds great, but it may on the one hand be overkill (notice that it forces the module source to be recompiled, which you don't need), while on the other it may not be sufficient (it doesn't actually reset the globals—if your code does primes = [] at the top level, that line gets executed, so who cares, but if your code instead does, say, globals().setdefault('primes', []) inside the pi function, you care).
Instead of import test, manually do all the steps up through executing the module (see the examples in the importlib docs), but don't store it in sys.modules['test'] or in test, just store it in a local variable you discard after each test. This is probably the cleanest, although it does mean 6 lines of code instead of 1.
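Here's a sketch of that third option, following the "importing a source file directly" recipe from the importlib docs (it assumes test.py sits in the current directory; the function name is mine):

import importlib.util

def load_fresh_test():
    # Build and execute a brand-new module object on each call, without
    # registering it in sys.modules, so primes = [] runs again every time.
    spec = importlib.util.spec_from_file_location("test", "test.py")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

fresh = load_fresh_test()  # a local reference you discard after each test
fresh.pi(50000)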
So I have a time-critical section of code within a Python script, and I decided to write a Cython module (with one function -- all I need) to replace it. Unfortunately, the execution speed of the function I'm calling from the Cython module (which I'm calling within my Python script) isn't nearly as fast as I tested it to be in a variety of other scenarios. Note that I CANNOT share the code itself because of contract law! See the following cases, and take them as an initial description of my issue:
(1) Execute Cython function by using the Python interpreter to import the module and run the function. Runs relatively quickly (~0.04 sec on ~100 separate tests, versus original ~0.24 secs).
(2) Call Cython function within Python script at 'global' level (i.e. not inside any function). Same speed as case (1).
(3) Call Cython function within Python script, with Cython function inside my Python script's main function; tested with the Cython function in global and local namespaces, all with the same speed as case (1).
(4) Same as (3), but inside a simple for-loop within said Python function. Same speed as case (1).
(5) The problem! Same as (4), but inside yet another for-loop: the Cython function's execution time (whether called globally or locally) balloons to ~10 times that of the other cases, and this is where I need the function to be called. There is nothing odd to report about this loop, and I tested all of its components (adjusting/removing what I could). I also tried using a while loop for giggles, to no avail.
One thing I've yet to try is making this innermost loop a function and going from there. EDIT: Just tried this; no luck.
Thanks for any suggestions you have; I deeply regret not being able to share my code... it hurts my soul a little, but my client just can't have this code floating around. Let me know if there is any other information I can provide!
-The Real Problem and an Initial (ugly) Solution-
It turns out that the best hint in this scenario was the obvious one (as usual): it wasn't the for-loop that was causing the problem; why would it? After a few more tests, it became obvious that something about the way I was calling my Cython function was wrong, because I could call it elsewhere (using an input variable different from the one going to the 'real' Cython function) without the performance loss issue.
The underlying issue: data types. I wrote my Cython function to expect a list full of standard floats. Unfortunately, my code did this:
function_input = list(numpy_array_containing_npfloat64_data)  # yuck.
# type(function_input[0]) is numpy.float64
output = Cython_Function(function_input)
inside the Cython function:
def Cython_Function(list function_input):
    cdef many_vars
    """process lots of vars expecting C floats"""  # Slowness from converting numpy.float64's --> floats???
    # type(output) is list
    return output
I'm aware that I can play around more with types in the Cython function, which I may well do to avoid having to convert an existing numpy array to a list. Anyway, here is my current solution:
function_input = [float(x) for x in function_input]
I welcome any feedback and suggestions for improvement. The function_input numpy array doesn't really need the precision of numpy.float64, but it does get used a few times before getting passed to my Cython function.
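(For reference, numpy's own ndarray.tolist() already yields native Python floats, unlike list(arr), which keeps numpy.float64 scalars; a small sketch:)

import numpy as np

arr = np.array([1.0, 2.0, 3.0])
function_input = arr.tolist()   # native Python floats in one call
print(type(function_input[0]))  # <class 'float'>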
It could be that, while individually, each function call with the Cython implementation is faster than its corresponding Python function, there is more overhead in the Cython function call because it has to look up the name in the module namespace. You can try assigning the function to a local callable first, for example:
from module import function

def main():
    my_func = function  # bind the function to a local name once
    for i in sequence:  # 'sequence' as in your existing loop
        my_func()
If possible, you should try to include the loops within the Cython function, which would reduce the overhead of a Python loop to the (very minimal) overhead of a compiled C loop. I understand that it might not be possible (i.e. need references from a global/larger scope), but it's worth some investigation on your part. Good luck!
function_input = list(numpy_array_containing_npfloat64_data)

def Cython_Function(list function_input):
    cdef many_vars
I think the problem is in using the numpy array as a list ... can't you use the np.ndarray as input to the Cython function?
def Cython_Function(np.ndarray[np.float64_t, ndim=1] function_input):
    ....
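A fuller sketch of that idea (names are hypothetical; it assumes a .pyx file compiled with the NumPy headers available so that cimport numpy works):

import numpy as np
cimport numpy as np

def cython_function(np.ndarray[np.float64_t, ndim=1] data):
    # Typed buffer access: no per-element numpy.float64 -> float conversion,
    # so the array can be passed in directly, without list() or a comprehension.
    cdef Py_ssize_t i
    cdef double total = 0.0
    for i in range(data.shape[0]):
        total += data[i] * data[i]
    return total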