In a recent Hacker Newsletter issue, this very useful article about decorators in Python was linked. I like the article and I think I understand most of the decorator examples. However, in the non-decorator memoization example, I'm very confused by the code:
def memoize(fn):
    stored_results = {}

    def memoized(*args):
        try:
            # try to get the cached result
            return stored_results[args]
        except KeyError:
            # nothing was cached for those args. let's fix that.
            result = stored_results[args] = fn(*args)
            return result

    return memoized
I'm confused about how this function creates a persistent dictionary, stored_results, that entries keep getting added to. After re-reading it, copy/pasting it into my editor and playing with it, and looking online for help, I still don't understand what the syntax stored_results[args] = fn(*args) is really doing.
(1) The article suggests that the above code will return the function, but that now it will search a dictionary first before executing on novel arguments. How does this happen? Why isn't stored_results just local to memoize? Why doesn't it get destroyed when memoized is returned?
(2) Links to other questions or web resources that explain the argument passing here with *args would be helpful too. If *args is a list of arguments, why can we use the syntax stored_results[args], when normally you get a non-hashable error when trying to index a dictionary on a list?
Thanks for any clarifying thoughts.
If *args is a list of arguments, why can we use the syntax stored_results[args], when normally you get a non-hashable error when trying to index a dictionary on a list?
Because it's not a list; it's a tuple. Lists are mutable, so you can't define a meaningful hash function on them. Tuples, however, are immutable data structures, which makes them hashable.
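You can check this yourself; here's a quick illustration (show_type and the cache dict are just example names):

def show_type(*args):
    # inside the function, args is a tuple of the positional arguments
    print(type(args))

show_type(1, 2)        # <class 'tuple'>

cache = {}
cache[(1, 2)] = 'ok'   # a tuple is hashable, so it works as a dict key
# cache[[1, 2]] = 'no' # would raise TypeError: unhashable type: 'list'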
Why isn't stored_results just local to memoize? Why doesn't it get destroyed when memoized is returned?
Because memoized closes over memoize's local variables; that shared name context is a closure. The closure persists because memoized holds a reference to it, and you return memoized and assign it to a global name (this is the effect of the decorator statement). As long as the closure persists, all the captured values persist as well.
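To see this in action, here's a minimal sketch using the memoize function above (slow_square and fast_square are example names):

def slow_square(n):
    print('computing...')
    return n * n

fast_square = memoize(slow_square)
fast_square(4)   # prints 'computing...' and returns 16
fast_square(4)   # returns 16 straight from stored_results; prints nothing

Each call to memoize creates a fresh stored_results dict, and the returned memoized function keeps it alive for as long as fast_square exists.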
I am using Python RPyC to communicate between two machines. Since the link may be prone to errors, I would like to have a generic wrapper function which takes a remote function name plus that function's parameters as its input, does some status checking, calls the function with the parameters, does a little more status checking, and then returns the result of the function call. The wrapper should have no knowledge of the function, its parameters/parameter types or the number of them, or the return value for that matter; the user has to get that right. It should just pass them through transparently.
I get the getattr(conn.root, function)() pattern to call the function, but my Python expertise runs out at populating the parameters. I have read various posts on the use of *args and **kwargs, in particular this one, which suggests that it is either difficult or impossible to do what I want to do. Is that correct and, if so, might there be a scheme which would work if I, say, ensured that all the function parameters were keyword parameters?
I do own both ends of this interface (the caller and the called) so I could arrange to dictionary-ise all the function parameters but I'd rather not make my API too peculiar if I could possibly avoid it.
Edit: the thing being called, at the remote end of the link, is a class with very ordinary methods, e.g.;
def exposed_a(self):
def exposed_b(self, thing1):
def exposed_c(self, thing1=None):
def exposed_d(self, thing1=DEFAULT_VALUE1, thing2=None):
def exposed_e(self, thing1, thing2, thing3=DEFAULT_VALUE1, thing4=None):
def exposed_f(self, thing1=None, thing2=None):
...where the types of each argument (and the return values) could be string, dict, number or list.
And it is indeed trivial; my Google-fu had simply failed me in finding the answer. In the hope of helping anyone else who is inexperienced in Python and is having a bad Google day:
One simply takes *args and **kwargs as parameters and passes them directly on, with the asterisks attached. So in my case, to do my RPyC pass-through, where conn is the RPyC connection:
def my_passthru(conn, function_name, *function_args, **function_kwargs):
    # Do a check of something or other here
    return_value = getattr(conn.root, function_name)(*function_args, **function_kwargs)
    # Do another check here
    return return_value
Then, for example, a call to my exposed_e() method above might be:
return_value = my_passthru(conn, 'e', thing1, thing2, thing3)
(the exposed_ prefix being added automagically by RPyC in this case).
And of course one could put a try: / except ConnectionRefusedError: around the getattr() call in my_passthru() to generically catch the case where the connection has dropped underneath RPyC, which was my main purpose.
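For instance, a minimal sketch of that error handling (assuming Python 3's built-in ConnectionRefusedError; what you do in the except clause depends on your application):

def my_passthru(conn, function_name, *function_args, **function_kwargs):
    try:
        return getattr(conn.root, function_name)(*function_args, **function_kwargs)
    except ConnectionRefusedError:
        # the connection has dropped underneath RPyC; log and re-raise,
        # or recover in whatever way suits the application
        raise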
I am using multiprocessing as given below:
with ProcessPoolExecutor(max_workers=3) as exe:
    result = exe.map(self.extraction, tokens)
tokens is a list. The problem is that it's not getting into the extraction function. I tried print statements within the function; nothing prints at all.
I see two potential reasons why, with the code you provided, your function isn't being called but no error occurs either:
1 - self.extraction does not have the right signature. If the supplied function isn't one that takes exactly one argument, it won't be called. In the case of an instance function (method), this does not include self. So your method's signature should look like this:
def extraction(self, token):
    ...
I ran into this case myself a few weeks ago and it cost me a few hours. It's strange that the framework does not complain about the function's signature, and yet doesn't call it either.
2 - tokens is an empty iterable. Since the function will be called once for each item in the iterable, if it is empty, it will never be called. If tokens isn't an iterable at all, it seems that you get an error stating this.
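For reference, here is a minimal self-contained version of that shape which does work (the Extractor class and token values are illustrative):

from concurrent.futures import ProcessPoolExecutor

class Extractor:
    def extraction(self, token):   # exactly one argument besides self
        return token.upper()

if __name__ == '__main__':
    tokens = ['a', 'b', 'c']       # must be a non-empty iterable
    with ProcessPoolExecutor(max_workers=3) as exe:
        result = exe.map(Extractor().extraction, tokens)
    print(list(result))            # ['A', 'B', 'C']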
I found the issue and resolved it by referring to this link: Multiprocessing: How to use Pool.map on a function defined in a class?
My code is working like this :
with ProcessPoolExecutor(max_workers=3) as exe:
    result = exe.map(ExtractSuggestions.extract_suggestion, tokens)
(ExtractSuggestions is my class.)
I've been reading about why people should not use an empty list as a default function argument and should instead pass None, and I came up with a few questions.
Here is my code:
def add_employee(emp, emp_list=None):
    if emp_list is None:
        emp_list = []
    emp_list.append(emp)
    print(emp_list)
And here is code without second argument's type specified:
def add_employee(emp, emp_list):
    emp_list.append(emp)
    return emp_list
When I do not give emp_list a default of None (or an empty list), I cannot use the function's default-argument behavior: I can't call it like add_employee('Mark'); I have to pass a second variable. Why is it good to have that backup default behavior? Why couldn't I just leave it as emp_list?
Here is a great explanation of why you should avoid using mutable arguments in your defaults. Or at least use them sparingly: link
The general gist of it is that the list is created for the first time (in one location in memory) when you define the function. As Python reads and interprets the code, that list gets created exactly once, on the first 'read' of the function definition.
What this means is that you are not creating a fresh list each time you call that function, only when you define it.
To illustrate I will use the example from the link I shared above:
def some_func(default_arg=[]):
    default_arg.append("some_string")
    return default_arg

>>> some_func()
['some_string']
>>> some_func()
['some_string', 'some_string']
>>> some_func([])
['some_string']
>>> some_func()
['some_string', 'some_string', 'some_string']
If I understood your question correctly, you are asking why you're better off defining the emp_list explicitly rather than having it outside the function. In my mind it boils down to encapsulation. You're essentially trying to make sure that your function doesn't change the behavior of anything outside of its scope so you're forced to pass it things directly and be explicit. In practice if you have a variable outside of the scope named emp_list, it is absolutely fine to just append to it as long as you understand what the expected behavior is.
If you pass a list as emp_list in the first bit of code, then the name emp_list refers to your list. The if statement checks whether that list is None, and since the check fails, the next line (which would assign a fresh empty list) is skipped.
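To see the two paths in action with the None-default version above (a quick illustration; the names are made up):

team = ['Alice']
add_employee('Bob', team)   # prints ['Alice', 'Bob']: your list is used as-is
add_employee('Mark')        # prints ['Mark']: a fresh [] is created for this call
add_employee('Jane')        # prints ['Jane'], not ['Mark', 'Jane']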
I am looking for a way in Python to skip certain parts of the code inside a function, but only when the output of the function is assigned to a variable. If the function is run without any assignment, then it should run everything inside it.
Something like this:
def function():
    print('a')
    return 'a'

function()
A = function()
The first time I call function() it should display a on the screen; the second time nothing should print, and the returned value should only be stored into A.
I have not tried anything since I am kind of new to Python, but I was imagining it would be something like the if __name__=='__main__': way of checking if a script is being used as a module or run directly.
I don't think such behavior can be achieved in Python, because within the scope of the function call there is no indication of what you will do with the returned value.
You will have to give the function an argument that tells it to skip those instructions, with a default value to ease the call.
def call_and_skip(skip_instructions=False):
    if not skip_instructions:
        call_stuff_or_not()
    call_everytime()

call_and_skip()
# will not skip the inside instructions

a_variable = call_and_skip(skip_instructions=True)
# will skip the inside instructions
As already mentioned in the comments, what you're asking for is not technically possible: a function has no knowledge (and cannot have any) of what the calling code will do with the return value.
For a simple case like your example snippet, the obvious solution is to just remove the print call from within the function and leave it to the caller, i.e.:
def fun():
    return 'a'

print(fun())
Now I assume your real code is a bit more complex than this, so such a simple solution would not work. If that's the case, the solution is to split the original function into several distinct ones and let the caller choose which parts it wants to call, as sketched below. If you have complex state (local variables) that needs to be shared between the different parts, you can wrap the whole thing in a class, turning the sub-functions into methods and storing those variables as instance attributes.
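For instance, a rough sketch of that class-based split (all names here are illustrative):

class Report:
    def __init__(self):
        self.value = 'a'      # shared state lives on the instance

    def compute(self):
        return self.value     # the part whose result the caller wants

    def display(self):
        print(self.value)     # the part the caller runs only sometimes

report = Report()
report.display()              # caller chooses to print...
A = report.compute()          # ...or to just grab the value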
I was working with generator functions and private functions of a class. I am wondering:
Why, when yielding (which in my case was by accident) in __someFunc, does that function just appear not to be called from within __someGenerator? Also, what is the terminology to use when referring to these aspects of the language?
Can the Python interpreter warn of such instances?
Below is an example snippet of my scenario.
class someClass():
    def __init__(self):
        pass

    # Copy and paste mistake where yield ended up in a regular function
    def __someFunc(self):
        print "hello"
        # yield True  # if yielding in this function it isn't called

    def __someGenerator(self):
        for i in range(0, 10):
            self.__someFunc()
            yield True
        yield False

    def someMethod(self):
        func = self.__someGenerator()
        while func.next():
            print "next"

sc = someClass()
sc.someMethod()
I got burned on this and spent some time trying to figure out why a function just wasn't getting called. I finally discovered I was yielding in a function I didn't mean to yield in.
A "generator" isn't so much a language feature, as a name for functions that "yield." Yielding is pretty much always legal. There's not really any way for Python to know that you didn't "mean" to yield from some function.
This PEP http://www.python.org/dev/peps/pep-0255/ talks about generators, and may help you understand the background better.
I sympathize with your experience, but compilers can't figure out what you "meant for them to do", only what you actually told them to do.
I'll try to answer the first of your questions.
A regular function, when called like this:
val = func()
executes the statements in its body until it ends or a return statement is reached. Then the return value of the function is assigned to val.
If the compiler recognizes the function to actually be a generator and not a regular function (it does that by looking for yield statements inside the function: if there's at least one, it's a generator), calling it the same way as above has different consequences. Upon calling func(), no code inside the function is executed, and a special <generator> value is assigned to val. Then, the first time you call val.next(), the actual statements of func are executed until a yield or return is encountered; at that point the execution of the function stops, the yielded value is returned, and the generator waits for another call to val.next().
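You can see the difference in an interactive session (Python 2 syntax, to match the snippet above):

>>> def func():
...     print('executing')
...     yield True
...
>>> val = func()   # nothing is printed; val is a generator object
>>> val.next()     # only now does the body run, up to the first yield
executing
True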
That's why, in your example, function __someFunc didn't print "hello" -- its statements were not executed, because you haven't called self.__someFunc().next(), but only self.__someFunc().
Unfortunately, I'm pretty sure there's no built-in warning mechanism for programming errors like yours.
Python doesn't know whether you want to create a generator object for later iteration or call a function. But python isn't your only tool for seeing what's going on with your code. If you're using an editor or IDE that allows customized syntax highlighting, you can tell it to give the yield keyword a different color, or even a bright background, which will help you find your errors more quickly, at least. In vim, for example, you might do:
:syntax keyword Yield yield
:highlight Yield ctermbg=yellow guibg=yellow ctermfg=blue guifg=blue
Those are horrendous colors, by the way. I recommend picking something better. Another option, if your editor or IDE won't cooperate, is to set up a custom rule in a code checker like pylint. An example from pylint's source tarball:
from pylint.interfaces import IRawChecker
from pylint.checkers import BaseChecker


class MyRawChecker(BaseChecker):
    """check for line continuations with '\' instead of using triple
    quoted string or parenthesis
    """

    __implements__ = IRawChecker

    name = 'custom_raw'
    msgs = {'W9901': ('use \\ for line continuation',
                      ('Used when a \\ is used for a line continuation instead'
                       ' of using triple quoted string or parenthesis.')),
            }
    options = ()

    def process_module(self, stream):
        """process a module

        the module's content is accessible via the stream object
        """
        for (lineno, line) in enumerate(stream):
            if line.rstrip().endswith('\\'):
                self.add_message('W9901', line=lineno)


def register(linter):
    """required method to auto register this checker"""
    linter.register_checker(MyRawChecker(linter))
The pylint manual is available here: http://www.logilab.org/card/pylint_manual
And vim's syntax documentation is here: http://www.vim.org/htmldoc/syntax.html
Because the return keyword is applicable in both generator functions and regular functions, there's nothing you could possibly check (as #Christopher mentions). The return keyword in a generator indicates that a StopIteration exception should be raised.
If you try to return with a value from within a generator (which doesn't make sense, since return just means "stop iteration"), the compiler will complain at compile-time -- this may catch some copy-and-paste mistakes:
>>> def foo():
... yield 12
... return 15
...
File "<stdin>", line 3
SyntaxError: 'return' with argument inside generator
I personally just advise against copy and paste programming. :-)
From the PEP:
Note that return means "I'm done, and have nothing interesting to
return", for both generator functions and non-generator functions.
We do this.
Generators have "generate" or "gen" in their name, and they have a yield statement in the body. That's pretty easy to check visually, since no method is much over 20 lines of code.
Other methods don't have "gen" in their name.
Also, we do not ever use __ (double underscore) names under any circumstances. 32,000 lines of code; no __ names.
The "generator vs. non-generator" method function is entirely a design question. What did the programmer "intend" to happen. The compiler can't easily validate your intent, it can only validate what you actually typed.