Function not being called in multiprocessing - python

I am using multiprocessing as given below:
with ProcessPoolExecutor(max_workers=3) as exe:
    result = exe.map(self.extraction, tokens)
tokens is a list. The problem is that execution never gets into the extraction function. I tried print statements within the function; nothing prints at all.

I see two potential reasons why, with the code you provided, your function isn't being called but no error is raised either:
1 - self.extraction does not have the right signature. If the supplied function isn't one that takes exactly one argument, it won't be called. In the case of an instance function (method), this does not include self, so your method's signature should look like this:
def extraction(self, token):
    ...
I ran into this case myself a few weeks ago and it cost me a few hours. It's strange that the framework does not complain about the function's signature, and yet doesn't call it either.
2 - tokens is an empty iterable. Since the function will be called once for each item in the iterable, if it is empty, it will never be called. If tokens isn't an iterable at all, it seems that you get an error stating this.
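As a side note, a likely explanation for the silence: Executor.map returns a lazy iterator, and any exception raised in a worker (including a TypeError from a wrong signature) only surfaces when you iterate the result. A minimal sketch of this, using ThreadPoolExecutor (which shares Executor.map's semantics) and a made-up two-argument function:

```python
from concurrent.futures import ThreadPoolExecutor

def takes_two(a, b):  # wrong signature for mapping over a single iterable
    return a + b

with ThreadPoolExecutor(max_workers=2) as exe:
    result = exe.map(takes_two, [1, 2, 3])  # no error here: the result iterator is lazy

try:
    list(result)  # the TypeError only appears once results are consumed
    error_seen = None
except TypeError as exc:
    error_seen = type(exc).__name__

print(error_seen)
```

So if you never iterate over `result`, a failing worker function can stay completely silent.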

I found the issue and resolved it by referring to this link: Multiprocessing: How to use Pool.map on a function defined in a class?
My code works like this:
with ProcessPoolExecutor(max_workers=3) as exe:
    result = exe.map(ExtractSuggestions.extract_suggestion, tokens)
ExtractSuggestions is my class.

Related

Transparently passing through a function with a variable argument list

I am using Python RPyC to communicate between two machines. Since the link may be prone to errors I would like to have a generic wrapper function which takes a remote function name plus that function's parameters as its input, does some status checking, calls the function with the parameters, does a little more status checking and then returns the result of the function call. The wrapper should have no knowledge of the function, its parameters/parameter types or the number of them, or the return value for that matter, the user has to get that right; it should just pass them transparently through.
I understand the getattr(conn.root, function)() pattern to call the function, but my Python expertise runs out at populating the parameters. I have read various posts on the use of *args and **kwargs, in particular this one, which suggests that it is either difficult or impossible to do what I want to do. Is that correct and, if so, might there be a scheme which would work if I, say, ensured that all the function parameters were keyword parameters?
I do own both ends of this interface (the caller and the called) so I could arrange to dictionary-ise all the function parameters but I'd rather not make my API too peculiar if I could possibly avoid it.
Edit: the thing being called, at the remote end of the link, is a class with very ordinary methods, e.g.:
def exposed_a(self)
def exposed_b(self, thing1)
def exposed_c(self, thing1=None)
def exposed_d(self, thing1=DEFAULT_VALUE1, thing2=None)
def exposed_e(self, thing1, thing2, thing3=DEFAULT_VALUE1, thing4=None)
def exposed_f(self, thing1=None, thing2=None)
...where the types of each argument (and the return values) could be string, dict, number or list.
And it is indeed trivial; my Google fu had simply failed me in finding the answer. In the hope of helping anyone else who is inexperienced in Python and is having a bad Google day:
One simply takes *args and **kwargs as parameters and passes them directly on, with the asterisks attached. So in my case, to do my RPyC pass-through, where conn is the RPyC connection:
def my_passthru(conn, function_name, *function_args, **function_kwargs):
    # Do a check of something or other here
    return_value = getattr(conn.root, function_name)(*function_args, **function_kwargs)
    # Do another check here
    return return_value
Then, for example, a call to my exposed_e() method above might be:
return_value = my_passthru(conn, 'e', thing1, thing2, thing3)
(the exposed_ prefix being added automagically by RPyC in this case).
And of course one could put a try: / except ConnectionRefusedError: around the getattr() call in my_passthru() to generically catch the case where the connection has dropped underneath RPyC, which was my main purpose.
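A self-contained sketch of the forwarding pattern, with stand-ins for the RPyC connection (DummyRoot and DummyConn are invented here purely so the example runs without a server; in real RPyC the exposed_ prefix is resolved automatically rather than added by hand):

```python
class DummyRoot:
    """Stand-in for conn.root; exposed_e mirrors one of the signatures above."""
    def exposed_e(self, thing1, thing2, thing3="default3", thing4=None):
        return (thing1, thing2, thing3, thing4)

class DummyConn:
    root = DummyRoot()

def my_passthru(conn, function_name, *function_args, **function_kwargs):
    # status checks would go here
    # (the "exposed_" prefix is added manually only because this is a dummy)
    return getattr(conn.root, "exposed_" + function_name)(*function_args, **function_kwargs)

conn = DummyConn()
result = my_passthru(conn, "e", 1, 2, thing4="four")
print(result)  # (1, 2, 'default3', 'four')
```

Both positional and keyword arguments pass straight through untouched, which is exactly the transparency the question asks for.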

Running function code only when NOT assigning output to variable?

I am looking for a way in Python to skip certain parts of the code inside a function, but only when the output of the function is assigned to a variable. If the function is run without any assignment, then it should run everything inside it.
Something like this:
def function():
    print('a')
    return 'a'

function()
A = function()
The first time I call function() it should display a on the screen, while the second time nothing should print and the returned value should simply be stored into A.
I have not tried anything since I am kind of new to Python, but I was imagining it would be something like the if __name__=='__main__': way of checking if a script is being used as a module or run directly.
I don't think such a behavior can be achieved in Python, because within the scope of the function call there is no indication of what you will do with the returned value.
You will have to give the function an argument that tells it to skip/stop, with a default value to ease the call.
def call_and_skip(skip_instructions=False):
    if not skip_instructions:
        call_stuff_or_not()
    call_everytime()

call_and_skip()                                     # will not skip inside instructions
a_variable = call_and_skip(skip_instructions=True)  # will skip inside instructions
As already mentioned in comments, what you're asking for is not technically possible: a function has no (and cannot have any) knowledge of what the calling code will do with the return value.
For a simple case like your example snippet, the obvious solution is to just remove the print call from within the function and leave it out to the caller, ie:
def fun():
    return 'a'

print(fun())
Now I assume your real code is a bit more complex than this, so such a simple solution would not work. If that's the case, the solution is to split the original function into several distinct ones and let the caller choose which parts it wants to call. If you have complex state (local variables) that needs to be shared between the different parts, you can wrap the whole thing in a class, turning the sub-functions into methods and storing those variables as instance attributes.
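A minimal sketch of that class-based split (all names here are invented for illustration): the caller decides whether to invoke the printing step, and shared state lives on the instance:

```python
class Extractor:
    def __init__(self):
        self.value = 'a'           # shared state instead of a local variable

    def compute(self):
        return self.value          # the part every caller wants

    def report(self):
        print(self.compute())      # the part only interactive callers want

ex = Extractor()
ex.report()           # prints 'a'
quiet = ex.compute()  # nothing printed, value captured
```

The caller's choice of method replaces the impossible "did you assign my return value?" check.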

Why am I getting this TypeError?

RESOLVED: Okay, you guys probably won't believe this. I did a lot of digging and it turns out that all the files we are loading and using were created incorrectly. The files fail to conform with the code we are writing — the things we want to do in our program are simply not possible based on the current state of the files we load. I am currently working on fixing this. Sorry about the non-question, guys!
In Python I have code that essentially reads as follows:
partsList = getPartsList()  # this function returns a list
for part in partsList:
    ...
bar(partsList)

def bar(partsList):
    for part in partsList:
        ...
But when I run the code I get the following TypeError:
TypeError: iteration over non-sequence
This TypeError is in reference to the noted line:
def bar(partsList):
    for part in partsList:  # this is the line with the TypeError
        ...
How can this be? I know that partsList is not a non-sequence because just before my program calls bar(partsList), I explicitly iterate over partsList.
My function does not modify partsList before interacting with it, and I do not modify partsList when iterating through it prior to calling the function, yet somehow it changes from a list to a non-sequence when the function is called.
I am working entirely within a class so these are all methods actually; I just thought it would be easier to read if I present the code this way.
The following is in response to the comments:
I wish I could provide you all with the full code, but at the moment the program requires exactly 275 files to run and has 20+ .py files. I will mention that the method in question does employ recursion after iterating through its given list. I thought this might be linked to the error, but when attempting to print the list itself and its contents, the program gave the same TypeError before making it through the method even once, so I know this is not due to the recursion; it never even recursed.
Ok I have inserted print statements as follows (keep in mind these are within methods in a class):
def someMethod(self):
    ...
    partsList = self.getPartsList()  # this function returns a list
    for part in partsList:
        ...
    print partsList  # prints [object(1), object(2)]
    self.bar(partsList)

def bar(self, partsList):
    print partsList  # prints <filename.class instance at 0x04886148>
    for part in partsList:  # still gives me the TypeError
        ...
When I say filename.class I don't literally mean filename and class. You guys know what I mean.
Is the second print statement printing <filename.class instance at 0x04886148> because it is pointing to the actual partsList? I'm not entirely sure how pointers and references work in Python.
You don't define bar correctly: its first argument must be a reference to the object that calls it (self), and the second is the list you pass as the explicit argument.
def bar(self, partsList):
    for part in partsList:
        ...
Your answer is there in the print lines.
def bar(self, partsList):
    print partsList  # prints <filename.class instance at 0x04886148>
    for part in partsList:  # still gives me the TypeError
        ...
partsList isn't a list going into this method. Here is some tweaked, functioning example code based on yours:
class myClass():
    def someMethod(self):
        partsList = self.getPartsList()
        for part in partsList:
            print part
        self.bar(partsList)

    def bar(self, pList):
        print pList
        for part in pList:
            print part

    def getPartsList(self):
        return ['a', 'b', 'c']
Running this interactively gets me this:
>>> from fake_try import myClass
>>> x = myClass()
>>> x.someMethod()
a
b
c
['a', 'b', 'c']
a
b
c
You'll notice that when I called print pList I received a pretty-printed output of the list. You are receiving an object of your class type instead.
I understand and empathize with your situation. Having a large, complex program throwing errors can be quite painful to debug. Unfortunately, without seeing your entire code I don't think anyone here will be able to debug your issue, because my guess is that you are calling someMethod in a way that is unexpected in the actual code (or in an unexpected place), which is causing your issues.
There are a couple of ways you can debug this.
I am assuming that everything ran UNTIL you added the someMethod functionality? Revert your code to a state prior to the error and add lines one at a time (with dummy functions if necessary) to find exactly where the unexpected value is coming from. If you cannot revert, my first step would be to simplify all logic surrounding this issue. You have a function getPartsList() that's supposed to return a list. It looks like it does here, but make it even easier to check: make a dummy function that simply returns a fake list and see what the behavior is. Change things one step at a time until you iron out where the issue is.
You may not be familiar with the inspect module. Try importing inspect in your module and using inspect.getmembers(x), with x being the object you want more information about. I would probably use this in place of your print partsList in the bar method (something like inspect.getmembers(partsList)). I would guess that you're somehow passing a class instance there instead of the list; this should tell you what that class has been instantiated as.
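A quick, self-contained illustration of that kind of check (the Part class here is made up): printing the type of the argument immediately reveals whether a list or a class instance arrived:

```python
import inspect

class Part:
    def getPartsList(self):
        return ['a', 'b', 'c']

def bar(partsList):
    # type(...) is usually enough to spot the mix-up;
    # inspect.getmembers lists every attribute for a deeper look
    return type(partsList).__name__

p = Part()
print(bar(p.getPartsList()))  # 'list' - iteration will work
print(bar(p))                 # 'Part' - iterating this raises TypeError
print(len(inspect.getmembers(p)) > 0)  # getmembers returns (name, value) pairs
```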

Trying to understand Python memoization code snippet

In a recent Hacker Newsletter issue, this very useful article about decorators in Python was linked. I like the article and I think I understand most of the decorator examples. However, in the non-decorator memoization example, I'm very confused by the code:
def memoize(fn):
    stored_results = {}
    def memoized(*args):
        try:
            # try to get the cached result
            return stored_results[args]
        except KeyError:
            # nothing was cached for those args. let's fix that.
            result = stored_results[args] = fn(*args)
            return result
    return memoized
I'm confused about how this function would create a persisting dictionary stored_results that gets appended to. After re-reading it, copy/pasting it into my editor and playing with it, and looking online for help, I still don't understand what the syntax stored_results[args] = fn(*args) is really doing.
(1) The article suggests that the above code will return the function, but that now it will search a dictionary first before executing on novel arguments. How does this happen? Why isn't stored_results just local to memoize? Why doesn't it get destroyed when memoized is returned?
(2) Links to other questions or web resources that explain the argument passing here with *args would be helpful too. If *args is a list of arguments, why can we use the syntax stored_results[args], when normally you get a non-hashable error when trying to index a dictionary on a list?
Thanks for any clarifying thoughts.
If *args is a list of arguments, why can we use the syntax stored_results[args], when normally you get a non-hashable error when trying to index a dictionary on a list?
Because it's not a list, it's a tuple. Lists are mutable, so you can't define a meaningful hash function on them. Tuples, however, are immutable data structures.
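A two-line sketch of the difference:

```python
d = {}
d[(1, 2)] = "cached"        # a tuple is hashable, so it works as a dict key
try:
    d[[1, 2]] = "boom"      # a list is mutable, hence unhashable
except TypeError as exc:
    print(exc)              # unhashable type: 'list'
```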
Why isn't stored_results just local to memoize? Why doesn't it get destroyed when memoized is returned?
Because memoized is a closure over memoize's local scope: it holds a reference to the enclosing variables (here stored_results), so they persist as long as memoized itself does. memoize returns memoized and you assign it to a name (this is the effect of the decorator statement), so the closure, and all the captured values with it, stays alive.
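To see the persistence in action, here is the memoize function from the article with a call counter added (the calls list and square function are mine, for demonstration):

```python
def memoize(fn):
    stored_results = {}
    def memoized(*args):
        try:
            return stored_results[args]
        except KeyError:
            result = stored_results[args] = fn(*args)
            return result
    return memoized

calls = []

@memoize
def square(x):
    calls.append(x)   # record every real invocation
    return x * x

print(square(4), square(4), square(5))  # 16 16 25
print(len(calls))                       # 2 - the second square(4) hit the cache
```

stored_results survives between calls precisely because the returned memoized function keeps it alive through the closure.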

Can I be warned when I used a generator function by accident

I was working with generator functions and private functions of a class. I am wondering:
Why, when yielding (which in my case was by accident) in __someFunc, does that function just appear not to be called from within __someGenerator? Also, what is the terminology I want to use when referring to these aspects of the language?
Can the Python interpreter warn of such instances?
Below is an example snippet of my scenario.
class someClass():
    def __init__(self):
        pass

    # Copy and paste mistake where yield ended up in a regular function
    def __someFunc(self):
        print "hello"
        #yield True  # if yielding in this function it isn't called

    def __someGenerator(self):
        for i in range(0, 10):
            self.__someFunc()
            yield True
        yield False

    def someMethod(self):
        func = self.__someGenerator()
        while func.next():
            print "next"

sc = someClass()
sc.someMethod()
I got burned on this and spent some time trying to figure out why a function just wasn't getting called. I finally discovered I was yielding in a function I didn't want to yield in.
A "generator" isn't so much a language feature as a name for functions that "yield." Yielding is pretty much always legal. There's not really any way for Python to know that you didn't "mean" to yield from some function.
This PEP http://www.python.org/dev/peps/pep-0255/ talks about generators, and may help you understand the background better.
I sympathize with your experience, but compilers can't figure out what you "meant for them to do", only what you actually told them to do.
I'll try to answer the first of your questions.
A regular function, when called like this:
val = func()
executes its inside statements until it ends or a return statement is reached. Then the return value of the function is assigned to val.
If a compiler recognizes the function to actually be a generator and not a regular function (it does that by looking for yield statements inside the function -- if there's at least one, it's a generator), the scenario when calling it the same way as above has different consequences. Upon calling func(), no code inside the function is executed, and a special <generator> value is assigned to val. Then, the first time you call val.next(), the actual statements of func are being executed until a yield or return is encountered, upon which the execution of the function stops, value yielded is returned and generator waits for another call to val.next().
That's why, in your example, the function __someFunc didn't print "hello": its statements were not executed, because you called only self.__someFunc(), not self.__someFunc().next().
Unfortunately, I'm pretty sure there's no built-in warning mechanism for programming errors like yours.
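The lazy-call behavior described above fits in a few lines (Python 3 spelling, where .next() has become the built-in next()):

```python
def accidental_generator():
    print("hello")  # never runs on a bare call
    yield True

g = accidental_generator()  # no output: only a generator object is created
print(type(g).__name__)     # generator
print(next(g))              # now "hello" prints, then True is yielded
```

The bare call produces a generator object and runs none of the body, which is exactly why a stray yield silently turns a function into a no-op at call sites that expect a plain function.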
Python doesn't know whether you want to create a generator object for later iteration or call a function. But python isn't your only tool for seeing what's going on with your code. If you're using an editor or IDE that allows customized syntax highlighting, you can tell it to give the yield keyword a different color, or even a bright background, which will help you find your errors more quickly, at least. In vim, for example, you might do:
:syntax keyword Yield yield
:highlight yield ctermbg=yellow guibg=yellow ctermfg=blue guifg=blue
Those are horrendous colors, by the way. I recommend picking something better. Another option, if your editor or IDE won't cooperate, is to set up a custom rule in a code checker like pylint. An example from pylint's source tarball:
from pylint.interfaces import IRawChecker
from pylint.checkers import BaseChecker

class MyRawChecker(BaseChecker):
    """check for line continuations with '\' instead of using triple
    quoted string or parenthesis
    """
    __implements__ = IRawChecker

    name = 'custom_raw'
    msgs = {'W9901': ('use \\ for line continuation',
                      ('Used when a \\ is used for a line continuation instead'
                       ' of using triple quoted string or parenthesis.')),
            }
    options = ()

    def process_module(self, stream):
        """process a module
        the module's content is accessible via the stream object
        """
        for (lineno, line) in enumerate(stream):
            if line.rstrip().endswith('\\'):
                self.add_message('W9901', line=lineno)

def register(linter):
    """required method to auto register this checker"""
    linter.register_checker(MyRawChecker(linter))
The pylint manual is available here: http://www.logilab.org/card/pylint_manual
And vim's syntax documentation is here: http://www.vim.org/htmldoc/syntax.html
Because the return keyword is applicable in both generator functions and regular functions, there's nothing you could possibly check (as #Christopher mentions). The return keyword in a generator indicates that a StopIteration exception should be raised.
If you try to return with a value from within a generator (which doesn't make sense, since return just means "stop iteration"), the compiler will complain at compile-time -- this may catch some copy-and-paste mistakes:
>>> def foo():
... yield 12
... return 15
...
File "<stdin>", line 3
SyntaxError: 'return' with argument inside generator
I personally just advise against copy and paste programming. :-)
From the PEP:
Note that return means "I'm done, and have nothing interesting to
return", for both generator functions and non-generator functions.
We do this:
Generators have "generate" or "gen" in their name, and will have a yield statement in the body. That's pretty easy to check visually, since no method is much over 20 lines of code. Other methods don't have "gen" in their name.
Also, we do not ever use __ (double underscore) names under any circumstances: 32,000 lines of code, no __ names.
Whether a method is a generator or a non-generator is entirely a design question: what did the programmer "intend" to happen? The compiler can't easily validate your intent; it can only validate what you actually typed.
