I wrote some code which looks something like this:
import time
import pp

class gener(object):
    def __init__(self, n):
        self.n = n
    def __call__(self):
        return self
    def __iter__(self):
        n = self.n
        for i in range(n):
            time.sleep(2)
            yield i

def gen():
    return gener(3)

job_server = pp.Server(4)
job = job_server.submit(
    gen,
    args=(),
    depfuncs=("gener",),
    modules=("time",),
)
print job()
Following @zilupe's useful comment, I got the following output:
<__main__.gener object at 0x7f862dc18a90>
How can I make the iteration inside class gener run in parallel?
I'd like to run it in parallel with other functions.
I need this generator-like class because I want to replace some code in a module of a rather complicated program package, and it would be hard to refactor that code.
I have tried a lot of things without any success so far.
I have searched this site fairly thoroughly, but couldn't find an answer that fits my case.
Anybody, any help?
Thank you.
ADDITION:
Following @zilupe's useful comments, here is some additional information:
The main purpose is to parallelize the iteration inside class gener:
def __iter__(self):
    n = self.n
    for i in range(n):
        time.sleep(2)
        yield i
I only created the gen() function because I wasn't able to figure out how to use the class directly in submit().
This doesn't answer your question but you get the error because depfuncs has to be a tuple of functions, not strings -- the line should be: depfuncs=(gener,),
Do you realise that function gen is only creating an instance of gener and is not actually calling it?
What are you trying to parallelise here -- creation of a generator or iteration over the generator? If it's the iteration, you should probably create the generator first, pass it to gen in args and then in gen iterate over it:
def gen(g):
    for _ in g:
        pass

job_server.submit(gen, args=(gener(3),), ...)
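If the goal is to parallelize the per-item work inside __iter__ itself, one option is to submit each item's work as a separate pp job. This is only a rough sketch, not the asker's actual code: slow_step is a hypothetical stand-in for whatever each iteration really does.
import time
import pp

def slow_step(i):
    # hypothetical stand-in for the work done on each item in gener.__iter__
    time.sleep(2)
    return i

job_server = pp.Server(4)
# submit one job per item; the jobs run concurrently on the worker processes
jobs = [job_server.submit(slow_step, (i,), (), ("time",)) for i in range(3)]
for job in jobs:
    print job()  # blocks until that job finishes, then prints its result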
I would like a way to limit the calling of a function to once per values of parameters.
For example
def unique_func(x):
    return x
>>> unique_func([1])
[1]
>>> unique_func([1])
*** wont return anything ***
>>> unique_func([2])
[2]
Any suggestions? I've looked into using memoization but haven't landed on a solution just yet.
This is not solved by the suggested Prevent a function from being called twice in a row, since that only covers the case where the previous call had the same parameters.
Memoization uses a mapping of arguments to return values. Here, you just want a mapping of arguments to None, which can be handled with a simple set.
def idempotize(f):
    cache = set()
    def _(x):
        if x in cache:
            return
        cache.add(x)
        return f(x)
    return _

@idempotize
def unique_fun(x):
    ...
With some care, this can be generalized to handle functions with multiple arguments, as long as they are hashable.
def idempotize(f):
    cache = set()
    def _(*args, **kwargs):
        k = (args, frozenset(kwargs.items()))
        if k in cache:
            return
        cache.add(k)
        return f(*args, **kwargs)
    return _
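For illustration (my own example, using the generalized version above, with a hypothetical function named pair), a second call with the same hashable arguments silently returns None:
@idempotize
def pair(x, y):
    return (x, y)

>>> pair(1, 2)
(1, 2)
>>> pair(1, 2)   # second call with the same arguments returns None
>>> pair(3, 4)
(3, 4)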
Consider using the built-in functools.lru_cache() instead of rolling your own.
It won't return nothing on the second call with the same arguments (it will return the same result as the first call), but maybe you can live with that. It seems like a negligible price to pay compared to the advantages of using something that's maintained as part of the standard library.
This requires your argument x to be hashable, so it won't work with lists. Strings are fine.
from functools import lru_cache

@lru_cache()
def unique_fun(x):
    ...
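If the arguments really are lists, as in the question, one possible workaround (just a sketch on my part, not part of the answer above) is to convert them to tuples in a thin wrapper, so the cached function only ever sees hashable values:
from functools import lru_cache

@lru_cache()
def _unique_fun(x):
    # x arrives here as a tuple, which is hashable and therefore cacheable
    return list(x)

def unique_fun(x):
    # convert the (unhashable) list to a tuple before hitting the cache
    return _unique_fun(tuple(x))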
I've built a function decorator to handle this scenario; it limits calls to the same function within a given timeframe.
You can install it directly from PyPI with pip install ofunctions.threading, or check out the GitHub sources.
Example: I want to limit calls to the same function with the same parameters to one call per 10 seconds:
from ofunctions.threading import no_flood

@no_flood(10)
def my_function():
    print("It's me, the function")

for _ in range(0, 5):
    my_function()
# Will print the text only once.
If the function is called again after 10 seconds have elapsed, we'll allow a new execution, but will prevent any further execution for the next 10 seconds.
By default, @no_flood limits calls that have the same parameters, so calling func(1) and func(2) is still allowed concurrently.
The @no_flood decorator can also limit all calls to a given function, regardless of its parameters:
from ofunctions.threading import no_flood

@no_flood(10, False)
def my_function(var):
    print("It's me, function number {}".format(var))

for i in range(0, 5):
    my_function(i)
# Will only print the function text once
Say I have got some function fun, the actual code body of which is out of my control. I can create a new function which does some preprocessing before calling fun, i.e.
def process(x):
    x += 1
    return fun(x)
If I now want process to take the place of fun for all future calls to fun, I need to do something like
# Does not work
fun = process
This does not work however, as it creates infinite recursion: fun is now called from within the body of fun itself. One solution I have found is to reference a copy of fun inside process, like so:
# Works
import copy

fun_cp = copy.copy(fun)

def process(x):
    x += 1
    return fun_cp(x)

fun = process
but this solution bothers me as I don't really know how Python constructs a copy of a function. I guess my problem is identical to that of extending a class method using inheritance and the super function, but here I have no class.
How can I do this properly? I would think that this is a common enough task that some more or less idiomatic solution should exist, but I have had no luck finding it.
Python is not constructing a copy of your function. copy.copy(fun) just returns fun; the difference is that you saved that to the fun_cp variable, a different variable from the one you saved process to, so it's still in fun_cp when process tries to look for it.
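You can see this for yourself: for plain functions, copy.copy hands back the very same object:
>>> import copy
>>> def fun(x):
...     return x
...
>>> copy.copy(fun) is fun
True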
I'd do something similar to what you did, saving the original function to a different variable, just without the "copy":
original_fun = fun

def fun(x):
    x += 1
    return original_fun(x)
If you want to apply the same wrapping to multiple functions, defining a decorator and doing fun = decorate(fun) is more reusable, but for a one-off, it's more work than necessary and an extra level of indentation.
This looks like a use case for Python's closures. Have a function return your function:
def getprocess(f):
    def process(x):
        x += 1
        return f(x)  # f is referenced from the enclosing scope
    return process

myprocess = getprocess(fun)
myprocess = getprocess(myprocess)
Credit to coldspeed for the idea of using a closure. A fully working and polished solution is
import functools

def getprocess(f):
    @functools.wraps(f)
    def process(x):
        x += 1
        return f(x)
    return process

fun = getprocess(fun)
Note that this is 100% equivalent to applying a decorator (getprocess) to fun. I couldn't come up with this solution at first because the dedicated decorator syntax @getprocess can only be used at the place where the function (here fun) is defined. To apply it to an existing function, just do fun = getprocess(fun).
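As a quick illustration (my own example, not from the answer), the wrapped function now preprocesses its argument before delegating to the original:
>>> def fun(x):
...     return 2 * x
...
>>> fun = getprocess(fun)
>>> fun(10)   # process bumps x to 11, then calls the original fun
22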
I've created a program which can be summed up to something like this:
from itertools import combinations

class Test(object):
    def __init__(self, t2):
        self.another_class_object = t2
    def function_1(self, n):
        a = 2
        while a <= n:
            all_combs = combinations(range(n), a)
            for comb in all_combs:
                if self.another_class_object.function_2(comb):
                    return 1
            a += 1
        return -1
The combinations function is imported from itertools. function_2 returns True or False depending on the input, and is a method of another class object, e.g.:
class Test_2(object):
    def __init__(self, list):
        self.comb_list = list
    def function_2(self, c):
        return c in self.comb_list
Everything is working just fine. But now I want to change it a little and implement multiprocessing. I found this topic, which shows an example of how to exit the script when one of the worker processes determines no more work needs to be done. So I made the following changes:
added a pool definition to the __init__ method: self.pool = Pool(processes=8) (Pool comes from multiprocessing)
created a callback function:
all_results = []

def callback_function(self, result):
    self.all_results.append(result)
    if result:
        self.pool.terminate()
changed function_1:
def function_1(self, n):
    a = 2
    while a <= n:
        all_combs = combinations(range(n), a)
        for comb in all_combs:
            self.pool.apply_async(self.another_class_object.function_2, args=comb, callback=self.callback_function)
        #self.pool.close()
        #self.pool.join()
        if True in self.all_results:
            return 1
        a += 1
    return -1
Unfortunately, it does not work as I expected. Why? After debugging, it looks like the callback function is never reached. I thought it would be called by every worker. Am I wrong? What could the problem be?
I did not try your code as such, but I tried your structure. Are you sure the problem is in the callback function and not in the worker function? I did not manage to get apply_async to launch even a single instance of the worker function when that function was a class method. It just did not do anything: apply_async completes without error, but it never runs the worker.
As soon as I moved the worker function (in your case another_class_object.function_2) out of the classes as a standalone global function, it started working as expected and the callback was triggered normally. The callback function, in contrast, seems to work fine as a class method.
There seems to be discussion about this for example here: Why can I pass an instance method to multiprocessing.Process, but not a multiprocessing.Pool?
Is this in any way useful?
Hannu
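As a rough sketch of what that looks like (my own minimal example, not the asker's real classes), with the worker at module level so the pool can pickle it, and with args passed as a one-element tuple:
from multiprocessing import Pool
from itertools import combinations

def function_2(comb):
    # standalone, module-level worker -- a stand-in for the real
    # another_class_object.function_2
    return comb == (0, 1)

def callback_function(result):
    print('callback got %s' % result)

if __name__ == '__main__':
    pool = Pool(processes=4)
    jobs = [pool.apply_async(function_2, args=(comb,), callback=callback_function)
            for comb in combinations(range(4), 2)]
    pool.close()
    pool.join()
    print([job.get() for job in jobs])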
Question: ... not work as I expected. ... What can be the problem?
It is always necessary to get() the results from pool.apply_async(...) to see the errors from the pool processes.
Change to the following:
pp = []
for comb in all_combs:
    pp.append(pool.apply_async(func=self.another_class_object.function_2, args=comb, callback=self.callback_function))

pool.close()
for ar in pp:
    print('ar=%s' % ar.get())
And you will see this Error:
TypeError: function_2() takes 2 positional arguments but 3 were given
To fix this error, change args=comb to args=(comb,):
pp.append(pool.apply_async(func=self.another_class_object.function_2, args=(comb,), callback=self.callback_function))
Tested with Python: 3.4.2
I often do interactive work in Python that involves some expensive operations that I don't want to repeat often. I'm generally running whatever Python file I'm working on frequently.
If I write:
import functools32

@functools32.lru_cache()
def square(x):
    print "Squaring", x
    return x*x
I get this behavior:
>>> square(10)
Squaring 10
100
>>> square(10)
100
>>> runfile(...)
>>> square(10)
Squaring 10
100
That is, rerunning the file clears the cache. This works:
try:
    safe_square
except NameError:
    @functools32.lru_cache()
    def safe_square(x):
        print "Squaring", x
        return x*x
but when the function is long it feels strange to have its definition inside a try block. I can do this instead:
def _square(x):
    print "Squaring", x
    return x*x

try:
    safe_square_2
except NameError:
    safe_square_2 = functools32.lru_cache()(_square)
but it feels pretty contrived (for example, calling the decorator without an '@' sign).
Is there a simple way to handle this, something like:
@non_resetting_lru_cache()
def square(x):
    print "Squaring", x
    return x*x
?
Writing a script to be executed repeatedly in the same session is an odd thing to do.
I can see why you'd want to do it, but it's still odd, and I don't think it's unreasonable for the code to expose that oddness by looking a little odd, and having a comment explaining it.
However, you've made things uglier than necessary.
First, you can just do this:
@functools32.lru_cache()
def _square(x):
    print "Squaring", x
    return x*x

try:
    safe_square_2
except NameError:
    safe_square_2 = _square
There is no harm in attaching a cache to the new _square definition. It won't waste any time, or more than a few bytes of storage, and, most importantly, it won't affect the cache on the previous _square definition. That's the whole point of closures.
There is a potential problem here with recursive functions. It's already inherent in the way you're working, and the cache doesn't add to it in any way, but you might only notice it because of the cache, so I'll explain it and show how to fix it. Consider this function:
@lru_cache()
def _fact(n):
    if n < 2:
        return 1
    return _fact(n-1) * n
When you re-exec the script, even if you have a reference to the old _fact, it's going to end up calling the new _fact, because it's accessing _fact as a global name. It has nothing to do with the @lru_cache; remove that, and the old function will still end up calling the new _fact.
But if you're using the renaming trick above, you can just call the renamed version:
@lru_cache()
def _fact(n):
    if n < 2:
        return 1
    return fact(n-1) * n
Now the old _fact will call fact, which is still the old _fact. Again, this works identically with or without the cache decorator.
Beyond that initial trick, you can factor that whole pattern out into a simple decorator. I'll explain step by step below, or see this blog post.
Anyway, even with the less-ugly version, it's still a bit ugly and verbose. And if you're doing this dozens of times, my "well, it should look a bit ugly" justification will wear thin pretty fast. So, you'll want to handle this the same way you always factor out ugliness: wrap it in a function.
You can't really pass names around as objects in Python. And you don't want to use a hideous frame hack just to deal with this. So you'll have to pass the names around as strings. Like this:
globals().setdefault('fact', _fact)
The globals function just returns the current scope's global dictionary. Which is a dict, which means it has the setdefault method, which means this will set the global name fact to the value _fact if it didn't already have a value, but do nothing if it did. Which is exactly what you wanted. (You could also use setattr on the current module, but I think this way emphasizes that the script is meant to be (repeatedly) executed in someone else's scope, not used as a module.)
So, here that is wrapped up in a function:
def new_bind(name, value):
    globals().setdefault(name, value)
… which you can turn that into a decorator almost trivially:
def new_bind(name):
    def wrap(func):
        globals().setdefault(name, func)
        return func
    return wrap
Which you can use like this:
@new_bind('foo')
def _foo():
    print(1)
But wait, there's more! The func that new_bind gets is going to have a __name__, right? If you stick to a naming convention, like that the "private" name must be the "public" name with a _ prefixed, we can do this:
def new_bind(func):
    assert func.__name__[0] == '_'
    globals().setdefault(func.__name__[1:], func)
    return func
And you can see where this is going:
@new_bind
@lru_cache()
def _square(x):
    print "Squaring", x
    return x*x
There is one minor problem: if you use any other decorators that don't wrap the function properly, they will break your naming convention. So… just don't do that. :)
And I think this works exactly the way you want in every edge case. In particular, if you've edited the source and want to force the new definition with a new cache, you just del square before rerunning the file, and it works.
And of course if you want to merge those two decorators into one, it's trivial to do so, and call it non_resetting_lru_cache.
However, I'd keep them separate. I think it's more obvious what they do. And if you ever want to wrap another decorator around @lru_cache, you're probably still going to want @new_bind to be the outermost decorator, right?
What if you want to put new_bind into a module that you can import? Then it's not going to work, because it will be referring to the globals of that module, not the one you're currently writing.
You can fix that by explicitly passing your globals dict, or your module object, or your module name as an argument, like @new_bind(__name__), so it can find your globals instead of its own. But that's ugly and repetitive.
You can also fix it with an ugly frame hack. At least in CPython, sys._getframe() can be used to get your caller's frame, and frame objects have a reference to their globals namespace, so:
import sys

def new_bind(func):
    assert func.__name__[0] == '_'
    g = sys._getframe(1).f_globals
    g.setdefault(func.__name__[1:], func)
    return func
Notice the big box in the docs that tells you this is an "implementation detail" that may only apply to CPython and is "for internal and specialized purposes only". Take this seriously. Whenever someone has a cool idea for the stdlib or builtins that could be implemented in pure Python, but only by using _getframe, it's generally treated almost the same as an idea that can't be implemented in pure Python at all. But if you know what you're doing, and you want to use this, and you only care about present-day versions of CPython, it will work.
There is no persistent_lru_cache in the stdlib. But you can build one pretty easily.
The functools source is linked directly from the docs, because this is one of those modules that's as useful as sample code as it is for using it directly.
As you can see, the cache is just a dict. If you replace that with, say, a shelf, it will become persistent automatically:
def persistent_lru_cache(filename, maxsize=128, typed=False):
    """new docstring explaining what dbpath does"""
    # same code as before up to here
    def decorating_function(user_function):
        cache = shelve.open(filename)
        # same code as before from here on.
Of course that only works if your arguments are strings. And it could be a little slow.
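If you do need non-string arguments, one workaround (just a sketch on my part, not something the functools source provides) is to stringify whatever key the cache machinery builds before it touches the shelf:
def shelf_key(key):
    # hypothetical helper: turn the tuple key built from the call arguments
    # into a string shelve can use; fine for simple argument types whose
    # repr is stable
    return repr(key)

# inside the wrapper, look things up with cache[shelf_key(key)]
# instead of cache[key]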
So, you might want to instead keep it as an in-memory dict, and just write code that pickles it to a file atexit, and restores it from a file if present at startup:
def decorating_function(user_function):
    # ...
    try:
        with open(filename, 'rb') as f:
            cache = pickle.load(f)
    except:
        cache = {}

    def cache_save():
        with lock:
            with open(filename, 'wb') as f:
                pickle.dump(cache, f)

    atexit.register(cache_save)
    # …
    wrapper.cache_save = cache_save
    wrapper.cache_filename = filename
Or, if you want it to write every N new values (so you don't lose the whole cache on, say, an _exit or a segfault or someone pulling the cord), add this to the second and third versions of wrapper, right after the misses += 1:
if misses % N == 0:
    cache_save()
See here for a working version of everything up to this point (using save_every as the "N" argument, and defaulting to 1, which you probably don't want in real life).
If you want to be really clever, maybe copy the cache and save that in a background thread.
You might want to extend the cache_info to include something like number of cache writes, number of misses since last cache write, number of entries in the cache at startup, …
And there are probably other ways to improve this.
From a quick test, with save_every=1, this makes the cache on both get_pep and fib (from the functools docs) persistent, with no measurable slowdown to get_pep and a very small slowdown to fib the first time (note that fib(100) has 100097 hits vs. 101 misses…), and of course a large speedup to get_pep (but not fib) when you re-run it. So, just what you'd expect.
I can't say I won't just use @abarnert's "ugly frame hack", but here is the version that requires you to pass in the calling module's globals dict. I think it's worth posting, given that decorator functions with arguments are tricky and meaningfully different from those without arguments.
def create_if_not_exists_2(my_globals):
    def wrap(func):
        if "_" != func.__name__[0]:
            raise Exception("Function names used in cine must begin with '_'")
        my_globals.setdefault(func.__name__[1:], func)
        def wrapped(*args):
            return func(*args)
        return wrapped
    return wrap
Which you can then use in a different module like this:
from functools32 import lru_cache
from cine import create_if_not_exists_2

@create_if_not_exists_2(globals())
@lru_cache()
def _square(x):
    print "Squaring", x
    return x*x

assert "_square" in globals()
assert "square" in globals()
I've gained enough familiarity with decorators during this process that I was comfortable taking a swing at solving the problem another way:
from functools32 import lru_cache

try:
    my_cine
except NameError:
    class my_cine(object):
        _reg_funcs = {}

        @classmethod
        def func_key(cls, f):
            try:
                name = f.func_name
            except AttributeError:
                name = f.__name__
            return (f.__module__, name)

        def __init__(self, f):
            k = self.func_key(f)
            self._f = self._reg_funcs.setdefault(k, f)

        def __call__(self, *args, **kwargs):
            return self._f(*args, **kwargs)

if __name__ == "__main__":
    @my_cine
    @lru_cache()
    def fact_my_cine(n):
        print "In fact_my_cine for", n
        if n < 2:
            return 1
        return fact_my_cine(n-1) * n

    x = fact_my_cine(10)
    print "The answer is", x
@abarnert, if you are still watching, I'd be curious to hear your assessment of the downsides of this method. I know of two:
You have to know in advance which attributes to look in for a name to associate with the function. My first stab at it only looked at func_name, which failed when passed an lru_cache object.
Resetting a function is painful: del my_cine._reg_funcs[('__main__', 'fact_my_cine')], and the swing I took at adding a __delitem__ was unsuccessful.
I've been teaching myself Python at my new job, and really enjoying the language. I've written a short class to do some basic data manipulation, and I'm pretty confident about it.
But old habits from my structured/modular programming days are hard to break, and I know there must be a better way to write this. So, I was wondering if anyone would like to take a look at the following, and suggest some possible improvements, or put me on to a resource that could help me discover those for myself.
A quick note: The RandomItems root class was written by someone else, and I'm still wrapping my head around the itertools library. Also, this isn't the entire module - just the class I'm working on, and its prerequisites.
What do you think?
import itertools
import urllib2
import random
import string

class RandomItems(object):
    """This is the root class for the randomizer subclasses. These
    are used to generate arbitrary content for each of the fields
    in a csv file data row. The purpose is to automatically generate
    content that can be used as functional testing fixture data.
    """
    def __iter__(self):
        while True:
            yield self.next()

    def slice(self, times):
        return itertools.islice(self, times)

class RandomWords(RandomItems):
    """Obtain a list of random real words from the internet, place them
    in an iterable list object, and provide a method for retrieving
    a subset of length 1-n, of random words from the root list.
    """
    def __init__(self):
        urls = [
            "http://dictionary-thesaurus.com/wordlists/Nouns%285,449%29.txt",
            "http://dictionary-thesaurus.com/wordlists/Verbs%284,874%29.txt",
            "http://dictionary-thesaurus.com/wordlists/Adjectives%2850%29.txt",
            "http://dictionary-thesaurus.com/wordlists/Adjectives%28929%29.txt",
            "http://dictionary-thesaurus.com/wordlists/DescriptiveActionWords%2835%29.txt",
            "http://dictionary-thesaurus.com/wordlists/WordsThatDescribe%2886%29.txt",
            "http://dictionary-thesaurus.com/wordlists/DescriptiveWords%2886%29.txt",
            "http://dictionary-thesaurus.com/wordlists/WordsFunToUse%28100%29.txt",
            "http://dictionary-thesaurus.com/wordlists/Materials%2847%29.txt",
            "http://dictionary-thesaurus.com/wordlists/NewsSubjects%28197%29.txt",
            "http://dictionary-thesaurus.com/wordlists/Skills%28341%29.txt",
            "http://dictionary-thesaurus.com/wordlists/TechnicalManualWords%281495%29.txt",
            "http://dictionary-thesaurus.com/wordlists/GRE_WordList%281264%29.txt"
        ]
        self._words = []
        for url in urls:
            urlresp = urllib2.urlopen(urllib2.Request(url))
            self._words.extend([word for word in urlresp.read().split("\r\n")])
        self._words = list(set(self._words))  # Removes duplicates
        self._words.sort()  # sorts the list

    def next(self):
        """Return a single random word from the list
        """
        return random.choice(self._words)

    def get(self):
        """Return the entire list, if needed.
        """
        return self._words

    def wordcount(self):
        """Return the total number of words in the list
        """
        return len(self._words)

    def sublist(self, size=3):
        """Return a random segment of _size_ length. The default is 3 words.
        """
        segment = []
        for i in range(size):
            segment.append(self.next())
        #printable = " ".join(segment)
        return segment

    def random_name(self):
        """Return a string-formatted list of 3 random words.
        """
        words = self.sublist()
        return "%s %s %s" % (words[0], words[1], words[2])

def main():
    """Just to see it work...
    """
    wl = RandomWords()
    print wl.wordcount()
    print wl.next()
    print wl.sublist()
    print 'Three Word Name = %s' % wl.random_name()
    #print wl.get()

if __name__ == "__main__":
    main()
Here are my five cents:
Constructor should be called __init__.
You could cut some code by using random.sample; it does what your next() and sublist() do, but it's prepackaged (see the sketch after this list).
Override __iter__ (define the method in your class) and you can get rid of RandomIter. You can read more about it in the docs (note this links to the Py3K documentation, so some details may not apply to older versions). You could use yield for this, which, as you may or may not know, creates a generator, wasting little to no memory.
random_name could use str.join instead. Note that you may need to convert the values if they are not guaranteed to be strings; this can be done with [str(x) for x in iterable] or the built-in map.
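A small sketch of how the random.sample and str.join suggestions might look inside RandomWords (untested, and assuming the word list is already loaded):
def sublist(self, size=3):
    """Return `size` distinct random words in one call."""
    return random.sample(self._words, size)

def random_name(self):
    """Join however many words sublist() returns."""
    return " ".join(self.sublist())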
First knee-jerk reaction: I would offload your hard-coded URLs into a constructor parameter passed to the class and perhaps read from configuration somewhere; this will allow for easier change without necessitating a redeploy.
The drawback of this is that consumers of the class have to know where those URLs are stored... so you could create a companion class whose only job is to know what the URLs are (i.e. in configuration, or even hard-coded) and how to get them. You could allow the consumer of your class to provide the URLs, or if they are not provided, the class could hit up the companion class for the URLs.
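For instance, it might look something like this (just a sketch; WordListSource is a hypothetical name for that companion class):
class WordListSource(object):
    """Knows where the word-list URLs live (configuration, hard-coded, etc.)."""
    def get_urls(self):
        return ["http://dictionary-thesaurus.com/wordlists/Nouns%285,449%29.txt"]  # etc.

class RandomWords(RandomItems):
    def __init__(self, urls=None):
        # caller-supplied URLs win; otherwise fall back to the companion class
        if urls is None:
            urls = WordListSource().get_urls()
        self._words = []
        for url in urls:
            urlresp = urllib2.urlopen(urllib2.Request(url))
            self._words.extend(urlresp.read().split("\r\n"))
        self._words = sorted(set(self._words))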