Will there be a re-entrancy issue with the following Twisted code? - python

I've been learning Twisted recently, and I just re-read some of the basic docs on Deferred. Here is some example code from: http://twistedmatrix.com/documents/12.3.0/core/howto/defer.html
What happens if I comment out the second g = Getter()?
Will there be a re-entrancy problem? Do you have any good ideas on how to avoid this kind of issue?
from twisted.internet import reactor, defer
class Getter:
    def gotResults(self, x):
        """
        The Deferred mechanism provides a mechanism to signal error
        conditions. In this case, odd numbers are bad.
        This function demonstrates a more complex way of starting
        the callback chain by checking for expected results and
        choosing whether to fire the callback or errback chain
        """
        if self.d is None:
            print "Nowhere to put results"
            return
        d = self.d
        self.d = None
        if x % 2 == 0:
            d.callback(x*3)
        else:
            d.errback(ValueError("You used an odd number!"))
    def _toHTML(self, r):
        """
        This function converts r to HTML.
        It is added to the callback chain by getDummyData in
        order to demonstrate how a callback passes its own result
        to the next callback
        """
        return "Result: %s" % r
    def getDummyData(self, x):
        """
        The Deferred mechanism allows for chained callbacks.
        In this example, the output of gotResults is first
        passed through _toHTML on its way to printData.
        Again this function is a dummy, simulating a delayed result
        using callLater, rather than using a real asynchronous
        setup.
        """
        self.d = defer.Deferred()
        # simulate a delayed result by asking the reactor to schedule
        # gotResults in 2 seconds time
        reactor.callLater(2, self.gotResults, x)
        self.d.addCallback(self._toHTML)
        return self.d
def printData(d):
    print d
def printError(failure):
    import sys
    sys.stderr.write(str(failure))
# this series of callbacks and errbacks will print an error message
g = Getter()
d = g.getDummyData(3)
d.addCallback(printData)
d.addErrback(printError)
# this series of callbacks and errbacks will print "Result: 12"
#g = Getter() #<= What about commenting this line out?
d = g.getDummyData(4)
d.addCallback(printData)
d.addErrback(printError)
reactor.callLater(4, reactor.stop)
reactor.run()

Yes, if you comment out the second g = Getter(), you will have a problem, because the Deferred is stored on the Getter object. The second call to getDummyData overwrites the first Deferred before it has fired, so the first Deferred never fires, and the first scheduled gotResults call fires the second Deferred with the wrong value.
You shouldn't do this. As a general point, I don't think it is a good idea to hold onto Deferred objects, because they can only fire once and it is all too easy to run into a problem like this one.
What you should do is this:
def getDummyData(self, x):
    d = defer.Deferred()
    # simulate a delayed result by asking the reactor to schedule
    # gotResults in 2 seconds time
    reactor.callLater(2, self.gotResults, x, d)
    d.addCallback(self._toHTML)
    return d
def gotResults(self, x, d):
    """
    The Deferred mechanism provides a mechanism to signal error
    conditions. In this case, odd numbers are bad.
    This function demonstrates a more complex way of starting
    the callback chain by checking for expected results and
    choosing whether to fire the callback or errback chain
    """
    if d is None:
        print "Nowhere to put results"
        return
    if x % 2 == 0:
        d.callback(x*3)
    else:
        d.errback(ValueError("You used an odd number!"))
Notice that in this case Getter has no state, which is good, and you don't need a class for it!
My opinion is that Deferreds should be used to give the caller of your function the ability to do something with the result when it becomes available. They should not be used for anything fancier. So, I always have
def func():
    d = defer.Deferred()
    # ... arrange for d to be fired later ...
    return d
If the caller has to hold on to the Deferred for whatever reason, they may, but I can freely call func multiple times without having to worry about hidden state.
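To make the "no hidden state" point concrete, here is a minimal runnable sketch. It deliberately uses asyncio.Future from the standard library as a stand-in for Deferred (an assumption, made only so the example runs without Twisted installed), and the names get_dummy_data, got_results and demo are made up for the illustration. Each call creates its own future and hands it to the delayed callback, so overlapping calls cannot overwrite each other:

```python
import asyncio


def got_results(x, fut):
    # Resolve the future that belongs to this particular request.
    if x % 2 == 0:
        fut.set_result("Result: %s" % (x * 3))
    else:
        fut.set_exception(ValueError("You used an odd number!"))


def get_dummy_data(x, delay=0.01):
    # A fresh future per call; nothing is stored on a shared object.
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    # Simulate a delayed result, passing the future along explicitly.
    loop.call_later(delay, got_results, x, fut)
    return fut


async def demo():
    # Two overlapping requests that would clash if the future were
    # kept in an attribute of a shared object.
    futures = [get_dummy_data(3), get_dummy_data(4)]
    return await asyncio.gather(*futures, return_exceptions=True)


results = asyncio.run(demo())
print(results)
```

Both requests complete independently: the odd input ends up as a ValueError and the even one as "Result: 12". The same shape should carry over to Deferred by passing d through callLater, as in the answer above.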


Difficulty understanding how data is passed

So I’m trying to get a strobe-like effect in a game I’m building, and the way I currently have it, it’s destroying my frame rate because the sleep function is also applying to the draw function. Can someone explain why this happens, and the logic that I’m failing to understand? Why can’t I just have the return happen every .5 seconds without it affecting the .1 sleep I have in my hue function?
Here’s a crude demonstration of what the code kind of does.
from random import randint
import time
def rand_intr():
    r = randint(1,256)
    time.sleep(.5)
    return r
def rand_intg():
    g = randint(1,256)
    time.sleep(.5)
    return g
def rand_intb():
    b = randint(1,256)
    time.sleep(.5)
    return b
def hue():
    r = rand_intr()
    g = rand_intg()
    b = rand_intb()
    print(r, g, b)
    time.sleep(.1)
while True:
    hue()
The sleep function blocks the main thread. This means rand_intg does not run until rand_intr "wakes up" from its sleep.
Similarly, rand_intb has to wait for rand_intg, and hue has to wait for all the previous 3 functions. This means the total time hue has to wait before it can do any work is at least the amount of time needed to complete rand_intr, rand_intg, and rand_intb.
We can understand what is happening if we modify your example slightly and look at the output.
from random import randint
import time
def log_entry_exit(f):
    def wrapped():
        print("Entered {}".format(f.__name__))
        result = f()
        print("Leaving {}".format(f.__name__))
        return result
    return wrapped
@log_entry_exit
def rand_intr():
    r = randint(1,256)
    time.sleep(.5)
    return r
@log_entry_exit
def rand_intg():
    g = randint(1,256)
    time.sleep(.5)
    return g
@log_entry_exit
def rand_intb():
    b = randint(1,256)
    time.sleep(.5)
    return b
def hue():
    r = rand_intr()
    g = rand_intg()
    b = rand_intb()
    print(r, g, b)
while True:
    hue()
Here I just modified your functions to print a message when we enter and exit each function.
The output is
Entered rand_intr
Leaving rand_intr
Entered rand_intg
Leaving rand_intg
Entered rand_intb
Leaving rand_intb
172 206 115
Entered rand_intr
Leaving rand_intr
Entered rand_intg
Leaving rand_intg
Entered rand_intb
Leaving rand_intb
240 33 135
Here, the effect of each sleep on hue can be seen clearly. You don't get to print the rgb values or "test" until the previous functions have completed.
What you can do is call your hue function periodically using a timer callback, and then modify the RGB values according to some pattern. See this Stack Overflow question on executing periodic actions for an example of how to periodically execute a function using a basic time-based mechanism.
Based on your comment to @jasonharper:
If you call hue every 60 seconds, it does not make sense for the functions that generate the random RGB values to run at a faster rate, because any changes in the intervening time will never be seen by hue.
What you can do is call hue every 60 seconds, and generate your RGB values with whatever pattern you want at that point.
Modifying the answer by @kev in the post I linked to above,
import time, threading
def update():
    # Do whatever you want here.
    # This function will be called again in 60 seconds.
    # ...
    hue()
    # Whatever other things you want to do
    # ...
    threading.Timer(60.0, update).start()
def hue():
    r = rand_intr()
    g = rand_intg()
    b = rand_intb()
    print(r, g, b)
    # Don't call sleep.
if __name__ == "__main__":
    update()
Now you should only call update once, possibly in some startup part of your code and remove all the calls to sleep in your functions.
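As a rough, self-contained sketch of that idea: the colour is regenerated by a threading.Timer that re-arms itself, while the main loop keeps drawing without ever sleeping for the strobe interval. The Strobe class, its attribute names, and the short intervals are assumptions made for the illustration; they are not taken from the question's game.

```python
import threading
import time
from random import randint


class Strobe:
    """Picks a new random colour every `interval` seconds, off the main thread."""

    def __init__(self, interval):
        self.interval = interval
        self.colour = (0, 0, 0)
        self.updates = 0
        self._lock = threading.Lock()
        self._timer = None
        self._stopped = False

    def _tick(self):
        with self._lock:
            if self._stopped:
                return
            self.colour = (randint(0, 255), randint(0, 255), randint(0, 255))
            self.updates += 1
            # Re-arm the timer instead of sleeping.
            self._timer = threading.Timer(self.interval, self._tick)
            self._timer.daemon = True
            self._timer.start()

    def start(self):
        self._tick()

    def stop(self):
        with self._lock:
            self._stopped = True
            if self._timer is not None:
                self._timer.cancel()


strobe = Strobe(0.05)
strobe.start()
frames = 0
deadline = time.monotonic() + 0.3
while time.monotonic() < deadline:
    # Stand-in for drawing one frame using strobe.colour.
    frames += 1
    time.sleep(0.001)
strobe.stop()
print(frames, "frames drawn,", strobe.updates, "colour changes")
```

The draw loop runs at its own rate and simply reads whatever colour is current, so the strobe interval no longer adds to the time each frame takes.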

Call many object's methods in parallel in python

I have two classes. One called algorithm and the other called Chain. In algorithm, I create multiple chains, which are going to be a sequence of sampled values. I want to run the sampling in parallel at the chain level.
In other words, the algorithm class instantiates n chains and I want to run the _sample method, which belongs to the Chain class, for each of the chains in parallel within the algorithm class.
Below is some sample code that attempts what I would like to do.
I have seen a similar question here: Apply a method to a list of objects in parallel using multi-processing, but as shown in the function _sample_chains_parallel_worker, this method does not work for my case (I am guessing it is because of the nested class structure).
Question 1: Why does this not work for this case?
The method in _sample_chains_parallel also does not even run in parallel.
Question 2: Why?
Question 3: How do I sample each of these chains in parallel?
import time
import multiprocessing
class Chain():
    def __init__(self):
        self.thetas = []
    def _sample(self):
        for i in range(3):
            time.sleep(1)
            self.thetas.append(i)
    def clear_thetas(self):
        self.thetas = []
class algorithm():
    def __init__(self, n=3):
        self.n = n
        self.chains = []
    def _init_chains(self):
        for _ in range(self.n):
            self.chains.append(Chain())
    def _sample_chains(self):
        for chain in self.chains:
            chain.clear_thetas()
            chain._sample()
    def _sample_chains_parallel(self):
        pool = multiprocessing.Pool(processes=self.n)
        for chain in self.chains:
            chain.clear_thetas()
            pool.apply_async(chain._sample())
        pool.close()
        pool.join()
    def _sample_chains_parallel_worker(self):
        def worker(obj):
            obj._sample()
        pool = multiprocessing.Pool(processes=self.n)
        pool.map(worker, self.chains)
        pool.close()
        pool.join()
if __name__=="__main__":
    import time
    alg = algorithm()
    alg._init_chains()
    start = time.time()
    alg._sample_chains()
    end = time.time()
    print "sequential", end - start
    start = time.time()
    alg._sample_chains_parallel()
    end = time.time()
    print "parallel", end - start
    start = time.time()
    alg._sample_chains_parallel_worker()
    end = time.time()
    print "parallel, map and worker", end - start
In _sample_chains_parallel you are calling chain._sample() instead of just passing the function: pool.apply_async(chain._sample()). So you are passing the result as an argument instead of letting apply_async calculate it.
But removing () won't help you much, because Python 2 cannot pickle instance methods (this is possible in Python 3.5+). It won't raise the error unless you call get() on the result objects, so don't rejoice if you see low times for this approach: that's because it immediately quits with an unraised exception.
For the parallel versions you would have to move worker to the module level and call it as pool.apply_async(worker, (chain,)) or pool.map(worker, self.chains), respectively.
Note that you forgot clear_thetas() in _sample_chains_parallel_worker. The better solution would be to let Chain._sample take care of calling self.clear_thetas() anyway.
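Putting those points together, a possible Python 3 sketch looks like the following. The worker lives at module level, the chain clears its own state before sampling, and the worker returns the samples, because each child process works on a pickled copy of the Chain and changes made there are not visible in the parent. The names sample_chain and sample_chains_parallel, the renamed sample method, and the short delay are assumptions made for the illustration.

```python
import multiprocessing
import time


class Chain:
    def __init__(self):
        self.thetas = []

    def sample(self, steps=3, delay=0.01):
        # Start from a clean state, then draw `steps` placeholder values.
        self.thetas = []
        for i in range(steps):
            time.sleep(delay)  # stand-in for real sampling work
            self.thetas.append(i)
        return self.thetas


def sample_chain(chain):
    # Module-level so multiprocessing can pickle a reference to it.
    # It runs in a child process, so it returns the samples explicitly.
    return chain.sample()


def sample_chains_parallel(chains):
    with multiprocessing.Pool(processes=len(chains)) as pool:
        results = pool.map(sample_chain, chains)
    # Copy the samples back onto the parent's Chain objects.
    for chain, thetas in zip(chains, results):
        chain.thetas = thetas
    return results


if __name__ == "__main__":
    chains = [Chain() for _ in range(3)]
    print(sample_chains_parallel(chains))
```

Because pool.map blocks until every result is back, any exception raised in a worker surfaces in the parent instead of being silently dropped.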

How to design an async pipeline pattern in python

I am trying to design an async pipeline that makes it easy to build a data processing pipeline. The pipeline is composed of several functions. Input data goes in at one end of the pipeline and comes out at the other end.
I want to design the pipeline in a way that:
Additional functions can be inserted into the pipeline.
Functions already in the pipeline can be popped out.
Here is what I came up with:
import asyncio
# Note: asyncio.coroutine was removed in Python 3.11; newer code uses async def / await.
@asyncio.coroutine
def add(x):
    return x + 1
@asyncio.coroutine
def prod(x):
    return x * 2
@asyncio.coroutine
def power(x):
    return x ** 3
def connect(funcs):
    def wrapper(*args, **kwargs):
        data_out = yield from funcs[0](*args, **kwargs)
        for func in funcs[1:]:
            data_out = yield from func(data_out)
        return data_out
    return wrapper
pipeline = connect([add, prod, power])
input = 1
output = asyncio.get_event_loop().run_until_complete(pipeline(input))
This works, of course, but the problem is that if I want to add another function into (or pop out a function from) this pipeline, I have to disassemble and reconnect every function again.
I would like to know if there is a better scheme or design pattern to create such a pipeline?
I've done something similar before, using just the multiprocessing library. It's a bit more manual, but it gives you the ability to easily create and modify your pipeline, as you've requested in your question.
The idea is to create functions that can live in a multiprocessing pool, and their only arguments are an input queue and an output queue. You tie the stages together by passing them different queues. Each stage receives some work on its input queue, does some more work, and passes the result out to the next stage through its output queue.
The workers spin on trying to get something from their queues, and when they get something, they do their work and pass the result to the next stage. All of the work ends by passing a "poison pill" through the pipeline, causing all stages to exit:
This example just builds a string in multiple work stages:
import multiprocessing as mp
# sentinel value; any string that is never real work will do
POISON_PILL = "STOP"
def stage1(q_in, q_out):
    while True:
        # get either work or a poison pill from the previous stage (or main)
        val = q_in.get()
        # check to see if we got the poison pill - pass it along if we did
        if val == POISON_PILL:
            q_out.put(val)
            return
        # do stage 1 work
        val = val + "Stage 1 did some work.\n"
        # pass the result to the next stage
        q_out.put(val)
def stage2(q_in, q_out):
    while True:
        val = q_in.get()
        if val == POISON_PILL:
            q_out.put(val)
            return
        val = val + "Stage 2 did some work.\n"
        q_out.put(val)
def main():
    pool = mp.Pool()
    manager = mp.Manager()
    # create managed queues
    q_main_to_s1 = manager.Queue()
    q_s1_to_s2 = manager.Queue()
    q_s2_to_main = manager.Queue()
    # launch workers, passing them the queues they need
    results_s1 = pool.apply_async(stage1, (q_main_to_s1, q_s1_to_s2))
    results_s2 = pool.apply_async(stage2, (q_s1_to_s2, q_s2_to_main))
    # Send a message into the pipeline
    q_main_to_s1.put("Main started the job.\n")
    # Wait for work to complete
    print(q_s2_to_main.get()+"Main finished the job.")
    # shut the pipeline down
    q_main_to_s1.put(POISON_PILL)
    pool.close()
    pool.join()
if __name__ == "__main__":
    main()
The code produces this output:
Main started the job.
Stage 1 did some work.
Stage 2 did some work.
Main finished the job.
You can easily put more stages in the pipeline or rearrange them just by changing which functions get which queues. I'm not very familiar with the asyncio module, so I can't speak to what capabilities you would be losing by using the multiprocessing library instead, but this approach is very straightforward to implement and understand, so I like its simplicity.
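To illustrate how little has to change when stages are added or removed, here is a small sketch of the same queue-and-poison-pill wiring driven by a list of functions. It swaps in threading and queue.Queue for the multiprocessing pool (an assumption made to keep the example short and runnable anywhere), and make_stage and run_pipeline are hypothetical helper names.

```python
import queue
import threading

# Sentinel object; identity checks work here because threads share memory.
POISON_PILL = object()


def make_stage(work):
    """Wrap a plain function as a stage that reads q_in and writes q_out."""
    def stage(q_in, q_out):
        while True:
            val = q_in.get()
            if val is POISON_PILL:
                q_out.put(POISON_PILL)  # pass the pill along, then exit
                break
            q_out.put(work(val))
    return stage


def run_pipeline(works, items):
    """Wire one queue between each pair of stages and push items through."""
    queues = [queue.Queue() for _ in range(len(works) + 1)]
    threads = [
        threading.Thread(target=make_stage(work), args=(queues[i], queues[i + 1]))
        for i, work in enumerate(works)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(POISON_PILL)
    results = []
    while True:
        val = queues[-1].get()
        if val is POISON_PILL:
            break
        results.append(val)
    for t in threads:
        t.join()
    return results


def add(x):
    return x + 1


def prod(x):
    return x * 2


def power(x):
    return x ** 3


three_stages = run_pipeline([add, prod, power], [1, 2])
two_stages = run_pipeline([add, power], [1, 2])
print(three_stages, two_stages)
```

Rearranging the pipeline is then just a matter of passing a different list; the queues are rebuilt to match.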
I don't know if it is the best way to do it, but here is my solution.
While I think it's possible to control a pipeline using a list or a dictionary, I found it easier and more efficient to use a generator.
Consider the following generator:
def controller():
    old = value = None
    while True:
        new = (yield value)
        value = old
        old = new
This is basically a one-element queue: it stores the value that you send it and releases it on the next call to send (or next).
>>> c = controller()
>>> next(c) # prime the generator
>>> c.send(8) # send a value
>>> next(c) # pull the value from the generator
8
By associating every coroutine in the pipeline with its controller we will have an external handle that we can use to push the target of each one. We just need to define our coroutines in a way that they will pull the new target from our controller every cycle.
Now consider the following coroutines:
def source(controller):
    while True:
        target = next(controller)
        print("source sending to", target.__name__)
        yield (yield from target)
def add():
    return (yield) + 1
def prod():
    return (yield) * 2
The source is a coroutine that does not return, so it will not terminate itself after the first cycle. The other coroutines are "sinks" and do not need a controller.
You can use these coroutines in a pipeline as in the following example. We initially set up a route source --> add and after receiving the first result we change the route to source --> prod.
# create a controller for the source and prime it
cont_source = controller()
next(cont_source)
# create three coroutines
# associate the source with its controller
coro_source = source(cont_source)
coro_add = add()
coro_prod = prod()
# create a pipeline
cont_source.send(coro_add)
# prime the source and send a value to it
coro_source.send(None)
print("add =", coro_source.send(4))
# change target of the source
cont_source.send(coro_prod)
# reset the source, send another value
coro_source.send(None)
print("prod =", coro_source.send(8))
source sending to add
add = 5
source sending to prod
prod = 16
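For the asyncio version in the question, one more possible sketch is to keep the stages in a plain list that is read on every call, so inserting or popping a stage does not require reconnecting anything. This uses async def instead of the older generator-based coroutines, and the Pipeline class is a hypothetical helper, not part of asyncio.

```python
import asyncio


async def add(x):
    return x + 1


async def prod(x):
    return x * 2


async def power(x):
    return x ** 3


class Pipeline:
    """Hypothetical helper: awaits its stages in order, reading the list on each call."""

    def __init__(self, stages=()):
        self.stages = list(stages)

    async def __call__(self, data):
        # Iterate over a snapshot so edits made during a run do not affect it.
        for stage in list(self.stages):
            data = await stage(data)
        return data


pipeline = Pipeline([add, prod, power])
first = asyncio.run(pipeline(1))     # ((1 + 1) * 2) ** 3

pipeline.stages.pop(1)               # drop prod
second = asyncio.run(pipeline(1))    # (1 + 1) ** 3

pipeline.stages.insert(0, prod)      # put prod in front
third = asyncio.run(pipeline(1))     # ((1 * 2) + 1) ** 3
print(first, second, third)
```

Since stages is an ordinary list, all the usual list operations (insert, pop, slicing) are available for rearranging the pipeline between runs.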

Returning success or failure as well as result from function

Can you suggest a nice way to tell the caller whether the following function succeeded before the timeout was reached?
import time
def until_true(func, condition, timeout):
    s = 0.05
    t = 0
    while t <= timeout:
        result = func()
        if condition(result):
            return t, result
        time.sleep(s)
        t = t + s
    return result
With the current implementation you can test if it has failed by checking the len of the return value which is really ugly. If the function succeeds, I also wish to return the result and the time it took for the function to succeed. If it fails, I wish to return just the result.
Make the first return value a flag that says whether the call succeeded. That is,
return (True, t, result)
On the calling side, unpack it as success, t, result = until_true(..) so that you can test if success: ...
Similarly, for the failure case you can return a tuple of the same length, so that the same unpacking works:
return (False, None, result)
A more pythonic way would be to use exceptions:
class TimeoutException(Exception):
    def __init__(self, result):
        self.result = result
def until_true(func, condition, timeout):
    # ... same polling loop as in the question ...
    raise TimeoutException(result)
try:
    time_spent, result = until_true([...])
except TimeoutException as exc:
    result = exc.result
The typical Pythonic way of reporting error conditions in Python is via exceptions rather than return values. In this case you still want to return something, though, so the solution becomes a bit less clear cut. Still, you might want to go with something like this (not tested):
import time
class Timeout(Exception): pass
def until_true(func, condition=bool, timeout=1.0, interval=0.05):
    t = 0
    while t <= timeout:
        result = func()
        if condition(result):
            return t, result
        time.sleep(interval)
        t += interval
    raise Timeout(result)
try:
    t, result = until_true(func)
except Timeout as err:
    print "Oh no, timed out! But got result " + str(err.args[0])
I think this approach is better if timeouts usually shouldn't happen, and require some special action. Otherwise, a simple approach is to simply return t, result in any case, letting the caller check if t is greater than the timeout, or returning some special value for t upon timeout, or even using a third return value like the other answer.
(By the way, your logic assumes that func or condition take no time to execute. If they are time-consuming, until_true will end up running for longer than your timeout since you only count the time spent sleeping. A better approach would use time.time() or similar to check the actual elapsed time.)
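A minimal sketch of that last suggestion, combined with the exception approach: elapsed time is measured with time.monotonic(), so time spent inside func and condition counts towards the timeout. The Timeout class carrying a result attribute and the exact loop structure are assumptions made for the illustration, not the original poster's code.

```python
import time


class Timeout(Exception):
    """Raised when the condition is not met in time; carries the last result."""

    def __init__(self, result):
        super().__init__(result)
        self.result = result


def until_true(func, condition=bool, timeout=1.0, interval=0.05):
    start = time.monotonic()
    while True:
        result = func()
        # Measure real elapsed time, including time spent in func().
        elapsed = time.monotonic() - start
        if condition(result):
            return elapsed, result
        if elapsed >= timeout:
            raise Timeout(result)
        time.sleep(interval)


# Success: the fourth call returns 3, which satisfies the condition.
counter = iter(range(10))
elapsed, value = until_true(lambda: next(counter), lambda v: v >= 3,
                            timeout=1.0, interval=0.001)

# Failure: the condition is never met, so the last result comes back
# on the exception instead of through the return value.
try:
    until_true(lambda: 0, timeout=0.01, interval=0.001)
    last = None
except Timeout as err:
    last = err.result
print(value, last)
```

The caller gets a uniform (elapsed, result) tuple on success and never has to inspect the shape of the return value to detect a timeout.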

Separating Progress Tracking and Loop Logic

Suppose I want to track the progress of a loop using the progress bar printer ProgressMeter (as described in this recipe).
def bigIteration(collection):
    for element in collection:
        doWork(element)
I would like to be able to switch the progress bar on and off. I also want to update it only every x steps for performance reasons. My naive way to do this is
def bigIteration(collection, progressBar=True):
    if progressBar:
        pm = progress.ProgressMeter(total=len(collection))
        pc = 0
    for element in collection:
        if progressBar:
            pc += 1
            if pc % 100 == 0:
                pm.update(100)  # assumes the recipe's ProgressMeter has update(count)
        doWork(element)
However, I am not satisfied. From an "aesthetic" point of view, the functional code of the loop is now "contaminated" with generic progress-tracking code.
Can you think of a way to cleanly separate progress-tracking code and functional code? (Can there be a progress-tracking decorator or something?)
It seems like this code would benefit from the null object pattern.
# a progress bar that uses ProgressMeter
class RealProgressBar:
    pm = None
    pc = 0
    def setMaximum(self, max):
        self.pm = progress.ProgressMeter(total=max)
        self.pc = 0
    def progress(self):
        self.pc += 1
        if self.pc % 100 == 0:
            self.pm.update(100)  # assumes the recipe's ProgressMeter has update(count)
# a fake progress bar that does nothing
class NoProgressBar:
    def setMaximum(self, max):
        pass
    def progress(self):
        pass
# Iterate with a given progress bar
def bigIteration(collection, progressBar=NoProgressBar()):
    progressBar.setMaximum(len(collection))
    for element in collection:
        progressBar.progress()
        doWork(element)
bigIteration(collection, RealProgressBar())
(Pardon my French, er, Python, it's not my native language ;) Hope you get the idea, though.)
This lets you move the progress update logic from the loop, but you still have some progress related calls in there.
You can remove this part if you create a generator from the collection that automatically tracks progress as you iterate it.
# turn a collection into one that shows progress when iterated
def withProgress(collection, progressBar=NoProgressBar()):
    progressBar.setMaximum(len(collection))
    for element in collection:
        progressBar.progress()
        yield element
# simple iteration function
def bigIteration(collection):
    for element in collection:
        doWork(element)
# let's iterate with progress reports
bigIteration(withProgress(collection, RealProgressBar()))
This approach leaves your bigIteration function as is and is highly composable. For example, let's say you also want to add cancellation to this big iteration of yours. Just create another generator that happens to be cancellable.
# highly simplified cancellation token
# probably needs synchronization
class CancellationToken:
    cancelled = False
    def isCancelled(self):
        return self.cancelled
    def cancel(self):
        self.cancelled = True
# iterates a collection with cancellation support
def withCancellation(collection, cancelToken):
    for element in collection:
        if cancelToken.isCancelled():
            break
        yield element
cancelToken = CancellationToken()
progressCollection = withProgress(collection, RealProgressBar())
cancellableCollection = withCancellation(progressCollection, cancelToken)
bigIteration(cancellableCollection)
# meanwhile, on another thread...
cancelToken.cancel()
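Here is a self-contained sketch of the two wrappers composed together, so the idea can be run without the ProgressMeter recipe. CountingProgressBar is a hypothetical stand-in that records progress instead of drawing it, the token is built on threading.Event as one possible way to get the synchronization mentioned above, and bigIteration just doubles each element as placeholder work.

```python
import threading


class CountingProgressBar:
    """Hypothetical stand-in for RealProgressBar: records reports in a list."""

    def __init__(self, every=2):
        self.every = every
        self.count = 0
        self.maximum = None
        self.reports = []

    def setMaximum(self, maximum):
        self.maximum = maximum

    def progress(self):
        self.count += 1
        if self.count % self.every == 0:
            self.reports.append(self.count)


class CancellationToken:
    """Thread-safe flag built on threading.Event."""

    def __init__(self):
        self._event = threading.Event()

    def cancel(self):
        self._event.set()

    def isCancelled(self):
        return self._event.is_set()


def withProgress(collection, progressBar):
    progressBar.setMaximum(len(collection))
    for element in collection:
        progressBar.progress()
        yield element


def withCancellation(collection, cancelToken):
    for element in collection:
        if cancelToken.isCancelled():
            break
        yield element


def bigIteration(collection):
    # Pure loop logic: no progress or cancellation code in here.
    return [element * 2 for element in collection]


bar = CountingProgressBar(every=2)
token = CancellationToken()
done = bigIteration(withCancellation(withProgress(list(range(6)), bar), token))

# Once the token is cancelled, a new iteration stops before doing any work.
token.cancel()
after_cancel = bigIteration(
    withCancellation(withProgress([1, 2, 3], CountingProgressBar()), token))
print(done, bar.reports, after_cancel)
```

Note that withProgress needs len(), so it has to wrap the real collection directly; withCancellation only iterates, so it can wrap any iterable, including another generator.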
You could rewrite bigIteration as a generator function as follows:
def bigIteration(collection):
    for element in collection:
        doWork(element)
        yield element
Then, you could do a great deal outside of this:
mycollection = [1,2,3]
if progressBar:
    pm = progress.ProgressMeter(total=len(mycollection))
    pc = 0
    for item in bigIteration(mycollection):
        pc += 1
        if pc % 100 == 0:
            pm.update(100)  # assumes the recipe's ProgressMeter has update(count)
else:
    for item in bigIteration(mycollection):
        pass
My approach would be like this:
The looping code yields the progress percentage whenever it changes (or whenever it wants to report it). The progress-tracking code then reads from the generator until it's empty, updating the progress bar after every read.
However, this also has some disadvantages:
You need a function to call it without a progress bar, as you still need to read from the generator until it's empty.
You cannot easily return a value at the end. A solution would be to wrap the return value so the progress method can determine whether the function yielded a progress update or a return value. Actually, it might be nicer to wrap the progress update so the regular return value can be yielded unwrapped, but that would require much more wrapping, since it would need to be done for every progress update instead of just once.
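A small sketch of this approach, assuming Python 3.3 or later, where a generator's return value travels on StopIteration.value and so does not need to be wrapped. The function names and the summing "work" are placeholders chosen for the illustration.

```python
def big_iteration(collection):
    """Yield the fraction done after each element; return the total at the end."""
    total = 0
    for index, element in enumerate(collection, start=1):
        total += element  # stand-in for real work
        yield index / len(collection)
    return total


def run_with_progress(gen, report):
    """Drive the generator, calling report() on every update; return its result."""
    while True:
        try:
            fraction = next(gen)
        except StopIteration as stop:
            # The generator's return value is carried on the exception.
            return stop.value
        report(fraction)


def run_silently(gen):
    # Still has to drain the generator, just without reporting anything.
    return run_with_progress(gen, lambda fraction: None)


updates = []
result = run_with_progress(big_iteration([1, 2, 3, 4]), updates.append)
silent_result = run_silently(big_iteration([1, 2, 3, 4]))
print(updates, result, silent_result)
```

The first disadvantage still applies: even without a progress bar, something like run_silently has to exhaust the generator for the work to happen.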
