I want to realize some sort of client-server connection using Python and am rather new to multiprocessing. Basically, I have a class 'Manager' that inherits from multiprocessing.Process and manages the connection from a client to different data sources. This process has functions like 'get_value(key)' that should return the value of the data source identified by key. Now, since I want this to run asynchronously, I cannot simply call this function from my client process.
My idea so far is to connect the Client and Manager processes using a Pipe and then send a message from the Client to the Manager to execute this function. I would do this by sending a list through the pipe, where the first element is the name of the function and the remaining elements are the arguments of the actual function, e.g. ['get_value', 'datasource1']. The process would then receive this and send the return value through the pipe back to the client. That would look something like this:
from multiprocessing import Process, Pipe

class Manager(Process):
    def __init__(self, connection):
        super(Manager, self).__init__()
        self.connection = connection

    def run(self):
        while True:
            if self.connection.poll():
                msg = self.connection.recv()
                self.call_function(msg[0], *msg[1:])

    def call_function(self, name, *args):
        print('Function called with %s' % name)
        return_val = getattr(self, name)(*args)
        self.connection.send(return_val)

    def get_value(self, key):
        return 1.0
While I guess this would work, I am not very happy with the solution. In particular, the call-a-function-by-string approach seems very error-prone. Is there a more elegant way of requesting that a function be executed in Python?
I think that your approach, all in all, is a good one (there are other ways to do the same thing, of course, but there is nothing wrong with your general approach).
That said, I would change the design slightly to add a "routing" component: some logic that limits which "commands" clients can send, and that hooks commands up to "handlers", that is, the functions that handle them. Basically, think web framework routing (if you are familiar with the concept).
This is a good idea in terms of design flexibility, in terms of error detection, and in terms of security (you don't want clients to be able to call, say, __del__ on your Manager).
In its most basic form, a router can be a dictionary mapping commands to class methods:
class Manager(Process):
    def __init__(self, connection):
        super(Manager, self).__init__()
        self.connection = connection
        self.routes = {'do_action': self._do_action,
                       'do_other_action': some_callable,
                       'ping': lambda args: args}  # <- as long as it's callable and has the right signature...

    def call_function(self, name, *args):
        try:
            handler = self.routes[name]
        except KeyError:
            return self._error_reply('{} is not a valid command'.format(name))
        try:
            return_val = handler(*args)  # handler functions will need to throw something if arguments are wrong...
        except ValueError as e:
            return self._error_reply('Invalid command arguments: {}'.format(str(e)))
        except Exception as e:
            # This is your catch-all "internal server error" handler
            return self._error_reply(str(e))
        self.connection.send(return_val)
This is of course just an example of an approach. You will need to implement _error_reply() in whatever way works for you.
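For example, a minimal _error_reply (my assumption; the answer deliberately leaves this open) could just send a tagged error back through the pipe, inside Manager:

    def _error_reply(self, message):
        # Assumed convention: an error travels back as a dict the client can recognize.
        self.connection.send({'error': message})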
You can expand on it by creating a Router class and passing it as a dependency to Manager, making it even more flexible. You might also want to think about making your Manager a separate thing and not a subclass of Process (because you might want to run it regardless of whether it is in a subprocess - for example in testing).
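To illustrate that direction, here is a rough sketch of a Router injected into Manager (the names and structure are my own, not from the answer):

class Router(object):
    def __init__(self):
        self._routes = {}

    def add_route(self, name, handler):
        self._routes[name] = handler

    def dispatch(self, name, *args):
        # Raises KeyError for unknown commands; Manager turns that into an error reply.
        return self._routes[name](*args)

class Manager(Process):
    def __init__(self, connection, router):
        super(Manager, self).__init__()
        self.connection = connection
        self.router = router  # injected, so it can be swapped out in tests

router = Router()
router.add_route('ping', lambda *args: args)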
BTW, there are frameworks for implementing such things with various degrees of complexity and flexibility (Thrift, ZeroMQ, ...), but if you want to do something simple and learn, doing it yourself is in my opinion a great choice.
I am trying to organize my code by removing a lot of repetitive logic. I felt like this would be a great use case for a context manager.
When making certain updates in the system, 4 things always happen -
We lock the resource to prevent concurrent updates
We wrap our logic in a database transaction
We validate the data making sure the update is permissible
After the function executes we add history rows to the database
I want a wrapper to encapsulate this like below
from contextlib import contextmanager

@contextmanager
def my_manager(resource_id):
    try:
        with lock(resource_id), transaction.atomic():
            validate_data(resource_id)
            resources = yield
            create_history_objects(resources)
    except LockError:
        raise CustomError

def update(resource_id):
    with my_manager(resource_id):
        _update(resource_id)

def _update(resource_id):
    # do something
    return resource
Everything works as expected, except that resources in the context manager is None. The resources are returned by the function that's called while the context manager is suspended at the yield.
What is the proper way to access those resources through yield, or maybe another utility? Thanks
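For what it's worth: a with block cannot send a value back into a @contextmanager generator (its single yield only produces the `as` target), so resources = yield is always None. One workaround, sketched here using the question's own lock/transaction/validation helpers, is to yield a mutable container that the body fills in:

from contextlib import contextmanager

@contextmanager
def my_manager(resource_id):
    try:
        with lock(resource_id), transaction.atomic():
            validate_data(resource_id)
            holder = {}
            yield holder  # the with-body stores its result in here
            create_history_objects(holder['resources'])
    except LockError:
        raise CustomError

def update(resource_id):
    with my_manager(resource_id) as holder:
        holder['resources'] = _update(resource_id)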
So I have a twisted server I built, and I was wondering what the best way is to limit the number of simultaneous connections.
Is having my Factory return None the best way? When I do this, I get a lot of exceptions like:
exceptions.AttributeError: 'NoneType' object has no attribute 'makeConnection'
I would like some way to have the clients just sit in a queue until the current connection number goes back down, but I don't know how to do that asynchronously.
Currently my factory does this:
import logging

from twisted.internet.protocol import Factory

class HandleClientFactory(Factory):
    def __init__(self):
        self.numConnections = 0

    def buildProtocol(self, addr):
        # limit connection number here
        if self.numConnections >= Max_Clients:
            logging.warning("Reached maximum Client connections")
            return None
        return HandleClient(self)
which works, but disconnects rather than waits, and also throws a lot of unhandled errors.
You have to build this yourself. Fortunately, the pieces are mostly in place to do so (you could probably ask for slightly more suitable pieces but ...)
First, to avoid the AttributeError (which indeed causes the connection to be closed), be sure to return an IProtocol provider from your buildProtocol method.
from twisted.internet.protocol import Factory, Protocol

class DoesNothing(Protocol):
    pass

class YourFactory(Factory):
    def buildProtocol(self, addr):
        if self.currentConnections < self.maxConnections:
            return Factory.buildProtocol(self, addr)
        protocol = DoesNothing()
        protocol.factory = self
        return protocol
If you use this factory (filling in the missing pieces - eg, initializing maxConnections and tracking currentConnections correctly) then you'll find that clients which connect once the limit has been reached are given the DoesNothing protocol. They can send as much data as they like to this protocol. It will discard it all. It will never send them any data. It will leave the connection open until they close it. In short, it does nothing.
However, you also wanted clients to actually receive service once the connection count falls below the limit.
To do this, you need a few more pieces:
You have to keep any data they might send buffered so it is available to be read when you're ready to read it.
You have to keep track of the connections so you can begin to service them when the time is ripe.
You have to begin to service them at said time.
For the first of these, you can use the feature of most transports to "pause":
class PauseTransport(Protocol):
    def makeConnection(self, transport):
        transport.pauseProducing()

class YourFactory(Factory):
    def buildProtocol(self, addr):
        if self.currentConnections < self.maxConnections:
            return Factory.buildProtocol(self, addr)
        protocol = PauseTransport()
        protocol.factory = self
        return protocol
PauseTransport is similar to DoesNothing but with the minor (and useful) difference that as soon as it is connected to a transport it tells the transport to pause. Thus, no data will ever be read from the connection and it will all remain buffered for whenever you're ready to deal with it.
For the next requirement, many possible solutions exist. One of the simplest is to use the factory as storage:
class PauseAndStoreTransport(Protocol):
    def makeConnection(self, transport):
        transport.pauseProducing()
        self.factory.addPausedTransport(transport)

class YourFactory(Factory):
    def buildProtocol(self, addr):
        # As above
        ...

    def addPausedTransport(self, transport):
        self.transports.append(transport)
Again, with the proper setup (eg, initialize the transports attribute), you now have a list of all of the transports which correspond to connections you've accepted above the concurrency limit which are waiting for service.
For the last requirement, all that is necessary is to instantiate and initialize the protocol that's actually capable of serving your clients. Instantiation is easy (it's your protocol, you probably know how it works). Initialization is largely a matter of calling the makeConnection method:
class YourFactory(Factory):
    def buildProtocol(self, addr):
        # As above
        ...

    def addPausedTransport(self, transport):
        # As above
        ...

    def oneConnectionDisconnected(self):
        self.currentConnections -= 1
        if self.currentConnections < self.maxConnections:
            transport = self.transports.pop(0)
            protocol = self.buildProtocol(address)
            protocol.makeConnection(transport)
            transport.resumeProducing()
I've omitted the details of keeping track of the address argument required by buildProtocol (with the transport carried from its point of origin to this part of the program, it should be clear how to do something similar for the original address value if your program actually wants it).
Apart from that, all that happens here is you take the next queued transport (you could use a different scheduling algorithm if you want, eg LIFO) and hook it up to a protocol of your choosing just as Twisted would do. Finally, you undo the earlier pause operation so data will begin to flow.
Or... almost. This would be pretty slick except Twisted transports don't actually expose any way to change which protocol they deliver data to. Thus, as written, data from clients will actually be delivered to the original PauseAndStoreTransport protocol instance. You can hack around this (and "hack" is clearly the right word). Store both the transport and PauseAndStoreTransport instance in the list on the factory and then:
def oneConnectionDisconnected(self):
    self.currentConnections -= 1
    if self.currentConnections < self.maxConnections:
        originalProtocol, transport = self.transports.pop(0)
        newProtocol = self.buildProtocol(address)
        originalProtocol.dataReceived = newProtocol.dataReceived
        originalProtocol.connectionLost = newProtocol.connectionLost
        newProtocol.makeConnection(transport)
        transport.resumeProducing()
Now the object that the transport wants to call methods on has had its methods replaced by those from the object that you want the methods called on. Again, this is clearly a hack. You can probably put together something less hackish if you want (eg, a third protocol class that explicitly supports delegating to another protocol). The idea will be the same - it'll just be more wear on your keyboard. For what it's worth, I suspect that it may be both easier and less typing to do something similar using Tubes but I'll leave an attempt at a solution based on that library to someone else for now.
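For illustration, that "third protocol class" might look roughly like this (my sketch, not from the original answer); the factory would store these protocol instances and call delegateTo() on them instead of patching methods:

from twisted.internet.protocol import Protocol

class DelegatingProtocol(Protocol):
    target = None

    def makeConnection(self, transport):
        transport.pauseProducing()  # keep all incoming data buffered for now
        Protocol.makeConnection(self, transport)

    def dataReceived(self, data):
        # Only called after resumeProducing(), by which time target is set.
        self.target.dataReceived(data)

    def connectionLost(self, reason):
        if self.target is not None:
            self.target.connectionLost(reason)

    def delegateTo(self, protocol):
        # Hand the live transport to the real protocol and resume the flow.
        self.target = protocol
        protocol.makeConnection(self.transport)
        self.transport.resumeProducing()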
I've avoided addressing the problem of keeping currentConnections properly up to date. Since you already had numConnections in your question I'm assuming you know how to manage that part. All I've done in the last step here is suppose that the way you do the decrement step is by calling oneConnectionDisconnected on the factory.
I've also avoided addressing the event that a queued connection gets bored and goes away. This will mostly work as written - Twisted won't notice the connection was closed until you call resumeProducing and then connectionLost will be called on your application protocol. This should be fine since your protocol needs to handle lost connections anyway.
Quoting from the docs:
...look at the **kwargs argument. All signals send keyword arguments,
and may change those keyword arguments at any time. In the case of
request_finished, it’s documented as sending no arguments, which means
we might be tempted to write our signal handling as
my_callback(sender).
This would be wrong – in fact, Django will throw an error if you do
so. That’s because at any point arguments could get added to the
signal and your receiver must be able to handle those new arguments.
I don't get it. Why could 'arguments be added at any time'? Don't interfaces in programs exist to be constant, with everybody aware of them? Or do these words mean that every receiver must always fail silently? Because it is obvious that if a sender randomly changed its interface, receivers would fail and throw errors.
This would be wrong – in fact, Django will throw an error if you do
so.
Is throwing errors wrong when using signals, or what did they mean?
Seems like that is just telling you to be sure you always include the **kwargs argument. So you should do that.
In Python, and especially in Django, it is common to design APIs (or more precisely, the functions that expose the APIs) so that when given additional parameters they can still operate, instead of crashing.
In your particular situation, consider a signal handler like:
def handler(sender, param1, param2):
    pass
Let's say you have some version X.Y of Django where this handler works perfectly. Then you realise that Django has been updated to X.Z, and one item in the changelog is that signals now receive a fourth keyword argument (param3).
Would you rather go through your entire codebase and change handlers to:
def handler(sender, param1, param2, param3):
    pass
...or would it have been better to program all handlers like:
def handler(sender, param1, param2, **kwargs):
    pass
?
This kind of design is also useful when your functions are supposed to relay the params to other functions:
def func(*args, **kwargs):
    # do something
    other_func(*args, **kwargs)
There is a caveat though: this kind of API design is reasonable only when acting on the params is optional. Consider a (naive) example like the following "sum" API.
def sum(a, b):
    return a + b
Then in the next version the sum function suddenly starts to take more params: a, b, c. A function like the following would probably cause hard-to-track bugs:
def sum(a, b, **kwargs):
    return a + b
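To make the caveat concrete (my illustration, not from the answer): if a later version of the API expects sum(a, b, c), the **kwargs variant silently swallows the new argument instead of failing loudly:

def sum(a, b, **kwargs):
    return a + b

sum(1, 2, c=3)  # returns 3 and silently ignores c -- the caller expected 6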
Is throwing errors wrong when using signals, or what did they mean?
They only mean that **kwargs is required in the signature of a signal receiver function. Throwing errors when using signals is not wrong a priori. In conclusion, remember to always include **kwargs when you define a receiver function, as the docs say:
def my_callback(sender, **kwargs):
    print("Request finished!")
I'm implementing a utility library which is a sort-of task manager intended to run within the distributed environment of Google App Engine cloud computing service. (It uses a combination of task queues and memcache to execute background processing). I plan to use generators to control the execution of tasks, essentially enforcing a non-preemptive "concurrency" via the use of yield in the user's code.
The trivial example - processing a bunch of database entities - could be something like the following:
class EntityWorker(Worker):
    def setup(self):
        self.entity_query = Entity.all()

    def run(self):
        for e in self.entity_query:
            do_something_with(e)
            yield
As we know, yield is a two-way communication channel, allowing values to be passed to the code that drives the generator. This makes it possible to simulate a "preemptive API" such as the SLEEP call below:
def run(self):
    for e in self.entity_query:
        do_something_with(e)
        yield Worker.SLEEP, timedelta(seconds=1)
But this is ugly. It would be great to hide the yield within a separate function that could be invoked in a simple way:
self.sleep(timedelta(seconds=1))
The problem is that putting yield in the function sleep turns it into a generator function. The call above would therefore just return another generator. Only by adding .next() and yield back again would we obtain the previous result:
yield self.sleep(timedelta(seconds=1)).next()
which is of course even more ugly and unnecessarily verbose than before.
Hence my question: is there a way to put yield into a function without turning it into a generator function, while keeping it usable by other generators to yield the values it computes?
You seem to be missing the obvious:
class EntityWorker(Worker):
    def setup(self):
        self.entity_query = Entity.all()

    def run(self):
        for e in self.entity_query:
            do_something_with(e)
            yield self.sleep(timedelta(seconds=1))

    def sleep(self, wait):
        return Worker.SLEEP, wait
It's the yield that turns functions into generators, it's impossible to leave it out.
To hide the yield you need a higher order function, in your example it's map:
from itertools import imap

def slowmap(f, sleep, iterable):
    for row in imap(f, iterable):
        yield sleep

def run():
    return slowmap(do_something_with,
                   (Worker.SLEEP, timedelta(seconds=1)),
                   self.entity_query)
Alas, this won't work. But a "middle-way" could be fine:
def sleepjob(*a, **k):
    if a:
        return Worker.SLEEP, a[0]
    else:
        return Worker.SLEEP, timedelta(**k)
So
yield self.sleepjob(timedelta(seconds=1))
yield self.sleepjob(seconds=1)
look OK to me.
I would suggest you have a look at ndb. It uses generators as coroutines (as you are proposing here), allowing you to write programs that work with RPCs asynchronously.
The API does this by wrapping the generator with another function that "primes" it (calling .next() immediately so that the code begins execution). The tasklets are also designed to work with App Engine's RPC infrastructure, making it possible to use any of the existing asynchronous API calls.
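The priming step described above amounts to something like this rough sketch (my illustration; ndb's real tasklet decorator does considerably more):

def primed(gen_func):
    def wrapper(*args, **kwargs):
        gen = gen_func(*args, **kwargs)
        gen.next()  # advance to the first yield so the body starts executing
        return gen
    return wrapper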
With the concurrency model used in ndb, you yield either a future object (similar to what is described in PEP 3148) or an App Engine RPC object. When that RPC has completed, execution in the function that yielded the object is allowed to continue.
If you are using a model derived from ndb.model.Model then the following will allow you to asynchronously iterate over a query:
from ndb import tasklets

@tasklets.tasklet
def run():
    it = iter(Entity.query())
    # Other tasklets will be allowed to run if the next call has to wait for an rpc.
    while (yield it.has_next_async()):
        entity = it.next()
        do_something_with(entity)
Although ndb is still considered experimental (some of its error handling code still needs some work), I would recommend you have a look at it. I have used it in my last 2 projects and found it to be an excellent library.
Make sure you read through the documentation linked from the main page, and also the companion documentation for the tasklet stuff.
I am trying to wrap the read and write operations of an instance of a file object (specifically the readline() and write() methods).
Normally, I would simply replace those functions with a wrapper, a bit like this:
def log(stream):
    def logwrite(write):
        def inner(data):
            print 'LOG: > ' + data.replace('\r', '<cr>').replace('\n', '<lf>')
            return write(data)
        return inner
    stream.write = logwrite(stream.write)
But the attributes of a file object are read-only! How can I wrap them properly?
(Note: I am too lazy to wrap the whole file object... really, I don't want to miss a feature that I did not wrap properly, or a feature which may be added in a future version of Python.)
More context:
I am trying to automate communication with a modem, whose AT command set is made available on the network through a telnet session. Once logged in, I "grab" the module with which I want to communicate. After some time without activity, a timeout occurs which releases the module (so that it is available to other users on the network... which I don't care about, as I am the sole user of this equipment). The automatic release writes a specific line to the session.
I want to wrap readline() on a file built from a socket (cf. socket.makefile()) so that when the timeout occurs, a specific exception is thrown. That way I can detect the timeout anywhere in the script and react appropriately without complicating the AT command parser...
(Of course, I want to do this because the timeout is quite spurious; otherwise I would simply keep the module alive by feeding the modem commands without side effects.)
(Feel free to propose any other method or strategy to achieve this effect.)
Use __getattr__ to wrap your file object, and provide modified methods for the ones that you are concerned with:
class Wrapped(object):
    def __init__(self, file_):
        self._file = file_

    def write(self, data):
        print 'LOG: > ' + data.replace('\r', '<cr>').replace('\n', '<lf>')
        return self._file.write(data)

    def __getattr__(self, attr):
        return getattr(self._file, attr)
This way, requests for attributes which you don't explicitly provide will be routed to the wrapped object, and you can implement just the ones you want:
logged = Wrapped(open(filename))
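Applied to the question's modem session, readline() could be overridden the same way to turn the release line into a dedicated exception (a sketch; the marker line is hypothetical and depends on the equipment):

class ModuleReleased(Exception):
    pass

class SessionWrapper(Wrapped):
    RELEASE_LINE = 'RELEASED\r\n'  # hypothetical marker -- substitute the real line

    def readline(self, *args):
        line = self._file.readline(*args)
        if line == self.RELEASE_LINE:
            raise ModuleReleased('module was released after inactivity')
        return line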