So I have a Twisted server I built, and I was wondering: what is the best way to limit the number of simultaneous connections?
Is having my Factory return None the best way? When I do this, I get a lot of exceptions like:
exceptions.AttributeError: 'NoneType' object has no attribute 'makeConnection'
I would like some way to have the clients just sit in a queue until the current connection count goes back down, but I don't know how to do that asynchronously.
Currently I am using my factory like this:
class HandleClientFactory(Factory):

    def __init__(self):
        self.numConnections = 0

    def buildProtocol(self, addr):
        # limit connection number here
        if self.numConnections >= Max_Clients:
            logging.warning("Reached maximum Client connections")
            return None

        return HandleClient(self)
This works, but it disconnects clients rather than making them wait, and it also throws a lot of unhandled errors.
You have to build this yourself. Fortunately, the pieces are mostly in place to do so (you could probably ask for slightly more suitable pieces but ...)
First, to avoid the AttributeError (which indeed causes the connection to be closed), be sure to return an IProtocol provider from your buildProtocol method.
from twisted.internet.protocol import Protocol, Factory

class DoesNothing(Protocol):
    pass

class YourFactory(Factory):
    def buildProtocol(self, addr):
        if self.currentConnections < self.maxConnections:
            return Factory.buildProtocol(self, addr)
        protocol = DoesNothing()
        protocol.factory = self
        return protocol
If you use this factory (filling in the missing pieces - eg, initializing maxConnections and tracking currentConnections correctly) then you'll find that clients which connect once the limit has been reached are given the DoesNothing protocol. They can send as much data as they like to this protocol. It will discard it all. It will never send them any data. It will leave the connection open until they close it. In short, it does nothing.
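For illustration, here is a minimal sketch of that missing bookkeeping. The connectionMade/connectionLost hooks are one reasonable place to do the counting, and maxConnections = 100 is an arbitrary assumption; merge this with the buildProtocol above:

from twisted.internet.protocol import Protocol, Factory

class CountedProtocol(Protocol):
    # Sketch: keep the factory's counter current as connections come and go.
    def connectionMade(self):
        self.factory.currentConnections += 1

    def connectionLost(self, reason):
        self.factory.currentConnections -= 1

class YourFactory(Factory):
    protocol = CountedProtocol  # what Factory.buildProtocol instantiates
    maxConnections = 100        # assumption: whatever limit suits you

    def __init__(self):
        self.currentConnections = 0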
However, you also wanted clients to actually receive service once the connection count fell below the limit.
To do this, you need a few more pieces:
You have to keep any data they might send buffered so it is available to be read when you're ready to read it.
You have to keep track of the connections so you can begin to service them when the time is ripe.
You have to begin to service them at said time.
For the first of these, you can use the feature of most transports to "pause":
class PauseTransport(Protocol):
    def makeConnection(self, transport):
        transport.pauseProducing()

class YourFactory(Factory):
    def buildProtocol(self, addr):
        if self.currentConnections < self.maxConnections:
            return Factory.buildProtocol(self, addr)
        protocol = PauseTransport()
        protocol.factory = self
        return protocol
PauseTransport is similar to DoesNothing but with the minor (and useful) difference that as soon as it is connected to a transport it tells the transport to pause. Thus, no data will ever be read from the connection and it will all remain buffered for whenever you're ready to deal with it.
For the next requirement, many possible solutions exist. One of the simplest is to use the factory as storage:
class PauseAndStoreTransport(Protocol):
    def makeConnection(self, transport):
        transport.pauseProducing()
        self.factory.addPausedTransport(transport)

class YourFactory(Factory):
    def buildProtocol(self, addr):
        # As above
        ...

    def addPausedTransport(self, transport):
        self.transports.append(transport)
Again, with the proper setup (eg, initialize the transports attribute), you now have a list of all of the transports which correspond to connections you've accepted above the concurrency limit and which are waiting for service; a minimal sketch of that setup follows.
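For instance (a sketch, reusing the counting assumptions from earlier):

class YourFactory(Factory):
    def __init__(self):
        self.currentConnections = 0
        self.transports = []  # paused transports queued for later service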
For the last requirement, all that is necessary is to instantiate and initialize the protocol that's actually capable of serving your clients. Instantiation is easy (it's your protocol, you probably know how it works). Initialization is largely a matter of calling the makeConnection method:
class YourFactory(Factory):
    def buildProtocol(self, addr):
        # As above
        ...

    def addPausedTransport(self, transport):
        # As above
        ...

    def oneConnectionDisconnected(self):
        self.currentConnections -= 1
        if self.currentConnections < self.maxConnections:
            transport = self.transports.pop(0)
            protocol = self.buildProtocol(address)
            protocol.makeConnection(transport)
            transport.resumeProducing()
I've omitted the details of keeping track of the address argument required by buildProtocol (with the transport carried from its point of origin to this part of the program, it should be clear how to do something similar for the original address value if your program actually wants it).
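For example, one hedged way to carry the address along (the extra addr parameter on addPausedTransport is my assumption, not part of the code above):

class PauseAndStoreTransport(Protocol):
    def makeConnection(self, transport):
        transport.pauseProducing()
        # self.addr is set by the factory below, before the reactor
        # calls makeConnection.
        self.factory.addPausedTransport(self.addr, transport)

class YourFactory(Factory):
    def buildProtocol(self, addr):
        if self.currentConnections < self.maxConnections:
            return Factory.buildProtocol(self, addr)
        protocol = PauseAndStoreTransport()
        protocol.factory = self
        protocol.addr = addr  # remember where the connection came from
        return protocol

    def addPausedTransport(self, addr, transport):
        self.transports.append((addr, transport))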
Apart from that, all that happens here is you take the next queued transport (you could use a different scheduling algorithm if you want, eg LIFO) and hook it up to a protocol of your choosing just as Twisted would do. Finally, you undo the earlier pause operation so data will begin to flow.
Or... almost. This would be pretty slick except Twisted transports don't actually expose any way to change which protocol they deliver data to. Thus, as written, data from clients will actually be delivered to the original PauseAndStoreTransport protocol instance. You can hack around this (and "hack" is clearly the right word). Store both the transport and PauseAndStoreTransport instance in the list on the factory and then:
def oneConnectionDisconnected(self):
    self.currentConnections -= 1
    if self.currentConnections < self.maxConnections:
        originalProtocol, transport = self.transports.pop(0)
        newProtocol = self.buildProtocol(address)
        originalProtocol.dataReceived = newProtocol.dataReceived
        originalProtocol.connectionLost = newProtocol.connectionLost
        newProtocol.makeConnection(transport)
        transport.resumeProducing()
Now the object that the transport wants to call methods on has had its methods replaced by those from the object that you want the methods called on. Again, this is clearly a hack. You can probably put together something less hackish if you want (eg, a third protocol class that explicitly supports delegating to another protocol; a sketch of that follows). The idea will be the same - it'll just be more wear on your keyboard. For what it's worth, I suspect that it may be both easier and less typing to do something similar using Tubes, but I'll leave an attempt at a solution based on that library to someone else for now.
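A minimal sketch of that less hackish, delegating protocol might look like this (setDelegate and the exact hand-off are my assumptions; the point is just to make the delegation explicit instead of swapping bound methods):

class DelegatingProtocol(Protocol):
    delegate = None

    def makeConnection(self, transport):
        transport.pauseProducing()
        self.factory.addPausedTransport(self, transport)

    def setDelegate(self, protocol, transport):
        # Called by the factory when it is this connection's turn.
        self.delegate = protocol
        protocol.makeConnection(transport)
        transport.resumeProducing()

    def dataReceived(self, data):
        self.delegate.dataReceived(data)

    def connectionLost(self, reason):
        if self.delegate is not None:
            self.delegate.connectionLost(reason)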
I've avoided addressing the problem of keeping currentConnections properly up to date. Since you already had numConnections in your question I'm assuming you know how to manage that part. All I've done in the last step here is suppose that the way you do the decrement step is by calling oneConnectionDisconnected on the factory.
I've also avoided addressing the event that a queued connection gets bored and goes away. This will mostly work as written - Twisted won't notice the connection was closed until you call resumeProducing and then connectionLost will be called on your application protocol. This should be fine since your protocol needs to handle lost connections anyway.
I want to realize some sort of client-server connection using Python and am rather new to multiprocessing. Basically, I have a class Manager that inherits from multiprocessing.Process and manages the connection from a client to different data sources. This process has some functions like get_value(key) that should return the value of the key's data source. Now, as I want this to run asynchronously, I cannot simply call this function from my client process.
My idea so far would be to connect the Client and Manager processes using a Pipe and then send a message from the Client to the Manager to execute this function. I would realize this by sending a list through the pipe, where the first element is the name of the function and the remaining elements are the arguments of the actual function, e.g. ['get_value', 'datasource1']. The process then would receive this and send the return value through the pipe to the client. This would look something like this:
from multiprocessing import Process, Pipe
import time

class Manager(Process):
    def __init__(self, connection):
        super(Manager, self).__init__()
        self.connection = connection

    def run(self):
        while True:
            if self.connection.poll():
                msg = self.connection.recv()
                self.call_function(msg[0], *msg[1:])

    def call_function(self, name, *args):
        print('Function Called with %s' % name)
        return_val = getattr(self, name)(*args)
        self.connection.send(return_val)

    def get_value(self, key):
        return 1.0
While I guess that this would work, I am not very happy with this solution. Especially the call-a-function-by-string approach seems very error-prone. Is there a more elegant way of requesting that a function be executed in Python?
I think that your approach, all in all, is a good one (there are other ways to do the same thing, of course, but there is nothing wrong with your general approach).
That said, I would change the design slightly to add a "routing" component: think of some logic that limits what "commands" can be sent by clients, and that hooks commands up with "handlers" - that is, the functions that handle them. Basically, think web framework routing (if you are familiar with the concept).
This is a good idea in terms of design flexibility, error detection, and security (you don't want clients to send ['__del__'], for example, to your Manager).
In its most basic form, a router can be a dictionary mapping commands to class methods:
class Manager(Process):
    def __init__(self, connection):
        super(Manager, self).__init__()
        self.connection = connection
        self.routes = {'do_action': self._do_action,
                       'do_other_action': some_callable,
                       'ping': lambda *args: args}  # <- as long as it's callable and has the right signature...

    def call_function(self, name, *args):
        try:
            handler = self.routes[name]
        except KeyError:
            return self._error_reply('{} is not a valid command'.format(name))
        try:
            return_val = handler(*args)  # handler functions will need to raise something if the arguments are wrong...
        except ValueError as e:
            return self._error_reply('Invalid command arguments: {}'.format(str(e)))
        except Exception as e:
            # This is your catch-all "internal server error" handler
            return self._error_reply(str(e))
        self.connection.send(return_val)
This is of course just an example of an approach. You will need to implement _error_reply() in whatever way works for you.
You can expand on it by creating a Router class and passing it as a dependency to Manager, making it even more flexible (a sketch follows). You might also want to think about making your Manager a separate thing and not a subclass of Process (because you might want to run it regardless of whether it is in a subprocess - for example, in testing).
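As a hedged sketch of that refactoring (the Router name and its register/dispatch methods are inventions for illustration, not an established API):

from multiprocessing import Process

class Router(object):
    # Maps command names to handlers, independent of any Process machinery.
    def __init__(self):
        self._routes = {}

    def register(self, name, handler):
        self._routes[name] = handler

    def dispatch(self, name, *args):
        handler = self._routes[name]  # raises KeyError for unknown commands
        return handler(*args)

class Manager(Process):
    def __init__(self, connection, router):
        super(Manager, self).__init__()
        self.connection = connection
        self.router = router  # injected dependency; easy to replace in tests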
BTW, there are frameworks for implementing such things with various degrees of complexity and flexibility (Thrift, ZeroMQ, ...), but if you want to do something simple and learn, doing it yourself is in my opinion a great choice.
I am using Python 2.7 on Windows, and will use Jython, which supports true multi-threading. The method sendMessage is used to accept a message from a specific client, and that client may send the same message to a few other clients (which is what the receivers parameter, a list, is for). The method receiveMessage is used to retrieve the messages for a specific client, which were sent by other clients.
The question is whether I need any locks for the methods sendMessage and receiveMessage. I think there is no need, since even if a client X is receiving its messages, it is perfectly fine for another client Y to append to the message pool to deliver a message to client X. And I think that for defaultdict/list, append/pop are both atomic and need no protection.
Please feel free to correct me if I am wrong.
from collections import defaultdict

class Foo:
    def __init__(self):
        # key: receiver client ID, value: message
        self.messagePool = defaultdict(list)

    def sendMessage(self, receivers, message):
        # check valid for receivers
        for r in receivers:
            self.messagePool[r].append(message)

    def receiveMessage(self, clientID):
        result = []
        while len(self.messagePool[clientID]) > 0:
            result.append(self.messagePool[clientID].pop(0))
        return result
I suggest using Queue instead of a list. It is designed for appending and popping across threads, with locking; a sketch of the substitution follows.
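A minimal sketch of that substitution, assuming Python 2's Queue module and mirroring the shape of the question's Foo class:

from collections import defaultdict
from Queue import Queue, Empty  # the 'queue' module on Python 3

class Foo:
    def __init__(self):
        # key: receiver client ID, value: thread-safe queue of messages
        self.messagePool = defaultdict(Queue)

    def sendMessage(self, receivers, message):
        for r in receivers:
            self.messagePool[r].put(message)

    def receiveMessage(self, clientID):
        result = []
        pool = self.messagePool[clientID]
        while True:
            try:
                result.append(pool.get_nowait())
            except Empty:
                return result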
I think this question is already well-answered for CPython here and here (basically, you're safe because of the GIL, although nothing in the documentation (like on defaultdict or list) officially says so). But I understand your concern about Jython, so let's settle it using an official source, like the Jython source code. A Pythonic list is a Java-ish PyList there, with this kind of code:
public void append(PyObject o) {
    list_append(o);
}

final synchronized void list_append(PyObject o) {
    ...
}

public PyObject pop(int n) {
    return list_pop(n);
}

final synchronized PyObject list_pop(int n) {
    ...
}
And since these methods are synchronized, we can be sure that list appends and pops are also thread-safe with Jython. Thus, your code seems to be safe with respect to threading.
That said, the Queue suggestion is still a valid one; it really is more appropriate for this use case.
A race condition is about two or more threads changing some global state at the same time.
In your code for sendMessage, you are changing self.messagePool[r], which is a global object. Hence, self.messagePool[r] should be locked before appending a new item.
Same with your receiveMessage function.
list.append and list.pop are amortized O(1) operations, so they would rarely cause a race condition. However, the risk is still there.
I have a class that opens a socket connection on initialization, and can transmit and receive certain messages back and forth with the counterparty. I create an instance of the object using a with statement. In my class, if I receive certain messages back on the socket, I want to explicitly close the connection and exit the with statement.
I attempt to do so by explicitly calling self.__exit__(None, None, None)
def __exit__(self, type, value, traceback):
    print 'Closing Connection'
    self.logout()
    self.conn.close()
    sys.exit(1)
However, I am finding that I am getting the Closing Connection message back twice, and running into problems because on the second call there is no longer a connection to close. Examining the code, I have ruled out all other instances of my explicit call to self.__exit__(None, None, None). What's going on? Is the sys.exit(1) insufficient for preventing the with statement from garbage collecting again (although from what I've read, this seems to be the most "approved" way to do this)? How do I prevent the with statement from calling self.__exit__(None, None, None) a second time? Any help, or a point in the right direction, would be greatly appreciated!
Once you are in a with statement, the only way to leave it without running __exit__ is to use os._exit; that's bad. Instead, explicitly begin by calling __enter__ if you want this behavior (a sketch follows). Or change your class so it doesn't do the cleanup twice if called twice, as @Kay suggests in his answer. Or do as @IsmailBadawi suggests in his comment, and refactor your code so you don't need to explicitly call __exit__.
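A sketch of that explicit-__enter__ pattern, assuming a context manager class like the one in the question:

ctx = MyConnection()        # assumption: a context manager like the one in the question
resource = ctx.__enter__()  # enter by hand instead of using a with statement
try:
    do_work(resource)       # assumption: your application logic
finally:
    ctx.__exit__(None, None, None)  # now this runs exactly once, when you decide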
Just remember whether you have already closed the connection / logged out / whatever:
class MyContext(object):
    def __init__(self):
        self.__already_closed = False
        ....

    def close(self):
        if not self.__already_closed:
            self.__already_closed = True
            self.logout()
            self.conn.close()

    def __exit__(self, type, value, traceback):
        self.close()
Maybe even add a "please don't cleanup" method:
def do_not_cleanup(self):
    self.__already_closed = True
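Used like this (a sketch), an early explicit close becomes harmless:

with MyContext() as ctx:
    ...          # talk to the counterparty
    ctx.close()  # explicit early close on receiving the shutdown message
# __exit__ fires here too, but the flag makes the second close a no-op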
I want to read and process some data from an external service. I ask the service if there is any data; if something was returned, I process it and ask again (so data can be processed immediately when it's available), and otherwise I wait for a notification that data is available. This can be written as an infinite loop:
def loop(self):
    while True:
        data = yield self.get_data_nonblocking()
        if data is not None:
            yield self.process_data(data)
        else:
            yield self.data_available

def on_data_available(self):
    self.data_available.fire()
How can data_available be implemented here? It could be a Deferred but a Deferred cannot be reset, only recreated. Are there better options?
Can this loop be integrated into the Twisted event loop? I could read and process data right in on_data_available and write some code instead of the loop checking get_data_nonblocking, but I feel like then I'll need some locks to make sure data is processed in the same order it arrives (the code above enforces that because it's the only place where the data is processed). Is this a good idea at all?
Consider the case of a TCP connection. The receiver buffer for a TCP connection can either have data in it or not. You can get that data, or get nothing, without blocking by using the non-blocking socket API:
data = socket.recv(1024)
if data:
    self.process_data(data)
You can wait for data to be available using select() (or any of the basically equivalent APIs):
socket.setblocking(False)
while True:
    data = socket.recv(1024)
    if data:
        self.process_data(data)
    else:
        select([socket], [], [])
Of these, only select() is particularly Twisted-unfriendly (though the Twisted idiom is certainly not to make your own socket.recv calls). You could replace the select call with a Twisted-friendly version, though (implement a Protocol with a dataReceived method that fires a Deferred - sort of like your on_data_available method - toss in some yields and make this whole thing an inlineCallbacks generator); a sketch of that shape follows.
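A minimal sketch of that Twisted-friendly shape (the class and method names here are mine, just to show dataReceived firing a Deferred):

from twisted.internet.defer import Deferred
from twisted.internet.protocol import Protocol

class NotifyingProtocol(Protocol):
    # Sketch: hand out a Deferred that fires with the next chunk of data.
    def __init__(self):
        self._waiting = None

    def whenData(self):
        self._waiting = Deferred()
        return self._waiting

    def dataReceived(self, data):
        if self._waiting is not None:
            waiting = self._waiting
            self._waiting = None
            waiting.callback(data)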
But though that's one way you can get data from a TCP connection, that's not the API that Twisted encourages you to use to do so. Instead, the API is:
class SomeProtocol(Protocol):
    def dataReceived(self, data):
        # Your logic here
I don't see how your case is substantially different. What if, instead of the loop you wrote, you did something like this:
class YourDataProcessor(object):
    def process_data(self, data):
        # Your logic here

class SomeDataGetter(object):
    def __init__(self, processor):
        self.processor = processor

    def on_available_data(self):
        data = self.get_data_nonblocking()
        if data is not None:
            self.processor.process_data(data)
Now there are no Deferreds at all (except perhaps in whatever implements on_available_data or get_data_nonblocking but I can't see that code).
If you leave this roughly as-is, you are guaranteed in-order execution because Twisted is single-threaded (except in a couple of places that are very clearly marked), and in a single-threaded program an earlier call to process_data must complete before any later call to process_data can be made (excepting, of course, the case where process_data reentrantly invokes itself - but that's another story).
If you switch this back to using inlineCallbacks (or any equivalent "coroutine" flavored drink mix) then you are probably introducing the possibility of out-of-order execution.
For example, if get_data_nonblocking returns a Deferred and you write something like this:
@inlineCallbacks
def on_available_data(self):
    data = yield self.get_data_nonblocking()
    if data is not None:
        self.processor.process_data(data)
Then you have changed on_available_data to say that a context switch is allowed when calling get_data_nonblocking. In this case, depending on your implementation of get_data_nonblocking and on_available_data, it's entirely possible that:
1. on_available_data is called
2. get_data_nonblocking is called and returns a Deferred
3. on_available_data tells execution to switch to another context (via yield / inlineCallbacks)
4. on_available_data is called again
5. get_data_nonblocking is called again and returns a Deferred (perhaps the same one! perhaps a new one! depends on how it's implemented)
6. The second invocation of on_available_data tells execution to switch to another context (same reason)
7. The reactor spins around for a while and eventually an event arrives that causes the Deferred returned by the second invocation of get_data_nonblocking to fire
8. Execution switches back to the second on_available_data frame
9. process_data is called with whatever data the second get_data_nonblocking call returned
10. Eventually the same things happen to the first set of objects and process_data is called again with whatever data the first get_data_nonblocking call returned
Now perhaps you've processed data out of order - again, this depends on more details of other parts of your system.
If so, you can always re-impose order. There are a lot of different possible approaches to this. Twisted itself doesn't come with any APIs that are explicitly in support of this operation so the solution involves writing some new code. Here's one idea (untested) for an approach - a queue-like class that knows about object sequence numbers:
from twisted.internet.defer import Deferred

class SequencedQueue(object):
    """
    A queue-like type which guarantees objects come out of the queue in the order
    defined by a sequence number associated with the objects when they are put into
    the queue.

    Application code manages sequence number assignment so that sequence numbers don't
    have to have the same order as `put` calls on this type.
    """
    def __init__(self):
        # The sequence number of the object that should be given out
        # by the next call to `get`
        self._next_sequence = 0

        # The sequence number of the next result that needs to be provided.
        self._next_result = 0

        # A holding area for objects past _next_sequence
        self._queue = {}

        # A holding area for Deferreds waiting on sequence numbers
        self._waiting = {}

    def put(self, sequence, object):
        """
        Put an object into the queue at a particular point in the sequence.
        """
        if sequence < self._next_sequence:
            # Programming error.  The sequence number
            # of the object being put has already been used.
            raise ...

        self._queue[sequence] = object
        self._check_waiters()

    def get(self):
        """
        Get an object from the queue which has the next sequence number
        following whatever was previously gotten.
        """
        result = self._waiting[self._next_sequence] = Deferred()
        self._next_sequence += 1
        self._check_waiters()
        return result

    def _check_waiters(self):
        """
        Find any Deferreds previously given out by get calls which can now be given
        their results and give them to them.
        """
        while True:
            seq = self._next_result
            if seq in self._queue and seq in self._waiting:
                self._next_result += 1
                # XXX Probably a re-entrancy bug here.  If a callback calls back in to
                # put then this loop might run recursively
                self._waiting.pop(seq).callback(self._queue.pop(seq))
            else:
                break
The expected behavior (modulo any bugs I accidentally added) is something like:
q = SequencedQueue()

d1 = q.get()
d2 = q.get()

# Nothing in particular happens
q.put(1, "second result")

# d1 fires with "first result" and afterwards d2 fires with "second result"
q.put(0, "first result")
Using this, just make sure you assign sequence numbers in the order you want data dispatched rather than the order it actually shows up somewhere. For example:
@inlineCallbacks
def on_available_data(self):
    sequence = self._process_order
    data = yield self.get_data_nonblocking()
    if data is not None:
        self._process_order += 1
        self.sequenced_queue.put(sequence, data)
Elsewhere, some code can consume the queue sort of like:
@inlineCallbacks
def queue_consumer(self):
    while True:
        data = yield self.sequenced_queue.get()
        yield self.process_data(data)
I am trying to wrap the read and write operations of an instance of a file object (specifically the readline() and write() methods).
Normally, I would simply replace those functions with a wrapper, a bit like this:
def log(stream):
    def logwrite(write):
        def inner(data):
            print 'LOG: > ' + data.replace('\r', '<cr>').replace('\n', '<lf>')
            return write(data)
        return inner
    stream.write = logwrite(stream.write)
But the attributes of a file object are read-only! How could I wrap them properly?
(Note: I am too lazy to wrap the whole file object... really, I don't want to miss a feature that I did not wrap properly, or a feature which may be added in a future version of Python.)
More context:
I am trying to automate communication with a modem whose AT command set is made available on the network through a telnet session. Once logged in, I "grab" the module with which I want to communicate. After some time without activity, a timeout occurs which releases the module (so that it is available to other users on the network... which I don't care about, since I am the sole user of this equipment). The automatic release writes a specific line on the session.
I want to wrap the readline() on a file built from a socket (cf. socket.makefile()) so that when the timeout occurs, a specific exception is thrown, so that I can detect the timeout anywhere in the script and react appropriately without complicating the AT command parser...
(Of course, I want to do that because the timeout is quite spurious; otherwise I would simply feed the modem with side-effect-free commands to keep the module alive.)
(Feel free to propose any other method or strategy to achieve this effect.)
Use __getattr__ to wrap your file object, and provide modified methods for the ones that you are concerned with.
class Wrapped(object):
    def __init__(self, file_):
        self._file = file_

    def write(self, data):
        print 'LOG: > ' + data.replace('\r', '<cr>').replace('\n', '<lf>')
        return self._file.write(data)

    def __getattr__(self, attr):
        return getattr(self._file, attr)
This way, requests for attributes which you don't explicitly provide will be routed to the attribute on the wrapped object, and you can implement just the ones that you want:
logged = Wrapped(open(filename))
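For the modem use case in the question, the same pattern extends to readline(); here is a hedged sketch (the exact text of the release line is a placeholder you would replace):

class ModemReleased(Exception):
    """Raised when the session reports the module was released."""

class WrappedModem(Wrapped):
    def readline(self, *args):
        line = self._file.readline(*args)
        if line.startswith('RELEASED'):  # assumption: whatever the release line really is
            raise ModemReleased(line)
        return line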