How to implement threaded socket.recv() in python? - python

I have a number of devices from which i need to get status updates. A socket object is all I have, and socket.recv() is all I need to get the status. Put into a single threaded application, no problems occur:
class Device:
def receive(self):
log.debug("receive waiting: %r", self.device_id)
try:
packet = self.socket.recv(255)
except Exception as e:
self.report_socket_error(e)
self.reconnect()
log.debug("received response: %r", self.device_id)
d = Device()
d.connect()
while True:
d.receive()
However, the same code wrapped in a threading.Thread class causes deadlocks and funny behaviour. Wrapping it with locks didn't change anything. I traced the problem down to be the socket.recv() call...So, how to implement multiple threads where each thread owns one socket (1 thread owns exclusively 1 socket), which are able to wait for data simultaneously?
Thanks in advance

I know this does not answer your question on how to fix your deadlock problem, however it appears as your use of threads is overhead in your case:
You can just use one thread in which you use select() to find out which socket has available data and then handle the reported data. Unless the handling takes long or your protocol is more complicated select should be just fine and avoid all threading issues.
Have a look at http://docs.python.org/howto/sockets.html#non-blocking-sockets for more details.

How many different sockets do you have to read from?
If the answer is "just one", then use just one thread. Adding another helps you in no way and only complicates your life, as you found out.
If the answer is "several", than one way to organize this is indeed to have a thread per socket. recv is a blocking operation, which makes a thread an attractive option to organize code. Each thread owns a separate socket and reads from it at its leisure. You should have no problems and deadlocks with this.
Locks are unnecessary as long as no resources are shared. Even if you do share resources (logging, some data store, etc.) don't just use simple locks - Python has higher-level utilities for that like the Queue module.

Related

pyzmq - zmq_req can I have one context and use several sockets?

I'm currently working on a Benchmark project, where I'm trying to stress the server out with zmq requests.
I was wondering what would be the best way to approach this, I was thinking of having a context to create a socket and push it into a thread, in which I would send request and wait for responses in each thread respectively, but I'm not too sure this is possible with python's limitations.
More over, would it be the same socket for all threads, that is, if I'm waiting for a response on one thread (With it's own socket), would it be possible for another thread to catch that response?
Thanks.
EDIT:
Test flow logic would be like this:
Client socket would use zmq.REQ.
Client sends message.
Client waits for a response.
If no response, client reconnects and tries again until limit.
I'd like to scale this operation up to any number of clients, preferring not to deal with Processes unless performance wise the difference is significant..
How would you do this?
Q : "...can I have one context and use several sockets?"
Oh sure you can.
Moreover, you can have several Context()-instances, each one managing ... almost... any number of Socket()-instances, each Socket()-instance's methods may get called from one and only one python-thread ( a Zen-of-Zero rule: zero-sharing ).
Due to known GIL-lock re-[SERIAL]-isation of all the thread-based code-execution flow, this still has to and will wait for acquiring the GIL-lock ownership, which in turn permits a GIL-lock owner ( and nobody else ) to execute a fixed amount of python instructions, before it re-releases the GIL-lock to some other thread...

Pika threaded execution gets error - 505, 'UNEXPECTED_FRAME

I'm aware that pika is not thread safe, i was trying to work around using a lock to access to channel but still get error:
pika.exceptions.ConnectionClosed: (505, 'UNEXPECTED_FRAME - expected content header for class 60, got non content header frame instead')
PS i cannot use a different channel.
what could i do? Thank you for help in advance
You need to redesign your application or choose another Rabbitmq library than Pika. Locks do not make Pika thread safe. Each thread needs to have a separate connection.
You have a couple of options, but none of them will be as simple as using a lock.
One would be to replace Pika with Kombu. Kombu is thread safe but the interface is rather different from Pika (simpler in my opinion but this is subjective).
If you want to keep using Pika, then you need to redesign your Rabbit interface. I do not know why you "cannot" use a different channel. But one possible way of doing this would be to have a single thread interfacing with Rabbit, and that thread would interact with worker threads doing tasks with the received data, and you would communicate via queues with them. This way your Rabbit thread would read data, send the received data to a worker in a queue, receive answers from workers via another queue and then submitting them to rabbit as responses.
You might also be able to untangle something in your communications protocol so that you actually can use a different channel and each thread can interface rabbit independently with their own connections and channels. This is the method I generally use.
Yet another candidate would be to get rid of threads and start using async methods instead. Your application may or may not be suitable for this.
But there is no simple workaround, and you will eventually encounter weird behaviour or exceptions if you try to share Pika objects between threads.

Python passing variable into thread

I'm using the threading module to control threads that send data through sockets and what not, however I can't find a suitable solution to pass data into the thread to work with. I've tried things such as Overriding python threading.Thread.run() but can't seem to get it working. If anyone has any suggestions I'd be happy to try anything :)
Thanks !
You are thinking about this backwards. Forget about the fact that it happens to be a thread that's sending the data through the sockets. The data doesn't need to get to the thread, it needs to get to the logic that sends data on the socket.
For example, you can have a queue that holds things that need to be sent through the socket. The socket write code pulls messages from the queue and sends them out the socket. The other code puts messages on this queue. The code that needs to send messages to the socket shouldn't know or care that there happens to be a thread that does the sending.
Use message queues for this. Python has the Queue module for passing data between threads, but if you use a third party library like 0MQ http://www.zeromq.org instead, then you can split the threads into separate processes and it will work the same way.
Multiprocessing is easier to do than threading, but if you have to use threading, avoid locking and sharing data as much as you can. Instead use a prewritten module like Queue to limit the ways in which subtle bugs can arise.

Pattern for a background Twisted server that fills an incoming message queue and empties an outgoing message queue?

I'd like to do something like this:
twistedServer.start() # This would be a nonblocking call
while True:
while twistedServer.haveMessage():
message = twistedServer.getMessage()
response = handleMessage(message)
twistedServer.sendResponse(response)
doSomeOtherLogic()
The key thing I want to do is run the server in a background thread. I'm hoping to do this with a thread instead of through multiprocessing/queue because I already have one layer of messaging for my app and I'd like to avoid two. I'm bringing this up because I can already see how to do this in a separate process, but what I'd like to know is how to do it in a thread, or if I can. Or if perhaps there is some other pattern I can use that accomplishes this same thing, like perhaps writing my own reactor.run method. Thanks for any help.
:)
The key thing I want to do is run the server in a background thread.
You don't explain why this is key, though. Generally, things like "use threads" are implementation details. Perhaps threads are appropriate, perhaps not, but the actual goal is agnostic on the point. What is your goal? To handle multiple clients concurrently? To handle messages of this sort simultaneously with events from another source (for example, a web server)? Without knowing the ultimate goal, there's no way to know if an implementation strategy I suggest will work or not.
With that in mind, here are two possibilities.
First, you could forget about threads. This would entail defining your event handling logic above as only the event handling parts. The part that tries to get an event would be delegated to another part of the application, probably something ultimately based on one of the reactor APIs (for example, you might set up a TCP server which accepts messages and turns them into the events you're processing, in which case you would start off with a call to reactor.listenTCP of some sort).
So your example might turn into something like this (with some added specificity to try to increase the instructive value):
from twisted.internet import reactor
class MessageReverser(object):
"""
Accept messages, reverse them, and send them onwards.
"""
def __init__(self, server):
self.server = server
def messageReceived(self, message):
"""
Callback invoked whenever a message is received. This implementation
will reverse and re-send the message.
"""
self.server.sendMessage(message[::-1])
doSomeOtherLogic()
def main():
twistedServer = ...
twistedServer.start(MessageReverser(twistedServer))
reactor.run()
main()
Several points to note about this example:
I'm not sure how your twistedServer is defined. I'm imagining that it interfaces with the network in some way. Your version of the code would have had it receiving messages and buffering them until they were removed from the buffer by your loop for processing. This version would probably have no buffer, but instead just call the messageReceived method of the object passed to start as soon as a message arrives. You could still add buffering of some sort if you want, by putting it into the messageReceived method.
There is now a call to reactor.run which will block. You might instead write this code as a twistd plugin or a .tac file, in which case you wouldn't be directly responsible for starting the reactor. However, someone must start the reactor, or most APIs from Twisted won't do anything. reactor.run blocks, of course, until someone calls reactor.stop.
There are no threads used by this approach. Twisted's cooperative multitasking approach to concurrency means you can still do multiple things at once, as long as you're mindful to cooperate (which usually means returning to the reactor once in a while).
The exact times the doSomeOtherLogic function is called is changed slightly, because there's no notion of "the buffer is empty for now" separate from "I just handled a message". You could change this so that the function is installed called once a second, or after every N messages, or whatever is appropriate.
The second possibility would be to really use threads. This might look very similar to the previous example, but you would call reactor.run in another thread, rather than the main thread. For example,
from Queue import Queue
from threading import Thread
class MessageQueuer(object):
def __init__(self, queue):
self.queue = queue
def messageReceived(self, message):
self.queue.put(message)
def main():
queue = Queue()
twistedServer = ...
twistedServer.start(MessageQueuer(queue))
Thread(target=reactor.run, args=(False,)).start()
while True:
message = queue.get()
response = handleMessage(message)
reactor.callFromThread(twistedServer.sendResponse, response)
main()
This version assumes a twistedServer which works similarly, but uses a thread to let you have the while True: loop. Note:
You must invoke reactor.run(False) if you use a thread, to prevent Twisted from trying to install any signal handlers, which Python only allows to be installed in the main thread. This means the Ctrl-C handling will be disabled and reactor.spawnProcess won't work reliably.
MessageQueuer has the same interface as MessageReverser, only its implementation of messageReceived is different. It uses the threadsafe Queue object to communicate between the reactor thread (in which it will be called) and your main thread where the while True: loop is running.
You must use reactor.callFromThread to send the message back to the reactor thread (assuming twistedServer.sendResponse is actually based on Twisted APIs). Twisted APIs are typically not threadsafe and must be called in the reactor thread. This is what reactor.callFromThread does for you.
You'll want to implement some way to stop the loop and the reactor, one supposes. The python process won't exit cleanly until after you call reactor.stop.
Note that while the threaded version gives you the familiar, desired while True loop, it doesn't actually do anything much better than the non-threaded version. It's just more complicated. So, consider whether you actually need threads, or if they're merely an implementation technique that can be exchanged for something else.

A programming strategy to bypass the os thread limit?

The scenario: We have a python script that checks thousands of proxys simultaneously.
The program uses threads, 1 per proxy, to speed the process. When it reaches the 1007 thread, the script crashes because of the thread limit.
My solution is: A global variable that gets incremented when a thread spawns and decrements when a thread finishes. The function which spawns the threads monitors the variable so that the limit is not reached.
What will your solution be, friends?
Thanks for the answers.
You want to do non-blocking I/O with the select module.
There are a couple of different specific techniques. select.select should work for every major platform. There are other variations that are more efficient (and could matter if you are checking tens of thousands of connections simultaneously) but you will then need to write the code for you specific platform.
I've run into this situation before. Just make a pool of Tasks, and spawn a fixed number of threads that run an endless loop which grabs a Task from the pool, run it, and repeat. Essentially you're implementing your own thread abstraction and using the OS threads to implement it.
This does have drawbacks, the major one being that if your Tasks block for long periods of time they can prevent the execution of other Tasks. But it does let you create an unbounded number of Tasks, limited only by memory.
Does Python have any sort of asynchronous IO functionality? That would be the preferred answer IMO - spawning an extra thread for each outbound connection isn't as neat as having a single thread which is effectively event-driven.
Using different processes, and pipes to transfer data. Using threads in python is pretty lame. From what I heard, they don't actually run in parallel, even if you have a multi-core processor... But maybe it was fixed in python3.
My solution is: A global variable that gets incremented when a thread spawns and decrements when a thread finishes. The function which spawns the threads monitors the variable so that the limit is not reached.
The standard way is to have each thread get next tasks in a loop instead of dying after processing just one. This way you don't have to keep track of the number of threads, since you just fire a fixed number of them. As a bonus, you save on thread creation/destruction.
A counting semaphore should do the trick.
from socket import *
from threading import *
maxthreads = 1000
threads_sem = Semaphore(maxthreads)
class MyThread(Thread):
def __init__(self, conn, addr):
Thread.__init__(self)
self.conn = conn
self.addr = addr
def run(self):
try:
read = conn.recv(4096)
if read == 'go away\n':
global running
running = False
conn.close()
finally:
threads_sem.release()
sock = socket()
sock.bind(('0.0.0.0', 2323))
sock.listen(1)
running = True
while running:
conn, addr = sock.accept()
threads_sem.acquire()
MyThread(conn, addr).start()
Make sure your threads get destroyed properly after they've been used or use a threadpool, although per what I see they're not that effective in Python
see here:
http://code.activestate.com/recipes/203871/
Using the select module or a similar library would most probably be a more efficient solution, but that would require bigger architectural changes.
If you just want to limit the number of threads, a global counter should be fine, as long as you access it in a thread safe way.
Be careful to minimize the default thread stack size. At least on Linux, the default limit puts severe restrictions on the number of created threads. Linux allocates a chunk of the process virtual address space to the stack (usually 10MB). 300 threads x 10MB stack allocation = 3GB of virtual address space dedicated to stack, and on a 32 bit system you have a 3GB limit. You can probably get away with much less.
Twisted is a perfect fit for this problem. See http://twistedmatrix.com/documents/current/core/howto/clients.html for a tutorial on writing a client.
If you don't mind using alternate Python implmentations, Stackless has light-weight (non-native) threads. The only company I know doing much with it though is CCP--they use it for tasklets in their game on both the client and server. You still need to do async I/O with Stackless because if a thread blocks, the process blocks.
As mentioned in another thread, why do you spawn off a new thread for each single operation? This is a classical producer - consumer problem, isn't it? Depending a bit on how you look at it, the proxy checkers might be comsumers or producers.
Anyway, the solution is to make a "queue" of "tasks" to process, and make the threads in a loop check if there are any more tasks to perform in the queue, and if there isn't, wait a predefined interval, and check again.
You should protect your queue with some locking mechanisms, i.e. semaphores, to prevent race conditions.
It's really not that difficult. But it requires a bit of thinking getting it right. Good luck!

Categories