Python passing variable into thread - python

I'm using the threading module to control threads that send data through sockets and whatnot, but I can't find a suitable way to pass data into a thread for it to work with. I've tried approaches such as overriding threading.Thread.run() but can't get it working. If anyone has suggestions, I'd be happy to try anything :)
Thanks !

You are thinking about this backwards. Forget about the fact that it happens to be a thread that's sending the data through the sockets. The data doesn't need to get to the thread, it needs to get to the logic that sends data on the socket.
For example, you can have a queue that holds things that need to be sent through the socket. The socket write code pulls messages from the queue and sends them out the socket. The other code puts messages on this queue. The code that needs to send messages to the socket shouldn't know or care that there happens to be a thread that does the sending.
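A minimal sketch of that queue-based design, assuming Python 3 (where the module is named queue). The names sender_loop, demo and _STOP are made up for illustration, and socket.socketpair() stands in for a real network connection:

```python
import queue
import socket
import threading

# Sentinel that tells the sender thread to stop (an assumption of this sketch).
_STOP = object()


def sender_loop(sock, outbox):
    """Pull messages from the queue and write them to the socket."""
    while True:
        message = outbox.get()  # blocks until something is queued
        if message is _STOP:
            break
        sock.sendall(message)


def demo():
    # Two connected sockets stand in for a real connection to a server.
    left, right = socket.socketpair()
    outbox = queue.Queue()
    sender = threading.Thread(target=sender_loop, args=(left, outbox))
    sender.start()

    # Producers only touch the queue; they never see the thread or the socket.
    outbox.put(b"hello ")
    outbox.put(b"world")
    outbox.put(_STOP)
    sender.join()
    left.close()

    # Read until the peer's close is seen, so every byte is collected.
    chunks = []
    while True:
        chunk = right.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)
    right.close()
    return b"".join(chunks)
```

Calling demo() should return the two queued messages joined together. The code that calls outbox.put() could run in any thread, since queue.Queue does its own locking.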

Use message queues for this. Python has the Queue module (named queue in Python 3) for passing data between threads, but if you use a third-party library like 0MQ (http://www.zeromq.org) instead, you can split the threads into separate processes and it will work the same way.
Multiprocessing is easier to do than threading, but if you have to use threading, avoid locking and sharing data as much as you can. Instead use a prewritten module like Queue to limit the ways in which subtle bugs can arise.

Related

Pika threaded execution gets error - 505, 'UNEXPECTED_FRAME

I'm aware that Pika is not thread safe. I was trying to work around this by using a lock around access to the channel, but I still get this error:
pika.exceptions.ConnectionClosed: (505, 'UNEXPECTED_FRAME - expected content header for class 60, got non content header frame instead')
PS: I cannot use a different channel.
What could I do? Thank you in advance for your help.
You need to redesign your application or choose a RabbitMQ library other than Pika. Locks do not make Pika thread safe. Each thread needs to have a separate connection.
You have a couple of options, but none of them will be as simple as using a lock.
One would be to replace Pika with Kombu. Kombu is thread safe but the interface is rather different from Pika (simpler in my opinion but this is subjective).
If you want to keep using Pika, then you need to redesign your Rabbit interface. I do not know why you "cannot" use a different channel. One possible design is to have a single thread interface with Rabbit and communicate with worker threads via queues. The Rabbit thread would read data, pass it to a worker through one queue, receive answers from the workers through another queue, and then submit them to Rabbit as responses.
You might also be able to untangle something in your communications protocol so that you actually can use a different channel, and each thread can interface with Rabbit independently through its own connection and channel. This is the method I generally use.
Yet another candidate would be to get rid of threads and start using async methods instead. Your application may or may not be suitable for this.
But there is no simple workaround, and you will eventually encounter weird behaviour or exceptions if you try to share Pika objects between threads.
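The single-owner-thread design described above can be sketched roughly as follows. FakeBroker is a hypothetical stand-in for a client that is not thread safe; it is not Pika's API, and a real version would replace receive() and publish() with whatever the chosen library provides. The function names and the upper-casing "work" are likewise invented for the example:

```python
import queue
import threading


class FakeBroker:
    """Hypothetical stand-in for a client that is not thread safe.

    This is not Pika's API; it only remembers which thread created it.
    """

    def __init__(self, incoming):
        self._incoming = list(incoming)
        self._owner = threading.get_ident()
        self.published = []

    def _check_thread(self):
        if threading.get_ident() != self._owner:
            raise RuntimeError("broker used from a non-owning thread")

    def receive(self):
        self._check_thread()
        return self._incoming.pop(0) if self._incoming else None

    def publish(self, body):
        self._check_thread()
        self.published.append(body)


def worker(tasks, results):
    """Workers only see the two queues, never the broker object."""
    while True:
        item = tasks.get()
        if item is None:  # shutdown marker
            break
        results.put(item.upper())


def broker_loop(incoming, worker_count, published):
    broker = FakeBroker(incoming)  # created and used in this thread only
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(worker_count)]
    for thread in workers:
        thread.start()

    # Hand every received message to the workers.
    pending = 0
    while True:
        message = broker.receive()
        if message is None:
            break
        tasks.put(message)
        pending += 1

    # Publish each answer from this thread, the only one touching the broker.
    for _ in range(pending):
        broker.publish(results.get())

    for _ in workers:
        tasks.put(None)
    for thread in workers:
        thread.join()
    published.extend(broker.published)


def demo():
    published = []
    owner = threading.Thread(target=broker_loop,
                             args=(["a", "b", "c"], 2, published))
    owner.start()
    owner.join()
    return sorted(published)
```

The point of the sketch is only the ownership rule: no lock is needed because the broker object never leaves the thread that created it.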

How do I create an asynchronous socket in Python?

I've created a socket object for Telnet communication, and I'm using it to communicate with an API, sending and receiving data. I need to configure it in such a way that I can send and receive data at the same time. By that, I mean data should be sent as soon as the application tries to send it, and data should be processed immediately on receipt. Currently, I have a configuration which allows receipt to be instant, and sending to be second priority with a very short delay.
Currently the best way I have found to do this is by having an event queue, and pushing data to send into it, then having a response queue into which I put messages from the server. I have a thread which polls the buffer every 0.1 seconds to check for new data. If there isn't any, it checks the request queue and processes anything there, all in a continuous loop. I then have threads insert data into the request queue, and read data from the response queue. Everything is just about linear enough that this works fine.
This is not "asynchronous", in a sense that I've had to make it as asynchronous as possible without actually achieving it. Is there a proper way to do this? Or is anything under the hood going to be doing exactly the same as I am?
Other things I have investigated as a solution to this problem:
A callback system, where I might call socket.on_receipt(handle_message, args) to call the method handle_message with args as a parameter, passing the received data into the method. The only way I could find to achieve this is by implementing what I already have, then registering a callback for it (in fact, this is very close to what I do already have).
Please note: I am approaching this as a learning exercise to understand better how asynchronous systems work, not to understand how to use a particular library, so please do not suggest an existing library unless it contains very clear code which is simple to understand and answers the question fully and concisely.
This seems like a pretty straightforward use case for asyncio. I wouldn't consider using asyncio as "using a particular library" since socket programming paired with asyncio's event loop is pretty low-level and the concept is very transparent if you have experience with other languages and just want to see how async programming works in Python.
You can use this async chat as an example: https://gist.github.com/gregvish/7665915
Essentially, you create a non-blocking socket; see the standard library reference on socket.setblocking(0):
https://docs.python.org/3/library/socket.html#socket.socket.setblocking
I'd also suggest this amazing session by David Beazley as a must-see for async Python programming. He explains the concurrency concepts in Python using sockets, exactly what you need: https://www.youtube.com/watch?v=MCs5OvhV9S4
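As a rough illustration of the asyncio approach, here is a small sketch that sends and receives on one connection at the same time using the standard asyncio streams API. The local echo server, the function names and the newline-delimited messages are assumptions made so the example is self-contained; they are not part of the question's Telnet API:

```python
import asyncio


async def echo_handler(reader, writer):
    # Tiny local echo server so the sketch needs no external service.
    while True:
        line = await reader.readline()
        if not line:  # client closed the connection
            break
        writer.write(line)
        await writer.drain()
    writer.close()


async def send_all(writer, messages):
    # Each message goes out as soon as this coroutine gets to run.
    for message in messages:
        writer.write(message + b"\n")
        await writer.drain()


async def receive_n(reader, count, handle):
    # Each reply is handed to the callback as soon as it arrives.
    for _ in range(count):
        line = await reader.readline()
        handle(line.rstrip(b"\n"))


async def main():
    server = await asyncio.start_server(echo_handler, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)

    received = []
    messages = [b"one", b"two", b"three"]
    # The event loop interleaves sending and receiving on one thread.
    await asyncio.gather(
        send_all(writer, messages),
        receive_n(reader, len(messages), received.append),
    )

    writer.close()
    await writer.wait_closed()
    server.close()
    return received


def demo():
    return asyncio.run(main())
```

Under the hood the event loop still waits on the socket with a select-style call, but it wakes only when there is something to do instead of polling on a timer.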

How to implement threaded socket.recv() in python?

I have a number of devices from which I need to get status updates. A socket object is all I have, and socket.recv() is all I need to get the status. Put into a single-threaded application, no problems occur:
class Device:
    def receive(self):
        log.debug("receive waiting: %r", self.device_id)
        try:
            packet = self.socket.recv(255)
        except Exception as e:
            self.report_socket_error(e)
            self.reconnect()
        log.debug("received response: %r", self.device_id)

d = Device()
d.connect()
while True:
    d.receive()
However, the same code wrapped in a threading.Thread class causes deadlocks and odd behaviour. Wrapping it with locks didn't change anything. I traced the problem down to the socket.recv() call. So, how do I implement multiple threads, where each thread exclusively owns one socket, that can wait for data simultaneously?
Thanks in advance
I know this does not answer your question about how to fix the deadlock, but it appears that threads are unnecessary overhead in your case:
You can use a single thread that calls select() to find out which sockets have data available and then handles that data. Unless the handling takes long or your protocol is more complicated, select should be just fine and avoids all threading issues.
Have a look at http://docs.python.org/howto/sockets.html#non-blocking-sockets for more details.
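A minimal sketch of the select() approach, assuming each device is just a connected socket. The helper name read_ready is made up, and socket.socketpair() stands in for the real devices:

```python
import select
import socket


def read_ready(sockets, timeout=1.0):
    """Return {socket: data} for every socket that select() reports readable."""
    readable, _, _ = select.select(sockets, [], [], timeout)
    return {sock: sock.recv(255) for sock in readable}


def demo():
    # Two socket pairs stand in for two devices; only device A sends a status.
    dev_a, peer_a = socket.socketpair()
    dev_b, peer_b = socket.socketpair()
    peer_a.sendall(b"status A")

    result = read_ready([dev_a, dev_b])
    outcome = (result.get(dev_a), dev_b in result)

    for sock in (dev_a, peer_a, dev_b, peer_b):
        sock.close()
    return outcome
```

In a real program read_ready() would be called in a loop, and a socket whose recv() returns an empty bytes object would be treated as closed and reconnected.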
How many different sockets do you have to read from?
If the answer is "just one", then use just one thread. Adding another helps you in no way and only complicates your life, as you found out.
If the answer is "several", then one way to organize this is indeed to have a thread per socket. recv is a blocking operation, which makes a thread an attractive option to organize code. Each thread owns a separate socket and reads from it at its leisure. You should have no problems and deadlocks with this.
Locks are unnecessary as long as no resources are shared. Even if you do share resources (logging, some data store, etc.) don't just use simple locks - Python has higher-level utilities for that like the Queue module.
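Here is a small sketch of the thread-per-socket layout, assuming Python 3 (module name queue). Each thread blocks in recv() on its own socket and the only shared object is a queue. The device names and the helper names are invented for the example, and socket.socketpair() stands in for real device connections:

```python
import queue
import socket
import threading


def reader(device_id, sock, updates):
    """Block on recv() for one socket and report each packet through the queue."""
    while True:
        packet = sock.recv(255)
        if not packet:  # peer closed the connection
            break
        updates.put((device_id, packet))


def demo():
    updates = queue.Queue()
    pairs = {name: socket.socketpair() for name in ("dev1", "dev2")}

    # One thread per socket; no thread ever touches another thread's socket.
    threads = [threading.Thread(target=reader, args=(name, mine, updates))
               for name, (mine, _peer) in pairs.items()]
    for thread in threads:
        thread.start()

    # Simulate each device sending one status update and disconnecting.
    for name, (_mine, peer) in pairs.items():
        peer.sendall(name.encode() + b" ok")
        peer.close()

    for thread in threads:
        thread.join()
    for mine, _peer in pairs.values():
        mine.close()

    # Collect everything the reader threads reported.
    seen = {}
    while not updates.empty():
        device_id, packet = updates.get()
        seen[device_id] = seen.get(device_id, b"") + packet
    return seen
```

No explicit lock appears anywhere: the sockets are not shared, and the queue handles its own synchronization.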

Mix Python Twisted with multiprocessing?

I need to write a proxy-like program in Python; the workflow is very similar to a web proxy. The program sits between the client and the server, intercepts requests sent by the client to the server, processes each request, then sends it to the original server. The protocol is a private one that runs over TCP.
To minimize the effort, I want to use Python Twisted to handle the request receiving (the part acts as a server) and resending (the part acts as a client).
To maximize performance, I want to use Python multiprocessing (threading is limited by the GIL) to split the program into three parts (processes). The first process runs Twisted to receive requests, puts each request in a queue, and returns success immediately to the original client. The second process takes requests from the queue, processes them further and puts them on another queue. The third process takes requests from the second queue and sends them to the original server.
I am a newcomer to Python Twisted. I know it is event driven, and I have also heard it's better not to mix Twisted with threading or multiprocessing. So I don't know whether this approach is appropriate, or whether there is a more elegant way using just Twisted.
Twisted has its own event-driven way of running subprocesses which is (in my humble, but correct, opinion) better than the multiprocessing module. The core API is spawnProcess, but tools like ampoule provide higher-level wrappers over it.
If you use spawnProcess, you will be able to handle output from subprocesses in the same way you'd handle any other event in Twisted; if you use multiprocessing, you'll need to develop your own queue-based way of getting output from a subprocess into the Twisted mainloop somehow, since the normal callFromThread API that a thread might use won't work from another process. Depending on how you call it, it will either try to pickle the reactor, or just use a different non-working reactor in the subprocess; either way it will lose your call forever.
ampoule is the first thing I think of when reading your question.
It is a simple process pool implementation which uses the AMP protocol to communicate. You can use the deferToAMPProcess function, it's very easy to use.
You can try something like the cooperative multitasking technique described at http://us.pycon.org/2010/conference/schedule/event/73/ . It's similar to the technique Glyph mentioned and it's worth a try.
You can try to use ZeroMQ with Twisted but it's really hard and experimental for now :)

Python/Twisted - Sending to a specific socket object?

I have a "manager" process on a node, and several worker processes. The manager is the actual server who holds all of the connections to the clients. The manager accepts all incoming packets and puts them into a queue, and then the worker processes pull the packets out of the queue, process them, and generate a result. They send the result back to the manager (by putting them into another queue which is read by the manager), but here is where I get stuck: how do I send the result to a specific socket? When dealing with the processing of the packets on a single process, it's easy, because when you receive a packet you can reply to it by just grabbing the "transport" object in-context. But how would I do this with the method I'm using?
It sounds like you might need to keep a reference to the transport (or protocol) along with the bytes that just came in on that protocol in your 'event' object. That way responses to data that came in on a connection go out on the same connection.
If things don't need to be processed serially perhaps you should think about setting up functors that can handle the data in parallel to remove the need for queueing. Just keep in mind that you will need to protect critical sections of your code.
Edit:
Judging from your other question about evaluating your server design it would seem that processing in parallel may not be possible for your situation, so my first suggestion stands.
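The first suggestion can be sketched as follows. One assumption goes beyond the answer: since the workers are separate processes, this sketch tags each packet with a connection id and keeps the actual connection objects in the manager, on the guess that a transport cannot usefully be sent to another process. FakeTransport, Manager and worker_step are hypothetical names, not Twisted classes, and plain queue.Queue objects stand in for whatever inter-process queues the real design uses:

```python
import itertools
import queue


class FakeTransport:
    """Hypothetical stand-in for a connection object with a write() method."""

    def __init__(self):
        self.sent = []

    def write(self, data):
        self.sent.append(data)


class Manager:
    """Owns every connection and tags each packet with its connection id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.transports = {}
        self.requests = queue.Queue()  # manager -> workers
        self.results = queue.Queue()   # workers -> manager

    def connection_made(self, transport):
        conn_id = next(self._ids)
        self.transports[conn_id] = transport
        return conn_id

    def data_received(self, conn_id, data):
        # The id travels with the data, so the reply can find its way back.
        self.requests.put((conn_id, data))

    def deliver_results(self):
        while not self.results.empty():
            conn_id, reply = self.results.get()
            transport = self.transports.get(conn_id)
            if transport is not None:  # the client may have disconnected
                transport.write(reply)


def worker_step(requests, results):
    """Process one packet and echo the connection id back with the result."""
    conn_id, data = requests.get()
    results.put((conn_id, data[::-1]))  # reversing stands in for real work


def demo():
    manager = Manager()
    alice, bob = FakeTransport(), FakeTransport()
    a_id = manager.connection_made(alice)
    b_id = manager.connection_made(bob)
    manager.data_received(a_id, b"abc")
    manager.data_received(b_id, b"xyz")
    worker_step(manager.requests, manager.results)
    worker_step(manager.requests, manager.results)
    manager.deliver_results()
    return alice.sent, bob.sent
```

The workers never need to know what a connection is; they only copy the id from the request onto the result.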
