ZeroMQ cleaning up PULL socket - half-close

My problem is kind of trying to half-close a zmq socket.
In simple terms I have a pair of PUSH/PULL sockets in Python.
The PUSH socket never stops sending, but the PULL socket should be able to clean itself up in the following way:
1) Stop accepting any additional messages to the queue
2) Process the messages still in the queue
3) Close the socket etc.
I don't want to affect the PUSH socket in any way; it can keep accumulating its own queue until another PULL socket comes along (or one might be there already). The LINGER option doesn't seem to work with recv() (just with send()).
One option might be to have a broker in between with the broker PUSH and receiver PULL HWM set to zero. Then the broker's PULL would accumulate the messages. However, I'd rather not do this. Is there any other way?

I believe you are confusing which socket type will queue messages. According to the zmq_socket docs, a PUSH socket will queue its messages but a PULL socket doesn't have any type of queuing mechanism.
So what you're asking to be able to do would be something of the following:
1) Stop recv'ing any additional messages to the PULL socket.
2) Close the socket etc.
The PUSH socket will continue to 'queue' its messages automatically until either the HWM is met (at which point it will block and not queue any more messages) or a PULL socket comes along and starts recv'ing messages.
The case I think you're really concerned about is a slow PULL reader, where you would like to get all of the messages currently queued in the PUSH socket (at once?) and then quit. This isn't how zmq works; you get one message at a time.
To implement something of this sort, you'll have to wrap the PULL capability with your own queue. You 'continually' PULL the messages into your personal queue (in a different thread?) until you want to stop, then process those messages and quit.
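The wrapping described above could be sketched roughly as follows. This is only an illustration assuming pyzmq: the function names, the `inproc://half-close-demo` endpoint, and the 50 ms poll interval are made up for the example. A reader thread keeps moving messages from the PULL socket into a local `queue.Queue` until a stop flag is set, then closes the socket; the main thread processes whatever was collected.

```python
import queue
import threading
import time

import zmq

ENDPOINT = "inproc://half-close-demo"  # hypothetical endpoint for the demo


def pull_into_queue(ctx, endpoint, local_queue, stop_event, poll_ms=50):
    """Move messages from a PULL socket into local_queue until stop_event is set."""
    sock = ctx.socket(zmq.PULL)
    sock.connect(endpoint)
    poller = zmq.Poller()
    poller.register(sock, zmq.POLLIN)
    while not stop_event.is_set():
        # poll() returns an empty list on timeout, so the flag is rechecked regularly.
        if poller.poll(poll_ms):
            local_queue.put(sock.recv())
    # Step 1 of the shutdown: stop receiving by closing the PULL socket.
    sock.close(linger=0)


def run_demo(n_messages=5):
    ctx = zmq.Context()
    push = ctx.socket(zmq.PUSH)
    push.bind(ENDPOINT)

    local_queue = queue.Queue()
    stop_event = threading.Event()
    reader = threading.Thread(
        target=pull_into_queue, args=(ctx, ENDPOINT, local_queue, stop_event)
    )
    reader.start()

    for i in range(n_messages):
        push.send(b"msg %d" % i)

    # Wait (with a deadline) until everything sent so far has been collected.
    deadline = time.monotonic() + 2.0
    while local_queue.qsize() < n_messages and time.monotonic() < deadline:
        time.sleep(0.01)

    stop_event.set()
    reader.join()

    # Step 2: process what is left in our own queue, then quit.
    processed = []
    while not local_queue.empty():
        processed.append(local_queue.get())

    push.close(linger=0)
    ctx.term()
    return processed
```

One caveat with this approach: anything that has already been delivered to the PULL socket but not yet recv()'d when close() runs is discarded, so the window for loss is reduced rather than eliminated.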

Related

Python Sockets: Question regarding Network Buffers when using send() and recv()

First off, I've spent a fair amount of time reading around, including many threads on this site; however, I still need some clarification on Sockets, TCP and Networking in Python, as I feel like I don't fully understand what's happening in my program.
I'm sending data from a server to a client via a Unix Domain Socket (AF_UNIX) using TCP (SOCK_STREAM).
On the server side, a process is continuously putting items on a Queue.Queue and another process is sending items to the client by running
    while True:
        conn.sendall(queue.get())
On the client side, data is read by running
    while True:
        conn.recv(1024)
        # time.sleep(10)
Now, I emulate a slow client by sending the client process to sleep after every call on recv(). What I expect is that the queue on the server side is filled up, since send() should block because the client can't read off data fast enough.
I monitor the number of items sent to the client as well as the queue size. What I notice is that several dozen messages (roughly depending on the size of the messages, but slightly different message sizes might behave the same) are sent to the client (which are received by the client with delay, due to time.sleep()) before the queue starts to fill up.
What is happening here? Why is send() not blocking immediately?
I suspect that some sort of network or file buffer is involved, which queues the sent items and fills up before my implemented queue.

There are a number of buffers in various places in the system, on both the sender and the receiver. Your call to a sending function won't block until all those buffers are filled up. When the receiver drains some of the buffers, data will flow again and eventually it will unblock the send call.
Typically there's a buffer in the sender holding data waiting to be put on the wire, a buffer "in flight" allowing a certain number of bytes to be sent before having to wait for the receiver to acknowledge, and lastly receive buffers holding data that has been acknowledged but not yet delivered to the receiving application.
Were this not so, forward progress would be extremely limited. The sender would be stuck waiting to send until the receiver called receive. Then, whichever one finishes first would have to wait for the other one. Even if the sender was finished first, it couldn't make any forward progress at all until the receiver finished processing the previous chunk of data. That would be quite sub-optimal for most applications.
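The effect of these buffers can be observed with a small experiment. The sketch below (the function name and chunk size are just for illustration) uses `socket.socketpair()` to get a connected pair of stream sockets, never reads from the receiving end, and counts how many bytes a non-blocking sender can hand to the OS before a send would block. The exact number depends on the platform and its buffer settings.

```python
import socket


def bytes_accepted_before_blocking(chunk_size=1024):
    """Return how many bytes the OS accepts while the peer never calls recv()."""
    sender, receiver = socket.socketpair()  # connected stream sockets
    sender.setblocking(False)
    sent = 0
    try:
        while True:
            # send() returns the number of bytes actually buffered.
            sent += sender.send(b"x" * chunk_size)
    except BlockingIOError:
        # The buffers are full: a blocking send() would stall at this point.
        pass
    finally:
        sender.close()
        receiver.close()
    return sent
```

With a blocking socket, as in the question, the same amount of data is accepted without delay, which is why the application-level queue only starts to grow after a number of messages have already been "sent".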

socket server responding multiple requests simultaneously from a client socket

I am building a socket server with Python.
This server:
1) receives data from client
2) does something here (approximately it takes about 10 sec in maximum depending on input data)
3) sends back data after some works done above
This system works fine as long as the client doesn't send several requests in quick succession. For example, say the server takes 5 seconds to process data and the client sends data every 10 seconds. The problem, however, is that the client sometimes sends multiple requests at a time, which causes a delay. Currently, the client cannot send data to the server unless the server is ready to receive it, which means the server is not doing any work at that moment. Below is what I want to build.
a) build a queue at socket server whose main task is to make a queue of input data so that client can send data to server even when server is busy
b) make a thread at the socket server so that the server can do work 'simultaneously' (here I'm a bit confused about concurrency versus parallelism; the work behind the socket is focused on computation rather than system calls)
c) send back data to client socket
My questions are as follows.
Is it Queue that I need to use in order to achieve a) ?
Is it thread or something else that I need to use in order to achieve b)?
Thanks in advance
Best
Gee

Yeah something like this could work.
First, you'll need a thread to receive and send data. If you have a limited number of clients, you can create a thread per client, but it's not an option for a more or less robust system. In order to be able to serve multiple clients in a single thread, the sockets should be nonblocking. Otherwise one long transmission would block other transmissions. Nonblocking code has a more sophisticated structure that uses select, so I would advise spending some time reading about it.
Then you'll need a thread to do the math. Or several threads/processes if "the math" is taking long to execute.
Last but not least, these socket threads and the "math" thread should use two queues to exchange data. Simple lists are enough, but make sure they are synchronized. Guard them with mutexes, or locks. This is another vast topic that is worth reading about.
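As a minimal sketch of the two-queue arrangement, the example below uses the standard library's `queue.Queue`, which does its own locking, instead of hand-guarded lists. The names (`math_worker`, `client_id`, the `None` sentinel) are assumptions made for the illustration, and the socket thread is replaced by direct `put()` calls to keep it short.

```python
import queue
import threading


def math_worker(requests, responses, compute):
    """Take (client_id, payload) items, compute, and queue the results."""
    while True:
        item = requests.get()
        if item is None:  # sentinel value: shut the worker down
            break
        client_id, payload = item
        responses.put((client_id, compute(payload)))


def run_demo():
    requests = queue.Queue()   # socket thread -> math thread
    responses = queue.Queue()  # math thread -> socket thread
    worker = threading.Thread(
        target=math_worker, args=(requests, responses, lambda x: x * x)
    )
    worker.start()

    # In a real server these puts would come from the socket-handling thread.
    for client_id, payload in [("a", 2), ("b", 3)]:
        requests.put((client_id, payload))
    requests.put(None)
    worker.join()

    results = []
    while not responses.empty():
        results.append(responses.get())
    return results
```

Tagging each item with a client identifier lets the socket thread route every result back to the connection it came from. For CPU-bound work in CPython, a process (for example via `multiprocessing`) may be a better fit than a thread, as the answer hints.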

Exiting a multiprocessing program on error

I have an application that creates one multiprocessing process for communication and n for workers. I would need to find a way to exit the whole application when there is an error somewhere or the user presses control-C.
// the agents are connected via zmq

If the error occurs in the communication process, you could use your existing socket to send a special task that asks the receiver to close. This would be received by one worker, that notifies the communication process (through some socket) that it is exiting, and then exits.
The communication process could basically fill the pipeline with n such close tasks if no queuing is done at the workers, or k*n plus one every once in a while if queuing is done. This continues until all workers are done.
An error at a worker could be reported to the communication process through the channel mentioned earlier. The channel could for example be a push/pull socket, or maybe a pub/sub with a proxy. (I guess you are using a push/pull to get the results back. In that case use that.)

How should a ZeroMQ worker safely "hang up"?

I started using ZeroMQ this week, and when using the Request-Response pattern I am not sure how to have a worker safely "hang up" and close his socket without possibly dropping a message and causing the customer who sent that message to never get a response. Imagine a worker written in Python who looks something like this:
    import zmq

    c = zmq.Context()
    s = c.socket(zmq.REP)
    s.connect('tcp://127.0.0.1:9999')
    for i in range(8):  # serve eight requests, then hang up
        s.recv()
        s.send(b'reply')  # bytes, so it also works on Python 3
    s.close()
I have been doing experiments and have found that a customer at 127.0.0.1:9999 of socket type zmq.REQ who makes a fair-queued request just might have the misfortune of having the fair-queuing algorithm choose the above worker right after the worker has done its last send() but before it runs the following close() method. In that case, it seems that the request is received and buffered by the ØMQ stack in the worker process, and that the request is then lost when close() throws out everything associated with the socket.
How can a worker detach "safely" — is there any way to signal "I don't want messages anymore", then (a) loop over any final messages that have arrived during transmission of the signal, (b) generate their replies, and then (c) execute close() with the guarantee that no messages are being thrown away?
Edit: I suppose the raw state that I would want to enter is a "half-closed" state, where no further requests could be received — and the sender would know that — but where the return path is still open so that I can check my incoming buffer for one last arrived message and respond to it if there is one sitting in the buffer.
Edit: In response to a good question, corrected the description to make the number of waiting messages plural, as there could be many connections waiting on replies.

You seem to think that you are trying to avoid a “simple” race condition such as in
... = zmq_recv(fd);
do_something();
zmq_send(fd, answer);
/* Let's hope a new request does not arrive just now, please close it quickly! */
zmq_close(fd);
but I think the problem is that fair queuing (round-robin) makes things even more difficult: you might already even have several queued requests on your worker. The sender will not wait for your worker to be free before sending a new request if it is its turn to receive one, so at the time you call zmq_send other requests might be waiting already.
In fact, it looks like you might have selected the wrong data direction. Instead of having a requests pool send requests to your workers (even when you would prefer not to receive new ones), you might want to have your workers fetch a new request from a requests queue, take care of it, then send the answer.
Of course, it means using XREP/XREQ, but I think it is worth it.
Edit: I wrote some code implementing the other direction to explain what I mean.

I think the problem is that your messaging architecture is wrong. Your workers should use a REQ socket to send a request for work and that way there is only ever one job queued at the worker. Then to acknowledge completion of the work, you could either use another REQ request that doubles as ack for the previous job and request for a new one, or you could have a second control socket.
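A rough sketch of this reversed direction, assuming pyzmq, might look like the following. The tiny protocol here (`NEXT`/`BYE` verbs, an empty reply meaning "no more work", the `inproc://work-demo` endpoint) is invented for the example and is not something ZeroMQ prescribes. The worker asks for each job with a REQ socket, and each request doubles as the acknowledgement of the previous job, so nothing is ever queued at a worker that has decided to leave.

```python
import threading

import zmq

ENDPOINT = "inproc://work-demo"  # hypothetical endpoint for the demo


def worker(ctx, max_jobs):
    """Request jobs one at a time; hang up after max_jobs."""
    sock = ctx.socket(zmq.REQ)
    sock.connect(ENDPOINT)
    result = b""  # the first request carries no result
    for _ in range(max_jobs):
        sock.send_multipart([b"NEXT", result])  # ack previous job, ask for more
        job = sock.recv()
        if not job:  # empty reply: the master has no work left
            result = b""
            break
        result = job.upper()  # stand-in for the real work
    # Hanging up: deliver the last result without asking for another job.
    sock.send_multipart([b"BYE", result])
    sock.recv()  # wait for the master's acknowledgement
    sock.close()


def master(sock, jobs):
    """Serve jobs over a REP socket until the single worker says BYE."""
    pending = list(jobs)
    results = []
    while True:
        verb, result = sock.recv_multipart()
        if result:
            results.append(result)
        if verb == b"BYE":
            sock.send(b"")
            return results, pending
        sock.send(pending.pop(0) if pending else b"")


def run_demo():
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind(ENDPOINT)  # bind before the worker connects
    thread = threading.Thread(target=worker, args=(ctx, 2))
    thread.start()
    results, unassigned = master(sock, [b"a", b"b", b"c"])
    thread.join()
    sock.close()
    ctx.term()
    return results, unassigned
```

In this demo the job the worker never asked for stays with the master, where it can be handed to another worker. A master serving several workers at once would need a ROUTER socket rather than a plain REP.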
Some people do this using PUB/SUB for the control so that each worker publishes acks and the master subscribes to them.
You have to remember that with ZeroMQ there are 0 message queues. None at all! Just messages buffered in either the sender or receiver depending on settings like High Water Mark, and type of socket. If you really do need message queues then you need to write a broker app to handle that, or simply switch to AMQP where all communication is through a 3rd party broker.

I've been thinking about this as well. You may want to implement a CLOSE message which notifies the customer that the worker is going away. You could then have the worker drain for a period of time before shutting down. Not ideal, of course, but might be workable.

There is a conflict of interest between sending requests as rapidly as possible to workers, and getting reliability in case a worker crashes or dies. There is an entire section of the ZeroMQ Guide that explains different answers to this question of reliability. Read that, it'll help a lot.
tl;dr workers can/will crash and clients need a resend functionality. The Guide provides reusable code for that, in many languages.

Wouldn't the simplest solution be to have the customer time out when waiting for the reply and then retry if no reply is received?

Try sleeping before the call to close. This is fixed in 2.1 but not in 2.0 yet.

Python/Twisted - Sending to a specific socket object?

I have a "manager" process on a node, and several worker processes. The manager is the actual server who holds all of the connections to the clients. The manager accepts all incoming packets and puts them into a queue, and then the worker processes pull the packets out of the queue, process them, and generate a result. They send the result back to the manager (by putting them into another queue which is read by the manager), but here is where I get stuck: how do I send the result to a specific socket? When dealing with the processing of the packets on a single process, it's easy, because when you receive a packet you can reply to it by just grabbing the "transport" object in-context. But how would I do this with the method I'm using?

It sounds like you might need to keep a reference to the transport (or protocol) along with the bytes that just came in on that protocol in your 'event' object. That way responses to data that came in on a connection go out on the same connection.
If things don't need to be processed serially perhaps you should think about setting up functors that can handle the data in parallel to remove the need for queueing. Just keep in mind that you will need to protect critical sections of your code.
Edit:
Judging from your other question about evaluating your server design it would seem that processing in parallel may not be possible for your situation, so my first suggestion stands.
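One way to apply the first suggestion when the workers are separate processes is to pass a small connection id through the queues and keep the actual transport objects in the manager. The sketch below is an assumption-laden illustration, not Twisted code: `RecordingTransport` is a stand-in that models only `write()`, and the `Manager` class and its method names are hypothetical.

```python
import itertools


class RecordingTransport:
    """Stand-in for a real transport; only write() is modelled."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class Manager:
    """Keeps transports in one process and hands out ids that can be queued."""

    def __init__(self):
        self._ids = itertools.count()
        self._transports = {}

    def connection_made(self, transport):
        conn_id = next(self._ids)
        self._transports[conn_id] = transport
        return conn_id  # queue (conn_id, data) instead of the transport itself

    def connection_lost(self, conn_id):
        self._transports.pop(conn_id, None)

    def deliver(self, conn_id, result):
        """Write a worker's result to the connection the request came from."""
        transport = self._transports.get(conn_id)
        if transport is None:  # the client disconnected in the meantime
            return False
        transport.write(result)
        return True
```

The workers then put `(conn_id, result)` tuples on the result queue, and the manager calls `deliver()` for each one it reads.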
