I am trying to do two things in parallel:
intercept HTTPS POST requests and make Twisted hold the connection open indefinitely (returning NOT_DONE_YET)
listen to a ZeroMQ queue (and close one of the pending connections when a ZeroMQ message arrives)
I know how to do each of these two things separately, but how can I do them in parallel? To be specific: how can I receive messages from ZeroMQ inside the Twisted framework when I am already listening on TCP sockets?
You can get ZeroMQ support for Twisted from txZMQ.
It sounds like you already know how to deal with the HTTP request.
To do these things in parallel, just create your ZmqSubConnection and your twisted.web.server.Site, with references to each other as necessary for your application. No special set-up is required.
Just make sure you only create and run one reactor. Other people with similar questions sometimes don't understand that reactor.run() means "run the event loop for the whole process", and not "run this one thing I set up".
I am building a simple star-like client-server topology.
The idea is that clients connect to the server and can send messages, and the server can send messages to them whenever it decides to. There will be a relatively small number of clients, about 30, but enough that it is not sensible to broadcast all outgoing data to every one of them. I'm sure I'm just being boneheaded, but this seems to be completely impossible with ZeroMQ.
That last part is why existing answers don't cover my case. The catch is this:
I can use a ROUTER socket to receive messages from clients; this also carries identification. However, I cannot use the same socket for sending, since ZeroMQ sockets are not thread-safe: I can't have one thread waiting for incoming messages and another sending outgoing messages from the server itself. Nor am I aware of any way to block on both socket.recv() and, for example, queue.get() at the same time in a single Python thread. Maybe there is a way to do that.
Using two sockets, one incoming and one outgoing, doesn't work either. The identification is not shared between sockets, so the sending socket would still have to be polled to obtain the client-id mapping, even if only once. We obviously can't use a separate port for each client. There seems to be no way for the server to send a message to a single client of its own volition.
(Subscription topics are a dead end too: message filtering is performed on the client side, so the server would just flood every client's network.)
Plain TCP sockets can handle this sort of asynchronous situation easily, but effective message framing in Python is a nightmare to build. All I'm essentially after is a reliable socket that handles whole messages and has well-defined failure modes.
I don't know Python, but in C/C++ I would use zmq_poll(). There are several options, depending on your requirements.
Use zmq_poll() to wait for messages from clients. If a message arrives, process it. Also use a time-out. When the time-out expires, check if you need to send messages to clients and send them.
zmq_poll() can also wait on general file descriptors. You can use some type of file descriptor and trigger it (write to it) from another process or thread when you have a message to send to a client. If this file descriptor is triggered, send messages to clients.
Use ZeroMQ sockets internally inside your server. Use zmq_poll() to wait both on messages from clients and internal processes or threads. If the internal sockets are triggered, send messages to clients.
You can use the file descriptor or internal ZeroMQ sockets just for triggering but you can also send the message content through the file descriptor or ZeroMQ socket.
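In Python, the file-descriptor trigger from the second option can be sketched with the standard library alone; select() stands in for zmq_poll(), and a socketpair plays the role of the trigger descriptor (all names here are illustrative):

```python
import queue
import select
import socket
import threading

wake_r, wake_w = socket.socketpair()  # the trigger file descriptor pair
outgoing = queue.Queue()              # messages the server wants to send

def producer():
    # another thread/process has a message for a client:
    outgoing.put(b"message for client 7")
    wake_w.send(b"\x00")              # trigger the descriptor

threading.Thread(target=producer).start()

# the server's single loop waits on the trigger (and, in the real
# design, on the ROUTER socket as well, via zmq_poll)
readable, _, _ = select.select([wake_r], [], [], 5.0)
if wake_r in readable:
    wake_r.recv(1)                    # drain the trigger byte
    msg = outgoing.get()              # now send msg to the client
```

With real ZeroMQ you would register both the ROUTER socket and the trigger descriptor in the same zmq_poll() call, so one thread serves both directions.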
Q: "ZeroMQ: How to construct simple asynchronous broker?"
The concept builds on a few assumptions that do not hold:
a) Python threads never actually execute concurrently; the GIL re-serialises them into a sequence of solo execution blocks, and for any foreseeable future that will remain so (Guido van Rossum has explained that the GIL exists precisely to prevent such collisions on shared state).
b) ZeroMQ's thread-safety rules have nothing to do with whether you use blocking-mode operations.
c) The ZeroMQ PUB/SUB archetype does perform topic filtering, but on different sides of the wire depending on the version:
Until v3.1, subscription matching (the TOPIC filter) was handled on the SUB side, so that processing was distributed among all the SUBs, at the cost of sending the full data traffic across all the transports involved; the PUB side paid no penalty beyond sourcing that data flow.
Since v3.1, the TOPIC filter is processed on the PUB side, at the cost of processing overhead and memory allocations there, but saving all the transport capacity previously wasted on messages the SUB side would only inspect and discard.
Using a .poll()-based design with zmq.NOBLOCK modes of the .recv() and .send() methods never leaves you in an ambiguous, much less an unsalvageable, deadlocked waiting state, and it adds the capability to build a lightweight priority-driven soft scheduler with different relative priority levels.
Given your strong background in realtime systems, you might like to read further into the ZeroMQ framework's properties.
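A small pyzmq sketch of that poll-plus-NOBLOCK pattern, assuming pyzmq is available; the DEALER peer stands in for a client, and the port and timeout are arbitrary demo values:

```python
import zmq

ctx = zmq.Context.instance()

router = ctx.socket(zmq.ROUTER)
port = router.bind_to_random_port("tcp://127.0.0.1")

client = ctx.socket(zmq.DEALER)           # stand-in for a real client
client.connect("tcp://127.0.0.1:%d" % port)
client.send(b"hello")

poller = zmq.Poller()
poller.register(router, zmq.POLLIN)

events = dict(poller.poll(timeout=2000))  # bounded wait, never a deadlock
if router in events:
    # ROUTER prepends the sender's identity frame, which the server can
    # later reuse to address a reply to that one client
    identity, payload = router.recv_multipart(flags=zmq.NOBLOCK)
```

Because poll() takes a timeout and recv() is non-blocking, the same loop can interleave receiving, sending, and any scheduling policy you like on a single thread.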
I'm aware that pika is not thread-safe. I was trying to work around that by using a lock to guard access to the channel, but I still get this error:
pika.exceptions.ConnectionClosed: (505, 'UNEXPECTED_FRAME - expected content header for class 60, got non content header frame instead')
P.S. I cannot use a different channel.
What could I do? Thank you in advance for any help.
You need to redesign your application or choose a RabbitMQ library other than Pika. Locks do not make Pika thread-safe: each thread needs its own connection.
You have a couple of options, but none of them will be as simple as using a lock.
One would be to replace Pika with Kombu. Kombu is thread safe but the interface is rather different from Pika (simpler in my opinion but this is subjective).
If you want to keep using Pika, then you need to redesign your Rabbit interface. I do not know why you "cannot" use a different channel, but one possible approach is to have a single thread interfacing with Rabbit, communicating with worker threads via queues. That thread would read data, hand the received data to a worker through one queue, receive answers from workers through another queue, and then submit those answers to Rabbit as responses.
You might also be able to untangle something in your communications protocol so that you actually can use a different channel, with each thread interfacing with Rabbit independently through its own connection and channel. This is the method I generally use.
Yet another candidate would be to get rid of threads and start using async methods instead. Your application may or may not be suitable for this.
But there is no simple workaround, and you will eventually encounter weird behaviour or exceptions if you try to share Pika objects between threads.
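A stdlib-only sketch of that single-Rabbit-thread design: the queue handoffs are real, but the actual pika calls are replaced by stand-ins (the function names and messages are hypothetical):

```python
import queue
import threading

incoming = queue.Queue()   # Rabbit thread -> workers
responses = queue.Queue()  # workers -> Rabbit thread

def worker():
    while True:
        body = incoming.get()
        if body is None:               # shutdown sentinel
            break
        responses.put(body.upper())    # stand-in for real processing

def rabbit_loop(deliveries):
    # In the real design, only this function would touch the pika
    # connection/channel (basic_consume for input, basic_publish for output).
    published = []
    for body in deliveries:
        incoming.put(body)             # hand work to a worker
    for _ in deliveries:
        published.append(responses.get())  # collect replies to publish
    incoming.put(None)
    return published

t = threading.Thread(target=worker)
t.start()
out = rabbit_loop([b"a", b"b"])
t.join()
```

Because exactly one thread owns the connection and channel, no locks are needed and Pika's thread-safety rules are never violated.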
When using time.sleep(1) before sendMessage, the whole process stops (even the other connections).
def handleConnected(self):
    print self.address, 'connected'
    for client in clients:
        time.sleep(1)
        client.sendMessage(self.address[0] + u' - connected')
Server: https://github.com/dpallot/simple-websocket-server
How to solve it?
The server that you are using is a synchronous, "select"-type server. Such servers use a single process and a single thread; they achieve concurrency through the select() function, which efficiently waits for I/O on multiple socket connections.
The advantage of select servers is that they can easily scale to a very large number of clients. The disadvantage is that when the server invokes an application handler (the handleConnected(), handleMessage() and handleClose() methods for this server), it blocks until the handler returns: the handlers and the server run on the same thread, so while a handler is running the server is suspended. The only way for the server to stay responsive with this architecture is to code the handlers so that they do what they need to do quickly and return control to the server.
Your handleConnected handler is not a good match for this type of server because it is long-running: it will run for as many seconds as there are clients, and during all that time the server is blocked.
You can maybe work around the limitations in this server by creating a background thread for your long running task. That way your handler can return back to the server after launching the thread. The server will then regain control and go back to work, while the background thread does that loop with the one second sleeps inside. The only problem you have to consider is that now you have sort of a home-grown multithreaded server, so you will not be able to scale as easily.
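A minimal sketch of that workaround; the client list and send_message are stand-ins for the server's real objects, and the sleep intervals are shortened for demonstration:

```python
import threading
import time

clients = ["alice", "bob"]        # stand-ins for connected client objects
sent = []

def send_message(client, text):
    sent.append((client, text))   # stand-in for client.sendMessage(text)

def notify_all(address):
    # the slow loop now runs off the server's thread
    for client in clients:
        time.sleep(0.1)
        send_message(client, address + " - connected")

def handleConnected(address):
    # returns immediately, so the select() loop keeps serving everyone
    threading.Thread(target=notify_all, args=(address,)).start()

handleConnected("10.0.0.1")
time.sleep(0.5)  # demo only: give the background thread time to finish
```

Note the trade-off described above: the handler returns instantly, but clients and sent are now touched from two threads, which is exactly the home-grown multithreading you would have to manage.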
Another option for you to consider is to use a different server architecture. A coroutine based server will support your handler function as you coded it, for example. The two servers that I recommend in this category are eventlet and gevent. The eventlet server comes with native WebSocket support. For gevent you have to install an extension called gevent-websocket.
Good luck!
You are suspending the thread with sleep, and the server you are using appears to handle requests with select rather than threads, so no other request can be handled in the meantime.
That means you can't use time.sleep there.
Why do you need to sleep? Can you solve it some other way?
Maybe you can use something like threading.Timer()
def sendHello(client):
    client.sendMessage("hello, world")

for client in clients:
    t = Timer(1.0, sendHello, args=(client,))
    t.start()  # one second later, "hello, world" is sent to this client

Passing the client through args binds the current value; a bare lambda: sendHello(client) would see only the loop's final client by the time the timers fire.
This is off the top of my head. You would also need a way to cancel each timer, so save each t in a list and call cancel() on it when done.
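A fleshed-out version of that idea, with the timers kept in a list; the fired list and client strings are placeholders for real client objects and sendMessage calls:

```python
from threading import Timer

fired = []

def send_hello(client):
    fired.append(client)           # stand-in for client.sendMessage(...)

clients = ["alice", "bob"]         # placeholder client list
timers = []
for client in clients:
    # args=(client,) binds the current client for each timer
    t = Timer(0.1, send_hello, args=(client,))
    timers.append(t)
    t.start()

for t in timers:
    t.join()      # demo only: wait for the timers to fire

for t in timers:
    t.cancel()    # harmless after firing; stops any timer still pending
```

On shutdown you would run the cancel() loop instead of join(), so no timer fires against a closed connection.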
I have a website which sends out heavy processing tasks to a worker server. Right now, there is only one worker server however in the future more will be added. These jobs are quite time-consuming (takes 5mins - 1 hour). The idea is to have a configuration where just building a new worker server should suffice to increase the capacity of the whole system, without needing extra configuration in the webserver parts.
Currently, I've done a basic implementation using python-zeromq, with the PUSH/PULL architecture.
Every time there's a new job request, the webserver creates a socket, connects to one of the workers, and sends the job (no reply needed; this is a fire-and-forget type of job):
context = zmq.Context()
socket = context.socket(zmq.PUSH)
socket.connect("tcp://IP:5000")
socket.send(msg)
And on the worker side this is running all the time:
context = zmq.Context()
socket = context.socket(zmq.PULL)
# bind to a port on its own IP
socket.bind("tcp://IP:5000")
print("Listening for messages...")
while True:
    msg = socket.recv()
    # <do something>
Now that I've looked into this more, I think this is not quite the right way of doing it, since adding a new worker server would require adding its IP to the webserver script, connecting to each of them, and so on.
I would rather have the webserver hold one persistent socket (instead of creating one every time) and have the workers connect to the webserver instead, sort of like here:
https://github.com/taotetek/blog_examples/blob/master/python_multiprocessing_with_zeromq/workqueue_example.py
In short, as opposed to the above, the webserver's socket binds to its own IP and the workers connect to it. I suppose jobs are then distributed round-robin style.
However, what I'm worried about is what happens if the webserver gets restarted (something that happens quite often) or goes offline for a while. With ZeroMQ, will all the worker connections hang or somehow become invalid? If the webserver goes down, will the current queue disappear?
In the current setup things seem to run somewhat OK, but I'm not 100% sure what the right (and not too complex) way of doing this is.
From the ZeroMQ Guide:
Components can come and go dynamically and ØMQ will automatically reconnect.
If the underlying tcp connection is broken, ZeroMQ will repeatedly try to reconnect, sending your message once the connection succeeds.
Note that PAIR sockets are an exception. They don't automatically reconnect. (See the zmq_socket docs.)
Binding on the server might work. Are you sure you won't ever need more than one web server, though? I'd consider putting a broker between your server(s) and workers.
Either way, I think persistent sockets are the way to go.
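A sketch of that inverted topology with pyzmq (assumed installed); inproc:// keeps the demo in one process, where a real deployment would bind tcp://*:5000:

```python
import zmq

ctx = zmq.Context.instance()

# webserver side: one persistent socket, bound once to a well-known address
server = ctx.socket(zmq.PUSH)
server.bind("inproc://jobs")

# worker side: each new worker just connects; no webserver changes needed
worker = ctx.socket(zmq.PULL)
worker.connect("inproc://jobs")

server.send(b"job-1")   # PUSH load-balances round-robin across workers
msg = worker.recv()
```

If the webserver restarts, each worker's connect endpoint stays the same and ZeroMQ keeps retrying until the bind reappears; messages still sitting in the webserver's send queue when it dies, however, are lost, which is the argument for a broker if you need durability.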
I am learning Twisted, especially its XMPP side. I am writing a Jabber client which must send and receive messages. Here is my code: http://pastebin.com/m71225776
As I understand it, the workflow is like this:
1. I create handlers for the important network events (connecting, message receiving, disconnecting, etc.)
2. I run the reactor. This starts the loop that waits for events; when an event happens it is passed to the appropriate handler.
The problem is with sending messages. Sending is not associated with any network event, so I can't create a handler for it. Also, I can't do anything with the reactor while its loop is running. But the goal is to send messages when I need to and receive data when it comes.
I think I don't fully understand the philosophy of Twisted, so please show me the right way.
You just need to find what events will trigger sending a message.
For example, in a GUI client, sending happens when the user types something. You should integrate with a graphics toolkit, using the Twisted reactor for its mainloop (there's a Gtk+ Twisted reactor for example). Then you'll be able to listen for some interface events, like the user typing enter in a text area; and you'll be able to react to that event by sending a message.
Other sources of events could be Twisted timers, any kind of protocol, including IPC, webhooks…
Incidentally, if you need a higher-level library for XMPP with Twisted, have a look at Wokkel.
More accurately, you can't do anything with the reactor until it calls one of your callbacks. You don't call twisted, twisted calls you.
One way to experiment is to have one of your setup handlers (one you know will be called), or just test code placed right before you start the reactor, call reactor.callLater() or set up a twisted.internet.task.LoopingCall().