I'm looking for a way to take gads of inbound SMTP messages and drop them onto an AMQP broker for further routing and processing. The messages won't actually end up in a mailbox, but instead SMTP is used as a message gateway.
I've written a Postfix After-Queue Content Filter in Python that drops the inbound SMTP message onto a RabbitMQ broker. That works well - I get the raw message over a queue and it gets picked up nicely by a consumer. The issue is that the AMQP connection is created and torn down with each message... the Content Filter script gets re-executed from scratch each time. I imagine that will end up being a performance issue.
If I could leverage something re-entrant I could reuse the connection. Or maybe I'm just approaching the whole thing incorrectly...
Making an AMQP connection over plain TCP is pretty quick. Perhaps if you're using SSL then it's another story but you sure that enqueueing the raw message onto the AMQP exchange is going to be the bottleneck? My guess would be that actually delivering the message via SMTP is going to be much slower so how fast you can queue things up isn't going to affect the throughput of the system.
If this piece does turn out to be a bottleneck I rather like creating little web servers using Sinatra, or Rack but it sounds like you might prefer a Python based solution. Have the postfix content filter perform a HTTP POST using curl to a webserver, which maintains a persistent connection to the AMQP server.
Of course now you have an extra moving part for which you need to think about monitoring, error handling and security.
Use SwiftMQ. It has a JavaMail bridge which receives your eMails from an IMAP or POP3 account, converts it into a JMS message which then can be consumed by an AMQP 0.9.1 and/or AMQP 1.0 and of course JMS client.
You can make Postfix deliver all or any of your emails to an external program, where you can throw them anywhere you want. Some examples can be found here.
Related
I have a quick question regarding sending and receiving STOMP messages over an AMQ broker.
I have a python script that is sending data as STOMP messages to an AMQ instance, and another script which listens to that messages topic and grabs it. Everything is working as expected so far, but I'm curious about the security of the system. Would someone on the network be able to use a packet sniffer or similar tool to read the messages that are being sent/received by the broker? Or are they unable to see the data without the AMQ server login? My gut tells me it's the latter, but I wanted to confirm.
For context, my sender sends out the data using stomp.py
conn = stomp.Connection(host_and_ports=[(ip, port)])
conn.connect(wait=True)
conn.send(body=clean_msg, destination=f"/topic/{topic}")
Is that conn.send call encrypting or protecting my data in any way? If it isn't, how do I go about doing so? All my research into ActiveMQ and STOMP encryption leads me to encrypting the login or using SSL to login to the AMQ server, which leads me to believe that as long as the login is secure, I should be fine.
Thanks in advance!
STOMP is a text oriented protocol so unless you're using SSL/TLS then anybody who has access to the network would be able to look at the packets and fairly easily read the message data that's being sent from your producer(s) to the broker and from the broker to the consumer(s).
From what I can tell your Python STOMP client is not using SSL/TLS so your transmissions would not be protected.
Furthermore, once the data is stored on the broker then anybody with file-system access would be able to read the data as it is not encrypted in the storage. You can, of course, mitigate this risk by enforcing standard user access.
I am creating a colloabrative note-making app in python.
Here, one guy on computer running the app can create the server subseuqently the changes on the screen([color, pixel], where pixel=[x,y]) will be transmitted to others connected to the server.
I am using kivy for creating the app. My question is with respect to transmitting the data over the server.
I can create server using this:
import socket
ip_address=socket.gethostbyname(socket.gethostname())
execfile( "manage.py runserver "+ip_address+":8000" )
Now, how do others connect to the server and request the data(assuming the above code is correct). Also, how to send the data in django.
Well, Django is a framework that allows creating a site or API that is reachable through HTTP protocol. This has several consequences for you:
Server cannot send a message to client unless the client asks. HTTP is a "request-response" protocol. Client sends a request (for example, http://server.com/getUpdates?id=100500) and gets a response from server.
Creating clients that ask the server to give them updates all the time is a bad practice, probably leading to server DoS.
Although you can use WebSockets, using Django for such a task is really an overkill.
Summarizing, you need a reliable duplex channel for sending data in both directions. I'd start with TCP server, rather than HTTP. Fortunately, Python stdlib has a module you can start with - socketserver.
Additional reading
TCP
UDP (you will probably want this for broadcasting)
Berkeley sockets (a socket standard underlying socketserver module)
TCP vs. UDP
When deciding what protocol to use, following aspects should be considered:
TCP is reliable. Messages never disappear implicitly. If there was a network error, message will be resent. If there's no connection, explicit error will be raised. TCP uses several algorithms to fit into the network channel. It is an intelligent protocol.
UDP is unreliable. It possesses no feature TCP has. Packets can disappear, get reordered. But UDP messages are lightweight and in experienced hands they summon to life such systems as network action games and streaming video (lost and reordered messages aren't crucial here and TCP becomes too slow).
So I'd recommend to start with TCP. It's way more easier to get working fast and correct than UDP. Switch to UDP if you have some experience with TCP and there are a lot of people using you app and wanting to get the lowest latency possible.
Say I have a typical web server that serves standard HTML pages to clients, and a websocket server running alongside it used for realtime updates (chat, notifications, etc.).
My general workflow is when something occurs on the main server that triggers the need for a realtime message, the main server sends that message to the realtime server (via a message queue) and the realtime server distributes it to any related connection.
My concern is, if I want to scale things up a bit, and add another realtime server, it seems my only options are:
Have the main server keep track of which realtime server the client
is connected to. When that client receives a notification/chat
message, the main server forwards that message along to only the
realtime server the client is connected to. The downside here is
code complexity, as the main server has to do some extra book
keeping.
Or instead have the main server simply pass that message
along to every realtime server; only the server the client is
connected to would actually do anything with it. This would result
in a number of wasted messages being passed around.
Am I missing another option here? I'm just trying to make sure I don't go too far down one of these paths and realize I'm doing things totally wrong.
If the scenario is
a) The main web server raises a message upon an action (let's say a record is inserted)
b ) He notifies the appropriate real-time server
you could decouple these two steps by using an intermediate pub/sub architecture that forwards the messages to the indended recipient.
An implementation would be
1) You have a redis pub-sub channel where upon a client connecting to a real-time socket, you start listening in that channel
2) When the main app wants to notify a user via the real-time server, it pushes to the channel a message, the real-time server get's it and forwards it to the intended user.
This way, you decouple the realtime notification from the main app and you don't have to keep track of where the user is.
The problem you are describing is the common "message backplane" used for example in SignalR, also related to the "fanout message exchange" in message architectures. When having a backplane or doing fanout, every message is forwarded to every message node server, so clients can connect to any server and get the message. This approach is a reasonable pain when you have to support both long polling and websockets. However, as you noticed, it is a waste of traffic and resources.
You need to use a message infrastructure with intelligent routing, like RabbitMQ. Take a look to topic and header exchange : https://www.rabbitmq.com/tutorials/amqp-concepts.html
How Topic Exchanges Route Messages
RabbitMQ for Windows: Exchange Types
There are tons of different queuing frameworks. Pick the one you like, but ensure you can have more exchange modes than just direct or fanout ;) At the end, a WebSocket is just and endpoint to connect to a message infrastructure. So if you want to scale out, it boils down to the backend you have :)
For just a few realtime servers, you could conceivably just keep a list of them in the main server and just go through them round-robin.
Another approach is to use a load balancer.
Basically, you'll have one dedicated node to receive the requests from the main server, and then have that load-balancer node take care of choosing which websocket/realtime server to forward the request to.
Of course, this just shifts the code complexity from the main server to a new component, but conceptually I think it's better and more decoupled.
Changed the answer because a reply indicated that the "main" and "realtime" servers are alraady load-balanced clusters and not individual hosts.
The central scalability question seems to be:
My general workflow is when something occurs on the main server that triggers the need for a realtime message, the main server sends that message to the realtime server (via a message queue) and the realtime server distributes it to any related connection.
Emphasis on the word "related". Assume you have 10 "main" servers and 50 "realtime" servers, and an event occurs on main server #5: which of the websockets would be considered related to this event?
Worst case is that any event on any "main" server would need to propagate to all websockets. That's a O(N^2) complexity, which counts as a severe scalability impairment.
This O(N^2) complexity can only be prevented if you can group the related connections in groups that don't grow with the cluster size or total nr. of connections. Grouping requires state memory to store to which group(s) does a connection belong.
Remember that there's 3 ways to store state:
global memory (memcached / redis / DB, ...)
sticky routing (load balancer configuration)
client memory (cookies, browser local storage, link/redirect URLs)
Where option 3 counts as the most scalable one because it omits a central state storage.
For passing the messages from "main" to the "realtime" servers, that traffic should by definition be much smaller than the traffic towards the clients. There's also efficient frameworks to push pub/sub traffic.
Important note:
I've asked this question already on ServerFault: https://serverfault.com/questions/349065/clustering-tcp-servers-so-can-send-data-to-all-clients, but I'd also like a programmers perspective on the problem.
I'm developing a real-time mobile app by setting up a TCP connection between the app and server backend. Each user can send messages to all other users.
(I'm making the TCP server in Python with Twisted, am creating my own 'protocol' for communication between the app/backend and hosting it on Amazon Web Services.)
Currently I'm trying to make the backend scalable (and reliable). As far as I can tell, the system could cope with more users by upgrading to a bigger server (which could become rather limiting), or by adding new servers in a cluster configuration - i.e. having several servers sitting behind a load balancer, probably with 1 database they all access.
I have sketched out the rough architecture of this:
However what if the Red user sends a message to all other connected users? Red's server has a TCP connection with Red, but not with Green.
I can think of a one way to deal with this problem:
Each server could have an open TCP (or SSL) connection with each other server. When one server wants to send a message to all users it simply passes this along it's connection to the other servers. A record could be kept in the database of which servers are online (and their IP address), and one of the servers could be a boss - i.e. decides if others are up and running, if not it could remove them from the database (if a server was up and lost it's connection to the boss it could check the database and see if it had been removed, and restart if it had - else it could assume the boss was down.)
Clearly this needs refinement but shows the general principle.
Alternatively I'm not sure if this is possible (- definitely seems like wishful thinking on my part):
Perhaps users could just connect to a box or router, and all servers could message all users through it?
If you know how to cluster TCP servers effectively, or a design pattern that provides a solution, or have any comments at all, then I would be very grateful. Thank you :-)
You need to decide (or if you already did this - to share these decisions with us) reliability requirements for your system: should all messages be sent to all users in any case (e.g. one or more servers crashed), can you tolerate sending the same message twice to the same user on server crash? Your system complexity depends directly on these decisions.
The simplest version is when a message is not delivered to all users on server crash. All your servers keep TCP connection to each other. One of them receives a message from a user and sends it to all other connected users (to this server) and to all other connected servers. Other servers send this message to all their users. To scale the system you just run additional server which connects to all existing servers.
Have a look how it is handled with IRC servers. They essentially can do this already. Everbody can send to everybody else, on all servers. Or just to single users, also on another server. And to groups, called "channels". It works best by routing amongst the servers.
It's not that hard, if you can make sure the servers know each other and can talk to each other.
On a side note: At 9/11, the most reliable internet news source was the IRC network. All the www sites were down because of bandwidth; it took them ages to even get a plain-text web page back up. During this time, IRC networks were able to provide near real-time, moderated news channels across the atlantic. You maybe could no longer log into a server on the other side, but at least the servers were able to keep up a server-to-server connection across.
An obvious choice is to use the DB as a clearinghouse for messages. You have to store incoming messages somewhere anyway, lest they be lost if a server suddenly crashes. Put incoming messages into the central database and have notification processes on the TCP servers grab the messages and send them to the correct users.
TCP server cannot be clustered, the snapshot you put here is a classic HTTP server example.
Since the device will send TCP connection to server, say, pure socket, there will be noway of establishing a load-balancing server.
The fun part of websockets is sending essentially unsolicited content from the server to the browser right?
Well, I'm using django-websocket by Gregor Müllegger. It's a really wonderful early crack at making websockets work in Django.
I have accomplished "hello world." The way this works is: when a request is a websocket, an object, websocket, is appended to the request object. Thus, I can, in the view interpreting the websocket, do something like:
request.websocket.send('We are the knights who say ni!')
That works fine. I get the message back in the browser like a charm.
But what if I want to do that without issuing a request from the browser at all?
OK, so first I save the websocket in the session dictionary:
request.session['websocket'] = request.websocket
Then, in a shell, I go and grab the session by session key. Sure enough, there's a websocket object in the session dictionary. Happy!
However, when I try to do:
>>> session.get_decoded()['websocket'].send('With a herring!')
I get:
Traceback (most recent call last):
File "<console>", line 1, in <module>
error: [Errno 9] Bad file descriptor
Sad. :-(
OK, so I don't know much of anything about sockets, but I know enough to sniff around in a debugger, and lo and behold, I see that the socket in my debugger (which is tied to the genuine websocket from the request) has fd=6, while the one that I grabbed from the session-saved websocket has fd=-1.
Can a socket-oriented person help me sort this stuff out?
I'm the author of django-websocket. I'm not a real expert in the topic of websockets and networking, however I think I have a decent understanding of whats going on. Sorry for going into great detail. Even if most of the answer isn't specific to your question it might help you at some other point. :-)
How websockets work
Let me explain shortly what a websocket is. A websocket starts as something that really looks like a plain HTTP request, established from the browser. It indicates through a HTTP header that it wants to "upgrade" the protocol to be a websocket instead of a HTTP request. If the server supports websockets, it agrees on the handshake and both - server and client - now know that they will use the established tcp socket formerly used for the HTTP request as a connection to interchange websocket messages.
Beside sending and waiting for messages, they have also of course the ability to close the connection at any time.
How django-websocket abuses the python's wsgi request environment to hijack the socket
Now lets get into the details of how django-websocket implements the "upgrading" of the HTTP request in a django request-response cylce.
Django usually uses the WSGI specification to talk to the webserver like apache or gunicorn etc. This specification was designed just with the very limited communication model of HTTP in mind. It assumes that it gets a HTTP request (only incoming data) and returns the response (only outgoing data). This makes it tricky to force django into the concept of a websocket where bidirectional communication is allowed.
What I'm doing in django-websocket to achieve this is that I dig very deeply into the internals of WSGI and django's request object to retrieve the underlaying socket. This tcp socket is then used to handle the upgrade the HTTP request to a websocket instance directly.
Now to your original question ...
I hope the above makes it obvious that when a websocket is established, there is no point in returning a HttpResponse. This is why you usually don't return anything in a view that is handled by django-websocket.
However I wanted to stick close to the concept of a view that holds the logic and returns data based on the input. This is why you should only use the code in your view to handle the websocket.
After you return from the view, the websocket is automatically closed. This is done for a reason: We don't want to keep the socket open for an undefined amount of time and relying on the client (the browser) to close it.
This is why you cannot access a websocket with django-websocket outside of your view. The file descriptor is then of course set to -1 indicating that its already closed.
Disclaimer
I explained above that I'm digging in the surrounding environment of django to get somehow -- in a very hackish way -- access to the underlaying socket. This is very fragile and also not supposed to work since WSGI is not designed for this! I also explained above that the websocket is closed after the view ends - however after the websocket closed down (AND closed the tcp socket), django's WSGI implementation tries to send a HTTP response - it doesn't know about websockets and thinks it is in a normal HTTP request-response cycle. But the socket is already closed an the sending will fail. This usually causes an exception in django.
This didn't affected my testings with the development server. The browser will never notice (you know .. the socket is already closed ;-) - but raising an unhandled error in every request is not a very good concept and may leak memory, doesn't handle database connection shutdown correctly and many athor things that will break at some point if you use django-websocket for more than experimenting.
This is why I would really advise you not to use websockets with django yet. It doesn't work by design. Django and especially WSGI would need a total overhaul to solve these problems (see this discussion for websockets and WSGI). Since then I would suggest using something like eventlet. Eventlet has a working websocket implementation (I borrowed some code from eventlet for the initial version of django-websocket) and since its just plain python code you can import your models and everything else from django. The only drawback is that you need a second webserver running just to handle websockets.
As Gregor Müllegger pointed out, Websockets can't be properly handled by WSGI, because that protocol never was designed to handle such a feature.
uWSGI, since version 1.9.11, can handle Websockets out of the box. Here uWSGI communicates with the application server using raw HTTP rather than the WSGI protocol. A server written that way, can therefore handle the protocol internals and keep the connection open over a long period. Having long living connections handled by a Django view is not a good idea either, because they then would block a worker thread, which is a limited resource.
The main purpose of Websockets, is to have the server push messages to the client in an asynchronous way. This can be a Django view triggered by other browsers (ex.: chat clients, multiplayer games), or an event triggered by, say django-celery (ex.: sport results). It therefore is fundamental for these Django services, to use a message queue for pushing messages to the client.
To handle this in a scalable way, I wrote django-websocket-redis, a Django module which can keep open all those long living Websocket connections in one single thread/process using Redis as the backend message queue.
You could give stargate a bash: http://boothead.github.com/stargate/ and http://pypi.python.org/pypi/stargate/.
It's built on top of pyramid and eventlet (I also contributed a fair bit of the websocket support and tests to eventlet). The big advantage of pyramid for this sort of thing is that it's got the concept of a resource which the url maps to, rather than just the result of a callable. So you end up with a graph of persistent resources that maps to your url structure and websocket connections are simply routed and connected to those resources.
So you end up only needing to do two things:
class YourView(WebSocketView):
def handler(self, websocket):
self.request.context.add_listener(websocket)
while True:
msg = websocket.wait()
# Do something with message
To receive messages
and
resource.send(some_other_message)
Here resource is an instance of a stargate.resource.WebSocketAwareContext (as is self.request.context) above and the send method sends the message to all clients connected with the add_listener method.
To publish a message to all of the connected clients you just call node.send(message)
I'm hopefully going to write up a little example app in the next week or two to demonstrate this a little better.
Feel free to ping me on github if you want some help with it.
request.websocket is probably get closed when you return from the request handler (view). The simple solution is to keep the handler alive (by not returning from the view). If your server is not multi-threaded you won't be able to accept any other simultaneous requests.