I am trying to learn how large organisations that use Python structure their code, so that I can apply some of their ideas to my own code.
Currently I am looking through reddit's code and am interested in how they have implemented the sending of emails generated as part of the app's operation. See: https://github.com/reddit/reddit/blob/master/r2/r2/lib/emailer.py (their emailing library) and https://github.com/reddit/reddit/blob/master/r2/r2/models/mail_queue.py
I think mail_queue.py contains some form of SQLAlchemy table-backed email queue.
Is this the case? Does that mean the table is kept in memory? Could somebody make this a little clearer for me?
Cheers from Down Under.
P.S. May I suggest that anybody trying to get a good understanding of how to structure Python apps does the same as I am. Reading and understanding other people's code has allowed me to structure and write noticeably better code. :) Open source stuff is great!
Traditionally, the mail queue on e-mail servers has been some sort of disk storage. The reason for this is so that the chance of mail getting lost is minimized. For example, the mail server would receive a message and not send back a successful return code to the sending mail client until the entire message was successfully written to disk via synchronous write.
Yes, the reddit code is using a database as an email data store via SQLAlchemy.
As far as the table being stored in memory, I wouldn't imagine that it would be. From reading the SQLAlchemy documentation, the Table object in SQLAlchemy is just a proxy to the underlying table in whatever database is backing the system. In general, you wouldn't want the table in memory, since you don't know how many messages the system will process, how big the email messages are, or how many messages need to be queued in case of a temporary mail sending failure.
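To make the "proxy" idea concrete, here is a minimal sketch of a table-backed queue in SQLAlchemy. This is not reddit's actual schema; the table and column names are invented for illustration, and the point is that enqueue/dequeue are just INSERT/SELECT/DELETE against a table that lives in the database, not in memory:

    import datetime
    import sqlalchemy as sa

    # Hypothetical schema for illustration only -- not reddit's mail_queue definition.
    metadata = sa.MetaData()
    email_queue = sa.Table(
        "email_queue", metadata,
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("to_addr", sa.String(255), nullable=False),
        sa.Column("body", sa.Text, nullable=False),
        sa.Column("queued_at", sa.DateTime, default=datetime.datetime.utcnow),
    )

    engine = sa.create_engine("postgresql:///maildb")  # the table lives in this database
    metadata.create_all(engine)

    # Enqueue a message: just an INSERT against the backing table.
    with engine.begin() as conn:
        conn.execute(email_queue.insert().values(to_addr="user@example.com", body="hello"))

    # Dequeue: SELECT the oldest rows, send them, then delete them.
    with engine.begin() as conn:
        for row in conn.execute(sa.select(email_queue).order_by(email_queue.c.queued_at).limit(10)):
            pass  # send row.to_addr / row.body via SMTP, then delete the row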
I have a basic background in data science with Python, and I am now trying for the first time to build an application. I need a bit of advice on which AWS infrastructure to choose and how to structure the application. The code I can develop/google on my own :)
The main question is: where (on which AWS platform) should Step 2 happen? I guess I am missing some basic knowledge about applications, and therefore I have trouble googling the problem myself.
What the application should do:
1. On a website, a user types values into a form and these values are sent somewhere to be processed. (Already coded)
2. Now these values (so far an email containing the values) have to be sent somewhere to be processed. Here I do not know on which AWS infrastructure I can write an application that can receive these values (or the email) directly and process them automatically.
3./4. Automated processing of the values, PDF creation and sending, etc.
The goal is that whenever a user uses the website and sends the email, the automated process is triggered.
Thank you for your help! :)
I am assuming that you have access to the mailbox to which the user form data will be sent via email. You can then read the email data using Python's imaplib module and extract the required information either with a regex or with an HTML-to-dict conversion module; please see the link below for HTML-to-dict conversion.
How to convert an HTML table into a Python dictionary.
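A minimal sketch of the mailbox-reading part with imaplib; the host, account and password below are placeholders for whatever mailbox actually receives the form emails:

    import email
    import imaplib

    # Placeholder credentials -- replace with the real mailbox details.
    conn = imaplib.IMAP4_SSL("imap.example.com")
    conn.login("forms@example.com", "app-password")
    conn.select("INBOX")

    # Fetch unread messages and pull out the text body for further parsing.
    status, data = conn.search(None, "UNSEEN")
    for num in data[0].split():
        status, msg_data = conn.fetch(num, "(RFC822)")
        msg = email.message_from_bytes(msg_data[0][1])
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                body = part.get_payload(decode=True).decode(errors="replace")
                # parse the form values out of `body` with a regex or an HTML-to-dict helper
    conn.logout()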
Having said all that, I would strongly recommend you use an AWS EC2 instance to host your application, NGINX as the web server, PostgreSQL as the database and, most importantly, Django as the web framework. Have the user fill in the required data in a form and send that form directly to the back-end server, which can then save it straight to your database (there is no need to send the data via email). If you have any queries, please let me know.
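A rough single-file sketch of that form-straight-to-database flow in Django; the Submission model, SubmissionForm, view and template names are invented for illustration (in a real project they would live in models.py, forms.py and views.py):

    from django import forms
    from django.db import models
    from django.shortcuts import redirect, render

    class Submission(models.Model):
        name = models.CharField(max_length=100)
        email = models.EmailField()
        created_at = models.DateTimeField(auto_now_add=True)

    class SubmissionForm(forms.ModelForm):
        class Meta:
            model = Submission
            fields = ["name", "email"]

    def submit(request):
        form = SubmissionForm(request.POST or None)
        if request.method == "POST" and form.is_valid():
            form.save()  # written straight to the database, no email round trip
            return redirect("thanks")
        return render(request, "submit.html", {"form": form})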
I would suggest you use a "fan-out" architecture with something like EventBridge or an SNS topic.
When your user submits the form, you publish a message to an SNS topic.
That topic can send an email, and can also fan the data out to a backend service like a Lambda function that saves it to something like DynamoDB or RDS MySQL.
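As a small illustration of the publish step with boto3; the topic ARN below is a placeholder for a topic you would create yourself, and SNS then delivers the same message to every subscriber (email endpoint, Lambda, SQS, ...):

    import json
    import boto3

    sns = boto3.client("sns", region_name="us-east-1")

    # Placeholder ARN -- use the ARN of your own topic.
    TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:form-submissions"

    def handle_form_submission(values):
        # Publish the form values once; SNS fans the message out to all subscribers.
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="New form submission",
            Message=json.dumps(values),
        )

    handle_form_submission({"name": "Jane", "amount": 42})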
I'm writing a little chat server and client. I had the idea of letting users connect (nice :D) and, when they want to protect their account with a password, they send /password <PASS> and the server stores the account information in an SQLite database file, so only users who know the passphrase are able to use the name.
But there's the problem: I totally forgot that sqlite3 in Python is not thread-safe :( and now it's not working. Thanks to git I can undo all the changes to the storage code.
Does anyone have an idea how to store this data so that it is persistent across stopping/starting the server?
Thanks.
OK, I'm using a simple JSON text file with automatic saving every minute.
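A minimal sketch of that approach, assuming the account data is a single dict guarded by a lock; the file name and structure are arbitrary:

    import json
    import threading

    ACCOUNTS_FILE = "accounts.json"
    accounts = {}               # e.g. {"nickname": "hashed-passphrase"}
    accounts_lock = threading.Lock()

    def load_accounts():
        global accounts
        try:
            with open(ACCOUNTS_FILE) as f:
                accounts = json.load(f)
        except FileNotFoundError:
            accounts = {}

    def save_accounts():
        # Take a snapshot under the lock, then write it out.
        with accounts_lock:
            snapshot = dict(accounts)
        with open(ACCOUNTS_FILE, "w") as f:
            json.dump(snapshot, f)

    def autosave():
        save_accounts()
        threading.Timer(60, autosave).start()  # save again in one minute

    load_accounts()
    autosave()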
I am trying to fetch emails from another mailbox (xxx@domail.com or xxx@gmail.com) in Google App Engine.
I don't want to read emails from the appspotmail inbox as it is being used for a different purpose.
Is there any efficient way in which I can make this happen?
Two options:
You could read an inbox via POP/IMAP, but this requires a bit of coding. You also need to have the Outgoing Sockets API enabled, which requires you to have a paid app. This approach is not push-based, which means you will constantly need to poll for new messages.
Forward emails to a new appspotmail address (you can have many). This is pretty easy, especially since you already process incoming emails. Since you can have multiple accounts, e.g. xyz@yourappid.appspotmail.com, you can distinguish between them in code.
You can use IMAP + OAuth to read email from a Google address. If you google it, the very first result is what you need: https://developers.google.com/gmail/oauth_overview
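For reference, the IMAP + OAuth login against Gmail looks roughly like this; it is only a sketch, and obtaining the access token via Google's OAuth flow is assumed and not shown (the token value is a placeholder):

    import imaplib

    user = "xxx@gmail.com"
    access_token = "ya29...."   # obtained separately through Google's OAuth flow

    # XOAUTH2 SASL string: user=<addr>\1auth=Bearer <token>\1\1
    auth_string = "user=%s\1auth=Bearer %s\1\1" % (user, access_token)

    imap = imaplib.IMAP4_SSL("imap.gmail.com")
    imap.authenticate("XOAUTH2", lambda challenge: auth_string.encode())
    imap.select("INBOX")
    status, data = imap.search(None, "UNSEEN")
    # fetch and process the message ids in data[0].split() as usual
    imap.logout()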
I have a web app where I am streaming model changes to a Backbone collection in a Chrome client. There are a few Backbone views that may or may not render parts of the page depending on the type of update and what is being looked at. For example, some changes to a model result in the view for the collection being re-rendered, and there may or may not be a detail panel view open for the model that's being updated. These model changes can happen very fast, as the server-side workflow involves quite verbose and rapid changes to the model.
Here's the problem: I'm getting a large number of errno 32 (broken pipe) messages in the web server's process when sending messages to the client, although the websocket connection is still up and its readyState is still 1 (OPEN).
What I suspect is happening is that the various views haven't finished rendering in the onmessage callback by the time the next message comes in. After I get these tracebacks in stdout, the websocket connection still works and the UI still updates.
If I put eventlet.sleep(0.02) in the loop that reads model changes off the message queue and sends them over the websocket, the broken pipe messages go away; however, this isn't a real solution and feels like a nasty hack.
Has anyone had similar problems with a websocket's onmessage function trying to do too much work and still being busy when the next message comes in? Does anyone have a solution?
I think the most efficient way to do this is to have the client app tell the server what it is displaying. The server keeps track of this and sends changes only for the objects currently viewed, and only to the clients concerned.
A way to do this is by using a "who watches what" list of items.
Items are indexed in two ways: by client ID, and with an isViewedBy chain list inside each data object (I know it doesn't look clean to mix it with the data, but it is very efficient).
You'll also need a lastupdate timestamp for each data object.
When a client changes view, it sends an "I'm viewing this, and I have version -timestamp-" message to the server. The server checks the timestamp and sends back the object if required. It also removes obsolete "who watches what" items (accessing them by client ID) and creates the new ones.
When a data object is updated, loop through that object's isViewedBy chain list to know which clients should be updated. Put the updates into a message buffer for each client and flush those buffers manually (if you update several items at the same time, this sends one big message instead of many small ones).
This is a lot of work, but your app will be efficient and scale gracefully, even with a lot of objects and a lot of clients. It sends only useful messages, and it is very unlikely that there will be too many of them. A rough sketch of the data structures is below.
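Here is a rough server-side Python sketch of those structures; the names (isViewedBy, the buffers, the send callback) simply follow the description above and are only illustrative:

    import time
    from collections import defaultdict

    # who-watches-what, indexed both ways
    watched_by_client = defaultdict(set)   # client_id -> set of object ids
    objects = {}                           # object_id -> {"data": ..., "lastupdate": ..., "isViewedBy": set()}

    out_buffers = defaultdict(list)        # client_id -> pending updates

    def client_changes_view(client_id, object_id, client_timestamp):
        # Drop the client's old watch entries, register the new one.
        for old_id in watched_by_client[client_id]:
            objects[old_id]["isViewedBy"].discard(client_id)
        watched_by_client[client_id] = {object_id}
        obj = objects[object_id]
        obj["isViewedBy"].add(client_id)
        if obj["lastupdate"] > client_timestamp:
            out_buffers[client_id].append(obj["data"])   # client is stale, resend

    def object_updated(object_id, new_data):
        obj = objects[object_id]
        obj["data"] = new_data
        obj["lastupdate"] = time.time()
        for client_id in obj["isViewedBy"]:              # only viewers get the update
            out_buffers[client_id].append(new_data)

    def flush_buffers(send):
        # `send(client_id, messages)` is whatever pushes one websocket frame per client.
        for client_id, messages in out_buffers.items():
            if messages:
                send(client_id, messages)
            out_buffers[client_id] = []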
For your onmessage problem, I would store the incoming data in a queue and process it asynchronously.
I need to do the following and I was wondering if anyone has done something similar, and if so what they did.
I need to write a program that will handle incoming emails for different clients, process them, and then depending on the email address, do something (add to database, reply, etc).
The thing that makes this a little more challenging is that the email addresses aren't static; they are dynamic. For example, the emails would be something like this: dynamic-email1@dynamic-subdomain1.domain.com. The emails are grouped by client using a dynamic subdomain; in this example it would be 'dynamic-subdomain1'. A client has their own subdomain assigned to them. Each client can create their own email addresses under their subdomain and assign an event to each address. These email addresses and subdomains can change all the time: new ones added, old ones removed, etc.
So, for example, if an email comes in for 'dynamic-email1@dynamic-subdomain1.domain.com', I would need to look up in the database which client is assigned the 'dynamic-subdomain1' subdomain, then see which event maps to the email address 'dynamic-email1', and then execute that event. I have the event processing already; I'm just not sure how to map the email addresses to the events.
Since the email addresses are dynamic, it would be a real pain to handle this with file based configuration files, it would be nice to look up in a database instead. I did some research and I found some projects that do something similar but not exactly. The closest that I found is Zed Shaw's Lamson project: http://lamsonproject.org
More background:
I'm using python, django, linux, mysql, memcached currently.
Questions:
Has anyone used Lamson to do what I'm looking to do, how did you like it?
Are there any other projects that do something similar, maybe in a different language besides Python?
How would I set up my DNS MX record to handle something like this?
Thanks for your help.
Update:
I did some more research on the Google App Engine suggestion and it might work, but I would need to change too many things and it would add too many moving parts. I would also need a catch-all email forwarder; does anyone know of any good, cheap ones? I would prefer to deploy on a system that handles all the email itself. It looks like people have used Postfix listening on port 25 and forwarding requests to Lamson. This seems reasonable, so I'm going to try it out and see how it goes. I'll update with my results.
Update 2:
I did some more research and I found a couple of websites that do something like this for me, so I'm going to look at them next.
http://mailgun.net
http://www.emailyak.com
I've done some work on a couple projects using dynamic email addresses, but never with dynamic subdomains at the same time. My thoughts on your questions:
I've never used Lamson, so I can't comment on that.
I usually use App Engine's API to receive and handle incoming messages, and it works quite well. You could easily turn each received message into a basic POST request on your own server with e.g. To, From, Subject, and Message fields and handle those with standard Django.
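As a rough sketch of that relay, using the (Python 2-era) App Engine inbound mail API; the target URL is a placeholder for your own Django endpoint, and app.yaml would need inbound_services: mail enabled:

    # Relays each inbound email as a POST to your own server.
    import urllib

    import webapp2
    from google.appengine.api import urlfetch
    from google.appengine.ext.webapp import mail_handlers

    class RelayHandler(mail_handlers.InboundMailHandler):
        def receive(self, mail_message):
            body = "".join(payload.decode() for _, payload in mail_message.bodies("text/plain"))
            form_data = urllib.urlencode({
                "to": mail_message.to,
                "from": mail_message.sender,
                "subject": mail_message.subject,
                "message": body,
            })
            # Placeholder URL -- the Django view that does the real processing.
            urlfetch.fetch("https://example.com/incoming-mail/",
                           payload=form_data, method=urlfetch.POST)

    app = webapp2.WSGIApplication([RelayHandler.mapping()])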
One downside with GAE email is having to use *@yourappname.appspotmail.com, but you could get around that by setting up a catch-all email forwarder for *@yourdomain.com that directs everything to secretaddress@yourappname.appspotmail.com. That would let you receive the messages on the custom domain and handle them with GAE.
The other issue/benefit with GAE is using Google's servers instead of your own (at least for the email bit).
For the subdomain issue, you could try setting up a wildcard DNS MX record, which (in theory) would direct all mail sent to any subdomain to the same server(s). This would enable you to receive email on all subdomains (for better or worse--look out for spam!).
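For illustration, a wildcard MX record in BIND zone-file syntax might look something like this (mail.domain.com and the IP are placeholders for whatever host actually receives the mail):

    ; Catch mail for domain.com itself and for every subdomain.
    domain.com.      IN  MX  10  mail.domain.com.
    *.domain.com.    IN  MX  10  mail.domain.com.
    mail.domain.com. IN  A       203.0.113.10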
For lamson, have you tried something as simple as:
    @route("(address)@(subdomain).(host)", address=".+", subdomain="[^\.]+")
    def START(message, address=None, subdomain=None, host=None):
        # look up the client for `subdomain` and the event for `address`, then dispatch
        ...
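Inside that handler you would then do the subdomain-to-client and address-to-event lookup the question describes. A rough sketch with hypothetical Django models (Client, EmailEvent and run_event are invented names standing in for your existing event processing):

    # Hypothetical models -- invented names for illustration.
    from django.db import models

    class Client(models.Model):
        subdomain = models.CharField(max_length=63, unique=True)

    class EmailEvent(models.Model):
        client = models.ForeignKey(Client, on_delete=models.CASCADE)
        local_part = models.CharField(max_length=64)     # the part before the @
        event_name = models.CharField(max_length=100)

    def dispatch(recipient):
        # recipient is e.g. "dynamic-email1@dynamic-subdomain1.domain.com"
        local_part, domain = recipient.split("@", 1)
        subdomain = domain.split(".")[0]
        event = EmailEvent.objects.get(
            client__subdomain=subdomain,
            local_part=local_part,
        )
        run_event(event.event_name)   # hand off to your existing event processing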