Email title and link from rss-feed and email them - python

I'm doing a bit of an experiment in Python. I'm making a script which checks an RSS feed for new items, and then sends the title and link of the items via email. I've got the script to work to a certain level: when it runs it will take the link and title of the newest item and email it, regardless of whether it has emailed that item already or not. I need to add two things: a way to get multiple items at once (and email those, one by one), and a way to check whether they have been sent already. How would I do this? I'm using feedparser, and this is what I've got so far:
import feedparser
d = feedparser.parse('http://feedparser.org/docs/examples/rss20.xml')
link = d.entries[0].link
title = d.entries[0].title
And then a couple of lines which send an email with "link" and "title" in there. I know I'd need to use the ETag, but haven't been able to work out how. And how would I send the emails one by one?

For the feed-parsing part, consider following the advice given in the question "How to detect changed and new items in an RSS feed?". Basically, you could hash the contents of each entry and use that as an ID.
For instance, on the first run of your program it will calculate the hash of each entry, store that hash, and send these new entries by mail. On its next run, it will rehash each entry's content and compare those hashes with the ones stored before (you should use some sort of database for this, or at least an in-memory dictionary or list of the entries already parsed and sent while developing). If your program finds hashes that were not generated on previous runs, it will assemble a new email and send it with the "new" entries.
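The hashing approach described above could be sketched roughly as follows. This is only a minimal illustration: the function names, the choice of SHA-256, and the JSON file used as a stand-in for a database are all assumptions, and entries are treated as dict-like objects with 'title' and 'link' keys (feedparser entries can be read this way).

```python
import hashlib
import json
from pathlib import Path


def entry_hash(entry):
    """Return a stable SHA-256 digest of an entry's title and link."""
    text = f"{entry.get('title', '')}\n{entry.get('link', '')}"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_seen(path):
    """Load the set of hashes stored by a previous run (empty if none)."""
    p = Path(path)
    if p.exists():
        return set(json.loads(p.read_text()))
    return set()


def save_seen(path, seen):
    """Persist the set of hashes so the next run can compare against it."""
    Path(path).write_text(json.dumps(sorted(seen)))


def new_entries(entries, seen):
    """Return the entries whose hash is not in `seen`, and record them."""
    fresh = []
    for entry in entries:
        digest = entry_hash(entry)
        if digest not in seen:
            seen.add(digest)
            fresh.append(entry)
    return fresh
```

With feedparser, one would presumably call new_entries(d.entries, seen) after load_seen(), mail each returned entry, and then call save_seen().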
As for your email-assembling part, the question "Sending HTML email in Python" could help. Just make sure to send both a text-only and an HTML version.

For the simplest method see the python smtplib documentation example. (I won't repeat the code here.) It's all you need for basic email sending.
For nicer/more complicated email content also look into python's email module, of course.
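As a rough sketch of sending the items one by one with smtplib and the email module: the function names, the SMTP host and port defaults, and the (title, link) tuple layout are assumptions for illustration, not part of the original script.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, title, link):
    """Build a plain-text email whose subject is the item title."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = title
    msg.set_content(f"{title}\n{link}\n")
    return msg


def send_items(items, sender, recipient, host="localhost", port=25):
    """Send one email per (title, link) pair over a single SMTP connection.

    The host and port defaults are placeholders; a real server will
    usually also need TLS and a login.
    """
    with smtplib.SMTP(host, port) as server:
        for title, link in items:
            server.send_message(build_message(sender, recipient, title, link))
```

Opening the connection once and looping inside it avoids reconnecting for every item.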

Related

Task Automation Help: (no similar entries)

Problem: I’m a nurse, and part of my job is to pull up a list of “unsigned services”. Then of course, take these charts, and send them to the right person.
The person before me did not do this, leaving me THOUSANDS of charts to pull up by patient name and DOB, select the right document, and send to the right person.
I have figured out how to use selenium with python to automate logging in, using input to send keys to search the correct patient, and even to pull up the correct document that needs signed.
How do I have the program do this for every chart? How do I have Python work down the list of names and DOBs without my having to manually put them in?
Anything I look for on my own is just examples of applying a basic function to a list of numbers and that isn’t my goal.
Thanks for your help!
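One way to work down a list like this is to keep the names and DOBs in a CSV file and loop over its rows. The sketch below assumes a file with 'name' and 'dob' columns; handle_chart is a hypothetical function standing in for the existing Selenium steps (search the patient, open the document, send it).

```python
import csv
import io

# Sample data for illustration; a real list would live in a CSV file.
SAMPLE = "name,dob\nJane Doe,1980-01-31\nJohn Roe,1975-12-05\n"


def read_patients(csv_file):
    """Yield (name, dob) pairs from a CSV file with 'name' and 'dob' columns."""
    for row in csv.DictReader(csv_file):
        yield row["name"].strip(), row["dob"].strip()


def process_all(patients, handle_chart):
    """Call handle_chart(name, dob) for every patient and collect failures."""
    failed = []
    for name, dob in patients:
        try:
            handle_chart(name, dob)
        except Exception as exc:
            # Record the failure and carry on with the next chart.
            failed.append((name, dob, str(exc)))
    return failed
```

With a real file this might be used as: with open("patients.csv", newline="") as f: failed = process_all(read_patients(f), handle_chart). Collecting failures instead of stopping means one bad chart does not halt a run over thousands.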

Parse email and send reply using googleapiclient in Python

I'm currently working on a project and I have chosen to use Gmail for sending and receiving emails. I want to be able to send an email, have a user reply to it, and parse their response. The response can be any number of lines (so something like response.split('\n')[0] won't work). It should then be able to reply directly to that email thread.
I've been following the googleapiclient tutorials, but they leave a lot to be desired. However, I've managed to read email threads using:
service.users().threads().get(userId='me', id=thread_id).execute()
where thread_id is (predictably) the ID of the email thread (which I find elsewhere). In the large dict returned by this, there is a section of base64 data which contains the content of the email. This was the only place I could find the actual data for the response. Unfortunately, I get this when it is decoded:
b'This is my response from my phone\r\n\r\nOn Sat, 28 Nov 2020, 8:40 PM , <myemail@gmail.com>\r\nwrote:\r\n\r\n> This is sent from the python script\r\n>\r\n'
This is all the data in the thread. However, I only want the response, and there is no obvious way to split this to get only the data I need. The best I can think of is to parse out anything of the form On <date>, <time>, but that could lead to problems. There must be another way to extract only This is my response from my phone and no other data.
Once I get the response, I want to parse it and reply with an appropriate response based on the contents of the message. I would prefer to reply directly to the thread, rather than starting a new one. Unfortunately, all the Google documentation says is:
If you're trying to send a reply and want the email to thread, make sure that:
The Subject headers match
The References and In-Reply-To headers follow the RFC 2822 standard.
The documentation provides this code (with some minor modifications by me) for sending an email:
import base64
from email.mime.text import MIMEText

def create_message(sender, to, subject, message_text):
    message = MIMEText(message_text)
    message['to'] = to
    message['from'] = sender
    message['subject'] = subject
    return {'raw': base64.urlsafe_b64encode(message.as_bytes()).decode()}
Sending a reply with the same subject line is pretty straightforward (message['subject'] = same_subject_as_before), but I don't even know where to start with the References and In-Reply-To headers. How do I set these?
Why is this hard?
You are trying to use e-mail for something it simply wasn't originally designed for. My impression is you want the e-mail response to contain structured data, but e-mail text lacks any well-defined structure. It also depends on which e-mail client the other user has, and whether they send HTML e-mail or not.
This is usually easy for a human to see, but difficult for a computer, which suggests that machine learning might be the best strategy if you want higher reliability. Whatever solution you choose, it's not going to be 100% reliable.
E-mail can be plain text or HTML, or both.
There is no well-defined structure to separate replies from the original text. Wikipedia lists a few different "posting styles".
In the old days when "Netiquette" was still cool, putting your reply on top ("top-posting") was considered bad practice, and new Internet users were told by old folks to avoid top-posting. Some users still reply below or interleaved with the original text.
The reply line (e.g. "On DATE, EMAIL wrote:" or "-------- Original Message --------") will be different, depending on which e-mail client is used, what language that client is set to, and the user's own preferences.
Using a text delimiter
One class of software that faces a problem similar to the one you describe is customer service applications, which allow operators to use e-mail for communication. A common strategy is to inject some unique text in your templates for outgoing e-mail. For example, Zendesk uses a text "delimiter" such as:
##- Please type your reply above this line -##
This serves two purposes; it tells users to top-post, and it provides a separator to cut out most of the irrelevant text.
If you first handle any HTML encoding, you should be able to split the message by such a text delimiter. It's not perfect, but it usually works.
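A minimal sketch of splitting on such a delimiter, assuming the body has already been decoded to plain text. The function name is hypothetical, and the delimiter constant is simply the Zendesk-style line quoted above.

```python
DELIMITER = "##- Please type your reply above this line -##"


def extract_reply(body, delimiter=DELIMITER):
    """Return the text that appears above the delimiter line.

    If the delimiter is missing, the whole body is returned (stripped),
    since there is nothing reliable to cut on.
    """
    head, found, _ = body.partition(delimiter)
    if not found:
        return body.strip()
    # The delimiter usually arrives inside quoted text ("> ##- ..."),
    # so drop the trailing quote markers and whitespace before it.
    return head.rstrip("> \t\r\n")
```

Note that the client's attribution line (for example "On DATE, EMAIL wrote:") typically still sits above the delimiter, so this removes most of the irrelevant text rather than all of it.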
Use products made by others
There are some open source options, such as:
https://github.com/zapier/email-reply-parser
And I found a commercial product, SigParser, which seems to use a machine learning model that they've trained very carefully:
https://sigparser.com/developers/extract-reply-chains-from-emails/
They also explain some of the challenges of parsing e-mail text into structured data.
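As for the threading half of the question, a reply is normally tied to the original by copying the original's Message-ID header value into In-Reply-To and References, and with the Gmail API by also passing the thread's ID as threadId in the request body. The sketch below extends create_message from the question under those assumptions; the name create_reply and its parameters original_message_id and thread_id are hypothetical. For replies deeper in a thread, References would usually carry the earlier IDs as well.

```python
import base64
from email.mime.text import MIMEText


def create_reply(sender, to, subject, message_text, original_message_id, thread_id):
    """Build a Gmail API message body that replies within an existing thread.

    original_message_id is assumed to be the Message-ID header value of the
    message being answered, angle brackets included.
    """
    message = MIMEText(message_text)
    message['to'] = to
    message['from'] = sender
    message['subject'] = subject
    message['In-Reply-To'] = original_message_id
    message['References'] = original_message_id
    return {
        'raw': base64.urlsafe_b64encode(message.as_bytes()).decode(),
        'threadId': thread_id,
    }
```

The returned dict would then be passed as the body of the usual messages().send call.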

How to parse email body in Python? [duplicate]

I want to retrieve the body (text only) of emails using Python's imap and email packages.
As per this SO thread, I'm using the following code:
mail = email.message_from_string(email_body)
bodytext = mail.get_payload()[ 0 ].get_payload()
It works fine in some instances, but sometimes I get a response similar to the following:
[<email.message.Message instance at 0x0206DCD8>, <email.message.Message instance at 0x0206D508>]
You are assuming that messages have a uniform structure, with one well-defined "main part". That is not the case; there can be messages with a single part which is not a text part (just an "attachment" of a binary file, and nothing else) or it can be a multipart with multiple textual parts (or, again, none at all) and even if there is only one, it need not be the first part. Furthermore, there are nested multiparts (one or more parts is another MIME message, recursively).
In so many words, you must inspect the MIME structure, then decide which part(s) are relevant for your application. If you only receive messages from a fairly static, small set of clients, you may be able to cut some corners (at least until the next upgrade of Microsoft Plague hits) but in general, there simply isn't a hierarchy of any kind, just a collection of (not necessarily always directly related) equally important parts.
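Inspecting the MIME structure could look roughly like this sketch, which walks every part and keeps the text/plain ones that are not attachments. The function name is hypothetical, and whether text/plain is the right selection depends on your application.

```python
import email
from email import policy


def extract_text_parts(raw_message):
    """Return the decoded content of every non-attachment text/plain part."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    texts = []
    # walk() visits the message itself and all nested parts, depth first.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no text of their own
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        texts.append(part.get_content())
    return texts
```

This returns a list because a message may have zero, one, or several textual parts; the caller decides which of them count as "the body".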
The main problem in my case is that a replied or forwarded message shows up as a message instance in bodytext.
Solved my problem using the following code:
bodytext = mail.get_payload()[0].get_payload()
if isinstance(bodytext, list):
    bodytext = ','.join(str(v) for v in bodytext)
My external lib: https://github.com/ikvk/imap_tools
from imap_tools import MailBox
# get list of email bodies from INBOX folder
with MailBox('imap.mail.com').login('test@mail.com', 'pwd', 'INBOX') as mailbox:
    bodies = [msg.text or msg.html for msg in mailbox.fetch()]
Maybe this post (of mine) can be of help. I receive a newsletter with prices of different kinds of oil in the US. I fetch the emails from Gmail whose titles match a given pattern, then I extract the prices from the mail body using a regex. So I have to access the mail body of the last n emails whose titles match the pattern.
I am also using email.message_from_string(): msg = email.message_from_string(response_part[1])
so maybe it gives you a concrete example of how to use the methods in this Python library.

Filter EWS mailbox by recipient address using exchangelib

I'm writing a monitoring solution using python3 with exchangelib and trying to count messages in our team's mailbox. One of the criteria: recipient list must contain specific email address.
When I use filter() with the author or subject arguments, the script works fine and returns correct results.
But when I try to filter by to_recipients or to_recipients__contains (which is a list-type field), the script throws an exception:
ValueError: EWS does not support filtering on field 'to_recipients'
Is there a way to filter the mailbox by recipient email_address, without fetching all messages and then filtering them on the client side?
[exchangelib maintainer here]
I don't think there is. You could try to flip the is_searchable flag on that field and search anyway, but I never could get filtering to work in my tests. I can't remember if it throws server errors, returns all items anyway, or returns an empty list.
I'm happy to accept patches if you do find a solution.
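If server-side filtering is not available, the remaining option is the client-side fallback the question hoped to avoid. The sketch below is only that fallback under stated assumptions: the function names are hypothetical, and each item is assumed to expose a to_recipients list (possibly None) whose elements have an email_address attribute. The stand-in objects exist only to make the example self-contained.

```python
from types import SimpleNamespace


def has_recipient(item, address):
    """Return True if `address` is among the item's To recipients."""
    recipients = getattr(item, "to_recipients", None) or []
    wanted = address.lower()
    return any((r.email_address or "").lower() == wanted for r in recipients)


def count_with_recipient(items, address):
    """Count the items addressed to `address` (case-insensitive)."""
    return sum(1 for item in items if has_recipient(item, address))


# Stand-in items for illustration only; real items would come from the mailbox.
sample_items = [
    SimpleNamespace(to_recipients=[SimpleNamespace(email_address="Team@example.com")]),
    SimpleNamespace(to_recipients=[SimpleNamespace(email_address="other@example.com")]),
    SimpleNamespace(to_recipients=None),
]
```

To keep the transfer small, one could presumably narrow the query with the filters that do work (such as author or subject) and request only the to_recipients field before counting.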

Python Flask: How to stop the user from exploiting Inspect Element or to get rid of the possibility completely

I'm trying to make a chat website using python Flask. I have a route called '/chatroom', and every time a message is sent (using the HTML I'm returning in the function), the HTML also sends along a hidden field with the value of the userid back to the start of the chatroom so I can use the request.args.get function to get the userid. My problem is that anyone on the site can simply Inspect Element and change their user id located in the hidden field to get past my user removal method. Is there a way to check if Inspect Element is used by any client or to disable it? Or better yet, is it possible to not have to send the user id along in the hidden field? Here is the code for the message send field and the userid hidden field:
`'''<!DOCTYPE html>
<html>
<body>
<form action='/chatroom'>
<input type=text name='msg'>
<input type=hidden name=userid value="''',userid,'''">
</form>
</body>
</html>'''`
Every time the form is sent, like I said earlier, the program gets the 'userid' argument and assigns it to the variable userid.
I have a different method about the first time when the page loads. Just ignore that for now.
Thanks!
P.S. If the question feels vague, which it probably will, just comment on this and I can clear it up.
The problem is not really with Inspect Element - the problem is that you've built a site which is, unfortunately, insecure.
One way of handling this would be to encode and decode user IDs with each communication between the client and server. So the userid that is sent forward to the frontend (the client) is encoded and no longer looks like a regular user ID. Then when the request is sent to the server, the server decodes the ID to be able to identify the user.
This is a very simplified version of web security and isn't even close to foolproof; for example, what's to stop people just copying the encoded IDs? You could follow it up by hashing IDs, but that adds more complexity. Long story short, it's going to be tough to implement yourself, and disabling Inspect Element won't solve the problem of your site being insecure. As such, I suggest that you look into existing packages which aim to secure user identities and either use them out of the box or try to replicate their behaviour. A couple of examples to investigate are Sessions and Security; they would be good places to start.
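To illustrate the idea behind signed sessions: Flask's built-in session keeps its data in a cookie signed with the app's secret key, so the client can read the value but cannot change it without detection. The standard-library sketch below shows only that signing principle; the names and the key are hypothetical, and in a real Flask app the built-in session would be the better choice than hand-rolled tokens.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical; must be kept secret on the server


def _signature(user_id):
    """Compute the HMAC-SHA256 signature of a user id."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_user_id(user_id):
    """Return 'userid.signature', safe to hand to the client."""
    return f"{user_id}.{_signature(user_id)}"


def verify_token(token):
    """Return the user id if the signature matches, otherwise None."""
    user_id, sep, sig = token.rpartition(".")
    if not sep:
        return None
    expected = _signature(user_id)
    # compare_digest avoids leaking information through timing differences.
    if hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return user_id
    return None
```

A user who edits the ID in the hidden field cannot produce a matching signature, so the server rejects the request. As noted above, this does not stop someone from copying another user's complete token, which is one reason to rely on an existing session mechanism.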
