I'm working on a NLP project for classifying email in Python. The main goal is to build a model that automatically redirect mails to the good service. I try to build a database with only the customers text mail and their demand.
I started to load the emails on the pop server with poplib and it works good.
I'm looking for a solution to decode any mail whatever the encoding.
I'm really not expert with encodings and I use a code that doesn't always work, I can't figure out why ..
I remark that it doesn't work on old messages, probably they are archived in one more different encoding.
I need a method that can detect and decode systematically, I searched on the web for two days and found nothing! Only website which propose to do it but I would like to integrate it directly in my code. I only need the body of the mail.
Does such a package exist? And if yes, which ?
Thanks a lot for reading me
Related
Goal: I want to send the result of a <script> tag in HTML to a server of any kind.
I am working with the Spotify authorization API for a project and one of the required keys is contained in a query string at the end of my app’s redirect URI. I need to get this key.
My solution is to set the redirect URI to a redirect page. Then, on the page, automatically run a script that gets the current URL and send it to my Python script for use. If there’s another way, please tell me, because I’m pretty stuck.
However, to send variables between HTML and Python, I have found I need to use a simple server. Setting this up is the hard part. I’ve made Java, Node.js, and Python servers, but nothing seems to want to work with the <string> tag, and I’m doubtful something like this would even get the actual output of the script.
Is there a way to do this? This is a pretty long question and I apologize, and it should probably be noted I’m a beginner so an explanation and code examples would be nice. Thank you to anyone who reads this!
I want to email out a document that will be filled in by many people and emailed back to me. I will then parse the responses using Python and load them into my database.
What is the best format to send out the initial document in?
I was thinking an interactive .pdf but do not want to have to pay for Adobe XI. Alternatively maybe a .html file but I'm not sure how easy it is to save the state of it once its been filled in in order to be emailed back to me. A .xls file may also be a solution but I'm leaning away from it simply because it would not be a particularly professional looking format.
The key points are:
Answers can be easily parsed using Python
The format should common enough to open on most computers
The document should look relatively pleasing to the eye
Send them a web-page with a FORM section, complete with some Javascript to grab the contents of the controls and send them to you (e.g. in JSON format) when they press "submit".
Another option is to set it up as a web application. There are several Python web frameworks that could be used for that. You could then e-mail people a link to the web-app.
Why don't you use Google Docs for the form. Create the form in Google Docs and save the answer in an excel sheet. And then use any python Excel format reader (Google them) to read the file. This way you don't need to parse through mails and will be performance friendly too. Or you could just make a simple form using AppEngine and save the data directly to the database.
Let's dive into this, shall we?
Ok, I need to write a script (I don't care what language, prefer something like Python or Javascript, but whatever works I will take time to learn). The script will access multiple URL's, extract text from each site and store it into a folder on my PC. (From there I am manipulating the data with Python, which I know how to do.)
EDIT:
Currently I am using python's NLTK module. Here is a simple version of my code:
url = "<URL HERE>"
html = urlopen(url).read()
raw = nltk.clean_html(html)
print(raw)
This code works fine for both http and https, but not for instances where authentication is required.
Is there a Python module which deals with secure authentication?
Thanks in advance for help! And to the mods who will view this as a bad question, please just give me ways to make it better. I need ideas..from people, not Google.
Mechanize (2) is one option, other is just with urllib2
I have to process a file everyday. This file is sent to my Email once everyday. If I can get to this email once every day and download the attachment, that had be awesome. Is it even remotely possible to do such a thing?
Thanks!
Please see How can I download all emails with attachments from Gmail? for a practical example.
This is certainly possible. Check out imaplib in Python's standard library; with it doing what you want should be quite straightforward. Also, you can process zip files directly in Python using the zipfile library.
Your best bet is to create an IMAP Folder for your daily emails to be sent to and then create a filter in GMail to send those files there. Your Python script can then check ONLY that folder on some interval and assume that whatever ends up in there is the file you want.
A quick search yielded sooo many results for IMAP fetching examples in Python, I'll leave that part up to you, but I will say that libgmail looks pretty neat.
I am basically trying to export a configuration file, once a week. While the product in question allows you to log in manually via a web client, enter some information, and get an XML file back when you submit, there's no facility for automating this. I can get away with using Python 2.5 (have used for a while) or 2.6 (unfamiliar) to do this.
I think I need to have some way to authenticate against the product. Although I can view the cookie in Firefox, when I looked through the actual cookie.txt file, it was not present. Didn't show up after clearing my private data and re-authenticating. Odd. Should I be shooting for the Cookie module, or could this be some arcane method of authentication that only looks like cookies? How would I know?
I think I need the httplib module to do the HTTP POST, but I don't know how to do the multipart/form-data encoding. Have I overlooked a handy option, or is this something where I must roll my own?
I assume I can get the XML file back in the HTTPesponse from httplib.
I've fetched web things via Python before, but not with POST, multipart encoding, and authentication in the mix.
urllib2 should cover all of this.
Here's a Basic Authentication example.
Here's a Post with multipart/form-data.
Try mechanize module.
You should look at the MultipartPostHandler:
http://odin.himinbi.org/MultipartPostHandler.py
And if you need to support unicode file names , see a fix at:
http://peerit.blogspot.com/2007/07/multipartposthandler-doesnt-work-for.html