Python 3 email extracting search engine

Q. Write a search engine that takes a file (such as an HTML source page) and extracts all of the email addresses, then prints them out as an ordered list. The file may contain a lot of messy text (e.g. asda#home is not valid, and there can be many #'s in the file in roles other than emails).
For testing purposes, this is the text file I have been using:
askdalsd
asd
sad
asd
asd
asd
ad
asd
asda
da
moi1990#gmail.com
masda#sadas
223#home.ca
125512#12451.cpm
domain#name.com
asda
sda
as
da
ketchup#ketchup##%##.com
onez!es#gomail.com
asdasda#####email.com
asda#asdasdaad.ca
moee#gmail.com
And this is what I have so far:
import os
import re
import sys
def grab_email(file):
    email_pattern = re.compile(r'\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b',re.IGNORECASE)
    found = set()
    if os.path.isfile(file):
        for line in open(file, 'r'):
            found.update(email_pattern.findall(line))
        for email_address in found:
            print (email_address)
    if __name__ == '__main__':
        grab_email(sys.argv[1])
grab_email('email_addresses.txt')
Now the problem I am having is that after a certain point, the program crashes. This is the output:
125512#12451.cpm
es#gomail.com
asda#asdasdaad.ca
223#home.ca
moee#gmail.com
moi1990#gmail.com
domain#name.com
Traceback (most recent call last):
  File "D:/Sheridan/Part Time/TELE26529 Linux Architecture w. Network Scripting/Python Assignment 3.5/question1.py", line 17, in <module>
    grab_email('email_addresses.txt')
  File "D:/Sheridan/Part Time/TELE26529 Linux Architecture w. Network Scripting/Python Assignment 3.5/question1.py", line 14, in grab_email
    grab_email(sys.argv[1])
IndexError: list index out of range
What am I doing wrong here and how do I fix this? How can I more effectively handle these exceptions?

The problem is this part:
if __name__ == '__main__':
grab_email(sys.argv[1])
Your program crashes because this check sits inside the grab_email function, so it runs on every call. Since you are running the file directly from the interpreter, __name__ is '__main__' and the if statement evaluates to True. Then, since you passed no command-line arguments, sys.argv[1] refers to a list element that does not exist, which raises the IndexError you see.
To fix, just dedent! It should look like:
import os
import re
import sys
def grab_email(file):
    email_pattern = re.compile(r'\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b',re.IGNORECASE)
    found = set()
    if os.path.isfile(file):
        for line in open(file, 'r'):
            found.update(email_pattern.findall(line))
        for email_address in found:
            print (email_address)
if __name__ == '__main__':
    grab_email(sys.argv[1])
This will now run correctly from the command line (assuming you pass the file name correctly from the command line). I have also removed the extraneous function call.
Of course, if you just want this to run in the interpreter, take out the if statement and reinstate the function call I removed. You could also do this:
if __name__ == '__main__':
    if len(sys.argv)>1:
        grab_email(sys.argv[1])
    else:
        grab_email('email_addresses.txt')
Which isn't great, per se, but handles that particular error (while introducing another potential one).
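One way to cover the "ordered list" requirement and the error handling together is sketched below. It is only a sketch: extract_emails and main are names made up for this example, the pattern is the one from the question (with # as the separator, as in the test file), and falling back to email_addresses.txt is an assumption carried over from the snippet above.

```python
import re
import sys

# Pattern copied from the question; '#' plays the role of the separator
# because that is what the test file uses.
EMAIL_PATTERN = re.compile(r'\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b',
                           re.IGNORECASE)


def extract_emails(text):
    """Return the unique matches found in text as a sorted list."""
    return sorted(set(EMAIL_PATTERN.findall(text)))


def main(argv):
    """Print the addresses found in the file named by argv[1], if given."""
    # Assumption: fall back to the question's test file when no argument is passed.
    path = argv[1] if len(argv) > 1 else 'email_addresses.txt'
    try:
        with open(path, 'r') as handle:
            addresses = extract_emails(handle.read())
    except OSError as exc:
        # Report a missing or unreadable file instead of crashing.
        print('Could not read {}: {}'.format(path, exc), file=sys.stderr)
        return 1
    for address in addresses:
        print(address)
    return 0
```

To run it from the command line, call sys.exit(main(sys.argv)) under an if __name__ == '__main__': guard placed at module level, not inside a function.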

Related

Result of running code

My app is in VB.Net. I have a textbox in which the user writes a piece of Python code, and I want to run this code. For example, the code in the textbox is something like this:
print 7*7
the result of running this code in Python is 49. But if the user forgets a space and writes:
print7*7
the result of running this code in Python is:
Traceback (most recent call last):
  File "vm_main.py", line 33, in <module>
    import main
  File "/tmp/vmuser_jrlbqyaetu/main.py", line 8, in <module>
    print7*7
NameError: name 'print7' is not defined
Now I want to save the result of running code (error or correct data) in a string in VB.Net. Questions:
What is the data type of the result of running the code?
Is it possible to access it?
Is it possible to save it? Is it possible to save it in a string? If yes, how?

You can try this:
import sys
f = open('output.txt', 'w')
sys.stdout = f
###############
#your code here
##############
And then all output will be written to output.txt.
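Redirecting sys.stdout captures printed output but not the traceback, which goes to stderr. A minimal sketch that collects both into a single string is shown below. It assumes Python 3 (so the user's code would be print(7*7) rather than print 7*7), and run_and_capture is a name invented for this example. Note that exec runs arbitrary code, so this is only reasonable for input you trust.

```python
import contextlib
import io
import traceback


def run_and_capture(source):
    """Run source and return what it printed, plus any traceback, as one string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        try:
            # A fresh globals dict keeps the user's names out of this module.
            exec(source, {})
        except Exception:
            # Append the formatted traceback instead of letting it propagate.
            buffer.write(traceback.format_exc())
    return buffer.getvalue()
```

The returned value is an ordinary str, so it can be written to a file or printed for the calling VB.Net program to read.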

CGI with Python

I'm beginning to use CGI with Python.
After running the following piece of code:
#!c:\python34\python.exe
import cgi
print("Content-type: text/html\n\n") #important
def getData():
    formData = cgi.FieldStorage()
    InputUN = formData.getvalue('username')
    InputPC = formData.getvalue('passcode')
    TF = open("TempFile.txt", "w")
    TF.write(InputUN)
    TF.write(InputPC)
    TF.close()
if __name__ =="__main__":
    LoginInput = getData()
    print("cgi worked")
The following error occurs:
Traceback (most recent call last):
  File "C:\xampp\htdocs\actual\loginvalues.cgi", line 21, in <module>
    LoginInput = getData()
  File "C:\xampp\htdocs\actual\loginvalues.cgi", line 16, in getData
    TF.write(InputUN)
TypeError: must be str, not None
>>>
I'm trying to write the values, inputted in html, to a text file.
Any help would be appreciated :)

Your calls to getvalue() are returning None, meaning the form either didn't contain those fields, had them set to an empty string, or had them set by name only. By default, Python's cgi module ignores inputs that aren't set to a non-empty string.
Works for Python CGI:
mysite.com/loginvalues.cgi?username=myname&passcode=mypass
Doesn't work for Python CGI:
mysite.com/loginvalues.cgi?username=&passcode= (empty value(s))
mysite.com/loginvalues.cgi?username&passcode (Python requires the = part.)
To account for this, introduce a default value for when a form element is missing, or handle the None case manually:
TF.write('anonymous' if InputUN is None else InputUN)
TF.write('password' if InputPC is None else InputPC)
As a note, passwords and other private login credentials should never be put in a URL. Over plain HTTP the whole URL is sent unencrypted, so anyone on the network(s) between you and your users can read it.
HTTPS does encrypt the path and query string in transit, but the full URL is still recorded in browser history and web server logs and can leak through the Referer header, so never bank on it. Send credentials in a POST body over HTTPS instead.
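The blank-value behaviour described above can be seen without a web server. The sketch below uses urllib.parse.parse_qs from the standard library rather than the cgi module (which is deprecated since Python 3.11 and removed in 3.13); like FieldStorage, it drops blank values by default. read_field is a helper name invented for this example.

```python
from urllib.parse import parse_qs


def read_field(query_string, name, default):
    """Return the first value submitted for name, or default if missing or blank."""
    # parse_qs drops blank values unless keep_blank_values=True is passed,
    # and skips fields that have no '=' at all.
    values = parse_qs(query_string).get(name)
    return values[0] if values else default
```

With this helper, 'username=myname&passcode=mypass' yields the submitted value, while 'username=&passcode=' and 'username&passcode' both fall back to the default.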

AttributeError: 'module' object has no attribute

I've been scouring the internet for a solution and nothing I've come across has helped. So now I turn to you.
Traceback (most recent call last):
  File "cardreader.py", line 9, in <module>
    import ATRdb as ATR
  File "/home/pi/Desktop/CardReader/ATRdb.py", line 4, in <module>
    import cardreader
  File "/home/pi/Desktop/CardReader/cardreader.py", line 113, in <module>
    main()
  File "/home/pi/Desktop/CardReader/cardreader.py", line 40, in main
    getData(db)
  File "/home/pi/Desktop/CardReader/cardreader.py", line 98, in getData
    if ATR.checkPerms(db,track1):
AttributeError: 'module' object has no attribute 'checkPerms'
I have two files cardreader.py & ATRdb.py
---ATRdb.py has this setup
import sys
import MYSQLdb
import datetime
import cardreader
def checkPerms(db, securitycode):
    try:
        cursor = db.cursor()
        cursor.execute("""SELECT permissions FROM atrsecurity.employee WHERE securitycode = %s""", (securitycode))
        r = cursor.fetchone()
        Permissions = r
        if '3' in Permissions[0]:
            return True
        else:
            return False
    except Exception:
        cardreader.main()
        return False
---cardreader.py has this setup
import sys
import usb.core
import usb.util
import MYSQLdb
import ATRdb as ATR
def main():
    db = MYSQLdb.connect(HOST,USER, PASS, DB)
    print("Please swipe your card...")
    getData(db)
    main()
    db.close()
def getData(db):
    #
    #lots of code to get card data
    #
    if ATR.checkPerms(db, track1):
        print ("User has permission")
        unlockDoor()
I get the error at the "if ATR.checkPerms():" part. Any help would be appreciated.
(This is my first Python project.)

Your problem is circular imports.
In cardreader, you do this:
import ATRdb as ATR
That starts importing ATRdb, but a few lines into the code, it hits this:
import cardreader
The exact sequence from here depends on whether cardreader.py is your main script or not, and on whether your top-level code that calls main is protected by an if __name__ == '__main__' guard (and assuming that top-level code is in cardreader rather than elsewhere). Rather than try to explain all the possibilities in detail (or wait for you to tell us which one matches your actual code), let's look at what we know is true based on the behavior:
In some way, you're calling main before finishing the import of ATRdb.
This means that, at this point, ATRdb has nothing in it but sys, MYSQLdb, and datetime (and a handful of special attributes that every module gets automatically). In particular, it hasn't gotten to the definition of checkPerms yet, so no such attribute exists in the module yet.
Of course eventually it's going to finish importing the rest of ATRdb, but at that point it's too late; you've already called main and it tried to call ATR.checkPerms and that failed.
While there are various complicated ways to make circular imports work (see the official FAQ for some), the easiest and cleanest solution is to just not do it. If ATRdb needs some functions that are in cardreader, you should probably factor those out into a third module, like cardutils, that both ATRdb and cardreader can import.
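The half-imported state described above can be reproduced in a few lines. The sketch below writes two throwaway modules that import each other into a temporary directory; circ_demo_a, circ_demo_b and check_perms are hypothetical names standing in for cardreader, ATRdb and checkPerms, and the roles are simplified to show only the ordering problem.

```python
import importlib
import os
import sys
import tempfile


def demo_circular_import():
    """Import two mutually dependent modules and report what each one saw."""
    with tempfile.TemporaryDirectory() as folder:
        # Module A imports B first, then defines its function.
        with open(os.path.join(folder, 'circ_demo_a.py'), 'w') as handle:
            handle.write('import circ_demo_b\n'
                         'def check_perms():\n'
                         '    return True\n')
        # Module B imports A back and looks for the function immediately.
        with open(os.path.join(folder, 'circ_demo_b.py'), 'w') as handle:
            handle.write('import circ_demo_a\n'
                         'saw_check_perms = hasattr(circ_demo_a, "check_perms")\n')
        sys.path.insert(0, folder)
        importlib.invalidate_caches()
        try:
            module_a = importlib.import_module('circ_demo_a')
            module_b = sys.modules['circ_demo_b']
            # First value: what B saw mid-import. Second: A once fully imported.
            return module_b.saw_check_perms, hasattr(module_a, 'check_perms')
        finally:
            # Clean up so the demo leaves no trace in the running interpreter.
            sys.path.remove(folder)
            sys.modules.pop('circ_demo_a', None)
            sys.modules.pop('circ_demo_b', None)
```

While B is being imported, A exists in sys.modules but has not yet reached its def, so the attribute is missing at that moment and only appears once A finishes importing.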

Python , XML Index error

Hello, I am having trouble with an XML file I am using. On a short XML file the program works fine, but once the file reaches a certain size (I am thinking 1 MB) it gives me an "IndexError: list index out of range".
Here is the code I have written so far.
from xml.dom import minidom
import smtplib
from email.mime.text import MIMEText
from datetime import datetime
def xml_data():
    f = open('C:\opidea_2.xml', 'r')
    data = f.read()
    f.close()
    dom = minidom.parseString(data)
    ic = (dom.getElementsByTagName('logentry'))
    dom = None
    content = ''
    for num in ic:
        name = num.getElementsByTagName('author')[0].firstChild.nodeValue
        if name:
            content += "***Changes by:" + str(name) + "*** " + '\n\n Date: '
        else:
            content += "***Changes are made Anonymously *** " + '\n\n Date: '
    print content
if __name__ == "__main__":
    xml_data ()
Here is part of the xml if it helps.
<log>
<logentry
revision="33185">
<author>glv</author>
<date>2012-08-06T21:01:52.494219Z</date>
<paths>
<path
kind="file"
action="M">/branches/Patch_4_2_0_Branch/text.xml</path>
<path
kind="dir"
action="M">/branches/Patch_4_2_0_Branch</path>
</paths>
<msg>PATCH_BRANCH:N/A
BUG_NUMBER:N/A
FEATURE_AFFECTED:N/A
OVERVIEW:N/A
Adding the SVN log size requirement to the branch
</msg>
</logentry>
</log>
The actual XML file is much bigger, but this is the general format. The program works when the file is this small, but once it gets bigger I get problems.
Here is the traceback:
Traceback (most recent call last):
  File "C:\python\src\SVN_Email_copy.py", line 141, in <module>
    xml_data ()
  File "C:\python\src\SVN_Email_copy.py", line 50, in xml_data
    name = num.getElementsByTagName('author')[0].firstChild.nodeValue
IndexError: list index out of range

Based on the code provided your error is going to be in this line:
name = num.getElementsByTagName('author')[0].firstChild.nodeValue
#xml node^
#function call -----^
#list indexing ---------------------------^
#attribute access --------------------------------^
That's the only place in the demonstrated code that you're indexing into a list. That would imply that in your larger XML Sample you're missing an <author> tag. You'll have to correct that, or add in some level of error handling / data validation.
The annotations above show how much happens in that single line, with each step relying on the return value of the previous one. num is defined, so that part is fine. Then you call a method, which returns a list. You index into that list, and when it is empty the IndexError is raised, so execution never reaches the attribute access for firstChild, let alone nodeValue.
Error checking may look something like this:
authors = num.getElementsByTagName('author')
if len(authors) > 0:
    name = authors[0].firstChild.nodeValue
Though there are many, many ways you could achieve that.
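As one possible shape for that validation, the sketch below collects an author name per logentry and records None when the tag is missing or empty. author_names is a name chosen for this example, and treating an empty author element as missing is an assumption, not something the question states.

```python
from xml.dom import minidom


def author_names(xml_text):
    """Return one author per <logentry>; None where the tag is missing or empty."""
    dom = minidom.parseString(xml_text)
    names = []
    for entry in dom.getElementsByTagName('logentry'):
        authors = entry.getElementsByTagName('author')
        # An empty list means no <author> tag; firstChild is None for <author/>.
        if authors and authors[0].firstChild is not None:
            names.append(authors[0].firstChild.nodeValue)
        else:
            names.append(None)
    return names
```

The caller can then decide whether a None entry should be reported as an anonymous change or skipped.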

EOL stops python on Calculate Field

Would anyone be able to help me modify this script to ignore the error and continue running? I just need to figure out how to make the script skip over these errors and finish the rest of the lines.
Here is the full Python script:
# Import system modules
import sys, string, os, arcgisscripting
# Create the geoprocessor object
gp = arcgisscripting.create(9.3)
gp.OverWriteOutput = True
# Set the workspace. List all of the folders within
gp.Workspace = "C:\ZP4"
fcs = gp.ListWorkspaces("*","Folder")
for fc in fcs:
    print fc
    gp.CalculateField_management(fc + "\\Parcels.shp", "SIT_FULL_S", "myfunction(!SIT_HSE_NU!,!SIT_FULL_S!)", "PYTHON", "def myfunction(fld1,fld2):\n if (fld1=='0'or fld1=='00'or fld1<'00000000000'):\n return ''\n else:\n return fld2")
And here is the error I encounter:
Traceback (most recent call last):
  File "C:\Documents and Settings\Andrew\Desktop\HOUSENUMERZERO.py", line 18, in <module>
ERROR 000539: Error running expression: myfunction
(" ","69 FLOOD ST
") <type 'exceptions.SyntaxError'>: EOL while scanning single-quoted string (<string>, line 1)
Failed to execute (CalculateField).

First option: wrap the gp.CalculateField_management(...) in a try/except, like so:
try:
    gp.CalculateField_management(...)
except SyntaxError:
    pass
This should allow your script to keep going, but I'm not sure what the state of gp will be.
A better option would be to preprocess each file and deal with the fields that have embedded newlines in them; something like:
for fc in fcs:
    fix_bad_fields(fc)
    gp.CalculateField_management(...)
and fix_bad_fields looks something like (you'll have to research this as I am unfamiliar with .shp files -- I'll pretend it allows writing back to the same file, but if not you'll have to do some copying and renaming as well):
def fix_bad_fields(filename):
    # open_shp_file and put_changes_on_disk are placeholders, not real APIs
    data_file = open_shp_file(filename)
    for row in data_file:
        row[0] = row[0].replace('\n', '')
        row[1] = row[1].replace('\n', '')
        row.put_changes_on_disk() # force changes to disk (may not be necessary)
    data_file.close()
Lots of guesswork in those particulars, but hopefully that gives you an idea and enough to go on.
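The error message suggests that the field value is pasted as text into a quoted Python expression, so a value ending in a line break leaves the string literal unterminated. The sketch below illustrates that idea in plain Python, under that assumption and without touching arcgisscripting; expression_compiles and strip_line_breaks are names invented for this example.

```python
def strip_line_breaks(field_value):
    """Remove carriage returns and newlines from a field value."""
    return field_value.replace('\r', '').replace('\n', '')


def expression_compiles(field_value):
    """Return True if the value can be pasted into a quoted expression."""
    # Assumption: the tool builds its expression by plain text substitution.
    expression = 'myfunction("{}")'.format(field_value)
    try:
        compile(expression, '<string>', 'eval')
    except SyntaxError:
        # A raw newline inside the quotes ends the line before the closing quote.
        return False
    return True
```

A value such as '69 FLOOD ST' followed by a newline fails this check, and passes once strip_line_breaks has been applied to it.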
