I have a peculiar problem.
I'm using the extract_msg package to analyse a bunch of .msg files.
The package raises a NotImplementedError when a .msg file contains attachments.
I'm using a try-except block to catch this error, but for some weird reason it prints "hello" for every error it finds. Nowhere in my code do I print "hello", so I'm a bit baffled here.
Has anyone seen this before? And how can I prevent this from happening?
import glob
import extract_msg
# f variable
f = glob.glob(r'D:\AA Brenda\Python\DVS_lang\*\*.msg', recursive=True)
Instead of this, please do it like this and try again:
f = glob.glob(r'D:/AA Brenda/Python/DVS_lang/**/*.msg', recursive=True)
# loop over folders
paths = []
senders = []
recipients = []
dates = []
subjects = []
body = []
# append data to lists:
for filename in f:
    try:
        msg = extract_msg.Message(filename)
        paths.append(filename)
        senders.append(msg.sender)
        recipients.append(msg.to)
        dates.append(msg.date)
        subjects.append(msg.subject)
        body.append(msg.body)
    except NotImplementedError:
        # "NotImplementedError: Current version of ExtractMsg.py does not
        # support extraction of containers that are not embeded msg files."
        print("")
It's in their library:
https://github.com/TeamMsgExtractor/msg-extractor/blob/master/extract_msg/msg.py#L644
If that really bothers you, see the question "How to block calls to print?" for ways to suppress it.
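One standard-library way to do that is contextlib.redirect_stdout, which temporarily points sys.stdout at a throwaway buffer. This is a minimal sketch, assuming the stray "hello" is written with print() to sys.stdout (if the library used the logging module instead, this would not hide it). `call_quietly` and `noisy` are hypothetical names; in the loop above you would write `msg = call_quietly(extract_msg.Message, filename)` inside the existing try block.

```python
import contextlib
import io


def call_quietly(func, *args, **kwargs):
    """Call func(*args, **kwargs) while discarding anything printed to stdout.

    Exceptions raised by func still propagate to the caller.
    """
    with contextlib.redirect_stdout(io.StringIO()):
        return func(*args, **kwargs)


def noisy():
    # Hypothetical stand-in for a library call that prints and then fails
    print("hello")
    raise NotImplementedError("unsupported attachment type")


if __name__ == "__main__":
    try:
        call_quietly(noisy)
    except NotImplementedError as exc:
        # This line prints normally; the library's "hello" was swallowed
        print("Skipped:", exc)
```

Because the redirect is scoped to the with block, your own print() calls outside it behave as usual.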
I'm building a Python program to sort pictures by their "EXIF DateTimeOriginal" tag using the exifread module. There is an error when a non-picture file (e.g. an .mp3 file) is processed by exifread.process_file(item). I would like Python to ignore files without EXIF tags, so I use a try/except statement, but it still produces the error "File format not recognized", which terminates the program.
I added tags["EXIF DateTimeOriginal"] == True, which stopped the program from terminating, but the error is still printed.
Does anyone have an idea how to make the exifread module ignore files that are not pictures?
import exifread
item = r"D:\TEMP\Vesna.mp3"
with open(item, 'rb') as file:
    try:
        tags = exifread.process_file(file, stop_tag="EXIF DateTimeOriginal")
        tags["EXIF DateTimeOriginal"] == True
    except:
        print("No tag")
    else:
        taken = tags["EXIF DateTimeOriginal"]
        print(tags["EXIF DateTimeOriginal"])
Output:
File format not recognized.
No tag
I could filter out non-picture files before passing them to exifread, but I have the impression that this would take more time, and some images could still lack the required tag.
From the source, process_file catches file-type errors, logs a warning message and returns an empty dict. You could test for the empty dict or, since you are also concerned about a specific entry in that dict, use get() with a default value for the test. You can also change what happens to a logging event with the logging module.
import exifread
import logging
item = r"D:\TEMP\Vesna.mp3"
# Only show log records of level ERROR and above
logging.basicConfig(level=logging.ERROR)
with open(item, 'rb') as file:
    tags = exifread.process_file(file, stop_tag="EXIF DateTimeOriginal")
    if tags.get("EXIF DateTimeOriginal", True):
        taken = tags["EXIF DateTimeOriginal"]
        print(tags["EXIF DateTimeOriginal"])
    else:
        print("No tag")
Your solution @tdelaney still raised an error, so I tweaked it slightly; here is the result. Thanks for the introduction to the logging module :)
import exifread
import logging
item = r"D:\TEMP\20160130_215245.jpg"
logging.basicConfig(level=logging.ERROR)
try:
    # with closes the file even if process_file fails
    with open(item, 'rb') as file:
        tags = exifread.process_file(file, stop_tag="EXIF DateTimeOriginal")
    taken = tags["EXIF DateTimeOriginal"]
except (OSError, KeyError):
    # OSError: file could not be opened; KeyError: no DateTimeOriginal tag
    print("No tag")
else:
    print(taken)
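On the get() approach: with `True` as the default, a missing tag makes the if condition true, so the following `tags["EXIF DateTimeOriginal"]` still raises KeyError, which is likely the error mentioned above. Leaving out the default (so get() returns None) avoids that. This is a minimal pure-Python sketch; `date_taken` and `split_by_date` are hypothetical helpers, and it assumes the tag dicts come from exifread.process_file(), which per the answer above returns `{}` for unrecognized files.

```python
TAG = "EXIF DateTimeOriginal"


def date_taken(tags):
    """Return the DateTimeOriginal value, or None when it is absent.

    `tags` is assumed to be the dict returned by exifread.process_file();
    for files exifread does not recognize it is empty.
    """
    # No default argument: get() returns None for a missing tag.
    # With get(TAG, True) a missing tag would yield True instead.
    return tags.get(TAG)


def split_by_date(tag_dicts):
    """Separate {path: tags} into ({path: date}, [paths without a date])."""
    dated, undated = {}, []
    for path, tags in tag_dicts.items():
        taken = date_taken(tags)
        if taken is None:
            undated.append(path)
        else:
            dated[path] = taken
    return dated, undated


# Building tag_dicts with exifread (not run here):
# with open(path, "rb") as fh:
#     tag_dicts[path] = exifread.process_file(fh, stop_tag=TAG)
```

Non-picture files then simply land in the "undated" list instead of raising.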
I'm trying to handle exceptions when reading files, but I have a problem. I'm new to Python, and I haven't yet figured out how to catch an exception and still continue reading text from the other files I am accessing. This is my code:
import errno
import sys

class Read:
    # FIXME: make these 2 constants immutable
    # Backslashes doubled: "\u" in "\user" would otherwise be a unicode escape error
    ROUTE = "d:\\Profiles\\user\\Desktop\\"
    EXT = ".txt"

    def setFileReaded(self, fileToRead):
        content = ""
        try:
            infile = open(self.ROUTE + fileToRead + self.EXT)
        except FileNotFoundError as error:
            if error.errno == errno.ENOENT:
                print("File not found, please check the name and try again")
            else:
                raise
            sys.exit()
        with infile:
            content = infile.read()
            infile.close()
        return content
And from another class I call it:
read = Read()
print(read.setFileReaded("verbs"))
print(read.setFileReaded("object"))
print(read.setFileReaded("sites"))
print(read.setFileReaded("texts"))
But it only prints this:
turn on
connect
plug
File not found, please check the name and try again
It doesn't continue with the next files. How can I make the program keep reading all the files?
It's a little difficult to understand exactly what you're asking here, but I'll try and provide some pointers.
sys.exit() will terminate the Python script gracefully. In your code, it is called when the FileNotFoundError exception is caught. Nothing after that will run, because your script terminates, so none of the other files will be read.
Another thing to point out is that you close the file after reading it, which is not needed when you open it like this:
with open('myfile.txt') as f:
    content = f.read()
The file will be closed automatically after the with block.
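Putting both points together, this is a minimal sketch that reports a missing file and moves on instead of calling sys.exit(). It assumes returning None for a missing file is acceptable to the caller; `read_file` and the constructor arguments are illustrative changes, not the original class's interface, and the demo uses a temporary folder in which only verbs.txt exists.

```python
import os
import tempfile


class Read:
    """Read <route>/<name><ext> files without stopping on a missing one."""

    def __init__(self, route, ext=".txt"):
        self.route = route
        self.ext = ext

    def read_file(self, name):
        """Return the file's text, or None if the file does not exist."""
        path = os.path.join(self.route, name + self.ext)
        try:
            # The with block closes the file automatically; no close() needed
            with open(path) as infile:
                return infile.read()
        except FileNotFoundError:
            # Report the problem and return instead of calling sys.exit()
            print("File not found, please check the name and try again:", path)
            return None


if __name__ == "__main__":
    # Demo in a throwaway folder where only "verbs.txt" exists
    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, "verbs.txt"), "w") as fh:
            fh.write("turn on\nconnect\nplug\n")
        reader = Read(folder)
        for name in ["verbs", "object", "sites", "texts"]:
            content = reader.read_file(name)
            if content is not None:
                print(content)
```

The loop now prints the verbs and one "File not found" message for each of the three missing files, rather than stopping at the first one.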
I'm trying to write a plugin that collects all the classes in the current folder and injects them into auto-complete.
The following code is in my Python file:
import os
import sublime_plugin

class FolderPathAutoComplete(sublime_plugin.EventListener):
    def on_query_completions(self, view, prefix, locations):
        folders = view.window().folders()
        results = get_path_classes(folders)
        all_text = ""
        for result in results:
            all_text += result + "\n"
        #sublime.error_message(all_text)
        return results

def get_path_classes(folders):
    classesList = []
    for folder in folders:
        for root, dirs, files in os.walk(folder):
            for filename in files:
                filepath = root + "/" + filename
                if filepath.endswith(".java"):
                    filepath = filepath.replace(".java", "")
                    filepath = filepath[filepath.rfind("/"):]
                    filepath = filepath[1:]
                    classesList.append(filepath)
    return classesList
But when I work in a folder with a class named "LandingController.java" and try to get the result, auto-complete doesn't work at all.
However, as you may have noticed, I output all the contents I got with error_message, and there is indeed a list of class names found.
Can anyone help me solve this? Thank you!
It turns out that the format Sublime Text actually accepts is [(word, word), ...], i.e. a list of (trigger, contents) pairs.
Thanks to MattDMo, who pointed out the relevant documentation, since the official documentation says nothing about the auto-complete part.
To better understand the auto-complete injection API, you could follow Zinggi's plugin DictionaryAutoComplete on GitHub.
So, a standard solution:
import sublime_plugin

# classesList (a global list of class paths) and parse_class_path_only_name()
# are defined elsewhere in the plugin; they are not shown here.

class FolderPathAutoComplete(sublime_plugin.EventListener):
    def on_query_completions(self, view, prefix, locations):
        suggestlist = self.get_autocomplete_list(prefix)
        return suggestlist

    def get_autocomplete_list(self, word):
        global classesList
        autocomplete_list = []
        uniqueautocomplete = set()
        # filter relevant items:
        for w in classesList:
            try:
                if word.lower() in w.lower():
                    actual_class = parse_class_path_only_name(w)
                    if actual_class not in uniqueautocomplete:
                        uniqueautocomplete.add(actual_class)
                        # Sublime expects (trigger, contents) pairs
                        autocomplete_list.append((actual_class, actual_class))
            except UnicodeDecodeError:
                # print w: actual_class may not be set yet if lower() failed
                print(w)
                # autocomplete_list.append((w, w))
                continue
        return autocomplete_list
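The fix boils down to returning pairs instead of plain strings. This is a minimal sketch, assuming each `.java` file name equals its class name; `find_java_classes` and `to_completions` are hypothetical helpers kept free of the `sublime` modules so they can be tried outside the editor.

```python
import os


def find_java_classes(folders):
    """Collect bare class names (file name minus .java) under each folder."""
    names = []
    for folder in folders:
        for root, dirs, files in os.walk(folder):
            for filename in files:
                base, ext = os.path.splitext(filename)
                if ext == ".java":
                    names.append(base)
    return names


def to_completions(names, prefix):
    """Build [(trigger, contents), ...]: case-insensitive match, no duplicates."""
    seen = set()
    completions = []
    for name in names:
        if prefix.lower() in name.lower() and name not in seen:
            seen.add(name)
            completions.append((name, name))
    return completions


# Inside the EventListener (sublime_plugin only exists inside Sublime Text):
# def on_query_completions(self, view, prefix, locations):
#     return to_completions(find_java_classes(view.window().folders()), prefix)
```

os.path.splitext avoids the manual rfind("/") slicing and also works with Windows path separators.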
I am new to Python, and with some really great assistance from StackOverflow, I've written a program that:
1) Looks in a given directory, and for each file in that directory:
2) Runs an HTML-cleaning program, which:
   Opens each file with BeautifulSoup
   Removes blacklisted tags & content
   Prettifies the remaining content
   Runs Bleach to remove all non-whitelisted tags & attributes
   Saves out as a new file
It works very well, except when it hits a certain kind of file content that throws up a bunch of BeautifulSoup errors and aborts the whole thing. I want it to be robust against that, as I won't have control over what sort of content winds up in this directory.
So, my question is: How can I re-structure the program so that when it errors on one file within the directory, it reports that it was unable to process that file, and then continues to run through the remaining files?
Here is my code so far (with extraneous detail removed):
def clean_dir(directory):
    os.chdir(directory)
    for filename in os.listdir(directory):
        clean_file(filename)

def clean_file(filename):
    tag_black_list = ['iframe', 'script']
    tag_white_list = ['p', 'div']
    attr_white_list = {'*': ['title']}
    with open(filename, 'r') as fhandle:
        text = BeautifulSoup(fhandle)
        text.encode("utf-8")
        print "Opened "+ filename
    # Step one, with BeautifulSoup: Remove tags in tag_black_list, destroy contents.
    [s.decompose() for s in text(tag_black_list)]
    pretty = (text.prettify())
    print "Prettified"
    # Step two, with Bleach: Remove tags and attributes not in whitelists, leave tag contents.
    cleaned = bleach.clean(pretty, strip="TRUE", attributes=attr_white_list, tags=tag_white_list)
    fout = open("../posts-cleaned/"+filename, "w")
    fout.write(cleaned.encode("utf-8"))
    fout.close()
    print "Saved " + filename +" in /posts-cleaned"
    print "Done"

clean_dir("../posts/")
I'm looking for any guidance on how to write this so that it keeps running after hitting a parsing/encoding/content/attribute/etc. error within the clean_file function.
You can handle the errors using try-except-finally.
You can do the error handling inside clean_file or in the for loop.
for filename in os.listdir(directory):
    try:
        clean_file(filename)
    except:
        print "Error processing file %s" % filename
If you know what exception gets raised you can use a more specific catch.
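As a Python 3 sketch of the same idea, assuming you only want to report failures and keep going: `process_all` is a hypothetical helper, and `int` stands in for `clean_file` in the demo because it raises ValueError on non-numeric input. Catching `Exception` rather than using a bare `except:` still lets Ctrl+C (KeyboardInterrupt) stop the run.

```python
def process_all(filenames, handler):
    """Call handler(name) for each name; report and collect failures, then continue."""
    failures = []
    for name in filenames:
        try:
            handler(name)
        except Exception as exc:  # one bad file should not abort the whole batch
            print("Error processing file %s: %s: %s" % (name, type(exc).__name__, exc))
            failures.append((name, exc))
    return failures


if __name__ == "__main__":
    # Stand-in handler: int() raises ValueError for "bad"
    failed = process_all(["1", "bad", "2"], int)
    print("%d file(s) failed" % len(failed))
    # With the question's code this would be roughly:
    # process_all(os.listdir("../posts/"), clean_file)
```

Returning the failures also lets you print a summary, or retry those files, once the loop has finished.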
I'll start off by showing the code I have so far:
def err(em):
    print(em)
    exit

def rF(f):
    s = ""
    try:
        fh = open(f, 'r')
    except IOError:
        e = "Could not open the file: " + f
        err(e)
    try:
        with fh as ff:
            next(ff)
            for l in ff:
                if ">" in l:
                    next(ff)
                else:
                    s += l.replace('\n','').replace('\t','').replace('\r','')
    except:
        e = "Unknown Exception"
        err(e)
    fh.close()
    return s
For some reason the Python shell (I am using 3.2.2) freezes whenever I try to read a file by typing:
rF("mycobacterium_bovis.fasta")
The conditionals in the rF function are to prevent reading each line that starts with a ">" token. These lines aren't DNA/RNA code (which is what I am trying to read from these files) and should be ignored.
I hope someone can help me out with this; I don't see my error.
As per the usual, MANY thanks in advance!
EDIT:
*The problem persists!*
This is the code I use now. I removed the error handling, which was a fancy addition anyway, but the shell still freezes whenever I try to read a file:
def rF(f):
    s = ""
    try:
        fh = open(f, 'r')
    except IOError:
        print("Err")
    try:
        with fh as ff:
            next(ff)
            for l in ff:
                if ">" in l:
                    next(ff)
                else:
                    s += l.replace('\n','').replace('\t','').replace('\r','')
    except:
        print("Err")
    fh.close()
    return s
You never defined e, so you'll get a NameError that is being hidden by the bare except: clause.
This is why it is good practice to specify the exception, e.g.:
try:
    print(e)
except NameError as e:
    print(e)
In cases like yours, though, when you don't necessarily know what the exception will be, you should at least display information about the error, like this:
import sys
try:
    print(e)
except:  # catch *all* exceptions
    e = sys.exc_info()[1]
    print(e)
Which, using the original code you posted, would have printed the following:
name 'e' is not defined
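In Python 3 the same information is available without sys.exc_info() by binding the exception in the except clause. This is a small sketch; `describe_failure` is a hypothetical helper name, and it catches `Exception` rather than everything so that KeyboardInterrupt and SystemExit still behave normally.

```python
def describe_failure(func, *args):
    """Call func(*args) and return None, or "ExcType: message" if it raised."""
    try:
        func(*args)
    except Exception as exc:  # Exception excludes KeyboardInterrupt/SystemExit
        return "%s: %s" % (type(exc).__name__, exc)
    return None


if __name__ == "__main__":
    print(describe_failure(int, "42"))  # None
    print(describe_failure(int, "x"))   # ValueError: invalid literal for int() with base 10: 'x'
```

Including the exception type name makes a hidden NameError, like the one above, obvious at a glance.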
Edit based on updated information:
Concatenating a string like that is going to be quite slow if you have a large file.
Consider instead writing the filtered information to another file, e.g.:
def rF(f):
    with open(f,'r') as fin, open('outfile','w') as fou:
        next(fin)
        for l in fin:
            if ">" in l:
                next(fin)
            else:
                fou.write(l.replace('\n','').replace('\t','').replace('\r',''))
I have tested that the above code works on a FASTA file based on the format specification listed here: http://en.wikipedia.org/wiki/FASTA_format using Python 3.2.2 [GCC 4.6.1] on linux2.
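If you do want the sequence in memory, a common alternative to repeated `+=` is to collect the pieces in a list and join them once. This is a minimal sketch; `read_fasta_sequence` is a hypothetical name. One behavioral difference from the code above: here only lines starting with ">" are skipped, whereas calling next() inside the for loop also consumes the line that follows each header, which in a FASTA file is usually sequence data.

```python
def read_fasta_sequence(lines):
    """Concatenate sequence lines, skipping FASTA header lines that start with '>'.

    `lines` can be an open file or any iterable of strings.
    """
    parts = []
    for line in lines:
        if line.startswith(">"):
            continue  # header line: not DNA/RNA data
        # Drop newlines, tabs and carriage returns, as the original code does
        parts.append(line.replace("\n", "").replace("\t", "").replace("\r", ""))
    return "".join(parts)  # one join instead of repeated string concatenation


# Usage with the file from the question:
# with open("mycobacterium_bovis.fasta") as fh:
#     seq = read_fasta_sequence(fh)
```

Accepting any iterable of lines keeps the function easy to try on a small in-memory example before pointing it at a large file.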
A couple of recommendations:
Start small. Get a simple piece working then add a step.
Add print() statements at trouble spots.
Also, consider including more information about the contents of the file you're attempting to parse. That may make it easier for us to help.