I have several large zip files that contain a directory structure I must maintain. Currently, to unzip them I am using:
zip = zipfile.ZipFile(self.fileName)
zip.extractall(self.destination)
zip.close()
The problem is that this process can take upwards of 3-5 minutes and I have no feedback that it is still working. What I would like to do is output the name of the file currently being unzipped to the status bar of my GUI. What I have in mind is something like:
zip = zipfile.ZipFile(self.fileName)
zipNameList = zip.namelist()
for item in zipNameList:
    self.SetStatusText("Unzipping " + str(item))
    zip.extract(item)
zip.close()
The problem with this is that it does not create the correct dir structure. I am not sure that this is even the best way to go about it.
I was also looking into using wx.ProgressDialog, but could not come up with a way to have it show the progress of zip.extractall(fileName).
I got it to an acceptable solution, though I think I would prefer to thread it eventually.
def unzipItem(self, fileName, destination):
    print "--unzipItem--"
    zip = zipfile.ZipFile(fileName)
    nameList = zip.namelist()
    # get the number of files in the archive to properly size the progress bar
    fileCount = len(nameList)
    # build the progress dialog
    dlg = wx.ProgressDialog("Unzipping files",
                            "An informative message",
                            fileCount,
                            parent = self,
                            )
    keepGoing = True
    count = 0
    for item in nameList:
        count += 1
        dir, file = os.path.split(item)
        print "unzip " + file
        # update status bar
        self.SetStatusText("Unzipping " + str(item))
        # update progress dialog
        (keepGoing, skip) = dlg.Update(count, file)
        zip.extract(item, destination)
    zip.close()
    dlg.Destroy()  # clean up the dialog once extraction is finished
You can use infolist instead of namelist. From the docs:
The objects are in the same order as their entries in the actual ZIP
file on disk if an existing archive was opened.
Also, consider this note:
The open(), read() and extract() methods can take a filename or a ZipInfo object. You will appreciate this when trying to read a ZIP file that contains members with duplicate names.
So you can write something like this:
with ZipFile(zip_file_name) as myzipfile:
    members = myzipfile.infolist()
    for i, member in enumerate(members):
        myzipfile.extract(member, destination_path)
        self.SetStatusText("Unzipping " + str(i))
        self.mysignal.emit(i)  # use this to update inside a thread
You can put this on a thread and then update through a signal, and the SetStatusText method should be called inside the corresponding slot.
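For reference, here is a minimal sketch of what that threaded version could look like with wxPython (since the question uses wx); the ExtractThread name and the use of wx.CallAfter instead of a signal are illustrative choices, not part of the original code:

import threading
import zipfile
import wx

class ExtractThread(threading.Thread):
    # hypothetical worker thread: extracts one member at a time and
    # pushes progress updates back to the GUI thread with wx.CallAfter
    def __init__(self, window, zip_file_name, destination_path):
        threading.Thread.__init__(self)
        self.window = window
        self.zip_file_name = zip_file_name
        self.destination_path = destination_path

    def run(self):
        with zipfile.ZipFile(self.zip_file_name) as myzipfile:
            for member in myzipfile.infolist():
                myzipfile.extract(member, self.destination_path)
                # wx widgets must only be touched from the main thread
                wx.CallAfter(self.window.SetStatusText,
                             "Unzipping " + member.filename)

It could then be started from the GUI with something like ExtractThread(self, self.fileName, self.destination).start().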
I'm writing something in python that needs to know which specific files/programs are open. I've mapped the list of running processes to find the executable paths of the processes I'm looking for. This works for most things, but all Microsoft Office programs run under general processes like WINWORD.exe or EXCEL.exe etc. I've also tried getting a list of open windows and their titles to see what file is being edited, but the window titles are relative paths not absolute paths to the file being edited.
Here's a sample:
import wmi

f = wmi.WMI()
pid_map = {}
PID = 4464  # pid of Microsoft Word
for process in f.Win32_Process():
    if not process.Commandline: continue
    pid_map[process.ProcessID] = process.Commandline
pid_map[PID]
Outputs:
'"C:\\Program Files\\Microsoft Office\\root\\Office16\\WINWORD.EXE" '
How do I get the path of the file actually being edited?
I figured it out. Here is a function that will return the files being edited.
import re
import pythoncom

def get_office():  # creates a doctype: docpath dictionary
    context = pythoncom.CreateBindCtx(0)
    files = {}
    dupl = 1
    patt2 = re.compile(r'(?i)(\w:)((\\|\/)+([\w\-\.\(\)\{\}\s]+))+' + r'(\.\w+)')  # matches file paths; the latter part can be changed to only match specific files
    # look for paths in the Running Object Table (ROT)
    for moniker in pythoncom.GetRunningObjectTable():
        name = moniker.GetDisplayName(context, None)
        checker = re.search(patt2, name)
        if checker:
            match = checker.group(5)  # extension
            if match in ('.XLAM', '.xlam'): continue  # these files aren't useful
            try:
                files[match[1:]]  # check whether this file type was already documented
                match += str(dupl)
                dupl += 1
            except KeyError:
                pass
            files[match[1:]] = name  # add doctype: doc path pairing to the dictionary
    return files
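With the function above, a quick check might look like this (the example output in the comment is purely illustrative):

open_docs = get_office()  # e.g. {'docx': 'C:\\Users\\me\\report.docx'}
print(open_docs)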
I'm trying to make a custom log watcher for a log folder using Python. The objective is simple: find a regex in the logs and write a line to a text file whenever it matches.
The problem is that the script must run constantly against a folder that could contain multiple log files with unknown names, not a single one, and it should detect the creation of new log files inside the folder on the fly.
I made a kind of tail -f in Python (copying part of the code) which constantly reads a specific log file and writes a line to a txt file if the regex is found, but I don't know how I could do it with a folder instead of a single log file, or how the script could detect the creation of new log files inside the folder and read them on the fly.
#!/usr/bin/env python
import time, os, re
from datetime import datetime

# Regex used to match relevant loglines
error_regex = re.compile(r"ERROR:")
start_regex = re.compile(r"INFO: Service started:")

# Output file, where the matched loglines will be copied to
output_filename = os.path.normpath("log/script-log.txt")

# Function that will work as tail -f for python
def follow(thefile):
    thefile.seek(0, 2)
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line

logfile = open("log/service.log")
loglines = follow(logfile)
counter = 0
for line in loglines:
    if (error_regex.search(line)):
        counter += 1
        sttime = datetime.now().strftime('%Y%m%d_%H:%M:%S - ')
        out_file = open(output_filename, "a")
        out_file.write(sttime + line)
        out_file.close()
    if (start_regex.search(line)):
        sttime = datetime.now().strftime('%Y%m%d_%H:%M:%S - ')
        out_file = open(output_filename, "a")
        out_file.write(sttime + "SERVICE STARTED\n" + sttime + "Number of errors detected during the startup = {}\n".format(counter))
        counter = 0
        out_file.close()
You can use watchgod for this purpose. This may be better suited as a comment; I'm not sure it deserves to be an answer.
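A rough sketch of how watchgod could be wired up here, assuming its watch() generator yields sets of (change, path) tuples; the per-change handling is illustrative, not a drop-in replacement for the script above:

from watchgod import watch, Change

log_dir = "log"

for changes in watch(log_dir):  # blocks until something changes in the folder
    for change, path in changes:
        if change == Change.added:
            # a new log file appeared: open it and start following it
            print("new log file detected: " + path)
        elif change == Change.modified:
            # an existing log file grew: read the new lines and apply the regexes
            print("log file modified: " + path)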
I am brand new to Python (and coding in general) and I've been unable to find a solution to my specific problem online. I am currently creating a tool which will allow a user to save a file to a network location. The file will have a version number. What I would like is to have the script auto-version it up before it saves. I have the rest of the script done, but it is the auto-versioning that I am having issues with. Here's what I have so far:
import re
import os

def main():
    wip_folder = "L:/xxx/xxx/xxx/scenes/wip/"
    _file_list = os.listdir('%s' % wip_folder)
    if os.path.exists('%s' % wip_path):
        for file in _file_list:
            versionPattern = re.compile('_v\d{3}')
            curVersions = versionPattern.findall('%s' % wip_folder)
            curVersions.sort()
            nextVersion = '_v%03d' % (int(curVersions[-1][2:]) + 1)
            return nextVersion
    else:
        nextVersion = '_v001'
    name = 'xxx_xxx_xx'
    name += '%s' % nextVersion
    name += '_xxx_wip'
I should probably point out that main() is going to be called by a QPushbutton in another module. Also, that wip_path will most likely have several versions of a single file in it. So if there are 10 versions of this file in wip_path, this save should be v011. I apologize if this question makes no sense. Any help would be appreciated. Thank you!
You do not need to use re at all; programming is about simplification and you are overcomplicating it! I chose to return, but anything in this function can be changed to whatever you need it to do. Just read the comments and good luck!
import os

def getVersioning(path):
    # passing path
    count = 1  # init count
    versionStr = 'wip_path_v'
    try:  # in case path doesn't exist
        for i in os.listdir(path):  # loop through files in the passed dir
            if versionStr in i:  # check if the file contains the default versioning string, e.g. wip_path_v (used in case other files are in the same dir)
                count += 1  # increment count
    except:
        os.mkdir(path)  # make the dir if one does not exist, for future use
    newName = versionStr + str(count)  # new versioning file name
    return newName  # return new versioning name

print getVersioning('tstFold')
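If you do want to keep the _v### naming from the question, a sketch that parses the existing version numbers instead of counting files might look like this (the next_version name and the example folder are illustrative, not from the original post):

import os
import re

def next_version(wip_folder):
    # find the highest existing _v### suffix among the files and bump it by one
    version_pattern = re.compile(r'_v(\d{3})')
    versions = [0]
    for fname in os.listdir(wip_folder):
        match = version_pattern.search(fname)
        if match:
            versions.append(int(match.group(1)))
    return '_v%03d' % (max(versions) + 1)

print next_version("L:/xxx/xxx/xxx/scenes/wip/")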
I am new to Python, and with some really great assistance from StackOverflow, I've written a program that:
1) Looks in a given directory, and for each file in that directory:
2) Runs an HTML-cleaning program, which:
Opens each file with BeautifulSoup
Removes blacklisted tags & content
Prettifies the remaining content
Runs Bleach to remove all non-whitelisted tags & attributes
Saves out as a new file
It works very well, except when it hits a certain kind of file content that throws up a bunch of BeautifulSoup errors and aborts the whole thing. I want it to be robust against that, as I won't have control over what sort of content winds up in this directory.
So, my question is: How can I re-structure the program so that when it errors on one file within the directory, it reports that it was unable to process that file, and then continues to run through the remaining files?
Here is my code so far (with extraneous detail removed):
def clean_dir(directory):
    os.chdir(directory)
    for filename in os.listdir(directory):
        clean_file(filename)

def clean_file(filename):
    tag_black_list = ['iframe', 'script']
    tag_white_list = ['p', 'div']
    attr_white_list = {'*': ['title']}
    with open(filename, 'r') as fhandle:
        text = BeautifulSoup(fhandle)
        text.encode("utf-8")
        print "Opened " + filename
        # Step one, with BeautifulSoup: Remove tags in tag_black_list, destroy contents.
        [s.decompose() for s in text(tag_black_list)]
        pretty = (text.prettify())
        print "Prettified"
        # Step two, with Bleach: Remove tags and attributes not in whitelists, leave tag contents.
        cleaned = bleach.clean(pretty, strip="TRUE", attributes=attr_white_list, tags=tag_white_list)
        fout = open("../posts-cleaned/" + filename, "w")
        fout.write(cleaned.encode("utf-8"))
        fout.close()
    print "Saved " + filename + " in /posts-cleaned"
    print "Done"

clean_dir("../posts/")
I'm looking for any guidance on how to write this so that it will keep running after hitting a parsing/encoding/content/attribute/etc. error within the clean_file function.
You can handle the errors using try-except-finally.
You can do the error handling inside clean_file or in the for loop.
for filename in os.listdir(directory):
    try:
        clean_file(filename)
    except:
        print "Error processing file %s" % filename
If you know what exception gets raised you can use a more specific catch.
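For example, you could catch only the errors you expect from parsing and encoding; the exception types listed here are plausible candidates for this kind of pipeline, not taken from the question:

for filename in os.listdir(directory):
    try:
        clean_file(filename)
    except (UnicodeDecodeError, UnicodeEncodeError, IOError) as e:
        # report the failing file and the reason, then move on to the next one
        print "Error processing file %s: %s" % (filename, e)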
Hello, my error is produced when generating a zip file. Can you tell me what I should do?
main.py", line 2289, in get
buf=zipf.read(2048)
NameError: global name 'zipf' is not defined
The complete code is as follows:
def addFile(self, zipstream, url, fname):
    # get the contents
    result = urlfetch.fetch(url)
    # store the contents in a stream
    f = StringIO.StringIO(result.content)
    length = result.headers['Content-Length']
    f.seek(0)
    # write the contents to the zip file
    while True:
        buff = f.read(int(length))
        if buff == "": break
        zipstream.writestr(fname, buff)
    return zipstream

def get(self):
    self.response.headers["Cache-Control"] = "public,max-age=%s" % 86400
    start = datetime.datetime.now() - timedelta(days=20)
    count = int(self.request.get('count')) if not self.request.get('count') == '' else 1000
    from google.appengine.api import memcache
    memcache_key = "ads"
    data = memcache.get(memcache_key)
    if data is None:
        a = Ad.all().filter("modified >", start).filter("url IN", ['www.koolbusiness.com']).filter("published =", True).order("-modified").fetch(count)
        memcache.set("ads", a)
    else:
        a = data
    dispatch = 'templates/kml.html'
    template_values = {'a': a, 'request': self.request,}
    path = os.path.join(os.path.dirname(__file__), dispatch)
    output = template.render(path, template_values)
    self.response.headers['Content-Length'] = len(output)
    zipstream = StringIO.StringIO()
    file = zipfile.ZipFile(zipstream, "w")
    url = 'http://www.koolbusiness.com/list.kml'
    # repeat this for every URL that should be added to the zipfile
    file = self.addFile(file, url, "list.kml")
    # we have finished with the zip so package it up and write the directory
    file.close()
    zipstream.seek(0)
    # create and return the output stream
    self.response.headers['Content-Type'] = 'application/zip'
    self.response.headers['Content-Disposition'] = 'attachment; filename="list.kmz"'
    while True:
        buf = zipf.read(2048)
        if buf == "": break
        self.response.out.write(buf)
That is probably zipstream and not zipf. So replace that with zipstream and it might work.
I don't see where you declare zipf?
zipfile? Senthil Kumaran is probably right with zipstream since you seek(0) on zipstream before the while loop to read chunks of the mystery variable.
edit:
Almost certainly the variable is zipstream.
zipfile docs:
class zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])
Open a ZIP file, where file can be either a path to a file (a string) or a file-like object. The mode parameter should be 'r' to read an existing file, 'w' to truncate and write a new file, or 'a' to append to an existing file. If mode is 'a' and file refers to an existing ZIP file, then additional files are added to it. If file does not refer to a ZIP file, then a new ZIP archive is appended to the file. This is meant for adding a ZIP archive to another file (such as python.exe).
your code:
zipstream = StringIO.StringIO()
creates a file-like object using StringIO, which is essentially a "memory file" (read more in the docs).
file = zipfile.ZipFile(zipstream, "w")
opens the zipfile with the zipstream file-like object in 'w' mode
url = 'http://www.koolbusiness.com/list.kml'
# repeat this for every URL that should be added to the zipfile
file =self.addFile(file,url,"list.kml")
# we have finished with the zip so package it up and write the directory
file.close()
uses the addFile method to retrieve the data and write it to the file-like object, then returns it. The variables are slightly confusing because you pass a zipfile to the addFile method, which aliases it as zipstream (confusing because we are also using zipstream for the StringIO file-like object). Anyway, the zipfile is returned and closed to make sure everything is "written".
It was written to our "memory file", which we now seek to index 0
zipstream.seek(0)
and after doing some header stuff, we finally reach the while loop that will read our "memory-file" in chunks
while True:
    buf = zipstream.read(2048)
    if buf == "": break
    self.response.out.write(buf)
You need to declare:
global zipf
right after your
def get(self):
line. You are modifying a global variable, and this is the only way Python knows what you are doing.