How to pass values to a recursive call involving file I/O - Python

So recently I was working on a recursive file-I/O homework assignment involving basic os methods, and I have run into an error: values set in one part of the recursive function are not being passed on when the function calls itself again.
def findLargestFile(path):
    findLargestFileHelper(path)

def findLargestFileHelper(path, size=0, pathToLargest=""):
    if (os.path.isdir(path) == False):
        if os.path.getsize(path) > size:
            size = os.path.getsize(path)
            pathToLargest = print(path)
    else:
        for filename in os.listdir(path):
            findLargestFileHelper(path + "/" + filename, size, pathToLargest)
    return pathToLargest
What I am trying to do is find the largest file in a folder by recursively looping through all the folders until I find a file, checking whether it is the largest so far, and if it is, passing its size into "size" and its path into "pathToLargest".
It seems that size is not passed on in the else branch, and just assigning "path" to pathToLargest does not work either. (Pretty sure print(path) isn't the way to go about it.)
If someone could suggest what I should do instead, that would be greatly appreciated.
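One way out, sketched here as a hypothetical rewrite (not the assignment's required solution), is to return the winning (size, path) pair from every recursive call instead of reassigning the parameters, since rebinding a parameter inside a call never affects the caller:

```python
import os

def find_largest_file(path):
    """Return a (size, path) tuple for the largest file under path.

    Hypothetical rewrite: each recursive call returns its best result,
    and the parent compares the candidates instead of trying to push
    updated values down through arguments.
    """
    if not os.path.isdir(path):
        return os.path.getsize(path), path
    largest = (0, "")
    for name in os.listdir(path):
        candidate = find_largest_file(os.path.join(path, name))
        if candidate[0] > largest[0]:
            largest = candidate
    return largest
```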

Related

Check if image is decorative in powerpoint using python-pptx

The company I work at requires a list of all inaccessible images/shapes in a .pptx document (don't have alt-text and aren't decorative). To automate the process, I'm writing a script that extracts all inaccessible images/shapes in a specified .pptx and compiles a list. So far, I've managed to make it print out the name, slide #, and image blob of images with no alt-text.
Unfortunately after extensively searching the docs, I came to find that the python-pptx package does not support functionality for checking whether an image/shape is decorative or not.
I haven't mapped XML elements to objects in the past and was wondering how I could go about making a function that reads the val attribute within the adec:decorative element in this .pptx file (see line 4).
<p:cNvPr id="3" name="Picture 2">
<a:extLst>
<a:ext uri="{FF2B5EF4-FFF2-40B4-BE49-F238E27FC236}"><a16:creationId xmlns:a16="http://schemas.microsoft.com/office/drawing/2014/main" id="{77922398-FA3E-426B-895D-97239096AD1F}" /></a:ext>
<a:ext uri="{C183D7F6-B498-43B3-948B-1728B52AA6E4}"><adec:decorative xmlns:adec="http://schemas.microsoft.com/office/drawing/2017/decorative" val="0" /></a:ext>
</a:extLst>
</p:cNvPr>
Since I've only recently started using this package, I'm not sure how to go about creating custom element classes within python-pptx. If anyone has any other workaround or suggestions please let me know, thank you!
Creating a custom element class would certainly work, but I would regard it as an extreme method (think bazooka for killing mosquitos) :).
I'd be inclined to think you could accomplish what you want with an XPath query on the closest ancestor you can get to with python-pptx.
Something like this would be in the right direction:
cNvPr = shape._element._nvXxPr.cNvPr
adec_decoratives = cNvPr.xpath(".//adec:decorative")
if adec_decoratives:
    print("got one, probably need to look more closely at them")
One of the challenges is likely to be getting the adec namespace prefix registered because I don't think it is by default. So you probably need to execute this code before the XPath expression, possibly before loading the first document:
from pptx.oxml.ns import _nsmap
_nsmap["adec"] = "http://schemas.microsoft.com/office/drawing/2017/decorative"
Also, if you research XPath a bit, I think you'll actually be able to query on <adec:decorative> elements that have val=0 or whatever specific attribute state satisfies what you're looking for.
But this is the direction I recommend. Maybe you can post your results once you've worked them out in case someone else faces the same problem later.
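For illustration, the attribute-predicate idea can be tried outside python-pptx with the standard library's ElementTree on a trimmed copy of the XML fragment above (python-pptx uses lxml, but the `[@val='…']` predicate syntax is the same; the `p:` and `a:` namespace URIs below are the usual Office Open XML ones and are assumptions, not taken from the post):

```python
import xml.etree.ElementTree as ET

nsmap = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "adec": "http://schemas.microsoft.com/office/drawing/2017/decorative",
}

# A trimmed version of the <p:cNvPr> element from the question.
xml = (
    '<p:cNvPr xmlns:p="{p}" xmlns:a="{a}" xmlns:adec="{adec}" '
    'id="3" name="Picture 2"><a:extLst>'
    '<a:ext uri="{{C183D7F6-B498-43B3-948B-1728B52AA6E4}}">'
    '<adec:decorative val="0"/></a:ext></a:extLst></p:cNvPr>'
).format(**nsmap)

cNvPr = ET.fromstring(xml)
# The predicate selects only <adec:decorative> elements whose val is "0".
matches = cNvPr.findall(".//adec:decorative[@val='0']", nsmap)
print(len(matches))  # 1
```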
The problem was a lot simpler after all! Thanks to @scanny, I was able to fix the issue and target the val='1' attribute of the adec:decorative element. The following function returns True if val=1 for that shape.
def isDecorative(shape):
    cNvPr = shape._element._nvXxPr.cNvPr
    adec_decoratives = cNvPr.xpath(".//adec:decorative[@val='1']")
    if adec_decoratives:
        return True
Here is the complete script for checking accessibility in a single specified .pptx so far (Prints out image name and slide # if image is not decorative and doesn't have alt-text):
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE
from pptx.enum.shapes import PP_PLACEHOLDER
from pptx.oxml.ns import _nsmap

_nsmap["adec"] = "http://schemas.microsoft.com/office/drawing/2017/decorative"

filePath = input("Specify PPT file path > ")
print()

def validShape(shape):
    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        return True
    elif shape.shape_type == MSO_SHAPE_TYPE.PLACEHOLDER:
        if shape.placeholder_format.type == PP_PLACEHOLDER.OBJECT:
            return True
        else:
            return False
    else:
        return False

def isDecorative(shape):
    cNvPr = shape._element._nvXxPr.cNvPr
    adec_decoratives = cNvPr.xpath(".//adec:decorative[@val='1']")
    if adec_decoratives:
        return True

# Note: references a custom @property added to shared.py and base.py
def hasAltText(shape):
    if shape.alt_text:
        return True

def checkAccessibility(prs):
    for slide in prs.slides:
        for shape in slide.shapes:
            if validShape(shape) and not isDecorative(shape) and not hasAltText(shape):
                yield shape
                slideNumber = prs.slides.index(slide) + 1
                print("Slide #: %d " % slideNumber + "\n")

for picture in checkAccessibility(Presentation(filePath)):
    print(picture.name)

Instantiating a class gives dubious results the 2nd time around while looping

Edit:
Firstly, thank you @martineau and @jonrsharpe for your prompt replies.
I was initially hesitant to write a verbose description, but I now realize that I was sacrificing clarity for brevity (thanks @jonrsharpe for the link).
So here's my attempt to describe what I am up to as succinctly as possible:
I have implemented the Lempel-Ziv-Welch text file compression algorithm in form of a python package. Here's the link to the repository.
Basically, I have a compress class in the lzw.Compress module, which takes in as input the file name(and a bunch of other jargon parameters) and generates the compressed file which is then decompressed by the decompress class within the lzw.Decompress module generating the original file.
Now what I want to do is to compress and decompress a bunch of files of various sizes stored in a directory and save and visualize graphically the time taken for compression/decompression along with the compression ratio and other metrics. For this, I am iterating over the list of the file names and passing them as parameters to instantiate the compress class and begin compression by calling the encode() method on it as follows:
import os

os.chdir('/path/to/files/to/be/compressed/')

results = dict()
results['compress_time'] = []
results['other_metrics'] = []

file_path = '/path/to/files/to/be/compressed/'
comp_path = '/path/to/store/compressed/files/'
decomp_path = '/path/to/store/decompressed/file'

files = [_ for _ in os.listdir()]
for f in files:
    from lzw.Compress import compress as comp
    from lzw.Decompress import decompress as decomp

    c = comp(file_path + f, comp_path)  # passing the input file and the output path for storing the compressed file
    c.encode()
    # Then measure time required for compression using time.monotonic()
    del c
    del comp

    d = decomp('/path/to/compressed/file', decomp_path)  # decompressing
    d.decode()
    # Then measure time required for decompression using time.monotonic()

    # append metrics to lists in the results dict for this particular file
    if decompressed_file_size != original_file_size:
        print("error")
        break
    del d
    del decomp
I have run this code independently for each file without the for loop and have achieved compression and decompression successfully. So there are no problems in the files I wish to compress.
What happens is that whenever I run this loop, the first file (the first iteration) is processed successfully; then on the next iteration, after the entire process runs for the 2nd file, "error" is printed and the loop exits. I have tried reordering the list and even reversing it (in case a particular file was the problem), but to no avail.
For the second file/iteration, the decompressed file contents are dubious(not matching the original file). Typically, the decompressed file size is nearly double that of the original.
I strongly suspect that there is something to do with the variables of the class/package retaining their state somehow among different iterations of the loop. (To counter this I am deleting both the instance and the class at the end of the loop as shown in the above snippet, but no success.)
I have also tried to import the classes outside the loop, but no success.
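That kind of state retention is easy to reproduce: module-level state survives both del and a re-import, because Python caches modules in sys.modules. A minimal sketch with a hypothetical stand-in module (lzw_demo is invented for illustration, not the poster's package):

```python
import sys
import types

# Hypothetical stand-in for a submodule like lzw.Compress that keeps
# module-level state (e.g. a global table or trie).
mod = types.ModuleType("lzw_demo")
mod.state = []

class _Compress:
    def encode(self):
        mod.state.append(1)      # mutates *module-level* state
        return len(mod.state)

mod.compress = _Compress
sys.modules["lzw_demo"] = mod

from lzw_demo import compress
c = compress()
print(c.encode())  # 1
del c
del compress

# Deleting the instance and the imported name does not touch the module:
# it stays cached in sys.modules, its state intact.
from lzw_demo import compress
c = compress()
print(c.encode())  # 2 -- the "fresh" run still sees the old state
```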
P.S.: I am a python newbie and don't have much of an expertise, so forgive me for not being "pythonic" in my exposition and raising a rather naive issue.
Update:
Thanks to @martineau, one of the problems was the importing of global variables from another submodule.
But there was another issue that crept in owing to my superficial knowledge of the 'del' operator in Python 3.
I have this trie data structure in my program which is basically just similar to a binary tree.
I had a self_destruct method to delete the tree as follows:
class trie():
    def __init__(self):
        self.next = {}
        self.value = None
        self.addr = None

    def insert(self, word=str(), addr=int()):
        node = self
        for index, letter in enumerate(word):
            if letter in node.next.keys():
                node = node.next[letter]
            else:
                node.next[letter] = trie()
                node = node.next[letter]
            if index == len(word) - 1:
                node.value = word
                node.addr = addr

    def self_destruct(self):
        node = self
        if node.next == {}:
            return
        for i in node.next.keys():
            node.next[i].self_destruct()
        del node
It turns out that this C-like recursive deletion of objects makes no sense in Python: del simply removes the name's binding from the namespace, while the real work is done by the garbage collector.
Still, it's kind of weird that Python retains the state/association of variables even when a new object is created (as shown in my loop snippet in the edit).
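A minimal sketch of that del behaviour: it removes one name binding, not the object itself.

```python
class Node:
    pass

a = Node()
b = a          # two names bound to the same object
del a          # unbinds the name 'a' only; the object survives via 'b'

try:
    a          # the *name* is gone ...
except NameError:
    print("name 'a' no longer exists")

print(isinstance(b, Node))  # True -- ... but the object was never destroyed
```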
So two things solved the problem. Firstly, I removed the global variables and made them local to the module where I need them (so no need to import them). Secondly, I deleted the self_destruct method of the trie and simply did del root, where root = trie(), after use.
Thanks @martineau & @jonrsharpe.

Python: error handling in recursive functions

Me: I am running Python 2.3.3 without the possibility to upgrade, and I don't have much experience with Python. My method for learning is googling and reading tons of Stack Overflow.
Background: I am creating a python script whose purpose is to take two directories as arguments and then perform comparisons/diff of all the files found within the two directories. The directories have sub-directories that also have to be included in the diff.
Each directory is a List and sub-directories are nested Lists and so on...
the two directories:
oldfiles/
    a_tar_ball.tar
    a_text_file.txt
    nest1/
        file_in_nest
        nest1a/
            file_in_nest
newfiles/
    a_tar_ball.tar
    a_text_file.txt
    nest1/
        file_in_nest
        nest1a/
Problem: Normally all should go fine as all files in oldfiles should exist in newfiles but in the above example one of the 'file_in_nest' is missing in 'newfiles/'.
I wish to print an error message telling me which file that is missing but when i'm using the code structure below the current instance of my 'compare' function doesn't know any other directories but the closest one. I wonder if there is a built in error handling that can send information about files and directory up in the recursion ladder adding info to it as we go. If i would just print the filename of the missing file i would not know which one of them it might be as there are two 'file_in_nest' in 'oldfiles'
def compare(file_tree):
    for counter, entry in enumerate(file_tree[0][1:]):
        if not entry in file_tree[1]:
            # raise "some" error and send information about the file back to
            # the function calling this compare, might be another compare.
        elif not isinstance(entry, basestring):
            os.chdir(entry[0])
            compare(entry)
            os.chdir('..')
        else:
            # perform comparison (not relevant to the problem)
            # detect if "some" error has been raised
            # prepend current directory found in entry[0] to file information
            break
def main():
    file_tree = [['/oldfiles', 'a_tar_ball.tar', 'a_text_file.txt',
                  ['/nest1', 'file_in_nest', ['/nest1a', 'file_in_nest']],
                  'yet_another_file'],
                 ['/newfiles', 'a_tar_ball.tar', 'a_text_file.txt',
                  ['/nest1', 'file_in_nest', ['/nest1a']],
                  'yet_another_file']]
    compare(file_tree)
    # detect if "some" error has been raised and print error message
This is my first activity on Stack Overflow other than reading, so please tell me if I should improve the question!
// Stefan
Well, it depends whether you want to report an error as an exception or as some form of status.
Let's say you want to go the 'exception' way and have the whole program crash if one file is missing, you can define your own exception saving the state from the callee to the caller:
class PathException(Exception):
    def __init__(self, path):
        self.path = path
        Exception.__init__(self)

def compare(filetree):
    old, new = filetree
    for counter, entry in enumerate(old[1:]):
        if entry not in new:
            raise PathException(entry)
        elif not isinstance(entry, basestring):
            os.chdir(entry[0])
            try:
                compare(entry)
                os.chdir("..")
            except PathException as e:
                os.chdir("..")
                raise PathException(os.path.join(entry, e.path))
        else:
            ...
Where you try a recursive call, and update any incoming exception with the information of the caller.
To see it on a smaller example, let's try to deep-compare two lists, and raise an exception if they are not equal:
class MyException(Exception):
    def __init__(self, path):
        self.path = path
        Exception.__init__(self)

def assertEq(X, Y):
    if hasattr(X, '__iter__') and hasattr(Y, '__iter__'):
        for i, (x, y) in enumerate(zip(X, Y)):
            try:
                assertEq(x, y)
            except MyException as e:
                raise MyException([i] + e.path)
    elif X != Y:
        raise MyException([])  # Empty path -> Base case
This gives us:
>>> L1 = [[[1,2,3],[4,5],[[6,7,8],[7,9]]],[3,5,[7,8]]]
>>> assertEq(L1, L1)
Nothing happens (lists are similar), and:
>>> L1 = [[[1,2,3],[4,5],[[6,7,8],[7,9]]],[3,5,[7,8]]]
>>> L2 = [[[1,2,3],[4,5],[[6,7,8],[7,5]]],[3,5,[7,8]]] # Note the [7,9] -> [7,5]
>>> try:
...     assertEq(L1, L2)
... except MyException as e:
...     print "Diff at", e.path
Diff at [0, 2, 1, 1]
>>> print L1[0][2][1][1], L2[0][2][1][1]
9 5
Which gives the full path.
As recursive lists or paths are basically the same thing, it is easy to adapt it to your use case.
Another simple way of solving this would be to report this difference in files as a simple diff, similar to the others: you can return it as a difference between the old file and the (non-existent) new file, or return both the list of differences in files and the list of differences of files, in which case it is easy to update recursively the values as they are returned by the recursive calls.
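That status-based alternative can be sketched against real directories (a hypothetical helper, independent of the nested-list representation above): each call collects the files missing from the new tree and merges its children's results into its own return value.

```python
import os

def collect_missing(old_dir, new_dir, missing=None):
    """Return old-side paths of entries present under old_dir but
    absent under new_dir, recursing into subdirectories."""
    if missing is None:
        missing = []
    for name in os.listdir(old_dir):
        old_path = os.path.join(old_dir, name)
        new_path = os.path.join(new_dir, name)
        if not os.path.exists(new_path):
            missing.append(old_path)   # full path identifies *which* copy
        elif os.path.isdir(old_path):
            collect_missing(old_path, new_path, missing)
    return missing
```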

Can I limit write access of a program to a certain directory in osx? Also set maximum size of the directory and memory allocated

I am writing code with python that might run wild and do unexpected things. These might include trying to save very large arrays to disk and trying to allocate huge amounts of memory for arrays (more than is physically available on the system).
I want to run the code in a constrained environment in Mac OSX 10.7.5 with the following rules:
The program can write files to one specific directory and no others (i.e. it cannot modify files outside this directory but it's ok to read files from outside)
The directory has a maximum "capacity" so the program cannot save gigabytes worth of data
Program can allocate only a finite amount of memory
Does anyone have any ideas on how to set up such a controlled environment?
Thanks.
import os

stats = os.stat('possibly_big_file.txt')
if (stats.st_size > TOOBIG):
    print "Oh no....."
A simple and naive solution, that can be expanded to achieve what you want:
WRITABLE_DIRECTORY = '/full/path/to/writable/directory'

class MaxSizeFile(object):
    def __init__(self, fobj, max_bytes=float('+inf')):
        self._fobj = fobj
        self._max = max_bytes
        self._cur = 0

    def write(self, data):
        # should take into account file position...
        if self._cur + len(data) > self._max:
            raise IOError('The file is too big!')
        self._fobj.write(data)
        self._cur += len(data)

    def __getattr__(self, attr):
        return getattr(self._fobj, attr)

def my_open(filename, mode='r', ..., max_size=float('+inf')):
    if '+' in mode or 'w' in mode:
        if os.path.dirname(filename) != WRITABLE_DIRECTORY:
            raise OSError('Cannot write outside the writable directory.')
    return MaxSizeFile(open(filename, mode, ...), max_size)
Then, instead of using the built-in open, you call my_open. The same can be done for the arrays: instead of allocating arrays directly, you call a function that keeps track of how much memory has been allocated and eventually raises an exception.
Obviously this gives only really light constraints, but if the program wasn't written with the goal of causing problems it should be enough.
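For hard limits enforced by the operating system rather than by cooperative wrappers, the standard library's resource module can cap the process's file size and address space on Unix-like systems, including OS X (the specific numbers below are just example values):

```python
import resource

# Lower the soft limit on file size: a write past this point delivers
# SIGXFSZ to the process (or fails with EFBIG if the signal is ignored).
# The hard limit is left unchanged so the cap can be raised again.
soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
resource.setrlimit(resource.RLIMIT_FSIZE, (100 * 1024 * 1024, hard))

# Memory works the same way via RLIMIT_AS: allocations past the soft
# limit raise MemoryError in Python. Commented out here because it
# constrains the whole current process, not just the untrusted code:
# soft, hard = resource.getrlimit(resource.RLIMIT_AS)
# resource.setrlimit(resource.RLIMIT_AS, (1024 ** 3, hard))

print(resource.getrlimit(resource.RLIMIT_FSIZE))
```

In practice you would set these limits in the child process (e.g. via the preexec_fn of subprocess.Popen) so that only the sandboxed program is constrained.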

QFileDialog returns selected file with wrong separators

I noticed that a QFileDialog instance returns absolute paths from the member function selectedFile() that use the wrong separator for the given operating system. This is not expected in a cross-platform language (Python).
What should I do to correct this, so that the rest of my properly OS-independent Python code using 'os.sep' can work correctly? I don't want to have to remember where I can and can't use it.
You use the os.path.abspath function:
>>> import os
>>> os.path.abspath('C:/foo/bar')
'C:\\foo\\bar'
The answer came from another thread ( HERE ) that mentioned I need to use QDir.toNativeSeparators()
so I did the following in my loop (which should probably be done in pyqt itself for us):
def get_files_to_add(some_directory):
    addq = QFileDialog()
    addq.setFileMode(QFileDialog.ExistingFiles)
    addq.setDirectory(some_directory)
    addq.setFilter(QDir.Files)
    addq.setAcceptMode(QFileDialog.AcceptOpen)
    new_files = list()
    if addq.exec_() == QDialog.Accepted:
        for horrible_name in addq.selectedFiles():
            ### CONVERSION HERE ###
            temp = str(QDir.toNativeSeparators(horrible_name))
            # temp is now as the os module expects it to be;
            # let's strip off the path and the extension
            no_path = temp.rsplit(os.sep, 1)[1]
            no_ext = no_path.split(".")[0]
            # ... do some magic with the file name now that its path and extension are stripped
            new_files.append(no_ext)
    else:
        # not loading anything
        pass
    return new_files
