Recursive printing of file directory in Python

I'm trying to figure out how to print out each item in a directory with proper indentation. The code I have so far is below:
import os
def traverse(pathname,d):
    'prints a given nested directory with proper indentation'
    indent = ''
    for i in range(d):
        indent = indent + ' '
    for item in os.listdir(pathname):
        try:
            newItem = os.path.join(pathname, item)
            traverse(newItem,d+1)
        except:
            print(indent + newItem)
The output that I have prints out all the files in the test directory, but does not print out the folder names. What I get is this:
>>> traverse('test',0)
test/fileA.txt
test/folder1/fileB.txt
test/folder1/fileC.txt
test/folder1/folder11/fileD.txt
test/folder2/fileD.txt
test/folder2/fileE.txt
>>>
What the output should be:
>>> traverse('test',0)
test/fileA.txt
test/folder1
test/folder1/fileB.txt
test/folder1/fileC.txt
test/folder1/folder11
test/folder1/folder11/fileD.txt
test/folder2
test/folder2/fileD.txt
test/folder2/fileE.txt
>>>
Can anyone let me know what I need to be doing with the code to get the folder names to show up? I've tried to print out the pathname, but it just repeats the folder name every time Python prints out a file name since it is in a for loop. A nudge in the right direction would be greatly appreciated!

You need to print the file name whether it's a directory or not, something like:
for item in os.listdir(pathname):
    try:
        newItem = os.path.join(pathname, item)
        print(indent + newItem)
        traverse(newItem,d+1)
    except:
        pass
Though I would rather not use an exception to detect whether it's a directory, so if os.path.isdir is allowed:
for item in os.listdir(pathname):
    newItem = os.path.join(pathname, item)
    print(indent + newItem)
    if os.path.isdir(newItem):
        traverse(newItem,d+1)
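Putting the os.path.isdir version together, a minimal sketch might look like the following. It returns the lines instead of printing them so the result is easy to inspect, and it sorts the listing to get a stable order; both choices, and the `lines` parameter, are assumptions of this sketch rather than part of the original code.

```python
import os


def traverse(pathname, depth=0, lines=None):
    """Collect every entry under pathname, indented one space per level."""
    if lines is None:
        lines = []
    indent = ' ' * depth
    for item in sorted(os.listdir(pathname)):
        new_item = os.path.join(pathname, item)
        lines.append(indent + new_item)      # record files and folders alike
        if os.path.isdir(new_item):          # only descend into directories
            traverse(new_item, depth + 1, lines)
    return lines
```

With the directory from the question, `print('\n'.join(traverse('test')))` would print each folder name on its own line before the folder's contents.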

A recursive directory printing function that does not use os.walk might look something like this:
import os
def traverse(root, depth, indent=''):
    # depth is the number of directory levels still allowed below root
    for file_ in os.listdir(root):
        leaf = os.path.join(root, file_)
        print(indent + leaf)
        if os.path.isdir(leaf) and depth:
            traverse(leaf, depth-1, indent + ' ')
I don't approve of teaching people to avoid built-ins to try to drive home a concept. There are plenty of ways to teach recursion without making someone neglect the finer aspects of the language they're learning. That's my opinion though, and apparently not one shared by many professors.
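For comparison, here is a rough sketch of what the built-in os.walk route could look like. The function name `walk_listing` and the choice to return a list rather than print are assumptions made for illustration; the depth is derived from each directory's path relative to the root.

```python
import os


def walk_listing(root):
    """Return indented paths for root and everything below it, via os.walk."""
    lines = []
    root = os.path.normpath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()   # sorting in place fixes the order os.walk descends
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        lines.append(' ' * depth + dirpath)              # the folder itself
        for name in sorted(filenames):                   # then its files
            lines.append(' ' * (depth + 1) + os.path.join(dirpath, name))
    return lines
```

Unlike the recursive versions above, this lists the root folder itself and puts each folder's files before its subfolders.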

Related

Python nested dictionary not keeping elements in order

I've been trying to create an input library for Selenium using the nested dictionary data type, and while at first it was working perfectly I am now realizing I have gotten myself into a position where I cannot be assured that my elements will stay in order (which is very necessary for this library).
Here is an example of how I am trying to structure this code:
qlib = {
    'code_xp':
        {'keywords':
            {'javascript':0,
             'web design':1,
             'python':0},
         'answer':
            {'4',
             'yes'}}
}
for prompt, info in qlib.items():
    for t, i in enumerate(list(info['answer'])):
        if t == 0:
            try:
                print(i)
            except:
                pass
If you run this yourself, you will soon realize that after a few runs it will have rearranged the list built from the answers, so the first element switches between '4' and 'yes'. Given that I depend on only referencing the first element for certain inputs ('4'), I can't allow this.
As for the 'keywords' section, I have used the structure i.e. 'javascript':0 as a necessary tag element for data processing. While this is not relevant for this problem, any solution would have to account for this. Here is my full data processing engine for those that would like to see the original context. Please note this comes before the 'for' loop listed above:
trs = 'translate(., "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")'
input_type = ['numeric', 'text', 'multipleChoice', 'multipleChoice']
element_type = ['label', 'label', 'label', 'legend']
for index, item in enumerate(input_type):
    print(f"Current input: {item}")
    form_number = driver.find_elements(By.XPATH,'//' +element_type[index]+ '[starts-with(@for, "urn:li:fs_easyApplyFormElement")][contains(@for, "' +item+ '")]')
    if form_number:
        print(item)
        for link in form_number:
            for prompt, info in qlib.items():
                keywords = info['keywords']
                full_path = []
                for i, word in enumerate(keywords):
                    path = 'contains(' +trs+ ', "' +word+ '")'
                    if i < len(keywords) - 1:
                        if keywords[word] == 1:
                            path += " and"
                        elif keywords[word] == 0:
                            path += " or"
                    full_path.append(path)
                full_string = ' '.join(full_path)
                answer = ' '.join(info['answer'])
I've been trying to find the right datatype for this code for a while now, and while this almost works perfectly the problem I'm facing makes it unusable. I've considered an OrderedDict as well, however I am not confident I can keep the structures that I depend on. Looking for anything that will work. Thank you so much for any help!
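One thing worth checking here: the braces around '4' and 'yes' create a set, and a set has no defined order, which would explain the reshuffling between runs. Dictionaries, on the other hand, keep insertion order as of Python 3.7, so the 'keywords' structure can stay as it is. A minimal sketch, assuming the only change needed is storing the answers in a list (`first_answers` is a name made up for this example):

```python
# 'answer' as a list keeps its order; a set literal {'4', 'yes'} does not.
qlib = {
    'code_xp': {
        'keywords': {'javascript': 0, 'web design': 1, 'python': 0},
        'answer': ['4', 'yes'],
    },
}

first_answers = []
for prompt, info in qlib.items():
    first_answers.append(info['answer'][0])    # always the first listed answer
    print(prompt, info['answer'][0], ' '.join(info['answer']))
```

The `' '.join(info['answer'])` line from the processing loop works the same way on a list, so that part should not need to change.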

Is there a memory-efficient way to use a tuple to iterate over very large os.scandir() objects?

EDITED TO FOCUS ON FILENAME (PARTIAL) MATCHING:
I am working with approximately 1.8 million files in one directory on a remote server. I am using os.scandir() in python3 to generate the full list of files in that directory, then checking each file name against an existing tuple, and, if there is a match, copying that file to a separate directory (also on the remote server).
The tuple I am using to check for the proper filenames is ~100,000 items long. Further, each item in the tuple is only a partial match for the actual filename -- for example, a tuple item might be '2019007432' and I want it to match a filename such as '2019007432_longunpredictablefilename.doc'. So I've used .startswith when searching filenames, rather than looking for exact matches.
I have successfully been able to run this code one time, but the script slows down progressively as it goes on, maxing out my computer's RAM -- and it took about 24 hours to run. As I will be adding to the 1.8 million files in the future, and I may have additional (longer) tuples with which to find and copy files, I'm looking for ways to streamline the code so it will run more efficiently. Does anyone have any suggestions about how to make this work any better?
import os
import shutil
from variables import file_tuple
srcpath = 'path/to/source/directory'
destpath = 'path/to/destination/directory'
counter = 1
copy_counter = 1
error_list = []
all_files = os.scandir(srcpath)
for file in all_files:
    try:
        if file.name.startswith(file_tuple):
            shutil.copy(srcpath + '/' + file.name, destpath)
            print('copied ' + str(counter) + ' -- ' + str(copy_counter))
            copy_counter +=1
        else:
            if counter % 5000 == 0:
                percent = "{0:.0%}".format(counter/1860000)
                print(str(counter) + ' -- ' + str(percent))
    except Exception as e:
        print(e)
        error_list.append(file.name)
    counter +=1
print(error_list)
So now let's talk about the algorithm. In my opinion, one of the best ideas is to shrink the list of files you have to check. Try to find a pattern shared by the names in the tuple: for example, they all start with a digit, all end with a digit, contain only digits, or fall within a specific length range. After you subset the files by that pattern, you search a much smaller list. It will still be O(N^2), although it might be significantly more efficient. The cost is one additional loop across all files looking for the pattern.
In case this ends up being useful to others: following the advice from @juanpa.arrivillaga, I converted my tuple into a set and altered my code to split the filenames generated by scandir so that they would be an exact match with the items in the set. This ran in about 6 hours (compared to 24+ hours the original way). See code below.
I haven't yet tried @polkas's suggestion, which is to break down the tuple into smaller chunks and run them separately, but I suspect it will be very useful a) for unstable internet connections, to allow me to run only sections at a time without losing my place when the internet drops, and/or b) when the filenames cannot be easily split at a known character.
# continues the script above (os, shutil, the paths and the counters are defined there)
import sys
file_set = set(file_tuple)  # membership tests on a set are hash lookups, not scans
files = os.scandir(srcpath)
for file in files:
    try:
        UID = file.name.split("_")[0]
        if UID in file_set:
            shutil.copy(srcpath + '/' + file.name, destpath)
            print('copied ' + str(counter) + ' -- ' + str(copy_counter))
            copy_counter +=1
        else:
            if counter % 5000 == 0:
                percent = "{0:.0%}".format(counter/1860000)
                print(str(counter) + ' -- ' + str(percent))
    except Exception as e:
        print('Error on line {}'.format(sys.exc_info()[-1].tb_lineno), type(e).__name__, e)
        error_list.append(file.name)
    counter +=1
print(error_list)
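Stripped of the copying and progress reporting, the set-based matching above comes down to something like this sketch. The helper name `select_matching` and the second file name are made up for illustration; the idea is only that each name is split once and looked up in a set, instead of being compared against every prefix in the tuple.

```python
def select_matching(names, ids, sep='_'):
    """Return the names whose text before the first sep is in ids."""
    id_set = set(ids)                  # average O(1) membership tests
    return [name for name in names if name.split(sep, 1)[0] in id_set]


names = ['2019007432_longunpredictablefilename.doc', '2018000001_other.doc']
print(select_matching(names, ('2019007432',)))
```

This only works when the part to match ends at a known separator, which is the limitation mentioned above.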

How do I use list comprehension to simplify the sorting and adding of multiple lists and strings?

The loop below will take a list of directories and/or images, and it will return a list of images, including those images in the supplied directories. For each directory that it is given, it verifies that each file is indeed a valid image, before adding that image to the list.
import os
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("-style_image", help="List of images and/or directories")
params = parser.parse_args()
style_image_input = params.style_image.split(',')
valid_ext, style_input_sorted = [".jpg",".png"], None
for image in style_image_input:
    if os.path.isdir(image):
        for file in os.listdir(image):
            print(file)
            ext = os.path.splitext(file)[1]
            if ext.lower() not in valid_ext:
                continue
            if style_input_sorted == None:
                style_input_sorted = file
            else:
                style_input_sorted += "," + file
    else:
        if style_input_sorted == None:
            style_input_sorted = image
        else:
            style_input_sorted += "," + image
style_image_list = style_input_sorted.split(',')
print(style_image_list)
How can I use list comprehension to simplify this loop?
Forget about "how can I turn this into a list comprehension". Start with "how can I simplify this". If, in the end, you get down to a loop with one or two clauses and a simple expression, then you can consider turning that into a list comprehension, but that's the final step, not the starting goal.
Most of your repetition is in the way you're building up style_input_sorted:
Set it to None.
Every time you get a value, if it's None, set it to the value
Otherwise, add a comma and then add the value.
Instead of starting with None, you could start with "", and then do this:
if style_input_sorted:
    style_input_sorted += ","
style_input_sorted += file
But, even more simply: what you're doing is the same thing str.join already knows how to do. If you can build up a list of strings, and then just join that list at the end, it'll be a lot simpler:
style_input_sorted = []
if …
    for …
        style_input_sorted.append(file)
else …
    style_input_sorted.append(image)
style_input_sorted = ",".join(style_input_sorted)
But it looks like the only thing you ever do with style_input_sorted is to split it back out into a list anyway. So why even join a string just to split it?
style_input_list = []
if …
    for …
        style_input_list.append(file)
else …
    style_input_list.append(image)
There are some other simplifications you can make, but this is the biggest one, and it'll open the door for the next ones. For example, now that you're only doing one trivial thing instead of four lines of code for valid extensions, you can probably get rid of the continue:
if os.path.isdir(image):
    for file in os.listdir(image):
        ext = os.path.splitext(file)[1]
        if ext.lower() in valid_ext:
            style_input_list.append(file)
else:
    style_input_list.append(image)
And now we have a piece we could turn into a list comprehension—although I'd use a generator expression:
if os.path.isdir(image):
    images = (file for file in os.listdir(image)
              if os.path.splitext(file)[1].lower() in valid_ext)
    style_input_list.extend(images)
else:
    style_input_list.append(image)
But to turn the whole thing into a list comprehension will be horribly ugly. You can turn that into an expression using ternary if and flatten the whole thing out at the end, but if you have five lines of code, that doesn't belong inside a comprehension. (Of course you could factor those five lines out into a function and then wrap a call to that function into a comprehension, which might be worth doing.)
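The "factor it out into a function" option could look roughly like this. The names `expand`, `collect` and `VALID_EXT` are invented for the sketch, and the directory listing is sorted only to make the result predictable. As in the code above, files found inside a directory are kept as bare names, not joined with the directory path.

```python
import os

VALID_EXT = {".jpg", ".png"}


def expand(image):
    """Return [image] for a plain path, or the valid image names in a directory."""
    if os.path.isdir(image):
        return [file for file in sorted(os.listdir(image))
                if os.path.splitext(file)[1].lower() in VALID_EXT]
    return [image]


def collect(style_image_input):
    """Flatten the per-item lists into a single list of images."""
    return [file for image in style_image_input for file in expand(image)]
```

With this, the whole loop becomes `style_image_list = collect(style_image_input)`, and the comprehension stays short because the branching lives in `expand`.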

python worm how to make it more complex?

Please be kind, this is my second post and I hope you all like it.
Here I have made a program that makes directories inside directories,
but the problem is that I would like a way to make it self-replicate.
Any ideas and help are greatly appreciated.
Before:
user/scripts
After:
user/scripts/worm1/worm2/worm3
The script is as follows:
import os, sys, string, random
worms_made = 0
stop = 20
patha = ''
pathb = '/'
pathc = ''
def fileworm(worms_made, stop, patha, pathb, pathc):
    # random 8-character folder name
    filename = (''.join(random.choice(string.ascii_lowercase
        +string.ascii_uppercase + string.digits) for i in range(8)))
    pathc = patha + filename + pathb
    worms_made = worms_made + 1
    os.system("mkdir %s" % filename)
    os.chdir(pathc)
    print("Worms made: %r" % worms_made)
    if worms_made == stop:
        print("*Done")
        sys.exit(0)
    elif worms_made != stop:
        pass
    fileworm(worms_made, stop, patha, pathb, pathc)
fileworm(worms_made, stop, patha, pathb, pathc)
To create a variable depth, you could do something like this:
import os
depth = 3
worms = ['worm{}'.format(x) for x in range(1, depth+1)]
path = os.path.join(r'/user/scripts', *worms)
os.makedirs(path)
As mentioned, os.makedirs() will create all the required folders in one call. You just need to build the full path.
Python has a function to help with creating paths called os.path.join(). This makes sure the correct / or \ is automatically added for the current operating system between each part.
worms is a list containing ["worm1", "worm2", "worm3"]; it is created using a Python feature called a list comprehension. The list is passed to the os.path.join() function using *, meaning that each element of the list is passed as a separate argument.
I suggest you try adding print(worms) or print(path) to see how it works.
The result is that a string like the following is passed to the function to create your folder structure:
/user/scripts/worm1/worm2/worm3
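To try this without touching a real folder, here is a small sketch that wraps the same idea in a function and runs it inside a temporary directory. The function name `make_worms` and the use of tempfile are assumptions of the example; the temporary directory simply stands in for /user/scripts.

```python
import os
import tempfile


def make_worms(base, depth):
    """Create base/worm1/.../worm<depth> in one call and return the full path."""
    worms = ['worm{}'.format(x) for x in range(1, depth + 1)]
    path = os.path.join(base, *worms)
    os.makedirs(path)          # creates every missing intermediate folder
    return path


base = tempfile.mkdtemp()      # scratch directory standing in for /user/scripts
print(make_worms(base, 3))
```

Note that os.makedirs() raises an error if the final folder already exists, unless exist_ok=True is passed (Python 3.2 and later).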

how to skip the rest of a sequence

I have a couple of functions that are being called recursively inside nested loops. The ultimate objective of my program is to:
a) loop through each year,
b) within each year, loop through each month (12 total),
c) within each month, loop through each day (using a self generated day counter),
d) and read 2 files and merge them together into another file.
In each instance, I am going down into the directory only if it exists. Otherwise, I just skip it and go to the next one. My code does a pretty good job when all the files are present, but when one of the files is missing, I would like to simply skip the whole process of creating a merged file and continue the loops. The problem I am getting is a syntax error that states that continue is not properly in the loop. I am only getting this error in the function definitions, and not outside of them.
Can someone explain why I'm getting this error?
import os, calendar
file01 = 'myfile1.txt'
file02 = 'myfile2.txt'
output = 'mybigfile.txt'
def main():
    #ROOT DIRECTORY
    top_path = r'C:\directory'
    processTop(top_path)
def processTop(path):
    year_list = ['2013', '2014', '2015']
    for year in year_list:
        year_path = os.path.join(path, year)
        if not os.path.isdir(year_path):
            continue
        else:
            for month in range(1, 13):
                month_path = os.path.join(year_path, month)
                if not os.path.isdir(month_path):
                    continue
                else:
                    numDaysInMth = calendar.monthrange(int(year), month)[1]
                    for day in range(1, numDaysInMth+1):
                        processDay(day, month_path)
    print('Done!')
def processDay(day, path):
    day_path = os.path.join(path, day)
    if not os.path.isdir(day_path):
        continue
    else:
        createDailyFile(day_path, output)
def createDailyFile(path, dailyFile):
    data01 = openFile(file01, path)
    data02 = openFile(file02, path)
    if len(data01) == 0 or len(data02) == 0:
        # either file is missing
        continue
    else:
        # merge the two datalists into a single list
        # create a file with the merged list
        pass
def openFile(filename, path):
    # return a list of contents of filename
    # returns an empty list if file is missing
    pass
if __name__ == "__main__": main()
You can use continue only directly inside a loop (otherwise, what guarantee do you have that the function was called from a loop in the first place?). If you need stack unwinding, consider using exceptions (Python exception handling).
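If you go the exception route, a minimal sketch might look like this. The exception class `MissingInput` and the simplified function signatures are made up for the example, and the list concatenation is just a stand-in for the real merge.

```python
class MissingInput(Exception):
    """Raised when a day has nothing to merge (hypothetical name)."""


def create_daily_file(data01, data02):
    if not data01 or not data02:
        raise MissingInput('either file is missing')
    return data01 + data02          # stand-in for the real merge


def process_days(days):
    merged = []
    for data01, data02 in days:
        try:
            merged.append(create_daily_file(data01, data02))
        except MissingInput:
            continue                # legal here: we are inside the for loop
    return merged


print(process_days([(['a'], ['b']), ([], ['c'])]))
```

The function signals the problem by raising, and the loop that called it decides to skip to the next item.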
I think you can get away with having your functions return a value that would say if operation was completed successfully:
def processDay(day, path):
    do_some_job()
    if should_continue:
        return False
    return True
And then in your main code simply say
if not processDay(day, path):
    continue
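Filled in a bit more, the return-value idea could look like the sketch below. The names `process_day` and `process_month` and the `handled` list are assumptions for illustration, and the merge step is left as a comment. The day number is converted with str() because os.path.join() only accepts path strings.

```python
import os


def process_day(day, path):
    """Return True if the day's folder existed and was handled."""
    day_path = os.path.join(path, str(day))
    if not os.path.isdir(day_path):
        return False                # leave the function instead of `continue`
    # ... merge the two files for this day here ...
    return True


def process_month(month_path, num_days):
    handled = []
    for day in range(1, num_days + 1):
        if not process_day(day, month_path):
            continue                # skip days whose folder is missing
        handled.append(day)
    return handled
```

The continue now lives in the loop, and the function only reports whether there was anything to do.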
You are probably getting that error in processDay and createDailyFile, right? That's because there is no loop in these functions, and yet you use continue. I'd recommend using return or pass in them.
The continue statement only applies in loops, as the error message implies. If your functions are structured as you show, you can just use pass.
continue can only appear in a loop, since it tells Python to skip the remaining lines of the loop body and go to the next iteration. Hence, this syntax is not valid:
def processDay(day, path):
    day_path = os.path.join(path, day)
    if not os.path.isdir(day_path):
        continue # <============ this continue is not inside a loop !
    else:
        createDailyFile(day_path, output)
Same for your createDailyFile function.
You may want to replace it with a return.
