Program to print directory structure recursively in Python not working

I have this directory structure:
test1
    file1.txt
    test2
        file2.txt
        test3
            file3.txt
            test4
                file4.txt
My current code to print these directory levels is as follows:
import os

def printRootStructure(dirname, indent=0):
    for i in range(indent):
        print " ",
    print dirname
    if os.path.isdir(dirname):
        for files in os.listdir(dirname):
            printRootStructure(files, indent+1)

printRootStructure("test")
It currently prints:
test
    file1.txt
    test1
It is not proceeding to the next level. Any help troubleshooting?

Unless you have a specific reason to use recursion, it's simpler to use os.walk to traverse a directory structure.
import os
import os.path as P

for topdir, subdirs, files in os.walk(starting_point):
    print " " * topdir.count(P.sep), P.basename(topdir)
    for f in sorted(files):
        print " " * (topdir.count(P.sep) + 1), f

I think you can fix this by passing the full path name into printRootStructure:
import os

def printRootStructure(dirname, indent=0):
    for i in range(indent):
        print " ",
    print os.path.basename(dirname)  # changed
    if os.path.isdir(dirname):
        for files in os.listdir(dirname):
            printRootStructure(os.path.join(dirname, files), indent+1)  # changed
As it was in your original code, you were passing just the last part (this is called the "basename") of each file into printRootStructure when you made the recursive calls.
Working directory and path names
Any time you start up a program on a modern computer, your program runs in a fixed location in the filesystem (this is called the "current working directory"). If you invoke a program from the command-line, the current working directory is simply the path where you invoked the program's name. If you invoke a program by clicking something in a GUI environment, it can get more complicated, but the general behavior is the same: your program always runs in a specific working directory.
All path tests, and in particular os.path.isdir, are evaluated with respect to that working directory. So when you make your first recursive call in your example, you are testing os.path.isdir("test1"), which doesn't exist in the working directory -- it only exists inside "test" !
The fix is to pass the full path name into your recursive calls. Then, because the full name might be too verbose when you print out the tree, I added a call to os.path.basename to print just the basename portion of each file.
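In Python 3 syntax, the fixed function could be sketched as follows (same logic: join the full path before recursing, print only the basename; the snake_case name is just a renaming for this sketch):

```python
import os

def print_root_structure(dirname, indent=0):
    # Print this entry, indented two spaces per level.
    print('  ' * indent + os.path.basename(dirname))
    if os.path.isdir(dirname):
        for name in sorted(os.listdir(dirname)):
            # Pass the full path so os.path.isdir() works from any cwd.
            print_root_structure(os.path.join(dirname, name), indent + 1)
```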

Related

shutil.move() creates duplicates and fails on subsequent calls

I finished writing a script which creates some files so I'm making a tidy() function which sorts these files in folders. The end result should look like this:
/Scripting
    - Output
    - script.py
/Scripting/Output
    - Folder1
    - Folder2
    - Folder3
Each folder contains the necessary files.
I managed to create the list of folders and get the files into them without any problem, so I now have in /Project: script.py, folder1, folder2, etc... I copy-pasted most of the code from the first part in order to move them into the Output folder. The following code is executed while every subfolder, containing its respective files, is located in the same directory as the script.
try:
    os.mkdir('output')
except FileExistsError:
    pass

for file in os.listdir():
    if '.' not in file and file != 'output':
        shutil.move(file, f'{os.getcwd()}/output/{file}')
The problem is that if I look into my folder after running, I find the following directory tree:
/Output
    - Folder1
        - Folder1
            - File1
            - File2
I get a duplicate folder within that folder and I don't understand where it's coming from. If I try to call the script again, I get the error: shutil.Error: Destination path 'Scripting/output/folder1/folder1' already exists
What am I doing wrong?
Edit:
Here's the new code:
try:
    os.mkdir('output')
except FileExistsError:
    pass

obj = os.scandir()
cwd = os.getcwd()
for entry in obj:
    if not entry.is_dir() or entry.name.startswith('.'):
        continue
    shutil.move(entry.name, f'{cwd}/output/{entry.name}')
This works the first time I run it, but breaks with the same error as above if I keep calling the script. It creates folder1 within folder1 only on subsequent calls, and I can't find a reason for it.
Found the answer mostly by trial and error. I initially chose shutil.move() because it replaces a file if it finds another one with the same name. However, it does not do this with directories: it will instead move the source into that path. With /Scripting/Output/Folder1/ as the destination path for Folder1, the second run of the script does not replace the folder; it simply adds it into its own path, which then becomes /Scripting/Output/Folder1/Folder1/, while still adding the files to the initial path (it looks like shutil.move() runs on everything within that path).
To fix this, use obj = os.scandir() with entry.is_dir() and entry.name to parse your folders, and either os.rmdir() the extra folder every time, or create the folders before adding the files. This is the code that worked for me:
cwd = os.getcwd()
try:
    os.mkdir('output')
except FileExistsError:
    pass

os.chdir('output')
for name in folder_names:
    try:
        os.mkdir(name)
    except FileExistsError:
        pass
os.chdir('..')

obj = os.scandir()
cwd = os.getcwd()
for f in obj:
    if f.is_file():
        if True:  # depends on how your files are organized
            shutil.move(f.name, f'{cwd}/output/folder1/{f.name}')
            # Do this for every file
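The nesting behavior described above can be reproduced in a minimal, self-contained sketch (hypothetical folder names, created in a temporary directory):

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, 'folder1'))
# Simulate a previous run: the destination folder already exists.
os.makedirs(os.path.join(base, 'output', 'folder1'))

# Because 'output/folder1' is an existing directory, shutil.move()
# moves 'folder1' *inside* it instead of replacing it.
shutil.move(os.path.join(base, 'folder1'),
            os.path.join(base, 'output', 'folder1'))

nested = os.path.join(base, 'output', 'folder1', 'folder1')
print(os.path.isdir(nested))  # the duplicate folder from the question
shutil.rmtree(base)
```

Running this a "second time" (with the nested copy still present) is exactly when shutil.move() raises the "Destination path ... already exists" error.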

Finding the "root" of a directory

I am attempting to write a function in python that scans the contents of a directory at the script's level (once de-bugged I'll switch it to not needing to be at the same level but for this problem it's irrelevant) and recursively lists the paths to anything that is not a directory. The logic I am working with is:
If the parent "directory" is not a directory then it must be a file so print the path to it. Otherwise, for every "file" in that directory, if each "file" is not actually a directory, state the path to the file, and if the "file" is actually a directory, call the function again.
The environment I am using is as follows: I have the script at the same level as a directory named a, and inside a is a file d.txt, as well as another directory named b. Inside b is a file c.txt. Per the way I would like this function to execute, first it should recognize that a is in fact a directory, and therefore begin to iterate over its contents. When it encounters d.txt, it should print out the path to it, and then when it encounters directory b it should begin to iterate over its contents and thereby print the path to c.txt when it sees it. So in this example, the output of the script should be "C:\private\a\d.txt, C:\private\a\b\c.txt" but instead it is "C:\private\d.txt, C:\private\b". Here is the code thus far:
import os

def find_root(directory):
    if not os.path.isdir(directory):
        print(os.path.abspath(directory))
    else:
        for file in os.listdir(directory):
            if not os.path.isdir(file):
                print(os.path.abspath(file))
            else:
                find_root(file)

find_root('a')
[Python]: os.listdir(path='.'):
    Return a list containing the names of the entries in the directory given by path.
But they are just basenames. So, in order for them to make sense when you go a level deeper in the recursion, either:
- Prepend the "current" folder to their name
- cd to each folder (and also cd back when returning from the recursion)
Here's your code modified to use the 1st approach:
import os

def find_root(path):
    if os.path.isdir(path):
        for item in os.listdir(path):
            full_item = os.path.join(path, item)
            if os.path.isdir(full_item):
                find_root(full_item)
            else:
                print(os.path.abspath(full_item))
    else:
        print(os.path.abspath(path))

if __name__ == "__main__":
    find_root("a")
Notes:
- I recreated your folder structure
- I renamed some of the variables for clarity
- I reversed the negated conditions
Output:
c:\Work\Dev\StackOverflow\q47193260>"c:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" a.py
c:\Work\Dev\StackOverflow\q47193260\a\b\c.txt
c:\Work\Dev\StackOverflow\q47193260\a\d.txt
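For completeness, the 2nd approach (cd into each folder, and cd back when returning from the recursion) could be sketched like this; find_files is a hypothetical name, not code from the answer above:

```python
import os

def find_files(path):
    # Plain file: print its absolute path and stop.
    if not os.path.isdir(path):
        print(os.path.abspath(path))
        return
    prev = os.getcwd()
    os.chdir(path)
    try:
        # Inside the folder, the bare names from listdir() now resolve
        # correctly relative to the new working directory.
        for item in sorted(os.listdir('.')):
            find_files(item)
    finally:
        os.chdir(prev)  # cd back when returning from the recursion
```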

How to change directories within Python?

I have the following code. It works for the first directory but not the second one...
What I am trying to do is count the lines in each of the files in two different directories.
import csv
import copy
import os
import sys
import glob

os.chdir('Deployment/Work/test1/src')
names = {}
for fn in glob.glob('*.c'):
    with open(fn) as f:
        names[fn] = sum(1 for line in f
                        if line.strip()
                        and not line.startswith('/')
                        and not line.startswith('#')
                        and not line.startswith('/*')
                        and not line.startswith(' *'))
print("Lines test 1 ", names)
test1 = names

os.chdir('Deployment/Work/test2/src')
names = {}
for fn in glob.glob('*.c'):
    with open(fn) as f:
        names[fn] = sum(1 for line in f
                        if line.strip()
                        and not line.startswith('/')
                        and not line.startswith('#')
                        and not line.startswith('/*')
                        and not line.startswith(' *'))
print("Lines test 2 ", names)
test2 = names

print("Lines ", test1 + test2)
Traceback:
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'Deployment/Work/test2/src'
You'll either have to return to the root directory using as many .. as required, store the root directory, or specify a full path from your home directory:
curr_path = os.getcwd()
os.chdir('Deployment/Work/test1/src')
...
os.chdir(curr_path)
os.chdir('Deployment/Work/test2/src')
Or:
os.chdir('Deployment/Work/test1/src')
os.chdir('../../../../Deployment/Work/test2/src')  # Not advisable
Instead of the above, you may consider more Pythonic ways to change directories on the fly, like using a context manager for directories:
import contextlib
import glob
import os

@contextlib.contextmanager
def working_directory(path):
    prev_cwd = os.getcwd()
    os.chdir(path)
    yield
    os.chdir(prev_cwd)

with working_directory('Deployment/Work/test1/src'):
    names = {}
    for fn in glob.glob('*.c'):
        ...

with working_directory('Deployment/Work/test2/src'):
    names = {}
    for fn in glob.glob('*.c'):
        ...
You simply specify the relative directory from the current directory, and then run your code in the context of that directory.
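One caveat: if the body of the with block raises, the final os.chdir(prev_cwd) is never reached. A slightly more defensive sketch of the same context manager wraps the yield in try/finally:

```python
import contextlib
import os

@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it on exit."""
    prev_cwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(prev_cwd)  # restored even if the body raised
```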
Your os.chdir is interpreted relative to the current working directory. Your first os.chdir changes the working directory. The system tries to find the second path relative to the first path.
There are several ways to solve this. You can keep track of the current directory and change back to it. Else make the second os.chdir relative to the first directory (e.g. os.chdir('../../test2/src')); this is slightly ugly. A third option is to make all paths absolute instead of relative.
I suppose the script is not working because you are trying to change the directory using a relative path. This means that when you execute the first os.chdir, you change your working directory from the current one to 'Deployment/Work/test1/src', and when you call os.chdir the second time, the function tries to change the working directory to 'Deployment/Work/test1/src/Deployment/Work/test2/src', which I suppose is not what you want.
To solve this you can either use an absolute path:
os.chdir('/Deployment/Work/test1/src')
or before the first os.chdir you could keep track of your current folder:
current = os.getcwd()
os.chdir('Deployment/Work/test1/src')
...
os.chdir(current)
os.chdir('Deployment/Work/test2/src')

Finding files in directories in Python

I've been doing some scripting where I need to access the OS to name images (saving every subsequent zoom of the Mandelbrot set upon clicking). I count all of the current files in the directory, use %s to name them in the string after calling the function below, and then add an option to delete them all.
I realize the code below will always grab the absolute path of the file, but assuming we're always in the same directory, is there not a simplified version to grab the current working directory?
def count_files(self):
    count = 0
    for files in os.listdir(os.path.abspath(__file__)):
        if files.endswith(someext):
            count += 1
    return count

def delete_files(self):
    for files in os.listdir(os.path.abspath(__file__)):
        if files.endswith(someext):
            os.remove(files)
Since you're doing the .endswith thing, I think the glob module might be of some interest.
The following prints all files in the current working directory with the extension .py. Not only that, it returns only the filename, not the path, as you said you wanted:
import glob
for fn in glob.glob('*.py'): print(fn)
Output:
temp1.py
temp2.py
temp3.py
_clean.py
Edit: re-reading your question, I'm unsure of what you were really asking. If you wanted an easier way to get the current working directory than
os.path.abspath(__file__)
Then yes, os.getcwd()
But os.getcwd() will change if you change the working directory in your script (e.g. via os.chdir()), whereas your method will not.
Using antipathy* it gets a little easier:
from antipathy import Path

def count_files(pattern):
    return len(Path(__file__).glob(pattern))

def delete_files(pattern):
    Path(__file__).unlink(pattern)
*Disclosure: I'm the author of antipathy.
You can use os.path.dirname(path) to get the parent directory of the thing path points to.
def count_files(self):
    count = 0
    for files in os.listdir(os.path.dirname(os.path.abspath(__file__))):
        if files.endswith(someext):
            count += 1
    return count
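Putting the suggestions together, a directory-parameterized sketch (the extension argument and function signatures here are illustrative assumptions, not the question's exact API):

```python
import glob
import os

def count_files(directory, ext):
    # Count the files in `directory` whose names end with `ext`.
    return len(glob.glob(os.path.join(directory, '*' + ext)))

def delete_files(directory, ext):
    # Remove every matching file in `directory`.
    for path in glob.glob(os.path.join(directory, '*' + ext)):
        os.remove(path)
```

For the script's own directory, pass os.path.dirname(os.path.abspath(__file__)); for the current working directory, pass os.getcwd().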

Find files that have been changed recursively

I am attempting to write a simple script to recursively rip through a directory and check if any of the files have been changed. I only have the traversal so far:
import fnmatch
import os
from optparse import OptionParser

rootPath = os.getcwd()
pattern = '*.js'

for root, dirs, files in os.walk(rootPath):
    for filename in files:
        print(os.path.join(root, filename))
I have two issues:
1. How do I tell if a file has been modified?
2. How can I check if a directory has been modified? I need to do this because the folder I wish to traverse is huge; if I could check whether a dir has been modified and skip recursing through an unchanged dir, this would greatly help.
Thanks!
If you are comparing two files between two folders, you can use os.path.getmtime() on both files and compare the results. If they're the same, they haven't been modified. Note that this will work on both files and folders.
The typical fast way to tell if a file has been modified is to use os.path.getmtime(path) (assuming a Linux or similar environment). This will give you the modification timestamp, which you can compare to a stored timestamp to determine if a file has been modified.
getmtime() works on directories too, but it will only tell you whether a file has been added, removed or renamed in the directory; it will not tell you whether a file has been modified inside the directory.
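The mtime comparison can be demonstrated deterministically by bumping a file's timestamp with os.utime (a stand-in for a real edit, so the sketch doesn't depend on filesystem timestamp resolution):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

stored = os.path.getmtime(path)  # the timestamp you would persist somewhere

# Simulate a later modification by explicitly advancing the mtime.
os.utime(path, (stored + 10, stored + 10))

modified = os.path.getmtime(path) != stored
print(modified)  # the file now reports a newer modification time
os.remove(path)
```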
This is my own implementation of what you might be looking for. Note that, besides timestamps, you might want to track files that have been added or deleted too (like I do). If not, you can just change the code on the line:
if now == before:
Here is the code:
# check if any txt file in folder "wd" has been modified (rewritten, added or deleted)
def src_dir_modified(wd):
    now = []
    global before
    all_files = glob.glob(os.path.join(wd, '*.txt'))
    for infile in all_files:
        now.append([infile, os.stat(infile).st_mtime])
    if now == before:  # compare files and their time stamps
        return False
    else:
        before = now
        print 'Source code has been modified.'
        return True
If you can admit the use of a command-line tool, you could use rsync instead of re-inventing the wheel. rsync uses file modification time and file size to decide if a file has been changed or not.
rsync --verbose --recursive --dry-run dir1 dir2 should get the differences between files in dir1 and dir2. You can write the output to a log file to act on it.
