Traversing Directories - python

I am trying to write a python program that takes an input directory, and prints out all the .txt files that are in the directory. However, if there is another folder inside that one, it must do the same thing using recursion.
My problem is that it only prints the .txt files and does not traverse further into the directory.
import os
path = input("What directory would you like to search?: ")
def traverse(path):
    files = os.listdir(path)
    for i in files:
        if os.path.isdir(i) == True:
            traverse(i)
        elif i.endswith('.txt'):
            print(i)
traverse(path)
What is the problem?

Your code fails because the check os.path.isdir(i) comes back False for the sub-directories, unless path happens to be the current working directory. os.listdir() returns bare entry names rather than full paths, so os.path.isdir(i) looks for each name relative to the current working directory instead of inside path.
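To see the difference concretely, here is a minimal sketch. The helper name check_entries and the temporary directory layout are made up for illustration; they are not part of the original code.

```python
import os
import tempfile


def check_entries(path):
    """Map each entry name in path to (isdir on bare name, isdir on joined path)."""
    results = {}
    for name in os.listdir(path):
        bare = os.path.isdir(name)                         # resolved against the current working directory
        joined = os.path.isdir(os.path.join(path, name))   # resolved against `path`
        results[name] = (bare, joined)
    return results


# Hypothetical layout: <tmp>/demo_subdir_for_isdir/ (a directory) and <tmp>/notes.txt (a file)
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "demo_subdir_for_isdir"))
    open(os.path.join(tmp, "notes.txt"), "w").close()
    print(check_entries(tmp))
```

Unless your working directory happens to contain a directory with the same name, the bare check reports False for the sub-directory while the joined check reports True.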
If you want to do it using the recursion method you gave, your code can be changed as follows:
import os
path = input("What directory would you like to search?: ")
def traverse(path):
    files = os.listdir(path)
    # turn every bare entry name into an absolute path
    files = (os.path.join(os.path.abspath(path), file) for file in files)
    for i in files:
        if os.path.isdir(i):
            traverse(i)
        elif i.endswith('.txt'):
            print(i)
traverse(path)
Here is a better way to do it using fnmatch (adapted to suit the rest of your code from "Use a Glob() to find files recursively in Python?"). It recursively searches all files in the supplied directory and matches those that end with .txt:
import fnmatch
import os
path = input("What directory would you like to search?: ")
def traverse(path):
    matches = []
    for root, dirnames, filenames in os.walk(path):
        for filename in fnmatch.filter(filenames, '*.txt'):
            matches.append(os.path.join(root, filename))
    print(matches)
traverse(path)

You are missing the full path; otherwise your code is fine. See below:
def traverse(path):
    files = os.listdir(path)
    for i in files:
        if os.path.isdir(os.path.join(path,i)):
            traverse(os.path.join(path,i))
        elif i.endswith('.txt'):
            print(os.path.join(path,i))
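
If you don't need to write the recursion yourself, pathlib can do the same job. This is only a sketch of an alternative, not something from the answers above; the function name find_txt_files is made up for illustration.

```python
from pathlib import Path


def find_txt_files(directory):
    """Return a sorted list of all .txt files under directory, at any depth."""
    # rglob("*.txt") matches the pattern in the directory itself and in every sub-directory
    return sorted(p for p in Path(directory).rglob("*.txt") if p.is_file())
```

You could call it as find_txt_files(input("What directory would you like to search?: ")) and print each returned path.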

Related

python: collect files with one extension from all sub-dirs

I am trying to collect all files from all sub-directories and move them to another directory.
Code used
#collects all mp3 files from folders to a new folder
import os
from pathlib import Path
import shutil
#run once
path = os.getcwd()
os.mkdir("empetrishki")
empetrishki = path + "/empetrishki" #destination dir
print(path)
print(empetrishki)
#recursive collection
for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
    for name in files:
        filePath = Path(name)
        if filePath.suffix.lower() == ".mp3":
            print(filePath)
            os.path.join
            filePath.rename(empetrishki.joinpath(filePath))
I have trouble with the last line, which moves the files: neither filePath.rename() nor shutil.move nor joinpath() has worked for me. Maybe that's because I am trying to change an element of the tuple that os.walk outputs.
Similar code works with os.scandir, but that collects files only in the current directory.
How can I fix that? Thanks!
Creating pathlib.Path(name) does not mean that a file called name exists at that location. You need to be careful that you have a full path or a correct relative path, and you need to make sure to resolve it. In particular, you never change your working directory, yet you have a line like this:
filePath = Path(name)
This means that while os.walk descends through the directory tree, your working directory stays where it was, so Path(name) is interpreted relative to the wrong directory. Build the path from root and name instead; it is also a good idea to resolve it so that the full path is known.
filePath = Path(root).joinpath(name).resolve()
You can also create Path(root) once, outside the inner loop. Now you have an absolute path, from '/home/' down to the filename. Hence, you should be able to rename the file with .rename(), like:
filePath.rename(filePath.parent.joinpath(newname))
#Or to another directory
filePath.rename(other_dir.joinpath(newname))
All together:
import os
from pathlib import Path
path = Path.cwd() # directory to search, as in the question
empetrishki = path.joinpath("empetrishki").resolve() # destination dir
empetrishki.mkdir(exist_ok=True) # create it if it is not there yet
for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
    root = Path(root).resolve()
    for name in files:
        file = root.joinpath(name)
        if file.suffix.lower() == ".mp3":
            file.rename(empetrishki.joinpath(file.name))
for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
    if root == empetrishki:
        continue # skip the destination dir
    for name in files:
        basename, extension = os.path.splitext(name)
        if extension.lower() == ".mp3":
            oldpath = os.path.join(root, name)
            newpath = os.path.join(empetrishki, name)
            print(oldpath)
            shutil.move(oldpath, newpath)
The code above is what I suggest. Your code runs from the current directory, but each file lives at os.path.join(root, name), so that is the path you need to pass to your move function.
Besides, I would also suggest using os.path.splitext to extract the file extension; it is more Pythonic. You might also want to skip scanning your target directory.
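Putting those suggestions together, a sketch might look like the following. The function name collect_files and its parameters are made up for illustration. It prunes the destination from dirs in place so that os.walk never descends into it, and it makes no attempt to handle two files with the same name in different sub-directories.

```python
import os
import shutil


def collect_files(source, destination, suffix=".mp3"):
    """Move every file under source whose extension equals suffix into destination."""
    source = os.path.abspath(source)
    destination = os.path.abspath(destination)
    os.makedirs(destination, exist_ok=True)
    moved = []
    for root, dirs, files in os.walk(source):
        # Prune the destination in place so os.walk does not scan it.
        dirs[:] = [d for d in dirs if os.path.join(root, d) != destination]
        for name in files:
            if os.path.splitext(name)[1].lower() == suffix:
                old_path = os.path.join(root, name)   # full path of the file being visited
                new_path = os.path.join(destination, name)
                shutil.move(old_path, new_path)
                moved.append(new_path)
    return moved
```

For the layout in the question you would call something like collect_files(os.getcwd(), os.path.join(os.getcwd(), "empetrishki")).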

Recursive file walk only going one depth down

I am trying to create code for an assignment that can walk down a directory and return all files
I am having trouble with multilevel folders, such as
folder1
---> folder2
------->foo.txt
I have the following code
def find_larger(path, max_n_results=10):
    files = []
    print(path)
    path_files = os.listdir(path)
    for file in path_files:
        if os.path.isdir(file):
            files += find_larger(os.path.join(path, file))
        files.append(file)
    return files
print(find_larger('.'))
However, if I run that code I get the following result:
[folder1, folder2]
I have run this through a debugger, and the program does not detect the second directory as a directory.
How can I get the program to walk all the way through the directory?
Note: I am not allowed to use os.walk.
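os.path.isdir() needs a path it can resolve, but you are giving it only the bare entry name, which is interpreted relative to the current working directory rather than to path. Create the full path first, then test that: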
def find_larger(path, max_n_results=10):
    files = []
    print(path)
    path_files = os.listdir(path)
    for file in path_files:
        subpath = os.path.join(path, file)
        if os.path.isdir(subpath):
            files += find_larger(subpath)
        files.append(subpath)
    return files
However, you are reinventing the wheel here; just use the os.walk() function to list directory contents:
def find_larger(path, max_n_results=10):
    files = []
    print(path)
    for dirpath, dirnames, filenames in os.walk(path):
        files += (os.path.join(dirpath, filename) for filename in filenames)
    return files
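Since the assignment rules out os.walk, here is another sketch of the manual recursion, this time built on os.scandir. The function name list_files is made up for illustration, and unlike find_larger above it returns only regular files, not the directories themselves.

```python
import os


def list_files(path):
    """Recursively collect the paths of all regular files below path."""
    found = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # entry.path already includes `path`, so no manual join is needed
                found += list_files(entry.path)
            elif entry.is_file():
                found.append(entry.path)
    return found
```

Because each entry carries its own full path, the relative-name mistake from the question cannot happen here.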

Python error extracting zip file

I only know how to write python for GIS purposes. There is more to this code using arcpy and geoprocessing tools.... this is just the beginning part I'm stuck on that is for trying to get data ready so I can then use the shapefiles within the zipped folder for the rest of my script
I am trying to prompt the user to enter a directory to search through. The script will search that directory for a compressed zip file and then extract all of its files into the same directory.
import zipfile, os
# ask what directory to search in
mypath = input("Enter .zip folder path: ")
extension = ".zip"
os.chdir(mypath) # change directory from working dir to dir with files
for item in os.listdir(mypath):
    if item.endswith(extension):
        filename = os.path.abspath(item)
        zip_ref = zipfile.ZipFile(filename)
        zip_ref.extractall(mypath)
        zip_ref.close()
Tried with y'alls suggestions and still have issues with the following:
import zipfile, os
mypath = input("Enter folder: ")
if os.path.isdir(mypath):
    for dirpath, dirname, filenames in os.listdir(mypath):
        for file in filenames:
            if file.endswith(".zip"):
                print(os.path.abspath(file))
                with zipfile.ZipFile(os.path.abspath(file)) as z:
                    z.extractall(mypath)
else:
    print("Directory does not exist.")
I'm not sure on the use of arcpy. However...
To iterate over entries in a directory, use os.listdir:
for entry_name in os.listdir(directory_path):
    # ...
Inside the loop, entry_name will be the name of an item in the directory at directory_path.
When checking if it ends with ".zip", keep in mind that the comparison is case sensitive. You can use str.lower to effectively ignore case when using str.endswith:
if entry_name.lower().endswith('.zip'):
    # ...
To get the full path to the entry (in this case, your .zip), use os.path.join:
entry_path = os.path.join(directory_path, entry_name)
Pass this full path to zipfile.ZipFile.
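The steps above can be put together roughly like this. The function name extract_zips is made up for illustration, and the extra zipfile.is_zipfile check is my addition to skip files that only carry a .zip extension.

```python
import os
import zipfile


def extract_zips(directory_path):
    """Extract every .zip archive found directly in directory_path into that directory."""
    extracted = []
    for entry_name in os.listdir(directory_path):
        if not entry_name.lower().endswith(".zip"):
            continue  # case-insensitive extension check
        entry_path = os.path.join(directory_path, entry_name)  # full path to the entry
        if not zipfile.is_zipfile(entry_path):
            continue  # not a real zip archive despite the extension
        with zipfile.ZipFile(entry_path) as archive:
            archive.extractall(directory_path)
        extracted.append(entry_path)
    return extracted
```

It returns the archives it extracted, so you can print them or hand the directory over to the rest of your script afterwards.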
Your first if-block will negate the else-block if the location is invalid. I'd remove the else clause entirely. If you keep it, the if-check effectively kills the program. The "if folderExist" check is sufficient to replace the else.
import arcpy, zipfile, os
# ask what directory to search in
folder = input("Where is the directory? ")
# check if invalid entry - if bad, ask to input different location
if len(folder) == 0:
    print("Invalid location.")
    folder = input("Try another directory? ") # replace the old location with the new one
# set workspace as variable so can change location
arcpy.env.workspace = folder
# check to see if folder exists
folderExist = arcpy.Exists(folder)
if folderExist:
    # loop through files in directory
    for item in os.listdir(folder):
        # check for .zip extension
        if item.endswith(".zip"):
            file_name = os.path.join(folder, item) # get full path of files
            print(file_name)
            zip_ref = zipfile.ZipFile(file_name) # create zipfile object
            zip_ref.extractall(folder) # extract all to directory
            zip_ref.close() # close file
This may be neater if you're okay not using your original code:
import zipfile, os
from tkinter import filedialog as fd
# ask what directory to search in
folder = fd.askdirectory(title="Where is the directory?")
# loop through files in directory
for item in os.listdir(folder):
    file_name = os.path.join(folder, item) # build full path of the entry
    # check that the entry really is a zip archive
    if zipfile.is_zipfile(file_name):
        print(file_name)
        zip_ref = zipfile.ZipFile(file_name) # create zipfile object
        zip_ref.extractall(folder) # extract all to directory
        zip_ref.close() # close file
[UPDATED]
I was able to solve this with the following code:
import os
import zipfile
mypath = raw_input('Enter Folder: ')
if os.path.isdir(mypath):
    for file in os.listdir(mypath):
        if file.endswith('.zip'):
            with zipfile.ZipFile(os.path.join(mypath, file)) as z:
                z.extractall(mypath)
else:
    print('Directory does not exist')

All Files in Dir & Sub-Dir

I would like to find all the files in a directory and all sub-directories.
code used:
import os
import sys
path = "C:\\"
dirs = os.listdir(path)
filename = "C.txt"
FILE = open(filename, "w")
FILE.write(str(dirs))
FILE.close()
print dirs
The problem is - this code only lists files in directories, not sub-directories. What do I need to change in order to also list files in subdirectories?
To traverse a directory tree, use os.walk().
Here's an example to get you started:
import os
searchdir = r'C:\root_dir' # traversal starts in this directory (the root)
for root, dirs, files in os.walk(searchdir):
    for name in files:
        (base, ext) = os.path.splitext(name) # split base and extension
        print base, ext
which would give you access to the file names and the components.
You'll find the functions in the os and os.path module to be of great use for this sort of work.
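To tie this back to the question, here is a sketch that collects the full path of every file and writes the listing to a text file. It is written for Python 3 (print and open differ slightly from the Python 2 code above), and the function names list_all_files and write_listing are made up for illustration.

```python
import os


def list_all_files(searchdir):
    """Return the full path of every file under searchdir, including sub-directories."""
    paths = []
    for root, dirs, files in os.walk(searchdir):
        for name in files:
            paths.append(os.path.join(root, name))
    return paths


def write_listing(searchdir, output_file):
    """Write one path per line to output_file and return how many were written."""
    paths = list_all_files(searchdir)
    with open(output_file, "w", encoding="utf-8") as handle:
        for path in paths:
            handle.write(path + "\n")
    return len(paths)
```

With the names from the question this would be called as write_listing("C:\\", "C.txt").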
This function will help you: os.path.walk() http://docs.python.org/library/os.path.html#os.path.walk (Python 2 only; it was removed in Python 3, where os.walk() is the replacement).

Excluding all but a single subdirectory from a file search

I have a directory structure that resembles the following:
Dir1
Dir2
Dir3
Dir4
L SubDir4.1
L SubDir4.2
L SubDir4.3
I want to generate a list of files (with full paths) that include all the contents of Dirs1-3, but only SubDir4.2 inside Dir4. The code I have so far is
import fnmatch
import os
for root, dirs, files in os.walk( '.' )
    if 'Dir4' in dirs:
        if not 'SubDir4.2' in 'Dir4':
            dirs.remove( 'Dir4' )
    for file in files
        print os.path.join( root, file )
My problem is that the part where I attempt to exclude any file that does not have SubDir4.2 in its path is excluding everything in Dir4, including the things I would like to keep. How should I amend the above to do what I want?
Update 1: I should add that there are a lot of directories below Dir4, so manually listing them in an excludes list isn't a practical option. I'd like to be able to specify SubDir4.2 as the only subdirectory within Dir4 to be read.
Update 2: For reasons outside of my control, I only have access to Python version 2.4.3.
There are a few typos in your snippet. I propose this:
import os
def any_p(iterable):
    for element in iterable:
        if element:
            return True
    return False
include_dirs = ['Dir4/SubDir4.2', 'Dir1/SubDir4.2', 'Dir3', 'Dir2'] # List all your included folder names here
for root, dirs, files in os.walk( '.' ):
    dirs[:] = [d for d in dirs if any_p(d in os.path.join(root, q_inc) for q_inc in include_dirs)]
    for file in files:
        print file
EDIT: According to the comments, I have changed this so that it is an include list instead of an exclude one.
EDIT2: Added an any_p function (an any() equivalent for Python versions < 2.5).
EDIT3bis: if you have other subfolders with the same name 'SubDir4.2' in other folders, you can use the following to specify the location:
include_dirs = ['Dir4/SubDir4.2', 'Dir1/SubDir4.2']
Assuming you have a Dir1/SubDir4.2.
If there are a lot of those, then you may want to refine this approach with fnmatch, or probably a regex query.
I altered mstud's solution to give you what you are looking for:
import os
for root, dirs, files in os.walk('.'):
    # Split the root into its path parts
    tmp = root.split(os.path.sep)
    # If the length of the path is long enough to be your path AND
    # the second to last part of the path is Dir4 AND
    # the last part of the path is not SubDir4.2 THEN
    # stop processing this pass.
    if (len(tmp) > 2) and (tmp[-2] == 'Dir4') and (tmp[-1] != 'SubDir4.2'):
        continue
    # If we aren't in Dir4, print the file paths.
    if tmp[-1] != 'Dir4':
        for file in files:
            print os.path.join(root, file)
In short, the first "if" skips the printing of any directory contents under Dir4 that aren't SubDir4.2. The second "if" skips the printing of the contents of the Dir4 directory.
for root, dirs, files in os.walk('.'):
    tmp = root.split(os.path.sep)
    # skip every subdirectory of Dir4 except SubDir4.2
    if len(tmp)>2 and tmp[-2]=="Dir4" and tmp[-1]!="SubDir4.2":
        continue
    for file in files:
        print os.path.join(root, file)
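
Another option is to prune dirs in place while walking, so that os.walk never even enters the unwanted subdirectories. The following is a sketch in Python 3 syntax, not a drop-in for 2.4.3, although the dirs[:] pruning idiom itself is the same one used in the first answer. The function name walk_with_single_subdir and its parameters are made up for illustration.

```python
import os


def walk_with_single_subdir(top, parent="Dir4", keep="SubDir4.2"):
    """Yield file paths under top, descending into only `keep` inside top/parent."""
    parent_path = os.path.normpath(os.path.join(top, parent))
    for root, dirs, files in os.walk(top):
        if os.path.normpath(root) == parent_path:
            # Prune in place: os.walk only visits what is left in `dirs`.
            dirs[:] = [d for d in dirs if d == keep]
            continue  # files sitting directly in `parent` are skipped as well
        for name in files:
            yield os.path.join(root, name)
```

Calling list(walk_with_single_subdir('.')) would then give the contents of Dir1 to Dir3 plus everything below Dir4/SubDir4.2, assuming the layout from the question.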
