Python 3: search subdirectories for a file - python

I'm using Pycharm on a Mac. In the script below I'm calling the os.path.isfile function on a file called dwnld.py. It prints out "File exists" since dwnld.py is in the same directory of the script (/Users/BobSpanks/PycharmProjects/my scripts).
If I was to put dwnld.py in a different location, how to make the code below search all subdirectories starting from /Users/BobbySpanks for dwnld.py? I tried reading os.path notes but I couldn't really find what I needed. I'm new to Python.
import os.path
File = "dwnld.py"
if os.path.isfile(File):
print("File exists")
else:
print("File doesn't exist")

You can use the glob module for this:
import glob
import os
pattern = '/Users/BobbySpanks/**/dwnld.py'
for fname in glob.glob(pattern, recursive=True):
if os.path.isfile(fname):
print(fname)
A simplified version without checking if dwnld.py is actually file:
for fname in glob.glob(pattern, recursive=True):
print(fname)
Theoretically, it could be a directory now.
If recursive is true, the pattern '**' will match any files and zero
or more directories and subdirectories.

This might work for you:
import os
File = 'dwnld.py'
for root, dirs, files in os.walk('/Users/BobbySpanks/'):
if File in files:
print ("File exists")
os.walk(top, topdown=True, onerror=None, followlinks=False)
Generate
the file names in a directory tree by walking the tree either top-down
or bottom-up. For each directory in the tree rooted at directory top
(including top itself), it yields a 3-tuple (dirpath, dirnames,
filenames).
Source

Try this
import os
File = "dwnld.py"
for root, dirs, files in os.walk('.'):
for file in files: # loops through directories and files
if file == File: # compares to your specified conditions
print ("File exists")
Taken from: https://stackoverflow.com/a/31621120/5135450

You may also use the Path module built into python and use the glob method of that
Code for that:-
import Path
File = "dwnld.py"
pattern = '/Users/BobbySpanks/**/dwnld.py'
for fname in Path(pattern.rstrip(File)).glob(File, recursive=True):
if os.path.isfile(fname):
print("File exists")
else:
print("File doesn't exist")

something like this, using os.listdir(dir):
import os
my_dirs = os.listdir(os.getcwd())
for dirs in my_dirs:
if os.path.isdir(dirs):
os.chdir(os.path.join(os.getcwd(), dirs)
#do even more

Related

reading a file and getting the MD5 hash in python

The loop is working but once I put the if statements in it only prints I am a dir
If the if statements are not there I am able to print the dirpath, dirname, filename to the console
I am trying to list all the file names in a directory and get the MD5 sum.
from os import walk
import hashlib
import os
path = "/home/Desktop/myfile"
for (dirpath, dirname, filename) in walk(path):
if os.path.isdir(dirpath):
print("I am a dir")
if os.path.isfile(dirpath):
print(filename, hashlib.md5(open(filename, 'rb').read()).digest())
You're only checking dirpath. What you have as dirname and filename are actually collections of directory names and files under dirpath. Taken from the python docs, and modified slightly, as their example removes the files:
import os
for root, dirs, files in os.walk(top):
for name in files:
print(os.path.join(root, name))
for name in dirs:
print(os.path.join(root, name))
Will print the list of of directories and files under top and then will recurse down the directories in under top and print the folders and directories there.
From the Python documentation about os.walk:
https://docs.python.org/2/library/os.html
dirpath is a string, the path to the directory. dirnames is a list of
the names of the subdirectories in dirpath (excluding '.' and '..').
filenames is a list of the names of the non-directory files in
dirpath.
With os.path.isfile(dirpath) you are checking whether dirpath is a file, which is never the case. Try changing the code to:
full_filename = os.path.join(dirpath, filename)
if os.path.isfile(full_filename):
print(full_filename, hashlib.md5(open(full_filename, 'rb').read()).digest())

Read file in unknown directory

I need to read and edit serveral files, the issue is I know roughly where these files are but not entirely.
so all the files are called QqTest.py in various different directories.
I know that the parent directories are called:
mdcArray = ['MDC0021','MDC0022','MDC0036','MDC0055','MDC0057'
'MDC0059','MDC0061','MDC0062','MDC0063','MDC0065'
'MDC0066','MDC0086','MDC0095','MDC0098','MDC0106'
'MDC0110','MDC0113','MDC0114','MDC0115','MDC0121'
'MDC0126','MDC0128','MDC0135','MDC0141','MDC0143'
'MDC0153','MDC0155','MDC0158']
but after that there is another unknown subdirectory that contains QqTest.txt
so I need to read the QqTest.txt from /MDC[number]/unknownDir/QqTest.txt
So how I wildcard read the file in python similar to how I would in bash
i.e
/MDC0022/*/QqTest.txt
You can use a Python module called glob to do this. It enables Unix style pathname pattern expansions.
import glob
glob.glob("/MDC0022/*/QqTest.txt")
If you want to do it for all items in the list you can try this.
for item in mdcArray:
required_files = glob.glob("{0}/*/QqTest.txt".format(item))
# process files here
Glob documentation
You could search your root folders as follows:
import os
mdcArray = ['MDC0021','MDC0022','MDC0036','MDC0055','MDC0057'
'MDC0059','MDC0061','MDC0062','MDC0063','MDC0065'
'MDC0066','MDC0086','MDC0095','MDC0098','MDC0106'
'MDC0110','MDC0113','MDC0114','MDC0115','MDC0121'
'MDC0126','MDC0128','MDC0135','MDC0141','MDC0143'
'MDC0153','MDC0155','MDC0158']
for root in mdcArray:
for dirpath, dirnames, filenames in os.walk(root):
for filename in filenames:
if filename == 'QqTest.txt':
file = os.path.join(dirpath, filename)
print "Found - {}".format(file)
This would display something like the following:
Found - MDC0022\test\QqTest.txt
The os.walk function can be used to traverse your folder structure.
To search all folders for MDC<number> in the path, you could use the following approach:
import os
import re
for dirpath, dirnames, filenames in os.walk('.'):
if re.search(r'MDC\d+', dirpath):
for filename in filenames:
if filename == 'QqTest.txt':
file = os.path.join(dirpath, filename)
print "Found - {}".format(file)
You might use os.walk. Not exactly what you wanted but will do the job.
rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir):
print('Found directory: %s' % dirName)

All Files in Dir & Sub-Dir

I would like to find all the files in a directory and all sub-directories.
code used:
import os
import sys
path = "C:\\"
dirs = os.listdir(path)
filename = "C.txt"
FILE = open(filename, "w")
FILE.write(str(dirs))
FILE.close()
print dirs
The problem is - this code only lists files in directories, not sub-directories. What do I need to change in order to also list files in subdirectories?
To traverse a directory tree you want to use os.walk() for this.
Here's an example to get you started:
import os
searchdir = r'C:\root_dir' # traversal starts in this directory (the root)
for root, dirs, files in os.walk(searchdir):
for name in files:
(base, ext) = os.path.splitext(name) # split base and extension
print base, ext
which would give you access to the file names and the components.
You'll find the functions in the os and os.path module to be of great use for this sort of work.
This function will help you: os.path.walk() http://docs.python.org/library/os.path.html#os.path.walk

Get absolute paths of all files in a directory

How do I get the absolute paths of all the files in a directory that could have many sub-folders in Python?
I know os.walk() recursively gives me a list of directories and files, but that doesn't seem to get me what I want.
os.path.abspath makes sure a path is absolute. Use the following helper function:
import os
def absoluteFilePaths(directory):
for dirpath,_,filenames in os.walk(directory):
for f in filenames:
yield os.path.abspath(os.path.join(dirpath, f))
If you have Python 3.4 or newer you can use pathlib (or a third-party backport if you have an older Python version):
import pathlib
for filepath in pathlib.Path(directory).glob('**/*'):
print(filepath.absolute())
If the argument given to os.walk is absolute, then the root dir names yielded during iteration will also be absolute. So, you only need to join them with the filenames:
import os
for root, dirs, files in os.walk(os.path.abspath("../path/to/dir/")):
for file in files:
print(os.path.join(root, file))
Try:
import os
for root, dirs, files in os.walk('.'):
for file in files:
p=os.path.join(root,file)
print p
print os.path.abspath(p)
print
You can use os.path.abspath() to turn relative paths into absolute paths:
file_paths = []
for folder, subs, files in os.walk(rootdir):
for filename in files:
file_paths.append(os.path.abspath(os.path.join(folder, filename)))
Starting with python 3.5 the idiomatic solution would be:
import os
def absolute_file_paths(directory):
path = os.path.abspath(directory)
return [entry.path for entry in os.scandir(path) if entry.is_file()]
This not just reads nicer but also is faster in many cases.
For more details (like ignoring symlinks) see original python docs:
https://docs.python.org/3/library/os.html#os.scandir
All files and folders:
x = [os.path.abspath(os.path.join(directory, p)) for p in os.listdir(directory)]
Images (.jpg | .png):
x = [os.path.abspath(os.path.join(directory, p)) for p in os.listdir(directory) if p.endswith(('jpg', 'png'))]
from glob import glob
def absolute_file_paths(directory):
return glob(join(directory, "**"))
Try:
from pathlib import Path
path = 'Desktop'
files = filter(lambda filepath: filepath.is_file(), Path(path).glob('*'))
for file in files:
print(file.absolute())
I wanted to keep the subdirectory details and not the files and wanted only subdirs with one xml file in them. I can do it this way:
for rootDirectory, subDirectories, files in os.walk(eventDirectory):
for subDirectory in subDirectories:
absSubDir = os.path.join(rootDirectory, subDirectory)
if len(glob.glob(os.path.join(absSubDir, "*.xml"))) == 1:
print "Parsing information in " + absSubDir
for root, directories, filenames in os.walk(directory):
for directory in directories:
print os.path.join(root, directory)
for filename in filenames:
if filename.endswith(".JPG"):
print filename
print os.path.join(root,filename)
Try This
pth=''
types=os.listdir(pth)
for type_ in types:
file_names=os.listdir(f'{pth}/{type_}')
file_names=list(map(lambda x:f'{pth}/{type_}/{x}',file_names))
train_folder+=file_names

How do I copy files with specific file extension to a folder in my python (version 2.5) script?

I'd like to copy the files that have a specific file extension to a new folder. I have an idea how to use os.walk but specifically how would I go about using that? I'm searching for the files with a specific file extension in only one folder (this folder has 2 subdirectories but the files I'm looking for will never be found in these 2 subdirectories so I don't need to search in these subdirectories). Thanks in advance.
import glob, os, shutil
files = glob.iglob(os.path.join(source_dir, "*.ext"))
for file in files:
if os.path.isfile(file):
shutil.copy2(file, dest_dir)
Read the documentation of the shutil module to choose the function that fits your needs (shutil.copy(), shutil.copy2() or shutil.copyfile()).
If you're not recursing, you don't need walk().
Federico's answer with glob is fine, assuming you aren't going to have any directories called ‘something.ext’. Otherwise try:
import os, shutil
for basename in os.listdir(srcdir):
if basename.endswith('.ext'):
pathname = os.path.join(srcdir, basename)
if os.path.isfile(pathname):
shutil.copy2(pathname, dstdir)
Here is a non-recursive version with os.walk:
import fnmatch, os, shutil
def copyfiles(srcdir, dstdir, filepattern):
def failed(exc):
raise exc
for dirpath, dirs, files in os.walk(srcdir, topdown=True, onerror=failed):
for file in fnmatch.filter(files, filepattern):
shutil.copy2(os.path.join(dirpath, file), dstdir)
break # no recursion
Example:
copyfiles(".", "test", "*.ext")
This will walk a tree with sub-directories. You can do an os.path.isfile check to make it a little safer.
for root, dirs, files in os.walk(srcDir):
for file in files:
if file[-4:].lower() == '.jpg':
shutil.copy(os.path.join(root, file), os.path.join(dest, file))
Copy files with extension "extension" from srcDir to dstDir...
import os, shutil, sys
srcDir = sys.argv[1]
dstDir = sys.argv[2]
extension = sys.argv[3]
print "Source Dir: ", srcDir, "\n", "Destination Dir: ",dstDir, "\n", "Extension: ", extension
for root, dirs, files in os.walk(srcDir):
for file_ in files:
if file_.endswith(extension):
shutil.copy(os.path.join(root, file_), os.path.join(dstDir, file_))

Categories