Get an arbitrary file from a directory tree

Get an arbitrary file from a directory tree - python

I'd like to get a path to an arbitrary text file (with .txt suffix) which is present somewhere in the directory tree. The file should not be hidden or in hidden directory.
I tried to write the code but it looks little cumbersome. How would you improve it to avoid useless steps?
def getSomeTextFile(rootDir):
"""Get the path to arbitrary text file under the rootDir"""
for root, dirs, files in os.walk(rootDir):
for f in files:
path = os.path.join(root, f)
ext = path.split(".")[-1]
if ext.lower() == "txt":
# it shouldn't be hidden or in hidden directory
if not "/." in path:
return path
return "" # there isn't any text file

Using os.walk (like in your example) is definitely a good start.
You can use fnmatch (link to the docs here) to simplify the rest of the code.
E.g:
...
if fnmatch.fnmatch(file, '*.txt'):
print file
...

I'd use fnmatch instead of string manipulations.
import os, os.path, fnmatch
def find_files(root, pattern, exclude_hidden=True):
""" Get the path to arbitrary .ext file under the root dir """
for dir, _, files in os.walk(root):
for f in fnmatch.filter(files, pattern):
path = os.path.join(dir, f)
if '/.' not in path or not exclude_hidden:
yield path
I've also rewritten the function to be more generic (and "pythonic"). To get just one pathname, call it like this:
first_txt = next(find_files(some_dir, '*.txt'))

Related

python: collect files with one extention from all sub-dir

I am trying to collect all files with all sub-directories and move to another directory
Code used
#collects all mp3 files from folders to a new folder
import os
from pathlib import Path
import shutil
#run once
path = os.getcwd()
os.mkdir("empetrishki")
empetrishki = path + "/empetrishki" #destination dir
print(path)
print(empetrishki)
#recursive collection
for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
for name in files:
filePath = Path(name)
if filePath.suffix.lower() == ".mp3":
print(filePath)
os.path.join
filePath.rename(empetrishki.joinpath(filePath))
I have trouble with the last line of moving files: filePath.rename() nor shutil.move nor joinpath() have worked for me. Maybe that's because I am trying to change the element in the tuple - the output from os.walk
Similar code works with os.scandir but this would collect files only in the current directory
How can I fix that, thanks!

If you use pathlib.Path(name) that doesn't mean that something exists called name. Hence, you do need to be careful that you have a full path, or relative path, and you need to make sure to resolve those. In particular I am noting that you don't change your working directory and have a line like this:
filePath = Path(name)
This means that while you may be walking down the directory, your working directory may not be changing. You should make your path from the root and the name, it is also a good idea to resolve so that the full path is known.
filePath = Path(root).joinpath(name).resolve()
You can also place the Path(root) outside the inner loop as well. Now you have an absolute path from '/home/' to the filename. Hence, you should be able to rename with .rename(), like:
filePath.rename(x.parent.joinpath(newname))
#Or to another directory
filePath.rename(other_dir.joinpath(newname))
All together:
from pathlib import os, Path
empetrishki = Path.cwd().joinpath("empetrishki").resolve()
for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
root = Path(root).resolve()
for name in files:
file = root.joinpath(name)
if file.suffix.lower() == ".mp3":
file.rename(empetrishki.joinpath(file.name))

for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
if root == empetrishki:
continue # skip the destination dir
for name in files:
basename, extension = os.path.splitext(name)
if extension.lower() == ".mp3":
oldpath = os.path.join(root, name)
newpath = os.path.join(empetrishki, name)
print(oldpath)
shutil.move(oldpath, newpath)
This is what I suggest. Your code is running on the current directory, and the file is at the path os.path.join(root, name) and you need to provide such path to your move function.
Besides, I would also suggest to use os.path.splitext for extracting the file extension. More pythonic. And also you might want to skip scanning your target directory.

Read file in unknown directory

I need to read and edit serveral files, the issue is I know roughly where these files are but not entirely.
so all the files are called QqTest.py in various different directories.
I know that the parent directories are called:
mdcArray = ['MDC0021','MDC0022','MDC0036','MDC0055','MDC0057'
'MDC0059','MDC0061','MDC0062','MDC0063','MDC0065'
'MDC0066','MDC0086','MDC0095','MDC0098','MDC0106'
'MDC0110','MDC0113','MDC0114','MDC0115','MDC0121'
'MDC0126','MDC0128','MDC0135','MDC0141','MDC0143'
'MDC0153','MDC0155','MDC0158']
but after that there is another unknown subdirectory that contains QqTest.txt
so I need to read the QqTest.txt from /MDC[number]/unknownDir/QqTest.txt
So how I wildcard read the file in python similar to how I would in bash
i.e
/MDC0022/*/QqTest.txt

You can use a Python module called glob to do this. It enables Unix style pathname pattern expansions.
import glob
glob.glob("/MDC0022/*/QqTest.txt")
If you want to do it for all items in the list you can try this.
for item in mdcArray:
required_files = glob.glob("{0}/*/QqTest.txt".format(item))
# process files here
Glob documentation

You could search your root folders as follows:
import os
mdcArray = ['MDC0021','MDC0022','MDC0036','MDC0055','MDC0057'
'MDC0059','MDC0061','MDC0062','MDC0063','MDC0065'
'MDC0066','MDC0086','MDC0095','MDC0098','MDC0106'
'MDC0110','MDC0113','MDC0114','MDC0115','MDC0121'
'MDC0126','MDC0128','MDC0135','MDC0141','MDC0143'
'MDC0153','MDC0155','MDC0158']
for root in mdcArray:
for dirpath, dirnames, filenames in os.walk(root):
for filename in filenames:
if filename == 'QqTest.txt':
file = os.path.join(dirpath, filename)
print "Found - {}".format(file)
This would display something like the following:
Found - MDC0022\test\QqTest.txt
The os.walk function can be used to traverse your folder structure.
To search all folders for MDC<number> in the path, you could use the following approach:
import os
import re
for dirpath, dirnames, filenames in os.walk('.'):
if re.search(r'MDC\d+', dirpath):
for filename in filenames:
if filename == 'QqTest.txt':
file = os.path.join(dirpath, filename)
print "Found - {}".format(file)

You might use os.walk. Not exactly what you wanted but will do the job.
rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir):
print('Found directory: %s' % dirName)

All Files in Dir & Sub-Dir

I would like to find all the files in a directory and all sub-directories.
code used:
import os
import sys
path = "C:\\"
dirs = os.listdir(path)
filename = "C.txt"
FILE = open(filename, "w")
FILE.write(str(dirs))
FILE.close()
print dirs
The problem is - this code only lists files in directories, not sub-directories. What do I need to change in order to also list files in subdirectories?

To traverse a directory tree you want to use os.walk() for this.
Here's an example to get you started:
import os
searchdir = r'C:\root_dir' # traversal starts in this directory (the root)
for root, dirs, files in os.walk(searchdir):
for name in files:
(base, ext) = os.path.splitext(name) # split base and extension
print base, ext
which would give you access to the file names and the components.
You'll find the functions in the os and os.path module to be of great use for this sort of work.

This function will help you: os.path.walk() http://docs.python.org/library/os.path.html#os.path.walk

Excluding all but a single subdirectory from a file search

I have a directory structure that resembles the following:
Dir1
Dir2
Dir3
Dir4
L SubDir4.1
L SubDir4.2
L SubDir4.3
I want to generate a list of files (with full paths) that include all the contents of Dirs1-3, but only SubDir4.2 inside Dir4. The code I have so far is
import fnmatch
import os
for root, dirs, files in os.walk( '.' )
if 'Dir4' in dirs:
if not 'SubDir4.2' in 'Dir4':
dirs.remove( 'Dir4' )
for file in files
print os.path.join( root, file )
My problem is that the part where I attempt to exclude any file that does not have SubDir4.2 in it's path is excluding everything in Dir4, including the things I would like to remain. How should I amend that above to to do what I desire?
Update 1: I should add that there are a lot of directories below Dir4 so manually listing them in an excludes list isn't a practical option. I'd like to be able to specify SubDur4.2 as the only subdirectory within Dir4 to be read.
Update 2: For reason outside of my control, I only have access to Python version 2.4.3.

There are a few typos in your snippet. I propose this:
import os
def any_p(iterable):
for element in iterable:
if element:
return True
return False
include_dirs = ['Dir4/SubDir4.2', 'Dir1/SubDir4.2', 'Dir3', 'Dir2'] # List all your included folder names in that
for root, dirs, files in os.walk( '.' ):
dirs[:] = [d for d in dirs if any_p(d in os.path.join(root, q_inc) for q_inc in include_dirs)]
for file in files:
print file
EDIT: According to comments, I have changed that so this is include list, instead of an exclude one.
EDIT2: Added a any_p (any() equivalent function for python version < 2.5)
EDIT3bis: if you have other subfolders with the same name 'SubDir4.2' in other folders, you can use the following to specify the location:
include_dirs = ['Dir4/SubDir4.2', 'Dir1/SubDir4.2']
Assuming you have a Dir1/SubDir4.2.
If they are a lot of those, then you may want to refine this approach with fnmatch, or probably a regex query.

I altered mstud's solution to give you what you are looking for:
import os;
for root, dirs, files in os.walk('.'):
# Split the root into its path parts
tmp = root.split(os.path.sep)
# If the lenth of the path is long enough to be your path AND
# The second to last part of the path is Dir4 AND
# The last part of the path is SubDir4.2 THEN
# Stop processing this pass.
if (len(tmp) > 2) and (tmp[-2] == 'Dir4') and (tmp[-1] != 'SubDir4.2'):
continue
# If we aren't in Dir4, print the file paths.
if tmp[-1] != 'Dir4':
for file in files:
print os.path.join(root, file)
In short, the first "if" skips the printing of any directory contents under Dir4 that aren't SubDir4.2. The second "if" skips the printing of the contents of the Dir4 directory.

for root, dirs, files in os.walk('.'):
tmp = root.split(os.path.sep)
if len(tmp)>2 and tmp[-2]=="Dir4" and tmp[-1]=="SubDir4.2":
continue
for file in files:
print os.path.join(root, file)

Get absolute paths of all files in a directory

How do I get the absolute paths of all the files in a directory that could have many sub-folders in Python?
I know os.walk() recursively gives me a list of directories and files, but that doesn't seem to get me what I want.

os.path.abspath makes sure a path is absolute. Use the following helper function:
import os
def absoluteFilePaths(directory):
for dirpath,_,filenames in os.walk(directory):
for f in filenames:
yield os.path.abspath(os.path.join(dirpath, f))

If you have Python 3.4 or newer you can use pathlib (or a third-party backport if you have an older Python version):
import pathlib
for filepath in pathlib.Path(directory).glob('**/*'):
print(filepath.absolute())

If the argument given to os.walk is absolute, then the root dir names yielded during iteration will also be absolute. So, you only need to join them with the filenames:
import os
for root, dirs, files in os.walk(os.path.abspath("../path/to/dir/")):
for file in files:
print(os.path.join(root, file))

Try:
import os
for root, dirs, files in os.walk('.'):
for file in files:
p=os.path.join(root,file)
print p
print os.path.abspath(p)
print

You can use os.path.abspath() to turn relative paths into absolute paths:
file_paths = []
for folder, subs, files in os.walk(rootdir):
for filename in files:
file_paths.append(os.path.abspath(os.path.join(folder, filename)))

Starting with python 3.5 the idiomatic solution would be:
import os
def absolute_file_paths(directory):
path = os.path.abspath(directory)
return [entry.path for entry in os.scandir(path) if entry.is_file()]
This not just reads nicer but also is faster in many cases.
For more details (like ignoring symlinks) see original python docs:
https://docs.python.org/3/library/os.html#os.scandir

All files and folders:
x = [os.path.abspath(os.path.join(directory, p)) for p in os.listdir(directory)]
Images (.jpg | .png):
x = [os.path.abspath(os.path.join(directory, p)) for p in os.listdir(directory) if p.endswith(('jpg', 'png'))]

from glob import glob
def absolute_file_paths(directory):
return glob(join(directory, "**"))

Try:
from pathlib import Path
path = 'Desktop'
files = filter(lambda filepath: filepath.is_file(), Path(path).glob('*'))
for file in files:
print(file.absolute())

I wanted to keep the subdirectory details and not the files and wanted only subdirs with one xml file in them. I can do it this way:
for rootDirectory, subDirectories, files in os.walk(eventDirectory):
for subDirectory in subDirectories:
absSubDir = os.path.join(rootDirectory, subDirectory)
if len(glob.glob(os.path.join(absSubDir, "*.xml"))) == 1:
print "Parsing information in " + absSubDir

for root, directories, filenames in os.walk(directory):
for directory in directories:
print os.path.join(root, directory)
for filename in filenames:
if filename.endswith(".JPG"):
print filename
print os.path.join(root,filename)

Try This
pth=''
types=os.listdir(pth)
for type_ in types:
file_names=os.listdir(f'{pth}/{type_}')
file_names=list(map(lambda x:f'{pth}/{type_}/{x}',file_names))
train_folder+=file_names

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Get an arbitrary file from a directory tree - python

Using os.walk (like in your example) is definitely a good start. You can use fnmatch (link to the docs here) to simplify the rest of the code. E.g: ... if fnmatch.fnmatch(file, '*.txt'): print file ...

Related

python: collect files with one extention from all sub-dir

Read file in unknown directory

All Files in Dir & Sub-Dir

Excluding all but a single subdirectory from a file search

Get absolute paths of all files in a directory

Categories

Resources