All Files in Dir & Sub-Dir - python

I would like to find all the files in a directory and all sub-directories.
code used:
import os
import sys
path = "C:\\"
dirs = os.listdir(path)
filename = "C.txt"
FILE = open(filename, "w")
FILE.write(str(dirs))
FILE.close()
print(dirs)
The problem is that this code only lists the entries directly inside the directory, not those in sub-directories. What do I need to change in order to also list files in sub-directories?

To traverse a directory tree, use os.walk().
Here's an example to get you started:
import os
searchdir = r'C:\root_dir'  # traversal starts in this directory (the root)
for root, dirs, files in os.walk(searchdir):
    for name in files:
        (base, ext) = os.path.splitext(name)  # split base and extension
        print(base, ext)
This gives you access to each file name and its components (base name and extension).
You'll find the functions in the os and os.path module to be of great use for this sort of work.
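Applied to the original question (writing every file under a directory tree to a text file), this could be sketched as follows, assuming Python 3. The helper names list_all_files and write_listing are made up for illustration and are not part of any library.

```python
import os


def list_all_files(top):
    """Return the path of every file under top, including sub-directories."""
    found = []
    for root, dirs, files in os.walk(top):
        for name in files:
            # root is the directory currently being visited
            found.append(os.path.join(root, name))
    return sorted(found)


def write_listing(top, out_path):
    """Write one path per line to out_path (hypothetical helper)."""
    paths = list_all_files(top)
    with open(out_path, "w", encoding="utf-8") as handle:
        for path in paths:
            handle.write(path + "\n")
    return len(paths)
```

For example, write_listing("C:\\", "C.txt") would replace the os.listdir call in the question, although walking an entire drive can take a while.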

os.path.walk() can also do this, but it exists only in Python 2; it was removed in Python 3, where os.walk() replaces it: http://docs.python.org/library/os.path.html#os.path.walk

Related

python: collect files with one extension from all sub-dir

I am trying to collect all files from all sub-directories and move them to another directory.
Code used:
#collects all mp3 files from folders to a new folder
import os
from pathlib import Path
import shutil
#run once
path = os.getcwd()
os.mkdir("empetrishki")
empetrishki = path + "/empetrishki" #destination dir
print(path)
print(empetrishki)
#recursive collection
for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
    for name in files:
        filePath = Path(name)
        if filePath.suffix.lower() == ".mp3":
            print(filePath)
            os.path.join
            filePath.rename(empetrishki.joinpath(filePath))
I have trouble with the last line, which moves the files: neither filePath.rename() nor shutil.move nor joinpath() has worked for me. Maybe that's because I am trying to change an element of the tuple that os.walk returns.
Similar code works with os.scandir, but that would collect files only in the current directory.
How can I fix that? Thanks!
Calling pathlib.Path(name) does not mean that a file called name exists. You need to make sure you have a full path, or a correct relative path, and resolve it. In particular, you never change your working directory, yet you have a line like this:
filePath = Path(name)
os.walk descends through the tree, but your working directory stays the same, so a bare name only points at a real file while you are in the starting directory. Build the path from root and name instead; it is also a good idea to resolve it so that the full path is known.
filePath = Path(root).joinpath(name).resolve()
You can also create Path(root) once, outside the inner loop. Now you have an absolute path to the file, so you should be able to rename it with .rename(). Here newname and other_dir stand for your own target name and target directory (a Path):
filePath.rename(filePath.parent.joinpath(newname))
#Or to another directory
filePath.rename(other_dir.joinpath(newname))
All together:
import os
from pathlib import Path
path = Path.cwd()  # directory to search, as in the question
empetrishki = path.joinpath("empetrishki").resolve()
empetrishki.mkdir(exist_ok=True)  # make sure the destination exists
for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
    root = Path(root).resolve()
    for name in files:
        file = root.joinpath(name)
        if file.suffix.lower() == ".mp3":
            file.rename(empetrishki.joinpath(file.name))
for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
    if root == empetrishki:
        continue # skip the destination dir
    for name in files:
        basename, extension = os.path.splitext(name)
        if extension.lower() == ".mp3":
            oldpath = os.path.join(root, name)
            newpath = os.path.join(empetrishki, name)
            print(oldpath)
            shutil.move(oldpath, newpath)
This is what I suggest. Your code runs in the current directory, but each file is at the path os.path.join(root, name), and that is the path you need to pass to your move function.
I would also suggest using os.path.splitext to extract the file extension, which is more Pythonic. You might also want to skip scanning your target directory.
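The suggestions above (join root and name, use os.path.splitext, skip the target directory) could be combined roughly as in the sketch below. The function name collect_by_extension is hypothetical, and skipping files whose name already exists in the destination is an assumption made here, not something the question asks for.

```python
import os
import shutil


def collect_by_extension(source, destination, extension=".mp3"):
    """Move every file under source with the given extension into destination."""
    source = os.path.abspath(source)
    destination = os.path.abspath(destination)
    os.makedirs(destination, exist_ok=True)
    moved = []
    for root, dirs, files in os.walk(source):
        # Prune the destination in place so os.walk never descends into it.
        dirs[:] = [d for d in dirs
                   if os.path.abspath(os.path.join(root, d)) != destination]
        for name in files:
            if os.path.splitext(name)[1].lower() == extension:
                target = os.path.join(destination, name)
                if os.path.exists(target):
                    continue  # assumption: skip name clashes instead of overwriting
                shutil.move(os.path.join(root, name), target)
                moved.append(target)
    return moved
```

Editing dirs in place only has an effect because os.walk runs top-down by default; with topdown=False the sub-directories have already been visited by the time the list is available.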

Folder containing subfolders, that contain multiple files (.xlsm, .pdf, .txt). How to rename .pdf files to subfolders' name?

This could be done with python, but I think I am missing a way to loop for all directories. Here is the code I am using:
import os
def renameInDir(directory):
    for filename in os.listdir(directory):
        if filename.endswith(".pdf"):
            path = os.path.realpath(filename)
            parents = path.split('/')  # make an array of all the dirs in the path. 0 will be the original basefilename
            newFilename = os.path.dirname(filename) + directory + parents[-1:][0]  # reorganize data into format you want
            os.rename(filename, newFilename)  # rename the file
You should go with os.walk(). It walks the directory tree rooted at the directory you give it and generates the file names in each folder.
Using os.walk() you'll accomplish the desired result in this way:
import os
from os.path import join
for dirpath, dirnames, filenames in os.walk('/path/to/directory'):
    for name in filenames:
        new_name = name[:-3] + 'new_file_extension'
        os.rename(join(dirpath, name), join(dirpath, new_name))
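The snippet above changes the extension of every file. For the case in the question (naming each .pdf after the subfolder that contains it), a sketch along the same lines might look like this. The function name rename_pdfs_to_folder_name is made up, and the numeric suffix for folders with more than one PDF is an assumption, since the question does not say how such clashes should be handled.

```python
import os


def rename_pdfs_to_folder_name(top):
    """Rename each .pdf under top after the folder that contains it."""
    renamed = []
    for dirpath, dirnames, filenames in os.walk(top):
        folder = os.path.basename(os.path.abspath(dirpath))
        pdfs = sorted(n for n in filenames if n.lower().endswith(".pdf"))
        for index, name in enumerate(pdfs):
            # assumption: extra PDFs in one folder get a numeric suffix
            suffix = "" if index == 0 else "_{}".format(index)
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, folder + suffix + ".pdf")
            if old_path != new_path and not os.path.exists(new_path):
                os.rename(old_path, new_path)
                renamed.append(new_path)
    return renamed
```

Files with other extensions (.xlsm, .txt) are left untouched because only names ending in .pdf are selected.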

Read file in unknown directory

I need to read and edit several files. The issue is that I know roughly where these files are, but not exactly.
All the files are called QqTest.txt and sit in various different directories.
I know that the parent directories are called:
mdcArray = ['MDC0021','MDC0022','MDC0036','MDC0055','MDC0057',
            'MDC0059','MDC0061','MDC0062','MDC0063','MDC0065',
            'MDC0066','MDC0086','MDC0095','MDC0098','MDC0106',
            'MDC0110','MDC0113','MDC0114','MDC0115','MDC0121',
            'MDC0126','MDC0128','MDC0135','MDC0141','MDC0143',
            'MDC0153','MDC0155','MDC0158']
but below each of those there is another, unknown subdirectory that contains QqTest.txt,
so I need to read QqTest.txt from /MDC[number]/unknownDir/QqTest.txt.
How do I use a wildcard to read the file in Python, similar to how I would in bash?
i.e.
/MDC0022/*/QqTest.txt
You can use a Python module called glob to do this. It enables Unix style pathname pattern expansions.
import glob
glob.glob("/MDC0022/*/QqTest.txt")
If you want to do it for all items in the list you can try this.
for item in mdcArray:
    required_files = glob.glob("{0}/*/QqTest.txt".format(item))
    # process files here
See the glob module documentation for the full pattern syntax.
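A slightly more defensive variant of the glob approach is sketched below. It builds the pattern with os.path.join and escapes the known path parts so that only the "*" acts as a wildcard. The function name find_in_unknown_subdir and the base parameter are assumptions introduced for illustration.

```python
import glob
import os


def find_in_unknown_subdir(base, parents, filename="QqTest.txt"):
    """Return matches of base/<parent>/<any single directory>/filename."""
    matches = []
    for parent in parents:
        # glob.escape keeps characters such as "[" in known names literal
        pattern = os.path.join(glob.escape(base), glob.escape(parent), "*", filename)
        matches.extend(glob.glob(pattern))
    return sorted(matches)
```

A single "*" matches exactly one directory level, so a file that sits two levels below the MDC directory would not be found by this pattern.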
You could search your root folders as follows:
import os
mdcArray = ['MDC0021','MDC0022','MDC0036','MDC0055','MDC0057',
            'MDC0059','MDC0061','MDC0062','MDC0063','MDC0065',
            'MDC0066','MDC0086','MDC0095','MDC0098','MDC0106',
            'MDC0110','MDC0113','MDC0114','MDC0115','MDC0121',
            'MDC0126','MDC0128','MDC0135','MDC0141','MDC0143',
            'MDC0153','MDC0155','MDC0158']
for root in mdcArray:
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename == 'QqTest.txt':
                file = os.path.join(dirpath, filename)
                print("Found - {}".format(file))
This would display something like the following:
Found - MDC0022\test\QqTest.txt
The os.walk function can be used to traverse your folder structure.
To search all folders for MDC<number> in the path, you could use the following approach:
import os
import re
for dirpath, dirnames, filenames in os.walk('.'):
    if re.search(r'MDC\d+', dirpath):
        for filename in filenames:
            if filename == 'QqTest.txt':
                file = os.path.join(dirpath, filename)
                print("Found - {}".format(file))
You might use os.walk. It is not exactly what you wanted, but it will do the job.
import os
rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)

match filenames to foldernames then move files

I have files named "a1.txt", "a2.txt", "a3.txt", "a4.txt", "a5.txt" and so on. Then I have folders named "a1_1998", "a2_1999", "a3_2000", "a4_2001", "a5_2002" and so on.
I would like to make the connection between file "a1.txt" and folder "a1_1998", for example (I'm guessing I'll need a regular expression to do this), then use shutil to move file "a1.txt" into folder "a1_1998", file "a2.txt" into folder "a2_1999", etc.
I've started like this, but I'm stuck because of my lack of understanding of regular expressions.
import re
##list files and folders
r = re.compile('^a(?P')
m = r.match('a')
m.group('id')
##
##Move files to folders
I modified the answer below slightly to use shutil to move the files, did the trick!!
import shutil
import os
import glob
files = glob.glob(r'C:\Wam\*.txt')
for file in files:
    # this will remove the .txt extension and keep the "aN"
    first_part = file[7:-4]
    # find the matching directory
    dir = glob.glob(r'C:\Wam\%s_*/' % first_part)[0]
    shutil.move(file, dir)
You do not need regular expressions for this.
How about something like this:
import glob
import os
files = glob.glob('*.txt')
for file in files:
    # this will remove the .txt extension and keep the "aN"
    first_part = file[:-4]
    # find the matching directory
    dir = glob.glob('%s_*/' % first_part)[0]
    os.rename(file, os.path.join(dir, file))
A slight alternative, taking into account Inbar Rose's suggestion.
import os
import glob
files = glob.glob('*.txt')
dirs = glob.glob('*_*')
for file in files:
    filename = os.path.splitext(file)[0]
    matchdir = next(x for x in dirs if filename == x.rsplit('_')[0])
    os.rename(file, os.path.join(matchdir, file))
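The same idea can be wrapped in a function that takes the base folder as a parameter and leaves a file alone when it has no matching folder, or more than one. This is only a sketch: the name move_files_to_matching_dirs is hypothetical, and skipping ambiguous matches is an assumption rather than something the question specifies.

```python
import os
import shutil


def move_files_to_matching_dirs(base):
    """Move each 'aN.txt' in base into the folder named 'aN_<something>'."""
    entries = os.listdir(base)
    dirs = [e for e in entries if os.path.isdir(os.path.join(base, e))]
    moved = {}
    for entry in entries:
        stem, ext = os.path.splitext(entry)
        if ext != ".txt" or not os.path.isfile(os.path.join(base, entry)):
            continue
        # Compare the whole prefix before the first underscore,
        # so "a1" matches "a1_1998" but not "a10_2007".
        targets = [d for d in dirs if "_" in d and d.split("_", 1)[0] == stem]
        if len(targets) == 1:
            shutil.move(os.path.join(base, entry),
                        os.path.join(base, targets[0], entry))
            moved[entry] = targets[0]
    return moved
```

The returned dictionary maps each moved file name to the folder it went into, which makes it easy to print or log what happened.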

Get absolute paths of all files in a directory

How do I get the absolute paths of all the files in a directory that could have many sub-folders in Python?
I know os.walk() recursively gives me a list of directories and files, but that doesn't seem to get me what I want.
os.path.abspath makes sure a path is absolute. Use the following helper function:
import os
def absoluteFilePaths(directory):
    for dirpath,_,filenames in os.walk(directory):
        for f in filenames:
            yield os.path.abspath(os.path.join(dirpath, f))
If you have Python 3.4 or newer you can use pathlib (or a third-party backport if you have an older Python version):
import pathlib
for filepath in pathlib.Path(directory).glob('**/*'):
    print(filepath.absolute())
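Note that the '**/*' pattern yields directories as well as files. If only files are wanted, a small sketch like the following could be used; the function name absolute_file_paths is chosen here for illustration.

```python
from pathlib import Path


def absolute_file_paths(directory):
    """Return absolute paths of all regular files below directory."""
    top = Path(directory).resolve()
    # rglob("*") is equivalent to glob("**/*"); is_file() drops the directories
    return sorted(p for p in top.rglob("*") if p.is_file())
```

Because the starting directory is resolved first, every path produced by rglob is already absolute.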
If the argument given to os.walk is absolute, then the root dir names yielded during iteration will also be absolute. So, you only need to join them with the filenames:
import os
for root, dirs, files in os.walk(os.path.abspath("../path/to/dir/")):
    for file in files:
        print(os.path.join(root, file))
Try:
import os
for root, dirs, files in os.walk('.'):
    for file in files:
        p=os.path.join(root,file)
        print(p)
        print(os.path.abspath(p))
        print()
You can use os.path.abspath() to turn relative paths into absolute paths:
file_paths = []
for folder, subs, files in os.walk(rootdir):
    for filename in files:
        file_paths.append(os.path.abspath(os.path.join(folder, filename)))
Starting with Python 3.5, the idiomatic solution for a single directory would be:
import os
def absolute_file_paths(directory):
    path = os.path.abspath(directory)
    return [entry.path for entry in os.scandir(path) if entry.is_file()]
This not only reads more nicely but is also faster in many cases. Note that os.scandir lists only the given directory and does not descend into sub-folders.
For more details (like ignoring symlinks) see original python docs:
https://docs.python.org/3/library/os.html#os.scandir
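Since the question asks about directories with many sub-folders, os.scandir can also be applied recursively. The generator below is a minimal sketch assuming Python 3.6 or newer (for the with statement on os.scandir); the name scan_files is made up, and not following symlinked directories is a choice made here to avoid loops.

```python
import os


def scan_files(directory):
    """Yield absolute paths of files below directory using os.scandir recursively."""
    with os.scandir(os.path.abspath(directory)) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # descend into real sub-directories only
                yield from scan_files(entry.path)
            elif entry.is_file():
                yield entry.path
```

entry.path is absolute here because the argument passed to os.scandir is absolute.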
All files and folders:
x = [os.path.abspath(os.path.join(directory, p)) for p in os.listdir(directory)]
Images (.jpg | .png):
x = [os.path.abspath(os.path.join(directory, p)) for p in os.listdir(directory) if p.endswith(('jpg', 'png'))]
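import os
from glob import glob
def absolute_file_paths(directory):
    # recursive=True lets "**" match nested folders; the result includes directories too
    return glob(os.path.join(os.path.abspath(directory), "**"), recursive=True)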
Try:
from pathlib import Path
path = 'Desktop'
files = filter(lambda filepath: filepath.is_file(), Path(path).glob('*'))
for file in files:
    print(file.absolute())
I wanted to keep the subdirectory details and not the files and wanted only subdirs with one xml file in them. I can do it this way:
for rootDirectory, subDirectories, files in os.walk(eventDirectory):
    for subDirectory in subDirectories:
        absSubDir = os.path.join(rootDirectory, subDirectory)
        if len(glob.glob(os.path.join(absSubDir, "*.xml"))) == 1:
            print("Parsing information in " + absSubDir)
for root, directories, filenames in os.walk(directory):
    for directory in directories:
        print(os.path.join(root, directory))
    for filename in filenames:
        if filename.endswith(".JPG"):
            print(filename)
            print(os.path.join(root,filename))
Try this:
import os
pth=''  # set this to the top-level folder
train_folder=[]  # collects the paths found below
types=os.listdir(pth)
for type_ in types:
    file_names=os.listdir(f'{pth}/{type_}')
    file_names=list(map(lambda x:f'{pth}/{type_}/{x}',file_names))
    train_folder+=file_names
