I have a folder with many sub-folders, and each sub-folder also has two sub-folders. The structure looks like this:
--folder
  --sub-f1
    --1-1000
    --1-1050
  --sub-f2
    --1-1030
    --1-1060
  --sub-f3
    --1-1040
    --1-1070
What I want to achieve is to extract the folder with the smaller number in its name (in the above example, 1-1000, 1-1030 and 1-1040) and rename these folders according to their parent folders (sub-f1-1, sub-f2-1 and sub-f3-1). I'm running Windows 10 and any simple solutions are welcome!
I have written a fairly verbose algorithm to explain step by step how it works:
from glob import glob
import os

root_dir = 'folder'  # specify your root dir here; in your example it is "folder"

parent_dirs = glob(f"{root_dir}/*")  # detect all first-level subdirs

for parent in parent_dirs:
    # for each parent dir, detect all children
    child_dirs = glob(f"{parent}/*")
    # for each child, extract the numeric suffix of the name (e.g. 1000) and append it to a list
    child_nums = []
    for child in child_dirs:
        child_nums.append(int(child.split(os.sep)[-1].split("-")[-1]))
    # find the smallest child num and its index in the list to retrieve the corresponding full path
    small_child = min(child_nums)
    small_child_index = child_nums.index(small_child)
    src_folder = child_dirs[small_child_index]
    # compose the destination dir name from the parent folder's own name
    dst_folder = os.path.join(parent, parent.split(os.sep)[-1] + "-1")
    os.rename(src_folder, dst_folder)
    print(f"src_folder '{src_folder}' renamed as: '{dst_folder}'")
The output will be:
src_folder 'folder\sub-f1\1-1000' renamed as: 'folder\sub-f1\sub-f1-1'
src_folder 'folder\sub-f2\1-1030' renamed as: 'folder\sub-f2\sub-f2-1'
src_folder 'folder\sub-f3\1-1040' renamed as: 'folder\sub-f3\sub-f3-1'
This solution can be implemented in many other ways, for example without glob at all, using just os.walk(). More optimized algorithms are possible, but whether they matter depends on how many folders you need to rename. This is just a simple example.
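For completeness, here is a minimal sketch of the os.walk() variant mentioned above. This is my own illustration rather than part of the original answer, and it assumes the exact two-level layout from the question:

import os

root_dir = 'folder'  # assumed to match the layout in the question

for parent, child_dirs, _ in os.walk(root_dir):
    if parent == root_dir or not child_dirs:
        continue  # skip the root itself and any dirs without children
    # pick the child whose numeric suffix is smallest
    smallest = min(child_dirs, key=lambda name: int(name.split("-")[-1]))
    src = os.path.join(parent, smallest)
    dst = os.path.join(parent, os.path.basename(parent) + "-1")
    os.rename(src, dst)
    child_dirs[:] = []  # don't descend into the children we just renamed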
I feel that assigning files and folders, and doing the += [item] part, is a bit hackish. Any suggestions? I'm using Python 3.2.
from os import *
from os.path import *

def dir_contents(path):
    contents = listdir(path)
    files = []
    folders = []
    for i, item in enumerate(contents):
        if isfile(contents[i]):
            files += [item]
        elif isdir(contents[i]):
            folders += [item]
    return files, folders
os.walk and os.scandir are great options; however, I've been using pathlib more and more. With pathlib you can use the .glob() or .rglob() (recursive glob) methods:
from pathlib import Path

root_directory = Path(".")
for path_object in root_directory.rglob('*'):
    if path_object.is_file():
        print(f"hi, I'm a file: {path_object}")
    elif path_object.is_dir():
        print(f"hi, I'm a dir: {path_object}")
Take a look at the os.walk function, which returns the path along with the directories and files it contains. That should considerably shorten your solution.
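As a rough sketch of what that shortening might look like (my own illustration, assuming the same dir_contents signature as in the question), the whole function collapses to a single os.walk step:

import os

def dir_contents(path):
    # os.walk yields (dirpath, dirnames, filenames); we only need the first level
    _, folders, files = next(os.walk(path))
    return files, folders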
For anyone looking for a solution using pathlib (Python >= 3.4):
from pathlib import Path

def walk(path):
    for p in Path(path).iterdir():
        if p.is_dir():
            yield from walk(p)
            continue
        yield p.resolve()

# recursively traverse all files from current directory
for p in walk(Path('.')):
    print(p)

# the function returns a generator so if you need a list you need to build one
all_files = list(walk(Path('.')))
However, as mentioned above, this does not preserve the top-down ordering given by os.walk.
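If you do want parents to appear before their contents, a small variation (my own sketch, not part of the original answer) yields each directory on the way down before recursing into it:

from pathlib import Path

def walk_topdown(path):
    for p in Path(path).iterdir():
        if p.is_dir():
            yield p.resolve()           # yield the directory itself first
            yield from walk_topdown(p)  # then everything beneath it
        else:
            yield p.resolve()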
Since Python >= 3.4 there exists the generator method Path.rglob. So, to process all paths under some/starting/path, just do something such as:

from pathlib import Path

path = Path('some/starting/path')
for subpath in path.rglob('*'):
    # do something with subpath
    ...
To get all subpaths in a list, do list(path.rglob('*')).
To get just the files with the sql extension, do list(path.rglob('*.sql')).
If you want to recursively iterate through all the files, including all files in the subfolders, I believe this is the best way.
import os

def get_files(input_path):
    for dirpath, subdirs, filenames in os.walk(input_path):
        for fn in filenames:
            yield os.path.join(dirpath, fn)

# now this will print all full paths
for fn in get_files('.'):
    print(fn)
Since Python 3.4 there is a new module, pathlib. So to get all dirs and files one can do:
from pathlib import Path
dirs = [str(item) for item in Path(path).iterdir() if item.is_dir()]
files = [str(item) for item in Path(path).iterdir() if item.is_file()]
Another solution for how to walk a directory tree using the pathlib module:
from pathlib import Path

for directory in Path('.').glob('**'):
    for item in directory.iterdir():
        print(item)
The pattern ** matches the current directory and all subdirectories, recursively, and the method iterdir then iterates over each directory's contents. This is useful when you need more control while traversing the directory tree.
from os import listdir
from os.path import isfile, join

def dir_contents(path):
    files, folders = [], []
    for p in listdir(path):
        # join with the parent path, since listdir returns bare names
        if isfile(join(path, p)):
            files.append(p)
        else:
            folders.append(p)
    return files, folders
Indeed, using

items += [item]

is bad for many reasons:

The append method was made exactly for that (appending one element to the end of a list).

You are creating a temporary list of one element just to throw it away. While raw speed should not be your first concern when using Python (otherwise you're using the wrong language), wasting speed for no reason still doesn't seem like the right thing to do.

You are using a little asymmetry of the Python language: for list objects, writing a += b is not the same as writing a = a + b. The former modifies the object in place, while the latter allocates a new list, and this can have different semantics if the object a is also reachable by other means. In your specific code this doesn't seem to be the case, but it could become a problem later when someone else (or you yourself, in a few years, which is the same) has to modify the code. Python even has a method extend, with a less subtle syntax, that is made specifically for the case in which you want to modify a list object in place by adding the elements of another list at the end.
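A quick illustration of that in-place versus rebinding difference (my own example, not from the original answer):

a = [1, 2]
b = a             # b refers to the same list object as a
a += [3]          # in-place: b now sees [1, 2, 3] too
a = a + [4]       # rebinds a to a new list: b is still [1, 2, 3]
a.append(5)       # the canonical way to add a single element
a.extend([6, 7])  # the canonical way to add several elements
print(a)  # [1, 2, 3, 4, 5, 6, 7]
print(b)  # [1, 2, 3]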
Also, as others have noted, it seems that your code is trying to do what os.walk already does...
Instead of the built-in os.walk and os.path.walk, I use something derived from this piece of code I found suggested elsewhere, which I had originally linked to but have since replaced with the inlined source:
import os
import stat

class DirectoryStatWalker:
    # a forward iterator that traverses a directory tree, and
    # returns the filename and additional file information

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while True:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop the next directory from the stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                st = os.stat(fullname)
                mode = st[stat.ST_MODE]
                if stat.S_ISDIR(mode) and not stat.S_ISLNK(mode):
                    self.stack.append(fullname)
                return fullname, st

if __name__ == '__main__':
    for file, st in DirectoryStatWalker("/usr/include"):
        print(file, st[stat.ST_SIZE])
It walks the directories recursively and is quite efficient and easy to read.
Try using the append method.
While googling for the same info, I found this question.
I am posting here the smallest, clearest code which I found at http://www.pythoncentral.io/how-to-traverse-a-directory-tree-in-python-guide-to-os-walk/ (rather than just posting the URL, in case of link rot).
The page has some useful info and also points to a few other relevant pages.
# Import the os module for the os.walk function
import os

# Set the directory you want to start from
rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
I've not tested this extensively yet, but I believe this will expand the os.walk generator, join dirnames to all the file paths, and flatten the resulting list, to give a straight-up list of concrete files in your search path:
import itertools
import os

def find(input_path):
    return itertools.chain(
        *list(
            list(os.path.join(dirname, fname) for fname in files)
            for dirname, _, files in os.walk(input_path)
        )
    )
I currently have a problem where I want to get a list of directories that are at the n-1 level. The structure looks somewhat like the diagram below (an image in the original post, not reproduced here), and I want a list of all the folders that are blue in color. The height of the tree, however, varies across the entire file system.
Because all the blue folders generally end their names with the string images, I have written the Python code below:
import os
import sys

def getDirList(dir):
    dirList = [x[0] for x in os.walk(dir)]
    return dirList

oldDirList = getDirList(sys.argv[1])
dirList = []

# Hack method for getting the folders
for i, dir in enumerate(oldDirList):
    if dir.endswith('images'):
        dirList.append(oldDirList[i] + '/')
Now, I do not want to use this method, since I want a general solution to this problem, either with Python or with a bash script whose result is read back into Python. Which one would be more efficient, in practice and in theory?
To rephrase what I think you're asking - you want to list all folders that do not contain any subfolders (and thus contain only non-folder files).
You can use os.walk() for this pretty easily. os.walk() returns an iterable of three-tuples (dirname, subdirectories, filenames). We can wrap a list comprehension around that output to select only the "leaf" directories from a file tree - just collect all the dirnames that have no subdirectories.
import os
dirList = [d[0] for d in os.walk('root/directory/path') if len(d[1]) == 0]
So another way to state your problem is that you want all folders that contain no subfolders? If that's the case, then you can make use of the fact that os.walk lists all the subfolders within a folder. If that list is empty, append it to dirList:
import os
import sys

def getDirList(dir):
    # x[1] contains the list of subfolders
    dirList = [(x[0], x[1]) for x in os.walk(dir)]
    return dirList

oldDirList = getDirList(sys.argv[1])
dirList = []

for i, dir in enumerate(oldDirList):
    if not dir[1]:  # if the list of subfolders is empty
        dirList.append(dir[0])

print(dirList)
Today I had a similar problem.
Try pathlib: https://docs.python.org/3/library/pathlib.html
from pathlib import PurePath
import os, sys

# os.getcwd() returns the path of red_dir if the script is inside it
gray_dir = PurePath(os.getcwd()).parents[1]  # .parents[1] returns the n-1 path
blue_things = os.listdir(gray_dir)
blue_dirs = []

for thing in blue_things:
    # use os.path.join so this works regardless of the OS path separator
    if os.path.isdir(os.path.join(str(gray_dir), thing)):  # make sure not to append files
        blue_dirs.append(thing)

print(blue_dirs)
I have a problem with the glob.glob function in Python.
This line works perfectly for me, getting all text files named 002 in the two folder levels below Models:
All_txt = glob.glob("C:\Users\EDV\Desktop\Peter\Models\*\*\002.txt")
But going into one specific subfolder and asking for the same:
All_txt = glob.glob('C:\Users\EDV\Desktop\Peter\Models\Texte\*\002.txt')
results in an empty list. Does anybody know what the problem is here (or know another function that achieves the same)? I double-checked the folder paths and that all the folders contain these text files.
Try putting an r in front of the string to make it a raw string: glob.glob(r'C:\Users\EDV\Desktop\Peter\Models\Texte\*\002.txt'). This way the backslashes aren't used for escaping the next character.
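To see why the escaping matters here (my own illustration): '\002' happens to be an octal escape, so without the r prefix it collapses into a single control character:

print(len('\002.txt'))   # 5 -- \002 became one control character
print(len(r'\002.txt'))  # 8 -- the raw string keeps the backslash literal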
You could also do it without glob like so:
import os

all_txt = []
root = r'C:\Users\EDV\Desktop\Peter\Models\Texte'
for d in os.listdir(root):
    abs_d = os.path.join(root, d)
    if os.path.isdir(abs_d):
        txt = os.path.join(abs_d, '002.txt')
        if os.path.isfile(txt):
            all_txt.append(txt)
I do atomistic modelling and use Python to analyze simulation results. To simplify work with a whole bunch of Python scripts used for different tasks, I decided to write a simple GUI to run them from.
I have a (rather complex) directory structure beginning from some root (say ~/calc), and I want to populate a wx.TreeCtrl control with the directories containing calculation results, preserving their structure. A folder contains results if it contains a file with the .EXT extension. What I try to do is walk through the dirs from the root and check whether each dir contains a .EXT file. When such a dir is reached, I add it and its ancestors to the tree:
def buildTree(self, rootdir):
    root = rootdir
    r = len(rootdir.split('/'))
    ids = {root : self.CalcTree.AddRoot(root)}
    for (dirpath, dirnames, filenames) in os.walk(root):
        for dirname in dirnames:
            fullpath = os.path.join(dirpath, dirname)
            if sum([s.find('.EXT') for s in filenames]) > -1 * len(filenames):
                ancdirs = fullpath.split('/')[r:]
                ad = rootdir
                for ancdir in ancdirs:
                    d = os.path.join(ad, ancdir)
                    ids[d] = self.CalcTree.AppendItem(ids[ad], ancdir)
                    ad = d
But this code ends up with many second-level nodes with the same name, and that's definitely not what I want. So I somehow need to check whether a node has already been added to the tree and, if so, attach the new node to the existing one, but I do not understand how this could be done. Could you please give me a hint?
Besides, the code contains 2 dirty hacks I'd like to get rid of:
I get the list of ancestor dirs by splitting the full path at '/' characters, which is Linux-specific;
I detect whether a .EXT file is in the directory by trying to find the extension in the strings of the filenames list, relying on the fact that s.find returns -1 if the substring is not found.
Is there a way to make these chunks of code more readable?
First of all, the hacks:

To get the path separator for whatever OS you're using, you can use os.sep.

Use str.endswith(), together with the fact that in Python the empty list [] evaluates to False:

if [file for file in filenames if file.endswith('.EXT')]:
    ...
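As a side note of mine, a slightly more idiomatic spelling of the same test avoids building the intermediate list and stops at the first match:

if any(f.endswith('.EXT') for f in filenames):
    ...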
In terms of getting everything nicely nested, you're best off doing it recursively. So the pseudocode would look something like the following. Please note this is just provided to give you an idea of how to do it; don't expect it to work as it is!
def buildTree(self, rootdir):
    rootId = self.CalcTree.AddRoot(rootdir)
    self.buildTreeRecursion(rootdir, rootId)

def buildTreeRecursion(self, dir, parentId):
    # Iterate over the files in dir
    for file in dirFiles:
        id = self.CalcTree.AppendItem(parentId, file)
        if file is a directory:
            self.buildTreeRecursion(file, id)
Hope this helps!
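For concreteness, here is a minimal runnable sketch of that recursion. This is my own version of the pseudocode above, assuming self.CalcTree is the same wx.TreeCtrl as in the question and Python >= 3.5 for os.scandir:

import os

def buildTree(self, rootdir):
    root_id = self.CalcTree.AddRoot(rootdir)
    self.buildTreeRecursion(rootdir, root_id)

def buildTreeRecursion(self, directory, parent_id):
    # add one tree node per entry, recursing into subdirectories
    for entry in os.scandir(directory):
        item_id = self.CalcTree.AppendItem(parent_id, entry.name)
        if entry.is_dir():
            self.buildTreeRecursion(entry.path, item_id)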