Count the number of folders in a directory and subdirectories - python

I've got a script that accurately tells me how many files are in a directory and its subdirectories. However, I'm also looking to identify how many folders there are within the same directory and its subdirectories...
My current script:
import os, getpass
from os.path import join, getsize
user = 'Copy of ' + getpass.getuser()
path = "C://Documents and Settings//" + user + "./"
folder_counter = sum([len(folder) for r, d, folder in os.walk(path)])
file_counter = sum([len(files) for r, d, files in os.walk(path)])
print ' [*] ' + str(file_counter) + ' Files were found and ' + str(folder_counter) + ' folders'
This code gives me the print out of: [*] 147 Files were found and 147 folders.
Meaning that the folder_counter isn't counting the right elements. How can I correct this so the folder_counter is correct?

Python 2.7 solution
For a single directory you can also do:
import os
print len(os.walk('dir_name').next()[1])
which does not walk the whole tree and returns the number of directories directly inside the 'dir_name' directory.
Python 3.x solution
Since many people just want a quick, working solution, I have edited my answer to include the exact working code for Python 3.x.
In Python 3.x we use the built-in next() function instead of the .next() method. Thus, the above snippet becomes:
import os
print(len(next(os.walk('dir_name'))[1]))
where dir_name is the directory whose immediate subdirectories you want to count.

I think you want something like:
import os
files = folders = 0
for _, dirnames, filenames in os.walk(path):
  # ^ this idiom means "we won't be using this value"
    files += len(filenames)
    folders += len(dirnames)
print "{:,} files, {:,} folders".format(files, folders)
Note that this only iterates over os.walk once, which will make it much quicker on paths containing lots of files and directories. Running it on my Python directory gives me:
30,183 files, 2,074 folders
which exactly matches what the Windows folder properties view tells me.
Note that your current code calculates the same number twice because the only change is renaming one of the returned values from the call to os.walk:
folder_counter = sum([len(folder) for r, d, folder in os.walk(path)])
#                         ^ here            ^ and here
file_counter = sum([len(files) for r, d, files in os.walk(path)])
#                       ^ vs. here       ^ and here
Despite that name change, you're counting the same value (i.e. in both it's the third of the three returned values that you're using)! Python functions do not know what names (if any at all; you could do print list(os.walk(path)), for example) the values they return will be assigned to, and their behaviour certainly won't change because of it. Per the documentation, os.walk returns a three-tuple (dirpath, dirnames, filenames), and the names you use for that, e.g. whether:
for foo, bar, baz in os.walk(...):
or:
for all_three in os.walk(..):
won't change that.
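As a minimal Python 3 sketch of the single-pass approach described above: the function name count_files_and_folders is only illustrative (it is not part of the os module), and the counts follow os.walk's defaults, so symlinked directories are counted as folders but not descended into.

```python
import os


def count_files_and_folders(top):
    """Return (file_count, folder_count) for everything below `top`."""
    files = folders = 0
    # Each iteration yields (dirpath, dirnames, filenames) for one directory.
    for _dirpath, dirnames, filenames in os.walk(top):
        files += len(filenames)
        folders += len(dirnames)
    return files, folders
```

Calling count_files_and_folders(path) and unpacking the result gives both numbers from one traversal, rather than walking the tree twice.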

If interested only in the number of folders in /input/dir (and not in the subdirectories):
import os
folder_count = 0 # type: int
input_path = "/path/to/your/input/dir" # type: str
for folders in os.listdir(input_path): # loop over all entries (files and folders)
    if os.path.isdir(os.path.join(input_path, folders)): # if it's a directory
        folder_count += 1 # increment counter
print("There are {} folders".format(folder_count))

>>> import os
>>> len(list(os.walk('folder_name')))
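According to the os.walk documentation, the first element of each yielded tuple, dirpath, visits every directory in the tree, so the length of the list is the number of directories. Note that this count includes folder_name itself, so subtract 1 to count only the folders below it.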

Related

Python: Identifying numerically named folders in a folder structure

I have the below function, which walks the root of a given directory, grabs all subdirectories, and places them into a list. This part works, sort of.
The objective is to determine the highest (largest number) numerically named folder.
Assuming that the folder contains only numerically named folders, and does not contain alphanumeric folders or files, I'm good. However, if a file or folder is present that is not numerically named, I encounter issues because the script seems to be collecting all subdirectories and files, and loads everything into the list.
I need to just find those folders whose naming is numeric, and ignore anything else.
Example folder structure for c:\Test
\20200202\
\20200109\
\20190308\
\Apples\
\Oranges\
New Document.txt
This works to walk the directory but puts everything in the list, not just the numeric subfolders.
#Example code
import os
from pprint import pprint
files=[]
MAX_DEPTH = 1
folders = ['C:\\Test']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        for subdirname in dirs:
            files.append(os.path.join(subdirname))
            #files.append(os.path.join(root, subdirname)) will give full directory
            #print("there are", len(files), "files in", root) will show counts of files per directory
        if root.count(os.sep) - stuff.count(os.sep) == MAX_DEPTH - 1:
            del dirs[:]
pprint(max(files))
Current Result of max(files):
New Document.txt
Desired Output:
20200202
What I have tried so far:
I've tried catching each element before I add it to the list, seeing if the string of the subdirname can be converted to int, and then adding it to the list. This fails to convert the numeric subdirnames to an int, and somehow (I don't know how) the New Document.txt file gets added to the list.
files=[]
MAX_DEPTH = 1
folders = ['C:\\Test']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        for subdirname in dirs:
            try:
                subdirname = int(subdirname)
                print("Found subdir named " + subdirname + " type: " + type(subdirname))
                files.append(os.path.join(subdirname))
            except:
                print("Error converting " + str(subdirname) + " to integer")
                pass
            #files.append(os.path.join(root, subdirname)) will give full directory
            #print("there are", len(files), "files in", root) will show counts of files per directory
        if root.count(os.sep) - stuff.count(os.sep) == MAX_DEPTH - 1:
            del dirs[:]
return (input + "/" + max(files))
I've also tried appending everything to the list and then creating a second list (ie, without the try/except) using the below, but I wind up with an empty list. I'm not sure why, and I'm not sure where/how to start looking. Using 'type' on the list before applying the following shows that everything in the list is a str type.
list2 = [x for x in files if isinstance(x,int) and not isinstance(x,bool)]
I'm going to go ahead and answer my own question here:
Changing the method entirely helped, and made it significantly faster, and simpler.
import os
#the find_newest_date function looks for a folder with the largest number and assumes that is the newest data
def find_newest_date(input):
    intlistfolders = []
    list_subfolders_with_paths = [f.name for f in os.scandir(input) if f.is_dir()]
    for x in list_subfolders_with_paths:
        try:
            intval = int(x)
            intlistfolders.append(intval)
        except ValueError: # the folder name is not an integer, so skip it
            pass
    return (input + "/" + str(max(intlistfolders)))
Explanation:
scandir is 3x faster than walk (see: directory performance).
scandir also allows the use of f.name to pull out just the folder names, or f.path to get paths.
So, use scandir to load up the list with all the subdirs.
Iterate over the list, and try to convert each value to an integer. I don't know why it wouldn't work in the earlier example, but it works in this case.
The first part of the try statement converts to an integer.
If conversion fails, the except clause is run, and 'pass' is essentially a null statement. It does nothing.
Then, finally, join the input directory with the string representation of the maximum numeric value (ie most recently dated folder in this case).
The function is called with:
folder_named_path = find_newest_date("C:\\Test") or something similar.
Try matching dirs with a regular expression. num = r"[0-9]+" is your regular expression. Something like re.findall(num, subdirname) returns a list of the substrings that consist of one or more digits. To require that the whole name is numeric, use re.fullmatch(num, subdirname) instead.
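A hedged sketch of how the regular-expression idea could be combined with os.scandir: the function name find_newest_numeric_folder and the decision to return None when nothing matches are assumptions for illustration, not something from the posts above.

```python
import os
import re

# One or more ASCII digits; used with fullmatch so the whole name must be numeric.
NUMERIC_NAME = re.compile(r"[0-9]+")


def find_newest_numeric_folder(parent):
    """Return the path of the numerically largest all-digit subfolder, or None."""
    numeric_names = [
        entry.name
        for entry in os.scandir(parent)
        if entry.is_dir() and NUMERIC_NAME.fullmatch(entry.name)
    ]
    if not numeric_names:
        return None
    # Compare as integers so "9" and "10" order correctly; keep the original name.
    return os.path.join(parent, max(numeric_names, key=int))
```

Filtering on entry.is_dir() keeps files such as New Document.txt out of the list, and using key=int avoids converting the names and then converting them back to strings.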

Python code for renaming files in directory not working

I'm looking to rename my files from time to time in numerical order, so for example, 1.png, 2.png, 3.png, etc.
I wrote this code in an attempt to do so. For now it simply prints what the files would be named, to make sure it is right:
import os
os.chdir('/Users/hasso/Pictures/Digital Art/saved images for vid/1')
for f in os.listdir():
    f_name=1
    f_ext= '.png'
    print('{}{}'.format(f_name, f_ext))
How would I go about solving this?
You keep on getting 1.png suggested as the new name because you always set f_name = 1 inside the loop. Initialize it with 1 before the loop, and then increment it as you are renaming each file instead.
A few additional points:
You don't need os.chdir: the default for os.listdir is . (the current dir), but you can also supply the target path to it directly.
When dealing with user home directories, it's nice if you don't have to hardcode it. os.path.expanduser retrieves this value for you.
When iterating over lists that you possibly want to change, it's best to make a separate list of only the items that you want to change. So, rather than looping over all files and changing some of them, make it easier by first gathering all items that you want to change. In your case, make a list of only .png files and then loop over this list.
(Rather advanced) os.rename will throw an error if you try to rename to an already existing name. What I do below is check if the next name to be used is already in the list, and if it is, increase the f_name number.
import os
yourPath = os.path.expanduser('~')+'/Pictures/Digital Art/saved images for vid/1'
filelist = []
for f in os.listdir(yourPath):
    if f.lower().endswith('.png'):
        filelist.append (f)
f_name = 1
for f in filelist:
    while True:
        next_name = str(f_name)+'.png'
        if not next_name in filelist:
            break
        f_name += 1
    print ('Renaming {} to {}'.format(yourPath+'/'+f, next_name))
    # uncomment to actually rename; the target needs the folder path as well
    # os.rename (yourPath+'/'+f, yourPath+'/'+next_name)
    f_name += 1
I'm not sure why you need to use os.chdir() to change directories, when you can just pass the path straight to os.listdir(). To rename files, you can use os.rename(). You also need to increment the counter for the file names, since your current code keeps f_name equal to 1 on each iteration. You need to keep the counter outside the loop and increment it within the loop. This is where you can use enumerate(), since it gives you the index along with each item.
Basic version:
from os import listdir
from os import rename
from os.path import join
path = "path_to_images"
for i, f in enumerate(listdir(path), start=1):
    rename(join(path, f), join(path, str(i) + '.png'))
You can get the full paths using os.path.join(), since os.listdir() doesn't include the full path of the file. The above code is also not very robust as it renames all files, and doesn't handle renaming already existent .png files.
Advanced version:
from os import listdir
from os import rename
from os.path import join
from os.path import exists
path = "path_to_images"
extension = '.png'
fname = 1
for f in listdir(path):
    if f.endswith(extension):
        while exists(join(path, str(fname) + extension)):
            fname += 1
        rename(join(path, f), join(path, str(fname) + extension))
        fname += 1
This uses os.path.exists() to check whether the target name already exists.
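One way to combine the suggestions from both answers is to build the list of renames first and apply it in a second step, so the plan can be printed and checked before any file is touched. The sketch below is only an illustration: plan_renames and apply_renames are hypothetical helper names, and it assumes that files already named with a plain number (such as 1.png) should be left alone.

```python
import os


def plan_renames(directory, extension=".png"):
    """Return (old_name, new_name) pairs; nothing is renamed here."""
    existing = set(os.listdir(directory))
    # Only files with the extension whose stem is not already a plain number.
    targets = sorted(
        name
        for name in existing
        if name.lower().endswith(extension)
        and not name[: -len(extension)].isdigit()
    )
    plan = []
    number = 1
    for old_name in targets:
        # Skip numbers whose file name is already taken in the directory.
        while "{}{}".format(number, extension) in existing:
            number += 1
        new_name = "{}{}".format(number, extension)
        plan.append((old_name, new_name))
        existing.add(new_name)
        number += 1
    return plan


def apply_renames(directory, plan):
    """Carry out a plan produced by plan_renames."""
    for old_name, new_name in plan:
        os.rename(os.path.join(directory, old_name),
                  os.path.join(directory, new_name))
```

Printing the result of plan_renames(path) plays the same role as the print call in the question; apply_renames is called only once the plan looks right.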

Want to create a list of directories at the n-1 level (folders that do not contain any subfolders) with either bash or python

I currently have a problem where I want to get a list of directories that are at an n-1 level. The structure looks somewhat like the diagram below, and I want a list of all the folders that are blue in color. The height of the tree however, varies across the entire file system.
Due to the fact that all the folders that are blue, generally end their name with the string images, I have written the code in Python below:
import os
import sys
def getDirList(dir):
    dirList = [x[0] for x in os.walk(dir)]
    return dirList
oldDirList = getDirList(sys.argv[1])
dirList = []
# Hack method for getting the folders
for i, dir in enumerate(oldDirList):
    if dir.endswith('images'):
        dirList.append(oldDirList[i] + '/')
Now, I do not want to use this method, since I want a general solution to this problem, with Python or bash scripting and then read the bash script result into Python. Which one would be more efficient in practice and theoretically?
To rephrase what I think you're asking - you want to list all folders that do not contain any subfolders (and thus contain only non-folder files).
You can use os.walk() for this pretty easily. os.walk() returns an iterable of three-tuples (dirname, subdirectories, filenames). We can wrap a list comprehension around that output to select only the "leaf" directories from a file tree - just collect all the dirnames that have no subdirectories.
import os
dirList = [d[0] for d in os.walk('root/directory/path') if len(d[1]) == 0]
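The same list comprehension can be wrapped in a small function with unpacked names, which may be easier to read and reuse. This is a sketch; leaf_directories is an illustrative name, and note that the top directory itself is returned if it has no subfolders.

```python
import os


def leaf_directories(top):
    """Return paths of directories under `top` that contain no subdirectories."""
    return [
        dirpath
        for dirpath, dirnames, _filenames in os.walk(top)
        if not dirnames  # an empty list means this directory has no subfolders
    ]
```

Because os.walk already visits every directory once, no second pass or shell script is needed to find the leaves.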
So another way to state your problem is that you want all folders that contain no subfolders? If that's the case then you can make use of the fact that os.walk lists all the subfolders within a folder. If that list is empty, then append the folder to dirList
import os
import sys
def getDirList(dir):
    # x[1] contains the list of subfolders
    dirList = [(x[0], x[1]) for x in os.walk(dir)]
    return dirList
oldDirList = getDirList(sys.argv[1])
dirList = []
for i, dir in enumerate(oldDirList):
    if not dir[1]: # if the list of subfolders is empty
        dirList.append(dir[0])
print(dirList)
I had a similar problem today.
Try pathlib: https://docs.python.org/3/library/pathlib.html
from pathlib import PurePath
import os, sys
#os.getcwd() returns path of red_dir if script is inside
gray_dir = PurePath(os.getcwd()).parents[1] # .parents[1] returns n-1 path
blue_things = os.listdir(gray_dir)
blue_dirs = []
for thing in blue_things:
    if os.path.isdir(os.path.join(str(gray_dir), thing)): # make sure not to append files
        blue_dirs.append(thing)
print(blue_dirs)

Change image names using os.walk to include parent directory names

I would like to rename images based on part of the name of the folder the images are in and iterate through the images. I am using os.walk and I was able to rename all the images in the folders but could not figure out how to use the letters to the left of the first hyphen in the folder name as part of the image name.
Folder name: ABCDEF - THIS IS - MY FOLDER - NAME
Current image names in folder:
dsc_001.jpg
dsc_234.jpg
dsc_123.jpg
Want to change to show like this:
ABCDEF_1.jpg
ABCDEF_2.jpg
ABCDEF_3.jpg
What I have is this, but I am not sure why I am unable to split the filename by the hyphen:
import os
from os.path import join
path = r'C:\folderPath'
i = 1
for root, dirs, files in os.walk(path):
    for image in files:
        prefix = files.split(' - ')[0]
        os.rename(os.path.join(path, image), os.path.join(path, prefix + '_'
        + str(i)+'.jpg'))
        i = i+1
Okay, I've re-read your question and I think I know what's wrong.
1.) The os.walk() iterable is recursive, i.e. if you use os.walk('C:\\'), it will loop through all the folders and find all the files under the C drive. Now I'm not sure if your C:\folderPath has any sub-folders in it. If it does, and any of those folders or files don't follow the same naming convention as C:\folderPath, your code is going to have a bad time.
2.) When you iterate through files, you are split()ing the wrong object. Your question states you want to split the folder name, but your code is splitting the files iterable, which is a list of all the files under the current iteration directory. That doesn't accomplish what you want. Depending on whether your ABCDEF folder is C:\folderPath itself or a sub folder within it, you'll need to code differently.
3.) you have imported join from os.path but you still end up calling the full name os.path.join() anyways, which is redundant. Either just import os and call os.path.join() or just with your current imports, just join().
Having said all of that, here are my edits:
Answer 1:
If your ABCDEF is the assigned folder
import os
from os.path import join
path = r'C:\ABCDEF - THIS - IS - MY - FOLDER - NAME'
for root, dirs, files in os.walk(path):
    folder = root.split("\\")[-1] # This gets you the current folder's name
    for i, image in enumerate(files):
        new_image = "{0}_{1}.jpg".format(folder.split(' - ')[0], i + 1)
        os.rename(join(root, image), join(root, new_image)) # root is the folder the files are actually in
    break # if you have sub folders that follow the SAME structure, then remove this break. Otherwise, keep it here so your code stop after all the files are updated in your parent folder.
Answer 2:
Assuming your ABCDEF's are all sub folders under the assigned directory, and all of them follow the same naming convention.
import os
from os.path import join
path = r'C:\parentFolder' # The folder that has all the sub folders that are named ABCDEF...
for i, (root, dirs, files) in enumerate(os.walk(path)):
    if i == 0: continue # skip the parentFolder as it doesn't follow the same naming convention
    folder = root.split("\\")[-1] # This gets you the current folder's name
    for j, image in enumerate(files):
        new_image = "{0}_{1}.jpg".format(folder.split(' - ')[0], j + 1)
        os.rename(join(root, image), join(root, new_image)) # the files live in root, not in path
Note:
If your scenario doesn't fall under either of these, please make it clear what your folder structure is (a sample including all sub folders and sub files). Remember, consistency is key in determining how your code should work. If it's inconsistent, your best bet is use Answer 1 on each target folder separately.
Changes:
1.) You can get an incremental index without doing a i += 1. enumerate() is a great tool for iterables that also give you the iteration number.
2.) Your split() should be operated on the folder name instead of files (an iterable). In your case, image is the actual file name, and files is the list of files in the current iteration directory.
3.) Use of str.format() function to make your new file format easier to read.
4.) You'll note the use of split("\\") instead of split(r"\"), and that's because a raw string literal cannot end in a single backslash.
This should now work. I ended up doing a lot more research than expected such as how to handle the os.walk() properly in both scenarios. For future reference, a little google search goes a long way. I hope this finally answers your question. Remember, doing your own research and clarity in demonstrating your problem will get you more efficient answers.
Bonus: if you have python 3.6+, you can even use f strings for your new file name, which ends up looking really cool:
new_image = f"{folder.split(' - ')[0]}_{i+1}.jpg"
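For comparison, here is a sketch of the single-folder case written with pathlib, which avoids splitting paths on backslashes by hand. The function name rename_images_by_folder is hypothetical, and the sketch assumes the folder does not already contain files named PREFIX_1.jpg, PREFIX_2.jpg and so on, since the behaviour of a rename onto an existing file differs between platforms.

```python
from pathlib import Path


def rename_images_by_folder(folder, suffix=".jpg"):
    """Rename `suffix` files in `folder` to PREFIX_1, PREFIX_2, ... and return the new names."""
    folder = Path(folder).resolve()
    # Text to the left of the first " - " in the folder name, e.g. "ABCDEF".
    prefix = folder.name.split(" - ")[0]
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )
    new_names = []
    for number, image in enumerate(images, start=1):
        target = folder / "{}_{}{}".format(prefix, number, suffix)
        image.rename(target)
        new_names.append(target.name)
    return new_names
```

To cover several sub folders that follow the same naming convention, the function could be called once per sub folder.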

Python automated file names

I want to automate the file name used when saving a spreadsheet using xlwt. Say there is a sub directory named Data in the folder the python program is running. I want the program to count the number of files in that folder (# = n). Then the filename must end in (n+1). If there are 0 files in the folder, the filename must be Trial_1.xls. This file must be saved in that sub directory.
I know the following:
import xlwt, os, os.path
n = len([name for name in os.listdir('.') if os.path.isfile(name)])
counts the number of files in the same folder.
a = n + 1
filename = "Trial_" + str(a) + ".xls"
book.save(filename)
this will save the properly named file into the same folder.
My question is how do I extend this to a sub directory? Thanks.
In os.listdir('.'), the . refers to the current working directory, which is usually the directory the script is run from. Change the . to point to the subdirectory you are interested in.
You should give it the full path name from the root of your file system; otherwise it will be relative to the directory from where the script is executed. This might not be what you want; especially if you need to refer to the sub directory from another program.
You also need to provide the full path to the filename variable; which would include the sub directory.
To make life easier, just set the full path to a variable and refer to it when needed.
TARGET_DIR = '/home/me/projects/data/'
n = sum(1 for f in os.listdir(TARGET_DIR) if os.path.isfile(os.path.join(TARGET_DIR, f)))
new_name = "{}Trial_{}.xls".format(TARGET_DIR,n+1)
You actually want glob:
from glob import glob
DIR = 'some/where/'
existing_files = glob(DIR + '*.xls')
filename = DIR + 'stuff--%d--stuff.xls' % (len(existing_files) + 1)
Since you said Burhan Khalid's answer "Works perfectly!" you should accept it.
I just wanted to point out a different way to compute the number. The way you are doing it works, but if we imagine you were counting grains of sand or something, it would use way too much memory. Here is a more direct way to get the count:
n = sum(1 for name in os.listdir('.') if os.path.isfile(name))
For every qualifying name, we get a 1, and all these 1's get fed into sum() and you get your count.
Note that this code uses a "generator expression" instead of a list comprehension. Instead of building a list, taking its length, and then discarding the list, the above code just makes an iterator that sum() iterates to compute the count.
It's a bit sleazy, but there is a shortcut we can use: sum() will accept boolean values, and will treat True as a 1, and False as a 0. We can sum these.
# sum will treat Boolean True as a 1, False as a 0
n = sum(os.path.isfile(name) for name in os.listdir('.'))
This is sufficiently tricky that I probably would not use this without putting a comment. But I believe this is the fastest, most efficient way to count things in Python.
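Putting the generator-expression count together with the sub directory advice from the earlier answer, a sketch might look like the following. The function name next_trial_path and its default arguments are assumptions for illustration; the returned path would then be passed to book.save().

```python
import os


def next_trial_path(target_dir, prefix="Trial_", extension=".xls"):
    """Return the path for the next numbered file inside `target_dir`."""
    # Count regular files only; sub directories are ignored.
    n = sum(
        1
        for name in os.listdir(target_dir)
        if os.path.isfile(os.path.join(target_dir, name))
    )
    return os.path.join(target_dir, "{}{}{}".format(prefix, n + 1, extension))
```

With an empty Data folder, next_trial_path("Data") ends in Trial_1.xls. Note that the number is based on how many files exist, so deleting an earlier file can make a later call reuse a name that is already taken.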
