I need to iterate over a folder tree. I have to check each subfolder, which looks like this:
moduleA-111-date
moduleA-112-date
moduleA-113-date
moduleB-111-date
moduleB-112-date
etc.
I figured out how to iterate over a folder tree. I can also use stat with mtime to get the date of the folder, which seems easier than parsing the date out of the name.
How do I single out modules with the same prefix (such as "moduleA") and compare their mtimes so I can delete the oldest?
Since you have no code, I assume that you're looking for design help. I'd lead my students to something like:
Make a list of the names
From each name, find the prefix, such as "moduleA". Put those in a set.
For each prefix in the set
Find all names with that prefix; put these in a temporary list
Sort this list.
For each file in this list *except* the last (newest)
delete the file
Does this get you moving?
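In code, a minimal sketch of that outline might look like this (assuming the dated folders sit directly under one root directory, and using st_mtime as the sort key, as you suggested; the root path is hypothetical):
import os
import shutil

root = "/path/to/modules"  # hypothetical root directory

# 1. list the folder names and 2. collect the prefixes into a set
names = [n for n in os.listdir(root) if os.path.isdir(os.path.join(root, n))]
prefixes = {n.split("-", 1)[0] for n in names}

# 3. for each prefix, sort its folders by mtime and delete all but the newest
for prefix in prefixes:
    group = [n for n in names if n.split("-", 1)[0] == prefix]
    group.sort(key=lambda n: os.stat(os.path.join(root, n)).st_mtime)
    for old in group[:-1]:
        shutil.rmtree(os.path.join(root, old))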
I'm posting the code (answer) here. I suppose my question wasn't clear, since I'm getting downvotes, but the solution wasn't as straightforward as I thought. I'm sure the code could use some fine-tuning, but it gets the job done.
#!/usr/bin/python
import os
import sys
import fnmatch
import glob
import re
import shutil


##########################################################################################################
# Remove the directory
def remove(path):
    try:
        shutil.rmtree(path)
        print "Deleted : %s" % path
    except OSError as e:
        print e
        print "Unable to remove folder: %s" % path


##########################################################################################################
# This function will look for the .sh files in a given path and return them as a list.
def searchTreeForSh(path):
    full_path = path + '*.sh'
    listOfFolders = glob.glob(full_path)
    return listOfFolders


##########################################################################################################
# Takes the full paths to the .sh files and returns a list of folder names (prefixes) to be acted upon.
# listOfScripts is a list of full paths to .sh files.
def getFolderNames(listOfScripts):
    listOfFolders = []
    folderNames = []
    for foldername in listOfScripts:
        listOfFolders.append(os.path.splitext(foldername)[0])
    for folders in listOfFolders:
        folder = folders.split('/')
        folderNames.append(folder[-1])
    folderNames.sort()
    return folderNames


##########################################################################################################
def minmax(items):
    # returns the newest (largest) st_mtime
    return max(items)


##########################################################################################################
# This function checks for the latest entry in the tuple provided, and then sends everything
# to the remove function except that last (newest) entry.
def sortBeforeDelete(statDir, t):
    timeNotToDelete = minmax(statDir)
    for entry in t:
        if entry[1] == timeNotToDelete:
            continue
        remove(entry[0])


##########################################################################################################
# Loops over the entries of the given path (via os.listdir), keeps only the directories,
# and matches each one against a prefix from folderNames.
def coolFunction(folderNames, path):
    localPath = os.listdir(path)
    for folder in folderNames:
        t = ()        # a tuple acting as sort of a dict: it holds (folder path, st_mtime) pairs
        statDir = []  # a list holding the st_mtime of every matching folder
        for item in localPath:
            if os.path.isdir(os.path.join(path, item)):
                if re.search(folder, item):
                    mtime = os.stat(os.path.join(path, item))
                    statDir.append(mtime.st_mtime)
                    # the trailing "," makes t a tuple of pairs instead of a flat tuple
                    t = t + ((os.path.join(path, item), mtime.st_mtime),)
        if t == ():
            continue
        sortBeforeDelete(statDir, t)


##########################################################################################################
def main(path):
    dirs = os.listdir(path)
    for component in dirs:
        if os.path.isdir(os.path.join(path, component)):
            newPath = path + '/' + component + '/'
            listOfFolders = searchTreeForSh(newPath)
            folderNames = getFolderNames(listOfFolders)
            coolFunction(folderNames, newPath)


##########################################################################################################
if __name__ == "__main__":
    main(sys.argv[1])
Related
I have a directory with a large number of files that I want to move into folders based on part of the file name. My list of files looks like this:
001-020-012B-B.nc
001-022-151-A.nc
001-023-022-PY-T1.nc.nc
001-096-016B-A.nc
I want to move the files I have into separate folders based on the first part of the file name (001-096-016B, 001-023-022, 001-022-151). The first part of the file name always has the same number of digits and is always in 3 parts separated by a hyphen '-'.
The folders are named like this: \oe-xxxx\xxxx\xxxx\001-Disc-PED\020-Rotor-parts-1200.
So for example, this file should be placed in the above folder, based on the folder name (the numbers):
001-020-012B-B.nc
File path divided into columns to show where the above file has to be moved:
(001)-Disc-PED\(020)-Rotor-parts-1200.
Therefore:
(001)-Disc-PED\(020)-Rotor-parts-1200 (001)-(020)-012B-B.nc
This is what I have tried from looking online but it does not work:
My thinking is I want to loop through the folders and look for matches.
import os
import glob
import itertools
import re

# Source file
sourcefile = r'C:\Users\cah\Desktop\000Turning'
destinationPath = r'C:\Users\cah\Desktop\08-CAM'

# Separation
dirs = glob.glob('*-*')
# Every file with file extension .nc
files = glob.glob('*.nc')

for root, dirs, files in os.walk(sourcefile):
    for file in files:
        if file.endswith(".nc"):
            first3Char = str(file[0:3])
            last3Char = str(file[4:7])
            for root in os.walk(destinationPath):
                first33CharsOfRoot = str(root[0:33])
                cleanRoot1 = str(root).replace("[", "")
                cleanRoot2 = str(cleanRoot1).replace("]", "")
                cleanRoot3 = str(cleanRoot2).replace(")", "")
                cleanRoot4 = str(cleanRoot3).replace("'", "")
                cleanRoot5 = str(cleanRoot4).replace(",", "")
                firstCharOfRoot = re.findall(r'(.{3})\s*$', str(cleanRoot5))
                print(firstCharOfRoot == first3Char)
                if (firstCharOfRoot == first3Char):
                    print("Hello")
                    for root in os.walk(destinationPath):
                        print(os.path.basename(root))
                        # if(os.path)
I realized that I should not look for the last 3 characters in the path; it is the first numbers (001, etc.) at the beginning of the name that I need to match to find the first path to go to.
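For reference, those leading parts can be pulled straight out of a filename with a plain split, which avoids the string cleanup above (shown with one of the sample names):
basename = "001-020-012B-B.nc"
part1, part2, rest = basename.split("-", 2)
print(part1, part2)  # 001 020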
EDIT:
import os
import glob
import itertools
import re

# Source file
sourcefile = r'C:\Users\cah\Desktop\000Turning'
destinationPath = r'C:\Users\cah\Desktop\08-CAM'

# Separation
dirs = glob.glob('*-*')
# Every file with file extension .nc
files = glob.glob('*.nc')

for root, dirs, files in os.walk(sourcefile):
    for file in files:
        if file.endswith(".nc"):
            first3Char = str(file[0:3])
            last3Char = str(file[4:7])
            for root in os.walk(destinationPath):
                cleanRoot1 = str(root).replace("[", "")
                cleanRoot2 = str(cleanRoot1).replace("]", "")
                cleanRoot3 = str(cleanRoot2).replace(")", "")
                cleanRoot4 = str(cleanRoot3).replace("'", "")
                cleanRoot5 = str(cleanRoot4).replace(",", "")
                firstCharOfRoot = re.findall(r'^(?:[^\\]+\\\\){5}(\d+).*$', str(cleanRoot5))
                secondCharOfRoot = re.findall(r'^(?:[^\\]+\\\\){6}(\d+).*$', str(cleanRoot5))
                firstCharOfRootCleaned = ''.join(firstCharOfRoot)
                secondCharOfRoot = ''.join(secondCharOfRoot)
                cleanRoot6 = str(cleanRoot5).replace("(", "")
                if (firstCharOfRootCleaned == str(first3Char) & secondCharOfRoot == str(last3Char)):
                    print("BINGOf")
                    # for root1 in os.walk(cleanRoot6):
Solution
There is an improved solution in the next section, but let's walk through the straightforward solution first.
First, get the complete list of subfolders.
all_folders_splitted = [os.path.split(f)
                        for f in glob.iglob(os.path.join(destinationPath, "**"), recursive=True)
                        if os.path.isdir(f)]
Then, use a function on each of your files to find its matching folder, or build a new folder path if one doesn't exist. This function, called find_folder(), is included in the rest of the script:
import os
import glob
import shutil

sourcefile = r'C:\Users\cah\Desktop\000Turning'
destinationPath = r'C:\Users\cah\Desktop\08-CAM'

all_folders_splitted = [os.path.split(f)
                        for f in glob.iglob(os.path.join(destinationPath, "**"), recursive=True)
                        if os.path.isdir(f)]


# It will create and return a new directory if no directory matches
def find_folder(part1, part2):
    matching_folders1 = [folder for folder in all_folders_splitted
                         if os.path.split(folder[0])[-1].startswith(part1)]
    matching_folder2 = None
    for matching_folder2 in matching_folders1:
        if matching_folder2[-1].startswith(part2):
            return os.path.join(*matching_folder2)

    # Whole new folder tree
    if matching_folder2 is None:
        dest = os.path.join(destinationPath, part1, part2)
        os.makedirs(dest)
        return dest

    # Inside the already existing folder part "1"
    dest = os.path.join(matching_folder2[0], part2)
    os.makedirs(dest)
    return dest


# All the files you want to move
files_gen = glob.iglob(os.path.join(sourcefile, "**", "*-*-*.nc"), recursive=True)

for file in files_gen:
    # Split on the first two "-"
    basename = os.path.basename(file)
    splitted = basename.split("-", 2)
    # Find the destination folder; create it if necessary
    destination_folder = find_folder(splitted[0], splitted[1])
    # Copy the file
    shutil.copy2(file, os.path.join(destination_folder, basename))
Improved solution
In case you have a large number of files, it could be detrimental to "split and match" every folder at each iteration.
We can store the folder found for a given pattern in a dictionary. The dictionary is updated when a new pattern appears; otherwise it returns the previously found folder.
import os
import glob
import shutil

sourcefile = r'C:\Users\cah\Desktop\000Turning'
destinationPath = r'C:\Users\cah\Desktop\08-CAM'

# Global dictionary to store folder paths, keyed by pattern
pattern_match = dict()

all_folders_splitted = [os.path.split(f)
                        for f in glob.iglob(os.path.join(destinationPath, "**"), recursive=True)
                        if os.path.isdir(f)]


def find_folder(part1, part2):
    current_key = (part1, part2)
    if current_key in pattern_match:
        # Already found previously.
        # We just return the folder path, stored as the value.
        return pattern_match[current_key]

    matching_folders1 = [folder for folder in all_folders_splitted
                         if os.path.split(folder[0])[-1].startswith(part1)]
    matching_folder2 = None
    for matching_folder2 in matching_folders1:
        if matching_folder2[-1].startswith(part2):
            dest = os.path.join(*matching_folder2)
            # Update the dictionary
            pattern_match[current_key] = dest
            return dest

    if matching_folder2 is None:
        dest = os.path.join(destinationPath, part1, part2)
    else:
        dest = os.path.join(matching_folder2[0], part2)

    # Update the dictionary
    pattern_match[current_key] = dest
    os.makedirs(dest, exist_ok=True)
    return dest


# All the files you want to move
files_gen = glob.iglob(os.path.join(sourcefile, "**", "*-*-*.nc"), recursive=True)

for file in files_gen:
    # Split on the first two "-"
    basename = os.path.basename(file)
    splitted = basename.split("-", 2)
    # Find the destination folder; create it if necessary
    destination_folder = find_folder(splitted[0], splitted[1])
    # Copying the file
    shutil.copy2(file, os.path.join(destination_folder, basename))
This updated solution is more efficient (especially when many files share the same folder), and you could also make use of the dictionary later if you save it.
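If you do want to keep that mapping around, here is a minimal sketch using json; the output filename is hypothetical, and the tuple keys are joined into strings because JSON objects only accept string keys:
import json

# pattern_match maps (part1, part2) tuples to destination folder paths
with open("pattern_match.json", "w") as fh:
    json.dump({"-".join(key): path for key, path in pattern_match.items()}, fh, indent=2)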
I have a folder with some 1500 Excel files. The format of each file name is something like this:
0d20170101abcd.xlsx
1d20170101ef.xlsx
0d20170104g.xlsx
0d20170109hijkl.xlsx
1d20170109mno.xlsx
0d20170110pqr.xlsx
The first character of the file name is either '0' or '1', followed by 'd', followed by the date the file was created, followed by the customer id (abcd, ef, g, hijkl, mno, pqr). The customer id has no fixed length and can vary.
I want to create a folder for each unique date (the folder name should be the date) and move the files with the same date into a single folder.
So for the above example, 4 folders (20170101, 20170104, 20170109, 20170110) have to be created, with the files of each date copied into their respective folders.
Is there any way to do this in Python? Sorry for not posting any sample code; I have no idea how to start.
Try this out:
import os
import re

root_path = 'test'


def main():
    # Keep track of directories already created
    created_dirs = []

    # Go through all stuff in the directory
    file_names = os.listdir(root_path)
    for file_name in file_names:
        process_file(file_name, created_dirs)


def process_file(file_name, created_dirs):
    file_path = os.path.join(root_path, file_name)

    # Check if it's not itself a directory - safe guard
    if os.path.isfile(file_path):
        file_date, user_id, file_ext = get_file_info(file_name)

        # Check we could parse the infos of the file
        if file_date is not None \
                and user_id is not None \
                and file_ext is not None:
            # Make sure we haven't already created the directory
            if file_date not in created_dirs:
                create_dir(file_date)
                created_dirs.append(file_date)

            # Move the file and rename it
            os.rename(
                file_path,
                os.path.join(root_path, file_date, '{}.{}'.format(user_id, file_ext)))

            print file_date, user_id


def create_dir(dir_name):
    dir_path = os.path.join(root_path, dir_name)
    if not os.path.exists(dir_path) or not os.path.isdir(dir_path):
        os.mkdir(dir_path)


def get_file_info(file_name):
    match = re.search(r'[01]d(\d{8})([\w+-]+)\.(\w+)', file_name)
    if match:
        return match.group(1), match.group(2), match.group(3)
    return None, None, None


if __name__ == '__main__':
    main()
Note that depending on the names of your files, you might want to adjust the regex I use, i.e. [01]d(\d{8})([\w+-]+), in the future (you can play with it to see how each part matches)...
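For instance, run against one of the sample names, the regex captures the date, the customer id and the extension:
import re

match = re.search(r'[01]d(\d{8})([\w+-]+)\.(\w+)', '0d20170101abcd.xlsx')
print(match.groups())  # ('20170101', 'abcd', 'xlsx')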
Check this code.
import os

files = list(x for x in os.listdir('.') if os.path.isfile(x))
for i in files:
    d = i[2:10]   # get the date from the filename
    n = i[10:]    # get the new filename
    if os.path.isdir(d):
        os.rename(os.path.join(os.getcwd(), i), os.path.join(os.getcwd(), d, i))
    else:
        os.mkdir(d)
        os.rename(os.path.join(os.getcwd(), i), os.path.join(os.getcwd(), d, i))
Here's the repl link.
Try this out:
import os, shutil

filepath = "your_file_path"

files = list(x for x in os.listdir(filepath) if x.endswith(".xlsx"))
dates = list(set(x[2:10] for x in files))

# Create one folder per unique date
for j in dates:
    os.makedirs(os.path.join(filepath, j), exist_ok=True)

# Move each file into the folder for its date, renamed to the customer id
for i in files:
    cid = i[10:]
    for j in dates:
        if j in i:
            shutil.move(os.path.join(filepath, i), os.path.join(filepath, j, cid))
            break
Is there an inbuilt module to search for a file in the current directory, as well as all the super-directories?
Without the module, I'll have to list all the files in the current directory, search for the file in question, and recursively move up if the file isn't present. Is there an easier way to do this?
Well, this is not so well implemented, but it will work.
Use listdir to get the list of files/folders in the current directory, then search that list for your file.
If the file exists, the loop breaks; if it doesn't, the code moves to the parent directory using os.path.dirname and calls listdir again.
The parent dir of "/" is returned as "/", so when cur_dir == parent_dir the loop breaks (i.e. the root has been reached).
import os
import os.path

file_name = "test.txt"   # file to be searched
cur_dir = os.getcwd()    # dir from where search starts; can be replaced with any path

while True:
    file_list = os.listdir(cur_dir)
    parent_dir = os.path.dirname(cur_dir)
    if file_name in file_list:
        print "File Exists in: ", cur_dir
        break
    else:
        if cur_dir == parent_dir:  # if dir is root dir
            print "File not found"
            break
        else:
            cur_dir = parent_dir
Here's another one, using pathlib:
from pathlib import Path

def find_upwards(cwd: Path, filename: str) -> Path | None:
    if cwd == Path(cwd.root) or cwd == cwd.parent:
        return None

    fullpath = cwd / filename

    return fullpath if fullpath.exists() else find_upwards(cwd.parent, filename)

# usage example:
find_upwards(Path.cwd(), "helloworld.txt")
(using some Python 3.10 typing syntax here, you can safely skip that if you are using an earlier version)
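An equivalent iterative version, in case you prefer to avoid the recursion; this is just a sketch built on the same pathlib API:
from pathlib import Path

def find_upwards_iter(cwd: Path, filename: str) -> Path | None:
    # Check cwd itself, then every ancestor up to the filesystem root
    for directory in (cwd, *cwd.parents):
        candidate = directory / filename
        if candidate.exists():
            return candidate
    return None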
Another option, using pathlib:
from pathlib import Path

def search_upwards_for_file(filename):
    """Search in the current directory and all directories above it
    for a file of a particular name.

    Arguments:
    ---------
    filename :: string, the filename to look for.

    Returns
    -------
    pathlib.Path, the location of the first file found or
    None, if none was found
    """
    d = Path.cwd()
    root = Path(d.root)

    while d != root:
        attempt = d / filename
        if attempt.exists():
            return attempt
        d = d.parent

    return None
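Usage is the same as with the recursive version above; for example (the filename here is just an illustration):
config_path = search_upwards_for_file("setup.cfg")
if config_path is None:
    print("No setup.cfg found in this directory or any parent")
else:
    print("Found it at", config_path)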
The original question asks how to walk parent directories (not descend into children like the find command):
# walk PARENT directories looking for `filename`:
import os

f = 'filename'
d = os.getcwd()

while d != "/" and f not in os.listdir(d):
    d = os.path.abspath(d + "/../")

if os.path.isfile(os.path.join(d, f)):
    do_something(f)
Here's a version that uses shell globbing to match multiple files:
# walk PARENT directories looking for any *.csv files,
# stopping at the first directory that contains any:
import glob
import os

f = '*.csv'
d = os.getcwd()

while d != "/" and not glob.glob(os.path.join(d, f)):
    d = os.path.abspath(d + "/../")

files = glob.glob(os.path.join(d, f))
for filename in files:
    do_something(filename)
Here is a function that does an upward search:
import sys, os, os.path

def up_dir(match, start=None):
    """
    Find a parent path producing a match on one of its entries.
    Without a match an empty string is returned.

    :param match: a function returning a bool on a directory entry
    :param start: absolute path or None
    :return: directory with a match on one of its entries

    >>> up_dir(lambda x: False)
    ''

    """
    if start is None:
        start = os.getcwd()
    if any(match(x) for x in os.listdir(start)):
        return start
    parent = os.path.dirname(start)
    if start == parent:
        rootres = start.replace('\\', '/').strip('/').replace(':', '')
        if len(rootres) == 1 and sys.platform == 'win32':
            rootres = ''
        return rootres
    return up_dir(match, start=parent)
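For example, to find the nearest ancestor directory containing a setup.py (the filename is just an illustration):
project_root = up_dir(lambda entry: entry == "setup.py")
print(project_root)  # '' if nothing was found on the way up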
Here is an example that will find all the .csv files in a specified directory "path" and all its subdirectories and print them:
import os

for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(".csv"):
            path_file = os.path.join(root, file)
            print(path_file)
If you want to start at one directory and work your way through the parents then this would work for finding all the .csv files (for example):
import os
import glob

last_dir = ''
dir = r'c:\temp\starting_dir'
os.chdir(dir)

while last_dir != dir:
    dir = os.getcwd()
    print(glob.glob('*.csv'))
    os.chdir('..')
    last_dir = os.getcwd()
I was looking for this too, since os.walk is exactly the opposite of what I wanted. That searches subdirectories. I wanted to search backwards through parent directories until I hit the drive root.
Borrowing some inspiration from previous answers, below is what I am using. It doesn't require changing the working directory, and it has a place for you to do something when you find a match. You can also change how the match is found; I'm using a regex, but a basic string comparison would work fine too.
# Looking for a file with the string 'lowda' in it (like beltalowda or inyalowda)
import os
import re  # only if you want to use regex

# Setup initial directories
starting_dir = 'C:\\Users\\AvasaralaC\\Documents\\Projects'
last_dir = ''
curr_dir = starting_dir
filename = ''

# Loop through parent directories until you hit the end or find a match
while last_dir != curr_dir:
    for item in os.listdir(curr_dir):
        if re.compile('.*lowda.*').search(item):  # Here you can do your own comparison
            filename = (curr_dir + os.path.sep + item)
            break
    if filename:
        break
    last_dir = curr_dir
    curr_dir = os.path.abspath(curr_dir + os.path.sep + os.pardir)
Other comparisons you could do are item.lower().endswith('.txt') or some other string comparison.
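fnmatch is another option if you prefer shell-style wildcards over a regex; a quick self-contained check (the pattern is only an illustration):
import fnmatch

# Shell-style wildcard test, usable in place of the regex comparison above
print(fnmatch.fnmatch("beltalowda.txt", "*lowda*"))  # True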
I just wrote this to find the "images" directory; note that '/' is Linux-style:
import glob
import os

dir = os.getcwd()
while dir != '/' and not glob.glob(dir + '/images'):
    dir = os.path.dirname(dir)
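A cross-platform variant of the same idea, assuming pathlib, stops at the filesystem root instead of hard-coding '/':
from pathlib import Path

directory = Path.cwd()
while directory != directory.parent and not (directory / "images").is_dir():
    directory = directory.parent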
I'm trying to return a unique list (set) of all directories if they do not contain certain file types. If that file type is NOT found, add that directory name to a list for further auditing.
The function below will find all valid folders and add them to a set for further comparison. I'd like to extend this to only return those directories that DO NOT contain files in the out_list. These directories MAY contain sub-directories with files in the out_list. If that's TRUE, I only want the path of the folder name of the valid dir.
# directory = r'w:\workorder'
#
# Example:
# w:\workorder\region1\12345678\hi.pdf
# w:\workorder\region2\23456789\test\bye.pdf
# w:\workorder\region3\34567891\<empty>
# w:\workorder\region4\45678912\Final.doc
#
# Results:
# ['34567891', '45678912']
import os, re

job_folders = set([])  # set: entries are unique
out_list = [".pdf", ".ppt", ".txt"]

def get_filepaths(directory):
    """
    This function will generate the file names in a directory
    tree by walking the tree either top-down or bottom-up. For each
    directory in the tree rooted at directory top (including top itself),
    it yields a 3-tuple (dirpath, dirnames, filenames).
    """
    folder_paths = []  # List which will store all of the full filepaths.

    # Walk the tree.
    for item in os.listdir(directory):
        if os.path.isdir(os.path.join(directory, item)):
            folderpath = os.path.join(directory, item)  # Join the two strings in order to form the full folderpath.
            if re.search('^[0-9]', item):
                job_folders.add(item[:8])
            folder_paths.append(folderpath)  # Add it to the list.

    return folder_paths
Does this do what you want?
import os

def main():
    exts = {'.pdf', '.ppt', '.txt'}
    for directory in get_directories_without_exts('W:\\workorder', exts):
        print(directory)

def get_directories_without_exts(root, exts):
    for root, dirs, files in os.walk(root):
        for file in files:
            if os.path.splitext(file)[1] in exts:
                break
        else:
            yield root

if __name__ == '__main__':
    main()
Edit: After looking at your requirements, I decided to create a tree object to analyze your directory structure. Once created, it is simple to make a recursive query, with caching, to find out if a directory "is okay." From there, creating a generator that only yields top-level directories that are "not okay" is fairly simple. There is probably a better way to do this, but the code should at least work.
import os

def main():
    exts = {'.pdf', '.ppt', '.txt'}
    for directory in Tree('W:\\workorder', exts).not_okay:
        print(directory)

class Tree:

    def __init__(self, root, exts):
        if not os.path.isdir(root):
            raise ValueError('root must be a directory')
        self.name = root
        self.exts = exts
        self.files = set()
        self.directories = []
        try:
            names = os.listdir(root)
        except OSError:
            pass
        else:
            for child in names:
                path = os.path.join(root, child)
                if os.path.isfile(path):
                    self.files.add(os.path.splitext(child)[1])
                elif os.path.isdir(path):
                    self.directories.append(self.__class__(path, exts))
        self._is_okay = None

    @property
    def is_okay(self):
        if self._is_okay is None:
            self._is_okay = any(c.is_okay for c in self.directories) or \
                            any(c in self.exts for c in self.files)
        return self._is_okay

    @property
    def not_okay(self):
        if self.is_okay:
            for child in self.directories:
                for not_okay in child.not_okay:
                    yield not_okay
        else:
            yield self.name

if __name__ == '__main__':
    main()
Did you copy and paste the existing code from somewhere else? Because the docstring appears to be that of os.walk...
Your question is unclear on several points:
You state that the goal of the code is to "return a unique list (set) of all directories if they do not contain certain file types".
First of all list and set are different data structures.
Secondly, your code creates one of each: job_folders is a set of folder names containing numbers, while folder_paths is a list of complete paths to folders regardless of whether or not they contain numbers.
What do you actually want as output here?
Should "those directories that DO NOT contain files in the out_list" be defined recursively, or only include first-level contents of those directories? My solution assumes the latter
Your example is contradictory on this point: it shows 34567891 in the results, but not region3 in the results. Whether or not the definition is recursive, region3 should be included in the results because region3 does not contain any files with the listed extensions under it.
Should job_folders be populated only with directories that satisfy the criterion about their contents, or with all folder names containing numbers? My solution assumes the latter
One poor practice in your code that I'd highlight is your use of global variables, out_list and job_folders. I've changed the former to a second parameter of get_filepaths and the latter to a second return value.
Anyway, here goes the solution...
import os, re

ext_list = [".pdf", ".ppt", ".txt"]

def get_filepaths(directory, ext_list):
    folder_paths = []  # List which will store all of the full filepaths.
    job_folders = set([])

    # Walk the tree.
    for dir, subdirs, files in os.walk(directory):
        _, lastlevel = os.path.split(dir)
        if re.search('^[0-9]', lastlevel):
            job_folders.add(lastlevel[:8])
        for item in files:
            root, ext = os.path.splitext(item)
            if ext in ext_list:
                break
        else:
            # Since none of the file extensions matched ext_list, add it to the list of folder_paths
            folder_paths.append(os.path.relpath(dir, directory))

    return folder_paths, job_folders
I created a directory structure identical to yours under /tmp and ran the following:
folder_paths, job_folders = get_filepaths( os.path.expandvars(r"%TEMP%\workorder"), ext_list )
print "folder_paths =", folder_paths
print "job_folders =", job_folders
Here's the output:
folder_paths = ['.', 'region1', 'region2', 'region2\\23456789', 'region3', 'region3\\34567891', 'region4', 'region4\\456789123']
job_folders = set(['12345678', '23456789', '34567891', '45678912'])
As you can see, region1\12345678 and region2\23456789\test are not included in the output folder_paths because they do directly contain files of the specified extensions; all the other subdirectories are included in the output because they do not directly contain files of the specified extensions.
To get the file extension:
name, ext = os.path.splitext(os.path.join(directory, item))
if ext not in out_list:
    job_folders.add(item[:8])
Thanks to @DanLenski and @NoctisSkytower I was able to get this worked out.
My WorkOrder directory is always the 7th folder down when walking in_path, and I find it by splitting the path on os.sep.
I borrowed from both of your solutions and came up with the following:
import os, re

ext_list = [".pdf"]
in_path = r'\\server\E\Data\WorkOrder'

def get_filepaths(directory, ext_list):
    not_okay = set([])     # Set which will store Job folders where no ext_list files were found
    okay = set([])         # Set which will store Job folders where ext_list files were found
    job_folders = set([])  # valid Job ID folders

    # Walk the tree.
    for dir, subdirs, files in os.walk(directory):
        for item in files:
            root, ext = os.path.splitext(item)
            if len(dir.split(os.sep)) >= 8:  # Tree must contain Job ID folder
                job_folder = dir.split(os.sep)[7]
                if ext in ext_list:
                    okay.add(job_folder)
                else:
                    # This file's extension did not match ext_list, so record its Job folder
                    not_okay.add(job_folder)

    bad_list = list(not_okay - okay)
    bad_list.sort()
    return bad_list

bad_list = get_filepaths(os.path.expandvars(in_path), ext_list)
I can't figure out what's wrong. I've used rename before without any problems, and can't find a solution in other similar questions.
import os
import random

directory = "C:\\whatever"
string = ""
alphabet = "abcdefghijklmnopqrstuvwxyz"
listDir = os.listdir(directory)

for item in listDir:
    path = os.path.join(directory, item)
    for x in random.sample(alphabet, random.randint(5, 15)):
        string += x
    string += path[-4:]  # adds file extension
    os.rename(path, string)
    string = ""
There are a few strange things in your code. For example, your source to the file is the full path but your destination to rename is just a filename, so files will appear in whatever the working directory is - which is probably not what you wanted.
You have no protection from two randomly generated filenames being the same, so you could destroy some of your data this way.
Try this out, which should help you identify any problems. This will only rename files, and skip subdirectories.
import os
import random
import string

directory = "C:\\whatever"
alphabet = string.ascii_lowercase

for item in os.listdir(directory):
    old_fn = os.path.join(directory, item)
    new_fn = ''.join(random.sample(alphabet, random.randint(5, 15)))
    new_fn += os.path.splitext(old_fn)[1]  # adds file extension
    new_path = os.path.join(directory, new_fn)
    if os.path.isfile(old_fn) and not os.path.exists(new_path):
        os.rename(old_fn, new_path)
    else:
        print 'error renaming {} -> {}'.format(old_fn, new_fn)
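If you want to rule out name collisions entirely, one alternative (just a sketch) is to derive the new name from uuid4 instead of sampling letters:
import os
import uuid

def random_name(old_path):
    # 15 hex characters from uuid4 are effectively collision-free within a single folder
    return uuid.uuid4().hex[:15] + os.path.splitext(old_path)[1]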
If you want to save back to the same directory you will need to add a path to your 'string' variable. Currently it is just creating a filename and os.rename requires a path.
for item in listDir:
    path = os.path.join(directory, item)
    for x in random.sample(alphabet, random.randint(5, 15)):
        string += x
    string += path[-4:]  # adds file extension
    string = os.path.join(directory, string)
    os.rename(path, string)
    string = ""