Auto-increment file name Python - python

I am trying to write a function that assigns a path and filename to a variable, based on the name of a file that exists in the folder. If a file with that name already exists, the file name should be auto-incremented. I have seen some posts that do this with a while loop, but I cannot get my head around them and would like to wrap it in a recursive function.
Here is what I have so far. When testing with print statements everything works well, but the function does not return the new name to the main program.
def checkfile(ii, new_name,old_name):
    if not os.path.exists(new_name):
        return new_name
    if os.path.exists(new_name):
        ii+=1
        new_name = os.path.join(os.path.split(old_name)[0],str(ii) + 'snap_'+ os.path.split(old_name)[1])
        print new_name
old_name = "D:\Bar\foo"
new_name= os.path.join(os.path.split(old_name)[0],"output_" + os.path.split(old_name)[1])
checkfile(0,new_name,old_name)

While I wouldn't recommend using recursion for this (Python's default recursion limit is about 1000 nested calls), you're just missing a return for the recursive bit:
    new_name= os.path.join(os.path.split(old_name)[0],"output_" + os.path.split(old_name)[1])
    checkfile(0,new_name,old_name)
Should instead be:
    new_name= os.path.join(os.path.split(old_name)[0],"output_" + os.path.split(old_name)[1])
    return checkfile(ii,new_name,old_name)
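For reference, here is a minimal sketch of a recursive version in which every branch returns a value. It keeps the `<n>snap_<name>` naming from the question. The `exists` parameter is an assumption added here (it defaults to os.path.exists) so the function can be tried without touching the disk.

```python
import os

def checkfile(ii, new_name, old_name, exists=os.path.exists):
    """Return new_name if it is free, otherwise try the next numbered name."""
    if not exists(new_name):
        return new_name
    ii += 1
    folder, base = os.path.split(old_name)
    candidate = os.path.join(folder, str(ii) + 'snap_' + base)
    # The result of the recursive call must be returned;
    # without the return, the caller receives None.
    return checkfile(ii, candidate, old_name, exists)
```

Passing a set's `__contains__` method as `exists` lets you simulate names that are already taken.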
But really, you can make this a whole lot simpler by re-writing it as:
import os
def checkfile(path):
    path = os.path.expanduser(path)
    if not os.path.exists(path):
        return path
    root, ext = os.path.splitext(path)
    # Fall back to the current directory when path has no directory part
    directory = os.path.dirname(root) or os.curdir
    fname = os.path.basename(root)
    candidate = fname + ext
    index = 0
    # List the directory once instead of probing the disk for every candidate
    ls = set(os.listdir(directory))
    while candidate in ls:
        candidate = "{}_{}{}".format(fname, index, ext)
        index += 1
    return os.path.join(directory, candidate)
This form also handles the fact that filenames have extensions, which your original code doesn't, at least not very clearly. It also avoids repeated os.path.exists calls, which can be expensive, especially if the path is on a network location.

Related

How can I avoid repeating a for loop in two different functions

I am writing an ImageCollection class in python that should hold a dictionary with a name and the image-object (pygame.image object).
In one case I want to load all images inside a folder to the dictionary and in another case just specific files, for example only button-files.
What I have written so far is this:
class ImageCollection:
    def __init__(self):
        self.dict = {}
    def load_images(self, path):
        directory = os.fsencode(path)
        for file in os.listdir(directory):
            file_name = os.fsdecode(file)
            img_path = path + "/" + file_name
            if file_name.endswith(".jpg") or file_name.endswith(".png"):
                # Remove extension for dictionary entry name and add image to dictionary
                #-----------------------------------------------------------------------
                dict_entry_name = file_name.removesuffix(".jpg").removesuffix(".png")
                self.dict.update({dict_entry_name: image.Image(img_path, 0)})
    def load_specific_images(self, path, contains_str):
        directory = os.fsencode(path)
        for file in os.listdir(directory):
            file_name = os.fsdecode(file)
            img_path = path + "/" + file_name
            if file_name.endswith(".jpg") or file_name.endswith(".png"):
                if file_name.rfind(contains_str):
                    # Remove extension for dictionary entry name and add image to dictionary
                    #-----------------------------------------------------------------------
                    dict_entry_name = file_name.removesuffix(".jpg").removesuffix(".png")
                    self.dict.update({dict_entry_name: image.Image(img_path, 0)})
The only problem is that this is probably a bad programming pattern, right? In this case it probably doesn't matter, but I would like to know what the best practice would be.
How can I avoid repeating myself in two different functions when the only difference is a single if condition?
I have tried creating a "dict_add" function that creates the entry.
Then I was thinking I could create two different functions, one which directly calls "dict_add" and another which checks for the specific condition and then calls "dict_add".
Then I thought I could create just a single function with the for-loop and pass a function as an argument (which would be a callback, I assume?). But one callback would need an additional argument, so that's where I got stuck and wondered if my approach was correct.
You could make the contains_str an optional argument.
In cases where you want to load_images - you just provide the path
In cases where you want to load specific images - you provide the path and the contains_str argument
In both cases you call load_images(...)
Code:
class ImageCollection:
    def __init__(self):
        self.dict = {}
    def load_images(self, path, contains_str=""):
        directory = os.fsencode(path)
        for file in os.listdir(directory):
            file_name = os.fsdecode(file)
            img_path = path + "/" + file_name
            if file_name.endswith(".jpg") or file_name.endswith(".png"):
                # An empty contains_str is contained in every name, so it matches all files.
                # `in` is used instead of rfind(), which returns -1 (truthy) when nothing is found.
                if contains_str in file_name:
                    # Remove extension for dictionary entry name and add image to dictionary
                    #-----------------------------------------------------------------------
                    dict_entry_name = file_name.removesuffix(".jpg").removesuffix(".png")
                    self.dict.update({dict_entry_name: image.Image(img_path, 0)})
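The callback idea from the question can also work. Below is a rough sketch, not tied to pygame: `select_image_files` and its `predicate` parameter are hypothetical names, and the function only returns entry names mapped to paths so that it runs without an image library. Unlike the code above, it compares extensions case-insensitively. The extra argument the question worried about is captured by a lambda, so the callback itself still takes one argument.

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".png")

def select_image_files(path, predicate=None):
    """Return {entry_name: full_path} for image files accepted by predicate."""
    selected = {}
    for file_name in sorted(os.listdir(path)):
        stem, ext = os.path.splitext(file_name)
        if ext.lower() not in IMAGE_EXTENSIONS:
            continue  # not an image file
        if predicate is not None and not predicate(file_name):
            continue  # rejected by the caller-supplied filter
        selected[stem] = os.path.join(path, file_name)
    return selected
```

A call such as `select_image_files(folder, lambda name: "button" in name)` keeps only button files, while `select_image_files(folder)` keeps every image.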

Why am I getting Flake8 F821 error when the variable exists?

I have a function that returns a variable, and a second function that uses it. In my main function, though, flake8 reports that the variable is undefined.
I tried adding it as a global variable, and placing a tox.ini file with ignore = F821 in the same folder as my script, but this didn't register either.
Any suggestions? The code block is below for reference. new_folder is the culprit.
def createDestination(self):
    '''
    split the src variable for machine type
    and create a folder with 'Evo' - machine
    '''
    s = src.split('\\')
    new_folder = (dst + '\\Evo ' + s[-1])
    if not os.path.exists(new_folder):
        os.makedirs(new_folder)
        return self.new_folder
def copyPrograms(new_folder):
    '''
    find all TB-Deco programs in second tier directory.
    '''
    # create file of folders in directory
    folder_list = os.listdir(src)
    # iterate the folder list
    for folder in folder_list:
        # create a new directory inside each folder
        folder_src = (src + '\\' + folder)
        # create a list of the files in the folder
        file_list = os.listdir(folder_src)
        # iterate the list of files
        for file in file_list:
            # if the file ends in .part .PART .dbp or .DBP - add it to a list
            if (file.endswith('.part') or file.endswith('.PART') or
                    file.endswith('.dbp') or file.endswith('.DBP')):
                # create a location variable for that file
                file_src = (src + folder + '\\' + file)
                # copy the file from the server to dst folder
                new_file = ('Evo ' + file)
                file_dst = (new_folder + '\\' + new_file)
                if not os.path.exists(file_dst):
                    shutil.copy2(file_src, file_dst)
def main():
    createDestination()
    copyPrograms(new_folder)
if __name__ == "__main__":
    main()
The first problem is that createDestination never defines an attribute self.new_folder, only a local variable new_folder. The indentation is also off, as you want to return new_folder whether or not you had to create it first.
def createDestination():  # no self: main() calls this as a plain function
    '''
    split the src variable for machine type
    and create a folder with 'Evo' - machine
    '''
    s = src.split('\\')
    new_folder = (dst + '\\Evo ' + s[-1])
    if not os.path.exists(new_folder):
        os.makedirs(new_folder)
    return new_folder  # not self.new_folder
Second, you never assigned the return value of createDestination to any name so that you could pass it to copyPrograms as an argument.
def main():
    new_folder = createDestination()
    copyPrograms(new_folder)
Names have scope, and a variable named new_folder inside createDestination is distinct from one by the same name in main. As a corollary, there's no need to use the same name; the following definition of main works just as well:
def main():
    d = createDestination()
    copyPrograms(d)
and you don't even need to name the return value; you can pass it directly as
def main():
    copyPrograms(createDestination())
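As a small self-contained illustration of the same point, here is a sketch that takes the destination as parameters instead of reading the module-level src and dst. The function name `create_destination` and the parameters `dst` and `machine` are assumptions made for this example, not part of the original script.

```python
import os

def create_destination(dst, machine):
    """Create the folder 'Evo <machine>' under dst if needed and return its path."""
    new_folder = os.path.join(dst, 'Evo ' + machine)
    if not os.path.exists(new_folder):
        os.makedirs(new_folder)
    # Returned at function level, so the caller gets the path
    # whether or not the folder had to be created.
    return new_folder
```

The caller then binds the result to whatever name it likes, for example `target = create_destination(base, 'M1')`, and passes `target` on.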

Python keep multiple counters in one recursion function

I am trying to count the number of python files and non-python files in a path recursively.
import os
def main():
    #path = input('Enter an existing path to a file or directory: ')
    path ='/Users/ziyuanhan/PycharmProjects/lab6/'
    print(count_file(path, counter={'py':0, 'non_py':0}))
def count_file(path,counter):
    if os.path.isfile(path):
        if path.endswith('.py') :
            counter['py']+=1
            return path, counter
        else:
            counter['non_py']+=1
            return path, counter
    elif os.path.isdir(path):
        for files in os.listdir(path):
            print(files)
            path = os.path.abspath(files)
            print(path)
            count_file(path, counter)
        return path, counter
main()
The problems I have are:
I have trouble keeping multiple counters in one recursive function.
The return value I want is a dictionary, but I can only do it this way because I have to return it together with path.
I use print(files) to check whether the function is working, but it shows many more files (the top 7) that I have never seen in my folder. Why is this happening?
Output of print(files):
/Library/Frameworks/Python.framework/Versions/3.5/bin/python3.5
/Users/ziyuanhan/PycharmProjects/lab7/recursive_dir_traversal.py
.DS_Store
/Users/ziyuanhan/PycharmProjects/lab7/.DS_Store
.idea
/Users/ziyuanhan/PycharmProjects/lab7/.idea
lab7.iml
/Users/ziyuanhan/PycharmProjects/lab7/lab7.iml
misc.xml
/Users/ziyuanhan/PycharmProjects/lab7/misc.xml
modules.xml
/Users/ziyuanhan/PycharmProjects/lab7/modules.xml
workspace.xml
/Users/ziyuanhan/PycharmProjects/lab7/workspace.xml
km_mi_table.py
/Users/ziyuanhan/PycharmProjects/lab7/km_mi_table.py
km_to_miles.py
/Users/ziyuanhan/PycharmProjects/lab7/km_to_miles.py
wordfrequency.py
/Users/ziyuanhan/PycharmProjects/lab7/wordfrequency.py
('/Users/ziyuanhan/PycharmProjects/lab7/wordfrequency.py', {'non_py': 0, 'py': 0})
BTW we have to use recursive function, it is mandatory as the Prof requested.
You don't need to traverse the directory tree recursively yourself. You can use os.walk, which yields the directories and files for you.
Rebinding an argument inside a function does not change the caller's variable. How about returning total_python and total_non_python and using them in the caller, as below?
def count_file(path):
    total_python, total_non_python = 0, 0
    for parent, directories, files in os.walk(path):
        for filename in files:
            if filename.lower().endswith('.py'):
                total_python += 1
            else:
                total_non_python += 1
    return total_python, total_non_python
def main():
    path = input('Enter a path to a file or directory: ')
    total_python, total_non_python = count_file(path)
    print(path, total_python, total_non_python)
Alternatively, os.scandir is also available since Python 3.5.
You can pass a dictionary as an argument to the function and change the values of the items in the dictionary.
First initialize the dictionary:
counters = {'py': 0, 'other': 0}
Then modify it inside the recursive function:
counters['py'] += 1
This will work because dictionaries are mutable.
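Put together, a recursive version along these lines might look like the sketch below. The function name `count_files` is chosen here for illustration. Note that the child path is built with os.path.join(path, name); os.path.abspath(name), as used in the question, resolves the bare name against the current working directory rather than against path.

```python
import os

def count_files(path, counters):
    """Recursively tally .py and other files under path into counters."""
    if os.path.isfile(path):
        key = 'py' if path.endswith('.py') else 'other'
        counters[key] += 1  # mutates the dictionary shared by all calls
    elif os.path.isdir(path):
        for name in os.listdir(path):
            count_files(os.path.join(path, name), counters)
    return counters
```

Calling `count_files(path, {'py': 0, 'other': 0})` returns the same dictionary that was passed in, now holding the totals.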
This function takes a pathname and returns (total_python, total_not_python). It calls itself on each entry in a directory. It is meant to stay as close to the given code as is reasonable.
def count_file(path):
    if os.path.isfile(path):
        if path.endswith('.py') :
            return 1, 0
        else:
            return 0, 1
    elif os.path.isdir(path):
        total_python, total_not_python = 0, 0
        for files in os.listdir(path):
            # Build the child path in a new variable so that path is not overwritten
            child = os.path.join(path, files)
            subtotal_python, subtotal_not_python = count_file(child)
            total_python += subtotal_python
            total_not_python += subtotal_not_python
        return total_python, total_not_python
    # Neither a file nor a directory (for example a broken symlink)
    return 0, 0

delete older folder with similar name using python

I need to iterate over a folder tree. I have to check each subfolder, which looks like this:
moduleA-111-date
moduleA-112-date
moduleA-113-date
moduleB-111-date
moduleB-112-date
etc.
I figured out how to iterate over a folder tree. I can also use stat with mtime to get the date of the folder, which seems easier than parsing the date out of the name.
How do I single out modules with the same prefix (such as "moduleA") and compare their mtime's so I can delete the oldest?
Since you have no code, I assume that you're looking for design help. I'd lead my students to something like:
Make a list of the names
From each name, find the prefix, such as "moduleA". Put those in a set.
For each prefix in the set
    Find all names with that prefix; put these in a temporary list
    Sort this list.
    For each file in this list *except* the last (newest)
        delete the file
Does this get you moving?
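The outline above could be sketched roughly as follows. This is one possible reading, not the answerer's code: the function name `prune_old_folders` is hypothetical, the prefix is assumed to be everything before the first "-", and the folders are ordered by mtime (as the question suggests) rather than by name.

```python
import os
import shutil
from collections import defaultdict

def prune_old_folders(root, sep='-'):
    """Keep the newest folder per prefix directly under root; delete the rest."""
    groups = defaultdict(list)
    for name in os.listdir(root):
        full = os.path.join(root, name)
        if os.path.isdir(full):
            prefix = name.split(sep, 1)[0]  # e.g. "moduleA"
            groups[prefix].append(full)
    removed = []
    for folders in groups.values():
        folders.sort(key=os.path.getmtime)  # oldest first
        for old in folders[:-1]:            # everything except the newest
            shutil.rmtree(old)
            removed.append(old)
    return sorted(removed)
```

Since shutil.rmtree deletes whole trees, it may be worth printing the candidates first and only deleting once the list looks right.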
I'm posting the code (my answer) here. I suppose my question wasn't clear, since it is getting downvotes, but in any case the solution wasn't as straightforward as I thought. I'm sure the code could use some fine-tuning, but it gets the job done.
#!/usr/bin/python
import os
import sys
import fnmatch
import glob
import re
import shutil
##########################################################################################################
#Remove the directory
def remove(path):
    try:
        shutil.rmtree(path)
        print "Deleted : %s" % path
    except OSError as e:
        # print the actual error instance rather than the OSError class
        print e
        print "Unable to remove folder: %s" % path
##########################################################################################################
#This function will look for the .sh files in a given path and returns them as a list.
def searchTreeForSh(path):
    full_path = path+'*.sh'
    listOfFolders = glob.glob(full_path)
    return listOfFolders
##########################################################################################################
#Gets the full paths to the .sh files and returns a list of folder names (prefixes) to be acted upon.
#listOfScripts is a list of full paths to .sh files
#dirname is the value that holds the root directory where listOfScripts is operating in
def getFolderNames(listOfScripts):
    listOfFolders = []
    folderNames = []
    for foldername in listOfScripts:
        listOfFolders.append(os.path.splitext(foldername)[0])
    for folders in listOfFolders:
        folder = folders.split('/')
        foldersLen=len(folder)
        folderNames.append(folder[foldersLen-1])
    folderNames.sort()
    return folderNames
##########################################################################################################
def minmax(items):
    return max(items)
##########################################################################################################
#This function will check the latest entry in the tuple provided, and will then send "everything" to the remove function except that last entry
def sortBeforeDelete(statDir, t):
    count = 0
    tuple(statDir)
    timeNotToDelete = minmax(statDir)
    for ff in t:
        if t[count][1] == timeNotToDelete:
            count += 1
            continue
        else:
            remove(t[count][0])
            count += 1
##########################################################################################################
#A loop to run over the full path, which is broken into items (see os.listdir above); it eliminates the .sh and the .txt files, leaves only folder names, then matches each to one of the
#names in the "folders" variable
def coolFunction(folderNames, path):
    localPath = os.listdir(path)
    for folder in folderNames:
        t = () # a tuple to act as sort of a dict, it will hold the folder name and its equivalent st_mtime
        statDir = [] # a list that will hold the st_mtime for all the folder names in subDirList
        for item in localPath:
            if os.path.isdir(path + item) == True:
                if re.search(folder, item):
                    mtime = os.stat(path + '/' + item)
                    statDir.append(mtime.st_mtime)
                    t = t + ((path + item,mtime.st_mtime),)# the "," outside the parentheses makes t a tuple of tuples instead of appending the elements one after the other.
        if t == ():continue
        sortBeforeDelete(statDir, t)
##########################################################################################################
def main(path):
    dirs = os.listdir(path)
    for component in dirs:
        if os.path.isdir(component) == True:
            newPath = path + '/' + component + '/'
            listOfFolders= searchTreeForSh(newPath)
            folderNames = getFolderNames(listOfFolders)
            coolFunction(folderNames, newPath)
##########################################################################################################
if __name__ == "__main__":
    main(sys.argv[1])

Problems with variable referenced before assignment when using os.path.walk

OK. I have some background in Matlab and I'm now switching to Python.
I have this bit of code under Python 2.6.5 on 64-bit Linux which walks through directories, finds files named 'GeneralData.dat', retrieves some data from them and stitches them into a new data set:
import pylab as p
import os, re
import linecache as ln
def LoadGenomeMeanSize(arg, dirname, files):
    for file in files:
        filepath = os.path.join(dirname, file)
        if filepath == os.path.join(dirname,'GeneralData.dat'):
            data = p.genfromtxt(filepath)
            if data[-1,4] != 0.0: # checking if data set is OK
                data_chopped = data[1000:-1,:] # removing some of data
                Grand_mean = data_chopped[:,2].mean()
                Grand_STD = p.sqrt((sum(data_chopped[:,4]*data_chopped[:,3]**2) + sum((data_chopped[:,2]-Grand_mean)**2))/sum(data_chopped[:,4]))
            else:
                break
        if filepath == os.path.join(dirname,'ModelParams.dat'):
            l = re.split(" ", ln.getline(filepath, 6))
            turb_param = float(l[2])
            arg.append((Grand_mean, Grand_STD, turb_param))
GrandMeansData = []
os.path.walk(os.getcwd(), LoadGenomeMeanSize, GrandMeansData)
GrandMeansData = sorted(GrandMeansData, key=lambda data_sort: data_sort[2])
TheMeans = p.zeros((len(GrandMeansData), 3 ))
i = 0
for item in GrandMeansData:
    TheMeans[i,0] = item[0]
    TheMeans[i,1] = item[1]
    TheMeans[i,2] = item[2]
    i += 1
print TheMeans # just checking...
# later do some computation on TheMeans in NumPy
And it throws me this (though I would swear it was working a month ago):
Traceback (most recent call last):
  File "/home/User/01_PyScripts/TESTtest.py", line 29, in <module>
    os.path.walk(os.getcwd(), LoadGenomeMeanSize, GrandMeansData)
  File "/usr/lib/python2.6/posixpath.py", line 233, in walk
    walk(name, func, arg)
  File "/usr/lib/python2.6/posixpath.py", line 225, in walk
    func(arg, top, names)
  File "/home/User/01_PyScripts/TESTtest.py", line 26, in LoadGenomeMeanSize
    arg.append((Grand_mean, Grand_STD, turb_param))
UnboundLocalError: local variable 'Grand_mean' referenced before assignment
All right... so I went and did some reading and came up with this global variable:
import pylab as p
import os, re
import linecache as ln
Grand_mean = p.nan
Grand_STD = p.nan
def LoadGenomeMeanSize(arg, dirname, files):
    for file in files:
        global Grand_mean
        global Grand_STD
        filepath = os.path.join(dirname, file)
        if filepath == os.path.join(dirname,'GeneralData.dat'):
            data = p.genfromtxt(filepath)
            if data[-1,4] != 0.0: # checking if data set is OK
                data_chopped = data[1000:-1,:] # removing some of data
                Grand_mean = data_chopped[:,2].mean()
                Grand_STD = p.sqrt((sum(data_chopped[:,4]*data_chopped[:,3]**2) + sum((data_chopped[:,2]-Grand_mean)**2))/sum(data_chopped[:,4]))
            else:
                break
        if filepath == os.path.join(dirname,'ModelParams.dat'):
            l = re.split(" ", ln.getline(filepath, 6))
            turb_param = float(l[2])
            arg.append((Grand_mean, Grand_STD, turb_param))
GrandMeansData = []
os.path.walk(os.getcwd(), LoadGenomeMeanSize, GrandMeansData)
GrandMeansData = sorted(GrandMeansData, key=lambda data_sort: data_sort[2])
TheMeans = p.zeros((len(GrandMeansData), 3 ))
i = 0
for item in GrandMeansData:
    TheMeans[i,0] = item[0]
    TheMeans[i,1] = item[1]
    TheMeans[i,2] = item[2]
    i += 1
print TheMeans # just checking...
# later do some computation on TheMeans in NumPy
It does not give error messages. It even gives a file with data... but the data are bloody wrong! I checked some of them manually by running these commands:
import pylab as p
data = p.genfromtxt(filepath)
data_chopped = data[1000:-1,:]
Grand_mean = data_chopped[:,2].mean()
Grand_STD = p.sqrt((sum(data_chopped[:,4]*data_chopped[:,3]**2) \
+ sum((data_chopped[:,2]-Grand_mean)**2))/sum(data_chopped[:,4]))
on selected files. They are different :-(
1) Can anyone explain me what's wrong?
2) Does anyone know a solution to that?
I'll be grateful for help :-)
Cheers,
PTR
I would say this condition is not passing:
if filepath == os.path.join(dirname,'GeneralData.dat'):
which means you are not getting GeneralData.dat before ModelParams.dat. Maybe you need to sort alphabetically or the file is not there.
I see one issue with the code and with the solution that you have provided.
Never hide a "variable referenced before assignment" error by simply making the variable visible.
Try to understand why it happened.
Before you created the global variable "Grand_mean", you were getting an error because you accessed Grand_mean before any value had been assigned to it. Initializing the variable outside the function and marking it as global only serves to hide that problem.
You see erroneous results because you have made the variable visible by making it global, but the underlying issue still exists: your Grand_mean was never assigned the correct data.
This means that the section of code under "if filepath == os.path.join(dirname,..." was never executed.
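The failure mode described above can be reproduced in isolation. The sketch below uses hypothetical names (`last_mean`, `last_mean_unbound`, `rows`) and plain lists instead of the data files. The point is that a name assigned in only one branch is unbound whenever that branch never runs, and an explicit local default makes that case visible instead of borrowing a stale global value.

```python
def last_mean_unbound(rows):
    """Mirrors the original bug: mean is assigned only inside the if."""
    for row in rows:
        if row:
            mean = sum(row) / float(len(row))
    return mean  # raises UnboundLocalError when no row was non-empty

def last_mean(rows):
    """Return the mean of the last non-empty row, or None if there is none."""
    mean = None  # explicit local default: the name is always bound
    for row in rows:
        if row:
            mean = sum(row) / float(len(row))
    return mean
```

The caller can then test for None and skip the entry rather than appending a value left over from an earlier directory.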
Using global is not the right solution. That only makes sense if you do in fact want to reference and assign to the global "Grand_mean" name. The need for disambiguation comes from the way the interpreter prescans for assignment operators in function declarations.
You should start by assigning a default value to Grand_mean within the scope of LoadGenomeMeanSize(). You have 1 of 4 branches to actually assign a value to Grand_mean that has correct semantic meaning within one loop iteration. You are likely running into a case where
if filepath == os.path.join(dirname,'ModelParams.dat'): is true, but either
if filepath == os.path.join(dirname,'GeneralData.dat'): or if data[-1,4] != 0.0: is not. It's likely the second condition that is failing for you. Move the
The quick and dirty answer is you probably need to rearrange your code like this:
...
        if filepath == os.path.join(dirname,'GeneralData.dat'):
            data = p.genfromtxt(filepath)
            if data[-1,4] != 0.0: # checking if data set is OK
                data_chopped = data[1000:-1,:] # removing some of data
                Grand_mean = data_chopped[:,2].mean()
                Grand_STD = p.sqrt((sum(data_chopped[:,4]*data_chopped[:,3]**2) + sum((data_chopped[:,2]-Grand_mean)**2))/sum(data_chopped[:,4]))
                if filepath == os.path.join(dirname,'ModelParams.dat'):
                    l = re.split(" ", ln.getline(filepath, 6))
                    turb_param = float(l[2])
                    arg.append((Grand_mean, Grand_STD, turb_param))
            else:
                break
...
