I need to retrieve the directory of the most recently create folder. I am using a program that will output a new run## folder each time it is executed (i.e run01, run02, run03 and so on). Within any one run## folder resides a data file that I want analyze (file-i-want.txt).
folder_numb = 'run01'
dir = os.path.dirname(__file__)
filepath = os.path.join(dir, '..\data\directory',run_numb,'file-i-want.txt')
In short I want to skip having to hardcode in run## and just get the directory of a file within the most recently created run## folder.
You can get the creation date with os.stat
path = '/a/b/c'
#newest
newest = max([f for f in os.listdir(path)], key=lambda x: os.stat(os.path.join(path,x)).st_birthtime)
# all files sorted
sorted_files = sorted([f for f in os.listdir(path)],key=lambda x: os.stat(os.path.join(path, x)).st_birthtime, reverse=True)
pathlib is the recommeded over os for filesystem related tasks.
reference
You can try:
filepath = Path(__file__).parent / 'data/directory'
fnames = sorted(list(Path(filepath).rglob('file-i-want.txt')), key=lambda x: Path.stat(x).st_mtime, reverse=True)
filepath = str(fnames[0])
filepath
glob.glob('run*') will return the list of files/directories that match the pattern ordered by name.
so if you want the latest run your code will be:
import glob
print(glob.glob('run*')[-1]) # raises index error if there are no runs
IMPORTANT, the files are ordered by name, in this case, for example, 'run21' will come AFTER 'run100', so you will need to use a high enough number of digits to not see this error. or just count the number of matched files and recreate the name of the folder with this number.
you can use glob to check the number of files with the same name pattern:
import glob
n = len(glob.glob('run*')) # number of files which name starts with 'run'
new_run_name = 'run' + str(n)
Note: with this code the file names starts from 0, if you want to start from 1 just add 1 to n.
if you want always double digit run number (00, 01, 02) instead of 'str(n)' use 'str(n).zfill(2)'
example:
import glob
n = len(glob.glob('run*')) # number of files which name starts with 'run'
new_run_name = 'run' + str(n + 1).zfill(2)
Related
I have a folder Alpha which contains a series of folders named Beta1, Beta2, ..., Beta 397. Each of the Beta folders contains a variable number of alphanumerically numbered images of different file formats.
My goal is to run a script that crawls all these Beta folders, selectively chooses images of jpeg/png format, and merges them to a pdf (per Beta folder) after name-sort.
My code is stored alongside the Beta folders and reads:-
import glob
import re
import img2pdf
import os
_nsre = re.compile('([0-9]+)')
def natural_sort_key(s):
return [int(text) if text.isdigit() else text.lower()
for text in re.split(_nsre, s)]
for X in range(1, 397):
dirname = os.path.join('./','BetaX', '')
output = os.path.join('./','BetaX', '/output.pdf')
# Get all the filenames per image format
filenames1 = [f for f in glob.iglob(f'{dirname}*.jpg')]
filenames2 = [f for f in glob.iglob(f'{dirname}*.png')]
# Merges the 2 lists
filenames3 = filenames1 + filenames2
# Sort the list alphanumerically
filenames3.sort(key=natural_sort_key)
# Print to pdf
with open(output,"wb") as f:
f.write(img2pdf.convert(filenames3))
print(f'Finished converting {output}')
filenames1.clear()
filenames2.clear()
filenames3.clear()
If I remove the for loop line and type the value of X, the pdf is outputted without any fuss, on an individual-folder basis. However, I am looking for ways to treat X as a loop-variable from the range and batch-process all the folders at once.
The way your code is currently:
for X in range(1, 397):
dirname = os.path.join('./','BetaX', '')
output = os.path.join('./','BetaX', '/output.pdf')
The X is just a character in the string BetaX. You need to cause X to be treated like an integer value and then you need to concatenate that value onto Beta to come up with your full folder name.
Also, you don't want the slashes in what you're passing to os.path.join. The point of the join call is to hide the details of the path separator character. The value of output will be just /output.pdf with what you have, as the third parameter will be considered an absolute path because of the slash at the front of it.
Here's that part of your code with both of these issues addressed:
for X in range(1, 397):
dirname = os.path.join('.','Beta' + str(X), '')
output = os.path.join('.','Beta' + str(X), 'output.pdf')
I have a bunch of files that are downloaded and I'm trying to get the most recently downloaded version for my analysis. Obviously this is sorting based on text rather than numeric so I'm running into the issue where File 30 comes before File 4. The numbers are within () everytime (your normal copied download). How would I sort based on that number?
Filename (1)
Filename (2)
...
Filename (30)
Filename (4)
...
files = glob.glob(r"C:\Users\xxxxx\Downloads\Filename*")
#files = files.sort(reverse=True)
files = sorted(files, reverse = True)
print(files)
exit()
Using Regex with pattern r"\((\d+)\)" to extract the number inside the brackets and then convert to int for sorting.
Ex:
import re
files = glob.glob(r"C:\Users\xxxxx\Downloads\Filename*")
files = sorted(files, key=lambda x:int(re.search(r"\((\d+)\)", x).group(1)), reverse = True)
You can use key inside the sorted function. They key uses a function that gets the int value between the brackets.
e.g.
def get_num(x):
return int(x.split('(')[1].lstrip().split(')')[0])
files = glob.glob(r"C:\Users\xxxxx\Downloads\Filename*")
files = sorted(files, key = get_num, reverse = True)
print(files)
exit()
So I have a folder, say D:\Tree, that contains only subfolders (names may contain spaces). These subfolders contain a few files - and they may contain files of the form "D:\Tree\SubfolderName\SubfolderName_One.txt" and "D:\Tree\SubfolderName\SubfolderName_Two.txt" (in other words, the subfolder may contain both of them, one, or neither). I need to find every occurence where a subfolder contains both of these files, and send their absolute paths to a text file (in a format explained in the following example). Consider these three subfolders in D:\Tree:
D:\Tree\Grass contains Grass_One.txt and Grass_Two.txt
D:\Tree\Leaf contains Leaf_One.txt
D:\Tree\Branch contains Branch_One.txt and Branch_Two.txt
Given this structure and the problem mentioned above, I'd to like to be able to write the following lines in myfile.txt:
D:\Tree\Grass\Grass_One.txt D:\Tree\Grass\Grass_Two.txt
D:\Tree\Branch\Branch_One.txt D:\Tree\Branch\Branch_Two.txt
How might this be done? Thanks in advance for any help!
Note: It is very important that "file_One.txt" comes before "file_Two.txt" in myfile.txt
import os
folderPath = r'Your Folder Path'
for (dirPath, allDirNames, allFileNames) in os.walk(folderPath):
for fileName in allFileNames:
if fileName.endswith("One.txt") or fileName.endswith("Two.txt") :
print (os.path.join(dirPath, fileName))
# Or do your task as writing in file as per your need
Hope this helps....
Here is a recursive solution
def findFiles(writable, current_path, ending1, ending2):
'''
:param writable: file to write output to
:param current_path: current path of recursive traversal of sub folders
:param postfix: the postfix which needs to match before
:return: None
'''
# check if current path is a folder or not
try:
flist = os.listdir(current_path)
except NotADirectoryError:
return
# stores files which match given endings
ending1_files = []
ending2_files = []
for dirname in flist:
if dirname.endswith(ending1):
ending1_files.append(dirname)
elif dirname.endswith(ending2):
ending2_files.append(dirname)
findFiles(writable, current_path+ '/' + dirname, ending1, ending2)
# see if exactly 2 files have matching the endings
if len(ending1_files) == 1 and len(ending2_files) == 1:
writable.write(current_path+ '/'+ ending1_files[0] + ' ')
writable.write(current_path + '/'+ ending2_files[0] + '\n')
findFiles(sys.stdout, 'G:/testf', 'one.txt', 'two.txt')
Let's say I have the following files in a directory:
snackbox_1a.dat
zebrabar_3z.dat
cornrows_00.dat
meatpack_z2.dat
I have SEVERAL of these directories, in which all of the files are of the same format, ie:
snackbox_xx.dat
zebrabar_xx.dat
cornrows_xx.dat
meatpack_xx.dat
So what I KNOW about these files is the first bit (snackbox, zebrabar, cornrows, meatpack). What I don't know is the bit for the file extension (the 'xx'). This changes both within the directory across the files, and across the directories (so another directory might have different xx values, like 12, yy, 2m, 0t, whatever).
Is there a way for me to rename all of these files, or truncate them all (since the xx.dat will always be the same length), for ease of use when attempting to call them? For instance, I'd like to rename them so that I can, in another script, use a simple index to step through and find the file I want (instead of having to go into each directory and pull the file out manually).
In other words, I'd like to change the file names to:
snackbox.dat
zebrabar.dat
cornrows.dat
meatpack.dat
Thanks!
You can use shutil.move to move files. To calculate the new filename, you can use Python's string split method:
original_name = "snackbox_12.dat"
truncated_name = original.split("_")[0] + ".dat"
Try re.sub:
import re
filename = 'snackbox_xx.dat'
filename_new = re.sub(r'_[A-Za-z0-9]{2}', '', filename)
You should get 'snackbox.dat' for filename_new
This assumes the two characters after the "_" are either a number or lowercase/uppercase letter, but you could choose to expand the classes included in the regular expression.
EDIT: including moving and recursive search:
import shutil, re, os, fnmatch
directory = 'your_path'
for root, dirnames, filenames in os.walk(directory):
for filename in fnmatch.filter(filenames, '*.dat'):
filename_new = re.sub(r'_[A-Za-z0-9]{2}', '', filename)
shutil.move(os.path.join(root, filename), os.path.join(root, filename_new))
This solution renames all files in the current directory that match the pattern in the function call.
What the function does
snackbox_5R.txt >>> snackbox.txt
snackbox_6y.txt >>> snackbox_0.txt
snackbox_a2.txt >>> snackbox_1.txt
snackbox_Tm.txt >>> snackbox_2.txt
Let's look at the functions inputs and some examples.
list_of_files_names This is a list of string. Where each string is the filename without the _?? part.
Examples:
['snackbox.txt', 'zebrabar.txt', 'cornrows.txt', 'meatpack.txt', 'calc.txt']
['text.dat']
upper_bound=1000 This is an integer. When the ideal filename is already taken, e.g snackbox.dat already exist it will create snackbox_0.dat all the way up to snackbox_9999.dat if need be. You shouldn't have to change the default.
The Code
import re
import os
import os.path
def find_and_rename(dir, list_of_files_names, upper_bound=1000):
"""
:param list_of_files_names: List. A list of string: filname (without the _??) + extension, EX: snackbox.txt
Renames snackbox_R5.dat to snackbox.dat, etc.
"""
# split item in the list_of_file_names into two parts, filename and extension "snackbox.dat" -> "snackbox", "dat"
list_of_files_names = [(prefix.split('.')[0], prefix.split('.')[1]) for prefix in list_of_files_names]
# store the content of the dir in a list
list_of_files_in_dir = os.listdir(dir)
for file_in_dir in list_of_files_in_dir: # list all files and folders in current dir
file_in_dir_full_path = os.path.join(dir, file_in_dir) # we need the full path to rename to use .isfile()
print() # DEBUG
print('Is "{}" a file?: '.format(file_in_dir), end='') # DEBUG
print(os.path.isfile(file_in_dir_full_path)) # DEBUG
if os.path.isfile(file_in_dir_full_path): # filters out the folder, only files are needed
# Filename is a tuple containg the prefix filename and the extenstion
for file_name in list_of_files_names: # check if the file matches on of our renaming prefixes
# match both the file name (e.g "snackbox") and the extension (e.g "dat")
# It find "snackbox_5R.txt" by matching "snackbox" in the front and matching "dat" in the rear
if re.match('{}_\w+\.{}'.format(file_name[0], file_name[1]), file_in_dir):
print('\nOriginal File: ' + file_in_dir) # printing this is not necessary
print('.'.join(file_name))
ideal_new_file_name = '.'.join(file_name) # name might already be taken
# print(ideal_new_file_name)
if os.path.isfile(os.path.join(dir, ideal_new_file_name)): # file already exists
# go up a name, e.g "snackbox.dat" --> "snackbox_1.dat" --> "snackbox_2.dat
for index in range(upper_bound):
# check if this new name already exists as well
next_best_name = file_name[0] + '_' + str(index) + '.' + file_name[1]
# file does not already exist
if os.path.isfile(os.path.join(dir,next_best_name)) == False:
print('Renaming with next best name')
os.rename(file_in_dir_full_path, os.path.join(dir, next_best_name))
break
# this file exist as well, keeping increasing the name
else:
pass
# file with ideal name does not already exist, rename with the ideal name (no _##)
else:
print('Renaming with ideal name')
os.rename(file_in_dir_full_path, os.path.join(dir, ideal_new_file_name))
def find_and_rename_include_sub_dirs(master_dir, list_of_files_names, upper_bound=1000):
for path, subdirs, files in os.walk(master_dir):
print(path) # DEBUG
find_and_rename(path, list_of_files_names, upper_bound)
find_and_rename_include_sub_dirs('C:/Users/Oxen/Documents/test_folder', ['snackbox.txt', 'zebrabar.txt', 'cornrows.txt', 'meatpack.txt', 'calc.txt'])
I have a folder with over 100,000 files, all numbered with the same stub, but without leading zeros, and the numbers aren't always contiguous (usually they are, but there are gaps) e.g:
file-21.png,
file-22.png,
file-640.png,
file-641.png,
file-642.png,
file-645.png,
file-2130.png,
file-2131.png,
file-3012.png,
etc.
I would like to batch process this to create padded, contiguous files. e.g:
file-000000.png,
file-000001.png,
file-000002.png,
file-000003.png,
When I parse the folder with for filename in os.listdir('.'): the files don't come up in the order I'd like to them to. Understandably they come up
file-1,
file-1x,
file-1xx,
file-1xxx,
etc. then
file-2,
file-2x,
file-2xx,
etc. How can I get it to go through in the order of the numeric value? I am a complete python noob, but looking at the docs i'm guessing I could use map to create a new list filtering out only the numerical part, and then sort that list, then iterate that? With over 100K files this could be heavy. Any tips welcome!
import re
thenum = re.compile('^file-(\d+)\.png$')
def bynumber(fn):
mo = thenum.match(fn)
if mo: return int(mo.group(1))
allnames = os.listdir('.')
allnames.sort(key=bynumber)
Now you have the files in the order you want them and can loop
for i, fn in enumerate(allnames):
...
using the progressive number i (which will be 0, 1, 2, ...) padded as you wish in the destination-name.
There are three steps. The first is getting all the filenames. The second is converting the filenames. The third is renaming them.
If all the files are in the same folder, then glob should work.
import glob
filenames = glob.glob("/path/to/folder/*.txt")
Next, you want to change the name of the file. You can print with padding to do this.
>>> filename = "file-338.txt"
>>> import os
>>> fnpart = os.path.splitext(filename)[0]
>>> fnpart
'file-338'
>>> _, num = fnpart.split("-")
>>> num.rjust(5, "0")
'00338'
>>> newname = "file-%s.txt" % num.rjust(5, "0")
>>> newname
'file-00338.txt'
Now, you need to rename them all. os.rename does just that.
os.rename(filename, newname)
To put it together:
for filename in glob.glob("/path/to/folder/*.txt"): # loop through each file
newname = make_new_filename(filename) # create a function that does step 2, above
os.rename(filename, newname)
Thank you all for your suggestions, I will try them all to learn the different approaches. The solution I went for is based on using a natural sort on my filelist, and then iterating that to rename. This was one of the suggested answers but for some reason it has disappeared now so I cannot mark it as accepted!
import os
files = os.listdir('.')
natsort(files)
index = 0
for filename in files:
os.rename(filename, str(index).zfill(7)+'.png')
index += 1
where natsort is defined in http://code.activestate.com/recipes/285264-natural-string-sorting/
Why don't you do it in a two step process. Parse all the files and rename with padded numbers and then run another script that takes those files, which are sorted correctly now, and renames them so they're contiguous?
1) Take the number in the filename.
2) Left-pad it with zeros
3) Save name.
def renamer():
for iname in os.listdir('.'):
first, second = iname.replace(" ", "").split("-")
number, ext = second.split('.')
first, number, ext = first.strip(), number.strip(), ext.strip()
number = '0'*(6-len(number)) + number # pad the number to be 7 digits long
oname = first + "-" + number + '.' + ext
os.rename(iname, oname)
print "Done"
Hope this helps
The simplest method is given below. You can also modify for recursive search this script.
use os module.
get filenames
os.rename
import os
class Renamer:
def __init__(self, pattern, extension):
self.ext = extension
self.pat = pattern
return
def rename(self):
p, e = (self.pat, self.ext)
number = 0
for x in os.listdir(os.getcwd()):
if str(x).endswith(f".{e}") == True:
os.rename(x, f'{p}_{number}.{e}')
number+=1
return
if __name__ == "__main__":
pattern = "myfile"
extension = "txt"
r = Renamer(pattern=pattern, extension=extension)
r.rename()