glob function in python with one wildcard - python

I have a problem with the glob.glob function in Python.
This line works perfectly for me, returning all text files named 002.txt that sit two folder levels below Models:
All_txt = glob.glob("C:\Users\EDV\Desktop\Peter\Models\*\*\002.txt")
But going into one subfolder and asking the same:
All_txt = glob.glob('C:\Users\EDV\Desktop\Peter\Models\Texte\*\002.txt')
results in an empty list. Does anybody know what the problem here is (or knows another function which expresses the same)?
I double-checked the folder paths and that all folders contain these text-files.

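Try putting an r in front of the string to make it a raw string: glob.glob(r'C:\Users\EDV\Desktop\Peter\Models\Texte\*\002.txt'). In a raw string the backslashes aren't used to escape the next character. Without it, \002 is read as an octal escape (the control character chr(2)), so glob ends up looking for a file name that doesn't exist.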
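A tiny check makes the difference visible. This is only a sketch: the short name x\002.txt stands in for the tail of the real path.

```python
# Without the r prefix, \002 is an octal escape for a single control character
plain = "x\002.txt"
# With the r prefix, the backslash and the digits 0, 0, 2 are kept as typed
raw = r"x\002.txt"

print(len(plain), repr(plain))  # 6 'x\x02.txt'
print(len(raw), repr(raw))      # 9 'x\\002.txt'
```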
You could also do it without glob like so:
import os
all_txt = []
root = r'C:\Users\EDV\Desktop\Peter\Models\Texte'
for d in os.listdir(root):
    abs_d = os.path.join(root, d)
    if os.path.isdir(abs_d):
        txt = os.path.join(abs_d, '002.txt')
        if os.path.isfile(txt):
            all_txt.append(txt)
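If you are on Python 3.4 or newer, pathlib can express the same single-wildcard search without worrying about escapes in the pattern. This is a sketch rather than a drop-in: find_002 is a hypothetical helper name, and '*' here matches exactly one directory level, just like Texte\*\002.txt.

```python
from pathlib import Path


def find_002(root):
    """Return every file named 002.txt exactly one directory level below root."""
    # '*' matches one directory level; is_file() skips any directory called 002.txt
    return sorted(str(p) for p in Path(root).glob("*/002.txt") if p.is_file())
```

Calling find_002(r'C:\Users\EDV\Desktop\Peter\Models\Texte') should then return the same files as the raw-string glob.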

Related

Glob to match files except certain extension

Newbie to Python! I'm trying to use glob in conjunction with max to find the last modified file in a folder, but excluding one type with extension pdf.
Without the exclude I have this, which is working fine:
crshLogs = glob.glob(homePath+crshLogPath+'*.*')
currCrshLog = max(crshLogs , key = os.path.getmtime)
To try and exclude the pdf I've tried:
crshLogs = glob.glob(homePath+crshLogPath+'!(*.pdf)')
and also
crshLogs = glob.glob(homePath+crshLogPath+'*.*') - glob.glob(homePath+crshLogPath+'*.pdf')
But in both cases the next line of code fails with ValueError: max() arg is an empty sequence so presumably nothing is being returned.
Any help would be gratefully received!
crshLogs = [filename for filename in glob.glob(homePath+crshLogPath+'*.*') if not filename.endswith('.pdf')]
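Also I would change
crshLogs = glob.glob(homePath+crshLogPath+'*.*')
to
crshLogs = glob.glob(os.path.join(homePath, crshLogPath, '*.*'))
This takes care of nasty edge cases like homePath not ending in /, where plain concatenation would glue the two parts into a wrong path. One caveat: if crshLogPath starts with /, os.path.join treats it as an absolute path and discards homePath, so strip any leading / from it first.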
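The difference is easy to check, including one trap worth knowing about: a component that starts with a separator is treated as absolute and resets the path. This sketch uses posixpath directly so it behaves the same on every OS (os.path is posixpath on Linux and macOS); the directory names are made up.

```python
import posixpath  # os.path is posixpath on Linux and macOS

# Missing separators between components are filled in
joined = posixpath.join("/home/me", "logs", "*.*")
print(joined)  # /home/me/logs/*.*

# A component starting with "/" is absolute, so everything before it is dropped
clobbered = posixpath.join("/home/me", "/logs", "*.*")
print(clobbered)  # /logs/*.*

# Stripping the leading "/" from the relative part avoids that
fixed = posixpath.join("/home/me", "/logs".lstrip("/"), "*.*")
print(fixed)  # /home/me/logs/*.*
```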
You could create a list and not put PDFs in it:
file_list = []
for filename in glob.glob(homePath+crshLogPath+'*.*'):
    if ".pdf" not in filename:
        file_list.append(filename)
And then get your filenames from that list.
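Putting the pieces together, a sketch of the whole task might look like this. latest_non_pdf is a hypothetical name; it assumes Python 3.4+ for max(..., default=None), which returns None instead of raising the ValueError from the question when nothing matches, and it compares extensions case-insensitively so .PDF files are skipped too.

```python
import glob
import os


def latest_non_pdf(directory):
    """Return the most recently modified non-PDF file in directory, or None."""
    candidates = [
        path
        for path in glob.glob(os.path.join(directory, "*.*"))
        # lower() also excludes .PDF; isfile() skips directories with a dot in the name
        if not path.lower().endswith(".pdf") and os.path.isfile(path)
    ]
    # default=None avoids "max() arg is an empty sequence" when nothing is left
    return max(candidates, key=os.path.getmtime, default=None)
```

With the question's variables that would be currCrshLog = latest_non_pdf(os.path.join(homePath, crshLogPath)).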

Change image names using os.walk to include parent directory names

I would like to rename images based on part of the name of the folder the images are in and iterate through the images. I am using os.walk and I was able to rename all the images in the folders but could not figure out how to use the letters to the left of the first hyphen in the folder name as part of the image name.
Folder name: ABCDEF - THIS IS - MY FOLDER - NAME
Current image names in folder:
dsc_001.jpg
dsc_234.jpg
dsc_123.jpg
Want to change to show like this:
ABCDEF_1.jpg
ABCDEF_2.jpg
ABCDEF_3.jpg
What I have is this, but I am not sure why I am unable to split the filename by the hyphen:
import os
from os.path import join
path = r'C:\folderPath'
i = 1
for root, dirs, files in os.walk(path):
    for image in files:
        prefix = files.split(' - ')[0]
        os.rename(os.path.join(path, image), os.path.join(path, prefix + '_' + str(i) + '.jpg'))
        i = i+1
Okay, I've re-read your question and I think I know what's wrong.
1.) os.walk() is recursive, i.e. if you use os.walk('C:\\'), it will loop through all the folders and find all the files on the C drive. I'm not sure whether your C:\folderPath has any sub-folders in it. If it does, and any of those folders or files don't follow the same naming convention as C:\folderPath, your code is going to have a bad time.
2.) When you iterate through files, you are split()ing the wrong object. Your question states you want to split the folder name, but your code is splitting files, which is the list of all the files in the directory of the current iteration (and a list has no split() method at all). That doesn't accomplish what you want. Depending on whether your ABCDEF folder is C:\folderPath itself or a sub-folder within it, you'll need to code differently.
3.) You have imported join from os.path but you still end up calling the full name os.path.join() anyway, which is redundant. Either just import os and call os.path.join(), or keep your current imports and call join().
Having said all of that, here are my edits:
Answer 1:
If your ABCDEF is the assigned folder
import os
from os.path import join
path = r'C:\ABCDEF - THIS IS - MY FOLDER - NAME'
for root, dirs, files in os.walk(path):
    folder = root.split("\\")[-1]  # This gets you the current folder's name
    for i, image in enumerate(files):
        new_image = "{0}_{1}.jpg".format(folder.split(' - ')[0], i + 1)
        os.rename(join(root, image), join(root, new_image))
    break  # If you have sub folders that follow the SAME structure, remove this break. Otherwise keep it so your code stops once all the files in your parent folder are updated.
Answer 2:
Assuming your ABCDEF's are all sub folders under the assigned directory, and all of them follow the same naming convention.
import os
from os.path import join
path = r'C:\parentFolder'  # The folder that has all the sub folders that are named ABCDEF...
for i, (root, dirs, files) in enumerate(os.walk(path)):
    if i == 0: continue  # skip the parentFolder as it doesn't follow the same naming convention
    folder = root.split("\\")[-1]  # This gets you the current folder's name
    for i, image in enumerate(files):
        new_image = "{0}_{1}.jpg".format(folder.split(' - ')[0], i + 1)
        os.rename(join(root, image), join(root, new_image))  # the files live in root, not in path
Note:
If your scenario doesn't fall under either of these, please make it clear what your folder structure is (a sample including all sub folders and files). Remember, consistency is key in determining how your code should work. If it's inconsistent, your best bet is to use Answer 1 on each target folder separately.
Changes:
1.) You can get an incremental index without doing i += 1. enumerate() wraps any iterable and also gives you the iteration number.
2.) Your split() should be operated on the folder name instead of files (an iterable). In your case, image is the actual file name, and files is the list of files in the current iteration directory.
3.) The str.format() method makes your new file name format easier to read.
4.) You'll note the use of split("\\") instead of split(r"\"), and that's because a raw string literal cannot end in a single backslash: r"\" is a syntax error.
This should now work. I ended up doing a lot more research than expected, such as how to handle os.walk() properly in both scenarios. For future reference, a little Google search goes a long way. I hope this finally answers your question. Remember, doing your own research and demonstrating your problem clearly will get you more efficient answers.
Bonus: if you have Python 3.6+, you can even use f-strings for your new file name, which ends up looking really cool:
new_image = f"{folder.split(' - ')[0]}_{i+1}.jpg"
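For the single-folder case (Answer 1), here is a pathlib-based sketch of the same idea, swapping os.walk for Path.iterdir. rename_images and its dry_run flag are hypothetical; it assumes the images all have a .jpg extension (compared case-insensitively) and that numbering them in sorted name order is acceptable. It refuses to run if a target name like ABCDEF_1.jpg already exists, so nothing gets overwritten.

```python
from pathlib import Path


def rename_images(folder, ext=".jpg", dry_run=False):
    """Rename the *ext files in folder to <prefix>_<n><ext>, where <prefix> is
    the part of the folder name before the first ' - '. ext must be lowercase.
    Returns the (old name, new name) pairs."""
    folder = Path(folder)
    prefix = folder.name.split(" - ")[0]
    # Sort so the numbering is predictable; directory listing order is arbitrary
    images = sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ext
    )
    plan = [(src, folder / f"{prefix}_{n}{ext}") for n, src in enumerate(images, start=1)]
    # Refuse to rename anything if a target name is already taken,
    # so no existing file can be overwritten
    taken = {p.name for p in folder.iterdir()}
    clashes = [dst.name for src, dst in plan if dst.name in taken and dst != src]
    if clashes:
        raise FileExistsError(f"target names already in use: {clashes}")
    if not dry_run:
        for src, dst in plan:
            if src != dst:
                src.rename(dst)
    return [(src.name, dst.name) for src, dst in plan]
```

Running it with dry_run=True first shows the planned (old, new) pairs without touching any files.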

WxPython - building a directory tree based on file availability

I do atomistic modelling, and use Python to analyze simulation results. To simplify work with a whole bunch of Python scripts used for different tasks, I decided to write a simple GUI to run scripts from it.
I have a (rather complex) directory structure beginning from some root (say ~/calc), and I want to populate a wx.TreeCtrl control with the directories containing calculation results, preserving their structure. A folder contains results if it contains a file with the .EXT extension. What I try to do is walk through the dirs from root and, in each dir, check whether it contains a .EXT file. When such a dir is reached, add it and its ancestors to the tree:
def buildTree(self, rootdir):
    root = rootdir
    r = len(rootdir.split('/'))
    ids = {root : self.CalcTree.AddRoot(root)}
    for (dirpath, dirnames, filenames) in os.walk(root):
        for dirname in dirnames:
            fullpath = os.path.join(dirpath, dirname)
            if sum([s.find('.EXT') for s in filenames]) > -1 * len(filenames):
                ancdirs = fullpath.split('/')[r:]
                ad = rootdir
                for ancdir in ancdirs:
                    d = os.path.join(ad, ancdir)
                    ids[d] = self.CalcTree.AppendItem(ids[ad], ancdir)
                    ad = d
But this code ends up with many second-level nodes with the same name, and that's definitely not what I want. So I somehow need to check whether a node has already been added to the tree and, if so, add the new node under the existing one, but I do not understand how this could be done. Could you please give me a hint?
Besides, the code contains 2 dirty hacks I'd like to get rid of:
I get the list of ancestor dirs by splitting the full path at '/' positions, and this is Linux-specific;
I find out whether a .EXT file is in the directory by trying to find the extension in the strings from the filenames list, taking into account that s.find returns -1 if the substring is not found.
Is there a way to make these chunks of code more readable?
First of all, the hacks:
To get the path separator for whatever OS you're using, you can use os.sep.
Use str.endswith() and use the fact that in Python the empty list [] evaluates to False:
if [ file for file in filenames if file.endswith('.EXT') ]:
In terms of getting them all nicely nested, you're best off doing it recursively. The following gives you an idea of how to do it; note that it adds every entry under rootdir, so you would still need to add the .EXT check for your case:
import os

def buildTree(self, rootdir):
    rootId = self.CalcTree.AddRoot(rootdir)
    self.buildTreeRecursion(rootdir, rootId)

def buildTreeRecursion(self, dir, parentId):
    # Iterate over the entries in dir
    for name in sorted(os.listdir(dir)):
        path = os.path.join(dir, name)
        id = self.CalcTree.AppendItem(parentId, name)
        if os.path.isdir(path):
            # Recurse so the subdirectory's entries nest under this node
            self.buildTreeRecursion(path, id)
Hope this helps!
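To address the duplicate second-level nodes directly, one option is to build the directory structure as a nested dict first and only then create the tree items. Because each level is a dict keyed by directory name, setdefault returns the existing child instead of adding a duplicate. This is a sketch: dirs_with_ext and populate are hypothetical names, populate only assumes the tree object has an AppendItem(parent, text) method like wx.TreeCtrl, and a .EXT file sitting directly in rootdir is represented by the root node alone.

```python
import os


def dirs_with_ext(rootdir, ext=".EXT"):
    """Nested dict of the directories below rootdir that contain a file ending
    in ext, together with all of their ancestor directories."""
    tree = {}
    for dirpath, dirnames, filenames in os.walk(rootdir):
        # any() stops at the first match; no need for the str.find() trick
        if not any(name.endswith(ext) for name in filenames):
            continue
        rel = os.path.relpath(dirpath, rootdir)
        if rel == os.curdir:
            continue  # rootdir itself is represented by the tree's root node
        node = tree
        # os.sep instead of '/' keeps the split portable to Windows
        for part in rel.split(os.sep):
            # setdefault reuses an existing child instead of creating a duplicate
            node = node.setdefault(part, {})
    return tree


def populate(tree_ctrl, parent_id, subtree):
    """Append subtree under parent_id on any object with AppendItem(parent, text)."""
    for name in sorted(subtree):
        child_id = tree_ctrl.AppendItem(parent_id, name)
        populate(tree_ctrl, child_id, subtree[name])
```

In the GUI class that would be roughly populate(self.CalcTree, self.CalcTree.AddRoot(rootdir), dirs_with_ext(rootdir)).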

batch search and replace strings in filenames with python

I am trying to write a small python script to rename a bunch of filenames by searching and replacing. For example:
Original filename:
MyMusic.Songname.Artist-mp3.iTunes.mp3
Intended result:
Songname.Artist.mp3
What I've got so far is:
#!/usr/bin/env python
from os import rename, listdir
mustgo = "MyMusic."
filenames = listdir('.')
for fname in filenames:
    if fname.startswith(mustgo):
        rename(fname, fname.replace(mustgo, '', 1))
(got it from this site, as far as I can remember)
Anyway, this only gets rid of the string at the beginning, not of those elsewhere in the filename.
I would also like to use a separate file (e.g. badwords.txt) containing all the strings that should be searched for and removed, so that I can update them without having to edit the code.
Content of badwords.txt
MyMusic.
-mp3
-MP3
.iTunes
.itunes
I have been searching for quite some time now but haven't found anything. Would appreciate any help!
Thank you!
import fnmatch
import re
import os
with open('badwords.txt', 'r') as f:
    # On Python 3, translate() returns e.g. '(?s:\.iTunes)\Z'; [:-2] drops the
    # two-character end anchor so each badword can match anywhere in the name
    pat = '|'.join(fnmatch.translate(badword)[:-2]
                   for badword in f.read().splitlines() if badword)
for fname in os.listdir('.'):
    new_fname = re.sub(pat, '', fname)
    if fname != new_fname:
        print('{o} --> {n}'.format(o=fname, n=new_fname))
        os.rename(fname, new_fname)
# MyMusic.Songname.Artist-mp3.iTunes.mp3 --> Songname.Artist.mp3
Note that it is possible for some files to be overwritten (and thus lost) if two names get reduced to the same shortened name after badwords have been removed. A set of new fnames could be kept and checked before calling os.rename to prevent losing data through name collisions.
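fnmatch.translate takes shell-style patterns and returns the equivalent regular expression. It is used above to convert badwords (e.g. '.iTunes') into regular expressions (e.g. r'(?s:\.iTunes)\Z' on Python 3); the trailing end-of-string anchor is sliced off so the pattern can match anywhere in the file name.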
Your badwords list seems to indicate you want to ignore case. You could ignore case by adding '(?i)' to the beginning of pat:
with open('badwords.txt', 'r') as f:
    pat = '(?i)' + '|'.join(fnmatch.translate(badword)[:-2]
                            for badword in f.read().splitlines() if badword)
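Following up on the collision note, here is a sketch that works out all the new names first and skips any that would clash, before anything is renamed. It swaps fnmatch.translate for re.escape, on the assumption that badwords.txt contains plain strings rather than shell wildcards; plan_renames is a hypothetical name.

```python
import re


def plan_renames(filenames, badwords, ignore_case=True):
    """Work out old -> new names with every badword removed, without renaming.

    Returns (plan, skipped): plan maps old name to new name; skipped lists
    names whose new name would be empty or collide with an existing or
    already-planned name.
    """
    # re.escape makes each badword match literally (no shell-style wildcards)
    alternatives = "|".join(re.escape(word) for word in badwords if word)
    pattern = re.compile(alternatives, re.IGNORECASE if ignore_case else 0)
    taken = set(filenames)  # names on disk plus names claimed by the plan
    plan, skipped = {}, []
    for old in filenames:
        new = pattern.sub("", old)
        if new == old:
            continue  # nothing to remove
        if not new or new in taken:
            skipped.append(old)  # renaming would overwrite (or empty) a name
            continue
        taken.add(new)
        plan[old] = new
    return plan, skipped
```

To apply it, read the badwords with f.read().splitlines(), call plan_renames(os.listdir('.'), badwords), and then os.rename(old, new) for each pair in plan.items(); whatever ends up in skipped needs a manual look.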

Selecting folders using strings in Python

Simple question here: I'm trying to identify folders with a specific string in their name, but I want to specify some additional exclusion criteria. Right now, I'm looking for all folders that begin with a specific string using this syntax:
import os
parent_cause = 'B03'
path = ('filepath')
child_causes = [x for x in os.listdir(path) if x.startswith(parent_cause + '.')]
While this does identify the subfolders I am looking for ('B03.1', 'B03.2'), it also includes deeper subfolders which I want to exclude ('B03.1.1', 'B03.1.2'). Any thoughts on a simple algorithm to identify subfolders which begin with the string, but exclude ones that have two or more '.' characters more than the parent?
Not sure I fully understand the issue, but I suggest os.walk:
import os
good_dirs = []
bad_dirs = []
for root, dirs, files in os.walk("/tmp/folder/B03"):
    # this walks recursively (top-down, depth first) into B03
    # root is the directory currently being visited, so we can test its name
    if os.path.basename(root).count(".") == 1:  # a regex might be stricter here
        good_dirs.append(root)
    else:
        bad_dirs.append(root)
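Try using a regex:
import os
import re
parent_cause = 'B03'
path = ('filepath')
validPath = []
for eachDir in os.listdir(path):
    # re.escape keeps any '.' in parent_cause literal; \d+$ allows exactly one more numeric level
    if re.match(r'^%s\.\d+$' % re.escape(parent_cause), eachDir):
        validPath.append(os.path.join(path, eachDir))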
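The dot-counting rule from the question can also be written without a regex. This sketch (direct_children is a hypothetical name) keeps names that start with the parent plus '.' and have exactly one more '.' than the parent; unlike the regex above, it also accepts non-numeric suffixes such as 'B03.x'.

```python
def direct_children(names, parent):
    """Keep names of the form '<parent>.<something>' that have exactly one
    more '.' than parent, e.g. 'B03.1' but not 'B03.1.1'."""
    prefix = parent + "."
    depth = parent.count(".") + 1
    # startswith(prefix) also rules out look-alikes such as 'B030.1'
    return [n for n in names if n.startswith(prefix) and n.count(".") == depth]
```

With the question's variables: child_causes = direct_children(os.listdir(path), parent_cause).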
