I often find myself in a situation where I have a folder containing files named according to a certain naming convention, and I have to go through them manually to rename them to the convention I want. It's a laborious, repetitive task.
E.g. 01_artist_name_-_album_title_-_song_title_somethingelse.mp3 -> Song_Title.mp3
That means removing certain bits of information, replacing underscores with spaces, and capitalising. It's not just for music; that's just an example.
I have been thinking about automating this task using Python. Basically I want to be able to input the starting convention and my wanted convention and for it to rename them all accordingly.
Ideally I want to be able to do this in Python on Windows, but I have an Ubuntu machine I could use for this if it was easier to do in bash (or Python on UNIX).
If anyone can shed light on how I might approach this problem (suggestions of Python I/O commands that read the contents of a folder and rename files on Windows, and how I might go about stripping information from the filename and categorising it, maybe using regex?), I'll see what I can make it do and update with progress.
For your special case:
import glob, shutil, os.path
# glob.glob returns a list of all paths matching the given pattern
for path in glob.glob("music_folder/*.mp3"):
    # os.path.dirname gives the directory name, here it is "music_folder"
    dirname = os.path.dirname(path)
    # example: 01_artist_name_-_album_title_-_song_title_somethingelse.mp3
    # splitting the file name (not the whole path, which may contain "-") on "-"
    # and taking element [2] gives "_song_title_somethingelse.mp3"
    interesting = os.path.basename(path).split("-")[2]
    # titlepart is a list with ["song", "title"]; the leading empty string (from
    # the first "_") and "somethingelse.mp3" are removed by the slice 1:-1
    titlepart = interesting.split("_")[1:-1]
    # capitalize converts song -> Song, title -> Title
    # join glues both to "Song_Title"
    new_name = "_".join(p.capitalize() for p in titlepart) + ".mp3"
    # shutil.move renames the given file
    shutil.move(path, os.path.join(dirname, new_name))
If you want to use a regular expression instead, replace the two lines that compute interesting and titlepart with:
m = re.search(r".*-_(\S+_\S+)_.*", path)  # needs "import re" at the top
if m is None:
    raise Exception("file name does not match regular expression")
song_name = m.groups()[0]
titlepart = song_name.split("_")
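To get closer to the original goal of supplying a "starting convention" and a "wanted convention", here is a minimal sketch (not part of the answer above) in which the starting convention is a regular expression with named groups and the wanted convention is a str.format template. The names rename_by_pattern and title_case_words and the dry_run flag are hypothetical, chosen for illustration only.

```python
import os
import re


def rename_by_pattern(folder, pattern, template, transform=None, dry_run=True):
    """Rename files in `folder` whose names fully match `pattern`.

    `pattern` is a regex with named groups, `template` a str.format string
    that uses those group names. `transform`, if given, is applied to every
    captured group first. Returns a list of (old_name, new_name) pairs;
    with dry_run=True nothing on disk is changed.
    """
    regex = re.compile(pattern)
    planned = []
    for name in sorted(os.listdir(folder)):
        m = regex.fullmatch(name)
        if m is None:
            continue  # file does not follow the starting convention
        groups = m.groupdict()
        if transform is not None:
            groups = {key: transform(value) for key, value in groups.items()}
        new_name = template.format(**groups)
        planned.append((name, new_name))
        if not dry_run:
            os.rename(os.path.join(folder, name), os.path.join(folder, new_name))
    return planned


def title_case_words(text):
    # "song_title" -> "Song_Title"
    return "_".join(word.capitalize() for word in text.split("_"))
```

For a folder containing only the example file, calling rename_by_pattern(folder, r"\d+_(?P<artist>.+?)_-_(?P<album>.+?)_-_(?P<title>[^_]+_[^_]+)_.*\.mp3", "{title}.mp3", transform=title_case_words) plans the rename to Song_Title.mp3 without touching the disk; pass dry_run=False to actually rename. A transform such as lambda s: s.replace("_", " ").title() would give spaces instead of underscores.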
Related
I have a folder which contains 100+ songs named this way: "Song Name, Singer Name" (e.g. Smooth Criminal, Michael Jackson). I'm trying to rename all the songs to "Song Name (Singer Name)" (e.g. Smooth Criminal (Michael Jackson)).
I tried this code, but I didn't know what parameters to pass.
import os
files = os.getcwd()
os.rename(files, "") # I'm confused because I don't know what to put here as parameters since I want to change only parts of the files' names, and not the files' names entirely.
Any suggestions on the parameters of "os.rename()"?
Unlike the command-line program rename, which can rename a batch of files using a pattern in a single command, Python's os.rename() is a thin wrapper around the underlying rename syscall, so it can only rename one file at a time.
Assuming all songs are stored in the same directory and end with an extension like '.mp3', one approach is to loop over the list returned by os.listdir() and call os.rename() once per file.
Additionally, it would be wise to check that the current entry is indeed a regular file and not, say, a directory. This can be done with os.path.isfile() on the full path (note that it follows symbolic links, so a link to a regular file also passes the check).
Here is a full example:
import os
TARGET_DIR = "/path/to/some/folder"
for filename in os.listdir(TARGET_DIR):
    # Only consider regular files; os.listdir returns bare names, so join with TARGET_DIR
    if not os.path.isfile(os.path.join(TARGET_DIR, filename)):
        continue
    # Extract filename and extension
    basename, extension = os.path.splitext(filename)
    if ',' not in basename:
        continue  # Ignore files that don't follow the "Song, Singer" convention
    # Extract the song and singer names, dropping the space after the comma
    songname, singername = basename.split(',', 1)
    songname, singername = songname.strip(), singername.strip()
    # Build the new name, rename
    target_name = f'{songname} ({singername}){extension}'
    os.rename(
        os.path.join(TARGET_DIR, filename),
        os.path.join(TARGET_DIR, target_name)
    )
Note: If the songs may be stored in subfolders, os.walk() is a better candidate than the lower-level os.listdir().
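A minimal sketch of that os.walk() variant, applying the same "Song, Singer" to "Song (Singer)" rule in every folder below a top-level directory. The function name rename_songs_recursively is hypothetical, and, like the example above, it does not guard against a target name that already exists.

```python
import os


def rename_songs_recursively(top_dir):
    """Rename 'Song Name, Singer Name.ext' to 'Song Name (Singer Name).ext'
    in top_dir and all its subfolders; return the list of new paths."""
    renamed = []
    for dirpath, dirnames, filenames in os.walk(top_dir):
        # filenames holds only the non-directory entries of dirpath
        for filename in filenames:
            basename, extension = os.path.splitext(filename)
            if ',' not in basename:
                continue  # not in the "Song, Singer" convention
            songname, singername = basename.split(',', 1)
            target_name = f'{songname.strip()} ({singername.strip()}){extension}'
            src = os.path.join(dirpath, filename)
            dst = os.path.join(dirpath, target_name)
            os.rename(src, dst)
            renamed.append(dst)
    return renamed
```

Renaming files inside the loop is safe here because os.walk() has already listed each directory before yielding it, and only directory names (not file names) affect where the walk goes next.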
I would recommend writing a loop using os.
However, with pandas you can replace parts of the names without any problem; I would recommend choosing pandas for this task!
I have a data path to a couple of data files, let us say data01.txt, data02.txt and so on. During processing the user will provide mask files for the data (potentially also via an external tool). Mask files will contain the string 'mask', e.g. data01-mask.txt.
from pathlib import Path
p = Path(r"C:\Windows\test\data01.txt")
dircontent = list(p.parent.glob('*'))
This gives me a list of all the file paths as Path objects, including any mask files. Now I want a list of the directory content that excludes every file containing 'mask'. I have tried to use a fancy pattern like *![mask]* but I cannot get it to work.
Using,
dircontentstr = [str(elem) for elem in dircontent]
filtereddir = [elem for elem in dircontentstr if elem.find('mask') == -1]
I can get the desired result, but it seems silly to then convert back to Path objects. Is there a straightforward way to exclude these files from the directory listing?
There is no need to convert anything to strings here, as Path objects have helpful attributes that you can use to filter on. Take a look at the .name and .stem attributes; these let you filter path objects on the base filename (where .stem is the base name without extension):
dircontent = [path for path in p.parent.glob('*') if 'mask' not in path.stem]
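A small self-contained check of that filter, where temporary files stand in for data01.txt, data01-mask.txt and so on. The helper name data_files_without_masks is hypothetical, and the extra is_file() test (which skips subdirectories) goes beyond the one-liner above.

```python
from pathlib import Path


def data_files_without_masks(directory):
    """Return the regular files in `directory` whose stem does not contain 'mask'."""
    return sorted(
        path for path in Path(directory).glob('*')
        # .stem is the name without its extension: 'data01-mask.txt' -> 'data01-mask'
        if path.is_file() and 'mask' not in path.stem
    )
```

Because the results stay Path objects, you can pass them straight on to open(), .rename() or further path arithmetic without converting back from strings.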
I would like to rename images based on part of the name of the folder they are in, numbering the images as I iterate through them. I am using os.walk and I was able to rename all the images in the folders, but I could not figure out how to use the letters to the left of the first hyphen in the folder name as part of the image name.
Folder name: ABCDEF - THIS IS - MY FOLDER - NAME
Current image names in folder:
dsc_001.jpg
dsc_234.jpg
dsc_123.jpg
Want to change to show like this:
ABCDEF_1.jpg
ABCDEF_2.jpg
ABCDEF_3.jpg
What I have is this, but I am not sure why I am unable to split the filename by the hyphen:
import os
from os.path import join
path = r'C:\folderPath'
i = 1
for root, dirs, files in os.walk(path):
    for image in files:
        prefix = files.split(' - ')[0]
        os.rename(os.path.join(path, image), os.path.join(path, prefix + '_'
                  + str(i)+'.jpg'))
        i = i+1
Okay, I've re-read your question and I think I know what's wrong.
1.) os.walk() is recursive: if you use os.walk('C:\\'), it will loop through all the folders and find all the files on the C drive. I'm not sure whether your C:\folderPath has any sub-folders. If it does, and any of those folders or files don't follow the same naming convention as C:\folderPath, your code is going to have a bad time.
2.) When you iterate through files, you are calling split() on the wrong object. Your question states you want to split the folder name, but your code is splitting files, which is the list of all the files in the current directory of the walk (and a list has no split() method, so this raises an AttributeError). That doesn't accomplish what you want. Depending on whether your ABCDEF folder is C:\folderPath itself or a sub-folder within it, you'll need to code this differently.
3.) You have imported join from os.path, but you still call the full name os.path.join(), which is redundant. Either import os and call os.path.join(), or, with your current imports, just call join().
Having said all of that, here are my edits:
Answer 1:
If your ABCDEF is the assigned folder
import os
from os.path import join
path = r'C:\ABCDEF - THIS IS - MY FOLDER - NAME'
for root, dirs, files in os.walk(path):
    folder = root.split("\\")[-1]  # This gets you the current folder's name
    for i, image in enumerate(files):
        new_image = "{0}_{1}.jpg".format(folder.split(' - ')[0], i + 1)
        # join with root (the folder being walked), so this also works without the break
        os.rename(join(root, image), join(root, new_image))
    break  # if you have sub folders that follow the SAME structure, remove this break. Otherwise, keep it so your code stops after all the files in your parent folder are updated.
Answer 2:
Assuming your ABCDEF folders are all sub-folders of the assigned directory, and all of them follow the same naming convention:
import os
from os.path import join
path = r'C:\parentFolder'  # The folder that has all the sub folders that are named ABCDEF...
for n, (root, dirs, files) in enumerate(os.walk(path)):
    if n == 0:
        continue  # skip the parentFolder as it doesn't follow the same naming convention
    folder = root.split("\\")[-1]  # This gets you the current folder's name
    for i, image in enumerate(files):
        new_image = "{0}_{1}.jpg".format(folder.split(' - ')[0], i + 1)
        # the files live in root (the sub folder), not in path (the parent folder)
        os.rename(join(root, image), join(root, new_image))
Note:
If your scenario doesn't fall under either of these, please make clear what your folder structure is (a sample including all sub-folders and files). Remember, consistency is key in determining how your code should work. If it's inconsistent, your best bet is to use Answer 1 on each target folder separately.
Changes:
1.) You can get an incremental index without doing i += 1: enumerate() wraps any iterable and also gives you the iteration number.
2.) Your split() should operate on the folder name instead of files (a list). In your case, image is the actual file name, and files is the list of files in the current directory of the walk.
3.) Use of the str.format() method to make your new file name easier to read.
4.) You'll note the use of split("\\") instead of split(r"\"): a raw string literal cannot end with a single backslash, so r"\" is a syntax error.
This should now work. I ended up doing a lot more research than expected, such as how to handle os.walk() properly in both scenarios. For future reference, a little Google search goes a long way. I hope this finally answers your question. Remember, doing your own research and demonstrating your problem clearly will get you more efficient answers.
Bonus: if you have Python 3.6+, you can even use an f-string for your new file name, which is shorter and easier to read:
new_image = f"{folder.split(' - ')[0]}_{i+1}.jpg"
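For comparison, a minimal pathlib sketch of the Answer 1 scenario (one folder, no recursion). Assumptions beyond the question: only .jpg files are renamed, they are numbered in sorted name order, and the function name number_images is hypothetical. Path.name returns the folder's own name on both Windows and Linux, so no split("\\") is needed.

```python
from pathlib import Path


def number_images(folder):
    """Rename every .jpg in `folder` to '<PREFIX>_<n>.jpg', where PREFIX is the
    part of the folder name before the first ' - '. Returns the new names."""
    folder = Path(folder)
    prefix = folder.name.split(' - ')[0]  # 'ABCDEF - THIS IS - ...' -> 'ABCDEF'
    # Collect the images first so renamed files are not picked up again
    images = sorted(p for p in folder.iterdir()
                    if p.is_file() and p.suffix.lower() == '.jpg')
    new_names = []
    for i, image in enumerate(images, start=1):
        target = folder / f"{prefix}_{i}.jpg"
        image.rename(target)
        new_names.append(target.name)
    return new_names
```

Like the answers above, this does not check whether a target such as ABCDEF_2.jpg already exists before renaming.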
I want to copy an installer file from a location where one of the folder names changes as per the build number
This works for defining the path where the last folder name changes
import glob
import os
dirname = "z:\\zzinstall\\*.install"
filespec = "setup.exe"
print(glob.glob(os.path.join(dirname, filespec)))
# the print is how I'm verifying the path is correct; it outputs:
['z:\\zzinstall\\35115.install\\setup.exe']
The problem I have is that I can't get setup.exe to launch because of the arguments it needs.
I need to launch setup.exe with, for example
setup.exe /S /z"
There are numerous other arguments that need to be passed with double quotes, slashes and whitespace. Because the documentation provided is inconsistent, I have to test via trial and error. There are even instances that state I need to use a "" after a switch!
So how can I do this?
Ideally I'd like to pass the entire path, including the file I need to glob, or I'd like to store the path returned by glob in a variable and then concatenate it with setup.exe and the arguments. That did not work: I get an error saying a list can't be concatenated with a string.
Basically anything that works. So far I've failed because of my inability to handle the file name that varies and the obscene amount of whitespace and special characters in the arguments.
The related question I found does not include a clear answer to my specific question.
Neither the response provided below nor the link I provided answers the question, which is why I'm asking it. I will rephrase in case I'm not understood.
I have a file that I need to copy at random times. The folder containing it is prefixed with a unique, unpredictable number, e.g. a build number. Note this is a Windows system.
For this example I will cite the same folder/file structure.
The build server creates a build at any time within a 4-hour range. The path to the build server folder is Z:\data\builds\*.install\setup.exe
Note the wildcard in the path. The folder name is a random (yes, random) string of 8 digits followed by a dot and then "install". So the path at one time may be Z:\data\builds\12345678.install\setup.exe, or it could be Z:\data\builds\66666666.install\setup.exe. This is one major portion of this problem. Note, I did not design this build numbering system. I've never seen anything like this in my years as a QA engineer.
So to deal with the first issue I plan on using a glob.
import glob
import os
dirname = "Z:\\data\\builds\\*.install"
filespec = "setup.exe"
instlpath = glob.glob(os.path.join(dirname, filespec))
print(instlpath)  # this test prints the correct path to launch the install; the problem is I have to add arguments
OK, so I thought I could take the path that I stored in instlpath, concatenate the arguments to it and execute it.
When I try to test this with print:
print(instlpath + [" /S /z"])
I get
['Z:\builds\install\12343333.install\setup.exe', ' /S /z']
I need
Z:\builds\install\12343333.install\setup.exe /S /z" (yes, I need the whitespace as well, and may also need a z"")
Why are all of the installs called setup.exe and not uniquely named? No freaking idea!
Thank You,
Surfdork
The related question you linked to does contain a relatively clear answer to your problem:
import subprocess
subprocess.call(['z:/zzinstall/35115.install/setup.exe', '/S', '/z', ''])
So you don't need to concatenate the path of setup.exe and its arguments. The arguments you specify in the list are passed directly to the program and not processed by the shell. For an empty argument, which would be "" in a shell command, use an empty Python string. Since glob.glob() returns a list, take one element of it (e.g. instlpath[0]) as the first item of the argument list.
See also http://docs.python.org/library/subprocess.html#subprocess.call
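Putting the two halves together: a minimal sketch that globs for the installer, picks one match and builds the argument list for subprocess. Picking the most recently modified match is an assumption about which build you want, the helper names build_install_command and launch_installer are hypothetical, and the /S and /z switches are just the placeholders from the question.

```python
import glob
import os
import subprocess


def build_install_command(pattern, args):
    """Return [installer_path, *args] for the newest file matching `pattern`."""
    matches = glob.glob(pattern)  # always a list, possibly empty
    if not matches:
        raise FileNotFoundError("no installer matches %r" % pattern)
    # Assumption: the most recently modified match is the build you want
    installer = max(matches, key=os.path.getmtime)
    return [installer] + list(args)


def launch_installer(pattern, args):
    """Run the installer with its switches and return its exit code.

    Each list element reaches the program as one argument; on Windows an
    empty string is written to the command line as "".
    """
    return subprocess.call(build_install_command(pattern, args))
```

Usage would then be launch_installer(r"Z:\data\builds\*.install\setup.exe", ["/S", "/z", ""]); no manual quoting or string concatenation is needed.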
Ant has a nice way to select groups of files, most handily using ** to indicate a directory tree. E.g.
**/CVS/* # All files immediately under a CVS directory.
mydir/mysubdir/** # All files recursively under mysubdir
More examples can be seen here:
http://ant.apache.org/manual/dirtasks.html
How would you implement this in python, so that you could do something like:
files = get_files("**/CVS/*")
for file in files:
    print(file)
=>
CVS/Repository
mydir/mysubdir/CVS/Entries
mydir/mysubdir/foo/bar/CVS/Entries
Sorry, this is quite a long time after your original post. I have just released a Python package which does exactly this: it's called Formic and it's available on PyPI (the Cheese Shop). With Formic, your problem is solved with:
import formic
fileset = formic.FileSet(include="**/CVS/*", default_excludes=False)
for file_name in fileset.qualified_files():
    print(file_name)
There is one slight complexity: default_excludes. Because Formic, just like Ant, excludes CVS directories by default (collecting files from them for a build is, for the most part, dangerous), the default settings would return no files for this question's pattern. Setting default_excludes=False disables this behaviour.
As soon as you come across a **, you're going to have to recurse through the whole directory structure, so at that point I think the easiest method is to iterate through the directory with os.walk, construct a path, and then check whether it matches the pattern. You can probably convert the pattern to a regex with something like:
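import os
import re
def glob_to_regex(pat, dirsep=os.sep):
    dirsep = re.escape(dirsep)
    # "**/" matches zero or more leading directories (so **/CVS cannot match fooCVS),
    # a remaining "**" matches anything, and "*" / "?" stay within one path component
    regex = (re.escape(pat).replace("\\*\\*" + dirsep, "(?:.*%s)?" % dirsep)
             .replace("\\*\\*", ".*")
             .replace("\\*", "[^%s]*" % dirsep)
             .replace("\\?", "[^%s]" % dirsep))
    return re.compile(regex + "$")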
(Note that this isn't fully featured: it doesn't support [a-z] style glob patterns, for instance, though this could probably be added.) (The first **/ replacement covers cases like **/CVS matching ./CVS, and the plain ** replacement handles a ** at the tail.)
However, you obviously don't want to recurse through everything below the current directory when you're not processing a ** pattern, so I think you'll need a two-phase approach. I haven't tried implementing the steps below, and there are probably a few corner cases, but I think it should work:
1.) Split the pattern on your directory separator, i.e. pat.split('/') -> ['**', 'CVS', '*'].
2.) Recurse through the directories, and look at the relevant part of the pattern for this level, i.e. n levels deep -> look at pat[n].
3.) If pat[n] == '**', switch to the above strategy:
    - Reconstruct the pattern with dirsep.join(pat[n:]).
    - Convert it to a regex with glob_to_regex().
    - Recursively os.walk through the current directory, building up the path relative to the level you started at. If the path matches the regex, yield it.
4.) If pat[n] is not '**' and it is the last element in the pattern, yield all files/dirs matching glob.glob(os.path.join(curpath, pat[n])).
5.) If pat[n] is not '**' and it is NOT the last element in the pattern, then for each directory, check whether it matches pat[n] (with glob). If so, recurse down through it, incrementing the depth (so it will look at pat[n+1]).
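The steps above can be sketched as follows. This is an illustrative sketch, not the answerer's code: it assumes '/'-separated patterns, returns files only (as Ant filesets do) as sorted '/'-separated paths relative to the root, and uses fnmatch.fnmatchcase for the per-level matching. The names ant_glob and visit are hypothetical, and glob_to_regex here is a variant of the converter above in which '**/' becomes '(?:.*/)?' so that **/CVS cannot match fooCVS.

```python
import fnmatch
import os
import re


def glob_to_regex(pat, dirsep="/"):
    dirsep = re.escape(dirsep)
    regex = (re.escape(pat).replace("\\*\\*" + dirsep, "(?:.*%s)?" % dirsep)
             .replace("\\*\\*", ".*")
             .replace("\\*", "[^%s]*" % dirsep)
             .replace("\\?", "[^%s]" % dirsep))
    return re.compile(regex + "$")


def ant_glob(root, pattern):
    """Return files under `root` matching an Ant-style pattern such as '**/CVS/*'."""
    parts = pattern.split("/")  # step 1
    found = []

    def visit(rel_dir, n):
        abs_dir = os.path.join(root, *rel_dir.split("/")) if rel_dir else root
        prefix = rel_dir + "/" if rel_dir else ""
        part = parts[n]
        if part == "**":
            # Step 3: walk everything below here and test each relative path
            regex = glob_to_regex("/".join(parts[n:]))
            for dirpath, _dirnames, filenames in os.walk(abs_dir):
                rel = os.path.relpath(dirpath, abs_dir).replace(os.sep, "/")
                for name in filenames:
                    candidate = name if rel == "." else rel + "/" + name
                    if regex.match(candidate):
                        found.append(prefix + candidate)
            return
        # Steps 2, 4 and 5: match one pattern component per directory level
        is_last = n == len(parts) - 1
        for name in os.listdir(abs_dir):
            if not fnmatch.fnmatchcase(name, part):
                continue
            full = os.path.join(abs_dir, name)
            if is_last and os.path.isfile(full):
                found.append(prefix + name)
            elif not is_last and os.path.isdir(full):
                visit(prefix + name, n + 1)

    visit("", 0)
    return sorted(found)
```

With the tree from the question, ant_glob(".", "**/CVS/*") walks everything, while ant_glob(".", "mydir/mysubdir/**") only starts walking once it has descended into mydir/mysubdir. As the answer notes, the ** phase still has no [a-z] support; the per-level fnmatch phase does.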
os.walk is your friend. Look at the example in the Python manual (https://docs.python.org/2/library/os.html#os.walk) and try to build something from that.
To match "**/CVS/*" against a file name you get, you can do something like this:
import fnmatch
def match(pattern, filename):
    if pattern.startswith("**"):
        return fnmatch.fnmatch(filename, pattern[1:])
    else:
        return fnmatch.fnmatch(filename, pattern)
In fnmatch.fnmatch, "*" matches anything (including slashes).
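A minimal sketch of wiring that trick into os.walk; the function name find_matching is hypothetical. Because fnmatch's "*" crosses "/", "**/CVS/*" shortened to "*/CVS/*" matches paths such as ./CVS/Repository, but it also matches files in subdirectories of a CVS folder (e.g. mydir/CVS/sub/deep), which Ant's "*" would not.

```python
import fnmatch
import os


def find_matching(pattern, top="."):
    """Yield '/'-separated paths under `top` that match `pattern`, using the
    trick above: a leading '**' is shortened to '*', since fnmatch's '*'
    already matches across '/'."""
    if pattern.startswith("**"):
        pattern = pattern[1:]
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            # Normalise separators so the same pattern works on Windows too
            path = os.path.join(dirpath, name).replace(os.sep, "/")
            if fnmatch.fnmatch(path, pattern):
                yield path
```

If the over-matching matters, the regex-based or two-phase approaches in the other answers are stricter.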
There's an implementation in the 'waf' build system source code.
http://code.google.com/p/waf/source/browse/trunk/waflib/Node.py?r=10755#471
Maybe this should be wrapped up in a library of its own?
Yup. Your best bet is, as has already been suggested, to work with 'os.walk'. Or, write wrappers around 'glob' and 'fnmatch' modules, perhaps.
os.walk is your best bet for this. I did the example below with .svn because I had that handy, and it worked great:
import os
import re
for (dirpath, dirnames, filenames) in os.walk("."):
    if re.search(r'\.svn$', dirpath):
        for file in filenames:
            print(file)
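On Python 3.5 and later the standard library handles ** itself: glob.glob() with recursive=True and pathlib's Path.glob() both treat ** as "this directory and all subdirectories". A minimal sketch for the CVS example; the function names are hypothetical and the results are paths relative to the starting directory.

```python
import glob
import os
from pathlib import Path


def cvs_files_glob(top):
    # glob: '**' only recurses when recursive=True is passed
    pattern = os.path.join(top, "**", "CVS", "*")
    return sorted(os.path.relpath(p, top) for p in glob.glob(pattern, recursive=True))


def cvs_files_pathlib(top):
    # pathlib: '**' in Path.glob() always recurses
    return sorted(str(p.relative_to(top)) for p in Path(top).glob("**/CVS/*"))
```

Both return CVS/Repository and mydir/mysubdir/CVS/Entries for a tree like the one in the question, using the platform's own path separator.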