How To Rename Files (Using Python OS Module)? - python

I have a folder which contains +100 songs named this way "Song Name, Singer Name" (e.g. Smooth Criminal, Michael Jackson). I'm trying to rename all the songs to "Song Name (Singer Name)" (e.g. Smooth Criminal (Michael Jackson)).
I tried the code below, but I don't know which arguments to pass.
import os
files = os.getcwd()
os.rename(files, "") # I'm confused because I don't know what to put here as parameters since I want to change only parts of the files' names, and not the files' names entirely.
Any suggestions on the parameters of "os.rename()"?

Unlike the command-line program rename, which can rename a batch of files using a pattern in a single command, Python's os.rename() is a thin wrapper around the underlying rename syscall. It can thus rename only one file at a time.
Assuming all songs are stored in the same directory and end with an extension like '.mp3', one approach is to loop over the list returned by os.listdir().
Additionally, it would be wise to check that the current entry is indeed a file and not, say, a directory. This can be done using os.path.isfile().
Here is a full example:
import os
TARGET_DIR = "/path/to/some/folder"
for filename in os.listdir(TARGET_DIR):
    # Only consider regular files (os.listdir() returns bare names,
    # so build the full path before testing)
    if not os.path.isfile(os.path.join(TARGET_DIR, filename)):
        continue
    # Extract filename and extension
    basename, extension = os.path.splitext(filename)
    if ',' not in basename:
        continue  # Ignore
    # Extract the song and singer names, dropping the space after the comma
    songname, singername = (part.strip() for part in basename.split(',', 1))
    # Build the new name, rename
    target_name = f'{songname} ({singername}){extension}'
    os.rename(
        os.path.join(TARGET_DIR, filename),
        os.path.join(TARGET_DIR, target_name)
    )
Note: If the songs are potentially stored in subfolders, os.walk() will be a better candidate than the lower level os.listdir()
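For the subfolder case, a minimal sketch with os.walk() could look like the following. The function name rename_songs_recursively and the choice to skip a file when the target name already exists are my own assumptions, not part of the question.

```python
import os


def rename_songs_recursively(root_dir):
    """Rename 'Song, Singer.ext' to 'Song (Singer).ext' everywhere under root_dir.

    Hypothetical helper: returns the list of new paths.
    """
    renamed = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            basename, extension = os.path.splitext(filename)
            if ',' not in basename:
                continue  # Not in the "Song Name, Singer Name" format
            songname, singername = basename.split(',', 1)
            target_name = f'{songname.strip()} ({singername.strip()}){extension}'
            source = os.path.join(dirpath, filename)
            target = os.path.join(dirpath, target_name)
            if os.path.exists(target):
                continue  # Assumption: never overwrite an existing file
            os.rename(source, target)
            renamed.append(target)
    return renamed
```

Calling rename_songs_recursively("/path/to/some/folder") would then process the folder and all of its subfolders in one go.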

I would recommend writing a loop using os.
However, with pandas you can also replace parts of the names without any problem; I would recommend choosing pandas for this task!

Related

How to change all files' types to a specific one in a folder?

The files in this folder have different file types. I tried a number of file-renaming techniques from cmd, but the types did not change. See the screenshot for a better understanding.
I also used the Python code from here, but it didn't work:
How to change multiple filenames in a directory using Python
You could use os to rename, and glob to conveniently get the list of files:
import os, glob
def rename(files, pattern, replacement):
    for pathname in glob.glob(files):
        basename = os.path.basename(pathname)
        new_filename = basename.replace(pattern, replacement)
        if new_filename != basename:
            os.rename(
                pathname,
                os.path.join(os.path.dirname(pathname), new_filename))
rename('*.XXX', 'XXX', 'YYY')
Inspired by this answer.
I've also tried this command in cmd to change the file name,
but it only changes one file at a time:
ren [Something].XXX [Something].YYY
The same goes for changing the file type of multiple files in a folder:
ren *.XXX *.YYY
But what I'm trying to do is change multiple file types to a single file type.
[Some output I've got][1]: https://i.stack.imgur.com/Kdwzx.png
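One possible way to do this in Python, as a rough sketch: loop over the folder and give every file the same extension. The function name set_extension and its parameters are made up for illustration, and it skips a file whenever the new name is already taken. Note that renaming only changes the name; it does not convert the file's contents.

```python
import os


def set_extension(folder, new_ext, old_exts=None):
    """Give every regular file in folder the extension new_ext.

    old_exts is an optional collection of extensions to convert
    (e.g. {'.XXX', '.ZZZ'}); None means every file is converted.
    Hypothetical helper: returns the new file names.
    """
    renamed = []
    for name in sorted(os.listdir(folder)):
        source = os.path.join(folder, name)
        if not os.path.isfile(source):
            continue  # Skip directories
        stem, ext = os.path.splitext(name)
        if ext == new_ext:
            continue  # Already has the wanted extension
        if old_exts is not None and ext not in old_exts:
            continue
        target = os.path.join(folder, stem + new_ext)
        if os.path.exists(target):
            continue  # Assumption: skip rather than overwrite
        os.rename(source, target)
        renamed.append(stem + new_ext)
    return renamed
```

For example, set_extension('.', '.YYY') would rename a.XXX and b.ZZZ in the current folder to a.YYY and b.YYY.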

Python - Navigating through Subdirectories that Meet Naming Criteria

I am using Python 3.5 to analyze data contained in csv files. These files are contained in a "figs" directory, which is contained in a case directory, which is contained in an overall data directory, e.g.:
/strm1/serino/DATA/06052009/figs
Or more generally:
/strm1/serino/DATA/case_date_in_MMDDYYYY/figs
The directory I am starting in is '/strm1/serino/DATA/,' and each subdirectory is the month, day, and year of a case I am working with. Each subdirectory contains another subdirectory named 'figs,' and that is the location of each case's csv file. To be exact:
/strm1/serino/DATA/case_date_in_MMDDYYYY/figs/case_date_in_MMDDYYYY.csv
So, I would like to start in my DATA directory and go through its subdirectories to find those that have the MMDDYYYY naming. However, some of the case directories may be named with a state abbreviation at the end, like: '06052009_TX.' Therefore, instead of matching the MMDDYYYY naming exactly, it could be something as simple as verifying that the directory name contains any number 1 through 9.
Once I am in the first subdirectory (the case directory) I would like to move into the 'figs' subdirectory. Once there, I want to access the csv file with the same naming convention as the first subdirectory (the case directory). I will fill existing arrays with the data contained in each csv file.
Basically, my question concerns navigating through multiple subdirectories that match a certain naming convention and ultimately accessing the data file at the "end." I was naively playing around with glob, fnmatch, os.listdir, and os.walk, but I could not get anything close enough to working that I feel would be helpful to include. I am not very familiar with those modules. What I can include is what I am going for:
for dirs in data_dir that contain a number:
    go into this directory
    go into 'figs' directory
    read data from the csv file whose name matches its case directory name (or whose name format matches the case directory name format)
I have come across related questions, but I have not been able to apply their answers in the way that I would like, especially with nested directories. I really appreciate the help, and let me know if I need to clarify anything.
The following should get you going. It uses the datetime.strptime() function to attempt to convert the first eight characters of each folder name into a valid datetime object. If the conversion fails, then you know that the folder name is not in the correct format and can be skipped. It then attempts to parse any CSV file found in the corresponding figs folder:
from datetime import datetime
import glob
import csv
import os
dirpath, dirnames, filenames = next(os.walk('/strm1/serino/DATA'))
for dirname in dirnames:
    if len(dirname) >= 8:
        try:
            dt = datetime.strptime(dirname[:8], '%m%d%Y')
            print(dt, dirname)
            csv_folder = os.path.join(dirpath, dirname)
            for csv_file in glob.glob(os.path.join(csv_folder, 'figs', '*.csv')):
                with open(csv_file, newline='') as f_input:
                    csv_input = csv.reader(f_input)
                    for row in csv_input:
                        print(row)
        except ValueError as e:
            pass
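If you only want the CSV whose name matches its case directory, a variation could look like this sketch. It assumes the CSV is named exactly like the case directory (including a suffix such as _TX) and that a case directory is eight digits optionally followed by an underscore and a two-letter state code; both are guesses based on the examples in the question. The names CASE_DIR_PATTERN and find_case_csv_files are hypothetical.

```python
import os
import re

# Assumed naming: MMDDYYYY, optionally followed by e.g. "_TX"
CASE_DIR_PATTERN = re.compile(r'\d{8}(_[A-Z]{2})?')


def find_case_csv_files(data_dir):
    """Return the paths <data_dir>/<case>/figs/<case>.csv that exist."""
    found = []
    for entry in sorted(os.listdir(data_dir)):
        if not CASE_DIR_PATTERN.fullmatch(entry):
            continue  # Not a case directory
        csv_path = os.path.join(data_dir, entry, 'figs', entry + '.csv')
        if os.path.isfile(csv_path):
            found.append(csv_path)
    return found
```

Each returned path can then be opened with csv.reader() as shown above.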
You listed several problems above. Which one are you stuck on? It seems like you already know how to navigate the file storage system using os.path. You may not know of the function os.path.join() which allows you to manually specify a file path relative to a file as such:
os.path.abspath(os.path.join(os.path.dirname(__file__), '../..', 'Data/TrailShelters/'))
To break down the above:
os.path.dirname(__file__) returns the path of the current file. '../..' means: go up two levels in the folder hierarchy. And Data/TrailShelters/ is the directory I wish to navigate to.
How does this apply to your particular case? Well, you will need to make some adaptations but you can store the os.path of the parent directory in a variable. Then you can essentially use a while sub_dir is not null loop to iterate through subdirectories. For every subdirectory you will want to examine its os.path and extract the particular part of the path you are interested in. Then you can simply use something like: if 'TN' in subdirectory_name to determine if it is a subdirectory you are interested in. If so; then update the saved os.path of the parent directory by appending the path to the subdirectory. Does that make any sense?

Changing name of file until it is unique

I have a script that downloads files (pdfs, docs, etc) from a predetermined list of web pages. I want to edit my script to alter the names of files with a trailing _x if the file name already exists, since it's possible files from different pages will share the same filename but contain different contents, and urlretrieve() appears to automatically overwrite existing files.
So far, I have:
import os
urlfile = 'https://www.foo.com/foo/foo/foo.pdf'
filename = urlfile.split('/')[-1]
# filename is now 'foo.pdf'
if os.path.exists(filename):
    name, ext = os.path.splitext(filename)
    filename = name + '_' + str(1) + ext
That works fine for one occurrence, but it looks like after one foo_1.pdf it will start saving as foo_1_1.pdf, and so on. I would like to save the files as foo_1.pdf, foo_2.pdf, and so on.
Can anybody point me in the right direction on how I can ensure that file names are stored in the correct fashion as the script runs?
Thanks.
So what you want is something like this:
curName = "foo_0.pdf"
while os.path.exists(curName):
    num = int(curName.split('.')[0].split('_')[1])
    curName = "foo_{}.pdf".format(str(num+1))
Here's the general scheme:
Assume you start from the first file name (foo_0.pdf)
Check if that name is taken
If it is, iterate the name by 1
Continue looping until you find a name that isn't taken
One alternative: Generate a list of file numbers that are in use, and update it as needed. If it's sorted you can say name = "foo_{}.pdf".format(flist[-1]+1). This has the advantage that you don't have to run through all the files every time (as the above solution does). However, you need to keep the list of numbers in memory. Additionally, this will not fill any gaps in the numbers.
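A rough sketch of that alternative, assuming the files follow the stem_number.ext pattern; the helper names used_numbers and next_name are invented here:

```python
import os
import re


def used_numbers(directory, stem, ext):
    """Return the sorted list of numbers n for which '<stem>_<n><ext>' exists."""
    pattern = re.compile(re.escape(stem) + r'_(\d+)' + re.escape(ext))
    numbers = []
    for name in os.listdir(directory):
        match = pattern.fullmatch(name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


def next_name(flist, stem, ext):
    """Pick the next free number, record it in flist, and return the name."""
    num = flist[-1] + 1 if flist else 0
    flist.append(num)  # Keep the in-memory list up to date
    return '{}_{}{}'.format(stem, num, ext)
```

You would call used_numbers() once at startup and then next_name() for every download. As noted, gaps in the numbering are not filled.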
Why not just use the tempfile module:
import tempfile
fileobj = tempfile.NamedTemporaryFile(suffix='.pdf', prefix='', delete=False)
Now your filename will be available in fileobj.name and you can manipulate to your heart's content. As an added benefit, this is cross-platform.
Since you're dealing with multiple pages, this seems more like a "global archive" than a per-page archive. For a per-page archive, I would go with the answer from @wnnmaw.
For a global archive, I would take a different approach...
Create a directory for each filename
Store the file in the directory as "1" + extension
Write the current "number" to the directory as "_files.txt"
Additional files are written as 2, 3, 4, etc., and increment the value in _files.txt
The benefits of this:
The directory is the original filename. If you keep turning "Example-1.pdf" into "Example-2.pdf" you run into a possibility where you download a real "Example-2.pdf", and can't associate it to the original filename.
You can grab the number of like-named files either by reading _files.txt or counting the number of files in the directory.
Personally, I'd also suggest storing the files in a tiered bucketing system, so that you don't have too many files/directories in any one directory (hundreds of files make it annoying as a user, thousands of files can affect OS performance). A bucketing system might turn a filename into a hexdigest, then drop the file into "/%s/%s/%s" % (hex[0:3], hex[3:6], filename). The hexdigest is used to give you a more even distribution of characters.
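To illustrate the bucketing idea, here is a small sketch. The choice of MD5 is my own assumption (any hash with a hexdigest would do, and it is used here only to spread names evenly, not for security), and bucketed_path is a made-up name.

```python
import hashlib
import os


def bucketed_path(root, filename):
    """Return root/<hex[0:3]>/<hex[3:6]>/<filename> for the given filename."""
    # Hash the name only to distribute files evenly across buckets
    digest = hashlib.md5(filename.encode('utf-8')).hexdigest()
    return os.path.join(root, digest[0:3], digest[3:6], filename)
```

Before writing the file you would create the bucket with os.makedirs(os.path.dirname(path), exist_ok=True).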
import os
def uniquify(path, sep=''):
    path = os.path.normpath(path)
    num = 0
    newpath = path
    dirname, basename = os.path.split(path)
    filename, ext = os.path.splitext(basename)
    while os.path.exists(newpath):
        newpath = os.path.join(dirname, '{f}{s}{n:d}{e}'
                               .format(f=filename, s=sep, n=num, e=ext))
        num += 1
    return newpath
filename = uniquify('foo.pdf', sep='_')
Possible problems with this include:
If you call uniquify many thousands of times with the same path, each subsequent call may get a bit slower, since the while-loop starts checking from num=0 each time.
uniquify is vulnerable to race conditions whereby a file may not exist at the time os.path.exists is called, but may exist at the time you use the value returned by uniquify. Use tempfile.NamedTemporaryFile to avoid this problem. You won't get incremental numbering, but you will get files with unique names, guaranteed not to already exist. You could use the prefix parameter to specify the original name of the file. For example,
import tempfile
import os
def uniquify(path, sep='_', mode='w'):
    path = os.path.normpath(path)
    if os.path.exists(path):
        dirname, basename = os.path.split(path)
        filename, ext = os.path.splitext(basename)
        return tempfile.NamedTemporaryFile(prefix=filename+sep, suffix=ext, delete=False,
                                           dir=dirname, mode=mode)
    else:
        return open(path, mode)
Which could be used like this:
In [141]: f = uniquify('/tmp/foo.pdf')
In [142]: f.name
Out[142]: '/tmp/foo_34cvy1.pdf'
Note that to prevent a race-condition, the opened filehandle -- not merely the name of the file -- is returned.
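If you do want incremental numbering and an open handle at the same time, one option (a different technique from the tempfile approach above, sketched here under my own assumptions) is Python 3's exclusive-creation mode 'x', which makes open() raise FileExistsError if the file already exists. The name open_unique is hypothetical.

```python
import os


def open_unique(path, sep='_', mode='x'):
    """Open a brand-new file at path, or at path with _1, _2, ... if taken.

    mode must be an exclusive-creation mode ('x' or 'xb'), so that the
    existence check and the creation happen in a single step.
    """
    root, ext = os.path.splitext(path)
    candidate = path
    num = 0
    while True:
        try:
            return open(candidate, mode)  # Fails if candidate already exists
        except FileExistsError:
            num += 1
            candidate = '{}{}{}{}'.format(root, sep, num, ext)
```

For binary downloads such as PDFs you would pass mode='xb'. As with the first uniquify, repeated calls for the same path start counting from the beginning each time.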

File Renaming by Convention - All within a folder

I often find myself in a situation where I have a folder containing files which are named according to a certain file naming convention, and I have to go through them manually to rename them to the one I want. A laborious repetitive task.
E.g. 01_artist_name_-_album_title_-_song_title_somethingelse.mp3 -> Song_Title.mp3
So the removal of certain bits of information, replacement of underscores with spaces, and capitalisation. Not just for music, that's just an example.
I have been thinking about automating this task using Python. Basically I want to be able to input the starting convention and my wanted convention and for it to rename them all accordingly.
Ideally I want to be able to do this in Python on Windows, but I have an Ubuntu machine I could use for this if it was easier to do in bash (or Python on UNIX).
If anyone can shed light on how I might approach this problem (suggestion of IO python commands that read contents of a folder - and rename files - on Windows, and how I might go about stripping the information from the filename and categorising it, maybe using RegEx?) I'll see what I can make it do and update with progress.
For your special case:
import glob, shutil, os.path
# glob.glob returns a list with all paths matching the given pattern
for path in glob.glob("music_folder/*.mp3"):
    # os.path.dirname gives the directory name, here it is "music_folder"
    dirname = os.path.dirname(path)
    # example: 01_artist_name_-_album_title_-_song_title_somethingelse.mp3
    # split returns "_song_title_somethingelse.mp3"
    interesting = path.split("-")[2]
    # titlepart is a list with ["song", "title"]; the leading "" (from the
    # beginning "_") and the 'somethingelse.mp3' string are removed by
    # choosing the slice 1:-1
    titlepart = interesting.split("_")[1:-1]
    # capitalize converts song -> Song, title -> Title
    # join glues both to "Song_Title"
    new_name = "_".join(p.capitalize() for p in titlepart)+".mp3"
    # shutil.move renames the given file
    shutil.move(path, os.path.join(dirname, new_name))
If you want to use a regular expression instead, add import re at the top and replace the two split lines (the ones assigning interesting and titlepart) with:
    m = re.search(r".*-_(\S+_\S+)_.*", path)
    if m is None:
        raise Exception("file name does not match regular expression")
    song_name = m.groups()[0]
    titlepart = song_name.split("_")
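To get closer to "input the starting convention and get the wanted one", the name conversion can be separated from the renaming. The following is only a sketch: SOURCE_PATTERN encodes my reading of the example name (number, artist, album, title, one trailing word), and convert_name is a made-up helper that returns None for names that do not follow the convention.

```python
import re

# Assumed convention: NN_artist_-_album_-_song_title_extra.ext
SOURCE_PATTERN = re.compile(
    r'^\d+_.+_-_.+_-_(?P<title>.+)_[^_]+\.(?P<ext>\w+)$'
)


def convert_name(filename, pattern=SOURCE_PATTERN):
    """Return the new file name, or None if filename does not match pattern."""
    m = pattern.match(filename)
    if m is None:
        return None
    # Capitalise each word of the title and keep the underscores
    words = m.group('title').split('_')
    return '_'.join(w.capitalize() for w in words) + '.' + m.group('ext')
```

The loop above would then call convert_name(os.path.basename(path)) and skip the file when the result is None. Supporting another convention means passing a different pattern with the same group names.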

I want to rename a file that has a random part (in its filename) to a specific one

I have some builds generating files, and they all have a random part in them (checksum number).
How do I rename them to a unique name and execute them in Python?
If you are sure there are never any underscores in the piece of the name you would like to keep, you can split it. name = name.split('_')[0]. This won't of course preserve the file extension, but if all the output files are exes, you can just name += '.exe'
Edit:
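import os
# Keep only the .exe files (build a new list instead of removing items
# from the list while iterating over it, which skips entries)
exe_files = [each for each in os.listdir('.') if each.endswith('.exe')]
for each in exe_files:
    name = each.split('_')[0]
    name += '.exe'
    os.rename(each, name)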
This is not very robust, and requires that you execute it in the directory the exes are in. If it's something you're going to keep around long term or expect others to use, you should investigate regex and make it path agnostic. I didn't test this; it's just a hack, so use at your own risk, but it should be pretty benign. It will try to rename ALL the .exe files in the directory.
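A regex-based, path-agnostic variant might look like the sketch below. The question does not show the actual file names, so the format name_randompart.exe is purely an assumption, and RANDOM_PART and strip_random_part are hypothetical names.

```python
import os
import re

# Assumed format: <name>_<random part>.exe, with no underscore in <name>
RANDOM_PART = re.compile(r'^(?P<name>[^_]+)_.+(?P<ext>\.exe)$')


def strip_random_part(directory):
    """Rename '<name>_<random>.exe' to '<name>.exe' inside directory.

    Returns a dict mapping old file names to new file names.
    """
    renamed = {}
    for entry in sorted(os.listdir(directory)):
        m = RANDOM_PART.match(entry)
        if m is None:
            continue  # Not an .exe with a random part
        new_name = m.group('name') + m.group('ext')
        target = os.path.join(directory, new_name)
        if os.path.exists(target):
            continue  # Assumption: never overwrite an existing file
        os.rename(os.path.join(directory, entry), target)
        renamed[entry] = new_name
    return renamed
```

Unlike the hack above, this leaves .exe files without an underscore untouched and can be pointed at any directory.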
