batch search and replace strings in filenames with python - python

I am trying to write a small python script to rename a bunch of filenames by searching and replacing. For example:
Original filename:
MyMusic.Songname.Artist-mp3.iTunes.mp3
Intended result:
Songname.Artist.mp3
What I've got so far is:
#!/usr/bin/env python
from os import rename, listdir
mustgo = "MyMusic."
filenames = listdir('.')
for fname in filenames:
    if fname.startswith(mustgo):
        rename(fname, fname.replace(mustgo, '', 1))
(I got it from this site, as far as I can remember.)
Anyway, this only gets rid of the string at the beginning, but not of those elsewhere in the filename.
I would also like to use a separate file (e.g. badwords.txt) containing all the strings that should be searched for and replaced, so that I can update them without having to edit the code.
Content of badwords.txt
MyMusic.
-mp3
-MP3
.iTunes
.itunes
I have been searching for quite some time now but haven't found anything. I would appreciate any help!
Thank you!

import fnmatch
import re
import os
with open('badwords.txt','r') as f:
    pat='|'.join(fnmatch.translate(badword)[:-1] for badword in
                 f.read().splitlines())
for fname in os.listdir('.'):
    new_fname=re.sub(pat,'',fname)
    if fname != new_fname:
        print('{o} --> {n}'.format(o=fname,n=new_fname))
        os.rename(fname, new_fname)
# MyMusic.Songname.Artist-mp3.iTunes.mp3 --> Songname.Artist.mp3
Note that it is possible for some files to be overwritten (and thus
lost) if two names get reduced to the same shortened name after
badwords have been removed. A set of new fnames could be kept and
checked before calling os.rename to prevent losing data through
name collisions.
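The collision check described above could be sketched roughly like this. The helper name plan_renames and its return shape are made up for illustration. It only decides which renames are safe and does not touch the disk itself.

```python
import re


def plan_renames(names, pattern):
    """Return (renames, skipped): safe old->new pairs and names left alone.

    `names` is every filename in the directory; `pattern` is a regex string
    matching the text to delete. Hypothetical helper, not a library function.
    """
    taken = set(names)           # names that already exist on disk
    renames, skipped = {}, []
    for old in names:
        new = re.sub(pattern, '', old)
        if new == old:
            continue             # nothing to strip
        if not new or new in taken:
            skipped.append(old)  # empty or clashing target: leave the file alone
            continue
        taken.add(new)           # reserve the new name for later iterations
        renames[old] = new
    return renames, skipped


names = ['MyMusic.a.mp3', 'a.mp3', 'MyMusic.b.mp3', 'b-mp3.mp3']
renames, skipped = plan_renames(names, r'MyMusic\.|-mp3')
print(renames)   # only renames whose target name is free
print(skipped)   # files that would have overwritten something
```

The caller would then loop over renames and call os.rename for each pair, and report the skipped names so they can be handled by hand.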
fnmatch.translate takes shell-style patterns and returns the
equivalent regular expression. It is used above to convert badwords
(e.g. '.iTunes') into regular expressions (e.g. r'\.iTunes').
Your badwords list seems to indicate you want to ignore case. You
could ignore case by adding '(?i)' to the beginning of pat:
with open('badwords.txt','r') as f:
    pat='(?i)'+'|'.join(fnmatch.translate(badword)[:-1] for badword in
                        f.read().splitlines())
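One caveat: the [:-1] slice relies on the exact string fnmatch.translate returns, and that format has changed between Python versions, so the slice may not yield a valid pattern on a current interpreter. Since the badwords here are literal strings rather than wildcards, an alternative (a swapped-in technique, not the approach above) is to escape them with re.escape. A minimal sketch, where build_pattern is a hypothetical helper name:

```python
import re


def build_pattern(badwords, ignore_case=True):
    """Compile one regex that matches any of the literal badwords."""
    # Longest first, so a longer badword wins over one of its own prefixes.
    words = sorted((w for w in badwords if w), key=len, reverse=True)
    alternation = '|'.join(re.escape(w) for w in words)
    return re.compile(alternation, re.IGNORECASE if ignore_case else 0)


# In the real script this list would come from badwords.txt, e.g.
#   with open('badwords.txt') as f: badwords = f.read().splitlines()
badwords = ['MyMusic.', '-mp3', '.iTunes']
pat = build_pattern(badwords)
print(pat.sub('', 'MyMusic.Songname.Artist-MP3.itunes.mp3'))
```

With case-insensitive matching, the -MP3 and .itunes lines in badwords.txt become redundant.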

Related

Python string alphabet removal?

In my program, I am reading in files and processing them.
My output should show just the file name and then display some data.
When I loop through the files and print their names and data,
it displays, for example, myfile.txt. I don't want the .txt part, just myfile.
How can I remove the .txt from the end of this string?
The best way to do it is shown in this example:
import os
filename = 'myfile.txt'
print(filename)
print(os.path.splitext(filename))
print(os.path.splitext(filename)[0])
More information about this very useful standard-library module:
https://docs.python.org/3.8/library/os.path.html
The answers given are totally right, but if you have other possible extensions, or don't want to import anything, try this:
name = file_name.rsplit(".", 1)[0]
You can use pathlib.Path which has a stem attribute that returns the filename without the suffix.
>>> from pathlib import Path
>>> Path('myfile.txt').stem
'myfile'
Well, if you only have .txt files, you can do this:
file_name = "myfile.txt"
file_name.replace('.txt', '')
This uses the built-in str.replace method. Note that it removes every occurrence of '.txt' in the name, not only the one at the end.
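The approaches in these answers agree on a simple name like myfile.txt but differ on names with more than one dot. A small sketch to compare them side by side (strip_variants is a hypothetical helper written only for this comparison):

```python
import os
from pathlib import Path


def strip_variants(name):
    """Return what each approach from the answers above gives for `name`."""
    return {
        'splitext': os.path.splitext(name)[0],
        'rsplit': name.rsplit('.', 1)[0],
        'stem': Path(name).stem,
        'replace': name.replace('.txt', ''),
    }


for name in ['myfile.txt', 'archive.tar.gz', 'my.txt.bak', '.bashrc']:
    print(name, strip_variants(name))
```

For my.txt.bak, str.replace removes the .txt in the middle, while the other three drop only the final .bak. For a dotfile such as .bashrc, rsplit returns an empty string, whereas splitext and stem keep the whole name.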

Renaming files with quotation marks in the title using Python

I know similar questions have been asked a few times on this site, but the solutions provided there did not work for me.
I need to rename files with titles such as
a.jpg
'b.jpg'
c.jpg
"d.jpg"
to
a.jpg
b.jpg
c.jpg
d.jpg
Some of these titles have quotation marks inside the title as well, but it doesn't matter whether they get removed or not.
I have tried
import os
import re
fnames = os.listdir('.')
for fname in fnames:
    os.rename(fname, re.sub("\'", '', fname))
and
import os
for file in os.listdir("."):
    os.rename(file, file.replace("\'", ""))
intending to then do the same for the " quotation mark, but the titles remained unchanged. I think it might be due to listdir returning the filenames with ' quotation marks around them, but I am not sure.
Edit: I am working on Ubuntu 18.04.
On Windows, a filename with double quotes in it isn't a valid filename. However, a filename with single quotes is valid.
A string with double quotes in it in Python would look like:
'"I\'m a string with a double quote on each side"'
A string with single quotes in it in Python would look like:
"'I\'m a string with a single quote on each side'"
Because you can't have a double-quoted filename on Windows, you can't os.rename('"example.txt"', "example.txt"): the file can't exist to be renamed in the first place.
You can put this script on your desktop and watch the filenames change as it executes:
import os
# Example with single quotes all over the filename
open("'ex'am'ple.t'xt'", 'w').close()
input("Press enter to rename.")
os.rename("'ex'am'ple.t'xt'", "example.txt")
# Example with single quotes on each side of the filename
open("'example2.txt'", 'w').close()
input("Press enter to rename.")
os.rename("'example2.txt'", "example2.txt")
Here is my attempt using a for-loop, as you do, with a list comprehension over the string, which is also an iterable.
import os
forbidden_chars = ["'", '"']  # characters to strip from filenames
files = os.listdir(os.getcwd())
for file in files:
    new_name = ''.join([char for char in file if char not in forbidden_chars])
    print(new_name)
    if new_name != file:
        os.rename(file, new_name)
Edit the forbidden_chars list to contain the characters that you do not want in the future filename.
As far as I know, this will also rename folders, so you may want to check at the start of the for-loop
if os.path.isfile(file):
before changing the name.
I do not actually understand how you would get filenames that include the extension inside the quotation marks, but this should work either way. I highly recommend being careful if you want to remove dots.
I also recommend peeking at the documentation for the os module before using its functions, as they can do things you may not be aware of. For example, on Unix, renaming to an existing filename within the directory will silently replace the file.
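Putting those cautions together, a rough sketch might look like the following. It assumes only regular files should be touched and that an existing target should never be overwritten. The function name strip_quotes_in_dir is invented for this example. It uses str.translate instead of a list comprehension to delete the characters.

```python
import os

# Translation table that deletes both ' and " from a string.
QUOTES = str.maketrans('', '', '\'"')


def strip_quotes_in_dir(directory):
    """Rename regular files in `directory`, removing quote characters.

    Returns the list of (old, new) pairs that were actually renamed.
    Hypothetical helper written for illustration.
    """
    renamed = []
    for old in sorted(os.listdir(directory)):
        new = old.translate(QUOTES)
        src = os.path.join(directory, old)
        dst = os.path.join(directory, new)
        if new == old or not new or not os.path.isfile(src):
            continue    # nothing to strip, empty result, or not a regular file
        if os.path.exists(dst):
            continue    # never overwrite an existing file
        os.rename(src, dst)
        renamed.append((old, new))
    return renamed
```

Calling strip_quotes_in_dir('.') would process the current directory. Files that are skipped because of a name clash keep their quoted names and can be dealt with manually.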

Delete all files with partial filename python

I have files in my present working directory that I would like to delete. They all have a filename that starts with the string 'words' (for example, files words_1.csv and words_2.csv). I want to match all files in the current directory that start with 'words' and delete them. What would the search pattern be?
I found this from here, but it doesn't quite answer the question.
import os, re
def purge(dir, pattern):
    for f in os.listdir(dir):
        if re.search(pattern, f):
            os.remove(os.path.join(dir, f))
t = 'words_1.csv'
print(t.startswith('words'))
That's all it takes.
The pattern could be '^words.*\.csv$', but I suggest you read the Python re documentation.
If I'm understanding your question correctly, you have this function and you are asking how it may be used. You should be able to simply call:
purge('/path/to/your/dir', r'^words')
Because purge uses re.search, the ^ anchor is needed so that only files starting with the string "words" are removed.
pattern is a regular expression pattern, not a shell wildcard. In your case, it's simply anything beginning with "words" and ending with ".csv", so you can use
pattern = r"^words.*\.csv$"
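As a sketch of how the purge function from the question might be tightened, the variant below matches the whole filename with re.fullmatch, so no ^ or $ anchors are needed, and it skips anything that is not a regular file. Returning the removed names is an addition for illustration, not part of the original function.

```python
import os
import re


def purge(directory, pattern):
    """Delete regular files in `directory` whose whole name matches `pattern`.

    Returns the sorted list of names that were removed.
    """
    regex = re.compile(pattern)
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # fullmatch anchors at both ends of the name
        if regex.fullmatch(name) and os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return sorted(removed)
```

A call such as purge('.', r'words.*\.csv') would then remove words_1.csv and words_2.csv but leave a file named mywords.csv alone.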

Grab part of filename with Python

Newbie here.
I've just been working with Python/coding for a few days, but I want to create a script that grabs parts of filenames corresponding to a certain pattern, and outputs it to a textfile.
So in my case, let's say I have four .pdf like this:
aaa_ID_8423.pdf
bbbb_ID_8852.pdf
ccccc_ID_7413.pdf
dddddd_ID_4421.pdf
(Note that they are of variable length.)
I want the script to go through these filenames, grab the string after "ID_" and before the filename extension.
Can you point me toward Python modules, and possibly guides, that could assist me?
Here's a simple solution using the re module as mentioned in other answers.
# Libraries
import re
# Example filenames. Use glob as described below to grab your pdf filenames
file_list = ['name_ID_123.pdf','name2_ID_456.pdf'] # glob.glob("*.pdf")
for fname in file_list:
    res = re.findall(r"ID_(\d+)\.pdf", fname)
    if not res:
        continue
    print(res[0]) # You can append the result to a list
And below should be your output. You should be able to adapt this to other patterns.
# Output
123
456
Good luck!
Here's another alternative, using re.split(), which is probably closer to the spirit of exactly what you're trying to do (although solutions with re.match() and re.search(), among others, are just as valid, useful, and instructive):
>>> import re
>>> re.split("[_.]", "dddddd_ID_4421.pdf")[-2]
'4421'
>>>
If the numbers are of variable length, you'll want the regex module "re":
import re
# create and compile a regex pattern
pattern = re.compile(r"_([0-9]+)\.[^\.]+$")
pattern.search("abc_ID_8423.pdf").group(1)
# returns '8423'
Regex is generally used to match variable strings. The regex I just wrote says:
Find an underscore ("_"), followed by one or more digits ("[0-9]+"), followed by the last period in the string and the extension after it ("\.[^\.]+$").
You can use the os module in Python and call listdir to get a list of the filenames present in that path, like so:
import os
filenames = os.listdir(path)
Now you can iterate over the filenames list and look for the pattern you need using regular expressions:
import re
for filename in filenames:
    m = re.search(r'(?<=ID_)\w+', filename)
    if m:
        print(m.group(0))
The above snippet finds the part of the filename following ID_ and prints it out. Because \w does not match the period, the match stops before the extension, so for your example it prints 4421, 8423, etc.
You probably want to use glob, a Python module for file globbing. From the Python documentation, the usage is as follows:
>>> import glob
>>> glob.glob('./[0-9].*')
['./1.gif', './2.txt']
>>> glob.glob('*.gif')
['1.gif', 'card.gif']
>>> glob.glob('?.gif')
['1.gif']
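Combining the pieces from these answers (glob to list the PDFs, a regex to pull out the number, and a plain file write for the output), a minimal sketch could look like this. The function names and the output filename are assumptions made for the example.

```python
import glob
import os
import re

# Digits between "_ID_" and the ".pdf" extension at the end of the name.
ID_PATTERN = re.compile(r'_ID_(\d+)\.pdf$')


def extract_ids(directory):
    """Return the digits after 'ID_' for every matching .pdf in `directory`."""
    ids = []
    for path in sorted(glob.glob(os.path.join(directory, '*.pdf'))):
        match = ID_PATTERN.search(os.path.basename(path))
        if match:
            ids.append(match.group(1))
    return ids


def write_ids(directory, out_path):
    """Write one ID per line to `out_path` and return the IDs written."""
    ids = extract_ids(directory)
    with open(out_path, 'w') as out:
        out.write('\n'.join(ids) + ('\n' if ids else ''))
    return ids
```

For instance, write_ids('.', 'ids.txt') would scan the current directory and leave one number per line in ids.txt.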

how can I save the output of a search for files matching *.txt to a variable?

I'm fairly new to Python. I'd like to save the text that is printed by this script as a variable. (The variable is meant to be written to a file later, if that matters.) How can I do that?
import fnmatch
import os
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
You can store it in a variable like this:
import fnmatch
import os
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
        my_var = file  # holds only the most recent match
        # do your stuff
Or you can store it in a list for later use:
import fnmatch
import os
my_match = []
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
        my_match.append(file) # append inserts the value at the end of the list
# do stuff with my_match list
You can store it in a list:
import fnmatch
import os
matches = []
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        matches.append(file)
Both answers already provided are correct, but Python provides a nice alternative. Since iterating through an array and appending to a list is such a common pattern, the list comprehension was created as a one-stop shop for the process.
import fnmatch
import os
matches = [filename for filename in os.listdir("/Users/x/y") if fnmatch.fnmatch(filename, "*.txt")]
While NSU's answer and the others are all perfectly good, there may be a simpler way to get what you want.
Just as fnmatch tests whether a certain file matches a shell-style wildcard, glob lists all files matching a shell-style wildcard. In fact:
This is done by using the os.listdir() and fnmatch.fnmatch() functions in concert…
So, you can do this:
import glob
matches = glob.glob("/Users/x/y/*.txt")
But notice that in this case, you're going to get full pathnames like '/Users/x/y/spam.txt' rather than just 'spam.txt', which may not be what you want. Often, it's easier to keep the full pathnames around and os.path.basename them when you want to display them, than to keep just the base names around and os.path.join them when you want to open them… but "often" isn't "always".
Also notice that I had to manually paste the "/Users/x/y/" and "*.txt" together into a single string, the way you would at the command line. That's fine here, but if, say, the first one came from a variable, rather than hardcoded into the source, you'd have to use os.path.join(basepath, "*.txt"), which isn't quite as nice.
By the way, if you're using Python 3.4 or later, you can get the same thing out of the higher-level pathlib library:
import pathlib
matches = list(pathlib.Path("/Users/x/y/").glob("*.txt"))
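For the case mentioned above, where the directory comes from a variable and only the base names are wanted, a sketch along these lines may help. The helper names are hypothetical. The second function covers the question's goal of writing the result to a file later.

```python
import glob
import os


def matching_names(basepath, pattern='*.txt'):
    """Return the sorted base names of entries in `basepath` matching `pattern`."""
    full_paths = glob.glob(os.path.join(basepath, pattern))
    return sorted(os.path.basename(p) for p in full_paths)


def save_names(names, out_path):
    """Write one name per line to `out_path`."""
    with open(out_path, 'w') as out:
        out.writelines(name + '\n' for name in names)
```

Usage would be something like names = matching_names("/Users/x/y") followed by save_names(names, "found.txt"). One difference from the fnmatch loop is that glob's * does not match names that start with a dot, so hidden files are left out.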
Maybe defining a utility function is the right path to follow...
def list_ext_in_dir(e,d):
    """e=extension, d= directory => list of matching filenames.
    If the directory d cannot be listed returns None."""
    from fnmatch import fnmatch
    from os import listdir
    try:
        dirlist = listdir(d)
    except OSError:
        return None
    return [fname for fname in dirlist if fnmatch(fname,e)]
I have put the listdir call inside a try/except clause to catch the possibility that we cannot list the directory (non-existent, no read permission, etc.). The treatment of errors is a bit simplistic, but...
The list of matching filenames is built using a so-called list comprehension, which is something you should investigate as soon as possible if you're going to use Python for your programs.
To close my post, a usage example:
l_txtfiles = list_ext_in_dir('*.txt','/Users/x/y')
