How To Include ".partX.rar" In Extensions - python

So I'm using a script to split filenames into "name" and "extension" so I can apply a bunch of rules to the "name", play around with it, and have the script put everything back together at the end.
At the moment, I'm using:
import os, shutil, re

def rename_file(original_filename):
    name, extension = os.path.splitext(original_filename)
    name = re.sub(r"\'", r"", name)  # etc...more of these...
    new_filename = name + extension
    try:
        # moves files or directories (recursively)
        shutil.move(original_filename, new_filename)
    except shutil.Error:
        print("Couldn't rename file %(original_filename)s!" % locals())

[rename_file(f) for f in os.listdir('.') if not f.startswith('.')]
My problem is that os.path.splitext() treats the ".partX" piece of a ".partX.rar" name as part of the filename, whereas I'd like it treated as part of the extension.
How can I get the script to do that (without hard-coding a list of "extensions" or writing a completely separate script for rar files)?
Thanks!

os.path.splitext does a reverse search for '.' and returns the first match it finds, so out of the box splitext will not do what you need. If you are just using it to tokenize file names, I suggest you parse the filename yourself: split on '.', take the left side as the name, and then rejoin the right side.
Here is one way to do it:
def split_name(file_name):
    '''
    Returns root_filename, 'middle tokens', and extension
    '''
    tokens = file_name.split('.')
    if len(tokens) > 1:
        return tokens[0], ".".join(tokens[1:-1]), tokens[-1]
    return file_name, '', ''  # no dots: keep the return type consistent

>>> split_name('this.is.a.txt')
('this', 'is.a', 'txt')
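If the only multi-part extension you care about is ".partN.rar", another option is a small wrapper around os.path.splitext. This is a minimal sketch, not anything from the standard library; the splitext_multi name and the exact regex are my assumptions:

```python
import os
import re

def splitext_multi(filename):
    """Like os.path.splitext, but keeps a trailing '.partN' together
    with the '.rar' extension."""
    # assumption: multi-part archives always look like name.partN.rar
    match = re.search(r'\.part\d+\.rar$', filename, re.IGNORECASE)
    if match:
        return filename[:match.start()], filename[match.start():]
    return os.path.splitext(filename)

print(splitext_multi('archive.part01.rar'))  # ('archive', '.part01.rar')
print(splitext_multi('notes.txt'))           # ('notes', '.txt')
```

Anything that doesn't match the .partN.rar pattern falls back to the normal splitext behaviour, so the rest of the renaming logic stays unchanged.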

Check if a filename has multiple '.'/periods in it

So I'm making a website and in one of the pages you can upload images.
I didn't think of this before when making my file upload function, but files are allowed to have multiple . in them. How can I differentiate between the "real" . and the others to get the filename and the extension?
This is my file upload function, which isn't especially relevant but it shows how I upload the files:
def upload_files(files, extensions, path, overwrite=False, rename=None):
    if not os.path.exists(path):
        os.makedirs(path)
    filepath = None
    for file in files:
        name, ext = file.filename.split('.')
        if ext in extensions or extensions == '*':
            if rename:
                filepath = path + rename + '.' + ext if path else rename + '.' + ext
            else:
                filepath = path + file.filename if path else file.filename
            file.save(filepath, overwrite=overwrite)
        else:
            raise Exception('[ FILE ISSUE ] - File Extension is not allowed.')
As you can see, I am splitting the filename on the . that is there, but I now need to figure out which split pair is the actual filename/extension pair. It also creates the issue of too many values to unpack into name, ext, since there is now at least a third value.
Sounds like you are looking for os.path.splitext, which will split your filename into a name part and an extension part:
import os
print(os.path.splitext("./.././this.file.ext"))
# => ('./.././this.file', '.ext')
So after going through your query I thought of something which might help you.
Method-1:
If you have some predefined extensions like jpg, png, jpeg, pdf, docx, ppt, csv, xlxd... you can make a list of these and use it to separate out the file extension:
list_of_extensions = ['jpg', 'png', 'jpeg', 'pdf', 'docx', 'ppt', 'csv', 'xlxd']
filename = 'Anindya.Patro.pdf'
for token in filename.split('.'):
    if token in list_of_extensions:
        pass  # do your operation
This is a brute force kinda idea.
Method-2:
The extension is always at the end of the file name - you don't find files like Anindya.pdf.Patro.
So you can access it in two ways:
By splitting at the last . only
Split the filename and take the last element of the result, which is everything after the final dot:
l = filename.split('.')
extension = l[-1]
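The "split at the last . only" option is a single call with str.rsplit. A quick sketch using the example name from above (it assumes the name contains at least one dot, otherwise the unpacking fails):

```python
filename = 'Anindya.Patro.pdf'

# maxsplit=1 from the right: only the last '.' separates name and extension
name, extension = filename.rsplit('.', 1)
print(name)       # Anindya.Patro
print(extension)  # pdf
```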

Iterating over multiple files and replacing a single line - why doesn't it work?

I'm trying to use the fileinput module to iterate over a bunch of files and replace a single line in them. This is how my code looks:
def main():
    for root, dirs, files in os.walk('target/generated-sources'):
        for line in fileinput.input([os.path.join(root, file) for file in files if file.endswith('.java')], inplace=True):
            match = re.search(r'#Table\(name = "(.*)"\)', line)
            output = "".join(['#Table(name = "', PREFIX, match.group(1)[MAX_TABLENAME_LEN - len(PREFIX)], '")', '\n']) if match else line
            print output,
The problem I face is that I get no error, and the script somehow seems to block. I'm using Python 2.5.2.
Your list comprehension returns an empty list whenever a root does not contain .java files. When your script passes an empty list to fileinput.input(), it reverts to the default of expecting input from stdin. Since nothing is coming in from stdin, your script blocks.
Try this instead:
def main():
    for root, dirs, files in os.walk('target/generated-sources'):
        java_files = [os.path.join(root, file) for file in files if file.endswith('.java')]
        if not java_files:  # go to next iteration if list is empty
            continue
        for line in fileinput.input(java_files, inplace=True):
            match = re.search(r'#Table\(name = "(.*)"\)', line)
            output = "".join(['#Table(name = "', PREFIX, match.group(1)[MAX_TABLENAME_LEN - len(PREFIX)], '")', '\n']) if match else line
            print output,
Alternatively, split up the logic of the file discovery. The following creates a generator which produces a list of files which you can then use as an input to fileinput.
import os, fnmatch, fileinput, re

def find_files(directory, pattern):
    "Generator that yields files within directory with names matching pattern"
    for root, dirs, files in os.walk(directory):
        for basename in fnmatch.filter(files, pattern):
            yield os.path.join(root, basename)

for line in fileinput.input(find_files("target/generated-sources", "*.java"), inplace=True):
    match = re.search(r'#Table\(name = "(.*)"\)', line)
    output = "".join(['#Table(name = "', PREFIX, match.group(1)[MAX_TABLENAME_LEN - len(PREFIX)], '")', '\n']) if match else line
    print output,
If you want to know where the interpreter blocks, you can send the SIGINT signal to the process - at least on unix-like operating systems:
kill -SIGINT PID
Try adding some print or logging lines to see where your code hangs. Maybe fileinput works fine and the app blocks after that.
Some time ago I wrote a tool to do search+replace in several files:
http://www.thomas-guettler.de/scripts/reprec.py.txt

python grep reverse matching

I would like to build a small Python script that basically does the reverse of grep.
I want to match the files in a directory/subdirectories that don't contain a "searched_string".
So far i've done that:
import os

filefilter = ['java', '.jsp']
path = "/home/patate/code/project"

for path, subdirs, files in os.walk(path):
    for name in files:
        if name[-4:] in filefilter:
            print os.path.join(path, name)
This small script lists every file with a "java" or "jsp" extension inside each subdirectory and outputs its full path.
I'm now wondering how to do the rest. For example, if I forgot a session management entry in one file (allowing anyone direct file access), I would like to be able to search for:
"if (!user.hasPermission" and list the files which do not contain this string.
Any help would be greatly appreciated !
Thanks
To check if a file with a path bound to variable f contains a string bound to name s, the simplest approach (and acceptable for most reasonably-sized files) is something like:
with open(f) as fp:
    if s in fp.read():
        print '%s has the string' % f
    else:
        print "%s doesn't have the string" % f
In your os.walk loop, you have the root path and filename separately, so
f = os.path.join(path, name)
(what you're unconditionally printing) is the path you want to open and check.
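Putting the walk and the containment test together, a complete reverse-grep sketch could look like this. The path, suffixes, and search string are taken from the question; the files_missing_string name is made up for the example:

```python
import os

def files_missing_string(path, needle, suffixes=('.java', '.jsp')):
    """Yield files under path whose contents do NOT contain needle."""
    for root, subdirs, files in os.walk(path):
        for name in files:
            if name.endswith(suffixes):
                fullpath = os.path.join(root, name)
                with open(fullpath) as fp:
                    if needle not in fp.read():
                        yield fullpath

for hit in files_missing_string('/home/patate/code/project',
                                'if (!user.hasPermission'):
    print(hit)
```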
Instead of printing the file name, call a function that checks whether the file content is missing the texts you want to have in your source files. In such cases I use a check_file() that looks like this:
import re

WARNING_RX = (
    (re.compile(r'if\s*\(!\s*user\.hasPermission'), 'user.hasPermission'),
    (re.compile(r'other regexp you want to have'), 'very important'),
)

def check_file(fn):
    with open(fn, 'r') as f:
        content = f.read()
    for rx, rx_desc in WARNING_RX:
        if not rx.search(content):
            print('%s: not found: %s' % (fn, rx_desc))

Rename multiple files in a directory in Python

I'm trying to rename some files in a directory using Python.
Say I have a file called CHEESE_CHEESE_TYPE.*** and want to remove CHEESE_ so my resulting filename would be CHEESE_TYPE
I'm trying to use the os.path.split but it's not working properly. I have also considered using string manipulations, but have not been successful with that either.
Use os.rename(src, dst) to rename or move a file or a directory.
$ ls
cheese_cheese_type.bar cheese_cheese_type.foo
$ python
>>> import os
>>> for filename in os.listdir("."):
...     if filename.startswith("cheese_"):
...         os.rename(filename, filename[7:])
...
>>>
$ ls
cheese_type.bar cheese_type.foo
Here's a script based on your newest comment.
#!/usr/bin/env python
from os import rename, listdir

badprefix = "cheese_"
fnames = listdir('.')

for fname in fnames:
    if fname.startswith(badprefix*2):
        rename(fname, fname.replace(badprefix, '', 1))
The following code should work. It takes every filename in the current directory; if the filename contains the pattern CHEESE_CHEESE_, it is renamed. If not, nothing is done to the filename.
import os

for fileName in os.listdir("."):
    os.rename(fileName, fileName.replace("CHEESE_CHEESE_", "CHEESE_"))
Assuming you are already in the directory, and that the "first 8 characters" from your comment hold true always. (Although "CHEESE_" is 7 characters... ? If so, change the 8 below to 7)
from glob import glob
from os import rename

for fname in glob('*.prj'):
    rename(fname, fname[8:])
I had the same issue, where I wanted to replace the white space in any PDF file name with a dash -.
But the files were in multiple sub-directories, so I had to use os.walk().
In your case for multiple sub-directories, it could be something like this:
import os

for dpath, dnames, fnames in os.walk('/path/to/directory'):
    for f in fnames:
        os.chdir(dpath)
        if f.startswith('cheese_'):
            os.rename(f, f.replace('cheese_', ''))
Try this:
import os
import shutil

for file in os.listdir(dirpath):
    newfile = os.path.join(dirpath, file.split("_", 1)[1])
    shutil.move(os.path.join(dirpath, file), newfile)
I'm assuming you don't want to remove the file extension, but you can just do the same split with periods.
This sort of stuff is perfectly suited to IPython, which has shell integration.
In [1]: files = !ls
In [2]: for f in files:
   ...:     newname = process_filename(f)
   ...:     mv $f $newname
Note: to store this in a script, use the .ipy extension, and prefix all shell commands with !.
See also: http://ipython.org/ipython-doc/stable/interactive/shell.html
Here is a more general solution:
This code can be used to remove any particular character or set of characters recursively from all filenames within a directory and replace them with any other character, set of characters or no character.
import os

paths = (os.path.join(root, filename)
         for root, _, filenames in os.walk(r'C:\FolderName')
         for filename in filenames)

for path in paths:
    # the '#' in the example below will be replaced by the '-' in the filenames in the directory
    newname = path.replace('#', '-')
    if newname != path:
        os.rename(path, newname)
It seems that your problem is more in determining the new file name rather than the rename itself (for which you could use the os.rename method).
It is not clear from your question what the pattern is that you want to be renaming. There is nothing wrong with string manipulation. A regular expression may be what you need here.
import os
import string

def rename_files():
    # List all files in the directory
    file_list = os.listdir("/Users/tedfuller/Desktop/prank/")
    print(file_list)

    # Change current working directory and print out its location
    working_location = os.chdir("/Users/tedfuller/Desktop/prank/")
    working_location = os.getcwd()
    print(working_location)

    # Rename all the files in that directory
    for file_name in file_list:
        os.rename(file_name, file_name.translate(str.maketrans("", "", string.digits)))

rename_files()
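The translate call above strips every digit from a name. A quick sketch of just that piece (the example filename is made up):

```python
import string

# str.maketrans('', '', deletechars): the third argument lists
# characters to delete from the translated string
table = str.maketrans('', '', string.digits)
print('photo123.jpg'.translate(table))  # photo.jpg
```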
This command will remove the initial "CHEESE_" string from all the files in the current directory, using renamer:
$ renamer --find "/^CHEESE_/" *
I was originally looking for some GUI which would allow renaming using regular expressions and which had a preview of the result before applying changes.
On Linux I have successfully used krename, on Windows Total Commander does renaming with regexes, but I found no decent free equivalent for OSX, so I ended up writing a python script which works recursively and by default only prints the new file names without making any changes. Add the '-w' switch to actually modify the file names.
#!/usr/bin/python
# -*- coding: utf-8 -*-
import os
import fnmatch
import sys
import shutil
import re


def usage():
    print """
Usage:

%s <work_dir> <search_regex> <replace_regex> [-w|--write]

By default no changes are made; add '-w' or '--write' as the last arg to actually rename files
after you have previewed the result.
""" % (os.path.basename(sys.argv[0]))


def rename_files(directory, search_pattern, replace_pattern, write_changes=False):
    pattern_old = re.compile(search_pattern)
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        for filename in fnmatch.filter(files, "*.*"):
            if pattern_old.findall(filename):
                new_name = pattern_old.sub(replace_pattern, filename)
                if not new_name:
                    print 'Replacement regex {} returns empty value! Skipping'.format(replace_pattern)
                    continue
                filepath_old = os.path.join(path, filename)
                filepath_new = os.path.join(path, new_name)
                print new_name
                if write_changes:
                    shutil.move(filepath_old, filepath_new)
            else:
                print 'Name [{}] does not match search regex [{}]'.format(filename, search_pattern)


if __name__ == '__main__':
    if len(sys.argv) < 4:
        usage()
        sys.exit(-1)

    work_dir = sys.argv[1]
    search_regex = sys.argv[2]
    replace_regex = sys.argv[3]
    write_changes = (len(sys.argv) > 4) and sys.argv[4].lower() in ['--write', '-w']
    rename_files(work_dir, search_regex, replace_regex, write_changes)
Example use case
I want to flip parts of a file name in the following manner, i.e. move the bit m7-08 to the beginning of the file name:
# Before:
Summary-building-mobile-apps-ionic-framework-angularjs-m7-08.mp4
# After:
m7-08_Summary-building-mobile-apps-ionic-framework-angularjs.mp4
This will perform a dry run, and print the new file names without actually renaming any files:
rename_files_regex.py . "([^\.]+?)-(m\\d+-\\d+)" "\\2_\\1"
This will do the actual renaming (you can use either -w or --write):
rename_files_regex.py . "([^\.]+?)-(m\\d+-\\d+)" "\\2_\\1" --write
You can use the os.system function for simplicity, invoking the shell to accomplish the task:
import os
os.system('mv old_filename new_filename')
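Note that shelling out like this breaks on names containing spaces and is not portable to Windows; the standard library's os.rename does the same job directly. A small self-contained sketch (the file names are made up, and the file is created only for the demo):

```python
import os

# demo setup: create a file to rename
open('old filename.txt', 'w').close()

# unlike os.system('mv ...'), this needs no shell quoting
# and also works on Windows
os.rename('old filename.txt', 'new filename.txt')
print(os.path.exists('new filename.txt'))  # True
```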
This works for me.
import os

for afile in os.listdir('.'):
    filename, file_extension = os.path.splitext(afile)
    if not file_extension == '.xyz':
        os.rename(afile, filename + '.abc')
What about this:
import re
p = re.compile(r'_')
p.split(filename, 1) #where filename is CHEESE_CHEESE_TYPE.***
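For reference, here is what that split returns for the example name (using .txt as a stand-in for the *** extension); maxsplit=1 keeps everything after the first underscore intact:

```python
import re

p = re.compile(r'_')
print(p.split('CHEESE_CHEESE_TYPE.txt', 1))  # ['CHEESE', 'CHEESE_TYPE.txt']
```

Taking the second element of that list gives the desired CHEESE_TYPE.txt name.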

Replace strings in files by Python, recursively in given directory and its subdirectories?

How can you replace a string match inside a file with the given replacement, recursively, inside a given directory and its subdirectories?
Pseudo-code:
import os
import re

for root, dirs, files in os.walk("/home/noa/Desktop/codes"):
    for name in dirs:
        re.search("dbname=noa user=noa", "dbname=masi user=masi")
        # I am trying to replace here a given match in a file
Put all this code into a file called mass_replace. Under Linux or Mac OS X, you can do chmod +x mass_replace and then just run this. Under Windows, you can run it with python mass_replace followed by the appropriate arguments.
#!/usr/bin/python
import os
import re
import sys

# list of extensions to replace
DEFAULT_REPLACE_EXTENSIONS = None
# example: uncomment next line to only replace *.c, *.h, and/or *.txt
# DEFAULT_REPLACE_EXTENSIONS = (".c", ".h", ".txt")


def try_to_replace(fname, replace_extensions=DEFAULT_REPLACE_EXTENSIONS):
    if replace_extensions:
        return fname.lower().endswith(replace_extensions)
    return True


def file_replace(fname, pat, s_after):
    # first, see if the pattern is even in the file.
    with open(fname) as f:
        if not any(re.search(pat, line) for line in f):
            return  # pattern does not occur in file so we are done.

    # pattern is in the file, so perform replace operation.
    with open(fname) as f:
        out_fname = fname + ".tmp"
        out = open(out_fname, "w")
        for line in f:
            out.write(re.sub(pat, s_after, line))
        out.close()
    os.rename(out_fname, fname)


def mass_replace(dir_name, s_before, s_after, replace_extensions=DEFAULT_REPLACE_EXTENSIONS):
    pat = re.compile(s_before)
    for dirpath, dirnames, filenames in os.walk(dir_name):
        for fname in filenames:
            if try_to_replace(fname, replace_extensions):
                fullname = os.path.join(dirpath, fname)
                file_replace(fullname, pat, s_after)


if len(sys.argv) != 4:
    u = "Usage: mass_replace <dir_name> <string_before> <string_after>\n"
    sys.stderr.write(u)
    sys.exit(1)

mass_replace(sys.argv[1], sys.argv[2], sys.argv[3])
EDIT: I have changed the above code from the original answer. There are several changes. First, mass_replace() now calls re.compile() to pre-compile the search pattern; second, to check what extension the file has, we now pass in a tuple of file extensions to .endswith() rather than calling .endswith() three times; third, it now uses the with statement available in recent versions of Python; and finally, file_replace() now checks to see if the pattern is found within the file, and doesn't rewrite the file if the pattern is not found. (The old version would rewrite every file, changing the timestamps even if the output file was identical to the input file; this was inelegant.)
EDIT: I changed this to default to replacing every file, but with one line you can edit to limit it to particular extensions. I think replacing every file is a more useful out-of-the-box default. This could be extended with a list of extensions or filenames not to touch, options to make it case insensitive, etc.
EDIT: In a comment, @asciimo pointed out a bug. I edited this to fix the bug. str.endswith() is documented to accept a tuple of strings to try, but not a list. Fixed. Also, I made a couple of the functions accept an optional argument to let you pass in a tuple of extensions; it should be pretty easy to modify this to accept a command-line argument to specify which extensions.
Do you really need regular expressions?
import os

def recursive_replace(root, pattern, replace):
    for dir, subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dir, name)
            text = open(path).read()
            if pattern in text:
                open(path, 'w').write(text.replace(pattern, replace))
Of course, if you just want to get it done without coding it up, use find and xargs:
find /home/noa/Desktop/codes -type f -print0 | \
    xargs -0 sed --in-place "s/dbname=noa user=noa/dbname=masi user=masi/"
(And you could likely do this with find's -exec or something as well, but I prefer xargs.)
This is how I would find and replace strings in files using Python. This simple little function recursively searches a directory for a string and replaces it with a replacement string. You can also limit the files to a certain file extension, as in the example below.
import os, fnmatch

def findReplace(directory, find, replace, filePattern):
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        for filename in fnmatch.filter(files, filePattern):
            filepath = os.path.join(path, filename)
            with open(filepath) as f:
                s = f.read()
            s = s.replace(find, replace)
            with open(filepath, "w") as f:
                f.write(s)
This allows you to do something like:
findReplace("some_dir", "find this", "replace with this", "*.txt")
this should work:
import re, os
import fnmatch

for path, dirs, files in os.walk(os.path.abspath(directory)):
    for filename in fnmatch.filter(files, filePattern):
        filepath = os.path.join(path, filename)
        with open(filepath, 'r') as readf:
            lines = [re.sub(r"dbname=noa user=noa", "dbname=masi user=masi", line)
                     for line in readf]
        with open(filepath, 'w') as out:
            out.writelines(lines)