How can I remove a prefix from a filename in Python? [duplicate] - python

This question already has answers here:
Get relative path from comparing two absolute paths
(6 answers)
Closed 5 years ago.
I want to write a script that receives a path to a directory and a path to a file contained in that directory (possibly nested many directories deep) and returns a path to this file relative to the outer directory.
For example, if the outer directory is /home/hugomg/foo and the inner file is /home/hugomg/foo/bar/baz/unicorns.txt I would like the script to output bar/baz/unicorns.txt.
Right now I am doing it using realpath and string manipulation:
import os
dir_path = "/home/hugomg/foo"
file_path = "/home/hugomg/foo/bar/baz/unicorns.py"
dir_path = os.path.realpath(dir_path)
file_path = os.path.realpath(file_path)
if not file_path.startswith(dir_path):
    print("file is not inside the directory")
    exit(1)
output = file_path[len(dir_path):]
output = output.lstrip("/")
print(output)
But is there a more robust way to do this? I'm not confident that my current solution is the right way to do this. Is using startswith together with realpath a correct way to test that one file is inside another? And is there a way to avoid that awkward situation with the leading slash that I might need to remove?

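You can use commonpath and relpath from the os.path module: commonpath finds the longest common sub-path of two paths, and relpath computes the relative path. Prefer commonpath over commonprefix, which compares character by character and would treat /home/hugomg/foobar as being inside /home/hugomg/foo. Resolving both paths with realpath first is a good idea, since it resolves symlinks and ".." components.
import os
import sys
dir_path = os.path.realpath("/home/hugomg/foo")
file_path = os.path.realpath("/home/hugomg/foo/bar/baz/unicorns.py")
# commonpath compares whole path components, not characters
common_path = os.path.commonpath([dir_path, file_path])
if common_path != dir_path:
    print("file is not inside the directory")
    sys.exit(1)
print(os.path.relpath(file_path, dir_path))
Output:
bar/baz/unicorns.py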
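As an alternative, pathlib can do the containment check and the prefix removal in one step, which also avoids the leading-slash cleanup from the question. This is a minimal sketch, not the only way to do it: the function name relative_inside is hypothetical, and PurePosixPath is used so the example behaves the same on any OS and does not touch the filesystem (so symlinks are not resolved here).

```python
from pathlib import PurePosixPath

def relative_inside(dir_path, file_path):
    """Return file_path relative to dir_path, or None if it is not inside."""
    try:
        # relative_to compares whole path components and raises ValueError
        # when file_path is not located under dir_path
        return str(PurePosixPath(file_path).relative_to(PurePosixPath(dir_path)))
    except ValueError:
        return None

print(relative_inside("/home/hugomg/foo", "/home/hugomg/foo/bar/baz/unicorns.py"))
print(relative_inside("/home/hugomg/foo", "/home/hugomg/foobar/unicorns.py"))
```

The first call prints bar/baz/unicorns.py and the second prints None, because foobar is a different directory even though its name starts with foo.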

Related

Apply script to multiple folders using string in file path [duplicate]

This question already has answers here:
How to use glob() to find files recursively?
(28 answers)
Closed 1 year ago.
I am new to the programming world and I have hit a snag on a piece of code.
What I have: I have a piece of code that identifies all .MP4 files and calculates the size of the files within a directory. So far I can only apply this code to a specific folder that I input manually. But the code works.
Problem: I would like to apply what I have to multiple folders within a directory. I have several folders with years in the file path and I would like to apply this code to each year individually. For example: I need a piece of code/direction to code that can allow me to run this code on all folders with '2021' in the name.
Any pointers or suggestions are welcome.
# import module
import os
# assign size
size = 0
# assign folder path
Folderpath = r'C:\file_path_name_here'
# get size
for path, dirs, files in os.walk(Folderpath):
    for f in files:
        if not f.endswith('.MP4'):
            continue
        else:
            fp = os.path.join(path, f)
            size += os.path.getsize(fp)
# display size
print("Folder size: " + str(size))
You can use glob for that.
If I understand you correctly, you want to iterate through all folders, right?
I am not an experienced coder either, but I use it in a script where I have to iterate over an unknown number of files and folders, in my case PDFs, which then get scanned, indexed and merged.
The function below returns a list of files that you can then work through. os.path.commonpath is also handy for that.
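import os
from pathlib import Path

def getpdflisting(fpath):
    # config and logger are defined elsewhere in my script
    filelist = []
    # '**/*.pdf' matches PDF files in this folder and all subfolders
    for filepath in Path(os.path.join(config['Ordner']['path'] +
                                      fpath)).glob('**/*.pdf'):
        filelist.append(str(filepath))
    if filelist:
        filelist = [x for x in filelist if x]
        logger.info(filelist)
    return filelist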
Or even better https://stackoverflow.com/a/66042729/16573616
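Applied to the setup in the question, the size calculation could be wrapped in a function and run once per matching folder. This is only a sketch under the assumption that the year folders sit directly under one root directory; mp4_size, sizes_for_year and their arguments are hypothetical names, not something from the original code.

```python
import os

def mp4_size(folder):
    """Total size in bytes of all .MP4 files under folder (recursive)."""
    total = 0
    for path, dirs, files in os.walk(folder):
        for f in files:
            if f.endswith('.MP4'):
                total += os.path.getsize(os.path.join(path, f))
    return total

def sizes_for_year(root, year):
    """Map each direct subfolder of root whose name contains year to its .MP4 size."""
    result = {}
    for entry in os.scandir(root):
        if entry.is_dir() and year in entry.name:
            result[entry.name] = mp4_size(entry.path)
    return result
```

Calling sizes_for_year(r'C:\file_path_name_here', '2021') would then return one size per folder with '2021' in its name.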

I want to rename all the .txt files on a dir to .csv using Python 3 [duplicate]

This question already has answers here:
Rename multiple files in a directory in Python
(15 answers)
Closed 4 years ago.
Looking to change the file extension from .txt to .csv
import os, shutil
for filename in os.listdir(directory):
    # if the last four characters are ".txt" (ignoring case)
    # (converting to lowercase will include files ending in ".TXT", etc.)
    if filename.lower().endswith(".txt"):
        # generate a new filename using everything before the ".txt", plus ".csv"
        newfilename = filename[:-4] + ".csv"
        shutil.move(filename, newfilename)
You can use os and rename.
But let me give you a small piece of advice. When you do this kind of operation (copy, delete, move or rename), I'd suggest you first print what you are about to do. This would normally be the start path and the end path.
Consider the example below, where the os.rename() call is commented out in favor of print():
import os
for f in os.listdir(directory):
    if f.endswith('.txt'):
        print(f, f[:-4]+'.csv')
        #os.rename(f, f[:-4]+'.csv')
By doing this you can be certain things look OK before anything is renamed. And if your directory is somewhere other than the current directory (.), you would probably need to do this:
import os
for f in os.listdir(directory):
    if f.endswith('.txt'):
        fullpath = os.path.join(directory,f)
        print(fullpath, fullpath[:-4]+'.csv')
        #os.rename(fullpath, fullpath[:-4]+'.csv')
The os.path.join() will make sure the directory path is added too.
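The same print-first idea can be sketched with pathlib, where with_suffix replaces the extension without slicing strings. This is an illustration under assumptions rather than a drop-in answer: the function name rename_txt_to_csv and the dry_run parameter are hypothetical, and the extension check is case-insensitive as in the question.

```python
from pathlib import Path

def rename_txt_to_csv(directory, dry_run=True):
    """Return (source, target) pairs; only rename when dry_run is False."""
    planned = []
    # sorted() builds the full list first, so renaming does not disturb the iteration
    for src in sorted(Path(directory).iterdir()):
        if src.is_file() and src.suffix.lower() == ".txt":
            dst = src.with_suffix(".csv")
            planned.append((src, dst))
            if not dry_run:
                src.rename(dst)
    return planned
```

Run it once with the default dry_run=True and print the returned pairs, then call it again with dry_run=False once the pairs look right.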

Python glob, os, relative path, making filenames into a list [duplicate]

This question already has answers here:
Python Glob without the whole path - only the filename
(10 answers)
Closed 5 years ago.
I am trying to make a list of all files in a directory whose filenames end in .root.
After reading some posts in the forum I tried two basic strategies, using glob and os.listdir, but I ran into trouble with both of them.
First, when I use
import glob
filelist = glob.glob('/home/usr/dir/*.root')
It does make a list of strings with all filenames that end in .root, but I still face a problem.
I would like the list of strings to have filenames as '/dir/.root', but the strings have the full path '/home/usr/dir/.root'.
Second, if I use os.listdir, I run into the problem that
path = '/home/usr/'
filelist = os.listdir(path + 'dir/*.root')
syntax error
which tells me that I cannot get the list of only the .root files.
In summary, I would like to make a list of filenames that end in .root and are in my /home/usr/dir, while cutting off the '/home/usr' part. If I use glob, I am stuck with the /home/usr/ prefix. If I use os.listdir, I can't specify the ".root" ending.
glob will return paths in a format matching your query, so that
glob.glob("/home/usr/dir/*.root")
# ['/home/usr/dir/foo.root', '/home/usr/dir/bar.root', ...]
glob.glob("*.root")
# ['foo.root', 'bar.root', ...]
glob.glob("./*.root")
# ['./foo.root', './bar.root', ...]
...and so forth.
To get only the filename, you can use path.basename of the os module, something like this:
from glob import glob
from os import path
pattern = "/home/usr/dir/*.root"
files = [path.basename(x) for x in glob(pattern)]
# ['foo.root', 'bar.root', ...]
...or, if you want to prepend the dir part:
pattern = "/home/usr/dir/*.root"
files = [path.join('dir', path.basename(x)) for x in glob(pattern)]
# ['dir/foo.root', 'dir/bar.root', ...]
...or, if you really want the path separator at the start:
from glob import glob
import os
pattern = "/home/usr/dir/*.root"
files = [os.sep + os.path.join('dir', os.path.basename(x)) for x in glob(pattern)]
# ['/dir/foo.root', '/dir/bar.root', ...]
Using path.join and path.sep will make sure that the correct path syntax is used, depending on your OS (i.e. / or \ as a separator).
Depending on what you are really trying to do here, you might want to look at os.path.relpath, for the relative path. The title of your question indicates that relative paths might be what you are actually after:
pattern = "/home/usr/dir/*.root"
files = [os.path.relpath(x) for x in glob(pattern)]
# files will now contain the relative path to each file, from the current working directory
Just use glob to get the list you want, and then call os.path.relpath on each file:
import glob
import os
files_names = []
for file in glob.glob('/home/usr/dir/*.root'):
    files_names.append(os.path.relpath(file, "/home/usr"))
You can also use a regex inside the loop to strip the leading directory:
import re
files_names.append(re.sub(r'^/home/usr/', '', file))

How to remove all characters before the final \ [duplicate]

This question already has answers here:
Extract file name from path, no matter what the os/path format
(22 answers)
Closed 6 years ago.
I have a variable called dllName that grabs the name of a dll that has been executed. Sometimes this dll is returned in the format of "kernel32.dll" and sometimes as "C:\Windows\system32\kernel32.dll".
The path can vary, what I am trying to achieve is the stripping of the "C:\Windows\system32\".
EDIT: Extract file name from path, no matter what the os/path format
My question is not the same as this question, as os.path.basename and os.path.split do not work in this situation.
For os.path.split the head is empty and the tail contains the whole path?
You could use:
path = 'C:\\Windows\\system32\\kernel32.dll'
print(path.split('\\')[-1])
#=> kernel32.dll
or
import os.path
# note: os.path only treats backslashes as separators when running on Windows
print(os.path.basename(path))
or
import re
def extract_basename(path):
    """Extract the basename of a given path. Should work with any OS path on any OS."""
    basename = re.search(r'[^\\/]+(?=[\\/]?$)', path)
    if basename:
        return basename.group(0)
print(extract_basename(path))
This last example should work for any OS and any path.
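If the script runs on a non-Windows system, that would explain why os.path.split returns an empty head: there, os.path does not treat the backslash as a separator. The standard library's ntpath module applies Windows path rules on any platform, so it can be used directly. A small sketch; the wrapper name windows_basename is hypothetical.

```python
import ntpath

def windows_basename(path):
    """Return the final component of a Windows-style path on any OS."""
    # ntpath accepts both backslashes and forward slashes as separators
    return ntpath.basename(path)

print(windows_basename('C:\\Windows\\system32\\kernel32.dll'))
print(windows_basename('kernel32.dll'))
```

Both calls print kernel32.dll, so the two formats that dllName can take are handled by the same function.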

Script delete C:\Windows\CSC\v2.0.6\namespace [duplicate]

This question already has answers here:
How to delete the contents of a folder?
(27 answers)
Closed 9 years ago.
I am trying to create a Python script to delete everything under C:\Windows\CSC\v2.0.6\namespace
I need an idea. To do it from the command line I have to open cmd, run psexec -s cmd, go to C:\Windows\CSC\v2.0.6\namespace and then run rd on whatever folder is there. I want to create a script to remove everything. Any help is appreciated.
This code should delete any files or directories in your directory:
import os, shutil
# raw string: otherwise \v and \n would be read as escape sequences
folder = r"C:\Windows\CSC\v2.0.6\namespace"
for item in os.listdir(folder):
    path = os.path.join(folder, item)
    try:
        os.unlink(path) # delete if the item is a file
    except OSError:
        shutil.rmtree(path) # delete if the item is a folder
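A variant of the same idea checks the entry type up front instead of relying on the exception. This is a sketch under assumptions, not a tested solution for the CSC folder specifically (which may also need elevated permissions, as the psexec step in the question suggests); clear_folder is a hypothetical name.

```python
import os
import shutil

def clear_folder(folder):
    """Delete everything inside folder, but keep folder itself."""
    # read all entries first so deletions do not interfere with the listing
    with os.scandir(folder) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)  # remove a subfolder and its contents
        else:
            os.unlink(entry.path)      # remove a file or a symlink
```

It would be called as clear_folder(r"C:\Windows\CSC\v2.0.6\namespace"), with the raw string keeping the backslashes literal.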
This has been answered previously.
A simple Google search and a few modifications:
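import os
mainPath = r"C:\Windows\CSC\v2.0.6\namespace"  # raw string keeps the backslashes literal
files = os.listdir(mainPath)
for f in files:
    # os.remove only deletes files; it raises an error for directories
    os.remove(os.path.join(mainPath, f))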
If you want to recursively find all of the files and THEN delete them all (this is a small script I wrote yesterday):
import os, os.path
def getAllFiles(mainPath):
    paths = []  # files found under mainPath
    subPaths = os.listdir(mainPath)
    for path in subPaths:
        pathDir = os.path.join(mainPath, path)
        if os.path.isdir(pathDir):
            # recurse into the subfolder and collect its files
            paths.extend(getAllFiles(pathDir))
        else:
            paths.append(pathDir)
    return paths
So then you can do:
files = getAllFiles(mainPath)
for f in files:
    os.remove(f)
Note: the recursive algorithm gets somewhat slow if there are too many subfolders, since it makes one recursive call per subfolder, and a very deeply nested tree can hit Python's recursion limit.
To avoid this, you can use the recursive function as a helper function, which is called by a main iterative function:
def getDirs(path):
    sub = os.listdir(path)
    paths = []
    for p in sub:
        pDir = os.path.join(path, p)
        if os.path.isdir(pDir):
            paths.extend(getAllFiles(pDir)) # getAllFiles is the same as above
        else:
            paths.append(pDir)
    return paths
It gets slow for very large subfolders, however. Going through C:\Python27\Lib takes about 6-7 seconds for me (it has about 5k+ files in it, and many, many subfolders).
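The standard library's os.walk already performs this traversal, so the hand-written recursion can be left out entirely. A minimal sketch of the same file listing; get_all_files is a hypothetical name, and no claim is made here about how its speed compares with the functions above.

```python
import os

def get_all_files(main_path):
    """Return the full paths of all files under main_path, including subfolders."""
    paths = []
    # os.walk yields one (directory, subfolder names, file names) tuple per folder
    for dirpath, dirnames, filenames in os.walk(main_path):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return paths
```

The returned list can be fed to the same os.remove loop shown earlier; the now-empty subfolders would still need to be removed separately.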
