replace part of path - python - python

Is there a quick way to replace part of the path in python?
for example:
old_path='/abc/dfg/ghi/f.txt'
I don't know the beginning of the path (/abc/dfg/), so what I'd really like to tell python to keep everything that comes after /ghi/ (inclusive) and replace everything before /ghi/ with /jkl/mno/:
>>> new_path
'/jkl/mno/ghi/f.txt/'

If you're using Python 3.4+, or willing to install the backport, consider using pathlib instead of os.path:
path = pathlib.Path(old_path)
index = path.parts.index('ghi')
new_path = pathlib.Path('/jkl/mno').joinpath(*path.parts[index:])
If you just want to stick with the 2.7 or 3.3 stdlib, there's no direct way to do this, but you can get the equivalent of parts by looping over os.path.split. For example, keeping each path component until you find the first ghi, and then tacking on the new prefix, will replace everything before the last ghi (if you want to replace everything before the first ghi, it's not hard to change things):
path = old_path
new_path = ''
while True:
path, base = os.path.split(path)
new_path = os.path.join(base, new_path)
if base == 'ghi':
break
new_path = os.path.join('/jkl/mno', new_path)
This is a bit clumsy, so you might want to consider writing a simple function that gives you a list or tuple of the path components, so you can just use find, then join it all back together, as with the pathlib version.

>>> import os.path
>>> old_path='/abc/dfg/ghi/f.txt'
First grab the relative path from the starting directory of your choice using os.path.relpath
>>> rel = os.path.relpath(old_path, '/abc/dfg/')
>>> rel
'ghi\\f.txt'
Then add the new first part of the path to this relative path using os.path.join
>>> new_path = os.path.join('jkl\mno', rel)
>>> new_path
'jkl\\mno\\ghi\\f.txt'

You can use the index of ghi:
old_path.replace(old_path[:old_path.index("ghi")],"/jkl/mno/")
In [4]: old_path.replace(old_path[:old_path.index("ghi")],"/jkl/mno/" )
Out[4]: '/jkl/mno/ghi/f.txt'

A rather naive approach, but does the job:
Function:
def replace_path(path, frm, to):
pre, match, post = path.rpartition(frm)
return ''.join((to if match else pre, match, post))
Example:
>>> s = '/abc/dfg/ghi/f.txt'
>>> replace_path(s, '/ghi/', '/jkl/mno')
'/jkl/mno/ghi/f.txt'
>>> replace_path(s, '/whatever/', '/jkl/mno')
'/abc/dfg/ghi/f.txt'

The following is useful when you want to replace some known base directory in your path.
from pathlib import Path
old_path = Path('/abc/dfg/ghi/f.txt')
old_root = Path('/abc/dfg')
new_root = Path('/jkl/mno')
new_path = new_root / old_path.relative_to(old_root)
# Result: /jkl/mno/ghi/f.txt
I understand that the OP specifically mentioned that the path to the base directory is not known. However, since it is a common task to remove the path to the base directory, and the title of the question ("replace part of the path") is certainly bringing some folks with this subtype of problem here, I am posting it anyway.

I needed to replace an arbitrary number of an arbitrary strings in a path
e.g. replace 'package' with foo in
VERSION_FILE = Path(f'{Path.home()}', 'projects', 'package', 'package', '_version.py')
So I use this call
_replace_path_text(VERSION_FILE, 'package', 'foo)
def _replace_path_text(path, text, replacement):
parts = list(path.parts)
new_parts = [part.replace(text, replacement) for part in parts]
return Path(*new_parts)

Related

How do I get the parent directory's name only, not full path?

I am trying to get the parent directory's name only. Meaning, only its last component, not the full path.
So for example for the path a/b/c/d/e I want to get d, and not a/b/c/d.
My current code:
import os
path = "C:/example/folder/file1.jpg"
directoryName = os.path.dirname(os.path.normpath(path))
print(directoryName)
This prints out C:/example/folder and I want to get just folder.
The simplest way to do this would be using pathlib. Using parent will get you the parent's full path, and name will give you just the last component:
>>> from pathlib import Path
>>> path = Path("/a/b/c/d/e")
>>> path.parent.name
'd'
For comparison, to do the same with os.path, you will need to get the basename of the dirname of your path. So that translates directly to:
import os
path = "C:/example/folder/file1.jpg"
print(os.path.basename(os.path.dirname(path)))
Which is the nicer version of:
os.path.split(os.path.split(path)[0])[1]
Where both give:
'folder'
As you can see, the pathlib approach is much clearer and readable. Because pathlib incorporates the OOP approach for representing paths, instead of strings, we get a clear chain of attributes/method calls.
path.parent.name
Is read in order as:
start from path -> take its parent -> take its name
Whereas in the os functions-accepting-strings approach you actually need to read from inside-out!
os.path.basename(os.path.dirname(path))
Is read in order as:
The name of the parent of the path
Which I'm sure you'll agree is much harder to read and understand (and this is just a simple-case example).
You could also use the str.split method together with os.sep:
>>> path = "C:\\example\\folder\\file1.jpg"
>>> path.split(os.sep)[-2]
'folder'
But as the docs state:
Note that knowing this [(the separator)] is not sufficient to be able to parse or
concatenate pathnames — use os.path.split() and os.path.join() — but
it is occasionally useful.
Use pathlib.Path to get the .name of the .parent:
from pathlib import Path
p = Path("C:/example/folder/file1.jpg")
print(p.parent.name) # folder
Compared to os.path, pathlib represents paths as a separate type instead of strings. It generally is shorter and more convenient to use.
this works
path = "C:/example/folder/file1.jpg"
directoryName = os.path.dirname(path)
parent = directoryName.split("/")
parent.reverse()
print(parent[0])
Simple to solve using pathlib
0. Import Path from pathlib
from pathlib import Path
path = "C:/example/folder/file1.jpg"
1. Get parent level 1
parent_lv1 = Path(path).parent
2. Get parent level 2
parent_lv2 = parent_lv1.parent
3. Get immediate parent
imm_parent = parent_lv1.relative_to(parent_lv2)
print(imm_parent)
I prefer regex
import re
def get_parent(path: str) -> str:
match = re.search(r".*[\\|/](\w+)[\\|/].*", path)
if match:
return match.group(1)
else:
return ""
if __name__ == '__main__':
my_path = "/home/tony/some/cool/path"
print(get_parent(my_path))
win_path = r"C:\windows\path\has\dumb\backslashes"
print(get_parent(win_path))
Output
cool
dumb

How can I replace a substring in a Python pathlib.Path?

Is there an easy way to replace a substring within a pathlib.Path object in Python? The pathlib module is nicer in many ways than storing a path as a str and using os.path, glob.glob etc, which are built in to pathlib. But I often use files that follow a pattern, and often replace substrings in a path to access other files:
data/demo_img.png
data/demo_img_processed.png
data/demo_spreadsheet.csv
Previously I could do:
img_file_path = "data/demo_img.png"
proc_img_file_path = img_file_path.replace("_img.png", "_img_proc.png")
data_file_path = img_file_path.replace("_img.png", "_spreadsheet.csv")
pathlib can replace the file extension with the with_suffix() method, but only accepts extensions as valid suffixes. The workarounds are:
import pathlib
import os
img_file_path = pathlib.Path("data/demo_img.png")
proc_img_file_path = pathlib.Path(str(img_file_path).replace("_img.png", "_img_proc.png"))
# os.fspath() is available in Python 3.6+ and is apparently safer than str()
data_file_path = pathlib.Path(os.fspath(img_file_path).replace("_img.png", "_img_proc.png"))
Converting to a string to do the replacement and reconverting to a Path object seems laborious. Assume that I never have a copy of the string form of img_file_path, and have to convert the type as needed.
You are correct. To replace old with new in Path p, you need:
p = Path(str(p).replace(old, new))
EDIT
We turn Path p into str so we get this str method:
Help on method_descriptor:
replace(self, old, new, count=-1, /)
Return a copy with all occurrences of substring old replaced by new.
Otherwise we'd get this Path method:
Help on function replace in module pathlib:
replace(self, target)
Rename this path to the given path, clobbering the existing destination if it exists, and return a new Path instance pointing to the given path.
I have recently faced a similar problem and found this thread when searching for a solution. In contrast to the accepted answer I did not convert the pathlib.Path object into a string. Instead, I used its parent and name attributes (name is a string itself), along with the joinpath() method. Here is the code:
In [2]: from pathlib import Path
In [3]: img_file_path = Path('data/demo_img.png')
In [4]: parent, name = img_file_path.parent, img_file_path.name
In [5]: proc_fn = name.replace('_img.png', '_img_proc.png')
...: data_fn = name.replace('_img.png', '_spreadsheet.csv')
In [6]: proc_img_file_path = Path(parent).joinpath(proc_fn)
...: data_img_file_path = Path(parent).joinpath(data_fn)
In [7]: proc_img_file_path
Out[7]: WindowsPath('data/demo_img_proc.png')
In [8]: data_img_file_path
Out[8]: WindowsPath('data/demo_spreadsheet.csv')
An advantage of this approach is that it avoids the risk of making unwanted replacements in the parent bit.
use PurePath.with_name() or PurePath.with_stem()

Find the common path prefix of a list of paths

My problem is to find the common path prefix of a given set of files.
Literally I was expecting that "os.path.commonprefix" would do just that. Unfortunately, the fact that commonprefix is located in path is rather misleading, since it actually will search for string prefixes.
The question to me is, how can this actually be solved for paths? The issue was briefly mentioned in this (fairly high rated) answer but only as a side-note and the proposed solution (appending slashes to the input of commonprefix) imho has issues, since it will fail for instance for:
os.path.commonprefix(['/usr/var1/log/', '/usr/var2/log/'])
# returns /usr/var but it should be /usr
To prevent others from falling into the same trap, it might be worthwhile to discuss this issue in a separate question: Is there a simple / portable solution for this problem that does not rely on nasty checks on the file system (i.e., access the result of commonprefix and check whether it is a directory and if not returns a os.path.dirname of the result)?
It seems that this issue has been corrected in recent versions of Python. New in version 3.5 is the function os.path.commonpath(), which returns the common path instead of the common string prefix.
Awhile ago I ran into this where os.path.commonprefix is a string prefix and not a path prefix as would be expected. So I wrote the following:
def commonprefix(l):
# this unlike the os.path.commonprefix version
# always returns path prefixes as it compares
# path component wise
cp = []
ls = [p.split('/') for p in l]
ml = min( len(p) for p in ls )
for i in range(ml):
s = set( p[i] for p in ls )
if len(s) != 1:
break
cp.append(s.pop())
return '/'.join(cp)
it could be made more portable by replacing '/' with os.path.sep.
Assuming you want the common directory path, one way is to:
Use only directory paths as input. If your input value is a file name, call os.path.dirname(filename) to get its directory path.
"Normalize" all the paths so that they are relative to the same thing and don't include double separators. The easiest way to do this is by calling os.path.abspath( ) to get the path relative to the root. (You might also want to use os.path.realpath( ) to remove symbolic links.)
Add a final separator (found portably with os.path.sep or os.sep) to the end of all the normalized directory paths.
Call os.path.dirname( ) on the result of os.path.commonprefix( ).
In code (without removing symbolic links):
def common_path(directories):
norm_paths = [os.path.abspath(p) + os.path.sep for p in directories]
return os.path.dirname(os.path.commonprefix(norm_paths))
def common_path_of_filenames(filenames):
return common_path([os.path.dirname(f) for f in filenames])
A robust approach is to split the path into individual components and then find the longest common prefix of the component lists.
Here is an implementation which is cross-platform and can be generalized easily to more than two paths:
import os.path
import itertools
def components(path):
'''
Returns the individual components of the given file path
string (for the local operating system).
The returned components, when joined with os.path.join(), point to
the same location as the original path.
'''
components = []
# The loop guarantees that the returned components can be
# os.path.joined with the path separator and point to the same
# location:
while True:
(new_path, tail) = os.path.split(path) # Works on any platform
components.append(tail)
if new_path == path: # Root (including drive, on Windows) reached
break
path = new_path
components.append(new_path)
components.reverse() # First component first
return components
def longest_prefix(iter0, iter1):
'''
Returns the longest common prefix of the given two iterables.
'''
longest_prefix = []
for (elmt0, elmt1) in itertools.izip(iter0, iter1):
if elmt0 != elmt1:
break
longest_prefix.append(elmt0)
return longest_prefix
def common_prefix_path(path0, path1):
return os.path.join(*longest_prefix(components(path0), components(path1)))
# For Unix:
assert common_prefix_path('/', '/usr') == '/'
assert common_prefix_path('/usr/var1/log/', '/usr/var2/log/') == '/usr'
assert common_prefix_path('/usr/var/log1/', '/usr/var/log2/') == '/usr/var'
assert common_prefix_path('/usr/var/log', '/usr/var/log2') == '/usr/var'
assert common_prefix_path('/usr/var/log', '/usr/var/log') == '/usr/var/log'
# Only for Windows:
# assert common_prefix_path(r'C:\Programs\Me', r'C:\Programs') == r'C:\Programs'
I've made a small python package commonpath to find common paths from a list. Comes with a few nice options.
https://github.com/faph/Common-Path

Comparing two paths in python

Consider:
path1 = "c:/fold1/fold2"
list_of_paths = ["c:\\fold1\\fold2","c:\\temp\\temp123"]
if path1 in list_of_paths:
print "found"
I would like the if statement to return True, but it evaluates to False,
since it is a string comparison.
How to compare two paths irrespective of the forward or backward slashes they have? I'd prefer not to use the replace function to convert both strings to a common format.
Use os.path.normpath to convert c:/fold1/fold2 to c:\fold1\fold2:
>>> path1 = "c:/fold1/fold2"
>>> list_of_paths = ["c:\\fold1\\fold2","c:\\temp\\temp123"]
>>> os.path.normpath(path1)
'c:\\fold1\\fold2'
>>> os.path.normpath(path1) in list_of_paths
True
>>> os.path.normpath(path1) in (os.path.normpath(p) for p in list_of_paths)
True
os.path.normpath(path1) in map(os.path.normpath, list_of_paths) also works, but it will build a list with entire path items even though there's match in the middle. (In Python 2.x)
On Windows, you must use os.path.normcase to compare paths because on Windows, paths are not case-sensitive.
All of these answers mention os.path.normpath, but none of them mention os.path.realpath:
os.path.realpath(path)
Return the canonical path of the specified filename, eliminating any symbolic links encountered in the path (if they are supported by the operating system).
New in version 2.2.
So then:
if os.path.realpath(path1) in (os.path.realpath(p) for p in list_of_paths):
# ...
The os.path module contains several functions to normalize file paths so that equivalent paths normalize to the same string. You may want normpath, normcase, abspath, samefile, or some other tool.
If you are using python-3, you can use pathlib to achieve your goal:
import pathlib
path1 = pathlib.Path("c:/fold1/fold2")
list_of_paths = [pathlib.Path(path) for path in ["c:\\fold1\\fold2","c:\\temp\\temp123"]]
assert path1 in list_of_paths
Store the list_of_paths as a list instead of a string:
list_of_paths = [["c:","fold1","fold2"],["c","temp","temp123"]]
Then split given path by '/' or '\' (whichever is present) and then use the in keyword.
Use os.path.normpath to canonicalize the paths before comparing them. For example:
if any(os.path.normpath(path1) == os.path.normpath(p)
for p in list_of_paths):
print "found"

get absolute path from one absolute and one relative

I want to get absolute path from absolute path and relative:
absolute1 = '/a/b/c/d.js'
relative = '../../e.js'
absolute2 = getAbsoluteFromAbsoluteAndRelative(absolute1, relative)
In this example absolute2 should be equal 'a/e.js'
How to write getAbsoluteFromAbsoluteAndRelative method?
Update:
I found os.path.abspath but it takes only one argument
Your absolute path still contains a filename, so remove that with os.path.dirname() to obtain just the directory.
Then join the two and apply os.path.normpath() to the result:
os.path.normpath(os.path.join(os.path.dirname(absolute1), relative))
normpath normalizes a path with relative references in it; A/foo/../B becomes A/B, for example.
Demo:
>>> import os.path
>>> absolute1 = '/a/b/c/d.js'
>>> relative = '../../e.js'
>>> os.path.normpath(os.path.join(os.path.dirname(absolute1), relative))
'/a/e.js'
Try absolute2 = os.path.join(os.path.dirname(absolute1), relative)
Edit: Martijn beat me to it. Wrapping this in os.path.normpath is the way to go.

Categories