How can I replace a substring in a Python pathlib.Path? - python

Is there an easy way to replace a substring within a pathlib.Path object in Python? The pathlib module is nicer in many ways than storing a path as a str and using os.path, glob.glob, etc., whose functionality is built into pathlib. But I often use files that follow a pattern, and often replace substrings in a path to access other files:
data/demo_img.png
data/demo_img_proc.png
data/demo_spreadsheet.csv
Previously I could do:
img_file_path = "data/demo_img.png"
proc_img_file_path = img_file_path.replace("_img.png", "_img_proc.png")
data_file_path = img_file_path.replace("_img.png", "_spreadsheet.csv")
pathlib can replace the file extension with the with_suffix() method, but it only accepts a file extension (such as .png) as a valid suffix. The workarounds are:
import pathlib
import os
img_file_path = pathlib.Path("data/demo_img.png")
proc_img_file_path = pathlib.Path(str(img_file_path).replace("_img.png", "_img_proc.png"))
# os.fspath() (Python 3.6+) also returns the str form, but raises TypeError for objects that are not path-like, unlike str()
data_file_path = pathlib.Path(os.fspath(img_file_path).replace("_img.png", "_spreadsheet.csv"))
Converting to a string to do the replacement and reconverting to a Path object seems laborious. Assume that I never have a copy of the string form of img_file_path, and have to convert the type as needed.

You are correct. To replace old with new in Path p, you need:
p = Path(str(p).replace(old, new))
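If you do this often, the one-liner can be wrapped in a small helper. The name replace_in_path is hypothetical; using type(p) to rebuild the result keeps it the same Path class as the input. PurePosixPath is used here only so the example behaves the same on any OS.

```python
from pathlib import PurePath, PurePosixPath


def replace_in_path(p: PurePath, old: str, new: str) -> PurePath:
    """Replace every occurrence of `old` in the string form of `p` (hypothetical helper).

    type(p) rebuilds the result with the same class as the input,
    e.g. PosixPath, WindowsPath or PurePosixPath.
    """
    return type(p)(str(p).replace(old, new))


img = PurePosixPath("data/demo_img.png")
print(replace_in_path(img, "_img.png", "_img_proc.png"))     # data/demo_img_proc.png
print(replace_in_path(img, "_img.png", "_spreadsheet.csv"))  # data/demo_spreadsheet.csv
```

Note that this replaces the substring anywhere in the path, including directory names.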
EDIT
We turn Path p into str so we get this str method:
Help on method_descriptor:
replace(self, old, new, count=-1, /)
Return a copy with all occurrences of substring old replaced by new.
Otherwise we'd get this Path method:
Help on function replace in module pathlib:
replace(self, target)
Rename this path to the given path, clobbering the existing destination if it exists, and return a new Path instance pointing to the given path.

I have recently faced a similar problem and found this thread when searching for a solution. In contrast to the accepted answer I did not convert the pathlib.Path object into a string. Instead, I used its parent and name attributes (name is a string itself), along with the joinpath() method. Here is the code:
In [2]: from pathlib import Path
In [3]: img_file_path = Path('data/demo_img.png')
In [4]: parent, name = img_file_path.parent, img_file_path.name
In [5]: proc_fn = name.replace('_img.png', '_img_proc.png')
...: data_fn = name.replace('_img.png', '_spreadsheet.csv')
In [6]: proc_img_file_path = parent.joinpath(proc_fn)
...: data_img_file_path = parent.joinpath(data_fn)
In [7]: proc_img_file_path
Out[7]: WindowsPath('data/demo_img_proc.png')
In [8]: data_img_file_path
Out[8]: WindowsPath('data/demo_spreadsheet.csv')
An advantage of this approach is that it avoids the risk of making unwanted replacements in the parent directory part of the path.

Use PurePath.with_name() to replace the whole file name, or PurePath.with_stem() (Python 3.9+) to replace only the name without its suffix.
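A short sketch applying both methods to the file names from the question; PurePosixPath is used only so the output is the same on every OS. Because the substitution is applied to .name or .stem only, the parent directory is never touched.

```python
from pathlib import PurePosixPath

img_file_path = PurePosixPath("data/demo_img.png")

# with_name() swaps the whole final component and keeps the parent directory
data_file_path = img_file_path.with_name("demo_spreadsheet.csv")

# with_stem() (Python 3.9+) swaps only the stem and keeps the ".png" suffix
proc_img_file_path = img_file_path.with_stem(img_file_path.stem + "_proc")

# str.replace on .name limits the substring substitution to the file name
data_file_path2 = img_file_path.with_name(
    img_file_path.name.replace("_img.png", "_spreadsheet.csv")
)

print(data_file_path)      # data/demo_spreadsheet.csv
print(proc_img_file_path)  # data/demo_img_proc.png
```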

Related

How do I get the parent directory's name only, not full path?

I am trying to get the parent directory's name only. Meaning, only its last component, not the full path.
So for example for the path a/b/c/d/e I want to get d, and not a/b/c/d.
My current code:
import os
path = "C:/example/folder/file1.jpg"
directoryName = os.path.dirname(os.path.normpath(path))
print(directoryName)
This prints out C:/example/folder and I want to get just folder.
The simplest way to do this would be using pathlib. Using parent will get you the parent's full path, and name will give you just the last component:
>>> from pathlib import Path
>>> path = Path("/a/b/c/d/e")
>>> path.parent.name
'd'
For comparison, to do the same with os.path, you will need to get the basename of the dirname of your path. So that translates directly to:
import os
path = "C:/example/folder/file1.jpg"
print(os.path.basename(os.path.dirname(path)))
Which is the nicer version of:
os.path.split(os.path.split(path)[0])[1]
Where both give:
'folder'
As you can see, the pathlib approach is much clearer and more readable. Because pathlib represents paths as objects instead of strings, we get a clear chain of attribute accesses and method calls.
path.parent.name
Is read in order as:
start from path -> take its parent -> take its name
Whereas in the os functions-accepting-strings approach you actually need to read from inside-out!
os.path.basename(os.path.dirname(path))
Is read in order as:
The name of the parent of the path
Which I'm sure you'll agree is much harder to read and understand (and this is just a simple-case example).
On Windows, where os.sep is '\\', you could also use the str.split method together with os.sep:
>>> path = "C:\\example\\folder\\file1.jpg"
>>> path.split(os.sep)[-2]
'folder'
But as the docs state:
Note that knowing this [the separator] is not sufficient to be able to parse or
concatenate pathnames — use os.path.split() and os.path.join() — but
it is occasionally useful.
Use pathlib.Path to get the .name of the .parent:
from pathlib import Path
p = Path("C:/example/folder/file1.jpg")
print(p.parent.name) # folder
Compared to os.path, pathlib represents paths as a separate type instead of strings. It is generally shorter and more convenient to use.
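The pure path classes let you see how .parent.name behaves for both path styles on any machine. This sketch assumes the example paths from the question; PureWindowsPath accepts both / and \ as separators, so either spelling of the Windows path gives the same answer.

```python
from pathlib import PurePosixPath, PureWindowsPath

# PureWindowsPath accepts both "/" and "\" as separators, on any OS
win_fwd = PureWindowsPath("C:/example/folder/file1.jpg")
win_back = PureWindowsPath(r"C:\example\folder\file1.jpg")
posix = PurePosixPath("/a/b/c/d/e")

print(win_fwd.parent.name)    # folder
print(win_back.parent.name)   # folder
print(posix.parent.name)      # d

# parents[1] is the grandparent; .name again gives only its last component
print(posix.parents[1].name)  # c
```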
This works for paths that use forward slashes:
import os
path = "C:/example/folder/file1.jpg"
directoryName = os.path.dirname(path)
parent = directoryName.split("/")
parent.reverse()
print(parent[0])
Simple to solve using pathlib
0. Import Path from pathlib
from pathlib import Path
path = "C:/example/folder/file1.jpg"
1. Get parent level 1
parent_lv1 = Path(path).parent
2. Get parent level 2
parent_lv2 = parent_lv1.parent
3. Get immediate parent
imm_parent = parent_lv1.relative_to(parent_lv2)
print(imm_parent)
I prefer regex
import re

def get_parent(path: str) -> str:
    # The greedy ".*" makes the group match the LAST "<sep>name<sep>" run.
    # [\\/] matches either separator; \w+ only matches letters, digits and "_".
    match = re.search(r".*[\\/](\w+)[\\/].*", path)
    if match:
        return match.group(1)
    else:
        return ""

if __name__ == '__main__':
    my_path = "/home/tony/some/cool/path"
    print(get_parent(my_path))
    win_path = r"C:\windows\path\has\dumb\backslashes"
    print(get_parent(win_path))
Output
cool
dumb

Python pathlib.Path - how do I get just a platform independent file separator as a string?

I am creating a format string, different for each class, that a generic class method uses to generate a filename. I'm using the Python 3.4+ pathlib.Path module for object-oriented file I/O.
In building this string, the path separator is missing, and rather than just putting the Windows separator in, I want to add a platform-independent file separator.
I searched the pathlib docs, and answers here about this, but all the examples assume I'm building a Path object, not a string. The pathlib functions will add the correct separator in any string outputs, but those are actual paths, so that won't work here.
Besides something hacky like writing a string and parsing it to figure out what the separator is, is there a way to directly get the current, correct file separator string?
Prefer an answer using pathlib.Path, rather than os or shutil packages.
Here's what the code looks like:
In the constructor:
self.outputdir = Path('drive:\\dir\\file')
self.indiv_fileformatstr = str(self.outputdir) + '{}_new.m'
In the final method used:
indiv_filename = Path(self.indiv_fileformatstr.format(topic))
This leaves out the file separator between the directory and the file name.
There is nothing public in the pathlib module providing the character used by the operating system to separate pathname components. If you really need it, import os and use os.sep.
But you likely don't need it in the first place - it's missing the point of pathlib if you convert to strings in order to join a filename. In typical usage, the separator string itself isn't used for concatenating path components because pathlib overrides division operators (__truediv__ and __rtruediv__) for this purpose. Similarly, it's not needed for splitting due to methods such as Path.parts.
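A minimal sketch of both points: os.sep when you really need the character, and the / operator plus .parts when you just want to join or split. The directory C:/dir is an assumed stand-in for the question's 'drive:\\dir'; the pure path classes are used so the output does not depend on the OS running the code.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

# os.sep is the separator for the current OS ("/" on POSIX, "\\" on Windows)
print(repr(os.sep))

# Joining: the "/" operator inserts the right separator for each path flavour
outputdir = PureWindowsPath("C:/dir")
print(outputdir / "{}_new.m")                  # C:\dir\{}_new.m
print(PurePosixPath("/dir") / "{}_new.m")      # /dir/{}_new.m

# Splitting: .parts returns the components without any separators
print(PureWindowsPath(r"C:\dir\file_new.m").parts)  # ('C:\\', 'dir', 'file_new.m')
```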
Instead of:
self.indiv_fileformatstr = str(self.outputdir) + '{}_new.m'
You would usually do something like:
self.indiv_fileformatpath = self.outputdir / '{}_new.m'
self.indiv_fileformatstr = str(self.indiv_fileformatpath)
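pathlib.os.sep also gives the separator for the current platform, but only because pathlib imports os internally; that attribute is an undocumented implementation detail, so importing os and using os.sep directly is safer.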
Solution using wim's answer
Based on wim's answer, the following works great:
Save the format string in the Path object
When needing to substitute into the templated filename in the future, just use str(path_object) to get the string back out.
import pathlib
# Start with the following, with self.outputdir as a pathlib.Path object
outputdir = pathlib.Path('c:\\myfolder')
file_template_path = outputdir / '{}_new.m'
# Then make the final file object later (e.g. in a child class)
base_filename_string = 'myfile'
new_file = pathlib.Path(str(file_template_path).format(base_filename_string))
This creates (on Windows):
pathlib.Path("c:\\myfolder\\myfile_new.m")
Creating the template with prefix/postfix/etc.
If you need to apply other variables, you can use 2 levels of formatting to apply specialized prefixes/postfixes, etc., then store the final template in a Path object, as shown above.
When creating two levels of formatting, use double braces wherever the first-level formatter should emit a literal single brace instead of interpreting a placeholder; e.g. {{basename}} becomes just {basename} without any substitution.
prefix = 'new_'
postfix = '_1'
ext = 'txt'
file_template_path = outputdir / f'{prefix}{{}}{postfix}.{ext}'
which becomes a path object with the following string:
>>> str(file_template_path)
'c:\\myfolder\\new_{}_1.txt'
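A variation on the same idea, offered as a design choice rather than the answer's method: keep only the file name as the template and join it onto the directory with / after formatting, so braces in the directory name never pass through str.format(). The helper make_output_path is hypothetical, and PureWindowsPath is used so the result is the same on any OS.

```python
from pathlib import PureWindowsPath

outputdir = PureWindowsPath(r"c:\myfolder")

# Two-level formatting: {{}} survives the f-string as a literal {} placeholder
prefix, postfix, ext = "new_", "_1", "txt"
name_template = f"{prefix}{{}}{postfix}.{ext}"  # 'new_{}_1.txt'


def make_output_path(basename: str) -> PureWindowsPath:
    """Fill the file-name template, then join it onto the directory (hypothetical helper)."""
    return outputdir / name_template.format(basename)


print(make_output_path("myfile"))  # c:\myfolder\new_myfile_1.txt
```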

How to make a new Path object from parts of a current path with pathlib?

I would like to change a part of a Path object with pathlib.
For example if you have a Path object:
import pathlib
path = pathlib.Path("/home/user/to/some/folder/toto.out")
How can I change the file name and get a new path such as "/home/user/to/some/folder/other_file.dat"?
Or, more generally, how can I change one or several elements of that path?
I can get the parts of the path:
In [1]: path.parts
Out[1]: ('/', 'home', 'user', 'to', 'some', 'folder', 'toto.out')
Thus, a workaround is to join the needed parts, make a new string and then a new path, but I wonder if there is a more convenient tool to do that.
EDIT
To be more precise, is there an equivalent of path.name that returns the complementary part of the path, i.e. str(path).replace(path.name, "")?
To sum up the comments:
In order to change the file name
In [1]: import pathlib
In [2]: path = pathlib.Path("/home/user/to/some/folder/toto.out")
In [3]: path.parent / "other_file.dat"
Out[3]: PosixPath('/home/user/to/some/folder/other_file.dat')
In order to change one part in the path
In [4]: parts = list(path.parts)
In [5]: parts[4] = "other"
In [6]: pathlib.Path(*parts)
Out[6]: PosixPath('/home/user/to/other/folder/toto.out')
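The same operations as a self-contained sketch: path.parent is the complement of path.name asked about in the edit, with_name() is a shortcut for parent / new_name, and replace_part is a hypothetical helper wrapping the parts-list trick (it also accepts negative indices).

```python
from pathlib import PurePosixPath

path = PurePosixPath("/home/user/to/some/folder/toto.out")

# path.parent is everything except the last component
print(path.parent)                        # /home/user/to/some/folder
print(path.with_name("other_file.dat"))   # /home/user/to/some/folder/other_file.dat


def replace_part(p: PurePosixPath, index: int, new: str) -> PurePosixPath:
    """Return a copy of p with the component at `index` replaced (hypothetical helper)."""
    parts = list(p.parts)
    parts[index] = new
    return type(p)(*parts)


print(replace_part(path, 4, "other"))     # /home/user/to/other/folder/toto.out
print(replace_part(path, -1, "toto.dat")) # /home/user/to/some/folder/toto.dat
```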
You can try using str.format to have variable filenames
Ex:
import pathlib
filename = "other_file.dat" #Variable.
path = pathlib.Path("/home/user/to/some/folder/{0}".format(filename))

replace part of path - python

Is there a quick way to replace part of the path in python?
for example:
old_path='/abc/dfg/ghi/f.txt'
I don't know the beginning of the path (/abc/dfg/), so what I'd really like is to tell Python to keep everything from /ghi/ onwards (inclusive) and replace everything before /ghi/ with /jkl/mno/:
>>> new_path
'/jkl/mno/ghi/f.txt'
If you're using Python 3.4+, or willing to install the backport, consider using pathlib instead of os.path:
import pathlib
path = pathlib.Path(old_path)
index = path.parts.index('ghi')
new_path = pathlib.Path('/jkl/mno').joinpath(*path.parts[index:])
If you just want to stick with the 2.7 or 3.3 stdlib, there's no direct way to do this, but you can get the equivalent of parts by looping over os.path.split. For example, walking backwards from the end and keeping each path component until you reach a ghi, then tacking on the new prefix, replaces everything before the last ghi (if you want to replace everything before the first ghi, it's not hard to change things):
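import os.path

path = old_path
new_path = ''
while True:
    path, base = os.path.split(path)
    if not base:
        # Reached the root (or an empty path) without finding 'ghi'
        raise ValueError("'ghi' not found in path")
    # Prepend this component; on the first pass there is nothing to join yet,
    # which avoids a trailing separator such as 'f.txt/'
    new_path = os.path.join(base, new_path) if new_path else base
    if base == 'ghi':
        break
new_path = os.path.join('/jkl/mno', new_path)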
This is a bit clumsy, so you might want to consider writing a simple function that gives you a list or tuple of the path components, so you can just use index, then join it all back together, as with the pathlib version.
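The helper function suggested above can be sketched with nothing but os.path.split. Both split_all and replace_prefix are hypothetical names; replace_prefix keeps everything from the first matching component onwards, like the pathlib version.

```python
import os.path


def split_all(path):
    """Split `path` into all of its components using only os.path.split."""
    parts = []
    while True:
        head, tail = os.path.split(path)
        if tail:
            parts.append(tail)
        if head == path:  # no progress: we are at the root ('/') or at ''
            if head:
                parts.append(head)
            break
        path = head
    return parts[::-1]


def replace_prefix(path, marker, new_prefix):
    """Keep everything from the first `marker` component on and prepend new_prefix."""
    parts = split_all(path)
    index = parts.index(marker)  # raises ValueError if marker is absent
    return os.path.join(new_prefix, *parts[index:])


old_path = '/abc/dfg/ghi/f.txt'
print(split_all(old_path))                          # ['/', 'abc', 'dfg', 'ghi', 'f.txt']
print(replace_prefix(old_path, 'ghi', '/jkl/mno'))  # /jkl/mno/ghi/f.txt on POSIX
```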
>>> import os.path
>>> old_path='/abc/dfg/ghi/f.txt'
First grab the relative path from the starting directory of your choice using os.path.relpath
>>> rel = os.path.relpath(old_path, '/abc/dfg/')
>>> rel
'ghi\\f.txt'
Then add the new first part of the path to this relative path using os.path.join (this session was run on Windows, hence the backslashes):
>>> new_path = os.path.join('\\jkl\\mno', rel)
>>> new_path
'\\jkl\\mno\\ghi\\f.txt'
You can use the index of ghi:
old_path.replace(old_path[:old_path.index("ghi")],"/jkl/mno/")
In [4]: old_path.replace(old_path[:old_path.index("ghi")],"/jkl/mno/" )
Out[4]: '/jkl/mno/ghi/f.txt'
A rather naive approach, but it does the job:
Function:
def replace_path(path, frm, to):
    pre, match, post = path.rpartition(frm)
    return ''.join((to if match else pre, match, post))
Example:
>>> s = '/abc/dfg/ghi/f.txt'
>>> replace_path(s, '/ghi/', '/jkl/mno')
'/jkl/mno/ghi/f.txt'
>>> replace_path(s, '/whatever/', '/jkl/mno')
'/abc/dfg/ghi/f.txt'
The following is useful when you want to replace some known base directory in your path.
from pathlib import Path
old_path = Path('/abc/dfg/ghi/f.txt')
old_root = Path('/abc/dfg')
new_root = Path('/jkl/mno')
new_path = new_root / old_path.relative_to(old_root)
# Result: /jkl/mno/ghi/f.txt
I understand that the OP specifically mentioned that the path to the base directory is not known. However, since it is a common task to remove the path to the base directory, and the title of the question ("replace part of the path") is certainly bringing some folks with this subtype of problem here, I am posting it anyway.
I needed to replace any number of occurrences of an arbitrary string in a path,
e.g. replace 'package' with 'foo' in
VERSION_FILE = Path(f'{Path.home()}', 'projects', 'package', 'package', '_version.py')
So I use this call:
_replace_path_text(VERSION_FILE, 'package', 'foo')
defined as:
from pathlib import Path

def _replace_path_text(path, text, replacement):
    # Substring replace inside every component (so 'mypackage' would change too)
    parts = list(path.parts)
    new_parts = [part.replace(text, replacement) for part in parts]
    return Path(*new_parts)
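If only whole directory or file names should change, compare each component for equality instead of calling str.replace on it. This is a sketch with a hypothetical helper name, and the home directory /home/me is an assumed example value.

```python
from pathlib import PurePosixPath


def replace_path_component(path, old, new):
    """Replace only components that are exactly `old` (hypothetical helper).

    Unlike a substring replace, names such as 'mypackage' are left untouched.
    """
    return type(path)(*(new if part == old else part for part in path.parts))


version_file = PurePosixPath("/home/me/projects/package/package/_version.py")
print(replace_path_component(version_file, "package", "foo"))
# /home/me/projects/foo/foo/_version.py
```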

Comparing two paths in python

Consider:
path1 = "c:/fold1/fold2"
list_of_paths = ["c:\\fold1\\fold2","c:\\temp\\temp123"]
if path1 in list_of_paths:
    print("found")
I would like the membership test to be True, but it evaluates to False, since it is a plain string comparison.
How to compare two paths irrespective of the forward or backward slashes they have? I'd prefer not to use the replace function to convert both strings to a common format.
Use os.path.normpath to convert c:/fold1/fold2 to c:\fold1\fold2:
>>> path1 = "c:/fold1/fold2"
>>> list_of_paths = ["c:\\fold1\\fold2","c:\\temp\\temp123"]
>>> os.path.normpath(path1)
'c:\\fold1\\fold2'
>>> os.path.normpath(path1) in list_of_paths
True
>>> os.path.normpath(path1) in (os.path.normpath(p) for p in list_of_paths)
True
os.path.normpath(path1) in map(os.path.normpath, list_of_paths) also works, but in Python 2.x map builds the whole list of normalized paths even when a match occurs early, whereas the generator expression stops at the first match.
On Windows, also apply os.path.normcase before comparing, because Windows paths are case-insensitive.
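A sketch combining both normalizations. It imports ntpath, the module that backs os.path on Windows, so the Windows rules can be exercised on any OS; on Windows itself you would normally call os.path directly. The helper name same_windows_path is hypothetical, and the comparison is purely lexical.

```python
import ntpath  # the os.path implementation used on Windows; importable on any OS


def same_windows_path(a: str, b: str) -> bool:
    """Lexically compare two Windows-style paths (hypothetical helper).

    normpath unifies separators and collapses '.' and '..';
    normcase lowercases the path and turns '/' into a backslash.
    The file system and symbolic links are NOT consulted.
    """
    return ntpath.normcase(ntpath.normpath(a)) == ntpath.normcase(ntpath.normpath(b))


path1 = "C:/Fold1/fold2"
list_of_paths = ["c:\\fold1\\fold2", "c:\\temp\\temp123"]
print(any(same_windows_path(path1, p) for p in list_of_paths))  # True
```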
All of these answers mention os.path.normpath, but none of them mention os.path.realpath:
os.path.realpath(path)
Return the canonical path of the specified filename, eliminating any symbolic links encountered in the path (if they are supported by the operating system).
New in version 2.2.
So then:
if os.path.realpath(path1) in (os.path.realpath(p) for p in list_of_paths):
# ...
The os.path module contains several functions to normalize file paths so that equivalent paths normalize to the same string. You may want normpath, normcase, abspath, samefile, or some other tool.
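Of the tools listed, os.path.samefile() is the only one that asks the operating system whether two paths point to the same file, so both paths must exist. This sketch builds a throwaway directory with tempfile purely for illustration; the folder name fold2 is borrowed from the question.

```python
import os
import tempfile

# os.path.samefile() compares the files the paths refer to, not the strings,
# so it sees through '..' and symbolic links, but both paths must exist.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "fold2")
    os.mkdir(target)
    # A differently spelled path to the same directory
    other = os.path.join(tmp, "fold2", "..", "fold2")
    same_string = other == target
    same_file = os.path.samefile(other, target)

print(same_string)  # False: plain string comparison
print(same_file)    # True
```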
If you are using Python 3, you can use pathlib to achieve your goal (on Windows, where WindowsPath treats / and \ as equivalent):
import pathlib
path1 = pathlib.Path("c:/fold1/fold2")
list_of_paths = [pathlib.Path(path) for path in ["c:\\fold1\\fold2","c:\\temp\\temp123"]]
assert path1 in list_of_paths
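The assertion above holds on Windows because Path becomes WindowsPath there. PureWindowsPath applies the same rules on any OS, which makes the difference from POSIX paths easy to see; this sketch reuses the question's example strings.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows flavour: "/" and "\" are equivalent and comparison ignores case
path1 = PureWindowsPath("c:/fold1/fold2")
list_of_paths = [PureWindowsPath(p) for p in ["c:\\fold1\\fold2", "C:\\Temp\\temp123"]]
print(path1 in list_of_paths)                      # True
print(PureWindowsPath("C:/FOLD1/fold2") == path1)  # True

# POSIX flavour: "\" is an ordinary character, so the same strings do not match
print(PurePosixPath("c:/fold1/fold2") == PurePosixPath("c:\\fold1\\fold2"))  # False
```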
Store list_of_paths as lists of path components instead of strings:
list_of_paths = [["c:","fold1","fold2"],["c:","temp","temp123"]]
Then split given path by '/' or '\' (whichever is present) and then use the in keyword.
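One way to do the splitting step is re.split with a character class that matches either separator. The helper name split_path is hypothetical; empty pieces (from doubled or trailing separators) are dropped, and the comparison stays case-sensitive.

```python
import re

list_of_paths = [["c:", "fold1", "fold2"], ["c:", "temp", "temp123"]]


def split_path(path: str) -> list:
    """Split on "/" or "\\" and drop empty pieces (hypothetical helper)."""
    return [part for part in re.split(r"[\\/]", path) if part]


path1 = "c:/fold1/fold2"
print(split_path(path1) in list_of_paths)                 # True
print(split_path("c:\\fold1\\fold2\\") in list_of_paths)  # True
```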
Use os.path.normpath to canonicalize the paths before comparing them. For example:
if any(os.path.normpath(path1) == os.path.normpath(p)
       for p in list_of_paths):
    print("found")
