Having trouble converting paths in Pathlib - python

I have to take in a file path that looks like this:
'C:/Users/xxx/Desktop/test_folder'
It gets stored into a variable as a string so:
path_intake = 'C:/Users/xxx/Desktop/test_folder'
I want to assign that path to my Path object:
p = Path(path_intake)
But when p takes in path_intake, it changes the path to:
'C:\Users\xxx\Desktop\test_folder'
Which is not what I want, since .rglob can only read the path like this:
p = Path('C:/Users/xxx/Desktop/test_folder')
How do I obtain this path from the first path?

The value
C:/Users/xxx/Desktop/test_folder
is not a canonical Windows path string: Windows uses backslashes as its path separator. So if you supply /, pathlib on Windows turns the path into the canonical path string for your platform, which is
C:\Users\xxx\Desktop\test_folder
But on Windows the two Path objects are identical, as you will quickly see if you do this:
>>> import pathlib
>>> p = pathlib.Path(r"C:\Users\xxx\Desktop\test_folder")
>>> p2 = pathlib.Path(r"C:/Users/xxx/Desktop/test_folder")
>>> p == p2
True
You are not correct when you say that ".rglob can only read a path like this: C:/Users/xxx/Desktop/test_folder". To demonstrate that, do this:
>>> list(p.rglob("*.txt")) == list(p2.rglob("*.txt"))
True
The Path objects are identical and you can call .rglob() on either one and get the expected result.
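If you want to check this without a Windows machine, pathlib.PureWindowsPath applies the same Windows parsing rules on any OS. A minimal sketch, assuming all you need is the forward-slash string (for display or for another tool); as_posix() returns exactly that without changing the path itself:

```python
from pathlib import PureWindowsPath

path_intake = 'C:/Users/xxx/Desktop/test_folder'

p = PureWindowsPath(path_intake)                          # parsed with Windows rules
p2 = PureWindowsPath(r'C:\Users\xxx\Desktop\test_folder')

print(p == p2)       # True: both spellings are the same path
print(str(p))        # C:\Users\xxx\Desktop\test_folder
print(p.as_posix())  # C:/Users/xxx/Desktop/test_folder
```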

Related

How to iterate over files in specific directories?

I'd like to iterate over files in two folders in a directory only, and ignore any other files/directories.
e.g. the paths "dirA/subdirA/folder1" and "dirA/subdirA/folder2".
I tried passing both to pathlib as:
root_dir_A = "dirA/subdirA/folder1"
root_dir_B = "dirA/subdirA/folder2"
for file in Path(root_dir_A, root_dir_B).glob('**/*.json'):
    json_data = open(file, encoding="utf8")
    ...
But it only iterates over the 2nd path in Path(root_dir_A,root_dir_B).
You can't pass two separate directories to Path(). You'll need to loop over them:
for dirpath in (root_dir_A, root_dir_B):
    for file in Path(dirpath).glob('**/*.json'):
        ...
According to the documentation, Path("foo", "bar") produces "foo/bar", but if a segment is absolute, all previous segments are discarded, so with absolute paths only the second one survives. Either way, it doesn't do what you hoped it would.
Please check the output of Path(root_dir_A, root_dir_B) to see whether it returns what you want.
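To see what Path(root_dir_A, root_dir_B) actually builds, here is a small sketch using PurePosixPath (no filesystem access, POSIX separators on every OS). The second call uses hypothetical absolute paths to show the "absolute segment wins" rule:

```python
from pathlib import PurePosixPath

root_dir_A = "dirA/subdirA/folder1"
root_dir_B = "dirA/subdirA/folder2"

# Relative segments are simply joined into one (probably nonexistent) nested path.
joined = PurePosixPath(root_dir_A, root_dir_B)
print(joined)  # dirA/subdirA/folder1/dirA/subdirA/folder2

# An absolute segment discards everything before it.
absolute = PurePosixPath("/data/folder1", "/data/folder2")
print(absolute)  # /data/folder2
```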
In your specific case this should work (the ** keeps the search recursive, as in your original '**/*.json'):
path_root = Path('dirA')
for path in path_root.glob('subdirA/folder[12]/**/*.json'):
    ...
If your paths aren't homogeneous enough, you might have to chain generators, i.e.:
from itertools import chain
content_dir_A = Path(root_dir_A).glob('**/*.json')
content_dir_B = Path(root_dir_B).glob('**/*.json')
content_all = chain(content_dir_A, content_dir_B)
for path in content_all:
    ...
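A self-contained sketch that builds a throwaway tree in a temporary directory (the folder names mirror the question; folder3 is a made-up decoy that should be skipped) and checks that the explicit loop, the single glob pattern and the chain approach all find the same files:

```python
import tempfile
from itertools import chain
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Build dirA/subdirA/{folder1,folder2,folder3} with JSON files,
    # including a nested one under folder2 to exercise the recursive '**'.
    for rel in ("folder1/a.json", "folder2/b.json",
                "folder2/nested/c.json", "folder3/skip.json"):
        f = base / "dirA" / "subdirA" / rel
        f.parent.mkdir(parents=True, exist_ok=True)
        f.write_text("{}", encoding="utf8")

    root_dir_A = base / "dirA" / "subdirA" / "folder1"
    root_dir_B = base / "dirA" / "subdirA" / "folder2"

    # 1. Explicit loop over the two directories.
    looped = sorted(f.name for d in (root_dir_A, root_dir_B)
                    for f in Path(d).glob("**/*.json"))

    # 2. One glob pattern; [12] matches folder1 and folder2 only.
    pattern = sorted(f.name for f in
                     (base / "dirA").glob("subdirA/folder[12]/**/*.json"))

    # 3. Chained generators.
    chained = sorted(f.name for f in chain(root_dir_A.glob("**/*.json"),
                                           root_dir_B.glob("**/*.json")))

print(looped)                        # ['a.json', 'b.json', 'c.json']
print(looped == pattern == chained)  # True
```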

String '\' replace with '/'

a='C:/Users/me/Documents/PythonProjects/opencv/Train\11\00011_00014_00018.png'
I am running a for loop over string variables such as a.
I intend to obtain the number 11 from the string above.
Using a.replace('\\', '/'), I get the exact same string back, that is, 'C:/Users/me/Documents/PythonProjects/opencv/Train\11\00011_00014_00018.png'
The only way I got it to work was with r'C:/Users/me/Documents/PythonProjects/opencv/Train\11\00011_00014_00018.png'.replace('\\', '/'), but that does not work with variables, i.e.
r'a'.replace('\\', '/')
It's not like f-strings, where I can interpolate variables as f'{a}'.
I would instead recommend using os.path if your intention is to clean up or mutate filesystem paths:
>>> import os
>>> a='C:/Users/me/Documents/PythonProjects/opencv/Train\11\00011_00014_00018.png'
>>> os.path.normpath(a)
'C:\\Users\\me\\Documents\\PythonProjects\\opencv\\Train\t\x0011_00014_00018.png'
Using os.path for path manipulation will generally behave correctly on different operating systems without you having to manually modify slashes, drive names, etc.
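What is actually going on in the question's literal: in a normal (non-raw) string, \11 and \000 are octal escape sequences, so a contains a tab and a NUL character instead of backslashes. That is why replace('\\', '/') had nothing to replace, and why the normpath output above still shows \t\x00. Paths returned at runtime (for example by glob) contain real backslashes, so the problem only shows up in hand-typed literals. A small sketch (the shortened literal is just for illustration):

```python
from pathlib import PureWindowsPath

# Non-raw literal: '\11' is octal 9 (a tab) and '\000' is a NUL character.
typed = 'Train\11\00011_00014_00018.png'
print(repr(typed))    # 'Train\t\x0011_00014_00018.png'
print('\\' in typed)  # False: no backslashes left to replace

# Raw literal (or a string read from the filesystem): real backslashes.
raw = r'C:/Users/me/Documents/PythonProjects/opencv/Train\11\00011_00014_00018.png'
print(raw.replace('\\', '/'))
# C:/Users/me/Documents/PythonProjects/opencv/Train/11/00011_00014_00018.png

# Windows path rules accept both separators; the class folder is the parent's name.
print(PureWindowsPath(raw).parent.name)  # 11
```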
Thanks, it worked! I changed:
root_dir = 'C:/Users/me/Documents/PythonProjects/opencv/Train'
all_img_paths = glob.glob(os.path.join(root_dir, '**.png'))
for img_path in all_img_paths:
    try:
        img = preprocess_img(io.imread(img_path))
        label = get_class(img_path)
to:
all_img_paths = glob.glob(os.path.join(os.path.normpath(root_dir), '**.png'))
np.random.shuffle(all_img_paths)

How do I get the parent directory's name only, not full path?

I am trying to get the parent directory's name only. Meaning, only its last component, not the full path.
So for example for the path a/b/c/d/e I want to get d, and not a/b/c/d.
My current code:
import os
path = "C:/example/folder/file1.jpg"
directoryName = os.path.dirname(os.path.normpath(path))
print(directoryName)
This prints out C:/example/folder and I want to get just folder.
The simplest way to do this would be using pathlib. Using parent will get you the parent's full path, and name will give you just the last component:
>>> from pathlib import Path
>>> path = Path("/a/b/c/d/e")
>>> path.parent.name
'd'
For comparison, to do the same with os.path, you will need to get the basename of the dirname of your path. So that translates directly to:
import os
path = "C:/example/folder/file1.jpg"
print(os.path.basename(os.path.dirname(path)))
Which is the nicer version of:
os.path.split(os.path.split(path)[0])[1]
Both give:
'folder'
As you can see, the pathlib approach is much clearer and more readable. Because pathlib takes an object-oriented approach, representing paths as objects instead of strings, we get a clear chain of attribute/method calls.
path.parent.name
Is read in order as:
start from path -> take its parent -> take its name
Whereas in the os functions-accepting-strings approach you actually need to read from inside-out!
os.path.basename(os.path.dirname(path))
Is read in order as:
The name of the parent of the path
Which I'm sure you'll agree is much harder to read and understand (and this is just a simple-case example).
You could also use the str.split method together with os.sep (shown here on Windows, where os.sep is '\\'):
>>> import os
>>> path = "C:\\example\\folder\\file1.jpg"
>>> path.split(os.sep)[-2]
'folder'
But as the docs state:
Note that knowing this [(the separator)] is not sufficient to be able to parse or
concatenate pathnames — use os.path.split() and os.path.join() — but
it is occasionally useful.
Use pathlib.Path to get the .name of the .parent:
from pathlib import Path
p = Path("C:/example/folder/file1.jpg")
print(p.parent.name) # folder
Compared to os.path, pathlib represents paths as a separate type instead of strings. It generally is shorter and more convenient to use.
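One caveat: Path follows the rules of the OS you run on, so on Linux or macOS a backslash is an ordinary character, not a separator. If your strings may come from Windows, a minimal sketch is to parse them with PureWindowsPath, which accepts both / and \ on any OS:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows rules: both separators are recognized.
print(PureWindowsPath("C:/example/folder/file1.jpg").parent.name)   # folder
print(PureWindowsPath(r"C:\example\folder\file1.jpg").parent.name)  # folder

# POSIX rules: the whole string is one file name, so the parent is '.'
# and its name is the empty string.
print(repr(PurePosixPath(r"C:\example\folder\file1.jpg").parent.name))  # ''
```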
This works:
import os
path = "C:/example/folder/file1.jpg"
directoryName = os.path.dirname(path)
parent = directoryName.split("/")
parent.reverse()
print(parent[0])
This is simple to solve using pathlib:
0. Import Path from pathlib
from pathlib import Path
path = "C:/example/folder/file1.jpg"
1. Get parent level 1
parent_lv1 = Path(path).parent
2. Get parent level 2
parent_lv2 = parent_lv1.parent
3. Get immediate parent
imm_parent = parent_lv1.relative_to(parent_lv2)
print(imm_parent)
I prefer regex:
import re

def get_parent(path: str) -> str:
    # The greedy leading .* makes this capture the second-to-last component;
    # [\\/] matches either a backslash or a forward slash.
    match = re.search(r".*[\\/](\w+)[\\/].*", path)
    if match:
        return match.group(1)
    else:
        return ""

if __name__ == '__main__':
    my_path = "/home/tony/some/cool/path"
    print(get_parent(my_path))
    win_path = r"C:\windows\path\has\dumb\backslashes"
    print(get_parent(win_path))
Output
cool
dumb

replace part of path - python

Is there a quick way to replace part of the path in python?
for example:
old_path='/abc/dfg/ghi/f.txt'
I don't know the beginning of the path (/abc/dfg/), so what I'd really like is to tell Python to keep everything that comes after /ghi/ (inclusive) and replace everything before /ghi/ with /jkl/mno/:
>>> new_path
'/jkl/mno/ghi/f.txt'
If you're using Python 3.4+, or willing to install the backport, consider using pathlib instead of os.path:
import pathlib
path = pathlib.Path(old_path)
index = path.parts.index('ghi')
new_path = pathlib.Path('/jkl/mno').joinpath(*path.parts[index:])
If you just want to stick with the 2.7 or 3.3 stdlib, there's no direct way to do this, but you can get the equivalent of parts by looping over os.path.split. For example, keeping each path component until you find the first ghi, and then tacking on the new prefix, will replace everything before the last ghi (if you want to replace everything before the first ghi, it's not hard to change things):
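import os
path = old_path
parts = []
while True:
    path, base = os.path.split(path)
    if not base:  # reached the root (or an empty path) without finding 'ghi'
        raise ValueError("'ghi' not found in " + old_path)
    parts.insert(0, base)
    if base == 'ghi':
        break
new_path = os.path.join('/jkl/mno', *parts)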
This is a bit clumsy, so you might want to consider writing a simple function that gives you a list or tuple of the path components, so you can just use index, then join it all back together, as with the pathlib version.
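The "simple function" suggested above might look like this. A minimal sketch: split_all and replace_prefix are hypothetical names, and posixpath is used so the result is the same on every OS (swap in os.path for native behaviour):

```python
import posixpath

def split_all(path):
    """Return every component of path, e.g. '/abc/f.txt' -> ['/', 'abc', 'f.txt']."""
    parts = []
    while True:
        head, tail = posixpath.split(path)
        if head == path:      # reached the root '/' or an empty string
            if head:
                parts.insert(0, head)
            break
        if tail:              # skip the empty tail produced by a trailing '/'
            parts.insert(0, tail)
        path = head
    return parts

def replace_prefix(path, marker, new_prefix):
    """Replace everything before the first component equal to marker."""
    parts = split_all(path)
    index = parts.index(marker)  # raises ValueError if marker is absent
    return posixpath.join(new_prefix, *parts[index:])

print(split_all('/abc/dfg/ghi/f.txt'))  # ['/', 'abc', 'dfg', 'ghi', 'f.txt']
print(replace_prefix('/abc/dfg/ghi/f.txt', 'ghi', '/jkl/mno'))  # /jkl/mno/ghi/f.txt
```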
>>> import os.path
>>> old_path='/abc/dfg/ghi/f.txt'
First grab the relative path from the starting directory of your choice using os.path.relpath (the outputs below are from Windows):
>>> rel = os.path.relpath(old_path, '/abc/dfg/')
>>> rel
'ghi\\f.txt'
Then prepend the new first part of the path to this relative path using os.path.join:
>>> new_path = os.path.join(r'jkl\mno', rel)
>>> new_path
'jkl\\mno\\ghi\\f.txt'
You can use the index of ghi:
old_path.replace(old_path[:old_path.index("ghi")],"/jkl/mno/")
In [4]: old_path.replace(old_path[:old_path.index("ghi")],"/jkl/mno/" )
Out[4]: '/jkl/mno/ghi/f.txt'
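One caveat with str.replace here: it replaces every occurrence of the prefix, not just the leading one (and index("ghi") also matches inside names such as aghi). Slicing from the index avoids the first problem. A sketch with a made-up path in which the prefix happens to repeat:

```python
old_path = '/x/ghi/x/f.txt'  # hypothetical path where '/x/' appears twice
i = old_path.index('ghi')

via_replace = old_path.replace(old_path[:i], '/jkl/mno/')
via_slice = '/jkl/mno/' + old_path[i:]

print(via_replace)  # /jkl/mno/ghi/jkl/mno/f.txt  (the second '/x/' was replaced too)
print(via_slice)    # /jkl/mno/ghi/x/f.txt
```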
A rather naive approach, but it does the job:
Function:
def replace_path(path, frm, to):
    pre, match, post = path.rpartition(frm)
    return ''.join((to if match else pre, match, post))
Example:
>>> s = '/abc/dfg/ghi/f.txt'
>>> replace_path(s, '/ghi/', '/jkl/mno')
'/jkl/mno/ghi/f.txt'
>>> replace_path(s, '/whatever/', '/jkl/mno')
'/abc/dfg/ghi/f.txt'
The following is useful when you want to replace some known base directory in your path.
from pathlib import Path
old_path = Path('/abc/dfg/ghi/f.txt')
old_root = Path('/abc/dfg')
new_root = Path('/jkl/mno')
new_path = new_root / old_path.relative_to(old_root)
# Result: /jkl/mno/ghi/f.txt
I understand that the OP specifically mentioned that the path to the base directory is not known. However, since it is a common task to remove the path to the base directory, and the title of the question ("replace part of the path") is certainly bringing some folks with this subtype of problem here, I am posting it anyway.
I needed to replace an arbitrary string, an arbitrary number of times, in a path,
e.g. replace 'package' with 'foo' in
VERSION_FILE = Path(f'{Path.home()}', 'projects', 'package', 'package', '_version.py')
So I use this call:
_replace_path_text(VERSION_FILE, 'package', 'foo')
with
from pathlib import Path

def _replace_path_text(path, text, replacement):
    parts = list(path.parts)
    # str.replace also rewrites substrings, e.g. 'mypackage' -> 'myfoo'
    new_parts = [part.replace(text, replacement) for part in parts]
    return Path(*new_parts)

Comparing two paths in python

Consider:
path1 = "c:/fold1/fold2"
list_of_paths = ["c:\\fold1\\fold2","c:\\temp\\temp123"]
if path1 in list_of_paths:
    print("found")
I would like the if statement to evaluate to True, but it evaluates to False,
since it is a string comparison.
How to compare two paths irrespective of the forward or backward slashes they have? I'd prefer not to use the replace function to convert both strings to a common format.
Use os.path.normpath to convert c:/fold1/fold2 to c:\fold1\fold2 (on Windows; on POSIX systems normpath does not convert separators):
>>> path1 = "c:/fold1/fold2"
>>> list_of_paths = ["c:\\fold1\\fold2","c:\\temp\\temp123"]
>>> os.path.normpath(path1)
'c:\\fold1\\fold2'
>>> os.path.normpath(path1) in list_of_paths
True
>>> os.path.normpath(path1) in (os.path.normpath(p) for p in list_of_paths)
True
os.path.normpath(path1) in map(os.path.normpath, list_of_paths) also works, but in Python 2.x map builds the whole list of normalized paths even when a match occurs early; in Python 3, map is lazy, so the in test stops at the first match.
On Windows, you should also apply os.path.normcase before comparing, because Windows paths are case-insensitive.
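A minimal sketch combining both normalizations. It imports ntpath (the Windows implementation behind os.path) directly so it behaves the same on any OS; same_windows_path is a hypothetical helper name:

```python
import ntpath  # the Windows implementation of os.path, importable on any OS

def same_windows_path(a, b):
    """Compare two Windows path strings, ignoring separator style and letter case."""
    return ntpath.normcase(ntpath.normpath(a)) == ntpath.normcase(ntpath.normpath(b))

path1 = "c:/fold1/fold2"
list_of_paths = ["C:\\Fold1\\fold2", "c:\\temp\\temp123"]
print(any(same_windows_path(path1, p) for p in list_of_paths))  # True
```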
All of these answers mention os.path.normpath, but none of them mention os.path.realpath:
os.path.realpath(path)
Return the canonical path of the specified filename, eliminating any symbolic links encountered in the path (if they are supported by the operating system).
New in version 2.2.
So then:
if os.path.realpath(path1) in (os.path.realpath(p) for p in list_of_paths):
    print("found")
The os.path module contains several functions to normalize file paths so that equivalent paths normalize to the same string. You may want normpath, normcase, abspath, samefile, or some other tool.
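Note that os.path.samefile works differently from the string-based functions: it compares what the two paths point to on disk (via os.stat), so both paths must exist. A sketch using a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "f.txt")
    with open(target, "w") as fh:
        fh.write("data")

    # A different spelling of the same file.
    other_spelling = os.path.join(tmp, ".", "f.txt")
    same = os.path.samefile(target, other_spelling)

    # samefile raises if either path does not exist.
    try:
        os.path.samefile(target, os.path.join(tmp, "missing.txt"))
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

print(same)            # True
print(missing_raises)  # True
```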
If you are using Python 3 on Windows, you can use pathlib to achieve your goal:
import pathlib
path1 = pathlib.Path("c:/fold1/fold2")
list_of_paths = [pathlib.Path(path) for path in ["c:\\fold1\\fold2","c:\\temp\\temp123"]]
assert path1 in list_of_paths
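On Linux or macOS the assertion above fails, because Path there treats backslashes as ordinary characters. A sketch that gives the Windows behaviour on any OS is to use PureWindowsPath, whose equality test ignores both separator style and letter case:

```python
from pathlib import PureWindowsPath

path1 = PureWindowsPath("c:/fold1/fold2")
list_of_paths = [PureWindowsPath(p) for p in ["C:\\Fold1\\fold2", "c:\\temp\\temp123"]]

print(path1 in list_of_paths)  # True: separators and case are both ignored
```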
Store list_of_paths as a list of component lists instead of strings:
list_of_paths = [["c:","fold1","fold2"],["c:","temp","temp123"]]
Then split the given path on '/' or '\' (whichever is present) and use the in keyword.
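A minimal sketch of that split using re.split, which handles either separator in one call. It does not normalize case or resolve components such as '..':

```python
import re

path1 = "c:/fold1/fold2"
list_of_paths = [["c:", "fold1", "fold2"], ["c:", "temp", "temp123"]]

# The character class [\\/] matches either a backslash or a forward slash.
components = re.split(r"[\\/]", path1)
print(components)                   # ['c:', 'fold1', 'fold2']
print(components in list_of_paths)  # True
```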
Use os.path.normpath to canonicalize the paths before comparing them. For example:
if any(os.path.normpath(path1) == os.path.normpath(p)
       for p in list_of_paths):
    print("found")
