Joining: string and absolute path with os.path - python

Why is this not working, what am I doing wrong?
>>> p1 = r'\foo\bar.txt'
>>> os.path.join('foo1', 'foo2', os.path.normpath(p1))
'\\foo\\bar.txt'
I expected this:
'foo1\\foo2\\foo\\bar.txt'
Edit:
A Solution
>>> p1 = r'\foo\bar.txt'
>>> p1 = p1.strip('\\') # Strip '\\' so the path would not be absolute
>>> os.path.join('foo1', 'foo2', os.path.normpath(p1))
'foo1\\foo2\\foo\\bar.txt'

When os.path.join encounters an absolute path, it throws away what it has accumulated so far. An absolute path is one that starts with a slash (and, on Windows, optionally with a drive letter before it). normpath won't touch that slash, as it has the same notion of absolute paths. You have to strip that slash.
And if I may ask: where does it come from in the first place?
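To see the discarding behaviour without depending on the machine you run it on, here is a minimal sketch that calls the ntpath and posixpath modules directly (they are the Windows and POSIX implementations behind os.path). The variable names are just for illustration.

```python
import ntpath      # Windows implementation of os.path, importable on any platform
import posixpath   # POSIX implementation of os.path

p1 = r'\foo\bar.txt'

# A rooted component makes join drop everything accumulated before it.
discarded = ntpath.join('foo1', 'foo2', p1)

# Stripping the leading separator keeps the component relative, so it is appended.
kept = ntpath.join('foo1', 'foo2', p1.lstrip('\\'))

# The POSIX flavour behaves the same way with a leading '/'.
posix_discarded = posixpath.join('foo1', 'foo2', '/foo/bar.txt')

print(discarded)        # \foo\bar.txt
print(kept)             # foo1\foo2\foo\bar.txt
print(posix_discarded)  # /foo/bar.txt
```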

p1 is an absolute path (starts with \) - thus it is returned by itself, per the documentation:
join(a, *p)
Join two or more pathname components, inserting "\" as needed.
If any component is an absolute path, all previous path components
will be discarded.

If you want os.path.join to append one absolute path to another, strip the leading separator from the second path:
import os
p1 = os.path.join(os.sep, 'foo1', 'foo2')
p2 = os.path.join(os.sep, 'foo', 'bar.txt')
os.path.join(p1, p2.lstrip(os.sep))
If you want to modify the paths, you can also do cool things like this using list comprehensions:
# Make sure all folder names are lowercase:
os.path.join(p1, *[x.lower() for x in p2.split(os.sep)])

Related

What's the best way to add a trailing slash to a pathlib directory?

I have a directory I'd like to print out with a trailing slash: my_path = pathlib.Path('abc/def')
Is there a nicer way of doing this than os.path.join(str(my_path), '')?
No, you didn't miss anything. By design, pathlib strips trailing slashes, and provides no way to display paths with trailing slashes. This has annoyed several people, as mentioned in the bug tracker: pathlib strips trailing slash.
A compact way to add slashes in Python 3.6+ is to use an f-string, e.g. f'{some_path}/', or f'{some_path}{os.sep}' if you want to be OS-agnostic.
from pathlib import Path
import os
some_path = '/etc'
p = Path(some_path)
print(f'{p}/')
print(f'{p}{os.sep}')
output
/etc/
/etc/
Another option is to add a dummy component and slice it off the resulting string:
print(str(p/'#')[:-1])
To add a trailing slash of the path's flavour using just pathlib you could do:
>>> from pathlib import Path
>>> my_path = Path("abc/def")
>>> str(my_path / "_")[:-1] # add a dummy "_" component, then strip it
'abc/def/'
Looking into the source, there's also a Path._flavour.sep attribute:
>>> str(my_path) + my_path._flavour.sep
'abc/def/'
But it doesn't seem to have any documented accessor yet, and since _flavour is a private attribute it may change between Python versions.
You could also use:
os.path.normpath(str(my_path)) + os.sep
I would say it is down to preference rather than being "nicer"
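For what it's worth, the os.path.join(str(my_path), '') idiom from the question can be wrapped in a tiny helper. This is only a sketch: with_trailing_slash is a made-up name, and posixpath.join is used instead of os.path.join so the result is the same on every platform (os.path.join does the same thing with the local separator).

```python
import posixpath
from pathlib import PurePosixPath


def with_trailing_slash(path):
    """Return str(path) ending in exactly one '/' (hypothetical helper)."""
    # Joining with an empty last component appends a separator
    # only when the path does not already end with one.
    return posixpath.join(str(path), '')


print(with_trailing_slash(PurePosixPath('abc/def')))  # abc/def/
print(with_trailing_slash('abc/def/'))                # abc/def/
```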

How/where to use os.path.sep?

os.path.sep is the character used by the operating system to separate pathname components.
But when os.path.sep is used in os.path.join(), why does it truncate the path?
Example:
Instead of 'home/python', os.path.join returns '/python':
>>> import os
>>> os.path.join('home', os.path.sep, 'python')
'/python'
I know that os.path.join() inserts the directory separator implicitly.
Where is os.path.sep useful? Why does it truncate the path?
Where is os.path.sep useful?
I suspect that it exists mainly because a variable like this is required in the module anyway (to avoid hardcoding), and if it's there, it might as well be documented. Its documentation says that it is "occasionally useful".
Why does it truncate the path?
From the docs for os.path.join():
If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
and / is an absolute path on *nix systems.
Drop os.path.sep from the os.path.join() call. os.path.join() uses os.path.sep internally.
On your system, os.path.sep == '/', which is interpreted as the root directory (an absolute path), and therefore os.path.join('home', '/', 'python') is equivalent to os.path.join('/', 'python') == '/python'. From the docs:
If a component is an absolute path, all previous components are thrown
away and joining continues from the absolute path component.
As correctly given in the docstring of os.path.join -
Join two or more pathname components, inserting '/' as needed. If any component is an absolute path, all previous path components will be discarded.
Same is given in the docs as well -
os.path.join(path, *paths)
Join one or more path components intelligently. The return value is the concatenation of path and any members of *paths with exactly one directory separator (os.sep) following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty. If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
When you give os.path.sep alone, it is considered as an absolute path to the root directory - / .
Please note, this is for the Unix/Linux-based os.path, which internally is posixpath, though the same behavior is seen in the Windows os.path.join().
Example -
>>> import os.path
>>> os.path.join.__doc__
"Join two or more pathname components, inserting '/' as needed.\n If any component is an absolute path, all previous path components\n will be discarded."
Here's the snippet of code that is run if you are on a POSIX machine:
posixpath.py
# Join pathnames.
# Ignore the previous parts if a part is absolute.
# Insert a '/' unless the first part is empty or already ends in '/'.
def join(a, *p):
    """Join two or more pathname components, inserting '/' as needed.
    If any component is an absolute path, all previous path components
    will be discarded. An empty last part will result in a path that
    ends with a separator."""
    sep = _get_sep(a)
    path = a
    try:
        if not p:
            path[:0] + sep  #23780: Ensure compatible data type even if p is null.
        for b in p:
            if b.startswith(sep):
                path = b
            elif not path or path.endswith(sep):
                path += b
            else:
                path += sep + b
    except (TypeError, AttributeError, BytesWarning):
        genericpath._check_arg_types('join', a, *p)
        raise
    return path
Specifically, the lines:
            if b.startswith(sep):
                path = b
And, since os.path.sep definitely starts with this character, whenever we encounter it we throw out the portion of the variable path that has already been constructed and start over with the next element in p.
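If it helps to watch the reset happen, here is a small sketch that joins one component at a time and records the accumulated path after each step. trace_join is a hypothetical helper, not part of the standard library, and it uses posixpath so the output does not depend on your OS.

```python
import posixpath


def trace_join(first, *rest):
    """Return the accumulated path after each component (illustrative helper)."""
    path = first
    steps = [path]
    for part in rest:
        # Joining incrementally mirrors the loop over p in posixpath.join.
        path = posixpath.join(path, part)
        steps.append(path)
    return steps


print(trace_join('home', '/', 'python'))  # ['home', '/', '/python']
print(trace_join('home', 'python'))       # ['home', 'home/python']
```

The second call shows the fix: leave the separator out and let join insert it.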
But when os.path.sep is used in os.path.join(), why does it truncate the path?
Quoting directly from the documentation of os.path.join
If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
So when you do:
os.path.join('home', os.path.sep, 'python')
os.path.sep is '/', which is an absolute path, so 'home' is thrown away and you get only '/python' as the output.
This is also clear from the example:
>>> import os
>>> os.path.join('home','/python','kivy')
'/python/kivy'
Where is os.path.sep useful?
os.path.sep (or os.sep) is the character used by the operating system to separate pathname components.
But again quoting from the docs:
Note that knowing this is not sufficient to be able to parse or concatenate pathnames — use os.path.split() and os.path.join() — but it is occasionally useful.

replace part of path - python

Is there a quick way to replace part of the path in python?
for example:
old_path='/abc/dfg/ghi/f.txt'
I don't know the beginning of the path (/abc/dfg/), so what I'd really like is to tell Python to keep everything from /ghi/ onward (inclusive) and replace everything before /ghi/ with /jkl/mno/:
>>> new_path
'/jkl/mno/ghi/f.txt'
If you're using Python 3.4+, or willing to install the backport, consider using pathlib instead of os.path:
import pathlib
path = pathlib.Path(old_path)
index = path.parts.index('ghi')  # position of the first 'ghi' component
new_path = pathlib.Path('/jkl/mno').joinpath(*path.parts[index:])
If you just want to stick with the 2.7 or 3.3 stdlib, there's no direct way to do this, but you can get the equivalent of parts by looping over os.path.split. For example, keeping each path component until you find the first ghi, and then tacking on the new prefix, will replace everything before the last ghi (if you want to replace everything before the first ghi, it's not hard to change things):
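import os
path = old_path
new_path = ''
while True:
    path, base = os.path.split(path)
    # On the first pass new_path is empty; joining with '' would add a trailing separator
    new_path = os.path.join(base, new_path) if new_path else base
    if base == 'ghi':
        break
    if not base:
        # Reached the root (or an empty path) without finding 'ghi'
        raise ValueError("'ghi' not found in path")
new_path = os.path.join('/jkl/mno', new_path)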
This is a bit clumsy, so you might want to consider writing a simple function that gives you a list or tuple of the path components, so you can just use index, then join it all back together, as with the pathlib version.
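A rough sketch of such a function, assuming POSIX-style paths (path_parts is a made-up name; swap posixpath for os.path to follow the local platform):

```python
import posixpath


def path_parts(path):
    """Split a POSIX-style path into a list of components (hypothetical helper)."""
    parts = []
    while True:
        head, tail = posixpath.split(path)
        if tail:
            parts.append(tail)
        if head == path:
            # split() made no progress: we are at the root or at an empty string.
            if head:
                parts.append(head)
            break
        path = head
    parts.reverse()
    return parts


parts = path_parts('/abc/dfg/ghi/f.txt')
index = parts.index('ghi')                       # first 'ghi' component
new_path = posixpath.join('/jkl/mno', *parts[index:])
print(parts)     # ['/', 'abc', 'dfg', 'ghi', 'f.txt']
print(new_path)  # /jkl/mno/ghi/f.txt
```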
>>> import os.path
>>> old_path='/abc/dfg/ghi/f.txt'
First grab the relative path from the starting directory of your choice using os.path.relpath
>>> rel = os.path.relpath(old_path, '/abc/dfg/')
>>> rel
'ghi\\f.txt'
Then add the new first part of the path to this relative path using os.path.join
>>> new_path = os.path.join(r'jkl\mno', rel)  # raw string, so \m is not treated as an escape
>>> new_path
'jkl\\mno\\ghi\\f.txt'
You can use the index of ghi:
old_path.replace(old_path[:old_path.index("ghi")],"/jkl/mno/")
In [4]: old_path.replace(old_path[:old_path.index("ghi")],"/jkl/mno/" )
Out[4]: '/jkl/mno/ghi/f.txt'
A rather naive approach, but does the job:
Function:
def replace_path(path, frm, to):
    pre, match, post = path.rpartition(frm)
    return ''.join((to if match else pre, match, post))
Example:
>>> s = '/abc/dfg/ghi/f.txt'
>>> replace_path(s, '/ghi/', '/jkl/mno')
'/jkl/mno/ghi/f.txt'
>>> replace_path(s, '/whatever/', '/jkl/mno')
'/abc/dfg/ghi/f.txt'
The following is useful when you want to replace some known base directory in your path.
from pathlib import Path
old_path = Path('/abc/dfg/ghi/f.txt')
old_root = Path('/abc/dfg')
new_root = Path('/jkl/mno')
new_path = new_root / old_path.relative_to(old_root)
# Result: /jkl/mno/ghi/f.txt
I understand that the OP specifically mentioned that the path to the base directory is not known. However, since it is a common task to remove the path to the base directory, and the title of the question ("replace part of the path") is certainly bringing some folks with this subtype of problem here, I am posting it anyway.
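One thing to watch with relative_to: it raises ValueError when the path is not under the given root. A minimal sketch of a guarded version follows; rebase is a hypothetical name, and PurePosixPath is used so it behaves the same on any OS.

```python
from pathlib import PurePosixPath


def rebase(path, old_root, new_root):
    """Move path from old_root to new_root; return it unchanged if it is elsewhere."""
    path = PurePosixPath(path)
    try:
        relative = path.relative_to(old_root)
    except ValueError:
        # path does not live under old_root, so there is nothing to replace
        return path
    return PurePosixPath(new_root) / relative


print(rebase('/abc/dfg/ghi/f.txt', '/abc/dfg', '/jkl/mno'))  # /jkl/mno/ghi/f.txt
print(rebase('/other/f.txt', '/abc/dfg', '/jkl/mno'))        # /other/f.txt
```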
I needed to replace an arbitrary number of occurrences of an arbitrary string in a path,
e.g. replace 'package' with 'foo' in
VERSION_FILE = Path(f'{Path.home()}', 'projects', 'package', 'package', '_version.py')
So I use this call
_replace_path_text(VERSION_FILE, 'package', 'foo')
def _replace_path_text(path, text, replacement):
    parts = list(path.parts)
    # Replace the text inside every component, then rebuild the path
    new_parts = [part.replace(text, replacement) for part in parts]
    return Path(*new_parts)

Comparing two paths in python

Consider:
path1 = "c:/fold1/fold2"
list_of_paths = ["c:\\fold1\\fold2","c:\\temp\\temp123"]
if path1 in list_of_paths:
    print("found")
I would like the if statement to return True, but it evaluates to False,
since it is a string comparison.
How to compare two paths irrespective of the forward or backward slashes they have? I'd prefer not to use the replace function to convert both strings to a common format.
Use os.path.normpath to convert c:/fold1/fold2 to c:\fold1\fold2:
>>> path1 = "c:/fold1/fold2"
>>> list_of_paths = ["c:\\fold1\\fold2","c:\\temp\\temp123"]
>>> os.path.normpath(path1)
'c:\\fold1\\fold2'
>>> os.path.normpath(path1) in list_of_paths
True
>>> os.path.normpath(path1) in (os.path.normpath(p) for p in list_of_paths)
True
os.path.normpath(path1) in map(os.path.normpath, list_of_paths) also works, but in Python 2.x map builds the whole list of normalized paths first, even if a match occurs early in the list.
On Windows, you must use os.path.normcase to compare paths because on Windows, paths are not case-sensitive.
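A small sketch of combining the two, using ntpath directly so it gives the same answer even when run on a non-Windows machine. same_windows_path is a made-up helper name.

```python
import ntpath  # Windows implementation of os.path, importable on any platform


def same_windows_path(a, b):
    """Compare two Windows-style paths, ignoring slash direction and case."""
    def canon(p):
        # normpath unifies separators and drops trailing ones; normcase lowercases
        return ntpath.normcase(ntpath.normpath(p))
    return canon(a) == canon(b)


print(same_windows_path("c:/fold1/fold2", "C:\\Fold1\\FOLD2\\"))  # True
print(same_windows_path("c:/fold1/fold2", "c:\\temp\\temp123"))   # False
```

Note this is a purely textual comparison; it does not look at the filesystem, so symbolic links are not resolved.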
All of these answers mention os.path.normpath, but none of them mention os.path.realpath:
os.path.realpath(path)
Return the canonical path of the specified filename, eliminating any symbolic links encountered in the path (if they are supported by the operating system).
New in version 2.2.
So then:
if os.path.realpath(path1) in (os.path.realpath(p) for p in list_of_paths):
    # ...
The os.path module contains several functions to normalize file paths so that equivalent paths normalize to the same string. You may want normpath, normcase, abspath, samefile, or some other tool.
If you are using Python 3, you can use pathlib to achieve your goal (this relies on running on Windows, where both spellings parse to the same path):
import pathlib
path1 = pathlib.Path("c:/fold1/fold2")
list_of_paths = [pathlib.Path(path) for path in ["c:\\fold1\\fold2","c:\\temp\\temp123"]]
assert path1 in list_of_paths
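If the code may also run on Linux or macOS, a variant that I believe works everywhere is to use PureWindowsPath explicitly, since it parses Windows syntax on any platform and compares case-insensitively. A minimal sketch (the mixed-case entry is added just to show the case folding):

```python
from pathlib import PureWindowsPath

path1 = PureWindowsPath("c:/fold1/fold2")
list_of_paths = [
    PureWindowsPath(p) for p in ["C:\\Fold1\\fold2", "c:\\temp\\temp123"]
]

# Membership uses ==, which ignores slash direction and letter case here.
found = path1 in list_of_paths
print(found)  # True
```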
Store each entry of list_of_paths as a list of components instead of a string:
list_of_paths = [["c:","fold1","fold2"],["c:","temp","temp123"]]
Then split the given path on '/' or '\' (whichever is present) and use the in keyword.
Use os.path.normpath to canonicalize the paths before comparing them. For example:
if any(os.path.normpath(path1) == os.path.normpath(p)
       for p in list_of_paths):
    print("found")

get absolute path from one absolute and one relative

I want to get an absolute path from an absolute path and a relative one:
absolute1 = '/a/b/c/d.js'
relative = '../../e.js'
absolute2 = getAbsoluteFromAbsoluteAndRelative(absolute1, relative)
In this example absolute2 should be equal to '/a/e.js'
How to write getAbsoluteFromAbsoluteAndRelative method?
Update:
I found os.path.abspath but it takes only one argument
Your absolute path still contains a filename, so remove that with os.path.dirname() to obtain just the directory.
Then join the two and apply os.path.normpath() to the result:
os.path.normpath(os.path.join(os.path.dirname(absolute1), relative))
normpath normalizes a path with relative references in it; A/foo/../B becomes A/B, for example.
Demo:
>>> import os.path
>>> absolute1 = '/a/b/c/d.js'
>>> relative = '../../e.js'
>>> os.path.normpath(os.path.join(os.path.dirname(absolute1), relative))
'/a/e.js'
Try absolute2 = os.path.join(os.path.dirname(absolute1), relative)
Edit: Martijn beat me to it. Wrapping this in os.path.normpath is the way to go.
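Putting the pieces together as the function the question asks for, here is a minimal sketch. The name resolve_relative is made up, and posixpath is used so the example gives the same result on every platform; os.path works the same way with the local conventions.

```python
import posixpath


def resolve_relative(base_file, relative):
    """Resolve `relative` against the directory that contains `base_file`."""
    base_dir = posixpath.dirname(base_file)      # drop the file name
    joined = posixpath.join(base_dir, relative)  # an absolute `relative` wins here
    return posixpath.normpath(joined)            # collapse the '..' segments


print(resolve_relative('/a/b/c/d.js', '../../e.js'))  # /a/e.js
print(resolve_relative('/a/b/c/d.js', '/x/y.js'))     # /x/y.js
```

Keep in mind that normpath works purely on the string, so symbolic links in the path are not taken into account.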
