How/where to use os.path.sep? - python

os.path.sep is the character used by the operating system to separate pathname components.
But when os.path.sep is used in os.path.join(), why does it truncate the path?
Example:
Instead of 'home/python', os.path.join returns '/python':
>>> import os
>>> os.path.join('home', os.path.sep, 'python')
'/python'
I know that os.path.join() inserts the directory separator implicitly.
Where is os.path.sep useful? Why does it truncate the path?

Where is os.path.sep useful?
I suspect that it exists mainly because a variable like this is required in the module anyway (to avoid hardcoding), and if it's there, it might as well be documented. Its documentation says that it is "occasionally useful".
Why does it truncate the path?
From the docs for os.path.join():
If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
and / is an absolute path on *nix systems.

Drop os.path.sep from the os.path.join() call. os.path.join() uses os.path.sep internally.
On your system, os.path.sep == '/', which is interpreted as the root directory (an absolute path). Therefore os.path.join('home', '/', 'python') is equivalent to os.path.join('/', 'python'), which returns '/python'. From the docs:
If a component is an absolute path, all previous components are thrown
away and joining continues from the absolute path component.
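A small sketch of the behaviour described above. It calls posixpath and ntpath directly (the modules behind os.path on POSIX and Windows) so that the results do not depend on the machine it runs on; the variable names are only illustrative.

```python
import ntpath
import posixpath

# join() already inserts the separator between components.
plain = posixpath.join('home', 'python')

# A lone separator is an absolute path (the root), so 'home' is discarded.
truncated = posixpath.join('home', posixpath.sep, 'python')

# The Windows implementation treats its own separator the same way.
truncated_nt = ntpath.join('home', ntpath.sep, 'python')

print(plain)         # home/python
print(truncated)     # /python
print(truncated_nt)  # \python
```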

As the docstring of os.path.join correctly states:
Join two or more pathname components, inserting '/' as needed. If any component is an absolute path, all previous path components will be discarded.
The docs say the same:
os.path.join(path, *paths)
Join one or more path components intelligently. The return value is the concatenation of path and any members of *paths with exactly one directory separator (os.sep) following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty. If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
When you pass os.path.sep on its own, it is treated as an absolute path to the root directory, /.
Note that this describes the Unix/Linux os.path, which is posixpath internally, though os.path.join() behaves the same way on Windows.
Example:
>>> import os.path
>>> os.path.join.__doc__
"Join two or more pathname components, inserting '/' as needed.\n If any component is an absolute path, all previous path components\n will be discarded."

Here's the snippet of code that is run if you are on a POSIX machine:
posixpath.py
# Join pathnames.
# Ignore the previous parts if a part is absolute.
# Insert a '/' unless the first part is empty or already ends in '/'.
def join(a, *p):
    """Join two or more pathname components, inserting '/' as needed.
    If any component is an absolute path, all previous path components
    will be discarded. An empty last part will result in a path that
    ends with a separator."""
    sep = _get_sep(a)
    path = a
    try:
        if not p:
            path[:0] + sep  #23780: Ensure compatible data type even if p is null.
        for b in p:
            if b.startswith(sep):
                path = b
            elif not path or path.endswith(sep):
                path += b
            else:
                path += sep + b
    except (TypeError, AttributeError, BytesWarning):
        genericpath._check_arg_types('join', a, *p)
        raise
    return path
Specifically, the lines:
        for b in p:
            if b.startswith(sep):
                path = b
Since os.path.sep itself starts with the separator, whenever join() encounters it, it throws out the portion of path that has already been built and starts over from that element.

But when os.path.sep is used in os.path.join(), why does it truncate the path?
Quoting directly from the documentation of os.path.join
If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
So when you do:
os.path.join('home', os.path.sep, 'python')
os.path.sep is '/', which is an absolute path, so 'home' is thrown away and you get only '/python' as the output.
This is also clear from the example:
>>> import os
>>> os.path.join('home','/python','kivy')
'/python/kivy'
Where is os.path.sep useful?
os.path.sep (or os.sep) is the character used by the operating system to separate pathname components.
But again quoting from the docs:
Note that knowing this is not sufficient to be able to parse or concatenate pathnames — use os.path.split() and os.path.join() — but it is occasionally useful.
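One case where the separator itself can come in handy is a purely textual check that one path lies under another. The sketch below is an illustration, not something from the docs: is_under is a hypothetical helper, it assumes both paths are already normalised, and it takes the separator as a parameter (defaulting to os.sep) so it can be tried with either style.

```python
import os

def is_under(parent, child, sep=os.sep):
    """Return True if child lies below parent (lexical check only)."""
    # Appending the separator stops '/usr/var' from matching '/usr/var1'.
    prefix = parent.rstrip(sep) + sep
    return child.startswith(prefix)

print(is_under('/usr/var', '/usr/var/log', sep='/'))   # True
print(is_under('/usr/var', '/usr/var1/log', sep='/'))  # False
```

Without the appended separator, a plain startswith() would wrongly report /usr/var1/log as being inside /usr/var.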

Related

How to get the relative path between two absolute paths in Python using pathlib?

In Python 3, I defined two paths using pathlib, say:
from pathlib import Path
origin = Path('middle-earth/gondor/minas-tirith/castle').resolve()
destination = Path('middle-earth/gondor/osgiliath/tower').resolve()
How can I get the relative path that leads from origin to destination? In this example, I'd like a function that returns ../../osgiliath/tower or something equivalent.
Ideally, I'd have a function relative_path that always satisfies
origin.joinpath(
relative_path(origin, destination)
).resolve() == destination.resolve()
(well, ideally there would be an operator - such that destination == origin / (destination - origin) would always be true)
Note that Path.relative_to is not sufficient in this case, since origin is not a destination's parent. Also, I'm not working with symlinks, so it's safe to assume that there are none if this simplifies the problem.
How can relative_path be implemented?
This is trivially os.path.relpath
import os.path
from pathlib import Path
origin = Path('middle-earth/gondor/minas-tirith/castle').resolve()
destination = Path('middle-earth/gondor/osgiliath/tower').resolve()
# The expected string uses Windows separators; on POSIX it would be '../../osgiliath/tower'.
assert os.path.relpath(destination, start=origin) == '..\\..\\osgiliath\\tower'
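Wrapped up as the relative_path function the question asks for, this could look like the sketch below. It assumes there are no symlinks (as stated in the question), and on Windows it assumes both paths are on the same drive, since os.path.relpath raises ValueError otherwise.

```python
import os.path
from pathlib import Path

def relative_path(origin, destination):
    """Return the relative path leading from origin to destination."""
    return Path(os.path.relpath(destination, start=origin))

origin = Path('middle-earth/gondor/minas-tirith/castle').resolve()
destination = Path('middle-earth/gondor/osgiliath/tower').resolve()
rel = relative_path(origin, destination)
print(rel)  # ../../osgiliath/tower on POSIX

# Purely lexical check of the round trip (no filesystem access needed).
assert os.path.normpath(origin / rel) == str(destination)
```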
If you'd like your own Python function to convert an absolute path to a relative path:
def absolute_file_path_to_relative(start_file_path, destination_file_path):
    return (start_file_path.count("/") + start_file_path.count("\\") + 1) * (".." + ((start_file_path.find("/") > -1) and "/" or "\\")) + destination_file_path
This assumes that:
1) start_file_path starts with the same root folder as destination_file_path.
2) Types of slashes don't occur interchangeably.
3) You're not using a filesystem that permits slashes in the file name.
Those assumptions may be an advantage or disadvantage, depending on your use case.
Disadvantages: if you're using pathlib, you'll break that module's API flow in your code by mixing in this function; limited use cases; inputs have to be sterile for the filesystem you're working with.
Advantages: runs 202x faster than @AdamSmith's answer (tested on Windows 7, 32-bit)

How to join with relative paths only?

For a simple web server script, I wrote the following function that resolves the url to the file system.
import os
def resolve(url):
    url = url.lstrip('/')
    path = os.path.abspath(os.path.join(os.path.dirname(__file__), url))
    return path
Here are some example outputs for the __file__ variable being C:\projects\resolve.py.
/index.html => C:\projects\index.html
/\index.html => C:\index.html
/C:\index.html => C:\index.html
The first example is just fine. The url gets resolved to a file inside the directory of the script. However, I didn't expect the second and third examples. Since the appended path is interpreted as an absolute path, it completely ignores the directory in which the script file lies.
This is a security risk since all files on the file system can be accessed, not just those inside the subdirectory of the script. Why does Python's os.path.join allow joining with absolute paths and how can I prevent it?
os.path.join() is not suitable for unsafe input, no. It is entirely deliberate that an absolute path ignores arguments before it; this allows for supporting both absolute and relative paths in a configuration file, say, without having to test the entered path. Just use os.path.join(standard_location, config_path) and it'll do the right thing for you.
Take a look at Flask's safe_join() to handle untrusted filenames:
import posixpath
import os.path
from werkzeug.exceptions import NotFound  # NotFound comes from Werkzeug
_os_alt_seps = list(sep for sep in [os.path.sep, os.path.altsep]
                    if sep not in (None, '/'))
def safe_join(directory, filename):
    # docstring omitted for brevity
    filename = posixpath.normpath(filename)
    for sep in _os_alt_seps:
        if sep in filename:
            raise NotFound()
    if os.path.isabs(filename) or \
       filename == '..' or \
       filename.startswith('../'):
        raise NotFound()
    return os.path.join(directory, filename)
This uses posixpath (the POSIX implementation of the platform-agnostic os.path module) to normalise the URL path first; this collapses any embedded ../ or ./ path segments, making it a fully normalised relative or absolute path.
Then any alternative separators other than / are excluded; you are not allowed to use /\index.html for example. Last but not least, absolute filenames and relative filenames that climb out of the directory (.. itself, or anything starting with ../) are rejected as well.
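If pulling in Flask is not an option, a similar check can be sketched with the standard library alone. This is an alternative built on stated assumptions, not Flask's method: resolve_inside is a hypothetical name, it needs Python 3.5+ for os.path.commonpath(), and it resolves symlinks with os.path.realpath() before comparing.

```python
import os

def resolve_inside(base_dir, url_path):
    """Map url_path to a path under base_dir, or raise ValueError."""
    base = os.path.realpath(base_dir)
    # Strip leading separators so the URL is treated as relative to base.
    relative = url_path.lstrip('/\\')
    candidate = os.path.realpath(os.path.join(base, relative))
    # commonpath() compares whole components, not characters. On Windows it
    # raises ValueError itself when the two paths are on different drives.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError('path escapes base directory: %r' % url_path)
    return candidate
```

The check runs on the final resolved path rather than on the raw input, so a drive letter or a run of ../ segments only matters if the result really ends up outside base_dir.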

How to split path with slashes?

I need to build and split paths in my app, and it has to work on both Windows and Linux. Here is my code for building paths:
path = os.path.join(dir0,dir1,dir2,fn)
But I run into problems when splitting on slashes, because a path on Windows looks like:
dir0\dir1\dir2\fn
while a path on Linux looks like:
dir0/dir1/dir2/fn
How can I split the path with a single piece of platform-independent code, without changing it for each platform?
You can use os.sep:
import os
path_string.split(os.sep)
For more info, see the docs:
os.path.join(path1[, path2[, ...]])
Join one or more path components intelligently. If any component is an absolute path, all previous components (on Windows, including the previous drive letter, if there was one) are thrown away, and joining continues. The return value is the concatenation of path1, and optionally path2, etc., with exactly one directory separator (os.sep) following each non-empty part except the last. (This means that an empty last part will result in a path that ends with a separator.) Note that on Windows, since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
Use os.path.split. It is a system independent way to split paths. Note that this only splits into (head, tail). To get all the individual parts, you need to recursively split head or use str.split using os.path.sep as the separator.
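A minimal sketch of the str.split approach follows. split_all is a hypothetical helper meant for relative paths like the ones in the question (it drops empty pieces, so a leading root would be lost). As an alternative not mentioned above, pathlib (Python 3.4+) exposes the components through the parts attribute; the pure path classes are used here only so the example behaves the same on every platform.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

def split_all(path, sep=os.sep):
    """Split a relative path string into its components."""
    return [part for part in path.split(sep) if part]

print(split_all('dir0/dir1/dir2/fn', sep='/'))
# ['dir0', 'dir1', 'dir2', 'fn']

print(PurePosixPath('dir0/dir1/dir2/fn').parts)
# ('dir0', 'dir1', 'dir2', 'fn')

print(PureWindowsPath(r'dir0\dir1\dir2\fn').parts)
# ('dir0', 'dir1', 'dir2', 'fn')
```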

Find the common path prefix of a list of paths

My problem is to find the common path prefix of a given set of files.
Literally I was expecting that "os.path.commonprefix" would do just that. Unfortunately, the fact that commonprefix is located in path is rather misleading, since it actually will search for string prefixes.
The question to me is, how can this actually be solved for paths? The issue was briefly mentioned in this (fairly high rated) answer but only as a side-note and the proposed solution (appending slashes to the input of commonprefix) imho has issues, since it will fail for instance for:
os.path.commonprefix(['/usr/var1/log/', '/usr/var2/log/'])
# returns /usr/var but it should be /usr
To prevent others from falling into the same trap, it might be worthwhile to discuss this issue in a separate question: Is there a simple / portable solution for this problem that does not rely on nasty checks on the file system (i.e., access the result of commonprefix and check whether it is a directory and if not returns a os.path.dirname of the result)?
It seems that this issue has been corrected in recent versions of Python. New in version 3.5 is the function os.path.commonpath(), which returns the common path instead of the common string prefix.
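For the example from the question, the difference looks like this. posixpath is called directly so the result is the same on any platform; on a POSIX system os.path.commonpath is the same function.

```python
import posixpath

paths = ['/usr/var1/log/', '/usr/var2/log/']

# Character-by-character comparison of the strings.
string_prefix = posixpath.commonprefix(paths)

# Component-by-component comparison of the paths (Python 3.5+).
path_prefix = posixpath.commonpath(paths)

print(string_prefix)  # /usr/var
print(path_prefix)    # /usr
```

Note that commonpath() raises ValueError if the sequence is empty or mixes absolute and relative paths.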
A while ago I ran into this: os.path.commonprefix is a string prefix and not a path prefix as would be expected. So I wrote the following:
def commonprefix(l):
    # this unlike the os.path.commonprefix version
    # always returns path prefixes as it compares
    # path component wise
    cp = []
    ls = [p.split('/') for p in l]
    ml = min( len(p) for p in ls )
    for i in range(ml):
        s = set( p[i] for p in ls )
        if len(s) != 1:
            break
        cp.append(s.pop())
    return '/'.join(cp)
It could be made more portable by replacing '/' with os.path.sep.
Assuming you want the common directory path, one way is to:
1) Use only directory paths as input. If your input value is a file name, call os.path.dirname(filename) to get its directory path.
2) "Normalize" all the paths so that they are relative to the same thing and don't include double separators. The easiest way to do this is by calling os.path.abspath() to get the path relative to the root. (You might also want to use os.path.realpath() to remove symbolic links.)
3) Add a final separator (found portably with os.path.sep or os.sep) to the end of all the normalized directory paths.
4) Call os.path.dirname() on the result of os.path.commonprefix().
In code (without removing symbolic links):
import os
def common_path(directories):
    norm_paths = [os.path.abspath(p) + os.path.sep for p in directories]
    return os.path.dirname(os.path.commonprefix(norm_paths))
def common_path_of_filenames(filenames):
    return common_path([os.path.dirname(f) for f in filenames])
A robust approach is to split the path into individual components and then find the longest common prefix of the component lists.
Here is an implementation which is cross-platform and can be generalized easily to more than two paths:
import os.path
def components(path):
    '''
    Returns the individual components of the given file path
    string (for the local operating system).
    The returned components, when joined with os.path.join(), point to
    the same location as the original path.
    '''
    components = []
    # The loop guarantees that the returned components can be
    # os.path.joined with the path separator and point to the same
    # location:
    while True:
        (new_path, tail) = os.path.split(path)  # Works on any platform
        components.append(tail)
        if new_path == path:  # Root (including drive, on Windows) reached
            break
        path = new_path
    components.append(new_path)
    components.reverse()  # First component first
    return components
def longest_prefix(iter0, iter1):
    '''
    Returns the longest common prefix of the given two iterables.
    '''
    longest_prefix = []
    # zip() replaces itertools.izip, which exists only in Python 2
    for (elmt0, elmt1) in zip(iter0, iter1):
        if elmt0 != elmt1:
            break
        longest_prefix.append(elmt0)
    return longest_prefix
def common_prefix_path(path0, path1):
    return os.path.join(*longest_prefix(components(path0), components(path1)))
# For Unix:
assert common_prefix_path('/', '/usr') == '/'
assert common_prefix_path('/usr/var1/log/', '/usr/var2/log/') == '/usr'
assert common_prefix_path('/usr/var/log1/', '/usr/var/log2/') == '/usr/var'
assert common_prefix_path('/usr/var/log', '/usr/var/log2') == '/usr/var'
assert common_prefix_path('/usr/var/log', '/usr/var/log') == '/usr/var/log'
# Only for Windows:
# assert common_prefix_path(r'C:\Programs\Me', r'C:\Programs') == r'C:\Programs'
I've made a small python package commonpath to find common paths from a list. Comes with a few nice options.
https://github.com/faph/Common-Path

Correct Results From Python's os.path.join()

After reading the online documentation for the os.path.join() method, the following case seems like it should qualify but apparently it doesn't. Am I reading that documentation correctly?
>>> import os
>>> os.path.join("/home/user", "/projects/pyproject", "mycode.py")
'/projects/pyproject/mycode.py'
After trying different combinations of trailing and leading os.sep on the first and second paths, it seems that the second path to join cannot have its first character start with an os.sep.
>>> os.path.join("/home/user", "projects/pyproject", "mycode.py")
'/home/user/projects/pyproject/mycode.py'
If path1 and path2 come from, say, user input, that means writing code to check their input for that leading os.sep.
From the python.org online reference:
os.path.join(path1[, path2[, ...]]) Join one or more path components
intelligently. If any component is an absolute path, all previous
components (on Windows, including the previous drive letter, if there
was one) are thrown away, and joining continues. The return value is
the concatenation of path1, and optionally path2, etc., with exactly
one directory separator (os.sep) following each non-empty part except
the last. (This means that an empty last part will result in a path
that ends with a separator.) Note that on Windows, since there is a
current directory for each drive, os.path.join("c:", "foo") represents
a path relative to the current directory on drive C: (c:foo), not
c:\foo.
Am I reading that documentation correctly?
Try reading it again, emphasis mine:
Join one or more path components intelligently. If any component is an
absolute path, all previous components (on Windows, including the
previous drive letter, if there was one) are thrown away, and
joining continues. The return value is the concatenation of path1,
and optionally path2, etc., with exactly one directory separator
(os.sep) following each non-empty part except the last. (This means
that an empty last part will result in a path that ends with a
separator.) Note that on Windows, since there is a current directory
for each drive, os.path.join("c:", "foo") represents a path relative
to the current directory on drive C: (c:foo), not c:\foo.
When it says previous components are "thrown away", it means that they are ignored and not included in the final result.
It is just as the documentation says: if any component is absolute, the previous components are thrown away. If your path begins with /, then it is absolute. If it's not supposed to be absolute, it shouldn't start with /.
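If the later components come from user input and must always be treated as relative, one option is to strip their leading separators before joining. The sketch below is only an illustration: join_relative is a hypothetical helper, it uses posixpath so the output is the same everywhere, and it does nothing about '..' segments, so it is not a security check.

```python
import posixpath

def join_relative(base, *parts):
    """Join parts onto base, treating every part as relative."""
    # Dropping leading slashes keeps join() from discarding base.
    cleaned = [part.lstrip('/') for part in parts]
    return posixpath.join(base, *cleaned)

print(join_relative("/home/user", "/projects/pyproject", "mycode.py"))
# /home/user/projects/pyproject/mycode.py
```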
