Correct Results From Python's os.path.join() - python

After reading the online documentation for the os.path.join() method, the following case seems like it should qualify but apparently it doesn't. Am I reading that documentation correctly?
>>> import os
>>>
>>> os.path.join("/home/user", "/projects/pyproject", "mycode.py")
'/projects/pyproject/mycode.py'
After trying different combinations of trailing and leading os.sep on the first and second paths, it seems that the second path to join cannot start with an os.sep.
>>> os.path.join("/home/user", "projects/pyproject", "mycode.py")
'/home/user/projects/pyproject/mycode.py'
When path1 and path2 come from, say, user input, this means writing code to check that input for a leading os.sep.
From the python.org online reference:
os.path.join(path1[, path2[, ...]]) Join one or more path components
intelligently. If any component is an absolute path, all previous
components (on Windows, including the previous drive letter, if there
was one) are thrown away, and joining continues. The return value is
the concatenation of path1, and optionally path2, etc., with exactly
one directory separator (os.sep) following each non-empty part except
the last. (This means that an empty last part will result in a path
that ends with a separator.) Note that on Windows, since there is a
current directory for each drive, os.path.join("c:", "foo") represents
a path relative to the current directory on drive C: (c:foo), not
c:\foo.

Am I reading that documentation correctly?
Try reading it again, emphasis mine:
Join one or more path components intelligently. If any component is an
absolute path, all previous components (on Windows, including the
previous drive letter, if there was one) are thrown away, and
joining continues. The return value is the concatenation of path1,
and optionally path2, etc., with exactly one directory separator
(os.sep) following each non-empty part except the last. (This means
that an empty last part will result in a path that ends with a
separator.) Note that on Windows, since there is a current directory
for each drive, os.path.join("c:", "foo") represents a path relative
to the current directory on drive C: (c:foo), not c:\foo.
When it says previous components are "thrown away", it means that they are ignored and not included in the final result.

It is just as the documentation says: if any component is absolute, the previous components are thrown away. If your path begins with /, then it is absolute. If it's not supposed to be absolute, it shouldn't start with /.
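For the user-input case raised in the question, one option is to strip leading separators from every component after the first before joining. This is only a minimal sketch: the helper name join_relative is hypothetical, and posixpath is used explicitly (it is what os.path resolves to on POSIX systems) so the result is the same on any platform.

```python
import posixpath

def join_relative(base, *parts):
    """Join parts onto base, treating every part as relative to base.

    Hypothetical helper: leading separators are stripped so that a part
    such as "/projects/pyproject" cannot discard the components before it.
    """
    cleaned = [part.lstrip(posixpath.sep) for part in parts]
    return posixpath.join(base, *cleaned)

# Plain join: the absolute second component throws away "/home/user"
print(posixpath.join("/home/user", "/projects/pyproject", "mycode.py"))
# /projects/pyproject/mycode.py

# With the leading separator stripped, the base is kept
print(join_relative("/home/user", "/projects/pyproject", "mycode.py"))
# /home/user/projects/pyproject/mycode.py
```

Note that this does not guard against ".." components in the input; it only addresses the leading-separator case discussed above.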

Related

Relative path in Python

I'm writing some Python code to generate a relative path. The situations that need to be considered:
Under the same folder: I want "." or ".\"; both of them are OK for me.
Other folder: I want something like ".\xxx\" and "..\xxx\xxx\".
os.path.relpath() will generate the relative path, but without .\ at the beginning and \ at the end. We can add \ at the end by using os.path.join(dirname, ""). But I can't figure out how to add ".\" at the beginning without affecting the first case, when they are under the same folder, and the "..\xxx\xxx\" case.
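This gives you a path relative to the directory of the current script:
import os
# Directory that contains the running script
script_dir = os.path.dirname(__file__)
filename = os.path.join(script_dir, 'Path')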
The relpath() function will produce the ".." syntax given the appropriate base to start from (the second parameter). For instance, suppose you are writing something like a script generator that produces code using relative paths. If the working directory is passed as the second parameter to relpath(), as shown below, and you want to reference another file in your project under a directory one level up and two deep, you'll get "../blah/blah". In the case where you want to prefix paths in the same folder, you can simply do a join with ".". That will produce a path with the correct OS-specific separator.
>>> print(os.path.relpath("/foo/bar/blah/blah", "/foo/bar/baz"))
../blah/blah
>>> print(os.path.join('.', 'blah'))
./blah
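Putting the pieces from the question and the answers together, a possible sketch is shown here. The function name dotted_relpath is hypothetical, and posixpath is used so the output is predictable on any platform (swap in os.path to get the native separator, e.g. ".\" on Windows).

```python
import posixpath

def dotted_relpath(target, start):
    """Relative path from start to target, with a leading "./" or "../"
    and a trailing separator (hypothetical helper)."""
    rel = posixpath.relpath(target, start)
    goes_up = rel == posixpath.pardir or rel.startswith(posixpath.pardir + posixpath.sep)
    # Only prefix "./" when the result is neither "." itself nor a "../..." path
    if rel != posixpath.curdir and not goes_up:
        rel = posixpath.join(posixpath.curdir, rel)
    # Joining with an empty last part appends exactly one trailing separator
    return posixpath.join(rel, "")

print(dotted_relpath("/foo/bar", "/foo/bar"))            # ./
print(dotted_relpath("/foo/bar/baz", "/foo/bar"))        # ./baz/
print(dotted_relpath("/foo/bar/blah/blah", "/foo/bar/baz"))  # ../blah/blah/
```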

How/where to use os.path.sep?

os.path.sep is the character used by the operating system to separate pathname components.
But when os.path.sep is used in os.path.join(), why does it truncate the path?
Example:
Instead of 'home/python', os.path.join returns '/python':
>>> import os
>>> os.path.join('home', os.path.sep, 'python')
'/python'
I know that os.path.join() inserts the directory separator implicitly.
Where is os.path.sep useful? Why does it truncate the path?
Where is os.path.sep useful?
I suspect that it exists mainly because a variable like this is required in the module anyway (to avoid hardcoding), and if it's there, it might as well be documented. Its documentation says that it is "occasionally useful".
Why does it truncate the path?
From the docs for os.path.join():
If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
and / is an absolute path on *nix systems.
Drop os.path.sep from the os.path.join() call. os.path.join() uses os.path.sep internally.
On your system, os.path.sep == '/', which is interpreted as a root directory (absolute path), and therefore os.path.join('home', '/', 'python') is equivalent to os.path.join('/', 'python') == '/python'. From the docs:
If a component is an absolute path, all previous components are thrown
away and joining continues from the absolute path component.
As correctly given in the docstring of os.path.join -
Join two or more pathname components, inserting '/' as needed. If any component is an absolute path, all previous path components will be discarded.
Same is given in the docs as well -
os.path.join(path, *paths)
Join one or more path components intelligently. The return value is the concatenation of path and any members of *paths with exactly one directory separator (os.sep) following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty. If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
When you give os.path.sep alone, it is considered an absolute path to the root directory, /.
Please note that this is for the Unix/Linux-based os.path, which internally is posixpath, though the same behavior is seen in the Windows os.path.join().
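The claim that both flavours behave the same way can be checked on any machine, because the standard library ships both implementations as importable modules (posixpath and ntpath). A small sketch:

```python
import ntpath      # the Windows implementation of os.path
import posixpath   # the POSIX implementation of os.path

# A bare separator is treated as an absolute path by both implementations,
# so 'home' is discarded in each case.
print(posixpath.join('home', posixpath.sep, 'python'))  # /python
print(ntpath.join('home', ntpath.sep, 'python'))        # \python
```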
Example -
>>> import os.path
>>> os.path.join.__doc__
"Join two or more pathname components, inserting '/' as needed.\n If any component is an absolute path, all previous path components\n will be discarded."
Here's the snippet of code that is run if you are on a POSIX machine:
posixpath.py
# Join pathnames.
# Ignore the previous parts if a part is absolute.
# Insert a '/' unless the first part is empty or already ends in '/'.
def join(a, *p):
    """Join two or more pathname components, inserting '/' as needed.
    If any component is an absolute path, all previous path components
    will be discarded.  An empty last part will result in a path that
    ends with a separator."""
    sep = _get_sep(a)
    path = a
    try:
        if not p:
            path[:0] + sep  #23780: Ensure compatible data type even if p is null.
        for b in p:
            if b.startswith(sep):
                path = b
            elif not path or path.endswith(sep):
                path += b
            else:
                path += sep + b
    except (TypeError, AttributeError, BytesWarning):
        genericpath._check_arg_types('join', a, *p)
        raise
    return path
Specifically, the lines:
            if b.startswith(sep):
                path = b
And, since os.path.sep definitely starts with this character, whenever we encounter it we throw out the portion of the variable path that has already been constructed and start over with the next element in p.
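To see that reset happen step by step, here is a simplified, str-only re-creation of the loop that records the value of path after each component. The name trace_join is hypothetical and the type checking from the real function is left out.

```python
def trace_join(a, *p, sep='/'):
    """Simplified version of the posixpath.join loop that returns every
    intermediate value of path (illustration only, not the real function)."""
    path = a
    steps = [path]
    for b in p:
        if b.startswith(sep):
            path = b                      # absolute component: start over
        elif not path or path.endswith(sep):
            path += b                     # no extra separator needed
        else:
            path += sep + b               # insert exactly one separator
        steps.append(path)
    return steps

print(trace_join('home', '/', 'python'))
# ['home', '/', '/python']
print(trace_join('home', 'python'))
# ['home', 'home/python']
```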
But when os.path.sep is used in os.path.join(), why does it truncate the path?
Quoting directly from the documentation of os.path.join
If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
So when you do:
os.path.join('home', os.path.sep, 'python')
os.path.sep is '/', which is an absolute path, so 'home' is thrown away and you get only '/python' as the output.
This is also clear from the example:
>>> import os
>>> os.path.join('home','/python','kivy')
'/python/kivy'
Where is os.path.sep useful?
os.path.sep (or os.sep) is the character used by the operating system to separate pathname components.
But again quoting from the docs:
Note that knowing this is not sufficient to be able to parse or concatenate pathnames — use os.path.split() and os.path.join() — but it is occasionally useful.
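One way to read that note: gluing strings together with the separator does not handle components that already end in one, whereas join() does. A small illustration, using posixpath so the output does not depend on the platform:

```python
import posixpath

sep = posixpath.sep

# Naive concatenation doubles the separator when the first part already ends in one
print('home/' + sep + 'python')            # home//python

# join() inserts a separator only where it is needed
print(posixpath.join('home/', 'python'))   # home/python

# A case where the separator itself is handy: testing for a trailing separator
print('home/'.endswith(sep))               # True
```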

How to split path with slashes?

I need to join and split paths in my app, and I want the app to work on both Windows and Linux. Here is my code to join paths:
path = os.path.join(dir0,dir1,dir2,fn)
But when I split on slashes I run into problems, because
a path on Windows looks like:
dir0\dir1\dir2\fn
and a path on Linux looks like:
dir0/dir1/dir2/fn
How can I split the path with a single piece of platform-independent code (without changing the code for each platform)?
You can use os.sep:
import os
path_string.split(os.sep)
For more info, see the docs:
os.path.join(path1[, path2[, ...]])
Join one or more path components intelligently. If any component is an absolute path, all previous components (on Windows, including the previous drive letter, if there was one) are thrown away, and joining continues. The return value is the concatenation of path1, and optionally path2, etc., with exactly one directory separator (os.sep) following each non-empty part except the last. (This means that an empty last part will result in a path that ends with a separator.) Note that on Windows, since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
Use os.path.split. It is a system independent way to split paths. Note that this only splits into (head, tail). To get all the individual parts, you need to recursively split head or use str.split using os.path.sep as the separator.
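The "repeatedly split head" approach could look like the sketch below. The name split_all is hypothetical. The pathmod parameter is only there so that both path flavours can be demonstrated on one machine; in real code os.path would be the natural choice.

```python
import ntpath
import posixpath

def split_all(path, pathmod=posixpath):
    """Split path into all of its components by repeatedly calling
    pathmod.split() on the head (hypothetical helper)."""
    parts = []
    while True:
        head, tail = pathmod.split(path)
        if head == path:
            # Nothing left to strip: this is a root such as "/" (or empty)
            if head:
                parts.append(head)
            break
        if tail:
            parts.append(tail)
        if not head:
            break
        path = head
    parts.reverse()
    return parts

print(split_all('dir0/dir1/dir2/fn'))
# ['dir0', 'dir1', 'dir2', 'fn']
print(split_all('dir0\\dir1\\dir2\\fn', ntpath))
# ['dir0', 'dir1', 'dir2', 'fn']
print(split_all('/a/b'))
# ['/', 'a', 'b']
```

In Python 3, pathlib.PurePath(path).parts gives a similar result without a hand-written loop.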

How can I tell if a file is a descendant of a given directory?

On the surface, this is pretty simple, and I could implement it myself easily. Just successively call dirname() to go up each level in the file's path and check each one to see if it's the directory we're checking for.
But symlinks throw the whole thing into chaos. Any directory along the path of either the file or directory being checked could be a symlink, and any symlink could have an arbitrary chain of symlinks to other symlinks. At this point my brain melts and I'm not sure what to do. I've tried writing the code to handle these special cases, but it soon gets too complicated and I assume I'm doing it wrong. Is there a reasonably elegant way to do this?
I'm using Python, so any mention of a library that does this would be cool. Otherwise, this is a pretty language-neutral problem.
Use os.path.realpath and os.path.commonprefix:
os.path.commonprefix(['/the/dir/', os.path.realpath(filename)]) == "/the/dir/"
os.path.realpath will expand any symlinks as well as .. in the filename. os.path.commonprefix is a bit fickle: it doesn't really test for paths, just plain string prefixes, so you should make sure your directory ends in a directory separator. If you don't, it will claim /the/dirtwo/filename is also in /the/dir.
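The pitfall is easy to reproduce with plain strings, without touching the filesystem (the paths are the made-up ones from the answer):

```python
import os.path

directory = '/the/dir'
outside = '/the/dirtwo/filename'

# Without a trailing separator the string prefix matches: a false positive
print(os.path.commonprefix([directory, outside]) == directory)   # True

# With the trailing separator the comparison is correct
with_sep = directory + '/'
print(os.path.commonprefix([with_sep, outside]) == with_sep)     # False
```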
Python 3.5 has the useful function os.path.commonpath:
Return the longest common sub-path of each pathname in the sequence paths. Raise ValueError if paths contains both absolute and relative pathnames, or if paths is empty. Unlike commonprefix(), this returns a valid path.
So to check if a file is a descendant of a directory, you could do this:
os.path.commonpath(["/the/dir", os.path.realpath(filename)]) == "/the/dir"
Unlike with commonprefix, you don't need to worry about whether the inputs have trailing slashes. The return value of commonpath never has a trailing slash (unless it is the root directory itself).
Another way to do this in Python 3 is to use pathlib:
from pathlib import Path
is_descendant = Path("/the/dir") in Path(filename).resolve().parents
See documentation for Path.resolve() and Path.parents.
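The commonpath approach can be wrapped in a small function. This is a sketch under a few assumptions: the name is_descendant is hypothetical, the directory is also passed through realpath (in case it is itself reached via a symlink), and a path equal to the directory counts as a match.

```python
import os.path

def is_descendant(filename, directory):
    """Return True if filename resolves to a location inside directory
    (hypothetical helper based on os.path.commonpath, Python 3.5+)."""
    directory = os.path.realpath(directory)
    target = os.path.realpath(filename)
    try:
        return os.path.commonpath([directory, target]) == directory
    except ValueError:
        # Raised, for example, when the paths are on different Windows drives
        return False

print(is_descendant('/the/dir/sub/file.txt', '/the/dir'))       # True
print(is_descendant('/the/dirtwo/file.txt', '/the/dir'))        # False
print(is_descendant('/the/dir/../other/file.txt', '/the/dir'))  # False
```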

Detecting case mismatch on filename in Windows (preferably using python)?

I have some XML configuration files that we create in a Windows environment but that are deployed on Linux. These configuration files reference each other with file paths. We've had problems with case sensitivity and trailing spaces before, and I'd like to write a script that checks for these problems. We have Cygwin if that helps.
Example:
Let's say I have a reference to the file foo/bar/baz.xml, I'd do this
<someTag fileref="foo/bar/baz.xml" />
Now if we by mistake do this:
<someTag fileref="fOo/baR/baz.Xml " />
It will still work on Windows, but it will fail on Linux.
What I want to do is detect the cases where a file reference in these files doesn't match the real file with respect to case sensitivity.
os.listdir on a directory, in all case-preserving filesystems (including those on Windows), returns the actual case for the filenames in the directory you're listing.
So you need to do this check at each level of the path:
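import os
def onelevelok(parent, thislevel):
    # Look for an entry in parent that matches thislevel ignoring case
    for fn in os.listdir(parent):
        if fn.lower() == thislevel.lower():
            # Found it: OK only if the case matches exactly
            return fn == thislevel
    raise ValueError('No %r in dir %r!' % (
        thislevel, parent))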
where I'm assuming that the complete absence of any case variation of a name is a different kind of error, and using an exception for that; and, for the whole path (assuming no drive letters or UNC that wouldn't translate to Linux anyway):
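import itertools
def allpathok(path):
    # os.path.split only returns (head, tail), so split on '/' to get every level
    levels = [level for level in path.split('/') if level]
    if os.path.isabs(path):
        top = ['/']
    else:
        top = ['.']
    # parents[i] is the directory expected to contain levels[i]
    parents = itertools.accumulate(top + levels, os.path.join)
    return all(onelevelok(p, t)
               for p, t in zip(parents, levels))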
You may need to adapt this if, e.g., foo/bar is not to be taken to mean that foo is in the current directory, but somewhere else; or, of course, if UNC or drive letters are in fact needed (but as I mentioned, translating them to Linux is not trivial anyway ;-).
Implementation notes: I'm taking advantage of the fact that zip just drops "extra entries" beyond the length of the shortest of the sequences it's zipping, so I don't need to explicitly slice off the last entry (the "leaf") from the first argument; zip does it for me. all will short-circuit where it can, returning False as soon as it detects a false value, so it's just as good as an explicit loop but faster and more concise.
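The two built-in behaviours mentioned in these notes can be seen in isolation. The sample values and the check function are made up for the demonstration.

```python
levels = ['foo', 'bar', 'baz.xml']

# zip stops at the shorter sequence, so the extra last entry
# of the first argument is silently dropped
print(list(zip(['.'] + levels, levels)))
# [('.', 'foo'), ('foo', 'bar'), ('bar', 'baz.xml')]

checked = []

def check(value):
    """Record which values were examined; report 'bar' as a failure."""
    checked.append(value)
    return value != 'bar'

# all() stops at the first false result, so 'baz.xml' is never examined
print(all(check(v) for v in levels))   # False
print(checked)                         # ['foo', 'bar']
```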
It's hard to judge what exactly your problem is, but if you apply os.path.normcase along with str.strip before saving your file name, it should solve all your problems.
As I said in a comment, it's not clear how you are ending up with such a mistake. However, it would be trivial to check for an existing file, as long as you have some sensible convention (all file names are lower case, for example):
try:
    open(fname)
except IOError:
    open(fname.lower())
