Resolving mixed slashes from sys.path and os.path.join - python

I need to resolve a mismatch between the separator that sys.path provides and the separator that os.path.join uses.
I mimicked this Esri method (Techniques for sharing Python scripts) to make my script portable. It is being used in Windows for now, but will eventually live on a Linux server; I need to let Python determine the appropriate slash.
What they suggest:
import os
import sys
# Get the directory containing this script (sys.path[0])
scriptPath = sys.path[0]
# Get the pathname to the ToolShare folder
toolSharePath = os.path.dirname(scriptPath)
# Now construct pathname to the ToolData folder
toolDataPath = os.path.join(toolSharePath, "ToolData")
print("ToolData folder: " + toolDataPath)
But this outputs ToolData folder: C:/gis\ToolData, and the mixed slashes obviously aren't going to work.
This question (mixed slashes with os.path.join on windows) describes the basic approach to a solution:
check your external input (the input you apparently do not control the format of) before putting it in os.path.join. This way you make sure that os.path.join does not make bad decisions based on possibly bad input.
However, I'm unsure how to ensure that it will work cross-platform. If I use .replace("/","\\") on the sys.path[0] result, that's great for Windows, but isn't that going to cause the same mixed-slash problem once I transition to Unix?

How about using os.path.normpath()?
>>> import os
>>> os.path.normpath(r'c:\my/path\to/something.py')
'c:\\my\\path\\to\\something.py'
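One thing to keep in mind for the cross-platform concern: normpath follows the rules of the OS it runs on. Since the ntpath and posixpath modules can be imported on any OS, this small sketch shows how the same call behaves under each set of rules:

```python
import ntpath
import posixpath

mixed = r"c:\my/path\to/something.py"

# Windows rules: forward slashes are converted to backslashes.
print(ntpath.normpath(mixed))     # c:\my\path\to\something.py

# POSIX rules: a backslash is an ordinary filename character, so it is left alone.
print(posixpath.normpath(mixed))  # c:\my/path\to/something.py
```

On Linux, sys.path[0] already uses forward slashes, so normpath mainly matters for strings that originated on Windows.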
Also worth mentioning: the Windows path API doesn't care whether forward slashes or backslashes are used. Usually the fault lies with a program that doesn't handle the slashes properly. For example, in Python:
with open(r'c:/path/to/my/file.py') as f:
    print(f.read())
will work.

After reading the documentation and trying a lot of variations:
The os.path.abspath function "cleans" the slashes (it normalizes the path as part of making it absolute), so whichever slash sys.path[0] happens to use, it is replaced with the platform's preferred separator.
import os
import sys
scriptPath = sys.path[0]
toolDataPath = os.path.join(scriptPath, "ToolData")
Result: C:/gis\ToolData
scriptPath = sys.path[0]
toolSharePath = os.path.abspath(scriptPath)
# or, in one line: toolSharePath = os.path.abspath(sys.path[0])
toolDataPath = os.path.join(toolSharePath, "ToolData")
Result: C:\gis\ToolData
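On Python 3.4+, pathlib is an alternative not used in the answers here. The sketch below uses PureWindowsPath and PurePosixPath so the output is identical on any machine; the directory names are hypothetical stand-ins for sys.path[0]:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical script directories, one per platform.
win_script_dir = PureWindowsPath("C:/gis/ToolShare")
posix_script_dir = PurePosixPath("/opt/gis/ToolShare")

# .parent plays the role of os.path.dirname; "/" joins using each flavour's separator.
win_data = win_script_dir.parent / "ToolData"
posix_data = posix_script_dir.parent / "ToolData"

print(win_data)    # C:\gis\ToolData
print(posix_data)  # /opt/gis/ToolData
```

In a real script the equivalent would be Path(sys.path[0]).parent / "ToolData", which picks the flavour matching the OS it runs on.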

Python has os.sep, which holds your OS's preferred path separator character. Perhaps you could perform a manual string join using that?
On Linux:
>>> import os
>>> os.sep
'/'
https://docs.python.org/2/library/os.html#os.sep
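One caveat with a manual join: it only controls the separator you add, not the ones already in the string. This sketch uses ntpath so it behaves like Windows on any machine; "C:/gis" is a hypothetical value like the sys.path[0] result in the question:

```python
import ntpath

# Hypothetical mixed-slash directory, like sys.path[0] in the question.
script_dir = "C:/gis"

# A manual join only inserts the separator; the existing forward slash survives.
manual = ntpath.sep.join([script_dir, "ToolData"])
print(manual)                   # C:/gis\ToolData

# Normalizing afterwards is what actually fixes the slashes.
print(ntpath.normpath(manual))  # C:\gis\ToolData
```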

Related

Identify String as Windows or UNIX like path in Python

How do I best identify whether a string is a Windows path or a UNIX-style path?
Example strings:
some_path = '/Volumes/network-drive/file.txt'
or
some_path = 'Z:\\network-drive\\file.txt'
One way is to check which slashes the string contains:
if '/' in some_path:
    # do something with UNIX Style Path
    pass
elif '\\' in some_path:
    # do something else with Windows Path
    pass
Is there a better way to do this? I couldn't find suitable methods in os.path or pathlib. By the way, assume that the path string comes from another system, so checking which OS my code runs on doesn't help.
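There is no definitive check, but ntpath (importable on any OS) can add one stronger signal than slashes alone: whether the string starts with a drive letter or UNC prefix. A minimal heuristic sketch; looks_like_windows_path is a hypothetical helper, not a stdlib function:

```python
import ntpath


def looks_like_windows_path(path):
    """Heuristic guess only; ambiguous strings can be classified wrongly."""
    # A drive letter (Z:) or a UNC prefix (\\server\share) is a strong hint.
    drive, _ = ntpath.splitdrive(path)
    if drive:
        return True
    # Otherwise fall back on the slash check from the question:
    # backslashes but no forward slashes suggests Windows.
    return "\\" in path and "/" not in path


print(looks_like_windows_path("Z:\\network-drive\\file.txt"))      # True
print(looks_like_windows_path("/Volumes/network-drive/file.txt"))  # False
```

Note that ntpath also treats forward-slash UNC-style strings such as //server/share/x as having a drive, so POSIX paths beginning with // would be misclassified by this sketch.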

os.path.split seems to be returning wrong

I can't understand what os.path.split is doing. I'm debugging a program (specifically git's interface with Perforce: git-p4) and seeing that os.path.split is splitting the incoming path in ways the script isn't expecting, and also seems inconsistent with the documentation. I made some simpler tests and can't figure out what it's doing myself.
The path I want to split is //a/b (the path is actually a Perforce path, not a local filesystem path), and I need b in the second half of the returned pair. I'm running on Windows, and suspect the issue has something to do with the path not looking very Windows-esque. When I ran my test code in an online sandbox, it worked as expected, unlike on my Windows machine.
I've read the documentation:
os.path.split(path)
Split the pathname path into a pair, (head, tail) where tail is the last pathname component and head is everything leading up to that. The tail part will never contain a slash; if path ends in a slash, tail will be empty. If there is no slash in path, head will be empty. If path is empty, both head and tail are empty. Trailing slashes are stripped from head unless it is the root (one or more slashes only). In all cases, join(head, tail) returns a path to the same location as path (but the strings may differ). Also see the functions dirname() and basename().
My test code:
import os
print(os.path.split("//a"))
print(os.path.split("//a/b"))
print(os.path.split("//a/b/c"))
What I'd expect:
('//', 'a')
('//a', 'b')
('//a/b', 'c')
What I actually get on a couple online sandboxes:
('//', 'a')
('//a', 'b')
('//a/b', 'c')
What I actually get on my PC:
('//', 'a')
('//a/b', '')
('//a/b/', 'c')
I'm using Python 2 because the git-p4 code is written for Python 2.
So my first question is just for my own understanding. What's going wrong here? An OS difference?
And beyond my own curiosity, I need a fix. I've been able to modify git-p4, but I'd of course prefer to edit it as little as possible, as I'm not trying to understand it! I'm not a Python expert. Is there a comparable method that returns ('//a', 'b')?
You are using the wrong tool to handle these paths. On Windows, paths that start with //foo/bar or \\foo\bar are seen as UNC network paths, and os.path.split() will first use os.path.splitdrive() to make sure the UNC portion is not split. The UNC or drive portion is then re-attached after splitting the remainder.
You can use the posixpath module instead, to get the POSIX behaviour:
import posixpath
posixpath.split("//a/b")  # returns ('//a', 'b') on every platform
Quoting from the top of the os.path module documentation:
Note: Since different operating systems have different path name conventions, there are several versions of this module in the standard library. The os.path module is always the path module suitable for the operating system Python is running on, and therefore usable for local paths. However, you can also import and use the individual modules if you want to manipulate a path that is always in one of the different formats. They all have the same interface:
posixpath for UNIX-style paths
ntpath for Windows paths
[...]
On Windows, os.path is the same module as ntpath; the online sandboxes must all have been POSIX systems.
Treating your Perforce paths as POSIX paths is fine, provided you always use forward slashes as path separators.
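Because both modules can be imported on any OS, the Windows behaviour from the question can be reproduced anywhere. This sketch shows the UNC drive that ntpath.splitdrive carves off, which explains the unexpected split:

```python
import ntpath
import posixpath

p4_path = "//a/b/c"

# Windows rules: "//a/b" is treated as a UNC drive (\\server\share) and kept whole.
print(ntpath.splitdrive(p4_path))  # ('//a/b', '/c')
print(ntpath.split(p4_path))       # ('//a/b/', 'c')

# POSIX rules: the leading "//" is just a root, so the last component is split off normally.
print(posixpath.split(p4_path))    # ('//a/b', 'c')
```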

When to use `/` and when to use `\` in regards to path Python

I have a few questions about paths in Python when using the os module:
(1) If using os module, is there any difference between \ and / in regards to the absolute path of a file?
For examples:
import os
example_path_1 = "C:\abc\def"
example_path_2 = "C:/abc/def"
a. Can os.system(example_path_1) and os.system(example_path_2) both work?
b. Can os.mkdir(example_path_1) and os.mkdir(example_path_2) both work?
(2) When using the os module in Python, if I'm getting this right, it seems that in some situations we have to use / and in other situations we have to use \. How do I tell the difference?
You would be safe always sticking to forward slashes:
example_path = "C:/abc/def"
If you use Windows-style backslashes, you need to escape them or use a raw string; otherwise a sequence such as "\a" in "C:\abc\def" is read as an escape (the bell character):
example_path = "C:\\abc\\def"
example_path = r"C:\abc\def"
In general, stick to doing as much as you can in the os.path module, it will handle these OS-specific issues fairly robustly. For example you can pass a path to os.path.normpath and it will normalize your slashes to whatever platform you're on. Similarly building up paths with os.path.join will insert the correct slashes for your system.
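A short sketch of both points. It uses ntpath so that the Windows behaviour shows up on any OS; on Windows, os.path.join and os.path.normpath behave the same way:

```python
import ntpath

plain = "C:\abc"   # "\a" is silently turned into the bell character (\x07)
raw = r"C:\abc"    # raw string: the backslash is kept literally

print(len(plain), len(raw))  # 5 6
print("\x07" in plain)       # True

# Building the path from parts avoids typing backslashes at all.
print(ntpath.join("C:\\", "abc", "def"))  # C:\abc\def
# normpath converts forward slashes to the Windows separator.
print(ntpath.normpath("C:/abc/def"))      # C:\abc\def
```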

Does os.path.sep affect the tarfile module?

Is the path separator employed inside a Python tarfile.TarFile object a '/' regardless of platform, or is it a backslash on Windows?
I basically never touch Windows, but I would kind of like the code I'm writing to be compatible with it, if it can be. Unfortunately I have no Windows host on which to test.
A quick test tells me that a (forward) slash is always used.
In fact, the tar format stores the full path of each file as a single string, using slashes (try looking at a hex dump), and python just reads that full path without any modification. Likewise, at extraction time python hard-replaces slashes with the local separator (see TarFile._extract_member).
... which makes me think that there are surely some nonconformant implementations of tar for Windows that create tarfiles with backslashes as separators!?
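Without a Windows host, an in-memory round trip at least confirms that member names are stored and read back with forward slashes. This sketch does not exercise the Windows-specific conversions; as far as I can tell from the CPython source, TarFile.gettarinfo replaces os.sep with "/" when adding files, and extraction does the reverse:

```python
import io
import tarfile

data = b"hello\n"

# Build a small archive in memory; the member name is written with "/".
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo(name="docs/readme.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Read it back: names come out exactly as stored, regardless of platform.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()

print(names)  # ['docs/readme.txt']
```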

which one should I use: os.sep or os.path.sep?

They are the same, but which one should I use?
http://docs.python.org/library/os.html:
os.sep
The character used by the operating system to separate pathname components. This is '/' for POSIX and '\' for Windows. Note that knowing this is not sufficient to be able to parse or concatenate pathnames — use os.path.split() and os.path.join() — but it is occasionally useful. Also available via os.path.
I'd use os.path.sep to make it very clear that it's the path separator… But consistency is more important, so if one is already being used, use that. Otherwise, pick one and use it all the time.
Edit: Just to make sure you're not reinventing the wheel, though, the path module already has join, split, dirname, and basename functions… So you should rarely need to use path.sep:
>>> os.path.join("foo", "bar", "baz")
'foo/bar/baz'
>>> os.path.split(_)
('foo/bar', 'baz')
I recommend you use os.path.sep for clarity, since it's a path separator, not an OS separator. If you import os.path as path you can call it path.sep, which is even better.
If you are using Python 2.7, I suggest using os.sep (works) instead of os.path.sep (broken), because Jython on Windows has a bug that returns a "/" slash instead of the required "\" backslash.
The following examples could highlight the differences between os.path.join and os.path.sep.join.
>>> import os
>>> os.path.join("output", "images", "saved")
'output/images/saved'
>>> os.path.sep.join(["output", "images", "saved"])
'output/images/saved'
I guess os.path.sep.join is more robust and can be used without modification on any OS.
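Whether sep.join is more robust depends on the inputs: the two calls only agree when no component already contains a separator. A sketch using posixpath (so the output is the same on any OS) shows where they diverge:

```python
import posixpath

# Clean components: both approaches give the same result.
print(posixpath.join("output", "images"))        # output/images
print(posixpath.sep.join(["output", "images"]))  # output/images

# A trailing separator: join avoids doubling it, sep.join does not.
print(posixpath.join("output/", "images"))        # output/images
print(posixpath.sep.join(["output/", "images"]))  # output//images

# An absolute component: join restarts from it, sep.join just concatenates.
print(posixpath.join("output", "/tmp"))        # /tmp
print(posixpath.sep.join(["output", "/tmp"]))  # output//tmp
```

On Windows, os.path.join additionally understands drive letters, which a plain string join does not.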
As mentioned in the python docs, both os.path.sep and os.sep return the same output.
The character used by the operating system to separate pathname
components. This is '/' for POSIX and '\' for Windows.
Both of them also have the same type:
print(type(os.sep))
print(type(os.path.sep))
# Output
<class 'str'>
<class 'str'>
Both of them have the same documentation.
print(os.path.sep.__doc__)
print(os.sep.__doc__)
# The outputs of both print statements are the same.
So I think that, although os.sep was more common in Python 2, in Python 3 only consistency matters when choosing between them.
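A more direct check than comparing docstrings: in CPython, the os module imports sep from its platform path module, so both names refer to the same string. The matching __doc__ is simply the docstring of the str type, since both are plain str objects. A small sketch:

```python
import os

# Same value, and in CPython the very same object.
print(os.sep == os.path.sep)  # True
print(os.sep is os.path.sep)  # True

# Any str instance reports str's own docstring as __doc__.
print(os.sep.__doc__ == str.__doc__)  # True
```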
