Unix paths that work for any platform in Python? - python

Can all paths in a Python program use ".." (for the parent directory) and / (for separating path components), and still work whatever the platform?
On one hand, I have never seen such a claim in the documentation (I may have missed it), and the os and os.path modules do provide facilities for handling paths in a platform agnostic way (os.pardir, os.path.join,…), which lets me think that they are here for a reason.
On the other hand, you can read on StackOverflow that "../path/to/file" works on all platforms…
So, should os.pardir, os.path.join and friends always be used, for portability purposes, or are Unix path names always safe (up to possible character encoding issues)? or maybe "almost always" safe (i.e. working under Windows, OS X, and Linux)?

I've never had any problems with using .., although it might be a good idea to convert it to an absolute path using os.path.abspath. Secondly, I would recommend always using os.path.join whereever possible. There are a lot of corner cases (aside from portability issues) in joining paths, and it's good not to have to worry about them. For instance:
>>> '/foo/bar/' + 'qux'
'/foo/bar/qux'
>>> '/foo/bar' + 'qux'
'/foo/barqux'
>>> from os.path import join
>>> join('/foo/bar/', 'qux')
'/foo/bar/qux'
>>> join('/foo/bar', 'qux')
'/foo/bar/qux'
You may run into problems with using .. if you're on some obscure platforms, but I can't name any (Windows, *nix, and OS X all support that notation).

"Almost always safe" is right. All of the platforms you care about probably work ok today and I don't think they will be changing their conventions any time soon.
However Python is very portable and runs on a lot more than the usual platforms. The reason for the os module is to help smooth things over it a platform does have different requirements.
Is there a good reason for you to not use the os functions?
os.pardir is self documenting whereas ".." isn't, and os.pardir might be easier to grep for
Here is some docs from python 1.6 when Mac was still different for everything
OS routines for Mac, DOS, NT, or Posix depending on what system we're
on.
This exports:
- all functions from posix, nt, dos, os2, mac, or ce, e.g. unlink, stat, etc.
- os.path is one of the modules posixpath, ntpath, macpath, or dospath
- os.name is 'posix', 'nt', 'dos', 'os2', 'mac', or 'ce'
- os.curdir is a string representing the current directory ('.' or ':')
- os.pardir is a string representing the parent directory ('..' or '::')
- os.sep is the (or a most common) pathname separator ('/' or ':' or '\')
- os.altsep is the alternate pathname separator (None or '/')
- os.pathsep is the component separator used in $PATH etc
- os.linesep is the line separator in text files (' ' or ' ' or ' ')
- os.defpath is the default search path for executables
Programs that import and use 'os' stand a better chance of being
portable between different platforms. Of course, they must then only
use functions that are defined by all platforms (e.g., unlink and
opendir), and leave all pathname manipulation to os.path (e.g., split
and join).

Within python, using / will always work. You will need to be aware of the OS convention if you want to execute a command in a subshell
myprog = "/path/to/my/program"
os.system([myprog, "-n"]) # 1
os.system([myprog, "C:/input/file/to/myprog"]) # 2
Command #1 will probably work as expected.
Command #2 might not work if myprog is a Windows command and expects to parse its command line arguments to get a Windows file name.

It works on Windows, so if you define "whatever the platform" to be Unix and Windows, you're fine.
On the other hand, Python also runs on VMS, RISC OS, and other odd platforms that use completely different filename conventions. However, it's probable that trying to get your application to run on VMS, blind, is kind of silly anyway - "premature portability is the root of some relatively minor evil"
I like using the os.path functions anyway because they are good for expressing intent - instead of just a string concatenation, which might be done for any of a million purposes, it reads very explicitly as a path manipulation.

Windows supports / as a path separator. The only incompatibilities between Unix filenames and Windows filenames are:
the allowed characters in filenames
the special names and
case sensitivity
Windows is more restrictive in the first two accounts (this is, it has more forbidden characters and more special names), while Unix is typically case sensitive. There are some answers here listing exactly what are these characters and names. I'll see if I can find them.
Now, if your development environment comes with a function to create or manipulate paths, you should use it, it's there for a reason, y'know. Especially given that there are a lot more platforms than Windows and Unix.
Answering your first question, yes ../dir/file will work, unless they hit some of the above mentioned incompatibilities.

OS/X and Linux are both Unix compatible, so by definition they use the format you gave at the beginning of the question. Windows allows "/" in addition to "\" so that programs could be interchangeable with Xenix, a Unix variant that Microsoft was trying out a long time ago, and that compatibility has been carried forward to the present. Thus it works too.
I don't know how many other platforms Python has been ported to, and I can't speak for them.

As others have said, a forward slash will work in all cases, but you're better off creating a list of path segments and os.path.join()-ing them.

Related

os.path.split seems to be returning wrong

I can't understand what os.path.split is doing. I'm debugging a program (specifically git's interface with Perforce: git-p4) and seeing that os.path.split is splitting the incoming path in ways the script isn't expecting, and also seems inconsistent with the documentation. I made some simpler tests and can't figure out what it's doing myself.
The path I want to split is //a/b (The path is actually a Perforce path, not a local filesystem path), and I need b in the second half of the returned pair. I'm running on Windows, and suspect the issue has something to do with the path not looking very Windows-esque. When I tried running my test code in an online sandbox it worked as expected unlike my Windows machine.
I've read the documentation:
os.path.split(path)
Split the pathname path into a pair, (head, tail) where tail is the last pathname component and head is everything leading up to that. The tail part will never contain a slash; if path ends in a slash, tail will be empty. If there is no slash in path, head will be empty. If path is empty, both head and tail are empty. Trailing slashes are stripped from head unless it is the root (one or more slashes only). In all cases, join(head, tail) returns a path to the same location as path (but the strings may differ). Also see the functions dirname() and basename().
My test code:
import os
print os.path.split("//a")
print os.path.split("//a/b")
print os.path.split("//a/b/c")
What I'd expect:
('//', 'a')
('//a', 'b')
('//a/b', 'c')
What I actually get on a couple online sandboxes:
('//', 'a')
('//a', 'b')
('//a/b', 'c')
What I actually get on my PC:
('//', 'a')
('//a/b', '')
('//a/b/', 'c')
Python 2 because the git-p4 code is written for Python 2.
So my first question is just for my own understanding. What's going wrong here? An OS difference?
And then beyond my own curiosity, I need a fix. I've been able to modify git-p4, but I'd of course prefer to edit it as little as possible as I'm not trying to understand it! I'm not a python expert. Is there a comparable method that can get ('//a', 'b') returned?
You are using the wrong tool to handle these paths. On Windows, paths that start with //foo/bar or \\foo\bar are seen as UNC network paths, and os.path.split() will first use os.path.splitdrive() to make sure the UNC portion is not split. The UNC or drive portion is then re-attached after splitting the remainder.
You can use the posixpath module instead, to get the POSIX behaviour:
import posixpath
posixpath.split(yourpaths)
Quoting from the top of the os.path module documentation:
Note: Since different operating systems have different path name conventions, there are several versions of this module in the standard library. The os.path module is always the path module suitable for the operating system Python is running on, and therefore usable for local paths. However, you can also import and use the individual modules if you want to manipulate a path that is always in one of the different formats. They all have the same interface:
posixpath for UNIX-style paths
ntpath for Windows paths
[...]
On Windows, os.path is the same module as ntpath, the online sandboxes must all have been POSIX systems.
Treating your Perforce paths as POSIX paths is fine, provided you always use forward slashes as path separators.

Does os.path.sep affect the tarfile module?

Is the path separator employed inside a Python tarfile.TarFile object a '/' regardless of platform, or is it a backslash on Windows?
I basically never touch Windows, but I would kind of like the code I'm writing to be compatible with it, if it can be. Unfortunately I have no Windows host on which to test.
A quick test tells me that a (forward) slash is always used.
In fact, the tar format stores the full path of each file as a single string, using slashes (try looking at a hex dump), and python just reads that full path without any modification. Likewise, at extraction time python hard-replaces slashes with the local separator (see TarFile._extract_member).
... which makes me think that there are surely some nonconformant implementations of tar for Windows that create tarfiles with backslashs as separators!?

Resolving mixed slashes from sys.path and os.path.join

I need to resolve a disparity between the separator that sys.path is providing, and the separator that os.path.join is using.
I mimicked this Esri method (Techniques for sharing Python scripts) to make my script portable. It is being used in Windows for now, but will eventually live on a Linux server; I need to let Python determine the appropriate slash.
What they suggest:
# Get the pathname to this script
scriptPath = sys.path[0]
# Get the pathname to the ToolShare folder
toolSharePath = os.path.dirname(scriptPath)
# Now construct pathname to the ToolData folder
toolDataPath = os.path.join(toolSharePath, "ToolData")
print "ToolData folder: " + toolDataPath
But this outputs ToolData folder: C:/gis\ToolData -- and obviously the mixed slashes aren't going to work.
This Question (mixed slashes with os.path.join on windows) includes the basic approach to a solution:
check your external input (the input you apparently do not control the format of) before putting it in os.path.join. This way you make sure that os.path.join does not make bad decisions based on possibly bad input
However, I'm unsure how to ensure that it will work cross-platform. If I use .replace("/","\\") on the sys.path[0] result, that's great for Windows, but isn't that going to cause the same mixed-slash problem once I transition to Unix?
How about using os.path.normpath()?
>>> import os
>>> os.path.normpath(r'c:\my/path\to/something.py')
'c:\\my\\path\\to\\something.py'
Also worth mentioning: the Windows path API doesn't care whether forward or back slashes are used. Usually it's the fault of program that doesn't handle the slashing properly. For example, in python:
with open(r'c:/path/to/my/file.py') as f:
print f.read()
will work.
After reading the documentation and trying a lot of variations:
The os.path.abspath function can "clean" the slashes, so whichever direction slash sys.path[0] decides to use, the slashes will be replaced with the preferred separator.
scriptPath = sys.path[0]
toolDataPath = os.path.join(scriptPath, "ToolData")
Result: C:/gis\ToolData
scriptPath = sys.path[0]
toolSharePath = os.path.abspath(scriptPath)
# or, in one line: toolSharePath = os.path.abspath(sys.path[0])
toolDataPath = os.path.join(toolSharePath, "ToolData")
Result: C:\gis\ToolData
There is an os.sep character in Python, which stores your OS's preferred folder separating character. Perhaps you could perform a manual string join using that?
On Linux:
>>> import os
>>> os.sep
'/'
https://docs.python.org/2/library/os.html#os.sep

wrong rename with python

I need to rename my picture with the EXIF data, but I have a problem: if I use ":" to separate the time (hour:minute:second), the file name gets crazy!
metadata = pyexiv2.ImageMetadata(lunga + i)
metadata.read()
tag = metadata['Exif.Image.DateTime']
estensione = ".jpg"
new_data = tag.value.strftime('%Y-%m-%d %H-%M-%S')
new_name = lunga + str(new_data) + estensione
os.renames(lunga + i, new_name)
works great, but with
new_data = tag.value.strftime('%Y-%m-%d %H:%M:%S')
I get something like
2A443K~H.jpg
The problem is that you're not allowed to put colons into filenames on Windows. You're not actually using Windows… but you are using an SMB share, which means you're bound by Windows rules.
The fix is to not put colons into your filenames.
If you want to understand why this bizarre stuff is happening, read on.
The details on Windows filenames are described in Naming Files, Paths, and Namespaces at MSDN, but I'll summarize the relevant parts here.
The NT kernel underneath Windows has no problems with colons, but the Win32 layer on top of it can't handle them (and the quasi-POSIX layer in MSVCRT sits on top of Win32).
So, at the C level, if you call NT functions like NtSetInformationFile, it will save them just fine. If you call Win32 functions like MoveFileEx, they will normally give you an error, but if you use the special \\?\ syntax to say "pass this name straight through to NT", it will work. And if you call MSVCRT functions like rename, you will get an error. Older versions of Python called rename, which would just give you an error. Newer versions call MoveFileEx, and will try to wrap the name up in \\?\ syntax (because that also allows you to get around some other stupid limitations, like the excessively short MAX_PATH value).
So, what happens if you give a file a name that Win32 can't understand? Remember that on Windows, every file has two different names: the "long name" and the "short name". The short name is a DOS-style 8.3 filename. So whenever it can't display the long name, it displays the short name instead.
Where does the short name come from? If you don't create one explicitly, Windows will create one for you from the long name by using the first 6 characters, a tilde, and a number of letter. So, for example, the short name for "Program Files" is "PROGRA~1". But if Windows can't handle the long name, it will just make up a short name out of 6 random characters, a tilde, and a random character. So you get something like 2A443K~H.
The NTFS filesystem, being designed for Windows, expects to be used in Windows-y ways. So, if you're using an NTFS volume, even on a non-Windows system, the driver will emulate some of this functionality, giving you similar but not identical behavior.
And of course if you're talking to a share from a Windows system, or a share backed by an NTFS drive on a non-Windows system, again, some of the same things will apply.
Even if both your computer and the file server are non-Windows and the filesystem is not NTFS, if you're using SMB/CIFS for file sharing, SMB was also designed for Windows, and you will again get similar behavior.
At least you no longer have to worry about VMS, classic Mac, and other naming systems, just POSIX and Windows.
Colons are reserved characters in Windows filesystem (see How would I go about creating a filename with invalid characters such as :?>?), so the name was replaced with an auto-generated on instead.
To be clear, this is not a Python issue. Don't use colons or other reserved characters in filenames if you don't want this to happen.

how to check for platform incompatible folder (file) names in python

I would like to be able to check from python if a given string could be a valid cross platform folder name - below is the concrete problem I ran into (folder name ending in .), but I'm sure there are some more special cases (e.g.: con, etc.).
Is there a library for this?
From python (3.2) I created a folder on Windows (7) with a name ending in dot ('.'), e.g. (without square brackets): [What I've done on my holidays, Part II.]
When the created folder was ftp'd (to linux, but I guess that's irrelevant), it did not have the dot in it anymore (and in return, this broke a lot of hyperlinks).
I've checked it from the command line, and it seems that the folder doesn't have the '.' in the filename
mkdir tmp.
dir
cd tmp
cd ..\tmp.
Apparently, adding a single dot at the end of the folder name is ignored, e.g.:
cd c:\Users.
works just as expected.
Nope there's sadly no way to do this. For windows you basically can use the following code to remove all illegal characters - but if someone still has a FAT filesystem you'd have to handle these too since those are stricter. Basically you'll have to read the documentation for all filesystem and come up with a complete list. Here's the NTFS one as a starting point:
ILLEGAL_NTFS_CHARS = "[<>:/\\|?*\"]|[\0-\31]"
def __removeIllegalChars(name):
# removes characters that are invalid for NTFS
return re.sub(ILLEGAL_NTFS_CHARS, "", name)
And then you need some "forbidden" name list as well to get rid of COM. Pretty much a complete mess that.. and that's ignoring linux (although there it's pretty relaxed afaik)
Do not end a file or directory name with a space or a period. Although
the underlying file system may support such names, the Windows shell
and user interface does not.
http://msdn.microsoft.com/en-us/library/aa365247.aspx#naming_conventions
That page will give you information about other illegal names too, for Windows that is. Including CON as you said your self.
If you respect those (seemingly harsh) rules, I think you'll be safe on Linux and most other systems too.

Categories