Is the path separator employed inside a Python tarfile.TarFile object a '/' regardless of platform, or is it a backslash on Windows?
I basically never touch Windows, but I would kind of like the code I'm writing to be compatible with it, if it can be. Unfortunately I have no Windows host on which to test.
A quick test tells me that a (forward) slash is always used.
In fact, the tar format stores the full path of each file as a single string, using slashes (try looking at a hex dump), and python just reads that full path without any modification. Likewise, at extraction time python hard-replaces slashes with the local separator (see TarFile._extract_member).
... which makes me think that there are surely some nonconformant implementations of tar for Windows that create tarfiles with backslashs as separators!?
Related
I can't understand what os.path.split is doing. I'm debugging a program (specifically git's interface with Perforce: git-p4) and seeing that os.path.split is splitting the incoming path in ways the script isn't expecting, and also seems inconsistent with the documentation. I made some simpler tests and can't figure out what it's doing myself.
The path I want to split is //a/b (The path is actually a Perforce path, not a local filesystem path), and I need b in the second half of the returned pair. I'm running on Windows, and suspect the issue has something to do with the path not looking very Windows-esque. When I tried running my test code in an online sandbox it worked as expected unlike my Windows machine.
I've read the documentation:
os.path.split(path)
Split the pathname path into a pair, (head, tail) where tail is the last pathname component and head is everything leading up to that. The tail part will never contain a slash; if path ends in a slash, tail will be empty. If there is no slash in path, head will be empty. If path is empty, both head and tail are empty. Trailing slashes are stripped from head unless it is the root (one or more slashes only). In all cases, join(head, tail) returns a path to the same location as path (but the strings may differ). Also see the functions dirname() and basename().
My test code:
import os
print os.path.split("//a")
print os.path.split("//a/b")
print os.path.split("//a/b/c")
What I'd expect:
('//', 'a')
('//a', 'b')
('//a/b', 'c')
What I actually get on a couple online sandboxes:
('//', 'a')
('//a', 'b')
('//a/b', 'c')
What I actually get on my PC:
('//', 'a')
('//a/b', '')
('//a/b/', 'c')
Python 2 because the git-p4 code is written for Python 2.
So my first question is just for my own understanding. What's going wrong here? An OS difference?
And then beyond my own curiosity, I need a fix. I've been able to modify git-p4, but I'd of course prefer to edit it as little as possible as I'm not trying to understand it! I'm not a python expert. Is there a comparable method that can get ('//a', 'b') returned?
You are using the wrong tool to handle these paths. On Windows, paths that start with //foo/bar or \\foo\bar are seen as UNC network paths, and os.path.split() will first use os.path.splitdrive() to make sure the UNC portion is not split. The UNC or drive portion is then re-attached after splitting the remainder.
You can use the posixpath module instead, to get the POSIX behaviour:
import posixpath
posixpath.split(yourpaths)
Quoting from the top of the os.path module documentation:
Note: Since different operating systems have different path name conventions, there are several versions of this module in the standard library. The os.path module is always the path module suitable for the operating system Python is running on, and therefore usable for local paths. However, you can also import and use the individual modules if you want to manipulate a path that is always in one of the different formats. They all have the same interface:
posixpath for UNIX-style paths
ntpath for Windows paths
[...]
On Windows, os.path is the same module as ntpath, the online sandboxes must all have been POSIX systems.
Treating your Perforce paths as POSIX paths is fine, provided you always use forward slashes as path separators.
I need to resolve a disparity between the separator that sys.path is providing, and the separator that os.path.join is using.
I mimicked this Esri method (Techniques for sharing Python scripts) to make my script portable. It is being used in Windows for now, but will eventually live on a Linux server; I need to let Python determine the appropriate slash.
What they suggest:
# Get the pathname to this script
scriptPath = sys.path[0]
# Get the pathname to the ToolShare folder
toolSharePath = os.path.dirname(scriptPath)
# Now construct pathname to the ToolData folder
toolDataPath = os.path.join(toolSharePath, "ToolData")
print "ToolData folder: " + toolDataPath
But this outputs ToolData folder: C:/gis\ToolData -- and obviously the mixed slashes aren't going to work.
This Question (mixed slashes with os.path.join on windows) includes the basic approach to a solution:
check your external input (the input you apparently do not control the format of) before putting it in os.path.join. This way you make sure that os.path.join does not make bad decisions based on possibly bad input
However, I'm unsure how to ensure that it will work cross-platform. If I use .replace("/","\\") on the sys.path[0] result, that's great for Windows, but isn't that going to cause the same mixed-slash problem once I transition to Unix?
How about using os.path.normpath()?
>>> import os
>>> os.path.normpath(r'c:\my/path\to/something.py')
'c:\\my\\path\\to\\something.py'
Also worth mentioning: the Windows path API doesn't care whether forward or back slashes are used. Usually it's the fault of program that doesn't handle the slashing properly. For example, in python:
with open(r'c:/path/to/my/file.py') as f:
print f.read()
will work.
After reading the documentation and trying a lot of variations:
The os.path.abspath function can "clean" the slashes, so whichever direction slash sys.path[0] decides to use, the slashes will be replaced with the preferred separator.
scriptPath = sys.path[0]
toolDataPath = os.path.join(scriptPath, "ToolData")
Result: C:/gis\ToolData
scriptPath = sys.path[0]
toolSharePath = os.path.abspath(scriptPath)
# or, in one line: toolSharePath = os.path.abspath(sys.path[0])
toolDataPath = os.path.join(toolSharePath, "ToolData")
Result: C:\gis\ToolData
There is an os.sep character in Python, which stores your OS's preferred folder separating character. Perhaps you could perform a manual string join using that?
On Linux:
>>> import os
>>> os.sep
'/'
https://docs.python.org/2/library/os.html#os.sep
I would like to be able to check from python if a given string could be a valid cross platform folder name - below is the concrete problem I ran into (folder name ending in .), but I'm sure there are some more special cases (e.g.: con, etc.).
Is there a library for this?
From python (3.2) I created a folder on Windows (7) with a name ending in dot ('.'), e.g. (without square brackets): [What I've done on my holidays, Part II.]
When the created folder was ftp'd (to linux, but I guess that's irrelevant), it did not have the dot in it anymore (and in return, this broke a lot of hyperlinks).
I've checked it from the command line, and it seems that the folder doesn't have the '.' in the filename
mkdir tmp.
dir
cd tmp
cd ..\tmp.
Apparently, adding a single dot at the end of the folder name is ignored, e.g.:
cd c:\Users.
works just as expected.
Nope there's sadly no way to do this. For windows you basically can use the following code to remove all illegal characters - but if someone still has a FAT filesystem you'd have to handle these too since those are stricter. Basically you'll have to read the documentation for all filesystem and come up with a complete list. Here's the NTFS one as a starting point:
ILLEGAL_NTFS_CHARS = "[<>:/\\|?*\"]|[\0-\31]"
def __removeIllegalChars(name):
# removes characters that are invalid for NTFS
return re.sub(ILLEGAL_NTFS_CHARS, "", name)
And then you need some "forbidden" name list as well to get rid of COM. Pretty much a complete mess that.. and that's ignoring linux (although there it's pretty relaxed afaik)
Do not end a file or directory name with a space or a period. Although
the underlying file system may support such names, the Windows shell
and user interface does not.
http://msdn.microsoft.com/en-us/library/aa365247.aspx#naming_conventions
That page will give you information about other illegal names too, for Windows that is. Including CON as you said your self.
If you respect those (seemingly harsh) rules, I think you'll be safe on Linux and most other systems too.
To fix a problem in code for work, I was told to "use a path relative to ~". What does ~ mean in a file path? How can I make a path that is relative to ~, and use that path to open files in Python?
it is your $HOME var in UNIX, which usually is /home/username.
"Your home" meaning the home of the user who's executing a command like cd ~/MyDocuments/ is cd /home/user_executing_cd_commnd/MyDocuments
Unless you're writing a shell script or using some other language that knows to substitute the value of $HOME for ~, tildes in file paths have no special meaning and will be treated as any other non-special character.
If you are writing a shell script, shells don't interpret tildes unless they occur as the first character in an argument. In other words, ~/file will become /path/to/users/home/directory/file, but ./~/file will be interpreted literally (i.e., "a file called file in a subdirectory of . called ~").
Used in URLs, interpretation of the tilde as a shorthand for a user's home directory (e.g., http://www.foo.org/~bob) is a convention borrowed from Unix. Implementation is entirely server-specific, so you'd need to check the documentation for your web server to see if it has any special meaning.
If you are using pathlib for filenames then you can use on both Windows and Linux (I came here for a windows answer):
from pathlib import Path
p = Path('~').expanduser()
print(p)
Can all paths in a Python program use ".." (for the parent directory) and / (for separating path components), and still work whatever the platform?
On one hand, I have never seen such a claim in the documentation (I may have missed it), and the os and os.path modules do provide facilities for handling paths in a platform agnostic way (os.pardir, os.path.join,…), which lets me think that they are here for a reason.
On the other hand, you can read on StackOverflow that "../path/to/file" works on all platforms…
So, should os.pardir, os.path.join and friends always be used, for portability purposes, or are Unix path names always safe (up to possible character encoding issues)? or maybe "almost always" safe (i.e. working under Windows, OS X, and Linux)?
I've never had any problems with using .., although it might be a good idea to convert it to an absolute path using os.path.abspath. Secondly, I would recommend always using os.path.join whereever possible. There are a lot of corner cases (aside from portability issues) in joining paths, and it's good not to have to worry about them. For instance:
>>> '/foo/bar/' + 'qux'
'/foo/bar/qux'
>>> '/foo/bar' + 'qux'
'/foo/barqux'
>>> from os.path import join
>>> join('/foo/bar/', 'qux')
'/foo/bar/qux'
>>> join('/foo/bar', 'qux')
'/foo/bar/qux'
You may run into problems with using .. if you're on some obscure platforms, but I can't name any (Windows, *nix, and OS X all support that notation).
"Almost always safe" is right. All of the platforms you care about probably work ok today and I don't think they will be changing their conventions any time soon.
However Python is very portable and runs on a lot more than the usual platforms. The reason for the os module is to help smooth things over it a platform does have different requirements.
Is there a good reason for you to not use the os functions?
os.pardir is self documenting whereas ".." isn't, and os.pardir might be easier to grep for
Here is some docs from python 1.6 when Mac was still different for everything
OS routines for Mac, DOS, NT, or Posix depending on what system we're
on.
This exports:
- all functions from posix, nt, dos, os2, mac, or ce, e.g. unlink, stat, etc.
- os.path is one of the modules posixpath, ntpath, macpath, or dospath
- os.name is 'posix', 'nt', 'dos', 'os2', 'mac', or 'ce'
- os.curdir is a string representing the current directory ('.' or ':')
- os.pardir is a string representing the parent directory ('..' or '::')
- os.sep is the (or a most common) pathname separator ('/' or ':' or '\')
- os.altsep is the alternate pathname separator (None or '/')
- os.pathsep is the component separator used in $PATH etc
- os.linesep is the line separator in text files (' ' or ' ' or ' ')
- os.defpath is the default search path for executables
Programs that import and use 'os' stand a better chance of being
portable between different platforms. Of course, they must then only
use functions that are defined by all platforms (e.g., unlink and
opendir), and leave all pathname manipulation to os.path (e.g., split
and join).
Within python, using / will always work. You will need to be aware of the OS convention if you want to execute a command in a subshell
myprog = "/path/to/my/program"
os.system([myprog, "-n"]) # 1
os.system([myprog, "C:/input/file/to/myprog"]) # 2
Command #1 will probably work as expected.
Command #2 might not work if myprog is a Windows command and expects to parse its command line arguments to get a Windows file name.
It works on Windows, so if you define "whatever the platform" to be Unix and Windows, you're fine.
On the other hand, Python also runs on VMS, RISC OS, and other odd platforms that use completely different filename conventions. However, it's probable that trying to get your application to run on VMS, blind, is kind of silly anyway - "premature portability is the root of some relatively minor evil"
I like using the os.path functions anyway because they are good for expressing intent - instead of just a string concatenation, which might be done for any of a million purposes, it reads very explicitly as a path manipulation.
Windows supports / as a path separator. The only incompatibilities between Unix filenames and Windows filenames are:
the allowed characters in filenames
the special names and
case sensitivity
Windows is more restrictive in the first two accounts (this is, it has more forbidden characters and more special names), while Unix is typically case sensitive. There are some answers here listing exactly what are these characters and names. I'll see if I can find them.
Now, if your development environment comes with a function to create or manipulate paths, you should use it, it's there for a reason, y'know. Especially given that there are a lot more platforms than Windows and Unix.
Answering your first question, yes ../dir/file will work, unless they hit some of the above mentioned incompatibilities.
OS/X and Linux are both Unix compatible, so by definition they use the format you gave at the beginning of the question. Windows allows "/" in addition to "\" so that programs could be interchangeable with Xenix, a Unix variant that Microsoft was trying out a long time ago, and that compatibility has been carried forward to the present. Thus it works too.
I don't know how many other platforms Python has been ported to, and I can't speak for them.
As others have said, a forward slash will work in all cases, but you're better off creating a list of path segments and os.path.join()-ing them.