Python os.path.join on Windows ignores first path element?

Consider the following:
>>> from django.conf import settings
>>> import os
>>> settings.VIRTUAL_ENV
'C:/Users/Marcin/Documents/oneclickcos'
>>> settings.EXTRA_BASE
'/oneclickcos/'
>>> os.path.join(settings.VIRTUAL_ENV,settings.EXTRA_BASE)
'/oneclickcos/'
As you can imagine, I neither expect nor want the concatenation of 'C:/Users/Marcin/Documents/oneclickcos' and '/oneclickcos/' to be '/oneclickcos/'.
Oddly enough, reversing the path components once again shows python ignoring the first path component:
>>> os.path.join(settings.EXTRA_BASE,settings.VIRTUAL_ENV)
'C:/Users/Marcin/Documents/oneclickcos'
While this works something like expected:
>>> os.path.join('/foobar',settings.VIRTUAL_ENV,'barfoo')
'C:/Users/Marcin/Documents/oneclickcos\\barfoo'
I am, of course, running on Windows (Windows 7), with the native Python.
Why is this happening, and what can I do about it?

That's pretty much how os.path.join is defined (quoting the docs):
If any component is an absolute path, all previous components (on Windows, including the previous drive letter, if there was one) are thrown away
And I'd say it's usually a good thing, as it avoids creating invalid paths. If you want to avoid this behavior, don't feed it absolute paths. Yes, a path starting with a slash qualifies as an absolute path. A quick and dirty solution is to remove the leading slash (settings.EXTRA_BASE.lstrip('/') if you want to do it programmatically).
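A minimal sketch of the lstrip workaround, using the paths from the question. It calls ntpath (the Windows flavour of os.path) directly so that it behaves the same on any platform; the variable names are illustrative. The exact result of joining the rooted component differs between Python versions (recent Python 3 releases keep the drive letter), but the earlier directory components are dropped either way.

```python
import ntpath  # the Windows implementation of os.path, importable on any OS

virtual_env = "C:/Users/Marcin/Documents/oneclickcos"
extra_base = "/oneclickcos/"

# A component starting with a slash is rooted, so the directories
# that came before it are thrown away.
discarded = ntpath.join(virtual_env, extra_base)

# Stripping the leading slash makes the component relative,
# so it is appended to the first path instead of replacing it.
joined = ntpath.join(virtual_env, extra_base.lstrip("/"))

print(discarded)
print(joined)
```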

Remove the leading / from the second string:
>>> os.path.join('C:/Users/Marcin/Documents/oneclickcos', 'oneclickcos/')
'C:/Users/Marcin/Documents/oneclickcos\\oneclickcos/'
This is because os.path.join discards all previous components once it meets an absolute path, and /oneclickcos/ is an absolute path.
Here's an excerpt from the doc of os.path.join:
Join one or more path components intelligently. If any component is an
absolute path, all previous components (on Windows, including the
previous drive letter, if there was one) are thrown away, and joining
continues. [...]

Related

Pathlib 'normalizes' UNC paths with "$"

On Python3.8, I'm trying to use pathlib to concatenate a string to a UNC path that's on a remote computer's C drive.
It's weirdly inconsistent.
For example:
>>> remote = Path("\\\\remote\\", "C$\\Some\\Path")
>>> remote
WindowsPath('//remote//C$/Some/Path')
>>> remote2 = Path(remote, "More")
>>> remote2
WindowsPath('/remote/C$/Some/Path/More')
Notice how the initial // is turned into /?
Put the initial path in one line though, and everything is fine:
>>> remote = Path("\\\\remote\\C$\\Some\\Path")
>>> remote
WindowsPath('//remote/C$/Some/Path')
>>> remote2 = Path(remote, "more")
>>> remote2
WindowsPath('//remote/C$/Some/Path/more')
This works as a workaround, but I suspect I'm misunderstanding how it's supposed to work or doing it wrong.
Anyone got a clue what's happening?
tl;dr: you should give the entire UNC share (\\\\host\\share) as a single unit. pathlib has special-case handling of UNC paths, but it needs exactly this prefix in order to recognize a path as UNC. You can't use pathlib's facilities to manage host and share separately; doing so makes pathlib blow a gasket.
The Path constructor normalises (deduplicates) path separators:
>>> PPP('///foo//bar////qux')
PurePosixPath('/foo/bar/qux')
>>> PWP('///foo//bar////qux')
PureWindowsPath('/foo/bar/qux')
PureWindowsPath has a special case for paths recognised as UNC, that is //host/share... which avoids collapsing leading separators.
However, your initial concatenation puts it in a weird funk because it creates a path of the form //host//share... The path then gets converted back to a string when passed to the constructor, at which point it no longer matches a UNC path and all the separators get collapsed:
>>> PWP("\\\\remote\\", "C$\\Some\\Path")
PureWindowsPath('//remote//C$/Some/Path')
>>> str(PWP("\\\\remote\\", "C$\\Some\\Path"))
'\\\\remote\\\\C$\\Some\\Path'
>>> PWP(str(PWP("\\\\remote\\", "C$\\Some\\Path")))
PureWindowsPath('/remote/C$/Some/Path')
The issue seems to be specifically the presence of a trailing separator on a UNC-looking path. I don't know if it's a bug or if it's matching some other UNC-style (but not UNC) special case:
>>> PWP("//remote")
PureWindowsPath('/remote')
>>> PWP("//remote/")
PureWindowsPath('//remote//') # this one is weird, the trailing separator gets doubled which breaks everything
>>> PWP("//remote/foo")
PureWindowsPath('//remote/foo/')
>>> PWP("//remote//foo")
PureWindowsPath('/remote/foo')
These behaviours don't really seem to be documented. The pathlib docs specifically note that it collapses path separators, and have a few UNC examples which show that it doesn't, but I don't really know what's supposed to happen exactly. Either way, it only seems to handle UNC paths somewhat properly if the first two segments are kept as a single "drive" unit, and the fact that the share-path is considered a drive is specifically documented.
Of note: using joinpath or the / operator doesn't seem to trigger a re-normalisation. Your path remains improper (because the second path separator between host and share remains doubled), but it doesn't get completely collapsed.
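As a sketch of the advice above, the following passes host and share together and only then appends further segments. It uses PureWindowsPath so it runs on any operating system; the host and share names are the ones from the question, and the variable names are illustrative.

```python
from pathlib import PureWindowsPath

# Host and share are given as one unit, so pathlib sees a complete UNC drive.
share = PureWindowsPath(r"\\remote\C$")

# Further segments are appended with the / operator.
remote = share / "Some" / "Path"
remote2 = remote / "More"

print(remote2.drive)       # the host-and-share prefix is treated as the drive
print(remote2.as_posix())  # the leading double separator is preserved
```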

How to save file with '/' in the filename in python on Mac OS? [duplicate]

I know that this is not something that should ever be done, but is there a way to use the slash character that normally separates directories within a filename in Linux?
The answer is that you can't, unless your filesystem has a bug. Here's why:
There is a system call for renaming your file defined in fs/namei.c called renameat:
SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
int, newdfd, const char __user *, newname)
When the system call gets invoked, it does a path lookup (do_path_lookup) on the name. Keep tracing this, and we get to link_path_walk which has this:
static int link_path_walk(const char *name, struct nameidata *nd)
{
struct path next;
int err;
unsigned int lookup_flags = nd->flags;
while (*name=='/')
name++;
if (!*name)
return 0;
...
This code applies to any file system. What does this mean? It means that if you try to pass a parameter with an actual '/' character as the name of the file using traditional means, it will not do what you want. There is no way to escape the character. If a filesystem "supports" this, it's because it either:
Uses a Unicode character or something else that looks like a slash but isn't.
Has a bug.
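The effect is easy to observe from Python. This is a small sketch, assuming a scratch directory created with tempfile; the name "foo/bar" is only an illustration. The slash is treated as a directory separator, so the call looks for a directory named foo rather than creating a file whose name contains a slash.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # "foo/bar" is parsed as a file "bar" inside a directory "foo",
    # and that directory does not exist.
    target = os.path.join(tmp, "foo/bar")
    try:
        with open(target, "w"):
            pass
        created = True
    except OSError:
        created = False

print(created)
```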
Furthermore, if you did go in and edit the bytes to add a slash character into a file name, bad things would happen. That's because you could never refer to this file by name :( since anytime you did, Linux would assume you were referring to a nonexistent directory. Using the 'rm *' technique would not work either, since bash simply expands that to the filename. Even rm -rf wouldn't work, since a simple strace reveals how things go on under the hood (shortened):
$ ls testdir
myfile2 out
$ strace -vf rm -rf testdir
...
unlinkat(3, "myfile2", 0) = 0
unlinkat(3, "out", 0) = 0
fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
close(3) = 0
unlinkat(AT_FDCWD, "testdir", AT_REMOVEDIR) = 0
...
Notice that for a file with a slash in its name, these calls to unlinkat would fail, because they need to refer to each file by name.
You could use a Unicode character that displays as / (for example the fraction slash), assuming your filesystem supports it.
It depends on what filesystem you are using. Of some of the more popular ones:
ext3: No
ext4: No
jfs: Yes
reiserfs: No
xfs: No
Only with an agreed-upon encoding. For example, you could agree that % will be encoded as %% and that %2F will mean a /. All the software that accessed this file would have to understand the encoding.
The short answer is: No, you can't. It's a necessary prohibition because of how the directory structure is defined.
And, as mentioned, you can display a unicode character that "looks like" a slash, but that's as far as you get.
In general it's a bad idea to try to use "bad" characters in a file name at all; even if you somehow manage it, it tends to make it hard to use the file later. The filesystem separator is flat-out not going to work at all, so you're going to need to pick an alternative method.
Have you considered URL-encoding the URL then using that as the filename? The result should be fine as a filename, and it's easy to reconstruct the name from the encoded version.
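A minimal sketch of the URL-encoding idea, using urllib.parse from the standard library. The URL is a made-up example; passing safe="" matters because quote leaves '/' unescaped by default.

```python
from urllib.parse import quote, unquote

url = "http://example.com/a/b?x=1"  # hypothetical URL

# safe="" forces '/' to be percent-encoded as well.
filename = quote(url, safe="")

# The original URL can be recovered from the filename.
restored = unquote(filename)

print(filename)
print(restored)
```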
Another option is to create an index - create the output filename using whatever method you like - sequentially-numbered names, SHA1 hashes, whatever - then write a file with the generated filename/URL pair. You can save that into a hash and use it to do a URL-to-filename lookup or vice-versa with the reversed version of the hash, and you can write it out and reload it later if needed.
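The index approach could look roughly like this. It is a sketch under assumptions: SHA-1 hex digests serve as filenames, the index is a plain dict serialised with json, and the helper names and URLs are hypothetical.

```python
import hashlib
import json

def filename_for(url):
    """Derive a filesystem-safe name from a URL (hex digits only)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()

urls = ["http://example.com/a/b", "http://example.com/c?d=e"]  # made-up URLs

# filename -> URL, plus the reversed mapping for URL -> filename lookups.
index = {filename_for(u): u for u in urls}
reverse_index = {u: name for name, u in index.items()}

# The index can be written out and reloaded later.
serialized = json.dumps(index)
reloaded = json.loads(serialized)

print(reverse_index[urls[0]])
```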
The short answer is: you must not. The long answer is: you probably can, but it depends on where you are viewing it from and which layer you are working with.
Since the question has the Unix tag on it, I am going to answer for Unix.
As mentioned in other answers, you must not use forward slashes in a filename.
However, in MacOS you can create a file with forward slashes / by:
# avoid doing it at all cost
touch 'foo:bar'
Now, when you view this filename from the terminal, you will see it as foo:bar
But if you view it from Finder, you will see that Finder displays it as foo/bar
The same thing can be done the other way round: if you create a file from Finder with a forward slash in its name, like /foobar, a conversion is done in the background. As a result, you will see :foobar in the terminal but /foobar in Finder.
So : is valid in the Unix layer, but it is translated to or from / in the Mac layers such as the Finder window and the GUI. The colon : is used as the separator in HFS paths, and the slash / is used as the separator in POSIX paths.
So there is a two-way translation happening, depending on which “layer” you are working with.
See more details here: https://apple.stackexchange.com/a/283095/323181
You can have a filename with a / in Linux and Unix. This is a very old question, but surprisingly nobody has said it in almost 10 years since the question was asked.
Every Unix and Linux system has the root directory named /. A directory is just a special kind of file. Symbolic links, character devices, etc are also special kinds of files. See here for an in depth discussion.
You can't create any other files with a /, but you certainly have one -- and a very important one at that.

Pythonic way to expand the ~ in a path?

(Linux/Python 3.5)
I want to expand the ~ character in strings like "~/something" and get something like "/home/something/".
I don't want to write a replacement of my own, since I'd like a very general way to achieve this, for example by using a Python module.
Neither os.path nor pathlib seems to fit my expectations.
Any ideas?
You apparently missed the os.path.expanduser() function:
On Unix and Windows, return the argument with an initial component of ~ or ~user replaced by that user's home directory.
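A short sketch of both standard-library options. The input string is the one from the question; the result depends on the home directory of whoever runs it, so no output is shown.

```python
import os.path
from pathlib import Path

raw = "~/something"

# String in, string out.
expanded_str = os.path.expanduser(raw)

# pathlib equivalent (Path.expanduser was added in Python 3.5).
expanded_path = Path(raw).expanduser()

print(expanded_str)
print(expanded_path)
```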

Grab the last 2 of a split string, python

I've got a set of file directories that I am manipulating with python. However, all I care about is the last two levels of the directory. So if I had
"topdirectory/sub1/subsub1/subsubsub1/target"
"topdirectory/sub1/target"
The necessary returned strings would be
"subsubsub1/target"
and
"sub1/target"
I know Python has a split string type method, but how can I tell it to only grab the LAST 2 components separated by delimiters?
Edit: Sorry guys, I should have explained that this is not REALLY a directory/file setup, but a timeseries DB that very closely resembles one. I figured it would just be easier to explain that way. The paths are essentially directories/files, but since it is a database, using the OS utilities wouldn't have any effect.
The os.path module contains a split function for this. It returns the dirname and the basename. Run it twice and you have the last two bases.
Obviously, you want some checking that there are two or more bases as well.
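Running split twice could be sketched as follows. The helper name last_two is hypothetical, and posixpath is used instead of os.path on the assumption that the paths are always '/'-separated, whatever the operating system.

```python
import posixpath

def last_two(path):
    """Return the last two '/'-separated components of path."""
    head, last = posixpath.split(path)
    head, second_last = posixpath.split(head)
    if not second_last:
        # Fewer than two components: return what there is.
        return last
    return posixpath.join(second_last, last)

print(last_two("topdirectory/sub1/subsub1/subsubsub1/target"))
print(last_two("topdirectory/sub1/target"))
```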
Try
"topdirectory/sub1/subsub1/subsubsub1/target".rsplit('/',2)[-2:]
This approach works for any string in general.
But as stated in the comments, if you are referring to system paths, I'd rather use the os module as suggested by Sean Perry. Note that the delimiter can differ between operating systems.
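Note that the rsplit expression returns a list of components. A small sketch that joins them back into the string the question asks for (the function name and the sep parameter are illustrative):

```python
def last_two_components(path, sep="/"):
    """Keep only the last two sep-separated components of path."""
    # rsplit splits from the right, at most twice; the slice drops the prefix.
    return sep.join(path.rsplit(sep, 2)[-2:])

print(last_two_components("topdirectory/sub1/subsub1/subsubsub1/target"))
print(last_two_components("topdirectory/sub1/target"))
```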

Check if a string is valid absolute path address format

I have a string which contains user input for a directory address on a linux system. I need to check if it is properly formatted and could be an address in Python 2.6. It's important to note that this is not on the current system so I can't check if it is there using os.path nor can I try to create the directories as the function will be run many times.
These strings will always be absolute paths, so my first thought was to look for a leading slash. From there I wondered about checking if the rest of the string only contains valid characters and does not contain any double slashes. This seems a little clunky, any other ideas?
The question may have been edited since I wrote this, but:
There is os.path.isabs(path), which will tell you whether the path is absolute or not.
Return True if path is an absolute pathname. On Unix, that means it begins with a slash, on Windows that it begins with a (back)slash after chopping off a potential drive letter.
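A rough sketch combining isabs with the checks the question proposes. It uses posixpath explicitly so that Linux rules apply on whatever system runs the check, and it never touches the filesystem. The function name is hypothetical, and rejecting double slashes is the asker's own rule (POSIX itself tolerates them), so treat that part as an assumption.

```python
import posixpath

def looks_like_absolute_posix_path(text):
    """Syntactic check only; does not test whether the path exists."""
    if not text or "\0" in text:
        return False          # empty strings and NUL bytes are never valid
    if not posixpath.isabs(text):
        return False          # must begin with a slash
    if "//" in text:
        return False          # asker's rule: no doubled slashes
    return True

print(looks_like_absolute_posix_path("/var/log/app"))
print(looks_like_absolute_posix_path("var/log/app"))
```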
