Pathlib 'normalizes' UNC paths with "$" - python

On Python 3.8, I'm trying to use pathlib to concatenate a string to a UNC path that's on a remote computer's C drive.
It's weirdly inconsistent.
For example:
>>> remote = Path("\\\\remote\\", "C$\\Some\\Path")
>>> remote
WindowsPath('//remote//C$/Some/Path')
>>> remote2 = Path(remote, "More")
>>> remote2
WindowsPath('/remote/C$/Some/Path/More')
Notice how the initial // is turned into /?
Put the initial path in one line though, and everything is fine:
>>> remote = Path("\\\\remote\\C$\\Some\\Path")
>>> remote
WindowsPath('//remote/C$/Some/Path')
>>> remote2 = Path(remote, "more")
>>> remote2
WindowsPath('//remote/C$/Some/Path/more')
This works as a workaround, but I suspect I'm misunderstanding how it's supposed to work or doing it wrong.
Anyone got a clue what's happening?

tl;dr: give the entire UNC share (\\\\host\\share) as a single unit. pathlib has special-case handling of UNC paths, but it needs exactly this prefix in order to recognise a path as UNC. You can't use pathlib's facilities to manage host and share separately; that makes pathlib blow a gasket.
The Path constructor normalises (deduplicates) path separators:
>>> PPP('///foo//bar////qux')
PurePosixPath('/foo/bar/qux')
>>> PWP('///foo//bar////qux')
PureWindowsPath('/foo/bar/qux')
PureWindowsPath has a special case for paths recognised as UNC, that is //host/share... which avoids collapsing leading separators.
However, your initial concatenation puts it in a weird funk: it creates a path of the form //host//share..., and when that path is converted back to a string and passed to the constructor, it no longer matches a UNC path, so all the separators get collapsed:
>>> PWP("\\\\remote\\", "C$\\Some\\Path")
PureWindowsPath('//remote//C$/Some/Path')
>>> str(PWP("\\\\remote\\", "C$\\Some\\Path"))
'\\\\remote\\\\C$\\Some\\Path'
>>> PWP(str(PWP("\\\\remote\\", "C$\\Some\\Path")))
PureWindowsPath('/remote/C$/Some/Path')
The issue seems to be specifically the presence of a trailing separator on a UNC-looking path. I don't know if it's a bug or if it's matching some other UNC-style (but not UNC) special case:
>>> PWP("//remote")
PureWindowsPath('/remote')
>>> PWP("//remote/")
PureWindowsPath('//remote//') # this one is weird, the trailing separator gets doubled which breaks everything
>>> PWP("//remote/foo")
PureWindowsPath('//remote/foo/')
>>> PWP("//remote//foo")
PureWindowsPath('/remote/foo')
These behaviours don't really seem to be documented. The pathlib docs specifically note that it collapses path separators, and they have a few UNC examples showing that it doesn't, but I don't really know what's supposed to happen exactly. Either way, it only seems to handle UNC paths somewhat properly if the first two segments are kept as a single "drive" unit; that the host and share together are considered a drive is specifically documented.
Of note: using joinpath or the / operator doesn't seem to trigger a re-normalisation. Your path remains improper (because the second path separator between host and share remains doubled), but it doesn't get completely collapsed.
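A minimal sketch of the single-unit approach described above, using PureWindowsPath so that it runs on any OS. The helper name unc_path and the host and share values are hypothetical, chosen only for illustration:

```python
from pathlib import PureWindowsPath

def unc_path(host, share, *parts):
    # Hypothetical helper: build \\host\share as ONE string first, so that
    # pathlib sees the whole UNC prefix and treats it as the drive.
    root = PureWindowsPath("\\\\{}\\{}".format(host, share))
    return root.joinpath(*parts)

remote = unc_path("remote", "C$", "Some", "Path")
remote2 = remote / "More"

print(remote.drive)  # \\remote\C$
print(remote2)       # \\remote\C$\Some\Path\More
```

Because host and share never pass through pathlib as separate components, the doubled separator between them never appears.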

Related

str.replace backslash with forward slash

I would like to replace the backslash \ in a windows path with forward slash / using python.
Unfortunately I've been trying for hours but I cannot solve this issue. I saw other questions here but I still cannot find a solution.
Can someone help me?
This is what I'm trying:
path = "\\ftac\admin\rec\pir"
path = path.replace("\", "/")
But I get an error (SyntaxError: EOL while scanning string literal) and it does not return the path I want:
//ftac/admin/rec/pir, how can I solve it?
I also tried path = path.replace(os.sep, "/") or path = path.replace("\\", "/") but with both methods the first double backslash becomes single and the \a was deleted..
Oh boy, this is a bit more complicated than first appears.
Your problem is that you have stored your windows paths as normal strings, instead of raw strings. The conversion from strings to their raw representation is lossy and ugly.
This is because when you write a string like "\a", the interpreter sees the special character "\x07".
This means you have to know in advance which of these special characters to expect, then [lossily] hack them back when you see them (as in this example):
def str_to_raw(s):
    raw_map = {8:r'\b', 7:r'\a', 12:r'\f', 10:r'\n', 13:r'\r', 9:r'\t', 11:r'\v'}
    return r''.join(i if ord(i) > 32 else raw_map.get(ord(i), i) for i in s)
>>> str_to_raw("\\ftac\admin\rec\pir")
'\\ftac\\admin\\rec\\pir'
Now you can use the pathlib module, which can handle paths in a system-agnostic way. In your case, you know you have Windows-like paths as input, so you can use it as follows:
import pathlib

def fix_path(path):
    # get proper raw representation
    path_fixed = str_to_raw(path)
    # read in as Windows path, convert to posix string
    return pathlib.PureWindowsPath(path_fixed).as_posix()
>>> fix_path("\\ftac\admin\rec\pir")
'/ftac/admin/rec/pir'
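For comparison, a short sketch of what happens when the path is written as a raw string in the first place, in which case no repair step is needed. The variable names are illustrative only:

```python
import pathlib

# Raw string: backslashes are kept literally, nothing is read as an escape.
path = r"\\ftac\admin\rec\pir"

# Plain string replacement; "\\" is a single backslash character.
replaced = path.replace("\\", "/")

# Same result via pathlib, treating the input as a Windows path.
posix = pathlib.PureWindowsPath(path).as_posix()

print(replaced)  # //ftac/admin/rec/pir
print(posix)     # //ftac/admin/rec/pir
```

Here the leading double backslash survives, which is the output the question asked for.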

Fix undesired escape sequences in path

I have a path in a variable like that:
path = "C:\HT_Projeler\7\Kaynak\wrapped_gedizw.tif"
Which is incorrect because it contains escape sequences:
>>> path
'C:\\HT_Projeler\x07\\Kaynak\\wrapped_gedizw.tif'
How can I fix the path in this variable so it becomes equivalent to r"C:\HT_Projeler\7\Kaynak\wrapped_gedizw.tif" or "C:/HT_Projeler/7/Kaynak/wrapped_gedizw.tif"?
I know the topic is common and I investigated many questions (1,2 etc.) in here.
ADD
Here is my exact script:
...
basinFile = self._gv.basinFile
basinDs = gdal.Open(basinFile, gdal.GA_ReadOnly)
basinNumberRows = basinDs.RasterYSize
basinNumberCols = basinDs.RasterXSize
...
Here self._gv.basinFile contains my path, so I cannot put "r" at the beginning of self._gv.basinFile.
If you insert paths in Python code, just use raw strings, as others have suggested.
If instead that string is out of your control, there's not much you can do "after the fact". Escape sequences conversion is not injective, so, given a string where escape sequences have already been processed, you cannot "go back" univocally. IOW, if someone incorrectly writes:
path = "C:\HT_Projeler\7\Kaynak\wrapped_gedizw.tif"
as you show, you get
'C:\\HT_Projeler\x07\\Kaynak\\wrapped_gedizw.tif'
and there's no way to guess surely "what they meant", because that \x07 may have been written as \7, or \x07, or \a. Heck, any letter may have been originally written as an escape sequence - what you see in that string as an a may have actually been \x61.
Long story short: your caller is responsible for giving you correct data. Once it's corrupted there's no way to come back.
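A small sketch of why the conversion cannot be undone: several different spellings in source code collapse to the same character, so the parsed string carries no trace of which one was written. The two path literals are shortened, illustrative variants of the one in the question:

```python
# Three different spellings in source code produce the very same character.
assert "\7" == "\x07" == "\a"

# Ordinary letters can also be written as escapes, so they are ambiguous too.
assert "\x61" == "a"

# Hence these two literals are indistinguishable once parsed.
p1 = "C:\\HT_Projeler\7\\Kaynak"
p2 = "C:\\HT_Projeler\a\\Kaynak"
print(p1 == p2)  # True
```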
In the general case, there is no way to tell whether a character in a path is correct or not without externally checking the actual paths on your computer (and "special character" is not really well-defined; how do you know that the path wasn't \x41, which got converted to A anyway?)
As a weak heuristic, you could look for path names within a particular editing distance, for example.
import os
from difflib import SequenceMatcher as similarity  # or whatever

# note: os.path.split only yields (head, tail), so this checks two levels
path_components = os.path.split(variable)
path = ''
for p in path_components:
    npath = os.path.join(path, p)
    if path and not os.path.exists(npath):
        # rank the entries of the parent directory by similarity to p
        similar = sorted(((similarity(None, x, p).ratio(), x) for x in os.listdir(path)), reverse=True)
        # recurse on most similar, second most similar, etc? or something
    path = npath
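A slightly more complete sketch of the same heuristic, under the assumption that the caller supplies a directory known to exist plus the list of components below it. The function name guess_existing_path and the cutoff value are hypothetical. It uses difflib.get_close_matches to pick the most similar existing entry at each level. A component that is corrupted in its entirety (such as "\x07" standing in for "7") has no similarity to the real name and cannot be recovered this way.

```python
import os
from difflib import get_close_matches

def guess_existing_path(root, components, cutoff=0.6):
    """Walk `components` below `root`, replacing each missing name
    with its closest existing sibling.

    Returns the repaired path, or None when no plausible match exists.
    """
    current = root
    for name in components:
        candidate = os.path.join(current, name)
        if not os.path.exists(candidate):
            if not os.path.isdir(current):
                return None
            # Closest existing entry by difflib similarity ratio.
            matches = get_close_matches(name, os.listdir(current), n=1, cutoff=cutoff)
            if not matches:
                return None
            candidate = os.path.join(current, matches[0])
        current = candidate
    return current
```

This is only a guess: a wrong but similar-looking entry would be accepted just as readily as the right one.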

How to save file with '/' in the filename in python on Mac OS? [duplicate]

I know that this is not something that should ever be done, but is there a way to use the slash character that normally separates directories within a filename in Linux?
The answer is that you can't, unless your filesystem has a bug. Here's why:
There is a system call for renaming your file defined in fs/namei.c called renameat:
SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
                int, newdfd, const char __user *, newname)
When the system call gets invoked, it does a path lookup (do_path_lookup) on the name. Keep tracing this, and we get to link_path_walk which has this:
static int link_path_walk(const char *name, struct nameidata *nd)
{
    struct path next;
    int err;
    unsigned int lookup_flags = nd->flags;

    while (*name=='/')
        name++;
    if (!*name)
        return 0;
    ...
This code applies to any file system. What's this mean? It means that if you try to pass a parameter with an actual '/' character as the name of the file using traditional means, it will not do what you want. There is no way to escape the character. If a filesystem "supports" this, it's because they either:
Use a unicode character or something that looks like a slash but isn't.
They have a bug.
Furthermore, if you did go in and edit the bytes to add a slash character into a file name, bad things would happen. That's because you could never refer to this file by name :( since anytime you did, Linux would assume you were referring to a nonexistent directory. Using the 'rm *' technique would not work either, since bash simply expands that to the filename. Even rm -rf wouldn't work, since a simple strace reveals how things go on under the hood (shortened):
$ ls testdir
myfile2 out
$ strace -vf rm -rf testdir
...
unlinkat(3, "myfile2", 0) = 0
unlinkat(3, "out", 0) = 0
fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
close(3) = 0
unlinkat(AT_FDCWD, "testdir", AT_REMOVEDIR) = 0
...
Notice that these calls to unlinkat would fail because they need to refer to the files by name.
You could use a Unicode character that displays as / (for example the fraction slash), assuming your filesystem supports it.
It depends on what filesystem you are using. Of some of the more popular ones:
ext3: No
ext4: No
jfs: Yes
reiserfs: No
xfs: No
Only with an agreed-upon encoding. For example, you could agree that % will be encoded as %% and that %2F will mean a /. All the software that accessed this file would have to understand the encoding.
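A minimal sketch of such an agreed-upon encoding, following the two rules just stated (% becomes %%, / becomes %2F). The function names encode_name and decode_name are hypothetical:

```python
import re

def encode_name(name):
    # Escape "%" first so that a literal "%2F" in the input stays unambiguous.
    return name.replace("%", "%%").replace("/", "%2F")

def decode_name(encoded):
    # Scan left to right: "%%" becomes "%", "%2F" becomes "/".
    return re.sub(r"%(%|2F)", lambda m: "%" if m.group(1) == "%" else "/", encoded)

print(encode_name("a/b"))                   # a%2Fb
print(decode_name(encode_name("50%/2F")))   # 50%/2F
```

Decoding has to be done in a single pass; two chained str.replace calls would misread an encoded literal "%2F".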
The short answer is: No, you can't. It's a necessary prohibition because of how the directory structure is defined.
And, as mentioned, you can display a unicode character that "looks like" a slash, but that's as far as you get.
In general it's a bad idea to try to use "bad" characters in a file name at all; even if you somehow manage it, it tends to make it hard to use the file later. The filesystem separator is flat-out not going to work at all, so you're going to need to pick an alternative method.
Have you considered URL-encoding the URL then using that as the filename? The result should be fine as a filename, and it's easy to reconstruct the name from the encoded version.
Another option is to create an index - create the output filename using whatever method you like - sequentially-numbered names, SHA1 hashes, whatever - then write a file with the generated filename/URL pair. You can save that into a hash and use it to do a URL-to-filename lookup or vice-versa with the reversed version of the hash, and you can write it out and reload it later if needed.
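Both suggestions can be sketched with the standard library. The URL below is a made-up example, and the dictionary-based index is just one possible way to store the filename/URL pairs:

```python
import hashlib
from urllib.parse import quote, unquote

url = "https://example.com/a/b?x=1"  # hypothetical URL

# Option 1: percent-encode everything unsafe, including "/" (safe="").
filename = quote(url, safe="")
print(filename)           # https%3A%2F%2Fexample.com%2Fa%2Fb%3Fx%3D1
print(unquote(filename))  # https://example.com/a/b?x=1

# Option 2: an opaque generated name plus an index mapping it back to the URL.
hashed = hashlib.sha1(url.encode("utf-8")).hexdigest()
index = {hashed: url}                              # filename -> URL
reverse_index = {v: k for k, v in index.items()}   # URL -> filename
```

Passing safe="" matters: by default quote leaves "/" unescaped, which would defeat the purpose here.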
The short answer is: you must not. The long answer is, you probably can or it depends on where you are viewing it from and in which layer you are working with.
Since the question has Unix tag in it, I am going to answer for Unix.
As mentioned in other answers, you must not use forward slashes in a filename.
However, in MacOS you can create a file with forward slashes / by:
# avoid doing it at all cost
touch 'foo:bar'
Now, when you look at this filename from the terminal, you will see it as foo:bar
But if you look at it in Finder, you will see that Finder displays it as foo/bar
The same thing works the other way round: if you create a file in Finder with a forward slash in its name, like /foobar, a conversion is done in the background. As a result, you will see :foobar in the terminal, but /foobar when viewed in Finder.
So : is valid in the Unix layer, but it is translated to or from / in the Mac layers such as Finder and the GUI. The colon : is used as the separator in HFS paths, and the slash / is used as the separator in POSIX paths.
So there is a two-way translation happening, depending on which “layer” you are working with.
See more details here: https://apple.stackexchange.com/a/283095/323181
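In Python this translation could be applied before creating the file. The sketch below relies entirely on the Finder behaviour described in this answer, and the function name finder_name_to_posix is hypothetical:

```python
def finder_name_to_posix(display_name):
    # Assumption from the answer above: Finder shows ":" as "/", so a name
    # that should *display* with a slash is stored with a colon instead.
    return display_name.replace("/", ":")

on_disk = finder_name_to_posix("foo/bar")
print(on_disk)  # foo:bar

# On macOS, creating this file should then appear as "foo/bar" in Finder:
# open(on_disk, "w").close()
```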
You can have a filename with a / in Linux and Unix. This is a very old question, but surprisingly nobody has said it in almost 10 years since the question was asked.
Every Unix and Linux system has the root directory named /. A directory is just a special kind of file. Symbolic links, character devices, etc are also special kinds of files. See here for an in depth discussion.
You can't create any other files with a /, but you certainly have one -- and a very important one at that.

Check if a string is valid absolute path address format

I have a string which contains user input for a directory address on a linux system. I need to check if it is properly formatted and could be an address in Python 2.6. It's important to note that this is not on the current system so I can't check if it is there using os.path nor can I try to create the directories as the function will be run many times.
These strings will always be absolute paths, so my first thought was to look for a leading slash. From there I wondered about checking if the rest of the string only contains valid characters and does not contain any double slashes. This seems a little clunky, any other ideas?
Sure the question has been edited since writing this but:
There is the os.path.isabs(PATH) which will tell you if the path is absolute or not.
Return True if path is an absolute pathname. On Unix, that means it begins with a slash, on Windows that it begins with a (back)slash after chopping off a potential drive letter.
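Since the strings describe paths on a different (Linux) system, one option is to call posixpath directly, so that the check behaves the same regardless of the OS the script runs on. This is a syntax-only sketch; the function name is hypothetical, and rejecting double slashes is the question's own extra rule, not a POSIX requirement:

```python
import posixpath

def looks_like_abs_posix_path(s):
    """Syntax-only check; never touches the filesystem."""
    if not s or "\0" in s:       # NUL can never appear in a POSIX path
        return False
    if not posixpath.isabs(s):   # must start with "/"
        return False
    # Optional strictness taken from the question: reject empty components.
    return "//" not in s

print(looks_like_abs_posix_path("/var/log/app"))  # True
print(looks_like_abs_posix_path("var/log/app"))   # False
print(looks_like_abs_posix_path("/var//log"))     # False
```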

python os.path.join on windows ignores first path element?

Consider the following:
>>> from django.conf import settings
>>> import os
>>> settings.VIRTUAL_ENV
'C:/Users/Marcin/Documents/oneclickcos'
>>> settings.EXTRA_BASE
'/oneclickcos/'
>>> os.path.join(settings.VIRTUAL_ENV,settings.EXTRA_BASE)
'/oneclickcos/'
As you can imagine, I neither expect nor want the concatenation of 'C:/Users/Marcin/Documents/oneclickcos' and '/oneclickcos/' to be '/oneclickcos/'.
Oddly enough, reversing the path components once again shows python ignoring the first path component:
>>> os.path.join(settings.EXTRA_BASE,settings.VIRTUAL_ENV)
'C:/Users/Marcin/Documents/oneclickcos'
While this works something like expected:
>>> os.path.join('/foobar',settings.VIRTUAL_ENV,'barfoo')
'C:/Users/Marcin/Documents/oneclickcos\\barfoo'
I am of course, running on Windows (Windows 7), with the native python.
Why is this happening, and what can I do about it?
That's pretty much how os.path.join is defined (quoting the docs):
If any component is an absolute path, all previous components (on Windows, including the previous drive letter, if there was one) are thrown away
And I'd say it's usually a good thing, as it avoids creating invalid paths. If you want to avoid this behavior, don't feed it absolute paths. Yes, starting with a slash qualifies as absolute path. A quick and dirty solution is just removing the leading slash (settings.EXTRA_BASE.lstrip('/') if you want to do it programmatically).
Remove the leading / from the second string:
>>> os.path.join('C:/Users/Marcin/Documents/oneclickcos', 'oneclickcos/')
'C:/Users/Marcin/Documents/oneclickcos\\oneclickcos/'
This is because os.path.join discards all previous components once it meets an absolute path, and /oneclickcos/ is an absolute path.
Here's an excerpt from the doc of os.path.join:
Join one or more path components intelligently. If any component is an
absolute path, all previous components (on Windows, including the
previous drive letter, if there was one) are thrown away, and joining
continues. [...]
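The behaviour can be reproduced on any OS by calling the platform-specific modules directly. In this sketch, ntpath stands in for os.path on Windows and the paths are the ones from the question; the "/srv/app" example is made up. The exact result of joining with a rooted component on Windows has differed between Python versions, so only the part that holds either way is shown:

```python
import ntpath      # Windows flavour of os.path, importable on any OS
import posixpath

base = "C:/Users/Marcin/Documents/oneclickcos"
extra = "/oneclickcos/"

# An absolute later component discards the earlier directory components.
print(posixpath.join("/srv/app", "/static"))  # /static
print("Users" in ntpath.join(base, extra))    # False

# Stripping the leading slash makes the second component relative.
print(ntpath.join(base, extra.lstrip("/")))
# C:/Users/Marcin/Documents/oneclickcos\oneclickcos/
```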
