(Linux/Python 3.5)
I want to normalize the ~ character in strings like "~/something" and get something like "/home/something/".
I don't want to write my own replacement; I'd prefer a general approach, for example a Python module.
Neither os.path nor pathlib seems to fit my expectations.
Any ideas?
You apparently missed the os.path.expanduser() function:
On Unix and Windows, return the argument with an initial component of ~ or ~user replaced by that user's home directory.
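A minimal sketch of how this could look, assuming the home directory of the current user can be determined from the environment. The pathlib variant relies on Path.expanduser(), which is available from Python 3.5 on.

```python
import os.path
from pathlib import Path

# Expand a leading "~" to the current user's home directory.
expanded = os.path.expanduser("~/something")

# pathlib offers the same operation as a method on Path objects.
expanded_path = Path("~/something").expanduser()

print(expanded)
print(expanded_path)
```

A ~ that is not at the very start of the string is left alone, so this only helps with paths that begin with ~ or ~user.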
Related
Assume an initial (Unix) path [segment] like /var/log. Underneath this path, there might be an entire tree of directories. A user provides a pattern for folder names using Unix shell-style wildcards, e.g. *var*. Folders following the pattern underneath the initial path [segment] shall be matched using a regular expression given a full path as input, i.e. the initial path segment must be excluded from matching.
How would I build a regular expression doing this?
I am working with Python, which offers the fnmatch module as part of its standard library. fnmatch provides a translate method, which translates patterns specified using Unix shell-style wildcards into regular expressions:
>>> fnmatch.translate('*var*')
'(?s:.*var.*)\\Z'
I would like to use this for constructing my regular expressions.
Matching input paths could look like this:
/var/log/foo/var/bar
/var/log/foo/avarb/bar
/var/log/var/
Not matching input paths could look like this:
/var/log
/var/log/foo/bar
The underlying issue is that I have to provide the regular expression to a third-party module, pyinotify, as input. I cannot work around this by just stripping the initial path segment and then matching against the remainder ...
You should be able to use a negative lookbehind like so:
(?<!^\/)var
Both positive and negative lookbehinds are really useful when working with regular expressions.
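Here is a small sketch of how that lookbehind behaves in Python's re module, using the sample paths from the question. Note that it only skips a var directly after the leading slash, so it happens to fit an initial segment starting with /var; it is not a general solution for arbitrary initial segments.

```python
import re

# "var" that is NOT directly preceded by the slash at the start of the string.
pattern = re.compile(r"(?<!^/)var")

paths = [
    "/var/log/foo/var/bar",
    "/var/log/foo/avarb/bar",
    "/var/log/var/",
    "/var/log",
    "/var/log/foo/bar",
]

# Keep only the paths in which the pattern is found somewhere.
found = [p for p in paths if pattern.search(p)]
print(found)
```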
Also, here is an interactive example so you can get a feel for how it works with visual feedback: https://regex101.com/r/52sZjw/1
Another example: https://regex101.com/r/F023eD/1/
I'm not exactly sure how you can use this with fnmatch. It looks like you might end up building the strings yourself, at least when the user's input matches part of the path you want to exclude.
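If you do end up building the string yourself, one possible sketch is to escape the initial path segment and append the translated pattern. build_path_regex is a hypothetical helper, not part of fnmatch or pyinotify. It assumes fnmatch.translate returns a self-contained group with scoped flags, as in the output shown in the question; older Python versions return a different form. Also note that * in fnmatch matches across /, so this checks the whole remainder of the path rather than a single folder name.

```python
import fnmatch
import re

def build_path_regex(base, user_pattern):
    """Hypothetical helper: regex source for full paths below `base`."""
    # Match the initial segment literally so the wildcard pattern never sees it.
    prefix = re.escape(base.rstrip("/") + "/")
    # The translated pattern is applied to everything after the initial segment.
    return prefix + fnmatch.translate(user_pattern)

path_regex = re.compile(build_path_regex("/var/log", "*var*"))

candidates = [
    "/var/log/foo/var/bar",
    "/var/log/foo/avarb/bar",
    "/var/log/var/",
    "/var/log",
    "/var/log/foo/bar",
]
matched = [p for p in candidates if path_regex.match(p)]
print(matched)
```

The string returned by build_path_regex is plain regex source, so it could be handed to a third-party module that expects a regular expression.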
I know that this is not something that should ever be done, but is there a way to use the slash character that normally separates directories within a filename in Linux?
The answer is that you can't, unless your filesystem has a bug. Here's why:
There is a system call for renaming your file defined in fs/namei.c called renameat:
SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
                int, newdfd, const char __user *, newname)
When the system call gets invoked, it does a path lookup (do_path_lookup) on the name. Keep tracing this, and we get to link_path_walk which has this:
static int link_path_walk(const char *name, struct nameidata *nd)
{
    struct path next;
    int err;
    unsigned int lookup_flags = nd->flags;

    while (*name=='/')
        name++;
    if (!*name)
        return 0;
    ...
This code applies to any file system. What does this mean? It means that if you try to pass a parameter with an actual '/' character as the name of the file using traditional means, it will not do what you want. There is no way to escape the character. If a filesystem "supports" this, it's because it either:
uses a Unicode character or something that looks like a slash but isn't, or
has a bug.
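The effect is easy to observe from user space. The following sketch uses a hypothetical helper, try_create, that tries to create a file inside a temporary directory; with a slash in the name, the part before the slash is treated as a directory that does not exist, so the call fails.

```python
import os
import tempfile

def try_create(directory, name):
    """Hypothetical helper: try to create an empty file `name` in `directory`."""
    try:
        with open(os.path.join(directory, name), "w"):
            pass
        return "created"
    except OSError as exc:
        # Report which error the operating system raised.
        return type(exc).__name__

with tempfile.TemporaryDirectory() as tmp:
    plain = try_create(tmp, "foobar")
    slashed = try_create(tmp, "foo/bar")  # "foo" is looked up as a directory
    entries = sorted(os.listdir(tmp))

print(plain, slashed, entries)
```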
Furthermore, if you did go in and edit the bytes to add a slash character into a file name, bad things would happen. That's because you could never refer to this file by name :( since anytime you did, Linux would assume you were referring to a nonexistent directory. Using the 'rm *' technique would not work either, since bash simply expands that to the filename. Even rm -rf wouldn't work, since a simple strace reveals how things go on under the hood (shortened):
$ ls testdir
myfile2 out
$ strace -vf rm -rf testdir
...
unlinkat(3, "myfile2", 0) = 0
unlinkat(3, "out", 0) = 0
fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
close(3) = 0
unlinkat(AT_FDCWD, "testdir", AT_REMOVEDIR) = 0
...
Notice that rm removes each entry by passing its name to unlinkat. For a name containing a slash, these calls would fail, because they need to refer to the files by name.
You could use a Unicode character that displays as / (for example the fraction slash), assuming your filesystem supports it.
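As a rough illustration, the fraction slash is U+2044, a different code point from the ordinary slash, so path handling does not treat it as a separator. This sketch only works on strings; whether a given filesystem accepts the name is a separate question.

```python
import posixpath
import unicodedata

FRACTION_SLASH = "\u2044"  # looks like "/" but is a different character

name = "foo" + FRACTION_SLASH + "bar"
char_name = unicodedata.name(FRACTION_SLASH)

# The look-alike is kept inside the last path component ...
lookalike_base = posixpath.basename("dir/" + name)
# ... while a real slash splits the name into two components.
real_slash_base = posixpath.basename("dir/foo/bar")

print(char_name, lookalike_base, real_slash_base)
```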
It depends on what filesystem you are using. Of some of the more popular ones:
ext3: No
ext4: No
jfs: Yes
reiserfs: No
xfs: No
Only with an agreed-upon encoding. For example, you could agree that % will be encoded as %% and that %2F will mean a /. All the software that accessed this file would have to understand the encoding.
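A minimal sketch of such an encoding, assuming exactly the two rules above. encode_name and decode_name are hypothetical names, and every program touching the files would need to apply the same pair.

```python
import re

def encode_name(name):
    """Hypothetical encoder: '%' becomes '%%' and '/' becomes '%2F'."""
    # Escape '%' first so the '%' introduced by '%2F' is not doubled afterwards.
    return name.replace("%", "%%").replace("/", "%2F")

def decode_name(encoded):
    """Hypothetical decoder: reverses encode_name, scanning left to right."""
    return re.sub(
        r"%(%|2F)",
        lambda m: "%" if m.group(1) == "%" else "/",
        encoded,
    )

original = "reports/50%/summary"
stored = encode_name(original)
print(stored)
print(decode_name(stored))
```

Escaping the percent sign itself is what keeps the scheme unambiguous: a literal "%2F" in the original name is stored as "%%2F" and decodes back to "%2F" rather than to a slash.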
The short answer is: No, you can't. It's a necessary prohibition because of how the directory structure is defined.
And, as mentioned, you can display a unicode character that "looks like" a slash, but that's as far as you get.
In general it's a bad idea to try to use "bad" characters in a file name at all; even if you somehow manage it, it tends to make it hard to use the file later. The filesystem separator is flat-out not going to work at all, so you're going to need to pick an alternative method.
Have you considered URL-encoding the URL then using that as the filename? The result should be fine as a filename, and it's easy to reconstruct the name from the encoded version.
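In Python, for instance, this could be sketched with urllib.parse. The URL below is only a placeholder, and safe="" is passed so that the slash itself is encoded as well.

```python
from urllib.parse import quote, unquote

url = "http://example.com/a/b?x=1"  # placeholder URL for illustration

# With safe="" every reserved character, including "/", is percent-encoded.
filename = quote(url, safe="")

# Decoding gives back the original URL.
restored = unquote(filename)

print(filename)
print(restored)
```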
Another option is to create an index - create the output filename using whatever method you like - sequentially-numbered names, SHA1 hashes, whatever - then write a file with the generated filename/URL pair. You can save that into a hash and use it to do a URL-to-filename lookup or vice-versa with the reversed version of the hash, and you can write it out and reload it later if needed.
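A rough Python sketch of the index idea, using SHA1 hashes as generated filenames and JSON as the stored form. The URLs and the helper name filename_for are assumptions made for illustration.

```python
import hashlib
import json

def filename_for(url):
    """Hypothetical helper: derive a slash-free filename from a URL."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()

urls = ["http://example.com/a/b", "http://example.com/c?d=1"]

# filename -> URL, plus the reversed mapping for URL -> filename lookups.
index = {filename_for(u): u for u in urls}
reverse_index = {u: f for f, u in index.items()}

# The index can be written out as text and reloaded later.
serialized = json.dumps(index)
reloaded = json.loads(serialized)

print(reverse_index[urls[0]])
```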
The short answer is: you must not. The long answer is: you probably can, but it depends on which layer you are working in and where you are viewing the file from.
Since the question has Unix tag in it, I am going to answer for Unix.
As mentioned in other answers, you must not use forward slashes in a filename.
However, in macOS you can create a file whose name is displayed with a forward slash / by:
# avoid doing this at all costs
touch 'foo:bar'
Now, when you look at this filename from the terminal, you will see it as foo:bar
But if you look at it in Finder, you will see that Finder displays it as foo/bar
The same thing works the other way round: if you create a file in Finder with a forward slash in its name, like /foobar, a conversion is done in the background. As a result, you will see :foobar in the terminal but /foobar in Finder.
So : is valid in the Unix layer, but it is translated to or from / in Mac layers such as Finder and the GUI. The colon : is used as the separator in HFS paths and the slash / is used as the separator in POSIX paths.
So there is a two-way translation happening, depending on which “layer” you are working with.
See more details here: https://apple.stackexchange.com/a/283095/323181
You can have a filename with a / in Linux and Unix. This is a very old question, but surprisingly nobody has said it in almost 10 years since the question was asked.
Every Unix and Linux system has the root directory named /. A directory is just a special kind of file. Symbolic links, character devices, etc. are also special kinds of files. See here for an in-depth discussion.
You can't create any other files with a /, but you certainly have one -- and a very important one at that.
So I'm trying to create a path using the code below:
path = os.path.join(os.path.dirname(__file__),'folder_abc','file.abc')
But it keeps giving the wrong path.
I.e., for the above statement, the value of the path variable is set to:
C:/User/abc\folder_abc\file.abc
Note that '/' is used before abc and '\' after it.
Why is this happening?
Thanks to SSchneid.
Using os.path.normpath() solved this.
I.e., in the above case:
path = os.path.normpath(os.path.join(os.path.dirname(__file__),'folder_abc','file.abc'))
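To see the effect on any platform, the Windows flavour of os.path can be imported directly as ntpath. This is only a sketch with the directory from the question hard-coded in place of os.path.dirname(__file__).

```python
import ntpath  # the Windows implementation of os.path, importable everywhere

# join() inserts backslashes but leaves the existing forward slashes alone.
mixed = ntpath.join("C:/User/abc", "folder_abc", "file.abc")

# normpath() converts every forward slash to a backslash.
normalized = ntpath.normpath(mixed)

print(mixed)
print(normalized)
```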
This is described in the Python docs; see here:
https://docs.python.org/2/library/os.path.html#os.path.join
It means that your operating system's path separator is '\' and not '/', as you would like. Changing these variables is not recommended, as described in another Stack Overflow post:
Python - Can (or should) I change os.path.sep?
I've got a set of file directories that I am manipulating with python. However, all I care about is the last two levels of the directory. So if I had
"topdirectory/sub1/subsub1/subsubsub1/target"
"topdirectory/sub1/target"
The necessary returned strings would be
"subsubsub1/target"
and
"sub1/target"
I know Python has a split-string type of method, but how can I tell it to grab only the LAST 2 components separated by delimiters?
Edit: Sorry guys, I should have explained that this is not REALLY a directory/file setup, but a time-series DB that very closely resembles one. I figured it would just be easier to explain that way. The paths are essentially directories/files, but since it is a database, using the OS utilities wouldn't have any effect.
The os.path module contains a split function for this. It returns the dirname and the basename. Run it twice and you have the last two bases.
Obviously, you want some checking that there are two or more bases as well.
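A sketch of that idea follows. Since the strings here are not real filesystem paths, it uses posixpath so that / is the separator on every platform. last_two is a hypothetical helper, and it assumes the input has no trailing slash.

```python
import posixpath

def last_two(path):
    """Hypothetical helper: return the last two '/'-separated components."""
    head, last = posixpath.split(path)      # e.g. ("a/b", "c")
    _, second_last = posixpath.split(head)  # e.g. ("a", "b")
    if not last or not second_last:
        raise ValueError("need at least two components: %r" % path)
    return second_last + "/" + last

print(last_two("topdirectory/sub1/subsub1/subsubsub1/target"))
print(last_two("topdirectory/sub1/target"))
```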
Try
"/".join("topdirectory/sub1/subsub1/subsubsub1/target".rsplit('/', 2)[-2:])
rsplit returns a list of components, so the join turns the last two back into a single string. This approach works for any string in general.
But as stated in the comments, if you are dealing with real system paths, I'd rather use the os module, as suggested by Sean Perry. Note that the delimiter can differ between operating systems.
Consider the following:
>>> from django.conf import settings
>>> import os
>>> settings.VIRTUAL_ENV
'C:/Users/Marcin/Documents/oneclickcos'
>>> settings.EXTRA_BASE
'/oneclickcos/'
>>> os.path.join(settings.VIRTUAL_ENV,settings.EXTRA_BASE)
'/oneclickcos/'
As you can imagine, I neither expect nor want the concatenation of 'C:/Users/Marcin/Documents/oneclickcos' and '/oneclickcos/' to be '/oneclickcos/'.
Oddly enough, reversing the path components once again shows python ignoring the first path component:
>>> os.path.join(settings.EXTRA_BASE,settings.VIRTUAL_ENV)
'C:/Users/Marcin/Documents/oneclickcos'
While this works something like expected:
>>> os.path.join('/foobar',settings.VIRTUAL_ENV,'barfoo')
'C:/Users/Marcin/Documents/oneclickcos\\barfoo'
I am, of course, running on Windows (Windows 7), with the native Python.
Why is this happening, and what can I do about it?
That's pretty much how os.path.join is defined (quoting the docs):
If any component is an absolute path, all previous components (on Windows, including the previous drive letter, if there was one) are thrown away
And I'd say it's usually a good thing, as it avoids creating invalid paths. If you want to avoid this behavior, don't feed it absolute paths. Yes, starting with a slash qualifies as an absolute path. A quick and dirty solution is just removing the leading slash (settings.EXTRA_BASE.lstrip('/') if you want to do it programmatically).
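A sketch of the difference, using ntpath so that the Windows rules apply on any platform and with the two settings values hard-coded. The exact string returned for the discarded case varies between Python versions (newer ones may keep the drive letter), so only the stripped variant should be relied on.

```python
import ntpath  # Windows path rules, importable on any platform

base = "C:/Users/Marcin/Documents/oneclickcos"
extra = "/oneclickcos/"

# The leading slash makes `extra` absolute, so the directories of `base` are dropped.
discarded = ntpath.join(base, extra)

# Without the leading slash, `extra` is appended below `base`.
kept = ntpath.join(base, extra.lstrip("/"))

print(discarded)
print(kept)
```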
Remove the leading / from the second string:
>>> os.path.join('C:/Users/Marcin/Documents/oneclickcos', 'oneclickcos/')
'C:/Users/Marcin/Documents/oneclickcos\\oneclickcos/'
This is because os.path.join discards all previous components once it meets an absolute path, and /oneclickcos/ is an absolute path.
Here's an excerpt from the doc of os.path.join:
Join one or more path components intelligently. If any component is an
absolute path, all previous components (on Windows, including the
previous drive letter, if there was one) are thrown away, and joining
continues. [...]