Do file systems have other components rather than files and directories? - python

I have seen this python snippet in a video tutorial which checks if the listed item is a directory or a file:
for item in os.listdir("."):
if os.path.isfile(item):
# do something
elif os.path.isdir(item):
# do somethin
else:
# What is this case ?!
is it possible that the else statement could be hit?

As #sisoft says, the simple answer is yes: there do exist file systems that support file types other than files and directories.
The longer answer, if you're interested, is that the types supported by a file system vary wildly with the file system. UNIX treats a huge number of things as a 'file' (meaning an object in the file system) and so has many types. Windows has a more restricted set of objects (files, directories and links only I believe (no source))
The POSIX specification (implemented by many file systems) for a file system doesn't specify what objects it must support(source).
Generally, file system is a fairly open term that can refer to any object store. The objects that it stores could be anything.
If you'd like to learn more about file systems, there is a great chapter in Operating Systems which gives an easily accessible introduction.

Yes. There are other types, like pipes, sockets, device nodes.
For example isfile() and isdir() returns False for most files from /dev.
You can see https://en.wikipedia.org/wiki/Unix_file_types at first.

is it possible that the else statement could be hit?
Your code fragment uses a narrow definition of files and directories: os.stat(path) (follows symlinks) is successful and either S_ISREG or S_ISDIR are true correspondingly.
else clause may be triggered for non-existing paths or due to permission errors for regular files and directories.
POSIX defines the following marcos:
S_ISBLK(m)
Test for a block special file.
S_ISCHR(m)
Test for a character special file.
S_ISDIR(m)
Test for a directory.
S_ISFIFO(m)
Test for a pipe or FIFO special file.
S_ISREG(m)
Test for a regular file.
S_ISLNK(m)
Test for a symbolic link.
S_ISSOCK(m)
Test for a socket.
i.e., in addition to a regular file and a directory, there could be sockets, symlinks, pipes, block/character devices:
>>> import os
>>> import stat
>>> stat.S_ISBLK(os.stat('/dev/sda').st_mode)
True
There could exist other objects that have meaning only for a particular filesystem.

Related

os.path.split seems to be returning wrong

I can't understand what os.path.split is doing. I'm debugging a program (specifically git's interface with Perforce: git-p4) and seeing that os.path.split is splitting the incoming path in ways the script isn't expecting, and also seems inconsistent with the documentation. I made some simpler tests and can't figure out what it's doing myself.
The path I want to split is //a/b (The path is actually a Perforce path, not a local filesystem path), and I need b in the second half of the returned pair. I'm running on Windows, and suspect the issue has something to do with the path not looking very Windows-esque. When I tried running my test code in an online sandbox it worked as expected unlike my Windows machine.
I've read the documentation:
os.path.split(path)
Split the pathname path into a pair, (head, tail) where tail is the last pathname component and head is everything leading up to that. The tail part will never contain a slash; if path ends in a slash, tail will be empty. If there is no slash in path, head will be empty. If path is empty, both head and tail are empty. Trailing slashes are stripped from head unless it is the root (one or more slashes only). In all cases, join(head, tail) returns a path to the same location as path (but the strings may differ). Also see the functions dirname() and basename().
My test code:
import os
print os.path.split("//a")
print os.path.split("//a/b")
print os.path.split("//a/b/c")
What I'd expect:
('//', 'a')
('//a', 'b')
('//a/b', 'c')
What I actually get on a couple online sandboxes:
('//', 'a')
('//a', 'b')
('//a/b', 'c')
What I actually get on my PC:
('//', 'a')
('//a/b', '')
('//a/b/', 'c')
Python 2 because the git-p4 code is written for Python 2.
So my first question is just for my own understanding. What's going wrong here? An OS difference?
And then beyond my own curiosity, I need a fix. I've been able to modify git-p4, but I'd of course prefer to edit it as little as possible as I'm not trying to understand it! I'm not a python expert. Is there a comparable method that can get ('//a', 'b') returned?
You are using the wrong tool to handle these paths. On Windows, paths that start with //foo/bar or \\foo\bar are seen as UNC network paths, and os.path.split() will first use os.path.splitdrive() to make sure the UNC portion is not split. The UNC or drive portion is then re-attached after splitting the remainder.
You can use the posixpath module instead, to get the POSIX behaviour:
import posixpath
posixpath.split(yourpaths)
Quoting from the top of the os.path module documentation:
Note: Since different operating systems have different path name conventions, there are several versions of this module in the standard library. The os.path module is always the path module suitable for the operating system Python is running on, and therefore usable for local paths. However, you can also import and use the individual modules if you want to manipulate a path that is always in one of the different formats. They all have the same interface:
posixpath for UNIX-style paths
ntpath for Windows paths
[...]
On Windows, os.path is the same module as ntpath, the online sandboxes must all have been POSIX systems.
Treating your Perforce paths as POSIX paths is fine, provided you always use forward slashes as path separators.

How do you define a local path in python

Good morning, I can indicate how to enter a path of internal hard disk in python, currently use the statement:
file = GETfile() or 'http://**********'
I would like to put a path to a local file, but it does not work, where am I wrong?
file = GETfile() or 'D:\xxx\xxxx\playlist\playlist.m3u'
\ is a escape character. You have three options.
1) use /. This, as a bonus works for linux as well:
'D:/xxx/xxxx/playlist/playlist.m3u'
2) escape the backslash
'D:\\xxx\\xxxx\\playlist\\playlist.m3u'
3) use raw strings:
r'D:\xxx\xxxx\playlist\playlist.m3u'
A correct answer is already given, but some additional information when working with local drive paths on Windows operating system.
Personally I would go with the r'D:\dir\subdir\filename.ext' format, however the other two methods already mentioned are valid as well.
Furthermore, file operations on Windows are limited by Explorer to a 256 character limit. Longer path names will usually result in an OS error.
However there is a workaround, by pre fixing "\\?\" to a long path.
Example of a path which does not work:
D:\reallyreallyreallyreallyreallylonglonglonglongdir\reallyreallyreallyreallyreallylonglonglonglongdir\reallyreallyreallyreallyreallylonglonglonglongdir\reallyreallyreallyreallyreallylonglonglonglongdir\reallyreallyreallyreallyreallylonglonglonglongdir\reallyreallyreallyreallyreallylonglonglonglongdir\reallyreallyreallyreallyreallylonglonglonglongdir\reallyreallyreallyreallyreallylonglonglonglongdir\filename.ext
Same file path which does work:
\\?\D:\reallyreallyreallyreallyreallylonglonglonglongdir\reallyreallyreallyreallyreallylonglonglonglongdir\reallyreallyreallyreallyreallylonglonglonglongdir\reallyreallyreallyreallyreallylonglonglonglongdir\reallyreallyreallyreallyreallylonglonglonglongdir\reallyreallyreallyreallyreallylonglonglonglongdir\reallyreallyreallyreallyreallylonglonglonglongdir\reallyreallyreallyreallyreallylonglonglonglongdir\filename.ext
so the following code I use to change filenames to include the "\\?\":
import os
import platform
def full_path_windows(filepath):
if platform.system() == 'Windows':
if filepath[1:3] == ':\\':
return u'\\\\?\\' + os.path.normcase(filepath)
return os.path.normcase(filepath)
I use this for every path to file (or directories), it will return the path with a prefix. The path does not need to exist; so you can use this also before you create a file or directory, to ensure you are not running into the Windows Explorer limitations.
HTH

wrong rename with python

I need to rename my picture with the EXIF data, but I have a problem: if I use ":" to separate the time (hour:minute:second), the file name gets crazy!
metadata = pyexiv2.ImageMetadata(lunga + i)
metadata.read()
tag = metadata['Exif.Image.DateTime']
estensione = ".jpg"
new_data = tag.value.strftime('%Y-%m-%d %H-%M-%S')
new_name = lunga + str(new_data) + estensione
os.renames(lunga + i, new_name)
works great, but with
new_data = tag.value.strftime('%Y-%m-%d %H:%M:%S')
I get something like
2A443K~H.jpg
The problem is that you're not allowed to put colons into filenames on Windows. You're not actually using Windows… but you are using an SMB share, which means you're bound by Windows rules.
The fix is to not put colons into your filenames.
If you want to understand why this bizarre stuff is happening, read on.
The details on Windows filenames are described in Naming Files, Paths, and Namespaces at MSDN, but I'll summarize the relevant parts here.
The NT kernel underneath Windows has no problems with colons, but the Win32 layer on top of it can't handle them (and the quasi-POSIX layer in MSVCRT sits on top of Win32).
So, at the C level, if you call NT functions like NtSetInformationFile, it will save them just fine. If you call Win32 functions like MoveFileEx, they will normally give you an error, but if you use the special \\?\ syntax to say "pass this name straight through to NT", it will work. And if you call MSVCRT functions like rename, you will get an error. Older versions of Python called rename, which would just give you an error. Newer versions call MoveFileEx, and will try to wrap the name up in \\?\ syntax (because that also allows you to get around some other stupid limitations, like the excessively short MAX_PATH value).
So, what happens if you give a file a name that Win32 can't understand? Remember that on Windows, every file has two different names: the "long name" and the "short name". The short name is a DOS-style 8.3 filename. So whenever it can't display the long name, it displays the short name instead.
Where does the short name come from? If you don't create one explicitly, Windows will create one for you from the long name by using the first 6 characters, a tilde, and a number of letter. So, for example, the short name for "Program Files" is "PROGRA~1". But if Windows can't handle the long name, it will just make up a short name out of 6 random characters, a tilde, and a random character. So you get something like 2A443K~H.
The NTFS filesystem, being designed for Windows, expects to be used in Windows-y ways. So, if you're using an NTFS volume, even on a non-Windows system, the driver will emulate some of this functionality, giving you similar but not identical behavior.
And of course if you're talking to a share from a Windows system, or a share backed by an NTFS drive on a non-Windows system, again, some of the same things will apply.
Even if both your computer and the file server are non-Windows and the filesystem is not NTFS, if you're using SMB/CIFS for file sharing, SMB was also designed for Windows, and you will again get similar behavior.
At least you no longer have to worry about VMS, classic Mac, and other naming systems, just POSIX and Windows.
Colons are reserved characters in Windows filesystem (see How would I go about creating a filename with invalid characters such as :?>?), so the name was replaced with an auto-generated on instead.
To be clear, this is not a Python issue. Don't use colons or other reserved characters in filenames if you don't want this to happen.

Is this python code safe against injections?

I have a server/client socket pair in Python. The server receives specific commands, then prepares the response and send it to the client.
In this question, my concern is just about possible injections in the code: if it could be possible to ask the server doing something weird with the 2nd parameter -- if the control on the command contents is not sufficient to avoid undesired behaviour.
EDIT:
according to advices received
added parameter shell=True when calling check_output on windows. Should not be dangerous since the command is a plain 'dir'.
.
self.client, address = self.sock.accept()
...
cmd = bytes.decode(self.client.recv(4096))
ls: executes a system command but only reads the content of a directory.
if cmd == 'ls':
if self.linux:
output = subprocess.check_output(['ls', '-l'])
else:
output = subprocess.check_output('dir', shell=True)
self.client.send(output)
cd: just calls os.chdir.
elif cmd.startswith('cd '):
path = cmd.split(' ')[1].strip()
if not os.path.isdir(path):
self.client.send(b'is not path')
else:
os.chdir(path)
self.client.send( os.getcwd().encode() )
get: send the content of a file to the client.
elif cmd.startswith('get '):
file = cmd.split(' ')[1].strip()
if not os.path.isfile(file):
self.client.send(b'ERR: is not a file')
else:
try:
with open(file) as f: contents = f.read()
except IOError as er:
res = "ERR: " + er.strerror
self.client.send(res.encode())
continue
... (send the file contents)
Except in implementation details, I cannot see any possibilities of direct injection of arbitrary code because you do not use received parameters in the only commands you use (ls -l and dir).
But you may still have some security problems :
you locate commands through the path instead of using absolute locations. If somebody could change the path environment variable what could happen ... => I advice you to use directly os.listdir('.') which is portable and has less risks.
you seem to have no control on allowed files. If I correctly remember reading CON: or other special files on older Windows version gave weird results. And you should never give any access to sensible files, configuration, ...
you could have control on length of asked files to avoid users to try to break the server with abnormally long file names.
Typical issues in a client-server scenario are:
Tricking the server into running a command that is determined by the client. In the most obvious form this happens if the server allows the client to run commands (yes, stupid). However, this can also happen if the client can supply only command parameters but shell=True is used. E.g. using subprocess.check_output('dir %s' % dir, shell=True) with a client-supplied dir variable would be a security issue, dir could have a value like c:\ && deltree c:\windows (a second command has been added thanks to the flexibility of the shell's command line interpreter). A relatively rare variation of this attack is the client being able to influence environment variables like PATH to trick the server into running a different command than intended.
Using unexpected functionality of built-in programming language functions. For example, fopen() in PHP won't just open files but fetch URLs as well. This allows passing URLs to functionality expecting file names and playing all kinds of tricks with the server software. Fortunately, Python is a sane language - open() works on files and nothing else. Still, database commands for example can be problematic if the SQL query is generated dynamically using client-supplied information (SQL Injection).
Reading data outside the allowed area. Typical scenario is a server that is supposed to allow only reading files from a particular directory, yet by passing in ../../../etc/passwd as parameter you can read any file. Another typical scenario is a server that allows reading only files with a particular file extension (e.g. .png) but passing in something like passwords.txt\0harmless.png still allows reading files of other types.
Out of these issues only the last one seems present in your code. In fact, your server doesn't check at all which directories and files the client should be allowed to read - this is a potential issue, a client might be able to read confidential files.

how to check for platform incompatible folder (file) names in python

I would like to be able to check from python if a given string could be a valid cross platform folder name - below is the concrete problem I ran into (folder name ending in .), but I'm sure there are some more special cases (e.g.: con, etc.).
Is there a library for this?
From python (3.2) I created a folder on Windows (7) with a name ending in dot ('.'), e.g. (without square brackets): [What I've done on my holidays, Part II.]
When the created folder was ftp'd (to linux, but I guess that's irrelevant), it did not have the dot in it anymore (and in return, this broke a lot of hyperlinks).
I've checked it from the command line, and it seems that the folder doesn't have the '.' in the filename
mkdir tmp.
dir
cd tmp
cd ..\tmp.
Apparently, adding a single dot at the end of the folder name is ignored, e.g.:
cd c:\Users.
works just as expected.
Nope there's sadly no way to do this. For windows you basically can use the following code to remove all illegal characters - but if someone still has a FAT filesystem you'd have to handle these too since those are stricter. Basically you'll have to read the documentation for all filesystem and come up with a complete list. Here's the NTFS one as a starting point:
ILLEGAL_NTFS_CHARS = "[<>:/\\|?*\"]|[\0-\31]"
def __removeIllegalChars(name):
# removes characters that are invalid for NTFS
return re.sub(ILLEGAL_NTFS_CHARS, "", name)
And then you need some "forbidden" name list as well to get rid of COM. Pretty much a complete mess that.. and that's ignoring linux (although there it's pretty relaxed afaik)
Do not end a file or directory name with a space or a period. Although
the underlying file system may support such names, the Windows shell
and user interface does not.
http://msdn.microsoft.com/en-us/library/aa365247.aspx#naming_conventions
That page will give you information about other illegal names too, for Windows that is. Including CON as you said your self.
If you respect those (seemingly harsh) rules, I think you'll be safe on Linux and most other systems too.

Categories