Wrong rename with Python

I need to rename my picture with the EXIF data, but I have a problem: if I use ":" to separate the time (hour:minute:second), the file name gets crazy!
metadata = pyexiv2.ImageMetadata(lunga + i)
metadata.read()
tag = metadata['Exif.Image.DateTime']
estensione = ".jpg"
new_data = tag.value.strftime('%Y-%m-%d %H-%M-%S')
new_name = lunga + str(new_data) + estensione
os.renames(lunga + i, new_name)
works great, but with
new_data = tag.value.strftime('%Y-%m-%d %H:%M:%S')
I get something like
2A443K~H.jpg

The problem is that you're not allowed to put colons into filenames on Windows. You're not actually using Windows… but you are using an SMB share, which means you're bound by Windows rules.
The fix is to not put colons into your filenames.
If you want to understand why this bizarre stuff is happening, read on.
The details on Windows filenames are described in Naming Files, Paths, and Namespaces at MSDN, but I'll summarize the relevant parts here.
The NT kernel underneath Windows has no problems with colons, but the Win32 layer on top of it can't handle them (and the quasi-POSIX layer in MSVCRT sits on top of Win32).
So, at the C level, if you call NT functions like NtSetInformationFile, it will save them just fine. If you call Win32 functions like MoveFileEx, they will normally give you an error, but if you use the special \\?\ syntax to say "pass this name straight through to NT", it will work. And if you call MSVCRT functions like rename, you will get an error. Older versions of Python called rename, which would just give you an error. Newer versions call MoveFileEx, and will try to wrap the name up in \\?\ syntax (because that also allows you to get around some other stupid limitations, like the excessively short MAX_PATH value).
So, what happens if you give a file a name that Win32 can't understand? Remember that on Windows, every file has two different names: the "long name" and the "short name". The short name is a DOS-style 8.3 filename. So whenever it can't display the long name, it displays the short name instead.
Where does the short name come from? If you don't create one explicitly, Windows will create one for you from the long name by using the first 6 characters, a tilde, and a number or letter. So, for example, the short name for "Program Files" is "PROGRA~1". But if Windows can't handle the long name, it will just make up a short name out of 6 random characters, a tilde, and a random character. So you get something like 2A443K~H.
The NTFS filesystem, being designed for Windows, expects to be used in Windows-y ways. So, if you're using an NTFS volume, even on a non-Windows system, the driver will emulate some of this functionality, giving you similar but not identical behavior.
And of course if you're talking to a share from a Windows system, or a share backed by an NTFS drive on a non-Windows system, again, some of the same things will apply.
Even if both your computer and the file server are non-Windows and the filesystem is not NTFS, if you're using SMB/CIFS for file sharing, SMB was also designed for Windows, and you will again get similar behavior.
At least you no longer have to worry about VMS, classic Mac, and other naming systems, just POSIX and Windows.

Colons are reserved characters in Windows filesystems (see How would I go about creating a filename with invalid characters such as :?>?), so the name was replaced with an auto-generated one instead.
To be clear, this is not a Python issue. Don't use colons or other reserved characters in filenames if you don't want this to happen.
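As a minimal sketch of that advice, the helper below builds the file name from a datetime using hyphens instead of colons. The name timestamp_filename and the sample date are made up for illustration; in the question the datetime would come from tag.value.

```python
from datetime import datetime


def timestamp_filename(taken, extension=".jpg"):
    """Build a file name from a datetime without Windows-reserved characters."""
    # Hyphens replace the colons, so the name stays valid on Windows,
    # on NTFS volumes and on SMB shares.
    return taken.strftime("%Y-%m-%d %H-%M-%S") + extension


# Hypothetical capture time, standing in for the EXIF value.
print(timestamp_filename(datetime(2013, 5, 4, 17, 30, 5)))
# prints 2013-05-04 17-30-05.jpg
```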


python youtube-dl output file not in unicode?

Is there any way to get the youtube-dl.extract_info() function to use unicode when creating the output file?
I have encountered the problem that if you download something with unicode characters like | in the title then the output file name will not have the same character. It will be replaced with _ instead.
Take this song title for example.
If I download it with youtube-dl then I get this file name 【Nightcore】→ Pretty Girl _ Lyrics-dMAOnScOyGE. Same thing happens with different kind of characters.
Is there any way to stop this?
Because it's annoying if you want to do anything with that file afterwards.
To get the new file name I would need to do something like os.listdir(dir) to get the file. So it's not impossible to get the new file name, but I am just interested if there is an easier way.
The encoding of | to _ is hardcoded in sanitize_filename in youtube_dl/utils.py. You can turn it off programmatically by substituting youtube_dl.utils.sanitize_filename with your own implementation.
However, doing so is not recommended, and not supported out of the box. This is because | is an invalid character on Windows and can be used to execute arbitrary commands if expanded in a buggy script.
Insecure filenames were supported at one time, but I removed them from youtube-dl because too many people were shooting themselves in the foot, and often reported problems that clearly would have let any attacker execute arbitrary code on their machines.

Do file systems have other components rather than files and directories?

I have seen this python snippet in a video tutorial which checks if the listed item is a directory or a file:
for item in os.listdir("."):
    if os.path.isfile(item):
        # do something
        pass
    elif os.path.isdir(item):
        # do something
        pass
    else:
        # What is this case ?!
        pass
is it possible that the else statement could be hit?
As @sisoft says, the simple answer is yes: there do exist file systems that support file types other than files and directories.
The longer answer, if you're interested, is that the types supported by a file system vary wildly with the file system. UNIX treats a huge number of things as a 'file' (meaning an object in the file system) and so has many types. Windows has a more restricted set of objects (files, directories and links only, I believe, though I have no source).
The POSIX specification (implemented by many file systems) for a file system doesn't specify what objects it must support (source).
Generally, file system is a fairly open term that can refer to any object store. The objects that it stores could be anything.
If you'd like to learn more about file systems, there is a great chapter in Operating Systems which gives an easily accessible introduction.
Yes. There are other types, like pipes, sockets, device nodes.
For example, isfile() and isdir() return False for most files from /dev.
For a start, see https://en.wikipedia.org/wiki/Unix_file_types.
is it possible that the else statement could be hit?
Your code fragment uses a narrow definition of files and directories: an item counts as one only if os.stat(path) (which follows symlinks) succeeds and S_ISREG or S_ISDIR, respectively, is true.
The else clause may also be triggered for non-existing paths, or for regular files and directories when a permission error occurs.
POSIX defines the following macros:
S_ISBLK(m)
Test for a block special file.
S_ISCHR(m)
Test for a character special file.
S_ISDIR(m)
Test for a directory.
S_ISFIFO(m)
Test for a pipe or FIFO special file.
S_ISREG(m)
Test for a regular file.
S_ISLNK(m)
Test for a symbolic link.
S_ISSOCK(m)
Test for a socket.
i.e., in addition to a regular file and a directory, there could be sockets, symlinks, pipes, block/character devices:
>>> import os
>>> import stat
>>> stat.S_ISBLK(os.stat('/dev/sda').st_mode)
True
There could exist other objects that have meaning only for a particular filesystem.
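A rough sketch of how the else branch could be made explicit, using the stat module's S_IS* helpers. The function name classify and its labels are my own; it uses os.lstat rather than os.stat so that symbolic links are reported as such instead of being followed.

```python
import os
import stat


def classify(path):
    """Return a short label for the kind of filesystem object at path."""
    try:
        mode = os.lstat(path).st_mode  # lstat does not follow symlinks
    except OSError:
        # Covers non-existing paths and permission errors.
        return "missing or inaccessible"
    checks = [
        (stat.S_ISREG, "regular file"),
        (stat.S_ISDIR, "directory"),
        (stat.S_ISLNK, "symbolic link"),
        (stat.S_ISFIFO, "pipe/FIFO"),
        (stat.S_ISSOCK, "socket"),
        (stat.S_ISBLK, "block device"),
        (stat.S_ISCHR, "character device"),
    ]
    for test, label in checks:
        if test(mode):
            return label
    # Anything a particular filesystem defines beyond the POSIX types.
    return "other"


for item in os.listdir("."):
    print(item, "->", classify(item))
```

Running the loop over /dev on a Linux machine should mostly report character and block devices, which is the case the original snippet sends to else.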

Does os.path.sep affect the tarfile module?

Is the path separator employed inside a Python tarfile.TarFile object a '/' regardless of platform, or is it a backslash on Windows?
I basically never touch Windows, but I would kind of like the code I'm writing to be compatible with it, if it can be. Unfortunately I have no Windows host on which to test.
A quick test tells me that a (forward) slash is always used.
In fact, the tar format stores the full path of each file as a single string, using slashes (try looking at a hex dump), and python just reads that full path without any modification. Likewise, at extraction time python hard-replaces slashes with the local separator (see TarFile._extract_member).
... which makes me think that there are surely some nonconformant implementations of tar for Windows that create tarfiles with backslashes as separators!?
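The quick test can be reproduced along these lines. This is only a sketch: the member name pkg/data.txt is arbitrary, and the archive name is deliberately built with os.path.join so that on Windows it would start out with a backslash. I have only reasoned about the Windows case from the tarfile source, not run it there.

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "data.txt")
    with open(source, "w") as handle:
        handle.write("hello")

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        # arcname uses the local separator on purpose.
        tar.add(source, arcname=os.path.join("pkg", "data.txt"))

# Re-open the in-memory archive and look at the stored member names.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()

print(names)
```

On a POSIX system this prints a single name with a forward slash; the claim in the answer is that Windows gives the same result.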

how to check for platform incompatible folder (file) names in python

I would like to be able to check from python if a given string could be a valid cross platform folder name - below is the concrete problem I ran into (folder name ending in .), but I'm sure there are some more special cases (e.g.: con, etc.).
Is there a library for this?
From python (3.2) I created a folder on Windows (7) with a name ending in dot ('.'), e.g. (without square brackets): [What I've done on my holidays, Part II.]
When the created folder was ftp'd (to linux, but I guess that's irrelevant), it did not have the dot in it anymore (and in return, this broke a lot of hyperlinks).
I've checked it from the command line, and it seems that the folder doesn't have the '.' in the filename
mkdir tmp.
dir
cd tmp
cd ..\tmp.
Apparently, adding a single dot at the end of the folder name is ignored, e.g.:
cd c:\Users.
works just as expected.
Nope, sadly there's no way to do this. For Windows you can basically use the following code to remove all illegal characters, but if someone still has a FAT filesystem you'd have to handle that too, since it is stricter. Basically you'll have to read the documentation for all filesystems and come up with a complete list. Here's the NTFS one as a starting point:
import re

ILLEGAL_NTFS_CHARS = r'[<>:"/\\|?*\x00-\x1f]'
def __removeIllegalChars(name):
    # removes characters that are invalid for NTFS (reserved punctuation and control characters 0-31)
    return re.sub(ILLEGAL_NTFS_CHARS, "", name)
And then you need a list of "forbidden" names as well, to get rid of reserved device names like COM. It's pretty much a complete mess, and that's ignoring Linux (although there it's pretty relaxed, afaik).
Do not end a file or directory name with a space or a period. Although
the underlying file system may support such names, the Windows shell
and user interface does not.
http://msdn.microsoft.com/en-us/library/aa365247.aspx#naming_conventions
That page will give you information about other illegal names too, for Windows that is, including CON as you said yourself.
If you respect those (seemingly harsh) rules, I think you'll be safe on Linux and most other systems too.
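Those rules can be turned into a small check. The sketch below is not a library and not complete: is_portable_name is a made-up helper, and the character and device-name lists are my reading of the Windows naming conventions page linked above (reserved punctuation, control characters, CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9, and no trailing space or period).

```python
import re

# Characters Windows rejects in names, plus ASCII control characters.
_RESERVED_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Device names Windows reserves, with or without an extension.
_RESERVED_NAMES = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}


def is_portable_name(name):
    """Return True if name looks safe as a single file or folder name."""
    if not name or name in (".", ".."):
        return False
    if _RESERVED_CHARS.search(name):
        return False
    if name[-1] in " .":
        # Windows silently drops a trailing space or period.
        return False
    stem = name.split(".", 1)[0].upper()
    return stem not in _RESERVED_NAMES


print(is_portable_name("What I've done on my holidays, Part II."))  # False
print(is_portable_name("What I've done on my holidays, Part II"))   # True
print(is_portable_name("con"))                                      # False
```

It checks one path component at a time, so a full path would need to be split first, and it says nothing about length limits or case-insensitive collisions.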

Unix paths that work for any platform in Python?

Can all paths in a Python program use ".." (for the parent directory) and / (for separating path components), and still work whatever the platform?
On one hand, I have never seen such a claim in the documentation (I may have missed it), and the os and os.path modules do provide facilities for handling paths in a platform agnostic way (os.pardir, os.path.join,…), which lets me think that they are here for a reason.
On the other hand, you can read on StackOverflow that "../path/to/file" works on all platforms…
So, should os.pardir, os.path.join and friends always be used, for portability purposes, or are Unix path names always safe (up to possible character encoding issues)? or maybe "almost always" safe (i.e. working under Windows, OS X, and Linux)?
I've never had any problems with using .., although it might be a good idea to convert it to an absolute path using os.path.abspath. Secondly, I would recommend always using os.path.join wherever possible. There are a lot of corner cases (aside from portability issues) in joining paths, and it's good not to have to worry about them. For instance:
>>> '/foo/bar/' + 'qux'
'/foo/bar/qux'
>>> '/foo/bar' + 'qux'
'/foo/barqux'
>>> from os.path import join
>>> join('/foo/bar/', 'qux')
'/foo/bar/qux'
>>> join('/foo/bar', 'qux')
'/foo/bar/qux'
You may run into problems with using .. if you're on some obscure platforms, but I can't name any (Windows, *nix, and OS X all support that notation).
"Almost always safe" is right. All of the platforms you care about probably work ok today and I don't think they will be changing their conventions any time soon.
However, Python is very portable and runs on a lot more than the usual platforms. The reason for the os module is to help smooth things over if a platform does have different requirements.
Is there a good reason for you to not use the os functions?
os.pardir is self documenting whereas ".." isn't, and os.pardir might be easier to grep for
Here are some docs from Python 1.6, when Mac was still different for everything:
OS routines for Mac, DOS, NT, or Posix depending on what system we're
on.
This exports:
- all functions from posix, nt, dos, os2, mac, or ce, e.g. unlink, stat, etc.
- os.path is one of the modules posixpath, ntpath, macpath, or dospath
- os.name is 'posix', 'nt', 'dos', 'os2', 'mac', or 'ce'
- os.curdir is a string representing the current directory ('.' or ':')
- os.pardir is a string representing the parent directory ('..' or '::')
- os.sep is the (or a most common) pathname separator ('/' or ':' or '\')
- os.altsep is the alternate pathname separator (None or '/')
- os.pathsep is the component separator used in $PATH etc
- os.linesep is the line separator in text files ('\r' or '\n' or '\r\n')
- os.defpath is the default search path for executables
Programs that import and use 'os' stand a better chance of being
portable between different platforms. Of course, they must then only
use functions that are defined by all platforms (e.g., unlink and
opendir), and leave all pathname manipulation to os.path (e.g., split
and join).
Within python, using / will always work. You will need to be aware of the OS convention if you want to execute a command in a subshell
myprog = "/path/to/my/program"
os.system(myprog + " -n") # 1
os.system(myprog + " C:/input/file/to/myprog") # 2
Command #1 will probably work as expected.
Command #2 might not work if myprog is a Windows command and expects to parse its command line arguments to get a Windows file name.
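One way to sidestep that, sketched here under the assumption that the external program only needs native separators: normalise the path before handing it over. On Windows os.path.normpath rewrites forward slashes to backslashes; the example calls ntpath and posixpath directly so the difference shows on any platform. native_argument is a made-up helper name.

```python
import ntpath
import os
import posixpath


def native_argument(path, flavour=os.path):
    """Normalise path using the separator rules of the given path module."""
    return flavour.normpath(path)


portable = "C:/input/file/to/myprog"
print(native_argument(portable, ntpath))     # prints C:\input\file\to\myprog
print(native_argument(portable, posixpath))  # prints C:/input/file/to/myprog
```

With the default flavour=os.path the function follows whatever platform the script runs on, which is what you would want before building the command line.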
It works on Windows, so if you define "whatever the platform" to be Unix and Windows, you're fine.
On the other hand, Python also runs on VMS, RISC OS, and other odd platforms that use completely different filename conventions. However, it's probable that trying to get your application to run on VMS, blind, is kind of silly anyway - "premature portability is the root of some relatively minor evil"
I like using the os.path functions anyway because they are good for expressing intent - instead of just a string concatenation, which might be done for any of a million purposes, it reads very explicitly as a path manipulation.
Windows supports / as a path separator. The only incompatibilities between Unix filenames and Windows filenames are:
the allowed characters in filenames
the special names and
case sensitivity
Windows is more restrictive on the first two counts (that is, it has more forbidden characters and more special names), while Unix is typically case sensitive. There are some answers here listing exactly what these characters and names are. I'll see if I can find them.
Now, if your development environment comes with a function to create or manipulate paths, you should use it, it's there for a reason, y'know. Especially given that there are a lot more platforms than Windows and Unix.
Answering your first question: yes, ../dir/file will work, unless it hits one of the incompatibilities mentioned above.
OS X and Linux are both Unix compatible, so by definition they use the format you gave at the beginning of the question. Windows allows "/" in addition to "\" so that programs could be interchangeable with Xenix, a Unix variant that Microsoft was trying out a long time ago, and that compatibility has been carried forward to the present. Thus it works too.
I don't know how many other platforms Python has been ported to, and I can't speak for them.
As others have said, a forward slash will work in all cases, but you're better off creating a list of path segments and os.path.join()-ing them.
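A short illustration of the list-of-segments approach. The segment names are just the ones from the question; posixpath and ntpath are called directly only to show what os.path.join would produce on each family of platforms.

```python
import ntpath
import os
import posixpath

# os.pardir is '..' on today's platforms and documents the intent.
segments = [os.pardir, "path", "to", "file"]

print(os.path.join(*segments))    # uses the separator of the current platform
print(posixpath.join(*segments))  # prints ../path/to/file
print(ntpath.join(*segments))     # prints ..\path\to\file
```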
