I am automatically generating filenames and I do not want anything to be overwritten. I am lazily using this little line of code:
fd, filepath = tempfile.mkstemp(ext, prefix='odt_img_', dir=self.destPath)
os.close(fd) # just using the name and overwriting later
Later on I write to filepath, but I am not sure if mkstemp just adds some random letters or if it actually makes sure the name is unique.
tempfile.mkstemp does more than append random letters: it guarantees that it creates and opens a new file whose name did not already exist. From the docs:
Creates a temporary file in the most secure manner possible. There are no race conditions in the file’s creation, assuming that the platform properly implements the os.O_EXCL flag for os.open().
and the O_EXCL flag specifies:
Ensure that this call creates the file: if this flag is specified in conjunction with O_CREAT, and the filename already exists, then open() will fail.
Internally, mkstemp just loops through a random sequence of names, trying to create a file that does not exist, until it succeeds or runs out of "ideas", in which case it fails with an IOError (FileExistsError in Python 3).
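To see the O_EXCL guarantee that mkstemp relies on, here is a minimal sketch that calls os.open directly (POSIX-style semantics assumed; the helper name try_exclusive_create is hypothetical). The second attempt on the same path fails instead of silently reusing the name, which is what lets mkstemp detect a collision and retry with another random name.

```python
import os
import tempfile

def try_exclusive_create(path):
    """Hypothetical helper: True if `path` was created, False if it already existed."""
    try:
        # O_CREAT | O_EXCL: the call fails if the name already exists.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "example.txt")
    first = try_exclusive_create(p)   # True: the file did not exist yet
    second = try_exclusive_create(p)  # False: O_EXCL refuses an existing name
```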
Related
I'm writing a function that does some operations with a .log file. The program checks whether /logs/ansible.log exists before proceeding. If /logs/ansible.log doesn't exist, it should go ahead and create the file and the directory structure (neither exists beforehand).
import os

try:
    if not os.path.exists("/logs/ansible.log"):
        # create the /logs/ansible.log file
        pass
finally:
    # do something
    pass
I know I can create the ansible.log file with open('ansible.log', 'w') and create the directory with os.makedirs('/logs/'), but how can I create '/logs/ansible.log' in one go?
*** Assume the program is being executed as root.
import os

def createAndOpen(filename, mode):
    # Create any missing parent directories ("or '.'" covers a bare filename).
    os.makedirs(os.path.dirname(filename) or ".", exist_ok=True)
    return open(filename, mode)
Now, you can open the file and create the folder at once:
with createAndOpen('/logs/ansible.log', 'a') as f:
    f.write('Hello world')
Beyond a helper like this, it isn't possible. The operating system does not provide a single call that creates the missing directories and the file together, so any function that did this would need the same logic as above, just less visible to you. Since no such function exists, you can simply write it yourself.
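If you prefer pathlib, the same idea can be sketched with Path.mkdir(parents=True, exist_ok=True) followed by Path.open(). It is still two steps under the hood. The name create_and_open is hypothetical, and the demo uses a throwaway directory instead of /logs so it runs without root.

```python
import os
import tempfile
from pathlib import Path

def create_and_open(filename, mode="a"):
    """Hypothetical pathlib variant of createAndOpen: make missing parents, then open."""
    path = Path(filename)
    # Separate step: the OS has no single "mkdir -p and open" call.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path.open(mode)

# Demo in a temporary directory (assumption: stands in for /logs).
with tempfile.TemporaryDirectory() as base:
    log_path = os.path.join(base, "logs", "ansible.log")
    with create_and_open(log_path) as f:
        f.write("Hello world")
    with open(log_path) as f:
        contents = f.read()
```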
I want to copy a file the way, for example, Ubuntu's Nautilus file manager does: if the destination already exists, it creates a new filename with a higher index. I tried shutil.copyfile, but it overwrites the destination file. How can I increment the filename in Python if the destination file already exists?
shutil.copyfile(src, dst, *, follow_symlinks=True)
Copy the contents (no metadata) of the file named src to a file named dst and return dst. src and dst are path names given as strings. dst must be the complete target file name; look at shutil.copy() for a copy that accepts a target directory path. If src and dst specify the same file, SameFileError is raised.
The destination location must be writable; otherwise, an OSError exception will be raised. If dst already exists, it will be replaced. Special files such as character or block devices and pipes cannot be copied with this function.
If follow_symlinks is false and src is a symbolic link, a new symbolic link will be created instead of copying the file src points to.
Changed in version 3.3: IOError used to be raised instead of OSError. Added follow_symlinks argument. Now returns dst.
Changed in version 3.4: Raise SameFileError instead of Error. Since the former is a subclass of the latter, this change is backward compatible.
exception shutil.SameFileError
This exception is raised if source and destination in copyfile() are the same file.
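The documentation above confirms that copyfile simply replaces an existing dst. One possible approach, not taken from the original thread, is to claim each candidate name with exclusive creation (open mode "xb", i.e. O_CREAT | O_EXCL) and move on to the next index when the name is taken. The " (n)" naming scheme and the helper name copy_with_increment are assumptions, not Nautilus's documented behavior.

```python
import os
import shutil
import tempfile

def copy_with_increment(src, dst):
    """Hypothetical helper: copy src to dst without overwriting; return the path written.

    If dst exists, try "name (1).ext", "name (2).ext", ... (assumed naming scheme).
    Mode "xb" claims a name atomically, so a concurrent copy cannot grab it too.
    Like copyfile, only the contents are copied, not the metadata.
    """
    root, ext = os.path.splitext(dst)
    candidate, n = dst, 0
    with open(src, "rb") as inp:
        while True:
            try:
                out = open(candidate, "xb")
            except FileExistsError:
                n += 1
                candidate = f"{root} ({n}){ext}"
                continue
            with out:
                shutil.copyfileobj(inp, out)
            return candidate

# Demo: copy the same file to the same destination three times.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "report.txt")
    with open(src, "w") as f:
        f.write("data")
    dst = os.path.join(d, "copy.txt")
    results = [os.path.basename(copy_with_increment(src, dst)) for _ in range(3)]
    with open(os.path.join(d, "copy (2).txt")) as f:
        last_contents = f.read()
```

Because the existence check and the creation happen in one open() call, there is no window in which another process can create the same name between "check" and "write".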
I'm currently abusing tempfile a little bit by using it to generate unique names for permanent files. I'm using the following function to get a unique id:
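import os
import tempfile
def gen_id():
    tf = tempfile.mktemp()
    tfname = os.path.split(tf)[1]
    # str.strip() removes a set of characters, not a prefix, so slice the prefix off instead
    prefix = tempfile.gettempprefix()
    return tfname[len(prefix):] if tfname.startswith(prefix) else tfname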
Then I'm storing a file in a custom directory with a filename from that function. I use this function to give me more flexibility than the built-ins; with this function I can choose my own directory and remove the tmp prefix.
Since tempfiles are supposed to be "temporary files," are there any dangers to using their uniqueness for permanent files like this? Any reasons why my function would not be safe for generating unique ids?
EDIT: I got the idea to use tempfile for unique names from this SO answer.
help(tempfile.mktemp) -> "This function is unsafe and should not be used. The file name refers to a file that did not exist at some point, but by the time you get around to creating it, someone else may have beaten you to the punch."
That is, the name you get back may already belong to an existing file by the time you use it.
The replacement is tempfile.mkstemp(). It actually creates the file, which you would normally have to remove after use, but you can pass it your custom directory and an empty prefix and simply let it create the permanent files for you. It checks for existing files with the same name and keeps generating new names until it finds an unused one.
tempfile.mkstemp(suffix="", prefix=template, dir=None, text=False)
(The tempfile module is written in Python; you can read its source in \Lib\tempfile.py.)
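Applied to the gen_id use case, a minimal sketch might look like this (Python 3 signature of mkstemp assumed; the helper name create_permanent_unique_file is hypothetical). prefix="" drops the default "tmp" prefix, dir= points at the custom directory, and the file is left on disk for you to keep.

```python
import os
import tempfile

def create_permanent_unique_file(directory, suffix=""):
    """Hypothetical helper: create an empty, uniquely named file in `directory`.

    mkstemp retries with new random names until its exclusive create succeeds.
    Nothing deletes the file afterwards, so it can serve as a permanent file.
    """
    fd, path = tempfile.mkstemp(suffix=suffix, prefix="", dir=directory)
    os.close(fd)  # only the reserved name is needed; write to it later
    return path

with tempfile.TemporaryDirectory() as d:
    paths = [create_permanent_unique_file(d, ".dat") for _ in range(20)]
    names = [os.path.basename(p) for p in paths]
    all_unique = len(set(paths)) == len(paths)
    all_exist = all(os.path.exists(p) for p in paths)
```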
I highly suggest just using this comment from the same answer as the way of generating unique names.
No need to abuse mktemp for that (and it's deprecated anyway).
Keep in mind that mktemp only guarantees the file name doesn't exist at the time of the call. If you later delete all of your temp files and cache, or even immediately after the call, the same path can be handed out, and the file created, twice.
Picking a random name makes collisions less likely in that case and has no real downside. You should still check for the small chance that a collision occurs, and generate a new name if it does.
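One way to combine a random name with the collision check is sketched below, assuming uuid4 hex strings are acceptable as filenames (the helper name claim_random_name is hypothetical). Opening with mode "x" both checks and creates in a single step, so a collision surfaces as FileExistsError and the loop simply picks a new name.

```python
import os
import tempfile
import uuid

def claim_random_name(directory, suffix=""):
    """Hypothetical helper: create and return a new file with a random uuid4-based name."""
    while True:
        path = os.path.join(directory, uuid.uuid4().hex + suffix)
        try:
            with open(path, "x"):  # "x" fails if the path already exists
                pass
        except FileExistsError:
            continue  # extremely unlikely collision: try another name
        return path

with tempfile.TemporaryDirectory() as d:
    claimed = claim_random_name(d, ".json")
    claimed_existed = os.path.exists(claimed)
```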
As tempfile.mktemp is deprecated in Python 2.7, I generate a unique path to a temporary file as follows:
import tempfile

temp = tempfile.NamedTemporaryFile(suffix=".py")
path_to_generated_py = temp.name
temp.close()
# now I use path_to_generated_py to create a python file
Is this the recommended way in Python 2.7? Since I close the temp file immediately, it looks like I'm misusing NamedTemporaryFile...
The direct replacement for tempfile.mktemp() is tempfile.mkstemp(). The latter creates the file, like NamedTemporaryFile, so you must close it (as in your code snippet). The difference from NamedTemporaryFile is that the file is not deleted when closed. This is actually required: your version has a theoretical race condition in which two processes might end up with the same temporary file name. If you use mkstemp() instead, the file is never deleted and will likely be overwritten by the third-party library you use, but at every point in time the file exists, so there is no risk that another process creates a temporary file with the same name.
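A minimal sketch of that mkstemp pattern, written for Python 3 (the helper name reserve_py_path is hypothetical, and the write step stands in for the third-party library that fills the file). Note that cleanup is now your job, since mkstemp never removes the file.

```python
import os
import tempfile

def reserve_py_path():
    """Hypothetical helper: return the path of a new, empty .py file."""
    fd, path = tempfile.mkstemp(suffix=".py")
    os.close(fd)  # keep the empty file on disk so the name stays taken
    return path

path_to_generated_py = reserve_py_path()
# Stand-in for the library that writes to the path (it overwrites the placeholder).
with open(path_to_generated_py, "w") as f:
    f.write("x = 1\n")
with open(path_to_generated_py) as f:
    generated = f.read()
os.remove(path_to_generated_py)  # mkstemp never deletes the file for you
```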
Is it possible to check if a file has been deleted or recreated in python?
For example, if you call open("file") in the script and then, while that file is still open, run rm file; touch file, the script still holds a reference to the old file even though it has already been deleted.
You should fstat the file descriptor for the opened file.
>>> import os
>>> f = open("testdv.py")
>>> os.fstat(f.fileno())
posix.stat_result(st_mode=33188, st_ino=1508053, st_dev=65027L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=1107, st_atime=1349180541, st_mtime=1349180540, st_ctime=1349180540)
>>> os.fstat(f.fileno()).st_nlink
1
Ok, this file has one link, so one name in the filesystem. Now remove it:
>>> os.unlink("testdv.py")
>>> os.fstat(f.fileno()).st_nlink
0
No more links, so we have an "anonymous file" that's only kept alive as long as we have it open. Creating a new file with the same name has no effect on the old file:
>>> g = open("testdv.py", "w")
>>> os.fstat(g.fileno()).st_nlink
1
>>> os.fstat(f.fileno()).st_nlink
0
Of course, st_nlink can sometimes be >1 initially, so checking that for zero is not entirely reliable (though in a controlled setting, it might be good enough). Instead, you can verify whether the file at the path you initially opened is the same one that you have a file descriptor for by comparing stat results:
>>> os.stat("testdv.py") == os.fstat(f.fileno())
False
>>> os.stat("testdv.py") == os.fstat(g.fileno())
True
(And if you want this to be 100% correct, then you should compare only the st_dev and st_ino fields on stat results, since the other fields and st_atime in particular might change in between the calls.)
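The st_dev/st_ino comparison can be wrapped in a small helper. This is a POSIX-only sketch (Windows does not let you delete a file that is still open), and the name file_was_replaced is hypothetical. It uses os.path.samestat, which compares exactly those two fields, and treats a missing path as "replaced".

```python
import os
import tempfile

def file_was_replaced(f, path):
    """Hypothetical helper: True if `path` no longer names the file `f` has open."""
    try:
        current = os.stat(path)
    except FileNotFoundError:
        return True  # deleted and not (yet) recreated
    # samestat compares only st_dev and st_ino, so changing timestamps don't matter.
    return not os.path.samestat(current, os.fstat(f.fileno()))

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "file")
    open(p, "w").close()
    f = open(p)
    before = file_was_replaced(f, p)  # False: same inode behind the name
    os.unlink(p)                      # like `rm file`
    open(p, "w").close()              # like `touch file`
    after = file_was_replaced(f, p)   # True: a new inode behind the same name
    f.close()
```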
Yes. Use the os.stat() function to check the file length. If the length is zero (or the call raises a "file not found" error, FileNotFoundError in Python 3), then someone deleted the file.
Alternatively, you can open+write+close the file each time you need to write something into it. The drawback is that opening a file is a pretty slow operation, so this is out of the question if you need to write a lot of data.
Why? Because the new file isn't the file that you're holding open. In a nutshell, Unix filesystems have two levels. One is the directory entry, which maps a file name to an inode; the other is the inode itself, which holds the metadata (file size, modification time, and so on) and points to the file data.
When you open a file, Unix uses the name to find the inode. After that, the open file handle refers to the inode directly, so changes to the directory entry have no effect on any open "file handles". That is exactly why you can delete the directory entry: your program isn't using it.
When you use os.stat(), the name is looked up again, so you see whatever file the name currently refers to, not necessarily the one you have open.
On the positive side, this lets you create files that no one but your program can see: open the file, delete it, and then use it. Since there is no directory entry for the file, no other program can reach the data by name.
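A POSIX-only sketch of that "open, delete, use" trick follows (on Unix, tempfile.TemporaryFile already behaves this way, since its directory entry is either never created or removed right away). Here mkstemp creates the file, the name is unlinked immediately, and the open descriptor keeps the data alive with a link count of zero.

```python
import os
import tempfile

# Create a file, then remove its only name; the open descriptor keeps the data alive.
fd, path = tempfile.mkstemp()
os.unlink(path)
with os.fdopen(fd, "w+b") as f:
    f.write(b"only this process can see me")
    f.seek(0)
    data = f.read()
    links = os.fstat(f.fileno()).st_nlink  # no directory entries left
    name_visible = os.path.exists(path)    # the name is gone from the directory
```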
On the negative side, you can't easily solve problems like the one you have.
Yes, you can use the inotify facility to check for file changes and more, and there are Python bindings for it. With inotify you can watch files or directories for filesystem activity. According to the manual, the following events can be detected:
IN_ACCESS File was accessed (read) (*).
IN_ATTRIB Metadata changed, e.g., permissions, timestamps, extended attributes, link count (since Linux 2.6.25), UID, GID, etc. (*).
IN_CLOSE_WRITE File opened for writing was closed (*).
IN_CLOSE_NOWRITE File not opened for writing was closed (*).
IN_CREATE File/directory created in watched directory (*).
IN_DELETE File/directory deleted from watched directory (*).
IN_DELETE_SELF Watched file/directory was itself deleted.
IN_MODIFY File was modified (*).
IN_MOVE_SELF Watched file/directory was itself moved.
IN_MOVED_FROM File moved out of watched directory (*).
IN_MOVED_TO File moved into watched directory (*).
IN_OPEN File was opened (*).
From here you can search for a complete solution, but I think you get the overall idea. Of course, inotify only works on Linux, but from your question (the references to rm and touch) I assume that's what you're using.
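For a dependency-free illustration, the sketch below calls the inotify C API through ctypes. It is Linux-only, the helper names watch_file and read_masks are hypothetical, and a maintained Python binding is usually the more practical choice. It watches one file, removes it with unlink (the equivalent of rm), and reads the queued event masks; IN_ATTRIB appears because unlinking changes the link count.

```python
import ctypes
import os
import struct
import tempfile

# Event bits from <sys/inotify.h> (Linux only).
IN_ATTRIB = 0x00000004       # metadata changed, including the link count
IN_DELETE_SELF = 0x00000400  # the watched file itself was deleted
IN_MOVE_SELF = 0x00000800    # the watched file itself was moved

# CDLL(None) exposes symbols already loaded into the process, including libc.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.inotify_init.argtypes = []
_libc.inotify_init.restype = ctypes.c_int
_libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]
_libc.inotify_add_watch.restype = ctypes.c_int
_EVENT = struct.Struct("iIII")  # wd, mask, cookie, len; `len` name bytes follow

def watch_file(path, mask=IN_ATTRIB | IN_DELETE_SELF | IN_MOVE_SELF):
    """Hypothetical helper: return an inotify fd watching `path` for `mask`."""
    fd = _libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    if _libc.inotify_add_watch(fd, os.fsencode(path), mask) < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd

def read_masks(fd):
    """Hypothetical helper: block until events are queued, then return their masks."""
    buf = os.read(fd, 4096)
    masks, offset = [], 0
    while offset < len(buf):
        _wd, mask, _cookie, name_len = _EVENT.unpack_from(buf, offset)
        masks.append(mask)
        offset += _EVENT.size + name_len
    return masks

# Demo: watch a file, remove it, and inspect the queued events.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "watched.txt")
    open(path, "w").close()
    ifd = watch_file(path)
    os.unlink(path)  # equivalent of `rm watched.txt`
    masks = read_masks(ifd)
    os.close(ifd)

saw_removal = any(m & (IN_ATTRIB | IN_DELETE_SELF) for m in masks)
```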