I'm trying to write a Python script that backs up some files, but those files could be renamed or deleted at any time. I don't want my script to prevent that by locking the file; the file should still be deletable at any point during the backup.
How can I do this in Python? And what happens in that case? Do my file objects just become invalid once the stream can no longer be read?
Thank you! I'm somewhat new to Python.
As mentioned by @kindall, this is a Windows-specific issue: Unix-like OSes already let you delete a file while it is open.
To do this on Windows, I needed to use win32file.CreateFile() so I could pass the Windows-specific dwShareMode flags (in Python's pywin32, the parameter is just called shareMode).
Rough Example:
import msvcrt
import os
import win32file

# Open the file with all three share modes so other processes can still
# read, write, or even delete the file while we hold it open.
py_handle = win32file.CreateFile(
    'filename.txt',
    win32file.GENERIC_READ,
    win32file.FILE_SHARE_DELETE
        | win32file.FILE_SHARE_READ
        | win32file.FILE_SHARE_WRITE,
    None,
    win32file.OPEN_EXISTING,
    win32file.FILE_ATTRIBUTE_NORMAL,
    None
)
try:
    # Wrap the Win32 handle in a C runtime file descriptor, then in a
    # regular Python file object.
    with os.fdopen(
        msvcrt.open_osfhandle(py_handle.handle, os.O_RDONLY)
    ) as file_obj:
        ...  # read from `file_obj`
finally:
    py_handle.Close()
Note: if you need to keep the Win32 file open beyond the lifetime of the handle object returned by CreateFile(), you should invoke PyHANDLE.Detach() on that handle first.
On UNIX-like OSes, including Linux, this isn't an issue: deleting or renaming a file that another process has open is always allowed. Some other program could still write to the file while you're reading it, which could leave your copy corrupted, but that is solvable with a verification pass.
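For instance, here is a minimal sketch of such a verification pass (the paths are hypothetical): it checksums the source before and after the copy and flags the backup for a retry if anything changed mid-read. If the source is deleted mid-copy, the second checksum simply fails with an error, which also tells you to discard the copy.

import hashlib
import shutil

def checksum(path):
    # Stream the file in 1 MiB chunks so large files don't blow up memory.
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest()

src, dst = 'data.bin', 'data.bin.bak'  # hypothetical paths
before = checksum(src)
shutil.copyfile(src, dst)
if checksum(src) != before or checksum(dst) != before:
    raise RuntimeError('source changed during copy; retry the backup')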
On Windows, use the Volume Shadow Copy Service (VSS, also known as the Volume Snapshot Service). VSS creates a snapshot of the volume at a moment in time, and you can open files on the snapshot without locking the files on the original volume. A quick Google turned up a Python module for doing copies using VSS: http://sourceforge.net/projects/pyvss/
I want to create an empty file using a Python script in a Unix environment. I've seen several different ways of achieving this; what are the benefits/pitfalls of each?
os.system('touch abc')
open('abc','a').close()
open('abc','a')
subprocess.call(['touch','abc'])
Well, for a start, the ones that rely on touch are not portable. They won't work under standard Windows, for example, without installing Cygwin, GnuWin32, or some other package providing a touch utility.
They also involve the creation of a separate process for doing the work, something that's totally unnecessary in this case.
Of the four, I would probably use open('abc','a').close() if the intent is to create the file only if it doesn't already exist. In my opinion, that makes the intent clearest.
But, if you're trying to create an empty file, I'd probably be using the w write mode rather than the a append mode.
In addition, you probably also want to catch the exception if, for example, you cannot actually create the file.
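A minimal sketch of that error handling, reusing the 'abc' example:

try:
    open('abc', 'a').close()
except OSError as e:
    # Creation can fail, e.g. on a read-only filesystem or missing directory.
    print('could not create abc: {}'.format(e))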
TLDR: use
open('abc','a').close()
(or 'w' instead of 'a' if the intent is to truncate the file if it already exists).
Invoking a separate process to do something Python can do itself is wasteful, and non-portable to platforms where the external command is not available. (Additionally, os.system uses two processes, one more for a shell to parse the command line, and its use is discouraged in favor of subprocess.)
Not closing an open filehandle when you're done with it is bad practice, and could cause resource depletion in a larger program (you run out of filehandles if you open more and more files and never close them).
To create an empty file on Unix in Python:
import os

try:
    os.close(os.open('abc', os.O_WRONLY | os.O_CREAT | os.O_EXCL |
                            getattr(os, "O_CLOEXEC", 0) |
                            os.O_NONBLOCK | os.O_NOCTTY))
except OSError:
    pass  # decide what to consider an error in your case and reraise
          # 1. is it an error if the 'abc' entry already exists?
          # 2. is it an error if 'abc' is a directory or a symlink to a directory?
          # 3. is it an error if 'abc' is a named pipe?
          # 4. it is probably an error if the parent directory is not writable
          #    or the filesystem is read-only (can't create a file)
Or a more portable variant:
try:
    open('abc', 'ab', 0).close()
except OSError:
    pass  # see the comments above
Without the explicit .close() call, non-reference-counting Python implementations such as PyPy and Jython may delay closing the file until garbage collection runs (which may exhaust the available file descriptors for your process).
The latter example may get stuck on a FIFO and follows symlinks. On my system, it is equivalent to:
from os import *
open("abc", O_WRONLY | O_CREAT | O_APPEND | O_CLOEXEC, 0o666)
In addition, the touch command updates the access and modification times of an existing file to the current time.
In Python 3.4+, we have Path.touch() from pathlib. This creates an empty file if it doesn't exist, and updates the mtime if it does, just like your os.system('touch abc') example, but it's much more portable:
from pathlib import Path
abc = Path('abc')
abc.touch()
I found myself unable to open new files in Python. When I looked at ls -l /proc/PID/fd, I saw loads of files open by the Python process. The module I'm using is apparently opening lots of files and not closing them.
I expected that I could close the files by deleting the objects associated with the module that opened them, but nothing happened.
I also expected to see the file objects somewhere in the garbage collector, but nothing resembling the open files showed up with this:
import gc

for obj in gc.get_objects():
    if hasattr(obj, 'read'):
        print(obj)
The files disappear when I quit Python.
The problem is likely that those file descriptors are leaking without being associated with Python objects. Python has no way of seeing actual file descriptors (OS resources) that are not associated with Python objects; if they were, Python would close them when the objects are garbage collected. Alternatively, the third-party library may be tracking the file descriptors itself without ever closing them.
You can use os.close on a plain integer to close the associated file descriptor. If you know which file descriptors you want to keep open (usually stdin/stdout/stderr, which are 0, 1 and 2, and maybe a few others), you can just close every other integer from 0 to 65535, or more simply, those listed in /proc/<pid>/fd:
import os

KEEP_FD = set([0, 1, 2])

for fd in os.listdir(os.path.join("/proc", str(os.getpid()), "fd")):
    if int(fd) not in KEEP_FD:
        try:
            os.close(int(fd))
        except OSError:
            pass
This is a pretty evil hack, though. The better solution would be to fix the third-party library.
I'm writing a Python script that needs to write some data to a temporary file, then create a subprocess running a C++ program that will read the temporary file. I'm trying to use NamedTemporaryFile for this, but according to the docs,
Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
And indeed, on Windows if I flush the temporary file after writing, but don't close it until I want it to go away, the subprocess isn't able to open it for reading.
I'm working around this by creating the file with delete=False, closing it before spawning the subprocess, and then manually deleting it once I'm done:
import os
import tempfile

fileTemp = tempfile.NamedTemporaryFile(delete=False)
try:
    fileTemp.write(someStuff)
    fileTemp.close()
    # ...run the subprocess and wait for it to complete...
finally:
    os.remove(fileTemp.name)
This seems inelegant. Is there a better way to do this? Perhaps a way to open up the permissions on the temporary file so the subprocess can get at it?
Since nobody else appears to be interested in putting this information out in the open...
tempfile does expose a function, mkdtemp(), which can trivialize this problem:
import os
from tempfile import mkdtemp

temp_dir = mkdtemp()  # create first, so temp_dir is defined if anything below fails
try:
    temp_file = make_a_file_in_a_dir(temp_dir)
    do_your_subprocess_stuff(temp_file)
    remove_your_temp_file(temp_file)
finally:
    os.rmdir(temp_dir)
I leave the implementation of the intermediate functions up to the reader, as one might wish to use mkstemp() to tighten up the security of the temporary file itself, or to overwrite the file in place before removing it. I don't know what security restrictions you might have that couldn't easily be planned for by perusing the source of tempfile.
Anyway, yes, using NamedTemporaryFile on Windows may be inelegant, and my solution here may also be inelegant, but you've already decided that Windows support is more important than elegant code, so you might as well go ahead and do something readable.
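For illustration, the hypothetical helpers above (make_a_file_in_a_dir and remove_your_temp_file are names from the sketch, not a real API) might be filled in with mkstemp(), roughly like this:

import os
import tempfile

def make_a_file_in_a_dir(temp_dir, data=b''):
    # mkstemp() creates the file securely (mode 0600) inside temp_dir.
    fd, path = tempfile.mkstemp(dir=temp_dir)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    return path

def remove_your_temp_file(temp_file):
    os.remove(temp_file)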
According to Richard Oudkerk
(...) the only reason that trying to reopen a NamedTemporaryFile fails on
Windows is because when we reopen we need to use O_TEMPORARY.
and he gives an example of how to do this in Python 3.3+
import os, tempfile

DATA = b"hello bob"

def temp_opener(name, flag, mode=0o777):
    return os.open(name, flag | os.O_TEMPORARY, mode)

with tempfile.NamedTemporaryFile() as f:
    f.write(DATA)
    f.flush()
    with open(f.name, "rb", opener=temp_opener) as f:
        assert f.read() == DATA

assert not os.path.exists(f.name)
Because there's no opener parameter in the built-in open() in Python 2.x, we have to combine the lower-level os.open() and os.fdopen() functions to achieve the same effect:
import subprocess
import tempfile

DATA = b"hello bob"

with tempfile.NamedTemporaryFile() as f:
    f.write(DATA)
    f.flush()
    subprocess_code = \
        """import os
f = os.fdopen(os.open(r'{FILENAME}', os.O_RDWR | os.O_BINARY | os.O_TEMPORARY), 'rb')
assert f.read() == b'{DATA}'
""".replace('\n', ';').format(FILENAME=f.name, DATA=DATA)
    subprocess.check_output(['python', '-c', subprocess_code])
You can always go low-level, though I'm not sure if it's clean enough for you:
import os
import tempfile

fd, filename = tempfile.mkstemp()
try:
    os.write(fd, someStuff)
    os.close(fd)
    # ...run the subprocess and wait for it to complete...
finally:
    os.remove(filename)
If you open a temporary file using the existing Python libraries, it cannot be accessed from multiple processes on Windows. According to MSDN, you can pass CreateFile() a third parameter (dwShareMode) with the share-mode flag FILE_SHARE_READ, which:
Enables subsequent open operations on a file or device to request read
access. Otherwise, other processes cannot open the file or device if
they request read access. If this flag is not specified, but the file
or device has been opened for read access, the function fails.
So, you could write a Windows-specific C routine to create a custom temporary-file opener function, call it from Python, and then your sub-process could access the file without any error. But I think you should stick with your existing approach, as it is the most portable version, will work on any system, and thus is the most elegant implementation.
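For what it's worth, you don't strictly need a C routine for this; here is a rough, untested ctypes sketch of the same CreateFileW call (the Win32 constants are written out by hand, and 'tempfile.txt' is a placeholder name):

import ctypes
import msvcrt
import os

GENERIC_READ = 0x80000000
FILE_SHARE_READ = 0x00000001
OPEN_EXISTING = 3
FILE_ATTRIBUTE_NORMAL = 0x80

kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
kernel32.CreateFileW.restype = ctypes.c_void_p  # HANDLE is pointer-sized
kernel32.CreateFileW.argtypes = [
    ctypes.c_wchar_p, ctypes.c_uint32, ctypes.c_uint32, ctypes.c_void_p,
    ctypes.c_uint32, ctypes.c_uint32, ctypes.c_void_p,
]

handle = kernel32.CreateFileW(
    'tempfile.txt',       # placeholder file name
    GENERIC_READ,
    FILE_SHARE_READ,      # let other processes open it for reading too
    None, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, None)
if handle is None or handle == ctypes.c_void_p(-1).value:  # INVALID_HANDLE_VALUE
    raise ctypes.WinError(ctypes.get_last_error())

# Wrap the raw handle in a normal Python file object.
with os.fdopen(msvcrt.open_osfhandle(handle, os.O_RDONLY), 'rb') as f:
    data = f.read()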
A discussion of Linux and Windows file locking can be found here.
EDIT: Turns out it is possible to open & read the temporary file from multiple processes in Windows too. See Piotr Dobrogost's answer.
Using mkstemp() instead, with os.fdopen() in a with statement, avoids having to call close():
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as fileTemp:
        fileTemp.write(someStuff)
    # ...run the subprocess and wait for it to complete...
finally:
    os.remove(path)
I know this is a really old post, but I think it's still relevant today, given that the API has evolved and functions like mktemp are deprecated in favor of TemporaryFile() and TemporaryDirectory(). I just wanted to demonstrate in the following sample how to make sure that a temp directory is still available downstream:
Instead of coding:
tmpdirname = tempfile.TemporaryDirectory()
and using tmpdirname throughout your code, you should put your code in a with statement block to ensure that the directory is available for your code to call, like this:

with tempfile.TemporaryDirectory() as tmpdirname:
    ...  # dependent code, nested so it runs while the directory exists

If you reference the directory outside of the with block, it will likely already have been deleted.
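For example, a small self-contained sketch (the file name is arbitrary):

import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdirname:
    path = os.path.join(tmpdirname, 'scratch.txt')
    with open(path, 'w') as f:
        f.write('some data')
    # ...use `path` here, e.g. pass it to a subprocess...
# At this point the directory and everything in it is gone.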
In Python, and in general - does a close() operation on a file object imply a flush() operation?
Yes. It uses the underlying close() function, which does that for you (source).
NB: close() and flush() won't ensure that the data is actually secure on the disk. They just ensure that the OS has the data, i.e. that it isn't buffered inside the process.
You can use sync or fsync to get the data written to the disk.
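For example, a minimal sketch (the file name is arbitrary):

import os

with open('out.dat', 'wb') as f:
    f.write(b'important data')
    f.flush()             # push Python's buffer to the OS
    os.fsync(f.fileno())  # ask the OS to commit it to the disk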
Yes, in Python 3 this is finally in the official documentation, but it was already the case in Python 2 (see Martin's answer).
As a complement to this question: yes, Python flushes before close; however, if you want to ensure that data is written properly to disk, this is not enough.
This is how I would write a file so that it's atomically updated on a UNIX/Linux server, whether or not the target file already exists. Note that some filesystems will implicitly commit data to disk on close+rename (ext3 with data=ordered, the default; ext4 initially exposed many application flaws this way, before detection of write-close-rename patterns, which syncs data before metadata, was added for those cases [1]).
import json
import os
import tempfile

# destfile is the path of the file we want to update atomically.
# Write destfile, using a temporary name .<name>_XXXXXXXX
base, name = os.path.split(destfile)
tmpname = os.path.join(base, '.{}_'.format(name))  # This is the tmpfile prefix
with tempfile.NamedTemporaryFile('w', prefix=tmpname, delete=False) as fd:
    # Replace the prefix with the actual file path/name
    tmpname = str(fd.name)
    try:
        # Write to fd here... for example:
        json.dump({}, fd)

        # We want to fdatasync before closing, so we need to flush before close anyway
        fd.flush()
        os.fdatasync(fd)

        # Since we're using a tmpfile, we also need to set the proper permissions
        if os.path.exists(destfile):
            # Copy the destination file's mode
            os.fchmod(fd.fileno(), os.stat(destfile).st_mode)
        else:
            # Set the mode based on the current umask value
            umask = os.umask(0o22)  # read the current umask...
            os.umask(umask)         # ...and restore it
            os.fchmod(fd.fileno(), 0o666 & ~umask)  # 0o777 for dirs and executable files

        # Now we can close and rename the file (overwriting any existing one)
        fd.close()
        os.rename(tmpname, destfile)
    except:
        # On error, try to clean up the temporary file
        try:
            os.unlink(tmpname)
        except OSError:
            pass
        raise
IMHO it would have been nice if Python provided simple methods around this... At the same time, I guess if you care about data consistency, it's probably best to really understand what is going on at a low level, especially since there are many differences across operating systems and filesystems.
Also note that this does not guarantee the written data can be recovered, only that you will get a consistent copy of the data (old or new). To ensure the new data is safely written and accessible when you return, you need to use os.fsync(...) after the rename, and even then, if you have unsafe caches in the write path, you could still lose data. This is common on consumer-grade hardware, although any system can be configured for unsafe writes to boost performance. At least even with unsafe caches, the method above should still guarantee that whichever copy of the data you get is valid.
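As a sketch of that extra step (Linux-specific; os.O_DIRECTORY may not exist on every platform), you could fsync the containing directory after the rename, so the rename itself is made durable:

import os

os.rename(tmpname, destfile)  # the atomic replacement from above

# Sync the parent directory so the rename itself reaches the disk.
dirfd = os.open(os.path.dirname(destfile) or '.', os.O_DIRECTORY)
try:
    os.fsync(dirfd)
finally:
    os.close(dirfd)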
filehandle.close does not necessarily flush. Surprisingly, filehandle.flush doesn't help either; the data can still get stuck in the OS buffers while Python is running. Observe this session, where I wrote to a file, closed it, pressed Ctrl-Z to suspend Python back to the shell prompt, and examined the file:
$ cat xyz
ghi
$ fg
python
>>> x=open("xyz","a")
>>> x.write("morestuff\n")
>>> x.write("morestuff\n")
>>> x.write("morestuff\n")
>>> x.flush
<built-in method flush of file object at 0x7f58e0044660>
>>> x.close
<built-in method close of file object at 0x7f58e0044660>
>>>
[1]+ Stopped python
$ cat xyz
ghi
Subsequently I can reopen the file, and that necessarily syncs it (because, in this case, I open it in append mode). As others have said, the sync syscall (available from the os package) should flush all buffers to disk, but it has possible system-wide performance implications (it syncs all files on the system).
I want to detect whether a file is locked, using Python on Unix. It's OK to delete the file, if that helps detect whether it was locked.
The file could have originally been opened exclusively by another process. The documentation seems to suggest that os.unlink won't necessarily return an error if the file is locked.
Ideas?
The best way to check if a file is locked is to try to lock it. The fcntl module will do this in Python, e.g.
fcntl.lockf(fileobj.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
This will raise an IOError if the file is already locked; if it doesn't, you can then call
fcntl.lockf(fileobj.fileno(), fcntl.LOCK_UN)
to unlock it again.
Note that unlike on Windows, opening a file for writing does not automatically give you an exclusive lock on Unix. Also note that the fcntl module is not available on Windows; you'll need to use os.open, which is a much less friendly but more portable interface (and may require re-opening the file).
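Putting that together, a minimal sketch of a checker (is_locked is a hypothetical helper name; the file is opened 'r+b' because lockf's LOCK_EX requires write access):

import fcntl

def is_locked(path):
    with open(path, 'r+b') as f:
        try:
            # Try to take the lock without blocking.
            fcntl.lockf(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except (IOError, OSError):
            return True  # someone else holds the lock
        fcntl.lockf(f.fileno(), fcntl.LOCK_UN)
        return False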
From the fcntl docs:
fcntl.lockf(fd, operation[, length[, start[, whence]]])
If LOCK_NB is used and the lock cannot be acquired, an IOError will be raised and the exception will have an errno attribute set to EACCES or EAGAIN (depending on the operating system; for portability, check for both values).
This uses the underlying Unix fcntl() locking mechanism, so it looks like it should do what you want. Also note that os.open exists as well, which may be more platform-independent.
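For example, checking both errno values as the docs advise ('somefile' is a placeholder):

import errno
import fcntl

with open('somefile', 'r+b') as f:
    try:
        fcntl.lockf(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError) as e:
        # Check both values for portability, per the documentation.
        if e.errno in (errno.EACCES, errno.EAGAIN):
            print('file is locked by another process')
        else:
            raise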
I tried locking a file on a Mac and deleting the same file from another terminal.
The deletion is allowed:
import fcntl

lock_file_path = "/tmp/lock.file"
fd = open(lock_file_path, "w")
fcntl.flock(fd.fileno(), fcntl.LOCK_EX)
while True:
    print("Locked")