I am rather new to software testing. I wonder how to write a unit test for file related functions in python. e.g., if I have a file copy function as follows.
def copy_file (self):
if not os.path.isdir(dest_path):
os.makedirs(dest_path)
try:
shutil.copy2(src_path, dest_path)
except IOError as e:
print e
What should I do about testing above function? what should I assert (directory, file content, exceptions)?
You can rely on shutil.copy2 copying the contents correctly. You only need to test your code. In this case that it creates the directories if they don't exist, and that it swallows IOErrors.
And don't forget to clean up. ;)
I think, you can get some clue from test_shutil itself and see how it is testing the copy functionality. Namely it is moving files and testing if it exists using another module. The difference in behavior of standard shutil.copy to your wrapper is in dealing with destination if it not already exists. In shutil.copy2, if the destination not already exists, then a file is created which is move from the source, in your case, it is not file, but a destination directory is created and you move your source into that. So write tests there destination does not exists and ensure that after your wrapper runs, the destination is still a directory and and it contains the file that shutil moved.
Think about requirements you put on your method, like:
The method shall report an error in case source_dir does not exist or either source or destination director are not accessible
The method shall report create destination_dir if it does not exist and report an error if destination dir cannot be created
The method shall report an error if either source or destination dir have illegal values (null or illegal chars for directories)
The method shall copy all files from source to destination directory
check, if all files have been copied
The method shall replace existing files (maybe)
check, if existing files are replaced
Those are just some thoughts. Don't put any possible requirement and don't test too much, keep in mind what want to do with this method. If it's private or only used in your own code, you can reduce the scope, but if you provide a public API, then you have to make sure that any input has a defined result (which may be an error message).
Related
I have a path (including directory and file name).
I need to test if the file-name is a valid, e.g. if the file-system will allow me to create a file with such a name.
The file-name has some unicode characters in it.
It's safe to assume the directory segment of the path is valid and accessible (I was trying to make the question more gnerally applicable, and apparently I wen too far).
I very much do not want to have to escape anything unless I have to.
I'd post some of the example characters I am dealing with, but apparently they get automatically removed by the stack-exchange system. Anyways, I want to keep standard unicode entities like ö, and only escape things which are invalid in a filename.
Here is the catch. There may (or may not) already be a file at the target of the path. I need to keep that file if it does exist, and not create a file if it does not.
Basically I want to check if I could write to a path without actually opening the path for writing (and the automatic file creation/file clobbering that typically entails).
As such:
try:
open(filename, 'w')
except OSError:
# handle error here
from here
Is not acceptable, because it will overwrite the existent file, which I do not want to touch (if it's there), or create said file if it's not.
I know I can do:
if not os.access(filePath, os.W_OK):
try:
open(filePath, 'w').close()
os.unlink(filePath)
except OSError:
# handle error here
But that will create the file at the filePath, which I would then have to os.unlink.
In the end, it seems like it's spending 6 or 7 lines to do something that should be as simple as os.isvalidpath(filePath) or similar.
As an aside, I need this to run on (at least) Windows and MacOS, so I'd like to avoid platform-specific stuff.
``
tl;dr
Call the is_path_exists_or_creatable() function defined below.
Strictly Python 3. That's just how we roll.
A Tale of Two Questions
The question of "How do I test pathname validity and, for valid pathnames, the existence or writability of those paths?" is clearly two separate questions. Both are interesting, and neither have received a genuinely satisfactory answer here... or, well, anywhere that I could grep.
vikki's answer probably hews the closest, but has the remarkable disadvantages of:
Needlessly opening (...and then failing to reliably close) file handles.
Needlessly writing (...and then failing to reliable close or delete) 0-byte files.
Ignoring OS-specific errors differentiating between non-ignorable invalid pathnames and ignorable filesystem issues. Unsurprisingly, this is critical under Windows. (See below.)
Ignoring race conditions resulting from external processes concurrently (re)moving parent directories of the pathname to be tested. (See below.)
Ignoring connection timeouts resulting from this pathname residing on stale, slow, or otherwise temporarily inaccessible filesystems. This could expose public-facing services to potential DoS-driven attacks. (See below.)
We're gonna fix all that.
Question #0: What's Pathname Validity Again?
Before hurling our fragile meat suits into the python-riddled moshpits of pain, we should probably define what we mean by "pathname validity." What defines validity, exactly?
By "pathname validity," we mean the syntactic correctness of a pathname with respect to the root filesystem of the current system – regardless of whether that path or parent directories thereof physically exist. A pathname is syntactically correct under this definition if it complies with all syntactic requirements of the root filesystem.
By "root filesystem," we mean:
On POSIX-compatible systems, the filesystem mounted to the root directory (/).
On Windows, the filesystem mounted to %HOMEDRIVE%, the colon-suffixed drive letter containing the current Windows installation (typically but not necessarily C:).
The meaning of "syntactic correctness," in turn, depends on the type of root filesystem. For ext4 (and most but not all POSIX-compatible) filesystems, a pathname is syntactically correct if and only if that pathname:
Contains no null bytes (i.e., \x00 in Python). This is a hard requirement for all POSIX-compatible filesystems.
Contains no path components longer than 255 bytes (e.g., 'a'*256 in Python). A path component is a longest substring of a pathname containing no / character (e.g., bergtatt, ind, i, and fjeldkamrene in the pathname /bergtatt/ind/i/fjeldkamrene).
Syntactic correctness. Root filesystem. That's it.
Question #1: How Now Shall We Do Pathname Validity?
Validating pathnames in Python is surprisingly non-intuitive. I'm in firm agreement with Fake Name here: the official os.path package should provide an out-of-the-box solution for this. For unknown (and probably uncompelling) reasons, it doesn't. Fortunately, unrolling your own ad-hoc solution isn't that gut-wrenching...
O.K., it actually is. It's hairy; it's nasty; it probably chortles as it burbles and giggles as it glows. But what you gonna do? Nuthin'.
We'll soon descend into the radioactive abyss of low-level code. But first, let's talk high-level shop. The standard os.stat() and os.lstat() functions raise the following exceptions when passed invalid pathnames:
For pathnames residing in non-existing directories, instances of FileNotFoundError.
For pathnames residing in existing directories:
Under Windows, instances of WindowsError whose winerror attribute is 123 (i.e., ERROR_INVALID_NAME).
Under all other OSes:
For pathnames containing null bytes (i.e., '\x00'), instances of TypeError.
For pathnames containing path components longer than 255 bytes, instances of OSError whose errcode attribute is:
Under SunOS and the *BSD family of OSes, errno.ERANGE. (This appears to be an OS-level bug, otherwise referred to as "selective interpretation" of the POSIX standard.)
Under all other OSes, errno.ENAMETOOLONG.
Crucially, this implies that only pathnames residing in existing directories are validatable. The os.stat() and os.lstat() functions raise generic FileNotFoundError exceptions when passed pathnames residing in non-existing directories, regardless of whether those pathnames are invalid or not. Directory existence takes precedence over pathname invalidity.
Does this mean that pathnames residing in non-existing directories are not validatable? Yes – unless we modify those pathnames to reside in existing directories. Is that even safely feasible, however? Shouldn't modifying a pathname prevent us from validating the original pathname?
To answer this question, recall from above that syntactically correct pathnames on the ext4 filesystem contain no path components (A) containing null bytes or (B) over 255 bytes in length. Hence, an ext4 pathname is valid if and only if all path components in that pathname are valid. This is true of most real-world filesystems of interest.
Does that pedantic insight actually help us? Yes. It reduces the larger problem of validating the full pathname in one fell swoop to the smaller problem of only validating all path components in that pathname. Any arbitrary pathname is validatable (regardless of whether that pathname resides in an existing directory or not) in a cross-platform manner by following the following algorithm:
Split that pathname into path components (e.g., the pathname /troldskog/faren/vild into the list ['', 'troldskog', 'faren', 'vild']).
For each such component:
Join the pathname of a directory guaranteed to exist with that component into a new temporary pathname (e.g., /troldskog) .
Pass that pathname to os.stat() or os.lstat(). If that pathname and hence that component is invalid, this call is guaranteed to raise an exception exposing the type of invalidity rather than a generic FileNotFoundError exception. Why? Because that pathname resides in an existing directory. (Circular logic is circular.)
Is there a directory guaranteed to exist? Yes, but typically only one: the topmost directory of the root filesystem (as defined above).
Passing pathnames residing in any other directory (and hence not guaranteed to exist) to os.stat() or os.lstat() invites race conditions, even if that directory was previously tested to exist. Why? Because external processes cannot be prevented from concurrently removing that directory after that test has been performed but before that pathname is passed to os.stat() or os.lstat(). Unleash the dogs of mind-fellating insanity!
There exists a substantial side benefit to the above approach as well: security. (Isn't that nice?) Specifically:
Front-facing applications validating arbitrary pathnames from untrusted sources by simply passing such pathnames to os.stat() or os.lstat() are susceptible to Denial of Service (DoS) attacks and other black-hat shenanigans. Malicious users may attempt to repeatedly validate pathnames residing on filesystems known to be stale or otherwise slow (e.g., NFS Samba shares); in that case, blindly statting incoming pathnames is liable to either eventually fail with connection timeouts or consume more time and resources than your feeble capacity to withstand unemployment.
The above approach obviates this by only validating the path components of a pathname against the root directory of the root filesystem. (If even that's stale, slow, or inaccessible, you've got larger problems than pathname validation.)
Lost? Great. Let's begin. (Python 3 assumed. See "What Is Fragile Hope for 300, leycec?")
import errno, os
# Sadly, Python fails to provide the following magic number for us.
ERROR_INVALID_NAME = 123
'''
Windows-specific error code indicating an invalid pathname.
See Also
----------
https://learn.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-
Official listing of all such codes.
'''
def is_pathname_valid(pathname: str) -> bool:
'''
`True` if the passed pathname is a valid pathname for the current OS;
`False` otherwise.
'''
# If this pathname is either not a string or is but is empty, this pathname
# is invalid.
try:
if not isinstance(pathname, str) or not pathname:
return False
# Strip this pathname's Windows-specific drive specifier (e.g., `C:\`)
# if any. Since Windows prohibits path components from containing `:`
# characters, failing to strip this `:`-suffixed prefix would
# erroneously invalidate all valid absolute Windows pathnames.
_, pathname = os.path.splitdrive(pathname)
# Directory guaranteed to exist. If the current OS is Windows, this is
# the drive to which Windows was installed (e.g., the "%HOMEDRIVE%"
# environment variable); else, the typical root directory.
root_dirname = os.environ.get('HOMEDRIVE', 'C:') \
if sys.platform == 'win32' else os.path.sep
assert os.path.isdir(root_dirname) # ...Murphy and her ironclad Law
# Append a path separator to this directory if needed.
root_dirname = root_dirname.rstrip(os.path.sep) + os.path.sep
# Test whether each path component split from this pathname is valid or
# not, ignoring non-existent and non-readable path components.
for pathname_part in pathname.split(os.path.sep):
try:
os.lstat(root_dirname + pathname_part)
# If an OS-specific exception is raised, its error code
# indicates whether this pathname is valid or not. Unless this
# is the case, this exception implies an ignorable kernel or
# filesystem complaint (e.g., path not found or inaccessible).
#
# Only the following exceptions indicate invalid pathnames:
#
# * Instances of the Windows-specific "WindowsError" class
# defining the "winerror" attribute whose value is
# "ERROR_INVALID_NAME". Under Windows, "winerror" is more
# fine-grained and hence useful than the generic "errno"
# attribute. When a too-long pathname is passed, for example,
# "errno" is "ENOENT" (i.e., no such file or directory) rather
# than "ENAMETOOLONG" (i.e., file name too long).
# * Instances of the cross-platform "OSError" class defining the
# generic "errno" attribute whose value is either:
# * Under most POSIX-compatible OSes, "ENAMETOOLONG".
# * Under some edge-case OSes (e.g., SunOS, *BSD), "ERANGE".
except OSError as exc:
if hasattr(exc, 'winerror'):
if exc.winerror == ERROR_INVALID_NAME:
return False
elif exc.errno in {errno.ENAMETOOLONG, errno.ERANGE}:
return False
# If a "TypeError" exception was raised, it almost certainly has the
# error message "embedded NUL character" indicating an invalid pathname.
except TypeError as exc:
return False
# If no exception was raised, all path components and hence this
# pathname itself are valid. (Praise be to the curmudgeonly python.)
else:
return True
# If any other exception was raised, this is an unrelated fatal issue
# (e.g., a bug). Permit this exception to unwind the call stack.
#
# Did we mention this should be shipped with Python already?
Done. Don't squint at that code. (It bites.)
Question #2: Possibly Invalid Pathname Existence or Creatability, Eh?
Testing the existence or creatability of possibly invalid pathnames is, given the above solution, mostly trivial. The little key here is to call the previously defined function before testing the passed path:
def is_path_creatable(pathname: str) -> bool:
'''
`True` if the current user has sufficient permissions to create the passed
pathname; `False` otherwise.
'''
# Parent directory of the passed path. If empty, we substitute the current
# working directory (CWD) instead.
dirname = os.path.dirname(pathname) or os.getcwd()
return os.access(dirname, os.W_OK)
def is_path_exists_or_creatable(pathname: str) -> bool:
'''
`True` if the passed pathname is a valid pathname for the current OS _and_
either currently exists or is hypothetically creatable; `False` otherwise.
This function is guaranteed to _never_ raise exceptions.
'''
try:
# To prevent "os" module calls from raising undesirable exceptions on
# invalid pathnames, is_pathname_valid() is explicitly called first.
return is_pathname_valid(pathname) and (
os.path.exists(pathname) or is_path_creatable(pathname))
# Report failure on non-fatal filesystem complaints (e.g., connection
# timeouts, permissions issues) implying this path to be inaccessible. All
# other exceptions are unrelated fatal issues and should not be caught here.
except OSError:
return False
Done and done. Except not quite.
Question #3: Possibly Invalid Pathname Existence or Writability on Windows
There exists a caveat. Of course there does.
As the official os.access() documentation admits:
Note: I/O operations may fail even when os.access() indicates that they would succeed, particularly for operations on network filesystems which may have permissions semantics beyond the usual POSIX permission-bit model.
To no one's surprise, Windows is the usual suspect here. Thanks to extensive use of Access Control Lists (ACL) on NTFS filesystems, the simplistic POSIX permission-bit model maps poorly to the underlying Windows reality. While this (arguably) isn't Python's fault, it might nonetheless be of concern for Windows-compatible applications.
If this is you, a more robust alternative is wanted. If the passed path does not exist, we instead attempt to create a temporary file guaranteed to be immediately deleted in the parent directory of that path – a more portable (if expensive) test of creatability:
import os, tempfile
def is_path_sibling_creatable(pathname: str) -> bool:
'''
`True` if the current user has sufficient permissions to create **siblings**
(i.e., arbitrary files in the parent directory) of the passed pathname;
`False` otherwise.
'''
# Parent directory of the passed path. If empty, we substitute the current
# working directory (CWD) instead.
dirname = os.path.dirname(pathname) or os.getcwd()
try:
# For safety, explicitly close and hence delete this temporary file
# immediately after creating it in the passed path's parent directory.
with tempfile.TemporaryFile(dir=dirname): pass
return True
# While the exact type of exception raised by the above function depends on
# the current version of the Python interpreter, all such types subclass the
# following exception superclass.
except EnvironmentError:
return False
def is_path_exists_or_creatable_portable(pathname: str) -> bool:
'''
`True` if the passed pathname is a valid pathname on the current OS _and_
either currently exists or is hypothetically creatable in a cross-platform
manner optimized for POSIX-unfriendly filesystems; `False` otherwise.
This function is guaranteed to _never_ raise exceptions.
'''
try:
# To prevent "os" module calls from raising undesirable exceptions on
# invalid pathnames, is_pathname_valid() is explicitly called first.
return is_pathname_valid(pathname) and (
os.path.exists(pathname) or is_path_sibling_creatable(pathname))
# Report failure on non-fatal filesystem complaints (e.g., connection
# timeouts, permissions issues) implying this path to be inaccessible. All
# other exceptions are unrelated fatal issues and should not be caught here.
except OSError:
return False
Note, however, that even this may not be enough.
Thanks to User Access Control (UAC), the ever-inimicable Windows Vista and all subsequent iterations thereof blatantly lie about permissions pertaining to system directories. When non-Administrator users attempt to create files in either the canonical C:\Windows or C:\Windows\system32 directories, UAC superficially permits the user to do so while actually isolating all created files into a "Virtual Store" in that user's profile. (Who could have possibly imagined that deceiving users would have harmful long-term consequences?)
This is crazy. This is Windows.
Prove It
Dare we? It's time to test-drive the above tests.
Since NULL is the only character prohibited in pathnames on UNIX-oriented filesystems, let's leverage that to demonstrate the cold, hard truth – ignoring non-ignorable Windows shenanigans, which frankly bore and anger me in equal measure:
>>> print('"foo.bar" valid? ' + str(is_pathname_valid('foo.bar')))
"foo.bar" valid? True
>>> print('Null byte valid? ' + str(is_pathname_valid('\x00')))
Null byte valid? False
>>> print('Long path valid? ' + str(is_pathname_valid('a' * 256)))
Long path valid? False
>>> print('"/dev" exists or creatable? ' + str(is_path_exists_or_creatable('/dev')))
"/dev" exists or creatable? True
>>> print('"/dev/foo.bar" exists or creatable? ' + str(is_path_exists_or_creatable('/dev/foo.bar')))
"/dev/foo.bar" exists or creatable? False
>>> print('Null byte exists or creatable? ' + str(is_path_exists_or_creatable('\x00')))
Null byte exists or creatable? False
Beyond sanity. Beyond pain. You will find Python portability concerns.
if os.path.exists(filePath):
#the file is there
elif os.access(os.path.dirname(filePath), os.W_OK):
#the file does not exists but write privileges are given
else:
#can not write there
Note that path.exists can fail for more reasons than just the file is not there so you might have to do finer tests like testing if the containing directory exists and so on.
After my discussion with the OP it turned out, that the main problem seems to be, that the file name might contain characters that are not allowed by the filesystem. Of course they need to be removed but the OP wants to maintain as much human readablitiy as the filesystem allows.
Sadly I do not know of any good solution for this.
However Cecil Curry's answer takes a closer look at detecting the problem.
I found a PyPI module called pathvalidate.
pip install pathvalidate
Inside it has a function called sanitize_filepath which will take a file path and convert it into a valid file path:
from pathvalidate import sanitize_filepath
file1 = "ap:lle/fi:le"
print(sanitize_filepath(file1))
# Output: "apple/file"
It also works with reserved names too. If you feed it the file path con, it will return con_.
With this knowledge, we can check if the entered file path is equal to the sanitized one and that will mean the file path is valid.
import os
from pathvalidate import sanitize_filepath
def check(filePath):
if os.path.exists(filePath):
return True
if filePath == sanitize_filepath(filePath):
return True
return False
With Python 3, how about:
try:
with open(filename, 'x') as tempfile: # OSError if file exists or is invalid
pass
except OSError:
# handle error here
With the 'x' option we also don't have to worry about race conditions. See documentation here.
Now, this WILL create a very shortlived temporary file if it does not exist already - unless the name is invalid. If you can live with that, it simplifies things a lot.
open(filename,'r') #2nd argument is r and not w
will open the file or give an error if it doesn't exist. If there's an error, then you can try to write to the path, if you can't then you get a second error
try:
open(filename,'r')
return True
except IOError:
try:
open(filename, 'w')
return True
except IOError:
return False
Also have a look here about permissions on windows
try os.path.exists this will check for the path and return True if exists and False if not.
I'm writting an python program and now I'm working at exceptions.
while True:
try:
os.makedirs("{}\\test".format(dest))
except PermissionError:
print("Make sure that you have access to specified path")
print("Try again specify your path: ", end='')
dest = input()
continue
break
It is working but later I need to delete that folder.
What is the better way to do it?
Don't.
It is almost never worth verifying that you have permissions to perform an operation that your program requires. For one thing, permissions are not the only possible reason for failure. A delete may also fail because of a file lock by another program, for instance. Unless you have a very good reason to do otherwise, it is both more efficient and more reliable to just write your code to try the operation and then abort on failure:
import shutil
try:
shutil.rmtree(path_to_remove) # Recursively deletes directory and files inside it
except Exception as ex:
print('Failed to delete directory, manual clean up may be required: {}'.format(path_to_remove))
sys.exit(1)
Other concerns about your code
Use os.path.join to concatenate file paths: os.makedirs(os.path.join(dest, test)). This will use the appropriate directory separator for the operating system.
Why are you looping on failure? In real world programs, simply aborting the entire operation is simpler and usually makes for a better user experience.
Are you sure you aren't looking for the tempfile library? It allows you to spit out a unique directory to the operating system's standard temporary location:
import tempfile
with tempfile.TemporaryDirectory() as tmpdir:
some_function_that_creates_several_files(tmpdir)
for f in os.walk(tmpdir):
# do something with each file
# tmpdir automatically deleted when context manager exits
# Or if you really only need the file
with tempfile.TemporaryFile() as tmpfile:
tmpfile.write('my data')
some_function_that_needs_a_file(tmpfile)
# tmpfile automatically deleted when context manager exits
I think what you want is os.access.
os.access(path, mode, *, dir_fd=None, effective_ids=False, follow_symlinks=True)
Use the real uid/gid to test for access to path. Note that most operations will use the effective uid/gid, therefore this routine can be used in a suid/sgid environment to test if the invoking user has the specified access to path. mode should be F_OK to test the existence of path, or it can be the inclusive OR of one or more of R_OK, W_OK, and X_OK to test permissions. Return True if access is allowed, False if not.
For example:
os.access("/path", os.R_OK)
And the mode contains:
os.F_OK # existence
os.R_OK # readability
os.W_OK # writability
os.X_OK # executability
Refer: https://docs.python.org/3.7/library/os.html#os.access
I am trying to call a python function that takes an absolute path as an argument, but the file I want to reference is on the web.
Without cloning the file locally, is there a way I can refer to the file that will make python think the file is local?
In other words, I want to wrap the URL in a variable my_file_path, and have this return True:
os.path.isfile(my_file_path)
Note that I need to fake a file system path, as other calls in the program I am using are expecting a path, and not a file-like object (this includes other functions that call the function I linked)
A really great way to do this is with the requests library. You can get a file-like object using the stream=True option to the get function:
r = requests.get('https://api.github.com/events', stream=True)
loadmat(r.raw, ...)
In the case of needing an actual path, you can use the tempfile module as well:
with tempfile.NamedTemporaryFile() as fd:
r = requests.get('https://api.github.com/events', stream=True)
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
fd.flush()
loadmat(fd.name)
# other code here, where the temp file no longer exists but the data has been read
There is no way to make Python take a URL where it wants a path.
In many cases—like the very function you linked in your question—it actually wants a file-like object, and the object returned by, e.g., urlopen is file-like. But in other cases, that doesn't work.
So, what can you do?
Below the Python level, your operating system may have a way to mount different kinds of remote paths as if they were part of your local filesystem.
At a higher level, write your own wrapper that just downloads the file to a temporary file. That temporary file will, of course, pass the os.path.isfile(my_file_path) test that you wanted, and will work with everything else that needs a file. But it means that you need to keep the two "layers" of your code—the part that wants to deal with URLs, and the part that needs to deal with functions that can only take local files—separate, and write the interface between those layers. On at least some platforms, you can create a temporary file that never gets flushed to disk unless necessary. (You can even create a temporary file that doesn't appear anywhere in the directory tree, but that wouldn't help here, because then you obviously can't pass a pathname around…) So you're not "cloning the file" in any sense that actually matters.
I'm building a relatively simple application that asks for directories, checks if they're correct, and then removes one of them and recreates it with contents of the other one. I'm encountering this weird behaviour, I'll try to explain:
When I've got the destination folder window open, AND it's empty, there's an access denied exception, then I get kicked out of the folder and it gets removed. But then if it's not empty, it works just fine, no exceptions, the destination directory (from what it seems) gets emptied then filled with files from the source directory. Which is strange because it's supposed to straight out remove the destination folder no matter what, and then recreate it with the same name and contents from the source destination.
This doesn't make sense to me, shouldn't there be the exact same exception when I'm browsing the directory when it's not empty as when it's empty? What's the difference, it's still supposed to just delete the folder. Is there any logical explanation to that? Also, if there's an exception, why does the directory get removed anyway?
Code for this particular part is pretty straightforward (please keep in mind I'm a beginner :) )
def Delete(self, dest):
try:
shutil.rmtree(dest)
self.Paste(self.src, dest)
except (IOError, os.error) as e:
print e
def Paste(self, src, dest):
try:
shutil.copytree(src, dest)
except (IOError, os.error) as e:
print e
This is expected behaviour on Windows.
Internally shutil.rmtree calls the windows API function DeleteFile which is documented on MSDN (http://msdn.microsoft.com/en-us/library/windows/desktop/aa363915%28v=vs.85%29.aspx).
This function has the following property (highlight by me):
The DeleteFile function marks a file for deletion on close. Therefore, the file deletion does not occur until the last handle to the file is closed. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED.
If any other process still has a handle open (e.g. virus scanners, windows explorer because you watch the directory or anything else that might still have a handle to that directory), it will not go away.
Usually you just catch the exception in your Paste operation and retry it a few times/for some dozend milliseconds, to handle all those weird virus scanner anomalies.
Little bonus: You can use windbg or ProcessExplorer to find out who still keeps an open handle to your file (just use Find Handle in Process explorer and search for the filename).
I need to create a folder that I use only once, but need to have it exist until the next run. It seems like I should be using the tmp_file module in the standard library, but I'm not sure how to get the behavior that I want.
Currently, I'm doing the following to create the directory:
randName = "temp" + str(random.randint(1000, 9999))
os.makedirs(randName)
And when I want to delete the directory, I just look for a directory with "temp" in it.
This seems like a dirty hack, but I'm not sure of a better way at the moment.
Incidentally, the reason that I need the folder around is that I start a process that uses the folder with the following:
subprocess.Popen([command], shell=True).pid
and then quit my script to let the other process finish the work.
Creating the folder with a 4-digit random number is insecure, and you also need to worry about collisions with other instances of your program.
A much better way is to create the folder using tempfile.mkdtemp, which does exactly what you want (i.e. the folder is not deleted when your script exits). You would then pass the folder name to the second Popen'ed script as an argument, and it would be responsible for deleting it.
What you've suggested is dangerous. You may have race conditions if anyone else is trying to create those directories -- including other instances of your application. Also, deleting anything containing "temp" may result in deleting more than you intended. As others have mentioned, tempfile.mkdtemp is probably the safest way to go. Here is an example of what you've described, including launching a subprocess to use the new directory.
import tempfile
import shutil
import subprocess
d = tempfile.mkdtemp(prefix='tmp')
try:
subprocess.check_call(['/bin/echo', 'Directory:', d])
finally:
shutil.rmtree(d)
"I need to create a folder that I use only once, but need to have it exist until the next run."
"Incidentally, the reason that I need the folder around is that I start a process ..."
Not incidental, at all. Crucial.
It appears you have the following design pattern.
mkdir someDirectory
proc1 -o someDirectory # Write to the directory
proc2 -i someDirectory # Read from the directory
if [ %? == 0 ]
then
rm someDirectory
fi
Is that the kind of thing you'd write at the shell level?
If so, consider breaking your Python application into to several parts.
The parts that do the real work ("proc1" and "proc2")
A Shell which manages the resources and processes; essentially a Python replacement for a bash script.
A temporary file is something that lasts for a single program run.
What you need is not, therefore, a temporary file.
Also, beware of multiple users on a single machine - just deleting anything with the 'temp' pattern could be anti-social, doubly so if the directory is not located securely out of the way.
Also, remember that on some machines, the /tmp file system is rebuilt when the machine reboots.
You can also automatically register an function to completely remove the temporary directory on any exit (with or without error) by doing :
import atexit
import shutil
import tempfile
# create your temporary directory
d = tempfile.mkdtemp()
# suppress it when python will be closed
atexit.register(lambda: shutil.rmtree(d))
# do your stuff...
subprocess.Popen([command], shell=True).pid
tempfile is just fine, but to be on a safe side you'd need to safe a directory name somewhere until the next run, for example pickle it. then read it in the next run and delete directory. and you are not required to have /tmp for the root, tempfile.mkdtemp has an optional dir parameter for that. by and large, though, it won't be different from what you're doing at the moment.
The best way of creating the temporary file name is either using tempName.TemporaryFile(mode='w+b', suffix='.tmp', prifix='someRandomNumber' dir=None)
or u can use mktemp() function.
The mktemp() function will not actually create any file, but will provide a unique filename (actually does not contain PID).