Related
I have a path (including directory and file name).
I need to test if the file-name is a valid, e.g. if the file-system will allow me to create a file with such a name.
The file-name has some unicode characters in it.
It's safe to assume the directory segment of the path is valid and accessible (I was trying to make the question more gnerally applicable, and apparently I wen too far).
I very much do not want to have to escape anything unless I have to.
I'd post some of the example characters I am dealing with, but apparently they get automatically removed by the stack-exchange system. Anyways, I want to keep standard unicode entities like ö, and only escape things which are invalid in a filename.
Here is the catch. There may (or may not) already be a file at the target of the path. I need to keep that file if it does exist, and not create a file if it does not.
Basically I want to check if I could write to a path without actually opening the path for writing (and the automatic file creation/file clobbering that typically entails).
As such:
try:
open(filename, 'w')
except OSError:
# handle error here
from here
Is not acceptable, because it will overwrite the existent file, which I do not want to touch (if it's there), or create said file if it's not.
I know I can do:
if not os.access(filePath, os.W_OK):
try:
open(filePath, 'w').close()
os.unlink(filePath)
except OSError:
# handle error here
But that will create the file at the filePath, which I would then have to os.unlink.
In the end, it seems like it's spending 6 or 7 lines to do something that should be as simple as os.isvalidpath(filePath) or similar.
As an aside, I need this to run on (at least) Windows and MacOS, so I'd like to avoid platform-specific stuff.
``
tl;dr
Call the is_path_exists_or_creatable() function defined below.
Strictly Python 3. That's just how we roll.
A Tale of Two Questions
The question of "How do I test pathname validity and, for valid pathnames, the existence or writability of those paths?" is clearly two separate questions. Both are interesting, and neither have received a genuinely satisfactory answer here... or, well, anywhere that I could grep.
vikki's answer probably hews the closest, but has the remarkable disadvantages of:
Needlessly opening (...and then failing to reliably close) file handles.
Needlessly writing (...and then failing to reliable close or delete) 0-byte files.
Ignoring OS-specific errors differentiating between non-ignorable invalid pathnames and ignorable filesystem issues. Unsurprisingly, this is critical under Windows. (See below.)
Ignoring race conditions resulting from external processes concurrently (re)moving parent directories of the pathname to be tested. (See below.)
Ignoring connection timeouts resulting from this pathname residing on stale, slow, or otherwise temporarily inaccessible filesystems. This could expose public-facing services to potential DoS-driven attacks. (See below.)
We're gonna fix all that.
Question #0: What's Pathname Validity Again?
Before hurling our fragile meat suits into the python-riddled moshpits of pain, we should probably define what we mean by "pathname validity." What defines validity, exactly?
By "pathname validity," we mean the syntactic correctness of a pathname with respect to the root filesystem of the current system – regardless of whether that path or parent directories thereof physically exist. A pathname is syntactically correct under this definition if it complies with all syntactic requirements of the root filesystem.
By "root filesystem," we mean:
On POSIX-compatible systems, the filesystem mounted to the root directory (/).
On Windows, the filesystem mounted to %HOMEDRIVE%, the colon-suffixed drive letter containing the current Windows installation (typically but not necessarily C:).
The meaning of "syntactic correctness," in turn, depends on the type of root filesystem. For ext4 (and most but not all POSIX-compatible) filesystems, a pathname is syntactically correct if and only if that pathname:
Contains no null bytes (i.e., \x00 in Python). This is a hard requirement for all POSIX-compatible filesystems.
Contains no path components longer than 255 bytes (e.g., 'a'*256 in Python). A path component is a longest substring of a pathname containing no / character (e.g., bergtatt, ind, i, and fjeldkamrene in the pathname /bergtatt/ind/i/fjeldkamrene).
Syntactic correctness. Root filesystem. That's it.
Question #1: How Now Shall We Do Pathname Validity?
Validating pathnames in Python is surprisingly non-intuitive. I'm in firm agreement with Fake Name here: the official os.path package should provide an out-of-the-box solution for this. For unknown (and probably uncompelling) reasons, it doesn't. Fortunately, unrolling your own ad-hoc solution isn't that gut-wrenching...
O.K., it actually is. It's hairy; it's nasty; it probably chortles as it burbles and giggles as it glows. But what you gonna do? Nuthin'.
We'll soon descend into the radioactive abyss of low-level code. But first, let's talk high-level shop. The standard os.stat() and os.lstat() functions raise the following exceptions when passed invalid pathnames:
For pathnames residing in non-existing directories, instances of FileNotFoundError.
For pathnames residing in existing directories:
Under Windows, instances of WindowsError whose winerror attribute is 123 (i.e., ERROR_INVALID_NAME).
Under all other OSes:
For pathnames containing null bytes (i.e., '\x00'), instances of TypeError.
For pathnames containing path components longer than 255 bytes, instances of OSError whose errcode attribute is:
Under SunOS and the *BSD family of OSes, errno.ERANGE. (This appears to be an OS-level bug, otherwise referred to as "selective interpretation" of the POSIX standard.)
Under all other OSes, errno.ENAMETOOLONG.
Crucially, this implies that only pathnames residing in existing directories are validatable. The os.stat() and os.lstat() functions raise generic FileNotFoundError exceptions when passed pathnames residing in non-existing directories, regardless of whether those pathnames are invalid or not. Directory existence takes precedence over pathname invalidity.
Does this mean that pathnames residing in non-existing directories are not validatable? Yes – unless we modify those pathnames to reside in existing directories. Is that even safely feasible, however? Shouldn't modifying a pathname prevent us from validating the original pathname?
To answer this question, recall from above that syntactically correct pathnames on the ext4 filesystem contain no path components (A) containing null bytes or (B) over 255 bytes in length. Hence, an ext4 pathname is valid if and only if all path components in that pathname are valid. This is true of most real-world filesystems of interest.
Does that pedantic insight actually help us? Yes. It reduces the larger problem of validating the full pathname in one fell swoop to the smaller problem of only validating all path components in that pathname. Any arbitrary pathname is validatable (regardless of whether that pathname resides in an existing directory or not) in a cross-platform manner by following the following algorithm:
Split that pathname into path components (e.g., the pathname /troldskog/faren/vild into the list ['', 'troldskog', 'faren', 'vild']).
For each such component:
Join the pathname of a directory guaranteed to exist with that component into a new temporary pathname (e.g., /troldskog) .
Pass that pathname to os.stat() or os.lstat(). If that pathname and hence that component is invalid, this call is guaranteed to raise an exception exposing the type of invalidity rather than a generic FileNotFoundError exception. Why? Because that pathname resides in an existing directory. (Circular logic is circular.)
Is there a directory guaranteed to exist? Yes, but typically only one: the topmost directory of the root filesystem (as defined above).
Passing pathnames residing in any other directory (and hence not guaranteed to exist) to os.stat() or os.lstat() invites race conditions, even if that directory was previously tested to exist. Why? Because external processes cannot be prevented from concurrently removing that directory after that test has been performed but before that pathname is passed to os.stat() or os.lstat(). Unleash the dogs of mind-fellating insanity!
There exists a substantial side benefit to the above approach as well: security. (Isn't that nice?) Specifically:
Front-facing applications validating arbitrary pathnames from untrusted sources by simply passing such pathnames to os.stat() or os.lstat() are susceptible to Denial of Service (DoS) attacks and other black-hat shenanigans. Malicious users may attempt to repeatedly validate pathnames residing on filesystems known to be stale or otherwise slow (e.g., NFS Samba shares); in that case, blindly statting incoming pathnames is liable to either eventually fail with connection timeouts or consume more time and resources than your feeble capacity to withstand unemployment.
The above approach obviates this by only validating the path components of a pathname against the root directory of the root filesystem. (If even that's stale, slow, or inaccessible, you've got larger problems than pathname validation.)
Lost? Great. Let's begin. (Python 3 assumed. See "What Is Fragile Hope for 300, leycec?")
import errno, os
# Sadly, Python fails to provide the following magic number for us.
ERROR_INVALID_NAME = 123
'''
Windows-specific error code indicating an invalid pathname.
See Also
----------
https://learn.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-
Official listing of all such codes.
'''
def is_pathname_valid(pathname: str) -> bool:
'''
`True` if the passed pathname is a valid pathname for the current OS;
`False` otherwise.
'''
# If this pathname is either not a string or is but is empty, this pathname
# is invalid.
try:
if not isinstance(pathname, str) or not pathname:
return False
# Strip this pathname's Windows-specific drive specifier (e.g., `C:\`)
# if any. Since Windows prohibits path components from containing `:`
# characters, failing to strip this `:`-suffixed prefix would
# erroneously invalidate all valid absolute Windows pathnames.
_, pathname = os.path.splitdrive(pathname)
# Directory guaranteed to exist. If the current OS is Windows, this is
# the drive to which Windows was installed (e.g., the "%HOMEDRIVE%"
# environment variable); else, the typical root directory.
root_dirname = os.environ.get('HOMEDRIVE', 'C:') \
if sys.platform == 'win32' else os.path.sep
assert os.path.isdir(root_dirname) # ...Murphy and her ironclad Law
# Append a path separator to this directory if needed.
root_dirname = root_dirname.rstrip(os.path.sep) + os.path.sep
# Test whether each path component split from this pathname is valid or
# not, ignoring non-existent and non-readable path components.
for pathname_part in pathname.split(os.path.sep):
try:
os.lstat(root_dirname + pathname_part)
# If an OS-specific exception is raised, its error code
# indicates whether this pathname is valid or not. Unless this
# is the case, this exception implies an ignorable kernel or
# filesystem complaint (e.g., path not found or inaccessible).
#
# Only the following exceptions indicate invalid pathnames:
#
# * Instances of the Windows-specific "WindowsError" class
# defining the "winerror" attribute whose value is
# "ERROR_INVALID_NAME". Under Windows, "winerror" is more
# fine-grained and hence useful than the generic "errno"
# attribute. When a too-long pathname is passed, for example,
# "errno" is "ENOENT" (i.e., no such file or directory) rather
# than "ENAMETOOLONG" (i.e., file name too long).
# * Instances of the cross-platform "OSError" class defining the
# generic "errno" attribute whose value is either:
# * Under most POSIX-compatible OSes, "ENAMETOOLONG".
# * Under some edge-case OSes (e.g., SunOS, *BSD), "ERANGE".
except OSError as exc:
if hasattr(exc, 'winerror'):
if exc.winerror == ERROR_INVALID_NAME:
return False
elif exc.errno in {errno.ENAMETOOLONG, errno.ERANGE}:
return False
# If a "TypeError" exception was raised, it almost certainly has the
# error message "embedded NUL character" indicating an invalid pathname.
except TypeError as exc:
return False
# If no exception was raised, all path components and hence this
# pathname itself are valid. (Praise be to the curmudgeonly python.)
else:
return True
# If any other exception was raised, this is an unrelated fatal issue
# (e.g., a bug). Permit this exception to unwind the call stack.
#
# Did we mention this should be shipped with Python already?
Done. Don't squint at that code. (It bites.)
Question #2: Possibly Invalid Pathname Existence or Creatability, Eh?
Testing the existence or creatability of possibly invalid pathnames is, given the above solution, mostly trivial. The little key here is to call the previously defined function before testing the passed path:
def is_path_creatable(pathname: str) -> bool:
'''
`True` if the current user has sufficient permissions to create the passed
pathname; `False` otherwise.
'''
# Parent directory of the passed path. If empty, we substitute the current
# working directory (CWD) instead.
dirname = os.path.dirname(pathname) or os.getcwd()
return os.access(dirname, os.W_OK)
def is_path_exists_or_creatable(pathname: str) -> bool:
'''
`True` if the passed pathname is a valid pathname for the current OS _and_
either currently exists or is hypothetically creatable; `False` otherwise.
This function is guaranteed to _never_ raise exceptions.
'''
try:
# To prevent "os" module calls from raising undesirable exceptions on
# invalid pathnames, is_pathname_valid() is explicitly called first.
return is_pathname_valid(pathname) and (
os.path.exists(pathname) or is_path_creatable(pathname))
# Report failure on non-fatal filesystem complaints (e.g., connection
# timeouts, permissions issues) implying this path to be inaccessible. All
# other exceptions are unrelated fatal issues and should not be caught here.
except OSError:
return False
Done and done. Except not quite.
Question #3: Possibly Invalid Pathname Existence or Writability on Windows
There exists a caveat. Of course there does.
As the official os.access() documentation admits:
Note: I/O operations may fail even when os.access() indicates that they would succeed, particularly for operations on network filesystems which may have permissions semantics beyond the usual POSIX permission-bit model.
To no one's surprise, Windows is the usual suspect here. Thanks to extensive use of Access Control Lists (ACL) on NTFS filesystems, the simplistic POSIX permission-bit model maps poorly to the underlying Windows reality. While this (arguably) isn't Python's fault, it might nonetheless be of concern for Windows-compatible applications.
If this is you, a more robust alternative is wanted. If the passed path does not exist, we instead attempt to create a temporary file guaranteed to be immediately deleted in the parent directory of that path – a more portable (if expensive) test of creatability:
import os, tempfile
def is_path_sibling_creatable(pathname: str) -> bool:
'''
`True` if the current user has sufficient permissions to create **siblings**
(i.e., arbitrary files in the parent directory) of the passed pathname;
`False` otherwise.
'''
# Parent directory of the passed path. If empty, we substitute the current
# working directory (CWD) instead.
dirname = os.path.dirname(pathname) or os.getcwd()
try:
# For safety, explicitly close and hence delete this temporary file
# immediately after creating it in the passed path's parent directory.
with tempfile.TemporaryFile(dir=dirname): pass
return True
# While the exact type of exception raised by the above function depends on
# the current version of the Python interpreter, all such types subclass the
# following exception superclass.
except EnvironmentError:
return False
def is_path_exists_or_creatable_portable(pathname: str) -> bool:
'''
`True` if the passed pathname is a valid pathname on the current OS _and_
either currently exists or is hypothetically creatable in a cross-platform
manner optimized for POSIX-unfriendly filesystems; `False` otherwise.
This function is guaranteed to _never_ raise exceptions.
'''
try:
# To prevent "os" module calls from raising undesirable exceptions on
# invalid pathnames, is_pathname_valid() is explicitly called first.
return is_pathname_valid(pathname) and (
os.path.exists(pathname) or is_path_sibling_creatable(pathname))
# Report failure on non-fatal filesystem complaints (e.g., connection
# timeouts, permissions issues) implying this path to be inaccessible. All
# other exceptions are unrelated fatal issues and should not be caught here.
except OSError:
return False
Note, however, that even this may not be enough.
Thanks to User Access Control (UAC), the ever-inimicable Windows Vista and all subsequent iterations thereof blatantly lie about permissions pertaining to system directories. When non-Administrator users attempt to create files in either the canonical C:\Windows or C:\Windows\system32 directories, UAC superficially permits the user to do so while actually isolating all created files into a "Virtual Store" in that user's profile. (Who could have possibly imagined that deceiving users would have harmful long-term consequences?)
This is crazy. This is Windows.
Prove It
Dare we? It's time to test-drive the above tests.
Since NULL is the only character prohibited in pathnames on UNIX-oriented filesystems, let's leverage that to demonstrate the cold, hard truth – ignoring non-ignorable Windows shenanigans, which frankly bore and anger me in equal measure:
>>> print('"foo.bar" valid? ' + str(is_pathname_valid('foo.bar')))
"foo.bar" valid? True
>>> print('Null byte valid? ' + str(is_pathname_valid('\x00')))
Null byte valid? False
>>> print('Long path valid? ' + str(is_pathname_valid('a' * 256)))
Long path valid? False
>>> print('"/dev" exists or creatable? ' + str(is_path_exists_or_creatable('/dev')))
"/dev" exists or creatable? True
>>> print('"/dev/foo.bar" exists or creatable? ' + str(is_path_exists_or_creatable('/dev/foo.bar')))
"/dev/foo.bar" exists or creatable? False
>>> print('Null byte exists or creatable? ' + str(is_path_exists_or_creatable('\x00')))
Null byte exists or creatable? False
Beyond sanity. Beyond pain. You will find Python portability concerns.
if os.path.exists(filePath):
#the file is there
elif os.access(os.path.dirname(filePath), os.W_OK):
#the file does not exists but write privileges are given
else:
#can not write there
Note that path.exists can fail for more reasons than just the file is not there so you might have to do finer tests like testing if the containing directory exists and so on.
After my discussion with the OP it turned out, that the main problem seems to be, that the file name might contain characters that are not allowed by the filesystem. Of course they need to be removed but the OP wants to maintain as much human readablitiy as the filesystem allows.
Sadly I do not know of any good solution for this.
However Cecil Curry's answer takes a closer look at detecting the problem.
I found a PyPI module called pathvalidate.
pip install pathvalidate
Inside it has a function called sanitize_filepath which will take a file path and convert it into a valid file path:
from pathvalidate import sanitize_filepath
file1 = "ap:lle/fi:le"
print(sanitize_filepath(file1))
# Output: "apple/file"
It also works with reserved names too. If you feed it the file path con, it will return con_.
With this knowledge, we can check if the entered file path is equal to the sanitized one and that will mean the file path is valid.
import os
from pathvalidate import sanitize_filepath
def check(filePath):
if os.path.exists(filePath):
return True
if filePath == sanitize_filepath(filePath):
return True
return False
With Python 3, how about:
try:
with open(filename, 'x') as tempfile: # OSError if file exists or is invalid
pass
except OSError:
# handle error here
With the 'x' option we also don't have to worry about race conditions. See documentation here.
Now, this WILL create a very shortlived temporary file if it does not exist already - unless the name is invalid. If you can live with that, it simplifies things a lot.
open(filename,'r') #2nd argument is r and not w
will open the file or give an error if it doesn't exist. If there's an error, then you can try to write to the path, if you can't then you get a second error
try:
open(filename,'r')
return True
except IOError:
try:
open(filename, 'w')
return True
except IOError:
return False
Also have a look here about permissions on windows
try os.path.exists this will check for the path and return True if exists and False if not.
How do I check whether a file exists or not, without using the try statement?
If the reason you're checking is so you can do something like if file_exists: open_it(), it's safer to use a try around the attempt to open it. Checking and then opening risks the file being deleted or moved or something between when you check and when you try to open it.
If you're not planning to open the file immediately, you can use os.path.isfile
Return True if path is an existing regular file. This follows symbolic links, so both islink() and isfile() can be true for the same path.
import os.path
os.path.isfile(fname)
if you need to be sure it's a file.
Starting with Python 3.4, the pathlib module offers an object-oriented approach (backported to pathlib2 in Python 2.7):
from pathlib import Path
my_file = Path("/path/to/file")
if my_file.is_file():
# file exists
To check a directory, do:
if my_file.is_dir():
# directory exists
To check whether a Path object exists independently of whether is it a file or directory, use exists():
if my_file.exists():
# path exists
You can also use resolve(strict=True) in a try block:
try:
my_abs_path = my_file.resolve(strict=True)
except FileNotFoundError:
# doesn't exist
else:
# exists
Use os.path.exists to check both files and directories:
import os.path
os.path.exists(file_path)
Use os.path.isfile to check only files (note: follows symbolic links):
os.path.isfile(file_path)
Unlike isfile(), exists() will return True for directories. So depending on if you want only plain files or also directories, you'll use isfile() or exists(). Here is some simple REPL output:
>>> os.path.isfile("/etc/password.txt")
True
>>> os.path.isfile("/etc")
False
>>> os.path.isfile("/does/not/exist")
False
>>> os.path.exists("/etc/password.txt")
True
>>> os.path.exists("/etc")
True
>>> os.path.exists("/does/not/exist")
False
import os
if os.path.isfile(filepath):
print("File exists")
Use os.path.isfile() with os.access():
import os
PATH = './file.txt'
if os.path.isfile(PATH) and os.access(PATH, os.R_OK):
print("File exists and is readable")
else:
print("Either the file is missing or not readable")
import os
os.path.exists(path) # Returns whether the path (directory or file) exists or not
os.path.isfile(path) # Returns whether the file exists or not
Although almost every possible way has been listed in (at least one of) the existing answers (e.g. Python 3.4 specific stuff was added), I'll try to group everything together.
Note: every piece of Python standard library code that I'm going to post, belongs to version 3.5.3.
Problem statement:
Check file (arguable: also folder ("special" file) ?) existence
Don't use try / except / else / finally blocks
Possible solutions:
1. [Python.Docs]: os.path.exists(path)
Also check other function family members like os.path.isfile, os.path.isdir, os.path.lexists for slightly different behaviors:
Return True if path refers to an existing path or an open file descriptor. Returns False for broken symbolic links. On some platforms, this function may return False if permission is not granted to execute os.stat() on the requested file, even if the path physically exists.
All good, but if following the import tree:
os.path - posixpath.py (ntpath.py)
genericpath.py - line ~20+
def exists(path):
"""Test whether a path exists. Returns False for broken symbolic links"""
try:
st = os.stat(path)
except os.error:
return False
return True
it's just a try / except block around [Python.Docs]: os.stat(path, *, dir_fd=None, follow_symlinks=True). So, your code is try / except free, but lower in the framestack there's (at least) one such block. This also applies to other functions (including os.path.isfile).
1.1. [Python.Docs]: pathlib - Path.is_file()
It's a fancier (and more [Wiktionary]: Pythonic) way of handling paths, but
Under the hood, it does exactly the same thing (pathlib.py - line ~1330):
def is_file(self):
"""
Whether this path is a regular file (also True for symlinks pointing
to regular files).
"""
try:
return S_ISREG(self.stat().st_mode)
except OSError as e:
if e.errno not in (ENOENT, ENOTDIR):
raise
# Path doesn't exist or is a broken symlink
# (see https://bitbucket.org/pitrou/pathlib/issue/12/)
return False
2. [Python.Docs]: With Statement Context Managers
Either:
Create one:
class Swallow: # Dummy example
swallowed_exceptions = (FileNotFoundError,)
def __enter__(self):
print("Entering...")
def __exit__(self, exc_type, exc_value, exc_traceback):
print("Exiting:", exc_type, exc_value, exc_traceback)
# Only swallow FileNotFoundError (not e.g. TypeError - if the user passes a wrong argument like None or float or ...)
return exc_type in Swallow.swallowed_exceptions
And its usage - I'll replicate the os.path.isfile behavior (note that this is just for demonstrating purposes, do not attempt to write such code for production):
import os
import stat
def isfile_seaman(path): # Dummy func
result = False
with Swallow():
result = stat.S_ISREG(os.stat(path).st_mode)
return result
Use [Python.Docs]: contextlib.suppress(*exceptions) - which was specifically designed for selectively suppressing exceptions
But, they seem to be wrappers over try / except / else / finally blocks, as [Python.Docs]: Compound statements - The with statement states:
This allows common try...except...finally usage patterns to be encapsulated for convenient reuse.
3. Filesystem traversal functions
Search the results for matching item(s):
[Python.Docs]: os.listdir(path='.') (or [Python.Docs]: os.scandir(path='.') on Python v3.5+, backport: [PyPI]: scandir)
Under the hood, both use:
Nix: [Man7]: OPENDIR(3) / [Man7]: READDIR(3) / [Man7]: CLOSEDIR(3)
Win: [MS.Learn]: FindFirstFileW function (fileapi.h) / [MS.Learn]: FindNextFileW function (fileapi.h) / [MS.Learn]: FindClose function (fileapi.h)
via [GitHub]: python/cpython - (main) cpython/Modules/posixmodule.c
Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix, but only requires one for symbolic links on Windows.
[Python.Docs]: os.walk(top, topdown=True, onerror=None, followlinks=False)
Uses os.listdir (os.scandir when available)
[Python.Docs]: glob.iglob(pathname, *, root_dir=None, dir_fd=None, recursive=False, include_hidden=False) (or its predecessor: glob.glob)
Doesn't seem a traversing function per se (at least in some cases), but it still uses os.listdir
Since these iterate over folders, (in most of the cases) they are inefficient for our problem (there are exceptions, like non wildcarded globbing - as #ShadowRanger pointed out), so I'm not going to insist on them. Not to mention that in some cases, filename processing might be required.
4. [Python.Docs]: os.access(path, mode, *, dir_fd=None, effective_ids=False, follow_symlinks=True)
Its behavior is close to os.path.exists (actually it's wider, mainly because of the 2nd argument).
User permissions might restrict the file "visibility" as the doc states:
... test if the invoking user has the specified access to path. mode should be F_OK to test the existence of path...
Security considerations:
Using access() to check if a user is authorized to e.g. open a file before actually doing so using open() creates a security hole, because the user might exploit the short time interval between checking and opening the file to manipulate it.
os.access("/tmp", os.F_OK)
Since I also work in C, I use this method as well because under the hood, it calls native APIs (again, via "${PYTHON_SRC_DIR}/Modules/posixmodule.c"), but it also opens a gate for possible user errors, and it's not as Pythonic as other variants. So, don't use it unless you know what you're doing:
Nix: [Man7]: ACCESS(2)
Warning: Using these calls to check if a user is authorized to, for example, open a file before actually doing so using open(2) creates a security hole, because the user might exploit the short time interval between checking and opening the file to manipulate it. For this reason, the use of this system call should be avoided.
Win: [MS.Learn]: GetFileAttributesW function (fileapi.h)
As seen, this approach is highly discouraged (especially on Nix).
Note: calling native APIs is also possible via [Python.Docs]: ctypes - A foreign function library for Python, but in most cases it's more complicated. Before working with CTypes, check [SO]: C function called from Python via ctypes returns incorrect value (#CristiFati's answer) out.
(Win specific): since vcruntime###.dll (msvcr###.dll for older VStudio versions - I'm going to refer to it as UCRT) exports a [MS.Learn]: _access, _waccess function family as well, here's an example (note that the recommended [Python.Docs]: msvcrt - Useful routines from the MS VC++ runtime doesn't export them):
Python 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes as cts, os
>>> cts.CDLL("msvcrt")._waccess(u"C:\\Windows\\Temp", os.F_OK)
0
>>> cts.CDLL("msvcrt")._waccess(u"C:\\Windows\\Temp.notexist", os.F_OK)
-1
Notes:
Although it's not a good practice, I'm using os.F_OK in the call, but that's just for clarity (its value is 0)
I'm using _waccess so that the same code works on Python 3 and Python 2 (in spite of [Wikipedia]: Unicode related differences between them - [SO]: Passing utf-16 string to a Windows function (#CristiFati's answer))
Although this targets a very specific area, it was not mentioned in any of the previous answers
The Linux (Ubuntu ([Wikipedia]: Ubuntu version history) 16 x86_64 (pc064)) counterpart as well:
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes as cts, os
>>> cts.CDLL("/lib/x86_64-linux-gnu/libc.so.6").access(b"/tmp", os.F_OK)
0
>>> cts.CDLL("/lib/x86_64-linux-gnu/libc.so.6").access(b"/tmp.notexist", os.F_OK)
-1
Notes:
Instead hardcoding libc.so (LibC)'s path ("/lib/x86_64-linux-gnu/libc.so.6") which may (and most likely, will) vary across systems, None (or the empty string) can be passed to CDLL constructor (ctypes.CDLL(None).access(b"/tmp", os.F_OK)). According to [Man7]: DLOPEN(3):
If filename is NULL, then the returned handle is for the main
program. When given to dlsym(3), this handle causes a search for a
symbol in the main program, followed by all shared objects loaded at
program startup, and then all shared objects loaded by dlopen() with
the flag RTLD_GLOBAL.
Main (current) program (python) is linked against LibC, so its symbols (including access) will be loaded
This has to be handled with care, since functions like main, Py_Main and (all the) others are available; calling them could have disastrous effects (on the current program)
This doesn't also apply to Windows (but that's not such a big deal, since UCRT is located in "%SystemRoot%\System32" which is in %PATH% by default). I wanted to take things further and replicate this behavior on Windows (and submit a patch), but as it turns out, [MS.Learn]: GetProcAddress function (libloaderapi.h) only "sees" exported symbols, so unless someone declares the functions in the main executable as __declspec(dllexport) (why on Earth the common person would do that?), the main program is loadable, but it is pretty much unusable
5. 3rd-party modules with filesystem capabilities
Most likely, will rely on one of the ways above (maybe with slight customizations). One example would be (again, Win specific) [GitHub]: mhammond/pywin32 - Python for Windows (pywin32) Extensions, which is a Python wrapper over WinAPIs.
But, since this is more like a workaround, I'm stopping here.
6. SysAdmin approach
I consider this a (lame) workaround (gainarie): use Python as a wrapper to execute shell commands:
Win:
(py35x64_test) [cfati#CFATI-5510-0:e:\Work\Dev\StackOverflow\q000082831]> "e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os; print(os.system('dir /b \"C:\\Windows\\Temp\" > nul 2>&1'))"
0
(py35x64_test) [cfati#CFATI-5510-0:e:\Work\Dev\StackOverflow\q000082831]> "e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os; print(os.system('dir /b \"C:\\Windows\\Temp.notexist\" > nul 2>&1'))"
1
Nix ([Wikipedia]: Unix-like) - Ubuntu:
[cfati#cfati-5510-0:/mnt/e/Work/Dev/StackOverflow/q000082831]> python3 -c "import os; print(os.system('ls \"/tmp\" > /dev/null 2>&1'))"
0
[cfati#cfati-5510-0:/mnt/e/Work/Dev/StackOverflow/q000082831]> python3 -c "import os; print(os.system('ls \"/tmp.notexist\" > /dev/null 2>&1'))"
512
Bottom line:
Do use try / except / else / finally blocks, because they can prevent you running into a series of nasty problems
A possible counterexample that I can think of, is performance: such blocks are costly, so try not to place them in code that it's supposed to run hundreds of thousands times per second (but since (in most cases) it involves disk access, it won't be the case)
Python 3.4+ has an object-oriented path module: pathlib. Using this new module, you can check whether a file exists like this:
import pathlib
p = pathlib.Path('path/to/file')
if p.is_file(): # or p.is_dir() to see if it is a directory
# do stuff
You can (and usually should) still use a try/except block when opening files:
try:
with p.open() as f:
# do awesome stuff
except OSError:
print('Well darn.')
The pathlib module has lots of cool stuff in it: convenient globbing, checking file's owner, easier path joining, etc. It's worth checking out. If you're on an older Python (version 2.6 or later), you can still install pathlib with pip:
# installs pathlib2 on older Python versions
# the original third-party module, pathlib, is no longer maintained.
pip install pathlib2
Then import it as follows:
# Older Python versions
import pathlib2 as pathlib
This is the simplest way to check if a file exists. Just because the file existed when you checked doesn't guarantee that it will be there when you need to open it.
import os
fname = "foo.txt"
if os.path.isfile(fname):
print("file does exist at this time")
else:
print("no such file exists at this time")
How do I check whether a file exists, using Python, without using a try statement?
Now available since Python 3.4, import and instantiate a Path object with the file name, and check the is_file method (note that this returns True for symlinks pointing to regular files as well):
>>> from pathlib import Path
>>> Path('/').is_file()
False
>>> Path('/initrd.img').is_file()
True
>>> Path('/doesnotexist').is_file()
False
If you're on Python 2, you can backport the pathlib module from pypi, pathlib2, or otherwise check isfile from the os.path module:
>>> import os
>>> os.path.isfile('/')
False
>>> os.path.isfile('/initrd.img')
True
>>> os.path.isfile('/doesnotexist')
False
Now the above is probably the best pragmatic direct answer here, but there's the possibility of a race condition (depending on what you're trying to accomplish), and the fact that the underlying implementation uses a try, but Python uses try everywhere in its implementation.
Because Python uses try everywhere, there's really no reason to avoid an implementation that uses it.
But the rest of this answer attempts to consider these caveats.
Longer, much more pedantic answer
Available since Python 3.4, use the new Path object in pathlib. Note that .exists is not quite right, because directories are not files (except in the unix sense that everything is a file).
>>> from pathlib import Path
>>> root = Path('/')
>>> root.exists()
True
So we need to use is_file:
>>> root.is_file()
False
Here's the help on is_file:
is_file(self)
Whether this path is a regular file (also True for symlinks pointing
to regular files).
So let's get a file that we know is a file:
>>> import tempfile
>>> file = tempfile.NamedTemporaryFile()
>>> filepathobj = Path(file.name)
>>> filepathobj.is_file()
True
>>> filepathobj.exists()
True
By default, NamedTemporaryFile deletes the file when closed (and will automatically close when no more references exist to it).
>>> del file
>>> filepathobj.exists()
False
>>> filepathobj.is_file()
False
If you dig into the implementation, though, you'll see that is_file uses try:
def is_file(self):
"""
Whether this path is a regular file (also True for symlinks pointing
to regular files).
"""
try:
return S_ISREG(self.stat().st_mode)
except OSError as e:
if e.errno not in (ENOENT, ENOTDIR):
raise
# Path doesn't exist or is a broken symlink
# (see https://bitbucket.org/pitrou/pathlib/issue/12/)
return False
Race Conditions: Why we like try
We like try because it avoids race conditions. With try, you simply attempt to read your file, expecting it to be there, and if not, you catch the exception and perform whatever fallback behavior makes sense.
If you want to check that a file exists before you attempt to read it, and you might be deleting it and then you might be using multiple threads or processes, or another program knows about that file and could delete it - you risk the chance of a race condition if you check it exists, because you are then racing to open it before its condition (its existence) changes.
Race conditions are very hard to debug because there's a very small window in which they can cause your program to fail.
But if this is your motivation, you can get the value of a try statement by using the suppress context manager.
Avoiding race conditions without a try statement: suppress
Python 3.4 gives us the suppress context manager (previously the ignore context manager), which does semantically exactly the same thing in fewer lines, while also (at least superficially) meeting the original ask to avoid a try statement:
from contextlib import suppress
from pathlib import Path
Usage:
>>> with suppress(OSError), Path('doesnotexist').open() as f:
... for line in f:
... print(line)
...
>>>
>>> with suppress(OSError):
... Path('doesnotexist').unlink()
...
>>>
For earlier Pythons, you could roll your own suppress, but without a try will be more verbose than with. I do believe this actually is the only answer that doesn't use try at any level in the Python that can be applied to prior to Python 3.4 because it uses a context manager instead:
class suppress(object):
def __init__(self, *exceptions):
self.exceptions = exceptions
def __enter__(self):
return self
def __exit__(self, exc_type, exc_value, traceback):
if exc_type is not None:
return issubclass(exc_type, self.exceptions)
Perhaps easier with a try:
from contextlib import contextmanager
#contextmanager
def suppress(*exceptions):
try:
yield
except exceptions:
pass
Other options that don't meet the ask for "without try":
isfile
import os
os.path.isfile(path)
from the docs:
os.path.isfile(path)
Return True if path is an existing regular file. This follows symbolic
links, so both islink() and isfile() can be true for the same path.
But if you examine the source of this function, you'll see it actually does use a try statement:
# This follows symbolic links, so both islink() and isdir() can be true
# for the same path on systems that support symlinks
def isfile(path):
"""Test whether a path is a regular file"""
try:
st = os.stat(path)
except os.error:
return False
return stat.S_ISREG(st.st_mode)
>>> OSError is os.error
True
All it's doing is using the given path to see if it can get stats on it, catching OSError and then checking if it's a file if it didn't raise the exception.
If you intend to do something with the file, I would suggest directly attempting it with a try-except to avoid a race condition:
try:
with open(path) as f:
f.read()
except OSError:
pass
os.access
Available for Unix and Windows is os.access, but to use you must pass flags, and it does not differentiate between files and directories. This is more used to test if the real invoking user has access in an elevated privilege environment:
import os
os.access(path, os.F_OK)
It also suffers from the same race condition problems as isfile. From the docs:
Note:
Using access() to check if a user is authorized to e.g. open a file
before actually doing so using open() creates a security hole, because
the user might exploit the short time interval between checking and
opening the file to manipulate it. It’s preferable to use EAFP
techniques. For example:
if os.access("myfile", os.R_OK):
with open("myfile") as fp:
return fp.read()
return "some default data"
is better written as:
try:
fp = open("myfile")
except IOError as e:
if e.errno == errno.EACCES:
return "some default data"
# Not a permission error.
raise
else:
with fp:
return fp.read()
Avoid using os.access. It is a low level function that has more opportunities for user error than the higher level objects and functions discussed above.
Criticism of another answer:
Another answer says this about os.access:
Personally, I prefer this one because under the hood, it calls native APIs (via "${PYTHON_SRC_DIR}/Modules/posixmodule.c"), but it also opens a gate for possible user errors, and it's not as Pythonic as other variants:
This answer says it prefers a non-Pythonic, error-prone method, with no justification. It seems to encourage users to use low-level APIs without understanding them.
It also creates a context manager which, by unconditionally returning True, allows all Exceptions (including KeyboardInterrupt and SystemExit!) to pass silently, which is a good way to hide bugs.
This seems to encourage users to adopt poor practices.
Prefer the try statement. It's considered better style and avoids race conditions.
Don't take my word for it. There's plenty of support for this theory. Here's a couple:
Style: Section "Handling unusual conditions" of these course notes for Software Design (2007)
Avoiding Race Conditions
Use:
import os
#Your path here e.g. "C:\Program Files\text.txt"
#For access purposes: "C:\\Program Files\\text.txt"
if os.path.exists("C:\..."):
print "File found!"
else:
print "File not found!"
Importing os makes it easier to navigate and perform standard actions with your operating system.
For reference, also see How do I check whether a file exists without exceptions?.
If you need high-level operations, use shutil.
Testing for files and folders with os.path.isfile(), os.path.isdir() and os.path.exists()
Assuming that the "path" is a valid path, this table shows what is returned by each function for files and folders:
You can also test if a file is a certain type of file using os.path.splitext() to get the extension (if you don't already know it)
>>> import os
>>> path = "path to a word document"
>>> os.path.isfile(path)
True
>>> os.path.splitext(path)[1] == ".docx" # test if the extension is .docx
True
TL;DR
The answer is: use the pathlib module
Pathlib is probably the most modern and convenient way for almost all of the file operations. For the existence of a file or a folder a single line of code is enough. If file is not exists, it will not throw any exception.
from pathlib import Path
if Path("myfile.txt").exists(): # works for both file and folders
# do your cool stuff...
The pathlib module was introduced in Python 3.4, so you need to have Python 3.4+. This library makes your life much easier while working with files and folders, and it is pretty to use. Here is more documentation about it: pathlib — Object-oriented filesystem paths.
BTW, if you are going to reuse the path, then it is better to assign it to a variable.
So it will become:
from pathlib import Path
p = Path("loc/of/myfile.txt")
if p.exists(): # works for both file and folders
# do stuffs...
#reuse 'p' if needed.
In 2016 the best way is still using os.path.isfile:
>>> os.path.isfile('/path/to/some/file.txt')
Or in Python 3 you can use pathlib:
import pathlib
path = pathlib.Path('/path/to/some/file.txt')
if path.is_file():
...
It doesn't seem like there's a meaningful functional difference between try/except and isfile(), so you should use which one makes sense.
If you want to read a file, if it exists, do
try:
f = open(filepath)
except IOError:
print 'Oh dear.'
But if you just wanted to rename a file if it exists, and therefore don't need to open it, do
if os.path.isfile(filepath):
os.rename(filepath, filepath + '.old')
If you want to write to a file, if it doesn't exist, do
# Python 2
if not os.path.isfile(filepath):
f = open(filepath, 'w')
# Python 3: x opens for exclusive creation, failing if the file already exists
try:
f = open(filepath, 'wx')
except IOError:
print 'file already exists'
If you need file locking, that's a different matter.
You could try this (safer):
try:
# http://effbot.org/zone/python-with-statement.htm
# 'with' is safer to open a file
with open('whatever.txt') as fh:
# Do something with 'fh'
except IOError as e:
print("({})".format(e))
The ouput would be:
([Errno 2] No such file or directory:
'whatever.txt')
Then, depending on the result, your program can just keep running from there or you can code to stop it if you want.
Date: 2017-12-04
Every possible solution has been listed in other answers.
An intuitive and arguable way to check if a file exists is the following:
import os
os.path.isfile('~/file.md') # Returns True if exists, else False
# Additionally, check a directory
os.path.isdir('~/folder') # Returns True if the folder exists, else False
# Check either a directory or a file
os.path.exists('~/file')
I made an exhaustive cheat sheet for your reference:
# os.path methods in exhaustive cheat sheet
{'definition': ['dirname',
'basename',
'abspath',
'relpath',
'commonpath',
'normpath',
'realpath'],
'operation': ['split', 'splitdrive', 'splitext',
'join', 'normcase'],
'compare': ['samefile', 'sameopenfile', 'samestat'],
'condition': ['isdir',
'isfile',
'exists',
'lexists'
'islink',
'isabs',
'ismount',],
'expand': ['expanduser',
'expandvars'],
'stat': ['getatime', 'getctime', 'getmtime',
'getsize']}
Although I always recommend using try and except statements, here are a few possibilities for you (my personal favourite is using os.access):
Try opening the file:
Opening the file will always verify the existence of the file. You can make a function just like so:
def File_Existence(filepath):
f = open(filepath)
return True
If it's False, it will stop execution with an unhanded IOError
or OSError in later versions of Python. To catch the exception,
you have to use a try except clause. Of course, you can always
use a try except` statement like so (thanks to hsandt
for making me think):
def File_Existence(filepath):
try:
f = open(filepath)
except IOError, OSError: # Note OSError is for later versions of Python
return False
return True
Use os.path.exists(path):
This will check the existence of what you specify. However, it checks for files and directories so beware about how you use it.
import os.path
>>> os.path.exists("this/is/a/directory")
True
>>> os.path.exists("this/is/a/file.txt")
True
>>> os.path.exists("not/a/directory")
False
Use os.access(path, mode):
This will check whether you have access to the file. It will check for permissions. Based on the os.py documentation, typing in os.F_OK, it will check the existence of the path. However, using this will create a security hole, as someone can attack your file using the time between checking the permissions and opening the file. You should instead go directly to opening the file instead of checking its permissions. (EAFP vs LBYP). If you're not going to open the file afterwards, and only checking its existence, then you can use this.
Anyway, here:
>>> import os
>>> os.access("/is/a/file.txt", os.F_OK)
True
I should also mention that there are two ways that you will not be able to verify the existence of a file. Either the issue will be permission denied or no such file or directory. If you catch an IOError, set the IOError as e (like my first option), and then type in print(e.args) so that you can hopefully determine your issue. I hope it helps! :)
If the file is for opening you could use one of the following techniques:
with open('somefile', 'xt') as f: # Using the x-flag, Python 3.3 and above
f.write('Hello\n')
if not os.path.exists('somefile'):
with open('somefile', 'wt') as f:
f.write("Hello\n")
else:
print('File already exists!')
Note: This finds either a file or a directory with the given name.
Additionally, os.access():
if os.access("myfile", os.R_OK):
with open("myfile") as fp:
return fp.read()
Being R_OK, W_OK, and X_OK the flags to test for permissions (doc).
if os.path.isfile(path_to_file):
try:
open(path_to_file)
pass
except IOError as e:
print "Unable to open file"
Raising exceptions is considered to be an acceptable, and Pythonic,
approach for flow control in your program. Consider handling missing
files with IOErrors. In this situation, an IOError exception will be
raised if the file exists but the user does not have read permissions.
Source: Using Python: How To Check If A File Exists
If you imported NumPy already for other purposes then there is no need to import other libraries like pathlib, os, paths, etc.
import numpy as np
np.DataSource().exists("path/to/your/file")
This will return true or false based on its existence.
Check file or directory exists
You can follow these three ways:
1. Using isfile()
Note 1: The os.path.isfile used only for files
import os.path
os.path.isfile(filename) # True if file exists
os.path.isfile(dirname) # False if directory exists
2. Using exists
Note 2: The os.path.exists is used for both files and directories
import os.path
os.path.exists(filename) # True if file exists
os.path.exists(dirname) # True if directory exists
3. The pathlib.Path method (included in Python 3+, installable with pip for Python 2)
from pathlib import Path
Path(filename).exists()
You can write Brian's suggestion without the try:.
from contextlib import suppress
with suppress(IOError), open('filename'):
process()
suppress is part of Python 3.4. In older releases you can quickly write your own suppress:
from contextlib import contextmanager
#contextmanager
def suppress(*exceptions):
try:
yield
except exceptions:
pass
I'm the author of a package that's been around for about 10 years, and it has a function that addresses this question directly. Basically, if you are on a non-Windows system, it uses Popen to access find. However, if you are on Windows, it replicates find with an efficient filesystem walker.
The code itself does not use a try block… except in determining the operating system and thus steering you to the "Unix"-style find or the hand-buillt find. Timing tests showed that the try was faster in determining the OS, so I did use one there (but nowhere else).
>>> import pox
>>> pox.find('*python*', type='file', root=pox.homedir(), recurse=False)
['/Users/mmckerns/.python']
And the doc…
>>> print pox.find.__doc__
find(patterns[,root,recurse,type]); Get path to a file or directory
patterns: name or partial name string of items to search for
root: path string of top-level directory to search
recurse: if True, recurse down from root directory
type: item filter; one of {None, file, dir, link, socket, block, char}
verbose: if True, be a little verbose about the search
On some OS, recursion can be specified by recursion depth (an integer).
patterns can be specified with basic pattern matching. Additionally,
multiple patterns can be specified by splitting patterns with a ';'
For example:
>>> find('pox*', root='..')
['/Users/foo/pox/pox', '/Users/foo/pox/scripts/pox_launcher.py']
>>> find('*shutils*;*init*')
['/Users/foo/pox/pox/shutils.py', '/Users/foo/pox/pox/__init__.py']
>>>
The implementation, if you care to look, is here:
https://github.com/uqfoundation/pox/blob/89f90fb308f285ca7a62eabe2c38acb87e89dad9/pox/shutils.py#L190
Adding one more slight variation which isn't exactly reflected in the other answers.
This will handle the case of the file_path being None or empty string.
def file_exists(file_path):
if not file_path:
return False
elif not os.path.isfile(file_path):
return False
else:
return True
Adding a variant based on suggestion from Shahbaz
def file_exists(file_path):
if not file_path:
return False
else:
return os.path.isfile(file_path)
Adding a variant based on suggestion from Peter Wood
def file_exists(file_path):
return file_path and os.path.isfile(file_path):
Here's a one-line Python command for the Linux command line environment. I find this very handy since I'm not such a hot Bash guy.
python -c "import os.path; print os.path.isfile('/path_to/file.xxx')"
You can use the "OS" library of Python:
>>> import os
>>> os.path.exists("C:\\Users\\####\\Desktop\\test.txt")
True
>>> os.path.exists("C:\\Users\\####\\Desktop\\test.tx")
False
How do I check whether a file exists, without using the try statement?
In 2016, this is still arguably the easiest way to check if both a file exists and if it is a file:
import os
os.path.isfile('./file.txt') # Returns True if exists, else False
isfile is actually just a helper method that internally uses os.stat and stat.S_ISREG(mode) underneath. This os.stat is a lower-level method that will provide you with detailed information about files, directories, sockets, buffers, and more. More about os.stat here
Note: However, this approach will not lock the file in any way and therefore your code can become vulnerable to "time of check to time of use" (TOCTTOU) bugs.
So raising exceptions is considered to be an acceptable, and Pythonic, approach for flow control in your program. And one should consider handling missing files with IOErrors, rather than if statements (just an advice).
I want create a symlink, overwriting an existing file or symlink if needed.
I've discovered that os.path.exists only returns True for non-broken symlinks, so I'm guessing that any test must also include os.path.lexists.
What is the most atomic way to implement ln -sf in python? (Ie, preventing a file being created by another process between deletion and symlink creation)
Differentiation: This question doesn't specify the atomic requirement
This code tries to minimise the possibilities for race conditions:
import os
import tempfile
def symlink_force(target, link_name):
'''
Create a symbolic link link_name pointing to target.
Overwrites link_name if it exists.
'''
# os.replace() may fail if files are on different filesystems
link_dir = os.path.dirname(link_name)
while True:
temp_link_name = tempfile.mktemp(dir=link_dir)
try:
os.symlink(target, temp_link_name)
break
except FileExistsError:
pass
try:
os.replace(temp_link_name, link_name)
except OSError: # e.g. permission denied
os.remove(temp_link_name)
raise
Note:
If the function is interrupted (e.g. computer crashes), an additional random link to the target might exist.
An unlikely race condition still remains: the symlink created at the randomly-named temp_link_name could be modified by another process before replacing link_name.
I raised a python issue to highlight the issues of os.symlink() requiring the target not exist.
Credit to Robert Seimer's input.
This may be an open ended or awkward question, but I find myself running into more and more exception handling concerns where I do not know the "best" approach in handling them.
Python's logging module raises an IOError if you try to configure a FileHandler with a file that does not exist. The module does not handle this exception, but simply raises it. Often times, it is that the path to the file does not exist (and therefore the file does not exist), so we must create the directories along the path if we want to handle the exception and continue.
I want my application to properly handle this error, as every user has asked why we don't make the proper directory for them.
The way I have decided to handle this can be seen below.
done = False
while not done:
try:
# Configure logging based on a config file
# if a filehandler's full path to file does not exist, it raises an IOError
logging.config.fileConfig(filename)
except IOError as e:
if e.args[0] == 2 and e.filename:
# If we catch the IOError, we can see if it is a "does not exist" error
# and try to recover by making the directories
print "Most likely the full path to the file does not exist, so we can try and make it"
fp = e.filename[:e.rfind("/")]
# See http://stackoverflow.com/questions/273192/python-best-way-to-create-directory-if-it-doesnt-exist-for-file-write#273208 for why I don't just leap
if not os.path.exists(fp):
os.makedirs(fp)
else:
print "Most likely some other error...let's just reraise for now"
raise
else:
done = True
I need to loop (or recurse I suppose) since there is N FileHandlers that need to be configured and therefore N IOErrors that need to be raised and corrected for this scenario.
Is this the proper way to do this? Is there a better, more Pythonic way, that I don't know of or may not understand?
This is not something specific to the logging module: in general, Python code does not automatically create intermediate directories for you automatically; you need to do this explicitly using os.makedirs(), typically like this:
if not os.path.exists(dirname):
os.makedirs(dirname)
You can replace the standard FileHandler provided by logging with a subclass which does the checks you need and, when necessary, creates the directory for the logging file using os.makedirs(). Then you can specify this handler in the config file instead of the standard handler.
Assuming it only needs to be done once at the beginning of your app's execution, I would just os.makedirs() all the needed directories without checking for their existence first or even waiting for the logging module to raise an error. If you then you get an error trying to start a logger, you can just handle it the way you likely already did: print an error, disable the logger. You went above and beyond just by trying to create the directory. If the user gave you bogus information, you're no worse off than you are now, and you're better in the vast majority of cases.
I am writing a file using Python, and I want it to be placed in a specific path. How can I safely make sure that the path exists?
That is: how can I check whether the folder exists, along with its parents? If there are missing folders along the path, how can I create them?
On Python ≥ 3.5, use pathlib.Path.mkdir:
from pathlib import Path
Path("/my/directory").mkdir(parents=True, exist_ok=True)
For older versions of Python, I see two answers with good qualities, each with a small flaw, so I will give my take on it:
Try os.path.exists, and consider os.makedirs for the creation.
import os
if not os.path.exists(directory):
os.makedirs(directory)
As noted in comments and elsewhere, there's a race condition – if the directory is created between the os.path.exists and the os.makedirs calls, the os.makedirs will fail with an OSError. Unfortunately, blanket-catching OSError and continuing is not foolproof, as it will ignore a failure to create the directory due to other factors, such as insufficient permissions, full disk, etc.
One option would be to trap the OSError and examine the embedded error code (see Is there a cross-platform way of getting information from Python’s OSError):
import os, errno
try:
os.makedirs(directory)
except OSError as e:
if e.errno != errno.EEXIST:
raise
Alternatively, there could be a second os.path.exists, but suppose another created the directory after the first check, then removed it before the second one – we could still be fooled.
Depending on the application, the danger of concurrent operations may be more or less than the danger posed by other factors such as file permissions. The developer would have to know more about the particular application being developed and its expected environment before choosing an implementation.
Modern versions of Python improve this code quite a bit, both by exposing FileExistsError (in 3.3+)...
try:
os.makedirs("path/to/directory")
except FileExistsError:
# directory already exists
pass
...and by allowing a keyword argument to os.makedirs called exist_ok (in 3.2+).
os.makedirs("path/to/directory", exist_ok=True) # succeeds even if directory exists.
Python 3.5+:
import pathlib
pathlib.Path('/my/directory').mkdir(parents=True, exist_ok=True)
pathlib.Path.mkdir as used above recursively creates the directory and does not raise an exception if the directory already exists. If you don't need or want the parents to be created, skip the parents argument.
Python 3.2+:
Using pathlib:
If you can, install the current pathlib backport named pathlib2. Do not install the older unmaintained backport named pathlib. Next, refer to the Python 3.5+ section above and use it the same.
If using Python 3.4, even though it comes with pathlib, it is missing the useful exist_ok option. The backport is intended to offer a newer and superior implementation of mkdir which includes this missing option.
Using os:
import os
os.makedirs(path, exist_ok=True)
os.makedirs as used above recursively creates the directory and does not raise an exception if the directory already exists. It has the optional exist_ok argument only if using Python 3.2+, with a default value of False. This argument does not exist in Python 2.x up to 2.7. As such, there is no need for manual exception handling as with Python 2.7.
Python 2.7+:
Using pathlib:
If you can, install the current pathlib backport named pathlib2. Do not install the older unmaintained backport named pathlib. Next, refer to the Python 3.5+ section above and use it the same.
Using os:
import os
try:
os.makedirs(path)
except OSError:
if not os.path.isdir(path):
raise
While a naive solution may first use os.path.isdir followed by os.makedirs, the solution above reverses the order of the two operations. In doing so, it prevents a common race condition having to do with a duplicated attempt at creating the directory, and also disambiguates files from directories.
Note that capturing the exception and using errno is of limited usefulness because OSError: [Errno 17] File exists, i.e. errno.EEXIST, is raised for both files and directories. It is more reliable simply to check if the directory exists.
Alternative:
mkpath creates the nested directory, and does nothing if the directory already exists. This works in both Python 2 and 3. Note however that distutils has been deprecated, and is scheduled for removal in Python 3.12.
import distutils.dir_util
distutils.dir_util.mkpath(path)
Per Bug 10948, a severe limitation of this alternative is that it works only once per python process for a given path. In other words, if you use it to create a directory, then delete the directory from inside or outside Python, then use mkpath again to recreate the same directory, mkpath will simply silently use its invalid cached info of having previously created the directory, and will not actually make the directory again. In contrast, os.makedirs doesn't rely on any such cache. This limitation may be okay for some applications.
With regard to the directory's mode, please refer to the documentation if you care about it.
Using try except and the right error code from errno module gets rid of the race condition and is cross-platform:
import os
import errno
def make_sure_path_exists(path):
try:
os.makedirs(path)
except OSError as exception:
if exception.errno != errno.EEXIST:
raise
In other words, we try to create the directories, but if they already exist we ignore the error. On the other hand, any other error gets reported. For example, if you create dir 'a' beforehand and remove all permissions from it, you will get an OSError raised with errno.EACCES (Permission denied, error 13).
Starting from Python 3.5, pathlib.Path.mkdir has an exist_ok flag:
from pathlib import Path
path = Path('/my/directory/filename.txt')
path.parent.mkdir(parents=True, exist_ok=True)
# path.parent ~ os.path.dirname(path)
This recursively creates the directory and does not raise an exception if the directory already exists.
(just as os.makedirs got an exist_ok flag starting from python 3.2 e.g os.makedirs(path, exist_ok=True))
Note: when i posted this answer none of the other answers mentioned exist_ok...
I would personally recommend that you use os.path.isdir() to test instead of os.path.exists().
>>> os.path.exists('/tmp/dirname')
True
>>> os.path.exists('/tmp/dirname/filename.etc')
True
>>> os.path.isdir('/tmp/dirname/filename.etc')
False
>>> os.path.isdir('/tmp/fakedirname')
False
If you have:
>>> directory = raw_input(":: ")
And a foolish user input:
:: /tmp/dirname/filename.etc
... You're going to end up with a directory named filename.etc when you pass that argument to os.makedirs() if you test with os.path.exists().
Check os.makedirs: (It makes sure the complete path exists.)
To handle the fact the directory might exist, catch OSError.
(If exist_ok is False (the default), an OSError is raised if the target directory already exists.)
import os
try:
os.makedirs('./path/to/somewhere')
except OSError:
pass
Try the os.path.exists function
if not os.path.exists(dir):
os.mkdir(dir)
Insights on the specifics of this situation
You give a particular file at a certain path and you pull the directory from the file path. Then after making sure you have the directory, you attempt to open a file for reading. To comment on this code:
filename = "/my/directory/filename.txt"
dir = os.path.dirname(filename)
We want to avoid overwriting the builtin function, dir. Also, filepath or perhaps fullfilepath is probably a better semantic name than filename so this would be better written:
import os
filepath = '/my/directory/filename.txt'
directory = os.path.dirname(filepath)
Your end goal is to open this file, you initially state, for writing, but you're essentially approaching this goal (based on your code) like this, which opens the file for reading:
if not os.path.exists(directory):
os.makedirs(directory)
f = file(filename)
Assuming opening for reading
Why would you make a directory for a file that you expect to be there and be able to read?
Just attempt to open the file.
with open(filepath) as my_file:
do_stuff(my_file)
If the directory or file isn't there, you'll get an IOError with an associated error number: errno.ENOENT will point to the correct error number regardless of your platform. You can catch it if you want, for example:
import errno
try:
with open(filepath) as my_file:
do_stuff(my_file)
except IOError as error:
if error.errno == errno.ENOENT:
print 'ignoring error because directory or file is not there'
else:
raise
Assuming we're opening for writing
This is probably what you're wanting.
In this case, we probably aren't facing any race conditions. So just do as you were, but note that for writing, you need to open with the w mode (or a to append). It's also a Python best practice to use the context manager for opening files.
import os
if not os.path.exists(directory):
os.makedirs(directory)
with open(filepath, 'w') as my_file:
do_stuff(my_file)
However, say we have several Python processes that attempt to put all their data into the same directory. Then we may have contention over creation of the directory. In that case it's best to wrap the makedirs call in a try-except block.
import os
import errno
if not os.path.exists(directory):
try:
os.makedirs(directory)
except OSError as error:
if error.errno != errno.EEXIST:
raise
with open(filepath, 'w') as my_file:
do_stuff(my_file)
I have put the following down. It's not totally foolproof though.
import os
dirname = 'create/me'
try:
os.makedirs(dirname)
except OSError:
if os.path.exists(dirname):
# We are nearly safe
pass
else:
# There was an error on creation, so make sure we know about it
raise
Now as I say, this is not really foolproof, because we have the possiblity of failing to create the directory, and another process creating it during that period.
Check if a directory exists and create it if necessary?
The direct answer to this is, assuming a simple situation where you don't expect other users or processes to be messing with your directory:
if not os.path.exists(d):
os.makedirs(d)
or if making the directory is subject to race conditions (i.e. if after checking the path exists, something else may have already made it) do this:
import errno
try:
os.makedirs(d)
except OSError as exception:
if exception.errno != errno.EEXIST:
raise
But perhaps an even better approach is to sidestep the resource contention issue, by using temporary directories via tempfile:
import tempfile
d = tempfile.mkdtemp()
Here's the essentials from the online doc:
mkdtemp(suffix='', prefix='tmp', dir=None)
User-callable function to create and return a unique temporary
directory. The return value is the pathname of the directory.
The directory is readable, writable, and searchable only by the
creating user.
Caller is responsible for deleting the directory when done with it.
New in Python 3.5: pathlib.Path with exist_ok
There's a new Path object (as of 3.4) with lots of methods one would want to use with paths - one of which is mkdir.
(For context, I'm tracking my weekly rep with a script. Here's the relevant parts of code from the script that allow me to avoid hitting Stack Overflow more than once a day for the same data.)
First the relevant imports:
from pathlib import Path
import tempfile
We don't have to deal with os.path.join now - just join path parts with a /:
directory = Path(tempfile.gettempdir()) / 'sodata'
Then I idempotently ensure the directory exists - the exist_ok argument shows up in Python 3.5:
directory.mkdir(exist_ok=True)
Here's the relevant part of the documentation:
If exist_ok is true, FileExistsError exceptions will be ignored (same behavior as the POSIX mkdir -p command), but only if the last path component is not an existing non-directory file.
Here's a little more of the script - in my case, I'm not subject to a race condition, I only have one process that expects the directory (or contained files) to be there, and I don't have anything trying to remove the directory.
todays_file = directory / str(datetime.datetime.utcnow().date())
if todays_file.exists():
logger.info("todays_file exists: " + str(todays_file))
df = pd.read_json(str(todays_file))
Path objects have to be coerced to str before other APIs that expect str paths can use them.
Perhaps Pandas should be updated to accept instances of the abstract base class, os.PathLike.
fastest safest way to do it is:
it will create if not exists and skip if exists:
from pathlib import Path
Path("path/with/childs/.../").mkdir(parents=True, exist_ok=True)
Best way to do this in python
#Devil
import os
directory = "./out_dir/subdir1/subdir2"
if not os.path.exists(directory):
os.makedirs(directory)
In Python 3.4 you can also use the brand new pathlib module:
from pathlib import Path
path = Path("/my/directory/filename.txt")
try:
if not path.parent.exists():
path.parent.mkdir(parents=True)
except OSError:
# handle error; you can also catch specific errors like
# FileExistsError and so on.
For a one-liner solution, you can use IPython.utils.path.ensure_dir_exists():
from IPython.utils.path import ensure_dir_exists
ensure_dir_exists(dir)
From the documentation: Ensure that a directory exists. If it doesn’t exist, try to create it and protect against a race condition if another process is doing the same.
IPython is an extension package, not part of the standard library.
In Python3, os.makedirs supports setting exist_ok. The default setting is False, which means an OSError will be raised if the target directory already exists. By setting exist_ok to True, OSError (directory exists) will be ignored and the directory will not be created.
os.makedirs(path,exist_ok=True)
In Python2, os.makedirs doesn't support setting exist_ok. You can use the approach in heikki-toivonen's answer:
import os
import errno
def make_sure_path_exists(path):
try:
os.makedirs(path)
except OSError as exception:
if exception.errno != errno.EEXIST:
raise
The relevant Python documentation suggests the use of the EAFP coding style (Easier to Ask for Forgiveness than Permission). This means that the code
try:
os.makedirs(path)
except OSError as exception:
if exception.errno != errno.EEXIST:
raise
else:
print "\nBE CAREFUL! Directory %s already exists." % path
is better than the alternative
if not os.path.exists(path):
os.makedirs(path)
else:
print "\nBE CAREFUL! Directory %s already exists." % path
The documentation suggests this exactly because of the race condition discussed in this question. In addition, as others mention here, there is a performance advantage in querying once instead of twice the OS. Finally, the argument placed forward, potentially, in favour of the second code in some cases --when the developer knows the environment the application is running-- can only be advocated in the special case that the program has set up a private environment for itself (and other instances of the same program).
Even in that case, this is a bad practice and can lead to long useless debugging. For example, the fact we set the permissions for a directory should not leave us with the impression permissions are set appropriately for our purposes. A parent directory could be mounted with other permissions. In general, a program should always work correctly and the programmer should not expect one specific environment.
I found this Q/A after I was puzzled by some of the failures and errors I was getting while working with directories in Python. I am working in Python 3 (v.3.5 in an Anaconda virtual environment on an Arch Linux x86_64 system).
Consider this directory structure:
└── output/ ## dir
├── corpus ## file
├── corpus2/ ## dir
└── subdir/ ## dir
Here are my experiments/notes, which provides clarification:
# ----------------------------------------------------------------------------
# [1] https://stackoverflow.com/questions/273192/how-can-i-create-a-directory-if-it-does-not-exist
import pathlib
""" Notes:
1. Include a trailing slash at the end of the directory path
("Method 1," below).
2. If a subdirectory in your intended path matches an existing file
with same name, you will get the following error:
"NotADirectoryError: [Errno 20] Not a directory:" ...
"""
# Uncomment and try each of these "out_dir" paths, singly:
# ----------------------------------------------------------------------------
# METHOD 1:
# Re-running does not overwrite existing directories and files; no errors.
# out_dir = 'output/corpus3' ## no error but no dir created (missing tailing /)
# out_dir = 'output/corpus3/' ## works
# out_dir = 'output/corpus3/doc1' ## no error but no dir created (missing tailing /)
# out_dir = 'output/corpus3/doc1/' ## works
# out_dir = 'output/corpus3/doc1/doc.txt' ## no error but no file created (os.makedirs creates dir, not files! ;-)
# out_dir = 'output/corpus2/tfidf/' ## fails with "Errno 20" (existing file named "corpus2")
# out_dir = 'output/corpus3/tfidf/' ## works
# out_dir = 'output/corpus3/a/b/c/d/' ## works
# [2] https://docs.python.org/3/library/os.html#os.makedirs
# Uncomment these to run "Method 1":
#directory = os.path.dirname(out_dir)
#os.makedirs(directory, mode=0o777, exist_ok=True)
# ----------------------------------------------------------------------------
# METHOD 2:
# Re-running does not overwrite existing directories and files; no errors.
# out_dir = 'output/corpus3' ## works
# out_dir = 'output/corpus3/' ## works
# out_dir = 'output/corpus3/doc1' ## works
# out_dir = 'output/corpus3/doc1/' ## works
# out_dir = 'output/corpus3/doc1/doc.txt' ## no error but creates a .../doc.txt./ dir
# out_dir = 'output/corpus2/tfidf/' ## fails with "Errno 20" (existing file named "corpus2")
# out_dir = 'output/corpus3/tfidf/' ## works
# out_dir = 'output/corpus3/a/b/c/d/' ## works
# Uncomment these to run "Method 2":
#import os, errno
#try:
# os.makedirs(out_dir)
#except OSError as e:
# if e.errno != errno.EEXIST:
# raise
# ----------------------------------------------------------------------------
Conclusion: in my opinion, "Method 2" is more robust.
[1] How can I safely create a nested directory?
[2] https://docs.python.org/3/library/os.html#os.makedirs
You can use mkpath
# Create a directory and any missing ancestor directories.
# If the directory already exists, do nothing.
from distutils.dir_util import mkpath
mkpath("test")
Note that it will create the ancestor directories as well.
It works for Python 2 and 3.
In case you're writing a file to a variable path, you can use this on the file's path to make sure that the parent directories are created.
from pathlib import Path
path_to_file = Path("zero/or/more/directories/file.ext")
parent_directory_of_file = path_to_file.parent
parent_directory_of_file.mkdir(parents=True, exist_ok=True)
Works even if path_to_file is file.ext (zero directories deep).
See pathlib.PurePath.parent and pathlib.Path.mkdir.
Why not use subprocess module if running on a machine that supports command
mkdir with -p option ?
Works on python 2.7 and python 3.6
from subprocess import call
call(['mkdir', '-p', 'path1/path2/path3'])
Should do the trick on most systems.
In situations where portability doesn't matter (ex, using docker) the solution is a clean 2 lines. You also don't have to add logic to check if directories exist or not. Finally, it is safe to re-run without any side effects
If you need error handling:
from subprocess import check_call
try:
check_call(['mkdir', '-p', 'path1/path2/path3'])
except:
handle...
You have to set the full path before creating the directory:
import os,sys,inspect
import pathlib
currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
your_folder = currentdir + "/" + "your_folder"
if not os.path.exists(your_folder):
pathlib.Path(your_folder).mkdir(parents=True, exist_ok=True)
This works for me and hopefully, it will works for you as well
I saw Heikki Toivonen and A-B-B's answers and thought of this variation.
import os
import errno
def make_sure_path_exists(path):
try:
os.makedirs(path)
except OSError as exception:
if exception.errno != errno.EEXIST or not os.path.isdir(path):
raise
I use os.path.exists(), here is a Python 3 script that can be used to check if a directory exists, create one if it does not exist, and delete it if it does exist (if desired).
It prompts users for input of the directory and can be easily modified.
Use this command check and create dir
if not os.path.isdir(test_img_dir):
os.mkdir(test_img_dir)
Call the function create_dir() at the entry point of your program/project.
import os
def create_dir(directory):
if not os.path.exists(directory):
print('Creating Directory '+directory)
os.makedirs(directory)
create_dir('Project directory')
If you consider the following:
os.path.isdir('/tmp/dirname')
means a directory (path) exists AND is a directory. So for me this way does what I need. So I can make sure it is folder (not a file) and exists.
You can use os.listdir for this:
import os
if 'dirName' in os.listdir('parentFolderPath')
print('Directory Exists')
This may not exactly answer the question. But I guess your real intention is to create a file and its parent directories, given its content all in 1 command.
You can do that with fastcore extension to pathlib: path.mk_write(data)
from fastcore.utils import Path
Path('/dir/to/file.txt').mk_write('Hello World')
See more in fastcore documentation