How to skip files being used in another program? [duplicate] - python

In my application I have the following requirements:
1. One thread regularly records logs in a file. The log file is rolled over at certain intervals, to keep the individual log files small.
2. A second thread regularly processes these log files: it moves them to another place, parses the content to generate log reports, and so on.
However, the second thread must not process a log file that is currently being used to record logs. On the code side, the pseudocode looks like this:
# code in the second thread to process the log files
for logFile in os.listdir(logFolder):
    if not (file_is_open(logFile) or file_is_in_use(logFile)):
        ProcessLogFile(logFile)  # move the log file elsewhere, generate a log report, ...
So, how do I check whether a file is already open or is in use by another process?
I did some research on the internet and found these approaches:
try:
    myfile = open(filename, "r+")  # or "a+", whatever you need
except IOError:
    print("Could not open file! Please close Excel!")
I tried this code, but it doesn't work, no matter whether I use the "r+" or "a+" flag.
try:
    os.remove(filename)  # try to remove it directly
except OSError as e:
    if e.errno == errno.ENOENT:  # file doesn't exist
        break
This code works, but it doesn't meet my requirement, since I don't want to delete the file just to check whether it is open.

An issue with trying to find out if a file is being used by another process is the possibility of a race condition. You could check a file, decide that it is not in use, then just before you open it another process (or thread) leaps in and grabs it (or even deletes it).
OK, let's say you decide to live with that possibility and hope it does not occur. Checking which files are in use by other processes is operating-system dependent.
On Linux it is fairly easy: just iterate through the PIDs in /proc. Here is a generator that iterates over the files in use for a specific PID:
import os
import re

def iterate_fds(pid):
    dir = '/proc/' + str(pid) + '/fd'
    if not os.access(dir, os.R_OK | os.X_OK):
        return
    for fd in os.listdir(dir):
        full_name = os.path.join(dir, fd)
        try:
            file = os.readlink(full_name)
            if file == '/dev/null' or \
               re.match(r'pipe:\[\d+\]', file) or \
               re.match(r'socket:\[\d+\]', file):
                file = None
        except OSError as err:
            if err.errno == 2:   # ENOENT: the fd was closed in the meantime
                file = None
            else:
                raise
        yield (fd, file)
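For example, here is a small usage sketch (my addition, not part of the original answer) that scans every numeric PID directory under /proc and reports whether a particular file is open anywhere; the helper name is_open_anywhere is made up for illustration:
def is_open_anywhere(target_path):
    # every numeric entry in /proc corresponds to a running PID
    for pid in filter(str.isdigit, os.listdir('/proc')):
        for fd, file in iterate_fds(int(pid)):
            if file == target_path:
                return True
    return False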
On Windows it is not quite so straightforward, as the APIs are not published. There is a Sysinternals tool (handle.exe) that can be used, but I recommend the PyPI module psutil, which is portable (i.e., it runs on Linux as well, and probably on other OSes):
import psutil

for proc in psutil.process_iter():
    try:
        # this returns the list of files opened by the current process
        flist = proc.open_files()
        if flist:
            print(proc.pid, proc.name())
            for nt in flist:
                print("\t", nt.path)
    # This catches a race condition where a process ends
    # before we can examine its files
    except psutil.NoSuchProcess as err:
        print("****", err)

I like Daniel's answer, but for Windows users, I realized that it's safer and simpler to rename the file to the name it already has. That solves the problems brought up in the comments to his answer. Here's the code:
import os

f = 'C:/test.xlsx'
if os.path.exists(f):
    try:
        os.rename(f, f)
        print('Access on file "' + f + '" is available!')
    except OSError as e:
        print('Access-error on file "' + f + '"! \n' + str(e))

You can check if a file has a handle on it using the following function (remember to pass the full path to that file):
import psutil

def has_handle(fpath):
    for proc in psutil.process_iter():
        try:
            for item in proc.open_files():
                if fpath == item.path:
                    return True
        except Exception:
            pass
    return False

I know I'm late to the party, but I also had this problem and I used the lsof command to solve it (which I think is different from the approaches mentioned above). With lsof we can basically check for the processes that are using this particular file.
Here is how I did it:
from subprocess import check_output, Popen, PIPE

try:
    lsout = Popen(['lsof', filename], stdout=PIPE, shell=False)
    check_output(["grep", filename], stdin=lsout.stdout, shell=False)
except:
    # check_output raises an exception here if it doesn't find any process using that file
    pass
Just write your log-processing code in the except part and you are good to go.

Instead of using os.remove(), you may use the following workaround on Windows:
import os

file = "D:\\temp\\test.pdf"
if os.path.exists(file):
    try:
        os.rename(file, file + "_")
        print("Access on file \"" + str(file) + "\" is available!")
        os.rename(file + "_", file)
    except OSError as e:
        message = "Access-error on file \"" + str(file) + "\"!!! \n" + str(e)
        print(message)

You can use inotify to watch for activity in the file system. You can watch for file-close events, which indicate that a roll-over has just happened. You should also add an additional condition on file size. Make sure you filter out the file-close events generated by the second thread itself.
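For illustration only, here is a minimal sketch I am adding (assuming the PyPI inotify package and the logFolder / ProcessLogFile names from the question, which are hypothetical here): the second thread waits for IN_CLOSE_WRITE events and only then processes a file.
import os
import inotify.adapters

def process_closed_logs(logFolder):
    i = inotify.adapters.Inotify()
    i.add_watch(logFolder)
    for event in i.event_gen(yield_nones=False):
        (_, type_names, path, filename) = event
        # IN_CLOSE_WRITE fires when a writer closes the file, e.g. after a roll-over
        if 'IN_CLOSE_WRITE' in type_names:
            ProcessLogFile(os.path.join(path, filename))  # hypothetical helper from the question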

A slightly more polished version of one of the answers from above.
from pathlib import Path

def is_file_in_use(file_path):
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError
    try:
        path.rename(path)
    except PermissionError:
        return True
    else:
        return False
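A brief usage sketch (my addition, reusing the logFolder and ProcessLogFile names from the question and assuming os is imported):
for logFile in os.listdir(logFolder):
    full_path = os.path.join(logFolder, logFile)
    if not is_file_in_use(full_path):
        ProcessLogFile(full_path)  # safe to move/parse this one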

On Windows, you can also directly retrieve the information by leveraging the NTDLL/KERNEL32 Windows APIs. The following code returns a list of PIDs, in case the file is still opened/used by a process (including your own, if you have an open handle on the file):
import ctypes
from ctypes import wintypes
path = r"C:\temp\test.txt"
# -----------------------------------------------------------------------------
# generic strings and constants
# -----------------------------------------------------------------------------
ntdll = ctypes.WinDLL('ntdll')
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
NTSTATUS = wintypes.LONG
INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value
FILE_READ_ATTRIBUTES = 0x80
FILE_SHARE_READ = 1
OPEN_EXISTING = 3
FILE_FLAG_BACKUP_SEMANTICS = 0x02000000
FILE_INFORMATION_CLASS = wintypes.ULONG
FileProcessIdsUsingFileInformation = 47
LPSECURITY_ATTRIBUTES = wintypes.LPVOID
ULONG_PTR = wintypes.WPARAM
# -----------------------------------------------------------------------------
# create handle on concerned file with dwDesiredAccess == FILE_READ_ATTRIBUTES
# -----------------------------------------------------------------------------
kernel32.CreateFileW.restype = wintypes.HANDLE
kernel32.CreateFileW.argtypes = (
    wintypes.LPCWSTR,      # In     lpFileName
    wintypes.DWORD,        # In     dwDesiredAccess
    wintypes.DWORD,        # In     dwShareMode
    LPSECURITY_ATTRIBUTES, # In_opt lpSecurityAttributes
    wintypes.DWORD,        # In     dwCreationDisposition
    wintypes.DWORD,        # In     dwFlagsAndAttributes
    wintypes.HANDLE)       # In_opt hTemplateFile
hFile = kernel32.CreateFileW(
    path, FILE_READ_ATTRIBUTES, FILE_SHARE_READ, None, OPEN_EXISTING,
    FILE_FLAG_BACKUP_SEMANTICS, None)
if hFile == INVALID_HANDLE_VALUE:
raise ctypes.WinError(ctypes.get_last_error())
# -----------------------------------------------------------------------------
# prepare data types for system call
# -----------------------------------------------------------------------------
class IO_STATUS_BLOCK(ctypes.Structure):
    class _STATUS(ctypes.Union):
        _fields_ = (('Status', NTSTATUS),
                    ('Pointer', wintypes.LPVOID))
    _anonymous_ = '_Status',
    _fields_ = (('_Status', _STATUS),
                ('Information', ULONG_PTR))

iosb = IO_STATUS_BLOCK()

class FILE_PROCESS_IDS_USING_FILE_INFORMATION(ctypes.Structure):
    _fields_ = (('NumberOfProcessIdsInList', wintypes.LARGE_INTEGER),
                ('ProcessIdList', wintypes.LARGE_INTEGER * 64))

info = FILE_PROCESS_IDS_USING_FILE_INFORMATION()
PIO_STATUS_BLOCK = ctypes.POINTER(IO_STATUS_BLOCK)
ntdll.NtQueryInformationFile.restype = NTSTATUS
ntdll.NtQueryInformationFile.argtypes = (
    wintypes.HANDLE,        # In  FileHandle
    PIO_STATUS_BLOCK,       # Out IoStatusBlock
    wintypes.LPVOID,        # Out FileInformation
    wintypes.ULONG,         # In  Length
    FILE_INFORMATION_CLASS) # In  FileInformationClass
# -----------------------------------------------------------------------------
# system call to retrieve list of PIDs currently using the file
# -----------------------------------------------------------------------------
status = ntdll.NtQueryInformationFile(hFile, ctypes.byref(iosb),
                                      ctypes.byref(info),
                                      ctypes.sizeof(info),
                                      FileProcessIdsUsingFileInformation)
pidList = info.ProcessIdList[0:info.NumberOfProcessIdsInList]
print(pidList)
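For completeness (this part is my addition, not in the original answer), you would normally also release the handle you opened and check the NTSTATUS returned by NtQueryInformationFile:
# close our own handle so we do not keep the file busy ourselves
kernel32.CloseHandle(hFile)
# 0 is STATUS_SUCCESS; any other NTSTATUS value means the query failed
if status != 0:
    raise OSError("NtQueryInformationFile failed with NTSTATUS 0x%08X" % (status & 0xFFFFFFFF))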

I provide another solution below; please see the following code.
import glob
import os

def isFileinUsed(ifile):
    wildcard = "/proc/*/fd/*"
    lfds = glob.glob(wildcard)
    for fds in lfds:
        try:
            file = os.readlink(fds)
            if file == ifile:
                return True
        except OSError as err:
            if err.errno == 2:   # ENOENT: the fd was closed in the meantime
                file = None
            else:
                raise
    return False
You can use this function to check whether a file is in use.
Note:
This solution can only be used on Linux systems.

Related

python: monitor updates in /proc/mydev file

I wrote a kernel module that writes to /proc/mydev to notify the Python program in userspace. I want to trigger a function in the Python program whenever there is an update of data in /proc/mydev from the kernel module. What is the best way to listen for an update here? I am thinking about using "watchdog" (https://pythonhosted.org/watchdog/). Is there a better way to do this?
This is an easy and efficient way:
import os
from time import sleep
from datetime import datetime

def myfunction(_time):
    print("file modified, time: " + datetime.fromtimestamp(_time).strftime("%H:%M:%S"))

if __name__ == "__main__":
    _time = 0
    while True:
        last_modified_time = os.stat("/proc/mydev").st_mtime
        if last_modified_time > _time:
            myfunction(last_modified_time)
            _time = last_modified_time
        sleep(1)  # prevent high CPU usage
result:
file modified, time: 11:44:09
file modified, time: 11:46:15
file modified, time: 11:46:24
The while loop guarantees that the program keeps listening to changes forever.
You can set the interval by changing the sleep time. Low sleep time causes high CPU usage.
import time
import os
import select

# get the file descriptor for the proc file
fd = os.open("/proc/mydev", os.O_RDONLY)

# create a polling object to monitor the file for updates
poller = select.poll()
poller.register(fd, select.POLLIN)

# create a loop to monitor the file for updates
while True:
    events = poller.poll(10000)
    if len(events) > 0:
        # read the contents of the file if updated
        print(os.read(fd, 1024))
sudo pip install inotify
Example
Code for monitoring a simple, flat path (see “Recursive Watching” for watching a hierarchical structure):
import inotify.adapters

def _main():
    i = inotify.adapters.Inotify()
    i.add_watch('/tmp')
    with open('/tmp/test_file', 'w'):
        pass
    for event in i.event_gen(yield_nones=False):
        (_, type_names, path, filename) = event
        print("PATH=[{}] FILENAME=[{}] EVENT_TYPES={}".format(
            path, filename, type_names))

if __name__ == '__main__':
    _main()
Expected output:
PATH=[/tmp] FILENAME=[test_file] EVENT_TYPES=['IN_MODIFY']
PATH=[/tmp] FILENAME=[test_file] EVENT_TYPES=['IN_OPEN']
PATH=[/tmp] FILENAME=[test_file] EVENT_TYPES=['IN_CLOSE_WRITE']
I'm not sure if this would work for your situation, since it seems that you're wanting to watch a folder, but this program watches a file at a time until the main() loop repeats:
import os
import time

def main():
    contents = os.listdir("/proc/mydev")
    for file in contents:
        with open("/proc/mydev/" + file, "r") as f:
            init = f.read()
        different = False
        while not different:
            with open("/proc/mydev/" + file, "r") as f:
                check = f.read()
            if init != check:
                different = True
            else:
                time.sleep(1)
        # Write what you would want to happen if a change occurred here...
        main()

main()
You could then write what you would want to happen right before that last call to main(), as the watch would then repeat.
Also, this may contain errors, since I rushed this.
Hope this at least helps!
You can't do this efficiently without modifying your kernel driver.
Instead of using procfs, have it register a new character device under /dev, and write that driver to make new content available to read from that device only when new content has in fact come in from the underlying hardware, such that the application layer can issue a blocking read and have it return only when new content exists.
A good example to work from (which also has plenty of native Python clients) is the evdev devices in the input core.
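On the userspace side the read loop then becomes trivial. A minimal sketch (my addition; the device name /dev/mydev is hypothetical, the original answer does not name one):
# blocking reader for a character device whose driver only returns when new data exists
with open("/dev/mydev", "rb", buffering=0) as dev:
    while True:
        data = dev.read(4096)   # blocks inside the driver until fresh content arrives
        if data:
            print("new data:", data)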

Check if a File is written or in use by another process

I have tried to find a solution for the following problem but can't find a good one.
I have a folder with subfolders and files in it.
Some of the files may be in use by another process (the other process is writing data to them; they are .mdf files).
I simply want to check whether the files are in use or not.
Structure:
A_Folder
    Setup
    Data1
        .mdf-file1
        .mdf-file2
    Data2
    Data3
    Evaluation
something like:
def file_in_use():
    *your solution*

for file in folder:
    if file_in_use(file):
        print("file in use")
        break
I'm using Win10, PyCharm and a venv.
So far I have tried, from other "solutions":
psutil (works, but is too slow)
open(), os.rename - won't work for me
subprocess won't work either - it can't find my filename: using the method from Amit Gupta from my link down below, the file path looks like this: "C:\Data\S_t_h\S-t-h\H001.mdf"
Basically I tried everything from this question:
Check if a file is not open nor being used by another process
from subprocess import check_output, Popen, PIPE

src = r"C:\Data\S_t_h\S-t-h\H001.mdf"
files_in_use = False

def file_in_use(src):
    try:
        lsout = Popen(['lsof', src], stdout=PIPE, shell=False)
        check_output(["grep", src], stdin=lsout.stdout, shell=False)
    except:
        return False
    return True

if file_in_use(src):
    files_in_use = True
and I'm getting:
FileNotFoundError: [WinError 2] The system cannot find the file specified
This link suggests setting shell=True:
winerror-2-the-system-cannot-find-the-file-specified-python
With shell=True I'm now told that "lsof" and "grep" can't be found or are wrong (they are Unix tools and aren't available on Windows).
Here is the psutil method that works for me, but it is too slow (~10 seconds):
import psutil

src = r"C:\Data\S_t_h\S-t-h\H001.mdf"

def has_handle(src):
    for proc in psutil.process_iter():
        try:
            for item in proc.open_files():
                if src == item.path:
                    return True
        except Exception:
            pass
    return False

print(has_handle(src))
My Solution:
Sorry for the delayed answer.
It simply worked with:
try:
    os.rename(src, src)
    return False
except OSError:  # file is in use
    return True
I made it more complicated than it actually was, I guess.
But thank you all anyway for your feedback and criticism.
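To apply that check to the whole folder tree from the question, here is a quick sketch (my addition; the root path is a placeholder, and file_in_use just wraps the os.rename trick above):
import os

def file_in_use(path):
    try:
        os.rename(path, path)
        return False
    except OSError:          # rename fails while another process holds the file
        return True

for root, dirs, files in os.walk(r"C:\A_Folder"):
    for name in files:
        if file_in_use(os.path.join(root, name)):
            print("file in use:", os.path.join(root, name))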

Python shutil module move exception handling

The destination path /tmp/abc is used in both Process 1 & Process 2.
Say there are N processes running; we need to retain the file generated by the latest one.
Process 1:
import shutil
shutil.move(src_path, destination_path)
Process 2:
import os
os.remove(destination_path)
Solution
1. Handle the exception if the move fails with [Errno 2] No such file or directory.
Is this the correct solution? Is there a better way to handle this?
Useful link: A safe, atomic file-copy operation
You can catch the FileNotFoundError:
try:
    shutil.move(src_path, destination_path)
except FileNotFoundError:
    print('File Not Found')
    # Add whatever logic you want to execute
except:
    print('Some Other error')
The primary solutions that come to mind are either:
1. keep partial information in a per-process staging file and rely on the OS for atomic moves, or
2. keep partial information in per-process memory and rely on interprocess communication / locks for atomic writes.
For the first, e.g.:
import tempfile
import os

FINAL = '/tmp/something'

def do_stuff():
    fd, name = tempfile.mkstemp(suffix="-%s" % os.getpid())
    while keep_doing_stuff():
        os.write(fd, get_output())
    os.close(fd)
    os.rename(name, FINAL)

if __name__ == '__main__':
    do_stuff()
You can choose to invoke individually from a shell (as shown above) or with some process wrappers (subprocess or multiprocessing would be fine), and either way will work.
For interprocess you would probably want to spawn everything from a parent process
from multiprocessing import Process, Lock
from io import StringIO

def do_stuff(lock):
    output = StringIO()
    while keep_doing_stuff():
        output.write(get_output())
    with lock:
        with open(FINAL, 'w') as f:
            f.write(output.getvalue())
    output.close()

if __name__ == '__main__':
    lock = Lock()
    for num in range(2):
        Process(target=do_stuff, args=(lock,)).start()

shutil.rmtree doesn't work with Windows Library

So I'm building a simple script that backs up certain documents to my second hard-drive (you never know what could happen!). So, I used the shutil.copytree function to replicate my data on the second drive. It works beautifully, and that is not the problem.
I use the shutil.rmtree function to remove the tree if the destination already exists. I'll show you my code:
import shutil
import os

def overwrite(src, dest):
    if(not os.path.exists(src)):
        print(src, "does not exist, so nothing may be copied.")
        return
    if(os.path.exists(dest)):
        shutil.rmtree(dest)
    shutil.copytree(src, dest)
    print(dest, "overwritten with data from", src)
    print("")

overwrite(r"C:\Users\Centurion\Dropbox\Documents", r"D:\Backup\Dropbox Documents")
overwrite(r"C:\Users\Centurion\Pictures", r"D:\Backup\All Pictures")

print("Press ENTER to continue...")
input()
As you can see, a simple script. Now, when I run the script for the first time, everything is fine. Pictures and Documents copy over to my D: drive just fine. However, when I run for the second time, this is my output:
C:\Users\Centurion\Programming\Python>python cpdocsnpics.py
D:\Backup\Dropbox Documents overwritten with data from C:\Users\Centurion\Dropbox\Documents
Traceback (most recent call last):
File "cpdocsnpics.py", line 17, in <module>
overwrite(r"C:\Users\Centurion\Pictures", r"D:\Backup\All Pictures")
File "cpdocsnpics.py", line 10, in overwrite
shutil.rmtree(dest)
File "C:\Python34\lib\shutil.py", line 477, in rmtree
return _rmtree_unsafe(path, onerror)
File "C:\Python34\lib\shutil.py", line 376, in _rmtree_unsafe
onerror(os.rmdir, path, sys.exc_info())
File "C:\Python34\lib\shutil.py", line 374, in _rmtree_unsafe
os.rmdir(path)
PermissionError: [WinError 5] Access is denied: 'D:\\Backup\\All Pictures'
The error only happens when I copy Pictures after the first time; I'm assuming it has something to do with being a Library.
What should I do?
That's a cross-platform consistency issue.
You've copied files/dirs that carry the read-only attribute. The first time, "dest" does not exist, so the rmtree call is skipped. When you run the "overwrite" function again, the "dest" location (and its subtree) exists, but it was copied with read-only access, and that is where the problem comes from.
In order to "fix" the issue, you must provide a handler for the onerror parameter of shutil.rmtree. As long as your problem concerns read-only files, the workaround is something like this:
def readonly_handler(func, path, execinfo):
    os.chmod(path, 128)  # or os.chmod(path, stat.S_IWRITE) from the "stat" module
    func(path)
As you can see in the python doc onerror must be a callable that accepts three parameters: function, path, and excinfo. For further info, read the docs.
def overwrite(src, dest):
    if(not os.path.exists(src)):
        print(src, "does not exist, so nothing may be copied.")
        return
    if(os.path.exists(dest)):
        shutil.rmtree(dest, onerror=readonly_handler)
    shutil.copytree(src, dest)
    print(dest, "overwritten with data from", src)
    print("")
Of course, this handler is simple and specific but if other errors occur, new exceptions will be raised and this handler may not be able to fix them!
Note:
Tim Golden (Python for windows contributor) has been patching the shutil.rmtree issue and it seems it will be resolved in Python 3.5 (see issue 19643).
I found a problem other than the read-only file using shutil.rmtree on Windows (testing on Windows 7). I was using a combination of shutil.rmtree and shutil.copytree to create a test fixture in a test suite, so the sequence was being called repeatedly in a short period of time (<1 sec intervals), and I was seeing unpredictable failures part way through the test suite, with both EACCES and ENOTEMPTY errors reported. The symptoms suggested to me that the shutil.rmtree function had not completed on return to the calling program, and that it was only after some time that the deleted filenames were available for re-use.
TL;DR: the solution isn't pretty - broadly, it renames the directory before deleting it, but there are a number of wrinkles that need to be handled because the Windows file system seems to take some time to catch up with the operations performed. The actual code catches a variety of failure conditions and retries a variant of the failed operation after a short delay.
A longer discussion follows, with my final code at the end.
My first thought was to try renaming the directory tree before removing it, so that the original directory name is immediately available for re-use. This does appear to help. To this end, I created a replacement for rmtree whose essence is this:
def removetree(tgt):
    def error_handler(func, path, execinfo):
        e = execinfo[1]
        if e.errno == errno.ENOENT or not os.path.exists(path):
            return       # path does not exist - treat as success
        if func in (os.rmdir, os.remove) and e.errno == errno.EACCES:
            os.chmod(path, stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)  # 0777
            func(path)   # read-only file; make writable and retry
            return
        raise e
    tmp = os.path.join(os.path.dirname(tgt), "_removetree_tmp")
    os.rename(tgt, tmp)
    shutil.rmtree(tmp, onerror=error_handler)
    return
I found this logic was an improvement, but it was subject to unpredictable failure of the os.rename operation, with one of several possible errors. So I also added some retry logic around os.rename, thus:
def removetree(tgt):
    def error_handler(func, path, execinfo):
        # figure out recovery based on error...
        e = execinfo[1]
        if e.errno == errno.ENOENT or not os.path.exists(path):
            return       # path does not exist
        if func in (os.rmdir, os.remove) and e.errno == errno.EACCES:
            os.chmod(path, stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)  # 0777
            func(path)   # read-only file; make writable and retry
            return
        raise e
    # Rename target directory to temporary value, then remove it
    count = 0
    while count < 10:   # prevents indefinite loop
        count += 1
        tmp = os.path.join(os.path.dirname(tgt), "_removetree_tmp_%d" % (count))
        try:
            os.rename(tgt, tmp)
            shutil.rmtree(tmp, onerror=error_handler)
            break
        except OSError as e:
            time.sleep(1)   # Give file system some time to catch up
            if e.errno in [errno.EACCES, errno.ENOTEMPTY]:
                continue    # Try another temp name
            if e.errno == errno.EEXIST:
                shutil.rmtree(tmp, ignore_errors=True)  # Try to clean up old files
                continue    # Try another temp name
            if e.errno == errno.ENOENT:
                break       # 'tgt' does not exist(?)
            raise           # Other error - propagate
    return
The above code is not tested, but the general idea here does seem to work. The full code I actually use is below, and uses two functions. It probably contains some unnecessary logic, but does seem to be working more reliably for me (in that my test suite now passes repeatedly on Windows where previously it failed unpredictably on a majority of runs):
def renametree_temp(src):
    """
    Rename tree to temporary name, and return that name, or
    None if the source directory does not exist.
    """
    count = 0
    while count < 10:   # prevents indefinite loop
        count += 1
        tmp = os.path.join(os.path.dirname(src), "_removetree_tmp_%d" % (count))
        try:
            os.rename(src, tmp)
            return tmp  # Success!
        except OSError as e:
            time.sleep(1)
            if e.errno == errno.EACCES:
                log.warning("util.renametree_temp: %s EACCES, retrying" % tmp)
                continue    # Try another temp name
            if e.errno == errno.ENOTEMPTY:
                log.warning("util.renametree_temp: %s ENOTEMPTY, retrying" % tmp)
                continue    # Try another temp name
            if e.errno == errno.EEXIST:
                log.warning("util.renametree_temp: %s EEXIST, retrying" % tmp)
                shutil.rmtree(tmp, ignore_errors=True)  # Try to clean up old files
                continue    # Try another temp name
            if e.errno == errno.ENOENT:
                log.warning("util.renametree_temp: %s ENOENT, skipping" % tmp)
                break       # 'src' does not exist(?)
            raise           # Other error: propagate
    return None
def removetree(tgt):
    """
    Work-around for python problem with shutils tree remove functions on Windows.
    See:
        https://stackoverflow.com/questions/23924223/
        https://stackoverflow.com/questions/1213706/
        https://stackoverflow.com/questions/1889597/
        http://bugs.python.org/issue19643
    """
    # shutil.rmtree error handler that attempts recovery from attempts
    # on Windows to remove a read-only file or directory (see links above).
    def error_handler(func, path, execinfo):
        e = execinfo[1]
        if e.errno == errno.ENOENT or not os.path.exists(path):
            return          # path does not exist: nothing to do
        if func in (os.rmdir, os.remove) and e.errno == errno.EACCES:
            try:
                os.chmod(path, stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)  # 0777
            except Exception as che:
                log.warning("util.removetree: chmod failed: %s" % che)
            try:
                func(path)
            except Exception as rfe:
                log.warning("util.removetree: 'func' retry failed: %s" % rfe)
            if not os.path.exists(path):
                return      # Gone, assume all is well
            raise
        if e.errno == errno.ENOTEMPTY:
            log.warning("util.removetree: Not empty: %s, %s" % (path, tgt))
            time.sleep(1)
            removetree(path)    # Retry complete removal
            return
        log.warning("util.removetree: rmtree path: %s, error: %s" % (path, repr(execinfo)))
        raise e
    # Try renaming to a new directory first, so that the tgt is immediately
    # available for re-use.
    tmp = renametree_temp(tgt)
    if tmp:
        shutil.rmtree(tmp, onerror=error_handler)
    return
(The above code incorporates a solution to the read-only file problem from What user do python scripts run as in windows?, which according to Deleting directory in Python is tested. I don't think I encounter the read-only file problem, so assume it is not tested in my test suite.)

check what files are open in Python

I'm getting an error in a program that is supposed to run for a long time that too many files are open. Is there any way I can keep track of which files are open so I can print that list out occasionally and see where the problem is?
To list all open files in a cross-platform manner, I would recommend psutil.
#!/usr/bin/env python
import psutil

for proc in psutil.process_iter():
    print(proc.open_files())
The original question implicitly restricts the operation to the currently running process, which can be accessed through psutil's Process class.
proc = psutil.Process()
print(proc.open_files())
Lastly, you'll want to run the code using an account with the appropriate permissions to access this information or you may see AccessDenied errors.
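If you want to print that list out occasionally, as the question asks, one simple option (a sketch I am adding, assuming psutil as above) is a repeating timer:
import threading
import psutil

def report_open_files(interval=60):
    # print this process's open files, then reschedule ourselves
    files = psutil.Process().open_files()
    print("%d open files" % len(files))
    for f in files:
        print("  ", f.path)
    t = threading.Timer(interval, report_open_files, args=(interval,))
    t.daemon = True          # don't keep the program alive just for reporting
    t.start()

report_open_files()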
I ended up wrapping the built-in file object at the entry point of my program. I found out that I wasn't closing my loggers.
import io
import sys
import builtins
import traceback
from functools import wraps

def opener(old_open):
    @wraps(old_open)
    def tracking_open(*args, **kw):
        file = old_open(*args, **kw)

        old_close = file.close
        @wraps(old_close)
        def close():
            old_close()
            open_files.remove(file)
        file.close = close
        file.stack = traceback.extract_stack()

        open_files.add(file)
        return file
    return tracking_open

def print_open_files():
    print(f'### {len(open_files)} OPEN FILES: [{", ".join(f.name for f in open_files)}]', file=sys.stderr)
    for file in open_files:
        print(f'Open file {file.name}:\n{"".join(traceback.format_list(file.stack))}', file=sys.stderr)

open_files = set()
io.open = opener(io.open)
builtins.open = opener(builtins.open)
On Linux, you can look at the contents of /proc/self/fd:
$ ls -l /proc/self/fd/
total 0
lrwx------ 1 foo users 64 Jan 7 15:15 0 -> /dev/pts/3
lrwx------ 1 foo users 64 Jan 7 15:15 1 -> /dev/pts/3
lrwx------ 1 foo users 64 Jan 7 15:15 2 -> /dev/pts/3
lr-x------ 1 foo users 64 Jan 7 15:15 3 -> /proc/9527/fd
Although the solutions above that wrap opens are useful for one's own code, I was debugging my client to a third party library including some c extension code, so I needed a more direct way. The following routine works under darwin, and (I hope) other unix-like environments:
def get_open_fds():
    '''
    Return the number of open file descriptors for the current process.
    .. warning: will only work on UNIX-like OSes.
    '''
    import subprocess
    import os

    pid = os.getpid()
    procs = subprocess.check_output(
        ["lsof", '-w', '-Ff', "-p", str(pid)]).decode()

    return len([s for s in procs.split('\n')
                if s and s[0] == 'f' and s[1:].isdigit()])
If anyone can extend to be portable to windows, I'd be grateful.
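A rough cross-platform alternative (my addition, using the psutil module instead of lsof; note that on Windows num_handles() counts all kernel handles, not only files):
import psutil

def get_open_fds_portable():
    proc = psutil.Process()
    if hasattr(proc, "num_fds"):   # POSIX
        return proc.num_fds()
    return proc.num_handles()      # Windows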
On Linux, you can use lsof to show all files opened by a process.
As said earlier, you can list fds on Linux in /proc/self/fd, here is a simple method to list them programmatically:
import os
import sys
import errno

def list_fds():
    """List process currently open FDs and their target"""
    if not sys.platform.startswith('linux'):
        raise NotImplementedError('Unsupported platform: %s' % sys.platform)
    ret = {}
    base = '/proc/self/fd'
    for num in os.listdir(base):
        path = None
        try:
            path = os.readlink(os.path.join(base, num))
        except OSError as err:
            # Last FD is always the "listdir" one (which may be closed)
            if err.errno != errno.ENOENT:
                raise
        ret[int(num)] = path
    return ret
On Windows, you can use Process Explorer to show all file handles owned by a process.
There are some limitations to the accepted response, in that it does not seem to count pipes. I had a python script that opened many sub-processes, and was failing to properly close standard input, output, and error pipes, which were used for communication. If I use the accepted response, it will fail to count these open pipes as open files, but (at least in Linux) they are open files and count toward the open file limit. The lsof -p solution suggested by sumid and shunc works in this situation, because it also shows you the open pipes.
Get a list of all open files. handle.exe is part of Microsoft's Sysinternals Suite. An alternative is the psutil Python module, but I find 'handle' will print out more files in use.
Here is what I made. Kludgy code warning.
#!/bin/python3
# coding: utf-8
"""Build set of files that are in-use by processes.
Requires 'handle.exe' from Microsoft SysInternals Suite.
This seems to give a more complete list than using the psutil module.
"""
from collections import OrderedDict
import os
import re
import subprocess

# Path to handle executable
handle = "E:/Installers and ZIPs/Utility/Sysinternalssuite/handle.exe"

# Get output string from 'handle'
handle_str = subprocess.check_output([handle]).decode(encoding='ASCII')

""" Build list of lists.
    1. Split string output, using '-' * 78 as section breaks.
    2. Ignore first section, because it is executable version info.
    3. Turn list of strings into a list of lists, ignoring first item (it's empty).
"""
work_list = [x.splitlines()[1:] for x in handle_str.split(sep='-' * 78)[1:]]

""" Build OrderedDict of pid information.
    pid_dict['pid_num'] = ['pid_name', 'open_file_1', 'open_file_2', ...]
"""
pid_dict = OrderedDict()
re1 = re.compile(r"(.*?\.exe) pid: ([0-9]+)")  # pid name, pid number
re2 = re.compile(r".*File.*\s\s\s(.*)")        # File name
for x_list in work_list:
    key = ''
    file_values = []
    m1 = re1.match(x_list[0])
    if m1:
        key = m1.group(2)
        # file_values.append(m1.group(1))  # pid name first item in list
    for y_strings in x_list:
        m2 = re2.match(y_strings)
        if m2:
            file_values.append(m2.group(1))
    pid_dict[key] = file_values

# Make a set of all the open files
values = []
for v in pid_dict.values():
    values.extend(v)
files_open = sorted(set(values))

txt_file = os.path.join(os.getenv('TEMP'), 'lsof_handle_files')
with open(txt_file, 'w') as fd:
    for a in sorted(files_open):
        fd.write(a + '\n')
subprocess.call(['notepad', txt_file])
os.remove(txt_file)
I'd guess that you are leaking file descriptors. You probably want to look through your code to make sure that you are closing all of the files that you open.
You can use the following script. It builds on Claudiu's answer. It addresses some of the issues and adds additional features:
Prints a stack trace of where the file was opened
Prints on program exit
Keyword argument support
Here's the code and a link to the gist, which is possibly more up to date.
"""
Collect stacktraces of where files are opened, and prints them out before the
program exits.

Example
========

monitor.py
----------
from filemonitor import FileMonitor
FileMonitor().patch()
f = open('/bin/ls')
# end of monitor.py

$ python monitor.py
----------------------------------------------------------------------------
path = /bin/ls
> File "monitor.py", line 3, in <module>
>   f = open('/bin/ls')
----------------------------------------------------------------------------

Solution modified from:
https://stackoverflow.com/questions/2023608/check-what-files-are-open-in-python
"""
from __future__ import print_function
import __builtin__
import traceback
import atexit
import textwrap


class FileMonitor(object):

    def __init__(self, print_only_open=True):
        self.openfiles = []
        self.oldfile = __builtin__.file
        self.oldopen = __builtin__.open
        self.do_print_only_open = print_only_open
        self.in_use = False

        class File(self.oldfile):

            def __init__(this, *args, **kwargs):
                path = args[0]

                self.oldfile.__init__(this, *args, **kwargs)
                if self.in_use:
                    return
                self.in_use = True
                self.openfiles.append((this, path, this._stack_trace()))
                self.in_use = False

            def close(this):
                self.oldfile.close(this)

            def _stack_trace(this):
                try:
                    raise RuntimeError()
                except RuntimeError as e:
                    stack = traceback.extract_stack()[:-2]
                    return traceback.format_list(stack)

        self.File = File

    def patch(self):
        __builtin__.file = self.File
        __builtin__.open = self.File

        atexit.register(self.exit_handler)

    def unpatch(self):
        __builtin__.file = self.oldfile
        __builtin__.open = self.oldopen

    def exit_handler(self):
        indent = '  > '
        terminal_width = 80
        for file, path, trace in self.openfiles:
            if file.closed and self.do_print_only_open:
                continue
            print("-" * terminal_width)
            print("  {} = {}".format('path', path))
            lines = ''.join(trace).splitlines()
            _updated_lines = []
            for l in lines:
                ul = textwrap.fill(l,
                                   initial_indent=indent,
                                   subsequent_indent=indent,
                                   width=terminal_width)
                _updated_lines.append(ul)
            lines = _updated_lines
            print('\n'.join(lines))
            print("-" * terminal_width)
            print()
