Here's an example piece of (Python) code that checks if a directory has changed:
import os

def watch(path, fdict):
    """Checks a directory and its children for changes."""
    changed = []
    for root, dirs, files in os.walk(path):
        for f in files:
            abspath = os.path.abspath(os.path.join(root, f))
            new_mtime = os.stat(abspath).st_mtime
            if abspath not in fdict or new_mtime > fdict[abspath]:
                changed.append(abspath)
                fdict[abspath] = new_mtime
    return fdict, changed
But the accompanying unittest randomly fails unless I add at least a 2 second sleep:
import unittest
import project_creator
import os
import time

class tests(unittest.TestCase):
    def setUp(self):
        os.makedirs('autotest')
        f = open(os.path.join('autotest', 'new_file.txt'), 'w')
        f.write('New file')
        f.close()

    def tearDown(self):
        os.unlink(os.path.join('autotest', 'new_file.txt'))
        os.rmdir('autotest')

    def test_amend_file(self):
        changed = project_creator.watch('autotest', {})
        time.sleep(2)
        f = open(os.path.join('autotest', 'new_file.txt'), 'a')
        f.write('\nA change!')
        f.close()
        changed = project_creator.watch('autotest', changed[0])
        self.assertEqual(changed[1], [os.path.abspath(os.path.join('autotest', 'new_file.txt'))])

if __name__ == '__main__':
    unittest.main()
Is stat really limited to worse than 1 second accuracy? (Edit: apparently so, with FAT)
Is there any (cross platform) way of detecting more rapid changes?
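As a side note on the precision question: since Python 3.3, os.stat also exposes nanosecond-resolution timestamps, which removes the float-rounding half of the problem, though the filesystem's own granularity limit (2 seconds on FAT) still applies. A minimal sketch:

```python
import os
import tempfile

# st_mtime_ns (Python 3.3+) reports the modification time in integer
# nanoseconds, so no precision is lost to float rounding; the actual
# resolution still depends on the filesystem (FAT stores only 2-second
# granularity, while NTFS and ext4 store sub-second timestamps).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example")
    path = tmp.name

st = os.stat(path)
print(st.st_mtime_ns)  # integer nanoseconds since the epoch
os.unlink(path)
```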
The proper way is to watch a directory instead of polling for changes.
Check out FindFirstChangeNotification Function.
Watch a Directory for Changes is a Python implementation.
If directory watching isn't accurate enough, then probably the only alternative is to intercept file system calls.
Watchdog (http://packages.python.org/watchdog/quickstart.html) is a good project for multi-platform change notification.
If this were Linux, I'd use inotify. There's apparently a Windows inotify equivalent - the Java jnotify library has implemented it - but I don't know if there's a Python implementation.
Related
I have a Python script that creates a directory on local NTFS SSD disk on a computer running Windows.
os.makedirs(path)
After creating the directory
os.path.isdir(path)
returns True.
But when I start a process using multiprocessing.Pool's starmap, sometimes os.path.isdir(path) returns False for the same path in the function executed in the pool process. The failure occurs quite randomly, and I haven't spotted any obvious reason for it.
The directory is created in the same thread that starts the processes. There are multiple processes writing files to the directory, but each of them has its own file and none of them modifies the directory in any other way.
I've tried solving this using retries and delays, but would like to know whether there's a correct solution to this problem?
Are the filesystem's records somehow outdated in the other processes at this time? And if so, is there any way to ensure they're updated before attempting to access any files or directories?
Here's a minimal example with all the relevant bits. The system processes "Works". Whenever a new work arrives, the system launches a thread that schedules processing in a process pool. The processing itself consists of writing a file to the directory causing problems, i.e. the processes don't seem to find the directory.
Note: I haven't been able to reproduce this problem in the original environment nor with this example. I just see in the logs that it occurs every now and then.
#!/usr/bin/python
import os
import sys
import time
import random
import shutil
import threading
import multiprocessing

def create_dir(path):
    os.makedirs(path)
    # This never fails
    if not os.path.isdir(path):
        raise Exception(f"Unable to create directory '{path}'")

def clean_dir(path):
    for f in os.listdir(path):
        filepath = os.path.join(path, f)
        if os.path.isdir(filepath):
            shutil.rmtree(filepath, ignore_errors=True)
        else:
            os.remove(filepath)

# Creates a file in the problem directory. Crashes if
# the directory doesn't exist (or isn't visible).
# This function is executed in a process.
def process(i, tmp_dir):
    print(f"Processing {i} / {tmp_dir}")
    # Most of the work is done here before accessing path
    time.sleep(1 + 3 * random.random())
    # Check Foo/Work-id/Temp
    # This fails every now and then, and when
    # it fails, it fails for every child process
    if not os.path.isdir(tmp_dir):
        raise Exception(f"Directory '{tmp_dir}' doesn't exist")
    filepath = os.path.join(tmp_dir, f"File-{i}.txt")
    with open(filepath, 'w') as file:
        file.write(f"{i}")

# Starts 3 processes (from a pool of 8), each writing a single file.
# This function itself is executed in a thread.
def start_processing(target_dir, pool):
    tmp_dir = os.path.join(target_dir, "Temp")
    # This never fails; create Foo/Work-id/Temp
    if not os.path.exists(tmp_dir):
        create_dir(tmp_dir)
    num_processes = 3
    results = pool.starmap(process, enumerate([tmp_dir] * num_processes))

# Creates directory Foo in the current directory and starts 20 works
# to be executed in a process pool, 8 at a time.
if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=8)
    root = "Foo"
    seed = int(time.time())
    print(f"Seed {seed}")
    # This never fails
    if not os.path.exists(root):
        create_dir(root)
    else:
        clean_dir(root)
        time.sleep(2)
    num_works = 20
    for work_id in range(1, num_works + 1):
        name = f"Work-{work_id}"
        print(f"Starting {name}/{num_works}")
        target_dir = os.path.join(root, name)
        # Create target directory Foo/Work-id
        if not os.path.exists(target_dir):
            create_dir(target_dir)
        thread = threading.Thread(
            target=start_processing,
            args=(target_dir, pool),
            name="Processing Thread"
        )
        thread.start()
        time.sleep(0.07)
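For what it's worth, the retry-and-delay workaround mentioned above can at least be bounded instead of open-ended. This is only a sketch of that workaround, not a fix for the underlying visibility issue; the function name and defaults are illustrative:

```python
import os
import time

def wait_for_dir(path, attempts=5, base_delay=0.05):
    # Poll until `path` is visible to this process, backing off
    # exponentially between attempts. Returns True as soon as the
    # directory is seen, False if it never appears.
    for i in range(attempts):
        if os.path.isdir(path):
            return True
        time.sleep(base_delay * (2 ** i))
    return os.path.isdir(path)
```

A child process would call wait_for_dir(tmp_dir) before raising, instead of checking os.path.isdir once.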
I have a tool which generates some reports as HTML files. Since there are many of them, organizing them by hand would take a lot of time, so I tried making a script that organizes the files automatically according to some rules I've applied.
import os
import re
from endwith import EndsWith

filefullname = EndsWith('.html')
allfiles = filefullname.findfile()
report_path = "/home/user/reports/"
while True:
    files = os.listdir("/home/user/")
    if not allfiles:
        continue
    else:
        header = re.match(r"^[^_]+(?=_)", allfiles[0])
        if not os.path.exists(report_path + str(header.group())):
            os.system(f"mkdir {report_path + str(header.group())}")
            os.system(f"mv /home/user/*.html reports/{str(header.group())}")
        else:
            os.system(f"mv /home/user/*.html reports/{str(header.group())}")
This is the main file which does the automation, and the class is a custom EndsWith class, because the native endswith returned only booleans. The thing is that it runs, but it requires a restart to finish the job.
Any suggestions?
P.S. This is the class code:
import os

class EndsWith:
    def __init__(self, extension):
        self.extension = extension

    def findfile(self):
        files = os.listdir("/home/user/")
        file_list = []
        for file in files:
            #print(file)
            if self.extension in file:
                file_list.append(file)
        return file_list
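(As an aside, the stdlib glob module already returns matching names directly rather than booleans, so a small helper like this sketch could replace the custom class; the directory argument is illustrative:)

```python
import glob
import os

def find_files(directory, extension):
    # Return the full paths of entries in `directory` whose names end
    # with `extension` -- the job the EndsWith class above reimplements.
    return glob.glob(os.path.join(directory, f"*{extension}"))
```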
There are already several answered questions to this topic but actually not addressing my problem.
I'm using PyCharm 2021.2 and I want to be able to run unit tests both, individually and as a whole. This is important because if I have many tests and only some of them fail, I usually debug my code and would like to run only my failed tests to see if my debugging was successful. Only then do I want to re-run all the tests.
My setup is as follows:
So, for instance, I want to:
Right-click the folder tests and run all tests in this folder (in my example screenshot, there is only one test_MyClass.py but usually here would be many such tests).
Right-click an individual test, e.g. test_MyClass.py, and run it on its own.
Both possibilities usually work fine. However, when my single tests use some relative paths, for instance, to read some test assets (in my case from the folder containing_folder/tests/testassets), only the option 1) works. The option 2) runs into a FileNotFoundError: No such file or directory.
The code to reproduce this behavior is:
MyClass.py:
class MyClass:
    _content = None

    def set_content(self, content):
        self._content = content

    def get_content(self):
        return self._content
test_MyClass.py:
import unittest
import io
from ..MyClass import MyClass

class MyClassTests(unittest.TestCase):
    myClassInstance = None

    @classmethod
    def setUpClass(cls):
        cls.myClassInstance = MyClass()

    def get_file(self, use_case):
        path_to_file = "testassets/" + use_case + ".txt"
        with io.open(path_to_file, 'r', encoding="utf-8") as file:
            file = file.read()
        return file

    def test_uc_file1(self):
        file_content = self.get_file("uc_1")
        self.myClassInstance.set_content(file_content)
        self.assertEqual("test1", self.myClassInstance.get_content())

    def test_uc_file2(self):
        file_content = self.get_file("uc_2")
        self.myClassInstance.set_content(file_content)
        self.assertEqual("test2", self.myClassInstance.get_content())
It seems that path_to_file = "testassets/" + use_case + ".txt" only works as a relative path in the 1) option, but not in the 2) option.
How can I programmatically recognize which option, 1) or 2), I'm starting a test with in PyCharm? And which path would I then have to choose for option 2)? I tried ../testassets, ../../testassets, ../../, and ../, but none of them worked for option 2).
Ok, I found how to accomplish what I want.
First of all, I got rid of relative paths when importing. Instead of from ..MyClass import MyClass I use simply from MyClass import MyClass.
Second, my methods setUpClass and get_file now look like this:
@classmethod
def setUpClass(cls):
    cls.path = os.path.normpath(os.path.abspath(__file__))  # requires import os
    if os.path.isfile(cls.path):
        cls.path = os.path.dirname(cls.path)
    cls.myClassInstance = MyClass()

def get_file(self, use_case):
    path_to_file = self.path + "/testassets/" + use_case + ".txt"
    with io.open(path_to_file, 'r', encoding="utf-8") as file:
        file = file.read()
    return file
The point is that os.path.abspath(__file__) returns a root path of either the directory containing_folder/tests if I choose option 1) to start all tests or the filename containing_folder/tests/test_MyClass.py if I choose option 2) to start a single test. In the if statement
if os.path.isfile(cls.path):
    cls.path = os.path.dirname(cls.path)
I generalize both special cases to get the root directory of all the tests and easily find the test assets relative to them.
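The same logic can be expressed with pathlib; this is only a sketch of the idea above, not code from the original answer, and the helper name is illustrative:

```python
import os
from pathlib import Path

def asset_dir(start):
    # Mirror of the logic above: if `start` names a file, step up to
    # its containing directory, then locate the test assets folder
    # ("testassets" is the folder name from the question).
    p = Path(os.path.abspath(start))
    if p.is_file():
        p = p.parent
    return p / "testassets"
```

In setUpClass one would call asset_dir(__file__) once and build asset paths from the result.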
I am using the following function to allow the OS to open a third party application associated with the filetype in question. For example: If variable 'fileToOpen' links to a file (it's full path of course) called flower.psd, this function would open up Photoshop in Windows and Gimp in Linux (typically).
import os
import platform
import subprocess

def launchFile(fileToOpen):
    if platform.system() == 'Darwin':      # macOS
        subprocess.call(('open', fileToOpen))
    elif platform.system() == 'Windows':   # Windows
        os.startfile(fileToOpen)
    else:                                  # Linux variants
        subprocess.call(('xdg-open', fileToOpen))
While it is running, I want to have the same python script monitor the use of that file and delete it once the third party app is done using it (meaning...the 3rd party app closed the psd file or the third party app itself closed and released the file from use).
I've tried using psutil and pywin32 but neither seem to work in Windows 10 with Python3.9. Does anyone have any success with this? If so, how did you go about getting the process of the third party app while not getting a permission error from Windows?
Ideally, I would like a solution that works across Windows, Mac, and Linux, but I'll take any help with Windows 10 for now, since Mac and Linux can be handled more easily from the command line with ps -ax | grep %filename%.
Keep in mind, this would ideally track any file. TIA for your help.
Update by request:
I tried adding this code to mine (from a previous suggestion). Even this alone in a python test.py file spits out permission errors:
import psutil

for proc in psutil.process_iter():
    try:
        # this returns the list of files opened by the current process
        flist = proc.open_files()
        if flist:
            print(proc.pid, proc.name())
            for nt in flist:
                print("\t", nt.path)
    # This catches a race condition where a process ends
    # before we can examine its files
    except psutil.NoSuchProcess as err:
        print("****", err)
The following code does not spit out an error, but also does not detect a file in use:
import psutil
from pathlib import Path

def has_handle(fpath):
    for proc in psutil.process_iter():
        try:
            for item in proc.open_files():
                if fpath == item.path:
                    return True
        except Exception:
            pass
    return False

thePath = Path("C:\\Users\\someUser\\Downloads\\Book1.xlsx")
fileExists = has_handle(thePath)
if fileExists:
    print("This file is in use!")
else:
    print("This file is not in use")
Found it!
The original recommendation from another post missed one call: Path. The item.path entry from the process list is returned as a string, which needs to be converted to a Path object before comparing it with your own Path object.
Therefore this line:
if fpath == item.path:
Should be:
if fpath == Path(item.path):
and here is the full code:
import psutil
from pathlib import Path

def has_handle(fpath):
    for proc in psutil.process_iter():
        try:
            for item in proc.open_files():
                print(item.path)
                if fpath == Path(item.path):
                    return True
        except Exception:
            pass
    return False

thePath = Path("C:\\Users\\someUser\\Downloads\\Book1.xlsx")
fileExists = has_handle(thePath)
if fileExists:
    print("This file is in use!")
else:
    print("This file is not in use")
Note: The reason to use Path objects rather than strings is to stay OS-independent.
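To see why the conversion matters: pure path objects compare by normalized components (case-insensitively for Windows flavours), which plain string comparison does not. A quick illustration that runs on any OS:

```python
from pathlib import PureWindowsPath

# The same Windows path spelled with different separators and casing:
a = PureWindowsPath(r"C:\Users\someUser\Downloads\Book1.xlsx")
b = PureWindowsPath("C:/users/someuser/downloads/book1.xlsx")

print(str(a) == str(b))  # False - the raw strings differ
print(a == b)            # True - paths compare by normalized parts
```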
Based on @Frankie's answer I put together this script. The script above took 16.1 seconds per file, as proc.open_files() is quite slow.
The script below checks all files in a directory and returns the pid related to each open file. 17 files took only 2.9s to check. This is because proc.open_files() is only called if the file's default app is open in memory.
As this is used to check whether a folder can be moved, the pid can later be used to force-close the locking application, but BEWARE that that application could have other documents open, and all their data would be lost.
This does not detect open txt files, and may not detect files that don't have a default application.
from pathlib import Path
import psutil
import os
import shlex
import winreg
from pprint import pprint as pp
from collections import defaultdict

class CheckFiles():
    def check_locked_files(self, path: str):
        '''Check all files recursively in a directory and return a dict with
        the locked files associated with each pid (process id)

        Args:
            path (str): root directory

        Returns:
            dict: dict(pid:[filenames])
        '''
        fnames = []
        apps = set()
        for root, _, f_names in os.walk(path):
            for f in f_names:
                f = Path(os.path.join(root, f))
                if self.is_file_in_use(f):
                    default_app = Path(self.get_default_windows_app(f.suffix)).name
                    apps.add(default_app)
                    fnames.append(str(f))
        if apps:
            return self.find_process(fnames, apps)

    def find_process(self, fnames: list[str], apps: set[str]):
        '''Find the process holding each locked file

        Args:
            fnames (list[str]): list of filepaths
            apps (set[str]): set of default apps

        Returns:
            dict: dict(pid:[filenames])
        '''
        open_files = defaultdict(list)
        for p in psutil.process_iter(['name']):
            name = p.info['name']
            if name in apps:
                try:
                    [open_files[p.pid].append(x.path) for x in p.open_files() if x.path in fnames]
                except Exception:
                    continue
        return dict(open_files)

    def is_file_in_use(self, file_path: str):
        '''Check if a file is in use by trying to rename it to its own name
        (nothing changes); if it is locked, the rename fails

        Args:
            file_path (str): path to the file to check

        Returns:
            bool: True if the file is locked by a process
        '''
        path = Path(file_path)
        if not path.exists():
            raise FileNotFoundError
        try:
            path.rename(path)
        except PermissionError:
            return True
        else:
            return False

    def get_default_windows_app(self, suffix: str):
        '''Find the default app associated with a file extension (suffix)

        Args:
            suffix (str): e.g. ".jpg"

        Returns:
            None|str: default app exe
        '''
        try:
            class_root = winreg.QueryValue(winreg.HKEY_CLASSES_ROOT, suffix)
            with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, r'{}\shell\open\command'.format(class_root)) as key:
                command = winreg.QueryValueEx(key, '')[0]
            return shlex.split(command)[0]
        except Exception:
            return None

old_dir = r"C:\path_to_dir"
c = CheckFiles()
r = c.check_locked_files(old_dir)
pp(r)
I just started working with the Watchdog library in Python on Mac, and am doing some basic tests to make sure things are working like I would expect. Unfortunately, they're not -- I can only seem to obtain the path to the folder containing the file where an event was registered, not the path to the file itself.
Below is a simple test program (slightly modified from the example provided by Watchdog) to print out the event type, path, and time whenever an event is registered.
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class TestEventHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        print("event noticed: " + event.event_type +
              " on file " + event.src_path + " at " + time.asctime())

if __name__ == "__main__":
    event_handler = TestEventHandler()
    observer = Observer()
    observer.schedule(event_handler, path='~/test', recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
The src_path variable should contain the path of the file that had the event happen to it.
However, in my testing, when I modify a file, src_path only prints the path to the folder containing the file, not the path to the file itself. For example, when I modify the file moon.txt in the folder europa, the program prints the following output:
event noticed: modified on file ~/test/europa at Mon Jul 8 15:32:07 2013
What do I need to change in order to obtain the full path to the modified file?
Problem solved. As it turns out, FSEvents in OS X returns only the directory for file modified events, leaving you to scan the directory yourself to find out which file was modified. This is not mentioned in Watchdog documentation, though it's found easily in FSEvents documentation.
To get the full path to the file, I added the following snippet of code (inspired by this StackOverflow thread) to find the most recently modified file in a directory, to be used whenever event.src_path returns a directory.
if event.is_directory:
    files_in_dir = [event.src_path + "/" + f for f in os.listdir(event.src_path)]
    mod_file_path = max(files_in_dir, key=os.path.getmtime)
mod_file_path contains the full path to the modified file.
Thanks ekl for providing your solution. I just stumbled across the same problem. However, I used to use PatternMatchingEventHandler, which requires small changes to your solution:
subclass from FileSystemEventHandler
create an attribute pattern where you store your pattern matching. This is not as flexible as the original PatternMatchingEventHandler, but should suffice for most needs, and you will get the idea anyway if you want to extend it.
Here's the code you have to put in your FileSystemEventHandler subclass:
def __init__(self, pattern='*'):
    super(MidiEventHandler, self).__init__()
    self.pattern = pattern

def on_modified(self, event):
    super(MidiEventHandler, self).on_modified(event)
    if event.is_directory:
        files_in_dir = [event.src_path + "/" + f for f in os.listdir(event.src_path)]
        if len(files_in_dir) > 0:
            modifiedFilename = max(files_in_dir, key=os.path.getmtime)
        else:
            return
    else:
        modifiedFilename = event.src_path
    if fnmatch.fnmatch(os.path.basename(modifiedFilename), self.pattern):
        print("Modified MIDI file: %s" % modifiedFilename)
One other thing I changed is that I check whether the directory is empty or not before running max() on the file list. max() does not work with empty lists.
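On Python 3.4+ that emptiness check can also be folded into the call itself, since max() accepts a default= argument for empty sequences. A small illustration:

```python
# max() raises ValueError on an empty sequence unless default= is given,
# so the explicit len() check above becomes optional.
files_in_dir = []
latest = max(files_in_dir, key=len, default=None)
print(latest)  # None, instead of a ValueError
```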