I'd like to ask if there's any way to watch a directory in Python and parse the latest text file generated in that directory. I have this startup code, which follows a single text file:
import time

def follow(thefile):
    # Seek to the end of the file so we only yield lines appended from now on
    thefile.seek(0, 2)
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line

if __name__ == '__main__':
    logfile = open(r'\\some directory\files.txt', "r")
    loglines = follow(logfile)
    for line in loglines:
        print line,
See the hard-coded files.txt above; I need that to be dynamic, by watching the directory for newly generated text files and switching to the latest one to parse it.
It will run on Windows XP Service Pack 3.
I'm using Python 2.7.
The directory I'm watching is also on Windows XP.
Thank you.
To check for new files, repeatedly get a list of files currently in the directory with os.listdir('directory'). Save the entries in a set and calculate the difference of the set with the previous set.
import os

# Initialize before an event loop:
old_entries = set()

# You need a loop that calls two handlers, each handler returning soon.
# Inside your loop, check for a "new file" event this way:
now_entries = set(os.listdir(r'\\some directory'))
for new_entry in now_entries - old_entries:  # entries present now but not before
    handle_new_file(new_entry)
old_entries = now_entries
Your program needs to listen for two events:
New file in the directory.
New line in the old file.
You call follow(), which is like an event handler that never returns. I think you want that handler to return to one main event loop that checks for each kind of event. As written, follow() stays inside its while True loop forever, only yielding when a new line is appended to the file; if no more lines are added to that file, it never yields, and you never get a chance to notice a newer file.
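For instance, a minimal sketch of such a loop (Python 2, reusing the directory from your question; "latest file" is approximated here by name order, and the switching behavior is an assumption about what you want):

import os
import time

WATCH_DIR = r'\\some directory'
old_entries = set(os.listdir(WATCH_DIR))  # baseline: ignore files already present
logfile = None

while True:
    # Event 1: a new file appeared in the directory
    now_entries = set(os.listdir(WATCH_DIR))
    for new_entry in sorted(now_entries - old_entries):
        if logfile is not None:
            logfile.close()
        logfile = open(os.path.join(WATCH_DIR, new_entry), 'r')  # switch to it
    old_entries = now_entries

    # Event 2: a new line in the current file (readline() returns '' at EOF,
    # so this check returns quickly instead of blocking)
    if logfile is not None:
        line = logfile.readline()
        if line:
            print line,
            continue
    time.sleep(0.1)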
Take a look into the FindFirstChangeNotification API
http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html
The approach here is to use the MS FindFirstChangeNotification API, exposed via the pywin32 win32file module. It needs a little explanation: you get a change handle for a directory (optionally with its subdirectories) for certain kinds of change. You then use the ubiquitous WaitForSingleObject call from win32event, which fires when something's changed in one of your directories.
Essentially because the Windows OS is responsible for managing creation/modification of files, you can ask it to let you know immediately when a file is changed/created.
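Sketched out, the pattern from that article looks roughly like this (pywin32 required; the directory path is the one from the question):

import os
import win32file
import win32event
import win32con

path_to_watch = r'\\some directory'
change_handle = win32file.FindFirstChangeNotification(
    path_to_watch,
    0,                                     # do not watch subdirectories
    win32con.FILE_NOTIFY_CHANGE_FILE_NAME  # file create/rename/delete events
)
try:
    old_contents = set(os.listdir(path_to_watch))
    while True:
        # Fires when something changed; times out every 500 ms so Ctrl+C works
        result = win32event.WaitForSingleObject(change_handle, 500)
        if result == win32con.WAIT_OBJECT_0:
            new_contents = set(os.listdir(path_to_watch))
            for added in new_contents - old_contents:
                print "New file:", added
            old_contents = new_contents
            win32file.FindNextChangeNotification(change_handle)
finally:
    win32file.FindCloseChangeNotification(change_handle)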
Related
I use the right-click menu to launch a program that moves files to a folder and does some work on them. The problem is that when I do this with multiple files, it starts multiple instances of the program. If I have 50 files, it will launch the app 50 times.
I need to have only one window, not multiple windows. So far I've managed to make it work some of the time with this code, but I need it to work 100% of the time:
# ========================================================
# This only catches the file address that was right-clicked and launched
# with the app, then moves it to the installer folder.
import sys
import time
import random
import shutil
import psutil

try:
    # THIS CATCHES THE SELECTED FILE LINK TO BE MOVED TO THE INSTALL FOLDER IF IT IS SUCCESSFUL
    variable = sys.argv[1]
    # Find the index of the last backslash, to slice off the file name
    index = -1
    last_sep = -1
    for i in variable:
        index = index + 1
        if "\\" in i:
            last_sep = index
    # MOVE TO INSTALLER DIRECTORY (Install_folder is defined elsewhere)
    shutil.move(variable, Install_folder + "\\Installer\\files\\" + variable[last_sep + 1:])
except:  # bare except kept from the original: any failure means the file wasn't moved
    print('FILE NOT ADDED THROUGH THE MOVE TOOL')
# =========================================================
processlist = []
# Attempt to stagger the instances before counting processes
time.sleep(int(time.process_time()) * random.randint(1, 3))
for process in psutil.process_iter():
    processlist.append(process.name())
if processlist.count("program.exe") >= 4:
    sys.exit()
My guess is that the instances all start at the same time, and that could be why they all close instead of letting one active window remain. I've only been using Python for two months, so I hope you can help me. Thank you in advance for reading.
My alternative solution is to separate the program into two: one for moving the files and another for doing the work on them. But that is not the desired solution.
Solution:
import os
import sys
import time
import shutil

try:
    variable = sys.argv[1]
    # THIS CATCHES THE SELECTED FILE LINK TO BE MOVED TO THE HANDY INSTALL FOLDER IF IT IS SUCCESSFUL
    index = -1
    last_sep = -1
    for i in variable:
        index = index + 1
        if "\\" in i:
            last_sep = index
    # CHOOSES DIRECTORY WHERE THE FILES TO BE INSTALLED ARE
    shutil.move(variable, f"{Install_folder}\\program\\InstallFolder\\{variable[last_sep + 1:]}")
    # Only the first instance succeeds here; the others get FileExistsError
    # and fall through to the except block below.
    os.makedirs(f"{Install_folder}\\program\\Queue")
    # Wait until the folder contents stop changing between two scans,
    # i.e. every other window has moved its file and closed.
    while True:
        time.sleep(1)
        serpent = os.listdir(f"{Install_folder}\\program\\InstallFolder")
        time.sleep(1)
        tiger = os.listdir(f"{Install_folder}\\program\\InstallFolder")
        if tiger == serpent:
            break
except (IndexError, FileExistsError):
    if os.path.exists(f"{Install_folder}\\program\\Queue"):
        sys.exit()
    print('application executed directly through the .exe')
Basically, when all the windows open, the first window, on completing the task of moving its file, creates a folder called Queue and enters a while loop that stays active until two consecutive directory listings match (meaning all the other windows have finished their work and closed). The other instances close because they get an exception when they try to create that folder. Once all the other instances are closed, the waiting window leaves the while loop and starts working on the files.
Author your Python app so it behaves in this way.
from pathlib import Path

QUEUE_FILE = Path("~/queue.txt").expanduser()
...

if __name__ == "__main__":
    if QUEUE_FILE.exists():
        ...  # do the old app behavior in an interactive window
    else:
        append_filename(QUEUE_FILE, sys.argv)
So there are two modes of operation.
That second if clause will very quickly service any right-click requests.
It does almost no work, merely writing a line of text to a central queue file.
The first if clause mostly behaves the same as your current app,
and it keeps a single window open while you're interactively working with it.
The difference is that, instead of accepting filename(s) in sys.argv,
it accepts most of those fifty filenames via the central queue file.
Upon exiting, it must delete the queue file.
That sets us up for a subsequent interaction.
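For completeness, append_filename could be as small as this (just a sketch; only the name comes from the snippet above):

def append_filename(queue_file, argv):
    # Append each right-clicked filename to the shared queue file,
    # one per line, for the single interactive instance to pick up later.
    with open(queue_file, "a") as f:
        for name in argv[1:]:
            f.write(name + "\n")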
In my Flask application I have implemented a logging system using the logging library. It is currently run in the function below:
if __name__ == "__main__":
    """Runs the webserver.

    The finally block is used for some logging management. It will first
    shut down logging, to ensure no files are open, then rename the file to
    'log_' + the current date, and finally move the file to the /logs
    archive directory.
    """
    try:
        session_management.clean_uploads_on_start(UPLOAD_FOLDER)
        app.run(debug=False)
    finally:
        try:
            logging.shutdown()
            new_log_file_name = log_management.rename_log(app.config['DEFAULT_LOG_NAME'])
            log_management.move_log(new_log_file_name)
        except FileNotFoundError:
            logging.warning("Current log file not found")
        except PermissionError:
            logging.warning("Permissions lacking to rename or move log.")
I discovered that the file is not renamed and moved if either the cmd prompt is force-closed or the server crashes. I thought it might be better to put the rename and move into the initial 'try' block of the function, prior to the server starting, but I run into issues because I have a config file (imported in this script) which contains the following code:
logging.basicConfig(filename='current_log.log', level=logging.INFO,
                    filemode='a',
                    format='%(asctime)s:%(levelname)s:%(message)s')
I have tried to do something like the below, but I still run into permission errors; I think that is because the log_management script also imports config. Further, I could not find a function which starts the logging system (similar to logging.shutdown(), which is used upon the system ending), otherwise I would shut logging down, move the file (if it exists), and then start it back up.
try:
    session_management.clean_uploads_on_start(UPLOAD_FOLDER)
    log_management.check_log_on_startup(app.config['DEFAULT_LOG_NAME'])
    import config
    app.run(debug=False)
finally:
    try:
        logging.shutdown()
        new_log_file_name = log_management.rename_log(app.config['DEFAULT_LOG_NAME'])
        log_management.move_log(new_log_file_name)
    except FileNotFoundError:
        logging.warning("Current log file not found")
    except PermissionError:
        logging.warning("Permissions lacking to rename or move log.")

# (in another script)
def check_log_on_startup(file_name):
    if os.path.exists(file_name):
        move_log(rename_log(file_name))
Any suggestions would be much welcomed, because I feel like I'm at a brick wall!
As you have already found out, trying to perform cleanups at the end of your process life cycle has the potential to fail if the process terminates uncleanly.
The issue with performing the cleanup at the start is that you apparently call logging.basicConfig from your import before attempting to move the old log file.
This leads to the implicitly created FileHandler holding an open file object on the existing log when you attempt to rename and move it. Depending on the file system you are using, this might not be met with joy.
If you want to move the handling of potential old log files to the start of your application completely, you have to perform the renaming and moving before you call logging.basicConfig, so you'll have to remove it from your import and add it to the log_management somehow.
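A minimal sketch of that ordering, assuming the rename_log and move_log functions from your log_management module keep their current behavior:

import logging
import os
import log_management

LOG_NAME = 'current_log.log'

# Archive any leftover log *before* basicConfig creates an open FileHandler
# on it; this avoids the PermissionError when renaming on Windows.
if os.path.exists(LOG_NAME):
    new_log_file_name = log_management.rename_log(LOG_NAME)
    log_management.move_log(new_log_file_name)

logging.basicConfig(filename=LOG_NAME, level=logging.INFO, filemode='a',
                    format='%(asctime)s:%(levelname)s:%(message)s')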
As an alternative, you could move the whole handling of log files to the logging file handler by subclassing the standard FileHandler class, e.g:
import logging
import os
from datetime import datetime

class CustomFileHandler(logging.FileHandler):
    def __init__(self, filename, archive_path='archive', archive_name='log_%Y%m%d', **kwargs):
        self._archive = os.path.join(archive_path, archive_name)
        self._archive_log(filename)
        super().__init__(filename, **kwargs)

    def _archive_log(self, filepath):
        if os.path.exists(filepath):
            os.rename(filepath, datetime.now().strftime(self._archive))

    def close(self):
        super().close()
        self._archive_log(self.baseFilename)
With this, you would configure your logging like so:
hdler = CustomFileHandler('current.log')
logging.basicConfig(level=logging.INFO, handlers=[hdler],
                    format='%(asctime)s:%(levelname)s:%(message)s')
The CustomFileHandler will check for, and potentially archive, old logs during initialization. This will deal with leftovers after an unclean process termination where the shutdown cleanup cannot take place. Since the parent class initializer is called after the log archiving is attempted, there is not yet an open handle on the log that would cause a PermissionError.
The overridden close() method will perform the archiving on a clean process shutdown.
This should remove the need for the dedicated log_management module, at least as far as the functions you show in your code are concerned. rename_log, move_log and check_log_on_startup are all encapsulated in the CustomFileHandler. There is also no need to explicitly call logging.shutdown().
Some notes:
The reason you cannot find a start function equivalent to logging.shutdown() is that the logging system is started/initialized when you import the logging module. Among other things, it instantiates the implicit root logger and registers logging.shutdown as exit handler via atexit.
The latter is the reason why there is no need to explicitly call logging.shutdown() with the above solution. The Python interpreter will call it during finalization when preparing for interpreter shutdown due to the exit handler registration. logging.shutdown() then iterates through the list of registered handlers and calls their close() methods, which will perform the log archiving during a clean shutdown.
Depending on the method you choose for moving (and renaming) the old log file, the above solution might need some additional safeguards against exceptions. os.rename will raise an exception if the destination path already exists, i.e. when you have already stopped and started your process previously on the same day while os.replace would silently overwrite the existing file. See more details about moving files via Python here.
Thus I would recommend naming the archived logs not only by current date but also by time.
In the above, adding the current date to the archive file name is done via datetime's strftime, hence the 'log_%Y%m%d' as default for the archive_name parameter of the custom file handler. The characters with a preceding % are valid format codes that strftime() replaces with the respective parts of the datetime object it is called on. To append the current time to the archive log file name you would simply append the respective format codes to the archive_name, e.g.: 'log_%Y%m%d_%H%M%S' which would result in a log name such as log_20200819_123721.
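For example:

from datetime import datetime

# On 2020-08-19 at 12:37:21 this prints 'log_20200819_123721'
print(datetime.now().strftime('log_%Y%m%d_%H%M%S'))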
I've added code to a Python package (brian2) that places an exclusive lock on a file to prevent a race condition. However, because this code includes calls to fcntl, it does not work on Windows. Is there a way for me to place exclusive locks on files in Windows without installing a new package, like pywin32? (I don't want to add a dependency to brian2.)
Since msvcrt is part of the standard library, I assume you have it. The msvcrt (MicroSoft Visual C Run Time) module only implements a small number of the routines available in the MS RTL, however it does implement file locking.
Here is an example:
import msvcrt

REC_LIM = 20

pFilename = "rlock.dat"
fh = open(pFilename, "w")

for i in range(REC_LIM):
    # Here, construct data into "line" (an example record; assumption)
    line = "record %02d\n" % i
    start_pos = fh.tell()   # Get the current start position
    # Get the lock - possible blocking call
    msvcrt.locking(fh.fileno(), msvcrt.LK_RLCK, len(line) + 1)
    fh.write(line)          # Advance the current position
    end_pos = fh.tell()     # Save the end position
    # Reset the current position before releasing the lock
    fh.seek(start_pos)
    msvcrt.locking(fh.fileno(), msvcrt.LK_UNLCK, len(line) + 1)
    fh.seek(end_pos)        # Go back to the end of the written record
fh.close()
The example shown serves a similar function to fcntl.flock(); however, the code is very different. Only exclusive locks are supported.
Unlike fcntl.flock() there is no start argument (or whence). The lock or unlock call only operates on the current file position. This means that in order to unlock the correct region we have to move the current file position back to where it was before we did the read or write. Having unlocked, we now have to advance the file position again, back to where we were after the read or write, so we can proceed.
If we unlock a region for which we have no lock then we do not get an error or exception.
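If blocking is undesirable, msvcrt also offers a non-blocking mode; a small sketch (lock_region is a hypothetical helper, not part of msvcrt):

import msvcrt
import time

def lock_region(f, nbytes, retries=10):
    # Try a non-blocking exclusive lock (LK_NBLCK) on nbytes bytes from
    # the current file position, retrying a few times before giving up.
    for _ in range(retries):
        try:
            msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, nbytes)
            return True
        except OSError:  # raised immediately if the region is already locked
            time.sleep(0.5)
    return False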
What I want to do is to check if a file exists, and if it doesn't, perform an action, then check again, until the file exists and then the code continues on with other operations.
For simplicity, I would implement a small polling function, with a timeout for safety:
import os
from time import sleep

def open_file(path_to_file, attempts=0, max_attempts=5, sleep_int=5):
    if attempts >= max_attempts:
        return None  # give up after max_attempts tries
    if os.path.exists(path_to_file) and os.path.isfile(path_to_file):
        try:
            return open(path_to_file)
        except OSError:
            pass  # file exists but could not be opened yet
    # perform an action, then wait and retry
    sleep(sleep_int)
    return open_file(path_to_file, attempts + 1, max_attempts, sleep_int)
I would also look into using Python's built-in polling, as this will track/report I/O events for your file descriptor.
Assuming that you're on Linux:
If you really want to avoid any kind of looping to find out if the file exists AND you're sure that it will be created at some point and you know the directory where it will be created, you can track changes to the parent directory using pyinotify. It will notify you when something changes, and you can detect whether it's the file you need being created.
Depending on your needs it might be more trouble than it's worth, though, in which case I suggest a small polling function like Kyle's solution.
I have a script to scan a directory to see when new files are added, and then process their contents. They're video files, so they're often very large, and they're being transferred over a network and often take a long time to transfer. So I need to make sure they have finished copying before going on.
At the moment, once I've found a new file in the directory, I'm using os.path.getmtime to check the modification date, and comparing that to the last time the file was scanned, to see if it is still being modified. The theory is that if it's no longer being modified, then it should have finished transferring.
if getmtime(path.join(self.rootFolder, thefile)) < self.lastchecktime: newfiles.append(thefile)
but that doesn't seem to work - the script gets triggered too early and the processing fails because the file is not fully loaded. Could it be that there is not enough of a pause between scans, so the mtime stays the same...? I give it 10 seconds between scans - that should be enough, surely.
Is there an easy / more pythonic way of doing this? The files are on a windows server running on a VM.
Do you have any control over the adding of the files? If so, you could create an empty file with a name like videoname-complete once a video has finished uploading, and watch for those files.
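A sketch of that idea (the '-complete' suffix is just an example convention):

import os

def finished_videos(folder):
    # A video is only considered done once its matching
    # '<name>-complete' marker file has been created alongside it.
    names = set(os.listdir(folder))
    for name in names:
        if name.endswith('-complete'):
            video = name[:-len('-complete')]
            if video in names:
                yield os.path.join(folder, video)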
Wouldn't your check be "is my modified time greater than the last time I checked"?
if os.path.getmtime(path) > self.lastAccessedTime:
    # do something, as modified time is greater than last time I checked
    pass
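If you cannot get a completion marker from the sender, another option is to wait until the size and mtime hold steady between scans; a sketch (wait_until_stable is a hypothetical helper):

import os
import time

def wait_until_stable(path, interval=10):
    # Treat the file as fully copied once neither its size nor its
    # mtime has changed between two consecutive scans.
    last = None
    while True:
        st = os.stat(path)
        current = (st.st_size, st.st_mtime)
        if current == last:
            return
        last = current
        time.sleep(interval)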
I'm not a Windows guy, but I'm sure there will be some library equivalent to inotify for Windows. It is a really nice way to listen for file or directory changes at the file system level. I'm leaving some sample code which works on Linux with pyinotify; it would be helpful for someone on Linux.
import os
import pyinotify

class PTmp(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        print "Changed: %s" % os.path.join(event.path, event.name)

wm = pyinotify.WatchManager()
mask = pyinotify.IN_CLOSE_WRITE
notifier = pyinotify.Notifier(wm, PTmp())
# FILE_LOCATION is the directory you want to watch (placeholder)
wdd = wm.add_watch(FILE_LOCATION, mask, rec=True)

while True:
    try:
        notifier.process_events()
        if notifier.check_events():
            notifier.read_events()
    except KeyboardInterrupt:
        notifier.stop()
        break