This question already has answers here:
How do I watch a file for changes?
(28 answers)
Closed 6 years ago.
I'm trying to find a way, from within a Python program, to detect whether a file on the file system has been modified. I know I could have something run every 5 seconds to check the last-modification time reported by the system, but I was curious whether there's an easier method that doesn't require my program to check repeatedly.
Does anyone know of such a method?
watchdog
Excellent cross platform library for watching directories.
From the website
Supported Platforms
Linux 2.6 (inotify)
Mac OS X (FSEvents, kqueue)
FreeBSD/BSD (kqueue)
Windows (ReadDirectoryChangesW with I/O completion ports; ReadDirectoryChangesW worker threads)
OS-independent (polling the disk for directory snapshots and comparing them periodically; slow and not recommended)
I've used it on a couple projects and it seems to work wonderfully.
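For comparison, the OS-independent fallback in the list above (taking directory snapshots and comparing them periodically) can be sketched with nothing but the standard library. This is a minimal illustration, not watchdog's actual implementation; the function names `snapshot` and `diff_snapshots` are made up for this example.

```python
import os


def snapshot(directory):
    """Map every regular file under `directory` to its (mtime_ns, size)."""
    state = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # the file vanished between listing and stat
            state[path] = (info.st_mtime_ns, info.st_size)
    return state


def diff_snapshots(old, new):
    """Return (created, deleted, modified) as sorted lists of paths."""
    created = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return created, deleted, modified
```

A caller would take a snapshot, sleep for a few seconds, take another and diff the two. That is exactly the repeated checking the question wants to avoid, which is why the event-based backends are preferable where available.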
For Linux, there is pyinotify.
From the homepage:
Pyinotify is a Python module for monitoring filesystems changes. Pyinotify relies on a Linux Kernel feature (merged in kernel 2.6.13) called inotify. inotify is an event-driven notifier, its notifications are exported from kernel space to user space through three system calls. pyinotify binds these system calls and provides an implementation on top of them offering a generic and abstract way to manipulate those functionalities.
Thus it is obviously not cross-platform and relies on a new enough kernel version. However, as far as I can see, requiring kernel support would be true about any non-polling mechanism.
On Windows there is:
watcher, which is a nice Python port of the .NET FileSystemWatcher API.
Also there's (the one I wrote) dirwatch.
Both rely on the Windows ReadDirectoryChangesW function. Though for real work, I'd use watcher (proper C extension, good API, Python 2 & 3 support).
Mine is mostly an experiment calling the relevant APIs on Windows, so it's only interesting if you want an example of calling these things from Python.
You should also see inotifyx which is very similar to the previously mentioned pyinotify, but is said to have an API which changes less.
Fairly new 'programmer' here, trying to understand how Python interacts with Windows when multiple unrelated scripts are run simultaneously, for example from Task Manager or just starting them manually from IDLE. The scripts just make http calls and write files to disk, and environment is 3.6.
Is the interpreter able to draw resources from the OS (processor/memory/disk) independently such that the time to complete each script is more or less the same as it would be if it were the only script running (assuming the scripts cumulatively get nowhere near using up all the CPU or memory)? If so, what are the limitations (number of scripts, etc.).
Pardon mistakes in terminology. Note the quotes on 'programmer'.
how Python interacts with Windows
Python is an executable, a program. When a program is executed a new process is created.
python myscript.py starts a new python.exe process where the first argument is your script.
when multiple unrelated scripts are run simultaneously
They are multiple processes.
Is the interpreter able to draw resources from the OS (processor/memory/disk) independently?
Yes. Each process may access the OS API however it wishes, to the extent that it is possible.
What are the limitations?
Most likely RAM. The same limitations as any other process might encounter.
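To see that each script really is its own process, here is a small sketch that starts several interpreters side by side and collects their process IDs. The child code and the helper name `launch_children` are only for illustration.

```python
import os
import subprocess
import sys

# Tiny child program: report the PID of the interpreter that runs it.
CHILD_CODE = "import os; print(os.getpid())"


def launch_children(count):
    """Start `count` independent interpreter processes and return their PIDs."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE],
            stdout=subprocess.PIPE,
            universal_newlines=True,  # decode output as text (works on 3.6)
        )
        for _ in range(count)
    ]
    # All children were started before we wait on any of them.
    return [int(proc.communicate()[0]) for proc in procs]


if __name__ == "__main__":
    print("parent:", os.getpid(), "children:", launch_children(3))
```

Every child reports a different PID, and none of them matches the parent. The OS schedules them like any other program.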
These are difficult questions to answer, in part because they depend on:
Your operating system: Your OS gets to schedule and run tasks when it wants, which the Python programmer often does not have control over.
What your scripts are actually doing: If your scripts are all trying to write to the same drive, their execution may be halted more often than if no device was being written to. Or the script might run even faster if only one script writes to the drive, as the CPU can let one script calculate when another script writes. (It's hard to tell without benchmark testing.)
How many CPUs you're using: The number of Central Processing Units can improve parallel processing of programs -- but perhaps not. If your programs are constantly reading and writing from the same disk, more CPUs may not be a benefit.
Your Python version: (I'm just adding this for completeness.)
Ultimately, the only way you're going to get any real information on this is if you do your own benchmarking -- and even then, you should remember that those figures you find are only applicable to your current setup. That is, if you go to another computer elsewhere, you may find you get different results.
If you aren't familiar with Python's timeit module, I recommend you look into it. (It's part of the standard library, so you should already have it.) It'll help you do benchmark testing and get some concrete numbers for your platform.
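A minimal sketch of benchmarking with timeit follows. The workload `build_squares` and the helper `best_time` are placeholders invented for this example; you would substitute whatever your scripts actually do.

```python
import timeit


def build_squares(n):
    """Placeholder workload: build a list of n squares."""
    return [i * i for i in range(n)]


def best_time(func, *args, repeat=5, number=100):
    """Return the fastest per-call time in seconds across `repeat` runs."""
    timer = timeit.Timer(lambda: func(*args))
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    print("{:.6f} s per call".format(best_time(build_squares, 1000)))
```

Running the same measurement once on its own and once while your other scripts are active gives a rough idea of how much they slow each other down on your machine.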
By asking questions like yours, you may soon hear about Python's GIL (Global Interpreter Lock). It has to do with Python threads, and some people think it's a blessing, and some think it's a curse. Either way, this page:
https://realpython.com/python-gil/
has a good high-level explanation of it, including when it can work well and when it might not.
I have been developing a fairly extensive library of python modules that automate the more time consuming parts of "3D character development" for games/film/tv.
All of my code up until a few months ago has been run within Maya's dedicated python interpreter, however, my GUIs are built in PySide/PyQt, and so, run just fine in mac/windows/linux or a few other Graphics programs such as Nuke, XSI, Max.
What I would really like to figure out is a "simple" way to distribute my code to various different people ---> using various different operating Systems ---> potentially using various applications (Nuke, XSI, Max), which, in turn, have their own dedicated python interpreters.
The obvious option would be pip and easy_install. These tools are clearly the "right" way to go, but it's not really clear how a user would install/run them under the dedicated Python installs that ship with Maya/Nuke/etc., though it does seem possible (as explained here). Still, it's going to be a pretty big barrier for a less-technical user.
Any help or points in the right direction would be immensely appreciated..
I would not say that pip/easy_install are the 'right' way for this problem. They are pretty good (not quite 'great') tools for motivated, technically inclined users -- but even in that context they have issues (such as unintended upgrades or deletions). Most importantly, they are opt-in methods: nobody can make you pip unless you want to. This means users can accidentally or deliberately get themselves into very different positions from each other, which makes support and maintenance a nightmare.
I've had very good luck in Maya distributing a zipped file containing a complete environment (all the modules, etc.). userSetup.py adds that zip to the path, and Python's native zipimport functionality handles the rest. This makes sure that there is only one file to maintain and distribute. It also fixes the common problem of leftover .pyc files creating havoc after .py files get moved or renamed. Since this is all standard Python, I'd assume this will work for any app-specific Python that uses a 2.6+ version of Python, though I've never tried it in Nuke or Max.
The main wrinkle will be modules with .pyd or other binary components; typically these don't work inside zip files. I include a bootstrap routine which unpacks those to a (disposable) location on the user's disk and adds that to the path.
There's a detailed discussion of the method here and some background here
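A rough sketch of the zip approach described above, using only the standard library. The archive name `studio_tools.zip` and the module `rig_helpers` are hypothetical. In a real setup only the `activate_bundle` step would live in userSetup.py, and the zip would be built once by whoever maintains the tools.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_bundle(zip_path, modules):
    """Write {module_name: source} pairs into a zip as top-level .py files."""
    with zipfile.ZipFile(zip_path, "w") as bundle:
        for name, source in modules.items():
            bundle.writestr(name + ".py", source)


def activate_bundle(zip_path):
    """Put the zip at the front of sys.path so zipimport can serve imports."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    importlib.invalidate_caches()


if __name__ == "__main__":
    # Hypothetical bundle containing a single pure-Python module.
    zip_path = os.path.join(tempfile.mkdtemp(), "studio_tools.zip")
    source = "def greet():\n    return 'hello from the zip'\n"
    build_bundle(zip_path, {"rig_helpers": source})
    activate_bundle(zip_path)
    import rig_helpers
    print(rig_helpers.greet())
```

This covers pure-Python modules only. Binary extensions would still need the unpack-to-disk bootstrap mentioned above.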
I created a module in Python which provides about a dozen functionalities. While it will be mostly used from within Python, there is a good fraction of legacy users which will be calling it from Perl.
What is the best way to make a plug in to this module? My thoughts are:
Provide the functionalities as command line utilities and make system calls
Create some sort of server and handle RPC calls (say, via JSON RPC)
Any advice?
One other choice is to inline Python directly in your Perl script, using Inline::Python.
This may be simpler than other solutions, and only requires one additional module.
In the short run the easiest solution is to use Inline::Python. Closely followed by calling a command-line script.
In the long run, using a server to provide RPC functionality or simply calling a command-line script will give you the most future proof solution.
Why?
Because that way you aren't tied to Perl or Python as the language used to build the systems that consume the services provided by your library. Either method creates a clear, language-independent interface that you can use with whatever development environment you adopt.
Depending on your needs any of the presented options may be the "best choice". Depending on how your needs evolve over time, a different choice may be revealed as "best".
My approach to this would be to ask a couple of questions:
How often do you change development tools? You've switched to Python from Perl. Did you start with Tcl and go to Perl? Are you going to switch to the exciting new language X in 1, 5 or 10 years? If you change tools 'often' (whatever that means), emphasize cross-tool compatibility.
How fast is fast enough? Is the start up time for command line solutions ok? Does Inline::Python slow things down too much (you are still initializing a Python interpreter, it's just embedded in your Perl interpreter)?
Based on the answers to these questions, I would do the simplest thing that is likely to work.
My guess is that means in order:
Inline::Python
Command line scripts
Build an RPC server
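For the RPC option, the core of a server is a dispatcher that turns a request into a function call. Here is a minimal sketch in the JSON-RPC 2.0 style, assuming positional parameters only and leaving out the transport (socket, HTTP, etc.) entirely. The `add` function stands in for one of your module's functions; nothing here comes from an existing library.

```python
import json


def add(a, b):
    """Stand-in for one of the library's real functions."""
    return a + b


# Only functions listed here can be called remotely.
METHODS = {"add": add}


def _error(request_id, code, message):
    return json.dumps({"jsonrpc": "2.0", "id": request_id,
                       "error": {"code": code, "message": message}})


def handle_request(raw):
    """Process one JSON-RPC 2.0 style request string; return the response string."""
    try:
        request = json.loads(raw)
    except ValueError:
        return _error(None, -32700, "Parse error")
    if not isinstance(request, dict):
        return _error(None, -32600, "Invalid Request")
    request_id = request.get("id")
    func = METHODS.get(request.get("method"))
    if func is None:
        return _error(request_id, -32601, "Method not found")
    params = request.get("params", [])
    if not isinstance(params, list):
        return _error(request_id, -32602, "Invalid params")
    try:
        result = func(*params)
    except TypeError:
        return _error(request_id, -32602, "Invalid params")
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})
```

Any language with a JSON library, Perl included, can build the request string and read the response, which is what makes this option language-independent.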
Provide the functionalities as command line utilities and make system calls
Works really nicely. This is the way programs like Python (and Perl) are meant to be used.
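A sketch of what such a command-line wrapper might look like. The function `word_count` is a made-up stand-in for one of the module's functions, and printing JSON is just one reasonable choice that makes the output easy to parse from Perl or anything else.

```python
import argparse
import json
import sys


def word_count(text):
    """Hypothetical stand-in for one of the module's functions."""
    return len(text.split())


def main(argv=None):
    """Parse arguments, call the library function, print the result as JSON."""
    parser = argparse.ArgumentParser(
        description="Expose a module function on the command line.")
    parser.add_argument("text", help="input text to process")
    args = parser.parse_args(argv)
    # JSON on stdout gives the calling language an unambiguous format to parse.
    json.dump({"word_count": word_count(args.text)}, sys.stdout)
    sys.stdout.write("\n")
    return 0
```

To use it as a script, call `sys.exit(main())` under the usual `if __name__ == "__main__":` guard. The Perl side can then run it with backticks or `system` and decode the JSON it prints.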
I have a script in Python which uses a resource that cannot be used by more than a certain number of concurrently running scripts.
Classically, this would be solved by a named semaphore, but I cannot find those in the documentation of the multiprocessing or threading modules.
Am I missing something, or are named semaphores not implemented / exposed by Python? And more importantly, if the answer is no, what is the best way to emulate one?
Thanks,
Boaz
PS. For reasons which are not so relevant to this question, I can not aggregate the task to a continuously running process/daemon or work with spawned processes - both of which, it seems, would have worked with the python API.
I suggest a third-party extension like these, ideally the posix_ipc one -- see in particular the semaphore section in the docs.
These modules are mostly about exposing System V and POSIX IPC (including semaphores) in a unixy way, but at least one of them (posix_ipc specifically) is claimed to work with Cygwin on Windows (I haven't verified that claim). There are some documented limitations on FreeBSD 7.2 and Mac OS X 10.5, so take care if those platforms are important to you.
You can emulate them by using the filesystem instead of a kernel path (named semaphores are implemented this way on some platforms anyhow). You'll have to implement sem_[open|wait|post|unlink] yourself, but it ought to be relatively trivial to do so. Your synchronization overhead might be significant (depending on how often you have to fiddle with the semaphore in your app), so you might want to initialize a ramdisk when you launch your process in which to store named semaphores.
Alternatively if you're not comfortable rolling your own, you could probably wrap boost::interprocess::named_semaphore (docs here) in a simple extension module.
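Here is one way the filesystem emulation described above might look, as a rough sketch rather than a hardened implementation. It assumes a directory that all the scripts can write to, and it relies on `os.open` with `O_CREAT | O_EXCL` being atomic on that filesystem. The class name `FileSemaphore` and the slot-file naming are invented for this example.

```python
import os
import tempfile
import time


class FileSemaphore:
    """Counting semaphore shared between processes via slot files on disk."""

    def __init__(self, name, limit, directory=None):
        self.limit = limit
        self.directory = directory or tempfile.gettempdir()
        self.prefix = os.path.join(self.directory, name + ".slot")
        self.held = None  # path of the slot file this instance owns

    def try_acquire(self):
        """Claim a free slot without blocking; return True on success."""
        if self.held is not None:
            raise RuntimeError("this instance already holds a slot")
        for index in range(self.limit):
            path = "{}{}".format(self.prefix, index)
            try:
                # Exclusive creation: exactly one process can create each file.
                fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            except FileExistsError:
                continue  # slot is taken, try the next one
            os.write(fd, str(os.getpid()).encode())  # record the owner
            os.close(fd)
            self.held = path
            return True
        return False

    def acquire(self, poll_interval=0.1, timeout=None):
        """Poll until a slot is free; return False if `timeout` expires."""
        deadline = None if timeout is None else time.monotonic() + timeout
        while not self.try_acquire():
            if deadline is not None and time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)
        return True

    def release(self):
        """Free the slot held by this instance, if any."""
        if self.held is not None:
            os.remove(self.held)
            self.held = None
```

The obvious weakness is a script that crashes while holding a slot: its file stays behind. The PID written into each slot file is there so that a cleanup step could detect dead owners, but that part is not shown.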
This question already has answers here:
How do I watch a file for changes?
(28 answers)
Closed 10 years ago.
I'm looking for a cross-platform file monitoring python package? I know it is possible to monitor files on windows using pywin32, and there are packages working on Linux/Unix but does anyone know about a cross-platform one?
I'm working on an MIT-licensed library that helps Python programs monitor file system events as portably as possible. There are differences that I'm trying to iron out. Highly alpha version at the moment:
Check it out here:
http://github.com/gorakhargosh/watchdog/
Patches and contributions are welcome.
For Unix/Linux based systems, you should use File Alteration Monitor Python bindings to libfam.
For Windows based systems, you should tie into the Win32 API FindFirstChangeNotification and related functions.
As for a cross-platform way, I don't know of a good one. I think it would be best to build a module yourself that detects which OS it is running on and then uses the matching one of the two methods above.
Also check out this option:
http://pypi.python.org/pypi/watchdog
Was used with a cross-platform app on Windows and OS X.
I found this link, which talks about your problem. Although it doesn't really provide a solution/library, I think it will help.
http://www.stepthreeprofit.com/2008/06/cross-platform-monitoring-of-filesystem.html
I don't think there is a cross-platform one yet, so you might want to roll your own.
I am inexperienced in this area so I am not really sure. I hope this helps.
Note
I stand corrected, gamin is available on cygwin as Adam Bernier pointed out to me in a comment. You may want to research other options on cygwin (if they exist).
The easiest way on Linux is to use inotifywait (given that your kernel is recent enough). You don't need any special bindings; inotifywait can be customized to print output lines on standard output in any way you want. Look at this question for a good example.
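As a sketch of driving inotifywait from Python, assuming the inotify-tools package is installed and `inotifywait` is on the PATH. The `|` separator in the format string and the helper names are choices made for this example, not anything inotifywait requires.

```python
import subprocess

# "%e" expands to the comma-separated event names, "%w%f" to the watched
# directory followed by the file name the event refers to.
OUTPUT_FORMAT = "%e|%w%f"


def build_command(path):
    """Command line that monitors `path` continuously (-m) and quietly (-q)."""
    return ["inotifywait", "-m", "-q",
            "-e", "modify", "-e", "create", "-e", "delete",
            "--format", OUTPUT_FORMAT, path]


def parse_line(line):
    """Split one output line into (list_of_events, filename)."""
    events, _, filename = line.rstrip("\n").partition("|")
    return events.split(","), filename


def watch(path):
    """Yield (events, filename) tuples as inotifywait reports them."""
    proc = subprocess.Popen(build_command(path), stdout=subprocess.PIPE,
                            universal_newlines=True)
    try:
        for line in proc.stdout:
            yield parse_line(line)
    finally:
        proc.terminate()
```

Looping over `watch("/some/dir")` blocks until the kernel reports a change, so there is no polling on the Python side.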