auto detect file in a folder - python

Sorry wasn't sure how to best word this question.
My scenario is that I have some python code (on a linux machine) that uses an xml file to acquire its arguements to perform a task, on completion of the task it disposes of the xml file and waits for another xml file to arrive to do it all over again.
I'm trying to find out the best way to be alerted an xml file has arrived in a specified folder.
On way would be to continually monitor the folder in the Python code, but that would mean a lot of excess resourses used while waiting for something to turn up (which may be as little as a few times a day). Another way, would be to set up a cronjob, but it's efficiency would't be any better than monitoring from within the code. An option I was hoping was possible would be to set up some sort of interrupt that would alert the code when an xml file appeared.
Any thoughts?
Thanks.

If you're looking for something "easy" to just run a specific script when new files arrive, the incron daemon provides a very handy combination of inotify(7) and cron(8)-like support for executing programs on demand.
If you want something a little better integrated into your application, or if you can't afford the constant fork(2) and execve(2) of the incron approach, then you should probably use the inotify(7) interface directly in your script. The pyinotify module can integrate with the underlying inotify(7) interfaces.

Related

Intentionally cause a read/write timeout?

I'm trying to test some file io and I was wondering if there's a way to emulate the following situation:
I have a block-storage device that is constantly being read/written from, but I want to notify the users of the proper error when they are trying to read/write from a file stored in the block-storage device but the block-storage service/device becomes unavailable or detached mid write. In which case, the read or write command would "timeout," or "hang."
I'm trying to write a test case that reads a file and I want to emulate that situation as closely as possible, meaning I don't want to use signal or just some timeout, I want to be able to make some kind of file that will hang a python file.read() statement or a file.write() statement.
Is this possible? I'm testing on a linux machine and mounting a blockstorage to a folder, pretty simple.
It seems to me that fsdisk is the right tool your looking for. It can bind your storage and inject errors.

Using the Output of Sysinternals Process Monitor in another programm/script in real time

I'm working on a script that should check on certain system events (like opening of a file, or changing of a registry key) and start further actions depending on that. But I haven't found a clean way to get the information into my script.
I'm looking for a way to get the output of Sysinternals Process Monitor into another program. This should happen without user interaction in close to real time; so saving into a CSV/XML and than using this doesn't work.
I've checked on using the backing file, but this is in the Process Monitor PML format, which i haven't found to be documented anywhere.
Does anybody know a way how I can get the output of Process Monitor into my script?
Or an other (not too messy) way to get a real time list of opened files, registry keys etc into a python program?
Thanks!
If you want to parse stdout or a file, and your ok with a 32 bit only solution, try Dr Strace or ntstrace.
YOu could also look into ospy or another ProcMon alternative. ospy is open source, so at the very least you could look at the source code for capturing events.
Here is a list of alternates to ProcMon.

Simplest way to have Python output, from a compiled package?

Prior info: I'm on a Mac.
Q: How can I get terminal-like text output from the program execution, if I compile it with py2app for redistribution?
My case is a program that copies a lot of big files and takes a while to process so I would like to at least have an output notification everytime each file is copied.
This is easy if I run it on the command line, I can just print a new line.
But when I make a self-sufficient package, it simply opens on the bottom dock, with no window, and closes upon completion.
A simple text window would be fine.
Thanks in advance.
If you want to create a simple text window, you need to pick a GUI framework to do that with. For something this simple, there's no reason not to use Tkinter (which comes with any Python) or PyObjC (which is pre-installed with Apple's Python 2.7), unless you happen to be more familiar with wx, gobject, Qt, etc.
At any rate, however you do it, you'll need to write a function that takes a message and appends it to the text window (maybe creating it lazily, if necessary), and call that function wherever you would normally print. You may also want to write and install a logging handler that does the same thing, so you can just log.info stuff. (You could instead create a file-like object that does this and redirect stdout and/or stderr, but unless you have no control over the printing code, that's going to be a lot more work.)
The only real problem here is that a GUI needs an event loop, and you probably just wrote your code as a sequential script.
One way around that is to turn your whole current script into a background thread. If you're using a GUI library that allows you to access the widgets from background threads, everything is easy; your printfunc just does textwidget.append(msg). If not, it may at least have a call_on_main_thread type function, so your printfunc does call_on_main_thread(textwidget.append, msg). If worst comes to worst (and I believe with Tkinter, it does), you have to create an explicit queue to push messages through, and write a queue handler in the event loop. This recipe should give you an idea. Replace the body of workerThread with your code, and end it with self.endApplication(). (There are probably better examples out there; this was just what I found first in a quick search.)
The other way around that is to have your code cooperatively operate with the event loop. Some libraries, like wx, have functions like SafeYield that make things work if you just call it after every chunk of processing. Others don't have that, but have a way to explicitly drive the event loop from your code. Others have neither—but every event loop framework has to have a way to schedule new events, so you can break your code up into a sequence of functions that each finish quickly and then do something like root.after_idle(nextfunc).
However… are you sure you need to do this?
First, any app, including one created by py2app, will send its stdout to the terminal if you run it with Foo.app/Contents/MacOS/Foo. And you can even set things up so that open Foo.app works that way, if you want. Obviously this doesn't help for people who just double-click the app in Finder (because then there is no terminal), but sometimes it's sufficient to just have to output available when people need it and know how to follow instructions.
And you can take this farther: Create a Foo.command file that just does something like $(dirname $0)/Foo.app/Contents/MacOS/Foo, and when you double-click that file, it launches Terminal.app and runs your script.
Or you can get even simpler: Just use logging to syslog the output, and if you want to see when each file is done, just watch the log messages go by in Console.app.
Finally, do you even need py2app in the first place? If you don't have any external dependencies, just rename you script to Foo.command, and double-clicking it will run it in Terminal.app. If you do have external dependencies, you might still be able to get away with bundling it all together as a folder with a .command in it instead of as a .app.
Obviously none of these ideas are exactly a professional or newbie-friendly way to build an interface, so if that matters, you will have to create a GUI.

which inotify event signals the completion of a large file operation?

for large files or slow connections, copying files may take some time.
using pyinotify, i have been watching for the IN_CREATE event code. but this seems to occur at the start of a file transfer. i need to know when a file is completely copied - it aint much use if it's only half there.
when a file transfer is finished and completed, what inotify event is fired?
IN_CLOSE probably means the write is complete. This isn't for sure since some applications are bad actors and open and close files constantly while working with them, but if you know the app you're dealing with (file transfer, etc.) and understand its' behaviour, you're probably fine. (Note, this doesn't mean the transfer completed successfully, obviously, it just means that the process that opened the file handle closed it).
IN_CLOSE catches both IN_CLOSE_WRITE and IN_CLOSE_NOWRITE, so make your own decisions about whether you want to just catch one of those. (You probably want them both - WRITE/NOWRITE have to do with file permissions and not whether any writes were actually made).
There is more documentation (although annoyingly, not this piece of information) in Documentation/filesystems/inotify.txt.
For my case I wanted to execute a script after a file was fully uploaded. I was using WinSCP which writes large files with a .filepart extension till done.
I first started modifying my script to ignore files if they're themselves ending with .filepart or if there's another file existing in the same directory with the same name but .filepart extension, hence that means the upload is not fully completed yet.
But then I noticed at the end of the upload, when all the parts have been finished, I have a IN_MOVED_IN notification getting triggered which helped me run my script exactly when I wanted it.
If you want to know how your file uploader behaves, add this to the incrontab:
/your/directory/ IN_ALL_EVENTS echo "$$ $# $# $% $&"
and then
tail -F /var/log/cron
and monitor all the events getting triggered to find out which one suits you best.
Good luck!
Why don't you add a dummy file at the end of the transfer? You can use the IN_CLOSE or IN_CREATE event code on the dummy. The important thing is that the dummy as to be transfered as the last file in the sequence.
I hope it'll help.

How do I watch a folder for changes and when changes are done using Python?

i need to watch a folder for incoming files. i did that with the following help:
How do I watch a file for changes?
the problem is that the files that are being moved are pretty big (10gb)
and i want to be notified when all files are done moving.
i tried comparing the size of the folder every 20 seconds but the file shows its correct size even tough windows shows that it is still moving.
i am using windows with python
i found a solution using open and waiting for an io exception.
if the file is still being moved i get errno 13.
You should take a look at this link:
http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html
There you can see the comparison of the method you are speaking about (simple polling) with two other windows-specific techniques which, in my opinion, offers a really better solution to your problem!
Otherwise, if you are using linux, there's iNotify and the relative Python wrapper:
Pyinotify is a pure Python module used
for monitoring filesystems events on
Linux platforms through inotify
Here: http://trac.dbzteam.org/pyinotify
If you have control over the process of importing the files, I would put a lock file when starting to copy files in, and remove it when you are done. by lock file I mean a tmp empty file, which is just there to indicate that you are coping a file. then your py script can check for the existence of the lock files.
You may be able to use os.stat() to monitor the mtime of the file. However be aware that under various network conditions, the copy may stall momentarily and so the mtime is not updated for a few seconds, so you need to make allowance for this.
Another option is to try opening the file with exclusive read/write which should fail under windows if the file is still opened by the other process
The most reliable method would be to write your own program to move the files.
try checking for the last-modified time change instead of the filesize during your poll.

Categories