Compiling a Python file with Watchman - python

What's the best way to capture file/path info from Watchman to pass
to 'make' or another app?
Here's what I'm trying to achieve:
When I save a .py file on the dev server, I'd like to retrieve the filename and path, compile the .py to .pyc, then transfer the .pyc file to a staging server.
Should I be using watchman-make, 'heredoc' methods, Ansible, etc.?
Because the docs are not very helpful, are there any examples available?
And what's the use case for pywatchman?
Thanks in advance.

Hopefully this will help clarify some things:
Watchman runs as a per-user service to monitor your filesystem. It can:
Provide live subscriptions to file changes as they occur
Trigger a command to be run in the background as file changes occur
Answer queries about how files have changed since a given point in time
pywatchman is a python client implementation that allows you to build applications that consume information from watchman. The watchman-make and watchman-wait tools are implemented using pywatchman.
watchman-make is a tool that helps you invoke make (or a similar program) when files change. It is most appropriate in cases where the program you want to run doesn't need the specific list of files that have just changed. make is in this category; make will analyze the dependencies in your Makefile and then rebuild only the pieces that have changed. You could alternatively execute a python distutils or setuptools setup.py script.
Native watchman triggers are a bit harder to use than watchman-make, as they are spawned in the background by the watchman service and are passed the list of changed files. These are most appropriate for completely unattended processes where you don't need to see the output and need the precise list of changed files.
From what you've described, it sounds like the simplest solution is a script that performs the compilation step and then performs the sync, something along the lines of the following; let's call it build-and-sync.sh
#!/bin/sh
# Byte-compile every .py file under the current directory, then sync the tree
# to the staging server (host:/path/ is a placeholder destination).
python -m compileall .
rsync -avz . host:/path/
(If you don't really need a .pyc file and just need to sync, then you can simply remove the python line from the above script and just let it run rsync)
You can then use watchman-make to execute this when things change:
watchman-make --make='build-and-sync.sh' -p '**/*.py' -t dummy
Then, after any .py file (or set of .py files) changes, watchman-make will execute build-and-sync.sh dummy. This should be sufficient unless you have a large enough number of python files that the compilation step takes too long each time you make a change. watchman-make will keep running until you hit CTRL-C or otherwise kill the process; it runs in the foreground in your terminal window unless you use something like nohup, tmux or screen to keep it around for longer.
If that is the case, then you can try using make with a pattern rule to compile only the changed python files, or if that is awkward to express using make then perhaps it is worth using pywatchman to establish a subscription and compile the changed files. This is a more advanced use-case and I'd suggest looking at the code for watchman-wait to see how that might be achieved. It may not be worth the additional effort for this unless you have a large number of files or very tight time constraints for syncing.
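For illustration, here is a rough sketch of that subscription approach, modeled on how watchman-wait drives pywatchman; the project root, the subscription name ("pycompile") and the lack of error handling are simplifications of my own rather than anything prescribed by watchman:
#!/usr/bin/env python
# Rough sketch only: subscribe to .py changes via pywatchman and byte-compile
# each changed file. Modeled on watchman-wait; the root path is a placeholder
# and error handling is omitted.
import os
import py_compile
import pywatchman

ROOT = "/path/to/project"  # placeholder project root

client = pywatchman.client(timeout=600)
watch = client.query("watch-project", ROOT)

query = {"expression": ["suffix", "py"], "fields": ["name"]}
if "relative_path" in watch:
    query["relative_root"] = watch["relative_path"]

client.query("subscribe", watch["watch"], "pycompile", query)

while True:
    try:
        client.receive()  # block until watchman reports changes
    except pywatchman.SocketTimeout:
        continue
    for result in client.getSubscription("pycompile") or []:
        for name in result.get("files", []):
            py_compile.compile(os.path.join(ROOT, name))
            # ...then copy/rsync the resulting .pyc to the staging server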
I'd recommend trying out the simplest solution first and see if that meets your needs before trying one of the more complex options.
Using native triggers
As an alternative, you can use triggers. These run in the background with their output going to the watchman log file. They are a bit harder to work with than using watchman-make.
You need to write a small program, typically a script, to receive the list of changed files from the trigger; the best way to do this is via stdin of the script. You can receive a list of files one-per-line or a JSON object with more structured information. Let's call this script trigger-build-and-sync; it is up to you to implement the contents of the script. Let's assume you just want a list of files on stdin.
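For illustration, a hypothetical trigger-build-and-sync written in Python might look like the sketch below; it assumes the NAME_PER_LINE stdin format configured in the trigger definition that follows, and the host:/path/ rsync destination is a placeholder:
#!/usr/bin/env python
# Hypothetical trigger-build-and-sync. With "stdin": "NAME_PER_LINE" in the
# trigger definition below, watchman writes one changed path per line to
# stdin, relative to the watched root, and runs this script from that root
# by default. host:/path/ is a placeholder destination.
import subprocess
import sys
import py_compile

changed = [line.strip() for line in sys.stdin if line.strip()]

pyc_files = []
for name in changed:
    if name.endswith(".py"):
        py_compile.compile(name)   # writes the .pyc next to the .py
        pyc_files.append(name + "c")

# Push only the freshly compiled files; -R preserves their relative paths.
if pyc_files:
    subprocess.call(["rsync", "-avzR"] + pyc_files + ["host:/path/"])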
This command will set up the trigger; you invoke it once and it will persist until the watch is removed:
watchman -j <<-EOT
["trigger", "/path/to/root", {
"name": "build-and-sync",
"expression": ["suffix", "py"],
"command": ["/path/to/trigger-build-and-sync"],
"append_files": false,
"stdin": "NAME_PER_LINE"
}]
EOT
The full docs for this can be found at https://facebook.github.io/watchman/docs/cmd/trigger.html#extended-syntax

Related

Can I read code from a previously run python script

I ran a python file using the command python file.py and it executed successfully. Right after, I managed to delete the file. Can I recover the code that was run in the previous command? I still have the terminal open and have not typed anything else into it. Running Ubuntu.
If you had imported the file instead of running it directly, a .pyc file would have been created containing the compiled bytecode, which you could easily transform back into regular python code (minus the comments). However, you did run the script directly, so there is no .pyc file.
If you deleted the file in a GUI file browser, the file might have been sent into a "trash bin" where you might be able to recover it from.
Assuming you deleted the file using the "rm" command in the terminal, however, the data might still be on the disk provided it hasn't been overwritten ("deleting" files normally just marks the data to be overwritten on the disk)
If you're lucky, you might be able to recover it. The process isn't exactly simple, though, and there's no guarantee that the file hasn't already been overwritten since the disk is pretty much in constant use when you're using the system. More info on that here:
https://help.ubuntu.com/community/DataRecovery
There's also a handy utility called lsof which you can use to recover a 'deleted' file if there's still an open file handle for it:
https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c00833030
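For illustration only, the copy step of that recovery looks roughly like this in Python on Linux; the pid and fd values are hypothetical placeholders you would read off the lsof output:
# Illustration only: if some process still holds the deleted file open, the
# data stays reachable via /proc on Linux. The pid and fd are hypothetical
# values taken from the lsof output (e.g. lsof | grep file.py).
import shutil

pid, fd = 12345, 3  # placeholders
shutil.copyfile("/proc/%d/fd/%d" % (pid, fd), "recovered_file.py")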
Also, in the future, I recommend using "rm -i" instead of just rm with no options as that will at least prompt you to confirm if you're sure you want to delete something first. You can also make an alias for this in your shell so that regular rm just points to 'rm -i'

Using python to run different files (Stata, .stc, etc.), as you would a .bat on Windows

In this guide to not being a total mess doing research, the authors talk about using a .py file to execute a directory in order -- that is, delete all the output files (.pdf, .txt, etc) and run just the .py and everything will be recreated from the raw data, stata files, maybe other .py's, etc etc.
What is the best way to do this in Python? I know one option is to use subprocesses, but is that the only option? Basically, how can I best mimic a .bat file using Python on a Mac.
You can certainly use Python for shell-script type stuff - with the bonus that it will be relatively portable.
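For example, a minimal driver along those lines could use the standard subprocess module; every path and command below is a placeholder for whatever your pipeline actually runs:
#!/usr/bin/env python
# Minimal .bat-style driver: wipe the generated outputs, then rerun each
# step of the pipeline in order. All paths and commands are placeholders.
import glob
import os
import subprocess

# Delete the output files so everything is rebuilt from the raw data.
for pattern in ("output/*.pdf", "output/*.txt"):
    for path in glob.glob(pattern):
        os.remove(path)

# Run each step in order; check_call raises if any step exits non-zero.
subprocess.check_call(["stata", "-b", "do", "clean_data.do"])
subprocess.check_call(["python", "make_tables.py"])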
Another option you could consider is Bash (the Bourne Again SHell). That will do everything you can do with .BAT files (and much more). Search for Bash shell scripting.
Whether Python or BASH is the right tool for the job depends on whether you're mostly just writing glue (to call a bunch of other programs) or if you're actually writing complex logic yourself. If it's the former, then I'd go with BASH.

execute python script when directory is not empty - Directory Monitoring

I have a python script that converts images and videos within a directory.
The problem: the python script runs fine when executed manually, but I need it to execute automatically when a file is dropped into the directory, on a Linux platform.
What would be the best way to set a python script to watch/monitor a directory?
I've looked into many options but not sure which one just simply sets the script to execute when files are dropped into a directory.
Thanks in advance.
The 'clean' way to do this is using the inotify system. There is the Pyinotify project if you want to use Python to interface with it.
You don't have to use inotify directly though - there are tools like incrond you can hook into. In fact, the person at that link looks to be trying to do something very similar to what you want - check it out.
Brute force, you could use watch, though that just runs a command periodically, not only when something changes.
Check out PyInotify
Or for an easier example:
PyInotify Tutorial
Use pyinotify:
https://github.com/seb-m/pyinotify
A tutorial is here: https://github.com/seb-m/pyinotify/wiki/Tutorial
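For reference, a minimal pyinotify loop along the lines of those tutorials might look like this; the watched directory and the call into your conversion script are placeholders:
import pyinotify

WATCH_DIR = "/path/to/incoming"  # placeholder: the drop directory

class Handler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        # Fires once a new file has been fully written into the directory.
        print("New file: %s" % event.pathname)
        # convert(event.pathname)  # call into your existing script here

wm = pyinotify.WatchManager()
wm.add_watch(WATCH_DIR, pyinotify.IN_CLOSE_WRITE)

notifier = pyinotify.Notifier(wm, Handler())
notifier.loop()  # blocks, dispatching events until interrupted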

Make Python ignore .pyc files

Is there a way to make Python ignore any .pyc files that are present and always interpret all the code (including imported modules) directly? Google hasn't turned up any answers, so I suspect not, but it seemed worth asking just in case.
(Why do I want to do this? I have a large pipeline of Python scripts which are run repeatedly over a cluster of a couple hundred computers. The Python scripts themselves live on a shared NFS filesystem. Somehow, rarely, after having been run hundreds of times over several hours, they will suddenly start crashing with an error about not being able to import a module. Forcing the regeneration of the .pyc file fixes the problem. I want, of course, to fix the underlying causes, but in the meantime we also need the system to continue running, so it seems like ignoring the .pyc files if possible would be a reasonable workaround).
P.S. I'm using Python 2.5, so I can't use -B.
You could use the standard Python library's imp module to reimplement __builtins__.__import__, which is the hook function called by import and from statements. In particular, the imp.load_module function can be used to load a .py even when the corresponding .pyc is present. Be sure to study carefully all the docs on the page I've pointed to, plus those for import, as it's kind of a delicate job. The docs themselves suggest using import hooks instead (per PEP 302), but for this particular task I suspect that would be even harder.
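A very rough sketch of that idea (Python 2.x, top-level modules only, no packages or dotted names) might look like the following; treat it as an illustration of the mechanism rather than a drop-in solution:
import __builtin__
import imp
import sys

_real_import = __builtin__.__import__

def _source_only_import(name, *args, **kwargs):
    # Leave dotted names, packages and already-imported modules to the
    # normal machinery; only intercept simple top-level modules.
    if "." in name or name in sys.modules:
        return _real_import(name, *args, **kwargs)
    try:
        fp, pathname, description = imp.find_module(name)
    except ImportError:
        return _real_import(name, *args, **kwargs)
    if description[2] != imp.PY_SOURCE:
        if fp:
            fp.close()
        return _real_import(name, *args, **kwargs)
    try:
        # Always (re)compiles from the .py source, so a stale .pyc is ignored.
        return imp.load_module(name, fp, pathname, description)
    finally:
        fp.close()

__builtin__.__import__ = _source_only_import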
BTW, likely causes for your observed problems include race conditions between different computers trying to write .pyc files at the same time -- NFS locking is notoriously flaky and has always been;-). As long as every Python compiler you're using is at the same version (if not, you're in big trouble anyway;-), I'd rather precompile all of those .py files into .pyc and make their directories read-only; the latter seems the simplest approach anyway (rather than hacking __import__), even if for some reason you can't precompile.
It's not exactly what you asked for, but would removing the existing .pyc files and then not creating any more work for you? In that case, you could use the -B option:
>python --help
usage: python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Options and arguments (and corresponding environment variables):
-B : don't write .py[co] files on import; also PYTHONDONTWRITEBYTECODE=x
In case anyone is using python 2.6 or above with the same question, the simplest thing to do is:
Delete all .pyc files
Run all your python interpreters with the -B option, so they won't generate .pyc files.
From the docs:
-B
If given, Python won’t try to write .pyc or .pyo files on the import of source modules. See also PYTHONDONTWRITEBYTECODE.
New in version 2.6.
If you can't delete all the .pycs, then you could:
1) Run all your python interpreters with the -B -O options.
This will tell python to look for .pyo files for bytecode instead of .pyc files (-O) and tell python not to generate any bytecode files (-B).
The combination of the two options, assuming you haven't used them before, is that Python won't generate any bytecode files and won't look for bytecode files that would have been generated by older runs.
From the docs:
-O
Turn on basic optimizations. This changes the filename extension for compiled (bytecode) files from .pyc to .pyo. See also PYTHONOPTIMIZE.
Perhaps you could work around this by, for example, scheduling a job to periodically shut down the scripts and delete the .pyc files.
Well, I don't think Python ever interprets code directly if you're loading the code from a file. Even when using the interactive shell, Python will compile the imported module into a .pyc.
That said, you could write a shell script to go ahead and delete all the .pyc files before launching your scripts. That would certainly force a full rebuild before every execution.
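If you prefer to keep everything in Python rather than shell, the same cleanup is a few lines of os.walk; the root directory below is a placeholder:
import os

# Remove all compiled bytecode under the tree before launching the pipeline.
for dirpath, dirnames, filenames in os.walk("/path/to/scripts"):
    for filename in filenames:
        if filename.endswith((".pyc", ".pyo")):
            os.remove(os.path.join(dirpath, filename))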
You may find PEP 3147 - PYC Repository Directories to be of great interest from Python 3.2 onwards.

Bundle additional executables with py2exe

I have a python script that calls out to two Sysinternals tools (sigcheck and accesschk). Is there a way I can bundle these executables into a py2exe so that subprocess.Popen can see it when it runs?
Full explanation: My script is made to execute over a network share (S:\share\my_script.exe) and it makes hundreds of calls to sigcheck and accesschk. If sigcheck and accesschk also reside on the server, they seem to get transferred to the host, called once, transferred to the host again, called a second time, and so on until the roughly 400-500 calls are complete.
I can probably fall back to copying these two executables to the host (C:) and then deleting them when I'm done... how would you solve this problem?
I could be wrong about this, but I don't believe this is what py2exe was intended for. It's more about what you're distributing than about how you're distributing. I think what you may be looking for is the option to create a windows installer. You could probably add the executables as data files or scripts using distutils.
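For example, a sketch of a py2exe setup.py that ships the two tools next to the generated .exe via data_files might look like this; the script name and tools/ paths are placeholders:
# setup.py -- sketch only; my_script.py and the tools/ paths are placeholders.
from distutils.core import setup
import py2exe  # registers the "py2exe" command with distutils

setup(
    console=["my_script.py"],
    # Each data_files entry is (destination dir inside dist/, [source files]);
    # "" drops the two tools into the same directory as my_script.exe.
    data_files=[("", ["tools/sigcheck.exe", "tools/accesschk.exe"])],
)
You would then build with python setup.py py2exe and have the script invoke the tools via a path relative to os.path.dirname(sys.executable) (which points at the frozen .exe's directory) rather than assuming they are on PATH or on the share.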
Arggg, why can't my data_files just get bundled into the zipfile?
I've started using paver for this kind of thing. It makes it really easy to override commands or create new commands that will allow you to put some new files into the sdist.
