Is there a way to make Python ignore any .pyc files that are present and always interpret all the code (including imported modules) directly? Google hasn't turned up any answers, so I suspect not, but it seemed worth asking just in case.
(Why do I want to do this? I have a large pipeline of Python scripts which are run repeatedly over a cluster of a couple hundred computers. The Python scripts themselves live on a shared NFS filesystem. Somehow, rarely, after having been run hundreds of times over several hours, they will suddenly start crashing with an error about not being able to import a module. Forcing the regeneration of the .pyc file fixes the problem. I want, of course, to fix the underlying causes, but in the meantime we also need the system to continue running, so it seems like ignoring the .pyc files if possible would be a reasonable workaround).
P.S. I'm using Python 2.5, so I can't use -B.
You could use the standard Python library's imp module to reimplement __builtins__.__import__, the hook function invoked by the import and from statements. In particular, the imp.load_module function can be used to load a .py even when the corresponding .pyc is present. Be sure to study carefully all the docs on the page I've pointed to, plus those for import, as it's kind of a delicate job. The docs themselves suggest using import hooks instead (per PEP 302), but for this particular task I suspect that would be even harder.
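For what it's worth, here is a minimal Python 2 sketch of that idea, not a drop-in implementation: it only intercepts plain top-level imports and leaves packages, dotted names and relative imports to the normal machinery, and the function name is mine. It compiles the .py text itself so that a stale .pyc is never consulted.

import imp
import sys
import __builtin__

_real_import = __builtin__.__import__

def _import_from_source(name, globals=None, locals=None, fromlist=None, level=-1):
    # Only intercept plain, not-yet-loaded, top-level absolute imports;
    # everything else falls back to the normal machinery.
    if '.' in name or name in sys.modules or level > 0:
        return _real_import(name, globals, locals, fromlist, level)
    try:
        fp, pathname, description = imp.find_module(name)
    except ImportError:
        return _real_import(name, globals, locals, fromlist, level)
    try:
        suffix, mode, kind = description
        if kind != imp.PY_SOURCE:
            # Builtins, C extensions, .pyc-only modules: let imp handle them.
            return imp.load_module(name, fp, pathname, description)
        # Compile the .py text ourselves so any cached .pyc is never consulted.
        source = fp.read()
        if not source.endswith('\n'):
            source += '\n'
        code = compile(source, pathname, 'exec')
        module = imp.new_module(name)
        module.__file__ = pathname
        sys.modules[name] = module   # cleanup on failure omitted in this sketch
        exec code in module.__dict__
        return module
    finally:
        if fp:
            fp.close()

__builtin__.__import__ = _import_from_source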
BTW, likely causes for your observed problems include race conditions between different computers trying to write .pyc files at the same time -- NFS locking is notoriously flaky and always has been;-). As long as every Python interpreter you're using is at the same version (if not, you're in big trouble anyway;-), I'd rather precompile all of those .py files into .pyc files and make their directories read-only; the latter seems the simplest approach anyway (rather than hacking __import__), even if for some reason you can't precompile.
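If the precompile-once approach works for you, the standard library's compileall module does the bulk of the work. A rough sketch (the NFS path is a placeholder), to be run once from a machine with write access, using the same interpreter version as the cluster, after which you'd mark the tree read-only (e.g. chmod -R a-w):

import compileall
# Recompile every .py under the shared tree, even if a .pyc already exists.
compileall.compile_dir('/nfs/shared/pipeline', force=True)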
It's not exactly what you asked for, but would removing the existing .pyc files, and then not creating any more, work for you? In that case, you could use the -B option:
>python --help
usage: python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Options and arguments (and corresponding environment variables):
-B : don't write .py[co] files on import; also PYTHONDONTWRITEBYTECODE=x
In case anyone is using Python 2.6 or above with the same question, the simplest thing to do is:
1) Delete all .pyc files.
2) Run all your Python interpreters with the -B option, so they won't generate .pyc files.
From the docs:
-B
If given, Python won’t try to write .pyc or .pyo files on the import of source modules. See also PYTHONDONTWRITEBYTECODE.
New in version 2.6.
If you can't delete all the .pycs, then you could:
Run all your Python interpreters with the -B -O options.
This will tell Python to look for .pyo files for bytecode instead of .pyc files (-O), and tell Python not to generate any bytecode files (-B).
The effect of combining the two options, assuming you haven't used them before, is that Python won't generate any bytecode files and won't find the bytecode files that would have been generated by older runs.
From the docs:
-B
If given, Python won’t try to write .pyc or .pyo files on the import of source modules. See also PYTHONDONTWRITEBYTECODE.
New in version 2.6.
-O
Turn on basic optimizations. This changes the filename extension for compiled (bytecode) files from .pyc to .pyo. See also PYTHONOPTIMIZE.
Perhaps you could work around this by, for example, scheduling a job to periodically shut down the scripts and delete the .pyc files.
Well, I don't think Python ever interprets code directly when it loads that code from a file. Even when you're using the interactive shell, Python compiles any imported module into a .pyc.
That said, you could write a shell script to go ahead and delete all the .pyc files before launching your scripts. That would certainly force a full rebuild before every execution.
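If you'd rather keep everything in Python, a wrapper along these lines would do the same job (the directory and entry-point names below are placeholders, not anything from the question):

import os
import subprocess
import sys

PIPELINE_DIR = '/nfs/shared/pipeline'   # placeholder path
# Remove every stale .pyc under the tree before handing off to the real script.
for dirpath, dirnames, filenames in os.walk(PIPELINE_DIR):
    for filename in filenames:
        if filename.endswith('.pyc'):
            os.remove(os.path.join(dirpath, filename))
sys.exit(subprocess.call([sys.executable, os.path.join(PIPELINE_DIR, 'run_pipeline.py')] + sys.argv[1:]))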
You may find PEP 3147 - PYC Repository Directories to be of great interest from Python 3.2 onwards.
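In short, from 3.2 onwards the cached bytecode moves into a __pycache__ directory with interpreter-tagged file names, so different Python versions stop fighting over a single foo.pyc. A quick way to see where the cache for a given source file would live (importlib.util.cache_from_source exists from Python 3.4; 3.2 and 3.3 had imp.cache_from_source instead):

import importlib.util
# For a hypothetical foo.py, prints something like '__pycache__/foo.cpython-34.pyc',
# depending on which interpreter runs this.
print(importlib.util.cache_from_source('foo.py'))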
Related
This answer tells me that a .pyc file gets created when a .py file is run, which I understand saves loading time on re-runs. That makes me wonder what the point of the .py file is once the .pyc is created.
When backing up my code, or sharing it, I don't want to include redundant or extraneous files. Which filetype should I focus on?
Side question: I have one script that calls another. After running them, the called script got a .pyc file written, but the master script that does the calling did not. Why would that be?
Python .pyc files are generated when a module is imported, not when a top level script is run. I'm not sure what you mean by calling, but if you ran your master script from the command line and it imported the other script, then only the imported one gets a .pyc.
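A tiny self-contained illustration of that distinction (all file names here are made up): on Python 2, or Python 3 before 3.2, the imported module gains a .pyc next to its source while the top-level script does not (on 3.2+ the cache lands in __pycache__ instead).

import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()
open(os.path.join(workdir, 'helper.py'), 'w').write('def bar():\n    return 42\n')
open(os.path.join(workdir, 'master.py'), 'w').write('import helper\nprint(helper.bar())\n')
subprocess.call([sys.executable, 'master.py'], cwd=workdir)
print(sorted(os.listdir(workdir)))   # expect a helper.pyc but no master.pyc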
As for distributing .pyc files, they are minor-version sensitive. If you bundle your own Python, or distribute separate files for each Python version, then maybe. But the best practice is to distribute the .py files.
Python's script and module rules seem a bit odd until you consider its installation model. A common installation model is that executables are installed somewhere on the system's PATH and shared libraries are installed somewhere in a library path.
Python's setup.py does the same thing. Top-level scripts go on the PATH, but modules and packages go in a library path. For instance, on my system, pdb3 (a top-level script) is at /usr/bin/pdb3 and os (an imported module) is at /usr/lib/python3.4/os.py. Suppose Python compiled pdb3 to pdb3.pyc. I'd still invoke pdb3, so the .pyc would be useless. So why clutter the PATH?
It's common for installs to run as root or administrator, so you have write access on those paths. But you wouldn't have write access to them later as a regular user. You can have setup.py generate .pyc files during install. You get the right .pyc files for whatever Python you happen to have, and since you are running as root/admin during install, you still have access to the directories. Trying to build .pyc files later is a problem, because a regular user doesn't have access to those directories.
So, best practice is to distribute .py files and have setup.py build the .pyc during install.
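A minimal setup.py sketch of that arrangement (the project, package and script names are placeholders); the distutils install step byte-compiles pure-Python modules and packages by default, with whatever privileges the install itself runs under:

from distutils.core import setup

setup(
    name='mypipeline',
    version='0.1',
    packages=['mypipeline'],       # importable code: goes into the library path and gets byte-compiled
    scripts=['bin/run-pipeline'],  # top-level script: goes onto the PATH, no .pyc needed
)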
If you simply want to run your Python script, all you really need is the .pyc file, which contains the bytecode generated from your source code. See here for details on running a .pyc file. I will warn that some of the details are a bit twisty.
However, I recommend including your source code and leaving out your .pyc files, as they are generated automatically by the Python interpreter. Besides, if you or anyone else wants to revise or revisit your source code at a later point, you will need the .py files. Furthermore, it is usually best practice to just include your source code.
After three intensive hours, I was testing my script in the terminal. However, my editor messed up and overwrote my script while it was still executing in the terminal. I didn't terminate the running script, so I'm wondering: does the Python interpreter keep the currently running file in a temporary folder or somewhere else, so that I can recover my script?
Python tries to cache your .pyc files. How that's done has changed over time (see PEP 3147 -- PYC Repository Directories). Top-level scripts are not cached, but imported ones are. So you may not have one.
.pyc files are compiled bytecode, so it's not just a matter of renaming one to .py, and you can't figure them out just by looking at them. There are decompilers out there, like the one referenced here:
Decompiling .pyc files.
Personally, I create Mercurial repos for my scripts and check them in frequently, because I've made a similar mistake a time or two. Git, SVN, etc. are other popular tools for maintaining repos.
Depending on your operating system and editor, you may have a copy in Trash or even saved by the editor. You may also be able to "roll back" the file system.
If you're running Linux, you may still be able to find a handle to the open file in the /proc/ directory if the process is still running. This handle will keep the file from being deleted. For details, see: https://superuser.com/questions/283102/how-to-recover-deleted-file-if-it-is-still-opened-by-some-process
Say I run a Python (2.7, though I'm not sure that makes a difference here) script. Instead of terminating the script, I tab out, or somehow switch back to my editing environment. I can then modify the script and save it, but this changes nothing in the still-running script.
Does Python load all source files into memory completely at launch? I am under the impression that this is how the Python interpreter works, but this contradicts my other views of the Python interpreter: I have heard that .pyc files serve as bytecode for Python's virtual machine, like .class files in Java. At the same time, however, some (very few, in my understanding) implementations of Python also use just-in-time compilation techniques.
So am I correct in thinking that if I make a change to a .py file while my script is running, I don't see that change until I re-run the script, because at launch all necessary .py files are compiled into .pyc files, and simply modifying the .py files does not remake the .pyc files?
If that is correct, then why don't huge programs, like the one I'm working on with ~6,550 kilobytes of source code distributed over 20+ .py files, take forever to compile at startup? How is the program itself so fast?
Additional Info:
I am not using third-party modules. All of the files have been written locally. The main source file is relatively small (10 kB), but the source file I primarily work on is 65 kB. It was also written locally and changes every time before launch.
Python loads the main script into memory, compiles it into bytecode and runs that. If you modify the source file in the meantime, you're not affecting the bytecode.
If you're running the script as the main script (i.e. by invoking it like python myfile.py), then the bytecode will be discarded when the script exits.
If you're importing the script, however, then the bytecode will be written to disk as a .pyc file which won't be recompiled when imported again, unless you modify the corresponding .py file.
Your big 6.5 MB program consists of many modules which are imported by the (probably small) main script, so only that script has to be compiled on each run. All the other files will already have their .pyc files ready to load.
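A rough way to see this from inside a running Python 2 program: modules loaded from cached bytecode report the .pyc path in __file__, while the top-level script (__main__) keeps its .py path.

import sys

# List every loaded module that came from a file, along with the path it was loaded from.
for name in sorted(sys.modules):
    module = sys.modules[name]
    path = getattr(module, '__file__', None)
    if path:
        print('%s -> %s' % (name, path))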
First of all, you are indeed correct in your understanding that changes to a Python source file aren't seen by the interpreter until the next run. There are some debugging systems, usually built for proprietary purposes, that allow you to reload modules, but this brings attendant complexities, such as existing objects retaining references to code from the old module. It can get really ugly.
The reason huge programs start up so quickly is that the interpreter creates a .pyc file for every .py file it imports whenever no corresponding .pyc file exists or the .py is newer. The .pyc is indeed the program compiled into bytecode, so it's relatively quick to load.
As far as JIT compilation goes, you may be thinking of the PyPy implementation, which is written in Python and has backends in several different languages. It's increasingly being used in Python 2 shops where execution speed is important, but it's a long way from the CPython that we all know and love.
I understand that ".pyc" files are compiled versions of the plain-text ".py" files, created at runtime to make programs run faster. However, I have observed a few things:
Upon modification of ".py" files, program behavior changes. This indicates that the ".py" files are recompiled, or at least go through some sort of hashing process or timestamp comparison in order to tell whether or not they should be recompiled.
Upon deleting all ".pyc" files (rm *.pyc), program behavior will sometimes change, which would indicate that they are not being recompiled on every update of the ".py" files.
Questions:
How do they decide when to be compiled?
Is there a way to ensure that they have stricter checking during development?
The .pyc files are created (and possibly overwritten) only when the corresponding Python file is imported by some other script. On import, Python checks whether the .pyc file's internal timestamp is up to date with respect to the corresponding .py file. If it is, Python loads the .pyc; if it isn't, or if the .pyc does not yet exist, Python compiles the .py file into a .pyc and loads that.
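A rough sketch (Python 2) of what that check looks like on disk: the .pyc header stores a magic number identifying the bytecode format, followed by a 32-bit copy of the source file's modification time. The helper name below is mine, not a standard function.

import imp
import os
import struct

def pyc_looks_fresh(py_path, pyc_path):
    # Read the 4-byte magic number and the 4-byte source mtime from the .pyc header.
    f = open(pyc_path, 'rb')
    try:
        magic = f.read(4)
        stored_mtime = struct.unpack('<I', f.read(4))[0]
    finally:
        f.close()
    source_mtime = int(os.path.getmtime(py_path)) & 0xFFFFFFFF
    return magic == imp.get_magic() and stored_mtime == source_mtime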
What do you mean by "stricter checking"?
.pyc files are generated whenever the corresponding code files are imported, and are updated if the corresponding code files have changed since. If the .pyc files are deleted, they will be automatically regenerated. However, they are not automatically deleted when the corresponding code files are deleted.
This can cause some really fun bugs during file-level refactors.
First of all, you can end up pushing code that only works on your machine and on no one else's. If you have dangling references to files you deleted, these will still work locally if you don't manually delete the relevant .pyc files, because .pyc files can be used in imports. This is compounded by the fact that a properly configured version control system will only push .py files to the central repository, not .pyc files, meaning that your code can pass the "import test" (does everything import okay) just fine and still not work on anyone else's computer.
Second, you can have some pretty terrible bugs if you turn packages into modules. When you convert a package (a folder with an __init__.py file) into a module (a .py file), the .pyc files that once represented that package remain. In particular, the __init__.pyc remains. So, if you have the package foo with some code that doesn't matter, then later delete that package and create a file foo.py with some function def bar(): pass and run:
from foo import bar
you get:
ImportError: cannot import name bar
because Python is still using the old .pyc files from the foo package, none of which define bar. This can be especially problematic on a web server, where fully functional code can break because of stale .pyc files.
As a result of both of these reasons (and possibly others), your deployment code and testing code should delete .pyc files, such as with the following line of bash:
find . -name '*.pyc' -delete
Also, as of python 2.6, you can run python with the -B flag to not use .pyc files. See How to avoid .pyc files? for more details.
See also: How do I remove all .pyc files from a project?
I'm pretty new to Vim and I just set it up so that I can write Python code, with code completion, folding, etc. and am able to compile it also with a shortcut via plug-ins.
The thing is, I would also like to write some HTML/CSS in Vim as well, and I'd like to install some similar plug-ins. I know that I could do this and configure different shortcuts for each language, but I'd like to set it up as two separate workspaces, so that I am either working in my Python workspace or my HTML workspace but not both. Is there any way to do this? Thanks in advance!
That's what compiler scripts are for!
The idea is to put a "compiler script" in your Vim's compiler directory. That script is actually a settings file (the difference between script files and settings files in Vim is only conceptual - technically they are the same), just like your .vimrc file. That script should contain configurations that are only loaded when you want them to be. For example, :compiler python loads your Python settings.
Check out :help compiler for more info.
There are also "filetype plugins" - the main difference between them and compilers is that they are loaded automatically by Vim's filetype detection mechanism - which is actually an extensive set of scripts that can detect pretty much any filetype, unless you use an exotic language or define your own extension, and even then you can extend that mechanism with your own ftdetect scripts. This is different from compiler scripts, which you need to call explicitly via a :compiler command, or via an :autocmd that calls the :compiler command.
Check out :help filetype for more info.
Compiler scripts are more suitable for compiler-specific settings like make settings and build/run shortcuts. It makes sense to build a C program the same way whether you are in a .c or .h file, in the makefile, or in one of the program's resource text files.
Filetype scripts are more suitable for filetype-specific settings like syntax or code completion. It doesn't make sense to use C syntax and code completion for a C program's makefile or .ini file.
That said - for interpreted languages it doesn't really matter (unless you use a makefile to run them).