Is it possible to have Python save the .pyc files to a separate folder location that is in sys.path? That is, go from:
/code
    foo.py
    foo.pyc
    bar.py
    bar.pyc
To:
/code
    foo.py
    bar.py
/code_compiled
    foo.pyc
    bar.pyc
I would like this because I feel it'd be more organized. Thanks for any help you can give me.
Update:
In Python 3.8, the -X pycache_prefix=PATH command-line option enables writing .pyc files to a parallel tree rooted at the given directory instead of to the code tree. See the $PYTHONPYCACHEPREFIX environment variable (credit: @RobertT's answer).
The location of the cache is reported in sys.pycache_prefix (None indicates the default location in __pycache__ [since Python 3.2] subdirectories).
To turn off caching of compiled Python bytecode, set -B; Python then won't try to write .pyc files when importing source modules. See the $PYTHONDONTWRITEBYTECODE environment variable (credit: @Maleev's answer).
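A minimal sketch of the Python 3.8+ prefix option, assuming a throwaway foo.py in a temporary directory (the helper names import_with_prefix and demo are hypothetical): a child interpreter imports foo with -X pycache_prefix, and foo's .pyc then shows up under the prefix instead of in a __pycache__ folder beside the source.

```python
import os
import pathlib
import subprocess
import sys
import tempfile


def import_with_prefix(src_dir, prefix, module="foo"):
    """Import `module` from src_dir in a child interpreter started with
    -X pycache_prefix=prefix; return that module's .pyc files under prefix."""
    # Drop variables that would disable bytecode writing or override the prefix.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONDONTWRITEBYTECODE", "PYTHONPYCACHEPREFIX")}
    code = f"import sys; sys.path.insert(0, {str(src_dir)!r}); import {module}"
    subprocess.run([sys.executable, "-X", f"pycache_prefix={prefix}", "-c", code],
                   env=env, check=True)
    # Modules imported at interpreter startup may be cached under the prefix
    # too, so keep only files belonging to `module`.
    return sorted(pathlib.Path(prefix).rglob(f"{module}.*.pyc"))


def demo():
    """Return (names of foo's cached files, whether src/__pycache__ exists)."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as cache:
        (pathlib.Path(src) / "foo.py").write_text("x = 1\n")
        pycs = import_with_prefix(src, cache)
        return [p.name for p in pycs], (pathlib.Path(src) / "__pycache__").exists()
```

Calling demo() should give a single foo.<tag>.pyc name and False for the local __pycache__ directory.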
Old [Python 2] answer:
There is PEP 304: Controlling Generation of Bytecode Files. Its status is Withdrawn and the corresponding patch was rejected, so there might be no direct way to do it.
If you don't need source code then you may just delete *.py files. *.pyc files can be used as is or packed in an egg.
In the dark and ancient days of 2003, PEP 304 came forth to challenge this problem. Its patch was found wanting. Environment variable platform dependencies and version skews ripped it to shreds and left its bits scattered across the wastelands.
After years of suffering, a new challenger rose in the last days of 2009. Barry Warsaw summoned PEP 3147 and sent it to do battle, wielding a simple weapon with skill. The PEP crushed the cluttering PYC files, silenced the warring Unladen Swallow and CPython interpreters, each trying to argue its PYC file should be triumphant, and allowed Python to rest easy with its dead ghosts occasionally running in the dead of night. PEP 3147 was found worthy by the dictator and was knighted into the official rolls in the days of 3.2.
As of 3.2, Python stores a module's PYC files in __pycache__ under the module's directory. Each PYC file's name includes the name and version of the interpreter, e.g., __pycache__/foo.cpython-33.pyc. You might also have a __pycache__/foo.cpython-32.pyc compiled by an earlier version of Python. The right magic happens: the correct one is used and recompiled if out of sync with the source code. At runtime, look at the module's mymodule.__cached__ for the pyc filename and compare its tag with imp.get_tag() (sys.implementation.cache_tag in newer versions; the imp module was removed in Python 3.12). See the What's New section for more information.
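A short sketch of the naming scheme on Python 3.4+, using importlib.util.cache_from_source() and sys.implementation.cache_tag (the modern counterpart of imp.get_tag()); code/foo.py is a hypothetical path that need not exist, and tag_of is an illustrative helper.

```python
import importlib.util
import os
import sys


def tag_of(cached_path):
    """Extract the interpreter tag from a PEP 3147 file name such as
    'foo.cpython-33.pyc' or 'foo.cpython-33.opt-1.pyc'."""
    return os.path.basename(cached_path).split(".")[1]


# Cache path this interpreter would use for a hypothetical code/foo.py.
# optimization="" ignores any -O flag, so the name carries no opt- tag;
# with default settings on CPython 3.12 it is code/__pycache__/foo.cpython-312.pyc.
foo_cache = importlib.util.cache_from_source(
    os.path.join("code", "foo.py"), optimization="")

# sys.implementation.cache_tag (3.3+) is what imp.get_tag() used to return.
current_tag = sys.implementation.cache_tag
```

For a module already imported from source, tag_of(mymodule.__cached__) should match current_tag.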
TL;DR - Just works in Python 3.2 and above. Poor hacks substitute for versions before that.
Almost ten years later, Python 3.8 finally provides support for keeping bytecode in a separate parallel filesystem tree, either by setting the environment variable PYTHONPYCACHEPREFIX or by using the -X pycache_prefix=PATH argument (see the official documentation).
If you're willing to sacrifice bytecode generation altogether for it, there's a command line flag:
python -B file_that_imports_others.py
This can be put into your IDE's build/run preferences.
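A minimal check of the -B flag, assuming a throwaway bar.py in a temporary directory (leaves_pycache is a hypothetical helper): the same import runs in a child interpreter with and without -B, and only the plain run should leave a __pycache__ directory behind.

```python
import os
import pathlib
import subprocess
import sys
import tempfile


def leaves_pycache(flags):
    """Import a throwaway bar.py with the given interpreter flags and report
    whether a __pycache__ directory was created next to it."""
    # Remove variables that would change where (or whether) bytecode is written.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONDONTWRITEBYTECODE", "PYTHONPYCACHEPREFIX")}
    with tempfile.TemporaryDirectory() as src:
        (pathlib.Path(src) / "bar.py").write_text("y = 2\n")
        code = f"import sys; sys.path.insert(0, {src!r}); import bar"
        subprocess.run([sys.executable, *flags, "-c", code], env=env, check=True)
        return (pathlib.Path(src) / "__pycache__").exists()
```

leaves_pycache([]) is expected to be True and leaves_pycache(["-B"]) False.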
I agree, distributing your code as an egg is a great way to keep it organized. What could be more organized than a single file containing all of the code and metadata you would ever need? Changing the way the bytecode compiler works is only going to cause confusion.
If you really do not like the location of those pyc files, an alternative is to run from a read-only folder. Since python will not be able to write, no pyc files ever get made. The hit you take is that every python file will have to be re-compiled as soon as it is loaded, regardless of whether you have changed it or not. That means your start-up time will be a lot worse.
I disagree. The reasons are wrong or at least not well formulated; but the direction is valid. There are good reasons for being able to segregate source code from compiled objects. Here are a few of them (all of them I have run into at one point or another):
embedded device reading off a ROM, but able to use an in memory filesystem on RAM.
multi-os dev environment means sharing (with samba/nfs/whatever) my working directory and building on multiple platforms.
commercial company wishes to only distribute pyc to protect the IP
easily run test suite for multiple versions of python using the same working directory
more easily clean up transitional files (rm -rf $OBJECT_DIR as opposed to find . -name '*.pyc' -exec rm -f {} \;)
There are workarounds for all these problems, BUT they are mostly workarounds NOT solutions. The proper solution in most of these cases would be for the software to accept an alternative location for storing and lookup of these transitional files.
Since Python 3.2, PEP 3147 has been implemented: all .pyc files are generated inside a __pycache__ directory (there will be a __pycache__ directory for each directory where you have Python files, and it will hold .pyc files for each version of Python used on the sources).
At the time of writing there was an ongoing PEP (PEP 3147) that would enable writing bytecode to a magic directory.
Basically, all Python files would be compiled into a __pycache__ directory; the PEP was accepted and shipped in Python 3.2.
For Python 3.8 or higher:
The PYTHONPYCACHEPREFIX setting (also available as -X pycache_prefix) configures the implicit bytecode cache to use a separate filesystem tree, rather than the default __pycache__ subdirectories within each source directory.
The location of the cache is reported in sys.pycache_prefix (None indicates the default location in __pycache__ subdirectories).
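A sketch of the environment-variable form, assuming Python 3.8+ (reported_prefix is a hypothetical helper): it starts a child interpreter with or without PYTHONPYCACHEPREFIX and returns what the child reports as sys.pycache_prefix.

```python
import ast
import os
import subprocess
import sys
import tempfile


def reported_prefix(prefix=None):
    """Start a child interpreter with PYTHONPYCACHEPREFIX set to `prefix`
    (unset when None) and return its sys.pycache_prefix value."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPYCACHEPREFIX"}
    if prefix is not None:
        env["PYTHONPYCACHEPREFIX"] = prefix
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(repr(sys.pycache_prefix))"],
        env=env, capture_output=True, text=True, check=True,
    ).stdout.strip()
    # repr() + literal_eval round-trips both None and path strings safely.
    return ast.literal_eval(out)
```

With the variable unset the child should report None; with it set, the child reports the directory it will use as the cache root.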
"I feel it'd be more organized" Why? How? What are you trying to accomplish?
The point of saving the compiler output is to save a tiny bit of load time when the module gets imported. Why make this more complex? If you don't like the .pyc's, then run a "delete all the .pyc's" script periodically.
They aren't essential; they're helpful. Why turn off that help?
This isn't C, C++ or Java where the resulting objects are essential. This is just a cache that Python happens to use. We mark them as "ignored" in Subversion so they don't accidentally wind up getting checked in.
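A minimal "delete all the .pyc's" script, assuming you want to clear both Python 3 __pycache__ directories and legacy .pyc/.pyo files sitting beside the sources; clean_bytecode and its root argument are hypothetical names.

```python
import pathlib
import shutil


def clean_bytecode(root):
    """Delete __pycache__ directories and stray *.pyc / *.pyo files under root.
    Returns how many directories and files were removed."""
    root = pathlib.Path(root)
    removed = 0
    # Collect matches first so the tree is not modified while it is walked.
    for cache in [p for p in root.rglob("__pycache__") if p.is_dir()]:
        if cache.exists():  # may already be gone if nested in another cache
            shutil.rmtree(cache)
            removed += 1
    for pattern in ("*.pyc", "*.pyo"):
        for f in list(root.rglob(pattern)):
            f.unlink()
            removed += 1
    return removed
```

Since Python regenerates these files on the next import, running this periodically loses nothing but a little load time.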
I'm new to PyCharm/Python, and can't figure out where the IDE stores compiled python *.pyc files.
Coming from the IntelliJ world, it is strange that I don't see any menu options to re-build the project, or build individual files.
I'm also unable to find any pyc files while searching the project directory, so basically, I've no idea whether successful compilation has happened at all, although the GitHub imported project is error free.
What can I do here?
Because most Python implementations are interpreted rather than compiled, the compilation step happens when you run the code. This is why the PyCharm UI features a prominent "Run" button (▶️) but no compile button.
It is true that for CPython there is a compilation step which compiles from the Python code to bytecode, but this is an implementation detail. CPython 3 stores its cached compilation results in .pyc files in a directory called __pycache__. These files are automatically generated when a module is imported (import module results in a file such as __pycache__/module.cpython-312.pyc) but not for the script you run directly.
Lastly, as per @shmee's comment, it is possible to compile a source file with the py_compile module, but I should emphasise that this is not usually done or necessary.
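For reference, a minimal sketch of compiling one file explicitly with py_compile.compile(), assuming a throwaway spam.py in a temporary directory (compile_one is a hypothetical helper). With doraise=True a syntax error raises py_compile.PyCompileError, and the return value is the path of the written .pyc.

```python
import pathlib
import py_compile
import tempfile


def compile_one(source_text, name="spam"):
    """Write source_text to <tmp>/<name>.py, byte-compile it, and return the
    .pyc file name and its size (the temp directory is removed afterwards)."""
    with tempfile.TemporaryDirectory() as d:
        src = pathlib.Path(d) / f"{name}.py"
        src.write_text(source_text)
        # doraise=True turns syntax errors into py_compile.PyCompileError.
        pyc = pathlib.Path(py_compile.compile(str(src), doraise=True))
        return pyc.name, pyc.stat().st_size
```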
Now, if you are worried about checking that your code is correct, in the interpreted language world we rely more strongly on testing. I would recommend that you investigate tests for your code (using pytest and the excellent test integration in PyCharm).
Let me begin with a bit on terminology:
Python is a programming language. It's "just" the programming language specification.
CPython is the reference implementation of the Python language. It's actually just one of several different Python interpreters. CPython itself works (let's call it an implementation detail) by translating (but you could also say compiling) the code in imported Python files/modules to bytecode and then executing that bytecode. It stores the translation as .pyc files alongside the source (in a __pycache__ subfolder since Python 3.2) to make subsequent imports faster, but that's specific to CPython and can also be disabled.
PyCharm is an integrated development environment. However, it requires you to "Configure a Python Interpreter" to run Python code.
That means that PyCharm isn't responsible for creating .pyc files. If you configured a non-CPython interpreter or used the environment variable (PYTHONDONTWRITEBYTECODE) to disable .pyc file creation, there won't be any .pyc files.
But if you used an appropriate CPython interpreter in PyCharm it will create .pyc files for the files/modules you successfully imported. That means you actually have to import or otherwise run the Python files in your project to get the .pyc files.
Actually the Python documentation contains a note about the "compiled" Python files:
To speed up loading modules, Python caches the compiled version of each module in the __pycache__ directory under the name module.version.pyc, where the version encodes the format of the compiled file; it generally contains the Python version number. For example, in CPython release 3.3 the compiled version of spam.py would be cached as __pycache__/spam.cpython-33.pyc. This naming convention allows compiled modules from different releases and different versions of Python to coexist.
Python checks the modification date of the source against the compiled version to see if it’s out of date and needs to be recompiled. This is a completely automatic process. Also, the compiled modules are platform-independent, so the same library can be shared among systems with different architectures.
Python does not check the cache in two circumstances. First, it always recompiles and does not store the result for the module that’s loaded directly from the command line. Second, it does not check the cache if there is no source module. To support a non-source (compiled only) distribution, the compiled module must be in the source directory, and there must not be a source module.
Some tips for experts:
You can use the -O or -OO switches on the Python command to reduce the size of a compiled module. The -O switch removes assert statements, the -OO switch removes both assert statements and doc strings. Since some programs may rely on having these available, you should only use this option if you know what you’re doing. “Optimized” modules have an opt- tag and are usually smaller. Future releases may change the effects of optimization.
A program doesn’t run any faster when it is read from a .pyc file than when it is read from a .py file; the only thing that’s faster about .pyc files is the speed with which they are loaded.
The module compileall can create .pyc files for all modules in a directory.
There is more detail on this process, including a flow chart of the decisions, in PEP 3147.
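A sketch tying the compileall and -OO tips together, assuming a throwaway source tree (precompile is a hypothetical helper): compileall.compile_dir() with optimize=2 writes the opt-2 variant, and importlib.util.cache_from_source() gives the matching file name for each source.

```python
import compileall
import importlib.util
import pathlib


def precompile(tree, optimize):
    """Byte-compile every .py under `tree` at the given -O level (0, 1 or 2)
    and return the cache paths importlib expects for each source file."""
    ok = compileall.compile_dir(tree, quiet=1, optimize=optimize)
    if not ok:
        raise RuntimeError(f"compileall failed for {tree}")
    # Level 0 has no opt- tag; levels 1 and 2 produce .opt-1 / .opt-2 names.
    opt = "" if optimize == 0 else optimize
    return [importlib.util.cache_from_source(str(p), optimization=opt)
            for p in pathlib.Path(tree).rglob("*.py")]
```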
I'm working on an Inno Setup installer for a Python application for Windows 7, and I have these requirements:
The app shouldn't write anything to the installation directory
It should be able to use .pyc files
The app shouldn't require a specific Python version, so I can't just add a set of .pyc files to the installer
Is there a recommended way of handling this? Like give the user a way to (re)generate the .pyc files? Or is the shorter startup time benefit from the .pyc files usually not worth worrying about?
PYC files aren't guaranteed to be compatible for different python versions. If you don't know that all your customers are running the same python versions, you really don't want to distribute pyc's directly. So, you have to choose between distributing PYCs and supporting multiple python versions.
You could create a build process that compiles all your files using py_compile and zips them up into a version-specific package. You can do this with setuptools; however, it will be awkward because you'll have to run py_compile in every version you need to support.
If you are basically distributing a closed application and don't want people to have trivial access to your source code, then py2exe is probably a simpler alternative. If your Python code is supposed to be integrated into the user's Python install, then it's probably simpler to just create a zip of your .py files and add a small .py stub that puts the zip on sys.path and imports the zipped package(s); Python's built-in zipimport support does the rest.
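A minimal sketch of the zip approach, assuming Python 3 and a hypothetical package directory; zip_package and import_from_zip are illustrative names. The .py sources are stored under the package name, and putting the archive itself on sys.path lets Python's zipimport support load them. No .pyc files are written into the archive on import.

```python
import importlib
import pathlib
import sys
import zipfile


def zip_package(pkg_dir, zip_path):
    """Store every .py file under pkg_dir in zip_path, keeping the package
    name as the top-level folder so imports resolve as usual."""
    pkg_dir = pathlib.Path(pkg_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for src in pkg_dir.rglob("*.py"):
            # Archive names look like myapp/__init__.py, myapp/sub/mod.py.
            zf.write(src, src.relative_to(pkg_dir.parent).as_posix())
    return zip_path


def import_from_zip(zip_path, name):
    """What the stub does: put the archive on sys.path, then import."""
    sys.path.insert(0, str(zip_path))
    return importlib.import_module(name)
```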
if it makes you feel better, PYC doesn't provide much extra security and it doesn't really boost perf much either :)
If you haven't read PEP 3147, that will probably answer your questions.
I don't mean the solution described in that PEP and implemented as of Python 3.2. That's great if your "multiple Python versions" just means "3.2, 3.3, and probably future 3.x". Or even if it means "2.6+ and 3.1+, but I only really care about 3.2 and 3.3, so if I don't get the pyc speedups for other ones that's OK".
But when I asked your supported versions, you said, "2.7", which means you can't rely on PEP 3147 to solve your problems.
Fortunately, the PEP is full of discussion of earlier attempts to solve the problem, and the pitfalls of each, and there should be more than enough there to figure out what the options are and how to implement them.
The one problem is that the PEP is very Linux-centric, mainly because it's primarily Linux distros that tried to solve the problem in the past. (Apple also did so, but their solution was (a) pretty much working, and (b) tightly coupled with the whole Mac-specific "framework" thing, so they were mostly ignored…)
So, it largely leaves open the question of "Where should I put the .pyc files on Windows?"
The best choice is probably an app-specific directory under the user's local application data directory. See Known Folders if you can require Vista or later, CSIDL if you can't. Either way, you're looking for the FOLDERID_LocalAppData or CSIDL_LOCAL_APPDATA, which is:
The file system directory that serves as a data repository for local (nonroaming) applications. A typical path is C:\Documents and Settings\username\Local Settings\Application Data.
The point is that it's a place for applications to store data that's separate for each user (and inside that user's profile directory), and also for each machine the user's roaming profile might end up on, which means you can safely put stuff there and know that the user has the permissions to write there without UAC getting involved, and also know (as well as you ever can) that no other user or machine will interfere with what's there.
Within that directory, you create a directory for your program, and put whatever you want there, and as long as you picked a unique name (e.g., My Unique App Name or My Company Name\My App Name or a UUID), you're safe from accidental collision with other programs. (There used to be specific guidelines on this in MSDN, but I can no longer find them.)
So, how do you get to that directory?
The easiest way is to just use the env variable %LOCALAPPDATA%. If you need to deal with older Windows, you can use %USERPROFILE% and tack \Local Settings\Application Data onto the end, which is guaranteed to either be the same, or end up in the same place via junctions.
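A small sketch of that lookup; the function name and default folder name are hypothetical, and the environment is passed in as a dict so the logic can be tried on any OS. It only computes the path; creating the directory and pointing your app at it are left out.

```python
import ntpath


def pyc_cache_dir(env, app_name="My Company Name\\My App Name"):
    """Return a per-user, per-machine directory for an app's files on Windows.
    Prefers %LOCALAPPDATA%; falls back to the pre-Vista location under
    %USERPROFILE%."""
    base = env.get("LOCALAPPDATA")
    if not base:
        base = ntpath.join(env["USERPROFILE"], "Local Settings", "Application Data")
    return ntpath.join(base, app_name)
```

On Windows you would call it as pyc_cache_dir(os.environ).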
You can also use pywin32 or ctypes to access the native Windows APIs (since there are at least 3 different APIs for this and at least two ways to access those APIs, I don't want to give all possible ways to write this… but a quick google or SO search for "pywin32 SHGetFolderPath" or "ctypes SHGetKnownFolderPath" or whatever should give you what you need).
Or, there are multiple third-party modules to handle this. The first one both Google and PyPI turned up was winshell.
Re-reading the original question, there's a much simpler answer that probably fits your requirements.
I don't know much about Inno, but most installers give you a way to run an arbitrary command as a post-copy step.
So, you can just use python -m compileall to create the .pyc files for you at install time—while you've still got elevated privileges, so there's no problem with UAC.
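A sketch of that post-install step, assuming the installer can launch the target interpreter and knows the install directory (compile_install_dir is a hypothetical helper). It runs python -m compileall -q on the directory and treats a non-zero exit status as failure.

```python
import subprocess
import sys


def compile_install_dir(install_dir, python=sys.executable):
    """Run `python -m compileall -q <install_dir>` as an installer would,
    returning True when every file compiled."""
    result = subprocess.run([python, "-m", "compileall", "-q", str(install_dir)])
    return result.returncode == 0
```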
In fact, if you look at pywin32, and various other Python packages that come as installer packages, they do exactly this. This is an idiomatic thing to do for installing libraries into the user's Python installation, so I don't see why it wouldn't be considered reasonable for installing an executable that uses the user's Python installation.
Of course if the user later decides to uninstall Python 2.6 and install 2.7, your .pyc files will be hosed… but from your description, it sounds like your entire program will be hosed anyway, and the recommended solution for the user would probably be to uninstall and reinstall anyway, right?
Let's say Tight Ars & Co. is a company with incredibly tight security policies, and let's assume I work for this company. Assume they have one task that requires a Python script to write to Excel files, and I find this incredibly wonderful library called xlwt. Now my script is able to write to Excel files, everything is wonderful and the sun is shining. I release the code, and suddenly I'm asked: what is this thingamajig setup.py, and why should we run it? Wait, we won't even run it; we want the environment to be clean of third-party code, etc. Since I'm unaware of any wizardry or voodoo, is there any way I can package the dependent libraries and import them in my script?
All setup.py typically does with any pure-Python package is copy files into a standard place and compile the .py files to .pyc. I can't imagine why your employer would regard that as (nasty) third-party software, but the source of the package is OK, your IDE is OK, Python itself is OK, etc ...
Options:
(1) Copy the xlwt directory from a source distribution to somewhere that's listed in sys.path
(2) Make a ZIP file xlwt.zip containing the xlwt directory and add the ZIP file itself to sys.path (for example, from your script or via a .pth file).
(3) As (2) but compile the .py files to .pyc first.
If somebody points out that the above involves error-prone manual steps, you can:
(a) write a script to do that
or
(b) copy setup.py, change its name, pretend that you wrote it yourself, use it, ...
Unless I am misunderstanding the question you should be able to obtain the source archive and simply copy the "xlwt" directory to the same directory as your script and it should be importable from the local directory.
I want to know what a pyc file(python bytecode) is. I want to know all the details.
I want to know about how pyc files interface with the compiler. Is it a replacement for exe?
Does it need to be run by python?
Is it as portable as the .py file is?
Where should I use this?
To supplement Mike Graham's answer, there are some interesting comments in the Python tutorial's section on compiled Python files that give some information on pyc files. Most interesting for you, I suspect, is the line:
A program doesn't run any faster when it is read from a ‘.pyc’ or ‘.pyo’ file than when it is read from a ‘.py’ file; the only thing that's faster about ‘.pyc’ or ‘.pyo’ files is the speed with which they are loaded.
Which hits the nail on the head with respect to the crux of a pyc file. A pyc is a pre-compiled py file. The Python bytecode is the same as if it were generated from the py file; the difference is that with a pyc file you don't have to go through the process of creating that bytecode again. In other words, you skip converting the Python script to Python bytecode.
If you've come across .class files in java this is a similar concept - the difference is in java you have to do the compiling using javac before the java interpreter will execute the application. Different way of doing things (the internals will be very different as they're different languages) but same broad idea.
Python bytecode requires Python to run, cannot be run standalone without Python, and is specific to a particular x.y release of Python. It should be portable across platforms for the same version. There is usually no reason for you to use it directly; Python uses it to skip re-parsing your .py file on repeated imports. Your life will be fine ignoring the existence of pyc files.
From the docs:
As an important speed-up of the start-up time for short programs that use a lot of standard modules, if a file called spam.pyc exists in the directory where spam.py is found, this is assumed to contain an already-“byte-compiled” version of the module spam. The modification time of the version of spam.py used to create spam.pyc is recorded in spam.pyc, and the .pyc file is ignored if these don’t match.
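A sketch that inspects the header of a freshly compiled .pyc, assuming Python 3.7+ where the header is 16 bytes (magic number, flags, source mtime, source size, per PEP 552); older versions use a shorter layout. The magic number is what ties a .pyc to a particular release, and the mtime field is the recorded modification time the quote describes. read_pyc_header, matches_running_python and demo are hypothetical names.

```python
import importlib.util
import os
import pathlib
import py_compile
import struct
import tempfile


def read_pyc_header(pyc_path):
    """Return (magic, flags, mtime, size) from a Python 3.7+ .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    magic = header[:4]
    # Three little-endian unsigned 32-bit fields follow the magic number.
    flags, mtime, size = struct.unpack("<III", header[4:16])
    return magic, flags, mtime, size


def matches_running_python(magic):
    """True if the bytecode was written by this interpreter's release."""
    return magic == importlib.util.MAGIC_NUMBER


def demo():
    """Compile a throwaway spam.py; return (header, source mtime, source size)."""
    with tempfile.TemporaryDirectory() as d:
        src = pathlib.Path(d, "spam.py")
        src.write_text("x = 1\n")
        # Force the timestamp-based header (flags == 0) even if
        # SOURCE_DATE_EPOCH would otherwise select hash-based validation.
        pyc = py_compile.compile(
            str(src), doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)
        st = os.stat(src)
        return read_pyc_header(pyc), int(st.st_mtime), st.st_size
```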
See the ref for more info. But some specific answers:
The contents of the spam.pyc file are platform independent, so a Python module directory can be shared by machines of different architectures.
It's not an executable; it's used internally by the compiler as an intermediate step.
In general, you don't make .pyc files by hand: the interpreter makes them automatically.
To squeeze into the limited amount of filesystem storage available in an embedded system I'm currently playing with, I would like to eliminate any files that could reasonably be removed without significantly impacting functionality or performance. The *.py, *.pyo, and *.pyc files in the Python library account for a sizable amount of space. Which of these options would be most reasonable for a Python 2.6 installation in a small embedded system?
Keep *.py, eliminate *.pyc and *.pyo (Maintain ability to debug, performance suffers?)
Keep *.py and *.pyc, eliminate *.pyo (Does optimization really buy anything?)
Keep *.pyc, eliminate *.pyo and *.py (Will this work?)
Keep *.py, *.pyc, and *.pyo (All are needed?)
http://www.network-theory.co.uk/docs/pytut/CompiledPythonfiles.html
When the Python interpreter is invoked with the -O flag, optimized code is generated and stored in ‘.pyo’ files. The optimizer currently doesn't help much; it only removes assert statements.
Passing two -O flags to the Python interpreter (-OO) will cause the bytecode compiler to perform optimizations that could in some rare cases result in malfunctioning programs. Currently only doc strings are removed from the bytecode, resulting in more compact ‘.pyo’ files.
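A small in-process illustration of what the two levels strip, using the built-in compile() and its optimize argument (Python 3.2+) as a stand-in for the -O/-OO flags; the SOURCE snippet and strip_report helper are made up for the demo.

```python
SOURCE = '''
"""Module docstring."""

def check(x):
    assert x > 0, "x must be positive"
    return x
'''


def strip_report(optimize):
    """Compile SOURCE at level 0, 1 (like -O) or 2 (like -OO), run it in a
    fresh namespace and return (docstring kept?, assert still active?)."""
    ns = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), ns)
    try:
        ns["check"](-1)
        assert_active = False
    except AssertionError:
        assert_active = True
    return ns.get("__doc__") is not None, assert_active
```

strip_report(0), strip_report(1) and strip_report(2) should give (True, True), (True, False) and (False, False) respectively.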
My suggestion to you?
Use -OO to compile only .pyo files if you don't need assert statements and __doc__ strings.
Otherwise, go with .pyc only.
Edit
I noticed that you only mentioned the Python library. Much of the python library can be removed if you only need part of the functionality.
I also suggest that you take a look at tinypy, which is a large subset of Python in about 64 KB.
Number 3 should and will work. You do not need the .pyo or .py files in order to use the compiled python code.
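A sketch of option 3 in a plain directory, assuming Python 3 and a throwaway module (make_sourceless and import_sourceless are hypothetical helpers). The .pyc must sit where the source would be, under the legacy name <module>.pyc rather than inside __pycache__, and the .py must be gone.

```python
import importlib
import pathlib
import py_compile
import sys


def make_sourceless(directory, name, source_text):
    """Write <name>.py, compile it to <name>.pyc in the same directory
    (legacy location, not __pycache__), then delete the .py."""
    directory = pathlib.Path(directory)
    src = directory / f"{name}.py"
    src.write_text(source_text)
    py_compile.compile(str(src), cfile=str(directory / f"{name}.pyc"),
                       doraise=True)
    src.unlink()


def import_sourceless(directory, name):
    """Import <name> from a directory that only contains <name>.pyc."""
    sys.path.insert(0, str(directory))
    importlib.invalidate_caches()  # finders cache directory listings
    return importlib.import_module(name)
```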
I would recommend keeping only .py files. The difference in startup time isn't that great, and having the source around is a plus, as it will run under different python versions without any issues.
As of python 2.6, setting sys.dont_write_bytecode to True will suppress compilation of .pyc and .pyo files altogether, so you may want to use that option if you have 2.6 available.
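A minimal sketch of setting the flag from code (import_without_bytecode is a hypothetical helper). It has to run before the imports it should affect; -B and PYTHONDONTWRITEBYTECODE set the same flag at interpreter startup.

```python
import importlib
import pathlib
import sys


def import_without_bytecode(directory, name):
    """Import <directory>/<name>.py with bytecode writing switched off,
    restoring the previous setting and sys.path afterwards."""
    previous = sys.dont_write_bytecode
    sys.dont_write_bytecode = True
    sys.path.insert(0, str(directory))
    try:
        return importlib.import_module(name)
    finally:
        sys.path.remove(str(directory))
        sys.dont_write_bytecode = previous
```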
Here's how I minimize disk requirements for mainline Python 2.7 at the day job:
1) Remove packages from the standard library which you won't need. The following is a conservative list:
bsddb/test ctypes/test distutils/tests email/test idlelib lib-tk
lib2to3 pydoc.py tabnanny.py test unittest
Note that some Python code may have surprising dependencies; e.g. setuptools needs unittest to run.
2) Pre-compile all Python code, using -OO to strip asserts and docstrings.
find -name '*.py' | python -OO -m py_compile -
Note that Python by default does not look at .pyo files; you have to explicitly ask for optimization at runtime as well, using an option or an environment variable. Run scripts in one of the following ways:
python -OO -m mylib.myscript
PYTHONOPTIMIZE=2 python -m mylib.myscript
3) Remove .py source code files (unless you need to run them as scripts) and .pyc unoptimized files.
find '(' -name '*.py' -or -name '*.pyc' ')' -and -not -executable -execdir rm '{}' ';'
4) Compress the Python library files. Python can load modules from a zip file. The paths in the zip-file must match the package hierarchy; thus you should merge site-packages and .egg directories into the main library directory before zipping. (Or you can add multiple zip files to the Python path.)
On Linux, Python's default path includes /usr/lib/python27.zip already, so just drop the zip file there and you're ready to go.
Leave os.pyo as an ordinary (non-zipped) file, since Python looks for this as a sanity check. If you move it to the zip file, you'll get a warning on every Python invocation (though everything will still work). Or you can just leave an empty os.py file there, and put the real one in the zip file.
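On Python 3, where .pyo files no longer exist, steps 2 and 4 can be sketched with the standard zipfile.PyZipFile, which compiles a package and stores only the bytecode, under legacy .pyc names, in the archive. The helper and package names are hypothetical, optimize=2 plays the role of -OO, and on recent Python 3 versions the archive can be imported without passing -OO at run time.

```python
import importlib
import sys
import zipfile


def build_library_zip(pkg_dir, zip_path, optimize=2):
    """Compile the package at pkg_dir (a directory containing __init__.py) at
    the given level and add only the bytecode to zip_path, keeping the package
    hierarchy zipimport expects. Side effect: the compiled files are also
    cached in pkg_dir's __pycache__."""
    with zipfile.PyZipFile(zip_path, mode="w", optimize=optimize) as zf:
        zf.writepy(str(pkg_dir))
    return zip_path


def import_library(zip_path, name):
    """Put the library archive on sys.path and import from it."""
    sys.path.insert(0, str(zip_path))
    return importlib.import_module(name)
```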
Final notes:
In this manner, Python fits in 7 MB of disk space. There's a lot more that can be done to reduce size, but 7 MB was small enough for my purposes. :)
Python bytecode is not compatible across versions, but who cares when it's you who do the compilation and you who controls the Python version?
.pyo files in a zip file should be a performance win in all cases, unless the disk is extremely fast and the processor/RAM is extremely slow. Either way, Python executes from memory, not the on-disk format, so it only affects performance on load. That said, stripping docstrings can save quite a bit of memory.
Do note that .pyo files do not contain assert statements.
.pyo files preserve function names and line numbers, so debuggability is not decreased: you still get nice tracebacks, you just have to manually go look up the line number in the source, which you'd have to do anyway.
If you want to "hack" a file at runtime, just put it in the current working directory. It takes precedence over the library zip file.
What it ultimately boils down to is that you really only need one of the three options, but your best bet is to go with .pys and either .pyos or .pycs.
Here's how I see each of your options:
(1) Keep *.py only: If you put the .pys in a zip file, you won't see pycs or pyos built. It should also be pointed out that the performance difference is only in startup time, and even then isn't too great in my experience (your mileage may vary, though). Also note that there is a way to prevent the interpreter from outputting .pycs, as Algorias points out.
(2) Keep *.py and *.pyc: I think that this is an ideal option (either that or .pys and .pyos) because you get the best mix of performance, debuggability and reliability. You don't necessarily need a source file and compiled file, though.
(3) Keep *.pyc only: If you're really strapped for space and need performance, this will work. I'd advise you to keep the .pys if at all possible, though. Compiled binaries (.pycs or .pyos) don't always transfer to different versions of Python.
(4) Keep all three: It's doubtful that you'll need all three unless you plan on running in optimized mode sometimes and non-optimized mode sometimes.
In terms of space it's been my (very anecdotal) experience that .py files compress the best compared to .pycs and .pyos if you put them in a zipfile. If you plan on compressing the files, .pyos don't tend to gain a lot in terms of sheer space because docstrings tend to compress fairly well and asserts just don't take up that much space.