Let's say Tight Ars & Co. is a company with incredibly tight security policies, and let's assume I work for this company. Assume they have one task that requires a Python script to write to Excel files, and I find this incredibly wonderful library called xlwt. Now my script can write to Excel files, everything is wonderful and the sun is shining, I release the code, and suddenly I'm asked: what is this thingamajig setup.py, why should we run it? Wait, we won't even run it, we want the environment to be clean of third-party code, etc., etc. Since I'm unaware of any wizardry or voodoo, is there any way I can package the dependent libraries and import them in my script?
All setup.py typically does with any pure-Python package is copy files into a standard place and compile the .py files to .pyc. I can't imagine why your employer would regard that as (nasty) third-party software, while the source of the package is OK, your IDE is OK, Python itself is OK, etc.
Options:
(1) Copy the xlwt directory from a source distribution to somewhere that's listed in sys.path
(2) Make a ZIP file xlwt.zip containing the contents of the xlwt directory and copy it to ditto.
(3) As (2) but compile the .py files to .pyc first.
If somebody points out that the above involves error-prone manual steps, you can:
(a) write a script to do that
or
(b) copy setup.py, change its name, pretend that you wrote it yourself, use it, ...
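For example, option (2) could boil down to something like this (a minimal sketch; the path is a placeholder, and it assumes the ZIP has the xlwt package directory at its top level, i.e. xlwt/__init__.py inside the archive):
import sys
sys.path.insert(0, "/opt/third_party/xlwt.zip")  # wherever you decide to keep the ZIP
import xlwt  # resolved from inside the ZIP via Python's zipimport
Python can import pure-Python packages straight out of a ZIP on sys.path, so nothing ever gets "installed".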
Unless I am misunderstanding the question, you should be able to obtain the source archive and simply copy the "xlwt" directory to the same directory as your script, and it should be importable from the local directory.
Hi there wise people of stack. I'm having trouble with importing pyd files as python objects.
The story:
I have an internal repo on GitLab that runs Python files as well as C++ files. The repo uses pybind for the two languages to speak to one another. The whole project is built with CI/CD, and the artefacts I have access to are .pyd extension files.
The task I was given is to access some .pyd files (in different folders) from a single Python script and to access their classes (encoded inside the .pyd files) in order to mock them using Python.
The problem:
What I was told was that a simple import would let me access the .pyd as an object through Python, just like you would with a library. However, I came across errors throughout the process. I have gone through this post and this one, but it seems that neither of them works for me.
What was tried:
The first thing I did was to start a remote folder with a single .pyd file from the project (let's call it SomeClass.pyd). I then created a Python file test.py in the same directory as the .pyd file.
The whole architecture looks like the following:
|--folder
|--SomeClass.pyd
|--test.py
Then, in the test.py file, I tried running
import SomeClass.pyd
import SomeClass
import SomeClass.pyd as sc
from SomeClass.pyd import *
from SomeClass import *
which all yielded the same following error:
ImportError: dynamic module does not define module export function
Now, I know that .pyd files are similar to DLLs, but I was told multiple times that a simple import would let me access the object information without needing anything in particular.
I recall reading about setting PYTHONPATH before launching the whole process. However, I need that file to be able to access the .pyd without adding any variable to the path, as I will likely not always have the access rights to change PYTHONPATH.
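Ideally I'd do it from inside the script itself, along these lines (just a sketch of the idea; the folder path is made up):
import sys
sys.path.insert(0, r"C:\some\other\folder")  # a made-up folder containing another .pyd
import SomeClass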
The project is quite big, so I'm trying to keep it bare minimum, but if you need more info, I'll try to give some more.
Thank you for your feedback!
Alright, after some time and a lot of researching, I found the weird answer for the problem that occurred. I really hope it will help anyone encountering the same issue.
The problem was caused by the fact that PyCharm sometimes has issues with the whole dynamic import.
First problem: dynamic import
This was solved simply by going to PyCharm --> File --> Invalidate Caches and then ticking "Clear file system cache and Local History" as well as "Clear VCS Log caches and indexes". You should then be prompted to reboot.
I'll also add a note that even after fixing the issue, sometimes, for no apparent reason, I still have to invalidate the caches again.
Second problem: venv
Once rebooted, you might be able to manually import the path to your .pyd file, but you probably won't get auto-completion. What solved this for me was manually building the code responsible for the .pyd in order to generate a wheel. In my case, I used poetry:
poetry build
Once the wheel was created, I did a manual pip install of the wheel created by the poetry build to install it directly into the venv:
pip install dist/the_name_of_your_wheel_file.whl
These steps were the ones to fix my problem. I hope this will help anyone encountering the same problem!
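For completeness, once the wheel is installed in the venv, importing and mocking works like it would for any other module (a sketch only; the module and class names are placeholders for whatever your .pyd actually exposes):
from unittest import mock
import SomeClass  # the compiled extension module, now importable from the venv
obj = SomeClass.SomeClass()  # hypothetical pybind-exported class
with mock.patch("SomeClass.SomeClass") as mocked_cls:
    mocked_cls.return_value.do_work.return_value = 42  # hypothetical method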
I have a .py file that imports from other Python modules, which in turn import from config files, other modules, etc.
I have to move the code needed to run that .py file, but only whatever the .py file is actually reading from (I am not talking about packages installed via pip install; it's more about other Python files in the project directory, mostly classes, functions and .ini files).
Is there a way to find out only the external files used by that particular Python script? Is it something that can be found using PyCharm, for example?
Thanks!
Static analysis tools (such as PyCharm's refactoring tools) can (mostly) figure out the module import tree for a program (unless you do dynamic imports using e.g. importlib.import_module()).
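The standard library's modulefinder can give you a rough version of that import tree (a sketch; "your_script.py" is a placeholder for your entry point):
from modulefinder import ModuleFinder
finder = ModuleFinder()
finder.run_script("your_script.py")  # statically walks the script's imports
for name, module in finder.modules.items():
    print(name, getattr(module, "__file__", None))  # file each imported module came from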
However, it's not quite possible to statically and definitively know what other files are required for your program to function. You could use Python's audit events (or strace/ptrace or similar OS-level tools) to look at what files are being opened by your program (e.g. while your tests are being run (you do have tests, right?), or during regular program use), but it's likely not going to be exhaustive.
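A minimal sketch of the audit-event approach (Python 3.8+; my_script stands in for your real entry point):
import sys
def trace_opens(event, args):
    if event == "open":  # raised by built-in open() and friends; args is (path, mode, flags)
        print("open:", args[0], file=sys.stderr)
sys.addaudithook(trace_opens)
import my_script  # run/import the code you want to observe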
I write and maintain a Python library for quantum chemistry calculations called PyQuante. I have a fairly standard Python distribution with a setup.py file in the main directory, a subdirectory called "PyQuante" that holds all of the Python modules, and one called "Src" that contains source code for C extension modules.
I've been lucky enough to have some users donate code that uses Cython, which I hadn't used before, since I started PyQuante before either it or Pyrex existed. On my suggestion, they put the code into the Src subdirectory, since that's where all the C code went.
However, looking at the code that generates the extensions, I wonder whether I should have simply put the code in subdirectories of the Python branch instead. And thus my question is:
what are the best practices for the directory structure of python distributions with both Python and Cython source files?
Do you put the .pyx files in the same directory as the .py files?
Do you put them in a subdirectory of the one that holds the .py files?
Do you put them in a child of the .py directory's parent?
Does the fact that I'm even asking this question betray my ignorance at distributing .pyx files? I'm sure there are many ways to make this work, and am mostly concerned with what has worked best for people.
Thanks for any help you can offer.
Putting the .pyx files in the same directory as the .py files makes the most sense to me. It's what the authors of scikit-learn have done and what I've done in my py-earth module. I guess I think of Cython modules as optimized replacements for Python modules. I will often begin by writing a package in pure Python, then replace some modules with Cython if I need better performance. Since I'm treating Cython modules as replacements for Python modules, it makes sense to me to keep them in the same place. It also works well for test builds using the build_ext --inplace option.
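As a concrete illustration, a layout with the .pyx files next to the .py files can be built with a setup.py along these lines (a sketch; "mypackage" and "fast_module" are placeholder names):
from setuptools import setup, Extension
from Cython.Build import cythonize
setup(
    name="mypackage",
    packages=["mypackage"],
    ext_modules=cythonize(
        [Extension("mypackage.fast_module", ["mypackage/fast_module.pyx"])]
    ),
)
Then python setup.py build_ext --inplace drops the compiled extension next to its .pyx source for local testing.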
I have a Python project that has the following structure:
package1
class.py
class2.py
...
package2
otherClass.py
otherClass2.py
...
config
dev_settings.ini
prod_settings.ini
I wrote a setup.py file that converts this into an egg with the same file structure. (When I examine it using a zip program the structure seems identical.) The funny thing is, when I run the Python code from my IDE it works fine and can access the config files; but when I try to run it from a different Python script using the egg, it can't seem to find the config files in the egg. If I put the config files into a directory relative to the calling Python script (external to the egg), it works - but that sort of defeats the purpose of having a self-contained egg that has all the functionality of the program and can be called from anywhere. I can use any classes/modules and run any functions from the egg as long as they don't use the config files... but if they do, the egg can't find them and so the functions don't work.
Any help would be really appreciated! We're kind of new to the egg thing here and don't really know where to start.
The problem is, the config files are not files anymore - they're packaged within the egg. It's not easy to find the answer in the docs, but it is there. From the setuptools developer's guide:
Typically, existing programs manipulate a package's __file__ attribute in order to find the location of data files. However, this manipulation isn't compatible with PEP 302-based import hooks, including importing from zip files and Python Eggs.
To access them, you need to follow the instructions for the Resource Management API.
In my own code, I had this problem with a logging configuration file. I used the API successfully like this:
import logging
import logging.config
from pkg_resources import resource_stream

# Load the logging configuration from inside the package/egg rather than from a path on disk
_log_config_file = 'logging.conf'
_log_config_location = resource_stream(__name__, _log_config_file)
logging.config.fileConfig(_log_config_location)
_log = logging.getLogger('package.module')
See Setuptools' discussion of accessing packaged data files at runtime. You have to get at your configuration files a different way if you want the script to work inside an egg. Also, for that to work, you may need to make your config directory a Python package by tossing in an empty __init__.py file.
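Applied to the layout in the question, reading one of the .ini files could then look something like this (a sketch, assuming you have added an empty __init__.py to the config directory and listed the .ini files as package data in setup.py):
import configparser
from pkg_resources import resource_stream
parser = configparser.ConfigParser()
with resource_stream("config", "dev_settings.ini") as stream:
    parser.read_string(stream.read().decode("utf-8"))  # works whether config is zipped in an egg or unpacked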
Is it possible to have Python save the .pyc files to a separate folder location that is in sys.path?
/code
foo.py
foo.pyc
bar.py
bar.pyc
To:
/code
foo.py
bar.py
/code_compiled
foo.pyc
bar.pyc
I would like this because I feel it'd be more organized. Thanks for any help you can give me.
Update:
In Python 3.8, the -X pycache_prefix=PATH command-line option enables writing .pyc files to a parallel tree rooted at the given directory instead of into the code tree. See also the PYTHONPYCACHEPREFIX environment variable (credit: RobertT's answer).
The location of the cache is reported in sys.pycache_prefix (None indicates the default location in __pycache__ [since Python 3.2] subdirectories).
To turn off caching of compiled Python bytecode, -B may be set; then Python won't try to write .pyc files on the import of source modules. See the PYTHONDONTWRITEBYTECODE environment variable (credit: Maleev's answer).
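For example (the cache directory below is just an illustration):
python -X pycache_prefix=/tmp/pycache your_script.py
or, equivalently, via the environment variable:
PYTHONPYCACHEPREFIX=/tmp/pycache python your_script.py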
Old [Python 2] answer:
There is PEP 304: Controlling Generation of Bytecode Files. Its status is Withdrawn and the corresponding patch was rejected. Therefore there might be no direct way to do it.
If you don't need source code then you may just delete *.py files. *.pyc files can be used as is or packed in an egg.
In the dark and ancient days of 2003, PEP 304 came forth to challenge this problem. Its patch was found wanting. Environment variable platform dependencies and version skews ripped it to shreds and left its bits scattered across the wastelands.
After years of suffering, a new challenger rose in the last days of 2009. Barry Warsaw summoned PEP 3147 and sent it to do battle, wielding a simple weapon with skill. The PEP crushed the cluttering PYC files, silenced the warring Unladen Swallow and CPython interpreters, each trying to argue its PYC file should be triumphant, and allowed Python to rest easy with its dead ghosts occasionally running in the dead of night. PEP 3147 was found worthy by the dictator and was knighted into the official rolls in the days of 3.2.
As of 3.2, Python stores a module's PYC files in __pycache__ under the module's directory. Each PYC file contains the name and version of the interpreter, e.g., __pycache__/foo.cpython-33.pyc. You might also have a __pycache__/foo.cpython-32.pyc compiled by an earlier version of Python. The right magic happens: the correct one is used, and it is recompiled if it is out of sync with the source code. At runtime, look at the module's mymodule.__cached__ attribute for the pyc filename, and use imp.get_tag() for the running interpreter's tag. See the What's New section for more information.
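For instance (a small sketch; the module name and tag are illustrative):
import foo
print(foo.__cached__)  # e.g. /path/to/__pycache__/foo.cpython-33.pyc
import imp
print(imp.get_tag())   # e.g. 'cpython-33', the tag of the running interpreter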
TL;DR - Just works in Python 3.2 and above. Poor hacks substitute for versions before that.
And only almost ten years later, Python 3.8 finally provides support for keeping bytecode in a separate, parallel filesystem tree, by setting the environment variable PYTHONPYCACHEPREFIX or using the -X pycache_prefix=PATH argument (official doc here).
If you're willing to sacrifice bytecode generation altogether for it, there's a command line flag:
python -B file_that_imports_others.py
This can be put into your IDE's build/run preferences.
I agree, distributing your code as an egg is a great way to keep it organized. What could be more organized than a single file containing all of the code and metadata you would ever need? Changing the way the bytecode compiler works is only going to cause confusion.
If you really do not like the location of those .pyc files, an alternative is to run from a read-only folder. Since Python will not be able to write, no .pyc files ever get made. The hit you take is that every Python file will have to be re-compiled as soon as it is loaded, regardless of whether you have changed it or not. That means your start-up time will be a lot worse.
I disagree. The reasons are wrong or at least not well formulated; but the direction is valid. There are good reasons for being able to segregate source code from compiled objects. Here are a few of them (all of them I have run into at one point or another):
an embedded device reading off a ROM, but able to use an in-memory filesystem in RAM.
a multi-OS dev environment, which means sharing (with samba/nfs/whatever) my working directory and building on multiple platforms.
a commercial company that wishes to distribute only .pyc files to protect its IP
easily run test suite for multiple versions of python using the same working directory
more easily clean up transitional files (rm -rf $OBJECT_DIR as opposed to find . -name '*.pyc' -exec rm -f {} \;)
There are workarounds for all these problems, BUT they are mostly workarounds NOT solutions. The proper solution in most of these cases would be for the software to accept an alternative location for storing and lookup of these transitional files.
Since Python 3.2, PEP 3147 has been implemented: this means that all .pyc files are generated inside a __pycache__ directory (there will be a __pycache__ directory for each directory where you have Python files, and it will hold the .pyc files for each version of Python used on the sources).
There is an ongoing PEP that would enable building bytecode into a magic directory.
Basically, all Python files would be compiled into a __pythoncache__ directory.
For Python 3.8 or higher:
The PYTHONPYCACHEPREFIX setting (also available as -X pycache_prefix) configures the implicit bytecode cache to use a separate filesystem tree, rather than the default __pycache__ subdirectories within each source directory.
The location of the cache is reported in sys.pycache_prefix (None indicates the default location in __pycache__ subdirectories).
"I feel it'd be more organized" Why? How? What are you trying to accomplish?
The point of saving the compiler output is to save a tiny bit of load time when the module gets imported. Why make this more complex? If you don't like the .pyc's, then run a "delete all the .pyc's" script periodically.
They aren't essential; they're helpful. Why turn off that help?
This isn't C, C++ or Java where the resulting objects are essential. This is just a cache that Python happens to use. We mark them as "ignored" in Subversion so they don't accidentally wind up getting checked in.