Python packaging: distribute post-install step

I am packaging a project that uses nltk. When you install nltk with pip, you get the core functionality, but not all the modules that come with it. To get those modules, you call nltk's download method.
I tried the following, but it fails with ImportError: No module named nltk. I assume this is happening because import nltk occurs before nltk is installed by the call to setup(...).
Is there a clean way of having a post-install step with distribute that executes one of the following?
$ python -m nltk.downloader punkt
>>> import nltk; nltk.download('punkt')
Here's my failed attempt at setup.py:
from setuptools import setup
from setuptools.command.install import install

class my_install(install):
    def run(self):
        install.run(self)
        import nltk
        nltk.download('punkt')

setup(
    ...
    install_requires=[..., 'nltk==2.0.4'],
    cmdclass={'install': my_install},
)

pip does not handle this kind of data dependency for you, so you'll need to write a README file explaining to your users what they need to install, or ship a script that runs the necessary commands for them.
That second option, along with a README file that explains what is going on, would be the way to go, I think.
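Such a script can be quite small; here is a sketch (the file name bootstrap.py and the pinned nltk version are illustrative, not part of the question):

# bootstrap.py -- run once instead of a bare `pip install`
import subprocess
import sys

def main():
    # install the package's dependencies first ...
    subprocess.check_call([sys.executable, "-m", "pip", "install", "nltk==2.0.4"])
    # ... then fetch the nltk data that pip will not download for you
    subprocess.check_call([sys.executable, "-m", "nltk.downloader", "punkt"])

if __name__ == "__main__":
    main()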
As a Debian maintainer, I can tell you that an installation command that downloads things from the network is considered unacceptable there; the project would have to be packaged with its dependencies declared against other packages, and yours installed only once those dependencies are met. I think that is a sane way to proceed in general. http://wiki.debian.org/UpstreamGuide#No_Downloads

I used the command-line installation method and it worked, like this:

import sys
import subprocess
from setuptools.command.install import install

class my_install(install):
    def run(self):
        install.run(self)
        # use the interpreter running setup.py rather than whatever 'python' is on PATH
        cmd = [sys.executable, "-m", "nltk.downloader", "punkt"]
        with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
            print(proc.stdout.read())

Related

Is it possible to ensure a package's dependencies have been installed before running its setup.py?

I'm working on distributing a Python package. It depends on the library lupa. I want to run a post-install script, depending on lupa, that initializes some data within the package after it has been installed. After looking at some answers around StackOverflow, my stripped-down setup.py essentially looks like this:
# setup.py
from distutils.core import setup
from setuptools.command.install import install

class PostInstallCommand(install):
    def run(self):
        # Do normal setup
        install.run(self)
        # Now do my setup
        from mymodule.script import init
        init()

setup(
    # ...
    install_requires=[
        # ...
        "lupa >= 1.10",
        # ...
    ],
    cmdclass={
        'install': PostInstallCommand
    }
)
However, when emulating a fresh install/setup with tox on Python 3.10, I get this error:
File "C:\My\Computer\Temp\pip-req-build-pl0jria3\setup.py", line 26, in run
from mymodule.script import init
File "C:\My\Computer\Temp\pip-req-build-pl0jria3\mymodule\script.py", line 28, in <module>
import lupa
ModuleNotFoundError: No module named 'lupa'
I was under the impression that anything put into install_requires would be installed by the time setup() finished, but that appears not to be the case (also corroborated by this answer). Is there anything I can do to ensure that lupa is installed prior to mymodule.script.init(), or is that stage of the setup process entirely out of the user's hands?
After doing a good bit of research, it seems this kind of post-install script goes somewhat against the core philosophy of setuptools, which means a request like this is unlikely to be added, or at least not anytime soon.
Fortunately, this is somewhat of a blessing in disguise; my post-install script is actually an "update" console entry point that the user calls anytime they've added mods or updated any of the package's data. This script can be (and is supposed to be) called many times by the user, so having it as part of the install process helps introduce the purpose of the script to the user right from the start. That makes the slight annoyance on install tolerable, at least in my circumstance.
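For reference, a minimal sketch of such a console entry point in setup.py; the script name mymodule-update is a hypothetical placeholder, while mymodule.script:init comes from the question's own code:

from setuptools import setup

setup(
    # ...
    entry_points={
        "console_scripts": [
            # users run `mymodule-update` after installing and whenever data changes
            "mymodule-update = mymodule.script:init",
        ],
    },
)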

How do I generate python grpc code from within a setuptools installer (setup.py)?

We have some proto files for gRPC in a repo, and I read that it is not good to commit generated code. So I figured I need to have the generation happen as part of the package installation (e.g. setuptools, setup.py).
However, to generate gRPC code you first need to install grpcio-tools by running pip install grpcio-tools, according to the docs. But the purpose of setup.py is to pull down dependencies like grpcio-tools automatically.
So is there a best practice for doing this? That is, how do you generate code that depends on another Python package from within setuptools? Am I better off just creating a separate build.sh script that manually pip-installs and generates the code? Or should I expect users of the package to already have grpcio-tools installed?
As far as I know, the "current" best practice is:
pip manages dependencies
setup.py performs build
Executing "pip install ." is almost equivalent to perform "pip install -r requirements.txt" + "python setup.py build" + "python setup.py install".
This is a custom command that generates Python sources from the proto files:

import pkg_resources
from setuptools import Command

class GrpcTool(Command):
    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        # imported here rather than at module level: grpc_tools may not be
        # installed yet the first time pip runs setup.py (see note below)
        import grpc_tools.protoc

        proto_include = pkg_resources.resource_filename('grpc_tools', '_proto')
        grpc_tools.protoc.main([
            'grpc_tools.protoc',
            '-I{}'.format(proto_include),
            '--python_out=SOME_PATH/',
            '--grpc_python_out=SOME_PATH/',
            'SOME_PROTO.proto'
        ])
which is invoked by customizing the build_py command, like this:

from setuptools.command.build_py import build_py

class BuildPyCommand(build_py):
    def run(self):
        self.run_command('grpc')
        super(BuildPyCommand, self).run()
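For completeness, both commands must be registered in setup() for run_command('grpc') to resolve; a minimal sketch of the wiring this answer assumes:

from setuptools import setup

setup(
    # ...
    cmdclass={
        'grpc': GrpcTool,          # the name must match the string passed to run_command()
        'build_py': BuildPyCommand,
    },
)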
Note the import inside the run method. It seems that pip runs setup.py several times, both before and after installing the requirements, so if you have the import at the top of the file, the build fails.
Along with @makeroo's approach, an alternative is to execute the grpc_tools module as a subprocess.
The benefit of this approach is that you reliably get a generation result: exit code 0 for success and 1 for error.
import subprocess

proto_files = ["proto/file1.proto", "proto/file2.proto"]

for file in proto_files:
    args = "--proto_path=. --python_out=. --grpc_python_out=. {0}".format(file)
    result = subprocess.call("python -m grpc_tools.protoc " + args, shell=True)
    print("grpc generation result for '{0}': code {1}".format(file, result))
The code above writes the generated Python files into the proto directory where the .proto files reside.
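If you would rather abort the build when generation fails instead of just printing the exit code, subprocess.check_call raises CalledProcessError on any non-zero status. A sketch using the same proto_files list and placeholder paths as above:

import subprocess
import sys

for file in proto_files:
    subprocess.check_call([
        sys.executable, "-m", "grpc_tools.protoc",
        "--proto_path=.", "--python_out=.", "--grpc_python_out=.", file,
    ])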

How to import a python module before installing it?

So I'm trying to create a setup.py file to deploy a test framework in Python.
The library depends on pexpect and easy_install. After installing easy_install, I need to install s3cmd, a tool for working with Amazon's S3.
However, I use pexpect to configure s3cmd, so if you run setup.py on a fresh VM you run into an ImportError:
import subprocess
import sys
import pexpect  # pexpect is not installed ... it will be

def install_s3cmd():
    subprocess.call(['sudo', 'easy_install', 's3cmd'])
    # now use pexpect to configure s3cmd
    child = pexpect.spawn('s3cmd --configure')
    child.expect('(?i)Access Key')
    # ... more code down there

def main():
    subprocess.call(['sudo', 'apt-get', 'install', 'python-setuptools'])  # installs easy_install
    subprocess.call(['sudo', 'easy_install', 'pexpect'])  # installs pexpect
    install_s3cmd()
    # ... more code down here

if __name__ == "__main__":
    main()
I know, of course, that I could create another file, initial_setup.py, to install easy_install and pexpect before running setup.py, but my question is: is there a way to import pexpect before it has been installed? The library will be installed before it is used, but will the Python interpreter accept the import pexpect statement?
It won't accept it like that, but Python allows you to import things anywhere, not only in the global scope. So you can postpone the import until the time when you really need it:
import subprocess

def install_s3cmd():
    subprocess.call(['easy_install', 's3cmd'])
    # assuming that by now it's already been installed
    import pexpect
    # now use pexpect to configure s3cmd
    child = pexpect.spawn('s3cmd --configure')
    child.expect('(?i)Access Key')
    # ... more code down there
EDIT: there is a peculiarity when using setuptools this way: the .pth file will not be reloaded until Python relaunches. You can force a reload, though (found here):

import subprocess
import pkg_resources

subprocess.call(['easy_install', 'pexpect'])
pkg_resources.get_distribution('pexpect').activate()
import pexpect  # now works
(Unrelated: I'd rather assume that the script itself is called with the needed privileges, not use sudo in it. That will be useful with virtualenv.)

Execute a Python script post install using distutils / setuptools

Note: distutils is deprecated and the accepted answer has been updated to use setuptools
I'm trying to add a post-install task to Python distutils as described in How to extend distutils with a simple post install script?. The task is supposed to execute a Python script in the installed lib directory. This script generates additional Python modules the installed package requires.
My first attempt is as follows:
from distutils.core import setup
from distutils.command.install import install

class post_install(install):
    def run(self):
        install.run(self)
        from subprocess import call
        call(['python', 'scriptname.py'],
             cwd=self.install_lib + 'packagename')

setup(
    ...
    cmdclass={'install': post_install},
)
This approach works, but as far as I can tell has two deficiencies:
If the user has used a Python interpreter other than the one picked up from PATH, the post install script will be executed with a different interpreter which might cause a problem.
It's not safe against dry-run etc. which I might be able to remedy by wrapping it in a function and calling it with distutils.cmd.Command.execute.
How could I improve my solution? Is there a recommended way / best practice for doing this? I'd like to avoid pulling in another dependency if possible.
The way to address these deficiencies is:
Get the full path to the Python interpreter executing setup.py from sys.executable.
Classes inheriting from setuptools.Command (such as setuptools.command.install.install which we use here) implement the execute method, which executes a given function in a "safe way" i.e. respecting the dry-run flag.
Note however that the --dry-run option is currently broken and does not work as intended anyway.
I ended up with the following solution:
import os
import sys
from setuptools import setup
from setuptools.command.install import install as _install

def _post_install(dir):
    from subprocess import call
    call([sys.executable, 'scriptname.py'],
         cwd=os.path.join(dir, 'packagename'))

class install(_install):
    def run(self):
        _install.run(self)
        self.execute(_post_install, (self.install_lib,),
                     msg="Running post install task")

setup(
    ...
    cmdclass={'install': install},
)
Note that I use the class name install for my derived class because that is what python setup.py --help-commands will use.
I think the easiest way to perform the post-install step, while keeping the requirements, is to wrap the call to setup(...):

from setuptools import setup

def _post_install(setup):
    def _post_actions():
        do_things()
    _post_actions()
    return setup

setup = _post_install(
    setup(
        name='NAME',
        install_requires=['...'],
    )
)

This runs setup() when it is declared. Once the requirements installation is done, the _post_install() function runs, which in turn runs the inner function _post_actions().

How to perform custom build steps in setup.py?

The distutils module allows you to include and install resource files together with Python modules. How do you properly include them when the resource files must be generated during the build process?
For example, the project is a web application that contains CoffeeScript sources which should be compiled into JavaScript and then included in the Python package. Is there a way to integrate this into the normal sdist/bdist process?
I spent a fair while figuring this out; the various suggestions out there are broken in various ways: they break installation of dependencies, they don't work with pip, etc. Here's my solution:
in setup.py:
import sys
from setuptools import setup, find_packages
from setuptools.command.install import install
from distutils.command.install import install as _install

class install_(install):
    # inject your own code into this func as you see fit
    def run(self):
        ret = None
        if self.old_and_unmanageable or self.single_version_externally_managed:
            ret = _install.run(self)
        else:
            caller = sys._getframe(2)
            caller_module = caller.f_globals.get('__name__', '')
            caller_name = caller.f_code.co_name
            if caller_module != 'distutils.dist' or caller_name != 'run_commands':
                _install.run(self)
            else:
                self.do_egg_install()
        # This is just an example, a post-install hook.
        # It's a nice way to get at your installed module, though.
        import site
        site.addsitedir(self.install_lib)
        sys.path.insert(0, self.install_lib)
        from mymodule import install_hooks
        install_hooks.post_install()
        return ret
Then, in your call to the setup function, pass the arg:
cmdclass={'install': install_}
You could use the same idea for build, as opposed to install, write yourself a decorator to make it easier, and so on. This has been tested via pip and via direct 'python setup.py install' invocation.
The best way would be to write a custom build_coffeescript command and make it a subcommand of build. More details are given in other replies to similar/duplicate questions, for example this one:
https://stackoverflow.com/a/1321345/150999
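A minimal sketch of that build-subcommand approach, assuming a coffee compiler on the PATH; the source and output directories are illustrative:

from distutils.command.build import build as _build
from setuptools import Command

class build_coffeescript(Command):
    description = "compile CoffeeScript sources to JavaScript"
    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        # spawn() respects --dry-run and aborts the build on failure
        self.spawn(['coffee', '--compile',
                    '--output', 'mypackage/static/js', 'src/coffee'])

class build(_build):
    # run the generator before the regular build steps
    sub_commands = [('build_coffeescript', None)] + _build.sub_commands

Both classes would then be registered via cmdclass, just as in the answers above.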
