GCP DataflowRunner ImportErrors - python

Code is working when using option DirectRunner. But getting import errors when switching it to DataflowRunner. lxml module is not found is the reason. When trying to use setuptools code along with the main code, its still not working ( --setup_file setup.py).
setuptools.setup(
name='lxml',
version='4.2.5',
install_requires=[],
packages= setuptools.find_packages(),
)
Error: ImportError: No module named lxml [while running 'Run Query']
Any help/suggestions to overcome this error? Thanks.

The name you pass to the setuptools.setup function is the name of your package, and its dependencies should be specified in the argument install_requires. I would imagine it works with the DirectRunner because the package is installed on your local machine.
The Beam juliaset example provides a sample setup.py file:
REQUIRED_PACKAGES = ['numpy']
setuptools.setup(
name='juliaset', # this is their package name
version='0.0.1',
description='Julia set workflow package.',
install_requires=REQUIRED_PACKAGES,
...)
PyPI dependencies
If lxml is your only dependency, or all your dependencies are on PyPI, you should be able to use the much simpler requirements.txt file. In general, the setup.py approach requires much more boilerplate.
To use requirements.txt, freeze your dependencies:
pip freeze > requirements.txt
And pass the requirements.txt file to your pipeline:
--requirements_file requirements.txt
See also the Beam documentation's page for various dependency patterns for Python.

Related

setup.py file using requirements.txt

I've read a discussion where a suggestion was to use the requirements.txt inside the setup.py file to ensure the correct installation is available on multiple deployments without having to maintain both a requirements.txt and the list in setup.py.
However, when I'm trying to do an installation via pip install -e ., I get an error:
Obtaining file:///Users/myuser/Documents/myproject
Processing /home/ktietz/src/ci/alabaster_1611921544520/work
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory:
'/System/Volumes/Data/home/ktietz/src/ci/alabaster_1611921544520/work'
It looks like pip is trying to look for packages that are available on pip (alabaster) on my local machine. Why? What am I missing here? Why isn't pip looking for the required packages on the PyPi server?
I have done it before the other way around, maintaining the setup file and not the requirements file. For the requirements file, just save it as:
*
and for setup, do
from distutils.core import setup
from setuptools import find_packages
try:
from Module.version import __version__
except ModuleNotFoundError:
exec(open("Module/version.py").read())
setup(
name="Package Name",
version=__version__,
packages=find_packages(),
package_data={p: ["*"] for p in find_packages()},
url="",
license="",
install_requires=[
"numpy",
"pandas"
],
python_requires=">=3.8.0",
author="First.Last",
author_email="author#company.com",
description="Description",
)
For reference, my version.py script looks like:
__build_number__ = "_LOCAL_"
__version__ = f"1.0.{__build_number__}"
Which Jenkins is replacing the build_number with a tag
This question consists of two separate questions, for the rather philosopihc choice of how to arrange setup requirements is actually unrelated to the installation error that you are experiencing.
First about the error: It looks like the project you are trying to install depends on another library (alabaster) of which you apparently also did an editable install using pip3 install -e . that points to this directory:
/home/ktietz/src/ci/alabaster_1611921544520/work
What the error tells you is that the directory where the install is supposed to be located does not exist anymore. You should only install your project itself in editable mode, but the dependencies should be installed into a classical system directory, i. e. without the option -e.
To clean up, I would suggest that you do the following:
# clean up references to the broken editable install
pip3 uninstall alabaster
# now do a proper non-editable install
pip3 install alabaster
Concerning the question how to arrange setup requirements, you should primarily use the install_requires and extras_require options of setuptools:
# either in setup.py
setuptools.setup(
install_requires = [
'dep1>=1.2',
'dep2>=2.4.1',
]
)
# or in setup.cfg
[options]
install_requires =
dep1>=1.2
dep2>=2.4.1
[options.extras_require]
extra_deps_a =
dep3
dep4>=4.2.3
extra_deps_b =
dep5>=5.2.1
Optional requirements can be organised in groups. To include such an extra group with the install, you can do pip3 install .[extra_deps_name].
If you wish to define specific dependency environments with exact versions (e. g. for Continuous Integration), you may use requirements.txt files in addition, but the general dependency and version constraint definitions should be done in setup.cfg or setup.py.

pip and tox ignore full path dependencies, instead look for "best match" in pypi

This is an extension of SO setup.py ignores full path dependencies, instead looks for "best match" in pypi
I am trying to write setup.py to install a proprietary package from a .tar.gz file on an internal web site. Unfortunately for me the prop package name duplicates a public package in the public PyPI, so I need to force install of the proprietary package at a specific version. I'm building a docker image from a Debian-Buster base image, so pip, setuptools and tox are all freshly installed, the image brings python 3.8 and pip upgrades itself to version 21.2.4.
Solution 1 - dependency_links
I followed the instructions at the post linked above to put the prop package in install_requires and dependency_links. Here are the relevant lines from my setup.py:
install_requires=["requests", "proppkg==70.1.0"],
dependency_links=["https://site.mycompany.com/path/to/proppkg-70.1.0.tar.gz#egg=proppkg-70.1.0"]
Installation is successful in Debian-Buster if I run python3 setup.py install in my package directory. I see the proprietary package get downloaded and installed.
Installation fails if I run pip3 install . also tox (version 3.24.4) fails similarly. In both cases, pip shows a message "Looking in indexes" then fails with "ERROR: Could not find a version that satisfies the requirement".
Solution 2 - PEP 508
Studying SO answer pip ignores dependency_links in setup.py which states that dependency_links is deprecated, I started over, revised setup.py to have:
install_requires=[
"requests",
"proppkg # https://site.mycompany.com/path/to/proppkg-70.1.0.tar.gz#egg=proppkg-70.1.0"
],
Installation is successful in Debian-Buster if I run pip3 install . in my package directory. Pip shows a message "Looking in indexes" but still downloads and installs the proprietary package successfully.
Installation fails in Debian-Buster if I run python3 setup.py install in my package directory. I see these messages:
Searching for proppkg# https://site.mycompany.com/path/to/proppkg-70.1.0.tar.gz#egg=proppkg-70.1.0
..
Reading https://pypi.org/simple/proppkg/
..
error: Could not find suitable distribution for Requirement.parse(...).
Tox also fails in this scenario as it installs dependencies.
Really speculating now, it almost seems like there's an ordering issue. Tox invokes pip like this:
python -m pip install --exists-action w .tox/.tmp/package/1/te-0.3.5.zip
In that output I see "Collecting proppkg# https://site.mycompany.com/path/to/proppkg-70.1.0.tar.gz#egg=proppkg-70.1.0" as the first step. That install fails because it fails to import package requests. Then tox continues collecting other dependencies. Finally tox reports as its last step "Collecting requests" (and that succeeds). Do I have to worry about ordering of install steps?
I'm starting to think that maybe the proprietary package is broken. I verified that the prop package setup.py has requests in its install_requires entry. Not sure what else to check.
Workaround solution
My workaround is installing the proprietary package in the docker image as a separate step before I install my own package, just by running pip3 install https://site.mycompany.com/path/to/proppkg-70.1.0.tar.gz. The setup.py has the PEP508 URL in install_requires. Then pip and tox find the prop package in the pip cache, and work fine.
Please suggest what to try for the latest pip and tox, or if this is as good as it gets, thanks in advance.
Update - add setup.py
Here's a (slightly sanitized) version of my package's setup.py
from setuptools import setup, find_packages
def get_version():
"""
read version string
"""
version_globals = {}
with open("te/version.py") as fp:
exec(fp.read(), version_globals)
return version_globals['__version__']
setup(
name="te",
version=get_version(),
packages=find_packages(exclude=["tests.*", "tests"]),
author="My Name",
author_email="email#mycompany.com",
description="My Back-End Server",
entry_points={"console_scripts": [
"te-be=te.server:main"
]},
python_requires=">=3.7",
install_requires=["connexion[swagger-ui]",
"Flask",
"gevent",
"redis",
"requests",
"proppkg # https://site.mycompany.com/path/to/proppkg-70.1.0.tar.gz#egg=proppkg-70.1.0"
],
package_data={"te": ["openapi_te.yml"]},
include_package_data=True, # read MANIFEST.in
)

How to add/install python libraries to my github project?

I'm building my first project on GitHub and my python src code uses an open-source, 3rd-party library that I have installed on my computer. However, I heard it is best practice to create a dep (dependencies) folder to store any additional libraries I would need. How do I actually install the libraries in the dep folder and use them from there instead of my main computer?
You have to create a requirements.txt file with each package on a separate line. e.g.
pandas==0.24.2
You also might want to add a setup.py to your python package. In the setup you have to use "install_requires" argument. Although install_requires will not install packages when installing your package but will let the user know which packages are needed. The user can refer to the requirements.txt to see the requirements.
You can check it here: https://packaging.python.org/discussions/install-requires-vs-requirements/
The following is an example of setup.py file:
from distutils.core import setup
from setuptools import find_packages
setup(
name='foobar',
version='0.0',
packages=find_packages(),
url='',
license='',
author='foo bar',
author_email='foobar#gmail.com',
description='A package for ...'
install_requires=['A','B']
)
Never heard about installing additional libraries in a dependencies folder.
Create a setup python file in your root folder if you don't already have it, in there you can define what packages (libraries as you call them) your project needs. This is a simple setup file for example:
from setuptools import setup, find_packages
setup(
name = "yourpackage",
version = "1.2.0",
description = "Simple description",
packages = find_packages(),
install_requires = ['matplotlib'] # Example of external package
)
When installing a package that has this setup file it automatically also install every requirement in your VENV. And if you're using pycharm then it also warns you if there's a requirement that's not installed.

Python package additional name

I'm building a new PyPI package based on an existing open source project using setuptools and add some code modifications (they are not the same).
Example:
opensource-custom=2.13.1
Since this project requires dependencies that will look for opensource
what options can I pass to my setup.py when building my wheel files so when I do pip freeze/pip list I can see both?
opensource-custom=2.13.1
opensource=2.13.0
An example of this scenario is intel-numpy if you do a pip install of it, it will generate a copy of numpy.
>pip install intel-numpy
>pip freeze
icc-rt==2019.0
intel-numpy==1.15.1
intel-openmp==2019.0
mkl==2019.0
mkl-fft==1.0.6
mkl-random==1.0.1.1
numpy==1.15.1
tbb==2019.0
tbb4py==2019.0
It sounds like you want to make opensource a dependency of opensource-custom. To do this, you can specify the install_requires parameter in setup.py:
from setuptools import setup
setup(
name='opensource-custom',
install_requires=[
'opensource',
],
...
)
See https://packaging.python.org/guides/distributing-packages-using-setuptools/#install-requires

How to install a dependency from a submodule in Python?

I have a Python project with the following structure (irrelevant source files omitted for simplicity):
myproject/
mysubmodule/
setup.py
setup.py
The file myproject/setup.py uses distutils.core.setup to install the module myproject and the relevant sources. However, myproject requires mysubmodule to be installed (this is a git submodule). So what I am doing right now is:
myproject/$ cd mysubmodule
myproject/mysubmodule/$ python setup.py install
myproject/mysubmodule/$ cd ..
myproject/$ python setup.py install
This is too tedious for customers, especially if the project will be extended by further submodules in the future.
Is there a way to automate the installation of mysubmodule when calling myproject/setup.py?
setuptools.find_packages() is able to discover submodules
Your setup.py should look like
from setuptools import setup, find_packages
setup(
packages=find_packages(),
# ...
)
Create a package for mysubmodule with its own setup.py and let the top-level package depend on that package in its setup.py. This means you only need to make the packages / dependencies available and run python setup.py install on the top-level package.
The question then becomes how to ship the dependencies / packages to your customers but this can be solved by putting them in a directory and configuring setup.py to include that directory when searching for dependencies.
The alternative is to "vendor" mysubmodule which simply means including it all in one package (no further questions asked) and having one python setup.py install to install the main package. For example, pip vendors (includes) requests so it can use it without having to depend on that requests package.

Categories