Docker pip install creates recursive tmp/pip-build/tmp/pip-build... folder - python

I'm a bit new to Docker and Python apps. I was running into a really perplexing problem, and after a bit of button-pushing, I miraculously solved it. However, I understand neither the problem nor the solution, and want to understand both.
So I have a Dockerfile in the root directory of my app that does something like this:
COPY . .
RUN apt-get update && apt-get install -y enchant \
&& pip install --extra-index-url=${ARTIFACTORY} --no-cache-dir -r requirements.txt \
# install my Python app using setup.py
&& pip install . \
&& python -m app.run_model
ENTRYPOINT ...
This failed consistently because it ran out of disk space. OK, fine, I deleted old, unused images. But I don't think that was the issue, because it kept failing and was also generating these super long, super weird recursive paths like:
/tmp/pip-fih7z5-build/tmp/pip-fih7z5-build/tmp/pip-fih7z5-build/tmp/pip-fih7z5-build/tmp/pip-fih7z5-build/tmp/pip-fih7z5-build/tmp/pip-fih7z5-build/tmp/pip-fih7z5-build/tmp/pip-fih7z5-build/tmp/pip-fih7z5-build/tmp/pip-fih7z5-build/...
Wrapping the Docker command as follows somehow got me past the installation phase:
COPY . /workdir
RUN cd /workdir \
...
&& rm -rf /workdir
(Although it now appears that the app process itself is still failing.) I'm not sure what was/is going on. Does anyone have any insight? My best guess is that the two pip installs were somehow creating some sort of recursive nightmare.
My setup.py is pretty standard, I think:
#!/usr/bin/env python
from glob import glob
from os.path import abspath, basename, dirname, join, splitext

from setuptools import find_packages, setup

here = abspath(dirname(__file__))
with open(join(here, 'README.md')) as f:
    long_description = f.read()

setup(
    ...
    packages=find_packages('src'),
    package_dir={'': 'src'},
    py_modules=[splitext(basename(path))[0] for path in glob('src/*.py')],
    zip_safe=False,
    include_package_data=True,
    install_requires=[
        'scikit-learn==0.18.1',
        'scipy==0.19.1',
        'nltk==3.2.3',
        'requests==2.17.3',
        'jsonschema==2.6.0',
        'pandas==0.20.1',
        'numpy==1.13.0',
        'textblob==0.12.0',
        'textstat==0.3.1',
        'langdetect==1.0.7',
        'unidecode==0.4.20',
        'Flask==0.12.1',
        'Flask-Env==1.0.1',
        'pyenchant==1.6.11'
    ],
    setup_requires=[
        'flake8==3.3.0',
    ],
    tests_require=[],
)
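One plausible explanation, which nothing in this thread confirms: with COPY . . and no WORKDIR, the project root is /, and older pip versions copy a local project directory into a temporary build directory under /tmp before building it. Copying / into /tmp/pip-...-build would mean copying /tmp into itself, which would produce exactly this kind of nested path until the disk fills up. Copying the project to /workdir first removes the overlap. The sketch below only illustrates the overlap check; is_inside is a hypothetical helper, not something pip provides.

```python
from pathlib import PurePosixPath


def is_inside(child: str, parent: str) -> bool:
    """Return True if `child` is `parent` itself or lies somewhere below it."""
    child_path = PurePosixPath(child)
    parent_path = PurePosixPath(parent)
    return child_path == parent_path or parent_path in child_path.parents


# Project copied to "/" (COPY . .): the temp build dir lies inside the source tree.
print(is_inside("/tmp/pip-fih7z5-build", "/"))         # True
# Project copied to "/workdir": the temp build dir is outside the source tree.
print(is_inside("/tmp/pip-fih7z5-build", "/workdir"))  # False
```

If this is indeed the cause, setting a WORKDIR other than / before COPY . . should have the same effect as the cd /workdir wrapper.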

Related

pip install -e . vs setup.py

I have been locally editing (inside a conda env) the package GSTools, cloned from the GitHub repo https://github.com/GeoStat-Framework/GSTools, to adapt it to my own purposes. The package is C++ wrapped in Python (Cython).
I've thus far used pip install -e . in the main package dir for my local changes. But I want to now use their OpenMP support by setting the env variable export GSTOOLS_BUILD_PARALLEL=1 . Then doing pip install -e . I get among other things in the terminal ...
Installing collected packages: gstools
Running setup.py develop for gstools
Successfully installed gstools-1.3.6.dev37
The issue is that nothing actually changed: setup.py (shown below) is supposed to print an "OpenMP used" message if the environment variable GSTOOLS_BUILD_PARALLEL is set to 1 in the Linux terminal, and print something else if it is not set to 1.
Here is setup.py:
# -*- coding: utf-8 -*-
"""GSTools: A geostatistical toolbox."""
import os

import numpy as np
from Cython.Build import cythonize
from extension_helpers import add_openmp_flags_if_available
from setuptools import Extension, setup

# cython extensions
CY_MODULES = [
    Extension(
        name=f"gstools.{ext}",
        sources=[os.path.join("src", "gstools", *ext.split(".")) + ".pyx"],
        include_dirs=[np.get_include()],
        define_macros=[("NPY_NO_DEPRECATED_API", "NPY_1_7_API_VERSION")],
    )
    for ext in ["field.summator", "variogram.estimator", "krige.krigesum"]
]
# you can set GSTOOLS_BUILD_PARALLEL=0 or GSTOOLS_BUILD_PARALLEL=1
if int(os.getenv("GSTOOLS_BUILD_PARALLEL", "0")):
    added = [add_openmp_flags_if_available(mod) for mod in CY_MODULES]
    print(f"## GSTools setup: OpenMP used: {any(added)}")
else:
    print("## GSTools setup: OpenMP not wanted by the user.")

# setup - do not include package data to ignore .pyx files in wheels
setup(ext_modules=cythonize(CY_MODULES), include_package_data=False)
I tried python setup.py install instead, but that gives
UNKNOWN 0.0.0 is already the active version in easy-install.pth
Installed /global/u1/b/benabou/.conda/envs/healpy_conda_gstools_dev/lib/python3.8/site-packages/UNKNOWN-0.0.0-py3.8-linux-x86_64.egg
Processing dependencies for UNKNOWN==0.0.0
Finished processing dependencies for UNKNOWN==0.0.0
and import gstools no longer works correctly.
So how can I install my edited version of the package with OpenMP support?
Developer of GSTools here.
I guess you don't see the printed message because pip suppresses the output of the setup script. You could try making pip verbose with:
GSTOOLS_BUILD_PARALLEL=1 pip install -v -e .
BTW, we are always interested in enhancements. So maybe you are willing to share your edits to GSTools? :-)
Cheers,
Sebastian
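For reference, the check in the setup.py above boils down to parsing the environment variable as an integer, so the variable has to be visible to the process that runs setup.py (for example by prefixing the pip command, as shown). A minimal sketch of that logic, where openmp_requested is a hypothetical helper name and not part of GSTools:

```python
import os


def openmp_requested(environ=None) -> bool:
    """Mirror the setup.py check: any non-zero integer value enables OpenMP."""
    env = os.environ if environ is None else environ
    return bool(int(env.get("GSTOOLS_BUILD_PARALLEL", "0")))


print(openmp_requested({"GSTOOLS_BUILD_PARALLEL": "1"}))  # True
print(openmp_requested({"GSTOOLS_BUILD_PARALLEL": "0"}))  # False
print(openmp_requested({}))                               # False
```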

How to install requirements.txt and ignore not found models [duplicate]

I am installing packages from requirements.txt
pip install -r requirements.txt
The requirements.txt file reads:
Pillow
lxml
cssselect
jieba
beautifulsoup
nltk
lxml is the only package failing to install and this leads to everything failing (expected results as pointed out by larsks in the comments). However, after lxml fails pip still runs through and downloads the rest of the packages.
From what I understand the pip install -r requirements.txt command will fail if any of the packages listed in the requirements.txt fail to install.
Is there any argument I can pass when running pip install -r requirements.txt to tell it to install what it can and skip the packages that it cannot, or to exit as soon as it sees something fail?
Running each line with pip install may be a workaround.
cat requirements.txt | xargs -n 1 pip install
Note: -a parameter is not available under MacOS, so old cat is more portable.
This solution handles empty lines, whitespace lines, # comment lines, whitespace-then-# comment lines in your requirements.txt.
cat requirements.txt | sed -e '/^\s*#.*$/d' -e '/^\s*$/d' | xargs -n 1 pip install
Hat tip to this answer for the sed magic.
For windows users, you can use this:
FOR /F %k in (requirements.txt) DO ( if NOT # == %k ( pip install %k ) )
Logic: for every dependency in the file (requirements.txt), install it, ignoring lines that start with "#".
For Windows:
pip version >= 18:
import sys
from pip._internal import main as pip_main

def install(package):
    pip_main(['install', package])

if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        for line in f:
            # strip the trailing newline and skip blank lines
            if line.strip():
                install(line.strip())
pip version < 18:
import sys
import pip

def install(package):
    pip.main(['install', package])

if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        for line in f:
            # strip the trailing newline and skip blank lines
            if line.strip():
                install(line.strip())
The xargs solution works but can have portability issues (BSD/GNU) and/or be cumbersome if you have comments or blank lines in your requirements file.
As for the use case where such behavior would be required: I use, for instance, two separate requirements files, one listing only core dependencies that always need to be installed, and another with non-core dependencies that are not needed in 90% of cases. That would be the equivalent of the Recommends section of a Debian package.
I use the following shell script (requires sed) to install optional dependencies:
#!/bin/bash
# bash is required because of the [[ ... ]] test below
while read dependency; do
    dependency_stripped="$(echo "${dependency}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
    # Skip comments
    if [[ $dependency_stripped == \#* ]]; then
        continue
    # Skip blank lines
    elif [ -z "$dependency_stripped" ]; then
        continue
    else
        if pip install "$dependency_stripped"; then
            echo "$dependency_stripped is installed"
        else
            echo "Could not install $dependency_stripped, skipping"
        fi
    fi
done < recommends.txt
Building on the answer by @MZD, here's a solution that filters out all text starting with the comment sign #:
cat requirements.txt | grep -Eo '(^[^#]+)' | xargs -n 1 pip install
For Windows using PowerShell:
foreach($line in Get-Content requirements.txt) {
    if(!($line.StartsWith('#'))){
        pip install $line
    }
}
One line PowerShell:
Get-Content .\requirements.txt | ForEach-Object {pip install $_}
If you need to ignore certain lines then:
Get-Content .\requirements.txt | ForEach-Object {if (!$_.startswith("#")){pip install $_}}
OR
Get-Content .\requirements.txt | ForEach-Object {if ($_ -notmatch "#"){pip install $_}}
Thanks to Etienne Prothon for the Windows cases.
However, since pip 18 the pip package no longer exposes main publicly, so you may need to change the code like this:
# This code installs a list of pip packages line by line
import sys
from pip._internal import main as pip_main

def install(package):
    pip_main(['install', package])

if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        for line in f:
            install(line)
Another option is to use pip install --dry-run to get a list of packages that you need to install and then keep trying it and remove the ones that don't work.
A very general solution
The following command installs all requirements and:
handles multiple requirements files (requirements1.txt, requirements2.txt)
ignores comment lines starting with #
skips packages which are not installable
runs pip install for each line (not each word, as in some other answers)
$ (cat requirements1.txt; echo ""; cat requirements2.txt) | grep "^[^#]" | xargs -L 1 pip install
For Windows:
import os
from pip.__main__ import _main as main

error_log = open('error_log.txt', 'w')

def install(package):
    try:
        main(['install'] + [str(package)])
    except Exception as e:
        error_log.write(str(e))

if __name__ == '__main__':
    f = open('requirements1.txt', 'r')
    for line in f:
        install(line)
    f.close()
    error_log.close()
Create a local directory, and put your requirements.txt file in it.
Copy the code above and save it as a python file in the same directory. Remember to use .py extension, for instance, install_packages.py
Run this file using a cmd: python install_packages.py
All the packages mentioned will be installed in one go without stopping at all. :)
You can add other parameters in the install function, for example:
main(['install'] + [str(package)] + ['--upgrade'])
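Several of the answers above import pip's internals, which have moved between pip releases. A version-independent alternative is to run pip as a subprocess of the current interpreter, one requirement at a time. The following is a minimal sketch of that idea; iter_requirements and install_each are hypothetical helper names, and inline comments after a requirement (such as "lxml  # parser") are not handled.

```python
import subprocess
import sys


def iter_requirements(lines):
    """Yield requirement strings, skipping blank lines and '#' comment lines."""
    for raw in lines:
        line = raw.strip()
        if line and not line.startswith("#"):
            yield line


def install_each(path):
    """Install every requirement on its own; return the ones that failed."""
    failed = []
    with open(path) as handle:
        for requirement in iter_requirements(handle):
            # Run pip with the same interpreter that runs this script.
            result = subprocess.run(
                [sys.executable, "-m", "pip", "install", requirement]
            )
            if result.returncode != 0:
                failed.append(requirement)
    return failed
```

Calling install_each("requirements.txt") would then install what it can and return the list of requirements that could not be installed.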

How to install dependencies from requirements.txt in a Yocto recipe for a local Python project

What I should have:
I want my Yocto Project to build a package for my Python project with all dependencies inside. The project has to run out of box on the resulting read-only sdcard image.
It simply should install all requirements in the required version to the package.
What I tried without luck:
Calling pip in do_install():
"pip/pip3 is not found", even though it's in RDEPENDS.
Anyway, I would really prefer this way.
With inherit pypi:
When trying inherit pypi, it also tries to fetch my local sources (my Python project) from PyPI. And I always have to copy the requirements into the recipe. This is not my preferred way.
Calling pip in pkg_postinst():
It tries to install the modules on first start and fails, because the system has no internet connection and is read-only. It must run out of the box without any installation at first boot. This happens too late.
What I want to avoid:
Having to change anything in the recipes when something changes in requirements.txt.
Background information
I'm working with Yocto Rocko in a Linux environment.
On the host system, pip is not installed. I want to run the one installed via RDEPENDS on the target system.
Building the Package (only this recipe) with:
bitbake myproject
Building the whole sdcard image:
bitbake myProject-image-base
The recipe:
myproject.bb (relevant lines):
RDEPENDS_${PN} = "python3 python3-pip"
APP_SOURCES_DIR := "${@os.path.abspath(os.path.dirname(d.getVar('FILE', True)) + '/../../../../app-sources')}"
FILESEXTRAPATHS_prepend := "${THISDIR}/files:"
SRC_URI = " \
file://${APP_SOURCES_DIR}/myProject \
...
"
inherit allarch # tried also with pypi and setuptools3 for the pypi way.
do_install() { # Line 116
install -d -m 0755 ${D}/myProject
cp -R --no-dereference --preserve=mode,links -v ${APP_SOURCES_DIR}/myProject/* ${D}/myProject/
pip3 install -r ${APP_SOURCES_DIR}/myProject/requirements.txt
# Tried also python ${APP_SOURCES_DIR}/myProject/setup.py install
}
# Tried also this, but it's no option because the data MUST be included in the Package:
# pkg_postinst_${PN}() {
# #!/bin/sh -e
# pip3 install -r /myProject/requirements.txt
# }
FILES_${PN} = "/myProject/*"
Resulting Errors:
Expected to install the listed modules from requirements.txt into the myProject package, so that the python app will run directly on the resulting readonly sdcard image.
With pip, I get:
| /*/tmp/work/*/myProject/0.1.0-r0/temp/run.do_install: 116: pip3: not found
| WARNING: exit code 127 from a shell command.
| ERROR: Function failed: do_install ...
When using pypi:
404 Not Found
ERROR: myProject-0.1.0-r0 do_fetch: Fetcher failure for URL: 'https://files.pythonhosted.org/packages/source/m/myproject/myproject-0.1.0.tar.gz'. Unable to fetch URL from any source.
=> But it should not fetch myProject, since it is already local and nowhere remote.
Any ideas? What would be the best way to get to a ready-to-use sdcard image without needing to change recipes when requirements.txt changes?
You should use RDEPENDS_${PN} to take care of your dependencies for your app in the recipe.
For example, assuming your python app needs aws-iot-device-sdk-python module, you should add it to RDEPENDS in the recipe. In your case, it would be like this:
RDEPENDS_${PN} = "python3 \
python3-pip \
python3-aws-iot-device-sdk-python \
"
Here's the link showing the Python modules supported by OpenEmbedded Layer.
https://layers.openembedded.org/layerindex/branch/master/layer/meta-python/
If the modules you need are not there, you will likely need to create recipes for the modules.
My newest findings:
Yocto/BitBake seems to suppress interpreting the requirements, because doing so would break automatic dependency resolution, which could lead to conflicts.
Reason: the modules required by setup.py would not be stored as independent packages, but as part of my package. So BitBake would not know about these modules, which could conflict with other packages that may require the same modules in different versions.
What was in my recipe:
MY_INSTALL_ARGS = "--root=${D} \
--prefix=${prefix} \
--install-lib=${PYTHON_SITEPACKAGES_DIR} \
--install-data=${datadir}"
do_install() {
PYTHONPATH=${PYTHON_SITEPACKAGES_DIR} \
${STAGING_BINDIR_NATIVE}/${PYTHON_PN}-native/${PYTHON_PN} setup.py install ${MY_INSTALL_ARGS}
}
If I execute this outside of BitBake as python3 setup.py install ${MY_INSTALL_ARGS}, everything is installed correctly, but in the recipe, no requirements are installed.
There is a parameter --no-deps, but I didn't find where it is set.
I think there could be one way to make use of the requirements from setup.py:
Find out where to disable --no-deps in the openembedded/poky layer for easy_install.
Create a separate PYTHON_SITEPACKAGES_DIR.
Install this separate PYTHON_SITEPACKAGES_DIR, e.g. in the home directory, as a private Python modules dir.
This way, no Python module would trigger a conflict.
Since I do not have the time to experiment with this, I'll now define one recipe per requirement.
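To keep such an RDEPENDS list in sync with requirements.txt, the list could be generated by a small script. The sketch below assumes that each requirement has a recipe named python3-<lowercased distribution name>, which is a common naming pattern but not a guarantee; to_rdepends is a hypothetical helper, version pins are dropped, and every generated name still has to be checked against the available layers.

```python
import re


def to_rdepends(requirement_lines, prefix="python3-"):
    """Map requirements.txt lines to assumed Yocto package names."""
    names = []
    for raw in requirement_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Keep only the distribution name, dropping version specifiers and extras.
        name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
        names.append(prefix + name.lower().replace("_", "-"))
    return " ".join(names)


print(to_rdepends(["requests==2.17.3", "# comment", "Flask-Env>=1.0"]))
# python3-requests python3-flask-env
```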
Have you tried installing pip?
Debian
apt-get install python-pip
apt-get install python3-pip
CentOS
yum install python-pip

File not found even after adding the file inside docker

I have written a Dockerfile which adds my Python script to the container:
ADD test_pclean.py /test_pclean.py
My directory structure is:
.
├── Dockerfile
├── README.md
├── pipeline.json
└── test_pclean.py
My json file which acts as a configuration file for creating a pipeline in Pachyderm is as follows:
{
"pipeline": {
"name": "mopng-beneficiary-v2"
},
"transform": {
"cmd": ["python3", "/test_pclean.py"],
"image": "avisrivastava254084/mopng-beneficiary-v2-image-7"
},
"input": {
"atom": {
"repo": "mopng_beneficiary_v2",
"glob": "/*"
}
}
}
Even though I have copied the official documentation's example, I am facing an error:
python3: can't open file '/test_pclean.py': [Errno 2] No such file or directory
My dockerfile is:
FROM debian:stretch
# Install opencv and matplotlib.
RUN apt-get update \
&& apt-get upgrade -y \
&& apt-get install -y unzip wget build-essential \
cmake git pkg-config libswscale-dev \
&& apt-get clean \
&& rm -rf /var/lib/apt
RUN apt update
RUN apt-get -y install python3-pip
RUN pip3 install matplotlib
RUN pip3 install pandas
ADD test_pclean.py /test_pclean.py
ENTRYPOINT [ "/bin/bash/" ]
As some of the comments above suggest, it looks like your test_pclean.py file isn't in the Docker image. Here's what should fix it.
Make sure your test_pclean.py file is in your Docker image by including it as part of the build process. Put this as the last step in your Dockerfile:
COPY test_pclean.py .
Ensure that your pachyderm pipeline spec has the following for the cmd portion:
"cmd": ["python3", "./test_pclean.py"]
And this is more of a suggestion than a requirement: you'll make life easier for yourself if you use image tags as part of your docker build. If you default to the latest tag, any future iterations/builds of this step in your pipeline could have negative effects (new bugs in your code, etc.). Therefore the best practice is to use a particular version in your pipeline: mopng-beneficiary-v2-image-7:v1, mopng-beneficiary-v2-image-7:v2 and so on. That way you can iterate on, say, version 3 and it won't affect the already running pipeline.
docker build -t avisrivastava254084/mopng-beneficiary-v2-image-7:v1 .
Then just update your pipeline spec to use avisrivastava254084/mopng-beneficiary-v2-image-7:v1
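As a quick way to catch an untagged image before deploying, the pipeline spec can be checked with a few lines of Python. This is only a sketch: has_explicit_tag is a hypothetical helper, and the embedded JSON is a trimmed copy of the spec from the question.

```python
import json


def has_explicit_tag(image: str) -> bool:
    """Return True if the image reference carries a tag or a digest."""
    if "@" in image:  # digest form, e.g. name@sha256:...
        return True
    # Only look after the last "/", so a registry port such as host:5000 is ignored.
    return ":" in image.rsplit("/", 1)[-1]


spec = json.loads("""
{
  "transform": {
    "cmd": ["python3", "/test_pclean.py"],
    "image": "avisrivastava254084/mopng-beneficiary-v2-image-7"
  }
}
""")
image = spec["transform"]["image"]
print(has_explicit_tag(image))          # False
print(has_explicit_tag(image + ":v1"))  # True
```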
I was not changing the commits of my Docker images on each build, and hence Kubernetes was using the local Docker image that it already had (without tags and commits, it doesn't register any change). Once I started using a commit with each build, Kubernetes started downloading the intended Docker image.

ImportError: No module named rfc822 pyramid python

When I attempt to run pyramid
[~/env/MyStore]# ../bin/pserve development.ini
It will show the following error
File "/home/vretinfo/env/lib/python3.2/site-packages/Paste-1.7.5.1-py3.2.egg/paste/fileapp.py", line 14, in <module>
from paste.httpheaders import *
File "/home/vretinfo/env/lib/python3.2/site-packages/Paste-1.7.5.1-py3.2.egg/paste/httpheaders.py", line 140, in <module>
from rfc822 import formatdate, parsedate_tz, mktime_tz
ImportError: No module named rfc822
How should I resolve this?
This is what I did to install
$ mkdir opt
$ cd opt
$ wget http://python.org/ftp/python/3.2.3/Python-3.2.3.tgz
$ tar -xzf Python-3.2.3.tgz
$ cd Python-3.2.3
./configure --prefix=$HOME/opt/Python-3.2.3
$ make;
$ make install
$ cd ~
$ wget http://python-distribute.org/distribute_setup.py
$ pico distribute_setup.py
* change first line to opt/Python-3.2.3/python
$ opt/Python-3.2.3/bin/python3.2 distribute_setup.py
$ opt/Python-3.2.3/bin/easy_install virtualenv
$ opt/Python-3.2.3/bin/virtualenv --no-site-packages env
$ cd env
$ ./bin/pip install passlib
$ ./bin/pip install pyramid_beaker
$ ./bin/pip install pyramid_mailer
$ ./bin/pip install pyramid_mongodb
$ ./bin/pip install pyramid_jinja2
$ ./bin/pip install Werkzeug
$ ./bin/pip install pyramid
$ ./bin/pcreate -s pyramid_mongodb MyShop
$ cd MyShop
$ ../bin/python setup.py develop
$ ../bin/python setup.py test -q
OK, I've done some searching in the Pyramid docs ( http://docs.pylonsproject.org/projects/pyramid/en/latest/narr/paste.html ).
The third paragraph states:
"However, all Pyramid scaffolds render PasteDeploy configuration files, to provide new developers with a standardized way of setting deployment values, and to provide new users with a standardized way of starting, stopping, and debugging an application."
So I changed development.ini so that the server section reads
[server:main]
use = egg:waitress#main
and in setup.py, I added 'waitress' to the requires array.
Next, I removed everything related to Paste in /home/vretinfo/env/ECommerce/:
$ rm -rf Paste*;rm -rf paste*
After this, I tried running test -q again, this is the stack trace:
[~/env/ECommerce]# ../bin/python setup.py test -q
/home/vretinfo/opt/Python-3.2.3/lib/python3.2/distutils/dist.py:257: UserWarning: Unknown distribution option: 'paster_plugins'
warnings.warn(msg)
running test
Checking .pth file support in .
/home/vretinfo/env/ECommerce/../bin/python -E -c pass
Searching for Paste>=1.7.1
Reading http://pypi.python.org/simple/Paste/
Reading http://pythonpaste.org
Best match: Paste 1.7.5.1
Downloading http://pypi.python.org/packages/source/P/Paste/Paste-1.7.5.1.tar.gz#md5=7ea5fabed7dca48eb46dc613c4b6c4ed
Processing Paste-1.7.5.1.tar.gz
Writing /tmp/easy_install-q5h5rn/Paste-1.7.5.1/setup.cfg
Running Paste-1.7.5.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-q5h5rn/Paste-1.7.5.1/egg-dist-tmp-e3nvmj
warning: no previously-included files matching '*' found under directory 'docs/_build/_sources'
It seems that Paste is needed by Pyramid 1.4 for some reason. Perhaps someone has some insight into this.
I have managed to solve this issue with help from the folks in IRC #pyramid. I'm posting the solution here in case someone encounters it in the future. I've tested this with Python 3.2, and it works now.
After running ./bin/pcreate -s <...>, edit development.ini in the project folder:
1. In the first line, rename [app:<Project>] to [app:main].
2. Under [server:main], if the server is egg:Paste#http, change it to
use = egg:waitress#main
3. Remove [pipeline:main] and its section.
In the same folder, make the following changes to setup.py:
1. In requires = [....], add waitress to the array and remove WebError from it.
2. Remove paster_plugins=['pyramid'].
Finally, run
$ ../bin/python setup.py develop
Paste will no longer be installed or checked for.
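Some background on the original ImportError: the rfc822 module exists only in Python 2 and was removed in Python 3, which is why Paste 1.7.5.1 fails on import under Python 3.2. The three functions named in the traceback are available from email.utils in Python 3, as this small check illustrates (the timestamp is just an example value):

```python
# Python 3 location of: from rfc822 import formatdate, parsedate_tz, mktime_tz
from email.utils import formatdate, mktime_tz, parsedate_tz

stamp = "Thu, 01 Jan 1970 00:00:00 GMT"
seconds = mktime_tz(parsedate_tz(stamp))
print(seconds)                           # 0
print(formatdate(seconds, usegmt=True))  # Thu, 01 Jan 1970 00:00:00 GMT
```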
