Cannot load CLoader with pyyaml - python

I'm working on a python project using pyyaml. I need to run it in a Docker container based on bitnami/minideb:jessie. Python version is 2.7.9.
The original code is using CLoader and I cannot change it currently.
Any reason CLoader fails to load but Loader is fine ?
>>> import yaml
>>> yaml.__version__
'3.12'
>>> from yaml import Loader
>>> from yaml import CLoader
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name CLoader
>>>
I cannot figure out what I'm missing here. Any idea ?
Running it from the Docker image python:2.7.9 does not raise any error then:
$ docker run -ti python:2.7.9 bash
#/ python
>>> from yaml import CLoader
>>> from yaml import Loader
>>>

By default, the setup.py script checks whether LibYAML is installed
and if so, builds and installs LibYAML bindings.
This is the minimum to get CLoader compiled and installed.
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y \
python3 python3-dev python3-pip gcc libyaml-dev
RUN pip3 install pyyaml
# verify
RUN python3 -c "import yaml; yaml.CLoader"

I ran into the same problem. You need to install the libyaml-dev package, then install libyaml and pyyaml from source. Here's the complete Dockerfile for minideb:jessie:
FROM bitnami/minideb:jessie
RUN apt-get update
RUN apt-get install -y \
automake \
autoconf \
build-essential \
git-core \
libtool \
libyaml-dev \
make \
python \
python-dev \
python-pip
RUN pip install --upgrade pip
RUN pip install Cython==0.29.10
RUN mkdir /libyaml
WORKDIR /libyaml
RUN git clone https://github.com/yaml/libyaml.git . && \
git checkout dist-0.2.2 && \
autoreconf -f -i && \
./configure && \
make && \
make install
RUN mkdir /pyyaml
WORKDIR /pyyaml
RUN git clone https://github.com/yaml/pyyaml.git . && \
git checkout 5.1.1 && \
python setup.py install
RUN python -c "import yaml; from yaml import CLoader; print 'Loaded CLoader!'"

A couple of additions to others' solutions:
If you want the install command to hard-fail if the libyaml C extension won't build (instead of silently falling back to a pure-Python only install), you can pass the --with-libyaml global option, eg: python setup.py --with-libyaml install.
If you're doing this with something that might ever need to be upgraded (eg implicitly via another package's requirement for a higher pyyaml version), it's better to use pip instead of directly calling setup.py, as that (currently) uses a pure distutils installation, which pip will fail to uninstall later. You'll see an error like "ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall."
Doing the required extension build with pip looks something like pip install --global-option='--with-libyaml' pyyaml.

I'm just copying the developer's answer from the issue linked above, but this happens because pyyaml only installs the libyaml bindings (CLoader & co.) if it finds the libyaml-dev package (that's the debian package, anyway) at install time. If it doesn't find it, it prints a warning and skips the libyaml bindings.
So, install libyaml-dev before installing pyyaml.

I tried all the step mentions, and the following steps fixed my issue.
Install
apt-get install -y gcc libyaml-dev
pip install --ignore-installed --global-option='--with-libyaml' pyyaml
Test
python -c "import yaml; yaml.CLoader"

Related

How do I install a specific python version without pyenv within nvidia docker so that it interacts well with poetry?

I am trying to build a docker image that contains cuda, cudnn and python, each with specific versions that are templatable as a base for downstream users.
(In this example I have replace all the irrelevant templating with hard-coded versions, this is just FYI as a motivation).
Please note that the following questions are not duplicates:
How to install python in a docker image? does not involve poetry
Integrating Python Poetry with Docker Does not concern itself with installing dependencies
How do I integrate pyenv, poetry, and docker? This works for me already, I am looking for a different solution
I have achieved what I want using pyenv to install the specific python version within docker inside the nvidia image.
However, this solution is not optimal since the resulting image is about 1.5GB larger than what I think should be possible. Sidenote: I know that there are other ways to reduce the image size further that I have not done in this example. This is not the question here.
I have prepared a dummy pyproject.toml and poetry.lock to demonstrate the issue that I am currently facing:
pyproject.toml
[tool.poetry]
name = "example_project"
version = "1.0.0"
description = ""
authors = ["RunOrVeith"]
[tool.poetry.dependencies]
python = ">=3.8,<3.11"
scipy = "^1.9.3"
[build-system]
requires = ["poetry-core>=1.1.0"]
build-backend = "poetry.core.masonry.api"
Working Dockerfile.pyenv
FROM nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04 as base
ARG PYTHON_VERSION=3.8
ENV DEBIAN_FRONTEND=noninteractive
# Set-up necessary Env vars for PyEnv
ENV PYENV_ROOT /root/.pyenv
ENV PATH $PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH
ENV PATH="/root/.local/bin/:$PATH"
# Install essentials for pyenv https://github.com/pyenv/pyenv/wiki
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget ca-certificates \
curl llvm libncurses5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev mecab-ipadic-utf8 git \
&& rm -rf /var/lib/apt/lists/*
# Install pyenv
RUN set -ex \
&& curl https://pyenv.run | bash \
&& pyenv update \
&& pyenv install $PYTHON_VERSION \
&& pyenv global $PYTHON_VERSION \
&& pyenv rehash \
&& pip install --upgrade pip
# Install poetry
RUN curl -sSL https://install.python-poetry.org | python - \
&& poetry --version && poetry config virtualenvs.create false
FROM base as example # The template that I want to provide ends here, this is just for demoing the issue
WORKDIR /app
COPY pyproject.toml .
COPY poetry.lock .
RUN poetry install --no-interaction --no-ansi
The version that doesn't work Dockerfile.plain
FROM nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04 as base
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHON_VERSION=3.8
ENV PATH="/root/.local/bin/:$PATH"
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys A4B469963BF863CC \
&& apt update \
&& apt install -y git curl \
&& apt install -y --no-install-recommends make build-essential
# Don't be confused, distutils-3.9 also installs python 3.8 https://github.com/deadsnakes/issues/issues/150
RUN apt install -y --no-install-recommends python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python${PYTHON_VERSION}-distutils python${PYTHON_VERSION}-venv \
&& update-alternatives --install /usr/bin/python python /usr/bin/python${PYTHON_VERSION} 10 \
&& update-alternatives --install /usr/bin/python3 python3 /usr/bin/python${PYTHON_VERSION} 10 \
&& apt-get install -y --no-install-recommends python3-pip python3-setuptools \
&& update-alternatives --install /usr/local/bin/pip pip /usr/bin/pip 10 \
&& update-alternatives --install /usr/local/bin/pip3 pip3 /usr/bin/pip 10 \
&& apt-get clean
WORKDIR /virtualenvs
RUN curl -sSL https://install.python-poetry.org | python${PYTHON_VERSION} - \
&& poetry --version && poetry config virtualenvs.create false
FROM base as example
WORKDIR /app
COPY pyproject.toml .
COPY poetry.lock .
RUN poetry install --no-interaction --no-ansi
You can build this using
DOCKER_BUILDKIT=1 docker build -t github:example-plain --target example -f Dockerfile.plain .
and then run using
docker run -it github:example-plain bash
Here is the issue:
All the following commands are run from within the docker image.
According to poetry, everything is installed:
root#5e1ffb1f971c:/app# poetry show
Skipping virtualenv creation, as specified in config file.
numpy 1.23.4 NumPy is the fundamental package for array computing with Python.
scipy 1.9.3 Fundamental algorithms for scientific computing in Python
root#5e1ffb1f971c:/app# poetry run pip --version
Skipping virtualenv creation, as specified in config file.
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)
However using regular pip, there is nothing, and imports also fail.
If I use poetry to import something, it also does not work.
root#5e1ffb1f971c:/app# pip --version
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)
root#5e1ffb1f971c:/app# pip freeze
root#5e1ffb1f971c:/app# python -c "import scipy"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'scipy'
root#5e1ffb1f971c:/app# poetry run python -c "import scipy"
Skipping virtualenv creation, as specified in config file.
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'scipy'
What is also interesting is that if I upgrade pip with poetry it tells me it can't uninstall pip, I am assuming this is due to this ubuntu patch that tries to prevent me from breaking the system (even though I just install pip).
Afterwards, the poetry pip executable also points somewhere else.
root#5e1ffb1f971c:/app# poetry run pip install --upgrade pip
Skipping virtualenv creation, as specified in config file.
Collecting pip
Using cached pip-22.3.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 20.0.2
Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
Can't uninstall 'pip'. No files were found to uninstall.
Successfully installed pip-22.3.1
root#5e1ffb1f971c:/app# poetry run pip --version
Skipping virtualenv creation, as specified in config file.
pip 22.3.1 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8)
So how do I set this up so that I get a fresh python install of whichever version I configure, and it works with poetry? It is also required that the python and python3 aliases point to whatever poetry is using.
Reference with working version:
If I do the same commands with the working version using pyenv, it looks like this:
root#c0a9af7f05b4:/app# pip freeze
numpy==1.23.4
scipy==1.9.3
root#c0a9af7f05b4:/app# poetry show
Skipping virtualenv creation, as specified in config file.
numpy 1.23.4 NumPy is the fundamental package for array computing with Python.
scipy 1.9.3 Fundamental algorithms for scientific computing in Python
root#c0a9af7f05b4:/app# poetry run pip --version
Skipping virtualenv creation, as specified in config file.
pip 22.3.1 from /root/.pyenv/versions/3.8.15/lib/python3.8/site-packages/pip (python 3.8)
root#c0a9af7f05b4:/app# pip --version
pip 22.3.1 from /root/.pyenv/versions/3.8.15/lib/python3.8/site-packages/pip (python 3.8)

No module named PyInstaller' after what appears to be a successful install

I am building a docker image. Within it I am trying to install a number of python packages within one RUN. All packages within that command are installed correctly, but PyInstaller is not for some reason, although the build logs make me think that it should have been: Successfully installed PyInstaller
The minimal Dockerfile to reproduce the issue:
FROM debian:buster
RUN apt-get update && \
apt-get install -y \
python3 \
python3-pip \
unixodbc-dev
RUN python3 -m pip install --no-cache-dir pyodbc==4.0.30 && \
python3 -m pip install --no-cache-dir Cython==0.29.19 && \
python3 -m pip install --no-cache-dir PyInstaller==3.5 && \
python3 -m pip install --no-cache-dir selenium==3.141.0 && \
python3 -m pip install --no-cache-dir bs4==0.0.1
RUN python3 -m PyInstaller
The last run command fails with /usr/bin/python3: No module named PyInstaller, all other packages can be imported as expected.
The issue is also reproducible with this Dockerfile:
FROM debian:buster
RUN apt-get update && \
apt-get install -y \
python3 \
python3-pip
RUN python3 -m pip install --no-cache-dir PyInstaller==3.5
RUN python3 -m PyInstaller
What is the reason for this issue and what is the fix?
EDIT:
When I run the layer before the last RUN, I can see that no PyInstaller is installed, but I can run python3 -m pip install --no-cache-dir PyInstaller==3.5 and then it works without changing anything else.
Although I do not fully undestand the reason behind it, it seems like the --no-cache-dir option was causing the issue. The dockerfile below builds without an issue:
FROM debian:buster
RUN apt-get update && \
apt-get install -y \
python3 \
python3-pip
RUN python3 -m pip install PyInstaller==3.5
RUN python3 -m PyInstaller --help
Edit: This seems to be an issue outside of PyInstaller, but with the specific version of pip, see https://github.com/pyinstaller/pyinstaller/issues/6963 for details.
I'm not familiar with PyInstaller but in their requirements page they wrote:
If the pip setup fails to build a bootloader, or if you do not use pip
to install, you must compile a bootloader manually. The process is
described under Building the Bootloader.
Have you try that in your Dockerfile?
(And you're totally right, it should fail... )

Minimal SciPy Dockerfile

I have a Dockerfile like the following, app code is omitted:
FROM python:3
# Binary dependencies
RUN apt update && apt install -y gfortran libopenblas-dev liblapack-dev
# Wanted Python packages
RUN python3 -m pip install mysqlclient numpy scipy pandas matplotlib
It works fine but produces an image of 1.75 GB in size (while code is about 50 MB). How can I reduce such huge volume??
I also tried to use Alpine Linux, like this:
FROM python:3-alpine
# Binary dependencies for numpy & scipy; though second one doesn't work anyway
RUN apk add --no-cache --virtual build-dependencies \
gfortran gcc g++ libstdc++ \
musl-dev lapack-dev freetype-dev python3-dev
# For mysqlclient
RUN apk --no-cache add mariadb-dev
# Wanted Python packages
RUN python3 -m pip install mysqlclient numpy scipy pandas matplotlib
But Alpine leads to many different strange errors. Error from the upper code:
File "scipy/odr/setup.py", line 28, in configuration
blas_info['define_macros'].extend(numpy_nodepr_api['define_macros'])
KeyError: 'define_macros'
So, how one can get minimal possible (or at least just smaller) image of Python 3 with mentioned packages?
There are several things you can do to make your Docker image smaller.
Use the python:3-slim Docker image as a base. The -slim images do not include packages needed for compiling software.
Pin the Python version, let's say to 3.8. Some packages do not have wheel files for python 3.9 yet, so you might have to compile them. It is good practice, in general, to use a more specific tag because the python:3-slim tag will point to different versions of python at different points in time.
You can also omit the installation of gfortran, libopenblas-dev, and liblapack-dev. Those packages are necessary for building numpy/scipy, but if you install the wheel files, which are pre-compiled, you do not need to compile any code.
Use --no-cache-dir in pip install to disable the cache. If you do not include this, then pip's cache counts toward the Docker image size.
There are no linux wheels for mysqlclient, so you will have to compile it. You can install build dependencies, install the package, then remove build dependencies in a single RUN instruction. Keep in mind that libmariadb3 is a runtime dependency of this package.
Here is a Dockerfile that implements the suggestions above. It makes a Docker image 354 MB large.
FROM python:3.8-slim
# Install mysqlclient (must be compiled).
RUN apt-get update -qq \
&& apt-get install --no-install-recommends --yes \
build-essential \
default-libmysqlclient-dev \
# Necessary for mysqlclient runtime. Do not remove.
libmariadb3 \
&& rm -rf /var/lib/apt/lists/* \
&& python3 -m pip install --no-cache-dir mysqlclient \
&& apt-get autoremove --purge --yes \
build-essential \
default-libmysqlclient-dev
# Install packages that do not require compilation.
RUN python3 -m pip install --no-cache-dir \
numpy scipy pandas matplotlib
Using alpine linux was a good idea, but alpine uses muslc instead of glibc, so it is not compatible with most pip wheels. The result is that you would have to compile numpy/scipy.

Docker build fails if pip is installed and updated to 10.0.1 in a single RUN section

How would you explain docker build failure with Dockerfile1, and it's success with Dockerfile2 (see below).
1)
// Dockerfile1
FROM ubuntu:16.04
RUN apt-get -y update && \
apt-get -y install python-pip python-dev build-essential && \
pip install --upgrade pip && \
pip install --upgrade virtualenv
docker build . fails with the following err
Collecting pip
Downloading
https://files.pythonhosted.org/packages/0f/74/ecd13431bcc456ed390b44c8a6e917c1820365cbebcb6a8974d1cd045ab4/pip-10.0.1-py2.py3-none-any.whl (1.3MB)
Installing collected packages: pip
Found existing installation: pip 8.1.1
Not uninstalling pip at /usr/lib/python2.7/dist-packages, outside
environment /usr
Successfully installed pip-10.0.1
Traceback (most recent call last):
File "/usr/bin/pip", line 9, in <module>
from pip import main
ImportError: cannot import name main
The command '/bin/sh -c apt-get -y update && apt-get -y install
python-pip python-dev build-essential && pip install --upgrade pip && pip install --upgrade virtualenv && virtualenv /venv' returned a non-zero code: 1
However, it succeeds if we split it into two RUN.
2)
// Dockerfile2
FROM ubuntu:16.04
RUN apt-get -y update && \
apt-get -y install python-pip python-dev build-essential && \
pip install --upgrade pip
RUN pip install --upgrade virtualenv
The installation failure for pip is related to this reported issue. So my questions:
Why does docker build fail in the first case? If we just run those command in bash, there wont be any error.
Why does docker build succeed in the second case? How is it related to layering concept in docker?
Why specifying pip version in Dockerfile1 (i.e. pip install --upgrade pip=0.9.3) solves the problem too?
Update (May 6, 2018):
I've figured out the issue. What happens here is as below:
apt-get -y install python-pip installs an old version of pip whose shim script import pip's main directly.
pip install --upgrade pip installs pip 10.0.1 and moves main into an internal directory _internal. It adds its shim script to PATH.
Calling pip fails as it still calls the old shim script as it's path is cached. Running hash -d pip in between fixes the issue.
So apparently, splitting install and update into two RUN sections has similar effect as hash -d pip. Workarounds (also suggested by Andriy Maletsky) are 1) pin pip update to 9.0.3, or 2) install (latest) pip from source in the first place, or 3) use hash -r in between, or 4) use another RUN command for later use of pip.
The problem is that pip executable (/usr/bin/pip) breaks while updating pip from version 9 to version 10.
Possible solutions:
1. Do not update and use pip v9
2. Do not use apt-get to install pip. Download it manually.
Why does docker build fail in the first case? If we just run those command in bash, there wont be any error.
No, there will be an error. I ran those commands inside docker run --rm -it ubuntu:16.04 bash and got it.
Why does docker build succeed in the second case? How is it related to layering concept in docker?
I believe you made a mistake somewhere in second RUN and it's silencing an error (in that place which you didn't provide). For example, this will work (because ; used instead of && and execution doesn't break after bad command):
RUN pip install --upgrade virtualenv && \
virtualenv /venv; source /venv/bin/activate
Why specifying pip version in Dockerfile1 (i.e. pip install --upgrade pip=0.9.3) solves the problem too?
Because this pip bug appeared in version 10.
P.S. You should not update or manually change files you added to your system via apt-get (you are doing this via pip install --upgrade pip).

Installing XGBoost

I am trying use the XGBoost package, but I am having trouble installing it. I am following the installation guide
here
https://xgboost.readthedocs.io/en/latest/build.html#python-package-installation. I have successfully built xgboost for OSX using
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost; cp make/minimum.mk ./config.mk; make -j4
However, when I try to install the python package in my terminal using this code
cd python-package; sudo python setup.py install
I get the error python: command not found. I am not sure why I get this error because I have python installed and I can run ipython notebooks. Python is install here on my computer /usr/local/Cellar/python/2.7.8/Frameworks/Python.framework/Versions/2.7/lib/python2.7. Do I need to add a path in my bash_profile to access it? I don't understand why I can't use python from the command line.
I have answered similar issue in this question. You can install xgboost library along with other essential libraries as follows(please choose based on the libraries sufficient for your project), my main focus in this answer is to make it helpful in setting up for most data science projects requiring sklearn, pandas, scipy and xgboost algorithms along with visualization libraries.
# installing essentials
apt-get update; \
apt-get install -y \
python python-pip \
build-essential \
python-dev \
python-setuptools \
python-matplotlib \
libatlas-dev \
curl \
libatlas3gf-base && \
apt-get clean
# upgrading pip
curl -O https://bootstrap.pypa.io/get-pip.py && \
python get-pip.py && \
rm get-pip.py
# installing libraries
pip install numpy==1.13.1
pip install scipy
pip install -U scikit-learn
pip install seaborn
pip install --pre xgboost
If you're still having environment issues I would suggest using this Dockerfile. You might also find Datmo conversion useful to facilitate this.
DISCLAIMER: I work at this company called Datmo, which is building a community of developers by simplifying the machine learning workflow.
If you have python in your /usr/bin/ directory, all you need to do is to add that directory to your path.
Add this line to your .bash_profile and restart your shell.
export PATH="$PATH:/usr/bin"
Then you should be able to use any of the python versions in your /usr/bin directory. python, python3 etc. Hope this helps.

Categories