ModuleNotFoundError when running docker and poetry - python

I am running into an error when trying to run my container: it fails to find a module during import. Specifically:
ModuleNotFoundError: No module named 'sentry_sdk'
The following is my Dockerfile, which is a multi-stage build; according to the console output it installs all the packages.
###############################################
# Base Image
###############################################
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.9 as python-base
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=off \
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    PIP_DEFAULT_TIMEOUT=100 \
    POETRY_VERSION=1.1.13 \
    POETRY_HOME="/opt/poetry" \
    POETRY_VIRTUALENVS_IN_PROJECT=true \
    POETRY_NO_INTERACTION=1 \
    PYSETUP_PATH="/opt/pysetup" \
    VENV_PATH="/opt/pysetup/.venv"
# prepend poetry and venv to path
ENV PATH="$POETRY_HOME/bin:$VENV_PATH/bin:$PATH"
###############################################
# Builder Image
###############################################
FROM python-base as builder-base
# install poetry - respects $POETRY_VERSION & $POETRY_HOME
RUN curl -sSL https://install.python-poetry.org | python3 -
# copy project requirement files here to ensure they will be cached.
WORKDIR $PYSETUP_PATH
COPY pyproject.toml ./
# install runtime deps - uses $POETRY_VIRTUALENVS_IN_PROJECT internally
RUN poetry install --no-dev
###############################################
# Production Image
###############################################
FROM python-base as production
COPY --from=builder-base $PYSETUP_PATH $PYSETUP_PATH
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
The start of my main file is the following:
from logging import getLogger
from os import environ
from typing import List
from fastapi import FastAPI
from starlette.status import HTTP_200_OK
from sentry_sdk import init as SentryInit
from sentry_sdk.integrations.asgi import SentryAsgiMiddleware
from sentry_sdk.integrations.sqlalchemy import SqlalchemyIntegration
It is failing on the line:
from sentry_sdk import init as SentryInit
This is the first import of a package that is not installed by default on the base image, so it may be related to the venv, but I am not sure why or how.
My pyproject.toml looks like this:
[tool.poetry]
authors = ["xxx"]
name = "xxx"
description = "xxx"
version = "xxx"
[tool.poetry.dependencies]
asyncpg = "^0.21.0"
fastapi = "^0.73.0"
pydantic = "^1.9.0"
python = "^3.8.7"
sqlalchemy = "^1.3.22"
databases = "^0.5.5"
sentry-sdk = "^1.5.5"
[tool.poetry.dev-dependencies]
pytest = "^3.4"
httpx = "^0.22.0"
[build-system]
build-backend = "poetry.core.masonry.api"
requires = ["poetry-core>=1.0.0"]

OK, I figured it out, and now I feel dumb.
The issue was indeed related to the venv. uvicorn is installed on the base image but was not in my pyproject.toml, so Poetry didn't install it in the venv. When the Dockerfile's CMD started the app, it couldn't find uvicorn in the venv, fell back to the base install, and ran from there. Once I added uvicorn to the venv, everything worked fine.
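To make that diagnosis concrete, here is a quick check you can run from inside the container (an illustrative sketch, not part of the original post): it shows which interpreter and which uvicorn are actually in use, and whether sentry_sdk resolves from the venv or not.
# Illustrative check, e.g. via `docker exec -it <container> python`
import shutil
import sys

print(sys.executable)           # /opt/pysetup/.venv/bin/python vs. the base image's interpreter
print(shutil.which("uvicorn"))  # which uvicorn is first on PATH

import sentry_sdk               # fails when running against the base install
print(sentry_sdk.__file__)      # should point into /opt/pysetup/.venv when the venv is used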

Related

<class 'TypeError'>, Value: can't pickle _thread.RLock objects from within Docker (using Redis Queue)

I am running a Python app from Docker using Redis, Redis Queue and pyodbc. Below are my Dockerfile and the part of my code that includes the line where it fails. The code works fine in PyCharm but does not work from Docker - is it because of the way the Dockerfile is set up, or is my code wrong in some way? As the code shows, no other threading/multiprocessing is used except Redis Queue's queue.enqueue.
Dockerfile
FROM ubuntu:18.04
RUN apt-get update -y && \
    apt-get install -y \
    libpq-dev \
    python3.7 \
    gcc \
    python3-pip \
    unixodbc-dev
RUN apt-get update && apt-get install -y \
    curl apt-utils apt-transport-https debconf-utils gcc build-essential g++-5 \
    && rm -rf /var/lib/apt/lists/*
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/ubuntu/18.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update
RUN ACCEPT_EULA=Y apt-get install -y --allow-unauthenticated msodbcsql17
RUN pip3 install pyodbc
WORKDIR /app
COPY requirements.txt ./
COPY Main.py EmailSender.py AppConfig.py Queries.py SettingsCopierEngine.py Worker.py wsgi.py ./
RUN pip3 install --upgrade pip && pip3 install --no-cache-dir -r requirements.txt
ENV FLASK_APP=Main.py
ENV FLASK_ENV=production
ENTRYPOINT ["gunicorn", "--bind", "0.0.0.0:5000", "wsgi:app"]
SettingsCopierEngine.py (error on line 58: currentRequest = queue.enqueue(self.copySettingsFromSourceToDestination))
from pythonjsonlogger import jsonlogger
import sys
import uuid
import logging
import redis
from rq import Queue
from EmailSender import *
import AppConfig as cfg
import pandas as pd
import pytds
import inspect
import Queries as q
from github import Github
import re
from datetime import datetime
import pyodbc
import sqlalchemy as sa
from sqlalchemy.engine import URL
redisClient = redis.StrictRedis(host=cfg.RedisBroker["Host"], port=cfg.RedisBroker["Port"],
                                decode_responses=True)

logger = logging.getLogger()
logHandler = logging.StreamHandler()
formatter = jsonlogger.JsonFormatter("%(asctime)s %(filename)s %(module)s %(funcName)s %(lineno)s %(message)s")
logHandler.setFormatter(formatter)
logger.setLevel(logging.INFO)
if logger.hasHandlers():
    logger.handlers.clear()
logger.addHandler(logHandler)


class SettingsCopierEngine:
    def __init__(self, userEmail):
        logger.info("Initializing SettingsCopierEngine class")
        self.requestId = str(uuid.uuid4())
        self.userEmail = userEmail
        self.masterSettingsQuery = ""
        self.masterPricebookQuery = ""
        self.emailClient = EmailSender(logger)
        try:
            logger.info("Checking if redis instance is up or not",
                        extra={"RequestId": self.requestId, "UserEmail": self.userEmail})
            redisClient.ping()
            logger.info("Redis instance is up and running", extra={"RequestId": self.requestId, "UserEmail": self.userEmail})
        except (redis.exceptions.ConnectionError, ConnectionRefusedError):
            logger.error("Failed to connect to redis instance", extra={"RequestId": self.requestId, "UserEmail": self.userEmail})
            raise

    # setting up queue and passing each json request to queue with a job id.
    def processData(self, requestData):
        try:
            self.requestData = requestData
            workerRequestId = self.requestId
            logger.info("Starting worker and working on current request",
                        extra={"RequestId": self.requestId, "UserEmail": self.userEmail})
            queue = Queue(cfg.RedisBroker["QueueName"], connection=redisClient)
            currentRequest = queue.enqueue(self.copySettingsFromSourceToDestination)
            logger.info(f"Request: {currentRequest.id} added to queue at {currentRequest.enqueued_at}. {len(queue)} tasks in queue",
                        extra={"RequestId": self.requestId, "UserEmail": self.userEmail})
        except (ValueError, Exception):
            functionName = inspect.currentframe().f_code.co_name
            exc_type, exc_obj, exc_tb = sys.exc_info()
            errorMessage = f"Type: {exc_type}, Value: {exc_obj}.\nFunctionName: {functionName}, Actual LineNumber: {exc_tb.tb_lineno}"
            logger.error(errorMessage, extra={"requestId": self.requestId})
            raise
        return

    def copySettingsFromSourceToDestination(self):
        print(1)
I believe it's more of a Docker(file) or Python version issue, since the whole code, including the logger that other posts mention, works fine in PyCharm. The Python version in PyCharm is 3.7.4 while the Docker one is 3.6.9 - could this be the issue? If so, how do I upgrade the Python version in the Dockerfile?
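For context (my own note, not from the original post): RQ serializes the job's callable and its arguments with pickle before pushing them to Redis, so enqueueing a bound method such as self.copySettingsFromSourceToDestination also pickles the whole SettingsCopierEngine instance, including objects like logging handlers and clients that hold thread locks. A minimal sketch of the more common pattern, enqueueing a module-level function with plain arguments (all names here are hypothetical):
# tasks.py -- hypothetical module-level task; RQ pickles only the function
# reference and these plain, picklable arguments, not a class instance.
from redis import StrictRedis
from rq import Queue


def copy_settings(request_id, request_data):
    # ...do the actual copy work here...
    print(request_id, request_data)


redis_client = StrictRedis(host="localhost", port=6379)
queue = Queue("settings", connection=redis_client)
job = queue.enqueue(copy_settings, "some-request-id", {"source": "A", "dest": "B"})
print(job.id, job.enqueued_at)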

Why is my container copying a non-existent file?

I am currently trying to deploy and serve a fastText model for a business venture. I decided to use Google's Vertex AI (if you have a better idea of what to use, please share!). I created a Dockerfile and a training script to train my model, built the Docker image and pushed it to the Google Container Registry. Here is the code for it:
Dockerfile :
FROM python:3.8-slim-buster
RUN apt-get update && apt-get install -y \
    build-essential \
    wget \
    git \
    python-dev \
    unzip \
    python-numpy \
    python-scipy \
    && rm -rf /var/cache/apk/*
RUN wget -nv \
    https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz && \
    mkdir /root/tools && \
    tar xvzf google-cloud-sdk.tar.gz -C /root/tools && \
    rm google-cloud-sdk.tar.gz && \
    /root/tools/google-cloud-sdk/install.sh --usage-reporting=false \
        --path-update=false --bash-completion=false \
        --disable-installation-options && \
    rm -rf /root/.config/* && \
    ln -s /root/.config /config && \
    # Remove the backup directory that gcloud creates
    rm -rf /root/tools/google-cloud-sdk/.install/.backup
# Path configuration
ENV PATH $PATH:/root/tools/google-cloud-sdk/bin
# Make sure gsutil will use the default service account
RUN echo '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg
RUN pip3 install fasttext
RUN pip3 install google
RUN pip3 install google-cloud-storage
RUN pip3 install --upgrade google-api-python-client
RUN pip3 install --upgrade google-cloud
COPY . .
ENTRYPOINT ["python3", "trainer.py"]
Trainer.py :
import fasttext
from google.cloud import storage
import tempfile
from google.cloud.storage import blob
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file('endless-bank-344008-a75f5b89470f.json')
with tempfile.NamedTemporaryFile() as tmp_file:
    local_model_file = tmp_file.name
    remote_model_file = storage.Client('endless-bank-344008', credentials).bucket('bucket2035').blob('cc.en.300.bin')
    remote_model_file.download_to_filename(local_model_file)
    model_1 = fasttext.load_model(local_model_file)
    model_1.save_model("plagscan.bin")
    target = storage.Client('endless-bank-344008', credentials).bucket('bucket2035').blob('plagscanner.bin')
    target.upload_from_filename('plagscan.bin')
This code works, which is great. I run it on the Vertex AI platform: I press "create a model", check everything that applies, and use a custom container (after selecting the one I created, which is now in the Google Container Registry). It runs, and it doesn't create a model because there is no prediction container, but it completes successfully, and bucket2035 does indeed contain the output file "plagscanner.bin". Then I created a Dockerfile and a Flask app to serve as the prediction container; here are the Dockerfile and the Flask app:
Dockerfile :
FROM python:3.8-slim-buster
RUN apt-get update && apt-get install -y \
    build-essential \
    wget \
    git \
    python-dev \
    unzip \
    python-numpy \
    python-scipy \
    && rm -rf /var/cache/apk/*
RUN wget -nv \
    https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz && \
    mkdir /root/tools && \
    tar xvzf google-cloud-sdk.tar.gz -C /root/tools && \
    rm google-cloud-sdk.tar.gz && \
    /root/tools/google-cloud-sdk/install.sh --usage-reporting=false \
        --path-update=false --bash-completion=false \
        --disable-installation-options && \
    rm -rf /root/.config/* && \
    ln -s /root/.config /config && \
    # Remove the backup directory that gcloud creates
    rm -rf /root/tools/google-cloud-sdk/.install/.backup
# Path configuration
ENV PATH $PATH:/root/tools/google-cloud-sdk/bin
# Make sure gsutil will use the default service account
RUN echo '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg
RUN pip3 install flask
RUN pip3 install fasttext
RUN pip3 install google
RUN pip3 install google-cloud-storage
RUN pip3 install --upgrade google-api-python-client
RUN pip3 install --upgrade google-cloud
RUN pip3 install simplejson
COPY . .
ENV FLASK_APP=app.py
EXPOSE 8080
CMD flask run --host=0.0.0.0 --port=8080
Flask app :
import fasttext
from google.cloud import storage
import tempfile
from google.cloud.storage import blob
from google.oauth2 import service_account
import json
import os
import simplejson
from flask import Flask, request, Response
a = os.path.join(model_dir, 'plagscanner.bin')
model_1 = fasttext.load_model(a)

app = Flask(__name__)

@app.route("/isalive")
def isalive():
    print("/isalive request")
    status_code = Response(status=200)
    return status_code

# Flask route for predictions
@app.route('/predict', methods=['GET', 'POST'])
def prediction():
    result = request.get_json(silent=True, force=True)
    data = result['words']
    wordvectors = json.dumps([model_1(x) for x in data])
    return wordvectors

if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=8080)
Now... this should work, right? Wrong. I built this container, pushed it to the Google Container Registry, and it didn't work; bizarrely, it gave me the error: Training pipeline failed with error message: There are no files under "gs://bucket2035/model" to copy.
Very bizarre, so I tried a variation of the app.py code that downloads the model file from the bucket instead:
import fasttext
from google.cloud import storage
import tempfile
from google.cloud.storage import blob
from google.oauth2 import service_account
import json
import os
import simplejson
from flask import Flask, request, Response
credentials = service_account.Credentials.from_service_account_file('endless-bank-344008-a75f5b89470f.json')

with tempfile.NamedTemporaryFile() as tmp_file:
    local_model_file = tmp_file.name
    remote_model_file = storage.Client('endless-bank-344008', credentials).bucket('bucket2035').blob('cc.en.300.bin')
    remote_model_file.download_to_filename(local_model_file)
    model_1 = fasttext.load_model(local_model_file)

app = Flask(__name__)

@app.route("/isalive")
def isalive():
    print("/isalive request")
    status_code = Response(status=200)
    return status_code

# Flask route for predictions
@app.route('/predict', methods=['GET', 'POST'])
def prediction():
    result = request.get_json(silent=True, force=True)
    data = result['words']
    wordvectors = json.dumps([model_1(x) for x in data])
    return wordvectors

if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=8080)
Here is the full error :
Training pipeline failed with error message: There are no files under "gs://bucket2035/model" to copy.
Now guess what happens: it gives the same error. I don't understand this - what is it trying to copy, and why is it not working? Is there another solution besides Vertex AI that I should be using for this fairly simple task? What is the meaning of life (lol)? Please help - I've tried many things, none of them work, and I suspect there must be an easier solution to this problem. Anyway, any help would be appreciated!
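One observation (an assumption on my part, not something confirmed in the post): the error refers to gs://bucket2035/model, and Vertex AI custom training jobs expose the expected artifact location through the AIP_MODEL_DIR environment variable; the model-upload step of a training pipeline copies whatever the trainer wrote there. A minimal sketch of writing the fastText artifact to that location (the bucket parsing and default value are illustrative):
# Illustrative sketch for the end of trainer.py, assuming Vertex AI sets
# AIP_MODEL_DIR (e.g. gs://bucket2035/model/) for the training job.
import os
from google.cloud import storage

model_dir = os.environ.get("AIP_MODEL_DIR", "gs://bucket2035/model/")
bucket_name, _, prefix = model_dir[len("gs://"):].partition("/")

# model_1.save_model("plagscan.bin") has already written the file locally
blob = storage.Client().bucket(bucket_name).blob(prefix + "plagscan.bin")
blob.upload_from_filename("plagscan.bin")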

How to set the path of a local module in Python so it is recognized in CircleCI?

I am building a Python module. In order to define its path, I set up a virtual environment and a .pth file as follows:
# creation of the virtual environment
python -m venv env
# activation of the newly creation virtual environment
source env/bin/activate
To set the path of my module (my module is located in packages/regression_model/regression_model) I created this .pth file env/lib/python3.7/site-packages/regression_model.pth which contains:
# env/lib/python3.7/site-packages/regression_model.pth
../../../../packages/regression_model
Now, anywhere in my project, I can import my module regression_model with:
import regression_model
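For anyone wondering why the bare import works: a .pth file placed in site-packages simply appends each directory it lists to sys.path at interpreter startup. A quick illustrative check (run with the venv activated):
# Illustrative check: directories added via a site-packages .pth file
# appear on sys.path once the interpreter has started.
import sys

print([p for p in sys.path if "regression_model" in p])

import regression_model
print(regression_model.__file__)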
Actually my objective is to use CircleCI for the deployment of my project.
CircleCI is configured as follows:
version: 2
jobs:
  test_regression_model:
    working_directory: ~/project
    docker:
      - image: circleci/python:3.7.6
        environment: # environment variables for primary container
          PYTHONPATH: ~/project/packages/regression_model:~/project/packages/ml_api
    steps:
      - checkout
      - run:
          name: Running tests
          command: |
            virtualenv venv
            . venv/bin/activate
            pip install --upgrade pip
            pip install -r packages/regression_model/requirements.txt
            chmod +x ./scripts/fetch_kaggle_dataset.sh
            ./scripts/fetch_kaggle_dataset.sh
            python packages/regression_model/regression_model/train_pipeline.py
            py.test -vv packages/regression_model/tests
workflows:
  version: 2
  test-all:
    jobs:
      - test_regression_model
The problem I am facing is that CircleCI indicates that my module cannot be imported:
Traceback (most recent call last):
File "packages/regression_model/regression_model/train_pipeline.py", line 4, in <module>
from regression_model import pipeline
ModuleNotFoundError: No module named 'regression_model'
To solve the problem, the path to the regression_model module has to be defined exactly as it was done locally. The question then is: how do I define that path in CircleCI?
I tried to do it through the use of the environment variable PYTHONPATH but without success.
Any suggestions?
I found the solution. Similarly to what was done manually on my local machine, I just run two commands in CircleCI:
echo "../../../../packages/regression_model" >> env/lib/python3.7/site-packages/extra.pth
echo "../../../../packages/ml_api" >> env/lib/python3.7/site-packages/extra.pth
And below is the full .yml file, in case it helps others.
version: 2
jobs:
  test_regression_model:
    working_directory: ~/project
    docker:
      - image: circleci/python:3.7.6
    steps:
      - checkout
      - run:
          name: Running tests
          command: |
            virtualenv env
            . env/bin/activate
            pip install --upgrade pip
            pip install -r packages/regression_model/requirements.txt
            echo "../../../../packages/regression_model" >> env/lib/python3.7/site-packages/extra.pth
            echo "../../../../packages/ml_api" >> env/lib/python3.7/site-packages/extra.pth
            chmod +x ./scripts/fetch_kaggle_dataset.sh
            ./scripts/fetch_kaggle_dataset.sh
            sudo apt-get install unzip
            unzip packages/regression_model/regression_model/datasets/house-prices-advanced-regression-techniques.zip -d packages/regression_model/regression_model/datasets/
            python packages/regression_model/regression_model/train_pipeline.py
            py.test -vv packages/regression_model/tests
workflows:
  version: 2
  test-all:
    jobs:
      - test_regression_model

Passing a URL as an argument in Python when running a Docker image

I want to pass a GCP storage URL as an argument when running my Docker image, so that it can pull my CSV file from storage and print the dataset.
Below is my Dockerfile:
# Use the official lightweight Python image.
# https://hub.docker.com/_/python
FROM continuumio/miniconda3
# Copy local code to the container image.
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
# Install production dependencies.
RUN pip install Flask gunicorn
RUN pip install scikit-learn==0.20.2 firefly-python==0.1.15
RUN pip install --upgrade google-cloud-storage
ENTRYPOINT ["python"]
CMD ["pre.py"]
I tried running the Docker image with the command below and got the following error:
docker run preprocess:v1 "https://storage.googleapis.com/MYBucket/app/Model/IrisClassifier.sav"
python: can't open file 'https://storage.googleapis.com/MYBucket/app/Model/IrisClassifier.sav': [Errno 2] No such file or directory
import os
import argparse
from google.cloud import storage
from sklearn.externals import joblib
from urllib.request import urlopen


def parse_arguments():
    print('entered parse arg')
    parser = argparse.ArgumentParser()
    parser.add_argument('data_dir', type=str, help='GCSpath')
    args = parser.parse_known_args()[0]
    print('Argument passed')
    print(os.getcwd())
    print('STARTING CLOUD RETRIVAL')
    print('*****client initialized')
    dataset_load = joblib.load(urlopen(args.data_dir))
    print('*****loaded Dataset')
    print(dataset_load)


def main(_):
    print("Prior to entering arg")
    parse_arguments()
I want to pass a GCP bucket URL like the following when running my Docker image:
https://storage.googleapis.com/MYBucket/app/Model/IrisClassifier.sav
First, you need to move your CMD into the ENTRYPOINT:
FROM continuumio/miniconda3
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
RUN pip install Flask gunicorn
RUN pip install scikit-learn==0.20.2 firefly-python==0.1.15
RUN pip install --upgrade google-cloud-storage
ENTRYPOINT ["python", "pre.py"]
then you can pass your URL.
The problem with your setup is that Docker starts the ENTRYPOINT, which is python, and the argument you pass to docker run overrides the CMD, so what actually gets executed is:
python YOUR_URL
Update
I do not know if you added an if statement to run the main def, but here is how you should edit the script:
def main():
    print("Prior to entering arg")
    parse_arguments()


if __name__ == '__main__':
    main()
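To illustrate how the argument arrives once the ENTRYPOINT is fixed (a minimal sketch; only the image name and URL from the question are reused):
# pre.py -- minimal sketch. With ENTRYPOINT ["python", "pre.py"], whatever
# follows the image name in `docker run preprocess:v1 <url>` is appended
# as extra arguments and lands in sys.argv / argparse.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("data_dir", type=str, help="GCS path or URL to load")
args = parser.parse_args()
print(args.data_dir)  # e.g. https://storage.googleapis.com/MYBucket/app/Model/IrisClassifier.sav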

How to maintain glibc and libmusl Python wheels in the same pip repository?

Previously we've used our internal pip repository for source distributions only. Moving forward we want to host wheels as well to accomplish two things:
serve our own code to both (local) developer machines and Alpine Docker environments
create wheels for packages that don't have Alpine wheels
Unfortunately, wheels built against different C libraries share the same artifact name, and the second one gets rejected by the pip repository:
docker-compose.yml
version: '3'
services:
  build-alpine:
    build: alpine
    image: build-alpine-wheels
    volumes:
      - $PWD/cython:/build
    working_dir: /build
    command: sh -c 'python setup.py bdist_wheel && twine upload --repository-url http://pypi:8080 -u admin -p admin dist/*'
  build-debian:
    build: debian
    image: build-debian-wheels
    volumes:
      - $PWD/cython-debian:/build
    working_dir: /build
    command: bash -c 'sleep 10s && python setup.py bdist_wheel && twine upload --repository-url http://pypi:8080 -u admin -p admin dist/*'
  pypi:
    image: stevearc/pypicloud:1.0.2
    volumes:
      - $PWD/pypi:/etc/pypicloud/
  alpine-test:
    image: build-alpine-wheels
    depends_on:
      - build-alpine
    command: sh -c 'while ping -c1 build-alpine &>/dev/null; do sleep 1; done; echo "build container finished" && pip install -i http://pypi:8080/pypi --trusted-host pypi cython && cython --version'
  debian-test:
    image: python:3.6
    depends_on:
      - build-debian
    command: bash -c 'while ping -c1 build-debian &>/dev/null; do sleep 1; done; echo "build container finished" && pip install -i http://pypi:8080/pypi --trusted-host pypi cython && cython --version'
alpine/Dockerfile
FROM python:3.6-alpine
RUN apk add --update --no-cache build-base
RUN pip install --upgrade pip
RUN pip install twine
debian/Dockerfile
FROM python:3.6-slim
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --upgrade pip
RUN pip install twine
pypi/config.ini
[app:main]
use = egg:pypicloud
pyramid.reload_templates = False
pyramid.debug_authorization = false
pyramid.debug_notfound = false
pyramid.debug_routematch = false
pyramid.default_locale_name = en
pypi.default_read =
    everyone
pypi.default_write =
    everyone
pypi.storage = file
storage.dir = %(here)s/packages
db.url = sqlite:///%(here)s/db.sqlite
auth.admins =
    admin
user.admin = $6$rounds=535000$sFuRqMc5PbRccW1J$OBCsn8szlBwr4yPP243JPqomapgInRCUavv/p/UErt7I5FG4O6IGSHkH6H7ZPlrMXO1I8p5LYCQQxthgWZtxe1
# For beaker
session.encrypt_key = s0ETvuGG9Z8c6lK23Asxse4QyuVCsI2/NvGiNvvYl8E=
session.validate_key = fJvHQieaa0g3XsdgMF5ypE4pUf2tPpkbjueLQAAHN/k=
session.secure = False
session.invalidate_corrupt = true
###
# wsgi server configuration
###
[uwsgi]
paste = config:%p
paste-logger = %p
master = true
processes = 20
reload-mercy = 15
worker-reload-mercy = 15
max-requests = 1000
enable-threads = true
http = 0.0.0.0:8080
virtualenv = /env
###
# logging configuration
# http://docs.pylonsproject.org/projects/pyramid/en/latest/narr/logging.html
###
[loggers]
keys = root, botocore, pypicloud
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = INFO
handlers = console
[logger_pypicloud]
level = DEBUG
qualname = pypicloud
handlers =
[logger_botocore]
level = WARN
qualname = botocore
handlers =
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)s %(asctime)s [%(name)s] %(message)s
Setup and execution
git clone https://github.com/cython/cython
git clone https://github.com/cython/cython cython-debian
docker-compose build
docker-compose up
In the end I would like both test containers to be able to execute cython --version, which works for the Alpine container:
alpine-test_1 | Collecting cython
alpine-test_1 | Downloading http://pypi:8080/api/package/cython/Cython-0.29.12-cp36-cp36m-linux_x86_64.whl (5.0MB)
alpine-test_1 | Installing collected packages: cython
alpine-test_1 | Successfully installed cython-0.29.12
alpine-test_1 | Cython version 0.29.12
But doesn't work for the Debian container:
debian-test_1 | Downloading http://pypi:8080/api/package/cython/Cython-0.29.12-cp36-cp36m-linux_x86_64.whl (5.0MB)
debian-test_1 | Installing collected packages: cython
debian-test_1 | Successfully installed cython-0.29.12
debian-test_1 | Traceback (most recent call last):
debian-test_1 | File "/usr/local/bin/cython", line 6, in <module>
debian-test_1 | from Cython.Compiler.Main import setuptools_main
debian-test_1 | File "/usr/local/lib/python3.6/site-packages/Cython/Compiler/Main.py", line 28, in <module>
debian-test_1 | from .Scanning import PyrexScanner, FileSourceDescriptor
debian-test_1 | ImportError: libc.musl-x86_64.so.1: cannot open shared object file: No such file or directory
I find it particularly curious that both environments try to pull this wheel, because there are all sorts of packages that don't work with Alpine (e.g. Pandas), in which case pip goes straight for the source distribution. I suppose I must be doing something wrong in that regard as well.
So now I'm wondering how I can create these wheels such that for each version of the software package two different wheels can live in the pip repository and have pip automatically download and install the correct one.
There is currently no support for musl in the manylinux standard: your options are to always build from source, or target a different, glibc-based platform.
It seems that PEP 656 now defines a 'musllinux' platform tag:
https://www.python.org/dev/peps/pep-0656/
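For background (my own explanation, not from the answers here): pip decides wheel compatibility by comparing the wheel's tags, e.g. cp36-cp36m-linux_x86_64, against the set of tags the running interpreter advertises, and a generic linux_x86_64 platform tag is accepted on both glibc and musl interpreters, which is why both test containers download the Alpine-built wheel. A small illustrative check, assuming the packaging library is installed:
# Print the tags this interpreter accepts, most specific first; a wheel is
# installable if any of its tags appears in this list. A plain
# linux_x86_64 tag shows up on both Debian and Alpine interpreters.
from packaging import tags

for tag in list(tags.sys_tags())[:15]:
    print(tag)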
I would suggest not using Alpine at all: you can get images almost as small with multi-stage builds (https://pythonspeed.com/articles/smaller-python-docker-images/), and musl doesn't just mean a lack of binary wheels. There is a whole bunch of production bugs people have had due to musl (Python crashes, timestamp formatting problems; see https://pythonspeed.com/articles/base-image-python-docker-images/ for references).
Most of the known musl issues have been fixed, but it's different enough that it doesn't seem worth the production risk (not to mention your very expensive developer time!) just to get a 100 MB smaller image.
