Why is my container copying a non-existent file? - python

I am currently trying to deploy and serve a fastText model for a business venture. I decided to use Google's Vertex AI (if you have a better idea of something to use, please share!). I created a Dockerfile and a training script to train my model, built the Docker image, and pushed it to the Google Container Registry. Here is the code for it:
Dockerfile:
FROM python:3.8-slim-buster
RUN apt-get update && apt-get install -y \
    build-essential \
    wget \
    git \
    python-dev \
    unzip \
    python-numpy \
    python-scipy \
    && rm -rf /var/cache/apk/*
RUN wget -nv \
    https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz && \
    mkdir /root/tools && \
    tar xvzf google-cloud-sdk.tar.gz -C /root/tools && \
    rm google-cloud-sdk.tar.gz && \
    /root/tools/google-cloud-sdk/install.sh --usage-reporting=false \
        --path-update=false --bash-completion=false \
        --disable-installation-options && \
    rm -rf /root/.config/* && \
    ln -s /root/.config /config && \
    # Remove the backup directory that gcloud creates
    rm -rf /root/tools/google-cloud-sdk/.install/.backup
# Path configuration
ENV PATH $PATH:/root/tools/google-cloud-sdk/bin
# Make sure gsutil will use the default service account
RUN echo '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg
RUN pip3 install fasttext
RUN pip3 install google
RUN pip3 install google-cloud-storage
RUN pip3 install --upgrade google-api-python-client
RUN pip3 install --upgrade google-cloud
COPY . .
ENTRYPOINT ["python3", "trainer.py"]
trainer.py:
import fasttext
from google.cloud import storage
import tempfile
from google.cloud.storage import blob
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file('endless-bank-344008-a75f5b89470f.json')
with tempfile.NamedTemporaryFile() as tmp_file:
    local_model_file = tmp_file.name
    remote_model_file = storage.Client('endless-bank-344008', credentials).bucket('bucket2035').blob('cc.en.300.bin')
    remote_model_file.download_to_filename(local_model_file)
    model_1 = fasttext.load_model(local_model_file)
    model_1.save_model("plagscan.bin")
    target = storage.Client('endless-bank-344008', credentials).bucket('bucket2035').blob('plagscanner.bin')
    target.upload_from_filename('plagscan.bin')
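For reference, a minimal sketch of double-checking what actually landed in the bucket after a run, reusing the client, key file, and names from the script above (purely illustrative):
from google.cloud import storage
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    'endless-bank-344008-a75f5b89470f.json')
client = storage.Client('endless-bank-344008', credentials)

# List every object in the bucket to see which blobs exist and under which prefix.
for blob in client.list_blobs('bucket2035'):
    print(blob.name)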
The trainer code above works, which is great. I run it in the Vertex AI platform: I press "create a model", check everything that applies, and select the custom container I created (which is now in the Google Container Registry). It runs; because there is no prediction container it doesn't create a model, but it completes successfully, and bucket2035 does indeed contain the output file "plagscanner.bin". Then I created a Dockerfile and a Flask app to serve as the prediction container; here are the Dockerfile and the Flask app:
Dockerfile:
FROM python:3.8-slim-buster
RUN apt-get update && apt-get install -y \
    build-essential \
    wget \
    git \
    python-dev \
    unzip \
    python-numpy \
    python-scipy \
    && rm -rf /var/cache/apk/*
RUN wget -nv \
    https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz && \
    mkdir /root/tools && \
    tar xvzf google-cloud-sdk.tar.gz -C /root/tools && \
    rm google-cloud-sdk.tar.gz && \
    /root/tools/google-cloud-sdk/install.sh --usage-reporting=false \
        --path-update=false --bash-completion=false \
        --disable-installation-options && \
    rm -rf /root/.config/* && \
    ln -s /root/.config /config && \
    # Remove the backup directory that gcloud creates
    rm -rf /root/tools/google-cloud-sdk/.install/.backup
# Path configuration
ENV PATH $PATH:/root/tools/google-cloud-sdk/bin
# Make sure gsutil will use the default service account
RUN echo '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg
RUN pip3 install flask
RUN pip3 install fasttext
RUN pip3 install google
RUN pip3 install google-cloud-storage
RUN pip3 install --upgrade google-api-python-client
RUN pip3 install --upgrade google-cloud
RUN pip3 install simplejson
COPY . .
ENV FLASK_APP=app.py
EXPOSE 8080
CMD flask run --host=0.0.0.0 --port=8080
Flask app (app.py):
import fasttext
from google.cloud import storage
import tempfile
from google.cloud.storage import blob
from google.oauth2 import service_account
import json
import os
import simplejson
from flask import Flask, request, Response
a = os.path.join(model_dir, 'plagscanner.bin')
model_1 = fasttext.load_model(a)
app = Flask(__name__)
@app.route("/isalive")
def isalive():
    print("/isalive request")
    status_code = Response(status=200)
    return status_code

# Flask route for predictions
@app.route('/predict', methods=['GET', 'POST'])
def prediction():
    result = request.get_json(silent=True, force=True)
    data = result['words']
    wordvectors = json.dumps([model_1(x) for x in data])
    return wordvectors

if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=8080)
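For what it's worth, Vertex AI hands custom prediction containers their port and routes through environment variables (AIP_HTTP_PORT, AIP_HEALTH_ROUTE, AIP_PREDICT_ROUTE). A hedged sketch, separate from the app.py above and only illustrating that wiring:
import os
from flask import Flask, request, jsonify

app = Flask(__name__)

# Vertex AI tells the container which port and routes to use; the defaults
# below are only fallbacks for local testing (an assumption, not guaranteed).
AIP_HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/isalive")
AIP_PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")
AIP_HTTP_PORT = int(os.environ.get("AIP_HTTP_PORT", "8080"))

@app.route(AIP_HEALTH_ROUTE, methods=["GET"])
def health():
    return "", 200

@app.route(AIP_PREDICT_ROUTE, methods=["POST"])
def predict():
    # Vertex AI wraps prediction requests as {"instances": [...]}; echoing
    # them back here just to show the expected envelope.
    body = request.get_json(silent=True, force=True) or {}
    instances = body.get("instances", [])
    return jsonify({"predictions": instances})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=AIP_HTTP_PORT)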
Now, the app.py I wrote above should work, right? Wrong. I built this container and pushed it to the Google Container Registry, and it didn't work; bizarrely, it gave me the error: Training pipeline failed with error message: There are no files under "gs://bucket2035/model" to copy.
Very bizarre. So instead I tried a variation of the app.py code; this version downloads the model file from the bucket at startup instead:
import fasttext
from google.cloud import storage
import tempfile
from google.cloud.storage import blob
from google.oauth2 import service_account
import json
import os
import simplejson
from flask import Flask, request, Response
credentials = service_account.Credentials.from_service_account_file('endless-bank-344008-a75f5b89470f.json')
with tempfile.NamedTemporaryFile() as tmp_file:
    local_model_file = tmp_file.name
    remote_model_file = storage.Client('endless-bank-344008', credentials).bucket('bucket2035').blob('cc.en.300.bin')
    remote_model_file.download_to_filename(local_model_file)
    model_1 = fasttext.load_model(local_model_file)

app = Flask(__name__)

@app.route("/isalive")
def isalive():
    print("/isalive request")
    status_code = Response(status=200)
    return status_code

# Flask route for predictions
@app.route('/predict', methods=['GET', 'POST'])
def prediction():
    result = request.get_json(silent=True, force=True)
    data = result['words']
    wordvectors = json.dumps([model_1(x) for x in data])
    return wordvectors

if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=8080)
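As an aside on this variant: when a model is later deployed, Vertex AI passes the serving container the location of the uploaded model artifacts through AIP_STORAGE_URI, so the app could load from there instead of hard-coding the bucket and blob. A hedged sketch, assuming that variable is set at serving time:
import os
import fasttext
from google.cloud import storage

# AIP_STORAGE_URI points at the GCS directory holding the model artifacts;
# the fallback here is just the bucket from the question, for illustration.
artifact_uri = os.environ.get("AIP_STORAGE_URI", "gs://bucket2035")

bucket_name, _, prefix = artifact_uri.replace("gs://", "").partition("/")
blob_name = (prefix.rstrip("/") + "/" if prefix else "") + "plagscanner.bin"

storage.Client().bucket(bucket_name).blob(blob_name).download_to_filename("/tmp/plagscanner.bin")
model_1 = fasttext.load_model("/tmp/plagscanner.bin")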
Here is the full error:
Training pipeline failed with error message: There are no files under "gs://bucket2035/model" to copy.
Now guess what happens: it gives the same error. I don't understand this. What is it trying to copy? Why is it not working? Is there another solution besides Vertex AI that I should be using for this fairly simple task? What is the meaning of life (lol)? Please help, I've tried many things and none of them work, and I suspect there must be an easier solution to this problem. Anyway, any help would be appreciated!
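Regarding what the pipeline is trying to copy: Vertex AI training pipelines that are expected to produce a model look for artifacts under the job's base output directory plus /model, a path handed to the training code through the AIP_MODEL_DIR environment variable. A hedged sketch of uploading the trained file there, assuming that variable is set for the job (the fallback value below is only an illustration):
import os
from google.cloud import storage

# Vertex AI sets AIP_MODEL_DIR to <base output directory>/model/ for training
# jobs that are expected to produce a model.
model_dir = os.environ.get("AIP_MODEL_DIR", "gs://bucket2035/model/")

bucket_name, _, prefix = model_dir.replace("gs://", "").partition("/")
if prefix and not prefix.endswith("/"):
    prefix += "/"

# Upload the trained artifact under that prefix so the pipeline has files
# under .../model to copy when it creates the model resource.
blob = storage.Client().bucket(bucket_name).blob(prefix + "plagscanner.bin")
blob.upload_from_filename("plagscan.bin")
If the training script writes its output under that prefix, the model-creation step should find something to copy.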

Related

Streamlit app not loading from Google Colab

I tried to run one of my Streamlit apps from Google Colab, since my local system is not well suited for such a heavy task.
I used ngrok according to the instructions from a gist. The output shows the app is running on some local port, but the local link does not load and eventually shows "site can't be reached".
Implementation:
from google.colab import drive
drive.mount('/content/drive/',force_remount=True)
%cd drive/MyDrive/MyProject/
!pip install -r requirements.txt
!pip install pyngrok
!pip install -q streamlit
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip
!./ngrok authtoken *****************  # my ngrok auth token goes here
get_ipython().system_raw('./ngrok http 8501 &')
# !curl -s http://localhost:4040/api/tunnels | python3 -c \
# 'import sys, json; print("Execute the next cell and the go to the following URL: " +json.load(sys.stdin)["tunnels"][0]["public_url"])'
!nohup streamlit run main.py &
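Since pyngrok is installed in the snippet above, a hedged alternative sketch is to open the tunnel from Python and print the public URL rather than shelling out to the ngrok binary (the token is a placeholder, and 8501 is Streamlit's default port):
from pyngrok import ngrok

ngrok.set_auth_token("YOUR_NGROK_TOKEN")  # placeholder token
public_url = ngrok.connect(8501, "http")  # tunnel to Streamlit's default port
print("Open this URL:", public_url)
# Then start the app separately, e.g. !nohup streamlit run main.py &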

<class 'TypeError'>, Value: can't pickle _thread.RLock objects from within Docker (using Redis Queue)

I am running a Python app from Docker using Redis, Redis Queue (RQ) and pyodbc. Below are my Dockerfile and part of my code, including the line where it fails. This code works fine in PyCharm but does not work from Docker. Is it because of the way the Dockerfile is set up, or is my code wrong in some way? As seen from the code, no other threading/multiprocessing is used except Redis Queue's queue.enqueue.
Dockerfile
FROM ubuntu:18.04
RUN apt-get update -y && \
    apt-get install -y \
    libpq-dev \
    python3.7 \
    gcc \
    python3-pip \
    unixodbc-dev
RUN apt-get update && apt-get install -y \
    curl apt-utils apt-transport-https debconf-utils gcc build-essential g++-5 \
    && rm -rf /var/lib/apt/lists/*
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/ubuntu/18.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update
RUN ACCEPT_EULA=Y apt-get install -y --allow-unauthenticated msodbcsql17
RUN pip3 install pyodbc
WORKDIR /app
COPY requirements.txt ./
COPY Main.py EmailSender.py AppConfig.py Queries.py SettingsCopierEngine.py Worker.py wsgi.py ./
RUN pip3 install --upgrade pip && pip3 install --no-cache-dir -r requirements.txt
ENV FLASK_APP=Main.py
ENV FLASK_ENV=production
ENTRYPOINT ["gunicorn", "--bind", "0.0.0.0:5000", "wsgi:app"]
SettingsCopierEngine.py (error on line 58: currentRequest = queue.enqueue(self.copySettingsFromSourceToDestination))
from pythonjsonlogger import jsonlogger
import sys
import uuid
import logging
import redis
from rq import Queue
from EmailSender import *
import AppConfig as cfg
import pandas as pd
import pytds
import inspect
import Queries as q
from github import Github
import re
from datetime import datetime
import pyodbc
import sqlalchemy as sa
from sqlalchemy.engine import URL
redisClient = redis.StrictRedis(host=cfg.RedisBroker["Host"], port=cfg.RedisBroker["Port"],
                                decode_responses=True)
logger = logging.getLogger()
logHandler = logging.StreamHandler()
formatter = jsonlogger.JsonFormatter("%(asctime)s %(filename)s %(module)s %(funcName)s %(lineno)s %(message)s")
logHandler.setFormatter(formatter)
logger.setLevel(logging.INFO)
if logger.hasHandlers():
    logger.handlers.clear()
logger.addHandler(logHandler)

class SettingsCopierEngine:
    def __init__(self, userEmail):
        logger.info("Initializing SettingsCopierEngine class")
        self.requestId = str(uuid.uuid4())
        self.userEmail = userEmail
        self.masterSettingsQuery = ""
        self.masterPricebookQuery = ""
        self.emailClient = EmailSender(logger)
        try:
            logger.info("Checking if redis instance is up or not",
                        extra={"RequestId": self.requestId, "UserEmail": self.userEmail})
            redisClient.ping()
            logger.info("Redis instance is up and running", extra={"RequestId": self.requestId, "UserEmail": self.userEmail})
        except (redis.exceptions.ConnectionError, ConnectionRefusedError):
            logger.error("Failed to connect to redis instance", extra={"RequestId": self.requestId, "UserEmail": self.userEmail})
            raise

    # setting up queue and passing each json request to queue with a job id.
    def processData(self, requestData):
        try:
            self.requestData = requestData
            workerRequestId = self.requestId
            logger.info("Starting worker and working on current request",
                        extra={"RequestId": self.requestId, "UserEmail": self.userEmail})
            queue = Queue(cfg.RedisBroker["QueueName"], connection=redisClient)
            currentRequest = queue.enqueue(self.copySettingsFromSourceToDestination)
            logger.info(f"Request: {currentRequest.id} added to queue at {currentRequest.enqueued_at}. {len(queue)} tasks in queue",
                        extra={"RequestId": self.requestId, "UserEmail": self.userEmail})
        except (ValueError, Exception):
            functionName = inspect.currentframe().f_code.co_name
            exc_type, exc_obj, exc_tb = sys.exc_info()
            errorMessage = f"Type: {exc_type}, Value: {exc_obj}.\nFunctionName: {functionName}, Actual LineNumber: {exc_tb.tb_lineno}"
            logger.error(errorMessage, extra={"requestId": self.requestId})
            raise
        return

    def copySettingsFromSourceToDestination(self):
        print(1)
I believe it's more of a Docker(file) or Python version issue, since the logger (which other posts mention as a culprit) and the entire code work fine in PyCharm. The Python version in PyCharm is 3.7.4 while the version inside Docker is 3.6.9; could this be the issue? If so, how do I upgrade the Python version in the Dockerfile?
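On the pickling error itself: RQ serializes the enqueued callable and its arguments with pickle, and a bound method drags its whole instance along, including attributes such as logging handlers or database clients that hold locks and cannot be pickled. A hedged sketch of the usual workaround, enqueueing a module-level function with plain, picklable arguments (the function and argument names here are hypothetical):
from redis import Redis
from rq import Queue

def copy_settings(request_id, user_email, request_data):
    # Runs inside the RQ worker; build loggers, pyodbc connections, etc. here
    # rather than capturing them on an instance that gets pickled.
    print(f"Copying settings for request {request_id} ({user_email})")
    return request_data

# Queue name and arguments are illustrative; the function must live in a
# module the worker can import (not in __main__).
queue = Queue("settings", connection=Redis())
job = queue.enqueue(copy_settings, "req-123", "user@example.com", {"source": "A", "destination": "B"})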

Flask is raising "requests.exceptions.ConnectionError" when accessing non-empty body of request in a Docker container but not locally

I have the following endpoint in Flask:
@app.route('/test', methods=['POST'])
def test():
    d = flask.request.form
    return "200"
When running locally the following two requests return "200":
url = "http://127.0.0.1:8080/test"
non_empty_body = {"test": "2"}
empty_body = {}
response_1 = requests.request("POST", url, data=non_empty_body)
print(response_1.text)
response_2 = requests.request("POST", url, data=empty_body)
print(response_2.text)
When running in a Docker container, the second request (with the empty body) also returns "200", but the first request fails. It gets stuck and after approximately one minute raises the following error:
requests.exceptions.ConnectionError: ('Connection aborted.',
RemoteDisconnected('Remote end closed connection without response'))
My Dockerfile looks as follows (leaving out a few lines that I am not comfortable sharing):
FROM python:3.7
ARG DEPLOY_TOKEN
# External dependencies
ENV WORKER_COUNT=2
ENV TASK_WORKER_COUNT=2
ENV SMART_ALERTS_WORKER_COUNT=5
ENV TASKS_CACHE_TTL_IN_DAYS=0
ENV TASKS_CACHE_TTL_IN_HOURS=4
ENV LOGLEVEL="INFO"
VOLUME ["/cache"]
EXPOSE 8080
# Install python and other tools
RUN apt-get update && apt-get install -y --no-install-recommends \
    graphviz \
    fonts-freefont-ttf \
    fonts-roboto \
    gzip \
    curl \
    cron \
    dnsutils \
    iputils-ping \
    && rm -rf /var/lib/apt/lists/*
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
RUN pip install --upgrade pip
RUN pip install --upgrade gunicorn cryptography \
    pandas==1.2.5 matplotlib numpy scikit-learn requests pytest flask pytz graphviz psutil prophet apscheduler
What is happening here, and how do I solve it? Obviously I need to be able to access the request body to handle POST requests.
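One way to narrow this down (a debugging sketch, not a fix) is to log what the container actually receives before touching request.form, so you can tell whether the request reaches Flask at all and where it stalls:
import logging
import flask

app = flask.Flask(__name__)
logging.basicConfig(level=logging.INFO)

@app.route('/test', methods=['POST'])
def test():
    # Log header-level details first; if this never shows up in the container
    # logs, the request is being dropped before it reaches Flask.
    app.logger.info("content-type=%s content-length=%s",
                    flask.request.content_type, flask.request.content_length)
    d = flask.request.form
    app.logger.info("form keys: %s", list(d.keys()))
    return "200"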

Trouble training YOLOv5 on AWS SageMaker | AlgorithmError: , exit code: 1

I'm trying to train YOLOv5 on AWS SageMaker with custom data (stored in S3) via a Docker image (ECR), and I keep getting "AlgorithmError: , exit code: 1". Can someone please tell me how to debug this problem?
Here's the Dockerfile:
# GET THE AWS IMAGE
FROM 763104351884.dkr.ecr.eu-west-3.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker
# UPDATES
RUN apt update
RUN DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt install -y tzdata
RUN apt install -y python3-pip git zip curl htop screen libgl1-mesa-glx libglib2.0-0
RUN alias python=python3
# INSTALL REQUIREMENTS
COPY requirements.txt .
RUN python3 -m pip install --upgrade pip
RUN pip install --no-cache -r requirements.txt albumentations gsutil notebook \
    coremltools onnx onnx-simplifier onnxruntime openvino-dev tensorflow-cpu tensorflowjs
COPY code /opt/ml/code
WORKDIR /opt/ml/code
RUN git clone https://github.com/ultralytics/yolov5 /opt/ml/code/yolov5
ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/code
ENV SAGEMAKER_PROGRAM trainYolo.py
ENTRYPOINT ["python", "trainYolo.py"]
And here's trainYolo.py:
import json
import os
import numpy as np
import cv2 as cv
import subprocess
import yaml
import shutil
trainSet = os.environ["SM_CHANNEL_TRAIN"]
valSet = os.environ["SM_CHANNEL_VAL"]
output_dir = os.environ["SM_CHANNEL_OUTPUT"]
#Creating the data.yaml for yolo
dict_file = [{'names': ['block']},
             {'nc': ['1']},
             {'train': [trainSet]},
             {'val': [valSet]}]
with open(r'data.yaml', 'w') as file:
    documents = yaml.dump(dict_file, file)
#Execute this command to train Yolo
res = subprocess.run(["python3", "yolov5/train.py", "--batch", "16" "--epochs", "100", "--data", "data.yaml", "--cfg", "yolov5/models/yolov5s.yaml","--weights", "yolov5s.pt" "--cache"], shell=True)
shutil.copy("yolov5", output_dir)
Note: I'm not sure whether subprocess.run() works in an environment such as SageMaker.
Thank you
Your training script is not configured properly. When using a SageMaker estimator or Script Mode, you must structure it so that it saves the model in the expected format and location. Here's an example notebook with TensorFlow and Script Mode. If you would like to build your own Dockerfile (Bring Your Own Container), then you would have to configure your training file as shown in the second link.
Script-Mode: https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/RealTime/Script-Mode/TensorFlow/Classification
BYOC: https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/RealTime/BYOC/Sklearn/Sklearn-Regressor/container/randomForest
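As a side note on the question's script: the subprocess.run call above passes a list together with shell=True and is missing commas between some arguments (e.g. "16" "--epochs"), which silently concatenates them. A hedged sketch of invoking the YOLOv5 training script so failures surface in the job logs, assuming SageMaker's usual SM_MODEL_DIR output location:
import os
import subprocess

# Run the YOLOv5 trainer as a plain argument list (no shell=True) and fail
# loudly if it exits non-zero, so the real error shows up in the job logs.
subprocess.run(
    ["python3", "yolov5/train.py",
     "--batch", "16",
     "--epochs", "100",
     "--data", "data.yaml",
     "--cfg", "yolov5/models/yolov5s.yaml",
     "--weights", "yolov5s.pt",
     "--cache"],
    check=True,
)

# SageMaker packages whatever is written to SM_MODEL_DIR (usually
# /opt/ml/model) into the model artifact; this path is an assumption about
# the standard training environment.
model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
print("Save trained weights under:", model_dir)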

ModuleNotFoundError when running Docker and Poetry

I am running into an error when trying to run my container: it says it can't find a module while importing. Specifically:
ModuleNotFoundError: No module named 'sentry_sdk'
The following is my Dockerfile, which is a multi-stage build; it seems to install all the packages according to the console output.
###############################################
# Base Image
###############################################
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.9 as python-base
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=off \
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    PIP_DEFAULT_TIMEOUT=100 \
    POETRY_VERSION=1.1.13 \
    POETRY_HOME="/opt/poetry" \
    POETRY_VIRTUALENVS_IN_PROJECT=true \
    POETRY_NO_INTERACTION=1 \
    PYSETUP_PATH="/opt/pysetup" \
    VENV_PATH="/opt/pysetup/.venv"
# prepend poetry and venv to path
ENV PATH="$POETRY_HOME/bin:$VENV_PATH/bin:$PATH"
###############################################
# Builder Image
###############################################
FROM python-base as builder-base
# install poetry - respects $POETRY_VERSION & $POETRY_HOME
RUN curl -sSL https://install.python-poetry.org | python3 -
# copy project requirement files here to ensure they will be cached.
WORKDIR $PYSETUP_PATH
COPY pyproject.toml ./
# install runtime deps - uses $POETRY_VIRTUALENVS_IN_PROJECT internally
RUN poetry install --no-dev
###############################################
# Production Image
###############################################
FROM python-base as production
COPY --from=builder-base $PYSETUP_PATH $PYSETUP_PATH
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
The start of my main file is the following:
from logging import getLogger
from os import environ
from typing import List
from fastapi import FastAPI
from starlette.status import HTTP_200_OK
from sentry_sdk import init as SentryInit
from sentry_sdk.integrations.asgi import SentryAsgiMiddleware
from sentry_sdk.integrations.sqlalchemy import SqlalchemyIntegration
It is failing on the line:
from sentry_sdk import init as SentryInit
This is the first line importing a package that is not installed by default on the base image, so it may be related to the venv, but I am not sure why or how.
My pyproject.toml looks like this:
[tool.poetry]
authors = ["xxx"]
name = "xxx"
description = "xxx"
version = "xxx"
[tool.poetry.dependencies]
asyncpg = "^0.21.0"
fastapi = "^0.73.0"
pydantic = "^1.9.0"
python = "^3.8.7"
sqlalchemy = "^1.3.22"
databases = "^0.5.5"
sentry-sdk = "^1.5.5"
[tool.poetry.dev-dependencies]
pytest = "^3.4"
httpx = "^0.22.0"
[build-system]
build-backend = "poetry.core.masonry.api"
requires = ["poetry-core>=1.0.0"]
OK, I figured it out, and now I feel dumb.
The issue was indeed related to the venv. Basically, uvicorn is installed on the base image but was not listed in my pyproject.toml, so Poetry didn't install it in the venv. When the Dockerfile's CMD started the app, it couldn't find uvicorn in the venv, fell back to the base install, and ran from there, outside the venv where sentry_sdk wasn't installed. Once I added uvicorn to the venv, everything worked fine.
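A quick way to confirm this kind of mix-up is to check, from inside the running container, which interpreter is actually serving the app and whether it can see the package; a small sketch:
import importlib.util
import sys

# If sys.executable points at the base interpreter rather than
# /opt/pysetup/.venv/bin/python, the app is not running inside the venv.
print("Interpreter:", sys.executable)
print("sentry_sdk found:", importlib.util.find_spec("sentry_sdk") is not None)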
