I have a complex API which takes around 7GB memory when I deploy it using Uvicorn.
I want to understand how to deploy it in such a way that I can make parallel requests from my end. The deployed API should be capable of processing two or three requests at the same time.
I am using FastAPI with uvicorn and nginx for deployment. Here is my deployment command:
uvicorn --host 0.0.0.0 --port 8888
Can someone provide some clarity on how people achieve that?
You can use gunicorn instead of uvicorn to run your backend. Gunicorn offers multiple workers to effectively load-balance the arriving requests; this means you will have as many running gunicorn worker processes as you specify, each able to receive and process requests. From the docs, gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second. However, the number of workers should be no more than (2 x number_of_cpu_cores) + 1 to avoid running into out-of-memory errors. You can check this out in the docs.
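If you want the worker count to follow that rule of thumb automatically, it can be computed in a Gunicorn config file. This is only a minimal sketch assuming your setup; the file name, bind address, and port are illustrative:
# gunicorn.conf.py -- size the worker pool from the CPU count
import multiprocessing

bind = "0.0.0.0:8888"
workers = multiprocessing.cpu_count() * 2 + 1   # the (2 x cores) + 1 rule of thumb
worker_class = "uvicorn.workers.UvicornWorker"  # async workers so FastAPI still runs on uvicorn
You would then start it with something like gunicorn -c gunicorn.conf.py main:app.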
For example, if you want to use 4 workers for your FastAPI-based backend, you can specify it with the -w flag:
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b "0.0.0.0:8888"
In this case, the module containing my backend functionality is called main (main.py) and the FastAPI instance is named app.
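For reference, a minimal sketch of the layout that command assumes (a module named main containing a FastAPI instance named app); the route itself is just illustrative:
# main.py -- minimal FastAPI app matching the gunicorn command above
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    return {"status": "ok"}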
I'm working on something like this using Docker and NGINX.
There's an official Docker image, created by the developer of FastAPI, that sets up uvicorn/gunicorn for you and can be configured to your needs:
https://hub.docker.com/r/tiangolo/uvicorn-gunicorn-fastapi
It took some time to get the hang of Docker, but I'm really liking it now. You can build an nginx image using the configuration below and then build as many copies of your app as you need in separate containers to serve as hosts.
The example below runs a weighted load balancer for two of my app services, with a third as a backup should those two fail.
nginx Dockerfile:
FROM nginx
# Remove the default nginx.conf
RUN rm /etc/nginx/conf.d/default.conf
# Replace with our own nginx.conf
COPY nginx.conf /etc/nginx/conf.d/
nginx.conf:
upstream loadbalancer {
    server 192.168.115.5:8080 weight=5;
    server 192.168.115.5:8081;
    server 192.168.115.5:8082 backup;
}

server {
    listen 80;
    location / {
        proxy_pass http://loadbalancer;
    }
}
app Dockerfile:
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7
RUN pip install --upgrade pip
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . /app
First off, I would like to state that I have scoured SO for a solution, yet nothing worked for me...
I am trying to deploy a Flask server on App Engine, yet I always get a 404 error with /readiness_check failReason:"null"
This is my app.yaml (yes, I did increase the app_start_timeout_sec)
# yaml config for custom environment that uses docker
runtime: custom
env: flex
service: test-appengine
# change readiness check;
# readiness failure leads to 502 Error
readiness_check:
  path: "/readiness_check"
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 1800
And this is my Dockerfile:
# Use the official Python image.
# https://hub.docker.com/_/python
FROM python:3.8-buster
# Install Python dependencies.
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
# Copy local code to the container image.
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . .
# expose port 8080 for app engine
EXPOSE 8080
# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
# CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 main:app
CMD ["gunicorn", "main:app", "-b", ":8080", "--timeout", "300"]
Finally, my main.py contains a very basic route, for the sake of argument:
from flask import Flask
app = Flask(__name__)
@app.route("/")
def return_hello():
    return "Hello!"
Could you please let me know what I'm doing wrong? I have been battling this issue for days now... Thank you!
I believe you still need to define a handler for your readiness_check (you're getting a 404, which means the route was not found).
See this article for an example
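For instance, a minimal sketch of main.py with an explicit handler for the path configured in app.yaml; the handler body is illustrative, since any 200 response should satisfy the probe:
# main.py -- sketch: add a route for the readiness probe
from flask import Flask

app = Flask(__name__)

@app.route("/")
def return_hello():
    return "Hello!"

@app.route("/readiness_check")
def readiness_check():
    # App Engine flex only needs a 200 response here
    return "", 200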
I have deployed a python:3.8-slim-buster image to Azure App Service. In general it runs correctly, as I can see the processing in the logs, but the health-check mechanism tries to ping the hosted server and gets no response, since the container only runs code that loops and processes messages from the queue.
It would be fine, but the application is being killed with the error:
Container didn't respond to HTTP pings on port: 80, failing site start.
Stopping site because it failed during startup.
Is there a way to either disable this "Waiting for response to warmup request for container" check, or to specify in the Dockerfile that those requests should be answered with OK?
Currently my Dockerfile is a two-liner that only copies the scripts and then runs the Python script.
The code that is inside this script is copied from https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-python-get-started-send#create-a-python-script-to-receive-events
The Dockerfile:
FROM python:3.8-slim-buster
COPY ./Scripts .
CMD [ "python3","-u","./calculate.py"]
The fix for that is to either host the script behind a server (e.g. Node.js, or the equivalent for your language), or to create a separate process that returns something on port 80.
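A minimal sketch of the second option, assuming the queue-processing loop stays as-is: answer the HTTP pings on port 80 from a background thread inside calculate.py (the handler and thread are illustrative additions, not part of the linked tutorial):
# calculate.py -- sketch: reply 200 OK to the platform's HTTP pings on port 80
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class PingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"OK")

def serve_pings():
    HTTPServer(("0.0.0.0", 80), PingHandler).serve_forever()

threading.Thread(target=serve_pings, daemon=True).start()

# ... the existing Event Hubs receive loop would continue below ...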
There might also be a problem with the default port configuration; here is an answer describing how to fix that case:
https://serverfault.com/questions/1003418/azure-docker-app-error-site-did-not-start-within-expected-time-limit-and-co
step-1
add EXPOSE 8080 inside the Dockerfile
step-2
build an image from the Dockerfile:
docker build . -t python-calculator
step-3
docker images # search for the image with the tag you mentioned earlier, i.e. python-calculator
step-4
docker run -p 8080:8080 -d python-calculator
step-5
browse to localhost:8080
I am currently running my Django app inside a Docker container using the command below:
docker-compose run app sh -c "python manage.py runserver"
but I am not able to access the app with the localhost URL (I am not using any additional DB server, nginx, or gunicorn, just running the Django development server inside Docker).
Please let me know how to access the app.
docker-compose run is intended to launch a utility container based on a service in your docker-compose.yml as a template. It intentionally does not publish the ports: declared in the Compose file, and you shouldn't need it to run the main service.
docker-compose up should be your go-to call for starting the services. Just docker-compose up on its own will start everything in the docker-compose.yml, concurrently, in the foreground; you can add -d to start the processes in the background, or a specific service name docker-compose up app to only start the app service and its dependencies.
The python command itself should be the main CMD in your image's Dockerfile. You shouldn't need to override it in your docker-compose.yml file or to provide it at the command line.
A typical Compose YAML file might look like:
version: '3.8'
services:
  app:
    build: .        # from the Dockerfile in the current directory
    ports:
      - 5000:5000   # make localhost:5000 forward to port 5000 in the container
While Compose supports many settings, you do not need to provide most of them. Compose provides reasonable defaults for container_name:, hostname:, image:, and networks:; expose:, entrypoint:, and command: will generally come from your Dockerfile and don't need to be overridden.
Try 0.0.0.0:<PORT_NUMBER> (typically 80 or 8000). If you still have trouble connecting to the server, use the Docker Machine IP instead of localhost. Enter the following in the terminal and navigate to the URL it returns:
docker-machine ip
I am experiencing a large performance penalty for calls to Tensorflow Serving, when the calling app is hosted in a docker container. I was hoping someone would have some suggestions. Details below on the setup and what I have tried.
Scenario 1:
Docker (version 18.09.0, build 4d60db4) hosted Tensorflow model, following the instructions here.
Flask app running on the host machine (not in container).
Using gRPC for sending the request to the model.
Performance: 0.0061 seconds / per prediction
Scenario 2:
Same docker container hosted Tensorflow model.
Flask app hosted in a container on the host machine (inside the same container as the model).
Using gRPC for sending the request to the model.
Performance: 0.0107 seconds / per prediction
In other words, when the app is hosted in the same container as the model, performance is ~40% lower.
I have logged timing on nearly every step in the app and have tracked the difference down to this line:
result = self.stub.Predict(self.request, 60.0)
In the container hosted app, the average round-trip for this is 0.006 seconds. For the same app hosted outside the container, the round-trip for this line is 0.002 seconds.
This is the function I am using to establish the connection to the model.
# Assumed imports for the beta gRPC API used below:
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2

def TFServerConnection():
    channel = implementations.insecure_channel('127.0.0.1', 8500)
    stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
    request = predict_pb2.PredictRequest()
    return (channel, stub, request)
I have tried hosting the app and the model in separate containers, building a custom TensorFlow Serving container (optimized for my VM), and using the REST API (which decreased performance slightly for both scenarios).
Edit 1
To add a bit more information, I am running the docker container with the following command:
docker run \
--detach \
--publish 8000:8000 \
--publish 8500:8500 \
--publish 8501:8501 \
--name tfserver \
--mount type=bind,source=/home/jason/models,target=/models \
--mount type=bind,source=/home/jason/myapp/serve/tfserve.conf,target=/config/tfserve.conf \
--network host \
jason/myapp:latest
Edit 2
I have now tracked this down to being an issue with stub.Predict(request, 60.0) in Flask apps only. It seems Docker is not the issue. Here are the versions of Flask and Tensorflow I am currently running.
$ sudo pip3 freeze | grep Flask
Flask==1.0.2
$ sudo pip3 freeze | grep tensor
tensorboard==1.12.0
tensorflow==1.12.0
tensorflow-serving-api==1.12.0
I am using gunicorn as my WSGI server:
gunicorn --preload --config config/gunicorn.conf app:app
And the contents of config/gunicorn.conf:
bind = "0.0.0.0:8000"
workers = 3
timeout = 60
worker_class = 'gevent'
worker_connections = 1000
Edit 3
I have now narrowed the issue down to Flask. I ran the Flask app directly with app.run() and got the same performance as when using gunicorn. What could Flask be doing that would slow the call to Tensorflow?
I have a helper container and an app container.
The helper container handles mounting of code via git to a shared mount with the app container.
I need the helper container to check for a package.json or requirements.txt in the cloned code and, if one exists, to run npm install or pip install -r requirements.txt, storing the dependencies in the shared mount.
The thing is, the npm and/or pip command needs to be run from the app container, to keep the helper container as generic and agnostic as possible.
One solution would be to mount the Docker socket into the helper container and run docker exec <app container> <command>, but what if I have thousands of such apps on a single host?
Will there be issues with hundreds of containers all accessing the Docker socket at the same time? And is there a better way to do this, to get commands run in another container?
Well, there is no "container to container" internal communication layer like "ssh". In this regard, the containers are as standalone as two different VMs (aside from the networking, in general).
You could go the usual way: install openssh-server on the "receiving" container and configure it for key-based authentication only. You do not need to publish the port to the host; just connect to the port over the Docker-internal network. Deploy the SSH private key on the "caller" container and the public key into .ssh/authorized_keys on the "receiving" container at container start time (via a volume mount), so you do not keep the secrets in the image (at build time).
Probably also create an ssh alias in .ssh/config and set StrictHostKeyChecking to no, since the containers could be rebuilt. Then do
ssh <alias> your-command
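If the helper side is Python, the same thing can be done programmatically; a minimal sketch using paramiko (paramiko itself, the hostname app, the user, and the key path are assumptions for illustration, not part of the setup above):
import paramiko

# Connect to the app container over the Docker-internal network.
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # containers get rebuilt, so don't pin host keys
client.connect("app", username="deploy", key_filename="/run/secrets/id_rsa")

# Run the install command inside the app container and show its output.
stdin, stdout, stderr = client.exec_command("pip install -r /shared/requirements.txt")
print(stdout.read().decode())
client.close()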
Found the better way I was looking for :-).
Using supervisord and running its XML-RPC server enables me to run something like:
supervisorctl -s http://127.0.0.1:9002 -utheuser -pthepassword start uwsgi
In the helper container, this will connect to the RPC server running on port 9002 in the app container and start a program defined by a block that may look something like:
[program:uwsgi]
directory=/app
command=/usr/sbin/uwsgi --ini /app/app.ini --uid nginx --gid nginx --plugins http,python --limit-as 512
autostart=false
autorestart=unexpected
stdout_logfile=/var/log/uwsgi/stdout.log
stdout_logfile_maxbytes=0
stderr_logfile=/var/log/uwsgi/stderr.log
stderr_logfile_maxbytes=0
exitcodes=0
environment = HOME="/app", USER="nginx"
This is exactly what I needed!
For anyone who finds this: you'll probably need the supervisord.conf in your app container to look something like:
[supervisord]
nodaemon=true
[supervisorctl]
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[inet_http_server]
port=127.0.0.1:9002
username=user
password=password
[program:uwsgi]
directory=/app
command=/usr/sbin/uwsgi --ini /app/app.ini --uid nginx --gid nginx --plugins http,python --limit-as 512
autostart=false
autorestart=unexpected
stdout_logfile=/var/log/uwsgi/stdout.log
stdout_logfile_maxbytes=0
stderr_logfile=/var/log/uwsgi/stderr.log
stderr_logfile_maxbytes=0
exitcodes=0
environment = HOME="/app", USER="nginx"
You can set up the inet_http_server to listen on a socket. You can link the containers so that you can access them at a hostname.
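If you prefer to drive it from Python instead of supervisorctl, a minimal sketch using the standard-library XML-RPC client (the hostname app is an assumption from linking the containers, and inet_http_server would need to listen on an address the helper container can reach rather than 127.0.0.1):
from xmlrpc.client import ServerProxy

# Supervisor exposes its control API over XML-RPC at /RPC2.
server = ServerProxy("http://user:password@app:9002/RPC2")
server.supervisor.startProcess("uwsgi")                        # equivalent of `supervisorctl start uwsgi`
print(server.supervisor.getProcessInfo("uwsgi")["statename"])  # e.g. RUNNING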