I'm currently using FastAPI with Gunicorn/Uvicorn as my server engine.
I'm using the following config for Gunicorn:
TIMEOUT 0
GRACEFUL_TIMEOUT 120
KEEP_ALIVE 5
WORKERS 10
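For reference, the same settings would map to a plain gunicorn_conf.py roughly like this (the worker class line is an assumption on my side, since FastAPI needs an ASGI worker):
# gunicorn_conf.py -- rough equivalent of the settings above (sketch, not the actual file)
timeout = 0                   # TIMEOUT 0: never kill a worker for taking too long
graceful_timeout = 120        # GRACEFUL_TIMEOUT 120
keepalive = 5                 # KEEP_ALIVE 5
workers = 10                  # WORKERS 10
worker_class = "uvicorn.workers.UvicornWorker"  # assumed; FastAPI needs an ASGI worker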
Uvicorn has all default settings and is started in the Docker container in the usual way:
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Everything is packed into a Docker container.
The problem is the following:
After some time (somewhere between 1 day and 1 week, depending on load) my app stops responding: even a simple curl http://0.0.0.0:8000 hangs forever. The Docker container keeps running, there are no application errors in the logs, and there are no connection issues, but none of my workers receive the request (and so I never get a response). It seems like my request is lost somewhere between the server engine and my application. Any ideas on how to fix it?
UPDATE: I've managed to reproduce this behaviour with a custom Locust load profile:
The scenario was the following:
In the first 15 minutes, ramp up to 50 users (30 of them send requests requiring the GPU at 1 rps, and 20 send requests that do not require the GPU at 10 rps)
Work for another 4 hours
As the plot shows, the API stops responding after about 30 minutes. (And still, there are no error messages or warnings in the output.)
UPDATE 2:
Could there be a hidden memory leak or deadlock due to an incorrect Gunicorn setup or a bug (such as https://github.com/tiangolo/fastapi/issues/596)?
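One pattern I'm suspecting (only a sketch of the hypothesis, with a hypothetical heavy_gpu_call standing in for my real GPU code): a blocking call inside an async def endpoint blocks that worker's event loop, so every other request on the worker hangs, while moving it to a thread pool does not:
import time
from fastapi import FastAPI
from starlette.concurrency import run_in_threadpool

app = FastAPI()

def heavy_gpu_call():
    time.sleep(5)  # stand-in for a blocking GPU inference call
    return "done"

@app.get("/gpu-bad")
async def gpu_bad():
    return {"result": heavy_gpu_call()}  # blocks the event loop for 5 s

@app.get("/gpu-ok")
async def gpu_ok():
    return {"result": await run_in_threadpool(heavy_gpu_call)}  # runs in a thread, loop stays free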
UPDATE 4:
I got inside my container and executed the ps command. It shows:
PID TTY TIME CMD
120 pts/0 00:00:00 bash
134 pts/0 00:00:00 ps
This means my Gunicorn server app just silently turned off. There is also a binary file named core in the app directory, which obviously means that something has crashed.
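To get more out of that core file, something like this should work (assuming gdb can be installed inside the container):
file core                      # confirms which binary produced the dump
gdb $(which python3) core      # then type bt at the (gdb) prompt to see the crash backtrace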
I'm trying to run a bot on a VPS, and I've managed to create a systemd service so that my Python code runs automatically if the server ever reboots for any reason. The service is enabled, the status shows as active when I check it, and journalctl shows that the .py file has started, but that's where my progress ends. I receive no other output after the notification that the service has started. And when I check my VPS console, there is 0% CPU usage, meaning the script is in fact not running.
The script is located at /home/user/projects/ytbot1/bot/main.py and runs perfectly fine when executed manually through python3 main.py.
Both the script and the .service file were given u+x permissions for root and my user, and the service is set to run only when the user is logged in (I think... all I did was set User=myusername in ytbot1.service).
[Unit]
Description=reiss YT Bot
[Service]
User=reiss
Group=reiss
Type=exec
ExecStart=/usr/bin/python3 "/home/reiss/projects/ytbot1/bot/main.py"
Restart=always
RestartSec=5
PrivateTmp=true
TimeoutSec=900
[Install]
WantedBy=multi-user.target
Here's the output from sudo systemctl status ytbot1:
● ytbot1.service - reiss YT Bot
Loaded: loaded (/etc/systemd/system/ytbot1.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-05-16 10:34:04 CEST; 9s ago
Main PID: 7684 (python3)
Tasks: 1 (limit: 19141)
Memory: 98.4M
CGroup: /system.slice/ytbot1.service
└─7684 /usr/bin/python3 /home/reiss/projects/ytbot1/bot/main.py
And from sudo journalctl -fu ytbot1.service:
root@vm1234567:~# journalctl -fu ytbot1.service
-- Logs begin at Mon 2022-05-16 07:41:00 CEST. --
May 16 10:07:18 vm1234567.contaboserver.net systemd[1]: Starting reiss YT Bot...
May 16 10:07:18 vm1234567.contaboserver.net systemd[1]: Started reiss YT Bot.
And it stops there; the log doesn't update or add new information.
Desired output:
-- Logs begin at Mon 2022-05-16 07:41:00 CEST. --
May 16 10:07:18 vm1234567.contaboserver.net systemd[1]: Starting reiss YT Bot...
May 16 10:07:18 vm1234567.contaboserver.net systemd[1]: Started reiss YT Bot.
Handling GoogleAPI
2022 5 15 14 38 2
./APR_2022_V20 MAY_2022_V15.mp4
DOWNLOADING VIDEOS...
[...] *Script runs, you get the picture*
Any help? Could it be that I have my .py file in the wrong place, or maybe something's wrong with the .service file or working directory? Maybe I should use a different version of Python? The script I'm trying to run is pretty complex, so maybe forking could be an issue (the code calls a couple of Google APIs, but setting Type=forking just makes the service startup load indefinitely and then time out for some reason)? I don't know, man... I appreciate feedback. Thanks!
Try using /usr/bin/python3 -u and then the file path.
The -u option tells Python not to fully buffer output.
By default, Python uses line buffering if the output is a console, otherwise full buffering. Line buffering means output is saved up until there's a complete line, and then flushed. Full buffering can buffer many lines at a time. And the systemd journal is probably not detected as a console.
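Applied to the unit file above, that would look roughly like this (either form should do; the Environment= line is an equivalent alternative):
[Service]
ExecStart=/usr/bin/python3 -u "/home/reiss/projects/ytbot1/bot/main.py"
# or keep ExecStart as it was and add:
Environment=PYTHONUNBUFFERED=1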
I am running airflow standalone as a local development environment. I followed the instructions provided by Airflow to setup the environment, but now I'd like to shut it down in the most graceful way possible.
I ran the standalone command in a terminal, and so my first attempt was to simply use Ctrl+C. It looks promising:
triggerer | [2022-02-02 10:44:06,771] {triggerer_job.py:251} INFO - 0 triggers currently running
^Cstandalone | Shutting down components
However, even 10 minutes later, the shutdown is still in progress, with no more messages in the terminal. I used Ctrl+C again and got a KeyboardInterrupt. Did I do this the wrong way? Is there a better way to shut down the standalone environment?
You could try the following (in bash):
pkill --signal 2 -u $USER airflow
or
pkill --signal 15 -u $USER airflow
or
pkill --signal 9 -u $USER airflow
Say what?
Here's more description of each part:
pkill - Process kill function.
--signal - Tells what 'signal' to send to the process
2 | 15 | 9 - Is the id for the terminal signal to send.
2 = SIGINT, which is like CTRL + C.
15 = SIGTERM, the default for pkill.
9 = SIGKILL, which doesn't mess around with gracefully ending a process.
For more info, run kill -L in your bash terminal.
-u - Tells the function to only match processes whose real user ID is listed.
$USER - The current session user environment variable. This may be different on your system, so adjust accordingly.
airflow - The selection criterion or pattern to match (here, the process name).
See the pkill info page for more detail on the available options.
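If you want to see what would be matched before sending any signal (same matching rules as pkill), a quick check is:
pgrep -a -u $USER airflow     # lists the PIDs and command lines the pkill above would hit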
I am working on an Ubuntu/Windows dual-booted system (system specs attached as an image), and my Python version is 3.9.7.
So, I am trying to run the following Python program using Jina AI: simple-jina-examples/basics/2_executor_options.
But I am continuously getting stuck. While executing, the program keeps showing the following output -->
euhid#euhid-Inspiron-3576:~/Desktop/python_projects/simple-jina-examples/basics/2_executor_options$ python app.py
indexer#12691[C]:Docker daemon seems not running. Please run Docker daemon and try again.
encoder#12691[W]:Pea is being closed before being ready. Most likely some other Pea in the Flow or Pod failed to start
Collecting en-core-web-md==3.1.0
Using cached https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.1.0/en_core_web_md-3.1.0-py3-none-any.whl (45.4 MB)
Requirement already satisfied: spacy<3.2.0,>=3.1.0 in /home/euhid/Desktop/python_projects/jina-venv/lib/python3.9/site-packages (from en-core-web-md==3.1.0) (3.1.2)
Now, I already have Docker installed and running on my computer, because when I do systemctl status docker, I get the following output -->
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2021-12-19 10:44:24 IST; 20s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 14868 (dockerd)
Tasks: 21
Memory: 60.6M
CPU: 860ms
CGroup: /system.slice/docker.service
└─14868 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Dec 19 10:44:21 euhid-Inspiron-3576 dockerd[14868]: time="2021-12-19T10:44:21.733734909+05:30" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/conta>
Dec 19 10:44:21 euhid-Inspiron-3576 dockerd[14868]: time="2021-12-19T10:44:21.733748034+05:30" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Dec 19 10:44:22 euhid-Inspiron-3576 dockerd[14868]: time="2021-12-19T10:44:22.315221221+05:30" level=info msg="[graphdriver] using prior storage driver: overlay2"
Dec 19 10:44:22 euhid-Inspiron-3576 dockerd[14868]: time="2021-12-19T10:44:22.545319612+05:30" level=info msg="Loading containers: start."
And to troubleshoot the issue, I have already tried stopping and starting docker with the commands systemctl stop docker and systemctl start docker.
I have also tried uninstalling and installing docker again.
So, I would like a way to resolve the above issue so that the program executes properly.
And, I believe that this is happening because my Python Docker client is not able to find the Docker daemon.
If anyone has a way to fix this, with a proper and simple explanation, it would be very helpful. As I am a beginner, I would really appreciate a beginner-friendly answer.
Well, this had a very simple answer.
So, basically, I was in the sudo group but not in the docker group.
After adding my user to the docker group, I was able to fix the above issue.
To add a user to a group, one can refer directly to the documentation at -->
https://docs.docker.com/engine/install/linux-postinstall/
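For reference, the relevant steps from that page boil down to the following (log out and back in, or use newgrp, so the new group membership takes effect):
sudo groupadd docker           # may already exist
sudo usermod -aG docker $USER
newgrp docker                  # or log out and log back in
docker run hello-world         # quick check that the daemon is reachable without sudo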
Please help me with this error:
ERROR: for web Cannot start service web: driver failed programming
external connectivity on endpoint semestral_dj01
(335d0ad4599512f3228b4ed0bd1bfed96f54af57cff4a553d88635f80ac2e26c):
Bind for 0.0.0.0:8000 failed: port is already allocated ERROR:
Encountered errors while bringing up the project.
Go to the terminal and run the command:
lsof -i:8000
Where 8000 is the port number.
The result will be like:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
Python 123456 user ab type 123 000 TCP 0.0.0.0:8000
Now run command in terminal:
kill -9 <PID>
like
kill -9 123456
Then again run your server and the issue will be resolved.
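If lsof points at docker-proxy rather than a stray local process, the port is held by another container; in that case something like this should find and stop it:
docker ps --filter "publish=8000"     # show the container publishing port 8000
docker stop <container_id_from_above>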
The way I resolved this was by stopping the running containers and then starting the one I wanted.
Use this command in your terminal to stop all containers:
docker stop $(docker ps -a -q)
In case you want to delete them, use this:
docker rm $(docker ps -a -q)
This happens to me from time to time in my dev environment. Usually I have to restart the Docker service to get it working again.
I encountered a very similar error. In my case, I had recently upgraded the native nginx version on the Linux box. After the upgrade, nginx automatically started (I had not noticed). When I deployed a docker image with nginx, the 2 nginx instances were competing for the same port (native and docker).
I saw it with:
> sudo netstat -nl -p tcp | grep 443
tcp   0   0 0.0.0.0:443   0.0.0.0:*   LISTEN   #####/nginx: master
tcp6 0 0 :::443 :::* LISTEN #####/nginx: master
It was a bit confusing since I was trying to get nginx to run, and it said nginx was using the port. After I had typed docker-compose down, I realized nginx was still using the port, even though the nginx container was destroyed. That made me realize that the native nginx had started up again, even though I didn't manually start it.
My error message:
Cannot start service <webserver>: driver failed programming external connectivity on endpoint <server_instance>_webserver (...<guid>...): Error starting userland proxy: listen tcp 0.0.0.0:443: bind: address already in use
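The straightforward way out (assuming you want the containerized nginx to own the port) is to stop, and optionally disable, the host's nginx before bringing the stack back up:
sudo systemctl stop nginx        # frees port 443 held by the native instance
sudo systemctl disable nginx     # optional: keep it from auto-starting again
docker-compose up -d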
I added a Bottle server that uses Python's cassandra library, but it exits with this error: Bottle FATAL Exited too quickly (process log may have details). The log shows this:
File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 1765, in _reconnect_internal
    raise NoHostAvailable("Unable to connect to any servers", errors)
So I tried to run it manually using supervisorctl start Bottle, and then it started with no issue. The conclusion: the Bottle service starts too fast (before the Cassandra service it depends on does), so a delay is needed!
This is what I use:
[program:uwsgi]
command=bash -c 'sleep 5 && uwsgi /etc/uwsgi.ini'
Not happy with the sleep hack, I created a startup script and launched supervisorctl start processname from there.
[program:startup]
command=/startup.sh
startsecs = 0
autostart = true
autorestart = false
startretries = 1
priority=1
[program:myapp]
command=/home/website/venv/bin/gunicorn /home/website/myapp/app.py
autostart=false
autorestart=true
process_name=myapp
startup.sh
#!/bin/bash
sleep 5
supervisorctl start myapp
This way supervisor fires the startup script once, which starts myapp after 5 seconds; mind the autostart=false and autorestart=true on myapp.
I had a similar issue where starting 64 Python rq-worker processes using supervisorctl was raising CPU and RAM alerts at every restart. What I did was the following:
command=/bin/bash -c "sleep %(process_num)02d && virtualenv/bin/python3 manage.py rqworker --name %(program_name)s_my-rq-worker_%(process_num)02d default low"
Basically, before running the Python command, I sleep for N seconds, where N is the process number, which means supervisor will start one rq-worker process per second.
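For context, %(process_num)02d gets its value from numprocs, so the surrounding program section looks roughly like this (program name and paths are placeholders, not my exact config):
[program:my-rq-worker]
command=/bin/bash -c "sleep %(process_num)02d && virtualenv/bin/python3 manage.py rqworker --name %(program_name)s_my-rq-worker_%(process_num)02d default low"
process_name=%(program_name)s_%(process_num)02d
numprocs=64
autostart=true
autorestart=true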