Python Docker client is not able to find Docker daemon - python

I am working on an Ubuntu/Windows dual-boot system with the following specifications:
system-specs
My Python version is 3.9.7.
I am trying to run the following Python program using Jina AI: simple-jina-examples/basics/2_executor_options.
But I keep getting stuck. While executing, the program shows the following output:
euhid@euhid-Inspiron-3576:~/Desktop/python_projects/simple-jina-examples/basics/2_executor_options$ python app.py
indexer@12691[C]:Docker daemon seems not running. Please run Docker daemon and try again.
encoder@12691[W]:Pea is being closed before being ready. Most likely some other Pea in the Flow or Pod failed to start
Collecting en-core-web-md==3.1.0
Using cached https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.1.0/en_core_web_md-3.1.0-py3-none-any.whl (45.4 MB)
Requirement already satisfied: spacy<3.2.0,>=3.1.0 in /home/euhid/Desktop/python_projects/jina-venv/lib/python3.9/site-packages (from en-core-web-md==3.1.0) (3.1.2)
I already have Docker installed and running on my computer, because when I run systemctl status docker I get the following output:
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2021-12-19 10:44:24 IST; 20s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 14868 (dockerd)
Tasks: 21
Memory: 60.6M
CPU: 860ms
CGroup: /system.slice/docker.service
└─14868 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Dec 19 10:44:21 euhid-Inspiron-3576 dockerd[14868]: time="2021-12-19T10:44:21.733734909+05:30" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/conta>
Dec 19 10:44:21 euhid-Inspiron-3576 dockerd[14868]: time="2021-12-19T10:44:21.733748034+05:30" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Dec 19 10:44:22 euhid-Inspiron-3576 dockerd[14868]: time="2021-12-19T10:44:22.315221221+05:30" level=info msg="[graphdriver] using prior storage driver: overlay2"
Dec 19 10:44:22 euhid-Inspiron-3576 dockerd[14868]: time="2021-12-19T10:44:22.545319612+05:30" level=info msg="Loading containers: start."
To troubleshoot the issue, I have already tried stopping and starting Docker with systemctl stop docker and systemctl start docker.
I have also tried uninstalling and reinstalling Docker.
So, I would like a way to resolve the above issue so that the program executes properly.
I believe this is happening because my Python Docker client is not able to find the Docker daemon.
If anyone has a way to fix this, with a proper and simple explanation, that would be very helpful.
As I am a beginner, I would really appreciate a beginner-friendly answer.

Well, this had a very simple answer.
My user was in the sudo group, but not in the docker group.
After adding my user to the docker group, I was able to fix the issue.
To add a user to a group, refer to the official documentation:
https://docs.docker.com/engine/install/linux-postinstall/
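As a quick sanity check (a sketch added here for illustration, assuming the official Docker SDK for Python, the docker package, is installed), you can confirm that a Python client can reach the daemon once your user is in the docker group and you have logged out and back in (or run newgrp docker):

import docker

client = docker.from_env()          # reads DOCKER_HOST if set, otherwise uses /var/run/docker.sock
print(client.ping())                # True if the daemon is reachable; raises an exception otherwise
print(client.version()["Version"])  # the Docker Engine version the client is talking to

If this prints True, Jina (and any other Python tooling that talks to Docker) should be able to find the daemon as well.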

Related

FastAPI with Gunicorn/Uvicorn stops responding

I'm currently using FastApi with Gunicorn/Uvicorn as my server engine.
I'm using the following config for Gunicorn:
TIMEOUT 0
GRACEFUL_TIMEOUT 120
KEEP_ALIVE 5
WORKERS 10
Uvicorn has all default settings and is started in the Docker container in the usual way:
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Everything is packed into a Docker container.
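For reference, here is a minimal sketch of how those settings would look in a standard gunicorn_conf.py, assuming Gunicorn is the process manager and Uvicorn runs as its worker class (this mapping is mine, not taken from the original setup):

# gunicorn_conf.py
workers = 10                                    # WORKERS 10
worker_class = "uvicorn.workers.UvicornWorker"  # ASGI worker class needed for FastAPI
timeout = 0                                     # TIMEOUT 0: worker timeout disabled entirely
graceful_timeout = 120                          # GRACEFUL_TIMEOUT 120: shutdown grace period, in seconds
keepalive = 5                                   # KEEP_ALIVE 5: seconds to hold idle keep-alive connections

It would be started with something like gunicorn -c gunicorn_conf.py app.main:app. Note that timeout = 0 disables Gunicorn's worker timeout, so a hung worker is never killed and restarted, which matters for the behaviour described below.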
The problem is the following:
After some time (somewhere between 1 day and 1 week, depending on load) my app stops responding (even a simple curl http://0.0.0.0:8000 command hangs forever). The Docker container keeps working, there are no application errors in the logs, and there are no connection issues, but none of my workers get the request (and so I never get my response). It seems like my request is lost somewhere between the server engine and my application. Any ideas how to fix it?
UPDATE: I've managed to reproduce this behaviour with a custom Locust load profile:
The scenario was the following:
In the first 15 minutes, ramp up to 50 users (30 of them sending requests that require the GPU at 1 rps, and 20 sending requests that do not require the GPU at 10 rps)
Work for another 4 hours
As the plot shows, the API stops responding after about 30 minutes. (And still, there are no error messages/warnings in the output.)
UPDATE 2:
Can there be any hidden memory leak or deadlock due to incorrect Gunicorn setup or bug (such as https://github.com/tiangolo/fastapi/issues/596)?
UPDATE 4:
I got inside my container and executed the ps command. It shows:
PID TTY TIME CMD
120 pts/0 00:00:00 bash
134 pts/0 00:00:00 ps
Which means my Gunicorn server app just silently turned off. There is also a binary file named core in the app directory, which obviously means that something has crashed.
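A diagnostic idea (my own suggestion, not from the original post): enabling Python's faulthandler in the application makes a hard crash (segfault or other fatal signal) dump a traceback to stderr instead of dying silently, and a registered signal can dump all thread stacks when the app hangs rather than crashes:

import faulthandler
import signal
import sys

faulthandler.enable(file=sys.stderr, all_threads=True)   # print Python tracebacks on fatal errors
faulthandler.register(signal.SIGUSR1, all_threads=True)  # kill -USR1 <pid> dumps every thread's stack

With that in place, the core-file scenario above would at least leave a Python-level traceback in the container logs, and a hang can be inspected by sending SIGUSR1 to the worker process.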

Systemd says my service is active and started, but I receive no output

I'm trying to run a bot on a VPS, and I was able to create a systemd service so that my Python code runs automatically if the server ever reboots for any reason. The service is enabled, its status shows as active when I check it, and journalctl shows that the .py file has started, but that's where my progress ends. I receive no other output after the notification that the service has started. And when I check my VPS console, there is 0 CPU usage, meaning that the script is in fact not running.
The script is located at /home/user/projects/ytbot1/bot/main.py and runs perfectly fine when executed manually through python3 main.py.
Both the script and the .service file were given u+x permissions for root and my user, and the service is set to run only when the user is logged in (I think; all I did was set User=myusername in ytbot1.service):
[Unit]
Description=reiss YT Bot
[Service]
User=reiss
Group=reiss
Type=exec
ExecStart=/usr/bin/python3 "/home/reiss/projects/ytbot1/bot/main.py"
Restart=always
RestartSec=5
PrivateTmp=true
TimeoutSec=900
[Install]
WantedBy=multi-user.target
Here's the output from sudo systemctl status ytbot1:
● ytbot1.service - reiss YT Bot
Loaded: loaded (/etc/systemd/system/ytbot1.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-05-16 10:34:04 CEST; 9s ago
Main PID: 7684 (python3)
Tasks: 1 (limit: 19141)
Memory: 98.4M
CGroup: /system.slice/ytbot1.service
└─7684 /usr/bin/python3 /home/reiss/projects/ytbot1/bot/main.py
And sudo journalctl -fu ytbot1.service:
root@vm1234567:~# journalctl -fu ytbot1.service
-- Logs begin at Mon 2022-05-16 07:41:00 CEST. --
May 16 10:07:18 vm1234567.contaboserver.net systemd[1]: Starting reiss YT Bot...
May 16 10:07:18 vm1234567.contaboserver.net systemd[1]: Started reiss YT Bot.
And it stops there; the log doesn't update or add new information.
desired output:
-- Logs begin at Mon 2022-05-16 07:41:00 CEST. --
May 16 10:07:18 vm1234567.contaboserver.net systemd[1]: Starting reiss YT Bot...
May 16 10:07:18 vm1234567.contaboserver.net systemd[1]: Started reiss YT Bot.
Handling GoogleAPI
2022 5 15 14 38 2
./APR_2022_V20 MAY_2022_V15.mp4
DOWNLOADING VIDEOS...
[...] *Script runs, you get the picture*
Any help? Could it be that I have my .py file in the wrong place, or maybe something's wrong with the .service file or working directory? Maybe I should use a different version of Python? The script I'm trying to run is pretty complex, so maybe forking could be an issue (the code calls a couple of Google APIs, but setting Type=forking just makes the service startup load indefinitely and then time out for some reason)? I don't know, man... I appreciate feedback. Thanks!
Try using /usr/bin/python3 -u and then the file path.
The -u option tells Python not to fully buffer output.
By default, Python uses line buffering if the output is a console and full buffering otherwise. Line buffering means output is saved up until there's a complete line and then flushed; full buffering can buffer many lines at a time. The systemd journal is probably not detected as a console.
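If you would rather not change the ExecStart line, here are a couple of equivalent ways to get unbuffered or line-buffered output (a sketch; the print text is just taken from the desired output above):

# Inside the script, Python 3.7+: force line buffering on stdout
import sys
sys.stdout.reconfigure(line_buffering=True)

# Or flush explicitly on the prints you care about
print("Handling GoogleAPI", flush=True)

Alternatively, setting Environment=PYTHONUNBUFFERED=1 in the [Service] section of the unit file has the same effect as the -u flag.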

Elastic Beanstalk deployment failing due to .ebextensions File

I'm deploying a Flask (flask-restplus) REST API to an AWS Elastic Beanstalk instance, and I'm running into a weird failure mode.
One of my API endpoints has a dependency on OpenCV, which requires some system packages, as outlined at: ImportError: libGL.so.1: cannot open shared object file: No such file or directory while importing OCC. Per the answers there, I created an .ebextensions directory and created two files, one to install the libGL packages, which looks like this:
packages:
  yum:
    mesa-libGL: []
    mesa-libGL-devel: []
I saved that file as packages.config, if that matters.
The second file in .ebextensions downloads and installs zlib:
commands:
  00_download_zlib:
    command: |
      wget https://github.com/madler/zlib/archive/v1.2.9.tar.gz
      tar xzvf v1.2.9.tar.gz
      cd zlib-1.2.9
      ./configure
      make
      make install
      ln -fs /usr/local/lib/libz.so.1.2.9 /lib64/libz.so
      ln -fs /usr/local/lib/libz.so.1.2.9 /lib64/libz.so.1
I saved that file as zlib.config.
When I first ran eb deploy, everything worked great. Deployment was successful, my API responded to requests, and the code that depended on OpenCV worked. So far so good.
However, on subsequent deployments, I've gotten the following errors:
2020-11-18 23:47:44 ERROR Instance deployment failed. For details, see 'eb-engine.log'.
2020-11-18 23:47:45 ERROR [Instance: i-XXXXXXXXXXXXX] Command failed on instance. Return code: 1 Output: Engine execution has encountered an error..
2020-11-18 23:47:45 INFO Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].
2020-11-18 23:47:45 ERROR Unsuccessful command execution on instance id(s) 'i-XXXXXXXXXXXXX'. Aborting the operation.
2020-11-18 23:47:46 ERROR Failed to deploy application.
I went in and pulled down the logs from the instance, first looking at eb-engine.log. The only error there is:
2020/11/18 23:47:44.131837 [ERROR] An error occurred during execution of command [app-deploy] - [PreBuildEbExtension]. Stop running the command. Error: EbExtension build failed. Please refer to /var/log/cfn-init.log for more details.
However, looking at cfn-init.log just indicates that everything succeeded:
2020-11-18 23:47:34,297 [INFO] -----------------------Starting build-----------------------
2020-11-18 23:47:34,306 [INFO] Running configSets: Infra-EmbeddedPreBuild
2020-11-18 23:47:34,309 [INFO] Running configSet Infra-EmbeddedPreBuild
2020-11-18 23:47:34,313 [INFO] Running config prebuild_0_newapi
2020-11-18 23:47:36,512 [INFO] Running config prebuild_1_newapi
2020-11-18 23:47:44,106 [INFO] Command 00_download_zlib succeeded
2020-11-18 23:47:44,108 [INFO] ConfigSets completed
2020-11-18 23:47:44,108 [INFO] -----------------------Build complete-----------------------
I then tried removing the entire .ebextensions directory and re-deploying, and the deployment succeeded. Then I tried adding back the .ebextensions directory and adding the files one at a time, and discovered that the deployment worked fine when I added packages.config, but failed again when I added zlib.config.
My question boils down to: why is this happening, and is there anything I can do to resolve it? My understanding is that I need both of these files deployed to my instance in case I migrate to a different environment, AutoScaling replaces my instance, etc.
The only thing I can think of is that the instance doesn't like the fact that I keep re-installing zlib, but the cfn-init-cmd.log indicates that all the commands in zlib.config are succeeding, as does cfn-init.log. So why is eb-engine.log reporting an error? Is it telling me to look in the wrong place for logs that may be relevant? I've looked in every log file and I don't see anything else indicating any issues.
I did find one tangentially-related possible solution relating to Immutable Environment Updates, which looks like it may work but feels like a bit of unnecessary work. At the very least I'd like to understand why I need to make that change and why Elastic Beanstalk isn't playing nicer with my .ebextensions.
Just in case anyone runs across this in the future, I wanted to share the solution. I was never able to determine why the zlib install process was failing after the first deployment on an instance, so I ended up using the Immutable Environment Updates settings I linked in my original question.
Deployments take a bit longer, since each one creates an Auto Scaling group and a new instance, but they just work every time now.

Openshift cartridge deploying to wrong/old app

After accidentally damaging my Flask app on OpenShift, I deleted it and am trying to rebuild it. I believe I have installed it correctly by creating a new Python app, then performing:
$ git remote set-url origin ssh://55ddee2489f5.......@myapp-mydomain.rhcloud.com/~/git/myapp.git/
$ git push -f origin master
Then:
remote: Activation status: success
remote: Deployment completed with status: success
To ssh://55ddee248........c@myflaskapp-mydomain.rhcloud.com/~/git/myflaskapp.git/
+ 068620c...00df6fb master -> master (forced update)
Next I want to add a redis cartridge.
$ rhc add-cartridge http://cartreflect-claytondev.rhcloud.com/reflect\?github\=smarterclayton/openshift-redis-cart
The cartridge 'http://cartreflect-claytondev.rhcloud.com/reflect?github=smarterclayton/openshift-redis-cart' will be downloaded and installed
Adding http://cartreflect-claytondev.rhcloud.com/reflect?github=smarterclayton/openshift-redis-cart to application 'myflaskapp' ... Application '5585ab144.......'
not found.
As you can see, the cartridge is being deployed to the old location '5585ab144.......', not ssh://55ddee248........c@myflaskapp-mydomain.rhcloud.com/~/git/myflaskapp.git/.
How can I fix this?
If you use the same DNS (application) name (app-domain.rhcloud.com) that your old application was using, you need to wait for the DNS to update and point to the new application. It could take up to 24 hours, but usually it just takes a couple of hours.
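As a quick way to see what the name currently resolves to while you wait, something like this works (the hostname below is the one from the question; substitute your own):

import socket

# Prints the IP address the DNS name currently points at; compare it with
# the address of the newly created application.
print(socket.gethostbyname("myflaskapp-mydomain.rhcloud.com"))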

Apache2 not serving Django content

I am taking over a Django project which another developer maintained. The service runs on an Ubuntu machine. ZEO is used for content caching, Ajax/Dajax is used for asynchronous content, Celery is used for task management, and Django is used for the project itself.
The service is usually reached via a specific IP address, http://my_server_ip, which limits access to specific URLs. Without my knowingly changing anything, this stopped working. Instead of taking me to the splash page, entering the IP would hang, never connecting. I don't get a 404, 500, or other error; it just sits and continually tries to load, as if waiting to connect or to receive content.
I attempted to restart the service in the hope that this would solve the problem; it did not. I performed a system reboot and ran the following commands, per the prior developer's documentation, to bring the server back up.
From within the django project:
runzeo -a localhost:8090 -f /path/to/operations_cache.fs
su project_owner
python manage.py celery worker -n multiprocessing_worker --loglevel=debug -Q multiprocessing_queue
python manage.py celery worker --concurrency=500 --pool=eventlet --loglevel=debug -Q celery -n eventlet_worker
The two celery commands had to be run as the owner of the project directory.
Finally, I ran sudo service apache2 restart. Upon completion, I tried to navigate to the webpage but received the same response: hung on connecting. The Trac pages do work at http://my_server_ip/trac.
The following is all I have found in the apache log files.
error.log
[Fri Feb 06 16:01:11 2015] [error] /usr/lib/python2.7/dist-packages/configobj.py:145: DeprecationWarning: The compiler package is deprecated and removed in Python 3.x.
[Fri Feb 06 16:01:11 2015] [error] import compiler
[Fri Feb 06 16:01:11 2015] [error]
access.log
my.ip - user [06/Feb/2015:15:55:40 -0500] "GET / HTTP/1.1" 500 632 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:35.0 Gecko/20100101 Firefox/35.0"
I have tried looking into the Django logs, but nothing appears there. Perhaps I am not finding the correct logs.
As a starting point, where can I find out what the system is hanging on? How do I determine whether it is an issue with Django, Apache, the interaction between the two, etc.? That would help me zero in on specifically what is happening.
Edit
I was able to resolve my problem, though I cannot say for sure what fixed it. I suspect the issue had to do with permissions on the static files folder. I serve my content through the www-data user. After going through the steps described above, I ran yes yes | python manage.py collectstatic as the user www-data. The server was able to restart, and I was able to access the Trac content as well as the Django content.
I am hesitant to post this as an answer because I do not know with certainty whether my solution described here is the step which solved the problem.
