I am updating my ECS service to use a new task definition. The task definition in this case is a Flask application running on Gunicorn.
Under certain conditions, I want the Flask application to exit and, as a result, the update to the ECS service to fail. Specifically, I want to check the database connection at startup and exit if the database is not reachable.
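For illustration, a minimal sketch of the kind of startup check described; SQLAlchemy and the DATABASE_URL environment variable are assumptions, not part of the original setup:

import os
import sys
import sqlalchemy

def check_database_or_exit():
    # Fail fast: if the database is unreachable, exit non-zero so the
    # container stops and ECS marks the task as failed.
    url = os.environ.get("DATABASE_URL", "postgresql://localhost/mydb")  # placeholder URL
    try:
        engine = sqlalchemy.create_engine(url)
        with engine.connect():
            pass
    except Exception as exc:
        print(f"Database check failed: {exc}", file=sys.stderr)
        sys.exit(1)

check_database_or_exit()  # run at import time, before the app starts serving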
However, I am not seeing this. Whenever I exit or kill the Flask application (using sys.exit or os.kill, for example), the service update still proceeds and new tasks keep being started.
How do I make the update to the ECS service fail as well, given that my entrypoint exits?
I saw this as well: https://aws.amazon.com/blogs/containers/graceful-shutdowns-with-ecs/ but handling SIGTERM didn't help either.
Update: the problem was that the service kept crash-looping on the failing tasks. To resolve this, I enabled the ECS deployment circuit breaker: https://aws.amazon.com/blogs/containers/announcing-amazon-ecs-deployment-circuit-breaker/ which breaks out of this loop and marks the deployment as failed.
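For anyone looking for the concrete setting, here is a minimal sketch of enabling the circuit breaker with boto3; the cluster and service names are placeholders, and the same option is available through the AWS CLI and CloudFormation:

import boto3

ecs = boto3.client("ecs")

# Enable the deployment circuit breaker with automatic rollback so a
# crash-looping deployment is stopped and rolled back to the last
# working task definition.
ecs.update_service(
    cluster="my-cluster",    # placeholder cluster name
    service="my-service",    # placeholder service name
    deploymentConfiguration={
        "deploymentCircuitBreaker": {"enable": True, "rollback": True}
    },
)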
Related
I am looking for help deploying my Flask app. I've already written the app and it works well. I'm currently using the following command in the directory of my Flask code:
sudo uwsgi --socket 0.0.0.0:70 --protocol=http -w AppName:app --buffer-size=32768
This is on my Amazon Lightsail instance. I have the instance linked to a static public IP, and if I navigate to the website, it works great. However, to keep the command running in the background even after logging out of the Lightsail instance, I first start a screen session, execute the above command, and then detach the screen using Ctrl+A then D.
The problem is, if the app crashes (which is understandable since it is very large and under development), or if the command is left running for too long, the process is killed and the app is no longer being served.
I am looking for a better method of deploying a flask app on Amazon Lightsail so that it will redeploy the app in the event of a crash without any interaction from myself.
Generally you would write your own systemd unit file to keep your application running, restart it automatically when it crashes, and start it when the instance boots; a rough sketch follows the links below.
There are many tutorials out there showing how to write such a unit file. Some examples:
Systemd: Service File Examples
Creating a Linux service with systemd
How to write startup script for Systemd?
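As a rough sketch only, a unit file along these lines would keep the uwsgi command from above running and restart it on crashes; the file path, working directory, and uwsgi location are assumptions to adapt:

[Unit]
Description=Flask app served by uwsgi
After=network.target

[Service]
WorkingDirectory=/home/ubuntu/myapp
ExecStart=/usr/local/bin/uwsgi --socket 0.0.0.0:70 --protocol=http -w AppName:app --buffer-size=32768
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Save it as /etc/systemd/system/myapp.service, then run systemctl daemon-reload, systemctl enable myapp and systemctl start myapp.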
You can use pm2
Starting an application with PM2 is straightforward. It will auto-discover the interpreter to run your application depending on the script extension. This is configurable via the Ecosystem config file, as I will show you later on in this article.
All you need to do is install pm2 and then run:
pm2 start appy.py
Great, this application will now run forever, meaning that if the process exits or throws an exception it will be restarted automatically. If you exit the console and connect again, you will still be able to check the application's state.
To list the applications managed by PM2, run:
pm2 ls
You can also check the logs:
pm2 logs
Keeping Processes Alive at Server Reboot
If you want to keep your application online across unexpected (or expected) server restarts, you will want to set up an init script that tells your system to boot PM2 and your applications.
It’s really simple with PM2, just run this command (without sudo):
pm2 startup
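Running pm2 startup prints a platform-specific command to execute once; after that, save the current process list so PM2 restores it on reboot:
pm2 save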
PM2: Manage Python Processes
I have a Flask app which I'm trying to front with Gunicorn. I want to use the --preload flag since my application has some scheduled jobs using APScheduler which I want to run only in the master and not in the workers.
I also want to use Python's ThreadPoolExecutor to delegate jobs to the background, triggered by a route on my app.
When I use the --preload flag with Gunicorn, any calls to my ThreadPoolExecutor (using executor.submit) seem to fail. The same seems to happen when I programmatically trigger a job through APScheduler.
When I don't use the --preload flag, everything runs smoothly.
Is there some config I can change to get this working, or would this not work with the --preload flag?
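For reference, a minimal sketch of the kind of setup described; the module, route, and job names are made up for illustration, and it would be run with something like gunicorn --preload -w 2 app:app:

# app.py
from concurrent.futures import ThreadPoolExecutor
from flask import Flask

app = Flask(__name__)
# Created at import time, i.e. in the Gunicorn master when --preload is used,
# then inherited by the forked workers.
executor = ThreadPoolExecutor(max_workers=4)

def background_job(payload):
    # placeholder for the real long-running work
    print(f"processing {payload}")

@app.route("/enqueue/<payload>")
def enqueue(payload):
    executor.submit(background_job, payload)  # hand the work to the background pool
    return "queued", 202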
I have created a Django application and uploaded it to AWS EC2. I can access the site via the public IP address only while I am running python manage.py in the AWS command line.
If I close the PuTTY window, I am no longer able to access the site. How can I make sure that the site is always available, even after I close the command line / PuTTY session?
I tried the WSGI option but it's not working at all. I would appreciate help with a solution to keep the Python application running on AWS.
It happens because you are running the app from within the SSH session; ending the session sends SIGHUP, which kills your application.
There are several ways to keep the app running after you disconnect from SSH. The simplest is to run it inside a screen session and leave that session running when you disconnect. The advantage of this method is that when you reconnect to the machine you can still control the app, check its state, and potentially look at the logs.
Although that might be pretty cool, it's really a stopgap. The more stable and solid way is to create a service that runs the app and lets you start it, stop it, and look at the logs using the nifty wrappers of systemd.
Keep the process running with screen (an example session is shown after these steps):
First, make sure screen is installed (via apt-get or yum, whichever suits your distro).
Run screen.
Run the app just like you did outside screen.
Detach from the screen session by pressing Ctrl+A and then d.
Disconnect from SSH and see how the app is still running.
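For example, the whole session could look like this; the session name and runserver port are assumptions:

screen -S django                         # start a named screen session
python manage.py runserver 0.0.0.0:8000  # run the app inside the session
# press Ctrl+A, then d, to detach
screen -r django                         # later: reattach to check on the app or read its output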
Creating a systemd service is a bit more complicated so try and read through the following manual.
When new versions of my Django application are deployed to Heroku, the workers are forced to restart. I have some long-running tasks which should perform some cleanup prior to being killed.
I have tried registering a worker_shutdown hook, which doesn't ever seem to get called.
I have also tried the answer in Notify celery task of worker shutdown, but I am unclear how to abort a given task from within this context, as calling celery.task.control.active() throws an exception (Celery is no longer running).
Thanks for any help.
If you control the deployment, maybe you can run a script that does a Control.broadcast to a custom command you register beforehand, and only after receiving the required replies (you'd have to implement that logic) continue the deployment (or raise a TimeoutException)?
Also, Celery already has a predefined command for shutdown which I'm guessing you could overload in your instance or subclass of the worker. Commands have the advantage of being passed a Panel instance, which gives you access to the consumer. That should expose a lot of control right there...
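As a rough sketch of the broadcast idea only, assuming the older Celery 3.x control-command API (Panel.register) and made-up names for the command and the cleanup flag:

# in a module imported by the worker process
from celery.worker.control import Panel

CLEANUP_DONE = False  # your long-running tasks would set this once their cleanup finishes

@Panel.register
def ready_for_shutdown(state, **kwargs):
    # custom remote-control command: reply with whether cleanup has finished
    return {"ready": CLEANUP_DONE}

# in the deployment script
from celery import Celery

app = Celery(broker="amqp://localhost")  # placeholder broker URL
replies = app.control.broadcast("ready_for_shutdown", reply=True, timeout=10)
print(replies)  # decide whether to continue the deployment based on the replies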
I have a WSGI app with a Celery component. Basically, when certain requests come in they can hand off relatively time-consuming tasks to Celery. I have a working version of this product on a server I set up myself, but our client recently asked me to deploy it to Cloud Foundry. Since Celery is not available as a service on Cloud Foundry, we (me and the client's deployment team) decided to deploy the app twice – once as a WSGI app and once as a standalone Celery app, sharing a RabbitMQ service.
The code between the apps is identical. The WSGI app responds correctly, returning the expected web pages. vmc logs celeryapp shows that Celery appears to be up and running, but when I send requests to the WSGI app that should become Celery tasks, they disappear as soon as they get to a .delay() statement. They neither appear in the Celery logs nor show up as an error.
Attempts to debug:
I can't use celery.contrib.rdb in Cloud Foundry (to supply a telnet interface to pdb), as each app is sandboxed and port-restricted.
I don't know how to find the specific rabbitmq instance these apps are supposed to share, so I can see what messages it's passing.
Update: to corroborate the above statement about finding rabbitmq, here's what happens when I try to access the node that should be sharing celery tasks:
root@cf:~# export RABBITMQ_NODENAME=eecef185-e1ae-4e08-91af-47f590304ecc
root@cf:~# export RABBITMQ_NODE_PORT=57390
root@cf:~# ~/cloudfoundry/.deployments/devbox/deploy/rabbitmq/sbin/rabbitmqctl list_queues
Listing queues ...
=ERROR REPORT==== 18-Jun-2012::11:31:35 ===
Error in process <0.36.0> on node 'rabbitmqctl17951@cf' with exit value: {badarg,[{erlang,list_to_existing_atom,["eecef185-e1ae-4e08-91af-47f590304ecc@localhost"]},{dist_util,recv_challenge,1},{dist_util,handshake_we_started,1}]}
Error: unable to connect to node 'eecef185-e1ae-4e08-91af-47f590304ecc@cf': nodedown
diagnostics:
- nodes and their ports on cf: [{'eecef185-e1ae-4e08-91af-47f590304ecc',57390},
  {rabbitmqctl17951,36032}]
- current node: rabbitmqctl17951@cf
- current node home dir: /home/cf
- current node cookie hash: 1igde7WRgkhAea8fCwKncQ==
How can I debug this and/or why are my tasks vanishing?
Apparently the problem was caused by a deadlock between the broker and the Celery worker, such that the worker would never acknowledge the task as complete, never accept a new task, but never crash or fail either. The tasks weren't vanishing; they were simply sitting in the queue forever.
Update: The deadlock was caused by the fact that we were running celeryd inside a wrapper script that installed dependencies (literally pip install -r requirements.txt && ./celeryd -lINFO). Because of how Cloud Foundry manages process trees, Cloud Foundry would try to kill the parent process (bash), which would send HUP to celeryd, but ultimately lots of child processes would never die.
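One common way to avoid that kind of orphaned process tree (a suggestion, not something from the original fix) is to exec the worker from the wrapper script so it replaces the shell and receives the platform's signals directly:

#!/bin/sh
pip install -r requirements.txt
# exec replaces the shell with celeryd, so there is no intermediate bash
# process for Cloud Foundry to kill; signals go straight to the worker
exec ./celeryd -lINFO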