I have a Python/Flask project (an API) that contains a few workers that must run continuously. They connect to Redis through an external provider (https://redislabs.com/). I couldn't find how to configure OpenShift to run my workers. When using Heroku, it was as simple as:
web: gunicorn wsgi --log-file -
postsearch: python manage.py worker --queue post-search
statuses: python manage.py worker --queue statuses
message: python manage.py worker --queue message
invoice: python manage.py worker --queue invoice
But for OpenShift, despite a lot of googling, I was not able to find anything to help me. Ideally, I would avoid deploying my application to each gear separately. How can I run multiple workers with OpenShift?
Taken from Getting Started with OpenShift by Katie J. Miller and Steven Pousty:
Cartridge
To get a gear to do anything, you need to add a cartridge. Cartridges are the plugins that house the framework or components that can be used to create and run an application. One or more cartridges run on each gear, and the same cartridge can run on many gears for clustering or scaling. There are two kinds of cartridges:
Standalone
These are the languages or application servers that are set up to serve your web content, such as JBoss, Tomcat, Python, or Node.js. Having one of these cartridges is sufficient to run an application.
Embedded
An embedded cartridge provides functionality to enhance your application, such as a database or cron, but cannot be used on its own to create an application.
TL;DR: you must use cartridges to run a worker process. The documentation can be found here and here, the community-maintained examples are here, and a series of blog posts begins here.
A cartridge is a bundle of files plus a manifest that tells OpenShift how to run the cartridge and how to resolve its dependencies.
But let's build something. Create a Django/Python app.
Now install your (custom) cartridge from the link at the bottom, or from the command-line tool using the link to the cartridge repository.
OpenShift's integration with external services is done by configuring the relevant environment variables as explained at: https://developers.openshift.com/external-services/index.html#setting-environment-variables
Heroku's apps rely on a REDISCLOUD_URL env var that is automatically provisioned - you'll need to set up something similar in your OpenShift deployment with the applicable information about your database from the service's dashboard.
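For example, assuming the connection details are exposed through an environment variable (the variable name REDISCLOUD_URL below simply mirrors the Heroku convention; use whatever name you configure on OpenShift) and that redis-py is in your requirements, a worker could look roughly like this:

import os

import redis  # redis-py, assumed to be listed in requirements.txt

# The variable name is an assumption -- use whatever you set on your gear.
redis_url = os.environ.get("REDISCLOUD_URL", "redis://localhost:6379/0")
conn = redis.StrictRedis.from_url(redis_url)

# Minimal polling loop, standing in for `python manage.py worker --queue post-search`
while True:
    queue, payload = conn.blpop("post-search")  # blocks until a job is available
    print("got job from %s: %s" % (queue, payload))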
Related
Is there a correct way to start an infinite task from Django Framework?
I need to run an MQTT client (based on Paho) and a Python PID implementation.
I want to use Django as the "orchestrator" because I want to start the daemons only if Django is running.
I use Django because of its simplicity for creating a REST API and an ORM layer.
The only way I've found (here on GitHub) is to modify __init__.py to include my external module --> How to use paho mqtt client in django?.
This is not suitable for me because it starts the daemons on every django manage task.
Has anyone already solved this problem?
Thank you in advance.
For my part, I use supervisor to daemonize my Django management commands.
As my Django projects all run in a virtualenv, I created a script that initializes the virtualenv before running the management command:
/home/cocoonr/run_standalone.sh
#!/bin/bash
export WORKON_HOME=/usr/share/virtualenvs
source /usr/share/virtualenvwrapper/virtualenvwrapper.sh
workon cocoonr # name of my virtualenv
django-admin "$@" # forward all arguments to the management command
And here is an example of a supervisor configuration for a command:
/etc/supervisor/conf.d/cocoonr.conf
[program:send_queued_mails_worker]
command=/bin/bash /home/cocoonr/run_standalone.sh send_queued_mails_worker
user=cocoonr
group=cocoonr
stopasgroup=true
environment=LANG=fr_FR.UTF-8,LC_ALL=fr_FR.UTF-8
stderr_logfile=/var/log/cocoonr/send_queued_mails_worker.err
stdout_logfile=/var/log/cocoonr/send_queued_mails_worker.log
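For completeness, the command that supervisor keeps alive is just a regular Django management command running its own loop. A minimal sketch (the file path mirrors the command name above, but the body is made up for illustration):

# yourapp/management/commands/send_queued_mails_worker.py
import time

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Long-running worker that processes queued mails (illustrative sketch)"

    def handle(self, *args, **options):
        while True:
            # Replace with the real unit of work, e.g. draining a queue table.
            self.stdout.write("checking for queued mails...")
            time.sleep(10)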
Before the actual problem, let me explain our architecture. We use git over ssh on our servers and have post-receive hooks enabled to set up the code. The code is all maintained in a separate folder. What we need is that whenever someone pushes code to the server, it runs the tests and migrations and updates the live site. Currently, whenever the application's models change, it crashes.
What we need is a way for the hook script to detect whether the code is proper (by proper I mean no syntax errors, etc.), then run the migrations and update the running application with the new code without downtime. We are using nginx to proxy to the Django application, virtualenv with packages installed from a requirements.txt file, and gunicorn for deployment.
The bottom line is that if there is a failure at any point, the push should be rejected; if all tests are successful, it should run the migrations against the databases and start the new version of the app.
A thought I had was to use two ports for this: one running the main application and another running the newly pushed code. If the pushed code passes the tests, switch the port in the nginx config to the new application and reload nginx. Please discuss the drawbacks of this approach, if any, and show a sample post-receive script demonstrating how to reject a git push in case of failure.
Consider using Fabric. Fabric lets you write Pythonic deployment scripts: you can run the deployment on a remote server, create a fresh database, and check whether the migrations run safely. If everything is good, your Fabric script can go on to deploy to prod; if it fails, it can send an email instead.
This makes your life simpler.
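A rough sketch of what that could look like with the Fabric 1.x API (the host, path and restart command are placeholders, not a definitive recipe):

# fabfile.py -- illustrative only; adjust hosts, paths and service names
from fabric.api import abort, cd, env, run, settings, sudo

env.hosts = ["deploy@example.com"]  # placeholder host

def deploy():
    with cd("/srv/myproject"):  # placeholder project path
        run("git pull origin master")
        with settings(warn_only=True):
            result = run("python manage.py test")
        if result.failed:
            abort("Tests failed, aborting deployment.")
        run("python manage.py migrate")
        sudo("service gunicorn restart")  # or however you reload the app

The email-on-failure part is not built into Fabric; you would wrap deploy() in a try/except, or let your CI server send the notification.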
My web host does not have Python and I am trying to build a machine learning application. I know that Heroku lets you use Python. I was wondering if I could use Heroku as a Python server? As in, I would let Heroku do all of the Python processing for me and use my regular domain for everything else.
Yes, and it may be a pain at first, but once it is set up I would say Heroku is the easiest platform to continually deploy to. However, it is not intuitive - don't try to just 'take a stab' at it; follow a tutorial and try to understand why Heroku works the way it does.
Following the docs is a good bet; Heroku has great documentation for the most part.
Here's the generalized workflow for deploying to Heroku:
1. Locally, create your project and use virtualenv to install/manage libraries.
2. Initialize a git repository in the base dir of your Python project; create a Heroku remote (heroku create).
3. Create a Procfile for Heroku to use when starting gunicorn (or see the options for using waitress, etc.); this is used by Heroku to start your process (a minimal sketch follows this list).
4. cd to your base dir; freeze your virtualenv (pip freeze > requirements.txt) and add/commit requirements.txt. This tells Heroku what packages need to be installed, a requirement for your deployment to work. If you are trying to run a Python project and required packages are missing, the app will be unable to start and Heroku will display an Internal Server Error.
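For reference, a minimal sketch of the two pieces Heroku needs for, say, a Flask app (the module and app names are illustrative, not prescriptive):

# wsgi.py -- the Procfile line for this sketch would be: web: gunicorn wsgi:app
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from Heroku"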
Whenever changes are made, git commit your changes and git push heroku master to push all commits to Heroku. This will cause Heroku to restart the server application with your updated deployment. If there's a failure, you can use heroku rollback to just return to your last deployment.
In reality, it's not a pain in the ass, just particular. Knowing the rules of Heroku, you are able to manage your deployment with command-line git commands with ease.
One caveat - if deploying Django or Flask applications, etc., there are peculiarities to account for; specifically, non-project files (including assets) should NOT be stored on Heroku, as Heroku periodically restarts your 'dyno' (server instance(s)), reloading the whole project from the latest push to Heroku. With Django and Flask, this typically means serving assets/static/media files from an Amazon S3 bucket.
That being said, if you use virtualenv properly, provision your databases, and follow Heroku practices for serving files and committing updates, it is (imho) the absolute best platform out there for ease of use, reliable uptime, and well-oiled rolling deployments.
One last tip - if you are creating a Django app, I'd suggest starting your project out of this boilerplate. I have a custom one I use for new projects and can start and publish a project in minutes.
Yes, you can use Heroku as a Python server. I put a Python Flask server on Heroku, but it was a pain: Heroku seemed to have some difficulties, and there was a lot of conflicting advice on getting around them. I eventually got it working; I can't remember which web page had the ultimate answer, but you might look at this one: http://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-xviii-deployment-on-the-heroku-cloud
Have you run your Python server on Heroku using Twisted?
I don't know if this can help you.
I see the doc 'Getting Started on Heroku with Python' is about Django.
The docs do make it clear that Heroku can run Twisted:
Pure Python applications, such as headless processes and evented web frameworks like Twisted, are fully supported.
django-twisted-server runs Twisted inside Django, but it isn't on Heroku.
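For what it's worth, a minimal sketch of the kind of 'pure Python' Twisted process that statement covers (the PORT handling mirrors how Heroku passes the port; the rest is illustrative):

# server.py -- a Procfile line for this sketch could be: web: python server.py
import os

from twisted.internet import endpoints, reactor
from twisted.web import resource, server


class Hello(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        return b"Hello from Twisted on Heroku"


port = int(os.environ.get("PORT", 8080))  # Heroku injects PORT at runtime
endpoints.TCP4ServerEndpoint(reactor, port).listen(server.Site(Hello()))
reactor.run()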
I have a small infrastructure plan that does not include Django. But, because of my experience with Django, I really like Celery. All I really need is Redis + Celery to make my project. Instead of using the local filesystem, I'd like to keep everything in Redis. My current architecture uses Redis for everything until it is ready to dump the results to AWS S3. Admittedly I don't have a great reason for using Redis instead of the filesystem. I've just invested so much into architecting this with Docker and scalability in mind, it feels wrong not to.
I was searching for a non-Django database scheduler too a while back, but it looked like there's nothing else. So I took the Django scheduler code and modified it to use SQLAlchemy. Should be even easier to make it use Redis instead.
It turns out that you can!
First I created this little project from the tutorial on celeryproject.org.
That went great so I built a Dockerized demo as a proof of concept.
Things I learned from this project
Docker
using --link to create network connections between containers
running commands inside containers
Dockerfile
using FROM to build images iteratively
using official images
using CMD for images that "just work"
Celery
using Celery without Django (see the sketch after this list)
using Celerybeat without Django
using Redis as a queue broker
project layout
task naming requirements
Python
proper project layout for setuptools/setup.py
installation of project via pip
using entry_points to make console_scripts accessible
using setuid and setgid to de-escalate privileges for the celery daemon
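To make the Celery-without-Django point concrete, here is a stripped-down sketch of a standalone Celery app with Redis as the broker (module names and URLs are placeholders, not the demo project's actual code):

# proj/celery_app.py -- illustrative sketch
from celery import Celery

app = Celery(
    "proj",
    broker="redis://localhost:6379/0",   # placeholder Redis URL
    backend="redis://localhost:6379/1",  # optional result backend
)

@app.task
def add(x, y):
    return x + y

# Start a worker:           celery -A proj.celery_app worker --loglevel=info
# Start the beat scheduler: celery -A proj.celery_app beat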
How should the project be deployed and run? There are loads of tools in this space. Which should be used, and why?
Supervisor
Gunicorn
Nginx
Fabric
Boto
Pip
Virtualenv
Load balancers
It depends on your configuration. We are using the following stack for our environment on Rackspace, but you can set up the same thing on AWS with EC2 instances.
Ubuntu 11.04
Varnish (in-memory cache) to avoid disk seeks
Nginx to serve static content
Apache to serve dynamic content (mod_wsgi)
Python 2.7.2 with Django
Jenkins for our continuous builds
GIT for version control
Fabric for the deployment.
So the way it works is that Jenkins polls the origin repository for git pushes. Jenkins then pulls the changes down from the origin, builds a Python egg, runs the unit tests, uses Fabric to deploy the egg to the necessary environments, and reloads the Apache config to make sure the forked Apache processes pick up the new Python egg.
Hope this helps.
As Michael Klockel already stated, it depends on your configuration. I have:
Ubuntu 10.04 LTS
Nginx
uWSGI
git version control
python virtualenv and pip
You can check the deployment settings here:
Django, Virtualenv, nginx + uwsgi import module wsgi error
and why I use nginx and uwsgi here:
http://nichol.as/benchmark-of-python-web-servers
I also use Fabric for deployment of the app, and chef-solo: http://ericholscher.com/blog/2010/nov/8/building-django-app-server-chef/
johnny-cache for SQL queries, and Raven and Sentry to keep a log of what's going on in the app.
I'd use uWSGI + Nginx from a performance perspective (I think the comparison has already been linked in another answer), and pip and virtualenv for deployment, as this keeps things self-contained and facilitates clean deployment using Fabric or similar. Use git for version control. Jenkins can handle continuous integration. I'd use the AWS load balancer (ELB) in front of your EC2 instances for balancing - it does the job without you having to fret too much about it. django-storages for uploading your static files to S3, which saves you the effort of running another server to hand out static files.
However, it depends a little on your admin overheads. If you're looking for something clean and simple for deployment and scaling, I'd scrap the whole AWS EC2 stack, use Heroku as a front end, and s3 for your static files. This saves all the admin time of maintaining the boxes, and allows you to concentrate on the dev.
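As a footnote to the django-storages suggestion above, the S3 wiring is mostly a handful of settings. A minimal sketch, assuming the boto-based backend (setting and backend names can differ between django-storages versions, so treat this as a starting point):

# settings.py excerpt -- illustrative; check the django-storages docs for your version
INSTALLED_APPS += ("storages",)

AWS_ACCESS_KEY_ID = "..."              # placeholder credentials
AWS_SECRET_ACCESS_KEY = "..."
AWS_STORAGE_BUCKET_NAME = "my-bucket"  # placeholder bucket

# Serve uploaded media and collected static files from S3
DEFAULT_FILE_STORAGE = "storages.backends.s3boto.S3BotoStorage"
STATICFILES_STORAGE = "storages.backends.s3boto.S3BotoStorage"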