I have a functional Django app that has many Google Text-To-Speech API calls and database reads/writes in my view. When testing locally it takes about 3 seconds to load a page, but when I deploy the app live to Heroku it takes about 15 seconds to load the webpage. So I am trying to reduce load time.
I came across this article: https://devcenter.heroku.com/articles/python-rq that suggests I should use background tasks by queueing jobs to workers using an RQ (Redis Queue) library. I followed their steps and included their worker.py file in the same directory as my manage.py file (not sure if that's the right place to put it). I wanted to test it out locally with a dummy function and view to see if it would run without errors.
# views.py
from rq import Queue
from worker import conn
def dummy(foo):
return 2
def my_view(request):
q = Queue(connection=conn)
for i in range(10):
dummy_foo = q.enqueue(dummy, "howdy")
return render(request, 'dummy.html', {})
In separate terminals I run:
$ python worker.py
$ python manage.py runserver
But when loading the webpage I received many "Apps aren't loaded yet." error messages in the python worker.py terminal. I haven't tried to deploy to Heroku yet, but I'm wondering why I am I getting this error message locally?
Better late than never.
Django-rq requires Django2.0, unfortunately for our project there is no plan to upgrade to the latest version.
So if you are in the same situation, you can still use plain RQ, you just need to add the two following lines in worker.py (worker_django_1_11) :
import django
django.setup()
and pass the worker class like :
> DJANGO_SETTINGS_MODULE=YOURPROJECT.settings rq worker --worker-class='worker_django_1_11.Worker'
You didn't post the code of worker.py, but I'd wager it does not properly initialize Django. Take a look at the contents of manage.py to see an example. So, if worker.py tries to instantiate (or even import) any models, views, etc, you'll get that kind of error. Django needs to resolve settings.py (among other things), then use that to look up database settings, resolve models/relationships, etc.
Simplest path is to use django-rq, a simple library that integrates RQ and Django to handle all this. Your worker.py essentially just becomes python manage.py rqworker.
Related
I'm working on Flask/Python back-end for web application, which will be deployed with Docker. It's receiving some data from the user, then send it to the RabbitMQ queue. On the other side there're workers in docker containers. The consume task, process and send result back to other RabbitMQ message queue. It works pretty well when I'm running Flask app locally without placing it into docker. What I'm doing - on application start-up I'm running receive thread, which is linked to callback function that consume result from message queue and update note in database. Deploying it with docker and gunicorn as web server, thread is not started with no reason.
How can I run background threads, that would work alongside with Flask web application through the whole web application cycle?
At this moment I'm thinking about next options:
Try to start thread on application start-up
Run background task with Celery (cons: after reading a lot of stuff I think Celery is not applicable here because it mostly using for task that could be finished in the upcoming future. I could be wrong. If yes, please correct me).
Flask module called app_scheduler. I've used it just couple times and I think Celery is better in this case.
My web app structure is pretty simple:
app.py
models.py
app
|-engine - # classes Send and Receive, function run_receive()
|-tasks
|-__init__.py
|-routes.py
|-tools.py
|-email.py
|-templates
|-__init__.py # create_app() function here
My create_app() function from the app/__init__.py file.
def create_app(config_class=Config):
app_ = Flask(__name__)
app_.config.from_object(config_class)
# extensions
db.init_app(app_)
migrate.init_app(app_, db)
celery.init_app(app_)
mail.init_app(app_)
# blueprints
from app.engine import bp as bp_engine, run_receive
from app.tasks import bp as bp_tasks
app_.register_blueprint(bp_engine)
app_.register_blueprint(bp_tasks)
...
So what I want to do is on start-up run thread which will consume results from RabbitMQ. Function for this action is called run_receive() and is placed in the app/engine/__init__.py.
Also in the future I'd like to run another thread to check whether notes in the database are out-of-date. So it would be another thread, that would be invoked every day to check database.
So what are the best practices in this situation?
Thank you for your response.
POOL = redis.ConnectionPool(host='localhost', port=6379, db=0)
app = Flask(__name__)
#app.route('/get_cohort_curve/', methods=['GET'])```
def get_cohort_curve():
curve = str(request.args.get('curve'))
cohort = str(request.args.get('cohort'))
key = curve+cohort
return get_from_redis(key)
def get_from_redis(key):
try:
my_server = redis.Redis(connection_pool=POOL)
return json.dumps(my_server.get(key))
except Exception, e:
logging.error(e)
app.run()
I need to write unit-tests for this.
How do I test just the route, i.e. a get request goes to the right place?
Do I need to create and destroy instances of the app in the test for each function?
Do I need to create a mock redis connection?
If you are interested in running something in Flask, you could create a virtual environment and test the whole shebang, but in my opinion that is THE HARDEST way to do it.
When I built my site installing Redis locally, setting the port and then depositing some data inside it with an appropriate key was essential. I did all of my development in iPython (jupyter) notebooks so that I could test the functions and interactions with Redis before adding the confounding layer of Flask.
Then you set up a flawless template, solid HTML around it and CSS to style the site. If it works without data as an html page, then you move on to the python part of Flask.
Solidify the Flask directories. Make sure that you house them in a virtual environment so that you can call them from your browser and it will treat your virtual environment as a server.
You create your app.py application. I created each one of the page functions one at a time. tested to see that it was properly pushing the variables to post on the page and calling the right template. After you get on up and running right, with photos and data, then at the next page's template, using #app.route
Take if very slow, one piece at a time with debugging on so that you can see where when and how you are going wrong. You will only get the site to run with redis server on and your virtual environment running.
Then you will have to shut down the VE to edit and reboot to test. At first it is awful, but over time it becomes rote.
EDIT :
If you really want to test just the route, then make an app.py with just the #app.route definition and return just the page (the template you are calling). You can break testing into pieces as small as you like, but you have to be sure that the quanta you pick are executable as either a python script in a notebook or commandline execution or as a compact functional self-contained website....unless you use the package I mentioned in the comment: Flask Unit Testing Applications
And you need to create REAL Redis connections or you will error out.
I have Django app just for CRUD of some daily data.
Model only have price and date.
I should write some code that will automatically(daily) insert new data to my model.
I am planning to use BeautifulSoup for web page parsing.
So I have few questions:
I am planing to use crontab (manual edit with crontab -e) for setting task to run once daily. Is there smarter solution ?
Should I use Django ORM or just write SQL in separate script ?
I am looking for advices what is better in the long run. I will have more task like this one.
Thanks
If you are already building supporting code in Django for your models and will be running the code on the same server your app is installed on, then you should probably use Django ORM.
See this page for help getting started writing command-line admin utilities that get run in the context of your Django app:
https://docs.djangoproject.com/en/dev/howto/custom-management-commands/
This answer is more a general architecture answer...
To start, everything can be done in django.
I would set up celery and periodic tasks: http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html
For the actual crawl, you will probably need to fan out on link discovery... you can use celery for that too using just the #task decorator.
Start the project using the django:/// broker. Once you get to size, move on to RabbitMQ.
I'm having a hard time getting a backend to run on the GAE servers. The following works locally, but not when deployed:
counter.py:
from google.appengine.api import logservice
logservice.AUTOFLUSH_ENABLED = False
logging.error("Backend started!")
logservice.flush()
No log message is seen when deployed. I've even tried putting syntax error's in, they are not reported either, so it doesn't seem like the backend is actually running my code. I've tried doing the same in infinite loops with sleeps and such too, same result.
Here is the backends.yaml:
backends:
- name: counter
start: counter.py
instances: 1
class: B1
The backend is listed as running in the management console, but doesn't seem to be actually doing anything.
Anyone able to get a backend running on the GAE servers? Thanks!
There are three ways to call a backend service: scheduled Backend, tasked Backend and browsed Backend. Try http://counter.appname.appspot.com/path.
Sources:
http://www.pdjamez.com/2011/05/google-app-engine-backend-patterns/
http://www.pdjamez.com/2011/05/google-app-engine-backends/comment-page-1/
I have a normal Django site running. In addition, there is another twisted process, which listens for Jabber presence notifications and updates the Django DB using Django's ORM.
So far it works as I just call the corresponding Django models (after having set up the settings environment correctly). This, however, blocks the Twisted app, which is not what I want.
As I'm new to twisted I don't know, what the best way would be to access the Django DB (via its ORM) in a non-blocking way using deferreds.
deferredGenerator ?
twisted.enterprise.adbapi ? (circumvent the ORM?)
???
If the presence message is parsed I want to save in the Django DB that the user with jid_str is online/offline (using the Django model UserProfile). I do it with that function:
def django_useravailable(jid_str, user_available):
try:
userhost = jid.JID(jid_str).userhost()
user = UserProfile.objects.get(im_jabber_name=userhost)
user.im_jabber_online = user_available
user.save()
return jid_str, user_available
except Exception, e:
print e
raise jid_str, user_available,e
Currently, I invoke it with:
d = threads.deferToThread(django_useravailable, from_attr, user_available)
d.addCallback(self.success)
d.addErrback(self.failure)
"I have a normal Django site running."
Presumably under Apache using mod_wsgi or similar.
If you're using mod_wsgi embedded in Apache, note that Apache is multi-threaded and your Python threads are mashed into Apache's threading. Analysis of what's blocking could get icky.
If you're using mod_wsgi in daemon mode (which you should be) then your Django is a separate process.
Why not continue this design pattern and make your "jabber listener" a separate process.
If you'd like this process to be run any any of a number of servers, then have it be started from init.rc or cron.
Because it's a separate process it will not compete for attention. Your Django process runs quickly and your Jabber listener runs independently.
I have been successful using the method you described as your current method. You'll find by reading the docs that the twisted DB api uses threads under the hood because most SQL libraries have a blocking API.
I have a twisted server that saves data from power monitors in the field, and it does it by starting up a subthread every now and again and calling my Django save code. You can read more about my live data collection pipeline (that's a blog link).
Are you saying that you are starting up a sub thread and that is still blocking?
I have a running Twisted app where I use Django ORM. I'm not deferring it. I know it's wrong, but hadd no problems yet.