Using imported variable in Celery task - python

I have a Flask app which uses SocketIO to communicate with users currently online. I keep track of them by mapping the user ID with a session ID, which I can then use to communicate with them:
online_users = {'uid...':'sessionid...'}
I delcare this in my run.py file where the app is launched, and then I import it when I need it as such:
from app import online_users
I'm using Celery with RabbitMQ for task deployment, and I need to use this dict from within the tasks. So I import it as above, but when I use it it is empty even when I know it is populated. I realize after reading this that it is because each task is asynchronous and starts a new process with an empty dict, and so my best bet is to use some sort of database or cache.
I'd rather not run an additional service, and I only need to read from the dict (I won't be writing to it from the tasks). Is a cache/database my only option here?

That depends on what you have in the dict.... If it's something you can serialize to string you can serialize it to Json and pass it as an argument to that task. If it's an object you cannot serialize, then yes you need to use a cache/database.

I came across this discussion which seems to be a solution for exactly what I'm trying to do.
Communication through a message queue is now implemented in package python-socketio, through the use of Kombu, which provides a common API to work with several message queues including Redis and RabbitMQ.
Supposedly an official release will be out soon, but as of now it can be done using an additional package.

Related

flask API calls scheduling with cron jobs

I have a function which calls several API's and updates the database upon being called. I want to schedule the function to run daily at specific time.
Already tried flask_apscheduler and APScheduler which gives this error:
This typically means that you attempted to use functionality that needed an active HTTP request. Consult the documentation on testing for information about how to avoid this problem.
Any leads on this will be helpful.
You should:
Post the code where you define your flask application.
Specify how you try to access the app.
How you're calling the APIs.
Whether those APIs are 3rd party or part of your blueprint.
However, this is probably a context issue. I have come across a similar one with SQLAlchemy before.
You will need to somehow get access to your app, either by using app_context or by importing current_app from Flask and accessing the config.
Assuming you imported the app where your function is used, try this:
with app.app_context():
# call your function here
Refer to this document for more information: Flask Documentation
Another approach you can try, is passing your app configurations through a config class object.
You can define the jobs you want to schedule and pass a reference to your function inside.
Check this example from flask-apscheduler repository on GitHub.

Centralized object store for multiple Python processes

I'm looking for a pythonic and simple way to synchronously share a common data source across multiple Python processes.
I've been thinking about using Pyro4 or Flask to write a kind of a CRUD service that I can get and put objects from and into. But Flask appears to be a lot of coding for a simple task and Pyro4 seems to require some name service.
Do you know of any (preferably easy to use, matured, high-level) library or package that provides centralized storage and high performance access to objects shared across multiple Python processes?
Take a look at Redis
Redis is an in-memory key-value database.
And you can download redis-py to use redis with python

Google AppEngine and Threaded Workers

I am currently trying to develop something using Google AppEngine, I am using Python as my runtime and require some advise on setting up the following.
I am running a webserver that provides JSON data to clients, The data comes from an external service in which I have to pull the data from.
What I need to be able to do is run a background system that will check the memcache to see if there are any required ID's, if there is an ID I need to fetch some data for that ID from the external source and place the data in the memecache.
If there are multiple id's, > 30 I need to be able to pull all 30 request as quickly and efficiently as possible.
I am new to Python Development and AppEngine so any advise you guys could give would be great.
Thanks.
You can use "backends" or "task queues" to run processes in the background. Tasks have a 10-minute run time limit, and backends have no run time limit. There's also a cronjob mechanism which can trigger requests at regular intervals.
You can fetch the data from external servers with the "URLFetch" service.
Note that using memcache as the communication mechanism between front-end and back-end is unreliable -- the contents of memcache may be partially or fully erased at any time (and it does happen from time to time).
Also note that you can't query memcache of you don't know the exact keys ahead of time. It's probably better to use the task queue to queue up requests instead of using memcache, or using the datastore as a storage mechanism.

Web application: Hold large object between requests

I'm working on a web application related to genome searching. This application makes use of this suffix tree library through Cython bindings. Objects of this type are large (hundreds of MB up to ~10GB) and take as long to load from disk as it takes to process them in response to a page request. I'm looking for a way to load several of these objects once on server boot and then use them for all page requests.
I have tried using a remote manager / client setup using the multiprocessing module, modeled after this demo, but it fails when the client connects with an error message that says the object is not picklable.
I would suggest writing a small Flask (or even raw WSGI… But it's probably simpler to use Flask, as it will be easier to get up and running quickly) application which loads the genome database then exposes a simple API. Something like this:
app = Flask(__name__)
database = load_database()
#app.route('/get_genomes')
def get_genomes():
return database.all_genomes()
app.run(debug=True)
Or, you know, something a bit more sensible.
Also, if you need to be handling more than one request at a time (I believe that app.run will only handle one at a time), start by threading… And if that's too slow, you can os.fork() after the database is loaded and run multiple request handlers from there (that way they will all share the same database in memory).

Shared object between requests in Django

I am using a Python module (PyCLIPS) and Django 1.3.
I want develop a thread-safety class which realizes the Object Pool and the Singleton patterns and also that have to be shared between requests in Django.
For example, I want to do the following:
A request gets the object with some ID from the pool, do
something with it and push it back to the pool, then send response
with the object's ID.
Another request, that has the object's ID, gets
the object with the given ID from the pool and repeats the steps from the above request.
But the state of the object will has to be kept while it'll be at the pool while the server is running.
It should be like a Singleton Session Bean in Java EE
How I should do it? Is there something I'll should read?
Update:
I can't store objects from the pool in a database, because these objects are wrappers under a library written on C-language which is API for the Expert System Engine CLIPS.
Thanks!
Well, I think a different angle is necessary here. Django is not like Java, the solution should be tailored for a multi-process environment, not a multi-threaded one.
Django has no immediate equivalent of a singleton session bean.
That said, I see no reason your description does not fit a classic database model. You want to save per object data, which should always go in the DB layer.
Otherwise, you can always save stuff on the session, which Django provides for both logged-in users as well as for anonymous ones - see the docs on Django sessions.
Usage of any other pattern you might be familiar with from a Java environment will ultimately fail, considering the vast difference between running a Java web container, and the Python/Django multi-process environment.
Edit: well, considering these objects are not native to your app rather accessed via a third-party library, it does complicate things. My gut feeling is that these objects should not be handled by the web layer but rather by some sort of external service which you can access from a multi-process environment. As Daniel mentioned, you can always throw them in the cache (if said objects are pickle-able). But it feels as if these objects do not belong in the web tier.
Assuming the object cannot be pickled, you will need to create an app to manage the object and all of the interactions that need to happen against it. Probably the easiest implementation would be to create a single process wsgi app (on a different port) that exposes an api to do all of the operations that you need. Whether you use a RESTful api or form posts is up to your personal preference.
Are these database objects? Because if so, the db itself is really the pool, and there's no need to do anything special - each request can independently load the instance from the db, modify it, and save it back.
Edit after comment Well, the biggest problem is that a production web server environment is likely to be multi-process, so any global variables (ie the pool) are not shared between processes. You will need to store them somewhere that's globally accessible. A short in the dark, but are they serializable using Pickle? If so, then perhaps memcache might work.

Categories