I am using a Python module (PyCLIPS) and Django 1.3.
I want to develop a thread-safe class that implements the Object Pool and Singleton patterns, and that has to be shared between requests in Django.
For example, I want to do the following:
A request gets the object with some ID from the pool, does
something with it, pushes it back to the pool, and then sends a response
with the object's ID.
Another request, which has that object's ID, gets
the object with the given ID from the pool and repeats the steps above.
But the object's state has to be preserved while it sits in the pool, for as long as the server is running.
It should be like a Singleton Session Bean in Java EE.
How should I do this? Is there something I should read?
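Roughly, what I have in mind is a thread-safe singleton pool along these lines (just a sketch; the class and method names are made up):

import threading
import uuid

class ObjectPool(object):
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super(ObjectPool, cls).__new__(cls)
                cls._instance._objects = {}
        return cls._instance

    def put(self, obj):
        # store the wrapped CLIPS object and hand back an ID for later requests
        object_id = str(uuid.uuid4())
        self._objects[object_id] = obj
        return object_id

    def get(self, object_id):
        return self._objects[object_id]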
Update:
I can't store objects from the pool in a database, because these objects are wrappers around a C library that serves as the API for the CLIPS expert system engine.
Thanks!
Well, I think a different angle is necessary here. Django is not like Java: the solution should be tailored for a multi-process environment, not a multi-threaded one.
Django has no immediate equivalent of a singleton session bean.
That said, I see no reason your description does not fit a classic database model. You want to save per-object data, which should always go in the DB layer.
Otherwise, you can always save things in the session, which Django provides for both logged-in and anonymous users - see the docs on Django sessions.
Usage of any other pattern you might be familiar with from a Java environment will ultimately fail, considering the vast difference between running a Java web container, and the Python/Django multi-process environment.
Edit: well, considering these objects are not native to your app but are accessed via a third-party library, it does complicate things. My gut feeling is that these objects should not be handled by the web layer but rather by some sort of external service which you can access from a multi-process environment. As Daniel mentioned, you can always throw them in the cache (if said objects are pickle-able). But it feels as if these objects do not belong in the web tier.
Assuming the object cannot be pickled, you will need to create an app to manage the object and all of the interactions that need to happen against it. Probably the easiest implementation would be to create a single-process WSGI app (on a different port) that exposes an API to do all of the operations that you need. Whether you use a RESTful API or form posts is up to your personal preference.
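For illustration, here is a minimal sketch of such a single-process service, assuming Flask is available; the routes are invented, and plain dicts stand in for the real CLIPS wrapper objects:

import uuid
from flask import Flask, jsonify, request

app = Flask(__name__)
pool = {}  # object_id -> pooled object; lives only in this one process

@app.route('/objects', methods=['POST'])
def create_object():
    object_id = str(uuid.uuid4())
    pool[object_id] = {'state': {}}  # in reality: the PyCLIPS wrapper object
    return jsonify({'id': object_id})

@app.route('/objects/<object_id>', methods=['POST'])
def use_object(object_id):
    obj = pool[object_id]
    obj['state'].update(request.get_json() or {})  # stand-in for real work
    return jsonify({'id': object_id, 'state': obj['state']})

if __name__ == '__main__':
    # a single process and a single thread, so every request sees the same pool
    app.run(port=8001, threaded=False)

Your Django views would then talk to this service over HTTP instead of touching the objects directly.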
Are these database objects? Because if so, the db itself is really the pool, and there's no need to do anything special - each request can independently load the instance from the db, modify it, and save it back.
Edit after comment Well, the biggest problem is that a production web server environment is likely to be multi-process, so any global variables (i.e. the pool) are not shared between processes. You will need to store them somewhere that's globally accessible. A shot in the dark, but are they serializable using pickle? If so, then perhaps memcache might work.
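For illustration, a hedged sketch of that idea, assuming the objects pickle cleanly and a shared cache backend (e.g. memcached) is configured in CACHES; the function names are made up:

from django.core.cache import cache

def check_in(object_id, obj):
    # Django pickles the value for backends like memcached
    cache.set('pool:%s' % object_id, obj)

def check_out(object_id):
    return cache.get('pool:%s' % object_id)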
Related
In my small web site I feel the need to make some data widely available, to avoid hitting the database on every request. E.g. this could be the list of current users shown at the bottom of every page, or the time of the last ranking update.
The stuff runs in Python (Flask) on top of nginx + uWSGI (this docker image).
I wonder, do I have some small cache or shared memory for keeping such information "out of the box", or do I need to explicitly set up some dedicated cache? Or perhaps something like this is provided by nginx?
Alternatively, I could still use the database, since it has its own cache, I think.
Sorry if the question seems naive/silly - I come from the Java world (where things are a bit different, as we serve all requests with one fat instance of the Java application) - and I have some difficulty grasping what capabilities WSGI/uWSGI provide. Thanks in advance!
Firstly, nginx has a cache:
https://www.nginx.com/blog/nginx-caching-guide/
But for Flask caching you also have options:
https://pythonhosted.org/Flask-Cache/
http://flask.pocoo.org/docs/1.0/patterns/caching/
Did you have a look at the caching section of the Flask docs?
It literally says:
Flask itself does not provide caching for you, but Werkzeug, one of the libraries it is based on, has some very basic cache support
You create a cache object once and keep it around, similar to how Flask objects are created. If you are using the development server you can create a SimpleCache object, that one is a simple cache that keeps the item stored in the memory of the Python interpreter:
from werkzeug.contrib.cache import SimpleCache
cache = SimpleCache()
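Building on the cache object created just above, a short hedged usage sketch (the key name and the stand-in loader function are made up for illustration):

def load_users_from_db():
    return ['alice', 'bob']  # stand-in for the real DB query

def get_current_users():
    users = cache.get('current-users')
    if users is None:
        users = load_users_from_db()
        cache.set('current-users', users, timeout=5 * 60)  # keep for five minutes
    return users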
-- UPDATE --
Or you could solve it on the frontend side by storing data in the web browser's local storage.
If there's nothing in local storage, you call the DB; otherwise you use the information from local storage rather than making a DB call.
Hope it helps.
I'm using Azure function apps with Python. I have two dozen function apps that all use a Postgres DB and Custom Vision. All function apps are setup as HttpTriggers. Right now, when a function is triggered, a new database handler (or custom vision handler) object is created, used and terminated when the function app call is done.
It seems very counterproductive to instantiate new objects on every single request that comes in. Is there a way to instantiate shared objects once and then pass them to a function when it is called?
In general, Azure Functions are intended to be stateless and not share objects from one invocation to the next. However, there are some exceptions.
Sharing Connection Objects
The Azure docs on the Improper Instantiation antipattern cover how to share connection objects that are meant to be opened once and used again and again.
There are some things to keep in mind for this to work for you, mainly:
The key element of this antipattern is repeatedly creating and destroying instances of a shareable object. If a class is not shareable (not thread-safe), then this antipattern does not apply.
They have some walkthroughs there that will probably help you. Since your question is fairly generic, the best I can do is recommend you read through it and see if that will help you.
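For example, a hedged sketch of the module-level reuse idea for a Python HTTP trigger; psycopg2 and the POSTGRES_CONNECTION setting name are assumptions, not taken from the question:

import os
import azure.functions as func
import psycopg2

# Created once per worker process; warm invocations reuse it instead of
# opening a new connection on every request.
_conn = psycopg2.connect(os.environ['POSTGRES_CONNECTION'])

def main(req: func.HttpRequest) -> func.HttpResponse:
    with _conn.cursor() as cur:
        cur.execute('SELECT 1')
        cur.fetchone()
    return func.HttpResponse('ok')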
Durable Functions
The alternative is to consider the Durable Functions instead of the standard. They are intended to be able to pass objects between functions making them not quite stateless.
Durable Functions is an advanced extension for Azure Functions that isn't appropriate for all applications. This article assumes that you have a strong familiarity with concepts in Azure Functions and the challenges involved in serverless application development.
Background:
This is the situation I am facing and so far my current solution seems rather clunky. I want to improve on it. Right now:
I set up connections to each database in the main function of the Pyramid application:
def main(global_config, **settings):
    a_engine = engine_from_config(settings, 'A.')
    b_engine = engine_from_config(settings, 'B.')
    ASession.configure(bind=a_engine)
    BSession.configure(bind=b_engine)
"ASession" and "BSession" are simply globally defined scoped_session in /models/init.py.
ASession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
I define the model base class like so. For example:
ABase = declarative_base()

class user(ABase):
    __tablename__ = 'user'  # table name required by the declarative mapping
    id = Column(Integer, primary_key=True)
    name = Column(String)
This somehow already doesn't feel very clean. But now that this model is supposed to be accessed from a different application, I also need to define the engine and connection again in that application. This feels extremely redundant.
Problem Abstracted:
Assume that there are 2 different databases:
A and B
Also assume that you want A and B to be accessible from 2 different applications (e.g.: Pyramid application, Bokeh Server App which uses Tornado) using the same model.
In short, how would one best pattern the objects/models/classes/functions to produce clean, non-redundant code in Python 3?
Initial Thought After The Question Was Posted:
Thinking about this a bit more, I think I want each model to be somehow "self-contained". The model should bring with it methods for initiating connections. In other words, the initiation of db connections should be decoupled from the web application itself.
And it should be done in an instance kind of manner. So that multiple applications can use the same models. Each application would have its own session connection to either DB.
How would the community pattern this? Friday afternoons don't lend themselves to finding answers to these kinds of questions, for me at least.
I have done this. My recommendation below is how I like doing it, but is not the only way. I would ditch scoped sessions and the transaction manager and make explicit session management objects, with request lifecycle callbacks handling creation, closing, committing, or rolling back your sessions. Basically scoped sessions are a way to simulate a global by getting the same item for that thread of execution. The other way to do this in Pyramid is to attach things to the registry and the request, because you have those everywhere. You attach shared components to the registry (the ZCA) and per-request objects to the request.
When you have multiple sessions, I've found it much easier to reason about them and keep track of them if they are handled by components that wrap up everything for that engine. So for a case like that you describe, I've made two different DB engine components, that are created on start up, attached to the registry, and have a method for getting a fresh session. If you create these components properly, they should be usable in any application, whether it's Pyramid, Tornado, or your test script. You just make sure it has a constructor with some sane way of passing in settings for setting up the engine, whether it's a settings dict or kwargs. I then make my data model(s) live in their own python packages and it's easy to have any app in the family import the model, instantiate the engine components and go to town. Note that if you like using the ZCA registry (and I love it, it's a fantastic DI system), there's nothing preventing you from using it in non-pyramid apps, you just set it up manually in your server start up code.
In Pyramid specifically, I make a custom Request class and use the reify decorator to allow other pyramid code to get the session(s) for that request. The request class has end-of-life-callbacks attached to close out the sessions, and to do rollbacks or commits. There is a bit more boilerplate, but for me it's cleaner in that I can very easily trace where and when in code and time my session management is happening. It's also a good way for testing.
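A condensed, hedged sketch of that layout; the component and attribute names (EngineComponent, db_a, the 'A.url' setting) are invented for illustration:

from pyramid.config import Configurator
from pyramid.decorator import reify
from pyramid.request import Request
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

class EngineComponent(object):
    # wraps one engine; usable from Pyramid, Tornado, or a plain test script
    def __init__(self, url, **engine_kwargs):
        self.engine = create_engine(url, **engine_kwargs)
        self.session_factory = sessionmaker(bind=self.engine)

    def new_session(self):
        return self.session_factory()

class MyRequest(Request):
    @reify
    def a_session(self):
        session = self.registry.db_a.new_session()
        # close the session when the request is finished
        self.add_finished_callback(lambda request: session.close())
        return session

def main(global_config, **settings):
    config = Configurator(settings=settings)
    config.set_request_factory(MyRequest)
    # shared components live on the registry, per-request objects on the request
    config.registry.db_a = EngineComponent(settings['A.url'])
    config.registry.db_b = EngineComponent(settings['B.url'])
    return config.make_wsgi_app()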
That said, there are lots of smart folks in SQLAlchemy/Pyramid land who swear by scoped sessions and the transaction manager, so there are other valid approaches. Hope that helps.
I'm just wondering if Django was designed to be a fully stateless framework?
It seems to encourage statelessness and external storage mechanisms (databases and caches), but I'm wondering if it is possible to store some things in the server's memory while my app is in development and runs via manage.py runserver.
Sure it's possible. But if you are writing a web application you probably won't want to do that because of threading issues.
That depends on what you mean by "store things in the server's memory." It also depends on the type of data. If you can, you're better off storing "global data" in a database or in the file system somewhere. Unless it is needed every request it doesn't really make sense to store it in the Django instance itself. You'll need to implement some form of locking to prevent race conditions, but you'd need to worry about race conditions if you stored everything on the server object anyway.
Of course, if you're talking about user-by-user data, Django does support sessions. Or, and this is another perfectly good option if you're willing to make the user save the data, cookies.
The best way to maintain state in a Django app on a per-user basis is request.session (see Django sessions), which is a dictionary you can use to remember things about the current user.
For application-wide state you should use a persistent datastore (database or key/value store).
Example view using sessions:
def my_view(request):
    pages_viewed = request.session.get('pages_viewed', 0) + 1
    request.session['pages_viewed'] = pages_viewed
    ...
If you want to maintain local variables on a per-app-instance basis, you can just define module-level variables like so:
# number of times my_view has been served by this server
# instance since the last restart
served_since_restart = 0

def my_view(request):
    global served_since_restart  # needed to rebind the module-level counter
    served_since_restart += 1
    ...
If you want to maintain some server state across ALL app servers (like the total number of pages viewed EVER), you should probably use a persistent key/value store like Redis, MemcacheDB, or Riak. There is a decent comparison of all these options here: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
You can do it with Redis (via redis-py) like so, assuming your Redis server is at 127.0.0.1 (localhost) and uses port 6379 (the default):
import redis

def my_view(request):
    r = redis.Redis(host='127.0.0.1', port=6379)
    # GET returns None (or bytes) - fall back to 0 and convert to int
    served = int(r.get('pages_served_all_time') or 0)
    served += 1
    r.set('pages_served_all_time', served)
    ...
There is the LocMemCache cache backend that stores data in-process. You can use it with sessions (but with great care: this cache is not cross-process, so you will have to deploy with a single process, because otherwise there is no guarantee that subsequent requests will be handled by the same process). Global variables may also work (use thread-locals if they shouldn't be shared across all of a process's threads; the warning about cross-process communication also applies here).
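A hedged sketch of the settings involved (the LOCATION string is arbitrary):

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'unique-snowflake',
    }
}
# cache-backed sessions; only safe here with a single-process deployment
SESSION_ENGINE = 'django.contrib.sessions.backends.cache'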
By the way, what's wrong with external storage? External storage provides easy cross-process data sharing and other features (like memory-limiting algorithms for caches, or persistence with databases).
To share state (e.g. the current user) between modules in Django, people sometimes use thread-local storage. But since Google App Engine follows the CGI standard and keeps the state of a request in os.environ, can I share objects between two modules just by setting, e.g.,
mod1.my_data = {}, so that any other module can get a handle to my_data?
Without worrying about other threads/requests sharing/overwriting it?
Later requests that happen to be served on the same process (you can't control that) would access just the same mod1.my_data object (unless you take pains to reassign it as a fresh object at the start of each request, of course).
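A hedged, self-contained sketch of what that means in practice (the handler and names are made up; my_data stands in for mod1.my_data):

# module-level state persists for the lifetime of the worker process
my_data = {}

def handle_request(user_id):
    global my_data
    # without this reassignment, leftovers from the previous request served
    # by this process would still be visible here
    my_data = {'user': user_id}
    return my_data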