Following this link, I've been able to create objects that are recreated on every request: http://flask.pocoo.org/docs/appcontext/#locality-of-the-context.
I'm actually creating an API based on http://blog.miguelgrinberg.com/post/designing-a-restful-api-using-flask-restful.
I want to load an object once and have it return a processed response, rather than loading it on every request. The object is not a DB; it just requires unpickling a large file.
I've looked through the documentation, but I'm still confused about Flask's two states (application setup vs. request handling).
The Flask contexts only apply per request. Use a module global to store data you only want to load once.
You could just load the data on startup, as a global:
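# loaded once at import time, before any request is handled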
some_global_name = load_data_from_pickle()
WSGI servers that support multiple processes either fork the main process or start new Python interpreters as needed. When forking, globals are copied into each child process.
You can also use the before_first_request() hook to load that data into your process; it is only called when the process has to handle an actual request, i.e. after the fork, so each child process loads its own copy:
@app.before_first_request
def load_global_data():
    global some_global_name
    some_global_name = load_data_from_pickle()
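A view can then read the module global directly; a minimal sketch, assuming the app object from above (lookup is a hypothetical query over the unpickled data):

from flask import jsonify, request

@app.route("/query")
def query():
    # lookup() is hypothetical; the data was unpickled once per process,
    # not once per request
    return jsonify(result=lookup(some_global_name, request.args.get("q")))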
Related
I'm currently building a library that records backend calls, like those made through the boto3 and requests libraries, and then populates a global "data" object based on details such as the status codes of the responses.
I originally had the data object as a true global, but I realized this was a bad idea: when the application runs in parallel, the data object is modified simultaneously (which could corrupt it), and I want to keep this object separate for each invocation of my application.
So I looked into Flask context locals, similar to how Flask implements its global "request" object. I managed to implement the same approach using LocalProxy, and it now works fine with parallel requests to my application. The issue now is that whenever the application spawns a new sub-thread, an entirely new context is created, so I can't retrieve the data object from the parent thread for that request. Essentially, I need the sub-threads to read and modify the same data object that is local to the main thread for that particular request.
To clarify, this worked when data was a true "global" object: multiple sub-threads could modify the same object. However, as I mentioned, that did not handle simultaneous requests to the application. I managed to fix that, but now the sub-threads can no longer modify the same data object *sad face*.
I looked at solutions like the one below, but the decorator approach there only works for "local" functions. The functions I need to decorate are "global" ones like requests.request, which threads across various application requests will all use, so I think I need a different approach: temporarily copying the current thread's context for use in sub-threads, without overwriting or decorating the function (since it is shared by simultaneous requests). Would appreciate any help or ideas on how to make this work for my use case.
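For illustration, the pattern I'm after is roughly this: unwrap the proxy in the request's own thread and hand the real object to the sub-thread explicitly. A minimal sketch, where data stands for my LocalProxy and the handler/worker names are hypothetical:

import threading
from werkzeug.local import Local, LocalProxy

_local = Local()
data = LocalProxy(lambda: _local.data)  # per-request context local

def handle_request():
    _local.data = {"calls": []}  # fresh data object for this request
    # unwrap the proxy here, in the request's own thread...
    real_data = data._get_current_object()
    # ...and pass the real object to the sub-thread explicitly
    t = threading.Thread(target=worker, args=(real_data,))
    t.start()
    t.join()
    return real_data

def worker(shared_data):
    # the sub-thread mutates the very same object the request thread sees
    shared_data["calls"].append("requests.request(...)")

print(handle_request())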
Thanks.
Flask throwing 'working outside of request context' when starting sub thread
I have a line of code like this:
pool.map(functools.partial(method_to_run, self), data)
method_to_run takes the data item and then uses an object attached to self to make a request to a server, using an instance variable of that object containing an authentication token, set earlier.
The issue I have is that each process seems to get a new instance of that object (or self), and therefore that instance has not had the token set and therefore the request fails.
Is there a way to share self between pooled processes?
TL;DR: the way you describe it, no, it is not possible.
Whenever you fork a process (e.g. by creating a pool of them), all data in memory is copied into the fork, not shared by reference.
As a result, any mutation you make to the original payload after the fork does not affect the fork's replica.
You have three options here:
1. Use a thread pool (threads share memory): https://docs.python.org/3/library/threading.html
2. Employ IPC structures: https://pymotw.com/2/multiprocessing/communication.html
3. Assign the authentication token to self before you create the process pool. This makes sure that self's replica in each child contains the token (see the sketch below).
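A minimal sketch of option 3, with hypothetical stand-ins for the client and data; the point is that the token is assigned before the fork, so every child's copy already carries it:

import functools
from multiprocessing import Pool

class ApiClient:
    # hypothetical stand-in for the object attached to self
    def __init__(self):
        self.token = None

    def fetch(self, item):
        # a real implementation would use self.token to authenticate a request
        return f"{self.token}:{item}"

class Owner:
    def __init__(self):
        self.client = ApiClient()

def method_to_run(self, item):
    return self.client.fetch(item)

if __name__ == "__main__":
    owner = Owner()
    owner.client.token = "secret-token"  # set BEFORE the pool is created
    with Pool(2) as pool:
        print(pool.map(functools.partial(method_to_run, owner), ["a", "b"]))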
While you could do this with a manager, it means the object will be copied to the other processes every time it is modified.
I suggest passing the token itself to the process instead of copying the entire object around. If you don't have the token available at the time you call pool.map, the idiomatic approach is to create a Queue and send the token to your process later. Your process can sit at the other end of the Queue and wait for the token before issuing the request.
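A minimal sketch of the Queue idea, where the token only becomes available after the process has started (names are hypothetical):

from multiprocessing import Process, Queue

def worker(token_queue, item):
    token = token_queue.get()  # blocks until the parent sends the token
    # ...a real worker would issue the authenticated request for item here...
    print(f"requesting {item} with token {token}")

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q, "some-item"))
    p.start()
    q.put("secret-token")  # the token arrives later, after the fork
    p.join()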
First of all, yes, I checked and googled this topic, but couldn't find anything that gives a clear answer to my question. I am a beginner in Django, studying its documentation, where I read about the thread-safety considerations for the render method of template tag nodes. Here is the link to the documentation: Link.
My question is about where it states that once the node is parsed, the render method for that node might be called multiple times. Is it talking about the use of the template tag in the same document at different places, for a single user's request on the server? Or the use of the template tag for multiple requests coming from users all around the world sharing the same Django instance in memory? If it's the latter, doesn't Django create a new instance at the server level for every new user request, with separate resources for every user in memory, or am I wrong about this?
It's the latter.
A WSGI server usually runs a number of persistent processes, and in each process it runs a number of threads. While some automatic scaling can be applied, the number of processes and threads is more or less constant, and determines how many concurrent requests Django can handle. The days where each request would create a new CGI process are long gone, and in most cases persistent processes are much more efficient.
Each process has its own memory, and the communication between processes is usually handled by the database, the cache etc. They can't communicate directly through memory.
Each thread within a process shares the same memory. That means that any object that is not locally scoped (e.g. only defined inside a function), is accessible from the other threads. The cached template loader parses each template once per process, and each thread uses the same parsed nodes. That also means that if you set e.g. self.foo = 'bar' in one thread, each thread will then read 'bar' when accessing self.foo. Since multiple threads run at the same time, this can quickly become a huge mess that's impossible to debug, which is why thread safety is so important.
As the documentation says, as long as you don't store data on self, but put it into context.render_context, you should be fine.
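A minimal sketch of the difference, using a hypothetical counter tag; the render_context pattern is the one the docs recommend:

from django import template

register = template.Library()

class CounterNode(template.Node):
    def render(self, context):
        # Unsafe: storing state on self (e.g. self.count) would be shared by
        # every thread rendering this parsed template, leaking across requests.
        # Safe: render_context is scoped to a single render.
        count = context.render_context.get(self, 0) + 1
        context.render_context[self] = count
        return str(count)

@register.tag
def counter(parser, token):
    return CounterNode()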
I am trying to set up global variables that would be accessible by any of the threads in Django. I know there are endless posts on stackoverflow about this, and everyone says don't do it. I am writing a web application which does some file processing using the Acora Python module. The Acora module builds a tree of sorts based on some input data (strings). The process of building the tree takes some time, so I'd like to build the Acora structure at application start up time, so that when files are submitted to be processed, the Acora structures would be ready to go. This would shave 30 seconds from each file to be processed if I could pull this off.
I've tried a few methods, but for each request the data isn't available, and I think it's because each request is processed in a separate thread, so I need a cross-thread or shared-memory solution, or I have to find something other than Acora. Also, Acora can't be pickled or serialized, as it is a C module and doesn't expose its data to Python. I've tried the Django cache and cPickle, without luck, because they rely on pickling. Thoughts?
Pull the Acora task out of Django entirely. Use Twisted or some other event framework to create a service that Django can talk to either directly or via a message queue such as Celery whenever it has files that need processing.
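A minimal sketch of that approach using Celery; the broker URL and keyword list are placeholders. The expensive Acora build happens once per worker process, at import time, not once per file:

from acora import AcoraBuilder
from celery import Celery

app = Celery("acora_service", broker="redis://localhost:6379/0")

acora_tree = AcoraBuilder("needle1", "needle2").build()  # the ~30s step, paid once per worker

@app.task
def find_matches(text):
    # each task reuses the tree already built in this worker process
    return acora_tree.findall(text)

Django then just calls find_matches.delay(text) and never pays the build cost itself.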
I'm happy to accept that this might not be possible, let alone sensible, but is it possible to keep a persistent reference to an object I have created?
For example, in a few of my views I have code that looks a bit like this (simplified for clarity):
api = Webclient()
api.login(GPLAY_USER,GPLAY_PASS)
url = api.get_stream_urls(track.stream_id)[0]
client = mpd.MPDClient()
client.connect("localhost", 6600)
client.clear()
client.add(url)
client.play()
client.disconnect()
It would be really neat if I could just keep one reference to api and client throughout my project, especially to avoid repeated API logins with gmusicapi. Can I declare them in settings.py (I'm guessing this is a terrible idea), or keep a persistent connection to them by some other means?
Ideally I would then have functions like get_api() which would check the existing object was still ok and return it or create a new one as required.
You can't have anything that's instantiated once per application, because you'll almost certainly have more than one server process, and objects aren't easily shared across processes. However, one instance per process is definitely possible, and worthwhile. To do that, instantiate it at module level in the relevant file (e.g. views.py). It will then be instantiated automatically when Django first imports that file (in that process), and you can refer to it as a global variable in that file. It will persist as long as the process does, and when a new process is created, a new global will be instantiated.
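A minimal sketch of that, reusing the names from the question (in practice the credentials would come from settings):

# views.py (sketch): one instance per process, created lazily on first use
from gmusicapi import Webclient

GPLAY_USER = "user@example.com"  # placeholders; from settings in practice
GPLAY_PASS = "secret"

_api = None

def get_api():
    global _api
    if _api is None:
        _api = Webclient()
        _api.login(GPLAY_USER, GPLAY_PASS)
    # a liveness check / re-login could be added here before returning
    return _api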
You could make them properties of your application object, or of some other object that is declared at the top level of your project, before anything else needs it.
If you put them into a class that gets instantiated on the first import and is then just reused, it can be imported and accessed by several modules.
Either way, they would live for the length of the process's execution.
You can't persist the object reference itself, but you can store data either in Django's local-memory cache or in a memcached-backed Django cache.
Django Cache
https://docs.djangoproject.com/en/dev/topics/cache/
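For example, a minimal sketch (the key name and timeout are arbitrary):

from django.core.cache import cache

url = "http://example.com/stream"          # whatever you computed
cache.set("stream_url", url, timeout=300)  # cache it for five minutes
cached = cache.get("stream_url")           # returns None once it expires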
See also
Creating a Persistent Data Object In Django