I am trying to use memcached with Google App Engine. I import the library using
from google.appengine.api import memcache
and then call it using
posts = memcache.gets("posts")
Then I get the following error:
AttributeError: 'module' object has no attribute 'gets'
I have looked through the Google App Engine documentation on memcache, but I can't find any examples using memcache.gets(); memcache.get() seems to be used the way I call gets above.
gets is a method of the memcache Client object, not a module-level function of the memcache module. The module-level functions are simple, stateless, and synchronous; the Client object lets you do more advanced things when you need them, as documented at https://cloud.google.com/appengine/docs/python/memcache/clientclass .
Specifically, per the docs at https://cloud.google.com/appengine/docs/python/memcache/clientclass#Client_gets , you use gets "rather than get if you want to avoid conditions in which two or more callers are trying to modify the same key value at the same time, leading to undesired overwrites", since gets also retrieves (and stashes in the client object) the cas_id, which lets you use the cas (compare-and-set) call without having to handle the cas_id yourself.
Since it doesn't seem you're attempting a compare-and-set operation, I would recommend using the simpler module-level function get, rather than instantiating a client object and using its instance method gets.
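For instance, a minimal module-level sketch (load_posts_from_datastore is just a placeholder for however you actually build the value):

from google.appengine.api import memcache

posts = memcache.get("posts")              # None if the key is not cached
if posts is None:
    posts = load_posts_from_datastore()    # placeholder for your own loader
    memcache.set("posts", posts, time=60)  # cache it for 60 seconds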
If you actually do need to compare and set, a very good explanation can be found here:
The Client object is required because the gets() operation actually
squirrels away some hidden information that is used by the subsequent
cas() operation. Because the memcache functions are stateless (meaning
they don't alter any global values), these operations are only
available as methods on the Client object, not as functions in the
memcache module. (Apart from these two, the methods on the Client
object are exactly the same as the functions in the module, as you can
tell by comparing the documentation.)
The solution would be to use the class:
client = memcache.Client()
posts = client.gets("posts")      # also records the cas_id for "posts"
...
client.cas("posts", "new_value")  # succeeds only if nothing has changed the key since gets
Although, of course, you would need more than that for cas to be useful.
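For completeness, the usual pattern is a retry loop around gets/cas, roughly like this (transform stands in for whatever change you want to make to the cached value):

client = memcache.Client()
while True:
    posts = client.gets("posts")   # also records the cas_id internally
    if posts is None:
        break                      # nothing cached yet; handle with add() instead
    updated = transform(posts)     # placeholder for your own update
    if client.cas("posts", updated):
        break                      # no concurrent writer got in between; done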
I am currently writing a library that records backend calls, such as those made through the boto3 and requests libraries, and then populates a global "data" object based on information like the status codes of the responses.
I originally had the data object as a true global, but then realized this was a bad idea: when the application handles requests in parallel, the data object gets modified simultaneously (which could corrupt it), whereas I want a separate object for each invocation of my application.
So I looked into Flask context locals, similar to how Flask handles its global "request" object. I managed to implement this with a LocalProxy, the way Flask does it, and it now works fine with parallel requests to my application. The issue now is that whenever the application spawns a new sub-thread, that thread gets an entirely new context, so I can't retrieve the data object from the parent thread for that request. Essentially, I need the sub-threads to read and modify the same data object that is local to the main thread handling that particular request.
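Roughly, the proxy setup looks something like this (simplified, with placeholder names):

from werkzeug.local import Local, LocalProxy

_local = Local()              # one namespace per request context

def _get_data():
    # lazily create the per-request data object on first access
    if not hasattr(_local, "data"):
        _local.data = {}
    return _local.data

data = LocalProxy(_get_data)  # behaves like a global, but resolves per request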
To clarify, I was able to do this when data was a true "global" object: multiple sub-threads could modify the same object. However, as mentioned, that did not handle simultaneous requests to the application; I managed to fix that, but now the sub-threads can no longer modify the same data object *sad face*
I looked at solutions like the one below, but the decorator approach only works for "local" functions. The functions I need to wrap are "global" ones like requests.request, used by threads across various application requests, so I think I need another approach: temporarily copying the parent thread's context for use in its sub-threads, without overwriting or decorating the function itself (since it is shared by simultaneous requests to the application). I would appreciate any help or ideas on how to make this work for my use case.
Thanks.
Flask throwing 'working outside of request context' when starting sub thread
Complete code here: https://gist.github.com/mnjul/82151862f7c9585dcea616a7e2e82033
Environment is Python 2.7.6 on an up-to-date Ubuntu 14.04 x64.
Prologue:
Well, I got this strange piece of code at my work project, and it's a classic "somebody wrote it and quit the job, and it works but I don't know why" piece, so I decided to write a stripped-down version of it, hoping to get my questions clarified/answered. Please kindly check the referred gist.
Situation:
So, I have a custom class Storage inheriting from Python's thread-local storage, intended to book-keep some thread-local data. There is only one instance of that class, instantiated in the global scope before any threads have been constructed. So I would expect that, since there is only one Storage instance and its __init__() runs only once, those Runner threads would not actually get thread-local storage and data accesses would clash.
However, this turned out to be wrong: the code output (see my comment on that gist) indicates that each thread does have its own local storage. Strangely, at each thread's first access to the storage object (i.e. a set()), Storage.__init__() is mysteriously run again, properly creating the thread-local storage and producing the desired effect.
Questions: Why on earth did Storage.__init__ get invoked when the threads attempted to call a member function of a seemingly already-instantiated object? Is this a CPython (or pthread, if that matters) implementation detail? I feel like there is a lot happening between my stack trace's "py_thr_local.py", line 36, in run => storage.set('keykey', value) and "py_thr_local.py", line 14, in __init__, but I can't find any relevant information in (C)Python's source code or on Stack Overflow.
Any feedback is welcome. Let me know if I need to clarify things or provide more information.
The first piece of information to consider is: what is a thread-local? Thread-locals are independently initialized instances of a particular type, each tied to a particular thread. With that in mind, I would expect some initialization code to be called multiple times. In some languages, such as Java, the initialization is more explicit, but it does not necessarily need to be.
Let's look at the source for the supertype of the storage container you're using: https://github.com/python/cpython/blob/2.7/Lib/_threading_local.py
Line 186 contains the local type that is being used. Looking at that class, you can see that __setattr__ and __getattribute__ are among the overridden methods. Remember that in Python these methods are called every time you assign or access an attribute on an instance. The implementations of these methods acquire a local lock and then call the _patch method. That patch method creates a new dictionary and assigns it to the current instance's __dict__ (going through the object base class to avoid infinite recursion: How is the __getattribute__ method used?)
So when you call storage.set(...) you are actually looking up a proxy dictionary for the current thread. If one doesn't exist, __init__ is called on your type (see line 182). The result of that lookup is installed as the current instance's __dict__, and then the appropriate method is called on object to retrieve or set that value (ll. 193, 206, 219), which uses the newly installed dict.
That's part of the contract (from http://svn.python.org/projects/python/tags/r27a1/Lib/_threading_local.py):
Note that if you define an init method, it will be
called each time the local object is used in a separate thread.
It's not too well documented in the official docs but basically each time you interact with a thread local in a different thread a new instance unique to that thread gets allocated.
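A tiny self-contained demonstration of that behaviour (not the code from the gist, just an illustration):

import threading

class Storage(threading.local):
    def __init__(self):
        print("__init__ running in %s" % threading.current_thread().name)
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

storage = Storage()  # __init__ runs once here, in the main thread

def run():
    # the first attribute access in this thread triggers __init__ again,
    # giving the thread its own fresh self.data
    storage.set('keykey', threading.current_thread().name)

threads = [threading.Thread(target=run) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()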
Recently I've read this article:
http://blog.roseman.org.uk/2010/02/01/middleware-post-processing-django-gotcha/
I don't understand why the solution described there works.
Why does instantiating separate objects make the data thread-safe?
I have two guesses:
Django explicitly holds middleware objects in shared memory and does not do this for other objects, so other objects are thread-safe.
In the second example in the article, the lifetime of the thread-safety-critical data is much shorter than in the first example, so thread-unsafe operations probably just have no time to occur.
There are also thread-safety issues in Django templates.
My question is: how can I tell where Django is thread-safe and where it is not? Is there any logic or convention behind it? Another question: I know that the request object is thread-safe (clearly, if it weren't, web sites built with Django would not be able to operate), but what exactly makes it thread-safe?
The point, as I note in that article, is that the middleware is instantiated once per process. In most methods of deploying Django, a process lasts for multiple requests. Note that you never instantiate the middleware object yourself: Django takes care of that. That's a clue that it's being done outside the request/response cycle.
The extra object I used there is being instantiated within the process_response method. So, as soon as that method returns, the new object goes out of scope and is destroyed, and there are no thread-safety issues.
Generally speaking, the only objects you have to worry about thread-safety on are those you instantiate at module or class level rather than inside a function/method, and those you don't instantiate yourself, like the middleware here. And even there, requests are explicitly an exception: you can count on those being per-request (naturally).
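To make that concrete, here is a deliberately compressed, hypothetical pair of middlewares (compute_something and ResponseRewriter are placeholders, not the code from the article):

class UnsafeMiddleware(object):
    def process_request(self, request):
        # stored on the middleware instance, which is shared by every request
        # this process handles: concurrent threads can overwrite each other here
        self.start_data = compute_something(request)

    def process_response(self, request, response):
        response["X-Data"] = self.start_data  # may belong to a different request
        return response

class SafeMiddleware(object):
    def process_response(self, request, response):
        # local variable: created and destroyed within this call,
        # so no other thread ever sees it
        helper = ResponseRewriter(response)
        return helper.rewrite()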
I'm happy to accept that this might not be possible, let alone sensible, but is it possible to keep a persistent reference to an object I have created?
For example, in a few of my views I have code that looks a bit like this (simplified for clarity):
from gmusicapi import Webclient
import mpd

api = Webclient()
api.login(GPLAY_USER, GPLAY_PASS)              # log in to Google Play Music
url = api.get_stream_urls(track.stream_id)[0]  # first stream URL for the track

client = mpd.MPDClient()
client.connect("localhost", 6600)              # local MPD server
client.clear()
client.add(url)
client.play()
client.disconnect()
It would be really neat if I could just keep one reference to api and client throughout my project, especially to avoid repeated API logins with gmusicapi. Can I declare them in settings.py (I'm guessing this is a terrible idea), or keep a persistent connection to them by some other means?
Ideally I would then have functions like get_api() which would check the existing object was still ok and return it or create a new one as required.
You can't have anything that's instantiated once per application, because you'll almost certainly have more than one server process, and objects aren't easily shared across processes. However, one per process is definitely possible, and worthwhile. To do that, you only need to instantiate it at module level in the relevant file (e.g. views.py). That means it will be automatically instantiated when Django first imports that file (in that process), and you can refer to it as a global variable in that file. It will persist as long as the process does, and when a new process is created, a new global variable will be instantiated.
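A rough sketch of the get_api() idea under that approach (credentials as in your snippet; checking whether the existing login is still valid is left out, since that depends on gmusicapi's API):

# views.py
from gmusicapi import Webclient

_api = None                                 # one instance per server process

def get_api():
    global _api
    if _api is None:
        _api = Webclient()
        _api.login(GPLAY_USER, GPLAY_PASS)  # defined elsewhere, as in the question
    return _api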
You could make them properties of your application object, or of some other application-level object that is declared at the top level of your project, before anything else needs it.
If you put them into a class that gets instantiated on first import and is then simply reused, it can be imported and accessed by several modules.
Either way, they would live for the length of the process's execution.
You can't persist the object reference itself, but you can store the data either in Django's in-memory cache or in a memcached-backed Django cache.
Django Cache
https://docs.djangoproject.com/en/dev/topics/cache/
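For example, a minimal sketch using the cache API (the key, timeout, and expensive_lookup helper are placeholders):

from django.core.cache import cache

def get_posts():
    posts = cache.get("posts")
    if posts is None:
        posts = expensive_lookup()      # placeholder for whatever builds the data
        cache.set("posts", posts, 300)  # keep it for five minutes
    return posts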
See also
Creating a Persistent Data Object In Django
I use SQLAlchemy to connect to my database backend and make heavy use of multiprocessing in my Python application. I have come across a situation that requires passing an object reference, which is the result of a database query, from one process to another.
This is a problem because, when an attribute of the object is accessed, SQLAlchemy tries to reattach the object to the current session of the other process, which fails with an exception because the object is already attached to another session:
InvalidRequestError: Object '<Field at 0x9af3e4c>' is already attached to session '148848780' (this is '159831148')
What is the way to handle this situation? Is it possible to detach the object from the first session or clone the object without the ORM related stuff?
This is a bad idea (tm).
You shouldn't share a stateful object between processes like this (I know it's tempting), because all kinds of bad things can happen: lock primitives are not intended to work across multiple Python runtimes.
I suggest taking the attributes you need out of that object, putting them into a dict, and sending it across processes using multiprocessing Pipes:
http://docs.python.org/library/multiprocessing.html#pipes-and-queues
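Something along these lines (the session, the Field model, and its attributes are assumed from your setup):

from multiprocessing import Process, Pipe

def worker(conn):
    plain = conn.recv()  # a plain dict, no ORM state attached
    print("%s = %s" % (plain["name"], plain["value"]))
    conn.close()

if __name__ == "__main__":
    field = session.query(Field).first()  # your existing query
    # copy only the plain attributes you need out of the ORM object
    plain = {"name": field.name, "value": field.value}

    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send(plain)
    p.join()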