Can't get Beaker cache working - python

I'm trying to use Beaker's caching library but I can't get it working.
Here's my test code.
class IndexHandler():
#cache.cache('search_func', expire=300)
def get_results(self, query):
results = get_results(query)
return results
def get(self, query):
results = self.get_results(query)
return render_index(results=results)
I've tried the examples in Beaker's documentation but all I see is
<type 'exceptions.TypeError'> at /
can't pickle generator objects
Clearly I'm missing something but I couldn't find a solution.
By the way this problem occurs if the cache type is set to "file".

If you configure beaker to save to the filesystem, you can easily see that each argument is being pickled as well. Example:
tp3
sS'tags <myapp.controllers.tags.TagsController object at 0x103363c10> <MySQLdb.cursors.Cursor object at 0x103363dd0> apple'
p4
Notice that the cache "key" contains more than just my keyword, "apple," but instance-specific information. This is pretty bad, because especially the 'self' won't be the same across invocations. The cache will result in a miss every single time (and will get filled up with useless keys.)
The method with the cache annotation should only have the arguments to correspond to whatever "key" you have in mind. To paraphrase this, let's say that you want to store the fact that "John" corresponds to value 555-1212 and you want to cache this. Your function should not take anything except a string as an argument. Any arguments you pass in should stay constant from invocation to invocation, so something like "self" would be bad.
One easy way to make this work is to inline the function so that you don't need to pass anything else beyond the key. For example:
def index(self):
# some code here
# suppose 'place' is a string that you're using as a key. maybe
# you're caching a description for cities and 'place' would be "New York"
# in one instance
#cache_region('long_term', 'place_desc')
def getDescriptionForPlace(place):
# perform expensive operation here
description = ...
return description
# this will either fetch the data or just load it from the cache
description = getDescriptionForPlace(place)
Your cache file should resemble the following. Notice that only 'place_desc' and 'John' were saved as a key.
tp3
sS'place_desc John'
p4

I see that the beaker docs do not mention this explicitly, but, clearly, the decorate function must pickle the arguments it's called with (to use as part of the key into the cache, to check if the entry is present and to add it later otherwise) -- and, generator objects are not pickleable, as the error message is telling you. This implies that query is a generator object, of course.
What you should be doing in order to use beaker or any other kind of cache is to pass around, instead of a query generator object, the (pickleable) parameters from which that query can be built -- strings, numbers, dicts, lists, tuples, etc, etc, composed in any way that is easy to for you to arrange and easy to build the query from "just in time" only within the function body of get_results. This way, the arguments will be pickleable and caching will work.
If convenient, you could build a simple pickleable class whose instances "stand for" queries, emulating whatever initialization and parameter-setting you require, and performing the just-in-time instantiation only when some method requiring an actual query object is called. But that's just a "convenience" idea, and does not alter the underlying concept as explained in the previous paragraph.

Try return list(results) instead of return results and see if it helps.
The beaker file cache needs to be able to pickle both cache keys and values; most iterators and generators are unpickleable.

Related

Safe way to create an "out of band" alternative value for a function argument in Python

I have a function like the following:
def do_something(thing):
pass
def foo(everything, which_things):
"""Do stuff to things.
Args:
everything: Dict of things indexed by thing identifiers.
which_things: Iterable of thing identifiers. Something is only
done to these things.
"""
for thing_identifier in which_things:
do_something(everything[thing_identifier])
But I want to extend it so that a caller can do_something with everything passed in everything without having to provide a list of identifiers. (As a motivation, if everything was an opaque container whose keys weren't accessible to library users but only library internals. In this case, foo can access the keys but the caller can't. Another motivation is error prevention: having a constant with obvious semantics avoids a caller mistakenly passing the wrong set of identifiers in.) So one thought is to have a constant USE_EVERYTHING that can be passed in, like so:
def foo(everything, which_things):
"""Do stuff to things.
Args:
everything: Dict of things indexed by thing identifiers.
which_things: Iterable of thing identifiers. Something is only
done to these things. Alternatively pass USE_EVERYTHING to
do something to everything.
"""
if which_things == USE_EVERYTHING:
which_things = everything.keys()
for thing_identifier in which_things:
do_something(everything[thing_identifier])
What are some advantages and limitations of this approach? How can I define a USE_EVERYTHING constant so that it is unique and specific to this function?
My first thought is to give it its own type, like so:
class UseEverythingType:
pass
USE_EVERYTHING = UseEverythingType()
This would be in a package only exporting USE_EVERYTHING, discouraging creating any other UseEverythingType objects. But I worry that I'm not considering all aspects of Python's object model -- could two instances of USE_EVERYTHING somehow compare unequal?

use of attributes in python

This is kind of a high level question. I'm not sure what you'd do with code like this:
class Object(object):
pass
obj = Object
obj.a = lambda: None
obj.d = lambda: dict
setattr(obj.d, 'dictionary', {4,3,5})
setattr(obj.a, 'somefield', 'somevalue')
If I'm going to call obj.a.somefield, why would I use print? It feels redundant.
I simply can't see what programming strictly with setting attributes would be good for?
I could write an entire program with all of my variables in object classes.
First about your print question. Print is used more for debugging or for attributes that are an output from an object that gives you information when you create it.
For example, there might be an object that you create by passing it data and it finds all of the basic statistics information of that data. You could have it return a dictionary via a method and access the values from there or you could simply access it via an attribute, making the data more readable.
For your second part of your question about why you would want to use attributes in general, they're more for internally passing information from function to function in an object or for configuring an object. Python has different scopes that determine which information each function can access. All methods of an object can access that object's attributes, which allows you to avoid using external or global variables. That makes your object nice and self contained. Global variables are generally avoided at all costs, because they can get messy, so they are considered bad practice.
Taking that a step further, using setattr is a more sophisticated way of setting these attributes to make your code more readable. You could use a function to modify aspects of an object or you could "hide" the complexity inside your setattr so the user can use a higher level interface rather than getting bogged down in the specifics.

list() constructor with list?

I'm new in Python and I don't understand the purpose of list() function in this piece of code:
documents = [(list(movie_reviews.words(fileid)), category)
for category in movie_reviews.categories()
for fileid in movie_reviews.fileids(category)]
The method words() is already returning a list of tokenized words from a string, and I don't see any difference between that and
documents = [(movie_reviews.words(fileid), category)
for category in movie_reviews.categories()
for fileid in movie_reviews.fileids(category)]
There are three possibilities:
It is a mistake, no call to list() is required.
The interface only guarantees that the method returns an Iterable type, which may be any one of: list, set, iterator, generator, etc. The specific movie_reviews.words() may return a list today, but either that may change in future versions, or in other classes with similar interfaces (child/parent/or simply similar interface).
Whether this is the case, should be stated explicitly in the documentation, or could be gleamed out of the inheritance hierarchy.
The method performs some sort of memoization, while keeping a copy of the returned list. A good practice would be to copy the cached-list inside the method, but maybe they returned a shared list-object.
If the method returns a reference to a shared list object, then it is a good idea to call list(), in order to create a new list object. Without the copy operation, any change to the list by one side (inside the method vs. through documents variable) will confuse the other side. If you change the list through documents varaible, then calling movie_reviews.words(fileid) with the same fileid may return the wrong value.
In general, although this is bad design, this happens in real code. I once had to debug such an issue in live code. Usually, in case of memoization, it is better to return an immutable type such as a tuple, instead of a list, which will guarantee both speed and safety.

Python pseudo-immutable object field

I currently need to partially create a Python object and be able to update it for some time. Although, I must not be able to update it once I used the object as a dictionary key.
Of course there is the solution of marking the fields as private, which is mostly a warning for the programmer, and I will actually go for that solution.
But I stumbled on another solution and I want to know if this could be a good idea, or if it could simply go horribly wrong. Here it is:
class Foo():
def __init__(self, bar):
self._bar = bar
self._has_been_hashed = False
def __hash__(self):
self._has_been_hashed = True
return self._bar.__hash__()
def __eq__(self, other):
return self._bar == other._bar
def __copy__(self):
return Foo(self._bar)
def set_bar(self, bar):
if self.has_been_hashed:
raise FooIsNowImmutable
else:
self._bar = bar
Some testing proved it to work as desired, I can no longer use set_bar once I, say, used my object as a dictionary key.
What do you think? Is it a good idea? Will it turn against me? Is there an easier way? And is it somehow a bad practice?
Doing it that way is a bit fragile, since you never know when something might be used as a dictionary key, or when its hash might be called for some other reason. An object isn't supposed to "know" whether it's being used as a dictionary key. It will be confusing to have code that may raise an exception just because some other code somewhere else put the object in a dictionary.
Following the Python philosophy of "explicit is better than implicit", it would be safer to just give your object a method called .finalize() or .lock() or something, which would set a flag indicating the object is immutable. You could also reverse the exception-raising logic, so that __hash__ raises an exception if the object is not yet locked (rather than mutation raising an exception if the object has been hashed).
You would then call .lock() when you're ready to make the object immutable. It makes more sense to explicitly set it immutable when you're done with whatever mutating you need to do, rather than implicitly assuming that as soon as you use it in a dictionary, you're done mutating it.
You can do that, but I'm not sure I'd recommend it. Why do you need it in a dictionary?
It requires a lot more awareness of the state of the object... think a file object. Would you put one in a dictionary? It has to be opened for a lot of the functions to work, and once it's closed, you can't do them anymore. The user has to be aware in the surrounding code which state the object is in.
For files, that makes sense - after all, you don't normally hold files open across large parts of your program, or if you do, they have very defined init and close codes; something similar has to make sense for your object. Especially if you have some APIs that take the object, but expect an immutable version, and others that take the same object, but expect to change it...
I have used the lock method before, and it works well for complex, read-only objects that you want to initialize once and then make sure no one is messing with. E.G. you load a copy of a (say, English) dictionary from disk... it has to be mutable while you are populating it, but you don't want anyone to accidentally modify it, so locking it is a great idea. I would only use it if it was a one-time lock though - something you are locking and unlocking seems like a recipe for disaster.
There are two solutions IMHO if you just want to create a version you can use in hashable places. First is to explicitly create an immutable copy when you put it in a dictionary - tuple and frozenset are examples of this sort of behaviour... if you want to put a list in a dict, you can't, but you can create a tuple from it first, and that can be hashed. Create a frozen version of your object, then it's very clear by looking at the object type whether it's expected to be mutable or immutable, and so cases where it was used incorrectly are easily seen.
Second, if you really want it to be hashable, but need it to be mutable... that's actually legal, but implemented a little different. It goes back to the idea of hashing... hashing is used both for optimized lookups, and equality.
The first is to ensure you can get objects back... you put something in a dictionary, and it hashes to a value of 4 - goes in slot 4. Then you modify it. Then you go to look it up again, and now it hashes to 9 - there's nothing in slot 9, or worse, a different object, and you're broken.
Second is equality - for things like sets, I need to know if my object is already in there. I can hash, but if you know anything about hashing, you still need to check equality to check for hash collisions.
That doesn't preclude supporting __hash__ and being mutable, but it's unusual. You need to decide for your item what makes it the same, even though it's mutable. What you need to do then is give each object a unique id. Technically, you may be able to get away with id(self), but something like the uuid module is probably a better possibility. The UUID4 (or technically, the hash of the UUID4) is what determines both the hash and equality; two objects that contain the same UUID4 should be the exact same object; two objects that have the exact same data but a different UUID4 would be different object.

Use python dict to lookup mutable objects

I have a bunch of File objects, and a bunch of Folder objects. Each folder has a list of files. Now, sometimes I'd like to lookup which folder a certain file is in. I don't want to traverse over all folders and files, so I create a lookup dict file -> folder.
folder = Folder()
myfile = File()
folder_lookup = {}
# This is pseudocode, I don't actually reach into the Folder
# object, but have an appropriate method
folder.files.append(myfile)
folder_lookup[myfile] = folder
Now, the problem is, the files are mutable objects. My application is built around the fact. I change properites on them, and the GUI is notified and updated accordingly. Of course you can't put mutable objects in dicts. So what I tried first is to generate a hash based on the current content, basically:
def __hash__(self):
return hash((self.title, ...))
This didn't work of course, because when the object's contents changed its hash (and thus its identity) changed, and everything got messed up. What I need is an object that keeps its identity, although its contents change. I tried various things, like making __hash__ return id(self), overriding __eq__, and so on, but never found a satisfying solution. One complication is that the whole construction should be pickelable, so that means I'd have to store id on creation, since it could change when pickling, I guess.
So I basically want to use the identity of an object (not its state) to quickly look up data related to the object. I've actually found a really nice pythonic workaround for my problem, which I might post shortly, but I'd like to see if someone else comes up with a solution.
I felt dirty writing this. Just put folder as an attribute on the file.
class dodgy(list):
def __init__(self, title):
self.title = title
super(list, self).__init__()
self.store = type("store", (object,), {"blanket" : self})
def __hash__(self):
return hash(self.store)
innocent_d = {}
dodge_1 = dodgy("dodge_1")
dodge_2 = dodgy("dodge_2")
innocent_d[dodge_1] = dodge_1.title
innocent_d[dodge_2] = dodge_2.title
print innocent_d[dodge_1]
dodge_1.extend(range(5))
dodge_1.title = "oh no"
print innocent_d[dodge_1]
OK, everybody noticed the extremely obvious workaround (that took my some days to come up with), just put an attribute on File that tells you which folder it is in. (Don't worry, that is also what I did.)
But, it turns out that I was working under wrong assumptions. You are not supposed to use mutable objects as keys, but that doesn't mean you can't (diabolic laughter)! The default implementation of __hash__ returns a unique value, probably derived from the object's address, that remains constant in time. And the default __eq__ follows the same notion of object identity.
So you can put mutable objects in a dict, and they work as expected (if you expect equality based on instance, not on value).
See also: I'm able to use a mutable object as a dictionary key in python. Is this not disallowed?
I was having problems because I was pickling/unpickling the objects, which of course changed the hashes. One could generate a unique ID in the constructor, and use that for equality and deriving a hash to overcome this.
(For the curious, as to why such a "lookup based on instance identity" dict might be neccessary: I've been experimenting with a kind of "object database". You have pure python objects, put them in lists/containers, and can define indexes on attributes for faster lookup, complex queries and so on. For foreign keys (1:n relationships) I can just use containers, but for the backlink I have to come up with something clever if I don't want to modify the objects on the n side.)

Categories