Expiring Dictionaries with individual TTLs - python

I need a dictionary where I can store items with a TTL (time to live) so that the items disappear once the time is up. I found the ExpiringDict class for this purpose, but it appears to be restricted to having the same timeout for each item in the dictionary. Is there an alternative that lets me specify different timeout values for each key?

It is easy to build yourself. Ingredients: a normal dict to store the values; a heapq to store (expiry, key) pairs; and a thread running a loop that checks the top of the heap and deletes (or marks expired, depending on what your need is) while the top's expiry is in the past (don't forget to let it sleep). When you add to the dict, push (now + ttl, key) onto the heapq at the same time. There are some details you might want to attend to (e.g. removing entries from the heapq when you delete from the dict, though that would be a bit slow as you'd have to search the heap and then re-heapify; again, only necessary if your use case requires it), but the basic idea is quite simple.
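A minimal sketch of that recipe, assuming wall-clock TTLs (the class name is made up; this is not the ExpiringDict package):

import heapq
import threading
import time

class PerKeyTTLDict:
    # Dict-like store where every key carries its own TTL (illustrative sketch).
    def __init__(self, sweep_interval=0.1):
        self._data = {}   # key -> (value, expiry_timestamp)
        self._heap = []   # (expiry_timestamp, key) min-heap
        self._lock = threading.Lock()
        threading.Thread(target=self._sweep, args=(sweep_interval,), daemon=True).start()

    def set(self, key, value, ttl):
        expiry = time.time() + ttl
        with self._lock:
            self._data[key] = (value, expiry)
            heapq.heappush(self._heap, (expiry, key))

    def get(self, key, default=None):
        with self._lock:
            entry = self._data.get(key)
        if entry is None or entry[1] <= time.time():
            return default  # missing, or expired but not yet swept
        return entry[0]

    def _sweep(self, interval):
        while True:
            now = time.time()
            with self._lock:
                while self._heap and self._heap[0][0] <= now:
                    _, key = heapq.heappop(self._heap)
                    entry = self._data.get(key)
                    # Only delete if this heap entry is still current; re-setting
                    # a key leaves a stale (old_expiry, key) pair in the heap.
                    if entry is not None and entry[1] <= now:
                        del self._data[key]
            time.sleep(interval)

The expiry check in get() covers entries the sweeper hasn't reached yet, and the staleness check in _sweep() is one of the "details" mentioned above: a key that was set again with a fresh TTL must not be removed by its old heap entry.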

One place to look for inspiration might be Django's LocMemCache in-memory key-value object. It basically wraps a _cache dict that holds the actual cached values and an _expire_info dict that stores each key's expiry. Then in get(), it calls self._has_expired(), which compares the stored expiry to the current timestamp from time.time().
You can find the class at django.core.cache.backends.locmem.
Granted, this is not a dict subclass; as mentioned above, it actually wraps two separate dictionaries, one for caching and one for storing expiries. But its API is dictionary-like.
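A stripped-down illustration of that two-dict pattern (method names loosely echo LocMemCache's, but this is not Django's actual code, which adds locking and culling on top):

import time

class TwoDictCache:
    def __init__(self):
        self._cache = {}        # key -> cached value
        self._expire_info = {}  # key -> absolute expiry timestamp, or None

    def set(self, key, value, timeout=None):
        self._cache[key] = value
        self._expire_info[key] = None if timeout is None else time.time() + timeout

    def _has_expired(self, key):
        exp = self._expire_info.get(key)
        return exp is not None and exp <= time.time()

    def get(self, key, default=None):
        if key not in self._cache or self._has_expired(key):
            self._cache.pop(key, None)
            self._expire_info.pop(key, None)
            return default
        return self._cache[key]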

Related

Implement a persistent stack data structure using Redis

I am implementing a stack data structure using redis-py and the Redis list data type. I am not clear on how to handle the case where the corresponding list is empty. Redis's default behaviour appears to be that once a list is empty, the related key is deleted. The empty-list case is hit, for example, when I pop or clear all elements of my stack data structure on the Python end. Basically, my setup is that I have a stack object in my code that calls operations on the Redis list. For example, when a client of the stack object calls stack.pop(), the stack object calls BRPOP on the corresponding list in Redis using redis-py. Also, in my setup, the stack object has a key attribute, which is the key of the related list in Redis.
I have thought about 2 possible solutions so far:
Never empty the Redis list completely; always maintain at least one element in it. From the client's perspective, the stack is empty if the Redis list contains only 1 element. This approach works, but I mainly don't like it as it involves keeping track of the number of elements pushed/popped.
Let the related key be deleted when the list becomes empty; upon a subsequent push, just create a new list on Redis. This approach also works, but an added complexity is that I cannot be sure whether someone else has created a key-value pair on Redis using the same key as my stack object's.
So, I am basically looking for a way to keep a key with an empty list that does not involve the bookkeeping required in the above two approaches. Thanks.
Your 2nd solution is the best fit.
Yes, you have to maintain naming conventions; this is very basic for any NoSQL database or key-value store. You must have control over the keys you put in: if you don't, then over time you won't know which key is used for which purpose. To achieve this, you can prefix a meaningful string to every key you put in.
For example, if I want to store 3 hashmaps for a single user user1, I would do this:
hmset ACTIONS_user1 a 10 b 20 ...
hmset VIEWS_user1 home_page 4 login 10 ...
hmset ALERTS_user1 daily 5 hourly 12 ...
In the above example, user1 is created dynamically by the app logic, and you prefix it with a meaningful string representing what that key holds.
In this way you will always have control over the keys you put in, and you will never face a key collision.
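Applied to the stack question, a sketch with redis-py (the STACK_ prefix and the class name are assumptions, not a convention redis-py imposes):

import redis

class RedisStack:
    def __init__(self, name, client=None):
        self.key = 'STACK_' + name          # namespaced key, as described above
        self.client = client or redis.Redis()

    def push(self, value):
        # RPUSH transparently recreates the list if Redis deleted the empty key.
        self.client.rpush(self.key, value)

    def pop(self, timeout=0):
        # BRPOP blocks until an element arrives (or the timeout expires) and
        # returns a (key, value) pair, or None on timeout.
        item = self.client.brpop(self.key, timeout=timeout)
        return item[1] if item else None

    def __len__(self):
        return self.client.llen(self.key)   # 0 when the key is absent, i.e. empty stack

Since RPUSH happily recreates a deleted key, the only real cost of solution 2 is the naming discipline shown here.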
Hope this helps.

Is it a good idea to store a dict of my app's cache keys in the cache

When a client makes a POST, I need to invalidate all cached data modified by that request.
My memcached keys are hashed, so I can't find them by prefix. The only solution I found is to store a dict of the keys in the DB or in the cache, and I'd prefer the cache.
Do you think the following ideas are good:
Store a dict/list of cache keys?
Use the cache or the DB for this purpose?
Is it a stable solution, both logically and performance-wise?
Maybe use a second cache for it?
Thanks to all.
Edit:
I want to store a dict like the following:
{
'subject1': ['key1','key2'],
'subject2': ['key3','key3']
}
It's hard to give you a satisfying answer without knowing the specifics of your application. With that in mind...
This can be an acceptable solution; however, keep in mind that objects can be evicted from memcached before their expiry date, in which case you'd lose your key index.
Depending on your specific use case, this limitation can be acceptable or not.
An approach that may serve you better is to be clever about how you name your keys; for example, if your application deals with books, you could cache various data under names such as book.<id>.author, book.<id>.title, etc. This way, if you need to invalidate all the data for the book with id 42, you could just generate all the cache names you need to invalidate (book.42.author, book.42.title, etc.) and delete those.
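A sketch of that scheme (the book fields come from the example above; the hashing helper is an assumption, since the question says its keys are hashed):

import hashlib

BOOK_FIELDS = ('author', 'title')

def cache_key(book_id, field):
    # Build the structured name first, then hash it to get the memcached key.
    name = 'book.%s.%s' % (book_id, field)
    return hashlib.md5(name.encode('utf-8')).hexdigest()

def invalidate_book(client, book_id):
    # No prefix scan needed: regenerate each key name and delete it directly.
    # client is any memcached client with a delete() method, e.g. pylibmc.
    for field in BOOK_FIELDS:
        client.delete(cache_key(book_id, field))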

python dictionary structure, speed concerns

I am new to Python. I need a data structure to store counts of some objects. For example, I want to store the most visited webpages: let's say the 100 most visited, keeping the count of visits to each. I may need to update the list, and I will definitely update the visit counts. It does not have to be ordered; I will look up the associated visit count given the webpage ID. I am planning to use a dictionary. Is there a faster way of doing this in Python?
The dictionary is an appropriate and fast data structure for this task (mapping webpage IDs to visit counts).
Python dictionaries are implemented using hash tables for fast O(1) access. They are so fast that almost any attempt to avoid them will make code run slower and make the code unpleasant to look at.
P.S. Also take a look at collections.Counter, which is specifically designed for this kind of work (counting hits). It is implemented as a dictionary subclass whose missing keys default to a count of zero.
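For example:

from collections import Counter

visits = Counter()
for page_id in ['home', 'about', 'home', 'home', 'contact']:
    visits[page_id] += 1   # missing keys default to 0, so no setdefault dance

print(visits['home'])          # 3
print(visits['never-seen'])    # 0, not a KeyError
print(visits.most_common(2))   # [('home', 3), ('about', 1)]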
The Python dictionary object is one of the most optimized parts of the whole language, and the reason is that dictionaries are used everywhere.
For example, normally every object instance of every class uses a dictionary to hold its instance attributes, a class is a dictionary containing its methods, modules use a dictionary to keep their globals, the system uses a dictionary to keep and look up modules, and so on.
Using a dictionary to keep a counter is a good approach in Python.

map data type for python google app engine

I would like to have a map data type for one of my entity types in my python google app engine application. I think what I need is essentially the python dict datatype where I can create a list of key-value mappings. I don't see any obvious way to do this with the provided datatypes in app engine.
The reason I'd like to do this is that I have a User entity and I'd like to track within that user a mapping of lessonIds to values that represent that user's status with a particular lesson id. I'd like to do this without creating a whole new entity that might be titled UserLessonStatus and have it reference the User and have to be queried, since I often want to iterate through all the lesson statuses. Maybe it is better done this way, in which case, I'd appreciate opinions that this is how it's best done. Otherwise if someone knows a good way to create a mapping within my User entity itself, that'd be great.
One solution I considered is using two ListProperties in conjunction: when adding an object, append the key and the value to the respective lists; when locating, find the index of the string in one list and use that index to reference the other; when removing, find the index in one list and use it to remove the entry from each; and so forth.
You're probably better off using another kind, as you suggest. If you do want to store it all in the one entity, though, you have several options - parallel lists, as you suggest, are one option. You could also simply pickle a Python dictionary, assuming you don't want to query on it.
You may want to check out the ndb project, whose support for nested entities would also be a viable solution.
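For the pickled-dict route, a sketch using ndb's PickleProperty (the model fields and the helper are illustrative assumptions):

from google.appengine.ext import ndb

class User(ndb.Model):
    name = ndb.StringProperty()
    # Serializes an arbitrary Python object, here a dict mapping
    # lessonId -> status. You cannot filter queries on its contents.
    lesson_status = ndb.PickleProperty()

def set_lesson_status(user_key, lesson_id, status):
    user = user_key.get()
    statuses = user.lesson_status or {}
    statuses[lesson_id] = status
    user.lesson_status = statuses
    user.put()

Iterating through all of a user's lesson statuses is then just a dict iteration, with no extra query.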

Are there reasons to use get/put methods instead of item access?

I find that I have recently been implementing Mapping interfaces on classes which on the surface fit the model (they are essentially just key-value stores with no more meta-data), but underneath they are sometimes quite complex.
Here are a couple of examples of increasing severity:
An object which wraps another mapping converting all objects to strings when set.
An object which uses a local database as a back-end to store the key-value pairs.
An object which makes HTTP requests to remote servers to get/set data.
Let's suppose all of these examples seamlessly implement the Mapping interface, and the only indication that there is something fishy going on is that item access could potentially take a few seconds, and an item may not be retrievable in the same form as it was stored (if at all). I'm perfectly content with something like the first example, pretty okay with the second, but I'm getting kinda uncomfortable with the last.
The question is, is there a line at which the API for these models should not use item access, even though the underlying structure may feel like it fits on the surface?
It sounds like you are describing the standard anydbm module semantics. And just as anydbm can raise the exception anydbm.error, so too could your subclass raise derivatives like MyDbmTimeoutError as needed. Whether you implement it with dictionary operations or function calls, the caller will still have to contend with exceptions anyway (e.g. KeyError, NameError).
I think the fact that arbitrary "tied" hashes exist in Python 2 and 3.x is decent justification for saying this is a reasonable approach. Indeed, I've been looking for (and designing in my head) more complex bindings than simple key ⇒ value mappings, without a heavy ORM SQL layer in between.
added: The more I think about it, the more Pythonic tied dictionaries seem to be. A key ⇒ value collection is a dictionary. Whether it lives in core or on disk or across the network is an implementation detail that is best abstracted away. The only substantive differences are increased latency and possible unavailability; however, on a virtual memory based OS, "core" can have higher latency than RAM and in a multiprocessing OS, "core" can become unavailable too. So these are differences in degree only, not kind.
From a strictly philosophical point of view, I don't think there is a line you can cross here. If some tool provides the needed functionality but its API is different, adapt away. The only time you shouldn't do this is if the adapted-to API simply is not expressive enough to manipulate the adapted component in the way that is needed.
I wouldn't hesitate to adapt a database into a dict, because that's a great way to manipulate collections, and it's already compatible with a heck of a lot of other code. If I find that my particular application must call the database connection's begin(), commit(), and rollback() methods to work right, then a dict won't do, since dicts don't have transaction semantics.
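As a concrete illustration of the mildest example above (a mapping that stringifies on set), the abstract base classes make such an adaptation cheap; the class name here is invented:

from collections.abc import MutableMapping

class StringifyingMap(MutableMapping):
    # Wraps another mapping, converting every value to str on assignment.
    def __init__(self, backend=None):
        self._backend = {} if backend is None else backend

    def __setitem__(self, key, value):
        self._backend[key] = str(value)   # the only behaviour we change

    def __getitem__(self, key):
        return self._backend[key]         # KeyError propagates, as dict users expect

    def __delitem__(self, key):
        del self._backend[key]

    def __iter__(self):
        return iter(self._backend)

    def __len__(self):
        return len(self._backend)

m = StringifyingMap()
m['answer'] = 42
print(m['answer'])   # '42': retrieved in a different form than it was stored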
