Implement a persistent stack data structure using Redis - python

I am implementing a stack data structure using redis-py and the Redis list data type. I am not clear on how to handle the case when the corresponding list is empty. Redis's default behaviour appears to be that once a list is empty, its key is deleted. The empty-list case is hit on Redis, for example, when I pop or clear all elements of my stack on the Python end. Basically, my setup is that I have a stack object in my code that calls operations on the Redis list. For example, when a client of the stack object calls stack.pop(), the stack object calls BRPOP on the corresponding list in Redis using redis-py. Also, in my setup, the stack object has a key attribute, which is the key of the related list in Redis.
I have thought about 2 possible solutions so far:
Never empty the Redis list completely; always keep at least one element in it. From the client's perspective, the stack is empty if the Redis list contains only 1 element. This approach works, but I mainly don't like it because it involves keeping track of the number of elements pushed/popped.
Let the key be deleted when the list becomes empty, and on the next push just create a new list in Redis. This approach also works, but the added complexity is that I cannot be sure that someone else hasn't created a key-value pair in Redis using the same key as my stack object.
So, I am basically looking for a way to keep a key with an empty list that does not involve the bookkeeping required in the above two approaches. Thanks.

Your 2nd solution is the best fit.
Yes, you have to maintain naming conventions; this is very basic for any NoSQL database or key-value store. You must have control over the keys you put in. If you don't have that control, then over time you won't know which key is used for which purpose. To achieve this, you can prefix all the keys you put in with a meaningful string.
For example, if I want to store 3 hashes for a single user, user1, I would do this:
hmset ACTIONS_user1 a 10 b 20 ...
hmset VIEWS_user1 home_page 4 login 10 ...
hmset ALERTS_user1 daily 5 hourly 12 ...
In the above example, user1 is created dynamically by the application logic, and you prefix it with a meaningful string representing what that key holds.
This way you always have control over the keys you put in and, as long as every writer follows the convention, you avoid key collisions.
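A minimal sketch of the second approach combined with the prefix convention, assuming client is a redis-py client such as redis.Redis(); the RedisStack class and the STACK_ prefix are hypothetical names. It leans on standard Redis semantics: RPUSH creates the list when the key is missing, RPOP returns nil (None in redis-py) for a missing key, LLEN returns 0 for it and TYPE reports "none". So the key disappearing when the list empties needs no bookkeeping at all. It uses non-blocking RPOP; if you want the blocking BRPOP from the question, redis-py's client.brpop([self.key], timeout) returns a (key, value) tuple, or None on timeout.

```python
class RedisStack:
    """Stack backed by a Redis list stored under a namespaced key.

    `client` is assumed to behave like redis.Redis (redis-py); only
    type, rpush, rpop, llen and delete are used.
    """

    PREFIX = "STACK_"  # hypothetical naming convention for stack keys

    def __init__(self, client, name):
        self.client = client
        self.key = self.PREFIX + name
        key_type = client.type(self.key)
        if isinstance(key_type, bytes):  # redis-py returns bytes by default
            key_type = key_type.decode()
        # A missing key reports "none"; any type other than list is a collision
        if key_type not in ("none", "list"):
            raise TypeError(f"{self.key!r} already holds a {key_type}")

    def push(self, value):
        # RPUSH recreates the list if Redis deleted the key when it emptied
        self.client.rpush(self.key, value)

    def pop(self):
        # RPOP on a missing key returns None, meaning the stack is empty
        return self.client.rpop(self.key)

    def __len__(self):
        # LLEN on a missing key returns 0, so no sentinel element is needed
        return self.client.llen(self.key)

    def clear(self):
        self.client.delete(self.key)
```

Usage would look like RedisStack(redis.Redis(), "user1"). Note that popped values come back as bytes unless the client was created with decode_responses=True. The TYPE check only catches a collision that exists when the object is created; it cannot stop another writer from reusing the key later, which is why the naming convention still matters.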
Hope this helps.

Related

Expiring Dictionaries with individual TTLs

I need a dictionary where I can store items with a TTL (time to live) so that the items disappear once the time is up. I found the ExpiringDict class for this purpose but it appears to be restricted to having the same timeout for each item in the dictionary. Is there an alternative that lets me specify different timeout values for each key?
It is easy to build yourself. Ingredients: a normal dict to store the values; a heap (a list managed with heapq) to store (expiry, key) pairs; and a Thread running a loop that checks the top of the heap and deletes (or marks as expired, depending on what you need) entries while the top's expiry is in the past (don't forget to let it sleep between checks). When you add to the dict, also push (now + ttl, key) onto the heap. There are some details you might want to attend to (e.g. removing entries from the heap if you delete from the dict, though that would be a bit slow, as you'd have to search the heap and then re-heapify; again, only necessary if your use case requires it), but the basic idea is quite simple.
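A minimal sketch of that recipe: a dict, a heapq heap and a background thread. The class and method names (TTLDict, set, get, purge_expired, stop) are hypothetical. Two choices go beyond the description above and are assumptions: it uses time.monotonic() so clock changes don't affect TTLs, and it stores each key's expiry next to its value so that a stale heap entry left behind by re-setting a key cannot delete the newer value (one of the "details" mentioned). A sequence counter breaks ties so keys never need to be comparable.

```python
import heapq
import itertools
import threading
import time


class TTLDict:
    """Dict-like store in which every key has its own time to live (seconds)."""

    def __init__(self, check_interval=0.5):
        self._data = {}                  # key -> (value, expiry)
        self._heap = []                  # (expiry, seq, key), earliest expiry on top
        self._seq = itertools.count()    # tie-breaker so keys are never compared
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._interval = check_interval
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def set(self, key, value, ttl):
        expiry = time.monotonic() + ttl
        with self._lock:
            self._data[key] = (value, expiry)
            heapq.heappush(self._heap, (expiry, next(self._seq), key))

    def get(self, key, default=None):
        with self._lock:
            item = self._data.get(key)
        # Treat entries that expired but are not yet purged as missing
        if item is None or item[1] <= time.monotonic():
            return default
        return item[0]

    def purge_expired(self):
        """Pop heap entries whose expiry has passed and delete their keys."""
        now = time.monotonic()
        with self._lock:
            while self._heap and self._heap[0][0] <= now:
                expiry, _, key = heapq.heappop(self._heap)
                item = self._data.get(key)
                # Skip stale heap entries left behind when a key was re-set
                if item is not None and item[1] == expiry:
                    del self._data[key]

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called
        while not self._stop.wait(self._interval):
            self.purge_expired()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def __len__(self):
        with self._lock:
            return len(self._data)
```

Checking the expiry again in get() means a value is never returned after its TTL, even if the thread has not woken up yet; the thread only exists to reclaim memory.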
One place to look for inspiration might be Django's LocMemCache, an in-memory key-value store. It basically wraps a _cache dict that holds the actual cached values and an _expire_info dict that stores each key's expiry. Then, in get(), it calls self._has_expired(), which compares the stored expiry with the current timestamp from time.time().
You can find the class at django.core.cache.backends.locmem.
Granted, this is not a dict subclass; as mentioned above, it actually wraps two separate dictionaries, one for caching and one for storing expiries, but its API is dictionary-like.
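A minimal sketch of the lazy, two-dict design described above. The attribute names _cache, _expire_info and _has_expired follow the description, but this is not Django's code: LazyExpiringDict is a hypothetical name, and the real class also handles things such as locking and culling. Unlike the heap-and-thread version, nothing runs in the background; an expired entry is removed only when someone reads it.

```python
import time


class LazyExpiringDict:
    """Per-key TTL with expiry checked lazily on read (no background thread)."""

    def __init__(self):
        self._cache = {}        # key -> value
        self._expire_info = {}  # key -> absolute expiry from time.time(), or None

    def set(self, key, value, timeout=None):
        self._cache[key] = value
        # None means "never expires"
        self._expire_info[key] = None if timeout is None else time.time() + timeout

    def _has_expired(self, key):
        exp = self._expire_info[key]
        return exp is not None and exp <= time.time()

    def get(self, key, default=None):
        if key not in self._cache:
            return default
        if self._has_expired(key):
            # Drop the expired entry at read time
            del self._cache[key]
            del self._expire_info[key]
            return default
        return self._cache[key]
```

The trade-off is that expired entries that are never read again stay in memory, which is why a size limit or periodic cleanup is usually added on top.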

Dictionary in Python which keeps the last x accessed keys

Is there a dictionary in Python that keeps only the most recently accessed keys? Specifically, I am caching relatively large blobs of data in a dictionary, and I am looking for a way to prevent the dictionary from ballooning in size by dropping entries that were last accessed a long time ago (i.e. keeping only, say, the 1000 most recently accessed keys, and when a new key is added, dropping the key that was accessed longest ago).
I suspect this is not part of the standard dictionary class, but am hoping there is something analogous.
Sounds like you want a Least Recently Used (LRU) cache.
Here's a Python implementation already: https://pypi.python.org/pypi/lru-dict/
Here's another one: https://www.kunxi.org/blog/2014/05/lru-cache-in-python/
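If you'd rather avoid a dependency, a minimal LRU mapping can be sketched with the standard library's collections.OrderedDict, which keeps insertion order and offers move_to_end() and popitem(last=False). LRUDict is a hypothetical name, and only reads through [] count as an access here. (For memoizing a function rather than a dict, the standard library also has functools.lru_cache.)

```python
from collections import OrderedDict


class LRUDict:
    """Mapping that keeps at most `maxsize` keys, evicting the least recently used."""

    def __init__(self, maxsize=1000):
        self.maxsize = maxsize
        self._data = OrderedDict()  # oldest access first, newest last

    def __getitem__(self, key):
        value = self._data[key]        # raises KeyError if missing
        self._data.move_to_end(key)    # mark as most recently used
        return value

    def __setitem__(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used key

    def __contains__(self, key):
        # Membership tests do not count as an access
        return key in self._data

    def __len__(self):
        return len(self._data)
```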

python dictionary structure, speed concerns

I am new to Python. I need a data structure to store counts of some objects. For example, I want to store the most visited webpages. Let's say I have the 100 most visited webpages, and I keep the count of visits to each one. I may need to update the list, and I will definitely update the visit counts. It does not have to be ordered. I will look up the visit count for a given webpage ID. I am planning to use a dictionary. Is there a faster way of doing this in Python?
The dictionary is an appropriate and fast data structure for this task (mapping webpage IDs to visit counts).
Python dictionaries are implemented as hash tables, giving O(1) average-case lookups and updates. They are so fast that almost any attempt to avoid them will make the code run slower and be unpleasant to read.
P.S. Also take a look at collections.Counter, which is designed specifically for this kind of work (counting hits). It is a dict subclass in which missing keys count as zero.
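A short sketch of counting visits with collections.Counter; the page IDs are made-up sample data. Reading a missing key returns 0 without inserting it, and most_common() gives the top pages when you need them.

```python
from collections import Counter

visits = Counter()
for page_id in ["home", "about", "home", "blog", "home"]:  # sample visit log
    visits[page_id] += 1        # no need to initialise a key first

print(visits["home"])           # 3
print(visits["missing"])        # 0, and "missing" is not added
print(visits.most_common(1))    # [('home', 3)]
```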
The Python dictionary is one of the most optimized parts of the whole language, because dictionaries are used everywhere.
For example, normally every instance of every class uses a dictionary to hold its instance attributes, a class keeps its methods in a dictionary, each module uses a dictionary to hold its globals, the interpreter uses a dictionary (sys.modules) to store and look up imported modules, and so on.
Using a dictionary to keep a counter is a good approach in Python.
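A small illustration of those dictionaries, using a throwaway Point class (a hypothetical example). "Normally" matters here: classes that define __slots__, for instance, have no per-instance __dict__.

```python
import sys


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm2(self):
        return self.x ** 2 + self.y ** 2


p = Point(3, 4)
print(p.__dict__)                  # {'x': 3, 'y': 4}: instance attributes
print("norm2" in Point.__dict__)   # True: methods live in the class namespace
print(type(globals()) is dict)     # True: module globals are a plain dict
print(sys.modules["sys"] is sys)   # True: imported modules are looked up in a dict
```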

How can I test the validity of a ReferenceProperty in Appengine?

I am currently testing a small application I have written. I have not been careful enough in keeping the data in my datastore consistent, and now I have some records referencing objects that no longer exist. More specifically, some of my objects have ReferenceProperty attributes that were assigned values; the objects referred to have since been deleted, but the references remain.
I would like to add some checks to my code to ensure that referenced objects exist, and act accordingly. (Of course, I also need to clean up my data.)
One approach is to just try to get() it; however, this would probably retrieve the entire object. I'd like an approach that just tests for the object's existence, ideally incurring only the cost of key operations on the datastore rather than full entity reads.
So, my question is the following: is there a simple mechanism to test if a given ReferenceProperty is valid which only involves key access operations (rather than full entity get operations)?
This will test key existence without returning an entity:
db.Query(keys_only=True).filter('__key__ =', test_key).count(1) == 1
I'm not certain that it's computationally cheaper than fetching the entity.

map data type for python google app engine

I would like to have a map data type for one of my entity types in my Python Google App Engine application. I think what I need is essentially the Python dict type, where I can store a set of key-value mappings. I don't see any obvious way to do this with the data types App Engine provides.
The reason is that I have a User entity, and I'd like to track within that user a mapping of lessonIds to values that represent the user's status for each lesson. I'd like to do this without creating a whole new entity, perhaps called UserLessonStatus, that references the User and has to be queried, since I often want to iterate through all the lesson statuses. Maybe it is better done that way, in which case I'd appreciate confirmation that this is the best approach. Otherwise, if someone knows a good way to create a mapping within the User entity itself, that'd be great.
One solution I considered is using two ListProperties in conjunction: when adding an item, append the key and value to their respective lists; when looking up, find the key's index in one list and use that index in the other; when removing, find the index in one list and remove that position from both; and so forth.
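The parallel-list bookkeeping can be sketched in plain Python. Here two ordinary lists, lesson_ids and statuses, stand in for the two ListProperty values on the User entity (an assumption; no App Engine API is used), and the helper functions are hypothetical. Every operation is a linear scan with list.index(), which is fine for a modest number of lessons.

```python
def set_status(lesson_ids, statuses, lesson_id, status):
    """Insert or update lesson_id -> status, keeping both lists aligned."""
    try:
        statuses[lesson_ids.index(lesson_id)] = status
    except ValueError:  # lesson_id not present yet
        lesson_ids.append(lesson_id)
        statuses.append(status)


def get_status(lesson_ids, statuses, lesson_id, default=None):
    """Find the key's index in one list and read the other list at it."""
    try:
        return statuses[lesson_ids.index(lesson_id)]
    except ValueError:
        return default


def remove_status(lesson_ids, statuses, lesson_id):
    """Remove the same position from both lists (ValueError if absent)."""
    i = lesson_ids.index(lesson_id)
    del lesson_ids[i]
    del statuses[i]
```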
You're probably better off using another kind, as you suggest. If you do want to store it all in one entity, though, you have several options; parallel lists, as you suggest, are one. You could also simply pickle a Python dictionary, assuming you don't need to query on it.
You may also want to check out the ndb project, which supports nested entities; that would be another viable solution.
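A sketch of the pickling option with the standard library's pickle module. Storing the bytes in a blob-type property on the User entity is an assumption, and the App Engine model code is not shown; the lesson data is made up. Only unpickle data your own application wrote, since pickle.loads can execute arbitrary code.

```python
import pickle

# Hypothetical lesson-status mapping for one user
lesson_status = {"lesson-1": "done", "lesson-2": "started"}

blob = pickle.dumps(lesson_status)   # bytes to store on the entity
restored = pickle.loads(blob)        # rebuild the dict after loading the entity

restored["lesson-3"] = "started"     # edit as a normal dict...
blob = pickle.dumps(restored)        # ...then re-pickle before saving
```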
