Python: Efficient lookup in a list of dicts

I have a huge list of dicts (or objects) that all have exactly the same fields. I need to search this collection to retrieve a single object based on the given criteria (there could be many matches, but I take only the first one).
The search criteria never use all of the fields, and the fields they do use vary from query to query, so turning the list into a dictionary keyed on a hash of some values won't work.
This looks like a job for a database, so I'm thinking about using an in-memory sqlite database for it. The problem is that I'd need to write some kind of wrapper that translates between my Python objects and SQL queries, which makes me think that perhaps there's already a solution for that somewhere.
Maybe someone already had a similar problem, and there's already a tool that will help me with that? Or sqlite is the only way?
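For concreteness, the in-memory sqlite idea I'm picturing looks roughly like this (the field names here are made up):

import sqlite3

records = [
    {"name": "alpha", "size": 3, "colour": "red"},
    {"name": "beta", "size": 7, "colour": "blue"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, size INTEGER, colour TEXT)")
conn.executemany("INSERT INTO items VALUES (:name, :size, :colour)", records)

# Take the first row matching an arbitrary combination of fields.
row = conn.execute(
    "SELECT * FROM items WHERE size > ? AND colour = ? LIMIT 1",
    (5, "blue"),
).fetchone()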

Related

How can I prefetch_related() everything related to an object?

I'm trying to export all data connected to a User instance to a CSV file. In order to do so, I need to get it from the DB first. Using something like
data = SomeModel.objects.filter(owner=user)
on every possible model seems very inefficient, so I want to use prefetch_related(). My question is: is there any way to prefetch, in one go, the instances of every model that has a FK pointing at my User?
Actually, you don't need to "prefetch everything" in order to create a CSV file (or anything else), and you really don't want to. Python's CSV support is designed to work "row by row," and that's what you want to do here: in a loop, read one row at a time from the database and write it one row at a time to the file.
Remember that Django is lazy. Functions like filter() specify what the filtration is going to be, but things really don't start happening until you start to iterate over the actual collection. That's when Django will build the query, submit it to the SQL engine, and start retrieving the data that's returned ... one row at a time.
Let the SQL engine, Python and the operating system take care of "efficiency." They're really good at that sort of thing.
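A minimal sketch of that loop, assuming SomeModel from the question and made-up field names (title, created):

import csv
from myapp.models import SomeModel  # hypothetical app path

def export_to_csv(user, path):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "title", "created"])  # header row
        # iterator() streams rows instead of caching the whole queryset
        for obj in SomeModel.objects.filter(owner=user).iterator():
            writer.writerow([obj.id, obj.title, obj.created])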

Python & MongoDB - Store data in descending order

I am using MongoDB with Python. I used the following command to insert my documents:
db.test.insert_one({"Name": name, "Age": age})
(name and age are variables within my code)
I used the following command to sort from oldest age to youngest:
db.test.find().sort("Age", 1)
I understand that I am simply issuing a find command within my code. Is there a clever way to use the db.test.save() method to sort my documents and overwrite the original with the sorted documents?
I looked into many questions that somewhat address this problem such as this:
How to store sorted array back to MongoDB?
But in order to do the $push command inside update_one and use the $sort feature, I would have to re-work my entire code. What are my options? Do I have to re-work my code or is there another way?
You are mixing up two concepts in mongo: sorting an array embedded in a document and sorting documents in a collection by some properties.
The post you link to refers to the former. You seem to be trying to permanently sort the documents in a collection.
Mongo stores documents in roughly the order they are written in. Like most databases it uses indexes to make retrieving data more efficient.
In the instance you describe, you would create an index on the age property of your document.
Use the create_index function (ensure_index does the same job but is deprecated in newer pymongo); the key and direction are given as a list of tuples:
db.test.create_index([("Age", pymongo.DESCENDING)])
Henceforth, every query like
db.test.find().sort("Age", -1)
uses the index and will be very fast.
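For context, an end-to-end sketch with pymongo (the database name here is made up):

import pymongo

client = pymongo.MongoClient()
test = client["mydb"]["test"]

# Build the index once; queries sorting on Age can then use it.
test.create_index([("Age", pymongo.DESCENDING)])

for doc in test.find().sort("Age", pymongo.DESCENDING):
    print(doc["Name"], doc["Age"])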
I suppose you could try rewriting the documents to another collection in sorted order and then see whether a simple fetch always comes back in age order. But that really isn't the right way to think about Mongo.

Does Django QuerySet use binary search to filter by date?

I have a Location object in my Django app that uses Simple History (https://github.com/treyhunner/django-simple-history), and I need to frequently query that object for its history between two dates. I know you can do that with:
Location.objects.filter(id=1, history_date__range=(starttime, endtime))
However, I noticed that all history for a given id is ordered from latest to earliest. That means I should be able to do a binary search on that list to get all dates included.
My question is: would hand-coding a Python binary search over the results of
Location.objects.filter(id=1)
be faster or slower than just using the query described above?
It all depends on the database you're using. Most databases have an implicit "natural ordering" of rows, roughly the order in which they were inserted. That may be the order you happen to see, but nothing guarantees it.
If you're concerned with query time, put an index on history_date; the database can then find the start of the requested range in O(log n) and read the matching rows from there, instead of scanning the whole table. The trade-off is that inserts, updates, and deletes become slightly slower, because the index has to be maintained.
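As a generic illustration (this is not the django-simple-history API itself; the model and field names are made up), an indexed date field in Django looks like this:

from django.db import models

class Reading(models.Model):
    taken_at = models.DateTimeField(db_index=True)  # index on the date column

# A range filter such as
#   Reading.objects.filter(taken_at__range=(starttime, endtime))
# can then use the index instead of scanning every row.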

python dictionary structure, speed concerns

I am new to Python. I need a data structure to store counts of some objects. For example, I want to track the most visited webpages. Let's say I have the 100 most visited webpages, and I keep a count of visits for each one. I may need to update the list, and I will definitely update the visit counts. The structure does not have to be ordered; I will look up the visit count for a given webpage ID. I am planning to use a dictionary. Is there a faster way of doing this in Python?
The dictionary is an appropriate and fast data structure for this task (mapping webpage IDs to visit counts).
Python dictionaries are implemented using hash tables for fast O(1) access. They are so fast that almost any attempt to avoid them will make code run slower and make the code unpleasant to look at.
P.S. Also take a look at collections.Counter, which is designed specifically for this kind of work (counting hits). It is a dict subclass whose missing keys default to zero.
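A minimal sketch (the page IDs are made up):

from collections import Counter

visits = Counter()
visits["page-42"] += 1   # record a visit
visits["page-42"] += 1
visits["page-7"] += 1

print(visits["page-42"])      # 2
print(visits.most_common(2))  # the two most-visited pages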
The Python dictionary is one of the most heavily optimized parts of the whole language, precisely because dictionaries are used everywhere.
For example, every instance of every class normally uses a dictionary to hold its attributes, a class is essentially a dictionary containing its methods, modules use a dictionary for their globals, the interpreter uses a dictionary to keep track of and look up modules, and so on.
Keeping a counter in a dictionary is a good approach in Python.

map data type for python google app engine

I would like to have a map data type for one of my entity types in my python google app engine application. I think what I need is essentially the python dict datatype where I can create a list of key-value mappings. I don't see any obvious way to do this with the provided datatypes in app engine.
The reason I'd like to do this is that I have a User entity, and I'd like to track within that user a mapping from lessonIds to values representing the user's status with each lesson. I'd like to avoid creating a whole new entity (something like UserLessonStatus) that references the User and has to be queried, since I often want to iterate through all the lesson statuses. Maybe it is better done that way, in which case I'd appreciate opinions saying so. Otherwise, if someone knows a good way to create a mapping within my User entity itself, that'd be great.
One solution I considered is using two ListProperties in conjunction: when adding an entry, append the key to one list and the value to the other; when looking up, find the index of the key in the first list and use that index in the second; when removing, find the index in one list and delete the element at that index from both; and so forth.
You're probably better off using another kind, as you suggest. If you do want to store it all in the one entity, though, you have several options: parallel lists, as you describe, are one; you could also simply pickle a Python dictionary, assuming you don't want to query on it.
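A rough sketch of the pickled-dictionary option with the old db API (the property name is made up):

import pickle
from google.appengine.ext import db

class User(db.Model):
    lesson_status_blob = db.BlobProperty()  # pickled dict: lessonId -> status

    def get_lesson_statuses(self):
        if not self.lesson_status_blob:
            return {}
        return pickle.loads(self.lesson_status_blob)

    def set_lesson_statuses(self, statuses):
        self.lesson_status_blob = db.Blob(pickle.dumps(statuses))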
You may also want to check out the ndb project, which supports nested (structured) entities; that would be another viable way to do this.
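With ndb, the nested-entity version might look roughly like this (the names are illustrative):

from google.appengine.ext import ndb

class LessonStatus(ndb.Model):
    lesson_id = ndb.StringProperty()
    status = ndb.StringProperty()

class User(ndb.Model):
    lessons = ndb.StructuredProperty(LessonStatus, repeated=True)

user = User(lessons=[LessonStatus(lesson_id="intro", status="done")])
user.put()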
