I have a very simple problem. I need to create a model that represents an element of an ordered list. The model can be implemented like this:
    class Item(models.Model):
        data = models.TextField()
        order = models.IntegerField()
or like this:
    class Item(models.Model):
        data = models.TextField()
        next = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)
Which way is preferred? What drawbacks does each solution have?
Essentially, the second solution you propose is a linked list. Linked lists implemented at the database level are usually not a good idea. To retrieve a list of n elements, you need n database accesses (or complicated queries). Performance-wise, retrieving a list in O(n) queries is awfully inefficient.
In regular code, linked lists are used to get better insert performance compared to arrays (no need to move all the elements around). In a database, updating all the elements is not that complicated; it takes only two queries:
    UPDATE item SET "order" = "order" + 1 WHERE "order" >= 3;
    INSERT INTO item ("order", ...) VALUES (3, ...);
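The two-query insert-at-position can be sketched with the standard library's sqlite3 module (the table and column names here are illustrative, and `order` must be quoted because it is an SQL keyword):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE item ("order" INTEGER, data TEXT)')
conn.executemany('INSERT INTO item VALUES (?, ?)',
                 [(1, "a"), (2, "b"), (3, "c")])

# make room at position 2, then insert the new row there
conn.execute('UPDATE item SET "order" = "order" + 1 WHERE "order" >= 2')
conn.execute('INSERT INTO item VALUES (2, "new")')

ordered = [row[1] for row in
           conn.execute('SELECT "order", data FROM item ORDER BY "order"')]
```

After the two statements, `ordered` comes back as `["a", "new", "b", "c"]`: everything at or beyond the target position shifted up by one, and the new row slotted in.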
I remember seeing a reusable app that implemented all that, plus a nice admin interface, but I can't find it right now ...
To summarize: definitely use solution #1, and stay away from solution #2 unless you have a very, very good reason not to!
That depends on what you want to do.
The first one makes it easy to run a single query against the database and get all the data back in the correct order.
The second one makes it easier to insert an element between two existing elements (with the first, you would have to renumber many items if the order values are sequential).
I'd use the first one, because it fits better with a database table, which is how Django stores model data under the hood.
There is another solution.
    class Item(models.Model):
        data = models.TextField()
You can just pickle or marshal a Python list into the data field and then load it back. This is good for updating and reading the whole list, but not for searching, e.g. fetching all lists that contain a specific item.
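The pickle round-trip can be sketched in plain Python; base64 is added here (my choice, not required) so the serialized bytes stay text-safe for a TextField:

```python
import base64
import pickle

items = ["alpha", "beta", "gamma"]

# serialize the list; base64 keeps the bytes safe to store as text
stored = base64.b64encode(pickle.dumps(items)).decode("ascii")

# later: load it back from the field
loaded = pickle.loads(base64.b64decode(stored.encode("ascii")))
```

`loaded` is equal to the original list, but note that nothing inside `stored` is queryable at the SQL level.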
Related
In my app I need to do a fast query, but I don't know which is faster:

    materials = Material.objects.only('name')

or doing the filtering in the view:

    materials = Material.objects.all()

and then using a for loop to show the list of items from the 'name' column.
I think the first is better, or is there a better way to do this?
It can't be done with filter() because I need to show all of the fields in each row.
If you only want the names, you can use a .values_list(..):
    materials = list(Material.objects.values_list('name', flat=True))
This will avoid wrapping the records in Material objects. That being said, unless some of the columns contain (very) large amounts of data, using .only(..) will not significantly speed up the process. Furthermore software design-wise it is often better to fetch Material objects, since that means that you can define behavior in your Material model.
I am defining the models for my Django app, and I would like to have a field for a model consisting of a tuple of two (positive) integers. How can I do this? I'm looking at the Django Models API reference but I can't see any way of doing this.
That depends on how you intend to use them after storing them in the database; two methods I can think of are:
Option 1)
    models.IntegerField(unique=True)
The trick is loading and parsing the data: you would have to concatenate the two numbers, then have a way to split them back out.
Faster would be:
Option 2)
    models.CommaSeparatedIntegerField(max_length=1024, unique=True)
Not sure how it handles unique values; likely '20,40' is not equal to '40,20', so those two pairs would count as distinct.
Or just implement it yourself with a custom field, or helper functions on the model.
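Option 1's concatenate-and-split idea can be sketched with bit packing in plain Python (a sketch of my own, assuming each value fits in 32 bits; the names `pack`/`unpack` are illustrative):

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1  # low 32 bits

def pack(a, b):
    # store two positive integers in a single integer column value
    assert 0 <= a <= MASK and 0 <= b <= MASK
    return (a << WIDTH) | b

def unpack(n):
    # recover the original pair
    return n >> WIDTH, n & MASK
```

Because the packing is a bijection, `unique=True` on the single column then enforces uniqueness of the (ordered) pair.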
I'm using SQLAlchemy (being relatively new both to it and SQL) and I want to get a list of all comments posted to a set of things, but I'm only interested in comments that have been posted since a certain date, and the date is different for each thing:
To clarify, here's what I'm doing now: I begin with a dictionary that maps the ID code of each thing I'm interested in to the date I'm interested in for that thing. I do a quick list comprehension to get a list of just the codes (thingCodes) and then do this query:
    things = meta.Session.query(Thing)\
        .filter(Thing.objType.in_(['fooType', 'barType']))\
        .filter(Thing.data.any(and_(Data.key == 'thingCode', Data.value.in_(thingCodes))))\
        .all()
which returns a list of the thing objects (I do need those in addition to the comments). I then iterate through this list, and for each thing do a separate query:
    comms = meta.Session.query(Thing)\
        .filter_by(objType='comment')\
        .filter(Thing.data.any(wc('thingCode', code)))\
        .filter(Thing.date >= date)\
        .order_by(Thing.date.desc()).all()
This works, but it seems horribly inefficient to be doing all these queries separately. So, I have two questions:
a) Rather than running the second query n times for an n-length list of things, is there a way I could do it in a single query while still returning a separate set of results for each ID (presumably in the form of a dictionary of ID's to lists)? I suppose I could do a value_in(listOfIds) to get a single list of all the comments I want and then iterate through that and build the dictionary manually, but I have a feeling there's a way to use JOINs for this.
b) Am I over-optimizing here? Would I be better off with the second approach I just mentioned? And is it even that important that I roll them all into a single transaction? The bulk of my experience is with Neo4j, which is pretty good at transparently nesting many small transactions into larger ones - does SQL/SQLAlchemy have similar functionality, or is it definitely in my interest to minimize the number of queries?
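For (a), one hedged sketch of the manual-bucketing approach: fetch every candidate comment in a single query (e.g. filtering on the earliest date in the dictionary), then apply each thing's own cutoff while grouping in Python. Assuming the query yields (thingCode, date, comment) rows (the sample data below is made up):

```python
from collections import defaultdict

cutoffs = {"t1": 2, "t2": 4}  # thing code -> earliest date of interest

# pretend result of one query over all codes: (thing_code, date, comment)
rows = [("t1", 5, "c1"), ("t2", 3, "c2"), ("t1", 1, "c3"), ("t2", 6, "c4")]

comments_by_code = defaultdict(list)
for code, date, comment in rows:
    if date >= cutoffs[code]:  # apply this thing's own cutoff
        comments_by_code[code].append(comment)
```

One round trip to the database, then O(n) dictionary work in the application; the per-thing date filter that made a single SQL WHERE clause awkward becomes a one-line check.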
I currently have a function that writes one to four entries into a database every 12 hours. When certain conditions are met, the function is called again to write another 1-4 entries based on the previous ones. Since time isn't the only factor, I have to check whether the conditions are met, and because the entries are all in the same database, I have to differentiate them by the time they were posted (there is a DateTimeField in the model).
How could I achieve this? Is there a function built into Django that I just couldn't find? Or would I have to look at a rather complicated solution?
As a sketch, I'd expect something like this:

    latest = []
    allData = myManyToManyField.objects.filter(externalId=2)
    for data in allData:
        if data.Timestamp.checkIfLatest():  # checkIfLatest returns True/False
            latest.append(data)
or even better, something like this (although I don't think that's implemented):

    latest = myManyToManyField.objects.get.latest.filter(externalId=2)
The Django documentation is very, very good, especially with regard to querysets and model-layer functions. It's usually the first place you should look. It sounds like you want .latest(), but it's hard to tell with your requirements regarding conditions.
    latest_entry = m2mfield.objects.latest('mydatefield')
    if latest_entry.somefield:
        # do something
Or perhaps you wanted:
    latest_entry = m2mfield.objects.filter(somefield=True).latest('mydatefield')
You might also be interested in order_by(), which will order the rows according to a field you specify. You could then iterate on all the m2m fields until you find the one that matches a condition.
But without more information on what these conditions are, it's hard to be more specific.
It's just a thought: we can keep an epoch-time field (the current time of the entry) in the database as a primary key, then compare against the previous entries to differentiate them.
I have a very large dataset - millions of records - that I want to store in Python. I might be running on 32-bit machines so I want to keep the dataset down in the hundreds-of-MB range and not ballooning much larger than that.
These records represent an M:M relationship: two IDs (foo and bar) and some simple metadata such as timestamps (baz).
Some foo have nearly all bar in them, and some bar have nearly all foo. But there are many bar that have almost no foo, and many foo that have almost no bar.
If this were a relational database, an M:M relationship would be modelled as a table with a compound key. You can, of course, comfortably search on either component key individually.
If you store the rows in a hashtable, however, you need to maintain three hashtables as the compound key is hashed and you can't search on the component keys with it.
If you have some kind of sorted index, you can abuse lexical sorting to iterate over the first key in the compound key, and you need a second index for the other key; but it's less obvious to me what actual data structure in the standard Python collections this equates to.
I am considering a dict of foo where each value is automatically promoted from a tuple (a single row) to a list (of row tuples) to a dict, depending on some thresholds, and another dict of bar where each value is a single foo or a list of foos.
Are there more efficient - speedwise and spacewise - ways of doing this? Any kind of numpy for indices or something?
(I want to store them in Python because I am having performance problems with databases - both SQL and NoSQL varieties. You end up bound by IPC, memcpy, and serialisation. That is another story; the key point is that I want to move the data into the application, rather than get recommendations to move it out of the application ;) )
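The "abuse lexical sorting" idea from the question can be sketched with the stdlib bisect module, assuming the rows are kept as a sorted list of (foo, bar, baz) tuples (the sample data and the helper name `rows_for_foo` are illustrative):

```python
import bisect

# rows sorted lexically by the compound key (foo, bar); baz is the metadata
rows = sorted([(1, 10, "x"), (2, 20, "y"), (1, 11, "z"), (3, 30, "w")])
keys = [(f, b) for f, b, _ in rows]  # parallel key list for bisect

def rows_for_foo(foo):
    # all rows whose compound key starts with foo, via two binary searches
    lo = bisect.bisect_left(keys, (foo,))
    hi = bisect.bisect_left(keys, (foo + 1,))
    return rows[lo:hi]
```

Each lookup on the first component is O(log n) plus the size of the result; the second component still needs its own index, exactly as the question anticipates.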
Have you considered using a NoSQL database that runs in memory, such as Redis? Redis supports a decent number of familiar data structures.
I realize you don't want to move outside of the application, but not reinventing the wheel can save time and quite frankly it may be more efficient.
If you need to query the data in a flexible way and maintain various relationships, I would suggest looking further into using a database, of which there are many options. How about using an in-memory database, like sqlite (using ":memory:" as the file)? You're not really moving the data "outside" of your program, and you will have much more flexibility than with multi-layered dicts.
Redis is also an interesting alternative, as it has other data-structures to play with, rather than using a relational model with SQL.
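The in-memory sqlite suggestion can be sketched with the standard library (schema and index names here are illustrative); a compound primary key gives the M:M table, and one extra index makes the second component searchable:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE mm (foo INTEGER, bar INTEGER, baz INTEGER, "
           "PRIMARY KEY (foo, bar))")
db.execute("CREATE INDEX mm_bar ON mm (bar)")  # search from the bar side too
db.executemany("INSERT INTO mm VALUES (?, ?, ?)",
               [(1, 10, 100), (1, 11, 101), (2, 10, 102)])

bars = sorted(r[0] for r in db.execute("SELECT bar FROM mm WHERE foo = 1"))
foos = sorted(r[0] for r in db.execute("SELECT foo FROM mm WHERE bar = 10"))
```

Both directions of the relationship are now index-backed lookups, with no IPC involved since sqlite runs in-process.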
What you describe sounds like a sparse matrix, where the foos are along one axis and the bars along the other one. Each non-empty cell represents a relationship between one foo and one bar, and contains the "simple metadata" you describe.
There are efficient sparse matrix packages for Python (scipy.sparse, PySparse) you should look at. I found these two just by Googling "python sparse matrix".
As to using a database, you claim that you've had performance problems. I'd like to suggest that you may not have chosen an optimal representation, but without more details on what your access patterns look like, and what database schema you used, it's awfully hard for anybody to contribute useful help. You might consider editing your post to provide more information.
NoSQL systems like Redis don't provide M:M tables.
In the end, a Python dict keyed by (foo, bar) pairs holding the values, plus a dict of the set of pairings for each term, was the best I could come up with:
    class MM:
        def __init__(self):
            self._a = {}   # set of Bs for each A
            self._b = {}   # set of As for each B
            self._ab = {}  # metadata for each (A, B) pair

        def add(self, a, b, meta=None):
            self._a.setdefault(a, set()).add(b)
            self._b.setdefault(b, set()).add(a)
            self._ab[(a, b)] = meta
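A self-contained sketch of that pair-keyed layout with plain dicts (the names `link`, `by_foo`, and `by_bar` are illustrative), showing a lookup from each side:

```python
pairs = {}   # (foo, bar) -> metadata
by_foo = {}  # foo -> set of bars
by_bar = {}  # bar -> set of foos

def link(foo, bar, baz):
    # record one M:M relationship and index it from both directions
    pairs[(foo, bar)] = baz
    by_foo.setdefault(foo, set()).add(bar)
    by_bar.setdefault(bar, set()).add(foo)

link("f1", "b1", 100)
link("f1", "b2", 101)
link("f2", "b1", 102)
```

Lookups from either side (`by_foo["f1"]`, `by_bar["b1"]`) and for a specific pair (`pairs[("f1", "b2")]`) are all O(1), at the cost of storing each key twice.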