SQLAlchemy: how can I order a table by a column permanently? - python

I'm not sure if this has been answered before, I didn't get anything on a quick search.
My table is built in a random order, but thereafter it is modified very rarely. I do frequent selects from the table and in each select I need to order the query by the same column. Now is there a way to sort a table permanently by a column so that it does not need to be done again for each select?

You can add an index sorted by the column you want. The data will be presorted according to that index.
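For example, in SQLAlchemy you can declare the index on the model itself. A minimal sketch, assuming a declarative model with a rank column (the model and column names are illustrative, not from the question):
from sqlalchemy import Column, Integer, String, Index
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Employee(Base):
    __tablename__ = 'employees'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    rank = Column(Integer, index=True)  # single-column index on the sort column
    # Or, for an explicitly named index:
    # __table_args__ = (Index('ix_employees_rank', 'rank'),)
You still write .order_by(Employee.rank) in each query; the index just makes satisfying that ordering cheap for the database.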

You can have only one place where you define it, and re-use that for every query:
def base_query(session, what_for):
    return session.query(what_for).order_by(what_for.rank_or_whatever)
Expand that as needed, then for all but very complex queries you can use that like so:
some_query = base_query(session(), Employee).filter(Employee.feet > 3)
The resulting query will be ordered by Employee.rank_or_whatever. If you are always querying for the same thing, you won't have to pass it as an argument, of course.
EDIT: If you could somehow define a "permanent" order on your table that the engine honours without being issued an ORDER BY, that would be an implementation feature specific to the RDBMS you use, and merely a convenience. Internally it makes no sense for a DBMS to be coerced into a particular storage order, since retrieving the data in a specific order is accomplished easily and efficiently with an INDEX; forcing a specific storage order would probably decrease overall performance.

Related

Order by the same list as the one used in IN using Peewee ORM

I'm trying to write a query in Peewee for MySQL and I'd like to do something similar to the solution offered here: Sort by order of values in a select statement "in" clause in mysql
That is, I'd like to select a table using a WHERE clause and an IN operator.
However, rather than have the results ordered based on values found in these tables, I'd like them arranged in the same order as the list I pass to the IN operator.
The alternative I'm using now is to simply loop through and accumulate on another list, but this takes much longer (~50-70% more time than just a simple query with an order_by).
Is there a way to do this more elegantly in Peewee?
Assuming that information isn't stored in your database, you probably want to just re-order the rows into a new list. This can be done in O(n) for the size of your result-set, so it should not be any slower than just iterating over the rows in the first place.
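A rough sketch of that re-ordering, assuming a Peewee model called MyModel and a list of ids called id_list (both names are placeholders):
rows = MyModel.select().where(MyModel.id.in_(id_list))

# Index the result set by primary key: O(n)
by_id = {row.id: row for row in rows}

# Re-emit the rows in the order of the original id list: O(n)
ordered = [by_id[i] for i in id_list if i in by_id]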

NDB: Sort query results

In App Engine NDB, I am querying entities that have a repeated property. I would like to order the result by the length of the array representing that property.
What I wish I could do:
Entity.query(...).order(len(Entity.repeatedProp))
You'll need to add an ndb.IntegerProperty() to your entity where you will store the length of the repeated property. Every time you change your repeated property, you'll need to update the stored length. Then you sort based on that stored length.
You could probably use a computed property, but I've never used one of those so I'm not sure.
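If you do want to try it, a computed property recalculates and stores the value every time the entity is put(), so it can be used in queries like any other indexed property. A minimal sketch (the property type here is just for illustration, and existing entities need to be re-put before the value exists for them):
from google.appengine.ext import ndb

class Entity(ndb.Model):
    repeatedProp = ndb.StringProperty(repeated=True)
    # Recomputed automatically whenever the entity is written:
    propCount = ndb.ComputedProperty(lambda self: len(self.repeatedProp))

# Order by the stored length, longest first:
results = Entity.query().order(-Entity.propCount).fetch()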
Depending on how many entities you are sorting, you could also just sort them in code.

Real-time access to simple but large data set with Python

I am currently facing the problem of having to frequently access a large but simple data set on a smallish (700 MHz) device in real time. The data set contains around 400,000 mappings from abbreviations to abbreviated words, e.g. "frgm" to "fragment". Reading will happen frequently when the device is used and should not require more than 15-20ms.
My first attempt was to use SQLite to create a simple database containing a single table in which each row consists of two strings:
CREATE TABLE WordMappings (key text, word text)
This table is created once and although alterations are possible, only read-access is time critical.
Following this guide, my SELECT statement looks as follows:
def databaseQuery(self, query_string):
    # Parameterized query; string concatenation would need manual quoting
    # and is vulnerable to SQL injection.
    self.cursor.execute("SELECT word FROM WordMappings WHERE key=? LIMIT 1;", (query_string,))
    result = self.cursor.fetchone()
    return result[0]
However, using this code on a test database with 20,000 abbreviations, I am unable to fetch data quicker than ~60ms, which is far too slow.
Any suggestions on how to improve performance using SQLite or would another approach yield more promising results?
You can speed up lookups on the key column by creating an index for it:
CREATE INDEX key_index ON WordMappings(key);
To check whether a query uses an index or scans the entire table, use EXPLAIN QUERY PLAN.
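For example, from Python's sqlite3 module (the database path and key are illustrative):
import sqlite3

conn = sqlite3.connect('mappings.db')
cur = conn.cursor()

cur.execute("EXPLAIN QUERY PLAN SELECT word FROM WordMappings WHERE key = ?", ("frgm",))
print(cur.fetchall())
# With the index in place you should see something like
# 'SEARCH TABLE WordMappings USING INDEX key_index (key=?)'
# rather than 'SCAN TABLE WordMappings'.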
A long time ago I tried to use SQLite for sequential data and it was not fast enough for my needs. At the time, I was comparing it against an existing in-house binary format, which I ended up using.
I have not used it personally, but a friend uses PyTables for large time-series data; it may be worth looking into.
It turns out that defining a primary key speeds up individual queries by an order of magnitude.
Individual queries on a test table with 400,000 randomly created entries (10/20 characters long) took no longer than 5ms, which satisfies the requirements.
The table is now created as follows:
CREATE TABLE WordMappings (key text PRIMARY KEY, word text)
A primary key is used because:
It is implicitly unique, which matches the abbreviations being stored.
It cannot be NULL, so the rows containing it are guaranteed to have a key; in our case, if they were NULL, the database would be corrupt.
Other users have suggested using an index. However, indexes are not necessarily unique, and according to the accepted answer to this question they unnecessarily slow down update/insert/delete performance. Nevertheless, using an index may well increase read performance as well; this has, however, not been tested by the original author.
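A rough way to check the timing on your own data, assuming the schema from this answer and a database file called mappings.db (both illustrative):
import sqlite3
import timeit

conn = sqlite3.connect('mappings.db')
cur = conn.cursor()

def lookup(key):
    cur.execute("SELECT word FROM WordMappings WHERE key=? LIMIT 1;", (key,))
    return cur.fetchone()

# Average over 1,000 lookups of a single key; substitute a key from your data.
per_call = timeit.timeit(lambda: lookup('frgm'), number=1000) / 1000
print("%.3f ms per lookup" % (per_call * 1000))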

"Nested" queries in SQL / SQLAlchemy

I'm using SQLAlchemy (being relatively new both to it and SQL) and I want to get a list of all comments posted to a set of things, but I'm only interested in comments that have been posted since a certain date, and the date is different for each thing:
To clarify, here's what I'm doing now: I begin with a dictionary that maps the ID code of each thing I'm interested in to the date I'm interested in for that thing. I do a quick list comprehension to get a list of just the codes (thingCodes) and then do this query:
things = meta.Session.query(Thing)\
    .filter(Thing.objType.in_(['fooType', 'barType'])) \
    .filter(Thing.data.any(and_(Data.key == 'thingCode', Data.value.in_(thingCodes)))) \
    .all()
which returns a list of the thing objects (I do need those in addition to the comments). I then iterate through this list, and for each thing do a separate query:
comms = meta.Session.query(Thing) \
    .filter_by(objType='comment') \
    .filter(Thing.data.any(wc('thingCode', code))) \
    .filter(Thing.date >= date) \
    .order_by('-date').all()
This works, but it seems horribly inefficient to be doing all these queries separately. So, I have 2 questions:
a) Rather than running the second query n times for an n-length list of things, is there a way I could do it in a single query while still returning a separate set of results for each ID (presumably in the form of a dictionary of IDs to lists)? I suppose I could do a value_in(listOfIds) to get a single list of all the comments I want and then iterate through that and build the dictionary manually, but I have a feeling there's a way to use JOINs for this.
b) Am I over-optimizing here? Would I be better off with the second approach I just mentioned? And is it even that important that I roll them all into a single transaction? The bulk of my experience is with Neo4j, which is pretty good at transparently nesting many small transactions into larger ones. Does SQL/SQLAlchemy have similar functionality, or is it definitely in my interest to minimize the number of queries?
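A rough sketch of the single-query approach described in (a), assuming the same Thing/Data models as above and a dictionary thing_dates mapping each thing code to its cutoff date (the dictionary name and the way the code is extracted from comment.data are assumptions):
from collections import defaultdict
from sqlalchemy import and_

codes = list(thing_dates)

comments = meta.Session.query(Thing) \
    .filter_by(objType='comment') \
    .filter(Thing.data.any(and_(Data.key == 'thingCode',
                                Data.value.in_(codes)))) \
    .all()

# Group the comments by thing code, keeping only those on or after that
# thing's cutoff date.
by_code = defaultdict(list)
for comment in comments:
    code = next(d.value for d in comment.data if d.key == 'thingCode')
    if comment.date >= thing_dates[code]:
        by_code[code].append(comment)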

Get the latest entries django

I currently have a function that writes one to four entries into a database every 12 hours. When certain conditions are met, the function is called again to write another 1-4 entries based on the previous ones. Since time isn't the only factor, I have to check whether or not the conditions are met, and because the entries are all in the same database I have to differentiate them based on the time they were posted to the database (there is a DateTimeField in the code).
How could I achieve this? Is there a function built into Django that I just couldn't find? Or would I have to resort to a rather complicated solution?
As a sketch, I'd expect something like this:
latest = []
allData = myManyToManyField.objects.get(externalId=2)
for data in allData:
    if data.Timestamp.checkIfLatest():  # checkIfLatest returns True/False
        latest.append(data)
Or, even better, something like this (although I don't think that's implemented):
latest = myManyToManyField.objects.get.latest.filter(externalId=2)
The Django documentation is very good, especially with regard to querysets and model-layer functions. It's usually the first place you should look. It sounds like you want .latest(), but it's hard to tell with your requirements regarding conditions.
latest_entry = m2mfield.objects.latest('mydatefield')
if latest_entry.somefield:
    # do something
Or perhaps you wanted:
latest_entry = m2mfield.objects.filter(somefield=True).latest('mydatefield')
You might also be interested in order_by(), which will order the rows according to a field you specify. You could then iterate over the related objects until you find one that matches a condition.
But without more information on what these conditions are, it's hard to be more specific.
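For example, something along these lines (field names as above):
entries = m2mfield.objects.order_by('-mydatefield')  # newest first
for entry in entries:
    if entry.somefield:  # whatever your condition is
        # found the most recent entry matching the condition
        break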
It's just a thought: you could keep an epoch-time field (the current time of the entry) in the database, for example as a primary key, and compare it with previous entries to differentiate them.
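A minimal sketch of that idea in Django (the model and field names are made up; note that two entries written in the same second would collide if the epoch time is the primary key, and the DateTimeField you already have usually serves the same purpose):
import time
from django.db import models

def epoch_now():
    return int(time.time())

class Entry(models.Model):
    # Illustrative only: store the insert time as epoch seconds.
    created_epoch = models.BigIntegerField(primary_key=True, default=epoch_now)

# Most recent entry:
latest = Entry.objects.order_by('-created_epoch').first()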
