Find parent with certain combination of child rows - SQLite with Python

There are several parts to this question. I am working with sqlite3 in Python 2.7, but I am less concerned with the exact syntax, and more with the methods I need to use. I think the best way to ask this question is to describe my current database design, and what I am trying to accomplish. I am new to databases in general, so I apologize if I don't always use correct nomenclature.
I am modeling refrigeration systems (using Modelica--not really important to know), and I am using the database to manage input data, results data, and models used for that data.
My top parent table is Model, which contains the columns:
id, name, version, date_created
My child table under Model is called Design. It is used to create a unique id for each combination of design input parameters and the model used. The columns it contains are:
id, model_id, date_created
I then have two child tables under Design, one called Input, and the other called Result. We can just look at Input for now, since one example should be enough. The columns for input are:
id, value, design_id, parameter_id, component_id
parameter_id and component_id are foreign keys to their own tables. The Parameter table has the following columns:
id, name, units
Some example rows for Parameter under name are: length, width, speed, temperature, pressure (there are many dozens more). The Component table has the following columns:
id, name
Some example rows for Component under name are: compressor, heat_exchanger, valve.
Ultimately, in my program I want to search the database for a specific design, either to grab the results for that design or to know whether a model simulation with that design has already been run, so I can avoid re-running the same data point.
I also want to be able to grab all the parameters for a given design and load them into a class I have created in Python, which then provides the inputs to my models. In case it helps for solving the problem, the classes I have created are based on the components. So, for example, I have a compressor class with attributes like compressor.speed, compressor.stroke, and compressor.piston_size. Each of these attributes has its own row in the Parameter table.
So, how would I query this database efficiently to find if there is a design that matches a long list (let's assume 100+) of parameters with specific values? Just as a side note, my friend helped me design this database. He knows databases, but not my application super well. It is possible that I designed it poorly for what I want to accomplish.
Here is a simple picture trying to map a certain combination of parameters with certain values to a design_id, where I have taken out component_id for simplicity:
[Picture of simplified tables]

Simply join the necessary tables. Your schema properly reflects normalization (separating tables into logical groupings) and can scale for one-to-many relationships. Specifically, to answer your question ("So, how would I query this database efficiently to find if there is a design that matches a long list (let's assume 100+) of parameters with specific values?"), consider the approaches below.
Inner Join with WHERE Clause
For a handful of parameters, use an inner join with a WHERE ... IN () clause. The query below returns design fields joined to the input and parameters tables, filtered for specific parameter names, which you can have Python pass as parameterized values (even iteratively in a loop):
SELECT d.id, d.model_id, d.date_created
FROM design d
INNER JOIN input i ON d.id = i.design_id
INNER JOIN parameters p ON p.id = i.parameter_id
WHERE p.name IN ('param1', 'param2', 'param3', 'param4', 'param5', ...)
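As an illustration, here is a minimal sqlite3 sketch of passing those names as parameterized values from Python (the database file name and the example parameter names are assumptions; the table and column names follow the schema above):

import sqlite3

db = sqlite3.connect('models.db')  # hypothetical database file
cur = db.cursor()

param_names = ['length', 'width', 'speed']  # example parameter names
placeholders = ', '.join('?' for _ in param_names)
sql = ("SELECT d.id, d.model_id, d.date_created "
       "FROM design d "
       "INNER JOIN input i ON d.id = i.design_id "
       "INNER JOIN parameters p ON p.id = i.parameter_id "
       "WHERE p.name IN (%s)" % placeholders)
rows = cur.execute(sql, param_names).fetchall()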
Inner Join with Temp Table
If the list runs to 100+ values, consider a temp table that filters the parameters table down to the specific parameter names:
# CREATE EMPTY TEMP TABLE (SAME STRUCTURE AS parameters)
sql = "CREATE TABLE tempparams AS SELECT id, name, units FROM parameters WHERE 0;"
cur.execute(sql)
db.commit()

# ITERATIVELY APPEND TO TEMP
for name in paramslist:  # LIST OF 100+ ITEMS
    sql = """INSERT INTO tempparams (id, name, units)
             SELECT p.id, p.name, p.units
             FROM parameters p
             WHERE p.name = ?;"""
    cur.execute(sql, (name,))  # PARAMETER MUST BE PASSED AS A SEQUENCE, HENCE THE TUPLE
db.commit()  # COMMIT ONCE AFTER THE LOOP
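As a small variation (a sketch, not a requirement), the per-name inserts can be batched with executemany instead of a loop:

sql = ("INSERT INTO tempparams (id, name, units) "
       "SELECT p.id, p.name, p.units FROM parameters p WHERE p.name = ?;")
cur.executemany(sql, [(name,) for name in paramslist])
db.commit()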
Then join the main design and input tables with the new temp table holding the specific parameters:
SELECT d.id, d.model_id, d.date_created
FROM design d
INNER JOIN input i ON d.id = i.design_id
INNER JOIN tempparams t ON t.id = i.parameter_id
The same process can work with the components table as well.
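Running that final join from Python and cleaning up afterwards might look like this sketch (reusing the cur and db objects from above):

rows = cur.execute(
    "SELECT d.id, d.model_id, d.date_created "
    "FROM design d "
    "INNER JOIN input i ON d.id = i.design_id "
    "INNER JOIN tempparams t ON t.id = i.parameter_id").fetchall()

cur.execute("DROP TABLE tempparams")  # the temp table is no longer needed
db.commit()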

Related

Add table columns to select without adding it to select_from

I have a prepared function in the database, which I want to call using Gino. This function has a return type equal to one of the tables, which is created using declarative. What I try to do is:
select(MyModel).select_from(func.my_function)
The problem is that SQLAlchemy automatically detects the table in my select and adds it implicitly to select_from. The resulting SQL contains both my function and the table name in the FROM clause, and the result is a Cartesian product of the function result and the whole table (not what I want really).
My question is – can I somehow specify that I want to select all the columns for a model without having the corresponding class in the FROM?
You have to specify the columns (as an array) if you don't want SA to automatically add MyModel to the FROM clause.
You have to either do this:
select([your_model_table.c.column1, your_model_table.c.column2]).select_from(func.my_function)
Or, if you want all columns:
select(your_model_table.columns).select_from(func.my_function)
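To make that concrete, here is a minimal self-contained SQLAlchemy core sketch (the table definition and function name are made up for illustration; 1.x-style API):

from sqlalchemy import Column, Integer, MetaData, String, Table, func, select

metadata = MetaData()
# hypothetical table standing in for MyModel's underlying table
my_model_table = Table('my_model', metadata,
                       Column('id', Integer, primary_key=True),
                       Column('name', String))

# passing explicit columns keeps SQLAlchemy from adding the table to FROM
stmt = select(my_model_table.columns).select_from(func.my_function())
print(stmt)  # roughly: SELECT my_model.id, my_model.name FROM my_function()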

Best way to perform bulk insert SQLAlchemy

I have a table called products, which has the following columns:
id, product_id, data, activity_id
What I am essentially trying to do is copy a bulk of existing products, update their activity_id, and create new entries in the products table.
Example:
I already have 70 existing entries in products with activity_id 2.
Now I want to create another 70 entries with the same data except for an updated activity_id.
I could have thousands of existing entries that I'd like to copy, updating the copied entries' activity_id to a new id.
products = self.session.query(model.Products).filter(filter1, filter2).all()
This returns all the existing products for a filter.
Then I iterate through products, clone each existing product, and just update the activity_id field.
for product in products:
    product.activity_id = new_id

self.uow.skus.bulk_save_objects(simulation_skus)
self.uow.flush()
self.uow.commit()
What is the best/fastest way to do these bulk entries so that it cuts down the time? The performance is OK as of now, but is there a better solution?
You don't need to load these objects locally; all you really want to do is have the database create these rows.
You essentially want to run a query that creates the rows from the existing rows:
INSERT INTO product (product_id, data, activity_id)
SELECT product_id, data, 2 -- the new activity_id value
FROM product
WHERE activity_id = old_id
The above query would run entirely on the database server; this is far preferable over loading your query into Python objects, then sending all the Python data back to the server to populate INSERT statements for each new row.
Queries like that are something you could do with SQLAlchemy core, the half of the API that deals with generating SQL statements. However, you can use a query built from a declarative ORM model as a starting point. You'd need to:
1. Access the Table instance for the model, as that then lets you create an INSERT statement via the Table.insert() method. (You could also get the same object from the models.Product query; more on that later.)
2. Access the statement that would normally fetch the data for your Python instances for your filtered models.Product query; you can do so via the Query.statement property.
3. Update the statement to replace the included activity_id column with your new value, and remove the primary key (I'm assuming that you have an auto-incrementing primary key column).
4. Apply that updated statement to the Insert object for the table via Insert.from_select().
5. Execute the generated INSERT INTO ... FROM ... query.
Step 1 can be achieved by using the SQLAlchemy introspection API; the inspect() function, applied to a model class, gives you a Mapper instance, which in turn has a Mapper.local_table attribute.
Steps 2 and 3 require a little juggling with the Select.with_only_columns() method to produce a new SELECT statement where we swapped out the column. You can't easily remove a column from a select statement, but you can loop over the existing columns in the query to 'copy' them across to the new SELECT, making the replacement at the same time.
Step 4 is then straightforward: Insert.from_select() needs the columns that are inserted and the SELECT query. We have both, as the SELECT object also gives us its columns.
Here is the code for generating your INSERT; the **replace keyword arguments are the columns you want to replace when inserting:
from sqlalchemy import inspect, literal
from sqlalchemy.sql import ClauseElement

def insert_from_query(model, query, **replace):
    # The SQLAlchemy core definition of the table
    table = inspect(model).local_table
    # and the underlying core select statement to source new rows from
    select = query.statement
    # validate assumptions: make sure the query produces rows from the above table
    assert table in select.froms, f"{query!r} must produce rows from {model!r}"
    assert all(c.name in select.columns for c in table.columns), f"{query!r} must include all {model!r} columns"
    # updated select, replacing the indicated columns
    as_clause = lambda v: literal(v) if not isinstance(v, ClauseElement) else v
    replacements = {name: as_clause(value).label(name) for name, value in replace.items()}
    from_select = select.with_only_columns([
        replacements.get(c.name, c)
        for c in table.columns
        if not c.primary_key
    ])
    return table.insert().from_select(from_select.columns, from_select)
I included a few assertions about the model and query relationship, and the code accepts arbitrary column clauses as replacements, not just literal values. You could use func.max(models.Product.activity_id) + 1 as a replacement value (wrapped as a subselect), for example.
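For instance, that kind of replacement could look like this sketch (1.x-style API, names as in the answer):

from sqlalchemy import func, select

# hypothetical: use the current maximum activity_id + 1 as the new value
next_id = select([func.max(models.Product.activity_id) + 1]).as_scalar()
insert_stmt = insert_from_query(models.Product, products, activity_id=next_id)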
The above function executes steps 1-4, producing the desired INSERT SQL statement when printed (I created a products model and query that I thought might be representative):
>>> print(insert_from_query(models.Product, products, activity_id=2))
INSERT INTO products (product_id, data, activity_id) SELECT products.product_id, products.data, :param_1 AS activity_id
FROM products
WHERE products.activity_id != :activity_id_1
All you have to do is execute it:
insert_stmt = insert_from_query(models.Product, products, activity_id=2)
self.session.execute(insert_stmt)

How to implement a specific SQL statement as a SQLAlchemy ORM query

I was following a tutorial to make my first Flask API (https://medium.com/#dushan14/create-a-web-application-with-python-flask-postgresql-and-deploy-on-heroku-243d548335cc). I did it, but now I want to do more custom queries with SQLAlchemy and PostgreSQL. My question is how I could do something like this:
query = text("""SELECT enc.*, persona."Persona_Nombre", persona."Persona_Apellido", metodo."MetEnt_Nombre", metodo_e."MetPag_Descripcion"
FROM "Ventas"."Enc_Ventas" AS enc
INNER JOIN "General"."Persona" AS persona ON enc."PersonaId" = persona."PersonaId"
INNER JOIN "Ventas"."Metodo_Entrega" AS metodo ON enc."MetodoEntregaId" = metodo."MetodoEntregaId"
INNER JOIN "General"."Metodo_Pago" AS metodo_e ON enc."MetodoPagoId" = metodo_e."MetodoPagoId"
INNER JOIN "General"."Estatus" AS estado ON enc. """)
but with SQLAlchemy in order to use the models that I created previously. Thanks in advance for any answer!!
Edit:
The columns that I wish to see at the final result are: enc.*, persona."Persona_Nombre", persona."Persona_Apellido", metodo."MetEnt_Nombre", metodo_e."MetPag_Descripcion"
I really wish I could share more info but sadly I can't at the moment.
Doing this from the ORM layer, you would reference the model names (I match the names in your query above, but I'm sure some of the model/table names are off; now adjusted slightly).
Now revised to include only the specific columns you want to see (note that I ignore your SQL aliases; the ORM layer handles the actual query construction):
selection = session.query(Enc_Ventas,
                          Persona.Persona_Nombre,
                          Persona.Persona_Apellido,
                          Metodo_Entrega.MetEnt_Nombre,
                          Metodo_Pago.MetPag_Descripcion).\
    join(Persona, Enc_Ventas.PersonaId == Persona.PersonaId).\
    join(Metodo_Entrega, Enc_Ventas.MetodoEntregaId == Metodo_Entrega.MetodoEntregaId).\
    join(Metodo_Pago, Enc_Ventas.MetodoPagoId == Metodo_Pago.MetodoPagoId).\
    join(Estatus).all()
You would reference the selection collection by iterating through its rows of tuples. A more robust and stable solution would be to transform each output row into a dict.
Otherwise, because whole models are included, the columns of each returned row can be accessed with dot notation via the model names used in query().
If you need further access to the columns in the related tables, use the ORM technique of .options(joinedload(myTable)), which will bring in those additional columns in a single database query; you access them through the relationship name, also with dot notation.
You also need to define sqlalchemy relationships within your models for this to work, as well as define the underlying SQL foreign keys; see the sketch below.
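For example, a rough sketch of that relationship/joinedload pattern with two of the models (the column types, the primary key name VentaId, and the session object are assumptions; other names follow the question):

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import joinedload, relationship

Base = declarative_base()

class Persona(Base):
    __tablename__ = 'Persona'
    PersonaId = Column(Integer, primary_key=True)
    Persona_Nombre = Column(String)
    Persona_Apellido = Column(String)

class Enc_Ventas(Base):
    __tablename__ = 'Enc_Ventas'
    VentaId = Column(Integer, primary_key=True)  # hypothetical primary key
    PersonaId = Column(Integer, ForeignKey('Persona.PersonaId'))
    persona = relationship(Persona)  # enables joinedload below

# one SELECT with the join baked in; related columns via dot notation
ventas = (session.query(Enc_Ventas)
          .options(joinedload(Enc_Ventas.persona))
          .all())
for venta in ventas:
    print(venta.persona.Persona_Nombre, venta.persona.Persona_Apellido)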
Much more detail and/or a more specific question is needed to help further, imo.

SQLite get id and insert when using executemany

I am optimising my code and reducing the number of queries. These used to be in a loop, but I am trying to restructure my code to be done like this. How do I get the second query working so that it uses the id generated by the first query for each row? Assume that the datasets are in the right order too.
self.c.executemany("INSERT INTO nodes (node_value, node_group) values (?, (SELECT node_group FROM nodes WHERE node_id = ?)+1)", new_values)
#my problem is here
new_id = self.c.lastrowid
connection_values.append((node_id, new_id))
#insert entry
self.c.executemany("INSERT INTO connections (parent, child, strength) VALUES (?,?,1)", connection_values)
These queries used to be in a for loop but were taking too long, so I am trying to avoid running each query individually. I believe there might be a way to combine it into one query, but I am unsure how this would be done.
You will need to either insert rows one at a time or read back the rowids that were picked by SQLite's ID assignment logic; as documented in Autoincrement in SQLite, there is no guarantee that the IDs generated will be consecutive and trying to guess them in client code is a bad idea.
You can do this implicitly if your program is single-threaded as follows:
1. Set the AUTOINCREMENT keyword in your table definition. This will guarantee that any generated row IDs will be higher than any that appear in the table currently.
2. Immediately before the first statement, determine the highest ROWID in use in the table: oldmax ← Execute("SELECT max(ROWID) FROM nodes").
3. Perform the first insert as before.
4. Read back the row IDs that were actually assigned with a select statement: NewNodes ← Execute("SELECT ROWID FROM nodes WHERE ROWID > ? ORDER BY ROWID ASC", oldmax).
5. Construct the connection_values array by combining the parent ID from new_values and the child ID from NewNodes (see the sketch after this list).
6. Perform the second insert as before.
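A minimal sketch of those steps, assuming nodes was created with INTEGER PRIMARY KEY AUTOINCREMENT and that new_values holds (node_value, parent_node_id) pairs as in the question (self.c is the cursor from the question's class):

# highest rowid currently in use (0 for an empty table)
oldmax = self.c.execute("SELECT COALESCE(MAX(ROWID), 0) FROM nodes").fetchone()[0]

# first insert, as before
self.c.executemany(
    "INSERT INTO nodes (node_value, node_group) "
    "VALUES (?, (SELECT node_group FROM nodes WHERE node_id = ?) + 1)",
    new_values)

# read back the ids SQLite actually assigned, in insertion order
new_ids = [row[0] for row in self.c.execute(
    "SELECT ROWID FROM nodes WHERE ROWID > ? ORDER BY ROWID ASC", (oldmax,))]

# pair each parent id from new_values with its newly assigned child id
connection_values = [(parent_id, new_id)
                     for (_, parent_id), new_id in zip(new_values, new_ids)]
self.c.executemany(
    "INSERT INTO connections (parent, child, strength) VALUES (?, ?, 1)",
    connection_values)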
This may or may not be faster than your original code; AUTOINCREMENT can slow down performance, and without actually doing the experiment there's no way to tell.
If your program is writing to nodes from multiple threads, you'll need to guard this algorithm with a mutex as it will not work at all with multiple concurrent writers.

Query syntax to select exactly one item for each category

class Category(models.Model):
    pass

class Item(models.Model):
    cat = models.ForeignKey(Category)
I want to select exactly one item for each category; what is the query syntax to do this?
Your question isn't entirely clear: since you didn't say otherwise, I'm going to assume that you don't care which item is selected for each category, just that you need any one. If that isn't the case, please update the question to clarify.
tl;dr version: there is no documented way to explicitly use GROUP BY statements in Django, except by using a raw query. See the bottom for code to do so.
The problem is that doing what you're looking for in SQL itself requires a bit of a hack. You can easily try this example by entering sqlite3 :memory: at the command line:
CREATE TABLE category
(
id INT
);
CREATE TABLE item
(
id INT,
category_id INT
);
INSERT INTO category VALUES (1);
INSERT INTO category VALUES (2);
INSERT INTO category VALUES (3);
INSERT INTO item VALUES (1,1);
INSERT INTO item VALUES (2,2);
INSERT INTO item VALUES (3,3);
INSERT INTO item VALUES (4,1);
INSERT INTO item VALUES (5,2);
SELECT id, category_id, COUNT(category_id) FROM item GROUP BY category_id;
returns
4|1|2
5|2|2
3|3|1
Which is what you're looking for (one item id for each category id), albeit with an extraneous COUNT. The count (or some other aggregate function) is needed in order to apply the GROUP BY.
Note: this will ignore categories that don't contain any items, which seems like sensible behaviour.
Now the question becomes, how to do this in Django?
The obvious answer is to use Django's aggregation/annotation support, in particular combining annotate with values, as is recommended elsewhere for GROUP BY queries in Django.
Reading those posts, it would seem we could accomplish what we're looking for with
Item.objects.values('id').annotate(unneeded_count=Count('category_id'))
However this doesn't work. What Django does here is not just GROUP BY "category_id", but groups by all fields selected (i.e. GROUP BY "id", "category_id")[1]. I don't believe there is a way (in the public API, at least) to change this behaviour.
The solution is to fall back to raw SQL:
qs = Item.objects.raw('SELECT *, COUNT(category_id) FROM myapp_item GROUP BY category_id')
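Iterating over the resulting RawQuerySet then yields ordinary Item instances, e.g. (a short sketch):

for item in qs:
    print item.id  # one representative item id per category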
[1]: Note that you can inspect what queries Django is running with:
from django.db import connection
print connection.queries[-1]
Edit:
There are a number of other possible approaches, but most have (possibly severe) performance problems. Here are a couple:
1. Select an item from each category.
items = []
for c in Category.objects.all():
    items.append(c.item_set.all()[0])
This is a clearer and more flexible approach, but has the obvious disadvantage of requiring many more database hits.
2. Use select_related
items = Item.objects.select_related()
and then do the grouping/filtering yourself (in Python).
Again, this is perhaps clearer than using raw SQL and only requires one query, but that query could be very large (it will return all items and their categories), and doing the grouping/filtering yourself is probably less efficient than letting the database do it for you.
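For instance, the client-side grouping could look like this sketch (keeping the first item seen per category):

items_by_cat = {}
for item in Item.objects.select_related():
    items_by_cat.setdefault(item.cat_id, item)  # first item per category wins
items = items_by_cat.values()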
