Nested JSON Output of SQLAlchemy Query with Join - python

I have two tables Orders and OrderItems. It's a common setup whereby OrderItems has a foreign key linking it to Orders. So we have a one-to-many join from Orders to OrderItems.
Note: Tables would have many more fields in real life.
Orders          OrderItems
+---------+     +-------------+---------+
| orderId |     | orderItemId | orderId |
+---------+     +-------------+---------+
| 1       |     | 5           | 1       |
| 2       |     | 6           | 1       |
|         |     | 7           | 2       |
+---------+     +-------------+---------+
I'm using SQLAlchemy to reflect an existing database. So to query this data I do something like
ordersTable = db.Model.metadata.tables['Orders']
orderItemsTable = db.Model.metadata.tables['OrderItems']
statement = ordersTable.join(orderItemsTable, ordersTable.c.orderId==orderItemsTable.c.orderId).select()
result = db.engine.execute(statement)
rlist = [dict(row) for row in result.fetchall()]
return flask.jsonify(rlist)
But the problem with this output is that I get duplicates of information from the Orders table due to the join. E.g. because orderId 1 has two items, I'll get everything in the Orders table for that order twice.
What I'm after is a way to obtain a nested JSON output from the select query above, such as:
[
{
"orderId": 1,
"orderItems": [
{ "orderItemId": 5 },
{ "orderItemId": 6 }
]
},
{
"orderId": 2,
"orderItems":[
{ "orderItemId": 7 }
]
}
]
This question has been raised before
How do I produce nested JSON from database query with joins? Using Python / SQLAlchemy
I've spent quite a bit of time looking over the Marshmallow documentation, but I cannot find how to implement this using the type of query that I outlined above.

I didn't like how cluttered marshmallow is, so I wrote this. I also like that I can keep all of the data manipulation in the SQL statement instead of also instructing marshmallow what to do.
import json
from flask.json import JSONEncoder

def join_to_nested_dict(join_result):
    """
    Takes a sqlalchemy result and converts it to a dictionary.
    The models must use the dataclass decorator.
    Adds results to the right in a key named after the table the right item is contained in.
    :param List[Tuple[dataclass]] join_result:
    :return dict:
    """
    if len(join_result) == 0:
        return join_result
    # couldn't be the result of a join without two entries on each row
    assert(len(join_result[0]) >= 2)
    right_name = join_result[0][1].__tablename__
    # if there are multiple joins recurse on sub joins
    if len(join_result[0]) > 2:
        right = join_to_nested_dict([res[1:] for res in join_result])
    elif len(join_result[0]) == 2:
        right = [
            json.loads(json.dumps(row[1], cls=JSONEncoder))
            for row in join_result if row[1] is not None
        ]
    right_items = {item['id']: item for item in right}
    items = {}
    for row in join_result:
        # in the case of a right outer join
        if row[0] is None:
            continue
        if row[0].id not in items:
            items[row[0].id] = json.loads(json.dumps(row[0], cls=JSONEncoder))
        # in the case of a left outer join
        if row[1] is None:
            continue
        if right_name not in items[row[0].id]:
            items[row[0].id][right_name] = []
        items[row[0].id][right_name].append(right_items[row[1].id])
    return list(items.values())
And you should be able to just plug the result into this function. However you will need to add the dataclass decorator to your models for this code to work.
statement = ordersTable.join(orderItemsTable, ordersTable.c.orderId==orderItemsTable.c.orderId).select()
result = db.engine.execute(statement)
join_to_nested_dict(result)
Also, if you don't want to use the flask json encoder you can delete the import and cls arguments.
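For the plain reflected-table query from the question, where each row is a flat dictionary rather than a dataclass model, the grouping can also be sketched in a few lines of plain Python. This is only a minimal illustration: nest_rows and its parameters are hypothetical names, and the hard-coded rows stand in for the list built from the query result.

```python
def nest_rows(rows, parent_key, child_key, child_fields):
    """Group flat join rows into one dict per parent, with a list of children."""
    nested = {}
    for row in rows:
        parent_id = row[parent_key]
        if parent_id not in nested:
            # Copy the parent columns once, leaving out the child columns.
            parent = {k: v for k, v in row.items() if k not in child_fields}
            parent[child_key] = []
            nested[parent_id] = parent
        # A left outer join yields None child columns for parents without children.
        if any(row.get(f) is not None for f in child_fields):
            nested[parent_id][child_key].append(
                {f: row.get(f) for f in child_fields}
            )
    return list(nested.values())


# Stand-in for the flat rows produced by the join in the question.
rows = [
    {"orderId": 1, "orderItemId": 5},
    {"orderId": 1, "orderItemId": 6},
    {"orderId": 2, "orderItemId": 7},
]
nested_orders = nest_rows(rows, "orderId", "orderItems", ["orderItemId"])
```

The resulting list can be passed to flask.jsonify in the same way as rlist in the question.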

Related

Django query Select X from Table where Y = Z

It's still hard for me to clearly understand the way Django makes queries.
I have two tables:
Table A:
+----+-----+----+
| id |code |name|
+----+-----+----+
Table B:
+----+----+
| id |name|
+----+----+
The name values in the two tables may or may not match. What I need is the value of Table A's code column for any row whose name matches a name in Table B.
Example:
Table A:
+----+----+----+
| id |code|name|
+----+----+----+
| 4 | A1 |John|
+----+----+----+
Table B:
+----+----+
| id |name|
+----+----+
| 96 |John|
+----+----+
So, by comparing John (B) with John (A), I need A1 to be returned, since it's the code result in the same row that matches on Table A.
In short, I need the Django code for this query:
a_name = 'John'
SELECT code FROM Table_A WHERE name = a_name
Take into account that I only know the value of table B, therefore I can't get the value of code by Table A's name.
Another approach is to use Django's values and values_list methods. You provide the field name you want data for.
values = Table_A.objects.filter(name=B_name).values('code')
This returns a QuerySet of dictionaries, each containing only the code value. See the Django documentation: https://docs.djangoproject.com/en/2.1/ref/models/querysets/#django.db.models.query.QuerySet.values
Or you can use values_list to format the result as a list.
values = Table_A.objects.filter(name=B_name).values_list('code')
This returns a QuerySet of tuples, even if you only request one field. See the Django documentation: https://docs.djangoproject.com/en/2.1/ref/models/querysets/#django.db.models.query.QuerySet.values_list
To try to make this a little more robust, you first get your list of named values from Table_B. Supplying flat=True creates a true list, as values_list will give you a list of tuples. Then use the list to filter on Table_A. You can return just the code or the code and name. As written, it returns a flat list of user codes for every matching name in Table A and Table B.
b_names_list = Table_B.objects.values_list('name', flat=True)
values = Table_A.objects.filter(name__in=b_names_list).values_list('code', flat=True)
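In SQL terms, the two-step lookup above amounts to a subquery on the names. The sketch below uses the standard-library sqlite3 module with an in-memory database instead of Django, so the lowercase table names and the extra 'Mary' row are assumptions for illustration, and the SQL is only roughly what the ORM would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, code TEXT, name TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (4, 'A1', 'John');
    INSERT INTO table_a VALUES (5, 'B2', 'Mary');  -- extra row, not in the question
    INSERT INTO table_b VALUES (96, 'John');
""")

# Codes from table_a whose name also appears in table_b.
codes = [
    code
    for (code,) in conn.execute(
        "SELECT code FROM table_a WHERE name IN (SELECT name FROM table_b)"
    )
]
conn.close()
```

Only the row for John contributes a code here, because Mary has no counterpart in table_b.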
Suppose your models are named A and B respectively. Then:
try:
    obj = A.objects.get(name='John')
    if B.objects.filter(name='John').exists():
        print(obj.code)  # found a match, so print the code
except (A.DoesNotExist, A.MultipleObjectsReturned):
    pass  # no single row in A with that name
Let's suppose Table_A and Table_B are Django models. Then your query may look like this:
a_name = 'John'
it_matches_on_b = (
    Table_B.objects
    .filter(name=a_name)
    .exists()
)
first_a = (
    Table_A.objects
    .filter(name=a_name)
    .first()
)
your_code = first_a.code if it_matches_on_b and first_a is not None else None
I haven't commented the code because it is self-explanatory, but ask in the comments if you have questions.
B_name = 'whatever'
Table_A.objects.filter(name=B_name)
The above is the basic query for fetching the Table_A rows whose name matches a name value you already know from Table_B.
To get the value:
obj = Table_A.objects.get(name = B_name)
print(obj.name)
print(obj.code) # if you want the 'code' field value

Insert a value into a row with petl?

I'm using petl and trying to figure out how to insert a value into a specific row.
I have a table that looks like this:
+----------------+---------+------------+
| Cambridge Data | IRR | Price List |
+================+=========+============+
| '3/31/1989' | '4.37%' | |
+----------------+---------+------------+
| '4/30/1989' | '5.35%' | |
+----------------+---------+------------+
I want to set the price list to 100 on the row where Cambridge Data is 4/30/1989. This is what I have so far:
def insert_initial_price(self, table):
    import petl as etl
    initial_price_row = etl.select(table, 'Cambridge Data', lambda v: v == '3/31/1989')
That selects the row I need to insert 100 into, but I'm unsure how to insert it. petl doesn't seem to have an "insert value" function.
I would advise against using select.
To update the value of a field use convert.
See the docs with many examples: https://petl.readthedocs.io/en/stable/transform.html#petl.transform.conversions.convert
I have not tested it, but this should solve it:
import petl as etl
table2 = etl.convert(
    table,
    'Price List',
    100,
    where=lambda rec: rec["Cambridge Data"] == '4/30/1989',
)

Get Max Value of Column out of a a query list in Python

I am new to Python and am currently trying to create a web form to edit customer data. The user selects a customer and gets all DSL products linked to that customer. What I am now trying to do is get the maximum downstream possible for a customer. So when the customer has DSL1, DSL3 and DSL4, their MaxDownstream is 550. Sorry for my poor English skills.
Here is the structure of my tables:
Customer_has_product:
Customer_idCustomer | Product_idProduct
----------------------------
1 | 1
1 | 3
1 | 4
2 | 5
3 | 3
Customer:
idCustomer | MaxDownstream
----------------------------
1 |
2 |
3 |
Product:
idProduct | Name | downstream
-------------------------------------------------
1 | DSL1 | 50
2 | DSL2 | 100
3 | DSL3 | 550
4 | DSL4 | 400
5 | DSL5 | 1000
And the code I've got so far:
db_session = Session(db_engine)
customer_object = db_session.query(Customer).filter_by(
    idCustomer=productform.Customer.data.idCustomer
).first()
productlist = request.form.getlist("DSLPRODUCTS_PRIVATE")
oldproducts = db_session.query(Customer_has_product.Product_idProduct).filter_by(
    Customer_idCustomer=customer_object.idCustomer)
id_list_delete = list(set([r for r, in oldproducts]) - set(productlist))
for delid in id_list_delete:
    db_session.query(Customer_has_product).filter_by(Customer_idCustomer=customer_object.idCustomer,
                                                     Product_idProduct=delid).delete()
    db_session.commit()
for product in productlist:
    if db_session.query(Customer_has_product).filter_by(
        Customer_idCustomer=customer_object.idCustomer,
        Product_idProduct=product
    ).first() is not None:
        continue
    else:
        product_link_to_add = Customer_has_product(
            Customer_idCustomer=productform.Customer.data.idCustomer,
            Product_idProduct=product
        )
        db_session.add(product_link_to_add)
        db_session.commit()
What you want to do is JOIN the tables onto each other. All relational database engines support joins, as does SQLAlchemy.
So how do you do that in SQLAlchemy?
You have two options, really. One is to use the Query builder of SQLAlchemy's ORM, the other is to use SQLAlchemy Core (upon which the ORM is built) directly. I really prefer the latter, because it maps more directly to SELECT statements, but I'm going to show both.
Using SQLAlchemy Core
How to do a join in Core is documented here. The first argument is the table to JOIN to, and the second argument is the JOIN condition.
from sqlalchemy import select, func
query = select(
    [
        Customer.idCustomer,
        func.max(Product.downstream),
    ]
).select_from(
    Customer.__table__
    .join(Customer_has_product.__table__,
          Customer_has_product.Customer_idCustomer ==
          Customer.idCustomer)
    .join(Product.__table__,
          Product.idProduct == Customer_has_product.Product_idProduct)
).group_by(
    Customer.idCustomer
)
# Now we can execute the built query on the database.
result = db_session.execute(query).fetchall()
print(result)  # Should now give you the correct result.
Using SQLAlchemy ORM
To simplify this, it's best to declare some relationships on your models. join is documented here. The first argument to join is the model to join onto, and the second argument is the JOIN condition again.
Without the relationships, you'll have to do it like this:
result = (
    db_session
    .query(Customer.idCustomer, func.max(Product.downstream))
    .join(Customer_has_product,
          Customer_has_product.Customer_idCustomer ==
          Customer.idCustomer)
    .join(Product,
          Product.idProduct == Customer_has_product.Product_idProduct)
    .group_by(Customer.idCustomer)
).all()
print(result)
This should be enough to get the idea on how to do this.
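To see what the JOIN plus GROUP BY produces on the sample data from the question, here is a minimal sketch using the standard-library sqlite3 module and an in-memory database instead of SQLAlchemy. The column types are assumptions; the table names, column names and rows follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (idCustomer INTEGER PRIMARY KEY, MaxDownstream INTEGER);
    CREATE TABLE Product (idProduct INTEGER PRIMARY KEY, Name TEXT, downstream INTEGER);
    CREATE TABLE Customer_has_product (Customer_idCustomer INTEGER, Product_idProduct INTEGER);
    INSERT INTO Customer VALUES (1, NULL), (2, NULL), (3, NULL);
    INSERT INTO Product VALUES
        (1, 'DSL1', 50), (2, 'DSL2', 100), (3, 'DSL3', 550),
        (4, 'DSL4', 400), (5, 'DSL5', 1000);
    INSERT INTO Customer_has_product VALUES (1, 1), (1, 3), (1, 4), (2, 5), (3, 3);
""")

# Same shape as the queries above: join through the link table, then
# take the largest downstream value per customer.
rows = conn.execute("""
    SELECT c.idCustomer, MAX(p.downstream)
    FROM Customer AS c
    JOIN Customer_has_product AS chp ON chp.Customer_idCustomer = c.idCustomer
    JOIN Product AS p ON p.idProduct = chp.Product_idProduct
    GROUP BY c.idCustomer
    ORDER BY c.idCustomer
""").fetchall()
max_downstream = dict(rows)
conn.close()
```

Customer 1 has DSL1, DSL3 and DSL4, so the maximum for that customer is the 550 from DSL3.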

SQLAlchemy; prevent automatic selection when ordering

When querying PostgreSQL (v9.5) with the SQLAlchemy ORM, how can I prevent a column from being selected automatically just because I sort by it? The sorted column should not appear in the SELECT list.
Hopefully the sample code below makes this more clear.
Example code
A table with an integer 'id', an integer 'object_id' and a string 'text':
id | object_id | text
---------------------
1 | 1 | house
2 | 2 | tree
3 | 1 | dog
The following query should return the distinct object_id as its own id with the most recent text:
query = session.query(
    MyTable.object_id.label('id'),
    MyTable.text
).\
    distinct(MyTable.object_id).\
    order_by(MyTable.object_id, MyTable.id.desc())
So far so good; but when I compile the query:
print(query.statement.compile(dialect=postgresql.dialect()))
The mytable.id and mytable.object_id are selected as well, so the column id is specified twice:
SELECT DISTINCT ON (mytable.object_id) mytable.object_id AS id,
mytable.text,
mytable.object_id,
mytable.id
FROM mytable
ORDER BY mytable.object_id,
mytable.id DESC
You can try it. It should work:
query = session.query(MyTable.object_id.distinct().label('id'), MyTable.text).order_by(MyTable.object_id, MyTable.id.desc())

python 2.7 sqlite3 cursor returns only one result

I'm fiddling around with Python and SQLite. I have a table structured like so:
PID | CID | a | b
=================
1 | 1 | ...
1 | 2 | ...
2 | 1 | ...
2 | 2 | ...
where PID is the ID of one object, and CID the id of another. Basically, a table that keeps track of the relationships between these two objects with properties (a, b, etc...) that may override those of the objects.
When I execute the following statement in python (c is a sqlite3 cursor):
results = c.execute("SELECT cid FROM test WHERE pid=?", (the_related_id,)).fetchmany()
I only get one result in the list, however, when I run the same (?) query in a sqlite browser, I get many results as expected:
SELECT cid FROM test WHERE pid=1
What's the deal?
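The number of rows to fetch per call is specified by the size parameter. If it is not given, the cursor's arraysize determines the number of rows to be fetched. In sqlite3, arraysize defaults to 1, which is why fetchmany() without an argument returns a single row.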
results = c.execute("SELECT cid FROM test WHERE pid=?", (the_related_id,)).fetchmany(N)
will return up to N rows.
If you want to retrieve all the rows, use fetchall() instead:
results = c.execute("SELECT cid FROM test WHERE pid=?", (the_related_id,)).fetchall()
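The difference between the three calls can be checked with a small in-memory database. This is a minimal sketch; the table contents are made up for illustration and only the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (pid INTEGER, cid INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1)],
)

c = conn.cursor()
query = "SELECT cid FROM test WHERE pid=? ORDER BY cid"

# No size given: falls back to c.arraysize, which defaults to 1.
default_batch = c.execute(query, (1,)).fetchmany()
# Explicit size: at most two rows.
two_rows = c.execute(query, (1,)).fetchmany(2)
# Every remaining row of the result set.
all_rows = c.execute(query, (1,)).fetchall()
conn.close()
```

Setting c.arraysize to a larger value before calling fetchmany() without an argument would change the default batch size accordingly.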
