SQLAlchemy many-to-many query - python

Let's say I have a blog front page with several posts that each have a number of tags (like the example at http://pythonhosted.org/Flask-SQLAlchemy/models.html#many-to-many-relationships but with posts instead of pages). How do I retrieve all tags for all shown posts in a single query with SQLAlchemy?
The way I would do it is this (I'm just curious if there's a better way):
Run a query that returns all relevant posts for the page.
Use a list comprehension to get a list of all post IDs in the above query.
Run a single query that gets all tags where post_id in ( [the list of post IDs I just made] )
Is that the way to do it?
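In code, the three steps would look roughly like this (Post, Tag and the post_tags association table are just placeholder names standing in for the models from the linked example):

# Step 1: the posts shown on the front page
posts = session.query(Post).order_by(Post.id.desc()).limit(10).all()

# Step 2: list comprehension to collect their ids
post_ids = [p.id for p in posts]

# Step 3: one query for all tags attached to any of those posts
tags = (
    session.query(Tag)
    .join(post_tags, Tag.id == post_tags.c.tag_id)
    .filter(post_tags.c.post_id.in_(post_ids))
    .all()
)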

This is of course not the way to do it. The purpose of an ORM like SQLAlchemy is to represent the records and all related records as objects that you can simply work with, without thinking about the underlying SQL queries.
You don't need to retrieve anything. You already have it. The tags property of your Post objects is (something like) a list of Tag objects.
I don't know Flask-SQLAlchemy, but since you asked about SQLAlchemy I feel free to post a pure SQLAlchemy example that uses the models from the Flask example (and is self-contained):
#!/usr/bin/env python3
# coding: utf-8

import sqlalchemy as sqAl
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship, backref

engine = sqAl.create_engine('sqlite:///m2m.sqlite')  # , echo=True)
metadata = sqAl.schema.MetaData(bind=engine)
Base = declarative_base(metadata=metadata)

# association table for the many-to-many relationship
tags = sqAl.Table('tags', Base.metadata,
    sqAl.Column('tag_id', sqAl.Integer, sqAl.ForeignKey('tag.id')),
    sqAl.Column('page_id', sqAl.Integer, sqAl.ForeignKey('page.id'))
)

class Page(Base):
    __tablename__ = 'page'
    id = sqAl.Column(sqAl.Integer, primary_key=True)
    content = sqAl.Column(sqAl.String)
    tags = relationship('Tag', secondary=tags,
                        backref=backref('pages', lazy='dynamic'))

class Tag(Base):
    __tablename__ = 'tag'
    id = sqAl.Column(sqAl.Integer, primary_key=True)
    label = sqAl.Column(sqAl.String)

def create_sample_data(sess):
    tag_strings = ('tag1', 'tag2', 'tag3', 'tag4')
    page_strings = ('This is page 1', 'This is page 2', 'This is page 3', 'This is page 4')
    tag_obs, page_obs = [], []
    for ts in tag_strings:
        t = Tag(label=ts)
        tag_obs.append(t)
        sess.add(t)
    for ps in page_strings:
        p = Page(content=ps)
        page_obs.append(p)
        sess.add(p)
    page_obs[0].tags.append(tag_obs[0])
    page_obs[0].tags.append(tag_obs[1])
    page_obs[1].tags.append(tag_obs[2])
    page_obs[1].tags.append(tag_obs[3])
    page_obs[2].tags.append(tag_obs[0])
    page_obs[2].tags.append(tag_obs[1])
    page_obs[2].tags.append(tag_obs[2])
    page_obs[2].tags.append(tag_obs[3])
    sess.commit()

Base.metadata.create_all(engine, checkfirst=True)
session = sessionmaker(bind=engine)()

# uncomment the next line and run it once to create some sample data
# create_sample_data(session)

pages = session.query(Page).all()
for p in pages:
    print("page '{0}', content:'{1}', tags: '{2}'".format(
        p.id, p.content, ", ".join([t.label for t in p.tags])))
Yes, life can be so easy...
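If you really do want the tags for all shown pages pulled in eagerly instead of lazily per page, SQLAlchemy's relationship loader options can do that; a minimal sketch, assuming the models and session above:

from sqlalchemy.orm import joinedload

# one round trip: pages and their tags are fetched together via an OUTER JOIN
pages = session.query(Page).options(joinedload(Page.tags)).all()

for p in pages:
    # no additional queries are emitted while iterating the tags here
    print(p.id, [t.label for t in p.tags])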

Related

Querying based on related element attributes in SQLAlchemy

For simplicity's sake, I will make up an example to illustrate my problem.
I have a database that contains a table for baskets (primary keys basket_1, basket_2, ...) and a table for fruits (apple_1, apple_2, pear_1, banana_1, ...).
Each fruit instance has an attribute that describes its type (apple_1 and apple_2 have the attribute type = 'apple', pear_1 has type = 'pear', and so on).
Each basket has a one-to-many relationship with the fruits (for example basket_1 has an apple_1, an apple_2 and a pear_1).
My question is: given a series of inputs such as [2x elements of type apple and 1 element of type pear], is there a straightforward way to query/find which baskets actually contain all those fruits?
I tried something along the lines of:
from sqlalchemy import (
    create_engine, Table, Column, String, ForeignKey, Boolean
)
from sqlalchemy.orm import relationship, declarative_base
from sqlalchemy.orm import sessionmaker

# Create session
# create_engine() expects a database URL, not a bare file path
database_path = "sqlite:///C:/Data/my_database.db"
engine = create_engine(database_path)
Session = sessionmaker()
Session.configure(bind=engine)
session = Session()

Base = declarative_base()

# Model
class Basket(Base):
    __tablename__ = "baskets"
    id = Column(String, primary_key=True)
    fruits = relationship("Fruit", backref='baskets')

class Fruit(Base):
    __tablename__ = "fruits"
    id = Column(String, primary_key=True)
    type = Column(String)
    parent_basket = Column(String, ForeignKey('baskets.id'))

# List of fruits
fruit_list = ['apple', 'apple', 'pear']

# Query baskets that contain all those elements (I currently have no idea how to
# set up the condition, or whether I should use a 'join' in this query)
CONDITION_TO_FILTER_ON = "basket should contain as many fruits of each type as specified in the fruit list"
baskets = session.query(Basket).filter(CONDITION_TO_FILTER_ON)
Sorry if the phrasing/explanation is not clear enough. I've been playing around with filters but it still isn't clear enough to me how to approach this.
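One possible approach (not from the original post) is to count fruits per type and basket with group_by/having and keep only the baskets that satisfy every requirement; a sketch, assuming the models and session above:

from collections import Counter
from sqlalchemy import func

required = Counter(fruit_list)   # {'apple': 2, 'pear': 1}

query = session.query(Basket)
for fruit_type, count in required.items():
    # baskets holding at least `count` fruits of this type
    enough_of_type = (
        session.query(Fruit.parent_basket)
        .filter(Fruit.type == fruit_type)
        .group_by(Fruit.parent_basket)
        .having(func.count(Fruit.id) >= count)
    )
    query = query.filter(Basket.id.in_(enough_of_type))

baskets = query.all()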

SQLAlchemy best way to filter a table based on values from another table

I apologize in advance if my question is banal: I am a total beginner of SQL.
I want to create a simple database, with two tables: Students and Answers.
Basically, each student will answer three questions (possible answers are True or False for each question), and their answers will be stored in the Answers table.
Students can have two "experience" levels: "Undergraduate" and "Graduate".
What is the best way to obtain all Answers that were given by Students with "Graduate" experience level?
This is how I define SQLAlchemy classes for entries in Students and Answers tables:
import random

from sqlalchemy import create_engine
from sqlalchemy import Column, Integer, String, Date, Boolean, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship

db_uri = "sqlite:///simple_answers.db"
db_engine = create_engine(db_uri)
db_connect = db_engine.connect()

Session = sessionmaker()
Session.configure(bind=db_engine)
db_session = Session()

Base = declarative_base()

class Student(Base):
    __tablename__ = "Students"
    id = Column(Integer, primary_key=True)
    experience = Column(String, nullable=False)

class Answer(Base):
    __tablename__ = "Answers"
    id = Column(Integer, primary_key=True)
    student_id = Column(Integer, ForeignKey("Students.id"), nullable=False)
    answer = Column(Boolean, nullable=False)

Base.metadata.create_all(db_connect)
Then, I insert some random entries in the database:
categories_experience = ["Undergraduate", "Graduate"]
categories_answer = [True, False]
n_students = 20
n_answers_by_each_student = 3

random.seed(1)

for _ in range(n_students):
    student = Student(experience=random.choice(categories_experience))
    db_session.add(student)
    db_session.commit()
    answers = [Answer(student_id=student.id, answer=random.choice(categories_answer))
               for _ in range(n_answers_by_each_student)]
    db_session.add_all(answers)
    db_session.commit()
Then, I obtain Student.id of all "Graduate" students:
ids_graduates = db_session.query(Student.id).filter(Student.experience == "Graduate").all()
ids_graduates = [result.id for result in ids_graduates]
And finally, I select Answers from "Graduate" Students using .in_ operator:
answers_graduates = db_session.query(Answer).filter(Answer.student_id.in_(ids_graduates)).all()
I manually checked the answers, and they are right. But, since I am a total beginner of SQL, I suspect that there is some better way to achieve the same result.
Is there such an objectively "best" way (more Pythonic, more efficient...)? I would like to achieve my result with SQLAlchemy, possibly using the ORM interface.
When I asked the question, I was in a hurry.
Since then, I have had the time to study SQLAlchemy ORM documentation.
There are two recommended ways to filter tables based on values from another table.
The first way is actually very similar to what I had originally tried:
query_graduates = (
    db_session
    .query(Student.id)
    .filter(Student.experience == "Graduate")
)
query_answers_graduates = (
    db_session
    .query(Answer)
    .filter(Answer.student_id.in_(query_graduates))
)
answers_graduates = query_answers_graduates.all()
It uses the .in_ operator, which accepts as its argument either a list of objects or another query.
The second way uses the .join method:
query_answers_graduates = (
    db_session
    .query(Answer)
    .join(Student)
    .filter(Student.experience == "Graduate")
)
The second approach is more concise. I timed both solutions, and the second approach, which uses .join, is slightly faster.
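To see exactly what each version sends to the database (which is what the timing difference comes down to), printing a query object shows the SQL it will emit; a quick check, assuming the queries defined above:

# str()/print() compiles the ORM query into its SQL statement
print(query_graduates)           # the SELECT used inside IN (...) in the first approach
print(query_answers_graduates)   # the full SELECT, either the IN form or the JOIN form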
You mention SQL, but I am not sure whether you want to do this particular step in Python or in SQL. If SQL, something like this could work:
select * from Students s
inner join Answers a on s.id = a.student_id
where s.experience = 'Graduate';
Updated code
I have never used SQLAlchemy before, but something similar to this may work...
from sqlalchemy import text

sql = """select s.id, a.answer from Students s
         inner join Answers a on s.id = a.student_id
         where s.experience = 'Graduate';"""

with db_session as con:
    # text() wraps the raw SQL string so it can be executed on newer SQLAlchemy versions
    rows = con.execute(text(sql))
    for row in rows:
        print(row)

SQLAlchemy - pass a dynamic tablename to query function?

I have a simple polling script that polls entries based on new ID's in a MSSQL table. I'm using SQLAlchemy's ORM to create a table class and then query that table. I want to be able to add more tables "dynamically" without coding it directly into the method.
My polling function:
import time

import pandas as pd

# db is the Session instance created elsewhere in the project

def poll_db():
    query = db.query(
        Transactions.ID).order_by(Transactions.ID.desc()).limit(1)

    # Continually poll for new images to classify
    max_id_query = query
    last_max_id = max_id_query.scalar()

    while True:
        max_id = max_id_query.scalar()
        if max_id > last_max_id:
            print(
                f"New row(s) found. "
                f"Processing ids {last_max_id + 1} through {max_id}"
            )

            # Insert ML model
            id_query = db.query(Transactions).filter(
                Transactions.ID > last_max_id)
            df_from_query = pd.read_sql_query(
                id_query.statement, db.bind, index_col='ID')
            print("New query was made")

            last_max_id = max_id

        time.sleep(5)
My table model:
import sqlalchemy as db
from sqlalchemy import Boolean, Column, ForeignKey, Integer, String, Text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import defer, relationship, query
from database import SessionLocal, engine
insp = db.inspect(engine)
db_list = insp.get_schema_names()
Base = declarative_base(cls=BaseModel)
class Transactions(Base):
__tablename__ = 'simulation_data'
sender_account = db.Column('sender_account', db.BigInteger)
recipient_account = db.Column('recipient_account', db.String)
sender_name = db.Column('sender_name', db.String)
recipient_name = db.Column('recipient_name', db.String)
date = db.Column('date', db.DateTime)
text = db.Column('text', db.String)
amount = db.Column('amount', db.Float)
currency = db.Column('currency', db.String)
transaction_type = db.Column('transaction_type', db.String)
fraud = db.Column('fraud', db.BigInteger)
swift_bic = db.Column('swift_bic', db.String)
recipient_country = db.Column('recipient_country', db.String)
internal_external = db.Column('internal_external', db.String)
ID = Column('ID', db.BigInteger, primary_key=True)
QUESTION
How can I pass the table class name "dynamically" in the likes of poll_db(tablename), where tablename='Transactions', and instead of writing similar queries for multiple tables, such as:
query = db.query(Transactions.ID).order_by(Transactions.ID.desc()).limit(1)
query2 = db.query(Transactions2.ID).order_by(Transactions2.ID.desc()).limit(1)
query3 = db.query(Transactions3.ID).order_by(Transactions3.ID.desc()).limit(1)
The tables will have identical structure, but different data.
I can't give you a full example right now (will edit later) but here's one hacky way to do it (the documentation will probably be a better place to check):
def dynamic_table(tablename):
    for class_name, cls in Base._decl_class_registry.items():
        # skip registry entries that are not mapped classes
        if getattr(cls, '__tablename__', None) == tablename:
            return cls

Transactions2 = dynamic_table("simulation_data")
assert Transactions2 is Transactions
The returned class is the model you want. Keep in mind that Base only knows about models that have already been defined, so if they live in other modules you need to import them first so that they are registered as Base's subclasses.
For selecting columns, something like this should work:
def dynamic_table_with_columns(tablename, *columns):
    cls = dynamic_table(tablename)
    subset = []
    for col_name in columns:
        column = getattr(cls, col_name, None)
        if column is not None:
            subset.append(column)
    # in case no columns were given
    if not subset:
        return db.query(cls)
    return db.query(*subset)
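To tie this back to the polling code from the question, the looked-up class can simply replace the hard-coded Transactions reference; a sketch with a hypothetical poll_latest_id helper, assuming the db session from the question and that every table has an ID column:

def poll_latest_id(tablename):
    model = dynamic_table(tablename)
    if model is None:
        raise ValueError(f"no mapped class found for table {tablename!r}")
    # same query as before, but against whichever model was looked up
    return db.query(model.ID).order_by(model.ID.desc()).limit(1).scalar()

latest_id = poll_latest_id("simulation_data")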

SQLalchemy custom String primary_key sequence

For the life of me, I cannot think of a simple way to accomplishing this without querying the database whenever a new record is created, but this is what I'm trying to do with sqlalchemy+postgresql:
I would like to have a primary key of a given table follow this format:
YYWW0001, YYWW0002, etc., so that I see values like 20010001, 20010002, where the last four digits are only incremented within the given week of the year and reset when a new week or year starts.
I'm at the limit of my knowledge here so any help is greatly appreciated!
In the meantime, I am looking into sqlalchemy.schema.Sequence.
Another thing I can think to try is creating a table with, let's say, 10,000 records that just have a plain Integer primary key and the actual ID I want, then find some sort of 'next' method to pull from that table when my Core object is constructed. This seems less than ideal to me, since I would still need to ensure that the date portion of the ID in that table is correct and current. I think a dynamic approach would best suit my needs.
So far my naive implementation looks like this:
import os
import datetime

from sqlalchemy import create_engine, Column, String
from sqlalchemy.exc import OperationalError
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

BASE = declarative_base()

_name = os.environ.get('HW_QC_USER_ID', None)
_pass = os.environ.get('HW_QC_USER_PASS', None)
_ip = os.environ.get('HW_QC_SERVER_IP', None)
_db_name = 'mock'

try:
    print('Creating engine')
    engine = create_engine(
        f'postgresql://{_name}:{_pass}@{_ip}/{_db_name}',
        echo=False
    )
except OperationalError as _e:
    print('An error has occurred when connecting to the database')
    print(f'postgresql://{_name}:{_pass}@{_ip}/{_db_name}')
    print(_e)

class Core(BASE):
    """
    This class describes a master table.
    """
    __tablename__ = 'cores'
    udi = Column(String(11), primary_key=True, unique=True)  # <-- how do I get this to be the format described?
    _date_code = Column(
        String(4),
        default=datetime.datetime.now().strftime("%y%U")
    )

BASE.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

date_code_now = datetime.datetime.now().strftime("%y%U")
cores_from_this_week = session.query(Core).filter(
    Core._date_code == date_code_now
).all()
num_cores_existing = len(cores_from_this_week)

new_core = Core(
    udi=f'FRA{date_code_now}{num_cores_existing + 1:04}'
)

session.add(new_core)
session.commit()
session.close()
engine.dispose()
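Not part of the original post, but one small refinement to the counting step: let the database count the existing cores for the week instead of loading every row with .all(), e.g. with func.count:

from sqlalchemy import func

# COUNT(*) on the server instead of len(query.all()) in Python
num_cores_existing = session.query(func.count(Core.udi)).filter(
    Core._date_code == date_code_now
).scalar()

new_core = Core(udi=f'FRA{date_code_now}{num_cores_existing + 1:04}')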

Updating row in SqlAlchemy ORM

I am trying to obtain a row from the DB, modify that row and save it again, all using SQLAlchemy.
My code
from sqlalchemy import Column, DateTime, Integer, String, Table, MetaData
from sqlalchemy.orm import mapper
from sqlalchemy import create_engine, orm

metadata = MetaData()

product = Table('product', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(1024), nullable=False, unique=True),
)

class Product(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

mapper(Product, product)

db = create_engine('sqlite:////' + db_path)
sm = orm.sessionmaker(bind=db, autoflush=True, autocommit=True, expire_on_commit=True)
session = orm.scoped_session(sm)

result = session.execute("select * from product where id = :id", {'id': 1}, mapper=Product)
prod = result.fetchone()  # there are many products in db, so the query is ok
prod.name = 'test'  # <- here I got AttributeError: 'RowProxy' object has no attribute 'name'
session.add(prod)
session.flush()
Unfortunately this does not work, because I am trying to modify a RowProxy object. How can I do what I want (load, change and save/update a row) the SQLAlchemy ORM way?
I assume that your intention is to use the Object-Relational API.
To update a row in the db, you need to load the mapped object from the table record and update the object's property.
Please see the code example below.
Note that I've also added example code for creating a new mapped object and inserting the first record into the table; there is also commented-out code at the end for deleting the record.
from sqlalchemy import Column, DateTime, Integer, String, Table, MetaData
from sqlalchemy.orm import mapper
from sqlalchemy import create_engine, orm

metadata = MetaData()

product = Table('product', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(1024), nullable=False, unique=True),
)

class Product(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __repr__(self):
        return "%s(%r,%r)" % (self.__class__.__name__, self.id, self.name)

mapper(Product, product)

db = create_engine('sqlite:////temp/test123.db')
metadata.create_all(db)

sm = orm.sessionmaker(bind=db, autoflush=True, autocommit=True, expire_on_commit=True)
session = orm.scoped_session(sm)

# create new Product record:
if session.query(Product).filter(Product.id == 1).count() == 0:
    new_prod = Product("1", "Product1")
    print("Creating new product: %r" % new_prod)
    session.add(new_prod)
    session.flush()
else:
    print("product with id 1 already exists: %r" % session.query(Product).filter(Product.id == 1).one())

print("loading Product with id=1")
prod = session.query(Product).filter(Product.id == 1).one()
print("current name: %s" % prod.name)

prod.name = "new name"
print(prod)

prod.name = 'test'
session.add(prod)
session.flush()
print(prod)

# session.delete(prod)
# session.flush()
PS: SQLAlchemy also provides a SQL Expression API that allows you to work with table records directly, without creating mapped objects. In my practice we use the Object-Relational API in most of our applications; sometimes we use the SQL Expression API when we need to perform low-level db operations efficiently, such as inserting or updating thousands of records with one query.
Direct links to SQLAlchemy documentation:
Object Relational Tutorial
SQL Expression Language Tutorial
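As a rough illustration of that last point (not from the original answer), a bulk update through the SQL Expression layer, using the product table defined above, might look like this:

# update matching rows in a single statement, without loading Product objects
session.execute(
    product.update()
           .where(product.c.name == 'old name')
           .values(name='new name')
)
session.flush()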
