SQLAlchemy: relationship collection lazy loading - python

As per the SQLAlchemy documentation on relationship loading:
When the given collection or reference is first accessed on a particular object, an additional SELECT statement is emitted such that the requested collection is loaded.
How do I achieve loading behavior such that only the single elements of a relationship collection that I access are loaded, rather than the entire collection all at once?
I have heard of deferred column loading; this would be more like "deferred row loading". Rather than deferring loading of attributes, I'd like to defer loading of relationship collection elements.
Desired use case:
# Persist instance.
coln = Collection([1, 2, 3])
session.add(coln)
session.commit()
# Test lazy loading.
print('data' in coln.__dict__)
# Lazy loads the entire collection. I'd like only one element.
print(coln.data[1])
# Will output: "True 3". I'd like: "True 1".
print('data' in coln.__dict__, len(coln.__dict__['data']))
Class definitions and other setup:
from sqlalchemy import Column, Integer, ForeignKey
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
Base = declarative_base()
engine = create_engine('sqlite:///:memory:')
# Define classes.
class Collection(Base):
    __tablename__ = 'collection'
    id = Column(Integer, primary_key=True)
    data = relationship('Element')
    def __init__(self, list_):
        self.data = [Element(e) for e in list_]
class Element(Base):
    __tablename__ = 'element'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('collection.id'))
    value = Column(Integer)
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return 'Element({})'.format(self.value)
# Create schema.
Base.metadata.create_all(engine)
# Create session.
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
session = Session()

Use the lazy parameter with the value 'dynamic':
data = relationship('Element', lazy='dynamic')
https://docs.sqlalchemy.org/en/13/orm/collections.html#dynamic-relationship
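Here is a minimal sketch of the lazy='dynamic' approach. It assumes SQLAlchemy 1.4 or later (for sqlalchemy.orm.declarative_base) and an in-memory SQLite database. With this setting, coln.data is a query object rather than a list, so indexing it emits a SELECT with LIMIT/OFFSET and fetches only the requested row. Two details are not in the question. The order_by="Element.id" argument keeps positional indexing deterministic, because SQL gives no row order otherwise. The elements are also added with append() instead of list assignment.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Collection(Base):
    __tablename__ = "collection"
    id = Column(Integer, primary_key=True)
    # lazy="dynamic" makes `data` a query object instead of a loaded list;
    # order_by (an assumption, not in the question) makes indexing deterministic.
    data = relationship("Element", lazy="dynamic", order_by="Element.id")


class Element(Base):
    __tablename__ = "element"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("collection.id"))
    value = Column(Integer)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

coln = Collection()
for v in [1, 2, 3]:
    coln.data.append(Element(value=v))
session.add(coln)
session.commit()

# Indexing runs SELECT ... LIMIT 1 OFFSET 1, so only one Element row is loaded.
second = coln.data[1]
print(second.value)  # 2
# Aggregates also run in SQL instead of loading the whole collection.
print(coln.data.count())  # 3
```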

Related

Efficiently copy data between databases with sqlalchemy

I'm trying to mirror a PostgreSQL + PostGIS database that I defined with SQLAlchemy to a SQLite (SpatiaLite) file database. The session.merge() method appears to work for adding the instances queried from the first session to the other session, but it does not scale to nearly a million rows. See the example below, which copies data from an in-memory SQLite database to another in-memory database for easy reproducibility. I'm looking for an approach (potentially completely different from what I'm doing now) to efficiently move all the data from one database to another.
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, Integer, ForeignKey, String
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm import relationship, joinedload
engine_0 = create_engine('sqlite:///:memory:', echo=True)
engine_1 = create_engine('sqlite:///:memory:', echo=True)
Base = declarative_base()
Session0 = sessionmaker(bind=engine_0)
Session1 = sessionmaker(bind=engine_1)
# Define ORM models
association_table = Table('association', Base.metadata,
    Column('parent_id', ForeignKey('parent.id'), primary_key=True),
    Column('child_id', ForeignKey('child.id'), primary_key=True)
)
class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    children = relationship(
        "Child",
        secondary=association_table,
        back_populates="parents")
class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    parents = relationship(
        "Parent",
        secondary=association_table,
        back_populates="children")
# Create schema
Base.metadata.create_all(engine_0)
Base.metadata.create_all(engine_1)
# Create some example instances
# Children
bart = Child(name='Bart')
lisa = Child(name='Lisa')
maggie = Child(name='Maggie')
milhouse = Child(name='Milhouse')
# Parents
homer = Parent(name='Homer',
               children=[bart, lisa, maggie])
marge = Parent(name='Marge',
               children=[bart, lisa, maggie])
flanders = Parent(name='Ned')
kirk = Parent(name='Kirk', children=[milhouse])
# Insert data into first database
session_0 = Session0()
session_0.add_all([homer, marge, flanders, kirk])
session_0.commit()
# Query the data and insert it into the second database
all_obj = session_0.query(Parent).options(joinedload('*')).all()
session_0.expunge_all()
session_1 = Session1()
for obj in all_obj:
    session_1.merge(obj)
session_1.commit()
# Make sure that 4 instances of Child are present in the second database
print(session_1.query(Child).all())
One alternative approach I have tried (unsuccessfully) is to make the parent objects transient using the sqlalchemy.orm.make_transient() function and use session.add_all() instead of session.merge() to insert the objects into the second session. However, this does not propagate to the relationships and only Parent objects are made transient.
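One commonly used alternative, which does not come from the original thread, skips the ORM entirely. It copies rows table by table with SQLAlchemy Core: SELECT from the source, then INSERT into the destination in batches with executemany. This sketch assumes SQLAlchemy 1.4 or later, that both schemas already exist, and that the destination tables are empty. It uses plain Table objects that mirror the question's schema; with the ORM models you would pass Base.metadata instead. The names copy_all_tables and chunk_size are hypothetical. PostGIS geometry columns may need extra conversion for SpatiaLite, and this sketch does not handle that.

```python
from sqlalchemy import (Column, ForeignKey, Integer, MetaData, String, Table,
                        create_engine, func, select)

metadata = MetaData()
parent = Table("parent", metadata,
               Column("id", Integer, primary_key=True),
               Column("name", String))
child = Table("child", metadata,
              Column("id", Integer, primary_key=True),
              Column("name", String))
association = Table("association", metadata,
                    Column("parent_id", ForeignKey("parent.id"), primary_key=True),
                    Column("child_id", ForeignKey("child.id"), primary_key=True))


def copy_all_tables(src_engine, dst_engine, metadata, chunk_size=10_000):
    """Copy every table's rows with Core executemany INSERTs (no ORM objects)."""
    with src_engine.connect() as src_conn, dst_engine.begin() as dst_conn:
        # sorted_tables is dependency-ordered: parent and child rows are
        # inserted before the association rows that reference them.
        for table in metadata.sorted_tables:
            result = src_conn.execute(select(table)).mappings()
            while True:
                chunk = result.fetchmany(chunk_size)
                if not chunk:
                    break
                dst_conn.execute(table.insert(), [dict(row) for row in chunk])


src = create_engine("sqlite://")
dst = create_engine("sqlite://")
metadata.create_all(src)
metadata.create_all(dst)
with src.begin() as conn:
    conn.execute(parent.insert(), [{"id": 1, "name": "Homer"}, {"id": 2, "name": "Kirk"}])
    conn.execute(child.insert(), [{"id": 1, "name": "Bart"}, {"id": 2, "name": "Milhouse"}])
    conn.execute(association.insert(), [{"parent_id": 1, "child_id": 1},
                                        {"parent_id": 2, "child_id": 2}])

copy_all_tables(src, dst, metadata)
with dst.connect() as conn:
    print(conn.execute(select(func.count()).select_from(child)).scalar_one())  # 2
```

Because no ORM objects are built, memory use is bounded by chunk_size rather than by the total row count.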

Scrapy - SQLalchemy Foreign Key not created in SQLite

I tried to run Scrapy using ItemLoader to collect all the data and put it into SQLite 3. I succeeded in gathering all the information I wanted, but I cannot get the foreign keys populated in my ThreadInfo and PostInfo tables using back_populates with ForeignKey. I also tried backref, but that did not work either.
All the other information was inserted into the SQLite database after Scrapy finished.
My goal is to have four tables, boardInfo, threadInfo, postInfo, and authorInfo, linked to each other:
boardInfo will have a one-to-many relationship with threadInfo
threadInfo will have a one-to-many relationship with postInfo
authorInfo will have a one-to-many relationship with threadInfo and postInfo.
I used DB Browser for SQLite and found that the values of my foreign keys are NULL.
I tried querying the value (threadInfo.boardInfos_id), and it displayed None. I have tried to fix this for many days and read through the documentation, but I cannot solve the issue.
How can I have the foreign keys populated in my threadInfo and postInfo tables?
Thank you for any guidance and comments.
Here is my models.py:
from sqlalchemy import create_engine, Column, Table, ForeignKey, MetaData
from sqlalchemy import Integer, String, Date, DateTime, Float, Boolean, Text
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base
from scrapy.utils.project import get_project_settings
Base = declarative_base()
def db_connect():
    '''
    Performs database connection using database settings from settings.py.
    Returns sqlalchemy engine instance
    '''
    return create_engine(get_project_settings().get('CONNECTION_STRING'))
def create_table(engine):
    Base.metadata.create_all(engine)
class BoardInfo(Base):
    __tablename__ = 'boardInfos'
    id = Column(Integer, primary_key=True)
    boardName = Column('boardName', String(100))
    threadInfosLink = relationship('ThreadInfo', back_populates='boardInfosLink') # One-to-Many with threadInfo
class ThreadInfo(Base):
    __tablename__ = 'threadInfos'
    id = Column(Integer, primary_key=True)
    threadTitle = Column('threadTitle', String())
    threadLink = Column('threadLink', String())
    threadAuthor = Column('threadAuthor', String())
    threadPost = Column('threadPost', Text())
    replyCount = Column('replyCount', Integer)
    readCount = Column('readCount', Integer)
    boardInfos_id = Column(Integer, ForeignKey('boardInfos.id')) # Many-to-One with boardInfo
    boardInfosLink = relationship('BoardInfo', back_populates='threadInfosLink') # Many-to-One with boardInfo
    postInfosLink = relationship('PostInfo', back_populates='threadInfosLink') # One-to-Many with postInfo
    authorInfos_id = Column(Integer, ForeignKey('authorInfos.id')) # Many-to-One with authorInfo
    authorInfosLink = relationship('AuthorInfo', back_populates='threadInfosLink') # Many-to-One with authorInfo
class PostInfo(Base):
    __tablename__ = 'postInfos'
    id = Column(Integer, primary_key=True)
    postOrder = Column('postOrder', Integer, nullable=True)
    postAuthor = Column('postAuthor', Text(), nullable=True)
    postContent = Column('postContent', Text(), nullable=True)
    postTimestamp = Column('postTimestamp', Text(), nullable=True)
    threadInfos_id = Column(Integer, ForeignKey('threadInfos.id')) # Many-to-One with threadInfo
    threadInfosLink = relationship('ThreadInfo', back_populates='postInfosLink') # Many-to-One with threadInfo
    authorInfos_id = Column(Integer, ForeignKey('authorInfos.id')) # Many-to-One with authorInfo
    authorInfosLink = relationship('AuthorInfo', back_populates='postInfosLink') # Many-to-One with authorInfo
class AuthorInfo(Base):
    __tablename__ = 'authorInfos'
    id = Column(Integer, primary_key=True)
    threadAuthor = Column('threadAuthor', String())
    postInfosLink = relationship('PostInfo', back_populates='authorInfosLink') # One-to-Many with postInfo
    threadInfosLink = relationship('ThreadInfo', back_populates='authorInfosLink') # One-to-Many with threadInfo
Here is my pipelines.py:
from sqlalchemy import exists, event
from sqlalchemy.orm import sessionmaker
from scrapy.exceptions import DropItem
from .models import db_connect, create_table, BoardInfo, ThreadInfo, PostInfo, AuthorInfo
from sqlalchemy.engine import Engine
from sqlite3 import Connection as SQLite3Connection
import logging
@event.listens_for(Engine, "connect")
def _set_sqlite_pragma(dbapi_connection, connection_record):
    if isinstance(dbapi_connection, SQLite3Connection):
        cursor = dbapi_connection.cursor()
        cursor.execute("PRAGMA foreign_keys=ON;")
        # print("####### PRAGMA prog is running!! ######")
        cursor.close()
class DuplicatesPipeline(object):
    def __init__(self):
        '''
        Initializes database connection and sessionmaker.
        Creates tables.
        '''
        engine = db_connect()
        create_table(engine)
        self.Session = sessionmaker(bind=engine)
        logging.info('****DuplicatesPipeline: database connected****')
    def process_item(self, item, spider):
        session = self.Session()
        exist_threadLink = session.query(exists().where(ThreadInfo.threadLink == item['threadLink'])).scalar()
        exist_thread_replyCount = session.query(ThreadInfo.replyCount).filter_by(threadLink = item['threadLink']).scalar()
        session.close()  # done querying; close before returning or raising
        if exist_threadLink is True: # threadLink is in DB
            if exist_thread_replyCount < item['replyCount']: # check if replyCount is more?
                return item
            else:
                raise DropItem('Duplicated item found and replyCount is not changed')
        else: # New threadLink to be added to BoardPipeline
            return item
class BoardPipeline(object):
    def __init__(self):
        '''
        Initializes database connection and sessionmaker
        Creates tables
        '''
        engine = db_connect()
        create_table(engine)
        self.Session = sessionmaker(bind=engine)
    def process_item(self, item, spider):
        '''
        Save scraped info in the database
        This method is called for every item pipeline component
        '''
        session = self.Session()
        # Input info to boardInfos
        boardInfo = BoardInfo()
        boardInfo.boardName = item['boardName']
        # Input info to threadInfos
        threadInfo = ThreadInfo()
        threadInfo.threadTitle = item['threadTitle']
        threadInfo.threadLink = item['threadLink']
        threadInfo.threadAuthor = item['threadAuthor']
        threadInfo.threadPost = item['threadPost']
        threadInfo.replyCount = item['replyCount']
        threadInfo.readCount = item['readCount']
        # Input info to postInfos
        # The info is in lists, so we have to loop and add it.
        for num in range(len(item['postOrder'])):
            postInfoNum = 'postInfo' + str(num)
            postInfoNum = PostInfo()
            postInfoNum.postOrder = item['postOrder'][num]
            postInfoNum.postAuthor = item['postAuthor'][num]
            postInfoNum.postContent = item['postContent'][num]
            postInfoNum.postTimestamp = item['postTimestamp'][num]
            session.add(postInfoNum)
        # Input info to authorInfo
        authorInfo = AuthorInfo()
        authorInfo.threadAuthor = item['threadAuthor']
        # check whether the boardName exists
        exist_boardName = session.query(exists().where(BoardInfo.boardName == item['boardName'])).scalar()
        if exist_boardName is False: # the current boardName does not exist
            session.add(boardInfo)
        # check whether the threadAuthor exists
        exist_threadAuthor = session.query(exists().where(AuthorInfo.threadAuthor == item['threadAuthor'])).scalar()
        if exist_threadAuthor is False: # the current threadAuthor does not exist
            session.add(authorInfo)
        try:
            session.add(threadInfo)
            session.commit()
        except:
            session.rollback()
            raise
        finally:
            session.close()
        return item
From the code I can see, it doesn't look to me like you are setting ThreadInfo.authorInfosLink or ThreadInfo.authorInfos_id anywhere (the same goes for all of your FK/relationships).
For the related objects to be attached to a ThreadInfo instance, you need to create them and then attach them, something like:
# Input info to authorInfo
authorInfo = AuthorInfo()
authorInfo.threadAuthor = item['threadAuthor']
threadInfo.authorInfosLink = authorInfo
You probably don't want to session.add() each object if it's related via FK. You'll want to:
instantiate a BoardInfo object bi
then instantiate your related ThreadInfo object ti
attach the related object, e.g. bi.threadInfosLink.append(ti) (threadInfosLink is a one-to-many collection, so you append to it; setting ti.boardInfosLink = bi is equivalent)
At the end of all of your chained relationships, you can simply add bi to the session using session.add(bi). All of the related objects will be added through their relationships, and the FKs will be correct.
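Here is a minimal sketch of that idea applied to the pipeline, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The models are trimmed to the columns needed here. get_or_create, save_item, and the item dictionary are hypothetical stand-ins for the Scrapy pipeline and item. Instead of separate exists() checks followed by session.add(), the sketch reuses or creates the board and author. It then assigns the relationships, so SQLAlchemy fills in the foreign keys at flush time.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class BoardInfo(Base):
    __tablename__ = "boardInfos"
    id = Column(Integer, primary_key=True)
    boardName = Column(String(100))
    threadInfosLink = relationship("ThreadInfo", back_populates="boardInfosLink")


class AuthorInfo(Base):
    __tablename__ = "authorInfos"
    id = Column(Integer, primary_key=True)
    threadAuthor = Column(String)
    threadInfosLink = relationship("ThreadInfo", back_populates="authorInfosLink")


class ThreadInfo(Base):
    __tablename__ = "threadInfos"
    id = Column(Integer, primary_key=True)
    threadTitle = Column(String)
    boardInfos_id = Column(Integer, ForeignKey("boardInfos.id"))
    authorInfos_id = Column(Integer, ForeignKey("authorInfos.id"))
    boardInfosLink = relationship("BoardInfo", back_populates="threadInfosLink")
    authorInfosLink = relationship("AuthorInfo", back_populates="threadInfosLink")
    postInfosLink = relationship("PostInfo", back_populates="threadInfosLink")


class PostInfo(Base):
    __tablename__ = "postInfos"
    id = Column(Integer, primary_key=True)
    postContent = Column(Text)
    threadInfos_id = Column(Integer, ForeignKey("threadInfos.id"))
    threadInfosLink = relationship("ThreadInfo", back_populates="postInfosLink")


def get_or_create(session, model, **kwargs):
    """Hypothetical helper: reuse an existing row matching kwargs, else create one."""
    obj = session.query(model).filter_by(**kwargs).first()
    if obj is None:
        obj = model(**kwargs)
        session.add(obj)
    return obj


def save_item(session, item):
    """Roughly BoardPipeline.process_item, but wiring the relationships."""
    board = get_or_create(session, BoardInfo, boardName=item["boardName"])
    author = get_or_create(session, AuthorInfo, threadAuthor=item["threadAuthor"])
    thread = ThreadInfo(threadTitle=item["threadTitle"])
    session.add(thread)
    # Assigning the many-to-one relationships is what fills
    # boardInfos_id / authorInfos_id when the session flushes.
    thread.boardInfosLink = board
    thread.authorInfosLink = author
    for content in item["postContent"]:
        # Appending to the one-to-many collection sets threadInfos_id.
        thread.postInfosLink.append(PostInfo(postContent=content))
    session.commit()
    return thread


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
item = {"boardName": "News", "threadAuthor": "alice",
        "threadTitle": "Hello", "postContent": ["first", "second"]}
thread = save_item(session, item)
print(thread.boardInfos_id, thread.authorInfos_id)  # 1 1
```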
Per the discussion in the comments of my other answer, below is how I would rationalize your models to make them make more sense to me.
Notice:
I have removed the unnecessary "Info" everywhere
I have removed explicit column names from your model definitions and will rely instead on SQLAlchemy's ability to infer those for me based on my attribute names
In a "Post" object I do not name the attribute PostContent. It's implied that the content relates to the Post, because that's how we're accessing it, so instead I simply call the attribute "content"
I've removed all "Link" terminology -- in places where I think you want a reference to a collection of related objects I've provided a plural attribute of that object as the relationship.
I've left a line in the Post model for you to remove. As you can see, you don't need "author" twice -- once as a related object and once on the Post, that defeats the purpose of the FKs.
With these changes, when you attempt to use these models from your other code it becomes obvious where you need to use .append() and where you simply assign the related object. For a given Board object you know that 'threads' is a collection just based on the attribute name, so you're going to do something like b.threads.append(thread)
from sqlalchemy import create_engine, Column, Table, ForeignKey, MetaData
from sqlalchemy import Integer, String, Date, DateTime, Float, Boolean, Text
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Board(Base):
    __tablename__ = 'board'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    threads = relationship('Thread', back_populates='board')
class Thread(Base):
    __tablename__ = 'thread'
    id = Column(Integer, primary_key=True)
    title = Column(String())
    link = Column(String())
    post = Column(Text())
    reply_count = Column(Integer)
    read_count = Column(Integer)
    # ForeignKey() takes the table name ('board'), not the class name
    board_id = Column(Integer, ForeignKey('board.id'))
    board = relationship('Board', back_populates='threads')
    posts = relationship('Post', back_populates='thread')
    author_id = Column(Integer, ForeignKey('author.id'))
    author = relationship('Author', back_populates='threads')
class Post(Base):
    __tablename__ = 'post'
    id = Column(Integer, primary_key=True)
    order = Column(Integer, nullable=True)
    author = Column(Text(), nullable=True) # remove this line and instead use the relationship below
    content = Column(Text(), nullable=True)
    timestamp = Column(Text(), nullable=True)
    thread_id = Column(Integer, ForeignKey('thread.id'))
    thread = relationship('Thread', back_populates='posts')
    author_id = Column(Integer, ForeignKey('author.id'))
    author = relationship('Author', back_populates='posts')
class Author(Base):
    __tablename__ = 'author'
    id = Column(Integer, primary_key=True)
    name = Column(String())
    posts = relationship('Post', back_populates='author')
    threads = relationship('Thread', back_populates='author')
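The following sketch illustrates the assign-versus-append point with trimmed-down versions of these models. It assumes SQLAlchemy 1.4 or later and an in-memory SQLite database, and the sample data is made up. Scalar relationships such as thread.author are assigned, and collections such as board.threads are appended to. Adding only the board then cascades the thread, post, and author into the session.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Board(Base):
    __tablename__ = "board"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    threads = relationship("Thread", back_populates="board")


class Thread(Base):
    __tablename__ = "thread"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    board_id = Column(Integer, ForeignKey("board.id"))
    board = relationship("Board", back_populates="threads")
    posts = relationship("Post", back_populates="thread")
    author_id = Column(Integer, ForeignKey("author.id"))
    author = relationship("Author", back_populates="threads")


class Post(Base):
    __tablename__ = "post"
    id = Column(Integer, primary_key=True)
    content = Column(Text)
    thread_id = Column(Integer, ForeignKey("thread.id"))
    thread = relationship("Thread", back_populates="posts")
    author_id = Column(Integer, ForeignKey("author.id"))
    author = relationship("Author", back_populates="posts")


class Author(Base):
    __tablename__ = "author"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    posts = relationship("Post", back_populates="author")
    threads = relationship("Thread", back_populates="author")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

alice = Author(name="alice")
board = Board(name="News")
thread = Thread(title="Hello", author=alice)               # scalar: assign
thread.posts.append(Post(content="first", author=alice))   # collection: append
board.threads.append(thread)                               # collection: append
session.add(board)  # thread, post and author cascade in via the relationships
session.commit()
print(thread.board_id == board.id, thread.author_id == alice.id)  # True True
```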

sqlalchemy: column_prefix causes issues accessing model attributes

I searched without success for a way to get the integer or boolean value from a model object created with SQLAlchemy.
Adding the object works flawlessly, but I can't get the integer or boolean value back: when I try to print it, all I get is the object name:
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, Boolean, Sequence
from sqlalchemy.orm import mapper, sessionmaker
from sqlalchemy.ext.declarative import declarative_base
import json
class Bookmarks(object):
    pass
#----------------------------------------------------------------------
engine = create_engine('postgresql://u:p@localghost/asd', echo=True)
Base = declarative_base()
class Tramo(Base):
    __tablename__ = 'tramos'
    __mapper_args__ = {'column_prefix': 'tramos'}
    id = Column(Integer, primary_key=True)
    nombre = Column(String)
    tramo_data = Column(String)
    estado = Column(Boolean, default=True)
    def __init__(self, nombre, tramo_data):
        self.nombre = nombre
        self.tramo_data = tramo_data
    def __repr__(self):
        return "[id:%s][nombre:%s][tramo:%s]" % (getattr(self, 'id'), self.nombre, self.tramo_data)
Session = sessionmaker(bind=engine)
session = Session()
tabla = Tramo.__table__
metadata = Base.metadata
metadata.create_all(engine)
b = Tramo('tramo1', 'adadas')
session.add(b)
session.commit()
print(b)
print(b.id)
It prints:
[id:tramos.id][nombre:tramo1][tramo:adadas]
tramos.id
I can't get it to print the id value. It looks like the column object is in there, but it doesn't return the value of the property.
I even called
session.refresh(b)
after the add, but the result is the same.
According to the documentation (Naming All Columns with a Prefix), column_prefix applies a:
...prefix to the mapped attribute names relative to the (table) column name ...
Since you define the mapped attributes in your class, I do not think it does what you desire.
Solution-1: remove the 'column_prefix':'tramos' from your __mapper_args__
Solution-2: print(b.tramosid) will print its id. You would need to change the __repr__ accordingly:
def __repr__(self):
    return "[id:%s][nombre:%s][tramo:%s]" % (getattr(self, 'tramosid'), self.nombre, self.tramo_data)
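For reference, column_prefix is aimed at mappings where the Table is defined separately, so the mapper names the attributes itself. Here is a minimal sketch of that usage, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database, with "_" as an illustrative prefix. Every column becomes an underscore-prefixed attribute (_id, _nombre, _estado), and those attributes hold the actual values.

```python
from sqlalchemy import Boolean, Column, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# The table is declared on its own; the mapper then creates one attribute
# per column and prepends column_prefix to each attribute name.
tramos_table = Table(
    "tramos", Base.metadata,
    Column("id", Integer, primary_key=True),
    Column("nombre", String),
    Column("estado", Boolean, default=True),
)


class Tramo(Base):
    __table__ = tramos_table
    __mapper_args__ = {"column_prefix": "_"}


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

b = Tramo(_nombre="tramo1")
session.add(b)
session.commit()
print(b._id, b._nombre, b._estado)  # 1 tramo1 True
```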

SQLAlchemy inheritance with relationship is None in instantiated object

I would like to have a 'relationship' in an inherited (mixin) class.
However, when I create the inherited object, the relationship object is None. I cannot append to it.
How do I resolve this?
Here is code based upon the documentation:
from sqlalchemy import Column, Integer, String, DateTime, Boolean, BigInteger, Float
from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship, backref
from sqlalchemy.ext.declarative import declared_attr
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Target(Base):
    __tablename__ = "target"
    id = Column(Integer, primary_key=True)
class RefTargetMixin(object):
    @declared_attr
    def target_id(cls):
        return Column('target_id', ForeignKey('target.id'))
    @declared_attr
    def target(cls):
        return relationship("Target",
            primaryjoin="Target.id==%s.target_id" % cls.__name__
        )
class Foo(RefTargetMixin, Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
print(repr(RefTargetMixin.target))
print(repr(Foo.target))
print(repr(Foo().target))
The output is:
<sqlalchemy.orm.properties.RelationshipProperty object at 0x24e7890>
<sqlalchemy.orm.attributes.InstrumentedAttribute object at 0x24e7690>
None
In general, I should be able to append to the relationship object (target), but here I cannot because it is None. Why?
The reason the value is None is that you've defined this as a many-to-one relationship. Many-to-one, from parent to child, means there is a foreign key on the parent, which can only refer to one and only one child. If you'd like something of class RefTargetMixin to refer to a collection of items, then the foreign keys must be on the remote side.
So then the goal here is to make any object that is a subclass of RefTargetMixin be a potential parent for a Target. This pattern is called the polymorphic association pattern. While it is common in many ORM toolkits to provide this by declaring a "polymorphic foreign key" on Target, this is not a good practice relationally, so the answer is to use multiple tables in some way. There are three scenarios for this provided in SQLAlchemy core in the examples/generic_association folder, including "single association table with discriminator", "table per association", and "table per related". Each pattern provides the identical declarative pattern for RefTargetMixin here but the structure of the tables changes.
For example, here is your model using "table per association", which in my view tends to scale the best provided you don't need to query multiple types of RefTargetMixin objects at once (note I literally used the example as is, just changed the names):
from sqlalchemy.ext.declarative import declarative_base, declared_attr
from sqlalchemy import create_engine, Integer, Column, \
    String, ForeignKey, Table
from sqlalchemy.orm import Session, relationship
class Base(object):
    """Base class which provides automated table name
    and surrogate primary key column.
    """
    @declared_attr
    def __tablename__(cls):
        return cls.__name__.lower()
    id = Column(Integer, primary_key=True)
Base = declarative_base(cls=Base)
class Target(Base):
    pass
class RefTargetMixin(object):
    @declared_attr
    def targets(cls):
        target_association = Table(
            "%s_targets" % cls.__tablename__,
            cls.metadata,
            Column("target_id", ForeignKey("target.id"),
                   primary_key=True),
            Column("%s_id" % cls.__tablename__,
                   ForeignKey("%s.id" % cls.__tablename__),
                   primary_key=True),
        )
        return relationship(Target, secondary=target_association)
class Customer(RefTargetMixin, Base):
    name = Column(String)
class Supplier(RefTargetMixin, Base):
    company_name = Column(String)
engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([
    Customer(
        name='customer 1',
        targets=[
            Target(),
            Target()
        ]
    ),
    Supplier(
        company_name="Ace Hammers",
        targets=[
            Target(),
        ]
    ),
])
session.commit()
for customer in session.query(Customer):
    for target in customer.targets:
        print(target)
This is the normal behaviour: Foo has one Target. When you create the Foo object, it has no Target yet, so the value of Foo().target is None.
If you want Foo to have multiple Targets, you should put a foo_id in Target, and not a target_id in Foo, and use a backref.
Also, in that case, it is not needed to specify the primary join.
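Here is a minimal sketch of that suggestion, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The foreign key foo_id lives on Target, and the relationship is declared on Target with a backref named targets; that name is an assumption. No primaryjoin is needed. Foo().targets is then an empty list you can append to, rather than None.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, backref, declarative_base, relationship

Base = declarative_base()


class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)


class Target(Base):
    __tablename__ = "target"
    id = Column(Integer, primary_key=True)
    # The FK sits on the "many" side, so Foo gets a collection via the backref.
    foo_id = Column(Integer, ForeignKey("foo.id"))
    foo = relationship(Foo, backref=backref("targets"))


foo = Foo()
print(foo.targets)  # []  (an empty collection, not None)
foo.targets.append(Target())
foo.targets.append(Target())

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add(foo)  # the Target objects cascade in through the backref collection
session.commit()
print([t.foo_id for t in foo.targets])  # [1, 1]
```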

How to automatically add a SQLAlchemy object to the session?

I have a SQLAlchemy table class created using the Declarative method:
mysqlengine = create_engine(dsn)
session = scoped_session(sessionmaker(bind=mysqlengine))
Base = declarative_base()
Base.metadata.bind = mysqlengine
class MyTable(Base):
    __table_args__ = {'autoload': True}
Now, when using this table in my code, I would like to avoid calling session.add to put each new record into the active session, so instead of:
row = MyTable(1, 2, 3)
session.add(row)
session.commit()
I would like to have:
row = MyTable(1, 2, 3)
session.commit()
Now, I know of this question already: Possible to add an object to SQLAlchemy session without explicit session.add()?
And, I realize you can force this behavior by doing the following:
class MyTable(Base):
    def __init__(self, *args, **kw):
        super(MyTable, self).__init__(*args, **kw)
        session.add(self)
However, I do not want to bloat my code, which contains 30 tables, with this method. I also know that Elixir (http://elixir.ematia.de/trac/wiki) does this, so it must be possible somehow.
Super simple. Use an event:
from sqlalchemy import event, Integer, Column, String
from sqlalchemy.orm import scoped_session, sessionmaker, mapper
from sqlalchemy.ext.declarative import declarative_base
Session = scoped_session(sessionmaker())
@event.listens_for(mapper, 'init')
def auto_add(target, args, kwargs):
    Session.add(target)
Base = declarative_base()
class A(Base):
    __tablename__ = "a"
    id = Column(Integer, primary_key=True)
    data = Column(String)
a1 = A(data="foo")
assert a1 in Session()
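Here is a self-contained sketch of the same event approach against a real database, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. It listens on the Mapper class rather than the mapper() function, because mapper() was removed in SQLAlchemy 2.0 while listening on Mapper works in both versions. MyTable gets explicit columns instead of autoload only so that the example runs on its own. The listener adds every newly constructed mapped object to the session, including objects you never meant to persist.

```python
from sqlalchemy import Column, Integer, String, create_engine, event, func, select
from sqlalchemy.orm import Mapper, declarative_base, scoped_session, sessionmaker

engine = create_engine("sqlite://")
Session = scoped_session(sessionmaker(bind=engine))
Base = declarative_base()


@event.listens_for(Mapper, "init")
def auto_add(target, args, kwargs):
    # Fires when any mapped class is constructed via __init__
    # (not when rows are loaded from the database).
    Session.add(target)


class MyTable(Base):
    __tablename__ = "mytable"
    id = Column(Integer, primary_key=True)
    data = Column(String)


Base.metadata.create_all(engine)

MyTable(data="foo")
MyTable(data="bar")
Session.commit()  # no explicit Session.add() calls were made

print(Session.execute(select(func.count()).select_from(MyTable)).scalar())  # 2
```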
