SQLAlchemy doesn't correctly create in-memory database - python

I'm building an API with FastAPI and SQLAlchemy, and I'm seeing strange behaviour when the SQLite database is in-memory that doesn't occur when it is stored as a file.
Model:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String
Base = declarative_base()
class Thing(Base):
    __tablename__ = "thing"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String)
I create two global engine objects, one backed by a file and the other by an in-memory database:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
args = dict(echo=True, connect_args={"check_same_thread": False})
engine1 = create_engine("sqlite:///db.sqlite", **args)
engine2 = create_engine("sqlite:///:memory:", **args)
Session1 = sessionmaker(bind=engine1)
Session2 = sessionmaker(bind=engine2)
I create my FastAPI app and a path to add an object to database:
from fastapi import FastAPI
app = FastAPI()
@app.get("/{x}")
def foo(x: int):
    with {1: Session1, 2: Session2}[x]() as session:
        session.add(Thing(name="foo"))
        session.commit()
My main to simulate requests and check everything is working:
from fastapi.testclient import TestClient
if __name__ == "__main__":
    Base.metadata.create_all(engine1)
    Base.metadata.create_all(engine2)
    client = TestClient(app)
    assert client.get("/1").status_code == 200
    assert client.get("/2").status_code == 200
The thing table is created and committed on engine1, and likewise on engine2. On the first request, "foo" is successfully inserted into engine1's database (stored as a file), but the second request raises sqlite3.OperationalError with the message "no such table: thing".
Why do the two behave differently? Why does the in-memory database claim the table doesn't exist even though the SQLAlchemy logs show the CREATE TABLE statement ran successfully and was committed?

The docs explain this here: https://docs.sqlalchemy.org/en/14/dialects/sqlite.html#using-a-memory-database-in-multiple-threads
To use a :memory: database in a multithreaded scenario, the same connection object must be shared among threads, since the database exists only within the scope of that connection. The StaticPool implementation will maintain a single connection globally, and the check_same_thread flag can be passed to Pysqlite as False
The docs also show how to get the intended behavior. In your case:
from sqlalchemy.pool import StaticPool
args = dict(echo=True, connect_args={"check_same_thread": False}, poolclass=StaticPool)
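The quoted point, that a :memory: database exists only within the scope of one connection, can be reproduced without SQLAlchemy. The following is a minimal sketch using only the standard-library sqlite3 module; the names first_connection, second_connection and error_message are illustrative and not from the question. It assumes the failing request ends up on a different connection than the one that ran create_all, which is presumably what happens when the request is served on another thread.

```python
import sqlite3

# Each connect(":memory:") call opens its own private, empty database.
first_connection = sqlite3.connect(":memory:")
second_connection = sqlite3.connect(":memory:")

# Create the table through the first connection only.
first_connection.execute("CREATE TABLE thing (id INTEGER PRIMARY KEY, name TEXT)")
first_connection.commit()

# The second connection has a different database, so the table is missing there.
try:
    second_connection.execute("INSERT INTO thing (name) VALUES ('foo')")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(error_message)

first_connection.close()
second_connection.close()
```

With poolclass=StaticPool every session reuses the same single connection, so the situation sketched above should no longer arise.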

Related

SQLAlchemy: automap_base in a forking code

I'm developing an API server that talks to a MySQL database, reflects its schema, and also forks into multiple processes. My database code looks like this:
from sqlalchemy import MetaData
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm.session import Session
my_engine = create_engine_by_info(my_config)
metadata = MetaData(bind=my_engine)
Base: type = automap_base(metadata=metadata)
class User(Base):
    __tablename__ = 'auth_user'
    # Relation descriptions...
# Other classes...
Base.prepare(my_engine, reflect=True)
def find_user(field):
    with Session(my_engine) as session:
        query = session.query(User)
        query = query.filter(User.field == field)
        records = query.all()
        for u in records:
            return u
        return None
And it works fine until the process gets forked: after the child process has done its work, the original process loses its connection: Lost connection to MySQL server during query.
I guess I should keep a separate my_engine for each process (e.g. a function with a dict of engines keyed by PID), but how can I do that if my class definitions require an engine at the beginning? I could probably move the classes into a function too, but that would be hell... So, what is a good solution here?

How use pytest to unit test sqlalchemy orm classes

I want to write some py.test code to test 2 simple SQLAlchemy ORM classes that were created based on this Tutorial. The problem is, how do I point py.test at a test database and roll back all changes when the tests are done? Is it possible to mock the database and run tests without actually connecting to the database?
Here is the code for my classes:
from sqlalchemy import create_engine, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import sessionmaker, relationship
eng = create_engine('mssql+pymssql://user:pass@host/my_database')
Base = declarative_base(eng)
Session = sessionmaker(eng)
intern_session = Session()
class Author(Base):
    __tablename__ = "Authors"
    AuthorId = Column(Integer, primary_key=True)
    Name = Column(String)
    Books = relationship("Book")
    def add_book(self, title):
        b = Book(Title=title, AuthorId=self.AuthorId)
        intern_session.add(b)
        intern_session.commit()
class Book(Base):
    __tablename__ = "Books"
    BookId = Column(Integer, primary_key=True)
    Title = Column(String)
    AuthorId = Column(Integer, ForeignKey("Authors.AuthorId"))
    Author = relationship("Author")
I usually do that this way:
I do not instantiate engine and session with the model declarations, instead I only declare a Base with no bind:
Base = declarative_base()
and I only create a session when needed with
engine = create_engine('<the db url>')
Session = sessionmaker(bind=engine)
db_session = Session()
You can do the same by giving your add_book method a session parameter instead of using intern_session:
def add_book(self, session, title):
    b = Book(Title=title, AuthorId=self.AuthorId)
    session.add(b)
    session.commit()
It makes your code more testable, since you can now pass the session of your choice when you call the method.
And you are no longer stuck with a session bound to a hardcoded database URL.
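As a rough illustration of what passing the session of your choice can look like, the sketch below binds the models to an in-memory SQLite database instead of the MSSQL server. It assumes SQLAlchemy 1.4 or later (for sqlalchemy.orm.declarative_base); the SQLite URL, the sample author name and the book title are made up for the example, and the relationships from the question are left out to keep it short.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()  # no engine bound at declaration time


class Author(Base):
    __tablename__ = "Authors"
    AuthorId = Column(Integer, primary_key=True)
    Name = Column(String)

    def add_book(self, session, title):
        # The caller decides which session (and therefore which database) is used.
        book = Book(Title=title, AuthorId=self.AuthorId)
        session.add(book)
        session.commit()


class Book(Base):
    __tablename__ = "Books"
    BookId = Column(Integer, primary_key=True)
    Title = Column(String)
    AuthorId = Column(Integer, ForeignKey("Authors.AuthorId"))


# Assumption: a throwaway in-memory SQLite database stands in for the test database.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

author = Author(Name="Example Author")
session.add(author)
session.commit()

author.add_book(session, "Example Title")
titles = [book.Title for book in session.query(Book).all()]
print(titles)

session.close()
engine.dispose()
```

Because nothing in the model module refers to a concrete engine, the same classes can be used against the production URL and a test URL without modification.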
I add a custom --dburl option to pytest using its pytest_addoption hook.
Simply add this to your top-level conftest.py:
def pytest_addoption(parser):
    parser.addoption('--dburl',
                     action='store',
                     default='<if needed, whatever you want>',
                     help='url of the database to use for tests')
Now you can run pytest --dburl <url of the test database>
Then I can retrieve the dburl option from the request fixture
From a custom fixture:
import pytest
@pytest.fixture()
def db_url(request):
    return request.config.getoption("--dburl")
# ...
Inside a test:
def test_something(request):
    db_url = request.config.getoption("--dburl")
    # ...
At this point you are able to:
get the test db_url in any test or fixture
use it to create an engine
create a session bound to the engine
pass the session to a tested method
It is quite a mess to do this in every test, so you can use pytest fixtures to ease the process.
Below are some fixtures I use:
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
@pytest.fixture(scope='session')
def db_engine(request):
    """yields a SQLAlchemy engine which is disposed of after the test session"""
    db_url = request.config.getoption("--dburl")
    engine_ = create_engine(db_url, echo=True)
    yield engine_
    engine_.dispose()
@pytest.fixture(scope='session')
def db_session_factory(db_engine):
    """returns a SQLAlchemy scoped session factory"""
    return scoped_session(sessionmaker(bind=db_engine))
@pytest.fixture(scope='function')
def db_session(db_session_factory):
    """yields a SQLAlchemy session which is rolled back after the test"""
    session_ = db_session_factory()
    yield session_
    session_.rollback()
    session_.close()
Using the db_session fixture you get a fresh, clean session for each single test.
When the test ends, the session is rolled back, keeping the database clean.
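The effect of the rollback in the db_session fixture can be sketched outside pytest. The example below is only an approximation under stated assumptions: SQLAlchemy 1.4 or later, an in-memory SQLite database instead of the --dburl target, and a hypothetical Note model that is not part of the original answer. It also assumes the code under test flushes but does not commit; a method that calls session.commit() itself, such as add_book above, would persist its rows before the rollback runs.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Note(Base):  # hypothetical model used only for this illustration
    __tablename__ = "notes"
    id = Column(Integer, primary_key=True)
    body = Column(String)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# What a test body might do: add a row and flush it, without committing.
session.add(Note(body="temporary"))
session.flush()
rows_during_test = session.query(Note).count()

# What the db_session fixture does after the yield.
session.rollback()
rows_after_rollback = session.query(Note).count()

print(rows_during_test, rows_after_rollback)

session.close()
engine.dispose()
```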

SQLAlchemy not creating tables

I am trying to set up a database just like in a tutorial, but I am getting a programming error saying that a table doesn't exist when I try to add a User.
This is the file that errors (database.py):
from sqlalchemy import create_engine, MetaData
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.ext.declarative import declarative_base
engine = create_engine(
    "mysql+pymysql://testuser:testpassword@localhost/test?charset=utf8",
    connect_args={
        "port": 3306
    },
    echo="debug",
    echo_pool=True
)
db_session = scoped_session(
    sessionmaker(
        bind=engine,
        autocommit=False,
        autoflush=False
    )
)
Base = declarative_base()
def init_db():
    import models
    Base.metadata.create_all(bind=engine)
    from models import User
    db_session.add(
        User(username="testuser", password_hash=b"", password_salt=b"", balance=1)
    )
    db_session.commit()
    print("Initialized the db")
if __name__ == "__main__":
    init_db()
To init the database (create the tables) I just run the file.
It errors when it creates the test user.
Here is models.py:
from sqlalchemy import Column, Integer, Numeric, Binary, String
from sqlalchemy.orm import relationship
from database import Base
class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    username = Column(String(16), unique=True)
    password_hash = Column(Binary(32))
    password_salt = Column(Binary(32))
    balance = Column(Numeric(precision=65, scale=8))
    def __repr__(self):
        return "<User(balance={})>".format(self.balance)
I tried:
Committing before adding users (after create_all)
Drop existing tables from the database (although it seems like the table never gets committed)
from models import User instead of import models (before create_all)
Sorry if there are so many similar questions. I promise I scavenged for answers, but they always come down to silly mistakes that I made sure I didn't make (or at least the ones I saw).
I am using MariaDB.
Sorry for long post, many thanks in advance.
The Base in database.py isn't the same Base that is imported into models.py.
A simple test is to put a print('creating Base') function call just above the Base = declarative_base() statement, and you'll see it is being created twice.
Python calls the module that is being executed '__main__', which you know as you have the if __name__ == '__main__' conditional at the bottom of your module. So the first Base that is created is __main__.Base. Then, in models.py, from database import Base causes the database module to be executed a second time, this time under the name database, creating database.Base, and that is the Base from which User inherits. Then back in database.py, the Base.metadata.create_all(bind=engine) call is using the metadata from __main__.Base, which has no tables in it, and as such creates nothing.
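The double execution described above can be reproduced with the standard library alone. The sketch below is an illustration, not the asker's code: it writes a hypothetical database_demo.py containing only a plain Base class, runs it once as __main__ (roughly what executing the file directly does) and then imports it under its real name.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for database.py: it only defines a Base class.
MODULE_SOURCE = "class Base:\n    pass\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    module_path = Path(tmp_dir) / "database_demo.py"
    module_path.write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp_dir)
    try:
        # Roughly what running the file directly does: it executes as __main__.
        main_namespace = runpy.run_path(str(module_path), run_name="__main__")
        # What an import from another module does: the same file is executed
        # again, this time as the module "database_demo".
        imported_module = importlib.import_module("database_demo")
    finally:
        sys.path.remove(tmp_dir)
        sys.modules.pop("database_demo", None)

# Two executions of the file produce two distinct Base classes.
same_base = main_namespace["Base"] is imported_module.Base
print(same_base)
```

The two Base objects are distinct, so tables registered on one are not visible through the other's metadata.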
Don't execute out of the module that creates the Base instance. Create another module called main.py (or whatever), move your init_db() function there, and import Base, db_session and engine from database.py into main.py. That way, you are always using the same Base instance. This is an example of main.py:
from database import Base, db_session, engine
from models import User
def init_db():
    Base.metadata.create_all(bind=engine)
    db_session.add(
        User(username="testuser", password_hash=b"", password_salt=b"", balance=1)
    )
    db_session.commit()
    print("Initialized the db")
if __name__ == "__main__":
    init_db()
Declare the Base class once (per database) and import it into every module that defines table classes (classes inheriting from Base).
Base.metadata only knows about a table class once the module defining it has been imported, so import every such module into the module where you call Base.metadata.create_all(engine).
In other words, you need to import the relevant model where you call Base.metadata.create_all. The example below creates the user table:
from ModelBase import Base
from UserModel import User
def create_db_schema(engine):
    Base.metadata.create_all(engine, checkfirst=True)
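Why the import matters can be seen by inspecting Base.metadata.tables before and after a model class is defined. This is a minimal sketch assuming SQLAlchemy 1.4 or later; the User model here is a cut-down, hypothetical stand-in defined inline rather than imported from a separate module.

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

# Nothing has subclassed Base yet, so create_all() would have nothing to create.
tables_before = sorted(Base.metadata.tables)


class User(Base):  # hypothetical model, standing in for an imported module
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    username = Column(String(16), unique=True)


# Defining (or importing) the class registers its table on Base.metadata.
tables_after = sorted(Base.metadata.tables)

print(tables_before, tables_after)
```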

pandas.read_sql Read uncommitted with SQLAlchemy

I am trying to use the pandas function pd.read_sql to read records that have been created, added, and flushed in a SQLAlchemy session, but not committed. So I want to create an object in a SQLAlchemy session and query it with pandas before calling commit. Using pandas 0.22.0 and SQLAlchemy 1.1.10.
I have tried setting the isolation_level on create_engine, and various other ways of setting the isolation level to 'READ UNCOMMITTED', but this does not seem to work. Minimal example below:
# Import packages
import pandas as pd
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker
# Set up an example ORM
Base = declarative_base()
class Record(Base):
    __tablename__ = 'records'
    id = Column(Integer, primary_key=True)
    foo = Column(String(255))
# Create a session and engine:
database='foobar'
user=''
password = ''
host = 'localhost'
port = '5432'
connection_string = f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}"
engine = create_engine(connection_string, encoding='utf8', convert_unicode=True,
                       isolation_level='READ_UNCOMMITTED')
session = sessionmaker()
session.configure(bind=engine)
db = session()
# Set up the example record:
Record.__table__.create(bind=engine)
record = Record(foo='bar')
db.add(record)
db.flush()
# Attempt to query:
records = pd.read_sql('select * from records', db.get_bind())
assert records.empty
I am looking for a solution that will cause the above code to throw an AssertionError on the last line. records.empty currently evaluates to true.
And of course I figure it out as soon as I post here. For posterity: use db.connection() instead of db.get_bind().
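A likely reason this works: flushed but uncommitted rows live inside the session's open transaction, so they are visible on the session's own connection (db.connection()) but not on a fresh connection checked out from the engine (db.get_bind()). PostgreSQL also treats READ UNCOMMITTED as READ COMMITTED, so the isolation setting alone would not expose them. The sketch below illustrates the same visibility rule with the standard-library sqlite3 module and a temporary file; it is a stand-in for the PostgreSQL and pandas setup, and the names writer and reader are illustrative.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "demo.sqlite")

    writer = sqlite3.connect(db_path)
    writer.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, foo TEXT)")
    writer.commit()

    # Comparable to db.add(record); db.flush(): the row is written inside an
    # open transaction that has not been committed yet.
    writer.execute("INSERT INTO records (foo) VALUES ('bar')")

    # A second connection, comparable to one obtained from the engine.
    reader = sqlite3.connect(db_path)
    rows_seen_by_other_connection = reader.execute(
        "SELECT COUNT(*) FROM records"
    ).fetchone()[0]

    # The connection that owns the transaction, comparable to db.connection().
    rows_seen_by_same_connection = writer.execute(
        "SELECT COUNT(*) FROM records"
    ).fetchone()[0]

    reader.close()
    writer.rollback()
    writer.close()

print(rows_seen_by_other_connection, rows_seen_by_same_connection)
```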

Instantiating object automatically adds to SQLAlchemy Session. Why?

From my understanding of SQLAlchemy, in order to add a model to a session, I need to call session.add(obj). However, for some reason, in my code, SQLAlchemy seems to do this automatically.
Why is it doing this, and how can I stop it? Am I approaching session in the correct way?
example
>>> from database import Session as db
>>> import clients
>>> from instances import Instance
>>> from uuid import uuid4
>>> len(db.query(Instance).all())
0  # Note, no instances in database/session
>>> i = Instance(str(uuid4()), clients.get_by_code('AAA001'), [str(uuid4())])
>>> len(db.query(Instance).all())
1  # Why?? I never called db.add(i)!
database.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.ext.declarative import declarative_base
import config
Base = declarative_base()
class Database():
    def __init__(self):
        db_url = 'postgresql://{:s}:{:s}@{:s}:{}/{:s}'.format(
            config.database['user'],
            config.database['password'],
            config.database['host'],
            config.database['port'],
            config.database['dbname']
        )
        self.engine = create_engine(db_url)
        session_factory = sessionmaker(bind=self.engine)
        self.session = scoped_session(session_factory)
Database = Database()
Session = Database.session
instance.py
from sqlalchemy import Column, Text, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.dialects.postgresql import UUID, ARRAY
import database
Base = database.Base
class Instance(Base):
    __tablename__ = 'instances'
    uuid = Column(UUID, primary_key=True)
    client_code = Column(
        Text, ForeignKey('clients.code', ondelete='CASCADE'), nullable=False)
    mac_addresses = Column(ARRAY(Text, as_tuple=True),
                           primary_key=True)
    client = relationship("Client", back_populates="instances")
    def __init__(self, uuid, client, mac_addresses):
        self.uuid = uuid
        self.client = client
        self.mac_addresses = tuple(mac_addresses)
client.py
from sqlalchemy import Column, Text
from sqlalchemy.orm import relationship
import database
from database import Session as db
Base = database.Base
class Client(Base):
    __tablename__ = 'clients'
    code = Column(Text, primary_key=True)
    name = Column(Text)
    instances = relationship("Instance", back_populates='client')
    def __init__(self, code, name=None):
        self.code = code
        self.name = name
def get_by_code(code):
    client = db.query(Client).filter(Client.code == code).first()
    return client
When you create a SQLAlchemy object and link it directly to another SQLAlchemy object, both objects end up in the session.
The reason is that SQLAlchemy needs to make sure you can query these objects.
Take, for example, a user with addresses.
If you create a user in code, with an address, both the user and the address end up in the session, because the address is linked to the user and SQLAlchemy needs to make sure you can query all addresses of a user using user.addresses.all().
In that case all (possibly) existing addresses need to be fetched, as well as the new address you just added. For that purpose the newly added address needs to be saved in the database.
To prevent this from happening (for example if you only need objects to just calculate with), you can link the objects with their IDs/Foreign Keys:
address.user_id = user.user_id
However, if you do this, you won't be able to access the SQLAlchemy properties anymore. So user.addresses or address.user will no longer yield results.
The reverse is also true; I asked a question myself a while back why linking two objects by ID will not result in SQLAlchemy linking these objects in the ORM:
relevant stackoverflow question
another description of this behavior
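The user and address example from this answer can be sketched as follows. It assumes SQLAlchemy 1.4 or later and an in-memory SQLite database; the User and Address models are hypothetical and only mirror the names used in the explanation. The sketch links the objects from the collection side (user.addresses.append), because whether assigning the reverse side (address.user = user) also pulls the object into the session may depend on the SQLAlchemy version.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class User(Base):  # hypothetical model for illustration
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    addresses = relationship("Address", back_populates="user")


class Address(Base):  # hypothetical model for illustration
    __tablename__ = "addresses"
    id = Column(Integer, primary_key=True)
    email = Column(String)
    user_id = Column(Integer, ForeignKey("users.id"))
    user = relationship("User", back_populates="addresses")


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    user = User(name="alice")
    session.add(user)
    session.flush()  # assigns user.id

    # Linking through the relationship cascades the new object into the session.
    linked = Address(email="a@example.com")
    user.addresses.append(linked)
    linked_in_session = linked in session

    # Linking only through the foreign key column leaves the object detached
    # from the session until session.add() is called explicitly.
    by_key = Address(email="b@example.com", user_id=user.id)
    by_key_in_session = by_key in session

print(linked_in_session, by_key_in_session)
```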
