I'm looking for a way to be able to test code using pytest as well as use that code in production, and I'm struggling with session handling.
For pytest, I have a conftest.py that includes:
@pytest.fixture
def session(setup_database, connection):
    transaction = connection.begin()
    yield scoped_session(
        sessionmaker(autocommit=False, autoflush=False, bind=connection)
    )
    transaction.rollback()
That allows me to write low-level tests using a test database along the lines of:
def test_create(session):
    thing = Things(session, "my thing")
    assert thing
...where Things is a SQLAlchemy declarative model class defining a database table. This works fine.
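For context, here is a purely hypothetical sketch of what such a model might look like, given the Things(session, "my thing") usage above (the real model isn't shown in the question):
# Hypothetical sketch only: the question does not show the real model.
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Things(Base):
    __tablename__ = "things"

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)

    def __init__(self, session, name):
        # Matches the Things(session, "my thing") call: the instance
        # registers itself with the session it is given.
        self.name = name
        session.add(self)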
The problem I'm trying to solve arises when testing higher levels of the code. The models.py includes:
engine = sqlalchemy.create_engine(
    Config.MYSQL_CONNECT,
    encoding='utf-8',
    pool_pre_ping=True)
Session = scoped_session(sessionmaker(bind=engine))
...and the usage in the code is typically:
def fn():
    with Session() as session:
        thing = Things(session, "my thing")
I want fn() to use the Session defined in models.py in production, but use the pytest Session in testing.
I clearly have this architected incorrectly but I'm struggling to find a way forwards for what must be quite a common problem.
How do others handle this?
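One pattern that may help here (a sketch, not necessarily the canonical answer): keep fn() importing models.Session, and repoint that name at the transactional test factory inside a fixture. This assumes fn() resolves Session through the models module at call time rather than via a from-import in another module.
# Sketch only: repoint models.Session at the test-scoped factory so the
# production code path is untouched. Assumes the `session` fixture above.
import pytest
import models

@pytest.fixture
def use_test_session(session, monkeypatch):
    monkeypatch.setattr(models, "Session", session)
    yield session

def test_fn(use_test_session):
    fn()  # fn() now picks up the test-scoped, rollback-wrapped Session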
Related
I have a FastAPI application where I have several tests written with pytest.
Two particular tests are causing me issues: test_a calls a POST endpoint that creates a new entry in the database, and test_b gets these entries. test_b's results include the entry created by test_a, which is not the desired behaviour.
When I run each test individually (using VS Code's testing tab) it passes. However, when all the tests run together and test_a runs before test_b, test_b fails.
My conftest.py looks like this:
import pytest
from fastapi.testclient import TestClient
from sqlmodel import Session, SQLModel, create_engine
from application.core.config import get_database_uri
from application.core.db import get_db
from application.main import app
@pytest.fixture(scope="module", name="engine")
def fixture_engine():
    engine = create_engine(
        get_database_uri(uri="postgresql://user:secret@localhost:5432/mydb")
    )
    SQLModel.metadata.create_all(bind=engine)
    yield engine
    SQLModel.metadata.drop_all(bind=engine)

@pytest.fixture(scope="function", name="db")
def fixture_db(engine):
    connection = engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    yield session
    session.close()
    transaction.rollback()
    connection.close()

@pytest.fixture(scope="function", name="client")
def fixture_client(db):
    app.dependency_overrides[get_db] = lambda: db
    with TestClient(app) as client:
        yield client
The file containing test_a and test_b also has a module-scoped pytest fixture that seeds the data using the engine fixture:
#pytest.fixture(scope="module", autouse=True)
def seed(engine):
connection = test_db_engine.connect()
seed_data_session = Session(bind=connection)
seed_data(seed_data_session)
yield
seed_data_session.rollback()
All tests use the client fixture, like so:
def test_a(client):
    ...
SQLAlchemy version is 1.4.41, FastAPI version is 0.78.0, and pytest version is 7.1.3.
My Observations
It seems the tests pass on their own because SQLModel.metadata.drop_all(bind=engine) is called at the end of the module. However, I would like to avoid this and rely only on rollbacks between tests.
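For reference, the SQLAlchemy 1.4 documentation has a recipe for exactly this kind of rollback-only isolation ("joining a Session into an external transaction"). Below is a sketch of that recipe adapted to the fixtures above; it is an illustration under those assumptions, not tested against this particular app.
# Sketch of the SQLAlchemy 1.4 "external transaction" recipe, adapted to the
# fixtures above. Anything the app commits lands in a SAVEPOINT that is
# discarded with the outer rollback after each test, so drop_all() is no
# longer needed for isolation.
import pytest
from sqlalchemy import event
from sqlmodel import Session

@pytest.fixture(scope="function", name="db")
def fixture_db(engine):
    connection = engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)

    # Start a SAVEPOINT and restart it whenever the app's commit() ends it.
    nested = connection.begin_nested()

    @event.listens_for(session, "after_transaction_end")
    def end_savepoint(session_, transaction_):
        nonlocal nested
        if not nested.is_active:
            nested = connection.begin_nested()

    yield session

    session.close()
    transaction.rollback()
    connection.close()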
What worked really well for me is using testcontainers: https://github.com/testcontainers/testcontainers-python.
#pytest.fixture(scope="module", name="session_for_db_in_testcontainer")
def db_engine():
"""
Creates testcontainer with Postgres db
"""
pg_container = PostgresContainer('postgres:latest')
pg_container.start()
# Fireup the SQLModel engine with the uri of the container
db_engine = create_engine(pg_container.get_connection_url())
sqlmodel_metadata.create_all(db_engine)
with Session(db_engine) as session_for_db_in_testcontainer:
# add some rows to start, for test get requests and posting existing data
add_data_to_test_db(database_input_path, session_for_db_in_testcontainer)
yield session_for_db_in_testcontainer
# Will be executed after the last test
session_for_db_in_testcontainer.close()
pg_container.stop()
Like this, a (Postgres) DB is created in a container for the test run, and it only lives for a session, module, or function depending on the scope of the fixture. If you want, you can add test data to the DB as well, as in the example.
In your case you might want to set the scope of this fixture to function. Then test_a and test_b should run independently.
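To exercise the endpoints from the question against the container, the container-backed session can be wired in the same way as the original db fixture. A sketch, reusing app, get_db and TestClient from the question's conftest:
# Sketch: override the get_db dependency with the testcontainer-backed session
# so test_a and test_b hit the containerised Postgres instead of the real DB.
import pytest
from fastapi.testclient import TestClient
from application.core.db import get_db
from application.main import app

@pytest.fixture(name="client")
def fixture_client(session_for_db_in_testcontainer):
    app.dependency_overrides[get_db] = lambda: session_for_db_in_testcontainer
    with TestClient(app) as client:
        yield client
    app.dependency_overrides.clear()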
So this question is a little like "Does SQLAlchemy reset the database session between SQLAlchemy Sessions from the same connection?"
I have a Flask/SQLAlchemy/Postgres app, which intermittently seems to drop connections after a commit() that occurs as part of a POST request.
This causes me headaches as I rely upon a customized option (https://www.postgresql.org/docs/9.6/runtime-config-custom.html) to control row level security - in effect executing the following before each Flask request while utilising scoped sessions:
@app.before_request
def load_user():
    ...
    # Set up RLS.
    statement = f"SET app.permitted_workspace_id = '{workspace_id}'"
    db.db_session.execute(statement)
    ...
This pattern generally works fine, but occasionally seems to fail: as far as I can tell, after a commit(), SQLAlchemy releases the existing connection and checks out a new one, on which app.permitted_workspace_id is no longer set.
My workaround for this is to listen for connection checkout events, and then re-set the parameter:
@event.listens_for(db_engine, 'checkout')
def receive_checkout(dbapi_connection, connection_record, connection_proxy):
    ...
    cursor = dbapi_connection.cursor()
    statement = f"SET app.permitted_workspace_id = '{g.user.workspace_id}'"
    cursor.execute(statement)
    return
So my question is really: is it unavoidable that SQLAlchemy may close sessions after commit(), meaning I lose my session parameters - even with more DB work still to do?
If so, do we think this pattern is secure or even acceptable practice? Ideally, I'd keep the session open until removed (via @app.teardown_appcontext), but since I'm struggling to achieve that, and still have the relevant info available within the Flask request, I think this is the next best way to go.
Thanks
Edit 1:
In terms of session scoping, the layout is this:
In a database module, I lay out the following:
def get_database_connection():
    ...
    db_engine = sa.create_engine(
        f'postgresql://{user}:{password}@{host}/postgres',
        echo=False,
        poolclass=sa.pool.NullPool
    )
    # Connect - RLS is controlled by db_get_user_details.
    db_connection = db_engine.connect()
    db_session = scoped_session(
        sessionmaker(
            autocommit=False,
            autoflush=False,
            expire_on_commit=False,
            bind=db_engine
        )
    )
    return db_engine, db_session, db_connection
This is then called up top from inside the main Flask application:
db_engine, db_session, db_connection = db.get_database_connection()
And session removal is controlled by a function as follows:
@app.teardown_appcontext
def remove_session(exception=None):
    db_session.remove()
So the answer here seems to be that commit() does perform a check-in with this pattern:
https://github.com/sqlalchemy/sqlalchemy/issues/4925
"if Session is what you're working with then yes, the Session will release connections when any of commit(), rollback(), or close() is called."
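One alternative worth noting (a sketch, not taken from the linked issue): instead of re-setting the parameter at pool checkout, set it at the start of every ORM transaction with a session-level after_begin listener, and use set_config() so the value is bound rather than interpolated into the SQL string. This assumes db_session is the scoped_session returned by get_database_connection().
# Sketch: re-apply the RLS setting on whatever connection the session begins
# its next transaction on, so connections released by commit() no longer matter.
from flask import g
from sqlalchemy import event, text

@event.listens_for(db_session, "after_begin")
def set_rls_parameter(session, transaction, connection):
    # set_config() with a bound parameter avoids building the SET statement
    # via string formatting.
    connection.execute(
        text("SELECT set_config('app.permitted_workspace_id', :wid, false)"),
        {"wid": g.user.workspace_id},
    )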
I am currently working on a POC using FastAPI on a complex system. This project is heavy in business logic and will interact with 50+ different database tables when completed. Each model has a service, and some of the more complex business logic has its own service (which then interacts/queries with the different tables through the model-specific services).
While everything works, I've gotten some push-back from some members of my team regarding the dependency injection for the Session object. The biggest issue is having to pass the Session from the controller to a service, then to a second service and, in a few cases, to a third service further in. In those cases, the intermediary service functions tend to have no database queries themselves, but the functions they call on other services might. The complaint is mainly that this is harder to maintain, and that having to pass the DB object everywhere seems uselessly repetitive.
Example as code:
databases/mysql.py (one of 3 dbs in the project)
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, Session

def get_uri():
    return 'the mysql uri'

engine = create_engine(get_uri())
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base = declarative_base()

def get_db():
    db: Session = SessionLocal()
    try:
        yield db
        db.commit()
    except Exception:
        db.rollback()
    finally:
        db.close()
controllers/controller1.py
from fastapi import APIRouter, HTTPException, Path, Depends
from sqlalchemy.orm import Session
from services.mysql.bar import get_bar_by_id
from services.mysql.process_x import bar_process
from databases.mysql import get_db

router = APIRouter(prefix='/foo')

@router.get('/bar/{bar_id}')
def process_bar(bar_id: int = Path(..., title='The ID of the bar to process', ge=1),
                mysql_session: Session = Depends(get_db)):
    # From the controller, to a service which only runs a query. This is fine.
    bar = get_bar_by_id(bar_id, mysql_session)
    if bar is None:
        raise HTTPException(status_code=404,
                            detail='Bar not found for id: {bar_id}'.format(bar_id=bar_id))
    # This one calls a function in a service which has a lot of business logic but no queries
    processed_bar = bar_process(bar, mysql_session)
    return processed_bar
services/mysql/process_x.py
from .process_bar import process_the_bar
from models.mysql.w import W
from models.mysql.bar import Bar
from models.mysql.y import Y
from models.mysql.z import Z
from sqlalchemy.orm import Session

def w_process(w: W, mysql_session: Session):
    ...

def bar_process(bar: Bar, mysql_session: Session):
    # Very simplified, there's actually 5 conditional branching service calls here
    return process_the_bar(bar, mysql_session)

def y_process(y: Y, mysql_session: Session):
    ...

def z_process(z: Z, mysql_session: Session):
    ...
services/mysql/process_bar.py
from . import model_service1
from . import model_service2
from . import model_service3
from . import additional_rules_service
from libraries.bar_functions import do_thing_to_bar
from models.mysql.bar import Bar
from sqlalchemy.orm import Session

def process_the_bar(bar: Bar, mysql_session: Session):
    process_result = list()
    # Many processing steps, not all of them require db and might work on the bar directly
    process_result.append(process1(bar, mysql_session))
    process_result.append(process2(bar, mysql_session))
    process_result.append(process3(bar, mysql_session))
    process_result.append(process4(bar))
    # ... further processN(bar) steps ...
    process_result.append(processY(bar))
    return process_result

def process1(bar: Bar, mysql_session: Session):
    return model_service1.do_something(bar.val, mysql_session)

def process2(bar: Bar, mysql_session: Session):
    return model_service2.do_something(bar.val, mysql_session)

def process3(bar: Bar, mysql_session: Session):
    return model_service3.do_something(bar.val, mysql_session)

def process4(bar: Bar):
    # process4 through processY: do something using the bar library, or maybe
    # on another service with no queries
    return list()
As you can see, we're stuck passing the mysql_session and having it repeat everywhere when using this approach.
Here are two solutions I have thought of:
Adding the DB session to the Starlette request state
I could do this either through the app.startup event ( https://fastapi.tiangolo.com/advanced/events/ ) or a middleware. However, it does mean passing the request state back and forth in a similar fashion (if my understanding of it is correct)
Session scope approach using Context Manager
Pretty much, I would turn the get_db function into a context manager instead and not inject it as a dependency (see the sketch below). It is by far the cleanest end result, however it goes completely against the concept of sharing a single db session across the request.
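A minimal sketch of what that could look like (hypothetical, reusing SessionLocal from databases/mysql.py):
# Sketch of option 2: expose the session as a context manager that services
# open where needed, instead of receiving it through dependency injection.
from contextlib import contextmanager
from databases.mysql import SessionLocal

@contextmanager
def db_session():
    db = SessionLocal()
    try:
        yield db
        db.commit()
    except Exception:
        db.rollback()
        raise
    finally:
        db.close()

# Usage inside a service function, without a session parameter:
# with db_session() as session:
#     session.query(...)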
I've considered the fully async approach using encode/databases as shown in the FastAPI documentation ( https://fastapi.tiangolo.com/advanced/async-sql-databases/ ), however one of the databases we use through SQLAlchemy (Vertica) is accessed through a plugin and, I assume, does not support async out of the box. If I'm wrong, then I could consider the fully async approach.
So in the end, what I'm wondering is if it's possible to accomplish something "cleaner" without compromising the single session per request approach?
I have gotten some help directly from the FastAPI GitHub.
As user Insomnes mentioned, what I am looking to do can be achieved by using ContextVar. I have tried it in my code and it seems to work just fine.
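For reference, a minimal sketch of that ContextVar idea (names are illustrative, not the code from the GitHub discussion): a middleware opens one session per request and stores it in a ContextVar, and services read it without it being passed along.
# Sketch of the ContextVar approach; assumes SessionLocal from the question's
# databases/mysql.py and that `app` is the FastAPI instance (module name is an
# assumption). Services call current_session() instead of taking a
# mysql_session parameter.
from contextvars import ContextVar
from fastapi import Request
from sqlalchemy.orm import Session
from databases.mysql import SessionLocal
from main import app  # assumption: wherever the FastAPI instance lives

db_session_ctx: ContextVar[Session] = ContextVar("db_session")

@app.middleware("http")
async def db_session_middleware(request: Request, call_next):
    session = SessionLocal()
    token = db_session_ctx.set(session)
    try:
        response = await call_next(request)
        session.commit()
        return response
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()
        db_session_ctx.reset(token)

def current_session() -> Session:
    # Called from any service instead of receiving the session as a parameter.
    return db_session_ctx.get()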
I built an API using Flask and I'm using a service (as below) to create my database connections.
class DatabaseService:
    def __init__(self):
        self.connection_string = "foo"

    def create_connection(self):
        engine = create_engine(self.connection_string)
        Session = scoped_session(sessionmaker(bind=engine))
        return Session
In my app.py I add and remove these sessions on the Flask application context (g) as the docs suggest, so I can reference g.session whenever I need it.
def get_session():
    if 'session' not in g:
        session = database_service.create_connection()
        g.session = session

@app.teardown_appcontext
def shutdown_session(exception=None):
    if 'session' in g:
        g.session.remove()
    return None
This way every request has its own session that is removed after processing. Am I right?
I don't understand why the connections are still alive in my database after the request is already done.
Whenever I run the command show processlist, I can see multiple sleeping connections from my API.
I see no problem opening and closing sessions per request:
my_session = Session(engine)
my_session.execute(some_query)
my_session.close()
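One thing worth knowing when reading show processlist: with SQLAlchemy's default QueuePool, closing or removing a session returns its connection to the pool rather than closing it, so idle pooled connections showing as "sleeping" are expected. If connections really must be closed per request, a sketch of disabling pooling (at the cost of reconnecting every time), reusing the connection string from the service above:
# Sketch: NullPool closes the DBAPI connection as soon as the session releases
# it, so nothing lingers in show processlist between requests.
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.pool import NullPool

engine = create_engine("foo", poolclass=NullPool)
Session = scoped_session(sessionmaker(bind=engine))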
I am developing my API server with Python-eve, and would like to know how to test the API endpoints. A few things that I would like to test specifically:
Validation of POST/PATCH requests
Authentication of different endpoints
Before_ and after_ hooks working properly
Returning correct JSON response
Currently I am testing the app against a real MongoDB, and I can imagine the tests will take a long time to run once there are hundreds or thousands of them. Mocking things out is another approach, but I couldn't find tools that let me do that while keeping the tests as realistic as possible. I am wondering if there is a recommended way to test Eve apps. Thanks!
Here is what I have now:
from pymongo import MongoClient
from myModule import create_app
import unittest, json

class ClientAppsTests(unittest.TestCase):
    def setUp(self):
        app = create_app()
        app.config['TESTING'] = True
        self.app = app.test_client()

        # Insert some fake data
        client = MongoClient(app.config['MONGO_HOST'], app.config['MONGO_PORT'])
        self.db = client[app.config['MONGO_DBNAME']]
        new_app = {
            'client_id': 'test',
            'client_secret': 'secret',
            'token': 'token'
        }
        self.db.client_apps.insert(new_app)

    def tearDown(self):
        self.db.client_apps.remove()

    def test_access_public_token(self):
        res = self.app.get('/token')
        assert res.status_code == 200

    def test_get_token(self):
        query = {'client_id': 'test', 'client_secret': 'secret'}
        res = self.app.get('/token', query_string=query)
        res_obj = json.loads(res.get_data())
        assert res_obj['token'] == 'token'
The Eve test suite itself uses a test db and does not mock anything. The test db gets created and dropped on every run to guarantee isolation (not super fast, yes, but as close as possible to a production environment). While of course you should test your own code, you probably don't need to write tests like test_access_public_token above, since stuff like that is covered by the Eve suite already. You might want to check the Eve Mocker extension too.
Also make yourself familiar with the Authentication and Authorization tutorials. It looks like the way you're going about the whole token thing is not really appropriate (you want to use request headers for that kind of stuff).
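To illustrate that last point, a hedged sketch assuming Eve's standard Basic Authentication from the tutorial (rather than the question's query-string scheme) and the same unittest setup as above: credentials travel in the Authorization header.
# Sketch: authenticating a test request via the Authorization header, as the
# Eve Authentication/Authorization tutorials do with Basic Auth. Add this
# method to the ClientAppsTests class from the question.
import base64

def test_get_token_with_auth_header(self):
    credentials = base64.b64encode(b'test:secret').decode('utf-8')
    res = self.app.get('/token', headers={'Authorization': 'Basic ' + credentials})
    assert res.status_code == 200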