SQLAlchemy: automap_base in a forking code - python

I develop an API server that talks to a MySQL DB, reflects its schema, and also runs as multiple processes. My code for the DB work looks like this:
from sqlalchemy import MetaData
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm.session import Session

my_engine = create_engine_by_info(my_config)

metadata = MetaData(bind=my_engine)
Base: type = automap_base(metadata=metadata)

class User(Base):
    __tablename__ = 'auth_user'
    # Relation descriptions...

# Other classes...

Base.prepare(my_engine, reflect=True)

def find_user(field):
    with Session(my_engine) as session:
        query = session.query(User)
        query = query.filter(User.field == field)
        records = query.all()
        for u in records:
            return u
    return None
And it works fine until the process gets forked: after the child process has done its work, the original process loses its connection: Lost connection to MySQL server during query.
I guess I should keep my_engine separate for each process (e.g. some function with a dict of engines keyed by PID), but how can I do that if my class definitions require an engine at the very beginning? I could probably move the classes into a function too, but that would be a mess... So, what is a good solution here?
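For reference, the per-PID engine cache I have in mind would be a rough, untested sketch like this (reusing the create_engine_by_info factory from above; get_engine is just a hypothetical helper name):
import os

_engines = {}

def get_engine(my_config):
    # hypothetical helper: one engine per process, keyed by PID,
    # so a forked child never reuses the parent's pooled connections
    pid = os.getpid()
    if pid not in _engines:
        _engines[pid] = create_engine_by_info(my_config)
    return _engines[pid]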


SQLAlchemy doesn't correctly create in-memory database

Making an API using FastAPI and SQLAlchemy, I'm experiencing strange behaviour when the database (SQLite) is in-memory which doesn't occur when it is stored as a file.
Model:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String

Base = declarative_base()

class Thing(Base):
    __tablename__ = "thing"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String)
I create two global engine objects. One with database as file, the other as in-memory database:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
args = dict(echo=True, connect_args={"check_same_thread": False})
engine1 = create_engine("sqlite:///db.sqlite", **args)
engine2 = create_engine("sqlite:///:memory:", **args)
Session1 = sessionmaker(bind=engine1)
Session2 = sessionmaker(bind=engine2)
I create my FastAPI app and a path to add an object to database:
from fastapi import FastAPI

app = FastAPI()

@app.get("/{x}")
def foo(x: int):
    with {1: Session1, 2: Session2}[x]() as session:
        session.add(Thing(name="foo"))
        session.commit()
My main block simulates requests and checks that everything is working:
from fastapi.testclient import TestClient

if __name__ == "__main__":
    Base.metadata.create_all(engine1)
    Base.metadata.create_all(engine2)
    client = TestClient(app)
    assert client.get("/1").status_code == 200
    assert client.get("/2").status_code == 200
The thing table is created in engine1 and committed, and the same happens for engine2. On the first request, "foo" is successfully inserted into engine1's database (stored as a file), but the second request raises sqlite3.OperationalError claiming "no such table: thing".
Why is there different behaviour between the two? Why does the in-memory database claim the table doesn't exist even though the SQLAlchemy logs show the CREATE TABLE statement ran successfully and was committed?
The docs explain this under "Using a Memory Database in Multiple Threads": https://docs.sqlalchemy.org/en/14/dialects/sqlite.html#using-a-memory-database-in-multiple-threads
To use a :memory: database in a multithreaded scenario, the same connection object must be shared among threads, since the database exists only within the scope of that connection. The StaticPool implementation will maintain a single connection globally, and the check_same_thread flag can be passed to Pysqlite as False
It also shows how to get the intended behaviour, so in your case:
from sqlalchemy.pool import StaticPool
args = dict(echo=True, connect_args={"check_same_thread": False}, poolclass=StaticPool)
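Applied to the code above, only the in-memory engine strictly needs the extra pool setting; a minimal sketch, assuming the rest of the example is unchanged:
from sqlalchemy import create_engine
from sqlalchemy.pool import StaticPool

# StaticPool keeps a single connection alive for the whole process, so the
# tables created by Base.metadata.create_all(engine2) survive across requests
engine2 = create_engine(
    "sqlite:///:memory:",
    echo=True,
    connect_args={"check_same_thread": False},
    poolclass=StaticPool,
)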

sqlalchemy existing database query

I am using SQLAlchemy as the ORM for a Python project. I have created a few models/schemas and they work fine. Now I need to query an existing MySQL database, no insert/update, just select statements.
How can I create a wrapper around the tables of this existing database? I have briefly gone through the SQLAlchemy docs and SO but couldn't find anything relevant. Everything suggests the execute method, where I would need to write raw SQL queries, while I want to use the SQLAlchemy query method in the same way as I do with SA models.
For example, if the existing DB has a table named User, then I want to query it using the dbsession (only select operations, probably with joins).
You seem to be under the impression that SQLAlchemy can only work with a database structure created by SQLAlchemy (probably using MetaData.create_all()) - this is not correct. SQLAlchemy can work perfectly well with a pre-existing database; you just need to define your models to match the database tables. One way to do that is to use reflection, as Ilja Everilä suggests:
from sqlalchemy import Table
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class MyClass(Base):
    __table__ = Table('mytable', Base.metadata,
                      autoload=True, autoload_with=some_engine)
(which, in my opinion, would be totally fine for one-off scripts but may lead to incredibly frustrating bugs in a "real" application if there's a potential that the database structure may change over time)
Another way is to simply define your models as usual, taking care to match the database tables, which is not that difficult. The benefit of this approach is that you can map only a subset of database tables to your models, and even only a subset of table columns to a model's fields. Suppose you have 10 tables in the database but are only interested in the users table, from which you only need the id, name and email fields:
import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String)
    email = sa.Column(sa.String)
(note how we didn't need to define some details which are only needed to emit correct DDL, such as the length of the String fields or the fact that the email field has an index)
SQLAlchemy will not emit INSERT/UPDATE queries unless you create or modify models in your code. If you want to ensure that your queries are read-only you may create a special user in the database and grant that user SELECT privileges only. Alternatively/in addition, you may also experiment with rolling back the transaction in your application code.
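A minimal sketch of the rollback idea, using a hypothetical helper (not part of SQLAlchemy itself) around a plain sessionmaker factory:
from contextlib import contextmanager

@contextmanager
def read_only_session(session_factory):
    # hand out a session and roll back at the end, so any accidental
    # INSERT/UPDATE issued through it is never committed
    session = session_factory()
    try:
        yield session
    finally:
        session.rollback()
        session.close()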
You can access an existing table using the automap extension:
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
Base = automap_base()
Base.prepare(engine, reflect=True)
Users = Base.classes.users
session = Session(engine)
res = session.query(Users).first()
Create a Table with autoload enabled and SQLAlchemy will inspect it. Some example code:
from sqlalchemy.sql import select
from sqlalchemy import create_engine, MetaData, Table

CONN_STR = '…'
engine = create_engine(CONN_STR, echo=True)
metadata = MetaData()

cookies = Table('cookies', metadata, autoload=True,
                autoload_with=engine)
cols = cookies.c

with engine.connect() as conn:
    query = (
        select([cols.created_at, cols.name])
        .order_by(cols.created_at)
        .limit(1)
    )

    for row in conn.execute(query):
        print(row)
Other answers don't mention what to do if you have a table with no primary key, so I thought I would address this. Assuming a table called Customers that has columns CustomerId, CustomerName and CustomerLocation, you could do:
from sqlalchemy.ext.automap import automap_base
from sqlalchemy import create_engine, Column, String, Table
from sqlalchemy.orm import Session

Base = automap_base()

conn_str = '...'
engine = create_engine(conn_str)

# you only need to declare which column is the primary key; automap can reflect the rest of the columns
customers = Table('Customers', Base.metadata,
                  Column('CustomerId', String, primary_key=True),
                  autoload=True, autoload_with=engine)
Base.prepare()

Customers = Base.classes.Customers

session = Session(engine)
customer1 = session.query(Customers).first()
print(customer1.CustomerName)
Assume we have a PostgreSQL database named accounts, and we already have a table named users.
import sqlalchemy as sa

psw = "verysecret"
db = "accounts"

# create an engine
pengine = sa.create_engine('postgresql+psycopg2://postgres:' + psw + '@localhost/' + db)

from sqlalchemy.ext.declarative import declarative_base

# define declarative base
Base = declarative_base()

# reflect current database engine to metadata
metadata = sa.MetaData(pengine)
metadata.reflect()

# build your User class on the existing `users` table
class User(Base):
    __table__ = sa.Table("users", metadata)

# call the session maker factory
Session = sa.orm.sessionmaker(pengine)
session = Session()

# filter a record
session.query(User).filter(User.id == 1).first()
Warning: your table should have a primary key defined; otherwise SQLAlchemy won't be able to map it.

peewee vs sqlalchemy performance

I have 2 simple scripts:
from sqlalchemy import create_engine, ForeignKey, Table
from sqlalchemy import Column, Date, Integer, String, DateTime, BigInteger, event
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.engine import Engine
from sqlalchemy.orm import relationship, backref, sessionmaker, scoped_session, Session

class Test(declarative_base()):
    __tablename__ = "Test"

    def __init__(self, *args, **kwargs):
        args = args[0]
        for key in args:
            setattr(self, key, args[key])

    key = Column(String, primary_key=True)

data = []
for a in range(0, 10000):
    data.append({"key": "key%s" % a})

engine = create_engine("sqlite:///testn", echo=False)

with engine.connect() as connection:
    Test.metadata.create_all(engine)
    session = Session(engine)
    list(map(lambda x: session.merge(Test(x)), data))
    session.commit()
result:
real 0m15.300s
user 0m14.920s
sys 0m0.351s
second script:
from peewee import *

class Test(Model):
    key = TextField(primary_key=True, null=False)

dbname = "test"
db = SqliteDatabase(dbname)
Test._meta.database = db

data = []
for a in range(0, 10000):
    data.append({"key": "key%s" % a})

if not Test.table_exists():
    db.create_tables([Test])

with db.atomic() as tr:
    Test.insert_many(data).upsert().execute()
result:
real 0m3.253s
user 0m2.620s
sys 0m0.571s
Why?
This comparison is not entirely valid, as issuing an upsert style query is very different from what SQLAlchemy's Session.merge does:
Session.merge() examines the primary key attributes of the source instance, and attempts to reconcile it with an instance of the same primary key in the session. If not found locally, it attempts to load the object from the database based on primary key, and if none can be located, creates a new instance.
In this test case this will result in 10,000 load attempts against the database, which is expensive.
On the other hand when using peewee with sqlite the combination of insert_many(data) and upsert() can result in a single query:
INSERT OR REPLACE INTO Test (key) VALUES ('key0'), ('key1'), ...
There's no session state to reconcile, since peewee is a very different kind of ORM from SQLAlchemy and at a quick glance looks closer to Core and Tables.
In SQLAlchemy instead of list(map(lambda x: session.merge(Test(x)), data)) you could revert to using Core:
session.execute(Test.__table__.insert(prefixes=['OR REPLACE']).values(data))
A major downside is that you have to hand-write a database-vendor-specific prefix to INSERT. This also subverts the Session, as it will have no information about or knowledge of the newly added rows.
Bulk insertions using model objects are a little more involved with SQLAlchemy. Very simply put, using an ORM is a trade-off between ease of use and speed:
ORMs are basically not intended for high-performance bulk inserts - this is the whole reason SQLAlchemy offers the Core in addition to the ORM as a first-class component.
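For comparison, the rough Core-level equivalent of the peewee bulk insert is a single executemany-style statement; a sketch against the engine, Test model and data list from the question (note this is a plain INSERT, not an upsert):
# send all 10,000 rows through Core in one executemany INSERT,
# skipping the per-object reconciliation that Session.merge() performs
with engine.begin() as conn:
    conn.execute(Test.__table__.insert(), data)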

How to deploy a Python/SQLAlchemy application?

SQLAlchemy allowed me to create a powerful database utility. Now I don't know how to deploy it. Let me explain how it is built with an example:
# objects.py
class Item(object):
    def __init__(self, name):
        self.name = name

# schema.py
from sqlalchemy import *
from sqlalchemy.orm import mapper
from objects import Item

engine = create_engine('sqlite:///mydb.db')
metadata = MetaData(engine)

item_table = Table(
    'items', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(100))
)

item_mapper = mapper(Item, item_table)

metadata.create_all()

# application.py
from schema import engine, Item
from sqlalchemy import *
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)

class Browser(object):
    def __init__(self):
        self.s = Session()

    def get_by_name(self, name):
        return self.s.query(Item).filter_by(name=name)
As you can see, what I want to make available is the last interface (Browser) where I simplify the queries for the end user.
If you simply ask every user to open a Python shell and run from application import Browser, it seems that the advantages of connection pooling are not realized, because every user creates a different Session class (as opposed to just a different session instance).
So, should I write a server that the users connect to? Or, how would you deploy this hypothetical application?
Thank you.
Connection pooling happens within the same Python instance, so if your users connect to the database from remote machines, you have to write a small server anyway if you want to benefit from it. You can also connect directly to the database server, which results in (at least) one connection per user. It depends on what you want to achieve.
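For example, in such a small server process a single module-level engine and sessionmaker are shared by every request handler, so the pool actually gets reused; a sketch building on the schema.py from the question (the server.py name is just illustrative):
# server.py -- imported once when the server process starts
from sqlalchemy.orm import sessionmaker
from schema import engine, Item

Session = sessionmaker(bind=engine)   # one engine/pool and one Session class per process

def get_by_name(name):
    # each request gets its own short-lived session from the shared factory
    s = Session()
    try:
        return s.query(Item).filter_by(name=name).all()
    finally:
        s.close()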

How to set up global connection to database?

I have a problem setting up a database connection. I want to set up the connection in one place so that it is visible in all my controllers.
Now I use something like this in my controller:
db = create_engine('mysql://root:password@localhost/python')
metadata = MetaData(db)
email_list = Table('email', metadata, autoload=True)
In development.ini I have:
sqlalchemy.url = mysql://root:password@localhost/python
sqlalchemy.pool_recycle = 3600
How do I set up __init__.py?
I hope you got Pylons working; for anyone else who may read this question later, I'll give some pointers in the right direction.
First of all, you are only creating an engine and a metadata object. While you can use the engine to create connections directly, you would almost always use a Session to manage querying and updating your database.
Pylons automatically sets this up for you by creating an engine from your configuration file, then passing it to yourproject.model.__init__.py:init_model(), which binds it to a scoped_session object.
This scoped_session object is available from yourproject.model.meta and is the object you would use to query your database. For example:
record = meta.Session.query(model.MyTable).filter_by(id=42)
Because it is a scoped_session, it automatically creates a Session object and associates it with the current thread if one doesn't already exist. scoped_session passes all actions (.query(), .add(), .delete()) down into the real Session object and thus gives you a simple way to interact with the database without having to manage the non-thread-safe Session object explicitly.
The scoped_session, Session, object from yourproject.model.meta is automatically associated with a metadata object created as either yourproject.model.meta:metadata (in Pylons 0.9.7 and below) or yourproject.model.meta:Base.metadata (in Pylons 1.0). Use this metadata object to define your tables. As you can see, in newer versions of Pylons the metadata is associated with a declarative_base() object named Base, which allows you to use SQLAlchemy's declarative style.
Using this from the controller
from yourproject import model
from yourproject.model import Session

class MyController(..):
    def resource(self):
        result = Session.query(model.email_list).\
            filter(model.email_list.c.id == 42).one()
        return str(result)
Use real connections
If you really want to get a connection object simply use
from yourproject.model import Session
connection = Session.connection()
result = connection.execute("select 3+4;")
// more connection executions
Session.commit()
However, while this all works, what you should really be doing is...
So far you are not really using SQLAlchemy much. The power of SQLAlchemy really shines when you start mapping your database tables to Python classes. Anyone looking into using Pylons with a database should take a serious look at what you can do with SQLAlchemy. If SQLAlchemy seems intimidating at first, simply start out with its declarative approach, which should be enough for almost all Pylons apps.
In your model instead of defining Table constructs, do this:
from sqlalchemy import Column, Integer, Unicode, ForeignKey
from sqlalchemy.orm import relation
from yourproject.model.meta import Base, Session

class User(Base):
    __tablename__ = 'users'

    # primary_key implies nullable=False
    id = Column(Integer, primary_key=True, index=True)
    # nullable defaults to True
    name = Column(Unicode, nullable=False)

    notes = relation("UserNote", backref="user")

    query = Session.query_property()

class UserNote(Base):
    __tablename__ = 'usernotes'

    # primary_key implies nullable=False
    id = Column(Integer, primary_key=True, index=True)
    userid = Column(Integer, ForeignKey("users.id"), index=True)
    # nullable defaults to True
    text = Column(Unicode, nullable=False)

    query = Session.query_property()
Note the query objects. These are smart objects that live on the class and associate your classes with the scoped_session(), Session. This allows you to extract data from your database even more easily.
from sqlalchemy.orm import eagerload

def resource(self):
    user = User.query.filter(User.id == 42).options(eagerload("notes")).one()
    return "\n".join([x.text for x in user.notes])
Version 1.0 of Pylons uses the declarative syntax; see the Pylons documentation for more about this.
In model/__init__.py you can write something like this:
from your_programm.model.meta import Session, Base
from sqlalchemy import *
from sqlalchemy.types import *

def init_model(engine):
    Session.configure(bind=engine)

class Foo(Base):
    __tablename__ = "foo"

    id = Column(Integer, primary_key=True)
    name = Column(String)
    ...
What you want to do is modify the Globals class in your app_globals.py file to include a .engine (or whatever) attribute. Then, in your controllers, you use from pylons import app_globals and app_globals.engine to access the engine (or metadata, session, scoped_session, etc...).
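A rough sketch of that approach, assuming a Pylons 1.0-style Globals class that receives the config dict; engine_from_config picks up the sqlalchemy.* keys from the development.ini shown in the question:
# app_globals.py
from sqlalchemy import engine_from_config

class Globals(object):
    def __init__(self, config):
        # one engine (and thus one connection pool) per process, built from
        # the sqlalchemy.url / sqlalchemy.pool_recycle settings in the config
        self.engine = engine_from_config(config, prefix='sqlalchemy.')
In a controller you would then do from pylons import app_globals and use app_globals.engine as usual.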
