I'm designing a small GUI application to wrap an SQLite DB (simple CRUD operations). I have created three SQLAlchemy models (m_person.py, m_card.py and m_loan.py, all in a /models folder) and previously had the following code at the top of each one:
from sqlalchemy import Table, Column, create_engine
from sqlalchemy import Integer, ForeignKey, String, Unicode
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import backref, relation
engine = create_engine("sqlite:///devdata.db", echo=True)
declarative_base = declarative_base(engine)
metadata = declarative_base.metadata
This felt a bit wrong (it violates DRY), so it was suggested that I move all this shared setup up to the package level (into models/__init__.py):
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import Table, Column, Boolean, Unicode
from settings import setup
engine = create_engine('sqlite:///' + setup.get_db_path(), echo=False)
declarative_base = declarative_base(engine)
metadata = declarative_base.metadata
session = sessionmaker()
session = session()
...and import declarative_base like so:
from sqlalchemy import Table, Column, Unicode

from models import declarative_base

class Person(declarative_base):
    """
    Person model
    """
    __tablename__ = "people"

    id = Column(Unicode(50), primary_key=True)
    fname = Column(Unicode(50))
    sname = Column(Unicode(50))
However, I've had a lot of feedback that executing code at import time like this is bad. I'm looking for a definitive answer on the right way to do it, as it seems that by trying to remove code repetition I've introduced some other bad practices. Any feedback would be really useful.
(Below is the get_db_path() function from settings/setup.py for completeness, as it is called in the models/__init__.py code above.)
def get_db_path():
    import sys
    from os import makedirs
    from os.path import join, dirname, exists
    from constants import DB_FILE, DB_FOLDER

    if len(sys.argv) > 1:
        db_path = sys.argv[1]
    else:
        # Check if application is running from exe or .py and adjust db path accordingly
        if getattr(sys, 'frozen', False):
            application_path = dirname(sys.executable)
            db_path = join(application_path, DB_FOLDER, DB_FILE)
        elif __file__:
            application_path = dirname(__file__)
            db_path = join(application_path, '..', DB_FOLDER, DB_FILE)

    # Check db path exists and create it if not
    def ensure_directory(db_path):
        dir = dirname(db_path)
        if not exists(dir):
            makedirs(dir)

    ensure_directory(db_path)
    return db_path
Some popular frameworks (Twisted is one example) perform a good deal of initialization logic at import-time. The benefits of being able to build module contents dynamically do come at a price, one of them being that IDEs cannot always decide what is "in" the module or not.
In your specific case, you may want to refactor so that the particular engine is not supplied at import-time but later. You can create the metadata and your declarative_base class at import-time. Then during start-time, after all the classes are defined, you call create_engine and bind the result to your sqlalchemy.orm.sessionmaker. But if your needs are simple, you may not even need to go that far.
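A minimal sketch of that split, assuming a models/__init__.py and an init_engine() helper of my own naming (not from the question's code):

# models/__init__.py -- nothing here touches the database at import-time
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()   # unbound: safe to create at import-time
Session = sessionmaker()    # unbound session factory

def init_engine(db_url, **kwargs):
    """Call once at start-time, after all model classes are defined."""
    engine = create_engine(db_url, **kwargs)
    Session.configure(bind=engine)
    Base.metadata.create_all(engine)
    return engine

Model modules then just do from models import Base and subclass it; which database to talk to stays a start-time decision.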
In general, I would say this is not Java or C. There is no reason to fear doing things at the module level beyond defining functions, classes and constants. Your classes are created when the application starts anyway, one after the other. A little monkey-patching of classes (in the same module!), or creating one or two global lookup tables, is OK in my opinion if it simplifies your implementation.
What I would definitely avoid is any code in your module that causes the order of imports to matter for your users (other than the normal way of simply providing logic to be used), or that modifies the behavior of code outside your module. Then your module becomes black magic, which is accepted in the Perl world (à la use strict;), but which I find is not "pythonic".
For example, if your module modifies properties of sys.stdout when imported, I would argue that behavior should instead be moved into a function that the user can either call or not.
In principle there is nothing wrong with executing Python code when a module is imported, in fact every single Python module works that way. Defining module members is executing code, after all.
However, in your particular use case I would strongly advise against making a singleton database session object in your code base. You'll be losing out on the ability to do many things, for example unit test your model against an in-memory SQLite or other kind of database engine.
Take a look at the documentation for declarative_base and note how the examples create models from a class supplied by declarative_base that is not yet bound to a database engine. Ideally you want to do that, and then have some kind of connection function or class that manages creating a session and binds it to the base.
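For example, with models declared against an unbound Base, a test can bind the very same classes to a throwaway in-memory database. A sketch, reusing the Person model from the question (the models import layout is assumed):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from models import Base             # assumed: Base = declarative_base(), created unbound
from models.m_person import Person

engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)    # build the schema in the in-memory database
session = sessionmaker(bind=engine)()

session.add(Person(id=u'1', fname=u'Ada', sname=u'Lovelace'))
session.commit()
assert session.query(Person).count() == 1
session.close()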
There is nothing wrong with executing code at import time.
The guideline is to execute enough that your module is usable, but not so much that importing it is unnecessarily slow, and not so much that you unnecessarily limit what can be done with it.
Typically this means defining classes, functions, and global names (actually module level -- bit of a misnomer there), as well as importing anything your classes, functions, etc., need to operate.
This does not usually involve making connections to databases, websites, or other external, dynamic resources, but rather supplying a function to establish those connections when the module user is ready to do so.
I ran into this as well. I created a database.py file with a database manager class, and then created a single global object from it. That way the class can read settings from my settings.py file to configure the database, and the first class that needs a base object (or session/engine) initializes the global object, after which everyone just re-uses it. I feel much more comfortable having from myproject.database import DM at the top of each class using the SQLAlchemy ORM, where DM is my global database object, and then DM.getBase() to get the base object.
Here is my database.py class:
from sqlalchemy import create_engine
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker

Session = scoped_session(sessionmaker(autoflush=True))

class DatabaseManager(object):
    """
    The top level database manager used by all the SQLAlchemy classes
    to fetch their session / declarative base.
    """
    engine = None
    base = None

    def ready(self):
        """Determine if the SQLAlchemy engine and base have been set,
        and if not, initialize them."""
        host = '<database connection details>'
        if self.engine and self.base:
            return True
        try:
            self.engine = create_engine(host, pool_recycle=3600)
            self.base = declarative_base(bind=self.engine)
            return True
        except SQLAlchemyError:
            return False

    def getSession(self):
        """Return the active SQLAlchemy session."""
        if self.ready():
            # Configure the scoped_session factory (a Session instance itself
            # has no configure()), then hand out the thread-local session.
            Session.configure(bind=self.engine)
            return Session()
        return None

    def getBase(self):
        """Return the active SQLAlchemy base."""
        if self.ready():
            return self.base
        return None

    def getEngine(self):
        """Return the active SQLAlchemy engine."""
        if self.ready():
            return self.engine
        return None

DM = DatabaseManager()
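Usage then looks something like this (a sketch; the Customer model is illustrative):

from sqlalchemy import Column, Integer, Unicode
from myproject.database import DM

Base = DM.getBase()   # the first caller initializes engine and base; later callers reuse them

class Customer(Base):
    __tablename__ = 'customers'
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(50))

session = DM.getSession()   # ready() has already run, so this just hands back a session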
I created a session factory using python-flask-sqlalchemy, and I use it to create sessions for reading and writing. However, I did not use session.close() or a with statement to close the sessions (I thought they would be closed automatically once there were no more references to them).
There was a bug somewhere, and I accidentally gained some insight into how my backend works:
Object < ClassName at memory location > is already attached to session '134' (this is '135')
134 sounds like an awfully huge number of sessions. If I understand this correctly, the previous sessions were never closed and they are consuming extra memory?
edit: the error was due to the fact that I was trying to initiate 2 different sessions attached to the same class (as I want to write to one db and read from a different one). But I accidentally found out that I have 134 sessions going on (if I understand it correctly)!
edit #2: alternatively, should I just stop using a session factory and stick to the officially endorsed multiple-binds method?
Here's the code:
import sqlalchemy
from sqlalchemy.orm import Query, Session, scoped_session, sessionmaker

from db import db
from config import SQL_SERVER_STRING

class ReadFactory:
    SQL_SERVER_STRING = SQL_SERVER_STRING

    def __init__(self):
        engine = db.create_engine(self.SQL_SERVER_STRING, {})
        Session = sessionmaker()
        Session.configure(bind=engine)
        self.readSession = Session()

    def read(self):
        return self.readSession
...and then I use it inside a model class:
@classmethod
def get_from_db(cls, **kwargs):
    readSession = ReadFactory()
    return readSession.read().query(cls).filter_by(**kwargs).all()
I am trying to mock SQLAlchemy's create_engine, which is called during the import of a Python script, but I am not able to do so properly. Here is my file structure:
File main.py
from database_wrapper import update_database

def do_something():
    # do something and
    update_database()
database_wrapper.py
from sqlalchemy import create_engine

engine = create_engine("connection_uri")

def update_database() -> bool:
    con = engine.connect()
    # do something with the connection, and so on...
In my unit test I do something like:
from unittest import TestCase
from unittest.mock import patch

class TestMicroservice(TestCase):
    @patch('database_wrapper.create_engine')
    @patch('main.update_database')
    def test_1(self, update_database_mock, create_engine_mock):
        create_engine_mock.return_value = "Fake engine, babe"
        update_database_mock.return_value = True
        from main import do_something
        do_something()
I have also tried @patch('main.database_wrapper.create_engine'), @patch('main.database_wrapper.sqlalchemy.create_engine'), @patch('sqlalchemy.create_engine') and many other variations, but with no luck. I have read a few similar posts but couldn't find this exact case.
I would rather avoid changing database_wrapper.py, so I cannot move engine = create_engine("connection_uri") inside a function. Besides, I am happy to create the engine when the program starts, without having to pass it around.
I found a solution. Thanks to MrBean Bremen for setting me on the right track. In fact, engine = create_engine("connection_uri") is already lazily initialized, as explained here. The only thing I needed to change was to pass a plausible string as the connection URI, something like:
engine = create_engine("postgresql://user:pass#localhost:5432/db?sslmode=disable")
This way the engine is initialized, but because I never actually use it in the update_database() method during the test, it never causes a problem.
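For completeness, a sketch of the test that passes under this approach: since the lazily created engine never actually connects unless used, only update_database needs patching (the final assert is mine, for illustration):

from unittest import TestCase
from unittest.mock import patch

class TestMicroservice(TestCase):
    @patch('main.update_database')
    def test_1(self, update_database_mock):
        update_database_mock.return_value = True
        from main import do_something   # import works: create_engine() doesn't connect yet
        do_something()
        update_database_mock.assert_called_once_with()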
The Flask tutorial (and many other tutorials out there) suggests that the engine, the db_session and the Base (the class returned by declarative_base()) are all created at import-time.
This creates some problems, one being that the URI of the DB is hardcoded in the code and evaluated only once.
One solution is to wrap these calls in functions that accept the app as a parameter, which is what I've done. Mind you - each call caches the result in app.config:
def get_engine(app):
    """Return the engine connected to the database URI in the config file.

    Store it in the config for later use.
    """
    engine = app.config.setdefault(
        'DB_ENGINE', create_engine(app.config['DATABASE_URI'](), echo=True))
    return engine

def get_session(app):
    """Return the DB session for the database in use.

    Store it in the config for later use.
    """
    engine = get_engine(app)
    db_session = app.config.setdefault(
        'DB_SESSION', scoped_session(sessionmaker(
            autocommit=False, autoflush=False, bind=engine)))
    return db_session

def get_base(app):
    """Return the declarative base to use in DB models.

    Store it in the config for later use.
    """
    Base = app.config.setdefault('DB_BASE', declarative_base())
    Base.query = get_session(app).query_property()
    return Base
In init_db, I call all of those functions, but there's still a code smell:
def init_db(app):
    """Initialise the database."""
    create_db(app)
    engine = get_engine(app)
    db_session = get_session(app)
    base = get_base(app)
    if not app.config['TESTING']:
        import flaskr.models
    else:
        if 'flaskr.models' not in sys.modules:
            import flaskr.models
        else:
            import flaskr.models
            importlib.reload(flaskr.models)
    base.metadata.create_all(bind=engine)
The smell is of course the hoops I have to go through to import and create all models.
The reason for the code above is that, when unit testing, init_db is called once for each test (in setup(), as suggested in the same tutorial), but the import will only be performed the first time, so create_all would otherwise only work for that first test.
Not only that: now that a session is shared for the duration of the app, I have problems in parametrized negative unit tests (that is, parametrized unit tests that expect some sort of failure). The first instance of the test triggers a failure (e.g. login failure, see test_login_validate_input in the tutorial) and exits correctly, while all subsequent ones bail out early because the db_session should have been rolled back first. Clearly there's something wrong with the DB initialization.
What is the Right Way(TM) to initialize the database?
I have eventually decided to refactor the app so that it uses Flask-SQLAlchemy.
In short, the app now does something like this:
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

def create_app():
    app = Flask(__name__)
    db.init_app(app)
    # ...
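Models then hang off the shared db object, so no engine or session exists at import-time, and table creation happens per application instance inside an application context. A minimal sketch (the User model is illustrative, not from my app):

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.Unicode(50))

# inside create_app(), after db.init_app(app):
#     with app.app_context():
#         db.create_all()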
With the benefit of hindsight, it's definitely a cleaner approach.
What put me off at the start was this entry from the tutorial (bold is mine):
Because SQLAlchemy is a common database abstraction layer and object relational mapper that requires a little bit of configuration effort, there is a Flask extension that handles that for you. This is recommended if you want to get started quickly.
Which I somehow read as "Using the Flask-SQLAlchemy extension will allow you to cut some corners, which you'll probably end up paying for later".
It's very early stages, but so far no price to pay in terms of flexibility for using said extension.
I'm using SQLAlchemy to query my databases for a project. At the same time, I'm new to unit testing, and I'm trying to learn how to write unit tests for my database code. I tried to use the mock library, but so far it seems to be very difficult.
So I wrote a piece of code which creates a Session object. This object is used to connect to the database.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.exc import OperationalError, ArgumentError

test_db_string = 'postgresql+psycopg2://testdb:hello@localhost/' \
                 'test_databasetable'

def get_session(database_connection_string):
    try:
        Base = declarative_base()
        engine = create_engine(database_connection_string)
        Base.metadata.bind = engine
        DBSession = sessionmaker(bind=engine)
        session = DBSession()
        connection = session.connection()
        return session
    except OperationalError:
        return None
    except ArgumentError:
        return None
So I made a unit test case for this function:
import mock
import unittest
from mock import sentinel

from app.Utilities.util import get_session

class TestUtilMock(unittest.TestCase):
    @mock.patch('app.Utilities.util.create_engine')  # mention the whole path
    @mock.patch('app.Utilities.util.sessionmaker')
    @mock.patch('app.Utilities.util.declarative_base')
    def test_get_session1(self, mock_declarative_base, mock_sessionmaker,
                          mock_create_engine):
        mock_create_engine.return_value = sentinel.engine
        get_session('any_path')
        mock_declarative_base.called
        mock_create_engine.assert_called_once_with('any_path')
        mock_sessionmaker.assert_called_once_with(bind=sentinel.engine)
As you can see, in my unit test I cannot test the code in get_session() from the line session = DBSession() onwards. So far, after googling, I cannot find out whether the value returned by a mock can also be used to mock further calls - something like mocking the session object and verifying that I called session.connection().
Is the above way of writing unit test cases right? Is there a better method to do this?
First of all, you can also test the session = DBSession() line, with:
self.assertEqual(get_session('any_path'), mock_sessionmaker.return_value.return_value)
Moreover, mock_declarative_base.called is not an assert and cannot fail. Replace it with:
self.assertTrue(mock_delarative_base.called)
General considerations
Writing tests like these can be very time consuming and makes your production code tightly coupled to your test code. It is better to write your own wrapper that presents a comfortable interface for your business code, and to test that wrapper with some mocks and real (simple) tests: you should trust the sqlalchemy implementation and just test how your wrapper calls it.
Afterwards, you can either replace your wrapper with a fake object that you can control, or use mocks and check how your code calls it. Either way, your wrapper presents just a business interface, and mocking it should be very simple, as shown in the sketch below.
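For instance, a thin wrapper along these lines keeps business code away from SQLAlchemy details and is trivial to fake in tests (a sketch; User is an illustrative model):

class UserStore(object):
    """Thin persistence wrapper: the only thing business code talks to."""

    def __init__(self, session):
        self._session = session

    def add_user(self, name):
        user = User(name=name)
        self._session.add(user)
        self._session.commit()
        return user

    def find_by_name(self, name):
        return self._session.query(User).filter_by(name=name).all()

In tests you can pass in a session bound to an in-memory SQLite, or hand in a fake object exposing the same two methods - no patching of sessionmaker or create_engine internals required.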
Because of legacy data which is not available in the database but only in some external files, I want to create a SQLAlchemy object which contains data read from those external files, but isn't written to the database when I execute session.flush().
My code looks like this:
try:
    return session.query(Phone).populate_existing().filter(Phone.mac == ident).one()
except:
    return self.createMockPhoneFromLicenseFile(ident)

def createMockPhoneFromLicenseFile(self, ident):
    # Some code to read necessary data from file deleted....
    phone = Phone()
    phone.mac = foo
    phone.data = bar
    phone.state = "Read from legacy file"
    phone.purchaseOrderPosition = self.getLegacyOrder(ident)
    # SQLAlchemy magic doesn't seem to work here, probably because we don't insert
    # the created phone object into the database. So we set the id fields manually.
    phone.order_id = phone.purchaseOrderPosition.order_id
    phone.order_position_id = phone.purchaseOrderPosition.order_position_id
    return phone
Everything works fine, except that on a session.flush() executed later in the application, SQLAlchemy tries to write the created Phone object to the database (which fortunately doesn't succeed, because phone.state is longer than the column's data type allows), and this breaks the function that issues the flush.
Is there any way to prevent SQLAlchemy from trying to write such an object?
Update
While I didn't find anything on
using_mapper_options(save_on_init=False)
in the Elixir documentation (maybe you can provide a link?), it seemed to me worth a try (I would have preferred a way to prevent a single instance from being written instead of the whole entity).
At first it seemed that the statement had no effect, and I suspected my SQLAlchemy/Elixir versions were too old, but then I found out that the connection to the PurchaseOrderPosition entity (which I didn't modify), made with
phone.purchaseOrderPosition = self.getLegacyOrder(ident)
causes the phone object to be written again. If I remove the statement, everything seems to be fine.
You need to do
import elixir
elixir.options_defaults['mapper_options'] = { 'save_on_init': False }
to prevent Entity instances you instantiate from being auto-added to the session. Ideally, this should be done as early as possible in your code. You can also do this on a per-entity basis, via using_mapper_options(save_on_init=False) - see the Elixir documentation for more details.
Update:
See this post on the Elixir mailing list indicating that this is the solution.
Also, as Ants Aasma points out, you can use cascade options on the Elixir relationship to set up cascade options in SQLAlchemy. See this page for more details.
Well, sqlalchemy doesn't, by default.
Consider the following self-contained example code.
from sqlalchemy import Column, Integer, Unicode, create_engine
from sqlalchemy.orm import create_session
from sqlalchemy.ext.declarative import declarative_base

e = create_engine('sqlite://')
Base = declarative_base(bind=e)

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(50))

# create the empty table and a session
Base.metadata.create_all()
s = create_session(bind=e, autoflush=False, autocommit=False)

# assert the table is empty
assert s.query(User).all() == []

# create a new User instance but don't save it to database:
u = User()
u.name = 'siebert'
# I could run s.add(u) here but I won't

s.flush()
s.commit()

# assert the table is still empty
assert s.query(User).all() == []
So I'm not sure what's implicitly adding your instances to the session. Normally you have to call s.add(u) manually to make an object go to the session. I'm not familiar with Elixir, so perhaps this is some Elixir trickery... Maybe you could remove the instance from the session using session.expunge().
Old post, but I came across a similar issue; in my case with SQLAlchemy it was caused by cascading on backrefs:
http://docs.sqlalchemy.org/en/rel_0_7/orm/session.html#backref-cascade
Turn it off on your backrefs so that you have to explicitly add things to the session.
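A sketch of what that looks like on a relationship (0.7-era API; the model names mirror the question above, and Base and the columns are assumed):

from sqlalchemy import Column, Integer
from sqlalchemy.orm import relationship, backref

class PurchaseOrderPosition(Base):   # Base: your declarative base
    __tablename__ = 'order_positions'
    id = Column(Integer, primary_key=True)
    # With cascade_backrefs=False, `phone.purchaseOrderPosition = pos` no longer
    # drags the not-yet-added phone into the session that pos belongs to.
    phones = relationship('Phone',
                          backref=backref('purchaseOrderPosition'),
                          cascade_backrefs=False)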