I've been breaking my head for a few days over what I thought was a simple task (not anymore):
The main program sends hundreds of SQL queries to fetch data from multiple databases.
I thought Celery would be the right choice, since it can scale and also simplifies the threading/async orchestration.
The "clean" solution would be one generic task that looks something like:
@app.task(bind=True, name='fetch_data')
def fetch_data(self, *args, **kwargs):
    db = kwargs['db']
    sql = kwargs['sql']
    session = DBContext().get_session(db)
    result = session.query(sql).all()
    ...
But I'm having trouble implementing such a DBContext class: one that instantiates once per DB, reuses the DB session across requests, and closes it once the requests are done (or any other approach you might recommend).
I was thinking about using a base Task class to decorate the function and keeping all the available connections there, but the problem is that such a class is initialized only once, not dynamically per call... maybe there's a way to make it work, but I'm not sure how...
class DatBaseFactory(Task):
    def __call__(self, *args, **kwargs):
        print("In class", self.db)
        self.engine = DBContext.get_db(self.db)
        return super().__call__(*args, **kwargs)


@app.task(bind=True, base=DatBaseFactory, name='test_db', db=db, engine='')
def test_db(self, *args, **kwargs):
    print("From Task", self.engine)
The other alternative would be duplicating the function once per database and preserving a session in each, but that's quite an ugly solution.
Hope someone can help with this...
I am using FastAPI and I would like to know if I am using dependencies correctly.
First, I have a function that yields the database session.
class ContextManager:
    def __init__(self):
        self.db = DBSession()

    def __enter__(self):
        return self.db

    def __exit__(self, exc_type, exc_value, traceback):
        self.db.close()


def get_db():
    with ContextManager() as db:
        yield db
I would like to use that function in another function:
def validate(db=Depends(get_db)):
    is_valid = verify(db)
    if not is_valid:
        raise HTTPException(status_code=400)
    yield db
Finally, I would like to use the last functions as a dependency on the routes:
@router.get('/')
def get_data(db=Depends(validate)):
    data = db.query(...)
    return data
I am using this code and it seems to work, but I would like to know if it is the most appropriate way to use dependencies. In particular, I am not sure whether I have to use yield db inside the validate function or whether it would be better to use return. I would appreciate your help. Thanks a lot.
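For comparison, this is the return-based variant I am considering; since validate does no cleanup of its own after the response, I assume it behaves the same:

def validate(db=Depends(get_db)):
    if not verify(db):
        raise HTTPException(status_code=400)
    return db  # no teardown after the response, so return may work just as well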
I am building a tool that fetches data from a different database, transforms it, and stores it in my own database. I'm migrating from APScheduler to Celery, but I ran into the following problem:
I use a class I call JobRecords to store when a job ran, whether it was successful, and which errors it encountered. I use this to avoid looking back too far for updated entries, especially since some tables have multiple millions of rows.
Since the system is the same for all jobs, I created a subclass of the Celery Task object. I make sure the job is executed within the Flask app context, and I fetch the latest time this job finished successfully. I also make sure I register a value for now to avoid timing issues between querying the database and adding the job record.
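(The JobRecord model itself is stripped from this post; based on how it is used below, it looks roughly like this, with column types guessed:)

from app.extensions import db

class JobRecord(db.Model):
    """Rough sketch only; the real model is omitted for brevity."""
    id = db.Column(db.Integer, primary_key=True)
    job_id = db.Column(db.String, nullable=False)     # Celery task name
    run_at_ = db.Column(db.DateTime, nullable=False)  # when the job was started
    success = db.Column(db.Boolean, default=True)
    info = db.Column(db.JSON, nullable=True)          # error details on failure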
from datetime import datetime
from traceback import format_exception

from celery import Task, current_app as current_celery_app
from sqlalchemy import func
from sqlalchemy.orm import Session, scoped_session

from app import app  # the Flask application (import path assumed)


class RecordedTask(Task):
    """
    Task subclass that uses JobRecords to get the last run date
    and add new JobRecords on completion
    """
    now: datetime = None
    ignore_result = True
    _session: scoped_session = None
    success: bool = True
    info: dict = None
    to_trigger: list = []  # (job_name,) or (job_name, kwargs) tuples; stripped from this excerpt

    @property
    def session(self) -> Session:
        """Making sure we have one global session instance"""
        if self._session is None:
            from app.extensions import db
            self._session = db.session
        return self._session

    def __call__(self, *args, **kwargs):
        from app.models import JobRecord
        kwargs['last_run'] = (
            self.session.query(func.max(JobRecord.run_at_))
            .filter(JobRecord.job_id == self.name, JobRecord.success)
            .first()
        )[0] or datetime.min
        self.now = kwargs['now'] = datetime.utcnow()
        with app.app_context():
            super(RecordedTask, self).__call__(*args, **kwargs)

    def on_failure(self, exc, task_id, args: list, kwargs: dict, einfo):
        self.session.rollback()
        self.success = False
        self.info = dict(
            args=args,
            kwargs=kwargs,
            error=exc.args,
            exc=format_exception(exc.__class__, exc, exc.__traceback__),
        )
        app.logger.error(f"Error executing job '{self.name}': {exc}")

    def on_success(self, retval, task_id, args: list, kwargs: dict):
        app.logger.info(f"Executed job '{self.name}' successfully, adding JobRecord")
        for entry in self.to_trigger:
            if len(entry) == 2:
                job, kwargs = entry
            else:
                job, = entry
                kwargs = {}
            app.logger.info(f"Scheduling job '{job}'")
            current_celery_app.signature(job, **kwargs).delay()

    def after_return(self, *args, **kwargs):
        from app.models import JobRecord
        record = JobRecord(
            job_id=self.name,
            run_at_=self.now,
            info=self.info,
            success=self.success
        )
        self.session.add(record)
        self.session.commit()
        self.session.remove()
I added an example of a job to update a model called Location, but there are a lot of jobs just like this one.
@celery.task(bind=True, name="update_locations")
def update_locations(self, last_run: datetime = datetime.min, **_):
    """Get the locations from the external database and check for updates"""
    locations: List[ExternalLocation] = ExternalLocation.query.filter(
        ExternalLocation.updated_at_ >= last_run
    ).order_by(ExternalLocation.id).all()
    app.logger.info(f"ExternalLocation: collected {len(locations)} updated locations")
    for update_location in locations:
        existing_location: Location = Location.query.filter(
            Location.external_id == update_location.id
        ).first()
        if existing_location is None:
            self.session.add(Location.from_worker(update_location))
        else:
            existing_location.update_from_worker(update_location)
The problem is that when I run this job, the Location objects are not committed with the JobRecord, so only the latter is created. If I track it with the debugger, Location.query.count() returns the correct value inside the function, but as soon as it enters the on_success callback, it's back to 0, and self._session.new returns an empty dict.
I already tried adding the session as a property to make sure it's the same instance everywhere, but the problem still persists. Maybe it has something to do with it being a scoped_session because of Flask-SQLAlchemy?
Sorry about the large amount of code, I did try to strip as much away as possible. Any help is welcome!
I found out that the culprit was the combination of scoped_session and the Flask app context. Like any context manager, running the code inside with app.app_context() triggered the __exit__ function on leaving, which in turn caused the ScopedRegistry, where the scoped_session was stored, to be cleared. Then a new session was created, the JobRecords were added to that, and that session was committed. Therefore, the locations were never written to the database.
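To see the mechanism in isolation, here is a minimal self-contained demonstration (assuming Flask-SQLAlchemy and an in-memory SQLite database); the object pending in the first app context is gone in the second one:

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite://"
db = SQLAlchemy(app)

class Item(db.Model):
    id = db.Column(db.Integer, primary_key=True)

with app.app_context():
    db.create_all()
    db.session.add(Item())
    print(db.session.new)  # the pending Item, still in session A
# leaving the context ran session.remove(), discarding session A

with app.app_context():
    print(db.session.new)  # empty: this is a brand-new session B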
There are two possible solutions. If you don't use the session in files other than your task, you can add a session property to the task. This way, you avoid the scoped_session altogether, and you can clean up in your after_return function:
@property
def session(self):
    if self._session is None:
        from dashboard.extensions import db
        self._session = db.create_session(options={})()
    return self._session
However, I was accessing the session in my model definition files as well, through from extensions import db. Therefore, I was using two different sessions. I ended up using app.app_context().push() instead of the context manager, thus avoiding the __exit__ function:
app.app_context().push()
super(RecordedTask, self).__call__(*args, **kwargs)
I'm using Python 3.4 to interact with Oracle 11g / SQL Developer.
Is it true that cx_Oracle cannot deal with SQL*Plus statements? The page https://sourceforge.net/p/cx-oracle/mailman/message/2932119/ seems to say so.
So how can we execute the 'spool' command from Python?
The code:
import cx_Oracle

db_conn = cx_Oracle.connect(...)
cursor = db_conn.cursor()
cursor.execute('spool C:\\Users\\Administrator\\Desktop\\mycsv.csv')
...
The error: cx_Oracle.DatabaseError: ORA-00900: invalid SQL statement
The "spool" command is very specific to SQL*Plus and is not available in cx_Oracle or any other application that uses the OCI (Oracle Call Interface). You can do something similar, however, without too much trouble.
You can create your own Connection class subclassed from cx_Oracle.Connection and your own Cursor class subclassed from cx_Oracle.Cursor that would perform any logging, with a special "spool" method that turns it on and off at will. Something like this:
import cx_Oracle

class Connection(cx_Oracle.Connection):

    def __init__(self, *args, **kwargs):
        self.spoolFile = None
        super(Connection, self).__init__(*args, **kwargs)

    def cursor(self):
        return Cursor(self)

    def spool(self, fileName):
        self.spoolFile = open(fileName, "w")


class Cursor(cx_Oracle.Cursor):

    def execute(self, statement, args=None):
        result = super(Cursor, self).execute(statement, args)
        if self.connection.spoolFile is not None:
            # write the headers for the query, e.g. built from self.description
            self.connection.spoolFile.write("Headers for query\n")
            self.connection.spoolFile.write("use cursor.description\n")
        return result

    def fetchall(self):
        rows = super(Cursor, self).fetchall()
        if self.connection.spoolFile is not None:
            for row in rows:
                # write each row to the spool file
                self.connection.spoolFile.write("row details\n")
        return rows
That should give you some idea of where to go with this.
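A hypothetical usage of the sketch above (DSN, file name and query are placeholders):

conn = Connection("user/password@dsn")  # the subclass defined above
conn.spool(r"C:\Users\Administrator\Desktop\mycsv.csv")

cursor = conn.cursor()
cursor.execute("select * from some_table")
rows = cursor.fetchall()  # rows are also written to the spool file

conn.spoolFile.close()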
From the PostgreSQL docs on BEGIN:
Issuing BEGIN when already inside a transaction block will provoke a
warning message. The state of the transaction is not affected.
How can I make psycopg2 raise an exception on any such warning?
I am very far from being a psycopg2 or Postgres expert, and I am sure there is a better way to escalate the warning level, but here is something that worked for me: a custom cursor which looks into the connection's notices and throws an exception if it finds a warning. The implementation itself is mostly for educational purposes; I am sure it needs to be adjusted to work in your use case:
import psycopg2

# this "cursor" class needs to be used as a base for custom cursor classes
from psycopg2.extensions import cursor


class ErrorThrowingCursor(cursor):

    def execute(self, query, vars=None):
        result = super(ErrorThrowingCursor, self).execute(query, vars)
        # psycopg2 collects server messages on connection.notices
        for notice in self.connection.notices:
            level, message = notice.split(": ", 1)
            if level == "WARNING":
                raise psycopg2.Warning(message.strip())
        return result
Usage sample:
conn = psycopg2.connect(user="user", password="secret")
cursor = conn.cursor(cursor_factory=ErrorThrowingCursor)
This would throw an exception (of a psycopg2.Warning type) if a warning was issued after a query execution. Sample:
psycopg2.Warning: there is already a transaction in progress
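For instance, combined with the BEGIN behaviour quoted in the question, something like this should reproduce it (connection parameters are placeholders):

import psycopg2

conn = psycopg2.connect(user="user", password="secret")
conn.autocommit = True  # keep psycopg2 from issuing its own implicit BEGIN
cur = conn.cursor(cursor_factory=ErrorThrowingCursor)

cur.execute("BEGIN")  # opens a transaction block
cur.execute("BEGIN")  # PostgreSQL warns; the cursor raises psycopg2.Warning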