I've been breaking my head for a few days over what I thought was a simple task (not anymore):
The main program sends hundreds of SQL queries to fetch data from multiple databases.
I thought Celery would be the right choice, since it can scale and also simplifies the threading/async orchestration.
The "clean" solution would be one generic task that looks something like:
@app.task(bind=True, name='fetch_data')
def fetch_data(self, *args, **kwargs):
    db = kwargs['db']
    sql = kwargs['sql']
    session = DBContext().get_session(db)
    result = session.query(sql).all()
    ...
But I'm having trouble implementing such a DBContext class: one that instantiates once per DB, reuses the DB session across requests, and closes it once the requests are done (or any other approach you might recommend).
I was thinking about using a base Task class to decorate the function and keeping all the available connections there, but the problem is that such a class is initialized only once, not dynamically per call... maybe there's a way to make it work, but I'm not sure how...
class DatBaseFactory(Task):
    def __call__(self, *args, **kwargs):
        print("In class", self.db)
        self.engine = DBContext.get_db(self.db)
        return super().__call__(*args, **kwargs)


@app.task(bind=True, base=DatBaseFactory, name='test_db', db=db, engine='')
def test_db(self, *args, **kwargs):
    print("From Task", self.engine)
The other alternative would be duplicating the function once per database and preserving a session in each, but that's quite an ugly solution.
Hope someone can help with this...
I am using FastAPI and I would like to know if I am using dependencies correctly.
First, I have a function that yields the database session.
class ContextManager:
    def __init__(self):
        self.db = DBSession()

    def __enter__(self):
        return self.db

    def __exit__(self, exc_type, exc_value, traceback):
        self.db.close()


def get_db():
    with ContextManager() as db:
        yield db
I would like to use that function in another function:
def validate(db=Depends(get_db)):
    is_valid = verify(db)
    if not is_valid:
        raise HTTPException(status_code=400)
    yield db
Finally, I would like to use the last functions as a dependency on the routes:
@router.get('/')
def get_data(db=Depends(validate)):
    data = db.query(...)
    return data
I am using this code and it seems to work, but I would like to know if it is the most appropriate way to use dependencies. In particular, I am not sure whether I have to use yield db inside the validate function or whether it would be better to use return. I would appreciate your help. Thanks a lot.
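For comparison, this is the return-based variant I am considering; since validate does no cleanup of its own after the response, I assume it behaves the same:

def validate(db=Depends(get_db)):
    if not verify(db):
        raise HTTPException(status_code=400)
    return db  # no teardown after the response, so return may work just as well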
I am building a tool that fetches data from a different database, transforms it, and stores it in my own database. I'm migrating from APScheduler to Celery, but I ran into the following problem:
I use a class I call JobRecords to store when a job ran, whether it was successful, and which errors it encountered. I use this to avoid looking back too far for updated entries, especially since some tables have multiple millions of rows.
Since the system is the same for all jobs, I created a subclass of the Celery Task object. I make sure the job is executed within the Flask app context, and I fetch the latest time this job finished successfully. I also make sure I register a value for now to avoid timing issues between querying the database and adding the job record.
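(The JobRecord model itself is stripped from this post; based on how it is used below, it looks roughly like this, with column types guessed:)

from app.extensions import db

class JobRecord(db.Model):
    """Rough sketch only; the real model is omitted for brevity."""
    id = db.Column(db.Integer, primary_key=True)
    job_id = db.Column(db.String, nullable=False)     # Celery task name
    run_at_ = db.Column(db.DateTime, nullable=False)  # when the job was started
    success = db.Column(db.Boolean, default=True)
    info = db.Column(db.JSON, nullable=True)          # error details on failure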
from datetime import datetime
from traceback import format_exception

from celery import Task, current_app as current_celery_app
from sqlalchemy import func
from sqlalchemy.orm import Session, scoped_session

from app import app  # the Flask application (import path assumed)


class RecordedTask(Task):
    """
    Task subclass that uses JobRecords to get the last run date
    and add new JobRecords on completion
    """
    now: datetime = None
    ignore_result = True
    _session: scoped_session = None
    success: bool = True
    info: dict = None
    to_trigger: list = []  # (job_name,) or (job_name, kwargs) tuples; stripped from this excerpt

    @property
    def session(self) -> Session:
        """Making sure we have one global session instance"""
        if self._session is None:
            from app.extensions import db
            self._session = db.session
        return self._session

    def __call__(self, *args, **kwargs):
        from app.models import JobRecord
        kwargs['last_run'] = (
            self.session.query(func.max(JobRecord.run_at_))
            .filter(JobRecord.job_id == self.name, JobRecord.success)
            .first()
        )[0] or datetime.min
        self.now = kwargs['now'] = datetime.utcnow()
        with app.app_context():
            super(RecordedTask, self).__call__(*args, **kwargs)

    def on_failure(self, exc, task_id, args: list, kwargs: dict, einfo):
        self.session.rollback()
        self.success = False
        self.info = dict(
            args=args,
            kwargs=kwargs,
            error=exc.args,
            exc=format_exception(exc.__class__, exc, exc.__traceback__),
        )
        app.logger.error(f"Error executing job '{self.name}': {exc}")

    def on_success(self, retval, task_id, args: list, kwargs: dict):
        app.logger.info(f"Executed job '{self.name}' successfully, adding JobRecord")
        for entry in self.to_trigger:
            if len(entry) == 2:
                job, kwargs = entry
            else:
                job, = entry
                kwargs = {}
            app.logger.info(f"Scheduling job '{job}'")
            current_celery_app.signature(job, **kwargs).delay()

    def after_return(self, *args, **kwargs):
        from app.models import JobRecord
        record = JobRecord(
            job_id=self.name,
            run_at_=self.now,
            info=self.info,
            success=self.success
        )
        self.session.add(record)
        self.session.commit()
        self.session.remove()
I added an example of a job to update a model called Location, but there are a lot of jobs just like this one.
@celery.task(bind=True, name="update_locations")
def update_locations(self, last_run: datetime = datetime.min, **_):
    """Get the locations from the external database and check for updates"""
    locations: List[ExternalLocation] = ExternalLocation.query.filter(
        ExternalLocation.updated_at_ >= last_run
    ).order_by(ExternalLocation.id).all()
    app.logger.info(f"ExternalLocation: collected {len(locations)} updated locations")
    for update_location in locations:
        existing_location: Location = Location.query.filter(
            Location.external_id == update_location.id
        ).first()
        if existing_location is None:
            self.session.add(Location.from_worker(update_location))
        else:
            existing_location.update_from_worker(update_location)
The problem is that when I run this job, the Location objects are not committed with the JobRecord, so only the latter is created. If I track it with the debugger, Location.query.count() returns the correct value inside the function, but as soon as it enters the on_success callback, it's back to 0, and self._session.new returns an empty dict.
I already tried adding the session as a property to make sure it's the same instance everywhere, but the problem still persists. Maybe it has something to do with it being a scoped_session because of Flask-SQLAlchemy?
Sorry about the large amount of code, I did try to strip as much away as possible. Any help is welcome!
I found out that the culprit was the combination of scoped_session and the Flask app context. Like any context manager, running the code inside with app.app_context() triggered the __exit__ function on leaving, which in turn caused the ScopedRegistry, where the scoped_session was stored, to be cleared. Then a new session was created, the JobRecords were added to that, and that session was committed. Therefore, the locations were never written to the database.
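To see the mechanism in isolation, here is a minimal self-contained demonstration (assuming Flask-SQLAlchemy and an in-memory SQLite database); the object pending in the first app context is gone in the second one:

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite://"
db = SQLAlchemy(app)

class Item(db.Model):
    id = db.Column(db.Integer, primary_key=True)

with app.app_context():
    db.create_all()
    db.session.add(Item())
    print(db.session.new)  # the pending Item, still in session A
# leaving the context ran session.remove(), discarding session A

with app.app_context():
    print(db.session.new)  # empty: this is a brand-new session B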
There are two possible solutions. If you don't use the session in files other than your task, you can add a session property to the task. This way, you avoid the scoped_session altogether, and you can clean up in your after_return function:
@property
def session(self):
    if self._session is None:
        from dashboard.extensions import db
        self._session = db.create_session(options={})()
    return self._session
However, I was accessing the session in my model definition files as well, through from extensions import db. Therefore, I was using two different sessions. I ended up using app.app_context().push() instead of the context manager, thus avoiding the __exit__ function:
app.app_context().push()
super(RecordedTask, self).__call__(*args, **kwargs)
I'm using Python 3.4 to interact with Oracle 11g / SQL Developer.
Is it true that cx_Oracle cannot deal with SQL*Plus statements? The page https://sourceforge.net/p/cx-oracle/mailman/message/2932119/ seems to say so.
So how can we execute the 'spool' command from Python?
The code:
import cx_Oracle

db_conn = cx_Oracle.connect(...)
cursor = db_conn.cursor()
cursor.execute('spool C:\\Users\\Administrator\\Desktop\\mycsv.csv')
...
The error: cx_Oracle.DatabaseError: ORA-00900: invalid SQL statement
The "spool" command is very specific to SQL*Plus and is not available in cx_Oracle or any other application that uses the OCI (Oracle Call Interface). You can do something similar, however, without too much trouble.
You can create your own Connection class subclassed from cx_Oracle.Connection and your own Cursor class subclassed from cx_Oracle.Cursor that would perform any logging, with a special "spool" method that turns it on and off at will. Something like this:
import cx_Oracle

class Connection(cx_Oracle.Connection):

    def __init__(self, *args, **kwargs):
        self.spoolFile = None
        super(Connection, self).__init__(*args, **kwargs)

    def cursor(self):
        return Cursor(self)

    def spool(self, fileName):
        self.spoolFile = open(fileName, "w")


class Cursor(cx_Oracle.Cursor):

    def execute(self, statement, args=None):
        result = super(Cursor, self).execute(statement, args)
        if self.connection.spoolFile is not None:
            # write the headers for the query, e.g. built from self.description
            self.connection.spoolFile.write("Headers for query\n")
            self.connection.spoolFile.write("use cursor.description\n")
        return result

    def fetchall(self):
        rows = super(Cursor, self).fetchall()
        if self.connection.spoolFile is not None:
            for row in rows:
                # write each row to the spool file
                self.connection.spoolFile.write("row details\n")
        return rows
That should give you some idea of where to go with this.
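A hypothetical usage of the sketch above (DSN, file name and query are placeholders):

conn = Connection("user/password@dsn")  # the subclass defined above
conn.spool(r"C:\Users\Administrator\Desktop\mycsv.csv")

cursor = conn.cursor()
cursor.execute("select * from some_table")
rows = cursor.fetchall()  # rows are also written to the spool file

conn.spoolFile.close()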
From the PostgreSQL docs on BEGIN:
Issuing BEGIN when already inside a transaction block will provoke a
warning message. The state of the transaction is not affected.
How can I make psycopg2 raise an exception on any such warning?
I am very far from being a psycopg2 or Postgres expert, and I am sure there is a better way to escalate the warning level, but here is something that worked for me: a custom cursor which looks into the connection's notices and throws an exception if it finds a warning. The implementation itself is mostly for educational purposes; I am sure it needs to be adjusted to work in your use case:
import psycopg2

# this "cursor" class needs to be used as a base for custom cursor classes
from psycopg2.extensions import cursor


class ErrorThrowingCursor(cursor):

    def execute(self, query, vars=None):
        result = super(ErrorThrowingCursor, self).execute(query, vars)
        # psycopg2 collects server messages on connection.notices
        for notice in self.connection.notices:
            level, message = notice.split(": ", 1)
            if level == "WARNING":
                raise psycopg2.Warning(message.strip())
        return result
Usage sample:
conn = psycopg2.connect(user="user", password="secret")
cursor = conn.cursor(cursor_factory=ErrorThrowingCursor)
This would throw an exception (of a psycopg2.Warning type) if a warning was issued after a query execution. Sample:
psycopg2.Warning: there is already a transaction in progress
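For instance, combined with the BEGIN behaviour quoted in the question, something like this should reproduce it (connection parameters are placeholders):

import psycopg2

conn = psycopg2.connect(user="user", password="secret")
conn.autocommit = True  # keep psycopg2 from issuing its own implicit BEGIN
cur = conn.cursor(cursor_factory=ErrorThrowingCursor)

cur.execute("BEGIN")  # opens a transaction block
cur.execute("BEGIN")  # PostgreSQL warns; the cursor raises psycopg2.Warning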