MongoDB in a Celery Task - Flask Application (Python)

I'm trying to use Celery in my Flask application.
I'm defining a task in a file insight_tasks.py.
In that file a function is defined:
@celery_app.task
def save_insights_task():
That function does some stuff and, here comes the error, when I try to save data into MongoDB the console throws:
MongoEngineConnectionError('You have not defined a default connection',)
So I think MongoEngine has not been initialized, and here is my question:
How should I use MongoDB inside a Celery task? When I use MongoDB in my routes (Flask app) it works as expected.
Does Celery not share the db instance?
Files:
__init__.py (Celery initialization)
celery_app = Celery('insights',
                    broker=config.CELERY_LOCATIONS_BROKER_URL,
                    backend=config.CELERY_LOCATIONS_RESULT_BACKEND,
                    include=['app.insight_tasks']
                    )
insight_tasks.py
from app.google import google_service
from app.models import LocationStats
from . import celery_app
from firebase_admin import db as firebase_db
import arrow

@celery_app.task
def save_insight_task(account_location, uid, gid, locations_obj, aggregation):
    try:
        insights, reviews = google_service.store_location_resources(
            gid, uid,
            start_datetime, end_datetime,
            account_location, aggregation
        )
    except StandardError as err:
        from pprint import pprint
        import traceback
        pprint(err)
        pprint(traceback.print_exc())

    path = 'saved_locations/{}/accounts/{}'.format(gid, account_location)
    location = [loc for loc in locations_obj if loc['name'] == 'accounts/' + account_location]
    if len(location) > 0:
        firebase_db.reference(path).update(location[0])
Here google_service.store_location_resources() is the function that saves the data into MongoDB. This function is also used elsewhere, by the routes of my app, where it works as expected; it only fails inside the Celery task.
---------
The Celery task is called from a POST request.
accounts/routes.py
@account.route('/save/queue', methods=['POST'])
def save_all_locations():
    data = request.data
    dataDict = json.loads(data)
    uid = request.headers.get('uid', None)
    gid = request.headers.get('gid', None)
    account_locations = dataDict['locations']
    locations_obj = dataDict['locations_obj']
    for path in account_locations:
        save_insight_task.delay(account_location=path, uid=uid, gid=gid, locations_obj=locations_obj, aggregation='SOME_TEXT')

You are supposed to connect to the database inside the task. The reason is that the child processes created by Celery must each have their own instance of the Mongo client.
More details here: Using PyMongo with Multiprocessing
For example, define a utils.py:
from pymodm import connect

def mongo_connect():
    return connect("mongodb://{0}:{1}/{2}".format(MONGODB['host'],
                                                  MONGODB['port'],
                                                  MONGODB['db_name']),
                   alias=MONGODB['db_name'])
Then in insight_tasks.py:
from utils import mongo_connect

@celery_app.task
def save_insight_task(account_location, uid, gid, locations_obj, aggregation):
    # connect to mongodb
    mongo_connect()

    # do your db operations
    try:
        insights, reviews = google_service.store_location_resources(
            gid, uid,
            start_datetime, end_datetime,
            account_location, aggregation
        )
    except StandardError as err:
        from pprint import pprint
        import traceback
        pprint(err)
        pprint(traceback.print_exc())

    path = 'saved_locations/{}/accounts/{}'.format(gid, account_location)
    location = [loc for loc in locations_obj if loc['name'] == 'accounts/' + account_location]
    if len(location) > 0:
        firebase_db.reference(path).update(location[0])
Note that I use the pymodm package instead of the mongoengine package as the ODM for Mongo.
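Since the question itself uses MongoEngine rather than pymodm, here is a hedged sketch of the same idea with mongoengine.connect, opening the connection once per worker process via Celery's worker_process_init signal instead of at the top of every task. The database name and URI below are placeholders, not values from the original post:

# insight_tasks.py (sketch) -- placeholder db name and URI
from celery.signals import worker_process_init
from mongoengine import connect

@worker_process_init.connect
def init_mongo_connection(**kwargs):
    # each forked Celery worker process registers its own MongoEngine connection
    connect(db='my_db', host='mongodb://localhost:27017/my_db', alias='default')

With that in place, save_insight_task should find a default connection without an explicit connect call inside the task body, at least under the default prefork worker pool.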

Related

FastAPI read configuration before specifying dependencies

I'm using fastapi-azure-auth to block calls to my API when the user is not logged in (more precisely, when the call from the UI doesn't carry a valid token).
My question doesn't have anything to do with this particular library; it's about FastAPI in general.
I use a class (SingleTenantAzureAuthorizationCodeBearer) which is callable. It is used in two places:
api.on_event("startup") - to connect to Azure
as a dependency in routes that should require authentication
To initialize it, it requires some values such as Azure IDs, which I provide via a config file.
The problem is that this class is created when the modules get evaluated, so the values from the config file would already have to be present at import time.
So, I have this:
dependencies.py
azure_scheme = SingleTenantAzureAuthorizationCodeBearer(
    app_client_id=settings.APP_CLIENT_ID,
    tenant_id=settings.TENANT_ID,
    scopes={
        f'api://{settings.APP_CLIENT_ID}/user_impersonation': 'user_impersonation',
    }
)
api.py
from .dependencies import azure_scheme

api = FastAPI(
    title="foo"
)

def init_api() -> FastAPI:
    # I want to read configuration here
    api.swagger_ui_init_oauth = {"clientId": config.CLIENT_ID}
    return api

@api.on_event('startup')
async def load_config() -> None:
    """
    Load OpenID config on startup.
    """
    await azure_scheme.openid_config.load_config()

@api.get("/", dependencies=[Depends(azure_scheme)])
def test():
    return {"hello": "world"}
Then I'd run the app with gunicorn -k uvicorn.workers.UvicornWorker 'foo:init_api()'.
The problem: the Depends part gets evaluated at import time, before init_api runs and before the config is read. I would have to read the config file before that happens, and I don't want to; I'd like to control when the config reading happens (that's why I have the init_api function, where I also initialize logging and other things).
My question is: is there a way to first read the config and only then initialize a dependency like SingleTenantAzureAuthorizationCodeBearer, so I can use the values from the config for that initialization?
Edit:
api.py:
from fastapi import Depends, FastAPI, Response
from fastapi.middleware.cors import CORSMiddleware

from .config import get_config
from .dependencies import get_azure_scheme

api = FastAPI(
    title="Foo",
    swagger_ui_oauth2_redirect_url="/oauth2-redirect",
)
api.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

def init_api() -> FastAPI:
    api.swagger_ui_init_oauth = {
        "usePkceWithAuthorizationCodeGrant": True,
        "clientId": get_config().client_id,
    }
    return api

@api.get("/test", dependencies=[Depends(get_azure_scheme)])
def test():
    return Response(status_code=200)
config.py:
import os
from functools import lru_cache

import toml
from pydantic import BaseSettings

class Settings(BaseSettings):
    client_id: str
    tenant_id: str

@lru_cache
def get_config():
    with open(os.getenv("CONFIG_PATH", ""), mode="r") as config_file:
        config_data = toml.load(config_file)
    return Settings(
        client_id=config_data["azure"]["CLIENT_ID"], tenant_id=config_data["azure"]["TENANT_ID"]
    )
dependencies.py:
from fastapi import Depends
from fastapi_azure_auth import SingleTenantAzureAuthorizationCodeBearer

from .config import Settings, get_config

def get_azure_scheme(config: Settings = Depends(get_config)):
    return SingleTenantAzureAuthorizationCodeBearer(
        app_client_id=config.client_id,
        tenant_id=config.tenant_id,
        scopes={
            f"api://{config.client_id}/user": "user",
        },
    )
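One side benefit of this layout, shown as a hedged sketch below: because get_azure_scheme is now an ordinary dependency function, it can be swapped out in tests without contacting Azure at all. The package name foo and the no-op override are illustrative, not part of the original post.

# hypothetical test sketch -- assumes the modules above live in a package "foo"
from fastapi.testclient import TestClient

from foo.api import api
from foo.dependencies import get_azure_scheme

# replace the auth dependency with a no-op so /test can be called locally
api.dependency_overrides[get_azure_scheme] = lambda: None

client = TestClient(api)
assert client.get("/test").status_code == 200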

Stuck implementing elasticsearch in a flask app

I'm currently working through the Flask Mega-Tutorial (Part XVI) and have gotten stuck implementing Elasticsearch. Specifically, I get this error when running the following from the flask shell command line:
>>> from app.search import add_to_index, remove_from_index, query_index
>>> for post in Post.query.all():
...     add_to_index('posts', post)
AttributeError: module 'flask.app' has no attribute 'elasticsearch'
I should mention that I did not implement the app restructuring from the previous lesson to use blueprints. Here's what my files look like:
__init__.py:
# ...
from elasticsearch import Elasticsearch

app.elasticsearch = Elasticsearch([app.config['ELASTICSEARCH_URL']]) \
    if app.config['ELASTICSEARCH_URL'] else None
config.py:
class Config(object):
    # ...
    ELASTICSEARCH_URL = 'http://localhost:9200'
search.py:
from flask import app

def add_to_index(index, model):
    if not app.elasticsearch:
        return
    payload = {}
    for field in model.__searchable__:
        payload[field] = getattr(model, field)
    app.elasticsearch.index(index=index, id=model.id, body=payload)

def remove_from_index(index, model):
    if not app.elasticsearch:
        return
    app.elasticsearch.delete(index=index, id=model.id)

def query_index(index, query, page, per_page):
    if not app.elasticsearch:
        return [], 0
    search = app.elasticsearch.search(
        index=index,
        body={'query': {'multi_match': {'query': query, 'fields': ['*']}},
              'from': (page - 1) * per_page, 'size': per_page})
    ids = [int(hit['_id']) for hit in search['hits']['hits']]
    return ids, search['hits']['total']['value']
I think I'm not importing elasticsearch correctly into search.py but I'm not sure how to represent it given that I didn't do the restructuring in the last lesson. Any ideas?
The correct way to write it in search.py is from flask import current_app, and then use current_app.elasticsearch instead of app.elasticsearch.
Not sure if you got this working, but the way I implemented it was by still using app.elasticsearch, and instead doing the following within search.py:
from app import app
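For concreteness, here is a sketch of what search.py looks like with the first suggestion applied; it mirrors the question's own functions, just switched to current_app (not tested here):

# search.py -- using the application context instead of importing the app object
from flask import current_app

def add_to_index(index, model):
    if not current_app.elasticsearch:
        return
    payload = {field: getattr(model, field) for field in model.__searchable__}
    current_app.elasticsearch.index(index=index, id=model.id, body=payload)

def remove_from_index(index, model):
    if not current_app.elasticsearch:
        return
    current_app.elasticsearch.delete(index=index, id=model.id)

These functions must be called inside an application context (as they are when invoked from routes or from flask shell).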

Flask return response but keep processing?

I'm not sure if I worded the question correctly, but essentially I want to return the response without ending the function.
My context: the user asks for a large Excel file to be generated, so a link is returned to them immediately, and when the Excel file is done an email is also sent.
Pseudo example:
from flask import Flask
from flask import send_file
from someXlsLib import createXls
from someIoLib import deleteFile
from someMailLib import sendMail
import uuid

app = Flask(__name__)
host = 'https://myhost.com/myApi'

@app.route('/getXls')
def getXls():
    fileName = uuid.uuid4().hex + '.xls'
    downloadLink = host + '/tempfiles/' + fileName
    # Returning the downloadLink for the user to access when the xls file is ready
    return downloadLink
    # But then this code is unreachable
    generateXls(fileName, downloadLink)

def generateXls(fileName, downloadLink):
    createXls('/tempfiles/' + fileName)
    sendMail(downloadLink)

@app.route('/tempfiles/<fileName>')
def getTempFile(fileName):
    # Same problem here, I need the user to finish the download before deleting the file
    return send_file('/tempfiles/' + fileName, attachment_filename=fileName)
    deleteFile('/tempfiles/' + fileName)
Other commenters are right that you need something to manage asynchronous work. One of the most popular options, which comes with lots of tools for delayed, scheduled, and asynchronous actions, is Celery. You can do what you want with Celery using something like the following:
from celery import Celery
...
# This is for Redis on the local host. You can also use RabbitMQ or AWS SQS.
celery = Celery(app.name, broker='redis://localhost:6379/0')
celery.conf.update(app.config)
...
# Create your Celery task
@celery.task(bind=True)
def generateXls(self, file_name, downloadLink):
    createXls('/tempfiles/' + file_name)
    sendMail(downloadLink)

@app.route('/getXls')
def getXls():
    fileName = uuid.uuid4().hex + '.xls'
    downloadLink = host + '/tempfiles/' + fileName
    # Asynchronously call your Celery task.
    generateXls.delay(fileName, downloadLink)
    return downloadLink
This will return the download link immediately while generateXls continues in a Celery worker.
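The same pattern can also cover the cleanup half of the question (deleting the temp file after the download). As a hedged sketch: rather than putting code after return, schedule a deletion task with a countdown. The task name and the one-hour delay are illustrative choices, not part of the original answer.

# hypothetical cleanup task using Celery's countdown option
@celery.task
def deleteTempFile(file_name):
    deleteFile('/tempfiles/' + file_name)

@app.route('/tempfiles/<fileName>')
def getTempFile(fileName):
    # schedule deletion for roughly an hour after the link is served
    deleteTempFile.apply_async(args=[fileName], countdown=3600)
    return send_file('/tempfiles/' + fileName, attachment_filename=fileName)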

deadline = None after using urlfetch.set_default_fetch_deadline(n)

I'm working on a web application with Python and Google App Engine.
I tried to set the default URLFetch deadline globally as suggested in a previous thread:
https://stackoverflow.com/a/14698687/2653179
urlfetch.set_default_fetch_deadline(45)
However it doesn't work - when I print its value in one of the functions, urlfetch.get_default_fetch_deadline() is None.
Here is main.py:
from google.appengine.api import users
import webapp2
import jinja2
import random
import string
import hashlib
import CQutils
import time
import os
import httpRequests
import logging
from google.appengine.api import urlfetch

urlfetch.set_default_fetch_deadline(45)
...
class Del(webapp2.RequestHandler):
    def get(self):
        id = self.request.get('id')
        ext = self.request.get('ext')
        user_id = httpRequests.advance(id, ext)
        d2 = urlfetch.get_default_fetch_deadline()
        logging.debug("value of deadline = %s", d2)
Prints in the Log console:
DEBUG 2013-09-05 07:38:21,654 main.py:427] value of deadline = None
The function which is being called in httpRequests.py:
def advance(id, ext=None):
    url = "http://localhost:8080/api/" + id + "/advance"
    if ext is None:
        ext = ""
    params = urllib.urlencode({'ext': ext})
    result = urlfetch.fetch(url=url,
                            payload=params,
                            method=urlfetch.POST,
                            headers={'Content-Type': 'application/x-www-form-urlencoded'})
    if (result.status_code == 200):
        return result.content
I know this is an old question, but I recently ran into the same issue.
The setting is stored in a thread-local, which means that if your application is threadsafe and the request is handled on a different thread from the one where you set the default deadline, the value is lost. For me, the solution was to set the deadline before every request as part of the middleware chain.
This is not documented, and it required looking through the source to figure out.
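A minimal sketch of that middleware approach, assuming a standard WSGI app object; the wrapper name and the 45-second value are illustrative:

from google.appengine.api import urlfetch

def deadline_middleware(wsgi_app, deadline=45):
    """Re-apply the thread-local urlfetch default on every request."""
    def middleware(environ, start_response):
        urlfetch.set_default_fetch_deadline(deadline)
        return wsgi_app(environ, start_response)
    return middleware

# e.g. wrap the webapp2 application before it is served
app = deadline_middleware(app, deadline=45)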

How can I use multiple databases in the same request in Cherrypy and SQLAlchemy?

My app connects to multiple databases using a technique similar to this. It works as long as I don't try to access different databases in the same request. Looking back at the above script, I see the authors left a comment to this effect:
SQLAlchemy integration for CherryPy,
such that you can access multiple databases,
but only one of these databases per request or thread.
My app now requires me to fetch data from Database A and Database B. Is it possible to do this in a single request?
Please see below for sources and examples:
Working Example 1:
from model import meta

my_object_instance = meta.main_session().query(MyObject).filter(
    MyObject.id == 1
).one()
Working Example 2:
from model import meta

my_user = meta.user_session().query(User).filter(
    User.id == 1
).one()
Error Example:
from model import meta

my_object_instance = meta.main_session().query(MyObject).filter(
    MyObject.id == 1
).one()
my_user = meta.user_session().query(User).filter(
    User.id == 1
).one()
This errors with:
(sqlalchemy.exc.ProgrammingError) (1146, "Table 'main_db.user' doesn't exist")
Sources:
# meta.py
import cherrypy
import sqlalchemy
from sqlalchemy import MetaData
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

# Return an Engine
def create_engine(defaultschema = True, schema = "", **kwargs):
    # A blank DB is the same as no DB so to specify a non-schema-specific connection just override with defaultschema = False
    connectionString = 'mysql://%s:%s@%s/%s?charset=utf8' % (
        store['application'].config['main']['database-server-config-username'],
        store['application'].config['main']['database-server-config-password'],
        store['application'].config['main']['database-server-config-host'],
        store['application'].config['main']['database-server-config-defaultschema'] if defaultschema else schema
    )
    # Create engine object. we pass **kwargs through so this call can be extended
    return sqlalchemy.create_engine(connectionString, echo=True, pool_recycle=10, echo_pool=True, encoding='utf-8', **kwargs)

# Engines
main_engine = create_engine()
user_engine = None

# Sessions
_main_session = None
_user_session = None

# Metadata
main_metadata = MetaData()
main_metadata.bind = main_engine
user_metadata = MetaData()

# No idea what bases are/do but nothing works without them
main_base = declarative_base(metadata = main_metadata)
user_base = declarative_base(metadata = user_metadata)

# An easy collection of user database connections
engines = {}

# Each thread gets a session based on this object
GlobalSession = scoped_session(sessionmaker(autoflush=True, autocommit=False, expire_on_commit=False))

def main_session():
    _main_session = cherrypy.request.main_dbsession
    _main_session.configure(bind=main_engine)
    return _main_session

def user_session():
    _user_session = cherrypy.request.user_dbsession
    _user_session.configure(bind = get_user_engine())
    return _user_session

def get_user_engine():
    # Get dburi from the users instance
    dburi = cherrypy.session['auth']['user'].instance.database
    # Store this engine for future use
    if dburi in engines:
        engine = engines.get(dburi)
    else:
        engine = engines[dburi] = create_engine(defaultschema = False, schema = dburi)
    # Return Engine
    return engine

def get_user_metadata():
    user_metadata.bind = get_user_engine()
    return user_metadata

# open a new session for the life of the request
def open_dbsession():
    cherrypy.request.user_dbsession = cherrypy.thread_data.scoped_session_class
    cherrypy.request.main_dbsession = cherrypy.thread_data.scoped_session_class
    return

# close the session for this request
def close_dbsession():
    if hasattr(cherrypy.request, "user_dbsession"):
        try:
            cherrypy.request.user_dbsession.flush()
            cherrypy.request.user_dbsession.remove()
            del cherrypy.request.user_dbsession
        except:
            pass
    if hasattr(cherrypy.request, "main_dbsession"):
        try:
            cherrypy.request.main_dbsession.flush()
            cherrypy.request.main_dbsession.remove()
            del cherrypy.request.main_dbsession
        except:
            pass
    return

# initialize the session factory class for the selected thread
def connect(thread_index):
    cherrypy.thread_data.scoped_session_class = scoped_session(sessionmaker(autoflush=True, autocommit=False))
    return

# add the hooks to cherrypy
cherrypy.tools.dbsession_open = cherrypy.Tool('on_start_resource', open_dbsession)
cherrypy.tools.dbsession_close = cherrypy.Tool('on_end_resource', close_dbsession)
cherrypy.engine.subscribe('start_thread', connect)
You could also choose an ORM that is designed from the ground up for multiple databases, such as Dejavu.
Take a look at this:
http://pythonhosted.org/Flask-SQLAlchemy/binds.html
Basically, it suggests that you use a bind parameter for each connection. That said, this seems to be a bit of a hack.
This question has a lot more detail in the answer:
With sqlalchemy how to dynamically bind to database engine on a per-request basis
That said, both this question and the one referenced aren't the newest, and SQLAlchemy has probably moved on since then.
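Independently of those links, the underlying issue in the error example appears to be that both queries go through the same request-bound scoped session class, so the second configure() re-binds the session the first query already used. A minimal, framework-agnostic sketch of keeping one scoped session per engine instead; the engine URLs are placeholders and MyObject / User stand for the question's own models:

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

# one engine and one scoped session factory per database
main_engine = create_engine('mysql://user:pw@host/main_db')
user_engine = create_engine('mysql://user:pw@host/user_db')

MainSession = scoped_session(sessionmaker(bind=main_engine))
UserSession = scoped_session(sessionmaker(bind=user_engine))

def handle_request():
    # both databases can now be queried in the same request
    my_object_instance = MainSession.query(MyObject).filter(MyObject.id == 1).one()
    my_user = UserSession.query(User).filter(User.id == 1).one()
    return my_object_instance, my_user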
