I'm developing a reasonably basic app in Pyramid. The app includes functionality to send email. Currently I'm intending to use Sendgrid for this purpose but do not want to couple it too tightly. Additionally, I do not want any emails sent out during development or testing. My solution is to create lightweight middleware classes for each provider, all providing a send() method.
I imagine loose coupling can be achieved by using the Configurator object but I'm not quite there yet.
If given the following code (note there is no request as I want to be able to call this via Celery):
def send_email(sender, recipient, subject, contents):
    emailer = get_emailer()
    # "from" is a reserved word in Python, so the keyword is renamed here
    emailer.send(sender=sender, recipient=recipient, subject=subject, body=contents)
What would the get_emailer() function look like, assuming my development.ini contained something like pyramid.includes = my_app.DumpToConsoleEmailer?
Your mention of Celery changes everything... Celery doesn't make it very obvious, but a Celery worker is a completely separate process, which knows nothing about your Pyramid application, potentially runs on a different machine, and may execute tasks hours after your web application created them. A worker just takes tasks one by one from the queue and executes them. There's no request, no Configurator, no WSGI stack, no PasteDeploy assembling your application from an .ini file.
Point is - a Celery worker does not know if your Pyramid process was started in development or production configuration, unless you tell it explicitly. It is even possible to have a worker executing tasks from two applications, one running in development mode, and another in production :)
One option is to pass the configuration to your celery worker explicitly on startup (say, by declaring some variable in celeryconfig.py). Then a worker would always use the same mailer for all tasks.
Another option is to pass a "mailer_type" parameter explicitly from your Pyramid app to the worker for each task:
@task
def send_email(sender, recipient, subject, contents, mailer_type='dummy'):
    emailer = get_emailer(mailer_type)
    emailer.send(sender=sender, recipient=recipient, subject=subject, body=contents)
In your Pyramid app, you can put any key/value pairs in your .ini file and access them via request.registry.settings:
send_email.delay(..., request.registry.settings['mailer_type'])
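A get_emailer(mailer_type) along these lines would then map that string onto a concrete mailer. This is only a sketch; DummyMailer and SendgridMailer are placeholder names for whatever classes you actually write:

class DummyMailer(object):
    """Placeholder that prints instead of sending - fine for development/testing."""
    def send(self, **kwargs):
        print('Would send email: %r' % (kwargs,))

class SendgridMailer(object):
    """Placeholder for a mailer that wraps the Sendgrid client."""
    def send(self, **kwargs):
        raise NotImplementedError('wire up the Sendgrid client here')

def get_emailer(mailer_type):
    # Map the string passed along with the task to a concrete mailer instance.
    mailers = {'dummy': DummyMailer, 'sendgrid': SendgridMailer}
    return mailers[mailer_type]()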
Since asking the question a month ago I have done a bit of reading. It has led me to two possible solutions:
A) Idiomatic Pyramid
We want to address two problems:
How to set up a class, specified in a PasteDeploy Configuration File (.ini file) globally for the Pyramid app
How to access our class at runtime without "going through" request
To set up the specified class we should define an includeme() function in our module and then specify the module in our .ini file as part of pyramid.includes. In our includeme() function we then use config.registry.registerUtility(), a part of the Zope Component Architecture, to register our class and an interface it implements.
To access our class at runtime we then need to call registry.queryUtility(), having gotten the registry from pyramid.threadlocal.get_current_registry().
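A minimal sketch of this approach, assuming names like IMailer and ConsoleMailer (they are not anything Pyramid provides):

from pyramid.threadlocal import get_current_registry
from zope.interface import Interface, implementer

class IMailer(Interface):
    """Marker interface our mailers implement."""

@implementer(IMailer)
class ConsoleMailer(object):
    def send(self, **kwargs):
        print(kwargs)

def includeme(config):
    # Runs because the module is listed under pyramid.includes in the .ini file.
    config.registry.registerUtility(ConsoleMailer(), IMailer)

def get_emailer():
    registry = get_current_registry()
    return registry.queryUtility(IMailer)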
This solution is a bit of a hack since it uses threadlocal to get the config.
B) Lazy Module Globals
My personal solution to the problem was simpler (and likely not thread-safe):
# In module MailerHolder:
class Holder(object):
    mailer = None

holder = Holder()

def get_emailer():
    return holder.mailer

# In module ConsoleMailer:
import MailerHolder

class ConsoleMailer(object):
    def send(self, **kwargs):
        # Code to print email to console
        print(kwargs)

def includeme(config):
    MailerHolder.holder.mailer = ConsoleMailer()
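For completeness, the wiring in development.ini would then look something like this (the module path depends on where ConsoleMailer actually lives):

[app:main]
pyramid.includes = my_app.ConsoleMailer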
Related
I have a Django rest framework app that calls 2 huey tasks in succession in a serializer create method like so:
...
def create(self, validated_data):
    user = self.context['request'].user
    player_ids = validated_data.get('players', [])
    game = Game.objects.create()
    tasks.make_players_friends_task(player_ids)
    tasks.send_notification_task(user.id, game.id)
    return game
# tasks.py
@db_task()
def make_players_friends_task(ids):
    players = User.objects.filter(id__in=ids)
    # process players

@db_task()
def send_notification_task(user_id, game_id):
    user = User.objects.get(id=user_id)
    game = Game.objects.get(id=game_id)
    # send notifications
When running the huey process in the terminal and hitting this endpoint, I can see that only one or the other of the tasks is ever called, never both. I am running huey with the default settings (Redis with a single worker thread).
If I alter the code so that I am passing in the objects themselves as parameters, rather than the ids, and remove the Django queries inside the @db_task methods, things seem to work alright.
The reason I initially used the ids as parameters is because I assumed (or read somewhere) that huey uses JSON serialization by default, but after looking into it, pickle is actually the default serializer.
One theory is that since I am only running one worker, and also have a @db_periodic_task method in the app, the process can only handle either listening for tasks or executing them at any one time, but not both. This is the way Celery seems to work, where you need a separate process each for the scheduler and the worker, but this isn't mentioned in huey's documentation.
If you run the huey consumer it will actually spawn a separate scheduler alongside the number of workers you've specified, so that's not going to be your problem.
You're not giving enough information to see exactly what's going wrong, so check the following:
If you run the huey consumer in the terminal, observe whether all your tasks show up as properly registered so that the consumer is actually capable of consuming them.
Check whether your redis process is running.
Try performing the tasks with a blocking call to see which task it fails on:
task_result = tasks.make_players_friends_task(player_ids)
task_result.get(blocking=True)
task_result = tasks.send_notification_task(user.id, game.id)
task_result.get(blocking=True)
Do this with a debugger or print statements to see whether it makes it to the end of your function or where it gets stuck.
Make sure to always restart your consumer when you change code. It doesn't automatically pick up new code like the Django dev server. The fact that your code works as intended when pickling whole objects instead of passing ids could point to this, since it would be very odd for that change alone to break things. On the other hand, you shouldn't pass in Django ORM objects anyway; your id approach makes much more sense.
I am having some problems using Background Threads in a Managed VM in Google App Engine.
I am getting callbacks from a library linked via ctypes which need to be executed in the background, as I explained in a previous question.
The problem is that the background thread loses the execution context of the WSGI application and is missing environment variables such as the application ID. Without those, calls to the database fail.
I start the background thread like this:
background_thread.start_new_background_thread(saveItemsToDatabase, [])
Is there a way to copy the environment to the background thread or maybe execute the task in a different context?
Update: the relevant part of the traceback, which makes the problem clear:
_ToDatastoreError(err)
google.appengine.api.datastore_errors.BadRequestError: Application Id (app) format is invalid: '_'
The application context in App Engine is thread-local when it is created through the standard app handler. Remember that App Engine applications running on python27 with threading enabled are already multi-threaded, so each WSGI call's environment variables have to be thread-local, or information would leak between handled requests.
This means that additional threads you create will need to be passed the app context explicitly.
In fact, the docs on background threads are pretty clear about what is going on (https://cloud.google.com/appengine/docs/python/modules/#Python_Background_threads): "A background thread's os.environ and logging entries are independent of those of the spawning thread."
So you have to copy the environment (os.environ), or the parts you need, and pass it to the thread as arguments. The problem may not be limited to the application id; you may find that's only the first thing missing, for instance if you use namespaces.
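Something along these lines should work; which keys are worth copying depends on your app, so treat this as a sketch rather than a complete list:

import os
from google.appengine.api import background_thread

def start_saving(environ_snapshot):
    def run():
        # The background thread gets an independent os.environ, so restore the
        # values captured from the spawning request before touching the datastore.
        os.environ.update(environ_snapshot)
        saveItemsToDatabase()   # the function from the question
    background_thread.start_new_background_thread(run, [])

# In the request handler, snapshot what you need before spawning:
needed = ('APPLICATION_ID', 'CURRENT_VERSION_ID', 'HTTP_X_APPENGINE_CURRENT_NAMESPACE')
start_saving(dict((k, os.environ[k]) for k in needed if k in os.environ))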
INTRO
I've recently switched to Python, after about 10 years of PHP development and habits.
E.g. in Symfony2, every request to the server (Apache, for instance) has to load the container class and instantiate it in order to construct the "rest" of the objects.
As far as I understand (I hope) Python's WSGI env, an app is created once, and until that app closes, every request just calls methods/functions.
This means that I can have, e.g., one instance of some class that can be accessed every time a request is dispatched, without having to instantiate it in every request. Am I right?
QUESTION
I want to have one instance of a class, since the call to __init__ is very expensive (in both computation and resource locking). In PHP, instantiating this in every request degrades performance. Am I right that with Python's WSGI I can instantiate this once, on app startup, and use it across requests? If so, how do I achieve this?
WSGI is merely a standardized interface that makes it possible to build the various components of a web-server architecture so that they can talk to each other.
Pyramid is a framework whose components are glued with each other through WSGI.
Pyramid, like other WSGI frameworks, makes it possible to choose the actual server part of the stack, like gunicorn, Apache, or others. That choice is for you to make, and there lies the ultimate answer to your question.
What you need to know is whether your server is multi-threaded or multi-process. In the latter case, it's not enough to check whether a global variable has been instantiated in order to initialize costly resources, because subsequent requests might end up in separate processes that don't share state.
If your model is multi-threaded, then you might indeed rely on global state, but be aware that you are introducing a strong dependency in your code. Maybe a singleton pattern coupled with dependency injection can help keep your code cleaner and more open to change.
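If you do rely on per-process global state in a multi-threaded server, a minimal sketch could look like this (ExpensiveService stands in for your costly class):

import threading

class ExpensiveService(object):
    def __init__(self):
        pass  # imagine heavy setup here

_service = None
_lock = threading.Lock()

def get_service():
    # Lazily build the instance once per process, then reuse it across requests.
    global _service
    if _service is None:
        with _lock:
            if _service is None:
                _service = ExpensiveService()
    return _service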
The best method I found was mentioned (and I missed it earlier) in Pyramid docs:
From Pyramid Docs#Startup
Note that an augmented version of the values passed as **settings to the Configurator constructor will be available in Pyramid view callable code as request.registry.settings. You can create objects you wish to access later from view code, and put them into the dictionary you pass to the configurator as settings. They will then be present in the request.registry.settings dictionary at application runtime.
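A minimal sketch of what that looks like in the main() function of your package's __init__.py; ExpensiveThing and the 'expensive_thing' key are placeholders:

from pyramid.config import Configurator

class ExpensiveThing(object):   # stand-in for the costly class
    pass

def main(global_config, **settings):
    # Build the expensive object once, at startup, and stash it in settings;
    # views can later reach it as request.registry.settings['expensive_thing'].
    settings['expensive_thing'] = ExpensiveThing()
    config = Configurator(settings=settings)
    return config.make_wsgi_app()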
There are a number of ways to do this in Pyramid, depending on what you want to accomplish in the end. It might be useful to look closely at the Pyramid/SQLAlchemy tutorial as an example of how to handle an expensive initialization (database connection and metadata setup) and then pass that into the request-handling engine.
Note that in the referenced link, the important part for your question is the __init__.py file's handling of initialize_sql and the subsequent creation of DBSession.
Using Google App Engine, Python 2.7, threadsafe:true, webapp2.
I would like to include all logging.XXX() messages in my API responses, so I need an efficient way to collect up all the log messages that occur during the scope of a request. I also want to operate in threadsafe:true, so I need to be careful to get only the right log messages.
Currently, my strategy is to add a logging.Handler at the start of my webapp2 dispatch method, and then remove it at the end. To collect logs only for my thread, I instantiate the logging.Handler with the name of the current thread; the handler will simply throw out log records that are from a different thread. I am using thread name and not thread ID because I was getting some unexpected results on dev_appserver when using the ID.
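Roughly, the handler I have in mind looks like this (a simplified sketch; class and method names are just illustrative):

import logging
import threading

class RequestLogHandler(logging.Handler):
    """Collects formatted records emitted by one named thread only."""
    def __init__(self, thread_name):
        logging.Handler.__init__(self)
        self.thread_name = thread_name
        self.records = []

    def emit(self, record):
        if record.threadName == self.thread_name:
            self.records.append(self.format(record))

# In webapp2's dispatch():
#     handler = RequestLogHandler(threading.current_thread().name)
#     logging.getLogger().addHandler(handler)
#     try:
#         super(MyBaseHandler, self).dispatch()
#     finally:
#         logging.getLogger().removeHandler(handler)
#     # handler.records now holds this request's messages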
Questions:
Is it efficient to constantly be adding/removing logging.Handler objects in this fashion? I.e., every request will add, then remove, a Handler. Is this "cheap"?
Is this the best way to get only the logging messages for my request? My big assumption is that each request gets its own thread, and that thread name will actually select the right items.
Am I fundamentally misunderstanding Python logging? Perhaps I should only have a single additional Handler added once at the "module-level" statically, and my dispatch should do something lighter.
Any advice is appreciated. I don't have a good understanding of what Python (and specifically App Engine Python) does under the hood with respect to logging. Obviously, this is eminently possible because the App Engine Log Viewer does exactly the same thing: it displays all the log messages for that request. In fact, if I could piggyback on that somehow, that would be even better. It absolutely needs to be super-cheap though - i.e., an RPC call is not going to cut it.
I can add some code if that will help.
I found lots of goodness here:
from google.appengine.api import logservice
entries = logservice.logs_buffer().parse_logs()
I'm working on an app that uses the standard logging module to do logging. We have a setup where we log to a bunch of files based on levels etc. We also use Celery to run some jobs outside of the main app (usually time-consuming maintenance work).
The Celery task does nothing other than call functions (let's say spam) which do the actual work. These functions use the logging module to output status messages. Now, I want to write a decorator that hijacks all the logging calls made by spam and puts them into a StringIO so that I can put them somewhere.
One of the solutions I had was to insert a handler for the root logger while the function is executing that grabs everything. However, this is messing with global shared objects which might be problematic later.
I came across this answer but it's not exactly what I'm looking for.
The thing about the StringIO is that there could be multiple processes running (Celery tasks), hence multiple StringIOs, right?
You can do something like this:
In the processes run under Celery, add to the root logger a handler which sends events to a socket (SocketHandler for TCP or DatagramHandler for UDP).
Create a socket receiver to receive and handle the events, as documented here. This acts like a consolidated StringIO across multiple Celery-run processes.
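A minimal sketch of the sending side (the first step) inside the Celery-run process; the host and port are whatever your receiver listens on:

import logging
import logging.handlers

# Attach once, e.g. at worker startup; every logging call in the tasks then
# also flows to the receiver process listening on this host/port.
socket_handler = logging.handlers.SocketHandler(
    'localhost', logging.handlers.DEFAULT_TCP_LOGGING_PORT)
logging.getLogger().addHandler(socket_handler)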
If you are using multiprocessing, you can also use the approach described here. Though that post talks about Python 3.2, the functionality is also available for Python 2.x using logutils.
Update: If you want to avoid a separate receiver process, you can log to a database directly, using a handler similar to that in this answer. If you want to buffer all the logging till the end of the process, you can use a MemoryHandler in conjunction with a database handler to achieve this.
For the StringIO handler, you could add an extra handler for the root logger that would grab everything, but at the same time add a dummy filter (Logger.addFilter) that filters everything out (so nothing is actually logged to StringIO).
You could then write a decorator for spam that removes the filter (Logger.removeFilter) before the function executes, and adds the dummy filter back after.
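A rough sketch of that decorator, with the filter attached to the StringIO handler itself rather than the root logger, so the other handlers keep logging normally; the names are illustrative:

import functools
import logging

class BlockEverything(logging.Filter):
    def filter(self, record):
        return 0  # drop every record

_block = BlockEverything()

def capture_logging(stringio_handler):
    # stringio_handler is the StreamHandler wrapping your StringIO; it stays
    # attached to the root logger, but is filtered out except while the
    # decorated function runs.
    stringio_handler.addFilter(_block)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            stringio_handler.removeFilter(_block)
            try:
                return func(*args, **kwargs)
            finally:
                stringio_handler.addFilter(_block)
        return wrapper
    return decorator

# Usage sketch:
#   buf = StringIO.StringIO()
#   handler = logging.StreamHandler(buf)
#   logging.getLogger().addHandler(handler)
#
#   @capture_logging(handler)
#   def spam():
#       logging.info("only captured while spam() runs")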