I am trying to run multiple tasks in a queue. The tasks are triggered by user input. What I tried was creating a singleton class with a ThreadPoolExecutor property and adding tasks to it. The tasks are added fine, but it looks like only the first batch of tasks works. Subsequent batches are added but not executed.
from concurrent.futures import ThreadPoolExecutor

class WebsiteTagScrapper:
    class __WebsiteTagScrapper:
        def __init__(self):
            self.executor = ThreadPoolExecutor(max_workers=5)

    instance = None

    def __new__(cls):  # __new__ is implicitly a static method that receives the class
        if not WebsiteTagScrapper.instance:
            WebsiteTagScrapper.instance = WebsiteTagScrapper.__WebsiteTagScrapper()
        return WebsiteTagScrapper.instance
I used multiprocessing in one of my projects without Celery, because I thought Celery was overkill for my use case.
Maybe you could do something like this:
from multiprocessing import Process

class MyQueuProcess(Process):
    def __init__(self):
        super(MyQueuProcess, self).__init__()
        self.tasks = []

    def add_task(self, task):
        self.tasks.append(task)

    def run(self):
        for task in self.tasks:
            # Do your task
            pass
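You just have to create an instance in your view, add your tasks, and then call start(), which executes run() in the child process (calling run() directly would execute the tasks in the current process). Also, if you need to access your database, you will need to import django in the child process and call django.setup().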
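If you would rather stay with threads, as in the question, a module-level executor is a simpler way to share one pool than the __new__ singleton. This is only a minimal sketch, not a diagnosis of the original code: scrape_tags, submit_batch and the example URLs are hypothetical names made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# One pool for the whole process; every import of this module reuses it.
_EXECUTOR = ThreadPoolExecutor(max_workers=5)


def scrape_tags(url):
    # Hypothetical placeholder for the real scraping work.
    return "tags for " + url


def submit_batch(urls):
    """Queue one task per URL and return the futures without blocking."""
    return [_EXECUTOR.submit(scrape_tags, url) for url in urls]


# Two separate batches go to the same pool.
first = submit_batch(["a.example", "b.example"])
second = submit_batch(["c.example"])
results = [future.result() for future in first + second]
```

One thing worth checking in the original code is whether the executor is ever shut down after the first batch (for example by calling shutdown() or by using it in a with block), since an executor that has been shut down rejects later submissions with a RuntimeError.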
I'm trying to create some celery tasks as classes, but am having some difficulty. The classes are:
class BaseCeleryTask(app.Task):
    def is_complete(self):
        """ default method for checking if celery task has completed. """
        # simply return result (since by default tasks return boolean indicating completion)
        try:
            return self.result
        except AttributeError:
            logger.error('Result not defined. Make sure task has run!')
            return False

class MacroReportTask(BaseCeleryTask):
    def run(self, params):
        """ Override the default run method with signal factory run"""
        # hold on to the factory
        process = MacroCountryReport(params)
        self.result = process.run()
        return self.result
But when I initialize the app and check app.tasks (or run the worker), the app doesn't seem to have the above tasks in its registry. Other function-based tasks (using the app.task() decorator) seem to be registered fine.
I run the above task as:
process = SignalFactoryTask()
process.delay(params)
Celery worker errors with the following message:
Received unregistered task of type None.
I think the issue I'm having is: how do I add custom classes to the task registry as I do with regular function based tasks?
I ran into the exact same issue, and it took hours to find the solution because I'm 90% sure it's a bug. In your class-based tasks, try the following:
class BaseCeleryTask(app.Task):
    def __init__(self):
        self.name = "[modulename].BaseCeleryTask"

class MacroReportTask(app.Task):
    def __init__(self):
        self.name = "[modulename].MacroReportTask"
It seems registering it with the app still has a bug where the name isn't automatically configured. Let me know if that works.
If I have a class with attributes...
class Test(object):
    def __init__(self):
        self.variable = 'test'
        self.variable2 = ''

    def testmethod(self):
        print(self.variable2)

t = Test()

@celery.task(name="tasks.application")
def application():
    t.testmethod()

t.variable2 = '1234'
job = application.apply_async()
and I want to access the attributes of my class...
In my testing I am not able to access t.variable2 once inside of my celery task... How can I get access to those attributes?
Thanks!
Tasks are executed by a separate worker process, which does not share memory with the process where you assigned those values. You need to send the data required by the class as arguments to the task, and create the instance inside the task as well:
@celery.task(name="tasks.application")
def application(variable, variable2):
    t = Test()
    t.variable = variable
    t.variable2 = variable2
    t.testmethod()

job = application.apply_async(['test', '1234'])
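To see why only the arguments make it across, here is a small sketch that does not use Celery at all. It just imitates the boundary between caller and worker by round-tripping the arguments through json, which is an assumption made for illustration; the real transport depends on your broker and serializer settings.

```python
import json


class Test(object):
    def __init__(self):
        self.variable = 'test'
        self.variable2 = ''

    def testmethod(self):
        return self.variable2


def application(variable, variable2):
    # Build the object inside the task from plain, serializable arguments.
    t = Test()
    t.variable = variable
    t.variable2 = variable2
    return t.testmethod()


# Caller side: only this serialized message leaves the process.
message = json.dumps({"args": ["test", "1234"]})

# Worker side: the task sees nothing but what was in the message.
received = json.loads(message)
result = application(*received["args"])
```

Anything that was only set on an object in the caller's memory, like t.variable2 in the question, is not part of the message and so never reaches the worker.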
I have a UUT class which instantiates Worker objects, and calls their do_stuff() method.
The Worker objects use a Provider object for two things:
Calls methods on the provider object to do some stuff
Gets notifications from the provider by subscribing a method with the provider's events
When a worker gets a notification, it processes it and notifies the UUT object, which in response can create more Worker objects.
I've already tested each class on its own, and I want to test UUT+Worker together. For that, I intend to mock-out Provider.
import mock
import unittest
import provider

class Worker():
    def __init__(self, *args):
        resource.default_resource.subscribe('on_spam', self._on_spam)  # I'm going to patch 'resource.default_resource'

    def do_stuff(self):
        self.resource.do_stuff()

    def _on_spam(self, message):
        self._tell_uut_to_create_more_workers(message['num_of_new_workers_to_create'])

class UUT():
    def __init__(self, *args):
        self._workers = []

    def gen_worker_and_do_stuff(self, *args):
        worker = Worker(*args)
        self._workers.append(worker)
        worker.do_stuff()

class TestCase1(unittest.TestCase):
    @mock.patch('resource.default_resource', spec_set=resource.Resource)
    def test_1(self, mock_resource):
        uut = UUT()
        uut.gen_worker_and_do_stuff('Egg')  # <-- say I automagically grabbed the resulting Worker into self.workers
        self.workers[0]._on_spam({'num_of_new_workers_to_create': 5})  # <-- I also want to get hold of the newly-created workers
Is there a way to grab the worker objects generated by uut, without directly accessing the _workers list in uut (which is an implementation detail)?
I guess I can do it in Worker.__init__, where the worker subscribes to provider events, so I guess the question reduces to:
How do I extract the self in the callee, when calling resource.default_resource.subscribe('on_spam', self._on_spam)?
As an application of the Dependency Inversion principle, I'd pass the Worker class as a dependency to UUT:
class UUT():
    def __init__(self, make_worker=Worker):
        self._workers = []
        self._make_worker = make_worker

    def gen_worker_and_connect(self, *args):
        worker = self._make_worker(*args)
        self._workers.append(worker)
        worker.connect()
Then provide anything you want from the test instead of Worker. Your own factory function could share the created object with the test scope. Besides solving this particular problem, that would also make the dependency explicit and independent of the UUT implementation. You would also no longer need to mock the resource, since mocking it makes the test depend on things unrelated to the class under test.
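As a rough sketch of what the test side could look like: the Worker below is a simplified stand-in without the provider subscription, and recording_factory and created are hypothetical names, not part of the original code.

```python
class Worker(object):
    """Simplified stand-in for the real Worker (no provider subscription)."""

    def __init__(self, *args):
        self.args = args
        self.did_stuff = False

    def do_stuff(self):
        self.did_stuff = True


class UUT(object):
    def __init__(self, make_worker=Worker):
        self._workers = []
        self._make_worker = make_worker

    def gen_worker_and_do_stuff(self, *args):
        worker = self._make_worker(*args)
        self._workers.append(worker)
        worker.do_stuff()


# Test scope: the factory records every worker it hands out.
created = []


def recording_factory(*args):
    worker = Worker(*args)
    created.append(worker)
    return worker


uut = UUT(make_worker=recording_factory)
uut.gen_worker_and_do_stuff('Egg')
```

After the call, created holds the worker that uut made, so the test can drive it (for example by calling its notification handler) without ever touching uut._workers.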
I'm creating a task (by subclassing celery.task.Task) that creates a connection to Twitter's streaming API. For the Twitter API calls, I am using tweepy. As I've read in the Celery documentation, 'a task is not instantiated for every request, but is registered in the task registry as a global instance.' I was expecting that whenever I call apply_async (or delay) for the task, I would be accessing the task that was originally instantiated, but that doesn't happen. Instead, a new instance of the custom task class is created. I need to be able to access the original custom task, since this is the only way I can terminate the original connection created by the tweepy API call.
Here's some code in case it helps:
from celery import registry
from celery.task import Task

class FollowAllTwitterIDs(Task):
    def __init__(self):
        # requirements for creation of the customstream
        # goes here. The CustomStream class is a subclass
        # of tweepy.streaming.Stream class
        self._customstream = CustomStream(*args, **kwargs)

    @property
    def customstream(self):
        if self._customstream:
            # terminate existing connection to Twitter
            self._customstream.running = False
        self._customstream = CustomStream(*args, **kwargs)
        return self._customstream

    def run(self):
        self._to_follow_ids = function_that_gets_list_of_ids_to_be_followed()
        self.customstream.filter(follow=self._to_follow_ids, async=False)

follow_all_twitterids = registry.tasks[FollowAllTwitterIDs.name]
And for the Django view:
def connect_to_twitter(request):
    if request.method == 'POST':
        do_stuff_here()
        .
        .
        .
        follow_all_twitterids.apply_async(args=[], kwargs={})
    return
Any help would be appreciated. :D
EDIT:
For additional context for the question, the CustomStream object creates an httplib.HTTPSConnection instance whenever the filter() method is called. This connection needs to be closed whenever there is another attempt to create one. The connection is closed by setting customstream.running to False.
The task should only be instantiated once. If you think it is not for some reason, I suggest you add
print("INSTANTIATE")
import traceback
traceback.print_stack()
to the Task.__init__ method, so you can tell where this is happening.
I think your task could be better expressed like this:
from celery.task import Task, task

class TwitterTask(Task):
    _stream = None
    abstract = True

    def __call__(self, *args, **kwargs):
        try:
            return super(TwitterTask, self).__call__(*args, **kwargs)
        finally:
            # stop the stream once the task body has finished
            if self._stream:
                self._stream.running = False

    @property
    def stream(self):
        # created lazily on first access, then reused
        if self._stream is None:
            self._stream = CustomStream()
        return self._stream

@task(base=TwitterTask)
def follow_all_ids():
    ids = get_list_of_ids_to_follow()
    follow_all_ids.stream.filter(follow=ids, async=False)
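The lazy property plus the cleanup in finally can be tried out without Celery or tweepy. The sketch below is only an illustration of that pattern: FakeStream and StreamTask are hypothetical stand-ins for CustomStream and the task base class.

```python
class FakeStream(object):
    """Hypothetical stand-in for CustomStream; counts how many get created."""
    created = 0

    def __init__(self):
        FakeStream.created += 1
        self.running = True


class StreamTask(object):
    _stream = None  # class-level default, replaced on first access

    @property
    def stream(self):
        # Create the stream lazily and keep it on the task instance.
        if self._stream is None:
            self._stream = FakeStream()
        return self._stream

    def __call__(self, *args, **kwargs):
        try:
            return self.run(*args, **kwargs)
        finally:
            # Always stop the stream, even if run() raised.
            if self._stream is not None:
                self._stream.running = False

    def run(self):
        return self.stream


task = StreamTask()
first = task()
second = task()
```

Both calls get the same stream object back, and it is marked as not running once each call finishes. That only holds as long as the task object itself is created once, which is what the print and traceback suggestion above is meant to verify.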
I'm working on a project in Tornado that relies heavily on the asynchronous features of the library. By following the chat demo, I've managed to get long-polling working with my application; however, I seem to have run into a problem with the way it all works.
Basically what I want to do is be able to call a function on the UpdateManager class and have it finish the asynchronous request for any callbacks in the waiting list. Here's some code to explain what I mean:
update.py:
class UpdateManager(object):
    waiters = []
    attrs = []
    other_attrs = []

    def set_attr(self, attr):
        self.attrs.append(attr)

    def set_other_attr(self, attr):
        self.other_attrs.append(attr)

    def add_callback(self, cb):
        self.waiters.append(cb)

    def send(self):
        for cb in self.waiters:
            cb(self.attrs, self.other_attrs)

class LongPoll(tornado.web.RequestHandler, UpdateManager):
    @tornado.web.asynchronous
    def get(self):
        self.add_callback(self.finish_request)

    def finish_request(self, attrs, other_attrs):
        # Render some JSON to give the client, etc...
        pass

class SetSomething(tornado.web.RequestHandler):
    def post(self):
        # Handle the stuff...
        self.add_attr(some_attr)
(There's more code implementing the URL handlers/server and such, however I don't believe that's necessary for this question)
So what I want to do is make it so I can call UpdateManager.send from another place in my application and still have it send the data to the waiting clients. The problem is that when you try to do this:
from update import UpdateManager
UpdateManager.send()
it only gets the UpdateManager class, not the instance of it that is holding user callbacks. So my question is: is there any way to create a persistent object with Tornado that will allow me to share a single instance of UpdateManager throughout my application?
Don't use instance methods - use class methods (after all, you're already using class attributes, you just might not realize it). That way, you don't have to instantiate the object, and can instead just call the methods of the class itself, which acts as a singleton:
class UpdateManager(object):
    waiters = []
    attrs = []
    other_attrs = []

    @classmethod
    def set_attr(cls, attr):
        cls.attrs.append(attr)

    @classmethod
    def set_other_attr(cls, attr):
        cls.other_attrs.append(attr)

    @classmethod
    def add_callback(cls, cb):
        cls.waiters.append(cb)

    @classmethod
    def send(cls):
        for cb in cls.waiters:
            cb(cls.attrs, cls.other_attrs)
This will make...
from update import UpdateManager
UpdateManager.send()
work as you desire it to.
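The remark about already using class attributes can be checked with a small sketch. It keeps the instance method from the question and leaves Tornado out entirely; the plain LongPoll subclass and callback are hypothetical stand-ins for the request handler and its finish_request method.

```python
class UpdateManager(object):
    waiters = []  # class attribute: one list shared by all instances and subclasses

    def add_callback(self, cb):
        # self.waiters resolves to the class-level list; append mutates it in place.
        self.waiters.append(cb)


class LongPoll(UpdateManager):
    pass


def callback():
    return "done"


first = LongPoll()
second = LongPoll()
first.add_callback(callback)

# The second instance sees the callback registered through the first one.
shared = second.waiters is UpdateManager.waiters
```

Because the lists are never reassigned on an instance, every handler instance was appending to the same class-level lists all along, so switching to class methods changes how you reach the state rather than where it lives.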