I'm trying to test a queued Redis (RQ) job, but the metadata doesn't seem to be passed between the task and the originator. The job IDs appear to match, so I'm perplexed. Maybe some fresh eyes can help me work out the problem:
The task is as per the documentation:
from rq import get_current_job
def do_test(word):
job = get_current_job()
print job.get_id()
job.meta['word'] = word
job.save()
print "saved: ", job.meta['word']
return True
The rqworker log prints the job ID and the word after it is saved:
14:32:32 *** Listening on default...
14:33:07 default: labeller.do_test('supercalafragelistic') (a6e2e579-df26-411a-b017-8788d621149f)
a6e2e579-df26-411a-b017-8788d621149f
saved: supercalafragelistic
14:33:07 Job OK, result = True
14:33:07 Result is kept for 500 seconds.
The task is invoked from a unittest:
import time
import unittest

from rq import Queue, use_connection

from labeller import do_test


class RedisQueueTestCase(unittest.TestCase):
    """
    Requires a running "rqworker" on the localhost cmdline
    """
    def setUp(self):
        use_connection()
        self.q = Queue()

    def test_enqueue(self):
        job = self.q.enqueue(do_test, "supercalafragelistic")
        while True:
            print job.get_id(), job.get_status(), job.meta.get('word')
            if job.is_finished:
                print "Result: ", job.result, job.meta.get('word')
                break
            time.sleep(0.25)
It generates this log showing the same job ID and the correct result, but the meta variable word is never populated:
Testing started at 2:33 PM ...
a6e2e579-df26-411a-b017-8788d621149f queued None
a6e2e579-df26-411a-b017-8788d621149f finished None
Result: True None
Process finished with exit code 0
I tried adding a long delay so the log had a chance to see the task in the started (but not yet finished) state, in case meta is cleared when the job finishes, but it didn't make any difference.
Any idea what I've missed?
The local Job object doesn't automatically update itself after a save() happens at the remote end; you have to refresh it to pull in the changes. Before the refactoring this wasn't necessary, because I was doing a fetch_job() with the job_id on every request.
So the test routine needs to include a refresh() (or fetch_job) to reflect any changes:
def test_enqueue(self):
job = self.q.enqueue(do_test, "supercalafragelistic")
while True:
job.refresh() #<--- well, duh, freddy
print job.get_id(), job.get_status(), job.meta.get('word')
if job.is_finished:
print "Result: ", job.result, job.meta.get('word')
break
time.sleep(0.25)
Which works a bit better:
Testing started at 5:14 PM ...
6ea0163f-b5d5-411a-906a-f765aa0b3cc6 queued None 0 []
6ea0163f-b5d5-411a-906a-f765aa0b3cc6 started supercalafragelistic
6ea0163f-b5d5-411a-906a-f765aa0b3cc6 finished supercalafragelistic
Result: True supercalafragelistic
The fact that get_status() was updating fooled me into overlooking this: get_status() is a method that goes and looks up the current status, whereas meta is just a reference to some possibly stale local data.
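For what it's worth, re-fetching the job by its ID works just as well as refresh(); a minimal sketch of that using the queue's fetch_job (same connection as before):

# Pull a fresh copy of the job out of Redis instead of refreshing in place
fresh = self.q.fetch_job(job.get_id())
print fresh.get_status(), fresh.meta.get('word')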
I have the following setup:
class TaskA(luigi.Task):
def requires(self):
yield TaskB()
if not get_results_from_task_B_written_on_S3():
print('Did not find any results and will exit')
return
else:
print('Found results and will proceed')
yield TaskC()
results = get_results_from_task_C_written_on_S3()
# do other stuff
class TaskB(luigi.Task):
def run(self):
# process and write results to s3
def output(self):
return URITarget('b_path')
class TaskC(luigi.Task):
def run(self):
# process and write results to s3
def output(self):
return URITarget('c_path')
The Luigi logs show the following:
Did not find any results and will exit
Found results and will proceed
To me it seems like the control flow enters both the if and the else branch. Since this is in principle impossible, I suspect that Luigi attempts to run the pipeline twice. First it produces this
Did not find any results and will exit
because it cannot find any results written to S3 by TaskB.
Then TaskB actually finishes its execution and writes its results to S3. TaskA reruns, finds the results from TaskB on S3, and produces
Found results and will proceed
But then it seems like the yield of TaskC is not working. It's just stuck there indefinitely.
This is just my assumption of Luigi's behavior. Please let me know if I'm wrong about this.
I need this modularisation of tasks B and C into separate tasks since it makes testing much easier. TaskC is a fairly complex task whose test setup would be much more involved than testing its constituents separately.
Part of the problem is that requires() can get called multiple times during scheduling. The first time your TaskA.requires() is called, it yields TaskB and, since TaskB hasn't written anything to S3 yet, you hit the if block. By the time TaskA.requires() is called again, TaskB has finished and its results are on S3, so you yield TaskB again and hit the else block. That first call to TaskA.requires() is the only one that gets used for the actual scheduling dependencies.
I wrote a test program just to test this out and you can see in my output how many times TaskB.output() is called.
import luigi
taskC_complete = False
taskB_complete = False
def get_results_from_task_C_written_on_S3():
return taskC_complete
def get_results_from_task_B_written_on_S3():
return taskB_complete
def set_taskB_complete():
    global taskB_complete
    taskB_complete = True
def set_taskC_complete():
    global taskC_complete
    taskC_complete = True
class TaskA(luigi.Task):
def requires(self):
yield TaskB()
if not get_results_from_task_B_written_on_S3():
print('Did not find any results and will exit')
return
else:
print('Found results and will proceed')
yield TaskC()
results = get_results_from_task_C_written_on_S3()
class TaskB(luigi.Task):
def run(self):
print("Task B")
    def output(self):
        # print() returns None, so this logs each time output() is called
        # without producing a real target
        return print('b_path')
class TaskC(luigi.Task):
def run(self):
print("Task C")
def output(self):
return print('c_path')
if __name__ == '__main__':
luigi_run_results = luigi.build([TaskA()], workers=1,
local_scheduler=True, detailed_summary=True, log_level='INFO')
This code outputs
Did not find any results and will exit
b_path
Task B
Did not find any results and will exit
b_path
Did not find any results and will exit
b_path
Did not find any results and will exit
b_path
Although the code is not a perfect replica of what you are attempting, here's the output from the scheduler which shows what will actually run:
INFO: Informed scheduler that task TaskA__99914b932b has status PENDING
INFO: Informed scheduler that task TaskB__99914b932b has status PENDING
I'm not sure exactly what you're trying to achieve, but read up on the documentation on task dependencies. You're better off yielding the other tasks from TaskA's run() method (dynamic dependencies), as sketched below.
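A rough sketch of that idea, reusing the helper names from the question (an illustration, not tested code): run() yields TaskB first, checks its results on S3, and only then yields TaskC.

class TaskA(luigi.Task):
    def run(self):
        # Dynamic dependency: run() suspends here until TaskB is complete
        yield TaskB()
        if not get_results_from_task_B_written_on_S3():
            print('Did not find any results and will exit')
            return
        print('Found results and will proceed')
        # TaskC is only scheduled once TaskB's results are actually on S3
        yield TaskC()
        results = get_results_from_task_C_written_on_S3()
        # do other stuff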
I want to use rq to run tasks on a separate worker to gather data from a measuring instrument. The end of the task will be signaled by a user pressing a button on a dash app.
The problem is that the task itself does not know when to terminate since it doesn't have access to the dash app's context.
I already use meta to pass information from the worker back to the caller, but can I pass information from the caller to the worker?
Example task:
from rq import get_current_job
from time import time, sleep

import numpy as np
def mock_measurement():
job = get_current_job()
t_start = time()
# Run the measurement
t = []
i = []
job.meta['should_stop'] = False # I want to use this tag to tell the job to stop
while not job.meta['should_stop']:
t.append(time() - t_start)
i.append(np.random.random())
job.meta['data'] = (t, i)
job.save_meta()
sleep(5)
print("Job Finished")
From the console, I can start a job like so:
from redis import Redis
import rq

queue = rq.Queue('test-app', connection=Redis('localhost', 6379))
job = queue.enqueue('tasks.mock_measurement')
and I would like to be able to do this from the console to signify to the worker it can stop running:
job.meta['should_stop'] = True
job.save_meta()
job.refresh()
However, while the commands above return without an error, they do not actually update the meta dictionary.
That's because the worker never fetches the updated meta. But don't do this!
Calling save_meta() and refresh() from both the caller and the worker will lose data, since each side overwrites the whole meta dict with its own copy.
Instead, set a separate Redis flag keyed off the job, e.g. job.connection.set(job.id + ':should_stop', 1, ex=300) to set the flag, and job.connection.get(job.id + ':should_stop') in the worker to check whether it is set.
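A rough sketch of how that could look, adapting the mock_measurement task from the question (the ':should_stop' key suffix and the 300-second expiry are arbitrary choices):

from rq import get_current_job
from time import time, sleep

import numpy as np

def mock_measurement():
    job = get_current_job()
    stop_key = job.id + ':should_stop'       # plain Redis key, separate from meta
    t_start = time()
    t, i = [], []
    while not job.connection.get(stop_key):  # the caller sets this key to stop us
        t.append(time() - t_start)
        i.append(np.random.random())
        job.meta['data'] = (t, i)            # meta is now only ever written by the worker
        job.save_meta()
        sleep(5)
    print("Job Finished")

And from the console, stopping it is just:

job.connection.set(job.id + ':should_stop', 1, ex=300)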
I'm trying to pause a Celery task temporarily based on a user's button click.
What I've done is:
When a user clicks a button, I fire an AJAX request that updates my Celery task state to "PAUSE".
Then my tactic was: when I initiate a task in Celery, it runs a for loop.
On every iteration of the loop I read the 'state' from my database and check whether it's set to PAUSE; if it is, I want to sleep for 60 seconds, or sleep until the user hits the resume button; same idea.
This is my code:
r = redis.StrictRedis(host='localhost', port=6379, db=0)

@celery.task(bind=True)
def runTask(self, arr):
    for items in arr:
        current_task_id = self.request.id
        item = r.get('celery-task-meta-' + current_task_id)
        load_as_json = json.loads(item)
        if "PAUSE" in load_as_json['status']:
            sleep(50)

@app.route('/start')
def start_task():
    runTask.apply_async(args=[arr])
    return 'task started running'
Here is what my pause API endpoint looks like:
@app.route('/stop/<task_id>')
def updateTaskState():
    task_id = request.cookie.get('task_id')
    loadAsJson = json.loads(r.get('celery-task-meta-' + str(task_id)))
    loadAsJson['status'] = 'PAUSE'
    dump_as_json = json.dumps(loadAsJson)
    updated_state = r.set('celery-task-meta-' + str(task_id), dump_as_json)
    return 'updated state'
From what I conceptually understand, the reason I'm not seeing an updated state is that the task is already executing and isn't able to retrieve updated values from the database.
FYI: the task state is set to PAUSE immediately; I checked this by creating a separate script that checks the state in a while loop. Every time I click the button that fires the AJAX request to update the state, my DB gets updated and the separate script reads "PAUSE"; however, inside the @celery.task-decorated function I can't seem to see the updated state.
Below is the separate script I used to test; it picks up the updated state as expected. I just can't get the updated state within the task... weirdly.
r = redis.StrictRedis(host='localhost', port=6379, db=0)
last_key = r.keys()
while True:
response = r.get('celery-task-meta-b1534a87-e18b-4f0a-89e2-08348d833056')
loadAsJson = json.loads(response)
print loadAsJson['status']
Faced with the same question and no good answers, I came up with a solution you might like, and it is not dependent on the message broker you are using (Redis or RabbitMQ). The key for me was that the update_state method in the celery.app.task.Task class takes task_id as an optional parameter. In my case I am running long-running file copy and checksum tasks through multiple worker nodes, and sometimes the user wants to pause one running task to reduce the load on the storage and allow other tasks to finish first. I am also running a stateless Flask REST API to initiate the backend tasks and retrieve the status of running tasks, so I needed a way for an API call to come in and pause or resume a task.
Here is my test function, which can receive a "message" to pause itself by monitoring its own state:
@celery.task(bind=True)
def long_test(self, i):
print('long test starting with delay of ' + str(i) + 'seconds on each loop')
print('task_id =' + str(self.request.id))
self.update_state(state='PROCESSING')
count = 0
while True:
task = celery.AsyncResult(self.request.id)
while task.state == 'PAUSING' or task.state == 'PAUSED':
if task.state == 'PAUSING':
self.update_state(state='PAUSED')
time.sleep(i)
if task.state == 'RESUME':
self.update_state(state='PROCESSING')
print('long test loop ' + str(count) + ' ' + str(task.state))
count += 1
time.sleep(i)
Then, in order to pause or resume I can do the following:
>>> from project.celeryworker.tasks import long_test
>>> from project import create_app, make_celery
>>> flaskapp = create_app()
>>> celery = make_celery(flaskapp)
>>> from celery.app.task import Task
>>> long_test.apply_async(kwargs={'i': 5})
<AsyncResult: bf19d50f-cf04-47f0-a069-6545fb253887>
>>> Task.update_state(self=celery, task_id='bf19d50f-cf04-47f0-a069-6545fb253887', state='PAUSING')
>>> celery.AsyncResult('bf19d50f-cf04-47f0-a069-6545fb253887').state
'PAUSED'
>>> Task.update_state(self=celery, task_id='bf19d50f-cf04-47f0-a069-6545fb253887', state='RESUME')
>>> celery.AsyncResult('bf19d50f-cf04-47f0-a069-6545fb253887').state
'PROCESSING'
>>> Task.update_state(self=celery, task_id='bf19d50f-cf04-47f0-a069-6545fb253887', state='PAUSING')
>>> celery.AsyncResult('bf19d50f-cf04-47f0-a069-6545fb253887').state
'PAUSED'
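Hooking this into the REST API is then just a matter of calling update_state with the task_id from a route. A rough sketch (the route names are my own invention; create_app and make_celery are the helpers from the session above):

from celery.app.task import Task
from project import create_app, make_celery   # helpers assumed from the setup above

flaskapp = create_app()
celery = make_celery(flaskapp)

@flaskapp.route('/pause/<task_id>')
def pause(task_id):
    # Ask the running task to pause; it flips itself to PAUSED on its next loop
    Task.update_state(self=celery, task_id=task_id, state='PAUSING')
    return celery.AsyncResult(task_id).state

@flaskapp.route('/resume/<task_id>')
def resume(task_id):
    Task.update_state(self=celery, task_id=task_id, state='RESUME')
    return celery.AsyncResult(task_id).state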
In a nutshell
I get a BrokenProcessPool exception when parallelizing my code with concurrent.futures. No further error is displayed. I want to find the cause of the error and ask for ideas of how to do that.
Full problem
I am using concurrent.futures to parallelize some code.
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as pool:
    mapObj = pool.map(myMethod, args)
I end up with (and only with) the following exception:
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
Unfortunately, the program is complex and the error appears only after the program has run for 30 minutes. Therefore, I cannot provide a nice minimal example.
In order to find the cause of the issue, I wrapped the method that I run in parallel with a try-except-block:
def myMethod(*args):
try:
...
except Exception as e:
print(e)
The problem remained the same and the except block was never entered. I conclude that the exception does not come from my code.
My next step was to write a custom ProcessPoolExecutor class that is a child of the original ProcessPoolExecutor and allows me to replace some methods with customized ones. I copied and pasted the original code of the method _process_worker and added some print statements.
def _process_worker(call_queue, result_queue):
"""Evaluates calls from call_queue and places the results in result_queue.
...
"""
while True:
call_item = call_queue.get(block=True)
if call_item is None:
# Wake up queue management thread
result_queue.put(os.getpid())
return
try:
r = call_item.fn(*call_item.args, **call_item.kwargs)
except BaseException as e:
print("??? Exception ???") # newly added
print(e) # newly added
exc = _ExceptionWithTraceback(e, e.__traceback__)
result_queue.put(_ResultItem(call_item.work_id, exception=exc))
else:
result_queue.put(_ResultItem(call_item.work_id,
result=r))
Again, the except block is never entered. This was to be expected, because I already ensured that my code does not raise an exception (and if everything worked well, the exception should be passed to the main process).
Now I am lacking ideas how I could find the error. The exception is raised here:
def submit(self, fn, *args, **kwargs):
with self._shutdown_lock:
if self._broken:
raise BrokenProcessPool('A child process terminated '
'abruptly, the process pool is not usable anymore')
if self._shutdown_thread:
raise RuntimeError('cannot schedule new futures after shutdown')
f = _base.Future()
w = _WorkItem(f, fn, args, kwargs)
self._pending_work_items[self._queue_count] = w
self._work_ids.put(self._queue_count)
self._queue_count += 1
# Wake up queue management thread
self._result_queue.put(None)
self._start_queue_management_thread()
return f
The process pool is set to be broken here:
def _queue_management_worker(executor_reference,
processes,
pending_work_items,
work_ids_queue,
call_queue,
result_queue):
"""Manages the communication between this process and the worker processes.
...
"""
executor = None
def shutting_down():
return _shutdown or executor is None or executor._shutdown_thread
def shutdown_worker():
...
reader = result_queue._reader
while True:
_add_call_item_to_queue(pending_work_items,
work_ids_queue,
call_queue)
sentinels = [p.sentinel for p in processes.values()]
assert sentinels
ready = wait([reader] + sentinels)
if reader in ready:
result_item = reader.recv()
else: #THIS BLOCK IS ENTERED WHEN THE ERROR OCCURS
# Mark the process pool broken so that submits fail right now.
executor = executor_reference()
if executor is not None:
executor._broken = True
executor._shutdown_thread = True
executor = None
# All futures in flight must be marked failed
for work_id, work_item in pending_work_items.items():
work_item.future.set_exception(
BrokenProcessPool(
"A process in the process pool was "
"terminated abruptly while the future was "
"running or pending."
))
# Delete references to object. See issue16284
del work_item
pending_work_items.clear()
# Terminate remaining workers forcibly: the queues or their
# locks may be in a dirty state and block forever.
for p in processes.values():
p.terminate()
shutdown_worker()
return
...
It is (or seems to be) a fact that a process terminates, but I have no clue why. Are my thoughts correct so far? What are possible causes that make a process terminate without a message? (Is this even possible?) Where could I apply further diagnostics? Which questions should I ask myself in order to come closer to a solution?
I am using Python 3.5 on 64-bit Linux.
I think I was able to get as far as possible:
I changed the _queue_management_worker method in my changed ProcessPoolExecutor module such that the exit code of the failed process is printed:
def _queue_management_worker(executor_reference,
processes,
pending_work_items,
work_ids_queue,
call_queue,
result_queue):
"""Manages the communication between this process and the worker processes.
...
"""
executor = None
def shutting_down():
return _shutdown or executor is None or executor._shutdown_thread
def shutdown_worker():
...
reader = result_queue._reader
while True:
_add_call_item_to_queue(pending_work_items,
work_ids_queue,
call_queue)
sentinels = [p.sentinel for p in processes.values()]
assert sentinels
ready = wait([reader] + sentinels)
if reader in ready:
result_item = reader.recv()
else:
# BLOCK INSERTED FOR DIAGNOSIS ONLY ---------
vals = list(processes.values())
for s in ready:
j = sentinels.index(s)
print("is_alive()", vals[j].is_alive())
print("exitcode", vals[j].exitcode)
# -------------------------------------------
# Mark the process pool broken so that submits fail right now.
executor = executor_reference()
if executor is not None:
executor._broken = True
executor._shutdown_thread = True
executor = None
# All futures in flight must be marked failed
for work_id, work_item in pending_work_items.items():
work_item.future.set_exception(
BrokenProcessPool(
"A process in the process pool was "
"terminated abruptly while the future was "
"running or pending."
))
# Delete references to object. See issue16284
del work_item
pending_work_items.clear()
# Terminate remaining workers forcibly: the queues or their
# locks may be in a dirty state and block forever.
for p in processes.values():
p.terminate()
shutdown_worker()
return
...
Afterwards I looked up the meaning of the exit code:
from multiprocessing.process import _exitcode_to_name
print(_exitcode_to_name[my_exit_code])
whereby my_exit_code is the exit code that was printed in the block I inserted to the _queue_management_worker. In my case the code was -11, which means that I ran into a segmentation fault. Finding the reason for this issue will be a huge task but goes beyond the scope of this question.
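If you hit the same situation, one further diagnostic step (my own suggestion, not part of the original debugging) is to enable faulthandler inside the worker function, so a segfaulting child process dumps a Python traceback to stderr before dying:

import faulthandler

def myMethod(*args):
    # Dump the Python traceback if this worker process receives SIGSEGV etc.
    faulthandler.enable()
    ...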
If you are using macOS, there is a known issue with how some versions of macOS use forking that is not considered fork-safe by Python in some scenarios. The workaround that worked for me is to use the no_proxy environment variable.
Edit ~/.bash_profile and include the following (it might be better to specify a list of domains or subnets here instead of *):
no_proxy='*'
Refresh the current context:
source ~/.bash_profile
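If you'd prefer not to touch your shell profile, setting the variable from Python before the pool is created should (I assume) have the same effect:

import os
os.environ['no_proxy'] = '*'   # set before the ProcessPoolExecutor is created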
The local versions where I saw the issue and applied the workaround are Python 3.6.0 on macOS 10.14.1 and 10.13.x.
Sources:
Issue 30388
Issue 27126
First time in the realm of Luigi (and Python!), and I have some questions. The relevant code is:
from Database import Database
import luigi
class bbSanityCheck(luigi.Task):
conn = luigi.Parameter()
date = luigi.Parameter()
def __init__(self, *args, **kwargs):
super(bbSanityCheck, self).__init__(*args, **kwargs)
self.has_run = False
def run(self):
print "Entering run of bb sanity check"
# DB STUFF HERE THAT DOESN'T MATTER
print "Are we in la-la land?"
def complete(self):
print "BB Sanity check being asked for completeness: " , self.has_run
return self.has_run
class Pipeline(luigi.Task):
date = luigi.DateParameter()
def requires(self):
db = Database('cbs')
self.conn = db.connect()
print "I'm about to yield!"
return bbSanityCheck(conn = self.conn, date = self.date)
def run(self):
print "Hello World"
self.conn.query("""SELECT *
FROM log_blackbook""")
result = self.conn.store_result()
print result.fetch_row()
def complete(self):
return False
if __name__=='__main__':
luigi.run()
Output is here (with the relevant DB returns removed):
DEBUG: Checking if Pipeline(date=2013-03-03) is complete
I'm about to yield!
INFO: Scheduled Pipeline(date=2013-03-03)
I'm about to yield!
DEBUG: Checking if bbSanityCheck(conn=<_mysql.connection open to 'sas1.rad.wc.truecarcorp.com' at 223f050>, date=2013-03-03) is complete
BB Sanity check being asked for completeness: False
INFO: Scheduled bbSanityCheck(conn=<_mysql.connection open to 'sas1.rad.wc.truecarcorp.com' at 223f050>, date=2013-03-03)
INFO: Done scheduling tasks
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 2
INFO: [pid 5150] Running bbSanityCheck(conn=<_mysql.connection open to 'sas1.rad.wc.truecarcorp.com' at 223f050>, date=2013-03-03)
Entering run of bb sanity check
Are we in la-la land?
INFO: [pid 5150] Done bbSanityCheck(conn=<_mysql.connection open to 'sas1.rad.wc.truecarcorp.com' at 223f050>, date=2013-03-03)
DEBUG: Asking scheduler for work...
INFO: Done
INFO: There are no more tasks to run at this time
INFO: There are 1 pending tasks possibly being run by other workers
INFO: Worker was stopped. Shutting down Keep-Alive thread
So the questions:
1.) Why does "I'm about to yield" get printed twice?
2.) Why is "hello world" never printed?
3.) What is the "1 pending tasks possibly run by other workers"?
I prefer super-ultra clean output because it is way easier to maintain. I'm hoping I can get these warning equivalents ironed out.
I've also noted that requires() can either "yield" or "return item, item2, item3". I've read about yield and understand it. What I don't get is which convention is considered superior here, or whether there are subtle differences that I, being new to the language, am not picking up on.
I think you're misunderstanding how luigi works in general.
(1) Hmm... not sure about that. It looks more like an issue of the same thing being printed at both INFO and DEBUG levels to me.
(2)
So, you're trying to run Pipeline which depends on bbSanityCheck to run. bbSanityCheck.complete() never returns True because you never set has_run to True in bbSanityCheck. So the Pipeline task can never run and output hello world, because its dependencies are never complete.
(3) That's probably because you have this pending task (it's actually Pipeline). But Luigi understands that it is impossible for it to run and shuts down.
I would personally not use has_run as a way to check whether a task has run, but would instead check for the existence of the task's result. I.e., if this job does something to the database, then complete() should check that the expected contents are there, e.g. along the lines of the sketch below.
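A rough sketch of that idea for bbSanityCheck (the marker table and queries are made up; adapt them to whatever the task actually writes):

class bbSanityCheck(luigi.Task):
    conn = luigi.Parameter()
    date = luigi.DateParameter()

    def run(self):
        # ... do the sanity check, then record the outcome in the database ...
        self.conn.query("INSERT INTO sanity_check_log (check_date) VALUES ('%s')" % self.date)

    def complete(self):
        # Complete only if the marker row for this date already exists
        self.conn.query("SELECT 1 FROM sanity_check_log WHERE check_date = '%s'" % self.date)
        result = self.conn.store_result()
        return len(result.fetch_row()) > 0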