I am trying to configure Luigi's retry mechanism so that failed tasks will be retried a few times. However, while the task is retried successfully, Luigi exits unsuccessfully:
===== Luigi Execution Summary =====
Scheduled 3 tasks of which:
* 2 ran successfully:
- 1 FailOnceThenSucceed(path=/tmp/job-id-18.subtask)
- 1 MasterTask(path=/tmp/job-id-18)
* 1 failed:
- 1 FailOnceThenSucceed(path=/tmp/job-id-18.subtask)
This progress looks :( because there were failed tasks
So the question is: how do I configure Luigi (I have installed version 2.3.3 with pip install) so that when a task fails once but then succeeds on retry, Luigi exits successfully with This progress looks :) instead of failing with This progress looks :(?
Here is a minimal scheduler and worker config I've come up with, as well as tasks to demonstrate the behavior:
[scheduler]
retry_count = 3
retry-delay = 1
[worker]
keep_alive=true
mytasks.py:
import luigi

class FailOnceThenSucceed(luigi.Task):
    path = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(self.path)

    def run(self):
        failmarker = luigi.LocalTarget(self.path + ".hasfailedonce")
        if failmarker.exists():
            with self.output().open('w') as target:
                target.write('OK')
        else:
            with failmarker.open('w') as marker:
                marker.write('Failed')
            raise RuntimeError("Failed once")

class MasterTask(luigi.Task):
    path = luigi.Parameter()

    def requires(self):
        return FailOnceThenSucceed(path=self.path + '.subtask')

    def output(self):
        return luigi.LocalTarget(self.path)

    def run(self):
        with self.output().open('w') as target:
            target.write('OK')
Example execution:
PYTHONPATH=. luigi --module mytasks MasterTask --workers=2 --path='/tmp/job-id-18'
This is a known old issue of Luigi: tasks that failed and then succeeded on retry were not marked as successful:
https://github.com/spotify/luigi/issues/1932
It was fixed in version 2.7.2:
https://github.com/spotify/luigi/releases/tag/2.7.2
I suggest you upgrade to the latest Luigi version, e.g. by running pip install -U luigi.
Related
I wrote a trivial piece of code to run tasks in Luigi. The code is as below:
import luigi

count = 0

class TaskC(luigi.Task):
    def requires(self):
        return None
    def run(self):
        print("Running task C ...")
        global count
        with self.output().open('w') as outfile:
            outfile.write("Finished task C, count = %d", count)
        count += 1
    def output(self):
        return luigi.LocalTarget("./logs/task_c.txt")

class TaskB(luigi.Task):
    def requires(self):
        return None
    def run(self):
        print("Running task B ...")
        global count
        with self.output().open('w') as outfile:
            outfile.write("Finished task B, count = %d ...", count)
        count += 1
    def output(self):
        return luigi.LocalTarget("./logs/task_b.txt")

class TaskA(luigi.Task):
    def requires(self):
        return [TaskB(), TaskC()]
    def run(self):
        print("Running task A ...")
        global count
        with self.output().open('w') as outfile:
            outfile.write("Finished task A, count = %d ..." , count)
        count += 1
    def output(self):
        return luigi.LocalTarget("./logs/task_a.txt")

if __name__ == '__main__':
    print("Start the fisrt luigi app :)")
    luigi.run()
Expected: I want to run TaskA, but TaskA requires TaskB and TaskC, so TaskB and TaskC should run first, and only when both B and C are finished should TaskA run.
Actual: Only TaskA runs; the other tasks don't. The console log:
Start the fisrt luigi app :)
DEBUG: Checking if TaskA() is complete
INFO: Informed scheduler that task TaskA__99914b932b has status DONE
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
INFO: Worker Worker(salt=382715991, workers=1, host=w10tng, username=tng, pid=2096) was stopped. Shutting down Keep-Alive thread
INFO:
===== Luigi Execution Summary =====
Scheduled 1 tasks of which:
* 1 complete ones were encountered:
- 1 TaskA()
Did not run any tasks
This progress looks :) because there were no failed tasks or missing dependencies
===== Luigi Execution Summary =====
Command that i used to run:
python first_luigi_app.py --local-scheduler TaskA
I don't know if I've been missing something! I'd appreciate it if someone could help :)
You can try removing the requires methods from TaskB and TaskC, as currently, by returning None, they are skipped.
Also, when using f-string formatting it worked OK.
Run with: python -m luigi --module l1 TaskA --local-scheduler, where l1 is l1.py (a copy of your code).
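There is also a separate bug in the tasks above: file.write() takes a single string argument, so outfile.write("Finished task C, count = %d", count) raises a TypeError; the placeholder has to be interpolated before the call (e.g. with an f-string, as mentioned above). A minimal sketch of just that fix, with the formatting pulled into a plain helper function for illustration:

```python
count = 0

def finished_line(task_name, count):
    # Interpolate before calling write(): write() accepts exactly one string.
    return f"Finished task {task_name}, count = {count}"

# Inside a task's run() this would become:
#     with self.output().open('w') as outfile:
#         outfile.write(finished_line("C", count))
```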
I am new to Luigi and exploring its possibilities. I encountered a problem where I defined a task with requires, run, and output methods. In run(), I'm executing the contents of a file.
However, if the file does not exist, the task does not fail. Is there something I'm missing?
import luigi
import logging
import time
import sys, os

logging.basicConfig(filename='Execution.log', level=logging.DEBUG)
date = time.strftime("%Y%m%d")

class CreateTable(luigi.Task):
    def run(self):
        os.system('./createsample.hql')
        # with self.output().open('w') as f:
        #     f.write('Completed')
    def output(self):
        return luigi.LocalTarget('/tmp/CreateTable_Success_%s.csv' % date)
Output :
INFO: [pid 15553] Worker Worker(salt=747259359, workers=1, host=host-.com, username=root, pid=15553) running CreateTable()
sh: ./createsample.hql: No such file or directory
INFO: [pid 15553] Worker Worker(salt=747259359, workers=1, host=host-.com, username=root, pid=15553) done CreateTable()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task CreateTable__99914b932b has status DONE
Technically your code works: the Python part of your job ran successfully. The problem is that you are making a system call that fails because the file does not exist.
What you need to do here is check the return code of the system call. A return code of 0 means it ran successfully; any other outcome yields a non-zero return code:
rc = os.system('./createsample.hql')
if rc:
    raise Exception("something went wrong")
You might want to use the subprocess module for system calls to have more flexibility (and complexity): https://docs.python.org/2/library/subprocess.html
I have a coroutine as follows:
async def download():
    downloader = DataManager()
    downloader.download()
DataManager.download() method looks like:
def download(self):
    start_multiple_docker_containers()
    while True:
        check_containers_statuses()
        sleep(N)  # synchronous sleep from the time module
Is this a good practice? If no, how can I use asyncio.sleep in download()?
Or maybe such code structure is conceptually wrong?
Here's my solution:
import asyncio
import time

# Mocks of domain-specific functions
# ----------------------------------

def get_container_status(container_id, initial_time):
    """This mocks container status to change to 'exited' in 10 seconds"""
    if time.time() - initial_time < 10:
        print("%s: container %s still running" % (time.time(), container_id))
        return 'running'
    else:
        print("%s: container %s exited" % (time.time(), container_id))
        return 'exited'

def is_download_complete(container_id, initial_time):
    """This mocks download finished in 20 seconds after program's start"""
    if time.time() - initial_time < 20:
        print("%s: download from %s in progress" % (time.time(), container_id))
        return False
    else:
        print("%s: download from %s done" % (time.time(), container_id))
        return True

def get_downloaded_data(container_id):
    return "foo"

# Coroutines
# ----------

async def container_exited(container_id, initial_time):
    while True:
        await asyncio.sleep(1)  # == setTimeout(1000), != sleep(1000)
        if get_container_status(container_id, initial_time) == 'exited':
            return container_id

async def download_data_by_container_id(container_id, initial_time):
    container_id = await container_exited(container_id, initial_time)
    while True:
        await asyncio.sleep(1)
        if is_download_complete(container_id, initial_time):
            return get_downloaded_data(container_id)

# Main loop
# ---------

if __name__ == "__main__":
    initial_time = time.time()
    loop = asyncio.get_event_loop()
    tasks = [
        asyncio.ensure_future(download_data_by_container_id("A", initial_time)),
        asyncio.ensure_future(download_data_by_container_id("B", initial_time))
    ]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
Results in:
1487334722.321165: container A still running
1487334722.321412: container B still running
1487334723.325897: container A still running
1487334723.3259578: container B still running
1487334724.3285959: container A still running
1487334724.328662: container B still running
1487334725.3312798: container A still running
1487334725.331337: container B still running
1487334726.3340318: container A still running
1487334726.33409: container B still running
1487334727.336779: container A still running
1487334727.336842: container B still running
1487334728.339425: container A still running
1487334728.339506: container B still running
1487334729.34211: container A still running
1487334729.342168: container B still running
1487334730.3448708: container A still running
1487334730.34493: container B still running
1487334731.34754: container A exited
1487334731.347598: container B exited
1487334732.350253: download from A in progress
1487334732.3503108: download from B in progress
1487334733.354369: download from A in progress
1487334733.354424: download from B in progress
1487334734.354686: download from A in progress
1487334734.3548028: download from B in progress
1487334735.358371: download from A in progress
1487334735.358461: download from B in progress
1487334736.3610592: download from A in progress
1487334736.361115: download from B in progress
1487334737.363115: download from A in progress
1487334737.363211: download from B in progress
1487334738.3664992: download from A in progress
1487334738.36656: download from B in progress
1487334739.369131: download from A in progress
1487334739.36919: download from B in progress
1487334740.371079: download from A in progress
1487334740.37119: download from B in progress
1487334741.374521: download from A done
1487334741.3745651: download from B done
As for the sleep() function: no, you shouldn't use it. It blocks the whole Python interpreter for 1 second, which is not what you want.
Remember, you don't have parallelism (threads etc.), you have concurrency.
That is, you have a Python interpreter with just one thread of execution, where your main loop and all your coroutines run, preempting each other. You want your interpreter to spend 99.999% of its working time in that main loop, created by asyncio, polling sockets and waiting for timeouts.
All your coroutines should return as fast as possible and definitely shouldn't contain a blocking sleep: if you call it, it blocks the whole interpreter and prevents the main loop from getting information from sockets or running coroutines in response to data arriving on those sockets.
So instead you should await asyncio.sleep(), which is essentially equivalent to JavaScript's setTimeout(): it just tells the main loop that after a certain time it should wake this coroutine up and continue running it.
Suggested reading:
https://snarky.ca/how-the-heck-does-async-await-work-in-python-3-5/
https://docs.python.org/3/library/asyncio.html
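The difference is easy to demonstrate: with await asyncio.sleep, two coroutines sleep concurrently, so the total wall time is roughly one delay rather than the sum. A minimal sketch (using asyncio.run and asyncio.gather, available since Python 3.7):

```python
import asyncio
import time

async def worker(delay):
    # Awaiting yields control to the event loop instead of blocking the thread.
    await asyncio.sleep(delay)
    return delay

async def main():
    start = time.monotonic()
    # Both sleeps overlap: total wall time is ~0.2s, not 0.4s.
    results = await asyncio.gather(worker(0.2), worker(0.2))
    return time.monotonic() - start, results

elapsed, results = asyncio.run(main())
```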
It's most likely bad practice, as time.sleep() will block everything, while you only want to block the specific coroutine (I guess).
You are making a synchronous operation in an async world.
What about the following pattern?
async def download():
    downloader = DataManager()
    downloader.start_multiple_docker_containers()
    while True:
        downloader.check_containers_statuses()
        await asyncio.sleep(N)
I'm new at asyncio, but it seems that if you run sync code like this:
f = app.loop.run_in_executor(None, your_sync_function, app, param1, param2, ...)
then your_sync_function runs in a separate thread, and you can call time.sleep() without disturbing the asyncio loop. It blocks the executor's thread, but not the asyncio thread. At least, this is what it seems to do.
If you want to send messages from your_sync_function back to asyncio's loop, look into the janus library
More tips on this:
https://carlosmaniero.github.io/asyncio-handle-blocking-functions.html
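A minimal, self-contained sketch of that pattern (using asyncio.get_running_loop rather than the app.loop above, and a made-up blocking_poll function): the blocking call runs on the default thread-pool executor, so time.sleep only blocks that worker thread, not the event loop.

```python
import asyncio
import time

def blocking_poll():
    # A stand-in for any synchronous, blocking call.
    time.sleep(0.1)  # blocks only the executor thread, not the event loop
    return "done"

async def main():
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, blocking_poll)

result = asyncio.run(main())
```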
My initial files are in AWS S3. Could someone point me to how I need to set this up in a Luigi Task?
I reviewed the documentation and found luigi.s3, but it is not clear to me what to do with it. Then I searched the web and only got links about mortar-luigi and implementations on top of Luigi.
UPDATE
After following the example provided by @matagus (I created the ~/.boto file as suggested, too):
# coding: utf-8
import luigi
from luigi.s3 import S3Target, S3Client

class MyS3File(luigi.ExternalTask):
    def output(self):
        return S3Target('s3://my-bucket/19170205.txt')

class ProcessS3File(luigi.Task):
    def requieres(self):
        return MyS3File()

    def output(self):
        return luigi.LocalTarget('/tmp/resultado.txt')

    def run(self):
        result = None
        for input in self.input():
            print("Doing something ...")
            with input.open('r') as f:
                for line in f:
                    result = 'This is a line'
        if result:
            out_file = self.output().open('w')
            out_file.write(result)
When I execute it, nothing happens:
DEBUG: Checking if ProcessS3File() is complete
INFO: Informed scheduler that task ProcessS3File() has status PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 21171] Worker Worker(salt=226574718, workers=1, host=heliodromus, username=nanounanue, pid=21171) running ProcessS3File()
INFO: [pid 21171] Worker Worker(salt=226574718, workers=1, host=heliodromus, username=nanounanue, pid=21171) done ProcessS3File()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ProcessS3File() has status DONE
DEBUG: Asking scheduler for work...
INFO: Done
INFO: There are no more tasks to run at this time
INFO: Worker Worker(salt=226574718, workers=1, host=heliodromus, username=nanounanue, pid=21171) was stopped. Shutting down Keep-Alive thread
As you can see, the message Doing something... never prints. What is wrong?
The key here is to define an External Task that has no inputs and whose outputs are the files you already have living in S3. The Luigi docs mention this in Requiring another Task:
Note that requires() can not return a Target object. If you have a simple Target object that is created externally you can wrap it in a Task class
So, basically you end up with something like this:
import luigi
from luigi.s3 import S3Target
from somewhere import do_something_with

class MyS3File(luigi.ExternalTask):
    def output(self):
        return S3Target('s3://my-bucket/path/to/file')

class ProcessS3File(luigi.Task):
    def requires(self):
        return MyS3File()

    def output(self):
        return S3Target('s3://my-bucket/path/to/output-file')

    def run(self):
        result = None
        # this returns a file stream that reads the file from your AWS S3 bucket
        with self.input().open('r') as f:
            result = do_something_with(f)
        # and then you write the result out
        out_file = self.output().open('w')
        # it'd be better to serialize this result before writing it to a file,
        # but this is a pretty simple example
        out_file.write(result)
UPDATE:
Luigi uses boto to read files from and/or write them to AWS S3, so to make this code work you'll need to provide your credentials in your boto config file ~/.boto (look for other possible config file locations here):
[Credentials]
aws_access_key_id = <your_access_key_here>
aws_secret_access_key = <your_secret_key_here>
First time into the realm of Luigi (and Python!), and I have some questions. The relevant code is:
from Database import Database
import luigi

class bbSanityCheck(luigi.Task):
    conn = luigi.Parameter()
    date = luigi.Parameter()

    def __init__(self, *args, **kwargs):
        super(bbSanityCheck, self).__init__(*args, **kwargs)
        self.has_run = False

    def run(self):
        print "Entering run of bb sanity check"
        # DB STUFF HERE THAT DOESN'T MATTER
        print "Are we in la-la land?"

    def complete(self):
        print "BB Sanity check being asked for completeness: ", self.has_run
        return self.has_run

class Pipeline(luigi.Task):
    date = luigi.DateParameter()

    def requires(self):
        db = Database('cbs')
        self.conn = db.connect()
        print "I'm about to yield!"
        return bbSanityCheck(conn=self.conn, date=self.date)

    def run(self):
        print "Hello World"
        self.conn.query("""SELECT *
                           FROM log_blackbook""")
        result = conn.store_result()
        print result.fetch_row()

    def complete(self):
        return False

if __name__ == '__main__':
    luigi.run()
Output is here (with the relevant DB returns removed):
DEBUG: Checking if Pipeline(date=2013-03-03) is complete
I'm about to yield!
INFO: Scheduled Pipeline(date=2013-03-03)
I'm about to yield!
DEBUG: Checking if bbSanityCheck(conn=<_mysql.connection open to 'sas1.rad.wc.truecarcorp.com' at 223f050>, date=2013-03-03) is complete
BB Sanity check being asked for completeness: False
INFO: Scheduled bbSanityCheck(conn=<_mysql.connection open to 'sas1.rad.wc.truecarcorp.com' at 223f050>, date=2013-03-03)
INFO: Done scheduling tasks
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 2
INFO: [pid 5150] Running bbSanityCheck(conn=<_mysql.connection open to 'sas1.rad.wc.truecarcorp.com' at 223f050>, date=2013-03-03)
Entering run of bb sanity check
Are we in la-la land?
INFO: [pid 5150] Done bbSanityCheck(conn=<_mysql.connection open to 'sas1.rad.wc.truecarcorp.com' at 223f050>, date=2013-03-03)
DEBUG: Asking scheduler for work...
INFO: Done
INFO: There are no more tasks to run at this time
INFO: There are 1 pending tasks possibly being run by other workers
INFO: Worker was stopped. Shutting down Keep-Alive thread
So the questions:
1.) Why does "I'm about to yield!" get printed twice?
2.) Why is "Hello World" never printed?
3.) What is the "1 pending tasks possibly being run by other workers"?
I prefer super-ultra-clean output because it is way easier to maintain. I'm hoping I can get these warning equivalents ironed out.
I've also noted that requires can either yield or return item, item2, item3. I've read about yield and understand it. What I don't get is which convention is considered superior here, or whether there are subtle differences that I, being new to the language, am not catching.
I think you're misunderstanding how Luigi works in general.
(1) Hmm, not sure about that. It looks more like an issue with printing the same thing at both INFO and DEBUG levels to me.
(2) You're trying to run Pipeline, which depends on bbSanityCheck. bbSanityCheck.complete() never returns True because you never set has_run to True in bbSanityCheck. So the Pipeline task can never run and output "Hello World", because its dependency is never complete.
(3) That's probably because you have this pending task (it's actually Pipeline). But Luigi understands it is impossible for it to run and shuts down.
I would personally not use has_run as a way to check whether a task has run, but instead check for the existence of the result of this job. I.e., if this job does something to the database, then complete() should check that the expected contents are there.