Usage of a variable in an Airflow DAG - Python

I set a variable with the "airflow variables" command in the CLI and want to use it in a DAG.
I executed the following commands in the terminal:
airflow variables -s sh_path = "/tmp/echo_test.sh"
airflow scheduler
but the following error keeps occurring:
Broken DAG: [/root/airflow/dags/param_test.py] invalid syntax (param_test.py, line 13)
Here is the code:
from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash_operator import BashOperator
tmpl_search_path = Variable.get("sh_path")
dag = DAG('param_test', schedule_interval = '*/5 * * * *'
          start_date = datetime(2018,9,4), catchup = False)
bash_task = BashOperator(
      task_id = "bash_task"
      bash_command = 'sh '+ {{var.value.tmpl_search_path}},
      dag = dag)
bash_task.set_downstream(python_task)
bash_task1 = BashOperator(
      task_id = 'echo',
      bash_command = 'echo 1',
      dag = dag)
bash_task.set_downstream(bash_task1)

You need to put the Jinja template inside the quoted bash_command string. Use it as below:
bash_task = BashOperator(
    task_id = "bash_task",
    bash_command = "sh {{var.value.tmpl_search_path}}",
    dag = dag)
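Note also that the template references an Airflow Variable named tmpl_search_path, while the Variable set above is sh_path (tmpl_search_path is only a Python variable in the DAG file). A minimal sketch, not part of the original answer, of two ways to reference the sh_path Variable:

# Option 1: resolve the Variable at parse time and embed the resulting string.
sh_path = Variable.get("sh_path")
bash_task = BashOperator(
    task_id = "bash_task",
    bash_command = "sh " + sh_path,
    dag = dag)

# Option 2: let Jinja resolve the Variable at run time, inside the quoted string.
bash_task = BashOperator(
    task_id = "bash_task",
    bash_command = "sh {{ var.value.sh_path }}",
    dag = dag)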

Related

How to handle Inter State DAG execution

Both DAG A and DAG B run at the same time, for example 10 AM. DAG A completes in 5 minutes, and DAG B should wait for DAG A's execution state: if the state is successful, DAG B moves to the next step, otherwise it throws an error. DAG B should always take DAG A's execution state for the same day and time. For example, suppose DAG A ran successfully yesterday but has not started today due to some issue, while DAG B has started; DAG B should not consider the previous run's state, only DAG A's current execution state.
If the execution state is something other than failed or success, how should the code handle it?
I am new to Airflow and don't know how to handle this.
Code
from airflow import settings
from airflow.models import DagRun as DR

def status(**context):
    try:
        TI = context["task_instance"]
        execution_date = context["execution_date"]
        run_state_intra = []
        run_id_intra = []
        for data_tuple in (
            settings.Session()
            .query(DR.dag_id, DR.execution_date, DR.state, DR.run_id)
            .order_by(DR.execution_date.desc())
            .limit(1)
        ):
In DagB you can create a BranchPythonOperator that finds the last run of DagA and decides whether to continue with the next task or raise an exception.
In the example below, I check whether DagA ran on the same day (midnight to now) with state=success. If there are results, I return the name of the next task to run; otherwise I raise an exception and DagB fails.
from datetime import datetime

from airflow import DAG
from airflow.models import DagRun, TaskInstance
from airflow.operators.python import BranchPythonOperator, PythonOperator
from airflow.exceptions import AirflowException
from airflow.utils import timezone
from airflow.utils.trigger_rule import TriggerRule

with DAG(
    dag_id="DagB",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    render_template_as_native_obj=True,
    tags=["DagB"],
) as dag:

    def check_success_dag_a(**context):
        ti: TaskInstance = context['ti']
        dag_run: DagRun = context['dag_run']
        date: datetime = ti.execution_date
        ts = timezone.make_aware(datetime(date.year, date.month, date.day, 0, 0, 0))
        dag_a = dag_run.find(
            dag_id='DagA',
            state='success',
            execution_start_date=ts,
            execution_end_date=ti.execution_date)
        if dag_a:
            return "taskB"
        raise AirflowException("DagA failed")

    check_success = BranchPythonOperator(
        task_id="check_success_dag_a",
        python_callable=check_success_dag_a,
    )

    def run():
        print('DagB')

    run_task = PythonOperator(
        task_id="taskB",
        dag=dag,
        python_callable=run,
        trigger_rule=TriggerRule.ONE_SUCCESS
    )

    (check_success >> [run_task])

Airflow - Bash command failed

I am trying to execute the following Airflow DAG file, but I am getting the following error.
import json
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago
from datetime import datetime, timedelta, time

args = {
    'owner': 'test',
    'start_date': days_ago(2),
    'depends_on_past': False,
}
dag = DAG(
    dag_id='test',
    default_args=args,
    schedule_interval='0 5 * * *',
    tags=['test']
)
start_test = DummyOperator(
    task_id='start_test',
    dag=dag,
)
end_test = DummyOperator(
    task_id='end_test',
    dag=dag,
)
load_complete = DummyOperator(
    task_id='load_complete',
    dag=dag,
)
execution_date = datetime.now()

def check_monthstart_trigger(execution_date, **kwargs):
    return execution_date.day() == 1

for i in json.loads(open('/home/test_123/list_of_files.json', 'r').read())['tables'].keys():
    extract_phase = BashOperator(
        task_id = 'extract_' + str(i),
        bash_command = 'python3 /home/python/extract_code.py -t {}'.format(i),
        dag = dag,
    )
    create_phase = BashOperator(
        task_id = 'modify_' + str(i),
        bash_command = 'python3 /home/python/table_create.py -t {}'.format(i),
        dag = dag,
    )
    load_phase = BashOperator(
        task_id = 'load_' + str(i),
        bash_command = 'python3 /home/python/load_test.py -t {}'.format(i),
        dag = dag,
    )
    start_test >> extract_phase >> create_phase >> load_phase >> load_complete >> end_test
This is the error I am getting.
I have also tried this solution (Error calling BashOperator: Bash command failed) but it didn't work. Any idea how to resolve this?

How to create DAGs inside another DAG in Apache Airflow

I am trying to have a master DAG which will create further DAGs based on my needs.
I have the following Python file inside the dags_folder configured in airflow.cfg.
This code creates the master DAG in the database. The master DAG should read a text file and create a DAG for each line in the file. But the DAGs created inside the master DAG are not added to the database. What is the correct way to do this?
Version details:
Python version: 3.7
Apache-airflow version: 1.10.8
import datetime as dt
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

root_dir = "/home/user/TestSpace/airflow_check/res"
print("\n\n ===> \n Dag generator")

default_args = {
    'owner': 'airflow',
    'start_date': dt.datetime(2020, 3, 22, 00, 00, 00),
    'concurrency': 1,
    'retries': 0
}

def greet(_name):
    message = "Greetings {} at UTC: {} Local: {}\n".format(_name, dt.datetime.utcnow(), dt.datetime.now())
    f = open("{}/greetings.txt".format(root_dir), "a+")
    print("\n\n =====> {}\n\n".format(message))
    f.write(message)
    f.close()

def create_dag(dag_name):
    with DAG(dag_name, default_args=default_args,
             schedule_interval='*/2 * * * *',
             catchup=False
             ) as i_dag:
        i_opr_greet = PythonOperator(task_id='greet', python_callable=greet,
                                     op_args=["{}_{}".format("greet", dag_name)])
        i_echo_op = BashOperator(task_id='echo', bash_command='echo `date`')
        i_opr_greet >> i_echo_op
    return i_dag

def create_all_dags():
    all_lines = []
    f = open("{}/../dag_names.txt".format(root_dir), "r")
    for x in f:
        all_lines.append(str(x))
    f.close()
    for line in all_lines:
        print("Dag creation for {}".format(line))
        globals()[line] = create_dag(line)

with DAG('master_dag', default_args=default_args,
         schedule_interval='*/1 * * * *',
         catchup=False
         ) as dag:
    echo_op = BashOperator(task_id='echo', bash_command='echo `date`')
    create_op = PythonOperator(task_id='create_dag', python_callable=create_all_dags)
    echo_op >> create_op
You have 2 options:
Use SubDagOperator: Example DAG. Use it if your Schedule Interval can be the same.
Write a Python DAG file: from your master DAG, create Python files containing DAGs in your AIRFLOW_HOME dags folder. You can use the Jinja2 templating engine for this; a sketch follows below.
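The reason the question's approach does not register the generated DAGs is that the scheduler only picks up DAG objects defined at the top level of files inside the dags folder; DAG objects created inside a running task are discarded when the task finishes. A minimal sketch of option 2 (not a definitive implementation; the dags folder path and names file below are assumptions to adjust): the master DAG's create_dag task writes one small DAG file per line of dag_names.txt, and the scheduler registers those files on its next parse.

import os

DAGS_FOLDER = "/home/user/airflow/dags"  # assumption: your configured dags_folder
NAMES_FILE = "/home/user/TestSpace/airflow_check/dag_names.txt"  # assumption

# Template for each generated file; each generated file is a normal top-level DAG.
DAG_FILE_TEMPLATE = """
import datetime as dt
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG("{dag_name}",
         start_date=dt.datetime(2020, 3, 22),
         schedule_interval="*/2 * * * *",
         catchup=False) as dag:
    echo = BashOperator(task_id="echo", bash_command="echo `date`")
"""

def create_all_dag_files():
    # Read one DAG name per line, skipping blanks.
    with open(NAMES_FILE) as f:
        names = [line.strip() for line in f if line.strip()]
    # Write one DAG file per name into the dags folder.
    for name in names:
        path = os.path.join(DAGS_FOLDER, "{}.py".format(name))
        with open(path, "w") as out:
            out.write(DAG_FILE_TEMPLATE.format(dag_name=name))

This callable would take the place of create_all_dags in the master DAG's PythonOperator.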
Have a look at the TriggerDagRunOperator:
https://airflow.apache.org/docs/stable/_api/airflow/operators/dagrun_operator/index.html
Example usage:
https://github.com/apache/airflow/blob/master/airflow/example_dags/example_trigger_controller_dag.py
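For completeness, a minimal sketch of TriggerDagRunOperator on Airflow 1.10.x, assuming a dag object as in the question; the dag_id "target_dag" is hypothetical. Note that it triggers a DAG that already exists in the dags folder rather than creating new ones:

from airflow.operators.dagrun_operator import TriggerDagRunOperator

trigger = TriggerDagRunOperator(
    task_id='trigger_target_dag',
    trigger_dag_id='target_dag',  # the DAG to trigger must already be registered
    dag=dag)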

Apache Airflow - Prescript rerun at each task of the dag and date change

I am new to using Airflow.
I noticed that if you define a global variable (a timestamp) in the code, its value changes for each task. For example, in the very basic example below, I define a variable now, but each time I print it in a task its value has changed.
from datetime import timedelta
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.utils.dates import days_ago
import time

now = int(time.time() * 1000)
RANGE = range(1, 10)

def init_step():
    print("Run on RANGE {}".format(RANGE))
    print("Date of the Scans {}".format(now))
    return RANGE

def trigger_step(index):
    time.sleep(10)
    print("index {} - date {}".format(index, now))
    return index

default_args = {
    'owner': 'airflow',
    'start_date': days_ago(1),
    'retries': 2,
    'retry_delay': timedelta(minutes=15)
}

with DAG('test',
         default_args=default_args,
         schedule_interval='0 16 */7 * *',
         ) as dag:
    init = PythonOperator(task_id='init',
                          python_callable=init_step,
                          dag=dag)
    for index in init_step():
        run = PythonOperator(task_id='trigger-port-' + str(index),
                             op_kwargs={'index': index},
                             python_callable=trigger_step, dag=dag)
        dag >> init >> run
Is this normal behavior? Is there a way to change it?

Airflow Python operator passing parameters

I'm trying to write a PythonOperator in an Airflow DAG and pass certain parameters to the Python callable.
My code looks like the below.
def my_sleeping_function(threshold):
    print(threshold)

fmfdependency = PythonOperator(
    task_id='poke_check',
    python_callable=my_sleeping_function,
    provide_context=True,
    op_kwargs={'threshold': 100},
    dag=dag)

end = BatchEndOperator(
    queue=QUEUE,
    dag=dag)

start.set_downstream(fmfdependency)
fmfdependency.set_downstream(end)
But I keep getting the below error.
TypeError: my_sleeping_function() got an unexpected keyword argument 'dag_run'
I am not able to figure out why.
Add **kwargs to your callable's parameter list, after your threshold param. For example:
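A minimal sketch of that fix (not from the original answer): with provide_context=True, Airflow passes extra context keyword arguments such as dag_run to the callable, and **kwargs absorbs them.

def my_sleeping_function(threshold, **kwargs):
    # kwargs receives dag_run, execution_date, and the other context values.
    print(threshold)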
This is how you can pass arguments to a Python operator in Airflow.
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from time import sleep
from datetime import datetime

def my_func(*op_args):
    print(op_args)
    return op_args[0]

with DAG('python_dag', description='Python DAG', schedule_interval='*/5 * * * *', start_date=datetime(2018, 11, 1), catchup=False) as dag:
    dummy_task = DummyOperator(task_id='dummy_task', retries=3)
    python_task = PythonOperator(task_id='python_task', python_callable=my_func, op_args=['one', 'two', 'three'])
    dummy_task >> python_task
