I'm trying to use airflow to define a specific workflow that I want to manually trigger from the command line.
I create the DAG and add a bunch of tasks.
from datetime import datetime
import airflow

args = {"owner": "airflow"}  # example default_args (the original values are not shown)
dag = airflow.DAG(
    "DAG_NAME",
    start_date=datetime(2015, 1, 1),
    schedule_interval=None,
    default_args=args)
I then run in the terminal
airflow trigger_dag DAG_NAME
and nothing happens. The scheduler is running in another thread. Any direction is much appreciated. Thank you!
I just encountered the same issue.
Assuming you are able to see your DAG in airflow list_dags or via the web server, then:
Not only did I have to turn on the dag in the web UI, but I also had to ensure that airflow scheduler was running as a separate process.
Once I had the scheduler running I was able to successfully execute my dag using airflow trigger_dag <dag_id>
My dag configuration is not significantly different from yours. I also have schedule_interval=None
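A minimal sketch of that setup, assuming the Airflow 1.x CLI used in the question:
# terminal 1: keep the scheduler running as its own process
airflow scheduler
# terminal 2: once the scheduler is up, trigger the DAG
airflow trigger_dag DAG_NAME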
You may have disabled the workflow.
To enable the workflow manually, open up the Airflow webserver with
$ airflow webserver -p 8080
Go to http://localhost:8080. You should see the list of all available DAGs, each with an on/off toggle. By default everything is set to off. Find your DAG, toggle it on, and then try triggering the workflow from the terminal again. It should work now.
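If you'd rather stay in the terminal, the same on/off switch can be flipped with the Airflow 1.x CLI (a hedged equivalent of the UI toggle):
airflow unpause DAG_NAME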
First, make sure the database connection string in your Airflow configuration is working, whether it points at Postgres, SQLite (the default), or any other database. Then run the command
airflow initdb
This command should not show any connection errors.
Second, make sure your webserver is running as a separate process:
airflow webserver
Then run your scheduler in another process:
airflow scheduler
Finally, once the scheduler is running, trigger your DAG from another terminal:
airflow trigger_dag dag_id
Also make sure the DAG name and its tasks show up in the DAG and task lists:
airflow list_dags
airflow list_tasks dag_id
And if the DAG is switched off in the UI, toggle it on.
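If the DAG does not show up in those lists, loading the DagBag directly in Python usually shows why (a small sketch; DAG_NAME is the placeholder id from the question):

from airflow.models import DagBag

bag = DagBag()                     # parses the files in the configured dags_folder
print(bag.import_errors)           # exceptions raised while importing DAG files, if any
print(bag.get_dag("DAG_NAME"))     # None if the DAG was not found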
You should 'unpause' the DAG you want to trigger: use airflow unpause xxx_dag and then airflow trigger_dag xxx_dag, and it should work.
airflow trigger_dag -e <execution_date> <dag_id>
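This passes an explicit execution date; for example (the date here is only an illustration):
airflow trigger_dag -e 2015-06-01 DAG_NAME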
Related
I run the command airflow webserver in one terminal and it works well.
But when I run airflow scheduler in another terminal, it stops the webserver and can't run the scheduler either. I tried changing the webserver port to 8070 but it still gets stuck.
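Not from the original thread, just a hedged suggestion: both services accept a -D/--daemon flag, so they can run in the background instead of competing for the foreground of a terminal:
airflow webserver -p 8080 -D
airflow scheduler -D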
I'm getting the following Airflow issue:
When I run DAGs that have multiple tasks in them, Airflow randomly sets some of the tasks to the failed state and also doesn't show any logs in the UI. I went to my running worker container and saw that the log files for those failed tasks were never created.
Going to Celery Flower, I found these logs on failed tasks:
airflow.exceptions.AirflowException: Celery command failed on host
How to solve this?
My environment is:
airflow:2.3.1
Docker compose
Celery Executor
Worker, webserver, scheduler and triggerer in different containers
Docker compose hosted on Ubuntu
I also saw this https://stackoverflow.com/a/69201032/11949273 answer that might be related.
Anyone with these same issues?
Edit:
On my EC2 instance I got more vCPUs and fine-tuned the Airflow/Celery worker parameters, and that solved this. It was probably an issue with a lack of CPU, or something else.
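For anyone tuning the same thing: one commonly adjusted parameter here is the Celery worker concurrency, so that each worker claims no more task slots than it has CPU for. Which parameter was actually tuned above is my assumption; in a docker-compose setup it can be set through an environment variable on the worker service:
environment:
  AIRFLOW__CELERY__WORKER_CONCURRENCY: 4   # default is 16; lower it to roughly match available vCPUs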
I faced a similar issue. In my case, Inspect -> Console showed an error with replaceAll in an old browser (Chrome 83.x); Chrome 98.x does not have this issue.
I'm seeing strange behaviour from Airflow with the Kubernetes executor. In my setup, tasks run in dynamically created Kubernetes pods, and I have a number of tasks that run once or twice a day. The tasks themselves are Python operators that run some ETL routine, and the DAG files are synced via a separate pod with a git repo inside. Everything worked fine for a while, but not long ago I began to see this error in the scheduler pod:
kubernetes.client.exceptions.ApiException: (410)
Reason: Gone: too old resource version: 51445975 (51489631)
After that error appears, old task pods are no longer deleted, and after some time new pods can't be created and tasks won't run (or, more precisely, they freeze in the "scheduled" state). In this situation, only deleting the scheduler pod with
kubectl delete -n SERVICE_NAME pod scheduler
and waiting for Kubernetes to recreate it helps, but after some time the error appears again and the situation repeats. Another strange thing is that the error only seems to appear after scheduled task runs. If I trigger any task any number of times via the UI, no error appears and pods are created and deleted normally.
Airflow version is 1.10.12. Any help will be appreciated, thanks!
This is caused by version 12.0 of the Kubernetes Python client.
Restrict the version to <12:
pip install -U 'kubernetes<12'
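If the scheduler image is built from a custom Dockerfile, the same pin can be baked into the image so the downgrade survives pod restarts (a hedged sketch; the file layout is an assumption):
# in the image's requirements file (hypothetical)
kubernetes<12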
I am trying to find a way to cancel an Airflow DAG run while it is being executed (whichever task it is on at that moment). I wonder if I can set the status to "failed" while the DAG is running?
Yes, you can just click on the task that is running and mark it as failed. This will fail all downstream tasks, and eventually the DAG run itself.
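If you need to do this without clicking in the UI, newer Airflow 2.x releases also let you set a DAG run's state through the stable REST API. Whether it is available depends on your version and API auth configuration, so treat this as a hedged sketch (host, credentials, and run id are placeholders):
curl -X PATCH "http://localhost:8080/api/v1/dags/DAG_NAME/dagRuns/RUN_ID" \
  -H "Content-Type: application/json" \
  --user "admin:admin" \
  -d '{"state": "failed"}'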
I have an Airflow cluster up, configured to use the CeleryExecutor and a Postgres backend.
For some reason, the statuses of the DAGs on the Webserver UI are inconsistent every time I refresh. Upon each refresh, it shows many different things such as the DAG not available in the webserver dagbag object, or black statuses, or hiding the links on the right.
It changes on each refresh.
Here are a few screenshots: Webserver UI 1, Webserver UI 2.
Run the Airflow webserver in debug mode and this should get resolved:
airflow webserver -p <port> -d
The problem seems to be that some dynamic code changes happen when new DAGs are created, and the production-mode Flask server does not pick them up.
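A related knob, offered only as a hedged guess rather than as part of the answer above: the production webserver runs several gunicorn workers, each holding its own copy of the DagBag and refreshing it on a rolling schedule, which is a common reason the view differs between refreshes. The refresh behaviour is controlled in airflow.cfg:
[webserver]
worker_refresh_batch_size = 1
worker_refresh_interval = 30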