Scheduled task - no next run time - python

Here is my problem: when I create a new scheduled task using win32com in Python, there is no next run time for the task; the Task Scheduler GUI shows 'Never'.
My workflow for creating tasks:
try to create a new task, and fall back to fetching the existing one for updating if that fails,
create daily triggers for the task,
save it all.
Any advice?

So here is the simple solution.
I checked the default parameters for the trigger and then saw that Flags is set to 4, which means DISABLED.
It seems that is the default setting for a new trigger on a task.
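For reference, here is a minimal, untested sketch of the fix using pywin32's Task Scheduler 1.0 wrapper; the task name is a placeholder, and the Activate call stands in for whatever create-or-update logic you already have:

import pythoncom
from win32com.taskscheduler import taskscheduler

ts = pythoncom.CoCreateInstance(
    taskscheduler.CLSID_CTaskScheduler, None,
    pythoncom.CLSCTX_INPROC_SERVER, taskscheduler.IID_ITaskScheduler)

task = ts.Activate('MyDailyTask.job')   # existing task; replace with your own lookup/creation
trigger = task.GetTrigger(0)            # first trigger of the task
tt = trigger.GetTrigger()               # the underlying TASK_TRIGGER structure

# A freshly created trigger comes back with Flags == 4
# (the disabled flag), which is why the GUI shows 'Never'.
tt.Flags = 0
trigger.SetTrigger(tt)

# Persist the change so the next run time is recalculated.
pf = task.QueryInterface(pythoncom.IID_IPersistFile)
pf.Save(None, 1)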

Related

Is it possible to kill the previous DAG run if it's still running when it's time for the latest run?

Our Airflow deployment is forced to interact with a company that has a very poor system. It's not unusual for our DAG to get stuck waiting for a report that never actually gets completed. This DAG runs daily, pulling the same information, so if it's time for the next run it would be nice to just kill the last run and move on with the new one. I haven't found anything saying Airflow has a DAG argument that can achieve this. Is there a quick, easy setting for this behavior, or would it need to be done logically in the sensor that checks whether the report is complete?
If your DAG is scheduled daily, how about setting dagrun_timeout to 24 hours? I believe this should, in effect, kill the previous DAG run around the time it kicks off a new one. See the related question about setting DAG timeouts.
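A minimal sketch of that suggestion (the dag_id, dates and schedule are placeholders, not from the original question):

from datetime import datetime, timedelta
from airflow import DAG

# With a daily schedule and a 24-hour dagrun_timeout, a run that is still
# going when the next one is due gets failed instead of blocking forever.
dag = DAG(
    dag_id="daily_report_pull",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    dagrun_timeout=timedelta(hours=24),
)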
Alternatively, you could use a PythonOperator, define your own operator, or extend the report sensor you describe to kill the previous DagRun programmatically. I believe that would look like the following (see the sketch after this list):
Get the current dag run from the Airflow context
Get the previous dag run with dag_run.get_previous_dagrun()
Set the state on the previous dag run with prev_dag_run.set_state
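A rough, untested sketch of those steps inside a PythonOperator (it assumes an existing dag object, the task id is a placeholder, and persisting the state change may need an explicit session commit depending on your Airflow version):

from airflow.operators.python import PythonOperator
from airflow.utils.state import State

def kill_previous_run(**context):
    dag_run = context["dag_run"]                  # current DagRun from the Airflow context
    prev_dag_run = dag_run.get_previous_dagrun()  # previous DagRun, if any
    if prev_dag_run is not None and prev_dag_run.state == State.RUNNING:
        prev_dag_run.set_state(State.FAILED)      # mark the stuck run as failed

kill_previous = PythonOperator(
    task_id="kill_previous_run",
    python_callable=kill_previous_run,
    dag=dag,
)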
My recommendation would be to set the timeout, given these two options. I agree that there is no specific kill_previous_run DAG argument.

Triggering the external dag using another dag in Airflow

I have a master DAG with a list of tasks that trigger different DAGs. I'm using the TriggerDagRunOperator to accomplish this, but I'm facing a few issues.
TriggerDagRunOperator doesn't wait for completion of the external DAG; it moves straight on to the next task. I want it to wait until completion, and the next task should be triggered based on the resulting status. I came across ExternalTaskSensor, but it makes the process complicated. Is there any other solution to fix this?
If I trigger the master DAG again, I want the tasks to restart from where they failed. Right now they don't restart, although they do for a time-based schedule.
.. I want that to wait until completion .. Came across ExternalTaskSensor. It is making the process complicated ..
I'm unaware of any other way to achieve this. I myself did this the same way.
If I trigger the master dag again, I want the task to restart from where it is failed...
This requirement of yours goes against the principle of idempotency that Airflow demands. I'd suggest you try to rework your jobs to incorporate idempotency (for instance, retries already require idempotency). Meanwhile, you can take inspiration from some people and try to achieve something similar (but it will be pretty complicated).
With Airflow 2.0.1, the triggering DAG can be made to wait for completion of the target DAG with the parameter wait_for_completion.
ref: here
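A sketch of that parameter in use (the Airflow 2 import path is assumed; the DAG ids and the master_dag object are placeholders):

from airflow.operators.trigger_dagrun import TriggerDagRunOperator

trigger_target = TriggerDagRunOperator(
    task_id="trigger_target_dag",
    trigger_dag_id="target_dag",   # the external DAG to trigger
    wait_for_completion=True,      # block until the triggered run finishes
    poke_interval=60,              # seconds between status checks
    dag=master_dag,
)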

How to run a function in the future using Django based on a condition?

I would like to run a particular function (let's say to delete a post) at a specific time in the future (e.g.: at 10am) only once based on a condition.
I am using Django, and I was thinking about using cron or python-crontab, but it seems these task schedulers are meant for tasks that have to be executed repeatedly. While trying to use python-crontab with Django, I also did not find any resources that would let me express "delete this post at 10am tomorrow, but only if a user performs a particular action", for example.
Does anyone know if I can still use python-crontab, or should another technology be used?
I would use:
https://github.com/celery/django-celery-beat
http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html
Celery runs the background tasks, and Celery Beat is a scheduler that kicks off the background tasks at the specified times.
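For the one-off "delete this post at 10am tomorrow" case, a plain Celery task scheduled with an eta is already enough. A minimal sketch, assuming Celery is already wired into the Django project (the model, function and app names are made up):

from datetime import datetime, timedelta
from celery import shared_task

@shared_task
def delete_post(post_id):
    from myapp.models import Post            # hypothetical model
    Post.objects.filter(pk=post_id).delete()

# Called from a view when the user performs the action that should
# trigger the future deletion:
def schedule_deletion(post_id):
    run_at = datetime.utcnow() + timedelta(hours=12)  # e.g. 10am tomorrow
    delete_post.apply_async(args=[post_id], eta=run_at)

Celery Beat is what you would add on top if you ever need recurring schedules rather than a single delayed call.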

Airflow: how to specify a condition for an hourly workflow, such as triggering only if the same instance is not currently running?

I have created a workflow (containing a few tasks) that executes hourly. The workflow should be triggered only if another instance of the workflow is not running at the same time; if one is running, execution should be skipped for that hour.
I tried "depends_on_past" but couldn't get it to work.
Set max_active_runs on your DAG to 1, and also set catchup to False.
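A sketch of that DAG definition (the dag_id, dates and schedule here are placeholders):

from datetime import datetime
from airflow import DAG

dag = DAG(
    dag_id="hourly_workflow",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    max_active_runs=1,   # never more than one run in flight
    catchup=False,       # do not backfill the intervals that were skipped
)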
From the official Airflow documentation for trigger rules:
The depends_on_past (boolean), when set to True, keeps a task from getting triggered if the previous schedule for the task hasn’t succeeded.
It will work if you use it in the definition of the task. You can pair it with wait_for_downstream=True as well, to guarantee that the new run's instance will not begin until the previous run's instance of the task has completed execution.
task_depends = DummyOperator(task_id="task_depend", dag=dag, depends_on_past=True)
However, another way to work around this, assuming that you only need the latest run to do the work, is the Latest Run Only concept:
Standard workflow behavior involves running a series of tasks for a particular date/time range. Some workflows, however, perform tasks that are independent of run time but need to be run on a schedule, much like a standard cron job. In these cases, backfills or running jobs missed during a pause just wastes CPU cycles.
For situations like this, you can use the LatestOnlyOperator to skip tasks that are not being run during the most recent scheduled run for a DAG. The LatestOnlyOperator skips all immediate downstream tasks, and itself, if the time right now is not between its execution_time and the next scheduled execution_time.
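A minimal sketch of wiring that in (Airflow 2 import path assumed; it reuses the dag and task_depends names from the example above):

from airflow.operators.latest_only import LatestOnlyOperator

latest_only = LatestOnlyOperator(task_id="latest_only", dag=dag)
latest_only >> task_depends   # downstream tasks are skipped for any run that is not the latest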

How to force apscheduler to add jobs to the job store?

I'm adding a job to a scheduler using APScheduler from a script. Unfortunately, the job is not properly scheduled when using the script, because I didn't start the scheduler.
scheduler = self.getscheduler()  # initializes and returns the scheduler
scheduler.add_job(func=function, trigger=trigger, jobstore='mongo')  # sample code; note that I did not call scheduler.start()
I'm seeing a message: apscheduler.scheduler - INFO - Adding job tentatively -- it will be properly scheduled when the scheduler starts
The script is supposed to add jobs to the scheduler (not to run the scheduler at that particular instant), and some other information has to be recorded whenever a job is added to the database. Is it possible to add a job and force the scheduler to add it to the jobstore without actually running the scheduler?
I know that it is possible to start and shut down the scheduler after the addition of each job to make it save the job information into the jobstore. Is that really a good approach?
Edit: My original intention was to isolate the initialization process of my software. I just wanted to add some jobs to a scheduler that is not yet started. The real issue is that I've given the user permission to start and stop the scheduler, so I cannot be sure there is a running scheduler instance in the system. I've temporarily fixed the problem by starting the scheduler and shutting it down after adding the jobs. It works.
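That temporary fix might look roughly like this (a sketch only; function and trigger are the same placeholders as in the snippet above, and starting the scheduler paused avoids accidentally running anything while the job is written to the jobstore):

from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()   # configure the 'mongo' jobstore here as you already do
scheduler.start(paused=True)        # running scheduler persists add_job immediately, but executes nothing
scheduler.add_job(func=function, trigger=trigger, jobstore='mongo')
scheduler.shutdown()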
You would have to have some way to notify the scheduler that a job has been added, so that it could wake up and adjust the delay to its next wakeup. It's better to do this via some sort of RPC mechanism. What kind of mechanism is appropriate for your particular use case, I don't know. But RPyC and Execnet are good candidates. Use one of them or something else to remotely control the scheduler process to add said jobs, and you'll be fine.
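For example, a sketch along the lines of the RPyC-based example that ships with APScheduler (the port, service, module and function names are all made up; the client passes a textual 'module:function' reference so the job can be resolved on the scheduler's side):

# scheduler_server.py - owns the running scheduler and exposes add_job over RPyC
import rpyc
from rpyc.utils.server import ThreadedServer
from apscheduler.schedulers.background import BackgroundScheduler

def print_text(text):
    print(text)

class SchedulerService(rpyc.Service):
    def exposed_add_job(self, func, *args, **kwargs):
        return scheduler.add_job(func, *args, **kwargs)

if __name__ == '__main__':
    scheduler = BackgroundScheduler()   # configure the 'mongo' jobstore here
    scheduler.start()
    server = ThreadedServer(SchedulerService, port=12345,
                            protocol_config={'allow_public_attrs': True})
    try:
        server.start()
    finally:
        scheduler.shutdown()

# client.py - any other script can now add jobs without owning the scheduler
import rpyc

conn = rpyc.connect('localhost', 12345)
conn.root.add_job('scheduler_server:print_text', 'interval',
                  args=['Hello from the client'], seconds=60)
conn.close()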
