I'm trying to use win32com.client to interact with scheduled tasks on Windows.
So far it's working fine, but I'm having trouble figuring out how to get the PID of a running process created by the scheduled task.
The documentation says to look at the value returned by the IRunningTask::get_EnginePID method. However, while I can do something like:
scheduler = win32com.client.Dispatch('Schedule.Service')
scheduler.Connect()
folder = scheduler.GetFolder('\\')
task = folder.GetTask('Name of created task')
task.State
task.Name
I'm not sure how to access this EnginePID attribute, since task.EnginePID (or anything like that) doesn't work.
So for example, I have a task that launches calc.exe. I want to find the PID of the calc.exe or whatever process spawned by the scheduled task.
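For reference, this is roughly what I expect to need; the GetInstances call and the EnginePID attribute are my guesses from the IRegisteredTask/IRunningTask documentation, so the exact names may be off:
import win32com.client

scheduler = win32com.client.Dispatch('Schedule.Service')
scheduler.Connect()
folder = scheduler.GetFolder('\\')
task = folder.GetTask('Name of created task')
# GetTask returns the registered task; the running instances are what
# should expose the engine PID described in the IRunningTask docs
for running in task.GetInstances(0):
    print(running.Name, running.EnginePID)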
How can I achieve this?
This is a very naive question, but I feel I don't understand something fundamental about asynchronous/background tasks in Django and Python.
I'm trying to replicate a simple example provided by django-background-tasks (https://github.com/collinmutembei/django-background-tasks-example) in order to make Django run a task in the background 60 seconds after it is triggered. I guess the same question applies to any other background task manager, such as Celery or Huey.
The example is pretty simple: as soon as the user accesses the URL, a simple function that logs a message is executed 60 seconds later, without blocking the main Django process:
from background_task import background
from logging import getLogger

logger = getLogger(__name__)

@background(schedule=60)
def demo_task(message):
    logger.debug('demo_task. message={0}'.format(message))
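For completeness, this is roughly how I trigger it from a view (the view name below is just a placeholder):
from django.http import HttpResponse

def demo_view(request):
    # calling the decorated function only queues the task in the database;
    # it is executed later by the process_tasks worker
    demo_task('hello from demo_view')
    return HttpResponse('task queued')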
The problem is that I really don't understand the basics. The task doesn't run unless I start a separate (or detached) process with python manage.py process_tasks. Do I always have to do this to make background tasks work, or is there a way to do it without starting a parallel process?
If I do have to start a parallel process, can I do it from inside the Django code? Something like:
import subprocess
process = subprocess.Popen(['python', 'manage.py','process_tasks'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
Running a separate process for background tasks is not strictly necessary, but it is good practice and helpful.
When you run a server, a process is created - run ps aux | grep runserver - which is responsible for serving web requests. When you say that you want to run certain tasks in the background, it implicitly means that you want a separate process to execute those tasks. This is where asynchronous task tools like Celery come in.
You can also spawn a separate process yourself - as you said - by doing:
import subprocess
process = subprocess.Popen(['python', 'manage.py', 'process_tasks'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
This method is also completely fine if you have just one or two small tasks that you want to run in parallel. However, when you have lots of complicated tasks running in the background, you will want to manage them properly, be able to debug them when something goes wrong, and have visibility into what is happening in each background task, its status, and so on. This is where Celery will help you: it gives you decorated task functions that handle all of those things for you, so you only have to worry about your business logic.
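For illustration, the same demo task written as a Celery task might look roughly like this (the module name and broker URL below are placeholders, not something from your project):
from celery import Celery

# placeholder broker URL; point this at whatever broker you actually run
app = Celery('demo', broker='redis://localhost:6379/0')

@app.task
def demo_task(message):
    print('demo_task. message={0}'.format(message))

# demo_task.delay('hello')  # queues the task; a separate worker process executes it
# the worker is started separately, e.g.: celery -A demo worker --loglevel=info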
I have to do some long-running work in my Flask app, and I want to do it asynchronously: just start the work, and then check its status from JavaScript.
I'm trying to do something like:
from multiprocessing import Process

@app.route('/sync')
def sync():
    p = Process(target=routine, args=('abc',))
    p.start()
    return "Working..."
But this creates defunct gunicorn workers.
How can it be solved? Should I use something like Celery?
There are many options. You can develop your own solution, use Celery or Twisted (I'm sure there are more already-made options out there but those are the most common ones).
Developing your own in-house solution isn't difficult. You can use the multiprocessing module of the Python standard library (see the sketch after the steps below):
When a task arrives, you insert a row in your database with the task id and status.
Then you launch a process to perform the work, which updates the row's status when it finishes.
You can have a view to check whether the task is finished, which actually just checks the status in the corresponding row.
Of course you have to think about where you want to store the result of the computation and what happens with errors.
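A very rough sketch of that approach, using sqlite3 as a stand-in for whatever database you actually use (the table, column, and function names here are made up for the example):
import sqlite3
import uuid
from multiprocessing import Process

DB = 'tasks.db'  # placeholder path; assumes a table: tasks(id TEXT, status TEXT)

def start_task(data):
    task_id = str(uuid.uuid4())
    with sqlite3.connect(DB) as conn:
        conn.execute("INSERT INTO tasks (id, status) VALUES (?, 'running')", (task_id,))
    Process(target=run_task, args=(task_id, data)).start()
    return task_id

def run_task(task_id, data):
    # ... do the long work here ...
    with sqlite3.connect(DB) as conn:
        conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))

def task_status(task_id):
    with sqlite3.connect(DB) as conn:
        row = conn.execute("SELECT status FROM tasks WHERE id = ?", (task_id,)).fetchone()
    return row[0] if row else 'unknown'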
Going with Celery is also easy. It would look like the following.
To define a function to be executed asynchronously:
@celery.task
def mytask(data):
    ...  # do a lot of work
Then instead of calling the task directly, like mytask(data), which would execute it straight away, use the delay method:
result = mytask.delay(mydata)
Finally, you can check if the result is available or not with ready:
result.ready()
However, remember that to use Celery you have to run an external worker process.
I haven't ever taken a look at Twisted, so I can't tell you whether it is more or less complex than this (but it should be fine for what you want to do too).
In any case, any of those solutions should work fine with Flask. For checking the result, it doesn't matter at all that you use JavaScript; just make the view that checks the status return JSON (you can use Flask's jsonify).
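A minimal sketch of such a status-checking view, assuming the hypothetical task_status helper from the in-house sketch above (with Celery you would fetch the result and call ready() instead):
from flask import Flask, jsonify

from mytasks import task_status  # hypothetical module holding the earlier sketch

app = Flask(__name__)

@app.route('/status/<task_id>')
def status(task_id):
    # with Celery you would look up the AsyncResult and call .ready() here
    return jsonify({'task_id': task_id, 'status': task_status(task_id)})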
I would use a message broker such as RabbitMQ or ActiveMQ. The Flask process would add jobs to the message queue, and a long-running worker process (or pool of worker processes) would take jobs off the queue to complete them. The worker process could update a database to allow the Flask server to know the current status of the job and pass this information on to the clients.
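As an illustration, publishing a job to RabbitMQ from the Flask process might look roughly like this with the pika client (the queue name and message format are made up for the example):
import json
import pika

def enqueue_job(job_id, payload):
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='jobs', durable=True)
    channel.basic_publish(
        exchange='',
        routing_key='jobs',
        body=json.dumps({'job_id': job_id, 'payload': payload}),
    )
    connection.close()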
Using Celery seems to be a nice way to do this.
I'm adding a job to an apscheduler scheduler from a script. Unfortunately, the job is not properly scheduled when done from the script, because I didn't start the scheduler.
scheduler = self.getscheduler() # initializes and returns scheduler
scheduler.add_job(func=function, trigger=trigger, jobstore='mongo')  # sample code; note that I did not call scheduler.start()
I'm seeing a message: apscheduler.scheduler - INFO - Adding job tentatively -- it will be properly scheduled when the scheduler starts
The script is only supposed to add jobs to the scheduler (not to run the scheduler at that particular instant), and there is some other information that has to be recorded whenever a job is added to the database. Is it possible to add a job and force the scheduler to write it to the jobstore without actually running the scheduler?
I know that it is possible to start and shut down the scheduler after adding each job, to make the scheduler save the job information into the jobstore. Is that really a good approach?
Edit: My original intention was to isolate the initialization process of my software. I just wanted to add some jobs to a scheduler that has not yet been started. The real issue is that I've given the user permission to start and stop the scheduler, so I cannot assume that there is a running scheduler instance on the system. I've temporarily fixed the problem by starting the scheduler and shutting it down after adding the jobs. It works.
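For reference, the temporary fix looks roughly like this (function and trigger are the same placeholders as in the snippet above, the jobstore configuration is omitted, and newer APScheduler versions also accept start(paused=True) to avoid firing anything while the jobs are written):
from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()  # jobstore configuration omitted; 'mongo' assumed set up elsewhere
scheduler.start()  # once running, add_job writes straight to the jobstore
scheduler.add_job(function, trigger=trigger, jobstore='mongo')
scheduler.shutdown()  # the job stays persisted in the jobstore for the real scheduler to pick up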
You would have to have some way to notify the scheduler that a job has been added, so that it could wake up and adjust the delay to its next wakeup. It's better to do this via some sort of RPC mechanism. What kind of mechanism is appropriate for your particular use case, I don't know. But RPyC and Execnet are good candidates. Use one of them or something else to remotely control the scheduler process to add said jobs, and you'll be fine.
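A rough sketch of that idea with RPyC (the service, port number, and the textual 'module:function' job reference are assumptions made for the example, not anything from the question):
import rpyc
from rpyc.utils.server import ThreadedServer
from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()  # jobstore configuration omitted

class SchedulerService(rpyc.Service):
    def exposed_add_job(self, func_ref, *args, **kwargs):
        # func_ref is a textual reference such as 'mymodule:myfunc', which
        # APScheduler accepts and which crosses the RPC boundary safely
        scheduler.add_job(func_ref, *args, **kwargs)

if __name__ == '__main__':
    scheduler.start()
    ThreadedServer(SchedulerService, port=12345).start()
A separate script can then ask the running scheduler process to add jobs:
conn = rpyc.connect('localhost', 12345)
conn.root.add_job('mymodule:myfunc', 'interval', minutes=5)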
I have to monitor a process continuously, and I use its process ID to do so. I wrote a program to send an email once the process had stopped, so that I could manually reschedule it, but I often forget to reschedule the process (basically another Python program). I then came across the apscheduler module and used its cron-style scheduling (http://packages.python.org/APScheduler/cronschedule.html) to spawn the process again once it has stopped. Now I am able to respawn the process once its PID has been killed, but when I spawn it using apscheduler I am not able to get the process ID (PID) of the newly scheduled process; hence, I am not able to monitor it. Is there a function in apscheduler to get the process ID of the scheduled process?
Instead of relying on APScheduler to return the pid, why not have your program report the pid itself? It's quite common for daemons to have pidfiles, which are files at a known location that just contain the pid of the running process. Just wrap your main function in something like this:
import os

try:
    with open("/tmp/myproc.pid", "w") as pidfile:
        pidfile.write(str(os.getpid()))
    main()
finally:
    os.remove("/tmp/myproc.pid")
Now whenever you want to monitor your process, you can first check whether the pid file exists, and if it does, retrieve the pid of the process for further monitoring. This has the benefit of being independent of a specific implementation of cron, and it will make it easier in the future if you want to write more programs that interact with the process locally.
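A small sketch of the monitoring side, using only the standard library (the pidfile path matches the example above, and os.kill with signal 0 is only a liveness check on Unix-like systems; it doesn't actually kill anything):
import os

PIDFILE = "/tmp/myproc.pid"

def get_monitored_pid():
    """Return the pid from the pidfile, or None if the process isn't running."""
    try:
        with open(PIDFILE) as pidfile:
            pid = int(pidfile.read().strip())
    except (FileNotFoundError, ValueError):
        return None
    try:
        os.kill(pid, 0)  # signal 0: check the process exists without affecting it
    except OSError:
        return None
    return pid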