Problems in Python sched - python

I have created a number of schedulers using Python on Windows which are running in the background.
Can anyone tell me a command to check how many schedulers are running on Windows, and also how I can remove them?

If you are using sched.scheduler, you can query sched.scheduler.queue.
scheduler.queue
Read-only attribute returning a list of upcoming events in the order they will be run. Each event is shown as a named tuple with the following fields: time, priority, action, argument.
The same docs also offer this little piece of advice:
In multi-threaded environments, the scheduler class has limitations with respect to thread-safety, inability to insert a new task before the one currently pending in a running scheduler, and holding up the main thread until the event queue is empty. Instead, the preferred approach is to use the threading.Timer class instead.
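For illustration, here is a minimal sketch of scheduling two events and inspecting the queue (the event payloads and delays are made up):

    import sched
    import time

    s = sched.scheduler(time.time, time.sleep)

    def say(msg):
        print(msg)

    # Two hypothetical events, due 5 and 10 seconds from now.
    s.enter(5, 1, say, argument=("first",))
    s.enter(10, 1, say, argument=("second",))

    # s.queue lists upcoming events, soonest first.
    for event in s.queue:
        print(event.time, event.priority, event.action, event.argument)

    s.run()  # blocks until the queue is empty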

If all your schedulers are part of a single Python process, you won't be able to count the individual timers from outside the process. Since the schedulers are code you wrote, you can choose to keep a file which is updated periodically.
If each scheduler is a separate Python process, then count the Python processes in the Windows Task Manager.
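If they are separate processes, the command line works too. For example (assuming the interpreter executable is named python.exe; 1234 is a placeholder PID):

    tasklist /FI "IMAGENAME eq python.exe"
    taskkill /PID 1234 /F

The first command lists the matching processes so you can count them; the second forcibly removes one.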

Related

What's the problem with sharing job stores in APScheduler?

I didn't quite get the problem that arises from sharing a job store across multiple schedulers in APScheduler.
The official documentation mentions
Job stores must never be shared between schedulers
but doesn't discuss the problems related to that. Can someone please explain them?
Also, if I deploy a Django application containing APScheduler in production, will multiple job stores be created for each worker process?
There are multiple reasons for this. In APScheduler 3.x, schedulers do not have any means to signal each other about changes happening in the job stores. When the scheduler starts, it queries the job store for jobs due for execution, processes them and then asks how long it should sleep until the next due job. If another scheduler adds a job that would be executed before that wake-up time, the other scheduler would happily sleep past that time because there is no mechanism with which it could receive a notification about the new (or updated) job.
Additionally, schedulers do not have the ability to enforce the maximum number of running instances of a job since they don't communicate with other schedulers. This can lead to conflicts when the same job is run on more than one scheduler process at the same time.
These shortcomings are addressed in the upcoming 4.x series and the ability to share job stores could be considered one of its most significant new features.
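To make the anti-pattern concrete, this is roughly what sharing a job store looks like in 3.x (the database URL is a placeholder); if two processes each construct a scheduler this way, neither is notified of jobs the other adds:

    from apscheduler.schedulers.background import BackgroundScheduler
    from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore

    # Anti-pattern: every worker process builds its own scheduler
    # on top of the same persistent job store.
    scheduler = BackgroundScheduler(
        jobstores={"default": SQLAlchemyJobStore(url="sqlite:///jobs.sqlite")}
    )
    scheduler.start()

In a Django deployment with several worker processes, each process that runs this code starts its own scheduler against the shared store, which is exactly the situation the documentation warns against.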

Interruptible multiprocessing pool workers (python)

I have a Python GUI application which can kick off any number of computation-intensive, long-running tasks that naturally belong in multiprocessing.Pool workers.
However, I'd like to be able to cancel these tasks, because later GUI input (such as changing a configuration variable) might render these tasks irrelevant.
Is there a popular pattern in Python for keeping track of which workers are working on what task, and interrupting them as needed?
The solutions I can think of are:
When a worker starts on a task it "announces" through some shared state that it is working on that particular task; if we need to cancel that task we look up which process is working on it and .terminate() it. There are many complexities here though. (A sketch of this idea follows the list.)
Use raw multiprocessing.Processes and write a Pool-like manager that does exactly what we want.
Use some alternative library such as Celery. A huge list is here.
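A minimal sketch of the first two ideas combined, tracking which process runs which task and terminating it on demand (the task ID and the fake workload are placeholders):

    import multiprocessing as mp
    import time

    def worker(task_id):
        print(f"working on {task_id}")
        time.sleep(60)  # stand-in for the long-running computation

    if __name__ == "__main__":
        running = {}  # shared bookkeeping: task_id -> Process

        p = mp.Process(target=worker, args=("task-1",))
        running["task-1"] = p
        p.start()

        time.sleep(1)
        # Later GUI input made task-1 irrelevant: kill its process.
        running["task-1"].terminate()
        running["task-1"].join()
        del running["task-1"]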

How to create cancellable tasks in Python?

I'm building a Python IDE, which needs to highlight all occurrences of the name under cursor (using Jedi library). The process of finding the occurrences can be quite slow.
In order to avoid freezing the GUI, I could run the search in another thread, but when the user moves quickly over several words, the background threads could pile up while working on now-obsolete tasks. I would like to cancel the search for previous occurrences when the user moves to a new name.
It looks like killing a thread is complicated in Python. What are the other options for creating easily cancellable background tasks in Python 3.4+?
I think concurrent.futures is the answer.
You can create a thread or process pool, submit any callable, and receive a Future, which you can cancel if needed.
Reference: https://docs.python.org/3/library/concurrent.futures.html
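As a small sketch (the search function is a stand-in for the real work); note that Future.cancel() can only stop a task that has not started running yet:

    from concurrent.futures import ThreadPoolExecutor
    import time

    def find_occurrences(name):
        time.sleep(2)  # stand-in for the slow search
        return f"occurrences of {name}"

    with ThreadPoolExecutor(max_workers=1) as executor:
        first = executor.submit(find_occurrences, "foo")
        second = executor.submit(find_occurrences, "bar")  # queued behind first
        # The user moved on, so cancel the queued, now-obsolete search.
        print(second.cancel())  # True: it never started
        print(first.result())   # a running task cannot be cancelled this way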
A thread cannot be stopped by another one. This is an OS limitation rather than a Python one. The only thing you can do is periodically inspect a flag and, if it is set, have the thread stop itself (just return).
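The usual way to implement that check is a flag such as a threading.Event (a sketch; the loop body is a placeholder for real work):

    import threading
    import time

    def search(stop_event):
        for _ in range(100):
            if stop_event.is_set():
                return            # the thread stops itself
            time.sleep(0.1)       # stand-in for a chunk of work

    stop = threading.Event()
    t = threading.Thread(target=search, args=(stop,))
    t.start()

    time.sleep(0.5)
    stop.set()  # request cancellation
    t.join()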
Moreover, threads in Python suffer from the GIL. This means that CPU-intensive operations, when carried out in a separate thread, will still affect your main loop, as only one thread per process can run at a time.
I'd recommend you to run the search in a separate process which you can easily cancel whenever you want.
What the guys at YouCompleteMe do, for example, is wrap Jedi in an HTTP server which they can query in the background. If the user moves the cursor before the completion comes back, the IDE can simply drop the request.
Well, my personal favorites are work queues. If it's a one-time application you should take a look at Python RQ. Extremely easy and fun to use. If you want to build something more "professional-grade", take a look at something like Celery.
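For reference, enqueueing and cancelling a job with RQ looks roughly like this (it requires a running Redis server, and mymodule.long_search is a made-up function path):

    from redis import Redis
    from rq import Queue

    q = Queue(connection=Redis())

    job = q.enqueue("mymodule.long_search", "name_under_cursor")
    # The task became obsolete before a worker picked it up:
    job.cancel()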
You might also want to look at multiprocessing.

Run specific django manage.py commands at intervals

I need to run a specific manage.py command on an EC2 instance every X minutes. For example: python manage.py some_command.
I have looked up django-chronograph. Following the instructions, I've added chronograph to my settings.py but on runserver it keeps telling me No module named chronograph.
Is there something I'm missing to get this running? And after running how do I get manage.py commands to run using chronograph?
Edit: It's installed in the EC2 instance's virtualenv.
I would suggest configuring cron to run your command at specific times/intervals.
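For example, a crontab entry that runs the command every 15 minutes (both paths are placeholders for your virtualenv and project):

    */15 * * * * /path/to/venv/bin/python /path/to/project/manage.py some_command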
First, install it by running pip install django-chronograph.
I would say handle this through cron, but if you don't want to use cron then:
Make sure you installed the module in the virtualenv (with easy_install, pip, or any other way that Amazon EC2 allows). After that you might want to look up the threading module documentation:
Python 2 threading module documentation
Python 3 threading module documentation
The purpose of using threading will be to have the following structure:
A "control" thread, which will use the chronograph module and do the time measurements, and putting the new work to do in an "input queue" on each scheduled time, for the worker threads (which will be active already) to process, or just trigger each worker thread (make it active) at the time you want to trigger each execution. In the first case you'll be taking advantage of parallel threads to do a big chunk of work and minimize io wait times, but since the work is in a queue, the workers will process one at a time. Meaning if you schedule two things too close together and the previous element is still being processed, the new item will have to wait (Depending on your programming logic and amount of worker threads some workers might start processing the new item, but is a bit more complex logic).
In the second case your control thread will actually trigger the start of a new thread (or group of threads) each time you want to trigger a scheduled action. If there's big data to process you might need to spawn a new queue for each task to process and create a group of worker threads for it for each task, but if the data is not that big then you can just get away with having the worker process just one data package and be done once execution is done and you get a result. Either way this method will allow you to schedule tasks without limitation on how close they can be, since new independent worker threads will be created for them every time.
Finally, you might want to create an "output queue" and an output thread, to store and process (or output, or anything else you want to do with it...) the results of each worker thread.
The control thread will basically try to imitate cron in its logic, triggering actions at certain times depending on how it was configured.
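A stripped-down sketch of that structure, with the control thread feeding an input queue that a worker drains (the interval and the task payload are placeholders; a real version would use the schedule configuration instead of a fixed sleep):

    import queue
    import threading
    import time

    work_q = queue.Queue()

    def control():
        # Imitates cron: enqueue a task every 60 seconds.
        while True:
            work_q.put("run some_command")  # placeholder payload
            time.sleep(60)

    def worker():
        while True:
            task = work_q.get()
            print("processing:", task)      # invoke the real command here
            work_q.task_done()

    threading.Thread(target=control, daemon=True).start()
    threading.Thread(target=worker, daemon=True).start()

    while True:  # keep the main thread alive
        time.sleep(1)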
There's also a multiprocessing module in Python which works with processes instead and takes advantage of true multiprocessing hardware, but I don't think you'll really need it in this case, unless you see performance issues caused by CPU load.
If you need any clarification, help, examples, just let me know.

Pythonic way for managing thread pool

I have a couple of Python scripts that are responsible for managing some live feed processing. It's structured like so:
Script 1: manages an "aggregate" list of live events that provides some very thin data about all events.
Script 2: manages a list of threads that process detailed feeds for each live event.
Script 1 is responsible for defining which events are active and (for now) writes all the unique identifiers for the active events to a flat file (I don't like that at all). Script 2 reads those unique identifiers, checks whether it already has a thread with that ID and, if not, starts that thread, which then processes the detailed data for its event. Script 2 does NOT decide when a thread should be marked as inactive or removed from the quasi-queue file; the threads know when to terminate themselves, and script 1 monitors a master-list feed that defines which events are active. This works fairly well, but it feels clunky and poor to me.
I've looked at Threading pool similar to the multiprocessing Pool? and queue-based approaches like https://www.ibm.com/developerworks/aix/library/au-threadingpython/ but they don't seem to apply well, because the live event threads don't have a specified lifespan: they are spawned and live until their event is over (on the order of hours).
I'm still new to Python and this feels a bit over my head. Any kind of sanity/stupidity check you could offer in terms of implementation approaches would be greatly appreciated.
EDIT: I'm not in a position to use an external module because of sys admin limitations :(
It sounds like you need to use something like celery.
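Since external modules are off the table, here is a standard-library sketch of the bookkeeping the question describes: a dict of threads keyed by event ID, where finished threads are pruned and missing ones are started (the feed-processing function is a stand-in):

    import threading
    import time

    threads = {}  # event_id -> Thread

    def process_event(event_id):
        time.sleep(5)  # stand-in for processing the detailed feed

    def ensure_running(active_event_ids):
        # Prune threads that terminated themselves.
        for event_id in list(threads):
            if not threads[event_id].is_alive():
                del threads[event_id]
        # Start a thread for any active event that lacks one.
        for event_id in active_event_ids:
            if event_id not in threads:
                t = threading.Thread(target=process_event, args=(event_id,))
                threads[event_id] = t
                t.start()

    ensure_running(["event-1", "event-2"])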
