I've searched several related posts, but none explicitly answer my query. I'm trying to create a class that will use multiprocessing to distribute jobs to a machine. The 'jobs' are system calls using subprocess and I don't wish for the script to stay connected to the jobs once spawned. I've gotten everything to work using the Process class, but I would like to try the Pool class, and I'm having problems.
My code is here:
https://gist.github.com/2627589
The relevant method is run_queue. As you can see, the Test_pool class overrides the run_queue method of my Runner class. However, when I run this, I get an error:
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
My goal is to define MAX_NUM_CORES, the number of cores that should be kept busy, and to keep submitting jobs as long as the running jobs do not use more than that many cores. Maybe I'm not using the right design pattern? Suggestions are welcome.
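Below is a minimal sketch of a Pool-based run_queue. It assumes each job is a plain command list passed to subprocess.call. The Runner, run_job and example_commands names are illustrative and are not taken from the gist. On Python 2, Pool can only send functions it can pickle by name. Passing a bound method such as self.some_method is what fails with the instancemethod error above, and a module-level function avoids it. The pool size caps how many jobs run at once, and with chunksize=1 each worker picks up the next command as soon as it finishes one.

```python
import multiprocessing
import subprocess
import sys  # used only to build example commands portably

# Hypothetical cap on concurrently running jobs (the question's MAX_NUM_CORES).
MAX_NUM_CORES = 2


def run_job(cmd):
    """Run one system call to completion and return its exit code.

    Defined at module level so multiprocessing can pickle it by name; a bound
    method (self.run_job) is what raises the instancemethod PicklingError on
    Python 2.
    """
    return subprocess.call(cmd)


class Runner(object):
    """Illustrative stand-in for the question's Runner class."""

    def __init__(self, commands):
        self.commands = commands  # list of argument lists, e.g. ["ls", "-l"]

    def run_queue(self):
        # At most MAX_NUM_CORES commands run at once; chunksize=1 makes each
        # idle worker take exactly one next command from the queue.
        pool = multiprocessing.Pool(processes=MAX_NUM_CORES)
        try:
            return pool.map(run_job, self.commands, chunksize=1)
        finally:
            pool.close()
            pool.join()


def example_commands(n):
    """Build n trivial commands that just start a Python interpreter and exit."""
    return [[sys.executable, "-c", "pass"] for _ in range(n)]
```

Call Runner(example_commands(4)).run_queue() from inside an if __name__ == "__main__": block. It returns the list of exit codes. Note that pool.map blocks until every job finishes. If the parent should not wait, pool.apply_async(run_job, (cmd,)) submits a job without blocking. In that case the pool, and the parent holding it, must stay alive until the jobs complete.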
Related
I'm trying to parallelize a for loop to speed up my code, since the loop iterations are all independent. Following online tutorials, the standard multiprocessing library in Python seems a good place to start, and I've got it working for basic examples.
However, for my actual use case, I find that parallel processing (on a dual-core machine) is actually slightly (<5%) slower than serial execution when run on Windows. Running the same code on Linux, however, gives a ~25% speed-up over serial execution.
From the docs, I believe this may relate to Windows' lack of a fork() function, which means each new process has to be initialised from scratch. However, I don't fully understand this. Can anyone confirm it?
In particular:
--> Does this mean that all code in the calling Python file gets run for each parallel process on Windows, even initialising classes and importing packages?
--> If so, can this be avoided by somehow passing a copy (e.g. using deepcopy) of the class into the new processes?
--> Are there any tips or other strategies for designing code that parallelises efficiently on both Unix and Windows?
My actual code is long and spans many files, so I have created a pseudocode-style example structure which hopefully shows the issue.
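# Imports
import multiprocessing

import numpy as np

from my_package import MyClass  # the asker's own (slow to set up) class
# ... imports many other packages / functions

# Initialization (instantiate class and call slow functions that get it ready for processing).
# Under the "spawn" start method (the only one available on Windows), this module-level
# code runs again in every worker process, because each worker re-imports this module.
my_class = MyClass()
my_class.set_up(input1=1, input2=2)


# Define main processing function to be used in loop
def calculation(_input_data):
    # Perform some functions on _input_data
    ...
    # Call method of instantiated class to act on data
    return my_class.class_func(_input_data)


if __name__ == "__main__":
    # Required on Windows: without this guard each worker would re-run this block
    # on import, try to start a pool of its own, and multiprocessing raises a RuntimeError.
    input_data = np.linspace(0, 1, 50)
    output_data = np.zeros_like(input_data)

    # For Loop (SERIAL implementation)
    for i, x in enumerate(input_data):
        output_data[i] = calculation(x)

    # PARALLEL implementation (this doesn't work well!)
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map_async(calculation, input_data)
        results.wait()
        output_data = np.array(results.get())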
EDIT: I do not believe this question is a duplicate of the one suggested, since it concerns a difference between Windows and Linux, which the suggested duplicate does not mention at all.
NT operating systems lack the UNIX fork primitive. When a new process is created, it starts as a blank process, and it is the parent's responsibility to instruct the new process on how to bootstrap.
Python's multiprocessing API abstracts process creation, trying to give the same feel across the fork, forkserver and spawn start methods.
When you use the spawn start method, this is what happens under the hood:
A blank process is created
The blank process starts a brand new Python interpreter
The Python interpreter is given the MFA (Module Function Arguments) you specified via the Process class initializer
The Python interpreter loads the given module resolving all the imports
The target function is looked up within the module and called with the given args and kwargs
The above flow has a few implications.
As you noticed yourself, it is a much more taxing operation compared to fork. That's why you notice such a difference in performance.
As the module is imported from scratch in the child process, all import side effects are executed anew. This means that constants, global variables, decorators and top-level statements will be executed again.
On the other hand, initializations made while the parent process is running (for example, inside the if __name__ == "__main__": block) will not be propagated to the child. See this example.
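Here is a minimal sketch of this effect. It uses a hypothetical module-level CONFIG dictionary and multiprocessing.get_context (Python 3.4+) to pick the start method explicitly. The parent changes CONFIG at run time and then asks the pool workers what they see. A spawned worker re-imports the module, so it starts from the module-level value. A forked worker (Unix only) inherits a copy of the parent's memory, including the change.

```python
import multiprocessing

# Module-level state: a spawned child re-imports this module, so it starts
# from exactly this value.
CONFIG = {"mode": "default"}


def read_mode(_):
    """Return CONFIG["mode"] as seen by whichever process runs this call."""
    return CONFIG["mode"]


def modes_seen_by_workers(start_method, workers=2):
    """Change CONFIG in the parent, then report what each pool worker sees."""
    CONFIG["mode"] = "changed-in-parent"  # run-time change, made after import
    ctx = multiprocessing.get_context(start_method)
    with ctx.Pool(processes=workers) as pool:
        return pool.map(read_mode, range(workers))
```

Run this from under an if __name__ == "__main__": guard. modes_seen_by_workers("spawn") should return ["default", "default"], while modes_seen_by_workers("fork") on Linux should return ["changed-in-parent", "changed-in-parent"].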
This is why the multiprocessing documentation has a specific section for Windows in its Programming Guidelines. I highly recommend reading the Programming Guidelines, as they already include all the information required to write portable multiprocessing code.
I want to keep a Python class instance permanently alive so I can continually interact with it. The reason is that this class is highly memory-intensive, which means that (1) I cannot fit it into memory multiple times, and (2) loading it is prohibitively slow.
I have tried implementing this using both Pyro and RPYC, but these packages appear to delete the object and create a new one every time a new request is made (which is exactly what I don't want). However, I did find the following option for Pyro:
@Pyro4.behavior(instance_mode="single")
This ensures that only a single instance is created. However, since multiple requests may be made simultaneously, I am not 100% sure this is safe. Is there a better way to accomplish what I am trying to do?
Thanks in advance for any help, it is greatly appreciated! (I've been struggling with this for quite a while now).
If you don't want to make your class thread-safe, you can set SERVERTYPE to "multiplex". This makes Pyro process all remote method calls sequentially.
https://pythonhosted.org/Pyro4/servercode.html#server-types-and-concurrency-model:
multiplexed server (servertype "multiplex")
This server uses a connection multiplexer to process all remote method calls sequentially. No threads are used in this server. It uses the best supported selector available on your platform (kqueue, poll, select). It means only one method call is running at a time, so if it takes a while to complete, all other calls are waiting for their turn (even when they are from different proxies). The instance mode used for registering your class, won’t change the way the concurrent access to the instance is done: in all cases, there is only one call active at all times. Your objects will never be called concurrently from different threads, because there are no threads. It does still affect when and how often Pyro creates an instance of your class.
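Below is a minimal Pyro4 server sketch that combines the two settings discussed here. It assumes a Pyro4 version recent enough to have the behavior decorator, as the question's snippet suggests. BigModel and its lookup method are hypothetical placeholders. SERVERTYPE has to be set before the Daemon is created. With instance_mode="single", Pyro creates one BigModel and reuses it for every request, and the multiplex server type ensures that those requests run one at a time.

```python
import Pyro4

# Must be set before the Daemon is created: handle calls one at a time, no threads.
Pyro4.config.SERVERTYPE = "multiplex"


@Pyro4.expose
@Pyro4.behavior(instance_mode="single")
class BigModel(object):
    """Hypothetical stand-in for the memory-intensive class."""

    def __init__(self):
        # Placeholder for the slow, memory-hungry loading step; runs once.
        self.table = {"answer": 42}

    def lookup(self, key):
        return self.table.get(key)


def serve():
    daemon = Pyro4.Daemon()
    uri = daemon.register(BigModel)
    print("BigModel available at", uri)
    daemon.requestLoop()  # blocks, serving requests until interrupted
```

Start the server with serve(). A client can then call Pyro4.Proxy(uri).lookup("answer") using the printed URI.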
In a Django Python app, I launch jobs with Celery (a task manager). When each job is launched, it returns an object (let's call it an instance of class X) that lets you check on the job and retrieve the return value or any errors thrown.
Several people (someday, I hope) will be able to use this web interface at the same time, so several instances of class X may exist at once, each corresponding to a job that is queued or running in parallel. It's difficult to hold onto these X objects because I cannot use a global variable (a dictionary that lets me look up each X object by key). Celery uses different processes, not just different threads, so each process would modify its own copy of the global table, causing mayhem.
Subsequently, I received the great advice to use memcached to share the memory across the tasks. I got it working and was able to set and get integer and string values between processes.
The trouble is this: after a great deal of debugging today, I learned that memcached's set and get don't seem to work for class instances. This is my best guess. Under the hood, memcached serializes objects to the shared memory. Class X (understandably) cannot be serialized because it points at live data (the status of the job), so the serialized version may be out of date (i.e. it may point to the wrong place) when it is loaded again.
Attempts to use a SQLite database were similarly fruitless; not only could I not figure out how to serialize objects as database fields (using my Django models.py file), I would be stuck with the same problem: the handles of the launched jobs need to stay in RAM somehow (or use some fancy OS tricks underneath), so that they update as the jobs finish or fail.
My best guess is that (despite the advice that thankfully got me this far) I should be launching each job in some external queue (for instance Sun/Oracle Grid Engine). However, I couldn't come up with a good way of doing that without using a system call, which I thought may be bad style (and potentially insecure).
How do you keep track of jobs that you launch in Django or Django Celery? Do you launch them by simply putting the job arguments into a database and then have another job that polls the database and runs jobs?
Thanks a lot for your help, I'm quite lost.
I think django-celery does this work for you. Have you looked at the tables created by django-celery? For example, djcelery_taskstate holds all the data for a given task, such as state, worker_id and so on. For periodic tasks there is a table called djcelery_periodictask.
In a Django view you can access the TaskMeta object:
from djcelery.models import TaskMeta

# task_id: the id of the launched task, stored when the job was started
task = TaskMeta.objects.get(task_id=task_id)
print(task.status)
I'm trying to create a Python script that queries multiple sites. The script works well (I use urllib2), but only for one link. For multiple sites, I make the requests one after another, which is not very efficient.
What is the ideal solution (threads, I guess) for running multiple queries in parallel and stopping the others when one query returns a specific string?
I found this question, but I have not figured out how to adapt it to stop the remaining threads:
Python urllib2.urlopen() is slow, need a better way to read several urls
Thank you in advance!
(sorry if I made mistakes in English, I'm French ^^)
You can use Twisted to handle multiple requests concurrently. Internally it uses epoll (or IOCP or kqueue, depending on the platform) to be notified efficiently when TCP sockets are ready, which is cheaper than using threads. Once one request matches, you cancel the others.
Here is the Twisted http agent tutorial.
Usually this is implemented with the following pattern (sorry, my Python skills are not so good).
You have a class named Runner. This class has a long-running method, which gets the information you need. It also has a Cancel method, which interrupts the long-running method in some way (you can make the URL request object a member field, so that Cancel calls the equivalent of request.terminate()).
The long-running method needs to accept a callback function, which it calls to signal that it is done.
Then, before you start your threads, you create instances of this class and keep them in a list. In the same loop you can start their long-running methods, passing in a callback method of your main program.
In the callback method, you then go through the list of all threaded objects and call their cancel method.
Please, edit my answer with any Python specific implementation :)
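Below is a minimal Python sketch of this pattern, using threads and a threading.Event for cooperative cancellation. Python threads cannot be killed from outside, so the fetch callable must check the event itself and give up early. make_fetch is a hypothetical stand-in that simulates a slow request with Event.wait instead of making a real urllib2 call. first_match and the other names are illustrative.

```python
import threading


class Runner(object):
    """One cancellable job running in its own thread."""

    def __init__(self, name, fetch, callback):
        self.name = name
        self.fetch = fetch        # callable(cancel_event) -> result string or None
        self.callback = callback  # called as callback(runner, result) when done
        self.cancel_event = threading.Event()
        self.thread = threading.Thread(target=self._run)

    def start(self):
        self.thread.start()

    def cancel(self):
        # Only a request: fetch() must notice the event and return early.
        self.cancel_event.set()

    def _run(self):
        result = self.fetch(self.cancel_event)
        if not self.cancel_event.is_set():
            self.callback(self, result)


def first_match(fetchers, wanted):
    """Run every fetcher concurrently; return the name of the first whose
    result contains `wanted`, cancelling the rest. Returns None if none match."""
    lock = threading.Lock()
    found = []

    def on_done(runner, result):
        with lock:
            if found or result is None or wanted not in result:
                return
            found.append(runner.name)
            for other in runners:  # the callback cancels every runner
                other.cancel()

    runners = [Runner(name, fetch, on_done) for name, fetch in fetchers.items()]
    for runner in runners:
        runner.start()
    for runner in runners:
        runner.thread.join()
    return found[0] if found else None


def make_fetch(delay, text):
    """Hypothetical stand-in for a URL request that takes `delay` seconds."""
    def fetch(cancel_event):
        # Event.wait returns True if cancel() was called before the delay elapsed.
        if cancel_event.wait(delay):
            return None
        return text
    return fetch
```

In real code, fetch would issue the request with a timeout and check cancel_event between reads. A request that is already blocked inside urlopen cannot be interrupted this way.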
You can run your queries with the multiprocessing library, poll for results, and shut down queries you no longer need. The module's documentation covers the Process class, which has a terminate() method. If you want to limit the number of requests sent out, look at the pooling options.
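Below is a minimal sketch of the multiprocessing variant, which requires Python 3.4+ for get_context. Here fetch is a hypothetical callable that downloads one URL and returns its text. Under the spawn start method used on Windows, it must be a module-level function. Each URL is fetched in its own Process, and results come back through a Queue. Once one result contains the wanted string, the remaining processes are stopped with terminate(). The documentation warns that terminating a process while it is using a queue can corrupt that queue. That is tolerable here only because the queue is discarded afterwards.

```python
import multiprocessing
import queue  # only for the queue.Empty exception raised by Queue.get(timeout=...)


def _worker(url, fetch, results):
    # Runs in a child process: fetch one URL and report (url, text) to the parent.
    results.put((url, fetch(url)))


def first_match(urls, wanted, fetch, start_method=None, timeout=30.0):
    """Fetch all URLs in parallel processes; return the first URL whose text
    contains `wanted`, terminating the others. Returns None if none match."""
    ctx = multiprocessing.get_context(start_method)
    results = ctx.Queue()
    procs = [ctx.Process(target=_worker, args=(url, fetch, results)) for url in urls]
    for p in procs:
        p.start()
    found = None
    try:
        for _ in procs:
            url, text = results.get(timeout=timeout)
            if wanted in text:
                found = url
                break
    except queue.Empty:
        pass  # no further results within `timeout` seconds
    finally:
        for p in procs:
            if p.is_alive():
                p.terminate()  # stop queries that are no longer needed
            p.join()
    return found
```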
I have created a number of schedulers in Python on Windows, and they are running in the background.
Can anyone tell me a command to check how many schedulers are running on Windows, and how I can remove them?
If you are using sched.scheduler, you can query the queue attribute of your scheduler instance (sched.scheduler.queue):
scheduler.queue
Read-only attribute returning a list of upcoming events in the order they will be run. Each event is shown as a named tuple with the following fields: time, priority, action, argument.
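Here is a short sketch of inspecting and removing pending events with the standard library sched module. The two job events are hypothetical. scheduler.cancel(event) removes an event from the queue, and scheduler.empty() reports whether anything is left. This only works from inside the Python process that owns the scheduler.

```python
import sched
import time


def job(name):
    print("running", name)


s = sched.scheduler(time.time, time.sleep)

# Two hypothetical events, scheduled 60 and 120 seconds from now (not run here).
first = s.enter(60, 1, job, ("a",))
second = s.enter(120, 1, job, ("b",))

print(len(s.queue))  # 2
for event in s.queue:
    # Each entry is a named tuple; action is the function, argument its args.
    print(event.time, event.priority, event.action.__name__, event.argument)

s.cancel(first)      # remove an event that has not run yet
print(len(s.queue))  # 1
print(s.empty())     # False
```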
The same documentation also offers this piece of advice:
In multi-threaded environments, the scheduler class has limitations with respect to thread-safety, inability to insert a new task before the one currently pending in a running scheduler, and holding up the main thread until the event queue is empty. Instead, the preferred approach is to use the threading.Timer class instead.
If all your schedulers are part of a single Python process, you won't be able to count the individual scheduled timers from outside that process. Since the Python schedulers are code you write, you can choose to keep a file that is updated periodically.
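If each scheduler is a separate Python process, count the python.exe (or pythonw.exe) processes in Windows Task Manager, or from a command prompt with tasklist /FI "IMAGENAME eq python.exe". You can stop one with taskkill /PID followed by its process id.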
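For the single-process case, here is a minimal sketch of the "keep a file" idea using threading.Timer. A hypothetical TimerRegistry keeps every Timer it starts and can report or cancel the ones still pending. After each start() and cancel_all() call, it rewrites a small JSON status file at a path chosen by the caller, so another tool can read the count. The class, its methods and the file format are all illustrative.

```python
import json
import threading


class TimerRegistry(object):
    """Hypothetical helper: tracks threading.Timer objects and mirrors their count to a file."""

    def __init__(self, status_path):
        self.status_path = status_path
        self.timers = []
        self.lock = threading.Lock()

    def start(self, interval, function, *args):
        timer = threading.Timer(interval, function, args)
        timer.daemon = True  # do not keep the process alive just for pending timers
        with self.lock:
            self.timers.append(timer)
        timer.start()
        self._write_status()
        return timer

    def active(self):
        # Timers that have neither fired and finished nor been cancelled.
        with self.lock:
            return [t for t in self.timers if t.is_alive()]

    def cancel_all(self):
        for t in self.active():
            t.cancel()
            t.join()
        self._write_status()

    def _write_status(self):
        with open(self.status_path, "w") as f:
            json.dump({"active_timers": len(self.active())}, f)
```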