I want to call python functions in sub-processes without creating a copy of the current process.
I have a method A.run() which should call B.run() multiple times.
A.run() consumes a lot of memory, so I don't want to use a ProcessPoolExecutor because it copies the whole memory AFAIK.
I also do not want to use subprocess.Popen because it has several disadvantages for me:
I can only pass strings as parameters,
I cannot take advantage of exceptions,
and I have to know the exact location of B.py instead of relying on PYTHONPATH.
I also do not want to spawn threads, because B.run() crashes easily and I don't want it to affect the parent process.
Is there a way I have overlooked that has the advantage of spawning separate processes, without the extra memory, but with the benefits of calling a Python method?
Edit 1:
Answers to some questions:
If I understand this correctly, I don't need the context of the first python process.
I cannot reuse processes because I call a C++ library which has static variables that need to be destroyed.
Most Unix Operating Systems are using Copy-On-Write when they fork new processes.
This implies that, if the memory is not changed by the process children, the memory is not duplicated but shared.
You may see the processes reporting the same amount of memory, but that is because they use that amount of virtual memory; when it comes to physical memory, the parent process's memory actually exists in a single copy shared among them all.
If my assumption is right and the child processes are not touching the parent's memory at all, then you're just wasting your time going against Unix design principles.
More info here.
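A quick way to see this in practice on a Unix system (the buffer size and the sampling stride are arbitrary): allocate a large buffer in the parent, fork, and only read it in the child; the pages stay shared.

import os

big = bytearray(500 * 1024 * 1024)   # ~500 MB allocated in the parent

pid = os.fork()
if pid == 0:
    # Child: reading `big` does not duplicate the buffer; the pages stay
    # shared with the parent until one side writes to them (copy-on-write).
    checksum = sum(big[::4096])
    os._exit(0)

os.waitpid(pid, 0)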
Related
I need to read a large dataset (about 25GB of images) into memory and read it from multiple processes. None of the processes has to write, only read. All the processes are started using Python's multiprocessing module, so they have the same parent process. They train different models on the data and run independently of each other. The reason why I want to read it only one time rather than in each process is that the memory on the machine is limited.
I have tried using Redis, but unfortunately it is extremely slow when many processes read from it. Is there another option to do this?
Is it maybe somehow possible to have another process that only serves as a "get the image with ID x" function? What Python module would be suited for this? Otherwise, I was thinking about implementing a small webserver using werkzeug or Flask, but I am not sure if that would then become my new bottleneck...
Another possibility that came to my mind was to use threads instead of processes, but since the GIL prevents Python threads from running Python code in parallel, this would probably become my new bottleneck.
If you are on Linux and the content is read-only, you can use Linux's fork inheritance mechanism.
From the multiprocessing documentation:
Better to inherit than pickle/unpickle
When using the spawn or forkserver start methods many types from
multiprocessing need to be picklable so that child processes can use
them. However, one should generally avoid sending shared objects to
other processes using pipes or queues. Instead you should arrange the
program so that a process which needs access to a shared resource
created elsewhere can inherit it from an ancestor process.
which means:
Before you fork your child processes, prepare your big data in a module-level variable (global to all the functions).
Then, in the same module, run your children with multiprocessing using the 'fork' start method: set_start_method('fork').
This way the sub-processes will see the variable without copying it, because Linux's fork creates child processes with the same memory mapping as the parent (see "copy-on-write").
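A minimal sketch of that pattern (the array shape and the training function are placeholders, not your real data or models):

import multiprocessing as mp
import numpy as np

# Placeholder for the big read-only data; module-level, so it exists
# before the children are forked.
IMAGES = np.zeros((1000, 64, 64, 3), dtype=np.uint8)

def train(model_id):
    # Only reads IMAGES: fork gives the child the parent's memory mapping,
    # and read-only access keeps the pages shared (copy-on-write).
    return model_id, int(IMAGES.sum())

if __name__ == '__main__':
    mp.set_start_method('fork')   # the default on Linux, but be explicit
    with mp.Pool(processes=4) as pool:
        print(pool.map(train, range(4)))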
I'd suggest mmapping the files; that way they can be shared across multiple processes as well as swapped in and out as appropriate.
The details of this would depend on what you mean by "25GB of images" and how these models want to access the images.
The basic idea would be to preprocess the images into an appropriate format (e.g. one big 4D uint8 numpy array, or maybe several smaller ones, with indices of (image, row, column, channel)) and save them in a format where they can be efficiently used by the models. See numpy.memmap for some examples of this.
I'd suggest preprocessing the files into a useful format "offline", i.e. not as part of the model training but as a separate program that is run first, as this would probably take a while and you'd probably not want to do it every time.
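A rough sketch of that split (the file name, image count and shape are made up for illustration):

import numpy as np

# Offline preprocessing step: pack all decoded images into one flat
# uint8 array on disk, indexed as (image, row, column, channel).
shape = (25000, 256, 256, 3)
images = np.memmap('images.dat', dtype=np.uint8, mode='w+', shape=shape)
# ... fill `images` from the source files here ...
images.flush()

# In each training process: map the same file read-only. The OS shares
# the pages between processes and swaps them in/out as needed.
images = np.memmap('images.dat', dtype=np.uint8, mode='r', shape=shape)
batch = images[100:132]   # touches only the pages this slice needs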
When a Python session/terminal is closed or killed forcefully, is the memory occupied by lists and other data structures in the code released automatically (I mean by garbage collection)?
Yes, the Python process attached to that terminal releases all the memory it acquired. This applies not only to Python but to any standalone process in a terminal (with no dependencies).
This is particularly important when using GPUs, since many modules (e.g. TensorFlow/PyTorch) rely on full control of the GPU.
Yes, assuming it's the parent process. If you kill the parent process, all memory should be released back.
Python's garbage collector kicks in in cases where the parent is still running but the memory requirements are changing over time. E.g.: you started by reading a large file into a list and are removing items as you process them. In this scenario you should see a decrease in memory usage, but it will never go down to what it was before the list was created (even if you del the entire list). This is because Python tries to "think ahead" and doesn't release some of the memory in case your process asks for it again.
Another fun case is when using Python's multiprocessing lib. If you don't close the pool correctly, the memory used by the child processes may not get released.
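For example, letting the context manager tear the pool down for you (the worker function here is just an illustration):

from multiprocessing import Pool

def work(x):
    return x * x

if __name__ == '__main__':
    # Leaving the block calls terminate() on the pool, so the worker
    # processes and the memory they hold are released.
    with Pool(processes=4) as pool:
        results = pool.map(work, range(10))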
I think you just cared about what would happen when you open up a REPL, run some manipulations and kill the shell. In that case, the memory is released back.
I'm playing with Python multiprocessing module to have a (read-only) array shared among multiple processes. My goal is to use multiprocessing.Array to allocate the data and then have my code forked (forkserver) so that each worker can read straight from the array to do their job.
While reading the Programming guidelines I got a bit confused.
It is first said:
Avoid shared state
As far as possible one should try to avoid shifting large amounts of
data between processes.
It is probably best to stick to using queues or pipes for
communication between processes rather than using the lower level
synchronization primitives.
And then, a couple of lines below:
Better to inherit than pickle/unpickle
When using the spawn or forkserver start methods many types from
multiprocessing need to be picklable so that child processes can use
them. However, one should generally avoid sending shared objects to
other processes using pipes or queues. Instead you should arrange the
program so that a process which needs access to a shared resource
created elsewhere can inherit it from an ancestor process.
As far as I understand, queues and pipes pickle objects. If so, aren't those two guidelines conflicting?
Thanks.
The second guideline is the one relevant to your use case.
The first is reminding you that this isn't threading where you manipulate shared data structures with locks (or atomic operations). If you use Manager.dict() (which is actually SyncManager.dict) for everything, every read and write has to access the manager's process, and you also need the synchronization typical of threaded programs (which itself may come at a higher cost from being cross-process).
The second guideline suggests inheriting shared, read-only objects via fork; in the forkserver case, this means you have to create such objects before the call to set_start_method, since all workers are children of a process created at that time.
The reports on the usability of such sharing are mixed at best, but if you can use a small number of any of the C-like array types (like numpy or the standard array module), you should see good performance (because the majority of pages will never be written to merely to update reference counts). Note that you do not need multiprocessing.Array here (though it may work fine), since you do not need writes in one concurrent process to be visible in another.
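A minimal sketch of what that looks like in practice (the array is a stand-in for your real data):

import multiprocessing as mp
import numpy as np

# Created at module level, before the start method is set, so the fork
# server process -- and every worker forked from it -- inherits it.
SHARED = np.arange(10_000_000, dtype=np.float64)

def worker(i):
    # Read-only access: copy-on-write keeps the pages shared.
    return SHARED[i::4].sum()

if __name__ == '__main__':
    mp.set_start_method('forkserver')
    with mp.Pool(processes=4) as pool:
        print(pool.map(worker, range(4)))

(Note that with forkserver the main process and the fork server typically each end up with their own copy of the module-level data; the plain 'fork' start method avoids that duplication.)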
My current assignment is to run a Python program continuously. It is going to be a cron-job kind of thing: internally it will have objects which are going to be updated every 24 hours, and then it will basically write the details to a file.
Some advice is required about memory management.
Should I use a single process or multiple threads? There is scope in the program for parts to be done in parallel. As it is going to run continuously, some clarification would be required about the memory consumption of these threads; also, do I need to clean up the resources of the threads after each execution? Is there any clean-up method available for threads in Python?
When I allocate an object in Python, do I need to think about the destructor as well, or will Python do the garbage collection?
Please share your thoughts on this as well as what would be the best approach.
There seems to be a misunderstanding in your question.
A cron job is a scheduled task that runs at a given interval of time. A program running continuously doesn't need to be scheduled, aside from being launched at boot.
First, multi-threading in Python suffers from the GIL, so unless you are calling library functions that release the GIL or your computations are I/O bound (often blocked by input/output such as disk or network access), you will only see an insubstantial gain by using threading. You should instead consider using the multiprocessing package for parallel computing. Other options are NumPy-based calculations when the library is compiled with OpenMP, or a task-based parallel framework such as SCOOP or Celery.
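As a minimal illustration of the multiprocessing route (the function name and workload here are invented):

from multiprocessing import Pool

def crunch(n):
    # CPU-bound work runs in a separate process, so the GIL of the main
    # process is not a bottleneck.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        results = pool.map(crunch, [10**6] * 8)
    print(results[0])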
As stated in the comments, memory management is built into Python and you won't have to worry about it apart from deleting unused instances or elements. Python will automatically garbage collect every object that no longer has any variable bound to it, so be sure to delete them or let them fall out of scope accordingly.
On a side note, be careful with object destructors in Python; they tend to exhibit different behavior than in other object-oriented languages. I recommend reading up on this matter before using them.
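A tiny sketch of the kind of surprise meant here (the class and file name are invented): __del__ runs only when the reference count drops to zero, may be delayed or skipped at interpreter shutdown, and silently ignores exceptions, so explicit cleanup (e.g. a context manager) is usually safer.

class Logger:
    def __init__(self, path):
        self.f = open(path, 'w')

    def __del__(self):
        # Not guaranteed to run promptly (or at all at interpreter exit),
        # and any exception raised here is ignored.
        self.f.close()

log = Logger('run.log')
other = log          # a second reference keeps the object alive ...
del log              # ... so __del__ does NOT run here
del other            # now the refcount hits zero and CPython finalizes it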
I'm new here and I'm Italian (forgive me if my English is not so good).
I am a computer science student and I am working on a concurrent program project in Python.
We should use monitors: a class with its methods and data (such as condition variables). An instance (object) of this monitor class should be shared across all the processes we have (created by os.fork or by the multiprocessing module), but we don't know how to do it. It is simpler with threads because they already share memory, but we MUST use processes. Is there any way to make this object (monitor) shareable across all processes?
I hope I'm not talking nonsense... thanks a lot to everyone for your attention.
Awaiting your answers.
Lorenzo
As far as "sharing" the instance, I believe the instructor wants you to make your monitor's interface to its local process such that it's as if it were shared (a la CORBA).
Look into the absolutely fantastic documentation on multiprocessing's Queue:
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()
You should be able to imagine how your monitor's attributes might be propagated among the peer processes when changes are made.
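One concrete way to get that "as if it were shared" effect (all names below are invented) is to serve the monitor from a manager process with multiprocessing.managers.BaseManager: every process gets a proxy, each method call runs in the manager process, and an ordinary threading lock inside the class provides the mutual exclusion of a monitor.

import threading
from multiprocessing import Process
from multiprocessing.managers import BaseManager

class Counter:
    """A toy monitor: every public method takes the internal lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:
            self._value += 1

    def value(self):
        with self._lock:
            return self._value

class MonitorManager(BaseManager):
    pass

MonitorManager.register('Counter', Counter)

def worker(counter):
    for _ in range(1000):
        counter.increment()       # each call is dispatched to the manager process

if __name__ == '__main__':
    with MonitorManager() as manager:
        counter = manager.Counter()
        procs = [Process(target=worker, args=(counter,)) for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(counter.value())    # 4000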
Shared memory between processes is usually a poor idea; when calling os.fork(), the operating system marks all of the memory used by the parent and inherited by the child as copy-on-write: if either process attempts to modify a page, it is instead copied to a new location that is not shared between the two processes.
This means that your usual threading primitives (locks, condition variables, et cetera) are not usable for communicating across process boundaries.
There are two ways to resolve this; The preferred way is to use a pipe, and serialize communication on both ends. Brian Cain's answer, using multiprocessing.Queue, works in this exact way. Because pipes do not have any shared state, and use a robust ipc mechanism provided by the kernel, it's unlikely that you will end up with processes in an inconsistent state.
The other option is to allocate some memory in a special way so that the OS will allow you to use shared memory. The most natural way to do that is with mmap. CPython won't use shared memory for native Python objects, though, so you would still need to sort out how you will use this shared region. A reasonable library for this is numpy, which can map the untyped binary memory region into useful arrays of some sort. Shared memory is much harder to work with in terms of managing concurrency, though, since there's no simple way for one process to know how another process is accessing the shared region. The only time this approach makes much sense is when a small number of processes need to share a large volume of data, since shared memory can avoid copying the data through pipes.
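A bare-bones sketch of that approach on a Unix system (the sizes are arbitrary): create an anonymous shared mapping, view it as a NumPy array, and fork.

import mmap
import os
import numpy as np

n = 1_000_000
buf = mmap.mmap(-1, n * 8)             # anonymous, MAP_SHARED mapping
arr = np.frombuffer(buf, dtype=np.float64)
arr[:] = 0.0

pid = os.fork()
if pid == 0:
    # Child: writes land in the same physical pages the parent sees.
    arr[0] = 42.0
    os._exit(0)

os.waitpid(pid, 0)
print(arr[0])                          # 42.0 -- no pipe, no pickling

As the answer says, nothing here coordinates concurrent access, so you still need separate synchronization if more than one process writes to the region.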