How to catch runtime errors from native code in Python?

I have the following problem. Let's say we have this Python function:
def func():
    # run some code here which calls some native code
Inside func() I am calling some functions which in turn call some native C code.
If any crash happens, the whole Python process crashes altogether.
How is it possible to catch and recover from such errors?
One way that came to my mind is to run this function in a separate process, but not by just starting another process: there is a lot of memory and objects used by the function, and it would be very hard to split that out. Is there something like fork() in C available in Python, to create a copy of the exact same process with the same memory structures and so on?
Or maybe other ideas?
Update:
It seems that there is no real way of catching C runtime errors in Python; those happen at a lower level and crash the whole Python virtual machine.
As solutions you currently have two options:
Use os.fork(), but this works only in Unix-like OS environments.
Use multiprocessing and a shared-memory model to share big objects between processes. Usual serialization will just not work with objects that take up multiple gigabytes of memory (you will just run out of memory). However, there is a very good Python library called Ray (https://docs.ray.io/en/master/) that performs in-memory serialization of big objects using a shared-memory model, and it's ideal for big-data/ML workloads - highly recommended.

As long as you are running on an operating system that supports fork, that's already how the multiprocessing module creates subprocesses. You could use os.fork(), multiprocessing.Process, or multiprocessing.Pool to get what you want.
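As a minimal sketch of that idea (the native entry point some_native_call() is hypothetical, standing in for whatever your func() calls): the risky call runs in a multiprocessing.Process, so a hard crash in the native code kills only the child, and the parent detects it through the exit code.
import multiprocessing

def run_native(queue):
    # call into the native code here; a segfault kills only this child process
    result = some_native_call()  # hypothetical native entry point
    queue.put(result)

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=run_native, args=(queue,))
    proc.start()
    proc.join()
    if proc.exitcode == 0:
        print("native call succeeded:", queue.get())
    else:
        # a crash shows up as a negative exit code (the negated signal number)
        print("native call failed with exit code", proc.exitcode)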

Related

Python - alternatives for internal memory

I'm coding a program that requires high memory usage.
I use Python 3.7.10.
During the program I create about 3 GB of Python objects and modify them.
Some objects I create contain references to other objects.
Also, sometimes I need to deepcopy one object to create another.
My problem is that creating and modifying these objects takes a lot of time and causes some performance issues.
I wish I could do some of the creation and modification in parallel. However, there are some limitations:
the program is very CPU-bound and there is almost no IO/network usage - so the threading library will not help, due to the GIL
the system I work with has no copy-on-write feature - so using the multiprocessing Python library spends a lot of time on forking the process
the objects do not contain numbers and most of the work in the program is not mathematical - so I cannot benefit from numpy and ctypes
What could be a good alternative for this kind of memory that would allow me to parallelize my code better?
Deepcopy is extremely slow in Python. A possible solution is to serialize the objects and load them back from disk. See this answer for viable options - perhaps ujson and cPickle. Furthermore, you can serialize and deserialize objects asynchronously using aiofiles.
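As a rough sketch of that idea, assuming a pickle round-trip is acceptable for your objects (it often beats copy.deepcopy for large nested structures, though this is workload-dependent and worth measuring):
import copy
import pickle
import time

# a stand-in large nested structure; replace with your real objects
obj = {i: list(range(100)) for i in range(10_000)}

start = time.perf_counter()
clone_a = copy.deepcopy(obj)
print("deepcopy:", time.perf_counter() - start)

start = time.perf_counter()
# dumping to bytes and loading back yields an independent copy of the object graph
clone_b = pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
print("pickle round-trip:", time.perf_counter() - start)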
Can't you use your GPU RAM with CUDA?
https://developer.nvidia.com/how-to-cuda-python
If it doesn't need to be real-time, I'd use PySpark (see the streaming section: https://spark.apache.org/docs/latest/api/python/) and work with remote machines.
Can you tell me a bit about the application? Perhaps you're searching for something like the PyTorch framework (https://pytorch.org/).
You may also like to try using Transparent Huge Pages and a hugepage-aware allocator, such as tcmalloc. That may speed up your application by 5-15% without having to change a line of code.
See thp-usage for more information.

Why is pickle needed for the multiprocessing module in Python?

I was doing multiprocessing in Python and hit a pickling error, which makes me wonder: why do we need to pickle objects in order to do multiprocessing? Isn't fork() enough?
Edit: I kind of get why we need pickle for interprocess communication, but that is just for the data you want to transfer, right? Why does the multiprocessing module also try to pickle stuff like functions, etc.?
Which makes me wonder why do we need to pickle the object in order to do multiprocessing?
We don't need pickle, but we do need to communicate between processes, and pickle happens to be a very convenient, fast, and general serialization method for Python. Serialization is one way to communicate between processes; memory sharing is the other. Unlike memory sharing, the processes don't even need to be on the same machine to communicate. For example, PySpark uses serialization very heavily to communicate between executors (which are typically different machines).
Addendum: There are also issues with the GIL (Global Interpreter Lock) when sharing memory in Python (see comments below for detail).
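For intuition, a minimal sketch of what multiprocessing effectively does when it sends an object through a Pipe or Queue: the object is reduced to bytes with pickle on the sending side and reconstructed on the receiving side.
import pickle

payload = {"values": [1, 2, 3], "label": "example"}
wire_bytes = pickle.dumps(payload)   # roughly what gets written to the pipe
restored = pickle.loads(wire_bytes)  # what the other process reconstructs
assert restored == payload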
isn't fork() enough?
Not if you want your processes to communicate and share data after they've forked. fork() clones the current memory space, but changes in one process won't be reflected in another after the fork (unless we explicitly share data, of course).
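To make that concrete, here is a small sketch assuming a Unix-like OS where os.fork() is available: the child starts with a copy of the parent's memory, but writes made after the fork stay private to each process.
import os

value = [1, 2, 3]

pid = os.fork()
if pid == 0:
    # child: mutates its own copy of the list
    value.append(99)
    print("child sees:", value)   # [1, 2, 3, 99]
    os._exit(0)
else:
    os.waitpid(pid, 0)
    # parent: unaffected by the child's change
    print("parent sees:", value)  # [1, 2, 3]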
I kind of get why we need pickle to do interprocess communication, but that is just for the data you want to transfer right? why does the multiprocessing module also try to pickle stuff like functions etc?
Sometimes complex objects (i.e. the "other stuff"? I'm not totally clear on what you meant here) contain the data you want to manipulate, so we'll definitely want to be able to send that "other stuff".
Being able to send a function to another process is incredibly useful. You can create a bunch of child processes and then send them all a function to execute concurrently that you define later in your program. This is essentially the crux of PySpark (again a bit off topic, since PySpark isn't multiprocessing, but it feels strangely relevant).
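As a minimal sketch of sending a function to worker processes (a module-level function is pickled by its qualified name and re-imported in each child, which is why multiprocessing requires it to be importable):
import multiprocessing

def square(x):
    # pickled by reference to its name, then looked up again in each worker
    return x * x

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        print(pool.map(square, range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]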
There are some functional purists (mostly the LISP people) who argue that code and data are the same thing, so for some there isn't much of a line to draw.

Multiprocessing slower than serial processing in Windows (but not in Linux)

I'm trying to parallelize a for loop to speed up my code, since the loop's processing operations are all independent. Following online tutorials, it seems the standard multiprocessing library in Python is a good start, and I've got this working for basic examples.
However, for my actual use case I find that parallel processing (using a dual-core machine) is actually slightly (<5%) slower when run on Windows. Running the same code on Linux, however, results in a parallel-processing speed-up of ~25% compared to serial execution.
From the docs, I believe this may relate to Windows' lack of the fork() function, which means the process needs to be initialised afresh each time. However, I don't fully understand this and wonder if anyone can confirm it, please?
Particularly,
--> Does this mean that all the code in the calling Python file gets run for each parallel process on Windows, even initialising classes and importing packages?
--> If so, can this be avoided by somehow passing a copy (e.g. using deepcopy) of the class into the new processes?
--> Are there any tips / other strategies for efficient parallelisation when designing code for both Unix and Windows?
My exact code is long and uses many files, so I have created a pseudocode-style example structure which hopefully shows the issue.
# Imports
import multiprocessing
import numpy as np
from my_package import MyClass
# ... imports of many other packages / functions

# Initialisation (instantiate class and call slow functions that get it ready for processing)
my_class = MyClass()
my_class.set_up(input1=1, input2=2)

# Define main processing function to be used in loop
def calculation(_input_data):
    # Perform some functions on _input_data
    ...
    # Call method of the instantiated class to act on the data
    return my_class.class_func(_input_data)

input_data = np.linspace(0, 1, 50)
output_data = np.zeros_like(input_data)

# For loop (SERIAL implementation)
for i, x in enumerate(input_data):
    output_data[i] = calculation(x)

# PARALLEL implementation (this doesn't work well!)
with multiprocessing.Pool(processes=4) as pool:
    results = pool.map_async(calculation, input_data)
    results.wait()
output_data = results.get()
EDIT: I do not believe this question is a duplicate of the one suggested, since it relates to a difference between Windows and Linux, which is not mentioned at all in the suggested duplicate question.
NT operating systems lack the UNIX fork primitive. When a new process is created, it starts as a blank process. It is the responsibility of the parent to instruct the new process on how to bootstrap.
The Python multiprocessing APIs abstract process creation, trying to give the same look and feel across the fork, forkserver and spawn start methods.
When you use the spawn starting method, this is what happens under the hood.
A blank process is created
The blank process starts a brand new Python interpreter
The Python interpreter is given the MFA (Module Function Arguments) you specified via the Process class initializer
The Python interpreter loads the given module resolving all the imports
The target function is looked up within the module and called with the given args and kwargs
The above flow brings a few implications.
As you noticed yourself, it is a much more taxing operation compared to fork. That's why you notice such a difference in performance.
As the module gets imported from scratch in the child process, all import side effects are executed anew. This means that constants, global variables, decorators and first level instructions will be executed again.
On the other hand, initializations made during the parent process's execution will not be propagated to the child. See this example.
This is why the multiprocessing documentation includes a specific paragraph for Windows in the Programming Guidelines. I highly recommend reading the Programming Guidelines, as they already include all the required information for writing portable multiprocessing code.
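A sketch of the portable pattern those guidelines describe (the init_worker()/worker_state names are illustrative, not from the original post): keep module-level work minimal, guard the entry point with if __name__ == "__main__", and push expensive per-process setup into a Pool initializer so it runs once per worker rather than once per task.
import multiprocessing

worker_state = None  # populated once per worker by the initializer

def init_worker():
    # expensive setup runs once per child process, under any start method
    global worker_state
    worker_state = {"model": "expensive setup goes here"}

def calculation(x):
    # worker_state was set up by init_worker in this same child process
    assert worker_state is not None
    return x * 2

if __name__ == "__main__":
    # under spawn (the Windows default) the module is re-imported in every
    # child, so unguarded top-level code would run again in each worker
    with multiprocessing.Pool(processes=4, initializer=init_worker) as pool:
        print(pool.map(calculation, range(10)))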

Scheduling embedded Python processes

I've been trying to create a C++ program that embeds multiple Python threads. Due to the nature of the program, the advantage of multitasking comes from asynchronous I/O; but because some variables need to be altered between context switches, I need to control the scheduling. I thought that because of Python's GIL this would be simple enough, but it's turning out not to be: Python wants to use POSIX threads rather than software threads; I can't figure out from the documentation what happens if I store the result of PyEval_SaveThread() and don't call PyEval_RestoreThread() in the same function (so presumably I'm not supposed to be doing that); and so on.
Is it possible to create a custom scheduler for embedded python threads, or was python basically designed so that it can't be done?
It turns out that using PyEval_SaveThread() and PyEval_RestoreThread() is unnecessary; basically, I used coroutines (in this case from libPCL) to run the scripts and control the scheduling. However, this isn't really much of a solution, because if Python encounters a syntax error while running in a coroutine it will segfault; oddly enough, this still happens even if there is only one Python script running in one coroutine. But at the very least the scripts don't seem to conflict with each other.

Running a continuous Python process and memory management

My current assignment is to run a Python program continuously. It is going to be a cron-job kind of thing: internally it will have objects which are going to be updated every 24 hours, and then it will basically write the details to a file.
Some advice is required about memory management:
Should I use a single process or multiple threads? There is scope in the program for work that can be done in parallel. As it is going to run continuously, some clarification would be required about the memory consumption of these threads. Also, do I need to clean up the resources of the threads after each execution? Is there any cleanup method available for threads in Python?
When I do an object allocation in Python, do I need to think about the destructor as well, or will Python do the garbage collection?
Please share your thoughts on this as well as what would be the best approach.
There seems to be a misunderstanding in your question.
A cron job is a scheduled task that runs at a given interval of time. A program running continuously doesn't need to be scheduled, aside from being launched at boot.
First, multi-threading in Python suffers from the GIL, so unless you are calling multi-threading-aware library functions that release the GIL, or your computations are I/O-bound (often blocked by input/output such as disk access or network access), you will only see an insubstantial gain from using threading. You should instead consider using the multiprocessing package for parallel computing. Other options are NumPy-based calculations when the library is compiled with OpenMP, or using a task-based parallel framework such as SCOOP or Celery.
As stated in the comments, memory management is built into Python and you won't have to worry about it apart from deleting unused instances or elements. Python will automatically garbage-collect every object that no longer has any variable bound to it, so be sure to delete objects or let them fall out of scope accordingly.
On a side note, be careful with object destructors in Python; they tend to exhibit different behavior than in other object-oriented languages. I recommend reading up on this matter before using them.
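As a small illustration of that caveat, a sketch assuming CPython: __del__ runs when the reference count drops to zero, which may be later than you expect, and objects caught in reference cycles have to wait for the cycle collector.
import gc

class Resource:
    def __del__(self):
        # called when the last reference disappears, not at a guaranteed time
        print("destructor ran")

r = Resource()
r2 = r
del r    # nothing printed yet: r2 still references the object
del r2   # refcount hits zero, so __del__ runs here (in CPython)

# objects in reference cycles rely on the cycle collector instead
a = Resource()
a.self_ref = a
del a            # no output yet: the cycle keeps the object alive
gc.collect()     # the collector breaks the cycle and __del__ finally runs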
