How to check whether a python web application is compatible with gevent?

How to check whether a python web application is compatible with gevent? - python

Currently we have several web applications(written with Django) which work well under gunicorn's default sync worker, and we want to use its gevent worker to get a better performance.
It is known that several operations which have side-effects may cause problems when gevent.monkey.patch_all() is used:
Read and write global variables including static class variables and so on. Read and write global variables between greenlets may get unexpected results.
3rd-party libs that using C extension (mysqlclient etc). Generally there should be no problems though IO/timeouts may be blocking in the C extension. But if the C extension stores some global state variables, gevent may cause some unexpected behaviors. And I wonder that if there are any problem when non-blocking IO or multithreading is used in the C extension.
Now the problem comes for:
How to check if any global variables may be read/write by two or more greenlets? Or generally, is there any global variable writing operations in our web applications?
How to check if a C extension module is compatible with gevent?
For the first problem we have an idea:
Add hooks code in cpython. For every PyObject we create two lists to store IDs (or something like) of greenlets which read or write the object. Rebuild cpython and run our application with real workload to check. But it seems too complex to implement.

Related

python inter-process mutex for arbitrary processes

I need to mutex several processes running python on a linux host.
They processes are not spawned in a way I control (to be clear, they are my code), so i cannot use multithreading.Lock, at least as I understand it. The resource being synchronized is a series of reads/writes to two separate internal services, which are old, stateful, not designed for concurrent/transactional access, and out of scope to modify.
a couple approaches I'm familiar with but rejected so far:
In native code using shmget / pthread_mutex_lock (eg create a pthread mutex by well-known string name, in shared memory provided by the OS). Im hoping to not have to use/add a ctypes wrapper for this (or ideally have any low-level constructs visible at all here for this high-level app).
Using one of the lock file libraries such as fasteners would work - but requiring any particular file system access is awkward (the library/approach could use it robustly under the hood, but ideally my client code is abstracted from that).
Is there a preferred way to accomplish this in python (under linux; bonus points for cross-platform)?

Options for synchronizing non-child processes:
Use a remote manager. I'm not super familiar with this process, but the docs has at least a simple example.
create a simple server with your own protocol (rather than a manager): something like a socket server on the loopback address for bouncing simple messages around.
use the filesystem: https://pypi.org/project/filelock/
On posix compliant systems, there's a rather straightforward wrapper for IPC constructs posix-ipc. I also found a wrapper for windows semaphores, but it's not quite as simple (though also not difficult per-say). In both cases your program would use a well known string "name" to access / create the mutex. In both cases, care / error checking is needed to handle creation of the mutex properly (see things like O_CREX flag...)

How to catch runtime errors from native code in python?

I have the following problem, Lets have this python function
def func():
run some code here which calls some native code
Inside func() I am calling some functions which in turn calls some native C code.
If any crash happens the whole python process crashes alltoghether.
How is possible to catch and recover from such errors?
One way that came to my mind is run this function in a separate process, but not just starting another process because there is a lot of memory and objects used by the function, will be very hard to split that. Is there something like fork() in C available in python, to create a copy of the same exact process with same memory structures and etc?
Or maybe other ideas?
Update:
It seems that there is no real way of catching the C runtime errors in python, those are at a lower level and crashes the whole Python virtual machine.
As solutions you currently have two options:
Use os.fork() but work only in unix like OS env.
Use multiprocessing and a shared memory model to share big objects between processes. Usual serialization will just not work with objects that have multi-gigabytes in memory (you will just run out of memory). However there is a very good python library called Ray (https://docs.ray.io/en/master/) that performs in-memory big objects serialization using shared memory model and it's ideal for BigData/ML workloads - highly recommended.

As long as you are running on an operating system that supports fork that's already how the multiprocessing module creates subprocesses. You could os.fork, multiprocessing.Process or multiprocessing.Pool to get what you want. You can also use the os.fork() call on these systems.

Should the use of subprocess.Popen and subprocess.call be avoided if possible?

I'm doing a project doing simple image processing and comparison through the use of ImageMagick
Right now, to execute my commands I'm using the python subprocess module as such:
color_space = ...
evaluate_sequence = ...
output_file_name = ...
convert_cmd = ["magick", "convert", "-colorspace", color_space.name] + queue + \
["-evaluate-sequence", evaluate_sequence.name, output_file_name]
subprocess.call(convert_cmd)
I recently learned there are Python wrappers for ImageMagick. In particular, I was looking into MagickWand.
Is there a large benefit of refactoring my code to not use the subprocess module in terms of performance, security, etc?
I think the subprocess call is more readable/simple than if I used something like MagickWand, but if there are other benefits I want to switch.

Should the use of subprocess.Popen and subprocess.call be avoided if possible?
Not worth it. The code you shared is a straight forward task that you already templated in python in a clear & readable way. Why complicate your solution with additional dependencies and complexity for a single quick task. Also convert utility works for you today, but there may be another external utility tomorrow.
Is there a large benefit of refactoring my code to not use the subprocess module in terms of performance, security, etc?
I would argue that it's only the smallest benefit to performance via NOT invoking a system call, but including a C-API wrapper module that dynamically loads shared libraries would also be about the same. Plus, depending on delegations, it's possible that ImageMagick itself would invoke a system call.
For security, either way, your application is still responsible to sanitize variables. I would also suggest...
Reading up on Security Policies with IM
Ensure that the subprocess.call can not be access by external resources. (i.e. If solution is on a web-server, move tasks to a remote queue worker)
Improve error & warning handling. They are more common then you think!
Arguments FOR switching over
Common justifications that I can think of...
Reduce I/O because the image data is already in python memory.
Tasks are dynamic based on an algorithm (like meme generators).
Common pixel iterators
OCR/CV preprocessors
Again, you really don't have an argument to justify switching over. At least not today.

Are Python extensions produced by Cython/Pyrex threadsafe?

If not, is there a way I can guarantee thread safety by programming a certain way?
To clarify, when talking about "threadsafe,' I mean Python threads, not OS-level threads.

It all depends on the interaction between your Cython code and Python's GIL, as documented in detail here. If you don't do anything special, Cython-generated code will respect the GIL (as will a C-coded extension that doesn't use the GIL-releasing macros); that makes such code "as threadsafe as Python code" -- which isn't much, but is easier to handle than completely free-threading code (you still need to architect multi-threaded cooperation and synchronization, ideally with Queue instances but possibly with locking &c).
Code that has relinquished the GIL and not yet acquired it back MUST NOT in any way interact with the Python runtime and the objects that the Python runtime uses -- this goes for Cython just as well as for C-coded extensions. The upside of it is of course that such code can run on a separate core (until it needs to sync up or in any way communicate with the Python runtime again, of course).

Python's global interpreter lock means that only one thread can be active in the interpreter at any one time. However, once control is passed out to a C extension another thread can be active within the interpreter. Multiple threads can be created, and nothing prevents a thread from being interrupted within the middle of a critical section. N
on thread-safe code can be implemented within the interpreter, so nothing about code running within the interpreter is inherently thread safe. Code in C or Pyrex modules can still modify data structures that are visible to python code. Native code can, of course, also have threading issues with native data structures.
You can't guarantee thread safety beyond using appropriate design and synchronisation - the GIL on the python interpreter doesn't materially change this.

python threadsafe object cache

I have implemented a python webserver. Each http request spawns a new thread.
I have a requirement of caching objects in memory and since its a webserver, I want the cache to be thread safe. Is there a standard implementatin of a thread safe object cache in python? I found the following
http://freshmeat.net/projects/lrucache/
This does not look to be thread safe. Can anybody point me to a good implementation of thread safe cache in python?
Thanks!

Well a lot of operations in Python are thread-safe by default, so a standard dictionary should be ok (at least in certain respects). This is mostly due to the GIL, which will help avoid some of the more serious threading issues.
There's a list here: http://coreygoldberg.blogspot.com/2008/09/python-thread-synchronization-and.html that might be useful.
Though atomic nature of those operation just means that you won't have an entirely inconsistent state if you have two threads accessing a dictionary at the same time. So you wouldn't have a corrupted value. However you would (as with most multi-threading programming) not be able to rely on the specific order of those atomic operations.
So to cut a long story short...
If you have fairly simple requirements and aren't to bothered about the ordering of what get written into the cache then you can use a dictionary and know that you'll always get a consistent/not-corrupted value (it just might be out of date).
If you want to ensure that things are a bit more consistent with regard to reading and writing then you might want to look at Django's local memory cache:
http://code.djangoproject.com/browser/django/trunk/django/core/cache/backends/locmem.py
Which uses a read/write lock for locking.

Thread per request is often a bad idea. If your server experiences huge spikes in load it will take the box to its knees. Consider using a thread pool that can grow to a limited size during peak usage and shrink to a smaller size when load is light.

Point 1. GIL does not help you here, an example of a (non-thread-safe) cache for something called "stubs" would be
stubs = {}
def maybe_new_stub(host):
""" returns stub from cache and populates the stubs cache if new is created """
if host not in stubs:
stub = create_new_stub_for_host(host)
stubs[host] = stub
return stubs[host]
What can happen is that Thread 1 calls maybe_new_stub('localhost'), and it discovers we do not have that key in the cache yet. Now we switch to Thread 2, which calls the same maybe_new_stub('localhost'), and it also learns the key is not present. Consequently, both threads call create_new_stub_for_host and put it into the cache.
The map itself is protected by the GIL, so we cannot break it by concurrent access. The logic of the cache, however, is not protected, and so we may end up creating two or more stubs, and dropping all except one on the floor.
Point 2. Depending on the nature of the program, you may not want a global cache. Such shared cache forces synchronization between all your threads. For performance reasons, it is good to make the threads as independent as possible. I believe I do need it, you may actually not.
Point 3. You may use a simple lock. I took inspiration from https://codereview.stackexchange.com/questions/160277/implementing-a-thread-safe-lrucache and came up with the following, which I believe is safe to use for my purposes
import threading
stubs = {}
lock = threading.Lock()
def maybe_new_stub(host):
""" returns stub from cache and populates the stubs cache if new is created """
with lock:
if host not in stubs:
channel = grpc.insecure_channel('%s:6666' % host)
stub = cli_pb2_grpc.BrkStub(channel)
stubs[host] = stub
return stubs[host]
Point 4. It would be best to use existing library. I haven't found any I am prepared to vouch for yet.

You probably want to use memcached instead. It's very fast, very stable, very popular, has good python libraries, and will allow you to grow to a distributed cache should you need to:
http://www.danga.com/memcached/

I'm not sure any of these answers are doing what you want.
I have a similar problem and I'm using a drop-in replacement for lrucache called cachetools which allows you to pass in a lock to make it a bit safer.

For a thread safe object you want threading.local:
from threading import local
safe = local()
safe.cache = {}
You can then put and retrieve objects in safe.cache with thread safety.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.