Python dict.get() Lock - python

When I use the dictionary.get() function, does it lock the whole dictionary? I am developing a multiprocess and multithreaded program. The dictionary acts as a state table to keep track of data. I have to impose a size limit on the dictionary, so whenever the limit is hit, I have to garbage-collect the table based on timestamps. The current implementation delays add operations while garbage collection iterates through the whole table.
I will have 2 or more threads, one just to add data and one just to do garbage collection. Performance is critical in my program because it handles streaming data: whenever it receives a message, it has to look the message up in the state table, add a record if it doesn't exist yet, or else copy certain information and send it along the pipe.
I have thought about using multiprocessing to do the search and add operations concurrently, but with processes I would have to make a copy of the state table for each process, and in that case the synchronization overhead is too high. I have also read that multiprocessing.Manager().dict() locks access for every CRUD operation. I cannot spare that overhead, so my current approach uses threading.
So my question is: while one thread is doing a .get() or del dict['key'] operation on the table, will the other (insertion) thread be blocked from accessing it?
Note: I have read through most of SO's Python-dictionary-related posts, but I cannot seem to find the answer. Most people only answer that even though Python dictionary operations are atomic, it is safer to use a Lock for insertion/update. I'm handling a huge amount of streaming data, so locking every time is not ideal for me. Please advise if there is a better approach.

If the process of hashing or comparing the keys in your dictionary could invoke arbitrary Python code (basically, if the keys aren't all Python built-in types implemented in C, e.g. str, int, float, etc.), then yes, it would be possible for a race condition to occur in which the GIL is released while a bucket collision is being resolved (during the equality test), and another thread could leap in and cause the object being compared to disappear from the dict. They try to ensure it doesn't actually crash the interpreter, but it has been a source of errors in the past.
If that's a possibility (or you're on a non-CPython interpreter, where there is no GIL providing basic guarantees like this), then you should really use a lock to coordinate access. On CPython, as long as you're on modern Python 3, the cost will be fairly low: the GIL ensures only one thread is actually running at once, so most of the time your lock will be uncontended (the contention is on the GIL) and the incremental cost of taking it should be small.
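For reference, a minimal sketch of lock-per-operation access to a shared state table, assuming a simple get/put/delete interface (the class and method names are illustrative, not from the question):
import threading

class StateTable:
    """Thread-safe wrapper around a plain dict; every operation holds one lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def delete(self, key):
        with self._lock:
            self._data.pop(key, None)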
A note: You might consider using collections.OrderedDict to simplify the process of limiting the size of your table. With OrderedDict, you can implement the size limit as a strict LRU (least-recently-used) system by making additions to the table as follows:
with lock:
    try:
        try:
            odict.move_to_end(key)  # if key already existed, make sure it's "renewed"
        finally:
            odict[key] = value  # set new value whether or not key already existed
    except KeyError:
        # move_to_end raising KeyError means a newly added key, so we might
        # have grown larger than the limit
        if len(odict) > maxsize:
            odict.popitem(last=False)  # pops the oldest item
and usage done as:
with lock:
    # move_to_end is optional; if using a key means it should live longer, do it;
    # if only setting a key should refresh it, omit move_to_end
    odict.move_to_end(key)
    return odict[key]
This does need a lock, but it also reduces the work for garbage collection when it grows too large from "check every key" (O(n) work) to "pop the oldest item off without looking at anything else" (O(1) work).
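Putting the two snippets above together, a rough sketch of a bounded, thread-safe LRU table could look like the following; the class name and the default maxsize are mine, not part of the answer:
import threading
from collections import OrderedDict

class BoundedLRUTable:
    def __init__(self, maxsize=100_000):
        self._lock = threading.Lock()
        self._odict = OrderedDict()
        self._maxsize = maxsize

    def add(self, key, value):
        with self._lock:
            try:
                try:
                    self._odict.move_to_end(key)   # renew an existing key
                finally:
                    self._odict[key] = value       # set the value either way
            except KeyError:
                # key was new, so the table may have exceeded the limit
                if len(self._odict) > self._maxsize:
                    self._odict.popitem(last=False)  # evict the oldest entry

    def get(self, key):
        with self._lock:
            # treat a read as a "use"; raises KeyError for missing keys, like a plain dict
            self._odict.move_to_end(key)
            return self._odict[key]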

A lock is used to avoid race conditions, so that no two threads can change the dict at the same time. It is advisable to use one; otherwise you might run into a race condition that causes your program to fail. A mutex lock is enough to coordinate the two threads.

Related

what's the best practice for accessing shared data with python multi-threading

In Python multi-threading, there are some atomic types that can be accessed
by multiple threads without protection (list, dict, etc.). There are also some types that need to be protected by a lock.
My question is:
Where can I find an official document that lists all atomic types? I can google some answers, but they are not "official" and may be out of date.
Some books suggest that we should protect all shared data with locks, because atomic types may become non-atomic and we shouldn't rely on them. Is this correct?
Since locks surely have overhead, is this overhead negligible even in a big program?
Locks are used for making an operation atomic. This means only one thread at a time can access some resource. Using many locks causes your application to lose the benefit of threading, since only one thread can access the resource at a time.
If you think about it, it doesn't make much sense: it will make your program slower, because Python still needs to manage and context-switch between the threads.
When using threads, you should try to minimize the number of locks as much as possible. Try to use local variables whenever possible. Make your function do some work and return a value instead of updating an existing one.
Then you can create a Queue and collect the results.
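A minimal sketch of that pattern (the work being done and the thread count are placeholders):
import threading
import queue

def work(n):
    # purely local computation: no shared state, so no lock is needed
    return sum(range(n))

results = queue.Queue()   # queue.Queue is already thread-safe

def worker(n):
    results.put(work(n))

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

collected = [results.get() for _ in threads]
print(collected)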
Besides locks, there are semaphores. These are basically locks that a limited number of threads can hold at the same time:
A semaphore manages an internal counter which is decremented by each acquire() call and incremented by each release() call. The counter can never go below zero; when acquire() finds that it is zero, it blocks, waiting until some other thread calls release().
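For instance, a small sketch of a BoundedSemaphore capping concurrent access (the limit of 2 and the sleep are arbitrary choices of mine):
import threading
import time

sem = threading.BoundedSemaphore(2)   # at most two threads inside the block at a time

def limited_worker(i):
    with sem:
        # only two threads can be in this section concurrently
        time.sleep(0.1)
        print(f'worker {i} done')

threads = [threading.Thread(target=limited_worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()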
Python has good documentation for the threading module.
Here is a small example of a dummy function tested using single thread vs 3 threads. Pay attention to the impact Lock makes on the running time:
threads (no locks) duration: 1.0949997901
threads (with locks) duration: 3.1289999485
single thread duration: 3.09899997711
import threading
import time

lock = threading.Lock()

def work():
    x = 0
    for i in range(100):
        x += i
    lock.acquire()
    print('acquired lock, do some calculations')
    time.sleep(1)
    print(x)
    lock.release()
    print('lock released')
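The timing harness that produced the numbers above isn't shown in the answer; a rough reconstruction of mine follows (it reuses work() and lock from the snippet above, the thread count of 3 matches the quoted output, and the "no locks" figure would come from a variant of work() with the acquire/release lines removed):
start = time.time()
threads = [threading.Thread(target=work) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('threads (with locks) duration:', time.time() - start)

start = time.time()
for _ in range(3):
    work()
print('single thread duration:', time.time() - start)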
I think you are looking for this link.
From above link :
An operation acting on shared memory is atomic if it completes in a single step relative to other threads. When an atomic store is performed on a shared variable, no other thread can observe the modification half-complete. When an atomic load is performed on a shared variable, it reads the entire value as it appeared at a single moment in time. Non-atomic loads and stores do not make those guarantees.
Not every manipulation of a list is an atomic operation, so extra care needs to be taken to make it thread-safe using a Lock, Event, Condition, Semaphore, etc.
For example, you can check this answer, which explains how lists are thread-safe.
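As a quick illustration of the difference between a single atomic operation and a compound read-modify-write, here is a toy example of mine (not from the linked answer); the unlocked variant can lose updates:
import threading

counter = [0]
lock = threading.Lock()

def unsafe_increment(n):
    for _ in range(n):
        counter[0] += 1          # read-modify-write across several bytecodes: not atomic

def safe_increment(n):
    for _ in range(n):
        with lock:
            counter[0] += 1      # the lock makes the whole read-modify-write step atomic

threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter[0])   # 400000 with the lock; swapping in unsafe_increment can print less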

Why is this python code not thread-safe?

I handed this in as part of a school assignment and the person who marked it mentioned that this section was not thread-safe.
The assignment was to create a multithreaded socket server in python that accepted a number and returned the Fibonacci value of that number. My approach was to memoize the calculations by sharing a dictionary between each of the threads.
Here is the code (with error handling and such removed for brevity)
from socketserver import ThreadingMixIn, TCPServer, BaseRequestHandler

class FibonacciThreadedTCPServer(ThreadingMixIn, TCPServer):
    def __init__(self, server_address):
        TCPServer.__init__(self, server_address, FibonacciThreadedTCPRequestHandler, bind_and_activate=True)
        # this dictionary will be shared between all request handlers
        self.fib_dict = {0: 0, 1: 1, 2: 1}

class FibonacciThreadedTCPRequestHandler(BaseRequestHandler):
    def handle(self):
        data = self.request.recv(1024).strip()
        num = int(data)
        result = self.calc_fib(self.server.fib_dict, num)
        ret = bytes(str(result) + '\n', 'ascii')
        self.request.sendall(ret)

    @staticmethod
    def calc_fib(fib_dict, n):
        """
        Calculates the fibonacci value of n using a shared lookup table and a linear calculation.
        """
        length = len(fib_dict)
        while length <= n:
            fib_dict[length] = fib_dict[length - 1] + fib_dict[length - 2]
            length = len(fib_dict)
        return fib_dict[n]
I understand that there are reads and writes occurring simultaneously in the calc_fib method, and normally that means the code is not thread-safe. However, in this instance I think it's possible to prove that the code will always provide predictable results.
Is the fact that reads and writes can occur simultaneously enough for it not to be considered thread-safe? Or is something considered thread-safe if it always returns a reliable result?
Why I think this code will always produce reliable results:
A read will never occur on any given index in the dictionary until a write has occurred there.
Any subsequent writes to any given index will contain the same number as the previous writes, so regardless of when the read/write sequence occurs it will always receive the same data.
I have tested this by adding random sleeps between the operations and making requests with several hundred threads simultaneously, and the right answer was always returned during my tests.
Any thoughts or criticism would be appreciated. Thanks.
In this particular case, the GIL should keep your code safe, because:
CPython built-in data structures are protected from actual damage (as opposed to merely incorrect behavior) by the GIL (dict in particular needs this guarantee, because class instances and non-local scope usually use dicts for attribute/name lookup, and without the GIL, simply reading the values would be fraught with danger)
You're updating a cached value of the length then using it for the next set of operations instead of rechecking the length during the mutation; this can lead to repeated work (multiple threads see the old length and compute the new value repeatedly) but since the key is always set to the same value, it doesn't matter if they each set it independently
You never delete from your cache (if you did, that cached length would bite you)
So in CPython, this should be fine. I can't make any guarantees for other Python interpreters though; without the GIL, if they implement their dict without internal locking, it's wholly possible a rehash operation triggered by a write in one thread could cause another thread to read from a dict in an inconsistent/unusable state.
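If you'd rather not rely on the GIL (for example, to stay safe on other interpreters), a hedged variant of calc_fib guarded by a module-level lock might look like this; the function and lock names are mine:
import threading

fib_lock = threading.Lock()   # shared by all handler threads

def calc_fib_locked(fib_dict, n):
    """Same linear fill as calc_fib, but only one thread extends the table at a time."""
    with fib_lock:
        length = len(fib_dict)
        while length <= n:
            fib_dict[length] = fib_dict[length - 1] + fib_dict[length - 2]
            length = len(fib_dict)
        return fib_dict[n]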
First of all, why do you think that dictionaries are thread-safe? I did a quick search of the Python 3 documentation (I'm a Python novice myself), and I cannot find any reassurance that two unsynchronized threads can safely update the same dictionary without corrupting its internals and maybe crashing the program.
I've been writing multi-threaded code in other languages since 1980-something, and I've learned never to trust that something is thread safe just because it acts that way when I test it. I want to see documentation that it's supposed to be thread safe. Otherwise, I throw a mutex around it.
Second of all, you are assuming that fib_dict[length - 1] and fib_dict[length - 2] will be valid. My experience with other programming languages says not to assume it. In other programming languages, (e.g., Java), when threads share data without synchronization, it is possible for one thread to see variable updates happen in a different order from the order in which some other thread performed them. E.g., it is theoretically possible for a Java thread that accesses a Map with no synchronization to see the size() of the Map increase before it sees the new values actually appear in the map. I will assume that something similar could happen in Python until somebody shows me official documentation that says otherwise.

Will the collections.deque "pop" methods release GIL?

I have a piece of code where I have a processing thread and a monitor thread. In the processing thread, I have a call to the collections.deque.popleft function. I wanted to know whether this function releases the GIL, because I want to run my monitor thread even when the processing thread is blocked on popleft.
Instead of answering this specific question I'll answer a different question:
What is the Global Interpreter Lock (GIL), and when will it block my program?
In short, the GIL protects the interpreter's state from becoming corrupted by concurrent threads.
For a sense of what it is for, consider the low-level implementation of dict, which somewhere has an array of keys, organized for quick lookup. When you write some code like:
myDict['foo'] = 'bar'
the python interpreter needs to adjust its collection of keys. That might involve things like making more room for the additional key as well as adding the particular key to that array.
If multiple, concurrent threads are modifying that dict, then one thread might reallocate the array while another is in the middle of modifying it, which could cause unpredictable, probably bad behavior (anything from corrupted data to a segfault, a Heartbleed-like leak of sensitive memory contents, or arbitrary code execution).
Since that's not the sort of state you can reasonably describe or prevent at the level of your Python application, the run-time goes to great lengths to prevent those sorts of problems from occurring. The way it does this is that certain parts of the interpreter, such as the modification of a dict, are surrounded by a PyGILState_Ensure()/PyGILState_Release() pair, so that critical operations always reach a consistent state.
Note, however, that the scope of this lock is very narrow; it doesn't attempt to protect against general data races, and it won't protect you from writing a program in which multiple threads overwrite each other's work in a common container (say, a collections.deque). It only guarantees that even if you do write such a program, it won't cause the interpreter to crash; you'll always have a valid, working deque. You can add additional application-level locks, as queue.Queue does, to give your application good concurrent semantics.
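For example, a minimal sketch of handing work between threads with queue.Queue rather than a bare deque (the sentinel convention is my own choice):
import threading
import queue

q = queue.Queue()
SENTINEL = object()   # marker for "no more work"

def producer():
    for i in range(5):
        q.put(i)
    q.put(SENTINEL)

def consumer():
    while True:
        item = q.get()      # blocks until an item is available
        if item is SENTINEL:
            break
        print('processed', item)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()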
Since every operation that the GIL protects is a change in the interpreter state, it never blocks on external events; since those events won't cause the interpreter state to be changed, a signaling condition variable cannot corrupt memory.
The only time you might have a problem is when you have several unblocked threads: since they are potentially all executing code in the low-level interpreter, they compete for the GIL, and only one thread can hold it, blocking the other threads that also want to do some computation.
Unless you are writing C extensions, you probably don't need to worry about it, and unless you have multiple compute-bound threads, you won't be affected by it either.
Yes -- deque is thread-safe (thanks @hemanths): http://docs.python.org/2/library/collections.html#collections.deque
No, because collections.deque is not thread-safe. Use a Queue, or make your own deque subclass.

Implementing a Timer in Python

General Overview
I have a medium-size Django project
I have a bunch of prefix trees in memory (as opposed to the DB)
The nodes of these trees represent entities/objects that are subject to a timeout, i.e., I need to time out these nodes at various points in time
Design:
Essentially, I needed a Timer construct that lets me fire a resettable one-shot timer and give it a callback that can perform some operation on the entity creating the timer, which in this case is a node of the tree.
After looking through the various options, I couldn't find anything that I could natively use (like some django app). The Timer object in Python is not suitable for this since it won't scale/perform. Thus I decided to write my own timer based on:
A sorted list of time-delta objects that holds the time-horizon
A mechanism to trigger the "tick"
Implementation Choices:
Went with a wrapper around Bisect for the sorted delta list:
http://code.activestate.com/recipes/577197-sortedcollection/
Went with celery to provide the tick - A granularity of 1 minute, where the worker would trigger the timer_tick function provided by my Timer class.
The timer_tick essentially should go through the sorted list, decrementing the head node every tick. Then if any nodes have ticked down to 0, kick off the callback and remove those nodes from the sorted timer list.
Creating a timer involves instantiating a Timer object, which returns the id of the object. This id is stored in the DB and associated with an entry in the DB that represents the entity creating the timer.
Additional Data Structures:
In order to track the Timer instances (which get instantiated for each timer creation) I have a WeakRef Dictionary that maps the id to obj
So essentially, I have 2 data-structures in memory of my main Django project.
Problem Statement:
Since the celery worker needs to walk the timer list and also potentially modify the id2obj map, it looks like I need to find a way to share state between my celery worker and the main process.
Going through SO/Google, I find the following suggestions
Manager
Shared Memory
Unfortunately, the bisect wrapper doesn't lend itself very well to pickling and/or state sharing. I tried the Manager approach by creating a dict and trying to embed the sorted list within it... It came out with an error (kind of expected, I guess, since the memory held by the sorted list is not shared and embedding it within a "shared" memory object will not work).
Finally...Question:
Is there a way I can share my SortedCollection and WeakRef Dict with the worker thread?
Alternate solution:
How about keeping the worker thread simple: having it write to the DB on every tick, and then using a post-DB-write signal to get notified in the main process and execute the processing of expired timers there. Of course, the con is that I lose parallelisation.
Let's start with some comments on your existing implementation:
Went with a wrapper around Bisect for the sorted delta list: http://code.activestate.com/recipes/577197-sortedcollection/
While this gives you O(1) pops (as long as you keep the list in reverse time order), it makes each insert O(N) (and likewise for less common operations like deleting arbitrary jobs if you have a "cancel" API). Since you're doing exactly as many inserts as pops, this means the whole thing is algorithmically no better than an unsorted list.
Replacing this with a heapq (that's exactly what heaps are for) gives you O(log N) inserts. (Note that Python's heapq doesn't have a peek function, but that's because heap[0] already gives you the smallest element, so you don't need one.)
If you need to make other operations (cancel, iterate non-destructively, etc.) O(log N) as well, you want a search tree; look at blist and bintrees on PyPI for some good ones.
Went with celery to provide the tick - A granularity of 1 minute, where the worker would trigger the timer_tick function provided by my Timer class. The timer_tick essentially should go through the sorted list, decrementing the head node every tick. Then if any nodes have ticked down to 0, kick off the callback and remove those nodes from the sorted timer list.
It's much nicer to just keep the target times instead of the deltas. With target times, you just have to do this:
while q.peek().timestamp <= now():
    process(q.pop())
Again, that's O(1) rather than O(N), and it's a lot simpler, and it treats the elements on the queue as immutable, and it avoids any possible problems with iterations taking longer than your tick time (probably not a problem with 1-minute ticks…).
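A rough sketch of that heapq-plus-target-times approach (now(), process(), and the job payloads here are placeholders of mine, not the poster's API):
import heapq
import time

def now():
    return time.monotonic()

def process(job):
    print('expired:', job)          # placeholder for the real callback

timer_heap = []                     # heap of (target_time, job); timer_heap[0] is the earliest

def schedule(delay_seconds, job):
    heapq.heappush(timer_heap, (now() + delay_seconds, job))   # O(log N) insert

def tick():
    # pops only the jobs whose deadline has passed; no full scan of the table
    while timer_heap and timer_heap[0][0] <= now():
        _, job = heapq.heappop(timer_heap)
        process(job)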
Now, on to your main question:
Is there a way I can share my SortedCollection
Yes. If you just want a priority heap of (timestamp, id) pairs, you can fit that into a multiprocessing.Array just as easily as a list, except for the need to keep track of length explicitly. Then you just need to synchronize every operation, and… that's it.
If you're only ticking once/minute, and you expect to be busy more often than not, you can just use a Lock to synchronize, and have the schedule-worker(s) tick itself.
But honestly, I'd drop the ticks completely and just use a Condition—it's more flexible, and conceptually simpler (even if it's a bit more code), and it means you're using 0% CPU when there's no work to be done and responding quickly and smoothly when you're under load. For example:
def schedule_job(timestamp, job):
    job_id = add_job_to_shared_dict(job)  # see below
    with scheduler_condition:
        scheduler_heap.push((timestamp, job_id))  # push the id; the job itself lives in the shared dict
        scheduler_condition.notify_all()

def scheduler_worker_run_once():
    with scheduler_condition:
        while True:
            top = scheduler_heap.peek()
            if top is not None:
                delay = top[0] - now()
                if delay <= 0:
                    break
                scheduler_condition.wait(delay)
            else:
                scheduler_condition.wait()
        top = scheduler_heap.pop()
    if top is not None:
        job = pop_job_from_shared_dict(top[1])
        process_job(job)
Anyway, that brings us to the weakdict full of jobs.
Since a weakdict is explicitly storing references to in-process objects, it doesn't make any sense to share it across processes. What you want to store are immutable objects that define what the jobs actually are, not the mutable jobs themselves. Then it's just a plain old dict.
But still, a plain old dict is not an easy thing to share across processes.
The easy way to do that is to use a dbm database (or a shelve wrapper around one) instead of an in-memory dict, synchronized with a Lock. But this means re-flushing and re-opening the database every time anyone wants to change it, which may be unacceptable.
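A minimal sketch of the shelve-plus-Lock idea (the file name and string keys are arbitrary, and in a real program the Lock would have to be created once in the parent and passed to the worker processes rather than recreated in each one):
import shelve
from multiprocessing import Lock

DB_PATH = 'jobs.db'      # arbitrary file name
db_lock = Lock()         # must be shared with the workers, not recreated per process

def put_job(job_id, job):
    with db_lock:
        with shelve.open(DB_PATH) as db:   # re-opened (and flushed) on every change
            db[str(job_id)] = job

def pop_job(job_id):
    with db_lock:
        with shelve.open(DB_PATH) as db:
            return db.pop(str(job_id), None)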
Switching to, say, a sqlite3 database may seem like overkill, but it may be a whole lot simpler.
On the other hand… the only operations you actually have here are "map the next id to this job and return the id" and "pop and return the job specified by this id". Does that really need to be a dict? The keys are integers, and you control them. An Array, plus a single Value for the next key, and a Lock, and you're almost done. The problem is that you need some kind of scheme for key overflow. Instead of just next_id += 1, you have to roll over, and check for already-used slots:
with lock:
    next_id += 1
    if next_id == size: next_id = 0
    if arr[next_id] is None:
        arr[next_id] = job
        return next_id
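Setting that scheme up could look roughly like the sketch below; note one substitution of mine: a raw multiprocessing.Array can only hold ctypes values, so the job slots here use a Manager().list() instead, and the capacity of 1024 is arbitrary:
import multiprocessing

SIZE = 1024

def make_shared_state():
    manager = multiprocessing.Manager()
    arr = manager.list([None] * SIZE)          # job slots, indexed by id
    next_id = multiprocessing.Value('i', 0)    # next candidate id
    lock = multiprocessing.Lock()
    return arr, next_id, lock

def add_job(arr, next_id, lock, job):
    with lock:
        for _ in range(SIZE):                  # roll over and skip already-used slots
            next_id.value = (next_id.value + 1) % SIZE
            if arr[next_id.value] is None:
                arr[next_id.value] = job
                return next_id.value
        raise RuntimeError('job table is full')

def pop_job(arr, lock, job_id):
    with lock:
        job = arr[job_id]
        arr[job_id] = None
        return job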
Another option is to just store the dict in the main process, and use a Queue to make other processes query it.

python threadsafe object cache

I have implemented a python webserver. Each http request spawns a new thread.
I have a requirement of caching objects in memory, and since it's a webserver, I want the cache to be thread-safe. Is there a standard implementation of a thread-safe object cache in Python? I found the following
http://freshmeat.net/projects/lrucache/
This does not look to be thread safe. Can anybody point me to a good implementation of thread safe cache in python?
Thanks!
Well a lot of operations in Python are thread-safe by default, so a standard dictionary should be ok (at least in certain respects). This is mostly due to the GIL, which will help avoid some of the more serious threading issues.
There's a list here: http://coreygoldberg.blogspot.com/2008/09/python-thread-synchronization-and.html that might be useful.
Though the atomic nature of those operations just means that you won't end up with an entirely inconsistent state if two threads access a dictionary at the same time; you won't get a corrupted value. However, you will not (as with most multi-threaded programming) be able to rely on the specific order of those atomic operations.
So to cut a long story short...
If you have fairly simple requirements and aren't too bothered about the ordering of what gets written into the cache, then you can use a dictionary and know that you'll always get a consistent/not-corrupted value (it just might be out of date).
If you want to ensure that things are a bit more consistent with regard to reading and writing then you might want to look at Django's local memory cache:
http://code.djangoproject.com/browser/django/trunk/django/core/cache/backends/locmem.py
Which uses a read/write lock for locking.
Thread per request is often a bad idea. If your server experiences huge spikes in load, it will bring the box to its knees. Consider using a thread pool that can grow to a limited size during peak usage and shrink to a smaller size when load is light.
Point 1. The GIL does not help you here. An example of a (non-thread-safe) cache for something called "stubs" would be:
stubs = {}

def maybe_new_stub(host):
    """ returns stub from cache and populates the stubs cache if new is created """
    if host not in stubs:
        stub = create_new_stub_for_host(host)
        stubs[host] = stub
    return stubs[host]
What can happen is that Thread 1 calls maybe_new_stub('localhost'), and it discovers we do not have that key in the cache yet. Now we switch to Thread 2, which calls the same maybe_new_stub('localhost'), and it also learns the key is not present. Consequently, both threads call create_new_stub_for_host and put it into the cache.
The map itself is protected by the GIL, so we cannot break it by concurrent access. The logic of the cache, however, is not protected, and so we may end up creating two or more stubs, and dropping all except one on the floor.
Point 2. Depending on the nature of the program, you may not want a global cache. Such shared cache forces synchronization between all your threads. For performance reasons, it is good to make the threads as independent as possible. I believe I do need it, you may actually not.
Point 3. You may use a simple lock. I took inspiration from https://codereview.stackexchange.com/questions/160277/implementing-a-thread-safe-lrucache and came up with the following, which I believe is safe to use for my purposes
import threading

stubs = {}
lock = threading.Lock()

def maybe_new_stub(host):
    """ returns stub from cache and populates the stubs cache if new is created """
    with lock:
        if host not in stubs:
            channel = grpc.insecure_channel('%s:6666' % host)
            stub = cli_pb2_grpc.BrkStub(channel)
            stubs[host] = stub
        return stubs[host]
Point 4. It would be best to use an existing library, but I haven't found any I am prepared to vouch for yet.
You probably want to use memcached instead. It's very fast, very stable, very popular, has good python libraries, and will allow you to grow to a distributed cache should you need to:
http://www.danga.com/memcached/
I'm not sure any of these answers are doing what you want.
I have a similar problem and I'm using a drop-in replacement for lrucache called cachetools which allows you to pass in a lock to make it a bit safer.
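A small sketch of that setup (the maxsize and the cached function are placeholders of mine):
import threading
from cachetools import LRUCache, cached

cache_lock = threading.RLock()

@cached(cache=LRUCache(maxsize=1024), lock=cache_lock)
def load_object(key):
    # stand-in for whatever expensive computation or I/O is being cached
    return {'key': key}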
For a thread safe object you want threading.local:
from threading import local
safe = local()
safe.cache = {}
You can then put and retrieve objects in safe.cache with thread safety.
