Specifically I'm talking about Python. I'm trying to hack something (just a little) by seeing an object's value without ever passing it in, and I'm wondering if it is thread safe to use thread local to do that. Also, how do you even go about doing such a thing?
No -- thread local means that each thread gets its own copy of that variable. Using it is (at least normally) thread-safe, simply because each thread uses its own variable, separate from the variables of the same name that are accessible to other threads. OTOH, thread-local variables are not (normally) useful for communication between threads.
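As a rough illustration of "each thread gets its own copy", here is a minimal sketch using threading.local. The names local_data, worker, read_value and results are made up for this example; the only library calls assumed are threading.local and threading.Thread.

```python
import threading

# One thread-local container; each thread sees only its own attributes on it.
local_data = threading.local()
results = {}  # ordinary shared dict, used only to record what each thread saw


def read_value():
    # Hypothetical helper: reads the calling thread's own copy,
    # without the value being passed in as an argument.
    return local_data.value


def worker(name, value):
    # This assignment is visible only to the current thread.
    local_data.value = value
    results[name] = read_value()


threads = [threading.Thread(target=worker, args=(f"t{i}", i * 10)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never assigned local_data.value, so it has no such attribute.
main_has_value = hasattr(local_data, "value")
```

Each worker reads back exactly the value it stored, and the main thread sees nothing, which is why thread-local storage works for "implicit arguments" within one thread but not for handing data to another thread.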
What is the best way to store thread local data in a PyQt application?
My application uses both QThreads and, through some dependencies, also native Python threads (from the threading module). My specific use case for thread local storage is mostly related to the former.
I can see some options.
Use threading.local. When called from a thread foreign to threading, threading.current_thread() returns a _DummyThread object. AFAICT, threading.local does support dummy threads, but it feels fragile. In particular, since a _DummyThread is never deleted, the store will not be cleared when the thread exits. Obviously I want no memory leaks.
When I know that the code in question will always run in a QThread, I could just store the data directly in the thread object obtained via QThread::currentThread(). No idea what would happen if called from a non-Qt thread.
Is there a QThreadStorage equivalent in PyQt? In Qt, it is a template, not a class, so I don't think it is available in PyQt.
Using Python, I am writing a nasty crawler system that crawls something from the websites of each local government; the websites total over 100. In case their web pages change, I have to use reload to do a hot update. But I am wondering if reload is thread-safe. Say I am reloading module Crawler1 in thread 1 while, at the same time, thread 2 is using Crawler1. Will thread 1's reload cause thread 2 to fail? If so, I have to use a lock or something; otherwise, I can happily do the reload without extra work. Can anyone help me out?
Is Python reload thread safe?
No.
reload() executes all the pure Python code in the module. Any pure Python step can thread-switch at any time. So, this definitely isn't safe.
reload = re-execute top level code in Crawler1.
Generally speaking without more info/code sample, you can:
Encapsulate the "operational" top level code that kicks things off, e.g. put it in a function or a class, and invoke that instead of reloading the whole module. This may involve calling/adding some cleanup function.
Use a global variable, which thread1 and thread2 will flip and be aware of to prevent stepping on each other. This doesn't scale quite as well, but can perhaps prevent/delay usage of locks.
Using locks is actually not that hard; they even support context managers:
https://docs.python.org/3/library/threading.html#with-locks
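A minimal sketch of the context-manager form, assuming a shared counter as a stand-in for whatever state needs protecting (counter, counter_lock and increment are illustrative names, not from the question):

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(times):
    global counter
    for _ in range(times):
        # The lock is acquired on entering the block and released on leaving it,
        # even if the body raises an exception.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The same pattern would presumably apply to the reload case: if both the reload call and every use of the module take the same lock, the two can no longer overlap, at the cost of serializing them.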
there is a part in the Python documentation that is unclear to me:
https://docs.python.org/3.4/library/multiprocessing.html#sharing-state-between-processes
"As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible."
But I cannot find any description above 17.2.1.5 that describes why it is best to avoid using shared state. Any ideas?
Shared state is like a global variable, but… more global.
Not only do you have to consider what parts of your code are reading and modifying the state, but also which running copy of your code is accessing it, and how. This gets even trickier when the state is mutable, i.e. can be changed.
To make sure one thread doesn't stomp on what another thread is doing you have to coordinate access to the state. That could be done using semaphores, message-passing, software transactional memory, etc.
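Of the options listed, message-passing is probably the easiest to show in a few lines. The sketch below uses threads and queue.Queue rather than multiprocessing, purely to keep it short; the names tasks, results and worker are hypothetical. The workers share no mutable state directly and only exchange messages through the two queues.

```python
import queue
import threading

tasks = queue.Queue()    # messages going to the workers
results = queue.Queue()  # messages coming back


def worker():
    while True:
        item = tasks.get()
        if item is None:  # sentinel value: no more work for this worker
            break
        results.put(item * item)


workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

for n in range(5):
    tasks.put(n)
for _ in workers:
    tasks.put(None)  # one sentinel per worker
for w in workers:
    w.join()

# Results arrive in no particular order, so sort before inspecting them.
squares = sorted(results.get() for _ in range(5))
```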
See also https://softwareengineering.stackexchange.com/questions/148108/why-is-global-state-so-evil.
I have a piece of code where I have a processing thread and a monitor thread. In the processing thread, I have a call to the collections.deque.popleft function. I wanted to know if this function releases the GIL, because I want to run my monitor thread even when the processing function is blocked on the popleft function.
Instead of answering this specific question I'll answer a different question:
What is the Global Interpreter Lock (GIL), and when will it block my program?
In short, the GIL protects the interpreter's state from becoming corrupted by concurrent threads.
For a sense of what it is for, consider the low-level implementation of dict, which somewhere has an array of keys, organized for quick lookup. When you write some code like:
myDict['foo'] = 'bar'
the Python interpreter needs to adjust its collection of keys. That might involve things like making more room for the additional key as well as adding the particular key to that array.
If multiple concurrent threads are modifying that dict, then one thread might reallocate the array while another is in the middle of modifying it, which could cause unpredictable, probably bad behavior (anything from corrupted data to a segfault, a Heartbleed-like leak of sensitive memory contents, or arbitrary code execution).
Since that's not the sort of state you can reasonably describe or prevent at the level of your Python application, the runtime goes to great lengths to prevent those sorts of problems from occurring. The way it does this is that certain parts of the interpreter, such as the modification of a dict, are surrounded by a PyGILState_Ensure()/PyGILState_Release() pair, so that critical operations always reach a consistent state.
Note, however, that the scope of this lock is very narrow; it doesn't attempt to protect against general data races. It won't protect you from writing a program with multiple threads overwriting each other's work in a common container (say, a collections.deque); it only ensures that even if you do write such a program, it won't cause the interpreter to crash, and you'll always have a valid, working deque. You can add additional application locks, as queue.Queue does, to give good concurrent semantics to your application.
Since every operation that the GIL protects is a change in the interpreter state, it never blocks on external events; since those events won't cause the interpreter state to be changed, a signaling condition variable cannot corrupt memory.
The only time you might have a problem is when you have several unblocked threads: since they are potentially all executing code in the low-level interpreter, they'll compete for the GIL, and only one thread can hold it, blocking other threads that also want to do some computation.
Unless you are writing C extensions, you probably don't need to worry about it, and unless you have multiple compute-bound threads in Python, you won't be affected by it either.
Yes -- deque is thread-safe (thanks #hemanths) http://docs.python.org/2/library/collections.html#collections.deque
No, because collections.deque is not thread-safe. Use a Queue, or make your own deque subclass.
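For what it's worth, here is a small sketch of the Queue suggestion, with made-up names (monitor, producer, monitor_ticks). It relies on two things: popleft() on an empty deque raises IndexError right away rather than waiting, while queue.Queue.get() blocks until an item arrives, and other threads keep running while it waits.

```python
import collections
import queue
import threading
import time

# popleft() on an empty deque does not wait; it raises IndexError immediately.
deque_raised = False
try:
    collections.deque().popleft()
except IndexError:
    deque_raised = True

q = queue.Queue()
monitor_ticks = []
stop = threading.Event()


def monitor():
    # Records a timestamp roughly every 10 ms until asked to stop.
    while True:
        monitor_ticks.append(time.monotonic())
        if stop.wait(0.01):
            break


def producer():
    time.sleep(0.1)  # simulate work arriving late
    q.put("job")


m = threading.Thread(target=monitor)
p = threading.Thread(target=producer)
m.start()
p.start()

# Blocks this thread until the producer delivers; the monitor keeps ticking meanwhile.
item = q.get(timeout=5)

stop.set()
m.join()
p.join()
```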
I'm having a hard time wrapping my head around Python threading, especially since the documentation explicitly tells you to RTFS at some points, instead of kindly including the relevant info. I'll admit I don't feel qualified to read the threading module. I've seen lots of dirt-simple examples, but they all use global variables, which is offensive and makes me wonder if anyone really knows when or where it's required to use them as opposed to just convenient.
In particular, I'd like to know:
In threading.Thread(target=x), is x shared or private? Does each thread have its own stack, or are all threads using the same context simultaneously?
What is the preferred way to pass mutable variables to threads? Immutable ones are obviously through Thread(args=[],kwargs={}) and that's what all the examples cover. If it's global, I'll have to hold my nose and use it, but it seems like there has to be a better way. I suppose I could wrap everything in a class and just pass the instance in, but it'd be nice to point at regular variables, too.
When do I need threading.local()? In the x above?
Do I have to subclass Thread to update data, as many examples show?
I'm used to Win32 threads and pthreads, where it's explicitly laid out in docs what is and isn't shared with different uses of threads. Those are pretty low-level, and I'd like to avoid _thread if possible to be pythonic.
I'm not sure if it's relevant, but I'm trying to thread OpenMP-style to get the hang of it - make a for loop run concurrently using a queue and some threads. It was easy with globals and locks, but now I'd like to nail down scopes for better lock use.
In threading.Thread(target=x), is x shared or private?
It is private. Each thread has its own private invocation of x.
This is similar to recursion, for example (regardless of multithreading). If x calls itself, each invocation of x gets its own "private" frame, with its own private local variables.
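A minimal sketch of that point, with an illustrative target function x and a shared dict totals used only to report back: every thread runs the same function, but the local variable total exists once per invocation, so the threads do not interfere with each other's sums.

```python
import threading

totals = {}  # shared dict, used only to collect each invocation's final result


def x(name, n):
    total = 0  # local variable: a separate one exists for every invocation of x
    for i in range(n):
        total += i
    totals[name] = total


threads = [threading.Thread(target=x, args=(f"t{n}", n)) for n in (10, 100, 1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that only the locals are private. Anything reachable from more than one thread, such as totals here or a mutable object passed in through args, is still shared.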
What is the preferred way to pass mutable variables to threads? Do I have to subclass Thread to update data?
I view the target argument as a quick shortcut, good for quick experiments, but not much else. Using it where it ought not be used leads to all the limitations you describe in your question (and the hacks you describe in the possible solutions you contemplate).
Most of the time, you'd want to subclass threading.Thread. The code creating/managing the threads would pass all mutable shared objects to your thread class's __init__, and the thread objects should keep those objects as attributes and access them when running (within their run method).
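A sketch of what that might look like for the "concurrent for loop with a queue" case from the question. SquareWorker and its attribute names are hypothetical; the shared objects (a task queue, a results dict and a lock guarding the dict) are created by the managing code and handed to each thread through __init__.

```python
import queue
import threading


class SquareWorker(threading.Thread):
    """Hypothetical worker: squares numbers taken from a shared queue."""

    def __init__(self, tasks, results, results_lock):
        super().__init__()
        # Shared mutable objects are kept as attributes, not as globals.
        self.tasks = tasks
        self.results = results
        self.results_lock = results_lock

    def run(self):
        while True:
            try:
                n = self.tasks.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            with self.results_lock:
                self.results[n] = n * n


tasks = queue.Queue()
for n in range(8):
    tasks.put(n)

results = {}
results_lock = threading.Lock()

workers = [SquareWorker(tasks, results, results_lock) for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

The queue is filled before any worker starts, so treating an empty queue as "done" is safe in this sketch; with a producer that is still running, a sentinel value would be the more robust signal.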
When do I need threading.local()?
You rarely do, so you probably don't.
I'd like to avoid _thread if possible to be pythonic
Without a doubt, avoid it.