I have a system which contains a Queue and two types of instances:
1. instances that push to the Queue
2. instances that pull from the Queue
I want to push to and pull from the Queue at the same time, but I'm not sure whether the Queue protects against collisions when both ends access the same memory (I couldn't find this in the documentation or in the implementation).
For example: the Queue holds zero elements, and I push and pull at the same time.
My question: if the Queue does not protect against this, is there any way to lock only the entrance or the exit of the Queue?
The Queue class knows about concurrent access and handles it correctly. If you pull from the queue (queue.get()) and there is nothing in the queue then the call will block or time out. If you push to the queue (queue.put()) then this will be correctly handled and the call will only block or time out if you have set a maximum size for the queue and it is full.
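A small sketch of that behaviour, using the Python 3 queue module (the consumer function and queue names here are just for illustration):

    import queue
    import threading

    q = queue.Queue(maxsize=1)   # a maxsize means put() can block too

    def consumer():
        item = q.get()           # blocks until the put() below supplies an item
        print('got', item)

    t = threading.Thread(target=consumer)
    t.start()
    q.put('hello')               # would block only if the queue were already full
    t.join()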
Documentation says:
The queue module implements multi-producer, multi-consumer queues. It is especially useful in threaded programming when information must be exchanged safely between multiple threads. The Queue class in this module implements all the required locking semantics. It depends on the availability of thread support in Python; see the threading module.
What is the difference between a Queue and a JoinableQueue in multiprocessing in Python? This question has already been asked here, but as some comments point out, the accepted answer is not helpful because all it does is quote the documentation. Could someone explain the difference in terms of when to use one versus the other? For example, why would one choose Queue over JoinableQueue, if JoinableQueue is pretty much the same thing except for offering the two extra methods join() and task_done()? Additionally, the other answer in the post I linked to mentions that "Based on the documentation, it's hard to be sure that Queue is actually empty," which again raises the question: why would I want to use a Queue over a JoinableQueue? What advantages does it offer?
multiprocessing patterns its queues off of queue.Queue. In that model, Queue keeps a "task count" of everything put on the queue. There are generally two ways to use this queue. Producers could just put things on the queue and ignore what happens to them in the long run. The producer may wait from time to time if the queue is full, but doesn't care if any of the things put on the queue are actually processed by the consumer. In this case the queue's task count grows, but who cares?
Alternately, the producer can "join" the queue. That means that it waits until the last task on the queue has been processed and the task count has gone to zero. But to do this, the producer needs the consumer's help. A consumer gets an item from the queue, but that doesn't decrease the task count. The consumer has to actively call task_done (typically when the task is done...) and the join will wait until every put has a task_done.
Fast forward to multiprocessing. The task_done mechanism requires communication between processes, which is relatively expensive. If you are the first kind of producer, the one that doesn't play the join game, use a multiprocessing.Queue and save a bit of CPU time. If you are the second kind, use multiprocessing.JoinableQueue. But remember that the consumer also has to play the task_done game, or the producer will hang.
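A minimal sketch of that join/task_done dance (the consumer function and the None sentinel are illustrative, not part of the API):

    import multiprocessing as mp

    def consumer(q):
        while True:
            item = q.get()
            if item is None:     # sentinel: stop consuming
                q.task_done()
                break
            # ... process item here ...
            q.task_done()        # without this, join() below hangs forever

    if __name__ == '__main__':
        q = mp.JoinableQueue()
        p = mp.Process(target=consumer, args=(q,))
        p.start()
        for i in range(5):
            q.put(i)
        q.put(None)              # tell the consumer to stop
        q.join()                 # returns once every put() has a task_done()
        p.join()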
Is there a difference in functionality between (a) a single application with 3 daemon threads that all pull from a multiprocessing Queue, and (b) 4 separate applications: one that hosts a multiprocessing Queue/Pipe, plus 3 applications that read from it?
Neither application uses blocking/synchronisation. At the end of the day, the operating system will decide when to allow a thread to run and for how long. Are there any other differences in functionality here, or are they essentially the same?
Generic Application (no synchronisation or blocking):
'Stock Market Feed' Queue: StockTrade messages (dictionaries)
'TradingStrategy' 1 Daemon Thread: Pull from queue, inspect messages and perform trades
'TradingStrategy' 1 Daemon Thread: Pull from queue, inspect messages and perform trades
'TradingStrategy' 1 Daemon Thread: Pull from queue, inspect messages and perform trades
Alternate architecture:
Feed Application (no multi-threading):
'Stock Market Feed' Queue or Pipe: StockTrade messages (dictionaries). Can a Queue be accessed from an outside process? I know a named pipe can, but can a queue?
Trading Application (no multi-threading):
'TradingStrategy': Interacts with the feed (pipe?/queue), inspects messages, and performs trades
Trading Application (no multi-threading):
'TradingStrategy': Interacts with the feed (pipe?/queue), inspects messages, and performs trades
Trading Application (no multi-threading):
'TradingStrategy': Interacts with the feed (pipe?/queue), inspects messages, and performs trades
Yes, the two options are quite different, but it gets complicated fast when trying to explain the difference. You should research and read up on the differences between a thread and a process; get that straight in your head first.
Now, given your specific scenario, and assuming that by "multiprocessing queue" you actually mean an instance of a Python Queue in one thread of a process: since the queue lives inside the same process as all the worker threads, the workers can all access and share that same Queue instance.
However, when the workers are all separate processes, they cannot reach the Queue through shared memory and will need some form of interprocess communication to access it.
In practice, I'd look at something like Redis or ZeroMQ to be your queue, then build a Python program to talk to it, and scale up as few or as many copies of it as you need.
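That said, if you only need a handful of worker processes on one machine, multiprocessing.Queue already does the interprocess plumbing (pipes plus locks) for you, provided the queue is passed to the children when they are created. A rough sketch, with illustrative names:

    import multiprocessing as mp

    def trading_worker(feed_q):
        while True:
            trade = feed_q.get()          # delivered across process boundaries
            if trade is None:             # sentinel: shut down
                break
            # ... inspect the StockTrade message and perform trades ...

    if __name__ == '__main__':
        feed_q = mp.Queue()
        workers = [mp.Process(target=trading_worker, args=(feed_q,))
                   for _ in range(3)]
        for w in workers:
            w.start()
        feed_q.put({'symbol': 'ABC', 'price': 10.0})  # a StockTrade dict
        for _ in workers:
            feed_q.put(None)              # one sentinel per worker
        for w in workers:
            w.join()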
I'm using the Python threading library. Works fine (subject to the Global Interpreter Lock, of course).
Now I have a conundrum. I have two separate sources of concurrency: either two Queues, or a Queue and a Condition. How can I wait for the first one that is ready? (They have to be separate objects, since they are owned by different modular parts of my application.)
Windows has the WaitForMultipleObjects function; is there something similar for Python concurrency primitives?
There is no existing function that I know of that does what you ask. However, there is threading.enumerate(), which returns a list of all currently alive threads, whatever their source. Once you have that list, you could iterate over it looking for the condition you want. To set a thread as a daemon, each thread has a method that can be called, like thread.setDaemon(True), before the thread is started.
I can't say for sure that this is your answer. I don't have as much experience as you apparently do, but I looked this up in a book I have, The Python Standard Library by Example by Doug Hellmann. He has 23 pages on managing concurrent operations in the section on threading, and enumerate seemed to be something that would help.
You could create a new synchronization object (an event, say), let's call it the ready_event, and one thread for each sync object you want to watch. Each thread waits for its sync object to be ready; when it becomes ready, the thread signals that via the ready_event. After you have created and started the threads, you can wait on that ready_event, as in the sketch below.
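A sketch of that pattern, assuming the two sources are a Queue and a Condition (all names here are illustrative):

    import queue
    import threading

    ready_event = threading.Event()
    results = queue.Queue()      # collects whichever source fired first

    def watch_queue(q):
        item = q.get()           # blocks until something arrives
        results.put(('queue', item))
        ready_event.set()

    def watch_condition(cond):
        with cond:
            cond.wait()          # blocks until notified
        results.put(('condition', None))
        ready_event.set()

    work_q = queue.Queue()
    cond = threading.Condition()
    threading.Thread(target=watch_queue, args=(work_q,), daemon=True).start()
    threading.Thread(target=watch_condition, args=(cond,), daemon=True).start()

    work_q.put('hello')          # demo: make one of the sources ready
    ready_event.wait()           # wakes as soon as the first source is ready
    source, payload = results.get()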
I have a lot of tasks that I'd like to execute a few at a time. The normal solution for this is a thread pool. However, my tasks need resources that only certain threads have. So I can't just farm a task out to any old thread; the thread has to have the resource the task needs.
It seems like there should be a concurrency pattern for this, but I can't seem to find it. I'm implementing this in Python 2 with multiprocessing, so answers in those terms would be great, but a generic solution is fine. In my case the "threads" are actually separate OS processes and the resources are network connections (and no, it's not a server, so (e)poll/select is not going to help). In general, a thread/process can hold several resources.
Here is a naive solution: put the tasks in a work queue and turn my thread pool loose on it. Have each thread check, "Can I do this task?" If yes, do it; if no, put it back in the queue. However, if each task can only be done by one of N threads, then I'm doing ~2N expensive, wasted accesses to a shared queue just to get one unit of work.
Here is my current thought: have a shared work queue for each resource. Farm out tasks to the matching queue. Each thread checks the queue(s) it can handle.
Ideas?
A common approach to this is not to allocate resources to threads at all, but to queue the appropriate resource in with the data, though I appreciate that this is not always possible if a resource is bound to a particular thread.
The idea of using a queue per resource with threads only popping objects from the queues containing objects it can handle may work.
It may be possible to use a semaphore+concurrentQueue array, indexed by resource, for signaling such threads and also providing a priority system, so eliminating most of the polling and wasteful requeueing. I will have to think a bit more about that - it kinda depends on how the resources map to the threads.
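For what it's worth, here is a sketch of the queue-per-resource idea (thread-based for brevity; the same shape works with multiprocessing, and all names are illustrative):

    import queue
    import threading

    # One work queue per resource; tasks are routed to the queue of the
    # resource they need, so each worker blocks on its own queue with no
    # polling and no wasteful requeueing.
    task_queues = {'conn_a': queue.Queue(), 'conn_b': queue.Queue()}

    def worker(resource_name, resource):
        q = task_queues[resource_name]
        while True:
            task = q.get()       # blocks until a matching task arrives
            if task is None:     # sentinel: shut this worker down
                break
            task(resource)       # run the task with the resource it needs

    def submit(resource_name, task):
        task_queues[resource_name].put(task)

    t = threading.Thread(target=worker, args=('conn_a', 'socket-A'))
    t.start()
    submit('conn_a', lambda res: print('using', res))
    submit('conn_a', None)       # shut the worker down
    t.join()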
Is it safe if I just use the put and get_nowait functions on a queue that is shared between threads? When do I need to use a thread lock?
The essential idea of a Queue is to share it between multiple threads.
The Queue class implements all the required locking semantics.
So you don't have to acquire lock explicitly.
http://docs.python.org/library/queue.html#module-Queue
The Queue module (called queue in Python 3) is specifically designed to work in multithreaded environments.
If that's what you're using, you don't need any additional locking.
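For example (a minimal sketch; queue is the Python 3 module name):

    import queue

    q = queue.Queue()      # safe to share between threads, no extra lock needed
    q.put('job')           # thread-safe

    try:
        item = q.get_nowait()   # thread-safe; raises queue.Empty if nothing is there
    except queue.Empty:
        item = None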