multiprocessing.Queue as attribute of Queue.Queue child - python

I'm trying to figure out what the following module is doing.
import Queue
import multiprocessing
import threading

class BufferedReadQueue(Queue.Queue):

    def __init__(self, lim=None):
        self.raw = multiprocessing.Queue(lim)
        self.__listener = threading.Thread(target=self.listen)
        self.__listener.setDaemon(True)
        self.__listener.start()
        Queue.Queue.__init__(self, lim)

    def listen(self):
        try:
            while True:
                self.put(self.raw.get())
        except:
            pass

    @property
    def buffered(self):
        return self.qsize()
It is only instantiated once in the calling code, and the .raw attribute, a multiprocessing.Queue, gets passed to another class, which appears to inherit from multiprocessing.Process.
So as far as I can see, only an attribute of BufferedReadQueue is being used as a queue, not the class (nor an instance of it) itself.
What would be a reason that BufferedReadQueue inherits from Queue.Queue and not just object, if it's not actually being used as a queue?

It looks like BufferedReadQueue is meant to be used as a way to convert the read end of a multiprocessing.Queue into a normal Queue.Queue. Note this in __init__:
self.__listener = threading.Thread(target=self.listen)
self.__listener.setDaemon(True)
self.__listener.start()
This starts up a listener thread, which just constantly tries to get items from the internal multiprocessing.Queue, and then puts all those items to self. It looks like the use-case is something like this:
def func(queue):
    queue.put('stuff')

...

buf_queue = BufferedReadQueue()
proc = multiprocessing.Process(target=func, args=(buf_queue.raw,))
proc.start()

out = buf_queue.get()  # Only get() calls in the parent
Now, why would you do this instead of just using the multiprocessing.Queue directly? Probably because multiprocessing.Queue has some shortcomings that Queue.Queue doesn't. For example qsize(), which this BufferedReadQueue uses, is not reliable with multiprocessing.Queue:
Return the approximate size of the queue. Because of multithreading/multiprocessing semantics, this number is not reliable.
Note that this may raise NotImplementedError on Unix platforms like Mac OS X where sem_getvalue() is not implemented.
It's also possible to introspect a Queue.Queue, and peek at its contents without popping them. This isn't possible with a multiprocessing.Queue.
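For example, something like this works with Queue.Queue (the peeking relies on its internal queue attribute, a deque, so treat it as an illustration rather than a guaranteed API):

import Queue

q = Queue.Queue()
q.put('a')
q.put('b')

print q.qsize()      # 2 -- reliable for Queue.Queue
print q.queue[0]     # 'a' -- peek at the head without removing it
print list(q.queue)  # ['a', 'b'] -- inspect everything still queued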

Related

Custom class inheritance from threading.Thread

I am writing a class that has many functionalities (and therefore methods), but I require this class to run inside a thread (the class opens a subprocess). I want to use the common way of declaring thread-based classes:
class HiStackOverflow(threading.Thread):
    # Somethings...
However, as I said, this class of mine has many pseudo-private, regular and static methods. And as I declare them, I want to avoid overriding some necessary threading.Thread method by mistake.
Well, I can always check dir(threading.Thread) and see whether any method names overlap, but that seems like an inappropriate way to handle this, and it may become impractical as the method count increases.
My question is: is this kind of implementation feasible? If not, how should I handle it? Should I write some wrapper class to act as the thread handler?
Thanks in advance.
If you're worried about namespace clashes between your class and threading.Thread, I would definitely suggest that you use composition rather than inheritance (or keep the two functionalities separate entirely). There shouldn't be significant overhead to just wrapping the couple threading methods that you need and then name clashes become a non-issue.
It also more cleanly will separate the functionality of your class from the functionality provided by threading. That's likely to be a win in the long run for understanding your code.
There isn't much benefit from inheriting from Thread. You could have a factory method that creates the thread or even have its __init__ do it.
import threading
import time

class MyClass:
    def __init__(self):
        self._thread = threading.Thread(target=self.run)
        self._thread.start()

    def run(self):
        for i in range(5):
            print('worker thread', i)
            time.sleep(.5)

    def join(self):
        self._thread.join()

my_obj = MyClass()
for i in range(3):
    print('main thread', i)
    time.sleep(.5)
my_obj.join()
print('done')
There seem to be some ideas conflated in this phrase:
but I require this class to run inside a thread(class opens a subprocess)
Classes don't "run". You can start a new thread which executes some class method, or an instance method. That class doesn't have to inherit from Thread. It doesn't even need a reference to the running thread. You just start to execute some function in a new thread and you're done.
Subprocesses are unrelated to threads. You don't need one to do the other.
If you're worried about overriding something, check the documentation (https://docs.python.org/3/library/threading.html#thread-objects). Otherwise, if you want to keep the reference to the thread, you can always do:
from threading import Thread

class HiStackoverflow:
    def run(self):
        self.thread = Thread(target=self.entry_point)
        self.thread.start()

    def entry_point(self):
        ...

Copy member functions as a way of providing an interface

Is this good Python practice?
import threading
import Queue

class Poppable(threading.Thread):
    def __init__(self):
        super(Poppable, self).__init__()
        self._q = Queue.Queue()

        # provide a limited subset of the Queue interface to clients
        self.qsize = self._q.qsize
        self.get = self._q.get

    def run(self):
        # <snip> -- do stuff that puts new items onto self._q
        # this is why clients don't need access to put functionality
Does this approach of "promoting" member's functions up to the containing class's interface violate the style, or Zen, of Python?
Mainly I'm trying to contrast this approach with the more standard one that would involve declaring wrapper functions normally:
def qsize(self):
    return self._q.qsize()

def get(self, *args):
    return self._q.get(*args)
I don't think this is Python-specific. In general, it is good OOP practice: you expose just the functions the client needs to know about, hiding the internals of the contained queue. This is a typical approach when wrapping an object, and it is fully in line with the principle of least knowledge.
If the client had to call self._q.qsize instead of self.qsize, you could not easily swap _q for a different data type later, one that lacks a qsize method, should that become necessary. So your approach makes the object more open to future change.
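For instance, here is a hypothetical sketch of that kind of later change: the backing store becomes a plain deque, which has no qsize()/get(), but clients are unaffected because they only ever use the promoted names. (Note a bare deque does not give you Queue.Queue's blocking get semantics; this is only to illustrate the adaptation point.)

import collections
import threading

class Poppable(threading.Thread):
    def __init__(self):
        super(Poppable, self).__init__()
        self._q = collections.deque()  # hypothetical new backing type
        # adapt it behind the same public names, so clients are unaffected
        self.qsize = lambda: len(self._q)
        self.get = self._q.popleft

    def run(self):
        pass  # <snip> -- still puts new items onto self._q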

how to subclass multiprocessing.JoinableQueue

I am trying to subclass multiprocessing.JoinableQueue so I can keep track of jobs that were skipped instead of completed. I am using a JoinableQueue to pass jobs to a set of multiprocessing.Process's and I have a threading.Thread populating the queue. Here is my implementation attempt:
import multiprocessing

class InputJobQueue(multiprocessing.JoinableQueue):

    def __init__(self, max_size):
        super(InputJobQueue, self).__init__(0)
        self._max_size = max_size
        self._skipped_job_count = 0

    def isFull(self):
        return self.qsize() >= self._max_size

    def taskSkipped(self):
        self._skipped_job_count += 1
        self.task_done()
However, I run into this issue documented here:
    class InputJobQueue(multiprocessing.JoinableQueue):
TypeError: Error when calling the metaclass bases
    function() argument 1 must be code, not str
Looking at the code in multiprocessing I see that the actual class is in multiprocessing.queues. So I try to extend that class:
import multiprocessing.queues

class InputJobQueue(multiprocessing.queues.JoinableQueue):

    def __init__(self, max_size):
        super(InputJobQueue, self).__init__(0)
        self._max_size = max_size
        self._skipped_job_count = 0

    def isFull(self):
        return self.qsize() >= self._max_size

    def taskSkipped(self):
        self._skipped_job_count += 1
        self.task_done()
But I get inconsistent results: sometimes my custom attributes exist, other times they don't. E.g. the following error is reported in one of my worker Processes:
AttributeError: 'InputJobQueue' object has no attribute '_max_size'
What am I missing to subclass multiprocessing.JoinableQueue?
With multiprocessing, the way objects like JoinableQueue are magically shared between processes is by explicitly sharing the core sync objects, and pickling the "wrapper" stuff to pass over a pipe.
If you understand how pickling works, you can look at the source to JoinableQueue and see that it's using __getstate__/__setstate__. So, you just need to override those to add your own attributes. Something like this:
def __getstate__(self):
    return super(InputJobQueue, self).__getstate__() + (self._max_size,)

def __setstate__(self, state):
    super(InputJobQueue, self).__setstate__(state[:-1])
    self._max_size = state[-1]
I'm not promising this will actually work, since clearly these classes were not designed to be subclassed (the proposed fix for the bug you referenced is to document that the classes can't be subclassed and find a way to make the error messages nicer…). But it should get you past the particular problem you're having here.
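Putting that together, the subclass might look something like this. This is an untested sketch that assumes Python 2's multiprocessing.queues.JoinableQueue, whose __getstate__/__setstate__ pass plain tuples; it also carries _skipped_job_count along, although each process still ends up with its own copy of these values.

import multiprocessing.queues

class InputJobQueue(multiprocessing.queues.JoinableQueue):
    def __init__(self, max_size):
        super(InputJobQueue, self).__init__(0)
        self._max_size = max_size
        self._skipped_job_count = 0

    # extra attributes have to ride along when the queue is pickled to a worker
    def __getstate__(self):
        extra = (self._max_size, self._skipped_job_count)
        return super(InputJobQueue, self).__getstate__() + extra

    def __setstate__(self, state):
        super(InputJobQueue, self).__setstate__(state[:-2])
        self._max_size, self._skipped_job_count = state[-2:]

    def isFull(self):
        return self.qsize() >= self._max_size

    def taskSkipped(self):
        self._skipped_job_count += 1
        self.task_done()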
You're trying to subclass a type that isn't meant to be subclassed. This requires you to depend on the internals of its implementation in two different ways (one of which is arguably a bug in the stdlib, but the other isn't). And this isn't necessary.
If the actual type is hidden under the covers, no code can actually expect you to be a formal subtype; as long as you duck-type as a queue, you're fine. And you can do that by delegating to a member:
class InputJobQueue(object):
    def __init__(self, max_size):
        self._jq = multiprocessing.JoinableQueue(0)
        self._max_size = max_size
        self._skipped_job_count = 0

    def __getattr__(self, name):
        return getattr(self._jq, name)

    # your overrides/new methods
(It would probably be cleaner to explicitly delegate only the documented methods of JoinableQueue than to __getattr__-delegate everything, but in the interests of brevity, I did the shorter version.)
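For example, explicit delegation of just the handful of methods the workers actually use might look like this (a sketch; the particular method list is only illustrative):

import multiprocessing

class InputJobQueue(object):
    def __init__(self, max_size):
        self._jq = multiprocessing.JoinableQueue(0)
        self._max_size = max_size
        self._skipped_job_count = 0

    # delegate only the documented JoinableQueue methods that are needed
    def put(self, *args, **kwargs):
        self._jq.put(*args, **kwargs)

    def get(self, *args, **kwargs):
        return self._jq.get(*args, **kwargs)

    def task_done(self):
        self._jq.task_done()

    def join(self):
        self._jq.join()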
It doesn't matter whether that constructor is a function or a class, because the only thing you're doing is calling it. It doesn't matter how the actual type is pickled, because a class is only responsible for identifying its members, not knowing how to pickle them. All of your problems go away.

How to use the context manager to avoid the use of __del__ in python?

As is common knowledge, the Python __del__ method should not be used to clean up important resources, as it is not guaranteed to be called. The alternative is to use a context manager, as described in several threads.
But I do not quite understand how to rewrite a class to use a context manager. To elaborate, I have a simple (non-working) example in which a wrapper class opens and closes a device, and which should close the device whenever the instance of the class goes out of scope (exception, etc.).
The first file mydevice.py is a standard wrapper class to open and close a device:
class MyWrapper(object):
    def __init__(self, device):
        self.device = device

    def open(self):
        self.device.open()

    def close(self):
        self.device.close()

    def __del__(self):
        self.close()
This class is used by another class in myclass.py:
import mydevice

class MyClass(object):
    def __init__(self, device):
        # calls open in mydevice
        self.mydevice = mydevice.MyWrapper(device)
        self.mydevice.open()

    def processing(self, value):
        if not value:
            self.mydevice.close()
        else:
            something_else()
My question: When I implement the context manager in mydevice.py with __enter__ and __exit__ methods, how can this class be handled in myclass.py? I need to do something like
def __init__(self, device):
    with mydevice.MyWrapper(device):
        ???
but how to handle it then? Maybe I overlooked something important? Or can I use a context manager only within a function and not as a variable inside a class scope?
I suggest using the contextlib.contextmanager decorator instead of writing a class that implements __enter__ and __exit__. Here's how it would work:
class MyWrapper(object):
    def __init__(self, device):
        self.device = device

    def open(self):
        self.device.open()

    def close(self):
        self.device.close()

    # I assume your device has a blink command
    def blink(self):
        # do something useful with self.device
        self.device.send_command(CMD_BLINK, 100)

    # there is no __del__ method, as long as you conscientiously use the wrapper

import contextlib

@contextlib.contextmanager
def open_device(device):
    wrapper_object = MyWrapper(device)
    wrapper_object.open()
    try:
        yield wrapper_object
    finally:
        wrapper_object.close()
    return

with open_device(device) as wrapper_object:
    # do something useful with wrapper_object
    wrapper_object.blink()
The line that starts with an at sign is called a decorator. It modifies the function declaration on the next line.
When the with statement is encountered, the open_device() function will execute up to the yield statement. The value in the yield statement is returned in the variable that's the target of the optional as clause, in this case, wrapper_object. You can use that value like a normal Python object thereafter. When control exits from the block by any path – including throwing exceptions – the remaining body of the open_device function will execute.
I'm not sure if (a) your wrapper class is adding functionality to a lower-level API, or (b) if it's only something you're including so you can have a context manager. If (b), then you can probably dispense with it entirely, since contextlib takes care of that for you. Here's what your code might look like then:
import contextlib

@contextlib.contextmanager
def open_device(device):
    device.open()
    try:
        yield device
    finally:
        device.close()
    return

with open_device(device) as device:
    # do something useful with device
    device.send_command(CMD_BLINK, 100)
99% of context manager uses can be done with contextlib.contextmanager. It is an extremely useful API (and the way it's implemented is also a creative use of lower-level Python plumbing, if you care about such things).
The issue is not that you're using it in a class, it's that you want to leave the device in an "open-ended" way: you open it and then just leave it open. A context manager provides a way to open some resource and use it in a relatively short, contained way, making sure it is closed at the end. Your existing code is already unsafe, because if some crash occurs, you can't guarantee that your __del__ will be called, so the device may be left open.
Without knowing exactly what the device is and how it works, it's hard to say more, but the basic idea is that, if possible, it's better to only open the device right when you need to use it, and then close it immediately afterwards. So your processing is what might need to change, to something more like:
def processing(self, value):
    with self.device:
        if value:
            something_else()
If self.device is an appropriately-written context manager, it should open the device in __enter__ and close it in __exit__. This ensures that the device will be closed at the end of the with block.
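For the question's example, an appropriately-written wrapper might look something like this minimal sketch, which assumes the device exposes the open()/close() methods used in mydevice.py:

class MyWrapper(object):
    def __init__(self, device):
        self.device = device

    def __enter__(self):
        self.device.open()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.device.close()
        return False  # don't suppress exceptions raised inside the with block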
Of course, for some sorts of resources, it's not possible to do this (e.g., because opening and closing the device loses important state, or is a slow operation). If that is your case, you are stuck with using __del__ and living with its pitfalls. The basic problem is that there is no foolproof way to leave the device "open-ended" but still guarantee it will be closed even in the event of some unusual program failure.
I'm not quite sure what you're asking. A context manager instance can be a class member - you can re-use it in as many with clauses as you like and the __enter__() and __exit__() methods will be called each time.
So, once you'd added those methods to MyWrapper, you can construct it in MyClass just as you are above. And then you'd do something like:
def my_method(self):
    with self.mydevice:
        # Do stuff here
That will call the __enter__() and __exit__() methods on the instance you created in the constructor.
However, the with clause can only span a function - if you use the with clause in the constructor then it will call __exit__() before exiting the constructor. If you want to do that, the only way is to use __del__(), which has its own problems as you've already mentioned. You could open and close the device just when you need it using with but I don't know if this fulfills your requirements.

Monitor thread synchronization in python

Is there any way to use monitor-style thread synchronization, like Java's synchronized methods, in a Python class to ensure thread safety and avoid race conditions?
I want a monitor-like synchronization mechanism that allows only one method call at a time on my class or object.
You might want to have a look at the Python threading interface. For simple mutual exclusion you can use a Lock object, and you can do this easily with the with statement:
...
lock = Lock()
...

with lock:
    # This code will only be executed by one single thread at a time;
    # the lock is released when the thread exits the 'with' block.
    ...
See also here for an overview of different thread synchronization mechanisms in python.
There is no python language construct for Java's synchronized (but I guess it could be built using decorators)
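A rough sketch of what such a decorator could look like (it assumes each instance keeps its own _lock attribute; none of this comes from the stdlib):

import threading
import functools

def synchronized(method):
    # Java-style 'synchronized' for methods; relies on a per-instance self._lock
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper

class Counter(object):
    def __init__(self):
        self._lock = threading.RLock()  # RLock lets synchronized methods call each other
        self.value = 0

    @synchronized
    def increment(self):
        self.value += 1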
I built a simple prototype for it, here's a link to the GitHub repository for all the details : https://github.com/m-a-rahal/monitor-sync-python
I used inheritance instead of decorators, but maybe I'll include that option later
Here's what the 'Monitor' super class looks like:
import threading

class Monitor(object):
    def __init__(self, lock=threading.Lock()):
        ''' initializes the _lock; threading.Lock() is used by default '''
        self._lock = lock

    def Condition(self):
        ''' returns a condition bound to this monitor's lock '''
        return threading.Condition(self._lock)

    init_lock = __init__
Now all you need to do to define your own monitor is to inherit from this class:
class My_Monitor_Class(Monitor):
    def __init__(self):
        self.init_lock()  # just don't forget this line, it creates the monitor's _lock
        cond1 = self.Condition()
        cond2 = self.Condition()
        # you can see I defined some 'Condition' objects as well, very simple syntax
        # these conditions are bound to the lock of the monitor
You can also pass in your own lock instead:
class My_Monitor_Class(Monitor):
    def __init__(self, lock):
        self.init_lock(lock)
Check out the threading.Condition() documentation.
Also you need to protect all the 'public' methods with the monitor's lock, like this:
class My_Monitor_Class(Monitor):
    def method(self):
        with self._lock:
            # your code here
If you want to use 'private' methods (called only from inside the monitor), you can either NOT protect them with the _lock (otherwise the threads will deadlock), or use an RLock for the monitor instead.
EXTRA TIP
Sometimes a monitor consists of 'entrance' and 'exit' protocols:
monitor.enter_protocol()
<critical section>
monitor.exit_protocol()
In this case, you can exploit Python's cool with statement: just define the __enter__ and __exit__ methods like this:
class monitor(Monitor):
    def __enter__(self):
        with self._lock:
            # enter_protocol code here

    def __exit__(self, type, value, traceback):
        with self._lock:
            # exit_protocol code here
Now all you need to do is use the monitor in a with statement:
with monitor:
    <critical section>
