I am writing a class that has many functionalities (and therefore many methods), but I need this class to run inside a thread (the class opens a subprocess). I want to use the common way of declaring thread-based classes:
class HiStackOverflow(threading.Thread):
    # Somethings...
However, as I said, this class of mine has many pseudo-private, regular and static methods, and as I declare them I want to avoid accidentally overriding some necessary threading.Thread method.
I could always check the directory of threading.Thread and look for overlapping method names, but that seems like an inappropriate way to handle this, and it becomes impractical as the method count increases.
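For example, the manual check would look something like this sketch:
import threading

# Compare my class's own attribute names against everything
# threading.Thread defines, by hand.
clashes = set(vars(HiStackOverflow)) & set(dir(threading.Thread))
print(clashes)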
My question is: is this kind of implementation feasible? If not, how should I handle it? Should I write some wrapper class as the thread handler?
Thanks in advance.
If you're worried about namespace clashes between your class and threading.Thread, I would definitely suggest that you use composition rather than inheritance (or keep the two functionalities entirely separate). There shouldn't be significant overhead to wrapping the couple of threading methods that you need, and name clashes then become a non-issue.
It will also more cleanly separate the functionality of your class from the functionality provided by threading. That's likely to be a win in the long run for understanding your code.
There isn't much benefit from inheriting from Thread. You could have a factory method that creates the thread or even have its __init__ do it.
import threading
import time

class MyClass:
    def __init__(self):
        self._thread = threading.Thread(target=self.run)
        self._thread.start()

    def run(self):
        for i in range(5):
            print('worker thread', i)
            time.sleep(.5)

    def join(self):
        self._thread.join()

my_obj = MyClass()
for i in range(3):
    print('main thread', i)
    time.sleep(.5)
my_obj.join()
print('done')
There seem to be some ideas conflated in this phrase:
but I require this class to run inside a thread(class opens a subprocess)
Classes don't "run". You can start a new thread which executes some class method, or an instance method. That class doesn't have to inherit from Thread. It doesn't even need a reference to the running thread. You just start executing some function in a new thread and you're done.
Subprocesses are unrelated to threads. You don't need one to do the other.
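To make that concrete, here is a minimal sketch of a plain function that opens a subprocess and runs on a thread, with no Thread subclass involved (the echo command is just a placeholder):
import subprocess
import threading

def run_tool():
    # Launch an external process and wait for it to finish; nothing
    # here requires the enclosing code to subclass Thread.
    result = subprocess.run(['echo', 'hello'], capture_output=True, text=True)
    print(result.stdout)

worker = threading.Thread(target=run_tool)
worker.start()
worker.join()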
If you're worried about overriding something, check the documentation (https://docs.python.org/3/library/threading.html#thread-objects). Otherwise, if you want to keep the reference to the thread, you can always do:
from threading import Thread

class HiStackoverflow:
    def run(self):
        self.thread = Thread(target=self.entry_point)
        self.thread.start()

    def entry_point(self):
        ...
I have a class that looks like the following
class A:
    communicate = set()

    def __init__(self):
        pass

    ...

    def _some_func(self):
        ...some logic...
        self.communicate.add(some_var)
The communicate variable is shared among the instances of the class. I use it to provide a convenient way for the instances of this class to communicate with one another (they need some mild orchestration, and I don't want to force the calling object to serve as an intermediary for this communication).
However, I realized this causes problems when I run my tests. Since the Python interpreter is the same throughout all the tests, I won't get a "fresh" A class for each test, so the communicate set will be the aggregate of everything every test adds to it (in normal usage this is exactly what I want, but for testing I don't want interactions between my tests). Furthermore, down the line this will also cause problems in my code execution if I want to loop over my whole process multiple times, because I won't have a way of resetting this class variable.
I know I can fix this issue where it occurs by having the creator of the A objects do something like
A.communicate = set()
before it creates and uses any instances of A. However, I don't really love this because it forces my caller to know some details about the communication pathways of the A objects, and I don't want that coupling. Is there a better way for me to reset the communicate class variable? Perhaps some method I could call on the class instead of an instance, like A.new_batch(), that would perform this reset? Or is there a better approach I'm not familiar with?
Edit:
I added a class method like
class A:
    communicate = set()

    def __init__(self):
        pass

    ...

    @classmethod
    def new_batch(cls):
        cls.communicate = set()

    def _some_func(self):
        ...some logic...
        self.communicate.add(some_var)
and this works with the caller running A.new_batch(). Is this the way it should be constructed and called, or is there a better practice here?
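In my test suite, the reset then looks something like this (a sketch assuming pytest; the fixture name is just one I picked):
import pytest

@pytest.fixture(autouse=True)
def fresh_batch():
    # Reset the shared set before each test so tests don't
    # observe objects added by earlier tests.
    A.new_batch()
    yield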
I have a small Pyramid web service.
I also have a Python class that creates an index of items, with methods to search across them quickly. Something like:
class MyCorpus(object):
    def __init__(self):
        self.table = AwesomeDataStructure()

    def insert(self):
        self.table.push_back(1)

    def find(self, needle):
        return self.table.find(needle)
I would like to expose the above class to my api.
I can create only one instance of that class (memory limit).
So I need to be able to instantiate this class before the server starts.
And my threads should be able to access it.
I also need some locking mechanism (concurrent inserts are not supported).
What is the best way to achieve that?
Add an instance of your class to the global application registry during your Pyramid application's configuration:
config.registry.mycorpus = MyCorpus()
and later, for example in your view code, access it through a request:
request.registry.mycorpus
You could also register it as a utility with the Zope Component Architecture using registry.registerUtility, but you'd need to define what interface MyCorpus provides etc., which is a good thing in the long run. Either way, having the singleton instance as part of the registry makes testing your application easier: just create a configuration with a mock corpus.
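A minimal sketch of the registry approach (the main() factory and the view below are hypothetical, but the registry calls are standard Pyramid):
from pyramid.config import Configurator

def main(global_config, **settings):
    config = Configurator(settings=settings)
    # Create the singleton once, before the server starts serving.
    config.registry.mycorpus = MyCorpus()
    config.add_route('search', '/search')
    config.scan()
    return config.make_wsgi_app()

# Later, in some view:
def search_view(request):
    corpus = request.registry.mycorpus
    return {'result': corpus.find(request.params.get('q'))}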
Any locking should be handled by the instance itself:
from threading import Lock

class MyCorpus(object):
    def __init__(self, Lock=Lock):
        self.table = AwesomeDataStructure()
        self.lock = Lock()

    ...

    def insert(self):
        with self.lock:
            self.table.push_back(1)
Any global variable is shared between threads in Python, so this part is really easy: "... create only one instance of that class ... before the server starts ... threads should be able to access it":
corpus = MyCorpus() # in global scope in any module
Done! Then import the instance from anywhere and call your class' methods:
from mydata import corpus
corpus.do_stuff()
No need for ZCA, plain pythonic Python :)
(the general approach of keeping something large and very database-like within the webserver process feels quite suspicious though, I hope you know what you're doing. I mean - persistence? locking? sharing data between multiple processes? Redis, MongoDB and 1001 other database products have those problems solved)
I'm trying to figure out what the following module is doing.
import Queue
import multiprocessing
import threading

class BufferedReadQueue(Queue.Queue):
    def __init__(self, lim=None):
        self.raw = multiprocessing.Queue(lim)
        self.__listener = threading.Thread(target=self.listen)
        self.__listener.setDaemon(True)
        self.__listener.start()
        Queue.Queue.__init__(self, lim)

    def listen(self):
        try:
            while True:
                self.put(self.raw.get())
        except:
            pass

    @property
    def buffered(self):
        return self.qsize()
It is only instantiated once in the calling code, and its .raw attribute, a multiprocessing.Queue, gets passed to another class, which appears to inherit from multiprocessing.Process.
As I see it, an attribute of BufferedReadQueue is being used as a queue, but not the class itself (nor an instance of it).
What would be a reason that BufferedReadQueue inherits from Queue.Queue and not just object, if it's not actually being used as a queue?
It looks like BufferedReadQueue is meant to be used as a way to convert the read end of a multiprocessing.Queue into a normal Queue.Queue. Note this in __init__:
self.__listener = threading.Thread(target=self.listen)
self.__listener.setDaemon(True)
self.__listener.start()
This starts up a listener thread, which just constantly tries to get items from the internal multiprocessing.Queue, and then puts all those items to self. It looks like the use-case is something like this:
def func(queue):
    queue.put('stuff')

...

buf_queue = BufferedReadQueue()
proc = multiprocessing.Process(target=func, args=(buf_queue.raw,))
proc.start()

out = buf_queue.get()  # Only get calls in the parent
Now, why would you do this instead of just using the multiprocessing.Queue directly? Probably because multiprocessing.Queue has some shortcomings that Queue.Queue doesn't. For example qsize(), which this BufferedReadQueue uses, is not reliable with multiprocessing.Queue:
Return the approximate size of the queue. Because of multithreading/multiprocessing semantics, this number is not reliable.
Note that this may raise NotImplementedError on Unix platforms like Mac OS X where sem_getvalue() is not implemented.
It's also possible to introspect a Queue.Queue, and peek at its contents without popping them. This isn't possible with a multiprocessing.Queue.
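For instance, relying on the CPython implementation detail that Queue.Queue keeps its items in a deque attribute named queue, guarded by its mutex, one could peek like this:
# Snapshot the buffered items without removing them.
with buf_queue.mutex:
    snapshot = list(buf_queue.queue)
print(snapshot)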
Is there any way to use monitor-style thread synchronization in a Python class, like Java's synchronized methods, to ensure thread safety and avoid race conditions?
I want a monitor-like synchronization mechanism that allows only one method call at a time in my class or object.
You might want to have a look at Python's threading module. For simple mutual exclusion you can use a Lock object, most easily via the with statement:
from threading import Lock
...
lock = Lock()
...

with lock:
    # This code will only be executed by one single thread at a time;
    # the lock is released when the thread exits the 'with' block.
    ...
See also here for an overview of the different thread synchronization mechanisms in Python.
There is no Python language construct for Java's synchronized (but I guess it could be built using decorators, as sketched below).
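For example, a minimal sketch of such a decorator (synchronized and Counter are names invented for this illustration):
import threading
from functools import wraps

def synchronized(method):
    """Serialize calls to the wrapped method with a per-instance
    re-entrant lock, roughly like Java's synchronized methods."""
    @wraps(method)
    def wrapper(self, *args, **kwargs):
        # dict.setdefault is atomic under the GIL, so the lock is
        # created exactly once per instance.
        lock = self.__dict__.setdefault('_sync_lock', threading.RLock())
        with lock:
            return method(self, *args, **kwargs)
    return wrapper

class Counter:
    def __init__(self):
        self.value = 0

    @synchronized
    def increment(self):
        self.value += 1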
I built a simple prototype for it; here's a link to the GitHub repository with all the details: https://github.com/m-a-rahal/monitor-sync-python
I used inheritance instead of decorators, but maybe I'll include that option later.
Here's what the 'Monitor' super class looks like:
import threading

class Monitor(object):
    def __init__(self, lock=None):
        '''initializes the _lock; a new threading.Lock() is used by default'''
        # avoid a mutable default like lock=threading.Lock(), or every
        # Monitor created without an explicit lock would share one lock
        self._lock = lock if lock is not None else threading.Lock()

    def Condition(self):
        '''returns a condition bound to this monitor's lock'''
        return threading.Condition(self._lock)

    init_lock = __init__
Now all you need to do to define your own monitor is to inherit from this class:
class My_Monitor_Class(Monitor):
    def __init__(self):
        self.init_lock()  # just don't forget this line, it creates the monitor's _lock
        cond1 = self.Condition()
        cond2 = self.Condition()
        # you can see I defined some 'Condition' objects as well, very simple syntax
        # these conditions are bound to the lock of the monitor
You can also pass in your own lock instead:
class My_Monitor_Class(Monitor):
    def __init__(self, lock):
        self.init_lock(lock)
Check out the threading.Condition() documentation for details.
You also need to protect all the 'public' methods with the monitor's lock, like this:
class My_Monitor_Class(Monitor):
    def method(self):
        with self._lock:
            # your code here
            ...
If you want to use 'private' methods (called from inside the monitor), you can either NOT protect them with the _lock (otherwise the threads will deadlock), or use an RLock for the monitor instead.
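For instance, a small sketch of the RLock variant (the method names here are made up):
import threading

class My_Monitor_Class(Monitor):
    def __init__(self):
        # a re-entrant lock lets 'public' methods call 'private'
        # ones that also take the lock
        self.init_lock(threading.RLock())

    def method(self):
        with self._lock:
            self._helper()  # would deadlock with a plain Lock

    def _helper(self):
        with self._lock:  # re-acquiring is fine with an RLock
            ...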
EXTRA TIP
Sometimes a monitor consists of 'entrance' and 'exit' protocols:
monitor.enter_protocol()
<critical section>
monitor.exit_protocol()
In this case, you can exploit Python's cool with statement :3
Just define the __enter__ and __exit__ methods like this:
class monitor(Monitor):
    def __enter__(self):
        with self._lock:
            # enter_protocol code here
            ...

    def __exit__(self, type, value, traceback):
        with self._lock:
            # exit_protocol code here
            ...
Now all you need to do is use the monitor in a with statement:
with monitor:
<critical section>
If an object relies on a module that is not included with Python (like win32api, gstreamer, gui toolkits, etc.), and a class/function/method from that module may fail, what should the object do?
Here's an example:
import guimodule  # Just an example; could be anything

class RandomWindow(object):
    def __init__(self):
        try:
            self.dialog = guimodule.Dialog()  # I might fail
        except guimodule.DialogError:
            self.dialog = None  # This can't be right

    def update(self):
        self.dialog.prepare()
        self.dialog.paint()
        self.dialog.update()

    # ~30 more methods
This class would only be a tiny (and unnecessary, but useful) part of a bigger program.
Let's assume we have an imaginary module called guimodule, with a class called Dialog, that may fail to instantiate. If our RandomWindow class has say, 30 methods that manipulate this window, checking if self.dialog is not None will be a pain, and will slow down the program when implemented in constantly used methods (like the update method in the example above). Calling .paint() on a NoneType (when the Dialog fails to load) will raise an error, and making a dummy Dialog class with all of the original's methods and attributes would be absurd.
How can I modify my class to handle a failed creation of the Dialog class?
Rather than creating an invalid object, you should have allowed the exception raised in __init__ to propagate out so the error could be handled in an appropriate manner. Or you could have raised a different exception.
See also Python: is it bad form to raise exceptions within __init__?
You may find it useful to have two subclasses of your class: one that uses the module and one that does not. A "factory" method could determine which subclass is appropriate, and return an instance of it.
By subclassing, you allow them to share the code that is independent of whether the module is available.
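A rough sketch of that idea, reusing the question's imaginary guimodule (the class and method names here are invented):
class BaseWindow(object):
    # code shared by both subclasses, independent of guimodule, goes here

    @staticmethod
    def create():
        # factory: pick the subclass based on whether the
        # optional module is usable
        try:
            dialog = guimodule.Dialog()
        except guimodule.DialogError:
            return PlainWindow()
        return DialogWindow(dialog)

class DialogWindow(BaseWindow):
    def __init__(self, dialog):
        self.dialog = dialog

    def update(self):
        self.dialog.prepare()
        self.dialog.paint()
        self.dialog.update()

class PlainWindow(BaseWindow):
    def update(self):
        pass  # no dialog available, so drawing is a no-op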
I agree that checking whether self.dialog is None in every method would be a pain, but I don't agree that it would slow things down: whatever self.dialog does when it exists will be far slower than the check. So forget about slowness for the time being. One way to handle this is to create a dummy dialog that does nothing on method calls, e.g.:
class RandomWindow(object):
    def __init__(self):
        try:
            self.dialog = guimodule.Dialog()  # I might fail
        except guimodule.DialogError:
            self.dialog = DummyDialog()  # create a placeholder

class DummyDialog(object):
    # either list all methods, or override __getattr__ to create a mock object
    ...
Making a dummy Dialog class is not as absurd as you might think if you consider using Python's __getattr__ feature. The following dummy implementation would completely fit your needs:
class DummyDialog:
    def __getattr__(self, name):
        # any attribute lookup returns a function that accepts
        # any arguments and does nothing
        def fct(*args, **kwargs):
            pass
        return fct