I'm trying to use a multiprocessing.Queue to communicate something when an object in another process which has this queue as an attribute is deleted. While doing so I noticed that Queue.put() with block=False (or equivalently Queue.put_nowait()) blocks anyway under certain circumstances and boiled it down to the following minimal reproducible example:
import multiprocessing as mp

class A:
    def __init__(self):
        self.q = mp.Queue()
        # uncommenting this makes it work fine:
        # self.q.put("test")

    def __del__(self):
        print("before putting CLOSE")
        self.q.put_nowait("CLOSE")
        print("after putting CLOSE")

a = A()
# uncommenting this also makes it work fine:
# del a
Running this code, the output gets as far as
before putting CLOSE
and then it freezes indefinitely.
I'm at a loss as to what's going on here and why this happens. Queue.put_nowait() seems to block if and only if it's called from an object's destructor (__del__()) and it hasn't had data put into it before and the destructor was called due to the object going out of scope rather than due to explicit del. The same exact thing happens if the queue is a global variable rather than an attribute of A, as long as put_nowait() is called within A's destructor.
Interestingly, aborting the script with Ctrl+C doesn't result in the usual exception output (KeyboardInterrupt etc.) but quits without any output.
Am I missing something obvious here? Why does this happen?
Related
I have some code which sets up an interrupt handler in the main thread and runs a loop in a side thread. This is so I can Ctrl-C the main thread to signal to the loop to gracefully shutdown, and this all happens inside one class, which looks like:
import concurrent.futures

class MyClass:
    # non-relevant stuff omitted for brevity
    def run(self):
        with concurrent.futures.ThreadPoolExecutor() as executor:
            future = executor.submit(self.my_loop, self.arg_1, self.arg_2)
            try:
                future.result()
            except KeyboardInterrupt as e:
                self.exit_event.set()  # read in my_loop(), exits after finishing an iteration
                future.result()
This works fine. My question is: are there any special types of objects, or characteristics of objects, that I should be aware of with this approach, specifically regarding the self. members of MyClass? I think it's fine because my_loop is spawned inside MyClass, so no copies of the self. properties are made, and initial testing suggests this is the case. I'm really wondering whether there are any more exotic objects (e.g. non-pickleable ones, which do work fine here) that I should consider?
Because this is communication between threads rather than between processes, pickleability does not matter: nothing is serialized or transmitted through queues. The objects inside your class (or outside it) can be anything.
The only thing you need to keep in mind with class variables is that you need a lock to protect access to them. If several threads modify the same class variable without one, you will eventually get unexpected results.
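As a rough illustration of the locking advice, here is a minimal sketch. The Counter class, its attributes and the run_workers helper are hypothetical names made up for this example; they are not part of the question's MyClass.

```python
import threading


class Counter:
    """Hypothetical class whose attribute is modified by several threads."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self, times):
        for _ in range(times):
            # "self.value += 1" is a read-modify-write, so guard it with the lock.
            with self._lock:
                self.value += 1


def run_workers(n_threads=4, times=10_000):
    """Start n_threads threads that all increment the same Counter."""
    counter = Counter()
    threads = [
        threading.Thread(target=counter.increment, args=(times,))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, run_workers(4, 1000) should always add up to 4000; without it, some increments could be lost.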
I have multiple Python files in different folders that work together to make my program function. They consist of a main.py file that creates a new thread for each file and then starts it with the necessary parameters. This works great while the parameters are static, but if a variable changes in main.py, it doesn't get changed in the other files. I also can't import main.py into otherfile.py to get the new value, since it is in a parent directory.
I have created an example below. What should happen is that the main.py file creates a new thread and calls otherfile.py with set params. After 5 seconds, the variable in main.py changes and so should the var in otherfile (so it starts printing the number 5 instead of 10), but I haven't found a solution to update them in otherfile.py
The folder structure is as follows:
|-main.py
|-other
  |-otherfile.py
Here is the code in both files:
main.py
from time import sleep
from threading import Thread

var = 10

def newthread():
    from other.otherfile import loop
    nt = Thread(target=loop(var))
    nt.daemon = True
    nt.start()

newthread()
sleep(5)
var = 5  # change the var, otherfile.py should start printing it now (doesn't)
otherfile.py
from time import sleep

def loop(var):
    while True:
        sleep(1)
        print(var)
In Python, there are two types of objects:
Immutable objects can’t be changed.
Mutable objects can be changed.
An int is immutable, so you must use a mutable variable such as a list or a dict instead:
from time import sleep
from threading import Thread

var = [10]

def newthread():
    from other.otherfile import loop
    nt = Thread(target=loop, args=(var,), daemon=True)
    nt.start()

newthread()
sleep(5)
var[0] = 5
This happens because of how objects are passed into functions in Python. You'll hear that everything is passed by reference in Python, but since integers are immutable, when you assign a new value to var you're actually creating a new object and rebinding the name to it, while your thread still holds a reference to the integer with a value of 10.
To get around this, I wrote a simple wrapper class for an integer:
class IntegerHolder():
    def __init__(self, n):
        self.value = n

    def set_value(self, n):
        self.value = n

    def get_value(self):
        return self.value
Then, instead of var = 10, I did i = IntegerHolder(10), and after the sleep(5) call, I simply did i.set_value(5), which updates the wrapper object. The thread still has the same reference to the IntegerHolder object i, and when i.get_value() is called in the thread, it will return 5, as required.
You can also do this with a Python list, since lists are objects — it's just that this implementation makes it clearer what's going on. You'd just do var = [10] and do var[0] = 5, which would work since your thread should still keep a reference to the same list object as the main thread.
Two more errors:
Instead of Thread(target=loop(var)), you need to do Thread(target=loop, args=(i,)). This is because target is supposed to be a callable object, which is basically a function. Writing loop(var) calls loop immediately in the main thread, before the Thread constructor even runs; since loop never returns, the thread never actually gets created (had it returned, target would have been set to its return value). You can verify this with your favorite Python debugger, or print statements.
Setting nt.daemon = True allows main.py to exit before the thread finishes. This means that as soon as i.set_value(5) is called, the main program reaches its end, the interpreter shuts down, and the daemon thread is stopped abruptly, so you will most likely never see it print 5. Deleting that line fixes things (nt.daemon = False by default), but it's probably safer to do a nt.join() call in the main thread, which waits for a thread to finish execution.
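Putting the wrapper object and the two fixes together, a minimal sketch might look like the following. The stop_event and seen parameters are assumptions added here so that the example terminates and its effect can be observed; the original loop() simply prints forever.

```python
import threading
import time


class IntegerHolder:
    """Mutable wrapper, as in the answer above."""

    def __init__(self, n):
        self.value = n


def loop(holder, stop_event, seen):
    # Re-read holder.value on every iteration so updates become visible.
    while not stop_event.is_set():
        seen.append(holder.value)
        stop_event.wait(0.01)


def demo():
    holder = IntegerHolder(10)
    stop_event = threading.Event()  # hypothetical shutdown signal
    seen = []                       # values observed by the thread
    # Pass the function itself plus its arguments; do not call it here.
    nt = threading.Thread(target=loop, args=(holder, stop_event, seen))
    nt.start()
    time.sleep(0.2)
    holder.value = 5                # mutate the shared object, do not rebind
    time.sleep(0.2)
    stop_event.set()
    nt.join()                       # wait for the thread before exiting
    return seen
```

The list returned by demo() starts with 10 and ends with 5, because the thread and the main program share the same IntegerHolder object.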
And one warning, because programming wouldn't be complete without warnings:
Whenever different threads try to access a value, if AT LEAST ONE thread is modifying the value, this can cause a race condition. This means that all accesses at that point should be wrapped in a lock/mutex to prevent this. The Python (3.7.4) docs have more info about this.
Let me know if you have any more questions!
I've written a class that inherits from object and holds instances of sub-objects that use threads for their tasks. There are two socket listeners that create another thread for each accepted connection. They do what they have to do. To stop them, they watch a threading.Event object that tells them when to finish.
I've noticed that when I exit the Python console they are not notified (or don't catch the notification), and the exit doesn't return control to the bash console unless Close() is called first.
My first idea to fix it was to implement the '__del__' method, so that the garbage collector cleans things up on exit.
class ServiceProvider(object):
    def __init__(self):
        super(ServiceProvider, self).__init__()
        #...
        self.Open()

    def Open(self):
        #... Some threads are created.
        pass

    def Close(self):
        #.... threading.Event to report the threads to finish
        pass

    def __del__(self):
        self.Close()
But the behaviour is the same. If I place a print in those methods, neither the one in '__del__' nor the one in 'Close' is written, unless the object was closed beforehand, in which case the print in '__del__' is written.
Then I implemented the '__enter__' and '__exit__' methods to support the with statement. That behaves as expected: when the with block ends, things are released. But what I really want is something like file descriptors, where even if file.close() is not called, it is executed when the program exits.
class ServiceProvider(object):
    #...
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.Close()
Searching for more solutions, I tried atexit, but with similar results that don't fix the issue. Even if I collect all the objects created from this class, doOnExit only writes its print if the objects in the list are already closed.
import atexit

global objects2Close
objects2Close = []

@atexit.register
def doOnExit():
    for obj in objects2Close:
        obj.Close()

class ServiceProvider(object):
    def __init__(self):
        super(ServiceProvider, self).__init__()
        objects2Close.append(self)
It's usually a good idea to use with when you have resources that you don't want to leak (files, connections, whatever else you care about).
Somewhere, just outside your main loop you should have something like:
with ServiceProvider(some_params) as service_provider:
    rest_of_the_code()
What this does is that, regardless of how you exit rest_of_the_code() (except for kill -9), it will call service_provider.Close() at the end. This works for exceptions and interrupts as well. kill -9 doesn't work because the process is killed at the OS level and doesn't get a chance to clean up.
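To make the with behaviour concrete, here is a small self-contained sketch. This ServiceProvider is a stripped-down stand-in for the one in the question: the closed flag and the demo function are assumptions added only so the effect can be checked.

```python
class ServiceProvider:
    """Stand-in for the question's class; only tracks whether Close() ran."""

    def __init__(self):
        self.closed = False

    def Close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and when an exception leaves the block.
        self.Close()
        return False  # do not swallow the exception


def demo():
    provider = ServiceProvider()
    try:
        with provider:
            raise RuntimeError("simulated failure inside the with block")
    except RuntimeError:
        pass
    return provider.closed
```

Here demo() returns True: Close() ran even though the body of the with block raised.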
I've got a solution for this issue. The information posted in this question was not related to the real issue.
It is as simple as daemon threads.
As the implementation uses some threads to listen for remote connections, they have to finish their execution when the program exits. But the program only ends when all non-daemon threads have finished.
Mistakenly, those listeners and talkers were not set to be daemons, and that's why the exit waits for them.
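A minimal sketch of marking a listener thread as a daemon follows. The make_listener helper and the stop_event argument are hypothetical; the real listeners in the question accept socket connections instead of waiting on an event.

```python
import threading


def make_listener(stop_event):
    """Build a daemon thread that idles until stop_event is set."""

    def listen():
        # Placeholder for the real accept loop.
        stop_event.wait()

    # daemon=True means the interpreter will not wait for this thread on exit.
    return threading.Thread(target=listen, daemon=True)


stop = threading.Event()
listener = make_listener(stop)
listener.start()

# An orderly shutdown is still possible (and preferable) when there is time.
stop.set()
listener.join()
```

Daemon threads are stopped abruptly at interpreter shutdown, so any cleanup they need should still go through an explicit signal such as the event above.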
I have written a class in python 2.7 (under linux) that uses multiple processes to manipulate a database asynchronously. I encountered a very strange blocking behaviour when using multiprocessing.Queue.put() and multiprocessing.Queue.get() which I can't explain.
Here is a simplified version of what I do:
import time
from multiprocessing import Process, Queue

class MyDB(object):
    def __init__(self):
        self.inqueue = Queue()
        p1 = Process(target = self._worker_process, kwargs={"inqueue": self.inqueue})
        p1.daemon = True
        started = False
        while not started:
            try:
                p1.start()
                started = True
            except:
                time.sleep(1)
        #Sometimes I start a same second process but it makes no difference to my problem
        p2 = Process(target = self._worker_process, kwargs={"inqueue": self.inqueue})
        #blahblah... (same as above)

    @staticmethod
    def _worker_process(inqueue):
        while True:
            #--------------this blocks despite data having arrived------------
            op = inqueue.get(block = True)
            #do something with specified operation
            #---------------problem area end--------------------
            print "if this text gets printed, the problem was solved"

    def delete_parallel(self, key, rawkey = False):
        someid = ...blahblah
        #--------------this section blocked when I was posting the question but for unknown reasons it's fine now
        self.inqueue.put({"optype": "delete", "kwargs": {"key":key, "rawkey":rawkey}, "callid": someid}, block = True)
        #--------------problem area end----------------
        print "if you see this text, there was no blocking or block was released"
If I run the code above inside a test (in which I call delete_parallel on the MyDB object) then everything works, but if I run it in context of my entire application (importing other stuff, inclusive pygtk) strange things happen:
For some reason self.inqueue.get blocks and never releases despite self.inqueue having the data in its buffer. When I instead call self.inqueue.get(block = False, timeout = 1) then the call finishes by raising Queue.Empty, despite the queue containing data. qsize() returns 1 (suggests that data is there) while empty() returns True (suggests that there is no data).
Now clearly there must be something somewhere else in my application that renders self.inqueue unusable by causing acquisition of some internal semaphore. However, I don't know what to look for. Eclipse debugging becomes useless once a blocking semaphore is reached.
Edit 8 (cleaning up and summarizing my previous edits) Last time I had a similar problem, it turned out that pygtk was hijacking the global interpreter lock, but I solved it by calling gobject.threads_init() before I called anything else. Could this issue be related?
When I introduce a print "successful reception" after the get() method and execute my application in a terminal, the same behaviour happens at first. When I then terminate by pressing CTRL+D, I suddenly get the string "successful reception" in between messages. This looks to me like some other process/thread is terminated and releases the lock that blocks the process that is stuck at get().
Since the process that was stuck terminates later, I still see the message. What kind of process could externally mess with a Queue like that? self.inqueue is only accessed inside my class.
Right now it seems to come down to this queue which won't return anything despite the data being there:
the get() method seems to get stuck when it attempts to receive the actual data from some internal pipe. The last line before my debugger hangs is:
res = self._recv()
which is inside of multiprocessing.queues.get()
Tracking this internal python stuff further I find the assignments
self._recv = self._reader.recv and self._reader, self._writer = Pipe(duplex=False).
Edit 9
I'm currently trying to hunt down the import that causes it. My application is quite complex, with hundreds of classes and each class importing a lot of other classes, so it's a pretty painful process. I have found a first candidate class which uses 3 different MyDB instances when I track all its imports (but doesn't access MyDB.inqueue at any time, as far as I can tell). The strange thing is, it's basically just a wrapper, and the wrapped class works just fine when imported on its own. This also means that it uses MyDB without freezing. As soon as I import the wrapper (which imports that class), I have the blocking issue.
I started rewriting the wrapper by gradually reusing the old code. I'm testing each time I introduce a couple of new lines until I will hopefully see which line will cause the problem to return.
multiprocessing.Queue uses an internal feeder thread to move data into its pipe. If you are using GTK, it can break these threads, so you will need to call gobject.threads_init().
It should be noted that qsize() only returns an approximate size of the queue. The real size may be anywhere between 0 and the value returned by qsize().
I have two definitions or methods in python. I'd like to run them at the same exact time. Originally I tried to use forking but since the child retained the memory from the parent, it's writing multiple things that I don't need in a file. So I switched to threading.
I have something similar to
import threading

class test(threading.Thread):
    def __init__(self, numA, list):
        self.__numA = numA  # (random number)
        self.__list = list  # (list)

    def run(self):
        makelist(self)
        makelist2(self)
makelist() and makelist2() use numA and list. So in those definitions/methods instead of saying
print list
I say
print self.__list.
In the main() I made a new class object:
x = test()
x.start()
When I run my program I get an attribute error saying it cannot recognize the __list or __numA.
I've been stuck on this for a while. If there's a better way to run two methods at the same time (the methods are not connected at all), please let me know and explain how.
Thank you.
The __list and __numA won't be visible from makelist and makelist2 if they are not also members of the same class. The double underscore will make things like this fail:
>>> class A(object):
...     def __init__(self):
...         self.__a = 2
...
>>> def f(x):
...     print x.__a
...
>>> a = A()
>>> f(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
AttributeError: 'A' object has no attribute '__a'
But, naming the __a something without two leading underscores would work. Is that what you are seeing?
You can read more about private variables in the python documentation.
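For reference, the same effect can be sketched in Python 3 syntax. The read_private helper and the _b attribute are names introduced only for this illustration.

```python
class A:
    def __init__(self):
        self.__a = 2  # inside the class body this is stored as _A__a
        self._b = 3   # a single underscore is not mangled


def read_private(obj):
    """Try to read obj.__a from outside the class."""
    try:
        # No mangling happens here, so this looks up the literal name "__a".
        return obj.__a
    except AttributeError:
        return None


a = A()
print(read_private(a))  # the attribute is not found under the name "__a"
print(a._A__a)          # the mangled name is what actually exists
print(a._b)
```

This is why functions defined outside the class cannot see self.__list or self.__numA, while a single leading underscore (or none) works from anywhere.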
Firstly, don't name your variable the same as built-in types or functions, i.e. list.
Secondly, as well as the problems that others have pointed out (__ name mangling, initialising Thread etc), if your intention is to run makelist and makelist2 at the same time then you are doing it wrong, since your run method will still execute them one after the other. You need to run them in separate threads, not sequentially in the same thread.
Thirdly, how exact do you mean by "same exact time"? Using threads in (C)Python this is physically impossible, since the execution will be interleaved at the bytecode level. Other versions of Python (Jython, IronPython etc) may run them at exactly the same time on a multi-core system, but even then you have no control over when the OS scheduler will start each one.
Finally it is a bad idea to share mutable objects between threads, since if both threads change the data at the same time then unpredictable things can (and will) happen. You need to protect against this by either using locks or only passing round immutable data or copies of the data. Using locks can also cause its own problems if you are not careful, such as deadlocks.
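A minimal sketch of the "separate threads" suggestion, written in Python 3, is shown below. The bodies of makelist and makelist2 are invented placeholders, since the question does not show what they do; each thread gets its own output list so that no mutable object is shared between them.

```python
import threading


def makelist(n, out):
    # Placeholder work: the real function is not shown in the question.
    out.extend(range(n))


def makelist2(n, out):
    # Placeholder work, independent of makelist.
    out.extend(i * i for i in range(n))


def run_both(n):
    first, second = [], []
    # One thread per function, each with its own list.
    t1 = threading.Thread(target=makelist, args=(n, first))
    t2 = threading.Thread(target=makelist2, args=(n, second))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return first, second
```

Calling run_both(3) gives ([0, 1, 2], [0, 1, 4]). The two functions are interleaved by the interpreter rather than executed truly in parallel, as noted above.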
I'd like to run them at the same exact time.
You can't do this with threading: the Global Interpreter Lock in Python ensures that only one thread can execute Python code at any time (threads are switched every sys.getcheckinterval() bytecodes). Use multiprocessing instead:
from multiprocessing import Process
import os

def info(title):
    print title
    print 'module name:', __name__
    print 'parent process:', os.getppid()
    print 'process id:', os.getpid()

def f(name):
    info('function f')
    print 'hello', name

if __name__ == '__main__':
    info('main line')
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
A couple of things:
A) When you override the __init__ method of threading.Thread, you need to initialize threading.Thread yourself, which can be accomplished by putting "threading.Thread.__init__(self)" in the __init__ function.
B) As msw pointed out, those calls to "makelist" and "makelist2" seem to be to global functions, which kind of defeats the purpose of the threading. I recommend making them methods of test.
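Both points can be sketched in Python 3 as follows. The Worker class, its make_list method and the result attribute are hypothetical names; the multiplication is placeholder work standing in for whatever makelist really does.

```python
import threading


class Worker(threading.Thread):
    def __init__(self, num_a, items):
        # Initialise the Thread machinery before start() is ever called.
        super().__init__()
        self._num_a = num_a
        self._items = items
        self.result = None

    def make_list(self):
        # A method of the class, so it can reach the instance's attributes.
        return [x * self._num_a for x in self._items]

    def run(self):
        # Executed in the new thread once start() is called.
        self.result = self.make_list()


w = Worker(2, [1, 2, 3])
w.start()
w.join()
print(w.result)
```

Single-underscore names are used here to avoid the name mangling discussed above, and the constructor arguments are passed explicitly, unlike x = test() in the question.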