Using a Python thread as a plain object

If I define a Python thread by extending the threading.Thread class and overriding run, I can then invoke run() instead of start() and have it execute in the calling thread instead of a separate one.
i.e.
import threading

class MyThread(threading.Thread):
    def run(self):
        # condition() and do_something() stand in for the real application logic
        while condition():
            do_something()
This code (1) will execute the run method in a separate thread:
t = MyThread()
t.start()
This code (2) will execute the run method in the current thread:
t = MyThread()
t.run()
Are there any practical disadvantages to this approach when writing code that can be executed either way? Could invoking run directly on a Thread object cause memory problems, performance issues or some other unpredictable behavior?
In other words, what are the differences (if any are notable; I guess some more memory will be allocated, but it should be negligible) between invoking code (2) on the MyThread class and on an otherwise identical class that extends object instead of threading.Thread?
I guess that some (if any) of the more low-level differences might depend on the interpreter. In case this is relevant, I'm mainly interested in CPython 3.*

There will be no difference in the behavior of run whether you're using a threading.Thread object, an object of a threading.Thread subclass, or an object of any other class that has a run method:
threading.Thread.start starts a new thread and then runs run in that thread.
run starts the activity in the calling thread, be it the main thread or another one.
If you call run in the main thread, the whole thread will be busy executing the task run is supposed to perform, and you won't be able to do anything else until the task finishes.
That said, no, there will be no notable differences: the run method behaves just like any other method and is executed in the calling thread.
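A minimal sketch makes this visible, using threading.current_thread() to report which thread actually executes the method:
import threading

class MyThread(threading.Thread):
    def run(self):
        # Report which thread is actually executing this method.
        print('run() executing in:', threading.current_thread().name)

t = MyThread()
t.start()   # prints something like "Thread-1": a new thread runs it
t.join()

t2 = MyThread()
t2.run()    # prints "MainThread": the calling thread runs it directly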

I looked into the code implementing the threading.Thread class in CPython 3. The __init__ method simply assigns some variables and does not do anything related to actually creating a new thread. Therefore we can assume that it is safe to use a threading.Thread object in the proposed manner.


Which objects are not destroyed upon Python interpreter exit?

According to the Python documentation:
It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.
I know that in older versions of Python cyclic references would be one of the examples of this behaviour; however, as I understand it, in Python 3 such cycles will successfully be destroyed upon interpreter exit.
I'm wondering what the cases are (as close to an exhaustive list as possible) in which the interpreter would not destroy an object upon exit.
All examples are implementation details - Python does not promise whether or not it will call __del__ for any particular objects on interpreter exit. That said, one of the simplest examples is with daemon threads:
import threading
import time

def target():
    time.sleep(1000)

class HasADel:
    def __del__(self):
        print('del')

x = HasADel()
threading.Thread(target=target, daemon=True).start()
Here, the daemon thread prevents the HasADel instance from being garbage collected on interpreter shutdown. The daemon thread doesn't actually do anything with that object, but Python can't clean up references the daemon thread owns, and x is reachable from those references.
When the interpreter exits normally, in such ways as the program ending or sys.exit being called, not all objects are guaranteed to be destroyed. There is probably some amount of logic to this, but not very simple logic. After all, the __del__ method is for freeing memory resources, not other resources (like network connections); that's what __enter__ and __exit__ are for.
Having said that, there are situations in which __del__ will most certainly not be called. The parallel here is atexit functions; they are usually run at exit. However:
Note: The functions registered via this module are not called when the program is killed by a signal not handled by Python, when a Python fatal internal error is detected, or when os._exit() is called.
atexit documentation
So, there are situations in which clean-up functions like __del__, __exit__, and functions registered with atexit will not be called:
The program is killed by a signal not handled by Python - if a program receives a signal to stop, like SIGINT or SIGQUIT, and it doesn't handle the signal, then it will be stopped.
A fatal Python interpreter error occurs.
os._exit() is called - the documentation says:
Exit the process with status n, without calling cleanup handlers, flushing stdio buffers, etc.
So it is pretty clear that __del__ will not be called.
In conclusion, the interpreter does not guarantee __del__ being called, but there are situations in which it will definitely not be called.
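As a minimal sketch of the os._exit case, neither the finalizer nor the atexit hook runs:
import atexit
import os

class HasADel:
    def __del__(self):
        print('del')

atexit.register(lambda: print('atexit handler'))

x = HasADel()
os._exit(0)  # exits immediately: no atexit handlers, no __del__, no buffer flushing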
After comparing the quoted sentence from the documentation and your title, I think you misunderstood what __del__ is and what it does.
You used the word "destroyed", and the documentation said __del__ may not get called in some situations... The thing is, all objects get deleted after the interpreter's process finishes. __del__ is not a destructor and has nothing to do with the destruction of objects. Even if a memory leak occurs in a process, operating systems (the ones I know at least: Linux, Windows, ...) will eventually reclaim that memory after the process finishes. So everything is destroyed/deleted! (here and here)
In normal cases, when these objects are about to get destroyed, __del__ (better known as a finalizer) gets called in the very last step of destruction. In the other cases mentioned by other answers, it doesn't get called.
That's why people say not to count on the __del__ method for cleaning up vital stuff, and to use a context manager instead. In some scenarios, __del__ may even revive the object by passing a reference around.
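A minimal sketch of that advice (Resource is a hypothetical class): __exit__ runs deterministically when the with block ends, whereas __del__ only runs if and when the object is finalized:
class Resource:
    def __enter__(self):
        print('acquire')
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs deterministically when the with block exits, even on error,
        # unlike __del__, which may never run at all.
        print('release')
        return False  # do not suppress exceptions

with Resource():
    pass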

Understanding a key difference between multiprocessing and threading in Python

I have written a program that uses threads, and I create an instance of a custom object with a run(p, q) method. I pass this run() method as the target for the thread, like this:
import threading

class MyClass(object):
    def run(self, p, q):
        # code here
        pass

obj = MyClass()
thrd = threading.Thread(target=obj.run, args=(a, b))  # a and b defined elsewhere
My thread starts by executing the run() method with the passed arguments, a and b. In my case, one of them is an Event that is eventually used to stop the thread. The run() method also has access to all the object's instance variables, which include other objects.
As I understand it, this works because a thread shares memory with the program that creates it.
My question is how this differs with multiprocessing, e.g.
proc = multiprocessing.Process(target=obj.run, args=(a, b))
I believe a process does not share memory, so am I able to do the same? How is the process able to access the whole obj object when it is just given a reference to one method? Can I pass an Event? If the created process gets a copy of the whole creating program's memory, what happens to things like open database connections? How does it connect with the original Event?
And a final question (thanks for bearing with me): is it necessary for the whole program to be duplicated (with all its imported modules etc.) in a created process? What if I want a minimal process that doesn't need as much as the main program?
Happy to receive any hints at answers, or a pointer to somewhere that describes multiprocessing in this amount of detail.
Thanks so much.
Julian
Each process has its own memory map! Two processes don't share their memory with each other.
Threads, by contrast, live inside the memory of the process they come from.
So no, memory is not shared: obj, including its run method, is copied from the parent process into the newly spawned one.
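A minimal sketch of the difference (the Counter class here is illustrative): a thread mutates the parent's object in place, while a process works on its own copy:
import multiprocessing
import threading

class Counter:
    def __init__(self):
        self.value = 0

    def run(self):
        self.value += 1

if __name__ == '__main__':
    obj = Counter()

    t = threading.Thread(target=obj.run)
    t.start()
    t.join()
    print(obj.value)  # 1: the thread shared the parent's memory

    p = multiprocessing.Process(target=obj.run)
    p.start()
    p.join()
    print(obj.value)  # still 1: the child process worked on its own copy of obj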

Python multiprocessing.Process object with multiprocessing.Manager creates multiple multiprocessing forks in Windows Task Manager

I am running Python 3.4.3 on Windows Standard Embedded 7. I have a class that inherits from multiprocessing.Process.
In the class's run method, I create a thread for the process object to start.
While watching Task Manager, specifically the Command Line column, when the process class is instantiated I see a '"from multiprocessing.spawn import spawn_main; spawn_main(parent_pid=XXXX, pipe_handle=XXXX)" --multiprocessing-fork' entry.
When the thread in the process starts, I see another pythonw.exe multiprocessing fork from the same parent process ID. When the thread finishes, the separate process ends.
Why does the creation of a thread in a separate process cause another multiprocessing fork to spawn?
Thanks for any insight. I will post code if it will help, but I figured I would ask more generically whether this is expected behavior.
EDIT
Sorry it took a bit to put some test code together to demonstrate the behavior I am seeing. Unfortunately, I neglected to mention that I was also passing a multiprocessing.Manager Namespace object to the process object. The code below demonstrates what I thought should happen: multiple threads spawn in the child process and only one multiprocessing fork is displayed in Task Manager.
import multiprocessing
import threading
import time

class Comm(multiprocessing.Process):
    def __init__(self):  # , namespace=None):
        multiprocessing.Process.__init__(self)
        # self.namespace = namespace
        self.comm_queue = multiprocessing.Queue()

    def talk(self):
        counter = 0
        while counter != 4:
            self.comm_queue.put('i am talking')
            time.sleep(2)
            counter += 1

    def yell(self):
        counter = 0
        while counter != 3:
            self.comm_queue.put('I AM YELLING')
            time.sleep(5)
            counter += 1

    def make_threads(self):
        self.talk_thread = threading.Thread(target=self.talk)
        self.yell_thread = threading.Thread(target=self.yell)

    def run(self):
        self.make_threads()
        self.talk_thread.start()
        self.yell_thread.start()
        while True:
            time.sleep(1)

if __name__ == '__main__':
    # test_manager = multiprocessing.Manager()
    # test_ns = test_manager.Namespace()
    test = Comm()  # namespace=test_ns)
    test.start()
    while True:
        message = test.comm_queue.get()
        print(message)
However, if you uncomment everything and pass in the Namespace object, I see two multiprocessing forks spawn. Why does this happen when the multiprocessing.Manager() / Namespace() is included with the process object?
multiprocessing.Manager works by spawning a separate Manager server process, which will run until the Manager is garbage collected:
Managers provide a way to create data which can be shared between different processes. A manager object controls a server process which manages shared objects. Other processes can access the shared objects by using proxies.
So, the two processes you see are expected; one is your multiprocessing.Process subclass, and the other is the multiprocessing.Manager server process.
This is an easily overlooked point.
To elaborate on the implication for future readers:
If a complex app makes use of multiple multiprocessing Queues, dicts and/or lists, and each time acquires them via a call to a new multiprocessing.Manager() object, it will end up with "too many" processes showing in the OS!
It did in mine; and thanks to Dano's input in this thread, the issue got resolved!
While acquiring dicts from a shared multiprocessing.Manager(), however, one noteworthy issue to beware of, as of Python 3.6.1, is: https://bugs.python.org/issue30256
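A minimal sketch of the fix (names are illustrative): create one Manager and obtain all shared objects from it, rather than constructing a new multiprocessing.Manager() per object:
import multiprocessing

if __name__ == '__main__':
    # One Manager means one server process, no matter how many
    # shared objects it hands out.
    manager = multiprocessing.Manager()
    task_queue = manager.Queue()
    shared_dict = manager.dict()
    shared_list = manager.list()
    # Calling multiprocessing.Manager() once per shared object would
    # instead spawn one extra server process per call.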

thread.join() being called and it's not me

I have overridden the .join() method when creating a subclass of threading.Thread. When I test my class with a test script it works fine; however, when using it in my program, the thread.join() method is being called over and over, and it's not me doing it. What is calling this method? No exceptions are being thrown as far as I can tell. Using inspect, the calling function seems to be _exitfunc, but I can't find any info on this.
My code is too long to post but can be found here.
If the calling function is _exitfunc, that means the join method is being called at program termination. That is to be expected, because the Python threading framework calls join on all running non-daemon threads as part of program termination.
The best explanation of _exitfunc is another Stack Overflow question: What is a python thread
If you don't want join() to be called when the program exits, make the thread a daemon:
t.daemon = True
Non-daemon threads will keep the process running until they all die.
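A minimal sketch with a placeholder worker: the daemon flag must be set before start(), and the interpreter will then exit without joining the thread:
import threading
import time

t = threading.Thread(target=time.sleep, args=(60,))
t.daemon = True  # must be set before start()
t.start()
# The program can now exit immediately; the interpreter will not
# wait for (or join) this daemon thread at shutdown.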

Python Terminated Thread Cannot Restart

I have a thread that gets executed when some action occurs. Given the logic of the program, the thread cannot possibly be started while another instance of it is still running. Yet when I call it a second time, I get a "RuntimeError: thread already started" error. I added a check to see if it is actually alive using the Thread.is_alive() function, and it is actually dead.
What am I doing wrong?
I can provide more details as are needed.
Threads cannot be restarted. You must re-create the Thread in order to start it again.
From the Python documentation:
start()
Start the thread's activity.
This must be called at most once per thread object. It arranges for the object's run() method to be invoked in a separate thread of control.
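A minimal sketch of the fix, with a placeholder worker function: construct a fresh Thread object for each start:
import threading

def work():
    print('working')

t = threading.Thread(target=work)
t.start()
t.join()

# A finished Thread cannot be restarted; build a new one instead.
t = threading.Thread(target=work)
t.start()
t.join()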
If you derive a class from threading.Thread, you can add a Thread.__init__(self) call at the end of your run method; then you'll be able to call start again, and the thread will automatically reinitialize itself when done.
You can try setting
thread._Thread__started = False
It isn't officially documented, so use it at your own risk! :)
