Unable to update a class variable with multiprocessing - python

I'm making a GUI application that tracks time spent on each foreground window. I attempted to do this with a loop for every process being monitored, like so:
import ctypes
from ctypes import wintypes
import multiprocessing
import time

import psutil

user32 = ctypes.windll.user32  # Windows API, used to query the foreground window

class processes(object):
    def __init__(self, name, pid):
        self.name = name
        self.pid = pid
        self.time_spent = 0
        self.time_active = 0
        p1 = multiprocessing.Process(target=self.loop, args=())
        p1.start()

    def loop(self):
        t = 0
        start_time = time.time()
        while True:
            # While the process is running, check whether the foreground window
            # (the window currently being used) belongs to this process
            h_wnd = user32.GetForegroundWindow()
            pid = wintypes.DWORD()
            user32.GetWindowThreadProcessId(h_wnd, ctypes.byref(pid))
            p = psutil.Process(pid.value)
            name = str(p.name())
            name2 = str(self.name)
            if name2 == name:
                t = time.time() - start_time
                # Log the total time the user spent using the window
                self.time_active += t
                self.time_spent = time.perf_counter()
            time.sleep(2)

    def get_time(self):
        print("{:.2f}".format(self.time_active) + " name: " + self.name)
I select the process I want in the GUI and find it by its name in a list. Once found, I call get_time(), which is supposed to print how long the selected process has been in the foreground.
def display_time(Lb2):
    for s in Lb2.curselection():
        for e in process_list:
            if Lb2.get(s) == e.name:
                e.get_time()
The problem is that time_active is 0 every time I print it.
I've debugged the program and can tell it's somewhat working (not perfectly: it still records time while the program is not in the foreground) and updating the variable inside the loop. However, when it comes to printing the value out, it remains 0. I think I'm having trouble understanding multiprocessing; could anyone clear up the confusion?

The simplest solution was offered by @TheLizzard: just use threading instead of multiprocessing:
import threading
...
#p1 = multiprocessing.Process(target=self.loop, args=())
p1 = threading.Thread(target=self.loop, args=())
But that doesn't explain why creating a process instead did not work. What happened is that your processes.__init__ code first created several attributes such as self.time_active and self.time_spent. This code executes in the main process. But when you execute the following two statements ...
p1 = multiprocessing.Process(target=self.loop, args=())
p1.start()
... the processes object that was created must be serialized and deserialized into the address space in which the new Process instance you just created will run. Consequently, when a statement such as self.time_active += t executes in the loop method, it updates the copy of self.time_active that "lives" in the address space of the subprocess. But the code that prints out the value of self.time_active executes in the main process's address space, and therefore prints only the original value of that attribute.
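A minimal sketch of that effect (hypothetical Counter class, not the poster's code): the child mutates its own deserialized copy, and the parent's copy is untouched:

import multiprocessing

class Counter:
    def __init__(self):
        self.n = 0

    def bump(self):
        self.n += 1
        print("in child:", self.n)    # prints 1

if __name__ == '__main__':
    c = Counter()
    p = multiprocessing.Process(target=c.bump)
    p.start()
    p.join()
    print("in parent:", c.n)          # still prints 0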
If you had to use multiprocessing because your loop method was CPU-intensive and you needed the parallelism with other processes, then the solution would be to create self.time_active and self.time_spent in shared memory, so that both the main process and the subprocess would be accessing the same shared values:
class processes(object):
    def __init__(self, name, pid):
        self.name = name
        self.pid = pid
        # Create shared floating point values:
        self.time_spent = multiprocessing.Value('f', 0)
        self.time_active = multiprocessing.Value('f', 0)
        ...

    def loop(self):
        ...
        self.time_active.value += t
        self.time_spent.value = time.perf_counter()
        ...

    def get_time(self):
        print("{:.2f}".format(self.time_active.value) + " name: " + self.name)


Difference in starting threading.Thread objects from a list in python3

I am trying to do an exercise about the use of multi-threading in python. This is the task: "Write a program that increments a counter shared by two or more threads up until a certain threshold. Consider various numbers of threads you can use and various initial values and thresholds. Every thread increases the value of the counter by one, if this is lower than the threshold, every 2 seconds."
My attempt at solving the problem is the following:
from threading import Thread
import threading
import time

lock = threading.Lock()

class para:
    def __init__(self, value):
        self.para = value

class myT(Thread):
    def __init__(self, nome, para, end, lock):
        Thread.__init__(self)
        self.nome = nome
        self.end = end
        self.para = para
        self.lock = lock

    def run(self):
        while self.para.para < self.end:
            self.lock.acquire()
            self.para.para += 1
            self.lock.release()
            time.sleep(2)
            print(self.nome, self.para.para)

para = para(1)
threads = []
for i in range(2):
    t = myT('Thread' + str(i), para, 15, lock)
    threads.append(t)

for i in range(len(threads)):
    threads[i].start()
    threads[i].join()

print('End code')
I have found an issue:
for i in range(len(threads)):
    threads[i].start()
    threads[i].join()
This for loop makes just one thread start while the others never run (in fact, the output is just the thread named 'Thread0' increasing the variable). Whereas if I type manually:
threads[0].start()
threads[1].start()
threads[0].join()
threads[1].join()
I get the correct output, meaning that both threads are working at the same time.
Writing the join outside the for loop, in a second loop of its own, seems to solve the issue, but I do not completely understand why:
for i in range(len(threads)):
    threads[i].start()

for i in range(len(threads)):
    threads[i].join()
I wanted to ask here for an explanation of the correct way to solve the task using multi-threading in Python.
Here's an edit of your code and some observations.
Threads share the same memory space, so there's no need to pass a reference to the Lock object; it can live in global space.
The Lock object supports __enter__ and __exit__, so it can be used as a context manager in a with statement.
In the first loop we build a list of all the threads and start them. Once they're all started, we use a second loop to join them. Joining each thread immediately after starting it, as in your original loop, blocks the main thread until that thread has finished before the next one even starts, which is why only one thread appeared to run at a time.
So now it looks like this:
from threading import Thread, Lock

class para:
    def __init__(self, value):
        self.para = value

class myT(Thread):
    def __init__(self, nome, para, end):
        super().__init__()
        self.nome = nome
        self.end = end
        self.para = para

    def run(self):
        while self.para.para < self.end:
            with LOCK:
                self.para.para += 1
            print(self.nome, self.para.para)

para = para(1)
LOCK = Lock()
threads = []
NTHREADS = 2
for i in range(NTHREADS):
    t = myT(f'Thread-{i}', para, 15)
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print('End code')
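One caveat about this version: the while condition reads self.para.para without holding the lock, so two threads can both see a value just below the threshold and both increment, overshooting end by one. A sketch of a drop-in replacement for run that moves the check under the lock:

    def run(self):
        while True:
            with LOCK:
                if self.para.para >= self.end:
                    break
                self.para.para += 1
                current = self.para.para
            print(self.nome, current)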

Multiprocess a generator method only working once for each process

I have a generator that looks kind of like this:
import random
import time

class GeneratorClass():
    def __init__(self, objClient):
        self.clienteGen = objClient

    def generatorDataClient(self):
        amount = 0
        while True:
            amount += random.randint(0, 2000)
            yield amount
            sleep = random.choice([1, 2, 3])
            print("sleep " + str(sleep))
            time.sleep(sleep)
Then I iterate through it, which works: it calls the current_mean() method each time new data is generated.
def iterate_clients(pos):
    genobject4 = GeneratorClass(client_list[pos])
    generator4 = genobject4.generatorDataClient()
    current_client = genobject4.default_client
    account1 = current_client.account
    cnt = 0
    acc_mean = 0
    for item in generator4:
        # We call a function previously defined
        acc_mean, cnt = account1.current_mean(acc_mean, item, cnt)
        print("moving average : " + str(acc_mean), str(cnt))

# iterate_clients(2)
And it works: you give it a valid client, it starts the generation, which computes a moving average, and since the generator is defined with a while True loop it never stops.
Now I wanted to parallelize this, and I managed to get it to work, but only once:
names = ["James", "Anna"]
client_list = [Cliente(name) for name in names]
array_length = len(client_list)
import multiprocessing
if __name__ == '__main__':
for i in range(array_length):
p = multiprocessing.Process(target=iterate_clients, args=(i,))
p.start()
But instead each process starts, iterates exactly once, then stops. The result is the following:
calling object with ID: 140199258213624
calling the generator
moving average : 4622.0 1
calling object with ID: 140199258211160
sleep 2
calling the generator
moving average : 8013.0 1
sleep 1
I am sure the code can be improved, but could it be that I am missing some information on how to parallelize this particular problem?
Edit:
Thanks to this answer I tried changing the loop from for i in range(array_length): to while True:
And I got something new:
calling object 140199258211160
calling the generator
calling object 140199258211160
moving average : 7993.0 1
calling the generator
sleep 3
calling object 140199258211160
calling the generator
calling object 140199258211160
moving average : 8000.0 1
moving average : 7869.0 1
sleep 3
calling the generator
And it never stops. From this I gather that I am making a big mistake: only one process gets created, and there seems to be a race condition, since the moving average jumps back and forth, whereas in a normal single process it only goes up.
The issue here is probably that the child processes are terminated once the main process finishes, so they never get a chance to fully run.
Using .join() will make the main process wait for the child processes.
...
import multiprocessing

procs = []
if __name__ == '__main__':
    for i in range(array_length):
        p = multiprocessing.Process(target=iterate_clients, args=(i,))
        p.start()
        procs.append(p)  # Hold a reference to each child process

    for proc in procs:
        proc.join()  # Wait for each child process
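If the parent ever needs the computed averages back, note that they live only in each child's address space, so they must be sent over an explicit channel such as a multiprocessing.Queue. A minimal, self-contained sketch (the real generator and current_mean are replaced by stand-ins):

import multiprocessing
import random
import time

def iterate_clients(pos, results):
    acc_mean, cnt = 0.0, 0
    while True:
        item = random.randint(0, 2000)       # stand-in for the data generator
        cnt += 1
        acc_mean += (item - acc_mean) / cnt  # incremental moving average
        results.put((pos, acc_mean, cnt))    # report back to the parent
        time.sleep(random.choice([1, 2, 3]))

if __name__ == '__main__':
    results = multiprocessing.Queue()
    for i in range(2):
        multiprocessing.Process(target=iterate_clients, args=(i, results), daemon=True).start()
    for _ in range(10):                      # read a few reports, then let the daemons die
        pos, mean, cnt = results.get()
        print("client", pos, "moving average:", round(mean, 1), cnt)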

Can I assume my threads are done when threading.active_count() returns 1?

Given the following class:
from abc import ABCMeta, abstractmethod
from time import sleep
import threading
from threading import active_count, Thread

class ScraperPool(metaclass=ABCMeta):
    Queue = []
    ResultList = []

    def __init__(self, Queue, MaxNumWorkers=0, ItemsPerWorker=50):
        # Initialize attributes
        self.MaxNumWorkers = MaxNumWorkers
        self.ItemsPerWorker = ItemsPerWorker
        self.Queue = Queue  # For testing purposes.

    def initWorkerPool(self, PrintIDs=True):
        for w in range(self.NumWorkers()):
            Thread(target=self.worker, args=(w + 1, PrintIDs,)).start()
            sleep(1)  # Explicitly wait one second for this worker to start.

    def run(self):
        self.initWorkerPool()
        # Wait until all workers (i.e. threads) are done.
        while active_count() > 1:
            print("Active threads: " + str(active_count()))
            sleep(5)
        self.HandleResults()

    def worker(self, id, printID):
        if printID:
            print("Starting worker " + str(id) + ".")
        while len(self.Queue) > 0:
            self.scraperMethod()
        if printID:
            print("Worker " + str(id) + " is quitting.")
        # TODO: kill this thread.
        return

    def NumWorkers(self):
        return 1  # Simplified for testing purposes.

    @abstractmethod
    def scraperMethod(self):
        pass

class TestScraper(ScraperPool):
    def scraperMethod(self):
        # print("I am scraping.")
        # print("Scraping. Threads#: " + str(active_count()))
        temp_item = self.Queue[-1]
        self.Queue.pop()
        self.ResultList.append(temp_item)

    def HandleResults(self):
        print(self.ResultList)

ScraperPool.register(TestScraper)

scraper = TestScraper(Queue=["Jaap", "Piet"])
scraper.run()
print(threading.active_count())
# print(scraper.ResultList)
When all the threads are done, there's still one active thread - threading.active_count() on the last line gets me that number.
The active thread is <_MainThread(MainThread, started 12960)> - as printed with threading.enumerate().
Can I assume that all my threads are done when active_count() == 1?
Or can, for instance, imported modules start additional threads, so that my threads are actually already done while active_count() > 1? That same condition drives the loop in the run method.
You can assume that your threads are done when active_count() reaches 1. The problem is, if any other module creates a thread, you'll never get to 1. You should manage your threads explicitly.
Example: You can put the threads in a list and join them one at a time. The relevant changes to your code are:
def __init__(self, Queue, MaxNumWorkers=0, ItemsPerWorker=50):
    # Initialize attributes
    self.MaxNumWorkers = MaxNumWorkers
    self.ItemsPerWorker = ItemsPerWorker
    self.Queue = Queue  # For testing purposes.
    self.WorkerThreads = []

def initWorkerPool(self, PrintIDs=True):
    for w in range(self.NumWorkers()):
        thread = Thread(target=self.worker, args=(w + 1, PrintIDs,))
        self.WorkerThreads.append(thread)
        thread.start()
        sleep(1)  # Explicitly wait one second for this worker to start.

def run(self):
    self.initWorkerPool()
    # Wait until all workers (i.e. threads) are done. Joining in order,
    # so some threads further down the list may finish first, but we
    # will get to all of them eventually.
    while self.WorkerThreads:
        self.WorkerThreads.pop(0).join()
    self.HandleResults()
According to the docs, active_count() includes the main thread, so if you're at 1 then you're most likely done, but if anything else in your program spawns new threads you may be done before active_count() ever reaches 1.
I would recommend implementing an explicit join method on your ScraperPool: keep track of your workers and explicitly join them back to the main thread when needed, instead of polling active_count().
Also, remember the GIL...
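A related hazard in the original code: the worker loop checks len(self.Queue) > 0 and then pops in a separate step, so with more than one worker, two threads could pass the check with one item left and one of them would pop from an empty list. The standard-library queue.Queue makes the check-and-take atomic; a minimal sketch, not the poster's actual scraper:

import queue
from threading import Thread

work = queue.Queue()
results = []

def worker():
    while True:
        try:
            item = work.get_nowait()   # atomic check-and-take
        except queue.Empty:
            return                     # queue drained, worker exits
        results.append(item)           # list.append is thread-safe in CPython
        work.task_done()

for name in ["Jaap", "Piet"]:
    work.put(name)

threads = [Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()                           # explicit join instead of active_count()
print(results)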

Python GUI stays frozen waiting for thread code to finish running

I have a Python GUI program that needs to do the same task in several threads. The problem is that when I start the threads they don't execute in parallel but sequentially: the first one executes and ends, then the second one, and so on. I want them to run independently.
The main components are:
1. Menu (view)
2. ProcesStarter (controller)
3. Process (controller)
The Menu is where you click the "Start" button, which calls a function on ProcesStarter.
The ProcesStarter creates Process objects and threads, and starts all threads in a for loop.
Menu:
class VotingFrame(BaseFrame):
    def create_widgets(self):
        self.start_process = tk.Button(root, text="Start Process", command=lambda: self.start_process())
        self.start_process.grid(row=3, column=0, sticky=tk.W)

    def start_process(self):
        procesor = XProcesStarter()
        procesor_thread = Thread(target=procesor.start_process())
        procesor_thread.start()
ProcesStarter:
class XProcesStarter:
    def start_process(self):
        print "starting new process..."
        # thread count
        thread_count = self.get_thread_count()
        # initialize Process objects with data, and start threads
        for i in range(thread_count):
            vote_process = XProcess(self.get_proxy_list(), self.get_url())
            t = Thread(target=vote_process.start_process())
            t.start()
Process:
class XProcess():
    def __init__(self, proxy_list, url, browser_show=False):
        # init code

    def start_process(self):
        # code for process
When I press the "Start Process" button, the GUI is locked until both threads finish executing.
The idea is that the threads should work in the background, in parallel.
You call procesor.start_process() immediately when specifying it as the target of the Thread:
#use this
procesor_thread = Thread(target=procesor.start_process)
#not this
procesor_thread = Thread(target=procesor.start_process())
# this is called right away ^
If you call it right away, it returns None, which is a valid target for Thread (the thread just does nothing). That is why everything happens sequentially: the work runs in the main thread during the call, and the threads themselves do nothing.
One way to use a class as the target of a thread is to use the class as the target, and the arguments to the constructor as args.
from threading import Thread
from time import sleep
from random import randint

class XProcesStarter:
    def __init__(self, thread_count):
        print("starting new process...")
        self._i = 0
        for i in range(thread_count):
            t = Thread(
                target=XProcess,
                args=(self.get_proxy_list(), self.get_url())
            )
            t.start()

    def get_proxy_list(self):
        self._i += 1
        return "Proxy list #%s" % self._i

    def get_url(self):
        self._i += 1
        return "URL #%d" % self._i

class XProcess():
    def __init__(self, proxy_list, url, browser_show=False):
        r = 0.001 * randint(1, 5000)
        sleep(r)
        print(proxy_list)
        print(url)

def main():
    t = Thread(target=XProcesStarter, args=(4,))
    t.start()

if __name__ == '__main__':
    main()
This code runs in both Python 2 and Python 3.
The reason is that the target of a Thread object must be a callable (search for "callable" and "__call__" in the Python documentation for a complete explanation).
Edit: the other way is explained in other people's answers (see Tadhg McDonald-Jensen's).
I think your issue is that in both places where you start threads, you're actually calling the method you want to pass as the target. That runs its code in the main thread (and tries to start the new thread on the return value, if any, once it's done).
Try:
procesor_thread = Thread(target=procesor.start_process) # no () after start_process
And:
t = Thread(target=vote_process.start_process) # no () here either
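One more point worth noting, a general tkinter constraint rather than something from the answers above: even once the worker threads run in the background, they should not touch tkinter widgets directly. The usual pattern is to have workers post results to a queue and let the main thread poll it with root.after. A minimal sketch with hypothetical names:

import queue
import time
import tkinter as tk
from threading import Thread

root = tk.Tk()
label = tk.Label(root, text="waiting...")
label.pack()
results = queue.Queue()

def worker(n):
    time.sleep(1)                      # simulate slow work off the GUI thread
    results.put("worker %d done" % n)  # hand the result to the main thread

def poll():
    try:
        label.config(text=results.get_nowait())  # GUI update on the main thread
    except queue.Empty:
        pass
    root.after(100, poll)              # check again in 100 ms

for n in range(3):
    Thread(target=worker, args=(n,), daemon=True).start()
poll()
root.mainloop()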

Program hangs upwards of 60 seconds in between the __init__ and the run method starting?

I've encountered an odd problem. As best as I can tell, the actual act of assigning data to a variable is causing a massive delay in my program. I measured the time it takes to actually build the data that I want to assign to an instance variable, and it averages right around .7 seconds. It is only when I try to assign it (e.g. self.data = data) that the massive delay happens.
EDIT:
The above assumption was incorrect. It was not hanging during the variable assignment. I added a timer before and after the call to build_data and the time is negligible.
The delay is somehow in between when __init__ finishes and run is called.
Updated Class
Here is the code for the class. I added a timer which starts at the end of __init__ and stops at the first call to run(). There is a 50 second delay between the two!
class Worker(multiprocessing.Process):
    def __init__(self, queue, image):
        multiprocessing.Process.__init__(self)
        self.queue = queue
        self.data = self.build_data(image)
        self.start_time = time.time()

    def run(self):
        print 'I finally reached the run statement!'
        print "Time taken:", time.time() - self.start_time
        print "Exiting {}".format(self.name)
And the build_data function.
def build_data(self, im):
    start_time = time.time()
    size, data = im.size, list(im.getdata())
    data = [data[x:size[0] + x] for x in range(0, len(data), size[0])]
    print 'Process time:', time.time() - start_time
    return data
Any one know what could be causing this?
Worker pool code:
if __name__ == '__main__':
    im = ImageGrab.grab()
    queue = multiprocessing.Queue()
    workers = []
    for i in range(1):
        w = Worker(queue, im)
        w.start()
        workers.append(w)
    print 'waiting for workers to join'
    for i in workers:
        i.join()
Partial Solution:
At torek's suggestion, I looked through the programming guidelines for multiprocessing, which state that all arguments passed to Process.__init__ must be picklable. My understanding of the internals is very fuzzy, but I guess the delay was the giant nested list structure being pickled and unpickled as the process object was handed to the new process (I could be completely wrong about that, though).
Moving the data out of __init__ and assigning it later cleared up the problem completely. No more delay.
Working code:
class Worker(multiprocessing.Process):
    data = None

    def __init__(self, queue, image):
        multiprocessing.Process.__init__(self)
        self.queue = queue
        self.img = image
        self.start_time = time.time()

    def run(self):
        self.assign_data(self.img)
        print 'I finally reached the run statement!'
        print "Time taken:", time.time() - self.start_time
        print "Exiting {}".format(self.name)

    @classmethod
    def assign_data(cls, im):
        size, data = im.size, list(im.getdata())
        cls.data = [
            data[x:size[0] + x] for x in range(0, len(data), size[0])
        ]
So, I just moved the data variable to the class scope, and then turned build_data into a @classmethod. All works swimmingly now.
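For completeness, another way to avoid the pickling cost is to pass only compact, cheap-to-pickle data to the process and rebuild the heavy structure in the child. A minimal sketch using PIL's tobytes/frombytes (my illustration, not the poster's code; assumes Pillow and a platform where ImageGrab works):

import multiprocessing
import time

from PIL import Image, ImageGrab

class Worker(multiprocessing.Process):
    def __init__(self, queue, image):
        multiprocessing.Process.__init__(self)
        self.queue = queue
        # Raw bytes plus mode/size pickle far faster than a nested list of pixels
        self.raw, self.mode, self.size = image.tobytes(), image.mode, image.size
        self.start_time = time.time()

    def run(self):
        im = Image.frombytes(self.mode, self.size, self.raw)  # rebuild in the child
        data = list(im.getdata())
        rows = [data[x:x + self.size[0]] for x in range(0, len(data), self.size[0])]
        print("Rebuilt %d rows in %.2fs" % (len(rows), time.time() - self.start_time))

if __name__ == '__main__':
    w = Worker(multiprocessing.Queue(), ImageGrab.grab())
    w.start()
    w.join()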
