Global variable doesn't update in a loop in an external process - Python

I have been trying to pass data into a loop running in another process by creating an empty object to put the data in. When I change the content of the object, the process doesn't seem to notice and keeps printing the same value.
Here is some code I have tried:
from multiprocessing import Process
from time import sleep

class Carrier():  # Empty object to exchange data
    pass

def test():
    global carry
    while True:
        print(carry.content)
        sleep(0.5)

carry = Carrier()
carry.content = "test"  # Giving the value I want to share

p = Process(target=test, args=())
p.start()

while True:
    carry.content = input()  # Change the value on the object
I have also tried deleting the object each iteration and redefining it in the next loop, but that doesn't seem to have any effect; the process keeps the initial "test" value, the one present when it was started.

I am assuming that you are running this program on Windows, where a child process does not share its parent's memory: it gets its own copy of the program state. So you can't share data between processes like this.
On Linux the spawned processes are forked children of the parent process, so they inherit the parent's data as it was at fork time, but changes made after that point are still not reflected in the other process.
To share data between parent and child processes irrespective of the underlying OS, see this link - How to share data between Python processes?
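As one concrete (hedged) illustration of that idea, the sketch below uses a multiprocessing.Manager namespace as the shared carrier; the proxy object is passed to the child explicitly instead of relying on a global, so updates made by the parent become visible in the child. This is my own minimal example, not code from the linked answer.

# Minimal sketch: share the carrier through a Manager proxy instead of a global.
from multiprocessing import Process, Manager
from time import sleep

def test(carry):
    while True:
        print(carry.content)      # reads go through the manager proxy
        sleep(0.5)

if __name__ == '__main__':
    manager = Manager()
    carry = manager.Namespace()   # proxy object visible to both processes
    carry.content = "test"

    p = Process(target=test, args=(carry,))
    p.start()

    while True:
        carry.content = input()   # the child now sees this change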

Related

How to make a secondary process continually update a variable through ProcessPoolExecutor in python

Well met!
I'm trying to use pyautogui to run some simple checks. I'm attempting to make the main process detect a visual input, then start a subprocess that continually updates a shared variable with the Y position of a different image as it moves across the screen until it disappears.
Unfortunately I'm barely a programmer, so I keep getting stuck on the execution and wanted to ask for help. This is the code I wrote:
import pyautogui
import time
import importlib
foobar = importlib.import_module("cv2")
foocat = importlib.import_module("concurrent")
import numpy
import concurrent.futures

def CheckPositionInput():
    Checked = False
    global XImage, YImage, XMark, YMark, AreaOnScreen
    while not Checked:
        print('Searching')
        if pyautogui.locateCenterOnScreen('Area.png', confidence=0.8) != None:
            Checked = True
            AreaOnScreen = True
            XMark, YMark = pyautogui.locateCenterOnScreen('Area.png', confidence=0.8)

def CheckPositionImage():
    global XImage, YImage, XMark, YMark, AreaOnScreen
    print('start')
    while not AreaOnScreen:
        print('Waiting')
    while AreaOnScreen:
        if pyautogui.locateCenterOnScreen('Image.png', confidence=0.6) != None:
            XMark, YMark = pyautogui.locateCenterOnScreen('Image.png', confidence=0.6)
            print(YMark)
            print('Checking')

with concurrent.futures.ProcessPoolExecutor() as executor:
    CheckingInput = executor.submit(CheckPositionInput)
    CheckingImage = executor.submit(CheckPositionImage)
print(XMark, YMark)
print(time.time() - startingtime)
The problems I've run into range from the while loop in CheckPositionImage dying after a single iteration, to that same loop getting stuck and stalling the position-checking process, and no matter what I try I can't manage to read the crucial YMark variable properly outside the process.
It's important to understand that global variables are not read/write sharable across multiple processes. A child process can possibly inherit such a variable value (depends on what platform you are running) and read its value, but once a process assigns a new value to that variable, this change is not reflected back to any other process. This is because every process runs in its own address space and can only modify its copy of a global variable. You would need to use instead shared memory variables created by the main process and passed to its child processes. But let's ignore this fundamental problem for now and assume that global variables were sharable.
If I follow your code correctly, this is what you appear to be doing:
1. The main process submits two tasks to a multiprocessing pool to be processed by worker functions CheckPositionInput and CheckPositionImage, and then waits for both tasks to complete before printing out global variables XMark and YMark, presumably set by the CheckPositionImage function.
2. CheckPositionImage is effectively doing nothing until CheckPositionInput sets global variable AreaOnScreen to True, which only occurs after the call pyautogui.locateCenterOnScreen('Area.png', confidence=0.8) returns a value that is not None. When this occurs, Checked is set to True and the CheckPositionInput loop terminates, effectively ending that task.
3. When variable AreaOnScreen is set to True (in step 2 above), function CheckPositionImage finally enters a loop calling pyautogui.locateCenterOnScreen('Image.png', confidence=0.6). When this call returns a value that is not None, a couple of print statements are issued and the loop is re-iterated.
To the extent that my analysis is correct, I have a few comments:
1. This CheckPositionImage task never ends, since variable AreaOnScreen is never reset to False and no return or break statement is issued in the loop. I assume this is an oversight and that once we are returned a non-None value from our call to pyautogui.locateCenterOnScreen, we should return. My assumption is based on the fact that without this termination occurring, the main process's block beginning with concurrent.futures.ProcessPoolExecutor() as executor: will never complete (there is an implicit wait for all submitted tasks to complete) and you will therefore never fall through to the subsequent print statements.
2. You never initialize variable startingtime.
3. Function CheckPositionInput sets global variables XMark and YMark, whose values are never referenced by either the main process or function CheckPositionImage. What is the point in calling pyautogui.locateCenterOnScreen a second time with identical arguments to set these variables that are never read?
4. You have processes running, but the actual processing is essentially sequential: the main process does nothing until both child processes it has created end, and one child process does nothing useful until the other child process sets a flag as it terminates. I therefore see no reason for using multiprocessing at all. Your code could simply be the following (note that I have renamed variables and functions according to Python's PEP 8 coding conventions):
import pyautogui
import time
# What is the purpose of these next 3 commented-out statements?
#import importlib
#foobar = importlib.import_module("cv2")
#foocat = importlib.import_module("concurrent")

def search_and_check():
    print('Searching...')
    while True:
        if pyautogui.locateCenterOnScreen('Area.png', confidence=0.8) != None:
            # What is the purpose of this second call, which I have commented out?
            # Note that the values set, i.e. xMark and yMark, are never referenced.
            #xMark, yMark = pyautogui.locateCenterOnScreen('Area.png', confidence=0.8)
            break
    print('Checking...')
    while True:
        result = pyautogui.locateCenterOnScreen('Image.png', confidence=0.6)
        if result != None:
            return result

starting_time = time.time()
xMark, yMark = search_and_check()
print(xMark, yMark)
print(time.time() - starting_time)
Could/should the two different calls to pyautogui.locateCenterOnScreen be done in parallel?
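If you did decide the two checks should genuinely run in parallel, the sketch below (my own addition, not part of the original answer, with hypothetical function names) shows how explicit shared-memory values could replace the globals; the pyautogui calls are left as comments since only the sharing mechanism is being illustrated.

# Sketch only: pass shared ctypes values to the workers instead of using globals.
import multiprocessing

def check_area(area_on_screen):
    # ... locate 'Area.png' with pyautogui here ...
    area_on_screen.value = True   # signal the other worker

def check_image(area_on_screen, y_mark):
    while not area_on_screen.value:
        pass                      # busy-wait until check_area sets the flag
    # ... locate 'Image.png' with pyautogui here and update y_mark ...
    y_mark.value = 123.0          # placeholder for the real Y position

if __name__ == '__main__':
    area_on_screen = multiprocessing.Value('b', False)  # shared boolean flag
    y_mark = multiprocessing.Value('d', 0.0)            # shared Y coordinate
    p1 = multiprocessing.Process(target=check_area, args=(area_on_screen,))
    p2 = multiprocessing.Process(target=check_image, args=(area_on_screen, y_mark))
    p1.start(); p2.start()
    p1.join(); p2.join()
    print(y_mark.value)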

multiprocessing - child process constantly sending back results and keeps running

Is it possible to have a few child processes running some calculations and sending the results to the main process (e.g. to update a PyQt UI) while the processes keep running, so that after a while they send back data and update the UI again?
With multiprocessing.Queue, it seems like the data can only be sent back after the process has terminated.
So I wonder whether this case is possible or not.
I don't know what you mean by "with multiprocessing.Queue, it seems like the data can only be sent back after the process has terminated". This is exactly the use case that multiprocessing.Queue was designed for.
PyMOTW is a great resource for a whole load of Python modules, including Multiprocessing. Check it out here: https://pymotw.com/2/multiprocessing/communication.html
A simple example of how to send ongoing messages from a child to the parent using multiprocessing and loops:
import multiprocessing

def child_process(q):
    for i in range(10):
        q.put(i)
    q.put("done")  # tell the parent process we've finished

def parent_process():
    q = multiprocessing.Queue()
    child = multiprocessing.Process(target=child_process, args=(q,))
    child.start()
    while True:
        value = q.get()
        if value == "done":  # no more values from child process
            break
        print(value)
    # do other stuff, child will continue to run in separate process
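If the child is meant to keep running indefinitely and stream intermediate results back, one possible variation (my own sketch, not from the original answer) pairs the queue with a multiprocessing.Event so the parent can tell the child when to stop:

# Sketch only: the child keeps producing results until the parent sets a stop event.
import multiprocessing
import time

def worker(q, stop_event):
    i = 0
    while not stop_event.is_set():
        q.put(i)          # send an intermediate result to the parent
        i += 1
        time.sleep(0.5)   # stand-in for a chunk of real calculation

if __name__ == '__main__':
    q = multiprocessing.Queue()
    stop_event = multiprocessing.Event()
    child = multiprocessing.Process(target=worker, args=(q, stop_event))
    child.start()
    for _ in range(5):
        print(q.get())    # e.g. update the UI here
    stop_event.set()      # ask the child to finish
    child.join()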

Is this Python code a safe way to use multi-threading

An application I use for graphics has an embedded Python interpreter - It works exactly the same as any other Python interpreter except there are a few special objects.
Basically I am trying to use Python to download a bunch of images and perform other network and disk I/O. If I do this without multithreading, my application will freeze (i.e. videos quit playing) until the downloads are finished.
To get around this I am trying to use multi-threading. However, I can not touch any of the main process.
I have written this code. The only parts unique to the program are commented. me.store / me.fetch is basically a way of getting a global variable. op('files') refers to a global table.
These are the two things "in the main process" that can only be touched in a thread-safe way. I am not sure if my code does this.
I would appreciate any input as to why (or why not) this code is thread-safe, and how I can access the global variables in a thread-safe way.
One thing I am worried about is how the counter is fetched multiple times by many threads. Since it is only updated after the file is written, could this cause a race condition where different threads access the counter with the same value (and then don't store the incremented value correctly)? Also, what happens to the counter if the disk write fails?
from urllib import request
import threading, queue, os

url = 'http://users.dialogfeed.com/en/snippet/dialogfeed-social-wall-twitter-instagram.json?api_key=ac77f8f99310758c70ee9f7a89529023'
imgs = [
    'http://search.it.online.fr/jpgs/placeholder-hollywood.jpg.jpg',
    'http://www.lpkfusa.com/Images/placeholder.jpg',
    'http://bi1x.caltech.edu/2015/_images/embryogenesis_placeholder.jpg'
]

def get_pic(url):
    # Fetch image data
    data = request.urlopen(url).read()
    # This is the part I am concerned about: what if multiple threads fetch the counter before it is updated below?
    # What happens if the file write fails?
    counter = me.fetch('count', 0)
    # Download the file
    with open(str(counter) + '.jpg', 'wb') as outfile:
        outfile.write(data)
    file_name = 'file_' + str(counter)
    path = os.getcwd() + '\\' + str(counter) + '.jpg'
    me.store('count', counter + 1)
    return file_name, path

def get_url(q, results):
    url = q.get_nowait()
    file_name, path = get_pic(url)
    results.append([file_name, path])
    q.task_done()

def fetch():
    # Clear the table
    op('files').clear()
    results = []
    url_q = queue.Queue()
    # Simulate getting a JSON feed
    print(request.urlopen(url).read().decode('utf-8'))
    for img in imgs:
        # Add url to queue and start a thread
        url_q.put(img)
        t = threading.Thread(target=get_url, args=(url_q, results,))
        t.start()
    # Wait for threads to finish before updating table
    url_q.join()
    for cell in results:
        op('files').appendRow(cell)
    return

# Start a thread so that the first http get doesn't block
thread = threading.Thread(target=fetch)
thread.start()
Your code doesn't appear to be safe at all. Key points:
Appending to results is unsafe -- two threads might try to append to the list at the same time.
Accessing and setting counter is unsafe -- a thread may fetch counter before another thread has stored the new counter value.
Passing a queue of urls is redundant -- just pass a new url to each job.
Another way (concurrent.futures)
Since you are using python 3, why not make use of the concurrent.futures module, which makes your task much easier to manage. Below I've written out your code in a way which does not require explicit synchronisation -- all the work is handled by the futures module.
from urllib import request
import os
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import count

url = 'http://users.dialogfeed.com/en/snippet/dialogfeed-social-wall-twitter-instagram.json?api_key=ac77f8f99310758c70ee9f7a89529023'
imgs = [
    'http://search.it.online.fr/jpgs/placeholder-hollywood.jpg.jpg',
    'http://www.lpkfusa.com/Images/placeholder.jpg',
    'http://bi1x.caltech.edu/2015/_images/embryogenesis_placeholder.jpg'
]

def get_pic(url, counter):
    # Fetch image data
    data = request.urlopen(url).read()
    # Download the file
    with open(str(counter) + '.jpg', 'wb') as outfile:
        outfile.write(data)
    file_name = 'file_' + str(counter)
    path = os.getcwd() + '\\' + str(counter) + '.jpg'
    return file_name, path

def fetch():
    # Clear the table
    op('files').clear()
    with ThreadPoolExecutor(max_workers=2) as executor:
        count_start = me.fetch('count', 0)
        # reserve these numbers for our tasks
        me.store('count', count_start + len(imgs))
        # separate fetching and storing is usually not thread safe
        # however, if only one thread modifies count (the one running fetch) then
        # this will be safe (same goes for the files variable)
        for cell in executor.map(get_pic, imgs, count(count_start)):
            op('files').appendRow(cell)

# Start a thread so that the first http get doesn't block
thread = threading.Thread(target=fetch)
thread.start()
If multiple threads modify count then you should use a lock when modifying count.
e.g.:
lock = threading.Lock()

def fetch():
    ...
    with lock:
        # Do not release the lock between accessing and modifying count.
        # Other threads wanting to modify count must use the same lock object (not
        # another instance of Lock).
        count_start = me.fetch('count', 0)
        me.store('count', count_start + len(imgs))
    # use count_start here
The only problem with this is that if one job fails for some reason, you will end up with a missing file number. Any raised exception will also interrupt the executor doing the mapping, by re-raising the exception there -- so you can then handle it if needed.
You could avoid using a counter by using the tempfile module to find somewhere to temporarily store a file before moving the file somewhere permanent.
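As a rough illustration of that tempfile idea (my own sketch, not part of the original answer, with hypothetical names), the module can hand out unique filenames without any shared counter:

# Sketch only: let tempfile pick a unique name, then move the file into place.
import os
import shutil
import tempfile

def save_image(data, dest_dir):
    # delete=False so the file survives after the handle is closed
    with tempfile.NamedTemporaryFile(suffix='.jpg', delete=False) as tmp:
        tmp.write(data)
        tmp_path = tmp.name
    final_path = os.path.join(dest_dir, os.path.basename(tmp_path))
    shutil.move(tmp_path, final_path)  # effectively a rename on the same filesystem
    return final_path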
Remember to look at multiprocessing and threading if you are new to python multi-threading stuff.
Your code seems OK, though the code style is not very easy to read. You need to run it to see whether it works as you expect.
with will make sure your lock is released. The acquire() method will be called when the block is entered, and release() will be called when the block is exited.
If you add more threads, make sure they are not taking the same URL from the queue and that there is no race condition (it seems this is handled by Queue.get(), but you need to run it to verify). Remember, the threads all share the same process, so almost everything is shared. You don't want two threads handling the same URL.
The Lock doesn't do anything at all. You only have one thread that ever calls download_job - that's the one you assigned to my_thread. The other one, the main thread, calls offToOn and is finished as soon as it reaches the end of that function. So there is no second thread that ever tries to acquire the lock, and hence no second thread ever gets blocked. The table you mention is, apparently, in a file that you explicitly open and close. If the operating system protects this file against simultaneous access from different programs, you can get away with this; otherwise it is definitely unsafe because you haven't accomplished any thread synchronization.
Proper synchronization between threads requires that different threads have access to the SAME lock; i.e., one lock is accessed by multiple threads. Also note that "thread" is not a synonym for "process." Python supports both. If you're really supposed to avoid accessing the main process, you have to use the multiprocessing module to launch and manage a second process.
And this code will never exit, since there is always a thread running in an infinite loop (in threader).
Accessing a resource in a thread-safe manner requires something like this:
a_lock = Lock()

def use_resource():
    with a_lock:
        # do something
        ...
The lock is created once, outside the function that uses it. Every access to the resource in the whole application, from whatever thread, must acquire the same lock, either by calling use_resource or some equivalent.

How to stop a thread in python

I am using threads in my Python card application.
Whenever I press the refresh button I call a function in a thread; when that function runs, it calls another function inside the main function.
What I want is that whenever the child function ends, the thread is killed or stopped, without closing the application or pressing Ctrl+C.
I started the thread like this:
def on_refresh_mouseClick(self, event):
    thread.start_new_thread(self.readalways, ())
in the "readalways" function i am using while loop, in that while loop whenever the condition satisfies it will call continuousread() function. check it:
def readalways(self):
    while 1:
        cardid = self.parent.uhf.readTagId()
        print "the tag id is", cardid
        self.status = self.parent.db.checktagid(cardid)
        if len(self.status) != 0:
            break
    print "the value is", self.status[0]['id']
    self.a = self.status[0]['id']
    self.continuesread()

def continuesread(self):
    .......
    .......
After the continuesread function finishes, the values in that thread should be cleared.
Because if I click the refresh button again, a new thread starts, but some of the values are coming from the old thread.
So I want to kill the old thread when it completes the continuesread function.
Please note that different threads from the same process share their memory; e.g. when you access self.status, you (probably) manipulate an object shared within the whole process. Thus, even if your threads are killed when finishing continuesread (which they probably are), the manipulated object's state will still remain the same.
You could either
hold the status in a local variable instead of an attribute of self,
initialize those attributes when entering readalways,
or save this state in the local storage of a thread-local object (threading.local), which is not shared (see the documentation).
The first one seems to be the best as far as I can see.
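If you went with the third option instead, a minimal sketch (my own addition, assuming the card-reading code can be refactored this way) would look like:

# Sketch only: each thread gets its own independent copy of .status
import threading

_local = threading.local()

def readalways_worker():
    _local.status = []   # re-initialised per thread, never seen by other threads
    # ... fill _local.status from the card reader here ...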

How to use multiprocessing with class instances in Python?

I am trying to create a class that can run a separate process to go do some work that takes a long time, launch a bunch of these from a main module, and then wait for them all to finish. I want to launch the processes once and then keep feeding them things to do rather than creating and destroying processes. For example, maybe I have 10 servers running the dd command, then I want them all to scp a file, etc.
My ultimate goal is to create a class for each system that keeps track of the information for the system it is tied to, like IP address, logs, runtime, etc. But that class must be able to launch a system command and then return execution back to the caller while that system command runs, and follow up with the result of the system command later.
My attempt is failing because I cannot send an instance method of a class over the pipe to the subprocess via pickle. Those are not picklable. I have therefore tried to fix it in various ways, but I can't figure it out. How can my code be patched to do this? What good is multiprocessing if you can't send over anything useful?
Is there any good documentation of multiprocessing being used with class instances? The only way I can get the multiprocessing module to work is on simple functions. Every attempt to use it within a class instance has failed. Maybe I should pass events instead? I don't understand how to do that yet.
import multiprocessing
import sys
import re


class ProcessWorker(multiprocessing.Process):
    """
    This class runs as a separate process to execute worker's commands in parallel
    Once launched, it remains running, monitoring the task queue, until "None" is sent
    """

    def __init__(self, task_q, result_q):
        multiprocessing.Process.__init__(self)
        self.task_q = task_q
        self.result_q = result_q
        return

    def run(self):
        """
        Overloaded function provided by multiprocessing.Process. Called upon start() signal
        """
        proc_name = self.name
        print '%s: Launched' % (proc_name)
        while True:
            next_task_list = self.task_q.get()
            if next_task_list is None:
                # Poison pill means shutdown
                print '%s: Exiting' % (proc_name)
                self.task_q.task_done()
                break
            next_task = next_task_list[0]
            print '%s: %s' % (proc_name, next_task)
            args = next_task_list[1]
            kwargs = next_task_list[2]
            answer = next_task(*args, **kwargs)
            self.task_q.task_done()
            self.result_q.put(answer)
        return
# End of ProcessWorker class


class Worker(object):
    """
    Launches a child process to run commands from derived classes in separate processes,
    which sit and listen for something to do
    This base class is called by each derived worker
    """

    def __init__(self, config, index=None):
        self.config = config
        self.index = index

        # Launch the ProcessWorker for anything that has an index value
        if self.index is not None:
            self.task_q = multiprocessing.JoinableQueue()
            self.result_q = multiprocessing.Queue()

            self.process_worker = ProcessWorker(self.task_q, self.result_q)
            self.process_worker.start()
            print "Got here"
            # Process should be running and listening for functions to execute
        return

    def enqueue_process(target):  # No self, since it is a decorator
        """
        Used to place a command target from this class object into the task_q
        NOTE: Any function decorated with this must use fetch_results() to get the
        target task's result value
        """
        def wrapper(self, *args, **kwargs):
            self.task_q.put([target, args, kwargs])  # FAIL: target is a class instance method and can't be pickled!
        return wrapper

    def fetch_results(self):
        """
        After all processes have been spawned by multiple modules, this command
        is called on each one to retrieve the results of the call.
        This blocks until the execution of the item in the queue is complete
        """
        self.task_q.join()          # Wait for it to finish
        return self.result_q.get()  # Return the result

    @enqueue_process
    def run_long_command(self, command):
        print "I am running command %s as process %s" % (command, self.name)
        # In here, I will launch a subprocess to run a long-running system command
        # p = Popen(command), etc
        # p.wait(), etc
        return

    def close(self):
        self.task_q.put(None)
        self.task_q.join()


if __name__ == '__main__':
    config = ["some value", "something else"]
    index = 7

    workers = []
    for i in range(5):
        worker = Worker(config, index)
        worker.run_long_command("ls /")
        workers.append(worker)

    for worker in workers:
        worker.fetch_results()

    # Do more work... (this would actually be done in a distributor in another class)
    for worker in workers:
        worker.close()
Edit: I tried to move the ProcessWorker class and the creation of the multiprocessing queues outside of the Worker class, and then tried to manually pickle the worker instance. Even that doesn't work, and I get the error:
RuntimeError: Queue objects should only be shared between processes through inheritance
But I am only passing references of those queues into the worker instance?? I am missing something fundamental. Here is the modified code from the main section:
if __name__ == '__main__':
    config = ["some value", "something else"]
    index = 7

    workers = []
    for i in range(1):
        task_q = multiprocessing.JoinableQueue()
        result_q = multiprocessing.Queue()

        process_worker = ProcessWorker(task_q, result_q)
        worker = Worker(config, index, process_worker, task_q, result_q)
        something_to_look_at = pickle.dumps(worker)  # FAIL: Doesn't like queues??
        process_worker.start()
        worker.run_long_command("ls /")
So, the problem was that I was assuming that Python was doing some sort of magic that is somehow different from the way that C++/fork() works. I somehow thought that Python only copied the class, not the whole program into a separate process. I seriously wasted days trying to get this to work because all of the talk about pickle serialization made me think that it actually sent everything over the pipe. I knew that certain things could not be sent over the pipe, but I thought my problem was that I was not packaging things up properly.
This all could have been avoided if the Python docs gave me a 10,000 ft view of what happens when this module is used. Sure, they tell me what the methods of the multiprocessing module do and give me some basic examples, but what I want to know is the "Theory of Operation" behind the scenes! Here is the kind of information I could have used. Please chime in if my answer is off. It will help me learn.
When you start a process using this module, the whole program is copied into another process. But since it is not the "__main__" process and my code was checking for that, it doesn't fire off yet another process infinitely. It just stops and sits out there waiting for something to do, like a zombie. Everything that was initialized in the parent at the time of calling multiprocessing.Process() is all set up and ready to go. Once you put something in the multiprocessing.Queue or shared memory, or pipe, etc. (however you are communicating), the separate process receives it and gets to work. It can draw upon all imported modules and setup just as if it were the parent. However, once some internal state variables change in the parent or the separate process, those changes are isolated. Once the process is spawned, it becomes your job to keep them in sync if necessary, either through a queue, pipe, shared memory, etc.
I threw out the code and started over, but now I am only putting one extra function out in the ProcessWorker, an "execute" method that runs a command line. Pretty simple. I don't have to worry about launching and then closing a bunch of processes this way, which has caused me all kinds of instability and performance issues in the past in C++. When I switched to launching processes at the beginning and then passing messages to those waiting processes, my performance improved and it was very stable.
BTW, I looked at this link to get help, which threw me off because the example made me think that methods were being transported across the queues: http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html
The second example of the first section used "next_task()" that appeared (to me) to be executing a task received via the queue.
Instead of attempting to send a method itself (which is impractical), try sending a name of a method to execute.
Provided that each worker runs the same code, it's a matter of a simple getattr(self, task_name).
I'd pass tuples (task_name, task_args), where task_args were a dict to be directly fed to the task method:
next_task_name, next_task_args = self.task_q.get()
if next_task_name:
    task = getattr(self, next_task_name)
    answer = task(**next_task_args)
    ...
else:
    # poison pill, shut down
    break
REF: https://stackoverflow.com/a/14179779
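On the sending side of that approach, the decorator would then enqueue the method's name rather than the bound method itself. A rough sketch of that change (my own, with hypothetical details, dropping positional arguments for brevity):

# Sketch only: enqueue the task's name and keyword arguments instead of the bound method.
def enqueue_process(target):
    def wrapper(self, *args, **kwargs):
        # target.__name__ (a plain string) pickles fine; the bound method does not
        self.task_q.put((target.__name__, kwargs))
    return wrapper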
Answer on Jan 6 at 6:03 by David Lynch is not factually correct when he says that he was misled by
http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html.
The code and examples provided are correct and work as advertised. next_task() is executing a task received via the queue -- try and understand what the Task.__call__() method is doing.
In my case, what tripped me up was syntax errors in my implementation of run(). It seems that the sub-process will not report this and just fails silently -- leaving things stuck in weird loops! Make sure you have some kind of syntax checker running, e.g. Flymake/Pyflakes in Emacs.
Debugging via multiprocessing.log_to_stderr() helped me narrow down the problem.
