How to get a value from a function which is executed by a Thread? - python

I have main_script.py, which imports scripts that fetch data from webpages. I want to do this with multithreading. I came up with this solution, but it does not work:
main_script:
import threading
import script1

temp_path = ''
thread1 = threading.Thread(target=script1.Main,
                           name='Script1',
                           args=(temp_path, ))
thread1.start()
thread1.join()
script1:
class Main:
    def __init__(self, temp_path):
        ...
    def some_func(self):
        ...
    def some_func2(self):
        ...
    def __main__(self):
        self.some_func()
        self.some_func2()
        return callback
Right now the only way I know to get the value of callback from script1 into main_script is:
main_script:
import script1
temp_path = ''
# make instance of class with temp_path
inst_script1 = script1.Main(temp_path)
print("instance1:")
print(inst_script1.callback)
This works, but then the scripts run one by one, not concurrently.
Does anybody have an idea how to handle that? :)

First off, if you are using threading in Python make sure you read about the global interpreter lock: https://docs.python.org/2/glossary.html#term-global-interpreter-lock. Unless you are using C extension modules or doing a lot of I/O, you won't see the scripts run concurrently. Generally speaking, multiprocessing.Pool is a better approach (see the sketch after the example below).
If you are certain you want threads rather than processes, you can use a mutable variable to store the result, for example a dictionary which keeps track of the result of each thread.
import threading

result = {}

def test(val, name, target):
    target[name] = val * 4

temp_path = 'ASD'
thread1 = threading.Thread(target=test,
                           name='Script1',
                           args=(temp_path, 'A', result))
thread1.start()
thread1.join()
print(result)
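As a minimal sketch of the multiprocessing.Pool approach mentioned above (the fetch function and its inputs are placeholders, not part of the original answer), return values can be collected directly from map, so no shared dictionary is needed:

import multiprocessing

def fetch(temp_path):
    # placeholder for the real per-script work
    return temp_path * 4

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    results = pool.map(fetch, ['A', 'B', 'C'])  # blocks until all workers are done
    pool.close()
    pool.join()
    print(results)  # e.g. ['AAAA', 'BBBB', 'CCCC']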

Thanks for the response. Yes, I have read about the GIL, but it hasn't caused me any problems yet. I have solved my problem in the meantime, based on a solution I found on another website. The code looks like this now:
Main_script:
import threading
import queue
import script1
import script2

queue_callbacks = queue.Queue()
threads_list = list()
callbacks = []

temp_path1 = ''
thread1 = threading.Thread(target=lambda q, arg1: q.put(script1.Main(arg1)),
                           name='Script1',
                           args=(queue_callbacks, temp_path1, ))
thread1.start()
threads_list.append(thread1)

temp_path2 = ''
thread2 = threading.Thread(target=lambda q, arg1: q.put(script2.Main(arg1)),
                           name='Script2',
                           args=(queue_callbacks, temp_path2, ))
thread2.start()
threads_list.append(thread2)

for t in threads_list:
    t.join()

while not queue_callbacks.empty():
    result = queue_callbacks.get()
    callbacks.append({"service": result.service, "callback": result.callback, "error": result.error})
And this works fine. Now I have another problem: I want this to work at a bigger scale, where I have hundreds of scripts and handle them with e.g. 5 threads.
In general, is there any limit to the number of threads running at any one time?
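Python itself does not impose a fixed limit on the number of threads (the practical limit comes from memory and the OS), but with hundreds of scripts the usual pattern is a fixed-size pool. A minimal sketch using concurrent.futures (Python 3; script1.Main and the list of paths stand in for the real scripts):

import concurrent.futures
import script1

paths = [''] * 100  # placeholder: one input per script run

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # submit() returns a Future; result() gives the return value of script1.Main
    futures = [executor.submit(script1.Main, p) for p in paths]
    results = [f.result() for f in futures]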


Python JoinableQueue and Queue Thread Not Completing

I am using the following code to complete a task using multithreading with Queue and JoinableQueue. Sometimes the script executes perfectly; other times it stalls at the end of the task without ending the worker and will not continue on to the next portion of the script. I am new to working with Queue and JoinableQueue, and I need to find out why this stalling happens.
Before this part in the code I run another Queue/JoinableQueue worker to download some data and it works perfectly fine every time. Do I need to close() anything from the first Queue/JoinableQueue? Is there a way to check whether it stalls and, if so, continue on?
Here is my code:
import multiprocessing
from multiprocessing import Queue
from multiprocessing import JoinableQueue
from threading import Thread

def run_this_definition(hr):
    #do things here
    return()

def worker():
    while True:
        item = jq.get()
        run_this_definition(item)
        jq.task_done()
    return()

q = Queue()
jq = JoinableQueue()

number_of_threads = 8
for i in range(number_of_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

input_list = [0,1,2,3,4]
for item in input_list:
    jq.put(item)

jq.join()
print "finished"
The script never prints "finished" when it stalls, but seems to finish all the tasks and stalls at the end of the 'run_this_definition' on the very last item in the Queue.
My guess is that the problem is the multiprocessing.JoinableQueue(). Use Queue.Queue() instead for threading; it has .join() and .task_done() methods as well. Furthermore, you should pass your queue as an argument to your threads. See the following example:
import threading
from threading import Thread
from Queue import Queue

def worker(jq):
    while True:
        item = jq.get()
        # Do whatever you have to do.
        print '{}: {}'.format(threading.currentThread().name, item)
        jq.task_done()
    return()

number_of_threads = 4
input_list = [1,2,3,4,5]

jq = Queue()
for i in range(number_of_threads):
    t = Thread(target=worker, args=(jq,))
    t.daemon = True
    t.start()

for item in input_list:
    jq.put(item)

jq.join()
print "finished"
The print output from multiple threads might look messy, but as an example it should be fine.
For the future: please provide a comprehensive example of your code. Neither your imports nor number_of_threads, run_this_definition, or input_list were defined in your example.

Basic multiprocessing with infinite loop and queue

import random
import queue as Queue
import _thread as Thread

a = Queue.Queue()

def af():
    while True:
        a.put(random.randint(0,1000))

def bf():
    while True:
        if (not a.empty()): print (a.get())

def main():
    Thread.start_new_thread(af, ())
    Thread.start_new_thread(bf, ())
    return

if __name__ == "__main__":
    main()
The above code works, but with extremely high CPU usage. I tried to use multiprocessing to no avail. I have tried
def main():
    multiprocessing.Process(target=af).run()
    multiprocessing.Process(target=bf).run()
and
def main():
    manager = multiprocessing.Manager()
    a = manager.Queue()
    pool = multiprocessing.Pool()
    pool.apply_async(af)
    pool.apply_async(bf)
but neither works. Can anyone please help me? Thanks a bunch ^_^
def main():
    multiprocessing.Process(target=af).run()  # will not return
    multiprocessing.Process(target=bf).run()
The above code does not work because run() executes af in the current process and af never returns, so bf is never reached. You need to replace the run() call with start()/join() so that both can run in parallel (and make them share a manager.Queue).
To make the second version work, you need to pass a (the manager.Queue object) to the functions. Otherwise they will use the global Queue.Queue object, which is not shared between processes; af and bf need to be modified to accept a, and main needs to pass it.
def af(a):
    while True:
        a.put(random.randint(0, 1000))

def bf(a):
    while True:
        print(a.get())

def main():
    manager = multiprocessing.Manager()
    a = manager.Queue()
    pool = multiprocessing.Pool()
    proc1 = pool.apply_async(af, [a])
    proc2 = pool.apply_async(bf, [a])
    # Wait until the processes end. Uncomment the following lines if there is no other waiting code.
    # proc1.get()
    # proc2.get()
In the first alternative main you use Process, but the method you should call to start the activity is not run(), as one might think, but rather start(). You will want to follow that up with appropriate join() calls. Following the information in the multiprocessing documentation (available here: https://docs.python.org/2/library/multiprocessing.html), here is a working sample:
import random
from multiprocessing import Process, Queue

def af(q):
    while True:
        q.put(random.randint(0,1000))

def bf(q):
    while True:
        if not q.empty():
            print (q.get())

def main():
    a = Queue()
    p = Process(target=af, args=(a,))
    c = Process(target=bf, args=(a,))
    p.start()
    c.start()
    p.join()
    c.join()

if __name__ == "__main__":
    main()
To add to the accepted answer, in the original code:
while True:
    if not q.empty():
        print (q.get())
q.empty() is being called on every iteration, which is unnecessary: if the queue is empty, q.get() will block until something is available (see the documentation).
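So the loop can simply be written in the blocking form (matching the bf of the accepted answer):

def bf(q):
    while True:
        # q.get() blocks until an item is available, so no busy-waiting on q.empty()
        print(q.get())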
Similar answer here
I assume this could affect performance, since calling .empty() on every iteration consumes more resources (it should be more noticeable if Thread were used instead of Process, because of the Python Global Interpreter Lock (GIL)).
I know it's an old question, but I hope it helps!

Control running Python Process (multiprocessing)

I have yet another question about Python multiprocessing.
I have a module that creates a Process and just runs in a while True loop.
This module is meant to be enabled/disabled from another Python module.
That other module will import the first one once and is also run as a process.
How would I better implement this?
So for reference:
#foo.py
from multiprocessing import Process

def foo():
    while True:
        if enabled:
            pass  #do something

p = Process(target=foo)
p.start()
and imagine the second module to be something like this:
#bar.py
from multiprocessing import Process
import foo, time

def bar():
    while True:
        foo.enable()
        time.sleep(10)
        foo.disable()

Process(target=bar).start()
Constantly running a process that checks for a condition inside a loop seems like a waste, but I would gladly accept a solution that just lets me set the enabled value from outside.
Ideally I would prefer to be able to terminate and restart the process, again from outside of this module.
From my understanding, I would use a Queue to pass commands to the Process. If it is indeed just that, can someone show me how to set it up in a way that I can add something to the queue from a different module?
Can this even be easily done with Python, or is it time to abandon hope and switch to something like C or Java?
I proposed in the comments two different approaches:
using a shared variable from multiprocessing.Value
pausing / resuming the process with signals
Control by sharing a variable
from multiprocessing import Process, Value
import time

def target_process_1(run_statement):
    while True:
        if run_statement.value:
            print "I'm running !"
        time.sleep(1)

def target_process_2(run_statement):
    time.sleep(3)
    print "Stopping"
    run_statement.value = False
    time.sleep(3)
    print "Resuming"
    run_statement.value = True

if __name__ == "__main__":
    run_statement = Value("i", 1)

    process_1 = Process(target=target_process_1, args=(run_statement,))
    process_2 = Process(target=target_process_2, args=(run_statement,))

    process_1.start()
    process_2.start()

    time.sleep(8)
    process_1.terminate()
    process_2.terminate()
Control by sending a signal
from multiprocessing import Process
import time
import os, signal

def target_process_1():
    while True:
        print "Running !"
        time.sleep(1)

def target_process_2(target_pid):
    time.sleep(3)
    os.kill(target_pid, signal.SIGSTOP)
    time.sleep(3)
    os.kill(target_pid, signal.SIGCONT)

if __name__ == "__main__":
    process_1 = Process(target=target_process_1)
    process_1.start()

    process_2 = Process(target=target_process_2, args=(process_1.pid,))
    process_2.start()

    time.sleep(8)
    process_1.terminate()
    process_2.terminate()
Side note: if possible, do not run a bare while True loop.
EDIT: if you want to manage your process from two different files, supposing you want to use control by sharing a variable, this is one way to do it (a hypothetical usage sketch follows the module below).
# file foo.py
from multiprocessing import Value, Process
import time

__all__ = ['start', 'stop', 'enable', 'disable']

_statement = None
_process = None

def _target(run_statement):
    """ Target of foo's process """
    while True:
        if run_statement.value:
            print "I'm running !"
        time.sleep(1)

def start():
    global _process, _statement
    _statement = Value("i", 1)
    _process = Process(target=_target, args=(_statement,))
    _process.start()

def stop():
    global _process, _statement
    _process.terminate()
    _statement, _process = None, None

def enable():
    _statement.value = True

def disable():
    _statement.value = False
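For completeness, a hypothetical bar.py driving this module could look something like the question's second snippet (this is just a usage sketch of the functions exported above, not part of the original answer):

# file bar.py (hypothetical usage sketch)
import time
import foo

if __name__ == "__main__":
    foo.start()      # spawn the worker process
    time.sleep(5)
    foo.disable()    # the worker keeps running but stops doing work
    time.sleep(5)
    foo.enable()     # resume doing work
    time.sleep(5)
    foo.stop()       # terminate the worker process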

python threading : cannot switch thread to Daemon

I would expect the following code to execute simultaneously, so that all filenames from the os.walk iterations that got 0 at random would end up in the result list, and all threads with a longer timeout would stay in daemon mode and be killed as soon as the script reaches its end. However, the script waits out the full timeout for each thread.
Why is this happening? Shouldn't it put all threads in the background and kill them if they don't finish and return a result before the end of the script? Thank you.
import threading
import os
import time
import random

def check_file(file_name, timeout):
    time.sleep(timeout)
    print file_name
    result.append(file_name)

result = []
for home, dirs, files in os.walk("."):
    for ifile in files:
        filename = '/'.join([home, ifile])
        t = threading.Thread(target=check_file(filename, random.randint(0,5)))
        t.setDaemon(True)
        t.start()
print result
Solution: I found my mistake:
t = threading.Thread(target=check_file(filename,random.randint(0,5)))
has to be
t = threading.Thread(target=check_file, args=(filename,random.randint(0,5)))
In this case, threading will spawn a thread with the function as an object and give it the arguments. In my initial example, the function call with its args has to be resolved BEFORE the thread spawns, which is fair.
However, the example above works for me on 2.7.3, but on 2.7.2 I cannot make it work.
I'm getting an exception that
function check_file accepts exactly 1 argument (34 is given).
Solution:
In 2.7.2 I had to put a trailing comma in the args tuple, considering that I have only 1 variable. God knows why this does not affect the 2.7.3 version. It was
t = threading.Thread(target=check_file, args=(filename))
and started to work with
t = threading.Thread(target=check_file, args=(filename,))
I understand what you were trying to do, but you're not using the right format for threading. I fixed your example; look up the Queue class to see how to do this properly.
Secondly, never do string manipulation on file paths. Use the os.path module; there's a lot more to it than adding separators between strings, which you and I don't think about most of the time.
Good luck!
import threading
import os
import time
import random
import Queue

def check_file():
    while True:
        item = q.get()
        time.sleep(item[1])
        print item
        q.task_done()

q = Queue.Queue()
result = []

for home, dirs, files in os.walk("."):
    for ifile in files:
        filename = os.path.join(home, ifile)
        q.put((filename, random.randint(0,5)))

number_of_threads = 25
for i in range(number_of_threads):
    t = threading.Thread(target=check_file)
    t.daemon = True
    t.start()

q.join()
print result

Execute blocking calls in parallel in python

I need to make blocking xmlrpc calls from my python script to several physical servers simultaneously and perform actions based on the response from each server independently.
To explain in detail, let us assume the following pseudo code:
while True:
    response = call_to_server1() #blocking and takes very long time
    if response == this:
        do that
I want to do this for all the servers simultaneously and independently, but from the same script.
Use the threading module.
Boilerplate threading code (I can tailor this if you give me a little more detail on what you are trying to accomplish)
import threading

def run_me(func):
    while not stop_event.isSet():
        response = func() #blocking and takes very long time
        if response == this:
            do that

def call_to_server1():
    #code to call server 1...
    return magic_server1_call()

def call_to_server2():
    #code to call server 2...
    return magic_server2_call()

#used to stop your loop.
stop_event = threading.Event()

t = threading.Thread(target=run_me, args=(call_to_server1,))
t.start()

t2 = threading.Thread(target=run_me, args=(call_to_server2,))
t2.start()

#wait for the threads to return.
t.join()
t2.join()

#we are done....
You can use the multiprocessing module:
import multiprocessing

def call_to_server(ip, port):
    ....
    ....

process = []
for i in xrange(server_count):
    process.append(multiprocessing.Process(target=call_to_server, args=(ip, port)))
    process[i].start()

#waiting for the processes to stop
for p in process:
    p.join()
You can use multiprocessing plus queues. With one single sub-process, this is an example:
import multiprocessing
import time

def processWorker(input, result):
    def remoteRequest(params):
        ## this is my remote request
        return True
    while True:
        work = input.get()
        if 'STOP' in work:
            break
        result.put(remoteRequest(work))

input = multiprocessing.Queue()
result = multiprocessing.Queue()

p = multiprocessing.Process(target=processWorker, args=(input, result))
p.start()

requestlist = ['1', '2']
for req in requestlist:
    input.put(req)

for i in xrange(len(requestlist)):
    res = result.get(block=True)
    print 'retrieved ', res

input.put('STOP')
time.sleep(1)
print 'done'
To have more than one sub-process, simply use a list object to store all the sub-processes you start.
The multiprocessing queue is a thread- and process-safe object.
Then you may keep track of which request is being executed by each sub-process simply by storing the request together with a workid (the workid can be a counter incremented each time the queue is filled with new work); a sketch of this is shown below. Using multiprocessing.Queue is robust since you do not need to rely on stdout/stderr parsing, and you also avoid the related limitations.
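A minimal sketch of that workid bookkeeping, building on the processWorker example above (the pending dictionary and the (workid, payload) tuple format are illustrative assumptions, not part of the original answer):

import multiprocessing

def processWorker(input, result):
    # Echo each request back tagged with its workid (placeholder for the real remote request).
    while True:
        workid, payload = input.get()
        if payload == 'STOP':
            break
        result.put((workid, True))

if __name__ == '__main__':
    input = multiprocessing.Queue()
    result = multiprocessing.Queue()

    workers = [multiprocessing.Process(target=processWorker, args=(input, result))
               for _ in range(3)]           # list object holding all sub-processes
    for w in workers:
        w.start()

    pending = {}                            # workid -> request currently in flight
    for workid, req in enumerate(['1', '2', '3']):
        pending[workid] = req
        input.put((workid, req))

    for _ in range(len(pending)):
        workid, res = result.get(block=True)
        print('request %s -> %s' % (pending.pop(workid), res))

    for _ in workers:
        input.put((None, 'STOP'))           # one STOP sentinel per worker
    for w in workers:
        w.join()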
Then you can also set a timeout on how long you want a get call to wait at most, e.g.:
import Queue

try:
    res = result.get(block=True, timeout=10)
except Queue.Empty:
    print 'error'
Use twisted.
It has a lot of useful stuff for working with networks, and it is also very good at working asynchronously.
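For example, a rough sketch of issuing XML-RPC calls to several servers concurrently with Twisted (the server URLs and the method name someMethod are placeholders; treat this as an illustration of the Deferred-based style rather than a drop-in solution):

from twisted.web.xmlrpc import Proxy
from twisted.internet import reactor

def handle_response(response, server):
    # Runs independently for each server as soon as its call returns.
    print('%s answered: %s' % (server, response))

def handle_error(failure, server):
    print('%s failed: %s' % (server, failure.getErrorMessage()))

servers = ['http://server1:8080/RPC2', 'http://server2:8080/RPC2']
for url in servers:
    proxy = Proxy(url)
    d = proxy.callRemote('someMethod')  # returns a Deferred, does not block
    d.addCallback(handle_response, url)
    d.addErrback(handle_error, url)

reactor.run()  # runs until stopped (e.g. reactor.stop() from a callback)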
