How do I convert my current Python program to multiprocessing/multithreading?

Sorry to ask this question as a Python beginner. I have a working Python program that I want to convert to multiprocessing or multithreading. Here is the structure of the working script:
class XMLToJson():
    def __init__(self, region=None, flow=None, path=None, output=None):
        ...  # body omitted in the question
    def run(self):
        ...  # body omitted in the question
def run_from_cmd():
    XMLToJson().run()
if __name__ == '__main__':
    XMLToJson().run()
It would be greatly appreciated if anyone can tell me how to do the conversion.
Thank you very much.
P.S.
The following is the framework I am thinking of fitting it into:
from threading import Thread, current_thread, Lock
import time
def worker(l):
    while True:
        l.acquire()
        print ('in worker:' + str(current_thread()))
        l.release()
        time.sleep(0.5)
if __name__ == '__main__':
    l = Lock()
    print ('in main: ' + str(current_thread()))
    threads = [Thread(target=worker, args=[l]) for i in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
I renamed run() in the original working program to main_process() and changed the thread target from worker to main_process:
if __name__ == '__main__':
    l = Lock()
    print ('in main: ' + str(current_thread()))
    threads = [Thread(target=main_process, args=[l]) for i in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
but the program doesn't even get that far; it errors out at target=main_process.
Thank you very much.

Your question lacks quite a bit of specifics. What is your program doing? Why do you want to multiprocess/thread? What is your input/output? What is there to multiprocess/multithread?
If what you have is a script that does input => transform => output and then terminates, multiprocessing/multithreading is just a way to process several sets of input at the same time to save time. In that case you could either call your script several times, once for each set of inputs, or pass all the inputs to a single instance of a parallel version of your script, using e.g. the binge library (pip install binge) to handle the multiprocessing:
from binge import B
result = B(worker, n=5)(....)
where worker is your transform function, n is the number of times it should run, and .... are your inputs to be sent to the 5 parallel worker instances. Mind that with n=5, your inputs should have either length 5 (distributed over the workers) or length 1 (given identically to each worker).
cf: binge documentation

Related

Threading module

I have a question. I'm fairly new to threading, and I wrote this code:
import threading
from colorama import *
import random
import os
listax = [Fore.GREEN,Fore.YELLOW,Fore.RED]
print(random.choice(listax))
def hola():
    import requests
    a = requests.get('https://google.com')
    print(a.status_code)
if __name__ == "__main__":
    t1 = threading.Thread(target=hola)
    t2 = threading.Thread(target=hola)
    t3 = threading.Thread(target=hola)
    t1.start()
    t2.start()
    t3.start()
    t1.join()
    t2.join()
    t3.join()
The output is shown 3 times because the code runs 3 times. My question is: if I have a big program that all starts in:
def main():
    code...
how can I add multiple threads to make it run faster? I see that I can add 1 thread, but if I add 3 threads the output is shown 3 times. How can I, for example, put 10 threads on the same task without the output repeating 10 times, so that it runs as fast as possible using the system's resources?
Multithreading does not magically speed up your code. It's up to you to break the code into chunks that can be run concurrently. When you create 3 threads that run hola, you are not "running hola once using 3 threads"; you are "running hola three times, each time in a different thread".
Although multithreading can be used to perform computations in parallel, the most common Python interpreter (CPython) is implemented with a lock (the GIL) that lets only one thread execute Python code at a time. There are libraries that release the GIL before doing CPU-intensive work, so threading in Python can still be useful for CPU-intensive work done inside such libraries. Moreover, I/O operations release the GIL, so multithreading in Python is very well suited for I/O work.
As an example, let's imagine that you need to access three different sites. You can access them sequentially, one after the other:
import requests
sites = ['https://google.com', 'https://yahoo.com', 'https://rae.es']
def hola(site):
    a = requests.get(site)
    print(site, " answered ", a.status_code)
for s in sites:
    hola(s)
Or concurrently (all at the same time) using threads:
import requests
import threading
sites = ['https://google.com', 'https://yahoo.com', 'https://rae.es']
def hola(site):
    a = requests.get(site)
    print(site, " answered ", a.status_code)
th = [threading.Thread(target=hola, args=(s, )) for s in sites]
for t in th:
    t.start()
for t in th:
    t.join()
Please note that this is a simple example: the output can get scrambled, you have no access to the return values, etc. For this kind of task I would use a thread pool.
I tried to use a loop based on the code you gave me:
# Python program to illustrate the concept
# of threading
# importing the threading module
import threading
from colorama import *
import random
import os
listax = [Fore.GREEN,Fore.YELLOW,Fore.RED]
print(random.choice(listax))
"""
def print_cube(num):
    function to print cube of given num
    print("Cube: {}".format(num * num * num))
"""
def print_square():
    num = 2
    """
    function to print square of given num
    """
    print("Square: {}".format(num * num))
def hola():
    import requests
    a = requests.get('https://google.com')
    print(a.status_code)
if __name__ == "__main__":
    for j in range(10):
        t1 = threading.Thread(target=hola)
        t1.start()
        t1.join()
But when I run it, the code prints one line at a time. In my case it gives me
200
1 second later, 200 again
and 200 again (10 times in total, because I added 10 threads)
but I want to know how to make this execute as fast as possible without showing me the 10 outputs. I just want the code to print once, but as fast as possible, using 10 threads for example.
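You can simply use a for loop, where number_of_threads is the number of threads you want to run. Start all the threads first and join them afterwards; calling join() inside the same loop would wait for each thread to finish before starting the next one:
threads = []
for _ in range(number_of_threads):
    t = threading.Thread(target=hola)
    t.start()
    threads.append(t)
for t in threads:
    t.join()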

Multiprocessing Pool initializer fails pickling

I am trying to use multiprocessing.Pool to implement a parallel application. To share some variables I am using a Queue, as hinted here:
def get_prediction(data):
    #here the real calculation will be performed
    ....
def mainFunction():
    def get_prediction_init(q):
        print("a")
        get_prediction.q = q
    queue = Queue()
    pool = Pool(processes=16, initializer=get_prediction_init, initargs=[queue,])
if __name__== '__main__':
    mainFunction()
This code is running perfectly on a Debian machine, but is not working at all on another Windows 10 device. It fails with the error
AttributeError: Can't pickle local object 'mainFunction.<locals>.get_prediction_init'
I do not really know what exactly is causing the error. How can I solve the problem so that I can run the code on the Windows device as well?
EDIT: The problem is solved if I define the get_prediction_init function at the same level as mainFunction. It only fails when I define it as an inner function. Sorry for the confusion in my post.
The problem is in something you haven't shown us. For example, it's a mystery where "mainFunction" came from in the AttributeError message you showed.
Here's a complete, executable program based on the fragment you posted. Worked fine for me under Windows 10 just now, under Python 3.6.1 (I'm guessing you're using Python 3 from your print syntax), printing "a" 16 times:
import multiprocessing as mp
def get_prediction(data):
    #here the real calculation will be performed
    pass
def get_prediction_init(q):
    print("a")
    get_prediction.q = q
if __name__ == "__main__":
    queue = mp.Queue()
    pool = mp.Pool(processes=16, initializer=get_prediction_init, initargs=[queue,])
    pool.close()
    pool.join()
Edit
And, based on your edit, this program also works fine for me:
import multiprocessing as mp
def get_prediction(data):
    #here the real calculation will be performed
    pass
def get_prediction_init(q):
    print("a")
    get_prediction.q = q
def mainFunction():
    queue = mp.Queue()
    pool = mp.Pool(processes=16, initializer=get_prediction_init, initargs=[queue,])
    pool.close()
    pool.join()
if __name__ == "__main__":
    mainFunction()
Edit 2
And now you've moved the definition of get_prediction_init() into the body of mainFunction. Now I can see your error :-)
As shown, define the function at module level instead. Trying to pickle local function objects can be a nightmare. Perhaps someone wants to fight with that, but not me ;-)
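The error comes from pickle itself: functions are pickled by reference, i.e. by module and qualified name, and a function defined inside another function has a qualified name such as mainFunction.<locals>.get_prediction_init that cannot be looked up again. On Windows, multiprocessing uses the spawn start method, which pickles the initializer to send it to each child; on Linux the default was historically fork, where the child inherits the function without pickling, which is likely why the Debian machine did not complain. The sketch below, with made-up function names, reproduces the difference with pickle alone. The exact exception type may differ between Python versions, so both candidates are caught.

```python
import pickle


def module_level(q):
    # Picklable: pickle records "<module>.module_level" and can find it again.
    return q


def make_local():
    def local(q):
        # Qualified name is 'make_local.<locals>.local', which pickle
        # cannot resolve by name, so pickling it fails.
        return q
    return local


def can_pickle(func):
    try:
        pickle.dumps(func)
        return True
    except (AttributeError, pickle.PicklingError) as exc:
        print(type(exc).__name__ + ":", exc)
        return False


if __name__ == '__main__':
    print(can_pickle(module_level))
    print(can_pickle(make_local()))
```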

Can't access global variable in python

I'm using the multiprocessing library in Python in the code below:
from multiprocessing import Process
import os
from time import sleep as delay
test = "First"
def f():
    global test
    print('hello')
    print("before: "+test)
    test = "Second"
if __name__ == '__main__':
    p = Process(target=f, args=())
    p.start()
    p.join()
    delay(1)
    print("after: "+test)
It's supposed to change the value of test, so at the end test should be "Second", but the value doesn't change and remains "First".
Here is the output:
hello
before: First
after: First
The behavior you're seeing is because p is a new process, not a new thread. When you spawn a new process, it copies your initial process's state completely and then starts executing in parallel. When you spawn a thread, it shares memory with your initial thread.
Since processes have memory isolation, they won't create race-condition errors caused by reading and writing to shared memory. However, to get data from your child process back into the parent, you'll need to use some form of inter-process communication like a pipe, and because they fork memory, they are more expensive to spawn. As always in computer science, you have to make a tradeoff.
For more information, see:
https://en.wikipedia.org/wiki/Process_(computing)
https://en.wikipedia.org/wiki/Thread_(computing)
https://en.wikipedia.org/wiki/Inter-process_communication
Based on what you're actually trying to accomplish, consider using threads instead.
Global state is not shared, so changes made by child processes have no effect on the parent.
Here is why:
Actually, it does change the global variable, but only for the spawned process. If you access it within that process, you can see the change. Since it's a separate process, its global variables are initialized on their own, and any modification you make is limited to that process itself, not the whole program.
Try this; it explains what's happening:
from multiprocessing import Process
import os
from time import sleep as delay
test = "First"
def f2():
    print ("f2:" + test)
def f():
    global test
    print ('hello')
    print ("before: "+test)
    test = "Second"
    f2()
if __name__ == '__main__':
    p = Process(target=f, args=())
    p.start()
    p.join()
    delay(1)
    print("after: "+test)
If you really need to modify the value from the child process, there are other ways of doing it; read this doc or post, it might help you.

How does Python multiprocessing work in the backend?

When I tried to run this code:
import multiprocessing
def worker():
    """worker function"""
    print 'Worker'
    return
if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
The output is blank: the program simply runs without printing "Worker". How can I get the expected output with multiprocessing?
What is actually happening when using multiprocessing?
What is the maximum number of cores we can use for multiprocessing?
I've tried your code on Windows 7, Cygwin, and Ubuntu. For me, all the processes finish before the loop comes to an end, so I get all the prints, but using join() guarantees that all the processes finish:
import multiprocessing
def worker():
    """worker function"""
    print 'Worker'
    return
if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
    for i in range(len(jobs)):
        jobs.pop().join()
As far as how multiprocessing works in the backend, I'm going to let someone more experienced than myself answer that one :) I'll probably just make a fool of myself.
I get "Worker" printed 5 times. Are you on Python 3? If so, you must use print("Worker"). From my experiments, I think multithreading doesn't mean using multiple cores; it just runs the different threads alternately to give the appearance of parallelism. Try reading the multiprocessing library documentation for more info.

Monitoring a threaded Python program with htop

First of all, this is the code I am referring to:
from random import randint
import time
from threading import Thread
import Queue
class TestClass(object):
    def __init__(self, queue):
        self.queue = queue
    def do(self):
        while True:
            wait = randint(1, 10)
            time.sleep(1.0/wait)
            print '[>] Enqueuing from TestClass.do...', wait
            self.queue.put(wait)
class Handler(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue
    def run(self):
        task_no = 0
        while True:
            task = self.queue.get()
            task_no += 1
            print ('[<] Dequeuing from Handler.run...', task,
                   'task_no=', task_no)
            time.sleep(1) # emulate processing time
            print ('[*] Task %d done!') % task_no
            self.queue.task_done()
def main():
    q = Queue.Queue()
    watchdog = TestClass(q)
    observer = Thread(target=watchdog.do)
    observer.setDaemon(True)
    handler = Handler(q)
    handler.setDaemon(True)
    handler.start()
    observer.start()
    try:
        while True:
            wait = randint(1, 10)
            time.sleep(1.0/wait)
            print '[>] Enqueuing from main...', wait
            q.put(wait)
    except KeyboardInterrupt:
        print '[*] Exiting...', True
if __name__ == '__main__':
    main()
While the code is not very important to my question, it is a simple script that spawns 2 threads, on top of the main one. Two of them enqueue "tasks", and one dequeues them and "executes" them.
I am just starting to study threading in Python, and I have of course run into the subject of the GIL, so I expected to see one process. But when I monitor this particular script with htop, I notice not 1, but 3 processes being spawned.
How is this possible?
The GIL means only one thread will "do work" at a time, but it doesn't mean that Python won't spawn the threads. In your case, you asked Python to spawn two threads, so it did (giving you a total of three threads). FYI, htop lists both processes and threads by default, in case this was causing your confusion.
Python threads are useful for when you want concurrency but don't need parallelism. Concurrency is a tool for making programs simpler and more modular; it allows you to spawn a thread per task instead of having to write one big (often messy) while loop and/or use a bunch of callbacks (like JavaScript).
If you're interested in this subject, I recommend googling "concurrency versus parallelism". The concept is not language specific.
Edit: Alternatively, you can just read this Stack Overflow thread.
