I have a question. I'm somewhat new to threading. I made this code:
import threading
from colorama import *
import random
import os
listax = [Fore.GREEN,Fore.YELLOW,Fore.RED]
print(random.choice(listax))
def hola():
    import requests
    a = requests.get('https://google.com')
    print(a.status_code)

if __name__ == "__main__":
    t1 = threading.Thread(target=hola)
    t2 = threading.Thread(target=hola)
    t3 = threading.Thread(target=hola)
    t1.start()
    t2.start()
    t3.start()
    t1.join()
    t2.join()
    t3.join()
The output shows up 3 times when I execute the code. But my question is: if, for example, I have a big program and everything starts in:
def main():
    code...
how can I add multiple threads to make the work faster? I see I can add 1 thread, and if I add 3 threads the output shows 3 times. But how can I, for example, add 10 threads to the same task without the output repeating 10 times, so that it executes as fast as possible using the system's resources?
Multithreading does not magically speed up your code. It's up to you to break the code into chunks that can be run concurrently. When you create 3 threads that run hola, you are not "running hola once using 3 threads"; you are "running hola three times, each time in a different thread".
Although multithreading can be used to perform computation in parallel, the most common Python interpreter (CPython) is implemented using a lock (the GIL) that lets only one thread run at a time. There are libraries that release the GIL before doing CPU-intensive work, so threading in Python can still be useful for CPU-intensive work when you use such libraries. Moreover, I/O operations release the GIL, so multithreading in Python is very well suited for I/O-bound work.
As an example, let's imagine that you need to access three different sites. You can access them sequentially, one after the other:
import requests
sites = ['https://google.com', 'https://yahoo.com', 'https://rae.es']
def hola(site):
    a = requests.get(site)
    print(site, " answered ", a.status_code)

for s in sites:
    hola(s)
Or concurrently (all at the same time) using threads:
import requests
import threading
sites = ['https://google.com', 'https://yahoo.com', 'https://rae.es']
def hola(site):
    a = requests.get(site)
    print(site, " answered ", a.status_code)

th = [threading.Thread(target=hola, args=(s, )) for s in sites]
for t in th:
    t.start()
for t in th:
    t.join()
Please note that this is a simple example: the output can get scrambled, you have no access to the return values, etc. For this kind of task I would use a thread pool.
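As a sketch of that last point (my addition, using the standard library's concurrent.futures rather than anything from the original answer), a ThreadPoolExecutor runs the same calls concurrently and gives you the return values back:

import requests
from concurrent.futures import ThreadPoolExecutor

sites = ['https://google.com', 'https://yahoo.com', 'https://rae.es']

def hola(site):
    # return instead of print, so the caller gets the value back
    return site, requests.get(site).status_code

# map() runs hola concurrently and yields results in submission order
with ThreadPoolExecutor(max_workers=3) as pool:
    for site, status in pool.map(hola, sites):
        print(site, "answered", status)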
I tried to use a loop with the code you gave me:
# Python program to illustrate the concept
# of threading
# importing the threading module
import threading
from colorama import *
import random
import os
listax = [Fore.GREEN,Fore.YELLOW,Fore.RED]
print(random.choice(listax))
"""
def print_cube(num):
function to print cube of given num
print("Cube: {}".format(num * num * num))
"""
def print_square():
num = 2
"""
function to print square of given num
"""
print("Square: {}".format(num * num))
def hola():
import requests
a = requests.get('https://google.com')
print(a.status_code)
if __name__ == "__main__":
for j in range(10):
t1 = threading.Thread(target=hola)
t1.start()
t1.join()
but when I run the code it does one print at a time. In my case it gives me:
200
1 sec later, 200 again
and 200 again (10 times, because I added 10 threads)
But I want to know how I can make this execute as fast as possible without showing the 10 outputs; I just want the code to do 1 print, but as fast as possible, with 10 threads for example.
You can simply use a for loop; number_of_threads is how many threads you want to run. Start all the threads first and join them afterwards, otherwise each thread finishes before the next one starts:
threads = []
for _ in range(number_of_threads):
    t = threading.Thread(target=hola)
    threads.append(t)
    t.start()
for t in threads:
    t.join()
Related
I've stumbled across a weird timing issue while using the multiprocessing module.
Consider the following scenario. I have functions like this:
import multiprocessing as mp
def workerfunc(x):
# timehook 3
# something with x
# timehook 4
def outer():
# do something
mygen = ... (some generator expression)
pool = mp.Pool(processes=8)
# time hook 1
result = [pool.apply(workerfunc, args=(x,)) for x in mygen]
# time hook 2
if __name__ == '__main__':
outer()
I am using the time module to get a rough feeling for how long my functions run. I successfully create 8 separate processes, which terminate without error. The longest time for a worker to finish is about 130 ms (measured between timehook 3 and 4).
I expected (as they are running in parallel) that the time between hook 1 and 2 will be approximately the same. Surprisingly, I get 600 ms as a result.
My machine has 32 cores and should be able to handle this easily. Can anybody give me a hint where this difference in time comes from?
Thanks!
You are using pool.apply, which is blocking. Use pool.apply_async instead; then the function calls will all run in parallel, and each will immediately return an AsyncResult object. You can use that object to check when the processes are done, and also to retrieve the results.
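A minimal sketch of that change (the worker body and the input range are placeholders of mine, since the question elides them):

import multiprocessing as mp

def workerfunc(x):
    return x * x  # stand-in for the real work

if __name__ == '__main__':
    with mp.Pool(processes=8) as pool:
        # submit everything first; each call returns an AsyncResult immediately
        async_results = [pool.apply_async(workerfunc, args=(x,)) for x in range(16)]
        # the workers run in parallel while we collect the results
        results = [r.get() for r in async_results]
    print(results)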
Since you are using multiprocessing and not multithreading, your performance issue is not related to the GIL (Python's Global Interpreter Lock).
I've found an interesting link explaining this with an example; you can find it at the bottom of this answer.
The GIL does not prevent a process from running on a different processor of a machine. It simply allows only one thread to run at once within the interpreter.
So multiprocessing, not multithreading, will allow you to achieve true concurrency.
Let's understand all this through some benchmarking, because only that will lead you to believe what is said above. And yes, that is the way to learn: experience it rather than just reading or hearing about it, because if you have experienced something, no amount of argument can convince you of the opposing view.
import random
from threading import Thread
from multiprocessing import Process
size = 10000000 # Number of random numbers to add to list
threads = 2 # Number of threads to create
my_list = []
# Note: this snippet is Python 2 (xrange); in Python 3, replace xrange with range
for i in xrange(0, threads):
    my_list.append([])

def func(count, mylist):
    for i in range(count):
        mylist.append(random.random())

def multithreaded():
    jobs = []
    for i in xrange(0, threads):
        thread = Thread(target=func, args=(size, my_list[i]))
        jobs.append(thread)
    # Start the threads
    for j in jobs:
        j.start()
    # Ensure all of the threads have finished
    for j in jobs:
        j.join()

def simple():
    for i in xrange(0, threads):
        func(size, my_list[i])

def multiprocessed():
    processes = []
    for i in xrange(0, threads):
        p = Process(target=func, args=(size, my_list[i]))
        processes.append(p)
    # Start the processes
    for p in processes:
        p.start()
    # Ensure all processes have finished execution
    for p in processes:
        p.join()

if __name__ == "__main__":
    multithreaded()
    #simple()
    #multiprocessed()
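The snippet above defines the three variants but never actually times them. A minimal timing harness (my addition, assuming Python 3, so the xrange calls above become range) could replace the __main__ block:

import time

def timed(label, fn):
    # wall-clock timing of one variant
    start = time.time()
    fn()
    print(label, 'took', time.time() - start, 'seconds')

if __name__ == "__main__":
    timed('multithreaded', multithreaded)
    timed('simple', simple)
    timed('multiprocessed', multiprocessed)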
Additional information
Here you can find the source of this information and a more detailed technical explanation (bonus: there are also Guido van Rossum quotes in it :) )
Sorry to ask this question as a new Python starter. I have a working Python program to be converted to multiprocessing or multithreading; here is the structure of the working script:
class XMLToJson():
    def __init__(self, region=None, flow=None, path=None, output=None):
        ...
    def run(self):
        ...

def run_from_cmd():
    XMLToJson().run()

if __name__ == '__main__':
    XMLToJson().run()
It would be greatly appreciated if anyone can tell me how to do the conversion.
Thank you very much.
P.S.
The following is the framework I am thinking of fitting it into:
from threading import Thread, current_thread, Lock
import time
def worker(l):
    while True:
        l.acquire()
        print('in worker: ' + str(current_thread()))
        l.release()
        time.sleep(0.5)

if __name__ == '__main__':
    l = Lock()
    print('in main: ' + str(current_thread()))
    threads = [Thread(target=worker, args=[l]) for i in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
I modified the original working program, renaming run() to main_process(), and changed the target from worker to main_process:
if __name__ == '__main__':
    l = Lock()
    print('in main: ' + str(current_thread()))
    threads = [Thread(target=main_process, args=[l]) for i in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
but the program doesn't even compile; it errors out at target=main_process.
Thank you very much.
Your question lacks quite a bit of specifics. What is your program doing? Why do you want to multiprocess/thread? What is your input/output? What is there to multiprocess/multithread?
If what you have is a script that does input => transform => output and terminates, multiprocessing/threading would just be a way to process several sets of input at the same time to save time. In that case you could either call your script several times, once per set of inputs, or pass the multiple inputs to a single instance of your multi-threaded script, where you use e.g. the binge library (pip install binge) to deal with the multiprocessing:
from binge import B
result = B(worker, n=5)(....)
where worker is your transform function, n is the number of times it should happen, and .... is your inputs to be sent to the 5 parallel worker instances. Mind that if you have n=5, your inputs should be either of length 5 (distributed over the workers) or 1 (given identically to each worker).
cf: binge documentation
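If you would rather stay in the standard library, a multiprocessing.Pool gives you the same fan-out; a minimal sketch, with a hypothetical transform function standing in for the real work:

from multiprocessing import Pool

def transform(item):
    # placeholder for the real input => transform => output work
    return item * 2

if __name__ == '__main__':
    inputs = [1, 2, 3, 4, 5]
    with Pool(processes=5) as pool:
        results = pool.map(transform, inputs)  # one input per worker
    print(results)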
I'm running a Python program using the multiprocessing module to take advantage of multiple cores on the CPU.
The program itself works fine, but when it comes to showing a kind of progress percentage, it all gets messed up.
In order to try to simulate what happens to me, I've written this little scenario where I've used some random times to try to replicate some tasks that could take different times in the original program.
When you run it, you'll see how the percentages get mixed up.
Is there a proper way to achieve this?
from multiprocessing import Pool, Manager
import random
import time
def PrintPercent(percent):
    time.sleep(random.random())
    print(' (%s %%) Ready' % (percent))

def HeavyProcess(cont):
    total = 20
    cont[0] = cont[0] + 1
    percent = round((float(cont[0]) / float(total)) * 100, 1)
    PrintPercent(percent)

def Main():
    cont = Manager().list(range(1))
    cont[0] = 0
    pool = Pool(processes=2)
    for i in range(20):
        pool.apply_async(HeavyProcess, [cont])
    pool.close()
    pool.join()

Main()
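One way to keep the percentages in order (a sketch of mine, not from the original program) is to let the parent process print the progress as results complete, via pool.imap_unordered, so the workers never touch a shared counter:

from multiprocessing import Pool
import random
import time

def HeavyProcess(i):
    time.sleep(random.random())  # simulate work of varying duration
    return i

if __name__ == '__main__':
    total = 20
    with Pool(processes=2) as pool:
        # the parent prints progress as each task finishes
        for done, _ in enumerate(pool.imap_unordered(HeavyProcess, range(total)), 1):
            print(' (%s %%) Ready' % round(float(done) / total * 100, 1))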
I have the below code:
import time
from threading import Thread
from multiprocessing import Process
def fun1():
    for _ in xrange(10000000):
        print 'in fun1'
        pass

def fun2():
    for _ in xrange(10000000):
        print 'in fun2'
        pass

def fun3():
    for _ in xrange(10000000):
        print 'in fun3'
        pass

def fun4():
    for _ in xrange(10000000):
        print 'in fun4'
        pass

if __name__ == '__main__':
    #t1 = Thread(target=fun1, args=())
    t1 = Process(target=fun1, args=())
    #t2 = Thread(target=fun2, args=())
    t2 = Process(target=fun2, args=())
    #t3 = Thread(target=fun3, args=())
    t3 = Process(target=fun3, args=())
    #t4 = Thread(target=fun4, args=())
    t4 = Process(target=fun4, args=())
    t1.start()
    t2.start()
    t3.start()
    t4.start()
    start = time.clock()
    t1.join()
    t2.join()
    t3.join()
    t4.join()
    end = time.clock()
    print("Time Taken = ", end - start)
    '''
    start = time.clock()
    fun1()
    fun2()
    fun3()
    fun4()
    end = time.clock()
    print("Time Taken = ", end - start)
    '''
I ran the above program in three ways:
First: sequential execution ALONE (look at the commented code and comment out the code above it)
Second: multithreaded execution ALONE
Third: multiprocessing execution ALONE
The observations for end_time - start_time are as follows:
Overall Running times
('Time Taken = ', 342.5981313667716) --- Running time by threaded execution
('Time Taken = ', 232.94691744899296) --- Running time by sequential Execution
('Time Taken = ', 307.91093406618216) --- Running time by Multiprocessing execution
Question:
I see sequential execution takes the least time and multithreading takes the most time. Why? I am unable to understand this and am surprised by the results. Please clarify.
Since this is a CPU-intensive task and the GIL is acquired, my understanding was that multiprocessing would take the least time, while threaded execution would take the most. Please validate my understanding.
You use time.clock, which gives you CPU time and not real (wall-clock) time. You can't use that in your case, as it measures how long the CPU was busy running your code, which will be almost the same in each of these cases.
Running your code with time.time() instead of time.clock gave me these times on my computer:
Process : ('Time Taken = ', 5.226783990859985)
seq : ('Time Taken = ', 6.3122560000000005)
Thread : ('Time Taken = ', 17.10062599182129)
The task given here (printing) is so fast that the speedup from using multiprocessing is almost balanced by the overhead.
For threading, as you can only have one thread running at a time because of the GIL, you end up running all your functions sequentially, but with the overhead of threading on top (switching threads every few iterations can cost several milliseconds each time). So you end up with something much slower.
Threading is useful if you have waiting times, so you can run other tasks in between.
Multiprocessing is useful for computationally expensive tasks, ideally completely independent ones (no shared variables). If you need to share variables between processes it's a little more complicated (you need explicit inter-process communication, e.g. a Queue or a Manager), but not impossible most of the time.
EDIT: Actually, using time.clock like you did tells you how much CPU overhead threading and multiprocessing cost you.
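A side note of mine, not part of the original answer: in Python 3, time.clock was deprecated and eventually removed (in 3.8); time.perf_counter() is the standard wall-clock timer for this kind of benchmark:

import time

start = time.perf_counter()  # high-resolution wall-clock timer
time.sleep(0.1)  # stand-in for the code under test
end = time.perf_counter()
print("Time Taken =", end - start)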
Basically you're right.
What platform do you use to run the code snippet? I guess Windows.
Note that "print" is not CPU-bound, so you should comment out the "print" and try to run it on Linux to see the difference (it should be what you expect).
Use code like this:
def fun1():
    for _ in xrange(10000000):
        # No print, and please run on Linux
        pass
Hello, I'm trying to calculate the first 10000 prime numbers.
I'm doing this first non-threaded, and then splitting the calculation into 1 to 5000 and 5001 to 10000. I expected the use of threads to make it significantly faster, but the output is like this:
--------Results--------
Non threaded Duration: 0.012244000000000005 seconds
Threaded Duration: 0.012839000000000017 seconds
There is in fact no big difference except that the threaded function is even a bit slower.
What is wrong?
This is my code:
import math
from threading import Thread
def nonThreaded():
    primeNtoM(1, 10000)

def threaded():
    t1 = Thread(target=primeNtoM, args=(1, 5000))
    t2 = Thread(target=primeNtoM, args=(5001, 10000))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

def is_prime(n):
    if n % 2 == 0 and n > 2:
        return False
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

def primeNtoM(n, m):
    L = list()
    if (n > m):
        print("n should be smaller than m")
        return
    for i in range(n, m):
        if (is_prime(i)):
            L.append(i)

if __name__ == '__main__':
    import time
    print("--------Nonthreaded calculation--------")
    nTstart_time = time.clock()
    nonThreaded()
    nonThreadedTime = time.clock() - nTstart_time
    print("--------Threaded calculation--------")
    Tstart_time = time.clock()
    threaded()
    threadedTime = time.clock() - Tstart_time
    print("--------Results--------")
    print("Non threaded Duration: ", nonThreadedTime, "seconds")
    print("Threaded Duration: ", threadedTime, "seconds")
from: https://wiki.python.org/moin/GlobalInterpreterLock
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)
This means: since this is CPU-intensive work, and CPython's memory management is not thread-safe, the interpreter does not allow multiple threads to execute bytecode at once within the same process. So your threads alternate with each other, and the switching overhead is the extra time you observe.
You can use the multiprocessing module, which gives results like below:
('Non threaded Duration: ', 0.016599999999999997, 'seconds')
('Threaded Duration: ', 0.007172000000000005, 'seconds')
...after making just these changes to your code (changing 'Thread' to 'Process'):
import math
#from threading import Thread
from multiprocessing import Process
def nonThreaded():
    primeNtoM(1, 10000)

def threaded():
    #t1 = Thread(target=primeNtoM, args=(1,5000))
    #t2 = Thread(target=primeNtoM, args=(5001,10000))
    t1 = Process(target=primeNtoM, args=(1, 5000))
    t2 = Process(target=primeNtoM, args=(5001, 10000))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
By spawning actual OS processes instead of using in-process threading, you eliminate the GIL issues discussed in Luis Masuelli's answer.
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
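If you also want the primes back (the question's primeNtoM builds a list but discards it), here is a sketch of mine using multiprocessing.Pool, which collects the workers' return values for you; the two ranges mirror the split in the question:

import math
from multiprocessing import Pool

def is_prime(n):
    if n % 2 == 0 and n > 2:
        return False
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

def primeNtoM(args):
    # same range logic as the question, but returning the list
    n, m = args
    return [i for i in range(n, m) if is_prime(i)]

if __name__ == '__main__':
    with Pool(processes=2) as pool:
        chunks = pool.map(primeNtoM, [(1, 5000), (5001, 10000)])
    primes = [p for chunk in chunks for p in chunk]
    print(len(primes), "primes found")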