I have the below code:
import time
from threading import Thread
from multiprocessing import Process
def fun1():
for _ in xrange(10000000):
print 'in fun1'
pass
def fun2():
for _ in xrange(10000000):
print 'in fun2'
pass
def fun3():
for _ in xrange(10000000):
print 'in fun3'
pass
def fun4():
for _ in xrange(10000000):
print 'in fun4'
pass
if __name__ == '__main__':
#t1 = Thread(target=fun1, args=())
t1 = Process(target=fun1, args=())
#t2 = Thread(target=fun2, args=())
t2 = Process(target=fun2, args=())
#t3 = Thread(target=fun3, args=())
t3 = Process(target=fun3, args=())
#t4 = Thread(target=fun4, args=())
t4 = Process(target=fun4, args=())
t1.start()
t2.start()
t3.start()
t4.start()
start = time.clock()
t1.join()
t2.join()
t3.join()
t4.join()
end = time.clock()
print("Time Taken = ",end-start)
'''
start = time.clock()
fun1()
fun2()
fun3()
fun4()
end = time.clock()
print("Time Taken = ",end-start)
'''
I ran the above program in three ways:
First Sequential Execution ALONE(look at the commented code and comment the upper code)
Second Multithreaded Execution ALONE
Third Multiprocessing Execution ALONE
The observations for end_time-start time are as follows:
Overall Running times
('Time Taken = ', 342.5981313667716) --- Running time by threaded execution
('Time Taken = ', 232.94691744899296) --- Running time by sequential Execution
('Time Taken = ', 307.91093406618216) --- Running time by Multiprocessing execution
Question :
I see sequential execution takes least time and Multithreading takes highest time. Why? I am unable to understand and also surprised by results.Please clarify.
Since this is a CPU intensive task and GIL is acquired, my understanding was
Multiprocessing would take least time while threaded execution would take highest time.Please validate my understanding.
You use time.clock, wich gave you CPU time and not real time : you can't use that in your case, as it gives you the execution time (how long did you use the CPU to run your code, wich will be almost the same time for each of these case)
Running your code with time.time() instead of time.clock gave me these time on my computer :
Process : ('Time Taken = ', 5.226783990859985)
seq : ('Time Taken = ', 6.3122560000000005)
Thread : ('Time Taken = ', 17.10062599182129)
The task given here (printing) is so fast that the speedup from using multiprocessing is almost balanced by the overhead.
For Threading, as you can only have one Thread running because of the GIL, you end up running all your functions sequentially BUT you had the overhead of threading (changing threads every few iterations can cost up to several milliseconds each time). So you end up with something much slower.
Threading is usefull if you have waiting times, so you can run tasks in between.
Multiprocessing is usefull for computationnally expensive tasks, if possible completely independant (no shared variables). If you need to share variables, then you have to face the GIL and it's a little bit more complicated (but not impossible most of the time).
EDIT : Actually, using time.clock like you did gave you the information about how much overhead using Threading and Multiprocessing cost you.
Basically you're right.
What platform do you use to run the code snippet? I guess Windows.
Be noticed that "print" is not CPU bound so you should comment out "print" and try to run it on Linux to see the difference (It should be what you expect).
Use code like this:
def fun1():
for _ in xrange(10000000):
# No print, and please run on linux
pass
Related
Have a question, I'm some new in threading, I made this code....
import threading
from colorama import *
import random
import os
listax = [Fore.GREEN,Fore.YELLOW,Fore.RED]
print(random.choice(listax))
def hola():
import requests
a = requests.get('https://google.com')
print(a.status_code)
if __name__ == "__main__":
t1 = threading.Thread(target=hola)
t2 = threading.Thread(target=hola)
t3 = threading.Thread(target=hola)
t1.start()
t2.start()
t3.start()
t1.join()
t2.join()
t3.join()
And output shows 3 times if I execute 3 times the code, but my question is, for example, if I have big code and all start in:
def main():
code...
How I can add multiple threading for fast work, I see I can add 1 thread, if I add 3 threads the output shows 3 times, but how I can do it for example for add 10 threads to the same task without the output repeating 10 times for this execute fast as possible using the resourses of the system?
Multithreading does not magically sped up your code. It's up to you to break the code in chunks that can be run concurrently. When you create 3 threads that run hola, you are not "running hola once using 3 threads", but you are "running hola three times, each time in a different thread.
Although multithreading can be used to perform computation in parallel, the most common python interpreter (CPython) is implemented using a lock (the GIL) that lets only one thread run at a time. There are libraries that release the GIL before doing CPU-intensive work, so threading in python is useful for doing CPU-intensive work. Moreover, I/O operations relese the gil, so multithreading in python is very well suited for I/O work.
As an example, let's imagine that you have to need to access three different sites. You can access them sequentially, one after the other:
import requests
sites = ['https://google.com', 'https://yahoo.com', 'https://rae.es']
def hola(site):
a = requests.get(site)
print(site, " answered ", a.status_code)
for s in sites:
hola(s)
Or concurrently (all at the same time) using threads:
import requests
import threading
sites = ['https://google.com', 'https://yahoo.com', 'https://rae.es']
def hola(site):
a = requests.get(site)
print(site, " answered ", a.status_code)
th = [threading.Thread(target=hola, args=(s, )) for s in sites]
for t in th:
t.start()
for t in th:
t.join()
Please note that this is a simple example: the output can get scrambled, you have no acces to the return values, etc. For this kind of tasks I would use a thread pool.
i tried to use the loop of the code you give me
# Python program to illustrate the concept
# of threading
# importing the threading module
import threading
from colorama import *
import random
import os
listax = [Fore.GREEN,Fore.YELLOW,Fore.RED]
print(random.choice(listax))
"""
def print_cube(num):
function to print cube of given num
print("Cube: {}".format(num * num * num))
"""
def print_square():
num = 2
"""
function to print square of given num
"""
print("Square: {}".format(num * num))
def hola():
import requests
a = requests.get('https://google.com')
print(a.status_code)
if __name__ == "__main__":
for j in range(10):
t1 = threading.Thread(target=hola)
t1.start()
t1.join()
but when i run the code the code run 1 print per time, in my case give me
200
1 sec later again 200
and 200 again (x 10 times because i added 10 thread)
but i want know how i can do for this execute as fast possible without show me the 10 output, just i want the code do 1 print but as fast possible with 10 thread for example
You can simply use a for loop.
number_of_threads is the number of how many threads u want to run
for _ in range(number_of_threads):
t = threading.Thread(target=hola)
t.start()
t.join()
I am trying to use python threading and am having problems getting the threads to work independently. They seem to be running in sequential order and waiting for one to finish before starting to process the next thread. I have read other posts suggesting that I need to get more work into the threads to differentiate actual CPU work vs the CPU work of starting and managing the threads, and that a sleep timer could be used to simulate this. So I tried that and then measured the task durations.
So my code is below. It first runs three tasks sequentially with a 2 second timer. This takes about 6 seconds to run as expected. The next section starts three threads and they should run roughly in parallel if my understanding of threading is correct. I have played with the timers to test the overall duration of this section of code, expecting that if one timer is larger than the other two, the code will execute in an interval closest to that larger one. but what I am seeing is that is taking the same amount of time as the three running in sequence - one after the other.
I got onto this because I am writing some code to read an asynchronous queue in the background. After launching the thread to read the queue, my code seems to stop and wait until the queue reader is stopped, which it normally doesn't as it waits for messages to come in. So what happens is that it never executes the next section of code and it seems to be waiting for the thread to complete.
Also I checked the number of threads active and it remains at the same number, and when I check for the thread ID in the code (not shown) I get the same thread number coming back for every thread.
I am new to python and am using the jupyter compiler environment. Is there a compile option or some other limitation that I am not aware of that is preventing the threading? Am I just not getting the concept? I dont believe that this is related to CPU cores / threading as it would be done through logical thread cores within the python compiled code. I also ran a similar program in a command shell environment and got the same sequential performance.
Cut and paste this code to see what it does. What am I missing?
'''
import threading
import logging
import datetime
import time
import random
class Parallel:
def work(self, interval):
time.sleep(interval)
name = self.__repr__()
print (name, " is complete after ", interval, " seconds")
# SetupLogger()
logging.getLogger().setLevel(logging.DEBUG)
logging.debug("thread program start time is %s", datetime.datetime.now())
thread1 = Parallel()
thread2 = Parallel()
thread3 = Parallel()
print ("sequential threads::")
thread1.work(2.0)
thread2.work(2.0)
thread3.work(2.0)
logging.info("parallel threads start time is %s ", datetime.datetime.now())
start = time.time()
work1 = threading.Thread(target=thread1.work(1), daemon=True)
work1.start()
print ("thread 1 is started and there are ", threading.activeCount(), " threads active")
work2 = threading.Thread(target=thread2.work(2), daemon=False)
work2.start()
print ("thread 2 is started and there are ", threading.activeCount(), " threads active")
work3 = threading.Thread(target=thread3.work(5), daemon=False)
work3.start()
print ("thread 3 is started and there are ", threading.activeCount(), " threads active")
# wait for all to complete
print ("now wait for all to finish at ", datetime.datetime.now())
work1.join()
work2.join()
work3.join()
end = time.time()
logging.info ("parallel threads end time is %s with %s elapsed", datetime.datetime.now(), str(end-start))
print ("all threads completed at:", datetime.datetime.now())
'''
In the line that initializes the thread, you are actually executing the function instead of passing its reference to the thread.
thread1.work() ----> this will actually execute the function when the program runs and encounters this statement
So when your program reaches this line,
work1 = threading.Thread(target=thread1.work(1), daemon=True)
and encounters target=thread1.work(1), it simply calls the function right there and the actual thread does nothing.
thread1.work is a reference to the function, which you need to pass to your Thread object.
So just remove the parenthesis and your code becomes
work1 = threading.Thread(target=thread1.work, daemon=True, args=(1,))
and this will behave as you expect.
In the code below, I am generating cube of a number 9999 and calling the same via thread pool and normal method.
I am timing the difference between the same. Seems like the normal method is way faster. I am running this on a i7 8th gen intel processor with 16 gig ram inside a python 2.7 terminal.
I am baffled by this. May be I am missing something. I hope this question is helpful for people in the future.
import time
from multiprocessing.pool import ThreadPool
def cube():
return 9999*9999*9999
print "Start Execution Threading: "
x = int(round(time.time() * 1000))
pool = ThreadPool()
for i in range(0,100):
result = pool.apply_async(cube, ())
result = pool.apply_async(cube, ())
result = pool.apply_async(cube, ())
# print result.get()
pool.close()
pool.join()
print "Stop Execution Threading: "
y = int(round(time.time() * 1000))
print y-x
print "Start Execution Main: "
x = int(round(time.time() * 1000))
for i in range(0,100):
cube()
cube()
cube()
print "Stop Execution Main: "
y = int(round(time.time() * 1000))
print y-x
Multiprocessing means you will start a new thread. That comes with quite an overhead in that it must be initialized. As such, multi-threading only pays off, especially in python, when you parallelize tasks which all on their own take considerable time to execute (in comparison to python start-up time) and which can be allowed to run asyncronously.
In your case, a simple multiplication, is so quickly executed it will not pay off.
Because of from multiprocessing.pool import ThreadPool, you are using multi-threading and not multi-processing. CPython uses a Global Interpreter Lock to prevent more than one thread to execute Python code at the same time.
So as your program is CPU-bounded, you add the threading overhead with no benefits because of the GIL. Multi-threading does make sense in Python for IO-bounded problem, because a thread can run while others are waiting for IO completion.
You could try to use true multiprocessing, because then each Python process will have its own GIL, but I am still unsure of the gain, because the communication between processes adds even more overhead...
Hello I'm trying to calculate the first 10000 prime numbers.
I'm doing this first non threaded and then splitting the calculation in 1 to 5000 and 5001 to 10000. I expected that the use of threads makes it significant faster but the output is like this:
--------Results--------
Non threaded Duration: 0.012244000000000005 seconds
Threaded Duration: 0.012839000000000017 seconds
There is in fact no big difference except that the threaded function is even a bit slower.
What is wrong?
This is my code:
import math
from threading import Thread
def nonThreaded():
primeNtoM(1,10000)
def threaded():
t1 = Thread(target=primeNtoM, args=(1,5000))
t2 = Thread(target=primeNtoM, args=(5001,10000))
t1.start()
t2.start()
t1.join()
t2.join()
def is_prime(n):
if n % 2 == 0 and n > 2:
return False
for i in range(3, int(math.sqrt(n)) + 1, 2):
if n % i == 0:
return False
return True
def primeNtoM(n,m):
L = list()
if (n > m):
print("n should be smaller than m")
return
for i in range(n,m):
if(is_prime(i)):
L.append(i)
if __name__ == '__main__':
import time
print("--------Nonthreaded calculation--------")
nTstart_time = time.clock()
nonThreaded()
nonThreadedTime = time.clock() - nTstart_time
print("--------Threaded calculation--------")
Tstart_time = time.clock()
threaded()
threadedTime = time.clock() - Tstart_time
print("--------Results--------")
print ("Non threaded Duration: ",nonThreadedTime, "seconds")
print ("Threaded Duration: ",threadedTime, "seconds")
from: https://wiki.python.org/moin/GlobalInterpreterLock
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)
This means: since this is CPU-intensive, and python is not threadsafe, it does not allow you to run multiple bytecodes at once in the same process. So, your threads alternate each other, and the switching overhead is what you get as extra time.
You can use the multiprocessing module, which gives results like below:
('Non threaded Duration: ', 0.016599999999999997, 'seconds')
('Threaded Duration: ', 0.007172000000000005, 'seconds')
...after making just these changes to your code (changing 'Thread' to 'Process'):
import math
#from threading import Thread
from multiprocessing import Process
def nonThreaded():
primeNtoM(1,10000)
def threaded():
#t1 = Thread(target=primeNtoM, args=(1,5000))
#t2 = Thread(target=primeNtoM, args=(5001,10000))
t1 = Process(target=primeNtoM, args=(1,5000))
t2 = Process(target=primeNtoM, args=(5001,10000))
t1.start()
t2.start()
t1.join()
t2.join()
By spawning actual OS processes instead of using in-process threading, you eliminate the GIL issues discussed in #Luis Masuelli's answer.
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads.
Due to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both Unix
and Windows.
Good day!
I'm trying to learn multithreading features in python and I wrote the following code:
import time, argparse, threading, sys, subprocess, os
def item_fun(items, indices, lock):
for index in indices:
items[index] = items[index]*items[index]*items[index]
def map(items, cores):
count = len(items)
cpi = count/cores
threads = []
lock = threading.Lock()
for core in range(cores):
thread = threading.Thread(target=item_fun, args=(items, range(core*cpi, core*cpi + cpi), lock))
threads.append(thread)
thread.start()
item_fun(items, range((core+1)*cpi, count), lock)
for thread in threads:
thread.join()
parser = argparse.ArgumentParser(description='cube', usage='%(prog)s [options] -n')
parser.add_argument('-n', action='store', help='number', dest='n', default='1000000', metavar = '')
parser.add_argument('-mp', action='store_true', help='multi thread', dest='mp', default='True')
args = parser.parse_args()
items = range(NUMBER_OF_ITEMS)
# print 'items before:'
# print items
mp = args.mp
if mp is True:
NUMBER_OF_PROCESSORS = int(os.getenv("NUMBER_OF_PROCESSORS"))
NUMBER_OF_ITEMS = int(args.n)
start = time.time()
map(items, NUMBER_OF_PROCESSORS)
end = time.time()
else:
NUMBER_OF_ITEMS = int(args.n)
start = time.time()
item_fun(items, range(NUMBER_OF_ITEMS), None)
end = time.time()
#print 'items after:'
#print items
print 'time elapsed: ', (end - start)
When I use mp argument, it works slower, on my machine with 4 cpus, it takes about 0.5 secs to compute result, while if I use a single thread it takes about 0.3 secs.
Am I doing something wrong?
I know there's Pool.map() and e.t.c but it spawns subprocess not threads and it works faster as far as I know, but I'd like to write my own thread pool.
Python has no true multithreading, due to an implementation detail called the "GIL". Only one thread actually runs at a time, and Python switches between the threads. (Third party implementations of Python, such as Jython, can actually run parallel threads.)
As to why actually your program is slower in the multithreaded version depends, but when coding for Python, one needs to be aware of the GIL, so one does not believe that CPU bound loads are more efficiently processed by adding threads to the program.
Other things to be aware of are for instance multiprocessing and numpy for solving CPU bound loads, and PyEv (minimal) and Tornado (huge kitchen sink) for solving I/O bound loads.
You'll only see an increase in throughput with threads in Python if you have threads which are IO bound. If what you're doing is CPU bound then you won't see any throughput increase.
Turning on the thread support in Python (by starting another thread) also seems to make some things slower so you may find that overall performance still suffers.
This is all cpython of course, other Python implementations have different behaviour.