I'm trying to finish my programming course and I'm stuck on one exercise.
I have to measure how long it takes in Python to create threads, and whether that depends on the number of threads already created.
I wrote a simple script and I don't know if it is good:
import threading
import time

def fun1(a, b):
    c = a + b
    print(c)
    time.sleep(100)

times = []
for i in range(10000):
    start = time.time()
    threading.Thread(target=fun1, args=(55,155)).start()
    end = time.time()
    times.append(end-start)
print(times)
In times[] I got 10000 results that are near 0.0 or exactly 0.0.
And now I don't know whether I wrote the test wrongly because I don't understand something, or whether the result is correct and the time to create a thread does not depend on the number of threads already created.
Can you help me with this? If it's the wrong solution, please explain why, or if it's correct, confirm it. :)
So there are two ways to interpret your question:
Whether the existence of other threads (that have not been started) affects creation time for new threads
Whether other threads running in the background (threads already started) affects creation time for new threads.
Checking the first one
In this case, you simply don't start the threads:
import threading
import time

def fun1(a, b):
    c = a + b
    print(c)
    time.sleep(100)

times = []
for i in range(10):
    start = time.time()
    threading.Thread(target=fun1, args=(55,155)) # don't start
    end = time.time()
    times.append(end-start)
print(times)
output for 10 runs:
[4.696846008300781e-05, 2.8848648071289062e-05, 2.6941299438476562e-05, 2.5987625122070312e-05, 2.5987625122070312e-05, 2.5987625122070312e-05, 2.5987625122070312e-05, 2.5987625122070312e-05, 2.5033950805664062e-05, 2.6941299438476562e-05]
As you can see, the times are about the same (as you would expect).
Checking the second one
In this case, we want the previously created threads to keep running as we create more threads. So we give each thread a task that never finishes:
import threading
import time

def fun1(a, b):
    while True:
        pass # never ends

times = []
for i in range(100):
    start = time.time()
    threading.Thread(target=fun1, args=(55,155)).start()
    end = time.time()
    times.append(end-start)
print(times)
output:
Over 100 runs, the first one took 0.0003440380096435547 whereas the last one took 0.3017098903656006, so the start time grew by nearly three orders of magnitude.
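If the goal is to separate the cost of starting a thread from the cost of competing with busy threads, one option is to keep the earlier threads alive but idle. The sketch below is an assumption about how one might set that up, not part of the answer above: each thread blocks on a threading.Event, and the names measure_start_times and release are made up for this example. It uses time.perf_counter() rather than time.time() because perf_counter is intended for timing short intervals; on some platforms time.time() ticks too coarsely to resolve a thread start, which may be why the original list showed exact zeros.

```python
import threading
import time

def measure_start_times(n):
    """Start n threads that stay alive but idle; return each start() duration."""
    release = threading.Event()          # threads block here instead of spinning
    threads, times = [], []
    for _ in range(n):
        t = threading.Thread(target=release.wait)
        begin = time.perf_counter()      # high-resolution clock for short intervals
        t.start()
        times.append(time.perf_counter() - begin)
        threads.append(t)
    release.set()                        # let every thread finish
    for t in threads:
        t.join()
    return times

times = measure_start_times(200)
print("first:", times[0], "last:", times[-1])
```

Comparing the first and last entries (or plotting the whole list) shows whether the start cost grows with the number of live threads on a given machine.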
I am using schedule module to automatically run a function...
I would like to change the scheduling interval dynamically, but my attempt does not work.
Code -
import schedule
import pandas
from time import gmtime, strftime, sleep
import time
import random

time = 0.1

def a():
    global time
    print(strftime("%Y-%m-%d %H:%M:%S", gmtime()))
    index = random.randint(1, 9)
    print(index, time)
    if(index==2):
        time = 1
    print(strftime("%Y-%m-%d %H:%M:%S", gmtime()))

schedule.every(time).minutes.do(a) #specify the minutes to automatically run the api
while True:
    schedule.run_pending()
In this program, I scheduled the program to run every 6 seconds. And if the random integer - index value becomes 2, then the time variable is assigned as 1(1 minute). I checked, the time variable is changed to 1 after the random integer index becomes 2. The issue - After changing the time variable to 1, the scheduling still runs the function a() every 6 seconds not 1 minute.
How to change the scheduling time dynamically?
Thank you
After changing the time variable to 1, the scheduling still runs the function a() every 6 seconds not 1 minute.
This is because the line schedule.every(time).minutes.do(a) is executed only once, while time is still 0.1 (6 seconds). The interval is fixed at that moment, so changing the variable later has no effect on the job that is already scheduled.
How to change the scheduling time dynamically?
After reading the documentation, I found nothing (I think) about changing the interval manually when a certain condition is satisfied. It does have a built-in random interval feature, where the library itself picks a random time within a given range.
In your case you could do:
schedule.every(5).to(10).seconds.do(a)
The problem is that this still does not let you change the interval when a certain condition is satisfied.
There may be a way to do that, but I could not figure it out. I hope this information helps you investigate further.
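One possible way around this, sketched here without the schedule library, is to run the job in a plain loop and re-read the interval before every sleep, so that a change made by the job takes effect on the next cycle. The names run_dynamic and state and the injectable sleep parameter are made up for this example.

```python
import random
import time

def run_dynamic(job, state, max_runs, sleep=time.sleep):
    """Call job(state) max_runs times, sleeping state['minutes'] before each call.

    The interval is read on every cycle, so job may change it at any time.
    """
    for _ in range(max_runs):
        sleep(state["minutes"] * 60)
        job(state)

def a(state):
    index = random.randint(1, 9)
    print(index, state["minutes"])
    if index == 2:
        state["minutes"] = 1   # takes effect on the next cycle

# Demo with a fake sleep that only records the requested delay,
# so it finishes instantly; omit the sleep argument to wait for real.
slept = []
state = {"minutes": 0.1}
run_dynamic(a, state, max_runs=5, sleep=slept.append)
print(slept)
```

Keeping the interval in a mutable object that the loop consults each time is what makes the change visible; a value passed once at scheduling time cannot be updated that way.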
I usually use custom schedulers, as they allow greater control and are also less memory intensive. The variable "time" needs to be shared between processes. This is where Manager().Namespace() comes to rescue. It talks 'between' processes.
import time
import random
from multiprocessing import Process, Manager

def a(ns):
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime()))
    index = random.randint(1, 4)
    if index == 2:
        ns.time = 1
    print(index, ns.time)

# The guard is needed on platforms that spawn child processes (e.g. Windows)
if __name__ == '__main__':
    ns = Manager().Namespace()
    ns.time = 0.1  # interval in minutes, shared between processes
    processes = []
    while True:
        try:
            s = time.time() + ns.time*60
            # reap finished workers; iterate over a copy because we remove items
            for x in processes[:]:
                if not x.is_alive():
                    x.join()
                    processes.remove(x)
            print('Sleeping :', round(s - time.time()))
            time.sleep(round(s - time.time()))
            p = Process(target=a, args=(ns,))
            p.start()
            processes.append(p)
        except:  # includes Ctrl-C, so the workers are cleaned up on exit
            print('Killing all to prevent orphaning ...')
            for p in processes:
                p.terminate()
            processes.clear()
            break
I am playing around with multiprocessing in Python 3 to try and understand how it works and when it's good to use it.
I am basing my examples on this question, which is really old (2012).
My computer runs Windows and has 4 physical cores (8 logical cores).
First: not segmented data
First I try to brute-force compute numpy.sin for a million values. The million values form a single chunk, not segmented.
import time
import numpy
from multiprocessing import Pool

# so that iPython works
__spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)"

def numpy_sin(value):
    return numpy.sin(value)

a = numpy.arange(1000000)

if __name__ == '__main__':
    pool = Pool(processes = 8)
    start = time.time()
    result = numpy.sin(a)
    end = time.time()
    print('Singled threaded {}'.format(end - start))
    start = time.time()
    result = pool.map(numpy_sin, a)
    pool.close()
    pool.join()
    end = time.time()
    print('Multithreaded {}'.format(end - start))
And I find that, no matter the number of processes, the 'multithreaded' version always takes about 10 times as long as the 'single threaded' one. In the task manager, I see that not all the CPUs are maxed out, and the total CPU usage stays between 18% and 31%.
So I try something else.
Second: segmented data
I try to split up the original 1 million computations in 10 batches of 100,000 each. Then I try again for 10 million computations in 10 batches of 1 million each.
import time
import numpy
from multiprocessing import Pool

# so that iPython works
__spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)"

def numpy_sin(value):
    return numpy.sin(value)

p = 3
s = 1000000
a = [numpy.arange(s) for _ in range(10)]

if __name__ == '__main__':
    print('processes = {}'.format(p))
    print('size = {}'.format(s))
    start = time.time()
    result = numpy.sin(a)
    end = time.time()
    print('Singled threaded {}'.format(end - start))
    pool = Pool(processes = p)
    start = time.time()
    result = pool.map(numpy_sin, a)
    pool.close()
    pool.join()
    end = time.time()
    print('Multithreaded {}'.format(end - start))
I ran this last piece of code for different numbers of processes p and different list lengths s, 100000 and 1000000.
At least now the task Manager gives the CPU maxed out at 100% usage.
I get the following results for the elapsed times (ORANGE: multiprocess, BLUE: single):
So multiprocessing never wins over the single process.
Why??
Numpy changes how the parent process runs so that it only runs on one core. You can call os.system("taskset -p 0xff %d" % os.getpid()) after you import numpy to reset the CPU affinity so that all cores are used.
See this question for more details
A computer can really only do one thing at a time. When multi-threading or multi-processing, the computer is really only switching back and forth between tasks quickly. With the provided problem, the computer could either perform the calculation 1,000,000 times, or split-up the work between a couple "workers" and perform 100,000 for each of 10 "workers".
Multi-processing shines not when computing something straight out, as the computer has to take time to create multiple processes, but while waiting for something. The main example I've heard is for webscraping. If a program requested data from a list of websites and waited for each server to send data before requesting data from the next, the program will have to sit for a couple seconds. If instead, the computer used multiprocessing/threading to ask all the websites first and all concurrently wait, the total running time is much shorter.
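The waiting case can be sketched with simulated requests. Here time.sleep stands in for network latency, and fetch and the example URLs are placeholders invented for the illustration; real scraping code would call an HTTP client instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url, delay=0.2):
    """Pretend to download url; the sleep stands in for waiting on a server."""
    time.sleep(delay)
    return url

urls = ["https://example.com/page%d" % i for i in range(5)]  # placeholder URLs

begin = time.perf_counter()
sequential = [fetch(u) for u in urls]            # waits happen one after another
sequential_time = time.perf_counter() - begin

begin = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(urls)) as pool:
    threaded = list(pool.map(fetch, urls))       # all waits overlap
threaded_time = time.perf_counter() - begin

print("sequential: %.2fs, threaded: %.2fs" % (sequential_time, threaded_time))
```

The sequential run takes roughly the sum of the delays, while the threaded run takes roughly one delay, because the threads spend their time waiting rather than computing.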
I'm trying to modify a Python script to multiprocess with "Process". The problem is it's not working. In a first step, the content is retrieved sequentially (test1, test2). In a second one, it is to be called in parallel (test1 and test2). There is practically no speed difference. If you execute the functions individually, you will notice a difference. In my opinion, parallelization should only take as long as the longest individual process. What am I missing here?
import multiprocessing
import time

def test1(k):
    k = k * k
    for e in range(1, k):
        e = e**k

def test2(k):
    k = k * k
    for e in range(1, k):
        e = e + 5 - 5*k ** 4000

if __name__ == '__main__':
    start = time.time()
    test1(100)
    test2(100)
    end = time.time()
    print(end-start)
    start = time.time()
    worker_1 = multiprocessing.Process(target=test1(100))
    worker_1.start()
    worker_2 = multiprocessing.Process(target=test2, args=(100,))
    worker_2.start()
    worker_1.join()
    worker_2.join()
    end = time.time()
    print(end-start)
I want to add that I checked the task manager and saw that only 1 core is used. (4 real Core only 25% CPU => 1Core 100% used)
I know Pool Class, but I don't want to use it.
Thank you for your help.
Update
Hello, everybody,
the example with the "typo" was unfortunate. Sorry about that. Bakuriu, thank you for your answer. In fact, you're right. I think it was the typo and too much work. :-( So I changed the example once again. For all who are interested:
I create two functions. In the first part of the main block I run the functions three times sequentially. My computer needs approx. 36 sec. Then I start two new processes, which calculate their results in parallel. As a small addition, the main process of the program itself also calculates the function test1, which shows that the main program can do work in the meantime as well. I get a computing time of 12 sec. To make it clear to everyone what this means, I attached a picture here.
Task Manager
import multiprocessing
import time

def test1(k):
    k = k * k
    for e in range(1, k):
        e = e**k

def test2(k):
    k = k * k
    for e in range(1, k):
        e = e**k

if __name__ == '__main__':
    start = time.time()
    test1(100)
    test2(100)
    test1(100)
    end = time.time()
    print(end-start)
    start = time.time()
    worker_1 = multiprocessing.Process(target=test1, args=(100,))
    worker_1.start()
    worker_2 = multiprocessing.Process(target=test2, args=(100,))
    worker_2.start()
    test1(100)
    worker_1.join()
    worker_2.join()
    end = time.time()
    print(end-start)
Your code is executing sequentially because instead of passing test1 to the Process's target argument you are passing test1's result to it!
You want to do this:
worker_1 = multiprocessing.Process(target=test1, args=(100,))
As you do in the other call not this:
worker_1 = multiprocessing.Process(target=test1(100))
This code is first executing test1(100), then returns None and assigns that to target spawning an "empty process". After that you spawn a second process that executes test2(100). So you execute the code sequentially plus you add the overhead of spawning two processes.
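The difference can be seen without starting any process. In this sketch work and calls are made-up names; the function records each call so that it is visible when it actually runs.

```python
import multiprocessing

calls = []

def work(k):
    calls.append(k)   # record that the function ran in this process

# Wrong: work(1) runs right here in the parent, and its return value (None)
# becomes the target of the new process.
wrong = multiprocessing.Process(target=work(1))
after_wrong = list(calls)

# Right: nothing runs yet; the child would call work(2) once start() is invoked.
right = multiprocessing.Process(target=work, args=(2,))
after_right = list(calls)

print(after_wrong, after_right)   # [1] [1]
```

Only the first construction adds an entry to calls, which shows that the work was done by the parent before any process existed.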
I need to call a function at exactly 08:00, 18:00 and 22:00. I've created an example to test comparing the current time against one of those times. I put it inside a while loop, thinking it would work like a stopwatch, but I think I'm wrong. What is the best way to compare those values?
currentH= dt.datetime.now().strftime("%H:%M:%S")
h = "16:15:10"
while True:
    if(currentH==h):
        print 'Ok'
        print 'The current Hour is: '+h
import datetime as dt
import time

currentH= dt.datetime.now().replace(microsecond=0).time()
hrs = ['00:02', '12:00']
for i in range(len(hrs)):
    h = [int(x) for x in hrs[i].split(':')]
    h = dt.datetime.now().replace(hour=h[0], minute=h[1], second=0,microsecond=0).time()
    hrs[i] = h

while True:
    currentH = dt.datetime.now().replace(microsecond=0).time()
    print(currentH)
    if currentH in hrs:
        print('Time is now',currentH)
    time.sleep(1)
The biggest problem with your code is that you never call now() again inside the loop, so you're just spinning forever comparing the initial time to 16:15:10.
While we're at it: Why convert the time to a string for comparison instead of just comparing times?
But there are bigger problems with this design that can't be fixed as easily.
What happens if you check the time at 16:15, then go to sleep, then wake up at 16:25? Then now() never returns 16:15:10.
Also, do you really want to burn 100% CPU for 10 hours?
A better solution is to write a sleep_until function:
def sleep_until(target):
    left = target - dt.datetime.now()
    if left > dt.timedelta(seconds=0):
        time.sleep(left.total_seconds())
(If you're using Python 2.7 or 3.4, it's a bit more complicated, because sleep will wake up early if there's a signal. But to handle that case, you just need to add a while True: loop around the whole thing.)
Now, the only tricky bit is working out the first time you need to sleep until, which isn't all that tricky:
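import itertools

# Start from yesterday's 22:00 so that today's 08:00 is still a candidate
# when the script is launched before 08:00.
waits = itertools.cycle(dt.timedelta(hours=wait) for wait in (10, 10, 4))
now = dt.datetime.now()
start = dt.datetime.combine(dt.date.today(), dt.time(hour=22)) - dt.timedelta(days=1)
for wait in waits:
    start += wait
    if start > now:
        break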
And now, we just loop over the waits forever, sleeping until each next time:
for wait in waits:
    sleep_until(start)
    print('Time to make the donuts')
    start += wait
Or, of course, you could just grab one of the many scheduling libraries off PyPI.
Or just use your platform's cron/launchd/Scheduled Tasks API to run your script.
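For the specific case of fixed daily times, the next run can also be computed directly instead of tracking the cycle by hand. This is only a sketch; next_run and DAILY_TIMES are names invented for the example.

```python
import datetime as dt

DAILY_TIMES = [dt.time(8), dt.time(18), dt.time(22)]

def next_run(now, times=DAILY_TIMES):
    """Return the first datetime strictly after now that falls on one of times."""
    for day in (now.date(), now.date() + dt.timedelta(days=1)):
        for t in sorted(times):
            candidate = dt.datetime.combine(day, t)
            if candidate > now:
                return candidate

print(next_run(dt.datetime(2024, 1, 1, 7, 30)))   # 2024-01-01 08:00:00
print(next_run(dt.datetime(2024, 1, 1, 23, 0)))   # 2024-01-02 08:00:00
```

A loop would then call sleep_until(next_run(dt.datetime.now())), run the job, and repeat.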
I am working on a project in which an accurate timer is really crucial. I am working in Python and am using the time.sleep() function.
I noticed that time.sleep() adds additional delay because of scheduling (refer to the time.sleep docs). Because of that, the longer my program runs, the more inaccurate the timer becomes.
Is there any more accurate timer/ticker to sleep the program or solution for this problem?
Any help would be appreciated. Cheers.
I had a solution similar to above, but it became processor heavy very quickly. Here is a processor-heavy idea and a workaround.
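import time

# time.clock() was removed in Python 3.8; time.perf_counter() is used instead.

def processor_heavy_sleep(ms):  # fine for ms, starts to work the computer hard in second range.
    start = time.perf_counter()
    end = start + ms / 1000.
    while time.perf_counter() < end:
        continue
    return start, time.perf_counter()

def efficient_sleep(secs, expected_inaccuracy=0.5):  # for longer times
    start = time.perf_counter()
    end = secs + start
    # sleep for most of the interval, then busy-wait for the remainder
    time.sleep(max(0, secs - expected_inaccuracy))
    while time.perf_counter() < end:
        continue
    return start, time.perf_counter()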
output of efficient_sleep(5, 0.5) 3 times was:
(3.1999303695151594e-07, 5.0000003199930365)
(5.00005983869791, 10.00005983869791)
(10.000092477987678, 15.000092477987678)
This is on windows. I'm running it for 100 loops right now. Here are the results.
(485.003749358414, 490.003749358414)
(490.0037919174879, 495.0037922374809)
(495.00382903668014, 500.00382903668014)
The sleeps remain accurate, but the calls are always delayed a little. If you need a scheduler that accurately calls every xxx secs to the millisecond, that would be a different thing.
the longer my program runs, the more inaccurate the timer is.
So, for example, when expecting a 0.5 s delay, it would be time.sleep(0.5 - (start-end)). But that still didn't solve the issue.
You seem to be complaining about two effects: 1) the fact that time.sleep() may take longer than you expect, and 2) the inherent creep in using a series of time.sleep() calls.
You can't do anything about the first, short of switching to a real-time OS. The underlying OS calls are defined to sleep for at least as long as requested. They only guarantee that you won't wake early; they make no guarantee that you won't wake up late.
As for the second, you ought to figure your sleep time according to an unchanging epoch, not from your wake-up time. For example:
import time
import random

target = time.time()

def myticker():
    # Sleep for 0.5s between tasks, with no creep
    global target  # rebinds the module-level epoch; without this, += raises UnboundLocalError
    target += 0.5
    now = time.time()
    if target > now:
        time.sleep(target - now)

def main():
    previous = time.time()
    for _ in range(100):
        now = time.time()
        print(now - previous)
        previous = now
        # simulate some work
        time.sleep(random.random() / 10)  # Always < tick frequency
        # time.sleep(random.random())  # Not always < tick frequency
        myticker()

if __name__ == "__main__":
    main()
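The effect of the fixed epoch can be checked with plain arithmetic, with no real sleeping needed. The sketch below simulates both strategies under the assumption that each task takes a fixed 0.1 s; simulate_naive and simulate_epoch are names made up for the illustration.

```python
def simulate_naive(ticks, period=0.5, work=0.1):
    """Sleep a full period after each task: wake-ups drift by `work` per tick."""
    now, wakeups = 0.0, []
    for _ in range(ticks):
        now += work        # do the task
        now += period      # then sleep a full period
        wakeups.append(now)
    return wakeups

def simulate_epoch(ticks, period=0.5, work=0.1):
    """Sleep until a target derived from the start time: no accumulated drift."""
    now, target, wakeups = 0.0, 0.0, []
    for _ in range(ticks):
        now += work                 # do the task
        target += period            # next tick is relative to the epoch
        now = max(now, target)      # sleep only for the remaining time
        wakeups.append(now)
    return wakeups

print(round(simulate_naive(100)[-1], 6))   # 60.0
print(round(simulate_epoch(100)[-1], 6))   # 50.0
```

After 100 ticks of 0.5 s the naive loop is 10 s late, while the epoch-based loop finishes at the expected 50 s.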
Working on Linux with zero knowledge of Windows, I may be being naive here but is there some reason that writing your own sleep function, won't work for you?
Something like:
import time

def sleep_time():
    start_time = time.time()
    while (time.time() - start_time) < 0.0001:
        continue

end_time = time.time() + 60 # run for a minute
cnt = 0
while time.time() < end_time:
    cnt += 1
    print('sleeping',cnt)
    sleep_time()
    print('Awake')
print("Slept ",cnt," Times")