I'm trying to familiarize myself with multiprocessing, so I used a brute-force search to find a string.
The script works as expected and, thanks to the use of an iterator, RAM usage is pretty good. What I don't understand is what happens after the "password" has been found. The script always takes about double the time to exit (in this example, 70 sec to find the password and 160 sec to complete) even though, as far as I understand, the only thing it still has to do is terminate all the processes.
Is this what is happening, or is there something else?
import itertools
import multiprocessing as mp
import string
import time
# start timer
tStart = time.time()
userPass = 'mypass'
def getPassword(passList):
    str = ''.join(passList)
    if userPass == str:
        print('\n')
        print('~~~ CRACKED ~~~')
        print('User password is {}'.format(str))
        print('Cracked password in {:.3f} seconds'.format(time.time() - tStart))
if __name__ == '__main__':
    # possible characters used in password
    chars = list(string.ascii_lowercase)
    # get all character combinations
    allPasswords = itertools.product(chars, repeat=len(userPass))
    # calculate optimum chunk number
    totalComb = len(chars) ** len(userPass)
    nChunks = int(max(1, divmod(totalComb, mp.cpu_count() * 4)[0]))
    with mp.Pool(processes=mp.cpu_count()) as pool:
        for result in pool.imap_unordered(getPassword, allPasswords, chunksize=nChunks):
            if result == userPass:
                pool.terminate()
                break
            del result # trying to reduce memory usage
    tEnd = time.time()
    tElapsed = tEnd - tStart
    print('Total elapsed time {:.3f} seconds'.format(tElapsed))
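Make your getPassword function return the string (at least in the success case). Right now it always returns the default None, so result == userPass is never true and pool.terminate() is never executed. The loop therefore keeps consuming results until every remaining combination has been checked, which is why the script runs long after the password is printed.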
Also, you might want to use a much smaller chunksize.
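A minimal sketch of that fix, under a few assumptions: the names check_candidate and find_password are hypothetical, the target is passed in with functools.partial instead of being read from a global, and the chunksize of 1000 is only an illustrative "much smaller" value, not a tuned one.

```python
import itertools
import multiprocessing as mp
import string
from functools import partial


def check_candidate(candidate, target):
    """Return the joined candidate if it matches target, otherwise None."""
    guess = ''.join(candidate)
    return guess if guess == target else None


def find_password(target, chunksize=1000):
    """Search all lowercase strings of len(target); stop at the first match."""
    candidates = itertools.product(string.ascii_lowercase, repeat=len(target))
    worker = partial(check_candidate, target=target)
    with mp.Pool() as pool:
        for result in pool.imap_unordered(worker, candidates, chunksize=chunksize):
            if result is not None:
                pool.terminate()  # stop the remaining workers right away
                return result
    return None


if __name__ == '__main__':
    # 'abc' is a deliberately short stand-in so the demo finishes quickly.
    print(find_password('abc'))
```

Because the worker now returns the match, the loop in the parent can see it and leave early. With smaller chunks, a match also reaches the parent sooner, since results only come back when a whole chunk has finished.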
I was testing a program to do something every N seconds, but I bumped into a weird problem.
If I use something simple like this:
import time
def main():
    start_t = time.time()
    while(True):
        if (time.time()-start_t)%10 == 0:
            print("Test")
if __name__ == "__main__":
    main()
the program works as expected, i.e. it prints "Test" every 10 seconds.
However, I made a small modification because I need to check the current date at every iteration. If I change the program to this:
import time
from datetime import datetime
def main():
    start_t = time.time()
    path_screenshots = "screenshots"
    while(True):
        path_screenshots_today = f"{path_screenshots}/{datetime.now().strftime('%Y_%m_%d')}/"
        if (time.time()-start_t)%10 == 0:
            print(f"Checking folder {path_screenshots_today}...")
if __name__ == "__main__":
    main()
I would expect the program to print "Checking folder {path_screenshots_today}" every 10 seconds again, but instead it keeps running, without printing anything.
I understand that the result of the operation (time.time()-start_t)%10 is never precisely equal to 0, which might be creating the issue...but then, why does it even work in the first case?
I suspect it is working in the first case because the loop is running fast enough that it happens to line up. The lag created by creating path_screenshots_today (particularly the datetime.now() call) causes it not to line up as often. To actually do what you want, try:
import time
from datetime import datetime
def main():
    last = time.time()
    path_screenshots = "screenshots"
    while True:
        path_screenshots_today = f"{path_screenshots}/{datetime.now().strftime('%Y_%m_%d')}/"
        if time.time() - last >= 10:
            last = time.time()
            print(f"Checking folder {path_screenshots_today}...")
if __name__ == "__main__":
    main()
The first case works because the time is checked frequently enough, which does not happen in the second case because of the delay introduced by the string formatting. A more robust way is the following:
start_t = time.time()
while True:
    path_screenshots_today = f"{path_screenshots}/{datetime.now().strftime('%Y_%m_%d')}/"
    tt = time.time()
    if tt - start_t >= 10:
        print(f"Checking folder {path_screenshots_today}...")
        start_t = tt # set last check time to "now"
And an even better way would be:
while True:
    path_screenshots_today = f"{path_screenshots}/{datetime.now().strftime('%Y_%m_%d')}/"
    print(f"Checking folder {path_screenshots_today}...")
    time.sleep(10)
This avoids "busy waiting", i.e. keeping the CPU running like crazy.
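One caveat with a plain time.sleep(10) is that the period becomes 10 seconds plus however long the loop body takes. A minimal sketch of a variant that sleeps until fixed deadlines instead, assuming a hypothetical helper named run_every and using time.monotonic() so that system clock adjustments do not disturb the schedule:

```python
import time


def run_every(interval, action, iterations):
    """Call action() `iterations` times, aiming for one call per `interval` seconds."""
    next_run = time.monotonic()
    for _ in range(iterations):
        action()
        next_run += interval
        # Sleep only for the time left until the next deadline (never negative).
        time.sleep(max(0.0, next_run - time.monotonic()))


if __name__ == "__main__":
    # A 1-second interval keeps the demo short; use 10 for the case in the question.
    run_every(1, lambda: print("Checking folder..."), iterations=3)
```

The iterations parameter exists only so the sketch terminates; a long-running script would loop indefinitely.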
It's a coincidence of how often the check is happening. If you actually loop over and print your value, you'll notice it's floating point:
while(True):
    print('Current value is, ', (time.time()-start_t)%10)
You'll see output like this:
Current value is, 0.45271849632263184
Current value is, 0.45272231101989746
Given that you're doing so little in your loop, the odds are good that you'll coincidentally do that evaluation when the current value is exactly 0.0. But when you add some extra computation, even just the string formatting in datetime, each iteration of your loop will take a little longer and you might just happily skip over 0.0.
So strictly speaking, you should cast your value to an int before comparing it to 0, e.g. int((time.time() - start_t) % 10) == 0. That will be true for an entire second: the condition stays true until the modulus value is no longer zero, one second after it first became true.
A better solution, however, is probably just to use the time.sleep() function. You can call time.sleep to sleep for a number of seconds:
time.sleep(10) # Sleep for 10 seconds
When I run the script below, the print statements appear at the same time. I would like the first print statement to display, then the loop to execute, then the run time to print.
Is there an additional statement I need to include?
import time
def main():
    Time = time.clock()
    print("Counting to 100,000,000:", end = "")
    for i in range(1,100000000):
        a = 3
    print(" -- Time: ", time.clock() - Time)
if __name__ == "__main__":
    main()
There is a buffer in Python's standard output. This means that data you want to write (as in print()) is sometimes stored and then written in one go - as you are experiencing here.
To override this, just call sys.stdout.flush() which "flushes" everything already in this buffer out.
So to apply this to your code, it would look something like:
import time, sys
def main():
    Time = time.clock()
    print("Counting to 100,000,000:", end = "")
    sys.stdout.flush()
    for i in range(1,100000000):
        a = 3
    print(" -- Time: ", time.clock() - Time)
if __name__ == "__main__":
    main()
which ran as expected for me (producing the following output) with the second part coming through later.
Counting to 100,000,000: -- Time: 4.796258999999999
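On Python 3.3 and later, print() also accepts flush=True, which has the same effect as calling sys.stdout.flush() right after it. The sketch below shows that variant. It uses time.perf_counter() because time.clock() was removed in Python 3.8; the function name timed_count and its out parameter are hypothetical, added only to make the example easy to try.

```python
import sys
import time


def timed_count(n, out=sys.stdout):
    """Print a label immediately, run a busy loop, then print the elapsed time."""
    start = time.perf_counter()
    # flush=True pushes the label out before the loop starts.
    print("Counting to {:,}:".format(n), end="", file=out, flush=True)
    for _ in range(n):
        pass
    elapsed = time.perf_counter() - start
    print(" -- Time: {:.3f}".format(elapsed), file=out)
    return elapsed


if __name__ == "__main__":
    timed_count(10_000_000)
```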
I use the multiprocessing library in Python to distribute a function over multiple cores. To do that I use Pool, but I want to know when each processor has completed its work.
Here is the code:
def parallel(m,G):
    D=0
    for i in xrange(G):
        D+=random()
    return 1*(D<1)
pool=Pool()
TOTAL=0
for i in xrange(10):
    TOTAL += sum(pool.map(partial(parallel,G=2),xrange(100)))
print TOTAL
I know how to use time.time() in a normal situation, but what I need is to know when each core has completed its part of the job. If I put a timestamp directly in the function, I will get many time values without knowing which core each one was processed on.
Any advice is welcome!
You may return the completion time along with the actual result from parallel and then pick the last timestamp for each worker.
import time
from random import random
from functools import partial
from multiprocessing import Pool, current_process
def parallel(m, G):
    D = 0
    for i in xrange(G):
        D += random()
    # uncomment to give the other workers more chances to run
    # time.sleep(.001)
    return (current_process().name, time.time()), 1 * (D < 1)
# don't deny the existence of Windows
if __name__ == '__main__':
    pool = Pool()
    TOTAL = 0
    proc_times = {}
    for i in xrange(5):
        # times is a sequence of (proc_name, timestamp) pairs
        times, results = zip(*pool.map(partial(parallel, G=2), xrange(100)))
        TOTAL += sum(results)
        # proc_times_loc is guaranteed to hold the last timestamp
        # for each proc_name, see the doc on dict
        proc_times_loc = dict(times)
        print 'local completion times:', proc_times_loc
        proc_times.update(proc_times_loc)
    print TOTAL
    print 'total completion times:', proc_times
However, when jobs are that simple, you may find that calling time.time each time consumes too much CPU time.
I have a problem here. I want to stop the print command at a desired time. I wrote some code, but it keeps looping. Here is the code:
import time
t = time.strftime("%H%M%S")
while ti:
    print(time.strftime("%H%M%S"))
    time.sleep(1)
    if t = ("140000"): #just example of time to stop print
        break
Thanks
t = time.strftime("%H%M%S")
is only executed once before the loop, so t's value doesn't ever change.
Comparing formatted strings is a fragile way to check a time difference. Python's datetime module allows subtraction of timestamps, so you can easily check how much time has passed since something happened without doing any string comparisons.
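A minimal sketch of the subtraction idea: subtracting two datetime objects gives a timedelta, and total_seconds() turns it into a number that can be compared directly. The helper name seconds_remaining and the three-second stop time are assumptions made for illustration.

```python
import time
from datetime import datetime, timedelta


def seconds_remaining(stop, now=None):
    """Return the seconds from now until stop (negative once stop has passed)."""
    if now is None:
        now = datetime.now()
    return (stop - now).total_seconds()


if __name__ == "__main__":
    # Hypothetical stop time: three seconds from now, so the demo ends quickly.
    stop = datetime.now() + timedelta(seconds=3)
    while seconds_remaining(stop) > 0:
        print(datetime.now().strftime("%H%M%S"))
        time.sleep(1)
```

Because the loop tests "has the stop time passed" and not "is it exactly this second", a slightly late wake-up from sleep cannot skip the exit condition.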
This will work
import time
t = time.strftime("%H%M%S")
while t:
    t = time.strftime("%H%M%S")
    print(time.strftime("%H%M%S"))
    time.sleep(1)
    if t == ("140000"): #just example of time to stop print
        break
You had some bugs in your code:
while ti: --> while t:
if t = ("140000"): --> if t == ("140000"):
and you were missing this line inside the loop: t = time.strftime("%H%M%S")
time.sleep(1) may sleep slightly less or more than a second, so the loop can skip the exact second; testing t == "140000" is therefore not enough.
To stop a loop at a given local time:
import time
from datetime import datetime
stop_dt = datetime.combine(datetime.now(), datetime.strptime("1400", "%H%M").time())
stop_time = time.mktime(stop_dt.timetuple())
while time.time() < stop_time:
    print(time.strftime("%H%M%S"))
    time.sleep(max(1, (stop_time - time.time()) // 2))
time.time() returns "seconds since the epoch"; unlike string comparison, it works across midnight.
The sleep interval is half of the remaining time or one second (whichever is larger).
time.mktime() may return a wrong result if the stop time falls during an end-of-DST transition ("fall back"), when the local time is ambiguous (the string-based solution may stop twice in this case).
Try this:
import time
while True:
    t = time.strftime("%H%M%S")
    print(time.strftime("%H%M%S"))
    time.sleep(1)
    if t == ("140000"): #just example of time to stop print
        break
I am trying to understand how to get children to write to a parent's variables. Maybe I'm doing something wrong here, but I would have imagined that multiprocessing would have taken a fraction of the time that it is actually taking:
import multiprocessing, time
def h(x):
    h.q.put('Doing: ' + str(x))
    return x
def f_init(q):
    h.q = q
def main():
    q = multiprocessing.Queue()
    p = multiprocessing.Pool(None, f_init, [q])
    results = p.imap(h, range(1,5))
    p.close()
    for i in range(len(range(1,5))):
        print results.next() # prints 1, 2, 3, 4
if __name__ == '__main__':
    start = time.time()
    main()
    print "Multiprocessed: %s seconds" % (time.time()-start)
    start = time.time()
    for i in range(1,5):
        print i
    print "Normal: %s seconds" % (time.time()-start)
-----Results-----:
1
2
3
4
Multiprocessed: 0.0695610046387 seconds
1
2
3
4
Normal: 2.78949737549e-05 seconds # much shorter
@Blender basically already answered your question, but as a comment. There is some overhead associated with the multiprocessing machinery, so if you incur the overhead without doing any significant work, it will be slower.
Try actually doing some work that parallelizes well. For example, write Python code to open a file, scan it using a regular expression, and pull out matching lines; then make a list of ten big files and time how long it takes to do all ten with multiprocessing vs. plain Python. Or write code to compute an expensive function and try that.
I have used multiprocessing.Pool() just to run a bunch of instances of an external program. I used subprocess to run an audio encoder, and it ran four instances of the encoder at once for a noticeable speedup.
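A rough sketch of such a comparison, assuming a made-up CPU-bound task. The names cpu_heavy, timed and run_parallel, and the workload size, are hypothetical. The actual timings depend on the machine and core count, so none are shown here.

```python
import multiprocessing as mp
import time


def cpu_heavy(n):
    """Deliberately expensive stand-in task: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(label, func):
    """Run func(), print how long it took, and return its result."""
    start = time.perf_counter()
    result = func()
    print("{}: {:.3f} seconds".format(label, time.perf_counter() - start))
    return result


def run_parallel(jobs):
    """Map cpu_heavy over jobs in a pool; pool start-up is part of the timing."""
    with mp.Pool() as pool:
        return pool.map(cpu_heavy, jobs)


if __name__ == "__main__":
    jobs = [2_000_000] * 8  # hypothetical workload; tune for your machine
    serial = timed("Normal", lambda: [cpu_heavy(n) for n in jobs])
    parallel = timed("Multiprocessed", lambda: run_parallel(jobs))
    assert serial == parallel
```

Shrinking the job size toward the four trivial calls in the question should make the pool version lose again, since process start-up and pickling then dominate the run time.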