Progress check while running a loop with multiprocessing pool.apply_async

I have dug through everywhere but now I am stuck, and I need the help of the community. I am not a programmer and barely use Python inside a VFX program called Houdini.
Using multiprocessing, I am running wedges of tasks in batches using another program called hython.
Each task creates n folders and populates each of them with x files, so every folder ends up with the same total number of files, such as:
/files/wedge_1/file1...file2
/files/wedge_2/file1...file2
The pool decides how many of these tasks it can run per batch. I am trying to implement a progress bar that runs alongside my code and checks the files every so often until the number of files written equals the total number of files required.
Another possible option is that the hython task can already output an Alfred progress report, but since everything runs together I get several copies of the same frame printed in the terminal, which doesn't tell me which loop they are coming from.
Here is the code for your consideration.
# Importing all needed modules
import multiprocessing
from multiprocessing.pool import ThreadPool
import time, timeit
import os

# My variables
hou.hipFile.load("/home/tricecold/pythonTest/HoudiniWedger/HoudiniWedger.hiplc")  # scene file
wedger = hou.parm('/obj/geo1/popnet/source_first_input/seed')  # wedged parameter
cache = hou.node('/out/cacheme')  # runs this node
total_tasks = 10  # wedge amount
max_number_processes = 5  # batch size
FileRange = abs(hou.evalParmTuple('/out/cacheme/f')[0] - hou.evalParmTuple('/out/cacheme/f')[1])
target_dir = os.path.dirname(hou.evalParm('/out/cacheme/sopoutput')) + "/"
totals = FileRange * total_tasks

def cacheHoudini(wedge=total_tasks):  # houdini task definition
    wedger.set(wedge)
    time.sleep(0.1)
    cache.render(verbose=False)

def files(wedge=total_tasks):  # figure out remaining files
    count = 0
    while count < totals:
        num = len([name for name in os.listdir(target_dir)])
        count = count + 1
        print(count)

if __name__ == '__main__':
    pool = multiprocessing.Pool(max_number_processes)  # define pool size
    # do I need to run my progress function files() here?
    for wedge in range(0, total_tasks):
        # or do I add the files() function here? not really sure
        pool.apply_async(cacheHoudini, args=(wedge,))  # run tasks
    pool.close()  # after all tasks are submitted we close the pool
    pool.join()   # and wait until all tasks are done
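One way to get the progress check described above is to keep the AsyncResult handles returned by apply_async and poll the output folder from the main process while the pool works. Below is a minimal sketch of that idea (untested inside Houdini/hython); it reuses cacheHoudini, target_dir, totals, total_tasks and max_number_processes from the code above and, like the files() function, simply counts whatever appears directly in target_dir:

import os
import time
import multiprocessing

if __name__ == '__main__':
    pool = multiprocessing.Pool(max_number_processes)
    results = [pool.apply_async(cacheHoudini, args=(wedge,))
               for wedge in range(total_tasks)]
    pool.close()
    # poll until every wedge task has finished
    while not all(r.ready() for r in results):
        written = len(os.listdir(target_dir))
        print("Progress: {} / {} files".format(written, totals))
        time.sleep(2)  # check every couple of seconds
    pool.join()
    print("Progress: {} / {} files - done".format(len(os.listdir(target_dir)), totals))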

Related

Fastest way to call a function millions of times in Python

I have a function readFiles that I need to call 8.5 million times (essentially stress-testing a logger to ensure the log rotates correctly). I don't care about the output/result of the function, only that I run it N times as quickly as possible.
My current solution is this:
from threading import Thread
import subprocess

def readFile(filename):
    args = ["/usr/bin/ls", filename]
    subprocess.run(args)

def main():
    filename = "test.log"
    threads = set()
    for i in range(8500000):
        thread = Thread(target=readFile, args=(filename,))
        thread.start()
        threads.add(thread)
    # Wait for all the reads to finish
    while len(threads):
        # Avoid changing size of set while iterating
        for thread in threads.copy():
            if not thread.is_alive():
                threads.remove(thread)
readFile has been simplified, but the concept is the same. I need to run readFile 8.5 million times, and I need to wait for all the reads to finish. Based on my mental math, this spawns ~60 threads per second, which means it will take ~40 hours to finish. Ideally, this would finish within 1-8 hours.
Is this possible? Is the number of iterations simply too high for this to be done in a reasonable span of time?
Oddly enough, when I wrote a test script, I was able to generate a thread about every ~0.0005 seconds, which should equate to ~2000 threads per second, but this is not the case here.
I considered iterating 8500000 / 10 times and spawning a thread which then runs the readFile function 10 times, which should decrease the amount of time by ~90%, but it caused some issues with blocking resources, and I think passing a lock around would be a bit complicated insofar as keeping the function usable by methods that don't incorporate threading.
Any tips?
Based on #blarg's comment, and on multiprocessing scripts I've used before, the following can be considered.
It simply reads the same file repeatedly, based on the size of the list. Here I'm looking at 1M reads.
With 1 core it takes around 50 seconds. With 8 cores it's down to around 22 seconds. This is on a Windows PC, but I use these scripts on Linux EC2 (AWS) instances as well.
Just put this in a Python file and run:
import os
import time
from multiprocessing import Pool

def readfile(fn):
    # open and read the file; the with-statement closes it again
    with open(fn, "r") as f:
        return f.read()

def _multiprocess(mylist, num_proc):
    with Pool(num_proc) as pool:
        r = pool.starmap(readfile, zip(mylist))
        pool.close()
        pool.join()
    return r

if __name__ == "__main__":
    __spec__ = None  # workaround sometimes needed when running from an IDE
    # use the system cpus or change explicitly
    num_proc = os.cpu_count()
    num_proc = 1
    start = time.time()
    # here you'll want 8.5M entries, but first test that it works with a smaller
    # number; multiprocessing has overhead, so with a low number of reads
    # 8 cores can be slower than 1 core until you reach a certain point
    mylist = ["test.txt"] * 1000000
    rs = _multiprocess(mylist, num_proc=num_proc)
    print('total seconds,', time.time() - start)
I think you should reconsider using subprocess here: if you just want to execute the ls command, I think it's better to use os.system, since it will reduce resource consumption.
Also, you should put a small delay with time.sleep() in the loop that waits for the threads to finish, to reduce resource consumption.
from threading import Thread
import os
import time

def readFile(filename):
    os.system("/usr/bin/ls " + filename)

def main():
    filename = "test.log"
    threads = set()
    for i in range(8500000):
        thread = Thread(target=readFile, args=(filename,))
        thread.start()
        threads.add(thread)
    # Wait for all the reads to finish
    while len(threads):
        time.sleep(0.1)  # put this delay to reduce resource consumption while waiting
        # Avoid changing size of set while iterating
        for thread in threads.copy():
            if not thread.is_alive():
                threads.remove(thread)

Process a lot of data without waiting for a chunk to finish

I am confused with map, imap, apply_async, apply, Process etc from the multiprocessing python package.
What I would like to do:
I have 100 simulation script files that need to be run through a simulation program. I would like python to run as many as it can in parallel, then as soon as one is finished, grab a new script and run that one. I don't want any waiting.
Here is a demo code:
import multiprocessing as mp
import time

def run_sim(x):
    # run
    print("Running Sim: ", x)
    # artificially wait 5s
    time.sleep(5)
    return x

def main():
    # x => my simulation files
    x = list(range(100))
    # run parallel processes
    pool = mp.Pool(mp.cpu_count() - 1)
    # get results
    result = pool.map(run_sim, x)
    print("Results: ", result)

if __name__ == "__main__":
    main()
However, I don't think that map is the correct way here since I want the PC not to wait for the batch to be finished but immediately proceed to the next simulation file.
The code will run mp.cpu_count()-1 simulations at the same time and then wait for every one of them to finish before starting a new batch of size mp.cpu_count()-1. I don't want the code to wait; it should just grab a new simulation file as soon as possible.
Do you have any advice on how to code it better?
Some clarifications:
I am reducing the pool to one less than the CPU count because I don't want to block the PC.
I still need to do light work while the code is running.
It works correctly using map. The trouble is simply that you make every task sleep for the same 5 seconds, so they all finish at the same time.
Try this code to see the effect correctly:
import multiprocessing as mp
import time
import random

def run_sim(x):
    # run
    t = random.randint(3, 10)
    print("Running Sim: ", x, " - sleep ", t)
    time.sleep(t)
    return x

def main():
    # x => my simulation files
    x = list(range(100))
    # run parallel processes
    pool = mp.Pool(mp.cpu_count() - 1)
    # get results
    result = pool.map(run_sim, x)
    print("Results: ", result)

if __name__ == "__main__":
    main()
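If you also want to see each result the moment its simulation finishes, rather than only after pool.map returns, a minimal sketch using apply_async with a callback could look like the following (this is an illustrative variation, not part of the answer above; report_done is a name introduced here, and the callback runs in the parent process):

import multiprocessing as mp
import time
import random

def run_sim(x):
    # simulate work that takes a varying amount of time
    t = random.randint(3, 10)
    print("Running Sim: ", x, " - sleep ", t)
    time.sleep(t)
    return x

def report_done(x):
    # called in the parent process as soon as a task finishes
    print("Finished Sim: ", x)

def main():
    x = list(range(100))
    pool = mp.Pool(mp.cpu_count() - 1)
    for item in x:
        pool.apply_async(run_sim, args=(item,), callback=report_done)
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()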

How do I adapt my code for multiprocessing

Whenever I use my (other) multiprocessing code it works fine, but in terms of feedback on how far along the script is, for example "Completed 5 / 10 files", I do not know how to adapt my code to return the count. Basically I would like to adapt the code below to allow multiprocessing.
So I use:
from multiprocessing import Pool

file_paths = r"path to file with paths"
count = 0
pool = Pool(16)
pool.map(process_control, file_paths)
pool.close()
pool.join()
Within process_control I have count += 1 and return count at the end of the function.
I guess the equivalent code would be something like:
def process_control(count, file_path):
    # do stuff
    count += 1
    print("Process {} / {} completed".format(count, len(file_paths)))
    return count

file_paths = r"path to file with paths"
count = 0
for path in file_paths:
    count = process_control(count, path)

Something like that. I hope my explanation is clear.
Each subprocess has its own copy of count so all they can do is track the work in that one process. The count won't aggregate for all of the processes. But the parent can do the counting. map waits for all tasks to complete, so that isn't helpful. imap is better, it iterates but it also maintains order so reporting is still delayed. imap_unordered with chunksize 1 is your best option. Each task return value (even if it is None) is returned immediately.
import multiprocessing

def process_control(file_path):
    # do stuff
    pass

file_paths = ["path1", ...]

with multiprocessing.Pool(16) as pool:
    count = 0
    for _ in pool.imap_unordered(process_control, file_paths, chunksize=1):
        count += 1
        print("Process {} / {} completed".format(count, len(file_paths)))
A note on chunksize. There are costs to using a pool - each work item needs to be sent to the subprocess and its value returned. This back-and-forth IPC is relatively expensive, so the pool will "chunk" the work items, meaning that it will send many work items to a given subprocess all in one chunk and the process will only return when the entire chunk of data has been processed through the worker function.
This is great when there are many relatively short work items. But suppose that different work items take different amounts of time to execute. There will be a tall-pole subprocess still working on its chunk even though the others have finished.
More important for your case, the results aren't posted back to the parent until the chunk completes so you don't get real-time reporting of completion.
Set chunksize to 1 and the subprocess will return results immediately for more accurate accounting.
For simple cases, the previous answer by #tedelaney is excellent.
For more complicated cases, multiprocessing.Value provides shared memory:
from multiprocessing import Value

counter = Value('i', 0)

# increment the value
with counter.get_lock():
    counter.value += 1

# get the value; read lock automatically used
processes_done = counter.value
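To tie this to the pool example above, a minimal sketch could pass the shared counter to the workers through the pool's initializer (init_worker is a name introduced here for illustration; process_control and file_paths mirror the earlier snippets):

import multiprocessing
from multiprocessing import Value

counter = None  # set in each worker by init_worker

def init_worker(shared_counter):
    # runs once in every worker process; stores the shared counter globally
    global counter
    counter = shared_counter

def process_control(file_path):
    # do stuff, then record that one more item is done
    with counter.get_lock():
        counter.value += 1
        print("Completed {} files so far".format(counter.value))

if __name__ == "__main__":
    file_paths = ["path1", "path2", "path3"]
    shared_counter = Value('i', 0)
    with multiprocessing.Pool(4, initializer=init_worker,
                              initargs=(shared_counter,)) as pool:
        pool.map(process_control, file_paths)
    print("Total completed:", shared_counter.value)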

Python multiprocessing progress record

I'm running a Python program using the multiprocessing module to take advantage of multiple cores on the CPU.
The program itself works fine, but when it comes to showing a kind of progress percentage it all gets messed up.
In order to try to simulate what happens to me, I've written this little scenario where I've used some random times to try to replicate some tasks that could take different times in the original program.
When you run it, you'll see how the percentages get mixed up.
Is there a proper way to achieve this?
from multiprocessing import Pool, Manager
import random
import time

def PrintPercent(percent):
    time.sleep(random.random())
    print(' (%s %%) Ready' % (percent))

def HeavyProcess(cont):
    total = 20
    cont[0] = cont[0] + 1
    percent = round((float(cont[0]) / float(total)) * 100, 1)
    PrintPercent(percent)

def Main():
    cont = Manager().list(range(1))
    cont[0] = 0
    pool = Pool(processes=2)
    for i in range(20):
        pool.apply_async(HeavyProcess, [cont])
    pool.close()
    pool.join()

Main()
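One common way to keep the percentages consistent is to do the counting and printing in a single process, for example by iterating pool.imap_unordered in the parent. A minimal sketch along those lines, mirroring the demo above (not the original code, just an illustration of the idea):

from multiprocessing import Pool
import random
import time

def HeavyProcess(i):
    # simulate work that takes a varying amount of time
    time.sleep(random.random())
    return i

def Main():
    total = 20
    pool = Pool(processes=2)
    done = 0
    # results arrive as tasks finish; only the parent prints
    for _ in pool.imap_unordered(HeavyProcess, range(total)):
        done += 1
        percent = round((float(done) / float(total)) * 100, 1)
        print(' (%s %%) Ready' % (percent))
    pool.close()
    pool.join()

if __name__ == '__main__':
    Main()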

Multiprocessing in python

I am writing a Python script (in Python 2.7) wherein I need to generate around 500,000 uniform random numbers within a range. I need to do this 4 times, perform some calculations on them and write out the 4 files.
At the moment I am doing: (this is just part of my for loop, not the entire code)
random_RA = []
for i in xrange(500000):
    random_RA.append(np.random.uniform(6.061, 6.505))  # FINAL RANDOM RA

random_dec = []
for i in xrange(500000):
    random_dec.append(np.random.uniform(min(data_dec_1), max(data_dec_1)))  # FINAL RANDOM 'dec'
to generate the random numbers within the range. I am running Ubuntu 14.04, and when I run the program I also open my system monitor to see how the 8 CPUs I have are working. I seem to notice that when the program is running, only 1 of the 8 CPUs works at 100% efficiency. So the entire program takes me around 45 minutes to complete.
I noticed that it is possible to use all the CPUs to my advantage using the multiprocessing module.
I would like to know if this is enough in my example:
random_RA = []
for i in xrange(500000):
    multiprocessing.Process()
    random_RA.append(np.random.uniform(6.061, 6.505))  # FINAL RANDOM RA

i.e. adding just the line multiprocessing.Process(), would that be enough?
If you use multiprocessing, you should avoid shared state (like your random_RA list) as much as possible.
Instead, try to use a Pool and its map method:
import numpy as np
from multiprocessing import Pool, cpu_count

def generate_random_ra(x):
    return np.random.uniform(6.061, 6.505)

def generate_random_dec(x):
    # data_dec_1 is assumed to be defined elsewhere, as in the question
    return np.random.uniform(min(data_dec_1), max(data_dec_1))

pool = Pool(cpu_count())
random_RA = pool.map(generate_random_ra, xrange(500000))
random_dec = pool.map(generate_random_dec, xrange(500000))
To get you started:
import multiprocessing
import random

def worker(i):
    random.uniform(1, 100000)
    print i, 'done'

if __name__ == "__main__":
    for i in range(4):
        t = multiprocessing.Process(target=worker, args=(i,))
        t.start()
    print 'All the processes have been started.'
You must gate the t = multiprocessing.Process(...) with __name__ == "__main__", as each worker imports this program (module) again to find out what it has to do. If the gating didn't happen, it would spawn more processes ...
Just for completeness, generating 500000 random numbers is not going to take you 45 minutes, so I assume there are some intensive calculations going on here: you may want to look at them closely.
