I have more than 10,000 C files, and I need to pass each one of them to some application, foo.exe, for processing and generating a disassembly file per C file, i.e. at the end of this process I will have 10,000 lst/output files. I am assuming that this process is not I/O-bound (despite the fact that foo.exe writes a new lst file to disk for each C file); is that a correct assumption?
My task
To implement a parallel Python program that gets the job done in minimum time, by utilizing all CPU cores for this task.
My approach
I have implemented this program and it works for me; the pseudocode is listed below:
Iterate over all the C files and push the absolute path of each one into a global list, files_list.
Calculate the number of logical CPU cores (with the psutil Python module); this will be the maximum number of threads to dispatch later. Let's assume it is 8 threads.
Generate a new list, workers_list (a list of lists), which contains the slices of files_list produced by dividing it into 8 parts, e.g. if I have 800 C files then workers_list will look like this: workers_list = [[0-99], [100-199], ..., [700-799]].
Dispatch 8 worker threads, each of which handles a single entry of workers_list. Each thread opens a process (subprocess.call(...)) and calls foo.exe on the current C file.
I am posting the relevant code below.
The relevant Code
import subprocess
import threading
import os

import psutil


class LstGenerator(object):
    def __init__(self):
        self.elfdumpExePath = r"C:\.....\elfdump.exe"  # abs path to the executable
        self.output_dir = r"C:\.....\out"  # abs path to where I want the lst files to be generated
        self.files = []  # assuming that I have all the files in this list (abs path of each .c file)

    def slice(self, files):
        # split the file list into one chunk per logical core
        files_len = len(files)
        j = psutil.cpu_count()
        slice_step = files_len / j  # integer division in Python 2
        workers_list = []
        lhs = 0
        rhs = slice_step
        while j:
            workers_list.append(files[lhs:rhs])
            lhs += slice_step
            rhs += slice_step
            j -= 1
            if j == 1:  # last iteration takes whatever remains
                workers_list.append(files[lhs:files_len])
                break
        for each in workers_list:  # for debug only
            print len(each)
        return workers_list

    def disassemble(self, objectfiles):
        for each_object in objectfiles:
            cmd = "{elfdump} -T {object} -o {lst}".format(
                elfdump=self.elfdumpExePath,
                object=each_object,
                lst=os.path.join(self.output_dir,
                                 os.path.basename(each_object).rstrip('o') + 'lst'))
            subprocess.call(cmd, shell=True)

    def execute(self):
        class FuncThread(threading.Thread):
            def __init__(self, target, *args):
                # hand target/args to Thread itself so its default run() invokes them
                threading.Thread.__init__(self, target=target, args=args)

        workers = []
        for portion in self.slice(self.files):
            workers.append(FuncThread(self.disassemble, portion))

        # dispatch the workers
        for worker in workers:
            worker.start()

        # wait for (join) the previously dispatched workers
        for worker in workers:
            worker.join()


if __name__ == '__main__':
    lst_gen = LstGenerator()
    lst_gen.execute()
My Questions
Can I do this in a more efficient way?
Does Python have a standard library or module that can get the job done and reduce my code/logic complexity? Maybe multiprocessing.Pool?
I am running on Windows, with Python 2.7.
Thanks.
Yes, multiprocessing.Pool can help with this. It also does the work of sharding the list of inputs across the CPUs. Here is Python code (untested) that should get you on your way.
import multiprocessing
import os
import subprocess


def convert(objectfile):
    elfdumpExePath = r"C:\.....\elfdump.exe"
    output_dir = r"C:\.....\out"
    cmd = "{elfdump} -T {obj} -o {lst}".format(
        elfdump=elfdumpExePath,
        obj=objectfile,
        lst=os.path.join(output_dir, os.path.basename(objectfile).rstrip('o') + 'lst'))
    return subprocess.call(cmd, shell=True)  # run elfdump and report its exit code


if __name__ == "__main__":  # required on Windows so the pool's children don't re-run this block
    files = ["foo.c", "foo1.c", "foo2.c"]
    p = multiprocessing.Pool()
    outputs = p.map(convert, files)
Keep in mind that your worker function (convert above) must accept exactly one argument. So if you need to pass in both an input path and an output path, they must be combined into a single argument, and your list of filenames will have to be transformed into a list of pairs, where each pair is (input, output), as in the sketch below.
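For example, a minimal sketch of that transformation (the names convert_pair, inputs, outputs, in_path and out_path are made up for illustration; convert_pair plays the role of convert above):

import multiprocessing
import os
import subprocess


def convert_pair(pair):
    # Pool.map still passes a single argument: here it is an (input, output) tuple
    in_path, out_path = pair
    cmd = "{elfdump} -T {obj} -o {lst}".format(
        elfdump=r"C:\.....\elfdump.exe", obj=in_path, lst=out_path)
    return subprocess.call(cmd, shell=True)


if __name__ == "__main__":
    inputs = ["foo.c", "foo1.c", "foo2.c"]
    outputs = ["foo.lst", "foo1.lst", "foo2.lst"]
    pairs = zip(inputs, outputs)  # list of (input, output) tuples in Python 2
    results = multiprocessing.Pool().map(convert_pair, pairs)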
The code above is for Python 2.7, but keep in mind that Python 2 has reached its end of life. In Python 3, you can use multiprocessing.Pool in a with statement so that it cleans up after itself, as sketched below.
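For reference, a minimal Python 3 sketch of the same call (reusing the convert function from above); the with block terminates the pool automatically when it exits:

import multiprocessing

if __name__ == "__main__":
    files = ["foo.c", "foo1.c", "foo2.c"]
    with multiprocessing.Pool() as pool:  # the pool is cleaned up when the block exits
        outputs = pool.map(convert, files)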
Posting an answer to my own question after struggling with this for a while, and noticing that I can import concurrent.futures in Python 2.x (via the futures backport package on PyPI). This approach reduces the code complexity to a minimum and even improves the execution time. Unlike my first thoughts, this process is more I/O-bound than CPU-bound! Yet, the time efficiency I got was convenient enough to run the program with multiple processes.
concurrent.futures
The concurrent.futures module provides a high-level interface for asynchronously executing callables.
The asynchronous execution can be performed with threads, using
ThreadPoolExecutor, or separate processes, using ProcessPoolExecutor.
Both implement the same interface, which is defined by the abstract
Executor class.
class concurrent.futures.Executor
An abstract class that provides
methods to execute calls asynchronously. It should not be used
directly, but through its concrete subclasses.
submit(fn, *args, **kwargs)
Schedules the callable, fn, to be executed as fn(*args, **kwargs) and
returns a Future object representing the execution of the callable.
For further reading, please follow the link below:
parallel tasks with concurrent.futures
import subprocess
import os
import concurrent.futures


class LstGenerator(object):
    def __init__(self):
        self.elfdumpExePath = r"C:\.....\elfdump.exe"  # abs path to the executable
        self.output_dir = r"C:\.....\out"  # abs path to where I want the lst files to be generated
        self.files = []  # assuming that I have all the files in this list (abs path of each .c file)

    def disassemble(self, objectfile):
        cmd = "{elfdump} -T {object} -o {lst}".format(
            elfdump=self.elfdumpExePath,
            object=objectfile,
            lst=os.path.join(self.output_dir,
                             os.path.basename(objectfile).rstrip('o') + 'lst'))
        return subprocess.call(cmd, shell=True, stdout=subprocess.PIPE)

    def execute(self):
        with concurrent.futures.ProcessPoolExecutor() as executor:
            # submit the callable and its argument separately; calling it here would run it serially
            results = [executor.submit(self.disassemble, path) for path in self.files]


if __name__ == '__main__':
    lst_gen = LstGenerator()
    lst_gen.execute()
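Since submit() returns a Future for each job, the exit codes can also be collected as the jobs finish. A small, hedged sketch of how execute() above could be extended (same class as above, just iterating over the futures):

    def execute(self):
        with concurrent.futures.ProcessPoolExecutor() as executor:
            futures = [executor.submit(self.disassemble, path) for path in self.files]
            for future in concurrent.futures.as_completed(futures):
                retcode = future.result()  # re-raises any exception from the worker
                if retcode != 0:
                    print("elfdump exited with code %d" % retcode)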
I have some text files that I need to read with Python. The text files contain an array of floats only (i.e. no strings), and the size of the array is 2000-by-2000. I tried to use the multiprocessing package, but for some reason it now runs slower. The times I get on my PC for the code attached below are:
Multi thread: 73.89 secs
Single thread: 60.47 secs
What am I doing wrong here, and is there a way to speed up this task? My PC is powered by an Intel Core i7 processor, and in real life I have several hundred of these text files, 600 or even more.
import numpy as np
from multiprocessing.dummy import Pool as ThreadPool
import os
import time
from datetime import datetime


def read_from_disk(full_path):
    print('%s reading %s' % (datetime.now().strftime('%H:%M:%S'), full_path))
    out = np.genfromtxt(full_path, delimiter=',')
    return out


def make_single_path(n):
    return r"./dump/%d.csv" % n


def save_flatfiles(n):
    for i in range(n):
        temp = np.random.random((2000, 2000))
        _path = os.path.join('.', 'dump', str(i) + '.csv')
        np.savetxt(_path, temp, delimiter=',')


if __name__ == "__main__":
    # make some text files
    n = 10
    save_flatfiles(n)

    # list with the paths to the text files
    file_list = [make_single_path(d) for d in range(n)]

    pool = ThreadPool(8)
    start = time.time()
    results = pool.map(read_from_disk, file_list)
    pool.close()
    pool.join()
    print('finished multi thread in %s' % (time.time() - start))

    start = time.time()
    for d in file_list:
        out = read_from_disk(d)
    print('finished single thread in %s' % (time.time() - start))

    print('Done')
You are using multiprocessing.dummy, which replicates the API of multiprocessing but is actually a wrapper around the threading module.
So basically you are using threads instead of processes, and threads in Python are not useful (due to the GIL) when you want to perform computational tasks.
So replace:
from multiprocessing.dummy import Pool as ThreadPool
With:
from multiprocessing import Pool
I tried running your code on my machine (which has an i5 processor), and it finished execution in 45 seconds, so I would say that's a big improvement.
Hope this clears things up.
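For context, a minimal hedged sketch of the process-based pool section, reusing read_from_disk and file_list from the question's script (those names are assumed from there, not redefined here):

from multiprocessing import Pool
import time

if __name__ == "__main__":
    start = time.time()
    pool = Pool(8)  # 8 worker processes instead of 8 threads
    results = pool.map(read_from_disk, file_list)
    pool.close()
    pool.join()
    print('finished multi process in %s' % (time.time() - start))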
I am trying to run a function concurrently over multiple files in a GUI using tkinter and concurrent.futures.
Outside of the GUI, this script works fine. However, whenever I translate it over into the GUI script, instead of running the function in parallel, the script opens up 5 new tkinter windows (the number of windows it opens is equal to the number of processors I allow the program to use).
I've looked over the code thoroughly and just can't understand why it is opening new windows as opposed to just running the function over the files.
Can anyone see something I am missing?
An abridged version of the code is below. I have cut out a significant part of the code and only left in the parts pertinent to parallelization. This code undoubtedly has variables in it that I have not defined in this example.
import pandas as pd
import numpy as np
import glob
from pathlib import Path
from tkinter import *
from tkinter import filedialog
from concurrent.futures import ProcessPoolExecutor

window = Tk()
window.title('Problem with parralelizing')
window.geometry('1000x700')


def calculate():
    # establish where the files are coming from to operate on
    folder_input = folder_entry_var.get()

    # establish the number of processors to use
    nbproc = int(np_var.get())

    # loop over files to get a list of files to be worked on by concurrent.futures
    files = []
    for file in glob.glob(rf'{folder_input}' + '//*'):
        files.append(file)

    # this function gets passed to concurrent.futures. I have taken out a significant portion of the
    # function itself, as I do not believe the problem resides in the function itself.
    def process_file(filepath):
        excel_input = excel_entry_var.get()
        minxv = float(min_x_var.get())
        maxxv = float(man_x_var.get())
        output_dir = odir_var.get()

        path = filepath
        event_name = Path(path).stem
        event['event_name'] = event_name

        min_x = 292400
        max_x = 477400

        list_of_objects = list(event.object.unique())
        missing_master_surface = []
        for line in list_of_objects:
            df = event.loc[event.object == line]
            current_y = df.y.max()
            y_cl = df.x.values.tolist()
            full_ys = np.arange(min_x, max_x + 200, 200).tolist()
            for i in full_ys:
                missing_values = []
                missing_v_y = []
                exist_yn = []
                event_name_list = []
                if i in y_cl:
                    next
                elif i not in y_cl:
                    missing_values.append(i)
                    missing_v_y.append(current_y)
                    exist_yn.append(0)
                    event_name_list.append(event_name)

    # feed the function to ProcessPoolExecutor to run. At this point, I hear the processors
    # spin up, but all it does is open 5 new tkinter windows (the number of windows is proportional
    # to the number of processors I give it to run)
    if __name__ == '__main__':
        with ProcessPoolExecutor(max_workers=nbproc) as executor:
            executor.map(process_file, files)


window.mainloop()
I've looked over the code thoroughly and just can't understand why it is opening new windows as opposed to just running the function over the files.
Each worker process has to re-import your script; that is how multiprocessing starts new processes on Windows. At the very top of your script, at module level, you call window = Tk(), so every worker runs that line again. That is why you get one window per process.
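A hedged sketch of one way to restructure the abridged code: keep the worker at module level (so the executor can import and pickle it) and create the Tk window only under the __main__ guard, so a re-imported worker never builds a GUI. The *_var widgets and the per-file computation are assumed to exist as in the question and are not redefined here:

import glob
from tkinter import Tk
from concurrent.futures import ProcessPoolExecutor


def process_file(args):
    # module-level worker: receives plain values only and never touches tkinter
    filepath, excel_input, min_x, max_x, output_dir = args
    # ... the per-file computation from the question goes here ...
    return filepath


def calculate():
    folder_input = folder_entry_var.get()
    files = glob.glob(folder_input + '/*')
    # read every tkinter variable here, in the main process, and pass plain values to the workers
    job_args = [(f, excel_entry_var.get(), float(min_x_var.get()),
                 float(man_x_var.get()), odir_var.get()) for f in files]
    with ProcessPoolExecutor(max_workers=int(np_var.get())) as executor:
        results = list(executor.map(process_file, job_args))


if __name__ == '__main__':
    window = Tk()  # created only in the main process
    window.title('Problem with parralelizing')
    window.geometry('1000x700')
    # ... build the entry widgets (folder_entry_var, np_var, etc.) and a button wired to calculate ...
    window.mainloop()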
I tried the following Python programs, both sequential and parallel versions, on a cluster computing facility. I could clearly see (using the top command) more processes starting for the parallel program. But when I time it, the parallel version seems to take more time. What could be the reason? I am attaching the code and the timing info below.
# parallel.py
from multiprocessing import Pool
import numpy


def sqrt(x):
    return numpy.sqrt(x)


pool = Pool()
results = pool.map(sqrt, range(100000), chunksize=10)

# seq.py
import numpy


def sqrt(x):
    return numpy.sqrt(x)


results = [sqrt(x) for x in range(100000)]
user#domain$ time python parallel.py > parallel.txt
real 0m1.323s
user 0m2.238s
sys 0m0.243s
user#domain$ time python seq.py > seq.txt
real 0m0.348s
user 0m0.324s
sys 0m0.024s
The amount of work per task is by far too little to compensate for the work-distribution overhead. First, you should increase the chunksize, but even then a single square-root operation is too short to make up for the cost of sending the data back and forth between processes. You can see an effective speedup with something like this:
def sqrt(x):
    for _ in range(100):
        x = numpy.sqrt(x)
    return x


results = pool.map(sqrt, range(10000), chunksize=100)
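For reference, a self-contained version of that comparison is sketched below (timings are machine-dependent and not guaranteed); the heavier per-item work and the larger chunksize are what give the pool a chance to win:

from multiprocessing import Pool
import numpy
import time


def sqrt(x):
    # artificially heavier task: 100 chained square roots per input value
    for _ in range(100):
        x = numpy.sqrt(x)
    return x


if __name__ == '__main__':
    start = time.time()
    pool = Pool()
    par_results = pool.map(sqrt, range(10000), chunksize=100)
    pool.close()
    pool.join()
    print('parallel:   %.3f s' % (time.time() - start))

    start = time.time()
    seq_results = [sqrt(x) for x in range(10000)]
    print('sequential: %.3f s' % (time.time() - start))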
I am using multiprocessing.Pool to create a pool of workers that load files into pandas. It hangs frequently, probably about 75% of the time.
import pandas as pd
import multiprocessing as mp


def load(filename):
    thing = pd.read_table(filename)
    return thing


files = ['a', 'b', 'c']  # A list of a bunch of files

with mp.Pool(5) as pool:
    result = pool.map(load, files)
By "hangs" I mean never finishes. I've straced the procs and they are waiting on futuexes, so I have no idea what that means. Am I invoking the pool correctly?
Again, it works perfectly 25% of the time, so I must be doing something right... thx!
Ubuntu Xenial Python3.5.2