I have written a very basic piece of code to test multiprocessing in Python.
When I try to run the code on my Windows machine it does not run, while it works fine on a Linux machine.
Below is the code and the error that it throws.
from multiprocessing import Process
import os
import time

# creating a list to store all the processes that will be created
processes = []

# get the count of CPU cores
cores = os.cpu_count()

def square(n):  # just a throwaway function for demonstration
    for i in range(n):
        print(i)
        i * i
        time.sleep(0.1)

# create one process per core
for i in range(cores):
    p = Process(target=square, args=(100,))
    processes.append(p)

# start all the processes
for proc in processes:
    proc.start()

# join all the processes
for proc in processes:
    proc.join()

print("All processes are done")
When Python starts a new Process (not a Thread), the child interpreter imports your script as a module. This means any module-level code is executed again in every child process. (This only applies to the spawn start method, which is the default on Windows; Linux has historically defaulted to fork, which is why the same script runs there without problems.)
To avoid this, move the process-creating code under an if __name__ == "__main__": guard, so that the children can import the module without spawning new processes themselves.
You fix it like so:
from multiprocessing import Process
import os
import time

def square(n):  # just a throwaway function for demonstration
    for i in range(n):
        print(i)
        i * i
        time.sleep(0.1)

if __name__ == "__main__":
    # get the count of CPU cores
    cores = os.cpu_count()

    # creating a list to store all the processes that will be created
    processes = []

    # create one process per core
    for i in range(cores):
        p = Process(target=square, args=(100,))
        processes.append(p)

    # start all the processes
    for proc in processes:
        proc.start()

    # join all the processes
    for proc in processes:
        proc.join()

    print("All processes are done")
Result:
1
0
0
1
1
0
2
0
0
1
1
1
2
... Truncated
All processes are done
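As a side note, the re-import behaviour is easy to see for yourself. The following is just a minimal sketch of my own (not part of the original post): run it on Windows, or after calling multiprocessing.set_start_method("spawn"), and the module-level message appears once in the parent and once per child; under fork it appears only once.

from multiprocessing import Process

print("module-level code running")  # under spawn, this also runs once in every child

def work():
    pass

if __name__ == "__main__":
    procs = [Process(target=work) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()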
Related
On several occasions I have a list of tasks that need to be executed via Python. Typically these tasks take a few seconds each, but there are hundreds of thousands of them, and threading significantly improves execution time. Is there a way to dynamically specify the number of threads a Python script should use to work through a stack of tasks?
I have had success running threads when they are created in the body of the script, but I have never been able to run them correctly from within a function (I assume this is because of scoping). Below is my approach to dynamically defining a list of threads that should be used to execute several tasks.
The problem is that this approach waits for a single thread to complete before continuing through the for loop.
import threading
import sys
import time

def null_thread():
    """ used to instantiate threads """
    pass

def instantiate_threads(number_of_threads):
    """ returns a list containing the number of threads specified """
    threads_str = []
    threads = []
    index = 0
    while index < number_of_threads:
        exec("threads_str.append(f't{index}')")
        index += 1
    for t in threads_str:
        t = threading.Thread(target = null_thread())
        t.start()
        threads.append(t)
    return threads

def sample_task():
    """ dummy task """
    print("task start")
    time.sleep(10)

def main():
    number_of_threads = int(sys.argv[1])
    threads = instantiate_threads(number_of_threads)

    # a routine that assigns work to the array of threads
    index = 0
    while index < 100:
        task_assigned = False
        while not task_assigned:
            for thread in threads:
                if not thread.is_alive():
                    thread = threading.Thread(target = sample_task())
                    thread.start()
                    # script seems to wait until thread is complete before moving on...
                    print(f'index: {index}')
                    task_assigned = True
                    index += 1

    # wait for threads to finish before terminating
    for thread in threads:
        while thread.is_alive():
            pass

if __name__ == '__main__':
    main()
Solved:
You could convert to using concurrent.futures.ThreadPoolExecutor, where you can set the number of threads to spawn using max_workers=<number of threads>. – user56700
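For reference, a minimal sketch of that approach could look like the following; the task function and the task count of 100 are stand-ins of my own, and the thread count is taken from the command line as in the question.

import sys
import time
from concurrent.futures import ThreadPoolExecutor

def sample_task(index):
    """ dummy task standing in for the real work """
    print(f"task {index} start")
    time.sleep(1)
    return index

if __name__ == '__main__':
    number_of_threads = int(sys.argv[1])  # thread count passed on the command line
    with ThreadPoolExecutor(max_workers=number_of_threads) as executor:
        # submit 100 tasks; the pool keeps at most number_of_threads running at once
        results = list(executor.map(sample_task, range(100)))
    print("all tasks finished")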
While trying to make my script multi-threaded, I found out about multiprocessing, and I wonder if there is a way to make multiprocessing work together with threading? Something like:
cpu 1 -> 3 threads (worker A, B, C)
cpu 2 -> 3 threads (worker D, E, F)
...
I'm trying to do it myself but I've hit a lot of problems. Is there a way to make the two work together?
You can generate a number of Processes, and then spawn Threads from inside them. Each Process can handle almost anything the standard interpreter thread can handle, so there's nothing stopping you from creating new Threads or even new Processes within each Process. As a minimal example:
import multiprocessing
import threading

def foo():
    print("Thread Executing!")

def bar():
    threads = []
    for _ in range(3):  # each Process creates a number of new Threads
        thread = threading.Thread(target=foo)
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()

if __name__ == "__main__":
    processes = []
    for _ in range(3):
        p = multiprocessing.Process(target=bar)  # create a new Process
        p.start()
        processes.append(p)
    for process in processes:
        process.join()
Communication between threads can be handled within each Process, and communication between the Processes can be handled at the root interpreter level using Queues or Manager objects.
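A minimal sketch of that, assuming a multiprocessing.Queue created in the parent and passed to each Process (the function names and the counts here are placeholders of my own):

import multiprocessing
import threading

def thread_work(q, pid, tid):
    # hypothetical worker: each thread pushes one result onto the shared queue
    q.put((pid, tid))

def proc_work(q, pid):
    threads = [threading.Thread(target=thread_work, args=(q, pid, t)) for t in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    queue = multiprocessing.Queue()  # shared between the parent and all child processes
    procs = [multiprocessing.Process(target=proc_work, args=(queue, p)) for p in range(2)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in range(2 * 3)]  # one result per thread, drained before join
    for p in procs:
        p.join()
    print(results)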
You can define a function that takes a process and makes it run 3 threads, and then spawn your processes to target this function, for example:
def threader(process):
    for _ in range(3):
        threading.Thread(target=yourfunc).start()

def main():
    # spawn whatever processes here to target threader
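The answer leaves main as a stub; one possible completion, assuming yourfunc stands for the per-thread work and that three processes are wanted (both are my own choices, not from the original), might look like:

import multiprocessing
import threading

def yourfunc():
    # placeholder for the per-thread work
    print("thread running")

def threader(process):
    for _ in range(3):
        threading.Thread(target=yourfunc).start()

def main():
    procs = [multiprocessing.Process(target=threader, args=(i,)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == '__main__':
    main()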
I have a fairly simple program that I am trying to run in parallel. The program works when I run it with 4 processes. My processor has 4 physical cores and 8 logical cores. When I increase the number of processes from 4 to 5 or more, the program runs with the increased number of processes until it hits .join(), where it hangs, whereas if I keep it at 4 or fewer it never hangs and finishes properly. Are there considerations I haven't thought of when spawning more processes than there are physical cores in the machine? Here's some example code of what I'm doing to create and run the processes.
import multiprocessing as mp

def f(list1, list2):
    #do something

if __name__ == '__main__':
    list1 = list()
    list2 = list()
    procs = list()

    num_cores = 4
    for i in xrange(0, num_cores):
        p = mp.Process(target=f, args=(list1, list2,))
        procs.append(p)

    for p in procs:
        p.start()

    for p in procs:
        p.join()  # hangs here with num_cores over 4
I have a simple implementation using Python's multiprocessing module:
if __name__ == '__main__':
    jobs = []
    while True:
        for i in range(40):
            # fetch one by one from redis queue
            #item = item from redis queue
            p = Process(name='worker '+str(i), target=worker, args=(item,))
            # if p is not running, start p
            if not p.is_alive():
                jobs.append(p)
                p.start()
        for j in jobs:
            j.join()
            jobs.remove(j)

def worker(url_data):
    """worker function"""
    print url_data['link']
What I expect this code to do:
run in an infinite loop, waiting on the Redis queue.
if the Redis queue is not empty, fetch an item.
create 40 multiprocessing.Process objects, not more, not less.
if a process has finished its work, start a new one, so that ~40 processes are running at all times.
I read that to avoid zombie processes they should be joined back to the parent, which is what I expected to achieve with the second loop. But the issue is that on launch it spawns 40 processes, the workers finish their work and sit in the zombie state until all of the currently spawned processes have finished, and only then does the next iteration of "while True" repeat the same pattern.
So my question is:
How can I avoid zombie processes and spawn a new process as soon as one of the 40 has finished?
For a task like the one you described it is usually better to use a different approach based on Pool.
You can have the main process fetch the data and let the workers deal with it.
Here is an example of Pool adapted from the Python docs:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)            # start 4 worker processes
    result = pool.apply_async(f, [10])  # evaluate "f(10)" asynchronously
    print(result.get(timeout=1))        # prints "100" unless your computer is *very* slow
    print(pool.map(f, range(10)))       # prints "[0, 1, 4, ..., 81]"
I also suggest using imap instead of map, as it seems your task can be processed asynchronously.
Roughly your code will be:
p = Pool(40)
while True:
    items = items from redis queue
    p.imap_unordered(worker, items)  # the unordered version is faster

def worker(url_data):
    """worker function"""
    print(url_data['link'])
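A more complete, runnable version of that sketch might look like the following; fetch_batch is a stand-in of my own for reading from the Redis queue, and the pool size of 40 mirrors the question.

from multiprocessing import Pool
import time

def fetch_batch():
    # stand-in for fetching a batch of items from the redis queue
    return [{'link': f'http://example.com/{i}'} for i in range(10)]

def worker(url_data):
    """worker function"""
    print(url_data['link'])

if __name__ == '__main__':
    with Pool(40) as p:
        while True:
            items = fetch_batch()
            # iterate over the results so the lazy imap_unordered iterator is actually consumed
            for _ in p.imap_unordered(worker, items):
                pass
            time.sleep(1)  # avoid a busy loop when the queue is empty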
I am trying to run two things in parallel with multiprocessing. I have this code:
from multiprocessing import Process

def secondProcess():
    x = 0
    while True:
        x += 1

if __name__ == '__main__':
    p = Process(target=secondProcess())
    p.start()
    print "blah"
    p.join()
What seems to happen is that the second process starts running, but the parent process does not proceed; it just hangs until the second process finishes (so in this case, never). So "blah" is never printed.
How can I make it run both in parallel?
You don't want to call secondProcess. You want to pass it as a parameter.
p = Process(target=secondProcess)
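For completeness, the corrected script would then look like this (shown here in Python 3 print syntax; otherwise unchanged apart from removing the parentheses):

from multiprocessing import Process

def secondProcess():
    x = 0
    while True:
        x += 1

if __name__ == '__main__':
    p = Process(target=secondProcess)  # pass the function itself, do not call it
    p.start()
    print("blah")  # now prints immediately while secondProcess runs in parallel
    p.join()       # note: this still blocks forever, since secondProcess never returns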