I'm writing my first multiprocessing program in Python.
I want to create a list of values to be processed, and 8 processes (the number of CPU cores) will consume and process the list of values.
I wrote the following Python code:
__author__ = 'Rui Martins'

from multiprocessing import cpu_count, Process, Lock, Value

def proc(lock, number_of_active_processes, valor):
    lock.acquire()
    number_of_active_processes.value += 1
    print "Active processes:", number_of_active_processes.value
    lock.release()
    # DO SOMETHING ...
    for i in range(1, 100):
        valor = valor**2
    # (...)
    lock.acquire()
    number_of_active_processes.value -= 1
    lock.release()

if __name__ == '__main__':
    proc_number = cpu_count()
    number_of_active_processes = Value('i', 0)
    lock = Lock()

    values = [11, 24, 13, 40, 15, 26, 27, 8, 19, 10, 11, 12, 13]
    values_processed = 0

    processes = []
    for i in range(proc_number):
        processes += [Process()]

    while values_processed < len(values):
        while number_of_active_processes.value < proc_number and values_processed < len(values):
            for i in range(proc_number):
                if not processes[i].is_alive() and values_processed < len(values):
                    processes[i] = Process(target=proc, args=(lock, number_of_active_processes, values[values_processed]))
                    values_processed += 1
                    processes[i].start()
        while number_of_active_processes.value == proc_number:
            # BUG: always number_of_active_processes.value == 8 :(
            print "Active processes:", number_of_active_processes.value

    print ""
    print "Active processes at END:", number_of_active_processes.value
And I have the following problems:
The program never stops.
I run out of RAM.
Simplifying your code to the following:
def proc(lock, number_of_active_processes, valor):
    lock.acquire()
    number_of_active_processes.value += 1
    print("Active processes:", number_of_active_processes.value)
    lock.release()
    # DO SOMETHING ...
    for i in range(1, 100):
        print(valor)
        valor = valor ** 2
    # (...)
    lock.acquire()
    number_of_active_processes.value -= 1
    lock.release()

if __name__ == '__main__':
    proc_number = cpu_count()
    number_of_active_processes = Value('i', 0)
    lock = Lock()
    values = [11, 24, 13, 40, 15, 26, 27, 8, 19, 10, 11, 12, 13]
    values_processed = 0
    processes = [Process() for _ in range(proc_number)]
    while values_processed < len(values)-1:
        for p in processes:
            if not p.is_alive():
                p = Process(target=proc,
                            args=(lock, number_of_active_processes, values[values_processed]))
                values_processed += 1
                p.start()
If you run it like the above, with the print(valor) added, you can see exactly what is happening: you are exponentially growing valor to the point where you run out of memory. You don't get stuck in the while loop; you get stuck in the for loop.
This is the output at the 12th process after adding a print(len(str(valor))); this is only a fraction of a second in, and it just keeps on going:
2
3
6
11
21
.........
59185
70726
68249
73004
77077
83805
93806
92732
90454
104993
118370
136498
131073
Just changing your loop to the following:
for i in range(1, 100):
    print(valor)
    valor = valor * 2
The last number created is:
6021340351084089657109340225536
Using your own code you seem to get stuck in the while loop, but it is valor growing in the for loop to numbers with as many digits as:
167609
180908
185464
187612
209986
236740
209986
And on....
The problem is not your multiprocessing code. It's the pow operation in the for loop:

for i in range(1, 100):
    valor = valor**2

That loop squares valor 99 times, so the final result would be pow(valor, 2**99). This is far too big: calculating it would cost too much time and memory, so eventually you get an out-of-memory error.

For scale, 4 GB = pow(2, 2) * pow(2, 30) bytes * 8 bits/byte = 2**35 bits.

And for your smallest number, 8:

pow(8, 2**99) = pow(2**3, 2**99) = pow(2, 3*pow(2, 99))

which needs about 3*pow(2, 99) bits to store, and 3*pow(2, 99) bits / 2**35 bits = 3*pow(2, 64).

It would need 3*pow(2, 64) times 4 GB of memory.
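As an aside, for the original goal (a fixed pool of workers consuming a list of values), multiprocessing.Pool handles the process bookkeeping for you. A minimal sketch (assuming Python 3; the work function here is a bounded stand-in for the real computation):

from multiprocessing import Pool, cpu_count

def work(valor):
    # stand-in for the real per-item work; bounded, so it cannot blow up
    return valor * 2

if __name__ == '__main__':
    values = [11, 24, 13, 40, 15, 26, 27, 8, 19, 10, 11, 12, 13]
    with Pool(cpu_count()) as pool:  # one worker per CPU core
        results = pool.map(work, values)
    print(results)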
I'm a bit stuck. I'm trying to pass thread names given by the system to my function so that I can print the start and end time of the current thread working in the function, and I'm using the global variables name and name3 for that. The user has to input a number in the given interval. The thread names work fine when I input 1001, but if I input numbers like 1200 or 10001 the names no longer match. I put examples of the output below: output 1 is not what I'm looking for, output 2 is what I need. I'm not sure what is causing the name change. If any additional information is needed, I'm happy to provide it.
import os
from posixpath import abspath
import time
import sys
import signal
import threading
import platform
import subprocess
from pathlib import Path
import math

lokot = threading.Lock()
lista = []
name = 0
name3 = 0

def divisor(start, end):
    lokot.acquire()
    start = time.time()
    print('{} started working at {}'.format(name, start))
    for i in range(int(start), int(end)+1):
        if int(end) % i == 0:
            lista.append(i)
    end = time.time()
    print('{} ended working at {}'.format(name, end))
    lokot.release()

def new_lista():
    lokot.acquire()
    start = time.time()
    nlista = []
    for i in lista:
        if i % 2 == 0:
            nlista.append(i)
    print(nlista)
    print('{} was executed in time frame {}'.format(name3, time.time()-start))
    lokot.release()

def f4():
    while True:
        print('Input a non negative number in given range <1000,200000>')
        number = input()
        if number.isalpha() or not number or int(number) not in range(1000,200000):
            print('Number entered is not in the interval <1000,200000>')
            continue
        else:
            global name
            global name3
            x = int(number) / 2
            t1 = threading.Thread(target=divisor, args=(1, x))
            t2 = threading.Thread(target=divisor, args=(1, number))
            t3 = threading.Thread(target=new_lista)
            name = t1.name
            t1.start()
            name = t2.name
            t2.start()
            name3 = t3.name
            t3.start()
            t1.join()
            t2.join()
            t3.join()
            break

f4()
Input 1:
100001
Output 1:
Thread-1 started working at 1624538800.4813018
Thread-2 ended working at 1624538800.4887686
Thread-2 started working at 1624538800.4892647
Thread-2 ended working at 1624538800.5076165
[2, 4, 8, 10, 16, 20, 40, 50, 80, 100, 200, 250, 400, 500, 1000, 1250, 2000, 2500, 5000, 6250, 10000, 12500, 25000, 50000]
Thread-3 was executed in time frame 0.0
Input 2:
1001
Output 2:
Thread-1 started working at 1624538882.90607
Thread-1 ended working at 1624538882.9070616
Thread-2 started working at 1624538882.9074266
Thread-2 ended working at 1624538882.9089162
[2, 4, 8, 10, 20, 40, 50, 100, 200, 250, 500, 1000, 1250, 2500, 5000]
Thread-3 was executed in time frame 0.0
This won't necessarily work:
name = t1.name
t1.start()
name = t2.name
Nothing prevents the second assignment from happening before the t1 thread accesses the name variable.
Q: Why don't you just assign names when you create the threads instead of letting the threading library assign them? E.g.:
t1 = threading.Thread(target=divisor, args=(1, x), name="t1")
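Alternatively, avoid the shared global entirely and read the name inside the worker with threading.current_thread(). A minimal sketch (the divisor body is elided):

import threading
import time

def divisor(start, end):
    me = threading.current_thread().name  # no global, so no race
    print('{} started working at {}'.format(me, time.time()))
    # ... do the work ...
    print('{} ended working at {}'.format(me, time.time()))

t1 = threading.Thread(target=divisor, args=(1, 500), name="t1")
t2 = threading.Thread(target=divisor, args=(1, 1000), name="t2")
t1.start(); t2.start()
t1.join(); t2.join()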
I have a bunch of matrix multiplication operations that are performed only row-wise. I was wondering how to speed up the computation by parallelization:
data = np.random.randint(1, 100, (100000, 800))
indices_1 = np.equal(data, 1)
A = np.zeros((100000, 100))
B = np.random.randn(800, 100)

for i in range(100000):
    ones = indices_1[i]
    not_ones = ~indices_1[i]
    B_ones = B[ones]
    B_not_ones = B[not_ones]
    A[i] = (data[i][not_ones] @ B_not_ones) @ np.linalg.inv(B_not_ones.T @ B_not_ones)
    data[i][ones] = A[i] @ B_ones.T
I tried multiprocessing, but for some reason it did not perform better than the sequential version. Here is my multiprocessing implementation:
from multiprocessing.pool import ThreadPool, Pool
pool = ThreadPool()  # can also use Pool

def f(i):
    ones = indices_1[i]
    not_ones = ~indices_1[i]
    B_ones = B[ones]
    B_not_ones = B[not_ones]
    A[i] = (data[i][not_ones] @ B_not_ones) @ np.linalg.inv(B_not_ones.T @ B_not_ones)
    data[i][ones] = A[i] @ B_ones.T

pool.map(f, range(100000))
Both yielded the same running time (around 32 seconds). Other parallelization methods like concurrent.futures did not improve the runtime either (used as below):
with concurrent.futures.ThreadPoolExecutor() as executor:
    result = executor.map(f, range(100000))
I also tried to apply dask but could not make their framework work in my case. Any help will be much appreciated! Thanks!
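Here is one approach: hand data and A to the worker processes as multiprocessing.Manager proxy lists, so the workers can read rows and assign results back: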
import numpy as np
import multiprocessing as mp

data = list(np.random.randint(1, 100, (100000, 800)))
indices_1 = np.equal(data, 1)
A = list(np.zeros((100000, 100)))
B = np.random.randn(800, 100)

def f(data, A, i):
    ones = indices_1[i]
    not_ones = ~indices_1[i]
    B_ones = B[ones]
    B_not_ones = B[not_ones]
    A[i] = (data[i][not_ones] @ B_not_ones) @ np.linalg.inv(B_not_ones.T @ B_not_ones)
    data[i][ones] = A[i] @ B_ones.T

with mp.Manager() as manager:
    # proxy lists are visible to all workers, at the cost of pickling
    # every element access through the manager's server process
    data_global = manager.list(data)
    A_global = manager.list(A)
    with mp.Pool() as p:
        results = [p.apply_async(f, (data_global, A_global, i)) for i in range(100000)]
        for i in results:
            i.wait()
    data_global = list(data_global)
    A_global = list(A_global)
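Be aware that Manager proxies funnel every element access through a server process, which often erases any parallel speedup for NumPy work. A lower-overhead sketch (assuming Python 3.8+) shares the array bytes once via multiprocessing.shared_memory, so workers can attach without copying or pickling:

import numpy as np
from multiprocessing import shared_memory

a = np.random.randint(1, 100, (100000, 800))
shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
buf = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
buf[:] = a  # copy in once

# inside a worker, attach by name instead of receiving the data:
#     existing = shared_memory.SharedMemory(name=shm.name)
#     arr = np.ndarray((100000, 800), dtype=a.dtype, buffer=existing.buf)

shm.close()
shm.unlink()  # free the segment when done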
I've read all the related posts on the subject, but I can't for the life of me get multiprocessing to work properly with shared memory.
I'm using an EC2 instance with 96 cores, but for some reason despite using shared memory, my memory consumption explodes when using a worker pool with 96 workers.
EDIT: Had a bug earlier, which caused not all the cores to be used (stupid bug where I didn't give the right parameters for map) - anyways, clarified my current problem.
Any ideas? Attaching a screenshot of htop on my server to show the CPU usage + memory consumption.
For reference, I used the figtree package from here: https://github.com/ec2604/figtree (commit - 7ba197e45a5c6577fab56d469b4b1ccf02242e3d), a forked repository that ports C-level code to Python. I don't think it should really matter; you can plop any CPU-intensive code in there instead.
!!!!!!EDIT!!!!: In hindsight, the figtree package allocates memory for the result, 5000099958 bytes / (1024**3) ≈ 4.66 GB, per process. Multiplied by 96 processes (~447 GB), this is what causes the insane memory consumption.
import figtree
import numpy as np
import multiprocessing
import ctypes
from multiprocessing import Pool, sharedctypes

n = 50000
m = 9995

X_base = sharedctypes.RawArray(ctypes.c_double, n * 77)
X_shared = np.frombuffer(X_base)
X_shared = X_shared.reshape(n, 77)
X_shared[:] = np.random.normal(0, 1, (n, 77))
del X_shared

Q_base = sharedctypes.RawArray(ctypes.c_double, m ** 2)
Q_shared = np.frombuffer(Q_base)
Q_shared = Q_shared.reshape(m, m)
Q_shared[:] = np.random.normal(0, 1, (m, m))
del Q_shared

def fig_helper_efficient(slice):
    Q_shared = np.frombuffer(Q_base)
    Q_shared = Q_shared.reshape(9995, 9995)
    X_shared = np.frombuffer(X_base)
    X_shared = X_shared.reshape(n, 77)
    if Q_shared.shape[0] == Q_shared.shape[1]:
        res = figtree.figtree(**{'X': X_shared[slice, :], 'Y': X_shared,
                                 'Q': Q_shared[:, slice].copy(), 'epsilon': 1e-12,
                                 'h': 15})
        print("done")
        return res

def divide_batches_equally(num_examples, num_batches):
    div_result = num_examples // num_batches
    mod_result = num_examples % num_batches
    size = np.zeros((num_batches + 1, 1)).astype(np.int32)
    size[1:] = div_result
    if mod_result > 0:
        size[1:mod_result + 1] += 1
    return np.cumsum(size)

def parallel_fig_vert_efficient():
    n_proc = 96
    size = divide_batches_equally(m, n_proc)
    parallel_list = [slice(int(size[i]), int(size[i + 1])) for i in range(n_proc)]
    with Pool(n_proc) as pool:
        res = pool.map(fig_helper_efficient, parallel_list)
    return res

if __name__ == '__main__':
    parallel_fig_vert_efficient()
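Given the EDIT above, one way to keep the result from being allocated once per process is to preallocate a single shared output buffer and let each worker write its slice in place. This is a rough sketch only: the shape and layout of figtree's return value are assumptions here and would need checking against the package.

# Hypothetical in-place variant: one shared result buffer for all workers.
R_base = sharedctypes.RawArray(ctypes.c_double, n * m)

def fig_helper_inplace(slc):
    R_shared = np.frombuffer(R_base).reshape(n, m)  # shared, no per-process copy
    Q_shared = np.frombuffer(Q_base).reshape(m, m)
    X_shared = np.frombuffer(X_base).reshape(n, 77)
    res = figtree.figtree(X=X_shared[slc, :], Y=X_shared,
                          Q=Q_shared[:, slc].copy(), epsilon=1e-12, h=15)
    R_shared[:, slc] = res  # assumed result shape: (n, slice width)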
I'm stuck on a variation of the sliding window problem!
Usually we set the number of elements to slide over, but in my case I want to slide over time!
The goal I would like to reach is a function (a thread in this case) that is able to create a "time" window of n seconds (given by the user).
Starting from the first element of the queue, in this case [datetime.time(7, 6, 14, 537370), 584], add 5 seconds -> 7:06:19.537370 (ending point) and sum all elements in this interval:
[datetime.time(7, 6, 14, 537370), 584]
[datetime.time(7, 6, 18, 542798), 761]
Total: 584+761 = 1345
Then create another window starting from the second element, and so on.
IMPORTANT: one item can be part of several windows. The items are generated on the fly, so a naive solution with a function that sleeps for n seconds and then flushes the queue is not good for my problem.
I think it's a variation of this post:
Flexible sliding window (in Python)
But I still can't solve the problem! Any help or suggestions will be appreciated.
Thanks!
Example list of elements:
[datetime.time(7, 6, 14, 537370), 584]
[datetime.time(7, 6, 18, 542798), 761]
[datetime.time(7, 6, 20, 546007), 848]
[datetime.time(7, 6, 24, 550969), 20]
[datetime.time(7, 6, 27, 554370), 478]
[datetime.time(7, 6, 27, 554628), 12]
[datetime.time(7, 6, 31, 558919), 29]
[datetime.time(7, 6, 31, 559562), 227]
[datetime.time(7, 6, 32, 560863), 379]
[datetime.time(7, 6, 35, 564863), 132]
[datetime.time(7, 6, 37, 567276), 651]
[datetime.time(7, 6, 38, 568652), 68]
[datetime.time(7, 6, 40, 569861), 100]
[datetime.time(7, 6, 41, 571459), 722]
[datetime.time(7, 6, 44, 574802), 560]
...
Code:
import random
import time
import threading
import datetime
from multiprocessing import Queue

q = Queue()

# this is a producer that puts elements in the queue
def t1():
    while True:
        time.sleep(random.randint(0, 5))
        element = [datetime.datetime.now().time(), random.randint(0, 1000)]
        q.put(element)

# this is a consumer that sums the elements inside a window of n seconds
# I need a sliding time window of n seconds that sums all elements in it
def t2():
    windowsize = 5  # size of the window: 5 seconds
    total = 0
    while not q.empty():
        e = q.get()
        start = e[0]  # the first element is the beginning point
        end = start + datetime.timedelta(seconds=windowsize)  # ending point
        total += e[1]
        # some code that solves the problem :)

a = threading.Thread(target=t1)
a.start()

b = threading.Thread(target=t2)
b.start()

while True:
    time.sleep(1)
Would this do? This is how I understood your problem. It creates a class that keeps track of things: you either add to it with tw.insert() or sum a window with tw.sum_window(start, end).
When you initialise TimeWindow, you can give it a max size parameter, default 10 seconds. When you add elements or calculate sums, it does a cleanup first, so that before every insert or sum operation the first element time e[0][0] and last element time e[n][0] are within 10 seconds of each other. Older entries are expunged. A "poller" thread is there to track your requests.
I have added two queues, as I do not know what you intend to do with the results. If you want to request data starting from now to 5 seconds in the future, you create a request and put it in the request queue. The request has a random id so that you can match it to its result. Your main thread needs to monitor the result queue: about five seconds after it was submitted, every request sent to the queue comes back with the same id and the sum.
If this is not what you want to do, then I just don't understand what it is you are trying to achieve. Even this is already rather complicated, and there may be a much simpler way to achieve what you intend.
import random
import time
import threading
import datetime
import Queue
import uuid
from collections import deque

q_lock = threading.RLock()

class TimeWindow(object):
    def __init__(self, max_size=10):
        self.max_size = max_size
        self.q = deque()

    def expire(self):
        time_now = datetime.datetime.now()
        while True:
            try:
                oldest_element = self.q.popleft()
                oe_time = oldest_element[0]
                if oe_time + datetime.timedelta(seconds=self.max_size) > time_now:
                    self.q.appendleft(oldest_element)
                    break
            except IndexError:
                break

    def insert(self, elm):
        self.expire()
        self.q.append(elm)

    def sum_window(self, start, end):
        self.expire()
        try:
            _ = self.q[0]
        except IndexError:
            return 0
        result = 0
        for f in self.q:
            if start < f[0] < end:
                result += f[1]
            else:
                pass
        return result

tw = TimeWindow()

def t1():
    while True:
        time.sleep(random.randint(0, 3))
        element = [datetime.datetime.now(), random.randint(0, 1000)]
        with q_lock:
            tw.insert(element)

def poller(in_q, out_q):
    pending = []
    while True:
        try:
            new_request = in_q.get(timeout=0.1)
            new_request["end"] = new_request["start"] + datetime.timedelta(seconds=new_request["frame"])
            pending.append(new_request)
        except Queue.Empty:
            pass
        new_pending = []
        for a in pending:
            if a["end"] < datetime.datetime.now():
                with q_lock:
                    r_sum = tw.sum_window(a["start"], a["end"])
                r_structure = {"id": a["id"], "result": r_sum}
                out_q.put(r_structure)
            else:
                new_pending.append(a)
        pending = new_pending

a = threading.Thread(target=t1)
a.daemon = True
a.start()

in_queue = Queue.Queue()
result_queue = Queue.Queue()
po = threading.Thread(target=poller, args=(in_queue, result_queue,))
po.daemon = True
po.start()

while True:
    time.sleep(1)
    newr = {"id": uuid.uuid4(), "frame": 5, "start": datetime.datetime.now()}
    in_queue.put(newr)
    try:
        ready = result_queue.get(False)
        print ready
    except Queue.Empty:
        pass
garim@wof:~$ python solution.py
1 t1 produce element: 16:09:30.472497 1
2 t1 produce element: 16:09:33.475714 9
3 t1 produce element: 16:09:34.476922 10
4 t1 produce element: 16:09:37.480100 7
solution: 16:09:37.481171 {'id': UUID('adff334f-a97a-459d-8dcc-f28309e25574'), 'result': 19}
5 t1 produce element: 16:09:38.481352 10
solution: 16:09:38.482687 {'id': UUID('0a7481e5-e993-439a-9f7e-2c5aeef86155'), 'result': 19}
It still doesn't work :( I added a counter for each element that t1 inserts. The goal is to do the sum (result_queue.get) at this time:
16:09:35.472497 ---> 16:09:30.472497 + 5 seconds
not before. Only then should the element go out. The next time, the sum should be done at:
16:09:38.475714 ---> 16:09:33.475714 + 5 seconds
I understand that it's hard to explain. With both of your solutions the time window slides, so I can consider the problem solved :) I will try to improve when the sum is executed; that time trigger is important. I acquired a lot of useful knowledge. Thanks for helping.
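For that time trigger, one option is to anchor each request's start at the oldest element's own timestamp rather than now(). A small sketch reusing the structures from the answer above, so the sum fires at first-element-time + 5 seconds:

# Sketch: issue a window request keyed to the oldest element's timestamp.
with q_lock:
    oldest = tw.q[0][0] if tw.q else None
if oldest is not None:
    newr = {"id": uuid.uuid4(), "frame": 5, "start": oldest}
    in_queue.put(newr)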
from multiprocessing import Process, Queue
from datetime import datetime

c = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
out = Queue()

def support(m):
    for k in m:
        print "%s <-- hi" % k
    out.put("done")

# split c into batches of 5 items each
all = Queue()
temp = []
total = len(c)
count = 0
for m in c:
    count += 1
    total = total - 1
    temp.append(m)
    if count == 5 or total == 0:
        all.put(temp)
        count = 0
        temp = []

# run up to 3 processes at a time over the batches
process_count = 3
while all.qsize() != 0:
    process_list = []
    try:
        for x in range(process_count):
            p = Process(target=support, args=(all.get(),))
            process_list.append(p)
        for p in process_list:
            p.start()
        for p in process_list:
            p.join()
    except Exception as e:
        print e

while out.qsize != 0:
    print out.get()

print "all done"
I don't know why it does not end and never prints "all done"; it just stays in the loop and keeps executing.
It would be of great help if you could make this code more efficient, but first I want to know why it does not end.
The problem is:
while out.qsize != 0 :
    print out.get()
out.qsize is a method, so you're comparing the method object itself (not its return value!) with 0, which is of course always true. The loop therefore never exits, and once the queue is empty the final out.get() blocks forever.
You should use:
while out.qsize() != 0 :
    print out.get()
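Note that even with the call fixed, qsize() on a multiprocessing.Queue is only approximate and raises NotImplementedError on macOS, and a get() on an empty queue blocks. A more robust drain pattern, as a minimal sketch in the same Python 2 style as the question:

from Queue import Empty  # multiprocessing queues raise Queue.Empty

while True:
    try:
        print out.get_nowait()  # non-blocking; stop once the queue is drained
    except Empty:
        break
print "all done"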