python time sliding window variation - python

I'm stuck with a variation of sliding window problem!
Usually we set the number of element to slide but in my case I want to slide the time!
The goal that I would like to reach is a function (thread in this case)
that is able to create a "time" windows in seconds (given by user).
Starting from the first element of the queue in this case:
[datetime.time(7, 6, 14, 537370), 584 add 5 seconds -> 7:6:19.537370 (ending point) and sum all elements in this interval:
[datetime.time(7, 6, 14, 537370), 584]
[datetime.time(7, 6, 18, 542798), 761]
Total: 584+761= 1345
Then create another "windows" with the second elements and goes on.
IMPORTANT: One item can be part of several window. the item are generated meanwhile, a naif solution with function that sleep for n second and then flush the queue is not good for my problem.
I think its a variation of this post:
Flexible sliding window (in Python)
But still can't solve the problem! Any help or suggests will be appreciated.
Thanks!
Example list of elements:
[datetime.time(7, 6, 14, 537370), 584]
[datetime.time(7, 6, 18, 542798), 761]
[datetime.time(7, 6, 20, 546007), 848]
[datetime.time(7, 6, 24, 550969), 20]
[datetime.time(7, 6, 27, 554370), 478]
[datetime.time(7, 6, 27, 554628), 12]
[datetime.time(7, 6, 31, 558919), 29]
[datetime.time(7, 6, 31, 559562), 227]
[datetime.time(7, 6, 32, 560863), 379]
[datetime.time(7, 6, 35, 564863), 132]
[datetime.time(7, 6, 37, 567276), 651]
[datetime.time(7, 6, 38, 568652), 68]
[datetime.time(7, 6, 40, 569861), 100]
[datetime.time(7, 6, 41, 571459), 722]
[datetime.time(7, 6, 44, 574802), 560]
...
Code:
import random
import time
import threading
import datetime
from multiprocessing import Queue
q = Queue()
#this is a producer that put elements in queue
def t1():
element = [0,0]
while True:
time.sleep(random.randint(0, 5))
element[0] = datetime.datetime.now().time()
element[1] = random.randint(0, 1000)
q.put(element)
#this is a consumer that sum elements inside a window of n seconds
#Ineed something a sliding window time of ten seconds that sum all elements for n seconds
def t2():
windowsize = 5 #size of the window 5 seconds
while not queue.empty():
e = q.get()
start = e[0] #the first element is the beginning point
end = start + datetime.timedelta(seconds=windowsize) #ending point
sum += e[1]
#some code that solve the problem :)
a = threading.Thread(target=t1)
a.start()
b = threading.Thread(target=t2)
b.start()
while True:
time.sleep(1)

Would this do? This is how I understood your problem. What this does is it creates a class that keeps track of things. You either add to this by tw.insert() or sum with tw.sum_window(seconds).
When you initialise TimeWindow, you can give it a max size parameter, default is 10 seconds. When you add elements or calculate sums, it does a clean up so that before every insert or sum operation, first element time e[0][0] and last element time e[n][0] are within 10 seconds of each other. Older entries are expunged. A "poller" thread is there to track your requests.
I have added two queues as I do not know what you intend to do with results. Now if you want to request data starting from now to 5 seconds in the future, you create a request and put it in queue. The request has a random id so that you can match it to results. Your main thread needs to monitor result queue and after five seconds, every request sent to queue return with the same id and the sum.
If this is not what you want to do, then I just don't understand what is it that you try to achieve here. Even this is already rather complicated and there may be a much simpler way to achieve what you intend to do.
import random
import time
import threading
import datetime
import Queue
import uuid
from collections import deque
q_lock = threading.RLock()
class TimeWindow(object):
def __init__(self, max_size=10):
self.max_size = max_size
self.q = deque()
def expire(self):
time_now = datetime.datetime.now()
while True:
try:
oldest_element = self.q.popleft()
oe_time = oldest_element[0]
if oe_time + datetime.timedelta(seconds=self.max_size) > time_now:
self.q.appendleft(oldest_element)
break
except IndexError:
break
def insert(self,elm):
self.expire()
self.q.append(elm)
def sum_window(self, start, end):
self.expire()
try:
_ = self.q[0]
except IndexError:
return 0
result=0
for f in self.q:
if start < f[0] < end:
result += f[1]
else:
pass
return result
tw = TimeWindow()
def t1():
while True:
time.sleep(random.randint(0, 3))
element = [datetime.datetime.now(), random.randint(0,1000)]
with q_lock:
tw.insert(element)
def poller(in_q, out_q):
pending = []
while True:
try:
new_request = in_q.get(0.1)
new_request["end"] = new_request["start"] + datetime.timedelta(seconds=new_request["frame"])
pending.append(new_request)
except Queue.Empty:
pass
new_pending = []
for a in pending:
if a["end"] < datetime.datetime.now():
with q_lock:
r_sum = tw.sum_window(a["start"], a["end"])
r_structure = {"id": a["id"], "result": r_sum}
out_q.put(r_structure)
else:
new_pending.append(a)
pending = new_pending
a = threading.Thread(target=t1)
a.daemon = True
a.start()
in_queue = Queue.Queue()
result_queue = Queue.Queue()
po = threading.Thread(target=poller, args=(in_queue, result_queue,))
po.daemon = True
po.start()
while True:
time.sleep(1)
newr = {"id": uuid.uuid4(), "frame": 5, "start": datetime.datetime.now()}
in_queue.put(newr)
try:
ready = result_queue.get(0)
print ready
except Queue.Empty:
pass

garim#wof:~$ python solution.py
1 t1 produce element: 16:09:30.472497 1
2 t1 produce element: 16:09:33.475714 9
3 t1 produce element: 16:09:34.476922 10
4 t1 produce element: 16:09:37.480100 7
solution: 16:09:37.481171 {'id': UUID('adff334f-a97a-459d-8dcc-f28309e25574'), 'result': 19}
5 t1 produce element: 16:09:38.481352 10
solution: 16:09:38.482687 {'id': UUID('0a7481e5-e993-439a-9f7e-2c5aeef86155'), 'result': 19}
It still doent works :( I add a counter for each element it inserts with function t1. The goal is do the sum (result_queue.get) at this time:
16:09:35.472497 ---> 16:09:30.472497 + 5 seconds
no before. Only then the element goes out. The next time the sum will be done at:
16:09:35.475714 ---> 16:09:33.475714 + 5 seconds
I understand that it's hard to explain.. With both of your solution the time window slide so I can consider the problem solved :) I will try to improve when the function sum will be execute, that time trigger is important. I acquire a lot useful knowledge. Thanks for helping.

Related

how to add multiprocessing to loops?

I have a large customer data set (10 million+) , that I am running my loop calculation. I am trying to add multiprocessing, but it seems to take longer when I use multiprocessing, by splitting data1 into chunks running it in sagemaker studio. I am not sure what I am doing wrong but the calculation takes longer when using multiprocessing, please help.
input data example:
state_list = ['A','B','C','D','E'] #possible states
data1 = pd.DataFrame({"cust_id": ['x111','x112'], #customer data
"state": ['B','E'],
"amount": [1000,500],
"year":[3,2],
"group":[10,10],
"loan_rate":[0.12,0.13]})
data1['state'] = pd.Categorical(data1['state'],
categories=state_list,
ordered=True).codes
lookup1 = pd.DataFrame({'year': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'lim %': [0.1, 0.1, 0.1, 0.1, 0.1,0.1, 0.1, 0.1, 0.1, 0.1]}).set_index(['year'])
matrix_data = np.arange(250).reshape(10,5,5) #3d matrix by state(A-E) and year(1-10)
end = pd.Timestamp(year=2021, month=9, day=1) # creating a list of dates
df = pd.DataFrame({"End": pd.date_range(end, periods=10, freq="M")})
df['End']=df['End'].dt.day
End=df.values
end_dates = End.reshape(-1) # array([30, 31, 30, 31, 31, 28, 31, 30, 31, 30]); just to simplify access to the end date values
calculation:
num_processes = 4
# Split the customer data into chunks
chunks = np.array_split(data1, num_processes)
queue = mp.Queue()
def calc(chunk):
results1={}
for cust_id, state, amount, start, group, loan_rate in chunks.itertuples(name=None, index=False):
res1 = [amount * matrix_data[start-1, state, :]]
for year in range(start+1, len(matrix_data)+1,):
res1.append(lookup1.loc[year].iat[0] * np.array(res1[-1]))
res1.append(res1[-1] * loan_rate * end_dates[year-1]/365) # year - 1 here
res1.append(res1[-1]+ 100)
res1.append(np.linalg.multi_dot([res1[-1],matrix_data[year-1]]))
results1[cust_id] = res1
queue.put(results1)
processes = [mp.Process(target=calc, args=(chunk,)) for chunk in chunks]
for p in processes:
p.start()
for p in processes:
p.join()
results1 = {}
while not queue.empty():
results1.update(queue.get())
I think it would be easier to use a multiprocessing pool with the map method, which will submit tasks in chunks anyway but your worker function calc just needs to deal with individuals tuples since the chunking is done in a transparent function. The pool will compute what it thinks is an optimal number of rows to be chunked together based on the total number of rows and the number of processes in the pool, but you can override this. So a solution would look something like the following. Since you have not tagged your question with the OS you are running under, the code below should run under Windows, Linux or MacOS in the most efficient way for that platform. But as I mentioned in a comment, multiprocessing may actually slow down getting your results if calc is not sufficiently CPU-intensive.
from multiprocessing import Pool
import pandas as pd
import numpy as np
def init_pool_processes(*args):
global lookup1, matrix_data, end_dates
lookup1, matrix_data, end_dates = args # unpack
def calc(t):
cust_id, state, amount, start, group, loan_rate = t # unpack
results1 = {}
res1 = [amount * matrix_data[start-1, state, :]]
for year in range(start+1, len(matrix_data)+1,):
res1.append(lookup1.loc[year].iat[0] * np.array(res1[-1]))
res1.append(res1[-1] * loan_rate * end_dates[year-1]/365) # year - 1 here
res1.append(res1[-1] + 100)
return (cust_id, res1) # return tuple
def main():
state_list = ['A','B','C','D','E'] #possible states
data1 = pd.DataFrame({"cust_id": ['x111','x112'], #customer data
"state": ['B','E'],
"amount": [1000,500],
"year":[3,2],
"group":[10,10],
"loan_rate":[0.12,0.13]})
data1['state'] = pd.Categorical(data1['state'],
categories=state_list,
ordered=True).codes
lookup1 = pd.DataFrame({'year': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'lim %': [0.1, 0.1, 0.1, 0.1, 0.1,0.1, 0.1, 0.1, 0.1, 0.1]}).set_index(['year'])
matrix_data = np.arange(250).reshape(10,5,5) #3d matrix by state(A-E) and year(1-10)
end = pd.Timestamp(year=2021, month=9, day=1) # creating a list of dates
df = pd.DataFrame({"End": pd.date_range(end, periods=10, freq="M")})
df['End']=df['End'].dt.day
End=df.values
end_dates = End.reshape(-1) # array([30, 31, 30, 31, 31, 28, 31, 30, 31, 30]); just to simplify access to the end date values
with Pool(initializer=init_pool_processes, initargs=(lookup1, matrix_data, end_dates)) as pool:
results = {cust_id: arr for cust_id, arr in pool.map(calc, data1.itertuples(name=None, index=False))}
for cust_id, arr in results.items():
print(cust_id, arr)
if __name__ == '__main__':
main()
Prints:
x111 [array([55000, 56000, 57000, 58000, 59000]), array([5500., 5600., 5700., 5800., 5900.]), array([56.05479452, 57.0739726 , 58.09315068, 59.11232877, 60.13150685]), array([156.05479452, 157.0739726 , 158.09315068, 159.11232877,
160.13150685]), array([15.60547945, 15.70739726, 15.80931507, 15.91123288, 16.01315068]), array([0.15904763, 0.16008635, 0.16112507, 0.1621638 , 0.16320252]), array([100.15904763, 100.16008635, 100.16112507, 100.1621638 ,
100.16320252]), array([10.01590476, 10.01600864, 10.01611251, 10.01621638, 10.01632025]), array([0.09220121, 0.09220216, 0.09220312, 0.09220407, 0.09220503]), array([100.09220121, 100.09220216, 100.09220312, 100.09220407,
100.09220503]), array([10.00922012, 10.00922022, 10.00922031, 10.00922041, 10.0092205 ]), array([0.10201178, 0.10201178, 0.10201178, 0.10201178, 0.10201178]), array([100.10201178, 100.10201178, 100.10201178, 100.10201178,
100.10201178]), array([10.01020118, 10.01020118, 10.01020118, 10.01020118, 10.01020118]), array([0.09873075, 0.09873075, 0.09873075, 0.09873075, 0.09873075]), array([100.09873075, 100.09873075, 100.09873075, 100.09873075,
100.09873075]), array([10.00987308, 10.00987308, 10.00987308, 10.00987308, 10.00987308]), array([0.10201843, 0.10201843, 0.10201843, 0.10201843, 0.10201843]), array([100.10201843, 100.10201843, 100.10201843, 100.10201843,
100.10201843]), array([10.01020184, 10.01020184, 10.01020184, 10.01020184, 10.01020184]), array([0.09873076, 0.09873076, 0.09873076, 0.09873076, 0.09873076]), array([100.09873076, 100.09873076, 100.09873076, 100.09873076,
100.09873076])]
x112 [array([22500, 23000, 23500, 24000, 24500]), array([2250., 2300., 2350., 2400., 2450.]), array([24.04109589, 24.57534247, 25.10958904, 25.64383562, 26.17808219]), array([124.04109589, 124.57534247, 125.10958904, 125.64383562,
126.17808219]), array([12.40410959, 12.45753425, 12.5109589 , 12.56438356, 12.61780822]), array([0.13695496, 0.13754483, 0.1381347 , 0.13872456, 0.13931443]), array([100.13695496, 100.13754483, 100.1381347 , 100.13872456,
100.13931443]), array([10.0136955 , 10.01375448, 10.01381347, 10.01387246, 10.01393144]), array([0.11056217, 0.11056282, 0.11056347, 0.11056413, 0.11056478]), array([100.11056217, 100.11056282, 100.11056347, 100.11056413,
100.11056478]), array([10.01105622, 10.01105628, 10.01105635, 10.01105641, 10.01105648]), array([0.09983629, 0.09983629, 0.09983629, 0.09983629, 0.09983629]), array([100.09983629, 100.09983629, 100.09983629, 100.09983629,
100.09983629]), array([10.00998363, 10.00998363, 10.00998363, 10.00998363, 10.00998363]), array([0.11052119, 0.11052119, 0.11052119, 0.11052119, 0.11052119]), array([100.11052119, 100.11052119, 100.11052119, 100.11052119,
100.11052119]), array([10.01105212, 10.01105212, 10.01105212, 10.01105212, 10.01105212]), array([0.10696741, 0.10696741, 0.10696741, 0.10696741, 0.10696741]), array([100.10696741, 100.10696741, 100.10696741, 100.10696741,
100.10696741]), array([10.01069674, 10.01069674, 10.01069674, 10.01069674, 10.01069674]), array([0.11052906, 0.11052906, 0.11052906, 0.11052906, 0.11052906]), array([100.11052906, 100.11052906, 100.11052906, 100.11052906,
100.11052906]), array([10.01105291, 10.01105291, 10.01105291, 10.01105291, 10.01105291]), array([0.10696741, 0.10696741, 0.10696741, 0.10696741, 0.10696741]), array([100.10696741, 100.10696741, 100.10696741, 100.10696741,
100.10696741])]
If you wish to save memory, you could use method imap_unordered:
def main():
... # code omitted
def compute_chunksize(iterable_size, pool_size):
chunksize, remainder = divmod(iterable_size, 4 * pool_size)
if remainder:
chunksize += 1
return chunksize
from multiprocessing import cpu_count
pool_size = cpu_count()
iterable_size = 100_000 # Your best estimate
chunksize = compute_chunksize(iterable_size, pool_size)
with Pool(pool_size, initializer=init_pool_processes, initargs=(lookup1, matrix_data, end_dates)) as pool:
it = pool.imap_unordered(calc, data1.itertuples(name=None, index=False), chunksize=chunksize)
"""
# Create dictionary in memory:
results = {cust_id: arr for cust_id, arr in it}
"""
# Or to save memory, iterate the results:
for cust_id, arr in it:
print(cust_id, arr)
if __name__ == '__main__':
main()

Passing thread names as gloabl variables to a function

I'm a bit stuck, i'm trying to pass thread names given by the system to my function so that i can print the start and end time of the current thread working in the function, i'm using global variables name for that. The user has to input a number in the given interval. The thread names work fine when i inputed 1001 but if i input numbers like 1200 or 10001 the names do not fit anymore. I put examples of the output, output 1 is not what i'm looking, output 2 is what i need. I'm not sure what is causing the name change. If any additional information is needed i'm happy to provide it
import os
from posixpath import abspath
import time
import sys
import signal
import threading
import platform
import subprocess
from pathlib import Path
import math
lokot = threading.Lock()
lista = []
name = 0
name3 = 0
def divisor(start,end):
lokot.acquire()
start = time.time()
print('{} started working at {}'.format(name, start))
for i in range(int(start),int(end)+1):
if int(end) % i == 0:
lista.append(i)
end = time.time()
print('{} ended working at {}'.format(name, end))
lokot.release()
def new_lista():
lokot.acquire()
start = time.time()
nlista = []
for i in lista:
if i % 2 == 0:
nlista.append(i)
print(nlista)
print('{} was executed in time frame {}'.format(name3,time.time()-start))
lokot.release()
def f4():
while (True):
print ('Input a non negative number in given range <1000,200000>')
number = input()
if number.isalpha() or not number or int(number) not in range(1000,200000) :
print('Number entered is not in the interval <1000,200000>')
continue
else:
global name
global name3
x = int(number) / 2
t1 = threading.Thread(target=divisor, args=(1, x))
t2 = threading.Thread(target=divisor, args=(1, number))
t3 = threading.Thread(target=nova_lista)
name = t1.name
t1.start()
name = t2.name
t2.start()
name3 = t3.name
t3.start()
t1.join()
t2.join()
t3.join()
break
Input 1:
100001
Output 1:
Thread-1 started working at 1624538800.4813018
Thread-2 ended working at 1624538800.4887686
Thread-2 started working at 1624538800.4892647
Thread-2 ended working at 1624538800.5076165
[2, 4, 8, 10, 16, 20, 40, 50, 80, 100, 200, 250, 400, 500, 1000, 1250, 2000, 2500, 5000, 6250, 10000, 12500, 25000, 50000]
Thread-3 dwas executed in time frame 0.0
Input 2:
1001
Output 2:
Thread-1 started working at 1624538882.90607
Thread-1 ended working at 1624538882.9070616
Thread-2 started working at 1624538882.9074266
Thread-2 ended working at 1624538882.9089162
[2, 4, 8, 10, 20, 40, 50, 100, 200, 250, 500, 1000, 1250, 2500, 5000]
Thread-3 dretva se izvodila 0.0
This won't necessarily work:
name = t1.name
t1.start()
name = t2.name
Nothing prevents the second assignment from happening before the t1 thread accesses the name variable.
Q: Why don't you just assign names when you create the threads instead of letting the Threading library assign them? E.g.;
t1 = threading.Thread(target=divisor, args=(1, x), name="t1")

I need some assistance with list methods in Python, storing input given automatically by system and execution when called

So i need to store input from system and trigger code block when given command via input matches with condition. Given commands are randomly produced by system and its not same everytime when codes are executed. What i do below is; i store input in a list until input become blankspace and which shows commands are over and it specifically stated in statement that commands will end with blankspace after last command. Read commands, input and values from that command list until there is no command to perform. I know this is bad practice. Since I am newb in this language i need some advice to change my code. Thanks in advance. Btw i cant change conditions in if statements as given commands via input is not the same but like this and more:
append_it 15
insert_it 0 25
remove_it 30
Code works just fine i need advice to make it good code practice to improve myself in Python.
i = 0
command_list = []
while True:
command = input('')
if command == '':
break
command_list.append(command)
i += 1
b = 0
arr = []
while i != b:
command1 = command_list[b]
b += 1
if command1[0:8] == "append_it":
value = int(command1[9:])
arr.append(value)
elif command1[0:4] == "insert_it":
index = int(command1[5:7])
value = int(command1[7:])
arr.insert(index, value)
elif command1[0:3] == "remove_it":
value = int(command1[4:])
if value in liste:
arr.remove(value)
elif command1[0:] == "print_it":
print(arr)
elif command1[0:] == "reverse_it":
arr.reverse()
elif command1[0:] == "sort_it":
arr.sort()
elif command1[0:] == "pop_it":
arr.pop()
You can improve by defining actions to do in a dictionary, adding the inputted values as splitted list and call the appropriate function for the appropriate input:
def appendit(a, *prms):
v = int(prms[0])
a.append(v)
def insertit(a, *prms):
i = int(prms[0])
v = int(prms[1])
a.insert(i,v)
def removeit(a, *prms):
v = int(prms[0])
a.remove(v) # no need to test
def reverseit(a): a.reverse()
def sortit(a): a.sort()
def popit(a): a.pop()
# define what command to run for what input
cmds = {"append_it" : appendit,
"insert_it" : insertit,
"remove_it" : removeit,
"print_it" : print, # does not need any special function
"reverse_it": reverseit,
"sort_it" : sortit,
"pop_it" : popit}
command_list = []
while True:
command = input('')
if command == '':
break
c = command.split() # split the command already
# only allow commands you know into your list - they still might have the
# wrong amount of params given - you should check that in the functions
if c[0] in cmds:
command_list.append(c)
arr = []
for (command, *prms) in command_list:
# call the correct function with/without params
if prms:
cmds[command](arr, *prms)
else:
cmds[command](arr)
Output:
# inputs from user:
append_it 42
append_it 32
append_it 52
append_it 62
append_it 82
append_it 12
append_it 22
append_it 33
append_it 12
print_it # 1st printout
sort_it
print_it # 2nd printout sorted
reverse_it
print_it # 3rd printout reversed sorted
pop_it
print_it # one elem popped
insert_it 4 99
remove_it 42
print_it # 99 inserted and 42 removed
# print_it - outputs
[42, 32, 52, 62, 82, 12, 22, 33, 12]
[12, 12, 22, 32, 33, 42, 52, 62, 82]
[82, 62, 52, 42, 33, 32, 22, 12, 12]
[82, 62, 52, 42, 33, 32, 22, 12]
[82, 62, 52, 99, 33, 32, 22, 12]

Multiprocessing in Python to process a list of parameters

I'm writing my first multiprocessing program in python.
I want to create a list of values to be processed, and 8 processes (number os CPU cores) will consume and process the list of values.
I wrote the following python code:
__author__ = 'Rui Martins'
from multiprocessing import cpu_count, Process, Lock, Value
def proc(lock, number_of_active_processes, valor):
lock.acquire()
number_of_active_processes.value+=1
print "Active processes:", number_of_active_processes.value
lock.release()
# DO SOMETHING ...
for i in range(1, 100):
valor=valor**2
# (...)
lock.acquire()
number_of_active_processes.value-=1
lock.release()
if __name__ == '__main__':
proc_number=cpu_count()
number_of_active_processes=Value('i', 0)
lock = Lock()
values=[11, 24, 13, 40, 15, 26, 27, 8, 19, 10, 11, 12, 13]
values_processed=0
processes=[]
for i in range(proc_number):
processes+=[Process()]
while values_processed<len(values):
while number_of_active_processes.value < proc_number and values_processed<len(values):
for i in range(proc_number):
if not processes[i].is_alive() and values_processed<len(values):
processes[i] = Process(target=proc, args=(lock, number_of_active_processes, values[values_processed]))
values_processed+=1
processes[i].start()
while number_of_active_processes.value == proc_number:
# BUG: always number_of_active_processes.value == 8 :(
print "Active processes:", number_of_active_processes.value
print ""
print "Active processes at END:", number_of_active_processes.value
And, I have the following problem:
The program never stop
I get out of RAM
Simplifying your code to the following:
def proc(lock, number_of_active_processes, valor):
lock.acquire()
number_of_active_processes.value += 1
print("Active processes:", number_of_active_processes.value)
lock.release()
# DO SOMETHING ...
for i in range(1, 100):
print(valor)
valor = valor **2
# (...)
lock.acquire()
number_of_active_processes.value -= 1
lock.release()
if __name__ == '__main__':
proc_number = cpu_count()
number_of_active_processes = Value('i', 0)
lock = Lock()
values = [11, 24, 13, 40, 15, 26, 27, 8, 19, 10, 11, 12, 13]
values_processed = 0
processes = [Process() for _ in range(proc_number)]
while values_processed < len(values)-1:
for p in processes:
if not p.is_alive():
p = Process(target=proc,
args=(lock, number_of_active_processes, values[values_processed]))
values_processed += 1
p.start()
If you run it like above the print(valor) added you see exactly what is happening, you are exponentially growing valor to the point you run out of memory, you don't get stuck in the while you get stuck in the for loop.
This is the output at the 12th process adding a print(len(srt(valor))) after a fraction of a second and it just keeps on going:
2
3
6
11
21
.........
59185
70726
68249
73004
77077
83805
93806
92732
90454
104993
118370
136498
131073
Just changing your loop to the following:
for i in range(1, 100):
print(valor)
valor = valor *2
The last number created is:
6021340351084089657109340225536
Using your own code you seem to get stuck in the while but it is valor is growing in the for loop to numbers with as many digits as:
167609
180908
185464
187612
209986
236740
209986
And on....
The problem is not your multiprocessing code. It's the pow operator in the for loop:
for i in range(1, 100):
valor=valor**2
the final result would be pow(val, 2**100), and this is too big, and calculate it would cost too much time and memory. so you got out of memory error in the last.
4 GB = 4 * pow(2, 10) * pow(2, 10) * pow(2, 20) * 8 bit = 2**35 bit
and for your smallest number 8:
pow(8, 2**100) = pow(2**3, 2**100) = pow(2, 3*pow(2, 100))
pow(2, 3*pow(2, 100))bit/4GB = 3*pow(2, 100-35) = 3*pow(2, 65)
it need 3*pow(2, 65) times of 4 GB memory.

Python multiprocessing , code keep on executing?

from multiprocessing import Process , Queue
from datetime import datetime
c = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
out = Queue()
def support(m):
for k in m :
print "%s <-- hi" % k
out.put("done")
all = Queue()
temp = []
total = len(c)
count = 0
for m in c :
count += 1
total = total - 1
temp.append(m)
if count == 5 or total == 0 :
all.put(temp)
count = 0
temp = []
process_count = 3
while all.qsize() != 0 :
process_list = []
try :
for x in range(process_count) :
p = Process(target=support, args=(all.get(),))
process_list.append(p)
for p in process_list :
p.start()
for p in process_list :
p.join()
except Exception as e :
print e
while out.qsize != 0 :
print out.get()
print "all done"
I dont know why it does not end and does not print "all done" , just remain continuously in loop or keep executing .
Will be of great help if you can make this code more efficient but first i want to know why it does not end .
The problem is:
while out.qsize != 0 :
print out.get()
out.qsize is a function, so now you're comparing the function itself (not the return value!) with 0, with is of course always False.
You should use:
while out.qsize() != 0 :
print out.get()

Categories