Suppose two functions, function1 and function2, are running at the same time.
function1: continuously appends data to a list.
function2: takes the list from function1, copies all of its data to another list, flushes the original list, and processes the copy.
sample code:
list_p = []

def function1(data):
    list_p.append(data)

def function2(list_p):
    list_q = list_p.copy()
    list_p.clear()  # "flush" the original list
    x = process(list_q)
    return x

while True:
    # data arriving continuously
    function1(incoming_data)
So, how can I run function1 and function2 at the same time, so that function2 can take the data from function1's list and flush it (after flushing, function1 starts appending from index 0 again), while function1 may still be appending to the list?
In other words, while function2 is processing its copied list, function1 keeps appending to the original list; when function2 finishes processing, it again takes all the data that accumulated in the original list while it was busy, and the cycle continues.
Here is an example using threading. In place of a data stream, the producer uses the input function. (It's based on https://techmonger.github.io/55/producer-consumer-python/.)
from threading import Thread
from queue import Queue

q = Queue()
final_results = []

def producer():
    while True:
        i = int(input('Give me some number: '))  # here you should get data from your data stream
        q.put(i)

def consumer():
    while True:
        number = q.get()
        result = number ** 2
        final_results.append(result)
        print(final_results)
        q.task_done()

t = Thread(target=consumer)
t.daemon = True
t.start()
producer()
Hello, I have a CSV with about 2.5k lines of Outlook emails and passwords.
The CSV looks like this:
header:
username, password
content:
test1233#outlook.com,123password1
test1234#outlook.com,123password2
test1235#outlook.com,123password3
test1236#outlook.com,123password4
test1237#outlook.com,123password5
The code lets me log into the accounts and delete every mail from them, but it takes too long to run the script over 2.5k accounts, so I wanted to make it faster with multithreading.
This is my code:
from csv import DictReader
import imap_tools
from datetime import datetime

def IMAPDumper(accountList, IMAP_SERVER, search_criteria, row):
    accountcounter = 0
    with open(accountList, 'r') as read_obj:
        csv_dict_reader = DictReader(read_obj)
        for row in csv_dict_reader:
            # TIMESTAMP FOR FURTHER DEBUGGING TO CHECK IF THE SCRIPT IS STOPPING AT A POINT
            TIMESTAMP = datetime.now().strftime("[%H:%M:%S]")
            # adds a counter for the amount of accounts processed by the script
            accountcounter = accountcounter + 1
            print("_____________________________________________")
            print(TIMESTAMP, "Account", accountcounter)
            print("_____________________________________________")
            # resetting emailcounter each time
            emailcounter = 0
This is a job that is best accomplished using a thread pool whose optimum size will need to be experimented with. I have set the size below to 100, which may be overly ambitious (or not). You can try decreasing or increasing NUM_THREADS to see what effect it has.
The important thing is to modify function IMAPDumper so that it is passed a single row from the CSV file to process, and therefore does not need to open and read the file itself.
There are various methods you can use with class ThreadPool in module multiprocessing.pool (this class is not well-documented; it is the multithreading analog of the multiprocessing class Pool and has the exact same interface). The advantages of imap_unordered are that (1) the passed iterable argument can be a generator, which will not be converted to a list, saving memory and time if that list would be very large, and (2) the ordering of the results (the return values from the worker function, IMAPDumper in this case) is arbitrary, so it might run slightly faster than imap or map. Since your worker function does not explicitly return a value (it defaults to None), this should not matter.
from csv import DictReader
import imap_tools
from datetime import datetime
from multiprocessing.pool import ThreadPool
from functools import partial
from itertools import count

# shared counter: a plain local accountcounter would be unbound here, since each
# call now handles a single row, so use an itertools.count shared by all threads
account_counter = count(1)

def IMAPDumper(IMAP_SERVER, search_criteria, row):
    """ process a single row """
    # TIMESTAMP FOR FURTHER DEBUGGING TO CHECK IF THE SCRIPT IS STOPPING AT A POINT
    TIMESTAMP = datetime.now().strftime("[%H:%M:%S]")
    # counts the accounts processed by the script
    accountcounter = next(account_counter)
    print("_____________________________________________")
    print(TIMESTAMP, "Account", accountcounter)
    ... # etc

def generate_rows():
    """ generator function to yield rows """
    with open('outlookAccounts.csv', newline='') as f:
        dict_reader = DictReader(f)
        for row in dict_reader:
            yield row

NUM_THREADS = 100

worker = partial(IMAPDumper, "outlook.office365.com", "ALL")
pool = ThreadPool(NUM_THREADS)
for return_value in pool.imap_unordered(worker, generate_rows()):
    # must iterate the iterator returned by imap_unordered to ensure all tasks are run and completed
    pass  # return values are None
This is not necessarily the best way to do it, but it is the quickest to write. I don't know if you are familiar with Python generators, but we will have to use one. The generator will work as a work dispatcher.
def generator():
    with open("t.csv", 'r') as read_obj:
        csv_dict_reader = DictReader(read_obj)
        for row in csv_dict_reader:  # iterate the DictReader, not the raw file object
            yield row

gen = generator()
Next, you will have your main function where you do your IMAP stuff
def main():
    while True:
        # the try prevents the thread from crashing once the whole file has been processed
        try:
            # returns the next line of the csv
            working_set = next(gen)
            # do_some_stuff
            # -
            # do_other_stuff
        except StopIteration:
            break
Then you just have to split the work across multiple threads!
import threading

# You can change the number of threads
number_of_threads = 5
thread_list = []

# Create 5 thread objects
for _ in range(number_of_threads):
    thread_list.append(threading.Thread(target=main))

# Start all thread objects
for thread in thread_list:
    thread.start()
I hope this helped you!
I am trying to use multiprocessing in this way:
from multiprocessing import Pool

added = []

def foo(i):
    added = []
    # do something
    added.append(x[i])
    return added

if __name__ == '__main__':
    h = 0
    while len(added) < len(c):
        pool = Pool(4)
        result = pool.imap_unordered(foo, c)
        added.append(result[-1])
        pool.close()
        pool.join()
        h = h + 1
Multiprocessing takes place in the while loop, and the added list is created in the foo function. In each subsequent step h of the loop, the list added should be extended with further values, and the current list added should be used inside foo. Is it possible to pass the current contents of the list to the function in each subsequent step of the loop? Because in the above code, the foo function creates the contents of the added list from scratch each time. How can this be solved?
You can use a multiprocessing.Queue. The rough idea is to construct one of these in your main process, pass it to the child processes, and each foo() invocation can call put(x[i]) to add a value to the queue.
The main process will then read the queue to collect the results.
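A minimal sketch of that idea (the squaring and the small range are stand-ins for whatever your real foo and iterable c look like):
from multiprocessing import Process, Queue

def foo(i, queue):
    # each worker puts its result on the shared queue instead of appending to a local list
    queue.put(i * i)  # stand-in for x[i]

if __name__ == '__main__':
    queue = Queue()
    c = range(8)  # stand-in for your iterable c
    workers = [Process(target=foo, args=(i, queue)) for i in c]
    for w in workers:
        w.start()
    added = [queue.get() for _ in c]  # the main process drains the queue to collect the results
    for w in workers:
        w.join()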
I have a situation where I have to read and write the list simultaneously.
It seems the code starts to read only after it has finished writing all the elements to the list. What I want is for the code to keep adding elements at one end while I keep processing the first 10 elements at the same time.
import csv
from multiprocessing import Pool

testlist = []
with open('some.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        testlist.append(row)

def render(small):
    # do some stuff

while len(testlist) > 0:
    pool = Pool(processes=10)
    small = testlist[:10]
    del testlist[:10]
    pool.map_async(render, small)
    pool.close()
    pool.join()
You need a queue that is shared between processes. One process adds items to the queue, the other takes items off the queue and processes them.
Here is a simplified example:
from multiprocessing import Process, Queue

def put(queue):
    # writes items to the queue
    queue.put('something')

def get(queue):
    # gets items from the queue to work on
    while True:
        item = queue.get()
        # do something with item

if __name__ == '__main__':
    queue = Queue()
    getter_process = Process(target=get, args=(queue,))
    getter_process.daemon = True
    getter_process.start()
    put(queue)  # Send something to the queue
    getter_process.join()  # Wait for the getter to finish
If you want to only process 10 things at a time, you can limit the queue size to 10. This means the "writer" cannot add anything more while the queue already has 10 items waiting to be processed.
By default, the queue has no bounds/limits. The documentation is a good place to start for more on queues.
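For example, the only change needed in the snippet above would be how the queue is constructed (maxsize is the standard constructor argument):
from multiprocessing import Queue

# a bounded queue: put() blocks once 10 items are waiting,
# until the consumer takes something off with get()
queue = Queue(maxsize=10)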
You can do it like this:
x = []
y = [1, 2, 3, 4, 5, 6, 7, ...]

for i in y:
    x.append(i)
    if len(x) < 10:
        print(x)
    else:
        print(x[:10])
        x = x[10:]  # drop the 10 elements that were just processed
PS: assuming y is an infinite stream
I have a Producer process that runs and puts its results in a Queue.
I also have a Consumer function that takes the results from the Queue and processes them, for example:
def processFrame(Q, commandsFile):
    fr = Q.get()
    frameNum = fr[0]
    Frame = fr[1]
    #
    # Process the frame
    #
    commandsFile.write(theProcessedResult)
I want to run my consumer function using multiple processes; their number should be set by the user:
processes = raw_input('Enter the number of process you want to use: ')
I tried using Pool:
pool = Pool(int(processes))
pool.apply(processFrame, args=(q,toFile))
When I try this, it returns a RuntimeError: Queue objects should only be shared between processes through inheritance.
What does that mean?
I also tried to use a list of processes:
while (q.empty() == False):
    mp = [Process(target=processFrame, args=(q, toFile)) for x in range(int(processes))]
    for p in mp:
        p.start()
    for p in mp:
        p.join()
This one seems to run, but not as expected.
It uses multiple processes on the same frame from the Queue; doesn't a Queue have locks?
Also, in this case the number of processes I'm allowed to use must divide the number of frames without a remainder. For example: if I have 10 frames I can only use 1, 2, 5 or 10 processes; if I use 3 or 4, a process will be created while the Queue is empty and it won't work.
If you want to reuse the process until the queue is empty, you should just try something like this:
code1:
def proccesframe():
    while True:
        frame = queue.get()
        # do something
Your process will be blocked until there is something in the queue.
I don't think it's a good idea to use multiprocessing on the consumer part; you should use it on the producer.
If you want to terminate the process when the queue is empty, you can do something like this:
code2:
def proccesframe():
    while not queue.empty():
        frame = queue.get()
        # do something
    terminate_procces()
Update:
If you want to use multiprocessing in the consumer part, just do a simple loop and add code2; then you will be able to close your processes when you finish doing stuff with the queue, as in the sketch below.
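A rough sketch of that combination, assuming the producer has already filled a multiprocessing.Queue named queue (the loop body stands in for your frame processing):
from multiprocessing import Process, Queue

def consumer(queue):
    # code2-style loop: each consumer drains the queue until it is empty, then exits
    # (note: empty() is only a snapshot, so this works best once the producer is done)
    while not queue.empty():
        frame = queue.get()
        # do something with frame

if __name__ == '__main__':
    queue = Queue()
    for frame in range(10):  # stand-in for the producer filling the queue
        queue.put(frame)
    consumers = [Process(target=consumer, args=(queue,)) for _ in range(3)]
    for p in consumers:
        p.start()
    for p in consumers:
        p.join()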
I am not entirely sure what you are trying to accomplish from your explanation, but have you considered using multiprocessing.Pool with its methods map or map_async?
from multiprocessing import Pool
from foo import bar  # your function

if __name__ == "__main__":
    p = Pool(4)  # your number of processes
    result = p.map_async(bar, [("arg #1", "arg #2"), ...])
    print(result.get())
It collects the results from your function into a list that you can use however you wish.
UPDATE
I think you should not use a queue and should be more straightforward:
from multiprocessing import Pool

def process_frame(fr):  # PEP8 and see the difference in definition
    # magic
    return result  # and result handling!

if __name__ == "__main__":
    p = Pool(4)  # your number of processes
    results = p.map_async(process_frame, [fr_1, fr_2, ...])
    # Do not ever write or manipulate files in parallel processes
    # if you are not 100% sure what you are doing!
    for result in results.get():
        commands_file.write(result)
UPDATE 2
from multiprocessing import Pool
import random
import time

def f(x):
    return x * x

def g(yr):
    with open("result.txt", "ab") as f:
        for y in yr:
            f.write("{}\n".format(y))

if __name__ == '__main__':
    pool = Pool(4)
    while True:
        # here you fetch new data and send it to process
        new_data = [random.randint(1, 50) for i in range(4)]
        pool.map_async(f, new_data, callback=g)
        time.sleep(1)  # throttle submissions so tasks do not pile up faster than they are processed
Here is an example of how to do it; I updated the algorithm to be "infinite", so it can only be closed by interruption or a kill command from outside. You can also use apply_async, but it would cause slowdowns in result handling (depending on the speed of processing).
I also tried keeping result.txt open long-term in global scope, but it hit a deadlock every time.
I have a complex problem with the Python multiprocessing module.
I have built a script that at one point has to call a multi-argument function (call_function) for each element in a specific list. My idea is to define an integer 'N' and split this problem across single subprocesses.
li = [a, b, c, d, e]  # elements are ints

for element in li:
    call_function(element, string1, string2, int1)

call_summary_function()
The summary function will analyze the results obtained by all iterations of the loop. Now, I want each iteration to be carried out by a single subprocess, but there cannot be more than N subprocesses running at once. If the limit is reached, the main process should wait until one of the subprocesses ends and then start another iteration. Also, call_summary_function needs to be called after all the subprocesses finish.
I have tried my best with the multiprocessing module, Locks, and global variables to track the actual number of running subprocesses (to compare against N), but I get an error every time.
//--------------EDIT-------------//
First, the main process code:
MAX_PROCESSES = 3
lock = multiprocessing.Lock()
processes = 0
k = 0
while k < len(k_list):
    if processes <= MAX_PROCESSES:  # running processes <= 'N' set by me
        p = multiprocessing.Process(target=single_analysis, args=(k_list[k], main_folder, training_testing, subsets, positive_name, ratio_list, lock, processes))
        p.start()
        k += 1
    else:
        time.sleep(1)
while processes > 0:
    time.sleep(1)
Now, the function that is called by multiprocessing:
def single_analysis(k, main_folder, training_testing, subsets, positive_name, ratio_list, lock, processes):
    lock.acquire()
    processes += 1
    lock.release()
    # stuff to do
    lock.acquire()
    processes -= 1
    lock.release()
I get the error that the int value (the processes variable) is always equal to 0, since the single_analysis() function seems to create a new, local processes variable.
When I make processes global, declare it in single_analysis() with the global keyword, and print processes within the function, I get 1 printed len(li) times...
What you're describing is perfectly suited for multiprocessing.Pool - specifically its map method:
import multiprocessing
from functools import partial

def call_function(string1, string2, int1, element):
    # Do stuff here

if __name__ == "__main__":
    li = [a, b, c, d, e]
    p = multiprocessing.Pool(N)  # The pool will contain N worker processes.
    # Use partial so that we can pass a method that takes more than one argument to map.
    func = partial(call_function, string1, string2, int1)
    results = p.map(func, li)
    call_summary_function(results)
p.map will call call_function(string1, string2, int1, element) for each element in the li list. results will be a list containing the value returned by each call to call_function. You can pass that list to call_summary_function to process the results.