I have recently heard some things about parallel programming. I know basically nothing about it yet, but I will read up on it. As a start, is it possible to run, for instance, this code:
for i in range(1000):
    print('Process1:', i)

for j in range(1000):
    print('Process2:', j)
in parallel? This is just a toy example, but it would help me understand the "potential" of parallel programming.
And, if that code can be run in parallel, will it print output from both loops interleaved in the following manner:
Process1: 0
Process2: 0
Process1: 1
.
.
.
or what?
Since you are a beginner, here is the same thing done with threading:
(Note that the output is not controlled in any way, so lines from the two threads may get mixed up!)
import threading

def f1():
    for i in range(1000):
        print('Process1:', i)

def f2():
    for j in range(1000):
        print('Process2:', j)

t1 = threading.Thread(target=f1)
t2 = threading.Thread(target=f2)
t1.start()
t2.start()
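If you want true parallelism with separate processes rather than threads (threads in CPython share one interpreter, so CPU-bound loops mostly take turns), a minimal sketch using the multiprocessing module could look like the following. This is only an illustration, reusing the same f1/f2 functions:

import multiprocessing

def f1():
    for i in range(1000):
        print('Process1:', i)

def f2():
    for j in range(1000):
        print('Process2:', j)

if __name__ == '__main__':
    # Each Process runs in its own interpreter, so the loops can execute truly in parallel
    p1 = multiprocessing.Process(target=f1)
    p2 = multiprocessing.Process(target=f2)
    p1.start()
    p2.start()
    p1.join()
    p2.join()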
The short answer is yes, and how the output will look depends on which method you use.
For example, if you use concurrent.futures, you can print the output as it is produced inside the function, and the order will be scrambled.
If you instead want to access the return values, you can choose to access them as they are completed, in whichever order they happen to finish, or you can use the map function to retrieve them in the order they were submitted.
import concurrent.futures

def test_function(arguments: tuple):
    """Print the test value 1000 times."""
    test_value, function = arguments
    for i in range(0, 1000):
        print(f"Function {function}, {test_value}, iteration {i}")
    return test_value

def main():
    """Main function"""
    # Context manager for parallel tasks
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Submit example. Executes the function calls asynchronously
        result = [executor.submit(test_function, (i, "submit")) for i in range(1, 21)]

        # Map example.
        # Takes an iterable as argument and executes the function once for each item
        result_2 = executor.map(test_function, [(i, "map") for i in range(1, 21)])

        for future in concurrent.futures.as_completed(result):
            print(f"Submit: Process {future.result()} completed")

        for value in result_2:  # map yields return values directly, in submission order
            print(f"Map: Process {value} completed")

if __name__ == '__main__':
    main()
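For CPU-bound work, the same pattern should also work with concurrent.futures.ProcessPoolExecutor, which exposes the same submit/map/as_completed interface. A minimal sketch, assuming the function and its arguments are picklable and the pool is created under an if __name__ == '__main__': guard:

import concurrent.futures

def square(n: int) -> int:
    """Toy CPU-bound task."""
    return n * n

if __name__ == '__main__':
    # Same executor interface, but the work runs in separate processes
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for value in executor.map(square, range(10)):
            print(value)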
How can I add 1 to a variable every 2 minutes until it reaches 100?
The program should start the number at 0 and add 1 every 2 minutes until it reaches 100.
2 min later
0/100 ------> 1/100
Use the sleep function from the time module:

from time import sleep

i = 0
while i < 100:
    sleep(120)
    i += 1
I used sleep!
from time import sleep

for i in range(100):
    sleep(120)
    # print(i)
If you need to make a progressbar, you can also check tqdm.
from tqdm import tqdm
import time

for _ in tqdm(range(100)):
    time.sleep(120)
One-line solution:

from time import sleep

for t in range(100): sleep(120)
I believe all the solutions presented so far are blocking?
import asyncio

async def waitT(tWait, count):
    print(count)
    while count < 100:  # The 100 could be passed as a param to make it more generic
        await asyncio.sleep(tWait)
        count = count + 1
        print(count)
    return

async def myOtherFoo():
    # Do stuff in here
    print("aaa")
    await asyncio.sleep(2)
    print("test that it's working")
    return

async def main(count):
    # Obviously tweak 120 to whatever interval in seconds you want
    await asyncio.gather(myOtherFoo(), waitT(120, count))
    return

if __name__ == "__main__":
    count = 0
    asyncio.run(main(count))
A simple, and hopefully readable, async solution. It doesn't check for running loops etc., but it should open up a range of possibilities for acting between each update of your counter.
Let us consider the following code, where I calculate the factorial of 4 really large numbers, saving each output to a separate .txt file (out_mp_{idx}.txt). I use multiprocessing (4 processes) to reduce the computation time. Though this works fine, I want to output all 4 results in one file.
One way is to open each of the (4) generated files and append them to a new file, but that is not my preference (below is just a simplified version of my code; I have too many files to handle, which defeats the purpose of the time saved via multiprocessing). Is there a better way to automate this so that the results from all the processes are dumped/appended to a single file? Also, in my case the result returned by each process can be several lines, so how do we avoid file-access conflicts when one process is appending its results to the output file and a second process finishes and wants to open/access the same file?
As an alternative, I tried the pool.imap route, but that is not as computationally efficient as the code below. Something like this SO post.
from multiprocessing import Process
import os
import time

tic = time.time()

def factorial(n, idx):  # function to calculate the factorial
    num = 1
    while n >= 1:
        num *= n
        n = n - 1
    with open(f'out_mp_{idx}.txt', 'w') as f0:  # saving output to a separate file
        f0.write(str(num))

def My_prog():
    jobs = []
    N = [10000, 20000, 40000, 50000]  # numbers for which factorial is desired
    n_procs = 4

    # executing multiple processes
    for i in range(n_procs):
        p = Process(target=factorial, args=(N[i], i))
        jobs.append(p)

    for j in jobs:
        j.start()
    for j in jobs:
        j.join()

    print(f'Exec. Time:{time.time()-tic} [s]')

if __name__ == '__main__':
    My_prog()
You can do this:
1) Create a Queue
a) manager = Manager()
b) data_queue = manager.Queue()
c) Put all the data in this queue.
2) Create a thread and start it before the multiprocessing, with a function that waits on data_queue. Something like:

def fun():
    while True:
        data = data_queue.get()
        if isinstance(data, Sentinel):
            break
        # write data to the file

3) Remember to put a Sentinel object on the queue after all the processes are done.
You can also make this thread a daemon thread and skip the sentinel part.
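To make the idea concrete, here is a minimal runnable sketch of that pattern applied to the factorial example above (an illustration only; the output filename and string sentinel are assumptions, not part of the original code). The workers put their results on a managed queue, and a single writer thread drains the queue into one file, so only one writer ever touches the output file:

import threading
from multiprocessing import Process, Manager

SENTINEL = 'DONE'  # any unique value works as the sentinel

def factorial(n, idx, data_queue):
    num = 1
    while n >= 1:
        num *= n
        n -= 1
    # hand the result to the writer instead of opening a file here
    data_queue.put(f'{idx}: {num}\n')

def writer(data_queue, path):
    # Single writer: drains the queue until the sentinel arrives
    with open(path, 'w') as f0:
        while True:
            item = data_queue.get()
            if item == SENTINEL:
                break
            f0.write(item)

if __name__ == '__main__':
    manager = Manager()
    data_queue = manager.Queue()

    t = threading.Thread(target=writer, args=(data_queue, 'out_mp_all.txt'))
    t.start()

    N = [10000, 20000, 40000, 50000]
    jobs = [Process(target=factorial, args=(N[i], i, data_queue)) for i in range(4)]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()

    data_queue.put(SENTINEL)  # all processes done, tell the writer to stop
    t.join()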
I am very new to Python. I am currently using Jupyter Notebook and I need to print the variable "pos_best_g" outside of the following class:
class PSO():
    def __init__(self, costFunc, x0, bounds, num_particles, maxiter):
        global num_dimensions

        num_dimensions = len(x0)
        err_best_g = -1   # best error for group
        pos_best_g = []   # best position for group

        # establish the swarm
        swarm = []
        for i in range(0, num_particles):
            swarm.append(Particle(x0))

        # begin optimization loop
        i = 0
        while i < maxiter:
            # print i, err_best_g
            # cycle through particles in swarm and evaluate fitness
            for j in range(0, num_particles):
                swarm[j].evaluate(costFunc)

                # determine if current particle is the best (globally)
                if swarm[j].err_i < err_best_g or err_best_g == -1:
                    pos_best_g = list(swarm[j].position_i)
                    err_best_g = float(swarm[j].err_i)

            # cycle through swarm and update velocities and position
            for j in range(0, num_particles):
                swarm[j].update_velocity(pos_best_g)
                swarm[j].update_position(bounds)
            i += 1

        # print final results
        print('FINAL:')
        print(pos_best_g)
        print(err_best_g)

initial = [5, 5, 5, 5, 5]  # initial starting location [x1, x2, ...]
bounds = [(-10, 10), (-10, 10), (-10, 10), (-10, 10), (-10, 10)]  # input bounds [(x1_min, x1_max), (x2_min, x2_max), ...]
PSO(func1, initial, bounds, num_particles=15, maxiter=30)
At the moment I get the following result:
FINAL:
[4.999187204673611, 5.992158863901226, 4.614395966906296, 0.7676323454298957, 8.533876878259441]
0.001554888332705297
However, I don't know how to extract the results as they are all within an In[] cell and not an Out[] cell.
What do I need to do to enable this?
Many thanks
There are 2 ways to do this:
1. Return "pos_best_g" at the end. Note that __init__ itself must return None, so for this you would move the optimization loop into a separate method and return the value from there.
2. Define a variable before the class, declare it as global inside __init__, and assign to it at the end, like:

your_new_variable = None

class PSO():
    def __init__(self, costFunc, x0, bounds, num_particles, maxiter):
        global num_dimensions
        global your_new_variable
        ...
        print('FINAL:')
        print(pos_best_g)
        print(err_best_g)
        your_new_variable = pos_best_g
Set the results you want to extract as attributes on the instance:

class PSO():
    def __init__(self, costFunc, x0, bounds, num_particles, maxiter):
        ...
        # Set instance attributes
        self.pos_best_g = pos_best_g
        self.err_best_g = err_best_g
Then you can access them from the object:

pso = PSO(func1, initial, bounds, num_particles=15, maxiter=30)

# print final results
print('FINAL:')
print(pso.pos_best_g)
print(pso.err_best_g)
I am guessing this is your own defined class, right? You can try adding getter methods and later calling them in the Jupyter notebook to store the output results in variables (this assumes pos_best_g and err_best_g are first stored on self, as in the previous answer). Just include these small functions inside your class:
def get_pos_best(self):
    return self.pos_best_g

def get_err_best(self):
    return self.err_best_g
Now, inside your notebook do the following:
object_PSO = PSO(costFunc,x0,bounds,num_particles,maxiter)
list_you_want = object_PSO.get_pos_best()
error_you_want = object_PSO.get_err_best()
Good luck!
I have two classes, one called algorithm and the other called Chain. In algorithm, I create multiple chains, each of which will hold a sequence of sampled values. I want to run the sampling in parallel at the chain level.
In other words, the algorithm class instantiates n chains, and I want to run the _sample method, which belongs to the Chain class, for each of the chains in parallel within the algorithm class.
Below is sample code that attempts what I would like to do.
I have seen a similar question here: Apply a method to a list of objects in parallel using multi-processing, but as shown in the function _sample_chains_parallel_worker, that method does not work for my case (I am guessing it is because of the nested class structure).
Question 1: Why does this not work for this case?
The method in _sample_chains_parallel also does not even run in parallel.
Question 2: Why?
Question 3: How do I sample each of these chains in parallel?
import time
import multiprocessing

class Chain():
    def __init__(self):
        self.thetas = []

    def _sample(self):
        for i in range(3):
            time.sleep(1)
            self.thetas.append(i)

    def clear_thetas(self):
        self.thetas = []

class algorithm():
    def __init__(self, n=3):
        self.n = n
        self.chains = []

    def _init_chains(self):
        for _ in range(self.n):
            self.chains.append(Chain())

    def _sample_chains(self):
        for chain in self.chains:
            chain.clear_thetas()
            chain._sample()

    def _sample_chains_parallel(self):
        pool = multiprocessing.Pool(processes=self.n)
        for chain in self.chains:
            chain.clear_thetas()
            pool.apply_async(chain._sample())
        pool.close()
        pool.join()

    def _sample_chains_parallel_worker(self):
        def worker(obj):
            obj._sample()

        pool = multiprocessing.Pool(processes=self.n)
        pool.map(worker, self.chains)
        pool.close()
        pool.join()

if __name__ == "__main__":
    import time

    alg = algorithm()
    alg._init_chains()

    start = time.time()
    alg._sample_chains()
    end = time.time()
    print "sequential", end - start

    start = time.time()
    alg._sample_chains_parallel()
    end = time.time()
    print "parallel", end - start

    start = time.time()
    alg._sample_chains_parallel_worker()
    end = time.time()
    print "parallel, map and worker", end - start
In _sample_chains_parallel you are calling chain._sample() instead of just passing the function: pool.apply_async(chain._sample()). So you are passing the result as an argument instead of letting apply_async compute it.
But removing the () won't help you much, because Python 2 cannot pickle instance methods (it is possible in Python 3.5+). The error is not raised unless you call get() on the result objects, so don't rejoice if you see low times for this approach; that's because it immediately quits with an unraised exception.
For the parallel versions you would have to relocate worker to module level and call it with pool.apply_async(worker, (chain,)) or pool.map(worker, self.chains), respectively.
Note that you forgot clear_thetas() in _sample_chains_parallel_worker. The better solution anyway would be to let Chain._sample take care of calling self.clear_thetas().
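For illustration, here is a sketch of the module-level-worker approach (an assumption of how it could look, written for Python 3, not the original poster's code). Because each worker runs in a separate process, mutations made to a Chain in the child do not propagate back to the parent, so the worker returns the sampled chain and the parent keeps the returned copies:

import time
import multiprocessing

class Chain():
    def __init__(self):
        self.thetas = []

    def _sample(self):
        self.clear_thetas()  # let _sample handle the clearing, as suggested above
        for i in range(3):
            time.sleep(1)
            self.thetas.append(i)

    def clear_thetas(self):
        self.thetas = []

def worker(chain):
    # Module-level function, so it can be pickled and sent to the pool
    chain._sample()
    return chain  # return the mutated copy to the parent process

if __name__ == "__main__":
    chains = [Chain() for _ in range(3)]
    with multiprocessing.Pool(processes=3) as pool:
        chains = pool.map(worker, chains)  # replace the originals with the sampled copies
    print([c.thetas for c in chains])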
I am fairly new to Python, so kindly excuse me if any information is missing. As part of the curriculum I was introduced to Python for quants/finance, and I am studying multiprocessing and trying to understand it better. I tried modifying the given problem and now I am mentally stuck.
Problem:
I have a function which gives me ticks in OHLC format:
{'scrip_name':'ABC','timestamp':1504836192,'open':301.05,'high':303.80,'low':299.00,'close':301.10,'volume':100000}
every minute. I wish to do the following calculations concurrently and preferably append/insert the results into the same list:
Find the moving average of the last 5 close values
Find the median of the last 5 open values
Save the tick data to a database.
So the expected data would look like:
{'scrip_name':'ABC','timestamp':1504836192,'open':301.05,'high':303.80,'low':299.00,'close':301.10,'volume':100000,'MA_5_open':300.25,'Median_5_close':300.50}
Assuming the data is going to a DB, it's fairly easy to write a simple DB-insert routine; I don't see that as a great challenge, since I can spawn a process to execute an insert statement every minute.
What I want to know is how to sync 3 different functions/processes (one to insert into the DB, one to calculate the average, one to calculate the median), while holding 5 ticks in memory to compute the 5-period simple moving average, and push the results back into the dict/list.
This is what challenges me in writing the multiprocessing routine. Can someone guide me? I don't want to use a pandas DataFrame.
====REVISION/UPDATE===
The reason why I don't want any solution based on pandas/numpy is that my objective is to understand the basics, not the nuances of a new library. Please don't mistake my need for understanding as arrogance or unwillingness to be open to suggestions.
The advantage of having

p1 = Process(target=Median, args=(sourcelist,))
p2 = Process(target=Average, args=(sourcelist,))
p3 = Process(target=insertdb, args=(updatedlist,))

is that it would help me understand the possibility of scaling processes based on the number of function/algo components. But how should I make sure p1 & p2 are in sync, while p3 executes only after p1 & p2?
Here is an example of how to use multiprocessing:
from multiprocessing import Pool, cpu_count
from statistics import median
from functools import partial

def db_func(ma, med):
    # placeholder for your actual database write
    print('saving', ma, med)

def backtest_strat(d, db_func):
    a = d.get('avg')
    db_func(sum(a) / len(a), median(a))

if __name__ == '__main__':
    with Pool(cpu_count()) as p:
        bs = partial(backtest_strat, db_func=db_func)
        print(p.map(bs, [{'avg': [1, 2, 3, 4, 5], 'median': [1, 2, 3, 4, 5]}]))
also see :
https://stackoverflow.com/a/24101655/2026508
Note that this will not speed anything up unless there are a lot of slices.
So, for the speed-up part:
def get_slices(data):
    for slice in data:
        yield {'avg': [1, 2, 3, 4, 5], 'median': [1, 2, 3, 4, 5]}

p.map(bs, get_slices(data))
From what I understand, multiprocessing works by message passing via pickles, so when pool.map is called it has access to all three things: the two arrays and the db_func function. There are of course other ways to go about it, but hopefully this shows one way.
Question: how should I make sure p1 & p2 are in sync while p3 executes after p1 & p2?
If you synchronize all processes, computing one task (p1, p2, p3) can be no faster than the slowest process, and in the meantime the other processes sit idle.
This is the "Producer-Consumer Problem".
Solution: using a Queue, all data is serialized and no explicit synchronization is required.
# Process-1
def Producer():
    task_queue.put(data)

# Process-2
def Consumer(task_queue):
    data = task_queue.get()
    # process data
You want multiple consumer processes, plus one process that gathers all the results.
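A minimal runnable sketch of this queue-based pattern (an illustration only, not part of the original answer), using a multiprocessing.Queue and a sentinel value to stop the consumer; the moving-average and median calculations stand in for your actual algorithms:

import multiprocessing as mp
from statistics import median

SENTINEL = None  # marks the end of the input stream

def consumer(task_queue, result_queue):
    # Pull 5-tick windows until the sentinel arrives, then stop
    while True:
        window = task_queue.get()
        if window is SENTINEL:
            break
        closes = [tick['close'] for tick in window]
        opens = [tick['open'] for tick in window]
        result_queue.put({'MA_5_close': sum(closes) / len(closes),
                          'Median_5_open': median(opens)})

if __name__ == '__main__':
    task_queue = mp.Queue()
    result_queue = mp.Queue()
    worker = mp.Process(target=consumer, args=(task_queue, result_queue))
    worker.start()

    # Producer side: put windows of 5 ticks on the queue
    ticks = [{'open': 300 + i, 'close': 301 + i} for i in range(10)]
    for start in range(0, len(ticks), 5):
        task_queue.put(ticks[start:start + 5])
    task_queue.put(SENTINEL)

    worker.join()
    for _ in range(2):  # two windows of 5 ticks were submitted above
        print(result_queue.get())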
If you don't want to use a Queue but synchronization primitives instead, the example below lets all processes run independently; only the Result process waits until it is notified.
It uses an unbounded task buffer, tasks = mp.Manager().list(). Its size could be kept small if the list entries of completed tasks were reused.
If you have some very fast algorithms, combine several of them into one process.
import multiprocessing as mp
from random import randrange

# Base class for all WORKERS
class Worker(mp.Process):
    tasks = mp.Manager().list()
    task_ready = mp.Condition()
    lock = mp.Lock()
    parties = mp.Manager().Value(int, 0)

    @classmethod
    def join(cls):
        # Wait until all data has been processed (left as a stub in this sketch)
        pass

    def get_task(self):
        for i, task in enumerate(Worker.tasks):
            if task is None: continue
            if not self.__class__.__name__ in task['result']:
                return (i, task['range'])
        return (None, None)

    # Main Process Loop
    def run(self):
        while True:
            # Get a Task for this WORKER
            idx, _range = self.get_task()
            if idx is None:
                break

            # Compute this _range with this worker's own method
            result = self.compute(_range)

            # Update Worker.tasks
            with Worker.lock:
                task = Worker.tasks[idx]
                task['result'][self.__class__.__name__] = result
                parties = len(task['result'])
                Worker.tasks[idx] = task

            # If this was the last result, notify the Result process
            if parties == Worker.parties.value:
                with Worker.task_ready:
                    Worker.task_ready.notify()

class Result(Worker):
    # Main Process Loop
    def run(self):
        while True:
            with Worker.task_ready:
                Worker.task_ready.wait()

            # Get (idx, _range) from the tasks list
            idx, _range = self.get_task()
            if idx is None:
                break

            # process the Task results here
            # Mark this tasks list entry as done, so it can be reused
            Worker.tasks[idx] = None

class Average(Worker):
    def compute(self, _range):
        # return the average of DATA[_range] (pseudocode in the original sketch)
        ...

class Median(Worker):
    def compute(self, _range):
        # return the median of DATA[_range] (pseudocode in the original sketch)
        ...

if __name__ == '__main__':
    DATA = mp.Manager().list()
    WORKERS = [Result(), Average(), Median()]
    Worker.start(WORKERS)  # helper to start all workers, left undefined in this sketch

    # Example: create a Task every 5 records
    for i in range(1, 16):
        DATA.append({'id': i, 'open': 300 + randrange(0, 5), 'close': 300 + randrange(-5, 5)})
        if i % 5 == 0:
            Worker.tasks.append({'range': (i - 5, i), 'result': {}})

    Worker.join()
Tested with Python: 3.4.2