I have 200 SQL Servers, each with the same 40 tables. I want to execute my Python data pipeline per table across all 200 servers, one thread per server. I just want to know whether I can run 10 threads concurrently at a time and, as they finish, automatically start the next 10 until all 200 threads have completed in a single job.
startTime = time.time()
jobs = []
for index, row in shops.iterrows():
    tf = isOpen(row['shop_ip'], row['port'])
    endTime = time.time()
    pingTime = endTime - startTime
    if tf:
        print(f"UP {row['shop_ip']} Ping Successful Time Taken : {pingTime} seconds")
        x = threading.Thread(target=ETLScript.ETLLoadingShopPos,
                             args=(SelectColumns, tableName, tableName, row['shop_code'],
                                   'where 1=1', str(row['shop_code']), row))
        jobs.append(x)
        x.start()
        x.join()  # joining immediately after start() makes the threads run one at a time
Based on your explanation, does it matter that the 10 threads run as a fixed group, or is the only requirement that at most 10 threads are running at any time?
I think the queue concept will work:
import time
import threading

startTime = time.time()
jobs = []
for index, row in shops.iterrows():
    tf = isOpen(row['shop_ip'], row['port'])
    endTime = time.time()
    pingTime = endTime - startTime
    if tf:
        # block until fewer than 10 worker threads are still alive
        while sum(j.is_alive() for j in jobs) >= 10:
            time.sleep(0.3)  # poll at an interval so the main thread doesn't spin at 100% CPU
        print(f"UP {row['shop_ip']} Ping Successful Time Taken : {pingTime} seconds")
        x = threading.Thread(target=ETLScript.ETLLoadingShopPos,
                             args=(SelectColumns, tableName, tableName, row['shop_code'],
                                   'where 1=1', str(row['shop_code']), row))
        jobs.append(x)
        x.start()
This allows at most 10 threads to run at once: the while loop acts as a blocker, and as soon as one of the running threads finishes, a new thread is started in its place.
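If you do not need the fixed groups-of-10 behavior, the standard library's concurrent.futures.ThreadPoolExecutor gives the same "at most 10 running" semantics with less bookkeeping. A minimal sketch under that assumption, reusing the names from the snippet above (shops, isOpen, ETLScript and friends come from your code and are not defined here):

from concurrent.futures import ThreadPoolExecutor

# at most 10 worker threads run at any time; each finished task
# automatically frees a slot for the next submitted one
with ThreadPoolExecutor(max_workers=10) as pool:
    for index, row in shops.iterrows():
        if isOpen(row['shop_ip'], row['port']):
            pool.submit(ETLScript.ETLLoadingShopPos,
                        SelectColumns, tableName, tableName, row['shop_code'],
                        'where 1=1', str(row['shop_code']), row)
# leaving the with-block waits until all submitted tasks have completed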
I'm trying to run a function in 4 separate instances using the multiprocessing module. The function contains an infinite loop, but for some reason each process only loops once and then control of my terminal window is returned to me.
Here are the 2 functions:
The function that creates the pool:
def mainLogic():
    global direction_array
    pool = Pool()
    for dir in direction_array:
        pool.apply_async(generic_arrow_logic, args=(dir, direction_array.index(dir)))
        print("starting process " + str(direction_array.index(dir)))
    pool.close()
    pool.join()
The function that I'm trying to run infinitely:
def generic_arrow_logic(arrType, thread):
    # average runtime = 0.3 seconds, so roughly 3 fps
    global color_dictionary, key_dictionary, arrowArrayCurr, default_arrow_color
    last_time = time.time()
    parseCoords(False)
    while True:
        working = screenGrab("not")  # numpy array of the entire image
        currArr = cutImage(working, arrType, "not")  # .convert("RGB") - another numpy array
        (height, width, depth) = currArr.shape
        print("Loop on process {0} took {1} seconds...".format(thread, time.time() - last_time))
        last_time = time.time()
        if not (currArr[int(width/2), int(height/2)] == default_arrow_color).all():
            pydirectinput.press(key_dictionary[arrType])
        # sys.stdout.flush()
and this is what happens when I run the program:
starting process 0
starting process 1
starting process 2
starting process 3
Loop on process 0 took 0.02699136734008789 seconds...
Loop on process 2 took 0.04453277587890625 seconds...
Loop on process 1 took 0.060872793197631836 seconds...
Loop on process 3 took 0.07178044319152832 seconds...
Jalen Morgan#gridl0ck-TL MINGW64 ~
$
Does anything stand out in my code that would explain why this doesn't run forever?
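One plausible explanation, assuming the workers are raising an exception: pool.apply_async does not surface errors from the worker function; the exception is stored and only re-raised when .get() is called on the returned AsyncResult, so a worker that crashes on its first iteration looks as if it simply returned. A minimal sketch of how to surface such errors with the standard error_callback parameter of apply_async:

from multiprocessing import Pool

def mainLogic():
    global direction_array
    pool = Pool()
    results = []
    for dir in direction_array:
        r = pool.apply_async(generic_arrow_logic,
                             args=(dir, direction_array.index(dir)),
                             # print any exception a worker raises instead of dropping it
                             error_callback=lambda e: print("worker failed:", e))
        results.append(r)
    pool.close()
    pool.join()
    for r in results:
        r.get()  # re-raises the worker's exception in the parent, if any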
I have a program I want to split into 10 parts with multiprocessing. Each worker will search for the same answer using different variables (in this case it's brute-forcing a password). How do I get the processes to communicate their status, and how do I terminate all processes once one of them has found the answer? Thank you!
If you are going to split it into 10 parts, then you should either have 10 cores or a worker function that is not 100% CPU bound.
The following code initializes each pool process with a multiprocessing.Queue instance to which the worker function will write its result. The main process waits for the first entry written to the queue and then terminates all pool processes. For this demo, the workers are passed the arguments 1, 2, 3, ... 10; each sleeps for that many seconds and then writes the argument it was passed to the queue. So we would expect the worker that was passed 1 to finish first, and the total running time of the program to be only slightly more than 1 second (it takes some time to create the 10 processes):
import multiprocessing
import time

def init_pool(q):
    # make the queue available to the worker as a global in each pool process
    global queue
    queue = q

def worker(x):
    time.sleep(x)
    # write result to queue
    queue.put_nowait(x)

def main():
    queue = multiprocessing.Queue()
    pool = multiprocessing.Pool(10, initializer=init_pool, initargs=(queue,))
    for i in range(1, 11):
        # non-blocking:
        pool.apply_async(worker, args=(i,))
    # wait for the first result
    result = queue.get()
    pool.terminate()  # kill all remaining tasks
    print('Result: ', result)

# required for Windows:
if __name__ == '__main__':
    t = time.time()
    main()
    print('total time =', time.time() - t)
Prints:
Result: 1
total time = 1.2548246383666992
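If terminate() feels too abrupt (it kills workers mid-task, which matters if they hold locks or external resources), a gentler variant is to share a multiprocessing.Event through the same initializer mechanism and let the workers exit on their own. A sketch under that assumption; the 0.1-second sleep stands in for trying one password candidate:

import multiprocessing
import time

def init_pool(e):
    global stop_event
    stop_event = e

def worker(x):
    # hypothetical brute-force loop: poll the shared event between candidates
    while not stop_event.is_set():
        time.sleep(0.1)      # stand-in for trying one candidate
        if x == 1:           # pretend worker 1 finds the answer
            stop_event.set()
            return x
    return None              # someone else found it; exit cleanly

def main():
    event = multiprocessing.Event()
    pool = multiprocessing.Pool(10, initializer=init_pool, initargs=(event,))
    results = [pool.apply_async(worker, args=(i,)) for i in range(1, 11)]
    pool.close()
    pool.join()              # workers exit by themselves once the event is set
    print([r.get() for r in results])

if __name__ == '__main__':
    main()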
I don't want the code to just go to sleep and wait during those 5 minutes. I want to run some other code block or script in the meantime.
How do I run a Python script that executes every 5 minutes BUT in the meantime executes some other script or code block until the 5-minute mark is reached again?
E.g. I want to run 3 functions: one every 5 minutes, another every 1 minute, another every 10-20 seconds.
You can use a simple timing loop that triggers your script when it is due and runs your other code in the meantime:
import time

delay = 1  # time between script executions
wait = delay
t1 = time.time()
while True:
    t2 = time.time() - t1
    if t2 >= wait:
        wait += delay
        # execute your script once every `delay` seconds (set delay = 300 for 5 minutes)
    # execute your other code here; it runs continuously between executions
First, record the script's start time (t1), and keep a variable that stores the script's next due time (here, wait).
Whenever the elapsed time reaches or exceeds wait, delay is added to wait and your code is executed.
And for multiple delays it's:
import time

delay = [1, 3]
wait = [delay[0], delay[1]]
t1 = time.time()
while True:
    t2 = time.time() - t1
    for i in range(len(wait)):
        if t2 >= wait[i]:
            wait[i] += delay[i]
            if i == 0:
                print("This is executed every second")
            if i == 1:
                print("This is executed every 3 seconds")
I'm trying to run a block of code that starts exactly on 5-second boundaries of UTC time, starting at an even minute.
For example it would execute each sample at exactly:
11:20:45
11:20:50
11:20:55
11:21:00
11:21:05
11:21:10
I want that to happen regardless of the execution time of the code block: whether running the code is instant or takes 3 seconds, I still want it to execute at the 5-second UTC time intervals.
I'm not exactly sure how to do this, though I think that datetime.datetime.utcnow().timestamp() - (datetime.datetime.utcnow().timestamp() % 5.0) + 5 gets me the next upcoming start time?
You can use Python's sched module:
import sched
import time

s = sched.scheduler(time.time, time.sleep)

def execute_something(start_time):
    print("starting at: %f" % time.time())
    time.sleep(3)  # simulate a task taking 3 seconds
    print("Done at: %f" % time.time())
    # schedule the next iteration relative to the intended start time, so the
    # schedule never drifts regardless of how long the task itself took
    next_start_time = start_time + 5
    s.enterabs(next_start_time, 1, execute_something, argument=(next_start_time,))

next_start_time = round(time.time() + 5, -1)  # align the first run to a 10-second boundary
s.enterabs(next_start_time, 1, execute_something, argument=(next_start_time,))
print("Starting scheduler at: %f" % time.time())
s.run()
# Starting scheduler at: 1522031714.523436
# starting at: 1522031720.005633
# Done at: 1522031723.008825
# starting at: 1522031725.002102
# Done at: 1522031728.005263
# starting at: 1522031730.002157
# Done at: 1522031733.005365
# starting at: 1522031735.002160
# Done at: 1522031738.005370
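Note that round(time.time() + 5, -1) aligns the first run to a 10-second boundary. To start exactly on a 5-second boundary, as asked, the modulo arithmetic from the question works; the initial scheduling lines would become:

# align to the next 5-second UTC boundary (time.time() is epoch seconds, which are UTC-based)
now = time.time()
next_start_time = now - (now % 5.0) + 5
s.enterabs(next_start_time, 1, execute_something, argument=(next_start_time,))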
Use time.sleep to wait until the desired time. Note that this is approximate; especially when the system is under high load, your process might not be woken in time. You can increase the process priority to improve your chances.
To avoid blocking the waiting thread, run the task in a separate thread, either by constructing a new thread for every task or by using a (faster) thread pool, like this:
import concurrent.futures
import time

def do_something():  # replace this with your real code
    # this stub prints the time and then simulates work taking 0-10 seconds
    import datetime
    import random
    print(datetime.datetime.utcnow())
    time.sleep(10 * random.random())

pool = concurrent.futures.ThreadPoolExecutor()
while True:
    now = time.time()
    time.sleep(5 - now % 5)  # sleep until the next 5-second boundary
    pool.submit(do_something)
I've found that numpy.fft.fft (and its variants) is very slow when run in the background. Here is an example of what I'm talking about:
import numpy as np
import multiprocessing as mproc
import time
import sys

# the producer function, which will run in the background and produce data
def Producer(dataQ):
    numFrames = 5
    n = 0
    while n < numFrames:
        data = np.random.rand(3000, 200)
        dataQ.put(data)  # send the data to the consumer
        time.sleep(0.1)  # sleep for 0.1 second so we don't overload the CPU
        n += 1

# the consumer function, which will run in the background and consume data from the producer
def Consumer(dataQ):
    while True:
        data = dataQ.get()
        t1 = time.time()
        fftdata = np.fft.rfft(data, n=3000*5)
        tDiff = time.time() - t1
        print("Elapsed time is %0.3f" % tDiff)
        time.sleep(0.01)
        sys.stdout.flush()

# the `if __name__ == '__main__':` guard is necessary so this block runs only
# when the program is started by the user, not when child processes import the module
if __name__ == '__main__':
    data = np.random.rand(3000, 200)
    t1 = time.time()
    fftdata = np.fft.rfft(data, n=3000*5, axis=0)
    tDiff = time.time() - t1
    print("Elapsed time is %0.3f" % tDiff)

    # create a queue for transferring data between the producer and the consumer
    dataQ = mproc.Queue(4)

    # start up the processes
    producerProcess = mproc.Process(target=Producer, args=[dataQ], daemon=False)
    consumerProcess = mproc.Process(target=Consumer, args=[dataQ], daemon=False)
    print("starting up processes")
    producerProcess.start()
    consumerProcess.start()

    time.sleep(10)  # let the program run for 10 seconds
    producerProcess.terminate()
    consumerProcess.terminate()
The output it produces on my machine:
Elapsed time is 0.079
starting up processes
Elapsed time is 0.859
Elapsed time is 0.861
Elapsed time is 0.878
Elapsed time is 0.863
Elapsed time is 0.758
As you can see, it is roughly 10x slower when run in the background, and I can't figure out why. The time.sleep() calls should ensure that the other processes (the main process and the producer process) aren't doing anything while the FFT is being computed, so it should be able to use all the cores. I've checked CPU utilization through Windows Task Manager, and it sits at about 25% during heavy numpy.fft.fft calls in both the single-process and multiprocess cases.
Anyone have an idea what's going on?
The main problem is that your FFT call in the background process is:
fftdata = np.fft.rfft(data, n=3000*5)
rather than:
fftdata = np.fft.rfft(data, n=3000*5, axis=0)
which for me made all the difference. Without the axis argument, rfft transforms along the last axis, so the consumer computes 3000 zero-padded 15000-point transforms (one per row of the 3000x200 array); with axis=0 it computes only 200 of them (one per column), which is why the timing test in the main process was roughly 10x faster.
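A quick way to see the two workloads side by side (a standalone snippet, independent of the producer/consumer machinery above):

import time
import numpy as np

data = np.random.rand(3000, 200)

t1 = time.time()
np.fft.rfft(data, n=3000*5)          # last axis: 3000 transforms of 15000 points
print("default axis: %0.3f s" % (time.time() - t1))

t1 = time.time()
np.fft.rfft(data, n=3000*5, axis=0)  # axis 0: only 200 transforms of 15000 points
print("axis=0:       %0.3f s" % (time.time() - t1))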
There are a few other things worth noting. Rather than sprinkling time.sleep() calls everywhere, why not just let the scheduler take care of that itself? Furthermore, rather than suspending the main process for a fixed time, you can use
consumerProcess.join()
and then have the producer process run dataQ.put(None) once it has finished producing the data, so the consumer can break out of its loop, i.e.:
def Consumer(dataQ):
    while True:
        data = dataQ.get()
        if data is None:
            break
        ...
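For completeness, the matching producer-side change would look something like this (a sketch; only the final None sentinel is new relative to the original Producer):

def Producer(dataQ):
    for n in range(5):
        dataQ.put(np.random.rand(3000, 200))
    dataQ.put(None)  # sentinel: tells the consumer to break out of its loop

# ...and in the main block, wait for both processes instead of sleeping:
producerProcess.join()
consumerProcess.join()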