Is there any way to mitigate the cost of multiprocessing.Process.start()? - python

So I've been tooling around with threads and processes in Python, and along the way I cooked up a pattern that allows the same class to be pitched back and forth between threads and/or processes without losing state data by using by-name RPC calls and Pipes.
Everything works fine, but it takes an absurd amount of time to start a process compared to loading the state from a pickled file, whereas Thread.start() returns immediately, so there's only the minor cost of the constructor. So: what's the best way to start a Process with a large initial state without an absurd startup time? Snips and debug output are below; the size of "counter" is just over 34,000K when pickled to file with protocol 2.
...
elif command == "load":
# RPC call - Loads state from file "pickle_name":
timestart = time.time()
print do_remote("take_pickled_state", pickle_name)
print "Load cost: " + str(time.time() - timestart)
elif command == "asproc":
if type(_async) is multiprocessing.Process:
print "Already running as a Process you fool!."
else:
do_remote("stop")
_async.join()
p_pipe.close()
p_pipe, c_pipe = multiprocessing.Pipe()
timestart = time.time()
_async = multiprocessing.Process(target = counter, args = (c_pipe,))
# Why is this so expensive!!?!?!?! AAARRG!!?!
_async.start()
print "Start cost: " + str(time.time() - timestart)
elif command == "asthread":
if type(_async) is threading.Thread:
print "Already running as a Thread you fool!."
else:
# Returns the state of counter on stop:
timestart = time.time()
counter = do_remote("stop")
print "Proc stop time: " + str(time.time() - timestart)
_async.join()
p_pipe.close()
p_pipe, c_pipe = multiprocessing.Pipe()
timestart = time.time()
_async = threading.Thread(target = counter, args = (c_pipe,))
_async.start()
print "Start cost: " + str(time.time() - timestart)
...
Corresponding debug statements:
Waiting for command...
>>> load
Load complete.
Load cost: 2.18700003624
Waiting for command...
>>> asproc
Start cost: 23.3910000324
Waiting for command...
>>> asthread
Proc stop time: 0.921999931335
Start cost: 0.0629999637604
Edit 1:
OS: Win XP 64.
Python version: 2.7.x
Processor: Xeon quad core.
Edit 2:
The thing I really don't get: stopping the process returns the entire state in ~1 second, yet it takes 20x longer to hand over the same state and start. (Debug outputs added above.)
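For anyone hitting the same wall: on Windows, multiprocessing cannot fork, so Process.start() spawns a fresh interpreter and pickles the target and its arguments over a pipe to the child. Below is a rough sketch of how to separate the bare spawn cost from the argument-serialization cost; the dict is just an illustrative stand-in for the ~34 MB "counter" object, not code from the snippet above.

from __future__ import print_function
import multiprocessing
import pickle
import time

def noop(state):
    # child does nothing; we only want the startup cost
    pass

if __name__ == '__main__':
    # illustrative stand-in for the large state (roughly 30+ MB pickled)
    big_state = dict((i, 'x' * 100) for i in range(300000))

    # cost of serializing the state with protocol 2, as in the file dump
    t = time.time()
    blob = pickle.dumps(big_state, 2)
    print('pickle cost:', time.time() - t, 'seconds for', len(blob), 'bytes')

    # cost of spawning a child with no large state attached
    t = time.time()
    p = multiprocessing.Process(target=noop, args=(None,))
    p.start()
    p.join()
    print('bare spawn cost:', time.time() - t, 'seconds')

If the bare spawn is quick and the gap only opens once the real state is passed as an argument, the time is going into pickling and piping the state to the child; in that case, writing the state to a file first and passing only the file name to the child may be the cheaper route.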

Related

Comparing old CPU usage with new one using python not working

I am trying to check the CPU usage and print it to the user every 20 seconds. When the code first runs it should print the CPU usage, but after that it should compare the old CPU usage with the new one, and when there is an increase in CPU usage of over 9%, print every process using over 1% of the CPU, using the code below:
for process in psutil.process_iter():
    p = process.as_dict()
    if p["memory_percent"] > 1:
        print(p["name"] + " is using " + str(p["memory_percent"]) + " of the CPU")
Here is what I have tried:
import psutil
import time

status = 'start'
while (status != 'stop'):
    old_cpu_usage = 0
    cpu_usage_percent = psutil.cpu_percent(1, False)
    print()
    # on the first pass (old_cpu_usage == 0) print the current usage; otherwise compare old vs new
    if (old_cpu_usage == 0):
        print(f"The CPU usage is {str(cpu_usage_percent)}%")
        old_cpu_usage = cpu_usage_percent
    elif ((cpu_usage_percent - old_cpu_usage) / cpu_usage_percent * 100 > 9):
        for process in psutil.process_iter():
            p = process.as_dict()
            if p["memory_percent"] > 1:
                print(p["name"] + " is using " + str(p["memory_percent"]) + " of the CPU")
        old_cpu_usage = cpu_usage_percent
    else:
        print(f"The CPU usage did not increase significantly, the usage is {str(cpu_usage_percent)}%")
        old_cpu_usage = cpu_usage_percent
    time.sleep(2)
Upon trying the code above, it doesn't work; it just keeps printing the first if statement and never makes any comparison.
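The first branch always fires because old_cpu_usage is reset to 0 at the top of every iteration, so the comparison branches are unreachable. Here is a minimal sketch of the likely fix, with the initialization moved outside the loop; it also reads cpu_percent in the inner loop on the assumption that CPU share was intended, where the original read memory_percent while printing "of the CPU".

import psutil
import time

old_cpu_usage = 0  # initialize once, before the loop, so it survives iterations
while True:
    cpu_usage_percent = psutil.cpu_percent(1, False)
    if old_cpu_usage == 0:
        print(f"The CPU usage is {cpu_usage_percent}%")
    elif cpu_usage_percent and (cpu_usage_percent - old_cpu_usage) / cpu_usage_percent * 100 > 9:
        for process in psutil.process_iter():
            p = process.as_dict()
            # report per-process CPU share (the original read memory_percent here)
            if p.get("cpu_percent") and p["cpu_percent"] > 1:
                print(f"{p['name']} is using {p['cpu_percent']}% of the CPU")
    else:
        print(f"The CPU usage did not increase significantly, the usage is {cpu_usage_percent}%")
    old_cpu_usage = cpu_usage_percent
    time.sleep(2)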

How to call a pool with sleep between executions within a multiprocessing process in Python?

In the main function, I am starting a process to run the imp_workload() method in parallel for each DP_WORKLOAD:
#!/usr/bin/env python
import multiprocessing
import subprocess

if __name__ == "__main__":
    for DP_WORKLOAD in DP_WORKLOAD_NAME:
        p1 = multiprocessing.Process(target=imp_workload, args=(DP_WORKLOAD, DP_DURATION_SECONDS, DP_CONCURRENCY, ))
        p1.start()
However, inside this imp_workload() method, I need the import_command_run() method to run in a number of processes (the number given by the variable DP_CONCURRENCY), with a sleep of 60 seconds before each new execution.
This is the sample code I have written.
def imp_workload(DP_WORKLOAD, DP_DURATION_SECONDS, DP_CONCURRENCY):
    while DP_DURATION_SECONDS > 0:
        pool = multiprocessing.Pool(processes = DP_CONCURRENCY)
        for j in range(DP_CONCURRENCY):
            pool.apply_async(import_command_run, args=(DP_WORKLOAD, dp_workload_cmd, j,)
            # Sleep for 1 minute
            time.sleep(60)
        pool.close()
        # Clean the schemas after import is completed
        clean_schema(DP_WORKLOAD)
        # Sleep for 1 minute
        time.sleep(60)

def import_command_run(DP_WORKLOAD):
    abccmd = 'impdp admin/DP_PDB_ADMIN_PASSWORD#DP_PDB_FULL_NAME SCHEMAS=ABC'
    defcmd = 'impdp admin/DP_PDB_ADMIN_PASSWORD#DP_PDB_FULL_NAME SCHEMAS=DEF'
    # any of the above commands
    run_imp_cmd(eval(dp_workload_cmd))

def run_imp_cmd(cmd):
    output = subprocess.Popen([cmd], shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
    stdout, stderr = output.communicate()
    return stdout
When I tried running it in this format, I got the following error:
time.sleep(60)
^
SyntaxError: invalid syntax
So, how can I kick off the 'abccmd' job DP_CONCURRENCY times in parallel, with a sleep of 1 minute between each job, while each of these pools runs in its own process?
I am working on Python 2.7.5 (due to restrictions I can't use Python 3.x, so I will appreciate answers specific to Python 2.x).
P.S. This is a very large and complex script, so I have posted only the relevant excerpts. Please ask for more details if necessary (or if it is not clear from this much).
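First, the immediate SyntaxError reported at time.sleep(60) is almost certainly the unclosed parenthesis on the pool.apply_async(...) line just above it; the parser only notices the problem on the following statement. The corrected call would be:

pool.apply_async(import_command_run, args=(DP_WORKLOAD, dp_workload_cmd, j,))  # note the final closing parenthesis

With the syntax fixed, the more interesting question is how to schedule the jobs.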
Let me offer two possibilities:
Possibility 1
Here is an example of how you would kick off a worker function in parallel with DP_CONCURRENCY == 4 possible arguments (0, 1, 2 and 3), cycling over and over for up to DP_DURATION_SECONDS seconds with a pool size of DP_CONCURRENCY. Each job is resubmitted as soon as it completes, but with the guarantee that at least TIME_BETWEEN_SUBMITS == 60 seconds elapse between successive submissions of the same argument.
from __future__ import print_function
from multiprocessing import Pool
import time
try:
    from queue import SimpleQueue  # Python 3.7+
except ImportError:
    from Queue import Queue as SimpleQueue  # Python 2 fallback

TIME_BETWEEN_SUBMITS = 60

def worker(i):
    print(i, 'started at', time.time())
    time.sleep(40)
    print(i, 'ended at', time.time())
    return i # the argument

def main():
    q = SimpleQueue()

    def callback(result):
        # every time a job finishes, put result (the argument) on the queue
        q.put(result)

    DP_CONCURRENCY = 4
    DP_DURATION_SECONDS = TIME_BETWEEN_SUBMITS * 10
    pool = Pool(DP_CONCURRENCY)
    t = time.time()
    expiration = t + DP_DURATION_SECONDS
    # kick off initial tasks:
    start_times = [None] * DP_CONCURRENCY
    for i in range(DP_CONCURRENCY):
        pool.apply_async(worker, args=(i,), callback=callback)
        start_times[i] = time.time()
    while True:
        i = q.get() # wait for a job to complete
        t = time.time()
        if t >= expiration:
            break
        time_to_wait = TIME_BETWEEN_SUBMITS - (t - start_times[i])
        if time_to_wait > 0:
            time.sleep(time_to_wait)
        pool.apply_async(worker, args=(i,), callback=callback)
        start_times[i] = time.time()
    # wait for all jobs to complete:
    pool.close()
    pool.join()

# required by Windows:
if __name__ == '__main__':
    main()
Possibility 2
This is closer to what you had, in that TIME_BETWEEN_SUBMITS == 60 seconds of sleeping is done between the submissions of any two jobs. But to me this doesn't make as much sense. If, for example, the worker function only took 50 seconds to complete, you would not be doing any parallel processing at all. In fact, each job would need to take at least 180 seconds (i.e. (DP_CONCURRENCY - 1) * TIME_BETWEEN_SUBMITS) to complete in order to have all 4 processes in the pool busy running jobs at the same time.
from __future__ import print_function
from multiprocessing import Pool
import time
try:
    from queue import SimpleQueue  # Python 3.7+
except ImportError:
    from Queue import Queue as SimpleQueue  # Python 2 fallback

TIME_BETWEEN_SUBMITS = 60

def worker(i):
    print(i, 'started at', time.time())
    # A task must take at least 180 seconds to run to have 4 tasks running in parallel if
    # you wait 60 seconds between starting each successive task:
    # take 182 seconds to run
    time.sleep(3 * TIME_BETWEEN_SUBMITS + 2)
    print(i, 'ended at', time.time())
    return i # the argument

def main():
    q = SimpleQueue()

    def callback(result):
        # every time a job finishes, put result (the argument) on the queue
        q.put(result)

    # at most 4 tasks at a time but only if worker takes at least 3 * TIME_BETWEEN_SUBMITS
    DP_CONCURRENCY = 4
    DP_DURATION_SECONDS = TIME_BETWEEN_SUBMITS * 10
    pool = Pool(DP_CONCURRENCY)
    t = time.time()
    expiration = t + DP_DURATION_SECONDS
    # kick off initial tasks:
    for i in range(DP_CONCURRENCY):
        if i != 0:
            time.sleep(TIME_BETWEEN_SUBMITS)
        pool.apply_async(worker, args=(i,), callback=callback)
    time_last_job_submitted = time.time()
    while True:
        i = q.get() # wait for a job to complete
        t = time.time()
        if t >= expiration:
            break
        time_to_wait = TIME_BETWEEN_SUBMITS - (t - time_last_job_submitted)
        if time_to_wait > 0:
            time.sleep(time_to_wait)
        pool.apply_async(worker, args=(i,), callback=callback)
        time_last_job_submitted = time.time()
    # wait for all jobs to complete:
    pool.close()
    pool.join()

# required by Windows:
if __name__ == '__main__':
    main()

How can I run a script for 18 hours in Python?

if __name__ == '__main__':
    print("================================================= \n")
    print 'The test will be running for: 18 hours ...'
    get_current_time = datetime.now()
    test_ended_time = get_current_time + timedelta(hours=18)
    print 'Current time is:', get_current_time.time(), 'Your test will be ended at:', test_ended_time.time()
    autodb = autodb_connect()
    db = bw_dj_connect()
    started_date, full_path, ips = main()
    pid = os.getpid()
    print('Main Process is started and PID is: ' + str(pid))
    start_time = time.time()
    process_list = []
    for ip in ips:
        p = Process(target=worker, args=(ip, started_date, full_path))
        p.start()
        p.join()
        child_pid = str(p.pid)
        print('PID is:' + child_pid)
        process_list.append(child_pid)
    child = multiprocessing.active_children()
    print process_list
    while child != []:
        time.sleep(1)
        child = multiprocessing.active_children()
    print ' All processes are completed successfully ...'
    print '_____________________________________'
    print(' All processes took {} second!'.format(time.time() - start_time))
I have a Python test script which should run for 18 hours and then kill itself. The script uses multiprocessing for multiple devices. The data I am getting from the main() function changes over time.
I am passing these three args to the worker method in multiprocessing.
How can I achieve that?
If you don't need to worry too much about cleanup in the child processes, you can kill them using .terminate():
...
time.sleep(18 * 60 * 60)  # go to sleep for 18 hours
children = multiprocessing.active_children()
for child in children:
    child.terminate()
for child in multiprocessing.active_children():
    child.join()  # wait for the children to terminate
If you do need to do some cleanup in the child processes, then you need to modify their run loop (I'm assuming while True) to monitor the time passing, as sketched below, and keep only the second while loop above in the main program, waiting for the children to exit on their own.
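A minimal sketch of what that cooperative run loop could look like; do_one_unit_of_work and cleanup are hypothetical placeholders for the real per-device work, and the deadline is computed once in the parent and passed to each child.

from datetime import datetime, timedelta

def do_one_unit_of_work(ip):
    pass  # hypothetical stand-in for the real per-device work

def cleanup(ip):
    pass  # hypothetical stand-in for the real cleanup

def worker(ip, deadline):
    # loop until the deadline passes, then clean up and exit voluntarily
    while datetime.now() < deadline:
        do_one_unit_of_work(ip)
    cleanup(ip)

# in the parent, compute the deadline once and pass it to each child:
# deadline = datetime.now() + timedelta(hours=18)
# p = Process(target=worker, args=(ip, deadline))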
You are never comparing datetime.now() to test_ended_time:

# check whether the current time has passed the 18-hour end point
while datetime.now() < test_ended_time and multiprocessing.active_children():
    print('still running my process.')
    time.sleep(1)
sys.exit(0)

Python create a thread and start it on key pressed

I've made a Python script that factorizes a number into its prime factors. However, when dealing with big numbers I'd like to have an idea of the progress of the computation. (I simplified the script.)
import time, sys, threading

num = int(input("Input the number to factor: "))
factors = []

def check_progress():
    but = input("Press p: ")
    if but == "p":
        tot = int(num**(1/2))
        print("Step ", k, " of ", tot, " -- ", round(k*100/tot, 5), "%", end="\r", sep="")

t = threading.Thread(target=check_progress)  # ?
t.daemon = True  # ?
t.start()  # ?

k = 1
while k != int(num**(1/2)):
    k = k + 1
    if num % k == 0:
        factors.append(int(k))
        num = num // k
        k = 1
print(factors)
I'm wondering if there is a way to show the progress on demand: for example, during the loop, I press a key and it prints the progress. How can I implement a thread or something like that in my script?
Thanks, and sorry for my English.
Edit:
def check_progress():
    while True:
        but = input("## Press return to show progress ##")
        tot = int(num**(1/2))
        print("Step ", k, " of ", tot, " -- ", round(k*100/tot, 5), "%", sep="")
Here is one possible design:
Main thread:
    create queue and thread
    start the progress thread
    wait for user input
    on input:
        pop result from queue (may be None)
        display it
        loop

Progress thread:
    do the work and put status in the queue

I can provide an example, but I feel you are willing to learn. Feel free to comment for help.
Edit: Full example with queue.
from time import sleep
from Queue import Queue, Empty
from threading import Thread

# Main thread:
def main():
    # create queue and thread
    queue = Queue()
    thread = Thread(target=worker, args=(queue,))
    # start the progress thread
    thread.start()
    # wait for user input
    while thread.isAlive():
        raw_input('--- Press return to show status ---')
        # drain the queue and keep the most recent result (may be None);
        # get_nowait() raises Empty rather than returning None, so catch it
        status = None
        try:
            while True:
                status = queue.get_nowait()
        except Empty:
            pass
        # display it
        if status is not None:
            print 'Progress: %s%%' % status
        else:
            print 'No status available'

# Progress thread:
def worker(queue):
    # do the work and put status in the queue
    # Simulate long work ...
    for x in xrange(100):
        # put status in queue
        queue.put_nowait(x)
        sleep(.5)

if __name__ == '__main__':
    main()

Python Bootstrap Error: Can't Start New Thread with only two threads running

From what I've gathered by looking through the website, the following error is due to the program having too many threads open and hitting some sort of resource barrier:
_start_new_thread(self.__bootstrap, ())
error: can't start new thread
The fact that it's creating too many threads would be all fine and dandy and would be an acceptable reason, except I'm only trying to run two threads. These threads are countdown timers.
When clicking the first button to start the first timer, it gives me this error but the timer still starts running. When I click on the second button to start the second timer, it gives me this error then crashes the program.
Here's the relevant code.
def OB():
    t1 = Thread(target = OB)
    t1.start()
    OBx = 300
    for x in xrange(1, 300):
        OwnBlue.configure(text = "Own Blue Is Up In " + str(OBx) + " Seconds.")
        OBx = OBx - 1
        time.sleep(1)
        root.update()

def OR():
    t2 = Thread(target = OR)
    t2.start()
    ORx = 300
    for x in xrange(1, 300):
        OwnRed.configure(text = "Own Red Is Up In " + str(ORx) + " Seconds.")
        ORx = ORx - 1
        time.sleep(1)
        root.update()
Aight guys, I figured this program out.
When you target a function with a thread, the thread runs that function. Because each thread here was created inside the very function it targets, every call spawned another thread running the same function, so the function kept being run (and spawning threads) without end.
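A minimal sketch of the fix that diagnosis implies: create the thread outside the function it runs, for example in a button handler, so the worker only does the counting. OwnBlue and the surrounding Tkinter setup are assumed from the question; note also that updating Tk widgets from a worker thread is not strictly safe, so a production version would hand updates back to the main loop (e.g. via root.after).

from threading import Thread
import time

def OB():
    # only counts down; creates no threads itself
    OBx = 300
    for x in xrange(1, 300):
        OwnBlue.configure(text = "Own Blue Is Up In " + str(OBx) + " Seconds.")
        OBx = OBx - 1
        time.sleep(1)

def start_OB():
    # hypothetical button handler: the thread is created here, outside OB()
    t1 = Thread(target = OB)
    t1.daemon = True
    t1.start()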
