I am developing an online judge using Django on Debian. I run all user scripts with subprocess.Popen. To measure a script's run time I use the time module, and to enforce a time limit I pass the timeout parameter to communicate() on the process object and handle the subprocess.TimeoutExpired exception. Is it possible to do something similar to check the memory usage of the process and to limit it?
This is how I do it now:
try:
    execution = subprocess.Popen(execute_line.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE, shell=False)
    execution.stdin.write(bytes(test.input_data, 'UTF-8'))
    execution.stdin.flush()
    start_time = time.time()
    test_output, test_error_string = execution.communicate(timeout=time_limit)
    end_time = time.time()
    finish_time = end_time - start_time
    test_output = test_output.decode('utf-8')
    test_error_string = test_error_string.decode('utf-8')
except subprocess.TimeoutExpired:
    end_time = time.time()
    finish_time = end_time - start_time
There is no straightforward, one-line way to check memory consumption.
Solutions:
You can use psutil (pip install psutil), for example:
import json
import os
import time

import psutil


def get_process_memory():
    # Resident set size (RSS) of the current process, in bytes
    process = psutil.Process(os.getpid())
    return process.memory_info().rss


def track(func):
    def wrapper(*args, **kwargs):
        mem_before = get_process_memory() / 1024 ** 2  # MB
        start = time.time()
        result = func(*args, **kwargs)
        elapsed_time = time.time() - start
        mem_after = get_process_memory() / 1024 ** 2  # MB
        metrics = {
            'callable': func.__name__,
            'memory_before': mem_before,
            'memory_after': mem_after,
            'memory_used': mem_after - mem_before,
            'exec_time': elapsed_time
        }
        print(json.dumps(metrics, indent=4))
        return result
    return wrapper
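For example, a quick usage sketch of the decorator (build_list is a made-up function and the list size is arbitrary):

@track
def build_list(n):
    # Allocate a noticeably large list so the memory delta is visible
    return list(range(n))

build_list(5_000_000)

This prints the metrics as JSON and still returns the original result.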
You can use similar approaches with the resource module from the standard library (Unix only), for example:
# Memory consumption with psutil (MB)
import os, psutil; print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
# Memory consumption with resource (MB) - Only works on Unix
import resource; print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024)
Use tracemalloc - a bit more complex IMHO
Use an external process/software to monitor metrics of your processes
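Since the question is about a child process started with Popen, here is a minimal sketch (not a hardened sandbox) of how the same pieces could be applied to the child: psutil can poll the child's resident memory by PID, and on Unix resource.setrlimit can cap the child's address space through preexec_fn. The 64 MB limit, the polling interval, and user_script.py are arbitrary placeholders.

import resource
import subprocess
import time

import psutil

MEMORY_LIMIT = 64 * 1024 ** 2  # 64 MB address-space cap (arbitrary example)


def limit_memory():
    # Runs in the child between fork() and exec(); Unix only
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_LIMIT, MEMORY_LIMIT))


proc = subprocess.Popen(["python3", "user_script.py"],  # hypothetical user submission
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        preexec_fn=limit_memory)

peak_rss = 0
while proc.poll() is None:  # sample the child's resident memory until it exits
    try:
        peak_rss = max(peak_rss, psutil.Process(proc.pid).memory_info().rss)
    except psutil.NoSuchProcess:
        break
    time.sleep(0.05)

print(f"peak RSS observed: {peak_rss / 1024 ** 2:.1f} MB")

If the child exceeds the RLIMIT_AS cap, its allocations start failing (a Python submission typically dies with a MemoryError), which the judge can report as a memory-limit-exceeded verdict.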
I have created a Windows internet speed test. I'd like to improve it, make the code more presentable, and better define my functions.
When execution reaches initialise(), it cannot use the variable it needs because that variable is local to another function. How can I rectify this, given that I have various variables being used across different functions?
Feel free to use this speed tester as well; I will also be working on a phone app to run the code.
The code prints the current date and time, finds the connected SSID, initialises the speedtest module, scans for servers, selects the best server, runs a ping test, then a download speed test, then an upload speed test, and finally prints the results on screen and writes them to a simple txt file for viewing later.
Each function reports its run time using the time module, and the total execution time is printed at the end along with the date and time.
It works perfectly without the functions (and on Android without find_ssid()), but I keep running into trouble with local variables.
import speedtest
from datetime import datetime
import subprocess
import re
import time


def main():
    def date():
        dt_now = datetime.now()
        dtn = dt_now.strftime("%a %d-%m-%Y, %H:%M:%S%p")
        return dtn
    print(date())

    def find_ssid():
        stt = time.time()
        cdop = subprocess.run(["netsh", "WLAN", "show", "interfaces"], capture_output=True).stdout.decode()
        ssid = (re.findall("SSID : (.*)\r", cdop))
        for char in ssid:
            ssid = f"Network Name: {char} \n"
        sid = time.time() - stt
        print(f'SSID found in: {sid:.2f}s')
        print(ssid)
    find_ssid()

    def initialise():
        print("Initialising network speed test... ")
        st = speedtest.Speedtest()
        print("Network speed test active.")
        sta = time.time() - stt
        print(f'Speed test activation time: {sta - sid:.2f}s')

    def scan_servers():
        print("Scanning for available servers...")
        st.get_servers()
        print("Found available servers.")
        sft = time.time() - stt
        print(f'Servers found in: {sft - sta:.2f}s')

    def best_server():
        print("Choosing best server...")
        bserv = st.get_best_server()
        print(f"Best server is: {bserv['sponsor']} - {bserv['host']} located in {bserv['name']}, {bserv['country']}")
        bst = time.time() - stt
        print(f'Best server found in: {bst - sft:.2f}s')

    def ping_test():
        print("Ping testing...")
        p = st.results.ping
        ph = f"Ping: {p:.2f}ms"
        print("Ping test complete.")
        ptt = time.time() - stt
        print(f'Ping test completed in: {ptt - bst:.2f}s')

    def download_speed_test():
        print("Download speed testing...")
        ds = st.download()
        dsh = f"Download speed: {ds / 1024 / 1024:.2f}mb/s"
        print("Download speed test complete.")
        dst = time.time() - stt
        print(f'Download speed test completed in: {dst - ptt:.2f}s')

    def upload_speed_test():
        print("Upload speed testing...")
        us = st.upload()
        ust = time.time() - stt
        ush = f"Upload speed: {us / 1024 / 1024:.2f}mb/s \n"
        print("Upload speed test complete. \n")
        print(f'Upload speed test completed in: {ust - dst:.2f}s')

    def result():
        print("Speed test results are: \n")
        print(ssid)
        print(ph)
        print(dsh)
        print(ush)
        ttn = datetime.now()
        fdt = ttn.strftime("%a %d-%m-%Y, %H:%M:%S%p")
        tt = time.time() - stt
        print(f"Start Time: {dtn}")
        print(f"Finish Time: {fdt}")
        print(f'Total execution time: {tt:.2f}s')
        results = [ssid, ph, dsh, ush, dtn]
        txt = "Speedtest Results.txt"
        with open(txt, 'a') as f:
            f.write("\n")
            f.write("\n".join(results))
            f.write("\n")
            f.close()


main()
I believe you can run this part in one line:

ssid = re.findall("SSID : (.*)\r", cdop)
for char in ssid:
    ssid = f"Network Name: {char} \n"

which should make it quicker; have a look at list comprehensions.
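As a sketch of what that one-liner could look like (assuming cdop from find_ssid() above, and keeping the same "Network Name" formatting; re.findall already returns a list, and join produces an empty string when there is no match):

ssid = "".join([f"Network Name: {name} \n" for name in re.findall("SSID : (.*)\r", cdop)])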
I use Python multiprocessing to compute a kind of score on DNA sequences from a large file.
For that I wrote and use the script below.
I use a Linux machine with 48 CPUs in a Python 3.8 environment.
The code works fine, terminates correctly, and prints the processing time at the end.
Problem: when I use the htop command, I find that all 48 processes are still alive.
I don't know why, and I don't know what to add to my script to avoid this.
import csv
import sys
import datetime  # needed for datetime.timedelta below
import concurrent.futures
from itertools import combinations
import psutil
import time

nb_cpu = psutil.cpu_count(logical=False)


def fun_job(seq_1, seq_2):  # seq_i : (id, string)
    start = time.time()
    score_dist = compute_score_dist(seq_1[1], seq_2[1])
    end = time.time()
    return seq_1[0], seq_2[0], score_dist, end - start  # id seq1, id seq2, score, time


def help_fun_job(nested_pair):
    return fun_job(nested_pair[0], nested_pair[1])


def compute_using_multi_processing(list_comb_ids, dict_ids_seqs):
    start = time.perf_counter()
    with concurrent.futures.ProcessPoolExecutor(max_workers=nb_cpu) as executor:
        results = executor.map(help_fun_job,
                               [((pair_ids[0], dict_ids_seqs[pair_ids[0]]), (pair_ids[1], dict_ids_seqs[pair_ids[1]]))
                                for pair_ids in list_comb_ids])
        save_results_to_csv(results)
    finish = time.perf_counter()
    proccessing_time = str(datetime.timedelta(seconds=round(finish - start, 2)))
    print(f' Processing time Finished in {proccessing_time} hh:mm:ss')


def main():
    print("nb_cpu in this machine : ", nb_cpu)
    file_path = sys.argv[1]
    dict_ids_seqs = get_dict_ids_seqs(file_path)
    list_ids = list(dict_ids_seqs)  # This will convert the dict_keys to a list
    list_combined_ids = list(combinations(list_ids, 2))
    compute_using_multi_processing(list_combined_ids, dict_ids_seqs)


if __name__ == '__main__':
    main()
Thank you for your help.
Edit: added the complete code for fun_job (after @Booboo's answer):
import time

from Bio import Align


def fun_job(seq_1, seq_2):  # seq_i : (id, string)
    start = time.time()
    aligner = Align.PairwiseAligner()
    aligner.mode = 'global'
    score_dist = aligner.score(seq_1[1], seq_2[1])
    end = time.time()
    return seq_1[0], seq_2[0], score_dist, end - start  # id seq1, id seq2, score, time
When the with ... as executor: block exits, there is an implicit call to executor.shutdown(wait=True). This will wait for all pending futures to be done executing "and the resources associated with the executor have been freed", which presumably includes terminating the processes in the pool (if possible?). Why your program terminates (or does it?), or at least why you say all the futures have completed executing while the processes have not terminated, is a bit of a mystery. But you haven't provided the code for fun_job, so who can say why this is so?
One thing you might try is to switch to the multiprocessing.pool.Pool class from the multiprocessing module. It supports a terminate method, which is implicitly called when its with block (context manager) exits and which explicitly attempts to terminate all processes in the pool:
#import concurrent.futures
import multiprocessing

... # etc.


def compute_using_multi_processing(list_comb_ids, dict_ids_seqs):
    start = time.perf_counter()
    with multiprocessing.Pool(processes=nb_cpu) as executor:
        results = executor.map(help_fun_job,
                               [((pair_ids[0], dict_ids_seqs[pair_ids[0]]), (pair_ids[1], dict_ids_seqs[pair_ids[1]]))
                                for pair_ids in list_comb_ids])
        save_results_to_csv(results)
    finish = time.perf_counter()
    proccessing_time = str(datetime.timedelta(seconds=round(finish - start, 2)))
    print(f' Processing time Finished in {proccessing_time} hh:mm:ss')
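As a small, self-contained sketch of the point about the context manager (worker and the pool size are arbitrary; multiprocessing.active_children() is only used here to count live worker processes):

import multiprocessing
import time


def worker(x):
    return x * x


if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        print(pool.map(worker, range(8)))
        print("children inside the with block:", len(multiprocessing.active_children()))
    # Leaving the block calls pool.terminate(); give the OS a moment to reap the workers
    time.sleep(0.5)
    print("children after the with block:", len(multiprocessing.active_children()))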
I've been trying to wrap my head around multiprocessing using an old python bitcoin mining program. Although relatively useless for mining, I figured this would be a great way to explore multiprocessing. However, I've hit a wall when it comes to stopping the processes when one of them achieves the goal they are all working towards.
I want to kill all multiprocessing pools when one of them finds the solution. Then allow the program to continue. I have tried terminate() and join(). I've attempted to include an Event(). I've tried using Process instead of Pool with the direction of a similar issue here: Killing a multiprocessing process when condition is met. However, same problem. How can I stop all processes after a condition is met without exiting the program with something like sys.exit() that would kill the entire program?
I also tried apply_async with the direction from this post: Python Multiprocess Pool. How to exit the script when one of the worker process determines no more work needs to be done? However, it did not solve the problem of needing to continue executing the final functions of the program. In fact, it actually slowed the program significantly.
For clarity, I've included the code I tried based on the above mentioned link here:
from multiprocessing import Pool
from hashlib import sha256
import time


def SHA256(text):
    return sha256(text.encode("ascii")).hexdigest()


def solution_helper(args):
    solution, nonce = do_job(args)
    if solution:
        print(f"\nNonce Found: {nonce}\n")
        return True
    else:
        return False


class Mining():
    def __init__(self, workers, initargs):
        self.pool = Pool(processes=workers, initargs=initargs)

    def callback(self, result):
        if result:
            print('Solution Found...Terminating Processes...')
            self.pool.terminate()

    def do_job(self):
        for args in values:
            start_nonce = args[0]
            end_nonce = args[1]
            prefix_str = '0'*difficulty
            self.pool.apply_async(solution_helper, args=args, callback=self.callback)
            start = time.time()
            for nonce in range(start_nonce, end_nonce):
                text = str(block_number) + transactions + previous_hash + str(nonce)
                new_hash = SHA256(text)
                if new_hash.startswith(prefix_str):
                    print(f"Hashing: {text}")
                    print(f"\nSuccessfully mined bitcoin with nonce value: {nonce}\n")
                    print(f"New hash: {new_hash}")
                    total_time = str((time.time()-start))
                    print(f"\nEnd mining... Mining took {total_time} seconds\n")
                    return new_hash, nonce
        self.pool.close()
        self.pool.join()
        print('.Goodbye.')


block_number = 5
transactions = """
bill->steve->20,
jan->phillis->45
"""
previous_hash = '0000000b7c7723e4d3a8654c975fe4dd23d4d37f22d0ea7e5abde2225d1567dc6'
values = [(20000, 100000), (100000, 1000000), (1000000, 10000000), (10000000, 100000000)]
difficulty = 4

m = Mining(5, values)
m.do_job()
Here's the basic concept. It works great to start the processes, but I cannot figure out how to stop them:
from multiprocessing import Pool
from hashlib import sha256
import functools

MAX_NONCE = 1000000000


def SHA256(text):
    return sha256(text.encode("ascii")).hexdigest()


def nonce(block_number, transactions, previous_hash, prefix_str):
    import time
    start = time.time()
    for nonce in range(MAX_NONCE):
        text = str(block_number) + transactions + previous_hash + str(nonce)
        new_hash = SHA256(text)
        if new_hash.startswith(prefix_str):
            print(f"\nYay! Successfully mined bitcoins with nonce value:{nonce}")
            total_time = str((time.time()-start))
            print(f"\nend mining. Mining took: {total_time} seconds\n")
            print(new_hash + "\n")


def mine(block_number, transactions, previous_hash, prefix_zeros):
    from multiprocessing import Pool
    with Pool(4) as p:
        prefix_str = '0'*prefix_zeros
        p.map(nonce(block_number, transactions, previous_hash, prefix_str), [20000, 40000, 60000, 80000, 100000])


if __name__ == '__main__':
    transactions = """
    bill->steve->20,
    jan->phillis->45
    """
    difficulty = 7
    print("\nstart mining\n")
    new_hash = mine(5, transactions, '0000000b7c7723e4d3a8654c975fe4dd23d4d37f22d0ea7e5abde2225d1567dc6', difficulty)
    # Do some other things... Here is where I'd like to get to after the multiprocesses are killed
    print(f"\nMission Complete...{new_hash}\n")  # <-- This never gets a chance to happen
I have a telnet session spawned. I want to capture whatever output the session produces in a given time interval. How do I do that using pexpect?
I tried expect('.*', 20), but this matches whatever output is available at that moment and does not wait for the full 20-second timeout.
# set expected to some string you might not see
def ReadByDuration(connection, expected=".*", read_time=20, epsilon=0.001):
    from time import time
    start_time = time()
    response = connection.read_until(expected, float(read_time))
    time_passed = time() - start_time
    # if timeout not met, read again
    while read_time - time_passed > epsilon:
        response += connection.read_until(expected, float(read_time - time_passed))
        time_passed = time() - start_time
    return response
And if you want to get really fancy, you can attach it to the connection object as a method:
import telnetlib, types
t = telnetlib.Telnet(IP, PORT, 20)
t.ReadByDuration = types.MethodType( ReadByDuration, t )
t.ReadByDuration(".*", 20)
I'm reading a tutorial about gevent, and it provides sample code demonstrating the synchronous and asynchronous cases:
import gevent
import random


def task(pid):
    """
    Some non-deterministic task
    """
    gevent.sleep(random.randint(0, 2) * 0.001)
    print('Task', pid, 'done')


def synchronous():
    for i in range(1, 10):
        task(i)


def asynchronous():
    threads = [gevent.spawn(task, i) for i in xrange(1000)]
    gevent.joinall(threads)
This article explains that 'the order of execution in the async case is essentially random and that the total execution time in the async case is much less than the sync case'.
So I used the time module to test it:
print('Synchronous:')
start1 = time.clock()
synchronous()
end1 = time.clock()
print "%.2gs" % (end1-start1)
print('Asynchronous:')
start2 = time.clock()
asynchronous()
end2 = time.clock()
print "%.2gs" % (end2-start2)
However, the time taken by asynchronous() is much longer than synchronous():
ubuntu@ip:/tmp$ python gevent_as.py
Synchronous:
0.32s
Asynchronous:
0.64s
ubuntu@ip:/tmp$ python gevent_as.py
Synchronous:
0.3s
Asynchronous:
0.61s
I want to know what's wrong with my test program. Thanks.
The problem is time.clock(), which measures CPU time rather than wall-clock time on Linux, so it is not suitable here. See this question for details: Python - time.clock() vs. time.time() - accuracy?
I changed the test program:
print('Synchronous:')
start1 = time.time()
synchronous()
end1 = time.time()
print "%.2gs" % (end1-start1)
print('Asynchronous:')
start2 = time.time()
asynchronous()
end2 = time.time()
print "%.2gs" % (end2-start2)
Now asynchronous() runs much faster than synchronous():
ubuntu@ip:/tmp$ python gevent_as.py
Synchronous:
1.1s
Asynchronous:
0.057s
Probably the sleeps are very small and overhead matters. Try replacing 0.001 with 0.1.
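For illustration, a minimal variation of the test along those lines (the 0.1-second sleep and the use of time.perf_counter as a wall-clock timer are my choices, not the tutorial's):

import random
import time

import gevent


def task(pid):
    gevent.sleep(random.randint(0, 2) * 0.1)   # 0.1 instead of 0.001: the sleep now dwarfs spawn overhead
    print('Task', pid, 'done')


def synchronous():
    for i in range(1, 10):
        task(i)


def asynchronous():
    gevent.joinall([gevent.spawn(task, i) for i in range(1, 10)])


for name, fn in (('Synchronous', synchronous), ('Asynchronous', asynchronous)):
    start = time.perf_counter()   # wall-clock timer, unlike time.clock()
    fn()
    print('%s: %.2gs' % (name, time.perf_counter() - start))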