How to limit memory usage within a Python process

I run Python 2.7 on a Linux machine with 16 GB of RAM and a 64-bit OS. A Python script I wrote can load too much data into memory, which slows the machine down to the point where I cannot even kill the process any more.
While I can limit memory by calling:
ulimit -v 12000000
in my shell before running the script, I'd like to include a limiting option in the script itself. Everywhere I looked, the resource module is cited as having the same power as ulimit. But calling:
import resource
_, hard = resource.getrlimit(resource.RLIMIT_DATA)
resource.setrlimit(resource.RLIMIT_DATA, (12000, hard))
at the beginning of my script does absolutely nothing. Even setting the value as low as 12000 never crashed the process. I tried the same with RLIMIT_STACK as well, with the same result. Curiously, calling:
import subprocess
subprocess.call('ulimit -v 12000', shell=True)
does nothing as well.
What am I doing wrong? I couldn't find any actual usage examples online.
Edit: For anyone who is curious, using subprocess.call doesn't work because it creates a (surprise, surprise!) new process, which is independent of the one the current Python program runs in.

resource.RLIMIT_VMEM is the resource corresponding to ulimit -v.
RLIMIT_DATA only affects brk/sbrk system calls, while newer memory managers tend to use mmap instead.
The second thing to note is that ulimit/setrlimit only affects the current process and its future children.
Regarding the AttributeError: 'module' object has no attribute 'RLIMIT_VMEM' message: the resource module docs mention this possibility:
This module does not attempt to mask platform differences — symbols
not defined for a platform will not be available from this module on
that platform.
According to the bash ulimit source linked to above, it uses RLIMIT_AS if RLIMIT_VMEM is not defined.
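Putting those pieces together, a minimal sketch of what the in-script limit could look like (not from the original answer; it falls back to RLIMIT_AS where RLIMIT_VMEM is undefined, and note that setrlimit takes bytes whereas ulimit -v takes kilobytes):
import resource

# 12 GB expressed in bytes; setrlimit expects bytes, unlike ulimit -v (kilobytes).
limit_bytes = 12 * 1024 * 1024 * 1024

# RLIMIT_VMEM is not defined on every platform; fall back to RLIMIT_AS,
# which is what bash's ulimit builtin does as well.
rlimit = getattr(resource, 'RLIMIT_VMEM', resource.RLIMIT_AS)

soft, hard = resource.getrlimit(rlimit)
resource.setrlimit(rlimit, (limit_bytes, hard))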

Related

app package built by pyinstaller fails after it uses tqdm

I have a program which can optionally use tqdm to display a progress bar for long calculations.
I've packaged the program using pyinstaller.
The packaged program runs to completion just fine if I don't enable tqdm.
If I enable tqdm, the package works fine and displays progress bars until it gets to the very end, then the message below is displayed and the packaged program is relaunched (with the wrong args).
multiprocessing/resource_tracker.py:104: UserWarning: resource_tracker:
process died unexpectedly, relaunching. Some resources might leak.
There is nothing fancy about the way the program invokes tqdm.
There is a loop that iterates over the members of a list.
The program does not import the multiprocessing module.
def best_cost(self, guesses, steps=1, verbose=0) -> None:
    """ find the guess with the minimum cost """
    self.cost = float("inf")
    for guess in guesses if verbose < 3 else tqdm(guesses, mininterval=1.0):
        groups = dict()
        for (word, frequency) in self.words.items():
            result = calc_result(guess, word)
            groups.setdefault(result, {})[word] = frequency
        self.check_cost(guess, groups, steps=steps)
    return
The problem occurs whether pyinstaller creates the package with --onedir or --onefile.
I'm running on macOS Monterey 12.4 on an Intel processor.
I'm using Python 3.10.4, but the problem also occurs with 3.9. Pip-installed packages include: pyinstaller 5.1, pyinstaller-hooks-contrib 2022.7, and tqdm 4.64.0.
I package the program in a virtualenv.
I found this in the documentation for the multiprocessing module:
On Unix using the spawn or forkserver start methods will also start a resource tracker process which tracks the unlinked named system resources (such as named semaphores or SharedMemory objects) created by processes of the program. When all processes have exited the resource tracker unlinks any remaining tracked object. Usually there should be none, but if a process was killed by a signal there may be some “leaked” resources. (Neither leaked semaphores nor shared memory segments will be automatically unlinked until the next reboot. This is problematic for both objects because the system allows only a limited number of named semaphores, and shared memory segments occupy some space in the main memory.)
I can't figure out how to debug the problem since it only occurs with the pyinstaller-packaged program and my IDE (vscode) doesn't help.
My program doesn't use the multiprocessing module.
The tqdm documentation doesn't mention issues like this, but tqdm does use the multiprocessing module to create locks (semaphores) in the tqdm class, which get used by the instances.
There aren't any tqdm methods I can find to delete the semaphore resources; I guess that is just supposed to happen automatically when the program ends.
I think the pyinstaller-packaged program may also use the multiprocessing module.
Perhaps pyinstaller is interfering with deletion of the tqdm semaphores somehow?
The packaged program works fine till it starts to exit and then gets restarted.
Does anyone know how I can get the packaged program to exit cleanly?
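No accepted fix is recorded here, but one commonly suggested mitigation for PyInstaller-packaged programs that touch multiprocessing (even indirectly, through a dependency such as tqdm) is to call multiprocessing.freeze_support() at the very start of the entry point, so that any helper process multiprocessing spawns in the frozen executable does not re-run the whole program. A rough sketch, where main() stands in for the actual program:
import multiprocessing

def main():
    # ... the real program, including the tqdm progress loops ...
    pass

if __name__ == '__main__':
    # In a frozen (PyInstaller) executable, this lets a spawned helper process
    # do its work and exit instead of re-executing the program with odd args.
    multiprocessing.freeze_support()
    main()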

Can't import custom modules in python3.9 when running in wsl2

So I am trying to write some Python code that will do two things, which seem to be mutually exclusive on my machine. My PC's host operating system is Windows and I run Kali Linux in WSL2 when I need to test my code on Linux. My code's main function creates two separate multiprocessing.Process objects, each assigned a different task, starts them both one after the other, and then calls for them both to be joined. The plan is to allow each to run a simple server application simultaneously on different ports. This does not work when running python3 in PowerShell, as it seems to require access to os.fork(), which doesn't work in that environment. When I found this out I pivoted to running in WSL2, which worked fantastically, for a time. After a while of experimenting with some ideas I decided to take some of my code and spin it off into its own file, which I placed in its own 'Libs' folder. WSL2, however, was unable to import this new file, instead giving me the exception ModuleNotFoundError: No module named 'NetStuff'. I originally had added:
sys.path.append('./Libs')
as has worked for me in the past, however when I found that WSL2 was unable to find my module, I printed out sys.path and it revealed that rather than appending my $current_working_directory/Libs like I intended, I was just appending the literal string, which wasn't useful. I then decided to try:
sys.path.append(str(pathlib.Path().resolve()) + '/Libs')
which at the bare minimum shows up as I would expect in sys.path. This, however, still didn't work; Python was unable to find my module and would unceremoniously crash every time. This led me to try something else: I ran my code in python3 under PowerShell again, which had no issue importing my module. It did still crash due to lacking os.fork(), but the import gave no issues. Confused and annoyed, I opened my code in IDLE 3.9 which, for some inexplicable reason, was able to import the file, and seemingly use os.fork(). The only major issue with running in IDLE is that it is seemingly incapable of understanding ANSI colour escape sequences. Given that the goal is to run my code in bash, and ideally also PowerShell, I am not satisfied with this as a solution. I returned to trying to fix the issue in WSL2 by adding my module to /home/Noah/bin, and appending this directory to sys.path, but this has still not so much as given me a new symptom.
I am utterly at a loss at this point. None of the fixes I know off hand are working, and neither are the new ones I've found online. I can't tell if I'm just missing something fundamental about Python or if I'm running into a bug; if it's the latter, I can't seem to find other people with the same issue. As a result of my confusion and frustration I am appealing to you, kind users of stackoverflow.
The following is the snippet that is causing me problems in WSL2:
import sys
import pathlib

path0 = '/home/Noah/bin'
path1 = str(pathlib.Path().resolve()) + '/Libs'
sys.path.append(path0)
sys.path.append(path1)
print(sys.path)
import NetStuff
The following is output of print(sys.path) in WSL2:
['/mnt/c/Users/Noah/VSCodeRepos/Python/BlackPack', '/usr/lib/python39.zip', '/usr/lib/python3.9', '/usr/lib/python3.9/lib-dynload', '/home/noah/.local/lib/python3.9/site-packages', '/usr/local/lib/python3.9/dist-packages', '/usr/lib/python3/dist-packages', '/home/Noah/bin', '/mnt/c/Users/Noah/VSCodeRepos/Python/BlackPack/Libs']
The following is the error being thrown by WSL2:
Traceback (most recent call last):
  File "/mnt/c/Users/Noah/VSCodeRepos/Python/BlackPack/BlackPackServer.py", line 21, in <module>
    import NetStuff
ModuleNotFoundError: No module named 'NetStuff'
I am specifically hoping to fix the issue with WSL2 at the moment, as I am fairly certain that getting the code to run on PowerShell will merely require rewriting my code so that it doesn't rely on os.fork(). Thank you for reading my problem, and if I left out any information that you would like to see, just tell me and I'll add it in an edit!
Edit: I instantly realized that I should specify that my host machine is running Windows 10.
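(Not part of the original post.) One way to make the import independent of the current working directory is to build the path from the script's own location instead of pathlib.Path().resolve(), which resolves the directory the script was launched from. A small sketch, assuming the Libs folder sits next to the script and the module file is literally named NetStuff.py; Linux filesystems under WSL2 are case-sensitive, so the capitalisation has to match exactly:
import sys
import pathlib

# Directory containing this script, regardless of where it was launched from.
script_dir = pathlib.Path(__file__).resolve().parent

# Append the adjacent Libs folder; names are case-sensitive on Linux/WSL2.
sys.path.append(str(script_dir / 'Libs'))

import NetStuff  # expects Libs/NetStuff.py with exactly this capitalisation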

List all daemon/service names in linux using Python?

I want to get a list of all daemons/services in Linux using Python.
I am able to get the information using:
service --status-all
However, I don't want to execute terminal commands from Python. Is there an API available to perform this operation?
My project includes lots of other work, so I need to be careful with memory and CPU usage; I may also need to run the command every 10 or 60 seconds depending on configuration, so I don't want to create too many processes.
Earlier I had used subprocess.check_output(command), but I was told by my manager to avoid shelling out and to use an available package instead. I searched but could only find packages that monitor services rather than list them.
Finally, my objective is to minimize the load on the system.
Any suggestions?
Additional details:
Python 3.7.2
Ubuntu 16
With psutil (sudo pip3 install psutil)
#!/usr/bin/env python
import psutil


def show_services():
    return [(
        psutil.Process(p).name(),
        psutil.Process(p).status(),
    ) for p in psutil.pids()]


def main():
    for service in show_services():
        if service[1] == 'running':
            print(service[0])


if __name__ == '__main__':
    main()
show_services returns a list of tuples (name, status)
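Since the question is specifically about keeping CPU usage low, a slightly cheaper variant (a sketch, not part of the original answer) makes one pass over the process table with psutil.process_iter and fetches only the two attributes it needs:
import psutil

def show_services():
    # Single pass over the process table; only name and status are fetched.
    return [
        (p.info['name'], p.info['status'])
        for p in psutil.process_iter(attrs=['name', 'status'])
    ]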
There is a package for that, pystemd.
Quoting its front page:
This library allows you to talk to systemd over dbus from python, without actually thinking that you are talking to systemd over dbus. This allows you to programmatically start/stop/restart/kill and verify services status from systemd point of view, avoiding executing subprocess.Popen(['systemctl', ... and then parsing the output to know the result.
so
pip install pystemd
and
from pystemd.systemd1 import Manager
manager = Manager()
manager.load()
print(manager.Manager.ListUnitFiles())
For efficiency, you don't need to reload the list of services each time if it hasn't changed. I don't have a ready solution for that in my head, but I guess you could watch the unit file directories with inotify, for example, and only reload when they change.
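As a rough illustration of how that listing could be narrowed down to services (assuming, as in the systemd D-Bus API, that ListUnitFiles returns (unit_file_path, enablement_state) pairs, which pystemd hands back as bytes):
from pystemd.systemd1 import Manager

manager = Manager()
manager.load()

# Assumption: each entry is a (unit_file_path, enablement_state) pair (bytes).
for path, state in manager.Manager.ListUnitFiles():
    if path.endswith(b'.service'):
        print(path.decode(), state.decode())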

Slow down when subprocesses launched in parallel when using mpi4py

Using mpi4py, I'm running a Python program which launches multiple Fortran processes in parallel, started from a SLURM script using (for example):
mpirun -n 4 python myprog.py
but I have noticed that myprog.py takes longer to run the higher the number of tasks requested, e.g. running myprog.py (the following code shows only the MPI part of the program):
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
data = None
if rank == 0:
    data = params
recvbuf = np.empty(4, dtype=np.float64)
comm.Scatter(data, recvbuf, root=0)
py_task(int(recvbuf[0]), recvbuf[1], recvbuf[2], int(recvbuf[3]))
With mpirun -n 1 ... a single recvbuf array takes about 3 min, whilst running on four recvbuf arrays (expected to run in parallel) on four processors using mpirun -n 4 ... takes about 5 min. However, I would expect the run times to be approximately equal for the single and four processor cases.
py_task is effectively a Python wrapper that launches a Fortran program using:
subprocess.check_call(cmd)
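(Not from the original post.) For anyone trying to reproduce or isolate the behaviour, a minimal self-contained version of this scatter-then-launch pattern, with a made-up params array on the root rank and a harmless placeholder command standing in for the real Fortran invocation, might look like:
from mpi4py import MPI
import numpy as np
import subprocess

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# Root prepares one row of four parameters per rank; other ranks pass None.
params = None
if rank == 0:
    params = np.arange(size * 4, dtype=np.float64).reshape(size, 4)

recvbuf = np.empty(4, dtype=np.float64)
comm.Scatter(params, recvbuf, root=0)

# Placeholder for the Fortran program that py_task would launch.
cmd = ['echo', 'rank', str(rank), 'received', str(recvbuf.tolist())]
subprocess.check_call(cmd)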
There seems to be some interaction between subprocess.check_call(cmd) and the mpi4py package that is stopping the code from properly operating in parallel.
I've looked up this issue but can't seem to find anything that's helped. Are there any fixes for this issue, detailed descriptions explaining what's going on here, or recommendations on how to isolate the cause of the bottleneck in this code?
Additional note:
This pipeline has been adapted to mpi4py from "joblib import Parallel", where there were no previous issues with subprocess.check_call() running in parallel, which is why I suspect this issue is linked to the interaction between subprocess and mpi4py.
The slowdown was initially fixed by adding:
export SLURM_CPU_BIND=none
to the SLURM script that was launching the jobs.
Whilst the above did provide a temporary fix, the issue was actually much deeper and I will provide a very basic description of it here.
1) I uninstalled the mpi4py I had installed with conda, then reinstalled it with Intel MPI loaded (the recommended MPI version for our computing cluster). In the SLURM script, I then changed the launching of the Python program to:
srun python my_prog.py
and removed the export... line above, and the slowdown disappeared.
2) Another slowdown was found when launching > 40 tasks at once. This was due to:
Each time the Fortran-based subprocess gets launched, there's a cost to the filesystem for requesting the initial resources (e.g. supplying a file as an argument to the program). In my case a large number of tasks were being launched simultaneously and each file could be ~500 MB, which probably exceeded the I/O capabilities of the cluster filesystem. This led to a large overhead in the program from the slowdown of launching each subprocess.
The previous joblib implementation of the parallelisation was only using a maximum of 24 cores at a time, so there was no significant bottleneck in the requests to the filesystem, hence why no performance issue was found previously.
For 2), I found the best solution was to significantly refactor my code to minimise the number of subprocesses launched. A very simple fix, but one I hadn't been aware of before finding out about the bottlenecks in resource requests on filesystems.
(Finally, I'll also add that using the subprocess module within mpi4py is generally not recommended online, with the multiprocessing module preferred for single-node usage.)
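For completeness, a sketch of what that multiprocessing-based, single-node alternative might look like (the command list here is a made-up placeholder for the real Fortran invocations):
import subprocess
from multiprocessing import Pool

def run_one(cmd):
    # Launch one external job and wait for it to finish.
    subprocess.check_call(cmd)

if __name__ == '__main__':
    # Hypothetical commands; in practice these would be the Fortran runs.
    cmds = [['echo', 'job', str(i)] for i in range(4)]
    pool = Pool(processes=4)
    pool.map(run_one, cmds)
    pool.close()
    pool.join()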

FiPy fork error

I am currently trying to use FiPy on our local cluster (Scientific Linux 6.9 and Python 2.7.8) to perform some drift-diffusion calculations, and I'm having difficulty solving in parallel. When I attempt to run my script invoking the --trilinos option I get the following error:
A process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
The process that invoked fork was:
Local host: [[20666,15131],0] (PID 21499)
If you are absolutely sure that your application will successfully and
correctly survive a call to fork(), you may disable this warning by setting the mpi_warn_on_fork MCA parameter to 0.
I have isolated this error in the simple script:
from mpi4py import *
from fipy import *
I think that there is therefore some conflict between mpi4py and fipy but I am lost as to how to diagnose the conflict. Is there something simple I am missing in the installation? I have installed fipy and mpi4py through pip, and PyTrilinos is installed from source.
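No resolution is recorded here, but since the warning itself points at the mpi_warn_on_fork MCA parameter, one way to at least silence it (without addressing whether fork() is actually safe in this setup) is to set the corresponding Open MPI environment variable before MPI gets initialised, for example:
import os

# Open MPI reads MCA parameters from OMPI_MCA_* environment variables,
# so this must be set before mpi4py initialises MPI on import.
os.environ["OMPI_MCA_mpi_warn_on_fork"] = "0"

from mpi4py import *
from fipy import *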
