Do processes take up memory after they have finished running? - python

I am writing an operational script which starts processes at specific times of the day. The script will be running constantly, so I am concerned that creating and starting processes over and over will use up memory.
I wrote a short script to see what happens to a Process after its target function has finished running.
# t.py
import time
import multiprocessing as mp

def f(t=0):
    time.sleep(t)

a = mp.Process(target=f, args=(10, ))
a.start()
print(a.pid)
time.sleep(10000)
$ python3 t.py
32263
I then ran the following to see which processes are running:
$ ps auxf | grep 'python3 t.py'
user 32262 3.0 0.1 21360 11200 pts/0 S+ 11:37 0:00 | \_ python3 t.py
user 32263 0.0 0.0 21360 7704 pts/0 S+ 11:37 0:00 | \_ python3 t.py
This shows that a is still present despite having finished running. I found that pid 32263 only disappears if I include a.join() in the script, but that would block me from creating and starting other processes.
Is there any way to get rid of a for good? a.kill() and a.terminate() do not seem to do the job.
If I were to ignore the fact that a still appears in ps auxf and kept on creating and starting new processes, would that use up memory?
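(For context: on Unix a child process that has exited stays in the process table until its parent reaps it, and join() on a child that has already exited returns immediately rather than blocking.) Purely as an illustration, not code from the question, here is a minimal sketch of reaping finished children without blocking a long-running scheduler; the polling loop is made up:
# sketch.py (illustrative only)
import time
import multiprocessing as mp

def work(t=0):
    time.sleep(t)

if __name__ == "__main__":
    p = mp.Process(target=work, args=(10, ))
    p.start()
    print(p.pid)

    # active_children() joins (reaps) every child that has already exited,
    # so finished entries disappear from ps without blocking on live ones.
    while p.is_alive():
        mp.active_children()
        time.sleep(1)

    p.join()   # the child has already exited, so this returns immediately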

Related

Call 5 sh scripts from main sh script

I would like some help on how to properly set up a complicated job on an HPC cluster. At some point in my Python code I want to submit a job using os.system("bsub -K < mama.sh"); I found that the -K argument makes the call wait for the job to end before continuing. Now I want this mama.sh script to call 5 other jobs (kid1.sh, kid2.sh ... kid5.sh) that run in parallel (to reduce computational time). Each of these 5 child scripts runs a piece of Python code, and mama.sh should wait until all 5 jobs have finished before continuing.
I thought of something like this:
#!/bin/sh
#BSUB -q hpc
#BSUB -J kids[1-5]
#BSUB -n 5
#BSUB -W 10:00
#BSUB -R "rusage[mem=6GB]"
#BSUB -R "span[hosts=1]"
# -- end of LSF options --
module load python3/3.8
python3 script%I.py
or:
python3 script1.py
python3 script2.py
python3 script3.py
python3 script4.py
python3 script5.py
Maybe the above doesn't make sense at all though. Is there any way to actually do that?
Thanks in advance
As far as I know, you can accomplish this at different levels.
Two easy ways:
parallelize your Python code with the multiprocessing module (see the Python sketch below)
parallelize your shell script with &, so that each command is executed in the background:
python3 script1.py &
python3 script2.py &
wait   # block until all background scripts have finished
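For the first option, here is a minimal sketch of a Python driver (illustrative only; it assumes the five scripts are named script1.py ... script5.py as in the question) that launches them in parallel and waits for all of them:
# driver.py (illustrative sketch)
import multiprocessing as mp
import subprocess

def run_script(i):
    # launch one child script and wait for it to finish
    return subprocess.run(["python3", "script{}.py".format(i)]).returncode

if __name__ == "__main__":
    with mp.Pool(processes=5) as pool:
        exit_codes = pool.map(run_script, range(1, 6))
    print("exit codes:", exit_codes)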

Python subprocess call hangs when running rpm2cpio

I'm running the command below using Python subprocess to extract files from an rpm.
But the command fails when the rpm size is more than 25 - 30 MB. I tried the command using Popen, call, with stdout as PIPE, and with os.system as well. The command works fine when I run it in a shell directly; the problem only appears when I invoke it from Python.
Command:
rpm2cpio <rpm_name>.rpm | cpio -idmv
I ran strace on the process ids and found that they are always hung on a write system call:
ps -ef | grep cpio
root 4699 4698 4 11:05 pts/0 00:00:00 rpm2cpio kernel-2.6.32-573.26.1.el6.x86_64.rpm
root 4700 4698 0 11:05 pts/0 00:00:00 cpio -idmv
strace -p 4699
Process 4699 attached
write(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0rc_pixelview_new"..., 8192
strace -p 4700
Process 4700 attached
write(2, "./lib/modules/2.6.32-573.26.1.el"..., 94
I have 2 questions:
Can someone figure out what the problem is here? Why does it fail when the rpm size is more than 25 MB?
Is there any other way I can extract the rpm contents from Python?
Your output pipe is full. The python docs note in many places not to do what you are doing:
Do not use stdout=PIPE or stderr=PIPE with this function as that can deadlock based on the child process output volume. Use Popen with the communicate() method when you need pipes.
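For instance, a hedged sketch of the same pipeline driven from Python with Popen and communicate() (the .rpm filename below is only a placeholder): rpm2cpio feeds cpio directly, and communicate() drains cpio's output so neither pipe fills up:
# illustrative sketch; "kernel.rpm" is a placeholder filename
import subprocess

rpm2cpio = subprocess.Popen(["rpm2cpio", "kernel.rpm"], stdout=subprocess.PIPE)
cpio = subprocess.Popen(["cpio", "-idmv"], stdin=rpm2cpio.stdout,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
rpm2cpio.stdout.close()        # so rpm2cpio gets SIGPIPE if cpio exits early
out, err = cpio.communicate()  # drains cpio's output, avoiding the deadlock
print(err.decode(errors="replace"))  # cpio -v lists extracted files on stderr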
If all you want is the payload of a *.rpm package, then do the computations to find the beginning of the compressed cpio payload and do the operations directly in python.
See How do I extract the contents of an rpm? for a rpm2cpio.sh shell script that documents the necessary computations. The only subtlety is ensuring that the padding (needed for alignment) between the signature and metadata headers is correct.

`ps -ef` shows running process twice if started with `subprocess.Popen`

I use the following snippet in a larger Python program to spawn a process in background:
import subprocess
command = "/media/sf_SharedDir/FOOBAR"
subprocess.Popen(command, shell=True)
After that I wanted to check whether the process was running when my Python program returned.
Output of ps -ef | grep -v grep | grep FOOBAR:
ap 3396 937 0 16:08 pts/16 00:00:00 /bin/sh -c /media/sf_SharedDir/FOOBAR
ap 3397 3396 0 16:08 pts/16 00:00:00 /bin/sh /media/sf_SharedDir/FOOBAR
I was surprised to see two lines of output, and they have different PIDs. Are those two separate processes running? Is there something wrong with my Popen call?
FOOBAR Script:
#!/bin/bash
while :
do
echo "still alive"
sleep 1
done
EDIT: When I start the script in a terminal, ps displays only one process.
Started via ./FOOBAR
ap#VBU:/media/sf_SharedDir$ ps -ef | grep -v grep | grep FOOBAR
ap 4115 3463 0 16:34 pts/5 00:00:00 /bin/bash ./FOOBAR
EDIT: shell=True is causing this issue (if it is one). But how would I fix that if I require shell to be True to run bash commands?
There is nothing wrong; what you see is perfectly normal. There is no "fix".
Each of your processes has a distinct function. The top-level process is running the Python interpreter.
The second process, /bin/sh -c /media/sf_SharedDir/FOOBAR, is the shell that interprets the command line (because you want | or * or $HOME to be interpreted, you specified shell=True).
The third process, /bin/sh /media/sf_SharedDir/FOOBAR, is the FOOBAR command itself. The /bin/sh comes from the #! line inside your FOOBAR program. If it were a C program, you'd just see /media/sf_SharedDir/FOOBAR here. If it were a python program, you'd see /usr/bin/python /media/sf_SharedDir/FOOBAR.
If you are really bothered by the second process, you could modify your python program like so (exec makes the intermediate shell replace itself with FOOBAR instead of running it as a child):
command = "exec /media/sf_SharedDir/FOOBAR"
subprocess.Popen(command, shell=True)

Celeryd launching too many processes

How do you ensure celeryd only runs as a single process? When I run manage.py celeryd --concurrency=1 and then ps aux | grep celery I see 3 instances running:
www-data 8609 0.0 0.0 20744 1572 ? S 13:42 0:00 python manage.py celeryd --concurrency=1
www-data 8625 0.0 1.7 325916 71372 ? S 13:42 0:01 python manage.py celeryd --concurrency=1
www-data 8768 0.0 1.5 401460 64024 ? S 13:42 0:00 python manage.py celeryd --concurrency=1
I've noticed a similar problem with celerybeat, which always runs as 2 processes.
As per this link ... the number of processes would be 4: one main process, two child processes and one celerybeat process;
also, if you're using FORCE_EXECV, there's another process started to clean up semaphores.
If you use celery + django-celery in development, with RabbitMQ or Redis as a broker, then it shouldn't use more than one extra thread (none if CELERY_DISABLE_RATE_LIMITS is set).

Running bash script from python

I'm encountering the following problem:
I have this simple script, called test.sh:
#!/bin/bash
function hello() {
echo "hello world"
}
hello
when I run it from the shell, I get the expected result:
$ ./test2.sh
hello world
However, when I try to run it from Python (2.7.?) I get the following:
>>> import commands
>>> cmd="./test2.sh"
>>> commands.getoutput(cmd)
'./test2.sh: 3: ./test2.sh: Syntax error: "(" unexpected'
I believe it somehow runs the script from "sh" rather than bash. I think so because when I run it with sh I get the same error message:
$ sh ./test2.sh
./test2.sh: 3: ./test2.sh: Syntax error: "(" unexpected
In addition, when I run the command with preceding "bash" from python, it works:
>>> cmd="bash ./test2.sh"
>>> commands.getoutput(cmd)
'hello world'
My question is: why does Python choose to run the script with sh instead of bash even though I added the #!/bin/bash line at the beginning of the script? And how can I make it right? (I don't want to prepend 'bash' in Python, since my script is run from Python on remote machines which I can't control.)
Thanks!
There seems to be some other problem - the shebang and commands.getoutput should work properly as you show here. Change the shell script to just:
#!/bin/bash
sleep 100
and run the app again. Check with ps f what the actual process tree is. It's true that getoutput calls sh -c ..., but this shouldn't change which shell executes the script itself.
From a minimal test as described in the question, I see the following process tree:
11500 pts/5 Ss 0:00 zsh
15983 pts/5 S+ 0:00 \_ python2 ./c.py
15984 pts/5 S+ 0:00 \_ sh -c { ./c.sh; } 2>&1
15985 pts/5 S+ 0:00 \_ /bin/bash ./c.sh
15986 pts/5 S+ 0:00 \_ sleep 100
So in isolation, this works as expected - python calls sh -c { ./c.sh; } which is executed by the shell specified in the first line (bash).
Make sure you're executing the right script - since you're using ./test2.sh, double-check you're in the right directory and executing the right file. (Does print open('./test2.sh').read() return what you expect?)
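As a side note, the commands module is deprecated (and removed in Python 3). A rough, hedged equivalent on Python 3.7+ runs the script directly with subprocess, so the kernel honours the #!/bin/bash shebang and no intermediate sh -c is involved (the path is the one from the question):
# illustrative alternative, Python 3.7+
import subprocess

output = subprocess.check_output(["./test2.sh"], text=True)
print(output)   # expected: hello world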
