Bash Process Execution Logic - python

I have a number of scripts to run, some of which depend on one or more other scripts completing first. I've read a number of examples showing how bash's control operators work, but haven't found any good examples that address the complexity of the logic I'm trying to implement.
I have p_01.py and p_03.py, which are both requirements for p_09.py, but there are also individual processes that only require p_01.py. For example:
((python p_01.py & python p_03.py) && python p_09.py) &
(python p_01.py &&
    (
        (python p_05.py;
         python p_10.py) &
        (python p_08.py;
         python p_11.py)
    )
)
wait $(jobs -p)
My question is: how can I make every script run only after its requirements have completed, without repeating any script (such as p_01.py, which you'll notice is used twice above)? I'm looking for a generalized answer with some detail, since in actuality the dependencies are more numerous/nested than the example above. Thank you!

If you are thinking of the scripts in terms of their dependencies, that's difficult to translate directly to a master script. Consider using make, which would let you express these dependencies directly:
SCRIPTS = $(wildcard *.py)
.PHONY: all
all: $(SCRIPTS)
$(SCRIPTS):
	python $@
p_05.py p_08.py p_09.py: p_01.py
p_09.py: p_03.py
p_10.py: p_05.py
p_11.py: p_08.py
Running make -B -j4 would run all of the Python scripts with up to 4 executing in parallel at any one time.
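If you would rather stay in Python than introduce make, roughly the same idea can be sketched with the standard library. This is an illustrative sketch only, not a tested tool: the dependency table simply restates the example above, the worker count of 4 is arbitrary, and graphlib requires Python 3.9+.
import subprocess
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
from graphlib import TopologicalSorter  # Python 3.9+

# Map each script to the scripts that must finish before it may start.
deps = {
    "p_01.py": set(), "p_03.py": set(),
    "p_05.py": {"p_01.py"}, "p_08.py": {"p_01.py"},
    "p_09.py": {"p_01.py", "p_03.py"},
    "p_10.py": {"p_05.py"}, "p_11.py": {"p_08.py"},
}

ts = TopologicalSorter(deps)
ts.prepare()
with ThreadPoolExecutor(max_workers=4) as pool:
    running = {}  # future -> script name
    while ts.is_active():
        # Launch everything whose requirements are already satisfied.
        for script in ts.get_ready():
            running[pool.submit(subprocess.run, ["python", script], check=True)] = script
        # Wait for at least one script to finish, then unblock its dependents.
        finished, _ = wait(running, return_when=FIRST_COMPLETED)
        for future in finished:
            future.result()                # re-raise if that script failed
            ts.done(running.pop(future))
Each script runs exactly once, as soon as everything it depends on has finished, with at most four running at a time.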

Related

disabling python script tracing, equivalent of turning off -x in bash

I have python scripts which I'd like to disable tracing on. By tracing, I mean the ability to run:
python -m trace --trace thescript.py
In bash, if you want to see the inner workings of a script, you can just run the following:
sh -x thescript.sh
or
bash -x thescript.sh
However, if thescript.sh contains a set +x, that will stop the external sh -x or bash -x from showing any further inner workings of the script past the line containing the set +x.
I want the same for Python. python -m trace --trace is the only way I know of to see the inner workings of a Python script. I'm sure there are many different ways to do this. What I'm looking for here is a solid method to stop any type of tracing that could be done to a Python script.
I understand that if a user has the proper permissions, they can edit the script and comment out whatever is put in to disable tracing. I'm aware of that. But I would still like to know.
Taking even a cursory look at trace.py from the standard library makes the interfaces it uses, and thus the mechanism to disable them, clear:
#!/usr/bin/env python
import sys, threading

# Clear the trace function for the current thread and for any threads
# started later -- these are the hooks the trace module relies on.
sys.settrace(None)
threading.settrace(None)

print("This is not traced")
Of course, this won't (and can't) work against anyone who wants to modify the live interpreter to NOP out those calls; the usual caveats about anyone who owns and controls the hardware being able to own and control the software apply.

How can I limit memory usage for a Python script via command line?

For context, I'm implementing a code judge so I need to run every script students submit, I was able to do the same for Java with the following command:
java -Xmx<memoryLimit> Main
So far no luck with Python, any ideas?
PS: I'm using Python 3.8
Thank you.
You can use ulimit on Linux systems. (Within Python, there's also resource.setrlimit() to limit the current process.)
Something like this (sorry, my Bash is rusty) should be a decent enough wrapper:
#!/bin/bash
ulimit -m 10240 # kilobytes (note: -m/RSS is not enforced on modern Linux kernels; ulimit -v limits the address space instead)
exec python3 "$@"
Then run e.g. that-wrapper.sh student-script.py.
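If you would rather set the limit from inside Python, the resource.setrlimit() call mentioned above can be wrapped in a small launcher. A minimal sketch, assuming Linux; the file name limit_mem.py and the 256 MiB figure are placeholders, not part of the original answer:
#!/usr/bin/env python3
# limit_mem.py -- hypothetical launcher; run as: python3 limit_mem.py student-script.py
import resource
import runpy
import sys

MAX_BYTES = 256 * 1024 * 1024  # 256 MiB address-space cap; adjust as needed

# RLIMIT_AS caps the total virtual memory of this process; allocations
# beyond the cap fail, typically surfacing in Python as MemoryError.
resource.setrlimit(resource.RLIMIT_AS, (MAX_BYTES, MAX_BYTES))

# Run the student's script inside this (now limited) process.
runpy.run_path(sys.argv[1], run_name="__main__")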
(That said, are you sure you can trust your students not to submit something that uploads your secret SSH keys and/or trashes your file system? I'd suggest a stronger sandbox such as running everything in a Docker container.)
Not sure why you want/need that. In contrast to Java, Python is very good at handling memory: it has a proper garbage collector and is quite efficient in its memory use. In my 10+ years of Python programming I have never had to limit memory in Python. However, if you really need it, check out the thread Limit RAM usage to python program, where someone seems to have posted a solution.
You usually limit memory at the OS level, not in Python itself. You could also use Docker to achieve all of that.

Setting Environment Up with Python

Our environment has a shell script to set up the working area. setup.sh looks like this:
export BASE_DIR=$PWD
export PATH=$BASE_DIR/bin
export THIS_VARIABLE=THAT_VALUE
The user does the following:
% . setup.sh
Some of our users are looking for a csh version and that would mean having two setup files.
I'm wondering if there is a way to do this work with a common python file. In The Hitchhiker's Guide to Python Kenneth Reitz suggests using a setup.py file in projects, but I'm not sure if Python can set environment variables in the shell as I do above.
Can I replace this shell script with a python script that does the same thing? I don't see how.
(There are other questions that ask this more broadly with many many comments, but this one has a direct question and direct single answer.)
No, Python (or generally any process on Unix-like platforms) cannot change its parent's environment.
A common solution is to have your script print the output in a format suitable for the user's shell. E.g. ssh-agent will print out sh-compatible global assignments with -s or when it sees that it is being invoked from a Bourne-compatible shell; and csh syntax if invoked from csh or tcsh or when explicitly invoked with -c.
The usual invocation in sh-compatible shells is eval $(ssh-agent) -- so the text that the program prints is evaluated by the shell where the user invoked this command.
eval is a well-known security risk, so you want to make this code very easy to vet even for people who don't speak much Python (or shell, or anything much else).
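For illustration, here is a sketch of such a script; the name setup_env.py is invented here, and the variables simply mirror the setup.sh above. It prints either sh or csh syntax depending on an argument:
#!/usr/bin/env python3
# setup_env.py -- hypothetical example mirroring the setup.sh above
import os
import sys

base_dir = os.getcwd()
assignments = {
    "BASE_DIR": base_dir,
    "PATH": os.path.join(base_dir, "bin"),
    "THIS_VARIABLE": "THAT_VALUE",
}

# Emit syntax for whichever shell family was requested (default: sh).
shell = sys.argv[1] if len(sys.argv) > 1 else "sh"
for name, value in assignments.items():
    if shell == "csh":
        print(f'setenv {name} "{value}";')
    else:
        print(f'export {name}="{value}";')
sh-family users would then run eval "$(python3 setup_env.py sh)" and csh users eval `python3 setup_env.py csh`, following the same pattern as ssh-agent.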
If you are, eh cough, skeptical of directly supporting Csh users, perhaps you can convince them to run your sh-compatible script in a Bourne-compatible shell and then exec csh to get their preferred interactive environment. This also avoids the slippery slope of having an ever-growing pile of little maintenance challenges for supporting Csh, Fish, rc, Powershell etc users.

Make (install from source) python without running tests

I am compiling Python from the source tarball. Everything works, but the tests run for two hours, and they run twice. How can I bypass these tests?
0:16:20 [178/405] test_inspect
0:16:26 [179/405] test_int
0:16:27 [180/405] test_int_literal
0:16:27 [181/405] test_io
0:18:18 [182/405] test_ioctl -- test_io passed in 1 min 51 sec
0:18:19 [183/405] test_ipaddress
0:18:22 [184/405] test_isinstance
0:18:23 [185/405] test_iter
0:18:24 [186/405] test_iterlen
0:18:25 [187/405] test_itertools
0:19:09 [188/405] test_json -- test_itertools passed in 44 sec
0:19:30 [189/405] test_keyword
As a result:
make 7724,86s user 188,63s system 101% cpu 2:10:18,93 total
I build the distribution like this:
PYTHON_VERSION = 3.6.1
PYTHON_URL = https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tar.xz
wget -O dl/Python-${PYTHON_VERSION}.tar.xz ${PYTHON_URL}
cd dl
tar xf Python-${PYTHON_VERSION}.tar.xz
mkdir -p dl/Python-${PYTHON_VERSION}-build/
cd Python-${PYTHON_VERSION}
./configure --enable-optimizations --prefix=$$(pwd)-build --cache-file=$$(pwd)/cache-file
These commands run the tests twice:
make -C dl/Python-${PYTHON_VERSION} -j8
make -C dl/Python-${PYTHON_VERSION} -j8 install
P.S. This is part of another makefile.
The configure option --enable-optimizations enables running the test suite to generate data for profiling Python. The resulting Python binary has better performance when executing Python code. The improvements are noted here.
From configure help:
--enable-optimizations Enable expensive optimizations (PGO, etc). Disabled by default.
From Wikipedia:
Profile-guided optimization uses the results of profiling test runs of the instrumented program to optimize the final generated code.
In short, you should not skip tests when using --enable-optimizations as the data required for profiling is generated by running tests.
You can run make -j8 build_all followed by make -j8 install to skip the tests once (the tests would still run with the install target), but that would defeat the purpose.
You can instead drop the configure flag for better build times.
Just build and install with:
make -j8 build_all
make -j8 altinstall
I did some (quick) research on skipping the test runs when building Python by instructing either:
configure - passing some args (e.g. --without-tests, --disable-tests, --skip-tests)
make - specifying some variable (either via env vars or cmdline)
The former yielded no results. The latter (by looking in the Makefile template) revealed the fact that test execution is invoked by calling ${PYTHON_SRC_DIR}/Tools/scripts/run_tests.py (which sets some stuff and calls another script, which calls another one, ...). Note that I found the file on Python 3.5(.4) and Python 3.6(.4) but not on Python 2.7(.14). A little bit more research revealed that it is possible to skip the (above) test run. What you need to do is:
make -C dl/Python-${PYTHON_VERSION} -j8 EXTRATESTOPTS=--list-tests install
Notes:
Googling didn't reveal anything (relevant) on EXTRATESTOPTS, so I guess it's not officially supported
You could also set EXTRATESTOPTS=--list-tests as an environment variable, before launching (inner) make
Needless to say, if some "minor" error happens during the build (e.g. a non-critical external module, like _ssl.so, fails to build), there will be no tests to fail, so you'll only find out about it at runtime (which would be terribly nasty if it happened in production)
#EDIT0:
After @amohr's comment, I decided to play a little bit more, so I ran the whole process:
configure (opts)
make (opts)
make install
on a Linux (Ubuntu 16) machine with 2 CPUs, where one (full) test run takes ~24 minutes. Here are my findings (Python 3.6):
It ran successfully on Python 3.5(.4)
The solution that I suggested earlier operates at the 3rd step, so it only skips the 2nd test run: it acts on the (root) Makefile's test target (make test), which is invoked by the install target
Regarding the 1st test run: by checking the Makefile and make's output, here's what I discovered happens at the 2nd (make) step:
The C sources are built "normally"
Tests are being run (I deduced that some profile data is stored somewhere)
The C sources are rebuilt with different flags (e.g. in my case gcc's -fprofile-generate was replaced by -fprofile-use -fprofile-correction (check [GNU.GCC]: Options That Control Optimization for more details)) to make use of the profile info generated at previous (sub) step
Skipping the 1st test run automatically implies no optimizations. Ways of achieving that:
make build_all (at 2nd step) - as suggested by other answers
Here's a snippet of the (root) Makefile generated by configure (with --enable-optimizations):
all: profile-opt
build_all: check-clean-src $(BUILDPYTHON) oldsharedmods sharedmods gdbhooks \
Programs/_testembed python-config
And here's one without it:
all: build_all
build_all: check-clean-src $(BUILDPYTHON) oldsharedmods sharedmods gdbhooks \
Programs/_testembed python-config
As seen, running:
configure --enable-optimizations
make build_all
is identical to:
configure
make
Manually modifying the (root) Makefile between 1st (configure --enable-optimizations) and 2nd (make) steps:
Find the macro definition PROFILE_TASK=-m test.regrtest --pgo (for me it was around line 250)
Add --list-tests at the end, so it reads PROFILE_TASK=-m test.regrtest --pgo --list-tests
Sub-steps 1 and 3 (of the make step) are exactly the same, while in sub-step 2 the tests are not run. That can mean either:
The 2nd source build is identical to the 1st one (which would make it completely useless)
The 2nd one does some optimizations (without having any profiling information), which means it could crash at runtime (I think / hope it's the former case)
The default build target for optimized builds includes running the tests.
To skip them, try:
make -C dl/Python-${PYTHON_VERSION} -j8 build_all
We faced the same problem with Python 3.7.6 and, based on reverse-engineering the Makefile, found that the following steps will build Python quickly while still running the tests (so that we don't lose the benefit of the --enable-optimizations flag):
cd /home/ubuntu
PYTHON_VER=3.7.6
wget https://www.python.org/ftp/python/$PYTHON_VER/Python-$PYTHON_VER.tgz
tar xvf Python-$PYTHON_VER.tgz
cd Python-$PYTHON_VER/
./configure --enable-loadable-sqlite-extensions --enable-optimizations
make profile-gen-stamp; ./python -m test.regrtest --pgo -j8; make build_all_merge_profile; touch profile-run-stamp; make
make install
The key is to run the Python tests in parallel by passing -j8 to them.
The first part of the tests is required for optimizing the code; you simply can't / should not skip it.

Multi-threading using shell script

I am using a Python script to perform some calculations on my images and save the resulting array into a .png file. I deal with 3000 to 4000 images. To process all of them I use a shell script on Ubuntu. It gets the job done, but is there any way to make it faster? I have 4 cores in my machine; how can I use all of them? The script I am using is below.
#!/bin/bash
cd $1
for i in $(ls *.png)
do
python ../tempcalc12.py $i
done
cd ..
tempcalc12.py is my python script
This question might be trivial. But I am really new to programming.
Thank you
xargs has a --max-procs= (or -P) option which runs the jobs in parallel.
The following command does the job with a maximum of 4 processes:
ls *.png | xargs -n 1 -P 4 python ../tempcalc12.py
You can just add a & to the python line to have everything executed in parallel:
python ../tempcalc12.py $i &
This is a bad idea though, as having too many processes will just slow everything down.
What you can do is limit the number of threads, like this:
MAX_THREADS=4
for i in $(ls *.png); do
    python ../tempcalc12.py $i &
    while [ $( jobs | wc -l ) -ge "$MAX_THREADS" ]; do
        sleep 0.1
    done
done
Every 100 ms it checks the number of running jobs, and if that is below MAX_THREADS, it adds new jobs in the background.
This is a nice hack if you just want a quick working solution, but you might also want to investigate what GNU Parallel can do.
If you have GNU Parallel you can do:
parallel python ../tempcalc12.py ::: *.png
It will do The Right Thing by spawning a job per core, even if the names of your PNGs have spaces, ', or " in them. It also makes sure the output from different jobs is not mixed together, so if you use the output you are guaranteed not to get half a line from two different jobs.
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time.
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
