Make (install from source) Python without running tests

I am compiling Python from a source tarball. Everything works, but the tests run for two hours, and they run twice. How can I bypass these tests?
0:16:20 [178/405] test_inspect
0:16:26 [179/405] test_int
0:16:27 [180/405] test_int_literal
0:16:27 [181/405] test_io
0:18:18 [182/405] test_ioctl -- test_io passed in 1 min 51 sec
0:18:19 [183/405] test_ipaddress
0:18:22 [184/405] test_isinstance
0:18:23 [185/405] test_iter
0:18:24 [186/405] test_iterlen
0:18:25 [187/405] test_itertools
0:19:09 [188/405] test_json -- test_itertools passed in 44 sec
0:19:30 [189/405] test_keyword
As a result:
make 7724,86s user 188,63s system 101% cpu 2:10:18,93 total
I build the distribution like this:
PYTHON_VERSION = 3.6.1
PYTHON_URL = https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tar.xz
wget -O dl/Python-${PYTHON_VERSION}.tar.xz ${PYTHON_URL}
cd dl
tar xf Python-${PYTHON_VERSION}.tar.xz
mkdir -p dl/Python-${PYTHON_VERSION}-build/
cd Python-${PYTHON_VERSION}
./configure --enable-optimizations --prefix=$$(pwd)-build --cache-file=$$(pwd)/cache-file
These commands run the tests twice:
make -C dl/Python-${PYTHON_VERSION} -j8
make -C dl/Python-${PYTHON_VERSION} -j8 install
P.S. This is part of another Makefile.

The configure option --enable-optimizations enables running the test suite in order to generate data for profiling Python. The resulting python binary has better performance when executing Python code. Improvements are noted here.
From configure help:
--enable-optimizations Enable expensive optimizations (PGO, etc). Disabled by default.
From Wikipedia:
Profile-guided optimisation uses the results of profiling test runs of the instrumented program to optimize the final generated code.
In short, you should not skip the tests when using --enable-optimizations, as the data required for profiling is generated by running them.
You can run make -j8 build_all followed by make -j8 install to skip the tests once (the tests would still run with the install target), but that would defeat the purpose.
You can instead drop the configure flag for better build times.

Just build and install with:
make -j8 build_all
make -j8 altinstall
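If you decide to drop the flag instead, here is a minimal sketch of the non-optimized route, run directly in a shell rather than from the outer Makefile (so single $ instead of $$; version and layout taken from the question):
PYTHON_VERSION=3.6.1
cd dl/Python-${PYTHON_VERSION}
./configure --prefix=$(pwd)-build --cache-file=$(pwd)/cache-file   # no --enable-optimizations, so no PGO test runs
make -j8
make -j8 altinstall   # installs only versioned binaries (python3.6), leaving any existing "python3" untouched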

I did some (quick) research on skipping the test runs when building Python by instructing either:
configure - passing some args (e.g. --without-tests, --disable-tests, --skip-tests)
make - specifying some variable (either via env vars or cmdline)
The former yielded no results. The latter (by looking in the Makefile template) revealed that test execution is invoked by calling ${PYTHON_SRC_DIR}/Tools/scripts/run_tests.py (which sets some things up and calls another script, which calls another one, ...). Note that I found the file in Python 3.5(.4) and Python 3.6(.4) but not in Python 2.7(.14). A little more research revealed that it is possible to skip that test run. What you need to do is:
make -C dl/Python-${PYTHON_VERSION} -j8 EXTRATESTOPTS=--list-tests install
Notes:
Googling didn't reveal anything (relevant) about EXTRATESTOPTS, so I guess it's not officially supported
You could also set EXTRATESTOPTS=--list-tests as an environment variable before launching the (inner) make (see the sketch after these notes)
Needless to say, if some "minor" error happens during the build (e.g. a non-critical extension module like _ssl.so fails to build), there will be no tests to catch it, so you'll only find out about it at runtime (which would be terribly nasty if it happened in production)
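For reference, a sketch of the environment-variable variant (same unofficial EXTRATESTOPTS trick, so the caveats above still apply):
export EXTRATESTOPTS=--list-tests   # exported so the inner make should pick it up, since the Makefile does not set it itself
make -C dl/Python-${PYTHON_VERSION} -j8 install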
EDIT 0:
After @amohr's comment, I decided to play a little bit more, so I ran the whole process:
configure (opts)
make (opts)
make install
on a Linux (Ubuntu 16) machine with 2 CPUs, where one (full) test run takes ~24 minutes. Here are my findings (Python 3.6):
It ran successfully on Python 3.5(.4)
The solution that I suggested earlier operates at the 3rd step, so it only skips the 2nd test run: it works on the (root) Makefile's test target (make test), which is invoked by the install target
Regarding the 1st test run, by checking the Makefile and make's output, here's what happens at the 2nd (make) step:
The C sources are built "normally"
Tests are run (I deduced that some profile data is stored somewhere)
The C sources are rebuilt with different flags (e.g. in my case gcc's -fprofile-generate was replaced by -fprofile-use -fprofile-correction (check [GNU.GCC]: Options That Control Optimization for more details)) to make use of the profile info generated at the previous (sub)step
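To illustrate that two-pass mechanism in isolation, here is a generic gcc PGO sketch (prog.c and sample_input are placeholders, not part of the CPython build):
gcc -O2 -fprofile-generate prog.c -o prog                     # pass 1: instrumented build
./prog < sample_input                                         # running it writes *.gcda profile data
gcc -O2 -fprofile-use -fprofile-correction prog.c -o prog     # pass 2: rebuild using the recorded profile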
Skipping the 1st test run automatically implies no optimizations. Ways of achieving this:
make build_all (at 2nd step) - as suggested by other answers
Here's a snippet of the (root) Makefile generated by configure (with --enable-optimizations):
all: profile-opt
build_all: check-clean-src $(BUILDPYTHON) oldsharedmods sharedmods gdbhooks \
Programs/_testembed python-config
And here's one without it:
all: build_all
build_all: check-clean-src $(BUILDPYTHON) oldsharedmods sharedmods gdbhooks \
Programs/_testembed python-config
As seen, running:
configure --enable-optimizations
make build_all
is identical to:
configure
make
Manually modifying the (root) Makefile between the 1st (configure --enable-optimizations) and the 2nd (make) step:
Find the macro definition PROFILE_TASK=-m test.regrtest --pgo (for me it was around line ~250)
Add --list-tests at the end
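A sketch of that edit as a one-liner, assuming GNU sed and a PROFILE_TASK line shaped like the one quoted above:
sed -i '/^PROFILE_TASK=/ s/$/ --list-tests/' Makefile   # run in the source dir, between configure and make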
Substeps 1. and 3. of the make step are exactly the same, while for substep 2. the tests are not run. That can mean either:
The 2nd source build is identical to the 1st one (which would make it completely useless)
The 2nd one does some optimizations (without having any profile information), which means it could crash at runtime (I think / hope it's the former case)

The default build target for optimized builds includes running the tests.
To skip them, try:
make -C dl/Python-${PYTHON_VERSION} -j8 build_all

We faced the same problem with Python 3.7.6 and, based on reverse-engineering the Makefile, found that the following steps will build Python quickly while still running the tests (so we don't lose the benefit of the --enable-optimizations flag):
cd /home/ubuntu
PYTHON_VER=3.7.6
wget https://www.python.org/ftp/python/$PYTHON_VER/Python-$PYTHON_VER.tgz
tar xvf Python-$PYTHON_VER.tgz
cd Python-$PYTHON_VER/
./configure --enable-loadable-sqlite-extensions --enable-optimizations
make profile-gen-stamp                    # build the instrumented interpreter
./python -m test.regrtest --pgo -j8       # run the PGO test task with 8 tests in parallel
make build_all_merge_profile              # merge the collected profile data
touch profile-run-stamp                   # mark the profiling run as done
make                                      # final (optimized) build using the profile data
make install
The key is to run the Python tests in parallel by passing -j8 to the test run.

The first round of tests is required for optimizing the code; you simply can't / shouldn't skip it.

Related

python script: git checkout prior to running

I've started to use GitHub to manage the development process of production scripts running daily on my workstation (via cron).
One way to make sure the latest valid production version runs would be to run a git checkout inside the production directory moments before running the target script. I was wondering whether this can be done from within the production script (i.e. check if this is the latest version; if not, git checkout; if yes, do nothing and run).
It is certainly possible to do this, e.g. (untested):
git fetch &&
ahead=$(git rev-list --count master..origin/master) &&
case "$ahead" in
0) ;; # run normally
*) echo "I seem to be out of date"
   git merge --ff-only || { echo "update failed, quitting"; exit 1; }
   exec <path-to-script>;;
esac
# ... normal part of script here
But this is also almost certainly the wrong approach. Instead of doing that, schedule a job—a script—that consists of:
git fetch && git merge --ff-only && exec <path-to-script>
This script can live in the same repository. It's a separate script whose job is to update in place—which, if there's nothing to do, is a no-op (it says "Already up to date." and then exits 0 = success)—and then run the other script, whether it's updated or not. This provides a clean separation of purpose: one script updates; one script runs; there's no weird mix of self-update-and-oops-now-I-have-to-quit-because-maybe-my-code-is-different.
Note that adding --quiet to the git merge --ff-only suppresses the "Already up to date." message, which may be helpful if your version of cron emails you the output when there is output. (If your version of cron doesn't do this, it probably should be upgraded to one that does.) So you probably really want:
git fetch && git merge --ff-only --quiet && exec <path-to-script>
The fetch followed by merge is what git pull does by default, but git pull is a program meant to be run by a human. Git divides its various programs into so-called porcelain and plumbing, and the porcelain commands are the ones meant for humans, while the plumbing ones are meant for writing scripts. Git's division here is quite imperfect: some commands are both plumbing and porcelain, and there are some varieties that are missing (e.g., git log is porcelain, but there are no plumbing commands for some of what it does)—but to the extent that you can, it's usually wise to stick to this pattern.
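Putting the earlier pieces together, a minimal sketch of such an update-and-run wrapper and its cron entry (paths and script names are placeholders):
#!/bin/sh
# update-and-run.sh -- lives in the same repository as the production script
cd /path/to/production/checkout || exit 1
git fetch && git merge --ff-only --quiet && exec ./production_script.py

# crontab entry: run the wrapper daily at 02:00
0 2 * * * /path/to/production/checkout/update-and-run.sh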
In case it might be useful, here's the Python script I am using now. I call it from Vim immediately after a commit.
#!/bin/python3
"""
This script syncs development -> production branches
Both directories are connected to the same repository
Assume projectP (for production) and projectD (for development)
"""
####################################################
# Modules
####################################################
import git
####################################################
# Globals
####################################################
theProjectD = "path_to_projectD"
theProjectP = "path_to_projectP"
####################################################
# Code
####################################################
# push & merge develop to main from the develop directory
repo = git.Repo(theProjectD)
repo.remotes.origin.push()
repo.git.checkout('main')
repo.git.merge('develop')
repo.remotes.origin.push()
repo.git.checkout('develop')
# fetch latest version of main in the production directory
repo = git.Repo(theProjectP)
repo.remotes.origin.fetch()
repo.remotes.origin.pull()

Best practice for triggering a docker build if requirements.txt hasn't changed [duplicate]

I have a few RUN commands in my Dockerfile that I would like to run with --no-cache each time I build a Docker image.
I understand that docker build --no-cache will disable caching for the entire Dockerfile.
Is it possible to disable cache for a specific RUN command?
There's always an option to insert some meaningless and cheap-to-run command before the region you want to disable cache for.
As proposed in this issue comment, one can add a build argument block (name can be arbitrary):
ARG CACHEBUST=1
before that region, and modify its value on each run by adding --build-arg CACHEBUST=$(date +%s) as a docker build argument (the value can also be arbitrary; here it is the current timestamp, to ensure uniqueness across runs).
This will, of course, disable the cache for all following blocks too, as the hash of the intermediate image will be different, which makes truly selective cache disabling a non-trivial problem, given how Docker currently works.
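For example, a typical invocation might look like this (the image name is a placeholder):
docker build --build-arg CACHEBUST=$(date +%s) -t myimage .   # a fresh value each run invalidates the cache from the ARG CACHEBUST line onward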
Use
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
before the RUN line you want to always run. This works because ADD will always fetch the file/URL, and the above URL generates random data on each request; Docker then compares the result to see whether it can use the cache.
I have also tested this and it works nicely, since it does not require any additional Docker command-line arguments and also works from a docker-compose.yaml file :)
If your goal is to include the latest code from GitHub (or similar), one can use the GitHub API (or equivalent) to fetch information about the latest commit using an ADD command.
docker build will always fetch a URL from an ADD command, and if the response is different from the one received the last time docker build ran, it will not use the subsequent cached layers.
e.g.
ADD "https://api.github.com/repos/username/repo_name/commits?per_page=1" latest_commit
RUN curl -sLO "https://github.com/username/repo_name/archive/main.zip" && unzip main.zip
As of February 2016 it is not possible.
The feature has been requested at GitHub.
Not directly, but you can divide your Dockerfile into several parts, build an image, then FROM thisimage at the beginning of the next Dockerfile, and build that image with or without caching.
The feature was added a week ago:
ARG FOO=bar
FROM something
RUN echo "this won't be affected if the value of FOO changes"
ARG FOO
RUN echo "this step will be executed again if the value of FOO changes"
FROM something-else
RUN echo "this won't be affected because this stage doesn't use the FOO build-arg"
https://github.com/moby/moby/issues/1996#issuecomment-550020843
Building on @Vladislav's solution above, I used in my Dockerfile
ARG CACHEBUST=0
to invalidate the build cache from there on.
However, instead of passing a date or some other random value, I call
docker build --build-arg CACHEBUST=`git rev-parse ${GITHUB_REF}` ...
where GITHUB_REF is a branch name (e.g. main) whose latest commit hash is used. That means that docker’s build cache is being invalidated only if the branch from which I build the image has had commits since the last run of docker build.
I believe that this is a slight improvement on @steve's answer, above:
RUN git clone https://sdk.ghwl;erjnv;wekrv;qlk#gitlab.com/your_name/your_repository.git
WORKDIR your_repository
# Calls for a random number to break the caching of the git clone
# (https://stackoverflow.com/questions/35134713/disable-cache-for-specific-run-commands/58801213#58801213)
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
RUN git pull
This uses the Docker cache of the git clone, but then runs an uncached update of the repository.
It appears to work, and it is faster - but many thanks to @steve for providing the underlying principles.
Another quick hack is to write some random bytes before your command
RUN head -c 5 /dev/random > random_bytes && <run your command>
This writes out 5 random bytes, which will force a cache miss.

Bash Process Execution Logic

I have a number of scripts to run, some of which have one or more scripts that must be completed first. I've read a number of examples showing how bash's control operators work, but haven't found any good examples addressing the complexity of the logic I'm trying to implement.
I have p_01.py and p_03.py that are both requirements for p_09.py, but I also have individual processes that only require p_01.py. For example:
((python p_01.py & python p_03.py) && python p_09.py) &
(python p_01.py &&
(
(python p_05.py;
python p_10.py) &
(python p_08.py;
python p_11.py)
)
)
wait $(jobs -p)
My question is, how can I arrange for each script to run only after its requirements, without repeating scripts (such as p_01.py, which you'll notice is used twice above)? I'm looking for a generalized answer with some detail, since in actuality the dependencies are more numerous/nested than in the example above. Thank you!
If you are thinking of the scripts in terms of their dependencies, that's difficult to translate directly to a master script. Consider using make, which would let you express these dependencies directly:
SCRIPTS = $(wildcard *.py)
.PHONY: all
all: $(SCRIPTS)
$(SCRIPTS):
	python $@
p_05.py p_08.py p_09.py: p_01.py
p_09.py: p_03.py
p_10.py: p_05.py
p_11.py: p_08.py
Running make -B -j4 would run all of the Python scripts with up to 4 executing in parallel at any one time.
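A usage sketch, assuming the Makefile above is saved in the scripts' directory (-B matters because the .py files already exist and would otherwise be considered up to date):
make -B -j4        # run everything, respecting the declared dependencies, up to 4 scripts in parallel
make -B p_09.py    # run only p_01.py, p_03.py and then p_09.py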

How to remove Profiling *.gcda:Cannot open errors with python virtualenv builder?

In the Jenkins output I am getting the following errors. Is this a problem, or can it be silenced?
profiling:/opt/Python-3.6.1/Python/structmember.gcda:Cannot open
profiling:/opt/Python-3.6.1/Python/getcompiler.gcda:Cannot open
profiling:/opt/Python-3.6.1/Objects/odictobject.gcda:Cannot open
profiling:/opt/Python-3.6.1/Objects/enumobject.gcda:Cannot open
profiling:/opt/Python-3.6.1/Objects/descrobject.gcda:Cannot open
profiling:/opt/Python-3.6.1/Objects/cellobject.gcda:Cannot open
profiling:/opt/Python-3.6.1/Objects/bytes_methods.gcda:Cannot open
profiling:/opt/Python-3.6.1/Objects/accu.gcda:Cannot open
profiling:/opt/Python-3.6.1/Parser/myreadline.gcda:Cannot open
profiling:/opt/Python-3.6.1/Parser/parser.gcda:Cannot open
profiling:/opt/Python-3.6.1/Modules/xxsubtype.gcda:Cannot open
profiling:/opt/Python-3.6.1/Modules/symtablemodule.gcda:Cannot open
profiling:/opt/Python-3.6.1/Modules/zipimport.gcda:Cannot open
profiling:/opt/Python-3.6.1/Modules/stringio.gcda:Cannot open
profiling:/opt/Python-3.6.1/Modules/textio.gcda:Cannot open
profiling:/opt/Python-3.6.1/Modules/bufferedio.gcda:Cannot open
profiling:/opt/Python-3.6.1/Modules/bytesio.gcda:Cannot open
profiling:/opt/Python-3.6.1/Modules/fileio.gcda:Cannot open
profiling:/opt/Python-3.6.1/Modules/iobase.gcda:Cannot open
profiling:/opt/Python-3.6.1/Modules/_iomodule.gcda:Cannot open
profiling:/opt/Python-3.6.1/Modules/_localemodule.gcda:Cannot open
I built Python from source on a Debian 8 server.
I fixed this issue by changing the owner. I was setting up Home Assistant using a Python 3.6.3 build configured with ./configure --enable-optimizations.
From my virtualenv I got these errors, but fixed them by running the following from a su / root account:
sudo chown -R homeassistant:homeassistant /home/pi/Python-3.6.3
I thought maybe it could help other people ;)
Have a nice day! Ciao!
This happened to me when I did ./configure --enable-optimizations. If you remove --enable-optimizations and compile and install it again, these messages are not shown anymore.
To sum things up, here's an example with a fresh version of Python:
wget https://www.python.org/ftp/python/3.6.3/Python-3.6.3.tgz
tar xvf Python-3.6.3.tgz
cd Python-3.6.3
./configure
make
sudo make altinstall
python3.6
The gcda files are gcc profiling records, which are used for seeing which functions the CPU spent most of its time in. This tells you where you can get the most bang for your buck when optimising the code.
You can retain the Python code optimizations but not have the profiling by using the configuration options --enable-optimizations --disable-profiling. Well, it worked for me.
As the configure script will tell you, if you do not use --enable-optimizations you will lose out on the best performance.
You are probably getting gcda files because you interrupted the Python build part-way through. When you run with --enable-optimizations the Python build runs in three phases:
Builds the code
Runs all the test modules to profile the code
Re-compiles the code to optimise it based on the profiling
It's pretty common to think the test phase is just there to check the code is working correctly, as that is what it looks like it is doing, but be patient and leave it: it will compile again, the second time omitting the profiling.
So, it's better to compile with --enable-optimizations and without --disable-profiling and just wait, as you should get better code that way.

When do we need to use sudo python xxx.py, or just python xxx.py, or xxx.py?

I have written a website. What confuses me is that when I run the website, first I need to start the app,
so there are 3 ways:
sudo python xxx.py
python xxx.py
xxx.py
I'm not clear on how to use each of them; the No. 3 method currently doesn't work well on my computer.
sudo will run the application with superuser permissions. Considering that you're referring to a website, this is certainly not what you want to do. (For a webapp, if it requires superuser permissions, it's broken. That's far, far too big of a security risk to consider actually using.)
Under other circumstances you might have a Python program that does some sort of system maintenance and requires being run as root. In this case, you'd use sudo, but you would never want to do this for something that's publicly accessible and could potentially be exploited. In fact, for anything other than testing, you should probably run the webapp as a separate user with very limited access (e.g. with their shell set to /dev/null, no read or write access to anything that they don't need, etc...).
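A rough sketch of that kind of locked-down account on a typical Linux system (the user name "webapp" is a placeholder; exact commands vary by distribution):
sudo useradd --system --shell /usr/sbin/nologin webapp   # dedicated account with no login shell
sudo -u webapp python xxx.py                             # run the app as that user instead of as root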
The other two are effectively identical (in terms of what they do), but the last option (executing the script directly) will require:
the executable bit to be set (on unix-y systems), e.g. chmod +x whatever.py
a shebang on the first line (e.g. #!/usr/bin/python) pointing to the python executable that you want to run things with (again, this only applies to unix-y systems)
Calling python to run the code (python whatever.py) and following the steps above (resulting in a script that you can call directly with whatever.py) do exactly the same thing (assuming that the shebang in the python file points to the same python executable as "python" does, anyway...).
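A quick sketch of option 3 end to end on a unix-y system (the file content is a placeholder; point the shebang at whichever interpreter you actually want):
printf '#!/usr/bin/python\nprint("hello")\n' > xxx.py   # shebang on line 1, then the script body
chmod +x xxx.py                                          # set the executable bit
./xxx.py                                                 # now runs without typing "python" first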
