Python script: git checkout prior to running

I've started to use GitHub to manage the development process of production scripts that run daily on my workstation (via cron).
One way to make sure that the latest valid production version runs would be to run a git checkout inside the production directory moments prior to running the target script. I was wondering if this can be done from within the production script (i.e. check whether this is the latest version; if not, git checkout; if yes, do nothing and run).

It is certainly possible to do this, e.g. (untested):
git fetch &&
ahead=$(git rev-list --count master..origin/master) &&
case "$ahead" in
0) ;;  # up to date: run normally
*) echo "I seem to be out of date"
   git merge --ff-only || { echo "update failed, quitting"; exit 1; }
   exec <path-to-script>;;
esac
# ... normal part of script here
But this is also almost certainly the wrong approach. Instead of doing that, schedule a job—a script—that consists of:
git fetch && git merge --ff-only && exec <path-to-script>
This script can live in the same repository. It's a separate script whose job is to update in place—which, if there's nothing to do, is a no-op (it says "Already up to date." and then exits 0 = success)—and then run the other script, whether it's updated or not. This provides a clean separation of purpose: one script updates; one script runs; there's no weird mix of self-update-and-oops-now-I-have-to-quit-because-maybe-my-code-is-different.
Note that adding --quiet to the git merge --ff-only suppresses the "Already up to date." message, which may be helpful if your version of cron emails you the output when there is output. (If your version of cron doesn't do this, it probably should be upgraded to one that does.) So you probably really want:
git fetch && git merge --ff-only --quiet && exec <path-to-script>
The fetch followed by merge is what git pull does by default, but git pull is a program meant to be run by a human. Git divides its various programs into so-called porcelain and plumbing, and the porcelain commands are the ones meant for humans, while the plumbing ones are meant for writing scripts. Git's division here is quite imperfect: some commands are both plumbing and porcelain, and there are some varieties that are missing (e.g., git log is porcelain, but there are no plumbing commands for some of what it does)—but to the extent that you can, it's usually wise to stick to this pattern.
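Since the question asks about doing this from within Python, here is a minimal sketch of the same wrapper pattern as a Python cron entry (the repository path and target script name are placeholders, not from the question):
#!/usr/bin/env python3
# Update-then-run wrapper: fetch, fast-forward-only merge, then hand off.
import os
import subprocess
import sys

REPO_DIR = "/path/to/production"              # placeholder: production checkout
TARGET = os.path.join(REPO_DIR, "script.py")  # placeholder: the real script

subprocess.run(["git", "fetch"], cwd=REPO_DIR, check=True)
subprocess.run(["git", "merge", "--ff-only", "--quiet"], cwd=REPO_DIR, check=True)
# Replace this process with the (possibly updated) target script,
# mirroring the exec in the shell version above.
os.execv(sys.executable, [sys.executable, TARGET])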

In case it might be useful, here's the Python script I am using now. I call it from vim immediately after a commit.
#!/usr/bin/env python3
"""
This script syncs the development -> production branches.
Both directories are connected to the same repository.
Assume projectP (for production) and projectD (for development).
"""
####################################################
# Modules
####################################################
import git  # GitPython
####################################################
# Globals
####################################################
theProjectD = "path_to_projectD"
theProjectP = "path_to_projectP"
####################################################
# Code
####################################################
# Push develop, then merge develop into main and push, from the develop directory.
repo = git.Repo(theProjectD)
repo.remotes.origin.push()
repo.git.checkout('main')
repo.git.merge('develop')
repo.remotes.origin.push()
repo.git.checkout('develop')
# Pull the latest version of main in the production directory
# (git pull already implies a fetch, so a separate fetch is redundant).
repo = git.Repo(theProjectP)
repo.remotes.origin.pull()


Check latest commit with Python in Linux/Windows/Mac

I'm trying to write a simple Python script that:
checks if it is running inside a git folder
if so, fetches the latest commit; otherwise skips
it should work on all three platforms: Linux, Windows, and Mac
I have this code that works correctly under Linux:
import os
import subprocess
from subprocess import call, STDOUT

if call(["git", "branch"], stderr=STDOUT, stdout=open(os.devnull, 'w')) != 0:
    # Not a git folder
    commit = ''
else:
    # Inside a git folder: fetch the latest commit
    commit = subprocess.check_output(['git', 'rev-parse', 'HEAD'])
print(commit)
but I have no way of checking if it will work under Windows and Mac.
Does it work? Is there any way of checking/knowing this sort of thing when one has no access to the other operating systems?
You don't want to run git branch to detect whether you're in a Git repository, because you may or may not have any branches. To detect whether you're able to use Git commands, you'll want to run something like git rev-parse --git-dir, which will exit non-zero if you're not within a Git repository.
However, there are a couple of other issues with your code. First of all, in a new repository (one created fresh with git init), there will be a .git directory and the above command will succeed, but HEAD will not point anywhere. Therefore, your git rev-parse HEAD command will fail and print HEAD and an error.
Finally, if you want to parse a revision, you should usually use --verify so that you don't print the dummy HEAD value on failure. So your invocation should look like git rev-parse --verify HEAD.
Ultimately, it's up to you to figure out what you want to do in a newly initialized repository, whether that's fail or fall back to an empty string.
The behaviors I've described here are consistent across platforms; they're built into Git and well defined.
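Putting that advice together, here's a minimal sketch in Python (the helper name and the use of subprocess.run are illustrative, not part of the original code):
import subprocess

def latest_commit():
    # Detect whether we're inside a Git repository at all.
    probe = subprocess.run(['git', 'rev-parse', '--git-dir'],
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
    if probe.returncode != 0:
        return ''  # not a repository
    # --verify avoids printing the dummy HEAD value on failure,
    # e.g. in a freshly initialized repository with no commits yet.
    result = subprocess.run(['git', 'rev-parse', '--verify', 'HEAD'],
                            capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else ''

print(latest_commit())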
There's a check_output method in the subprocess library:
from subprocess import check_output
try:
    # Use Python to parse this log for info. This is your entire last commit.
    logs = check_output(['git', 'log', '-1', '--stat']).decode("UTF-8")
except Exception as e:
    # Do whatever you want to do otherwise, if this is not a git repository.
    print(e)
Git has a command called git log.
-1 indicates the last commit, and
--stat will give you the files that were changed, the commit ID, the time, etc.
Then you can use Python to parse this log and retrieve any information you want.
See the git log documentation for more info.
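For instance, a small sketch of the parsing step (assuming git's default, English-language log format; the variable names are illustrative):
import subprocess

try:
    logs = subprocess.check_output(['git', 'log', '-1', '--stat'], text=True)
except subprocess.CalledProcessError:
    logs = ''

if logs:
    # In the default format the first line reads "commit <sha>".
    commit_id = logs.splitlines()[0].split()[1]
    print(commit_id)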

Best practice for triggering a docker build if requirements.txt hasn't changed [duplicate]

I have a few RUN commands in my Dockerfile that I would like to run with --no-cache each time I build a Docker image.
I understand the docker build --no-cache will disable caching for the entire Dockerfile.
Is it possible to disable cache for a specific RUN command?
There's always an option to insert some meaningless and cheap-to-run command before the region you want to disable cache for.
As proposed in this issue comment, one can add a build argument block (name can be arbitrary):
ARG CACHEBUST=1
before such region, and modify its value each run by adding --build-arg CACHEBUST=$(date +%s) as a docker build argument (value can also be arbitrary, here it is current datetime, to ensure its uniqueness across runs).
This will, of course, disable cache for all following blocks too, as hash sum of the intermediate image will be different, which makes truly selective cache disabling a non-trivial problem, taking into account how docker currently works.
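A minimal sketch of the placement (the base image and the commands around the ARG are illustrative):
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y git  # cached as usual
ARG CACHEBUST=1
# Every layer from here on is rebuilt whenever CACHEBUST changes.
RUN git clone https://github.com/username/repo_name.git
Then build with docker build --build-arg CACHEBUST=$(date +%s) . as described above.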
Use
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
before the RUN line you want to always run. This works because ADD will always fetch the file/URL and the above URL generates random data on each request, Docker then compares the result to see if it can use the cache.
I have also tested this and it works nicely, since it does not require any additional Docker command-line arguments and also works from a docker-compose.yaml file :)
If your goal is to include the latest code from Github (or similar), one can use the Github API (or equivalent) to fetch information about the latest commit using an ADD command.
docker build will always fetch a URL from an ADD command, and if the response is different from the one received the last time docker build ran, it will not use the subsequent cached layers.
eg.
ADD "https://api.github.com/repos/username/repo_name/commits?per_page=1" latest_commit
RUN curl -sLO "https://github.com/username/repo_name/archive/main.zip" && unzip main.zip
As of February 2016 it is not possible.
The feature has been requested at GitHub
Not directly, but you can divide your Dockerfile into several parts, build an image, then use FROM thisimage at the beginning of the next Dockerfile, and build that image with or without caching.
The feature was added a week ago:
ARG FOO=bar
FROM something
RUN echo "this won't be affected if the value of FOO changes"
ARG FOO
RUN echo "this step will be executed again if the value of FOO changes"
FROM something-else
RUN echo "this won't be affected because this stage doesn't use the FOO build-arg"
https://github.com/moby/moby/issues/1996#issuecomment-550020843
Building on @Vladislav's solution above, I used in my Dockerfile
ARG CACHEBUST=0
to invalidate the build cache from hereon.
However, instead of passing a date or some other random value, I call
docker build --build-arg CACHEBUST=`git rev-parse ${GITHUB_REF}` ...
where GITHUB_REF is a branch name (e.g. main) whose latest commit hash is used. That means that docker’s build cache is being invalidated only if the branch from which I build the image has had commits since the last run of docker build.
I believe that this is a slight improvement on @steve's answer, above:
RUN git clone https://sdk.ghwl;erjnv;wekrv;qlk@gitlab.com/your_name/your_repository.git
WORKDIR your_repository
# Calls for a random number to break the caching of the git clone
# (https://stackoverflow.com/questions/35134713/disable-cache-for-specific-run-commands/58801213#58801213)
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
RUN git pull
This uses the Docker cache of the git clone, but then runs an uncached update of the repository.
It appears to work, and it is faster - but many thanks to @steve for providing the underlying principles.
Another quick hack is to write some random bytes before your command
RUN head -c 5 /dev/random > random_bytes && <run your command>
writes out 5 random bytes which will force a cache miss

Bash Process Execution Logic

I have a number of scripts to run, some of which have one or more scripts that must be completed first. I've read a number of examples showing how bash's control operators work, but haven't found any good examples that address the complexity of the logic I'm trying to implement.
I have p_01.py and p_03.py that are both requirements for p_09.py, but also have individual processes that only require p_01. For example:
((python p_01.py & python p_03.py) && python p_09.py) &
(python p_01.py &&
  (
    (python p_05.py;
     python p_10.py) &
    (python p_08.py;
     python p_11.py)
  )
)
wait $(jobs -p)
My question is, how can I accomplish all of the scripts running only after their requirements, without repeating the running scripts (such as p_01.py, which you'll notice is used twice above)? I'm looking for a generalized answer with some detail, since in actuality the dependencies are more numerous/nested than the example above. Thank you!
If you are thinking of the scripts in terms of their dependencies, that's difficult to translate directly to a master script. Consider using make, which would let you express these dependencies directly:
SCRIPTS = $(wildcard *.py)

.PHONY: all
all: $(SCRIPTS)

$(SCRIPTS):
	python $@

p_05.py p_08.py p_09.py: p_01.py
p_09.py: p_03.py
p_10.py: p_05.py
p_11.py: p_08.py
Running make -B -j4 would run all of the Python scripts with up to 4 executing in parallel at any one time.
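If you'd rather stay in pure Python, here is a minimal sketch of the same scheduling idea (the dependency table mirrors the Makefile above; it runs in waves, which is coarser-grained than make's per-target scheduling):
from concurrent.futures import ThreadPoolExecutor
import subprocess

# Dependency table mirroring the Makefile above.
deps = {
    'p_01.py': [], 'p_03.py': [],
    'p_05.py': ['p_01.py'], 'p_08.py': ['p_01.py'],
    'p_09.py': ['p_01.py', 'p_03.py'],
    'p_10.py': ['p_05.py'], 'p_11.py': ['p_08.py'],
}

def run(script):
    subprocess.run(['python', script], check=True)

done, pending = set(), dict(deps)
with ThreadPoolExecutor(max_workers=4) as pool:
    while pending:
        # Everything whose prerequisites have all completed can run now.
        ready = [s for s, d in pending.items() if set(d) <= done]
        if not ready:
            raise RuntimeError('circular dependency among %s' % sorted(pending))
        for s in ready:
            del pending[s]
        # Run this wave in parallel; list() drains the iterator so errors surface.
        list(pool.map(run, ready))
        done.update(ready)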

Rebasing Git Repo with Python

Is there a way to use Python to rebase a repo from one on GitHub, then push the result, as well as detect whether the rebase failed as a result of conflicts that need to be resolved?
Git is primarily a command-line tool. Once installed, you should be able to open up a console, command prompt, PowerShell, C shell, bash shell, etc. and just type git to get a list of available git commands.
Once you have Git set up and working this way, then from Python it would be possible to execute git commands in the same way you would execute any other shell commands. I'm not a Python expert, but ElpieKay suggests in the comments to use:
commands.getstatusoutput("git <command>")
(Note that the commands module is Python 2 only; in Python 3 the equivalent is subprocess.getstatusoutput.) You will need to do a separate search for git rebase specifically and figure out how its output is formatted, then parse it to determine success; possibly there is an exit code or stderr output that you can get through getstatusoutput or a similar function.
Another thing that may help is looking at the man page for rebase with git rebase --help.
Summary
I recommend doing a search to find out more about the Python commands library (or subprocess) and shell interaction from Python in general, and then a separate set of searches/research to determine exactly how to invoke the git rebase command and what its output format is, to determine what you need to parse for success or failure.
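As a starting point, here is a minimal sketch using subprocess (the upstream ref and the abort-on-conflict policy are assumptions, not from the question):
import subprocess

def rebase_and_push(repo_dir, upstream='origin/main'):
    # Make sure the upstream ref is current before rebasing.
    subprocess.run(['git', 'fetch'], cwd=repo_dir, check=True)
    result = subprocess.run(['git', 'rebase', upstream], cwd=repo_dir,
                            capture_output=True, text=True)
    if result.returncode != 0:
        # A non-zero exit usually means conflicts; abort to restore the
        # branch and report failure so a human can resolve them.
        subprocess.run(['git', 'rebase', '--abort'], cwd=repo_dir)
        return False
    # History was rewritten, so a plain push would be rejected.
    subprocess.run(['git', 'push', '--force-with-lease'],
                   cwd=repo_dir, check=True)
    return True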

Make (install from source) python without running tests

I'm compiling Python from a source tarball. Everything works, but the tests run for two hours, and twice. How can I bypass these tests?
0:16:20 [178/405] test_inspect
0:16:26 [179/405] test_int
0:16:27 [180/405] test_int_literal
0:16:27 [181/405] test_io
0:18:18 [182/405] test_ioctl -- test_io passed in 1 min 51 sec
0:18:19 [183/405] test_ipaddress
0:18:22 [184/405] test_isinstance
0:18:23 [185/405] test_iter
0:18:24 [186/405] test_iterlen
0:18:25 [187/405] test_itertools
0:19:09 [188/405] test_json -- test_itertools passed in 44 sec
0:19:30 [189/405] test_keyword
As a result:
make 7724,86s user 188,63s system 101% cpu 2:10:18,93 total
I build the distribution like this:
PYTHON_VERSION = 3.6.1
PYTHON_URL = https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tar.xz
wget -O dl/Python-${PYTHON_VERSION}.tar.xz ${PYTHON_URL}
cd dl
tar xf Python-${PYTHON_VERSION}.tar.xz
mkdir -p dl/Python-${PYTHON_VERSION}-build/
cd Python-${PYTHON_VERSION}
./configure --enable-optimizations --prefix=$$(pwd)-build --cache-file=$$(pwd)/cache-file
These commands run the tests twice:
make -C dl/Python-${PYTHON_VERSION} -j8
make -C dl/Python-${PYTHON_VERSION} -j8 install
P.S. This is part of another Makefile.
The configure option --enable-optimizations enables running the test suite to generate data for profiling Python. The resulting Python binary has better performance when executing Python code. Improvements are noted here.
From configure help:
--enable-optimizations Enable expensive optimizations (PGO, etc). Disabled by default.
From Wikipedia:
profile-guided optimisation uses the results of profiling test runs of the instrumented program to optimize the final generated code.
In short, you should not skip tests when using --enable-optimizations, as the data required for profiling is generated by running them.
You can run make -j8 build_all followed by make -j8 install to skip tests once (the tests would still run with the install target), but that would defeat the purpose.
You can instead drop the configure flag for better build times.
Just build and install with:
make -j8 build_all
make -j8 altinstall
I did some (quick) research on skipping the test runs when building Python by instructing either:
configure - passing some args (e.g. --without-tests, --disable-tests, --skip-tests)
make - specifying some variable (either via env vars or cmdline)
The former yielded no results. The latter (by looking in the Makefile template) revealed that test execution is invoked by calling ${PYTHON_SRC_DIR}/Tools/scripts/run_tests.py (which sets some things up and calls another script, which calls another one, ...). Note that I found the file in Python 3.5(.4) and Python 3.6(.4), but not in Python 2.7(.14). A little more research revealed that it is possible to skip the (above) test run. What you need to do is:
make -C dl/Python-${PYTHON_VERSION} -j8 EXTRATESTOPTS=--list-tests install
Notes:
Googling didn't reveal anything (relevant) about EXTRATESTOPTS, so I guess it's not officially supported
You could also set EXTRATESTOPTS=--list-tests as an environment variable before launching the (inner) make
Needless to say, if some "minor" error happened during the build (e.g. a non-critical external module, like _ssl.so for example, failed to build), there will be no tests to fail, so you'll only find out about it at runtime (which would be terribly nasty if it happened in production)
#EDIT0:
After @amohr's comment, I decided to play a little bit more, so I ran the whole process:
configure (opts)
make (opts)
make install
on a Linux (Ubuntu 16) machine with 2 CPUs, where one (full) test run takes ~24 minutes. Here are my findings (Python 3.6):
It ran successfully on Python 3.5(.4)
The solution that I suggested earlier operates at the 3rd step, so it only skips the 2nd test run: it operates on the (root) Makefile's test target (make test), which is invoked by the install target
Regarding the 1st test run, by checking the Makefile and make's output, here's what I discovered happens at the 2nd (make) step:
The C sources are built "normally"
Tests are being run (I deduced that some profile data is stored somewhere)
The C sources are rebuilt with different flags (e.g. in my case gcc's -fprofile-generate was replaced by -fprofile-use -fprofile-correction (check [GNU.GCC]: Options That Control Optimization for more details)) to make use of the profile info generated at previous (sub) step
Skipping the 1st test run automatically implies no optimizations. Ways of achieving this:
make build_all (at the 2nd step) - as suggested by other answers
Here's a snippet of the (root) Makefile generated by configure (with --enable-optimizations):
all: profile-opt
build_all: check-clean-src $(BUILDPYTHON) oldsharedmods sharedmods gdbhooks \
Programs/_testembed python-config
And here's one without it:
all: build_all
build_all: check-clean-src $(BUILDPYTHON) oldsharedmods sharedmods gdbhooks \
Programs/_testembed python-config
As seen, running:
configure --enable-optimizations
make build_all
is identical to:
configure
make
Manually modifying the (root) Makefile between the 1st (configure --enable-optimizations) and 2nd (make) steps:
Find the macro definition PROFILE_TASK=-m test.regrtest --pgo (for me it was around line ~250)
Add --list-tests at the end
Substeps (#2.)#1. and (#2.)#3. are exactly the same, while for (#2.)#2., the tests are not being run. That can mean that either:
The 2nd sources build is identical to the 1st one (which would make it completely useless)
The 2nd does some optimizations (without having any information), which means that it could crash at runtime (I think / hope it's the former case)
The default build target for optimized builds includes running the tests.
To skip them, try:
make -C dl/Python-${PYTHON_VERSION} -j8 build_all
We faced the same problem with Python 3.7.6 and, based on reverse-engineering the Makefile, found that the following steps will build Python quickly while still running the tests (so that we don't lose the benefit of the --enable-optimizations flag):
cd /home/ubuntu
PYTHON_VER=3.7.6
wget https://www.python.org/ftp/python/$PYTHON_VER/Python-$PYTHON_VER.tgz
tar xvf Python-$PYTHON_VER.tgz
cd Python-$PYTHON_VER/
./configure --enable-loadable-sqlite-extensions --enable-optimizations
make profile-gen-stamp; ./python -m test.regrtest --pgo -j8; make build_all_merge_profile; touch profile-run-stamp; make
make install
The key is to run the Python tests in parallel by passing -j8 to them.
The first part of the tests is required for optimizing the code; you simply can't / should not skip it.
