I am a complete Python beginner and am trying to write a Python script to automate the setup of an SDK on Linux machines from remote GitHub repositories.
The script starts by performing some basic preliminary operations, especially the check/setup of several packages (git, docker, pip, etc.).
For now, I target Debian (Stretch, Buster), CentOS (6, 7) and Ubuntu Server 18.04 LTS.
Of course, I want the script to run on the widest range of Linux machines.
Today I rely on the available package managers (apt-get and yum), invoked through subprocess.call() statements.
I customize the related commands using nasty script configuration variables like below:
import platform
import subprocess

distribution = platform.dist()[0]
version = platform.dist()[1]

if distribution == 'debian':
    pkgInstaller     = 'dpkg'
    pkgManager       = 'apt-get'
    checkIfInstalled = '-s'
    installPackage   = 'install'
    yesToAll         = '-y'
    dockerPackage    = 'docker-ce'
elif distribution == 'centos':
    pkgInstaller     = 'rpm'
    pkgManager       = 'yum'
    checkIfInstalled = '-q'
    installPackage   = 'install'
    yesToAll         = '-y'
    dockerPackage    = 'docker'
I then simply loop over a list containing the names of the packages to be installed, and run the command through subprocess.call():
prerequisites = ['git', dockerPackage, 'doxygen', 'python2-pip']

for pkg in prerequisites:
    pkgInstallation = subprocess.call(['sudo', pkgManager, yesToAll, installPackage, pkg])
While this approach may have the benefit of not depending too much on third-party Python modules, I guess there are... some smarter ways of doing such a simple operation?
Usually when doing switch statements like this, a dictionary might be a bit more useful. Also, normally I'm not one to try to PEP-8 things, but this is an instance where PEP-8 might really help your readability by not matching up your equals signs for all of your lines of code.
The dict will hold your distro as the key, and your vars as a value wrapped in a tuple:
import platform
import subprocess

options = {
    'debian': ('dpkg', 'apt-get', '-s', 'install', '-y', 'docker-ce'),
    'centos': ('rpm', 'yum', '-q', 'install', '-y', 'docker'),
}

# unpack this function call here
distribution, version, *_ = platform.dist()

# now get the match
pkg_installer, pkg_manager, check, install_pkg, yes_to_all, docker = options[distribution]

requisites = ['git', docker, 'doxygen', 'python2-pip']

for pkg in requisites:
    pkg_installation = subprocess.call(['sudo', pkg_manager, yes_to_all, install_pkg, pkg])
The options[distribution] call will raise a KeyError for unsupported distributions, so you can probably catch that and raise something a bit more useful like:
try:
    pkg_installer, pkg_manager, check, install_pkg, yes_to_all, docker = options[distribution]
except KeyError as e:
    raise ValueError(f"Got unsupported OS, expected one of {', '.join(options.keys())}") from e
To make it less verbose, note that only pkg_manager, yes_to_all and install_pkg end up in the install command (pkg_installer and check are for querying packages, not installing them), so you can build the command prefix once instead of rebuilding it in the loop:
try:
    pkg_installer, pkg_manager, check, install_pkg, yes_to_all, docker = options[distribution]
except KeyError as e:
    raise ValueError(f"Got unsupported OS, expected one of {', '.join(options.keys())}") from e

install_cmd = ['sudo', pkg_manager, yes_to_all, install_pkg]

requisites = ['git', docker, 'doxygen', 'python2-pip']

for pkg in requisites:
    pkg_installation = subprocess.call([*install_cmd, pkg])
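Since check ('-s' for dpkg, '-q' for rpm) is otherwise unused, here is a hedged sketch of how it could be used to skip packages that are already present. The is_installed helper name is illustrative, and this assumes the usual dpkg -s / rpm -q behaviour of exiting non-zero when the package is not installed:

def is_installed(pkg):
    # dpkg -s <pkg> and rpm -q <pkg> exit with 0 (roughly) only when the package is installed
    return subprocess.call([pkg_installer, check, pkg],
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0

for pkg in requisites:
    if not is_installed(pkg):
        subprocess.call([*install_cmd, pkg])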
(First off, apologies for the roughness of this question's writing; I would love any constructive feedback.)
OK, what I'm doing is a bit involved: I'm trying to make a Python script that executes Bash scripts that each compile a component of a Linux From Scratch (LFS) system. I'm following the LFS 11.2 book pretty closely (but not 100%, although I've been careful to check where my deviations break things; if you're familiar with LFS, this is a deviation that breaks things).
Basically, my script builds a bunch of tools (bash, tar, xz, make, gcc, binutils) with a cross compiler, and tells their build systems to install them into a directory lfs/temp-tools. Then the script calls os.chroot('lfs') to chroot into the lfs directory, and immediately resets all the environment variables (most importantly PATH) with:
os.environ = {"PATH" : "/usr/bin:/usr/sbin:/temp-tools/bin", ...other trivial stuff like HOME...}
But after the chroot, my calls of
subprocess.run([f"{build_script_path} >{log_file_path} 2>&1"], shell=True)
are failing with FileNotFoundError: [Errno 2] No such file or directory: '/bin/sh', even though:
- bin/sh in the chroot directory is a symlink to bash
- there's a perfectly good copy of bash in /temp-tools/bin
- calling print(os.environ) after the Python chroot shows /temp-tools/bin is in PATH
I thought maybe subprocess.run was stuck using the old environment variables from before I reset them upon entering the chroot, but adding env=os.environ to subprocess.run does not help. :/ I'm stuck for now.
For context if it helps, here is where the subprocess.run call gets made:
def vanilla_build(target_name, src_dir_name=None):
    def f():
        nonlocal src_dir_name
        if src_dir_name == None:
            src_dir_name = target_name
        tarball_path = find_tarball(src_dir_name)
        src_dir_path = tarball_path.split(".tar")[0]
        if "tcl" in src_dir_path:
            src_dir_path = src_dir_path.rsplit("-",1)[0]
        snap1 = lfs_dir_snapshot()
        os.chdir(os.environ["LFS"] + "srcs/")
        subprocess.run(["tar", "-xf", tarball_path], check=True, env=os.environ)
        os.chdir(src_dir_path)
        build_script_path = f"{os.environ['LFS']}root/build-scripts/{target_name.replace('_','-')}.sh"
        log_file_path = f"{os.environ['LFS']}logs/{target_name}"

        ####### The main call #######
        proc = subprocess.run([f"{build_script_path} >{log_file_path} 2>&1"],
                              shell=True, env=os.environ)

        subprocess.run(["rm", "-rf", src_dir_path], check=True)
        if proc.returncode != 0:
            red_print(build_script_path + " failed!")
            return
        tracked_file_record_path = f"{os.environ['LFS']}logs/tracked/{target_name}"
        with open(tracked_file_record_path, 'w') as f:
            new_files = lfs_dir_snapshot() - snap1
            f.writelines('\n'.join(new_files))

    f.__name__ = "build_" + target_name
    return f
And how I enter the chroot:
def enter_chroot():
    os.chdir(os.environ["LFS"])
    os.chroot(os.environ["LFS"])
    os.environ = {"HOME": "/root",
                  "TERM": os.environ["TERM"],
                  "PATH": "/usr/bin:/usr/sbin:/temp-tools/bin",
                  "LFS": '/'}
Thank you! In the meantime I'm going to chop away as much code as possible to isolate the problem, to either understand whatever I'm not getting or rewrite this question to be less context-specific.
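One quick diagnostic (a minimal sketch assuming the layout described above, not a fix): after os.chroot(), check whether the /bin/sh symlink actually resolves to an existing file inside the new root, since FileNotFoundError for /bin/sh with shell=True can also mean a dangling symlink (or a missing ELF interpreter for bash) rather than a PATH problem:

import os

# Run after os.chroot(...): these paths now resolve inside the new root.
print(os.path.lexists("/bin/sh"))      # does the /bin/sh entry itself exist?
if os.path.islink("/bin/sh"):
    print(os.readlink("/bin/sh"))      # where the symlink points
print(os.path.exists("/bin/sh"))       # False if the symlink target is missing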
I was trying to run through some of the examples from pyinfra's docs to see if this tool would work for me (so I could manage some servers using good old Python). I even found that they had an example for installing VirtualBox on a host machine, but I can't get even the example working. I'm getting a weird error that makes me think there's been an update to the tool since the examples, and even the docs, were last updated. Just wondering if anyone else has a way to get this to work with pyinfra (currently trying with version 1.5).
The example code:
from pyinfra import config, host
from pyinfra.facts.server import LinuxDistribution, LinuxName, OsVersion
from pyinfra.operations import apt, python, server

config.SUDO = True

virtualbox_version = '6.1'


def verify_virtualbox_version(state, host, version):
    command = '/usr/bin/virtualbox --help'
    status, stdout, stderr = host.run_shell_command(state, command=command, sudo=config.SUDO)
    assert status is True  # ensure the command executed OK
    if version not in str(stdout):
        raise Exception('`{}` did not work as expected.stdout:{} stderr:{}'.format(
            command, stdout, stderr))


if host.get_fact(LinuxName) == 'Ubuntu':
    code_name = host.get_fact(LinuxDistribution)['release_meta'].get('DISTRIB_CODENAME')

    apt.packages(
        name='Install packages',
        packages=['wget'],
        update=True,
    )

    apt.key(
        name='Install VirtualBox key',
        src='https://www.virtualbox.org/download/oracle_vbox_2016.asc',
    )

    apt.repo(
        name='Install VirtualBox repo',
        src='deb https://download.virtualbox.org/virtualbox/debian {} contrib'.format(code_name),
    )

    # install kernel headers
    # Note: host.get_fact(OsVersion) is the same as `uname -r` (ex: '4.15.0-72-generic')
    apt.packages(
        {
            'Install VirtualBox version {} and '
            'kernel headers for {}'.format(virtualbox_version, host.get_fact(OsVersion)),
        },
        [
            'virtualbox-{}'.format(virtualbox_version),
            'linux-headers-{}'.format(host.get_fact(OsVersion)),
        ],
        update=True,
    )

    server.shell(
        name='Run vboxconfig which will stop/start VirtualBox services and build kernel modules',
        commands='/sbin/vboxconfig',
    )

    python.call(
        name='Verify VirtualBox version',
        function=verify_virtualbox_version,
        version=virtualbox_version,
    )
Here is the error that I'm seeing:
File "install_virtualbox.py", line 21, in <module>
if host.get_fact(LinuxName) == 'Ubuntu':
AttributeError: module 'pyinfra.api.host' has no attribute 'get_fact'
I checked the source and I can't argue with my editor when it says it 'cannot find a reference to config' or 'init', as I don't see them either. But then how is anyone getting facts about hosts? Looks like the logic may have been moved to the 'pyinfra.api.host' and 'pyinfra.api.config' packages, but the logic there looks totally different.
Hoping I'm just missing something that someone who's been using the tool can help explain to me.
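One quick sanity check (just a sketch, not a fix): print which pyinfra version is actually installed, since the example may target a different release than the one on the machine:

# Sketch: confirm the installed pyinfra version
# (importlib.metadata is available on Python 3.8+).
from importlib.metadata import version

print(version("pyinfra"))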
Without getting confused, there are tons of questions about installing packages, how to import the resulting modules, and listing what packages are available. But there doesn't seem to be the equivalent of a "--what-provides" option for pip, if you don't have a requirements.txt or a Pipenv setup. This question is similar to a previous one, but asks for the parent package rather than additional metadata. That said, these other questions did not get a lot of attention or many accepted answers, e.g. How do you find python package metadata information given a module. So forging ahead...
By way of example, there are (at least) two packages that will install a module called "serial", namely "pyserial" and "serial". So assuming that one of those packages is installed, we might find it by using pip list:
python3 -m pip list | grep serial
However, the problem comes in if the name of the package does not match the name of the module, or if you just want to find out what package to install, working on a legacy server or dev machine.
You can check the path of the imported module - which can give you a clue. But continuing the example...
>>> import serial
>>> print(serial.__file__)
/usr/lib/python3.6/site-packages/serial/__init__.py
It is in a "serial" directory, but only pyserial is in fact installed, not serial:
> python3 -m pip list | grep serial
pyserial 3.4
The closest I can come is to generate a requirements.txt via "pipreqs ./", which may fail on a dependent child file (as it does with me), or to reverse-check dependencies via pipenv (which brings a whole set of new issues along to get it all set up):
> pipenv graph --reverse
cymysql==0.9.15
ftptool==0.7.1
netifaces==0.10.9
pip==20.2.2
PyQt5-sip==12.8.1
  - PyQt5==5.15.0 [requires: PyQt5-sip>=12.8,<13]
setuptools==50.3.0
wheel==0.35.1
Does anyone know of a command that I have missed for a simple solution to finding what pip package provides a particular module?
Use the packages_distributions() function from importlib.metadata (or importlib-metadata). So for example, in your case where serial is the name of the "import package":
import importlib.metadata # or: `import importlib_metadata`
importlib.metadata.packages_distributions()['serial']
This should return a list containing pyserial, which is the name of the "distribution package" (the name that should be used to pip-install).
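As a small hedged sketch of how this can be wrapped up (the providers helper name is mine; packages_distributions() requires Python 3.10+, or the importlib-metadata backport on older interpreters):

from importlib.metadata import packages_distributions  # Python 3.10+, else use importlib_metadata

def providers(import_name):
    """Return the distribution package(s) that install the given import name."""
    return packages_distributions().get(import_name, [])

print(providers("serial"))  # e.g. ['pyserial'] when pyserial is installed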
References
https://importlib-metadata.readthedocs.io/en/stable/using.html#package-distributions
https://github.com/python/importlib_metadata/pull/287/files
For older Python versions and/or older versions of importlib-metadata...
I believe something like the following should work:
#!/usr/bin/env python3

import importlib.util
import pathlib

import importlib_metadata


def get_distribution(file_name):
    result = None
    for distribution in importlib_metadata.distributions():
        try:
            relative = (
                pathlib.Path(file_name)
                .relative_to(distribution.locate_file(''))
            )
        except ValueError:
            pass
        else:
            if distribution.files and relative in distribution.files:
                result = distribution
                break
    return result


def alpha():
    file_name = importlib.util.find_spec('serial').origin
    distribution = get_distribution(file_name)
    print("alpha", distribution.metadata['Name'])


def bravo():
    import serial
    file_name = serial.__file__
    distribution = get_distribution(file_name)
    print("bravo", distribution.metadata['Name'])


if __name__ == '__main__':
    alpha()
    bravo()
This is just an example of code showing how to get the metadata of the installed project a specific module belongs to.
The important bit is the get_distribution function, it takes a file name as an argument. It could be the file name of a module or package data. If that file name belongs to a project installed in the environment (via pip install for example) then the importlib.metadata.Distribution object is returned.
Edit 2023/01/31: This issue is now solved via the importlib_metadata library. See Provide mapping from "Python packages" to "distribution packages"; specifically, "Note 2" deals with this exact issue. As such, see the comments by @sinoroc: you can locate the package (e.g. package "pyserial" providing module "serial") with something like this:
>>> import importlib_metadata
>>> print(importlib_metadata.packages_distributions()['serial'])
['pyserial']
Building on @sinoroc's much-published answer, I came up with the following code (incorporating the mentioned importlib.util.find_spec method, but with a bash-based search against the RECORD file in the path returned). I also tried to implement @sinoroc's version, but was not successful. Both methods are included to demonstrate.
Run as "python3 python_find-module-package.py -m [module-name-here] -d", which will also print debug. Leave off the "-d" switch to get just the package name returned (and errors).
TLDR: Code on github.
#!/usr/bin/python3
import sys
import os.path
import importlib.util
import importlib_metadata
import pathlib
import subprocess
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-m", "--module", help="Find matching package for the specified Python module",
                    type=str)
#parser.add_argument("-u", "--username", help="Database username",
#                    type=str)
#parser.add_argument("-p", "--password", help="Database password",
#                    type=str)
parser.add_argument("-d", "--debug", help="Debug messages are enabled",
                    action="store_true")
args = parser.parse_args()

TESTMODULE='serial'


def debugPrint (message="Nothing"):
    if args.debug:
        print ("[DEBUG] %s" % str(message))


class application ():
    def __init__(self, argsPassed):
        self.argsPassed = argsPassed
        debugPrint("Got these arguments:\n%s" % (argsPassed))

    def run (self):
        #debugPrint("Running with args:\n%s" % (self.argsPassed))
        try:
            if self.argsPassed.module is not None:
                self.moduleName=self.argsPassed.module #i.e. the module that you're trying to match with a package.
            else:
                self.moduleName=TESTMODULE
                print("[WARN] No module name supplied - defaulting to %s!" % (TESTMODULE))
            self.location=importlib.util.find_spec(self.moduleName).origin
            debugPrint(self.location)
        except:
            print("[ERROR] Parsing module name!")
            exit(1)
        try:
            self.getPackage()
        except Exception as e:
            print ("[ERROR] getPackage failed: %s" % str(e))
        try:
            distResult=self.getDistribution(self.location)
            self.packageStrDist=distResult.metadata['Name']
            print(self.packageStrDist)
        except Exception as e:
            print ("[ERROR] getDistribution failed: %s" % str(e))
        debugPrint("Parent package for \"%s\" is: \"%s\"" % (self.moduleName, self.packageStr))
        return self.packageStr

    def getPackage (self):
        locationStr=self.location.split("site-packages/",1)[1]
        debugPrint(locationStr)
        #serial/__init__.py
        locationDir=self.location.split(locationStr,1)[0]
        debugPrint(locationDir)
        #/usr/lib/python3.6/site-packages
        cmd='find \"' + locationDir + '\" -type f -iname \'RECORD\' -printf \'\"%p\"\\n\' | xargs grep \"' + locationStr + '\" -l -Z'
        debugPrint(cmd)
        #find "/usr/lib/python3.6/site-packages" -type f -iname 'RECORD' -printf '"%p"\n' | xargs grep "serial/__init__.py" -l -Z
        #return_code = os.system(cmd)
        #return_code = subprocess.run([cmd], stdout=subprocess.PIPE, universal_newlines=True, shell=False)
        #findResultAll = return_code.stdout
        findResultAll = subprocess.check_output(cmd, shell=True) # Returns stdout as byte array, null terminated.
        findResult = str(findResultAll.decode('ascii').strip().strip('\x00'))
        debugPrint(findResult)
        #/usr/lib/python3.6/site-packages/pyserial-3.4.dist-info/RECORD
        findDir = os.path.split(findResult)
        self.packageStr=findDir[0].replace(locationDir,"")
        debugPrint(self.packageStr)

    def getDistribution(self, fileName=TESTMODULE):
        result = None
        for distribution in importlib_metadata.distributions():
            try:
                relative = (pathlib.Path(fileName).relative_to(distribution.locate_file('')))
            #except ValueError:
            #except AttributeError:
            except:
                pass
            else:
                if relative in distribution.files:
                    result = distribution
        return result


if __name__ == '__main__':
    result=1
    try:
        prog = application(args)
        result = prog.run()
    except Exception as E:
        print ("[ERROR] Prog Exception: %s" % str(E))
    finally:
        sys.exit(result)

# exit the program if we haven't already
print ("Shouldn't get here.")
sys.exit(result)
My waf project has two dependencies, built with CMake.
What I'm trying to do is, following the dynamic_build3 example found in the waf git repo, create a tool which spawns CMake and, after a successful build, performs an install into waf's output subdirectory:
import os

from waflib import Context, Task
from waflib.Errors import WafError
from waflib.TaskGen import extension, feature, after_method


@extension('.txt')
def spawn_cmake(self, node):
    if node.name == 'CMakeLists.txt':
        self.cmake_task = self.create_task('CMake', node)
        self.cmake_task.name = self.target


@feature('cmake')
@after_method('process_source')
def update_outputs(self):
    self.cmake_task.add_target()


class CMake(Task.Task):
    color = 'PINK'

    def keyword(self):
        return 'CMake'

    def run(self):
        lists_file = self.generator.source[0]
        bld_dir = self.generator.bld.bldnode.make_node(self.name)
        bld_dir.mkdir()
        # process args and append install prefix
        try:
            cmake_args = self.generator.cmake_args
        except AttributeError:
            cmake_args = []
        cmake_args.append(
            '-DCMAKE_INSTALL_PREFIX={}'.format(bld_dir.abspath()))
        # execute CMake
        cmd = '{cmake} {args} {project_dir}'.format(
            cmake=self.env.get_flat('CMAKE'),
            args=' '.join(cmake_args),
            project_dir=lists_file.parent.abspath())
        try:
            self.generator.bld.cmd_and_log(
                cmd, cwd=bld_dir.abspath(), quiet=Context.BOTH)
        except WafError as err:
            return err.stderr
        # execute make install
        try:
            self.generator.bld.cmd_and_log(
                'make install', cwd=bld_dir.abspath(), quiet=Context.BOTH)
        except WafError as err:
            return err.stderr
        try:
            os.stat(self.outputs[0].abspath())
        except:
            return 'library {} does not exist'.format(self.outputs[0])
        # store the signature of the generated library to avoid re-running the
        # task without need
        self.generator.bld.raw_deps[self.uid()] = [self.signature()] + self.outputs

    def add_target(self):
        # override the outputs with the library file name
        name = self.name
        bld_dir = self.generator.bld.bldnode.make_node(name)
        lib_file = bld_dir.find_or_declare('lib/{}'.format(
            (
                self.env.cshlib_PATTERN
                if self.generator.lib_type == 'shared' else self.env.cstlib_PATTERN
            ) % name))
        self.set_outputs(lib_file)

    def runnable_status(self):
        ret = super(CMake, self).runnable_status()
        try:
            lst = self.generator.bld.raw_deps[self.uid()]
            if lst[0] != self.signature():
                raise Exception
            os.stat(lst[1].abspath())
            return Task.SKIP_ME
        except:
            return Task.RUN_ME
        return ret
I'd like to spawn the tool and then link the waf target to the installed libraries, which I perform using the "fake library" mechanism by calling bld.read_shlib():
def build(bld):
    bld.post_mode = Build.POST_LAZY

    # build 3rd-party CMake dependencies first
    for lists_file in bld.env.CMAKE_LISTS:
        if 'Chipmunk2D' in lists_file:
            bld(
                source=lists_file,
                features='cmake',
                target='chipmunk',
                lib_type='shared',
                cmake_args=[
                    '-DBUILD_DEMOS=OFF',
                    '-DINSTALL_DEMOS=OFF',
                    '-DBUILD_SHARED=ON',
                    '-DBUILD_STATIC=OFF',
                    '-DINSTALL_STATIC=OFF',
                    '-Wno-dev',
                ])
    bld.add_group()

    # after this, specifying `use=['chipmunk']` in the target does the job
    out_dir = bld.bldnode.make_node('chipmunk')
    bld.read_shlib(
        'chipmunk',
        paths=[out_dir.make_node('lib')],
        export_includes=[out_dir.make_node('include')])
I find this *very ugly* because:
- The chipmunk library is needed ONLY during the final target's link phase; there's no reason to block the whole build (by using Build.POST_LAZY mode and bld.add_group()), yet unblocking it makes read_shlib() fail. Imagine if there was also some kind of git clone task before that...
- Calling read_shlib() in the build() command implies that the caller knows how and where the tool installs the files. I'd like the tool itself to perform the call to read_shlib() (if necessary at all). But I failed to do this in run() and in runnable_status(); as suggested in paragraph 11.4.2 of the Waf Book section about custom tasks, it seems that I have to encapsulate the call to read_shlib() in ANOTHER task and put it inside the undocumented more_tasks attribute.
And here are the questions:
- How can I encapsulate the read_shlib() call in a task, to be spawned by the CMake task?
- Is it possible to let the tasks run in parallel, in a non-blocking way for other tasks (suppose a project has 2 or 3 of these CMake dependencies, which are to be fetched by git from remote repos)?
Well in fact you have already done most of the work :)
read_shlib only creates a fake task pretending to build an already existing lib. In your case, you really build the lib, so you don't need read_shlib. You can just use your cmake task generator directly, given that you've set the right parameters.
The use keyword recognizes some parameters in the used task generators:
- export_includes
- export_defines
It also manages libs and task ordering if the used task generator has a link_task.
So you just have to set export_includes and export_defines correctly in your cmake task generator, plus set a link_task attribute which references your cmake_task attribute. You must also set your cmake_task outputs correctly for this to work, i.e. the first output in the list must be the lib node (what you do in add_target seems OK). Something like:
@feature('cmake')
@after_method('update_outputs')
def export_for_use(self):
    self.link_task = self.cmake_task
    out_dir = self.bld.bldnode.make_node(self.target)
    self.export_includes = out_dir.make_node('include')
This done, you will simply write in your main wscript:
def build(bld):
    for lists_file in bld.env.CMAKE_LISTS:
        if 'Chipmunk2D' in lists_file:
            bld(
                source=lists_file,
                features='cmake',
                target='chipmunk',
                lib_type='shared',
                cmake_args=[
                    '-DBUILD_DEMOS=OFF',
                    '-DINSTALL_DEMOS=OFF',
                    '-DBUILD_SHARED=ON',
                    '-DBUILD_STATIC=OFF',
                    '-DINSTALL_STATIC=OFF',
                    '-Wno-dev',
                ])

    bld.program(source="main.cpp", use="chipmunk")
You can of course simplify/factor the code. I think add_target should not be in the task; it mainly manages task generator attributes.
I would like to include the current git hash in the output of a Python script (as the version number of the code that generated that output).
How can I access the current git hash in my Python script?
No need to hack around getting data from the git command yourself. GitPython is a very nice way to do this and a lot of other git stuff. It even has "best effort" support for Windows.
After pip install gitpython you can do
import git
repo = git.Repo(search_parent_directories=True)
sha = repo.head.object.hexsha
Something to consider when using this library; the following is taken from gitpython.readthedocs.io:
Leakage of System Resources
GitPython is not suited for long-running processes (like daemons) as it tends to leak system resources. It was written in a time where destructors (as implemented in the __del__ method) still ran deterministically.
In case you still want to use it in such a context, you will want to search the codebase for __del__ implementations and call these yourself when you see fit.
Another way assure proper cleanup of resources is to factor out GitPython into a separate process which can be dropped periodically
This post contains the command; Greg's answer contains the subprocess command.
import subprocess

def get_git_revision_hash() -> str:
    return subprocess.check_output(['git', 'rev-parse', 'HEAD']).decode('ascii').strip()

def get_git_revision_short_hash() -> str:
    return subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD']).decode('ascii').strip()
when running
print(get_git_revision_hash())
print(get_git_revision_short_hash())
you get output:
fd1cd173fc834f62fa7db3034efc5b8e0f3b43fe
fd1cd17
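If you also want to record whether the working tree had uncommitted changes at the time, here is a small illustrative addition (not part of the original answer; it assumes git is on PATH and the script runs inside the repo):

def is_git_dirty() -> bool:
    # `git status --porcelain` prints nothing when the working tree is clean
    return bool(subprocess.check_output(['git', 'status', '--porcelain']).strip())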
The git describe command is a good way of creating a human-presentable "version number" of the code. From the examples in the documentation:
With something like git.git current tree, I get:
[torvalds@g5 git]$ git describe parent
v1.0.4-14-g2414721
i.e. the current head of my "parent" branch is based on v1.0.4, but since it has a few commits on top of that, describe has added the number of additional commits ("14") and an abbreviated object name for the commit itself ("2414721") at the end.
From within Python, you can do something like the following:
import subprocess
label = subprocess.check_output(["git", "describe"]).strip()
Here's a more complete version of Greg's answer:
import subprocess
print(subprocess.check_output(["git", "describe", "--always"]).strip().decode())
Or, if the script is being called from outside the repo:
import subprocess, os
print(subprocess.check_output(["git", "describe", "--always"], cwd=os.path.dirname(os.path.abspath(__file__))).strip().decode())
Or, if the script is being called from outside the repo and you like pathlib:
import subprocess
from pathlib import Path
print(subprocess.check_output(["git", "describe", "--always"], cwd=Path(__file__).resolve().parent).strip().decode())
numpy has a nice looking multi-platform routine in its setup.py:
import os
import subprocess

# Return the git revision as a string
def git_version():
    def _minimal_ext_cmd(cmd):
        # construct minimal environment
        env = {}
        for k in ['SYSTEMROOT', 'PATH']:
            v = os.environ.get(k)
            if v is not None:
                env[k] = v
        # LANGUAGE is used on win32
        env['LANGUAGE'] = 'C'
        env['LANG'] = 'C'
        env['LC_ALL'] = 'C'
        out = subprocess.Popen(cmd, stdout = subprocess.PIPE, env=env).communicate()[0]
        return out

    try:
        out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD'])
        GIT_REVISION = out.strip().decode('ascii')
    except OSError:
        GIT_REVISION = "Unknown"

    return GIT_REVISION
If subprocess isn't portable and you don't want to install a package to do something this simple, you can also do this:
import pathlib

def get_git_revision(base_path):
    git_dir = pathlib.Path(base_path) / '.git'
    with (git_dir / 'HEAD').open('r') as head:
        ref = head.readline().split(' ')[-1].strip()
    with (git_dir / ref).open('r') as git_hash:
        return git_hash.readline().strip()
I've only tested this on my repos, but it seems to work pretty consistently.
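One caveat, with a hedged sketch (not from the original answer): on a detached HEAD, .git/HEAD contains the commit hash itself rather than a "ref: ..." line, so a slightly more defensive variant could look like this (refs compacted into .git/packed-refs by git gc are still not handled):

import pathlib

def get_git_revision_safe(base_path):
    # Variant of the above that also handles a detached HEAD, where .git/HEAD
    # holds the commit hash directly instead of a "ref: ..." line.
    git_dir = pathlib.Path(base_path) / '.git'
    head = (git_dir / 'HEAD').read_text().strip()
    if head.startswith('ref: '):
        return (git_dir / head[len('ref: '):]).read_text().strip()
    return head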
This is an improvement of Yuji 'Tomita' Tomita's answer.
import subprocess

def get_git_revision_hash():
    full_hash = subprocess.check_output(['git', 'rev-parse', 'HEAD'])
    full_hash = str(full_hash, "utf-8").strip()
    return full_hash

def get_git_revision_short_hash():
    short_hash = subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD'])
    short_hash = str(short_hash, "utf-8").strip()
    return short_hash

print(get_git_revision_hash())
print(get_git_revision_short_hash())
If you want a bit more data than the hash, you can use git-log:
import subprocess

def get_git_hash():
    return subprocess.check_output(['git', 'log', '-n', '1', '--pretty=tformat:%H']).strip()

def get_git_short_hash():
    return subprocess.check_output(['git', 'log', '-n', '1', '--pretty=tformat:%h']).strip()

def get_git_short_hash_and_commit_date():
    return subprocess.check_output(['git', 'log', '-n', '1', '--pretty=tformat:%h-%ad', '--date=short']).strip()
For a full list of formatting options, check out git log --help.
I ran across this problem and solved it by implementing this function.
https://gist.github.com/NaelsonDouglas/9bc3bfa26deec7827cb87816cad88d59
from pathlib import Path

def get_commit(repo_path):
    git_folder = Path(repo_path, '.git')
    head_name = Path(git_folder, 'HEAD').read_text().split('\n')[0].split(' ')[-1]
    head_ref = Path(git_folder, head_name)
    commit = head_ref.read_text().replace('\n', '')
    return commit

r = get_commit('PATH OF YOUR CLONED REPOSITORY')
print(r)
I had a problem similar to the OP's, but in my case I'm delivering the source code to my client as a zip file and, although I know they will have Python installed, I cannot assume they will have git. Since the OP didn't specify his operating system or whether he has git installed, I think I can contribute here.
To get only the hash of the commit, Naelson Douglas's answer was perfect, but to have the tag name, I'm using the dulwich Python package. It's a simplified git client in Python.
After installing the package with pip install dulwich --global-option="--pure" one can do:
from dulwich import porcelain

def get_git_revision(base_path):
    return porcelain.describe(base_path)

r = get_git_revision("PATH OF YOUR REPOSITORY's ROOT FOLDER")
print(r)
I've just run this code in one repository here and it showed the output v0.1.2-1-gfb41223, similar to what is returned by git describe, meaning that I'm 1 commit after the tag v0.1.2 and the 7-digit hash of the commit is fb41223.
It has some limitations: currently it doesn't have an option to show if a repository is dirty and it always shows a 7-digit hash, but there's no need to have git installed, so one can choose the trade-off.
Edit: in case of errors in the command pip install due to the option --pure (the issue is explained here), pick one of the two possible solutions:
- Install the Dulwich package's dependencies first: pip install urllib3 certifi && pip install dulwich --global-option="--pure"
- Install without the --pure option: pip install dulwich. This will install some platform-dependent files on your system, but it will improve the package's performance.
If you don't have Git available for some reason, but you have the git repo (a .git folder is found), you can fetch the commit hash from .git/refs/heads/[branch].
For example, I've used a following quick-and-dirty Python snippet run at the repository root to get the commit id:
git_head = '.git\\HEAD'

# Open .git\HEAD file:
with open(git_head, 'r') as git_head_file:
    # Contains e.g. "ref: refs/heads/master" if on branch "master"
    git_head_data = str(git_head_file.read())

# Open the correct file in .git\refs\heads\[branch]
git_head_ref = '.git\\%s' % git_head_data.split(' ')[1].replace('/', '\\').strip()

# Get the commit hash ([:7] used to get "--short")
with open(git_head_ref, 'r') as git_head_ref_file:
    commit_id = git_head_ref_file.read().strip()[:7]
If you are like me:
- Multi-platform, so subprocess may crash one day
- Using Python 2.7, so GitPython is not available
- Don't want to use NumPy just for that
- Already using Sentry (old deprecated version: raven)
Then (this will not work in an interactive shell, because the current file path is not defined there; replace BASE_DIR with your current file path):
import os
import raven
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
print(raven.fetch_git_sha(BASE_DIR))
That's it.
I was looking for another solution because I wanted to migrate to sentry_sdk and leave raven, but maybe some of you want to continue using raven for a while.
Here is the discussion that got me onto this Stack Overflow question.
So using the code of raven without raven is also possible (see the discussion):
from __future__ import absolute_import

import os.path

__all__ = 'fetch_git_sha'


def fetch_git_sha(path, head=None):
    """
    >>> fetch_git_sha(os.path.dirname(__file__))
    """
    if not head:
        head_path = os.path.join(path, '.git', 'HEAD')

        with open(head_path, 'r') as fp:
            head = fp.read().strip()

        if head.startswith('ref: '):
            head = head[5:]
            revision_file = os.path.join(
                path, '.git', *head.split('/')
            )
        else:
            return head
    else:
        revision_file = os.path.join(path, '.git', 'refs', 'heads', head)

    if not os.path.exists(revision_file):
        # Check for Raven .git/packed-refs' file since a `git gc` may have run
        # https://git-scm.com/book/en/v2/Git-Internals-Maintenance-and-Data-Recovery
        packed_file = os.path.join(path, '.git', 'packed-refs')
        if os.path.exists(packed_file):
            with open(packed_file) as fh:
                for line in fh:
                    line = line.rstrip()
                    if line and line[:1] not in ('#', '^'):
                        try:
                            revision, ref = line.split(' ', 1)
                        except ValueError:
                            continue
                        if ref == head:
                            return revision

    with open(revision_file) as fh:
        return fh.read().strip()
I named this file versioning.py and I import fetch_git_sha where I need it, passing the file path as an argument.
Hope it will help some of you ;)