I would like to use the Docker image jupyter/datascience-notebook to start a Jupyter notebook frontend, but I need to be able to control which ports it chooses to use for its communication. I understand that the server is designed to potentially provision many kernels, not just one, but what I want is for the first kernel to use the ports I specify. I have tried supplying arguments like:
docker run --rm -it jupyter/datascience-notebook:latest start-notebook.sh --existing /my/connection/file.json
docker run --rm -it jupyter/datascience-notebook:latest start-notebook.sh --KernelManager.control_port=60018
And it does not seem to care, instead creating the connection file in the usual location under /home/jovyan/.local/share/jupyter/
Any assistance is appreciated.
I ended up doing as suggested in a similar question, "IPython notebook: How to connect to existing kernel?"; I could not find a better way.
Subclass LocalProvisioner to override its pre_launch method
from typing import Any, Dict
from jupyter_client import LocalProvisioner, LocalPortCache, KernelProvisionerBase
from jupyter_client.localinterfaces import is_local_ip, local_ips
class PickPortsProvisioner(LocalProvisioner):
    async def pre_launch(self, **kwargs: Any) -> Dict[str, Any]:
        """Perform any steps in preparation for kernel process launch.
        This includes applying additional substitutions to the kernel launch command and env.
        It also includes preparation of launch parameters.
        Returns the updated kwargs.
        """
        # This should be considered temporary until a better division of labor can be defined.
        km = self.parent
        if km:
            if km.transport == 'tcp' and not is_local_ip(km.ip):
                raise RuntimeError(
                    "Can only launch a kernel on a local interface. "
                    "This one is not: %s."
                    "Make sure that the '*_address' attributes are "
                    "configured properly. "
                    "Currently valid addresses are: %s" % (km.ip, local_ips())
                )
            # build the Popen cmd
            extra_arguments = kwargs.pop('extra_arguments', [])
            # write connection file / get default ports
            # TODO - change when handshake pattern is adopted
            if km.cache_ports and not self.ports_cached:
                lpc = LocalPortCache.instance()
                km.shell_port = 60000
                km.iopub_port = 60001
                km.stdin_port = 60002
                km.hb_port = 60003
                km.control_port = 60004
                self.ports_cached = True
            km.write_connection_file()
            self.connection_info = km.get_connection_info()
            kernel_cmd = km.format_kernel_cmd(
                extra_arguments=extra_arguments
            )  # This needs to remain here for b/c
        else:
            extra_arguments = kwargs.pop('extra_arguments', [])
            kernel_cmd = self.kernel_spec.argv + extra_arguments
        return await KernelProvisionerBase.pre_launch(self, cmd=kernel_cmd, **kwargs)
Specify entry point in setup.py
entry_points = {
    'jupyter_client.kernel_provisioners': [
        'pickports-provisioner = mycompany.pickports_provisioner:PickPortsProvisioner',
    ],
},
Create kernel.json to overwrite the default one
{
"argv": [
"/opt/conda/bin/python",
"-m",
"ipykernel_launcher",
"-f",
"{connection_file}"
],
"display_name": "Python 3 (ipykernel)",
"language": "python",
"metadata": {
"debugger": true,
"kernel_provisioner": { "provisioner_name": "pickports-provisioner" }
}
}
Dockerfile
# Start from a core stack version
FROM jupyter/datascience-notebook:latest
# Install from requirements.txt file
COPY --chown=${NB_UID}:${NB_GID} requirements.txt .
COPY --chown=${NB_UID}:${NB_GID} setup.py .
RUN pip install --quiet --no-cache-dir --requirement requirements.txt
# Copy kernel.json to default location
COPY kernel.json /opt/conda/share/jupyter/kernels/python3/
# Install from sources
COPY --chown=${NB_UID}:${NB_GID} src .
RUN pip install --quiet --no-cache-dir .
Profit???
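Because the provisioner above hard-codes ports 60000-60004 instead of asking LocalPortCache for free ones, the kernel will presumably fail to start if any of them is already taken. A minimal sketch of a pre-flight check, using only the standard library; the helper names port_is_free and busy_ports are illustrative and not part of jupyter_client:

```python
import socket

# Ports hard-coded in PickPortsProvisioner.pre_launch above.
FIXED_PORTS = {
    "shell_port": 60000,
    "iopub_port": 60001,
    "stdin_port": 60002,
    "hb_port": 60003,
    "control_port": 60004,
}


def port_is_free(port, ip="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (ip, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, port))
        except OSError:
            return False
    return True


def busy_ports(ports=FIXED_PORTS, ip="127.0.0.1"):
    """Return the names of the configured ports that are already in use."""
    return [name for name, port in ports.items() if not port_is_free(port, ip)]
```

Calling busy_ports() before the notebook server starts returns an empty list when all five ports are available.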
I am having some issues getting gcloud to run in a Bazel genrule. Looks like python path related issues.
genrule(
name="foo",
outs=["bar"],
srcs=[":bar.enc"],
cmd="gcloud decrypt --location=global --keyring=foo --key=bar --plaintext-file $@ --ciphertext-file $(location bar.enc)"
)
The exception is:
ImportError: No module named traceback
From:
try:
    gcloud_main = _import_gcloud_main()
except Exception as err:  # pylint: disable=broad-except
    # We want to catch *everything* here to display a nice message to the user
    # pylint:disable=g-import-not-at-top
    import traceback
    # We DON'T want to suggest `gcloud components reinstall` here (ex. as
    # opposed to the similar message in gcloud_main.py), as we know that no
    # commands will work.
    sys.stderr.write(
        ('ERROR: gcloud failed to load: {0}\n{1}\n\n'
         'This usually indicates corruption in your gcloud installation or '
         'problems with your Python interpreter.\n\n'
         'Please verify that the following is the path to a working Python 2.7 '
         'executable:\n'
         ' {2}\n\n'
         'If it is not, please set the CLOUDSDK_PYTHON environment variable to '
         'point to a working Python 2.7 executable.\n\n'
         'If you are still experiencing problems, please reinstall the Cloud '
         'SDK using the instructions here:\n'
         ' https://cloud.google.com/sdk/\n').format(
             err,
             '\n'.join(traceback.format_exc().splitlines()[2::2]),
             sys.executable))
    sys.exit(1)
My questions are:
How do I best call gcloud from a genrule?
What are the parameters needed to specify the python path?
How is Bazel blocking this?
Update:
Able to get it to run by specifying the CLOUDSDK_PYTHON.
Indeed, Bazel runs in a sandbox, hence gcloud cannot find its dependencies. Actually, I'm surprised gcloud can be invoked at all.
To proceed, I would wrap gcloud in a Bazel py_binary and refer to it with the tools attribute in the genrule. You also need to wrap it with $(location ...) in the cmd. In the end, you will have
genrule(
name = "foo",
outs = ["bar"],
srcs = [":bar.enc"],
cmd = "$(location //third_party/google/gcloud) decrypt --location=global --keyring=foo --key=bar --plaintext-file $@ --ciphertext-file $(location bar.enc)",
tools = ["//third_party/google/gcloud"],
)
And for that you define in third_party/google/gcloud/BUILD (or anywhere you want, I just used a path that makes sense to me)
py_binary(
name = "gcloud",
srcs = ["gcloud.py"],
main = "gcloud.py",
visibility = ["//visibility:public"],
deps = [
":gcloud_sdk",
],
)
py_library(
name = "gcloud_sdk",
srcs = glob(
["**/*.py"],
exclude = ["gcloud.py"],
# maybe exclude tests and unrelated code, too.
),
deps = [
# Whatever extra deps are needed by gcloud to compile
]
)
I had a similar issue; running this command worked for me:
export CLOUDSDK_PYTHON=/usr/bin/python
(This was already mentioned above as an update, but I wanted to post the whole command for future readers.)
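When gcloud is launched from a Python wrapper rather than from a shell, the same variable can be passed through the child process environment. A minimal sketch, assuming gcloud is on PATH and that /usr/bin/python is a suitable interpreter on your machine; the helper names gcloud_env and run_gcloud are made up for illustration:

```python
import os
import subprocess


def gcloud_env(python_path="/usr/bin/python", base_env=None):
    """Return a copy of the environment with CLOUDSDK_PYTHON set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CLOUDSDK_PYTHON"] = python_path
    return env


def run_gcloud(args, python_path="/usr/bin/python"):
    """Run gcloud with the given arguments and return its exit code."""
    return subprocess.call(["gcloud"] + list(args), env=gcloud_env(python_path))
```

For example, run_gcloud(["version"]) would run gcloud version with CLOUDSDK_PYTHON pointing at the chosen interpreter.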
I am working with a scons build system for building shared libraries. Everything builds fine up to this part but now I am having some difficulties.
I cannot get scons to move the output file to a directory outside of the scons folder structure.
The only thing that I've seen on SO about it is this question here
If I try to write a python function to do it, the function runs first before the build finishes so I get some file not found errors.
How can I get scons to move a file to a directory outside of the directory where SConstruct file is defined?
import os
import shutil
Import('env')
android_files = [
'os_android.cpp',
'pic_android.cpp',
'file_access_android.cpp',
'dir_access_android.cpp',
'audio_driver_opensl.cpp',
'file_access_jandroid.cpp',
'dir_access_jandroid.cpp',
'thread_jandroid.cpp',
'audio_driver_jandroid.cpp',
'ifaddrs_android.cpp',
'android_native_app_glue.c',
'java_glue.cpp',
'cpu-features.c',
'java_class_wrapper.cpp'
]
env = env.Clone()
if env['target'] == "profile":
    env.Append(CPPFLAGS=['-DPROFILER_ENABLED'])
android_objects=[]
for x in android_files:
    android_objects.append( env.SharedObject( x ) )
prog = None
abspath=env.Dir(".").abspath
pp_basein = open(abspath+"/project.properties.template","rb")
pp_baseout = open(abspath+"/java/project.properties","wb")
pp_baseout.write( pp_basein.read() )
refcount=1
name="libpic"+env["SHLIBSUFFIX"]
dir="#bin/"+name
output=dir[1:]
ANDROID_HOME=os.environ.get('ANDROID_HOME')
ant_build=Dir('.').abspath+"/java/"
ANT_TARGET=ant_build+'local.properties'
ANT_SOURCES=ant_build+'build.xml'
ANDROID_HOME=os.environ.get('ANDROID_HOME')
ANT_COMMAND='ant release -Dsdk.dir='+ANDROID_HOME+' -f $SOURCE'
for x in env.android_source_modules:
    pp_baseout.write("android.library.reference."+str(refcount)+"="+x+"\n")
    refcount+=1
pp_baseout.close()
pp_basein = open(abspath+"/AndroidManifest.xml.template","rb")
pp_baseout = open(abspath+"/java/AndroidManifest.xml","wb")
manifest = pp_basein.read()
manifest = manifest.replace("$$ADD_APPLICATION_CHUNKS$$",env.android_manifest_chunk)
pp_baseout.write( manifest )
for x in env.android_source_files:
    shutil.copy(x,abspath+"/java/src/com/android/pic")
for x in env.android_module_libraries:
    shutil.copy(x,abspath+"/java/libs")
env.SharedLibrary("#bin/libpic",[android_objects],SHLIBSUFFIX=env["SHLIBSUFFIX"])
env.Command('#platform/android/java/libs/armeabi/libpic_android.so', dir, Copy('platform/android/java/libs/armeabi/libpic_android.so', output))
apk = env.Command(ANT_TARGET, source=ANT_SOURCES, action=ANT_COMMAND)
#env.Install('[install_dir]', apk)
#Cannot get this part to work, tried env.Install() but donno
#if env['target'] == 'release_debug'
#copy 'platform/android/java/bin/Pic-release-unsigned.apk' to templates as 'android_debug.apk'
#else:
#copy 'platform/android/java/bin/Pic-release-unsigned.apk' to templates as 'android_release.apk'
You should consider using the SCons Install Builder. This is the only way to install targets outside of the SCons project hierarchy.
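Because Install and InstallAs are builders, SCons schedules the copy after its source has been built, which avoids the "file not found" problem of a plain Python function that runs while the SConscript is being read. A minimal sketch of how the commented-out rename logic could be expressed; the helper names, the templates_dir argument, and the "install" alias are assumptions for illustration, and env is expected to be an SCons construction environment:

```python
import os


def apk_install_name(target):
    """Pick the template file name for the given build target."""
    if target == "release_debug":
        return "android_debug.apk"
    return "android_release.apk"


def install_apk(env, apk, templates_dir):
    """Register an InstallAs step that copies the built APK to templates_dir.

    `templates_dir` is a hypothetical absolute path outside the project tree.
    """
    destination = os.path.join(templates_dir, apk_install_name(env["target"]))
    installed = env.InstallAs(destination, apk)
    # Targets outside the project tree are not built by default, so expose
    # them through an alias that can be requested with `scons install`.
    env.Alias("install", installed)
    return installed
```

In the SConscript this would be called once with the node or path of the built APK, and the copy would then happen when running scons install.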
I often find myself recreating file structures for Flask apps so I have decided to make a script to do all that for me. I would like the script to create all the folders I need as well as the files with some basic boilerplate, which it does, that part is working fine. However I would also like to create a virtual environment and install Flask to that environment. That is where I am encountering the problem. The script runs but it installs Flask to my system installation of Python.
I followed the advice in this question here but it's not working. I am running Ubuntu 12.04.4 LTS via crouton on a Chromebook.
#!/usr/bin/python
from os import mkdir, chdir, getcwd, system
import sys
APP_NAME = sys.argv[1]
ROOT = getcwd()
PROJECT_ROOT = ROOT + '/' + APP_NAME
# dictionary represents folder structure. Key is the folder name and the value is its contents
folders = {APP_NAME : {'app' : {'static': {'css' : '', 'img' : '', 'js' : ''}, 'templates' : ''} } }
def create_folders(dic):
    for key in dic:
        if isinstance(dic[key], dict):
            mkdir(key)
            prev = getcwd() + '/' + key
            chdir(prev)
            create_folders(dic[key])
        else:
            mkdir(key)
create_folders(folders)
chdir(PROJECT_ROOT)
open('config.py', 'a').close()
with open('run.py', 'a') as run:
    run.write("""stuff""")
with open('app/__init__.py', 'a') as init:
    init.write("""stuff""")
with open('app/views.py', 'a') as views:
    views.write("""stuff""")
open('app/models.py', 'a').close()
open('app/forms.py', 'a').close()
with open('app/templates/layout.html', 'a') as layout:
    layout.write("""stuff""")
system('chmod a+x run.py')
system('virtualenv venv')
system('. venv/bin/activate;sudo pip install flask') # this does not seem to be working the way I am expecting it to
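I suppose your calls are not within the same console session, so the shell environment is not what you expect. I suggest concatenating the related commands into one shell invocation using subprocess.Popen, like this (including suggestions by limasxgoesto0). Note that shell=True is required when passing a command string with shell syntax, and that `.` is used instead of `source` because /bin/sh does not necessarily provide `source`:
subprocess.Popen('virtualenv venv; . venv/bin/activate; pip install flask', shell=True)
You should probably be using subprocess; os.system is not formally deprecated, but the Python documentation recommends the subprocess module instead.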
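An alternative that avoids activation altogether is to call the pip executable that lives inside the new environment; it installs into that environment regardless of the calling shell. This also sidesteps sudo, which typically resets the environment and is a likely reason Flask ended up in the system Python. A minimal sketch, assuming the virtualenv command is on PATH; the helper names are illustrative:

```python
import os
import subprocess


def venv_executable(venv_dir, name):
    """Return the path of an executable inside a virtual environment."""
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    return os.path.join(venv_dir, bin_dir, name)


def create_venv_and_install(venv_dir, packages):
    """Create a virtualenv and install packages into it, without sudo."""
    subprocess.check_call(["virtualenv", venv_dir])
    pip = venv_executable(venv_dir, "pip")
    subprocess.check_call([pip, "install"] + list(packages))
```

In the script above, the last two system() calls could then be replaced by create_venv_and_install('venv', ['flask']).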
I would like to include the current git hash in the output of a Python script (as the version number of the code that generated that output).
How can I access the current git hash in my Python script?
No need to hack around getting data from the git command yourself. GitPython is a very nice way to do this and a lot of other git stuff. It even has "best effort" support for Windows.
After pip install gitpython you can do
import git
repo = git.Repo(search_parent_directories=True)
sha = repo.head.object.hexsha
Something to consider when using this library. The following is taken from gitpython.readthedocs.io
Leakage of System Resources
GitPython is not suited for long-running processes (like daemons) as it tends to leak system resources. It was written in a time where destructors (as implemented in the __del__ method) still ran deterministically.
In case you still want to use it in such a context, you will want to search the codebase for __del__ implementations and call these yourself when you see fit.
Another way assure proper cleanup of resources is to factor out GitPython into a separate process which can be dropped periodically
This post contains the command, Greg's answer contains the subprocess command.
import subprocess
def get_git_revision_hash() -> str:
    return subprocess.check_output(['git', 'rev-parse', 'HEAD']).decode('ascii').strip()
def get_git_revision_short_hash() -> str:
    return subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD']).decode('ascii').strip()
when running
print(get_git_revision_hash())
print(get_git_revision_short_hash())
you get output:
fd1cd173fc834f62fa7db3034efc5b8e0f3b43fe
fd1cd17
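The functions above raise an exception when git is not installed or the working directory is not inside a repository. A minimal sketch of a variant that falls back to a placeholder instead; the function name, the cwd parameter, and the "unknown" default are illustrative choices, not part of the answer:

```python
import subprocess


def get_git_revision_hash_or_default(cwd=None, default="unknown"):
    """Return the full commit hash, or `default` if it cannot be determined."""
    try:
        output = subprocess.check_output(
            ["git", "rev-parse", "HEAD"],
            cwd=cwd,
            stderr=subprocess.DEVNULL,  # keep git's error message off the console
        )
    except (OSError, subprocess.CalledProcessError):
        # OSError: git (or cwd) does not exist.
        # CalledProcessError: git ran but failed, e.g. not a repository.
        return default
    return output.decode("ascii").strip()
```

This is convenient when the same script is also shipped outside of a checkout, for example in a zip archive.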
The git describe command is a good way of creating a human-presentable "version number" of the code. From the examples in the documentation:
With something like git.git current tree, I get:
[torvalds@g5 git]$ git describe parent
v1.0.4-14-g2414721
i.e. the current head of my "parent" branch is based on v1.0.4, but since it has a few commits on top of that, describe has added the number of additional commits ("14") and an abbreviated object name for the commit itself ("2414721") at the end.
From within Python, you can do something like the following:
import subprocess
label = subprocess.check_output(["git", "describe"]).strip()
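If the individual parts of the label are needed (tag, number of commits since the tag, abbreviated hash), they can be split out again. A minimal sketch, assuming the label has already been decoded to str and has the tag-count-ghash shape described above; parse_describe is a made-up helper name:

```python
import re

# <tag>-<commits since tag>-g<abbreviated hash>
_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(label):
    """Split `git describe` output into (tag, commits_since_tag, short_sha).

    A label that does not match the pattern (for example when HEAD is
    exactly on a tag) is returned as (label, 0, None).
    """
    match = _DESCRIBE_RE.match(label)
    if match is None:
        return label, 0, None
    return match.group("tag"), int(match.group("count")), match.group("sha")
```

With the example from the documentation, parse_describe("v1.0.4-14-g2414721") yields the tag v1.0.4, the count 14, and the hash 2414721.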
Here's a more complete version of Greg's answer:
import subprocess
print(subprocess.check_output(["git", "describe", "--always"]).strip().decode())
Or, if the script is being called from outside the repo:
import subprocess, os
print(subprocess.check_output(["git", "describe", "--always"], cwd=os.path.dirname(os.path.abspath(__file__))).strip().decode())
Or, if the script is being called from outside the repo and you like pathlib:
import subprocess
from pathlib import Path
print(subprocess.check_output(["git", "describe", "--always"], cwd=Path(__file__).resolve().parent).strip().decode())
numpy has a nice looking multi-platform routine in its setup.py:
import os
import subprocess
# Return the git revision as a string
def git_version():
    def _minimal_ext_cmd(cmd):
        # construct minimal environment
        env = {}
        for k in ['SYSTEMROOT', 'PATH']:
            v = os.environ.get(k)
            if v is not None:
                env[k] = v
        # LANGUAGE is used on win32
        env['LANGUAGE'] = 'C'
        env['LANG'] = 'C'
        env['LC_ALL'] = 'C'
        out = subprocess.Popen(cmd, stdout = subprocess.PIPE, env=env).communicate()[0]
        return out
    try:
        out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD'])
        GIT_REVISION = out.strip().decode('ascii')
    except OSError:
        GIT_REVISION = "Unknown"
    return GIT_REVISION
If subprocess isn't portable and you don't want to install a package to do something this simple you can also do this.
import pathlib
def get_git_revision(base_path):
    git_dir = pathlib.Path(base_path) / '.git'
    with (git_dir / 'HEAD').open('r') as head:
        ref = head.readline().split(' ')[-1].strip()
    with (git_dir / ref).open('r') as git_hash:
        return git_hash.readline().strip()
I've only tested this on my repos but it seems to work pretty consistently.
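One case where reading the files directly behaves differently is a detached HEAD: .git/HEAD then contains the commit hash itself rather than a "ref: ..." line. A minimal sketch that covers both cases; read_head_revision is an illustrative name, and packed refs and worktrees (where .git is a file) are deliberately not handled:

```python
from pathlib import Path


def read_head_revision(repo_path):
    """Return the commit hash HEAD points to, read from the .git directory.

    Handles a symbolic HEAD ("ref: refs/heads/<branch>") and a detached HEAD
    (the file contains the hash itself).
    """
    git_dir = Path(repo_path) / ".git"
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file holds the commit hash
    ref = head[len("ref: "):]
    return (git_dir / ref).read_text().strip()
```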
This is an improvement on Yuji 'Tomita' Tomita's answer.
import subprocess
def get_git_revision_hash():
    full_hash = subprocess.check_output(['git', 'rev-parse', 'HEAD'])
    full_hash = str(full_hash, "utf-8").strip()
    return full_hash
def get_git_revision_short_hash():
    short_hash = subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD'])
    short_hash = str(short_hash, "utf-8").strip()
    return short_hash
print(get_git_revision_hash())
print(get_git_revision_short_hash())
If you want a bit more data than the hash, you can use git-log:
import subprocess
def get_git_hash():
    return subprocess.check_output(['git', 'log', '-n', '1', '--pretty=tformat:%H']).strip()
def get_git_short_hash():
    return subprocess.check_output(['git', 'log', '-n', '1', '--pretty=tformat:%h']).strip()
def get_git_short_hash_and_commit_date():
    return subprocess.check_output(['git', 'log', '-n', '1', '--pretty=tformat:%h-%ad', '--date=short']).strip()
For the full list of formatting options, check out git log --help.
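On Python 3 these functions return bytes, and the combined %h-%ad value has to be split again if the hash and the date are needed separately. A minimal sketch, assuming the hash-date format produced by get_git_short_hash_and_commit_date above; split_hash_and_date is a hypothetical helper:

```python
def split_hash_and_date(label):
    """Split '<short-hash>-<YYYY-MM-DD>' into (short_hash, date_string)."""
    if isinstance(label, bytes):
        label = label.decode("ascii")
    # The hash contains no '-', so the first '-' separates hash and date.
    short_hash, _, date = label.strip().partition("-")
    return short_hash, date
```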
I ran across this problem and solved it by implementing this function.
https://gist.github.com/NaelsonDouglas/9bc3bfa26deec7827cb87816cad88d59
from pathlib import Path
def get_commit(repo_path):
    git_folder = Path(repo_path,'.git')
    head_name = Path(git_folder, 'HEAD').read_text().split('\n')[0].split(' ')[-1]
    head_ref = Path(git_folder,head_name)
    commit = head_ref.read_text().replace('\n','')
    return commit
r = get_commit('PATH OF YOUR CLONED REPOSITORY')
print(r)
I had a problem similar to the OP's, but in my case I'm delivering the source code to my client as a zip file and, although I know they will have Python installed, I cannot assume they will have Git. Since the OP didn't specify their operating system or whether Git is installed, I think I can contribute here.
To get only the hash of the commit, Naelson Douglas's answer was perfect, but to have the tag name, I'm using the dulwich python package. It's a simplified git client in python.
After installing the package with pip install dulwich --global-option="--pure" one can do:
from dulwich import porcelain
def get_git_revision(base_path):
    return porcelain.describe(base_path)
r = get_git_revision("PATH OF YOUR REPOSITORY's ROOT FOLDER")
print(r)
I've just run this code in one repository here and it showed the output v0.1.2-1-gfb41223, similar to what is returned by git describe, meaning that I'm 1 commit after the tag v0.1.2 and the 7-digit hash of the commit is fb41223.
It has some limitations: currently it doesn't have an option to show if a repository is dirty and it always shows a 7-digit hash, but there's no need to have git installed, so one can choose the trade-off.
Edit: in case of errors in the command pip install due to the option --pure (the issue is explained here), pick one of the two possible solutions:
Install Dulwich package's dependencies first:
pip install urllib3 certifi && pip install dulwich --global-option="--pure"
Install without the option pure: pip install dulwich. This will install some platform dependent files in your system, but it will improve the package's performance.
If you don't have Git available for some reason, but you have the git repo (.git folder is found), you can fetch the commit hash from .git/refs/heads/[branch].
For example, I've used a following quick-and-dirty Python snippet run at the repository root to get the commit id:
git_head = '.git\\HEAD'
# Open .git\HEAD file:
with open(git_head, 'r') as git_head_file:
    # Contains e.g. ref: refs/heads/master if on "master"
    git_head_data = str(git_head_file.read())
# Open the correct file in .git\refs\heads\[branch]
git_head_ref = '.git\\%s' % git_head_data.split(' ')[1].replace('/', '\\').strip()
# Get the commit hash ([:7] used to get "--short")
with open(git_head_ref, 'r') as git_head_ref_file:
    commit_id = git_head_ref_file.read().strip()[:7]
If you are like me:
Multiplatform, so subprocess may crash one day
Using Python 2.7, so GitPython is not available
Don't want to use NumPy just for that
Already using Sentry (the old, deprecated client: raven)
Then (this will not work in an interactive shell, because __file__ is not defined there; replace BASE_DIR with your project's path):
import os
import raven
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
print(raven.fetch_git_sha(BASE_DIR))
That's it.
I was looking for another solution because I wanted to migrate to sentry_sdk and leave raven but maybe some of you want to continue using raven for a while.
Here was the discussion that got me to this Stack Overflow question.
So using the code of raven without raven is also possible (see discussion):
from __future__ import absolute_import
import os.path
__all__ = ('fetch_git_sha',)
def fetch_git_sha(path, head=None):
    """
    >>> fetch_git_sha(os.path.dirname(__file__))
    """
    if not head:
        head_path = os.path.join(path, '.git', 'HEAD')
        with open(head_path, 'r') as fp:
            head = fp.read().strip()
        if head.startswith('ref: '):
            head = head[5:]
            revision_file = os.path.join(
                path, '.git', *head.split('/')
            )
        else:
            return head
    else:
        revision_file = os.path.join(path, '.git', 'refs', 'heads', head)
    if not os.path.exists(revision_file):
        # Check for Raven .git/packed-refs' file since a `git gc` may have run
        # https://git-scm.com/book/en/v2/Git-Internals-Maintenance-and-Data-Recovery
        packed_file = os.path.join(path, '.git', 'packed-refs')
        if os.path.exists(packed_file):
            with open(packed_file) as fh:
                for line in fh:
                    line = line.rstrip()
                    if line and line[:1] not in ('#', '^'):
                        try:
                            revision, ref = line.split(' ', 1)
                        except ValueError:
                            continue
                        if ref == head:
                            return revision
    with open(revision_file) as fh:
        return fh.read().strip()
I named this file versioning.py and I import "fetch_git_sha" where I need it passing file path as argument.
Hope it will help some of you ;)