How to install Python libraries upon creation of AI Platform Notebooks

I want to use "Select a script to run after creation" when I create a notebook instance in GCP.
Specifically, I want to use it to install python packages.
What kind of script (extension and contents) do I need to write?

Below is an example of a post-startup script that installs Voila.
Save the script in a GCS bucket and point the instance at it when creating the notebook, for example:
gcloud notebooks instances create nb-1 \
  --vm-image-project=deeplearning-platform-release \
  --vm-image-family=tf2-latest-cpu \
  --metadata=post-startup-script=gs://ai-platform-notebooks-tools/install-voila.sh \
  --location=us-central1-a
Script contents:
#!/bin/bash -eu
# Installs Voila in an AI Platform Notebook

function install_voila() {
  echo 'Installing voila...'
  /opt/conda/condabin/conda install -y -c conda-forge ipywidgets ipyvolume bqplot scipy
  /opt/conda/condabin/conda install -y -c conda-forge voila
  /opt/conda/bin/jupyter lab build
  systemctl restart jupyter.service || echo 'Error restarting jupyter.service.'
}

function download_samples() {
  echo 'Downloading samples...'
  cd /home/jupyter
  git clone https://github.com/voila-dashboards/voila
}

function main() {
  install_voila || echo 'Error installing voila.'
  download_samples || echo 'Error downloading voila samples.'
}

main
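The question asks about installing Python packages generally; for plain pip installs the same pattern applies. A minimal sketch (the /opt/conda paths mirror the Voila example above; the package names are placeholders, not from the question):

```shell
#!/bin/bash -eu
# Hypothetical post-startup script: install pip packages into the
# instance's conda-based Python, then restart Jupyter.
PIP=/opt/conda/bin/pip
PACKAGES="pandas seaborn"   # placeholder package names
"$PIP" install --upgrade $PACKAGES || echo 'Error installing packages.'
systemctl restart jupyter.service || echo 'Error restarting jupyter.service.'
```

Save it with a .sh extension, upload it to GCS, and reference it with --metadata=post-startup-script= exactly as in the gcloud command above.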

Related

How to fix the "python: not found" error when Python is already installed

I want to run a shell script in a conda environment, but it fails with an error like
./run_augment_data.sh: 9: python: not found
but when I type
type python python3
the shell gives me existing paths:
python is /home/rd142857/anaconda3/envs/test_env/bin/python
python3 is /home/rd142857/anaconda3/envs/test_env/bin/python3
I tried changing python to python3; the above error disappears, but the new error is
/usr/bin/python3: Error while finding module specification for 'torch.distributed.launch' (ModuleNotFoundError: No module named 'torch')
I noticed that the python the script wants to use is not the python in my conda environment, so I added the following line to the top of the script:
#!/home/rd142857/anaconda3/envs/test_env/bin/python
Then I re-ran the script, and the new error is
File "/home/rd142857/grappa/grappa/./run_augment_data.sh", line 6
rm -r $LOGDIR
^
SyntaxError: invalid syntax
I really don't know what to do now.
The full content of the shell script is
#export NGPU=2;
#CUDA_VISIBLE_DEVICES=0,1 python -u -m torch.distributed.launch --nproc_per_node=$NGPU finetuning_roberta.py --train_corpus data/augment_data.txt \
LOGDIR="grappa_logs_checkpoints/ssp/"
rm -r $LOGDIR
mkdir $LOGDIR
export NGPU=4;
python -u -m torch.distributed.launch --nproc_per_node=$NGPU finetuning_roberta.py --train_corpus data/augment_data.txt \
--eval_corpus data/spider_dev_data_v2.txt \
--train_eval_corpus data/spider_train_data_small_v2.txt \
--bert_model roberta-large \
--output_dir $LOGDIR/ \
--do_train \
--do_eval \
--train_batch_size 12 \
--max_seq_length 218 \
--num_train_epochs 10 \
> $LOGDIR/log.out
The rm -r $LOGDIR implies that your script is a shell script, not a Python one, and so the shebang (top of the script) should be something like
run_augment_data.sh
#!/bin/bash -l
When trying to use a Python interpreter from a specific Conda environment, I recommend using conda run. Something like
conda run -n test_env python script.py
See conda run --help.
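Combining the two fixes, the top of the script might be restructured like this (a sketch; test_env and the launch flags come from the question, and the command is echoed here rather than executed):

```shell
#!/bin/bash -l
# Sketch: keep the shell shebang, and route python through `conda run`
# so the test_env interpreter (which has torch installed) is used.
ENV_NAME=test_env
PY="conda run -n $ENV_NAME python"
echo "$PY -u -m torch.distributed.launch --nproc_per_node=4 finetuning_roberta.py"
```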

AWS CDK: Installing external dependencies using requirements.txt via PythonFunction

I am trying to synthesize a CDK app (TypeScript) which has some Python Lambda functions.
I am using PythonFunction with a requirements.txt file to install the external dependencies. I am running VSCode on WSL. I am encountering the following error:
Bundling asset Test/test-lambda-stack/test-subscriber-data-validator-poc/Code/Stage...
node:internal/fs/utils:347
throw err;
^
Error: ENOENT: no such file or directory, open '~/.nvm/versions/node/v16.17.0/lib/node_modules/docker/node_modules/highlight.js/styles/cp -rTL /asset-input/ /asset-output && cd /asset-output && python -m pip install -r requirements.txt -t /asset-output.css'
at Object.openSync (node:fs:594:3)
at Object.readFileSync (node:fs:462:35)
at module.exports (~/.nvm/versions/node/v16.17.0/lib/node_modules/docker/src/getColourScheme.js:47:26)
at ~/.nvm/versions/node/v16.17.0/lib/node_modules/docker/src/docker.js:809:47
at FSReqCallback.readFileAfterClose [as oncomplete] (node:internal/fs/read_file_context:68:3)
at FSReqCallback.callbackTrampoline (node:internal/async_hooks:130:17) {
errno: -2,
syscall: 'open',
code: 'ENOENT',
path: '~/.nvm/versions/node/v16.17.0/lib/node_modules/docker/node_modules/highlight.js/styles/cp -rTL /asset-input/ /asset-output && cd /asset-output && python -m pip install -r requirements.txt -t /asset-output.css'
}
Error: Failed to bundle asset Test/test-lambda-stack/test-subscriber-data-validator-poc/Code/Stage, bundle output is located at ~/Code/AWS/CDK/test-dev-poc/cdk.out/asset.6b577fe604573a3b53e635f09f768df3f87ad6651b18e9f628c2a086a525bb49-error: Error: docker exited with status 1
at AssetStaging.bundle (~/Code/AWS/CDK/test-dev-poc/node_modules/aws-cdk-lib/core/lib/asset-staging.js:2:614)
at AssetStaging.stageByBundling (~/Code/AWS/CDK/test-dev-poc/node_modules/aws-cdk-lib/core/lib/asset-staging.js:1:4506)
at stageThisAsset (~/Code/AWS/CDK/test-dev-poc/node_modules/aws-cdk-lib/core/lib/asset-staging.js:1:1867)
at Cache.obtain (~/Code/AWS/CDK/test-dev-poc/node_modules/aws-cdk-lib/core/lib/private/cache.js:1:242)
at new AssetStaging (~/Code/AWS/CDK/test-dev-poc/node_modules/aws-cdk-lib/core/lib/asset-staging.js:1:2262)
at new Asset (~/Code/AWS/CDK/test-dev-poc/node_modules/aws-cdk-lib/aws-s3-assets/lib/asset.js:1:736)
at AssetCode.bind (~/Code/AWS/CDK/test-dev-poc/node_modules/aws-cdk-lib/aws-lambda/lib/code.js:1:4628)
at new Function (~/Code/AWS/CDK/test-dev-poc/node_modules/aws-cdk-lib/aws-lambda/lib/function.js:1:2803)
at new PythonFunction (~/Code/AWS/CDK/test-dev-poc/node_modules/@aws-cdk/aws-lambda-python-alpha/lib/function.ts:73:5)
at new lambdaInfraStack (~/Code/AWS/CDK/test-dev-poc/lib/serviceInfraStacks/lambda-infra-stack.ts:24:40)
My requirements.txt file looks like this
attrs==22.1.0
jsonschema==4.16.0
pyrsistent==0.18.1
My cdk code is this
new PythonFunction(this, `${appName}-subscriber-data-validator-${stage}`, {
  runtime: Runtime.PYTHON_3_9,
  entry: join('lambdas/subscriber_data_validator'),
  handler: 'lambda_hander',
  index: 'subscriber_data_validator.py'
})
Do I need to install anything additional? I have esbuild installed as a devDependency. I'm having a real hard time getting this to work. Any help is appreciated.
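One detail in the stack trace above is worth checking before anything else: the failing calls go through ~/.nvm/.../node_modules/docker/src/docker.js, i.e. a globally installed npm package named docker rather than the Docker CLI. A quick check (an observation, not a confirmed fix):

```shell
# Check what `docker` resolves to on PATH; if the path contains .nvm or
# node_modules, an npm package named `docker` is shadowing the Docker CLI.
DOCKER_PATH="$(command -v docker || true)"
echo "docker resolves to: ${DOCKER_PATH:-not found}"
case "$DOCKER_PATH" in
  *node_modules*|*.nvm*) echo "likely the npm 'docker' package, not Docker itself" ;;
esac
```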

Running LibreOffice converter on Docker

The problem is with using the LibreOffice headless converter to automatically convert uploaded files. I am getting this error:
LibreOffice 7 fatal error - Application cannot be started
Ubuntu ver: 21.04
What I have tried:
Get the file from Azure Blob storage,
put it into BASE_DIR/Input_file,
convert it to PDF with a Linux command that I run via subprocess,
put it into the BASE_DIR/Output_file folder.
Below is my code:
I am installing LibreOffice in Docker this way:
RUN apt-get update \
&& ACCEPT_EULA=Y apt-get -y install LibreOffice
The main logic:
blob_client = container_client.get_blob_client(f"Folder_with_reports/")
with open(os.path.join(BASE_DIR, f"input_files/(unknown)"), "wb") as source_file:
    source_file.write(data)

source_file = os.path.join(BASE_DIR, f"input_files/(unknown)")  # original docs here
output_folder = os.path.join(BASE_DIR, "output_files")  # pdf files will be here

# assign the command for converting files through LibreOffice
command = rf"lowriter --headless --convert-to pdf {source_file} --outdir {output_folder}"
# running the command
subprocess.run(command, shell=True)

# reading the file and uploading it back to Azure Storage
with open(os.path.join(BASE_DIR, f"output_files/MyFile.pdf"), "rb") as outp_file:
    outp_data = outp_file.read()
blob_name_ = f"test"
container_client.upload_blob(name=blob_name_, data=outp_data, blob_type="BlockBlob")
Should I install lowriter instead of LibreOffice? Is it okay to use BASE_DIR for this kind of operation? I would appreciate any suggestions.
Partial solution:
Here I have simplified the case and created an additional Docker image with the Dockerfile below.
I apply both methods: unoconv and direct conversion.
Dockerfile:
FROM ubuntu:21.04
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get -y upgrade && \
apt-get -y install python3.10 && \
apt update && apt install python3-pip -y
# Method1 - installing LibreOffice and java
RUN apt-get --no-install-recommends install libreoffice -y
RUN apt-get install -y libreoffice-java-common
# Method2 - additionally installing unoconv
RUN apt-get install -y unoconv
ARG CACHEBUST=1
ADD BASE.py /code/BASE.py
# copying input doc/docx files to the docker's linux
COPY /input_files /code/input_files
CMD ["/code/BASE.py"]
ENTRYPOINT ["python3"]
BASE.py
import os
import subprocess

BASE_DIR = "/code"

# subprocess.run("ls /code/input_files", shell=True)
for filename in os.listdir('/code/input_files'):
    source_file = f"/code/input_files/{filename}"  # original document
    output_filename = os.path.splitext(filename)[0] + ".pdf"
    output_file = f"/code/output_files/{output_filename}"
    output_folder = "/code/output_files"  # pdf files will be here
    # METHOD 1 - LibreOffice directly
    # assign the command for converting files through LibreOffice
    convert_to_pdf = rf"libreoffice --headless --convert-to pdf {source_file} --outdir {output_folder}"
    subprocess.run(convert_to_pdf, shell=True)
    subprocess.run(r'ls /code/output_files/', shell=True)
    ## METHOD 2 - Using unoconv - also works
    # convert_to_pdf = f"unoconv -f pdf {source_file}"
    # subprocess.run(convert_to_pdf, shell=True)
    # print(f'file {filename} converted')
The methods above work when the files are already in the Linux filesystem at build time, but I still haven't found a way to write files into the container's filesystem after the Docker image is built.
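On that remaining problem, two standard ways to get files into a container after the image is built are a bind mount at docker run time, or docker cp into a running container. A sketch (the image and container names are illustrative; the commands are echoed here, not executed):

```shell
# Sketch: add files to a container after the image is built.
# Option 1: bind-mount a host directory over /code/input_files at run time.
RUN_CMD='docker run -v "$PWD/input_files:/code/input_files" my-image'
# Option 2: copy a file into an already-running container.
CP_CMD='docker cp ./input_files/report.docx my-container:/code/input_files/'
echo "$RUN_CMD"
echo "$CP_CMD"
```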

How to launch jupyter lab in VSCode using Docker with .devcontainer.json

I am trying to launch JupyterLab in a VSCode remote server, encapsulated by Docker, but got an error saying
Unable to start session for kernel Python 3.8.5 64-bit. Select another kernel to launch with.
I set up a Dockerfile and .devcontainer.json in the workspace directory.
Do I also need a docker-compose.yaml file for JupyterLab settings such as port forwarding?
Or can the .devcontainer.json file handle that and replace a docker-compose file?
Dockerfile:
FROM python:3.8
RUN apt-get update --fix-missing && apt-get upgrade -y
# Set Japanese UTF-8 as locale so Japanese can be used
RUN apt-get install -y locales \
&& locale-gen ja_JP.UTF-8
ENV LANG ja_JP.UTF-8
ENV LANGUAGE ja_JP:ja
ENV LC_ALL ja_JP.UTF-8
# RUN apt-get install zsh -y && \
# chsh -s /usr/bin/zsh
# Install zsh with theme and some plugins
RUN sh -c "$(wget -O- https://raw.githubusercontent.com/deluan/zsh-in-docker/master/zsh-in-docker.sh)" \
-t mrtazz \
-p git -p ssh-agent
RUN pip install jupyterlab
RUN jupyter serverextension enable --py jupyterlab
WORKDIR /app
CMD ["bash"]
.devcontainer.json
{
  "name": "Python 3.8",
  "build": {
    "dockerfile": "Dockerfile",
    "context": ".."
  },
  // Uncomment to use docker-compose
  // "dockerComposeFile": "docker-compose.yml",
  // "service": "dev",
  // Set *default* container specific settings.json values on container create.
  "settings": {
    "terminal.integrated.shell.linux": "/bin/bash",
    "python.pythonPath": "/usr/local/bin/python",
    "python.linting.enabled": true,
    "python.linting.pylintEnabled": true,
    "python.formatting.autopep8Path": "/usr/local/py-utils/bin/autopep8",
    "python.formatting.blackPath": "/usr/local/py-utils/bin/black",
    "python.formatting.yapfPath": "/usr/local/py-utils/bin/yapf",
    "python.linting.banditPath": "/usr/local/py-utils/bin/bandit",
    "python.linting.flake8Path": "/usr/local/py-utils/bin/flake8",
    "python.linting.mypyPath": "/usr/local/py-utils/bin/mypy",
    "python.linting.pycodestylePath": "/usr/local/py-utils/bin/pycodestyle",
    "python.linting.pydocstylePath": "/usr/local/py-utils/bin/pydocstyle",
    "python.linting.pylintPath": "/usr/local/py-utils/bin/pylint"
  },
  // Add the IDs of extensions you want installed when the container is created.
  "extensions": [
    "ms-python.python",
    "teabyii.ayu",
    "jeff-hykin.better-dockerfile-syntax",
    "coenraads.bracket-pair-colorizer-2",
    "file-icons.file-icons",
    "emilast.logfilehighlighter",
    "zhuangtongfa.material-theme",
    "ibm.output-colorizer",
    "wayou.vscode-todo-highlight",
    "atishay-jain.all-autocomplete",
    "amazonwebservices.aws-toolkit-vscode",
    "hookyqr.beautify",
    "phplasma.csv-to-table",
    "alefragnani.bookmarks",
    "mrmlnc.vscode-duplicate",
    "tombonnike.vscode-status-bar-format-toggle",
    "donjayamanne.githistory",
    "codezombiech.gitignore",
    "eamodio.gitlens",
    "zainchen.json",
    "ritwickdey.liveserver",
    "yzhang.markdown-all-in-one",
    "pkief.markdown-checkbox",
    "shd101wyy.markdown-preview-enhanced",
    "ionutvmi.path-autocomplete",
    "esbenp.prettier-vscode",
    "diogonolasco.pyinit",
    "ms-python.vscode-pylance",
    "njpwerner.autodocstring",
    "kevinrose.vsc-python-indent",
    "mechatroner.rainbow-csv",
    "msrvida.vscode-sanddance",
    "rafamel.subtle-brackets",
    "formulahendry.terminal",
    "tyriar.terminal-tabs",
    "redhat.vscode-yaml"
  ],
  // Use 'forwardPorts' to make a list of ports inside the container available locally.
  "forwardPorts": [8888],
  // Use 'postCreateCommand' to run commands after the container is created.
  // "postCreateCommand": "pip3 install -r requirements.txt",
  // Comment out to connect as root instead.
  // "remoteUser": "myname",
  "shutdownAction": "none"
}
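A point that often trips up JupyterLab-in-a-container setups like the one above: forwardPorts only exposes the port, so JupyterLab itself must listen on all interfaces inside the container. A sketch of a typical launch command (standard jupyter flags; echoed here rather than run):

```shell
# Sketch: start JupyterLab so a forwarded port (8888 above) can reach it;
# --ip=0.0.0.0 makes it listen beyond the container's localhost.
JUPYTER_CMD="jupyter lab --ip=0.0.0.0 --port=8888 --no-browser"
echo "$JUPYTER_CMD"
```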

How to install Bloomberg API Library for Python 2.7 on Mac OS X

I'm trying to set up my Mac OS X system to use the pdblp Python library, which requires me to first install the Bloomberg Open API library for Python. After cloning the git repo and running python setup.py install, I get
File "setup.py", line 20, in <module>
raise Exception("BLPAPI_ROOT environment variable isn't defined")
Exception: BLPAPI_ROOT environment variable isn't defined
How should I proceed?
Just to complete the question (thanks mob :)
Packages Source - https://www.bloomberglabs.com/api/libraries/
Preparation
SDK for C/C++
SDK for Python
Instructions
# navigate to the path where you want to keep your SDK for some time
cd /Users/msam/
# unzip C/C++ Package
tar zxvf Downloads/blpapi_cpp_3.8.1.1-darwin.tar.gz
# set variable
export BLPAPI_ROOT=/some/directory/blpapi_cpp_3.8.1.1/
export DYLD_LIBRARY_PATH=/Users/sampathkumarm/blpapi_cpp_3.8.1.1/Darwin/
# save variable to reuse in next session
echo >> ~/.bash_profile
echo "# Bloomberg API (Python) library settings" >> ~/.bash_profile
echo "export BLPAPI_ROOT=/some/directory/blpapi_cpp_3.8.1.1/" >> ~/.bash_profile
echo "export DYLD_LIBRARY_PATH=/Users/sampathkumarm/blpapi_cpp_3.8.1.1/Darwin/" >> ~/.bash_profile
echo >> ~/.bash_profile
Ref:
1. python blpapi installation error
You also need to install the C/C++ libraries and then set BLPAPI_ROOT to the location of the libblpapi3_32.so or libblpapi3_64.so files. For example:
cd /some/directory
wget https://bloomberg.bintray.com/BLPAPI-Experimental-Generic/blpapi_cpp_3.8.1.1-darwin.tar.gz
tar zxvf blpapi_cpp_3.8.1.1-darwin.tar.gz
export BLPAPI_ROOT=/some/directory/blpapi_cpp_3.8.1.1/Darwin
Then you can proceed with installing the python library.
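Condensing the steps above: before running python setup.py install for blpapi, both variables must point at the unpacked C/C++ SDK (the path is the example location from the answer, not a fixed value):

```shell
# Sketch: environment required by blpapi's setup.py; adjust the SDK path
# to wherever you unpacked the C/C++ package.
SDK=/some/directory/blpapi_cpp_3.8.1.1
export BLPAPI_ROOT="$SDK/Darwin"
export DYLD_LIBRARY_PATH="$SDK/Darwin"
echo "BLPAPI_ROOT=$BLPAPI_ROOT"
```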
