Installation of tensorflow terminated on RStudio Cloud - python

Similar to posts here and here, I am having trouble installing TensorFlow in a new RStudio Cloud project. I know I need to set up both Miniconda and a virtual environment locally in /cloud/project/ so that the Python dependencies stay with copies of the cloud project. Previous versions of the following setup script worked.
install.packages(c("keras", "rstudioapi", "tensorflow"))
lines <- c(
  paste0("RETICULATE_CONDA=", file.path(getwd(), "miniconda", "bin", "conda")),
  paste0("RETICULATE_PYTHON=", file.path(getwd(), "miniconda", "bin", "python")),
  paste0("WORKON_HOME=", file.path(getwd(), "virtualenvs"))
)
writeLines(lines, ".Renviron")
rstudioapi::restartSession()
reticulate::install_miniconda("miniconda")
reticulate::virtualenv_create(
  envname = "r-tensorflow",
  python = Sys.getenv("RETICULATE_PYTHON")
)
keras::install_keras(
  method = "virtualenv",
  conda = Sys.getenv("RETICULATE_CONDA"),
  envname = "r-tensorflow"
)
But I get an error on Cloud when I try to install Python's TensorFlow and Keras:
keras::install_keras(
+ method = "virtualenv",
+ conda = Sys.getenv("RETICULATE_CONDA"),
+ envname = "r-tensorflow"
+ )
Using virtual environment 'r-tensorflow' ...
Collecting tensorflow==2.2.0
Downloading tensorflow-2.2.0-cp38-cp38-manylinux2010_x86_64.whl (516.3 MB)
Killed
Error: Error installing package(s): 'tensorflow==2.2.0', 'keras', 'tensorflow-hub', 'h5py', 'pyyaml==3.12', 'requests', 'Pillow', 'scipy'
The same script on my local Ubuntu machine appears to succeed, but it ignores my local virtual environment even though I set WORKON_HOME.
> tensorflow::tf_config()
Installation of TensorFlow not found.
Python environments searched for 'tensorflow' package:
/home/landau/projects/targets-tutorial/miniconda/bin/python3.8
You can install TensorFlow using the install_tensorflow() function.
Example project that uses this general approach: https://github.com/wlandau/targets-keras.

Related

Azure dataset .to_pandas_dataframe() error

I am following an Azure ML course on Udemy and cannot get around the following error:
Execution failed in operation 'to_pandas_dataframe' for Dataset(id='id', name='Loan Applications Using SDK', version=1, error_code=None, exception_type=PandasImportError)
Here is the code for Submitting the Script:
from azureml.core import Workspace, Experiment, ScriptRunConfig, Environment

ws = Workspace.from_config(path="./config")
new_experiment = Experiment(workspace=ws, name="Loan_Script")
script_config = ScriptRunConfig(source_directory=".", script="180 - Script to Run.py")
script_config.framework = "python"
script_config.environment = Environment("conda_env")
new_run = new_experiment.submit(config=script_config)
Here is the Script being run:
from azureml.core import Workspace, Datastore, Dataset, Experiment
from azureml.core import Run
ws = Workspace.from_config(path="./config")
az_store = Datastore.get(ws, "bencouser_sdk_blob01")
az_dataset = Dataset.get_by_name(ws, name='Loan Applications Using SDK')
az_default_store = ws.get_default_datastore()
#%%----------------------------------------------------
# Get context of the run
#------------------------------------------------------
new_run = Run.get_context()
#%%----------------------------------------------------
# Stuff that will be logged
#------------------------------------------------------
df = az_dataset.to_pandas_dataframe()
total_observations = len(df)
nulldf = df.isnull().sum()
#%%----------------------------------------------------
# Complete the Experiment
#------------------------------------------------------
new_run.log("Total Observations:", total_observations)
for columns in df.columns:
    new_run.log(columns, nulldf[columns])
new_run.complete()
I have run the .to_pandas_dataframe() part outside of an experiment and it worked without error. I have also tried the following (that was recommended in the driver log):
InnerException Could not import pandas. Ensure a compatible version is installed by running: pip install azureml-dataprep[pandas]
I have seen people come across this before but I cannot find a solution, any help is appreciated.
When the experiment ran, a new Azure environment was created without pandas installed. To install pandas (if you are using Anaconda Navigator), go to Environments in the Navigator window, click the Azure env, go to the uninstalled packages, search for pandas, and click install. It worked once this was done.
For VS code users:
conda info --envs
You will have an environment with a name starting with azureml_f3f7e6c5xxxxxxx. Activate this environment:
conda activate azureml_f3f7e6c5xxxxxxx
Then install pandas in the environment
pip install pandas
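Another option (a sketch, not from the original answers; it reuses the conda_env environment and script_config from the question) is to declare pandas as a pip dependency of the run's Environment in the submission script, so the image built for the experiment already contains it:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

# Attach pandas (plus the dataprep extra the error message asks for)
# to the environment used by ScriptRunConfig.
env = Environment("conda_env")
env.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["pandas", "azureml-dataprep[pandas]"]
)
script_config.environment = env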

Dependency missing when running AzureML Estimator in docker environment

Scenario description
I'm trying to submit a training script to AzureML (I want to use AmlCompute, but I'm starting/testing locally first, for debugging purposes).
The train.py script I have uses a custom package (arcus.ml) and I believe I have specified the right settings and dependencies, but still I get the error:
User program failed with ModuleNotFoundError: No module named 'arcus.ml'
Code and reproduction
This is the Python code I have:
name = 'test'
script_params = {
    '--test-par': 0.2
}
est = Estimator(source_directory='./' + name,
                script_params=script_params,
                compute_target='local',
                entry_script='train.py',
                pip_requirements_file='requirements.txt',
                conda_packages=['scikit-learn', 'tensorflow', 'keras'])
run = exp.submit(est)
print(run.get_portal_url())
This is the (fully simplified) train.py script in the test directory:
from arcus.ml import dataframes as adf
from azureml.core import Workspace, Dataset, Datastore, Experiment, Run
# get hold of the current run
run = Run.get_context()
ws = run.get_environment()
print('training finished')
And this is my requirements.txt file
arcus-azureml
arcus-ml
numpy
pandas
azureml-core
tqdm
joblib
scikit-learn
matplotlib
tensorflow
keras
Logs
In the log file of the run, I can see this section, so it seems the external module is being installed anyhow.
Collecting arcus-azureml
Downloading arcus_azureml-1.0.3-py3-none-any.whl (3.1 kB)
Collecting arcus-ml
Downloading arcus_ml-1.0.6-py3-none-any.whl (2.1 kB)
It could be that there's an issue with the arcus-ml 1.0.6 wheel; as Anders pointed out, it doesn't seem to contain any code. Could you try the earlier version, arcus-ml==1.0.5?
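For example, pinning the version in the requirements.txt from the question:
arcus-azureml
arcus-ml==1.0.5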
I think this error isn't necessarily about Azure ML. I think the error has to do with the difference between using a hyphen and a period in your package name. But I'm a Python packaging newb.
In a new conda environment on my laptop, I ran the following:
> conda create -n arcus python=3.6 -y
> conda activate arcus
> pip install arcus-ml
> python
>>> from arcus.ml import dataframes as adf
ModuleNotFoundError: No module named 'arcus'
When I look in the env's site-packages folder, I don't see the arcus/ml folder structure I was expecting. There's no arcus code there at all, only the .dist-info file:
~/opt/anaconda3/envs/arcus/lib/python3.6/site-packages
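You can confirm this from Python as well; importlib.util.find_spec returns None when no importable top-level package was installed:
import importlib.util

# The arcus-ml wheel should install an importable "arcus" package.
# If the wheel shipped no code, this prints None.
print(importlib.util.find_spec("arcus"))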

Problem with accessing virtual environment Python from R Markdown

Note: I'm on Windows using Git Bash.
So, I am trying to set up a dev environment for my class work. It is going to involve a combination of coding in R and Python.
I created virtual environments using pipenv and virtualenv and ran into the same problem with both. So, first, let's create a virtual environment for the project in a sub-folder dev_env:
cd project_folder/dev_env
pipenv --python 3.7
pipenv --py
Output
C:\Users\Ra Me\.virtualenvs\dev_env-5TUtSZI9\Scripts\python.exe
Now I'm going into my file.rmd and trying the reticulate package.
#install.packages("reticulate")
library(reticulate)
Next, I tried 2 methods:
Sys.setenv(RETICULATE_PYTHON = "C:/Users/Ra Me/.virtualenvs/dev_env-5TUtSZI9/Scripts")
or
use_virtualenv("C:/Users/Ra Me/.virtualenvs/dev_env-5TUtSZI9/", required = TRUE)
x = 1
if x:
    print('Hello!')
Both of them produced the error
Fatal Python error: initfsencoding: unable to load the file system codec
ModuleNotFoundError: No module named 'encodings'
However, when I change the path to the Python environment that's installed for all users on my machine, it works.
Sys.setenv(RETICULATE_PYTHON = "C:/Program Files/Python37/")
This method also works. However, here we are not even using the reticulate package.
knitr::opts_chunk$set(engine.path = list(
  python = "C:/Program Files/Python37/python.exe",
  r = "C:/Program Files/R/R-3.6.1/bin/R.exe"
))

In NixOS, how can I install an environment with the Python packages SpaCy, pandas, and jenks-natural-breaks?

I'm very new to NixOS, so please forgive my ignorance. I'm just trying to set up a Python environment (any kind of environment) for developing with SpaCy, the SpaCy data, pandas, and jenks-natural-breaks. Here's what I've tried so far:
pypi2nix -V "3.6" -E gcc -E libffi -e spacy -e pandas -e numpy --default-overrides, followed by nix-build -r requirements.nix -A packages. I've managed to get the first command to work, but the second fails with Could not find a version that satisfies the requirement python-dateutil>=2.5.0 (from pandas==0.23.4)
Writing a default.nix that looks like this:
with import <nixpkgs> {};
python36.withPackages (ps: with ps; [ spacy pandas scikitlearn ])
This fails with collision between /nix/store/9szpqlby9kvgif3mfm7fsw4y119an2kb-python3.6-msgpack-0.5.6/lib/python3.6/site-packages/msgpack/_packer.cpython-36m-x86_64-linux-gnu.so and /nix/store/d08bgskfbrp6dh70h3agv16s212zdn6w-python3.6-msgpack-python-0.5.6/lib/python3.6/site-packages/msgpack/_packer.cpython-36m-x86_64-linux-gnu.so
Making a new virtualenv, and then running pip install on all these packages. Scikit-learn fails to install, with fish: Unknown command 'ar rc build/temp.linux-x86_64-3.6/liblibsvm-skl.a build/temp.linux-x86_64-3.6/sklearn/svm/src/libsvm/libsvm_template.o'
I guess ideally I'd like to install this environment with nix, so that I could enter it with nix-shell, and so other environments could reuse the same python packages. How would I go about doing that? Especially since some of these packages exist in nixpkgs, and others are only on Pypi.
Caveat
I had trouble with jenks-natural-breaks to the tune of
nix-shell ❯ poetry run python -c 'import jenks_natural_breaks'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/matt/2022/12/28-2/.venv/lib/python3.10/site-packages/jenks_natural_breaks/__init__.py", line 5, in <module>
from ._jenks_matrices import ffi as _ffi
ModuleNotFoundError: No module named 'jenks_natural_breaks._jenks_matrices'
So I'm going to use jenkspy, which appears to be a bit livelier. If that doesn't scratch your itch, I'd contact the maintainer of jenks-natural-breaks for guidance.
Flakes
you said:
so other environments could reuse the same python packages
Which makes me think that a flake.nix is what you need. What's cool about flakes is that you can define an environment that has spacy, pandas, and jenkspy with one flake. And then you (or somebody else) might say:
I want an env like Jonathan's, except I also want sympy
and rather than copying your env and making tweaks, they can declare your env as a build input and write a flake.nix with their modifications, which can be further modified by others.
One could imagine a sort of family tree of environments, so you just need to pick the one that suits your task. The Python community has not yet converged on this vision.
Poetry
Poetry will treat you like you're trying to publish a library when all you asked for is an environment, but a library's dependencies are pretty much an environment so there's nothing wrong with having an empty package and just using poetry as an environment factory.
Bonus: if you decide to publish a library after all, you're ready.
The Setup
Nix flakes think in terms of git repos, so we'll start with one:
$ git init
Then create a file called flake.nix. Usually I end up with poetry handling 90% of the python stuff, but both pandas and spacy are in that 10% that has dependencies which link to system libraries. So we ask nix to install them so that when poetry tries to install them in the nix develop shell, it has what it needs.
{
  description = "Jonathan's awesome env";
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs";
  };
  outputs = { self, nixpkgs, flake-utils }: (flake-utils.lib.eachSystem [
    "x86_64-linux"
    "x86_64-darwin"
    "aarch64-linux"
    "aarch64-darwin"
  ] (system:
    let
      pkgs = nixpkgs.legacyPackages.${system};
    in
    rec {
      packages.jonathansenv = pkgs.poetry2nix.mkPoetryApplication {
        projectDir = ./.;
      };
      defaultPackage = packages.jonathansenv;
      devShell = pkgs.mkShell {
        buildInputs = [
          pkgs.poetry
          pkgs.python310Packages.pandas
          pkgs.python310Packages.spacy
        ];
      };
    }));
}
Now we let git know about the flake and enter the environment:
❯ git add flake.nix
❯ nix develop
$
Then we initialize the poetry project. I've found that poetry, installed by nix, is kind of odd about which python it uses by default, so we'll set it explicitly:
$ poetry init # follow prompts
$ poetry env use $(which python)
$ poetry run python --version
Python 3.10.9 # declared in the flake.nix
At this point, we should have a pyproject.toml:
[tool.poetry]
name = "jonathansenv"
version = "0.1.0"
description = ""
authors = ["Your Name <you#example.com>"]
readme = "README.md"
[tool.poetry.dependencies]
python = "^3.10"
jenkspy = "^0.3.2"
spacy = "^3.4.4"
pandas = "^1.5.2"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
Usage
Now we create the venv that poetry will use, and run a command that depends on these.
$ poetry install
$ poetry run python -c 'import jenkspy, spacy, pandas'
You can also have poetry put you in a shell:
$ poetry shell
(venv)$ python -c 'import jenkspy, spacy, pandas'
It's kind of awkward to do so, though, because we're two subshells deep and any shell customizations that we have in the grandparent shell are not available. So I recommend using direnv to enter the dev shell whenever you navigate to that directory, and then just using poetry run ... to run commands in the environment.
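For example (an assumption on my part: this relies on direnv with the nix-direnv extension installed), a one-line .envrc in the repo root loads the flake's dev shell automatically:
# .envrc -- loads the flake's devShell on cd; run `direnv allow` once to approve it
use flake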
Publishing the env
In addition to running nix develop with the flake.nix in your current dir, you can also do nix develop /local/path/to/repo or nix develop github:/githubuser/githubproject to achieve the same result.
To demonstrate the github example, I have pushed the files referenced above here. So you ought to be able to run this from any linux shell with nix installed:
❯ nix develop github:/MatrixManAtYrService/nix-flake-pandas-spacy
$ poetry install
$ poetry run python -c 'import jenkspy, spacy, pandas'
I say "ought" because if I run that command on a mac it complains about linux-headers-5.19.16 being unsupported on x86_64-darwin.
Presumably there's a way to write the flake (or fix a package) so that it doesn't insist on building linux stuff on a mac, but until I figure it out I'm afraid that this is only a partial answer.

How to import newly compiled python module?

I have compiled lightgbm with GPU support for python from sources following this guide http://lightgbm.readthedocs.io/en/latest/GPU-Windows.html
Test usage from the console was successful:
C:\github_repos\LightGBM\examples\binary_classification>"../../lightgbm.exe" config=train.conf data=binary.train valid=binary.test objective=binary device=gpu
[LightGBM] [Warning] objective is set=binary, objective=binary will be ignored. Current value: objective=binary
[LightGBM] [Warning] data is set=binary.train, data=binary.train will be ignored. Current value: data=binary.train
[LightGBM] [Warning] valid is set=binary.test, valid_data=binary.test will be ignored. Current value: valid=binary.test
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Loading weights...
Then I tried to import it in Python with no luck. It imports the Anaconda version without GPU support:
from sklearn.datasets import load_iris
iris = load_iris()
import lightgbm as lgb
lgtrain = lgb.Dataset(iris.data, iris.target)
lgb_clf = lgb.train(
    {
        'objective': 'regression',
        'metric': 'rmse',
        'num_leaves': 350,
        # 'max_depth': 14,
        'learning_rate': 0.017,
        'feature_fraction': 0.5,
        'bagging_fraction': .8,
        'verbosity': -1,
        'device': 'gpu'
    },
    lgtrain,
    num_boost_round=3500,
    verbose_eval=100
)
LightGBMError: b'GPU Tree Learner was not enabled in this build. Recompile with CMake option -DUSE_GPU=1'
I believe I have to specify the location but how?
I think this might not be specific to lightGBM, but rather a problem with Anaconda's virtual environment. When working within the Anaconda virtual env, your system paths are modified to point to Anaconda installation directories.
As you point out, this leads to Anaconda loading its own version, rather than the external version you configured, compiled and tested.
There are several ways to force Anaconda to find your package, see this related discussion.
The suggestions that involve running ln -s are only for Linux and Mac, but you can do something similar in Windows.
You could start by uninstalling the Anaconda version of lightGBM, then create a copy of the custom-compiled version within the Anaconda path. You can discover this path using:
import sys
sys.path
Remove the previously installed Python package with the following command:
pip uninstall lightgbm
or
conda uninstall lightgbm
After doing that, navigate to the Python package directory and install it with the library file you've compiled:
cd LightGBM/python-package
python setup.py install --precompile
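Afterwards, a quick sanity check (a sketch; the exact path will differ on your machine) is to look at which file Python actually imports:
import lightgbm as lgb

# Should point at the custom build you just installed,
# not Anaconda's bundled copy in site-packages.
print(lgb.__file__)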
