deepwater with tensorflow on RHEL6 - python

I am trying to get Python + deepwater + tensorflow running on RHEL 6.7. Using conda, I have installed Python 3.6.0, TensorFlow 1.1.0, and gcc 4.8.5. TensorFlow itself works fine.
I have installed the following wheels using pip install: h2o-3.11.0.3904-py2.py3-none-any.whl and h2o-3.11.0-py2.py3-none-any.whl.
I then tried to run the following example from the H2O tutorial:
import h2o
from h2o.estimators.deepwater import H2ODeepWaterEstimator
h2o.init()
train = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/bigdata/laptop/mnist/train.csv.gz")
features = list(range(0,784))
target = 784
train[target] = train[target].asfactor()
model = H2ODeepWaterEstimator(epochs=100, activation="Rectifier", hidden=[200,200], ignore_const_cols=False,
                              mini_batch_size=256, input_dropout_ratio=0.1, hidden_dropout_ratios=[0.5,0.5], stopping_rounds=3,
                              stopping_tolerance=0.05, stopping_metric="misclassification", score_interval=2, score_duty_cycle=0.5,
                              score_training_samples=1000, score_validation_samples=1000, nfolds=5, gpu=False, seed=1234, backend="tensorflow")
model.train(x=features, y=target, training_frame=train)
The following exception is thrown:
Exception: Unable to initialize the native Deep Learning backend: Cannot find TensorFlow native library for OS: linux, architecture: x86_64. See https://github.com/tensorflow/tensorflow/tree/master/java/README.md for possible solutions (such as building the library from source).
Is there anything else that I am missing? Would I need to build the bits from scratch for this platform?
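For completeness, a minimal sanity check of the conda TensorFlow 1.1.0 install on the Python side (this does not exercise the separate Java/JNI native library that the H2O error message refers to, so it can pass even when the Deep Water backend still fails):
import tensorflow as tf
print(tf.__version__)
# TF 1.x style: run a trivial op to confirm the Python-side native library loads
with tf.Session() as sess:
    print(sess.run(tf.constant("hello")))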

Related

Why does onnxruntime fail to create CUDAExecutionProvider on Linux (Ubuntu 20)?

import onnxruntime as rt
ort_session = rt.InferenceSession(
    "my_model.onnx",
    providers=["CUDAExecutionProvider"],
)
onnxruntime (onnxruntime-gpu 1.13.1) works well (in a Jupyter notebook in VS Code, Python 3.8.15) when providers is ["CPUExecutionProvider"]. But with ["CUDAExecutionProvider"] it sometimes (not always) throws an error:
[W:onnxruntime:Default, onnxruntime_pybind_state.cc:578 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
I followed the link in the error message and tried different setups in the conda environment, testing the code with various version combinations.
Replacing:
import onnxruntime as rt
with
import torch
import onnxruntime as rt
somehow perfectly solved my problem.
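A plausible explanation (an assumption, not something onnxruntime documents for this exact case) is that importing torch first loads the CUDA and cuDNN shared libraries bundled with the PyTorch wheel, which onnxruntime-gpu can then resolve when it creates the CUDA provider. A small sketch to check what is available and what the session actually uses, reusing the model path from the question (get_available_providers and get_providers are standard onnxruntime calls):
import torch  # imported first so its bundled CUDA/cuDNN shared libraries are loaded
import onnxruntime as rt

print(rt.get_available_providers())  # providers compiled into this onnxruntime build
ort_session = rt.InferenceSession(
    "my_model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(ort_session.get_providers())  # providers actually created for this session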

AttributeError: module 'tensorflow.python.keras' has no attribute 'applications'

I am trying to convert a Keras model (Thumbs.h5) into an ONNX model on Google Colab, but I get an "AttributeError: module 'tensorflow.python.keras' has no attribute 'applications'" error when I run the code.
My code:
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.models import load_model
import onnx
import keras2onnx
onnx_model_name = 'fish-resnet50.onnx'
model = load_model('model-resnet50-final.h5')
onnx_model = keras2onnx.convert_keras(model, model.name)
onnx.save_model(onnx_model, onnx_model_name)
What I've tried:
Updating keras with !pip install keras --upgrade (it was already up to date)
Running it locally in a Jupyter notebook on my M1 Mac (12.4), which gives the same error
Pointers or solutions are greatly appreciated.
As per the documentation:
keras2onnx has been tested on Python 3.5 - 3.8, with TensorFlow 1.x/2.0 - 2.2
So install a compatible version of TensorFlow.
Also, keras2onnx is no longer under active development, so use tf2onnx instead, as the documentation recommends.
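A minimal sketch of the tf2onnx route, assuming the .h5 model loads with tf.keras (file names are taken from the question; tf2onnx.convert.from_keras is the documented conversion entry point):
import tensorflow as tf
import tf2onnx

model = tf.keras.models.load_model('model-resnet50-final.h5')
# converts the loaded Keras model and writes the ONNX file in one call
onnx_model, _ = tf2onnx.convert.from_keras(model, output_path='fish-resnet50.onnx')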

detectron2 - CUDA is not available

I am trying out detectron2 and want to train the sample model.
When running the following code I get (<class 'RuntimeError'>, RuntimeError('No CUDA GPUs are available'), <traceback object at 0x7f42b094ebc0>). The code is below:
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
# import some common libraries
import matplotlib.pyplot as plt
import numpy as np
import cv2
# import some common detectron2 utilities
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.data.datasets import register_coco_instances
import random
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
import os
# To verify the data loading is correct, let's visualize the annotations of randomly selected samples in the training set:
register_coco_instances("fruits_nuts", {}, "../data/trainval.json", "../data/images")
fruits_nuts_metadata = MetadataCatalog.get("fruits_nuts")
dataset_dicts = DatasetCatalog.get("fruits_nuts")
'''
for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=fruits_nuts_metadata, scale=0.5)
    vis = visualizer.draw_dataset_dict(d)
    cv2.imshow('new', vis.get_image()[:, :, ::-1])
    cv2.waitKey(0)
'''
# train model
cfg = get_cfg()
cfg.merge_from_file("../detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("fruits_nuts",)
cfg.DATASETS.TEST = () # no metrics implemented for this dataset
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl" # initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.02
cfg.SOLVER.MAX_ITER = 300 # 300 iterations seems good enough, but you can certainly train longer
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128 # faster, and good enough for this toy dataset
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3 # 3 classes (data, fig, hazelnut)
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
I ran the script collect_env.py from torch:
/home/project/.venv/bin/python /home/project/src/collect_env.py
Collecting environment information...
PyTorch version: 1.10.2+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.13.0-27-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: 10.1.243
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.1
[pip3] torch==1.10.2
[pip3] torchvision==0.11.3
[conda] Could not collect
Process finished with exit code 0
I have an RTX 3080 graphics card in the system; however, it seems it is not being found.
Any suggestions why?
Is there a way to run the training without CUDA?
I appreciate your replies!
I'm not sure if this works for you, but here it is from a Windows user's perspective.
I'm using Detectron2 on Windows 10 with an RTX 3060 Laptop GPU with CUDA enabled.
The first thing you should check is CUDA. You can check it using the command:
nvcc -V
It should show a message like this:
C:\Users\User>nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:41:42_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
And to check whether your PyTorch is installed with CUDA enabled, use this (see the PyTorch website):
import torch
torch.cuda.is_available()
Based on the system info shared in this question, CUDA is not installed on your system, and no GPU driver is detected.
As far as I know, installing PyTorch with CUDA support is recommended to run Detectron2 on an (NVIDIA) GPU
(you can check the PyTorch website and the Detectron2 GitHub repo for more details).
Or, you can use this option:
add this line of code to your Python program (see detectron2 issue #300):
cfg.MODEL.DEVICE = "cpu"
I hope it helps. Cheers.
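If you want the same script to run with or without a GPU, one option (a sketch, not part of the original answer) is to pick the device from torch.cuda.is_available() before building the trainer:
import torch
from detectron2.config import get_cfg

cfg = get_cfg()
# fall back to CPU automatically when no CUDA device is visible
cfg.MODEL.DEVICE = "cuda" if torch.cuda.is_available() else "cpu"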

AttributeError: module transformers has no attribute TFGPTNeoForCausalLM

I cloned this repository/documentation https://huggingface.co/EleutherAI/gpt-neo-125M
I get the error below whether I run it on Google Colab or locally. I also installed transformers with
pip install git+https://github.com/huggingface/transformers
and made sure the configuration file is named config.json.
5 tokenizer = AutoTokenizer.from_pretrained("gpt-neo-125M/",from_tf=True)
----> 6 model = AutoModelForCausalLM.from_pretrained("gpt-neo-125M",from_tf=True)
7
8
3 frames
/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in __getattr__(self, name)
AttributeError: module transformers has no attribute TFGPTNeoForCausalLM
Full code:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M",from_tf=True)
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M",from_tf=True)
transformers-cli env results:
transformers version: 4.10.0.dev0
Platform: Linux-4.4.0-19041-Microsoft-x86_64-with-glibc2.29
Python version: 3.8.5
PyTorch version (GPU?): 1.9.0+cpu (False)
Tensorflow version (GPU?): 2.5.0 (False)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?:
Using distributed or parallel set-up in script?:
Both Colab and my local environment have TensorFlow 2.5.0.
Try it without the from_tf=True flag, like below:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")
from_tf expects the pretrained_model_name_or_path (i.e. the first parameter) to be a path to load saved Tensorflow checkpoints from.
My solution was to first edit the source code to remove the line that prepends "TF" to the class name, since the correct transformers class is GPTNeoForCausalLM, but somewhere in the source code a "TF" was manually added in front of it.
Secondly, before cloning the repository it is essential to run
git lfs install.
This link helped me install git lfs properly https://askubuntu.com/questions/799341/how-to-install-git-lfs-on-ubuntu-16-04
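Putting both points together: once the repository has been cloned with git lfs installed, so the real weight files are present instead of LFS pointer files, loading from the local directory without from_tf should work. A sketch using the local path from the question:
from transformers import AutoTokenizer, AutoModelForCausalLM

# no from_tf=True needed when the clone contains the actual weight files
tokenizer = AutoTokenizer.from_pretrained("gpt-neo-125M/")
model = AutoModelForCausalLM.from_pretrained("gpt-neo-125M/")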

How to import newly compiled python module?

I have compiled lightgbm with GPU support for python from sources following this guide http://lightgbm.readthedocs.io/en/latest/GPU-Windows.html
Test usage from the console was successful:
C:\github_repos\LightGBM\examples\binary_classification>"../../lightgbm.exe" config=train.conf data=binary.train valid=binary.test objective=binary device=gpu
[LightGBM] [Warning] objective is set=binary, objective=binary will be ignored. Current value: objective=binary
[LightGBM] [Warning] data is set=binary.train, data=binary.train will be ignored. Current value: data=binary.train
[LightGBM] [Warning] valid is set=binary.test, valid_data=binary.test will be ignored. Current value: valid=binary.test
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Loading weights...
Then I tried to import it in Python with no luck; it imports the Anaconda version without GPU support:
from sklearn.datasets import load_iris
iris = load_iris()
import lightgbm as lgb
lgtrain = lgb.Dataset(iris.data, iris.target)
lgb_clf = lgb.train(
    {
        'objective': 'regression',
        'metric': 'rmse',
        'num_leaves': 350,
        # 'max_depth': 14,
        'learning_rate': 0.017,
        'feature_fraction': 0.5,
        'bagging_fraction': .8,
        'verbosity': -1,
        'device': 'gpu'
    },
    lgtrain,
    num_boost_round=3500,
    verbose_eval=100
)
LightGBMError: b'GPU Tree Learner was not enabled in this build. Recompile with CMake option -DUSE_GPU=1'
I believe I have to specify the location but how?
I think this might not be specific to lightGBM, but rather a problem with Anaconda's virtual environment. When working within the Anaconda virtual env, your system paths are modified to point to Anaconda installation directories.
As you point out, this leads to Anaconda loading its own version, rather than the external version you configured, compiled and tested.
There are several ways to force Anaconda to find your package, see this related discussion.
The suggestions that involve running ln -s are only for Linux and Mac, but you can do something similar in Windows.
You could start by uninstalling the Anaconda version of lightGBM, then create a copy of the custom-compiled version within the Anaconda path. You can discover this path using:
import sys
sys.path
Remove the previously installed Python package with one of the following commands:
pip uninstall lightgbm
or
conda uninstall lightgbm
After doing that, navigate to the Python package directory and install it with the library file you've compiled:
cd LightGBM/python-package
python setup.py install --precompile
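To confirm that the reinstalled build is the one Python now imports, a quick check (lightgbm.__file__ shows which installation was picked up; the tiny run mirrors the question's snippet with device='gpu'):
import lightgbm as lgb
print(lgb.__file__)  # should point into the environment's site-packages

from sklearn.datasets import load_iris
iris = load_iris()
train_set = lgb.Dataset(iris.data, iris.target)
# raises "GPU Tree Learner was not enabled in this build" if the CPU-only build is still loaded
booster = lgb.train({'objective': 'regression', 'device': 'gpu', 'verbosity': -1},
                    train_set, num_boost_round=5)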
