Similarly to the topic below, Keras stopped working.
tf.keras - Training on first epoch not progressing despite using GPU memory
I have a Python 3.7 Anaconda installation on Windows
CUDA 10.2 and cuDNN installed
3080 GPU
keras 2.3.1
TF 1.4
A few days ago everything was running perfectly. Then, after installing PyTorch, Keras stopped working. The same script I was training with before now gets stuck on the first epoch. No errors are displayed when running model.fit (verbose 2). The whole memory simply fills up (even with a very small dataset) and the training does not advance.
As additional information, PyTorch displayed an error saying it could not use CUDA.
I've tried formatting the whole PC (factory reset) and the issue still happens.
I'm out of ideas. Any suggestion would be more than welcome.
Thanks!
I don't think a factory reset of the whole PC was necessary. I would suggest creating two conda virtual environments, one with TensorFlow and the other with PyTorch. Conda virtual environments are really useful: they keep each framework's dependencies separated, which should help in your case. The official Anaconda reference explains how to manage environments.
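Once the two environments exist, a quick way to confirm they really are separate is to check, from inside each one, which frameworks can be imported. The snippet below is only a minimal sketch and not part of the original answer; the helper name installed_frameworks is made up for illustration, and it relies on nothing beyond the standard library.

```python
import importlib.util


def installed_frameworks(names=("tensorflow", "torch", "keras")):
    """Map each top-level package name to True if it is importable here."""
    # find_spec returns None when a top-level package cannot be found,
    # and it does not actually import the package.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in installed_frameworks().items():
        print(f"{name}: {'installed' if present else 'not installed'}")
```

Run inside the TensorFlow environment, one would expect torch to be reported as not installed, and the other way round in the PyTorch environment.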
Related
I'm running one kernel to train a TensorFlow model, and it is using my GPU. Now, in the same conda environment, I would like to evaluate another TensorFlow model that was trained earlier. I'm fairly sure I can run two kernels in the same conda environment in general, but I'm not sure about it when the GPU is involved. If I start a second kernel that uses TensorFlow, can it affect the kernel that was started earlier, especially in terms of GPU usage?
My environment: Windows 10, TensorFlow 2.1, Python 3.7.9
This is not the best answer, but I realized that I can evaluate my model in another conda environment that has a different version of TensorFlow. In that environment, the CUDA and cuDNN versions are not compatible with the TensorFlow version, so the GPU was not used. This way, I evaluated a model without stopping or affecting the training running in the other kernel.
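The same effect (evaluating on the CPU while the other kernel keeps the GPU) can probably be reached without a second environment by hiding the GPU from the evaluating process. The following is a minimal sketch under the assumption that the variable is set before TensorFlow is first imported in that kernel; hide_gpus and visible_gpus are illustrative names, not TensorFlow functions.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from this process; call before importing TensorFlow."""
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


def visible_gpus():
    """Return the GPUs TensorFlow can see (TensorFlow is imported lazily here)."""
    import tensorflow as tf  # imported only after the variable has been set
    return tf.config.list_physical_devices("GPU")


hide_gpus()

if __name__ == "__main__":
    try:
        print("GPUs visible to TensorFlow:", visible_gpus())
    except ImportError:
        print("TensorFlow is not installed in this environment")
```

With the GPU hidden, evaluation runs on the CPU, so it is slower but does not compete with the training kernel for GPU memory.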
I am trying to create a GPU environment in Jupyter Notebook to run CNN models but have had trouble. I am on macOS (Big Sur) and was following the instructions from: https://www.techentice.com/how-to-make-jupyter-notebook-to-run-on-gpu/
First, I understand that to create a separate GPU environment in Jupyter I need the CUDA toolkit. However, I found out that the CUDA toolkit no longer supports Mac.
Second, I understand that I have to download tensorflow-gpu, which apparently doesn't support Mac/Python 3.7.
I would be grateful for any help or advice. Essentially, I just want to be able to run my code on the GPU, as the CPU is way too slow for machine learning models. Is there any way around this?
I am not an experienced Python user. I have been working with R for years, but the keras package for R doesn't provide any reproducible examples of object detection architectures like Faster R-CNN. I found a lot of examples that use Python, but I ran into trouble just running the first lines of some of them: they all install their dependencies through pip (in a terminal on Ubuntu or another Linux OS), while equivalents for Windows conda users are not provided.
That is, I don't even know how to install the mrcnn module from one of the examples on my Windows machine. Should I suffer further? I had a very bad experience trying to get compatible versions of CUDA, cuDNN and other things working for keras on Ubuntu. Now I am returning to Windows, but keras in R still doesn't offer anything for object detection.
Does somebody have links to a Faster or Mask R-CNN implementation with conda instructions for installing the prerequisites, or to one in R keras? My googling has failed here.
I am new to the Jetson Tegra X2 (TX2) board.
I plan to run my tensorflow-gpu models on the TX2 board and see how they perform there. These models were trained and tested on a machine with a GTX GPU.
On the TX2 board, the full JetPack install does not include TensorFlow, so TensorFlow needs to be built or installed; I have seen several tutorials on this and tried them. My Python files train.py and test.py expect tensorflow-gpu.
Now I wonder whether building tensorflow-gpu on the TX2 board is the right way to go.
There is also NVIDIA TensorRT on the TX2, which will do part of the job, but how? And is that the right approach?
Will TensorFlow and TensorRT work together to replace tensorflow-gpu, and if so, how? What modifications will I then have to make in my train and test Python files?
Do I really need to build TensorFlow for the TX2 at all? I only need inference; I don't want to do training there.
I have studied different blogs and tried several options, but now things are a bit messed up.
My simple question is:
What are the steps to get inference done on a Jetson TX2 board using TensorFlow-GPU deep learning models trained on a GTX machine?
The easiest way is to install the NVIDIA-provided wheel: https://docs.nvidia.com/deeplearning/dgx/install-tf-jetsontx2/index.html
All the dependencies are already installed by JetPack.
After you install TensorFlow using the wheel, you can use it the same way you use TensorFlow on other platforms. For running inference, you can download a TensorFlow model into TX2 memory and run your TensorFlow inference scripts on it.
You can also optimize your Tensorflow models by passing them through TF-TRT: https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html
There is just one API call that does the optimization: create_inference_graph(...)
This optimizes the TensorFlow graph (mostly by fusing nodes) and also lets you build the model at lower precision for a better speedup.
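For illustration, a call to create_inference_graph might look like the sketch below. It assumes TensorFlow 1.x, where the function lives in tensorflow.contrib.tensorrt (contrib was removed in TensorFlow 2.x), and a frozen GraphDef that is already loaded. The helper names, the output node names and the numeric values are made up for this example; INT8 additionally needs a calibration step that is not shown.

```python
def trt_conversion_kwargs(output_names, precision_mode="FP16", max_batch_size=1):
    """Collect keyword arguments for create_inference_graph (illustrative values)."""
    if precision_mode not in ("FP32", "FP16", "INT8"):
        raise ValueError("precision_mode must be FP32, FP16 or INT8")
    return {
        "outputs": list(output_names),        # names of the graph's output nodes
        "max_batch_size": max_batch_size,     # largest batch used at inference time
        "max_workspace_size_bytes": 1 << 25,  # scratch memory TensorRT may use
        "precision_mode": precision_mode,     # lower precision trades accuracy for speed
    }


def optimize_frozen_graph(frozen_graph_def, output_names, precision_mode="FP16"):
    """Return a TF-TRT optimized GraphDef; requires TensorFlow 1.x built with TensorRT."""
    import tensorflow.contrib.tensorrt as trt  # assumption: TF 1.x contrib module
    return trt.create_inference_graph(
        input_graph_def=frozen_graph_def,
        **trt_conversion_kwargs(output_names, precision_mode),
    )
```

The returned GraphDef can then be imported and run in a regular TensorFlow session like the unoptimized graph, so the inference script itself should need few changes.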
I built TensorFlow on the Jetson TX2 following this guide. It provides instructions and wheels for both Python 2 and Python 3.
https://github.com/jetsonhacks/installTensorFlowTX2
If you are new to the Jetson TX2, also take a look at this "Guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson". (This does not require a TensorFlow installation, since JetPack already installs TensorRT.)
https://github.com/dusty-nv/jetson-inference#building-from-source-on-jetson
If you have TensorFlow-trained graphs that you want to run inference with on the Jetson, you first need to install TensorFlow. Afterwards, it is recommended (but not compulsory for inference) that you optimize your trained models with TensorRT. Check out these repos for object detection/classification examples that use TensorRT optimization.
https://github.com/NVIDIA-AI-IOT/tf_trt_models
https://github.com/NVIDIA-AI-IOT/tf_to_trt_image_classification
You can find the tensorflow-gpu wheel files for the TX2, for both Python 2.7 and Python 3.5, at this link on NVIDIA's Developer Forum.
https://devtalk.nvidia.com/default/topic/1031300/jetson-tx2/tensorflow-1-8-wheel-with-jetpack-3-2-/
I developed a CNN using tensorflow and python2.7. I am now switching my code to python3.5. I have both python versions on my machine and I have two tensorflow versions installed (one through pip and one through pip3). I am using Linux 16.04.
When I try to run my code using the python3 command, it takes a very long time to load and doesn't start training (where it used to take 3 seconds). It slows down my entire machine, so it's probably a memory issue. My coworker gets a memory error when running the same code on Windows (his machine has 128GB of memory).
My CNN has only two convolutional layers and one fully connected layer, and I'm loading less than 100MB of data.
Why is tensorflow acting differently when I change the python version?