How to install InvokeAI properly? - python

I am having trouble installing the newest version (2.3.0) of InvokeAI.
Python 3.10.9 is already installed, but I keep receiving error messages like the following:
** Could not load VAE stabilityai/sd-vae-ft-mse: Unable to load weights from checkpoint file for 'C:\Users\User\invokeai\models\diffusers\models--stabilityai--sd-vae-ft-mse\snapshots\ad7ac2cf88578c68f660449f60fe9496f35a1cbf\diffusion_pytorch_model.safetensors' at 'C:\Users\User\invokeai\models\diffusers\models--stabilityai--sd-vae-ft-mse\snapshots\ad7ac2cf88578c68f660449f60fe9496f35a1cbf\diffusion_pytorch_model.safetensors'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
An unexpected error occurred while downloading the model: Unable to load weights from checkpoint file for 'models\diffusers\models--stabilityai--sd-vae-ft-mse\snapshots\ad7ac2cf88578c68f660449f60fe9496f35a1cbf\diffusion_pytorch_model.safetensors' at 'models\diffusers\models--stabilityai--sd-vae-ft-mse\snapshots\ad7ac2cf88578c68f660449f60fe9496f35a1cbf\diffusion_pytorch_model.safetensors'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.)
I installed InvokeAI through the automatic installation wizard, but InvokeAI won't run.
My system specs are:
Device name DESKTOP-0D5KCD8
Processor Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz 2.90 GHz
Installed RAM 16,0 GB (15,9 GB usable)
Device ID 29AF1B96-7F46-4A9E-BC4B-228963988119
Product ID 00330-50647-38567-AAOEM
System type 64-bit operating system, x64-based processor
Pen and touch Pen support
I tried reinstalling InvokeAI and installed different versions alongside different versions of Python.
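For what it's worth, that error often points to a corrupted or incomplete download of the VAE weights rather than a Python problem. Below is a minimal sketch (not part of the InvokeAI installer; the cache directory is an assumption based on the paths in the error message) of forcing a clean re-download with huggingface_hub:
from huggingface_hub import snapshot_download

# Re-fetch the VAE repo even if a (possibly broken) copy already exists in the cache.
# The cache_dir below is assumed from the error message's paths.
snapshot_download(
    "stabilityai/sd-vae-ft-mse",
    cache_dir=r"C:\Users\User\invokeai\models\diffusers",
    force_download=True,
)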

Related

How to use integrated GPU while training XGBoost model?

First, I want to say that I am new to this field and don't know much.
I have the following laptop: a Dell Vostro 15 5510, with GPU: "Intel(R) Iris(R) Xe Graphics".
I installed xgboost with the following command:
pip install xgboost
Now I am trying to train a model on the GPU:
import xgboost as xgb  # dtrain is assumed to be a DMatrix and eval_set a list of (DMatrix, name) pairs defined earlier

param = {'objective': 'multi:softmax', 'num_class': 22}
param['tree_method'] = 'gpu_hist'  # request GPU-accelerated histogram tree construction
bst = xgb.train(param, dtrain, 50, verbose_eval=True, evals=eval_set)
but it throws the following error:
XGBoostError: [11:16:53] C:/buildkite-agent/builds/buildkite-windows-cpu-autoscaling-group-i-0ac76685cf763591d-1/xgboost/xgboost-ci-windows/src/gbm/gbtree.cc:611: Check failed: common::AllVisibleGPUs() >= 1 (0 vs. 1) : No visible GPU is found for XGBoost.
I have tried executing the same code on Google Colab and it worked perfectly well. That's why I think my laptop may need a dedicated GPU instead of an integrated one, and I don't think it is an installation problem, because https://xgboost.readthedocs.io/en/stable/install.html#python claims that pip install xgboost has GPU support on Windows.
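As a side note, gpu_hist requires a CUDA-capable NVIDIA GPU, so an integrated Intel GPU will not be visible to XGBoost. The sketch below (an assumption about how one might probe this, not something from the XGBoost docs) falls back to the CPU hist method when no CUDA device is found:
import numpy as np
import xgboost as xgb

def pick_tree_method():
    """Return 'gpu_hist' if XGBoost can actually see a CUDA GPU, else 'hist'."""
    try:
        # Train a tiny throwaway booster on the GPU; this raises the same
        # "No visible GPU is found" XGBoostError when no CUDA device exists.
        probe = xgb.DMatrix(np.array([[0.0], [1.0]]), label=np.array([0, 1]))
        xgb.train({'tree_method': 'gpu_hist'}, probe, num_boost_round=1)
        return 'gpu_hist'
    except xgb.core.XGBoostError:
        return 'hist'

param = {'objective': 'multi:softmax', 'num_class': 22,
         'tree_method': pick_tree_method()}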

pytorch profiler nvidia cuda permission error (microsoft wsl): CUPTI_ERROR_NOT_INITIALIZED, CUPTI_ERROR_INSUFFICIENT_PRIVILEGES

I am trying to run a profiling script for PyTorch on MS WSL 2.0 with Ubuntu 20.04.
WSL is on the newest version (wsl --update). I am running the stable conda PyTorch CUDA 11.3 build from the PyTorch website with PyTorch 1.11. My GPU is a GTX 1650 Ti.
I can run my script fine and it finishes without error, but when I try to profile it using PyTorch's bottleneck profiling tool python -m torch.utils.bottleneck run.py
it first throws this warning when starting the autograd profiler:
Running your script with the autograd profiler...
WARNING:2022-06-01 13:37:49 513:513 init.cpp:129] function status failed with error CUPTI_ERROR_NOT_INITIALIZED (15)
WARNING:2022-06-01 13:37:49 513:513 init.cpp:130] CUPTI initialization failed - CUDA profiler activities will be missing
Then, if I run for a small number of epochs, the script finishes fine and also shows the CUDA profiling stats (even though it says profiler activities will be missing). But when I do a longer run, I get the message Killed after the script runs "through" the autograd profiler. The command dmesg gives this output at the end:
[ 1224.321233] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=python,pid=295,uid=1000
[ 1224.321421] Out of memory: Killed process 295 (python) total-vm:55369308kB, anon-rss:15107852kB, file-rss:0kB, shmem-rss:353072kB, UID:1000 pgtables:39908kB oom_score_adj:0
[ 1224.746786] oom_reaper: reaped process 295 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:353936kB
So, when using the profiler, there seems to be a memory error (it might not necessarily be related to the above CUPTI warning). Is this related to the profiler saving too much data in memory? If so, it might be a common problem for runs that are too long, right?
The CUDA warning CUPTI_ERROR_NOT_INITIALIZED indicates that CUPTI (short for "CUDA Profiling Tools Interface") is not running. I read in another post that this might be because I am running a newer version of CUPTI that is not backward compatible with the older CUDA 11.3. Since CUPTI is not included in the conda cudatoolkit by default, the system is probably trying to locate it but cannot find or use it.
I'd appreciate any help with this issue. It would be nice to get a longer profiling run in order to determine the bottlenecks and expensive operations in my PyTorch code.
Thanks!
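In case it helps anyone reading this: if the profiler's in-memory trace is indeed what exhausts RAM, one workaround is to profile only a short window of steps with torch.profiler instead of running the whole script through torch.utils.bottleneck. A rough sketch, where train_step and data_loader are placeholders for the poster's own training loop:
import torch
from torch.profiler import ProfilerActivity, profile, schedule

# Only 3 "active" steps are recorded, so the trace stays small no matter how
# long the overall run is. train_step and data_loader are hypothetical placeholders.
prof_schedule = schedule(wait=1, warmup=1, active=3, repeat=1)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             schedule=prof_schedule) as prof:
    for step, batch in enumerate(data_loader):
        train_step(batch)
        prof.step()   # advance the profiling schedule after every step
        if step >= 5:
            break

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))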

(Tensorflow) Stuck at Epoch 1 during model.fit()

I've been trying to make TensorFlow 2.8.0 work with my Windows GPU (GeForce GTX 1650 Ti), and even though it detects my GPU, any model that I make gets stuck at Epoch 1 indefinitely when I call the fit method, until the kernel (I've tried Jupyter Notebook and Spyder) hangs and restarts.
Based on TensorFlow's website, I've downloaded the corresponding cuDNN and CUDA versions, which I've verified (together with TensorFlow's detection of my GPU) by running the following commands:
CUDA (Supposed to be 11.2)
(on command line)
nvcc --version
Build cuda_11.2.r11.2/compiler.29373293_0
(In python)
import tensorflow.python.platform.build_info as build
print(build.build_info['cuda_version'])
Output: '64_112'
cuDNN (Supposed to be 8.1)
import tensorflow.python.platform.build_info as build
print(build.build_info['cudnn_version'])
Output: '64_8' # Looks like v8, but I've actually installed v8.1 (cuDNN v8.1.1 (February 26th, 2021), for CUDA 11.0, 11.1 and 11.2), so I think it's fine?
GPU Checks
tf.config.list_physical_devices('GPU')
Output: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
tf.test.is_gpu_available()
Output: True
tf.test.gpu_device_name()
Output: This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Created device /device:GPU:0 with 2153 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5
When I then try to fit any sort of model, it just fails as described above. What is surprising is that even though it can't run code such as that in TensorFlow's CNN tutorial, the only time it ever works is if I run the chunk of code from this Stack Overflow question. That chunk of code looks almost the same as every other chunk that failed.
Can someone help me with this issue? I've been desperately testing TensorFlow with every chunk of code I came across for the past couple of hours, and the only time it does not get stuck at Epoch 1 is with the code from the link above.
(I've also tried running only on my CPU via os.environ['CUDA_VISIBLE_DEVICES'] = '-1' and everything seems to work fine.)
Update (Solution)
It seems like the suggestions from this post helped - I copied the following files from the zipped cuDNN bin subfolder (cudnn-11.2-windows-x64-v8.1.1.33\cuda\bin) into my CUDA bin folder (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin):
cudnn_adv_infer64_8.dll
cudnn_adv_train64_8.dll
cudnn_cnn_infer64_8.dll
cudnn_cnn_train64_8.dll
cudnn_ops_infer64_8.dll
cudnn_ops_train64_8.dll
It seems like I initially misinterpreted the instruction to copy all cudnn*.dll files as only copying the cudnn64_8.dll file, rather than copying every file listed above.
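For anyone who wants to double-check the copy, here is a small sketch (not from the original post; the CUDA path is the default install location and may differ on your machine) that verifies the cuDNN DLLs are present in the CUDA bin folder and can actually be loaded:
import ctypes
import glob
import os

# Default CUDA 11.2 install path on Windows; adjust if CUDA lives elsewhere.
cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin"

for dll in sorted(glob.glob(os.path.join(cuda_bin, "cudnn*_8.dll"))):
    ctypes.WinDLL(dll)                      # raises OSError if the DLL cannot load
    print("OK:", os.path.basename(dll))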

Do pretrained tensorflow models need to be used by machines with the same versions?

I trained a CNN on a Linux machine with Keras/TensorFlow but can't get the pretrained model to run on my Raspberry Pi. The model was made on Ubuntu 16.04 with Python 3.6.7, TensorFlow 1.7.0, cuDNN 7.0.5 and CUDA 9. I am trying to run it on a Raspberry Pi 3 Model B+ with Python 3.5.3 and TensorFlow 1.13.1.
I have no problem loading and running the pretrained model on the same machine it was created on. The issue only arises when I try to run that same pretrained model on the RPi system: I end up getting a segmentation fault.
I tried updating the Linux machine that created the model to TensorFlow 1.12, but after TensorFlow 1.12 installed successfully, I got "Failed to get convolution algorithm. This is probably because cuDNN failed to initialize" errors, so I'd rather not go down that route. I want to know if it's possible to just use this pretrained model with TensorFlow 1.13.1 on the RPi.
Here's what I'm doing on the RPi:
>>> import tensorflow as tf
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: compiletime version 3.4 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.5
return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: builtins.type size changed, may indicate binary incompatibility. Expected 432, got 412
>>> print(tf.__version__)
1.13.1
>>> from keras.models import load_model
Using TensorFlow backend.
>>> model = load_model(save_dir+model_name)
WARNING:tensorflow:From /home/pi/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/pi/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
2019-03-25 17:08:11.471364: W tensorflow/core/framework/allocator.cc:124] Allocation of 209715200 exceeds 10% of system memory.
2019-03-25 17:12:55.123877: W tensorflow/core/framework/allocator.cc:124] Allocation of 209715200 exceeds 10% of system memory.
Backend terminated (returncode: -11)
Fatal Python error: Segmentation fault
I need some guidance on why this is happening. Are the versions incompatible? Maybe the model is too large for the RPi (I doubt it; it's a fairly shallow model with 18 layers)? The other forum posts I've seen about segmentation faults seemed a lot more dire (e.g., the users couldn't even run standard commands in the terminal without a segmentation error); this segmentation fault only happens (and happens repeatably) through the commands above.
Any advice/help greatly appreciated!
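One thing that sometimes helps with cross-version loading (a sketch of a common workaround, not something the poster tried, and no guarantee it avoids the segfault): save the architecture and weights separately instead of a single model file, then rebuild on the Pi:
from keras.models import model_from_json

# On the training machine (model is the trained Keras model):
with open("model.json", "w") as f:
    f.write(model.to_json())          # architecture only
model.save_weights("weights.h5")      # weights only

# On the Raspberry Pi:
with open("model.json") as f:
    model = model_from_json(f.read())
model.load_weights("weights.h5")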

When training simple PyTorch code, CPU usage increases but GPU usage is approximately 0%

I'm working through a PyTorch tutorial.
The code runs to completion, but I have one problem.
It is about my CPU usage.
When training starts, CPU usage increases up to 100%,
but GPU usage is roughly 0%.
I installed CUDA 9.2 and cuDNN,
and I already checked that torch.cuda.is_available() == True.
Is this OK, or is my setup wrong?
1. Did you move your model and input tensors onto the GPU explicitly, as shown below?
https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#training-on-gpu
For example,
# Configure your device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Upload your model onto GPU
net.to(device)
# Upload your tensor onto GPU
inputs, labels = inputs.to(device), labels.to(device)
2. You can also use "gpustat" to check GPU usage.
https://github.com/wookayin/gpustat
After installing it, you can type "gpustat" in the terminal.
If your code runs on the GPU, GPU usage will increase.
3. Also check whether you've added the following CUDA paths to your bashrc file.
These are the usual paths on Ubuntu Linux,
but they can differ depending on your OS or setup.
You can open the bashrc file by typing vim ./.bashrc
when your current directory is your home directory, in case you use Ubuntu Linux.
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
4. Also check that your graphics driver is installed
by typing nvidia-smi in the terminal if you use Ubuntu Linux.
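One more quick check, sketched below (assuming the net and inputs variables from the CIFAR-10 tutorial code): print where the parameters and the current batch actually live. If either prints "cpu", the .to(device) calls above are missing or happen too late.
import torch

# Both should print a CUDA device such as cuda:0 when training really uses the GPU.
print(next(net.parameters()).device)
print(inputs.device)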
