PyTorch Lightning working in One Colab Notebook but not Another - python

While following a YouTube tutorial, my code breaks at the point where pytorch_lightning is imported. I had a similar problem in my PyCharm environment, but it has since gone away and I am unsure why.
The error was:
import pytorch_lightning as pl fails with ValueError: transformers.models.auto.__spec__ is None.
This is my Colab notebook: https://colab.research.google.com/drive/1DeeYFvK7wFsf9VBxjr184VAlulg6M_ly?usp=sharing
I tried to see whether I could replicate the error with a minimal example in a new notebook: https://colab.research.google.com/drive/1NiHIoIt8v215-lO8KoHKDSkCuw9W5S6a?usp=sharing
In the new notebook, it imports just fine. Why, then, do I get an error in my actual notebook?
Update: Ending my runtime session and restarting it solved the issue. However, I'd like to know what I did to cause it in the first place, if possible, so I can avoid it happening again.

It's a known issue with torchmetrics==0.7.0. Until torchmetrics==0.7.1 is released, use an older version:
pip install "torchmetrics<0.7"
Here's the tracking issue: https://github.com/PyTorchLightning/pytorch-lightning/issues/11524
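In a Colab cell, a minimal sketch of the workaround might look like this (assuming a fresh runtime, or a runtime restart after the install, so the broken torchmetrics 0.7.0 is not already loaded in memory):

!pip install "torchmetrics<0.7"

import torchmetrics
print(torchmetrics.__version__)   # should report a 0.6.x release

import pytorch_lightning as pl    # should now import without the __spec__ ValueError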

Related

scanpy neighbors function: LLVM ERROR: Symbol not found: __svml_sqrtf8

Whenever I use sc.pp.neighbors(adata) I get this message (without any error):
I have:
scanpy==1.8.1
pynndescent==0.5.4
numba==0.54.0
umap-learn==0.5.1
anndata==0.7.6
My dataset contains only ~20,000 cells, so it's quite odd that my kernel dies on such a relatively small dataset.
I even tried scanpy's bbknn function as an alternative, and my kernel died as well.
I also found the same problem reported as an issue on GitHub: https://github.com/theislab/scanpy/issues/1567 but it has no solution yet.
I tried running the code from cmd instead of Jupyter Notebook and got the following error:
LLVM ERROR: Symbol not found: __svml_sqrtf8
What should I do in order to properly run this function?
This comment by @Iguananaut worked for me:
If you can reproduce the problem outside the Jupyter Notebook, then it's not really a problem relative to the use of Jupyter, and that tag can be avoided. The problem is somewhere else. The issue is likely related to numba, and possibly an incompatibility between a pre-compiled numba and other libraries installed on your system. I wonder if it would help if you set the environment variable NUMBA_DISABLE_INTEL_SVML=1
I created a new environment variable as below:
variable name: NUMBA_DISABLE_INTEL_SVML
variable value: 1
This then allowed me to run UMAP. Before, I was seeing the same error in a terminal window:
Symbol not found: __svml_sqrtf8
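If changing the system settings is not an option, here is a sketch of the same workaround from inside the notebook itself (the assumption here is that the variable must be set before numba is imported for the first time, since numba reads its configuration at import time):

import os
os.environ["NUMBA_DISABLE_INTEL_SVML"] = "1"   # disable Intel SVML before numba loads

import scanpy as sc   # imports numba internally
# sc.pp.neighbors(adata) should no longer abort with the __svml_sqrtf8 error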

Created a mess I need to clean up while trying to solve conflicts between MacBook Pro M1 and the TensorFlow library

I have a 2021 MacBook Pro M1.
It is a well-recognized problem that it conflicts with the TensorFlow library, see here or here; in particular, this last one was exactly the issue I experienced: when I tried to import TensorFlow in a Jupyter notebook via the command
import tensorflow as tf
then the message
The kernel appears to have died. It will restart automatically.
appeared. Then, searching through the discussions linked above, I have the feeling that the suggestion given at some point, which points at this link, SEEMS to be useful.
FIRST QUESTION: is this a/the solution for the M1-TensorFlow conflict?
I say "it seems" because, before trying that, I got caught in the kind of tornado of desperate attempts that leads a beginner like me to search for hints all around the web and copy-paste commands taken here and there into the Terminal without properly understanding them.
On the one hand it sounds dumb, I admit; on the other, the cost of understanding everything goes well beyond my humble intention of learning some ML.
So the final result is that my computer is a complete mess: old libraries like NumPy don't work anymore (when I import them in a Python 3 notebook opened with Jupyter via the command import numpy as np, the message
ModuleNotFoundError: No module named 'numpy'
appears), and the pip command doesn't work; if I use pip3 to install, nothing changes. I read somewhere to use a virtual environment, and I followed the instructions even though I wasn't really aware of what I was doing; I downloaded Xcode, miniforge3...
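(Side note: a small diagnostic sketch like the one below at least shows whether the Jupyter kernel and pip are pointing at the same interpreter; the printed paths depend entirely on the machine, nothing here is specific to any particular setup.)

import sys
print(sys.executable)   # the Python interpreter actually running this kernel
print(sys.prefix)       # the environment that interpreter belongs to
print(sys.path)         # where this interpreter looks for packages such as numpy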
Well, I guess that there is somebody out there who can relate with this.
SECOND PROBLEM: I would like to clean up everything dealing with Python/pip/Anaconda and so on and install everything from scratch, possibly following the above link to solve the M1-TensorFlow conflict... if it is correct. How can I do that?
Can somebody help me, please? Thanks

How to effectively use R keras (and other R packages that use Python in the background) in a Google Colab notebook with a Python kernel?

This is partially related to an already closed question about the keras package and R in Google Colab, but I have some specific doubts about such a workflow...
1. It is known that we can use R in Google Colab, and the use of Colab's GPU and TPU is indeed interesting.
Although the documentation says we need to run install_keras() in R if we want to use the GPU with keras, on Google Colab it works without this step; no Python installation is required either.
But deep learning processes are time consuming... running all the code in just one notebook has its limitations... Splitting it into several notebooks, saving and sharing the resulting objects to reuse in the next notebook, can be interesting.
The above is all the more desirable because the environment is ephemeral. The solution would be to mount Google Drive so that its data can be used and partial outputs saved to it. But mounting Google Drive appears to be restricted to Python notebooks... yes, there are discussions proposing solutions, such as here and here, but I was not able to implement them...
So I am really curious how R keras users (and other R users) deal with this issue when using Google Colab.
If we stick with the idea of a workflow using more than one notebook, a possibly related question is this one (without an answer).
So I have tried another alternative: using a Python notebook and running R in specific cells inside it with rpy2, as indicated here and in other discussions I mentioned before... Yes, one could ask why not just code in Python... Ignoring that and sticking with R...
But it happens that R's keras is an API for Python keras and needs Python to run... I do not know why, but when I try to run any keras function, even a simple
%%R
imdb<-dataset_imdb()
I get:
R[write to console]: Error: Error 1 occurred creating conda
environment /root/.local/share/r-miniconda/envs/r-reticulate
I also saw a claim that the R kernel does not see Colab's Python, as here, but I know this is not true, because R's keras works in the R kernel, and if I run the same py_config there, I am able to see the Python versions...
But the point is... why, in this Python notebook using rpy2, can we not see the Python installation...?
If we run the notebook with the R kernel, all the packages requiring Python work well without any intervention... that's strange...
I see discussions on how to install conda, like here, but I believe this should not be the way... Maybe it is related to rpy2...
I have tried some alternatives to check for the existence of Python inside the R cells (called with %%R), and I believe the R invoked this way is not able to see Python...
%%R
library(reticulate)
py_config()
It returns the same error:
R[write to console]: Error: Error 1 occurred creating conda
environment /root/.local/share/r-miniconda/envs/r-reticulate
Error: Error 1 occurred creating conda environment
/root/.local/share/r-miniconda/envs/r-reticulate
So, my major question:
How can I effectively use R keras (and other R packages that use Python in the background) inside a Google Colab notebook with a Python kernel?
What am I missing here with rpy2?
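One idea I have not been able to fully verify: if the problem is that reticulate (inside the rpy2-embedded R) finds no Python configuration and therefore tries to build the r-miniconda environment, then pointing it at Colab's own interpreter through the RETICULATE_PYTHON environment variable, before loading the rpy2 extension, might avoid that. Sketch only; the variable is documented by reticulate, but I have not confirmed it fixes the Colab case:

import os, sys
os.environ["RETICULATE_PYTHON"] = sys.executable   # Colab's own Python 3

%load_ext rpy2.ipython   # load the %%R magic only after the variable is set

After that, running py_config() inside an %%R cell should reveal whether reticulate now picks up the notebook's Python instead of attempting to create a conda environment.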

cannot load CSVs or Excel files after some updates

I was trying to schedule a new Python script to run via a BAT file, but was getting an error that the statsmodels package was not present. The package loaded fine in Spyder, but not when running from the BAT file. I followed a thread here that suggested updating my packages from the console (with pip), which I did.
That led to a new error that NumPy was not loading. I noticed that I now had two versions of NumPy (1.19.1 and 1.19.2). Further searches yielded advice to uninstall and reinstall NumPy. I had to uninstall twice to get rid of both versions; reinstalling then left me with 1.19.2.
Now, when I run my code in Spyder, I get a strange error on pd.read_csv:
"Only callable can be used as callback"
I couldn't find anyone getting this error from pd.read_csv. Next, I tried to run pd.read_excel in Spyder, but I get this error message:
"int() argument must be a string, a bytes-like object or a number, not '_NoValueType'"
This is code that worked fine yesterday on files that have not changed, so it is not the files. I even made a couple test files and get the same error. Trying to load statsmodels in Spyder now fails:
"from statsmodels.tsa.ar_model import AutoReg"
"AttributeError: module 'numpy.core' has no attribute 'records'"
Running the same code from the BAT file, reading CSV and Excel files DOES work, but it still hangs on loading statsmodels.
I think at this point I need to reinstall Anaconda, but I don't understand why code that works in Spyder does not work when run from a BAT file, when I am referencing the only copy of Python that I have, in Anaconda.
Thanks,
It seems to be fine today, so perhaps I needed a full reboot for the updates to take effect? I don't remember needing to do this in the past.
I'm still having the original issue with loading the statsmodels package when running from a BAT file, but I will ask about that in a new post.
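For anyone who hits the same mix of errors, a quick sanity-check sketch (nothing machine-specific assumed) showing which Python, NumPy, and pandas each environment actually loads; running it both in Spyder and from the BAT file makes a leftover duplicate NumPy install easy to spot:

import sys
import numpy
import pandas
print(sys.executable)                       # which Python is running (Spyder vs. BAT file)
print(numpy.__version__, numpy.__file__)    # which NumPy copy got imported, and from where
print(pandas.__version__)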

How can I fix an nltk.download [WinError 10054] when trying to run nltk.download('stopwords') on my corporate computer?

I am trying to use nltk, and I need to download data with nltk.download(). I have tried a number of things on my work computer, but I'm not sure if it's our firewall or if there is something else going on. I am doing this in Jupyter Notebook.
I have tried running nltk.download() and updating the directory and server as shown below, but I still get an error. Which server index should I be using?
I've also tried simply running an import statement followed by a download statement. Do we still need to do this to use stopwords?
import nltk
nltk.download('stopwords')
I've tried going through Anaconda Prompt and running the code below and I still get the same error.
python -m nltk.downloader all
Lastly, I've tried going directly to the site (http://nltk.org/nltk_data/) to download the data, but the website never opens and times out.
Can anyone help direct me on how to fix this? I've seen something written about a proxy server. If that is the issue, how do I get around it?
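In case the corporate proxy is indeed the blocker, here is a hedged sketch of two common routes (the proxy address below is a placeholder; the real host and port would have to come from your IT department):

import nltk

# Route 1: tell the downloader about the proxy explicitly
nltk.set_proxy("http://proxy.example.com:8080")   # placeholder address, not a real proxy
nltk.download("stopwords")

# Route 2: if outbound downloads are blocked entirely, copy the nltk_data
# folder onto the machine by hand and point nltk at it
nltk.data.path.append(r"C:\nltk_data")   # hypothetical local folder containing corpora/stopwords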
