pandas-profiling in databricks - python

I am trying to run a basic dataframe profile on my dataset. I am using a Databricks Python notebook.
pip install --upgrade pip
pip install --upgrade setuptools
pip install pandas-profiling
import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport
df = sql("select * from table").cache()
prof = ProfileReport(df)
prof.to_file(output_file='output.html')
Output:
Successfully installed pip-20.1.1
Successfully installed setuptools-47.1.1
Successfully installed MarkupSafe-1.1.1 Pillow-7.1.2 PyWavelets-1.1.1 Send2Trash-1.5.0 astropy-4.0.1.post1 attrs-19.3.0 bleach-3.1.5 confuse-1.1.0 defusedxml-0.6.0 entrypoints-0.3 htmlmin-0.1.12 imagehash-4.1.0 importlib-metadata-1.6.1 ipywidgets-7.5.1 jinja2-2.11.2 joblib-0.15.1 jsonschema-3.2.0 llvmlite-0.32.1 matplotlib-3.2.1 missingno-0.4.2 mistune-0.8.4 nbconvert-5.6.1 nbformat-5.0.6 networkx-2.4 notebook-6.0.3 numba-0.49.1 packaging-20.4 pandas-1.0.4 pandas-profiling-2.8.0 pandocfilters-1.4.2 phik-0.10.0 prometheus-client-0.8.0 pyrsistent-0.16.0 pyyaml-5.3.1 requests-2.23.0 scipy-1.4.1 tangled-up-in-unicode-0.0.6 terminado-0.8.3 testpath-0.4.4 tqdm-4.46.1 visions-0.4.4 webencodings-0.5.1 widgetsnbextension-3.5.1 zipp-3.1.0
I am getting the following error:
ImportError: cannot import name 'PY2' from 'scipy._lib.six' (/databricks/python/lib/python3.7/site-packages/scipy/_lib/six.py)
How can I resolve this error?

The issue is with the scipy package.
This worked for me:
%sh
/databricks/python/bin/pip install --upgrade pip
/databricks/python/bin/pip install scipy
/databricks/python/bin/pip install pandas_profiling
# Then, in a separate Python cell (a %sh cell runs only shell commands):
dbutils.library.restartPython()
import pandas_profiling
OR
!pip install --upgrade pip
!pip install --upgrade setuptools
!pip install scipy
!pip install pandas-profiling
dbutils.library.restartPython()
import pandas_profiling
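After the restart, it may help to confirm which versions the notebook's interpreter actually sees. This is a minimal sketch, assuming Python 3.8 or newer, where importlib.metadata is in the standard library. The Python 3.7 runtime in the traceback would need the importlib_metadata backport instead, which appears in the install log above. installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report the packages involved in the ImportError above.
for name in ("scipy", "pandas", "pandas-profiling"):
    print(name, installed_version(name))
```

If scipy still shows the old version after the restart, the install probably went to a different Python environment than the one the notebook uses.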

Related

I can't get Seaborn to import

I'm trying to add Seaborn to my Anaconda3 Python installation and keep getting failures. It installed successfully using conda but won't import. After multiple installs, deletions, and reinstalls of scipy, numpy, and seaborn, I get the following error message. I'm running Python 3.9.12 and numpy 1.22.0 (downgraded from 1.23 because a previous error message stated it had to be < 1.23 for Seaborn to install) on Windows 10. Any suggestions would be appreciated.
I can't post an image yet, so here's a summary of the error message with the failure points
code:
import Seaborn as sns
ImportError
...
import seaborn as sns
from .rcmod import * # noqa:F401, F403
from . import palettes
from . import _arpack
ImportError: DLL load failed while importing _arpack: The specified procedure could not be found.
This is not a seaborn issue. First, uninstall numpy with this command:
pip uninstall -y numpy
Then uninstall setuptools with this command:
pip uninstall -y setuptools
Now reinstall setuptools with this command:
pip install setuptools
Then reinstall numpy with this command:
pip install numpy
The error should now be resolved.

ImportError: No module named speech_recognition

I am getting ImportError: No module named speech_recognition.
Code snippet:
# for speech-to-text
import speech_recognition as sr
# for text-to-speech
from gtts import gTTS
# for language model
import transformers
import os
import time
# for data
import datetime
import numpy as np
I have already tried pip3 install SpeechRecognition and pip install SpeechRecognition
pip3 install SpeechRecognition
Requirement already satisfied: SpeechRecognition in /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages (3.8.1)
WARNING: You are using pip version 21.3.1; however, version 22.1.2 is available.
You should consider upgrading via the '/usr/local/bin/python3 -m pip install --upgrade pip' command.
I am using Python 3.8
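The pip3 output above shows the package installed under the Python 3.9 site-packages directory, while the script is said to run on Python 3.8, so the two may be different interpreters. This is a minimal sketch for checking that from inside the failing script. describe_environment is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_environment(module_name):
    """Report which interpreter is running and whether it can find the module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "version": "{}.{}".format(*sys.version_info[:2]),
        "module_found": spec is not None,
    }


info = describe_environment("speech_recognition")
print(info)
if not info["module_found"]:
    # Installing through the running interpreter avoids the pip/python mismatch.
    print('Try: "{}" -m pip install SpeechRecognition'.format(sys.executable))
```

Running pip as a module of the same interpreter that runs the script puts the package where that interpreter will look for it.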

Installing waymo opendataset on windows

I am trying to install the Waymo Open Dataset package in a Windows environment. However, when I try, I get the following error:
(python3.6) C:\Users\19729>pip install waymo-open-dataset-tf-2-3-0
ERROR: Could not find a version that satisfies the requirement waymo-open-dataset-tf-2-3-0 (from versions: none)
ERROR: No matching distribution found for waymo-open-dataset-tf-2-3-0
How can I resolve this?
This command did not produce any error when I tried it in Colab with TensorFlow 2.7.
However, you can try again using the commands below:
pip3 install --upgrade pip
pip3 install waymo-open-dataset-tf-2-6-0 --user # for tf 2.6.0. OR
pip3 install waymo-open-dataset-tf-2-5-0 --user # for tf 2.5.0. OR
pip3 install waymo-open-dataset-tf-2-4-0 --user # for tf 2.4.0.
You can also load this built-in tensorflow dataset from tensorflow_datasets using:
!pip install tensorflow_datasets
import tensorflow_datasets as tfds
from tensorflow_datasets.object_detection import WaymoOpenDataset
ds = tfds.load('waymo_open_dataset', split='train', shuffle_files=True)
Note: This may take some time, as it is a 336.62 GiB dataset.
Some Waymo Open Dataset files are also available on Kaggle.
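The "(from versions: none)" message often means pip found no release built for the current interpreter and platform. This is a quick sketch that prints what pip has to match against. Whether the waymo-open-dataset packages ship builds for a given operating system or Python version is something to check on the package's PyPI page; this snippet cannot confirm it. environment_summary is a hypothetical helper name.

```python
import platform
import sys


def environment_summary():
    """Collect the platform details that pip matches package builds against."""
    return {
        "system": platform.system(),    # e.g. "Windows" or "Linux"
        "machine": platform.machine(),  # CPU architecture
        "python": "{}.{}".format(*sys.version_info[:2]),
    }


print(environment_summary())
```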

problem in upgrading scikit-image version

I am using Spyder, and scikit-image version '0.13.1' is installed on my system. I want to use
from skimage.filters import unsharp_mask
but I get the error:
ImportError: cannot import name 'unsharp_mask'
I then tried to upgrade. First I ran:
!pip install scikit-image
Then I ran:
!pip install --upgrade scikit-image
The version is still '0.13.1'.
What should I do?
The problem was resolved with the help of the official Anaconda video tutorial:
https://anaconda.cloud/tutorials/getting-started-with-anaconda-individual-edition?source=install
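One possible cause of this symptom is that !pip upgrades a different Python environment than the one the Spyder console runs. This is a minimal sketch for checking that from the console. It assumes Python 3.8 or newer for importlib.metadata, and it assumes (to my knowledge, not stated in the thread) that unsharp_mask first appeared in scikit-image 0.15. version_tuple and MIN_VERSION are hypothetical names.

```python
import re
import sys
from importlib import metadata

MIN_VERSION = (0, 15)  # assumption: first release that includes unsharp_mask


def version_tuple(text):
    """Turn '0.13.1' or '0.19.0rc1' into a tuple of leading integers."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


print("Interpreter:", sys.executable)
try:
    found = metadata.version("scikit-image")
except metadata.PackageNotFoundError:
    found = None
print("scikit-image:", found)
if found is None or version_tuple(found) < MIN_VERSION:
    print('Try: "{}" -m pip install --upgrade scikit-image'.format(sys.executable))
```

In an Anaconda environment, upgrading through conda instead of pip may be the safer route.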

How do I import 'umap' package in python?

After installing the 'umap' package, I can't import it.
I tried reinstalling previous versions (1.3.10, 1.4.0rc1),
but it still doesn't work.
What can I do?
!pip3 install umap-learn
!pip3 install umap-learn[plot]
import umap
This is the error I get:
No module named 'umap'
You have probably confused the umap and umap-learn libraries.
If that's the case, the code below will solve your problem:
https://umap-learn.readthedocs.io/en/latest/basic_usage.html
pip uninstall umap
pip install umap-learn
import umap.umap_ as umap
reducer = umap.UMAP()
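To tell whether the umap module that Python finds is the one from umap-learn, one rough check is whether it exposes a UMAP class. This is a sketch only; provides_umap_class is a hypothetical helper name.

```python
import importlib


def provides_umap_class(module_name="umap"):
    """Return True if the module imports and exposes a UMAP attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, "UMAP")


if __name__ == "__main__":
    print("umap exposes UMAP:", provides_umap_class())
```

If this prints False while pip reports umap-learn as installed, an unrelated umap package, or a different Python environment, is likely in the way.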
