I want to do a simple shap analysis and plot a shap.force_plot. I noticed that it works without any issues locally in a .ipynb file, but fails on Databricks with the following error message:
Visualization omitted, Javascript library not loaded!
Have you run `initjs()` in this notebook? If this notebook was from another user you must
also trust this notebook (File -> Trust notebook). If you are viewing this notebook on
github the Javascript has been stripped for security. If you are using JupyterLab this
error is because a JupyterLab extension has not yet been written.
Code:
import xgboost
import shap
shap.initjs()
X, y = shap.datasets.boston()
bst = xgboost.train({"learning_rate": 0.01}, xgboost.DMatrix(X, label=y), 100)
explainer = shap.TreeExplainer(bst)
shap_values = explainer.shap_values(X)
shap.force_plot(explainer.expected_value, shap_values[0,:], X.iloc[0,:])
Is there any way to get the plot to render on Databricks?
Try a slightly different approach (matplotlib=True):
import xgboost
import shap
X, y = shap.datasets.boston()
bst = xgboost.train({"learning_rate": 0.01}, xgboost.DMatrix(X, label=y), 100)
explainer = shap.TreeExplainer(bst)
shap_values = explainer.shap_values(X)
shap.force_plot(
    explainer.expected_value,
    shap_values[0,:],
    X.iloc[0,:],
    matplotlib=True  # <--
)
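If you want the interactive JavaScript version instead of the matplotlib rendering, a workaround often used on Databricks is to save the force plot to HTML and pass it to Databricks' displayHTML function. This is only a sketch, assuming shap.save_html is available in your shap version and displayHTML in your notebook; the file name is just a placeholder:
import shap

# Build the interactive force plot object (JS-based, not matplotlib)
plot = shap.force_plot(explainer.expected_value, shap_values[0, :], X.iloc[0, :])

# Write the plot plus the required JavaScript to an HTML file,
# then render that HTML with Databricks' built-in displayHTML
shap.save_html("force_plot.html", plot)
with open("force_plot.html", "r", encoding="utf-8") as f:
    displayHTML(f.read())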
Related
The classifier script I wrote was working fine, and I recently added weight balancing to the fitting. Since I added the weight estimation function from the 'sklearn' library, I get the following error:
compute_class_weight() takes 1 positional argument but 3 were given
This error does not make sense per the documentation: the function should take three inputs, so I'm not sure why it says it expects only one. The full error and code are shown below. Apparently this fails only in VS Code; I tested it in a Jupyter notebook and it works fine, so it seems like an issue with the VS Code interpreter. Has anyone else noticed this? (I am using Python 3.8 with the latest versions of the other libraries.)
from sklearn.utils import compute_class_weight
train_classes = train_generator.classes
class_weights = compute_class_weight(
"balanced",
np.unique(train_classes),
train_classes
)
class_weights = dict(zip(np.unique(train_classes), class_weights))
class_weights
In Jupyter Notebook, after spending a lot of time, this is how I fixed it. I still don't know why, but when the code is modified as follows, it works fine. I got the idea after seeing this solution for a similar but slightly different issue.
class_weights = compute_class_weight(
class_weight = "balanced",
classes = np.unique(train_classes),
y = train_classes
)
class_weights = dict(zip(np.unique(train_classes), class_weights))
class_weights
I solved this problem by rewriting the call as follows:
from sklearn.utils.class_weight import compute_class_weight
class_weights = compute_class_weight(class_weight="balanced", classes=np.unique(train_labels), y=train_labels)
You need to use an older version of scikit-learn than the one you have; for me it works fine with scikit-learn version 0.24.2.
Just follow this:
Why doesn't class_weight.compute_weight() work?
You just need to pass the values with the keyword arguments class_weight, classes, and y.
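If you want to confirm this on your own install, a quick check like the sketch below prints the current signature; in recent scikit-learn releases classes and y are keyword-only, which is exactly why the positional call raises the error above:
import inspect
from sklearn.utils.class_weight import compute_class_weight

# Recent scikit-learn defines compute_class_weight(class_weight, *, classes, y),
# so classes and y must be passed as keyword arguments
print(inspect.signature(compute_class_weight))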
I was given an assignment to run some code and show results using Apache Spark with Python. I installed the Apache Spark server following these steps: https://phoenixnap.com/kb/install-spark-on-windows-10. I tried my code and everything was fine. Now I have another assignment that needs MLlib linear regression; they provide us with some code that should run, and then we will add additional code to it. When I try to run the code I get some errors and warnings; some of them also appeared in the previous assignment, but that one still worked. I believe there are additional things related to the MLlib library that should be added so the code runs correctly. Does anybody have an idea what files should be added to Spark so it can run the MLlib code?
I am using Windows 10, and spark-3.0.1-bin-hadoop2.7
This is my code:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import StandardScaler
conf = SparkConf().setMaster("local").setAppName("LinearRegression")
sc = SparkContext(conf = conf)
sqlContext = SQLContext(sc)
# Load training data
df = sqlContext.read.format("libsvm").option("numFeatures", 13).load("boston_housing.txt")
# Data needs to be scaled for better results and interpretation
# Initialize the `standardScaler`
standardScaler = StandardScaler(inputCol="features", outputCol="features_scaled")
# Fit the DataFrame to the scaler
scaler = standardScaler.fit(df)
# Transform the data in `df` with the scaler
scaled_df = scaler.transform(df)
# Initialize the linear regression model
lr = LinearRegression(labelCol="label", maxIter=10, regParam=0.3, elasticNetParam=0.8)
# Fit the data to the model
linearModel = lr.fit(scaled_df)
# Print the coefficients for the model
print("Coefficients: %s" % str(linearModel.coefficients))
print("Intercept: %s" % str(linearModel.intercept))
Here is a screenshot of what I get when I run the code:
Try pip install numpy (or pip3 install numpy if that fails). The traceback says the numpy module is not found.
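As a quick sanity check (just a sketch, nothing Spark-specific assumed), you can print which interpreter the PySpark driver is using and whether it can import numpy; on Windows it is easy to install numpy into a different Python than the one Spark launches:
import sys

# The interpreter the driver script runs under; install numpy for this one
print(sys.executable)

# If this import fails, run `pip install numpy` with that same interpreter
import numpy as np
print(np.__version__)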
Below is the code that produces the error:
from imblearn.under_sampling import NearMiss
nm = NearMiss()
X_res, y_res = nm.fit_sample(X, Y)
You are probably trying to under-sample your imbalanced dataset. For this purpose, you can use RandomUnderSampler instead of NearMiss.
Try the following code:
from imblearn.under_sampling import RandomUnderSampler
under_sampler = RandomUnderSampler()
X_res, y_res = under_sampler.fit_resample(X, y)
Now, your dataset is balanced. You can verify it using y_res.value_counts().
Cheers!
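If you specifically want NearMiss rather than random under-sampling, note that newer imbalanced-learn releases renamed fit_sample to fit_resample, so a sketch like this (assuming a reasonably recent imbalanced-learn) should also work:
from imblearn.under_sampling import NearMiss

nm = NearMiss()
# fit_sample was renamed to fit_resample in newer imbalanced-learn releases
X_res, y_res = nm.fit_resample(X, Y)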
Instead of "imblearn" package my conda installed a package named "imbalanced-learn" that's why it does not take the data. But it is strange that the jupyter notebook doesn't tell me that "imblearn" isn't installed.
I'm following this guide to get started with the gmaps API and Python:
import gmaps
import gmaps.datasets
import pandas as pd
def func():
    # Use google maps api
    gmaps.configure(api_key='MY_API_KEY') # Fill in with your API key
    # Get the dataset
    earthquake_df = gmaps.datasets.load_dataset_as_df('earthquakes')
    # Get the locations from the data set
    locations = earthquake_df[['latitude', 'longitude']]
    # Get the magnitude from the data
    weights = earthquake_df['magnitude']
    # Set up your map
    fig = gmaps.figure()
    fig.add_layer(gmaps.heatmap_layer(locations, weights=weights))
    return fig
func()
I've been using this code from the guide (both in PyCharm and Jupyter notebooks), but when I run it (PyCharm/Jupyter/terminal) I don't get the output map like in the guide,
just a nice old-fashioned
Process finished with exit code 0
Check whether the extension is enabled for Jupyter first by running jupyter nbextension list.
If you do not see jupyter-gmaps/extension enabled, run jupyter nbextension enable --py gmaps in a terminal and then start a new Jupyter notebook.
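Also note that gmaps figures are ipywidgets, so they only render inside a Jupyter notebook; running the script from PyCharm or a terminal will just finish with exit code 0. A minimal notebook cell to confirm the extension renders (the API key below is a placeholder) might look like this:
import gmaps

gmaps.configure(api_key="YOUR_API_KEY")  # placeholder; use your own key

fig = gmaps.figure()
fig  # the figure must be the last expression in the cell to be displayed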
So I have recently started trying to use IPython, and I am finding I cannot get it to produce an output graph. I am running the following code in IPython:
from sklearn import linear_model
import pylab as pl  # plotting interface referenced as pl below

regr = linear_model.LinearRegression()
regr.fit(x, y)
pl.plot(x, y, 'o')
pl.plot(x_test, regr.predict(x_test))
and I am receiving the output:
[<matplotlib.lines.Line2D at 0x21d453b0>]
with no image attached.
I installed IPython using the pythonxy package.
Any thoughts or suggestions on how to get plots displaying correctly in IPython?
Try running in a cell:
%pylab inline # or
%matplotlib inline
After that, the plots should be displayed inline. Alternatively, start the notebook with the inline option from the command line:
ipython notebook --pylab=inline
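If you are running in a plain IPython shell or a script rather than the notebook, the inline magics do not apply and you need an explicit show() call. A minimal sketch with made-up x and y data:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

# Made-up 1-D data for illustration only
x = np.linspace(0, 1, 20).reshape(-1, 1)
y = 2 * x.ravel() + np.random.normal(scale=0.1, size=20)

regr = linear_model.LinearRegression()
regr.fit(x, y)

plt.plot(x, y, 'o')
plt.plot(x, regr.predict(x))
plt.show()  # opens the plot window when not using an inline backend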
from IPython.display import display
from IPython.display import Image
# your code here
Image(data=<your_image_data_here>, embed=True)
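Note that IPython.display.Image renders static image data (PNG bytes or a file), not a live matplotlib figure. One way to combine the two is to save the figure to a file first; this is only a sketch and the filename is made up:
import matplotlib.pyplot as plt
from IPython.display import Image, display

plt.plot([0, 1, 2], [0, 1, 4], 'o-')
plt.savefig("regression_plot.png")  # hypothetical filename

# Display the saved PNG inline
display(Image(filename="regression_plot.png", embed=True))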