X = df.copy()
# Save and drop labels
y = df['class']
X = X.drop('class', axis=1)
cat_features = list(range(0, X.shape[1]))
model = CatBoostClassifier(iterations=2000, learning_rate=0.1, random_seed=12)
model.fit(X, y, verbose=False, plot=False)
explainer = shap.Explainer(model)
shap_values = explainer(X)
shap.force_plot(explainer.expected_value, shap_values[0:5,:],X.iloc[0:5,:], plot_cmap="DrDb")
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-170-ba1eca12b9ed> in <module>
----> 1 shap.force_plot(10, shap_values[0:5,:],X.iloc[0:5,:], plot_cmap="DrDb")
~\anaconda3\lib\site-packages\shap\plots\_force.py in force(base_value, shap_values, features, feature_names, out_names, link, plot_cmap, matplotlib, show, figsize, ordering_keys, ordering_keys_time_format, text_rotation, contribution_threshold)
101
102 if type(shap_values) != np.ndarray:
--> 103 return visualize(shap_values)
104
105 # convert from a DataFrame or other types
~\anaconda3\lib\site-packages\shap\plots\_force.py in visualize(e, plot_cmap, matplotlib, figsize, show, ordering_keys, ordering_keys_time_format, text_rotation, min_perc)
343 return AdditiveForceArrayVisualizer(e, plot_cmap=plot_cmap, ordering_keys=ordering_keys, ordering_keys_time_format=ordering_keys_time_format)
344 else:
--> 345 assert False, "visualize() can only display Explanation objects (or arrays of them)!"
346
347 class BaseVisualizer:
AssertionError: visualize() can only display Explanation objects (or arrays of them)!
I was trying to draw a force plot of my data with shap, but I got this error and I don't understand why. I haven't found anything about it. How can I avoid this error?
explainer.expected_value
-5.842052267820879
You should change the last line to this:
shap.force_plot(explainer.expected_value, shap_values.values[0:5,:], X.iloc[0:5,:], plot_cmap="DrDb")
That is, pass shap_values.values instead of just shap_values. The shap_values object holds the Shapley values (.values), the base values (.base_values) and the data (.data) together, whereas this call needs only the array of Shapley values. I had the same problem until I inspected the variable.
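To make the layout of that object concrete, here is a minimal sketch. FakeExplanation and unpack_for_force_plot are hypothetical names made up for illustration, not part of shap; the stand-in only mirrors the three attributes mentioned above (values, base_values, data) so the example runs without shap or a trained model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeExplanation:
    """Hypothetical stand-in that mirrors three attributes of a shap Explanation."""
    values: List[List[float]]   # one row of Shapley values per sample
    base_values: List[float]    # model baseline, one entry per sample
    data: List[List[float]]     # the feature values that were explained


def unpack_for_force_plot(explanation, start, stop):
    """Return (base_value, shap_rows, feature_rows) for samples start..stop-1."""
    base_value = explanation.base_values[start]
    return base_value, explanation.values[start:stop], explanation.data[start:stop]


expl = FakeExplanation(
    values=[[0.2, -0.1], [0.05, 0.3], [-0.4, 0.1]],
    base_values=[-5.84, -5.84, -5.84],
    data=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
)
base, shap_rows, feature_rows = unpack_for_force_plot(expl, 0, 2)
print(base)          # baseline shared by the selected rows
print(shap_rows)     # plain rows of Shapley values, no wrapper object
print(feature_rows)  # matching feature rows
```

With the real object, the same three pieces are what the corrected call passes: the expected value, shap_values.values[0:5,:] and X.iloc[0:5,:].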
I am learning data visualization in Python using Cartopy.
I have this code for plotting Africa's population and GDP:
def choropleth(ax, attr, cmap_name):
    # We need to normalize the values before we can
    # use the colormap.
    values = [c.attributes[attr] for c in africa]
    norm = Normalize(
        vmin=min(values), vmax=max(values))
    cmap = plt.cm.get_cmap(cmap_name)
    for c in africa:
        v = c.attributes[attr]
        sp = ShapelyFeature(c.geometry, crs,
                            edgecolor='k',
                            facecolor=cmap(norm(v)))
        ax.add_feature(sp)

fig, (ax1, ax2) = plt.subplots(
    1, 2, figsize=(10, 16),
    subplot_kw=dict(projection=crs))
draw_africa(ax1)
choropleth(ax1, 'POP_EST', 'Reds')
ax1.set_title('Population')
draw_africa(ax2)
choropleth(ax2, 'GDP_MD_EST', 'Blues')
ax2.set_title('GDP')
The expected output is a pair of maps, one titled 'Population' and one titled 'GDP'.
But I am getting this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-41-b443c58ecbd5> in <module>
3 subplot_kw=dict(projection=crs))
4 draw_africa(ax1)
----> 5 choropleth(ax1, 'POP_EST', 'Reds')
6 ax1.set_title('Population')
7
<ipython-input-40-161126226479> in choropleth(ax, attr, cmap_name)
8 for c in africa:
9 v = c.attributes[attr]
---> 10 sp = ShapelyFeature(c.geometry, crs,
11 edgecolor='k',
12 facecolor=cmap(norm(v)))
~/anaconda3/lib/python3.8/site-packages/cartopy/feature/__init__.py in __init__(self, geometries, crs, **kwargs)
219 """
220 super(ShapelyFeature, self).__init__(crs, **kwargs)
--> 221 self._geoms = tuple(geometries)
222
223 def geometries(self):
TypeError: 'Polygon' object is not iterable
I tried searching for this issue on GitHub, but to no avail. Can anyone please help me correct this?
Here is the site for reference.
The issue is that ShapelyFeature expects an iterable of geometries (the traceback shows it calling tuple(geometries)), but for some countries c.geometry is a single shapely Polygon, which is not iterable. The elegant solution by swatchai here https://stackoverflow.com/a/63812490/13208790 is to catch Polygons and put them in a list so they can be treated like MultiPolygons.
Here's the code adapted to your case (it replaces the for loop inside choropleth):
for c in africa:
    v = c.attributes[attr]
    # swatchai's Polygon catch logic
    if c.geometry.geom_type == 'MultiPolygon':
        # this is a list of geometries
        sp = ShapelyFeature(c.geometry, crs,
                            edgecolor='k',
                            facecolor=cmap(norm(v)))
    elif c.geometry.geom_type == 'Polygon':
        # this is a single geometry, so wrap it in a list
        sp = ShapelyFeature([c.geometry], crs,
                            edgecolor='k',
                            facecolor=cmap(norm(v)))
    else:
        continue  # do not plot the geometry
    ax.add_feature(sp)
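A variant of the same idea is to normalize every geometry to a list before handing it to ShapelyFeature. The sketch below is duck-typed and uses hypothetical stand-in classes (FakePolygon, FakeMultiPolygon) so that it runs without Shapely installed. It assumes that multi-part geometries expose their parts through a .geoms attribute, as Shapely's MultiPolygon does, and that single Polygons do not. The helper name as_geometry_list is made up for illustration.

```python
def as_geometry_list(geometry):
    """Return a list of single-part geometries for a Shapely-like geometry."""
    parts = getattr(geometry, "geoms", None)
    if parts is not None:
        # Multi-part geometry: unpack its parts explicitly.
        return list(parts)
    # Single geometry: wrap it so callers can always iterate.
    return [geometry]


class FakePolygon:
    """Hypothetical stand-in for a single polygon (no .geoms attribute)."""
    geom_type = "Polygon"


class FakeMultiPolygon:
    """Hypothetical stand-in for a multi-part geometry exposing .geoms."""
    geom_type = "MultiPolygon"

    def __init__(self, parts):
        self.geoms = list(parts)


single = FakePolygon()
multi = FakeMultiPolygon([FakePolygon(), FakePolygon()])
print(len(as_geometry_list(single)))  # one geometry, wrapped in a list
print(len(as_geometry_list(multi)))   # the two parts of the multi-part geometry
```

In the loop above, both branches would then collapse into a single call along the lines of ShapelyFeature(as_geometry_list(c.geometry), crs, edgecolor='k', facecolor=cmap(norm(v))).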
I am getting an error while plotting the dendrogram for the spearmanr correlation.
Below is the code I am using
corr = np.round(scipy.stats.spearmanr(full_data[list_of_continous]).correlation, 4)
corr_condensed = hc.distance.squareform(1-corr)
z = hc.linkage(corr_condensed, method='average')
fig = plt.figure(figsize=(20,20))
dendrogram = hc.dendrogram(z, labels=full_data[list_of_continous].columns, orientation='left', leaf_font_size=30)
plt.show()
Below is the error I am getting:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-10-9873c0be8dc7> in <module>()
1 corr = np.round(scipy.stats.spearmanr(full_data[list_of_continous]).correlation, 4)
----> 2 corr_condensed = hc.distance.squareform(1-corr)
3 z = hc.linkage(corr_condensed, method='average')
4 fig = plt.figure(figsize=(20,20))
5 dendrogram = hc.dendrogram(z, labels=full_data[list_of_continous].columns, orientation='left', leaf_font_size=30)
/usr/local/anaconda/lib/python3.6/site-packages/scipy/spatial/distance.py in squareform(X, force, checks)
1844 raise ValueError('The matrix argument must be square.')
1845 if checks:
-> 1846 is_valid_dm(X, throw=True, name='X')
1847
1848 # One-side of the dimensions is set here.
/usr/local/anaconda/lib/python3.6/site-packages/scipy/spatial/distance.py in is_valid_dm(D, tol, throw, name, warning)
1920 if name:
1921 raise ValueError(('Distance matrix \'%s\' must be '
-> 1922 'symmetric.') % name)
1923 else:
1924 raise ValueError('Distance matrix must be symmetric.')
ValueError: Distance matrix 'X' must be symmetric.
The variable corr might contain NaN values. Because NaN never compares equal to itself, a single NaN is enough to make the symmetry check fail.
Try:
corr = np.nan_to_num(corr)
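Update: skipping the line
corr_condensed = hc.distance.squareform(1-corr)
works without any error for me. Note, however, that when linkage receives a 2-D array it treats each row as an observation vector rather than as precomputed distances, so the clustering is then no longer based on 1-corr.
So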
corr = np.round(scipy.stats.spearmanr(full_data[list_of_continous]).correlation, 4)
z = hc.linkage(corr, method='average')
fig = plt.figure(figsize=(20,20))
dendrogram = hc.dendrogram(z, labels=full_data[list_of_continous].columns, orientation='left', leaf_font_size=30)
plt.show()
should work for you too.
If you are sure the matrix is symmetric, you can turn off the validation with checks=False:
corr_condensed = hc.distance.squareform(1-corr, checks=False)
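The NaN explanation can be checked without SciPy. The sketch below uses plain Python lists and assumes that the validation boils down to comparing the matrix with its transpose element by element. The helper names is_symmetric and correlation_to_distance are hypothetical, and replacing an undefined correlation with 0 is an assumption that mirrors what np.nan_to_num does.

```python
import math


def is_symmetric(matrix):
    """Compare every entry with its mirror entry across the diagonal."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def correlation_to_distance(corr):
    """Build a symmetric 1 - r distance matrix with a zero diagonal."""
    n = len(corr)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = corr[i][j]
            if math.isnan(r):
                r = 0.0  # assumption: treat an undefined correlation as 0
            dist[i][j] = dist[j][i] = 1.0 - r
    return dist


nan = math.nan
corr = [
    [1.0, 0.8, nan],
    [0.8, 1.0, 0.3],
    [nan, 0.3, 1.0],
]
raw = [[1.0 - r for r in row] for row in corr]
print(is_symmetric(raw))   # False: NaN != NaN breaks the comparison
dist = correlation_to_distance(corr)
print(is_symmetric(dist))  # True: NaN replaced, both halves written together
```

A matrix cleaned this way is the kind of input squareform accepts. Whether 0 is a sensible replacement for an undefined correlation depends on the data, for example on whether the NaN comes from a constant column.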
I am trying to understand how torchvision interacts with matplotlib to produce a grid of images. It's easy to generate images and display them iteratively:
import torch
import torchvision
import matplotlib.pyplot as plt
w = torch.randn(10,3,640,640)
for i in range(10):
    z = w[i]
    plt.imshow(z.permute(1,2,0))
    plt.show()
However, displaying these images in a grid does not seem to be as straightforward.
w = torch.randn(10,3,640,640)
grid = torchvision.utils.make_grid(w, nrow=5)
plt.imshow(grid)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-61-1601915e10f3> in <module>()
1 w = torch.randn(10,3,640,640)
2 grid = torchvision.utils.make_grid(w, nrow=5)
----> 3 plt.imshow(grid)
/anaconda3/lib/python3.6/site-packages/matplotlib/pyplot.py in imshow(X, cmap, norm, aspect, interpolation, alpha, vmin, vmax, origin, extent, shape, filternorm, filterrad, imlim, resample, url, hold, data, **kwargs)
3203 filternorm=filternorm, filterrad=filterrad,
3204 imlim=imlim, resample=resample, url=url, data=data,
-> 3205 **kwargs)
3206 finally:
3207 ax._hold = washold
/anaconda3/lib/python3.6/site-packages/matplotlib/__init__.py in inner(ax, *args, **kwargs)
1853 "the Matplotlib list!)" % (label_namer, func.__name__),
1854 RuntimeWarning, stacklevel=2)
-> 1855 return func(ax, *args, **kwargs)
1856
1857 inner.__doc__ = _add_data_doc(inner.__doc__,
/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_axes.py in imshow(self, X, cmap, norm, aspect, interpolation, alpha, vmin, vmax, origin, extent, shape, filternorm, filterrad, imlim, resample, url, **kwargs)
5485 resample=resample, **kwargs)
5486
-> 5487 im.set_data(X)
5488 im.set_alpha(alpha)
5489 if im.get_clip_path() is None:
/anaconda3/lib/python3.6/site-packages/matplotlib/image.py in set_data(self, A)
651 if not (self._A.ndim == 2
652 or self._A.ndim == 3 and self._A.shape[-1] in [3, 4]):
--> 653 raise TypeError("Invalid dimensions for image data")
654
655 if self._A.ndim == 3:
TypeError: Invalid dimensions for image data
Even though PyTorch's documentation indicates that w is the correct shape, matplotlib rejects the resulting grid. So I tried to permute the indices of my tensor:
w = torch.randn(10,3,640,640)
grid = torchvision.utils.make_grid(w.permute(0,2,3,1), nrow=5)
plt.imshow(grid)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-62-6f2dc6313e29> in <module>()
1 w = torch.randn(10,3,640,640)
----> 2 grid = torchvision.utils.make_grid(w.permute(0,2,3,1), nrow=5)
3 plt.imshow(grid)
/anaconda3/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/utils.py in make_grid(tensor, nrow, padding, normalize, range, scale_each, pad_value)
83 grid.narrow(1, y * height + padding, height - padding)\
84 .narrow(2, x * width + padding, width - padding)\
---> 85 .copy_(tensor[k])
86 k = k + 1
87 return grid
RuntimeError: The expanded size of the tensor (3) must match the existing size (640) at non-singleton dimension 0
What's happening here? How can I place a bunch of randomly generated images into a grid and display them?
There's a small mistake in your code. torchvision.utils.make_grid() returns a tensor containing the grid of images in (C, H, W) layout, but matplotlib expects the channel dimension last, (H, W, C), so the channel dimension has to be moved to the end before plotting. The code below works fine:
In [107]: import torchvision
# sample input (10 RGB images containing just Gaussian Noise)
In [108]: batch_tensor = torch.randn(*(10, 3, 256, 256)) # (N, C, H, W)
# make grid (2 rows and 5 columns) to display our 10 images
In [109]: grid_img = torchvision.utils.make_grid(batch_tensor, nrow=5)
# check shape
In [110]: grid_img.shape
Out[110]: torch.Size([3, 518, 1292])
# permute and plot (because matplotlib needs channel as the last dimension)
In [111]: plt.imshow(grid_img.permute(1, 2, 0))
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Out[111]: <matplotlib.image.AxesImage at 0x7f62081ef080>
which displays the ten noise images as a grid of 2 rows and 5 columns.
Alternatively, convert the grid to a NumPy array first and transpose it there:
import numpy as np
def show(img):
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1,2,0)), interpolation='nearest')
w = torch.randn(10,3,640,640)
grid = torchvision.utils.make_grid(w, nrow=10, padding=100)
show(grid)
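The shape torch.Size([3, 518, 1292]) reported in the first answer can be reproduced by hand, which also shows why the permute is needed. The sketch below is plain Python and does not call torchvision. The formula is an assumption inferred from that output together with make_grid's default padding of 2 and default nrow of 8, and the helper names grid_shape and channels_last are hypothetical.

```python
import math


def grid_shape(n_images, channels, height, width, nrow=8, padding=2):
    """Expected (C, H, W) shape of the grid tensor for a batch of equal-size images."""
    cols = min(nrow, n_images)          # nrow is the number of images per row
    rows = math.ceil(n_images / cols)
    grid_h = rows * (height + padding) + padding
    grid_w = cols * (width + padding) + padding
    return (channels, grid_h, grid_w)


def channels_last(shape):
    """Shape after permute(1, 2, 0): (C, H, W) becomes (H, W, C)."""
    c, h, w = shape
    return (h, w, c)


shape = grid_shape(10, 3, 256, 256, nrow=5)
print(shape)                 # (3, 518, 1292), matching the answer's Out[110]
print(channels_last(shape))  # (518, 1292, 3), a layout imshow accepts
```

The first entry of the grid shape is 3, not the last, so matplotlib's check that the last dimension is 3 or 4 fails until the axes are reordered.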
I am trying to use matplotlib and seaborn to create a scatter plot. It works fine if the entire plot is only one color like below:
sns.regplot(x = pair[0], y = pair[1], data = d, fit_reg = False, ax = ax, x_jitter = True, scatter_kws = {'linewidths':0, 's':2, 'color':'r'})
However, I need the color of each data point to depend on the value in col, like this:
col = pandas_df.prediction.map({0: [1,0,0], 1:[0,1,0]})
sns.regplot(x = pair[0], y = pair[1], data = d, fit_reg = False, ax = ax, x_jitter = True, scatter_kws = {'linewidths':0, 's':2, 'cmap':"RGB", 'color':col})
where pandas_df is a pandas DataFrame, so col is a Series of RGB triples like:
[1,0,0]
[0,1,0]
[1,0,0]
[0,1,0]
:
:
Then I got this error:
IndexError Traceback (most recent call last)
<ipython-input-12-e17a2dbdd639> in <module>()
15 #print dtype(col)
16 d.plot.scatter(*pair, ax=ax, c=col, linewidths=0, s=2, alpha = 0.7)
---> 17 sns.regplot(x = pair[0], y = pair[1], data = d, fit_reg = False, ax = ax, x_jitter = True, scatter_kws = {'linewidths':0, 's':2, 'cmap':"RGB", 'color':col})
18
19 fig.tight_layout()
/usr/local/lib/python2.7/dist-packages/seaborn/linearmodels.pyc in regplot(x, y, data, x_estimator, x_bins, x_ci, scatter, fit_reg, ci, n_boot, units, order, logistic, lowess, robust, logx, x_partial, y_partial, truncate, dropna, x_jitter, y_jitter, label, color, marker, scatter_kws, line_kws, ax)
777 scatter_kws["marker"] = marker
778 line_kws = {} if line_kws is None else copy.copy(line_kws)
--> 779 plotter.plot(ax, scatter_kws, line_kws)
780 return ax
781
/usr/local/lib/python2.7/dist-packages/seaborn/linearmodels.pyc in plot(self, ax, scatter_kws, line_kws)
328 # Draw the constituent plots
329 if self.scatter:
--> 330 self.scatterplot(ax, scatter_kws)
331 if self.fit_reg:
332 self.lineplot(ax, line_kws)
/usr/local/lib/python2.7/dist-packages/seaborn/linearmodels.pyc in scatterplot(self, ax, kws)
353 kws.setdefault("linewidths", lw)
354
--> 355 if not hasattr(kws['color'], 'shape') or kws['color'].shape[1] < 4:
356 kws.setdefault("alpha", .8)
357
IndexError: tuple index out of range
What did I do wrong in assigning color and cmap in this case? Thanks!
I just came across this problem myself, with code that worked a year ago. (I might have switched from Python 2 to Python 3, which might explain the error.)
Reiteration
I dug into the code a bit, and as you noted, the error is at
--> 355 if not hasattr(kws['color'], 'shape') or kws['color'].shape[1] < 4:
356 kws.setdefault("alpha", .8)
357
IndexError: tuple index out of range
If you look at what is happening here, whatever you pass into the 'color' keyword (specifically 'color':col in your case) must satisfy one of the following to avoid the error:
It has no shape attribute.
If it does have a shape attribute, that shape has at least 2 dimensions.
The Root Problem
Well, there is the problem: a pandas Series or a 1-D numpy ndarray (or several other data structures, I'm guessing) has a shape attribute with only 1 dimension, so shape[1] does not exist.
E.g., when I ran across the problem, I had something like the following:
col.shape
(2506,)
This means that my col variable (in my case, a pandas Series object) has a shape attribute, and that shape has only 1 dimension.
It was not obvious to me how to fix this. I tried to just force my pandas Series into a list, but that didn't fix things. I tried to just pass a 2D pandas DataFrame, where each column was identical, but that didn't fix it.
Potential Fix (for those who cannot be deterred)
In looking through the source code, it was not obvious to me how I could fix things. The right fix, it would seem, might be to add another check in line 355 that looked something like this:
355 if not hasattr(kws['color'], 'shape') or len(kws['color'].shape) < 2 or kws['color'].shape[1] < 4:
But I didn't have the energy (or time) to go through the hassle of forking the source and submitting the fix. :(
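The difference between the original condition and the proposed one can be tried out in isolation. This is only a sketch of the two boolean expressions quoted above, not a patch to seaborn: ShapeOnly is a hypothetical stand-in that exposes nothing but a .shape tuple, and original_check and guarded_check are made-up names.

```python
class ShapeOnly:
    """Hypothetical stand-in that exposes only a .shape tuple."""

    def __init__(self, shape):
        self.shape = shape


def original_check(color):
    """The condition from line 355 as quoted in the traceback."""
    return not hasattr(color, "shape") or color.shape[1] < 4


def guarded_check(color):
    """The same condition with the extra dimension guard proposed above."""
    return (
        not hasattr(color, "shape")
        or len(color.shape) < 2
        or color.shape[1] < 4
    )


series_like = ShapeOnly((2506,))  # 1-D shape, like the Series in the answer

try:
    original_check(series_like)
except IndexError as exc:
    print("original check fails:", exc)

print(guarded_check(series_like))           # 1-D input no longer raises
print(guarded_check(ShapeOnly((2506, 4))))  # 2-D input with 4 columns
print(guarded_check("r"))                   # a plain color string has no .shape
```

The guard works because the or chain short-circuits: the shape[1] lookup is only reached when the shape has at least two entries.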