I recently started using SHAP, and I really like the way it explains the contribution of each feature to the model.
However, I am having difficulty understanding the decision plot.
I could not find an explanation of this plot.
I would be thankful if someone could explain the plot below, because I am a bit confused about the base value and the model output.
Usually, in a decision plot, I see all the lines starting at the same point at the bottom and then separating based on the feature contributions.
Is it always like that? My plot is different in this case.
Also, does the blue color mean a positive influence and red a negative one?
Many thanks in advance for any ideas that help me understand the plot.
Regards
If you haven't already, you should check out the documentation notebook at https://github.com/slundberg/shap/blob/master/notebooks/plots/decision_plot.ipynb. It talks about the coloring.
As for why the lines don't start at the bottom in your plot, I think that is because there are more features that are cut off from the plot. I could imagine summing all those features and showing them combined, as in the SHAP waterfall plot, but that would need to be coded up (feel free to open an issue or PR for that if you like).
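To make the cut-off effect concrete, here is a minimal NumPy-only sketch of how one line in a decision plot is built. It does not call SHAP itself; the base value, the six SHAP values, and the number of displayed features are all made-up numbers, and it assumes features are ordered by absolute contribution.

```python
import numpy as np

# Hypothetical numbers for one prediction: a base value plus six SHAP values.
base_value = 0.30
shap_values = np.array([0.02, -0.01, 0.25, -0.15, 0.03, 0.10])

# Order features from least to most important (by absolute SHAP value),
# i.e. bottom-to-top in the plot under the assumed importance ordering.
order = np.argsort(np.abs(shap_values))

# The line is a running sum: it starts at the base value and adds one
# feature contribution per step, ending at the model output.
path = base_value + np.concatenate(([0.0], np.cumsum(shap_values[order])))

# If only the top 3 features are displayed, the visible part of the line
# begins after the hidden contributions have already been added.
n_shown = 3
visible_path = path[-(n_shown + 1):]

print("full path start:   ", path[0])          # the base value
print("visible path start:", visible_path[0])  # base value + hidden contributions
print("model output:      ", path[-1])         # base value + all contributions
```

With all features displayed, every line would start at the base value. When some are hidden, each observation has already accumulated its own hidden contributions, so the lines enter the plot at different points.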
Related
Can someone please tell me what this kind of graph is called?
I am running an asset allocation with different risk/return combinations and would like to show the change in asset allocation through a graph.
The graph shown has time on the x axis, and I would like to change that to increasing risk (volatility). Thank you so much!
I am not sure what this is called in Matplotlib, and I have had no luck finding the right graph type.
There is a powerful package called plotly. It has lots of plot types and a Python API; you can check the examples here: https://plotly.com/python/. Some of them might suit you.
Take a look at the contour plot, https://plotly.com/python/contour-plots/ and https://plotly.com/python/knn-classification/#probability-estimates-with-gocontour
So I was plotting a SHAP plot in Python, just like in this tutorial. However, instead of returning red and blue values (as in the tutorial), my plot is grey. I didn't find any post talking about this, so I've decided to open this one, even though I can't share my data due to my country's regulations and company rules. Since this is a generic question, I still believe I can open this post and perhaps help others. If I'm wrong and this post violates any rule, I sincerely apologize.
I am working on code to project a given object onto a plane.
The code works fine (at least it seems like it) in achieving that purpose, the only issue I'm having is in plotting my results.
In the image below, for instance, I'm plotting the projection of a parallelepiped (its edges, to be more precise) in a plane of my choice.
I would like to make a plot where each point is connected to its closest neighbor. I'm not very confident that this approach would get the job done, but I think it would be worth a shot.
Different ideas to get there are also welcome!
Any thoughts?
Thanks in advance.
Note: I also tried using a solid line style when plotting as opposed to the pixel marker style, but the result I got was not quite what I expected to say the least:
When telling matplotlib to plot a sequence of points and join them with a line, it creates a straight line between two adjacent points in your input data. To create several lines, it's often easier to split your plot command into several ones. An alternative is to arrange your points such that they form the edges you want, but that would be much more complicated in your case.
As discussed in the comments, separating each edge into its own separate plot command worked for your case.
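For anyone landing here later, a rough sketch of the one-plot-call-per-edge idea. The vertex coordinates below are made up (the `offsets` array is a hypothetical stand-in for your projected edge directions), and the edge list is derived from a simple bit trick, not from your code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical projected directions of the three box edges in the plane.
offsets = np.array([[1.0, 0.0], [0.3, 0.8], [-0.5, 0.4]])

# Vertex k uses the bits of k to decide which edge directions to add,
# giving the 8 projected corners of a parallelepiped as an (8, 2) array.
corner_bits = np.array([[(k >> bit) & 1 for bit in range(3)] for k in range(8)])
vertices = corner_bits @ offsets

# Two corners share an edge when their indices differ in exactly one bit.
edges = [(i, j) for i in range(8) for j in range(i + 1, 8)
         if bin(i ^ j).count("1") == 1]

fig, ax = plt.subplots()
for i, j in edges:
    # One plot call per edge: only these two points get joined.
    ax.plot(vertices[[i, j], 0], vertices[[i, j], 1], color="k")
ax.set_aspect("equal")
```

Passing all eight points to a single `ax.plot` call would instead join them in array order, which is what produces the unexpected zigzag with a solid line style.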
I'm trying to analyze a set of costs using python.
The columns in the data frame are:
'TotalCharges', 'TotalPayments', 'TotalDirectVariableCost', 'TotalDirectFixedCost', 'TotalIndirectVariableCost', 'TotalIndirectFixedCost'.
When I tried to plot them using box-and-whisker plots, this is how they were displayed:
I need to properly analyze these data and understand their behavior.
The following are my questions.
Is there any way that I can make the whisker plots clearer?
Since these are costs, I believe we cannot ignore the extreme values as outliers. So, keeping the data as it is, what else can I use to represent the data more clearly?
Thanks
There are a couple of things you could do:
use a larger print area
rotate the axis
plot one axis on a log scale
That said, I think you should examine once again your understanding of what a box and whisker plot is for.
Additionally, you might consider posting this on the Math or Cross Validated site as this doesn't have much to do with code.
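As a rough illustration of the larger figure and the log-scaled axis, here is a sketch using synthetic, strictly positive data. The lognormal samples are made up and only three of your column names are used; a log axis also assumes there are no zero or negative costs.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for skewed cost columns (not the real data).
columns = ["TotalCharges", "TotalPayments", "TotalDirectVariableCost"]
data = [rng.lognormal(mean=m, sigma=1.2, size=500) for m in (8.0, 7.5, 6.0)]

# A larger figure gives each box more room.
fig, ax = plt.subplots(figsize=(12, 8))
ax.boxplot(data)

# Boxes are drawn at positions 1..n by default, so label those ticks.
ax.set_xticks(range(1, len(columns) + 1))
ax.set_xticklabels(columns, rotation=30)

# A log-scaled value axis keeps the boxes readable next to extreme values.
ax.set_yscale("log")
ax.set_ylabel("Cost (log scale)")
```

All points are still drawn, so nothing is discarded as an outlier; the log scale only changes how much space the large values take up.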
I'm having some trouble visualizing a certain dataset that I have in a contour plot. The issue is that I have a bunch of data points (X, Y, Z) for which the Z values range from about 0 to 2, where a lot of the interesting features are located in the 0 to 0.3 range. Using normal scaling, they are very difficult to see, as illustrated in this image:
Now, I have thought about what else to do. Of course there is logarithmic scaling, but then I first need to think about some sort of mapping, and I am not 100% sure how one would do that. Inspired by this question one could think of a mapping of the type scaling(x) = Log(x/min)/Log(max/min) which worked reasonably well in that question.
Also interesting was the follow-up discussed here, where they used some sort of ArcSinh scaling function. That seemed to enlarge the small features quite well, in proportion to the whole.
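To make the two candidate mappings concrete, a minimal sketch of both. The function names and the scale parameter `z0` are my own choices for illustration; note that the logarithmic version needs a strictly positive minimum, so values of exactly 0 would have to be clipped or offset first.

```python
import numpy as np

def log_scale(z, zmin, zmax):
    """Map [zmin, zmax] to [0, 1] via log(z / zmin) / log(zmax / zmin); needs zmin > 0."""
    return np.log(z / zmin) / np.log(zmax / zmin)

def asinh_scale(z, z0):
    """arcsinh scaling: close to linear for |z| << z0, logarithmic for |z| >> z0."""
    return np.arcsinh(z / z0)

# A few sample amplitudes spanning the small features and the large values.
z = np.array([0.01, 0.1, 0.3, 1.0, 2.0])
print(log_scale(z, zmin=0.01, zmax=2.0))
print(asinh_scale(z, z0=0.1))
```

Unlike the log mapping, arcsinh is defined at 0 and for negative values, which might matter here since Z goes all the way down to 0.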
So my question is two fold in a way I suppose.
How would one scale the data in my contour plot in such a way that the small amplitude features do not get blown away by the outliers?
Would you do it using either of the methods mentioned above, or using something completely different?
I am rather new to python and I am constantly amazed by all the things that are already out there, so I am sure there might be a built in way that is better than anything I mentioned above.
For completeness I uploaded the datafile here (the upload site is robustfiles.com, which a quick google search told me is a trustworthy website to share things like these)
I plotted the above with
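import numpy as np
import matplotlib.pyplot as plt

# Raw string so the backslashes in the Windows path are not treated as escapes
data = np.load(r"D:\SavedData\ThreeQubitRess44SpecHighResNormalFreqs.npy")
# X and Y are the coordinate grids matching data (defined elsewhere)
fig, ax1 = plt.subplots(1, figsize=(16, 16))
cs = ax1.contourf(X, Y, data, 210, alpha=1, cmap='jet')
fig.colorbar(cs, ax=ax1, shrink=0.9)
ax1.set_title("Freq vs B")
ax1.set_ylabel('Frequency (GHz)')
ax1.set_xlabel('B (arb.)')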
Excellent question.
Don't scale the data. With most scaling functions, you'll end up looking for compromises between ranges.
Instead, use a custom colormap. That way, you won't have to remap your actual data and can easily customize the visualization of the regions you'd like to highlight. Another example can be found in the scipy cookbook, and there are quite a few more on the internet.
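A minimal sketch of what such a colormap could look like, assuming data in the range 0 to 2 with the interesting part below 0.3. The synthetic `Z`, the colormap name, and the chosen colors and node positions are all placeholders to adapt.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

# Synthetic stand-in: a tall narrow peak on top of small-amplitude ripples.
x = np.linspace(-3, 3, 200)
y = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(x, y)
Z = 1.7 * np.exp(-(X**2 + Y**2) / 0.1) + 0.15 * (1 + np.sin(3 * X) * np.cos(3 * Y))

# Node positions are fractions of the full data range [0, 2], so 0.15
# corresponds to Z = 0.3. Most of the color variation is spent below that.
cmap = LinearSegmentedColormap.from_list(
    "low_range_detail",
    [(0.0, "black"), (0.05, "blue"), (0.10, "cyan"), (0.15, "yellow"), (1.0, "darkred")],
)

fig, ax = plt.subplots(figsize=(8, 8))
cs = ax.contourf(X, Y, Z, levels=np.linspace(0, 2, 101), cmap=cmap)
fig.colorbar(cs, ax=ax, shrink=0.9)
```

The data and the colorbar labels stay in the original units; only the color assignment is nonlinear.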
Another option is to split the plot into two separate regions by breaking the axis.
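The original illustration is not available here, so the following is only one guess at what a broken axis could look like: two stacked axes sharing x, each with its own y range. The signal and the split at 0.3 are made up for the sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic signal: small ripples (0 to 0.3) plus one tall spike.
x = np.linspace(0, 10, 500)
z = 0.15 * (1 + np.sin(3 * x)) + 1.7 * np.exp(-((x - 5) ** 2) / 0.05)

# Two stacked axes that share x; each shows a different slice of the y range.
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
fig.subplots_adjust(hspace=0.05)
for ax in (ax_top, ax_bottom):
    ax.plot(x, z)

ax_top.set_ylim(0.3, 2.0)     # large-amplitude region
ax_bottom.set_ylim(0.0, 0.3)  # small-amplitude region, stretched vertically

# Hide the touching spines and ticks so the two panels read as one broken axis.
ax_top.spines["bottom"].set_visible(False)
ax_bottom.spines["top"].set_visible(False)
ax_top.tick_params(bottom=False, labelbottom=False)
```

The same layout should carry over to `contourf` by drawing the data in both panels and limiting each panel's axis range.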