Python/R Code for creating a schema as shown below? - python

I need to prepare a set of flowcharts that look like this:
The goal here is to show dataset splits for training models, where the different colors show the different categories of the y variable.
What is important here is that the ratios be preserved exactly (80:20 here).
Please help me with the code to do so (preferably R).
I tried using DiagrammeR, but I lack a coding background and was thus unable to set the bar widths accurately.
Edit: As asked, here's my (admittedly poor) code:
library(DiagrammeR)
grViz("digraph flowchart {
node [shape=rectangle, height=0.75, color=grey]
ds1[label='Dataset-1', width=5]
ds1tr[label='Training Set-1', width=4]
ds1ts[label='Test Set-1', width=1]
ds1->ds1tr;
ds1->ds1ts
}")
I need help to:
Position each node properly
Fix the colors in the proper ratio (9:1/8:2)
Beautify the arrows/edges
Thank you in advance.
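Not R, but since the question is also tagged python, here is a minimal sketch of one way to get exact ratios: draw the boxes as horizontal bars with matplotlib instead of Graphviz nodes, so that every width is computed from the ratio rather than set by hand. The function names (split_widths, draw_bar, main), the colours and the 9:1 class ratio are assumptions made for illustration, not something taken from the original figure.

```python
def split_widths(total_width, ratio):
    """Divide total_width into segments that follow `ratio` exactly."""
    parts = sum(ratio)
    return [total_width * r / parts for r in ratio]


def draw_bar(ax, y, left, total_width, class_ratio, colors, label, height=0.6):
    """Draw one box as adjacent coloured segments sized by class_ratio."""
    start = left
    for width, color in zip(split_widths(total_width, class_ratio), colors):
        ax.barh(y, width, left=left, height=height, color=color, edgecolor="grey")
        left += width
    # Centre the label on the whole box
    ax.text(start + total_width / 2, y, label, ha="center", va="center")


def main(path="dataset_split.png"):
    # Imported here so the width helper above works without matplotlib
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    colors = ["tab:blue", "tab:orange"]   # hypothetical class colours
    class_ratio = [9, 1]                  # hypothetical class balance
    train_w, test_w = split_widths(5, [8, 2])  # 80:20 split of a width-5 box
    gap = 0.5                             # space between the two child boxes

    fig, ax = plt.subplots(figsize=(6, 3))
    draw_bar(ax, 1.5, gap / 2, 5, class_ratio, colors, "Dataset-1")
    draw_bar(ax, 0, 0, train_w, class_ratio, colors, "Training Set-1")
    draw_bar(ax, 0, train_w + gap, test_w, class_ratio, colors, "Test Set-1")

    # Arrows from the bottom of the parent box to the top of each child box
    for target_x in (train_w / 2, train_w + gap + test_w / 2):
        ax.annotate("", xy=(target_x, 0.3), xytext=(gap / 2 + 2.5, 1.2),
                    arrowprops={"arrowstyle": "->", "color": "grey"})

    ax.set_axis_off()
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return path
```

Calling main() writes dataset_split.png. Changing the [8, 2] or [9, 1] lists is enough to redraw the chart with another split or class balance.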

Related

How to make an asset allocation line graph in Python?

Can someone please let me know what kind of graph this is called?
I am running an asset allocation with different risk/return combinations and would like to show the change in asset allocation through a graph.
The graph shown has time on the x axis, and I would like to change that to increasing risk (volatility). Thank you so much!
I am not sure what this is called in Matplotlib, and I have had no luck finding the right graph type.
There is a powerful package called plotly. It has lots of plot types and a Python API; you can check the examples here: https://plotly.com/python/. Some of them might suit you.
Take a look at the contour plot https://plotly.com/python/contour-plots/ and https://plotly.com/python/knn-classification/#probability-estimates-with-gocontour
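The image is not reproduced here, so this is a guess: if the chart in question shows each asset's share as a coloured band, it is usually called a stacked area chart, and matplotlib's stackplot can draw it with volatility on the x axis. A minimal sketch, where the function names and any data passed in are hypothetical:

```python
def normalise_weights(rows):
    """Scale each row of raw asset weights so that it sums to 1."""
    return [[w / sum(row) for w in row] for row in rows]


def plot_allocation(volatility, weights, asset_names, path="allocation.png"):
    """Stacked area chart: one row of weights per volatility level."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    # stackplot wants one series per asset, so transpose rows into columns
    per_asset = list(zip(*weights))
    fig, ax = plt.subplots()
    ax.stackplot(volatility, *per_asset, labels=asset_names)
    ax.set_xlabel("Portfolio volatility")
    ax.set_ylabel("Allocation share")
    ax.legend(loc="upper left")
    fig.savefig(path)
    plt.close(fig)
    return path
```

For example, plot_allocation([0.05, 0.10, 0.15], normalise_weights([[8, 2], [5, 5], [2, 8]]), ["Bonds", "Stocks"]) would draw two bands whose heights add up to 1 at every risk level.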

Understanding shap.decision_plot

Recently I started using SHAP. I really liked the way they explain the contribution of each feature in the model.
However, I have difficulty understanding the decision plot.
I could not find an explanation of this plot.
I would be thankful if someone could explain the plot below, because I am a bit confused about the base value and the model output.
Usually I see in decision plot all the lines starting at same point from bottom and then they separate based on feature contribution.
Is it always like that? Because my plot is different in this case.
Also, does the blue color mean positive influence and the red negative?
Many thanks in advance for sharing any ideas that help me understand the plot.
Regards
If you haven't already, you should check out the documentation notebook at https://github.com/slundberg/shap/blob/master/notebooks/plots/decision_plot.ipynb; it talks about the coloring.
As for why the lines don't start at the bottom in your plot, I think that is because there are more features that are cut off. I could imagine summing all those features and showing them combined like in the SHAP waterfall plot, but that would need to be coded up (feel free to open that as an issue or PR for that if you like).
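To make the cut-off explanation concrete, here is a small sketch in plain Python (it does not call the shap library). It relies only on the additive property of SHAP values: the base value plus all of an observation's SHAP values equals the model output. The function name decision_path and the per-observation ordering by absolute value are assumptions for illustration; the real plot chooses its own feature order.

```python
from itertools import accumulate


def decision_path(base_value, shap_values, n_display=None):
    """Running total that a decision-plot line follows from bottom to top.

    base_value  : the model's expected output
    shap_values : SHAP values of one observation
    n_display   : if set, keep only the part of the line that belongs to
                  the n_display most influential features
    """
    ordered = sorted(shap_values, key=abs)  # least influential first
    path = list(accumulate(ordered, initial=base_value))
    if n_display is not None:
        # The hidden features are already summed into the first visible point
        path = path[-(n_display + 1):]
    return path


full = decision_path(0.5, [0.25, -0.125, 0.0625])
cut = decision_path(0.5, [0.25, -0.125, 0.0625], n_display=2)
print(full)  # starts at the base value
print(cut)   # starts at base value + contribution of the hidden feature
```

With every feature shown, all lines start at the same base value. Once some features are hidden, each line starts at the base value plus that observation's hidden contributions, so the starting points differ, while the last point is still the model output.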

Whisker plots are not clear enough to analyze data

I'm trying to analyze a set of costs using python.
The columns in the data frame are,
'TotalCharges', 'TotalPayments', 'TotalDirectVariableCost', 'TotalDirectFixedCost', 'TotalIndirectVariableCost', 'TotalIndirectFixedCost'.
When I tried to plot them using whisker plots, this is how they were displayed:
I need to properly analyze these data and understand their behavior.
The following are my questions.
Is there any way to make the whisker plots clearer?
Since these are costs, I believe we cannot discard them as outliers. So, keeping the data as it is, what else can I use to represent it more clearly?
Thanks
There are a couple of things you could do:
use a larger plot area
rotate the axis
plot one axis on a log scale
That said, I think you should examine once again your understanding of what a box and whisker plot is for.
Additionally, you might consider posting this on the Math or Cross Validated site as this doesn't have much to do with code.
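A rough sketch of how those three suggestions could be combined in matplotlib: a larger figure, rotated tick labels and a log-scaled value axis. The data below is randomly generated and purely illustrative, and the function names are made up for this example. Note that a plain log scale cannot show zero or negative costs.

```python
import random


def make_costs(n=500, seed=0):
    """Generate hypothetical right-skewed cost columns (illustrative only)."""
    rng = random.Random(seed)
    return {
        "TotalCharges": [rng.lognormvariate(8, 1.2) for _ in range(n)],
        "TotalPayments": [rng.lognormvariate(7, 1.0) for _ in range(n)],
        "TotalDirectVariableCost": [rng.lognormvariate(6, 1.5) for _ in range(n)],
    }


def plot_log_boxes(columns, path=None):
    """Box plots on a log-scaled y axis; returns the number of boxes drawn."""
    import matplotlib
    matplotlib.use("Agg")  # render off-screen
    import matplotlib.pyplot as plt

    names = list(columns)
    fig, ax = plt.subplots(figsize=(10, 6))        # larger plot area
    result = ax.boxplot([columns[name] for name in names])
    ax.set_yscale("log")                           # spread out the skewed values
    ax.set_xticks(range(1, len(names) + 1))
    ax.set_xticklabels(names, rotation=30, ha="right")  # rotated labels
    ax.set_ylabel("Cost (log scale)")
    fig.tight_layout()
    if path is not None:
        fig.savefig(path)
    plt.close(fig)
    return len(result["boxes"])
```

Calling plot_log_boxes(make_costs(), "cost_boxplots.png") writes the figure to disk. With real data, the dict would simply map each column name to that column's values.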

Resetting default fonts/colours for plots in Python/Matplotlib

First time here and newbie so please bear with me.
I'm following along with a data viz tutorial that uses matplotlib. I'm getting the same results (go me), but the font used (in the plot legend in particular) and the default colours are different, and no parameter has been set to change them. And of course the tutorial's stuff looks much nicer. I don't know why this happens, but I'm thinking that maybe I changed some settings in unrelated notebooks and those choices stuck. If so, how do I "reset" them, please?
I am not allowed to attach screenshots yet. Basically, my plot lines are light blue and orange while his are the traditional dark blue and green. My legend font looks like bad Excel while his looks like LaTeX.
Thank you for your suggestions.
There are a number of ways you can do this and they vary depending on the specific use case but the one I use most often is
from matplotlib import rcParams, rcParamsDefault
rcParams.update(rcParamsDefault)
However, the tutorial is likely using a style other than the default, so I wouldn't necessarily expect this to make your output look exactly like the tutorial. When using LaTeX with matplotlib I use
pgf_with_latex = {
    "text.usetex": True,
    "pgf.rcfonts": False,
    # Recent matplotlib releases expect the preamble as a single string, not a list
    "pgf.preamble": r"\usepackage{color}",
}
rcParams.update(pgf_with_latex)
You can find more information here regarding customization and here regarding using LaTeX with matplotlib.
To find all the built-in matplotlib themes, use
import matplotlib.pyplot as plt
plt.style.available
I think the normal one is 'default'. So add this before you make your plots:
plt.style.use(['default'])
But you can make your plots look even better than the tutorial by using some of the other ones.
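One more thing that may explain the colours: light blue and orange are the first two line colours of the current default style, while dark blue and green are what the older 'classic' style uses, so the tutorial may simply have been made with an older matplotlib. A small sketch to compare the two; first_colors is a helper name made up for this example:

```python
import matplotlib
matplotlib.use("Agg")  # no display needed for this check
import matplotlib.pyplot as plt


def first_colors(style, n=2):
    """Return the first n line colours of a style's colour cycle."""
    # style.context applies the style temporarily and restores rcParams after
    with plt.style.context(style):
        return plt.rcParams["axes.prop_cycle"].by_key()["color"][:n]


print("default:", first_colors("default"))
print("classic:", first_colors("classic"))
```

If the classic colours match the tutorial, plt.style.use('classic') before plotting should get you much closer to its look.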

Visual representation of a Ranking over time (image provided)

The graph below shows a ranking of countries at 10 different points in time. The cool thing about this graph is that it allows you to track changes in the ranking over time. I want to create something similar, but I have no idea how it was created...
My guess is that it was made with a design tool like Adobe InDesign, but my hope is that there are other tools for producing such a graphic (maybe even a way to do it in Mathematica or Python?).
Any ideas and/or suggestions on where to look would be much appreciated.
PS: In case I did not post this in the most appropriate of Stack Exchange's many forums, my apologies.
It is called a bump chart. You can learn how to make one with Python in this article:
How to make a bump chart
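In case the linked article is unavailable, here is a minimal sketch of the idea in Python: turn the scores into ranks per period, then draw one line per country with the y axis inverted so that rank 1 is on top. The function names and the sample scores in the usage note are hypothetical.

```python
def rank_over_time(scores):
    """Map {name: [score per period]} to {name: [rank per period]}.

    Rank 1 is the highest score. Ties keep the input order.
    """
    names = list(scores)
    n_periods = len(next(iter(scores.values())))
    ranks = {name: [] for name in names}
    for t in range(n_periods):
        ordered = sorted(names, key=lambda name: scores[name][t], reverse=True)
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)
    return ranks


def plot_bump(ranks, path="bump_chart.png"):
    """Draw one line per name, with rank 1 at the top of the chart."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 4))
    for name, series in ranks.items():
        periods = range(1, len(series) + 1)
        ax.plot(periods, series, marker="o", linewidth=2)
        # Label each line next to its final position
        ax.text(len(series) + 0.1, series[-1], name, va="center")
    ax.invert_yaxis()  # smaller rank number = higher on the chart
    ax.set_yticks(range(1, len(ranks) + 1))
    ax.set_xlabel("Period")
    ax.set_ylabel("Rank")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
    return path
```

For example, plot_bump(rank_over_time({"A": [3, 1], "B": [2, 2], "C": [1, 3]})) draws A falling from first to third place while C climbs from third to first.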
