Visualizing decision tree in scikit-learn


I am trying to design a simple Decision Tree using scikit-learn in Python (I am using Anaconda's Ipython Notebook with Python 2.7.3 on Windows OS) and visualize it as follows:
from pandas import read_csv, DataFrame
from sklearn import tree
from os import system
data = read_csv('D:/training.csv')
Y = data.Y
X = data.ix[:,"X0":"X33"]
dtree = tree.DecisionTreeClassifier(criterion = "entropy")
dtree = dtree.fit(X, Y)
dotfile = open("D:/dtree2.dot", 'w')
dotfile = tree.export_graphviz(dtree, out_file = dotfile, feature_names = X.columns)
dotfile.close()
system("dot -Tpng D:.dot -o D:/dtree2.png")
However, I get the following error:
AttributeError: 'NoneType' object has no attribute 'close'
I used the following blog post as a reference: Blogpost link
The following Stack Overflow question doesn't seem to work for me either: Question
Could someone help me with how to visualize the decision tree in scikit-learn?

Here is a one-liner for those who are using Jupyter and sklearn (18.2+). You don't even need matplotlib for it; the only requirement is graphviz:
pip install graphviz
Then run the following (as in the question's code, X is a pandas DataFrame):
from graphviz import Source
from sklearn import tree
Source( tree.export_graphviz(dtreg, out_file=None, feature_names=X.columns))
This displays the tree in SVG format. The code above produces a Graphviz Source object (it simply holds the DOT source code, nothing scary), which Jupyter renders directly.
Some things you are likely to do with it:
Display it in Jupyter:
from IPython.display import SVG
graph = Source( tree.export_graphviz(dtreg, out_file=None, feature_names=X.columns))
SVG(graph.pipe(format='svg'))
Save as png:
graph = Source( tree.export_graphviz(dtreg, out_file=None, feature_names=X.columns))
graph.format = 'png'
graph.render('dtree_render',view=True)
Get the png image, save it and view it:
graph = Source( tree.export_graphviz(dtreg, out_file=None, feature_names=X.columns))
png_bytes = graph.pipe(format='png')
with open('dtree_pipe.png','wb') as f:
    f.write(png_bytes)
from IPython.display import Image
Image(png_bytes)
If you are going to play with that lib here are the links to examples and userguide

When out_file is given, sklearn.tree.export_graphviz writes to that file and doesn't return anything, so the call evaluates to None.
By doing dotfile = tree.export_graphviz(...) you overwrite the open file object that was previously assigned to dotfile, so you get an error when you try to close the file (as dotfile is now None).
To fix it, change your code to:
...
dotfile = open("D:/dtree2.dot", 'w')
tree.export_graphviz(dtree, out_file = dotfile, feature_names = X.columns)
dotfile.close()
...
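The same fix can be written with a with-block, which closes the file automatically and leaves no handle to overwrite by accident. This is a minimal sketch, not the asker's exact setup: it assumes the bundled iris data in place of D:/training.csv, and the temporary output path is hypothetical.

```python
import os
import tempfile

from sklearn import tree
from sklearn.datasets import load_iris

# Assumption: iris stands in for the asker's training.csv
iris = load_iris()
dtree = tree.DecisionTreeClassifier(criterion="entropy", random_state=0)
dtree.fit(iris.data, iris.target)

# Hypothetical output location inside a fresh temporary directory
dot_path = os.path.join(tempfile.mkdtemp(), "dtree2.dot")

# The with-block closes the file on exit, so close() is never called by hand
with open(dot_path, "w") as dotfile:
    result = tree.export_graphviz(dtree, out_file=dotfile,
                                  feature_names=iris.feature_names)

# Read back the first line to confirm that DOT source was written
with open(dot_path) as f:
    first_line = f.readline().strip()

print(result)      # the return value when out_file is a file object
print(first_line)  # first line of the generated DOT source
```

Keeping the return value in a separate name (result here) makes it easy to see that it is not the file handle.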

If, like me, you have a problem installing graphviz, you can still visualize the tree:
1. Export it with export_graphviz as shown in the previous answers.
2. Open the .dot file in a text editor.
3. Copy its contents and paste them at webgraphviz.com.
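If the only goal is to get DOT text to paste into an online renderer, the intermediate file can be skipped: with out_file=None, export_graphviz returns the DOT source as a string. A small sketch, assuming the iris data and a shallow tree purely for illustration:

```python
from sklearn import tree
from sklearn.datasets import load_iris

# Assumption: iris data and max_depth=2 are only here to keep the output short
iris = load_iris()
clf = tree.DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# out_file=None makes export_graphviz return the DOT source as a string
dot_text = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names)

# Print it so it can be copied and pasted into the online renderer
print(dot_text)
```

This only needs scikit-learn; neither the Graphviz executables nor the graphviz Python package are involved.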

scikit-learn introduced the plot_tree function in version 0.21 (May 2019), which makes this very easy. Documentation here.
Here's the minimum code you need:
import matplotlib.pyplot as plt
from sklearn import tree
plt.figure(figsize=(40,20)) # customize according to the size of your tree
_ = tree.plot_tree(your_model_name, feature_names = X.columns)
plt.show()
plot_tree supports some arguments to beautify the tree. For example:
import matplotlib.pyplot as plt
from sklearn import tree
plt.figure(figsize=(40,20))
_ = tree.plot_tree(your_model_name, feature_names = X.columns,
                   filled=True, fontsize=6, rounded = True)
plt.show()
If you want to save the picture to a file, add the following line before plt.show():
plt.savefig('filename.png')
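For reference, here is a self-contained variant of the snippets above that runs outside a notebook. It is only a sketch: the iris data, the shallow tree, the headless Agg backend and the temporary output path are all assumptions made so the example can run anywhere.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.datasets import load_iris

# Assumption: iris data and a depth-3 tree, chosen only for illustration
iris = load_iris()
model = tree.DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(iris.data, iris.target)

fig, ax = plt.subplots(figsize=(12, 6))
# plot_tree draws onto the given axes and returns the node annotations
annotations = tree.plot_tree(model,
                             feature_names=iris.feature_names,
                             class_names=list(iris.target_names),
                             filled=True, rounded=True, fontsize=8, ax=ax)

# Hypothetical output path; save before closing (or showing) the figure
out_path = os.path.join(tempfile.mkdtemp(), "tree.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
```

Passing ax explicitly avoids relying on the current global figure, which helps when several trees are drawn in one session.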
If you want to view the rules in text format, there's an answer here. It's more intuitive to read.
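One way to get such a text view is sklearn.tree.export_text, which was added in scikit-learn 0.21 alongside plot_tree. A minimal sketch, again assuming the iris data and a shallow tree for brevity:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: iris data and max_depth=2 keep the printed rules short
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# export_text returns the decision rules as an indented, plain-text tree
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each level of indentation in the output corresponds to one level of depth in the tree, and leaves are shown with their predicted class.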

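Alternatively, you could try using pydot to produce the png file from the dot data:
...
from io import StringIO
import pydot
tree.export_graphviz(dtreg, out_file='tree.dot') # produces a dot file on disk
dotfile = StringIO()
tree.export_graphviz(dtreg, out_file=dotfile) # writes the dot source into memory
# recent pydot versions return a list of graphs, so unpack the single graph
(graph,) = pydot.graph_from_dot_data(dotfile.getvalue())
graph.write_png("dtree2.png")
...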

The following also works fine:
from sklearn.datasets import load_iris
iris = load_iris()
# Model (can also use single decision tree)
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=10)
# Train
model.fit(iris.data, iris.target)
# Extract single tree
estimator = model.estimators_[5]
from sklearn.tree import export_graphviz
# Export as dot file
export_graphviz(estimator, out_file='tree.dot',
                feature_names = iris.feature_names,
                class_names = iris.target_names,
                rounded = True, proportion = False,
                precision = 2, filled = True)
# Convert to png using system command (requires Graphviz)
from subprocess import call
call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600'])
# Display in jupyter notebook
from IPython.display import Image
Image(filename = 'tree.png')
You can find the source here

I copied part of your code and changed it as below:
from pandas import read_csv, DataFrame
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from os import system
data = read_csv('D:/training.csv')
Y = data.Y
X = data.loc[:,"X0":"X33"] # .ix was removed in pandas 1.0; .loc does the same label-based slice
dtree = tree.DecisionTreeClassifier(criterion = "entropy")
dtree = dtree.fit(X, Y)
Once the code above runs and you have a fitted dtree, add the code below to visualize the decision tree.
Remember to install graphviz first: pip install graphviz
import graphviz
from graphviz import Source
dot_data = tree.export_graphviz(dtree, out_file=None, feature_names=X.columns)
graph = graphviz.Source(dot_data)
graph.render("name of file",view = True)
I tried it with my data; the visualization worked well and a PDF file opened immediately.

You can copy the contents of the file written by export_graphviz and paste them into the webgraphviz.com site.
You can check out the article on How to visualize the decision tree in Python with graphviz for more information.

A simple way, found here, using pydotplus (graphviz must be installed):
from IPython.display import Image
from sklearn import tree
import pydotplus # installing pyparsing may be needed
...
dot_data = tree.export_graphviz(best_model, out_file=None, feature_names = X.columns)
graph = pydotplus.graph_from_dot_data(dot_data)
Image(graph.create_png())

Here is minimal code to get a nice-looking graph in just a few lines:
import matplotlib.pyplot as plt
from sklearn import tree
import pydotplus
dot_data=tree.export_graphviz(dt,out_file=None,filled=True,rounded=True)
graph=pydotplus.graph_from_dot_data(dot_data)
graph.write_png('tree.png')
plt.imshow(plt.imread('tree.png'))
I just added plt.imshow to view the graph in a Jupyter Notebook. You can leave it out if you are only interested in saving the png file.
I installed the following dependencies:
pip3 install graphviz
pip3 install pydotplus
On macOS, the pip version of Graphviz alone did not work for me (pip installs only the Python bindings, not the Graphviz executables). Following Graphviz's official documentation, I installed it with brew and everything worked fine.
brew install graphviz

If you run into issues with grabbing the source .dot directly you can also use Source.from_file like this:
from graphviz import Source
from sklearn import tree
tree.export_graphviz(dtreg, out_file='tree.dot', feature_names=X.columns)
Source.from_file('tree.dot')

Related

xgboost.plot_tree shows - Empty characters/boxes/blocks as labels

SITUATION
When I plot with xgboost.plot_tree, the graph shows only a bunch of empty characters/boxes/blocks instead of the titles, labels and numbers. I use more than 400 features, so that may be a contributing factor.
CODE 1
fig, ax = plt.subplots(figsize=(170, 170))
plot_tree(xgbmodel, ax=ax)
plt.savefig("temp.pdf")
plt.show()
CODE 2
plot_tree(xgbmodel, num_trees=2)
fig = plt.gcf()
fig.set_size_inches(150, 100)
fig.savefig('tree.png')
ERROR
Both code 1 and code 2 produce the same image.
This is just a crop of the whole tree, which is too big to upload here, but the tree shape looks perfect.
SOLUTIONS I have Tried
This one is about a problem with plotting, whereas I can plot without any problem: Plot a Single XGBoost Decision Tree
This one has other issues: xgboost.plot_tree: binary feature interpretation
I ran the code that @jared_mamrot gave me and it produced the same error. I restarted and cleaned my environment and ran only this code first, in the same notebook.
GitHub recommendation: model.get_booster().get_dump(dump_format='text') printed a bit more than 200,000 characters (63 A4 pages in 11-point Calibri), which look perfectly correct, e.g.: 0.0268656723\n\t\t\t\t\t34:[f0<6.5] yes=53,no=54,missing=53\n\t\t\t\t\t\. Is it possible that I have this issue because so much text cannot be displayed in a graph of such a normal size?

I wasn't able to reproduce your error. Can you please add more details to your question and confirm that this code works? link to pima-indians-diabetes.csv
#!/usr/bin/env python3
# plot decision tree
from numpy import loadtxt
from xgboost import XGBClassifier
from xgboost import plot_tree
import matplotlib.pyplot as plt
import graphviz
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
y = dataset[:,8]
# fit model on training data
model = XGBClassifier()
model.fit(X, y)
# plot/save fig
fig, ax = plt.subplots(figsize=(170, 170))
plot_tree(model, ax=ax)
plt.savefig("test.pdf")
Edit per comment:
I can't reproduce this issue/error. No matter which package version / char encoding / line endings / etc my notebook always renders the text correctly. The only thing I can suggest is installing a new virtual environment (e.g. miniconda) with current versions of the required packages (conda install notebook numpy matplotlib xgboost graphviz python-graphviz) and testing it again.
Also, make sure you don't have windows line endings (see: Matplotlib plotting some characters as blank square / https://github.com/jupyterlab/jupyterlab/issues/1104 / https://github.com/jupyterlab/jupyterlab/issues/3718 / https://github.com/jupyterlab/jupyterlab/pull/3882 ) and specify the font you are using (e.g. How to change fonts in matplotlib (python)?):
# plot decision tree
from numpy import loadtxt
from xgboost import XGBClassifier
from xgboost import plot_tree
from matplotlib.font_manager import FontProperties
import matplotlib.pyplot as plt
import graphviz
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
y = dataset[:,8]
# fit model on training data
model = XGBClassifier()
model.fit(X, y)
# plot/save fig
prop = FontProperties()
prop.set_file('Arial.ttf')
fig, ax = plt.subplots(figsize=(170, 170))
plot_tree(model, ax=ax, fontproperties=prop)
plt.savefig("test.png")
fig.show()

I moved my whole environment from an AWS EC2 instance to a local machine, and then it ran perfectly. The AWS EC2 instance did some other weird things too, such as not allowing extensions in JupyterLab. Both machines run Ubuntu 20.04 LTS.

Writing create_tree_digraph plot to a png file in Python

I want to save the tree of my lightgbm model in .png format. I have tried two plotting methods from the lightgbm API: plot_tree and create_tree_digraph.
import lightgbm as lgb
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
clf = lgb.LGBMClassifier()
clf.fit(X, y)
When I use plot_tree, it displays the tree but in place of values there are small blank boxes
lgb.plot_tree(clf, tree_index=0)
When I try create_tree_digraph, I get the graph but I can't save it as it is.
lgb.create_tree_digraph(clf)
I used the code below to save it to a file, but what gets saved is the first plot (the one from plot_tree):
import graphviz
s = graphviz.Source(graph_b.source, filename = "test1.gv", format = "png")
s.view()
Any suggestions on how to save the plot as an image? I ultimately want to write these tree plots to Excel.
I am using graphviz version 0.8.3
Thanks,

I suppose you are using a jupyter notebook or something like that.
This worked for me, but surely is not the best way.
ax = lgb.create_tree_digraph(clf)
with open('fst.svg', 'w') as f:
    f.write(ax._repr_svg_())

Loading datasets in offline mode in sklearn and skmultilearn

I would like to use datasets: emotions, scene, and yeast in my project in anaconda (python 3.6.5).
I have used the following codes:
from skmultilearn.dataset import load_dataset
X_train, y_train, feature_names, label_names = load_dataset('emotions', 'train')
It works successfully when I am connected to the internet,
But when I am offline, it doesn't work!
I have downloaded all 3 named above datasets in a folder like this:
H:\Projects\Datasets
How can I use this folder as my source datasets while I am offline?
(I'm using windows 10)
The datasets I downloaded have the .rar extension:
emotions.rar, scene.rar, and yeast.rar. I downloaded them from: http://mulan.sourceforge.net/datasets-mlc.html

You can, but you first need to know the path where the dataset was stored.
To find it, load a dataset once and read the path from it. The path does not change, so you only need to do this once. Knowing the path, you can then load whatever you want offline.
Example:
from sklearn.datasets import load_iris
import pandas as pd, os
#get the path
path = load_iris()['filename']
print(path)
#offline load
df = pd.read_csv(path)
#the path: THIS IS WHAT YOU NEED
main_path_with_datasets = os.path.dirname(path)
Once you have main_path_with_datasets, you can use it to list and load all the datasets available there:
os.listdir(main_path_with_datasets)
['digits.csv.gz',
'wine_data.csv',
'diabetes_target.csv.gz',
'iris.csv',
'breast_cancer.csv',
'diabetes_data.csv.gz',
'linnerud_physiological.csv',
'linnerud_exercise.csv',
'boston_house_prices.csv']
EDIT for skmultilearn
from skmultilearn.dataset import load_dataset_dump
path = 'C:\\Users\\myname\\scikit_ml_learn_data\\'
X, y, feature_names, label_names = load_dataset_dump(path + 'emotions-train.scikitml.bz2')

Strange plot by using sklearn.linear_model

I typically use MATLAB, but want to push myself to learn something about Python. I tried a linear regression example introduced by a YouTuber. Here is the code:
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
#read data
dataframe = pd.read_fwf('brain_body.txt')
x_values = dataframe[['Brain']]
y_values = dataframe[['Body']]
#train model on data
body_reg = linear_model.LinearRegression()
body_reg.fit(x_values,y_values)
#visualize results
plt.scatter(x_values,y_values)
plt.plot(x_values,body_reg.predict(x_values))
plt.show()
But I ended up with a very strange plot (I use Python 3.6):
[image 1]
Here is part of the details:
[image 2]
Apparently, something is missing or wrong.
The data of brain_body.txt can be found in https://github.com/llSourcell/linear_regression_demo/blob/master/brain_body.txt
Any suggestion or advice is welcome.
Update
I tried sera's code, and here is what I get:
[image 3]
It's funny and weird. It occurred to me that something might be wrong with my data file, or that something is missing in my Python installation, but I just copied and pasted the raw data into Notepad and saved it as .txt. I tried Python 3.6 and 2.7 as well as PyCharm and Spyder, so I have no idea...
BTW, the youtube video is here
@sascha @Moritz @sera I asked my friend to run the same code and data file, and everything is fine. In other words, there is something wrong with my Python and I don't know why. Let me try another computer and/or try an earlier version of Python.
I tried, but nothing changed. Here are two different approaches I used to install Python:
1. Install Python (e.g. ver. 3.6); install Pycharm; install packages Pandas, scikit-learn...
2. Install Anaconda
Solved
Thanks to @Marc Bataillou for the suggestion. The problem is associated with the matplotlib version: it appears in version 2.1.0. I tried 2.0.2 and found that the original code works fine in the older version, so apparently something changed between 2.0.2 and 2.1.0. Thanks for all your efforts.

You should use
plt.scatter(x_values.values,y_values.values)
instead of
plt.scatter(x_values,y_values)
I hope it works !
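The suggestion above amounts to handing matplotlib plain NumPy arrays rather than single-column DataFrames. A minimal sketch of the whole pipeline written that way follows; the numbers are made-up stand-ins (not the real brain_body.txt data), and the Agg backend and temporary output path are assumptions so the example runs anywhere.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; not needed interactively
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import linear_model

# Made-up stand-in values, NOT the real brain_body.txt data
dataframe = pd.DataFrame({"Brain": [1.0, 2.0, 3.0, 4.0, 5.0],
                          "Body": [2.1, 3.9, 6.2, 7.8, 10.1]})
x_values = dataframe[["Brain"]]
y_values = dataframe[["Body"]]

# Fit on the DataFrames, exactly as in the question
body_reg = linear_model.LinearRegression()
body_reg.fit(x_values, y_values)

# Convert to flat NumPy arrays before plotting
x = x_values.values.ravel()
y = y_values.values.ravel()
y_hat = body_reg.predict(x_values).ravel()

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.plot(x, y_hat)

# Hypothetical output path in a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "brain_body_fit.png")
fig.savefig(out_path)
plt.close(fig)
```

Flattening with ravel() also sidesteps any ambiguity about how a two-dimensional, single-column input is interpreted by the plotting calls.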

You can visualize the results using the following code. I use cross-validation for the predictions. If the model were perfect, all the dots would lie on the plotted line.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_predict
from sklearn import linear_model
#read data
dataframe = pd.read_fwf('brain_body.txt')
x_values = dataframe[['Brain']]
y_values = dataframe[['Body']]
#model on data
body_reg = linear_model.LinearRegression()
# cross_val_predict returns an array of the same size as `y` where each entry
# is a prediction obtained by cross validation:
predicted = cross_val_predict(body_reg, x_values, y_values, cv=10)
fig, ax = plt.subplots()
ax.scatter(y_values, predicted, edgecolors=(0, 0, 0))
ax.plot([y_values.min(), y_values.max()], [y_values.min(), y_values.max()], 'k--', lw=4)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show()
Results:
Data
https://ufile.io/p7x0r

Display graph without saving using pydot

I am trying to display a simple graph using pydot.
My question is: is there any way to display the graph without writing it to a file? Currently I use the write function to draw it first and then have to use the Image module to show the file.
Is there any way to get the graph displayed directly on the screen without it being saved?
As an update to this same question: although the image is saved very quickly, when I use the show command of the Image module it takes a noticeable time for the image to appear. Sometimes I also get an error that the image couldn't be opened because it was either deleted or saved in an unavailable location, which is not correct, as I am saving it to my Desktop. Does anyone know what's happening, and is there a faster way to get the image loaded?
Thanks a lot....

Here's a simple solution using IPython:
from IPython.display import Image, display
def view_pydot(pdot):
    # render the pydot graph to PNG bytes and show them inline
    img = Image(pdot.create_png())
    display(img)
Example usage:
import networkx as nx
to_pdot = nx.drawing.nx_pydot.to_pydot
pdot = to_pdot(nx.complete_graph(5))
view_pydot(pdot)

You can render the image from pydot by calling GraphViz's dot without writing any files to the disk. Then just plot it. This can be done as follows:
import io
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import networkx as nx
# create a `networkx` graph
g = nx.MultiDiGraph()
g.add_nodes_from([1,2])
g.add_edge(1, 2)
# convert from `networkx` to a `pydot` graph
pydot_graph = nx.drawing.nx_pydot.to_pydot(g)
# render the `pydot` by calling `dot`, no file saved to disk
png_str = pydot_graph.create_png(prog='dot')
# treat the DOT output as an image file
sio = io.BytesIO()
sio.write(png_str)
sio.seek(0)
img = mpimg.imread(sio)
# plot the image
imgplot = plt.imshow(img, aspect='equal')
plt.show()
This is particularly useful for directed graphs.
See also this pull request, which introduces such capabilities directly to networkx.

Based on this answer (how to show images in python), here's a few lines:
gr = ... <pydot.Dot instance> ...
import tempfile
from PIL import Image # with Pillow, the old top-level "import Image" no longer works
fout = tempfile.NamedTemporaryFile(suffix=".png")
gr.write(fout.name,format="png")
Image.open(fout.name).show()
Image is from the Python Imaging Library

IPython.display.SVG method embeds an SVG into the display and can be used to display graph without saving to a file.
Here, keras.utils.model_to_dot is used to convert a Keras model to dot format.
from IPython.display import SVG
from tensorflow import keras
#Create a keras model.
model = keras.models.Sequential()
model.add(keras.layers.Dense(units=2, input_shape=(2,1), activation='relu'))
model.add(keras.layers.Dense(units=1, activation='relu'))
#model visualization
SVG(keras.utils.model_to_dot(model).create(prog='dot', format='svg'))

This worked for me inside a Python 3 shell (requires the Pillow package):
import pydot
from PIL import Image
from io import BytesIO
graph = pydot.Dot(graph_type="digraph")
node = pydot.Node("Hello pydot!")
graph.add_node(node)
Image.open(BytesIO(graph.create_png())).show()
You can also add a method called _repr_html_ to an object with a pydot graph member to render a nice crisp SVG inside a Jupyter notebook:
class MyClass:
    def __init__(self, graph):
        self.graph = graph
    def _repr_html_(self):
        return self.graph.create_svg().decode("utf-8")

I'm afraid pydot uses graphviz to render the graphs. I.e., it runs the executable and loads the resulting image.
Bottom line - no, you cannot avoid creating the file.

It works well with AGraph Class as well
https://pygraphviz.github.io/documentation/latest/reference/agraph.html#pygraphviz.AGraph.draw
If path is None, the result is returned as a Bytes object.
So, just omit this argument to return the image data without saving it to disk
Using
from networkx.drawing.nx_agraph import graphviz_layout, to_agraph
g = nx.Graph()
...
A = to_agraph(g)
A.draw()
https://networkx.org/documentation/stable/reference/drawing.html#module-networkx.drawing.nx_agraph
In order to show the resulting image which is saved as Bytes object:
# create image without saving to disk
img = A.draw(format='png')
image = Image.open(BytesIO(img))
image.show(title="Graph")
This needs the following imports:
from PIL import Image
from io import BytesIO
