Writing create_tree_digraph plot to a png file in Python

I want to save a tree from my LightGBM model as a .png file. I have tried two plotting functions from the LightGBM API: plot_tree and create_tree_digraph.
import lightgbm as lgb
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
clf = lgb.LGBMClassifier()
clf.fit(X, y)
When I use plot_tree, it displays the tree but in place of values there are small blank boxes
lgb.plot_tree(clf, tree_index=0)
When I try create_tree_digraph, I get the graph but I can't save it as it is.
graph_b = lgb.create_tree_digraph(clf)
I used the code below to save it to a file, but the saved image looks like the first plot (the one from plot_tree).
import graphviz
s = graphviz.Source(graph_b.source, filename = "test1.gv", format = "png")
s.view()
Any suggestions for saving the plot as an image? I ultimately want to write these tree plots to Excel.
I am using graphviz version 0.8.3
Thanks,

I suppose you are using a Jupyter notebook or something similar.
This worked for me, though it is surely not the best way.
ax = lgb.create_tree_digraph(clf)
with open('fst.svg', 'w') as f:
    f.write(ax._repr_svg_())
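Since the question asks for a PNG rather than an SVG, here is a minimal sketch of one way to get it. It assumes the object returned by create_tree_digraph is a graphviz Digraph, which has a pipe() method, and that the Graphviz binaries are installed. The helper name save_graph_png and the file name tree0.png are made up for illustration.

```python
from pathlib import Path


def save_graph_png(graph, path):
    """Render a graphviz object to PNG bytes and write them to `path`."""
    # graphviz.Digraph and graphviz.Source both provide pipe(format=...)
    png_bytes = graph.pipe(format="png")
    out = Path(path)
    out.write_bytes(png_bytes)
    return out


# Usage with the question's model (needs lightgbm and the Graphviz binaries):
#   graph = lgb.create_tree_digraph(clf, tree_index=0)
#   save_graph_png(graph, "tree0.png")
```

The resulting file is an ordinary image, so it can then be embedded in an Excel sheet with whatever spreadsheet library you already use.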

Related

xgboost.plot_tree shows - Empty characters/boxes/blocks as labels

SITUATION
When I plot xgboost.plot_tree I get a bunch of empty characters/boxes/blocks on the graph only instead of the titles, labels and numbers. I use more than 400 features so that can be a contributing factor for this.
CODE 1
fig, ax = plt.subplots(figsize=(170, 170))
plot_tree(xgbmodel, ax=ax)
plt.savefig("temp.pdf")
plt.show()
CODE 2
plot_tree(xgbmodel, num_trees=2)
fig = plt.gcf()
fig.set_size_inches(150, 100)
fig.savefig('tree.png')
ERROR
Both Code 1 and Code 2 produce the same image.
This is just a crop of the whole tree, which is too big to upload here, but the tree shape looks perfect.
SOLUTIONS I have Tried
This has problem with plotting, I can plot without any problem - Plot a Single XGBoost Decision Tree
This has other issues - xgboost.plot_tree: binary feature interpretation
I ran the code that @jared_mamrot gave me and it produced the same error. I restarted and cleaned my environment and ran only this code first, in the same notebook.
GitHub recommendation: model.get_booster().get_dump(dump_format='text') printed a bit more than 200,000 characters (about 63 A4 pages in 11 pt Calibri), and the output looks perfectly correct, e.g.: 0.0268656723\n\t\t\t\t\t34:[f0<6.5] yes=53,no=54,missing=53\n\t\t\t\t\t\. Is it possible that I have this issue because that much text cannot be displayed in a graph of normal size?
I wasn't able to reproduce your error. Can you please add more details to your question and confirm that this code works? link to pima-indians-diabetes.csv
#!/usr/bin/env python3
# plot decision tree
from numpy import loadtxt
from xgboost import XGBClassifier
from xgboost import plot_tree
import matplotlib.pyplot as plt
import graphviz
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
y = dataset[:,8]
# fit model on training data
model = XGBClassifier()
model.fit(X, y)
# plot/save fig
fig, ax = plt.subplots(figsize=(170, 170))
plot_tree(model, ax=ax)
plt.savefig("test.pdf")
Edit per comment:
I can't reproduce this issue/error. No matter which package version / char encoding / line endings / etc my notebook always renders the text correctly. The only thing I can suggest is installing a new virtual environment (e.g. miniconda) with current versions of the required packages (conda install notebook numpy matplotlib xgboost graphviz python-graphviz) and testing it again.
Also, make sure you don't have windows line endings (see: Matplotlib plotting some characters as blank square / https://github.com/jupyterlab/jupyterlab/issues/1104 / https://github.com/jupyterlab/jupyterlab/issues/3718 / https://github.com/jupyterlab/jupyterlab/pull/3882 ) and specify the font you are using (e.g. How to change fonts in matplotlib (python)?):
# plot decision tree
from numpy import loadtxt
from xgboost import XGBClassifier
from xgboost import plot_tree
from matplotlib.font_manager import FontProperties
import matplotlib.pyplot as plt
import graphviz
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
y = dataset[:,8]
# fit model on training data
model = XGBClassifier()
model.fit(X, y)
# plot/save fig
prop = FontProperties()
prop.set_file('Arial.ttf')
fig, ax = plt.subplots(figsize=(170, 170))
plot_tree(model, ax=ax, fontproperties=prop)
plt.savefig("test.png")
fig.show()
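The line-ending check suggested above can be scripted rather than done by eye. This is only a sketch using the standard library. The function names has_crlf and to_lf are made up for illustration, and it is not established that line endings are the cause of the blank boxes in this case.

```python
from pathlib import Path


def has_crlf(path):
    """Return True if the file contains Windows (CRLF) line endings."""
    return b"\r\n" in Path(path).read_bytes()


def to_lf(path):
    """Rewrite the file in place with Unix (LF) line endings."""
    p = Path(path)
    p.write_bytes(p.read_bytes().replace(b"\r\n", b"\n"))
```

For example, has_crlf('pima-indians-diabetes.csv') tells you whether the data file was saved with Windows line endings, and to_lf converts it if so.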
I moved my whole environment from an AWS EC2 instance to a local machine, and then it ran perfectly. The AWS EC2 instance did some other weird things too, such as not allowing extensions in JupyterLab. Both machines run Ubuntu 20.04 LTS.

.plot() command does not display anything

I have this code based on this question (Extract constrained polygon using OSMnx), just with a different point.
I am trying to plot the block in which the point is located, but nothing is displayed: the script just prints "Done" and I cannot see any image.
import osmnx as ox
import geopandas as gpd
import shapely
point = (50.090464, 14.400070)
streets_graph = ox.graph_from_point(point, distance=500, network_type='drive')
streets_graph = ox.project_graph(streets_graph)
streets = ox.save_load.graph_to_gdfs(streets_graph, nodes=False, edges=True,
                                    node_geometry=False, fill_edge_geometry=True)
point = streets.unary_union.centroid
polygons = shapely.ops.polygonize(streets.geometry)
polygons = gpd.GeoSeries(polygons)
target = polygons.loc[polygons.contains(point)]
target_streets = streets.loc[streets.intersection(target.iloc[0]).type == 'MultiLineString']
ax = target_streets.plot()
gpd.GeoSeries([point]).plot(ax=ax, color='r')
print("Done")
I'm not sure whether it matters, but I am using Visual Studio Code.
Thank you very much
Since my comment answered your question, I will summarize it here for other people:
When using a plotting library that depends on matplotlib, such as geopandas or seaborn, you need matplotlib itself in order to show the plot. How you set it up depends on whether you are using Jupyter or a plain script (.py) file.
In Jupyter, enable inline plots with this magic command:
%matplotlib inline
In a plain script (.py) file, import pyplot:
import matplotlib.pyplot as plt
Then, when you want to show your plot, call
plt.show()
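If no window appears even then (for example in a remote or headless session), writing the figure to a file is a backend-independent way to check that the plot was actually drawn. This is a minimal sketch. The Agg backend, the placeholder data and the file name block_check.png are all assumptions chosen so it runs anywhere, and it does not use the OSMnx streets from the question.

```python
import matplotlib

# Assumption: force a non-interactive backend so the sketch runs without a display.
# Remove this line in a normal desktop session to get a window from plt.show().
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])        # placeholder line instead of the street edges
ax.plot([1], [1], "o", color="r")    # placeholder for the red point
fig.savefig("block_check.png")       # hypothetical file name
plt.close(fig)
```

With geopandas the same idea applies: target_streets.plot() returns a matplotlib Axes, so ax.figure.savefig(...) writes the figure it belongs to.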
Hope it helps!

Printing image in r-markdown using matplotlib and python code

I am trying to run the python code using the R-Markdown file (RMarkdown to pdf).
What I achieved till now -
1- I am able to configure my python engine using knitr and reticulate library
2- I am able to execute my python codes.
What I tried -
1- I tried all the methods which are discussed in this forum, but nothing is working out.
2- I also tried to save the image,(as one of the posts here suggests), but that also is not working.
My problem -
1- When I plot a graph using matplotlib with plt.imshow() and plt.show(), the image is not printed in the output. Instead, it is shown in a separate window. You can see my results in the attached image.
Result_of_my_code
Here is my code
```{r setup, include=FALSE}
library(knitr)
library(reticulate)
knitr::knit_engines$set(python = reticulate::eng_python)
```
```{python}
import numpy as np
import os
import torch
import torchvision.datasets as dsets
import matplotlib.pyplot as plt
print(os.getcwd())
os.chdir('D:\\1st year\\Python codes\\CIFR Analysis\\self contained analysis')
print(os.getcwd())
train_mnist = dsets.MNIST("../data", train=True)
test_mnist = dsets.MNIST("../data", train= False)
print(len(train_mnist))
#print(train_mnist[0][0])
plt.imshow(train_mnist[0][0], cmap="gray")
#plt.savefig("trainzero.png")
plt.show()
```
Kindly, help me to fix this issue, as I want to compile my python codes using the R markdown file.
thanks
So with R Markdown, you have to do some things a little differently. In the following, I have a dataframe with two series created by concatenating them. The original plotting code in the Jupyter Notebook is as follows and just printed out the series.
# make a plot of model fit
train.plot(figsize=(16,8), legend=True)
backtest.plot(legend=True);
However, it does not work this way in R Markdown. There you always have to assign the plots to variables; with the code below, you get the same plot.
dfreg = pd.concat([reg, backtest], axis = 1)
ax = dfreg.plot(figsize=(16,8), legend = True)
ax1 = predictions.plot(legend=True)
plt.show()
This is common with other plotting functions like plot_acf() too.

Strange plot by using sklearn.linear_model

I typically use MATLAB, but want to push myself to learn some Python. I tried a linear regression example introduced by a YouTuber. Here is the code:
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
#read data
dataframe = pd.read_fwf('brain_body.txt')
x_values = dataframe[['Brain']]
y_values = dataframe[['Body']]
#train model on data
body_reg = linear_model.LinearRegression()
body_reg.fit(x_values,y_values)
#visualize results
plt.scatter(x_values,y_values)
plt.plot(x_values,body_reg.predict(x_values))
plt.show()
But I ended up with a very strange plot (I use Python 3.6):
1
here is part of details:
2
Apparently, something is missing or wrong.
The data of brain_body.txt can be found in https://github.com/llSourcell/linear_regression_demo/blob/master/brain_body.txt
Any suggestion or advice is welcome.
Update
I tried sera's code, and here is what I get:
3
It's funny and weird. It occurred to me that something might be wrong with my data file, or that something is missing in my Python installation, but I just copied and pasted the raw data into Notepad and saved it as .txt. I tried Python 3.6 and 2.7 as well as PyCharm and Spyder, so I have no idea...
BTW, the youtube video is here
@sascha @Moritz @sera I asked my friend to run the same code and data file, and everything is fine. In other words, there is something wrong with my Python and I don't know why. Let me try another computer and/or try an earlier version of Python.
I tried, but nothing changed. Here are two different approaches I used to install Python:
1. Install Python (e.g. ver. 3.6); install Pycharm; install packages Pandas, scikit-learn...
2. Install Anaconda
Solved
Thanks to @Marc Bataillou's suggestion. This problem is associated with different versions of matplotlib. It appears in version 2.1.0; I tried 2.0.2 and found that the original code works fine in the older version, so apparently something changed between 2.0.2 and 2.1.0. Thanks for all your efforts.
You should use
plt.scatter(x_values.values,y_values.values)
instead of
plt.scatter(x_values,y_values)
I hope it works!
You can visualize the results using the following code. I use cross validation for the predictions. If the model was perfect, then all the dots would be on the plotted line.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_predict
from sklearn import linear_model
#read data
dataframe = pd.read_fwf('brain_body.txt')
x_values = dataframe[['Brain']]
y_values = dataframe[['Body']]
#model on data
body_reg = linear_model.LinearRegression()
# cross_val_predict returns an array of the same size as `y` where each entry
# is a prediction obtained by cross validation:
predicted = cross_val_predict(body_reg, x_values, y_values, cv=10)
fig, ax = plt.subplots()
ax.scatter(y_values, predicted, edgecolors=(0, 0, 0))
ax.plot([y_values.min(), y_values.max()], [y_values.min(), y_values.max()], 'k--', lw=4)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show()
Results:
Data
https://ufile.io/p7x0r

Visualizing decision tree in scikit-learn

I am trying to design a simple Decision Tree using scikit-learn in Python (I am using Anaconda's Ipython Notebook with Python 2.7.3 on Windows OS) and visualize it as follows:
from pandas import read_csv, DataFrame
from sklearn import tree
from os import system
data = read_csv('D:/training.csv')
Y = data.Y
X = data.ix[:,"X0":"X33"]
dtree = tree.DecisionTreeClassifier(criterion = "entropy")
dtree = dtree.fit(X, Y)
dotfile = open("D:/dtree2.dot", 'w')
dotfile = tree.export_graphviz(dtree, out_file = dotfile, feature_names = X.columns)
dotfile.close()
system("dot -Tpng D:.dot -o D:/dtree2.png")
However, I get the following error:
AttributeError: 'NoneType' object has no attribute 'close'
I use the following blog post as reference: Blogpost link
The following stackoverflow question doesn't seem to work for me as well: Question
Could someone help me with how to visualize the decision tree in scikit-learn?
Here is a one-liner for those who are using Jupyter and sklearn (0.18.2+). You don't even need matplotlib for it; the only requirement is graphviz.
pip install graphviz
Then run the following (as in the question's code, X is a pandas DataFrame):
from graphviz import Source
from sklearn import tree
Source( tree.export_graphviz(dtreg, out_file=None, feature_names=X.columns))
This will display it in SVG format. The code above produces a Graphviz Source object (source code, not scary) that is rendered directly in Jupyter.
Some things you are likely to do with it:
Display it in Jupyter:
from IPython.display import SVG
graph = Source( tree.export_graphviz(dtreg, out_file=None, feature_names=X.columns))
SVG(graph.pipe(format='svg'))
Save as png:
graph = Source( tree.export_graphviz(dtreg, out_file=None, feature_names=X.columns))
graph.format = 'png'
graph.render('dtree_render',view=True)
Get the png image, save it and view it:
graph = Source( tree.export_graphviz(dtreg, out_file=None, feature_names=X.columns))
png_bytes = graph.pipe(format='png')
with open('dtree_pipe.png','wb') as f:
    f.write(png_bytes)
from IPython.display import Image
Image(png_bytes)
If you are going to play with that lib here are the links to examples and userguide
sklearn.tree.export_graphviz returns None when it is given an out_file to write to (it only returns the DOT source as a string when out_file=None).
By doing dotfile = tree.export_graphviz(...) you overwrite your open file object, which had been previously assigned to dotfile, so you get an error when you try to close the file (as it's now None).
To fix it change your code to
...
dotfile = open("D:/dtree2.dot", 'w')
tree.export_graphviz(dtree, out_file = dotfile, feature_names = X.columns)
dotfile.close()
...
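A variation on the fix above, sketched under a few assumptions: it uses the iris data as a stand-in for D:/training.csv, a relative path instead of D:/dtree2.dot, and a recent scikit-learn in which out_file=None returns the DOT source as a string. The with-block closes the file for you, so there is no handle to overwrite or close by hand.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Stand-in data; the question reads its own CSV instead.
iris = load_iris()
dtree = DecisionTreeClassifier(criterion="entropy", random_state=0)
dtree.fit(iris.data, iris.target)

# With out_file=None the DOT source is returned as a string.
dot_text = export_graphviz(dtree, out_file=None,
                           feature_names=iris.feature_names)

# The file is closed automatically when the with-block ends.
with open("dtree2.dot", "w") as dotfile:
    dotfile.write(dot_text)
```

The .dot file can then be converted with the dot command line tool exactly as in the question.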
If, like me, you have a problem installing graphviz, you can visualize the tree as follows:
1. Export it with export_graphviz as shown in previous answers
2. Open the .dot file in a text editor
3. Copy the contents and paste them at webgraphviz.com
Scikit learn recently introduced the plot_tree method to make this very easy (new in version 0.21 (May 2019)). Documentation here.
Here's the minimum code you need:
import matplotlib.pyplot as plt
from sklearn import tree
plt.figure(figsize=(40,20)) # customize according to the size of your tree
_ = tree.plot_tree(your_model_name, feature_names = X.columns)
plt.show()
plot_tree supports some arguments to beautify the tree. For example:
import matplotlib.pyplot as plt
from sklearn import tree
plt.figure(figsize=(40,20))
_ = tree.plot_tree(your_model_name, feature_names = X.columns,
                   filled=True, fontsize=6, rounded = True)
plt.show()
If you want to save the picture to a file, add the following line before plt.show():
plt.savefig('filename.png')
If you want to view the rules in text format, there's an answer here. It's more intuitive to read.
Alternatively, you could try using pydot to produce the png file from dot:
...
tree.export_graphviz(dtreg, out_file='tree.dot') # produces dot file
import pydot
from io import StringIO
dotfile = StringIO()
tree.export_graphviz(dtreg, out_file=dotfile)
# pydot 1.2+ returns a list of graphs, so take the first one
graphs = pydot.graph_from_dot_data(dotfile.getvalue())
graphs[0].write_png("dtree2.png")
...
The following also works fine:
from sklearn.datasets import load_iris
iris = load_iris()
# Model (can also use single decision tree)
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=10)
# Train
model.fit(iris.data, iris.target)
# Extract single tree
estimator = model.estimators_[5]
from sklearn.tree import export_graphviz
# Export as dot file
export_graphviz(estimator, out_file='tree.dot',
                feature_names = iris.feature_names,
                class_names = iris.target_names,
                rounded = True, proportion = False,
                precision = 2, filled = True)
# Convert to png using system command (requires Graphviz)
from subprocess import call
call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600'])
# Display in jupyter notebook
from IPython.display import Image
Image(filename = 'tree.png')
You can find the source here
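The dot call above fails with a fairly opaque error when the Graphviz executable is missing, which happens easily because pip install graphviz only installs the Python bindings. A small sketch of a wrapper that checks for the executable first is below. The function names build_dot_command and dot_to_png are made up for illustration.

```python
import shutil
import subprocess


def build_dot_command(dot_path, png_path, dpi=600):
    """Return the argument list for converting a .dot file to PNG."""
    return ["dot", "-Tpng", str(dot_path), "-o", str(png_path), f"-Gdpi={dpi}"]


def dot_to_png(dot_path, png_path, dpi=600):
    """Run Graphviz `dot`, failing early with a clear message if it is missing."""
    if shutil.which("dot") is None:
        raise RuntimeError(
            "Graphviz 'dot' executable not found on PATH; "
            "install Graphviz itself (e.g. via apt, brew or conda)"
        )
    # check=True raises CalledProcessError if dot exits with an error
    subprocess.run(build_dot_command(dot_path, png_path, dpi), check=True)
```

Calling dot_to_png('tree.dot', 'tree.png') then does the same as the call(...) line above.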
I copied part of your code and changed it as below:
from pandas import read_csv, DataFrame
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from os import system
data = read_csv('D:/training.csv')
Y = data.Y
X = data.loc[:,"X0":"X33"]  # .ix was removed in pandas 1.0; .loc does the same label-based slice
dtree = tree.DecisionTreeClassifier(criterion = "entropy")
dtree = dtree.fit(X, Y)
After making sure you have dtree (that is, the code above runs without errors), add the code below to visualize the decision tree:
Remember to install graphviz first: pip install graphviz
import graphviz
from graphviz import Source
dot_data = tree.export_graphviz(dtree, out_file=None, feature_names=X.columns)
graph = graphviz.Source(dot_data)
graph.render("name of file",view = True)
I tried it with my data; the visualization worked well and a PDF file opened immediately.
You can copy the contents of the export_graphviz file and you can paste the same in the webgraphviz.com site.
You can check out the article on How to visualize the decision tree in Python with graphviz for more information.
A simple way, found here, uses pydotplus (graphviz must be installed):
from IPython.display import Image
from sklearn import tree
import pydotplus # installing pyparsing may be needed
...
dot_data = tree.export_graphviz(best_model, out_file=None, feature_names = X.columns)
graph = pydotplus.graph_from_dot_data(dot_data)
Image(graph.create_png())
Here is the minimal code to get a nice-looking graph in just a few lines of code:
import matplotlib.pyplot as plt
from sklearn import tree
import pydotplus
# out_file=None makes export_graphviz return the DOT source as a string
dot_data=tree.export_graphviz(dt,out_file=None,filled=True,rounded=True)
graph=pydotplus.graph_from_dot_data(dot_data)
graph.write_png('tree.png')
plt.imshow(plt.imread('tree.png'))
I just added the plt.imshow call to view the graph in a Jupyter Notebook. You can leave it out if you are only interested in saving the png file.
I installed the following dependencies:
pip3 install graphviz
pip3 install pydotplus
On macOS the pip version of Graphviz did not work. Following Graphviz's official documentation, I installed it with brew and everything worked fine.
brew install graphviz
If you run into issues with grabbing the source .dot directly you can also use Source.from_file like this:
from graphviz import Source
from sklearn import tree
tree.export_graphviz(dtreg, out_file='tree.dot', feature_names=X.columns)
Source.from_file('tree.dot')
