I want to visualize the decision tree classifier fitted on my dataset and see how it branches out. After several Google searches, I found the link given below. I can follow the code up to this line: "f = tree.export_graphviz(clf, out_file=f)".
from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
with open("iris.dot", 'w') as f:
    f = tree.export_graphviz(clf, out_file=f)
My question is: after this code, how can I visualize the tree? According to the link "http://scikit-learn.org/stable/modules/tree.html", I have to use "dot -Tpdf iris.dot -o iris.pdf" to create a PDF file. I don't understand where I should run this. In Graphviz's dot tool? If so, I get this error: "Error: : syntax error in line 1 near 'dot' "
I would be grateful if anybody could answer my question. Thanks.
dot -Tpdf iris.dot -o iris.pdf is a command you run in a shell such as bash, not inside the dot tool itself. You also need the Graphviz tools installed. For example, on Ubuntu you can install them with: sudo apt-get install graphviz
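If you would rather stay in Python than switch to a terminal, the same command can be launched with the standard library. This is only a sketch: dot_command and render_dot are hypothetical helper names, and it assumes Graphviz is already installed and dot is on your PATH.

```python
import subprocess


def dot_command(dot_path, out_path, fmt="pdf"):
    """Build the argument list equivalent to: dot -T<fmt> <dot_path> -o <out_path>."""
    return ["dot", f"-T{fmt}", dot_path, "-o", out_path]


def render_dot(dot_path, out_path, fmt="pdf"):
    """Run Graphviz's dot; raises FileNotFoundError if dot is not installed or not on PATH."""
    subprocess.run(dot_command(dot_path, out_path, fmt), check=True)
    return out_path
```

Calling render_dot("iris.dot", "iris.pdf") would then do the same thing as typing the command in bash.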
According to the link "http://scikit-learn.org/stable/modules/tree.html", if the Python module pydotplus is installed, we can generate a PDF file (or any other supported file type) directly in Python:
import pydotplus
dot_data = tree.export_graphviz(clf, out_file=None)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_pdf("iris.pdf")
In response to Shelly's comment, here is the complete code that I ran in my IPython notebook.
import pydotplus
from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
with open("iris.dot", 'w') as f:
    f = tree.export_graphviz(clf, out_file=f)
dot_data = tree.export_graphviz(clf, out_file=None)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_pdf("iris.pdf")
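If installing Graphviz is inconvenient, a plain-text rendering may be enough to see how the tree branches. The sketch below is an alternative to the PDF route, not part of the answer above; it assumes scikit-learn 0.21 or later, which provides sklearn.tree.export_text.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# export_text returns the fitted tree's split rules as an indented string,
# so no Graphviz installation is involved
report = export_text(clf, feature_names=list(iris.feature_names))
print(report)
```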
Related
I'm trying to build a decision tree in Python using sklearn.
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import graphviz
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target,
                                                    stratify=cancer.target, random_state=42)
tree = DecisionTreeClassifier(random_state=0, max_depth=4)
tree.fit(X_train, y_train)
export_graphviz(tree, out_file=r"C:\Users\obaro\OneDrive\Documents\tree.dot", class_names=["malignant", "benign"],
                feature_names=cancer.feature_names, impurity=False, filled=True)
with open(r"C:\Users\obaro\OneDrive\Documents\tree.dot") as f:
    dot_graph = f.read()
display(graphviz.Source(dot_graph))
When I try to run this code in Jupyter, though, I get a FileNotFound error and an ExecutableNotFound error. At first, I tried using a relative path, but that didn't work, so I tried using an absolute path. The file WAS created and is in my current home directory so I'm not sure what's going on here. Any help would be appreciated, thanks.
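One way to narrow this down is to check the two things separately: whether the .dot file is where Python is looking, and whether the dot executable can be found. With the graphviz Python package, ExecutableNotFound usually means the Graphviz binaries are not on PATH. The helper below is a hypothetical diagnostic sketch that uses only the standard library.

```python
import shutil
from pathlib import Path


def diagnose(dot_file):
    """Report whether the DOT file exists and where (if anywhere) dot is found on PATH."""
    return {
        "dot_file_exists": Path(dot_file).is_file(),
        # shutil.which returns the full path to the executable, or None if it is missing
        "dot_executable": shutil.which("dot"),
    }
```

If "dot_executable" comes back as None, the Graphviz software itself (not just the Python package) is missing or not on PATH.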
If I run
from sklearn.datasets import load_breast_cancer
import lightgbm as lgb
breast_cancer = load_breast_cancer()
data = breast_cancer.data
target = breast_cancer.target
params = {
    "task": "convert_model",
    "convert_model_language": "cpp",
    "convert_model": "test.cpp",
}
gbm = lgb.train(params, lgb.Dataset(data, target))
then I was expecting that a file called test.cpp would be created, with the model saved in c++ format.
However, nothing appears in my current directory.
I have read the documentation (https://lightgbm.readthedocs.io/en/latest/Parameters.html#io-parameters), but can't tell what I'm doing wrong.
Here's a real 'for dummies' answer:
Install the CLI version of lightgbm: https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html
Make note of your installation path, and find the executable. For example, for me, this was ~/LightGBM/lightgbm.
Run the following in a Jupyter notebook:
from sklearn.datasets import load_breast_cancer
import pandas as pd
breast_cancer = load_breast_cancer()
data = pd.DataFrame(breast_cancer.data)
target = pd.DataFrame(breast_cancer.target)
pd.concat([target, data], axis=1).to_csv("regression.train", header=False, index=False)
train_conf = """
task = train
objective = binary
metric = auc
data = regression.train
output_model = trained_model.txt
"""
with open("train.conf", "w") as f:
    f.write(train_conf)
conf_convert = """
task = convert_model
input_model= trained_model.txt
"""
with open("convert.conf", "w") as f:
    f.write(conf_convert)
! ~/LightGBM/lightgbm config=train.conf
! ~/LightGBM/lightgbm config=convert.conf
Your model will be saved in your current directory.
In the doc they say:
Note: can be used only in CLI version
under the convert_model and convert_model_language parameters.
That means you should probably use the CLI (command-line interface) of LightGBM instead of the Python wrapper to do this.
Link to Quick Start CLI version.
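Since the CLI config files are just "key = value" lines, they can be generated from dictionaries and the CLI launched from a plain Python script instead of notebook "!" commands. This is a minimal sketch: to_conf and run_lightgbm are hypothetical helper names, the executable location is whatever your own installation uses, and convert_model_language / convert_model are the parameters from the question.

```python
import os
import subprocess


def to_conf(params):
    """Render a dict as LightGBM CLI config text, one 'key = value' per line."""
    return "\n".join(f"{key} = {value}" for key, value in params.items()) + "\n"


def run_lightgbm(executable, conf_path):
    """Invoke the LightGBM CLI with a config file; the executable path is an assumption."""
    # expanduser is needed because subprocess does not expand "~" the way a shell does
    subprocess.run([os.path.expanduser(executable), f"config={conf_path}"], check=True)


train_conf = to_conf({
    "task": "train",
    "objective": "binary",
    "metric": "auc",
    "data": "regression.train",
    "output_model": "trained_model.txt",
})

convert_conf = to_conf({
    "task": "convert_model",
    "input_model": "trained_model.txt",
    "convert_model_language": "cpp",
    "convert_model": "test.cpp",
})
```

After writing each string to a file, run_lightgbm("~/LightGBM/lightgbm", "train.conf") followed by the same call with "convert.conf" would mirror the two notebook commands.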
I'm trying to learn ML, and I am a noob.
Currently, I'm trying this video (https://www.youtube.com/watch?v=tNa99PG8hR8).
And my code is:
import pydotplus
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree
from sklearn.externals.six import StringIO
iris = load_iris()
##print(iris.feature_names)
##print(iris.target_names)
##print(iris.data[0])
##print(iris.target[0])
##for i in range(len(iris.target)):
## print("Example %d: label %s, features %s" % (i, iris.target[i], iris.data[i]))
test_idx = [0,50,100]
##training data
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx, axis=0)
##testing data
test_target = iris.target[test_idx]
test_data = iris.data[test_idx]
clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_target)
print(test_target)
print(clf.predict(test_data))
dot_data = StringIO()
tree.export_graphviz(clf,
                     out_file=dot_data,
                     feature_names=iris.feature_names,
                     class_names=iris.target_names,
                     filled=True, rounded=True,
                     impurity=False)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
But the error shows:
Traceback (most recent call last):
  File "C:\Users\Denis\Desktop\Machine Learning\iris.py", line 42, in <module>
    graph.write_pdf("iris.pdf")
  File "C:\Users\Denis\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydotplus\graphviz.py", line 1810, in <lambda>
    prog=self.prog: self.write(path, format=f, prog=prog)
  File "C:\Users\Denis\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydotplus\graphviz.py", line 1918, in write
    fobj.write(self.create(prog, format))
  File "C:\Users\Denis\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydotplus\graphviz.py", line 1960, in create
    'GraphViz\'s executables not found')
pydotplus.graphviz.InvocationException: GraphViz's executables not found
I've tried reinstalling pydotplus and graphviz, but to no avail.
I have no clue how to change the PATH.
I've searched my graphviz folder, and I found no bin files.
It looks like you installed the graphviz library for Python but not the Graphviz software itself.
You can install it from here; then make sure that the directory containing the dot executable is on your system's PATH.
Good luck on your ML journey! :-)
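If changing the system-wide PATH is awkward, a common workaround is to extend PATH for the current Python process before calling pydotplus. This is a sketch under assumptions: add_to_path is a hypothetical helper, and the directory you pass must be wherever your own Graphviz installation keeps dot (the Windows path in the usage note is only an example).

```python
import os


def add_to_path(directory, env=None):
    """Put a directory at the front of PATH (for this process only) unless already present."""
    env = os.environ if env is None else env
    entries = [entry for entry in env.get("PATH", "").split(os.pathsep) if entry]
    if directory not in entries:
        env["PATH"] = os.pathsep.join([directory] + entries)
    return env["PATH"]
```

For example, add_to_path(r"C:\Program Files\Graphviz\bin") near the top of the script, with the path adjusted to your machine. The change lasts only until the interpreter exits.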
For anyone still facing this issue on Windows 10 even after trying the steps above (i.e. installing the Graphviz software separately): launch CMD as administrator (important!), run dot -c, and then run dot -v.
This fixed the issue for me.
I am trying to run the following code:
# Gaussian Naive Bayes
from sklearn import datasets
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
# load the iris datasets
dataset = datasets.load_iris()
# fit a Naive Bayes model to the data
model = GaussianNB()
model.fit(dataset.data, dataset.target)
print(model)
# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
I have all the prerequisite modules installed, such as numpy, scipy, and scikit-learn.
When I run the code, this error is displayed:
Traceback (most recent call last):
  File "C:/Users/user/PycharmProjects/sentiment-analysis/ldaex.py", line 4, in <module>
    from sklearn import datasets
  File "C:\Python27\lib\site-packages\sklearn\__init__.py", line 57, in <module>
    from .base import clone
  File "C:\Python27\lib\site-packages\sklearn\base.py", line 9, in <module>
    from scipy import sparse
  File "C:\Python27\lib\site-packages\scipy\__init__.py", line 131, in <module>
    raise ImportError("numpy openblaspy flavour needed.")
ImportError: numpy openblaspy flavour needed.
Can anyone tell me what is wrong with my modules?
The same errors are displayed when I try to run the following program:
from sklearn.datasets import fetch_20newsgroups
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics',
              'sci.med']
twenty_train = fetch_20newsgroups(subset='train', categories=categories,
                                  shuffle=True, random_state=42)
print twenty_train.target_names
My code follows Google's machine learning class, and the two programs are the same, so I don't know why mine shows an error. Maybe a variable has the wrong type, but Google's code is identical to mine. Has anyone else had this problem?
This is the error:
[0 1 2]
[0 1 2]
Traceback (most recent call last):
File "/media/joyce/oreo/python/machine_learn/VisualizingADecisionTree.py", line 34, in <module>
graph.write_pdf("iris.pdf")
AttributeError: 'list' object has no attribute 'write_pdf'
[Finished in 0.4s with exit code 1]
[shell_cmd: python -u "/media/joyce/oreo/python/machine_learn/VisualizingADecisionTree.py"]
[dir: /media/joyce/oreo/python/machine_learn]
[path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games]
This is the code:
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
test_idx = [0, 50, 100]
# training data
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx, axis=0)
# testing data
test_target = iris.target[test_idx]
test_data = iris.data[test_idx]
clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_target)
print test_target
print clf.predict(test_data)
# viz code
from sklearn.externals.six import StringIO
import pydot
dot_data = StringIO()
tree.export_graphviz(clf,
                     out_file=dot_data,
                     feature_names=iris.feature_names,
                     class_names=iris.target_names,
                     filled=True, rounded=True,
                     impurity=False)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
I think you are using a newer version of Python. Please try pydotplus instead.
import pydotplus
...
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
This should do it.
pydot.graph_from_dot_data() returns a list, so try:
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph[0].write_pdf("iris.pdf")
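Because pydot returns a list here while pydotplus returns a single graph, a small wrapper lets the rest of the script stay the same with either library. This is a hedged sketch; first_graph is a hypothetical helper name, not part of pydot or pydotplus.

```python
def first_graph(result):
    """Return one graph whether the parser gave back a single graph or a list of graphs."""
    if isinstance(result, (list, tuple)):
        if not result:
            raise ValueError("no graphs were parsed from the DOT data")
        return result[0]
    return result
```

Usage would look like first_graph(pydot.graph_from_dot_data(dot_data.getvalue())).write_pdf("iris.pdf").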
I had exactly the same issue. It turned out that I hadn't installed Graphviz. Once I did that, it started to work.
@Alex Sokolov, in my case on Windows, I downloaded and installed (or unzipped) the following to a folder and then set up the PATH in the Windows environment variables. After that, re-running the Python code worked for me. I hope this is helpful to you.
I installed scikit-learn via conda and none of the above worked.
First, I had to install libtool:
brew install libtool --universal
Then I followed this sklearn guide.
Then I changed the Python file to this code:
clf = clf.fit(train_data, train_target)
tree.export_graphviz(clf,out_file='tree.dot')
Finally, I converted it to PNG in the terminal:
dot -Tpng tree.dot -o tree.png
I tried the previous answers and still got an error when running the script. Therefore,
I just used pydotplus:
import pydotplus
and installed Graphviz using:
sudo apt-get install graphviz
Then I added the following, and it worked for me:
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
Thanks to the previous contributors.
The following works on Python 3.7, but don't forget to install pydot from the Anaconda prompt:
from io import StringIO  # sklearn.externals.six was removed in scikit-learn 0.23
import pydot
# viz code
dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data, feature_names=iris.feature_names,
                     class_names=iris.target_names, filled=True, rounded=True,
                     impurity=False)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph[0].write_pdf('iris.pdf')
I use Anaconda. Here's what worked for me:
run from terminal:
conda install python-graphviz
conda install pydot ## don't forget this <-----------------
Then run
clf = clf.fit(train_data, train_target)
tree.export_graphviz(clf,out_file='tree.dot')
Then from the terminal:
dot -Tpng tree.dot -o tree.png
To export a graph for each of the n_estimators trees in an ensemble, you can do:
for i in range(0, n):  # n is your n_estimators number
    dot_data = StringIO()
    tree.export_graphviz(clf.estimators_[i], out_file=dot_data, feature_names=iris.feature_names,
                         class_names=iris.target_names, filled=True, rounded=True,
                         impurity=False)
    graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("iris%s.pdf" % i)
You could also replace the line
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
for this one
(graph,) = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
and it would still work.
I hope this helps, I was having a similar issue. I decided not to use pydot / pydotplus, but rather graphviz. I modified (barely) the code and it works wonders! :)
# 2. Train classifier
# Testing Data
# Examples used to "test" the classifier's accuracy
# Not part of the training data
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
test_idx = [0, 50, 100]  # Grab one example of each flower for testing (in this data set each
                         # species happens to begin at index 0, 50, and 100)
# training data
train_target = np.delete(iris.target, test_idx)  # Remove the 3 test examples from the training targets
train_data = np.delete(iris.data, test_idx, axis=0)  # Remove the 3 test examples from the training data
# testing data
test_target = iris.target[test_idx]  # Get testing target data
test_data = iris.data[test_idx]  # Get testing data
# create decision tree classifier and train it on the training data
clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_target)
# Predict label for new flower
print(test_target)
print(clf.predict(test_data))
# Visualize the tree
from io import StringIO  # sklearn.externals.six was removed in scikit-learn 0.23
import graphviz
dot_data = StringIO()
tree.export_graphviz(clf,
                     out_file=dot_data,
                     feature_names=iris.feature_names,
                     class_names=iris.target_names,
                     filled=True, rounded=True,
                     impurity=False)
graph = graphviz.Source(dot_data.getvalue())
# render() appends the format extension itself, so pass "iris" to get iris.pdf
graph.render("iris", view=True)