I'm trying to learn ML, and I am a noob.
Currently, I'm trying this video (https://www.youtube.com/watch?v=tNa99PG8hR8).
And my code is:
import pydotplus
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree
from sklearn.externals.six import StringIO
iris = load_iris()
##print(iris.feature_names)
##print(iris.target_names)
##print(iris.data[0])
##print(iris.target[0])
##for i in range(len(iris.target)):
## print("Example %d: label %s, features %s" % (i, iris.target[i], iris.data[i]))
test_idx = [0,50,100]
##training data
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx, axis=0)
##testing data
test_target = iris.target[test_idx]
test_data = iris.data[test_idx]
clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_target)
print(test_target)
print(clf.predict(test_data))
dot_data = StringIO()
tree.export_graphviz(clf,
out_file=dot_data,
feature_names=iris.feature_names,
class_names=iris.target_names,
filled=True, rounded=True,
impurity=False)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
But I get this error:
Traceback (most recent call last):
File "C:\Users\Denis\Desktop\Machine Learning\iris.py", line 42, in
graph.write_pdf("iris.pdf")
File "C:\Users\Denis\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydotplus\graphviz.py", line 1810, in
prog=self.prog: self.write(path, format=f, prog=prog)
File "C:\Users\Denis\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydotplus\graphviz.py", line 1918, in write
fobj.write(self.create(prog, format))
File "C:\Users\Denis\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydotplus\graphviz.py", line 1960, in create
'GraphViz\'s executables not found')
pydotplus.graphviz.InvocationException: GraphViz's executables not found
I've tried reinstalling pydotplus and graphviz, but to no avail.
I have no idea how to change the PATH.
I've searched my graphviz folder, and I found no bin files.
My guess is that you installed the graphviz library for Python but not the Graphviz software itself.
You can install it from here; then make sure that the directory containing the dot executable is on your system's PATH.
Good luck on your ML journey! :-)
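To check whether the dot executable is actually visible from Python, something like the following may help. It is only a sketch using the standard library; the function name find_dot_executable is made up for illustration.

```python
import shutil


def find_dot_executable():
    """Return the full path of Graphviz's `dot` program, or None if it is not on PATH."""
    return shutil.which("dot")


if __name__ == "__main__":
    dot_path = find_dot_executable()
    if dot_path is None:
        # Most likely the Graphviz software is missing or its bin folder is not on PATH.
        print("dot was not found on PATH")
    else:
        print("dot found at:", dot_path)
```

If this prints that dot was not found, installing the Graphviz software or fixing PATH is probably the next step.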
For anyone still facing this issue on Windows 10 after installing the Graphviz software separately as described above, this worked for me: launch CMD as administrator (important!), run dot -c, and then run dot -v
This fixed the issue for me.
Working with torch package:
import torch
from torch.autograd import Variable
x_data = [1.0,2.0,3.0]
y_data = [2.0,4.0,6.0]
w = Variable(torch.Tensor([1.0]), requires_grad = True)
def forward(x):
    return x*w
def loss(x,y):
    y_pred = forward(x)
    return (y_pred-y)*(y_pred-y)
print("my prediction before training",4,forward(4))
for epoch in range(10):
    for x_val, y_val in zip(x_data,y_data):
        l= loss(x_val, y_val)
        l.backward()
        print("\tgrad: ", x_val, y_val, w.grad.data[0])
        w.data=w.data-0.01*w.grad.data
        w.grad.data.zero_()
    print("progress:", epoch, l.data[0] )
print("my new prediction after training ", forward(4))
I got this error:
runfile('C:/gdrive/python/temp2.py', wdir='C:/gdrive/python')
Traceback (most recent call last):
File "C:\gdrive\python\temp2.py", line 11, in <module>
from torch.autograd import Variable
ModuleNotFoundError: No module named 'torch.autograd'
The command conda list pytorch returns:
# packages in environment at C:\Users\g\.conda\envs\test:
#
# Name Version Build Channel
(test) PS C:\gdrive\python>
How can I fix this problem?
It seems that you installed PyTorch using conda.
You might have a folder named torch in your current directory, which would shadow the installed package.
Try changing the directory, or try installing PyTorch using pip.
This issue might help you solve your problem: https://github.com/pytorch/pytorch/issues/1851
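One way to see which torch Python would pick up is to ask the import system where it finds the name. This is a rough sketch using only the standard library; locate_module is a hypothetical helper name, not part of PyTorch.

```python
import importlib.util


def locate_module(name):
    """Return where Python would import `name` from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin:
        # Regular modules and packages report the file they are loaded from.
        return spec.origin
    # A bare folder without __init__.py is treated as a namespace package and
    # only reports its search locations.
    return ", ".join(spec.submodule_search_locations or [])
```

Calling print(locate_module("torch")) from the directory where the script lives should show either None (not installed in this environment), a path under site-packages, or a path in your own working directory. The last case would suggest a local folder is shadowing the real package.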
People who are using pip:
pip install torchvision
I installed sklearn in my environment and am now running it in a Jupyter notebook on Windows.
How can I avoid the error:
URLError: urlopen error [Errno 11004] getaddrinfo failed
I am running the following code:
import sklearn
import sklearn.ensemble
import sklearn.metrics
from sklearn.datasets import fetch_20newsgroups
categories = ['alt.atheism', 'soc.religion.christian']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
which raises the error at this line:
----> 3 newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
I am behind a proxy on my work computer. Is there any way to avoid this error and still be able to use the sample datasets?
According to the source code, scikit-learn will download the file from:
https://ndownloader.figshare.com/files/5975967
I am assuming that you cannot reach this location from behind the proxy.
Can you access the dataset by some other means? If so, you can download it manually and keep it at the location:
~/scikit_learn_data/
Here ~ refers to the user's home folder. You can use the following code to find the default location of that folder on your system.
from sklearn.datasets import get_data_home
print(get_data_home())
Update: Once that is done, use the following script to convert it into the form in which scikit-learn keeps its caches:
import codecs, os, pickle, tarfile, shutil
from sklearn.datasets import load_files
# open() and tarfile.open() do not expand '~', so expand it explicitly
data_folder = os.path.expanduser('~/scikit_learn_data/')
target_folder = data_folder+'20news_home/'
tarfile.open(data_folder+'20newsbydate.tar.gz', "r:gz").extractall(path=target_folder)
cache = dict(train=load_files(target_folder+'20news-bydate-train', encoding='latin1'),
             test=load_files(target_folder+'20news-bydate-test', encoding='latin1'))
compressed_content = codecs.encode(pickle.dumps(cache), 'zlib_codec')
with open(data_folder+'20news-bydate_py3.pkz', 'wb') as f:
    f.write(compressed_content)
shutil.rmtree(target_folder)
Scikit-learn always checks whether the dataset exists locally before attempting to download it from the internet, and the location above is where it looks.
After that you can run the import normally.
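If the proxy itself allows outbound HTTPS, another thing to try is telling urllib about it, since the URLError in the question comes from urllib. The sketch below sets the usual proxy environment variables for the current process. The helper name configure_proxy and the proxy address in the note are placeholders, not real values.

```python
import os
import urllib.request


def configure_proxy(proxy_url):
    """Set the proxy environment variables that urllib reads, for this process only."""
    for name in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
        os.environ[name] = proxy_url
    # getproxies() shows what urllib will now use for each URL scheme.
    return urllib.request.getproxies()
```

For example, configure_proxy("http://proxy.example.com:8080") would be called before fetch_20newsgroups, with the address replaced by whatever your network actually uses. Whether this is enough depends on the proxy (authentication, for instance, is not handled here).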
I want to visualize the decision tree classifier fitted on my dataset and see how it branches out. After several Google searches, this link came up. I am fine with the code up to this line: "f = tree.export_graphviz(clf, out_file=f)".
from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
with open("iris.dot", 'w') as f:
    f = tree.export_graphviz(clf, out_file=f)
My question is: after this code, how can I visualize the tree? According to the link "http://scikit-learn.org/stable/modules/tree.html", I have to use the command "dot -Tpdf iris.dot -o iris.pdf" to create a PDF file. I don't understand where I should run it. In Graphviz's dot tool? If so, I get this error: "Error: : syntax error in line 1 near 'dot' "
I would be grateful if anybody could answer my question. Thanks.
dot -Tpdf iris.dot -o iris.pdf is a command you run in a shell such as bash, not inside the dot tool itself, and you need to have the Graphviz tools installed. For example, on Ubuntu you can install them with: sudo apt-get install graphviz
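If you would rather stay in Python than switch to a terminal, the same command can be launched with the standard library. This is a minimal sketch; build_dot_command and render_dot are names invented here, and it assumes the dot executable is on PATH.

```python
import shutil
import subprocess


def build_dot_command(dot_file, output_file, fmt="pdf"):
    """Build the argument list equivalent to: dot -T<fmt> <dot_file> -o <output_file>."""
    return ["dot", "-T" + fmt, dot_file, "-o", output_file]


def render_dot(dot_file, output_file, fmt="pdf"):
    """Run Graphviz's dot on an existing .dot file."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz's dot executable was not found on PATH")
    # check=True raises CalledProcessError if dot exits with an error.
    subprocess.run(build_dot_command(dot_file, output_file, fmt), check=True)
```

With iris.dot already written, render_dot("iris.dot", "iris.pdf") should do the same thing as typing the command in bash.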
According to the link "http://scikit-learn.org/stable/modules/tree.html", if we have the Python module pydotplus installed, we can generate a PDF file (or any other supported file type) directly in Python:
import pydotplus
dot_data = tree.export_graphviz(clf, out_file=None)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_pdf("iris.pdf")
In response to Shelly's comment, I am adding the complete code that I ran in my IPython notebook.
import pydotplus
from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
with open("iris.dot", 'w') as f:
    f = tree.export_graphviz(clf, out_file=f)
dot_data = tree.export_graphviz(clf, out_file=None)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_pdf("iris.pdf")
I am trying to run the following code:
# Gaussian Naive Bayes
from sklearn import datasets
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
# load the iris datasets
dataset = datasets.load_iris()
# fit a Naive Bayes model to the data
model = GaussianNB()
model.fit(dataset.data, dataset.target)
print(model)
# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
I have all the prerequisite modules installed, such as numpy, scipy, and scikit-learn.
When I run the code, the error displayed is:
Traceback (most recent call last):
File "C:/Users/user/PycharmProjects/sentiment-analysis/ldaex.py", line 4, in
from sklearn import datasets
File "C:\Python27\lib\site-packages\sklearn\__init__.py", line 57, in
from .base import clone
File "C:\Python27\lib\site-packages\sklearn\base.py", line 9, in
from scipy import sparse
File "C:\Python27\lib\site-packages\scipy\__init__.py", line 131, in
raise ImportError("numpy openblaspy flavour needed.")
ImportError: numpy openblaspy flavour needed.
Can anyone tell me what is wrong with my modules?
Also, when I try to run the following program, the same set of errors is displayed:
from sklearn.datasets import fetch_20newsgroups
categories = ['alt.atheism', 'soc.religion.christian','comp.graphics',
'sci.med']
twenty_train = fetch_20newsgroups(subset='train',categories=categories,
shuffle=True, random_state=42)
print twenty_train.target_names
My code follows Google's machine learning class. The two programs are the same, so I don't know why mine shows an error. Maybe a variable has the wrong type, but Google's code is the same as mine. Has anyone had this problem?
This is the error:
[0 1 2]
[0 1 2]
Traceback (most recent call last):
File "/media/joyce/oreo/python/machine_learn/VisualizingADecisionTree.py", line 34, in <module>
graph.write_pdf("iris.pdf")
AttributeError: 'list' object has no attribute 'write_pdf'
[Finished in 0.4s with exit code 1]
[shell_cmd: python -u "/media/joyce/oreo/python/machine_learn/VisualizingADecisionTree.py"]
[dir: /media/joyce/oreo/python/machine_learn]
[path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games]
This is the code:
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
test_idx = [0, 50, 100]
# training data
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx, axis=0)
# testing data
test_target = iris.target[test_idx]
test_data = iris.data[test_idx]
clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_target)
print test_target
print clf.predict(test_data)
# viz code
from sklearn.externals.six import StringIO
import pydot
dot_data = StringIO()
tree.export_graphviz(clf,
out_file=dot_data,
feature_names=iris.feature_names,
class_names=iris.target_names,
filled=True, rounded=True,
impurity=False)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
I think you are using a newer version of Python. Please try pydotplus instead.
import pydotplus
...
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
This should do it.
pydot.graph_from_dot_data() returns a list, so try:
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph[0].write_pdf("iris.pdf")
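If the same script has to work with both libraries, a small helper can hide the difference. This is just a sketch: first_graph is a made-up name, and it assumes (as the answers in this thread suggest) that pydot hands back a list while pydotplus hands back a single graph object.

```python
def first_graph(result):
    """Return a single graph from whatever graph_from_dot_data() returned."""
    if isinstance(result, (list, tuple)):
        if not result:
            raise ValueError("graph_from_dot_data() returned no graphs")
        # pydot-style result: take the first graph in the list.
        return result[0]
    # pydotplus-style result: already a single graph object.
    return result
```

Then first_graph(pydot.graph_from_dot_data(dot_data.getvalue())).write_pdf("iris.pdf") should work either way.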
I had exactly the same issue. It turned out that I hadn't installed Graphviz. Once I did that, it started to work.
@Alex Sokolov, in my case on Windows, I downloaded and installed (or unzipped) the following to a folder, then added it to PATH in the Windows environment variables. After that, re-running the Python code worked for me. I hope this is helpful to you.
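If you cannot edit the Windows environment variables, PATH can also be extended for the running Python process only. This is a sketch with a made-up helper name, add_to_path, and the Graphviz folder in the note below is only a placeholder for wherever you installed or unzipped it.

```python
import os


def add_to_path(directory):
    """Append `directory` to this process's PATH if it is not already listed."""
    entries = [e for e in os.environ.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        entries.append(directory)
        os.environ["PATH"] = os.pathsep.join(entries)
    return os.environ.get("PATH", "")
```

For example, add_to_path(r"C:\path\to\Graphviz\bin") near the top of the script, before calling write_pdf. The change lasts only for that run and does not touch the system settings.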
I installed scikit-learn via conda, and none of the above worked.
First, I had to install libtool:
brew install libtool --universal
Then I followed this sklearn guide.
Then I changed the Python file to this code:
clf = clf.fit(train_data, train_target)
tree.export_graphviz(clf,out_file='tree.dot')
Finally, I converted it to PNG in the terminal:
dot -Tpng tree.dot -o tree.png
I tried the previous answers and still got an error when running the script, so I just used pydotplus:
import pydotplus
and installed Graphviz using:
sudo apt-get install graphviz
Then it worked for me after I added:
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
Thanks to the previous contributors.
It works as follows on Python 3.7, but don't forget to install pydot using the Anaconda prompt:
from sklearn.externals.six import StringIO
import pydot
# viz code
dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data, feature_names=iris.feature_names,
class_names=iris.target_names, filled=True, rounded=True,
impurity=False)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph[0].write_pdf('iris.pdf')
I use Anaconda. Here's what worked for me:
Run from the terminal:
conda install python-graphviz
conda install pydot ## don't forget this <-----------------
Then run
clf = clf.fit(train_data, train_target)
tree.export_graphviz(clf,out_file='tree.dot')
Then from the terminal:
dot -Tpng tree.dot -o tree.png
To export a graph for each of the n_estimators trees, you can do:
for i in range(0, n): #n is your n_estimators number
    dot_data = StringIO()
    tree.export_graphviz(clf.estimators_[i], out_file=dot_data, feature_names=iris.feature_names,
                         class_names=iris.target_names, filled=True, rounded=True,
                         impurity=False)
    graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("iris%s.pdf"%i)
You could also switch the line
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
for this one
(graph,) = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
and it would still work.
I hope this helps, I was having a similar issue. I decided not to use pydot / pydotplus, but rather graphviz. I modified (barely) the code and it works wonders! :)
# 2. Train classifier
# Testing Data
# Examples used to "test" the classifier's accuracy
# Not part of the training data
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
test_idx = [0, 50, 100] # Grabs one example of each flower for testing data (in the data set it so happens
# that each flower type begins at index 0, 50, and 100)
# training data
train_target = np.delete(iris.target, test_idx) # Remove the 3 test examples from the training target data
train_data = np.delete(iris.data, test_idx, axis=0) # Remove the 3 test examples from the training data
# testing data
test_target = iris.target[test_idx] # Get testing target data
test_data = iris.data[test_idx] # Get testing data
# create decision tree classifier and train it on the training data
clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_target)
# Predict label for new flower
print(test_target)
print(clf.predict(test_data))
# Visualize the tree
from sklearn.externals.six import StringIO
import graphviz
dot_data = StringIO()
tree.export_graphviz(clf,
out_file=dot_data,
feature_names=iris.feature_names,
class_names=iris.target_names,
filled=True, rounded=True,
impurity=False)
graph = graphviz.Source(dot_data.getvalue())
graph.render("iris", view=True) # graphviz appends the format extension itself, producing iris.pdf