Strange plot when using sklearn.linear_model - Python

I typically use MATLAB, but want to push myself to learn something about Python. I tried a linear regression example introduced by a YouTuber. Here is the code:
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
#read data
dataframe = pd.read_fwf('brain_body.txt')
x_values = dataframe[['Brain']]
y_values = dataframe[['Body']]
#train model on data
body_reg = linear_model.LinearRegression()
body_reg.fit(x_values,y_values)
#visualize results
plt.scatter(x_values,y_values)
plt.plot(x_values,body_reg.predict(x_values))
plt.show()
But I ended up with a very strange plot (I use Python 3.6):
[screenshot of the strange plot]
Here is part of the details:
[screenshot: detail view of the plot]
Apparently, something is missing or wrong.
The data of brain_body.txt can be found in https://github.com/llSourcell/linear_regression_demo/blob/master/brain_body.txt
Any suggestion or advice is welcome.
Update
I tried sera's code, and here is what I get:
[screenshot: plot produced by sera's code]
It's funny and weird. It occurred to me that something might be wrong with my data file or something might be missing in my Python setup, but I just copied and pasted the raw data into Notepad and saved it as .txt; I tried Python 3.6 and 2.7 as well as PyCharm and Spyder... so I have no idea...
BTW, the youtube video is here
@sascha @Moritz @sera I asked my friend to run the same code and data file, and everything was fine. In other words, there is something wrong with my Python setup and I don't know why. Let me try another computer and/or an earlier version of Python.
I tried, but nothing changed. Here are two different approaches I used to install Python:
1. Install Python (e.g. 3.6); install PyCharm; install the packages pandas, scikit-learn...
2. Install Anaconda
Solved
Thanks to @Marc Bataillou for the suggestion. This is a problem associated with the matplotlib version: the problem appears in 2.1.0, and I found that the original code works fine in 2.0.2, so apparently something changed between 2.0.2 and 2.1.0. Thanks for all your efforts.

You should use
plt.scatter(x_values.values,y_values.values)
instead of
plt.scatter(x_values,y_values)
I hope it works!
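For completeness, here is a minimal sketch of the original script with that change applied; it simply hands plain NumPy arrays to matplotlib instead of pandas DataFrames:
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
# read data
dataframe = pd.read_fwf('brain_body.txt')
x_values = dataframe[['Brain']]
y_values = dataframe[['Body']]
# train model on data
body_reg = linear_model.LinearRegression()
body_reg.fit(x_values, y_values)
# visualize results: pass NumPy arrays, not DataFrames, to matplotlib
plt.scatter(x_values.values, y_values.values)
plt.plot(x_values.values, body_reg.predict(x_values))
plt.show()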

You can visualize the results using the following code. I use cross-validation for the predictions. If the model were perfect, all the dots would lie on the plotted line.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_predict
from sklearn import linear_model
#read data
dataframe = pd.read_fwf('brain_body.txt')
x_values = dataframe[['Brain']]
y_values = dataframe[['Body']]
#model on data
body_reg = linear_model.LinearRegression()
# cross_val_predict returns an array of the same size as `y` where each entry
# is a prediction obtained by cross validation:
predicted = cross_val_predict(body_reg, x_values, y_values, cv=10)
fig, ax = plt.subplots()
ax.scatter(y_values, predicted, edgecolors=(0, 0, 0))
ax.plot([y_values.min(), y_values.max()], [y_values.min(), y_values.max()], 'k--', lw=4)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show()
Results:
Data
https://ufile.io/p7x0r

Related

xgboost.plot_tree shows - Empty characters/boxes/blocks as labels

SITUATION
When I plot with xgboost.plot_tree, I get a bunch of empty characters/boxes/blocks on the graph instead of the titles, labels and numbers. I use more than 400 features, so that may be a contributing factor.
CODE 1
fig, ax = plt.subplots(figsize=(170, 170))
plot_tree(xgbmodel, ax=ax)
plt.savefig("temp.pdf")
plt.show()
CODE 2
plot_tree(xgbmodel, num_trees=2)
fig = plt.gcf()
fig.set_size_inches(150, 100)
fig.savefig('tree.png')
ERROR
Both CODE 1 and CODE 2 produce the same image.
This is just a crop of the whole tree, because the full tree is much bigger than I could upload here, but the tree shape looks perfect.
SOLUTIONS I have Tried
This one is about a plotting failure, but I can plot without any problem - Plot a Single XGBoost Decision Tree
This has other issues - xgboost.plot_tree: binary feature interpretation
I ran the code that @jared_mamrot gave me and it produced the same error; I restarted and cleaned my environment and ran it first and on its own in the same notebook.
Following a GitHub recommendation, model.get_booster().get_dump(dump_format='text') printed out a bit more than 200,000 characters, about 63 A4 pages at 11 pt Calibri, and it looks perfectly correct, e.g.: 0.0268656723\n\t\t\t\t\t34:[f0<6.5] yes=53,no=54,missing=53\n\t\t\t\t\t... Is it possible that I have this issue because it cannot display so much text in a normal-size graph?
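For reference, one way to inspect such a dump outside the notebook is to write it to a plain text file (a minimal sketch; the file name is just illustrative):
# get_dump returns one string per tree; join them and write to disk for inspection
with open('xgb_dump.txt', 'w') as f:
    f.write('\n'.join(model.get_booster().get_dump(dump_format='text')))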
I wasn't able to reproduce your error. Can you please add more details to your question and confirm that this code works? link to pima-indians-diabetes.csv
#!/usr/bin/env python3
# plot decision tree
from numpy import loadtxt
from xgboost import XGBClassifier
from xgboost import plot_tree
import matplotlib.pyplot as plt
import graphviz
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
y = dataset[:,8]
# fit model on training data
model = XGBClassifier()
model.fit(X, y)
# plot/save fig
fig, ax = plt.subplots(figsize=(170, 170))
plot_tree(model, ax=ax)
plt.savefig("test.pdf")
Edit per comment:
I can't reproduce this issue/error. No matter which package version / char encoding / line endings / etc my notebook always renders the text correctly. The only thing I can suggest is installing a new virtual environment (e.g. miniconda) with current versions of the required packages (conda install notebook numpy matplotlib xgboost graphviz python-graphviz) and testing it again.
Also, make sure you don't have windows line endings (see: Matplotlib plotting some characters as blank square / https://github.com/jupyterlab/jupyterlab/issues/1104 / https://github.com/jupyterlab/jupyterlab/issues/3718 / https://github.com/jupyterlab/jupyterlab/pull/3882 ) and specify the font you are using (e.g. How to change fonts in matplotlib (python)?):
# plot decision tree
from numpy import loadtxt
from xgboost import XGBClassifier
from xgboost import plot_tree
from matplotlib.font_manager import FontProperties
import matplotlib.pyplot as plt
import graphviz
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
y = dataset[:,8]
# fit model on training data
model = XGBClassifier()
model.fit(X, y)
# plot/save fig
prop = FontProperties()
prop.set_file('Arial.ttf')
fig, ax = plt.subplots(figsize=(170, 170))
plot_tree(model, ax=ax, fontproperties=prop)
plt.savefig("test.png")
fig.show()
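If the glyphs still come out as blank boxes, another thing worth trying (not part of the suggestion above, just a sketch) is to bypass matplotlib's text rendering entirely and let Graphviz write the image itself:
from xgboost import to_graphviz
# to_graphviz returns a graphviz Source; Graphviz draws the text itself,
# so matplotlib's font handling is never involved. `model` is the fitted
# XGBClassifier from the snippet above.
graph = to_graphviz(model, num_trees=0)
graph.format = 'png'
graph.render('tree_direct')  # writes tree_direct.png next to the notebook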
I moved my whole environment from an AWS EC2 instance to a local machine, and then it ran perfectly. The AWS EC2 instance had some other weird issues too, like not allowing extensions in JupyterLab. Both machines run Ubuntu 20.04 LTS.

Writing create_tree_digraph plot to a png file in Python

I want to save the tree of my lightgbm model to a .png file. I have tried two plotting methods from the lightgbm API - plot_tree and create_tree_digraph.
import lightgbm as lgb
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
clf = lgb.LGBMClassifier()
clf.fit(X, y)
When I use plot_tree, it displays the tree but in place of values there are small blank boxes
lgb.plot_tree(clf, tree_index=0)
When I try create_tree_digraph, I get the graph, but I can't save it as is.
lgb.create_tree_digraph(clf)
I used the code below to save it to a file, but what gets saved is the first plot (the one from plot_tree):
import graphviz
graph_b = lgb.create_tree_digraph(clf)  # the Digraph whose source we want to save
s = graphviz.Source(graph_b.source, filename="test1.gv", format="png")
s.view()
Any suggestions on how to save the plot as an image? I ultimately want to write these tree plots to Excel.
I am using graphviz version 0.8.3
Thanks,
I suppose you are using a Jupyter notebook or something like that.
This worked for me, but it is surely not the best way:
graph = lgb.create_tree_digraph(clf)
with open('fst.svg', 'w') as f:
    f.write(graph._repr_svg_())
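Assuming a working Graphviz installation, another option (just a sketch, not verified against graphviz 0.8.3 specifically) is to let the returned Digraph render itself straight to PNG:
graph = lgb.create_tree_digraph(clf)
graph.format = 'png'   # output format handled by Graphviz
graph.render('tree')   # writes the .gv source and tree.png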

PowerBI is not passing over the datafield to the Python script

I am trying to run a simple Python script in Power BI just to get it working, but it only comes up with many errors. I have tried to:
reinstall NumPy and Matplotlib
use Python 3.5 and 3.6
uninstall and reinstall Conda
I have a screenshot showing my issue (it all fits on one screen below). The data is just a, b, c and 1, 2, 3 for x and y, labeled "Label" and "Data".
Screen shot of error and code
Code here:
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
dataset = pandas.DataFrame(Data, Label)
# dataset = dataset.drop_duplicates()
# Paste or type your script code here:
import matplotlib.pyplot as plt
dataset.plot(kind='scatter', x='Label', y='Data', color='red')
plt.show()
You need to comment out the following:
dataset = pandas.DataFrame(Data, Label)
Even though it is 'commented out', it will still run; it just sets up the data frame ready for the visual. Here is a non-working example, where it isn't commented out:
Hope that helps
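For reference, a minimal sketch of what the script pane looks like once that line is commented out; Power BI itself supplies the dataset dataframe from the fields added to the visual, so nothing else is needed:
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
# dataset = pandas.DataFrame(Data, Label)
# dataset = dataset.drop_duplicates()
# Paste or type your script code here:
import matplotlib.pyplot as plt
dataset.plot(kind='scatter', x='Label', y='Data', color='red')  # dataset is provided by Power BI
plt.show()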

Printing image in r-markdown using matplotlib and python code

I am trying to run Python code from an R Markdown file (R Markdown to PDF).
What I have achieved so far:
1. I am able to configure my Python engine using the knitr and reticulate libraries.
2. I am able to execute my Python code.
What I have tried:
1. I tried all the methods discussed in this forum, but nothing worked.
2. I also tried saving the image (as one of the posts here suggests), but that is not working either.
My problem:
When I try to plot a graph using matplotlib with plt.imshow() and plt.show(), the image is not printed in the output; instead it is shown in a separate window. You can see my results in the attached image.
[screenshot: Result_of_my_code]
Here is my code
```{r setup, include=FALSE}
library(knitr)
library(reticulate)
knitr::knit_engines$set(python = reticulate::eng_python)
```
```{python}
import numpy as np
import os
import torch
import torchvision.datasets as dsets
import matplotlib.pyplot as plt
print(os.getcwd())
os.chdir('D:\\1st year\\Python codes\\CIFR Analysis\\self contained analysis')
print(os.getcwd())
train_mnist = dsets.MNIST("../data", train=True)
test_mnist = dsets.MNIST("../data", train= False)
print(len(train_mnist))
#print(train_mnist[0][0])
plt.imshow(train_mnist[0][0], cmap="gray")
#plt.savefig("trainzero.png")
plt.show()
```
Kindly help me fix this issue, as I want to compile my Python code using the R Markdown file.
Thanks
So with R Markdown, you have to do some things a little differently. In the following, I have a dataframe created by concatenating two series. The original plotting code in the Jupyter notebook was as follows and just printed out the series:
# make a plot of model fit
train.plot(figsize=(16,8), legend=True)
backtest.plot(legend=True);
However, it does not work that way with R Markdown. When plotting there, you always have to assign the plots, and with the code below you get the same plot.
dfreg = pd.concat([reg, backtest], axis = 1)
ax = dfreg.plot(figsize=(16,8), legend = True)
ax1 = predictions.plot(legend=True)
plt.show()
This is common with other plotting functions like plot_acf() too.
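Applied to the MNIST example from the question, a minimal sketch of the same idea inside a ```{python} chunk would be to assign the figure before showing it (whether it prints inline still depends on the matplotlib backend that reticulate picks up):
import matplotlib.pyplot as plt
fig, ax = plt.subplots()                   # keep a reference to the figure
ax.imshow(train_mnist[0][0], cmap="gray")  # train_mnist as defined in the question's chunk
plt.show()                                 # knitr captures the figure inline instead of opening a window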

Any example/idea of how to plot the U-MATRIX using python (working on SOM)?

I am working on a SOM implementation in Python. I want to know how to generate the U-matrix.
One implementation can be found in the SUSI package (pip3 install susi). You can use it like this:
import numpy as np
import susi
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
# get data (replace this part with your data)
X, y = make_blobs(n_samples=100, n_features=2, centers=3)
# initialize and fit SOM
som = susi.SOMClustering()
som.fit(X)
u_matrix = som.get_u_matrix()
plt.imshow(np.squeeze(u_matrix), cmap="Greys")
plt.colorbar()
plt.show()
which results in this plot:
The code and the plot are taken from susi/SOMClustering.ipynb. You find the implementation of the u-matrix there as well.
