Error in matplotlib when implementing K-Means algorithm - python

I am trying to plot a graph using matplotlib, which has been imported as plt, but I am getting an error.
K-Mean Clustering
#importing libraries
import numpy as np
import matplotlib as plt
import pandas as pd
#importing dataset with pandas
dataset = pd.read_csv('Mall_Customers.csv')
X = dataset.iloc[:, [3,4]].values
from sklearn.cluster import KMeans
wcss = []
for i in range(1,11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', max_iter = 300, n_init = 10, random_state = 0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
Traceback (most recent call last):
  File "<ipython-input-4-d9dfde180017>", line 8, in <module>
    plt.plot(range(1, 11), wcss)
AttributeError: module 'matplotlib' has no attribute 'plot'

A 'has no attribute' error means the function or variable you are trying to access on an object doesn't exist. When you get this error from a library, it usually means the name lives somewhere else in the library's API, so the API documentation is the place to check.
The top-level matplotlib package is a container for several APIs. You are probably looking for the pyplot API of matplotlib, which does have a plot function.
See the documentation: https://matplotlib.org/api/pyplot_summary.html
See the index of matplotlib apis: https://matplotlib.org/api/index.html
Change
import matplotlib as plt
to
import matplotlib.pyplot as plt
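To see the difference concretely, here is a minimal sketch. The WCSS numbers are made up for illustration (they stand in for the kmeans.inertia_ values from the question), and the Agg backend is selected only so that the sketch also runs without a display. It checks that the top-level package has no plot attribute while pyplot does, then draws the elbow curve with the corrected import:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

# The top-level package does not expose plot(); the pyplot submodule does.
print(hasattr(matplotlib, "plot"))  # False
print(hasattr(plt, "plot"))         # True

# Made-up WCSS values, standing in for the kmeans.inertia_ results.
wcss = [270.0, 180.0, 105.0, 73.0, 44.0, 37.0, 30.0, 25.0, 22.0, 20.0]

fig, ax = plt.subplots()
ax.plot(range(1, 11), wcss)
ax.set_title('The Elbow Method')
ax.set_xlabel('Number of clusters')
ax.set_ylabel('WCSS')
# In an interactive session, drop the matplotlib.use("Agg") line and call plt.show().
```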

Related

How to make a legend in the scipy dendrogram

I want to make a legend for a scipy dendrogram. I tried to find the hexadecimal colors used by the dendrogram function, but I didn't find anything. How can I do this?
This is the code:
import pandas as pd
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
.......
# Agglomerative clustering with sklearn
agg_clustering = AgglomerativeClustering(n_clusters=4).fit(dfx_scaled)
# Add the "clusters" column
dfx['clusters'] = agg_clustering.labels_
# Plot the dendrogram
linkage_matrix = linkage(dfx_scaled, 'ward')
plt.figure(figsize=(10, 7))
dendrogram(linkage_matrix,no_labels=True )
d = dendrogram(linkage_matrix,no_labels=True )
plt.axhline(y=140, color='black', linestyle='--')
plt.show()

I am new to k-means clustering and I am trying to run some code that clusters car types, but I am getting errors.

import numpy as np
import pandas as pd
from sklearn import svm
import matplotlib.pyplot as plt
import seaborn as sns; sns.set(font_scale=1.2)
%matplotlib inline
dataset = pd.read_csv("cars.csv")
#importing the dataset
X= dataset[dataset.columns[:-1]] #print all except last column
X=X.convert_obects(convert_numeric=True) ##show error 'DataFrame' object has no attribute 'convert_obects'
# Eliminating null values
for i in X.columns:
    X[i]=X[i].fillna(int (X [i].mean()))
for i in X.columns: #double check
    print(X[i].isnull().sum())
X.head()
from sklearn.cluster import KMeans
wcss=[] #an array
for i in range (0,11):
    kmeans= KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    wcss.append(kmeans.inertia_) #show error 'KMeans' object has no attribute 'inertia_'
plt.plot(range(0,11),WCSS)
plt.title('The Elbow Method')
plt.xlabel('WCSS')
plt.show()
#so I don't know what is the problem in these two errors
#Also I have imported pandas and numpy libraries
The first error occurs because convert_objects was deprecated and no longer exists in pandas (the call in your code is also misspelled as convert_obects). The second seems to be because you did not fit your model. From the docs:
inertia_ : float
Sum of squared distances of samples to their closest cluster center.
You need to fit the model first (call kmeans.fit(X)); the inertia_ attribute only exists after fitting.
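Both fixes can be sketched together. This is only an illustration under stated assumptions: the small DataFrame is made up to stand in for cars.csv, "?" is an assumed marker for missing values, and pd.to_numeric with errors="coerce" is one common substitute for the removed convert_objects call, not necessarily the only one. Note also that the loop starts at 1, because a model with 0 clusters cannot be fitted:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up stand-in for cars.csv: numbers stored as strings, "?" marks a missing value.
raw = pd.DataFrame({
    "mpg": ["18.0", "15.5", "?", "32.1", "27.0", "14.0", "36.0", "22.5"],
    "hp": ["130", "165", "150", "?", "88", "220", "60", "95"],
})

# Substitute for convert_objects(convert_numeric=True):
# convert column by column, turning unparseable entries into NaN.
X = raw.apply(pd.to_numeric, errors="coerce")
X = X.fillna(X.mean())  # fill missing values with the column mean

unfitted = KMeans(n_clusters=2, n_init=10, random_state=0)
print(hasattr(unfitted, "inertia_"))  # False: the attribute is created by fit()

wcss = []
ks = range(1, 6)  # n_clusters must be at least 1, so start at 1 rather than 0
for k in ks:
    kmeans = KMeans(n_clusters=k, init="k-means++", max_iter=300, n_init=10, random_state=0)
    kmeans.fit(X)                 # fitting is what creates inertia_
    wcss.append(kmeans.inertia_)
print(wcss)
```

The resulting list can then be passed to plt.plot(ks, wcss). The name must match the list exactly: Python is case-sensitive, so WCSS and wcss are different names.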

KMeans scatter plot on macbook

I am a newbie in data science and I was trying to draw a scatter plot for a dataset with 4000 rows. I am running Jupyter Notebook on a MacBook. It took more than five minutes for the scatter plot to appear in the notebook. My laptop was bought recently; it has a 2.3 GHz Intel Core i5 and 8 GB of memory.
I have two questions: why did it take so long, and why was the plot so congested (for example, all the x-axis tick labels were small, ran together, and could not be read clearly)? The dataset is here: https://raw.githubusercontent.com/datascienceinc/learn-data-science/master/Introduction-to-K-means-Clustering/Data/data_1024.csv
I would really appreciate any insight.
Here is my code:
import numpy as np
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline
from sklearn.cluster import KMeans
df= pd.read_csv('/users/kyaw/Downloads/data_1024.csv')
df = df.join(df['Driver_ID'].str.split(expand=True))
df = df.drop(["Driver_ID"], axis=1)
df.columns=['Driver_ID','Distance_Feature','Speeding_Feature']
f1 = df['Distance_Feature'].values
f2 = df['Speeding_Feature'].values
X=np.array(list(zip(f1,f2)))
fig=plt.gcf()
fig.set_size_inches(10,8)
kmeans = KMeans(n_clusters=3).fit(X)
plt.scatter(X[:,0], X[:,1], c=kmeans.labels_, cmap='rainbow')
plt.scatter(kmeans.cluster_centers_[:,0] ,kmeans.cluster_centers_[:,1], color='black')
plt.show()
I tried to run your code and it didn't work. I made the following corrections:
import numpy as np
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
#%matplotlib inline --> removed; this magic is only needed (and only valid) inside Jupyter
from sklearn.cluster import KMeans
df= pd.read_csv('./data_1024.csv',sep='\t' ) # tell pandas that the separator is a tab
# the join/drop/rename steps are no longer needed once the columns are parsed correctly
f1 = df['Distance_Feature'].values
f2 = df['Speeding_Feature'].values
X=np.array(list(zip(f1,f2)))
fig=plt.gcf()
fig.set_size_inches(10,8)
kmeans = KMeans(n_clusters=3).fit(X)
plt.scatter(X[:,0], X[:,1], c=kmeans.labels_, cmap='rainbow')
plt.scatter(kmeans.cluster_centers_[:,0] ,kmeans.cluster_centers_[:,1], color='black')
plt.show()
I got this image
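The effect of the separator can be checked without the real file. In this sketch the three data rows are made up, and only the tab-separated layout and the column names are taken from the code above. With the default comma separator each row collapses into one string column, while sep='\t' yields three columns with numeric features. A possible explanation for the slow, congested plot (not stated in the answer) is that values kept as strings are treated by matplotlib as categories, with one tick per distinct value:

```python
import io
import pandas as pd

# Made-up rows in the same tab-separated layout as data_1024.csv.
text = (
    "Driver_ID\tDistance_Feature\tSpeeding_Feature\n"
    "1001\t71.24\t28.0\n"
    "1002\t52.53\t25.0\n"
    "1003\t64.54\t27.0\n"
)

wrong = pd.read_csv(io.StringIO(text))            # default separator is a comma
right = pd.read_csv(io.StringIO(text), sep="\t")  # fields are tab-separated

print(wrong.shape)          # (3, 1): each row collapsed into a single string column
print(list(right.columns))  # ['Driver_ID', 'Distance_Feature', 'Speeding_Feature']
print(right.dtypes)         # the feature columns are parsed as floats
```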

AttributeError: 'OLSResults' object has no attribute 'norm_resid'

When I run this I have the following error :
AttributeError: 'OLSResults' object has no attribute 'norm_resid'
I have the latest version of OLS, so the attribute norm_resid should be there.
Any ideas ?
from scipy import stats
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from sklearn import datasets, linear_model
from statsmodels.formula.api import ols
"""
Data Management
"""
data = pd.read_csv("TestExer1-sales-round1.csv")
X_train = data["Advertising"]
Y_train = data["Sales"]
# fit the linear regression
model = ols("Y_train ~ X_train", data).fit()
print(model.summary())
plt.plot(X_train,Y_train , 'ro')
plt.plot(X_train, model.fittedvalues, 'b')
plt.legend(['Sales', 'Advertising'])
plt.ylim(0, 70)
plt.xlim(5, 18)
plt.hist(model.norm_resid())
plt.ylabel('Count')
plt.xlabel('Normalized residuals')
plt.xlabel('Temperature')
plt.ylabel('Gas')
plt.title('Before Insulation')
I had the same issue, but the following worked:
plt.hist(model.resid_pearson)
Thus your solution should look like:
from scipy import stats
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from sklearn import datasets, linear_model
from statsmodels.formula.api import ols
"""
Data Management
"""
data = pd.read_csv("TestExer1-sales-round1.csv")
X_train = data["Advertising"]
Y_train = data["Sales"]
# fit the linear regression
model = ols("Y_train ~ X_train", data).fit()
print(model.summary())
plt.plot(X_train,Y_train , 'ro')
plt.plot(X_train, model.fittedvalues, 'b')
plt.legend(['Sales', 'Advertising'])
plt.ylim(0, 70)
plt.xlim(5, 18)
plt.hist(model.resid_pearson)
plt.ylabel('Count')
plt.xlabel('Normalized residuals')
plt.xlabel('Temperature')
plt.ylabel('Gas')
plt.title('Before Insulation')
This applies when using statsmodels version 0.8.0 or greater.
Note: the Pearson residuals only divide each residual by the standard error of the residuals, whereas normalisation also divides each residual by the sum of all residuals.
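To make the meaning of resid_pearson concrete, here is a NumPy-only sketch of the calculation. It rests on an assumption: that for OLS the Pearson residuals are the raw residuals divided by the square root of the residual variance estimate (sum of squared residuals over the residual degrees of freedom), which is how I read the statsmodels documentation. The data values are made up and merely stand in for the Advertising and Sales columns:

```python
import numpy as np

# Made-up data standing in for the Advertising / Sales columns.
x = np.array([6.0, 8.0, 9.0, 11.0, 12.0, 14.0, 15.0, 17.0])
y = np.array([22.0, 27.0, 25.0, 31.0, 36.0, 35.0, 41.0, 44.0])

# Ordinary least squares fit of y = intercept + slope * x.
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef

# Residual variance estimate: sum of squared residuals / residual degrees of freedom.
df_resid = len(x) - A.shape[1]
scale = np.sum(resid ** 2) / df_resid

# Pearson residuals (assumed definition): raw residuals over the residual standard error.
resid_pearson = resid / np.sqrt(scale)
print(resid_pearson)
```

Under this definition the squared Pearson residuals sum to the residual degrees of freedom, which gives a quick sanity check when comparing against model.resid_pearson.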

Can you change iris cube projections in cartopy

I really like the idea that cartopy can automatically plot in different map projections. However, I couldn't figure out how to do this with Iris cubes. As it's a sister project, I expected that I might be able to. Is it possible to do something like this?
import iris as I
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
someCube = I.load('someCube.pp')
ax = plt.axes(projection=ccrs.Robinson())
I.plot.contourf(someCube, transform=ccrs.Robinson())
plt.show()
thanks
I took your pseudo code and made it runnable with Iris' sample data:
import iris
import iris.plot as iplt
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
fname = iris.sample_data_path('air_temp.pp')
air_temp = iris.load_cube(fname)
ax = plt.axes(projection=ccrs.Robinson())
iplt.contourf(air_temp, transform=ccrs.Robinson(central_longitude=180))
ax.coastlines()
plt.show()
If you run this code, you will get an exception along the lines of:
Traceback (most recent call last):
  File "using_custom_projections.py", line 11, in <module>
    iris.plot.contourf(air_temp, transform=ccrs.Robinson())
  File "lib/iris/plot.py", line 452, in contourf
    result = _draw_2d_from_points('contourf', None, cube, *args, **kwargs)
  File "lib/iris/plot.py", line 263, in _draw_2d_from_points
    result = _map_common(draw_method_name, arg_func, iris.coords.POINT_MODE, cube, data, *args, **kwargs)
  File "lib/iris/plot.py", line 406, in _map_common
    assert 'transform' not in kwargs, 'Transform keyword is not allowed.'
AssertionError: Transform keyword is not allowed.
This is trying to tell you that you do not need to specify which "transform" (or coordinate system) the cube is in. The reason is that an Iris cube should contain full metadata about the underlying data, and the coordinate system is part of that metadata.
So, to get the example to work, you can simply remove the transform keyword argument in your contourf call:
import iris
import iris.plot as iplt
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
fname = iris.sample_data_path('air_temp.pp')
air_temp = iris.load_cube(fname)
ax = plt.axes(projection=ccrs.Robinson(central_longitude=180))
iplt.contourf(air_temp)
ax.coastlines()
plt.show()
There is a similar example in the iris gallery, specifically http://scitools.org.uk/iris/docs/latest/examples/graphics/rotated_pole_mapping.html#rotated-pole-mapping-03 (the very last plot in the example).
