'AffinityPropagation' object has no attribute 'label_' - python

I am using scikit-learn's affinity propagation algorithm. My input data is a NumPy array of shape 2303 x 2303; it is a similarity matrix. I want to calculate the distance of each element in a cluster to its centroid. When I try to print the labels, I get the following error:
"'AffinityPropagation' object has no attribute 'label_'". Here is the code:
clusterer = AffinityPropagation(affinity = 'precomputed')
af = clusterer.fit(l2)
print af.label_
I am getting the following error:
AttributeError: 'AffinityPropagation' object has no attribute 'label_'
Thanks.

According to the AffinityPropagation docs, the attribute is named labels_ (plural, with a trailing underscore), so you have to type
print af.labels_
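For context, scikit-learn stores what an estimator learns during fit() in attributes that end with a trailing underscore. A minimal sketch, assuming a tiny made-up data set (the points array and the negative squared distance similarity are illustrative, not the asker's 2303 x 2303 matrix), shows how labels_ and cluster_centers_indices_ might be combined to read off each sample's similarity to the exemplar of its cluster:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical toy data: two tight, well-separated groups of points.
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)

# Precomputed similarity matrix: negative squared Euclidean distance.
diff = points[:, None, :] - points[None, :, :]
similarity = -(diff ** 2).sum(axis=-1)

af = AffinityPropagation(affinity="precomputed", random_state=0).fit(similarity)

labels = af.labels_                      # cluster index per sample (plural name)
exemplars = af.cluster_centers_indices_  # row index of each cluster's exemplar

# Similarity of every sample to the exemplar of its own cluster.
sim_to_exemplar = similarity[np.arange(len(points)), exemplars[labels]]
print(labels)
print(sim_to_exemplar)
```

Affinity propagation picks its exemplars from the data itself, so the "centroid" here is an actual sample; if the similarities are negated distances, as assumed above, negating sim_to_exemplar gives the distance back.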

Related

How to convert a TensorFlow tensor to a NumPy array in the given example?

I am trying to compute a 2-D Fast Fourier Transform (fft2d), but the following code raises an error:
print("type(pred[2]): ", type(pred[2]))
pred[2] = tf.make_ndarray(pred[2])
fft2_pre = np.fft.fft2(pred[2])
The error and output:
type(pred[2]): <class 'tensorflow.python.framework.ops.Tensor'>
AttributeError: 'Tensor' object has no attribute 'tensor_shape'
How can this be solved?
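# Alternative that stays in TensorFlow: cast to complex, then apply the 2-D FFT
# (tf.signal.fft2d is the counterpart of np.fft.fft2; tf.signal.fft is 1-D only)
temp = tf.cast(pred[2], dtype=tf.complex64)
temp = tf.signal.fft2d(temp)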

Recover column details after PCA and KMeans

I did KMeans clustering after reducing the numerical columns in my DataFrame from 5 to 2 using PCA, and plotted a scatterplot:
pc=PCA(n_components = 2).fit_transform(scaled_df)
scaled_df_PCA= pd.DataFrame(pc, columns=['pca_col1','pca_col2'])
#Then I did the KMeans and its plotting
label_PCA=final_km.fit_predict(scaled_df_PCA)
scaled_df_PCA["label_PCA_df"]=label_PCA
a=scaled_df_PCA[scaled_df_PCA.label_PCA_df==0]
b=scaled_df_PCA[scaled_df_PCA.label_PCA_df==1]
c=scaled_df_PCA[scaled_df_PCA.label_PCA_df==2]
sns.scatterplot(a.pca_col1, a.pca_col2, color="green")
sns.scatterplot(b.pca_col1, b.pca_col2, color="red")
sns.scatterplot(c.pca_col1, c.pca_col2, color="yellow")
I get 3 clusters from the above, based on the 2 columns produced by PCA. Now I wish to get the original columns back for further analysis of those clusters, but I am not able to.
When I use pc.components_, I get this error:
AttributeError Traceback (most recent call last)
/tmp/ipykernel_33/4073743739.py in
----> 1 pc.components_
AttributeError: 'numpy.ndarray' object has no attribute 'components_'
or when I do scaled_df_PCA.components_
AttributeError: 'DataFrame' object has no attribute 'components_'
So I would like to know how to recover the details of the original columns that were reduced during PCA.
This line from your code stores the NumPy array returned by fit_transform() in pc, rather than the PCA instance:
pc=PCA(n_components = 2).fit_transform(scaled_df)
An easy fix is to create the PCA instance first and then call fit_transform().
pca = PCA(n_components=2)
df_transformed = pca.fit_transform(scaled_df)
Afterwards, you can still access attributes and methods of the PCA instance, pca.
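As a rough sketch of what the fitted instance then gives you, assuming a made-up five-column frame in place of scaled_df (the column names a to e are hypothetical): components_ holds the weight of every original column in each principal component, and inverse_transform() maps the reduced points back to an approximation of the original columns.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for the asker's scaled_df (5 numerical columns).
rng = np.random.default_rng(0)
scaled_df = pd.DataFrame(rng.normal(size=(50, 5)), columns=list("abcde"))

pca = PCA(n_components=2)              # keep the estimator itself
pc = pca.fit_transform(scaled_df)      # ndarray of shape (50, 2)

# One row per principal component, one column per original feature.
loadings = pd.DataFrame(pca.components_,
                        columns=scaled_df.columns,
                        index=["pca_col1", "pca_col2"])
print(loadings)

# Approximate reconstruction of the original 5 columns from the 2 components.
restored = pd.DataFrame(pca.inverse_transform(pc), columns=scaled_df.columns)
print(restored.shape)
```

The reconstruction is lossy, since three of the five dimensions were discarded; for analysing the clusters it may be simpler to attach the cluster labels to the original frame, which still has the same row order.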

Map categorical data for logistic regression

I am trying to run a logistic regression, predicting income based on age, num, and hours-per-week. The income column consists of either <=50K or >50K. I tried to replace the categorical data with numeric values using the pandas map() function and received the error:
'DataFrame' object has no attribute 'map'. Then I tried adding the rdd attribute (as shown below) but got the error:
'DataFrame' object has no attribute 'rdd'
import pandas as pd
import statsmodels.api as sm
adult_train = pd.read_csv("C:/.../adult_training.csv")
adult_test = pd.read_csv("C:/.../adult_test.csv")
# Separate data into predictor variables, X, and target variables, y:
X = pd.DataFrame(adult_train[['age', 'hours-per-week', 'num']])
X = sm.add_constant(X)
y = pd.DataFrame(adult_train[['income']]).rdd.map({'<=50K': 0, '>50K': 1}).astype(int)
logreg01 = sm.Logit(y, X).fit()
If you could please help me be able to run the last line of code, it would be really appreciated.
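One possible way around both errors, sketched under the assumption that the income column contains exactly the strings '<=50K' and '>50K': .rdd belongs to Spark DataFrames, not pandas, and the dict form of map() is a Series method, so selecting the column with single brackets should be enough. The small frame below is a made-up stand-in for adult_train.

```python
import pandas as pd

# Hypothetical stand-in for adult_train; the real CSV is not shown here.
adult_train = pd.DataFrame({
    "age": [25, 47, 38, 52],
    "income": ["<=50K", ">50K", "<=50K", ">50K"],
})

# Single brackets return a Series, and Series.map accepts a dict.
# Double brackets (adult_train[["income"]]) would return a DataFrame instead.
y = adult_train["income"].map({"<=50K": 0, ">50K": 1}).astype(int)
print(y.tolist())  # [0, 1, 0, 1]
```

With y built this way, sm.Logit(y, X).fit() receives a numeric target. If the labels in the file carry stray whitespace, map() produces NaN for them and astype(int) fails, so applying .str.strip() to the column before mapping may be needed.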

AttributeError: 'numpy.ndarray' object has no attribute 'cost_'

I am trying to run the K-Prototypes clustering algorithm. Both when I run the model and when I try to plot the cost graph as follows, I get a 'no attribute' error for the labels_ and cost_ attributes. I checked the examples on several websites, but my code is no different. What can I do? Thank you for your help.
1)
from kmodes.kmodes import KModes
from kmodes.kprototypes import KPrototypes
kproto1 = KPrototypes(n_clusters=15, init='Cao').fit_predict(data,categorical = [23])
labels= kproto1.labels_
**AttributeError: 'numpy.ndarray' object has no attribute 'labels_'**
cost = []
range_cluster=[5,8,10,15,20,25,30,35,40,45,50,55,70,85,100]
for num_clusters in range_cluster:
    kproto = KPrototypes(n_clusters=num_clusters, init='Cao').fit_predict(data, categorical=[23])
    cost.append(kproto.cost_)
plt.plot(cost)
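Judging by the error message, fit_predict returned a numpy.ndarray rather than the fitted estimator, so there are 2 ways to achieve this:
fit_predict returns the predicted labels directly (check the source of your installed kmodes version if you get something else), so its return value is already the label array:
kproto1_labels = KPrototypes(n_clusters=15, init='Cao').fit_predict(data, categorical=[23])
labels = kproto1_labels
or, as the 2nd method, use fit, which returns the fitted estimator and keeps labels_ and cost_ available:
kproto1 = KPrototypes(n_clusters=15, init='Cao').fit(data, categorical=[23])
labels = kproto1.labels_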
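The same distinction between fit and fit_predict runs through the scikit-learn-style API, which kmodes appears to follow. The sketch below uses scikit-learn's KMeans purely as a stand-in (it is not KPrototypes, and inertia_ plays the role that cost_ plays there) on made-up numeric data. It shows that fit() returns the estimator while fit_predict() returns a plain label array, and how a cost curve can be collected from fitted estimators:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.normal(size=(60, 3))  # hypothetical numeric data

# fit() returns the estimator, so learned attributes are available.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)
labels_from_fit = model.labels_

# fit_predict() returns only the labels, as a NumPy array.
labels_only = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)
print(type(labels_only).__name__)

# Cost curve: keep the fitted estimator, then read its cost attribute.
cluster_counts = [2, 3, 4, 5]
costs = []
for k in cluster_counts:
    fitted = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    costs.append(fitted.inertia_)
print(costs)
```

Applied to the question's loop, this means calling fit(data, categorical=[23]) instead of fit_predict(...) before reading cost_.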

Pandas Dataframe AttributeError: 'DataFrame' object has no attribute 'design_info'

I am trying to use the predict() function of the statsmodels.formula.api OLS implementation. When I pass a new data frame to the function to get predicted values for an out-of-sample dataset, result.predict(newdf) returns the following error: 'DataFrame' object has no attribute 'design_info'. What does this mean and how do I fix it? The full traceback is:
p = result.predict(newdf)
File "C:\Python27\lib\site-packages\statsmodels\base\model.py", line 878, in predict
exog = dmatrix(self.model.data.orig_exog.design_info.builder,
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2088, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'design_info'
EDIT: Here is a reproducible example. The error appears to occur when I pickle and then unpickle the result object (which I need to do in my actual project):
import cPickle
import pandas as pd
import numpy as np
import statsmodels.formula.api as sm
df = pd.DataFrame({"A": [10,20,30,324,2353], "B": [20, 30, 10, 1, 2332], "C": [0, -30, 120, 11, 2]})
result = sm.ols(formula="A ~ B + C", data=df).fit()
print result.summary()
test1 = result.predict(df) #works
f_myfile = open('resultobject', "wb")
cPickle.dump(result, f_myfile, 2)
f_myfile.close()
print("Result Object Saved")
f_myfile = open('resultobject', "rb")
model = cPickle.load(f_myfile)
test2 = model.predict(df) #produces error
Pickling and unpickling of a pandas DataFrame doesn't save and restore attributes that have been attached by a user, as far as I know.
Since the formula information is currently stored together with the DataFrame of the original design matrix, this information is lost after unpickling a Results and Model instance.
If you don't use categorical variables and transformations, then the correct design matrix can be built with patsy.dmatrix. I think the following should work:
import patsy
x = patsy.dmatrix("B + C", data=df) # df is data for prediction
test2 = model.predict(x, transform=False)
Alternatively, constructing the design matrix for the prediction directly should also work. Note that we need to explicitly add the constant that the formula adds by default.
from statsmodels.api import add_constant
test2 = model.predict(add_constant(df[["B", "C"]]), transform=False)
If the formula and design matrix contain (stateful) transformations and categorical variables, then it's not possible to conveniently construct the design matrix without the original formula information. Constructing it by hand and doing all the calculations explicitly is difficult in this case, and loses all the advantages of using formulas.
The only real solution is to pickle the formula information design_info independently of the dataframe orig_exog.
