I am trying to run the K-Prototypes clustering algorithm. Both when I fit the model and when I try to plot the cost graph as follows, I always get a 'no attribute' error for the labels_ and cost_ attributes. I checked the examples on several websites, but I see no difference from my code. What can I do? Thank you for your help.
1)
from kmodes.kmodes import KModes
from kmodes.kprototypes import KPrototypes
kproto1 = KPrototypes(n_clusters=15, init='Cao').fit_predict(data,categorical = [23])
labels= kproto1.labels_
AttributeError: 'numpy.ndarray' object has no attribute 'labels_'
cost = []
range_cluster=[5,8,10,15,20,25,30,35,40,45,50,55,70,85,100]
for num_clusters in range_cluster:
    kproto = KPrototypes(n_clusters=num_clusters, init='Cao').fit_predict(data, categorical=[23])
    cost.append(kproto.cost_)
plt.plot(cost)
According to the source code, there are two ways to get the labels:
The fit_predict method returns the array of cluster labels itself, not the fitted model, which is why the error mentions numpy.ndarray. So its return value already is the labels:
labels = KPrototypes(n_clusters=15, init='Cao').fit_predict(data, categorical=[23])
The second way is to use the fit method, which returns the fitted model, so labels_ and cost_ can be read from it:
kproto1 = KPrototypes(n_clusters=15, init='Cao').fit(data,categorical = [23])
labels = kproto1.labels_
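The difference between the two calls can be sketched without kmodes at all. ToyClusterer below is a hypothetical stand-in that only mimics the estimator convention (fit returns the model, fit_predict returns the label array). It is not the KPrototypes implementation, and its cost is a plain sum of squared distances chosen for illustration.

```python
import numpy as np

class ToyClusterer:
    """Hypothetical stand-in for an estimator such as KPrototypes."""

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Toy rule: sort the 1-D values and cut them into n_clusters chunks.
        order = np.argsort(X)
        labels = np.empty(len(X), dtype=int)
        for k, idx in enumerate(np.array_split(order, self.n_clusters)):
            labels[idx] = k
        self.labels_ = labels
        # Toy cost: squared distance of every point to its cluster mean.
        self.cost_ = float(sum(
            ((X[labels == k] - X[labels == k].mean()) ** 2).sum()
            for k in range(self.n_clusters)
        ))
        return self  # fit returns the fitted model

    def fit_predict(self, X):
        return self.fit(X).labels_  # fit_predict returns a plain array

data = [1.0, 1.2, 0.9, 5.0, 5.1, 9.8, 10.0, 10.3]

# The array returned by fit_predict has no cost_ or labels_ attribute.
labels = ToyClusterer(n_clusters=3).fit_predict(data)
print(type(labels).__name__, hasattr(labels, "cost_"))

# For the cost curve, keep the fitted model instead of the labels.
cost = []
for k in [1, 2, 3, 4]:
    model = ToyClusterer(n_clusters=k).fit(data)
    cost.append(model.cost_)
print(cost)
```

Applied to the question's loop, the same pattern would mean calling fit instead of fit_predict inside the loop and appending the model's cost_.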
I am trying to run a logistic regression, predicting income based on age, num, and hours-per-week. The income column consists of either <=50K or >50K. I tried to replace the categorical data with numeric values using the pandas map() function and received the error:
'DataFrame' object has no attribute 'map'. Then I tried adding rdd (as shown below), but I get the error:
'DataFrame' object has no attribute 'rdd'
import pandas as pd
import statsmodels.api as sm
adult_train = pd.read_csv("C:/.../adult_training.csv")
adult_test = pd.read_csv("C:/.../adult_test.csv")
# Separate data into predictor variables, X, and target variables, y:
X = pd.DataFrame(adult_train[['age', 'hours-per-week', 'num']])
X = sm.add_constant(X)
y = pd.DataFrame(adult_train[['income']]).rdd.map({'<=50K': 0, '>50K': 1}).astype(int)
logreg01 = sm.Logit(y, X).fit()
If you could help me get the last line of code to run, I would really appreciate it.
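One possible fix, sketched on a small made-up frame rather than the real adult_training.csv: rdd is an attribute of Spark DataFrames, not pandas ones, and in pandas the dictionary form of map is a Series method. Selecting the column with single brackets gives a Series, so the mapping can be applied directly. The three rows below are invented for illustration.

```python
import pandas as pd

# Made-up rows standing in for adult_training.csv (illustrative only).
adult_train = pd.DataFrame({
    "age": [39, 50, 38],
    "income": ["<=50K", ">50K", "<=50K"],
})

# Single brackets return a Series, which supports map() with a dict.
y = adult_train["income"].map({"<=50K": 0, ">50K": 1}).astype(int)
print(y.tolist())
```

If the labels in the real file carry stray whitespace, map would produce NaN for them and astype(int) would fail; stripping the strings first (for example with .str.strip()) would be one way around that.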
I want to write a custom loss function in Keras that compares the gradients of two images. So I wrote code like this:
def mean_gradient_error(y_true,y_pred):
    alpha = 0.6
    if not B.is_tensor(y_pred):
        y_pred = B.constant(y_pred)
    y_true = B.cast(y_true, y_pred.dtype)
    yt_grad = B_.tf.image.image_gradients(y_true)
    yp_grad = B_.tf.image.image_gradients(y_pred)
    dotprod = B.mean(1-B.sum(y_pred*y_true,axis=-1))
    grad_diff = (yt_grad-yp_grad)
    gerr = B.mean(grad_diff**2,axis=-1)
    return (1-alpha)*dotprod+alpha*gerr
Here dotprod works fine, but grad_diff raises an error like this:
TypeError: unsupported operand type(s) for -: 'tuple' and 'tuple'
I think Keras is telling me that I should change the type of yp_grad and yt_grad, but I don't see how.
What code should I add?
The documentation of tf.image.image_gradients states:
Returns
Pair of tensors (dy, dx) holding the vertical and horizontal image gradients (1-step finite difference).
This function returns a tuple. You can't subtract tuples.
One possibility is to compute the diff for x and y, and then sum the two errors.
yt_grad = B_.tf.image.image_gradients(y_true)
yp_grad = B_.tf.image.image_gradients(y_pred)
grad_diff_y = (yt_grad[0]-yp_grad[0])  # index 0 is dy
grad_diff_x = (yt_grad[1]-yp_grad[1])  # index 1 is dx
gerr = B.mean(grad_diff_y**2,axis=-1) + B.mean(grad_diff_x**2,axis=-1)
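The same idea can be checked outside of Keras. In the sketch below, image_gradients_np is a hypothetical NumPy stand-in for tf.image.image_gradients on a single 2-D array (an assumption for illustration, not the TensorFlow function); it returns a (dy, dx) tuple, so the differences have to be taken element by element of the tuple.

```python
import numpy as np

def image_gradients_np(img):
    """NumPy stand-in (not the TensorFlow function) returning a
    (dy, dx) tuple of 1-step finite differences for a 2-D array."""
    dy = np.zeros_like(img)
    dx = np.zeros_like(img)
    dy[:-1, :] = img[1:, :] - img[:-1, :]   # vertical differences
    dx[:, :-1] = img[:, 1:] - img[:, :-1]   # horizontal differences
    return dy, dx

y_true = np.array([[0.0, 1.0], [2.0, 4.0]])
y_pred = np.array([[0.0, 2.0], [2.0, 2.0]])

yt_grad = image_gradients_np(y_true)
yp_grad = image_gradients_np(y_pred)

# yt_grad - yp_grad would raise TypeError because both are tuples,
# so subtract the dy parts and the dx parts separately.
grad_diff_y = yt_grad[0] - yp_grad[0]
grad_diff_x = yt_grad[1] - yp_grad[1]
gerr = np.mean(grad_diff_y ** 2) + np.mean(grad_diff_x ** 2)
print(gerr)
```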
I am trying to use setTermCriteria with SVM, but when I do, I get the error below:
AttributeError: 'cv2.ml_SVM' object has no attribute 'setTermCritera_MAX_ITER'
This is how I am using it:
svm.setTermCritera_MAX_ITER=10000
svm.setTermCriteria_EPS = 1e-3
I do not get an error when I use it the way shown below, but it has no useful effect:
cv2.setTermCritera_MAX_ITER=10000
cv2.setTermCriteria_EPS = 1e-3
When I try the call below:
svm.setTermCriteria(10000)
I get:
SystemError: new style getargs format but argument is not a tuple
What is the right way to use it in Python with OpenCV?
The error message is clear: a tuple is needed. Let's look at the default value:
svm = cv2.ml.SVM_create()
svm.getTermCriteria()
This returns (3, 1000, 1.1920928955078125e-07). So if you want to set only the maximum number of iterations, you should call:
svm.setTermCriteria((cv2.TermCriteria_MAX_ITER, 10000, 0))
And if you want to keep the same epsilon criterion and also set the maximum number of iterations:
svm.setTermCriteria((cv2.TermCriteria_MAX_ITER + cv2.TermCriteria_EPS, 10000, 1.1920928955078125e-07))
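The tuple has the layout (type, max_iter, epsilon), where type is a bit mask. The sketch below decodes the default value without importing OpenCV. The constants MAX_ITER = 1 and EPS = 2 are written out by hand on the assumption that they match cv2.TermCriteria_MAX_ITER and cv2.TermCriteria_EPS, which is consistent with the default type of 3 shown above; describe is a hypothetical helper, not part of OpenCV.

```python
# Assumed values of cv2.TermCriteria_MAX_ITER and cv2.TermCriteria_EPS.
MAX_ITER = 1
EPS = 2

def describe(criteria):
    """Split a (type, max_iter, epsilon) tuple into readable parts."""
    kind, max_iter, epsilon = criteria
    return {
        "uses_max_iter": bool(kind & MAX_ITER),
        "uses_eps": bool(kind & EPS),
        "max_iter": max_iter,
        "epsilon": epsilon,
    }

# Value returned by svm.getTermCriteria() in the answer above.
default = (3, 1000, 1.1920928955078125e-07)
print(describe(default))

# Same epsilon, more iterations: the tuple one would pass to setTermCriteria.
new_criteria = (MAX_ITER + EPS, 10000, default[2])
print(new_criteria)
```

The default epsilon equals 2**-23, the machine epsilon of a single-precision float.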
I am using scikit-learn for the affinity propagation algorithm. My input data is a NumPy array of size 2303*2303. It is a similarity matrix. I want to calculate the distance of each element in a cluster to its centroid. When I try to print the labels, I get the following error:
"'AffinityPropagation' object has no attribute 'label_'". Here is the code:
clusterer = AffinityPropagation(affinity = 'precomputed')
af = clusterer.fit(l2)
print af.label_
I am getting the following error:
AttributeError: 'AffinityPropagation' object has no attribute 'label_'
Thanks.
According to the docs of AffinityPropagation, the attribute is called labels_ (with an s), so you have to type
print(af.labels_)
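Once labels_ is available, the similarity of each point to its own exemplar can be read straight from the precomputed matrix. This is a sketch on made-up arrays: labels and centers stand in for af.labels_ and af.cluster_centers_indices_, and S stands in for l2. Since the input is a similarity matrix, the values looked up are similarities, not distances; how to convert them depends on how the similarities were defined.

```python
import numpy as np

# Made-up 4x4 similarity matrix (stand-in for l2); higher means more similar.
S = np.array([
    [ 0.0, -1.0, -9.0, -8.0],
    [-1.0,  0.0, -7.0, -6.0],
    [-9.0, -7.0,  0.0, -2.0],
    [-8.0, -6.0, -2.0,  0.0],
])
labels = np.array([0, 0, 1, 1])   # stand-in for af.labels_
centers = np.array([0, 3])        # stand-in for af.cluster_centers_indices_

# Index of the exemplar that each point was assigned to.
exemplar_of_point = centers[labels]
# Similarity between each point and its own exemplar.
sim_to_exemplar = S[np.arange(len(labels)), exemplar_of_point]
print(sim_to_exemplar)
```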
I am trying to use the predict() function of the statsmodels.formula.api OLS implementation. When I pass a new data frame to the function to get predicted values for an out-of-sample dataset, result.predict(newdf) raises the following error: 'DataFrame' object has no attribute 'design_info'. What does this mean and how do I fix it? The full traceback is:
p = result.predict(newdf)
File "C:\Python27\lib\site-packages\statsmodels\base\model.py", line 878, in predict
exog = dmatrix(self.model.data.orig_exog.design_info.builder,
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2088, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'design_info'
EDIT: Here is a reproducible example. The error appears to occur when I pickle and then unpickle the result object (which I need to do in my actual project):
import cPickle
import pandas as pd
import numpy as np
import statsmodels.formula.api as sm
df = pd.DataFrame({"A": [10,20,30,324,2353], "B": [20, 30, 10, 1, 2332], "C": [0, -30, 120, 11, 2]})
result = sm.ols(formula="A ~ B + C", data=df).fit()
print result.summary()
test1 = result.predict(df) #works
f_myfile = open('resultobject', "wb")
cPickle.dump(result, f_myfile, 2)
f_myfile.close()
print("Result Object Saved")
f_myfile = open('resultobject', "rb")
model = cPickle.load(f_myfile)
test2 = model.predict(df) #produces error
Pickling and unpickling of a pandas DataFrame doesn't save and restore attributes that have been attached by a user, as far as I know.
Since the formula information is currently stored together with the DataFrame of the original design matrix, this information is lost after unpickling a Results and Model instance.
If you don't use categorical variables or transformations, then the correct design matrix can be built with patsy.dmatrix. I think the following should work:
import patsy
x = patsy.dmatrix("B + C", data=df) # df is data for prediction
test2 = model.predict(x, transform=False)
Constructing the design matrix for the prediction directly should also work. Note that we need to explicitly add the constant that the formula adds by default:
from statsmodels.api import add_constant
test2 = model.predict(add_constant(df[["B", "C"]]), transform=False)
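What add_constant contributes here can be sketched with NumPy alone: the formula "A ~ B + C" implies an intercept, so the design matrix needs a leading column of ones followed by B and C. The snippet only builds that matrix and does not call statsmodels; it assumes the column order (constant, B, C) matches the order of the fitted parameters.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"B": [20, 30, 10, 1, 2332], "C": [0, -30, 120, 11, 2]})

# Leading column of ones for the intercept, then the regressors B and C.
exog = np.column_stack([np.ones(len(df)), df["B"], df["C"]])
print(exog.shape)
print(exog[0])
```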
If the formula and design matrix contain (stateful) transformations and categorical variables, then it's not possible to conveniently construct the design matrix without the original formula information. Constructing it by hand and doing all the calculations explicitly is difficult in this case, and loses all the advantages of using formulas.
The only real solution is to pickle the formula information design_info independently of the dataframe orig_exog.