import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
%matplotlib inline
boston = load_boston()
print(boston.keys())
When I type this I get the output:
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])
so I know that feature_names is an attribute. However, when I type
boston.columns = boston.feature_names
the ouput comes as
'DataFrame' object has no attribute 'feature_names'
To convert boston sklearn dataset to pandas Dataframe use:
df = pd.DataFrame(boston.data,columns=boston.feature_names)
df['target'] = pd.Series(boston.target)
I had something similar. Also with scikitlearn to make a random forest with this tutorial:
https://www.datacamp.com/tutorial/random-forests-classifier-python
I stumbled upon this line of code:
import pandas as pd
feature_imp = pd.Series(clf.feature_importances_,**index=iris.feature_names**).sort_values(ascending=False)
feature_imp
I got an error from the bold part (between the **).
Thanks to the suggestions of #anky and #David Meu I tried:
feature_imp = pd.Series(clf.feature_importances_, index = dfNAN.columns.values).sort_values(ascending=False)
that results in the error:
ValueError: Length of values (4) does not match length of index (5)
so I tried:
feature_imp = pd.Series(clf.feature_importances_, index = dfNAN.columns.values[:4]).sort_values(ascending=False)
which works!
Related
i am trying to learn about decision trees and I found a tutorial about the subject. The goal of the tutorial is to decide if a flower is a special type of flower called iris but i seem to run into some errors that i hope somebody got the answer to i get two errors like the following:
iris: Bunch
iris: inner_f
Instance of 'tuple' has no 'target' member
and
iris: Bunch
iris: inner_f
Instance of 'tuple' has no 'data' member
here is the code:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
#load iris data
iris = datasets.load_iris()
x = iris.data
y = iris.target
d = [{"sepal_length":row[0],
"sepal_width":row[1],
"petal_length":row[2],
"petal_width":row[3]} for row in x]
df = pd.DataFrame(d) # construct dataframe
df["types"] = y # assign types
df = df.sample(frac=1.0) # random shuffle rows
df.head()
Is there anybody that knows why i get these errors?
I am running this code
import pandas as np
import numpy as np
from sklearn import cluster
from sklearn.cluster import KMeans
model = cluster.KMeans(n_clusters=4, random_state=10)
Then I put that through a dataframe I am working on and that includes the columns age and income, which is the clusters I am working on,
model.fit(df[['income', 'age']]
And so far it works well until I run the following bit, which aims at creating a column with the label of the cluster each data point belongs to.
df['cluster'] = model.labels_df.head()
And this is the error code I get:
AttributeError: 'KMeans' object has no attribute 'labels_df'
Any suggestions?
The attribute to access the labels of the model is: model.labels_
Use:
df['cluster'] = model.labels_
By typing model.labels_df.head() you request the head of model.labels_df that does not exist.
I believe you have mistyped it and you need:
df['cluster'] = model.labels_
df.head()
I have used the following code to convert the sk learn breast cancer data set to data frame : I am not getting the output ? I am very new in python and not able to figure out what is wrong.
def answer_one():
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
data = numpy.c_[cancer.data, cancer.target]
columns = numpy.append(cancer.feature_names, ["target"])
return pandas.DataFrame(data, columns=columns)
answer_one()
Use pandas
There was a great answer here: How to convert a Scikit-learn dataset to a Pandas dataset?
The keys in bunch object give you an idea about which data you want to make columns for.
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df['target'] = pd.Series(cancer.target)
The following code works
def answer_one():
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
data = np.c_[cancer.data, cancer.target]
columns = np.append(cancer.feature_names, ["target"])
return pd.DataFrame(data, columns=columns)
answer_one()
The reason why your code doesn't work before was you try to call numpy and pandas package again after defining it as np and pd respectively.
However, i suggest that the package loading and redefinition is done at the beginning of the script, outside a function definition.
As of scikit-learn 0.23 you can do the following to get a DataFrame and save some keystrokes:
df = load_breast_cancer(as_frame=True)
df.frame
dataframe = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
dataframe['target'] = cancer.target
return dataframe
I am trying to do regression analysis in python, but there are errors. Please help me.
I already imported modules below:
import pandas as pd
import numpy as np
from scipy import stats
from statsmodels.sandbox.regression.predstd import wls_prediction_std
import statsmodels.api as sm
import matplotlib.pyplot as plt
%pylab
and I got data like below:
data=pd.read_csv('file.csv',names['storedate','amount','location'])
then I defined x and y like below:
x=data['amount']
y=data['location']
and I tried to do the code below
x = sm.add_constant(x, prepend=False)
but here is an first error like below:
AttributeError: 'numpy.ndarray' object has no attribute 'name'
and I also got an error with the code below:
model = sm.OLS(y,x)
results = model.fit()
the message is:
can't multiply sequence by non-int of type 'float'
I think, the error message "can't multiply sequence by non-int of type 'float'" appears, because x and y are not numpy arrays. Use
x = np.array(data['amount'])
y = np.array(data['location'])
instead of your current definition of x and y.
I'm trying to run code provided by yhat in their article about random forests in Python, but I keep getting following error message:
File "test_iris_with_rf.py", line 11, in <module>
df['species'] = pd.Factor(iris.target, iris.target_names)
AttributeError: 'module' object has no attribute 'Factor'
Code:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
print df
print iris.target_names
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
df['species'] = pd.Factor(iris.target, iris.target_names)
df.head()
In newer versions of pandas, the Factor is called Categorical instead. Change your line to:
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
Categorical variables seems to be one of the more active areas of development in pandas, so I believe it's changed yet again in pandas 0.15.0 :
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
(I lacked sufficient reputation to add this as a comment on David Robinson's answer)
def factor(series):
#input should be a pandas series object
dic = {}
for i,val in enumerate(series.value_counts().index):
dic[val] = i
return [ dic[val] for val in series.values ]