i am trying to learn about decision trees and I found a tutorial about the subject. The goal of the tutorial is to decide if a flower is a special type of flower called iris but i seem to run into some errors that i hope somebody got the answer to i get two errors like the following:
iris: Bunch
iris: inner_f
Instance of 'tuple' has no 'target' member
and
iris: Bunch
iris: inner_f
Instance of 'tuple' has no 'data' member
here is the code:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
#load iris data
iris = datasets.load_iris()
x = iris.data
y = iris.target
d = [{"sepal_length":row[0],
"sepal_width":row[1],
"petal_length":row[2],
"petal_width":row[3]} for row in x]
df = pd.DataFrame(d) # construct dataframe
df["types"] = y # assign types
df = df.sample(frac=1.0) # random shuffle rows
df.head()
Is there anybody that knows why i get these errors?
Related
when i try to define X and Y from my dataset that is already defined and I made some analysis based on it and i dont have any problem.
but when i start to define (X) and (Y), an error message appear that "NameError: name 'MyNewDataSet' is not defined
the dataset name is "MyNewDataSet"
do I need to define a new dataset before assign the values to X and Y? or what should I do
Here is how I define X and Y from my dataframe:
#Create X
X = MyNewDataSet.drop(['y'], axis=1)
For X I am using multiple columns so all I do is remove the column I will be using for my y variable.
#Create y
y = MyNewDataSet['y']
Here I create Y by assigning the columns y to the variable.
If this does not work please share some of your code. That way it might be easier for us to visualize your problem, but I hope this can help.
there is my code to define my new dataset
###KNN imputation to fill missing values.
import numpy as np
from numpy import isnan
from sklearn.impute import KNNImputer
imputer = KNNImputer(n_neighbors=5)
imputer.fit(MyDataSet)
MyNewDataSet = pd.DataFrame(imputer.transform(MyDataSet), columns = MyDataSet.columns)
MyNewDataSet.set_index(MyDataSet.index, inplace= True)
MyNewDataSet = MyNewDataSet.astype(MyDataSet.dtypes.to_dict())
MyNewDataSet
and here is my code to assign values to X and Y
### Split the dataset into a set of features (X) and Target variable (y)
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import HuberRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
y=MyNewDataSet["Target"]
x=MyNewDataSet.drop("Target", axis=1)
i am trying to learn about decision trees and I ended up finding a article about decision trees. The goal of the article is to decide if a flower is a iris flower or not but i seem to run into some errors that i hope somebody got the answer to i get two errors like the following:
iris: Bunch iris: inner_f Instance of 'tuple' has no 'target' member
and
iris: Bunch iris: inner_f Instance of 'tuple' has no 'data' member
i get these errors at the x = iris.data line and at the y = iris.target line.
Here is the code:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
#load iris data
iris = datasets.load_iris()
x = iris.data
y = iris.target
d = [{"sepal_length":row[0],
"sepal_width":row[1],
"petal_length":row[2],
"petal_width":row[3]} for row in x]
df = pd.DataFrame(d) # construct dataframe
df["types"] = y # assign types
df = df.sample(frac=1.0) # random shuffle rows
df.head()
Is there anybody that knows why i get these errors?
Your error message indicates that the problematic value iris is a tuple, which doesn't have the attributes you're referencing. Check the documentation for the tools you're using; they should explain how to unpack datasets.load_iris() into the objects you need.
I would not filter warnings in most cases, as you get useful information from the warnings.
So, the sklearn datasets format is a Bunch, which is a specialized container object that works like a dictionary. You can access it with dot notation, e.g. iris.data or dictionary notation, e.g. iris['data']. Here, it is unclear what the error is on your machine, as I (like other commenters) had no problem accessing iris.data or iris['data'] in python 3.8.5.
I wanted to let you know a couple of places to improve your approach:
(1) It is unclear why you need to construct a dataframe as you can get the samples you need directly from calling train_test_split on the concatenated numpy arrays or you can get a random sample of indices from the numpy arrays directly.
(2) Your method for constructing the dataframe is more complex than it needs to be.
import pandas as pd
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
# load iris data
iris = datasets.load_iris()
# train test split
X_train, y_train, X_test, y_test = train_test_split(iris.data, iris.target)
# random shuffle of data/target indices
rng = np.random.default_rng()
rng_size = iris.data.shape[0]
idx_sample = rng.choice(np.arange(rng_size), size=rng_size, replace=False)
# simpler way to create dataframe
# concatenate along the columns (axis 1)
# then set the column names in one place
df = pd.concat([pd.DataFrame(iris.data), pd.DataFrame(iris.target)], axis=1)
df.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "types"]
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
%matplotlib inline
boston = load_boston()
print(boston.keys())
When I type this I get the output:
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])
so I know that feature_names is an attribute. However, when I type
boston.columns = boston.feature_names
the ouput comes as
'DataFrame' object has no attribute 'feature_names'
To convert boston sklearn dataset to pandas Dataframe use:
df = pd.DataFrame(boston.data,columns=boston.feature_names)
df['target'] = pd.Series(boston.target)
I had something similar. Also with scikitlearn to make a random forest with this tutorial:
https://www.datacamp.com/tutorial/random-forests-classifier-python
I stumbled upon this line of code:
import pandas as pd
feature_imp = pd.Series(clf.feature_importances_,**index=iris.feature_names**).sort_values(ascending=False)
feature_imp
I got an error from the bold part (between the **).
Thanks to the suggestions of #anky and #David Meu I tried:
feature_imp = pd.Series(clf.feature_importances_, index = dfNAN.columns.values).sort_values(ascending=False)
that results in the error:
ValueError: Length of values (4) does not match length of index (5)
so I tried:
feature_imp = pd.Series(clf.feature_importances_, index = dfNAN.columns.values[:4]).sort_values(ascending=False)
which works!
I have imported values into python from a PostgreSQL DB.
data = cur.fetchall()
The list is like this:-
[('Ending Crowds', 85, Decimal('50.49')), ('Salute Apollo', 73, Decimal('319.93'))][0]
I need to give 85 as X & Decimal('50.49') as Y in LinearRegression model
Then I imported packages & class-
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
I provide data & perform linear regression -
X = data.iloc[:, 1].values.reshape(-1, 1)
Y = data.iloc[:, 2].values.reshape(-1, 1)
linear_regressor = LinearRegression() # create object for the class
linear_regressor.fit(X, Y) # perform linear regression
I am getting the error-
AttributeError: 'list' object has no attribute 'iloc'
I am a beginner to pyhon and started just 2 days back but need to do linear regression in python at my job for a project. I think iloc can't be used for list object. But, not able to figure out as to how to pass on X & Y values to linear_regressor. All the examples performing Linear Regression on sites are using .CSV. Please help me out.
No, you can't use .iloc on 'list', it is for dataframe.
convert it into dataframe and try using .iloc
Your solution is below, please approve it if it is correct.
Because it's my 1st answer on StackOverflow
import pandas as pd
from decimal import Decimal
from sklearn.linear_model import LinearRegression
#I don't know what that "[0]" in your list,because I haven't used data fetched from PostgreSQL. Anyway remove it first and store it in temp
temp=[('Ending Crowds', 85, Decimal('50.49')), ('Salute Apollo', 73, Decimal('319.93'))]
#I don't know it really needed or not
var = list(var)
data = []
#It is to remove "Decimal" word
for row in var:
data.append(list(map(str, list(row))))
data=pd.DataFrame(data,columns=["no_use","X","Y"])
X=data['X'].values.reshape(-1, 1)
Y=data['Y'].values.reshape(-1, 1)
print(X,Y)
linear_regressor = LinearRegression() # create object for the class
linear_regressor.fit(X, Y) # perform linear regression
I am running this code
import pandas as np
import numpy as np
from sklearn import cluster
from sklearn.cluster import KMeans
model = cluster.KMeans(n_clusters=4, random_state=10)
Then I put that through a dataframe I am working on and that includes the columns age and income, which is the clusters I am working on,
model.fit(df[['income', 'age']]
And so far it works well until I run the following bit, which aims at creating a column with the label of the cluster each data point belongs to.
df['cluster'] = model.labels_df.head()
And this is the error code I get:
AttributeError: 'KMeans' object has no attribute 'labels_df'
Any suggestions?
The attribute to access the labels of the model is: model.labels_
Use:
df['cluster'] = model.labels_
By typing model.labels_df.head() you request the head of model.labels_df that does not exist.
I believe you have mistyped it and you need:
df['cluster'] = model.labels_
df.head()