python and regression analysis

I am trying to do regression analysis in Python, but I am getting errors. Please help me.
I have already imported the modules below:
import pandas as pd
import numpy as np
from scipy import stats
from statsmodels.sandbox.regression.predstd import wls_prediction_std
import statsmodels.api as sm
import matplotlib.pyplot as plt
%pylab
and I got data like below:
data=pd.read_csv('file.csv',names=['storedate','amount','location'])
then I defined x and y like below:
x=data['amount']
y=data['location']
and I tried to do the code below
x = sm.add_constant(x, prepend=False)
but this raises the first error:
AttributeError: 'numpy.ndarray' object has no attribute 'name'
and I also got an error with the code below:
model = sm.OLS(y,x)
results = model.fit()
the message is:
can't multiply sequence by non-int of type 'float'

I think the error message "can't multiply sequence by non-int of type 'float'" appears because x and y are not numpy arrays. Use
x = np.array(data['amount'])
y = np.array(data['location'])
instead of your current definitions of x and y.
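
A minimal sketch of the conversion step, under stated assumptions: the contents of file.csv are not shown in the question, so the CSV text below is hypothetical, and the fit uses numpy's least-squares solver as a stand-in for sm.OLS so that the example needs only numpy and pandas. It also coerces the columns to floats, since an array of strings would still fail in the multiplication.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for file.csv; the real file is not shown in the question.
csv_text = (
    "2020-01-01,10.0,1\n"
    "2020-01-02,20.0,2\n"
    "2020-01-03,30.0,3\n"
    "2020-01-04,40.0,5\n"
)
data = pd.read_csv(io.StringIO(csv_text), names=["storedate", "amount", "location"])

# Coerce both columns to numbers; values that cannot be parsed become NaN and are dropped.
numeric = data[["amount", "location"]].apply(pd.to_numeric, errors="coerce").dropna()
x = numeric["amount"].to_numpy(dtype=float)
y = numeric["location"].to_numpy(dtype=float)

# Design matrix with a trailing column of ones, mirroring add_constant(x, prepend=False).
X = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares via numpy (stand-in for sm.OLS(y, X).fit()).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coef
print(slope, intercept)
```

Once x and y are float arrays like these, the same arrays can be passed on to the regression calls from the question.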

Related

'DataFrame' object has no attribute 'feature_names'

import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
%matplotlib inline
boston = load_boston()
print(boston.keys())
When I type this I get the output:
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])
so I know that feature_names is an attribute. However, when I type
boston.columns = boston.feature_names
the output is
'DataFrame' object has no attribute 'feature_names'

To convert the Boston sklearn dataset to a pandas DataFrame, use:
df = pd.DataFrame(boston.data,columns=boston.feature_names)
df['target'] = pd.Series(boston.target)
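
A self-contained sketch of the same conversion. It uses load_iris as a stand-in, because load_boston was removed in scikit-learn 1.2; the pattern is the same for any dataset object that exposes data, target, and feature_names.

```python
import pandas as pd
from sklearn.datasets import load_iris

# load_iris is a stand-in here; load_boston is no longer available in recent scikit-learn.
bunch = load_iris()

# The loader returns a Bunch (dict-like), not a DataFrame, so build the DataFrame explicitly.
df = pd.DataFrame(bunch.data, columns=bunch.feature_names)
df["target"] = bunch.target

print(df.head())
```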

I had something similar, also with scikit-learn, while building a random forest by following this tutorial:
https://www.datacamp.com/tutorial/random-forests-classifier-python
I stumbled upon this line of code:
import pandas as pd
feature_imp = pd.Series(clf.feature_importances_,**index=iris.feature_names**).sort_values(ascending=False)
feature_imp
I got an error from the bold part (between the **).
Thanks to the suggestions of @anky and @David Meu I tried:
feature_imp = pd.Series(clf.feature_importances_, index = dfNAN.columns.values).sort_values(ascending=False)
that results in the error:
ValueError: Length of values (4) does not match length of index (5)
so I tried:
feature_imp = pd.Series(clf.feature_importances_, index = dfNAN.columns.values[:4]).sort_values(ascending=False)
which works!
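
Slicing with [:4] only works if the four feature columns happen to come first. A hedged alternative is to keep an explicit list of the columns the model was trained on and reuse it as the index. The sketch below assumes an iris-like DataFrame with a hypothetical label column named "species"; it is not the dfNAN frame from the post.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target  # hypothetical label column name

# Keep one list of feature columns and use it both for training and for the index.
feature_cols = [c for c in df.columns if c != "species"]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(df[feature_cols], df["species"])

# The index now has exactly one entry per importance value.
feature_imp = pd.Series(clf.feature_importances_, index=feature_cols).sort_values(ascending=False)
print(feature_imp)
```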

sklearn python3 categorical_features unrecognized error

I am currently following along with a Machine Learning Full Course sponsored by Simplilearn to get a better understanding of regression, and am running into this error:
TypeError: __init__() got an unexpected keyword argument 'categorical_features'
From this code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
companies = pd.read_csv('Companies_1000.csv')
X = companies.iloc[:, :-1].values
X = companies.iloc[:, :4].values
companies.head()
cmap = sns.cm.rocket_r
sns.heatmap(companies.corr(), cmap = cmap)
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder = LabelEncoder()
X[:, 3] = labelencoder.fit_transform(X[:, 3])
onehotencoder = OneHotEncoder(categorical_features = [3])
X = onehotencoder.fit_transform(X).toarray()
print(X)
This is the csv file: https://raw.githubusercontent.com/boosuro/profit_estimation_of_companies/master/1000_Companies.csv
The video does not get the same error as me, so I assume it is outdated. However, after crawling through the sklearn docs, I have come up empty-handed for a solution. I am using Python 3. If you want to check out exactly the info and code that's happening in the video, here it is:
https://www.youtube.com/watch?v=9f-GarcDY58
My error appears around the 47:25 mark. Thank you for checking this out, and thanks for your answers.

The error is due to the following line
onehotencoder = OneHotEncoder(categorical_features = [3])
OneHotEncoder no longer has a parameter named "categorical_features" (it was deprecated in scikit-learn 0.20 and removed in 0.22). Instead there is "categories", where you can pass a list of categories. By default "categories" is set to "auto", which means the categories are determined automatically from the training data.
So you do not need to pass anything to OneHotEncoder(); just leave it like this.
Change the line as below:
onehotencoder = OneHotEncoder()
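
Note that a bare OneHotEncoder() encodes every column it is given. If the goal is still to encode only column 3 and leave the numeric columns untouched, one option is to wrap the encoder in a ColumnTransformer. The sketch below uses a small hypothetical array in place of the Companies data; the step name "state" is also an assumption.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows: three numeric columns followed by one categorical column.
X = np.array(
    [
        [1.0, 2.0, 3.0, "New York"],
        [4.0, 5.0, 6.0, "California"],
        [7.0, 8.0, 9.0, "New York"],
    ],
    dtype=object,
)

# One-hot encode column 3 only; pass the other columns through unchanged.
# sparse_threshold=0 asks for a dense array, so .toarray() is not needed.
ct = ColumnTransformer(
    [("state", OneHotEncoder(), [3])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_encoded = ct.fit_transform(X)
print(X_encoded)
```

The encoded columns come first in the output, followed by the passed-through columns. The LabelEncoder step is not required with this approach, since OneHotEncoder accepts string categories directly.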

TypeError: Cannot interpret 'torch.uint8' as a data type

Today I started learning PyTorch and I am stuck here. The piece of code in the comment raises this error:
TypeError: Cannot interpret 'torch.uint8' as a data type
To change the data type of the tensor I used quzu_torch = quzu_torch.type(torch.float), but this time I got this error:
TypeError: Cannot interpret 'torch.float32' as a data type
import numpy as np
import torch
from PIL import Image
from matplotlib import pyplot as plt
quzu = np.array(Image.open('wool.jpg').resize((224,224)))
quzu_torch = torch.from_numpy(quzu)
plt.imsave('quzucuq.jpg', quzu)
# quzu_torch = quzu_torch.type(torch.bool)
plt.imsave('quzucuq_tensor.jpg', quzu_torch)

Error when calling model.labels in KMeans

I am running this code
import pandas as pd
import numpy as np
from sklearn import cluster
from sklearn.cluster import KMeans
model = cluster.KMeans(n_clusters=4, random_state=10)
Then I fit it on a dataframe I am working on, which includes the columns age and income that I want to cluster on:
model.fit(df[['income', 'age']])
So far it works well, until I run the following bit, which aims at creating a column with the label of the cluster each data point belongs to.
df['cluster'] = model.labels_df.head()
And this is the error code I get:
AttributeError: 'KMeans' object has no attribute 'labels_df'
Any suggestions?

The attribute that holds the labels of the model is model.labels_.
By typing model.labels_df.head() you request the head of model.labels_df, which does not exist.
I believe you have mistyped it and need two separate statements:
df['cluster'] = model.labels_
df.head()
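
A runnable sketch of the corrected sequence. The income and age values are made up for illustration, and n_clusters is set to 2 to match this tiny dataset.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical data with two clearly separated groups.
df = pd.DataFrame(
    {
        "income": [20000, 22000, 21000, 90000, 95000, 92000],
        "age": [25, 27, 26, 50, 52, 51],
    }
)

model = KMeans(n_clusters=2, random_state=10, n_init=10)
model.fit(df[["income", "age"]])

# labels_ is a numpy array with one cluster id per row, in row order.
df["cluster"] = model.labels_
print(df.head())
```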

TypeError: only size-1 arrays can be converted to Python scalars + Solution

According to the Python documentation, TypeError is defined as:
exception TypeError
Raised when an operation or function is applied to an object of inappropriate type. The associated value is a string giving details about the type mismatch.
I got this error because my code looked like this:
import math as m
import pylab as pyl
import numpy as np
#normal distribution function
def normal(x,mu,sigma):
    P=(1/(m.sqrt(2*m.pi*sigma**2)))*(m.exp((-(x-mu)**2)/2*sigma**2))
    return P
#solution
x = np.linspace(-5,5,1000)
P = normal(x,0,1)
#plotting the function
pyl.plot(x,P)
pyl.show()
The line that raises the error is:
P=(1/(m.sqrt(2*m.pi*sigma**2)))*(m.exp((-(x-mu)**2)/2*sigma**2))
Notice the m. prefix. This is incorrect here, because functions from math can only handle scalars, and x is an array of 1000 values. That is why a TypeError is raised.
The np. (NumPy) functions can handle scalars as well as arrays, so using them solves the problem.

The right code looks like this:
import math as m
import pylab as pyl
import numpy as np
# normal distribution function
def normal(x,mu,sigma):
    # parentheses around 2*sigma**2 so the exponent is divided by the whole term
    P = (1/(np.sqrt(2*np.pi*sigma**2))) * (np.exp((-(x-mu)**2)/(2*sigma**2)))
    return P
# solution
x = np.linspace(-5,5,1000)
P = normal(x,0,1)
# plotting the function
pyl.plot(x,P)
pyl.show()
In the end we get a plot of the normal distribution function.
This Error occurred in Spyder IDE.
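
A small check of the point above, as a sketch rather than part of the original post: math.exp rejects the array, while the NumPy version evaluates element-wise. The function name normal_pdf and the choice sigma=2 are assumptions for illustration, and the exponent's denominator is written as (2 * sigma ** 2) so the formula also holds for sigma other than 1.

```python
import math

import numpy as np

x = np.linspace(-5, 5, 1000)

# math functions accept a single number, so passing a 1000-element array fails.
try:
    math.exp(x)
    raised = False
except TypeError as err:
    raised = True
    print("math.exp failed:", err)


def normal_pdf(x, mu, sigma):
    """Normal density evaluated element-wise with NumPy."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)


P = normal_pdf(x, 0.0, 2.0)

# Trapezoid-rule area under the curve on [-5, 5]; it should be a little below 1.
area = np.sum((P[1:] + P[:-1]) / 2 * np.diff(x))
print(area)
```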
