ValueError: could not convert string to float: 'ID1' - python
import pandas as pd
from sklearn.linear_model import LinearRegression
data = {
    'ID': ['ID1', 'ID2', 'ID3', 'ID4', 'ID5'],
    'RMSE': [10.05616902165789, 9.496130901397015, 9.857060740380899, 9.528204292426823, 9.491117416326155]
}
df = pd.DataFrame(data)
X = df[['ID']]
y = df['RMSE']
reg = LinearRegression().fit(X, y)
preds = reg.predict(X)
mean_pred = preds.mean()
print('Mean of predicted RMSE values:', mean_pred)
How do I resolve this error?
You are getting the error because your ID column contains only str objects, which cannot be converted to float. Every column in X must be numeric for LinearRegression to work.
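If the digits inside the IDs are actually meaningful as a feature (an assumption about your data, not something the error tells you), a minimal sketch of one way past the error is to strip the 'ID' prefix before fitting; if the IDs are just arbitrary labels, it usually makes more sense to drop the column or encode it instead:

import pandas as pd
from sklearn.linear_model import LinearRegression

data = {
    'ID': ['ID1', 'ID2', 'ID3', 'ID4', 'ID5'],
    'RMSE': [10.05616902165789, 9.496130901397015, 9.857060740380899, 9.528204292426823, 9.491117416326155]
}
df = pd.DataFrame(data)

# Pull the numeric part out of the ID string so X is numeric.
# Assumes every ID matches the pattern 'ID<number>'.
df['ID_num'] = df['ID'].str.extract(r'(\d+)', expand=False).astype(int)

X = df[['ID_num']]
y = df['RMSE']

reg = LinearRegression().fit(X, y)
print('Mean of predicted RMSE values:', reg.predict(X).mean())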
Related
LinearRegression TypeError
The above screenshot is referred to as sample.xlsx. I've been having trouble getting the beta for each stock using the LinearRegression() function.

Input:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_excel('sample.xlsx')
mean = df['ChangePercent'].mean()

for index, row in df.iterrows():
    symbol = row['stock']
    perc = row['ChangePercent']
    x = np.array(perc).reshape((-1, 1))
    y = np.array(mean)
    model = LinearRegression().fit(x, y)
    print(model.coef_)

Output:

Line 16: model = LinearRegression().fit(x, y)
"Singleton array %r cannot be considered a valid collection." % x
TypeError: Singleton array array(3.34) cannot be considered a valid collection.

How can I make the collection valid so that I can get a beta value (model.coef_) for each stock?
x and y must contain the same number of samples, so you need to reshape y as well, to 1 row and 1 column. In this case that comes down to:

np.array(mean).reshape(-1, 1) or np.array(mean).reshape(1, 1)

Given that you are training 5 models, each one on just one sample, it is not surprising that all 5 models "learn" that the coefficient of the linear regression is 0 and the intercept is the mean, 3.34 (y).

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "stock": ["ABCD", "XYZ", "JK", "OPQ", "GHI"],
    "ChangePercent": [-1.7, 30, 3.7, -15.3, 0]
})

mean = df['ChangePercent'].mean()

for index, row in df.iterrows():
    symbol = row['stock']
    perc = row['ChangePercent']
    x = np.array(perc).reshape(-1, 1)
    y = np.array(mean).reshape(-1, 1)
    model = LinearRegression().fit(x, y)
    print(f"{model.intercept_} + {model.coef_}*{x} = {y}")

This is correct from an algorithmic point of view, but it doesn't make any practical sense, given that you're only providing one example to train each model.
Converting strings to numpy arrays to run a for loop on dataframe linear regression
import datetime
import numpy as np
import pandas as pd  # needed for pd.read_excel below
from sklearn.linear_model import LinearRegression
import os

os.chdir(r'path')
df = pd.read_excel("test.xlsx")
print(df)

def process_date(date_str):
    # the date format is month-day-year
    d = datetime.datetime.strptime(date_str, '%m-%d-%Y')
    return d.timestamp()

df['Date'] = df['Date'].apply(process_date)
df.head()

for (columnName, columnData) in df.iteritems():
    cn = columnName
    cd = columnData
    print('Column Name : ', columnName)
    print('Column Contents : ', columnData.values)
    X = np.array(cn).reshape(-1, 1)
    y = cd.to_numpy()
    model = LinearRegression()
    model.fit(X, y)

ValueError: Unable to convert array of bytes/strings into decimal numbers with dtype='numeric'

Here is my dataset
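The error here most likely comes from X = np.array(cn).reshape(-1, 1): cn is the column name, a string, so the model is being fit on text rather than numbers. A hedged sketch of what the loop presumably intends, regressing each numeric column against the converted Date column (the goal and the stand-in data below are assumptions):

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for test.xlsx: a Date column (already converted to
# timestamps) plus numeric series to regress against it.
df = pd.DataFrame({
    "Date": [1.60e9, 1.65e9, 1.70e9],
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 4.0, 6.0],
})

X = df[["Date"]].to_numpy()                 # numeric feature matrix, one column

for column_name, column_data in df.drop(columns="Date").items():
    y = column_data.to_numpy()              # one numeric target per loop pass
    model = LinearRegression().fit(X, y)
    print(column_name, model.coef_, model.intercept_)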
could not convert string to float: 'Runny_nose'
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

Disease_data = pd.read_csv("Disease_dataset.csv")
X = Disease_data.drop(columns='Diseases')
y = Disease_data['Diseases']
model = DecisionTreeClassifier()
model.fit(X, y)

I get this error:

ValueError: could not convert string to float: 'Runny_nose'

I tried
Disease_data = Disease_data['Diseases'].astype(float)
and
music_data = pd.to_numeric(music_data, errors='coerce')
but instead I get empty columns.
Some of your rows might not contain valid float data. Visit this thread for more info.
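That said, in this example the more likely cause is that the feature columns themselves contain strings such as 'Runny_nose', which DecisionTreeClassifier cannot consume directly. A minimal sketch of one common fix, one-hot encoding the feature columns with pd.get_dummies before fitting (the column names and values below are made up for illustration):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for Disease_dataset.csv: string-valued symptom columns.
Disease_data = pd.DataFrame({
    "Symptom_1": ["Runny_nose", "Cough", "Runny_nose"],
    "Symptom_2": ["Fever", "Fever", "Headache"],
    "Diseases": ["Cold", "Flu", "Cold"],
})

X = pd.get_dummies(Disease_data.drop(columns="Diseases"))  # strings -> 0/1 indicator columns
y = Disease_data["Diseases"]                               # the target can stay as strings

model = DecisionTreeClassifier()
model.fit(X, y)
print(model.predict(X.iloc[[0]]))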
cannot concatenate object of type "<class 'numpy.ndarray'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
My input data is under the form:

gold,Program,MethodType,CallersT,CallersN,CallersU,CallersCallersT,CallersCallersN,CallersCallersU,CalleesT,CalleesN,CalleesU,CalleesCalleesT,CalleesCalleesN,CalleesCalleesU,CompleteCallersCallees,classGold
T,chess,Inner,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,-1,Low,1,Trace,
N,chess,Inner,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,1,NoTrace,
N,chess,Inner,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,1,NoTrace,
N,chess,Inner,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,1,Trace,
N,chess,Inner,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,1,NoTrace,
N,chess,Inner,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,1,Trace,
N,chess,Inner,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,1,Trace,
N,chess,Inner,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,1,NoTrace,
N,chess,Inner,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,0,Trace,
N,chess,Inner,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,0,NoTrace,
N,chess,Inner,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,0,NoTrace,
N,chess,Inner,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,0,Trace,
N,chess,Inner,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,0,NoTrace,
T,chess,Inner,Low,-1,-1,Low,-1,-1,Medium,-1,Medium,High,-1,High,0,Trace,
T,chess,Inner,Low,-1,-1,Low,-1,-1,Medium,-1,Medium,High,-1,High,0,Trace,
N,chess,Inner,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,0,NoTrace,
N,chess,Inner,-1,Low,-1,-1,-1,-1,Low,Low,High,Medium,-1,Medium,0,Trace,
N,chess,Inner,-1,Low,-1,-1,-1,-1,-1,Medium,High,Low,Low,Medium,0,NoTrace,
N,chess,Inner,-1,Low,-1,-1,-1,-1,-1,Medium,High,-1,Medium,Medium,0,NoTrace,
T,chess,Inner,-1,Low,-1,-1,-1,-1,-1,Medium,High,Low,Low,Medium,0,Trace,
N,chess,Inner,-1,Low,-1,-1,-1,-1,-1,Medium,High,-1,Medium,Medium,0,NoTrace,
N,chess,Inner,-1,Low,-1,-1,-1,-1,Low,Low,High,Low,Low,Medium,0,Trace,
N,chess,Inner,Low,-1,-1,-1,-1,-1,Low,Low,High,Low,Low,Medium,0,Trace,
N,chess,Inner,-1,Low,-1,-1,-1,-1,-1,Medium,High,-1,Medium,Medium,0,NoTrace,
....
N,chess,Inner,-1,Low,-1,-1,Medium,-1,-1,Low,Low,-1,-1,-1,0,Trace,
N,chess,Inner,-1,Low,-1,-1,Medium,-1,-1,Low,Low,-1,-1,-1,0,NoTrace,
T,chess,Inner,Low,-1,-1,Low,Low,-1,Low,-1,Low,-1,-1,-1,0,Trace,
T,chess,Inner,Low,-1,-1,Medium,-1,-1,Low,-1,Low,-1,-1,-1,0,Trace,
N,chess,Inner,-1,Low,-1,-1,Medium,-1,-1,Low,Low,-1,-1,-1,0,NoTrace,

I am reading my data and trying to concatenate two data sets that are subsets of the original data set. Here is the code I am using:

import pandas as pd
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
# Feature Scaling
from sklearn.preprocessing import StandardScaler

SeparateProjectLearning=False
CompleteCallersCallees=False
PartialTrainingSetCompleteCallersCallees=True

def main():
    X_train={}
    X_test={}
    y_train={}
    y_test={}
    dataset = pd.read_csv('InputData.txt', sep=',', index_col=False)
    #convert T into 1 and N into 0
    dataset['gold'] = dataset['gold'].astype('category').cat.codes
    dataset['Program'] = dataset['Program'].astype('category').cat.codes
    dataset['classGold'] = dataset['classGold'].astype('category').cat.codes
    dataset['MethodType'] = dataset['MethodType'].astype('category').cat.codes
    dataset['CallersT'] = dataset['CallersT'].astype('category').cat.codes
    dataset['CallersN'] = dataset['CallersN'].astype('category').cat.codes
    dataset['CallersU'] = dataset['CallersU'].astype('category').cat.codes
    dataset['CallersCallersT'] = dataset['CallersCallersT'].astype('category').cat.codes
    dataset['CallersCallersN'] = dataset['CallersCallersN'].astype('category').cat.codes
    dataset['CallersCallersU'] = dataset['CallersCallersU'].astype('category').cat.codes
    dataset['CalleesT'] = dataset['CalleesT'].astype('category').cat.codes
    dataset['CalleesN'] = dataset['CalleesN'].astype('category').cat.codes
    dataset['CalleesU'] = dataset['CalleesU'].astype('category').cat.codes
    dataset['CalleesCalleesT'] = dataset['CalleesCalleesT'].astype('category').cat.codes
    dataset['CalleesCalleesN'] = dataset['CalleesCalleesN'].astype('category').cat.codes
    dataset['CalleesCalleesU'] = dataset['CalleesCalleesU'].astype('category').cat.codes
    pd.set_option('display.max_columns', None)
    row_count, column_count = dataset.shape
    Xcol = dataset.iloc[:, 1:column_count]
    CompleteSet = dataset.loc[dataset['CompleteCallersCallees'] == 1]
    CompleteSet_X = CompleteSet.iloc[:, 1:column_count].values
    CompleteSet_Y = CompleteSet.iloc[:, 0].values
    X_train, X_test, y_train, y_test = train_test_split(CompleteSet_X, CompleteSet_Y, test_size=0.2, random_state=0)
    TestSet = dataset.loc[dataset['CompleteCallersCallees'] == 0]
    X_test1 = TestSet.iloc[:, 1:column_count].values
    X_test = pd.concat(X_test1, X_test)

I want to build my own test set and training set by concatenation, and I am trying to concatenate X_test1 and X_test in the code above. However, I am getting an error for the last line of code, X_test = pd.concat(X_test1, X_test), which says TypeError: cannot concatenate object of type "<class 'numpy.ndarray'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid. How can I fix this?
By adding .values to the end of your filters in the following lines:

CompleteSet_X = CompleteSet.iloc[:, 1:column_count].values
CompleteSet_Y = CompleteSet.iloc[:, 0].values
X_test1 = TestSet.iloc[:, 1:column_count].values

you are extracting the underlying NumPy ndarray from the Pandas Series/DataFrame that the preceding code produces. Just remove .values at the end and you can use concat directly with the Series or DataFrame.
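A minimal sketch of the relevant lines with that change, reusing the question's own variables and assuming you want X_test to stay a DataFrame; note also that pd.concat expects a single list (or other iterable) of objects rather than two separate arguments:

# Keep the pandas objects instead of converting to NumPy arrays.
CompleteSet_X = CompleteSet.iloc[:, 1:column_count]
CompleteSet_Y = CompleteSet.iloc[:, 0]
X_train, X_test, y_train, y_test = train_test_split(
    CompleteSet_X, CompleteSet_Y, test_size=0.2, random_state=0
)

TestSet = dataset.loc[dataset['CompleteCallersCallees'] == 0]
X_test1 = TestSet.iloc[:, 1:column_count]

# pd.concat takes one iterable of DataFrames/Series.
X_test = pd.concat([X_test1, X_test])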
How can I plug a prediction of X into BernoulliNB.predict_proba?
I have a Python script that, given an observed class (X) and some columns of binary data (Y), predicts a class (Pred_X). It then predicts the probability of each class (Prob(1), etc.). How could I get the probability of only the observed class (Prob(X)), please?

import pandas as pd
from sklearn.naive_bayes import BernoulliNB

BNB = BernoulliNB()

# Data
df_1 = pd.DataFrame({'X' : [1,2,1,1,1,2,1,2,2,1],
                     'Y1': [1,0,0,1,0,0,1,1,0,1],
                     'Y2': [0,0,1,0,0,1,0,0,1,0],
                     'Y3': [1,0,0,0,0,0,1,0,0,0]})

# Split the data
df_I = df_1.loc[ : , ['Y1', 'Y2', 'Y3']]
S_O = df_1['X']

# Bernoulli Naive Bayes Classifier
A_F = BNB.fit(df_I, S_O)

# Predict X
A_P = BNB.predict(df_I)
df_P = pd.DataFrame(A_P)
df_P.columns = ['Pred_X']

# Predict Probability
A_R = BNB.predict_proba(df_I)
df_R = pd.DataFrame(A_R)
df_R.columns = ['Prob_1', 'Prob_2']

# Join
df_1 = df_1.join(df_P)
df_1 = df_1.join(df_R)
Thanks #jezrael:

# Rename the columns after the classes of X
classes = df_1['X'].unique()
df_R.columns = [classes]

# Look up the predicted probability of X
df_1['Prob_X'] = df_R.lookup(df_R.index, df_1.X)
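One caveat, as an assumption about newer environments: DataFrame.lookup was deprecated and later removed in recent pandas releases, so on an up-to-date install the same per-row lookup can be done with NumPy indexing and the classifier's classes_ ordering, roughly like this:

import numpy as np

# Probability assigned to each row's observed class X, without DataFrame.lookup.
# BNB.classes_ gives the (sorted) column order used by predict_proba.
proba = BNB.predict_proba(df_I)                      # shape (n_rows, n_classes)
col_idx = np.searchsorted(BNB.classes_, df_1['X'])   # column index of each observed class
df_1['Prob_X'] = proba[np.arange(len(df_1)), col_idx]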