How to use Linear Discriminant Analysis to predict based on values from a serial port - Python

I'm working on a program that predicts hand movements based on EMG signals. So far, I have a CSV file that serves as the training database for the LDA program. The part I'm stuck on is actually making predictions with the program. Is there a way to predict the finger movement based on the values I get from my serial port (my sensors)?
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
import serial as ser
names = ['Finger', 'Val1', 'Val2', 'Val3']
dataset = pd.read_csv('EmgSig.csv', names=names)
X = dataset.iloc[:, 1:3].values
y = dataset.iloc[:, 0].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
lda = LDA(n_components=1)
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)
classifier = RandomForestClassifier(max_depth=2, random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print('Accuracy' + str(accuracy_score(y_test, y_pred)))
while True:
    data = ser.readline()
    decode = (data[0:len(data)-2].decode("utf-8"))
    datasplit = decode.split('-')
    Val1 = int(datasplit[0])
    Val2 = int(datasplit[1])
    Val3 = int(datasplit[2])
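One possible way to close the loop is to push each parsed serial line through the same scaler, LDA and classifier that were fitted on the CSV data. The sketch below is only an illustration under assumptions: the training data is synthetic (a stand-in for EmgSig.csv), the labels "index" and "thumb" are made up, all three sensor values are used as features, and `predict_from_line` is a hypothetical helper name. Wrapping the steps in a pipeline keeps the transformations consistent between training and prediction.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for EmgSig.csv (assumption): three sensor values per row,
# two made-up finger labels with clearly separated signal levels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(100, 10, (50, 3)), rng.normal(300, 10, (50, 3))])
y = np.array(["index"] * 50 + ["thumb"] * 50)

# Same chain as in the question: scale, reduce with LDA, classify.
model = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(n_components=1),
    RandomForestClassifier(max_depth=2, random_state=0),
)
model.fit(X, y)


def predict_from_line(raw: bytes) -> str:
    """Parse one serial line such as b'101-98-105\r\n' and return the predicted label."""
    values = [int(v) for v in raw.decode("utf-8").strip().split("-")]
    # predict expects a 2D array: one row, one column per sensor value
    return model.predict(np.array([values]))[0]


low_signal = predict_from_line(b"101-98-105\r\n")
high_signal = predict_from_line(b"310-295-301\r\n")
print(low_signal, high_signal)
```

In the real loop, the bytes would come from `readline()` on an opened `serial.Serial` object rather than from the `serial` module itself; the port name and baud rate depend on the hardware and are not given in the question.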

Related

Grid search before RFE is taking very long

I am trying to run a grid search on my dataset to find out how many features I should select in my RFE, but it is taking very long. Does anyone know if this is normal, or do I have a fault in my script?
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFECV, RFE
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
import matplotlib.pyplot as plt
#%% train-test split
data = pd.read_csv('preprocesseddata.csv')
data.drop(['Date', 'About'], axis=1, inplace=True)
y = data['Class']
X = data[['Duration_Ball Training','Duration_Match','Duration_Other','Duration_Strenght Training','Positie','Gender','Voorkeursbeen','Instroomjaar','Age','Hours Sleep','Stress','Muscle Soreness','T-test','20m Sprint','CMJ 2b','Yo Yo Result','Heart Rate (Max)','Latest Height', 'Body Fat %','Repetitive Injury','Prefered Leg','AcuteLegs_1day','AcuteCardio_1day','AcuteLegs_3days','AcuteCardio_3days','AcuteLegs_7days','AcuteCardio_7days','ChronicLegs_14days','ChronicCardio_14days','ChronicLegs_21days','ChronicCardio_21days','ChronicLegs_28days','ChronicCardio_28days','TrainingmonotonyLegs','TrainingmonotonyCardio']]
y = y.astype('category')
y = y.cat.codes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
#%% RFE as part of pipeline
lr = LogisticRegression(solver='liblinear', random_state=123)
pipe = make_pipeline(RFE(estimator=lr, step=1), KNeighborsClassifier())
parameters = {'rfe__n_features_to_select': range(1,35), 'kneighborsclassifier__n_neighbors': range(1,30)}
grid = GridSearchCV(pipe, param_grid=parameters, cv=10, n_jobs=1)
grid.fit(X_train_std, y_train)
print('Best params:', grid.best_params_)
print('Best accuracy:', grid.best_score_)
#%% RFE
lr = LogisticRegression(solver='liblinear', random_state=123)
rfe = RFE(estimator=lr, n_features_to_select=5, step=1)  # step must be positive
rfe.fit(X_train_std, y_train)
X_train_sub = rfe.transform(X_train_std)
rfe.support_
It seems to get stuck at the line that prints the best parameters.
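A rough count of the work involved may explain the runtime. The sketch below only does arithmetic on the grid from the script; it assumes that RFE with `step=1` refits the logistic regression once per eliminated feature plus once on the final subset, and that the pipeline reruns RFE for every `n_neighbors` value because nothing is cached.

```python
n_features = 35                # columns selected into X in the script
feature_grid = range(1, 35)    # rfe__n_features_to_select
neighbor_grid = range(1, 30)   # kneighborsclassifier__n_neighbors
cv_folds = 10

# Every parameter combination is fitted once per fold.
candidates = len(feature_grid) * len(neighbor_grid)
pipeline_fits = candidates * cv_folds

# Assumed cost of one RFE run: (n_features - n) eliminations plus one final fit.
lr_fits_per_fold = sum((n_features - n) + 1 for n in feature_grid) * len(neighbor_grid)
lr_fits = lr_fits_per_fold * cv_folds

print(candidates, pipeline_fits, lr_fits)  # 986 9860 182410
```

Under these assumptions the search performs on the order of 180,000 logistic regression fits on a single core (`n_jobs=1`), so a long wait inside `grid.fit` is plausible rather than a sign of a hang. Setting `n_jobs=-1` or coarsening the two ranges would reduce the wall-clock time.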

How to feed data into random forest classifier and see prediction

I have built a random forest classifier using scikit-learn and Python, and I am having trouble actually feeding data in to see the prediction. I want to see the format of the output and convert it to a JSON file. I have attached the code for the random forest and what the data looks like. I believe I need to use 'y_pred', but I am not sure what format the input data needs to be in.
X = dataset.iloc[:, 2:4].values
y = dataset["pages"]
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=20, random_state = 0)
classifier = classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
You can simply concatenate the predicted values with the feature matrix.
Also note that a pipeline exists for exactly this purpose: first transforming the data and then applying a classifier.
This should work for you:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
X = dataset.iloc[:, 2:4].values
y = dataset["pages"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
classifier = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=20, random_state=0))
classifier = classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
# X_test is a NumPy array here, so wrap it in a DataFrame before concatenating
pred = pd.concat([pd.DataFrame(X_test), pd.Series(y_pred, name="pages")], axis=1)
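To get from the predictions to JSON, one option is pandas' `DataFrame.to_json`. The following is a minimal sketch, not the asker's actual data: the `dataset` frame and the column names `feat_a` and `feat_b` are hypothetical, and only the `pages` target name comes from the question.

```python
import json

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for `dataset`; feat_a and feat_b are made-up feature names.
dataset = pd.DataFrame({
    "feat_a": [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    "feat_b": [0.5, 0.7, 0.2, 0.9, 5.1, 5.5, 4.8, 5.0, 5.3, 4.9],
    "pages": ["home"] * 4 + ["cart"] * 6,
})
X = dataset[["feat_a", "feat_b"]]
y = dataset["pages"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

classifier = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=20, random_state=0))
classifier.fit(X_train, y_train)

# Reset the index so features and predictions line up row by row.
pred = X_test.reset_index(drop=True).assign(pages=classifier.predict(X_test))

# One JSON object per test row: the feature values plus the predicted label.
json_text = pred.to_json(orient="records")
records = json.loads(json_text)
print(json_text)
```

Passing a file path as the first argument to `to_json` writes the same content to disk instead of returning a string.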

Error: y could not convert string to float python random forests

I am using Python and random forests to predict the first column of my input file. The input file has the following form:
T,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Here is the link to my full data: https://drive.google.com/file/d/1gjKoSi4rmMYZVm31LZ2Li92HM9USlu6A/view?usp=sharing
I am trying to predict the first column, which is either T or N, from the values of the remaining columns using random forests. I am getting the following error. How can I fix it? Here is the code:
import pandas as pd
import numpy as np
dataset = pd.read_csv( 'data1extended.txt', sep= ',')
dataset.head()
row_count, column_count = dataset.shape
X = dataset.iloc[:, 1:column_count].values
y = dataset.iloc[:, 0].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test, y_pred))
Try converting your target variable to numeric first. Assuming the 'gold' column is your target, run this immediately after loading the data into a DataFrame.
dataset['gold'] = dataset['gold'].astype('category').cat.codes
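To illustrate what that conversion does, here is a small sketch on a made-up label column containing the T and N values from the question. The variable names are hypothetical; only `astype('category').cat.codes` comes from the line above.

```python
import pandas as pd

# Made-up sample of the first column from the question's file.
labels = pd.Series(["T", "N", "N", "T", "N"], name="label")

as_category = labels.astype("category")
codes = as_category.cat.codes                    # integer code per row
categories = list(as_category.cat.categories)    # code -> original label

print(codes.tolist())   # [1, 0, 0, 1, 0]
print(categories)       # ['N', 'T']

# Map numeric predictions back to the original labels when reporting results.
decoded = [categories[c] for c in codes]
print(decoded)          # ['T', 'N', 'N', 'T', 'N']
```

Since T and N are class labels rather than quantities, a classifier such as `RandomForestClassifier` may be a better fit than `RandomForestRegressor` here; scikit-learn classifiers accept string labels directly, and the metrics used in the question (confusion matrix, accuracy) expect discrete predictions.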

ValueError: The number of classes has to be greater than one; got 1 class ScikitLearn Python

I have a problem with this code. The error occurs on the line: ppn.fit(X_train, y_train)
I am using Python 3.7.
import pandas as pd
import numpy as np
import sklearn as sk
from sklearn.metrics import accuracy_score
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
df = pd.read_csv("file.csv", sep=',', error_bad_lines=False, low_memory=False)
X = df.iloc[:, 1:44].values
y = df.iloc[:, 48].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
X_train = np.isnan(X_train)
y_train = np.isnan(y_train)
X_test = np.isnan(X_test)
ppn = Perceptron(max_iter=40, tol=0.001, eta0=0.1, random_state=0)
ppn.fit(X_train, y_train)
y_pred = ppn.predict(X_test)
y_pred = np.isnan(y_pred)
print(accuracy_score(y_test, y_pred))
How can I fix it? Thanks.
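A likely cause, judging only from the code shown: `np.isnan` does not remove missing values, it returns a boolean mask. If `y_train` contains no NaN, `np.isnan(y_train)` is all `False`, which leaves a single class for the Perceptron to fit. The sketch below uses made-up arrays to show the difference between the mask and actually filtering rows; the variable names are hypothetical.

```python
import numpy as np

# Made-up data with one missing feature value and one missing label.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0], [6.0, 7.0]])
y = np.array([0.0, 1.0, 1.0, np.nan])

# np.isnan returns True/False per element; a label array without NaN
# collapses to a single value, i.e. one class.
mask_only = np.isnan(np.array([0.0, 1.0, 1.0]))
print(np.unique(mask_only))  # [False]

# Use the mask to keep only rows where neither the features nor the label are NaN.
keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
X_clean, y_clean = X[keep], y[keep]
print(X_clean.shape, y_clean.tolist())  # (2, 2) [0.0, 1.0]
```

Applying this kind of filtering before `train_test_split` keeps the real feature and label values, so `fit` sees more than one class as long as the data contains them.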

What does the error mean and how to fix it - "ValueError: query data dimension must match training data dimension"

I am trying to write the code for K-NN.
Below is my code. I know that the issue is in `predict()`, but I am not able to figure out how to fix it.
# Importing the libraries
import numpy as np
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('UniversalBank.csv')
X = dataset.iloc[:,[ 1,2,3,5,6,7,8,10,11,12,13]].values #,
y = dataset.iloc[:,9].values
#Splitting the dataset to training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state= 0)
#Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#Fitting the classifier to training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train,y_train)
#Predicting the test results
y_pred = classifier.predict(X_test)
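The failing `predict()` call is not shown, so the following is only a guess at the cause: this kind of error usually appears when the array passed to `predict()` has a different number of columns than the data used in `fit()`. The sketch uses synthetic data with 11 features (matching the 11 columns selected above) and shows a single new sample being reshaped and scaled before prediction.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic training data (assumption): 40 rows, 11 features, binary target.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 11))
y_train = (X_train[:, 0] > 0).astype(int)

sc = StandardScaler()
classifier = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
classifier.fit(sc.fit_transform(X_train), y_train)

# A new sample needs the same 11 features, as a 2D array, scaled with the same scaler.
new_sample = rng.normal(size=11)
prediction = classifier.predict(sc.transform(new_sample.reshape(1, -1)))
print(prediction.shape)

# Passing a different number of columns triggers a ValueError.
raised = False
try:
    classifier.predict(np.zeros((1, 9)))
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

The exact wording of the message varies between scikit-learn versions, but in each case the fix is to pass `predict()` the same columns, in the same order, that were used for training.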
