Using Machine Learning in Python to load custom datasets?

Here's the problem:
It takes two variable inputs and predicts a result.
For example: price and volume as inputs and a decision to buy/sell as a result.
I tried implementing this using K-Neighbors with no success. How would you go about this?
X = cleanedData['ES1 End Price'] # only accounts for one variable; I don't know how to add a second input.
y = cleanedData["Result"]
print(X.shape, y.shape)
kmm = KNeighborsClassifier(n_neighbors = 5)
kmm.fit(X,y) #ValueError for size inconsistency, but both are same size.
Thanks!

X needs to be a matrix/2D array where each column stands for a feature, which isn't the case in your code. Try reshaping X to 2D with X[:,None]:
kmm.fit(X[:,None], y)
Or, without resorting to reshape, it is better to always use a list of column names when extracting features from a data frame:
X = cleanedData[['ES1 End Price']]
Or with more than one column:
X = cleanedData[['ES1 End Price', 'volume']]
Then X is a 2D array and can be passed directly to fit:
kmm.fit(X, y)
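A minimal end-to-end sketch, assuming a DataFrame with the 'ES1 End Price', 'volume', and 'Result' columns from the question (the values below are made up for illustration):

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Tiny synthetic stand-in for cleanedData (values invented for illustration)
cleanedData = pd.DataFrame({
    'ES1 End Price': [100.0, 101.5, 99.8, 102.3, 100.7, 98.9],
    'volume':        [1200, 900, 1500, 800, 1100, 1300],
    'Result':        ['buy', 'sell', 'buy', 'sell', 'buy', 'buy'],
})

X = cleanedData[['ES1 End Price', 'volume']]  # 2D: one column per feature
y = cleanedData['Result']                     # 1D: one label per row

kmm = KNeighborsClassifier(n_neighbors=5)
kmm.fit(X, y)
print(kmm.predict(X))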


fit_transform and inverse_transform on two different scripts

How to fit_transform & inverse_transform in separate scripts?
I first normalize the numerical targets (integers) in one script.
Then I use another script to predict these numerical targets in real time (regression).
The fit_transform and inverse_transform function definitions live in a third script.
scaler = MinMaxScaler(copy=True, feature_range=(0., 1.))

def normalize(array):
    array = scaler.fit_transform(array).flatten()
    return array

def inverse_norm(array):
    array = scaler.inverse_transform(array).flatten()
    return array
Naively, I just "inverse_transformed" the predicted values within my real-time script.
But the predicted values were not in the same range as the original numerical targets: they are small float numbers.
Thank you for your help.
In general I think you don't want to normalize your target variable, but if you want to do so you can use a label encoder instead of a MinMaxScaler, which is rather meant for normalizing features.
I fixed the problem myself, thanks to mattOrNothing.
I'll just write down the answer.
First script:
myNormalizedArray = (myArray - myArrayMin) / (myArrayMax - myArrayMin)
Second script:
myDenormalizedArray = myPredicedArray * (myArrayMax - myArrayMin) + myArrayMin
Where myArray and myPredicedArray are NumPy arrays.
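A minimal sketch of that fix as reusable functions (variable names invented for illustration): sharing the training min/max between the two scripts replaces sharing the fitted scaler.

import numpy as np

def normalize(array, a_min, a_max):
    # scale values into [0, 1] using the training min/max
    return (array - a_min) / (a_max - a_min)

def inverse_norm(array, a_min, a_max):
    # map predictions in [0, 1] back to the original target range
    return array * (a_max - a_min) + a_min

targets = np.array([10.0, 20.0, 35.0, 50.0])
t_min, t_max = targets.min(), targets.max()    # save these in the first script
scaled = normalize(targets, t_min, t_max)      # used for training
restored = inverse_norm(scaled, t_min, t_max)  # applied to predictions in the real-time script
print(scaled, restored)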

np.sort(X,axis=1) doesn't sort array?

# Data
import numpy as np
import pandas as pd

order = 3
df = pd.read_csv('singleXregression.csv')
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
y = y.reshape(len(y), 1)
This is the standard opening for my regression, but when I try to sort X with the line:
X = np.sort(X, axis=1)
it just doesn't do anything. No error message; X is simply still not sorted. I know I could sort it in the dataframe, but I'm trying to make a template for fast copy-paste and therefore trying to work on indexes instead. Why does this line not work? I understand that X is a 2D numpy array, as X.shape is (201, 1).
The whole reason I'm trying to sort it is that I'm doing polynomial regression and everything works except for the graph, which is all over the place. If anyone could help me sort X or fix the graph, that would be great.
I fixed it.
feature=df.columns[0]
df = df.sort_values(by=[feature])
This is what did the trick.
Maybe you should try sorting along axis 0 instead:
X = np.sort(X, axis=0)
(Note that the in-place X.sort(...) returns None, so don't reassign its result.)
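For what it's worth, the reason np.sort(X, axis=1) appears to do nothing is that axis=1 sorts within each row, and every row of a (201, 1) array has only one element. A small sketch of sorting along axis 0 instead (values invented for illustration):

import numpy as np

X = np.arange(10, 0, -1).reshape(-1, 1)  # shape (10, 1), deliberately unsorted
print(np.sort(X, axis=1).ravel())        # unchanged: each row of length 1 is already "sorted"
print(np.sort(X, axis=0).ravel())        # sorted down the column, which is what the plot needs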

structuring data in numpy for LSTM (examples)

I am having a problem understanding how data should be prepared for the different models:
One to many
Many to one
Many to many (A)
Many to many (B)
Is the following the right way to think about it? The shape numbers are not relevant and do not match the ones in the picture; I am just trying to understand the logic behind it:
import numpy as np

# 1. one to many
# X for input, y for output
X = np.ones([10, 1, 5])
y = np.zeros([10, 3])  # 3 represents the size of the output vector

# 2. many to one
X = np.ones([10, 5, 5])
y = np.zeros([10, 1])

# 3. many to many
X = np.ones([10, 5, 5])
y = np.zeros([10, 5])
# in this case the cell should be different from y; it must be bigger to shift some data

# 4. many to many
X = np.ones([10, 5, 5])
y = np.zeros([10, 5])
# in this case the cell is the same shape as y
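If it helps, here is a hedged sketch of how the many-to-one shapes above would feed a model, assuming Keras/TensorFlow (layer choice and sizes are arbitrary, not from the question):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

X = np.ones([10, 5, 5])   # (samples, timesteps, features)
y = np.zeros([10, 1])     # one output value per sample

model = Sequential([
    LSTM(8, input_shape=(5, 5)),  # returns only the last hidden state
    Dense(1),
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=1, verbose=0)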

Printing remaining features in Feature Reduction

I am running feature reduction (from 500 features down to around 30) for a random forest classifier. I can reduce the number of features, but I want to see which features are left at every point in the reduction. As you can see below, I have made an attempt, but it does not work.
X does not contain the column names. Ideally it would be possible to also keep the column names in X but fit only on the rows; then printing X would show them, I think.
I am sure there is a much better way though...
Does anybody know how to do this?
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

FEATURES = []
readThisFile = r'C:\ManyFeatures.txt'
featuresFile = open(readThisFile)
AllFeatures = featuresFile.read()
FEATURES = AllFeatures.split('\n')
featuresFile.close()

Location = r'C:\MASSIVE.xlsx'
data = pd.read_excel(Location)
X = np.array(data[FEATURES])
y = data['_MiniTARGET'].values

for x in range(533, 10, -100):
    X = SelectKBest(f_classif, k=x).fit_transform(X, y)
    #U = pd.DataFrame(X)
    #print(U.feature_importances_)
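One way to see which features survive each step is to keep the fitted selector and use its get_support() mask to index the original column names. A hedged sketch with a small synthetic DataFrame standing in for the Excel file (column names and data invented for illustration):

import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 6)), columns=[f'feat{i}' for i in range(6)])
data['_MiniTARGET'] = rng.integers(0, 2, size=50)

feature_cols = data.columns[:-1]
X = data[feature_cols]
y = data['_MiniTARGET']

selector = SelectKBest(f_classif, k=3)
X_reduced = selector.fit_transform(X, y)

# get_support() is a boolean mask over the input columns, so the surviving names are:
print(list(feature_cols[selector.get_support()]))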

How to fit multidimensional output using scikit-learn?

I am trying to fit one-vs-all classification output in training data, where each row of the output adds up to 1.
One possible way is to read each row, find which column has the highest value, and prepare the training data from that.
E.g. y = [[0.2, 0.8, 0], [0, 1, 0], [0, 0.3, 0.7]] can be reduced to y = [b, b, c], considering a, b, c as the corresponding classes of columns 0, 1, 2 respectively.
Is there a function in scikit-learn which helps to achieve such a transformation?
This code does what you want:
import numpy as np

y = np.array([[0.2, 0.8, 0], [0, 1, 0], [0, 0.3, 0.7]])

def transform(y, labels):
    # map each row to the label of its highest-valued column
    f = np.vectorize(lambda i: labels[i])
    return f(y.argmax(axis=1))

y = transform(y, 'abc')
EDIT: Using the comment by alko, I made it more general by letting the user supply the labels to the transform function.
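For reference, the same mapping can also be written with plain NumPy indexing (labels array invented for illustration):

import numpy as np

y = np.array([[0.2, 0.8, 0.0], [0.0, 1.0, 0.0], [0.0, 0.3, 0.7]])
labels = np.array(['a', 'b', 'c'])
print(labels[y.argmax(axis=1)])  # -> ['b' 'b' 'c']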
