I'm trying to train a simple model using TensorFlow.
I want to get an accurate prediction (98%+) for a new input into the trained model.
There are 500 inputs: 1 is a price (500-200,000), 1 is a construction year (1980-2017), and the remaining 498 are either true or false (0/1). The 498 boolean inputs are really important (in fact, the other two are almost negligible). So it's all numeric.
I looked through the TensorFlow documentation and couldn't find any tutorial on how to weight the different inputs, e.g. price/year should only make up "10%" of the prediction (0.05 each) while the remainder makes up the rest (0.90).
Is there a solution for this?
Problem
Construct a learning algorithm that does not choose all of its own weights, but instead only chooses some of them. Specifically, given 500 features, the first feature contributes 5%, the second feature contributes 5%, and the remaining 498 features together contribute 90% to the final prediction.
Solution
This is very similar to a mixture-of-experts architecture. Instead of creating 1 network, you create 3 networks:
a network that uses feature 1 (the price)
a network that uses feature 2 (the year)
a network that uses features 3 through 500
You create each of these as you would a regular network, with independent weight matrices and its own output hypothesis.
That ends up in some TensorFlow code like
y_out_from_price = tf.nn.softmax(h_out_from_price)
y_out_from_year = tf.nn.softmax(h_out_from_year)
y_out_from_rest = tf.nn.softmax(h_out_from_rest)
Where each of these is a categorization prediction. Your final prediction is then
y_out_final = 0.05 * y_out_from_price \
            + 0.05 * y_out_from_year \
            + 0.90 * y_out_from_rest
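For completeness, here is a minimal sketch of the three-expert setup in TF1-style code. The layer sizes, variable names, and placeholder definitions are my own assumptions for illustration, not from the original post:
import tensorflow as tf

n_classes = 10  # assumed number of output categories
x_price = tf.placeholder(tf.float32, [None, 1])
x_year = tf.placeholder(tf.float32, [None, 1])
x_rest = tf.placeholder(tf.float32, [None, 498])

def expert(x, n_in, n_hidden, name):
    # one hidden layer plus a linear output, with independent weights per expert
    with tf.variable_scope(name):
        W1 = tf.get_variable("W1", [n_in, n_hidden])
        b1 = tf.get_variable("b1", [n_hidden])
        h = tf.nn.relu(tf.matmul(x, W1) + b1)
        W2 = tf.get_variable("W2", [n_hidden, n_classes])
        b2 = tf.get_variable("b2", [n_classes])
        return tf.matmul(h, W2) + b2

y_out_from_price = tf.nn.softmax(expert(x_price, 1, 16, "price"))
y_out_from_year = tf.nn.softmax(expert(x_year, 1, 16, "year"))
y_out_from_rest = tf.nn.softmax(expert(x_rest, 498, 64, "rest"))

# fixed mixing weights: 5% + 5% + 90% = 100%
y_out_final = 0.05 * y_out_from_price \
            + 0.05 * y_out_from_year \
            + 0.90 * y_out_from_rest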
My opinion is that this will underperform a single network that uses all the features and learns its own weighting.
Related
On my journey learning ML I was testing some NNs, and I noticed that my output doesn't seem to take one of my 3 inputs into consideration, which is a very important one.
My dataset is composed of 4 columns (CSV):
3 are numbers (including the output) ranging from 1,000 up to 150,000,000
1 is a number between 0 and 100, and it is the one not taken into consideration by my NN
I scale my dataset this way, using MinMaxScaler from scikit-learn:
import pandas as pd
from sklearn import preprocessing

df = pd.read_csv('rawData.csv')
dataset = df.values
min_max_scaler = preprocessing.MinMaxScaler()  # scales each column to [0, 1] independently
dataset = min_max_scaler.fit_transform(dataset)
X = dataset[:, 0:3]  # input
Y = dataset[:, 3]    # output
I also use another way to scale my data (when I want to test my model):
import numpy as np

# note: np.min/np.max without an axis are global over the whole array,
# not per column like MinMaxScaler
min_test = np.min(runset)
max_test = np.max(runset)
normalized = (runset - min_test) / (max_test - min_test)
test = model.predict(normalized)
result = test * (max_test - min_test) + min_test
So my question is: is it possible and recommended to use different scales for different inputs? If yes, how do I do that?
The short answer to your question is another question: do you have prior knowledge about the importance of the features characterizing your dataset?
If yes, you may scale your data in such a way that more important features have a larger variance/range.
If not, you should scale your data in such a way that features have mean 0 and standard deviation 1. Why? Mainly to improve the numerical conditioning, remove the scale dependence of the initial weights, make the training process faster, and reduce the risk of getting stuck in local optima.
Do not underestimate the sensitivity of gradient descent methods to scaling.
Finally, remember to use statistics (mean and standard deviation) from the training set to standardize the validation/test set.
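As a minimal sketch of that last point, using scikit-learn's StandardScaler on toy data that mimics the question's column ranges (the array contents are made up for illustration):
import numpy as np
from sklearn.preprocessing import StandardScaler

# toy data: two large-range columns and one 0-100 column, as in the question
rng = np.random.default_rng(0)
X_train = np.column_stack([rng.uniform(1e3, 1.5e8, 100),
                           rng.uniform(1e3, 1.5e8, 100),
                           rng.uniform(0, 100, 100)])
X_test = np.column_stack([rng.uniform(1e3, 1.5e8, 20),
                          rng.uniform(1e3, 1.5e8, 20),
                          rng.uniform(0, 100, 20)])

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # mean/std learned per column, on the training set only
X_test_std = scaler.transform(X_test)        # the same training statistics are reused here
Because the statistics are computed per column, each feature ends up on a comparable scale regardless of its original range.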
I'm new to ML and would be grateful for any assistance provided. I've run a linear regression prediction using test set A and training set A. I saved the linear regression model and would now like to use the same model to predict a test set A target using features from test set B. Each time I run the model, it throws the error below.
How can I successfully predict a test data set from features and a target with different shapes?
Input
print(testB.shape)
print(testA.shape)
Output
(2480, 5)
(1315, 6)
Input
import joblib

saved_model = joblib.load(filename)
testB_result = saved_model.score(testB_features, testA_target)
print(testB_result)
Output
ValueError: Found input variables with inconsistent numbers of samples: [1315, 2480]
Thanks again
They have inconsistent shapes, which is why the error is being thrown. Have you tried to reshape the data so that they have the same shape? From a quick look, testB has more samples (2480 vs. 1315) and one fewer column than testA (presumably because testA still includes the target column).
Think about it: if you have trained your model with 5 features, you cannot then ask the same model to make a prediction given 6 features. You speak of using a linear regressor; the equation is roughly:
y = b + w0*x0 + w1*x1 + w2*x2 + ... + w(N-1)*x(N-1)
Where {
y is your output/label
N is the number of features
b is the bias term
w(i) is the ith weight
x(i) is the ith feature value
}
You have trained a linear regressor with 5 features, effectively producing the following
y (your output/label) = b + w0*x0 + w1*x1 + w2*x2 + w3*x3 + w4*x4
You then ask it to make a prediction given 6 features, but it only knows how to deal with 5.
Aside from that issue, you also have too many samples: testB has 2480 and testA has 1315. These need to match, as the model wants to make 2480 predictions, but you only give it 1315 targets to compare them to. How can you get a score for the 1165 missing samples? Do you now see why the data has to be reshaped?
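To see the feature-count problem in isolation, here is a tiny reproduction with made-up array shapes:
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
model = LinearRegression().fit(rng.random((10, 5)), rng.random(10))  # trained on 5 features
model.predict(rng.random((4, 6)))  # ValueError: X has 6 features, but the model expects 5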
EDIT
Assuming your datasets have an equal number of features, as discussed above, you may now look at reshaping (removing data from) testB like so:
testB = testB[0:1315, :]  # the slice end is exclusive, so this keeps rows 0-1314
testB.shape
(1315, 5)
Or, if you would prefer a solution using the numpy API:
testB = np.delete(testB, np.s_[0:(len(testB)-len(testA))], axis=0)
testB.shape
(1315, 5)
Keep in mind that when doing this you slice out a number of samples. If those samples matter to you (which they can), it may be better to introduce a pre-processing step to handle the missing values, namely imputing them. It is also worth noting that the data you are reshaping should be shuffled (unless it already is), as you may otherwise be removing exactly the parts of the data the model should be learning about. Neglecting to do this could result in a model that does not generalise as well as you hoped.
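A quick sketch of shuffling before slicing, assuming testB is a NumPy array (the stand-in data and seed are mine):
import numpy as np

rng = np.random.default_rng(42)
testB = rng.random((2480, 5))   # stand-in for the real testB
testB = rng.permutation(testB)  # shuffles the rows, so the slice drops a random subset
testB = testB[0:1315, :]        # (1315, 5)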
I'm fairly new to deep learning and Keras, and this problem has bothered me for weeks. Hope I can get some hints from here.
Features:
I simulated two variables, each with 10k samples, following a standard normal distribution: A ~ Norm(0, 1); B ~ Norm(0, 1).
Labels
And I derived two labels from the simulated variables: y1 = A * B; y2 = A / B.
Model
Input dimension: 2
Hidden layers: 4 dense layers, all of them were 32 neurons wide
Output layers: a dense layer with 1 neuron
Activation functions: ReLU for all the activation functions
Compilation: MSE as the loss function, Adam as the optimizer with a learning rate of 1e-05
Tasks
Finally, I set up three tasks for MLP to learn:
(1) Use A, B to predict y1;
(2) Use A, B to predict y2;
(3) Use A, 1/B to predict y2
Validation
Use 'validation_split = 0.2' to verify the model
Results and Inference
In task 1, it can easily reach an MSE below 1 for both the training and validation set after 10-15 epochs. However, for the other two tasks I always get a very high training loss, like 30k+.
[update] I also evaluated the results by Pearson correlation coefficient, which returned ~0.7 for task 1 and <0.01 for task 2 and 3.
It's weird to me, since the ideas of multiplication (y1) and division (y2) are mathematically the same. So I then looked into the distribution of 1/B and found that it has extremely long tails on both sides. I suppose it might be the source of the difficulty but couldn't figure out any strategy for it. I also tried to normalize 1/B before the training but had no luck with it.
Any advice or comment is welcome. Can't find discussion on this either on web or books, really want to make some progress on it. Thank you.
y2 values have a much different distribution from y1 values; specifically, y2 has values with much larger absolute values. This means that comparing the losses directly isn't really fair.
It's kinda like estimating the mass of a person vs. estimating the mass of a planet, and being upset that you're off by millions of pounds.
For an illustration, try calculating the loss on all three problems, but with an estimator that only ever guesses 0.0. I suspect that problem 1 will have much lower loss than the other two.
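A quick sketch of that experiment, simulating the data as described in the question:
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal(10_000)
B = rng.standard_normal(10_000)
y1 = A * B
y2 = A / B

# MSE of an estimator that always guesses 0.0
print("task 1 baseline MSE:", np.mean(y1 ** 2))  # around 1, since E[A^2]E[B^2] = 1
print("task 2 baseline MSE:", np.mean(y2 ** 2))  # enormous: 1/B has extremely heavy tails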
I am currently using TensorFlow to create a neural network that learns to produce a certain output given an input.
The input in this case is sampled audio, from which MFCC features are generated. I know for each file what the corresponding MFCC features are, but I am not sure how I should set up the neural network.
I am following this guide/tutorial: http://www.kdnuggets.com/2016/09/urban-sound-classification-neural-networks-tensorflow.html/2
In it, the neural network is set up as such:
training_epochs = 5000
n_dim = tr_features.shape[1]
n_classes = 10
n_hidden_units_one = 280
n_hidden_units_two = 300
sd = 1 / np.sqrt(n_dim)
learning_rate = 0.01
My question here is: how do I define the number of classes? The real values I've computed aren't divided into classes but are decimal numbers. Should I just create multiple networks with different numbers of classes and choose the one with the smallest error compared to the original value, or is there a TensorFlow command that can do that, since I am doing supervised learning?
Neural networks can be used for classification tasks or regression tasks. In the tutorial, the author wants to classify sounds into 10 different categories, so the neural network has 10 output neurons (n_classes), and each of their activation values gives the probability that an input sound belongs to a class.
In your case, you want to map a given sound to a decimal number (is that right?), so it's a regression task: the neural network has to learn an unknown function. The number of output neurons has to equal the output dimension of that unknown function (1 if it's just a decimal number).
So if you want to keep the same architecture for your regression task, just set n_classes = 1 and modify y_ to
y_ = tf.matmul(h_2,W) + b
because tf.nn.softmax converts the final scores to probabilities (which is good for classification but not for regression).
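Here is a minimal end-to-end sketch of the regression version in the tutorial's TF1 style (a single linear layer for brevity; n_dim and the learning rate are placeholders of mine):
import tensorflow as tf

n_dim = 20  # e.g. the number of MFCC features per sample (assumption)
X = tf.placeholder(tf.float32, [None, n_dim])
Y = tf.placeholder(tf.float32, [None, 1])  # one decimal output, so n_classes = 1

W = tf.Variable(tf.random_normal([n_dim, 1]))
b = tf.Variable(tf.zeros([1]))
y_ = tf.matmul(X, W) + b  # linear output, no softmax

cost = tf.reduce_mean(tf.square(y_ - Y))  # MSE loss for regression
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)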
I am using sklearn's RandomForestClassifier/BaggingClassifier for learning, and I am not getting the expected results compared to the Java/Weka machine learning library.
In Weka, I am learning the model with: a random forest of 10 trees, each constructed while considering 6 random features. (setNumFeatures needs to be set; the default is 10 trees.)
In sklearn, I am not sure how to specify the number of features to randomly consider while constructing a random forest of 10 trees. This is what I am doing:
rf_classifier = RandomForestClassifier(n_estimators=num_trees, max_features=6)
rf_classifier = rf_classifier.fit(train_file, train_file_label)
for items in rf_classifier.estimators_:
    classifier_list.append(items)
I saw the docs and there is a parameter, max_features, but I am not sure whether it serves that purpose. I get this error when I am trying to calculate the voting entropy:
# code to calculate voting entropy for all features (unlabeled data)
vote_count_for_features = list(classifier_list[0].predict(feature_data_arr))
for i in range(1, len(classifier_list)):
    res_temp = list(classifier_list[i].predict(feature_data_arr))
    vote_count_for_features = [x + y for x, y in zip(vote_count_for_features, res_temp)]
If I set that parameter to 6, then my code fails with the error message:
Number of features of the model must match the input. Model n_features
is 6 and input n_features is 31
Inputs: a sample set of 1 million records with 31 features. When I run Weka, around 1000 rules are extracted, whereas when I run the same thing through sklearn, I get hardly 70 rules.
I am new to Python and sklearn, and I am trying to figure out where I am going wrong. (The Weka code has been tested well and gives 95% precision and 80% recall, so I am assuming that's good.)
Note: I have used sklearn's imputer to impute missing values with the 'mean' strategy, whereas Weka has its own ways of handling NaNs.
This is what I am trying to achieve: learn a random forest on a sample set, extract the rules, evaluate the rules, and then apply them to the bigger set.
Any suggestions or input would really help me debug the issue and solve it quickly.
I think the issue is that the individual trees get confused: they only use 6 features each, but you give them 31. You can try to get the prediction to work by setting check_input=False:
list(classifier_list[i].predict(feature_data_arr, check_input=False))
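For a self-contained version of that loop, here is a sketch with stand-in data matching the question's 31 features (note that check_input=False also skips the dtype conversion, so the input must already be float32):
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# stand-in data with 31 features, as in the question
X, y = make_classification(n_samples=200, n_features=31, random_state=0)
rf = RandomForestClassifier(n_estimators=10, max_features=6).fit(X, y)

X32 = X.astype(np.float32)  # check_input=False skips validation and conversion
votes = np.zeros(len(X32))
for tree in rf.estimators_:
    votes += tree.predict(X32, check_input=False)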