I have a temperature dataset of 427 days (daily temperature data). I am training the ARIMA model on 360 days and trying to predict the remaining 67 days, then comparing the results. When fitting the model to the test data I just get a straight line as predictions. Am I doing something wrong?
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# fit an ARIMA(1,1,2) model on the training series
model = ARIMA(train['max'], order=(1, 1, 2))
results = model.fit()
results.summary()

# predict the 67 test days in one shot
start = len(train)
end = len(train) + len(test) - 1
predictions = pd.DataFrame()
predictions['pred'] = results.predict(start=start, end=end, typ='levels').rename('ARIMA(1,1,2) Predictions')
Your ARIMA model uses the last two observations to make a prediction, which means the prediction for t(361) is based on the true values of t(360) and t(359). The prediction for t(362) is based on the already predicted t(361) and the true t(360). The prediction for t(363) is based on two predicted values, t(362) and t(361). Each prediction is based on previous predictions, which means that forecasting errors negatively impact new predictions: the prediction for t(400) is based on predictions that are based on predictions that are based on predictions, and so on. Even if your prediction deviates by only 1% at each time step, the forecasting error grows larger the more time steps you try to predict. In such cases the predictions often flatten into a straight line at some point.
If you use an ARIMA(p, d, q) model, you can meaningfully forecast only about q steps into the future. Predicting 67 steps ahead is a very far horizon, and ARIMA is most likely not able to do that. Instead, try to predict only the next single or few time steps, for example with a rolling one-step-ahead forecast like the sketch below.
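A minimal sketch of such a rolling one-step-ahead forecast, assuming train['max'] and test['max'] are the pandas Series from the question (re-fitting on the growing history at every step is slow but keeps the example simple):

from statsmodels.tsa.arima.model import ARIMA
import pandas as pd

history = list(train['max'])
one_step_preds = []
for true_value in test['max']:
    results = ARIMA(history, order=(1, 1, 2)).fit()       # re-fit on everything seen so far
    one_step_preds.append(results.forecast(steps=1)[0])   # forecast only the next day
    history.append(true_value)                             # then append the observed value
predictions = pd.DataFrame({'pred': one_step_preds}, index=test.index)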
I am trying to optimize a given neural network (e.g. a multilayer perceptron with 2 hidden layers) by finding the number of epochs and the batch size that give the highest accuracy.
for epoch from 10 to 200 (in steps of 10):
    for batch from 40 to 200 (in steps of 20):
        modele.fit(X_train, Y_train, epochs=epoch, batch_size=batch)
        save (batch, epoch, accuracy)
Afterwards I kept the smallest number of epochs, with the smallest corresponding batch size, that gave the highest accuracy,
e.g. best_params: epoch = 10, batch = 150 => accuracy = 94%.
My problem is that when I re-run my model with the best_params, it doesn't give me the same results (loss, accuracy), sometimes even a very low accuracy (e.g. 10%).
I tried fixing the seed, but that did not help.
Regards
Djam75
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['Nb_Batch', 'Nb_Epoch', 'Accuracy'])
i = 0
lst_loss = []
lst_accuracy = []
lst_epoch = list(np.arange(10, 200, 10))
lst_batch = list(np.arange(100, 400, 20))

for epoch in lst_epoch:
    print('---------------- Epoch ' + str(epoch) + ' ------------------')
    for batch in lst_batch:
        # note: the Keras argument is `epochs`, not the deprecated `nb_epoch`
        modelSimple.fit(X_train, Y_train, epochs=epoch, batch_size=batch, verbose=0)
        score = modelSimple.evaluate(X_test, Y_test)
        df.loc[i, "Nb_Batch"] = batch
        df.loc[i, "Nb_Epoch"] = epoch
        df.loc[i, "Accuracy"] = score[1] * 100
        i = i + 1
This might be happening due to random parameter initialization: if you build an end-to-end model without transferring pre-trained weights, the architecture gets fresh random values for its parameters every time you train it.
In this case, a good practice is to use batch normalization layers after some layers, depending on your architecture (see the sketch after the links below).
TensorFlow implementation
PyTorch implementation
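For example, a minimal Keras sketch of where BatchNormalization layers could go; the layer sizes, the 20 input features and the 10 output classes are placeholder assumptions, not taken from the question:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),   # 20 input features (assumption)
    layers.BatchNormalization(),                              # normalize the first hidden layer's activations
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),
    layers.Dense(10, activation="softmax"),                   # 10 classes (assumption)
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])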
Extra idea:
Do not use any 'for' or 'while' loops in the model implementation.
You can follow the templates in TensorFlow or PyTorch.
Or, if you build a complete model from scratch, vectorize operations by using a matrix-operation library such as NumPy (see the small sketch below).
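A tiny illustration (not from the original answer) of the loop-vs-vectorization point, using a dot product:

import numpy as np

x = np.random.rand(1_000_000)
w = np.random.rand(1_000_000)

# explicit Python loop: slow
total = 0.0
for xi, wi in zip(x, w):
    total += xi * wi

# vectorized NumPy equivalent: runs in optimized native code
total_vec = x @ w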
Thanks for the update.
I resolved my problem by saving the model and loading it afterwards.
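For reference, a minimal sketch of that save/load approach in Keras; the file name is a placeholder:

# save the trained model once the best hyper-parameters are found
modelSimple.save("best_model.h5")

# later: reload exactly the same weights instead of re-training
from tensorflow import keras
best_model = keras.models.load_model("best_model.h5")
loss, accuracy = best_model.evaluate(X_test, Y_test)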
Thanks for the idea (batch normalization) and the extra idea: not using any for loops ;-)
Regards
I think you might not be updating the weight matrix after completing the training for certain batch sizes and epochs.
Please include the code as well so we can see the problem.
I'm training a neural network (doesn't matter which one) on CIFAR-10 dataset. I'm using Federated Learning:
I have 10 models, each model having access to its own part of the dataset. At every time step, each model makes a step using its own data, and then the global model is an average of the models (this version is based on this, but I tried a lot of options):
def server_aggregate(server_model, client_models):
    # average each parameter tensor over all client models (FedAvg)
    global_dict = server_model.state_dict()
    for k in global_dict.keys():
        global_dict[k] = torch.stack(
            [client_models[i].state_dict()[k].float() for i in range(len(client_models))], 0
        ).mean(0)
    server_model.load_state_dict(global_dict)
    # broadcast the averaged weights back to every client
    for model in client_models:
        model.load_state_dict(server_model.state_dict())
To be specific, each machine only has access to a data corresponding to a single class. I.e. machine 0 has only samples corresponding to class 0, etc. I'm doing it the following way:
def split_into_classes(full_ds, batch_size, num_classes=10):
    # collect the indices belonging to each class, then build one DataLoader per class
    class2indices = [[] for _ in range(num_classes)]
    for i, y in enumerate(full_ds.targets):
        class2indices[y].append(i)
    datasets = [torch.utils.data.Subset(full_ds, indices) for indices in class2indices]
    return [DataLoader(ds, batch_size=batch_size, shuffle=True) for ds in datasets]
Problem. During training, I can see that my federated training loss decreases. However, I never see my test loss/accuracy improve (acc is always around 10%).
Moreover, when I check accuracy on train/test datasets:
For the federated dataset, the accuracy improves.
For the testing dataset, the accuracy doesn't improve.
(Most surprisingly) for the training dataset, the accuracy doesn't improve. Note that this dataset is essentially the same as the federated dataset, just not split into classes. The checking code is the following:
def epoch_summary(model, fed_loaders, true_train_loader, test_loader, frac):
    with torch.no_grad():
        train_len = 0
        train_loss, train_acc = 0, 0
        for train_loader in fed_loaders:
            cur_loss, cur_acc, cur_len = true_results(model, train_loader, frac)
            train_loss += cur_len * cur_loss
            train_acc += cur_len * cur_acc
            train_len += cur_len
        train_loss /= train_len
        train_acc /= train_len
        true_train_loss, true_train_acc, true_train_len = true_results(model, true_train_loader, frac)
        test_loss, test_acc, test_len = true_results(model, test_loader, frac)
        print("TrainLoss: {:.4f} TrainAcc: {:.2f} TrueLoss: {:.4f} TrueAcc: {:.2f} TestLoss: {:.4f} TestAcc: {:.2f}".format(
            train_loss, train_acc, true_train_loss, true_train_acc, test_loss, test_acc
        ), flush=True)
The full code can be found here. Things which don't seem to matter:
Model. I got the same problem for Resnet models and for some other models.
How I aggregate the models. I tried using state_dict or directly manipulating model.parameters(); no effect.
How I train the models. I tried using optim.SGD or directly updating param.data -= learning_rate * param.grad; no effect.
Computational graph. I've tried adding .detach().clone() and with torch.no_grad() in all possible places; no effect.
So I suspect that the problem is somehow with the federated data itself (especially given the strange accuracy results). What could the problem be?
10% on CIFAR-10 is basically random - your model outputs labels at random and gets 10%.
I think the problem lies in your "federated training" strategy: you cannot expect your sub-models to learn anything meaningful when all they see is a single label. This is why training data is shuffled.
Think of it this way: suppose each of your sub-models learns all weights to be zero apart from the bias vector of the last classification layer, which has a 1 in the entry corresponding to the class that sub-model sees. The training of each sub-model is then perfect (it gets every training sample it sees right), but the averaged model is meaningless.
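A tiny numeric illustration of that argument (not from the original answer): if every client model is effectively a one-hot bias vector for its own class, FedAvg of those biases is a uniform vector, so the averaged model has no preference for any class.

import numpy as np

# each client: zero weights, bias = one-hot vector for the single class it sees
biases = np.eye(10)             # client i predicts class i for every input -> perfect on its own data
avg_bias = biases.mean(axis=0)  # FedAvg of the biases
print(avg_bias)                 # [0.1 0.1 ... 0.1] -> every class equally likely, ~10% test accuracy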
I wanted to simulate classifying whether a student will pass or fail a course depending on training data with a single input, namely a student's exam score.
I start by creating a data set of test scores for 1000 students, normally distributed with a mean of 80.
I then created a classification of "1" (passing) for the top 300 students, which, based on the seed, corresponds to a test score of 80.87808591534409.
(Obviously we don't really need machine learning for this, as this means anyone with a test score higher than 80.87808591534409 passes the class. But I want to build a model that accurately predicts this, so that I can start adding new input features and expand my classification beyond, pass/fail).
Next I created a test set in the same way, and classified these students using the classification threshold previously computed for the training set (80.87808591534409).
Then, as you can see below or in the linked Jupyter notebook, I created a model that takes one input feature and returns two results (a probability for the zero-index classification (fail) and a probability for the one-index classification (pass)).
Then I trained it on the training data set. But as you can see the loss never really improves per iteration. It just kind of hovers at 0.6.
Finally, I ran the trained model on the test data set and generated predictions.
I plotted the results as follows:
The green line represents the actual (not the predicted) classifications of the test set.
The blue line represents the probability of 0 index outcome (failing) and the orange line represents the probability of the 1 index outcome (passing).
As you can see, they remain flat. If my model were working, I would expect these lines to trade places at the threshold where the actual data switches from failing to passing.
I imagine I could be doing a lot of things wrong, but if anyone has time to look at the code below and give me some advice I would be grateful.
I've created a public working example of my attempt here.
And I've included the current code below.
The problem I'm having is that the model training seems to get stuck while computing the loss, and as a result it predicts that every one of the 1,000 students in my testing set fails, no matter what their test score is, which is obviously wrong.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.experimental.list_physical_devices("GPU") else "NOT AVAILABLE")
## Create data
# Set Seed
np.random.seed(0)
# Create 1000 test scores normally distributed with a standard deviation of 2 and a mean of 80
train_exam_scores = np.sort(np.random.normal(80,2,1000))
# Create classification; top 300 pass the class (classification of 1), bottom 700 do not pass (classification of 0)
train_labels = np.array([0. for i in range(700)])
train_labels = np.append(train_labels, [1. for i in range(300)])
print("Point at which test scores correlate with passing class: {}".format(train_exam_scores[701]))
print("computed point with seed of 0 should be: 80.87808591534409")
print("Plot point at which test scores correlate with passing class")
## Plot view
plt.plot(train_exam_scores)
plt.plot(train_labels)
plt.show()
#create another set of 1000 test scores with different seed (10)
np.random.seed(10)
test_exam_scores = np.sort(np.random.normal(80,2,1000))
# create classification labels for the new test set based on passing rate of 80.87808591534409 determined above
test_labels = np.array([])
for index, i in enumerate(test_exam_scores):
if (i >= 80.87808591534409):
test_labels = np.append(test_labels, 1)
else:
test_labels = np.append(test_labels, 0)
plt.plot(test_exam_scores)
plt.plot(test_labels)
plt.show()
print(tf.shape(train_exam_scores))
print(tf.shape(train_labels))
print(tf.shape(test_exam_scores))
print(tf.shape(test_labels))
train_dataset = tf.data.Dataset.from_tensor_slices((train_exam_scores, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((test_exam_scores, test_labels))
BATCH_SIZE = 5
SHUFFLE_BUFFER_SIZE = 1000
train_dataset = train_dataset.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
test_dataset = test_dataset.batch(BATCH_SIZE)
# view example of feature to label correlation, values above 80.87808591534409 are classified as 1, those below are classified as 0
features, labels = next(iter(train_dataset))
print(features)
print(labels)
# create model with a first layer taking 1 input feature per student, and an output layer of two values (probability of the 0 or 1 classification)
model = tf.keras.Sequential([
tf.keras.layers.Dense(10, activation=tf.nn.relu, input_shape=(1,)), # input shape required
tf.keras.layers.Dense(10, activation=tf.nn.relu),
tf.keras.layers.Dense(2)
])
# Test untrained model on training features; should produce nonsense results
predictions = model(features)
print(tf.nn.softmax(predictions[:5]))
print("Prediction: {}".format(tf.argmax(predictions, axis=1)))
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
model.compile(optimizer=optimizer,
loss=loss_object,
metrics=['categorical_accuracy'])
#train model
model.fit(train_dataset,
epochs=20,
validation_data=test_dataset,
verbose=1)
#make predictions on test scores from test_dataset
predictions = model.predict(test_dataset)
tf.nn.softmax(predictions[:1000])
tf.argmax(predictions, axis=1)
# I anticipate that the predictions would show a higher probability for index position [0] (classification 0, "did not pass")
#until it reaches a value greater than 80.87808591534409
# which in the test data with a seed of 10 should be the value at the 683 index position
# but at this point I would expect there to be a higher probability for index position [1] (classification 1), "did pass"
# because it is obvious from the data that anyone who scores higher than 80.87808591534409 should pass.
# Thus in the chart below I would expect the lines charting the probability to switch precisely at the point where the test classifications shift.
# However this is not the case. All predictions are the same for all 1000 values.
plt.plot(tf.nn.softmax(predictions[:1000]))
plt.plot(test_labels)
plt.show()
The main issue here: use the softmax activation in the last layer, not separately outside the model. Change the final layer to:
tf.keras.layers.Dense(2, activation="softmax")
Secondly, for two hidden layers with relu, 0.1 may be too high a learning rate. Try with a lower rate of maybe 0.01 or 0.001.
Another thing to try is to divide the input by 100, to get inputs in the range [0, 1]. This makes training easier, since the update step does not heavily modify the weights.
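Putting the three suggestions together, a minimal sketch could look like this (it reuses the arrays and layer sizes from the question, treating them as given):

# softmax output, lower learning rate, inputs scaled to roughly [0, 1]
scaled_train = (train_exam_scores / 100.0).reshape(-1, 1)
scaled_test = (test_exam_scores / 100.0).reshape(-1, 1)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation=tf.nn.relu, input_shape=(1,)),
    tf.keras.layers.Dense(10, activation=tf.nn.relu),
    tf.keras.layers.Dense(2, activation="softmax"),   # probabilities instead of raw logits
])
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),  # outputs are already probabilities
    metrics=["accuracy"],
)
model.fit(scaled_train, train_labels, batch_size=5, epochs=20,
          validation_data=(scaled_test, test_labels), verbose=1)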
Context:
I am using the Passive Aggressive estimator from the scikit-learn library and am confused about whether to use warm_start or partial_fit.
Efforts hitherto:
Referred this thread discussion:
https://github.com/scikit-learn/scikit-learn/issues/1585
Gone through the scikit code for _fit and _partial_fit.
My observations:
_fit in turn calls _partial_fit.
When warm_start is set, _fit calls _partial_fit with self.coef_
When _partial_fit is called without coef_init parameter and self.coef_ is set, it continues to use self.coef_
Question:
I feel both ultimately provide the same functionality. What, then, is the basic difference between them? In which contexts is each of them used?
Am I missing something evident? Any help is appreciated!
I don't know about the Passive Aggressive estimator, but at least when using the SGDRegressor, partial_fit will only fit for 1 epoch, whereas fit will fit for multiple epochs (until the loss converges or max_iter is reached). Therefore, when fitting new data to your model, partial_fit will only correct the model one step towards the new data, but with fit and warm_start it will act as if you had combined your old data and your new data and fit the model once until convergence.
Example:
from sklearn.linear_model import SGDRegressor
import numpy as np
np.random.seed(0)
X = np.linspace(-1, 1, num=50).reshape(-1, 1)
Y = (X * 1.5 + 2).reshape(50,)
modelFit = SGDRegressor(learning_rate="adaptive", eta0=0.01, random_state=0, verbose=1,
shuffle=True, max_iter=2000, tol=1e-3, warm_start=True)
modelPartialFit = SGDRegressor(learning_rate="adaptive", eta0=0.01, random_state=0, verbose=1,
shuffle=True, max_iter=2000, tol=1e-3, warm_start=False)
# first fit some data
modelFit.fit(X, Y)
modelPartialFit.fit(X, Y)
# for both: Convergence after 50 epochs, Norm: 1.46, NNZs: 1, Bias: 2.000027, T: 2500, Avg. loss: 0.000237
print(modelFit.coef_, modelPartialFit.coef_) # for both: [1.46303288]
# now fit new data (zeros)
newX = X
newY = 0 * Y
# fits only for 1 epoch, Norm: 1.23, NNZs: 1, Bias: 1.208630, T: 50, Avg. loss: 1.595492:
modelPartialFit.partial_fit(newX, newY)
# Convergence after 49 epochs, Norm: 0.04, NNZs: 1, Bias: 0.000077, T: 2450, Avg. loss: 0.000313:
modelFit.fit(newX, newY)
print(modelFit.coef_, modelPartialFit.coef_) # [0.04245779] vs. [1.22919864]
newX = np.reshape([2], (-1, 1))
print(modelFit.predict(newX), modelPartialFit.predict(newX)) # [0.08499296] vs. [3.66702685]
If warm_start = False, each subsequent call to .fit() (after an initial call to .fit() or partial_fit()) will re-initialise the model's trainable parameters before fitting. If warm_start = True, each subsequent call to .fit() (after an initial call to .fit() or partial_fit()) will retain the values of the model's trainable parameters from the previous run and use those as the starting point.
Regardless of the value of warm_start, each call to partial_fit() will retain the previous run's model parameters and use those initially.
Example using MLPRegressor:
import sklearn.neural_network
import numpy as np
np.random.seed(0)
x = np.linspace(-1, 1, num=50).reshape(-1, 1)
y = (x * 1.5 + 2).reshape(50,)
cold_model = sklearn.neural_network.MLPRegressor(hidden_layer_sizes=(), warm_start=False, max_iter=1)
warm_model = sklearn.neural_network.MLPRegressor(hidden_layer_sizes=(), warm_start=True, max_iter=1)
cold_model.fit(x,y)
print(cold_model.coefs_, cold_model.intercepts_)
#[array([[0.17009494]])] [array([0.74643783])]
cold_model.fit(x,y)
print(cold_model.coefs_, cold_model.intercepts_)
#[array([[-0.60819342]])] [array([-1.21256186])]
#after second run of .fit(), values are completely different
#because they were re-initialised before doing the second run for the cold model
warm_model.fit(x,y)
print(warm_model.coefs_, warm_model.intercepts_)
#[array([[-1.39815616]])] [array([1.651504])]
warm_model.fit(x,y)
print(warm_model.coefs_, warm_model.intercepts_)
#[array([[-1.39715616]])] [array([1.652504])]
#this time with the warm model, params change relatively little, as params were
#not re-initialised during second call to .fit()
cold_model.partial_fit(x,y)
print(cold_model.coefs_, cold_model.intercepts_)
#[array([[-0.60719343]])] [array([-1.21156187])]
cold_model.partial_fit(x,y)
print(cold_model.coefs_, cold_model.intercepts_)
#[array([[-0.60619347]])] [array([-1.21056189])]
#with partial_fit(), params barely change even for cold model,
#as no re-initialisation occurs
warm_model.partial_fit(x,y)
print(warm_model.coefs_, warm_model.intercepts_)
#[array([[-1.39615617]])] [array([1.65350392])]
warm_model.partial_fit(x,y)
print(warm_model.coefs_, warm_model.intercepts_)
#[array([[-1.39515619]])] [array([1.65450372])]
#and of course the same goes for the warm model
First, let us look at the difference between .fit() and .partial_fit().
.fit() lets you train from scratch. Hence, you can think of it as an option that is used only once for a model. If you call .fit() again with a new set of data, the model will be built on the new data and will carry no influence from the previous dataset.
.partial_fit() lets you update the model with incremental data. Hence, this option can be used more than once for a model. This is useful when the whole dataset cannot be loaded into memory; refer here.
If .fit() or .partial_fit() is going to be used only once, it makes no difference which one you choose.
warm_start can only be used in .fit(); it lets you start the learning from the coefficients of the previous fit(). It might sound similar in purpose to partial_fit(), but the recommended way is partial_fit(); perhaps call partial_fit() with the same incremental data a few times to improve the learning.
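Since the question is specifically about the Passive Aggressive estimator, here is a small sketch (with made-up toy data) of how the two usages would typically be called:

from sklearn.linear_model import PassiveAggressiveClassifier
import numpy as np

np.random.seed(0)
# toy two-class data in two batches (made up purely for illustration)
X1, y1 = np.random.randn(100, 5), np.random.randint(0, 2, 100)
X2, y2 = np.random.randn(100, 5), np.random.randint(0, 2, 100)

# incremental learning: each call does one pass and updates the current coef_
inc = PassiveAggressiveClassifier()
inc.partial_fit(X1, y1, classes=[0, 1])   # classes must be given on the first call
inc.partial_fit(X2, y2)                   # continues from the current coef_

# warm start: fit() runs until convergence / max_iter,
# but starts from the previous coef_ instead of re-initialising
warm = PassiveAggressiveClassifier(warm_start=True, max_iter=1000)
warm.fit(X1, y1)
warm.fit(X2, y2)                          # starts from the coefficients learned on X1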
About the difference: warm_start is just an attribute of the class, while partial_fit is a method of that class, so they are basically different things.
About the same functionality: yes, partial_fit will use self.coef_, because it still needs some values to update during training. For an empty coef_init we simply put zero values into self.coef_ and go to the next training step.
Description:
First call: no matter how (with or without warm start), we train from zero coefficients, and at the end we save the average of our coefficients.
Call N+1, with warm start: the method _allocate_parameter_mem picks up our previous coefficients and uses them as the starting point for training; at the end the average coefficients are saved again.
Call N+1, without warm start: we put zero coefficients (as in the first call) and go to the training step; at the end we still write the average coefficients to memory.