I'm using a deep CNN+LSTM network to perform classification on a dataset of 1D signals. I'm using Keras 2.2.4 backed by TensorFlow 1.12.0. Since I have a large dataset and limited resources, I'm using a generator to load the data into memory during the training phase. First, I tried this generator:
def data_generator(batch_size, preproc, type, x, y):
num_examples = len(x)
examples = zip(x, y)
examples = sorted(examples, key = lambda x: x[0].shape[0])
end = num_examples - batch_size + 1
batches = [examples[i:i + batch_size] for i in range(0, end, batch_size)]
random.shuffle(batches)
while True:
for batch in batches:
x, y = zip(*batch)
yield preproc.process(x, y)
Using the above method, I'm able to launch training with a mini-batch size of up to 30 samples at a time. However, this approach does not guarantee that the network will train only once on each sample per epoch. Consider this comment from the Keras website:
Sequence is a safer way to do multiprocessing. This structure
guarantees that the network will only train once on each sample per
epoch which is not the case with generators.
I've tried another way of loading data using the following class:
class Data_Gen(Sequence):
def __init__(self, batch_size, preproc, type, x_set, y_set):
self.x, self.y = np.array(x_set), np.array(y_set)
self.batch_size = batch_size
self.indices = np.arange(self.x.shape[0])
np.random.shuffle(self.indices)
self.type = type
self.preproc = preproc
def __len__(self):
# print(self.type + ' - len : ' + str(int(np.ceil(self.x.shape[0] / self.batch_size))))
return int(np.ceil(self.x.shape[0] / self.batch_size))
def __getitem__(self, idx):
inds = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
batch_x = self.x[inds]
batch_y = self.y[inds]
return self.preproc.process(batch_x, batch_y)
def on_epoch_end(self):
np.random.shuffle(self.indices)
I can confirm that using this method the network trains exactly once on each sample per epoch, but this time, as soon as I put more than 7 samples in the mini-batch, I get an out-of-memory error:
OP_REQUIRES failed at random_op.cc: 202: Resource exhausted: OOM when
allocating tensor with shape...............
I can confirm that I'm using the same model architecture, configuration, and machine for this test. I'm wondering why there is a difference between these two ways of loading data.
Please don't hesitate to ask for more details in case needed.
Thanks in advance.
EDITED:
Here is the code I'm using to fit the model:
reduce_lr = keras.callbacks.ReduceLROnPlateau(
factor=0.1,
patience=2,
min_lr=params["learning_rate"])
checkpointer = keras.callbacks.ModelCheckpoint(
filepath=str(get_filename_for_saving(save_dir)),
save_best_only=False)
batch_size = params.get("batch_size", 32)
path = './logs/run-{0}'.format(datetime.now().strftime("%b %d %Y %H:%M:%S"))
tensorboard = keras.callbacks.TensorBoard(log_dir=path, histogram_freq=0,
write_graph=True, write_images=False)
if index == 0:
print(model.summary())
print("Model memory needed for batchsize {0} : {1} Gb".format(batch_size, get_model_memory_usage(batch_size, model)))
if params.get("generator", False):
train_gen = load.data_generator(batch_size, preproc, 'Train', *train)
dev_gen = load.data_generator(batch_size, preproc, 'Dev', *dev)
valid_metrics = Metrics(dev_gen, len(dev[0]) // batch_size, batch_size)
model.fit_generator(
train_gen,
steps_per_epoch=len(train[0]) / batch_size + 1 if len(train[0]) % batch_size != 0 else len(train[0]) // batch_size,
epochs=MAX_EPOCHS,
validation_data=dev_gen,
validation_steps=len(dev[0]) / batch_size + 1 if len(dev[0]) % batch_size != 0 else len(dev[0]) // batch_size,
callbacks=[valid_metrics, MyCallback(), checkpointer, reduce_lr, tensorboard])
# train_gen = load.Data_Gen(batch_size, preproc, 'Train', *train)
# dev_gen = load.Data_Gen(batch_size, preproc, 'Dev', *dev)
# model.fit_generator(
# train_gen,
# epochs=MAX_EPOCHS,
# validation_data=dev_gen,
# callbacks=[valid_metrics, MyCallback(), checkpointer, reduce_lr, tensorboard])
Those methods are roughly the same. It is correct to subclass Sequence when your dataset doesn't fit in memory. But you shouldn't run any preprocessing in any of the class' methods, because it will be re-executed once per epoch, wasting lots of computing resources.
It is probably also easier to shuffle the samples rather than their
indices. Like this:
from random import shuffle
import numpy as np
from keras.utils import Sequence

class DataGen(Sequence):
    def __init__(self, batch_size, preproc, type, x_set, y_set):
        self.samples = list(zip(x_set, y_set))
        self.batch_size = batch_size
        shuffle(self.samples)
        self.type = type
        self.preproc = preproc
    def __len__(self):
        return int(np.ceil(len(self.samples) / self.batch_size))
    def __getitem__(self, i):
        batch = self.samples[i * self.batch_size:(i + 1) * self.batch_size]
        return self.preproc.process(*zip(*batch))
    def on_epoch_end(self):
        shuffle(self.samples)
I think it is impossible to say why you run out of memory without
knowing more about your data. My guess would be that your preproc
function is doing something wrong. You can debug it by running:
for e in DataGen(batch_size, preproc, 'Train', *train):
    print(e)
for e in DataGen(batch_size, preproc, 'Dev', *dev):
    print(e)
You will most likely run out of memory.
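If the batches produced by preproc.process are what blow up, it can also help to print their size while iterating. A small sketch, assuming process returns a tuple of numpy arrays (the names x_batch and y_batch are just placeholders):
import numpy as np

gen = DataGen(batch_size, preproc, 'Train', *train)
for i in range(len(gen)):
    x_batch, y_batch = gen[i]
    x_batch = np.asarray(x_batch)
    print(i, x_batch.shape, round(x_batch.nbytes / 1e6, 2), 'MB')
Comparing the largest batch printed here between the two loading methods should show whether one of them really produces bigger batches.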
The TensorFlow versions in which I can still reproduce this behavior are 2.7.0, 2.7.3, 2.8.0, and 2.9.0. Actually, these are all the versions I've tried; I wasn't able to resolve the issue in any of them.
OS: Ubuntu 20
GPU: RTX 2060
RAM: 16GB
I am trying to feed my data to a model using a generator:
class DataGen(tf.keras.utils.Sequence):
def __init__(self, indices, batch_size):
self.X = X
self.y = y
self.indices = indices
self.batch_size = batch_size
def __getitem__(self, index):
X_batch = self.X[self.indices][
index * self.batch_size : (index + 1) * self.batch_size
]
y_batch = self.y[self.indices][
index * self.batch_size : (index + 1) * self.batch_size
]
return X_batch, y_batch
def __len__(self):
return len(self.y[self.indices]) // self.batch_size
train_gen = DataGen(train_indices, 32)
val_gen = DataGen(val_indices, 32)
test_gen = DataGen(test_indices, 32)
where X and y are my dataset loaded from a .h5 file using h5py, and train_indices, val_indices, test_indices are the indices for each set that will be used on X and y.
I am creating the model and feeding the data using:
# setup model
base_model = tf.keras.applications.MobileNetV2(input_shape=(128, 128, 3),
include_top=False)
base_model.trainable = False
mobilenet1 = Sequential([
base_model,
Flatten(),
Dense(27, activation='softmax')
])
mobilenet1.compile(optimizer=tf.keras.optimizers.Adam(),
loss=tf.keras.losses.CategoricalCrossentropy(),
metrics=['accuracy'])
# model training
hist_mobilenet = mobilenet1.fit(train_gen, validation_data=val_gen, epochs=1)
The memory usage right before training is 8%, but the moment training starts it climbs from 30% up to 60%. Since I am using a generator and loading the data in small chunks of 32 observations at a time, it seems odd to me that memory climbs this high. Also, even when training stops, memory usage stays above 30%. I checked all global variables, but none of them has such a large size. If I start another training session, memory usage climbs even higher, and eventually the Jupyter notebook kernel dies.
Is something wrong with my implementation, or is this normal?
Edit 1: some additional info.
Whenever the training stops, memory usage drops a little, but I can decrease it even more by calling the garbage collector. However, I cannot bring it back down to 8%, even when I delete the history created by fit.
The x and y batches' sizes sum up to 48 bytes; this outrages me! How come loading 48 bytes of data at a time causes memory usage to increase that much? I am supposedly using an HDF5 dataset precisely so that I can handle the data without overloading RAM. The next thing that comes to mind is that fit creates some variables, but it doesn't make sense that it would need so many GBs of memory to store them.
Literally, this is not a generator. When you instantiate DataGen, you create a complete object with the full indices (__init__(self, indices, batch_size)), with the datasets (self.X, self.y), with inheritance from Sequence, and so on.
The simplest real generator for tensorflow looks something like this:
import tensorflow as tf
from sklearn.model_selection import train_test_split

AUTOTUNE = tf.data.AUTOTUNE
BATCH_SIZE = 32

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
X_val = X_train[int(len(X_train) * 0.8):]
X_train = X_train[:int(len(X_train) * 0.8)]
y_val = y_train[int(len(y_train) * 0.8):]
y_train = y_train[:int(len(y_train) * 0.8)]
def gen_reader(X_train, y_train):
for data, label in zip(X_train, y_train):
yield data, label
train_ds = tf.data.Dataset.from_generator(gen_reader, args=[X_train, y_train], output_types=(tf.float64, tf.int8)).batch(BATCH_SIZE).prefetch(buffer_size=AUTOTUNE)
val_ds = tf.data.Dataset.from_generator(gen_reader, args=[X_val, y_val], output_types=(tf.float64, tf.int8)).batch(BATCH_SIZE).prefetch(buffer_size=AUTOTUNE)
test_ds = tf.data.Dataset.from_generator(gen_reader, args=[X_test, y_test], output_types=(tf.float64, tf.int8)).batch(BATCH_SIZE).prefetch(buffer_size=AUTOTUNE)
...
hist_mobilenet = mobilenet1.fit(train_ds, validation_data=val_ds, epochs=1)
How to minimize RAM usage
From the very helpful comments and answers of our fellow users, I came to the following conclusion:
First, we have to save the data to an HDF5 file, so that we do not have to load the whole dataset into memory.
import h5py as h5
import gc
file = h5.File('data.h5', 'r')
X = file['X']
y = file['y']
gc.collect()
I am calling the garbage collector just to be safe.
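Just to illustrate why this keeps RAM low (a small sketch, assuming data.h5 really contains datasets named 'X' and 'y' as above): h5py only reads rows from disk when you slice the dataset, so nothing is loaded at open time.
print(type(X))            # h5py Dataset object, no data in RAM yet
print(X.shape, X.dtype)   # metadata only
batch = X[:32]            # only now are 32 rows read into a numpy array
print(batch.nbytes)       # memory cost of just this batch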
Then, we do not have to pass the data to the generator, as X and y will be the same for training, validation, and testing. To differentiate between the splits, we use index maps:
# split data for validation and testing
val_split, test_split = 0.2, 0.1
train_indices = np.arange(len(X))[:-int(len(X) * (val_split + test_split))]
val_indices = np.arange(len(X))[-int(len(X) * (val_split + test_split)) : -int(len(X) * test_split)]
test_indices = np.arange(len(X))[-int(len(X) * test_split):]
class DataGen(tf.keras.utils.Sequence):
def __init__(self, index_map, batch_size):
self.X = X
self.y = y
self.index_map = index_map
self.batch_size = batch_size
def __getitem__(self, index):
X_batch = self.X[self.index_map[
index * self.batch_size : (index + 1) * self.batch_size
]]
y_batch = self.y[self.index_map[
index * self.batch_size : (index + 1) * self.batch_size
]]
return X_batch, y_batch
def __len__(self):
return len(self.index_map) // self.batch_size
train_gen = DataGen(train_indices, 32)
val_gen = DataGen(val_indices, 32)
test_gen = DataGen(test_indices, 32)
The last thing to notice is how I implemented the data fetching inside __getitem__.
Correct solution:
X_batch = self.X[self.index_map[
index * self.batch_size : (index + 1) * self.batch_size
]]
Wrong solution:
X_batch = self.X[self.index_map][
index * self.batch_size : (index + 1) * self.batch_size
]
The same applies to y.
Notice the difference? In the wrong solution I am loading the whole dataset (training, validation, or testing) into memory! In the correct solution I am only loading the batch that is meant to be fed to the fit method.
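The same point on a toy example (a sketch with hypothetical sizes; index_map here plays the role of train_indices):
index_map = np.arange(50000)       # e.g. the training indices
wrong = X[index_map][0:32]         # materializes all 50000 rows in RAM, then slices
right = X[index_map[0:32]]         # reads only the 32 requested rows from the file
(Note that h5py requires the fancy-index array to be in increasing order, which holds here because the index maps are built with np.arange.)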
With this setup, RAM usage rose to only 2.88 GB, which is pretty cool!
Make use of fit_generator instead of the fit method
I mean instead of
hist_mobilenet = mobilenet1.fit(train_gen, validation_data=val_gen, epochs=1)
Use
hist_mobilenet = mobilenet1.fit_generator(train_gen, validation_data=val_gen, epochs=1)
According to this answer:
Keras' fit method loads all the data into memory at once, meaning changing your batch size will have no effect on the RAM it takes up. Have a look at using fit_generator, which is designed for use with a large dataset.
I think fit_generator will load the data batch-wise and not take up the whole RAM instantly.
I have a problem with the DataLoader from PyTorch, because it is very slow.
I did a test to show this, here is the code:
data = np.load('slices.npy')
data = np.reshape(data, (-1, 1225))
data = torch.FloatTensor(data).to('cuda')
print(data.shape)
# ==> torch.Size([273468, 1225])
class UnlabeledTensorDataset(TensorDataset):
def __init__(self, data_tensor):
self.data_tensor = data_tensor
self.samples = data_tensor.shape[0]
def __getitem__(self, index):
return self.data_tensor[index]
def __len__(self):
return self.samples
test_set = UnlabeledTensorDataset(data)
test_loader = DataLoader(test_set, batch_size=data.shape[0])
start = datetime.datetime.now()
with torch.no_grad():
for batch in test_loader:
print(batch.shape) # ==> torch.Size([273468, 1225])
y_pred = model(batch)
loss = torch.sqrt(criterion(y_pred, batch))
avg_loss = loss
print(round((datetime.datetime.now() - start).total_seconds() * 1000, 2))
# ==> 1527.57 (milliseconds) !!!!!!!!!!!!!!!!!!!!!!!!
start = datetime.datetime.now()
with torch.no_grad():
print(data.shape) # ==> torch.Size([273468, 1225])
y_pred = model(data)
loss = torch.sqrt(criterion(y_pred, data))
avg_loss = loss
print(round((datetime.datetime.now() - start).total_seconds() * 1000, 2))
# ==> 2.0 (milliseconds) !!!!!!!!!!!!!!!!!!!!!!!!
I would like to use the DataLoader, but I want a way to fix the slowness. Does anyone know why this is happening?
The time difference seems logical to me:
In the first case you go through the DataLoader, which assembles the batch sample by sample.
In the second case you run a single inference directly on the full tensor.
The problem is that the DataLoader builds the batch by calling __getitem__(self, index) once per sample, i.e. 273468 times (the batch size).
My solution is to ditch the DataLoader.
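A minimal sketch of what "ditching the DataLoader" could look like here, slicing the already-on-GPU tensor directly (data, model and criterion as defined in the question; the batch size of 4096 is just a hypothetical value):
import torch

batch_size = 4096
with torch.no_grad():
    for start in range(0, data.shape[0], batch_size):
        batch = data[start:start + batch_size]      # a view, no per-sample copies
        y_pred = model(batch)
        loss = torch.sqrt(criterion(y_pred, batch))
Slicing a contiguous tensor avoids the 273468 individual __getitem__ calls and the collation step entirely.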
I can't understand how, in the function train() below, the variables (data, target) are chosen.
def train(args, model, device, federated_train_loader, optimizer, epoch):
model.train()
for batch_idx, (data, target) in enumerate(federated_train_loader): # <-- now it is a distributed dataset
model.send(data.location) # <-- NEW: send the model to the right location`
I guess they are two tensors representing two random images from the training dataset, but then the loss function
loss = F.nll_loss(output, target)
is calculated at every iteration with a different target?
Also, I have a different question: I trained the network with images of cats, then I tested it with images of cars, and the accuracy reached is 97%. How is this possible? Is that a proper value, or am I doing something wrong?
here is the entire code:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import syft as sy # <-- NEW: import the Pysyft library
hook = sy.TorchHook(torch) # <-- NEW: hook PyTorch ie add extra functionalities to support Federated Learning
bob = sy.VirtualWorker(hook, id="bob") # <-- NEW: define remote worker bob
alice = sy.VirtualWorker(hook, id="alice") # <-- NEW: and alice
class Arguments():
def __init__(self):
self.batch_size = 64
self.test_batch_size = 1000
self.epochs = 2
self.lr = 0.01
self.momentum = 0.5
self.no_cuda = False
self.seed = 1
self.log_interval = 30
self.save_model = False
args = Arguments()
use_cuda = not args.no_cuda and torch.cuda.is_available()
torch.manual_seed(args.seed)
device = torch.device("cuda" if use_cuda else "cpu")
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
federated_train_loader = sy.FederatedDataLoader( # <-- this is now a FederatedDataLoader
datasets.MNIST("C:\\users...\\train", train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
]))
.federate((bob, alice)), # <-- NEW: we distribute the dataset across all the workers, it's now a FederatedDataset
batch_size=args.batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST("C:\\Users...\\test", train=False, download=True, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=args.test_batch_size, shuffle=True, **kwargs)
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(1, 20, 5, 1)
self.conv2 = nn.Conv2d(20, 50, 5, 1)
self.fc1 = nn.Linear(4*4*50, 500)
self.fc2 = nn.Linear(500, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(x, 2, 2)
x = F.relu(self.conv2(x))
x = F.max_pool2d(x, 2, 2)
x = x.view(-1, 4*4*50)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim=1)
def train(args, model, device, federated_train_loader, optimizer, epoch):
model.train()
for batch_idx, (data, target) in enumerate(federated_train_loader): # <-- now it is a distributed dataset
model.send(data.location) # <-- NEW: send the model to the right location
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
model.get() # <-- NEW: get the model back
if batch_idx % args.log_interval == 0:
loss = loss.get() # <-- NEW: get the loss back
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * args.batch_size, len(federated_train_loader) * args.batch_size,
100. * batch_idx / len(federated_train_loader), loss.item()))
def test(args, model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
pred = output.argmax(1, keepdim=True) # get the index of the max log-probability
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
test_loss, correct, len(test_loader.dataset),
100. * correct / len(test_loader.dataset)))
model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=args.lr) # TODO momentum is not supported at the moment
for epoch in range(1, args.epochs + 1):
train(args, model, device, federated_train_loader, optimizer, epoch)
test(args, model, device, test_loader)
if (args.save_model):
torch.save(model.state_dict(), "mnist_cnn.pt")
Consider it like this. When you hook torch, all your torch tensors get additional functionality - methods like .send() and .federate(), and attributes like .location and ._objects. Your data and target, which were once torch tensors, became pointers to tensors residing on different VirtualWorker objects, due to .federate((bob, alice)).
Now data and target have additional attributes, including .location, which returns the location of the tensor that the pointer called data/target points to.
Federated learning sends the global model to this location, as seen in model.send(data.location).
Now, model is a pointer residing at the same location and data is also a pointer residing there. Hence when you take the output as output = model(data), output will also reside there and all we (the central server or in other words, the VirtualWorker called 'me') will get is a pointer to that output.
Now, regarding your doubt on loss calculation, since output and target are both residing in that same location, calculation of loss will also happen there. Same goes for backprop and step.
Finally, you can see model.get(); this is where the central server pulls the remote model back using the pointer called model. (I'm not sure if it should be model = model.get() though.)
So anything with .get() will be pulled from that worker and returned in our Python statement. Also note that .get() will remove that object from its location when called. Hence, use .copy().get() if you are going to need it further.
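A tiny illustration of the pointer behavior described above, using only operations that already appear in the question and this answer (bob is the VirtualWorker defined earlier; x is a hypothetical tensor):
x = torch.tensor([1.0, 2.0, 3.0])
x_ptr = x.send(bob)        # the data now lives on bob; x_ptr is only a pointer
print(x_ptr.location)      # the worker that holds the tensor (bob)
y = x_ptr.copy().get()     # copy first, so bob keeps his tensor
z = x_ptr.get()            # pulls the tensor back and removes it from bob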
Is there any way we can add some functionality to ImageDataGenerator, so that it can take a list of filenames and randomly sample images for each minibatch?
I know that I can write a custom class that inherits from the ImageDataGenerator class, but I still don't know the details of how to do that.
Here is what I have done:
for epoch in range(epochs):
print("epoch is: %d, total epochs: %f" % ((epoch+1), int(epochs)))
print("prepare training batch...")
train_batch = makebatch(filelist=self.train_files, img_num=img_num, slice_times=slice_times)
print("prepare validation batch..")
val_batch = makebatch(filelist=self.val_files, img_num=int(math.ceil(img_num*0.2)), slice_times=slice_times)
x_train = train_batch
y_train = x_train
x_val = val_batch
y_val = x_val
print("generate training data...")
train_datagen.fit(x_train)
train_generator = train_datagen.flow(
x=x_train,
y=y_train,
batch_size=16)
val_datagen.fit(x_val)
val_generator = val_datagen.flow(
x=x_val,
y=y_val,
batch_size=16)
print("start training..")
history = model.fit_generator(
generator=train_generator,
steps_per_epoch=None,
epochs=1,
verbose=1,
validation_data=val_generator,
validation_steps=None,
callbacks=self.callbacks)
What I really want is to remove the for loop and have the generator randomly sample images for each batch.
Can someone help with that?
Here is what I would do.
Suppose I have a list of paths to all images stored in variables X_train, X_validation and the labels are stored as y_train and y_validation.
First, I would define a sequence generator. (This is from the Keras website.)
from keras.utils import Sequence
from skimage.io import imread
from skimage.transform import resize
import numpy as np
# Here, `x_set` is list of path to the images
# and `y_set` are the associated classes.
class CIFAR10Sequence(Sequence):
def __init__(self, x_set, y_set, batch_size):
self.x, self.y = x_set, y_set
self.batch_size = batch_size
def __len__(self):
return int(np.ceil(len(self.x) / float(self.batch_size)))
def __getitem__(self, idx):
batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
return np.array([
resize(imread(file_name), (200, 200))
for file_name in batch_x]), np.array(batch_y)
Now, I would define the generators for training and validation as
Xtrain_gen = CIFAR10Sequence(X_train, y_train, batch_size=512)  # you can choose your batch size
Xvalidation_gen = CIFAR10Sequence(X_validation, y_validation, batch_size=512)
Now, final step to train the model
model.fit_generator(generator=Xtrain_gen, epochs=100, validation_data=Xvalidation_gen,use_multiprocessing=True)
This will avoid the for loop for you, and it's very efficient because the CPU fetches data in parallel.
I am just starting with TensorFlow, and I thought a good first step would be to adapt the CIFAR10 model for my own use. My data are not images but signals, and the whole dataset has a shape of [16400, 3000, 1, 1] (dimension-wise: number of samples, height, width, and number of channels, added on purpose). I am already working on this problem with the MatConvNet toolbox, so this question is strictly about the TensorFlow mechanism. The dataset is a ready numpy tensor of the size above; the code below is my attempt to prepare the data to be readable for the training script.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from six.moves import xrange # pylint: disable=redefined-builtin
import tensorflow as tf
import numpy as np
IMAGE_SIZE = 3000
data = np.load('/home/tensorflow-master/tensorflow/models/image/cifar10/konsensop/data.npy')
labels = np.load('/home/tensorflow-master/tensorflow/models/image/cifar10/konsensop/labels.npy')
labels = labels-1
labels = labels.astype(int)
data = tf.cast(data,tf.float32)
labels = tf.cast(labels,tf.int64)
NUM_CLASSES = 2
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 10000
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 6400
def _generate_image_and_label_batch(data_sample, label, in_queue_examples,
batch_size, shuffle):
num_preprocess_threads = 16
if shuffle:
data, label_batch = tf.train.shuffle_batch(
[data_sample, label],
batch_size=batch_size,
num_threads=num_preprocess_threads,
capacity=min_queue_examples + batch_size,
min_after_dequeue=min_queue_examples)
else:
data, label_batch = tf.train.batch(
[data_sample, label],
batch_size=batch_size,
num_threads=num_preprocess_threads,
capacity=min_queue_examples + batch_size)
return data, tf.reshape(label_batch, [batch_size])
def inputs(data,labels, batch_size):
for i in xrange(0, data.shape[0]/batch_size):
data_sample = data[i,:,:,:]
label = labels[i,0]
height = 3000
width = 1
min_fraction_of_examples_in_queue = 0.4
min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN*
min_fraction_of_examples_in_queue)
print('Filling queue with %d data before starting to train' % min_queue_examples)
return _generate_image_and_label_batch(data_sample, label,
min_queue_examples, batch_size,
shuffle=True)
I'm trying to load the data I already have and generate batches the way the cifar10 model did, but when running the trainer code I get an error in data, labels = konsensop_input.inputs(data, labels, batch_size): UnboundLocalError: local variable 'data' referenced before assignment
data = konsensop_input.data
labels = konsensop_input.labels
def train():
with tf.Graph().as_default():
global_step = tf.Variable(0, trainable = False)
data, labels = konsensop_input.inputs(data, labels, batch_size)
logits = konsensop_train.inference(data)
# calculate loss
loss = konsensop.loss(logits, labels)
train_op = konsensop.train(loss, global_step)
# create a saver
saver = tf.train.Saver(tf.all_variables()) #saves all variables in a graph
# build the summary operation based on the TF collection of summaries
summary_op = tf.merge_all_summaries()
# build an initialization operation to run below
init = tf.initialize_all_variables()
# start running operations on the graph
sess = tf.Session(config = tf.ConfigProto(log_device_placement=False))
sess.run(init)
# start the queue runners
tf.train.start_queue_runners(sess=sess)  # what is this and what is it for?
summary_writer = tf.train.SummaryWriter( FLAGS.train_dir, sess.graph)
for step in xrange(FLAGS.max_step):
start_time = time.time()
_, loss_value = sess.run([train_op, loss])
duration = time.time() - start_time
assert not np.isnan(loss_value), 'Model diverged with loss = NaN'
if step % 10 == 0:
num_examples_per_step = FLAGS.batch_size
examples_per_sec = num_examples_per_step / duration
sec_per_batch = float(duration)
format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f sec/batch)')
print ( format_str % (datetime.now(), step, loss_value, examples_per_sec, sec_per_batch))
if step % 100 == 0:
summary_str = sess.run(summary_op)
summary_writer.add_summary(summary_str, step)
if step % 1000 == 0 or (step + 1) == FLAGS.max_steps:
checkpoint_path = os.path.join(FLAGS.train_dir, 'model.ckpt')
saver.save(sess, checkpoint_path, global_step = step)
def main(argv=None):
train()
if __name__=='__main__':
tf.app.run()
I would like to figure out how to implement a reasonable data feeding technique here.
For the relatively small data set you want to work with, you might consider just loading it into a big numpy array, then iterating over it in mini-batches, which you feed to the computation graph via tf.placeholders and the feed_dict mechanism.
The mini-batch iteration could look something like this (you should probably add random shuffling after each epoch):
import numpy as np

def iterate_batches(X, y, batch_size, num_epochs):
    N = np.size(X, 0)
    batches_per_epoch = N // batch_size
    for i in range(num_epochs):
        for j in range(batches_per_epoch):
            start, stop = j * batch_size, (j + 1) * batch_size
            yield X[start:stop, :], y[start:stop]
(If you are not familiar with Python's yield mechanism, google for Python generators. There a lots of good introductions on the web.)
Given that you have a mechanism to load the whole data set into a numpy array X_train, y_train, you can then write your training loop like this
train_op = ...
for X, y in iterate_batches(X_train, y_train, your_batch_size, your_num_epochs):
    sess.run([train_op], feed_dict={X_tensor: X, y_tensor: y})
Here, X_tensor and y_tensor are tf.placeholders for the data, which you have to define in your network architecture.
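A small sketch of how the per-epoch shuffling mentioned above could be added, together with example placeholders (the shapes follow the [16400, 3000, 1, 1] data described in the question and are otherwise assumptions):
import numpy as np
import tensorflow as tf

X_tensor = tf.placeholder(tf.float32, shape=[None, 3000, 1, 1])   # signal batches
y_tensor = tf.placeholder(tf.int64, shape=[None])                 # integer labels

def iterate_batches_shuffled(X, y, batch_size, num_epochs):
    N = np.size(X, 0)
    for _ in range(num_epochs):
        perm = np.random.permutation(N)              # reshuffle once per epoch
        for j in range(N // batch_size):
            idx = perm[j * batch_size:(j + 1) * batch_size]
            yield X[idx], y[idx]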