Binary classifier in CNTK with C++

I am trying to use the C++ API of CNTK for online learning. Reading the source code of the unit tests and the CNTKLibrary.h header, the only way I found to train a model is the Trainer.TrainMinibatch method. Can this method be used to pass a single input-output data point? If so, what is the easiest way to do it?
I tried to create a sequence with the CNTK::Value::CreateSequence method and pass it to the TrainMinibatch function, but it's not working the way I expected.
I tried to port this Python code to C++:
num_hidden_layers = 2
num_output_classes = 2
input_dim = 1
hidden_layers_dim = 400
input_var = C.input_variable(input_dim)
label_var = C.input_variable(num_output_classes)
def create_model(features):
    with C.layers.default_options(init = C.glorot_uniform(), activation=C.ops.relu):
        h = features
        for _ in range(num_hidden_layers):
            h = C.layers.Dense(hidden_layers_dim, activation=C.sigmoid)(h)
        r = C.layers.Dense(num_output_classes, activation=None)(h)
        return r
z = create_model(input_var)
loss = C.cross_entropy_with_softmax(z, label_var)
label_error = C.classification_error(z, label_var)
learning_rate = 0.2
lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch)
learner = C.sgd(z.parameters, lr_schedule)
trainer = C.Trainer(z, (loss, label_error), [learner])
input_map = { label_var : None, input_var : None}
training_progress_output_freq = 500
for i in range(0, 10000):
    input_map[input_var] = np.array([np.random.randint(0,2)], dtype=np.float32);
    if input_map[input_var] == 0:
        input_map[label_var] = np.array([1,0], dtype=np.float32)
    else:
        input_map[label_var] = np.array([0, 1], dtype=np.float32)
    trainer.train_minibatch(input_map)
I ended up with this C++ code:
const size_t inputDim = 1;// 28 * 28;
const size_t numOutputClasses = 2;// 10;
const size_t hiddenLayerDim = 400;
const size_t numHiddenLayers = 2;
//build the model
auto input = InputVariable({ inputDim }, DataType::Float, L"features");
FunctionPtr classifierOutput = input;
for (int i = 0; i < numHiddenLayers; i++)
{
    classifierOutput = FullyConnectedDNNLayer(classifierOutput, hiddenLayerDim, device, std::bind(Sigmoid, _1, L""));
}
classifierOutput = FullyConnectedLinearLayer(classifierOutput, 2, device);
auto labels = InputVariable({ numOutputClasses }, DataType::Float, L"labels");
auto trainingLoss = CrossEntropyWithSoftmax(classifierOutput, labels, L"lossFunction");
auto prediction = Minus(Constant::Scalar(1.0f, device), ClassificationError(classifierOutput, labels, L"classificationError"));
LearningRatePerMinibatchSchedule learningRatePerSample = 0.2;
auto trainer = CreateTrainer(classifierOutput, trainingLoss, prediction,
    { SGDLearner(classifierOutput->Parameters(), learningRatePerSample) }
);
std::cout << "Starting to train...\n";
size_t outputFrequencyInMinibatches = 500;
for (size_t i = 0; i < 10000; ++i)
{
    //input data
    std::vector<float> inputData(1);
    inputData[0] = ((float)rand()) / RAND_MAX;
    //output data
    std::vector<float> outputData(2);
    outputData[0] = inputData[0] > 0.5 ? 1.0 : 0.0;
    outputData[1] = 1.0 - outputData[0];
    ValuePtr inputSequence = CNTK::Value::CreateSequence(NDShape({ 1 }), inputData, device);
    ValuePtr outputSequence = CNTK::Value::CreateSequence(NDShape({ 2 }), outputData, device);
    std::unordered_map<Variable, ValuePtr> map = {{ input, inputSequence }, { labels, outputSequence } };
    trainer->TrainMinibatch(map, device);
}
The code compiles and runs, but the loss in the C++ version does not converge to 0, whereas in the Python version the loss is more or less 0 after a few hundred iterations.

It seems the input data in the Python version is either 0 or 1:
input_map[input_var] = np.array([np.random.randint(0,2)], dtype=np.float32);
while in the C++ code it is a float between 0 and 1:
//input data
std::vector<float> inputData(1);
inputData[0] = ((float)rand()) / RAND_MAX;
Please make the two versions generate the same kind of input and check whether the convergence speeds still differ.
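To make the difference concrete, here is a minimal sketch in plain Python (no CNTK) of the two sampling schemes. The function names are made up for illustration, and the label conventions simply mirror the two snippets in the question.

```python
import random

def discrete_sample(rng):
    """Input drawn like the Python script: exactly 0.0 or 1.0."""
    x = float(rng.randint(0, 1))  # random.randint includes both end points
    label = [1.0, 0.0] if x == 0 else [0.0, 1.0]
    return x, label

def continuous_sample(rng):
    """Input drawn like the C++ loop: a float between 0 and 1, thresholded at 0.5."""
    x = rng.random()
    first = 1.0 if x > 0.5 else 0.0
    return x, [first, 1.0 - first]

rng = random.Random(0)
discrete_inputs = {discrete_sample(rng)[0] for _ in range(1000)}
continuous_inputs = {continuous_sample(rng)[0] for _ in range(1000)}

# The first set can hold at most two distinct values; the second holds many.
print(len(discrete_inputs), len(continuous_inputs))
```

With the discrete scheme the network only ever sees two well-separated points, while the continuous scheme also produces inputs arbitrarily close to the 0.5 decision boundary, so the two training runs are not solving the same problem.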

Related

Image classification in Python correct but not Android Studio

I am currently deploying a TensorFlow model in Android Studio using TensorFlow Lite. I have already checked the model with the TensorFlow interpreter, and it gives correct results in Python. The problem is that when I feed it an image from Android Studio, it gives the wrong classification. Here is the code I use to predict in Python.
image1 =cv2.imread("image")
image_fromarray = Image.fromarray(image1,'RGB')
resize_image = image_fromarray.resize((100, 100))
expand_input = np.expand_dims(resize_image,axis=0)
input_data = np.array(expand_input)
input_data = input_data/255
pred = loaded_model.predict(input_data)
result = pred.argmax()
result
And here is the Android Studio code that takes the image from the ImageView and predicts.
public void onClick(View view) {
if (img == null) {
Toast.makeText(MainActivity.this, "No image selected", Toast.LENGTH_SHORT).show();
return;
}
try {
//resize image (100,100)
img = Bitmap.createScaledBitmap(img, imgsize, imgsize, false);
// Get pixels from the bitmap
int[] intValues = new int[imgsize *imgsize];
img.getPixels(intValues, 0, img.getWidth(), 0, 0, img.getWidth(), img.getHeight());
// Convert pixels to float values
float[] floatValues = new float[intValues.length * 3];
for (int i = 0; i < intValues.length; i++) {
final int val = intValues[i];
floatValues[i * 3] = ((val >> 16) & 0xFF) / 255.f;
floatValues[i * 3 + 1] = ((val >> 8) & 0xFF) / 255.f;
floatValues[i * 3 + 2] = (val & 0xFF) / 255.f;
}
TensorBuffer inputBuffer = TensorBuffer.createFixedSize(new int[]{1, imgsize, imgsize, 3}, DataType.FLOAT32);
inputBuffer.loadArray(floatValues);
Model2 model = Model2.newInstance(getApplicationContext());
Model2.Outputs outputs = model.process(inputBuffer);
TensorBuffer outputFeature0 = outputs.getOutputFeature0AsTensorBuffer();
// Releases model resources if no longer used.
model.close();
float[] confidence = outputFeature0.getFloatArray();
int maxPos=-1;
float maxConfidence = -1;
for (int i = 0;i<confidence.length;i++)
{
if(confidence[i]>maxConfidence){
maxConfidence = confidence[i];
maxPos=i;
}
}
String[] classes = {"Ripe Braeburn", "Ripe Red Apple", "Ripe Red Delicious", "Rotten"};
tv.setText(classes[maxPos]);
The problem is that it gives the same single result for every picture I choose from the test set. How can I modify the Android Studio code so that it gives the same results as in Python?

How to get the output from YOLO model using tensorflow with C++ correctly?

I'm trying to write an inference program for a YOLO model in C++. I looked into darknet, but it needs a .cfg file to import the model structure (which is a bit too complicated for me), so I want to do the inference with TensorFlow.
(My model weights were converted from .hdf5 (used in Python) to .pb (used in C++).)
I've found some examples written in Python, and it seems they do some extra work around the inference step... Source
def yolo_eval(yolo_outputs,
              anchors,
              num_classes,
              image_shape,
              max_boxes=50,
              score_threshold=.6,
              iou_threshold=.5):
    """Evaluate YOLO model on given input and return filtered boxes."""
    num_layers = len(yolo_outputs)
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]] # default setting
    input_shape = K.shape(yolo_outputs[0])[1:3] * 32
    boxes = []
    box_scores = []
    for l in range(num_layers):
        _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l],
            anchors[anchor_mask[l]], num_classes, input_shape, image_shape)
        boxes.append(_boxes)
        box_scores.append(_box_scores)
    boxes = K.concatenate(boxes, axis=0)
    box_scores = K.concatenate(box_scores, axis=0)
    mask = box_scores >= score_threshold
    max_boxes_tensor = K.constant(max_boxes, dtype='int32')
    boxes_ = []
    scores_ = []
    classes_ = []
    for c in range(num_classes):
        # TODO: use keras backend instead of tf.
        class_boxes = tf.boolean_mask(boxes, mask[:, c])
        class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])
        nms_index = tf.image.non_max_suppression(
            class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold)
        class_boxes = K.gather(class_boxes, nms_index)
        class_box_scores = K.gather(class_box_scores, nms_index)
        classes = K.ones_like(class_box_scores, 'int32') * c
        boxes_.append(class_boxes)
        scores_.append(class_box_scores)
        classes_.append(classes)
    boxes_ = K.concatenate(boxes_, axis=0)
    scores_ = K.concatenate(scores_, axis=0)
    classes_ = K.concatenate(classes_, axis=0)
    return boxes_, scores_, classes_
I printed out the return values, and they look like this:
boxes-> Tensor("concat_11:0", shape=(?, 4), dtype=float32)
scores-> Tensor("concat_12:0", shape=(?,), dtype=float32)
classes-> Tensor("concat_13:0", shape=(?,), dtype=int32)
The original output of my YOLO model (.hdf5) is (I got this by printing model.output):
tf.Tensor 'conv2d_59_1/BiasAdd:0' shape=(?, ?, ?, 21) dtype=float32
tf.Tensor 'conv2d_67_1/BiasAdd:0' shape=(?, ?, ?, 21) dtype=float32
tf.Tensor 'conv2d_75_1/BiasAdd:0' shape=(?, ?, ?, 21) dtype=float32
And the inference part of the python code is
out_boxes, out_scores, out_classes = sess.run(
[boxes, scores, classes],
feed_dict={
yolo_model.input: image_data,
input_image_shape: [image.size[1], image.size[0]],
K.learning_phase(): 0
})
Compared to the Python version of the inference code, the C++ part is... (Reference)
int main()
{
string image = "test.jpg";
string graph = "yolo_weight.pb";
string labels = "coco.names";
int32 input_width = 416;
int32 input_height = 416;
float input_mean = 0;
float input_std = 255;
string input_layer = "input_1:0";
std::vector<std::string> output_layer = {"conv2d_59/BiasAdd:0", "conv2d_67/BiasAdd:0", "conv2d_75/BiasAdd:0" };
std::unique_ptr<tensorflow::Session> session;
string graph_path = tensorflow::io::JoinPath(root_dir, graph);
Status load_graph_status = LoadGraph(graph_path, &session);
std::vector<Tensor> resized_tensors;
string image_path = tensorflow::io::JoinPath(root_dir, image);
Status read_tensor_status = ReadTensorFromImageFile(image_path, input_height, input_width,
input_mean, input_std, &resized_tensors);
Tensor inpTensor = Tensor(DT_FLOAT, TensorShape({ 1, input_height, input_width, 3 }));
std::vector<Tensor> outputs;
cv::Mat srcImage = cv::imread(image);
cv::resize(srcImage, srcImage, cv::Size(input_width, input_height));
srcImage.convertTo(srcImage, CV_32FC3);
srcImage = srcImage / 255;
string ty = type2str(srcImage.type());
float *p = (&inpTensor)->flat<float>().data();
cv::Mat tensorMat(input_height, input_width, CV_32FC3, p);
srcImage.convertTo(tensorMat, CV_32FC3);
Status run_status = session->Run({{ input_layer, inpTensor }}, { output_layer }, {}, &outputs);
int cc = 1;
auto output_detection_class = outputs[0].tensor<float, 4>();
std::cout << "detection scores" << std::endl;
std::cout << "typeid(output_detection_scoreclass).name->" << typeid(output_detection_class).name() << std::endl;
for (int i = 0; i < 13; ++i)
{
for (int j = 0; j < 13; ++j)
{
for (int k = 0; k < 21; ++k)
{
// using (index_1, index_2, index_3) to access the element in a tensor
printf("i->%d, j->%d, k->%d\t", i, j, k);
std::cout << output_detection_class(1, i, j, k) << "\t";
cc += 1;
if (cc % 4 == 0)
{
std::cout << "\n";
}
}
}
std::cout << std::endl;
}
return 0;
}
The output of c++ version inference part is
outputs.size()-> 3
outputs[0].shape()-> [1,13,13,21]
outputs[1].shape()-> [1,26,26,21]
outputs[2].shape()-> [1,52,52,21]
But the output I get is pretty weird...
(The values in outputs[0] don't look like scores, classes, or coordinates.)
So I'm wondering: is it because I'm missing the processing that the Python code does around inference, or am I reading my output data the wrong way?
I've checked some related questions and answers:
1. Yolo v3 model output clarification with keras
2. Convert YoloV3 output to coordinates of bounding box, label and confidence
3. How to access tensorflow::Tensor C++
But I still can't figure out how to make it work :(
I also found a repo which might be helpful.
I've taken a look at its yolo.cpp, but its model output tensor has a different shape from mine, so I'm not sure whether I can adapt the code directly. Its output tensor is
tf.Tensor 'import/output:0' shape=(?, 735) dtype = float32
Any help or advice is appreciated...

In case you're still struggling with this: I don't see where you apply the sigmoid and exp functions to the output layer values.
You might look at this article, which describes how to handle the output.
https://medium.com/analytics-vidhya/yolo-v3-theory-explained-33100f6d193
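As a rough illustration of that post-processing, here is a minimal Python sketch of the usual YOLOv3 decoding for one anchor in one grid cell. It assumes each of the 21 output channels splits into 3 anchors of 7 values (4 box values, 1 objectness value, 2 class logits); that split is a guess from the tensor shape, not something stated in the question. The function name and the anchor size (116, 90) are illustrative only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_cell(raw, col, row, grid_w, grid_h, anchor_w, anchor_h, input_w, input_h):
    """Decode one anchor's raw values [tx, ty, tw, th, to, class logits...]."""
    tx, ty, tw, th, to = raw[:5]
    # Box centre: sigmoid keeps the offset inside the cell, then normalise by grid size.
    cx = (sigmoid(tx) + col) / grid_w
    cy = (sigmoid(ty) + row) / grid_h
    # Box size: exp scales the anchor, then normalise by the network input size.
    w = anchor_w * math.exp(tw) / input_w
    h = anchor_h * math.exp(th) / input_h
    # Per-class score is objectness times class probability.
    objectness = sigmoid(to)
    class_scores = [objectness * sigmoid(c) for c in raw[5:]]
    return (cx, cy, w, h), class_scores

# Hypothetical raw values for the centre cell of the 13x13 output.
box, scores = decode_cell([0.0] * 7, col=6, row=6, grid_w=13, grid_h=13,
                          anchor_w=116, anchor_h=90, input_w=416, input_h=416)
print(box, scores)
```

The boxes are relative to the network input here; score thresholding and non-max suppression (as in yolo_eval above) would still have to follow.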

As Bryan said, some further processing still has to be applied to the output layer.
So in my case (following this repo), I added this method to the YOLO class (in yolo.py) so that the post-processing is included when the model is saved:
def output_pb(self, out_dir, out_pb):
    out_bx = self.boxes.name.split(":")[0]
    out_sc = self.scores.name.split(":")[0]
    out_cs = self.classes.name.split(":")[0]
    print(out_bx, out_sc, out_cs)
    frozen_graph = tf.graph_util.remove_training_nodes(tf.graph_util.convert_variables_to_constants(self.sess, self.sess.graph.as_graph_def(), [out_bx, out_sc, out_cs]))
    tf.io.write_graph(frozen_graph, out_dir, out_pb, as_text=False)
    print("===== FINISH saving new pb file =====")
When saving the model, I call the function like this:
yolo = YOLO(**config)
yolo.output_pb(output_dir, output_pb_name)
And when doing inference in C++, the whole process goes like this:
// initialize model
YOLO* YOLO_data = (YOLO*)Init_DllODM_object(config);
// do some stuff to set data in YOLO_data
cv::Mat input_pic = cv::imread("whatever_pic.png");
predict(YOLO_data, input_pic, YOLO_data ->bbox_res, YOLO_data ->score_res, YOLO_data ->class_res);
// draw result on pic
cv::Mat res = show_result(YOLO_data, input_pic);
Detailed code is here:
// yolo_cpp.h
struct YOLO
{
float score_thres;
std::vector<int> class_res;
std::vector<float> bbox_res, score_res;
std::string inp_tensor_name;
std::string placeholder_name;
std::vector<std::string> out_tensors;
Session* session;
Tensor t, inpTensor;
std::vector<tensorflow::Tensor> outTensor;
std::vector<int> MD_size;
std::vector<int> inp_pic_size;
std::vector<std::string> md_class_list;
std::vector<cv::Scalar> color_list;
int show_score;
int score_type;
int return_origin;
};
// yolo_cpp.cpp
void* Init_DllODM_object(json config)
{
std::string model_path = config["model"].get<std::string>();
YOLO* YOLO_data = new YOLO();
auto options = tensorflow::SessionOptions();
GraphDef graphdef;
// loading model to graph
Status status_load = ReadBinaryProto(Env::Default(), model_path, &graphdef);
options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0.7);
options.config.mutable_gpu_options()->set_allow_growth(true);
int node_count = graphdef.node_size();
for (int i = 0; i < node_count; i++)
{
auto n = graphdef.node(i);
if (n.name().find("input_") != string::npos)
{
YOLO_data->inp_tensor_name = n.name();
}
else if (n.name().find("Placeholder_") != string::npos)
{
YOLO_data->placeholder_name = n.name();
}
else if (i == node_count - 5)
{
YOLO_data->out_tensors.push_back(n.name());
}
else if (i == node_count - 3)
{
YOLO_data->out_tensors.push_back(n.name());
}
else if (i == node_count - 1)
{
YOLO_data->out_tensors.push_back(n.name());
}
}
if (!status_load.ok()) {
std::cout << "ERROR: Loading model failed..." << std::endl;
std::cout << model_path << status_load.ToString() << "\n";
}
std::vector<int> MD_size_ = config["input_size"];
YOLO_data->MD_size = MD_size_;
std::vector<int> inp_pic_size_ = config["input_pic_size"];
YOLO_data->inp_pic_size = inp_pic_size_;
YOLO_data->inpTensor = Tensor(DT_FLOAT, TensorShape({ 1, YOLO_data->MD_size[0], YOLO_data->MD_size[1], 3 })); // input tensor
YOLO_data->t = Tensor(DT_FLOAT, TensorShape({ 2 }));
//ref: https://stackoverflow.com/questions/36804714/define-a-feed-dict-in-c-for-tensorflow-models
auto t_matrix = YOLO_data->t.tensor<float, 1>();
t_matrix(0) = YOLO_data->inp_pic_size[0];
t_matrix(1) = YOLO_data->inp_pic_size[1];
// create session
Status status_newsess = NewSession(options, &YOLO_data->session); //for the usage of gpu setting
Status status_create = YOLO_data->session->Create(graphdef);
if (!status_create.ok()) {
std::cout << "ERROR: Creating graph in session failed.." << status_create.ToString() << std::endl;
}
else {
std::cout << "----------- Successfully created session and load graph -------------" << std::endl;
}
return YOLO_data;
}
int predict(YOLO* YOLO_, cv::Mat srcImage, std::vector<float>& bbox_res, std::vector<float>& score_res, std::vector<int>& class_res)
{
// read image -> input image
if (srcImage.empty()) // check if image can open correctly
{
std::cout << "can't open the image!!!!!!!" << std::endl;
int res = -1;
return res;
}
// ref: https://ppt.cc/f7ERNx
std::vector<std::pair<string, tensorflow::Tensor>> inputs = {
{ YOLO_->inp_tensor_name, YOLO_->inpTensor },
{ YOLO_->placeholder_name, YOLO_->t },
};
srcImage = letterbox_image(srcImage, YOLO_->MD_size[0], YOLO_->MD_size[1]);
convertCVMatToTensor(YOLO_, srcImage);
Status status_run = YOLO_->session->Run({ inputs }, { YOLO_->out_tensors }, {}, &YOLO_->outTensor);
if (!status_run.ok()) {
std::cout << "ERROR: RUN failed..." << std::endl;
std::cout << status_run.ToString() << "\n";
int res = -1;
return res;
}
TTypes<float>::Flat pp1 = YOLO_->outTensor[0].flat<float>();
TTypes<float>::Flat pp2 = YOLO_->outTensor[1].flat<float>();
TTypes<int>::Flat pp3 = YOLO_->outTensor[2].flat<int>();
int pp1_idx;
for (int i = 0; i < pp2.size(); i++)
{
pp1_idx = i * 4;
bbox_res.push_back(pp1(pp1_idx));
bbox_res.push_back(pp1(pp1_idx + 1));
bbox_res.push_back(pp1(pp1_idx + 2));
bbox_res.push_back(pp1(pp1_idx + 3));
score_res.push_back(pp2(i));
class_res.push_back(pp3(i));
}
return 0;
}
cv::Mat show_result(YOLO* inf_obj, cv::Mat inp_pic)
{
int bbox_idx;
std::string plot_str;
bool under_thresh = false;
std::vector<int> del_idx;
for (int i = 0; i < inf_obj->class_res.size(); i++)
{
int y_min, y_max, x_min, x_max;
bbox_idx = i * 4;
y_min = std::max(0, (int)floor(inf_obj->bbox_res[bbox_idx] + 0.5));
x_min = std::max(0, (int)floor(inf_obj->bbox_res[bbox_idx + 1] + 0.5));
y_max = std::max(0, (int)floor(inf_obj->bbox_res[bbox_idx + 2] + 0.5));
x_max = std::max(0, (int)floor(inf_obj->bbox_res[bbox_idx + 3] + 0.5));
//std::cout << md_class_list[class_res[i]] << ", ";
//std::cout << score_res[i] << ",";
//std::cout << "[" << x_min << ", " << y_min << ", " << x_max << ", " << y_max << "]\n";
if (inf_obj->show_score)
{
if (inf_obj->score_type)
plot_str = inf_obj->md_class_list[inf_obj->class_res[i]] + ", " + std::to_string(rounding(inf_obj->score_res[i] * 100, 2)).substr(0, 5) + "%";
else
plot_str = inf_obj->md_class_list[inf_obj->class_res[i]] + ", " + std::to_string(rounding(inf_obj->score_res[i], 2)).substr(0, 4);
}
else
plot_str = inf_obj->md_class_list[inf_obj->class_res[i]];
if (inf_obj->score_res[i] >= inf_obj->score_thres)
{
inp_pic = plot_one_box(inp_pic, x_min, y_min, x_max, y_max, plot_str, inf_obj->color_list[inf_obj->class_res[i]]);
}
else
{
//std::cout << "score_res[i]->" << score_res[i] << "under thresh!!" << std::endl;
under_thresh = true;
del_idx.push_back(i);
}
}
if (under_thresh)
{
//std::cout << "*** deleting element" << std::endl;
for (int x = 0; x < del_idx.size(); x++)
{
bbox_idx = (del_idx[x] - x) * 4;
inf_obj->bbox_res.erase(inf_obj->bbox_res.begin() + bbox_idx + 3);
inf_obj->bbox_res.erase(inf_obj->bbox_res.begin() + bbox_idx + 2);
inf_obj->bbox_res.erase(inf_obj->bbox_res.begin() + bbox_idx + 1);
inf_obj->bbox_res.erase(inf_obj->bbox_res.begin() + bbox_idx);
inf_obj->score_res.erase(inf_obj->score_res.begin() + del_idx[x] - x);
inf_obj->class_res.erase(inf_obj->class_res.begin() + del_idx[x] - x);
}
del_idx.clear();
}
return inp_pic;
}
Since my code is built as a DLL, I arranged it this way.
There is still some redundant code I didn't delete,
but I think the whole process can be done with the code provided so far.
Hope this helps :D

Tensorflow frozen graph protobuf does not predict using c api

I have trained a model for semantic segmentation using this repo, got good results, and tried to use the net in a small library written with the TensorFlow C API. I converted my Keras model into a protobuf file using this repo and run the session with this code:
typedef struct model_t {
TF_Graph* graph;
TF_Session* session;
TF_Status* status;
TF_Output input, target, output;
TF_Operation *init_op, *train_op, *save_op, *restore_op;
TF_Output checkpoint_file;
} model_t;
typedef struct NetProperties {
int width;
int height;
int border;
int classes;
int inputSize;
} NetProperties;
static model_t * model;
static NetProperties * properties;
extern "C" EXPORT int ModelCreate(const char* nnFilename, const char* inputName, const char* outputName, int pictureWidth, int pictureHeight, int border, int classes) {
ModelDestroy();
model = (model_t*)malloc(sizeof(model_t));
model->status = TF_NewStatus();
model->graph = TF_NewGraph();
properties = (NetProperties*)malloc(sizeof(NetProperties));
properties->width = pictureWidth;
properties->height = pictureHeight;
properties->border = border;
properties->classes = classes;
properties->inputSize = (pictureWidth + border * 2) * (pictureHeight + border * 2) * 3;
{
// Create the session.
TF_SessionOptions* opts = TF_NewSessionOptions();
model->session = TF_NewSession(model->graph, opts, model->status);
TF_DeleteSessionOptions(opts);
if (!Okay(model->status)) return 0;
}
TF_Graph* g = model->graph;
{
// Import the graph.
TF_Buffer* graph_def = read_file(nnFilename);
if (graph_def == NULL) return 0;
printf("Read GraphDef of %zu bytes\n", graph_def->length);
TF_ImportGraphDefOptions* opts = TF_NewImportGraphDefOptions();
TF_GraphImportGraphDef(g, graph_def, opts, model->status);
TF_DeleteImportGraphDefOptions(opts);
TF_DeleteBuffer(graph_def);
if (!Okay(model->status)) return 0;
}
// Handles to the interesting operations in the graph.
model->input.oper = TF_GraphOperationByName(g, inputName);
model->input.index = 0;
model->target.oper = TF_GraphOperationByName(g, "target");
model->target.index = 0;
model->output.oper = TF_GraphOperationByName(g, outputName);
model->output.index = 0;
model->init_op = TF_GraphOperationByName(g, "init");
model->train_op = TF_GraphOperationByName(g, "train");
model->save_op = TF_GraphOperationByName(g, "save/control_dependency");
model->restore_op = TF_GraphOperationByName(g, "save/restore_all");
model->checkpoint_file.oper = TF_GraphOperationByName(g, "save/Const");
model->checkpoint_file.index = 0;
// first prediction is slow
unsigned char * randomData = (unsigned char*)malloc(properties->inputSize * sizeof(unsigned char));
for (int i = 0; i < properties->inputSize; i++) {
randomData[i] = (unsigned char)100;
}
ModelPredict(randomData);
free(randomData);
return 1;
}
extern "C" EXPORT void ModelDestroy() {
if (model == nullptr) return;
TF_DeleteSession(model->session, model->status);
Okay(model->status);
TF_DeleteGraph(model->graph);
TF_DeleteStatus(model->status);
free(model);
}
extern "C" EXPORT unsigned char* ModelPredict(unsigned char * batch1) {
if (model == NULL) return NULL;
const int64_t dims[4] = { 1, properties->height + properties->border * 2, properties->width + properties->border * 2, 3 };
size_t nbytes = properties->inputSize;
// can be faster
float * arrayOfFloats = (float*)malloc(nbytes * sizeof(float));
//float sumUp = 0;
for (int i = 0; i < properties->inputSize; i++) {
arrayOfFloats[i] = batch1[i] * (1.f / 255.f);
//sumUp += arrayOfFloats[i];
}
//std::cout << sumUp << std::endl;
// removed due to jdehesa answer
//float ** inputFloats = (float**)malloc(nbytes * sizeof(float*));
//inputFloats[0] = arrayOfFloats;
// Optionally, you can check that your input_op and input tensors are correct
//// by using some of the functions provided by the C API.
//std::cout << "Input op info: " << TF_OperationNumOutputs(input_op) << "\n";
//std::cout << "Input data info: " << TF_Dim(input, 0) << "\n";
std::vector<TF_Output> inputs;
std::vector<TF_Tensor*> input_values;
TF_Operation* input_op = model->input.oper;
TF_Output input_opout = { input_op, 0 };
inputs.push_back(input_opout);
// reworked due to jdehesa answer
//TF_Tensor* input = TF_NewTensor(TF_FLOAT, dims, 4, (void*)inputFloats, //nbytes * sizeof(float), &Deallocator, NULL);
TF_Tensor* input = TF_NewTensor(TF_FLOAT, dims, 4, (void*)arrayOfFloats, nbytes * sizeof(float), &Deallocator, NULL);
input_values.push_back(input);
int outputSize = properties->width * properties->height * properties->classes;
int64_t out_dims[] = { 1, properties->height, properties->width, properties->classes };
// Create vector to store graph output operations
std::vector<TF_Output> outputs;
TF_Operation* output_op = model->output.oper;
TF_Output output_opout = { output_op, 0 };
outputs.push_back(output_opout);
// Create TF_Tensor* vector
//std::vector<TF_Tensor*> output_values(outputs.size(), nullptr);
// Similar to creating the input tensor, however here we don't yet have the
// output values, so we use TF_AllocateTensor()
TF_Tensor* output_value = TF_AllocateTensor(TF_FLOAT, out_dims, 4, outputSize * sizeof(float));
//output_values.push_back(output_value);
//// As with inputs, check the values for the output operation and output tensor
//std::cout << "Output: " << TF_OperationName(output_op) << "\n";
//std::cout << "Output info: " << TF_Dim(output_value, 0) << "\n";
TF_SessionRun(model->session, NULL,
&inputs[0], &input_values[0], inputs.size(),
&outputs[0], &output_value, outputs.size(),
/* No target operations to run */
NULL, 0, NULL, model->status);
if (!Okay(model->status)) return NULL;
TF_DeleteTensor(input_values[0]);
// memory allocations take place here
float* prediction = (float*)TF_TensorData(output_value);
//float* prediction = (float*)malloc(sizeof(float) * properties->inputSize / 3 * properties->classes);
//memcpy(prediction, TF_TensorData(output_value), sizeof(float) * properties->inputSize / 3 * properties->classes);
unsigned char * charPrediction = new unsigned char[outputSize * sizeof(unsigned char)];
//sumUp = 0;
for (int i = 0; i < outputSize; i++) {
charPrediction[i] = (unsigned char)((prediction[i] * 255));
//sumUp += prediction[i];
}
//std::cout << sumUp << std::endl << std::endl;
//free(prediction);
TF_DeleteTensor(output_value);
return charPrediction;
}
The problem is that the prediction result is always the same. I tried passing random data and real images, but the result was identical. Different trained models give different predictions, but for each model the result is always the same. As you can see in the code snippet, I checked that I pass different data and still get the same prediction every time:
// first is float sum of passed picture, second is the float sum of answer
724306
22982.6
692004
22982.6
718490
22982.6
692004
22982.6
720861
22982.6
692004
22982.6
I tried to write my own Keras to TensorFlow .pb converter, but the result was the same.
import os, argparse
import tensorflow as tf
from tensorflow.keras.utils import get_custom_objects
from segmentation_models.losses import bce_dice_loss,dice_loss,cce_dice_loss
from segmentation_models.metrics import iou_score
# some custom functions from segmentation_models
get_custom_objects().update({
'dice_loss': dice_loss,
'bce_dice_loss': bce_dice_loss,
'cce_dice_loss': cce_dice_loss,
'iou_score': iou_score,
})
def freeze_keras(model_name):
    tf.keras.backend.set_learning_phase(0)
    model = tf.keras.models.load_model(model_name)
    sess = tf.keras.backend.get_session()
    constant_graph = tf.graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), [out.op.name for out in model.outputs])
    tf.train.write_graph(constant_graph, './', 'saved_model.pb', as_text=False)
freeze_keras('best-weights.hdf5')
Please help me find out how to fix the prediction result in the C API.
UPDATE 1: Reworked input array as jdehesa suggested
UPDATE 2: Added definition of model and NetProperties

I think you are not setting the input data correctly. Let's see.
float * arrayOfFloats1 = (float*)malloc(nbytes * sizeof(float));
float sumUp = 0;
Here you create arrayOfFloats1 to hold all the image data.
for (int i = 0; i < properties->inputSize; i++) {
arrayOfFloats1[i] = batch1[i] * (1.f / 255.f);
sumUp += arrayOfFloats1[i];
}
std::cout << sumUp << std::endl;
Here you set arrayOfFloats1 to the image data. This is all fine.
But then:
float ** inputFloats = (float**)malloc(nbytes * sizeof(float*));
Here you have inputFloats, which has space for nbytes float pointers. First, you probably would want to allocate space for float values, not float pointers (which probably do not have the same size). And then:
inputFloats[0] = arrayOfFloats1;
Here you are setting the first of those nbytes pointers to the pointer arrayOfFloats1. And then inputFloats is used as input to the model. But the remaining nbytes - 1 pointers have not been set to anything. Although not required, they are probably set all to zero.
If you just want to make an "array of arrays of floats" with arrayOfFloats1 you don't need to allocate any memory, you can simply do:
float ** inputFloats = &arrayOfFloats1;
But then you actually use inputFloats like this:
TF_Tensor* input = TF_NewTensor(
TF_FLOAT, dims, 4, (void*)inputFloats, nbytes * sizeof(float), &Deallocator, NULL);
So here you are saying that input is made up of the data in inputFloats, which will be a pointer to arrayOfFloats1 and then uninitialized memory. Probably you actually want something like:
TF_Tensor* input = TF_NewTensor(
TF_FLOAT, dims, 4, (void*)arrayOfFloats1, nbytes * sizeof(float), &Deallocator, NULL);
Which means input will be a tensor made up of the data in arrayOfFloats1 that you copied before. In fact, I don't think your code needs inputFloats at all.
Otherwise, from what I can tell the rest of the code seems correct. You should ensure that all allocated memory is properly freed in all cases (e.g. when you do if (!Okay(model->status)) return NULL; you should probably delete the input and output tensors before returning), but that is a different issue.

The issue was in the model. I had trained it on unnormalized image data (pixel values between 0.0 and 255.0) but ran inference on normalized data (I divided each pixel value by 255 with arrayOfFloats[i] = batch1[i] * (1.f / 255.f); and got values between 0.0 and 1.0), so the model thought it was getting black images every time and gave nearly identical answers. After I removed the normalization, the model started to predict correctly.
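The mismatch can be illustrated with a minimal Python sketch. The helper name, the TRAIN_SCALE constant, and the sample pixel values are all made up for illustration; the only point is that inference must use the same scaling as training.

```python
def preprocess(pixels, scale):
    """Convert 8-bit pixel values to floats, multiplying each by `scale`."""
    return [p * scale for p in pixels]

TRAIN_SCALE = 1.0            # assumption: the model was trained on raw 0..255 values
pixels = [0, 64, 128, 255]   # made-up sample of 8-bit pixel values

matching = preprocess(pixels, TRAIN_SCALE)      # same range the model saw in training
mismatched = preprocess(pixels, 1.0 / 255.0)    # what the C code above was feeding

# Relative to a 0..255 training range, the mismatched input is almost all zeros.
print(max(matching), max(mismatched))
```

Keeping the scale factor in one shared constant (or storing it alongside the exported model) is one way to avoid the training and inference code drifting apart.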

Java to Python Code Not Working

I am trying to convert this Java code to Python, and this is what I have so far. The Java code works, but the Python code doesn't. Please help me.
Python Code
import random
class QLearning():
    alpha = 0.1
    gamma = 0.9
    state_a = 0
    state_b = 1
    state_c = 2
    state_d = 3
    state_e = 4
    state_f = 5
    states_count = 6
    states = [state_a, state_b, state_c, state_d, state_e, state_f]
    R = [[0 for x in range(states_count)] for x in range(states_count)]
    Q = [[0 for x in range(states_count)] for x in range(states_count)]
    action_from_a = [state_b, state_d]
    action_from_b = [state_a, state_c, state_e]
    action_from_c = [state_c]
    action_from_d = [state_a, state_e]
    action_from_e = [state_b, state_d, state_f]
    action_from_f = [state_c, state_e]
    actions = [action_from_a, action_from_b, action_from_c, action_from_d, action_from_e, action_from_f]
    state_names = ["A","B","C","D","E","F"]
    def __init__(self):
        self.R[self.state_b][self.state_c] = 100
        self.R[self.state_f][self.state_c] = 100
    def run(self):
        for i in range(1000):
            state = random.randrange(self.states_count)
            while(state != self.state_c):
                actions_from_state = self.actions[state]
                index = random.randrange(len(actions_from_state))
                action = actions_from_state[index]
                next_state = action
                q = self.Q_Value(state, action)
                max_Q = self.max_q(next_state)
                r = self.R_Value(state, action)
                value = q + self.alpha * (r + self.gamma * max_Q - q)
                self.set_q(state, action, value)
                state = next_state
    def max_q(self, s):
        self.run().actions_from_state = self.actions[s]
        max_value = 5
        for i in range(len(self.run().actions_from_state)):
            self.run().next_state = self.run().actions_from_state[i]
            self.run().value = self.Q[s][self.run().next_state]
            if self.run().value > max_value:
                max_value = self.run().value
        return max_value
    def policy(self, state):
        self.run().actions_from_state = self.actions[state]
        max_value = 5
        policy_goto_state = state
        for i in range(len(self.run().actions_from_state)):
            self.run().next_state = self.run().actions_from_state[i]
            self.run().value = self.Q[state][self.run().next_state]
            if self.run().value > max_value:
                max_value = self.run().value
                policy_goto_state = self.run().next_state
        return policy_goto_state
    def Q_Value(self, s,a):
        return self.Q[s][a]
    def set_q(self, s, a, value):
        self.Q[s][a] = value
    def R_Value(self, s, a):
        return self.R[s][a]
    def print_result(self):
        print("Print Result")
        for i in range(len(self.Q)):
            print("Out From (0)".format(self.state_names[i]))
            for j in range(len(self.Q[i])):
                print(self.Q[i][j])
    def show_policy(self):
        print("Show Policy")
        for i in range(len(self.states)):
            fro = self.states[i]
            to = self.policy(fro)
            print("From {0} goto {1}".format(self.state_names[fro], self.state_names[to]))
obj = QLearning()
obj.run()
obj.print_result()
obj.show_policy()
Java Code
import java.text.DecimalFormat;
import java.util.Random;
public class Qlearning {
final DecimalFormat df = new DecimalFormat("#.##");
// path finding
final double alpha = 0.1;
final double gamma = 0.9;
// states A,B,C,D,E,F
// e.g. from A we can go to B or D
// from C we can only go to C
// C is goal state, reward 100 when B->C or F->C
//
// _______
// |A|B|C|
// |_____|
// |D|E|F|
// |_____|
//
final int stateA = 0;
final int stateB = 1;
final int stateC = 2;
final int stateD = 3;
final int stateE = 4;
final int stateF = 5;
final int statesCount = 6;
final int[] states = new int[]{stateA,stateB,stateC,stateD,stateE,stateF};
// http://en.wikipedia.org/wiki/Q-learning
// http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/Q-Learning.htm
// Q(s,a)= Q(s,a) + alpha * (R(s,a) + gamma * Max(next state, all actions) - Q(s,a))
int[][] R = new int[statesCount][statesCount]; // reward lookup
double[][] Q = new double[statesCount][statesCount]; // Q learning
int[] actionsFromA = new int[] { stateB, stateD };
int[] actionsFromB = new int[] { stateA, stateC, stateE };
int[] actionsFromC = new int[] { stateC };
int[] actionsFromD = new int[] { stateA, stateE };
int[] actionsFromE = new int[] { stateB, stateD, stateF };
int[] actionsFromF = new int[] { stateC, stateE };
int[][] actions = new int[][] { actionsFromA, actionsFromB, actionsFromC,
actionsFromD, actionsFromE, actionsFromF };
String[] stateNames = new String[] { "A", "B", "C", "D", "E", "F" };
public Qlearning() {
init();
}
public void init() {
R[stateB][stateC] = 100; // from b to c
R[stateF][stateC] = 100; // from f to c
}
public static void main(String[] args) {
long BEGIN = System.currentTimeMillis();
Qlearning obj = new Qlearning();
obj.run();
obj.printResult();
obj.showPolicy();
long END = System.currentTimeMillis();
System.out.println("Time: " + (END - BEGIN) / 1000.0 + " sec.");
}
void run() {
/*
1. Set parameter , and environment reward matrix R
2. Initialize matrix Q as zero matrix
3. For each episode: Select random initial state
Do while not reach goal state o
Select one among all possible actions for the current state o
Using this possible action, consider to go to the next state o
Get maximum Q value of this next state based on all possible actions o
Compute o Set the next state as the current state
*/
// For each episode
Random rand = new Random();
for (int i = 0; i < 1000; i++) { // train episodes
// Select random initial state
int state = rand.nextInt(statesCount);
while (state != stateC) // goal state
{
// Select one among all possible actions for the current state
int[] actionsFromState = actions[state];
// Selection strategy is random in this example
int index = rand.nextInt(actionsFromState.length);
int action = actionsFromState[index];
// Action outcome is set to deterministic in this example
// Transition probability is 1
int nextState = action; // data structure
// Using this possible action, consider to go to the next state
double q = Q(state, action);
double maxQ = maxQ(nextState);
int r = R(state, action);
double value = q + alpha * (r + gamma * maxQ - q);
setQ(state, action, value);
// Set the next state as the current state
state = nextState;
}
}
}
double maxQ(int s) {
int[] actionsFromState = actions[s];
double maxValue = Double.MIN_VALUE;
for (int i = 0; i < actionsFromState.length; i++) {
int nextState = actionsFromState[i];
double value = Q[s][nextState];
if (value > maxValue)
maxValue = value;
}
return maxValue;
}
// get policy from state
int policy(int state) {
int[] actionsFromState = actions[state];
double maxValue = Double.MIN_VALUE;
int policyGotoState = state; // default goto self if not found
for (int i = 0; i < actionsFromState.length; i++) {
int nextState = actionsFromState[i];
double value = Q[state][nextState];
if (value > maxValue) {
maxValue = value;
policyGotoState = nextState;
}
}
return policyGotoState;
}
double Q(int s, int a) {
return Q[s][a];
}
void setQ(int s, int a, double value) {
Q[s][a] = value;
}
int R(int s, int a) {
return R[s][a];
}
void printResult() {
System.out.println("Print result");
for (int i = 0; i < Q.length; i++) {
System.out.print("out from " + stateNames[i] + ": ");
for (int j = 0; j < Q[i].length; j++) {
System.out.print(df.format(Q[i][j]) + " ");
}
System.out.println();
}
}
// policy is maxQ(states)
void showPolicy() {
System.out.println("\nshowPolicy");
for (int i = 0; i < states.length; i++) {
int from = states[i];
int to = policy(from);
System.out.println("from "+stateNames[from]+" goto "+stateNames[to]);
}
}
}
Traceback
C:\Python33\python.exe "C:/Users/Ajay/Documents/Python Scripts/RL/QLearning.py"
Traceback (most recent call last):
File "C:/Users/Ajay/Documents/Python Scripts/RL/QLearning.py", line 4, in <module>
class QLearning():
File "C:/Users/Ajay/Documents/Python Scripts/RL/QLearning.py", line 19, in QLearning
R = [[0 for x in range(states_count)] for x in range(states_count)]
File "C:/Users/Ajay/Documents/Python Scripts/RL/QLearning.py", line 19, in <listcomp>
R = [[0 for x in range(states_count)] for x in range(states_count)]
NameError: global name 'states_count' is not defined

To access all of the class attributes you define (i.e. everything between class QLearning and def __init__), you need to use self or the class name:
self.states_count
or
QLearning.states_count
I don't know the algorithm, but it is possible that these class attributes should be instance attributes (i.e. separate for each instance of the class, rather than shared amongst all instances) and therefore defined in __init__ (or other instance methods) using self anyway.
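A minimal sketch of that second suggestion, building the tables per instance in __init__. It also sidesteps the line the traceback points at: in Python 3 a list comprehension has its own scope and cannot see names defined in the enclosing class body, and neither self nor QLearning exists yet while the class body is executing. The class below is a trimmed-down illustration, not the full port.

```python
class QLearning:
    # Plain constants are fine as class attributes.
    states_count = 6
    state_b, state_c, state_f = 1, 2, 5

    def __init__(self):
        n = self.states_count  # class attributes are reachable through self
        # Build the tables per instance, so they are not shared between objects.
        self.R = [[0] * n for _ in range(n)]
        self.Q = [[0.0] * n for _ in range(n)]
        self.R[self.state_b][self.state_c] = 100
        self.R[self.state_f][self.state_c] = 100


first = QLearning()
second = QLearning()
first.Q[0][1] = 1.5  # only changes the table owned by `first`
```

Keeping R and Q on the instance also means a second QLearning object starts from a fresh Q table instead of inheriting values learned by the first.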

Segmentation fault in if clause using gcc/ubuntu

I am writing a C function for use in Python. When it runs, a segmentation fault occurs which, judging by the printf output, happens at an if clause. The output to the shell is:
row 1, col 0: 1.000000
row:0, -col:0, index:0
-2: 0.000000
else
row:0, -col:1, index:1
-2: 0.000000
else
row:0, -col:2, index:2
-2: 0.000000
else
row:0, -col:3, index:3
-2: 0.000000
else
row:0, -col:4, index:4
-2: 0.000000
else
row:1, -col:0, index:5
-2: 1.000000
Speicherzugriffsfehler (Speicherabzug geschrieben)
(The last line is German for "segmentation fault (core dumped)".)
The C code is:
#include <stdio.h>
#include <math.h>
#define pi 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679
void hough(const void* img, int imgRowCount, int imgColCount, const void* thetas, int thetaCount, const void* rhos, int rhoCount, void* out)
{
    const double* imgD = (double*) img;
    double* outD = (double*)out;
    const double* thetasD = (double*)thetas;
    const double* rhosD = (double*)rhos;
    printf("row 1, col 0: %f\n", imgD[getIndex(1, 0, imgColCount)]);
    int row, col, thetaInd, rhoInd, index;
    double rhoVal, minDiff, diff, tmp;
    for(row = 0; row<imgRowCount; row++)
    {
        for(col = 0; col<imgColCount; col++)
        {
            printf("row:%d, -col:%d, index:%d\n", row, col, getIndex(row, col, imgColCount));
            tmp = imgD[getIndex(row, col, imgColCount)];
            printf("-2: %f\n", tmp);
            if (tmp>0.0)
            {
                printf("-1");
                for(thetaInd = 0; thetaInd<thetaCount; thetaInd++)
                {
                    rhoVal = col*cos(thetasD[thetaInd]*(pi/180)) + row*sin(thetasD[thetaInd]*(pi/180));
                    minDiff = INFINITY;
                    index = -1;
                    for(rhoInd = 0; rhoInd<rhoCount; rhoInd++)
                    {
                        diff = abs(rhoVal-rhosD[rhoInd]);
                        if(diff<minDiff)
                        {
                            minDiff = diff;
                            index = rhoInd;
                        }
                    }
                    if(index>=0)
                    {
                        printf("1\n");
                        outD[getIndex(index, thetaInd, thetaCount)] += 1;
                    }
                }
            }
            else
            {
                printf("else\n");
            }
        }
    }
}
int getIndex(int row, int col, int maxCol)
{
    return col + row*maxCol;
}
And finally, the Python code being used:
import numpy as np
import ctypes
from scipy.misc import imread
def makeReady(arr):
    return np.require(arr, dtype=np.double, requirements=["C_CONTIGUOUS"])
def hough(imgBin, thetaRes=1, rhoRes=1):
    if len(imgBin.shape) > 2:
        imgBin = np.mean(imgBin, axis=2)
    if imgBin.max() > 1:
        imgBin /= imgBin.max()
    if ((imgBin!=0) * (imgBin!=1)).sum()>0:
        imgBin = imgBin > (imgBin.max()/2.0)
    nR,nC = imgBin.shape
    theta = np.linspace(-90.0, 90.0, np.ceil(180.0/thetaRes) + 1.0)
    D = np.sqrt((nR - 1)**2 + (nC - 1)**2)
    q = np.ceil(D/rhoRes)
    nrho = 2*q + 1
    rho = np.linspace(-q*rhoRes, q*rhoRes, nrho)
    H = np.zeros((len(rho), len(theta)))
    imgC = makeReady(imgBin)
    thetasC = makeReady(theta)
    rhosC = makeReady(rho)
    outC = makeReady(H)
    lib = ctypes.cdll.LoadLibrary("./hough.so")
    lib.hough(imgC.ctypes.data_as(ctypes.c_void_p), imgC.shape[0], imgC.shape[1], thetasC.ctypes.data_as(ctypes.c_void_p), len(thetasC), rhosC.ctypes.data_as(ctypes.c_void_p),outC.ctypes.data_as(ctypes.c_void_p))
if __name__ == "__main__":
    img = 1 - (imread("lines.jpeg"))>125
    print img.shape
    a = np.zeros((5,5))
    a[1,0] = 5
    hough(a)
What am I doing wrong?
Thank you.

The only thing that looks like it could cause that error is going out of bounds on an array. One of the getIndex(...) calls used inside [] could be producing an index outside the array.
However, because the code is hard to read (no comments and no context), I recommend running it under a memory-debugging tool such as valgrind to locate the error. Valgrind will even print the line number where the error occurs, provided you compile with debug symbols (-g -O0 on gcc and clang).

From the output the error seems to happen in this part of the code:
for(thetaInd = 0; thetaInd<thetaCount; thetaInd++)
{
    rhoVal = col*cos(thetasD[thetaInd]*(pi/180)) + row*sin(thetasD[thetaInd]*(pi/180));
    minDiff = INFINITY;
    index = -1;
    for(rhoInd = 0; rhoInd<rhoCount; rhoInd++)
    {
        diff = abs(rhoVal-rhosD[rhoInd]);
        if(diff<minDiff)
        {
            minDiff = diff;
            index = rhoInd;
        }
    }
    if(index>=0)
    {
        printf("1\n");
        outD[getIndex(index, thetaInd, thetaCount)] += 1;
    }
}
A segmentation violation here can only be caused by accessing one of the three arrays (thetasD, rhosD and outD) out of bounds.
That can only happen if the indices run too far, which in turn can only happen if the for loops' termination conditions are wrong, which means the wrong values were passed to hough.
That indeed seems to be the case: the Python script does not pass the size of rho, so the outC pointer ends up in the rhoCount parameter and nothing is passed for out.
This line:
lib.hough(imgC.ctypes.data_as(ctypes.c_void_p), imgC.shape[0], imgC.shape[1],
thetasC.ctypes.data_as(ctypes.c_void_p), len(thetasC),
rhosC.ctypes.data_as(ctypes.c_void_p),
outC.ctypes.data_as(ctypes.c_void_p))
should look like:
lib.hough(imgC.ctypes.data_as(ctypes.c_void_p), imgC.shape[0], imgC.shape[1],
thetasC.ctypes.data_as(ctypes.c_void_p), len(thetasC),
rhosC.ctypes.data_as(ctypes.c_void_p), len(rhosC),
outC.ctypes.data_as(ctypes.c_void_p))
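A mismatch like this is easier to catch when the argument types are declared, because ctypes then rejects a call with too few arguments instead of handing garbage to the C function. For the library above, that would mean assigning lib.hough.argtypes before the call. The sketch below is a stand-in rather than the hough code: it wraps a hypothetical Python callback, py_sum, in a CFUNCTYPE prototype so that it runs without a compiled library, and only illustrates the argument-count check.

```python
import ctypes

# Prototype for a C-style function: double f(const double* data, int count).
# With a real shared library, the equivalent step is assigning `restype`
# and `argtypes` on the function object returned by the loader.
SUM_PROTO = ctypes.CFUNCTYPE(
    ctypes.c_double, ctypes.POINTER(ctypes.c_double), ctypes.c_int
)


def py_sum(data, count):
    # Hypothetical stand-in for the C implementation.
    return sum(data[i] for i in range(count))


c_sum = SUM_PROTO(py_sum)
values = (ctypes.c_double * 3)(1.0, 2.0, 3.0)

total = c_sum(values, len(values))  # pointer and length are both passed

try:
    c_sum(values)  # length forgotten, as in the failing call
    missing_arg_detected = False
except TypeError:
    # ctypes compares the call against the declared argument types.
    missing_arg_detected = True
```

Without declared argument types, ctypes has nothing to check the call against, which is why the original call went through and crashed inside the C code.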
