Image classification correct in Python but not in Android Studio - python

I am currently deploying a TensorFlow model in Android Studio using TensorFlow Lite. I have already checked the model with the TensorFlow interpreter, and it gives the correct result in Python. The problem is that when I feed an image from Android Studio, it gives the wrong classification. Here is the code I use to predict in Python:
image1 =cv2.imread("image")
image_fromarray = Image.fromarray(image1,'RGB')
resize_image = image_fromarray.resize((100, 100))
expand_input = np.expand_dims(resize_image,axis=0)
input_data = np.array(expand_input)
input_data = input_data/255
pred = loaded_model.predict(input_data)
result = pred.argmax()
And here is the Android Studio code that gets the image from the ImageView and predicts:
public void onClick(View view) {
    if (img == null) {
        Toast.makeText(MainActivity.this, "No image selected", Toast.LENGTH_SHORT).show();
        return;
    }
    try {
        // Resize the image to (100, 100)
        img = Bitmap.createScaledBitmap(img, imgsize, imgsize, false);
        // Get pixels from the bitmap (packed ARGB ints)
        int[] intValues = new int[imgsize * imgsize];
        img.getPixels(intValues, 0, img.getWidth(), 0, 0, img.getWidth(), img.getHeight());
        // Convert pixels to float values in R, G, B order, scaled to [0, 1]
        float[] floatValues = new float[intValues.length * 3];
        for (int i = 0; i < intValues.length; i++) {
            final int val = intValues[i];
            floatValues[i * 3] = ((val >> 16) & 0xFF) / 255.f;
            floatValues[i * 3 + 1] = ((val >> 8) & 0xFF) / 255.f;
            floatValues[i * 3 + 2] = (val & 0xFF) / 255.f;
        }
        TensorBuffer inputBuffer = TensorBuffer.createFixedSize(new int[]{1, imgsize, imgsize, 3}, DataType.FLOAT32);
        Model2 model = Model2.newInstance(getApplicationContext());
        Model2.Outputs outputs = model.process(inputBuffer);
        TensorBuffer outputFeature0 = outputs.getOutputFeature0AsTensorBuffer();
        // Releases model resources if no longer used.
        model.close();
        float[] confidence = outputFeature0.getFloatArray();
        // Find the class with the highest confidence
        int maxPos = -1;
        float maxConfidence = -1;
        for (int i = 0; i < confidence.length; i++) {
            if (confidence[i] > maxConfidence) {
                maxConfidence = confidence[i];
                maxPos = i;
            }
        }
        String[] classes = {"Ripe Braeburn", "Ripe Red Apple", "Ripe Red Delicious", "Rotten"};
    } catch (IOException e) {
        // TODO: handle the exception
    }
}
The problem is that it gives the same single result for any picture I choose from the test set. How can I modify the Android Studio code so that it gives the same result as in Python?
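Two things stand out when comparing the pipelines. First, the Java snippet as posted never copies floatValues into inputBuffer (for example with TensorBuffer.loadArray); if that is also true of the real code, the model would receive the same untouched buffer for every image. Second, the channel order may differ: cv2.imread returns pixels in BGR order, and Image.fromarray(image1, 'RGB') only relabels the channels without reordering them, while the Android loop unpacks each pixel as R, G, B. The sketch below illustrates only the ordering issue, assuming the model was fed arrays produced exactly as in the Python snippet. The helper names unpack_argb and to_model_order are hypothetical and are not part of either code base.

```python
def unpack_argb(pixel):
    """Split a packed 0xAARRGGBB int (the layout Bitmap.getPixels returns) into (r, g, b)."""
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    return r, g, b


def to_model_order(pixel, bgr=True):
    """Return normalized channel values in the order the model receives them.

    bgr=True mirrors the Python pipeline (cv2.imread output passed through unchanged);
    bgr=False mirrors the Android loop as posted.
    """
    r, g, b = unpack_argb(pixel)
    channels = (b, g, r) if bgr else (r, g, b)
    return tuple(c / 255.0 for c in channels)


# A pure-red, fully opaque pixel.
red = 0xFFFF0000
android_order = to_model_order(red, bgr=False)  # what the Java loop produces
python_order = to_model_order(red, bgr=True)    # what the cv2-based pipeline produces
print(android_order, python_order)
```

If the two tuples differ for the same pixel, as they do here, swapping the first and third channel writes in the Java loop would be one thing to try.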


Input Image in tensorflow-lite C++

I am trying to move a Python+Keras model to Tensorflow Lite with C++ for an embedded platform.
I don't know how to pass the image data to the interpreter properly.
I have the following working python code:
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']
print("Input Shape ", input_shape)
image_a = plt.imread('image/0_0_0_copy.jpeg')
image_a = cv2.resize(image_a,(224,224))
image_a = np.asarray(image_a)/255
image_a = np.reshape(image_a,(1,224,224,3))
input_data = np.array(image_a, dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
# Run inference before reading the output tensor
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print("Output Data ", output_data)
The input shape for the image is (1, 224, 224, 3).
I need the equivalent C++ code. How do I translate this?
This is the C++ code I have so far:
int main(){
    // Load the model (same file name as in the Python code above)
    std::unique_ptr<tflite::FlatBufferModel> model =
        tflite::FlatBufferModel::BuildFromFile("model.tflite");
    if(!model){
        printf("Failed to map model\n");
        return 1;
    }
    tflite::ops::builtin::BuiltinOpResolver resolver;
    tflite::InterpreterBuilder builder(*model, resolver);
    std::unique_ptr<tflite::Interpreter> interpreter;
    if(builder(&interpreter) != kTfLiteOk || !interpreter){
        printf("Failed to construct interpreter\n");
        return 1;
    }
    if(interpreter->AllocateTensors() != kTfLiteOk){
        printf("Failed to allocate tensors\n");
        return 1;
    }
    LOG(INFO) << "tensors size: " << interpreter->tensors_size() << "\n";
    LOG(INFO) << "nodes size: " << interpreter->nodes_size() << "\n";
    LOG(INFO) << "inputs: " << interpreter->inputs().size() << "\n";
    LOG(INFO) << "input(0) name: " << interpreter->GetInputName(0) << "\n";
    float* input = interpreter->typed_input_tensor<float>(0);
    // Need help here
    if(interpreter->Invoke() != kTfLiteOk){
        printf("Failed to invoke interpreter\n");
        return 1;
    }
    float* output = interpreter->typed_output_tensor<float>(0);
    printf("output1 = %f\n", output[0]);
    printf("output2 = %f\n", output[1]);
    return 0;
}
I solved the problem in this way.
Build the interpreter as usual:
// Load model
std::unique_ptr<tflite::FlatBufferModel> model = tflite::FlatBufferModel::BuildFromFile(filename);
TFLITE_MINIMAL_CHECK(model != nullptr);
// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
InterpreterBuilder builder(*model, resolver);
std::unique_ptr<Interpreter> interpreter;
builder(&interpreter);
TFLITE_MINIMAL_CHECK(interpreter != nullptr);
TFLITE_MINIMAL_CHECK(interpreter->AllocateTensors() == kTfLiteOk);
To get the input shape:
const std::vector<int>& t_inputs = interpreter->inputs();
TfLiteTensor* tensor = interpreter->tensor(t_inputs[0]);
// number of input dimensions - for a CNN it is four: (batch_size, h, w, channels)
int input_size = tensor->dims->size;
int batch_size = tensor->dims->data[0];
int h = tensor->dims->data[1];
int w = tensor->dims->data[2];
int channels = tensor->dims->data[3];
This worked for me. I hope it will be good for you too.
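For the "Need help here" step in the question, the pointer returned by typed_input_tensor<float>(0) refers to one contiguous buffer of batch_size * h * w * channels floats in row-major (batch, height, width, channel) order. Below is a minimal Python sketch of that layout, mirroring the / 255 scaling from the question's Python code. The function name flatten_hwc and the tiny nested-list image are illustrative assumptions; the same index arithmetic would apply in a C++ loop that writes into input.

```python
def flatten_hwc(image, scale=255.0):
    """Flatten an image given as nested lists [h][w][c] into a row-major float buffer."""
    h = len(image)
    w = len(image[0])
    c = len(image[0][0])
    buf = [0.0] * (h * w * c)
    for y in range(h):
        for x in range(w):
            for ch in range(c):
                # Offset of pixel (y, x), channel ch, for a batch of one image
                buf[(y * w + x) * c + ch] = image[y][x][ch] / scale
    return buf


# A 2x2 image with 3 channels per pixel (values are arbitrary).
image = [
    [[0, 51, 102], [153, 204, 255]],
    [[255, 0, 0], [0, 255, 0]],
]
flat = flatten_hwc(image)
print(len(flat))
```

The channel order inside each pixel has to match what the model saw during training, so an image decoded as BGR may need its channels swapped before being written into the buffer.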

How to get the output from YOLO model using tensorflow with C++ correctly?

I'm trying to write an inference program for a YOLO model in C++. I looked into darknet, but it needs a .cfg file to import the model structure (which is a bit too complicated for me), so I want to write the program with TensorFlow instead.
(My model weights were converted from .hdf5 (used in Python) to .pb (used in C++).)
I found some examples written in Python, and they seem to do some extra work before the inference step. Source
def yolo_eval(yolo_outputs,
"""Evaluate YOLO model on given input and return filtered boxes."""
num_layers = len(yolo_outputs)
anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]] # default setting
input_shape = K.shape(yolo_outputs[0])[1:3] * 32
boxes = []
box_scores = []
for l in range(num_layers):
_boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l],
anchors[anchor_mask[l]], num_classes, input_shape, image_shape)
boxes = K.concatenate(boxes, axis=0)
box_scores = K.concatenate(box_scores, axis=0)
mask = box_scores >= score_threshold
max_boxes_tensor = K.constant(max_boxes, dtype='int32')
boxes_ = []
scores_ = []
classes_ = []
for c in range(num_classes):
# TODO: use keras backend instead of tf.
class_boxes = tf.boolean_mask(boxes, mask[:, c])
class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])
nms_index = tf.image.non_max_suppression(
class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold)
class_boxes = K.gather(class_boxes, nms_index)
class_box_scores = K.gather(class_box_scores, nms_index)
classes = K.ones_like(class_box_scores, 'int32') * c
boxes_ = K.concatenate(boxes_, axis=0)
scores_ = K.concatenate(scores_, axis=0)
classes_ = K.concatenate(classes_, axis=0)
return boxes_, scores_, classes_
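The per-class step in yolo_eval (threshold the scores, then run non-max suppression) can be sketched without TensorFlow. This is a plain-Python illustration of greedy NMS, assuming boxes are given as (y_min, x_min, y_max, x_max). It is not the implementation behind tf.image.non_max_suppression, and the helper names and default thresholds are made up for the example.

```python
def iou(a, b):
    """Intersection over union of two (y_min, x_min, y_max, x_max) boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.6, iou_threshold=0.5, max_boxes=20):
    """Return indices of boxes kept for one class, highest score first."""
    # Drop low-scoring boxes, then visit the rest from best to worst.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        if len(keep) >= max_boxes:
            break
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = nms(boxes, scores)
print(kept)
```

In this sample the second box overlaps the first heavily and is suppressed, and the fourth falls below the score threshold. A C++ port would need the same two stages applied to the decoded boxes of each class.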
I printed the return values, and they look like this:
boxes-> Tensor("concat_11:0", shape=(?, 4), dtype=float32)
scores-> Tensor("concat_12:0", shape=(?,), dtype=float32)
classes-> Tensor("concat_13:0", shape=(?,), dtype=int32)
The original output of my YOLO model (.hdf5), which I got by printing model.output, is:
tf.Tensor 'conv2d_59_1/BiasAdd:0' shape=(?, ?, ?, 21) dtype=float32
tf.Tensor 'conv2d_67_1/BiasAdd:0' shape=(?, ?, ?, 21) dtype=float32
tf.Tensor 'conv2d_75_1/BiasAdd:0' shape=(?, ?, ?, 21) dtype=float32
And the inference part of the python code is
out_boxes, out_scores, out_classes = sess.run(
    [boxes, scores, classes],
    feed_dict={
        yolo_model.input: image_data,
        input_image_shape: [image.size[1], image.size[0]],
        K.learning_phase(): 0
    })
Compared with the Python inference code, the C++ part is as follows (Reference):
int main()
string image = "test.jpg";
string graph = "yolo_weight.pb";
string labels = "coco.names";
int32 input_width = 416;
int32 input_height = 416;
float input_mean = 0;
float input_std = 255;
string input_layer = "input_1:0";
std::vector<std::string> output_layer = {"conv2d_59/BiasAdd:0", "conv2d_67/BiasAdd:0", "conv2d_75/BiasAdd:0" };
std::unique_ptr<tensorflow::Session> session;
string graph_path = tensorflow::io::JoinPath(root_dir, graph);
Status load_graph_status = LoadGraph(graph_path, &session);
std::vector<Tensor> resized_tensors;
string image_path = tensorflow::io::JoinPath(root_dir, image);
Status read_tensor_status = ReadTensorFromImageFile(image_path, input_height, input_width,
input_mean, input_std, &resized_tensors);
Tensor inpTensor = Tensor(DT_FLOAT, TensorShape({ 1, input_height, input_width, 3 }));
std::vector<Tensor> outputs;
cv::Mat srcImage = cv::imread(image);
cv::resize(srcImage, srcImage, cv::Size(input_width, input_height));
srcImage.convertTo(srcImage, CV_32FC3);
srcImage = srcImage / 255;
string ty = type2str(srcImage.type());
float *p = (&inpTensor)->flat<float>().data();
cv::Mat tensorMat(input_height, input_width, CV_32FC3, p);
srcImage.convertTo(tensorMat, CV_32FC3);
Status run_status = session->Run({{ input_layer, inpTensor }}, { output_layer }, {}, &outputs);
int cc = 1;
auto output_detection_class = outputs[0].tensor<float, 4>();
std::cout << "detection scores" << std::endl;
std::cout << "typeid(output_detection_scoreclass).name->" << typeid(output_detection_class).name() << std::endl;
for (int i = 0; i < 13; ++i)
for (int j = 0; j < 13; ++j)
for (int k = 0; k < 21; ++k)
// using (index_1, index_2, index_3) to access the element in a tensor
printf("i->%d, j->%d, k->%d\t", i, j, k);
std::cout << output_detection_class(1, i, j, k) << "\t";
cc += 1;
if (cc % 4 == 0)
std::cout << "\n";
std::cout << std::endl;
return 0;
The output of c++ version inference part is
outputs.size()-> 3
outputs[0].shape()-> [1,13,13,21]
outputs[1].shape()-> [1,26,26,21]
outputs[2].shape()-> [1,52,52,21]
But the output I get looks pretty weird.
(The values in outputs[0] don't look like scores, classes, or coordinates.)
So I'm wondering: is it because I'm missing the part that the Python code runs before inference, or am I reading the output data the wrong way?
I've checked some related questions and answers:
1. Yolo v3 model output clarification with keras
2. Convert YoloV3 output to coordinates of bounding box, label and confidence
3. How to access tensorflow::Tensor C++
But I still can't figure out how to make it work :(
I also found a repo that might be helpful.
I've taken a look at its yolo.cpp, but its model output tensor has a different shape from mine, so I'm not sure whether I can adapt the code directly. Its output tensor is
tf.Tensor 'import/output:0' shape=(?, 735) dtype = float32
Any help or advice is appreciated.
In case you're still struggling with this: I don't see where you apply the sigmoid and exp functions to the output layer values.
You might look at this paper, which describes how to handle the output.
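A sketch of the sigmoid and exp step, assuming the usual YOLOv3 layout: each grid cell holds 3 anchors, and each anchor holds (tx, ty, tw, th, objectness, class scores). With 21 channels per cell that would imply 21 / 3 - 5 = 2 classes. The anchor size and the raw values below are made up for illustration, and decode_cell is a hypothetical helper name.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_cell(raw, col, row, grid_w, grid_h, anchor_wh, input_wh):
    """Decode one anchor's raw outputs at grid cell (col, row).

    Returns (cx, cy, w, h) as fractions of the network input size,
    plus one score per class (objectness times class probability).
    """
    tx, ty, tw, th, to = raw[:5]
    cx = (sigmoid(tx) + col) / grid_w
    cy = (sigmoid(ty) + row) / grid_h
    w = anchor_wh[0] * math.exp(tw) / input_wh[0]
    h = anchor_wh[1] * math.exp(th) / input_wh[1]
    objectness = sigmoid(to)
    class_scores = [objectness * sigmoid(c) for c in raw[5:]]
    return (cx, cy, w, h), class_scores


# Seven raw values for one anchor: 4 box terms, 1 objectness, 2 classes.
raw = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
box, scores = decode_cell(
    raw, col=6, row=6, grid_w=13, grid_h=13,
    anchor_wh=(104, 104), input_wh=(416, 416),
)
print(box, scores)
```

The raw convolution outputs are unbounded, which is why they do not look like scores or coordinates until this decoding has been applied to every cell and anchor of all three output tensors.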
As Bryan said, some processing still needs to be applied to the output layer.
So in my case (following this repo), I added this method to the YOLO class so that the post-processing is included when the model is saved:
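def output_pb(self, out_dir, out_pb):
    # Assumes the class exposes its post-processed tensors as self.boxes, self.scores, self.classes
    out_bx = self.boxes.name.split(":")[0]
    out_sc = self.scores.name.split(":")[0]
    out_cs = self.classes.name.split(":")[0]
    print(out_bx, out_sc, out_cs)
    frozen_graph = tf.graph_util.remove_training_nodes(tf.graph_util.convert_variables_to_constants(self.sess, self.sess.graph.as_graph_def(), [out_bx, out_sc, out_cs]))
    tf.train.write_graph(frozen_graph, out_dir, out_pb, as_text=False)
    print("===== FINISH saving new pb file =====")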
When saving model, I called the function like this:
yolo = YOLO(**config)
yolo.output_pb(output_dir, output_pb_name)
And when doing inference in C++,
the whole process goes like this:
// initialize model
YOLO* YOLO_data = (YOLO*)Init_DllODM_object(config);
// do some stuff to set data in YOLO_data
cv::Mat input_pic = cv::imread("whatever_pic.png");
predict(YOLO_data, input_pic, YOLO_data ->bbox_res, YOLO_data ->score_res, YOLO_data ->class_res);
// draw result on pic
cv::Mat res = show_result(YOLO_data, input_pic);
Detailed code is here:
// yolo_cpp.h
struct YOLO
float score_thres;
std::vector<int> class_res;
std::vector<float> bbox_res, score_res;
std::string inp_tensor_name;
std::string placeholder_name;
std::vector<std::string> out_tensors;
Session* session;
Tensor t, inpTensor;
std::vector<tensorflow::Tensor> outTensor;
std::vector<int> MD_size;
std::vector<int> inp_pic_size;
std::vector<std::string> md_class_list;
std::vector<cv::Scalar> color_list;
int show_score;
int score_type;
int return_origin;
// yolo_cpp.cpp
void* Init_DllODM_object(json config)
std::string model_path = config["model"].get<std::string>();
YOLO* YOLO_data = new YOLO();
auto options = tensorflow::SessionOptions();
GraphDef graphdef;
// loading model to graph
Status status_load = ReadBinaryProto(Env::Default(), model_path, &graphdef);
int node_count = graphdef.node_size();
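for (int i = 0; i < node_count; i++)
{
    auto n = graphdef.node(i);
    if (n.name().find("input_") != string::npos)
        YOLO_data->inp_tensor_name = n.name();
    else if (n.name().find("Placeholder_") != string::npos)
        YOLO_data->placeholder_name = n.name();
    // The three output nodes sit near the end of the graph (assumed order: boxes, scores, classes)
    else if (i == node_count - 5)
        YOLO_data->out_tensors.push_back(n.name());
    else if (i == node_count - 3)
        YOLO_data->out_tensors.push_back(n.name());
    else if (i == node_count - 1)
        YOLO_data->out_tensors.push_back(n.name());
}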
if (!status_load.ok()) {
std::cout << "ERROR: Loading model failed..." << std::endl;
std::cout << model_path << status_load.ToString() << "\n";
std::vector<int> MD_size_ = config["input_size"];
YOLO_data->MD_size = MD_size_;
std::vector<int> inp_pic_size_ = config["input_pic_size"];
YOLO_data->inp_pic_size = inp_pic_size_;
YOLO_data->inpTensor = Tensor(DT_FLOAT, TensorShape({ 1, YOLO_data->MD_size[0], YOLO_data->MD_size[1], 3 })); // input tensor
YOLO_data->t = Tensor(DT_FLOAT, TensorShape({ 2 }));
auto t_matrix = YOLO_data->t.tensor<float, 1>();
t_matrix(0) = YOLO_data->inp_pic_size[0];
t_matrix(1) = YOLO_data->inp_pic_size[1];
// create session
Status status_newsess = NewSession(options, &YOLO_data->session); //for the usage of gpu setting
Status status_create = YOLO_data->session->Create(graphdef);
if (!status_create.ok()) {
std::cout << "ERROR: Creating graph in session failed.." << status_create.ToString() << std::endl;
else {
std::cout << "----------- Successfully created session and load graph -------------" << std::endl;
return YOLO_data;
int predict(YOLO* YOLO_, cv::Mat srcImage, std::vector<float>& bbox_res, std::vector<float>& score_res, std::vector<int>& class_res)
// read image -> input image
if (srcImage.empty()) // check if image can open correctly
std::cout << "can't open the image!!!!!!!" << std::endl;
int res = -1;
return res;
// ref:
std::vector<std::pair<string, tensorflow::Tensor>> inputs = {
{ YOLO_->inp_tensor_name, YOLO_->inpTensor },
{ YOLO_->placeholder_name, YOLO_->t },
srcImage = letterbox_image(srcImage, YOLO_->MD_size[0], YOLO_->MD_size[1]);
convertCVMatToTensor(YOLO_, srcImage);
Status status_run = YOLO_->session->Run({ inputs }, { YOLO_->out_tensors }, {}, &YOLO_->outTensor);
if (!status_run.ok()) {
std::cout << "ERROR: RUN failed..." << std::endl;
std::cout << status_run.ToString() << "\n";
int res = -1;
return res;
TTypes<float>::Flat pp1 = YOLO_->outTensor[0].flat<float>();
TTypes<float>::Flat pp2 = YOLO_->outTensor[1].flat<float>();
TTypes<int>::Flat pp3 = YOLO_->outTensor[2].flat<int>();
int pp1_idx;
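for (int i = 0; i < pp2.size(); i++)
{
    // Each detection has four box values, one score, and one class index
    pp1_idx = i * 4;
    bbox_res.push_back(pp1(pp1_idx));
    bbox_res.push_back(pp1(pp1_idx + 1));
    bbox_res.push_back(pp1(pp1_idx + 2));
    bbox_res.push_back(pp1(pp1_idx + 3));
    score_res.push_back(pp2(i));
    class_res.push_back(pp3(i));
}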
return 0;
cv::Mat show_result(YOLO* inf_obj, cv::Mat inp_pic)
int bbox_idx;
std::string plot_str;
bool under_thresh = false;
std::vector<int> del_idx;
for (int i = 0; i < inf_obj->class_res.size(); i++)
int y_min, y_max, x_min, x_max;
bbox_idx = i * 4;
y_min = std::max(0, (int)floor(inf_obj->bbox_res[bbox_idx] + 0.5));
x_min = std::max(0, (int)floor(inf_obj->bbox_res[bbox_idx + 1] + 0.5));
y_max = std::max(0, (int)floor(inf_obj->bbox_res[bbox_idx + 2] + 0.5));
x_max = std::max(0, (int)floor(inf_obj->bbox_res[bbox_idx + 3] + 0.5));
//std::cout << md_class_list[class_res[i]] << ", ";
//std::cout << score_res[i] << ",";
//std::cout << "[" << x_min << ", " << y_min << ", " << x_max << ", " << y_max << "]\n";
if (inf_obj->show_score)
if (inf_obj->score_type)
plot_str = inf_obj->md_class_list[inf_obj->class_res[i]] + ", " + std::to_string(rounding(inf_obj->score_res[i] * 100, 2)).substr(0, 5) + "%";
plot_str = inf_obj->md_class_list[inf_obj->class_res[i]] + ", " + std::to_string(rounding(inf_obj->score_res[i], 2)).substr(0, 4);
plot_str = inf_obj->md_class_list[inf_obj->class_res[i]];
if (inf_obj->score_res[i] >= inf_obj->score_thres)
inp_pic = plot_one_box(inp_pic, x_min, y_min, x_max, y_max, plot_str, inf_obj->color_list[inf_obj->class_res[i]]);
//std::cout << "score_res[i]->" << score_res[i] << "under thresh!!" << std::endl;
under_thresh = true;
if (under_thresh)
//std::cout << "*** deleting element" << std::endl;
for (int x = 0; x < del_idx.size(); x++)
bbox_idx = (del_idx[x] - x) * 4;
inf_obj->bbox_res.erase(inf_obj->bbox_res.begin() + bbox_idx + 3);
inf_obj->bbox_res.erase(inf_obj->bbox_res.begin() + bbox_idx + 2);
inf_obj->bbox_res.erase(inf_obj->bbox_res.begin() + bbox_idx + 1);
inf_obj->bbox_res.erase(inf_obj->bbox_res.begin() + bbox_idx);
inf_obj->score_res.erase(inf_obj->score_res.begin() + del_idx[x] - x);
inf_obj->class_res.erase(inf_obj->class_res.begin() + del_idx[x] - x);
return inp_pic;
Since my code is built as a DLL, I arranged it this way.
There is still some redundant code that I didn't delete,
but I think the whole process can be reproduced with the code provided so far.
Hope this helps :D

Tensorflow frozen graph protobuf does not predict using c api

I trained a model for semantic segmentation using this repo, got good results, and tried to use the network in a small library written with the TensorFlow C API. I converted my Keras model into a protobuf file using this repo and run the session with this code:
typedef struct model_t {
TF_Graph* graph;
TF_Session* session;
TF_Status* status;
TF_Output input, target, output;
TF_Operation *init_op, *train_op, *save_op, *restore_op;
TF_Output checkpoint_file;
} model_t;
typedef struct NetProperties {
int width;
int height;
int border;
int classes;
int inputSize;
} NetProperties;
static model_t * model;
static NetProperties * properties;
extern "C" EXPORT int ModelCreate(const char* nnFilename, const char* inputName, const char* outputName, int pictureWidth, int pictureHeight, int border, int classes) {
model = (model_t*)malloc(sizeof(model_t));
model->status = TF_NewStatus();
model->graph = TF_NewGraph();
properties = (NetProperties*)malloc(sizeof(NetProperties));
properties->width = pictureWidth;
properties->height = pictureHeight;
properties->border = border;
properties->classes = classes;
properties->inputSize = (pictureWidth + border * 2) * (pictureHeight + border * 2) * 3;
// Create the session.
TF_SessionOptions* opts = TF_NewSessionOptions();
model->session = TF_NewSession(model->graph, opts, model->status);
if (!Okay(model->status)) return 0;
TF_Graph* g = model->graph;
// Import the graph.
TF_Buffer* graph_def = read_file(nnFilename);
if (graph_def == NULL) return 0;
printf("Read GraphDef of %zu bytes\n", graph_def->length);
TF_ImportGraphDefOptions* opts = TF_NewImportGraphDefOptions();
TF_GraphImportGraphDef(g, graph_def, opts, model->status);
if (!Okay(model->status)) return 0;
// Handles to the interesting operations in the graph.
model->input.oper = TF_GraphOperationByName(g, inputName);
model->input.index = 0;
model->target.oper = TF_GraphOperationByName(g, "target");
model->target.index = 0;
model->output.oper = TF_GraphOperationByName(g, outputName);
model->output.index = 0;
model->init_op = TF_GraphOperationByName(g, "init");
model->train_op = TF_GraphOperationByName(g, "train");
model->save_op = TF_GraphOperationByName(g, "save/control_dependency");
model->restore_op = TF_GraphOperationByName(g, "save/restore_all");
model->checkpoint_file.oper = TF_GraphOperationByName(g, "save/Const");
model->checkpoint_file.index = 0;
// first prediction is slow
unsigned char * randomData = (unsigned char*)malloc(properties->inputSize * sizeof(unsigned char));
for (int i = 0; i < properties->inputSize; i++) {
randomData[i] = (unsigned char)100;
return 1;
extern "C" EXPORT void ModelDestroy() {
if (model == nullptr) return;
TF_DeleteSession(model->session, model->status);
extern "C" EXPORT unsigned char* ModelPredict(unsigned char * batch1) {
if (model == NULL) return NULL;
const int64_t dims[4] = { 1, properties->height + properties->border * 2, properties->width + properties->border * 2, 3 };
size_t nbytes = properties->inputSize;
// can be faster
float * arrayOfFloats = (float*)malloc(nbytes * sizeof(float));
//float sumUp = 0;
for (int i = 0; i < properties->inputSize; i++) {
arrayOfFloats[i] = batch1[i] * (1.f / 255.f);
//sumUp += arrayOfFloats[i];
//std::cout << sumUp << std::endl;
// removed due to jdehesa answer
//float ** inputFloats = (float**)malloc(nbytes * sizeof(float*));
//inputFloats[0] = arrayOfFloats;
// Optionally, you can check that your input_op and input tensors are correct
//// by using some of the functions provided by the C API.
//std::cout << "Input op info: " << TF_OperationNumOutputs(input_op) << "\n";
//std::cout << "Input data info: " << TF_Dim(input, 0) << "\n";
std::vector<TF_Output> inputs;
std::vector<TF_Tensor*> input_values;
TF_Operation* input_op = model->input.oper;
TF_Output input_opout = { input_op, 0 };
inputs.push_back(input_opout);
// reworked due to jdehesa answer
//TF_Tensor* input = TF_NewTensor(TF_FLOAT, dims, 4, (void*)inputFloats, //nbytes * sizeof(float), &Deallocator, NULL);
TF_Tensor* input = TF_NewTensor(TF_FLOAT, dims, 4, (void*)arrayOfFloats, nbytes * sizeof(float), &Deallocator, NULL);
input_values.push_back(input);
int outputSize = properties->width * properties->height * properties->classes;
int64_t out_dims[] = { 1, properties->height, properties->width, properties->classes };
// Create vector to store graph output operations
std::vector<TF_Output> outputs;
TF_Operation* output_op = model->output.oper;
TF_Output output_opout = { output_op, 0 };
outputs.push_back(output_opout);
// Create TF_Tensor* vector
//std::vector<TF_Tensor*> output_values(outputs.size(), nullptr);
// Similar to creating the input tensor, however here we don't yet have the
// output values, so we use TF_AllocateTensor()
TF_Tensor* output_value = TF_AllocateTensor(TF_FLOAT, out_dims, 4, outputSize * sizeof(float));
//// As with inputs, check the values for the output operation and output tensor
//std::cout << "Output: " << TF_OperationName(output_op) << "\n";
//std::cout << "Output info: " << TF_Dim(output_value, 0) << "\n";
TF_SessionRun(model->session, NULL,
&inputs[0], &input_values[0], inputs.size(),
&outputs[0], &output_value, outputs.size(),
/* No target operations to run */
NULL, 0, NULL, model->status);
if (!Okay(model->status)) return NULL;
// memory allocations take place here
float* prediction = (float*)TF_TensorData(output_value);
//float* prediction = (float*)malloc(sizeof(float) * properties->inputSize / 3 * properties->classes);
//memcpy(prediction, TF_TensorData(output_value), sizeof(float) * properties->inputSize / 3 * properties->classes);
unsigned char * charPrediction = new unsigned char[outputSize * sizeof(unsigned char)];
//sumUp = 0;
for (int i = 0; i < outputSize; i++) {
charPrediction[i] = (unsigned char)((prediction[i] * 255));
//sumUp += prediction[i];
//std::cout << sumUp << std::endl << std::endl;
return charPrediction;
The problem is that the prediction result is always the same. I tried passing random data and real images, but the result was identical. Different trained models give different predictions, but for each model the prediction is always the same. As you can see in the code snippet, I checked that I pass different data and still get the same prediction every time:
// first is float sum of passed picture, second is the float sum of answer
I tried writing my own Keras-to-TensorFlow .pb converter, but the result was the same.
import os, argparse
import tensorflow as tf
from tensorflow.keras.utils import get_custom_objects
from segmentation_models.losses import bce_dice_loss,dice_loss,cce_dice_loss
from segmentation_models.metrics import iou_score
# some custom functions from segmentation_models
get_custom_objects().update({
    'dice_loss': dice_loss,
    'bce_dice_loss': bce_dice_loss,
    'cce_dice_loss': cce_dice_loss,
    'iou_score': iou_score,
})
def freeze_keras(model_name):
model = tf.keras.models.load_model(model_name)
sess = tf.keras.backend.get_session()
constant_graph = tf.graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), [out.op.name for out in model.outputs])
tf.train.write_graph(constant_graph, './', 'saved_model.pb', as_text=False)
Please help me find out how to fix the prediction result in the C API.
UPDATE 1: Reworked input array as jdehesa suggested
UPDATE 2: Added definition of model and NetProperties
I think you are not setting the input data correctly. Let's see.
float * arrayOfFloats1 = (float*)malloc(nbytes * sizeof(float));
float sumUp = 0;
Here you create arrayOfFloats1 to hold all the image data.
for (int i = 0; i < properties->inputSize; i++) {
arrayOfFloats1[i] = batch1[i] * (1.f / 255.f);
sumUp += arrayOfFloats1[i];
std::cout << sumUp << std::endl;
Here you set arrayOfFloats1 to the image data. This is all fine.
But then:
float ** inputFloats = (float**)malloc(nbytes * sizeof(float*));
Here you have inputFloats, which has space for nbytes float pointers. First, you probably would want to allocate space for float values, not float pointers (which probably do not have the same size). And then:
inputFloats[0] = arrayOfFloats1;
Here you are setting the first of those nbytes pointers to the pointer arrayOfFloats1. And then inputFloats is used as input to the model. But the remaining nbytes - 1 pointers have not been set to anything. Although not required, they are probably set all to zero.
If you just want to make an "array of arrays of floats" with arrayOfFloats1 you don't need to allocate any memory, you can simply do:
float ** inputFloats = &arrayOfFloats1;
But then you actually use inputFloats like this:
TF_Tensor* input = TF_NewTensor(
TF_FLOAT, dims, 4, (void*)inputFloats, nbytes * sizeof(float), &Deallocator, NULL);
So here you are saying that input is made up of the data in inputFloats, which will be a pointer to arrayOfFloats1 and then uninitialized memory. Probably you actually want something like:
TF_Tensor* input = TF_NewTensor(
TF_FLOAT, dims, 4, (void*)arrayOfFloats1, nbytes * sizeof(float), &Deallocator, NULL);
Which means input will be a tensor made up of the data in arrayOfFloats1 that you copied before. In fact, I don't think your code needs inputFloats at all.
Otherwise, from what I can tell the rest of the code seems correct. You should ensure that all allocated memory is properly freed in all cases (e.g. when you do if (!Okay(model->status)) return NULL; you should probably delete the input and output tensors before returning), but that is a different issue.
The issue was in the model. I had trained it on unnormalized image data (pixel values between 0.0 and 255.0) but tried to run inference on normalized data (I divided each pixel value by 255 with arrayOfFloats[i] = batch1[i] * (1.f / 255.f); and got values between 0.0 and 1.0), so the model effectively saw a black image every time and gave me similar answers. After I removed the normalization, the model started to predict correctly.
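The mismatch described above can be illustrated without TensorFlow. The sketch below assumes a hypothetical flag, trained_on_raw, that records how the training data was scaled; the only point is that inference has to apply the same scaling as training.

```python
def preprocess(pixels, trained_on_raw):
    """Convert 8-bit pixel values to floats using the same scale as training."""
    divisor = 1.0 if trained_on_raw else 255.0
    return [p / divisor for p in pixels]


pixels = [0, 128, 255]
matched = preprocess(pixels, trained_on_raw=True)      # same range as training: 0..255
mismatched = preprocess(pixels, trained_on_raw=False)  # squeezed into 0..1

# To a model trained on 0..255 inputs, the mismatched values all sit in the
# darkest sliver of the range, so every image looks almost black.
print(max(matched), max(mismatched))
```

Storing the preprocessing choice next to the exported .pb file (or baking it into the graph) is one way to keep the C side and the training script in agreement.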

Binary classifier in CNTK with C++

I am trying to use the C++ API of CNTK to achieve online learning. While reading the source code of the unit tests and the CNTKLibrary.h header, I only saw the Trainer.TrainMinibatch method to train a model. Can this method be used to pass a single input-output data point? If it is possible, what is the easiest way to do this?
I tried to use the CNTK::Value::CreateSequence method to create a sequence that I then wanted to pass to the TrainMinibatch function, but it's not working the way I expected:
I tried to port this python code to C++:
num_hidden_layers = 2
num_output_classes = 2
input_dim = 1
hidden_layers_dim = 400
input_var = C.input_variable(input_dim)
label_var = C.input_variable(num_output_classes)
def create_model(features):
with C.layers.default_options(init = C.glorot_uniform(), activation=C.ops.relu):
h = features
for _ in range(num_hidden_layers):
h = C.layers.Dense(hidden_layers_dim, activation=C.sigmoid)(h)
r = C.layers.Dense(num_output_classes, activation=None)(h)
return r
z = create_model(input_var)
loss = C.cross_entropy_with_softmax(z, label_var)
label_error = C.classification_error(z, label_var)
learning_rate = 0.2
lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch)
learner = C.sgd(z.parameters, lr_schedule)
trainer = C.Trainer(z, (loss, label_error), [learner])
input_map = { label_var : None, input_var : None}
training_progress_output_freq = 500
for i in range(0, 10000):
    input_map[input_var] = np.array([np.random.randint(0,2)], dtype=np.float32)
    if input_map[input_var] == 0:
        input_map[label_var] = np.array([1,0], dtype=np.float32)
    else:
        input_map[label_var] = np.array([0, 1], dtype=np.float32)
    trainer.train_minibatch(input_map)
I ended up with this C++ code:
const size_t inputDim = 1;// 28 * 28;
const size_t numOutputClasses = 2;// 10;
const size_t hiddenLayerDim = 400;
const size_t numHiddenLayers = 2;
//build the model
auto input = InputVariable({ inputDim }, DataType::Float, L"features");
FunctionPtr classifierOutput = input;
for (int i = 0; i < numHiddenLayers; i++)
classifierOutput = FullyConnectedDNNLayer(classifierOutput, hiddenLayerDim, device, std::bind(Sigmoid, _1, L""));
classifierOutput = FullyConnectedLinearLayer(classifierOutput, 2, device);
auto labels = InputVariable({ numOutputClasses }, DataType::Float, L"labels");
auto trainingLoss = CrossEntropyWithSoftmax(classifierOutput, labels, L"lossFunction");
auto prediction = Minus(Constant::Scalar(1.0f, device), ClassificationError(classifierOutput, labels, L"classificationError"));
LearningRatePerMinibatchSchedule learningRatePerSample = 0.2;
auto trainer = CreateTrainer(classifierOutput, trainingLoss, prediction,
    { SGDLearner(classifierOutput->Parameters(), learningRatePerSample) });
std::cout << "Starting to train...\n";
size_t outputFrequencyInMinibatches = 500;
for (size_t i = 0; i < 10000; ++i)
{
    //input data
    std::vector<float> inputData(1);
    inputData[0] = ((float)rand()) / RAND_MAX;
    //output data
    std::vector<float> outputData(2);
    outputData[0] = inputData[0] > 0.5 ? 1.0 : 0.0;
    outputData[1] = 1.0 - outputData[0];
    ValuePtr inputSequence = CNTK::Value::CreateSequence(NDShape({ 1 }), inputData, device);
    ValuePtr outputSequence = CNTK::Value::CreateSequence(NDShape({ 2 }), outputData, device);
    std::unordered_map<Variable, ValuePtr> map = {{ input, inputSequence }, { labels, outputSequence } };
    trainer->TrainMinibatch(map, device);
}
The code compiles and runs, but the loss in the C++ version does not converge to 0, whereas in the Python version the loss is more or less 0 after a few hundred iterations.
It seems the input data in the Python version is either 0 or 1:
input_map[input_var] = np.array([np.random.randint(0,2)], dtype=np.float32);
while in the C++ code it is a float between 0 and 1:
//input data
std::vector<float> inputData(1);
inputData[0] = ((float)rand()) / RAND_MAX;
Please make the two inputs the same and check whether the convergence speeds still differ.
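The difference between the two data generators can be sketched in plain Python. The function names are illustrative, and the labels follow each version's own rule from the question.

```python
import random


def python_style_sample(rng):
    """Mirror the Python loop: the input is exactly 0.0 or 1.0."""
    x = float(rng.randint(0, 1))  # like np.random.randint(0, 2)
    label = [1.0, 0.0] if x == 0 else [0.0, 1.0]
    return x, label


def cpp_style_sample(rng):
    """Mirror the C++ loop: the input is a continuous value, split at 0.5."""
    x = rng.random()  # in [0.0, 1.0), close to rand() / RAND_MAX
    first = 1.0 if x > 0.5 else 0.0
    return x, [first, 1.0 - first]


rng = random.Random(0)
discrete_inputs = {python_style_sample(rng)[0] for _ in range(1000)}
continuous_inputs = [cpp_style_sample(rng)[0] for _ in range(1000)]
print(sorted(discrete_inputs), len(set(continuous_inputs)))
```

With only two distinct inputs the Python task can be fit almost exactly, while the C++ task has to learn a sharp threshold at 0.5, where samples near the boundary keep the loss above zero. The label convention is also flipped between the two versions (0 maps to [1, 0] in Python, while values above 0.5 map to [1, 0] in C++), which should not matter for convergence but is worth aligning when comparing runs.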

Python vs. C++ OpenCV matchTemplate

I have a weird problem with OpenCV. I am doing template matching with OpenCV in both Python and C++. Even though Python uses the C++ methods under the hood, I get very different results: the Python version gives me a really accurate location, while the C++ one is not even close. What is the reason for this? Is it my C++ code or something else?
I use Python 2.7.11, Apple LLVM version 7.3.0 (clang-703.0.29), and OpenCV 3.0.
My Python Code:
def toGray(img):
    _, _, channels = img.shape
    if channels == 3:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    else:
        gray = img
    return gray
def template_match(img, template):
w, h = template.shape[::-1]
res = cv2.matchTemplate(img,template,cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(img,top_left, bottom_right, 255, 2)
plt.subplot(121),plt.imshow(res,cmap = 'gray')
plt.title('Matching Result'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(img,cmap = 'gray')
plt.title('Detected Point'), plt.xticks([]), plt.yticks([])
if __name__ == "__main__":
img_name = sys.argv[1]
img_name2 = sys.argv[2]
img_rgb = cv2.imread(img_name)
img_rgb2 = cv2.imread(img_name2)
gimg1 = toGray(img_rgb)
gimg2 = toGray(img_rgb2)
template_match(gimg1, gimg2)
My C++ code (it is exactly the same as in the OpenCV documentation):
Mat img; Mat templ; Mat result;
char* image_window = "Source Image";
char* result_window = "Result window";
int match_method;
int max_Trackbar = 5;
/// Function Headers
void MatchingMethod( int, void* );
/** @function main */
int main( int argc, char** argv )
/// Load image and template
img = imread( argv[1], 1 );
templ = imread( argv[2], 1 );
/// Create windows
namedWindow( image_window, CV_WINDOW_AUTOSIZE );
namedWindow( result_window, CV_WINDOW_AUTOSIZE );
/// Create Trackbar
char* trackbar_label = "Method: \n 0: SQDIFF \n 1: SQDIFF NORMED \n 2: TM CCORR \n 3: TM CCORR NORMED \n 4: TM COEFF \n 5: TM COEFF NORMED";
createTrackbar( trackbar_label, image_window, &match_method, max_Trackbar, MatchingMethod );
MatchingMethod( 0, 0 );
return 0;
/**
 * @function MatchingMethod
 * @brief Trackbar callback
 */
void MatchingMethod( int, void* )
/// Source image to display
Mat img_display;
img.copyTo( img_display );
/// Create the result matrix
int result_cols = img.cols - templ.cols + 1;
int result_rows = img.rows - templ.rows + 1;
result.create( result_rows, result_cols, CV_32FC1 );
/// Do the Matching and Normalize
matchTemplate( img, templ, result, match_method );
normalize( result, result, 0, 1, NORM_MINMAX, -1, Mat() );
/// Localizing the best match with minMaxLoc
double minVal; double maxVal; Point minLoc; Point maxLoc;
Point matchLoc;
minMaxLoc( result, &minVal, &maxVal, &minLoc, &maxLoc, Mat() );
/// For SQDIFF and SQDIFF_NORMED, the best matches are lower values. For all the other methods, the higher the better
if( match_method == CV_TM_SQDIFF || match_method == CV_TM_SQDIFF_NORMED )
{ matchLoc = minLoc; }
else
{ matchLoc = maxLoc; }
/// Show me what you got
rectangle( img_display, matchLoc, Point( matchLoc.x + templ.cols , matchLoc.y + templ.rows ), Scalar::all(0), 2, 8, 0 );
rectangle( result, matchLoc, Point( matchLoc.x + templ.cols , matchLoc.y + templ.rows ), Scalar::all(0), 2, 8, 0 );
imshow( image_window, img_display );
imshow( result_window, result );
cv::imwrite("rec.jpg", img_display);
Original Images:
Python Output:
C++ Output
Looking through the two implementations, the most evident difference between them is the colour format of the images used.
In the Python version, you load the images with the default flag, which gives 3-channel BGR images, but you then pass both of them through toGray, so the template matching runs on single-channel grayscale images.
img_rgb = cv2.imread(img_name)
img_rgb2 = cv2.imread(img_name2)
However, in C++ you load the images in colour, since passing 1 as the second parameter selects IMREAD_COLOR (0 would select grayscale), and you never convert them.
img = imread( argv[1], 1 );
templ = imread( argv[2], 1 );
According to cv::matchTemplate documentation:
In case of a color image, template summation in the numerator and each
sum in the denominator is done over all of the channels and separate
mean values are used for each channel. That is, the function can take
a color template and a color image. The result will still be a
single-channel image, which is easier to analyze.
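That suggests it is quite possible to get different results when applying it to a 3-channel image than when applying it to a single-channel version of the same image. Note also that the C++ program calls MatchingMethod while match_method still has its initial value 0 (SQDIFF), whereas the Python code uses TM_CCOEFF_NORMED, so the two programs are not running the same matching method either.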
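A toy illustration of why colour and grayscale matching can pick different locations. This is a one-dimensional squared-difference matcher written in plain Python, not OpenCV's matchTemplate; the helper names are made up, pixels are written as (R, G, B) tuples, and the gray weights 0.299, 0.587 and 0.114 are the standard luma weights that OpenCV's colour-to-gray conversion uses.

```python
def to_gray(pixel):
    """Convert an (R, G, B) pixel to a rounded gray level."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def color_diff(a, b):
    """Squared difference summed over all channels."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gray_diff(a, b):
    """Squared difference after converting both pixels to gray."""
    return (to_gray(a) - to_gray(b)) ** 2


def best_match_sqdiff(row, template, per_pixel_diff):
    """Return the offset in row where the template has the lowest squared difference."""
    positions = len(row) - len(template) + 1
    scores = [
        sum(per_pixel_diff(row[i + k], template[k]) for k in range(len(template)))
        for i in range(positions)
    ]
    return scores.index(min(scores))


green, black, red = (0, 130, 0), (0, 0, 0), (255, 0, 0)
row = [green, black, red]
template = [red]

color_loc = best_match_sqdiff(row, template, color_diff)
gray_loc = best_match_sqdiff(row, template, gray_diff)
print(color_loc, gray_loc)
```

The green and red pixels here convert to the same gray level, so the grayscale matcher cannot tell them apart and reports the first one, while the colour matcher finds the red pixel. Converting both C++ images with cvtColor before matching, and selecting the same method as the Python code, would make the two programs comparable.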
