I have an image that consists of only black and white pixels, 500x233 px in size. I want to use this image as an input, or maybe all the pixels in the image as individual inputs, and then receive 3 floating point values as output using machine learning.
I have spent all day on this and have come up with nothing. The best I can find is some image classification libraries, but I am not trying to classify an image. I’m trying to get 3 values that range from -180.0 to 180.0.
I just want to know where to start. TensorFlow seems like it could probably do what I want, but I have no idea where to start with it.
I think my main issue is that I don’t have one output for each input. I’ve been trying to use each pixel’s value (0 or 1) as an input, but my output doesn’t apply to each pixel, only to the image as a whole. I’ve tried creating a string of each pixel’s value and using that as one input, but that didn’t seem to work either.
Should I be using neural networks? Or genetic algorithms? Or something else? Or would I be better off predicting only one of the three outputs at a time and training three separate models, one per output? Even then, I’m not sure how to get a floating point value out of these things. Maybe machine learning isn’t even the correct approach.
Any help is greatly appreciated!
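One way to frame this: each image is one sample (all of its pixels flattened into one row), and the target for that sample is the vector of 3 angles, so it is multi-output regression rather than classification. Below is a minimal NumPy sketch of that framing, assuming a set of labelled images is available. It uses plain ridge regression as a baseline instead of a neural network, and the function names are made up for illustration. In a neural network library the same framing would end in a layer with 3 linear outputs trained with a mean-squared-error loss.

```python
import numpy as np

HEIGHT, WIDTH = 233, 500  # image size from the question


def images_to_matrix(images):
    """Flatten binary images of shape (n, H, W) into one feature row per image."""
    images = np.asarray(images, dtype=np.float64)
    return images.reshape(len(images), -1)


def fit_ridge(X, y_degrees, ridge=1e-3):
    """Fit a linear map from pixels to 3 angles (hypothetical baseline).

    Targets are scaled from [-180, 180] to [-1, 1]. The dual form is used
    because there are far fewer images than pixels.
    """
    Xb = np.hstack([X, np.ones((len(X), 1))])        # extra column acts as the bias
    Y = np.asarray(y_degrees, dtype=np.float64) / 180.0
    K = Xb @ Xb.T + ridge * np.eye(len(Xb))          # (n_images, n_images)
    return Xb.T @ np.linalg.solve(K, Y)              # (n_pixels + 1, 3)


def predict(W, X):
    """Return one row of 3 angles (in degrees) per image."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ W) * 180.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-in data; real images and measured angles would go here.
    images = rng.integers(0, 2, size=(10, HEIGHT, WIDTH))
    angles = rng.uniform(-180.0, 180.0, size=(10, 3))
    W = fit_ridge(images_to_matrix(images), angles)
    print(predict(W, images_to_matrix(images[:2])).shape)  # (2, 3)
```

A linear model will probably not be accurate enough on real data, but it shows the shapes involved: one row of pixels in, three floats out, with no per-pixel outputs needed.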
Related
I am trying to use the following link to understand how a CVAE works. Although I can see how this works for something like a 28x28x1 input image, I'm not sure how to modify it to work for an input image of size 64x64x3.
I have tried looking at other sources for information, but all of them use the MNIST dataset used in the example above. None of them really explain why they chose the numbers for filters, kernels, or strides. I need help understanding this and how to modify the network to work for a 64x64x3 input.
"None of them really explain why they chose the numbers for filters, kernels, or strides."
I'm new to CNNs too, but from what I understand it's really more about experimentation. There is no exact formula that gives you the number of filters to use or the correct kernel size; it depends on your problem. If you are trying to recognize an object whose distinguishing features are small, small filters may work best; if you think the features that make the object recognizable are bigger, larger filters may work best. Again, from what I've learned these are just tips, and a CNN with a completely different configuration may work too.
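On the 64x64x3 part of the question: whatever filter counts you pick, the spatial sizes still have to line up between encoder and decoder, and that part is plain arithmetic. Here is a minimal sketch, assuming Keras-style padding rules ("valid" and "same"). The layer stacks are made-up examples, not the ones from the linked tutorial, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding="valid"):
    """Spatial output size of a convolution along one axis."""
    if padding == "same":
        return -(-size // stride)          # ceil(size / stride)
    return (size - kernel) // stride + 1   # "valid": no padding


def conv_transpose_out(size, stride):
    """Output size of a transposed convolution with padding="same"."""
    return size * stride


def trace(size, layers):
    """Follow one spatial dimension through (kernel, stride, padding) layers."""
    sizes = [size]
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


# Two stride-2 "valid" convolutions on a 28-pixel side.
print(trace(28, [(3, 2, "valid"), (3, 2, "valid")]))  # [28, 13, 6]

# Three stride-2 "same" convolutions on a 64-pixel side.
print(trace(64, [(3, 2, "same")] * 3))                # [64, 32, 16, 8]

# A decoder mirroring that: start at 8 and double three times.
size = 8
for _ in range(3):
    size = conv_transpose_out(size, stride=2)
print(size)                                           # 64
```

With sizes like these, a decoder for 64x64x3 would start from a dense layer reshaped to 8x8 with some number of channels and end in a layer with 3 output channels instead of 1. The filter counts themselves remain a matter of experimentation, as described above.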
My task is to find, for a given input image, the most similar images using a CNN. There are about half a million images to handle, so it would not be realistic to label each one. Beyond that, the images are candlestick charts of all stocks, so it is also very hard to classify them.
My basic idea is to extract the key features of the images with a CNN and compare those features using a hashing algorithm to find the similar ones. But the idea is not complete, and how to do it in Python is also a big challenge. Can anyone help me with this issue? Thank you very much! If possible, could you point me to any relevant articles or code?
Have a look at this:
https://towardsdatascience.com/find-similar-images-using-autoencoders-315f374029ea
This uses autoencoders to find similar images.
Please do post the final code that you end up using so that I can understand it better too.
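The "features plus hashing" idea from the question can be sketched without any labels. The snippet below assumes you already have one feature vector per image (for example from the bottleneck of an autoencoder) and uses random-hyperplane hashing, which is only one of several possible hashing schemes. All function names are hypothetical.

```python
import numpy as np


def make_planes(dim, n_bits=64, seed=0):
    """Random hyperplanes; each one contributes one bit of the hash."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, n_bits))


def hash_features(features, planes):
    """Map feature vectors (n, dim) to boolean codes (n, n_bits).

    Vectors pointing in similar directions get mostly identical bits.
    """
    return (np.asarray(features, dtype=np.float64) @ planes) > 0


def most_similar(query_code, codes, k=5):
    """Return indices and Hamming distances of the k closest codes."""
    dist = np.count_nonzero(codes != query_code, axis=1)
    order = np.argsort(dist, kind="stable")[:k]
    return order, dist[order]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    features = rng.standard_normal((1000, 32))   # stand-in for CNN features
    planes = make_planes(dim=32)
    codes = hash_features(features, planes)
    idx, dist = most_similar(codes[0], codes, k=3)
    print(idx, dist)
```

Comparing short binary codes is much cheaper than comparing raw feature vectors, which matters at half a million images. The quality of the matches still depends entirely on how good the features are.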
I am currently trying to get the Faster R-CNN network from here to work on Windows with TensorFlow. For that, I wanted to re-implement the ROI pooling layer, since it is not working on Windows (at least not for me; if you have any tips on porting to Windows with TensorFlow, I would highly appreciate your comments!). According to this website, you take the proposed ROI from your feature map and max pool its content to a fixed output size. This fixed output size is needed for the following fully connected layers, since they only accept fixed-size input.
The problem now is the following:
After conv5_3, the last convolutional layer before roi pooling, the box that results from the region proposal network is mostly 5x5 pixels in size. This is totally fine, since the objects I want to detect usually have dimensions of 80x80 pixels in the original image (downsampling factor due to pooling is 16). However, I now have to max pool an area of 5x5 pixels and ENLARGE it to 7x7, the target size for the ROI-Pooling. My first try by simply doing interpolation did not work. Also, padding with zeros did not work. I always seem to get the same scores for my classes.
Is there anything I am doing wrong? I do not want to change the dimensions of any layer and I know that my trained network in general works because I have the reference implementation running in Linux on my dataset.
Thank you very much for your time and effort :)
There is now an official TF implementation of Faster R-CNN, and of other object detection algorithms, in their Object Detection API; you should probably check it out.
If you still want to code it yourself: I wondered exactly the same thing as you and could not find an answer about how you're supposed to do it. My three guesses would be:
1. Interpolation, but it changes the feature values, so it destroys some information...
2. Resizing to 35x35 just by copying each cell 7 times and then max-pooling back to 7x7. You don't have to actually do the resizing and then the pooling; in 1D, for instance, it basically reduces to output[i] = max(input[floor(i*5/7)], input[floor((i*5+4)/7)]), with a similar max over 4 elements in 2D (be careful, I might have forgotten some +1/-1 or something). I see at least two problems: some values are over-represented, being copied more often than others; but worse, some (small) values will not be copied to the output at all, which you should avoid given that the output can store more information than the input.
3. Making sure every input feature value is copied exactly at least once to the output, at the best possible place (basically copy input[i] to output[j] with j = floor((i+1)*7/5) - 1). For the remaining spots, either leave a 0 or do interpolation. I would think this solution is the best, maybe with interpolation, but I'm really not sure at all.
It looks like smallcorgi's implementation uses my 2nd solution (without actually resizing, just using max pooling), since it's the same implementation as for the case where the input is bigger than the output.
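To make the second and third guesses concrete, here is a minimal 1D NumPy sketch for the 5-to-7 case. It performs the copy-then-pool literally instead of relying on an index formula, and the function names are made up for illustration. It is not taken from smallcorgi's code or any other implementation.

```python
import numpy as np


def upsample_then_maxpool(x, out_size):
    """Guess 2: copy each cell out_size times, then max-pool in windows of len(x)."""
    in_size = x.shape[0]
    big = np.repeat(x, out_size)                    # length in_size * out_size
    return big.reshape(out_size, in_size).max(axis=1)


def scatter_exact(x, out_size):
    """Guess 3: copy input[i] to output[floor((i+1)*out/in) - 1], zeros elsewhere."""
    in_size = x.shape[0]
    out = np.zeros(out_size, dtype=x.dtype)
    idx = (np.arange(1, in_size + 1) * out_size) // in_size - 1
    out[idx] = x
    return out


row = np.array([5, 1, 4, 2, 3])
print(upsample_then_maxpool(row, 7))  # [5 5 4 4 4 3 3]
print(scatter_exact(row, 7))          # [5 1 0 4 2 0 3]
```

In this example the values 1 and 2 vanish from the max-pooled output, which is the information loss mentioned in the second guess, while the scatter variant keeps all five values and leaves two zeros. For a 2D ROI the same operation would be applied along each axis in turn.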
I know it's late, but I'm posting this answer because it might help others. I have written code that shows how ROI pooling works for different height and width conditions of both the pool and the region.
You can find the code on GitHub:
https://github.com/Parsa33033/RoiPooling
I would like to understand machine learning techniques better. I have read and watched a bunch of things on Python, sklearn and supervised feed-forward nets, but I am still struggling to see how I can apply all this to my project and where to start. Maybe it is a little too ambitious for now.
I have the following algorithm, which generates nice patterns as binary inputs in a CSV file. The goal is to predict the next row.
The simplified logic of this algorithm is that the prediction of the next line (the top line being the most recent one) would be 0,0,1,1,1,0, and the one after that would become either 0,0,0,1,1,0 or go back to its previous step 0,1,1,1,0. However, you can see the model is slightly more complex and noisy, which is why I would like to introduce some machine learning here. I am aware that to get a reliable prediction I will need to introduce other relevant inputs afterwards.
Would someone please help me to get started and stand on my feet here?
I don't like throwing this out here without being able to provide a single piece of code, but I am rather confused about where to start.
Should I pass each (line-1) as an input vector, with the associated output being the line above it? Should I build the array manually from my whole dataset?
I guess I have to use the sigmoid function, and Python seems to be the most common way to do this. For the synapses (or weights), I understand I also need to provide a constant; should this be 1?
Finally, assuming you want this to run continuously, what would be required?
Could you please share readings or simplified tasks that could help me increase my knowledge of all this?
Many thanks.
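One possible starting point, as a rough sketch: treat each row as the input vector and the row that followed it as the target, then train a single layer of sigmoid units. The constant asked about above is usually a bias input fixed at 1 whose weight is learned like any other. The code below is plain NumPy, and the function names and the toy data in the usage note are assumptions for illustration, not the real CSV.

```python
import numpy as np


def make_pairs(rows):
    """rows[0] is the most recent row. Returns (X, y): each older row and the row after it."""
    rows = np.asarray(rows, dtype=np.float64)
    return rows[1:], rows[:-1]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train(X, y, lr=0.5, epochs=3000):
    """One sigmoid unit per output column, trained by gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # bias input fixed at 1
    W = np.zeros((Xb.shape[1], y.shape[1]))     # includes the learned bias weights
    for _ in range(epochs):
        p = sigmoid(Xb @ W)
        W -= lr * Xb.T @ (p - y) / len(Xb)      # cross-entropy gradient
    return W


def predict_next(W, row):
    """Predict the next row as 0/1 values from the current row."""
    xb = np.append(np.asarray(row, dtype=np.float64), 1.0)
    return (sigmoid(xb @ W) > 0.5).astype(int)
```

For example, with toy rows in which a single 1 moves one column to the right at each step (stored most recent first), `W = train(*make_pairs(rows))` followed by `predict_next(W, [1, 0, 0, 0, 0, 0])` should give `[0, 1, 0, 0, 0, 0]`. To keep it running continuously, the simplest option is to retrain on the full history whenever new rows arrive. A noisier pattern will probably need more history per input or a hidden layer.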
I am working on detecting regions of a specific tree in an aerial image, and my approach is to use texture detection. I have 4 descriptors/features and I want to use FANN to build a machine learning setup that detects the regions properly.
My question is: is the input format for pyfann always as described in https://stackoverflow.com/a/25703709/5722784 ?
What if I would like to have 4 input neurons and one output neuron, where for each input neuron I have a list (not a single integer) that I would like to feed into it? Can FANN handle that? If so, what format do I have to follow when preparing the input data?
Thank you so much for significant responses :)
Each input neuron can only take a single input - this is the case for all neural networks, irrespective of the library you use. I would suggest using each element in each of your lists as an input to the neural network, e.g. inputs 1-5 are your first list and then 6-10 are your second list. If you have variable length lists you likely have a problem, though.
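The flattening described above can be sketched as follows. This assumes the plain-text FANN training file layout (a header line with the number of samples, inputs and outputs, followed by alternating input and output lines) and fixed-length descriptor lists. The helper names are hypothetical and not part of pyfann.

```python
def flatten_descriptors(descriptors):
    """Concatenate several descriptor lists into one flat input vector."""
    return [float(v) for d in descriptors for v in d]


def to_fann_text(samples):
    """Build FANN training-file text from (descriptors, outputs) pairs.

    Every sample must flatten to the same number of inputs, because the
    network has a fixed number of input neurons.
    """
    rows = [(flatten_descriptors(d), [float(o) for o in outs]) for d, outs in samples]
    n_in, n_out = len(rows[0][0]), len(rows[0][1])
    if any(len(i) != n_in or len(o) != n_out for i, o in rows):
        raise ValueError("all samples need the same input and output lengths")
    lines = [f"{len(rows)} {n_in} {n_out}"]
    for inputs, outputs in rows:
        lines.append(" ".join(f"{v:g}" for v in inputs))
        lines.append(" ".join(f"{v:g}" for v in outputs))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up example: 4 descriptors per region, 1 output (tree / not tree).
    samples = [
        ([[0.1, 0.2], [0.3], [0.4, 0.5], [0.6]], [1]),
        ([[0.9, 0.8], [0.7], [0.6, 0.5], [0.4]], [0]),
    ]
    print(to_fann_text(samples))
```

Here the 4 descriptors become 6 input neurons in total, so the network would be created with 6 inputs instead of 4.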