I am currently working on vehicle platooning, for which I need to write code in Python with OpenCV that counts the number of vehicles based on their classification. The input is a real-time traffic video.
The idea is to find an average bounding-box size "x" and say that for cars it is "x", for buses it is "3x", and so on. Based on whether a box is roughly "x" or a multiple of "x", determine the classification. Is there any practical way to approach this problem?
Haar cascades are a good method; however, training them takes a lot of time and effort.
You can find many pre-trained cascade files online.
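Using one of those downloaded cascades in OpenCV only takes a few lines. A minimal sketch, where "cars.xml" and "traffic.mp4" are placeholders for whatever cascade file and video you actually have:

```python
import cv2

# Placeholder path to a pre-trained car cascade downloaded from the web.
car_cascade = cv2.CascadeClassifier("cars.xml")

cap = cv2.VideoCapture("traffic.mp4")   # or 0 for a live camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors are starting guesses; tune them on your footage.
    cars = car_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
    print("vehicles in frame:", len(cars))
cap.release()
```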
A second approach is to extract contours from the image and work from there (see the sketch after this list):
- Start with the original image.
- Smooth the image heavily so you get a version without edges.
- Subtract the smoothed image from the original to recover the edges.
- Extract contours from the edge image.
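A minimal sketch of that pipeline for a single frame, assuming OpenCV 4.x; the blur kernel, edge threshold, and area cut-off are guesses you will need to tune for your video:

```python
import cv2

frame = cv2.imread("frame.png")                      # placeholder: one frame of the traffic video
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Smooth heavily so edges disappear, then subtract to keep only the edges.
smooth = cv2.GaussianBlur(gray, (21, 21), 0)
edges = cv2.absdiff(gray, smooth)

# Threshold the edge image and extract contours (OpenCV 4.x return signature).
_, mask = cv2.threshold(edges, 20, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# The bounding-box area plays the role of "x" from the question; multiples of it suggest larger vehicles.
for c in contours:
    if cv2.contourArea(c) > 500:                     # minimum blob size, tune for your footage
        x, y, w, h = cv2.boundingRect(c)
        print("candidate vehicle, box area:", w * h)
```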
I have worked on an almost identical problem.
The easiest way is to train a Haar cascade on vehicles of similar size.
You will have to train multiple cascades, one per category.
Positive training data for the cascades can be downloaded from any used-car listing site with a browser plugin.
The negative set pretty much depends on the context in which the solution will be used.
This also raises the issue that, if you plan to do this on a busy street, there will be many unforeseen scenarios, for example pedestrians walking into the field of view (FoV). The FoV also needs to be fixed, especially the distance from which objects are observed. Trial and error is the only way to find the sweet spot for the thresholds, if any; a rough sketch of the multi-cascade idea follows.
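This is only a sketch: the cascade file names stand in for cascades you would have trained yourself, and the detector parameters are starting guesses to tune.

```python
import cv2

# Placeholder cascade files, one per vehicle category you trained.
cascades = {
    "car": cv2.CascadeClassifier("cascade_car.xml"),
    "bus": cv2.CascadeClassifier("cascade_bus.xml"),
    "truck": cv2.CascadeClassifier("cascade_truck.xml"),
}

gray = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2GRAY)

# Run every cascade on the frame and count detections per category.
counts = {}
for label, cascade in cascades.items():
    detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
    counts[label] = len(detections)

print(counts)   # e.g. {'car': 7, 'bus': 1, 'truck': 2}
```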
Now I am going to suggest something outside the scope of the question you asked.
Although the above is a purely image-processing approach, you can turn the problem on its head and ask why classification is needed at all. Depending on the use case, it is often possible to train a deep reinforcement learning agent instead, which solves the problem without a lot of manual work.
Let me know if you run into specific issues.
I am a marine biology PhD candidate with minimal Python experience. I need to measure the diameters of a large number of urchins on a regular basis, which I would like to automate to improve accuracy (it is tricky to measure urchins physically).
I have been advised to use OpenCV and have been trying to adapt the code from this pyimagesearch blog post. I have not found it to be very effective, for two reasons:
Accuracy is unlikely to be high enough (based on the small sample I have managed to collect so far). This is alluded to in the blog post: the approach is not ideal for round objects.
I am also picking up many incorrect/inappropriate frames (not sure about the terminology, but see the images for examples). Basically, it picks up not only the full urchin but also hundreds of individual points on the urchin. I have tried increasing the kernel size, but this has not changed anything, and I cannot work out how to fix it. See these images for examples.
I am sure there is probably an easy fix for the spurious frames (?), and I would appreciate it if anyone could point me in the right direction. Furthermore, if there is a more accurate way of doing this, I would also like to know.
We need the size of the urchins' shells, not the spines. Ideally I would therefore like to measure just the shell and not the spines, but if I have to include the spines I can simply subtract a constant (the average spine length for urchins of a given size), which is fine but would reduce accuracy further.
Any assistance would be appreciated.
Thanks in advance.
So recently I tried to make a bot that fishes in Minecraft, as a challenge. (I don't use it in any community or modify the game's code, so I assume it is fine with the TOS.) My approach so far has been to track the movement of the bobber.
My first bot relied on color-space segmentation and fine-tuning the image with morphological transformations from OpenCV-Python (as part of my learning experience I aimed to make the bot purely computer-vision based). That bot only worked in a specific location where I controlled the illumination and environment colors with in-game methods. It also only worked with the game's graphics turned to the lowest settings so that particles were disabled.
My second bot used Haar-like classifiers, since I had already built a few models for real-life objects that were fairly good. Sadly, this time (I assume because of the game's distinctive graphic style, where essentially everything is a cube with textures mapped onto it) it was fairly inconsistent and produced a lot of false positives.
My third bot used a HOG-feature based SVM, but it was fairly slow for all models, which ranged from more than 4000 original samples with tightly fitted bounding boxes down to about 200; because of that lack of speed the fish was off the hook by the time a detection occurred.
My last attempt used TensorFlow Lite and failed miserably due to even worse detection speed.
I also looked into motion detection by comparing consecutive frames, the speed benefits of Java versus Python, and different preprocessing options such as increasing contrast, reducing the color palette, and so on.
At this point I don't know whether wandering around 'blind' will give me any clue about the right approach, hence I decided to ask here.
Thanks in advance.
P.S.
For exact specifics: I think the time to reel in is approximately 0.7 seconds, but I may be slightly off.
For a fast and straightforward object detection technique, I would suggest using a pretrained RetinaNet.
You can find all the explanation you need here:
https://github.com/fizyr/keras-retinanet
And follow this Colab for fast training and a straightforward implementation:
https://colab.research.google.com/drive/1v3nzYh32q2rm7aqOaUDvqZVUmShicAsT
I would suggest using resnet50 as the backbone and starting your training from the pretrained weights.
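To give a rough idea of what inference looks like with keras-retinanet once training has produced a converted (inference) snapshot; the snapshot and image paths below are placeholders and the score cut-off is a guess:

```python
import numpy as np
from keras_retinanet import models
from keras_retinanet.utils.image import read_image_bgr, preprocess_image, resize_image

# Placeholder paths: use your own converted inference snapshot and test frame.
model = models.load_model("snapshots/inference_model.h5", backbone_name="resnet50")

image = read_image_bgr("frame.png")
image = preprocess_image(image)
image, scale = resize_image(image)

boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
boxes /= scale                      # map boxes back to the original image size

for box, score, label in zip(boxes[0], scores[0], labels[0]):
    if score < 0.5:                 # detections are sorted by score, so we can stop early
        break
    print(label, score, box)
```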
Hello, I'm currently working on a project where I have to use instance segmentation on different parts of seedlings (the top part and the stem).
Example image:
https://imgur.com/kWAZBed
I have to be able to calculate the angle of the hook for every seedling.
I've heard that the Mask R-CNN instance segmentation method might not be good for biological images, so should I go with U-Net semantic segmentation instead? The problem with U-Net is that every seed and root gets categorized into just two classes, whereas I need to calculate the angle for each individual seedling.
Some input would be appreciated.
You should start with whichever network is easiest for you to get off the ground and see if it's good enough; if not, try another model.
You can only go so far in choosing a network architecture for a new image use case; sometimes you just have to try a few on the new type of image data and see which performs best.
Because your time is valuable, I would recommend starting with the simplest/fastest model, and trying a "trickier" one only if the first one wasn't good enough.
I must add that it's kind of difficult to understand all of the nuances of your requirements from the single image you posted...
Good luck.
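One extra note, in case the U-Net route wins out: a semantic mask can often be split back into individual seedlings with connected components, and the orientation of each component estimated with a line fit. This is only a rough sketch under those assumptions; the mask path and minimum-size threshold are placeholders, and the actual hook angle would come from comparing the orientations of the top part and the stem of the same seedling:

```python
import cv2
import numpy as np

# Binary mask of one class (e.g. the "top part") as produced by a U-Net style model.
mask = cv2.imread("hook_mask.png", cv2.IMREAD_GRAYSCALE)

# Split the semantic mask into individual seedlings via connected components.
num_labels, labels = cv2.connectedComponents((mask > 127).astype(np.uint8))

for i in range(1, num_labels):                      # label 0 is the background
    ys, xs = np.where(labels == i)
    if len(xs) < 50:                                # skip tiny specks; threshold is a guess
        continue
    # Fit a straight line to the component's pixels and report its orientation.
    pts = np.column_stack([xs, ys]).astype(np.float32)
    vx, vy, _, _ = cv2.fitLine(pts, cv2.DIST_L2, 0, 0.01, 0.01).flatten()
    angle = np.degrees(np.arctan2(vy, vx))
    print(f"seedling {i}: orientation ~ {angle:.1f} degrees")
```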
This is a fairly straightforward question, but I am new to the field. Using this tutorial I have a great way of detecting certain patterns or features. However, the images I'm testing are large, and often the feature I'm looking for occupies only a small fraction of the image. When I run the classifier on the entire picture the classification is bad, but when the image is zoomed in and cropped the classification is good.
I've considered writing a script that breaks an image into many sub-images and runs the test on all of them (time isn't a huge concern). However, this still seems inefficient and less than ideal. I'm wondering about suggestions for the best, but also easiest to implement, solution to this.
I'm using Python.
This may seem like a simple question, and it is, but the answer is not so simple. Localisation is a difficult task and requires much more legwork than classifying an entire image. There are a number of tools and models that people have experimented with. One is R-CNN, which looks at many regions in a manner not too dissimilar to what you suggested. Alternatively, you could look at a model such as YOLO or TensorBox.
There is no single answer to this, and it gets asked a lot! For example: Does Convolutional Neural Network possess localization abilities on images?
The term to look for in research papers is "localization". If you want a quick-and-dirty solution (and it is not time sensitive), then sliding windows is definitely a first step; a minimal sketch is below. I hope this gets you going in your project and you can progress from there.
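The sliding-window loop itself is only a few lines; in this sketch the window size and stride are arbitrary numbers you would tune, and `classify` stands in for whatever classifier the tutorial gave you:

```python
def sliding_windows(image, window=(128, 128), step=64):
    """Yield (x, y, crop) patches covering the image."""
    h, w = image.shape[:2]
    for y in range(0, h - window[1] + 1, step):
        for x in range(0, w - window[0] + 1, step):
            yield x, y, image[y:y + window[1], x:x + window[0]]

# Hypothetical usage: score every patch and keep the best one.
# best_x, best_y, best_score = max(
#     ((x, y, classify(crop)) for x, y, crop in sliding_windows(img)),
#     key=lambda t: t[2])
```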
Suppose I have an image of a car taken with my mobile camera and another image of the same car downloaded from the internet.
(For simplicity, please assume that both images contain the same side-view projection of the same car.)
How can I detect that both images represent the same object, i.e. the car in this case, using OpenCV?
I've tried template matching, feature matching (ORB), etc., but they are not working and do not provide satisfactory results.
SIFT feature matching might produce better results than ORB. However, the main problem here is that you have only one image of each type (one from the mobile camera and one from the internet). If you had a large number of images of this car model, you could train a machine learning system on them. Later, you could submit one image of the car to that system, and there would be a much higher chance of it recognizing the car.
From a machine learning point of view, using only one image as the master and matching another against it is analogous to teaching a child the letter "A" using a single handwritten letter "A" and expecting him or her to recognize any handwritten "A" written by anyone.
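If you do want to try the single-image route first, a minimal SIFT plus ratio-test sketch looks like this (the file names are placeholders; `cv2.SIFT_create` assumes OpenCV 4.4 or newer, older builds expose SIFT through `xfeatures2d`):

```python
import cv2

img1 = cv2.imread("car_phone.jpg", cv2.IMREAD_GRAYSCALE)   # photo from your phone
img2 = cv2.imread("car_web.jpg", cv2.IMREAD_GRAYSCALE)     # image from the internet

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match with k-NN and keep matches that pass Lowe's ratio test.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2) if m.distance < 0.75 * n.distance]

print("good matches:", len(good))   # a high count suggests the same object; the cut-off is yours to tune
```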
Think about how you can mathematically describe the car's features so that every car is distinguishable. Maybe every car has a different wheel size? Maybe the distance from the door handle to the bottom of the side window is a unique characteristic? Maybe the ratio of the front side window's width to the rear side window's width is an individual feature of that car?
You probably can't answer yes to any of these questions with 100% confidence. But what you can do is combine them into a multidimensional feature vector and perform classification.
The crucial part here is that, since you are doing manual feature description, you need to do an excellent job and test every step of the way. For example, you need to design features that are scale and perspective invariant. Here, I'd recommend reading about how face detection was designed to meet that requirement.
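To make the feature-vector idea concrete, here is a toy sketch; the ratios and numbers are entirely made up for illustration, and the nearest-neighbour classifier is just one of many possible choices:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up, scale-invariant ratios measured from labelled car images:
# [wheel diameter / car length,
#  door-handle-to-window distance / car height,
#  front side window width / rear side window width]
X_train = np.array([
    [0.22, 0.31, 1.10],   # car model A
    [0.25, 0.28, 0.95],   # car model B
    [0.21, 0.33, 1.12],   # car model A again
])
y_train = ["A", "B", "A"]

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

query = np.array([[0.23, 0.30, 1.08]])   # measurements taken from the new photo
print(clf.predict(query))                # -> ['A']
```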
Will machine learning be a better solution? That depends greatly on two things: first, what kind of data you are planning to throw at the algorithm; second, how well you can control the process.
What most people don't realize today is that machine learning is not a magical solution to every problem. It is a tool, and like every tool it needs proper handling to produce results. If I were to give you advice, I'd say you are not yet ready to handle it well.
My suggestion: get acquainted with basic feature extraction and general image processing algorithms: edge detection (Canny, Sobel), contour finding, shape description, the Hough transform, morphological operations, masking, etc. Without those at your fingertips, I'd say that in this particular case even machine learning will not save you; a small taste of those building blocks is sketched below.
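A tiny sketch of edge detection, contour finding, and a simple shape descriptor, just to show how little code the basics take (the thresholds are arbitrary and the image path is a placeholder):

```python
import cv2

gray = cv2.imread("car.jpg", cv2.IMREAD_GRAYSCALE)    # placeholder image

# Edge detection, then contour extraction (OpenCV 4.x return signature).
edges = cv2.Canny(gray, 100, 200)                     # thresholds are starting guesses
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Describe the largest contour with Hu moments, a scale/rotation-tolerant shape descriptor.
largest = max(contours, key=cv2.contourArea)
hu = cv2.HuMoments(cv2.moments(largest)).flatten()
print(hu)
```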
I'm sorry: there is no shortcut here. You need to do your homework in order to make that one work. But don't let that scare you. It's a great project. Good luck!