I want to train SegNet in Caffe on my own image data.
As a first step, we need to label our own images like these:
I have searched GitHub but cannot find anything. So my question is: does anyone know of a tool that can create semantically labeled images?
Check out a tool called sloth: https://github.com/cvhciKIT/sloth. It is an open-source tool written in Python with PyQt for creating ground truth computer vision datasets for a wide array of applications, such as semantically labeled data like yours above.
If you don't like sloth, you can use any image editing software, such as GIMP, where you would make one layer per label and use polygons and flood fill in a different hue for each label to create your data. You would then merge all of the layers together into a final image that you would use for your purposes.
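If you go the GIMP route and export each label layer as its own black-and-white mask, a few lines of NumPy can merge them into one single-channel label image in which each pixel holds a class index. As far as I know, that is the layout SegNet's Caffe setup expects (the CamVid annotations in the SegNet tutorial look like this), but treat it as an assumption and check against your own data layer. Using 255 for "unlabelled" is an arbitrary choice in this sketch.

```python
import numpy as np

def merge_label_masks(masks, ignore_label=255):
    """Combine per-class masks into one single-channel label image.

    masks: list of HxW arrays (bool or 0/255), one per class; the class
    index is the mask's position in the list. Later masks overwrite
    earlier ones where they overlap, like stacked layers in GIMP.
    Pixels covered by no mask get ignore_label.
    """
    if not masks:
        raise ValueError("need at least one mask")
    height, width = np.asarray(masks[0]).shape
    label = np.full((height, width), ignore_label, dtype=np.uint8)
    for class_index, mask in enumerate(masks):
        mask = np.asarray(mask)
        if mask.shape != (height, width):
            raise ValueError("all masks must have the same shape")
        label[mask > 0] = class_index
    return label
```

The result can be saved as a grayscale PNG with any image library (for example Pillow's `Image.fromarray(label).save(...)`).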
However, as user Miki mentioned (see discussion thread below), creating a new dataset from scratch takes a considerable amount of effort. It is highly advisable that you don't do this on your own, as you need a lot of data to ensure your algorithms are performing correctly. You'll need the help of other (hopefully willing) PhD students, preferably people you know personally or who work with you in your lab or workplace, to help manually curate this data for you.
If this isn't an option, you can use a crowdsourcing platform such as Amazon Mechanical Turk, where you outsource the work to willing individuals: you describe the task and pay a small amount per image. This is worth considering if you can't find many people to help you.
All in all, this will take considerable effort, not only in time but also in people, if you want to create a large dataset within a short span of time. I would recommend you simply use established datasets, such as the Cambridge one you referenced, or LabelMe by Antonio Torralba, which Miki suggested: it is not only a toolbox for annotating images from the LabelMe dataset, but it also lets you annotate your own images.
Good luck!
As @rayryeng answered, a tool called sloth is great for doing this task simply. However, if I have more than 20 objects waiting to be classified, sloth is not an ideal tool. So I developed a simple tool called IsLabel that solves this problem with a few algorithms.
Here is the result; using IsLabel it took me just 40 s:
INPUT:
OUTPUT:
I know it's not perfect, but it works fine for me.
I would recommend using https://www.labelbox.io/. They open sourced a lot of their code and have a hosting platform to manage the whole labeling process end to end.
Here is an example of segmentation:
You can also export the labels as masks.
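If your exported masks (or your GIMP output) encode classes as RGB colours rather than class indices, a palette lookup converts them into the index image a segmentation network trains on. The palette below is hypothetical; replace it with the colours your labeling tool actually uses.

```python
import numpy as np

# Hypothetical palette: RGB colour -> class index. Replace with your own.
PALETTE = {
    (128, 128, 128): 0,  # e.g. sky
    (128, 64, 128): 1,   # e.g. road
}

def rgb_to_class_index(rgb, palette=PALETTE, unknown=255):
    """Map an HxWx3 colour-coded mask to an HxW class-index image."""
    rgb = np.asarray(rgb, dtype=np.uint8)
    out = np.full(rgb.shape[:2], unknown, dtype=np.uint8)
    for colour, index in palette.items():
        # True where all three channels equal this palette colour.
        match = np.all(rgb == np.array(colour, dtype=np.uint8), axis=-1)
        out[match] = index
    return out
```

Pixels whose colour is not in the palette come out as `unknown`, which makes unexpected anti-aliased edges easy to spot.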
I'm currently working on a tkinter Python school project whose sole purpose is to generate images from audio files. I'm going to pick audio properties and use them as values to generate unique abstract images, but I don't know which properties I can analyze to extract those values from. So I'm looking for some guidance on which properties (audio frequency, amplitude, etc.) I can extract values from to generate the images with Python.
The question is very broad in its current form.
(Bear in mind audio is not my area of expertise, so do keep an eye out for the opinions of people working in audio/audiovisual/generative fields.)
You can go about it either way: figure out what kind of image(s) you'd like to create from audio and from there figure out which audio features to use, or the other way around: pick an audio feature you'd like to explore, then think about how you'd best or most interestingly represent it visually.
There's a distinction between generating a single image and generating a sequence of images.
For a single image, the simplest thing I can think of is drawing a grid of squares where a visual property of each square (e.g. size, fill colour intensity, etc.) is mapped to the amplitude at that time. The single image would visualise a whole track's amplitude pattern. Even with such a simple example there are many choices you can make: how often you sample, how you lay out the grid (cartesian, polar), and how each amplitude sample is visualised (different shapes, sizes, colours, etc.).
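A minimal sketch of the grid-of-squares idea, assuming you already have mono samples as floats in [-1, 1]: split the track into equal chunks, take each chunk's RMS amplitude, and map it to a square's size. It writes SVG text so it runs without a display; in tkinter you would pass the same coordinates to `Canvas.create_rectangle`. Grid size, cell size and centring the squares are all arbitrary choices.

```python
import math

def chunk_rms(samples, n_chunks):
    """RMS amplitude of n_chunks equal slices (leftover tail samples are ignored)."""
    size = max(1, len(samples) // n_chunks)
    values = []
    for i in range(n_chunks):
        chunk = samples[i * size:(i + 1) * size]
        if not chunk:
            values.append(0.0)
            continue
        values.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return values

def amplitude_grid_svg(samples, cols=8, rows=4, cell=20):
    """One square per chunk; square size follows normalised RMS."""
    levels = chunk_rms(samples, cols * rows)
    peak = max(levels) or 1.0  # avoid dividing by zero for silence
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{cols * cell}" height="{rows * cell}">']
    for i, level in enumerate(levels):
        row, col = divmod(i, cols)  # fill left-to-right, top-to-bottom
        side = cell * level / peak
        x = col * cell + (cell - side) / 2  # centre the square in its cell
        y = row * cell + (cell - side) / 2
        parts.append(f'<rect x="{x:.1f}" y="{y:.1f}" '
                     f'width="{side:.1f}" height="{side:.1f}" fill="black"/>')
    parts.append('</svg>')
    return "\n".join(parts)
```

Writing the returned string to a `.svg` file lets you open it in any browser. Swapping the square size for fill colour, or the cartesian layout for a polar one, only touches the loop body.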
(A similar concept to CinemaRedux, but simpler, for audio only.)
You can look into the field of data visualisation for inspiration.
Information is Beautiful is a great place to start.
If you want to generate a sequence of images, that goes into audiovisual territory (e.g. abstract animation, audio-reactive motion graphics, etc.).
Your question originally had the tag Processing tag, which I removed, however you could be using Processing's Python Mode.
In terms of audio visualisation, one good example I can think of is Robert Hodgin's work; see Magnetosphere and the Audio-generated landscape prototype. He uses frequency analysis (FFT) with a bit of smoothing/data massaging to amplify the elements useful for visualisation and dampen some of the noise.
(There are a few handy audio libraries such as Minim and beads; however, I assume you're interested in using plain Python, not Jython, which is what the official Processing Python mode uses.) Here is an answer on FFT analysis for visualisation (even though it's in Processing Java, the principles can be applied in Python).
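A rough NumPy sketch of the FFT-plus-smoothing idea described above: window each frame, take the magnitude spectrum, average it into a few bands, then apply an exponential moving average across frames so the visuals don't flicker. Frame size, hop, band count and smoothing factor are arbitrary choices here, not anyone's published settings.

```python
import numpy as np

def smoothed_band_energies(signal, frame_size=1024, hop=512,
                           n_bands=8, smoothing=0.8):
    """Return an (n_frames, n_bands) array of smoothed FFT band magnitudes.

    smoothing close to 1 gives slow, calm motion; close to 0 gives jitter.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_size)  # taper frames to reduce spectral leakage
    n_frames = 1 + max(0, len(signal) - frame_size) // hop
    previous = np.zeros(n_bands)
    rows = []
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_size]
        if len(frame) < frame_size:  # zero-pad a short final frame
            frame = np.pad(frame, (0, frame_size - len(frame)))
        spectrum = np.abs(np.fft.rfft(frame * window))
        # Split the spectrum into n_bands contiguous bands and average each.
        bands = np.array([b.mean() for b in np.array_split(spectrum, n_bands)])
        # Exponential moving average across frames dampens frame-to-frame noise.
        previous = smoothing * previous + (1 - smoothing) * bands
        rows.append(previous.copy())
    return np.array(rows)
```

Each row can drive one animation frame (e.g. band value to bar height or colour). The bands here are linearly spaced; logarithmically spaced bands usually match what you hear more closely, so that is a natural next tweak.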
Personally, I've only used pyaudio so far for basic audio tasks. I assume you could use it for amplitude analysis, but for other, more complex tasks you might need something extra.
Doing a quick search, librosa pops up.
If what you want to achieve isn't clear yet, try prototyping first, starting with the simplest audio analysis and visual elements you can think of (e.g. amplitude mapped to boxes over time). Constraints can be great for creativity, and the minimal approach could translate into cleaner, minimal visuals.
You can then look into FFT, MFCCs, onset/beat detection, etc.
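pyaudio is mainly for streaming audio in and out (it wraps PortAudio). For offline analysis of a file, the standard library's `wave` module is enough to get samples for amplitude work. This sketch assumes an uncompressed 16-bit PCM WAV and averages the channels down to mono; other sample widths would need different handling.

```python
import array
import sys
import wave

def read_wav_mono(path):
    """Read a 16-bit PCM WAV (path or file object); return (rate, floats in [-1, 1])."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    # Average interleaved channels down to mono and scale to [-1, 1].
    mono = [sum(samples[i:i + channels]) / (channels * 32768.0)
            for i in range(0, len(samples), channels)]
    return rate, mono
```

The returned list can go straight into an amplitude-to-shape mapping such as the grid of squares described earlier.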
Another tool that could be useful for prototyping is Sonic Visualiser.
You can open a track and use some of the built-in feature extractors.
(You can even get away with exporting XML or CSV data from Sonic Visualiser, which you can load/parse in Python and use to render image(s).)
It uses a plugin system called Vamp plugins (similar to VST plugins in DAWs like Ableton Live, Apple Logic, etc.). You can then use the VampPy Python wrapper if you need the data at runtime.
(You might also want to draw inspiration from other languages used for audiovisual artworks, such as Pure Data + GEM, Max/MSP + Jitter, VVVV, etc.)
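A small sketch of the CSV route. It assumes the exported layer is a CSV whose first column is time in seconds and whose second column is the feature value; depending on the layer type there may be extra columns (which are ignored here), so check your own export before relying on it.

```python
import csv
import io

def load_time_values(text):
    """Parse 'time,value' rows (extra columns ignored) into two lists.

    Assumes time in seconds in column 0 and the feature value in column 1.
    """
    times, values = [], []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue
        try:
            t, v = float(row[0]), float(row[1])
        except ValueError:  # skip a header line or other non-numeric rows
            continue
        times.append(t)
        values.append(v)
    return times, values

def normalise(values):
    """Rescale values to [0, 1] so they can drive sizes or colours."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # a constant series maps to all zeros
    return [(v - lo) / span for v in values]
```

Typical use: `times, values = load_time_values(open("export.csv").read())`, then map `normalise(values)` onto whatever visual property you picked.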
Time domain: zero-crossing rate, root mean square energy, etc. Frequency domain: spectral bandwidth, flux, rolloff, flatness, MFCCs, etc. Also tempo. You can use librosa for Python (https://librosa.org/doc/latest/index.html) to extract these from a .wav file; it implements the Fast Fourier Transform and framing for you. You can then apply statistics such as the mean and standard deviation to each of these feature vectors across the whole audio file.
To provide an additional avenue for exploration: there are also tools to explore this qualitatively (as opposed to quantitatively, using metrics derived from the audio signal as suggested in the great answers above).
As you mention, the objective is to generate unique abstract images from sound. I would suggest that an interesting angle may be to apply some machine learning techniques and derive mood classification predictions from the source audio.
For instance, you could use the TensorFlow models in Essentia to predict the mood of the track and associate images you select with the mood scores generated. I would suggest going well beyond this and using the tkinter image creation tools to create your own mappings to mood. Use pen and paper to develop your mapping strategy: are certain moods more angular or circular? What colour mappings will you select, and why? You have a great deal of freedom in creating these mappings, so start simple, as complexity builds naturally.
Using some simple mood predictions may be more useful for you as someone with more qualitative experience of sound than quantitative experience as an audio engineer. I think it may be worth making this central to the report you write, documenting your mapping decisions and design process, if a report is a requirement of the task.
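librosa has ready-made versions of these (for example `librosa.feature.zero_crossing_rate`, `librosa.feature.rms`, `librosa.feature.spectral_centroid` and `librosa.feature.spectral_rolloff`). To show what is actually being computed, here is a plain NumPy sketch of four frame-level features summarised by mean and standard deviation. Its framing and padding differ from librosa's defaults, so the numbers will not match librosa exactly.

```python
import numpy as np

def frame_signal(x, frame_size, hop):
    """Split a 1-D signal into overlapping frames (drops the ragged tail)."""
    if len(x) < frame_size:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_size) // hop
    return np.stack([x[i * hop:i * hop + frame_size] for i in range(n_frames)])

def summary_features(x, sr, frame_size=2048, hop=512, rolloff_pct=0.85):
    """Per-frame ZCR, RMS, centroid and rolloff, summarised as (mean, std)."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_size, hop)
    # Zero-crossing rate: fraction of neighbouring samples that change sign.
    signs = np.signbit(frames).astype(int)
    zcr = np.mean(np.abs(np.diff(signs, axis=1)), axis=1)
    # Root mean square energy of each frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Magnitude spectrum of each Hann-windowed frame.
    mag = np.abs(np.fft.rfft(frames * np.hanning(frame_size), axis=1))
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sr)
    total = np.maximum(mag.sum(axis=1), 1e-12)  # avoid division by zero
    # Spectral centroid: magnitude-weighted mean frequency.
    centroid = (mag @ freqs) / total
    # Spectral rolloff: lowest frequency below which rolloff_pct of the
    # summed magnitude lies.
    cumulative = np.cumsum(mag, axis=1)
    rolloff = freqs[np.argmax(cumulative >= rolloff_pct * total[:, None], axis=1)]
    features = {"zcr": zcr, "rms": rms, "centroid": centroid, "rolloff": rolloff}
    return {name: (float(v.mean()), float(v.std())) for name, v in features.items()}
```

The resulting handful of numbers per track (e.g. mean centroid for "brightness", RMS standard deviation for "dynamics") can then be mapped onto image parameters.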
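Once you have mood scores, the mapping itself can be a small, explicit function, which also makes it easy to document in a report. In this sketch the mood names (`happy`, `aggressive`, `relaxed`), the assumption that scores lie in [0, 1], and every mapping rule are hypothetical choices for illustration, not anything Essentia prescribes.

```python
import colorsys
import math

def mood_to_style(scores):
    """Turn mood scores in [0, 1] into drawing parameters (illustrative rules)."""
    happy = scores.get("happy", 0.0)
    aggressive = scores.get("aggressive", 0.0)
    relaxed = scores.get("relaxed", 0.0)
    # Hue: happier moves towards warm orange, otherwise towards blue.
    hue = 0.6 - 0.5 * happy
    # Saturation rises with aggression.
    r, g, b = colorsys.hsv_to_rgb(hue, 0.5 + 0.5 * aggressive, 0.9)
    colour = "#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255))
    # Angular vs round: aggressive means few vertices, relaxed means many.
    vertices = max(3, round(3 + 30 * relaxed - 2 * aggressive))
    return {"fill": colour, "vertices": vertices}

def polygon_points(cx, cy, radius, vertices):
    """Flat [x0, y0, x1, y1, ...] list, the form tkinter's create_polygon accepts."""
    points = []
    for k in range(vertices):
        angle = 2 * math.pi * k / vertices
        points += [cx + radius * math.cos(angle), cy + radius * math.sin(angle)]
    return points
```

With a tkinter `Canvas`, `canvas.create_polygon(*polygon_points(100, 100, 50, style["vertices"]), fill=style["fill"])` would draw one shape per analysed track or section.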
Let's say I have a set of images of passports. I am working on a project where I have to identify the name on each passport and eventually transform that object into text.
For the very first part, labeling (or is it classification? I'm a beginner here) where the name is on each passport, how would I go about that?
What techniques / software can I use to accomplish this?
Answers in great detail or any links would be great. I'm trying to figure out exactly how this is done so I can begin coding.
I know training a model is possibly involved, but I'm just not sure.
I'm using Python if that matters.
thanks
There are two routes you can take: one where you have labeled data (or want to label data yourself), and one where you don't.
Let's start with the latter. Say you have an image of a passport. You want to detect where the text in the image is, and what that text says. You can achieve this using a library called pytesseract, a Python wrapper for the Tesseract OCR engine, which does exactly this for you. It works well because it has been trained on a lot of other images, so it's good at detecting text in any image.
If you have labels, you might be able to improve on the model you get from pytesseract, but this is a lot harder. If you want to learn it anyway, I would recommend learning TensorFlow and using "transfer learning" to improve your model.
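A minimal sketch of the pytesseract route, assuming pytesseract, Pillow and the Tesseract binary are installed. `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` returns parallel lists (`text`, `conf`, `left`, `top`, `width`, `height`, ...), and the helper keeps the words Tesseract is reasonably sure of together with their bounding boxes. The 60 confidence threshold is an arbitrary choice, and deciding which of these words is the name is a separate step this does not solve.

```python
def confident_words(data, min_conf=60.0):
    """Keep (text, (left, top, width, height)) for confidently read words.

    `data` is the dict from pytesseract.image_to_data(...,
    output_type=pytesseract.Output.DICT): parallel lists keyed by column.
    """
    words = []
    for i, text in enumerate(data["text"]):
        text = text.strip()
        # Non-word rows (pages, blocks, lines) have empty text and conf -1.
        if not text or float(data["conf"][i]) < min_conf:
            continue
        box = (data["left"][i], data["top"][i], data["width"][i], data["height"][i])
        words.append((text, box))
    return words

def ocr_words(image_path, min_conf=60.0):
    """Run Tesseract on an image file and return its confident words."""
    import pytesseract  # imported here so confident_words works without it
    from PIL import Image

    data = pytesseract.image_to_data(Image.open(image_path),
                                     output_type=pytesseract.Output.DICT)
    return confident_words(data, min_conf)
```

The boxes tell you where each word sits on the page, which is a useful starting point for locating the name field.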
This is a fairly straightforward question, but I am new to the field. Using this tutorial, I have a great way of detecting certain patterns or features. However, the images I'm testing are large, and the feature I'm looking for often occupies only a small fraction of the image. When I run it on the entire picture, the classification is bad, but when the image is zoomed in and cropped, the classification is good.
I've considered writing a script that breaks an image into many smaller images and runs the test on all of them (time isn't a huge concern). However, this still seems inefficient and not ideal. I'm looking for suggestions for the best, but also easiest to implement, solution.
I'm using Python.
This may seem to be a simple question, which it is, but the answer is not so simple. Localisation is a difficult task and requires much more legwork than classifying an entire image. People have experimented with a number of different tools and models. Some models include R-CNN, which looks at many regions in a manner not too dissimilar to what you suggested. Alternatively, you could look at a model such as YOLO or TensorBox.
There is no one answer to this, and this gets asked a lot! For example: Does Convolutional Neural Network possess localization abilities on images?
The term you want to look for in research papers is "localization". If you are looking for a quick-and-dirty solution (and speed isn't critical), then a sliding window is definitely a good first step. I hope this gets you going on your project and you can progress from there.
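A minimal sliding-window sketch. `classify` is a placeholder for whatever predict function your trained model exposes (a crop goes in, a score comes out); window size and step are arbitrary. If the feature can appear at different sizes, the usual extension is to repeat this over a few rescaled copies of the image.

```python
import numpy as np

def sliding_windows(image, window=(64, 64), step=32):
    """Yield (top, left, crop) for every window position over an HxW(xC) image."""
    h, w = image.shape[:2]
    win_h, win_w = window
    for top in range(0, h - win_h + 1, step):
        for left in range(0, w - win_w + 1, step):
            yield top, left, image[top:top + win_h, left:left + win_w]

def best_window(image, classify, window=(64, 64), step=32):
    """Score every crop with `classify` and return (score, top, left) of the best.

    Returns None if the image is smaller than the window.
    """
    best = None
    for top, left, crop in sliding_windows(image, window, step):
        score = classify(crop)
        if best is None or score > best[0]:
            best = (score, top, left)
    return best
```

Keeping every window above a threshold instead of only the best one gives you multiple detections, at which point you would typically merge overlapping boxes (non-maximum suppression).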
Suppose I have an image of a car taken with my mobile camera and another image of the same car downloaded from the internet.
(For simplicity please assume that both the images contain the same side view projection of the same car.)
How can I detect that both the images are representing the same object i.e. the car, in this case, using OpenCV?
I've tried template matching, feature matching (ORB), etc., but they are not providing satisfactory results.
SIFT feature matching might produce better results than ORB. However, the main problem here is that you have only one image of each type (one from the mobile camera and one from the Internet). If you have a large number of images of this car model, you can train a machine learning system using those images. Later you can submit one image of the car to the machine learning system, and there is a much higher chance of it recognizing the car.
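With OpenCV 4.4 or later, SIFT matching is typically `cv2.SIFT_create()`, then `detectAndCompute` on both images, then `cv2.BFMatcher().knnMatch(des1, des2, k=2)`, keeping a match only if its distance is clearly smaller than the second-best one (Lowe's ratio test). To show that filtering logic without OpenCV, here is a brute-force NumPy version operating on descriptor arrays; 0.75 is a commonly used ratio, not a universal constant.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Brute-force nearest-neighbour matching with Lowe's ratio test.

    desc_a: (Na, D), desc_b: (Nb, D) descriptors (e.g. 128-D SIFT), Nb >= 2.
    Returns (index_in_a, index_in_b) pairs whose best match is clearly
    better than the second best.
    """
    a = np.asarray(desc_a, dtype=float)
    b = np.asarray(desc_b, dtype=float)
    # Pairwise Euclidean distances, shape (Na, Nb).
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    nearest_two = np.argsort(dists, axis=1)[:, :2]
    matches = []
    for i, (j1, j2) in enumerate(nearest_two):
        if dists[i, j1] < ratio * dists[i, j2]:
            matches.append((i, int(j1)))
    return matches
```

The number of surviving matches (or, better, the number that agree with a single homography found by RANSAC) can serve as a rough "same object" score, though with only one reference image it will still be fragile.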
From a machine learning point of view, using only one image as the master and matching another with it is analogous to teaching a child the letter "A" using only one handwritten letter "A", and expecting him/her to recognize any handwritten letter "A" written by anyone.
Think about how you can mathematically describe the car's features so that every car is different. Maybe every car has a different size of wheels? Maybe the distance from the door handle to the bottom of the side window is a unique characteristic of every car? Maybe the ratio of the front side window's width to the rear side window's width is an individual feature of that car?
You probably can't answer yes with 100% confidence to any of these questions. But what you can do is combine them into a multidimensional feature vector and perform classification.
Now, the crucial part here is that since you're doing manual feature description, you need to do excellent work and test every step of the way. For example, you need to design features that are scale- and perspective-invariant. Here, I'd recommend reading about how face detection was designed to fulfil that requirement.
Will machine learning be a better solution? It depends greatly on two things: first, what kind of data you are planning to throw at the algorithm; second, how well you can control the process.
What most people don't realize today is that machine learning is not a magical solution to every problem. It is a tool, and like every tool it needs proper handling to provide results. If I were to give you advice, I'd say you will not handle it very well yet.
My suggestion: get acquainted with basic feature extraction and general image processing algorithms: edge detection (Canny, Sobel), contour finding, shape description, the Hough transform, morphological operations, masking, etc. Without those at your fingertips, I'd say that in this particular case even machine learning will not save you.
I'm sorry: there is no shortcut here. You need to do your homework in order to make this work. But don't let that scare you. It's a great project. Good luck!
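A toy sketch of the feature-vector idea. The measurement names are hypothetical pixel measurements you would extract yourself; dividing them by a reference length (here a hypothetical wheelbase measurement) cancels out image scale, though it does nothing for perspective distortion. Classification is plain nearest neighbour against cars you have already measured.

```python
import math

def ratio_features(m):
    """Scale-invariant features from hypothetical side-view measurements (pixels)."""
    base = m["wheelbase"]
    return [
        m["wheel_diameter"] / base,
        m["handle_to_window"] / base,
        m["front_window_width"] / m["rear_window_width"],
    ]

def nearest_car(query, known):
    """Return the label whose feature vector is closest (Euclidean) to the query's."""
    q = ratio_features(query)
    return min(known, key=lambda label: math.dist(q, ratio_features(known[label])))
```

In practice you would also standardise each dimension (so one ratio does not dominate the distance) and test how stable each ratio is across photos of the same car before trusting it.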
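OpenCV provides these directly (`cv2.Sobel`, `cv2.Canny`, `cv2.findContours`, and so on), but it helps to see how simple the core of edge detection is. This NumPy sketch computes the Sobel gradient magnitude of a grayscale image, padding the border by repeating edge pixels.

```python
import numpy as np

def sobel_magnitude(gray):
    """Gradient magnitude of a 2-D grayscale image via 3x3 Sobel kernels."""
    g = np.asarray(gray, dtype=float)
    p = np.pad(g, 1, mode="edge")  # repeat border pixels so output keeps HxW
    # Horizontal derivative: right column minus left column, centre row weighted 2.
    gx = ((p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:])
          - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]))
    # Vertical derivative: bottom row minus top row, centre column weighted 2.
    gy = ((p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:])
          - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:]))
    return np.hypot(gx, gy)
```

Thresholding this magnitude gives a crude edge map; Canny adds non-maximum suppression and hysteresis thresholding on top of the same gradients to get thin, connected edges.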
So, I use SPM to register fMRI brain images within the same patient; however, I am having trouble registering images across patients.
Essentially, I want to register a brain atlas to a patient-specific scan so that I can do some image patching: register once, then apply that warping and transformation to any number of images.
SPM was unsuccessful in such a registration. It cannot warp the atlas to be in the same brain shape as the patient brain.
Would software such as FreeSurfer be good for this? Or is there something better out there in either MATLAB or Python (but preferably Python)?
Thanks!
tylerthemiler
There are plenty of tools for image registration; e.g. look at http://www.nitrc.org under "Spatial transformation" -> "Registration". Nipype is indeed a nice Python module that wraps many of those (e.g. FSL, FreeSurfer, etc.), so you could explore the different available tools through a somewhat unified interface.
Besides the well-known ones (SPM, FSL, AFNI), you could also give a try to the less well-known but very powerful CMTK (http://www.nitrc.org/projects/cmtk), which comes with non-linear registration(s), population-based template construction, many other features, and the SRI24 atlas. A script such as asegment_sri24 could be used for a quick start with registering/reslicing each subject using the labels available in the SRI24 atlas.
To start using CMTK (or dozens of other neuroimaging packages) in a matter of minutes, I would recommend looking at http://neuro.debian.net, a platform that allows very easy deployment of (maintained) neuroscience software. FSL, AFNI, CMTK, the SRI24 atlas, etc. are available there on demand ;)
FreeSurfer segments and annotates the brain in the patient's native space, resulting in patient-specific regions.
I'm not sure what you mean by patching, or to what other images you'd like to apply this transformation, but it seems like the software best suited to working with individual patient data, rather than data normalized across patients.
I think ITK is made for this kind of purpose. A Python wrapper exists (Paul Novotny distributes binaries for Ubuntu on his site), but it is mainly C++. If you work under Linux, it is quite simple to compile if you are familiar with CMake.
As this toolkit is a very low-level framework, I would advise you to try elastix, a command-line utility that performs multiscale B-spline dense registration on images.
Another interesting tool, based on Maxwell's demons and improved with diffeomorphic capabilities, is MedINRIA.
Along SPM's lines, you can use IBSPM. It was developed to solve exactly that problem.
You can use ANTs, or you can use Python within 3D Slicer for template registration.
However, I have done many template registrations in SPM, and for fMRI data I recommend it over ITK or Slicer.
I found these links very helpful :) let me know if you need more help.
https://fmri-training-course.psych.lsa.umich.edu/wp-content/uploads/2017/08/Preprocessing-of-fMRI-Data-in-SPM-12-Lab-1.pdf
https://nipype.readthedocs.io/en/latest/users/examples/fmri_spm.html