I am using OpenCV to exposure-fuse bracketed images. I started from this article: https://learnopencv.com/exposure-fusion-using-opencv-cpp-python/
In that article, AlignMTB is used for alignment and MergeMertens for the exposure fusion.
The relevant part of my code is:
MAXBITS = 9
EXCLUDE_RANGE = 3
CUT = True

images = []
print("Aligning images using AlignMTB ... ")
for filename in file_list:
    print("alignMTB: reading image " + filename)
    im = cv2.imread(filename)
    images.append(im)

alignMTB = cv2.createAlignMTB(MAXBITS, EXCLUDE_RANGE, CUT)
alignMTB.process(images, images)

# Merge using Exposure Fusion
print("\nMerging using Exposure Fusion ... ")
mergeMertens = cv2.createMergeMertens()
exposureFusion = mergeMertens.process(images)
(The parameter constants are in the top of my script, but for clarity I put them here in the copied code part)
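For completeness: after this I convert the float output of MergeMertens (values roughly in the 0-1 range) to 8-bit and save it, roughly the way the linked article shows:

import numpy as np

# MergeMertens returns float32 values roughly in [0, 1]; scale and clip to save
fused_8bit = np.clip(exposureFusion * 255, 0, 255).astype('uint8')
cv2.imwrite('exposure-fusion.jpg', fused_8bit)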
However, the alignment of my hand-held images is pretty bad. I used to use align_image_stack and enfuse, and align_image_stack always aligns the images (much) better than alignMTB.
Now, createAlignMTB(MAXBITS, EXCLUDE_RANGE, CUT) accepts three optional parameters, but at first I simply used alignMTB = cv2.createAlignMTB() without any. Since the alignment wasn't as good as align_image_stack (sometimes it failed entirely), I started experimenting with the parameters. But whatever values I use, nothing changes: the fused images are identical, with no improvement or worsening of the alignment.
I tried this with opencv-python 4.6.0.66 and now 4.7.0.68.
When I try the same with HDR merging using createCalibrateDebevec or one of the others, the alignment from alignMTB is just as poor. It is really the alignMTB step.
Is there something I'm doing wrong or fundamentally misunderstanding?
EDIT: I also use ORB (preferably) and ECC to simply stack images to reduce noise. ORB does a great job at alignment and is fast, but I haven't been able to combine it with the merging of images. Is there a way to use ORB as the alignment step?
EDIT 2: I can now use ORB (and ECC and SIFT) to align the images pairwise against a reference image, save each aligned image to a tmp folder, and then merge them all. Compared to the total process (10-40 seconds for ORB), the saving/loading of the tmp files only takes 0.1-2 seconds, depending on the number of images. So far, so good.
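For reference, the pairwise ORB alignment I use before merging looks roughly like this (a sketch; the function name, feature count and keep fraction are just my choices, and the aligned frames could equally well be kept in memory instead of a tmp folder):

import cv2
import numpy as np

def align_orb(image, reference, max_features=5000, keep_fraction=0.2):
    # detect ORB keypoints and descriptors in both images
    orb = cv2.ORB_create(max_features)
    kp1, des1 = orb.detectAndCompute(image, None)
    kp2, des2 = orb.detectAndCompute(reference, None)

    # match the binary descriptors with Hamming distance, keep the best matches
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    matches = matches[:int(len(matches) * keep_fraction)]

    # estimate a homography with RANSAC and warp onto the reference frame
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC)
    h, w = reference.shape[:2]
    return cv2.warpPerspective(image, H, (w, h))

# align everything to the first frame, then exposure-fuse
aligned = [images[0]] + [align_orb(im, images[0]) for im in images[1:]]
exposureFusion = cv2.createMergeMertens().process(aligned)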
Still I would very much like to know how I can improve the alignMTB results.
Related
I have some kind of alignment task to do. In the process, I need to extract descriptors and keypoints.
I'm using the following simple code for 2 images that are almost identical, with the same shape:
orb = cv2.ORB_create(maxFeatures)
(kpsA, descsA) = orb.detectAndCompute(image, None)
(kpsB, descsB) = orb.detectAndCompute(template, None)
ORB fails with the image on the left but works fine with the one on the right.
The returned (kpsA, descsA) are fine, but len(kpsB) == 0 and descsB is None, and I can't find the reason for that.
As mentioned in the comments, ORB fails to detect any features on the left image and probably only finds a few features on the right one.
Instead, consider doing image alignment / image registration with a method that is not feature-based. Have a look at dense optical flow algorithms such as cv::optflow::DenseRLOFOpticalFlow; a rough sketch follows below.
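A minimal sketch of dense RLOF flow followed by a remap to warp one image onto the other. This assumes opencv-contrib-python is installed and uses cv2.optflow.calcOpticalFlowDenseRLOF, which needs 3-channel 8-bit input; the filenames are placeholders:

import cv2
import numpy as np

ref = cv2.imread('template.jpg')   # reference image (placeholder name)
img = cv2.imread('image.jpg')      # image to align (placeholder name)

# dense flow from ref to img (needs 3-channel 8-bit images)
flow = cv2.optflow.calcOpticalFlowDenseRLOF(ref, img, None)

# build a remap grid that pulls img back onto ref: aligned(p) = img(p + flow(p))
h, w = flow.shape[:2]
grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
map_x = (grid_x + flow[..., 0]).astype(np.float32)
map_y = (grid_y + flow[..., 1]).astype(np.float32)
aligned = cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)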
With that being said, your task looks challenging. Even humans will have difficulties solving it well. Good luck.
It's been a year since you asked the question, but I want to suggest you try this. A plausible reason you see None for some images is that the default threshold values in the function are too high. So just play with the fastThreshold and edgeThreshold params and see if it works for you.
A good sanity check will be to set
orb = cv2.ORB_create(fastThreshold=0, edgeThreshold=0)
and see what happens.
Next, you can choose whether to ignore these images or to try a smaller threshold.
The full explanation of the function params is here in the opencv doc.
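As a small illustration (the filename and threshold pairs here are just placeholders), you can sweep the two thresholds and watch how many keypoints come back:

import cv2

image = cv2.imread('problem_image.png', cv2.IMREAD_GRAYSCALE)  # placeholder filename

# start fully permissive (0, 0) and tighten towards the defaults (20, 31)
for fast_t, edge_t in [(0, 0), (5, 10), (10, 20), (20, 31)]:
    orb = cv2.ORB_create(nfeatures=500, fastThreshold=fast_t, edgeThreshold=edge_t)
    kps, descs = orb.detectAndCompute(image, None)
    print(f"fastThreshold={fast_t}, edgeThreshold={edge_t}: {len(kps)} keypoints")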
I am trying to use MRI brain imaging data for a deep learning model. Currently my image has 4 dimensions, as shown below, but I would like to retain only the T1c modality of the MRI because my model input should be single-channel 3D MRIs (T1c).
I did try to make use of the Nibabel package as shown below
import glob
import nibabel as nib

ff = glob.glob(r'imagesTr\*')
a = nib.load(ff[0])
a.shape
This returns the below output
I am also pasting the header info of 'a'
From this, which dimension identifies the MRI modality (T1, T2, T1c, FLAIR, etc.), and how can I retain only T1c? Can you please help?
First you need to identify the order in which the modalities are stored along the 4th dimension.
The header will probably help:
print(a.header)
Next, to keep only 1 modality you can use this:
data = a.get_fdata()
modality_1 = data[:,:,:,0]
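If you then want to keep only that modality on disk as a single-channel 3D NIfTI, a short sketch (the output filename is just an example) reusing the original affine:

import nibabel as nib

# wrap the extracted 3D volume in a new NIfTI image, reusing the original affine
img_3d = nib.Nifti1Image(modality_1, a.affine)
nib.save(img_3d, 'subject_T1c.nii.gz')  # example output name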
EDIT 1:
Based on the website of the challenge:
All BraTS multimodal scans are available as NIfTI files (.nii.gz) and
describe a) native (T1) and b) post-contrast T1-weighted (T1Gd), c)
T2-weighted (T2), and d) T2 Fluid Attenuated Inversion Recovery
(FLAIR) volumes, and were acquired with different clinical protocols
and various scanners from multiple (n=19) institutions, mentioned as
data contributors here.
and
The provided data are distributed after their pre-processing, i.e.
co-registered to the same anatomical template, interpolated to the
same resolution (1 mm^3) and skull-stripped.
So the header will not help in this case (equal dimensions for all modalities due to preprocessing).
If you are looking for the post-contrast T1-weighted (T1Gd) volume, it is the 2nd entry along the 4th dimension (index 1), so use:
data = a.get_fdata()
modality_1 = data[:,:,:,1]
Additionally, we can visualize each 3D volume (data[:,:,:,0], data[:,:,:,1], data[:,:,:,2], data[:,:,:,3]) and verify this.
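For example, a quick matplotlib sketch that shows the middle axial slice of each of the four volumes side by side:

import matplotlib.pyplot as plt

data = a.get_fdata()
z = data.shape[2] // 2  # middle axial slice

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for i, ax in enumerate(axes):
    ax.imshow(data[:, :, z, i].T, cmap='gray', origin='lower')
    ax.set_title(f'volume {i}')
    ax.axis('off')
plt.show()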
See here: https://gofile.io/?c=fhoZTu
It's not possible to identify the type of MRI from the Nifti header. You would need the original DICOM images to derive this type of information.
You can, however, visually check your images and compare the contrast/colour of the white matter, grey matter and ventricles to figure out if your image is T1, T2, FLAIR, etc. For instance in a T1-image you would expect darker grey matter, lighter white matter and black CSF. In a T2 image you would expect lighter grey matter, darker white matter and white CSF. A FLAIR is the same as T2 but with 'inverted' CSF.
See some example brain images here: https://casemed.case.edu/clerkships/neurology/Web%20Neurorad/t1t2flairbrain.jpg
That being said, you seem to have a 4-dimensional image, which suggests some sort of time series, so I would assume your data is DTI or fMRI or something like it.
It's also not possible to transform one type of MRI into another, so if your data set is not already T1, then there is no way to use it in a model that expects clean T1 data.
I would strongly encourage you to learn more about MRI and the type of data you are working with. Otherwise it will be impossible to interpret your results.
Imagine someone taking a burst shot with a camera: they will have multiple images, but since no tripod or stand was used, the images will be slightly different.
How can I align them so that they overlay neatly, and then crop out the edges?
I have searched a lot, but most of the solutions either do a 3D reconstruction or use MATLAB.
e.g. https://github.com/royshil/SfM-Toy-Library
Since I'm very new to OpenCV, I would prefer an easy-to-implement solution.
I have generated many datasets by manually rotating and cropping images in MS Paint, but any link to corresponding datasets (slightly rotated and translated images) will also be helpful.
EDIT: I found a solution here:
http://www.codeproject.com/Articles/24809/Image-Alignment-Algorithms
which gives close approximations to the rotation and translation vectors.
How can I do better than this?
It depends on what you mean by "better" (accuracy, speed, low memory requirements, etc.). One classic approach is to align each frame #i (with i ≥ 2) with the first frame, as follows:
1) Local feature detection, for instance via SIFT or SURF (link)
2) Descriptor extraction (link)
3) Descriptor matching (link)
4) Alignment estimation via perspective transformation (link)
5) Transform image #i to match image #1 using the estimated transformation (link)
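A rough Python sketch of those five steps (assuming the frames are already loaded with cv2.imread; SIFT is included in recent opencv-python builds, otherwise use the contrib package):

import cv2
import numpy as np

def align_to_first(frame, first):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(frame, None)   # features in frame #i
    kp2, des2 = sift.detectAndCompute(first, None)   # features in frame #1

    # match descriptors and keep the good ones (Lowe's ratio test)
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    # estimate a perspective transformation from the matched points
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # warp frame #i so it overlays frame #1 (crop the borders afterwards)
    h, w = first.shape[:2]
    return cv2.warpPerspective(frame, H, (w, h))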
I have been experimenting with PyTesser for the past couple of hours and it is a really nice tool. Couple of things I noticed about the accuracy of PyTesser:
1) File with icons, images and text - 5-10% accurate
2) File with only text (images and icons erased) - 50-60% accurate
3) File with stretching (and this is the best part) - stretching the file from 2) above along the x or y axis increased the accuracy by 10-20%

So apparently PyTesser does not take care of font size or image stretching. Although there is much theory to be read about image processing and OCR, are there any standard image-cleanup procedures (apart from erasing icons and images) that need to be done before applying PyTesser or other libraries, irrespective of the language?
EDIT: Wow, this post is quite old now. I started my research on OCR again these last couple of days. This time I ditched PyTesser and used the Tesseract engine with ImageMagick instead. Coming straight to the point, this is what I found:
1) You can increase the resolution with ImageMagick (there are a bunch of simple shell commands you can use).
2) After increasing the resolution, the accuracy went up by 80-90%.

So the Tesseract engine is without doubt the best open-source OCR engine on the market. No prior image cleaning was required here. The caveat is that it does not work on files with a lot of embedded images, and I couldn't figure out a way to train Tesseract to ignore them. Also, the text layout and formatting in the image make a big difference. It works great with images that contain just text. Hope this helped.
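In Python terms, the same idea (upscale first, then OCR) can be sketched with Pillow and the pytesseract wrapper; both of those are my assumptions here, since I actually ran ImageMagick and tesseract from the shell:

from PIL import Image
import pytesseract

img = Image.open('scan.png')  # placeholder input

# upscale 3x with a high-quality resampling filter before running OCR
big = img.resize((img.width * 3, img.height * 3), Image.LANCZOS)
print(pytesseract.image_to_string(big))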
As it turns out, tesseract wiki has an article that answers this question in best way I can imagine:
Illustrated guide about "Improving the quality of the [OCR] output".
Question "image processing to improve tesseract OCR accuracy" may also be of interest.
(initial answer, just for the record)
I haven't used PyTesser, but I have done some experiments with tesseract (version: 3.02.02).
If you invoke tesseract on a colour image, it first applies global Otsu's method to binarize it, and then the actual character recognition is run on the binary (black and white) image.
(Example image comparing global and local Otsu thresholding, from: http://scikit-image.org/docs/dev/auto_examples/plot_local_otsu.html)
As can be seen, global Otsu may not always produce a desirable result.
The best way to understand what tesseract 'sees' is to apply Otsu's method to your image yourself and then look at the resulting image.
In conclusion: the most straightforward way to improve the recognition rate is to binarize the images yourself (most likely you will have to find a good threshold by trial and error) and then pass those binarized images to tesseract.
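For example, a minimal Otsu binarization with OpenCV, after which you can inspect the intermediate image and feed it to tesseract (the pytesseract wrapper is just one convenient way to invoke it; the filename is a placeholder):

import cv2
import pytesseract

gray = cv2.imread('page.png', cv2.IMREAD_GRAYSCALE)  # placeholder filename

# Otsu picks the threshold automatically; swap in a fixed value found by trial and error if needed
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imwrite('page_bw.png', binary)  # inspect this to see what tesseract will 'see'
print(pytesseract.image_to_string(binary))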
Somebody was kind enough to publish API docs for tesseract, so it is possible to verify the statements above about the processing pipeline: ProcessPage -> GetThresholdedImage -> ThresholdToPix -> OtsuThresholdRectToPix
Not sure if your intent is for commercial use or not, but this works wonders if you're performing OCR on a bunch of similar images.
http://www.fmwconcepts.com/imagemagick/textcleaner/index.php
(Example images: the original, and the result after pre-processing with the given arguments.)
I know it's not a perfect answer, but I'd like to share a video I saw from PyCon 2013 that might be applicable. It's a little devoid of implementation details, but it might give you some guidance/inspiration on how to solve/improve your problem.
Link to Video
Link to Presentation
And if you do decide to use ImageMagick to pre-process your source images a little, here is a question that points you to nice Python bindings for it.
On a side note, quite an important thing with Tesseract: you need to train it, otherwise it won't be nearly as good/accurate as it's capable of being.
I would like to ask you for help. I am a student, and for academic research I'm designing a system where one of the modules is responsible for comparing low-resolution simple images (img, jpg, jpeg, png, gif). However, I need guidance on whether I can write such an implementation in Python and how to get started. Maybe some of you have worked on something like this before and can share your knowledge.
Issue 1 - simple version
The input data must be compared with the pattern (including images), and the output will contain information about the degree of similarity (a percentage) and the pattern image to which the given input is most similar. In this version, the assumption is that the input image is not modified in any way (i.e. not rotated, tilted, etc.).
Issue 2 - difficult version
The input data must be compared with the pattern (including images), and the output will contain information about the degree of similarity (a percentage) and the pattern image to which the given input is most similar. In this version, the assumption is that the input image can be rotated.
Can some of you tell me what I need to do and how to start? I will appreciate any help.
As a starter, you could read in the images using matplotlib or the Python Imaging Library (PIL).
Comparing to a pattern can be done with a cross-correlation, which you can compute using scipy or numpy. As you only have a few pixels, a direct (non-FFT) 2-D correlation such as scipy.signal.correlate2d is fine.
import pylab as P
import numpy as N
from scipy.signal import correlate2d

# read the images and convert to grayscale floats
# (the correlation below does not work directly on colour data)
im1 = P.imread('4Fsjx.jpg').astype(float).mean(axis=2)
im2 = P.imread('xUHhB.jpg').astype(float).mean(axis=2)

# subtract the mean so the correlation peak is meaningful
im1 -= im1.mean()
im2 -= im2.mean()

# do the cross-correlation (direct, no Fourier transforms)
corr = correlate2d(im1, im2, mode='full')

# a measure for similarity then is the height of the correlation peak:
sim = corr.max()
Please note, this is a very quick and dirty approach and you should spend quite some thought on how to improve it, not even considering the rotation that you mentioned. Anyhow, this code reads in your images and gives you a measure of similarity, although the correlation is computed on grayscale versions of the images, so colour information is discarded. I hope it gives you something to start from.
Here is a start as some pseudo code. I would strongly recommend getting numpy/scipy to help with this.
# read the template images:
import glob
import scipy.misc

files = glob.glob('*.templates')
listOfImages = []
for elem in files:
    imagea = scipy.misc.imread(elem)
    listOfImages.append(imagea)

# read the input/test image
targetImage = scipy.misc.imread(targetImageName)
Now loop through each of the listOfImages and compute the "distance". Note that this is probably the hardest part: how will you decide if two images are similar? Using direct pixel comparisons? Using image histograms, or some image alignment metric (this would be useful for your difficult version)? A couple of simple gotchas: I noticed that your uploaded images were of different sizes; if the images differ in size you will have to sweep one over the other. Also, can the images be scaled? Then you will need either a scale-invariant metric or to repeat the sweep over different scales.
# keep track of the min distance
minDistance = Distance(targetImage, listOfImages[0])
minIndex = 0
for index, elem in enumerate(listOfImages):
    currentDistance = Distance(targetImage, elem)
    if currentDistance < minDistance:
        minDistance = currentDistance
        minIndex = index
The distance function is where the challenges are, but I'll leave that for you (a simple example follows below).
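As a naive starting point, here is an example Distance based on comparing grey-level histograms, which also sidesteps the different-sizes problem mentioned above (purely a sketch, assuming 8-bit pixel values):

import numpy as np

def Distance(imageA, imageB):
    # compare normalised grey-level histograms, so the two images
    # do not need to have the same size
    histA, _ = np.histogram(imageA, bins=64, range=(0, 255), density=True)
    histB, _ = np.histogram(imageB, bins=64, range=(0, 255), density=True)
    return np.sum(np.abs(histA - histB))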