Contrast stretching in Python/OpenCV

Searching Google for "Histogram Equalization Python" or "Contrast Stretching Python", I am directed to the same links from the OpenCV Python documentation, which are actually both about equalization and not stretching (IMO).
http://docs.opencv.org/2.4/doc/tutorials/imgproc/histograms/histogram_equalization/histogram_equalization.html
http://docs.opencv.org/3.2.0/d5/daf/tutorial_py_histogram_equalization.html
Reading the documentation, there seems to be some confusion in the wording, as it describes equalization as a stretching operation:
What Histogram Equalization does is to stretch out this range.
AND
So you need to stretch this histogram to either ends (as given in below image, from wikipedia) and that is what Histogram Equalization does (in simple words)
I feel that is wrong, because nowhere on Wikipedia does it say that histogram equalization means stretching, and other sources clearly distinguish the two operations.
http://homepages.inf.ed.ac.uk/rbf/HIPR2/histeq.htm
http://homepages.inf.ed.ac.uk/rbf/HIPR2/stretch.htm
My questions:
Does the OpenCV documentation actually implement Histogram Equalization, while explaining it badly?
Is there any implementation of contrast stretching in Python (OpenCV, etc.)?

OpenCV doesn't have a dedicated function for contrast stretching, and Google yields the same results because histogram equalization does stretch the histogram horizontally; the only difference is the transformation function. (Both methods increase the contrast of the image. The transformation function maps the pixel intensity levels from the given range to the required range.)
Histogram equalization derives the transformation function (TF) automatically from the probability density function (PDF) of the given image, whereas in contrast stretching you specify your own TF based on the application's requirements.
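To make the "TF derived from the PDF" part concrete, here is a minimal NumPy sketch of histogram equalization for an 8-bit grayscale image. The function name equalize_hist and the tiny demo array are made up for illustration; this is the textbook CDF-based mapping, not necessarily the exact code path OpenCV uses internally.

```python
import numpy as np

def equalize_hist(gray):
    """Histogram-equalize an 8-bit grayscale image (illustrative helper)."""
    hist = np.bincount(gray.ravel(), minlength=256)  # pixel count per level
    cdf = hist.cumsum()                              # cumulative distribution
    cdf_min = cdf[cdf > 0][0]                        # CDF at the lowest level present
    total = gray.size
    if total == cdf_min:                             # flat image: nothing to equalize
        return gray.copy()
    # The transformation function is the normalized CDF scaled to 0-255.
    table = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    table = np.clip(table, 0, 255).astype(np.uint8)
    return table[gray]                               # apply the table per pixel

demo = np.array([[50, 100], [150, 200]], dtype=np.uint8)
equalized = equalize_hist(demo)
print(equalized)
```

Note that the table is computed from the image itself; nothing is chosen by the user, which is the key difference from contrast stretching.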
One simple TF through which you can do contrast stretching is min-max contrast stretching:
((pixel - min) / (max - min)) * 255
You do this for each pixel value, with min and max being the minimum and maximum intensities in the image.
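The min-max formula can be sketched in a few lines of NumPy. The helper name minmax_stretch and the demo values are assumptions for illustration only.

```python
import numpy as np

def minmax_stretch(img):
    """Linearly map the image's [min, max] range onto [0, 255]."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:                      # flat image: avoid dividing by zero
        return img.copy()
    out = (img.astype(np.float64) - lo) / (hi - lo) * 255.0
    return np.round(out).astype(np.uint8)

demo = np.array([[60, 80], [100, 160]], dtype=np.uint8)
stretched = minmax_stretch(demo)
print(stretched)
```

If the image already contains both 0 and 255, this mapping leaves it unchanged, which is why percentile-based or custom limits are often used instead.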

You can also use cv2.LUT for contrast stretching by creating a custom lookup table with np.interp. An example is shown below.
import cv2
import numpy as np
img = cv2.imread('messi.jpg')
original = img.copy()
xp = [0, 64, 128, 192, 255]
fp = [0, 16, 128, 240, 255]
x = np.arange(256)
table = np.interp(x, xp, fp).astype('uint8')
img = cv2.LUT(img, table)
cv2.imshow("original", original)
cv2.imshow("Output", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
The table created
[ 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4
4 4 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8
9 9 9 9 10 10 10 10 11 11 11 11 12 12 12 12 13 13
13 13 14 14 14 14 15 15 15 15 16 17 19 21 23 24 26 28
30 31 33 35 37 38 40 42 44 45 47 49 51 52 54 56 58 59
61 63 65 66 68 70 72 73 75 77 79 80 82 84 86 87 89 91
93 94 96 98 100 101 103 105 107 108 110 112 114 115 117 119 121 122
124 126 128 129 131 133 135 136 138 140 142 143 145 147 149 150 152 154
156 157 159 161 163 164 166 168 170 171 173 175 177 178 180 182 184 185
187 189 191 192 194 196 198 199 201 203 205 206 208 210 212 213 215 217
219 220 222 224 226 227 229 231 233 234 236 238 240 240 240 240 240 241
241 241 241 242 242 242 242 243 243 243 243 244 244 244 244 245 245 245
245 245 246 246 246 246 247 247 247 247 248 248 248 248 249 249 249 249
250 250 250 250 250 251 251 251 251 252 252 252 252 253 253 253 253 254
254 254 254 255]
Now cv2.LUT will replace the values of the original image with the values in the table. For example, all the pixels having values 1 will be replaced by 0 and all pixels having values 4 will be replaced by 1.
Original Image
Contrast Stretched Image
The values of xp and fp can be varied to create custom tables as required and it will stretch the contrast even if min and max pixels are 0 and 255 unlike the answer provided by hashcode55.

Python/OpenCV can do contrast stretching via the cv2.normalize() method using min_max normalization. For example:
Input:
#!/bin/python3.7
import cv2
import numpy as np
# read image
img = cv2.imread("zelda3_bm20_cm20.jpg", cv2.IMREAD_COLOR)
# normalize float versions
norm_img1 = cv2.normalize(img, None, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
norm_img2 = cv2.normalize(img, None, alpha=0, beta=1.2, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
# scale to uint8
norm_img1 = (255*norm_img1).astype(np.uint8)
norm_img2 = np.clip(norm_img2, 0, 1)
norm_img2 = (255*norm_img2).astype(np.uint8)
# write normalized output images
cv2.imwrite("zelda1_bm20_cm20_normalize1.jpg",norm_img1)
cv2.imwrite("zelda1_bm20_cm20_normalize2.jpg",norm_img2)
# display input and both output images
cv2.imshow('original',img)
cv2.imshow('normalized1',norm_img1)
cv2.imshow('normalized2',norm_img2)
cv2.waitKey(0)
cv2.destroyAllWindows()
Normalize1:
Normalize2:
You can also do your own stretching with a simple linear equation of the form y=A*x+B: pick 2 pairs of input/output values and solve the two simultaneous equations for A and B. See the concept for stretching shown in "How can I make the gradient appearance of one image equal to the other?"
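As a rough sketch of the y=A*x+B idea, the two equations out_lo = A*in_lo + B and out_hi = A*in_hi + B can be solved directly. The function name linear_stretch, its parameter names, and the demo limits (50 and 200) are hypothetical choices for illustration.

```python
import numpy as np

def linear_stretch(img, in_lo, in_hi, out_lo=0, out_hi=255):
    """Map in_lo -> out_lo and in_hi -> out_hi linearly, clipping the rest."""
    # Solve the two simultaneous equations for the slope A and offset B.
    A = (out_hi - out_lo) / (in_hi - in_lo)
    B = out_lo - A * in_lo
    out = A * img.astype(np.float64) + B
    # Values outside [in_lo, in_hi] land outside 0-255, so clip them.
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

demo = np.array([[30, 50], [110, 220]], dtype=np.uint8)
result = linear_stretch(demo, in_lo=50, in_hi=200)
print(result)
```

Unlike a plain min-max stretch, the input limits here are free parameters, so the stretch still has an effect when the image already spans 0 to 255.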

Ok, so I wrote this function that does Standard Deviation Contrast Stretching, on each band of an image.
For normal distributions, 68% of the observations lie within ±1 standard deviation of the mean, 95.4% of all observations lie within ±2 standard deviations, and 99.73% within ±3 standard deviations.
This is basically a min-max stretch, but the max is mean+sigma*std and the min is mean-sigma*std.
import numpy as np
import matplotlib.pyplot as plt

def stretch(img, sigma=3, plot_hist=False):
    stretched = np.zeros(img.shape)
    for i in range(img.shape[2]):  # loop through the bands
        band = img[:, :, i]  # copy each band into the variable `band`
        if np.min(band) < 0:  # if the min is less than zero, shift all pixels so the min becomes 0
            band = band + np.abs(np.min(band))
        band = band / np.max(band)
        band = band * 255  # convert values to the 0-255 range
        if plot_hist:
            plt.hist(band.ravel(), bins=256)  # plot the histogram before stretching
            plt.show()
            # plt.imshow(band)
            # plt.show()
        std = np.std(band)
        mean = np.mean(band)
        upper = mean + (sigma * std)  # renamed from `max` to avoid shadowing the builtin
        lower = mean - (sigma * std)  # renamed from `min` to avoid shadowing the builtin
        band = (band - lower) / (upper - lower)
        band = band * 255
        # this stretching causes values less than `mean - sigma*std` to become negative
        # and values greater than `mean + sigma*std` to become more than 255,
        # so we clip the values below 0 and above 255
        band[band > 255] = 255
        band[band < 0] = 0
        print('band', i, np.min(band), np.mean(band), np.std(band), np.max(band))
        if plot_hist:
            plt.hist(band.ravel(), bins=256)  # plot the histogram after stretching
            plt.show()
        stretched[:, :, i] = band
    stretched = stretched.astype('int')
    return stretched
In the case above, I didn't need the band ratios to stay the same, but the best practice for an RGB image would be CLAHE, as described here:
https://docs.opencv.org/4.x/d5/daf/tutorial_py_histogram_equalization.html
Unfortunately, CLAHE does not work on multi-band images, so it would have to be applied to each band separately, which gives wrong results: the contrast between the bands is lost and the images tend to look gray. What we need to do instead is transform the image into HSV color space, stretch only V (value, i.e. intensity), and leave the rest alone. This is how we get a good stretch (pun intended).
One thing to watch out for: the function below passes RGB-ordered bands to cv.COLOR_BGR2HSV, so after cv.COLOR_HSV2RGB the bands come back in reverse (BGR) order relative to the input and need to be reversed.
Here's the function I wrote:
import cv2 as cv
import numpy as np

def stack_3_channel(r, g, b, clipLimit=20, tileGridSize=(16, 16)):
    img = np.stack([r, g, b], axis=2)
    img = cv.normalize(img, None, 0, 255, cv.NORM_MINMAX, dtype=cv.CV_8U)
    hsv_img = cv.cvtColor(img, cv.COLOR_BGR2HSV)
    h, s, v = hsv_img[:, :, 0], hsv_img[:, :, 1], hsv_img[:, :, 2]
    clahe = cv.createCLAHE(clipLimit, tileGridSize)
    # stretched histogram for showing the image with better contrast;
    # it's not OK to use it for scientific calculations
    v = clahe.apply(v)
    hsv_img = np.dstack((h, s, v))
    # NOTE: img was RGB-ordered but converted with BGR2HSV, so HSV2RGB
    # yields the bands in BGR order relative to the input
    bgr_stretched = cv.cvtColor(hsv_img, cv.COLOR_HSV2RGB)
    # reverse the bands back to RGB
    rgb_stretched = np.zeros(bgr_stretched.shape)
    rgb_stretched[:, :, 0] = bgr_stretched[:, :, 2]
    rgb_stretched[:, :, 1] = bgr_stretched[:, :, 1]
    rgb_stretched[:, :, 2] = bgr_stretched[:, :, 0]
    # if the values are float, plt will have problems showing them
    rgb_stretched = rgb_stretched.astype('uint8')
    return img, rgb_stretched

Related

[cv2.filter2D() on Python]: why does it return these specific values?

The following Python script computes the 2D convolution of the blue color channel of a .jpg image:
It reads a 6x6 BGR image:
It extracts the channel 0. In cv2 this corresponds to color channel blue.
I print the values of the channel
The input data type is uint8. Therefore, when calling cv2.filter2D() with ddepth=-1, the output will have data type uint8 too, and hence values >255 cannot be represented. So I decided to convert the image from uint8 to, for example, short, to have a wider numeric range and be able to represent the values at the output of the filter.
I define a kernel of size 3x3 (see the values of the kernel in the code below).
I filter the blue channel with the kernel and I obtain a filtered image of the same size due to the padding
The filtered values given by the function filter2D() don't correspond to what I would expect. For example, for the top left value the function returns 449, whereas I would have expected 425, since 71*0+60*0+65*1+69*1+58*3+61*1+89*0+66*0+56*1=425.
Does anyone have any idea about how the filtered image is being calculated by filter2D() function? Is there anything wrong with my proposed calculation?
import cv2
import numpy as np
image = cv2.imread('image.jpg')
# Read image 6x6x3 (BGR)
blue_channel=image[:,:,0]
# Obtain blue channel
print(blue_channel)
# Result is:
#[[71 60 65 71 67 67]
# [69 58 61 69 69 67]
# [89 66 56 55 45 37]
# [65 37 27 32 31 30]
# [46 23 22 38 43 45]
# [55 36 44 60 60 47]]
blue_channel=np.short(blue_channel)
# Convert image from uint8 to short. Otherwise, output of the filter will have the same data type as the
# input when using ddepth=-1 and hence filtered values >255 won't be able to be represented
print(blue_channel)
# Result is (same as before, ok...):
# 71 60 65 71 67 67
# 69 58 61 69 69 67
# 89 66 56 55 45 37
# 65 37 27 32 31 30
# 46 23 22 38 43 45
# 55 36 44 60 60 47
kernel=np.array([ [0, 0, 1], [1, 3, 1], [0, 0, 1] ])
# Kernel is of size 3x3
# [0 0 1]
# [1 3 1]
# [0 0 1]
filtered_image = cv2.filter2D(blue_channel, -1, kernel)
# Blue channel is filtered with the kernel and the result gives:
# 449 438 464 483 473 473
# 449 425 436 449 447 451
# 494 431 390 366 324 301
# 358 281 243 242 237 240
# 257 208 219 270 289 312
# 283 251 304 370 377 347
print(filtered_image)
# Why top left filtered value is 449?
# I would expect this:
# 71*0+60*0+65*1+69*1+58*3+61*1+89*0+66*0+56*1=425
# In short, I would expect 425 instead of 449, how is that 449 computed?
Your calculation is not wrong, but what you have actually written is the filter result for the value at [2,2] (counting from 1), which matches your 425.
To calculate the value at e.g. [1,1], you need values from outside the image, so the surrounding edges have to be handled somehow. By default, filter2D handles them as reflect 101 (BORDER_REFLECT_101), which is the mirror edge handling described on Wikipedia shifted by one pixel, so the edge pixel itself is not repeated.
To understand the difference between mirror (reflect) and reflect 101:
Mirror (reflect)
left edge | image | right edge
| |
b a | a b c | c b
Reflect 101
left edge | image | right edge
| |
c b | a b c | b a
So the calculation for [1,1] with the default edge handling in filter2D would be:
0*58 + 0*69 + 1*58 + 1*60 + 3*71 + 1*60 + 0*58 + 0*69 + 1*58 = 449
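The border arithmetic above can be double-checked without OpenCV. The sketch below assumes a square, odd-sized kernel and slides it over the image without flipping it (filter2D computes correlation); the helper name correlate_reflect101 is made up for illustration. NumPy's "reflect" padding mode produces the same "c b | a b c | b a" pattern as reflect 101.

```python
import numpy as np

blue = np.array([[71, 60, 65, 71, 67, 67],
                 [69, 58, 61, 69, 69, 67],
                 [89, 66, 56, 55, 45, 37],
                 [65, 37, 27, 32, 31, 30],
                 [46, 23, 22, 38, 43, 45],
                 [55, 36, 44, 60, 60, 47]], dtype=np.int16)
kernel = np.array([[0, 0, 1], [1, 3, 1], [0, 0, 1]])

def correlate_reflect101(image, k):
    """Slide k over image (no kernel flip) using reflect-101 padding."""
    pad = k.shape[0] // 2
    # mode="reflect" mirrors around the edge pixel without repeating it.
    padded = np.pad(image, pad, mode="reflect")
    out = np.zeros(image.shape, dtype=np.int64)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            window = padded[r:r + k.shape[0], c:c + k.shape[1]]
            out[r, c] = int((window * k).sum())
    return out

filtered = correlate_reflect101(blue, kernel)
print(filtered[0, 0], filtered[1, 1])
```

Swapping mode="reflect" for mode="symmetric" would give the plain mirror ("b a | a b c | c b") behaviour instead, which changes the border values but not the interior ones.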

Change pixels for improve contrast in picture

I have an input image file with hidden text. The problem is that the difference between the pixel values of the hidden text and the background is really small, sometimes only 1. I want to change the pixels so that I can see this text.
Because I have never worked with anything similar, my idea is to convert the image to a NumPy array and replace the values using a dict:
from PIL import Image
import numpy as np
import matplotlib.image
img = Image.open('4YtCA.jpg')
data = np.array( img, dtype='uint8' )
#print (data)
a = np.ravel(data)
u, c = np.unique(a, return_counts=True)
print (u)
[ 48 49 50 51 77 78 79 80 100 101 102 103 121 122 123 124 142 143
144 145 164 165 166 167 188 189 190 191 208 209 210 211 212 230 231 232
233 253 254 255]
#new values for replace
new = (u.reshape(-1, 4) / [1,2,3,4]).astype(int)
print (new)
[[ 48 24 16 12]
[ 77 39 26 20]
[100 50 34 25]
[121 61 41 31]
[142 71 48 36]
[164 82 55 41]
[188 94 63 47]
[208 104 70 52]
[212 115 77 58]
[233 126 84 63]]
d = dict(zip(u, np.ravel(new)))
#print (d)
#https://stackoverflow.com/a/46868996
indexer = np.array([d.get(i, -1) for i in range(data.min(), data.max() + 1)])
out = indexer[(data - data.min())]
matplotlib.image.imsave('out.png', out.astype(np.uint8))
I think my solution is not nice, because the last values are not seen very well. Is it possible to change the pixels to some different colors like red, green, or purple? Or to change the contrast in some better way? The best would be to change each pixel in some smart way, but I have no idea how.
Input image:
Output image:
You could try a histogram equalisation. I'll just do it with ImageMagick in the Terminal for now to demonstrate:
magick hidden.jpg -equalize -rotate -90 result.png
Or a "Local Adaptive Threshold":
magick hidden.jpg -lat 50x50 -rotate -90 result.png
If you are running v6 ImageMagick, replace magick with convert in the previous commands.
This is pretty much equivalent in Python:
from PIL import Image
from skimage.filters import threshold_local
import numpy as np
# Open image in greyscale
im = Image.open('hidden.jpg').convert('L')
na = np.array(im)
# Local Adaptive Threshold
LAT = threshold_local(na, 49)
result = na > LAT
Image.fromarray((result*255).astype(np.uint8)).save('result.png')
If you really, really don't want to introduce a new dependency on skimage, you can use PIL or Numpy to generate a blurred copy of your image and subtract the blurred from the original and then threshold the difference image. That looks like this:
#!/usr/bin/env python3
from PIL import Image, ImageFilter
import numpy as np
# Open image in greyscale, and make heavily blurred copy
im = Image.open('hidden.jpg').convert('L')
blur = im.filter(ImageFilter.BoxBlur(25))
# Go to Numpy for maths!
na = np.array(im)
nb = np.array(blur)
# Local Adaptive Threshold
res = na >= nb
# Save
Image.fromarray((res*255).astype(np.uint8)).save('result.png')
from PIL import Image
import numpy as np
img = Image.open('4YtCA.jpg').convert('L')
data = np.array(img, dtype='uint8')
u, c = np.unique(data, return_counts=True)
# Set the background colors to white and the rest to black
#data = np.where(np.isin(data, u[c>17000]), 255, 0).astype(np.uint8)
data = np.isin(data, u[c>17000]).astype(np.uint8) * 255 # thanks to Mad Physicist
# Create new image and save
img_new = Image.fromarray(data)
img_new.save('4YtCA_new.jpg')

How to get the coordinates of the text recognized from an image using OCR in Python

I am trying to get the coordinates or positions of text characters from an image using Tesseract.
I want to know the exact pixel position, so that I can click that text using some other tool.
Edit :
import pytesseract
from pytesseract import pytesseract
import PIL
from PIL import Image
import cv2
import csv
img = 'E:\\OCR-DATA\\sample.jpg'
imge = Image.open(img)
data=pytesseract.image_to_string(imge,lang='eng',boxes=True,config='hocr')
print(data)
data contains the recognized text with box boundary values, but I am not sure how to use those boundary values to get the coordinates of the text.
Value of the data variable is as follows:
O 100 356 115 373 0
u 117 356 127 368 0
t 130 356 138 372 0
p 141 351 152 368 0
u 154 356 164 368 0
t 167 356 175 371 0
You can try this:
import pytesseract
from PIL import Image
img = 'tes.jpg'
imge = Image.open(img)
data = pytesseract.image_to_boxes(imge)
print(data)
This will directly give you a result like:
T 22 58 52 97 0
H 62 58 95 96 0
R 102 58 135 97 0
E 146 57 174 97 0
A 184 57 216 96 0
D 225 56 258 96 0
You have the coordinates of the bounding box in every line.
From: Training Tesseract – Make Box Files
character, left, bottom, right, top, page
So for each character you get the character, followed by its bounding box coordinates, followed by the 0-based page number.
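Since the goal is to click the text, each box line can be turned into a centre point. Box files measure y from the bottom edge of the image, while most GUI and imaging tools measure y from the top, so the sketch below flips the y axis. The helper name box_centers and the image height of 150 are assumptions for illustration; in practice the height would come from the image itself (for example imge.height with PIL).

```python
def box_centers(box_text, image_height):
    """Parse box lines into (char, x, y) centre points with a top-left origin."""
    centers = []
    for line in box_text.strip().splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip anything that is not "char left bottom right top page"
        char = parts[0]
        left, bottom, right, top = map(int, parts[1:5])
        x = (left + right) // 2
        # Flip y: box coordinates count upwards from the bottom edge.
        y = image_height - (bottom + top) // 2
        centers.append((char, x, y))
    return centers

sample = """T 22 58 52 97 0
H 62 58 95 96 0"""
centers = box_centers(sample, image_height=150)  # 150 is a made-up height
print(centers)
```

The resulting (x, y) pairs are relative to the image; if the image is a screenshot of only part of the screen, the offset of that region still has to be added before clicking.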

Is there a way to save a custom matplotlib colorbar to use elsewhere?

Is there a way to save a custom matplotlib colourmap (matplotlib.cm) as a file (e.g. a Color Palette Table file (.cpt), like those used in MATLAB) to be shared and then used later in other programs (e.g. Panoply, MATLAB...)?
Example
Below a new LinearSegmentedColormap is made by modifying an existing colormap (by truncation, as shown in another question linked here).
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
# Get an existing colorbar
cb = 'CMRmap'
cmap = plt.get_cmap( cb )
# Variables to modify (truncate) the colormap with
minval = 0.15
maxval = 0.95
npoints = 100
# Now modify (truncate) the colorbar
cmap = matplotlib.colors.LinearSegmentedColormap.from_list(
'trunc({n},{a:.2f},{b:.2f})'.format(n=cmap.name, a=minval,
b=maxval), cmap(np.linspace(minval, maxval, npoints)))
# Now the data can be extracted as a dictionary
cdict = cmap._segmentdata
# e.g. variables ('blue', 'alpha', 'green', 'red')
print( cdict.keys() )
# Now, is it possible to save to this as a .cpt?
More detail
I am aware of ways of loading external colormaps in matplotlib (e.g. shown here and here).
From NASA GISS's Panoply documentation:
Color Palette Table (CPT) indicates a color palette format used by the
Generic Mapping Tools program. The format defines a number of solid
color and/or gradient bands between the colorbar extrema rather than a
finite number of distinct colors.
The following is a function that takes a colormap, some limits (vmin and vmax) and the number of colors as input and creates a cpt file from it.
import matplotlib.pyplot as plt
import numpy as np
def export_cmap_to_cpt(cmap, vmin=0, vmax=1, N=255, filename="test.cpt", **kwargs):
    # create string for upper, lower colors
    b = np.array(kwargs.get("B", cmap(0.)))
    f = np.array(kwargs.get("F", cmap(1.)))
    na = np.array(kwargs.get("N", (0, 0, 0))).astype(float)
    ext = (np.c_[b[:3], f[:3], na[:3]].T * 255).astype(int)
    extstr = "B {:3d} {:3d} {:3d}\nF {:3d} {:3d} {:3d}\nN {:3d} {:3d} {:3d}"
    ex = extstr.format(*list(ext.flatten()))
    # create colormap
    cols = (cmap(np.linspace(0., 1., N))[:, :3] * 255).astype(int)
    vals = np.linspace(vmin, vmax, N)
    arr = np.c_[vals[:-1], cols[:-1], vals[1:], cols[1:]]
    # save to file
    fmt = "%e %3d %3d %3d %e %3d %3d %3d"
    np.savetxt(filename, arr, fmt=fmt,
               header="# COLOR_MODEL = RGB",
               footer=ex, comments="")
# test case: create cpt file from RdYlBu colormap
cmap = plt.get_cmap("RdYlBu",255)
# you may create your colormap differently, as in the question
export_cmap_to_cpt(cmap, vmin=0,vmax=1,N=20)
The resulting file looks like
# COLOR_MODEL = RGB
0.000000e+00 165 0 38 5.263158e-02 190 24 38
5.263158e-02 190 24 38 1.052632e-01 215 49 39
1.052632e-01 215 49 39 1.578947e-01 231 83 55
1.578947e-01 231 83 55 2.105263e-01 244 114 69
2.105263e-01 244 114 69 2.631579e-01 249 150 86
2.631579e-01 249 150 86 3.157895e-01 253 181 104
3.157895e-01 253 181 104 3.684211e-01 253 207 128
3.684211e-01 253 207 128 4.210526e-01 254 230 153
4.210526e-01 254 230 153 4.736842e-01 254 246 178
4.736842e-01 254 246 178 5.263158e-01 246 251 206
5.263158e-01 246 251 206 5.789474e-01 230 245 235
5.789474e-01 230 245 235 6.315789e-01 206 234 242
6.315789e-01 206 234 242 6.842105e-01 178 220 235
6.842105e-01 178 220 235 7.368421e-01 151 201 224
7.368421e-01 151 201 224 7.894737e-01 120 176 211
7.894737e-01 120 176 211 8.421053e-01 96 149 196
8.421053e-01 96 149 196 8.947368e-01 70 118 180
8.947368e-01 70 118 180 9.473684e-01 59 86 164
9.473684e-01 59 86 164 1.000000e+00 49 54 149
B 165 0 38
F 49 54 149
N 0 0 0
and would be in the required format.

Image to ASCII in python using a PPM file. PIL not allowed

Needing a little help/direction with a project. Our task is to take in a .ppm file (the one we are required to test with is found here: http://beastie.cs.ua.edu/cs250/projects/asciiart/tux.ppm) and reprint it out on the screen using ascii characters. We are required to convert the pixels to greyscale. This is really where I am stuck. Cannot figure out how to read in every three elements (because every three is a pixel in PPM files), convert them to greyscale and move on. Again PIL is not allowed. Any help or links on what to read up on would be awesome!
The PPM isn't hard to parse.
The header:
P3
50 50
255
P3 means that the image is an ASCII pixmap (color).
50 50 is the width and height.
255 is the max color value.
The body:
254 254 252 254 254 252 254 254 252 254 254 252 254 254 252 254 254 252
254 254 252 254 254 252 254 254 252 254 254 252 254 254 252 254 254 252
254 254 252 254 254 252 254 254 252 253 255 250 239 244 237 251 255 248
234 236 231 255 255 251 252 251 249 255 254 251 253 248 242 255 255 244
...
Just replace all newlines with spaces:
body = body.replace('\n', ' ')
And parse it in triplets (not too elegant):
raw = body.split()
for i in range(0, len(raw), 3):
    red = int(raw[i])
    green = int(raw[i + 1])
    blue = int(raw[i + 2])
To read the PPM file, you can do this:
# Open the PPM file and process the first 3 lines
f = open("tux.ppm")
magic = f.readline().strip()  # "P3"
size_x, size_y = map(int, f.readline().split())
max_value = int(f.readline())
You really don't need to know about the 3 first lines of the file. You only must know that you are working with a RGB image, which means you have 3 values (0-255) for each pixel.
To convert the image to grayscale you have two options: you can either generate another PPM file (with 3 values per pixel) or you can generate a PGM file which has the same format as the PPM but the first line is P2 instead of P3 and you will have only one value per pixel (that's the cool way).
To convert an RGB color value (r, g, b) into a single grayscale intensity value, you can apply this formula (better than simply taking the average):
0.21*r + 0.72*g + 0.07*b
Generate the grayscale image with one value per pixel (if you want it with 3 values, you only have to repeat it 3 times for r, g, b):
# Get the image data (if you have already read the first 3 lines...)
data = [int(v) for v in f.read().split()]
# Generate a new list of pixels with grayscale values (1 per pixel)
gray_data = [int(0.21*data[i] + 0.72*data[i+1] + 0.07*data[i+2]) for i in range(0, len(data), 3)]
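Putting the pieces together, a minimal sketch of the whole PPM-to-ASCII step could look like the following. It assumes a well-formed P3 file, uses rounded luminosity weights (0.21, 0.72, 0.07), and picks an arbitrary ten-character ramp; the function name ppm_to_ascii and the ramp string are illustrative choices, not part of the assignment.

```python
def ppm_to_ascii(ppm_text, ramp="@%#*+=-:. "):
    """Convert the text of a P3 PPM file into a list of ASCII-art rows."""
    # Drop comments, then treat the file as whitespace-separated tokens.
    tokens = []
    for line in ppm_text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())
    if tokens[0] != "P3":
        raise ValueError("expected an ASCII (P3) PPM file")
    width, height, max_value = int(tokens[1]), int(tokens[2]), int(tokens[3])
    values = [int(t) for t in tokens[4:]]
    rows = []
    for y in range(height):
        chars = []
        for x in range(width):
            i = 3 * (y * width + x)          # every three values form one pixel
            r, g, b = values[i:i + 3]
            gray = 0.21 * r + 0.72 * g + 0.07 * b
            # Dark pixels map to dense characters, bright pixels to sparse ones.
            index = round(gray / max_value * (len(ramp) - 1))
            chars.append(ramp[min(index, len(ramp) - 1)])
        rows.append("".join(chars))
    return rows

tiny = "P3\n2 2\n255\n0 0 0 255 255 255\n255 255 255 0 0 0\n"
art = ppm_to_ascii(tiny)
print("\n".join(art))
```

For a real file, the text would come from something like open("tux.ppm").read(). Terminal characters are taller than they are wide, so skipping every other row usually gives better proportions.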
