Histogram equalization without extreme values - python

Is it possible to perform histogram equalization while excluding the extreme values 0 and 255?
Specifically, I have an image in which more than half of all pixels are zero. If I do a plain histogram equalization, I basically shift the value 1 up to about 240, which is exactly the opposite of what I want from a histogram equalization.
So is there a way to compute the histogram equalization only over the values 1 to 254?
At the moment my code looks like the following:
import numpy as np

number_bins = 256  # assuming one bin per grey level

flat = image.flatten()
# get image histogram
image_histogram, bins = np.histogram(flat, bins=range(0, number_bins), density=True)
cdf = image_histogram.cumsum()  # cumulative distribution function
cdf = 255 * cdf / cdf.max()     # normalize
cdf = cdf.astype('uint8')
# use linear interpolation of cdf to find new pixel values
image_equalized = np.interp(flat, bins[:-1], cdf)
image_equalized = image_equalized.reshape(image.shape)
Thanks

One way to solve this is to filter out the unwanted values before building the histogram, and then build a "conversion table" that maps each original pixel value to its equalized value.
import numpy as np

# generate a random test image
image = np.random.randint(0, 256, (32, 32))

# flatten image
flat = image.flatten()

# get the image histogram, ignoring the extreme values 0 and 255
image_histogram, bins = np.histogram(flat[(flat != 0) & (flat != 255)],
                                     bins=range(0, 256),
                                     density=True)
cdf = image_histogram.cumsum()  # cumulative distribution function
cdf = 255 * cdf / cdf.max()     # normalize
cdf = cdf.astype('uint8')

# use linear interpolation of the cdf to find new pixel values;
# conversion_table is indexed by the original pixel value and holds
# the histogram-equalized pixel value
conversion_table = np.interp(np.arange(256), bins[:-1], cdf)

# keep the unwanted extreme values unchanged
conversion_table[0] = 0
conversion_table[-1] = 255

image_equalized = np.array([conversion_table[pixel] for pixel in flat])
image_equalized = image_equalized.reshape(image.shape)
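As a side note (my addition, not from the original answer): the per-pixel list comprehension at the end can be replaced by NumPy fancy indexing, which performs the same table lookup in one vectorized step. This assumes the conversion_table, flat and image variables from the snippet above:

# vectorized lookup: index the conversion table with the whole flattened image at once
image_equalized = conversion_table[flat].reshape(image.shape)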
disclaimer: I have absolutely no experience whatsoever with image processing, so I have no idea about the validity :)

Related

Calculating probability distribution of an image?

I want to find the probability distributions of two images so I can calculate the KL divergence.
I'm trying to figure out what "probability distribution" means in this sense. I've converted my images to grayscale, flattened them to 1D arrays, and plotted them as histograms with bins = 256:
imageone = imgGray.flatten() # array([0.64991451, 0.65775765, 0.66560078, ...,
imagetwo = imgGray2.flatten()
plt.hist(imageone, bins=256, label = 'image one')
plt.hist(imagetwo, bins=256, alpha = 0.5, label = 'image two')
plt.legend(loc='upper left')
My next step is to call the ks_2samp function from scipy to calculate the divergence, but I'm unclear what arguments to use.
A previous answer explained that we should "take the histogram of the image (in grayscale) and then divide the histogram values by the total number of pixels in the image. This will result in the probability to find a gray value in the image."
Ref: Can Kullback-Leibler be applied to compare two images?
But what do we mean by "take the histogram values"? How do I 'take' these values?
I might be overcomplicating things, but I'm confused by this.
The hist function will return 3 values, the first of which is the values (i.e., number counts) in each histogram bin. If you pass the density=True argument to hist, these values will be the probability density in each bin. I.e.,:
prob1, _, _ = plt.hist(imageone, bins=256, density=True, label = 'image one')
prob2, _, _ = plt.hist(imagetwo, bins=256, density=True, alpha = 0.5, label = 'image two')
You can then calculate the KL divergence using the scipy entropy function:
from scipy.stats import entropy
entropy(prob1, prob2)
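One caveat worth adding (my addition, not part of the original answer): for the two density arrays to be comparable bin by bin, both histograms should use the same bin edges. A minimal sketch, assuming the grayscale arrays imageone and imagetwo from the question hold values in [0, 1]:

import numpy as np
from scipy.stats import entropy

# shared bin edges so prob1[i] and prob2[i] refer to the same intensity range
edges = np.linspace(0.0, 1.0, 257)
prob1, _ = np.histogram(imageone, bins=edges, density=True)
prob2, _ = np.histogram(imagetwo, bins=edges, density=True)

# a small offset avoids zero entries in empty bins
kl = entropy(prob1 + 1e-12, prob2 + 1e-12)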

How to add 20% noise of the maximum pixel intensity in an image?

I have read an image and stored it in the NumPy array.
Let's say the image is stored in variable 'img'.
I know we can add Gaussian noise based on the standard deviation of the image, so I wrote the function below:
def white_noise_2(sigma, n, mu=0):
    noise = np.random.normal(mu, sigma, n)
    return noise
where 'n' is the shape of the image and sigma is the standard deviation. I want to change or modify my function to add 20% noise to the image as a percentage of the maximum intensity. How can we achieve this by modifying the function above?
Using a uniform distribution (the function name implies this is the intent):
def white_noise_2(amplitude, n, mu=0):
    # amplitude is expected to be 20% of the max intensity,
    # e.g. amplitude = image_array.max() * 0.2
    # don't forget to clip the output after adding it to the image
    return np.random.uniform(-amplitude, amplitude, n)
Clipping a normal distribution:
def white_noise_2(amplitude, n, mu=0):
    # clipping at 2 sigma, which covers ~95% of the data points
    return np.clip(np.random.normal(0, amplitude / 2, n), -amplitude, amplitude)
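For completeness (my addition, not part of the original answer), here is a minimal sketch of how either variant might be applied to an 8-bit image, assuming img is the uint8 NumPy array from the question:

import numpy as np

amplitude = img.max() * 0.2                   # 20% of the maximum intensity
noise = white_noise_2(amplitude, img.shape)   # either variant above
noisy = np.clip(img.astype(float) + noise, 0, 255).astype(np.uint8)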

Calculate the length of an edge consisting of many pixel data

I have written a workflow to detect the edge of a flame in an image, and I can extract the edge line. It consists of many pixel points stored in an array (data in my code). Based on data, I would now like to calculate the length of the edge. The idea is to compute the distance between every pair of consecutive points in data and sum them all to get the length. I'm stuck on that part. Please help me, many thanks.
Here is a processed image:
Here is the original image from which the processed image was made; I load it in the code to compare the result:
import cv2
import matplotlib.pyplot as plt

if __name__ == '__main__':
    path = '1897_1.jpg'                  # processed image
    pic = cv2.imread(path)
    original = cv2.imread('1897_2.jpg')  # original image
    img2 = cv2.flip(original, 1)

    b, g, r = cv2.split(pic)
    img4 = cv2.flip(b, 1)
    h, w = img4.shape

    # for every row, record the column of the first pixel above the threshold
    data = []
    th_val = 20
    for i in range(h):
        for j in range(w):
            val = img4[i, j]
            if val >= th_val:
                data.append(j)
                break

    b1 = range(len(data))
    b2 = len(data)
    result = [b2]
    print(b2)

    plt.figure(figsize=(10, 8))
    plt.subplot(121)
    plt.imshow(img4)
    plt.plot(data, b1)
    plt.axis('off')
    plt.subplot(122)
    plt.plot(data, b1)
    plt.imshow(img2)
    plt.axis('off')
I came up with a very simple solution. It is far from optimal, but it works for this example and is a good starting point. Unfortunately, it does not work well for the blue channel, where the curve is not smooth, but it works for the green and red channels.
data contains the column coordinate of the first pixel in each row that exceeds the threshold. So consecutive points are separated by exactly 1 pixel on the vertical axis and by data[i+1] - data[i] on the horizontal axis. These two values can be treated as the two legs of a right triangle, and the hypotenuse is the distance we want to sum. So, here is the solution:
length = 0
for i in range(0, len(data) - 1):
    cathetus = data[i + 1] - data[i]
    hypotenuse = (cathetus**2 + 1**2) ** 0.5  # square root of the summed squared legs; note **0.5, not **1/2
    length += hypotenuse
print(length)
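As an aside (my addition, not from the original answer), the same sum can be written in vectorized NumPy:

import numpy as np

steps = np.diff(np.asarray(data, dtype=float))  # horizontal difference between consecutive rows
length = np.hypot(steps, 1.0).sum()             # vertical step is always 1 pixel
print(length)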
Update
I have come up with two smoothing approaches: a hardcoded one and one using a ready-made function. Let us start with the first one: the mean is a rather good approximation of signal + noise. When you do not have very strong noise or missing data, you may use this approach. In the example below we select the points with x in [1, 2, 3], calculate the mean y of these points, and assign that mean to the coordinate x = 2. Next we select the points with x in [2, 3, 4], and so on. As a result, we obtain a mean_data list with the y coordinates and mean_x with the x coordinates. We can then calculate the length with the approach described above. You may also increase the strength of the smoothing by averaging over 4 or more points from data, as sketched after the code below.
mean_data = []
mean_x = range(1, len(data) - 1)
for i in range(0, len(data) - 2):
    mean_d = (data[i] + data[i+1] + data[i+2]) / 3
    mean_data.append(mean_d)
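Here is a hedged sketch of the wider-window averaging mentioned above (my addition); the window size is an arbitrary choice, and window = 3 reproduces the loop above:

import numpy as np

window = 5
kernel = np.ones(window) / window
mean_data = np.convolve(data, kernel, mode='valid')        # smoothed y coordinates
mean_x = range(window // 2, window // 2 + len(mean_data))  # matching x coordinates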
Another approach is to use the smoothing tools from the scipy package; one of them is shown below. When calculating the length you will have to account for the new x axis xnew (see the sketch after the code).
from scipy.interpolate import make_interp_spline  # scipy.interpolate.spline has been removed from SciPy
import numpy as np

# transform the initial data to np.arrays
b1_ = np.array(b1)
data_ = np.array(data)

# create a new x axis with more data points
xnew = np.linspace(b1_.min(), b1_.max(), 50)  # 50 is the number of points in between
smoothed_data = make_interp_spline(b1_, data_)(xnew)
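To follow up on the note about xnew (my addition, not part of the original answer): after resampling, consecutive points are no longer 1 pixel apart vertically, so the length sum has to use the actual spacing of xnew:

dx = xnew[1] - xnew[0]          # spacing of the resampled x axis
dy = np.diff(smoothed_data)     # horizontal differences of the smoothed curve
length = np.hypot(dy, dx).sum()
print(length)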

find mean bin values using histogram2d python [duplicate]

How do you calculate the mean values for bins with a 2D histogram in Python? I have temperature ranges for the x and y axes, and I am trying to plot the probability of lightning using bins for the respective temperatures. I am reading in the data from a CSV file, and my code is as follows:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

filename = 'Random_Events_All_Sorted_85GHz.csv'
df = pd.read_csv(filename)
min37 = df.min37
min85 = df.min85
verification = df.five_min_1
#Numbers
x = min85
y = min37
H = verification
#Estimate the 2D histogram
nbins = 4
H, xedges, yedges = np.histogram2d(x,y,bins=nbins)
#Rotate and flip H
H = np.rot90(H)
H = np.flipud(H)
#Mask zeros
Hmasked = np.ma.masked_where(H==0,H)
#Plot 2D histogram using pcolor
fig1 = plt.figure()
plt.pcolormesh(xedges,yedges,Hmasked)
plt.xlabel('min 85 GHz PCT (K)')
plt.ylabel('min 37 GHz PCT (K)')
cbar = plt.colorbar()
cbar.ax.set_ylabel('Probability of Lightning (%)')
plt.show()
This makes a nice looking plot, but the data that is plotted is the count, or number of samples that fall into each bin. The verification variable is an array that contains 1's and 0's, where a 1 indicates lightning and a 0 indicates no lightning. I want the data in the plot to be the probability of lightning for a given bin based on the data from the verification variable - thus I need bin_mean*100 in order to get this percentage.
I tried using an approach similar to what is shown here (binning data in python with scipy/numpy), but I was having difficulty getting it to work for a 2D histogram.
There is an elegant and fast way to do this! Use the weights parameter to sum the values:
denominator, xedges, yedges = np.histogram2d(x, y, bins=nbins)
numerator, _, _ = np.histogram2d(x, y, bins=[xedges, yedges], weights=verification)
So all you need to do is divide, in each bin, the sum of the values by the number of events:
result = numerator / denominator.clip(1)
Voila!
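To tie this back to the plotting code in the question (my addition, hedged): result is the per-bin mean of verification, i.e. the fraction of events with lightning, so multiplying by 100 gives the percentage asked for, and empty bins can still be masked before plotting:

import numpy as np
import matplotlib.pyplot as plt

probability = 100 * numerator / denominator.clip(1)        # percent lightning per bin
prob_masked = np.ma.masked_where(denominator == 0, probability)

# transpose so rows correspond to y bins, as pcolormesh expects
plt.pcolormesh(xedges, yedges, prob_masked.T)
plt.colorbar(label='Probability of Lightning (%)')
plt.show()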
This is doable, at least with the following method:
# xedges, yedges as returned by 'histogram2d'
# create an array for the output quantities
avgarr = np.zeros((nbins, nbins))

# determine the X and Y bins each sample coordinate belongs to
xbins = np.digitize(x, xedges[1:-1])
ybins = np.digitize(y, yedges[1:-1])

# calculate the bin sums (note: with very many samples this is more
# efficient with 'bincount', but that requires some index arithmetic)
for xb, yb, v in zip(xbins, ybins, verification):
    avgarr[yb, xb] += v

# replace 0s in H by NaNs (removes divide-by-zero complaints);
# if you have no further use for H after plotting, the copy operation is
# unnecessary, and this will also take care of the masking
# (NaNs are plotted as transparent)
divisor = H.copy()
divisor[divisor == 0.0] = np.nan

# calculate the average
avgarr /= divisor
# now 'avgarr' contains the averages (NaNs for no-sample bins)
If you know the bin edges beforehand, you can do the histogram part in the same loop just by adding one line, as sketched below.
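A minimal sketch of that extra line (my addition), assuming x, y, verification, nbins, xedges and yedges from the answer above: accumulate the per-bin counts in the same loop instead of relying on H from histogram2d:

import numpy as np

avgarr = np.zeros((nbins, nbins))
counts = np.zeros((nbins, nbins))

xbins = np.digitize(x, xedges[1:-1])
ybins = np.digitize(y, yedges[1:-1])

for xb, yb, v in zip(xbins, ybins, verification):
    avgarr[yb, xb] += v
    counts[yb, xb] += 1   # the extra line: count samples per bin

with np.errstate(invalid='ignore'):
    avgarr /= counts      # NaN where a bin received no samples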

numpy.interp & masked arrays

I am using a numpy masked array to perform some image processing. The mask is in place to handle NoData pixels which surround the image (a necessary border as these are map projected images with the origin in a no data pixel).
Using the following code block, I am able to perform a gaussian stretch on an image.
def gaussian_stretch(input_array, array_mean, array_standard_deviation, number_of_bins, n):
    shape = input_array.shape
    input_array = input_array.flatten()

    # define a gaussian distribution, get a binned GDF histogram
    array_standard_deviation *= n
    gdf = numpy.random.normal(array_mean, array_standard_deviation, 10000)
    hist, bins = numpy.histogram(gdf, number_of_bins, density=True)  # 'normed' was removed from NumPy
    cdf = hist.cumsum()
    cdf = 256 * cdf / cdf[-1]

    # interpolate and reshape
    input_array = numpy.interp(input_array, bins[:-1], cdf)
    input_array = input_array.reshape(shape)
    return input_array
If the image does not contain a NoData border the stretch works as expected. On an image with a mask, the mask is ignored. Is this expected behavior? Any ideas on how to process only the unmasked data?
I have tried using input_array.compressed(), but this returns a 1D array of only the unmasked values. Using numpy.interp then fails, as expected, because of the size disparity between arrays.
Finally, I understand that using numpy.random.normal will not always return a perfect Gaussian distribution, and I will add some margin-of-error constraints once the rest of the algorithm is functioning.
You can get the mask of input_array first and apply it to the result array, and use scipy.stats.norm to calculate the CDF of the normal distribution; alternatively, you can use scipy.special.erf() to calculate the CDF via the closed-form formula of the normal distribution:
import scipy.stats as stats

def gaussian_stretch2(input_array, array_mean, array_standard_deviation, n):
    mask = input_array.mask
    n = stats.norm(array_mean, array_standard_deviation * n)
    return numpy.ma.array(n.cdf(input_array), mask=mask)
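A minimal usage sketch (my addition, with made-up data): applying gaussian_stretch2 to a masked array so that the NoData border stays masked in the output:

import numpy

# hypothetical test image: random data with a masked NoData border
img = numpy.random.normal(100.0, 20.0, (64, 64))
mask = numpy.zeros(img.shape, dtype=bool)
mask[:4, :] = mask[-4:, :] = mask[:, :4] = mask[:, -4:] = True
masked_img = numpy.ma.array(img, mask=mask)

stretched = gaussian_stretch2(masked_img, masked_img.mean(), masked_img.std(), 2)
print(stretched.min(), stretched.max())  # values lie in [0, 1]; the border remains masked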
