Strange darkening effect in matrix colour printing - python

Please explain me why this happens:
I have a function to print matrices with discrete values (in this case, -1, 0 and 1):
def PrintMatrixBW(array, title, s_folder):
"""
'array' is the matrix to be represented;
'title' is the tile of the plot;
's_folder' is the saving folder. For NOT saving, s_folder=0.
"""
fig, (ax0) = plt.subplots(1)
fig.tight_layout()
ax0.set_title(title)
colour_map = {
0 : '#000000', # black
1: '#FFFFFF'# white
}
N = len(colour_map)
values = list(colour_map.keys())
colours = list(colour_map.values())
cmap = LinearSegmentedColormap.from_list('', colours, N)
plt.imshow(array, cmap=cmap, origin='lower')
cbar = plt.colorbar()
# Puts each label in the middle of respective colour interval:
colour_width = (max(values) - min(values)) / N
positions = np.linspace(min(values) + colour_width/2, max(values) - colour_width/2, N)
cbar.set_ticks(positions)
cbar.set_ticklabels(values)
plt.show()
filename = title + '_out' + '.png'
# saving option:
if s_folder != 0:
outpath = s_folder
fig.savefig(outpath + filename)
return filename
Now I create a set of matrices like this:
n = 100
matrix = [[rd.choice([-1, 0, 1]) for j in range(n)] for i in range(n)]
PrintMatrixBWR(matrix, "cool_image" + str(n), out_folder)
These are the set of images as an outcome of running the previous lines from n=100 to n=1000:
Why is the matrix becoming more black as I increase its size?
I also tried to change rd.choice([-1, 0, 1]) to rd.choice([0, -1, 1])to see if the problem was in the random number generator, but I obtained the same result.
A slightly different result was when I tried to do rd.choice([1, 0, 1]). I expected not to see red at all, but the three colours appear (this time in different proportions). Here's the result, still weird to me:
In the first set of pictures I don't understand how it gets darker, since the colours should mantain their proportions, and black is not a privileged colour. In the second set, I don't understand how the colour red appears when I specify only to values to populate the matrix, with rd.choice([1, 0, 1]). I also don't understand how can it converge to 50-50 black and white, when I have a 2/3 for 1/3 proportion of ones to zeros.
What is going on?

Related

How to Create a Boxplot / Group Boxplot from [Min ,Q1 ,Q2 ,Q3 ,Max] in Python? [duplicate]

From what I can see, boxplot() method expects a sequence of raw values (numbers) as input, from which it then computes percentiles to draw the boxplot(s).
I would like to have a method by which I could pass in the percentiles and get the corresponding boxplot.
For example:
Assume that I have run several benchmarks and for each benchmark I've measured latencies ( floating point values ). Now additionally, I have precomputed the percentiles for these values.
Hence for each benchmark, I have the 25th, 50th, 75th percentile along with the min and max.
Now given these data, I would like to draw the box plots for the benchmarks.
As of 2020, there is a better method than the one in the accepted answer.
The matplotlib.axes.Axes class provides a bxp method, which can be used to draw the boxes and whiskers based on the percentile values. Raw data is only needed for the outliers, and that is optional.
Example:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
boxes = [
{
'label' : "Male height",
'whislo': 162.6, # Bottom whisker position
'q1' : 170.2, # First quartile (25th percentile)
'med' : 175.7, # Median (50th percentile)
'q3' : 180.4, # Third quartile (75th percentile)
'whishi': 187.8, # Top whisker position
'fliers': [] # Outliers
}
]
ax.bxp(boxes, showfliers=False)
ax.set_ylabel("cm")
plt.savefig("boxplot.png")
plt.close()
This produces the following image:
To draw the box plot using just the percentile values and the outliers ( if any ) I made a customized_box_plot function that basically modifies attributes in a basic box plot ( generated from a tiny sample data ) to make it fit according to your percentile values.
The customized_box_plot function
def customized_box_plot(percentiles, axes, redraw = True, *args, **kwargs):
"""
Generates a customized boxplot based on the given percentile values
"""
box_plot = axes.boxplot([[-9, -4, 2, 4, 9],]*n_box, *args, **kwargs)
# Creates len(percentiles) no of box plots
min_y, max_y = float('inf'), -float('inf')
for box_no, (q1_start,
q2_start,
q3_start,
q4_start,
q4_end,
fliers_xy) in enumerate(percentiles):
# Lower cap
box_plot['caps'][2*box_no].set_ydata([q1_start, q1_start])
# xdata is determined by the width of the box plot
# Lower whiskers
box_plot['whiskers'][2*box_no].set_ydata([q1_start, q2_start])
# Higher cap
box_plot['caps'][2*box_no + 1].set_ydata([q4_end, q4_end])
# Higher whiskers
box_plot['whiskers'][2*box_no + 1].set_ydata([q4_start, q4_end])
# Box
box_plot['boxes'][box_no].set_ydata([q2_start,
q2_start,
q4_start,
q4_start,
q2_start])
# Median
box_plot['medians'][box_no].set_ydata([q3_start, q3_start])
# Outliers
if fliers_xy is not None and len(fliers_xy[0]) != 0:
# If outliers exist
box_plot['fliers'][box_no].set(xdata = fliers_xy[0],
ydata = fliers_xy[1])
min_y = min(q1_start, min_y, fliers_xy[1].min())
max_y = max(q4_end, max_y, fliers_xy[1].max())
else:
min_y = min(q1_start, min_y)
max_y = max(q4_end, max_y)
# The y axis is rescaled to fit the new box plot completely with 10%
# of the maximum value at both ends
axes.set_ylim([min_y*1.1, max_y*1.1])
# If redraw is set to true, the canvas is updated.
if redraw:
ax.figure.canvas.draw()
return box_plot
USAGE
Using inverse logic ( code at the very end ) I extracted the percentile values from this example
>>> percentiles
(-1.0597368367634488, 0.3977683984966961, 1.0298955252405229, 1.6693981537742526, 3.4951447843464449)
(-0.90494930553559483, 0.36916539612108634, 1.0303658700697103, 1.6874542731392828, 3.4951447843464449)
(0.13744105279440233, 1.3300645202649739, 2.6131540656339483, 4.8763411136047647, 9.5751914834437937)
(0.22786243898199182, 1.4120860286080519, 2.637650402506837, 4.9067126578493259, 9.4660357513550899)
(0.0064696168078617741, 0.30586770128093388, 0.70774153557312702, 1.5241965711101928, 3.3092932063051976)
(0.007009744579241136, 0.28627373934008982, 0.66039691869500572, 1.4772725266672091, 3.221716765477217)
(-2.2621660374110544, 5.1901313713883352, 7.7178532139979357, 11.277744848353247, 20.155971739152388)
(-2.2621660374110544, 5.1884411864079532, 7.3357079047721054, 10.792299385806913, 18.842012119715388)
(2.5417888074435702, 5.885996170695587, 7.7271286220368598, 8.9207423361593179, 10.846938621419374)
(2.5971767318505856, 5.753551925927133, 7.6569980004033464, 8.8161056254143233, 10.846938621419374)
Note that to keep this short I haven't shown the outliers vectors which will be the 6th element of each of the percentile array.
Also note that all usual additional kwargs / args can be used since they are simply passed to the boxplot method inside it :
>>> fig, ax = plt.subplots()
>>> b = customized_box_plot(percentiles, ax, redraw=True, notch=0, sym='+', vert=1, whis=1.5)
>>> plt.show()
EXPLANATION
The boxplot method returns a dictionary mapping the components of the boxplot to the individual matplotlib.lines.Line2D instances that were created.
Quoting from the matplotlib.pyplot.boxplot documentation :
That dictionary has the following keys (assuming vertical boxplots):
boxes: the main body of the boxplot showing the quartiles and the median’s confidence intervals if enabled.
medians: horizonal lines at the median of each box.
whiskers: the vertical lines extending to the most extreme, n-outlier data points. caps: the horizontal lines at the ends of the whiskers.
fliers: points representing data that extend beyond the whiskers (outliers).
means: points or lines representing the means.
For example observe the boxplot of a tiny sample data of [-9, -4, 2, 4, 9]
>>> b = ax.boxplot([[-9, -4, 2, 4, 9],])
>>> b
{'boxes': [<matplotlib.lines.Line2D at 0x7fe1f5b21350>],
'caps': [<matplotlib.lines.Line2D at 0x7fe1f54d4e50>,
<matplotlib.lines.Line2D at 0x7fe1f54d0e50>],
'fliers': [<matplotlib.lines.Line2D at 0x7fe1f5b317d0>],
'means': [],
'medians': [<matplotlib.lines.Line2D at 0x7fe1f63549d0>],
'whiskers': [<matplotlib.lines.Line2D at 0x7fe1f5b22e10>,
<matplotlib.lines.Line2D at 0x7fe20c54a510>]}
>>> plt.show()
The matplotlib.lines.Line2D objects have two methods that I'll be using in my function extensively. set_xdata ( or set_ydata ) and get_xdata ( or get_ydata ).
Using these methods we can alter the position of the constituent lines of the base box plot to conform to your percentile values ( which is what the customized_box_plot function does ). After altering the constituent lines' position, you can redraw the canvas using figure.canvas.draw()
Summarizing the mappings from percentile to the coordinates of the various Line2D objects.
The Y Coordinates :
The max ( q4_end - end of 4th quartile ) corresponds to the top most cap Line2D object.
The min ( q1_start - start of the 1st quartile ) corresponds to the lowermost most cap Line2D object.
The median corresponds to the ( q3_start ) median Line2D object.
The 2 whiskers lie between the ends of the boxes and extreme caps ( q1_start and q2_start - lower whisker; q4_start and q4_end - upper whisker )
The box is actually an interesting n shaped line bounded by a cap at the lower portion. The extremes of the n shaped line correspond to the q2_start and the q4_start.
The X Coordinates :
The Central x coordinates ( for multiple box plots are usually 1, 2, 3... )
The library automatically calculates the bounding x coordinates based on the width specified.
INVERSE FUNCTION TO RETRIEVE THE PERCENTILES FROM THE boxplot DICT:
def get_percentiles_from_box_plots(bp):
percentiles = []
for i in range(len(bp['boxes'])):
percentiles.append((bp['caps'][2*i].get_ydata()[0],
bp['boxes'][i].get_ydata()[0],
bp['medians'][i].get_ydata()[0],
bp['boxes'][i].get_ydata()[2],
bp['caps'][2*i + 1].get_ydata()[0],
(bp['fliers'][i].get_xdata(),
bp['fliers'][i].get_ydata())))
return percentiles
NOTE:
The reason why I did not make a completely custom boxplot method is because, there are many features offered by the inbuilt box plot that cannot be fully reproduced.
Also excuse me if I may have unnecessarily explained something that may have been too obvious.
Here is an updated version of this useful routine. Setting the vertices directly appears to work for both filled boxes (patchArtist=True) and unfilled ones.
def customized_box_plot(percentiles, axes, redraw = True, *args, **kwargs):
"""
Generates a customized boxplot based on the given percentile values
"""
n_box = len(percentiles)
box_plot = axes.boxplot([[-9, -4, 2, 4, 9],]*n_box, *args, **kwargs)
# Creates len(percentiles) no of box plots
min_y, max_y = float('inf'), -float('inf')
for box_no, pdata in enumerate(percentiles):
if len(pdata) == 6:
(q1_start, q2_start, q3_start, q4_start, q4_end, fliers_xy) = pdata
elif len(pdata) == 5:
(q1_start, q2_start, q3_start, q4_start, q4_end) = pdata
fliers_xy = None
else:
raise ValueError("Percentile arrays for customized_box_plot must have either 5 or 6 values")
# Lower cap
box_plot['caps'][2*box_no].set_ydata([q1_start, q1_start])
# xdata is determined by the width of the box plot
# Lower whiskers
box_plot['whiskers'][2*box_no].set_ydata([q1_start, q2_start])
# Higher cap
box_plot['caps'][2*box_no + 1].set_ydata([q4_end, q4_end])
# Higher whiskers
box_plot['whiskers'][2*box_no + 1].set_ydata([q4_start, q4_end])
# Box
path = box_plot['boxes'][box_no].get_path()
path.vertices[0][1] = q2_start
path.vertices[1][1] = q2_start
path.vertices[2][1] = q4_start
path.vertices[3][1] = q4_start
path.vertices[4][1] = q2_start
# Median
box_plot['medians'][box_no].set_ydata([q3_start, q3_start])
# Outliers
if fliers_xy is not None and len(fliers_xy[0]) != 0:
# If outliers exist
box_plot['fliers'][box_no].set(xdata = fliers_xy[0],
ydata = fliers_xy[1])
min_y = min(q1_start, min_y, fliers_xy[1].min())
max_y = max(q4_end, max_y, fliers_xy[1].max())
else:
min_y = min(q1_start, min_y)
max_y = max(q4_end, max_y)
# The y axis is rescaled to fit the new box plot completely with 10%
# of the maximum value at both ends
axes.set_ylim([min_y*1.1, max_y*1.1])
# If redraw is set to true, the canvas is updated.
if redraw:
ax.figure.canvas.draw()
return box_plot
Here is a bottom-up approach where the box_plot is build up using matplotlib's vline, Rectangle, and normal plot functions
def boxplot(df, ax=None, box_width=0.2, whisker_size=20, mean_size=10, median_size = 10 , line_width=1.5, xoffset=0,
color=0):
"""Plots a boxplot from existing percentiles.
Parameters
----------
df: pandas DataFrame
ax: pandas AxesSubplot
if to plot on en existing axes
box_width: float
whisker_size: float
size of the bar at the end of each whisker
mean_size: float
size of the mean symbol
color: int or rgb(list)
If int particular color of property cycler is taken. Example of rgb: [1,0,0] (red)
Returns
-------
f, a, boxes, vlines, whisker_tips, mean, median
"""
if type(color) == int:
color = plt.rcParams['axes.prop_cycle'].by_key()['color'][color]
if ax:
a = ax
f = a.get_figure()
else:
f, a = plt.subplots()
boxes = []
vlines = []
xn = []
for row in df.iterrows():
x = row[0] + xoffset
xn.append(x)
# box
y = row[1][25]
height = row[1][75] - row[1][25]
box = plt.Rectangle((x - box_width / 2, y), box_width, height)
a.add_patch(box)
boxes.append(box)
# whiskers
y = (row[1][95] + row[1][5]) / 2
vl = a.vlines(x, row[1][5], row[1][95])
vlines.append(vl)
for b in boxes:
b.set_linewidth(line_width)
b.set_facecolor([1, 1, 1, 1])
b.set_edgecolor(color)
b.set_zorder(2)
for vl in vlines:
vl.set_color(color)
vl.set_linewidth(line_width)
vl.set_zorder(1)
whisker_tips = []
if whisker_size:
g, = a.plot(xn, df[5], ls='')
whisker_tips.append(g)
g, = a.plot(xn, df[95], ls='')
whisker_tips.append(g)
for wt in whisker_tips:
wt.set_markeredgewidth(line_width)
wt.set_color(color)
wt.set_markersize(whisker_size)
wt.set_marker('_')
mean = None
if mean_size:
g, = a.plot(xn, df['mean'], ls='')
g.set_marker('o')
g.set_markersize(mean_size)
g.set_zorder(20)
g.set_markerfacecolor('None')
g.set_markeredgewidth(line_width)
g.set_markeredgecolor(color)
mean = g
median = None
if median_size:
g, = a.plot(xn, df['median'], ls='')
g.set_marker('_')
g.set_markersize(median_size)
g.set_zorder(20)
g.set_markeredgewidth(line_width)
g.set_markeredgecolor(color)
median = g
a.set_ylim(np.nanmin(df), np.nanmax(df))
return f, a, boxes, vlines, whisker_tips, mean, median
This is how it looks in action:
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
nopts = 12
df = pd.DataFrame()
df['mean'] = np.random.random(nopts) + 7
df['median'] = np.random.random(nopts) + 7
df[5] = np.random.random(nopts) + 4
df[25] = np.random.random(nopts) + 6
df[75] = np.random.random(nopts) + 8
df[95] = np.random.random(nopts) + 10
out = boxplot(df)

Change the scale of the graph image

I try to generate a graph and save an image of the graph in python. Although the "plotting" of the values seems ok and I can get my picture, the scale of the graph is badly shifted.
If you compare the correct graph from tutorial example with my bad graph generated from different dataset, the curves are cut at the bottom to early: Y-axis should start just above the highest values and I should also see the curves for the highest X-values (in my case around 10^3).
But honestly, I think that problem is the scale of the y-axis, but actually do not know what parameteres should I change to fix it. I tried to play with some numbers (see below script), but without any good results.
This is the code for calculation and generation of the graph image:
import numpy as np
hic_data = load_hic_data_from_reads('/home/besy/Hi-C/MOREX/TCC35_parsedV2/TCC35_V2_interaction_filtered.tsv', resolution=100000)
min_diff = 1
max_diff = 500
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(12, 12))
for cnum, c in enumerate(hic_data.chromosomes):
if c in ['ChrUn']:
continue
dist_intr = []
for diff in xrange(min_diff, min((max_diff, 1 + hic_data.chromosomes[c]))):
beg, end = hic_data.section_pos[c]
dist_intr.append([])
for i in xrange(beg, end - diff):
dist_intr[-1].append(hic_data[i, i + diff])
mean_intrp = []
for d in dist_intr:
if len(d):
mean_intrp.append(float(np.nansum(d)) / len(d))
else:
mean_intrp.append(0.0)
xp, yp = range(min_diff, max_diff), mean_intrp
x = []
y = []
for k in xrange(len(xp)):
if yp[k]:
x.append(xp[k])
y.append(yp[k])
l = plt.plot(x, y, '-', label=c, alpha=0.8)
plt.hlines(mean_intrp[2], 3, 5.25 + np.exp(cnum / 4.3), color=l[0].get_color(),
linestyle='--', alpha=0.5)
plt.text(5.25 + np.exp(cnum / 4.3), mean_intrp[2], c, color=l[0].get_color())
plt.plot(3, mean_intrp[2], '+', color=l[0].get_color())
plt.xscale('log')
plt.yscale('log')
plt.ylabel('number of interactions')
plt.xlabel('Distance between bins (in 100 kb bins)')
plt.grid()
plt.ylim(2, 250)
_ = plt.xlim(1, 110)
fig.savefig('/home/besy/Hi-C/MOREX/TCC35_V2_results/filtered/TCC35_V2_decay.png', dpi=fig.dpi)
I think that problem is in scale I need y-axis to start from 10^-1 (0.1), in order to change this I tried this:
min_diff = 0.1
.
.
.
dist_intr = []
for diff in xrange(min_diff, min((max_diff, 0.1 + hic_data.chromosomes[c]))):
.
.
.
plt.ylim((0.1, 20))
But this values return: "integer argument expected, got float"
I also tried to play with:
max_diff, plt.ylim and plt.xlim parameters little bit, but nothing changed to much.
I would like to ask you what parameter/s and how I need change to generate image of the correctly focused graph. Thank you in advance.

Plotly.py: fill to zero, different color for positive/negative

With Plotly, I can easily plot a single lines and fill the area between the line and y == 0:
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(
x=[1, 2, 3, 4],
y=[-2, -1.5, 1, 2.5],
fill='tozeroy',
mode='lines',
))
fig.show()
How can I split the filled area in two? In particular, filling with red where y < 0 and with green where y > 0.
I would like to maintain the line as continuous. That means, I am not interested in just drawing two separate filled polygons.
Note that the line does not necessarily have values at y == 0.
I was in need of such a graph so I wrote a function.
def resid_fig(resid, tickvalues):
"""
resid: The y-axis points to be plotted. x-axis is assumed to be linearly increasing
tickvalues: The values you want to be displayed as ticklabels on x-axis. Works for hoverlabels too. Has to be
same length as `resid`. (This is necessary to ignore 'gaps' in graph where two polygons meet.)
"""
#Adjusting array with paddings to connect polygons at zero line on x-axis
index_array = []
start_digit = 0
split_array = np.split(resid,np.where(np.abs(np.diff(np.sign(resid)))==2)[0]+1)
split_array = [np.append(x,0) for x in split_array]
split_array = [np.insert(x,0,0) for x in split_array]
split_array[0] = np.delete(split_array[0],0)
split_array[-1] = np.delete(split_array[-1],-1)
for x in split_array:
index_array.append(np.arange(start_digit,start_digit+len(x)))
start_digit += len(x)-1
#Making an array for ticklabels
flat = []
for x in index_array:
for y in x:
flat.append(y)
flat_counter = Counter(flat)
none_indices = np.where([(flat_counter[x]>1) for x in flat_counter])[0]
custom_tickdata = []
neg_padding = 0
start_pos = 0
for y in range(len(flat)):
for x in range(start_pos,flat[-1]+1):
if x in none_indices:
custom_tickdata.append('')
break
custom_tickdata.append(tickvalues[x-neg_padding])
neg_padding +=1
start_pos = 1+x
#Making an array for hoverlabels
custom_hoverdata=[]
sublist = []
for x in custom_tickdata:
if x == '':
custom_hoverdata.append(sublist)
sublist = []
sublist.append(x)
continue
sublist.append(x)
sublist2 = sublist.copy()
custom_hoverdata.append(sublist2)
#Creating figure
fig = go.Figure()
idx = 0
for x,y in zip(split_array,index_array):
color = 'rgba(219,43,57,0.8)' if x[1]<0 else 'rgba(47,191,113,0.8)'
if (idx==0 and x[0] < 0):
color= 'rgba(219,43,57,0.8)'
fig.add_scatter(y=x, x=y, fill='tozeroy', fillcolor=color, line_color=color, customdata=custom_hoverdata[idx],
hovertemplate='%{customdata}<extra></extra>',legendgroup='mytrace',
showlegend=False if idx>0 else True)
idx += 1
fig.update_layout()
fig.update_xaxes(tickformat='', hoverformat='',tickmode = 'array',
tickvals = np.arange(index_array[-1][-1]+1),
ticktext = custom_tickdata)
fig.update_traces(mode='lines')
fig.show()
Example-
resid_fig([-2,-5,7,11,3,2,-1,1,-1,1], [1,2,3,4,5,6,7,8,9,10])
Now, for the caveats-
It does use separate polygons but I have combined all the traces into a single legendgroup so clicking on legend turns all of them on or off. About the legend colour, one way is to change 0 to 1 in showlegend=False if idx>0 in the fig.add_scatter() call. It then shows two legends, red and green still in the same legendgroup though so they still turn on and off together.
The function works by first separating continuous positive and negative values into arrays, adding 0 to the end and start of each array so the polygons can meet at the x-axis. This means the figure does not scale as well but depending on the use-case, it might not matter as much. This does not affect the hoverlabels or ticklabels as they are blank on these converging points.
The most important one, the graph does not work as intended when any of the point in the passed array is 0. I'm sure it can be modified to work for it but I have no use for it and the question doesn't ask for it either.

Python: Point Clustering/Averaging

I have a detector which returns the detected objects' bounding box centers, it works fine for the most part. What I want to do, however, is to consider 10 frames and not 1 frame to conduct the detection, so that I can eliminate more false positives.
The way my detector normally works is follows:
1. Get a frame.
2. Conduct the algorithm.
3. Record the centers into a dictionary per each frame.
The way I thought would help reducing false positives is:
1. Set up a loop of 10:
1. Get a frame.
2. Conduct the algorithm.
3. Record the centers into a dictionary per each frame.
2. Loop over the recorded points after every 10 frames.
3. Use a clustering algorithm or simple distance averaging
4. Get the final centers.
So, I've already implemented some of this logic. I am on step 1.3, I need to find a way to group the coordinates and finalize the estimation.
After 10 frames, my dictionary holds such values (can't paste all):
(4067.0, 527.0): ['torx8', 'screw8'],
(4053.0, 527.0): ['torx8', 'screw1'],
(2627.0, 707.0): ['torx8', 'screw12'],
(3453.0, 840.0): ['torx6', 'screw14'],
(3633.0, 1373.0): ['torx6', 'screw15'],
(3440.0, 840.0): ['torx6', 'screw14'],
(3447.0, 840.0): ['torx6', 'screw14'],
(1660.0, 1707.0): ['torx8', 'screw3'],
(2633.0, 700.0): ['torx8', 'screw7'],
(2627.0, 693.0): ['torx8', 'screw8'],
(4060.0, 533.0): ['torx8', 'screw6'],
(3627.0, 1367.0): ['torx6', 'screw13'],
(2600.0, 680.0): ['torx8', 'screw15'],
(2607.0, 680.0): ['torx8', 'screw7']
As you can notice, most of these points are already the same points with a bit of pixel shift, which is why I am trying to find a way to get rid of the so called duplicates.
Is there an intelligent and efficient way of dealing with this problem? First thing came to my mind was k-means clustering, but I am not sure if this fits to this problem.
Did anyone have similar experience?
EDIT: Okay so I made some progress and I am able to cluster the points using Hierarchical Clustering, because in my case I have no priori knowledge of the number of cluster. Hence, an approximation is required.
# cluster now
points = StandardScaler().fit_transform(points)
db = self.dbscan.fit(points)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
n_noise_ = list(db.labels_).count(-1)
# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = [plt.cm.Spectral(each)
for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
if k == -1:
# Black used for noise.
col = [0, 0, 0, 1]
class_member_mask = (labels == k)
xy = points[class_member_mask & core_samples_mask]
plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
markeredgecolor='k', markersize=14)
xy = points[class_member_mask & ~core_samples_mask]
plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
markeredgecolor='k', markersize=6)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
which works great. I am able to eliminate the false positives (see the black dot), however, I still don't know how I could get the average per cluster. Like, after I find the clusters, how can I loop over each cluster and average all the X,Y values? (Before StandardScaler().fit_transform(points), obviously, since after that I lose the pixel coordinates, they are fit between minus one and one.)
Okay, finally, I got it. Since I also would need my points in their original scale (not between -1 and 1) I also had to do rescaling. Anyway, here is the full magic:
def cluster_dbscan(self, points, visualize=False):
# scale the points between -1 and 1
scaler = StandardScaler()
scaled_points = scaler.fit_transform(points)
# cluster
db = DBSCAN(eps=self.clustering_epsilon, min_samples=self.clustering_min_samples, metric='euclidean')
db.fit(scaled_points)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
n_noise_ = list(db.labels_).count(-1)
if (visualize == True):
# Black removed and is used for noise instead.
unique_labels = set(db.labels_)
colors = [plt.cm.Spectral(each)
for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
if k == -1:
# Black used for noise.
col = [0, 0, 0, 1]
class_member_mask = (db.labels_ == k)
xy = scaled_points[class_member_mask & core_samples_mask]
plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
markeredgecolor='k', markersize=14)
xy = scaled_points[class_member_mask & ~core_samples_mask]
plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
markeredgecolor='k', markersize=6)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
# back to original scale
points = scaler.inverse_transform(scaled_points)
# loop over the clusters, get the centers
centers = np.zeros((n_clusters_, 2)) # for x and y
for i in range(0, n_clusters_):
cluster_points = points[db.labels_ == i]
cluster_mean = np.mean(cluster_points, axis=0)
centers[i, :] = cluster_mean
# we need the original points
return centers

trouble displaying image HSI converted to RGB python

I've been working in a algorithm to convert RGB to HSI and vice-versa in python 3, which it display the resulted images and each channel using matplotlib.
The trouble is displaying HSI to RGB resulted image: Each channel alone is being displayed correctly, but when it shows the tree channels together I get a weird image.
By the way, when I save the resulted image with OpenCV it shows the image correctly.
Resulted display
What I did, but nothing changed:
Round the values and if it pass 1, give 1 to the pixel
In the conversion HSI to RGB, instead define R, G and B arrays with zeros, define arrays with ones
In the conversion RGB to HSI, change the values between [0,360],[0,1],[0,1] to values between [0,360],[0,255],[0,255] rounded or not
Instead use Jupyter notebook, use collab.research by google or Spider
Execute the code on terminal, but it gives me blank windows
Function to display images:
def show_images(T, cols=1):
N = len(T)
fig = plt.figure()
for i in range(N):
a = fig.add_subplot(np.ceil(N/float(cols)), cols, i+1)
try:
img,title = T[i]
except ValueError:
img,title = T[i], "Image %d" % (i+1)
if(img.ndim == 2):
plt.gray()
plt.imshow(img)
a.set_title(title)
plt.xticks([0,img.shape[1]]), plt.yticks([0,img.shape[0]])
fig.set_size_inches(np.array(fig.get_size_inches()) * N)
plt.show()
Then the main function do this:
image = bgr_to_rgb(cv2.imread("rgb.png"))
img1 = rgb_to_hsi(image)
img2 = hsi_to_rgb(img1)
show_images([(image,"RGB"),
(image[:,:,0],"Red"),
(image[:,:,1],"Green"),
(image[:,:,2],"Blue")], 4)
show_images([(img1,"RGB->HSI"),
(img1[:,:,0],"Hue"),
(img1[:,:,1],"Saturation"),
(img1[:,:,2],"Intensity")], 4)
show_images([(img2,"HSI->RGB"),
(img2[:,:,0],"Red"),
(img2[:,:,1],"Green"),
(img2[:,:,2],"Blue")], 4)
Conversion RGB to HSI:
def rgb_to_hsi(img):
zmax = 255 # max value
# values in [0,1]
R = np.divide(img[:,:,0],zmax,dtype=np.float)
G = np.divide(img[:,:,1],zmax,dtype=np.float)
B = np.divide(img[:,:,2],zmax,dtype=np.float)
# Hue, when R=G=B -> H=90
a = (0.5)*np.add(np.subtract(R,G), np.subtract(R,B)) # (1/2)*[(R-G)+(R-B)]
b = np.sqrt(np.add(np.power(np.subtract(R,G), 2) , np.multiply(np.subtract(R,B),np.subtract(G,B))))
tetha = np.arccos( np.divide(a, b, out=np.zeros_like(a), where=b!=0) ) # when b = 0, division returns 0, so then tetha = 90
H = (180/math.pi)*tetha # convert rad to degree
H[B>G]=360-H[B>G]
# saturation = 1 - 3*[min(R,G,B)]/(R+G+B), when R=G=B -> S=0
a = 3*np.minimum(np.minimum(R,G),B) # 3*min(R,G,B)
b = np.add(np.add(R,G),B) # (R+G+B)
S = np.subtract(1, np.divide(a,b,out=np.ones_like(a),where=b!=0))
# intensity = (1/3)*[R+G+B]
I = (1/3)*np.add(np.add(R,G),B)
return np.dstack((H, zmax*S, np.round(zmax*I))) # values between [0,360], [0,255] e [0,255]
Conversion HSI to RGB:
def f1(I,S): # I(1-S)
return np.multiply(I, np.subtract(1,S))
def f2(I,S,H): # I[1+(ScosH/cos(60-H))]
r = math.pi/180
a = np.multiply(S, np.cos(r*H)) # ScosH
b = np.cos(r*np.subtract(60,H)) # cos(60-H)
return np.multiply(I, np.add(1, np.divide(a,b)) )
def f3(I,C1,C2): # 3I-(C1+C2)
return np.subtract(3*I, np.add(C1,C2))
def hsi_to_rgb(img):
zmax = 255 # max value
# values between[0,360], [0,1] and [0,1]
H = img[:,:,0]
S = np.divide(img[:,:,1],zmax,dtype=np.float)
I = np.divide(img[:,:,2],zmax,dtype=np.float)
R,G,B = np.ones(H.shape),np.ones(H.shape),np.ones(H.shape) # values will be between [0,1]
# for 0 <= H < 120
B[(0<=H)&(H<120)] = f1(I[(0<=H)&(H<120)], S[(0<=H)&(H<120)])
R[(0<=H)&(H<120)] = f2(I[(0<=H)&(H<120)], S[(0<=H)&(H<120)], H[(0<=H)&(H<120)])
G[(0<=H)&(H<120)] = f3(I[(0<=H)&(H<120)], R[(0<=H)&(H<120)], B[(0<=H)&(H<120)])
# for 120 <= H < 240
H = np.subtract(H,120)
R[(0<=H)&(H<120)] = f1(I[(0<=H)&(H<120)], S[(0<=H)&(H<120)])
G[(0<=H)&(H<120)] = f2(I[(0<=H)&(H<120)], S[(0<=H)&(H<120)], H[(0<=H)&(H<120)])
B[(0<=H)&(H<120)] = f3(I[(0<=H)&(H<120)], R[(0<=H)&(H<120)], G[(0<=H)&(H<120)])
# for 240 <= H < 360
H = np.subtract(H,120)
G[(0<=H)&(H<120)] = f1(I[(0<=H)&(H<120)], S[(0<=H)&(H<120)])
B[(0<=H)&(H<120)] = f2(I[(0<=H)&(H<120)], S[(0<=H)&(H<120)], H[(0<=H)&(H<120)])
R[(0<=H)&(H<120)] = f3(I[(0<=H)&(H<120)], G[(0<=H)&(H<120)], B[(0<=H)&(H<120)])
return np.dstack( ((zmax*R) , (zmax*G) , (zmax*B)) ) # values between [0,255]
If you take a look at the imshow documentation of matplotlib, you will see the following lines:
X : array-like or PIL image The image data. Supported array shapes
are:
(M, N): an image with scalar data. The data is visualized using a
colormap. (M, N, 3): an image with RGB values (float or uint8). (M, N,
4): an image with RGBA values (float or uint8), i.e. including
transparency. The first two dimensions (M, N) define the rows and
columns of the image.
The RGB(A) values should be in the range [0 .. 1] for floats or [0 ..
255] for integers. Out-of-range values will be clipped to these
bounds.
Which tells you the ranges that it should be in... In your case, the HSI values go from 0-360 in the Hue which will be clipped to 255 any value above it. That is one of the reasons why OpenCV uses the Hue range from 0-180, to be able to fit it inside the range.
Then the HSI->RGB seems to return the image in float, then it will be clipped in 1.0.
This will happen only for the display, but also if you save the image it will be clipped most probably, maybe it gets saved as a 16 bit image.
Possible solutions:
normalize the values from 0-1 or from 0-255 (this may change the min and max value) and then display it (dont forget to cast it to np.uint8).
Create a range that is always inside the possible values.
This is for display or saving purposes... If you use 0-360 save it at least in 16 bits

Categories