I have a table of 10000 RGB triplets and another table of the corresponding colour names, of which only 10 are unique. I want to build a 10-bin histogram for a given image using these two tables, and I use NearestNeighbors from scikit-learn for this. Here's part of my code:
rgb_matrix = np.asarray(joblib.load('rgb-matrix.pkl'))
rgb_colors = np.asarray(joblib.load('rgb-colors.pkl'))

# Collect the unique colour names in order of first appearance
color_list = []
for i in xrange(len(rgb_colors)):
    color = rgb_colors[i]
    if color not in color_list:
        color_list.append(color)

nbrs = NearestNeighbors(n_neighbors=4,algorithm='ball_tree').fit(rgb_matrix)
rgb_arr = input_image.reshape(-1,3)
color_arr = nbrs.kneighbors(rgb_arr)[1] # No of nearest neighbours is set to 4
color_index = np.asarray(color_arr[:,0]) # Get the top color index

# Count one hit per pixel in the bin of its nearest colour name
hist = np.zeros(10)
for i in xrange(len(color_index)):
    hist[color_list.index(rgb_colors[color_index[i]])] += 1.0
But this loop makes the process really slow. Is there a way I can use np.histogram here?
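One way to avoid the Python loop is sketched below. It assumes that color_index holds one row index into rgb_colors per pixel, as in the code above. np.unique with return_inverse=True turns the colour names into integer bin ids, and np.bincount then counts the pixels per bin. The helper name color_histogram and the toy arrays are illustrative only, and the bins follow the sorted order of the names rather than the order of color_list.

```python
import numpy as np

def color_histogram(color_index, rgb_colors):
    """Count pixels per unique colour name without a Python loop."""
    # names: sorted unique colour names; inverse: bin id of every table row
    names, inverse = np.unique(rgb_colors, return_inverse=True)
    # Bin id of every pixel, looked up through its nearest table row
    per_pixel_bin = inverse[color_index]
    # One count per unique name (zero for names that never occur)
    hist = np.bincount(per_pixel_bin, minlength=len(names))
    return names, hist

# Toy stand-ins for the real tables (hypothetical data)
rgb_colors = np.array(['red', 'blue', 'red', 'green', 'blue'])
color_index = np.array([0, 1, 2, 2, 4, 3])
names, hist = color_histogram(color_index, rgb_colors)
```

np.histogram(per_pixel_bin, bins=np.arange(len(names) + 1)) would give the same counts, but np.bincount is the more direct tool for integer bin ids.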


How do I filter by area or eccentricity using skimage.measure.regionprops on a binary image in Python

I have a binary image of a road surface and I am trying to isolate the pothole only. Using skimage.measure.regionprops and skimage.measure.label I can produce a table of properties for different labels within the image.
How do I then filter using those values, for instance using area, axis length or eccentricity to turn off certain labels?
Input, labeled image and properties table
I am using Python 3.
I would use pandas together with skimage.measure.regionprops_table to get what you want:
import numpy as np
import pandas as pd
import imageio as iio
from skimage.measure import regionprops_table, label

image = np.asarray(iio.imread('path/to/image.png'))
labeled = label(image > 0) # ensure input is binary
data = regionprops_table(
    labeled,
    properties=('label', 'eccentricity'),
)
table = pd.DataFrame(data)
table_sorted_by_ecc = table.sort_values(
    by='eccentricity', ascending=False
)
# print e.g. the 10 most eccentric labels
print(table_sorted_by_ecc.iloc[:10])
If you then want to e.g. produce the label image with only the most eccentric label, you can do:
eccentric_label = table['label'].iloc[np.argmax(table['eccentricity'])]
labeled_ecc = np.where(labeled == eccentric_label, eccentric_label, 0)
You can also do more sophisticated things, e.g. make a label image with only labels above a certain eccentricity. Below, we use NumPy elementwise multiplication to produce an array that is the original label if that label has high eccentricity, or 0 otherwise. We then use the skimage.util.map_array function to map the original labels to either themselves or 0, again, depending on the eccentricity.
from skimage.util import map_array
ecc_threshold = 0.3
eccentric_labels = table['label'] * (table['eccentricity'] > ecc_threshold)
new_labels = map_array(
    labeled,
    np.asarray(table['label']),
    np.asarray(eccentric_labels),
)
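The same "map each label to itself or to 0" idea can also be sketched with plain NumPy for the area property, which is just the pixel count per label. This is only an illustration under the assumption that the label image holds non-negative integers with 0 as background. The function name keep_labels_by_area and the toy array are made up for the example.

```python
import numpy as np

def keep_labels_by_area(labeled, min_area):
    """Zero out every label whose pixel count is below min_area."""
    # Area of each label = number of pixels carrying that label
    areas = np.bincount(labeled.ravel())
    # Lookup table: label -> itself if large enough, else 0
    lut = np.arange(areas.size)
    lut[areas < min_area] = 0
    lut[0] = 0  # background always stays background
    # Fancy indexing applies the lookup table to every pixel
    return lut[labeled]

labeled = np.array([[1, 1, 0, 2],
                    [1, 1, 0, 0],
                    [0, 3, 3, 3]])
filtered = keep_labels_by_area(labeled, min_area=3)
```

Here label 2 covers a single pixel, so it is switched off, while labels 1 and 3 are kept.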

Color ggplot2 values by a sorted unique column values in Python

I came across the plotnine package, which can give the same results as R's ggplot2 in Python. It's pretty useful, but I have a problem coloring the "label_1" and "label_2" values by their unique "ID"s. The colors should be distinguishable; they could range from a bright shade of a color to its darkest shade. My code gives a result pretty close to what I want, but the colors are still not distinguishable enough. My graph is not using my colors at the moment, but I'd like to find out whether that could work too.
# Generating 100 random colors for 100 values
from plotnine import *
import random as random

colors = lambda n: list(map(lambda i: "#" + "%06x" % random.randint(0, 0xFFFFFF), range(n)))
colors = colors(100)

(ggplot(musk_df)
 + geom_point(aes(x='label_1', y='label_2', fill='id'), alpha=0.5)
 + labs(title='Graph', x='label_1', y='label_2')
 + scale_fill_manual(name='id', values=colors)
 + scale_fill_gradient(low="green", high="darkgreen"))
When using scale_fill_manual, you have to create a dict that associates each fill value with a color. For instance:
color_dict = {'ID1': 'green',
              'ID2': 'red',...
              'IDN': 'darkgreen'}
You have to replace the keys of the dictionary with the unique values of your id column.
That should do the trick.
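Such a dictionary does not have to be typed by hand. The following is a rough sketch, using only the standard library, that assigns each sorted unique id a shade between a light and a dark green. The helper name shade_dict, the two RGB endpoints and the sample ids are assumptions made for the illustration.

```python
def shade_dict(ids, light=(200, 255, 200), dark=(0, 85, 0)):
    """Map each sorted unique id to a hex color between light and dark."""
    unique_ids = sorted(set(ids))
    n = len(unique_ids)
    colors = {}
    for k, uid in enumerate(unique_ids):
        # Position of this id along the gradient, 0 = light, 1 = dark
        t = k / (n - 1) if n > 1 else 0.0
        rgb = [round(lo + t * (hi - lo)) for lo, hi in zip(light, dark)]
        colors[uid] = "#{:02x}{:02x}{:02x}".format(*rgb)
    return colors

# Hypothetical id column with repeats
color_dict = shade_dict(['ID3', 'ID1', 'ID2', 'ID1'])
```

The result could then be passed as scale_fill_manual(name='id', values=color_dict). With many ids the neighbouring shades become hard to tell apart, so a single-hue gradient is only practical for a modest number of categories.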

Detecting border pixel of a segmentation label

I can compute the SLIC boundaries using skimage as follows:
from skimage.segmentation import slic

def compute_superpixels(frame, num_pixels=100, std=5, iter_max=10,
                        connectivity=False, compactness=10.0):
    return slic(frame, n_segments=num_pixels, sigma=std, max_iter=iter_max,
                enforce_connectivity=connectivity, compactness=compactness)
Now, what I would like to do is get the indices of the pixels that form the boundary of each label. My idea was to get all pixels belonging to a given segment and then check which of them see a change of label in each of the two directions:
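def boundary_pixels(segments, index):
    # Get all pixels having a given index
    x, y = np.where(segments == index)

    right = x + 1
    # check we are in bounds
    right_mask = right < segments.shape[0]
    down = y + 1
    down_mask = down < segments.shape[1]
    left = x - 1
    left_mask = left >= 0
    up = y - 1
    up_mask = up >= 0

    # Labels of the in-bounds neighbours in each direction
    right_n = segments[right[right_mask], y[right_mask]]
    down_n = segments[x[down_mask], down[down_mask]]
    left_n = segments[left[left_mask], y[left_mask]]
    up_n = segments[x[up_mask], up[up_mask]]

    neighbors_1 = np.union1d(right_n, down_n)
    neighbors_2 = np.union1d(left_n, up_n)
    neighbors = np.union1d(neighbors_1, neighbors_2)
    # Not neighbours to ourselves
    neighbors = np.delete(neighbors, np.where(neighbors == index))
    return neighbors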
However, with this all I managed to do was to get the neighbours in the 4 directions of a given label. Can someone suggest some way to actually get all pixels on the border of the label.
I found an answer to my own question. The mark_boundaries in the skimage.segmentation package does exactly what I needed.
processed = mark_boundaries(frame, segments==some_segment)
Here frame is the current image frame and segments is the label array. some_segment is the integer label whose boundaries we are interested in.
You can make use of the find_contours function available in the skimage.measure module to find the coordinates of the pixels along the boundary. An example is available in the find_contours documentation. Next, you can check for a change in both directions as needed.
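If the pixel indices themselves are wanted rather than a marked-up image, a NumPy-only sketch of the "check the neighbours" idea could look like the following. It treats a pixel of the segment as a border pixel when at least one of its 4-neighbours lies outside the segment, and it assumes that pixels on the image edge count as border pixels too. The function name boundary_coords and the toy label array are illustrative.

```python
import numpy as np

def boundary_coords(segments, index):
    """Return (row, col) pairs of the border pixels of one segment."""
    mask = segments == index
    # Pad with False so pixels on the image edge have an "outside" neighbour
    padded = np.pad(mask, 1, mode='constant', constant_values=False)
    # A pixel is interior if all four 4-neighbours belong to the segment
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    border = mask & ~interior
    return np.argwhere(border)

# Toy label image: a 3x3 block of label 1 on a background of 0
segments = np.zeros((5, 5), dtype=int)
segments[1:4, 1:4] = 1
coords = boundary_coords(segments, 1)
```

For the 3x3 block only the centre pixel is interior, so the eight surrounding pixels are returned.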

Isolate greatest/smallest labeled patches from numpy array

I have a large NumPy array and labeled it using the connected-component labeling in SciPy. Now I want to create subsets of this array in which only the largest or smallest labeled patches remain.
Both extrema can of course occur several times.
import numpy
from scipy import ndimage
# Loaded in my image file here. Too big to paste
s = ndimage.generate_binary_structure(2,2) # iterate structure
labeled_array, numpatches = ndimage.label(array,s) # labeling
# get the area (nr. of pixels) of each labeled patch
sizes = ndimage.sum(array,labeled_array,range(1,numpatches+1))
# To get the indices of all the min/max patches. Is this the correct label id?
map = numpy.where(sizes==sizes.max())
mip = numpy.where(sizes==sizes.min())
# This here doesn't work! Now I want to create a copy of the array and fill only those cells
# inside the largest, respectively the smallest, labeled patches with values
feature = numpy.zeros_like(array, dtype=int)
feature[labeled_array == map] = 1
Can someone give me a hint on how to move on?
Here is the full code:
import numpy as np
import matplotlib.pyplot as pl
from scipy import ndimage

array = np.zeros((100, 100), dtype=np.uint8)
x = np.random.randint(0, 100, 2000)
y = np.random.randint(0, 100, 2000)
array[x, y] = 1
pl.imshow(array, cmap="gray", interpolation="nearest")

s = ndimage.generate_binary_structure(2,2) # iterate structure
labeled_array, numpatches = ndimage.label(array,s) # labeling
sizes = ndimage.sum(array,labeled_array,range(1,numpatches+1))

# Label ids of all the min/max patches; +1 because sizes[0] belongs to label 1
map = np.where(sizes==sizes.max())[0] + 1
mip = np.where(sizes==sizes.min())[0] + 1

# Mark only the cells inside the largest, respectively the smallest, labeled patches
max_index = np.zeros(numpatches + 1, np.uint8)
max_index[map] = 1
max_feature = max_index[labeled_array]

min_index = np.zeros(numpatches + 1, np.uint8)
min_index[mip] = 1
min_feature = min_index[labeled_array]
numpy.where returns a tuple, so take its first element with [0].
The size of label 1 is sizes[0], so you need to add 1 to the result of numpy.where.
To get a mask array with multiple labels, you can use labeled_array as the index into a label mask array.
First you need a labeled mask, given a mask with only 0 (background) and 1 (foreground):
labeled_mask, cc_num = ndimage.label(mask)
Then find the largest connected component:
largest_cc_mask = (labeled_mask == (np.bincount(labeled_mask.flat)[1:].argmax() + 1))
You can find the smallest object in the same way by using argmin() instead.
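Note that argmax() and argmin() return only the first extremum, while the question allows ties. A small sketch that keeps every tied patch is shown below. It assumes the array has already been labeled (for example with ndimage.label) and contains at least one patch. The function name extreme_patches and the toy array are made up for the example.

```python
import numpy as np

def extreme_patches(labeled):
    """Return boolean masks of all largest and all smallest patches."""
    # Pixel count per label; drop entry 0, which is the background
    sizes = np.bincount(labeled.ravel())[1:]
    labels = np.arange(1, sizes.size + 1)
    # Comparing against max/min keeps every label that ties for the extremum
    largest = labels[sizes == sizes.max()]
    smallest = labels[sizes == sizes.min()]
    return np.isin(labeled, largest), np.isin(labeled, smallest)

# Toy labeled array: labels 1 and 2 cover two pixels, labels 3 and 4 one each
labeled = np.array([[1, 1, 0, 2],
                    [0, 0, 0, 2],
                    [3, 0, 4, 0]])
max_mask, min_mask = extreme_patches(labeled)
```

Both masks have the shape of the input, so they can be used directly to index or multiply the original array.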

creating a color coded time chart using colorbar and colormaps in python

I'm trying to make a time tracking chart based on a daily time tracking file that I keep. I wrote code that crawls through my files and generates a few lists.
endTimes is a list of the times at which each activity ends, in minutes, going from 0 at midnight on the first day of the month to however many minutes there are in the month.
labels is a list of labels for the times listed in endTimes. It is one shorter than endTimes since the trackers don't have any data about what comes before minute 0. Most labels are repeats.
categories contains every unique value of labels, ordered by how well I regard that time.
I want to create a colorbar, or a stack of colorbars (one for each day), that depicts how I spend my time over a month, with a color associated with each label. Each value in categories will have a color associated with it: more blue for more good, more red for more bad. The list is already in the right order for the jet colormap, but I need to get discrete color values evenly spaced out for each value in categories. Then I figure the next step would be to convert that to a listed colormap to use for the colorbar, based on how the labels are associated with the categories.
I think this is the right way to do it, but I am not sure. I am not sure how to associate the labels with color values.
Here is the last part of my code so far. I found a function that makes a discrete colormap. It works, but it isn't what I am looking for and I am not sure what is happening.
Thanks for the help!
# now I need to develop the graph
import numpy as np
from matplotlib import pyplot,mpl
import matplotlib
from scipy import interpolate
from scipy import *
def contains(thelist,name):
    # checks if the current list of categories contains the one just read
    for val in thelist:
        if val == name:
            return True
    return False

def getCategories(lastFile):
    '''
    must determine the colors to use
    I would like to make a gradient so that the better the task, the closer to blue
    bad labels will receive colors closer to red
    read the last file given for the information on how I feel the order should be
    then just keep them in the order of how good they are in the tracker
    use a color range and develop discrete values for each category by evenly spacing them out
    any time not found should assume to be sleep
    sleep should be white
    '''
    tracker = open(lastFile+'.txt') # open the last file
    # find all the categories
    categories = []
    for line in tracker:
        pos = line.find(':') # does it have a : or a ?
        if pos==-1: pos=line.find('?')
        if pos != -1: # ignore if no : or ?
            name = line[0:pos].strip() # split at the : or ?
            if contains(categories,name)==False: # if the category is new
                categories.append(name) # make a new one
    return categories
# find good values in order of last day
for val in getCategories(lastDay):
if contains(labels,val):
# convert discrete colormap to listed colormap python
for ii,val in enumerate(labels):
if contains(categories,val)==False:
# create a figure
fig = pyplot.figure()
axes = []
for x in range(endTimes[-1]%(24*60)):
    ax = fig.add_axes([0.05, 0.65, 0.9, 0.15])
    axes.append(ax)
# figure out the colors to use
# stole this function to make a discrete colormap
def cmap_discretize(cmap, N):
    """Return a discrete colormap from the continuous colormap cmap.

    cmap: colormap instance, eg. cm.jet.
    N: Number of colors.

    Example
        x = resize(arange(100), (5,100))
        djet = cmap_discretize(cm.jet, 5)
        imshow(x, cmap=djet)
    """
    cdict = cmap._segmentdata.copy()
    # N colors
    colors_i = np.linspace(0,1.,N)
    # N+1 indices
    indices = np.linspace(0,1.,N+1)
    for key in ('red','green','blue'):
        # Find the N colors
        D = np.array(cdict[key])
        I = interpolate.interp1d(D[:,0], D[:,1])
        colors = I(colors_i)
        # Place these colors at the correct indices.
        A = np.zeros((N+1,3), float)
        A[:,0] = indices
        A[1:,1] = colors
        A[:-1,2] = colors
        # Create a tuple for the dictionary.
        L = []
        for l in A:
            L.append(tuple(l))
        cdict[key] = tuple(L)
    # Return colormap object.
    return matplotlib.colors.LinearSegmentedColormap('colormap',cdict,1024)
# jet colormap goes from blue to red (good to bad)
cmap = cmap_discretize(mpl.cm.jet, len(categories))
#norm = mpl.colors.Normalize(endTimes,cmap.N)
print endTimes
print labels
# make a color list by matching labels to a picture
#norm = mpl.colors.ListedColormap(colorList)
cb1 = mpl.colorbar.ColorbarBase(axes[0],cmap=cmap
It sounds like you want something like a stacked bar chart with the color values mapped to a given range? In that case, here's a rough example:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np

# Generate data....
intervals, weights = [], []
max_weight = 5
for _ in range(30):
    numtimes = np.random.randint(3, 15)
    times = np.random.randint(1, 24*60 - 1, numtimes)
    times = np.r_[0, times, 24*60]
    times.sort()  # the end times must be increasing so the intervals are positive
    intervals.append(np.diff(times) / 60.0)
    weights.append(max_weight * np.random.random(numtimes + 1))

# Plot the data as a stacked bar chart.
for i, (interval, weight) in enumerate(zip(intervals, weights)):
    # We need to calculate where the bottoms of the bars will be.
    bottoms = np.r_[0, np.cumsum(interval[:-1])]
    # We want the left edges to all be the same, but increase with each day.
    left = len(interval) * [i]
    patches = plt.bar(left, interval, bottom=bottoms, align='center')
    # And set the colors of each bar based on the weights
    for val, patch in zip(weight, patches):
        # We need to normalize the "weight" value between 0-1 to feed it into
        # a given colorbar to generate an actual color...
        color = cm.jet(float(val) / max_weight)
        patch.set_facecolor(color)

# Setting the ticks and labels manually...
plt.xticks(range(0, 30, 2), range(1, 31, 2))
plt.yticks(range(0, 24 + 4, 4),
           ['12am', '4am', '8am', '12pm', '4pm', '8pm', '12am'])
plt.show()
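In the example above the weights are random. To tie it back to the question, each label needs a value between 0 and 1 that reflects its rank in categories. The following is a minimal sketch of that mapping, assuming categories is ordered from best to worst as described in the question. The helper names category_positions and label_values, the sample category names and the default argument for unknown labels are illustrative assumptions.

```python
import numpy as np

def category_positions(categories):
    """Map each category to an evenly spaced value in [0, 1]."""
    positions = np.linspace(0.0, 1.0, len(categories))
    return dict(zip(categories, positions))

def label_values(labels, categories, default=None):
    """Look up the colormap position of every label."""
    lookup = category_positions(categories)
    # Labels missing from categories get the default (e.g. for "sleep")
    return [lookup.get(label, default) for label in labels]

# Hypothetical categories, ordered from best to worst
categories = ['work', 'exercise', 'tv']
labels = ['tv', 'work', 'tv', 'nap']
values = label_values(labels, categories)
```

Each non-missing value can then be turned into a color with a colormap call such as cm.jet(value), in the same way the weights are used in the bar chart code.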
