How to isolate the bars with cv2?

How do I isolate the bars with no white fill, in the image below that represents a bar chart?
I'm looking for a solution which will work for any variation of this image. You can assume that the format will be the same, but some features like the gaps in the gridlines/axis line might be in different places.
I've tried detecting various features of the bars, like the 26x3px ends of the bars, or the top-left and bottom-right corners. For example, using masks like the following for top-left:
bar_top_kernel: np.ndarray = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
], dtype='uint8')
or
bar_top_kernel: np.ndarray = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
], dtype='uint8')
But depending on what I try, I either get missed corners or false positives because of how the ends of the bars interact with the gridlines.
I've tried removing the gridlines first. But due to the interaction of the bar ends and the gridlines, I tend to get pieces left over which interfere with the feature detection.
I'm starting to think this might not be possible without some kind of ML approach, but I'm hoping someone will spot a clever trick to achieve this.

I ended up figuring out a simple trick:
1. Invert the colours.
2. Find and fill any contour whose bounding box is wider than a bar. This basically isolates the bars.
3. Find contours again and look for pairs of contours whose centres of mass are close together. This identifies the small bars that are split in two by the tick marks.
4. Fill the rectangle that encompasses each pair to fix the splits.

Related

Is there a way to plot matrix elements like a heat map?

I'm writing a program that uses a matrix of 0s and 1s to represent a galaxy: the 0s are the void and the 1s are solar systems (for now; later I intend to add more elements). Is there a way to plot these elements somewhat like a heat map (1 = red and 0 = blue)? I'd appreciate any ideas or suggestions, including a better way to pose the problem. Thanks in advance!

You can make heat maps with plt.imshow; you can read more about it here: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html
import numpy as np
import matplotlib.pyplot as plt
matrix = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
plt.imshow(matrix, cmap='hot', interpolation='nearest')
plt.show()

My go-to is Plotly for any kind of figure:
import random
import numpy as np
import plotly.express as px
matrix = np.reshape([random.choice([0, 1]) for n in range(10000)], (100, 100))
fig = px.imshow(matrix,
                color_continuous_scale=['blue', 'red'])
fig.show()

Estimate rigid transformation between two numpy array

I have a quick question about estimating a rigid transformation between two 2D numpy arrays. I have tried several methods from OpenCV, but none returned a useful result. I suspect the problem is not too complicated, so maybe I am looking in the wrong direction and would appreciate your help.
So I have two 2D numpy arrays of the same size filled with 0 and 1, like this one:
[[0, 0, 0, 1, 0, 0, 0, 1, 0],
[0, 1, 0, 0, 0, 1, 0, 0, 1],
[1, 0, 0, 1, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 1, 0]]
A 1 means that there is a point at location (x, y), and a 0 means there is nothing.
So I can treat this matrix as a cloud of points that can be drawn in a graph.
I have a second array of the same size as the first, but where the 1 elements have been translated (all the 1 elements are translated in the same direction and by an equal number of steps). This means that some of the 1 elements will fall outside the array, while other 1 elements will appear in the free space left by the translation. For example, the second matrix can look like this:
[[1, 0, 1, 0, 1, 0, 1, 0, 1],
[0, 0, 0, 1, 0, 0, 0, 1, 0],
[0, 1, 0, 0, 0, 1, 0, 0, 1],
[1, 0, 0, 1, 0, 0, 1, 0, 0]]
So the first matrix has been translated down by 1 row. The first row is new, and the three rows below it are common to the two matrices. The last row of the first matrix disappears in the second because of the translation. The translation can be in any direction, but it is a rigid transformation (it preserves the distances between points).
Is there a clever method to estimate the best warp matrix between these two arrays?
Thanks a lot for your help
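One possible approach, sketched here under the assumption that the motion is a pure translation by a whole number of rows and columns (no rotation): try every shift and keep the one for which the most 1s of the first array land on 1s of the second. The function name estimate_translation is hypothetical, and the exhaustive search is only practical for small arrays.

```python
import numpy as np


def estimate_translation(a, b):
    """Return the (dy, dx) shift of `a` that best overlaps `b` (hypothetical helper).

    Assumes a pure integer translation; rotation is not handled.
    """
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    h, w = a.shape
    best_score, best_shift = -1, (0, 0)
    for dy in range(-h + 1, h):
        for dx in range(-w + 1, w):
            # Part of `a` that is still inside the frame after the shift,
            # and the part of `b` it lands on.
            a_part = a[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
            b_part = b[max(0, dy):min(h, h + dy), max(0, dx):min(w, w + dx)]
            score = np.count_nonzero(a_part & b_part)  # 1s that coincide
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


first = [[0, 0, 0, 1, 0, 0, 0, 1, 0],
         [0, 1, 0, 0, 0, 1, 0, 0, 1],
         [1, 0, 0, 1, 0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0, 0, 0, 1, 0]]
second = [[1, 0, 1, 0, 1, 0, 1, 0, 1],
          [0, 0, 0, 1, 0, 0, 0, 1, 0],
          [0, 1, 0, 0, 0, 1, 0, 0, 1],
          [1, 0, 0, 1, 0, 0, 1, 0, 0]]
print(estimate_translation(first, second))  # (1, 0): one row down, no column shift
```

For use with cv2.warpAffine, such a shift would correspond to the 2x3 matrix [[1, 0, dx], [0, 1, dy]]. For larger arrays, an FFT-based cross-correlation computes the same overlap counts much faster than this double loop.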

finding continuous signal in noisy binary time series

Suppose I have a time series such as:
[1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0 , 1, 1, 1, 1]
and I know there is some noise in the signal. I want to remove the noise as best I can and still output a binary signal. The above example would turn into something like:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 , 1, 1, 1, 1]
I have implemented a naive rule-based approach where I iterate through the values and have some minimum amount of 1s or 0s I need to "swap" the signal.
It seems like there must be a better way to do it. A lot of the results from googling around give non-binary output. Is there some scipy function I could leverage for this?

Two similar functions can help you: scipy.signal.argrelmin and scipy.signal.argrelmax. They search for local minima/maxima in discrete arrays. Pass your array, and the neighbour search radius as order. Your problem can be solved by combining them:
>>> import numpy as np
>>> from scipy import signal
>>> a = np.asarray([1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1], int)
>>> signal.argrelmin(a, order=3)
(array([4], dtype=int32),)
>>> signal.argrelmax(a, order=3)
(array([15], dtype=int32),)
Then you can just replace these elements.
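The replacement step can be sketched as follows. The helper name flip_isolated_glitches is hypothetical; it simply sets the detected minima to 1 and the detected maxima to 0.

```python
import numpy as np
from scipy import signal


def flip_isolated_glitches(a, order=3):
    """Flip samples that are strict local extrema within `order` neighbours."""
    a = np.asarray(a, dtype=int)
    cleaned = a.copy()
    (minima,) = signal.argrelmin(a, order=order)  # lone 0s surrounded by 1s
    (maxima,) = signal.argrelmax(a, order=order)  # lone 1s surrounded by 0s
    cleaned[minima] = 1
    cleaned[maxima] = 0
    return cleaned


noisy = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1]
print(flip_isolated_glitches(noisy).tolist())
```

Both functions compare strictly, so this only catches single-sample glitches; a noisy run of two or more equal samples is not reported as an extremum and would need a different filter.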

Somewhat Randomly create 3D points given 2 images

The goal is to create a set of n 3D coordinates (seeds) from 2 images. n could be anywhere from 100 to 1000 points.
I have 2 pure black and white images whose heights are the same and whose widths vary. The images can be as large as 1000x1000 pixels. I read them into numpy arrays and flattened the RGB codes to 1s (black) and 0s (white).
Here is an example from processing 2 very small images:
In [6]: img1
Out[6]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=uint8)
In [8]: img2
Out[8]:
array([[0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
       [0, 0, 0, 1, 1, 0, 0, 1, 0, 0],
       [0, 0, 1, 1, 0, 0, 0, 1, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
       [1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
       [0, 0, 1, 0, 0, 0, 1, 1, 0, 0],
       [0, 0, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]], dtype=uint8)
Next, I create an index array to map all locations of black pixels for each image like so:
In [10]: np.transpose(np.nonzero(img1))
Out[10]:
array([[0, 0],
       [0, 1],
       [0, 2],
       [0, 3],
       [0, 4],
       [0, 5],
       [0, 6],
...
I then want to extend each 2D black pixel of each image into 3D space. Where those 3D points intersect, I want to randomly grab n 3D points (seeds). As an enhancement, it would be even better if I could disperse these 3D points somewhat evenly in the 3D space, to avoid 'clustering' of points in areas of greater black pixel density. But I haven't been able to wrap my head around that process yet.
Here's a visualization of the set up:
What I've tried below seems to work on very small images but slows to a halt as the images get bigger. The bottleneck seems to occur where I assign common_points.
img1_array = process_image("Images/nhx.jpg", nheight)
img2_array = process_image("Images/ku.jpg", nheight)
img1_black = get_black_pixels(img1_array)
img2_black = get_black_pixels(img2_array)
# create all img1 3D points:
img1_3d = []
z1 = len(img2_array[1])  # number of img2 columns
for pixel in img1_black:
    for i in range(z1):
        img1_3d.append((pixel[0], pixel[1], i))  # (img1_row, img1_col, img2_col)
# create all img2 3D points:
img2_3d = []
z2 = len(img1_array[1])  # number of img1 columns
for pixel in img2_black:
    for i in range(z2):
        img2_3d.append((pixel[0], pixel[1], i))  # (img2_row, img2_col, img1_col)
# get all common 3D points
common_points = [x for x in img1_3d if x in img2_3d]
# get num_seeds number of random common_points
seed_indices = np.random.choice(len(common_points), num_seeds, replace=False)
seeds = []
for index_num in seed_indices:
    seeds.append(common_points[index_num])
Questions:
How can I avoid the bottleneck? I haven't been able to come up with a numpy solution.
Is there a better solution, in general, to how I am coding this?
Any thoughts on how I could somewhat evenly disperse seeds?
Update Edit:
Based on Luke's algorithm, I've come up with the following working code. Is this the correct implementation? Could this be improved upon?
img1_array = process_image("Images/John.JPG", 500)
img2_array = process_image("Images/Ryan.jpg", 500)
img1_black = get_black_pixels(img1_array)
# img2_black = get_black_pixels(img2_array)
density = 0.00001
seeds = []
for img1_pixel in img1_black:
    row = img1_pixel[0]
    img2_row = np.array(np.nonzero(img2_array[row]))  # array of column numbers where there is a black pixel
    if img2_row.size:  # np.any(img2_row) would skip a row whose only black pixel is in column 0
        for img2_col in img2_row[0]:
            if np.random.uniform(0, 1) < density:
                seeds.append([row, img1_pixel[1], img2_col])

The bottleneck is because you're comparing every 3D point in the "apple" shaded area to every 3D point in the "orange" shaded area, which is a huge number of comparisons. You could speed it up by a factor of imgHeight by only looking at points in the same row. You could also speed it up by storing img2_3d as a set instead of a list, because calling "in" on a set is much faster (it's an O(1) operation instead of an O(n) operation).
However, it's better to completely avoid making lists of all 3D points. Here's one solution:
1. Choose an arbitrary density parameter, call it Density. Try Density = 0.10 to fill in 10% of the intersection points.
2. For each black pixel in Apple, loop through the black pixels in the same row of Orange. If (random.uniform(0,1) < Density), create a 3D point at (applex, orangex, row) or whatever the correct arrangement is for your coordinate system.
That algorithm will sample evenly, so 3D areas with more black will have more samples. If I understand your last question, you want to sample more densely in areas with less black (though I'm not sure why). To do that you could:
1. Do a Gaussian blur of the inverse of your two images (OpenCV has functions for this), multiply each by 0.9, and add 0.1. You now have an image that has a higher value where the image is more white.
2. Do the algorithm above, but for each pixel pair in step 2, set Density = blurredOrangePixel * blurredApplePixel. Thus, your selection density will be higher in white regions.
I would try the basic algorithm first though; I think it will look better.
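A minimal numpy sketch of the basic algorithm (uniform Density, no blur weighting) might look like the following. The function name sample_seeds and the (row, img1_col, img2_col) ordering are assumptions made for illustration; it also assumes both arrays have the same number of rows and use 1 for black.

```python
import numpy as np


def sample_seeds(img1, img2, density, rng=None):
    """Randomly keep about a `density` fraction of the 3D intersection points."""
    rng = np.random.default_rng() if rng is None else rng
    img1 = np.asarray(img1)
    img2 = np.asarray(img2)
    seeds = []
    for row in range(img1.shape[0]):
        cols1 = np.flatnonzero(img1[row])  # black pixels of img1 in this row
        cols2 = np.flatnonzero(img2[row])  # black pixels of img2 in this row
        if cols1.size == 0 or cols2.size == 0:
            continue
        # One random draw per (img1_col, img2_col) pair in this row.
        keep = rng.random((cols1.size, cols2.size)) < density
        i1, i2 = np.nonzero(keep)
        rows = np.full(i1.size, row)
        seeds.append(np.column_stack((rows, cols1[i1], cols2[i2])))
    if not seeds:
        return np.empty((0, 3), dtype=int)
    return np.vstack(seeds)


small1 = np.array([[1, 0, 1], [0, 0, 0], [1, 1, 1]])
small2 = np.array([[0, 1], [1, 1], [1, 0]])
all_points = sample_seeds(small1, small2, density=1.0)
print(all_points.shape)  # (5, 3): density=1.0 keeps every intersection point
```

The full list of 3D points is never built; only one row's pairs are held in memory at a time. The number of seeds returned is random, so to get exactly n seeds one option is to pick a density that yields somewhat more than n and then subsample.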

Smoothing a list with matplotlib

I have a long list of reward signals (-1 for loss, 0 for tie, and +1 for win). I want to average these signals in "windows" and then smooth this resulting curve to show progress. How do I do this with matplotlib/scipy?
My code looks like this:
#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
y = np.array([-1, 1, 0, -1, -1, -1, 1, 1, 1, 1, 0, 0, 0, 1, 1, -1, 1, 1, -1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, -1, 1, 1, 0, 1, 1, 0, 1, -1, -1, 1, -1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, -1, 0, 1, 1, 1, -1, 1, 1, 1, 1, 0, -1, 0, 1, 0, 1, 1, 1, -1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, -1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
)
x = np.arange(len(y))
plt.plot(x,y)
plt.show()
I tried solutions from similar questions, like this one, which recommends using a spline, but when applied to my data, that consumes all my memory and crashes my machine.

At some point I found this somewhere. I am having trouble finding the source, but I use it for convolving 1D ndarrays with various windows, and it should solve your problem.
import numpy as np

def smooth(x, window_len=11, window='hanning'):
    if x.ndim != 1:
        raise ValueError("smooth only accepts 1 dimension arrays.")
    if x.size < window_len:
        raise ValueError("Input vector needs to be bigger than window size.")
    if window_len < 3:
        return x
    if window not in ['flat', 'hanning', 'hamming', 'bartlett', 'blackman']:
        raise ValueError("Window is one of 'flat', 'hanning', 'hamming', 'bartlett', 'blackman'")
    # pad both ends with reflected copies of the signal to reduce edge effects
    s = np.r_[x[window_len-1:0:-1], x, x[-1:-window_len:-1]]
    if window == 'flat':  # moving average
        w = np.ones(window_len, 'd')
    else:
        w = getattr(np, window)(window_len)  # e.g. np.hanning(window_len)
    y = np.convolve(w / w.sum(), s, mode='valid')
    return y
So for example, with your data you'd just do:
plt.plot(smooth(y))
plt.show()

The answer you linked recommends using scipy.interpolate.spline, which constructs the B-spline representation using full matrices. This is why it consumes so much memory. If smoothing splines are what you're after, at the moment you're better off using scipy.interpolate.UnivariateSpline, which should have a saner memory footprint.
If you need window averages/convolutions, check out numpy.convolve and/or the convolution/window functionality in scipy.signal.
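As a rough illustration of the window-average route, here is a minimal sketch using numpy.convolve. The helper name moving_average and the window length of 5 are arbitrary choices for the example, not something taken from the question.

```python
import numpy as np


def moving_average(y, window_len):
    """Average `y` over a sliding window of `window_len` samples."""
    y = np.asarray(y, dtype=float)
    kernel = np.ones(window_len) / window_len
    # mode='valid' keeps only the positions where the window fits entirely,
    # so the result has len(y) - window_len + 1 samples.
    return np.convolve(y, kernel, mode='valid')


rewards = np.array([-1, 1, 0, -1, -1, -1, 1, 1, 1, 1])
print(moving_average(rewards, 5))
```

A larger window gives a smoother curve at the cost of a shorter output and a slower response to changes in the win rate.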
