I have an array X of size 100x2 and the corresponding binary labels in a vector y, with values in {1, -1}, of length 100. I would like to make a scatter plot of the data such that the two features are on the axes and the color of each data point corresponds to its label, e.g. red for -1 and yellow for 1.
I've been looking into matplotlib and its scatter function, but as far as I can tell it only accepts a single feature vector and its labels.
I would be grateful for any help.
You can do this easily using seaborn (or matplotlib as well). Below is the code.
I am creating a random array of size 100x2 and calling it X. I am creating a random array of 0s and 1s of size 100x1 and calling it Y
>> import numpy as np
>> X = np.random.randint(100, size=(100, 2))
>> Y = np.random.choice([0, 1], size=(100))
>> X
array([[11, 47],
[23, 2],
[91, 14],
[65, 32],
[81, 78],
....
>> Y
array([0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1,
0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0,
1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1,
0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1])
Use Seaborn scatterplot
import seaborn as sns
sns.scatterplot(x=X[:,0], y=X[:,1], hue=Y)
[Image: output of sns.scatterplot]
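For the matplotlib route mentioned above, here is a minimal sketch that keeps the {-1, 1} labels from the question and maps them to red and yellow. The helper name scatter_by_label and the random test data are my own assumptions, not something from the question; it just calls ax.scatter once per label with a boolean mask.

```python
import numpy as np
import matplotlib.pyplot as plt


def scatter_by_label(X, y, colors=None):
    """Scatter the two columns of X, coloring each point by its label in y.

    `scatter_by_label` and the default color mapping are illustrative
    assumptions; adjust them to your own labels.
    """
    if colors is None:
        colors = {-1: "red", 1: "yellow"}
    X = np.asarray(X)
    y = np.asarray(y)
    fig, ax = plt.subplots()
    for label, color in colors.items():
        mask = y == label  # rows of X that carry this label
        ax.scatter(X[mask, 0], X[mask, 1], c=color,
                   edgecolors="black", label=str(label))
    ax.set_xlabel("feature 1")
    ax.set_ylabel("feature 2")
    ax.legend(title="label")
    return fig, ax


# Hypothetical data with the same shapes as in the question
rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=(100, 2))
y = rng.choice([-1, 1], size=100)
fig, ax = scatter_by_label(X, y)
```

Call plt.show() (or fig.savefig(...)) afterwards to see the figure.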
How do I isolate the bars with no white fill, in the image below that represents a bar chart?
I'm looking for a solution which will work for any variation of this image. You can assume that the format will be the same, but some features like the gaps in the gridlines/axis line might be in different places.
I've tried detecting various features of the bars, like the 26x3px ends of the bars, or the top-left and bottom-right corners. For example, using masks like the following for top-left:
bar_top_kernel: Numpy = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
], dtype='uint8')
or
bar_top_kernel: Numpy = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
], dtype='uint8')
But depending on what I try, I either get missed corners or false positives because of how the ends of the bars interact with the gridlines.
I've tried removing the gridlines first. But due to the interaction of the bar ends and the gridlines, I tend to get pieces left over which interfere with the feature detection.
I'm starting to think this might not be possible without some kind of ML approach, but I'm hoping someone will spot a clever trick to achieve this.
I ended up figuring out a simple trick:
1. Invert the colours.
2. Find and fill any contour whose bounding-box width is greater than the bar width. This basically isolates the bars.
3. Find contours again and look for pairs of contours whose centres of mass are close together. These are the small bars that are split in two by the tick marks.
4. Fill the rectangle that encompasses each pair to fix the splits.
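A rough sketch of the last part of the trick (pairing pieces whose centres of mass are close and filling the rectangle around each pair). Note that this is not the original code: it uses connected components from scipy.ndimage in place of contours, and the function name merge_split_blobs and the max_dist threshold are assumptions you would tune to your image.

```python
import numpy as np
from scipy import ndimage


def merge_split_blobs(mask, max_dist=4.0):
    """Fill the rectangle spanning any two blobs whose centres are close.

    `mask` is a binary image (non-zero = bar pixels). `max_dist` is a
    hypothetical threshold in pixels.
    """
    labeled, n = ndimage.label(mask)
    boxes = ndimage.find_objects(labeled)  # one (row_slice, col_slice) per blob
    centres = ndimage.center_of_mass(mask, labeled, list(range(1, n + 1)))
    out = mask.copy()
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.hypot(centres[i][0] - centres[j][0],
                            centres[i][1] - centres[j][1])
            if dist <= max_dist:
                # Bounding rectangle that covers both blobs
                r0 = min(boxes[i][0].start, boxes[j][0].start)
                r1 = max(boxes[i][0].stop, boxes[j][0].stop)
                c0 = min(boxes[i][1].start, boxes[j][1].start)
                c1 = max(boxes[i][1].stop, boxes[j][1].stop)
                out[r0:r1, c0:c1] = 1
    return out


# Toy example: one bar split in two by a 1-pixel gap, plus a distant blob
demo = np.zeros((14, 14), dtype=np.uint8)
demo[2:5, 2:4] = 1
demo[2:5, 5:7] = 1
demo[10:12, 10:12] = 1
merged = merge_split_blobs(demo, max_dist=4.0)
```

The pairwise loop is quadratic in the number of blobs, which should be fine for a chart with a handful of bars.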
Suppose I have a time series such as:
[1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0 , 1, 1, 1, 1]
and I know there is some noise in the signal. I want to remove the noise as best I can and still output a binary signal. The above example would turn into something like:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 , 1, 1, 1, 1]
I have implemented a naive rule-based approach where I iterate through the values and have some minimum amount of 1s or 0s I need to "swap" the signal.
It seems like there must be a better way to do it. A lot of the results from googling around give non-binary output. Is there some scipy function I could leverage for this?
There are two similar functions that can help you: scipy.signal.argrelmin and scipy.signal.argrelmax. They search for local minima/maxima in discrete arrays. Pass your array, and the neighbour search radius as order. Your problem can be solved by combining them:
>>> import numpy as np
>>> from scipy import signal
>>> a = np.asarray([1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0 , 1, 1, 1, 1], int)
>>> signal.argrelmin(a, order=3)
(array([4], dtype=int32),)
>>> signal.argrelmax(a, order=3)
(array([15], dtype=int32),)
Then you can just flip the elements at these indices: the isolated 0 becomes 1 and the isolated 1 becomes 0.
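Putting that replacement step into code, a minimal sketch could look like this. The variable names are mine, and order=3 is simply the value used above.

```python
import numpy as np
from scipy import signal

a = np.asarray([1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0,
                1, 0, 0, 0, 1, 1, 1, 1], int)

cleaned = a.copy()
# An isolated 0 is a strict local minimum: flip it to 1
cleaned[signal.argrelmin(a, order=3)] = 1
# An isolated 1 is a strict local maximum: flip it to 0
cleaned[signal.argrelmax(a, order=3)] = 0
```

One caveat: both functions use a strict comparison by default, so a noisy run of two or more equal samples (e.g. 1, 0, 0, 1) is not reported as an extremum and will be left as it is.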
So I am making a program that reads in multiple two dimensional lists and plots them as step graph functions. I want to print out each set of graphs side by side like so (I made the graphs different colors just to differentiate the two):
[Image: desired output]
However my code right now makes these two sets overlap each other instead, like so:
[Image: actual output]
I believe it might have something to do with my "t" variable in plotPoints but I am not sure what I need to do. Any help would be greatly appreciated.
# suppress warning message
import warnings; warnings.simplefilter("ignore")
# extension libraries
import matplotlib.pyplot as plt
import numpy as np
def plotPoints(bits, color):
    for i in range(len(bits)):
        data = np.repeat(bits[i], 2)
        t = 0.5 * np.arange(len(data))
        plt.step(t, data + i * 3, linewidth=1.5, where='post', color=color)
        # Labels the graphs with binary sequence
        for tbit, bit in enumerate(bits[i]):
            plt.text(tbit + 0.3, 0.1 + i * 3, str(bit), fontsize=6, color=color)
def main():
    plt.ylim([-1, 32])
    set1 = [[0, 0, 0, 1, 1, 0, 1, 1], [0, 0, 1, 0, 1, 1, 0, 0], [1, 1, 0, 0, 1, 0, 0, 0]]
    set2 = [[1, 1, 1, 0, 0, 1, 0, 0], [1, 1, 0, 1, 0, 0, 1, 1], [0, 0, 1, 1, 0, 1, 1, 1]]
    plotPoints(set1, 'g')
    plotPoints(set2, 'b')
    # removes the built in graph axes and prints line every iteration
    plt.gca().axis('off')
    plt.ylim([-1, 10])
    plt.show()
main()
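You can add an offset to t (and to the x position of the text labels) for the second set, so that it is drawn to the right of the first one.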
import matplotlib.pyplot as plt
import numpy as np
def plotPoints(bits, color, offset=0):
    for i in range(len(bits)):
        data = np.repeat(bits[i], 2)
        # shift the whole set to the right by `offset`
        t = 0.5 * np.arange(len(data)) + offset
        plt.step(t, data + i * 3, linewidth=1.5, where='post', color=color)
        # Labels the graphs with binary sequence
        for tbit, bit in enumerate(bits[i]):
            plt.text(tbit + 0.3 + offset, 0.1 + i * 3, str(bit), fontsize=6, color=color)
def main():
    set1 = [[0, 0, 0, 1, 1, 0, 1, 1], [0, 0, 1, 0, 1, 1, 0, 0], [1, 1, 0, 0, 1, 0, 0, 0]]
    set2 = [[1, 1, 1, 0, 0, 1, 0, 0], [1, 1, 0, 1, 0, 0, 1, 1], [0, 0, 1, 1, 0, 1, 1, 1]]
    plotPoints(set1, 'g')
    plotPoints(set2, 'b', offset=len(set1[0]))
    # removes the built in graph axes and prints line every iteration
    plt.gca().axis('off')
    plt.ylim([-1, 10])
    plt.show()
main()
Somewhat Randomly create 3D points given 2 images
The goal is to create a set of n 3D coordinates (seeds) from 2 images. n could be anywhere from 100 to 1000 points.
I have 2 pure black and white images whose heights are the same and whose widths vary. The images can be as big as 1000x1000 pixels. I read them into numpy arrays and flattened the RGB codes to 1s (black) and 0s (white).
Here is example from processing 2 very small images:
In [6]: img1
Out[6]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=uint8)
In [8]: img2
Out[8]:
array([[0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 0, 0, 0, 1, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 1, 0, 0, 0, 1, 1, 0, 0],
[0, 0, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]], dtype=uint8)
Next, I create an index array to map all locations of black pixels for each image like so:
In [10]: np.transpose(np.nonzero(img1))
Out[10]:
array([[0, 0],
[0, 1],
[0, 2],
[0, 3],
[0, 4],
[0, 5],
[0, 6],
...
I then want to extend each 2D black pixel of each image into 3D space. Where those 3D points intersect, I want to randomly grab n 3D points (seeds). Furthermore, as an enhancement, it would be even better if I could disperse these 3D points somewhat evenly in the 3D space, to avoid 'clustering' of points in areas of greater black pixel density. But I haven't been able to wrap my head around that process yet.
Here's a visualization of the set up:
What I've tried below seems to work on very small images but slows to a halt as the images get bigger. The bottleneck seems to occur where I assign common_points.
img1_array = process_image("Images/nhx.jpg", nheight)
img2_array = process_image("Images/ku.jpg", nheight)
img1_black = get_black_pixels(img1_array)
img2_black = get_black_pixels(img2_array)
# create all img1 3D points:
img1_3d = []
z1 = len(img2_array[1]) # number of img2 columns
for pixel in img1_black:
    for i in range(z1):
        img1_3d.append((pixel[0], pixel[1], i)) # (img1_row, img1_col, img2_col)
# create all img2 3D points:
img2_3d = []
z2 = len(img1_array[1]) # number of img1 columns
for pixel in img2_black:
    for i in range(z2):
        img2_3d.append((pixel[0], pixel[1], i)) # (img2_row, img2_col, img1_col)
# get all common 3D points
common_points = [x for x in img1_3d if x in img2_3d]
# get num_seeds number of random common_points
seed_indices = np.random.choice(len(common_points), num_seeds, replace=False)
seeds = []
for index_num in seed_indices:
    seeds.append(common_points[index_num])
Questions:
How can I avoid the bottleneck? I haven't been able to come up with a numpy solution.
Is there a better solution, in general, to how I am coding this?
Any thoughts on how I could somewhat evenly disperse seeds?
Update Edit:
Based on Luke's algorithm, I've come up with the following working code. Is this the correct implementation? Could this be improved upon?
img1_array = process_image("Images/John.JPG", 500)
img2_array = process_image("Images/Ryan.jpg", 500)
img1_black = get_black_pixels(img1_array)
# img2_black = get_black_pixels(img2_array)
density = 0.00001
seeds = []
for img1_pixel in img1_black:
    row = img1_pixel[0]
    img2_row = np.array(np.nonzero(img2_array[row])) # array of column numbers where there is a black pixel
    if np.any(img2_row):
        for img2_col in img2_row[0]:
            if np.random.uniform(0, 1) < density:
                seeds.append([row, img1_pixel[1], img2_col])
The bottleneck is because you're comparing every 3D point in the "apple" shaded area to every 3D point in the "orange" shaded area, which is a huge number of comparisons. You could speed it up by a factor of imgHeight by only looking at points in the same row. You could also speed it up by storing img2_3d as a set instead of a list, because calling "in" on a set is much faster (it's an O(1) operation instead of an O(n) operation).
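To illustrate the set point, here is a tiny sketch with made-up tuples (the values are hypothetical and only stand in for your real point lists). It assumes both lists use the same coordinate order.

```python
# Hypothetical toy points standing in for the real img1_3d / img2_3d lists
img1_3d = [(0, 0, 0), (0, 1, 2), (1, 2, 3), (2, 2, 2)]
img2_3d = [(0, 1, 2), (2, 2, 2), (3, 3, 3)]

# Build the set once; each membership test is then O(1) on average
img2_lookup = set(img2_3d)
common_points = [p for p in img1_3d if p in img2_lookup]
```

This keeps the order of img1_3d. If you don't care about order, set(img1_3d) & set(img2_3d) does the same thing in one expression.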
However, it's better to completely avoid making lists of all 3D points. Here's one solution:
1. Choose an arbitrary density parameter, call it Density. Try Density = 0.10 to fill in 10% of the intersection points.
2. For each black pixel in Apple, loop through the black pixels in the same row of Orange. If (random.uniform(0,1) < Density), create a 3D point at (applex, orangex, row) or whatever the correct arrangement is for your coordinate system.
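A minimal numpy sketch of the algorithm above, looping over rows only and drawing the random numbers for all pixel pairs of a row at once. The function name sample_seeds, the (row, img1_col, img2_col) output order and the toy images are assumptions on my part, and it assumes both images have the same height with 1 meaning black.

```python
import numpy as np


def sample_seeds(img1, img2, density=0.10, rng=None):
    """Keep each (row, img1_col, img2_col) intersection with probability `density`."""
    rng = np.random.default_rng() if rng is None else rng
    seeds = []
    for row in range(img1.shape[0]):
        cols1 = np.flatnonzero(img1[row])  # black columns of img1 in this row
        cols2 = np.flatnonzero(img2[row])  # black columns of img2 in this row
        if cols1.size == 0 or cols2.size == 0:
            continue
        # One random draw per (img1 pixel, img2 pixel) pair in this row
        keep = rng.random((cols1.size, cols2.size)) < density
        i, j = np.nonzero(keep)
        rows = np.full(i.size, row)
        seeds.append(np.column_stack([rows, cols1[i], cols2[j]]))
    if not seeds:
        return np.empty((0, 3), dtype=int)
    return np.vstack(seeds)


# Hypothetical toy images: same height, different widths
demo_rng = np.random.default_rng(0)
demo_img1 = (demo_rng.random((20, 15)) < 0.4).astype(np.uint8)
demo_img2 = (demo_rng.random((20, 25)) < 0.4).astype(np.uint8)
demo_seeds = sample_seeds(demo_img1, demo_img2, density=0.10, rng=demo_rng)
```

The number of seeds is random. If you need exactly n, one option is to oversample a bit and then pick n rows of the result with rng.choice(len(demo_seeds), n, replace=False).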
That algorithm will sample evenly, so 3D areas with more black will have more samples. If I understand your last question, you want to sample more densely in areas with less black (though I'm not sure why). To do that you could:
1. Do a Gaussian blur of the inverse of your two images (OpenCV has functions for this), then multiply each by 0.9 and add 0.1. You now have images that have a higher value where the original is more white.
2. Do the algorithm above, but for each pixel pair in step 2, set Density = blurredOrangePixel * blurredApplePixel. Thus, your selection density will be higher in white regions.
I would try the basic algorithm first though; I think it will look better.
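If you do want to try the blurred variant, here is a rough sketch. It swaps in scipy.ndimage.gaussian_filter for the OpenCV blur, and the names whiteness_weight and sample_seeds_weighted as well as the sigma value are assumptions to tune, not anything from the question.

```python
import numpy as np
from scipy import ndimage


def whiteness_weight(img, sigma=2.0):
    """Blur the inverted image and rescale to [0.1, 1.0]; higher = more white nearby."""
    inverse = 1.0 - img.astype(float)  # assumes img holds 1 for black, 0 for white
    return 0.9 * ndimage.gaussian_filter(inverse, sigma=sigma) + 0.1


def sample_seeds_weighted(img1, img2, sigma=2.0, rng=None):
    """Like the basic algorithm, but with a per-pair density from the blurred images."""
    rng = np.random.default_rng() if rng is None else rng
    w1 = whiteness_weight(img1, sigma)
    w2 = whiteness_weight(img2, sigma)
    seeds = []
    for row in range(img1.shape[0]):
        cols1 = np.flatnonzero(img1[row])
        cols2 = np.flatnonzero(img2[row])
        if cols1.size == 0 or cols2.size == 0:
            continue
        # Density for each pixel pair = product of the two blurred values
        density = np.outer(w1[row, cols1], w2[row, cols2])
        keep = rng.random(density.shape) < density
        i, j = np.nonzero(keep)
        seeds.append(np.column_stack([np.full(i.size, row), cols1[i], cols2[j]]))
    if not seeds:
        return np.empty((0, 3), dtype=int)
    return np.vstack(seeds)
```

As written the densities lie between 0.01 and 1, so this keeps a lot of points on large images. You would probably multiply density by a small global factor to get down to a few hundred seeds.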