How to identify cells representing the contour of a cluster in a matrix (Python)

I have a binary matrix [0/1] where 1 marks a cluster. I would like to identify the cells (in terms of position) representing the contour of this cluster.
> test
X0 X0.1 X0.2 X0.3 X0.4 X0.5 X0.6
1 0 0 0 0 0 0 0
2 0 0 1 1 1 0 0
3 0 0 1 1 1 1 0
4 0 0 0 1 1 1 0
5 0 0 0 0 1 1 0
6 0 0 0 0 1 0 0
7 0 0 0 0 1 0 0
8 1 1 0 0 0 0 0
9 1 1 1 0 0 0 0
10 1 1 1 0 0 0 0
11 0 1 0 0 0 0 0
Thanks
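For reference, a minimal sketch of one possible approach, assuming the matrix is loaded into a NumPy array (the name mask and the small test data below are illustrative only): a contour cell is a 1-cell that is removed by a binary erosion, i.e. a 1-cell with at least one 4-connected 0 neighbour, where cells outside the matrix count as 0.

import numpy as np
from scipy.ndimage import binary_erosion

# Illustrative 0/1 matrix; in practice this would be the cluster matrix above
mask = np.array([
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 0],
], dtype=bool)

# Erosion keeps only the 1-cells whose 4-neighbourhood is entirely 1;
# the contour is exactly what the erosion removes.
contour = mask & ~binary_erosion(mask)

# (row, column) positions of the contour cells
print(list(zip(*np.nonzero(contour))))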

Related

How can I make a square with a specified circumference and add a margin?

I am trying to make a square path of a specified length. I made a function (shown below the desired output); if I pass 20, I get a 6x6 matrix. How can I add a margin of 0's, e.g. 3 cells thick, like this:
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
def square(length):
    return [
        [1 for _ in range(length//4 + 1)]
        for _ in range(length//4 + 1)
    ]

for x in square(24):
    print(x)
You can prepare a line pattern of 0s and 1s then build a 2D matrix by intersecting them.
def square(size, margin=3):
    p = [0]*margin + [1]*(size - 2*margin) + [0]*margin
    return [[r*c for r in p] for c in p]

for row in square(20):
    print(*row)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Here's one way. One caution here is that, because of the way I duplicated the zero rows, those are all the same list. If you modify one of the zero rows, it will modify all of them.
def square(length):
    zeros = [0] * (length//4 + 7)
    sq = [zeros] * 3
    sq.extend([
        ([0, 0, 0] + [1 for _ in range(length//4 + 1)] + [0, 0, 0])
        for _ in range(length//4 + 1)
    ])
    sq.extend([zeros] * 3)
    return sq

for x in square(24):
    print(x)
Here's a numpy method.
import numpy as np

def square(length):
    c = length//4 + 1
    sq = np.zeros((c + 6, c + 6)).astype(int)
    sq[3:c+3, 3:c+3] = np.ones((c, c))
    return sq

print(square(24))
One way to do this is to build it as a flat string, then use textwrap to split the output into the right number of lines:
import textwrap
# The number of 1's in a row/column
count = 6
# The number of 0's to pad with
margin = 3
# The total 'size' of a row/column
size = margin + count + margin
pad_rows = "0" * size * margin
core = (("0" * margin) + ("1" * count) + ("0" * margin)) * count
print('\n'.join(textwrap.wrap(pad_rows + core + pad_rows, size)))

How do you remove values not in a cluster using a pandas data frame?

If I have a pandas data frame like this, made up of 0s and 1s:
1 1 1 0 0 0 0 1 0
1 1 1 1 1 0 0 0 0
1 1 1 0 0 0 0 1 0
1 0 0 0 0 1 0 0 0
How do I filter out outliers such that I get something like this:
1 1 1 0 0 0 0 0 0
1 1 1 1 1 0 0 0 0
1 1 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
That is, I want to remove the outliers.
We can do this with a cumulative product over the second axis with pandas.cumprod [pandas-doc]:
>>> df.cumprod(axis=1)
0 1 2 3 4 5 6 7 8
0 1 1 1 0 0 0 0 0 0
1 1 1 1 1 1 0 0 0 0
2 1 1 1 0 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0
The same result can also be obtained with pandas.cummin [pandas-doc]:
>>> df.cummin(axis=1)
0 1 2 3 4 5 6 7 8
0 1 1 1 0 0 0 0 0 0
1 1 1 1 1 1 0 0 0 0
2 1 1 1 0 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0
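A minimal, self-contained reproduction of the above (the frame is rebuilt by hand here purely for illustration):

import pandas as pd

df = pd.DataFrame([
    [1, 1, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 0, 0],
])

# Everything to the right of the first 0 in each row is zeroed out
print(df.cumprod(axis=1))
print(df.cummin(axis=1))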

Create dummies when categories are single characters in multi-character strings

Consider my data in a Pandas Series
s = pd.Series('1az wb58 jsui ne3'.split())
s
0 1az
1 wb58
2 jsui
3 ne3
dtype: object
I need it to look like:
1 3 5 8 a b e i j n s u w z
0 1 0 0 0 1 0 0 0 0 0 0 0 0 1
1 0 0 1 1 0 1 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 1 1 0 1 1 0 0
3 0 1 0 0 0 0 1 0 0 1 0 0 0 0
However, when I try:
pd.get_dummies(s)
1az jsui ne3 wb58
0 1 0 0 0
1 0 0 0 1
2 0 1 0 0
3 0 0 1 0
What is the most concise way to do this?
Maybe apply list first:
pd.get_dummies(s.apply(list).apply(pd.Series).stack()).sum(level=0)
Out[222]:
1 3 5 8 a b e i j n s u w z
0 1 0 0 0 1 0 0 0 0 0 0 0 0 1
1 0 0 1 1 0 1 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 1 1 0 1 1 0 0
3 0 1 0 0 0 0 1 0 0 1 0 0 0 0
Or
s.apply(list).str.join(',').str.get_dummies(',')
Out[224]:
1 3 5 8 a b e i j n s u w z
0 1 0 0 0 1 0 0 0 0 0 0 0 0 1
1 0 0 1 1 0 1 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 1 1 0 1 1 0 0
3 0 1 0 0 0 0 1 0 0 1 0 0 0 0
Solution with MultiLabelBinarizer and DataFrame constructor:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(s),columns=mlb.classes_)
print (df)
1 3 5 8 a b e i j n s u w z
0 1 0 0 0 1 0 0 0 0 0 0 0 0 1
1 0 0 1 1 0 1 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 1 1 0 1 1 0 0
3 0 1 0 0 0 0 1 0 0 1 0 0 0 0
Another solution - DataFrame.from_records + get_dummies, but at the end it is necessary to aggregate the columns by max:
df = pd.get_dummies(pd.DataFrame.from_records(s),prefix_sep='',prefix='').max(level=0, axis=1)
print (df)
1 3 5 8 a b e i j n s u w z
0 1 0 0 0 1 0 0 0 0 0 0 0 0 1
1 0 0 1 1 0 1 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 1 1 0 1 1 0 0
3 0 1 0 0 0 0 1 0 0 1 0 0 0 0

OpenCV Canny edge detection is not working properly on an ideal square

I am using this binary square image of 15x15 pixels.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I am applying Canny edge detection provided by OpenCV (version 2.7)
for object size measurement. My expected output should look like:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
But two edges (top and left edge) are always getting shifted by one pixel.
The output of Canny edge detection is:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Why is this pixel shift happening?
Is there any way I can avoid it? (I cannot manually adjust the pixel shift after the output, as I have to use edge detection on irregular shapes.) The same shift happens irrespective of odd/even pixel counts.
At first glance, I was quite surprised when I came across this question. Moreover, I did not believe that Canny edge detection would be so deceiving. So I took a similar image and applied Canny edge detection to it. To my surprise, I encountered the same problem you are facing. Why is it so?
After digging in to the documentation I came across many operations that were occurring under the hood.
The documentation claims that Gaussian filtering is done to reduce noise. Well, it is true. But this blurs out the existing edges present in the image as well. So when you blur a perfect square/rectangle, it tends to have curved corners.
After Gaussian filtering, the next step is finding the edge gradient. As said, by now the perfect edge of the square/rectangle is gone due to blurring (Gaussian filtering). What is left are rounded/curved edges. Finding the intensity of gradients on rounded/curved edges will never yield a perfect square/rectangle-like edge. I might be wrong, but I guess this is the main reason why we do not get perfect edges when performing Canny edge detection.
If you want a perfect edge, my suggestion would be to try finding contours (as suggested by Micka) and draw a bounding rectangle.
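A rough sketch of that suggestion (the image setup is illustrative, and the findContours return signature differs between OpenCV versions, hence the [-2:] slice):

import cv2
import numpy as np

# Rebuild the 15x15 binary square, scaled to 0/255 as OpenCV expects
img = np.zeros((15, 15), dtype=np.uint8)
img[3:13, 3:13] = 255

# findContours returns (contours, hierarchy) in recent versions and
# (image, contours, hierarchy) in OpenCV 3.x; [-2:] handles both
contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]

# The bounding rectangle gives the exact object size, with no one-pixel shift
x, y, w, h = cv2.boundingRect(contours[0])
print(x, y, w, h)   # 3 3 10 10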

Pivoting via Python and Pandas

I have a table like this:
ID Word
1 take
2 the
3 long
4 long
5 road
6 and
7 walk
8 it
9 walk
10 it
I want to use a pivot table in pandas to get the distinct words as columns and 1s and 0s as values. Something like this matrix:
ID Take The Long Road And Walk It
1 1 0 0 0 0 0 0
2 0 1 0 0 0 0 0
3 0 0 1 0 0 0 0
4 0 0 1 0 0 0 0
5 0 0 0 1 0 0 0
and so on
I am trying to use pivot_table but I am not familiar with pandas syntax yet:
import pandas as pd
data = pd.read_csv('dataset.txt', sep='|', encoding='latin1')
table = pd.pivot_table(data,index=["ID"],columns=pd.unique(data["Word"].values),fill_value=0)
How can I rewrite the pivot_table call to deal with this?
You can use concat with str.get_dummies:
print pd.concat([df['ID'], df['Word'].str.get_dummies()], axis=1)
ID and it long road take the walk
0 1 0 0 0 0 1 0 0
1 2 0 0 0 0 0 1 0
2 3 0 0 1 0 0 0 0
3 4 0 0 1 0 0 0 0
4 5 0 0 0 1 0 0 0
5 6 1 0 0 0 0 0 0
6 7 0 0 0 0 0 0 1
7 8 0 1 0 0 0 0 0
8 9 0 0 0 0 0 0 1
9 10 0 1 0 0 0 0 0
Or, as Edchum mentioned in the comments, pd.get_dummies:
print pd.concat([df['ID'], pd.get_dummies(df['Word'])], axis=1)
ID and it long road take the walk
0 1 0 0 0 0 1 0 0
1 2 0 0 0 0 0 1 0
2 3 0 0 1 0 0 0 0
3 4 0 0 1 0 0 0 0
4 5 0 0 0 1 0 0 0
5 6 1 0 0 0 0 0 0
6 7 0 0 0 0 0 0 1
7 8 0 1 0 0 0 0 0
8 9 0 0 0 0 0 0 1
9 10 0 1 0 0 0 0 0
