[Picture showing the clusters I wish to count]
I am looking to identify the number of clusters of non-zero values in my DataFrame.
Here I have a DataFrame with four (4) clusters in total, but I am having trouble finding code that can count them for me.
import pandas as pd

data = [
    [0, 0, 0, 255, 255, 255, 0, 0],
    [0, 255, 0, 255, 255, 255, 0, 0],
    [0, 0, 0, 255, 255, 255, 0, 0],
    [0, 0, 0, 0, 255, 0, 0, 0],
    [0, 255, 255, 0, 0, 255, 0, 0],
    [0, 255, 0, 0, 0, 255, 0, 0],
    [0, 0, 0, 0, 0, 255, 0, 0],
    [0, 0, 0, 0, 0, 255, 0, 0]
]
df2 = pd.DataFrame(data)
Any help is appreciated!
I searched a bit myself and got this. It is a bit of trial and error without background knowledge, but I changed the number of groups in your data a bit and skimage.measure always got the right result:
import numpy as np
from skimage import measure
data = [
    [0, 0, 0, 255, 255, 255, 0, 0],
    [0, 255, 0, 255, 255, 255, 0, 0],
    [0, 0, 0, 255, 255, 255, 0, 0],
    [0, 0, 0, 0, 255, 0, 0, 0],
    [0, 255, 255, 0, 0, 255, 0, 0],
    [0, 255, 0, 0, 0, 255, 0, 0],
    [0, 0, 0, 0, 0, 255, 0, 0],
    [0, 0, 0, 0, 0, 255, 0, 0]
]
arr = np.array(data)
groups, group_count = measure.label(arr == 255, return_num=True, connectivity=1)
print('Groups: \n', groups)
print(f'Number of groups: {group_count}')
Output:
Groups:
[[0 0 0 1 1 1 0 0]
[0 2 0 1 1 1 0 0]
[0 0 0 1 1 1 0 0]
[0 0 0 0 1 0 0 0]
[0 3 3 0 0 4 0 0]
[0 3 0 0 0 4 0 0]
[0 0 0 0 0 4 0 0]
[0 0 0 0 0 4 0 0]]
Number of groups: 4
In measure.label you define the criterion for what counts as foreground. In your case arr == 255 works, or simply arr > 0 if the values are not always exactly 255. connectivity needs to be set to 1 because you don't want clusters to be connected diagonally (if you do, set it to 2). With return_num=True the result is a tuple whose second element is the number of distinct clusters.
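To see the difference between the two settings, here is a minimal sketch reusing arr from above (the exact counts are specific to this data):
groups_4, n_4 = measure.label(arr == 255, return_num=True, connectivity=1)
groups_8, n_8 = measure.label(arr == 255, return_num=True, connectivity=2)
print(n_4)  # 4 - diagonal contacts do not join clusters
print(n_8)  # 3 - the large block and the vertical line touch diagonally and merge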
Related
I have a DataFrame of multiple particles that have been assigned the group numbers (1, 2, 3, 4) like this:
Groups:
[[0 0 0 1 1 1 0 0]
[0 2 0 1 1 1 0 0]
[0 0 0 1 1 1 0 0]
[0 0 0 0 1 0 0 0]
[0 3 3 0 0 4 0 0]
[0 3 0 0 0 4 0 0]
[0 0 0 0 0 4 0 0]
[0 0 0 0 0 4 0 0]]
Number of particles: 4
I have then calculated the areas of the particles and created a DataFrame (assuming 1 pixel = 1 nm × 1 nm):
   Particle #  Size [pixel #]  A [nm2]
1           1              10       10
2           2               1        1
3           3               3        3
4           4               4        4
Now I want to calculate the diameter of the particles. However, the shapes of the particles are complex, so I am looking for a method to calculate an average diameter (given that the shapes are not perfectly round) and to add another column next to A [nm2] with that average diameter.
Will this be possible?
Here is my full code:
import numpy as np
from skimage import measure
import pandas as pd
final = [
    [0, 0, 0, 255, 255, 255, 0, 0],
    [0, 255, 0, 255, 255, 255, 0, 0],
    [0, 0, 0, 255, 255, 255, 0, 0],
    [0, 0, 0, 0, 255, 0, 0, 0],
    [0, 255, 255, 0, 0, 255, 0, 0],
    [0, 255, 0, 0, 0, 255, 0, 0],
    [0, 0, 0, 0, 0, 255, 0, 0],
    [0, 0, 0, 0, 0, 255, 0, 0]
]
final = np.asarray(final)
groups, group_count = measure.label(final > 0, return_num=True, connectivity=1)
print('Groups: \n', groups)
print(f'Number of particles: {group_count}')
df = (pd.DataFrame(dict(zip(['Particle #', 'Size [pixel #]'],
np.unique(groups, return_counts=True))))
.loc[lambda d: d['Particle #'].ne(0)]
)
pixel_nm_size = 1*1
df['A [nm2]'] = df['Size [pixel #]'] * pixel_nm_size
Any help is appreciated!
I think you are looking for regionprops.
Specifically, either equivalent_diameter, or just perimeter.
props = measure.regionprops_table(groups, properties = ['label', 'equivalent_diameter', 'perimeter'])
df = pd.DataFrame(props)
edit
from the docs:
equivalent_diameter_area: float
The diameter of a circle with the same area as the region.
So, the function takes your labeled region, measures the area and constructs a circle with that area (there is only one such circle for each area).
Then it measures the diameter of the circle.
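In other words, the value is just 2 * sqrt(A / pi), so you can also compute it by hand from the area column. A small sketch, assuming the particle table from the question (called df_particles here so it does not clash with the df built from props above):
import numpy as np

# equivalent diameter of a circle with area A: d = 2 * sqrt(A / pi)
df_particles['d_eq [nm]'] = 2 * np.sqrt(df_particles['A [nm2]'] / np.pi)
# e.g. particle 1 with A = 10 nm2 gives roughly 3.57 nm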
You can also look at major_axis_length and minor_axis_length. These are computed by fitting an ellipse to the object and measuring the long and short axes that define it.
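Those can be requested through regionprops_table in the same way; a sketch along the lines of the call above (depending on your scikit-image version the properties may instead be named axis_major_length / axis_minor_length):
props = measure.regionprops_table(
    groups,
    properties=['label', 'equivalent_diameter', 'major_axis_length', 'minor_axis_length']
)
df_axes = pd.DataFrame(props)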
IIUC, you could use a custom function to find the height/width of the bounding box and compute the average of both dimensions:
def get_diameter(g):
    a = (groups == g)            # boolean mask of the pixels belonging to group g
    h = (a.sum(1) != 0).sum()    # number of rows containing the group -> bounding-box height
    w = (a.sum(0) != 0).sum()    # number of columns containing the group -> bounding-box width
    return (h + w) / 2           # average of the two bounding-box dimensions
df['diameter'] = df['Particle #'].map(get_diameter)
Output:
   Particle #  Size [pixel #]  A [nm2]  diameter
1           1              10       10       3.5
2           2               1        1       1.0
3           3               3        3       2.0
4           4               4        4       2.5
I need to create an array of shape (100, 19) in Python where each row is the fixed 19-element vector [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0].
Any solution suggested?
import numpy as np

a = np.zeros((100, 19))
a[:, 11] = 1
a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
b = np.array(a)
c = np.tile(b, (100, 1))
c.shape
Output:
(100, 19)
You can do it with np.zeros
array0 = np.zeros((100,19))
array0[:,11] = 1
On the other hand, if you want the complementary pattern (all ones with a zero at index 11):
array1 = np.ones((100,19))
array1[:,11] = 0
np.full is a useful function for this purpose:
a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
result = np.full((100, 19), a)
result.shape
Output:
(100, 19)
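All of the approaches above should give the same array; here is a quick sanity check, just a sketch combining the snippets from the answers:
import numpy as np

row = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

a = np.zeros((100, 19))
a[:, 11] = 1
b = np.tile(row, (100, 1))
c = np.full((100, 19), row)

print(np.array_equal(a, b) and np.array_equal(a, c))  # True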
I have a list called my_map that contains two different kinds of string values, '.' and '&'. Now, for each position [x][y] that is a '.', I want to count how many times an '&' occurs in any of the eight cells adjacent to that '.'.
I created a grid to store the counts, but I am just not able to formulate my conditions correctly. I cannot use NumPy arrays.
Note: 'S' and 'E' are treated like '.'
my_map = ['................',
          '....&...........',
          '..........E.....',
          '&&..&...........',
          '....&&&.........',
          '......&&&&..&&..',
          '................',
          '.......&........',
          '.....&.&........',
          '....S...........',
          '.......&.&&.....']
def create_grid(my_map):
    grid = [[0] * (len(my_map[0])) for x in range(len(my_map))]
    return grid
grid = create_grid(my_map)
for x, y in [(x, y) for x in range(len(my_map)) for y in range(len(my_map[0]))]:
    # any '&' north?
    if my_map[x][y+1] == '&' and my_map[x][y] == '.':
        grid[x][y] += 1
    # any '&' west?
    if my_map[x-1][y] == '&' and my_map[x][y] == '.':
        grid[x][y] += 1
    # any '&' south?
    if my_map[x][y-1] == '&' and my_map[x][y] == '.':
        grid[x][y] += 1
    # any '&' east?
    if my_map[x+1][y] == '&' and my_map[x][y] == '.':
        grid[x][y] += 1
    # any '&' north-east?
    if my_map[x+1][y+1] == '&' and my_map[x][y] == '.':
        grid[x][y] += 1
    # any '&' south-west?
    if my_map[x-1][y-1] == '&' and my_map[x][y] == '.':
        grid[x][y] += 1
    # any '&' south-east?
    if my_map[x+1][y-1] == '&' and my_map[x][y] == '.':
        grid[x][y] += 1
    # any '&' north-west?
    if my_map[x-1][y+1] == '&' and my_map[x][y] == '.':
        grid[x][y] += 1
#desired output for first 3 rows
grid = [[0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],[0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0],[2,2,1,1,2,1,0,0,0,0,0,0,0,0,0,0]]
At the moment, I get an 'IndexError: string index out of range'. I don't know how to limit the range so it will still be correct. The only thing I managed so far was a grid displaying 1s for all '.' and 0s for all '&'.
I don't think the nested conditionals are appropriate here; each outer conditional must be true for the inner ones to be evaluated. They should be independent of each other and sequential.
It's also a lot of work and error-prone to enumerate every conditional by hand. For each cell, there are up to 8 directions in which a neighbor might live, and we do the exact same check on each direction. A loop is the appropriate construct for doing this; each loop iteration checks one neighboring cell, determining whether it's in bounds and of the appropriate character.
Furthermore, since your grid has few &, it makes sense to only perform neighbor checks for & characters. For each one, increment counts for neighboring .s. Do the opposite if the grid is predominantly & characters.
my_map = [
'................',
'....&...........',
'..........E.....',
'&&..&...........',
'....&&&.........',
'......&&&&..&&..',
'................',
'.......&........',
'.....&.&........',
'....S...........',
'.......&.&&.....'
]
grid = [[0] * len(x) for x in my_map]
directions = [
[-1, 0], [1, 0], [0, 1], [0, -1],
[-1, -1], [1, 1], [1, -1], [-1, 1]
]
for row in range(len(my_map)):
    for col in range(len(my_map[row])):
        if my_map[row][col] == "&":
            for x, y in directions:
                y += row
                x += col
                if y < len(my_map) and y >= 0 and \
                   x < len(my_map[y]) and x >= 0 and \
                   my_map[y][x] != "&":
                    grid[y][x] += 1

for row in grid:
    print(row)
Output:
[0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[2, 2, 1, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 1, 2, 0, 4, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[2, 2, 1, 2, 0, 0, 0, 4, 3, 2, 1, 1, 2, 2, 1, 0]
[0, 0, 0, 1, 2, 4, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
[0, 0, 0, 0, 0, 1, 3, 4, 4, 2, 1, 1, 2, 2, 1, 0]
[0, 0, 0, 0, 1, 1, 3, 0, 2, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 1, 0, 3, 0, 2, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 1, 1, 3, 2, 3, 2, 2, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 1, 0, 0, 0, 0]
And a version that overlays counts with the original map Minesweeper-style:
0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 1 & 1 0 0 0 0 0 0 0 0 0 0
2 2 1 2 2 2 0 0 0 0 0 0 0 0 0 0
& & 1 2 & 4 2 1 0 0 0 0 0 0 0 0
2 2 1 2 & & & 4 3 2 1 1 2 2 1 0
0 0 0 1 2 4 & & & & 1 1 & & 1 0
0 0 0 0 0 1 3 4 4 2 1 1 2 2 1 0
0 0 0 0 1 1 3 & 2 0 0 0 0 0 0 0
0 0 0 0 1 & 3 & 2 0 0 0 0 0 0 0
0 0 0 0 1 1 3 2 3 2 2 1 0 0 0 0
0 0 0 0 0 0 1 & 2 & & 1 0 0 0 0
Try it!
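One way to build that overlay from my_map and grid is shown below; this is just a sketch using the variables from the snippet above, where the counts replace everything except the '&' characters:
# keep '&' from the map, otherwise print the neighbour count
for row in range(len(my_map)):
    print(' '.join(
        my_map[row][col] if my_map[row][col] == '&' else str(grid[row][col])
        for col in range(len(my_map[row]))
    ))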
Say I have a scene parsing map for an image, where each pixel in this scene parsing map indicates which object the pixel belongs to. Now I want to get the bounding box of each object; how can I implement this in Python?
As a detailed example, say I have a scene parsing map like this:
0 0 0 0 0 0 0
0 1 1 0 0 0 0
1 1 1 1 0 0 0
0 0 1 1 1 0 0
0 0 1 1 1 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
So the bounding box is:
0 0 0 0 0 0 0
1 1 1 1 1 0 0
1 0 0 0 1 0 0
1 0 0 0 1 0 0
1 1 1 1 1 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
Actually, in my task, just knowing the width and height of each object is enough.
A basic idea is to search for the four edges in the scene parsing map, from the top, bottom, left and right directions. But there might be a lot of small objects in the image, so this way is not time-efficient.
A second way is to calculate the coordinates of all non-zero elements and find the max/min x/y, then calculate the width and height using these x and y values.
Is there any other more efficient way to do this? Thx.
If you are processing images, you can use scipy's ndimage library.
If there is only one object in the image, you can get the measurements with scipy.ndimage.measurements.find_objects (http://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.ndimage.measurements.find_objects.html):
import numpy as np
from scipy import ndimage
a = np.array([[0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 0],
[1, 1, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
# Find the location of all objects
objs = ndimage.find_objects(a)
# Get the height and width
height = int(objs[0][0].stop - objs[0][0].start)
width = int(objs[0][1].stop - objs[0][1].start)
If there are many objects in the image, you first have to label each object and then get the measurements:
import numpy as np
from scipy import ndimage
a = np.array([[0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 0],
[1, 1, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0]]) # Second object here
# Label objects
labeled_image, num_features = ndimage.label(a)
# Find the location of all objects
objs = ndimage.find_objects(labeled_image)
# Get the height and width
measurements = []
for ob in objs:
    measurements.append((int(ob[0].stop - ob[0].start), int(ob[1].stop - ob[1].start)))
If you check ndimage.measurements, you can get more measurements: center of mass, area...
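For example, per-label centers of mass and pixel counts could be pulled out like this (a small sketch on top of labeled_image and a from above):
labels = list(range(1, num_features + 1))
# one (row, col) centre of mass and one pixel count per labeled object
centres = ndimage.center_of_mass(a, labeled_image, labels)
pixel_counts = ndimage.sum(a, labeled_image, labels)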
using numpy:
import numpy as np
ind = np.nonzero(arr.any(axis=0))[0] # indices of non empty columns
width = ind[-1] - ind[0] + 1
ind = np.nonzero(arr.any(axis=1))[0] # indices of non empty rows
height = ind[-1] - ind[0] + 1
a bit more explanation:
arr.any(axis=0) gives a boolean array telling you whether each column is empty (False) or not (True). np.nonzero(arr.any(axis=0))[0] then extracts the non-zero (i.e. True) indices from that array. ind[0] is the first element of that array, hence the left-most non-empty column, and ind[-1] is the last element, hence the right-most non-empty column. The difference then gives the width, give or take 1 depending on whether you include the borders or not.
Similar stuff for the height but on the other axis.
I have an adjacency matrix am of a 5-node undirected graph, where am(i, j) = 1 means node i is connected to node j. I generated all possible versions of this 5-node graph with the following code:
import itertools
graphs = list(itertools.product([0, 1], repeat=10))
This returns a list of tuples where each element is a possible configuration of the upper triangle of the matrix (note that I only generate the upper triangle since the matrix is symmetric):
[ (0, 0, 0, 0, 0, 0, 1, 0, 1, 1),
(0, 0, 0, 0, 0, 0, 1, 1, 0, 0),
(0, 0, 0, 0, 0, 0, 1, 1, 0, 1),
(0, 0, 0, 0, 0, 0, 1, 1, 1, 0),
(0, 0, 0, 0, 0, 0, 1, 1, 1, 1),
....]
where (0, 0, 0, 0, 0, 0, 1, 1, 1, 1) actually corresponds to:
m =
0 0 0 0 0
0 0 0 0 0
0 0 1 1 1
0 0 0 0 1
0 0 0 0 0
I would like to search for all possible triangle shapes in this graph. For example, here, the edges (2, 4), (2, 5) and (4, 5) together make a triangle shape:
m =
0 0 0 0 0
0 0 0 1 1
0 0 0 0 0
0 0 0 0 1
0 0 0 0 0
Is there a known algorithm to do such a search in a graph? Note that the triangle shape is just an example here; ideally I would like to find a solution that can search for any particular shape, for example a square or a pentagon. How can I encode these shapes to search for in the first place? Any help, reference, or algorithm name is appreciated.
Your explanation for the graph representation is not quite understandable.
However, finding cycles of size k is an NP-complete problem when k is part of the input (since it includes the NP-complete Hamiltonian-cycle problem).
If that is the case, then you should have a look at these posts:
Finding all cycles of a certain length in a graph
Finding all cycles in undirected graphs
But if you have a fixed cycle length, then this problem can be solved in polynomial time.
Here is an article about this very issue:
Finding and Counting Given Length Cycles | Algorithmica
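For the special case of triangles (or any other small fixed pattern), a minimal brute-force sketch is to expand each 10-tuple into the full symmetric 5x5 matrix and then test every 3-node combination against it; for triangles the count also equals trace(A^3)/6. This is only an illustration, not the algorithm from the article:
import itertools
import numpy as np

def count_triangles(adj):
    # adj: full symmetric 0/1 adjacency matrix (zero diagonal) as a NumPy array
    n = adj.shape[0]
    return sum(
        1
        for i, j, k in itertools.combinations(range(n), 3)
        if adj[i, j] and adj[j, k] and adj[i, k]
    )

# equivalently, for an undirected simple graph:
# count = np.trace(np.linalg.matrix_power(adj, 3)) // 6
Other shapes can be handled the same way in principle: enumerate every k-node combination and compare its induced edges against the target pattern, at combinatorial cost.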