How to find cluster sizes in 2D numpy array? - python

My problem is the following:
I have a 2D numpy array filled with 0s and 1s, with an absorbing boundary condition (all the outer elements are 0), for example:
[[0 0 0 0 0 0 0 0 0 0]
[0 0 1 0 0 0 0 0 0 0]
[0 0 1 0 1 0 0 0 1 0]
[0 0 0 0 0 0 1 0 1 0]
[0 0 0 0 0 0 1 0 0 0]
[0 0 0 0 1 0 1 0 0 0]
[0 0 0 0 0 1 1 0 0 0]
[0 0 0 1 0 1 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
I want to create a function that takes this array and its linear dimension L as input parameters (in this case L = 10) and returns the list of cluster sizes of this array.
By "clusters" I mean the isolated groups of elements equal to 1.
The array element [i][j] is isolated if all its neighbours are zeros, and its neighbours are the elements:
[i+1][j]
[i-1][j]
[i][j+1]
[i][j-1]
So in the previous array we have 7 clusters of sizes (2,1,2,6,1,1,1)
I tried to complete this task by creating two functions, the first one is a recursive function:
def clust_size(array,i,j):
    count = 0
    if array[i][j] == 1:
        array[i][j] = 0
        if array[i-1][j] == 1:
            count += 1
            array[i-1][j] = 0
            clust_size(array,i-1,j)
        elif array[i][j-1] == 1:
            count += 1
            array[i-1][j] = 0
            clust_size(array,i,j-1)
        elif array[i+1][j] == 1:
            count += 1
            array[i-1][j] = 0
            clust_size(array,i+1,j)
        elif array[i][j+1] == 1:
            count += 1
            array[i-1][j] = 0
            clust_size(array,i,j+1)
    return count+1
and it should return the size of one cluster. Every time the function finds an array element equal to 1, it increases the value of the counter "count" and changes the value of the element to 0, so that each '1' element is counted only once.
If one of the neighbours of the element is equal to 1 then the function calls itself on that element.
The second function is:
def clust_list(array,L):
    sizes_list = []
    for i in range(1,L-1):
        for i in range(1,L-1):
            count = clust_size(array,i,j)
            sizes_list.append(count)
    return sizes_list
and it should return the list containing the cluster sizes. The for loop iterates from 1 to L-1 because all the outer elements are 0.
This doesn't work and I can't see where the error is...
I was wondering if maybe there's an easier way to do it.

It seems like a percolation problem.
The following link has your answer, provided you have scipy installed.
http://dragly.org/2013/03/25/working-with-percolation-clusters-in-python/
from pylab import *
from scipy.ndimage import measurements
z2 = array([[0,0,0,0,0,0,0,0,0,0],
[0,0,1,0,0,0,0,0,0,0],
[0,0,1,0,1,0,0,0,1,0],
[0,0,0,0,0,0,1,0,1,0],
[0,0,0,0,0,0,1,0,0,0],
[0,0,0,0,1,0,1,0,0,0],
[0,0,0,0,0,1,1,0,0,0],
[0,0,0,1,0,1,0,0,0,0],
[0,0,0,0,1,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0]])
This will identify the clusters:
lw, num = measurements.label(z2)
print(lw)
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 2, 0, 0, 0, 3, 0],
[0, 0, 0, 0, 0, 0, 4, 0, 3, 0],
[0, 0, 0, 0, 0, 0, 4, 0, 0, 0],
[0, 0, 0, 0, 5, 0, 4, 0, 0, 0],
[0, 0, 0, 0, 0, 4, 4, 0, 0, 0],
[0, 0, 0, 6, 0, 4, 0, 0, 0, 0],
[0, 0, 0, 0, 7, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
The following will calculate their area.
area = measurements.sum(z2, lw, index=arange(lw.max() + 1))
print(area)
[ 0. 2. 1. 2. 6. 1. 1. 1.]
This gives what you expect, although I would think that you would have a cluster with 8 members by eye-percolation.
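With a recent SciPy the same labelling can be written without pylab, calling scipy.ndimage.label directly. The sketch below assumes NumPy and SciPy are installed; the names grid, sizes_4 and sizes_8 are illustrative and not part of the answer above. The structure argument selects the connectivity, which is what decides whether diagonal neighbours are merged into one cluster.

```python
import numpy as np
from scipy import ndimage

grid = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
                 [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
                 [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
                 [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
                 [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
                 [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

# Default structure: only up/down/left/right neighbours connect (4-connectivity).
labels_4, n_4 = ndimage.label(grid)
sizes_4 = np.bincount(labels_4.ravel())[1:]  # drop label 0, the background

# A 3x3 block of ones also connects diagonal neighbours (8-connectivity).
labels_8, n_8 = ndimage.label(grid, structure=np.ones((3, 3), dtype=int))
sizes_8 = np.bincount(labels_8.ravel())[1:]

print(n_4, sizes_4.tolist())
print(n_8, sizes_8.tolist())
```

np.bincount on the flattened label array counts how many cells carry each label, so it gives the cluster sizes in one call.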

I think your problem of finding "clusters" is essentially the same as finding connected components in a binary image (with values of either 0 or 1) based on 4-connectivity. You can see several algorithms for identifying the connected components (or "clusters", as you defined them) on this Wikipedia page:
http://en.wikipedia.org/wiki/Connected-component_labeling
Once the connected components or "clusters" are labelled, you can find any information you want easily, including the area, relative position or any other information you may want.
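As one possible illustration of connected-component labelling, here is a minimal breadth-first flood fill in plain Python. It is only a sketch of one of the approaches described on that page, not a tuned implementation; the function name label_clusters and the variable names are made up for this example.

```python
from collections import deque

def label_clusters(grid):
    """Label 4-connected groups of 1s. Returns (labels, sizes)."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    sizes = []  # sizes[k - 1] is the size of the cluster labelled k
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or labels[r][c]:
                continue  # background, or already part of a cluster
            sizes.append(0)
            label = len(sizes)
            labels[r][c] = label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                sizes[-1] += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not labels[ny][nx]):
                        labels[ny][nx] = label
                        queue.append((ny, nx))
    return labels, sizes

grid = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
labels, sizes = label_clusters(grid)
print(sizes)  # [2, 1, 2, 6, 1, 1, 1]
```

Because the bounds are checked explicitly, this version does not rely on the border of zeros, and the explicit queue avoids hitting Python's recursion limit on large clusters.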

I believe that your approach is almost correct, except that you reinitialize the variable count every time you recursively call your function clust_size. I would add the count variable to the input parameters of clust_size and reset it with count = 0 only before each top-level call in your nested for loops.
That way, you would always call clust_size as count = clust_size(array, i, j, count).
I haven't tested it but it seems to me that it should work.
Hope it helps.
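For reference, here is one way the recursive idea from the question could be made to work. It is a sketch, and a variation on the suggestion above: instead of passing count as a parameter, each call returns the number of cells it consumed and the caller adds those numbers up. All four neighbours are visited (plain calls, not an if/elif chain), and the inner loop uses j. It assumes the outer ring of zeros described in the question, which stops the recursion at the border.

```python
def clust_size(grid, i, j):
    """Size of the cluster containing (i, j); consumed cells are set to 0."""
    if grid[i][j] != 1:
        return 0
    grid[i][j] = 0  # mark as visited so the cell is counted only once
    return (1
            + clust_size(grid, i - 1, j)
            + clust_size(grid, i + 1, j)
            + clust_size(grid, i, j - 1)
            + clust_size(grid, i, j + 1))

def clust_list(grid, L):
    """List of cluster sizes of an L x L grid whose border is all zeros."""
    work = [list(row) for row in grid]  # copy, so the caller's grid survives
    sizes = []
    for i in range(1, L - 1):
        for j in range(1, L - 1):
            size = clust_size(work, i, j)
            if size:  # skip cells that are 0 or were already consumed
                sizes.append(size)
    return sizes

grid = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
print(clust_list(grid, 10))  # [2, 1, 2, 6, 1, 1, 1]
```

For very large clusters the recursion depth can exceed Python's default limit, in which case an iterative flood fill is the safer choice.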

A relatively simple problem if you convert this to strings
import numpy as np
arr=np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0,],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0,],
[0, 0, 1, 1, 1, 1, 1, 1, 1, 0,], #modified
[0, 0, 0, 0, 0, 0, 1, 0, 1, 0,],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0,],
[0, 0, 0, 0, 1, 0, 1, 0, 0, 0,],
[0, 0, 0, 0, 0, 1, 1, 0, 0, 0,],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0,],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0,],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
arr = "".join([str(x) for x in arr.reshape(-1)])
print([len(x) for x in arr.replace("0"," ").split()])
output
[1, 7, 1, 1, 1, 1, 1, 2, 1, 1, 1] # lengths of the horizontal runs of 1s, row by row (vertical neighbours are not merged)

Related

Count the number of occurrences of each one-hot code

I have a list of numpy arrays (one-hot representation) like the example below. I want to count the number of occurrences of each one-hot code.
[0 0 1 0 0 0 0 0 0 0]
[0 0 1 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
[0 0 0 0 1 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
Edit :
Expected output :
[1 0 0 0 0 0 0 0 0 0] ==> 1 occurrence
[0 0 1 0 0 0 0 0 0 0] ==> 2 occurrences
[0 1 0 0 0 0 0 0 0 0] ==> 3 occurrences
[0 0 0 0 0 1 0 0 0 0] ==> 1 occurrence
[0 0 0 0 1 0 0 0 0 0] ==> 2 occurrences
[0 0 0 0 0 0 0 0 0 1] ==> 2 occurrences
I think you can get the result you seek:
[1 3 2 1 2 1 0 0 0 2]
indicating the count of occurrences of one hot in that position via a simple column-wise sum using ndarray.sum():
import numpy
data = numpy.array([
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
])
print(numpy.ndarray.sum(data, axis=0))
or more compactly as just:
print(data.sum(axis=0))
both should give you:
[1 3 2 1 2 1 0 0 0 2]
Using the fact that each row is one-hot, you can do the following:
import numpy as np
temp = np.array([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])
converting the one-hot to indices can be done as follows:
temp2 = np.argmax(temp, axis=1) # array([2, 2, 1, 5, 1, 4, 9, 4, 0, 3, 1, 9])
and then the occurrences can be counted using np.histogram. We know that you have 10 possible values, so we use 10 bins as follows:
temp3 = np.histogram(temp2, bins=10, range=(-0.5,9.5))
np.histogram returns a tuple where index [0] holds the histogram values and index [1] holds the bin edges. In your case:
(array([1, 3, 2, 1, 2, 1, 0, 0, 0, 2]),
array([-0.5, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]))
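If the goal is the per-code listing from the question (each distinct row with its count), np.unique with axis=0 and return_counts=True is another option. The sketch below also shows np.bincount as a shorter alternative to the histogram; the variable names are illustrative.

```python
import numpy as np

data = np.array([
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
])

# Distinct rows together with how often each one occurs.
codes, counts = np.unique(data, axis=0, return_counts=True)
for code, n in zip(codes, counts):
    print(code, "==>", n)

# Count per hot position; minlength keeps positions that never occur.
per_position = np.bincount(data.argmax(axis=1), minlength=data.shape[1])
print(per_position)  # [1 3 2 1 2 1 0 0 0 2]
```

np.unique only reports codes that actually occur, while the bincount form also reports zeros for the positions that are never hot.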

How to join matrices like puzzle pieces in python

I've got three puzzle pieces defined as a number of arrays, 7x7, in a following manner:
R3LRU = pd.DataFrame([
[1, 1, 1, 1, 1, 1, 1],
[1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1]
])
I am trying to join them by the following rules: 1111111 can be joined with 1000001, 1000001 can be joined with 1000001, but 1111111 cannot be joined with 1111111.
I have tried using the pd.concat function, but it just glues the pieces side by side instead of joining them along their shared sides.
In terms of code output, the result looks like this:
0 1 2 3 4 5 6 0 1 2 3 4 5 6 0 1 2 3 4 5 6
0 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1
1 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0
2 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0
3 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0
4 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0
5 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0
6 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I suppose I would like to join by columns 6 and 0, or rows 6 and 0
How can I define "joining" sides, so that the pieces would join through the proposed rules?
I take it you want to concatenate two pieces if the last column of one and the first column of the other match, and then "overlap" both parts. I don't think pandas is a good fit for this problem, as you only need the values, not column labels or any of the other features you would use pandas for.
I would recommend simple numpy arrays. Then you could do something like
In [1]: import numpy as np
In [2]: R3LRU = np.array([
...: [1, 1, 1, 1, 1, 1, 1],
...: [1, 0, 0, 0, 0, 0, 1],
...: [1, 0, 0, 0, 0, 0, 1],
...: [1, 0, 0, 0, 0, 0, 1],
...: [1, 0, 0, 0, 0, 0, 1],
...: [1, 0, 0, 0, 0, 0, 1],
...: [1, 0, 0, 0, 0, 0, 1]
...: ])
In [3]: R3LRU
Out[3]:
array([[1, 1, 1, 1, 1, 1, 1],
[1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1]])
Get the last column of the first part and the first column of the second part
In [4]: R3LRU[:,0]
Out[4]: array([1, 1, 1, 1, 1, 1, 1])
In [5]: R3LRU[:,-1]
Out[5]: array([1, 1, 1, 1, 1, 1, 1])
Compare them
In [6]: R3LRU[:,0] == R3LRU[:,-1]
Out[6]: array([ True, True, True, True, True, True, True])
In [7]: np.all(R3LRU[:,0] == R3LRU[:,-1])
Out[7]: True
If they are equal, combine them
In [8]: if np.all(R3LRU[:,0] == R3LRU[:,-1]):
...:     combined = np.hstack([R3LRU[:,:-1], R3LRU])
In [9]: combined
Out[9]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]])
Maybe your rules are a bit more complicated than a simple == comparison, but you can just make that if statement more complicated to reflect all rules you have ;)
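As an illustration of a more specific rule, the sketch below encodes one reading of the rules in the question: two edges fit together unless both are solid walls (all 1s). The helper names can_join and join_horizontally, and the second piece open_left, are hypothetical. Keeping the right-hand piece's edge in the overlapped column is also an assumption, carried over from the hstack call above.

```python
import numpy as np

def can_join(edge_a, edge_b):
    """Assumed rule: edges fit unless both are solid walls (all 1s)."""
    return not (np.all(edge_a == 1) and np.all(edge_b == 1))

def join_horizontally(left, right):
    """Overlap the last column of `left` with the first column of `right`."""
    if not can_join(left[:, -1], right[:, 0]):
        raise ValueError("two solid edges cannot be joined")
    # Drop left's last column so the shared edge appears only once.
    return np.hstack([left[:, :-1], right])

closed = np.array([
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
])

# Hypothetical second piece whose left edge reads 1000001 (an opening).
open_left = closed.copy()
open_left[1:-1, 0] = 0

joined = join_horizontally(closed, open_left)
print(joined.shape)  # (7, 13)
```

Joining along rows instead of columns would work the same way with left[-1, :], right[0, :] and np.vstack.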

how to remove surrounding empty data from 3d array

I have an array with shape (136, 512, 512). It contains 0s and 1s indicating an object's shape inside this 3D space. I am trying to reduce the size of the array by removing its empty slices. Essentially, I want to keep all the 1s but remove unnecessary rows and columns while keeping the array rectangular, similar to a hitbox. For example:
(0, 0, 0, 0, 0,
0, 0, 0, 1, 0,
0, 1, 1, 1, 0,
0, 1, 0, 1, 0,
1, 1, 0, 1, 0,
1, 0, 0, 1, 0,
0, 1, 1, 1, 0,
0, 0, 0, 0, 0)
would become:
(0, 0, 0, 1,
0, 1, 1, 1,
0, 1, 0, 1,
1, 1, 0, 1,
1, 0, 0, 1,
0, 1, 1, 1)
but on a 3d scale
(sorry about the horrible formatting, I'm terrible at this.)
This is only necessary because pyplot doesn't seem to be able to plot such a large 3D graph with voxels, or at least takes a very long time on my computer. So if anyone knows how to do large-scale 3D plots, that would be great.
EDIT
To clarify, the example is only 2D. To do this in 3D, the crop must take all other rows and columns into account, because every slice must keep the same shape. Not sure if this makes much sense; it's hard to explain in this many dimensions.
Think of it as removing anything outside of the outermost 1s on each side, measured from the centre of the cube.
EDIT: For removing only the surrounding brackets, read the excellent answer by Bill.
You can use np.all and np.delete to achieve this.
import numpy as np
l = [[0, 0, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 1, 1, 1, 0],
[0, 1, 0, 1, 0],
[1, 1, 0, 1, 0],
[1, 0, 0, 1, 0],
[0, 1, 1, 1, 0],
[0, 0, 0, 0, 0]]
arr = np.array(l)
arr1 = np.delete(arr, np.all(arr[..., :] == 0, axis=0), axis=1) # Deletes all 0-value columns
arr2 = np.delete(arr1, np.all(arr1[..., :] == 0, axis=1), axis=0) # Deletes all 0-value rows
print(arr)
print(arr2)
Output
[[0 0 0 0 0]
[0 0 0 1 0]
[0 1 1 1 0]
[0 1 0 1 0]
[1 1 0 1 0]
[1 0 0 1 0]
[0 1 1 1 0]
[0 0 0 0 0]]
[[0 0 0 1]
[0 1 1 1]
[0 1 0 1]
[1 1 0 1]
[1 0 0 1]
[0 1 1 1]]
The same can be extended to 3D array too.
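For the 3D case, a bounding-box crop is one way to remove only the surrounding empty space while leaving interior all-zero slices in place. This is a sketch of a different technique from the np.delete approach above, using np.argwhere; the function name crop_to_bounding_box and the small test volume are made up for illustration.

```python
import numpy as np

def crop_to_bounding_box(volume):
    """Return the smallest rectangular sub-array containing every nonzero cell."""
    coords = np.argwhere(volume)  # one row of indices per nonzero cell
    if coords.size == 0:
        # Nothing to keep: return an empty array with the same number of axes.
        return volume[(slice(0, 0),) * volume.ndim]
    lo = coords.min(axis=0)
    hi = coords.max(axis=0) + 1  # slice ends are exclusive
    return volume[tuple(slice(a, b) for a, b in zip(lo, hi))]

# Small stand-in for the (136, 512, 512) array from the question.
volume = np.zeros((4, 6, 5), dtype=int)
volume[1, 2, 1] = 1
volume[2, 4, 3] = 1

cropped = crop_to_bounding_box(volume)
print(cropped.shape)  # (2, 3, 3)
```

The function works for any number of dimensions, so the 2D example from the question is cropped to its 6 x 4 block by the same call.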

How to transform 1D list of values to 2D grid of 0's and 1's in python [duplicate]

I would like to take a list of values and transform them to a table (2D-list) of 0's and 1's, with one column for each unique number in the source list and an equal number of rows to the original. Each row will have a 1 if that column index matches the original value-1.
I have code that accomplishes this task, but I'm wondering if there is a better/faster way to do it. (The actual dataset has millions of entries vs. the simplified set below)
Sample Input:
value_list = [1, 2, 1, 3, 6, 5, 4, 3]
Desired output:
output_table = [[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0]]
Current Solution:
value_list = [1, 2, 1, 3, 6, 5, 4, 3]
max_val = max(value_list)
# initialize to table of 0's
a = [([0] * max_val) for i in range(len(value_list))]
# overwrite with 1's where required
for i in range(len(value_list)):
    j = value_list[i] - 1
    a[i][j] = 1
print('a = ')
for row in a:
    print(f'{row}')
You can do:
import numpy as np
value_list = [1, 2, 1, 3, 6, 5, 4, 3]
# create matrix of zeros
x = np.zeros(shape=(len(value_list), max(value_list)), dtype='int')
for i, v in enumerate(value_list):
    x[i, v-1] = 1
print(x)
Output:
[[1 0 0 0 0 0]
[0 1 0 0 0 0]
[1 0 0 0 0 0]
[0 0 1 0 0 0]
[0 0 0 0 0 1]
[0 0 0 0 1 0]
[0 0 0 1 0 0]
[0 0 1 0 0 0]]
You can try this:
dummy_list = [0]*6
output_table = [dummy_list[:i-1] + [1] + dummy_list[i:] for i in value_list]
Output:
output_table = [[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0]]
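Since the real dataset has millions of entries, the Python-level loop can be replaced by a single indexed assignment in NumPy. The following is a sketch under the same assumption as the question, namely that the values are positive integers and that value v maps to column v-1.

```python
import numpy as np

value_list = [1, 2, 1, 3, 6, 5, 4, 3]
values = np.asarray(value_list)

# One row per value, one column per possible value (1..max).
one_hot = np.zeros((values.size, values.max()), dtype=int)

# Set one cell per row in a single vectorized assignment.
one_hot[np.arange(values.size), values - 1] = 1

# Equivalent alternative: pick rows from an identity matrix.
one_hot_eye = np.eye(values.max(), dtype=int)[values - 1]

print(one_hot)
```

Both forms avoid the per-element loop. The indexed assignment allocates only the result, while the np.eye form also builds a max-by-max identity matrix first.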

How to exceed limitation of numpy.array() to convert list of array to an array of array?

I have a list of arrays, each containing 16 ints:
ListOfArray=[array([0,1,....,15]), array([0,1,....,15]), array([0,1,....,15]),....,array([0,1,....,15])]
I want to convert it to an array of array.
So I use :
ListOfArray=numpy.array(ListOfArray)
or:
ListOfArray=numpy.asarray(ListOfArray)
or :
ArrayOfArray=numpy.asarray(ListOfArray)
Same result
If my list contains fewer than 17716 arrays, I get the normal result:
[[0 0 0 ... 0 0 1]
[1 0 0 ... 0 1 0]
[0 0 0 ... 0 0 1]
...
[0 1 1 ... 1 0 0]
[0 1 1 ... 0 0 0]
[0 1 1 ... 0 0 1]]
But from 17716 arrays I have this :
[array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0])
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]) ...
array([0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0])
array([0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1])
array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])]
It seems that there is a limit somewhere. Why?
Can we exceed it?
Edit :
There is no problem with numpy.array. The cause was that one array unexpectedly contained 17 values. I convert the wav frames to binary and then to a string (of fifteen 0s and 1s), prepend a 0 if the value is negative and a 1 if it is positive, and then convert the string to a list and then to an array. I didn't expect a value of -32768 (-0b1000000000000000); I believed that -32767 and 32767 (15 binary digits) were the extremes.
It's pretty ugly code and I'm not proud of it, but if you have advice for making it less of a patchwork, here it is:
import numpy as np
import wave
import struct
f= wave.open('Test16PCM.wav','rb')
nf = f.getnframes()
frames=f.readframes(nf)
f.close()
L=[]
# extracting values samples
for i in range (0,((nf-1)*4)+1,4):
    L.append( (struct.unpack('<h',frames[i:(i+2)])[0]) ) # only the left track
Lbin=[] # convert int values to string of binaries + 0 or 1 for negative or positive
for i in L:
    a=str(bin(i))
    if a[0]=="-" : # something like "-0b00101101"
        a=a[3:]
        while len(a)<16: # to have same length binary number (was 15 before correction)
            a='0'+a
        Lbin.append('0'+a)
    else : # something like "0b00101101"
        a=a[2:]
        while len(a)<16:
            a='0'+a
        Lbin.append('1'+a)
Lout=[]
for i in Lbin :
    temp=[]
    for j in i :
        temp.append(int(j))
    temp=np.array(temp)
    Lout.append(temp)
Lout=np.asarray(Lout)
print(Lout)
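One possible way to tidy this up is to build the bits with NumPy shifts instead of strings, which guarantees that every row has the same length. The sketch below keeps the convention from the code above (leading 1 for non-negative, 0 for negative, followed by 16 magnitude bits); the function name samples_to_sign_magnitude_bits is hypothetical.

```python
import numpy as np

def samples_to_sign_magnitude_bits(samples, magnitude_bits=16):
    """Return a 2D array of 0/1: one sign column, then the magnitude bits."""
    samples = np.asarray(samples, dtype=np.int64)  # wide enough for abs(-32768)
    sign = (samples >= 0).astype(np.uint8)  # 1 = non-negative, 0 = negative
    magnitude = np.abs(samples)
    # Shift amounts from the most significant bit down to bit 0.
    shifts = np.arange(magnitude_bits - 1, -1, -1)
    bits = ((magnitude[:, None] >> shifts) & 1).astype(np.uint8)
    return np.column_stack([sign, bits])

bit_rows = samples_to_sign_magnitude_bits([5, -32768, 32767, 0])
print(bit_rows.shape)  # (4, 17)
print(bit_rows)
```

For a 16-bit stereo file like the one read above, the left-channel samples could be obtained without the struct loop via np.frombuffer(frames, dtype='<i2')[::2], assuming the frames are interleaved little-endian 16-bit values.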
