Python: fill hollow object - python

I am writing a script to calculate the volume of an arbitrarily shaped 3D object. I don't care whether the object is hollow or not; I need its total volume.
The data model I have is a 3D table (a histogram of pixels) of ones and zeros. Ones mark where the object is and zeros where there is nothing. For a completely filled object, the volume is simply the number of pixels that contain a one multiplied by the pixel volume.
The main difficulty is a hollow object, where zeros are surrounded by ones, so the straightforward method described above is no longer valid. What we need to do is fill the whole object area with ones. Here is a 2D example to show what I mean.
A 2D table:
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 1 1 0 0 0
0 0 1 1 0 0 0 1 1 1 0 0
0 0 0 1 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 1 0 0 0 0
0 0 1 1 1 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
I need to transform it into this:
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 1 1 0 0 0
0 0 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 0 0 0 0 0
0 0 1 1 1 1 1 1 0 0 0 0
0 0 1 1 1 1 1 1 0 0 0 0
0 0 1 1 1 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0

If you use scipy you can do this in one line with binary_fill_holes. And this works in n-dimensions. With your example:
import numpy as np
from scipy import ndimage
shape=np.array([
[0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,1,1,1,1,1,1,0,0,0],
[0,0,1,1,0,0,0,1,1,1,0,0],
[0,0,0,1,0,0,1,0,0,0,0,0],
[0,0,1,0,0,0,0,1,0,0,0,0],
[0,0,1,0,0,0,0,1,0,0,0,0],
[0,0,1,1,1,1,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0,0]
])
shape[ndimage.binary_fill_holes(shape)] = 1
#Output:
[[0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1 1 0 0 0]
[0 0 1 1 1 1 1 1 1 1 0 0]
[0 0 0 1 1 1 1 0 0 0 0 0]
[0 0 1 1 1 1 1 1 0 0 0 0]
[0 0 1 1 1 1 1 1 0 0 0 0]
[0 0 1 1 1 1 1 1 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0]]
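Since the goal is a volume, the same call can be applied directly to a 3-D array. Here is a minimal sketch, assuming a hypothetical hollow 5x5x5 cube as test data and a made-up voxel_volume of 1.0 (neither comes from the question):

```python
import numpy as np
from scipy import ndimage

# Hypothetical test object: a 5x5x5 cube whose 3x3x3 core is empty.
cube = np.ones((5, 5, 5), dtype=int)
cube[1:4, 1:4, 1:4] = 0

voxel_volume = 1.0  # assumed size of one voxel, in whatever unit you use

# binary_fill_holes returns a boolean array with enclosed cavities set to True.
filled = ndimage.binary_fill_holes(cube)

shell_volume = cube.sum() * voxel_volume    # volume of the shell only
total_volume = filled.sum() * voxel_volume  # volume including the cavity
print(shell_volume, total_volume)
```

Only cavities that are fully enclosed get filled; a shell with an opening to the outside is left as it is.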

A standard flood fill should be extensible to three dimensions. From Wikipedia, the 2-d version in outline:
1. If the color of node is not equal to target-color, return.
2. Set the color of node to replacement-color.
3. Perform Flood-fill (one step to the west of node, target-color, replacement-color).
Perform Flood-fill (one step to the east of node, target-color, replacement-color).
Perform Flood-fill (one step to the north of node, target-color, replacement-color).
Perform Flood-fill (one step to the south of node, target-color, replacement-color).
4. Return.
Notice that in step 3. you are keeping track of all the adjacent cells. If you change this to find all adjacent cells in 3-d and run as before it should work nicely.
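One way to apply this in 3-D without knowing a seed point inside the cavity is to flood the empty space from outside the object and count everything that was not reached. The sketch below is an iterative variant of the outline above, not the recursive one, and assumes 6-connected neighbours. The function name filled_volume and the test cube are made up for illustration.

```python
from collections import deque

import numpy as np


def filled_volume(voxels, voxel_volume=1.0):
    """Volume of the object including enclosed cavities (hypothetical helper)."""
    # Pad with one layer of empty voxels so the outside is a single region.
    grid = np.pad(np.asarray(voxels, dtype=bool), 1, constant_values=False)
    outside = np.zeros(grid.shape, dtype=bool)

    # Axis-aligned neighbour offsets: 6 of them in 3-D, 4 in 2-D.
    offsets = []
    for axis in range(grid.ndim):
        for step in (-1, 1):
            off = [0] * grid.ndim
            off[axis] = step
            offsets.append(tuple(off))

    start = (0,) * grid.ndim  # a corner of the padding, always empty
    outside[start] = True
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        for off in offsets:
            nxt = tuple(c + o for c, o in zip(cell, off))
            if any(n < 0 or n >= s for n, s in zip(nxt, grid.shape)):
                continue  # outside the padded array
            if grid[nxt] or outside[nxt]:
                continue  # object voxel, or already visited
            outside[nxt] = True
            queue.append(nxt)

    # Everything the flood did not reach is either object or enclosed cavity.
    return (~outside).sum() * voxel_volume


# Hypothetical test object: hollow 5x5x5 cube with a 3x3x3 cavity.
cube = np.ones((5, 5, 5), dtype=int)
cube[1:4, 1:4, 1:4] = 0
volume = filled_volume(cube)
print(volume)
```

Using an explicit queue instead of recursion avoids hitting Python's recursion limit on large volumes.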

Not intuitive and hard to read, but compact:
matrix = [[0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 1, 0],
[0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0]]
ranges = [1 in m and range(m.index(1), len(m)-list(reversed(m)).index(1)) or None for m in matrix]
result = [[ranges[j] is not None and i in ranges[j] and 1 or 0 for i,a in enumerate(m)] for j,m in enumerate(matrix)]
result
[[0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0]]

matrix=[
[0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,1,1,1,1,1,1,0,0,0],
[0,0,1,1,0,0,0,1,1,1,0,0],
[0,0,0,1,0,0,1,0,0,0,0,0],
[0,0,1,0,0,0,0,1,0,0,0,0],
[0,0,1,0,0,0,0,1,0,0,0,0],
[0,0,1,1,1,1,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0,0]
]
def fill(x, y):
    # Recursive 4-connected flood fill; (x, y) must start inside the cavity.
    if (x == len(matrix)
            or y == len(matrix[0])
            or x == -1
            or y == -1
            or matrix[x][y] == 1):
        return
    matrix[x][y] = 1
    fill(x + 1, y)
    fill(x - 1, y)
    fill(x, y + 1)
    fill(x, y - 1)

fill(4, 4)
for row in matrix:
    print(row)

Assuming you're talking about something like filling a voxel shape, why can't you just do something like this? Take it as a pseudocode example for the simplified 2D case: I don't know what data structure you're using (maybe a numpy.array?), so I'm assuming a hypothetical "list of lists as a matrix", and I'm not taking into account the problem of modifying an iterable while traversing it, etc.
for i, row in enumerate(matrix):
    last_filled_voxel_j = None
    for j, voxel in enumerate(row):
        if voxel:
            if last_filled_voxel_j is not None:
                fill_matrix(matrix, i, last_filled_voxel_j, j)
            last_filled_voxel_j = j
...assuming that fill_matrix(matrix, row, column_start, column_end) just fills the row of voxels between, and not including, column_start and column_end.
I guess this is probably not the answer you're looking for, but can you explain what you actually need to do that differs from my pseudocode, so we can be of more help?

Related

How to pivot dataframe into ML format

My head is spinning trying to figure out if I have to use pivot_table, melt, or some other function.
I have a DF that looks like this:
month day week_day classname_en origin destination
0 1 7 2 1 2 5
1 1 2 6 2 1 167
2 2 1 5 1 2 54
3 2 2 6 4 1 6
4 1 2 6 5 6 1
But I want to turn it into something like:
month_1 month_2 ...classname_en_1 classname_en_2 ... origin_1 origin_2 ...destination_1
0 1 0 1 0 0 1 0
1 1 0 0 1 1 0 0
2 0 1 1 0 0 1 0
3 0 1 0 0 1 0 0
4 1 0 0 0 0 0 1
Basically, turn all values into columns and then have binary rows 1 - if the column is present, 0 if none.
IDK if it is at all possible to do with like a single function or not, but would appreciate all and any help!
To expand on @Corralien's answer:
That is indeed one way to do it, but since you are doing this for ML purposes, you might introduce a bug.
With get_dummies on your data you get a matrix with 20 features. Now say you want to predict on some data that suddenly contains one more month than your training data. The matrix for your prediction data would then have 21 features, so you cannot pass it to your fitted model.
To overcome this you can use one-hot encoding from scikit-learn (OneHotEncoder). It makes sure that "new data" always has the same number of features as your training data.
import pandas as pd
df_train = pd.DataFrame({"color":["red","blue"],"age":[10,15]})
pd.get_dummies(df_train)
# output
age color_blue color_red
0 10 0 1
1 15 1 0
df_new = pd.DataFrame({"color":["red","blue","green"],"age":[10,15,20]})
pd.get_dummies(df_new)
#output
age color_blue color_green color_red
0 10 0 0 1
1 15 1 0 0
2 20 0 1 0
As you can see, a new column (color_green) has appeared and the position of color_red has shifted.
If, on the other hand, we use OneHotEncoder, we can avoid all those issues:
from sklearn.preprocessing import OneHotEncoder
df_train = pd.DataFrame({"color":["red","blue"],"age":[10,15]})
ohe = OneHotEncoder(handle_unknown="ignore")
color_ohe_transformed= ohe.fit_transform(df_train[["color"]]) #creates sparse matrix
ohe_features = ohe.get_feature_names_out() # [color_blue, color_red]
pd.DataFrame(color_ohe_transformed.todense(),columns = ohe_features, dtype=int)
# output
color_blue color_red
0 0 1
1 1 0
# now transform new data
df_new = pd.DataFrame({"color":["red","blue","green"],"age":[10,15,20]})
new_data_ohe_transformed = ohe.transform(df_new[["color"]])
pd.DataFrame(new_data_ohe_transformed.todense(),columns = ohe_features, dtype=int)
#output
color_blue color_red
0 0 1
1 1 0
2 0 0
Note that in the last row both blue and red are zero, since that row has color="green", which was not present in the training data.
Note that the todense() function is only used here to illustrate how it works. Usually you would keep it as a sparse matrix and use e.g. scipy.sparse.hstack to append your other features, such as age, to it.
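A minimal sketch of that last step, reusing the color/age toy frames from above. The names X_train and X_new and the choice of CSR format are assumptions for illustration, not part of the original answer.

```python
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

df_train = pd.DataFrame({"color": ["red", "blue"], "age": [10, 15]})
df_new = pd.DataFrame({"color": ["red", "blue", "green"], "age": [10, 15, 20]})

ohe = OneHotEncoder(handle_unknown="ignore")
color_train = ohe.fit_transform(df_train[["color"]])  # sparse, 2 columns
color_new = ohe.transform(df_new[["color"]])          # sparse, same 2 columns

# Wrap the numeric column as a sparse matrix and append it on the right.
age_train = sparse.csr_matrix(df_train[["age"]].to_numpy())
age_new = sparse.csr_matrix(df_new[["age"]].to_numpy())

X_train = sparse.hstack([color_train, age_train], format="csr")
X_new = sparse.hstack([color_new, age_new], format="csr")
print(X_train.shape, X_new.shape)
```

Both matrices end up with the same three columns (color_blue, color_red, age), so a model fitted on X_train can accept X_new.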
Use pd.get_dummies:
out = pd.get_dummies(df, columns=df.columns)
print(out)
# Output
month_1 month_2 day_1 day_2 day_7 week_day_2 week_day_5 ... origin_2 origin_6 destination_1 destination_5 destination_6 destination_54 destination_167
0 1 0 0 0 1 1 0 ... 1 0 0 1 0 0 0
1 1 0 0 1 0 0 0 ... 0 0 0 0 0 0 1
2 0 1 1 0 0 0 1 ... 1 0 0 0 0 1 0
3 0 1 0 1 0 0 0 ... 0 0 0 0 1 0 0
4 1 0 0 1 0 0 0 ... 0 1 1 0 0 0 0
[5 rows x 20 columns]
You can use the get_dummies function of pandas to convert row values into columns.
For that, your code would be:
import pandas as pd
df = pd.DataFrame({
'month': [1, 1, 2, 2, 1],
'day': [7, 2, 1, 2, 2],
'week_day': [2, 6, 5, 6, 6],
'classname_en': [1, 2, 1, 4, 5],
'origin': [2, 1, 2, 1, 6],
'destination': [5, 167, 54, 6, 1]
})
response = pd.get_dummies(df, columns=df.columns)
print(response)
Result :

How to compare each array in a set of binary arrays to an array that is outside the set

I have a set of arrays. I also have a separate array (T) to compare each array in the set to. I've tried to use SequenceMatcher to do this but can't figure out how to loop it so that each array from the set gets compared to T.
This is for a fitness function for a genetic algorithm. I'm new to python and have tried several things. The code below may be laughable!
import difflib
import numpy as np

parents = set()
while len(parents) < 5:
    a = tuple(np.random.choice([0, 1], size=(10)))
    if a not in parents: parents.add(a) # To make them different
parents = np.array([list(x) for x in parents])
print(parents)
T = tuple(np.random.choice([0, 1], size=(20)))
for i in parents:
    fitness=difflib.SequenceMatcher(None,i,T)
    print(fitness.ratio)
I expect the output to be
[[0 0 1 0 0 0 0 1 1 0]
[0 0 0 1 1 0 1 0 0 0]
[1 0 1 1 1 1 0 0 1 1]
[0 0 1 0 0 1 1 0 0 0]
[1 1 0 1 1 0 0 1 0 0]]
and the percent of similarity of each array to T.
but I am getting the following:
[[0 0 1 0 0 0 0 1 1 0]
[0 0 0 1 1 0 1 0 0 0]
[1 0 1 1 1 1 0 0 1 1]
[0 0 1 0 0 1 1 0 0 0]
[1 1 0 1 1 0 0 1 0 0]]
<bound method SequenceMatcher.ratio of <difflib.SequenceMatcher object at 0x1c1ea8ec88>>
<bound method SequenceMatcher.ratio of <difflib.SequenceMatcher object at 0x1c1dff9438>>
<bound method SequenceMatcher.ratio of <difflib.SequenceMatcher object at 0x1c1ea8ec88>>
<bound method SequenceMatcher.ratio of <difflib.SequenceMatcher object at 0x1c1dff9438>>
<bound method SequenceMatcher.ratio of <difflib.SequenceMatcher object at 0x1c1ea8ec88>>
You have to call the ratio method, i.e. write ratio() with parentheses. Without them you are printing the bound method object itself:
for i in parents:
    fitness=difflib.SequenceMatcher(None,i,T)
    ratios = fitness.ratio()
    print(ratios)
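For a fitness function you will probably want to keep one score per parent rather than only print it. A small sketch, with short hard-coded tuples standing in for the random arrays (the values are made up for illustration):

```python
import difflib

# Hypothetical stand-ins for the random parents and the target T.
parents = [(0, 0, 1, 0), (1, 1, 1, 1)]
T = (1, 1, 1, 1)

# One similarity score per parent, in the same order as `parents`.
ratios = [difflib.SequenceMatcher(None, p, T).ratio() for p in parents]
print(ratios)
```

ratio() returns a float between 0 and 1, with 1.0 meaning the two sequences are identical.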

Discard rows and corresponding colums of a matrix that are all 0 [duplicate]

I have a square matrix that looks something like
0 0 0 0 0 0 0
1 0 0 0 1 0 0
1 1 1 0 0 0 0
0 0 0 0 0 0 0
1 1 1 0 1 1 0
1 1 1 0 1 1 0
0 0 0 0 0 0 0
eg the output of this would be:
0 0 0 | 0 0 |
1 0 0 | 1 0 |
1 1 1 | 0 0 |
- - - + - - +
1 1 1 | 1 1 |
1 1 1 | 1 1 |
- - - + - - +
0 0 0 0 0
1 0 0 1 0
1 1 1 0 0
1 1 1 1 1
1 1 1 1 1
Notice how the 4th row and column are all 0, as well as the last. I would like to delete rows and columns if and only if the ith row and the ith column are all 0s. (Also note that the first row of 0s remains since the first column contains non-zero elements.)
Is there a clean and easy way to do this without looping through each one?
Assuming a is a square NumPy array:
# find out the index to keep
keep_idx = a.any(0) | a.any(1)
# subset the array
a[keep_idx][:, keep_idx]
#array([[0, 0, 0, 0, 0],
# [1, 0, 0, 1, 0],
# [1, 1, 1, 0, 0],
# [1, 1, 1, 1, 1],
# [1, 1, 1, 1, 1]])
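For reference, here is the same idea as a self-contained script using the matrix from the question. The variable name trimmed and the use of np.ix_ (instead of chaining two indexing steps) are choices made for this sketch:

```python
import numpy as np

a = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
])

# Keep index i if column i OR row i contains a non-zero entry,
# i.e. drop it only when both are all zeros.
keep_idx = a.any(axis=0) | a.any(axis=1)

# np.ix_ selects the kept rows and the kept columns in one step.
trimmed = a[np.ix_(keep_idx, keep_idx)]
print(trimmed)
```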
Suppose we have a 7x7 DataFrame df similar to your matrix, assuming default integer row and column labels; then the following code does the job:
row_sum = df.sum(axis=1)
col_sum = df.sum(axis=0)
lst = []
for i in range(len(df)):
    # drop index i only if both row i and column i are all zeros
    if row_sum[i] == 0 and col_sum[i] == 0:
        lst.append(i)
df1 = df.drop(lst, axis=1).drop(lst, axis=0)

reason for transposed confusion matrix in heatmap

I plot a heatmap which takes a confusion matrix as input data. The confusion matrix has the shape:
[[37 0 0 0 0 0 0 0 0 0]
[ 0 42 0 0 0 1 0 0 0 0]
[ 1 0 43 0 0 0 0 0 0 0]
[ 0 0 0 44 0 0 0 0 1 0]
[ 0 0 0 0 37 0 0 1 0 0]
[ 0 0 0 0 0 47 0 0 0 1]
[ 0 0 0 0 0 0 52 0 0 0]
[ 0 0 0 0 1 0 0 47 0 0]
[ 0 1 0 1 0 0 0 1 45 0]
[ 0 0 0 0 0 2 0 0 0 45]]
The code to plot the heatmap is:
fig2=plt.figure()
fig2.add_subplot(111)
sns.heatmap(confm.T,annot=True,square=True,cbar=False,fmt="d")
plt.xlabel("true label")
plt.ylabel("predicted label")
which yields:
As you can see, the input matrix "confm" is transposed (confm.T). What is the reason for this? Do I necessarily have to do that?
When I plot your data with the code you provided I get this:
Without the transpose and when swapping the x and y labels you get:
fig2=plt.figure()
fig2.add_subplot(111)
sns.heatmap(confm,annot=True,square=True,cbar=False,fmt="d")
plt.xlabel("predicted label")
plt.ylabel("true label")
Which results in the same confusion matrix. What the transpose really does is swap which axis holds the prediction and which holds the ground truth (true label). Which one you need depends on how the data is formatted.
You need to transpose only if you want to switch which data is placed along which axis. I usually use the confusion matrix as is: y for true labels, x for predicted labels. You need to transpose the matrix and swap the labels only if you prefer it the other way around: y for predicted labels, x for true labels.
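To see the effect of the transpose on a small case, here is a sketch that builds a 2x2 confusion matrix by hand with rows as true labels and columns as predicted labels. The labels are made up, and the rows-are-true layout is an assumption about how confm was produced.

```python
import numpy as np

# Hypothetical labels for a two-class problem.
y_true = np.array([0, 0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1, 1])

# confm[i, j] counts samples whose true label is i and predicted label is j.
confm = np.zeros((2, 2), dtype=int)
np.add.at(confm, (y_true, y_pred), 1)

print(confm)    # rows: true label, columns: predicted label
print(confm.T)  # rows: predicted label, columns: true label
```

sns.heatmap draws rows along the y axis and columns along the x axis, so plotting confm.T puts the predicted label on y and the true label on x, which is what the axis labels in the question say.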

How to apply logical operator OR in some of the list item?

I want to know whether it is possible to include the logical operator OR in list items. For example, change this line of code:
CHARS = ['X','Y','Z']
to something like this (I know this is not the correct way):
CHARS = ['X','Y','Z','X OR Y','Y OR Z','X OR Z']
Can anyone help me?
Example code:
import numpy as np
seqs = ["XYZXYZ","YZYZYZ"]
CHARS = ['X','Y','Z']
CHARS_COUNT = len(CHARS)
maxlen = max(map(len, seqs))
res = np.zeros((len(seqs), CHARS_COUNT * maxlen), dtype=np.uint8)
for si, seq in enumerate(seqs):
    seqlen = len(seq)
    # one array element per character of the sequence
    arr = np.array(list(seq))
    for ii, char in enumerate(CHARS):
        res[si][ii*seqlen:(ii+1)*seqlen][arr == char] = 1
print(res)
It scans each sequence for X first and sets a 1 wherever X occurs, then does the same for Y and finally for Z.
Output:
[[1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1]
[0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 1]]
Expected output after include logical OR:
[[1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1 1 1 0 1 1 0 0 1 1 0 1 1 1 0 1 1 0 1]
[0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 1 1 0 1 0 1 0 1 1 1 1 1 1 0 1 0 1 0 1]]
The example below is a bit contrived, but using itertools.combinations would be a way to generate combinations of size n for a given list. Combine this with str.join() and you'd be able to generate strings as exemplified in the first part of your question:
import itertools
CHARS = ['X','Y','Z']
allCombinations = [" OR ".join(x) for i in range(1,len(CHARS)) for x in itertools.combinations(CHARS, i)]
print(repr(allCombinations))
Output:
['X', 'Y', 'Z', 'X OR Y', 'X OR Z', 'Y OR Z']
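The strings alone do not produce the encoded array from the question. One possible way to get there is to represent each item as a tuple of allowed characters and test membership with np.isin. In this sketch the names GROUPS and encode are made up, and the group order follows the CHARS list in the question rather than the itertools order:

```python
import numpy as np

seqs = ["XYZXYZ", "YZYZYZ"]

# Each group is the set of characters that counts as a match;
# a two-character group plays the role of "X OR Y" and so on.
GROUPS = [("X",), ("Y",), ("Z",), ("X", "Y"), ("Y", "Z"), ("X", "Z")]


def encode(seqs, groups):
    maxlen = max(map(len, seqs))
    res = np.zeros((len(seqs), len(groups) * maxlen), dtype=np.uint8)
    for si, seq in enumerate(seqs):
        arr = np.array(list(seq))
        for gi, group in enumerate(groups):
            # View on the block of columns that belongs to this group.
            block = res[si, gi * maxlen : gi * maxlen + len(seq)]
            # 1 wherever the character is any member of the group.
            block[np.isin(arr, group)] = 1
    return res


res = encode(seqs, GROUPS)
print(res)
```

Blocks are offset by maxlen here, which matches the original code when all sequences have the same length, as they do in the example.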
