How to perform morphology erosion using FFT Convolution - python

I am trying to compute morphological erosion using FFT convolution. I know that erosion is the dual operation to dilation. The first problem is that I cannot use 0 as the background value as I usually do, so I biased my values: 0.1 denotes background and 1.0 denotes foreground. After inverting background and foreground and performing FFT convolution with the structuring element (using scipy.signal.fftconvolve), I obtained a result that I cannot interpret further. I know I should somehow threshold the result and invert it again, but how?
My 2D signal A:
1 1 0 1 1
1 1 1 1 1
0 1 1 1 0
1 1 1 1 1
1 1 0 1 1
Structure element B:
0 1 0
1 1 1
0 1 0
Erode(A,B):
0 0 0 0 0
0 1 0 1 0
0 0 1 0 0
0 1 0 1 0
0 0 0 0 0
Using FFT Convolution, inv(A):
0.1 0.1 1.0 0.1 0.1
0.1 0.1 0.1 0.1 0.1
1.0 0.1 0.1 0.1 1.0
0.1 0.1 0.1 0.1 0.1
0.1 0.1 1.0 0.1 0.1
and B:
0.1 1.0 0.1
1.0 1.0 1.0
0.1 1.0 0.1
The result as below:
0.31 1.32 1.32 1.32 0.31
1.32 0.72 1.44 0.72 1.32
1.32 1.44 0.54 1.44 1.32
1.32 0.72 1.44 0.72 1.32
0.31 1.32 1.32 1.32 0.31
What next? Normalize/threshold, then invert?
Best regards

My answer arrives really late, but I'll still give it.
The posted Erode(A,B) is wrong. This should be the result:
1 0 0 0 1
0 1 0 1 0
0 0 1 0 0
0 1 0 1 0
1 0 0 0 1
Erosion and dilation are rank operations, more particularly min/max operations, but definitely not convolutions. Thus, you cannot perform them using the FFT.

After performing the convolution by multiplying in frequency space and inverse-transforming back into real space, you must then threshold the result above a certain value. According to the paper Dilation and Erosion of Gray Images with Spherical Masks by J. Kukal, D. Majerova, and A. Prochazka, that threshold is >0.5 for dilation; for erosion it is >m-0.5, where m is the structure element's volume (the number of 1s in B, 5 in this case).
In brief: This code will give you the expected result.
from scipy.signal import fftconvolve
import numpy as np

def erode(A, B):
    # A pixel survives only if the correlation reaches the full count of
    # nonzero elements in B; the -0.5 guards against floating-point error.
    thresh = np.count_nonzero(B) - 0.5
    return fftconvolve(A, B, 'same') > thresh
This will work in any dimension and reproduce exactly the results of scipy.ndimage.binary_erosion - at least, I've tested it in 2D and 3D.
As for the other answer posted here that disputes the expected result of an erosion: it depends on the boundary condition. Both scipy.ndimage.binary_erosion and the custom erode(A,B) function written here assume that erosion may occur from all edges of the input A - i.e. A is padded out with 0s before the erosion. If you don't like this boundary condition - if you think, for example, that it should be treated as a reflected boundary condition - then you should consider padding the array yourself using e.g. np.pad(A,np.shape(B)[0],'reflect'). Then you'd need to un-pad the result afterwards.
I was light on details here because I wrote a more comprehensive answer on using frequency-space convolution to perform both erosion and dilation over at another question you asked, but I thought it was worthwhile to have a conclusive answer posted here for anyone seeking it.
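As a quick self-contained check of this approach, here is a sketch using the A and B from the question (note that fftconvolve flips B; that makes no difference here because this B is symmetric, but an asymmetric structuring element would need B[::-1, ::-1]):

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.ndimage import binary_erosion

def erode(A, B):
    # A pixel survives only if every nonzero element of B lands on
    # foreground; -0.5 guards against floating-point error in the FFT.
    thresh = np.count_nonzero(B) - 0.5
    return fftconvolve(A, B, 'same') > thresh

A = np.array([[1, 1, 0, 1, 1],
              [1, 1, 1, 1, 1],
              [0, 1, 1, 1, 0],
              [1, 1, 1, 1, 1],
              [1, 1, 0, 1, 1]])
B = np.array([[0, 1, 0],
              [1, 1, 1],
              [0, 1, 0]])

print(erode(A, B).astype(int))
print(np.array_equal(erode(A, B), binary_erosion(A, B)))  # True
```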


standardize pandas groupby results

I am using pandas to get subgroup averages, and the basics work fine. For instance,
import numpy as np
import pandas as pd
from pprint import pprint

d = np.array([[1, 4], [1, 1], [0, 1], [1, 1]])
m = d.mean(axis=1)
p = pd.DataFrame(m, index='A1,A2,B1,B2'.split(','), columns=['Obs'])
pprint(p)
x = p.groupby([v[0] for v in p.index])
pprint(x.mean())
x = p.groupby([v[1] for v in p.index])
pprint(x.mean())
YIELDS:
Obs
A1 2.5
A2 1.0
B1 0.5
B2 1.0
Obs
A 1.75 <<<< 1.75 is (2.5 + 1.0) / 2
B 0.75
Obs
1 1.5
2 1.0
But, I also need to know how much A and B (1 and 2) deviate from their common mean. That is, I'd like to have tables like:
Obs Dev
A 1.75 0.50 <<< deviation of the Obs average, i.e., 1.75 - 1.25
B 0.75 -0.50 <<< 0.75 - 1.25 = -0.50
Obs Dev
1 1.5 0.25
2 1.0 -0.25
I can do this using loc, apply etc - but this seems silly. Can anyone think of an elegant way to do this using groupby or something similar?
Aggregate the means, then compute the difference to the mean of means:
(p.groupby(p.index.str[0])
.agg(Obs=('Obs', 'mean'))
.assign(Dev=lambda d: d['Obs']-d['Obs'].mean())
)
Or, if groups can have different sizes and you want the difference to the overall mean (not the mean of the group means):
(p.groupby(p.index.str[0])
.agg(Obs=('Obs', 'mean'))
.assign(Dev=lambda d: d['Obs']-p['Obs'].mean()) # notice the p (not d)
)
output:
Obs Dev
A 1.75 0.5
B 0.75 -0.5
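Putting it together with the data from the question, a minimal self-contained sketch (variable names follow the question):

```python
import numpy as np
import pandas as pd

d = np.array([[1, 4], [1, 1], [0, 1], [1, 1]])
p = pd.DataFrame(d.mean(axis=1), index='A1,A2,B1,B2'.split(','), columns=['Obs'])

# Group on the first character of the index, then attach each group
# mean's deviation from the mean of the group means.
out = (p.groupby(p.index.str[0])
        .agg(Obs=('Obs', 'mean'))
        .assign(Dev=lambda df: df['Obs'] - df['Obs'].mean()))
print(out)
```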

How to measure similarity of inner observation variation without considering actual values?

I am sure that this has been done before but I am unsure of how to even phrase the question for google and have been racking my brain for a few hours now, but I can explain it with an example. Imagine you have the data below.
observation #  m1  m2  m3  m4  m5  m6
1              T   L   T   L   T   L
2              A   R   A   R   A   A
3              B   C   B   C   B   C
4              K   K   K   A   L   K
5              P   P   P   R   L   P
I want to generate some sort of similarity metric between observations that relates to the variation across the m1-6 variables. The actual values in the cells shouldn't matter at all.
Considering the table above, for example observations 1 and 3 are exactly the same as they vary the same across the m's (TLTLTL & BCBCBC). 1 & 3 are very similar to 2, and observations 4 and 5 are the same but not similar to 1-3.
I would like an output that captures all these relationships for example . . .
observation #    1     2     3     4     5
1                1     0.8   1     0.1   0.1
2                0.8   1     0.8   0.2   0.2
3                1     0.8   1     0.1   0.1
4                0.1   0.2   0.1   1     1
5                0.1   0.2   0.1   1     1
A few notes: each cell can contain more than one letter, but again the actual contents of each cell don't matter - just the variation across the m's within each observation compared to other observations. Is there a name for what I am trying to do here? Also, I only know Python and R, so if you provide any code please use one of those (Python preferred).
It is driving me crazy that I can't figure this out. Thanks in advance for any help :)
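I don't know of a standard name for this, but it is closely related to comparing the partitions each row induces on the columns. One simple approach: for every pair of columns, record whether the two cells in a row are equal, and score two observations by the fraction of column pairs on which these equality patterns agree. A sketch (the function pattern_similarity is my own, not a library call):

```python
import numpy as np

def pattern_similarity(a, b):
    # Equality structure of a row: a boolean matrix saying, for every
    # pair of positions (i, j), whether the values there are equal.
    a, b = np.asarray(a), np.asarray(b)
    ea = a[:, None] == a[None, :]
    eb = b[:, None] == b[None, :]
    # Fraction of position pairs on which the two rows agree.
    return (ea == eb).mean()

rows = [list("TLTLTL"), list("ARARAA"), list("BCBCBC"),
        list("KKKALK"), list("PPPRLP")]
sim = np.array([[pattern_similarity(r, s) for s in rows] for r in rows])
print(sim.round(2))
```

This reproduces the identities in the example (observations 1 and 3 score 1.0, as do 4 and 5); the off-diagonal values differ from the illustrative numbers in the question, which were only a rough target.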

Converting a column into floats except first few rows i.e selective conversion in a column

I'm trying to do some data processing but I keep getting the same error.
My dataframe (con_tc) looks like this:
Index u_p0 u_p1 u_p2.........u_p100
x 0 0 0 0
y 0 0 0 0
z 30 50 75 1000
0.01 0.5 0.6 0.43 0.83
0.02 0.56 0.94 0.94 0.7
....
1000 0.4 0.5 0.45 0.56
When I run this line of code
con_tc.index = con_tc.index.map(lambda w: float(w) if (w not in 'xyz') else w)
which is trying to clean the index into float, I am getting the error as
TypeError: 'in <string>' requires string as left operand, not float
The aim behind this is to convert all the numeric values into floats except x,y and z.
In basic terms, the index should become:
Index
x
y
z
0.01
0.02
....
1000
If anyone can help me out it will be really helpful.
The error occurs because some of your index entries are already numeric, and in <string> requires a string on the left-hand side. Converting each value to a string before the test fixes it:
con_tc.index = con_tc.index.map(lambda w: w if str(w) in ('x', 'y', 'z') else float(w))
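A minimal demonstration of converting such an index on a toy frame (the toy values are made up), guarding with str() so entries that are already numeric don't break the membership test:

```python
import pandas as pd

con_tc = pd.DataFrame({'u_p0': [0, 0, 30, 0.5, 0.4]},
                      index=['x', 'y', 'z', '0.01', '1000'])

# Convert every index label to float except the literal x/y/z labels;
# str(w) guards against entries that are already numeric.
con_tc.index = con_tc.index.map(
    lambda w: w if str(w) in ('x', 'y', 'z') else float(w))
print(list(con_tc.index))  # ['x', 'y', 'z', 0.01, 1000.0]
```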

Choosing a random value from a discrete distribution

I came across the following code while reading up on RL. The probs vector contains the probabilities of each action to be taken, and I believe the given loop tries to choose an action randomly from that distribution. Why/how does this work?
a = 0
rand_select = np.random.rand()
while True:
    rand_select -= probs[a]
    if rand_select < 0 or a + 1 == n_actions:
        break
    a += 1
actions = a
After going through similar code, I realised that "actions" contains the final action to be taken.
You can view the probabilities as a distribution of contiguous parts on the line from 0.0 to 1.0.
If we have A: 0.2, B: 0.3, C: 0.5, the line would be
0.0 --A--> 0.2
0.2 --B--> 0.5
0.5 --C--> 1.0
And in total 1.0.
The algorithm is choosing a random location between 0.0->1.0 and finds out where it "landed" (A, B or C) by sequentially ruling out parts.
Suppose we draw 0.73. We can "visualize" it like this (selection marked with *):
0.0 ---------------------------> 1.0
*
0.0 --A--> 0.2 --B--> 0.5 --C--> 1.0
0.73 - 0.2 > 0, so we subtract 0.2 (leaving 0.53) and are left with:
0.2 --B--> 0.5
0.5 --C--> 1.0
0.53 - 0.3 > 0, so we subtract 0.3 (leaving 0.23) and are left with:
0.5 --C--> 1.0
0.23 - 0.5 < 0, so we know the part we drew was C.
The selection distributes the same as the probabilities and the algorithm is O(n) where n is the number of probabilities.
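Wrapped as a function with the draw passed in (so it is reproducible), and checked against the vectorized equivalent, np.searchsorted on the cumulative sums:

```python
import numpy as np

def sample_discrete(probs, rand_select):
    # Walk the segments, subtracting each width from the draw until the
    # remainder goes negative: that segment contains the draw.
    a = 0
    while True:
        rand_select -= probs[a]
        if rand_select < 0 or a + 1 == len(probs):
            break
        a += 1
    return a

probs = [0.2, 0.3, 0.5]
print(sample_discrete(probs, 0.73))             # 2, i.e. C
print(np.searchsorted(np.cumsum(probs), 0.73))  # 2 as well
```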

How SelectKBest (chi2) calculates score?

I am trying to find the most valuable features by applying feature selection methods to my dataset. I'm using the SelectKBest function for now. I can generate the score values and sort them as I want, but I don't understand exactly how this score value is calculated. I know that, in theory, a higher score means a more valuable feature, but I need a mathematical formula or an example that computes the score, to learn this deeply.
from sklearn.feature_selection import SelectKBest, chi2
import pandas as pd

bestfeatures = SelectKBest(score_func=chi2, k=10)
fit = bestfeatures.fit(dataValues, dataTargetEncoded)
feat_importances = pd.Series(fit.scores_, index=dataValues.columns)
topFeatures = feat_importances.nlargest(50).copy().index.values
print("TOP 50 Features (Best to worst):\n")
print(topFeatures)
Thank you in advance
Say you have one feature and a target with 3 possible values
X = np.array([3.4, 3.4, 3. , 2.8, 2.7, 2.9, 3.3, 3. , 3.8, 2.5])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
X y
0 3.4 0
1 3.4 0
2 3.0 0
3 2.8 1
4 2.7 1
5 2.9 1
6 3.3 2
7 3.0 2
8 3.8 2
9 2.5 2
First we binarize the target
from sklearn.preprocessing import LabelBinarizer
y = LabelBinarizer().fit_transform(y)
X y1 y2 y3
0 3.4 1 0 0
1 3.4 1 0 0
2 3.0 1 0 0
3 2.8 0 1 0
4 2.7 0 1 0
5 2.9 0 1 0
6 3.3 0 0 1
7 3.0 0 0 1
8 3.8 0 0 1
9 2.5 0 0 1
Then perform a dot product between feature and target, i.e. sum all feature values by class value
observed = y.T.dot(X)
>>> observed
array([ 9.8, 8.4, 12.6])
Next take a sum of feature values and calculate class frequency
feature_count = X.sum(axis=0).reshape(1, -1)
class_prob = y.mean(axis=0).reshape(1, -1)
>>> class_prob, feature_count
(array([[0.3, 0.3, 0.4]]), array([[30.8]]))
Now, as in the first step, we take a dot product to get the expected matrix:
expected = np.dot(class_prob.T, feature_count)
>>> expected
array([[ 9.24],[ 9.24],[12.32]])
Finally we calculate a chi^2 value:
chi2 = ((observed.reshape(-1,1) - expected) ** 2 / expected).sum(axis=0)
>>> chi2
array([0.11666667])
We have a chi^2 value; now we need to judge how extreme it is. For that we use a chi^2 distribution with (number of classes - 1) degrees of freedom and calculate the area from our chi^2 value to infinity, i.e. the probability of a chi^2 at least as extreme as the one we got. This is the p-value (using the chi-square survival function from scipy):
import scipy.special
p = scipy.special.chdtrc(3 - 1, chi2)
>>> p
array([0.94333545])
Compare with SelectKBest:
s = SelectKBest(chi2, k=1)
s.fit(X.reshape(-1,1),y)
>>> s.scores_, s.pvalues_
(array([0.11666667]), [0.943335449873492])
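For reference, the whole walkthrough collapses into one short self-contained script whose manual values match SelectKBest's:

```python
import numpy as np
from scipy.special import chdtrc
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import LabelBinarizer

X = np.array([3.4, 3.4, 3.0, 2.8, 2.7, 2.9, 3.3, 3.0, 3.8, 2.5]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])

Y = LabelBinarizer().fit_transform(y)  # one indicator column per class
observed = Y.T.dot(X)                  # per-class sums of the feature
expected = Y.mean(axis=0).reshape(-1, 1).dot(X.sum(axis=0).reshape(1, -1))
chi2_manual = ((observed - expected) ** 2 / expected).sum(axis=0)
p_manual = chdtrc(3 - 1, chi2_manual)  # survival function, 2 degrees of freedom

s = SelectKBest(chi2, k=1).fit(X, y)
print(chi2_manual, s.scores_)   # both around 0.1167
print(p_manual, s.pvalues_)     # both around 0.9433
```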
