Intersection over Union over two segments

Let's say I have these two DataFrames: Groundtruth and Prediction.
Each DataFrame has 3 columns: Action, Start and End.
**Prediction :**
Action | Start | End
-------------------------
3 | 0 | 10
2 | 10 | 70
3 | 80 | 120
0 | 120 | 350
7 | 400 | 610
...
**Groundtruth :**
Action | Start | End
-------------------------
2 | 20 | 140
0 | 150 | 340
6 | 420 | 600
...
I want to compute the Intersection-over-Union (IoU) between these two DataFrames using all columns: first Action, to check whether the prediction is correct, and then the Start and End of each segment, to check whether the action starts and ends at the right time.
Here's my code:
import numpy as np
from sklearn.metrics import confusion_matrix

def compute_iou(y_pred, y_true):
    y_pred = y_pred.flatten()
    y_true = y_true.flatten()
    cm = confusion_matrix(y_true, y_pred)
    # per-class intersection and union from the confusion matrix
    intersection = np.diag(cm)
    ground_truth_set = cm.sum(axis=1)
    predicted_set = cm.sum(axis=0)
    union = ground_truth_set + predicted_set - intersection
    IoU = intersection / union
    # count a class as fully correct once its IoU exceeds 0.5
    for i in range(len(IoU)):
        if (IoU[i]>0.5):
            IoU[i] = 1
    return round(np.mean(IoU)*100, 3)
This works when I want to calculate the IoU over the Action column.
Now, how can I adapt it to get the IoU of the overlapping segments defined by the Start and End columns?
PS: The Groundtruth and Prediction DataFrames don't have the same number of rows.

(post edit)
The calculation is broken into three cases:
Overlap: where the activity matches, and there's an overlap between the ground truth interval and the prediction interval.
No overlap: the activity matches, but there's no such overlap.
No hit: the activity wasn't predicted at all, or there's a wrong activity.
Here's the code:
df = pd.merge(pred, groundtruth, on = "Action", how = "outer", suffixes = ["_pred", "_gt"])
overlap = df[(df.Start_pred < df.End_gt) & (df.Start_gt < df.End_pred)]
intersection = (overlap[["End_pred", "End_gt"]].min(axis=1) - overlap[["Start_pred", "Start_gt"]].max(axis=1)).sum()
union_where_overlap = (overlap[["End_pred", "End_gt"]].max(axis=1) - overlap[["Start_pred", "Start_gt"]].min(axis=1)).sum()
no_hit = df[df.isna().sum(axis=1) > 0]
union_no_hit = (no_hit[["End_pred", "End_gt"]].max(axis=1) - no_hit[["Start_pred", "Start_gt"]].min(axis=1)).sum()
no_overlap = df[~((df.Start_pred < df.End_gt) & (df.Start_gt < df.End_pred))].dropna()
union_no_overlap = ((no_overlap.End_pred - no_overlap.Start_pred) + (no_overlap.End_gt - no_overlap.Start_gt)).sum()
IoU = intersection / (union_no_hit + union_where_overlap + union_no_overlap)
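To check the three cases against the sample tables from the question, the calculation can be wrapped in a function. This is only a minimal sketch: the name `segment_iou` and the boolean masks are illustrative choices rather than part of the answer above, and it keeps the merge on Action, so an action that occurs several times in both tables is paired in every combination.

```python
import pandas as pd

pred = pd.DataFrame({"Action": [3, 2, 3, 0, 7],
                     "Start": [0, 10, 80, 120, 400],
                     "End": [10, 70, 120, 350, 610]})
gt = pd.DataFrame({"Action": [2, 0, 6],
                   "Start": [20, 150, 420],
                   "End": [140, 340, 600]})


def segment_iou(pred, gt):
    """Hypothetical helper: IoU over segments paired by Action."""
    df = pd.merge(pred, gt, on="Action", how="outer", suffixes=("_pred", "_gt"))
    starts = df[["Start_pred", "Start_gt"]]
    ends = df[["End_pred", "End_gt"]]

    # Rows where the action exists on both sides
    matched = df.notna().all(axis=1)
    # Matched rows whose intervals really intersect
    overlaps = matched & (df.Start_pred < df.End_gt) & (df.Start_gt < df.End_pred)
    # Matched rows whose intervals are disjoint
    disjoint = matched & ~overlaps

    intersection = (ends.min(axis=1) - starts.max(axis=1))[overlaps].sum()

    # Overlapping pairs contribute their joint span; unmatched rows contribute
    # the length of their single interval (min/max skip the NaN side).
    span = ends.max(axis=1) - starts.min(axis=1)
    union = span[overlaps | ~matched].sum()
    # Disjoint pairs contribute the sum of both lengths
    union += ((df.End_pred - df.Start_pred) + (df.End_gt - df.Start_gt))[disjoint].sum()

    return intersection / union


print(segment_iou(pred, gt))
```

For the five prediction rows and three ground-truth rows shown in the question, the intersection is 240 and the union is 800, so the result is 0.3.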

Related

Apply rotation matrix determined by separate fixed point - python

I'm applying a rotation matrix to a group of points with the aim of aligning them along the horizontal axis. In the code below, the xy points I want to adjust are recorded in x and y.
I'm hoping to rotate the points by the angle between (X_Ref, Y_Ref) and (X_Fixed, Y_Fixed). I'd also like (X_Ref, Y_Ref) to end up at (0, 0) once the rotation is completed.
The rotated points currently don't account for this. I'm not sure whether I should adjust for the reference point before or after rotating.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
import pandas as pd
df = pd.DataFrame({
    'Period' : ['1','1','1','1','2','2','2','2'],
    'Label' : ['A','B','C','D','A','B','C','D'],
    'x' : [2.0,3.0,3.0,2.0,2.0,3.0,3.0,1.0],
    'y' : [2.0,3.0,-1.0,0.0,2.0,3.0,-1.0,1.0],
    'X_Ref' : [1,1,1,1,2,2,2,2],
    'Y_Ref' : [1,1,1,1,0,0,0,0],
    'X_Fixed' : [0,0,0,0,0,0,0,0],
    'Y_Fixed' : [0,0,0,0,2,2,2,2],
})
np.random.seed(1)
xy = df[['x','y']].values
Ref = df[['X_Ref','Y_Ref']].values
Fix = df[['X_Fixed','Y_Fixed']].values
fig, ax = plt.subplots()
plot_kws = {'alpha': 0.75,
            'edgecolor': 'white',
            'linewidths': 0.75}
ax.scatter(xy[:, 0], xy[:, 1], **plot_kws)
ax.scatter(Ref[:, 0], Ref[:, 1], marker = 'x')
ax.scatter(Fix[:, 0], Fix[:, 1], marker = '+')
pca = PCA(2)
# Fit the PCA object, but do not transform the data
pca.fit(xy)
# pca.components_ : array, shape (n_components, n_features)
# cos theta
ct = pca.components_[0, 0]
# sin theta
st = pca.components_[0, 1]
# One possible value of theta that lies in [0, pi]
t = np.arccos(ct)
# If t is in quadrant 1, rotate CLOCKwise by t
if ct > 0 and st > 0:
    t *= -1
# If t is in Q2, rotate COUNTERclockwise by the complement of theta
elif ct < 0 and st > 0:
    t = np.pi - t
# If t is in Q3, rotate CLOCKwise by the complement of theta
elif ct < 0 and st < 0:
    t = -(np.pi - t)
# If t is in Q4, rotate COUNTERclockwise by theta, i.e., do nothing
elif ct > 0 and st < 0:
    pass
# Manually build the ccw rotation matrix
rotmat = np.array([[np.cos(t), -np.sin(t)],
                   [np.sin(t), np.cos(t)]])
# Apply rotation to each row of 'xy'. The output (m2)
# will be the rotated FIFA input coordinates.
m2 = (rotmat @ xy.T).T
# Center the rotated point cloud at (0, 0)
m2 -= m2.mean(axis=0)
[Figures omitted: initial and intended distributions for period 1 and period 2.]

Your question is unclear, since the "intended rotation" it mentions can already be achieved by plotting m2, which has already been calculated:
fig, ax = plt.subplots()
ax.scatter(m2[:, 0], m2[:, 1], **plot_kws)
Output: [plot omitted]
But you have also mentioned the following in the question:
The rotation angle is determined by the angle between X_Ref,Y_Ref and X_Fixed,Y_Fixed.
This is a totally different scenario. You can calculate the angle of the line through two points by taking the arctangent of its slope, without having to use PCA at all. This can be done using numpy.arctan as follows:
t = np.arctan((Y_Fixed - Y_Ref) / (X_Fixed - X_Ref))
Here (X_Fixed, Y_Fixed) and (X_Ref, Y_Ref) are assumed to be two points with different x coordinates, so that the division is defined.
For each row in your dataframe, you can then calculate the x and y values after rotation by the angle between (X_Fixed, Y_Fixed) and (X_Ref, Y_Ref) in that particular row. This can be done using the following code snippet:
def rotate_points(row):
    # Angle of the line through the reference and fixed points of this row
    t = np.arctan((row['Y_Fixed'] - row['Y_Ref']) / (row['X_Fixed'] - row['X_Ref']))
    rotmat = np.array([[np.cos(t), -np.sin(t)],
                       [np.sin(t), np.cos(t)]])
    # Cast to float: a row taken from a mixed-type dataframe has object dtype
    xy = row[['x','y']].values.astype(float)
    rotated = rotmat @ xy
    return rotated
df['rotated_x'] = df.apply(lambda row: rotate_points(row)[0], axis = 1)
df['rotated_y'] = df.apply(lambda row: rotate_points(row)[1], axis = 1)
Your dataframe gains two new columns on the right. The table below shows the layout for an earlier version of the question's data; the rotated values depend on the angle computed for each row:
+----+----------+---------+-----+-----+---------+---------+-----------+-----------+-------------+-------------+-------------+
| | Period | Label | x | y | X_Ref | Y_Ref | X_Fixed | Y_Fixed | Direction | rotated_x | rotated_y |
|----+----------+---------+-----+-----+---------+---------+-----------+-----------+-------------+-------------+-------------|
| 0 | 1 | A | -1 | 1 | 1 | 3 | -2 | 0 | Left | -1.34164 | 0.447214 |
| 1 | 1 | B | 0 | 4 | 1 | 3 | -2 | 0 | Left | -1.78885 | 3.57771 |
| 2 | 1 | C | 2 | 2 | 1 | 3 | -2 | 0 | Left | 0.894427 | 2.68328 |
| 3 | 1 | D | 2 | 3 | 1 | 3 | -2 | 0 | Left | 0.447214 | 3.57771 |
| 4 | 2 | E | 2 | 4 | 1 | 3 | -2 | 0 | Right | 0 | 4.47214 |
| 5 | 2 | F | 1 | 4 | 1 | 3 | -2 | 0 | Right | -0.894427 | 4.02492 |
| 6 | 2 | G | 3 | 5 | 1 | 3 | -2 | 0 | Right | 0.447214 | 5.81378 |
| 7 | 2 | H | 0 | 2 | 1 | 3 | -2 | 0 | Right | -0.894427 | 1.78885 |
+----+----------+---------+-----+-----+---------+---------+-----------+-----------+-------------+-------------+-------------+
Now you have your rotated x and y points as desired.
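The per-row rotation can also be written without `apply`. The sketch below is an assumption-laden variant, not the code from this answer: it swaps `np.arctan` for `np.arctan2`, which takes the two differences separately, keeps the quadrant of the direction and does not divide by zero when the two points share an x coordinate. Because of that, its angle can differ by π from the `np.arctan` result.

```python
import numpy as np
import pandas as pd

# Data from the question (only the columns needed here)
df = pd.DataFrame({
    'x': [2.0, 3.0, 3.0, 2.0, 2.0, 3.0, 3.0, 1.0],
    'y': [2.0, 3.0, -1.0, 0.0, 2.0, 3.0, -1.0, 1.0],
    'X_Ref': [1, 1, 1, 1, 2, 2, 2, 2],
    'Y_Ref': [1, 1, 1, 1, 0, 0, 0, 0],
    'X_Fixed': [0, 0, 0, 0, 0, 0, 0, 0],
    'Y_Fixed': [0, 0, 0, 0, 2, 2, 2, 2],
})

# Direction from the reference point to the fixed point, one angle per row
t = np.arctan2(df['Y_Fixed'] - df['Y_Ref'],
               df['X_Fixed'] - df['X_Ref']).to_numpy()
cos_t, sin_t = np.cos(t), np.sin(t)

# Counterclockwise rotation by t, applied to every row at once
df['rotated_x'] = cos_t * df['x'] - sin_t * df['y']
df['rotated_y'] = sin_t * df['x'] + cos_t * df['y']

print(df[['x', 'y', 'rotated_x', 'rotated_y']])
```

A rotation never changes a point's distance from the origin, which gives a quick sanity check on the two new columns.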
UPDATE:
As per the amended question, you can add the reference point at (0,0) in your plot as follows:
fig, ax = plt.subplots()
ax.scatter(m2[:, 0], m2[:, 1], **plot_kws)
ax.scatter(list(np.repeat(0, len(Ref))), list(np.repeat(0, len(Ref))) , **plot_kws)
plt.show()
Output: [plot omitted]

If I understood what you are trying to achieve, there is no need for PCA at all. I'd use complex numbers, which seems more straightforward:
EDIT
There was a small mistake in the order of steps for translation previously. This edit will correct it as well as use your new dataset including changing ref/fixed points at different periods.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({
    'Period' : ['1','1','1','1','2','2','2','2'],
    'Label' : ['A','B','C','D','A','B','C','D'],
    'x' : [2.0,3.0,3.0,2.0,2.0,3.0,3.0,1.0],
    'y' : [2.0,3.0,-1.0,0.0,2.0,3.0,-1.0,1.0],
    'X_Ref' : [1,1,1,1,2,2,2,2],
    'Y_Ref' : [1,1,1,1,0,0,0,0],
    'X_Fixed' : [0,0,0,0,0,0,0,0],
    'Y_Fixed' : [0,0,0,0,2,2,2,2],
})
First, transform fixed/ref points to complex numbers :
for f in ['Ref', 'Fixed']:
    df[f] = df['X_'+f] + 1j*df['Y_'+f]
    df.drop(['X_'+f, 'Y_'+f], axis=1, inplace=True)
Compute the rotation (note that it is the opposite angle of what you stated in your question to match your expected results) :
df['angle'] = - np.angle(df['Ref'] - df['Fixed'])
Compute the rotation for every point (ref/fixed included) :
df['rotated'] = (df['x'] + 1j*df["y"]) * np.exp(1j*df['angle'])
for f in ['Ref', 'Fixed']:
    df[f+'_Rotated'] = df[f] * np.exp(1j*df['angle'])
Center your dataset around the "ref" point :
df['translation'] = - df['Ref_Rotated']
df['NewPoint'] = df['rotated'] + df['translation']
for f in ['Ref', 'Fixed']:
    df[f+'_Transformed'] = df[f+'_Rotated'] + df['translation']
Revert to cartesian coordinates :
df['x2'] = np.real(df['NewPoint'])
df['y2'] = np.imag(df['NewPoint'])
for f in ['Ref', 'Fixed']:
    df['NewX_'+f] = np.real(df[f+'_Transformed'])
    df['NewY_'+f] = np.imag(df[f+'_Transformed'])
And then plot the output for any period you like :
output = df[['Period', 'Label', 'x2', 'y2', 'NewX_Ref', 'NewY_Ref', 'NewX_Fixed', 'NewY_Fixed']]
output.set_index('Period', inplace=True)
fig, ax = plt.subplots()
plot_kws = {'alpha': 0.75,
            'edgecolor': 'white',
            'linewidths': 0.75}
plt.xlim(-5,5)
plt.ylim(-5,5)
period = '1'
ax.scatter(output.loc[period, 'NewX_Ref'], output.loc[period, 'NewY_Ref'])
ax.scatter(output.loc[period, 'NewX_Fixed'], output.loc[period, 'NewY_Fixed'])
ax.scatter(output.loc[period, 'x2'], output.loc[period, 'y2'], **plot_kws, marker = '+')
plt.gca().set_aspect('equal', adjustable='box')
plt.show()
[Figures omitted: results for period 1 and period 2.]
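The rotate-then-translate steps above can be condensed into one small function for a single period. This is a sketch under assumptions: `align_to_reference` is a hypothetical name, and it relies on the identity (p - ref) * r = p * r - ref * r, so subtracting the reference point before rotating gives the same result as the rotation followed by the translation used in the answer.

```python
import numpy as np


def align_to_reference(points, ref, fixed):
    """Hypothetical helper: rotate complex points so that the line through
    `fixed` and `ref` becomes horizontal, with `ref` moved to the origin."""
    # Opposite of the angle of the vector from the fixed point to the reference point
    angle = -np.angle(ref - fixed)
    # Shift so that ref sits at 0, then rotate about the origin
    return (np.asarray(points, dtype=complex) - ref) * np.exp(1j * angle)


# Period 1 of the question: reference point (1, 1), fixed point (0, 0)
ref, fixed = 1 + 1j, 0j
points = [2 + 2j, 3 + 3j, ref, fixed]
aligned = align_to_reference(points, ref, fixed)
print(np.round(aligned, 6))
```

With this convention the reference point lands on 0 and the fixed point lands on the negative real axis, at a distance equal to the original separation between the two points.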

How can I add an AND condition using exclude?

I have this query :
MyTable.objects.filter(date=date).exclude(starthour__range=(start, end), endhour__range=(start, end))
But I want to exclude the rows where starthour__range=(start, end) AND endhour__range=(start, end), not OR. I think OR is used in this case.
Could you help me, please?
Thank you very much!

This is a consequence of De Morgan's law, which states that ¬(x ∧ y) is equivalent to ¬x ∨ ¬y. In other words, the negation of "x and y" is "not x or not y". Indeed, if we take a look at the truth table:
 x | y | x ∧ y | ¬x | ¬y | ¬(x ∧ y) | ¬x ∨ ¬y
---+---+-------+----+----+----------+---------
 0 | 0 |   0   |  1 |  1 |    1     |    1
 0 | 1 |   0   |  1 |  0 |    1     |    1
 1 | 0 |   0   |  0 |  1 |    1     |    1
 1 | 1 |   1   |  0 |  0 |    0     |    0
So excluding items where both the starthour is in the (start, end) range and the endhour is in the (start, end) range is logically equivalent to allowing items where the starthour is not in the range, or where the endhour is not in the range.
Using and logic
You can thus make a disjunction in the .exclude(…) call to filter out items that satisfy at least one of the two conditions, and thus retain only objects that satisfy neither condition:
from django.db.models import Q

MyTable.objects.filter(date=date).exclude(
    Q(starthour__range=(start, end)) | Q(endhour__range=(start, end))
)
Overlap logic
Based on your query, however, you are looking for overlap, not for such range checks. If you want to check whether two things overlap, it is not sufficient to validate the starthour and endhour separately. Indeed, imagine that an event starts at 08:00 and ends at 18:00, and you filter for a range from 09:00 to 17:00: neither the starthour nor the endhour is in the range, but the events still overlap.
Two ranges [s1, e1] and [s2, e2] do not overlap if s1 ≥ e2 or s2 ≥ e1. The negation, the condition under which the two overlap, is thus s1 < e2 and s2 < e1. We can thus exclude the items that overlap with:
# records that do not overlap
MyTable.objects.filter(date=date).exclude(
    starthour__lt=end, endhour__gt=start
)
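The overlap condition can be tried out in plain Python, without a database. The two function names below are hypothetical and only serve to contrast the interval test (s1 < e2 and s2 < e1) with a check on the individual endpoints; the times are the 08:00 to 18:00 event and the 09:00 to 17:00 range from the example above.

```python
from datetime import time


def overlaps(s1, e1, s2, e2):
    """True if the ranges [s1, e1] and [s2, e2] share more than a boundary point."""
    return s1 < e2 and s2 < e1


def endpoint_in_range(s1, e1, s2, e2):
    """True if the start or the end of [s1, e1] lies inside [s2, e2]."""
    return s2 <= s1 <= e2 or s2 <= e1 <= e2


event = (time(8, 0), time(18, 0))
query = (time(9, 0), time(17, 0))

# The event covers the whole query range, so they overlap ...
print(overlaps(*event, *query))
# ... although neither of its endpoints falls inside the range
print(endpoint_in_range(*event, *query))
```

Two ranges that merely touch, such as 08:00 to 09:00 and 09:00 to 10:00, are not reported as overlapping because the comparisons are strict.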

Comparing a value from one dataframe with values from columns in another dataframe and getting the data from third column

The title is a bit confusing, but I'll do my best to explain my problem here. I have 2 pandas dataframes, a and b:
>> print a
id | value
1 | 250
2 | 150
3 | 350
4 | 550
5 | 450
>> print b
low | high | class
100 | 200 | 'A'
200 | 300 | 'B'
300 | 500 | 'A'
500 | 600 | 'C'
I want to create a new column called class in table a that contains the class of the value in accordance with table b. Here's the result I want:
>> print a
id | value | class
1 | 250 | 'B'
2 | 150 | 'A'
3 | 350 | 'A'
4 | 550 | 'C'
5 | 450 | 'A'
I have written the following code, which sort of does what I want:
a['class'] = pd.Series(dtype=object)
for i in range(len(a)):
    val = a['value'][i]
    cl = (b['class'][ (b['low'] <= val) & \
                      (b['high'] >= val) ].iat[0])
    # Series.set_value was removed in pandas 1.0; .at is its replacement
    a.at[i, 'class'] = cl
The problem is that this is quick for tables of length 10 or so, but I am trying to do this with 100,000+ rows in both a and b. Is there a quicker way to do this, using some function or attribute in pandas?

Here is a way to do a range join inspired by @piRSquared's solution:
A = a['value'].values
bh = b.high.values
bl = b.low.values
# Compare every value in a against every [low, high] interval in b
i, j = np.where((A[:, None] >= bl) & (A[:, None] <= bh))
pd.DataFrame(
    np.column_stack([a.values[i], b.values[j]]),
    columns=a.columns.append(b.columns)
)
Output:
id value low high class
0 1 250 200 300 'B'
1 2 150 100 200 'A'
2 3 350 300 500 'A'
3 4 550 500 600 'C'
4 5 450 300 500 'A'

Here's a solution that is admittedly less elegant than using Series.searchsorted, but it runs super fast!
I pull the data out of the pandas DataFrames, convert it to lists, and then use np.where to populate a variable called "aclass" wherever the conditions are satisfied (in brute-force for loops). Then I write "aclass" to the original data frame a.
The evaluation time was 0.07489705 s, so it's pretty fast, even with 200,000 data points!
import time
import numpy as np

# create 200,000 fake a data points
avalue = 100+600*np.random.random(200000) # assuming you extracted this from a with avalue = np.array(a['value'])
blow = [100,200,300,500] # assuming you extracted this from b with list(b['low'])
bhigh = [200,300,500,600] # assuming you extracted this from b with list(b['high'])
bclass = ['A','B','A','C'] # assuming you extracted this from b with list(b['class'])
aclass = [[]]*len(avalue) # initialize aclass
start_time = time.time() # this is just for timing the execution
for i in range(len(blow)):
    # indices of all values that fall inside the i-th [low, high] interval
    for j in np.where((avalue>=blow[i]) & (avalue<=bhigh[i]))[0]:
        aclass[j]=bclass[i]
# add the class column to the original a DataFrame (a must have as many rows as avalue)
a['class'] = aclass
print("--- %s seconds ---" % np.round(time.time() - start_time,decimals = 8))
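For comparison, the searchsorted idea mentioned above might look like the following. It is a sketch that assumes the intervals in b do not overlap except at shared endpoints; under that assumption, sorting b by low and doing one binary search per value is enough. A value sitting exactly on a shared boundary (for example 200) is assigned to the interval that starts there, which differs from taking the first match.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                  "value": [250, 150, 350, 550, 450]})
b = pd.DataFrame({"low": [100, 200, 300, 500],
                  "high": [200, 300, 500, 600],
                  "class": ["A", "B", "A", "C"]})

b_sorted = b.sort_values("low").reset_index(drop=True)
values = a["value"].to_numpy()

# Position of the last interval whose lower bound is <= value (-1 if none)
idx = np.searchsorted(b_sorted["low"].to_numpy(), values, side="right") - 1
safe_idx = idx.clip(0)

# A value only belongs to that interval if it does not exceed its upper bound
in_range = (idx >= 0) & (values <= b_sorted["high"].to_numpy()[safe_idx])
a["class"] = np.where(in_range, b_sorted["class"].to_numpy()[safe_idx], None)

print(a)
```

Unlike the pairwise comparison of every value with every interval, this never builds a len(a) by len(b) array, which matters once both tables have 100,000+ rows.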

Passing value from previous result python

I want to evaluate how the gap of a variable changes from one time interval to the next.
Here is an example of the calculation:
Count | Gap | Gap Result | Evaluate
----------------------------------------
19 | 15-5 | 10 | 10
18 | 15-3 | 12 | 10-12 = -2
17 | 15-4 | 11 | 12-11 = 1
I have no idea how to express it. Please advise.
number = [1,2,3,4,5,6,7]
goal = 15
count = 20
def step(self):
    while count > 0:
        count -= 1
        gap = [goal - (random.choice(number))]
        previous_gap = gap from (count - 1) # I don't know how to express this
        evaluate = previous_gap - gap

You'll need to store the previous gap too; set it to 0 to start with. You don't want a list; you are dealing with individual numbers here:
import random

number = [1,2,3,4,5,6,7]
goal = 15
count = 20
previous_gap = evaluate = 0
while count > 0:
    count -= 1
    gap = goal - random.choice(number)
    if previous_gap:
        evaluate = previous_gap - gap
    # remember the gap for the next step
    previous_gap = gap
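To see the bookkeeping on the numbers from the question's table, the random draw can be replaced by a fixed list of gaps. The function name `evaluate_gaps` is hypothetical, and the sketch follows the table rather than the loop above in one detail: the first Evaluate entry is the first gap itself, as in the row with Count 19.

```python
def evaluate_gaps(gaps):
    """Hypothetical helper: return the Evaluate column for a sequence of gaps."""
    results = []
    previous_gap = None
    for gap in gaps:
        if previous_gap is None:
            # First step: nothing to compare with yet
            results.append(gap)
        else:
            results.append(previous_gap - gap)
        # Remember the gap for the next step
        previous_gap = gap
    return results


goal = 15
picked = [5, 3, 4]                   # the values subtracted in the question's table
gaps = [goal - n for n in picked]    # 10, 12, 11
print(evaluate_gaps(gaps))
```

For the three rows in the table this yields 10, -2 and 1. Using None instead of 0 as the "no previous value" marker also keeps the logic correct if a gap of 0 were ever possible.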

Gurobi: How can I sum just a part of a variable?

I have the following model:
from gurobipy import *
n_units = 1
n_periods = 3
n_ageclasses = 4
units = range(1,n_units+1)
periods = range(1,n_periods+1)
periods_plus1 = list(periods)  # copy as a list so that append also works with a Python 3 range
periods_plus1.append(max(periods_plus1)+1)
ageclasses = range(1,n_ageclasses+1)
nothickets = ageclasses[1:]
model = Model('MPPM')
HARVEST = model.addVars(units, periods, nothickets, vtype=GRB.INTEGER, name="HARVEST")
FOREST = model.addVars(units, periods_plus1, ageclasses, vtype=GRB.INTEGER, name="FOREST")
model.addConstrs((quicksum(HARVEST[(k+1), (t+1), nothicket] for k in range(n_units) for t in range(n_periods) for nothicket in nothickets) == FOREST[unit, period+1, 1] for unit in units for period in periods if period < max(periods_plus1)), name="A_Thicket")
I have a problem with formulating the constraint. For every unit and every period, I want to sum only the nothickets part of the variable HARVEST. Concretely, I want x[k=1,t=1,2] + x[k=1,t=1,3] + x[k=1,t=1,4], and so on. This should result in only three ones per row of the constraint matrix, but with the formulation above I get nine.
I tried to use a for loop outside of the sum, but this results in another problem:
for k in range(n_units):
    for t in range(n_periods):
        model.addConstrs((quicksum(HARVEST[(k+1), (t+1), nothicket] for nothicket in nothickets) == FOREST[unit,period+1, 1] for unit in units for period in periods if period < max(periods_plus1)), name="A_Thicket")
With this formulation I get this matrix:
[image omitted: constraint matrix]
But what I want is:
row_idx | col_idx | coeff
0 | 0 | 1
0 | 1 | 1
0 | 2 | 1
0 | 13 | -1
1 | 3 | 1
1 | 4 | 1
1 | 5 | 1
1 | 17 | -1
2 | 6 | 1
2 | 7 | 1
2 | 8 | 1
2 | 21 | -1
Can anybody please help me to reformulate this constraint?

This worked for me:
model.addConstrs((HARVEST.sum(unit, period, '*') == ...
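The index pattern behind the desired matrix can be checked without a solver. The sketch below uses plain Python only and makes one assumption that is not stated in the question: columns are numbered in the order the variables are created, all HARVEST variables first and then all FOREST variables. It builds one row per (unit, period) pair and lets only the age class vary inside the sum, which is the same grouping that the `'*'` wildcard in `HARVEST.sum(unit, period, '*')` expresses.

```python
from itertools import product

units = [1]
periods = [1, 2, 3]
periods_plus1 = periods + [max(periods) + 1]
ageclasses = [1, 2, 3, 4]
nothickets = ageclasses[1:]

# Assumed column numbering: HARVEST variables first, then FOREST variables
harvest_keys = list(product(units, periods, nothickets))
forest_keys = list(product(units, periods_plus1, ageclasses))
col = {("HARVEST",) + key: i for i, key in enumerate(harvest_keys)}
col.update({("FOREST",) + key: len(harvest_keys) + i
            for i, key in enumerate(forest_keys)})

# One constraint per (unit, period); only the age class varies inside the sum
rows = []
for row_idx, (unit, period) in enumerate(product(units, periods)):
    for age in nothickets:
        rows.append((row_idx, col[("HARVEST", unit, period, age)], 1))
    rows.append((row_idx, col[("FOREST", unit, period + 1, 1)], -1))

for entry in rows:
    print(entry)
```

Under that numbering assumption the printed (row_idx, col_idx, coeff) triples match the table in the question: three ones per row plus a single -1 in columns 13, 17 and 21.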
