Why does OpenCV matchShapes() match "X" with "N"?

I am comparing the contours of letters and have several cases of unexpected results. The most confusing to me is how X and N are being identified as best matches.
In the images below, yellow represents the unknown shape and blue represents candidate shapes. The white numbers are the result returned by cv.matchShapes using CONTOURS_MATCH_I3. (I've tried the other matching methods and just get similar odd results but with a different set of letters.)
Below shows X matching N better than X
Below shows N matching X better than N
The raw data are at the end of the post, and below is a chart of the data.
I can't come up with a rotation, scale, or skew to show that this is an optical illusion. I'm not suggesting there is an issue in matchShapes but rather an issue in my understanding of Hu moments.
I'd appreciate if someone would take a moment (pun intended) and explain how cv.matchShapes is producing these results.
--- edited ----
The images below are the result of using poly-filled shapes. I am still baffled how these letters match better than the correct ones.
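For reference, the CONTOURS_MATCH_I3 score can be sketched in plain Python from the formula in the OpenCV documentation: each Hu moment h is mapped to m = sign(h) * log10(|h|), and the score is the largest relative difference |mA - mB| / |mA| over the moments. The function name match_i3_sketch, the eps cutoff, and the sample values below are assumptions made for illustration; this is not OpenCV's own code.

```python
import math

def match_i3_sketch(hu_a, hu_b, eps=1e-5):
    """Rough re-implementation of the I3 formula on two lists of Hu moments.

    eps is an assumed cutoff: pairs where either moment is tiny are skipped.
    """
    score = 0.0
    for a, b in zip(hu_a, hu_b):
        if abs(a) <= eps or abs(b) <= eps:
            continue  # moment too small to take a meaningful logarithm
        ma = math.copysign(1.0, a) * math.log10(abs(a))
        mb = math.copysign(1.0, b) * math.log10(abs(b))
        if ma == 0.0:
            continue  # avoid dividing by zero when |a| == 1
        score = max(score, abs(ma - mb) / abs(ma))
    return score

# Made-up Hu moment values, only to exercise the function.
shape_a = [0.1, 0.001, 0.0]
shape_b = [0.01, 0.001, 0.0]
print(match_i3_sketch(shape_a, shape_a))
print(match_i3_sketch(shape_a, shape_b))
```

In this sketch the score depends only on the moments that survive the cutoff, so two shapes that look different can still score close together if those few values happen to be similar.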
target_letter
33 23
32 24
30 24
28 26
28 30
29 31
29 32
31 34
31 35
33 37
33 38
36 41
36 42
38 44
38 47
35 50
35 51
33 53
33 54
30 57
30 58
28 60
28 61
27 62
27 67
29 69
34 69
38 65
38 64
40 62
40 61
42 59
42 58
46 54
47 54
49 56
49 57
51 59
51 60
53 62
53 63
56 66
56 67
58 69
63 69
65 67
65 60
63 58
63 57
60 54
60 53
58 51
58 50
55 47
55 44
57 42
57 41
61 37
61 36
64 33
64 32
65 31
65 25
64 24
62 24
61 23
60 24
58 24
55 27
55 28
52 31
52 32
50 34
50 35
47 38
45 36
45 35
41 31
41 30
40 29
40 28
38 26
38 25
37 24
35 24
34 23
candidateLetter N
10 3
9 4
7 4
6 5
5 5
5 6
4 7
4 9
3 10
3 44
4 45
4 47
6 49
12 49
13 48
13 47
14 46
14 23
15 22
17 24
17 25
21 29
21 30
24 33
24 34
27 37
27 38
31 42
31 43
34 46
34 47
35 48
36 48
37 49
43 49
45 47
45 6
43 4
38 4
36 6
36 8
35 9
35 27
36 28
36 29
34 31
33 30
33 29
31 27
31 26
27 22
27 21
24 18
24 17
21 14
21 13
18 10
18 9
13 4
11 4
candidateLetter X
10 2
9 3
7 3
6 4
6 6
5 7
5 8
6 9
6 11
8 13
8 14
10 16
10 17
14 21
14 22
16 24
16 25
13 28
13 29
10 32
10 33
7 36
7 37
5 39
5 40
4 41
4 46
6 48
11 48
15 44
15 43
17 41
17 40
19 38
19 37
21 35
21 34
23 32
26 35
26 36
28 38
28 39
30 41
30 42
33 45
33 46
35 48
40 48
42 46
42 39
40 37
40 36
37 33
37 32
34 29
34 28
32 26
32 23
34 21
34 20
37 17
37 16
41 12
41 11
42 10
42 4
41 3
39 3
38 2
37 3
35 3
32 6
32 7
29 10
29 11
27 13
27 14
24 17
21 14
21 13
18 10
18 9
17 8
17 7
15 5
15 4
14 3
12 3
11 2

Related

Grid of integers

I need to make a grid with the numbers generated by the code, but I'm not understanding how to align them in columns.
Is there a parameter of print or something else that could help me out?
#main()
a=0
b=0
for i in range(1, 13):
    a=a+1
    print(" ")
    b=b+1
    for f in range(1,13):
        print(f*b, end=" ")
My output at the moment:

I would recommend using python's f-strings:
for i in range(1, 13):
    print(''.join(f"{i*j: 4}" for j in range(1,13)))
Here's the output:
   1   2   3   4   5   6   7   8   9  10  11  12
   2   4   6   8  10  12  14  16  18  20  22  24
   3   6   9  12  15  18  21  24  27  30  33  36
   4   8  12  16  20  24  28  32  36  40  44  48
   5  10  15  20  25  30  35  40  45  50  55  60
   6  12  18  24  30  36  42  48  54  60  66  72
   7  14  21  28  35  42  49  56  63  70  77  84
   8  16  24  32  40  48  56  64  72  80  88  96
   9  18  27  36  45  54  63  72  81  90  99 108
  10  20  30  40  50  60  70  80  90 100 110 120
  11  22  33  44  55  66  77  88  99 110 121 132
  12  24  36  48  60  72  84  96 108 120 132 144
f-strings accept almost any expression inside the curly braces, including dictionary lookups and function calls. Anything after the colon is a format specification. Here the space is the sign option, which reserves a leading space for non-negative numbers (negative numbers get a minus sign instead), and the 4 is the minimum field width, so each number is right-aligned in a 4-character column. For more info, check out the documentation.
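A small sketch of how the pieces of the format specification behave on their own may help. The dictionary keys are just labels chosen for this illustration.

```python
# In a format spec, a lone space is the sign option and the number is the
# minimum width; a fill character must be followed by an alignment such as '>'.
samples = {
    "sign_space": f"{5: }",
    "sign_space_width": f"{5: 4}",
    "negative": f"{-5: 4}",
    "width_only": f"{5:4}",
    "fill_star": f"{5:*>4}",
}
for name, text in samples.items():
    print(f"{name:>16} -> {text!r}")
```

For non-negative integers, `{x: 4}` and `{x:4}` produce the same text, which is why either form works for the grid.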

If the width of each grid cell is stored in w (4 is enough for the snippet above), a regularly spaced grid can be printed with:
w = 4
a, b = 0, 0
for i in range(1, 13):
    a, b = a+1, b+1
    for f in range(1, 13):
        print(('{:'+str(w)+'}').format(f*b), end='')
    print('')
Its output is
   1   2   3   4   5   6   7   8   9  10  11  12
   2   4   6   8  10  12  14  16  18  20  22  24
   3   6   9  12  15  18  21  24  27  30  33  36
   4   8  12  16  20  24  28  32  36  40  44  48
   5  10  15  20  25  30  35  40  45  50  55  60
   6  12  18  24  30  36  42  48  54  60  66  72
   7  14  21  28  35  42  49  56  63  70  77  84
   8  16  24  32  40  48  56  64  72  80  88  96
   9  18  27  36  45  54  63  72  81  90  99 108
  10  20  30  40  50  60  70  80  90 100 110 120
  11  22  33  44  55  66  77  88  99 110 121 132
  12  24  36  48  60  72  84  96 108 120 132 144

You can reference keyword argument values passed to the str.format() method in the format string by name via {name}. Here's an example of doing that where the value referenced is computed (as opposed to being a constant):
mx = 12
w = len(str(mx*mx)) + 1
for b in range(1, mx+1):
    for f in range(1, mx+1):
        print(('{:{w}}').format(f*b, w=w), end='')
    print('')
Output:
   1   2   3   4   5   6   7   8   9  10  11  12
   2   4   6   8  10  12  14  16  18  20  22  24
   3   6   9  12  15  18  21  24  27  30  33  36
   4   8  12  16  20  24  28  32  36  40  44  48
   5  10  15  20  25  30  35  40  45  50  55  60
   6  12  18  24  30  36  42  48  54  60  66  72
   7  14  21  28  35  42  49  56  63  70  77  84
   8  16  24  32  40  48  56  64  72  80  88  96
   9  18  27  36  45  54  63  72  81  90  99 108
  10  20  30  40  50  60  70  80  90 100 110 120
  11  22  33  44  55  66  77  88  99 110 121 132
  12  24  36  48  60  72  84  96 108 120 132 144
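The same computed-width idea also works with an f-string, since a nested `{w}` is allowed inside the format specification. The helper name format_grid is hypothetical; it returns the rows as strings instead of printing them, which makes the result easy to check.

```python
def format_grid(mx):
    """Return the multiplication grid for 1..mx as a list of text rows."""
    w = len(str(mx * mx)) + 1  # widest product plus one space of padding
    return ["".join(f"{a * b:{w}}" for b in range(1, mx + 1))
            for a in range(1, mx + 1)]

for row in format_grid(3):
    print(row)
```

Calling `print("\n".join(format_grid(12)))` should give the same 12 by 12 grid as the loops above.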

Reading csv file with delimiter | using pandas

import pandas as pd
def main():
    l=[]
    for i in range(1981,2018):
        df = pd.read_csv("ftp://ftp.cpc.ncep.noaa.gov/htdocs/degree_days/weighted/daily_data/"+ str(i)+"/Population.Heating.txt")
        print(df[12:])
I am trying to download and read the "CONUS" row in Population.Heating.txt for each year from 1981 to 2017.
My code seems to get the CONUS parts, but how can I actually parse the file as a |-delimited CSV?
Thank you!

Try this:
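import pandas as pd
def main():
    url = "ftp://ftp.cpc.ncep.noaa.gov/htdocs/degree_days/weighted/daily_data/{}/Population.Heating.txt"
    for i in range(1981,2018):
        # '|' is a regex metacharacter, so it is escaped in a raw string for the python engine
        df = pd.read_csv(url.format(i), sep=r'\|', skiprows=3, engine='python')
        # with the 3 header lines skipped, CONUS is at index 9 (see the demo output)
        print(df[9:])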
Demo:
In [14]: url = "ftp://ftp.cpc.ncep.noaa.gov/htdocs/degree_days/weighted/daily_data/{}/Population.Heating.txt"
In [15]: i = 2017
In [16]: df = pd.read_csv(url.format(i), sep='\|', skiprows=3, engine='python')
In [17]: df
Out[17]:
Region 20170101 20170102 20170103 20170104 20170105 20170106 20170107 20170108 20170109 ... 20171222 20171223 \
0 1 30 36 31 25 37 39 47 51 55 ... 40 32
1 2 28 32 28 23 39 41 46 49 51 ... 31 25
2 3 34 30 26 43 52 58 57 54 44 ... 29 32
3 4 37 34 37 57 60 62 59 54 43 ... 39 45
4 5 15 11 9 10 20 21 27 36 33 ... 12 7
5 6 16 9 7 22 31 38 45 44 35 ... 9 9
6 7 8 5 9 23 23 34 37 32 17 ... 9 19
7 8 30 32 34 33 36 42 42 31 23 ... 36 33
8 9 25 25 24 23 22 25 23 15 17 ... 23 20
9 CONUS 24 23 21 26 33 38 40 39 34 ... 23 22
20171224 20171225 20171226 20171227 20171228 20171229 20171230 20171231
0 32 34 43 53 59 59 57 59
1 30 33 43 49 54 53 50 55
2 41 47 58 62 60 54 54 60
3 47 55 61 64 57 54 62 68
4 12 20 21 22 27 26 24 29
5 22 33 31 35 37 33 32 39
6 19 24 23 28 28 23 19 27
7 34 30 32 29 26 24 27 30
8 18 17 17 15 13 11 12 15
9 26 30 34 37 38 35 34 40
[10 rows x 366 columns]

def main():
    l=[]
    for i in range(1981,2018):
        l.append(pd.read_csv("ftp://ftp.cpc.ncep.noaa.gov/htdocs/degree_days/weighted/daily_data/"+ str(i)+"/Population.Heating.txt",
                             sep='|', skiprows=3))
    return l
Files look like:
Product: Daily Heating Degree Days
Regions: Regions::CensusDivisions
Weights: Population
[... data ...]
so you need to skip 3 rows. Afterwards you have several 'df' in your list 'l' for further processing.
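To try the parsing without the FTP download, here is a minimal sketch on an in-memory string. The sample text is made up to mimic the layout described above (three header lines, then a |-delimited table), and it assumes the column header is exactly "Region" with no surrounding whitespace.

```python
import io
import pandas as pd

# Made-up sample that mimics the file layout: three header lines,
# then a '|'-delimited table whose last row is CONUS.
sample = """Product: Daily Heating Degree Days
Regions: Regions::CensusDivisions
Weights: Population
Region|20170101|20170102
1|30|36
2|28|32
CONUS|24|23
"""

df = pd.read_csv(io.StringIO(sample), sep="|", skiprows=3)

# Select the CONUS row by label rather than by position.
conus = df[df["Region"] == "CONUS"]
print(conus)
```

Selecting by the Region value avoids relying on the row position, which would change if a file had a different number of regions.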

How to calculate cross-validation score?

I am quite new to the scikit-learn module and therefore reading the tutorials carefully here: http://scikit-learn.org/stable/modules/cross_validation.html. But, I got stuck while playing with my data. I want to try the cross validation score (CVS) scheme with my data. Can someone please help me out?
I have a data file here: https://www.dropbox.com/s/e8xq7qm5gy7lnjw/data.dat?dl=0
The 'x' and 'y' columns represent the actual and model values, respectively. I just want to know how good my model values are, so I want to calculate the CVS. Can someone show me code for doing that? I am just starting out, so if I have left out any information, please let me know.
I started like this:
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn import datasets
from sklearn import svm
data = np.loadtxt("deviation.dat")
k_fold = KFold(n_splits=3)
for train, test in k_fold.split(data):
    print('Train: %s | test: %s' % (train, test))
This prints:
Train: [25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74] | test: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
Train: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74] | test: [25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]
Train: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49] | test: [50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74]
But now I want to know what is the score?
N.B.: 'deviation.dat' is just the (abs(column x - column y )) of the data file.

python index error pop up when I have a longer input

I am working on a sorting problem and have been stuck on one issue for 2 days.
With a shorter input, the code works just as I expect.
However, with a much longer input, the program fails with a runtime error.
Here is the code:
row_number, row_length = input().split()
row_number, row_length = int(row_number), int(row_length)
def row_input():
    data_input = []
    for i in range(0,row_number):
        row = list(map(int,input().split()))
        data_input.append(row)
    return data_input
def sort_data(data):
    k = int(input())
    sorted_data = []
    for row in data:
        sorted_data.append(row[k])
    sorted_data.sort()
    n = 0
    while n < row_number:
        for m in data:
            if sorted_data[n] == m[k]:
                print_data(m)
                n = n + 1
def print_data(data):
    b=''
    for n in data:
        b= b + str(n).ljust(len(str(n))+1)
    print(b)
data = row_input()
sort_data(data)
Here is the short input:
10 3
1 1 1
1 1 2
1 1 3
1 1 4
2 2 5
2 3 6
2 3 7
2 3 8
2 3 9
2 4 0
1
Here is the longer input:
100 10
64 79 18 94 46 81 74 97 71 92
46 24 23 20 68 15 53 93 24 91
17 66 34 64 28 5 55 25 44 96
16 71 80 84 5 79 63 77 69 77
33 77 24 13 58 81 41 36 73 62
93 26 16 55 61 51 39 69 29 45
44 85 1 48 23 59 52 82 50 37
77 74 9 21 35 54 81 57 32 76
82 21 72 49 98 21 77 64 6 63
68 17 93 83 12 43 84 28 96 86
9 16 3 89 38 11 70 25 41 38
49 99 31 19 85 97 80 63 16 69
50 85 80 75 36 48 56 69 63 94
78 80 83 86 92 60 56 90 22 73
69 81 45 9 67 25 82 46 68 82
98 38 23 31 38 83 37 76 69 82
95 48 21 64 25 6 38 96 69 23
44 97 46 54 21 56 65 51 66 34
87 22 27 24 55 48 90 10 8 51
21 6 74 78 8 88 26 63 72 43
64 4 42 20 54 91 2 51 79 40
93 76 52 58 40 78 98 27 53 48
85 23 86 30 91 49 81 4 59 9
88 96 77 95 36 71 7 52 14 20
69 98 21 94 14 35 28 97 3 9
60 47 56 34 35 61 9 44 80 92
4 76 57 28 60 3 46 4 6 17
59 44 88 7 71 60 84 12 91 38
76 57 5 2 25 12 46 62 32 68
14 15 11 1 34 20 54 58 45 38
89 49 16 43 74 51 80 22 88 31
8 98 51 73 32 13 59 12 56 92
36 82 9 63 77 79 77 25 52 91
63 82 58 75 13 20 79 89 55 89
58 37 93 1 29 72 78 95 47 35
90 82 58 60 55 86 82 22 44 94
55 17 51 99 29 92 1 79 96 34
32 78 41 1 24 52 11 80 3 25
30 32 32 71 85 80 63 23 80 97
35 22 11 71 10 48 43 58 31 33
30 98 60 58 28 71 95 28 21 29
74 4 13 99 90 64 28 27 73 4
52 21 52 31 35 82 35 64 21 71
92 85 13 48 5 32 92 70 15 85
47 55 25 80 24 22 19 78 17 43
3 91 71 53 49 39 96 88 59 61
79 26 98 2 95 95 70 38 82 85
69 67 41 11 95 39 20 19 96 36
11 74 48 23 84 49 47 43 27 90
4 28 35 14 70 62 52 94 46 91
72 11 14 82 59 51 93 98 55 79
90 84 84 24 21 81 11 57 27 78
98 97 59 51 89 40 96 35 25 59
73 85 64 17 46 9 79 54 27 15
48 91 7 56 41 6 4 26 96 39
43 22 34 89 52 59 55 52 38 42
10 31 9 8 21 46 29 4 97 4
44 49 78 31 53 29 11 35 46 14
44 39 57 35 9 63 85 5 97 24
9 72 49 50 41 47 23 71 15 45
51 6 98 64 75 35 39 48 2 50
92 22 72 60 96 15 17 4 79 27
90 30 98 28 92 8 83 71 24 62
5 54 86 14 71 96 87 2 58 78
37 61 60 30 46 96 49 58 27 48
14 59 22 35 75 60 55 28 91 85
21 1 85 85 78 67 24 69 22 17
76 61 84 64 33 76 61 10 33 95
71 9 1 32 31 80 69 7 25 59
69 64 78 85 21 88 56 70 92 74
79 12 8 9 54 56 37 44 1 84
6 66 54 5 82 17 41 25 3 71
8 44 63 17 75 43 87 15 85 3
15 42 15 59 38 22 46 27 19 13
54 71 76 93 67 39 46 12 78 46
23 82 71 34 31 61 94 58 10 62
30 8 43 38 7 23 77 38 93 32
32 72 46 59 64 45 14 73 62 72
76 26 47 89 25 73 79 28 60 48
41 58 85 55 29 64 39 84 20 87
24 8 70 16 69 32 17 26 58 16
40 53 40 63 22 37 11 74 7 8
23 4 56 39 27 94 91 72 14 61
41 86 3 29 41 15 99 50 82 84
33 5 22 93 73 86 99 87 26 66
73 25 55 46 69 38 99 14 43 55
43 21 82 30 90 66 6 67 49 25
81 38 65 40 80 7 90 82 33 13
18 45 1 90 53 51 51 96 32 90
32 69 51 22 71 85 80 61 99 23
88 8 41 92 4 25 64 89 30 75
93 85 99 87 67 3 54 16 98 57
33 54 31 83 64 93 3 24 65 81
74 19 15 66 17 14 34 50 57 16
10 30 20 97 32 85 83 89 68 18
46 82 9 14 54 50 55 28 26 96
29 96 3 33 12 52 11 26 19 22
50 81 95 59 76 53 10 9 72 87
25 85 54 43 53 13 52 70 38 76
20 14 30 80 23 43 27 67 42 11
5
Here is the error while running the longer input:
Traceback (most recent call last):
File "solution.py", line 30, in <module>
sort_data(data)
File "solution.py", line 19, in sort_data
if sorted_data[n] == m[k]:
IndexError: list index out of range

The problem is in your sorting logic: when several rows share the same value in column k, n is incremented more than once within a single pass of the for loop, so sorted_data[n] can run past the end of the list before the while condition is checked again.
The right solution is simpler than you think:
def sort_data(data):
    k = int(input())
    output = sorted(data, key=lambda row: row[k])
    for r in output:
        print_data(r)
UPDATE: The smallest dataset on which your algorithm fails is:
2 1
2
1
0
A small modification to your function stops it from over-indexing. The key is to store sorted_data[n] in a variable before the inner loop, so the loop never indexes sorted_data again after n has been incremented.
def sort_data(data):
    k = int(input())
    sorted_data = []
    for row in data:
        sorted_data.append(row[k])
    sorted_data.sort()
    n = 0
    while n < row_number:
        key = sorted_data[n]
        for m in data:
            if key == m[k]:
                print_data(m)
                n = n + 1
UPDATE:
The key parameter of sorted is a function that returns the value to sort by for each element. In your case it selects the kth column, which is what you want to sort by.
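As a small illustration of the key parameter on made-up rows: operator.itemgetter(k) is a standard-library alternative to the lambda, and because sorted is stable, rows with equal keys keep their original relative order.

```python
from operator import itemgetter

# Made-up rows; column 0 contains duplicate values on purpose.
rows = [[2, 9], [1, 7], [2, 3], [1, 8]]
k = 0

by_lambda = sorted(rows, key=lambda row: row[k])
by_getter = sorted(rows, key=itemgetter(k))  # equivalent to the lambda

print(by_lambda)
print(by_lambda == by_getter)
```

sorted returns a new list, so the original rows list is left untouched.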

Gradient four figures scale

I have a question using matplotlib and imshow. I want to plot in the same figure four "matrices", using imshow, and I need the gradient to be between [0, 1]. I also need to normalize the data with the following formula:
data_norm = data * 2/400
So far I have this:
from matplotlib import pyplot
import numpy as np
zvals = np.loadtxt("sharedGradient.txt")
img = pyplot.imshow(zvals,interpolation='nearest')
pyplot.colorbar(img)
pyplot.show()
The data is in .txt files, but this is a sample of data:
61 62 63 64 65 66 67 6 5 83 82 81 28 29 30 33 34 35 36 37
60 13 12 11 10 9 8 7 4 3 2 7 27 76 31 32 69 42 41 38
59 14 15 16 17 18 69 12 11 10 1 0 26 75 74 73 70 43 40 39
58 57 56 41 40 19 70 71 72 73 4 3 25 79 133 72 71 44 61 62
160 161 55 42 39 20 21 107 114 0 1 2 24 51 52 47 46 45 60 108
62 61 54 43 38 37 22 35 38 37 36 35 23 50 49 48 57 58 59 0
63 64 53 44 25 24 23 34 31 32 33 34 22 51 56 55 56 108 107 1
203 65 52 45 26 31 24 33 30 33 34 20 21 52 53 54 55 109 106 2
202 66 51 46 27 30 25 28 29 17 18 19 38 37 36 35 111 110 105 3
156 199 50 47 28 29 26 27 28 16 30 54 50 51 52 34 112 103 104 4
121 120 49 48 28 29 46 45 27 15 39 55 49 54 53 33 113 102 6 5
114 113 112 109 27 30 31 12 13 14 40 41 46 55 31 32 120 101 7 8
3 4 5 6 15 0 10 11 25 35 40 42 45 48 30 29 28 100 99 9
2 1 0 3 2 1 2 77 32 33 34 45 46 57 67 68 27 26 25 10
9 6 5 0 1 7 80 81 31 30 35 44 60 58 59 69 70 23 24 11
10 2 3 4 5 6 79 82 83 29 36 43 42 41 60 65 66 22 21 12
11 1 11 10 21 20 23 67 66 28 37 38 39 40 61 64 67 92 20 13
12 0 14 15 20 70 7 6 26 27 80 77 76 73 62 63 68 91 19 14
13 15 51 18 19 71 8 5 4 3 2 82 83 84 71 70 69 90 18 15
14 14 13 12 11 10 9 128 129 0 1 146 147 85 86 87 88 89 17 16
My issue is that I can't get the gradient to be between [0, 1] and I can't put different plots in the same figure. Hope somebody can help.

Dividing the data by 200 (the same as data * 2/400) scales the sample values into roughly the range 0 to 1. Note that imshow maps the colormap to the data's own minimum and maximum unless you pass vmin and vmax, so pass vmin=0, vmax=1 to pin the gradient to [0, 1].
To put the imshow graphs in the same figure, add subplots: plt.subplot(number of rows, number of columns, graph number)
import matplotlib.pyplot as plt
import numpy as np
zvals = np.loadtxt("sharedGradient.txt")
zvals = zvals/200
plt.subplot(2,2,1)
img = plt.imshow(zvals,interpolation='nearest',vmin=0,vmax=1)
plt.colorbar(img)
plt.subplot(2,2,2)
img = plt.imshow(zvals,vmin=0,vmax=1)
plt.colorbar(img)
plt.subplot(2,2,3)
img = plt.imshow(zvals,vmin=0,vmax=1)
plt.colorbar(img)
plt.subplot(2,2,4)
img = plt.imshow(zvals,vmin=0,vmax=1)
plt.colorbar(img)
plt.show()
If you're also trying to make the axis range from 0 to 1 then use the extent=(0,1,0,1) inside imshow()
