I need help to understand how to downsample a matrix from an image to a matrix of 10x10.
Libraries:
import sys
import cv2
import numpy
from PIL import Image
from numpy import array
I used PIL to get the image and used numpy to get the matrix array
im_1 = Image.open(r"C:\Users\DELL\Downloads\FaceDataset\s1\1.pgm")
ar = array(im_1)
numpy.set_printoptions(threshold=sys.maxsize)
print("Orignal Image Matrix")
print(ar)
output:
[[ 48 49 45 47 49 57 39 42 53 49 53 60 76 91 99 95 80 75
66 54 47 49 50 43 46 53 61 70 84 105 133 130 110 94 81 107
95 80 57 55 66 86 80 74 65 71 62 84 52 74 71 67 64 88
68 71 75 66 57 61 62 52 47 50 58 60 64 66 57 46 54 66
80 80 68 71 87 64 77 66 83 77 58 46 41 43 56 55 51 56
56 54]
this is not the complete output. how can i downsample this into a 10x10 matrix?
note: i used opencv to use the "pyrDown"but it didn't give the output i need
I solved it by image slicing method in 2d array
dsAr = ar[::12, ::10]
print(dsAr)
note: slicing notation is => start:stop:step
Related
Give this salt in Python
salt = b"0000000000000000004d6ec16dafe9d8370958664c1dc422f452892264c59526"
What's the equivalent in Nodejs?
I have this
const salt = Buffer.from("0000000000000000004d6ec16dafe9d8370958664c1dc422f452892264c59526", "hex");
But upon conversion to string, they don't match.
This isn't a duplicate, but it should explain, why your code doesn't work: What is the difference between a string and a byte string? A byte string in Python isn't a hex encoding of a string.
The equivalent Node.js code is:
const salt = Buffer.from("0000000000000000004d6ec16dafe9d8370958664c1dc422f452892264c59526");
without 'hex'. The default encoding is 'utf8'.
You can see it with:
salt = b"0000000000000000004d6ec16dafe9d8370958664c1dc422f452892264c59526"
print(' '.join(map(str, salt)))
# output:
# 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 52 100 54 101 99 49 54 100 97 102 101 57 100 56 51 55 48 57 53 56 54 54 52 99 49 100 99 52 50 50 102 52 53 50 56 57 50 50 54 52 99 53 57 53 50 54
and
const salt = Buffer.from("0000000000000000004d6ec16dafe9d8370958664c1dc422f452892264c59526");
console.log(salt.join(' '));
// output:
// 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 52 100 54 101 99 49 54 100 97 102 101 57 100 56 51 55 48 57 53 56 54 54 52 99 49 100 99 52 50 50 102 52 53 50 56 57 50 50 54 52 99 53 57 53 50 54
I have a 3D array and I want to rearrange them so each position in each dimension still corresponds to the next dimension, but the order of all 3 are changed.
I have a 3D array:
array = [32,30,96]
The third dimension with 96 values I already ranked ordered to find that :
ranks = [57 23 68 58 25 91 70 83 77 75 89 34 49 79 66 54 67 44 63 52 46 20 64 10 80 33 30 29 28 26 17 27 50 51 92 86 69 47 0 7 3 85 18 11 13 53 8 78 82 81 14 74 59 32 42 39 1 31 36 19 24 5 38 9 73 71 76 87 41 55 94 93 84 16 90 62 48 43 72 95 65 45 61 22 21 15 37 88 2 40 56 6 12 60 4 35]
I want to rearrange the whole 3D array to reflect the rank order. SO I still want the same association across the array, but in a different order.
So i want a new array that has the same dimensions, newarray=[32,30,96] but where its order based off the last dimension so [32,30,ranks] i don't care about the specific value, position 0 doesn't have to become 57, I want all the data corresponding to position 57 to now be position 0
I am fairly new to jython and I am inquiring about creating a function that is dealing with a list. Basically what I am trying to do with the below function is create a function that will loop through the entirety of my list, then find the lowest value within that list and return the variable with the lowest number. Though, I keep getting a return of function min at 0x26 everytime I execute the main() I receive the same message but it seems as if the function min at 0x26 will count up ex: 0x27, 0x28... Not sure why this is. As my list only contains integers of minimum 0 to max 99.
Here is the sourcecode:
def min(dataset): #defining a function minimum, with input dataset(the list we are using)..
min = dataset[0]
for num in range(0, len(dataset)):
if dataset[num] < min:
min = dataset(num)
return min
minimum = min(dataset)
print(str(minimum))
Here is the code in its entirety. Though, I currently have a way to find the min/max values in the list. I am looking to move towards a function, as I want to know how to correctly use a function.
def main( ):
dataset = [0]
file = open("D:\numbs.dat", "r")
for line in file: #loop for writing over every line to a storage loc.
num = int(float(line)) #converting int to string
dataset.append(num) #appending the data to 'dataset' list
max = dataset[0] #setting an imaginary max initially
low = dataset[0] #setting an imaginary low initially
for i in range(0, len(dataset)): #for loop to scan thru entire list
if dataset[i] > max: #find highest through scan and replacing each max in var max
max = dataset[i]
if dataset[i] < low: #find lowest through scan and replacing each max in var max
low = dataset[i]
#printNow(dataset) #printing the list in its entirety
#printNow("The maximum is " +str(max)) #print each values of lowest and highest
#printNow("The lowest is " +str(low))
def min(dataset): #defining a function minimum..
min = dataset[0]
for num in range(0, len(dataset)):
if dataset[num] < min:
min = dataset(num)
return min
minimum = min(dataset)
print(str(minimum)) #test to see what output is.
As mentioned above, there is the for loop for finding max/min values. Though I tried doing the same exact thing for the function I am trying to create...
the contents of the numbs.dat can be found here (1001 entries):
70
75
76
49
73
76
52
63
11
25
19
89
17
48
5
48
29
41
23
84
28
39
67
48
97
34
0
24
47
98
0
64
24
51
45
11
37
77
5
54
53
33
91
0
27
0
80
5
11
66
45
57
48
25
72
8
38
29
93
29
58
5
72
36
94
18
92
17
43
82
44
93
10
38
31
52
44
10
50
22
39
71
46
40
33
51
51
57
27
24
40
61
88
87
40
85
91
99
6
3
56
10
85
38
61
91
31
69
39
74
9
17
80
96
49
0
47
68
12
5
6
60
81
51
62
87
70
66
50
30
30
22
45
35
2
39
23
63
35
69
83
84
69
6
54
74
3
29
31
54
45
79
21
74
30
77
77
80
26
63
84
21
58
54
69
2
50
79
90
26
45
29
97
28
57
22
59
2
72
1
92
35
38
2
47
23
52
77
87
34
84
15
84
13
23
93
19
50
99
74
59
4
73
93
29
61
8
45
10
20
15
95
58
43
75
19
61
39
68
47
69
58
88
82
33
30
72
21
74
12
18
0
52
50
62
21
66
26
56
84
16
12
7
45
58
22
26
95
82
6
74
12
16
2
61
58
22
39
0
53
88
79
71
13
54
25
31
93
48
91
90
45
23
54
42
39
78
25
95
58
2
41
61
72
98
91
48
97
93
11
12
1
35
80
81
86
38
70
67
55
55
87
73
79
31
43
97
79
3
51
17
58
70
34
59
61
28
46
13
42
18
0
18
75
75
62
50
62
85
49
83
71
63
32
27
59
42
46
8
13
39
25
13
94
17
48
73
40
31
31
86
23
81
40
92
24
94
67
30
18
74
78
62
89
1
27
95
99
33
53
74
5
84
88
8
52
0
24
21
99
1
74
84
94
29
25
83
93
98
40
21
66
93
28
72
63
77
9
71
18
87
50
77
48
68
88
22
33
16
79
68
69
94
64
5
28
33
22
21
74
44
62
68
47
93
69
9
42
44
87
64
97
42
34
90
70
91
12
18
84
65
23
99
1
55
6
1
23
92
50
96
96
68
27
17
98
42
10
27
26
20
13
94
73
75
12
12
25
33
1
33
67
61
0
98
71
35
75
68
56
45
11
1
69
57
9
15
96
69
2
0
65
44
86
78
97
17
4
81
23
4
43
24
72
70
57
21
91
84
94
40
96
40
78
46
67
6
7
16
49
24
14
12
82
73
60
42
76
62
10
84
49
75
89
43
47
31
68
15
11
32
37
98
72
40
25
69
30
64
60
48
21
11
74
54
24
60
10
96
29
39
53
48
24
68
4
52
12
6
91
15
86
77
65
68
22
91
36
72
82
81
9
77
0
5
83
27
88
17
35
66
76
78
81
19
51
87
66
26
59
65
2
37
37
73
34
98
37
78
92
17
52
62
40
50
84
34
22
25
42
90
19
86
76
68
42
9
89
57
78
64
89
12
34
94
9
77
58
32
27
97
93
79
35
32
75
97
79
65
90
53
43
98
4
99
5
79
38
99
60
78
64
90
2
39
42
52
2
21
77
15
8
87
13
0
4
7
43
76
31
74
16
87
50
73
49
14
35
10
37
91
44
88
71
95
75
98
7
17
23
13
16
77
20
50
50
74
78
58
30
21
74
76
93
5
74
94
83
23
67
18
5
50
47
56
79
26
84
78
48
71
43
41
8
91
23
7
11
96
87
12
42
32
44
99
67
99
64
96
52
19
79
60
66
52
62
17
61
54
24
25
36
4
78
3
94
91
62
65
76
94
2
52
25
61
55
49
88
85
96
5
46
56
48
17
25
3
70
62
3
50
45
47
58
12
41
27
42
90
91
71
53
4
79
47
68
43
87
35
63
10
49
4
81
45
88
80
6
92
47
70
40
7
33
70
61
30
9
55
42
83
26
72
57
77
91
13
15
33
13
62
49
43
65
73
98
59
56
77
62
12
25
33
53
78
73
1
17
44
56
95
10
33
89
33
20
56
69
66
60
53
83
58
43
33
25
21
8
28
65
51
70
53
78
49
30
64
17
76
9
2
32
87
77
39
25
21
66
65
54
81
49
15
27
7
14
4
11
94
9
84
23
13
95
45
67
57
20
3
58
50
97
35
68
47
41
84
59
46
34
19
25
77
29
41
89
80
61
70
40
1
18
32
70
86
76
25
98
99
40
43
92
43
4
70
78
72
71
85
14
84
73
92
60
23
57
44
56
6
96
39
91
63
43
39
71
80
18
93
54
1
4
46
68
93
74
74
88
52
88
55
24
19
92
53
59
1
91
48
47
Let me know what the heck I am doing wrong. Thanks!
#ohGosh welcome to stack overflow. You are almost there with the solution. There are few problems with your program
1) Nums.dat file contains just one line with numbers separated by spaces, not a new line(\n). In order to get the read the numbers from the file do the following
dataset = [] #Create an empty list
file = open("D:\numbs.dat", "r") #Open the file
for line in file:
tempData = line.split(" ") #Returns a list with space used as delimiter
dataset = map(int, tempData) #Convert string data to int
2) Wrong way to get data from a list in the min function
Use
min = dataset[num]
Instead of
min = dataset(num)
Fix this and your program will work. Cheers.
Is there a way to find to find and rank rows in a Pandas Dataframe by their similarity to a row from another Dataframe?
My understanding of your question: you have two data frames, hopfully of the same column count. You want to rate first data frame's, the subject data frame, members by how close, i.e. similar, they are to any of the members of the target data frame.
I am not aware of a built in method.
It is probably not the most efficient way but here is how I'd approach:
#! /usr/bin/python3
import pandas as pd
import numpy as np
import pprint
pp = pprint.PrettyPrinter(indent=4)
# Simulate data
df_subject = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD')) # This is the one we're iterating to check similarity to target.
df_target = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD')) # This is the one we're checking distance to
# This will hold the min dstances.
distances=[]
# Loop to iterate over subject DF
for ix1,subject in df_subject.iterrows():
distances_cur=[]
# Loop to iterate over target DF
for ix2,target in df_target.iterrows():
distances_cur.append(np.linalg.norm(target-subject))
# Get the minimum distance for the subject set member.
distances.append(min(distances_cur))
# Distances to df
distances=pd.DataFrame(distances)
# Normalize.
distances=0.5-(distances-distances.mean(axis=0))/distances.max(axis=0)
# Column index joining, ordering and beautification.
Proximity_Ratings_name='Proximity Ratings'
distances=distances.rename(columns={0: Proximity_Ratings_name})
df_subject=df_subject.join(distances)
pp.pprint(df_subject.sort_values(Proximity_Ratings_name,ascending=False))
It should yeild something like the table below. Higher rating means there's a similar member in the target data frame:
A B C D Proximity Ratings
55 86 21 91 78 0.941537
38 91 31 35 95 0.901638
43 49 89 49 6 0.878030
98 28 98 98 36 0.813685
77 67 23 78 84 0.809324
35 52 16 36 58 0.802223
54 2 25 61 44 0.788591
95 76 3 60 46 0.766896
5 55 39 88 37 0.756049
52 79 71 90 70 0.752520
66 52 27 82 82 0.751353
41 45 67 55 33 0.739919
76 12 93 50 62 0.720323
94 99 84 39 63 0.716123
26 62 6 97 60 0.715081
40 64 50 37 27 0.714042
68 70 21 8 82 0.698824
47 90 54 60 65 0.676680
7 85 95 45 71 0.672036
2 14 68 50 6 0.661113
34 62 63 83 29 0.659322
8 87 90 28 74 0.647873
75 14 61 27 68 0.633370
60 9 91 42 40 0.630030
4 46 46 52 35 0.621792
81 94 19 82 44 0.614510
73 67 27 34 92 0.608137
30 92 64 93 51 0.608137
11 52 25 93 50 0.605770
51 17 48 57 52 0.604984
.. .. .. .. .. ...
64 28 56 0 9 0.397054
18 52 84 36 79 0.396518
99 41 5 32 34 0.388519
27 19 54 43 94 0.382714
92 69 56 73 93 0.382714
59 1 29 46 16 0.374878
58 2 36 8 96 0.362525
69 58 92 16 48 0.361505
31 27 57 80 35 0.349887
10 59 23 47 24 0.345891
96 41 77 76 33 0.345891
78 42 71 87 65 0.344398
93 12 31 6 27 0.329152
23 6 5 10 42 0.320445
14 44 6 43 29 0.319964
6 81 51 44 15 0.311840
3 17 60 13 22 0.293066
70 28 40 22 82 0.251549
36 95 72 35 5 0.249354
49 78 10 30 18 0.242370
17 79 69 57 96 0.225168
46 42 95 86 81 0.224742
84 58 81 59 86 0.221346
9 9 62 8 30 0.211659
72 11 51 74 8 0.159265
90 74 26 80 1 0.138993
20 90 4 6 5 0.117652
50 3 12 5 53 0.077088
42 90 76 42 1 0.075284
45 94 46 88 14 0.054244
Hope I understand correctly. Don't use if performance matters, I'm sure there's an algebraic way to approach this (Multiply matrices) that would run way faster.
I am working with a 2D Numpy masked_array in Python.
I need to change the data values in the masked area such that they equal the nearest unmasked value.
NB. If there are more than one nearest unmasked values then it can take any of those nearest values (which ever one turns out to be easiest to codeā¦)
e.g.
import numpy
import numpy.ma as ma
a = numpy.arange(100).reshape(10,10)
fill_value=-99
a[2:4,3:8] = fill_value
a[8,8] = fill_value
a = ma.masked_array(a,a==fill_value)
>>> a [[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 -- -- -- -- -- 28 29]
[30 31 32 -- -- -- -- -- 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 -- 89]
[90 91 92 93 94 95 96 97 98 99]],
I need it to look like this:
>>> a.data
[[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 ? 14 15 16 ? 28 29]
[30 31 32 ? 44 45 46 ? 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 ? 89]
[90 91 92 93 94 95 96 97 98 99]],
NB. where "?" could take any of the adjacent unmasked values.
What is the most efficient way to do this?
Thanks for your help.
I generally use a distance transform, as wisely suggested by Juh_ in this question.
This does not directly apply to masked arrays, but I do not think it will be that hard to transpose there, and it is quite efficient, I've had no problem applying it to large 100MPix images.
Copying the relevant method there for reference :
import numpy as np
from scipy import ndimage as nd
def fill(data, invalid=None):
"""
Replace the value of invalid 'data' cells (indicated by 'invalid')
by the value of the nearest valid data cell
Input:
data: numpy array of any dimension
invalid: a binary array of same shape as 'data'. True cells set where data
value should be replaced.
If None (default), use: invalid = np.isnan(data)
Output:
Return a filled array.
"""
#import numpy as np
#import scipy.ndimage as nd
if invalid is None: invalid = np.isnan(data)
ind = nd.distance_transform_edt(invalid, return_distances=False, return_indices=True)
return data[tuple(ind)]
You could use np.roll to make shifted copies of a, then use boolean logic on the masks to identify the spots to be filled in:
import numpy as np
import numpy.ma as ma
a = np.arange(100).reshape(10,10)
fill_value=-99
a[2:4,3:8] = fill_value
a[8,8] = fill_value
a = ma.masked_array(a,a==fill_value)
print(a)
# [[0 1 2 3 4 5 6 7 8 9]
# [10 11 12 13 14 15 16 17 18 19]
# [20 21 22 -- -- -- -- -- 28 29]
# [30 31 32 -- -- -- -- -- 38 39]
# [40 41 42 43 44 45 46 47 48 49]
# [50 51 52 53 54 55 56 57 58 59]
# [60 61 62 63 64 65 66 67 68 69]
# [70 71 72 73 74 75 76 77 78 79]
# [80 81 82 83 84 85 86 87 -- 89]
# [90 91 92 93 94 95 96 97 98 99]]
for shift in (-1,1):
for axis in (0,1):
a_shifted=np.roll(a,shift=shift,axis=axis)
idx=~a_shifted.mask * a.mask
a[idx]=a_shifted[idx]
print(a)
# [[0 1 2 3 4 5 6 7 8 9]
# [10 11 12 13 14 15 16 17 18 19]
# [20 21 22 13 14 15 16 28 28 29]
# [30 31 32 43 44 45 46 47 38 39]
# [40 41 42 43 44 45 46 47 48 49]
# [50 51 52 53 54 55 56 57 58 59]
# [60 61 62 63 64 65 66 67 68 69]
# [70 71 72 73 74 75 76 77 78 79]
# [80 81 82 83 84 85 86 87 98 89]
# [90 91 92 93 94 95 96 97 98 99]]
If you'd like to use a larger set of nearest neighbors, you could perhaps do something like this:
neighbors=((0,1),(0,-1),(1,0),(-1,0),(1,1),(-1,1),(1,-1),(-1,-1),
(0,2),(0,-2),(2,0),(-2,0))
Note that the order of the elements in neighbors is important. You probably want to fill in missing values with the nearest neighbor, not just any neighbor. There's probably a smarter way to generate the neighbors sequence, but I'm not seeing it at the moment.
a_copy=a.copy()
for hor_shift,vert_shift in neighbors:
if not np.any(a.mask): break
a_shifted=np.roll(a_copy,shift=hor_shift,axis=1)
a_shifted=np.roll(a_shifted,shift=vert_shift,axis=0)
idx=~a_shifted.mask*a.mask
a[idx]=a_shifted[idx]
Note that np.roll happily rolls the lower edge to the top, so a missing value at the top may be filled in by a value from the very bottom. If this is a problem, I'd have to think more about how to fix it. The obvious but not very clever solution would be to use if statements and feed the edges a different sequence of admissible neighbors...
For more complicated cases you could use scipy.spatial:
from scipy.spatial import KDTree
x,y=np.mgrid[0:a.shape[0],0:a.shape[1]]
xygood = np.array((x[~a.mask],y[~a.mask])).T
xybad = np.array((x[a.mask],y[a.mask])).T
a[a.mask] = a[~a.mask][KDTree(xygood).query(xybad)[1]]
print a
[[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 13 14 15 16 17 28 29]
[30 31 32 32 44 45 46 38 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 78 89]
[90 91 92 93 94 95 96 97 98 99]]