I have a dataframe with three points, each stored as an x column and a y column (six columns in total):
x_measured y_measured x_calculated y_calculated x_fixedpoint \
0 142 37 143 37.5 138
1 142 37 143 37.6 138
2 142 37 143 37.7 138
3 142 37 143 37.8 138
4 142 37 143 37.9 138
5 73 55 71 55.6 72
6 73 55 71 55.7 72
7 73 55 71 55.8 72
8 73 55 71 55.9 72
9 73 55 71 55.1 72
y_fixedpoint
0 38
1 38
2 38
3 38
4 38
5 55
6 55
7 55
8 55
9 55
Now, I need to calculate the angle between (x_measured, y_measured) and (x_calculated, y_calculated) relative to (x_fixedpoint, y_fixedpoint). To do so, I created this function:
def angle_calculator(x1,x2,x3,x4,x5,x6):
    All_points = np.array([[x1,x2],[x3,x4],[x5,x6]])
    A = All_points[2] - All_points[0]
    B = All_points[1] - All_points[0]
    C = All_points[2] - All_points[1]
    for e1, e2 in ((A, B), (A, C), (B, -C)):
        dotproduct = np.dot(e1, e2)
        norm = np.linalg.norm(e1) * np.linalg.norm(e2)
        if dotproduct !=0:
            angle = round(np.arccos(dotproduct/norm) * 180 / np.pi, 2)
        else:
            angle = 0
    return angle
taking the different x,y coordinates as arguments and returning an angle. It works and gives:
df['angles'] = df.apply(lambda x: angle_calculator(x.x_measured, x.y_measured, x.x_calculated, x.y_calculated, x.x_fixedpoint, x.y_fixedpoint), axis=1)
x_measured y_measured x_calculated y_calculated x_fixedpoint \
0 142 37 143 37.5 138
1 142 37 143 37.6 138
2 142 37 143 37.7 138
3 142 37 143 37.8 138
4 142 37 143 37.9 138
5 73 55 71 55.6 72
6 73 55 71 55.7 72
7 73 55 71 55.8 72
8 73 55 71 55.9 72
9 73 55 71 55.1 72
y_fixedpoint angles
0 38 32.275644
1 38 35.537678
2 38 38.425651
3 38 40.950418
4 38 43.132975
5 55 14.264512
6 55 15.701974
7 55 16.858399
8 55 17.759467
9 55 2.848188
Usually, I would be relatively pleased with this, but it is rather slow for dataframes with over 200 000 rows. Slow (I know!) is a relative term, but in this case it takes around 10 seconds for 200 000 rows.
So, my questions are:
Am I overcomplicating things?
Is there a more efficient way to do this?
As always, thankful for knowledge.
Considering the values in the numpy domain, we can do:
# extract the pairs (and go to numpy)
meas = df.filter(like="measured").to_numpy()
calc = df.filter(like="calculated").to_numpy()
fix = df.filter(like="fixed").to_numpy()
# calculate the differences of `meas` and `calc` from `fix`
meas_dist = fix - meas
calc_dist = fix - calc
# get the inner products
inners = (meas_dist * calc_dist).sum(axis=1)
# or with: inners = np.einsum("ij,ij->i", meas_dist, calc_dist); might be faster
# norm function for brevity
norm = lambda mat: np.linalg.norm(mat, axis=1)
# get the angles (in radians)
angles_in_rad = np.arccos(inners / (norm(meas_dist) * norm(calc_dist)))
# handling possible NaNs (by #Serge de Gosson de Varennes, thanks!)
where_nans = np.isnan(angles_in_rad)
angles_in_rad[where_nans] = 0
# go to degrees
angles_in_deg = np.rad2deg(angles_in_rad)
# put back to df
df["angles"] = angles_in_deg
I get:
>>> df
x_measured y_measured x_calculated y_calculated x_fixedpoint y_fixedpoint angles
0 142 37 143 37.5 138 38 8.325650
1 142 37 143 37.6 138 38 9.462322
2 142 37 143 37.7 138 38 10.602613
3 142 37 143 37.8 138 38 11.745633
4 142 37 143 37.9 138 38 12.890481
5 73 55 71 55.6 72 55 149.036243
6 73 55 71 55.7 72 55 145.007980
7 73 55 71 55.8 72 55 141.340192
8 73 55 71 55.9 72 55 138.012788
9 73 55 71 55.1 72 55 174.289407
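The NaN handling in the vectorized version can also be done before calling arccos. The following is only a sketch of that idea, not the answer's exact code: the function name angles_at_fixed_point is made up for illustration, and treating a zero-length vector as an angle of 0 is an assumption that mirrors the NaN replacement above.

```python
import numpy as np

def angles_at_fixed_point(meas, calc, fix):
    """Angle in degrees at `fix` between the rays to `meas` and `calc`, row by row."""
    u = np.asarray(meas, dtype=float) - np.asarray(fix, dtype=float)
    v = np.asarray(calc, dtype=float) - np.asarray(fix, dtype=float)
    dots = np.einsum("ij,ij->i", u, v)
    norms = np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
    # Where a vector has zero length the angle is undefined;
    # assumption: use cos = 1 (angle 0) for those rows.
    cos = np.divide(dots, norms, out=np.ones_like(dots), where=norms > 0)
    # Clip so rounding error cannot push the cosine outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Rows 0 and 5 of the example dataframe
meas = [[142, 37], [73, 55]]
calc = [[143, 37.5], [71, 55.6]]
fix = [[138, 38], [72, 55]]
print(angles_at_fixed_point(meas, calc, fix))
```

With the dataframe from the question, the three arguments would be the meas, calc and fix arrays extracted with df.filter(...).to_numpy() as shown above.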
In an attempt to write a query in Python that will generate the first one hundred sums of squares between one and ten, the following attempt is made:
for a in range(1, 10):
    for b in range(1,10):
        print(a**2+b**2)
Format should look like this:
1 1 4 9 16 25 36 49 64 81 100
4 5 8 13 20 29 40 53 68 85 104
9 10 13 18 25 34 45 58 73 90 109
16 17 20 25 32 41 52 65 80 97 116
25 26 38 34 41 50 61 74 89 106 125
36 37 40 45 52 61 72 85 100 117 136
49 75 53 58 65 74 85 98 113 130 149
64 65 68 73 80 89 100 113 128 145 164
81 82 85 90 97 106 117 130 145 162 181
100 101 104 109 116 125 136 149 164 181 200
After reading that the sum of two squares in a polynomial will always be positive, it occurred to me that it would make a good dataset to have all the numbers that could be generated by the sum of two squares. Polynomial like (a+b)*(a-b)...
Seems to me there should be an answer that creates a couple of matrices multiplied together.
So tried this but format is wrong (Thank UnsignedFoo for the help)
df=pd.Dataframe=([])
for a in range(1, 10):
    val_row = " ".join([str(a**2+b**2) for b in range(1,10)])
    print("{}\n".format("["+val_row+"]"))
    df.append("{}\n".format("["+val_row+"]"))
It can be hard coded a line a time:
df1 = pd.DataFrame([[1, 2,3,4,5,6,7,8,9,10], [2,5,10,17,26,37,50,65,82,101]], columns=list('ABCDEFGHIJ'))
df2 = pd.DataFrame([[5,8,13,20,29,40,53,68,85,104], [10,13,18,25,34,45,58,73,90,109]], columns=list('ABCDEFGHIJ'))
df3 = pd.DataFrame([[17,20,25,32,41,52,65,80,97,116], [26,38,34,41,50,61,74,89,106,125]], columns=list('ABCDEFGHIJ'))
df4 = pd.DataFrame([[37,40,45,52,61,72,85,100,117,136], [75,53,58,65,74,85,98,113,130,149]], columns=list('ABCDEFGHIJ'))
df5 = pd.DataFrame([[65,68,73,80,89,100,113,128,145,164], [82,85,90,97,106,117,130,145,162,181]], columns=list('ABCDEFGHIJ'))
df6 = pd.DataFrame([[101,104,109,116,125,136,149,164,181,200], [126,125,130,137,146,157,170,185,202,221]], columns=list('ABCDEFGHIJ'))
df1.append(df2)
df1.append(df3)
df1.append(df4)
df1.append(df5)
df1.append(df6)
At that point the data can be graphed, and it would be great if the matrix could be built with less work. Up to this point Excel does everything, but Excel doesn't have the advanced matplotlib libraries:
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline
%config InlineBackend.figure_format='retina'
from mpl_toolkits import mplot3d
fig=plt.figure(figsize=(8,6))
xs=df1['A']
ys=df1['B']
zs=df1['C']
ax=fig.add_subplot(111,projection='3d')
ax.scatter(xs,ys,zs,s=50,alpha=0.6,edgecolors='w')
ax.set_xlabel('A')
ax.set_ylabel('B')
ax.set_zlabel('C')
plt.show()
A better example that is closer to where this should go is:
X = np.arange(1, 10, 1)
Y = np.arange(1, 10, 1)
X, Y = np.meshgrid(X, Y)
R = (X**2 + Y**2)# R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)
fig = plt.figure()
ax = Axes3D(fig)
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.viridis)
plt.show()
Still hoping someone can answer this question. I'd like to delete and start again: the "answers" don't address the issue of graphing.
The syntax error in your code is that a for statement must end with a : followed by a newline.
Fixing that results in:
for b in range(1, 10):
    for a in range(1,b):
        print(a**2+b**2)
Please note that this still does not print your table, because print emits a newline after every call by default in Python.
for a in range(1, 11):
    val_row = " ".join([str(a**2 + b**2 -1) for b in range(1,11)])
    print("{}\n".format(val_row))
Output:
1 4 9 16 25 36 49 64 81 100
4 7 12 19 28 39 52 67 84 103
9 12 17 24 33 44 57 72 89 108
16 19 24 31 40 51 64 79 96 115
25 28 33 40 49 60 73 88 105 124
36 39 44 51 60 71 84 99 116 135
49 52 57 64 73 84 97 112 129 148
64 67 72 79 88 99 112 127 144 163
81 84 89 96 105 116 129 144 161 180
100 103 108 115 124 135 148 163 180 199
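The question also asks whether the table can be built from matrices instead of loops. One possible sketch, assuming NumPy is acceptable here, uses broadcasting to add a column of squares to a row of squares; the variable names are illustrative only.

```python
import numpy as np

n = np.arange(1, 11)                 # 1..10 inclusive
squares = n ** 2
# Broadcasting a column against a row: table[i, j] = (i + 1)**2 + (j + 1)**2
table = squares[:, None] + squares[None, :]

for row in table:
    print(" ".join(str(value) for value in row))
```

The resulting 10x10 array can be passed to pd.DataFrame(table) if a dataframe is needed, or used directly as the Z values for plot_surface together with the meshgrid from the question.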
I am using matplotlib to create many plots. The plots involve making many FancyBboxPatches and setting the color for each patch using a ScalarMappable. Each plot corresponds to a "time step" from a physical process. I have made the following minimal working example to illustrate what I am trying to do and the problem I am having.
Suppose there is a file data.txt. If a line has one entry, that value is the time step. If a line has three entries, then the first entry is the x value, the second entry is the y value, and the third entry is the value that will use the ScalarMappable. Here is an example of data.txt:
1
0 0 0.1
0 1 1
0 2 2
0 3 3
0 4 4
1 0 10
1 1 11
1 2 12
1 3 13
1 4 14
2 0 20
2 1 21
2 2 22
2 3 23
2 4 24
3 0 30
3 1 31
3 2 32
3 3 33
3 4 34
2
1 0 10
1 1 11
1 2 12
1 3 13
1 4 14
2 0 110
2 1 111
2 2 112
2 3 113
2 4 114
3 0 120
3 1 121
3 2 122
3 3 123
3 4 124
4 0 130
4 1 131
4 2 132
4 3 133
4 4 134
3
2 0 110
2 1 111
2 2 112
2 3 113
2 4 114
3 0 1110
3 1 1111
3 2 1112
3 3 1113
3 4 1114
4 0 1120
4 1 1121
4 2 1122
4 3 1123
4 4 1124
5 0 1130
5 1 1131
5 2 1132
5 3 1133
5 4 1134
4
3 0 1110
3 1 1111
3 2 1112
3 3 1113
3 4 1114
4 0 11110
4 1 11111
4 2 11112
4 3 11113
4 4 11114
5 0 11120
5 1 11121
5 2 11122
5 3 11123
5 4 11124
6 0 11130
6 1 11131
6 2 11132
6 3 11133
6 4 11134
Here is the script I use to generate the plots:
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from matplotlib.colors import LogNorm
from matplotlib.patches import FancyBboxPatch

def parse_file(file_name):
    output = {}
    with open(file_name, 'r') as data_file:
        for line in data_file:
            entries = line.strip().split()
            if len(entries) == 1:
                time_step = int(entries[0])
                output[time_step] = {}
            elif len(entries) == 3:
                x = float(entries[0])
                y = float(entries[1])
                value = float(entries[2])
                output[time_step][(x, y)] = value
            else:
                raise RuntimeError('Anomalous line {} in file {}'.format(line, data_file.name))
    return output

def main():
    fig, axes = plt.subplots()
    axes.set_xlim(-1,10)
    axes.set_ylim(-1,10)
    cmap = cm.plasma
    norm = LogNorm(vmin = 1e-2, vmax = 1.2e4)
    smap = cm.ScalarMappable(norm = norm, cmap = cmap)
    smap.set_array([])
    color_bar = fig.colorbar(mappable = smap, ax = axes, orientation = 'vertical', label = 'label')
    data = parse_file(file_name = 'data.txt')
    for time_step, information in data.items():
        cells = []
        for (x,y), value in information.items():
            cell = FancyBboxPatch(xy = (x - 0.5, y - 0.5),
                                  width = 1, height = 1,
                                  boxstyle = 'square,pad=0.',
                                  edgecolor = 'black',
                                  facecolor = smap.to_rgba(value))
            #print(time_step, '\t', x, '\t', y, '\t', value, '\t', smap.to_rgba(value))
            axes.add_patch(cell)
            cells.append(cell)
        fig.savefig('time-step_{}.png'.format(time_step))
        for cell in cells:
            cell.remove()

if __name__ == '__main__':
    main()
And here is one of the plots that is created from running that script:
This plot (and the other three that are created, but not shown here) looks fine, so I am confident that I am using ScalarMappable correctly. Now I take the actual data I want to plot, again in a file called data.txt. The format is the same as before, except that a time step is now marked by a line with four entries: the first entry is the time step (and I do not care about the other entries). Here is an example of data.txt:
2 0.424066E-02 0.200000E+01 0.885500E+08
0 1 0.850703E+00
1 3 0.388551E-09
2 4 0.141948E-06
2 6 0.126299E-09
3 9 0.166871E-08
4 12 0.340738E-08
5 13 0.246948E-09
5 14 0.129005E-09
6 16 0.140043E-08
6 17 0.885307E-09
26 76 0.591676E-08
26 78 0.745985E-08
27 77 0.263136E-08
27 78 0.131857E-08
27 79 0.151193E-05
27 80 0.265941E-07
27 81 0.170975E-05
27 82 0.206355E-08
27 83 0.334444E-07
28 80 0.569439E-05
28 81 0.864904E-07
28 82 0.114196E-02
28 83 0.130067E-06
28 84 0.608045E-04
28 85 0.351649E-07
28 86 0.543117E-07
28 88 0.202115E-08
29 83 0.225374E-07
29 84 0.125586E-07
29 85 0.253383E-04
29 86 0.943810E-06
29 87 0.104539E-04
29 88 0.210241E-06
29 89 0.196533E-03
29 90 0.707278E-06
29 91 0.565096E-05
29 92 0.840856E-08
29 93 0.277478E-07
30 86 0.707234E-09
30 88 0.549048E-07
30 89 0.281776E-08
30 90 0.259219E-04
30 91 0.298973E-06
30 92 0.311047E-04
30 93 0.144465E-05
30 94 0.632642E-04
30 95 0.787893E-08
30 96 0.252900E-08
31 91 0.425350E-08
31 92 0.371105E-08
31 93 0.621869E-05
31 94 0.680069E-06
31 95 0.315149E-04
31 96 0.670790E-07
31 97 0.568911E-06
31 98 0.187946E-08
31 99 0.135024E-07
32 94 0.384693E-09
32 96 0.174407E-06
32 97 0.480216E-08
32 98 0.244989E-05
32 99 0.876257E-07
32 100 0.189371E-04
32 101 0.264917E-06
32 102 0.297745E-05
32 103 0.213684E-09
33 99 0.110356E-08
33 100 0.131345E-08
33 101 0.448076E-06
33 102 0.106369E-06
33 103 0.128984E-04
33 104 0.230382E-07
33 105 0.266535E-07
34 102 0.428166E-08
34 103 0.668242E-08
34 104 0.842244E-05
34 105 0.843016E-07
34 106 0.137510E-05
34 107 0.879097E-08
34 108 0.758233E-07
35 105 0.280844E-06
35 106 0.639110E-07
35 107 0.497335E-05
35 108 0.260105E-06
35 109 0.188060E-05
35 110 0.375853E-09
35 111 0.935430E-09
35 112 0.138533E-07
35 113 0.101658E-06
35 114 0.504823E-09
35 115 0.989704E-09
35 116 0.152468E-06
35 117 0.220735E-07
36 114 0.430884E-08
36 116 0.115980E-07
36 117 0.128436E-05
36 118 0.814433E-05
37 117 0.316595E-09
37 118 0.141531E-06
37 119 0.965141E-05
38 119 0.459954E-08
38 120 0.114088E-04
38 121 0.198695E-09
39 120 0.109457E-08
39 121 0.105160E-04
39 122 0.254984E-08
40 122 0.717566E-05
40 123 0.179081E-08
40 124 0.352463E-09
41 123 0.454357E-05
41 124 0.629608E-07
41 125 0.777480E-07
42 124 0.453866E-05
42 125 0.108592E-06
42 126 0.320262E-06
42 127 0.252596E-09
42 128 0.114714E-09
43 125 0.372578E-06
43 126 0.344297E-07
43 127 0.188018E-05
43 128 0.631276E-08
43 129 0.368003E-08
44 126 0.170090E-07
44 127 0.121695E-07
44 128 0.147407E-05
44 129 0.349674E-07
44 130 0.767494E-06
45 128 0.193141E-09
45 129 0.361851E-06
45 130 0.573704E-07
45 131 0.457287E-06
45 132 0.148004E-08
45 133 0.164772E-07
45 134 0.386942E-09
45 135 0.539603E-08
45 136 0.227778E-09
45 137 0.640126E-08
45 138 0.189604E-09
45 139 0.754561E-09
46 132 0.215880E-07
46 134 0.102847E-08
46 136 0.628736E-08
46 137 0.427124E-09
46 138 0.711664E-07
46 139 0.749082E-08
46 140 0.425043E-06
46 141 0.776307E-08
46 142 0.102985E-06
46 143 0.693232E-09
46 144 0.215846E-08
47 141 0.660244E-08
47 142 0.901189E-09
47 143 0.299062E-07
47 144 0.195833E-08
47 145 0.178405E-07
47 146 0.558550E-09
47 147 0.235167E-08
48 144 0.393065E-09
48 146 0.493252E-08
48 147 0.299176E-09
48 148 0.130504E-07
48 149 0.244654E-09
48 150 0.143702E-08
49 149 0.565286E-09
49 151 0.122230E-08
3 0.424066E-02 0.200000E+01 0.885500E+08
0 1 0.850710E+00
1 3 0.388551E-09
2 4 0.141948E-06
2 6 0.126299E-09
3 9 0.166871E-08
4 12 0.340738E-08
5 13 0.246948E-09
5 14 0.129005E-09
6 16 0.140043E-08
6 17 0.885307E-09
26 76 0.593799E-08
26 78 0.747463E-08
27 77 0.283934E-08
27 78 0.115725E-08
27 79 0.153613E-05
27 80 0.236099E-08
27 81 0.171178E-05
27 83 0.334426E-07
28 80 0.575684E-05
28 81 0.242170E-07
28 82 0.114208E-02
28 83 0.133947E-07
28 84 0.608362E-04
28 85 0.335522E-08
28 86 0.543624E-07
28 88 0.202170E-08
29 83 0.258149E-07
29 84 0.107337E-07
29 85 0.261133E-04
29 86 0.167223E-06
29 87 0.108977E-04
29 88 0.432469E-08
29 89 0.196993E-03
29 90 0.997563E-08
29 91 0.565922E-05
29 92 0.127589E-09
29 93 0.277365E-07
30 86 0.731139E-09
30 88 0.613936E-07
30 89 0.984612E-09
30 90 0.261316E-04
30 91 0.845314E-07
30 92 0.324848E-04
30 93 0.656773E-07
30 94 0.632706E-04
30 95 0.335583E-09
30 96 0.252938E-08
31 91 0.529954E-08
31 92 0.394099E-08
31 93 0.681605E-05
31 94 0.104800E-06
31 95 0.315602E-04
31 96 0.231610E-08
31 97 0.566868E-06
31 99 0.135330E-07
32 94 0.450380E-09
32 96 0.178679E-06
32 97 0.955313E-09
32 98 0.252946E-05
32 99 0.770340E-08
32 100 0.191937E-04
32 101 0.825856E-08
32 102 0.297762E-05
33 99 0.128999E-08
33 100 0.146516E-08
33 101 0.616111E-06
33 102 0.539415E-07
33 103 0.128046E-04
33 104 0.865090E-09
33 105 0.266759E-07
34 102 0.899336E-08
34 103 0.331924E-08
34 104 0.850733E-05
34 105 0.462457E-08
34 106 0.137714E-05
34 107 0.199044E-09
34 108 0.758844E-07
35 105 0.308602E-06
35 106 0.470668E-07
35 107 0.520013E-05
35 108 0.458893E-07
35 109 0.185756E-05
35 111 0.159320E-07
35 112 0.729552E-09
35 113 0.101697E-06
35 114 0.135746E-09
35 115 0.128676E-06
35 116 0.231448E-07
35 117 0.220783E-07
36 114 0.480979E-08
36 116 0.921582E-06
36 117 0.373798E-06
36 118 0.814449E-05
37 117 0.888355E-08
37 118 0.132905E-06
37 119 0.965147E-05
38 118 0.360663E-09
38 119 0.423745E-08
38 120 0.114090E-04
39 120 0.109122E-08
39 121 0.105186E-04
40 122 0.717737E-05
40 124 0.352428E-09
41 123 0.460618E-05
41 124 0.358205E-09
41 125 0.777514E-07
42 124 0.464136E-05
42 125 0.589035E-08
42 126 0.320503E-06
42 128 0.114709E-09
43 125 0.408148E-06
43 126 0.567978E-08
43 127 0.187958E-05
43 129 0.368007E-08
44 126 0.258868E-07
44 127 0.348446E-08
44 128 0.150718E-05
44 129 0.167101E-08
44 130 0.767515E-06
45 128 0.176686E-09
45 129 0.403334E-06
45 130 0.162718E-07
45 131 0.458273E-06
45 132 0.196826E-09
45 133 0.167474E-07
45 135 0.563904E-08
45 137 0.655709E-08
45 139 0.751998E-09
46 132 0.216010E-07
46 134 0.107901E-08
46 136 0.673825E-08
46 138 0.784839E-07
46 139 0.220743E-09
46 140 0.432287E-06
46 141 0.427029E-09
46 142 0.103696E-06
46 144 0.211976E-08
47 141 0.696394E-08
47 142 0.585710E-09
47 143 0.315456E-07
47 144 0.425448E-09
47 145 0.181981E-07
47 146 0.136911E-09
47 147 0.226765E-08
48 144 0.442465E-09
48 146 0.553370E-08
48 147 0.138932E-09
48 148 0.128376E-07
48 150 0.144107E-08
49 149 0.624360E-09
49 151 0.123765E-08
The script that I use to create the plots is almost the same as before. The only differences are (1) how data.txt is parsed, (2) setting the limits of the x and y axes, and (3) the variable norm. Here is the script:
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from matplotlib.colors import LogNorm
from matplotlib.patches import FancyBboxPatch

def parse_file(file_name):
    output = {}
    with open(file_name, 'r') as data_file:
        for line in data_file:
            entries = line.strip().split()
            if len(entries) == 4:
                time_step = int(entries[0])
                output[time_step] = {}
            elif len(entries) == 3:
                x = float(entries[0])
                y = float(entries[1])
                value = float(entries[2])
                output[time_step][(x, y)] = value
            else:
                raise RuntimeError('Anomalous line {} in file {}'.format(line, data_file.name))
    return output

def main():
    fig, axes = plt.subplots()
    axes.set_xlim(0,150)
    axes.set_ylim(0,250)
    cmap = cm.plasma
    norm = LogNorm(vmin = pow(10, -10), vmax = pow(10, -2.2))
    smap = cm.ScalarMappable(norm = norm, cmap = cmap)
    smap.set_array([])
    color_bar = fig.colorbar(mappable = smap, ax = axes, orientation = 'vertical', label = 'label')
    data = parse_file(file_name = 'data.txt')
    for time_step, information in data.items():
        cells = []
        for (x,y), value in information.items():
            cell = FancyBboxPatch(xy = (x - 0.5, y - 0.5),
                                  width = 1, height = 1,
                                  boxstyle = 'square,pad=0.',
                                  edgecolor = 'black',
                                  facecolor = smap.to_rgba(value))
            #print(time_step, '\t', x, '\t', y, '\t', value, '\t', smap.to_rgba(value))
            axes.add_patch(cell)
            cells.append(cell)
        fig.savefig('time-step_{}.png'.format(time_step))
        for cell in cells:
            cell.remove()

if __name__ == '__main__':
    main()
Now all of the patches are black. Here is one of the plots that is created:
I do not see anything obviously wrong using the print statement (which is commented out in the script):
print(time_step, '\t', x, '\t', y, '\t', value, '\t', smap.to_rgba(value))
Why are all the FancyBboxPatches black instead of the color I have chosen with the ScalarMappable (and how can I make them be the color I have chosen with the ScalarMappable)?
It doesn't look as if the patches are black. I would guess that they are just too small, so that their edge (which is black) covers the complete area of the patch. You may use a thinner edge, no edge at all, or set the edgecolor to a value of your liking. In general, you may also use simple patches like Rectangle instead of the FancyBboxPatch.
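A minimal sketch of the "no edge at all" suggestion, using Rectangle in place of FancyBboxPatch. The helper name draw_cells and the two sample cells (taken from the question's data) are only for illustration, and the Agg backend is selected purely so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import LogNorm
from matplotlib.patches import Rectangle

def draw_cells(axes, cells, smap):
    """Add one borderless unit square per (x, y) -> value entry; return the patches."""
    patches = []
    for (x, y), value in cells.items():
        rect = Rectangle((x - 0.5, y - 0.5), 1, 1,
                         facecolor=smap.to_rgba(value),
                         edgecolor='none')  # no outline, so small cells keep their fill color
        axes.add_patch(rect)
        patches.append(rect)
    return patches

fig, axes = plt.subplots()
axes.set_xlim(0, 150)
axes.set_ylim(0, 250)
smap = ScalarMappable(norm=LogNorm(vmin=1e-10, vmax=10 ** -2.2),
                      cmap=plt.get_cmap("plasma"))
sample = {(28, 82): 0.114196E-02, (29, 89): 0.196533E-03}
patches = draw_cells(axes, sample, smap)
fig.canvas.draw()  # render once; fig.savefig(...) would be used as in the question
```

Keeping the edge but passing a small linewidth (for example linewidth=0.1) to the patch is the other option mentioned above; which one looks better depends on the figure size and resolution.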
This is something I've been struggling with for a couple of weeks. The algorithm is the following:
1. Select a subarray as an array of rows and columns from a larger array
2. Compute the median of the subarray
3. Replace cells in subarray with median value
4. Move the subarray to the right by its own length
5. Repeat to end of array
6. Move subarray down by its own height
7. Repeat
I've got steps 1 to 3 as follows:
import numpy as np
w1 = np.arange(100).reshape(10,10)
side = 3
patch = w1[0:side, 0:side]
i, j = patch.shape
for j in range(side):
    for i in range(side):
        patch[i,j] = np.median(patch)
Eventually, I'll be using a 901x877 array from an image but I'm just trying to get a hold of this simple task first. How can I slide the array along and then down with a loop?
You can use scikit-image's view_as_blocks and NumPy broadcasting to vectorize the operation:
import numpy as np
import skimage.util
w1 = np.arange(144).reshape(12,12)
print(w1)
# [[ 0 1 2 3 4 5 6 7 8 9 10 11]
# [ 12 13 14 15 16 17 18 19 20 21 22 23]
# [ 24 25 26 27 28 29 30 31 32 33 34 35]
# [ 36 37 38 39 40 41 42 43 44 45 46 47]
# [ 48 49 50 51 52 53 54 55 56 57 58 59]
# [ 60 61 62 63 64 65 66 67 68 69 70 71]
# [ 72 73 74 75 76 77 78 79 80 81 82 83]
# [ 84 85 86 87 88 89 90 91 92 93 94 95]
# [ 96 97 98 99 100 101 102 103 104 105 106 107]
# [108 109 110 111 112 113 114 115 116 117 118 119]
# [120 121 122 123 124 125 126 127 128 129 130 131]
# [132 133 134 135 136 137 138 139 140 141 142 143]]
side = 3
w2 = skimage.util.view_as_blocks(w1, (side, side))
w2[...] = np.median(w2, axis=(-2, -1))[:, :, None, None]
print(w1)
# [[ 13 13 13 16 16 16 19 19 19 22 22 22]
# [ 13 13 13 16 16 16 19 19 19 22 22 22]
# [ 13 13 13 16 16 16 19 19 19 22 22 22]
# [ 49 49 49 52 52 52 55 55 55 58 58 58]
# [ 49 49 49 52 52 52 55 55 55 58 58 58]
# [ 49 49 49 52 52 52 55 55 55 58 58 58]
# [ 85 85 85 88 88 88 91 91 91 94 94 94]
# [ 85 85 85 88 88 88 91 91 91 94 94 94]
# [ 85 85 85 88 88 88 91 91 91 94 94 94]
# [121 121 121 124 124 124 127 127 127 130 130 130]
# [121 121 121 124 124 124 127 127 127 130 130 130]
# [121 121 121 124 124 124 127 127 127 130 130 130]]
Note that I had to change the size of your array to 12x12 so that all of your tiles of 3x3 actually fit in there.
Here are a few "code smells" I see.
Start with range(side): since side is set to 3, it yields [0, 1, 2]. Is that what you really want?
You set i, j = patch.shape and then immediately overwrite these values in your for loops.
Finally, you're recalculating the median on every iteration.
Ok, here's what I'd do.
Figure out how many patches you'll need in both width and height, and multiply those counts by the size of the side.
Slice your array (matrix) up into those pieces.
Assign the median to each patch.
import numpy as np
w1 = np.arange(100).reshape(10,10)
side = 3
w, h = w1.shape
# start offsets of every full patch along each axis
width_index = np.array(range(w//side)) * side
height_index = np.array(range(h//side)) * side
def assign_patch(patch, median, side):
    """Break this loop out to prevent 4 nested 'for' loops"""
    for j in range(side):
        for i in range(side):
            patch[i,j] = median
    return patch
for width in width_index:
    for height in height_index:
        # patch is a view into w1, so writing to it modifies w1 in place
        patch = w1[width:width+side, height:height+side]
        median = np.median(patch)
        assign_patch(patch, median, side)
print(w1)
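For the 901x877 image mentioned in the question, the same tiling can be done without Python-level loops by reshaping in plain NumPy. This is a sketch under one stated assumption: rows and columns that do not fill a complete patch are left unchanged, as in the loop version above. The function name block_median is hypothetical.

```python
import numpy as np

def block_median(arr, side):
    """Return a copy of `arr` with each full side x side tile replaced by its median."""
    h, w = arr.shape
    hh, ww = (h // side) * side, (w // side) * side   # region covered by full tiles
    out = arr.copy()
    # Split the covered region into (tile_row, row_in_tile, tile_col, col_in_tile)
    blocks = arr[:hh, :ww].reshape(hh // side, side, ww // side, side)
    med = np.median(blocks, axis=(1, 3))              # one median per tile
    # Expand each median back to a side x side tile and write it into the copy
    out[:hh, :ww] = np.repeat(np.repeat(med, side, axis=0), side, axis=1)
    return out

w1 = np.arange(100).reshape(10, 10)
print(block_median(w1, 3))
```

Because the result is written into a copy, the input array is not modified; the medians are cast to the input dtype on assignment.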
I've created a simple script that builds a multiplication table and prints it. It works and is pretty cool, but I would like to know if there's a way to fix it for when it goes higher than 10. From row 10 onwards, the row is shifted by one space relative to the rest of the table. How can I fix this little format issue?
if __name__ == '__main__':
    for row in range(1, 20+1):
        table = ''
        for column in range(1, 20+1):
            table += '{:4} '.format(row * column)
        print(table.strip())
Example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60
4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96 102 108 114 120
7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140
8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 160
9 18 27 36 45 54 63 72 81 90 99 108 117 126 135 144 153 162 171 180
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200
11 22 33 44 55 66 77 88 99 110 121 132 143 154 165 176 187 198 209 220
12 24 36 48 60 72 84 96 108 120 132 144 156 168 180 192 204 216 228 240
13 26 39 52 65 78 91 104 117 130 143 156 169 182 195 208 221 234 247 260
14 28 42 56 70 84 98 112 126 140 154 168 182 196 210 224 238 252 266 280
15 30 45 60 75 90 105 120 135 150 165 180 195 210 225 240 255 270 285 300
16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320
17 34 51 68 85 102 119 136 153 170 187 204 221 238 255 272 289 306 323 340
18 36 54 72 90 108 126 144 162 180 198 216 234 252 270 288 306 324 342 360
19 38 57 76 95 114 133 152 171 190 209 228 247 266 285 304 323 342 361 380
20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 360 380 400
You can left-align each value and pad it with spaces using {:<6}:
for row in range(1, 20+1):
    table = ''
    for column in range(1, 20+1):
        table += '{:<6} '.format(row * column)
    print(table.strip())
Output
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60
4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96 102 108 114 120
7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140
8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 160
9 18 27 36 45 54 63 72 81 90 99 108 117 126 135 144 153 162 171 180
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200
11 22 33 44 55 66 77 88 99 110 121 132 143 154 165 176 187 198 209 220
12 24 36 48 60 72 84 96 108 120 132 144 156 168 180 192 204 216 228 240
13 26 39 52 65 78 91 104 117 130 143 156 169 182 195 208 221 234 247 260
14 28 42 56 70 84 98 112 126 140 154 168 182 196 210 224 238 252 266 280
15 30 45 60 75 90 105 120 135 150 165 180 195 210 225 240 255 270 285 300
16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320
17 34 51 68 85 102 119 136 153 170 187 204 221 238 255 272 289 306 323 340
18 36 54 72 90 108 126 144 162 180 198 216 234 252 270 288 306 324 342 360
19 38 57 76 95 114 133 152 171 190 209 228 247 266 285 304 323 342 361 380
20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 360 380 400
So a simple way of doing this is using generators (I find it more readable right justified):
>>> n = 10
>>> print('\n'.join(''.join(format(i*j, ' >4') for i in range(1, n+1)) for j in range(1, n+1)))
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
10 20 30 40 50 60 70 80 90 100
If you need to work out the maximum width dynamically then you can use math.log10():
>>> import math
>>> n = 9
>>> w = int(math.log10(n**2))+1
>>> print('\n'.join(' '.join(format(i*j, ' >'+str(w)) for i in range(1, n+1)) for j in range(1, n+1)))
1 2 3 4 5 6 7 8 9
2 4 6 8 10 12 14 16 18
3 6 9 12 15 18 21 24 27
4 8 12 16 20 24 28 32 36
5 10 15 20 25 30 35 40 45
6 12 18 24 30 36 42 48 54
7 14 21 28 35 42 49 56 63
8 16 24 32 40 48 56 64 72
9 18 27 36 45 54 63 72 81
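As an alternative to math.log10(), the column width can be taken from the number of digits in the largest product. A small sketch, with multiplication_table as a made-up helper name:

```python
def multiplication_table(n):
    """Return an n x n multiplication table as a right-aligned string."""
    width = len(str(n * n))  # digits in the largest entry
    rows = (
        " ".join(f"{row * col:>{width}}" for col in range(1, n + 1))
        for row in range(1, n + 1)
    )
    return "\n".join(rows)

print(multiplication_table(20))
```

Every cell is padded to the same width, so all rows end up with the same length regardless of n.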
Your code example is not working because of the final strip(), which removes the leading whitespace on the left and so shifts your results.
Just removing the strip:
if __name__ == '__main__':
    for row in range(1, 20+1):
        table = ''
        for column in range(1, 20+1):
            table += '{:4} '.format(row * column)
        print(table)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60
4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96 102 108 114 120
7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140
8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 160
9 18 27 36 45 54 63 72 81 90 99 108 117 126 135 144 153 162 171 180
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200
11 22 33 44 55 66 77 88 99 110 121 132 143 154 165 176 187 198 209 220
12 24 36 48 60 72 84 96 108 120 132 144 156 168 180 192 204 216 228 240
13 26 39 52 65 78 91 104 117 130 143 156 169 182 195 208 221 234 247 260
14 28 42 56 70 84 98 112 126 140 154 168 182 196 210 224 238 252 266 280
15 30 45 60 75 90 105 120 135 150 165 180 195 210 225 240 255 270 285 300
16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320
17 34 51 68 85 102 119 136 153 170 187 204 221 238 255 272 289 306 323 340
18 36 54 72 90 108 126 144 162 180 198 216 234 252 270 288 306 324 342 360
19 38 57 76 95 114 133 152 171 190 209 228 247 266 285 304 323 342 361 380
20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 360 380 400
Another possible approach would be to use list comprehensions as follows:
for row in [['{:4}'.format(row * col) for col in range(1, 21)] for row in range(1, 21)]:
    print(''.join(row))
This would give you the following output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60
4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96 102 108 114 120
7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140
8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 160
9 18 27 36 45 54 63 72 81 90 99 108 117 126 135 144 153 162 171 180
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200
11 22 33 44 55 66 77 88 99 110 121 132 143 154 165 176 187 198 209 220
12 24 36 48 60 72 84 96 108 120 132 144 156 168 180 192 204 216 228 240
13 26 39 52 65 78 91 104 117 130 143 156 169 182 195 208 221 234 247 260
14 28 42 56 70 84 98 112 126 140 154 168 182 196 210 224 238 252 266 280
15 30 45 60 75 90 105 120 135 150 165 180 195 210 225 240 255 270 285 300
16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320
17 34 51 68 85 102 119 136 153 170 187 204 221 238 255 272 289 306 323 340
18 36 54 72 90 108 126 144 162 180 198 216 234 252 270 288 306 324 342 360
19 38 57 76 95 114 133 152 171 190 209 228 247 266 285 304 323 342 361 380
20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 360 380 400
I have the following string (say the variable name is "str")
(((TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)) (TRAIN (0 1 2 3 6 7 8 9 10 11 12 13 14 15 16 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 34 35 36 37 39 40 41 42 43 44 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 94 95 96 97 98 99 100 102 103 105 106 107 109 110 111 112 114 115 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 136 137 138 139 140 141 142 143 144 145 147 149 150 151))) ((TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)) (TRAIN (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 36 37 38 39 40 41 42 43 44 45 49 50 51 52 53 54 55 57 58 60 62 63 64 66 67 68 70 72 73 74 75 76 77 78 79 80 81 82 83 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 106 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151)))'
from which I would like to get
['TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)', 'TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)']
using re.findall() function in Python.
I tried the following
m = re.findall(r'TEST\s\((\d+\s?)*\)', str)
for which I get the result
['148', '130']
which is a list of only the last numbers of each set of numbers I want. I don't know why my regexp is wrong. Can someone please help me fix this problem?
Thanks!
Do not use a capturing group that repeats; only the last value will be captured. re.findall() will only return captured groups when you use them.
A non-capturing group for the repeat would work much better here:
m = re.findall(r'TEST\s\((?:\d+\s?)*\)', str)
Demo:
>>> import re
>>> s = '(((TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)) (TRAIN (0 1 2 3 6 7 8 9 10 11 12 13 14 15 16 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 34 35 36 37 39 40 41 42 43 44 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 94 95 96 97 98 99 100 102 103 105 106 107 109 110 111 112 114 115 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 136 137 138 139 140 141 142 143 144 145 147 149 150 151))) ((TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)) (TRAIN (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 36 37 38 39 40 41 42 43 44 45 49 50 51 52 53 54 55 57 58 60 62 63 64 66 67 68 70 72 73 74 75 76 77 78 79 80 81 82 83 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 106 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151)))'
>>> re.findall(r'TEST\s\((?:\d+\s?)*\)', s)
['TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)', 'TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)']
Without the capturing group, re.findall() returns the whole match.
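The difference is easier to see on a short string. The sketch below uses a shortened, made-up input in the same shape as the one in the question; the third pattern is an extra variation (not from the answer above) that captures the whole run of digits so it can be split into integers.

```python
import re

s = "((TEST (1 2 3)) (TRAIN (4 5 6)))"

# Repeated capturing group: findall returns only the group's last capture.
last = re.findall(r'TEST\s\((\d+\s?)*\)', s)

# Non-capturing group: no groups in the pattern, so findall returns whole matches.
whole = re.findall(r'TEST\s\((?:\d+\s?)*\)', s)

# One capturing group around the full digit run, then split into integers.
numbers = [[int(n) for n in run.split()]
           for run in re.findall(r'TEST\s\(([\d\s]*)\)', s)]

print(last)     # ['3']
print(whole)    # ['TEST (1 2 3)']
print(numbers)  # [[1, 2, 3]]
```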
You can use (not worrying about the digits in between):
import re
print(re.findall(r'\((TEST.*?\))\)', s))
['TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)', 'TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)']
Try this one. After TEST it matches every character up to the next closing parenthesis and stops there ([^)]+):
re.findall(r'\((TEST[^)]+\))', s)
It yields:
['TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)',
'TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)']