cumsum in numpy ndarray (edited) - python

I had an issue with cumsum in a dataframe, which was nicely resolved here : https://stackoverflow.com/a/61842690/7937578
But when I tried to do it with my entire dataframe, I couldn't fit all my data into pandas, so I tried converting it to numpy arrays only, but I can't seem to reproduce the code in numpy only.
So far I have this :
import numpy as np
test = np.arange(200).reshape(4, 50)
test[2] = np.random.choice([-1, 0, 1], size=50)
TARGET_SUM = 10
x = np.cumsum(test[2] != 0)
changing = np.roll(x, 1) != x
indices = np.where(changing & (x % TARGET_SUM == 0) & (x > 0))[0]
indices = np.concatenate(([-1,], indices))
indices += 1
for i1, i2 in zip(indices[0:-1], indices[1:]):
    print(i1, i2)
    print(test[i1:i2])
But the output is this:
0 13
[[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44 45 46 47 48 49]
[ 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
86 87 88 89 90 91 92 93 94 95 96 97 98 99]
[ -1 1 0 -1 -1 0 0 -1 1 -1 1 1 -1 -1 0 -1 0 0
0 -1 0 0 -1 1 -1 1 1 -1 -1 0 1 0 0 -1 1 -1
1 0 0 0 1 0 -1 1 1 1 1 1 1 1]
[150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167
168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185
186 187 188 189 190 191 192 193 194 195 196 197 198 199]]
13 29
[]
29 46
[]
Whereas it should look more like this:
0 13
[[ 0 1 2 3 4 5 6 7 8 9 10 11 12]
[ 50 51 52 53 54 55 56 57 58 59 60 61 62]
[ -1 1 0 -1 -1 0 0 -1 1 -1 1 1 -1]
[ 150 151 152 153 154 155 156 157 158 159 160 161 162]]
13 29
etc...

The solution from @Ben.T just above worked perfectly!
Quoting : "I think suppr is a list of arrays, so I guess what you need is print(np.array(suppr)[:, i1:i2]) and if it is already an array, then suppr[:, i1:i2] should be enough. "
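Applied to the example from the question, the quoted fix amounts to slicing along the columns (test[:, i1:i2]) instead of along the rows (test[i1:i2]). A minimal sketch, assuming the data is already a 2-D array; the seeded generator and the chunks list are only there to make the example reproducible and are not part of the original code:

```python
import numpy as np

# Seeded generator (an assumption, used only for reproducibility)
rng = np.random.default_rng(0)
test = np.arange(200).reshape(4, 50)
test[2] = rng.choice([-1, 0, 1], size=50)
TARGET_SUM = 10

# Running count of non-zero entries in row 2
x = np.cumsum(test[2] != 0)
changing = np.roll(x, 1) != x
# Column positions where the count first reaches a multiple of TARGET_SUM
indices = np.where(changing & (x % TARGET_SUM == 0) & (x > 0))[0]
indices = np.concatenate(([-1], indices)) + 1

# Slice columns, not rows: every chunk keeps all 4 rows
chunks = [test[:, i1:i2] for i1, i2 in zip(indices[:-1], indices[1:])]
for (i1, i2), chunk in zip(zip(indices[:-1], indices[1:]), chunks):
    print(i1, i2)
    print(chunk)
```

Each chunk then holds all four rows for the columns between two consecutive boundaries, and row 2 of each chunk contains exactly TARGET_SUM non-zero values.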

Related

Linear warping of pandas time-series

I have a pandas data frame, df:
import pandas as pd
import numpy as np
np.random.seed(123)
s = np.arange(5)
frames = []
for i in s:
    s_df = pd.DataFrame({'time':np.arange(100),
                         'x':np.arange(100),
                         'y':np.arange(100),
                         'r':np.random.randint(60,100)})
    s_df['unit'] = str(i)
    frames.append(s_df)
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df = pd.concat(frames)
I want to select the 'x' and 'y' data for each 'unit', from 'time' 0 up until its value of 'r', and then warp the selected data to fit a new normalized timescale of 0-100. The new DataFrame should look the same, but x and y will have been stretched to fit the new timescale.
I think you can start with this and modify:
df.groupby('unit', as_index=False, group_keys=False)\
  .apply(lambda g: g[g.time <= g.r.max()].pipe(
      lambda x: x.assign(x = np.interp(x.time * 100/x.r.max(), g.time, g.x),
                         y = np.interp(x.time * 100/x.r.max(), g.time, g.y))))
Output:
r time x y unit
0 91 0 0.369445 0.802790 0
1 91 1 0.802881 0.411523 0
2 91 2 0.080290 0.228482 0
3 91 3 0.248865 0.624470 0
4 91 4 0.350376 0.207805 0
5 91 5 0.604447 0.495269 0
6 91 6 0.402430 0.317250 0
7 91 7 0.205757 0.296371 0
8 91 8 0.426954 0.793716 0
9 91 9 0.728095 0.486691 0
10 91 10 0.087941 0.701258 0
11 91 11 0.653719 0.937834 0
12 91 12 0.702571 0.381267 0
13 91 13 0.113419 0.492686 0
14 91 14 0.381172 0.539422 0
15 91 15 0.490320 0.166290 0
16 91 16 0.440490 0.029675 0
17 91 17 0.663973 0.245057 0
18 91 18 0.273116 0.280711 0
19 91 19 0.807658 0.869288 0
20 91 20 0.227972 0.987803 0
21 91 21 0.747295 0.526613 0
22 91 22 0.491929 0.118479 0
23 91 23 0.403465 0.564284 0
24 91 24 0.618359 0.648467 0
25 91 25 0.867436 0.447866 0
26 91 26 0.487128 0.526473 0
27 91 27 0.135412 0.855466 0
28 91 28 0.469281 0.753690 0
29 91 29 0.397495 0.786670 0
.. .. ... ... ... ...
53 82 53 0.985053 0.534743 4
54 82 54 0.255997 0.789710 4
55 82 55 0.629316 0.889916 4
56 82 56 0.730672 0.539548 4
57 82 57 0.484289 0.278669 4
58 82 58 0.120573 0.754350 4
59 82 59 0.071606 0.912240 4
60 82 60 0.126613 0.775831 4
61 82 61 0.392633 0.706384 4
62 82 62 0.312653 0.698514 4
63 82 63 0.164337 0.420798 4
64 82 64 0.655284 0.317136 4
65 82 65 0.526961 0.484673 4
66 82 66 0.205197 0.516752 4
67 82 67 0.405965 0.314419 4
68 82 68 0.892710 0.620090 4
69 82 69 0.351876 0.422846 4
70 82 70 0.601449 0.152340 4
71 82 71 0.187239 0.486854 4
72 82 72 0.757108 0.727058 4
73 82 73 0.728311 0.623236 4
74 82 74 0.725225 0.279149 4
75 82 75 0.536730 0.746806 4
76 82 76 0.584319 0.543595 4
77 82 77 0.591636 0.451003 4
78 82 78 0.042806 0.766688 4
79 82 79 0.326183 0.832956 4
80 82 80 0.558992 0.507238 4
81 82 81 0.303649 0.143872 4
82 82 82 0.303214 0.113151 4
[428 rows x 5 columns]
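One possible reading of the requested warp is to rescale each unit's selected times from [0, r] onto [0, 100] and then resample x and y on an evenly spaced 0-100 grid with np.interp. The following is only a sketch under that assumption; warp_unit and n_points are hypothetical names, not taken from the question or the answer:

```python
import numpy as np
import pandas as pd

def warp_unit(g, n_points=101):
    """Stretch the rows with time <= r of one unit onto a 0-100 timescale."""
    r = g['r'].iloc[0]
    sel = g[g['time'] <= r]
    new_time = np.linspace(0, 100, n_points)
    # Original sample times, rescaled so that time r lands on 100
    scaled = sel['time'].to_numpy() * 100.0 / r
    return pd.DataFrame({
        'time': new_time,
        'x': np.interp(new_time, scaled, sel['x'].to_numpy()),
        'y': np.interp(new_time, scaled, sel['y'].to_numpy()),
        'r': r,
        'unit': g['unit'].iloc[0],
    })

# Same test data as in the question
np.random.seed(123)
frames = []
for i in range(5):
    frames.append(pd.DataFrame({'time': np.arange(100),
                                'x': np.arange(100),
                                'y': np.arange(100),
                                'r': np.random.randint(60, 100),
                                'unit': str(i)}))
df = pd.concat(frames, ignore_index=True)

warped = pd.concat([warp_unit(g) for _, g in df.groupby('unit')],
                   ignore_index=True)
print(warped.head())
```

With this reading, every unit ends up with the same number of rows, and the value that x had at time r appears at the new time 100.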

Matplotlib ScalarMappable returning only black color

I am using matplotlib to create many plots. The plots involve making many FancyBboxPatches and setting the color for each patch using a ScalarMappable. Each plot corresponds to a "time step" from a physical process. I have made the following minimal working example to illustrate what I am trying to do and the problem I am having.
Suppose there is a file data.txt. If a line has one entry, that value is the time step. If a line has three entries, then the first entry is the x value, the second entry is the y value, and the third entry is the value that will use the ScalarMappable. Here is an example of data.txt:
1
0 0 0.1
0 1 1
0 2 2
0 3 3
0 4 4
1 0 10
1 1 11
1 2 12
1 3 13
1 4 14
2 0 20
2 1 21
2 2 22
2 3 23
2 4 24
3 0 30
3 1 31
3 2 32
3 3 33
3 4 34
2
1 0 10
1 1 11
1 2 12
1 3 13
1 4 14
2 0 110
2 1 111
2 2 112
2 3 113
2 4 114
3 0 120
3 1 121
3 2 122
3 3 123
3 4 124
4 0 130
4 1 131
4 2 132
4 3 133
4 4 134
3
2 0 110
2 1 111
2 2 112
2 3 113
2 4 114
3 0 1110
3 1 1111
3 2 1112
3 3 1113
3 4 1114
4 0 1120
4 1 1121
4 2 1122
4 3 1123
4 4 1124
5 0 1130
5 1 1131
5 2 1132
5 3 1133
5 4 1134
4
3 0 1110
3 1 1111
3 2 1112
3 3 1113
3 4 1114
4 0 11110
4 1 11111
4 2 11112
4 3 11113
4 4 11114
5 0 11120
5 1 11121
5 2 11122
5 3 11123
5 4 11124
6 0 11130
6 1 11131
6 2 11132
6 3 11133
6 4 11134
Here is the script I use to generate the plots:
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from matplotlib.colors import LogNorm
from matplotlib.patches import FancyBboxPatch
def parse_file(file_name):
    output = {}
    with open(file_name, 'r') as data_file:
        for line in data_file:
            entries = line.strip().split()
            if len(entries) == 1:
                time_step = int(entries[0])
                output[time_step] = {}
            elif len(entries) == 3:
                x = float(entries[0])
                y = float(entries[1])
                value = float(entries[2])
                output[time_step][(x, y)] = value
            else:
                raise RuntimeError('Anomalous line {} in file {}'.format(line, data_file.name))
    return output
def main():
    fig, axes = plt.subplots()
    axes.set_xlim(-1,10)
    axes.set_ylim(-1,10)
    cmap = cm.plasma
    norm = LogNorm(vmin = 1e-2, vmax = 1.2e4)
    smap = cm.ScalarMappable(norm = norm, cmap = cmap)
    smap.set_array([])
    color_bar = fig.colorbar(mappable = smap, ax = axes, orientation = 'vertical', label = 'label')
    data = parse_file(file_name = 'data.txt')
    for time_step, information in data.items():
        cells = []
        for (x,y), value in information.items():
            cell = FancyBboxPatch(xy = (x - 0.5, y - 0.5),
                                  width = 1, height = 1,
                                  boxstyle = 'square,pad=0.',
                                  edgecolor = 'black',
                                  facecolor = smap.to_rgba(value))
            #print(time_step, '\t', x, '\t', y, '\t', value, '\t', smap.to_rgba(value))
            axes.add_patch(cell)
            cells.append(cell)
        fig.savefig('time-step_{}.png'.format(time_step))
        for cell in cells:
            cell.remove()
if __name__ == '__main__':
    main()
And here is one of the plots that is created from running that script:
This plot (and the other three that are created, but not shown here) look fine. So I am confident that I am using ScalarMappable correctly. Now I take the actual data I want to plot, again in a file called data.txt. The format is the same as before, except if a line has four entries, then the first entry is the time step (and I do not care about the other entries). Here is an example of data.txt:
2 0.424066E-02 0.200000E+01 0.885500E+08
0 1 0.850703E+00
1 3 0.388551E-09
2 4 0.141948E-06
2 6 0.126299E-09
3 9 0.166871E-08
4 12 0.340738E-08
5 13 0.246948E-09
5 14 0.129005E-09
6 16 0.140043E-08
6 17 0.885307E-09
26 76 0.591676E-08
26 78 0.745985E-08
27 77 0.263136E-08
27 78 0.131857E-08
27 79 0.151193E-05
27 80 0.265941E-07
27 81 0.170975E-05
27 82 0.206355E-08
27 83 0.334444E-07
28 80 0.569439E-05
28 81 0.864904E-07
28 82 0.114196E-02
28 83 0.130067E-06
28 84 0.608045E-04
28 85 0.351649E-07
28 86 0.543117E-07
28 88 0.202115E-08
29 83 0.225374E-07
29 84 0.125586E-07
29 85 0.253383E-04
29 86 0.943810E-06
29 87 0.104539E-04
29 88 0.210241E-06
29 89 0.196533E-03
29 90 0.707278E-06
29 91 0.565096E-05
29 92 0.840856E-08
29 93 0.277478E-07
30 86 0.707234E-09
30 88 0.549048E-07
30 89 0.281776E-08
30 90 0.259219E-04
30 91 0.298973E-06
30 92 0.311047E-04
30 93 0.144465E-05
30 94 0.632642E-04
30 95 0.787893E-08
30 96 0.252900E-08
31 91 0.425350E-08
31 92 0.371105E-08
31 93 0.621869E-05
31 94 0.680069E-06
31 95 0.315149E-04
31 96 0.670790E-07
31 97 0.568911E-06
31 98 0.187946E-08
31 99 0.135024E-07
32 94 0.384693E-09
32 96 0.174407E-06
32 97 0.480216E-08
32 98 0.244989E-05
32 99 0.876257E-07
32 100 0.189371E-04
32 101 0.264917E-06
32 102 0.297745E-05
32 103 0.213684E-09
33 99 0.110356E-08
33 100 0.131345E-08
33 101 0.448076E-06
33 102 0.106369E-06
33 103 0.128984E-04
33 104 0.230382E-07
33 105 0.266535E-07
34 102 0.428166E-08
34 103 0.668242E-08
34 104 0.842244E-05
34 105 0.843016E-07
34 106 0.137510E-05
34 107 0.879097E-08
34 108 0.758233E-07
35 105 0.280844E-06
35 106 0.639110E-07
35 107 0.497335E-05
35 108 0.260105E-06
35 109 0.188060E-05
35 110 0.375853E-09
35 111 0.935430E-09
35 112 0.138533E-07
35 113 0.101658E-06
35 114 0.504823E-09
35 115 0.989704E-09
35 116 0.152468E-06
35 117 0.220735E-07
36 114 0.430884E-08
36 116 0.115980E-07
36 117 0.128436E-05
36 118 0.814433E-05
37 117 0.316595E-09
37 118 0.141531E-06
37 119 0.965141E-05
38 119 0.459954E-08
38 120 0.114088E-04
38 121 0.198695E-09
39 120 0.109457E-08
39 121 0.105160E-04
39 122 0.254984E-08
40 122 0.717566E-05
40 123 0.179081E-08
40 124 0.352463E-09
41 123 0.454357E-05
41 124 0.629608E-07
41 125 0.777480E-07
42 124 0.453866E-05
42 125 0.108592E-06
42 126 0.320262E-06
42 127 0.252596E-09
42 128 0.114714E-09
43 125 0.372578E-06
43 126 0.344297E-07
43 127 0.188018E-05
43 128 0.631276E-08
43 129 0.368003E-08
44 126 0.170090E-07
44 127 0.121695E-07
44 128 0.147407E-05
44 129 0.349674E-07
44 130 0.767494E-06
45 128 0.193141E-09
45 129 0.361851E-06
45 130 0.573704E-07
45 131 0.457287E-06
45 132 0.148004E-08
45 133 0.164772E-07
45 134 0.386942E-09
45 135 0.539603E-08
45 136 0.227778E-09
45 137 0.640126E-08
45 138 0.189604E-09
45 139 0.754561E-09
46 132 0.215880E-07
46 134 0.102847E-08
46 136 0.628736E-08
46 137 0.427124E-09
46 138 0.711664E-07
46 139 0.749082E-08
46 140 0.425043E-06
46 141 0.776307E-08
46 142 0.102985E-06
46 143 0.693232E-09
46 144 0.215846E-08
47 141 0.660244E-08
47 142 0.901189E-09
47 143 0.299062E-07
47 144 0.195833E-08
47 145 0.178405E-07
47 146 0.558550E-09
47 147 0.235167E-08
48 144 0.393065E-09
48 146 0.493252E-08
48 147 0.299176E-09
48 148 0.130504E-07
48 149 0.244654E-09
48 150 0.143702E-08
49 149 0.565286E-09
49 151 0.122230E-08
3 0.424066E-02 0.200000E+01 0.885500E+08
0 1 0.850710E+00
1 3 0.388551E-09
2 4 0.141948E-06
2 6 0.126299E-09
3 9 0.166871E-08
4 12 0.340738E-08
5 13 0.246948E-09
5 14 0.129005E-09
6 16 0.140043E-08
6 17 0.885307E-09
26 76 0.593799E-08
26 78 0.747463E-08
27 77 0.283934E-08
27 78 0.115725E-08
27 79 0.153613E-05
27 80 0.236099E-08
27 81 0.171178E-05
27 83 0.334426E-07
28 80 0.575684E-05
28 81 0.242170E-07
28 82 0.114208E-02
28 83 0.133947E-07
28 84 0.608362E-04
28 85 0.335522E-08
28 86 0.543624E-07
28 88 0.202170E-08
29 83 0.258149E-07
29 84 0.107337E-07
29 85 0.261133E-04
29 86 0.167223E-06
29 87 0.108977E-04
29 88 0.432469E-08
29 89 0.196993E-03
29 90 0.997563E-08
29 91 0.565922E-05
29 92 0.127589E-09
29 93 0.277365E-07
30 86 0.731139E-09
30 88 0.613936E-07
30 89 0.984612E-09
30 90 0.261316E-04
30 91 0.845314E-07
30 92 0.324848E-04
30 93 0.656773E-07
30 94 0.632706E-04
30 95 0.335583E-09
30 96 0.252938E-08
31 91 0.529954E-08
31 92 0.394099E-08
31 93 0.681605E-05
31 94 0.104800E-06
31 95 0.315602E-04
31 96 0.231610E-08
31 97 0.566868E-06
31 99 0.135330E-07
32 94 0.450380E-09
32 96 0.178679E-06
32 97 0.955313E-09
32 98 0.252946E-05
32 99 0.770340E-08
32 100 0.191937E-04
32 101 0.825856E-08
32 102 0.297762E-05
33 99 0.128999E-08
33 100 0.146516E-08
33 101 0.616111E-06
33 102 0.539415E-07
33 103 0.128046E-04
33 104 0.865090E-09
33 105 0.266759E-07
34 102 0.899336E-08
34 103 0.331924E-08
34 104 0.850733E-05
34 105 0.462457E-08
34 106 0.137714E-05
34 107 0.199044E-09
34 108 0.758844E-07
35 105 0.308602E-06
35 106 0.470668E-07
35 107 0.520013E-05
35 108 0.458893E-07
35 109 0.185756E-05
35 111 0.159320E-07
35 112 0.729552E-09
35 113 0.101697E-06
35 114 0.135746E-09
35 115 0.128676E-06
35 116 0.231448E-07
35 117 0.220783E-07
36 114 0.480979E-08
36 116 0.921582E-06
36 117 0.373798E-06
36 118 0.814449E-05
37 117 0.888355E-08
37 118 0.132905E-06
37 119 0.965147E-05
38 118 0.360663E-09
38 119 0.423745E-08
38 120 0.114090E-04
39 120 0.109122E-08
39 121 0.105186E-04
40 122 0.717737E-05
40 124 0.352428E-09
41 123 0.460618E-05
41 124 0.358205E-09
41 125 0.777514E-07
42 124 0.464136E-05
42 125 0.589035E-08
42 126 0.320503E-06
42 128 0.114709E-09
43 125 0.408148E-06
43 126 0.567978E-08
43 127 0.187958E-05
43 129 0.368007E-08
44 126 0.258868E-07
44 127 0.348446E-08
44 128 0.150718E-05
44 129 0.167101E-08
44 130 0.767515E-06
45 128 0.176686E-09
45 129 0.403334E-06
45 130 0.162718E-07
45 131 0.458273E-06
45 132 0.196826E-09
45 133 0.167474E-07
45 135 0.563904E-08
45 137 0.655709E-08
45 139 0.751998E-09
46 132 0.216010E-07
46 134 0.107901E-08
46 136 0.673825E-08
46 138 0.784839E-07
46 139 0.220743E-09
46 140 0.432287E-06
46 141 0.427029E-09
46 142 0.103696E-06
46 144 0.211976E-08
47 141 0.696394E-08
47 142 0.585710E-09
47 143 0.315456E-07
47 144 0.425448E-09
47 145 0.181981E-07
47 146 0.136911E-09
47 147 0.226765E-08
48 144 0.442465E-09
48 146 0.553370E-08
48 147 0.138932E-09
48 148 0.128376E-07
48 150 0.144107E-08
49 149 0.624360E-09
49 151 0.123765E-08
The script that I use to create the plots is almost the same as before. The only differences are (1) how data.txt is parsed, (2) setting the limits of the x and y axes, and (3) the variable norm. Here is the script:
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from matplotlib.colors import LogNorm
from matplotlib.patches import FancyBboxPatch
def parse_file(file_name):
    output = {}
    with open(file_name, 'r') as data_file:
        for line in data_file:
            entries = line.strip().split()
            if len(entries) == 4:
                time_step = int(entries[0])
                output[time_step] = {}
            elif len(entries) == 3:
                x = float(entries[0])
                y = float(entries[1])
                value = float(entries[2])
                output[time_step][(x, y)] = value
            else:
                raise RuntimeError('Anomalous line {} in file {}'.format(line, data_file.name))
    return output
def main():
    fig, axes = plt.subplots()
    axes.set_xlim(0,150)
    axes.set_ylim(0,250)
    cmap = cm.plasma
    norm = LogNorm(vmin = pow(10, -10), vmax = pow(10, -2.2))
    smap = cm.ScalarMappable(norm = norm, cmap = cmap)
    smap.set_array([])
    color_bar = fig.colorbar(mappable = smap, ax = axes, orientation = 'vertical', label = 'label')
    data = parse_file(file_name = 'data.txt')
    for time_step, information in data.items():
        cells = []
        for (x,y), value in information.items():
            cell = FancyBboxPatch(xy = (x - 0.5, y - 0.5),
                                  width = 1, height = 1,
                                  boxstyle = 'square,pad=0.',
                                  edgecolor = 'black',
                                  facecolor = smap.to_rgba(value))
            #print(time_step, '\t', x, '\t', y, '\t', value, '\t', smap.to_rgba(value))
            axes.add_patch(cell)
            cells.append(cell)
        fig.savefig('time-step_{}.png'.format(time_step))
        for cell in cells:
            cell.remove()
if __name__ == '__main__':
    main()
Now all of the patches are black. Here is one of the plots that is created:
I do not see anything obviously wrong using the print statement (which is commented out in the script):
print(time_step, '\t', x, '\t', y, '\t', value, '\t', smap.to_rgba(value))
Why are all the FancyBboxPatches black instead of the color I have chosen with the ScalarMappable (and how can I make them be the color I have chosen with the ScalarMappable)?
It doesn't look as if the patches are black. I would guess that they are just too small, so that their edge (which is black) covers the complete area of the patch. You may use a thinner edge, or no edge at all, or you may set the edgecolor to a value of your liking as well. In general, you may also use simple patches like Rectangle instead of the FancyBboxPatch.
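The suggestion can be sketched as follows, assuming the same axis limits and normalization as in the second script; the three sample cells are copied from the data above, and the Agg backend is selected only so the sketch runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend (an assumption for headless runs)
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import LogNorm
from matplotlib.patches import Rectangle

norm = LogNorm(vmin=1e-10, vmax=10 ** -2.2)
smap = cm.ScalarMappable(norm=norm, cmap=plt.get_cmap('plasma'))

fig, axes = plt.subplots()
axes.set_xlim(0, 150)
axes.set_ylim(0, 250)

# A few (x, y) -> value cells taken from the data file shown above
sample = {(28, 82): 0.114196e-02, (29, 89): 0.196533e-03, (0, 1): 0.850703}

patches = []
for (x, y), value in sample.items():
    # edgecolor='none' drops the black outline, so the face colour stays
    # visible even when a cell covers only a few pixels
    cell = Rectangle((x - 0.5, y - 0.5), 1, 1,
                     facecolor=smap.to_rgba(value), edgecolor='none')
    axes.add_patch(cell)
    patches.append(cell)
```

Calling fig.savefig as in the original script then writes the figure. Passing linewidth=0, or setting edgecolor to the same colour as facecolor, should have a similar effect.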

Sliding a subarray along a 2d array

This is something I've been struggling with for a couple of weeks. The algorithm is the following:
1. Select a subarray as an array of rows and columns from a larger array
2. Compute the median of the subarray
3. Replace cells in subarray with median value
4. Move the subarray to the right by its own length
5. Repeat to end of array
6. Move subarray down by its own height
7. Repeat
I've got steps 1 to 3 as follows:
import numpy as np
w1 = np.arange(100).reshape(10,10)
side = 3
patch = w1[0:side, 0:side]
i, j = patch.shape
for j in range(side):
    for i in range(side):
        patch[i,j] = np.median(patch)
Eventually, I'll be using a 901x877 array from an image but I'm just trying to get a hold of this simple task first. How can I slide the array along and then down with a loop?
You can use scikit-image's view_as_blocks and NumPy broadcasting to vectorize the operation:
import numpy as np
import skimage.util
w1 = np.arange(144).reshape(12,12)
print(w1)
# [[ 0 1 2 3 4 5 6 7 8 9 10 11]
# [ 12 13 14 15 16 17 18 19 20 21 22 23]
# [ 24 25 26 27 28 29 30 31 32 33 34 35]
# [ 36 37 38 39 40 41 42 43 44 45 46 47]
# [ 48 49 50 51 52 53 54 55 56 57 58 59]
# [ 60 61 62 63 64 65 66 67 68 69 70 71]
# [ 72 73 74 75 76 77 78 79 80 81 82 83]
# [ 84 85 86 87 88 89 90 91 92 93 94 95]
# [ 96 97 98 99 100 101 102 103 104 105 106 107]
# [108 109 110 111 112 113 114 115 116 117 118 119]
# [120 121 122 123 124 125 126 127 128 129 130 131]
# [132 133 134 135 136 137 138 139 140 141 142 143]]
side = 3
w2 = skimage.util.view_as_blocks(w1, (side, side))
w2[...] = np.median(w2, axis=(-2, -1))[:, :, None, None]
print(w1)
# [[ 13 13 13 16 16 16 19 19 19 22 22 22]
# [ 13 13 13 16 16 16 19 19 19 22 22 22]
# [ 13 13 13 16 16 16 19 19 19 22 22 22]
# [ 49 49 49 52 52 52 55 55 55 58 58 58]
# [ 49 49 49 52 52 52 55 55 55 58 58 58]
# [ 49 49 49 52 52 52 55 55 55 58 58 58]
# [ 85 85 85 88 88 88 91 91 91 94 94 94]
# [ 85 85 85 88 88 88 91 91 91 94 94 94]
# [ 85 85 85 88 88 88 91 91 91 94 94 94]
# [121 121 121 124 124 124 127 127 127 130 130 130]
# [121 121 121 124 124 124 127 127 127 130 130 130]
# [121 121 121 124 124 124 127 127 127 130 130 130]]
Note that I had to change the size of your array to 12x12 so that all of your tiles of 3x3 actually fit in there.
Here are a few "code smells" I see.
Start with range(side): since side is set to 3, you are going to get [0, 1, 2]. Is that what you really want?
You set i, j = patch.shape, then immediately overwrite these values in your for loops.
Finally, you're recalculating the median on every iteration, after the patch has already been partly overwritten.
OK, here's what I'd do.
Figure out how many patches you'll need in both width and height, and multiply those by the size of the side.
Slice your array (matrix) up into those pieces.
Assign the median to each patch.
import numpy as np
w1 = np.arange(100).reshape(10,10)
side = 3
w, h = w1.shape
# Start offsets of the complete tiles; a trailing partial tile is skipped
width_index = np.array(range(w//side)) * side
height_index = np.array(range(h//side)) * side
def assign_patch(patch, median, side):
    """Break this loop out to prevent 4 nested 'for' loops"""
    for j in range(side):
        for i in range(side):
            patch[i,j] = median
    return patch
for width in width_index:
    for height in height_index:
        patch = w1[width:width+side, height:height+side]
        median = np.median(patch)
        assign_patch(patch, median, side)
print(w1)
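If the array size is not a multiple of the tile size (as with the 10x10 example, or the 901x877 image mentioned in the question), plain slicing in steps of side also covers the partial tiles at the right and bottom edges. A minimal sketch; median_tiles is a hypothetical helper name:

```python
import numpy as np

def median_tiles(arr, side):
    """Replace every side x side tile (and partial edge tile) by its median."""
    out = arr.copy()
    rows, cols = arr.shape
    for r in range(0, rows, side):
        for c in range(0, cols, side):
            # Slices are clipped at the array boundary, so edge tiles
            # simply come out smaller than side x side
            tile = arr[r:r + side, c:c + side]
            out[r:r + side, c:c + side] = np.median(tile)
    return out

w1 = np.arange(100).reshape(10, 10)
result = median_tiles(w1, 3)
print(result)
```

The median is computed once per tile from the untouched input, and the result is written to a copy, so earlier assignments cannot influence later medians. For an integer array the median is cast back to the integer dtype on assignment.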

Multiplication table format

I've created a simple script that will make a multiplication table and output it. It works and is pretty cool, but I would like to know if there's a way I could fix it for when it goes higher than 10. After 10 (on the row) it will be one whitespace off from the rest of the table. How can I fix this little format issue?
if __name__ == '__main__':
    for row in range(1, 20+1):
        table = ''
        for column in range(1, 20+1):
            table += '{:4} '.format(row * column)
        print(table.strip())
Example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60
4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96 102 108 114 120
7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140
8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 160
9 18 27 36 45 54 63 72 81 90 99 108 117 126 135 144 153 162 171 180
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200
11 22 33 44 55 66 77 88 99 110 121 132 143 154 165 176 187 198 209 220
12 24 36 48 60 72 84 96 108 120 132 144 156 168 180 192 204 216 228 240
13 26 39 52 65 78 91 104 117 130 143 156 169 182 195 208 221 234 247 260
14 28 42 56 70 84 98 112 126 140 154 168 182 196 210 224 238 252 266 280
15 30 45 60 75 90 105 120 135 150 165 180 195 210 225 240 255 270 285 300
16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320
17 34 51 68 85 102 119 136 153 170 187 204 221 238 255 272 289 306 323 340
18 36 54 72 90 108 126 144 162 180 198 216 234 252 270 288 306 324 342 360
19 38 57 76 95 114 133 152 171 190 209 228 247 266 285 304 323 342 361 380
20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 360 380 400
You can left adjust string with spaces with {:<6}
for row in range(1, 20+1):
    table = ''
    for column in range(1, 20+1):
        table += '{:<6} '.format(row * column)
    print(table.strip())
Output
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60
4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96 102 108 114 120
7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140
8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 160
9 18 27 36 45 54 63 72 81 90 99 108 117 126 135 144 153 162 171 180
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200
11 22 33 44 55 66 77 88 99 110 121 132 143 154 165 176 187 198 209 220
12 24 36 48 60 72 84 96 108 120 132 144 156 168 180 192 204 216 228 240
13 26 39 52 65 78 91 104 117 130 143 156 169 182 195 208 221 234 247 260
14 28 42 56 70 84 98 112 126 140 154 168 182 196 210 224 238 252 266 280
15 30 45 60 75 90 105 120 135 150 165 180 195 210 225 240 255 270 285 300
16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320
17 34 51 68 85 102 119 136 153 170 187 204 221 238 255 272 289 306 323 340
18 36 54 72 90 108 126 144 162 180 198 216 234 252 270 288 306 324 342 360
19 38 57 76 95 114 133 152 171 190 209 228 247 266 285 304 323 342 361 380
20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 360 380 400
So a simple way of doing this is using generators (I find it more readable right justified):
>>> n = 10
>>> print('\n'.join(''.join(format(i*j, ' >4') for i in range(1, n+1)) for j in range(1, n+1)))
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
10 20 30 40 50 60 70 80 90 100
If you need to work out the maximum width dynamically then you can use math.log10():
>>> import math
>>> n = 9
>>> w = int(math.log10(n**2))+1
>>> print('\n'.join(' '.join(format(i*j, ' >'+str(w)) for i in range(1, n+1)) for j in range(1, n+1)))
1 2 3 4 5 6 7 8 9
2 4 6 8 10 12 14 16 18
3 6 9 12 15 18 21 24 27
4 8 12 16 20 24 28 32 36
5 10 15 20 25 30 35 40 45
6 12 18 24 30 36 42 48 54
7 14 21 28 35 42 49 56 63
8 16 24 32 40 48 56 64 72
9 18 27 36 45 54 63 72 81
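As an alternative to math.log10(), the column width can be taken from the number of digits in the largest product, n * n. A minimal sketch; multiplication_table is a hypothetical helper name:

```python
def multiplication_table(n):
    """Return an n x n multiplication table with right-aligned columns."""
    # The widest entry is n * n, so its digit count fixes the column width
    width = len(str(n * n))
    lines = []
    for row in range(1, n + 1):
        cells = (str(row * col).rjust(width) for col in range(1, n + 1))
        lines.append(' '.join(cells))
    return '\n'.join(lines)

print(multiplication_table(20))
```

Because every cell is padded to the same width and the rows are never stripped, all lines have the same length regardless of n.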
Your code example is not working because of the final "strip": it removes the leading whitespace of each row, so rows whose first number has fewer digits are shifted to the left.
Just remove the strip:
if __name__ == '__main__':
    for row in range(1, 20+1):
        table = ''
        for column in range(1, 20+1):
            table += '{:4} '.format(row * column)
        print(table)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60
4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96 102 108 114 120
7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140
8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 160
9 18 27 36 45 54 63 72 81 90 99 108 117 126 135 144 153 162 171 180
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200
11 22 33 44 55 66 77 88 99 110 121 132 143 154 165 176 187 198 209 220
12 24 36 48 60 72 84 96 108 120 132 144 156 168 180 192 204 216 228 240
13 26 39 52 65 78 91 104 117 130 143 156 169 182 195 208 221 234 247 260
14 28 42 56 70 84 98 112 126 140 154 168 182 196 210 224 238 252 266 280
15 30 45 60 75 90 105 120 135 150 165 180 195 210 225 240 255 270 285 300
16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320
17 34 51 68 85 102 119 136 153 170 187 204 221 238 255 272 289 306 323 340
18 36 54 72 90 108 126 144 162 180 198 216 234 252 270 288 306 324 342 360
19 38 57 76 95 114 133 152 171 190 209 228 247 266 285 304 323 342 361 380
20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 360 380 400
Another possible approach would be to use list comprehensions as follows:
for row in [['{:4}'.format(row * col) for col in range(1, 21)] for row in range(1, 21)]:
    print(''.join(row))
This would give you the following output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60
4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96 102 108 114 120
7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140
8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 160
9 18 27 36 45 54 63 72 81 90 99 108 117 126 135 144 153 162 171 180
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200
11 22 33 44 55 66 77 88 99 110 121 132 143 154 165 176 187 198 209 220
12 24 36 48 60 72 84 96 108 120 132 144 156 168 180 192 204 216 228 240
13 26 39 52 65 78 91 104 117 130 143 156 169 182 195 208 221 234 247 260
14 28 42 56 70 84 98 112 126 140 154 168 182 196 210 224 238 252 266 280
15 30 45 60 75 90 105 120 135 150 165 180 195 210 225 240 255 270 285 300
16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320
17 34 51 68 85 102 119 136 153 170 187 204 221 238 255 272 289 306 323 340
18 36 54 72 90 108 126 144 162 180 198 216 234 252 270 288 306 324 342 360
19 38 57 76 95 114 133 152 171 190 209 228 247 266 285 304 323 342 361 380
20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 360 380 400

How to apply a function to rows of two pandas DataFrame

There are two pandas DataFrame, say dfx, dfy of the same shape and exactly the same column and row indices. I want to apply a function to the corresponding rows of these two DataFrame.
In other words, suppose we have a function as follows
def fun(row_x, row_y):
    ...  # a function of the corresponding rows
Let index be the common index of dfx, dfy. I want to compute in pandas the following list/Series
[fun(dfx[i], dfy[i]) for i in index] (pseudo-code)
By the following code, I make a grouped two-level indexed DataFrame. Then I do not know how to apply agg in the proper way.
dfxy = pd.concat({'dfx':dfx, 'dfy':dfy})
dfxy = dfxy.swaplevel(0,1,axis=0).sort_index(level=0)
grouped=dfxy.groupby(level=0)
In [19]:
dfx = pd.DataFrame(data = np.random.randint(0 , 100 , 50).reshape(10 ,-1) , columns=list('abcde'))
dfx
Out[19]:
a b c d e
3 44 8 55 95
26 5 18 34 10
20 20 91 15 8
83 7 50 47 27
97 65 10 94 93
44 6 70 60 4
38 64 8 67 92
44 21 42 6 12
30 98 34 7 79
76 7 14 58 5
In [4]:
dfy = pd.DataFrame(data = np.random.randint(0 , 100 , 50).reshape(10 ,-1) , columns=list('fghij'))
dfy
Out[4]:
f g h i j
82 48 29 54 78
7 31 78 38 30
90 91 43 8 40
52 88 13 87 39
41 88 90 51 91
55 4 94 62 98
31 23 4 59 93
87 12 33 77 0
25 99 39 23 1
7 50 46 39 66
In [13]:
dfxy = pd.concat({'dfx':dfx, 'dfy':dfy} , axis = 1)
dfxy
Out[13]:
dfx dfy
a b c d e f g h i j
20 76 5 98 38 82 48 29 54 78
39 36 9 3 74 7 31 78 38 30
43 12 50 72 14 90 91 43 8 40
89 41 95 91 86 52 88 13 87 39
33 30 55 64 94 41 88 90 51 91
89 84 48 1 60 55 4 94 62 98
68 40 27 10 63 31 23 4 59 93
33 10 86 89 67 87 12 33 77 0
56 89 0 70 67 25 99 39 23 1
48 58 98 18 24 7 50 46 39 66
def f(x , y):
    return pd.Series(data = [np.mean(x) , np.mean(y)] , index=['x_mean' , 'y_mean'])
In [17]:
dfxy.apply( lambda x : f(x['dfx'] , x['dfy']) , axis = 1)
Out[17]:
x_mean y_mean
0 47.4 58.2
1 32.2 36.8
2 38.2 54.4
3 80.4 55.8
4 55.2 72.2
5 56.4 62.6
6 41.6 42.0
7 57.0 41.8
8 56.4 37.4
9 49.2 41.6
Could this be what you are looking for?
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: dfx = pd.DataFrame(data=np.random.randint(0,100,50).reshape(10,-1),
columns=['index', 'a', 'b', 'c', 'd'])
In [4]: dfy = pd.DataFrame(data=np.random.randint(0,100,50).reshape(10,-1),
columns=['index', 'a', 'b', 'c', 'd'])
In [5]: dfy['index'] = dfx['index']
In [6]: print(dfx)
index a b c d
0 25 41 46 18 98
1 0 21 9 20 29
2 18 78 63 94 70
3 86 71 71 95 64
4 23 33 19 34 29
5 69 10 91 19 42
6 92 68 60 12 58
7 74 49 22 74 1
8 47 35 56 41 80
9 93 20 44 16 49
In [7]: print(dfy)
index a b c d
0 25 28 35 96 89
1 0 44 94 50 43
2 18 18 39 75 45
3 86 18 87 72 88
4 23 2 28 24 4
5 69 53 55 55 40
6 92 0 52 54 91
7 74 8 1 96 59
8 47 74 21 7 7
9 93 42 83 42 60
In [8]: print(dfx.merge(dfy, on='index'))
index a_x b_x c_x d_x a_y b_y c_y d_y
0 25 41 46 18 98 28 35 96 89
1 0 21 9 20 29 44 94 50 43
2 18 78 63 94 70 18 39 75 45
3 86 71 71 95 64 18 87 72 88
4 23 33 19 34 29 2 28 24 4
5 69 10 91 19 42 53 55 55 40
6 92 68 60 12 58 0 52 54 91
7 74 49 22 74 1 8 1 96 59
8 47 35 56 41 80 74 21 7 7
9 93 20 44 16 49 42 83 42 60
In [9]: def my_function(x):
   ...:     return sum(x)
   ...:
In [10]: print(dfx.merge(dfy, on='index').drop('index', axis=1).apply(my_function, axis=1))
0 451
1 310
2 482
3 566
4 173
5 365
6 395
7 310
8 321
9 356
dtype: int64
In [11]: print(pd.DataFrame(
{
'my_function':
dfx.merge(dfy, on='index').\
drop('index', axis=1).apply(my_function, axis=1),
'index':
dfx['index']
}))
index my_function
0 25 451
1 0 310
2 18 482
3 86 566
4 23 173
5 69 365
6 92 395
7 74 310
8 47 321
9 93 356
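The pseudo-code from the question can also be written almost literally by looking up each row with .loc on the shared index. This is only a sketch: the dot product inside fun is a hypothetical stand-in for the real row function, and the random frames are made up for illustration:

```python
import numpy as np
import pandas as pd

def fun(row_x, row_y):
    # Hypothetical example: dot product of the two corresponding rows
    return float(np.dot(row_x.to_numpy(), row_y.to_numpy()))

rng = np.random.default_rng(0)
dfx = pd.DataFrame(rng.integers(0, 100, size=(10, 5)), columns=list('abcde'))
dfy = pd.DataFrame(rng.integers(0, 100, size=(10, 5)), columns=list('abcde'))

# One result per label of the common index
result = pd.Series(
    [fun(dfx.loc[i], dfy.loc[i]) for i in dfx.index],
    index=dfx.index,
)
print(result)
```

This is a plain Python loop, so for large frames a vectorized expression on the underlying arrays will usually be faster when the function allows it.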
