Python: Audio segmentation with overlapping frames and Hamming windows

I would like to do the following:
1. Segment the audio file (divide it into frames). To avoid information loss, the frames should overlap.
2. Apply a window function (Hann, Hamming, Blackman, etc.) to each frame to minimize discontinuities at its beginning and end.
I managed to load the audio file into a NumPy array:
import struct
import wave
import numpy as np
def wave_open(path, normalize=True, rm_constant=False):
    wav = wave.open(path, 'rb')
    frames_n = wav.getnframes()
    channels = wav.getnchannels()
    sample_rate = wav.getframerate()
    duration = frames_n / float(sample_rate)
    read_frames = wav.readframes(frames_n)
    wav.close()
    # Assumes 16-bit samples: one 'h' (signed short) per channel per frame
    data = struct.unpack("%dh" % (channels * frames_n), read_frames)
    if channels == 1:
        data = np.array(data, dtype=np.int16)
        return data
    else:
        print("More channels are not supported")
Then I applied a Hamming window to the whole signal:
N = 11145
win = np.hanning(N)
windowed_signal = (np.fft.rfft(win*data))
But I don't know how to split my signal into frames (segments) before applying the Hamming window.
Please help me :)
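As an alternative to struct.unpack, a minimal sketch that reads the frames straight into a NumPy array with np.frombuffer. It assumes 16-bit PCM (sample width 2 bytes) and, unlike the function above, keeps multi-channel files by reshaping to (frames, channels). The function name read_wav_int16 and the demo file name are hypothetical.

```python
import os
import tempfile
import wave
import numpy as np

def read_wav_int16(path):
    """Return (samples, sample_rate) for a 16-bit PCM WAV file.

    samples has shape (n_frames,) for mono, (n_frames, n_channels) otherwise.
    """
    with wave.open(path, 'rb') as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV PCM data is little-endian signed 16-bit ('<i2')
    samples = np.frombuffer(raw, dtype='<i2')
    if channels > 1:
        # Channels are interleaved frame by frame (L R L R ...)
        samples = samples.reshape(-1, channels)
    return samples, sample_rate

# Usage: write a tiny stereo test file, then read it back
demo_path = os.path.join(tempfile.gettempdir(), 'demo_stereo.wav')
demo = np.array([[0, 1], [2, 3], [4, 5]], dtype='<i2')   # 3 frames x 2 channels
with wave.open(demo_path, 'wb') as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(demo.tobytes())
samples, rate = read_wav_int16(demo_path)
print(samples.shape, rate)   # (3, 2) 8000
```

np.frombuffer returns a read-only view of the bytes; call .copy() or .astype(float) before modifying the samples in place.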

Here is a solution using librosa.
import librosa
import numpy as np
x = np.arange(0, 128)
frame_len, hop_len = 16, 8
# frames has shape (frame_len, n_frames): one frame per column
frames = librosa.util.frame(x, frame_length=frame_len, hop_length=hop_len)
# Multiply every column (frame) by a Hann window of length frame_len
windowed_frames = np.hanning(frame_len).reshape(-1, 1)*frames
# Print frames (transposed so that each printed row is one frame)
for i, frame in enumerate(frames.T):
    print("Frame {}: {}".format(i, frame))
# Print windowed frames
for i, frame in enumerate(windowed_frames.T):
    print("Win Frame {}: {}".format(i, np.round(frame, 3)))
Output:
Frame 0: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
Frame 1: [ 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Frame 2: [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31]
Frame 3: [24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
Frame 4: [32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
Frame 5: [40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55]
Frame 6: [48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63]
Frame 7: [56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71]
Frame 8: [64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
Frame 9: [72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87]
Frame 10: [80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95]
Frame 11: [ 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103]
Frame 12: [ 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111]
Frame 13: [104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119]
Frame 14: [112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127]
Win Frame 0: [0. 0.043 0.331 1.036 2.209 3.75 5.427 6.924 7.913 8.141 7.5 6.075
4.146 2.151 0.605 0. ]
Win Frame 1: [ 0. 0.389 1.654 3.8 6.627 9.75 12.663 14.836 15.825 15.377
13.5 10.493 6.91 3.474 0.951 0. ]
Win Frame 2: [ 0. 0.735 2.978 6.564 11.045 15.75 19.899 22.749 23.738 22.613
19.5 14.911 9.674 4.798 1.297 0. ]
Win Frame 3: [ 0. 1.081 4.301 9.328 15.463 21.75 27.135 30.661 31.65 29.849
25.5 19.329 12.438 6.121 1.643 0. ]
Win Frame 4: [ 0. 1.426 5.625 12.092 19.882 27.75 34.371 38.574 39.563 37.085
31.5 23.747 15.202 7.445 1.988 0. ]
Win Frame 5: [ 0. 1.772 6.948 14.856 24.3 33.75 41.607 46.486 47.476 44.321
37.5 28.165 17.966 8.768 2.334 0. ]
Win Frame 6: [ 0. 2.118 8.272 17.62 28.718 39.75 48.843 54.399 55.388 51.557
43.5 32.584 20.729 10.092 2.68 0. ]
Win Frame 7: [ 0. 2.464 9.595 20.384 33.136 45.75 56.08 62.312 63.301 58.793
49.5 37.002 23.493 11.415 3.026 0. ]
Win Frame 8: [ 0. 2.81 10.919 23.148 37.554 51.75 63.316 70.224 71.213 66.029
55.5 41.42 26.257 12.738 3.372 0. ]
Win Frame 9: [ 0. 3.156 12.242 25.912 41.972 57.75 70.552 78.137 79.126 73.265
61.5 45.838 29.021 14.062 3.718 0. ]
Win Frame 10: [ 0. 3.501 13.566 28.676 46.39 63.75 77.788 86.049 87.038 80.501
67.5 50.256 31.785 15.385 4.063 0. ]
Win Frame 11: [ 0. 3.847 14.889 31.44 50.808 69.75 85.024 93.962 94.951 87.737
73.5 54.674 34.549 16.709 4.409 0. ]
Win Frame 12: [ 0. 4.193 16.213 34.204 55.226 75.75 92.26 101.875 102.864
94.973 79.5 59.092 37.313 18.032 4.755 0. ]
Win Frame 13: [ 0. 4.539 17.536 36.968 59.645 81.75 99.496 109.787 110.776
102.209 85.5 63.51 40.077 19.356 5.101 0. ]
Win Frame 14: [ 0. 4.885 18.86 39.732 64.063 87.75 106.732 117.7 118.689
109.446 91.5 67.929 42.841 20.679 5.447 0. ]

Actually, you're applying a Hann/Hanning window. Use np.hamming() to get a Hamming window.
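To make the Hann/Hamming distinction concrete, here is a small check (a sketch, not part of the answer) that compares NumPy's windows with their textbook formulas, w[k] = 0.5 - 0.5 cos(2πk/(N-1)) for Hann and w[k] = 0.54 - 0.46 cos(2πk/(N-1)) for Hamming. N = 16 is an arbitrary choice.

```python
import numpy as np

N = 16
k = np.arange(N)
# Symmetric textbook definitions of the two windows
hann = 0.5 - 0.5 * np.cos(2 * np.pi * k / (N - 1))
hamming = 0.54 - 0.46 * np.cos(2 * np.pi * k / (N - 1))

print(np.allclose(hann, np.hanning(N)))     # True
print(np.allclose(hamming, np.hamming(N)))  # True
# Hann tapers to zero at both ends; Hamming stops at 0.08
print(np.hanning(N)[0], np.hamming(N)[0])
```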
To split the array into consecutive, non-overlapping frames, you can use np.split() or np.array_split().
Here's an example:
import numpy as np
x = np.arange(0,128)
frame_size = 16
y = np.split(x,range(frame_size,x.shape[0],frame_size))
for v in y:
    print(v)
Output:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
[16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31]
[32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
[48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63]
[64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95]
[ 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111]
[112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127]
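np.split() only produces back-to-back frames. For the overlapping frames the question asks for, a minimal NumPy-only sketch, assuming NumPy 1.20 or newer (for numpy.lib.stride_tricks.sliding_window_view). frame_signal, frame_len and hop_len are illustrative names, not part of any library.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def frame_signal(x, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames, one frame per row.

    Trailing samples that do not fill a whole frame are dropped.
    """
    # Every window of length frame_len (one starting at each sample) ...
    windows = sliding_window_view(x, frame_len)
    # ... keeping only those that start at multiples of hop_len
    return windows[::hop_len]

x = np.arange(128)
frames = frame_signal(x, frame_len=16, hop_len=8)   # shape (15, 16), 50% overlap
windowed = frames * np.hamming(16)                 # window broadcast over each row
spectra = np.fft.rfft(windowed, axis=1)            # one spectrum per frame, shape (15, 9)
print(frames.shape, spectra.shape)
```

With the question's audio, something like frame_signal(data, 1024, 512) would give 50% overlap; those sizes are only example values. The result of sliding_window_view is a read-only view, which is why the window is applied by multiplication into a new array rather than in place.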

Related

In ProcessPoolExecutor of concurrent.futures, BrokenProcessPool occurs

from concurrent.futures import ProcessPoolExecutor
import numpy as np
def func(x):
    return x+1
def main():
    data = np.arange(100).reshape(10,-1)
    with ProcessPoolExecutor() as executor:
        for result in executor.map(func,data):
            print(result)
if __name__ == '__main__':
    main()
I expected this code to result in the following:
[ 1 2 3 4 5 6 7 8 9 10]
[11 12 13 14 15 16 17 18 19 20]
[21 22 23 24 25 26 27 28 29 30]
[31 32 33 34 35 36 37 38 39 40]
[41 42 43 44 45 46 47 48 49 50]
[51 52 53 54 55 56 57 58 59 60]
[61 62 63 64 65 66 67 68 69 70]
[71 72 73 74 75 76 77 78 79 80]
[81 82 83 84 85 86 87 88 89 90]
[91 92 93 94 95 96 97 98 99 100]
The result was:
BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
But the following code works:
###
for result in executor.map(np.sum,data):
    print(result)
###
The result is:
45
145
245
345
445
545
645
745
845
945
I suspected it was because I defined 'func' myself.
Please tell me why the error is happening.

Matplotlib ScalarMappable returning only black color

I am using matplotlib to create many plots. The plots involve making many FancyBboxPatches and setting the color for each patch using a ScalarMappable. Each plot corresponds to a "time step" from a physical process. I have made the following minimal working example to illustrate what I am trying to do and the problem I am having.
Suppose there is a file data.txt. If a line has one entry, that value is the time step. If a line has three entries, the first entry is the x value, the second is the y value, and the third is the value that the ScalarMappable maps to a color. Here is an example of data.txt:
1
0 0 0.1
0 1 1
0 2 2
0 3 3
0 4 4
1 0 10
1 1 11
1 2 12
1 3 13
1 4 14
2 0 20
2 1 21
2 2 22
2 3 23
2 4 24
3 0 30
3 1 31
3 2 32
3 3 33
3 4 34
2
1 0 10
1 1 11
1 2 12
1 3 13
1 4 14
2 0 110
2 1 111
2 2 112
2 3 113
2 4 114
3 0 120
3 1 121
3 2 122
3 3 123
3 4 124
4 0 130
4 1 131
4 2 132
4 3 133
4 4 134
3
2 0 110
2 1 111
2 2 112
2 3 113
2 4 114
3 0 1110
3 1 1111
3 2 1112
3 3 1113
3 4 1114
4 0 1120
4 1 1121
4 2 1122
4 3 1123
4 4 1124
5 0 1130
5 1 1131
5 2 1132
5 3 1133
5 4 1134
4
3 0 1110
3 1 1111
3 2 1112
3 3 1113
3 4 1114
4 0 11110
4 1 11111
4 2 11112
4 3 11113
4 4 11114
5 0 11120
5 1 11121
5 2 11122
5 3 11123
5 4 11124
6 0 11130
6 1 11131
6 2 11132
6 3 11133
6 4 11134
Here is the script I use to generate the plots:
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from matplotlib.colors import LogNorm
from matplotlib.patches import FancyBboxPatch
def parse_file(file_name):
    output = {}
    with open(file_name, 'r') as data_file:
        for line in data_file:
            entries = line.strip().split()
            if len(entries) == 1:
                time_step = int(entries[0])
                output[time_step] = {}
            elif len(entries) == 3:
                x = float(entries[0])
                y = float(entries[1])
                value = float(entries[2])
                output[time_step][(x, y)] = value
            else:
                raise RuntimeError('Anomalous line {} in file {}'.format(line, data_file.name))
    return output
def main():
    fig, axes = plt.subplots()
    axes.set_xlim(-1,10)
    axes.set_ylim(-1,10)
    cmap = cm.plasma
    norm = LogNorm(vmin = 1e-2, vmax = 1.2e4)
    smap = cm.ScalarMappable(norm = norm, cmap = cmap)
    smap.set_array([])
    color_bar = fig.colorbar(mappable = smap, ax = axes, orientation = 'vertical', label = 'label')
    data = parse_file(file_name = 'data.txt')
    for time_step, information in data.items():
        cells = []
        for (x,y), value in information.items():
            cell = FancyBboxPatch(xy = (x - 0.5, y - 0.5),
                                  width = 1, height = 1,
                                  boxstyle = 'square,pad=0.',
                                  edgecolor = 'black',
                                  facecolor = smap.to_rgba(value))
            #print(time_step, '\t', x, '\t', y, '\t', value, '\t', smap.to_rgba(value))
            axes.add_patch(cell)
            cells.append(cell)
        fig.savefig('time-step_{}.png'.format(time_step))
        for cell in cells:
            cell.remove()
if __name__ == '__main__':
    main()
And here is one of the plots that is created from running that script:
This plot (and the other three that are created, but not shown here) look fine. So I am confident that I am using ScalarMappable correctly. Now I take the actual data I want to plot, again in a file called data.txt. The format is the same as before, except if a line has four entries, then the first entry is the time step (and I do not care about the other entries). Here is an example of data.txt:
2 0.424066E-02 0.200000E+01 0.885500E+08
0 1 0.850703E+00
1 3 0.388551E-09
2 4 0.141948E-06
2 6 0.126299E-09
3 9 0.166871E-08
4 12 0.340738E-08
5 13 0.246948E-09
5 14 0.129005E-09
6 16 0.140043E-08
6 17 0.885307E-09
26 76 0.591676E-08
26 78 0.745985E-08
27 77 0.263136E-08
27 78 0.131857E-08
27 79 0.151193E-05
27 80 0.265941E-07
27 81 0.170975E-05
27 82 0.206355E-08
27 83 0.334444E-07
28 80 0.569439E-05
28 81 0.864904E-07
28 82 0.114196E-02
28 83 0.130067E-06
28 84 0.608045E-04
28 85 0.351649E-07
28 86 0.543117E-07
28 88 0.202115E-08
29 83 0.225374E-07
29 84 0.125586E-07
29 85 0.253383E-04
29 86 0.943810E-06
29 87 0.104539E-04
29 88 0.210241E-06
29 89 0.196533E-03
29 90 0.707278E-06
29 91 0.565096E-05
29 92 0.840856E-08
29 93 0.277478E-07
30 86 0.707234E-09
30 88 0.549048E-07
30 89 0.281776E-08
30 90 0.259219E-04
30 91 0.298973E-06
30 92 0.311047E-04
30 93 0.144465E-05
30 94 0.632642E-04
30 95 0.787893E-08
30 96 0.252900E-08
31 91 0.425350E-08
31 92 0.371105E-08
31 93 0.621869E-05
31 94 0.680069E-06
31 95 0.315149E-04
31 96 0.670790E-07
31 97 0.568911E-06
31 98 0.187946E-08
31 99 0.135024E-07
32 94 0.384693E-09
32 96 0.174407E-06
32 97 0.480216E-08
32 98 0.244989E-05
32 99 0.876257E-07
32 100 0.189371E-04
32 101 0.264917E-06
32 102 0.297745E-05
32 103 0.213684E-09
33 99 0.110356E-08
33 100 0.131345E-08
33 101 0.448076E-06
33 102 0.106369E-06
33 103 0.128984E-04
33 104 0.230382E-07
33 105 0.266535E-07
34 102 0.428166E-08
34 103 0.668242E-08
34 104 0.842244E-05
34 105 0.843016E-07
34 106 0.137510E-05
34 107 0.879097E-08
34 108 0.758233E-07
35 105 0.280844E-06
35 106 0.639110E-07
35 107 0.497335E-05
35 108 0.260105E-06
35 109 0.188060E-05
35 110 0.375853E-09
35 111 0.935430E-09
35 112 0.138533E-07
35 113 0.101658E-06
35 114 0.504823E-09
35 115 0.989704E-09
35 116 0.152468E-06
35 117 0.220735E-07
36 114 0.430884E-08
36 116 0.115980E-07
36 117 0.128436E-05
36 118 0.814433E-05
37 117 0.316595E-09
37 118 0.141531E-06
37 119 0.965141E-05
38 119 0.459954E-08
38 120 0.114088E-04
38 121 0.198695E-09
39 120 0.109457E-08
39 121 0.105160E-04
39 122 0.254984E-08
40 122 0.717566E-05
40 123 0.179081E-08
40 124 0.352463E-09
41 123 0.454357E-05
41 124 0.629608E-07
41 125 0.777480E-07
42 124 0.453866E-05
42 125 0.108592E-06
42 126 0.320262E-06
42 127 0.252596E-09
42 128 0.114714E-09
43 125 0.372578E-06
43 126 0.344297E-07
43 127 0.188018E-05
43 128 0.631276E-08
43 129 0.368003E-08
44 126 0.170090E-07
44 127 0.121695E-07
44 128 0.147407E-05
44 129 0.349674E-07
44 130 0.767494E-06
45 128 0.193141E-09
45 129 0.361851E-06
45 130 0.573704E-07
45 131 0.457287E-06
45 132 0.148004E-08
45 133 0.164772E-07
45 134 0.386942E-09
45 135 0.539603E-08
45 136 0.227778E-09
45 137 0.640126E-08
45 138 0.189604E-09
45 139 0.754561E-09
46 132 0.215880E-07
46 134 0.102847E-08
46 136 0.628736E-08
46 137 0.427124E-09
46 138 0.711664E-07
46 139 0.749082E-08
46 140 0.425043E-06
46 141 0.776307E-08
46 142 0.102985E-06
46 143 0.693232E-09
46 144 0.215846E-08
47 141 0.660244E-08
47 142 0.901189E-09
47 143 0.299062E-07
47 144 0.195833E-08
47 145 0.178405E-07
47 146 0.558550E-09
47 147 0.235167E-08
48 144 0.393065E-09
48 146 0.493252E-08
48 147 0.299176E-09
48 148 0.130504E-07
48 149 0.244654E-09
48 150 0.143702E-08
49 149 0.565286E-09
49 151 0.122230E-08
3 0.424066E-02 0.200000E+01 0.885500E+08
0 1 0.850710E+00
1 3 0.388551E-09
2 4 0.141948E-06
2 6 0.126299E-09
3 9 0.166871E-08
4 12 0.340738E-08
5 13 0.246948E-09
5 14 0.129005E-09
6 16 0.140043E-08
6 17 0.885307E-09
26 76 0.593799E-08
26 78 0.747463E-08
27 77 0.283934E-08
27 78 0.115725E-08
27 79 0.153613E-05
27 80 0.236099E-08
27 81 0.171178E-05
27 83 0.334426E-07
28 80 0.575684E-05
28 81 0.242170E-07
28 82 0.114208E-02
28 83 0.133947E-07
28 84 0.608362E-04
28 85 0.335522E-08
28 86 0.543624E-07
28 88 0.202170E-08
29 83 0.258149E-07
29 84 0.107337E-07
29 85 0.261133E-04
29 86 0.167223E-06
29 87 0.108977E-04
29 88 0.432469E-08
29 89 0.196993E-03
29 90 0.997563E-08
29 91 0.565922E-05
29 92 0.127589E-09
29 93 0.277365E-07
30 86 0.731139E-09
30 88 0.613936E-07
30 89 0.984612E-09
30 90 0.261316E-04
30 91 0.845314E-07
30 92 0.324848E-04
30 93 0.656773E-07
30 94 0.632706E-04
30 95 0.335583E-09
30 96 0.252938E-08
31 91 0.529954E-08
31 92 0.394099E-08
31 93 0.681605E-05
31 94 0.104800E-06
31 95 0.315602E-04
31 96 0.231610E-08
31 97 0.566868E-06
31 99 0.135330E-07
32 94 0.450380E-09
32 96 0.178679E-06
32 97 0.955313E-09
32 98 0.252946E-05
32 99 0.770340E-08
32 100 0.191937E-04
32 101 0.825856E-08
32 102 0.297762E-05
33 99 0.128999E-08
33 100 0.146516E-08
33 101 0.616111E-06
33 102 0.539415E-07
33 103 0.128046E-04
33 104 0.865090E-09
33 105 0.266759E-07
34 102 0.899336E-08
34 103 0.331924E-08
34 104 0.850733E-05
34 105 0.462457E-08
34 106 0.137714E-05
34 107 0.199044E-09
34 108 0.758844E-07
35 105 0.308602E-06
35 106 0.470668E-07
35 107 0.520013E-05
35 108 0.458893E-07
35 109 0.185756E-05
35 111 0.159320E-07
35 112 0.729552E-09
35 113 0.101697E-06
35 114 0.135746E-09
35 115 0.128676E-06
35 116 0.231448E-07
35 117 0.220783E-07
36 114 0.480979E-08
36 116 0.921582E-06
36 117 0.373798E-06
36 118 0.814449E-05
37 117 0.888355E-08
37 118 0.132905E-06
37 119 0.965147E-05
38 118 0.360663E-09
38 119 0.423745E-08
38 120 0.114090E-04
39 120 0.109122E-08
39 121 0.105186E-04
40 122 0.717737E-05
40 124 0.352428E-09
41 123 0.460618E-05
41 124 0.358205E-09
41 125 0.777514E-07
42 124 0.464136E-05
42 125 0.589035E-08
42 126 0.320503E-06
42 128 0.114709E-09
43 125 0.408148E-06
43 126 0.567978E-08
43 127 0.187958E-05
43 129 0.368007E-08
44 126 0.258868E-07
44 127 0.348446E-08
44 128 0.150718E-05
44 129 0.167101E-08
44 130 0.767515E-06
45 128 0.176686E-09
45 129 0.403334E-06
45 130 0.162718E-07
45 131 0.458273E-06
45 132 0.196826E-09
45 133 0.167474E-07
45 135 0.563904E-08
45 137 0.655709E-08
45 139 0.751998E-09
46 132 0.216010E-07
46 134 0.107901E-08
46 136 0.673825E-08
46 138 0.784839E-07
46 139 0.220743E-09
46 140 0.432287E-06
46 141 0.427029E-09
46 142 0.103696E-06
46 144 0.211976E-08
47 141 0.696394E-08
47 142 0.585710E-09
47 143 0.315456E-07
47 144 0.425448E-09
47 145 0.181981E-07
47 146 0.136911E-09
47 147 0.226765E-08
48 144 0.442465E-09
48 146 0.553370E-08
48 147 0.138932E-09
48 148 0.128376E-07
48 150 0.144107E-08
49 149 0.624360E-09
49 151 0.123765E-08
The script that I use to create the plots is almost the same as before. The only differences are (1) how data.txt is parsed, (2) setting the limits of the x and y axes, and (3) the variable norm. Here is the script:
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from matplotlib.colors import LogNorm
from matplotlib.patches import FancyBboxPatch
def parse_file(file_name):
    output = {}
    with open(file_name, 'r') as data_file:
        for line in data_file:
            entries = line.strip().split()
            if len(entries) == 4:
                time_step = int(entries[0])
                output[time_step] = {}
            elif len(entries) == 3:
                x = float(entries[0])
                y = float(entries[1])
                value = float(entries[2])
                output[time_step][(x, y)] = value
            else:
                raise RuntimeError('Anomalous line {} in file {}'.format(line, data_file.name))
    return output
def main():
    fig, axes = plt.subplots()
    axes.set_xlim(0,150)
    axes.set_ylim(0,250)
    cmap = cm.plasma
    norm = LogNorm(vmin = pow(10, -10), vmax = pow(10, -2.2))
    smap = cm.ScalarMappable(norm = norm, cmap = cmap)
    smap.set_array([])
    color_bar = fig.colorbar(mappable = smap, ax = axes, orientation = 'vertical', label = 'label')
    data = parse_file(file_name = 'data.txt')
    for time_step, information in data.items():
        cells = []
        for (x,y), value in information.items():
            cell = FancyBboxPatch(xy = (x - 0.5, y - 0.5),
                                  width = 1, height = 1,
                                  boxstyle = 'square,pad=0.',
                                  edgecolor = 'black',
                                  facecolor = smap.to_rgba(value))
            #print(time_step, '\t', x, '\t', y, '\t', value, '\t', smap.to_rgba(value))
            axes.add_patch(cell)
            cells.append(cell)
        fig.savefig('time-step_{}.png'.format(time_step))
        for cell in cells:
            cell.remove()
if __name__ == '__main__':
    main()
Now all of the patches are black. Here is one of the plots that is created:
I do not see anything obviously wrong using the print statement (which is commented out in the script):
print(time_step, '\t', x, '\t', y, '\t', value, '\t', smap.to_rgba(value))
Why are all the FancyBboxPatches black instead of the color I have chosen with the ScalarMappable (and how can I make them be the color I have chosen with the ScalarMappable)?
It doesn't look as if the patches are black. I would guess that they are just too small, so that their black edge covers the entire area of the patch. You can use a thinner edge, no edge at all, or set the edgecolor to a value of your liking as well. In general, you may also use simple patches like Rectangle instead of FancyBboxPatch.
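A minimal sketch of that suggestion, assuming the edge width is indeed the cause: the same 150 x 250 axes, LogNorm and plasma colormap as the question, but with plain Rectangle patches drawn without an edge (linewidth=0). The four (x, y, value) triples are copied from time step 2 of the question's second data.txt; the Agg backend is used only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')                 # off-screen backend, no window needed
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from matplotlib.colors import LogNorm
from matplotlib.patches import Rectangle

# (x, y, value) triples from time step 2 of the question's data.txt
cells = [(28, 82, 0.114196e-02), (29, 89, 0.196533e-03),
         (30, 94, 0.632642e-04), (45, 130, 0.573704e-07)]

fig, axes = plt.subplots()
axes.set_xlim(0, 150)
axes.set_ylim(0, 250)
norm = LogNorm(vmin=1e-10, vmax=10 ** -2.2)
smap = cm.ScalarMappable(norm=norm, cmap='plasma')
smap.set_array([])
fig.colorbar(mappable=smap, ax=axes, orientation='vertical', label='label')

patches = []
for x, y, value in cells:
    # linewidth=0 draws no edge, so the face color fills the whole 1x1 cell
    cell = Rectangle((x - 0.5, y - 0.5), 1, 1,
                     facecolor=smap.to_rgba(value), linewidth=0)
    axes.add_patch(cell)
    patches.append(cell)
fig.canvas.draw()                     # render; fig.savefig(...) would write a file
```

Keeping a thin edge (for example linewidth=0.1) is the other option mentioned above; on axes this large, each 1 x 1 cell is only a few pixels wide, so any default-width edge dominates it.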

Sliding a subarray along a 2d array

This is something I've been struggling with for a couple of weeks. The algorithm is the following:
1. Select a subarray as an array of rows and columns from a larger array
2. Compute the median of the subarray
3. Replace cells in subarray with median value
4. Move the subarray to the right by its own length
5. Repeat to end of array
6. Move subarray down by its own height
7. Repeat
I've got steps 1 to 3 as follows:
import numpy as np
w1 = np.arange(100).reshape(10,10)
side = 3
patch = w1[0:side, 0:side]
i, j = patch.shape
for j in range(side):
    for i in range(side):
        patch[i,j] = np.median(patch)
Eventually, I'll be using a 901x877 array from an image but I'm just trying to get a hold of this simple task first. How can I slide the array along and then down with a loop?
You can use scikit-image's view_as_blocks and NumPy broadcasting to vectorize the operation:
import numpy as np
import skimage.util
w1 = np.arange(144).reshape(12,12)
print(w1)
# [[ 0 1 2 3 4 5 6 7 8 9 10 11]
# [ 12 13 14 15 16 17 18 19 20 21 22 23]
# [ 24 25 26 27 28 29 30 31 32 33 34 35]
# [ 36 37 38 39 40 41 42 43 44 45 46 47]
# [ 48 49 50 51 52 53 54 55 56 57 58 59]
# [ 60 61 62 63 64 65 66 67 68 69 70 71]
# [ 72 73 74 75 76 77 78 79 80 81 82 83]
# [ 84 85 86 87 88 89 90 91 92 93 94 95]
# [ 96 97 98 99 100 101 102 103 104 105 106 107]
# [108 109 110 111 112 113 114 115 116 117 118 119]
# [120 121 122 123 124 125 126 127 128 129 130 131]
# [132 133 134 135 136 137 138 139 140 141 142 143]]
side = 3
w2 = skimage.util.view_as_blocks(w1, (side, side))
w2[...] = np.median(w2, axis=(-2, -1))[:, :, None, None]
print(w1)
# [[ 13 13 13 16 16 16 19 19 19 22 22 22]
# [ 13 13 13 16 16 16 19 19 19 22 22 22]
# [ 13 13 13 16 16 16 19 19 19 22 22 22]
# [ 49 49 49 52 52 52 55 55 55 58 58 58]
# [ 49 49 49 52 52 52 55 55 55 58 58 58]
# [ 49 49 49 52 52 52 55 55 55 58 58 58]
# [ 85 85 85 88 88 88 91 91 91 94 94 94]
# [ 85 85 85 88 88 88 91 91 91 94 94 94]
# [ 85 85 85 88 88 88 91 91 91 94 94 94]
# [121 121 121 124 124 124 127 127 127 130 130 130]
# [121 121 121 124 124 124 127 127 127 130 130 130]
# [121 121 121 124 124 124 127 127 127 130 130 130]]
Note that I had to change the size of your array to 12x12 so that all of your tiles of 3x3 actually fit in there.
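The same block-median result can be sketched with NumPy alone by reshaping the array into (row_tiles, side, col_tiles, side) and taking the median over the two within-tile axes. The function name block_median is hypothetical; this sketch returns a float copy instead of modifying the input, and for sizes that are not multiples of side (such as the question's 901x877 image) it leaves the leftover bottom rows and right columns unchanged rather than requiring a resize.

```python
import numpy as np

def block_median(a, side):
    """Replace each side x side tile of a 2-D array by that tile's median.

    Rows/columns that do not fill a whole tile are left untouched.
    """
    h = a.shape[0] - a.shape[0] % side   # largest multiple of side that fits
    w = a.shape[1] - a.shape[1] % side
    out = a.astype(float)                # copy, so fractional medians survive
    # (h, w) -> (row_tiles, side, col_tiles, side)
    tiles = out[:h, :w].reshape(h // side, side, w // side, side)
    med = np.median(tiles, axis=(1, 3))  # one median per tile
    # Expand each median back over its side x side tile
    out[:h, :w] = np.repeat(np.repeat(med, side, axis=0), side, axis=1)
    return out

result = block_median(np.arange(144).reshape(12, 12), 3)
print(result[:3, :6])
```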
Here are a few "code smells" I see:
Since side is set to 3, range(side) gives [0, 1, 2]. Is that really what you want?
You set i, j = patch.shape and then immediately overwrite these values in your for loops.
Finally, you recalculate the median on every iteration, on a patch you have already partly overwritten.
OK, here's what I'd do:
1. Figure out how many patches fit along the width and the height, and multiply those counts by side to get each patch's starting index.
2. Slice your array (matrix) into those pieces.
3. Assign each patch its median.
import numpy as np
w1 = np.arange(100).reshape(10,10)
side = 3
w, h = w1.shape
width_index = np.array(range(w//side)) * side
height_index = np.array(range(h//side)) * side
def assign_patch(patch, median, side):
    """Break this loop out to prevent 4 nested 'for' loops"""
    for j in range(side):
        for i in range(side):
            patch[i,j] = median
    return patch
for width in width_index:
    for height in height_index:
        patch = w1[width:width+side, height:height+side]
        median = np.median(patch)
        assign_patch(patch, median, side)
print(w1)
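A sketch of the slide-right-then-down loop from the question, written with patch[...] = np.median(patch) in place of assign_patch (the median is computed once, before the tile is overwritten). Unlike the version above, stepping with range(0, n, side) also visits the partial tiles at the right and bottom edges, which matters for a 901x877 image since neither dimension is a multiple of 3. Because w1 is an integer array here, any fractional median would be truncated; convert with astype(float) first if that matters.

```python
import numpy as np

w1 = np.arange(100).reshape(10, 10)
side = 3
rows, cols = w1.shape

# Move right along each band of rows, then down by one tile height.
# Slices past the end are clipped, so the last tiles may be smaller.
for top in range(0, rows, side):
    for left in range(0, cols, side):
        patch = w1[top:top + side, left:left + side]   # a view into w1
        patch[...] = np.median(patch)                  # fill the tile in place
print(w1)
```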

How to apply a function to rows of two pandas DataFrame

There are two pandas DataFrames, say dfx and dfy, of the same shape and with exactly the same column and row indices. I want to apply a function to the corresponding rows of these two DataFrames.
In other words, suppose we have a function as follows:
def fun(row_x, row_y):
    ...  # a function of the corresponding rows
Let index be the common index of dfx and dfy. I want to compute the following list/Series in pandas:
[fun(dfx.loc[i], dfy.loc[i]) for i in index] (pseudo-code)
With the following code I build a grouped, two-level-indexed DataFrame, but I don't know how to apply agg to it properly.
dfxy = pd.concat({'dfx':dfx, 'dfy':dfy})
dfxy = dfxy.swaplevel(0,1,axis=0).sort_index(level=0)
grouped=dfxy.groupby(level=0)
In [19]:
dfx = pd.DataFrame(data = np.random.randint(0 , 100 , 50).reshape(10 ,-1) , columns=list('abcde'))
dfx
Out[19]:
a b c d e
3 44 8 55 95
26 5 18 34 10
20 20 91 15 8
83 7 50 47 27
97 65 10 94 93
44 6 70 60 4
38 64 8 67 92
44 21 42 6 12
30 98 34 7 79
76 7 14 58 5
In [4]:
dfy = pd.DataFrame(data = np.random.randint(0 , 100 , 50).reshape(10 ,-1) , columns=list('fghij'))
dfy
Out[4]:
f g h i j
82 48 29 54 78
7 31 78 38 30
90 91 43 8 40
52 88 13 87 39
41 88 90 51 91
55 4 94 62 98
31 23 4 59 93
87 12 33 77 0
25 99 39 23 1
7 50 46 39 66
In [13]:
dfxy = pd.concat({'dfx':dfx, 'dfy':dfy} , axis = 1)
dfxy
Out[13]:
dfx dfy
a b c d e f g h i j
20 76 5 98 38 82 48 29 54 78
39 36 9 3 74 7 31 78 38 30
43 12 50 72 14 90 91 43 8 40
89 41 95 91 86 52 88 13 87 39
33 30 55 64 94 41 88 90 51 91
89 84 48 1 60 55 4 94 62 98
68 40 27 10 63 31 23 4 59 93
33 10 86 89 67 87 12 33 77 0
56 89 0 70 67 25 99 39 23 1
48 58 98 18 24 7 50 46 39 66
def f(x , y):
    return pd.Series(data = [np.mean(x) , np.mean(y)] , index=['x_mean' , 'y_mean'])
In [17]:
dfxy.apply( lambda x : f(x['dfx'] , x['dfy']) , axis = 1)
Out[17]:
x_mean y_mean
0 47.4 58.2
1 32.2 36.8
2 38.2 54.4
3 80.4 55.8
4 55.2 72.2
5 56.4 62.6
6 41.6 42.0
7 57.0 41.8
8 56.4 37.4
9 49.2 41.6
Could this be what you are looking for?
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: dfx = pd.DataFrame(data=np.random.randint(0,100,50).reshape(10,-1),
columns=['index', 'a', 'b', 'c', 'd'])
In [4]: dfy = pd.DataFrame(data=np.random.randint(0,100,50).reshape(10,-1),
columns=['index', 'a', 'b', 'c', 'd'])
In [5]: dfy['index'] = dfx['index']
In [6]: print(dfx)
index a b c d
0 25 41 46 18 98
1 0 21 9 20 29
2 18 78 63 94 70
3 86 71 71 95 64
4 23 33 19 34 29
5 69 10 91 19 42
6 92 68 60 12 58
7 74 49 22 74 1
8 47 35 56 41 80
9 93 20 44 16 49
In [7]: print(dfy)
index a b c d
0 25 28 35 96 89
1 0 44 94 50 43
2 18 18 39 75 45
3 86 18 87 72 88
4 23 2 28 24 4
5 69 53 55 55 40
6 92 0 52 54 91
7 74 8 1 96 59
8 47 74 21 7 7
9 93 42 83 42 60
In [8]: print(dfx.merge(dfy, on='index'))
index a_x b_x c_x d_x a_y b_y c_y d_y
0 25 41 46 18 98 28 35 96 89
1 0 21 9 20 29 44 94 50 43
2 18 78 63 94 70 18 39 75 45
3 86 71 71 95 64 18 87 72 88
4 23 33 19 34 29 2 28 24 4
5 69 10 91 19 42 53 55 55 40
6 92 68 60 12 58 0 52 54 91
7 74 49 22 74 1 8 1 96 59
8 47 35 56 41 80 74 21 7 7
9 93 20 44 16 49 42 83 42 60
In [9]: def my_function(x):
...: return sum(x)
...:
In [10]: print(dfx.merge(dfy, on='index').drop('index', axis=1).apply(my_function, axis=1))
0 451
1 310
2 482
3 566
4 173
5 365
6 395
7 310
8 321
9 356
dtype: int64
In [11]: print(pd.DataFrame(
{
'my_function':
dfx.merge(dfy, on='index').\
drop('index', axis=1).apply(my_function, axis=1),
'index':
dfx['index']
}))
index my_function
0 25 451
1 0 310
2 18 482
3 86 566
4 23 173
5 69 365
6 92 395
7 74 310
8 47 321
9 93 356
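For two DataFrames that share the same index and columns, as in the question, a minimal sketch of two more routes: a literal translation of the pseudo-code using .loc for row selection, and a completion of the question's concat/swaplevel/groupby attempt in which each group holds the 'dfx' and 'dfy' rows for one index label (g.name). The example function fun (a row-wise dot product) and the random data are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = ['p', 'q', 'r', 's']
dfx = pd.DataFrame(rng.integers(0, 100, (4, 3)), index=idx, columns=list('abc'))
dfy = pd.DataFrame(rng.integers(0, 100, (4, 3)), index=idx, columns=list('abc'))

def fun(row_x, row_y):
    # Hypothetical example function: dot product of the two rows
    return (row_x * row_y).sum()

# 1) Literal version of [fun(dfx.loc[i], dfy.loc[i]) for i in index]
direct = pd.Series([fun(dfx.loc[i], dfy.loc[i]) for i in dfx.index], index=dfx.index)

# 2) The question's concat/groupby approach, finished with apply:
#    index is (row label, 'dfx'/'dfy'), grouped by the row label
dfxy = pd.concat({'dfx': dfx, 'dfy': dfy})
dfxy = dfxy.swaplevel(0, 1).sort_index(level=0)
via_groupby = dfxy.groupby(level=0).apply(
    lambda g: fun(g.loc[(g.name, 'dfx')], g.loc[(g.name, 'dfy')]))

print(direct)
print(via_groupby)
```

The list comprehension is the more readable of the two; the groupby version mainly shows that the question's two-level layout works once apply is used instead of agg, because agg works column by column and never sees the dfx and dfy rows together.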

Gradient four figures scale

I have a question about matplotlib and imshow. I want to plot four "matrices" in the same figure using imshow, and I need the color scale (gradient) to span [0, 1]. I also need to normalize the data with the following formula:
data_norm = data * 2/400
So far I have this:
from matplotlib import pyplot
import numpy as np
zvals = np.loadtxt("sharedGradient.txt")
img = pyplot.imshow(zvals,interpolation='nearest')
pyplot.colorbar(img)
pyplot.show()
The data are in .txt files; here is a sample:
61 62 63 64 65 66 67 6 5 83 82 81 28 29 30 33 34 35 36 37
60 13 12 11 10 9 8 7 4 3 2 7 27 76 31 32 69 42 41 38
59 14 15 16 17 18 69 12 11 10 1 0 26 75 74 73 70 43 40 39
58 57 56 41 40 19 70 71 72 73 4 3 25 79 133 72 71 44 61 62
160 161 55 42 39 20 21 107 114 0 1 2 24 51 52 47 46 45 60 108
62 61 54 43 38 37 22 35 38 37 36 35 23 50 49 48 57 58 59 0
63 64 53 44 25 24 23 34 31 32 33 34 22 51 56 55 56 108 107 1
203 65 52 45 26 31 24 33 30 33 34 20 21 52 53 54 55 109 106 2
202 66 51 46 27 30 25 28 29 17 18 19 38 37 36 35 111 110 105 3
156 199 50 47 28 29 26 27 28 16 30 54 50 51 52 34 112 103 104 4
121 120 49 48 28 29 46 45 27 15 39 55 49 54 53 33 113 102 6 5
114 113 112 109 27 30 31 12 13 14 40 41 46 55 31 32 120 101 7 8
3 4 5 6 15 0 10 11 25 35 40 42 45 48 30 29 28 100 99 9
2 1 0 3 2 1 2 77 32 33 34 45 46 57 67 68 27 26 25 10
9 6 5 0 1 7 80 81 31 30 35 44 60 58 59 69 70 23 24 11
10 2 3 4 5 6 79 82 83 29 36 43 42 41 60 65 66 22 21 12
11 1 11 10 21 20 23 67 66 28 37 38 39 40 61 64 67 92 20 13
12 0 14 15 20 70 7 6 26 27 80 77 76 73 62 63 68 91 19 14
13 15 51 18 19 71 8 5 4 3 2 82 83 84 71 70 69 90 18 15
14 14 13 12 11 10 9 128 129 0 1 146 147 85 86 87 88 89 17 16
My issue is that I can't get the gradient to be between [0, 1] and I can't put different plots in the same figure. Hope somebody can help.
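After you normalize the data, the values (and therefore the color scale) run from 0 to 1, provided the raw data lie between 0 and 200. To fix the colorbar to [0, 1] regardless of the data, pass vmin=0, vmax=1 to imshow().
To show separate imshow plots in one figure, add subplots: plt.subplot(number of rows, number of columns, plot number)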
import matplotlib.pyplot as plt
import numpy as np
zvals = np.loadtxt("sharedGradient.txt")
zvals = zvals/200
plt.subplot(2,2,1)
img = plt.imshow(zvals,interpolation='nearest')
plt.colorbar(img)
plt.subplot(2,2,2)
img = plt.imshow(zvals)
plt.colorbar(img)
plt.subplot(2,2,3)
img = plt.imshow(zvals)
plt.colorbar(img)
plt.subplot(2,2,4)
img = plt.imshow(zvals)
plt.colorbar(img)
plt.show()
If you also want the axes to range from 0 to 1, pass extent=(0,1,0,1) to imshow().
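A sketch combining both points: a 2 x 2 grid from plt.subplots, the question's normalization data * 2/400, vmin=0 and vmax=1 so every panel uses the same [0, 1] color range, and a single colorbar shared by the four panels. Random 20 x 20 integer matrices stand in for the four data files (whose names the question doesn't give); with real files, np.loadtxt would replace them. The Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')                  # off-screen rendering for this sketch
import matplotlib.pyplot as plt
import numpy as np

# Stand-ins for the four data files (values 0..200, like the sample)
rng = np.random.default_rng(0)
matrices = [rng.integers(0, 201, size=(20, 20)) for _ in range(4)]

fig, axes = plt.subplots(2, 2, figsize=(8, 8))
for ax, zvals in zip(axes.flat, matrices):
    data_norm = zvals * 2 / 400        # normalization from the question
    # vmin/vmax pin every panel to the same 0..1 color range
    img = ax.imshow(data_norm, interpolation='nearest', vmin=0, vmax=1)
# One colorbar for all four panels
fig.colorbar(img, ax=axes.ravel().tolist())
fig.canvas.draw()
```

Because all four images share the same vmin and vmax, one colorbar is valid for every panel; without them, each imshow would scale to its own data range.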
