Sliding a subarray along a 2d array - python

This is something I've been struggling with for a couple of weeks. The algorithm is the following:
1. Select a subarray (a block of rows and columns) from a larger array
2. Compute the median of the subarray
3. Replace cells in subarray with median value
4. Move the subarray to the right by its own length
5. Repeat to end of array
6. Move subarray down by its own height
7. Repeat
I've got steps 1 to 3 as follows:
import numpy as np
w1 = np.arange(100).reshape(10,10)
side = 3
patch = w1[0:side, 0:side]
i, j = patch.shape
for j in range(side):
    for i in range(side):
        patch[i,j] = np.median(patch)
Eventually, I'll be using a 901x877 array from an image but I'm just trying to get a hold of this simple task first. How can I slide the array along and then down with a loop?

You can use scikit-image's view_as_blocks and NumPy broadcasting to vectorize the operation:
import numpy as np
import skimage.util
w1 = np.arange(144).reshape(12,12)
print(w1)
# [[ 0 1 2 3 4 5 6 7 8 9 10 11]
# [ 12 13 14 15 16 17 18 19 20 21 22 23]
# [ 24 25 26 27 28 29 30 31 32 33 34 35]
# [ 36 37 38 39 40 41 42 43 44 45 46 47]
# [ 48 49 50 51 52 53 54 55 56 57 58 59]
# [ 60 61 62 63 64 65 66 67 68 69 70 71]
# [ 72 73 74 75 76 77 78 79 80 81 82 83]
# [ 84 85 86 87 88 89 90 91 92 93 94 95]
# [ 96 97 98 99 100 101 102 103 104 105 106 107]
# [108 109 110 111 112 113 114 115 116 117 118 119]
# [120 121 122 123 124 125 126 127 128 129 130 131]
# [132 133 134 135 136 137 138 139 140 141 142 143]]
side = 3
w2 = skimage.util.view_as_blocks(w1, (side, side))
w2[...] = np.median(w2, axis=(-2, -1))[:, :, None, None]
print(w1)
# [[ 13 13 13 16 16 16 19 19 19 22 22 22]
# [ 13 13 13 16 16 16 19 19 19 22 22 22]
# [ 13 13 13 16 16 16 19 19 19 22 22 22]
# [ 49 49 49 52 52 52 55 55 55 58 58 58]
# [ 49 49 49 52 52 52 55 55 55 58 58 58]
# [ 49 49 49 52 52 52 55 55 55 58 58 58]
# [ 85 85 85 88 88 88 91 91 91 94 94 94]
# [ 85 85 85 88 88 88 91 91 91 94 94 94]
# [ 85 85 85 88 88 88 91 91 91 94 94 94]
# [121 121 121 124 124 124 127 127 127 130 130 130]
# [121 121 121 124 124 124 127 127 127 130 130 130]
# [121 121 121 124 124 124 127 127 127 130 130 130]]
Note that I had to change the size of your array to 12x12 so that all of your tiles of 3x3 actually fit in there.
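If the array does not divide evenly into tiles (the question mentions a 901x877 image), one option is to tile only the part that fits. The following is a minimal NumPy-only sketch, not part of the answer above. The function name block_median and the choice to leave the leftover edge cells untouched are assumptions made for illustration.

```python
import numpy as np

def block_median(arr, side):
    """Replace every full side x side tile with its median.

    Rows and columns that do not fill a whole tile are left unchanged
    (an assumption; cropping or padding are other options).
    """
    out = arr.copy()
    rows = (arr.shape[0] // side) * side
    cols = (arr.shape[1] // side) * side
    # Axes after reshape: (tile row, row in tile, tile column, column in tile)
    blocks = arr[:rows, :cols].reshape(rows // side, side, cols // side, side)
    med = np.median(blocks, axis=(1, 3))
    # Expand each median back to a side x side tile
    out[:rows, :cols] = np.repeat(np.repeat(med, side, axis=0), side, axis=1)
    return out

w1 = np.arange(100).reshape(10, 10)
result = block_median(w1, 3)
print(result)
```

With a 10x10 input and side = 3, only the top-left 9x9 region is tiled; the last row and column keep their original values.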

Here are a few "code smells" I see.
Start with range(side): since side is set to 3, it yields 0, 1, 2. Is that what you really want?
You set i, j = patch.shape and then immediately overwrite both values in your for loops.
Finally, you're recalculating the median on every iteration.
OK, here's what I'd do:
Figure out how many patches fit along both the width and the height, and multiply those indices by the side length to get each patch's starting offset.
Slice your array (matrix) up into those pieces.
Assign the median to every cell of each patch.
import numpy as np
w1 = np.arange(100).reshape(10,10)
side = 3
w, h = w1.shape
width_index = np.array(range(w//side)) * side
height_index = np.array(range(h//side)) * side
def assign_patch(patch, median, side):
    """Break this loop out to prevent 4 nested 'for' loops"""
    for j in range(side):
        for i in range(side):
            patch[i,j] = median
    return patch
for width in width_index:
    for height in height_index:
        patch = w1[width:width+side, height:height+side]
        median = np.median(patch)
        assign_patch(patch, median, side)
print(w1)


Python: Audio segmentation with overlapping and hamming windows

I would like to do the following:
Segment the audio file (divide it into frames) - to avoid information loss, the frames should overlap.
In each frame, apply a window function (Hann, Hamming, Blackman etc) - to minimize discontinuities at the beginning and end.
I managed to save the audio file as a numpy array:
import struct
import wave
import numpy as np

def wave_open(path, normalize=True, rm_constant=False):
    # normalize and rm_constant are accepted but not used yet
    wav = wave.open(path, 'rb')
    frames_n = wav.getnframes()
    channels = wav.getnchannels()
    sample_rate = wav.getframerate()
    duration = frames_n / float(sample_rate)
    read_frames = wav.readframes(frames_n)
    wav.close()
    # Parenthesize the count so the format is a single "<count>h" (16-bit samples)
    data = struct.unpack("%dh" % (channels * frames_n), read_frames)
    if channels == 1:
        data = np.array(data, dtype=np.int16)
        return data
    else:
        print("More channels are not supported")
And then I did a hamming window on the whole signal:
N = 11145
win = np.hanning(N)
windowed_signal = (np.fft.rfft(win*data))
But I don't know how to split my signal into frames (segments) before applying the Hamming window.
Please help me :)
Here is a solution using librosa.
import librosa
import numpy as np
x = np.arange(0, 128)
frame_len, hop_len = 16, 8
frames = librosa.util.frame(x, frame_length=frame_len, hop_length=hop_len)
windowed_frames = np.hanning(frame_len).reshape(-1, 1)*frames
# Print frames (frames has shape (frame_len, n_frames), so iterate over the transpose)
for i, frame in enumerate(frames.T):
    print("Frame {}: {}".format(i, frame))
# Print windowed frames
for i, frame in enumerate(windowed_frames.T):
    print("Win Frame {}: {}".format(i, np.round(frame, 3)))
Output:
Frame 0: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
Frame 1: [ 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Frame 2: [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31]
Frame 3: [24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
Frame 4: [32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
Frame 5: [40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55]
Frame 6: [48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63]
Frame 7: [56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71]
Frame 8: [64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
Frame 9: [72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87]
Frame 10: [80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95]
Frame 11: [ 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103]
Frame 12: [ 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111]
Frame 13: [104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119]
Frame 14: [112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127]
Win Frame 0: [0. 0.043 0.331 1.036 2.209 3.75 5.427 6.924 7.913 8.141 7.5 6.075
4.146 2.151 0.605 0. ]
Win Frame 1: [ 0. 0.389 1.654 3.8 6.627 9.75 12.663 14.836 15.825 15.377
13.5 10.493 6.91 3.474 0.951 0. ]
Win Frame 2: [ 0. 0.735 2.978 6.564 11.045 15.75 19.899 22.749 23.738 22.613
19.5 14.911 9.674 4.798 1.297 0. ]
Win Frame 3: [ 0. 1.081 4.301 9.328 15.463 21.75 27.135 30.661 31.65 29.849
25.5 19.329 12.438 6.121 1.643 0. ]
Win Frame 4: [ 0. 1.426 5.625 12.092 19.882 27.75 34.371 38.574 39.563 37.085
31.5 23.747 15.202 7.445 1.988 0. ]
Win Frame 5: [ 0. 1.772 6.948 14.856 24.3 33.75 41.607 46.486 47.476 44.321
37.5 28.165 17.966 8.768 2.334 0. ]
Win Frame 6: [ 0. 2.118 8.272 17.62 28.718 39.75 48.843 54.399 55.388 51.557
43.5 32.584 20.729 10.092 2.68 0. ]
Win Frame 7: [ 0. 2.464 9.595 20.384 33.136 45.75 56.08 62.312 63.301 58.793
49.5 37.002 23.493 11.415 3.026 0. ]
Win Frame 8: [ 0. 2.81 10.919 23.148 37.554 51.75 63.316 70.224 71.213 66.029
55.5 41.42 26.257 12.738 3.372 0. ]
Win Frame 9: [ 0. 3.156 12.242 25.912 41.972 57.75 70.552 78.137 79.126 73.265
61.5 45.838 29.021 14.062 3.718 0. ]
Win Frame 10: [ 0. 3.501 13.566 28.676 46.39 63.75 77.788 86.049 87.038 80.501
67.5 50.256 31.785 15.385 4.063 0. ]
Win Frame 11: [ 0. 3.847 14.889 31.44 50.808 69.75 85.024 93.962 94.951 87.737
73.5 54.674 34.549 16.709 4.409 0. ]
Win Frame 12: [ 0. 4.193 16.213 34.204 55.226 75.75 92.26 101.875 102.864
94.973 79.5 59.092 37.313 18.032 4.755 0. ]
Win Frame 13: [ 0. 4.539 17.536 36.968 59.645 81.75 99.496 109.787 110.776
102.209 85.5 63.51 40.077 19.356 5.101 0. ]
Win Frame 14: [ 0. 4.885 18.86 39.732 64.063 87.75 106.732 117.7 118.689
109.446 91.5 67.929 42.841 20.679 5.447 0. ]
Actually, you're applying a Hann/Hanning window. Use np.hamming() to get a Hamming window.
To split the array, you can use np.split() or np.array_split()
Here's an example:
import numpy as np
x = np.arange(0,128)
frame_size = 16
y = np.split(x,range(frame_size,x.shape[0],frame_size))
for v in y:
    print(v)
Output:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
[16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31]
[32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
[48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63]
[64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95]
[ 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111]
[112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127]
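np.split produces frames that do not overlap. For overlapping frames without extra dependencies, one possible sketch uses numpy.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later) and keeps every hop_len-th window. The frame and hop sizes below are illustrative values, not taken from the question's audio file.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(128, dtype=float)   # stand-in for the audio samples
frame_len, hop_len = 16, 8        # illustrative sizes (50% overlap)

# All windows of length frame_len, then keep one window every hop_len samples
frames = sliding_window_view(x, frame_len)[::hop_len]   # shape (n_frames, frame_len)

# Apply a Hamming window to every frame via broadcasting
windowed = frames * np.hamming(frame_len)

# One spectrum per frame
spectra = np.fft.rfft(windowed, axis=1)

print(frames.shape, spectra.shape)
```

Here each row is one frame, so the window is multiplied along the last axis and the FFT is taken with axis=1.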

GRAPH: Generate sum of squares between 1 and 10 for polynomial graph

In an attempt to write a query in Python that will generate the first one hundred sums of squares between one and ten, the following attempt is made:
for a in range(1, 10):
    for b in range(1,10):
        print(a**2+b**2)
Format should look like this:
1 1 4 9 16 25 36 49 64 81 100
4 5 8 13 20 29 40 53 68 85 104
9 10 13 18 25 34 45 58 73 90 109
16 17 20 25 32 41 52 65 80 97 116
25 26 29 34 41 50 61 74 89 106 125
36 37 40 45 52 61 72 85 100 117 136
49 50 53 58 65 74 85 98 113 130 149
64 65 68 73 80 89 100 113 128 145 164
81 82 85 90 97 106 117 130 145 162 181
100 101 104 109 116 125 136 149 164 181 200
After reading that the sum of two squares in a polynomial will always be positive, it occurred to me that it would make a good dataset to have all the numbers that could be generated by the sum of two squares. Polynomial like (a+b)*(a-b)...
It seems to me there should be an answer that builds the table from a couple of matrices combined together.
So I tried this, but the format is wrong (thanks to UnsignedFoo for the help):
df = []
for a in range(1, 10):
    val_row = " ".join([str(a**2+b**2) for b in range(1,10)])
    print("{}\n".format("["+val_row+"]"))
    df.append("{}\n".format("["+val_row+"]"))
It can be hard-coded a few rows at a time:
df1 = pd.DataFrame([[1, 2,3,4,5,6,7,8,9,10], [2,5,10,17,26,37,50,65,82,101]], columns=list('ABCDEFGHIJ'))
df2 = pd.DataFrame([[5,8,13,20,29,40,53,68,85,104], [10,13,18,25,34,45,58,73,90,109]], columns=list('ABCDEFGHIJ'))
df3 = pd.DataFrame([[17,20,25,32,41,52,65,80,97,116], [26,29,34,41,50,61,74,89,106,125]], columns=list('ABCDEFGHIJ'))
df4 = pd.DataFrame([[37,40,45,52,61,72,85,100,117,136], [50,53,58,65,74,85,98,113,130,149]], columns=list('ABCDEFGHIJ'))
df5 = pd.DataFrame([[65,68,73,80,89,100,113,128,145,164], [82,85,90,97,106,117,130,145,162,181]], columns=list('ABCDEFGHIJ'))
df6 = pd.DataFrame([[101,104,109,116,125,136,149,164,181,200], [126,125,130,137,146,157,170,185,202,221]], columns=list('ABCDEFGHIJ'))
# DataFrame.append returned a new frame (and was removed in pandas 2.0), so concatenate and reassign
df1 = pd.concat([df1, df2, df3, df4, df5, df6], ignore_index=True)
At that point it can be graphed. It would be great if the matrix could be built with less work. Up to this point Excel does everything, but Excel doesn't have matplotlib's advanced plotting:
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline
%config InlineBackend.figure_format='retina'
from mpl_toolkits import mplot3d
fig=plt.figure(figsize=(8,6))
xs=df1['A']
ys=df1['B']
zs=df1['C']
ax=fig.add_subplot(111,projection='3d')
ax.scatter(xs,ys,zs,s=50,alpha=0.6,edgecolors='w')
ax.set_xlabel('A')
ax.set_ylabel('B')
ax.set_zlabel('C')
plt.show()
A better example that is closer to where this should go is:
X = np.arange(1, 10, 1)
Y = np.arange(1, 10, 1)
X, Y = np.meshgrid(X, Y)
R = (X**2 + Y**2)# R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.viridis)
plt.show()
Still hoping someone can answer this question. I'd like to delete and start again: the "answers" don't address the issue of graphing.
The syntax error in your code is that a for statement must end with a colon (:) followed by a newline.
This results in:
for b in range(1, 10):
    for a in range(1,b):
        print(a**2+b**2)
Please note that this still does not print your table, because print ends its output with a newline by default in Python.
for a in range(1, 11):
    val_row = " ".join([str(a**2 + b**2 -1) for b in range(1,11)])
    print("{}\n".format(val_row))
Output:
1 4 9 16 25 36 49 64 81 100
4 7 12 19 28 39 52 67 84 103
9 12 17 24 33 44 57 72 89 108
16 19 24 31 40 51 64 79 96 115
25 28 33 40 49 60 73 88 105 124
36 39 44 51 60 71 84 99 116 135
49 52 57 64 73 84 97 112 129 148
64 67 72 79 88 99 112 127 144 163
81 84 89 96 105 116 129 144 161 180
100 103 108 115 124 135 148 163 180 199
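As for building the whole matrix with less work, a short NumPy sketch (offered as one possible approach, not as the answer above) is to take the outer sum of the squares with np.add.outer. The variable names n, squares and table are illustrative.

```python
import numpy as np

n = np.arange(1, 11)          # 1 .. 10
squares = n ** 2

# table[i, j] = squares[i] + squares[j]
table = np.add.outer(squares, squares)

# Print one space-separated row per line
for row in table:
    print(" ".join(str(v) for v in row))

# Matching coordinate grids, in case the table is used as Z in a surface plot
X, Y = np.meshgrid(n, n)
```

The resulting table can be passed as the Z argument to ax.plot_surface(X, Y, table) in the plotting code from the question, or wrapped in pd.DataFrame(table) if a DataFrame is preferred.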

Sort randomly generated Numpy array according to a different array

I have created 2 variables. One to hold 200 randomly generated ages, the other to hold 200 randomly generated marks.
from numpy import *
age = random.random_integers(18,40, size=(1,200))
marks = random.random_integers(0,100, size=(1,200))
I'd like to use NumPy to sort the marks array by the age array. eg:
#random student ages
[32 37 53 48 39 44 33 40 56 47]
#random student marks
[167 160 176 163 209 178 201 164 190 156]
#sorted marked according to ages
[32 33 37 39 40 44 47 48 53 56]
[167 201 160 209 164 178 156 163 176 190]
It is a similar question to this one. I am just unsure if a similar solution applies due to the elements being randomly generated.
One way is to first calculate an ordering via argsort, then use this to index your input arrays:
import numpy as np
np.random.seed(0)
ages = np.random.randint(18, 40, size=10) # [30 33 39 18 21 21 25 27 37 39]
marks = np.random.randint(0, 100, size=10) # [36 87 70 88 88 12 58 65 39 87]
order = ages.argsort() # [3 4 5 6 7 0 1 8 2 9]
print(ages[order]) # [18 21 21 25 27 30 33 37 39 39]
print(marks[order]) # [88 88 12 58 65 36 87 39 70 87]
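The arrays in the question have shape (1, 200) rather than (200,). A hedged sketch for that case keeps the 2D shape and reorders along axis 1 with np.take_along_axis. It uses np.random.default_rng, which is a substitution for the random_integers calls in the question, and the inclusive upper bounds are written as 41 and 101 for that reason.

```python
import numpy as np

rng = np.random.default_rng(0)
ages = rng.integers(18, 41, size=(1, 200))    # 18..40 inclusive
marks = rng.integers(0, 101, size=(1, 200))   # 0..100 inclusive

# Ordering of the ages within each row; "stable" keeps ties in original order
order = np.argsort(ages, axis=1, kind="stable")

# Apply the same ordering to both arrays
ages_sorted = np.take_along_axis(ages, order, axis=1)
marks_sorted = np.take_along_axis(marks, order, axis=1)

print(ages_sorted[0, :10])
print(marks_sorted[0, :10])
```

Alternatively, flattening both arrays with ravel() first makes the 1D indexing from the answer above apply directly.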

Parenthesized repetitions in Python regular expressions

I have the following string (say the variable name is "str")
(((TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)) (TRAIN (0 1 2 3 6 7 8 9 10 11 12 13 14 15 16 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 34 35 36 37 39 40 41 42 43 44 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 94 95 96 97 98 99 100 102 103 105 106 107 109 110 111 112 114 115 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 136 137 138 139 140 141 142 143 144 145 147 149 150 151))) ((TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)) (TRAIN (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 36 37 38 39 40 41 42 43 44 45 49 50 51 52 53 54 55 57 58 60 62 63 64 66 67 68 70 72 73 74 75 76 77 78 79 80 81 82 83 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 106 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151)))'
from which I would like to get
['TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)', 'TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)']
using re.findall() function in Python.
I tried the following
m = re.findall(r'TEST\s\((\d+\s?)*\)', str)
for which I get the result
['148', '130']
which is a list of only the last numbers of each set of numbers I want. I don't know why my regexp is wrong. Can someone please help me fix this problem?
Thanks!
Do not use a capturing group that repeats; only the last value will be captured. re.findall() will only return captured groups when you use them.
A non-capturing group for the repeat would work much better here:
m = re.findall(r'TEST\s\((?:\d+\s?)*\)', str)
Demo:
>>> import re
>>> s = '(((TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)) (TRAIN (0 1 2 3 6 7 8 9 10 11 12 13 14 15 16 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 34 35 36 37 39 40 41 42 43 44 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 94 95 96 97 98 99 100 102 103 105 106 107 109 110 111 112 114 115 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 136 137 138 139 140 141 142 143 144 145 147 149 150 151))) ((TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)) (TRAIN (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 36 37 38 39 40 41 42 43 44 45 49 50 51 52 53 54 55 57 58 60 62 63 64 66 67 68 70 72 73 74 75 76 77 78 79 80 81 82 83 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 106 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151)))'
>>> re.findall(r'TEST\s\((?:\d+\s?)*\)', s)
['TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)', 'TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)']
Without the capturing group, re.findall() returns the whole match.
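The difference can be reproduced on a shortened string. The sample below is a cut-down, made-up version of the input, used only to keep the sketch readable; the last line, which turns each match into a list of integers, is an extra step that the question did not ask for.

```python
import re

# Shortened, illustrative input with the same structure as the original string
s = "((TEST (4 5 17)) (TRAIN (0 1 2))) ((TEST (19 35)) (TRAIN (3 6)))"

# Repeated capturing group: findall returns only the last repetition of the group
repeated_capture = re.findall(r'TEST\s\((\d+\s?)*\)', s)

# Non-capturing group: findall returns the whole match
non_capturing = re.findall(r'TEST\s\((?:\d+\s?)*\)', s)

# Optional: extract the numbers of each match as integers
numbers = [[int(n) for n in re.findall(r'\d+', m)] for m in non_capturing]

print(repeated_capture)
print(non_capturing)
print(numbers)
```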
You can use (not worrying about the digits in between):
import re
print(re.findall(r'\((TEST.*?\))\)', s))
['TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)', 'TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)']
Try this one. After TEST, [^)]+ matches every character up to the next closing parenthesis and stops there:
re.findall(r'\((TEST[^)]+\))', s)
It yields:
['TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)',
'TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)']

Fill in missing values with nearest neighbour in Python numpy masked arrays?

I am working with a 2D Numpy masked_array in Python.
I need to change the data values in the masked area such that they equal the nearest unmasked value.
NB. If there is more than one nearest unmasked value, it can take any of them (whichever one turns out to be easiest to code...)
e.g.
import numpy
import numpy.ma as ma
a = numpy.arange(100).reshape(10,10)
fill_value=-99
a[2:4,3:8] = fill_value
a[8,8] = fill_value
a = ma.masked_array(a,a==fill_value)
>>> a
[[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 -- -- -- -- -- 28 29]
[30 31 32 -- -- -- -- -- 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 -- 89]
[90 91 92 93 94 95 96 97 98 99]],
I need it to look like this:
>>> a.data
[[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 ? 14 15 16 ? 28 29]
[30 31 32 ? 44 45 46 ? 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 ? 89]
[90 91 92 93 94 95 96 97 98 99]],
NB. where "?" could take any of the adjacent unmasked values.
What is the most efficient way to do this?
Thanks for your help.
I generally use a distance transform, as wisely suggested by Juh_ in this question.
This does not directly apply to masked arrays, but I do not think it will be hard to adapt, and it is quite efficient: I've had no problem applying it to large 100 MPix images.
Copying the relevant method here for reference:
import numpy as np
from scipy import ndimage as nd
def fill(data, invalid=None):
    """
    Replace the value of invalid 'data' cells (indicated by 'invalid')
    by the value of the nearest valid data cell
    Input:
        data: numpy array of any dimension
        invalid: a binary array of same shape as 'data'. True cells set where data
                 value should be replaced.
                 If None (default), use: invalid = np.isnan(data)
    Output:
        Return a filled array.
    """
    #import numpy as np
    #import scipy.ndimage as nd
    if invalid is None: invalid = np.isnan(data)
    ind = nd.distance_transform_edt(invalid, return_distances=False, return_indices=True)
    return data[tuple(ind)]
You could use np.roll to make shifted copies of a, then use boolean logic on the masks to identify the spots to be filled in:
import numpy as np
import numpy.ma as ma
a = np.arange(100).reshape(10,10)
fill_value=-99
a[2:4,3:8] = fill_value
a[8,8] = fill_value
a = ma.masked_array(a,a==fill_value)
print(a)
# [[0 1 2 3 4 5 6 7 8 9]
# [10 11 12 13 14 15 16 17 18 19]
# [20 21 22 -- -- -- -- -- 28 29]
# [30 31 32 -- -- -- -- -- 38 39]
# [40 41 42 43 44 45 46 47 48 49]
# [50 51 52 53 54 55 56 57 58 59]
# [60 61 62 63 64 65 66 67 68 69]
# [70 71 72 73 74 75 76 77 78 79]
# [80 81 82 83 84 85 86 87 -- 89]
# [90 91 92 93 94 95 96 97 98 99]]
for shift in (-1,1):
    for axis in (0,1):
        a_shifted=np.roll(a,shift=shift,axis=axis)
        idx=~a_shifted.mask * a.mask
        a[idx]=a_shifted[idx]
print(a)
# [[0 1 2 3 4 5 6 7 8 9]
# [10 11 12 13 14 15 16 17 18 19]
# [20 21 22 13 14 15 16 28 28 29]
# [30 31 32 43 44 45 46 47 38 39]
# [40 41 42 43 44 45 46 47 48 49]
# [50 51 52 53 54 55 56 57 58 59]
# [60 61 62 63 64 65 66 67 68 69]
# [70 71 72 73 74 75 76 77 78 79]
# [80 81 82 83 84 85 86 87 98 89]
# [90 91 92 93 94 95 96 97 98 99]]
If you'd like to use a larger set of nearest neighbors, you could perhaps do something like this:
neighbors=((0,1),(0,-1),(1,0),(-1,0),(1,1),(-1,1),(1,-1),(-1,-1),
(0,2),(0,-2),(2,0),(-2,0))
Note that the order of the elements in neighbors is important. You probably want to fill in missing values with the nearest neighbor, not just any neighbor. There's probably a smarter way to generate the neighbors sequence, but I'm not seeing it at the moment.
a_copy=a.copy()
for hor_shift,vert_shift in neighbors:
    if not np.any(a.mask): break
    a_shifted=np.roll(a_copy,shift=hor_shift,axis=1)
    a_shifted=np.roll(a_shifted,shift=vert_shift,axis=0)
    idx=~a_shifted.mask*a.mask
    a[idx]=a_shifted[idx]
Note that np.roll happily rolls the lower edge to the top, so a missing value at the top may be filled in by a value from the very bottom. If this is a problem, I'd have to think more about how to fix it. The obvious but not very clever solution would be to use if statements and feed the edges a different sequence of admissible neighbors...
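One way to avoid the wrap-around, sketched here under stated assumptions rather than as a tested replacement for the code above, is to pad the array by one cell, treat the padding as masked, and read the shifted neighbors by slicing instead of np.roll. The helper name fill_from_neighbors is hypothetical, and it works on a plain array plus a boolean mask instead of a masked_array.

```python
import numpy as np

def fill_from_neighbors(data, mask):
    """One pass: fill masked cells from an originally unmasked 4-neighbor.

    Cells outside the array count as masked, so nothing wraps around the edges.
    Returns the filled data and the mask of cells that are still unfilled.
    """
    data = data.copy()
    mask = mask.copy()
    rows, cols = data.shape
    padded = np.pad(data, 1, mode="edge")
    padded_mask = np.pad(mask, 1, mode="constant", constant_values=True)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        # Neighbor values and masks aligned with each cell of data
        nb = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
        nb_mask = padded_mask[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
        idx = mask & ~nb_mask
        data[idx] = nb[idx]
        mask[idx] = False
    return data, mask

a = np.arange(100).reshape(10, 10)
bad = np.zeros(a.shape, dtype=bool)
bad[2:4, 3:8] = True
bad[8, 8] = True
filled, remaining = fill_from_neighbors(a, bad)
print(filled)
```

For holes wider than two cells a single pass leaves some cells unfilled; calling the function again on the returned data and mask would fill the next ring.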
For more complicated cases you could use scipy.spatial:
from scipy.spatial import KDTree
x,y=np.mgrid[0:a.shape[0],0:a.shape[1]]
xygood = np.array((x[~a.mask],y[~a.mask])).T
xybad = np.array((x[a.mask],y[a.mask])).T
a[a.mask] = a[~a.mask][KDTree(xygood).query(xybad)[1]]
print(a)
[[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 13 14 15 16 17 28 29]
[30 31 32 32 44 45 46 38 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 78 89]
[90 91 92 93 94 95 96 97 98 99]]
