I've got a numpy array that looks like this:
1 0 0 0 200 0 0 0 1
6 0 0 0 2 0 0 0 4.3
5 0 0 0 1 0 0 0 7.1
The expected output would be:
1 100 100 100 200 100 100 100 1
6 4 4 4 2 3.15 3.15 3.15 4.3
5 3 3 3 1 4.05 4.05 4.05 7.1
I would like to replace all the 0 values with the average of their nearest non-zero neighbours in the same row. Any hints welcome! Many thanks!
If the structure in the sample array is preserved throughout your array, then this code will work:
In [159]: def avg_func(r):
              lavg = (r[0] + r[4])/2.0
              ravg = (r[4] + r[-1])/2.0
              r[1:4] = lavg
              r[5:-1] = ravg
              return r
In [160]: np.apply_along_axis(avg_func, 1, arr)
Out[160]:
array([[ 1. , 100.5 , 100.5 , 100.5 , 200. , 100.5 , 100.5 ,
100.5 , 1. ],
[ 6. , 4. , 4. , 4. , 2. , 3.15, 3.15,
3.15, 4.3 ],
[ 5. , 3. , 3. , 3. , 1. , 4.05, 4.05,
4.05, 7.1 ]])
As you can see, hardcoding the indexes makes this rather messy, so feel free to get creative and improve avg_func to suit your data. Also note that this implementation modifies the input array in place.
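One way to avoid the hardcoded indexes is to look up, for every zero, the nearest non-zero value on each side. The following is only a sketch under a few assumptions: zeros always mark the missing entries, each row has at least one non-zero value, and a zero at the edge of a row simply takes its single available neighbour. The function name fill_zeros_with_neighbour_mean is made up for this example.

```python
import numpy as np

def fill_zeros_with_neighbour_mean(row):
    """Return a copy of row with zeros replaced by the mean of the
    nearest non-zero values to the left and right."""
    row = np.asarray(row, dtype=float)
    out = row.copy()                      # leave the input untouched
    nz = np.flatnonzero(row)              # positions of non-zero values
    if nz.size == 0:
        return out                        # nothing to average from
    zeros = np.flatnonzero(row == 0)
    # For each zero, index (into nz) of the first non-zero to its right
    right = np.searchsorted(nz, zeros)
    left = right - 1
    # Clip so zeros at the edges reuse the only neighbour they have
    left_idx = nz[np.clip(left, 0, nz.size - 1)]
    right_idx = nz[np.clip(right, 0, nz.size - 1)]
    out[zeros] = (row[left_idx] + row[right_idx]) / 2.0
    return out

arr = np.array([[1, 0, 0, 0, 200, 0, 0, 0, 1],
                [6, 0, 0, 0, 2, 0, 0, 0, 4.3],
                [5, 0, 0, 0, 1, 0, 0, 0, 7.1]])
result = np.apply_along_axis(fill_zeros_with_neighbour_mean, 1, arr)
print(result)
```

Unlike avg_func, this version works on a copy, so arr itself is left unchanged.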
I still can't figure out the best way to do this with the least code.
I have an ndarray called X:
array([0.5 , 2 , 3.2 , 0.16 , 3.3 , 10 , 12 , 2.5 , 10 , 1.2 ])
and I want to get the 5 smallest values together with their positions in X.
That is, I want (0.5, 0.16, 1.2, 2, 2.5) and to know that they are the 1st, 4th, 10th, 2nd and 8th elements of X (they are actually the values of a row in a matrix, and I want to know the positions of the smallest 5).
Thank you!
You can use ndarray.argpartition:
import numpy as np
X = np.array([0.5 , 2 , 3.2 , 0.16 , 3.3 , 10 , 12 , 2.5 , 10 , 1.2 ])
n = 5
arg = X.argpartition(range(n))[:n]
print(arg)
# [3 0 9 1 7]
print(X[arg])
# [0.16 0.5 1.2 2. 2.5 ]
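Passing range(n) as kth asks argpartition to put each of the first n positions into sorted order. A possible alternative, sketched below, partitions once around position n - 1 and then sorts only the n selected values. The names idx and n are just for this example.

```python
import numpy as np

X = np.array([0.5, 2, 3.2, 0.16, 3.3, 10, 12, 2.5, 10, 1.2])
n = 5

# Indices of the n smallest values, in no particular order
idx = np.argpartition(X, n - 1)[:n]
# Order those n indices by the values they point at
idx = idx[np.argsort(X[idx])]

print(idx)
print(X[idx])
```

Both approaches return zero-based positions, so add 1 if you want "first", "fourth" and so on.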
My data looks like:
list=[44359, 16610, 8364, ..., 1, 1, 1]
For each element in list I want to take i*([i+1]+[i-1])/2, where i is an element in the list, and i+1 and i-1 are the adjacent elements.
For some reason I cannot seem to do this cleanly in NumPy.
Here's what I've tried:
weights=[]
weights.append(1)
for i in range(len(hoff[3])-1):
    weights.append((hoff[3][i-1]+hoff[3][i+1])/2)
I append 1 to the weights list so that the lengths will match at the end. I picked 1 arbitrarily, and I'm not sure how to deal with the leftmost and rightmost points either.
You can use numpy's array operations to represent your "loop". Think of the data as below, where pL and pR are the values you choose to "pad" your data with on the left and right:
[pL, 0, 1, 2, ..., N-2, N-1, pR]
What you're trying to do is this:
[0, ..., N - 1] * ([pL, 0, ..., N-2] + [1, ..., N -1, pR]) / 2
Written in code it looks something like this:
import numpy as np
data = np.random.random(10)
padded = np.concatenate(([data[0]], data, [data[-1]]))
data * (padded[:-2] + padded[2:]) / 2.
Repeating the first and last value is known as "extending" in image processing, but there are other edge handling methods you could try.
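If you want to try other edge handling without building the padded array by hand, np.pad can do the padding. This is a small sketch with made-up sample values; mode="edge" repeats the first and last value as above, and other modes such as "constant" or "reflect" can be swapped in.

```python
import numpy as np

data = np.array([4.0, 2.0, 6.0, 8.0])

# Pad one element on each side by repeating the edge values
padded = np.pad(data, 1, mode="edge")

# Each element times the mean of its left and right neighbours
weights = data * (padded[:-2] + padded[2:]) / 2.0
print(weights)
# [12. 10. 30. 56.]
```

To pad with a fixed value such as 1 instead, np.pad(data, 1, mode="constant", constant_values=1) should work the same way.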
I would use pandas for this, filling in the missing left- and right-most values with 1 (but you can use any value you want):
import numpy
import pandas
numpy.random.seed(0)
data = numpy.random.randint(0, 10, size=15)
df = (
    pandas.DataFrame({'hoff': data})
        .assign(before=lambda df: df['hoff'].shift(1).fillna(1).astype(int))
        .assign(after=lambda df: df['hoff'].shift(-1).fillna(1).astype(int))
        .assign(weight=lambda df: df['hoff'] * df[['before', 'after']].mean(axis=1))
)
print(df.to_string(index=False))
And that gives me:
hoff before after weight
5 1 0 2.5
0 5 3 0.0
3 0 3 4.5
3 3 7 15.0
7 3 9 42.0
9 7 3 45.0
3 9 5 21.0
5 3 2 12.5
2 5 4 9.0
4 2 7 18.0
7 4 6 35.0
6 7 8 45.0
8 6 8 56.0
8 8 1 36.0
1 8 1 4.5
A pure numpy-based solution would look like this (again, filling with 1):
before_after = numpy.ones((data.shape[0], 2))
before_after[1:, 0] = data[:-1]
before_after[:-1, 1] = data[1:]
weights = data * before_after.mean(axis=1)
print(weights)
array([ 2.5, 0. , 4.5, 15. , 42. , 45. , 21. , 12.5, 9. ,
18. , 35. , 45. , 56. , 36. , 4.5])
UPDATED:
My dataset has three columns: x, y and value.
It looks like this (already sorted):
df1:
x , y ,value
1 , 1 , 12
2 , 2 , 12
4 , 3 , 12
1 , 1 , 11
2 , 2 , 11
4 , 3 , 11
1 , 1 , 33
2 , 2 , 33
4 , 3 , 33
I need to get the rows whose distance from each other (in the x and y columns) is <= 1; let's call that my radius. At the same time I need to group the rows and keep only those where value is equal.
I had trouble comparing rows within one dataset because there was a single header, so I created a second dataset with Python commands:
df:
x , y ,value
1 , 1 , 12
2 , 2 , 12
4 , 3 , 12
x , y ,value
1 , 1 , 11
2 , 2 , 11
4 , 3 , 11
x , y ,value
1 , 1 , 33
2 , 2 , 33
4 , 3 , 33
I have tried to use this code:
def dist_value_comp(row):
    x_dist = abs(df['y'] - row['y']) <= 1
    y_dist = abs(df['x'] - row['x']) <= 1
    xy_dist = x_dist & y_dist
    max_value = df.loc[xy_dist, 'value'].max()
    return row['value'] == max_value
df['keep_row'] = df.apply(dist_value_comp, axis=1)
df.loc[df['keep_row'], ['x', 'y', 'value']]
and
filtered_df = df[df.apply(lambda line: abs(line['x']- line['y']) <= 1, 1)]
for i in filtered_df.groupby('value'):
    print(i)
Earlier I got errors caused by a malformed data frame. I have fixed that, but I still get no results in the output.
This is how I create my new data frame df from df1. If you have a better idea, please post it; this approach has a big drawback because it always prints the table. I tested it again and this function gives me an empty DataFrame.
VALUE1= df1.VALUE.unique()
def separator():
    lst=[]
    for VALUE in VALUE1:
        abc= df1[df1.VALUE==VALUE]
        print(abc)
    return lst
ab=separator()
df=pd.DataFrame(ab)
When I try the normal dataset df1, the output contains all the data, without taking the radius of 1 into account.
I need an output table like this one:
x , y ,value
1 , 1 , 12
2 , 2 , 12
x , y ,value
1 , 1 , 11
2 , 2 , 11
x , y ,value
1 , 1 , 33
2 , 2 , 33
UPDATE 2:
I am working right now with this code:
filtered_df = df[df.apply(lambda line: abs(line['x']- line['y']) <= 1, 1)]
for i in filtered_df.groupby('value'):
    print(i)
It seems to be OK (I am using df1 as input), but when I look at the output,
it does nothing, I think because it doesn't know which row the radius of +/-1 should be measured from.
My dataset has more columns, so let's take into account my 4th and 5th columns, 'D' and 'E': the radius should be measured from the row where columns D and E both have their minimum value.
df1:
x , y ,value ,D ,E
1 , 1 , 12 , 1 , 2
2 , 2 , 12 , 2 , 3
4 , 3 , 12 , 3 , 4
1 , 1 , 11 , 2 , 1
2 , 2 , 11 , 3 , 2
4 , 3 , 11 , 5 , 3
1 , 1 , 33 , 1 , 3
2 , 2 , 33 , 2 , 3
4 , 3 , 33 , 3 , 3
So the output should be the same as the one I want above, but now it is known which row the radius of +/-1 should start from.
Can anyone help me with this?
Sorry for the misunderstanding!
From what I understand, the order in which you make your operations (filter those with distance <= 1 and grouping them) has no importance.
Here is my take:
# First select the lines with the right distance
filtered_df = df[df.apply(lambda line: abs(line['x']- line['y']) <= 1, 1)]
# Then group
for i in filtered_df.groupby('value'):
    print(i)
    # Or do whatever you want
Let me know if you want some explanations on how some part of the code works.
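Note that abs(line['x'] - line['y']) compares the two columns within a single row, so the (4, 3) rows pass the check as well. If the radius is meant to be measured from a reference row in each value group, a different approach is needed. The sketch below is one possible reading of the second update, not a confirmed solution: it assumes the reference row is the one with the smallest D (ties broken by the smallest E), and the helper name within_radius is made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({
    "x":     [1, 2, 4, 1, 2, 4, 1, 2, 4],
    "y":     [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "value": [12, 12, 12, 11, 11, 11, 33, 33, 33],
    "D":     [1, 2, 3, 2, 3, 5, 1, 2, 3],
    "E":     [2, 3, 4, 1, 2, 3, 3, 3, 3],
})

def within_radius(group, radius=1):
    # Assumption: the reference row has the smallest D, then the smallest E
    ref = group.sort_values(["D", "E"]).iloc[0]
    near_x = (group["x"] - ref["x"]).abs() <= radius
    near_y = (group["y"] - ref["y"]).abs() <= radius
    return group[near_x & near_y]

# Filter each value group separately, keeping the original group order
parts = [within_radius(g) for _, g in df1.groupby("value", sort=False)]
result = pd.concat(parts)
print(result[["x", "y", "value"]].to_string(index=False))
```

With the sample data this keeps the (1, 1) and (2, 2) rows of every group and drops the (4, 3) rows.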
I have a matrix of ints.
I need to replace each 111 with the median of its 4 immediate neighbours; any neighbours that are themselves 111 are ignored.
For example:
matrix = 1 2 3 4 5 6
101 111 1 3 44 3
111 3 4 4 5 6
1 2 4 5 7 7
The expected output after replacing is:
1 2 3 4 5 6
101 2.5 1 3 44 3
3 3 4 4 5 6
1 2 4 5 7 7
My code is pretty bad and probably very slow. Any help is appreciated.
def median_fil_mat(matrix):
    rows,columns= np.where(matrix==111)
    r,c=np.shape(matrix)
    for each_row in rows:
        for each_colmn in columns:
            if each_row==r-1:
                r1=[each_row-1]
            elif each_row>0 & each_row!=r-1:
                r1= [each_row-1,each_row+1]
            else:
                r1=[each_row+1]
            if each_colmn ==c-1:
                c1=[each_colmn-1]
            elif each_colmn >0 & each_colmn!=c-1:
                c1=[each_colmn-1,each_colmn+1]
            else:
                c1=[each_colmn+1]
            med_lis=list()
            for rr in r1:
                for cc in c1:
                    med_lis.append(matrix[rr,cc])
            med_lis=[x for x in med_lis if x!=111 ]
            matrix[each_row,each_colmn]= np.median(med_lis)
    return matrix
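import numpy as np
from scipy.ndimage import generic_filter

def func(array):
    # array holds the footprint values in row-major order; index 2 is the centre
    if array[2]==111:
        return np.median(array[array!=111])
    else:
        return array[2]
# Cross-shaped footprint: the centre plus its 4 immediate neighbours
fp = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
a = np.fromstring("""1 2 3 4 5 6
101 111 1 3 44 3
111 3 4 4 5 6
1 2 4 5 7 7""", sep=" ").reshape((4, 6))
generic_filter(a, func, footprint=fp, mode='nearest')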
Returns
array([[ 1. , 2. , 3. , 4. , 5. , 6. ],
[ 101. , 2.5, 1. , 3. , 44. , 3. ],
[ 3. , 3. , 4. , 4. , 5. , 6. ],
[ 1. , 2. , 4. , 5. , 7. , 7. ]])
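generic_filter calls the Python function once per element, which can be slow on large matrices. A vectorised alternative is sketched below, assuming 111 is the only sentinel and every 111 has at least one neighbour that is not 111. It marks the sentinels as NaN, stacks the four shifted neighbour views and takes np.nanmedian. The function name replace_with_neighbour_median is made up for this example.

```python
import numpy as np

def replace_with_neighbour_median(matrix, sentinel=111):
    m = np.asarray(matrix, dtype=float)
    # Sentinels become NaN so that nanmedian ignores them
    masked = np.where(m == sentinel, np.nan, m)
    # NaN border: neighbours outside the matrix are ignored too
    p = np.pad(masked, 1, mode="constant", constant_values=np.nan)
    neighbours = np.stack([
        p[:-2, 1:-1],   # above
        p[2:, 1:-1],    # below
        p[1:-1, :-2],   # left
        p[1:-1, 2:],    # right
    ])
    med = np.nanmedian(neighbours, axis=0)
    # Only the sentinel cells take the median; the rest stay as they were
    return np.where(m == sentinel, med, m)

a = np.array([[1, 2, 3, 4, 5, 6],
              [101, 111, 1, 3, 44, 3],
              [111, 3, 4, 4, 5, 6],
              [1, 2, 4, 5, 7, 7]])
result = replace_with_neighbour_median(a)
print(result)
```

The result is a float array, since a median such as 2.5 cannot be stored in an int matrix.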
I have the following sets:
x =
[[ 0. 1. 2. 3. 4. 5. 6. 7.]
[ 0. 1. 2. 3. 4. 5. 6. 7.]
[ 0. 1. 2. 3. 4. 5. 6. 7.]
[ 0. 1. 2. 3. 4. 5. 6. 7.]
[ 0. 1. 2. 3. 4. 5. 6. 7.]]
y=
[[-0.9 -0.9 -0.9 -0.9 -0.9 -0.9 -0.9 -0.9]
[ 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]
[ 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1]
[ 2.1 2.1 2.1 2.1 2.1 2.1 2.1 2.1]
[ 3.1 3.1 3.1 3.1 3.1 3.1 3.1 3.1]]
and
Z =
[[0 0 0 0 0 0 1 1]
[0 0 0 0 1 1 1 1]
[0 0 0 1 1 1 1 1]
[0 2 2 2 2 2 2 2]
[2 2 2 2 2 2 2 2]]
I have colors = ('red', 'blue', 'green') and when I use matplotlib to draw the contour with plt.contour(x, y, Z, colors=colors), I get:
I was expecting to only have three lines dividing the 0 area from the 1 area from the 2 area. Why do I have so many?
matplotlib has interpolated between your points so that the z-value changes more gradually. You can see this by plotting your x and y values as dots (pyplot.plot(x[i][j],y[i][j],'ok'), looping over i and j). If you do this, you'll see that the lines all fall between the dots, so they actually do separate the area into three regions.
You can specify which lines to show using the levels keyword argument:
pyplot.contour(x,y,Z,levels=[.5,1.5])
(you only need two lines to separate these three areas.)
If you want to look at the matrix elements without interpolation, you can use matshow:
pyplot.matshow(Z)
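Putting the pieces together, here is a minimal sketch that rebuilds the arrays from the question, draws only the two separating contour lines and overlays the sample points as dots. Building x and y with np.meshgrid is an assumption about how the original arrays were made; the values match the ones printed in the question.

```python
import numpy as np
import matplotlib.pyplot as plt

# Rebuild the grids from the question (assumed to come from meshgrid)
x, y = np.meshgrid(np.arange(8.0), np.arange(5.0) - 0.9)
Z = np.array([[0, 0, 0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1, 1, 1, 1],
              [0, 0, 0, 1, 1, 1, 1, 1],
              [0, 2, 2, 2, 2, 2, 2, 2],
              [2, 2, 2, 2, 2, 2, 2, 2]])

fig, ax = plt.subplots()
# Two levels are enough to separate the 0, 1 and 2 regions
cs = ax.contour(x, y, Z, levels=[0.5, 1.5], colors=("red", "blue"))
# Show where the actual data points sit relative to the lines
ax.plot(x.ravel(), y.ravel(), "ok", markersize=3)
```

Call plt.show() or fig.savefig(...) afterwards to view the figure.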