I have a class task to resize an array to [0,1], i.e. so that the smallest number becomes 0 and the largest 1.
It seems to not like the 0: whenever there is a 0 in the arguments it spits out an empty array, but e.g. [5,1] works. The output for [0,1] is this:
array([], shape=(0, 1), dtype=int64)
Is there any way to make it work? The profs have said it's right and are unsure why it's not working. Colab is the environment.
test = [0, 1, 2, 3, 4, 5]
arr1 = np.array(test)

def rescale(a):
    """Return the rescaled version of a on the [0,1] interval."""
    a = np.resize(a, [0, 1])
    return a
    print(a)

rescale(arr1)
If I understand correctly, what you really want is to normalize the values of the array, so the title is a bit misleading.
np.resize() changes the shape of the array but it does not change the values.
When any dimension in the shape given to resize() is zero, the array becomes empty: the number of elements in the array is the product of the dimensions, and if one of them is zero, the product is zero. That explains your output.
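You can see this directly with a quick demonstration (any shape with a zero dimension gives an empty array):

import numpy as np

a = np.arange(6)
print(np.resize(a, (2, 3)))  # reshapes (repeating values if needed): [[0 1 2] [3 4 5]]
print(np.resize(a, (0, 1)))  # 0 * 1 = 0 elements: array([], shape=(0, 1))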
So if you'd like to normalize the values, check this post:
How to normalize a NumPy array to within a certain range?
You need to compute the increment per unit of input range, shift each value so the minimum sits at zero, multiply by this increment, and add the start of your interval:
test = [0, 1, 2, 3, 4, 5]

def rescale(a, interval):
    """Return the rescaled version of a on the given interval."""
    incr = (interval[1] - interval[0]) / (max(a) - min(a))
    a = [(i - min(a)) * incr + interval[0] for i in a]
    return a

rescale(test, [0, 1])
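If you want to stay in NumPy, the same formula vectorizes directly. A minimal sketch (the helper name rescale_np is just for illustration, and it assumes a has at least two distinct values):

import numpy as np

def rescale_np(a, interval=(0, 1)):
    """Illustrative helper: linearly rescale a onto [interval[0], interval[1]]."""
    a = np.asarray(a, dtype=float)
    # shift so the minimum sits at 0, scale to unit range, then map onto the target interval
    unit = (a - a.min()) / (a.max() - a.min())
    return unit * (interval[1] - interval[0]) + interval[0]

print(rescale_np([0, 1, 2, 3, 4, 5]))  # [0.  0.2 0.4 0.6 0.8 1. ]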
I have a 3D numpy array that represents a 3D image, and I want to create a list from it with all the (x,y,z) coordinate/index tuples that are both above a certain value and at least a certain distance away from the other coordinates also above that value. So if coords (3,4,5) and (3,3,3) were both above the value but the minimum distance apart was 4, then only one of these coords would be added to the new array (doesn't matter which).
I thought about doing something like this:
arr = [(x,y,z) for x in range(x_dim) for y in range(y_dim) for z in range(z_dim) if original_arr[z][y][x] > threshold]
That gets me arr, which contains all coordinates above the threshold. I'm stuck on how to remove the coordinates in arr that are too close to other coordinates also inside it. Checking each coordinate against every other coordinate isn't possible: the image is very large, so it would take too long.
Any ideas? Thanks
You can replace your threshold checking with:
import numpy as np
arr = np.argwhere(original_array > threshold)
The rest depends on your arr size and data type (please provide the image size and dtype so I can assist better). If the number of points above the threshold is not too high, you can use:
from sklearn.metrics.pairwise import euclidean_distances
euclidean_distances(arr,arr)
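Then compare the matrix against your distance threshold. A minimal greedy sketch (assuming the full NxN distance matrix fits in memory, with distance as your minimum separation):

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

d = euclidean_distances(arr, arr)
keep = np.ones(arr.shape[0], dtype=bool)
for i in range(arr.shape[0]):
    if keep[i]:
        # discard every later point closer than the minimum separation to point i
        keep[i+1:] &= d[i, i+1:] >= distance
arr = arr[keep]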
If it is a high number of points, you can filter with a loop instead (I usually try to avoid modifying the loop-variable array inside the loop, but in the case of a large image this will save you a lot of memory and time):
arr = np.argwhere(original_array > threshold)
for i in range(arr.shape[0]):
    try:
        diff = np.argwhere(np.sum(arr[i+1:, :] - arr[i, :], axis=1) <= distance)
        arr = np.delete(arr, diff + i + 1, axis=0)
    except IndexError:
        break
After the loop, arr will contain the coordinates you want.
Output for the sample inputs:
original_array = np.arange(40).reshape(10,2,2).astype(np.int32)
threshold = 5
distance = 3
arr:
[[1 1 0]
[4 1 1]
[8 1 1]]
distance matrix between final points:
[[0. 3.16227766 7.07106781]
[3.16227766 0. 4. ]
[7.07106781 4. 0. ]]
EDIT: per the comment, if you want to ignore distance along the z axis, replace the diff line above with:
diff = np.argwhere(np.sum((arr[i+1:, :] - arr[i, :])[:, 0:2], axis=1) <= distance)
I'm currently learning about broadcasting in NumPy, and in the book I'm reading (Python for Data Analysis by Wes McKinney) the author gives the following example to "demean" a two-dimensional array:
import numpy as np
arr = np.random.randn(4, 3)
print(arr.mean(0))
demeaned = arr - arr.mean(0)
print(demeaned)
print(demeaned.mean(0))
Which effectively causes the array demeaned to have a mean of 0.
I had the idea to apply this to an image-like, three-dimensional array:
import numpy as np
arr = np.random.randint(0, 256, (400,400,3))
demeaned = arr - arr.mean(2)
Which of course failed, because according to the broadcasting rule, the trailing dimensions have to match, and that's not the case here:
print(arr.shape) # (400, 400, 3)
print(arr.mean(2).shape) # (400, 400)
Now, I have mostly gotten it to work by subtracting the mean from every single index in the third dimension of the array:
means = arr.mean(2)  # mean across the channel axis, shape (400, 400)
demeaned = np.ones(arr.shape)
for i in range(3):
    demeaned[..., i] = arr[..., i] - means
print(demeaned.mean(2))
At this point, the returned values are very close to zero, and I think that's a floating-point precision error. Am I actually right with this thought, or is there another caveat that I missed?
Also, this doesn't seem to be the cleanest, most 'numpy' way to achieve what I wanted. Is there a function or a principle that I can make use of to improve the code?
As of numpy version 1.7.0, np.mean, and several other functions, accept a tuple in their axis parameter. This means that you can perform the operation on the planes of the image all at once:
m = arr.mean(axis=(0, 1))
This mean will have shape (3,), with one element for each plane of the image.
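Since the trailing dimensions now line up ((400, 400, 3) against (3,)), the subtraction broadcasts directly:

demeaned = arr - m  # m broadcasts across the first two axes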
If you want to subtract the means of each pixel individually, you have to remember that broadcasting aligns shape tuples on the right edge. That means that you need to insert an extra dimension:
n = arr.mean(axis=2)
n = n.reshape(*n.shape, 1)
Or
n = arr.mean(axis=2)[..., None]
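Equivalently, np.mean accepts keepdims=True, which leaves the reduced axis in place with size 1 so the result broadcasts without any manual reshaping:

demeaned = arr - arr.mean(axis=2, keepdims=True)
print(demeaned.mean(axis=2))  # ~0 everywhere, up to floating-point error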
Try np.apply_along_axis().
np.apply_along_axis(lambda x: x - np.mean(x), 2, arr)
Output: you get an array of the same shape, where each cell is demeaned along the dimension you want (the second parameter; here it is 2).
I have an image of about 8000x9000 pixels as a numpy matrix. I also have a list of indices in a 2xN numpy matrix. These indices are fractional and may fall outside the image bounds. I need to interpolate the image and find the values at the given indices; for indices that fall outside, I need to return numpy.nan. Currently I'm doing it in a for loop as below:
def interpolate_image(image: numpy.ndarray, indices: numpy.ndarray) -> numpy.ndarray:
    """
    :param image:
    :param indices: 2xN matrix. 1st row is dim1 (rows) indices, 2nd row is dim2 (cols) indices
    :return:
    """
    # Todo: Vectorize this
    M, N = image.shape
    num_indices = indices.shape[1]
    interpolated_image = numpy.zeros((1, num_indices))
    for i in range(num_indices):
        x, y = indices[:, i]
        if (x < 0 or x > M - 1) or (y < 0 or y > N - 1):
            interpolated_image[0, i] = numpy.nan
        else:
            # Todo: Do Bilinear Interpolation. For now nearest neighbor is implemented
            interpolated_image[0, i] = image[int(round(x)), int(round(y))]
    return interpolated_image
But the for loop is taking a huge amount of time (as expected). How can I vectorize this? I found scipy.interpolate.interp2d, but I'm not able to use it; can someone explain how to use it, or suggest another method? I also found this, but again it is not according to my requirements: given x and y indices, it generates interpolated matrices. I don't want that. For the given indices, I just want the interpolated values, i.e. I need a vector output, not a matrix.
I tried it like this, but as said above, it attempts a matrix output and fails:
f = interpolate.interp2d(numpy.arange(image.shape[0]), numpy.arange(image.shape[1]), image, kind='linear')
interp_image_vect = f(indices[:,0], indices[:,1])
RuntimeError: Cannot produce output of size 73156608x73156608 (size too large)
For now, I've implemented nearest-neighbor interpolation. scipy's interp2d doesn't have nearest neighbor. It would be good if the library function had nearest neighbor (so I can compare); if not, that's also fine.
It looks like scipy.interpolate.RectBivariateSpline will do the trick:
from scipy.interpolate import RectBivariateSpline
image = # as given
indices = # as given
spline = RectBivariateSpline(numpy.arange(M), numpy.arange(N), image)
interpolated = spline(indices[0], indices[1], grid=False)
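One detail: RectBivariateSpline defaults to cubic splines in both directions (kx=3, ky=3), so if you specifically want the bilinear interpolation mentioned in your Todo, request degree-1 splines:

spline = RectBivariateSpline(numpy.arange(M), numpy.arange(N), image, kx=1, ky=1)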
This gets you the interpolated values, but it doesn't give you nan where you need it. You can get that with where:
nans = numpy.zeros(interpolated.shape) + numpy.nan
x_in_bounds = (0 <= indices[0]) & (indices[0] < M)
y_in_bounds = (0 <= indices[1]) & (indices[1] < N)
bounded = numpy.where(x_in_bounds & y_in_bounds, interpolated, nans)
I tested this with a 2624x2624 image and 100,000 points in indices and all told it took under a second.
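For comparison, your current nearest-neighbour version can also be vectorized directly without scipy. A sketch under the same 2xN index convention (the name interpolate_nearest is just for illustration):

def interpolate_nearest(image: numpy.ndarray, indices: numpy.ndarray) -> numpy.ndarray:
    # illustrative helper: vectorized nearest-neighbour lookup with nan for out-of-bounds
    M, N = image.shape
    x, y = indices
    in_bounds = (x >= 0) & (x <= M - 1) & (y >= 0) & (y <= N - 1)
    # clip so the fancy indexing below is always legal, then mask afterwards
    xi = numpy.clip(numpy.round(x), 0, M - 1).astype(int)
    yi = numpy.clip(numpy.round(y), 0, N - 1).astype(int)
    values = image[xi, yi].astype(float)
    values[~in_bounds] = numpy.nan
    return values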
I have a two-dimensional array that I want to fill with values that represent powers. My problem lies in the speed of the code: the two-dimensional array is 100x100, and I don't want to first initialize it as a 100x100 list of zeros and then fill the list with values; rather, I want to fill the 100x100 two-dimensional list with values directly. My code is shown below:
x_list = np.linspace(min_x, max_x, (max_x - min_x) + 1)
y_list = np.linspace(min_y, max_y, (max_y - min_y) + 1)
X, Y = np.meshgrid(x_list, y_list)
Y = Y[::-1]

# Z is the two-dimensional list containing the power at each position in the structure to be plotted
Z = [[0 for x in range(len(x_list))] for x in range(len(y_list))]
for each_axes in range(len(Z)):
    for each_point in range(len(Z[each_axes])):
        Z[len(Z)-1-each_axes][each_point] = power_at_each_point(each_point, each_axes)
# The method power_at_each_point is the one that calculates the values in the two-dimensional array Z
An example what I want to do is instead of doing what is shown below:
Z_old = [[0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0]]
for each_axes in range(len(Z_old)):
    for each_point in range(len(Z_old[each_axes])):
        Z_old[len(Z_old)-1-each_axes][each_point] = power_at_each_point(each_point, each_axes)
I now want to not initialize the Z_old array with zeros but rather fill it up with values while iterating through it, which is going to be something like what is written below, although its syntax is horribly wrong; but that's what I want to reach in the end:
Z = np.zeros((len(x_list), len(y_list))) for Z[len(x_list) -1 - counter_1][counter_2] is equal to power_at_each_point(counter_1, counter_2] for counter_1 in range(len(x_list)) and counter_2 in range(len(y_list))]
The method power_at_each_point is shown below with its related methods, if it helps you understand what I wanted to do:
# A method to calculate the power received from one node at a receiver position, for the contourf function
def cal_pow_rec_plandwall_contour(node_index_tx, receiver):
    nodess_excel = xlrd.open_workbook(Node_file_location)
    nodes_sheet = nodess_excel.sheet_by_index(0)
    # just the coordinates of a point
    node_index_tx_coor = [nodes_sheet.cell_value(node_index_tx - 1, 3), nodes_sheet.cell_value(node_index_tx - 1, 4)]
    distance = cal_distance(node_index_tx_coor, receiver)
    if distance == 0:
        power_rec = 10 * (np.log10((nodes_sheet.cell_value(node_index_tx - 1, 0) * 1e-3)))
        return power_rec  # this is the power received at each position
    else:
        power_rec = 10 * (np.log10((nodes_sheet.cell_value(node_index_tx - 1, 0) * 1e-3))) - 20 * np.log10((4 * math.pi * distance * 2.4e9) / 3e8) - cal_wall_att([node_index_tx_coor, receiver])
        return power_rec

# A method to get each position in the structure and calculate the power received at that position, to draw the structure's contourf plot
def power_at_each_point(x_cord, y_coord):
    fa = lambda xa: cal_pow_rec_plandwall_contour(xa, [x_cord, y_coord])
    # Node_Positions_Ascending is a list containing the coordinate positions of the markers (nodes)
    return max(fa(each_node) for each_node in range(1, len(Node_Positions_Ascending) + 1))
If someone could tell me how I can fill the two-dimensional array Z with values from the bottom to the top as I did there, without initially setting the two-dimensional array to zero first, it would be much appreciated.
OK, first, you want to create a NumPy array, not a list of lists. This is almost always going to be significantly smaller, and a little faster to work on. And, more importantly, it opens the door to vectorizing your loops, which makes them a lot faster to run. So, instead of this:
Z_old = [[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]]
… do this:
Z_old = np.zeros((3, 5))
But now let's see whether we can vectorize your loop instead of modifying the values:
for each_axes in range(len(Z_old)):
    for each_point in range(len(Z_old[each_axes])):
        Z_old[len(Z_old)-1-each_axes][each_point] = each_point**2 + each_axes**2
The initial values of Z[…] aren't being used at all here, so we don't need to pre-fill them with 0, just as you suspected. What is being used at each point is r and c. (I'm going to rename your Z_old, each_axes, and each_point to Z, r, and c for brevity.) In particular, you're trying to set each Z[len(Z)-1-r, c] to r**2 + c**2.
First, let's reverse the negatives so you're setting each Z[r, c] to something—in this case, to (len(Z)-1-r)**2 + c**2.
That "something" is just a function on r and c values. Which we can get by creating aranges. In particular, arange(5) is just an array of the numbers 0, 1, 2, 3, 4, and arange(5)**2 is an array of the squares 0, 1, 4, 9, 16.
The only problem is that to get a 3x5 array out of this, we have to elementwise-add a 3x1 array and a 1x5 array (or vice versa), but we've got two 1D arrays from arange. Well, we can reshape one of them:
Z_old = ((3 - 1 - np.arange(3))**2).reshape((3, 1)) + np.arange(5)**2
You can, of course, simplify this further (you obviously don't need 3 - 1, and you can just add a new axis without reshape), but hopefully this shows directly how it corresponds to your original code.
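If the per-cell function can't be rewritten as arange arithmetic (like your real power_at_each_point), np.fromfunction gives the same shape-driven construction. Note that wrapping a scalar-only function with np.vectorize keeps the Python-level calls, so this is tidier, not faster. A sketch using the toy function from above (f is a stand-in, not your real method):

import numpy as np

def f(r, c):
    # stand-in for power_at_each_point, taking scalar indices
    return (3 - 1 - r)**2 + c**2

# np.fromfunction passes whole index grids to the function, so a
# scalar-only function must be wrapped with np.vectorize first
Z = np.fromfunction(np.vectorize(f), (3, 5))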
Having an nxn matrix (6x6 in the example below) filled only with 0s and 1s:
old_matrix=[[0,0,0,1,1,0],
[1,1,1,1,0,0],
[0,0,1,0,0,0],
[1,0,0,0,0,1],
[0,1,1,1,1,0],
[1,0,0,1,1,0]]
I want to resize it in a particular way: taking each (2x2) sub-matrix and checking whether it contains more ones or zeros. This means the new matrix will be (3x3). If there are more 1s than 0s in the sub-matrix, a 1 will be assigned at that position in the new matrix; otherwise (if there are fewer or equally many), its new value will be 0.
new_matrix=[[0,1,0],
[0,0,0],
[0,1,0]]
I've tried to achieve this by using lots of whiles. However, it doesn't seem to work. Here's what I got so far:
def convert_track(a):
    # converts the original map to an 8x8 tile track
    NEW_TRACK = []
    w = 0    # matrix width
    h = 0    # matrix height
    t_w = 0  # submatrix width
    t_h = 0  # submatrix height
    BLACK = 0  # number of ones in submatrix
    WHITE = 0  # number of zeros in submatrix
    while h <= 6:
        while w <= 6:
            l = []
            while t_h <= 2 and h <= 6:
                t_w = 0
                while t_w <= 2 and w <= 6:
                    if a[h][w] == 1:
                        BLACK += 1
                    else:
                        WHITE += 1
                    t_w += 1
                    w += 1
                h += 1
                t_h += 1
            t_w = 0
            t_h += 1
            if BLACK <= WHITE:
                l.append(0)
            else:
                l.append(1)
            BLACK = 0
            WHITE = 0
            t_h = 0
            NEW_TRACK.append(l)
    return NEW_TRACK
This raises the error list index out of range, or returns the list
[[0]]
Is there an easier way to achieve this? What am I doing wrong?
If you are willing/able to use NumPy, you can do something like this. If you're working with anything like the data you've shown, it's well worth your time to learn: operations like these can be done very efficiently and with very little code.
import numpy as np
from scipy.signal import convolve2d
old_matrix=[[0,0,0,1,1,0],
[1,1,1,1,0,0],
[0,0,1,0,0,0],
[1,0,0,0,0,1],
[0,1,1,1,1,0],
[1,0,0,1,1,0]]
a = np.array(old_matrix)
k = np.ones((2,2))
# compute sums at each submatrix
local_sums = convolve2d(a, k, mode='valid')
# restrict to sums corresponding to non-overlapping
# sub-matrices with local_sums[::2, ::2] and check if
# there are more 1 than 0 elements
result = local_sums[::2, ::2] > 2
# convert back to Python list if needed
new_matrix = result.astype(int).tolist()
Result:
>>> result.astype(int).tolist()
[[0, 1, 0], [0, 0, 0], [0, 1, 0]]
Here I've used convolve2d to compute the sums at each submatrix. From what I can tell you are only interested in non-overlapping sub-matrices, so the part local_sums[::2, ::2] chops out only the sums corresponding to those.
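If you'd rather avoid the scipy dependency, the same non-overlapping block sums can be computed with a reshape. A sketch assuming the matrix dimensions are even:

a = np.array(old_matrix)
# split each axis into (blocks, 2) and sum within every 2x2 block
block_sums = a.reshape(3, 2, 3, 2).sum(axis=(1, 3))
new_matrix = (block_sums > 2).astype(int).tolist()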