I'm trying to classify images using an Artificial Neural Network and the approach I want to try is:
Get feature descriptors (using SIFT for now)
Classify using a Neural Network
I'm using OpenCV3 and Python for this.
I'm relatively new to Machine Learning and I have the following question -
Each image that I analyse will have a different number of 'keypoints' and hence a different shape for the 2D 'descriptor' array. How do I decide the input for my ANN? For example, for one sample image the descriptor shape is (12211, 128). Do I flatten this array and use it as an input, in which case I have to worry about varying input sizes for each image, or do I compute something else for the input?
I'm not sure if this is an exact solution but this worked for me. The main idea is as follows:
Divide your image into a MxN grid.
Obtain a set number of feature points for each sub-image.
Concatenate the results for all the sub-images to obtain a feature vector for the entire image.
The supporting code is roughly as given below (the function pre_process_images):
from itertools import product

import cv2
import numpy as np


def tiles(arr, nrows, ncols):
    """
    Split a (rows, cols, channels) image array into a list of nrows x ncols
    sub-arrays, each preserving the "physical" layout of arr.
    When the array shape (rows, cols) is not divisible by (nrows, ncols),
    some of the sub-array dimensions can differ, following numpy.array_split.
    """
    rows, cols, channels = arr.shape
    col_arr = np.array_split(range(cols), ncols)
    row_arr = np.array_split(range(rows), nrows)
    return [arr[r[0]: r[-1] + 1, c[0]: c[-1] + 1]
            for r, c in product(row_arr, col_arr)]


def pre_process_images(data, dimensions=(28, 28)):
    images = data['image']
    features = []
    count = 1
    nrows, ncols = dimensions
    # Keep only the single strongest keypoint per tile so that every image
    # yields a feature vector of the same length.
    sift = cv2.xfeatures2d.SIFT_create(1)
    for arr in images:
        image_feature = []
        cut_image = tiles(arr, nrows, ncols)
        for small_image in cut_image:
            (kps, descs) = sift.detectAndCompute(small_image, None)
            if descs is None:
                # a tile with no keypoints would otherwise break the
                # fixed-length feature vector
                descs = np.zeros((1, 128), dtype=np.float32)
            image_feature.append(descs[:1].flatten())
        features.append(image_feature)
        print(count)
        count += 1
    data['sift_features'] = features
    return data
However, this is extremely slow. I'm currently working on using PCA to select features more optimally.
It also helps if you apply normalization to each image before running the feature extractor.
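As a minimal sketch of both ideas (assuming scikit-learn is available; the arrays here are random stand-ins for the real image and the stacked per-image feature vectors from pre_process_images):

import cv2
import numpy as np
from sklearn.decomposition import PCA

image = np.random.randint(0, 256, (280, 280, 3), dtype=np.uint8)  # stand-in image
# stretch pixel intensities to the full 0-255 range before running SIFT
normalized = cv2.normalize(image, None, 0, 255, cv2.NORM_MINMAX)

# X: one flattened feature vector per image, shape (n_images, n_features)
X = np.random.rand(500, 1024)
pca = PCA(n_components=128)        # the number of components is a free choice
X_reduced = pca.fit_transform(X)   # shape (500, 128)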
Related
I have some masked 1D signal data stacked in a 3D tensor, so the dimensions are [batch_size, num_signals, num_samples]. I would like to compute bootstrap averages of each set of non-masked signals. So the final dimensions should be [batch_size, num_samples]
I have a solution that is pretty painful, in a mix of Python, NumPy, and TensorFlow - it loops over each element of the batch, gets the non-masked signals, samples from that list with replacement, and computes an average - is there a better way?
import numpy as np
import tensorflow as tf

# set up data
data = tf.cast(tf.random.uniform((2, 4, 5), minval=1.0, maxval=100.0), tf.float32)
mask = tf.constant([[1, 1, 0, 0], [1, 1, 0, 1]])

# gather results for each element in batch
results = []
for n_in_batch in range(data.shape[0]):
    # select signals by mask
    valid_idxs = np.nonzero(mask[n_in_batch, :])[0]
    # sample from valid indexes with replacement
    boot_idxs = tf.numpy_function(np.random.choice, (valid_idxs, len(valid_idxs), True), tf.int64)
    # select from the data using the bootstrapped indices
    boot_signals = tf.gather(data[n_in_batch, :, :], boot_idxs)
    # compute average and save results
    av_signal = tf.reduce_mean(boot_signals, axis=0)
    results.append(av_signal)
final = tf.stack(results)
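One possible way to avoid the Python loop, sketched here as an illustration rather than a drop-in replacement: draw all bootstrap indices at once with tf.random.categorical, at the cost of fixing the number of resamples per batch element (num_resamples below is an assumed free parameter rather than the per-element count of valid signals):

import tensorflow as tf

data = tf.cast(tf.random.uniform((2, 4, 5), minval=1.0, maxval=100.0), tf.float32)
mask = tf.constant([[1, 1, 0, 0], [1, 1, 0, 1]])

num_resamples = 4
# log(0) = -inf gives masked-out signals zero sampling probability
logits = tf.math.log(tf.cast(mask, tf.float32))
# boot_idxs: [batch_size, num_resamples], uniform over the valid signals
boot_idxs = tf.random.categorical(logits, num_resamples)
# gather per batch element, then average over the resampled signals
boot_signals = tf.gather(data, boot_idxs, batch_dims=1)  # [batch, num_resamples, num_samples]
final = tf.reduce_mean(boot_signals, axis=1)             # [batch, num_samples]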
I have (for the most part) gigapixel images that I have divided into 512x512 patches. I then feed each 512x512 2D image with 3 channels into a frozen ResNet18 network for feature extraction and end up with a 1D tensor of length 512. Eventually, I concatenate all of these 1D tensors and end up with an Nx512 intermediate representation, where N is the number of patches in the gigapixel image.
Since my original gigapixel images are not all the same size, the intermediate representations range from 17x512 to 6000x512, and I am using the following strategy in order to feed them to my model. However, my preference would be a more standardized method in PyTorch (for 2D images with 3 channels we could easily use torch transforms -- not here).
feature_path = 'features.pt'
features = torch.load(feature_path, map_location=lambda storage, loc: storage)
if features.shape[0] <= median_num_patches:
    # zero-pad to length median_num_patches
    a = torch.zeros((median_num_patches - features.shape[0], 512))
    embeddings = torch.cat((features, a), dim=0)
    sample['image'] = embeddings
else:
    # subsample median_num_patches patches (max size: 6000 patches in an image)
    random_indices = torch.randint(features.shape[0], (median_num_patches, ))
    sample['image'] = features[random_indices, :]
^ As mentioned earlier, the 2D intermediate representations (Nx512) are created in an offline process and saved in features.pt files.
The above solution first finds the median size of the 2D intermediate representations, based on the number of patches in each gigapixel image. It then checks whether the current 2D intermediate representation in the batch is smaller than the median, and if so, zero-pads it to the median size. If it is larger than the median, it samples the median number of patches from it.
I am looking for a better solution than the current one, perhaps something without sampling or zero-filling and without loss of data. Thanks for any possible leads.
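Not a complete answer, but one standard PyTorch pattern worth trying, sketched here under the assumption that each sample is a dict holding an (N_i, 512) tensor under 'image': pad each batch only to its own maximum N via a custom collate_fn and carry a mask, so no patches are dropped and nothing is sampled away:

import torch
from torch.nn.utils.rnn import pad_sequence

def collate_patches(batch):
    # batch: list of dicts, each with an (N_i, 512) 'image' tensor (assumed layout)
    feats = [sample['image'] for sample in batch]
    lengths = torch.tensor([f.shape[0] for f in feats])
    # pad to the longest N in this batch only: (batch_size, max_N, 512)
    padded = pad_sequence(feats, batch_first=True)
    # mask[i, j] is True where patch j of sample i is real, not padding
    mask = torch.arange(padded.shape[1])[None, :] < lengths[:, None]
    return padded, mask

# usage sketch:
# loader = torch.utils.data.DataLoader(dataset, batch_size=8, collate_fn=collate_patches)

The downstream model then has to consume the mask (e.g. masked mean pooling or attention over patches), which is what makes this lossless compared to fixed-size padding or subsampling.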
I have an ndarray of shape (68, 64, 64) called 'prediction'. These dimensions correspond to image_number, height, width. For each image, I have a tuple of length two containing coordinates that correspond to a particular location in each 64x64 image, for example (12, 45). I can stack these coordinates into another NumPy ndarray of shape (68, 2) called 'locations'.
How can I construct a slice object or the necessary advanced-indexing indices to access these locations without using a loop? Looking for help on the syntax; using pure NumPy indexing without loops is the goal.
Working loop structure
import numpy as np

# example code with just ones... The real arrays have 'real' data.
prediction = np.ones((68, 64, 64), dtype='float32')
locations = np.ones((68, 2), dtype='uint32')

selected_location_values = np.empty(prediction.shape[0], dtype='float32')
for index, (image, coordinates) in enumerate(zip(prediction, locations)):
    # tuple() makes this index a single pixel rather than two whole rows
    selected_location_values[index] = image[tuple(coordinates)]
Desired approach
selected_location_values = np.empty(prediction.shape[0], dtype='float32')
correct_indexing = some_function_here(locations)  # ?????
selected_location_values = prediction[correct_indexing]
A straightforward indexing should work:
img = np.arange(locations.shape[0])
r = locations[:, 0]
c = locations[:, 1]
selected_location_values = prediction[img, r, c]
Fancy indexing works by selecting elements of the indexed array that correspond to the shape of the broadcasted indices. In this case, the indices are quite straightforward. You just need the range to tell you what image each location corresponds to.
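A quick self-contained check (with random stand-in data, since the real arrays aren't shown) that the fancy-indexed result matches the loop:

import numpy as np

rng = np.random.default_rng(0)
prediction = rng.random((68, 64, 64)).astype('float32')
locations = rng.integers(0, 64, size=(68, 2))

# loop version: one pixel per image
looped = np.array([img[tuple(coord)] for img, coord in zip(prediction, locations)])

# fancy-indexed version, shape (68,)
vectorized = prediction[np.arange(68), locations[:, 0], locations[:, 1]]

assert np.allclose(looped, vectorized)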
I am analyzing some image datasets using Keras, and I am stuck because I have two different image dimensions: features has 14637 images of dimension (10, 10, 3) and features2 has dimension (10, 10, 100).
Is there any way that I can merge/concatenate these two datasets together?
If features and features2 contain the features of the same batch of images, that is, features[i] is the same image as features2[i] for each i, then it makes sense to group the features into a single array using the NumPy function concatenate():
newArray = np.concatenate((features, features2), axis=3)
Here 3 is the axis along which the arrays are concatenated. In this case, you'll end up with a new array of dimension (14637, 10, 10, 103).
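A quick sketch with random data of the stated shapes confirms the resulting shape:

import numpy as np

features = np.random.rand(14637, 10, 10, 3)
features2 = np.random.rand(14637, 10, 10, 100)

newArray = np.concatenate((features, features2), axis=3)
print(newArray.shape)  # (14637, 10, 10, 103)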
However, if they refer to completely different batches of images and you would like to merge them on the first axis, such that the 14637 images of features2 are placed after the first 14637 images, then there is no way to end up with a single array, since NumPy arrays are structured as matrices, not as lists of objects.
For instance, if you try to execute:
a = np.array([[0, 1, 2]])  # shape = (1, 3)
b = np.array([[0, 1]])     # shape = (1, 2)
c = np.concatenate((a, b), axis=0)
Then, you'll get:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
since you are concatenating along axis = 0 but axis 1's dimensions differ.
If dealing with numpy arrays, you should be able to use the concatenate function and specify the axis along which the data should be merged: np.concatenate((array_a, array_b), axis=2) for individual (10, 10, C) images, or axis=3 for the full (14637, 10, 10, C) batches.
I think it would be better if you use a class.
class YourClass:
    def __init__(self, array_1, array_2):
        self.array_1 = array_1
        self.array_2 = array_2

final_array = []
for x in range(len(your_previous_one_array)):
    # pair up the x-th element of each source array in one object
    final_array.append(YourClass(your_previous_one_array[x],
                                 your_previous_two_array[x]))
Let there be some 4D array [x,y,z,k] comprised of k 3D images [x,y,z].
Is there any way to calculate the variance of each individual pixel in 3D from the 4D array?
E.g. I have a 10x10x10x5 array and would like to return a 10x10x10 variance array; the variance is calculated for each pixel (or voxel, really) along k.
If this doesn't make sense, let me know and I'll try explaining better.
Currently, my code is:
tensors = []
while error > threshold:          # error, threshold, foo, bar are placeholders
    for _ in range(5):            # arbitrary
        new_tensor = foo(bar)     # always returns array of same size
        tensors.append(new_tensor)
tensors = np.stack(tensors, axis=3)
# tensors.shape is now (x, y, z, k)
And I would like to calculate a variance array for tensors.
There is a simple way to do that if you're using numpy:
variance = tensors.var(axis=3)
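For the 10x10x10x5 example above, a quick self-contained check of the resulting shape:

import numpy as np

tensors = np.random.rand(10, 10, 10, 5)
variance = tensors.var(axis=3)
print(variance.shape)  # (10, 10, 10) - one variance per voxel, computed along k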