Minimum number of image slices - CNN - Python

The images of my dataset exceed the maximum input size for my CNN. This means I need to slice my images into parts to feed to the CNN. The simplest method is like this, resulting in 4 sliced images, of which 2 do not contain any labels:
The slices are marked by the dotted line.
I was wondering if there is an algorithm that finds the optimal slices, i.e. the minimum number of sliced images that still includes all labels. In this example, the minimum number of slices is 1:
Is there any algorithm implemented in Python that does something like this?
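I don't know of a ready-made Python implementation for exactly this; in general it is a covering problem, so a greedy heuristic is a reasonable start. Below is a minimal greedy sketch, assuming labels are given as (x0, y0, x1, y1) bounding boxes, the crop size is fixed, and every label box fits inside a single crop. It is not guaranteed optimal.
def greedy_slices(boxes, crop_w, crop_h):
    # boxes: list of (x0, y0, x1, y1) label bounding boxes
    # assumes every box fits inside one crop_w x crop_h window
    remaining = sorted(boxes)  # left-most / top-most uncovered box first
    crops = []
    while remaining:
        cx, cy = remaining[0][0], remaining[0][1]  # anchor a crop at the first uncovered box
        crops.append((cx, cy, cx + crop_w, cy + crop_h))
        # drop every box that now lies fully inside this crop
        remaining = [b for b in remaining
                     if not (b[0] >= cx and b[1] >= cy and
                             b[2] <= cx + crop_w and b[3] <= cy + crop_h)]
    return crops
Each crop covers at least one previously uncovered label, so the loop terminates as long as every label fits in a single crop.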

Related

Create tensor with arrays of different dimensions in PyTorch

I want to concatenate arrays of different dimensions to feed them to my neural network, whose first layer will be AdaptiveAveragePooling1d. I have a dataset that is composed of several signals (1D arrays), each one with a different length. For example:
import numpy as np

array1 = np.random.randn(1200, 1)
array2 = np.random.randn(950, 1)
array3 = np.random.randn(1000, 1)
I want to obtain a tensor in which I concatenate these three signals to obtain a 2D tensor.
However, if I try to do
tensor = torch.Tensor([array1, array2, array3])
It gives me this error:
ValueError: expected sequence of length 1200 at dim 2 (got 950)
Is there a way to obtain such a thing?
EDIT
More information about the dataset:
Each signal window represents a heartbeat in the ECG recording, taken from several patients and sampled at 1000 Hz
The beats can have different lengths, because the duration depends on the patient's heart rate
For each beat I need to predict the length of the QRS interval (the target of the network), which I have, expressed in milliseconds
I have already thought of interpolating the shorter samples to the length of the longest ones, but then I would also have to change the length of the QRS interval in the labels, is that right?
I have read about the AdaptiveAveragePooling1d layer, which would allow me to feed the network samples of different sizes. But my problem is how to feed the network a dataset in which each sample has a different length. How do I group them without padding with NaNs or zeros?
I hope I explained myself.
This disobeys the definition of a tensor and is impossible. If a tensor has shape (N x M x 1), all of the N matrices must be of size (M x 1).
There are still ways to get all your arrays to the same length. Look at where your data is coming from and what its structure is, and figure out which of the following solutions would work. Some of these may change the signal's derivative in a way you don't like.
Cropping arrays to the same size (i.e. cutting the start/end off), or zero-padding the shorter ones to the length of the longest one (I really dislike this one and it would only work for very specific applications)
'Stretching' the arrays to the same size using interpolation (sketched below)
Shortening the arrays to the same size by subsampling
For some applications, maybe even passing the coefficients of a Fourier series computed from the signals
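As a concrete illustration of the cropping and interpolation options above, a minimal numpy/scipy sketch (the signal lengths mirror the arrays in the question):
import numpy as np
from scipy.interpolate import interp1d

signals = [np.random.randn(1200), np.random.randn(950), np.random.randn(1000)]

# Option 1: crop every signal to the shortest length
min_len = min(len(s) for s in signals)
cropped = np.stack([s[:min_len] for s in signals])  # shape (3, 950)

# Option 2: 'stretch' every signal to a common length by interpolation
target = 1200
stretched = np.stack([
    interp1d(np.linspace(0, 1, len(s)), s)(np.linspace(0, 1, target))
    for s in signals
])  # shape (3, 1200)
Either result stacks cleanly into a single 2D tensor afterwards, e.g. torch.from_numpy(cropped).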
EDIT
For heart rate, which should be a roughly periodic signal, I'd definitely crop the signal, which should work quite well. Passing FFT(equally cropped signals) or Fourier coefficients may also yield interesting results, but in my experience with neural spike data, training on the FFT of a signal like this doesn't perform any better once you have enough data to train on.
Also, if you're using a fully connected network, using 1D convolutions is a good alternative to try.
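For reference, a tiny 1D-convolution model of the kind suggested; the layer sizes here are placeholders I picked for illustration, not something from the original answer:
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),  # collapses any remaining length to 1
    nn.Flatten(),
    nn.Linear(32, 1),         # regress the QRS interval length in ms
)

x = torch.randn(8, 1, 1000)   # a batch of 8 beats, cropped to 1000 samples
print(model(x).shape)         # torch.Size([8, 1])
Note that the AdaptiveAvgPool1d at the end also lets different batches use different crop lengths.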

Creating a 3D numpy array with different data types

I want to create a numpy 3-dimensional array that represents a Texas hold'em starting-hand matrix with the corresponding frequencies for performing a certain action in a given spot facing some action (for example, UTG facing a 3-bet from BU).
If you google 'preflop hand chart' you will find thousands of pictures of hand matrices where fold/call/raise actions are usually indicated by different colors.
I want to represent that in a numpy 3-dimensional array WITH DIFFERENT DATA TYPES, with 13 rows x 13 columns and any number of "layers" in the 3rd dimension depending on the number of actions I want to store; for example, I might want to store min raise/raise 3x/raise all-in/call/fold. For that I would need a different data type for the first element of the 3rd dimension and integers or decimals for the other layers. The first layer would be just the text representing the starting-hand combination (like "AA" or "89suited") and the rest of the cells would be numeric.
I created an image for easier understanding of what I mean.
Green layer would be string data type representing the hand matrix.
Yellow layer would be number of combinations of that starting hand.
Blue layer would be for example how often you raise. If you look at the picture you would see that AKs gets raised 81% of the time while AQs 34% of the time.
To get the green layer you would type:
array[:,:,0]
Yellow layer would be:
array[:,:,1]
and so forth.
I know how to create a solution for my problem using JSON, a dictionary or some other tool, but in the interest of learning and a challenge I would like to solve it using numpy.
I also know how to create an array of all text: I could store the numbers as strings, retrieve them as such and convert them, but that solution is also unsatisfactory.
Plus, it would be beneficial to have it as a numpy array because of all the slicing and summing you can do on an array, like computing the total number of hands that get raised, which in this case would be the sum of (number of combos, i.e. layer 2) * (frequencies of individual starting hands getting raised).
So the question boils down to: how do I create a 3D numpy array with different data types from the start?
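One way to get mixed types in a single numpy array is a structured dtype. Strictly speaking this gives a 2D 13 x 13 array of records rather than a true 3D array, but indexing a field behaves exactly like selecting a "layer". A sketch, with field names I made up for illustration:
import numpy as np

hand_dtype = np.dtype([
    ('hand',       'U10'),  # text layer, e.g. "AKs" (green)
    ('combos',     'i4'),   # number of combinations (yellow)
    ('raise_freq', 'f4'),   # raise frequency (blue)
])
chart = np.zeros((13, 13), dtype=hand_dtype)

chart[0, 1] = ('AKs', 4, 0.81)
print(chart['hand'])        # the whole text "layer" at once
total_raised = (chart['combos'] * chart['raise_freq']).sum()
The last line is exactly the "combos times raise frequency" sum described above.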

1D correlation between 2 matrices

I want to find the 1D correlation between two matrices. These two matrices are the output of a convolution operation on two different images. Let's call the first matrix A and the other one matrix B. Both matrices have the shape 100 x 100 x 64 (say).
I've been following a research paper which basically computes 1D correlation between these two matrices (matrix A and matrix B) in one of the steps and the output of the correlation operation is also a matrix with the shape 100 x 100 x 64. The link to the paper can be found here. The network can be found on Page 4. The correlation part is in the bottom part of the network. A couple of lines have been mentioned about it in the 2nd paragraph of section 3.3 (on the same page, below the network).
I am not really sure what they mean by 1D correlation and more so how to implement it in Python. I am also confused as to how the shape of the output remains the same as the input after applying correlation. I am using the PyTorch library for implementing this network.
Any help will be appreciated. Thanks.
So they basically have 1 original image, which they treat as the left-side view for the depth perception algorithm, but since you need stereo vision to calculate depth in a still image, they use a neural structure to synthesise a right-side view.
1-dimensional correlation takes 2 sequences and calculates the correlation at each point, giving you another 1D sequence of the same length as the 2 inputs. So if you apply this correlation along a certain axis of a tensor, the resulting tensor does not change shape.
Intuitively they thought it made sense to correlate the images along the horizontal axis, a bit like reading the images the way you read a book, but in this instance it should have an effect akin to identifying that things that are further away also appear as points that are closer together in the left- and right-side views. The correlation is probably higher for left- and right-side data points that are further away, and this makes the depth classification much easier for the neural network.
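One common reading of this (I can't guarantee it matches the paper exactly) is the 1D correlation layer used in FlowNet/DispNet-style networks: for each displacement d along the width, take the channel-wise dot product between A and a d-shifted B. With 64 displacement levels, two 100 x 100 x 64 inputs again yield a 100 x 100 x 64 output. A PyTorch sketch:
import torch

def corr_1d(a, b, max_disp=64):
    # a, b: (N, C, H, W) feature maps from the two views.
    # Returns (N, max_disp, H, W): one correlation map per displacement.
    n, c, h, w = a.shape
    maps = []
    for d in range(max_disp):
        shifted = torch.zeros_like(b)
        shifted[..., d:] = b[..., :w - d]      # shift b right by d pixels
        maps.append((a * shifted).sum(dim=1))  # dot product over channels
    return torch.stack(maps, dim=1)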

How to feed a Scikit-learn MLP classifier with variable-length input data

I want to run a simple MLP Classifier (Scikit-learn) on the following set of data.
The data set consists of 100 files containing sound signals. Each file has two columns (two signals) and a number of rows (the length of the signals). The number of rows varies from file to file, ranging between 70 and 80 values, so the file dimensions range from 70 x 2 to 80 x 2. Each file represents one complete record.
The problem I am facing is how to train a simple MLP with variable-length data, with the training and testing sets containing 75 and 25 files respectively.
One solution is to concatenate all files into one file, i.e. 7500 x 2, and train the MLP on that. But then the important per-signal information is no longer usable.
Three approaches in order of usefulness. Approach 1 is strongly recommended.
1st Approach - LSTM/GRU
Don't use a simple MLP. The type of data you're dealing with is sequential data. Recurrent networks (LSTM/GRU) were created for this purpose. They are capable of processing variable-length sequences.
2nd Approach - Embeddings
Find a function that can transform your data into a fixed-length representation, called an embedding. An example of a network producing time-series embeddings is TimeNet. However, that essentially brings us back to the first approach.
3rd Approach - Padding
If you can find a reasonable upper bound for the length of a sequence, you can pad shorter series to the length of the longest one (pad 0 at the beginning/end of the series, or interpolate/forecast the remaining values), or cut longer series to the length of the shortest one. Obviously you will either introduce noise or lose information, respectively. A padding sketch follows below.
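For the padding approach with the data described in the question, a minimal sketch (files and y are hypothetical stand-ins for the 100 loaded files and their class labels):
import numpy as np
from sklearn.neural_network import MLPClassifier

max_len = 80
X = np.stack([
    np.pad(f, ((0, max_len - len(f)), (0, 0)))  # zero-pad rows up to 80
    for f in files
]).reshape(len(files), -1)                       # flatten to (100, 160)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
clf.fit(X[:75], y[:75])
print(clf.score(X[75:], y[75:]))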
This is a very old question; however, it is closely related to my recent research topic. Aechlys already provides alternatives, which is great. Let me clarify further. Neural networks can be divided into two sorts according to their input length: fixed-size and varying-size.
For fixed-size, the most common example is the MLP. Traditionally, it is insensitive to the position of its inputs; in other words, you assume that the order of your input features does not matter. For instance, you might use age, sex, and education to predict a person's salary; these features can be placed in any position of the MLP's input.
For varying-size, model architectures include the RNN, LSTM, and Transformer. They are specifically designed for sequential data like text and time series, which have a natural order within their data points, and they can deal with varying-size inputs.
To summarize, an MLP may be the wrong model for dealing with signals. The better choice is to adopt an RNN or Transformer.

Image feature detection with a large structuring element

I am trying to extract some features from an image, but each of the extracted features is really small. The easiest way to extract larger features seems to be to use a larger structuring element, but the following code fails when ITER > 1.
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage, misc

lena = misc.lena().astype(np.float64)  # misc.lena() was removed in newer scipy; misc.face(gray=True) works as a stand-in
lena /= ndimage.maximum(lena)
lena = lena > 0.54  # convert to binary image
# =====================
ITER = 1  # || FAILS WHEN ITER > 1 ||
# =====================
struct = ndimage.generate_binary_structure(2, 1)
struct = ndimage.iterate_structure(struct, ITER)
lena_label, n = ndimage.label(lena, struct)
slices = ndimage.find_objects(lena_label)
images = [lena[sl] for sl in slices]
plt.imshow(images[0])
plt.show()
RuntimeError: structure dimensions must be equal to 3
The parameter structure for the ndimage.label function is used to determine the connectivity of the input. When you represent the input as a rectangular matrix, this connectivity commonly regards either the 4 or the 8 neighbors around a point p. Scipy follows this convention and limits the accepted structure to such cases, therefore it raises an error when anything larger than 3x3 is passed to the function.
If you really want to do such a thing, first you need to define very clearly the connectivity you are trying to describe, and then implement it. A simpler way is to first dilate the input and then label it. This will effectively produce the larger features that would result from labeling with a larger structure parameter.
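A sketch of that dilate-then-label route, on a made-up binary image (the 0.7 threshold and the iteration count are arbitrary):
import numpy as np
from scipy import ndimage

img = np.random.rand(100, 100) > 0.7  # hypothetical binary input

# grow features with an iterated structuring element, then label
# the dilated image with an ordinary 3x3-compatible structure
struct = ndimage.generate_binary_structure(2, 1)
big_struct = ndimage.iterate_structure(struct, 2)
dilated = ndimage.binary_dilation(img, structure=big_struct)
labels, n = ndimage.label(dilated)    # default 3x3 connectivity is fine here

labeled_orig = labels * img           # map labels back onto the original pixels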
