Can TensorFlow handle inputs of varying shapes (sizes)?
The project
I'm developing an image/shape recognizer which captures an array of {x:#,y:#} positions.
For example, a circle might look like this:
[{"x":0.38,"y":0.32},{"x":0.33,"y":0.35},{"x":0.31,"y":0.4},{"x":0.31,"y":0.46},{"x":0.34,"y":0.51},{"x":0.39,"y":0.52},{"x":0.44,"y":0.51},{"x":0.47,"y":0.47},{"x":0.49,"y":0.42},{"x":0.47,"y":0.37},{"x":0.42,"y":0.34},{"x":0.37,"y":0.33}]
and a square like this:
[{"x":0.15,"y":0.19},{"x":0.15,"y":0.25},{"x":0.15,"y":0.31},{"x":0.15,"y":0.37},{"x":0.14,"y":0.42},{"x":0.14,"y":0.48},{"x":0.14,"y":0.53},{"x":0.14,"y":0.59},{"x":0.14,"y":0.64},{"x":0.2,"y":0.64},{"x":0.26,"y":0.64},{"x":0.31,"y":0.65},{"x":0.37,"y":0.65},{"x":0.43,"y":0.65},{"x":0.49,"y":0.65},{"x":0.54,"y":0.65},{"x":0.6,"y":0.65},{"x":0.65,"y":0.65},{"x":0.67,"y":0.6},{"x":0.68,"y":0.55},{"x":0.68,"y":0.5},{"x":0.68,"y":0.44},{"x":0.68,"y":0.38},{"x":0.68,"y":0.32},{"x":0.67,"y":0.27},{"x":0.67,"y":0.22},{"x":0.66,"y":0.17},{"x":0.61,"y":0.15},{"x":0.56,"y":0.13},{"x":0.51,"y":0.13},{"x":0.45,"y":0.13},{"x":0.39,"y":0.13},{"x":0.33,"y":0.13},{"x":0.27,"y":0.13},{"x":0.22,"y":0.14},{"x":0.17,"y":0.15}]
Because the length of these shapes can vary, I was wondering how TensorFlow would handle it. As I understand it, the input "shape" always needs to be the same length, right?
Yes, the shape should be the same. But in your case you can pad: make sure that, within a batch, all the arrays have the same number of elements by appending dummy elements to those that fall short in length. The shapes only need to match within each batch, not across batches.
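A minimal NumPy sketch of that padding idea (the point values here are illustrative, and (0, 0) is an arbitrary choice of dummy point):

```python
import numpy as np

# Two shapes with different point counts (values are illustrative).
circle = np.array([[0.38, 0.32], [0.33, 0.35], [0.31, 0.40]])                # 3 points
square = np.array([[0.15, 0.19], [0.15, 0.25], [0.15, 0.31], [0.15, 0.37]])  # 4 points

batch = [circle, square]
max_len = max(len(s) for s in batch)

# Pad each shape with dummy (0, 0) points so the whole batch
# becomes one tensor of shape (batch_size, max_len, 2).
padded = np.stack([
    np.pad(s, ((0, max_len - len(s)), (0, 0)), mode="constant")
    for s in batch
])
print(padded.shape)  # (2, 4, 2)
```

The padded tensor can then be fed to TensorFlow as a single fixed-shape batch.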
I want to create a 3-dimensional NumPy array representing a Texas Hold'em starting-hand matrix, with the corresponding frequencies for performing a certain action in a given spot facing some action (for example, UTG facing a 3-bet from the BU).
If you google "preflop hand chart" you will find thousands of pictures of hand matrices where the fold/call/raise actions are usually indicated by different colors.
I want to represent that in a 3-dimensional NumPy array WITH DIFFERENT DATA TYPES, with 13 rows x 13 columns and any number of "layers" in the 3rd dimension depending on how many actions I want to store; for example I might want to store min raise/raise 3x/raise all-in/call/fold. For that I would need a string data type for the first layer of the 3rd dimension and integers or decimals for the other layers. The first layer would just be the text representing the starting-hand combination (like "AA" or "89suited") and the rest of the cells would be numeric.
I created an image for easier understanding of what I mean.
Green layer would be string data type representing the hand matrix.
Yellow layer would be number of combinations of that starting hand.
Blue layer would be for example how often you raise. If you look at the picture you would see that AKs gets raised 81% of the time while AQs 34% of the time.
To get green layer you would type:
array[:,:,0]
Yellow layer would be:
array[:,:,1]
and so forth.
I know how to create a solution for my problem using JSON, a dictionary or some other tool, but in the interest of learning and a challenge I would like to solve it using NumPy.
I also know how to create an array of all text: I could store the numbers as strings, retrieve them as such and convert them, but that solution is also unsatisfactory.
Plus it would be beneficial to have it as a NumPy array because of all the slicing and summing you can do on an array, like computing the total number of hands that get raised, which in this case would be the sum of (number of combos, i.e. layer 2) * (frequency of each individual starting hand getting raised).
So the question boils down to this: how do you create a 3D NumPy array with different data types from the start?
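One caveat worth noting: NumPy arrays are homogeneous, so a true 3-D array can't mix strings and numbers unless you use dtype=object, which forfeits the fast numeric operations. A structured array comes close to the layer idea, though: a 13 x 13 grid where each cell is a record with named, differently-typed fields. A sketch (the field names here are illustrative):

```python
import numpy as np

# Each cell holds a hand label plus its numeric layers.
dt = np.dtype([("hand", "U8"), ("combos", "i4"), ("raise_freq", "f4")])
grid = np.zeros((13, 13), dtype=dt)

grid[0, 0] = ("AA", 6, 0.81)

layer0 = grid["hand"]        # plays the role of array[:, :, 0]
layer1 = grid["combos"]      # plays the role of array[:, :, 1]

# The numeric fields still support the usual slicing and summing,
# e.g. total combos raised = sum over cells of combos * raise frequency.
total_raised = (grid["combos"] * grid["raise_freq"]).sum()
```

Indexing is by field name rather than a third integer axis, but the slicing and aggregation the question asks for all still work.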
The images in my dataset exceed the maximum input size of my CNN. This means I need to slice my images into parts to feed to the CNN. The simplest method is like this, resulting in 4 sliced images, where 2 do not contain any labels:
The slices are marked by the dotted line.
I was wondering if there is an algorithm that finds the optimal slices to get a minimum number of sliced images while still including all labels. Like in this example, where the minimum number of slices is 1:
Is there any algorithm implemented in Python that does something like this?
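In the general case this is a set-cover-style problem, usually handled with greedy heuristics such as clustering nearby label boxes. As a starting point, a simple check for the single-slice case is whether the union bounding box of all labels fits inside one slice (the box format below is an assumption):

```python
def single_slice_cover(labels, slice_w, slice_h):
    """labels: list of (x0, y0, x1, y1) label boxes.
    Return the top-left anchor of one slice covering all labels,
    or None if no single slice of the given size can contain them."""
    x0 = min(b[0] for b in labels)
    y0 = min(b[1] for b in labels)
    x1 = max(b[2] for b in labels)
    y1 = max(b[3] for b in labels)
    if x1 - x0 <= slice_w and y1 - y0 <= slice_h:
        return (x0, y0)
    # Otherwise fall back to a multi-slice heuristic (e.g. greedily
    # grouping label boxes until each group fits one slice).
    return None

print(single_slice_cover([(5, 5, 15, 15), (30, 30, 40, 40)], 50, 50))  # (5, 5)
print(single_slice_cover([(5, 5, 15, 15), (30, 30, 40, 40)], 20, 20))  # None
```

This is only the easy case; I'm not aware of an off-the-shelf Python implementation of the full optimal-slicing problem.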
I want to find the 1D correlation between two matrices. These two matrices are the outputs of a convolution operation on two different images. Let's call the first matrix A and the other matrix B. Both matrices have the shape 100 x 100 x 64 (say).
I've been following a research paper which basically computes 1D correlation between these two matrices (matrix A and matrix B) in one of the steps and the output of the correlation operation is also a matrix with the shape 100 x 100 x 64. The link to the paper can be found here. The network can be found on Page 4. The correlation part is in the bottom part of the network. A couple of lines have been mentioned about it in the 2nd paragraph of section 3.3 (on the same page, below the network).
I am not really sure what they mean by 1D correlation and more so how to implement it in Python. I am also confused as to how the shape of the output remains the same as the input after applying correlation. I am using the PyTorch library for implementing this network.
Any help will be appreciated. Thanks.
So they basically have 1 original image, which they treat as the left-side view for the depth perception algorithm, but since you need stereo vision to calculate depth from a still image, they use a neural network to synthesise a right-side view.
1-dimensional correlation takes 2 sequences and calculates the correlation at each point, giving you another 1D sequence of the same length as the 2 inputs. So if you apply this correlation along a certain axis of a tensor, the resultant tensor does not change shape.
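A NumPy sketch of what "same"-mode 1D correlation along the width axis looks like (this is my reading of the operation, not necessarily the paper's exact implementation):

```python
import numpy as np

def corr1d(A, B):
    """'Same'-mode 1D correlation of A and B along the width axis.
    A, B: arrays of shape (H, W, C), e.g. 100 x 100 x 64.
    The output keeps the same (H, W, C) shape as the inputs."""
    H, W, C = A.shape
    out = np.empty_like(A)
    for i in range(H):
        for c in range(C):
            # Correlate the i-th row of channel c in A against the
            # matching row in B; mode='same' preserves the width W.
            out[i, :, c] = np.correlate(A[i, :, c], B[i, :, c], mode="same")
    return out

A = np.random.rand(4, 5, 3)
B = np.random.rand(4, 5, 3)
print(corr1d(A, B).shape)  # (4, 5, 3) -- shape is preserved
```

The Python loops are only for clarity; in PyTorch the same per-row correlation can be expressed as a grouped `conv1d`, which is what you would use inside the network.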
Intuitively, they thought it made sense to correlate the images along the horizontal axis, a bit like reading the images the way you read a book. In this instance it should have an effect akin to identifying that things which are further away also appear as points that are closer together in the left- and right-side views. The correlation is probably higher for left- and right-side data points that are further away, and this makes the depth classification much easier for the neural network.
I want to run a simple MLP classifier (scikit-learn) on the following set of data.
The data set consists of 100 files containing sound signals. Each file has two columns (two signals); the number of rows (the length of the signals) varies from file to file, ranging between 70 and 80 values. So the dimensions of a file range from 70 x 2 to 80 x 2. Each file represents one complete record.
The problem I am facing is how to train a simple MLP on variable-length data, with the training and testing sets containing 75 and 25 files respectively.
One solution is to concatenate all the files into one, i.e. 7500 x 2, and train the MLP on that. But important information in the signals is no longer usable in this case.
Three approaches in order of usefulness. Approach 1 is strongly recommended.
1st Approach - LSTM/GRU
Don't use a simple MLP. The data you're dealing with is sequential, and recurrent networks (LSTM/GRU) were created for exactly this purpose: they are capable of processing variable-length sequences.
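A minimal PyTorch sketch of feeding variable-length sequences to a GRU (the lengths and layer sizes here are illustrative):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Hypothetical batch: two recordings of different lengths, 2 channels each.
seqs = [torch.randn(70, 2), torch.randn(78, 2)]
lengths = torch.tensor([len(s) for s in seqs])

# Pad to the batch maximum, then pack so the GRU skips the padded steps.
padded = pad_sequence(seqs, batch_first=True)            # (2, 78, 2)
packed = pack_padded_sequence(padded, lengths,
                              batch_first=True, enforce_sorted=False)

gru = nn.GRU(input_size=2, hidden_size=32, batch_first=True)
_, h = gru(packed)   # h: (1, 2, 32) -- one fixed-size vector per recording
print(h.shape)
```

The final hidden state gives each file a fixed-size representation regardless of its original length, which a classification head can then consume.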
2nd Approach - Embeddings
Find a function that transforms your data into a fixed-length vector, called an embedding. An example of a network producing time-series embeddings is TimeNet. However, that essentially brings us back to the first approach.
3rd Approach - Padding
If you can find a reasonable upper bound for the sequence length, you can pad shorter series to the length of the longest one (pad with 0 at the beginning/end of the series, or interpolate/forecast the remaining values), or cut longer series to the length of the shortest one. Obviously you will either introduce noise or lose information, respectively.
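For this data the padding route is straightforward, since 80 is a hard upper bound. A sketch (the signals here are random stand-ins for the real files):

```python
import numpy as np

def pad_signal(sig, target_len=80):
    """Zero-pad a (length, 2) signal at the end to target_len rows."""
    return np.pad(sig, ((0, target_len - len(sig)), (0, 0)), mode="constant")

# Hypothetical files: signals of 70 and 75 rows.
signals = [np.random.rand(70, 2), np.random.rand(75, 2)]

# Pad everything to the upper bound (80), then flatten each file into
# one fixed-length feature vector suitable for MLPClassifier.
X = np.stack([pad_signal(s).ravel() for s in signals])
print(X.shape)  # (2, 160)
```

`X` can then go straight into `sklearn.neural_network.MLPClassifier.fit` alongside the per-file labels.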
This is a very old question; however, it is closely related to my recent research topic. Aechlys provides alternatives to solve your problem, which is great. Let me put it more clearly: neural networks can be divided into two sorts according to input length, fixed-size and varying-size.
For fixed-size inputs, the most common example is the MLP. Traditionally, it is insensitive to the position of its inputs; in other words, you assume that the order of your input features does not matter. For instance, if you use age, sex and education to predict a person's salary, these features can be fed to the MLP in any order.
For varying-size inputs, model architectures include the RNN, LSTM and Transformer. They are specifically designed for sequential data like text and time series, which have a natural order over their data points, and they can deal with varying-size inputs directly.
To summarize, an MLP is probably the wrong model for your signals. A better choice is an RNN or a Transformer.