"Array" detection in Tensorflow - python
Can Tensorflow handle inputs of varying shapes (sizes)?
The project
I'm developing an image/shape recognizer which captures an array of {x:#,y:#} positions.
For example, a circle might look like this
[{"x":0.38,"y":0.32},{"x":0.33,"y":0.35},{"x":0.31,"y":0.4},{"x":0.31,"y":0.46},{"x":0.34,"y":0.51},{"x":0.39,"y":0.52},{"x":0.44,"y":0.51},{"x":0.47,"y":0.47},{"x":0.49,"y":0.42},{"x":0.47,"y":0.37},{"x":0.42,"y":0.34},{"x":0.37,"y":0.33}]
and a square like this
[{"x":0.15,"y":0.19},{"x":0.15,"y":0.25},{"x":0.15,"y":0.31},{"x":0.15,"y":0.37},{"x":0.14,"y":0.42},{"x":0.14,"y":0.48},{"x":0.14,"y":0.53},{"x":0.14,"y":0.59},{"x":0.14,"y":0.64},{"x":0.2,"y":0.64},{"x":0.26,"y":0.64},{"x":0.31,"y":0.65},{"x":0.37,"y":0.65},{"x":0.43,"y":0.65},{"x":0.49,"y":0.65},{"x":0.54,"y":0.65},{"x":0.6,"y":0.65},{"x":0.65,"y":0.65},{"x":0.67,"y":0.6},{"x":0.68,"y":0.55},{"x":0.68,"y":0.5},{"x":0.68,"y":0.44},{"x":0.68,"y":0.38},{"x":0.68,"y":0.32},{"x":0.67,"y":0.27},{"x":0.67,"y":0.22},{"x":0.66,"y":0.17},{"x":0.61,"y":0.15},{"x":0.56,"y":0.13},{"x":0.51,"y":0.13},{"x":0.45,"y":0.13},{"x":0.39,"y":0.13},{"x":0.33,"y":0.13},{"x":0.27,"y":0.13},{"x":0.22,"y":0.14},{"x":0.17,"y":0.15}]
Because the length of these shapes can vary, I was wondering how TensorFlow would handle it. As I understand it, the input "shape" always needs to be the same length, right?
Yes, the shape should be the same. But in your case you can make sure that, within a batch, all the arrays have the same number of elements by padding the shorter ones with dummy elements.
Just make sure that, for each batch, the shape is the same.
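As a minimal sketch of that padding idea, assuming each drawing is a list of {"x": ..., "y": ...} points and that zero padding is acceptable for your model (the function and variable names are made up for illustration):

```python
import numpy as np

def pad_batch(drawings, pad_value=0.0):
    """Pad variable-length (x, y) point sequences to a common length.

    Each drawing is a list of dicts like {"x": 0.38, "y": 0.32}; the result is
    one float32 array of shape (batch, max_len, 2) that TensorFlow can accept.
    """
    max_len = max(len(d) for d in drawings)
    batch = np.full((len(drawings), max_len, 2), pad_value, dtype=np.float32)
    for i, drawing in enumerate(drawings):
        for j, point in enumerate(drawing):
            batch[i, j] = (point["x"], point["y"])
    return batch

# Hypothetical usage, where circle and square are the point lists shown above:
# padded = pad_batch([circle, square])   # shape (2, 36, 2)
```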
Related
Create tensor with arrays of different dimensions in PyTorch
I want to concatenate arrays of different dimensions to feed them to my neural network, whose first layer will be AdaptiveAveragePooling1d. I have a dataset composed of several signals (1D arrays), each one with a different length. For example:

    array1 = np.random.randn(1200,1)
    array2 = np.random.randn(950,1)
    array3 = np.random.randn(1000,1)

I want to obtain a tensor in which I concatenate these three signals to obtain a 2D tensor. However, if I try to do

    tensor = torch.Tensor([array1, array2, array3])

it gives me this error:

    ValueError: expected sequence of length 1200 at dim 2 (got 950)

Is there a way to obtain such a thing?

EDIT: More information about the dataset:

- Each signal window represents a heart beat on the ECG recording, taken from several patients, sampled with a sampling frequency of 1000 Hz.
- The beats can have different lengths, because that depends on the heart rate of the patient.
- For each beat I need to predict the length of the QRS interval (the target of the network), which I have, expressed in milliseconds.

I have already thought of interpolating the shorter samples to the length of the longest ones, but then I would also have to change the length of the QRS interval in the labels, is that right? I have read about this AdaptiveAveragePooling1d layer, which would allow me to feed the network samples of different sizes. But my problem is how to feed the network a dataset in which each sample has a different length. How do I group them without using a filling method with NaNs or zeros? I hope I explained myself.
This disobeys the definition of a tensor and is impossible: if a tensor has shape (N x M x 1), all of the N matrices must be of size (M x 1). There are still ways to get all your arrays to the same length. Look at where your data is coming from and what its structure is, and figure out which of the following solutions would work (some of these may change the signal's derivative in a way you don't like):

- Cropping the arrays to the same size (i.e. cutting the start/end off), or zero-padding the shorter ones to the length of the longest one (I really dislike this one and it would only work for very specific applications).
- 'Stretching' the arrays to the same size by using interpolation.
- Shortening the arrays to the same size by subsampling.
- For some applications, maybe even passing the coefficients of a Fourier series of the signals.

EDIT: For heart rate, which should be a roughly periodic signal, I'd definitely crop the signal, which should work quite well. Passing FFT(equally cropped signals) or Fourier coefficients may also yield interesting results, but from my experience with neural spike data, training on the FFT of a signal like this doesn't perform any better when you have enough data to train on. Also, if you're using a fully connected network, using 1D convolutions is a good alternative to try.
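As a minimal sketch of the 'stretching' option above, here is one way to resample every signal to a common length with linear interpolation before stacking them into a tensor (the target length and variable names are assumptions, not part of the original question):

```python
import numpy as np
import torch

def resample_to_length(signal, target_len):
    """Linearly interpolate a 1D signal of shape (n, 1) to target_len samples."""
    signal = signal.ravel()
    old_x = np.linspace(0.0, 1.0, num=len(signal))
    new_x = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new_x, old_x, signal)

array1 = np.random.randn(1200, 1)
array2 = np.random.randn(950, 1)
array3 = np.random.randn(1000, 1)

target_len = 1000  # assumed common length
batch = torch.tensor(
    np.stack([resample_to_length(a, target_len) for a in (array1, array2, array3)]),
    dtype=torch.float32,
)  # shape (3, 1000), ready for a network expecting equal-length signals
```

Note that, as the question itself anticipates, stretching changes the time axis, so a label expressed in milliseconds (like the QRS interval) would have to be rescaled by the same factor.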
Dynamically change observation array length in Tensorflow?
I'm trying to build a reinforcement learning environment with tf_agents in TensorFlow. Is it possible to dynamically change the size of the observation array? For example, I want the agent to learn to find the minimum path in a weighted graph, so each episode I create a random graph. Each step the agent is on a vertex and the observation array contains the outgoing edge weights. Sometimes there is one, but sometimes more, so the size is not constant. I define the observation like this in the environment's init function, where n is the number of outgoing edges from the start vertex:

    self._observation_spec = array_spec.BoundedArraySpec(shape=(1,n), dtype=np.int32, minimum=0, name='observation')

If I later want to change the size of the array, it raises an error (ValueError: given time_spec does not match expected...). Is it possible to get around this error, or do I need to change the structure of the environment in this example?
It is definitely not possible to change the size of your observation (even if you could pass this ArraySpec check, your agent cannot handle differently sized inputs). I suggest reformatting your environment so that it supports graphs where each node has a maximum of x neighbours, and you just output a multi-hot encoded vector of size x.
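As a rough sketch of that fixed-size idea (the maximum out-degree, padding value, and spec shape below are assumptions, not taken from the original environment):

```python
import numpy as np
from tf_agents.specs import array_spec

MAX_EDGES = 8  # assumed maximum out-degree over all graphs you generate

# The spec is now fixed, no matter how many outgoing edges the current vertex has.
observation_spec = array_spec.BoundedArraySpec(
    shape=(MAX_EDGES,), dtype=np.int32, minimum=0, name='observation')

def make_observation(edge_weights):
    """Pad the outgoing edge weights of the current vertex to a fixed length."""
    obs = np.zeros(MAX_EDGES, dtype=np.int32)
    obs[:len(edge_weights)] = edge_weights  # unused slots stay at 0
    return obs

# e.g. a vertex with three outgoing edges of weights 4, 7, 2:
# make_observation([4, 7, 2]) -> array([4, 7, 2, 0, 0, 0, 0, 0], dtype=int32)
```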
Creating a 3D numpy array with different data types
I want to create a 3-dimensional numpy array that represents a Texas hold'em starting-hand matrix with corresponding frequencies for performing a certain action in a given spot facing some action (for example UTG facing a 3-bet from BU). If you google "preflop hand chart" you will find thousands of pictures of hand matrices where fold/call/raise actions are usually indicated by different colors.

I want to represent that in a numpy 3-dimensional array WITH DIFFERENT DATA TYPES, with 13 rows x 13 columns and any number of "layers" in the 3rd dimension depending on the number of actions I want to store, for example min raise/raise 3x/raise all-in/call/fold. For that I would need a different data type for the first element of the 3rd dimension and integers or decimals for the other layers: the first layer would just be text representing the starting-hand combination (like "AA" or "89suited") and the rest of the cells would be numeric.

I created an image for easier understanding of what I mean. The green layer would be string data representing the hand matrix, the yellow layer would be the number of combinations of that starting hand, and the blue layer would be, for example, how often you raise. If you look at the picture you would see that AKs gets raised 81% of the time while AQs 34% of the time. To get the green layer you would type array[:,:,0], the yellow layer would be array[:,:,1], and so forth.

I know how to create a solution for my problem using JSON, a dictionary or some other tool, but in the interest of learning and challenge I would like to solve it using numpy. I also know how to create an array of all text: I could store the numbers as strings, retrieve them as such and convert them, but that solution is also unsatisfactory. Plus, it would be beneficial to have it as a numpy array because of all the slicing and summing that you can do on an array, like knowing the total number of hands that get raised, which in this case would be the sum of (number of combos, i.e. layer 2, times the frequencies of individual starting hands getting raised). So the question boils down to: how do I create a 3D numpy array with different data types from the start?
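As a hedged sketch of one way this layout is often handled in numpy, a structured array gives a 13x13 grid whose cells hold named fields of different types, which still supports the slicing and summing described above (the field names, dtypes, and filled-in values are assumptions for illustration):

```python
import numpy as np

# Assumed fields: hand label (string), number of combos (int), raise frequency (float).
hand_dtype = np.dtype([('hand', 'U8'), ('combos', np.int32), ('raise_freq', np.float64)])

chart = np.zeros((13, 13), dtype=hand_dtype)

# Fill a couple of cells by way of example (frequencies taken from the question).
chart[0, 1] = ('AKs', 4, 0.81)
chart[0, 2] = ('AQs', 4, 0.34)

labels = chart['hand']        # plays the role of array[:, :, 0]
combos = chart['combos']      # plays the role of array[:, :, 1]
raised = (chart['combos'] * chart['raise_freq']).sum()  # total combos that get raised
```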
How do you do ROI-Pooling on Areas smaller than the target size?
I am currently trying to get the Faster R-CNN network from here to work on Windows with TensorFlow. For that, I wanted to re-implement the ROI-Pooling layer, since it is not working on Windows (at least not for me; if you have any tips on porting to Windows with TensorFlow, I would highly appreciate your comments!). According to this website, what you do is take your proposed ROI from your feature map and max-pool its content to a fixed output size. This fixed output is needed for the following fully connected layers, since they only accept a fixed-size input.

The problem now is the following: after conv5_3, the last convolutional layer before ROI pooling, the box that results from the region proposal network is mostly 5x5 pixels in size. This is totally fine, since the objects I want to detect usually have dimensions of 80x80 pixels in the original image (the downsampling factor due to pooling is 16). However, I now have to max-pool an area of 5x5 pixels and ENLARGE it to 7x7, the target size for the ROI-Pooling. My first try by simply doing interpolation did not work. Also, padding with zeros did not work. I always seem to get the same scores for my classes. Is there anything I am doing wrong? I do not want to change the dimensions of any layer, and I know that my trained network in general works because I have the reference implementation running on Linux on my dataset. Thank you very much for your time and effort :)
There is now an official TF implementation of Faster R-CNN, and other object detection algorithms, in the Object Detection API; you should probably check it out. If you still want to code it yourself, I wondered exactly the same thing as you and could not find an answer about how you're supposed to do it. My three guesses would be:

- Interpolation, but it changes the feature values, so it destroys some information.
- Resizing to 35x35 just by copying each cell 7 times and then max-pooling back to 7x7 (you don't have to actually do the resizing and then the pooling; for instance, in 1D it basically reduces to output[i] = max(input[floor(i*5/7)], input[ceil(i*5/7)]), with a similar max over 4 elements in 2D; be careful, I might have forgotten some +1/-1 or something). I see at least two problems here: some values are over-represented, being copied more than others; and worse, some (small) values will not even be copied into the output at all (which you should avoid, given that you can store more information in the output than in the input).
- Making sure all input feature values are copied at least once, exactly, into the output, at the best possible place (basically copy input[i] to output[j] with j = floor((i+1)*7/5) - 1). For the remaining spots, either leave a 0 or do interpolation. I would think this solution is the best, maybe with interpolation, but I'm really not sure at all.

It looks like smallcorgi's implementation uses my second solution (without actually resizing, just using max pooling), since it's the same implementation as for the case where the input is bigger than the output.
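Here is a minimal numpy sketch of the second guess in 1D, i.e. max-pooling a 5-wide input "up" to a 7-wide output by mapping each output cell back onto the input (this illustrates the idea behind the formula above, with interval bounds I chose myself; it is not the reference implementation):

```python
import numpy as np

def roi_pool_1d(feature, out_size):
    """For each output cell, take the max over the input cells its
    back-projected interval touches; works whether the input is
    smaller or larger than the output."""
    in_size = len(feature)
    output = np.empty(out_size, dtype=feature.dtype)
    for i in range(out_size):
        lo = int(np.floor(i * in_size / out_size))
        hi = int(np.ceil((i + 1) * in_size / out_size))
        output[i] = feature[lo:hi].max()
    return output

feature = np.array([0.1, 0.9, 0.3, 0.7, 0.5])
print(roi_pool_1d(feature, 7))
# [0.1 0.9 0.9 0.3 0.7 0.7 0.5]  every input value reaches the output, and some
# are duplicated, which is the over-representation mentioned in the answer.
```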
I know it's late, but I'm posting this answer because it might help others. I have written code that explains how ROI pooling works under different height and width conditions for both the pool and the region. You can find the code on GitHub: https://github.com/Parsa33033/RoiPooling
What do the columns and rows for images in TensorBoard mean?
I was trying to use the TensorFlow tf.image_summary op, but it wasn't clear to me how to use it. In the TensorBoard readme file they have the following sentence that confuses me:

    The dashboard is set up so that each row corresponds to a different tag, and each column corresponds to a run.

I don't understand the sentence, and thus I am having a hard time figuring out what the columns and rows mean for TensorBoard image visualization. What exactly is a "tag" and what exactly is a "run"? How do I get multiple "tags" and multiple "runs" to display? Why would I want multiple "tags" and "runs" to display? Does someone have a very simple but non-trivial example of how to use this?

Ideally, what I want is to compare how my model performs with respect to PCA, so in my head it would be nice to compare the reconstructions to the PCA reconstruction at each step. Not sure if this is a good idea, but I also want to see what the activation images look like and how the templates look. Currently I have a very simple script with the following lines:

    with tf.name_scope('input_reshape'):
        x_image = tf.to_float(x, name='ToFloat')
        image_shaped_input = tf.reshape(x_image, [-1, 28, 28, 1])
        tf.image_summary('input', image_shaped_input, 10)

Currently I have managed to discover that the rows are of length 10, so I assume it's showing me 10 images that have something to do with the current run/batch. However, if possible I'd like to see reconstructions, filters (currently I am using fully connected layers to keep things simple, but eventually it would be nice to see a conv net example), activation units (with any number of units that I choose), etc.
TensorFlow was officially released (r1.0) after this question was posed, and the functions and documentation accompanying TensorBoard have been simplified. tf.summary.image is now the op for writing images represented by a 4D tensor to the summary file; here is the documentation. To answer your questions about rows and columns: each call to tf.summary.image generates a new tag, i.e. a row of image summaries, with the total number of images dictated by the value passed as max_outputs (10 in your given example). As to why one might want to view more than one column of data: if the first dimension of the 4D tensor is greater than 1 (i.e. batch size > 1), it is helpful to see more than one column in TensorBoard to get a better sense of the entire batch of images. Finally, having multiple tags is helpful when you want to view two different collections of images, such as input images and reconstructed images if you were building an autoencoder architecture.
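As a small illustrative sketch (using the current TF 2.x summary API rather than the r1.0 graph-mode calls discussed above, so the log directory and tensor names are assumptions), writing two differently named summaries produces two tags, and max_outputs controls how many images from each batch are kept per tag:

```python
import tensorflow as tf

writer = tf.summary.create_file_writer("logs/run1")  # each log directory shows up as one "run"

# Hypothetical batches of 28x28 grayscale images, shape (batch, height, width, channels).
inputs = tf.random.uniform((10, 28, 28, 1))
reconstructions = tf.random.uniform((10, 28, 28, 1))

with writer.as_default():
    # Two calls with different names -> two tags in the Images dashboard.
    tf.summary.image("input", inputs, max_outputs=10, step=0)
    tf.summary.image("reconstruction", reconstructions, max_outputs=10, step=0)
```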