Creating feature matrix and target vector - python

I have a total of 100 rows in my dataset. I want to create a feature matrix and target vector from only 1% of the rows, with random_state=150. How can I do that?
I tried train_test_split.
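train_test_split can do this if you ask for a 1% split and keep that split. A minimal sketch on synthetic data (the DataFrame, column names, and feature count are assumptions, not from the question):

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 100-row dataset: three feature columns and a 'target' column.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)),
                  columns=['f1', 'f2', 'f3', 'target'])

X = df.drop(columns=['target'])  # feature matrix
y = df['target']                 # target vector

# test_size=0.01 keeps 1% of the 100 rows (i.e. one row) in the second split.
_, X_small, _, y_small = train_test_split(X, y, test_size=0.01,
                                          random_state=150)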

Related

Deep NN architecture for predicting a matrix from a matrix and list of floats

I am trying to predict a matrix (size RxC) from an input matrix (size RxC) and a list of floats L (length P). Essentially, I am trying to build an ML model to replace a ray-tracing simulation: for each L, the ray-tracing software spends about 20-30 minutes generating a new matrix, so I want to bypass the ray-tracing by training an NN to predict the new matrix from a baseline input matrix and a list of floats containing the deviations of the ray-tracing parameters from their baseline values.
There is a spatial relationship in the matrix, i.e. M[0,0] has a relationship with M[1,1]. But since I will also have a list of floats as an input, I don't think using a CNN will work, will it?
Is there any known architecture for this task?
I am thinking of flattening the input (so the input dimension becomes R*C+P) and flattening the output layer as well (dimension R*C).
Thanks a lot, and any suggestions are appreciated! Cheers!
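For reference, a minimal Keras sketch of the flattened architecture described above (the R, C, P values and the hidden-layer size are arbitrary assumptions):

from tensorflow import keras
from tensorflow.keras import layers

R, C, P = 32, 32, 8  # hypothetical sizes

# The flattened architecture from the question: the input matrix and
# the float list are concatenated into one R*C+P vector, and the
# output layer predicts the flattened R*C target matrix.
model = keras.Sequential([
    keras.Input(shape=(R * C + P,)),
    layers.Dense(512, activation='relu'),
    layers.Dense(R * C),
])
model.compile(optimizer='adam', loss='mse')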

Can we use a CNN to effectively learn from a table whose rows are permuted in the test dataset, while the output for each remains the same?

I have a set of classes, 37 to be precise. Each class is represented by a feature vector of size [1,10].
A single input sample has dimensions [6,10], where each row in this sample represents a different class.
The training input contains only one arrangement of these rows, whereas during testing the rows of an input sample can be permuted, and the output should remain the same.
Can I use a CNN to drop the need to train on every permutation, by using a kernel of size [1,10] that effectively slides vertically over the input sample?
The training dataset is fairly small (500 samples).
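One way to make the output invariant to row order is to follow the [1,10] kernel with a permutation-invariant pooling step. A minimal Keras sketch (the filter count and the placement of the 37-way output are assumptions):

from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 37

# A (1, 10) kernel scores each row independently; global max pooling
# then discards row order, so permuted inputs give the same output.
model = keras.Sequential([
    keras.Input(shape=(6, 10, 1)),
    layers.Conv2D(64, kernel_size=(1, 10), activation='relu'),
    layers.GlobalMaxPooling2D(),
    layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')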

How to perform regression using a 2-D matrix to predict a single constant value?

How do I predict a single target value of size [1 x 1] from an input of size [M x N]?
I have worked with data where every row of features had a corresponding 'target' value that my prediction tries to match, so the dimensions of the input and output were matched.
I could duplicate the single 'target' value M times to create a target of size [M x 1], but I don't think that is theoretically equivalent: using a single row of features to predict a single target (albeit a constant one), and then tuning the model over many rows (iterations), is not the same as using multiple rows and multiple columns together to predict the single target value. Am I wrong?
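A common way to keep one target per matrix is to flatten each [M x N] matrix into a single M*N feature vector, so that one matrix is one sample. A minimal scikit-learn sketch on synthetic data (the sizes and the choice of Ridge are arbitrary assumptions):

import numpy as np
from sklearn.linear_model import Ridge

M, N = 10, 5       # hypothetical matrix size
n_matrices = 200   # hypothetical number of matrices

# Each matrix becomes one flattened sample with one scalar target.
rng = np.random.default_rng(0)
X = rng.normal(size=(n_matrices, M * N))  # flattened matrices
y = rng.normal(size=n_matrices)           # one target per matrix

model = Ridge().fit(X, y)
pred = model.predict(X[:1])  # shape (1,): one value per matrix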

Classification of individual values with matrix

I have a series of 30x30 matrices containing elements ranging from 0 to 75 (input matrices), and each one has a corresponding 30x30 matrix containing only 1s and 0s (output matrix). I am trying to train a classifier on the input matrices to predict the output matrices, but I am not sure how best to represent the input matrices for the classifier (ideally scikit-learn). I can't abstract the matrices into another form, as each element of the input matrix must map to the element in the same location of the output matrix. Has anyone tried something similar?
Option 1: Multi-label classifier
You can flatten the 30x30 matrix into a 900-element vector and feed it to a neural network for multi-label classification (sketched below):
https://en.wikipedia.org/wiki/Multi-label_classification
Alternatively, treat the 30x30 matrix as a single-channel image and build a CNN with a suitable loss function for multi-label classification.
Option 2: Sequence-to-sequence classifier
Flatten the 30x30 matrix into a 900-element vector and build an LSTM with 900 timesteps, with the ith element of the vector as the input to the ith timestep. The LSTM is connected to a Dense layer with sigmoid activation (2-class classification). If you use Keras for the implementation, you will need return_sequences=True for this.
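A minimal Keras sketch of Option 1 (the hidden-layer size is an arbitrary assumption): 900 inputs, 900 independent sigmoid outputs, one per cell of the 30x30 output matrix.

from tensorflow import keras
from tensorflow.keras import layers

# Multi-label setup: each of the 900 outputs is an independent
# binary prediction, hence sigmoid + binary cross-entropy.
model = keras.Sequential([
    keras.Input(shape=(900,)),
    layers.Dense(512, activation='relu'),
    layers.Dense(900, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')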

What tensorflow distribution to represent a list of categorical data

I want to construct a variational autoencoder where one sample is an N*M matrix in which each row selects exactly one of M categories. Essentially, one sample is a list of categorical data - a list of one-hot vectors.
Currently, I have a working (plain) autoencoder for this type of data - I use a softmax on the last dimension to enforce this constraint, and it works (the reconstruction cross-entropy is low).
Now I want to use tf.distributions to create a variational autoencoder, and I was wondering what kind of distribution would be appropriate.
Does tf.contrib.distributions.Categorical satisfy your needs? Samples are integers from 0 to n - 1, where n is the number of categories.
Example:

import tensorflow as tf

N, M = 4, 5  # hypothetical: N rows, M categories per row

# logits has shape [N, M], where M is the number of classes
logits = tf.random_normal([N, M])
dist = tf.contrib.distributions.Categorical(logits=logits)
# Sample 20 times; gives shape [20, N].
samples = dist.sample(20)
# depth is the number of categories.
one_hots = tf.one_hot(samples, depth=M)
