Feature Extraction Using Representation Learning - python

I'm new to machine learning, and I've been given a task where I'm asked to extract features from a data set with continuous data using representation learning (for example a stacked autoencoder).
Then I'm to combine these extracted features with the original features of the dataset and then use a feature selection technique to determine my final set of features that goes into my prediction model.
Could anyone point me to some resources or demos or sample code of how I could get started on this? I'm very confused on where to begin on this and would love some advice!

Okay, say you have an input of (1000 instances and 30 features). What I would do based on what you told us is:
Train an autoencoder, a neural network that compresses the input and then decompresses it, which has as a target your original input. The compressed representation lies in the latent space and encapsulates information about the input which is not directly accessible by humans. Now you may find such networks in tensorflow or pytorch. Tensorflow is easier and more straightforward so it could be better for you. I will provide this link (https://keras.io/examples/generative/vae/) for a variational autoencoder that may do the job for you. This has Conv2D layers so it performs really well for image data, but you can play around with the architecture. I cannot tell u more because you did not provide more info for your dataset. However, the important thing is the following:
After your autoencoder is trained properly and you need to make sure of it, (it adequately reconstructs the input) then you need to extract the aforementioned latent inputs (you will find more in the link). Now, that will be let's say 16 numbers but you can play with it. These 16 numbers were built to preserve info regarding your input. You said you wanted to combine these numbers with your input so might as well do that and end up with 46 input features. Now the feature selection part has to do with selecting the input features that are more useful for your model. That is not very interesting, you may find more information (https://towardsdatascience.com/feature-selection-techniques-in-machine-learning-with-python-f24e7da3f36e) and one way to select features is by training many models with different feature subsets. Remember, techniques such as PCA are for feature extraction not selection. I cannot provide any demo that does the whole thing but there are sources that can help. Remember, your autoencoder is supposed to return 16 numbers for each training example. Your autoencoder is trained only on your train data, with your train data as targets.

Related

Using tensorflow classification for feature extraction

I am currently working on a system that extracts certain features out of 3D-objects (Voxelgrids to be precise), and i would like to compare those features to automatically made features when it comes to performance (classification) in a tensorflow cNN with some other data, but that is not the point here, just for background.
My idea now was, to take a dataset (modelnet10), train a tensorflow cNN to classify them, and then use what it learned there on my dataset - not to classify, but to extract features.
So i want to throw away everything the cnn does,except for what it takes from the objects.
Is there anyway to get these features? and how do i do that? i certainly have no idea.
Yes, it is possible to train models exclusively for feature extraction. This is called transfer learning where you can either train your own model and then extract the features or you can extract features from pre-trained models and then use it in your task if your task is similar in nature to that of what the pre-trained model was trained for. You can of course find a lot of material online for these topics. However, I am providing some links below which give details on how you can go about it:
https://keras.io/api/applications/
https://keras.io/guides/transfer_learning/
https://machinelearningmastery.com/how-to-use-transfer-learning-when-developing-convolutional-neural-network-models/
https://www.pyimagesearch.com/2019/05/27/keras-feature-extraction-on-large-datasets-with-deep-learning/
https://www.kaggle.com/angqx95/feature-extractor-fine-tuning-with-keras

Using MFCC's for voice recognition

I'm currently using the Fourier transformation in conjunction with Keras for voice recogition (speaker identification). I have heard MFCC is a better option for voice recognition, but I am not sure how to use it.
I am using librosa in python (3) to extract 20 MFCC features. My question is: which MFCC features should I use for speaker identification?
In addition to this I am unsure on how to implement these features. What I would do is to get the necessary features and make one long vector input for a neural network. However, it is also possible to display colors, so could image recognition also be possible, or is this more aimed at speech, and not speaker recognition?
In short, I am unsure where I should start, as I am not very experienced with image recognition and have no idea where to start.
Thanks in advance!!
My question is: which MFCC features should I use for speaker identification?
I shall say that use all of them. Technically MFCC features are output from different filter banks. It is hard to say a priori which of them will be useful.
In addition to this I am unsure on how to implement these features. What I would do is to get the necessary features and make one long vector input for a neural network.
Actually when you extract MFCC for N samples you get an array like N x T x 20 T represents the number of frames in the audio signal after processed for MFCC. I will suggest using Sequence classification with LSTM. This will give better result.
In addition to this I am unsure on how to implement these features.
What I would do is to get the necessary features and make one long
vector input for a neural network.
For each sample, you must have got a 2D matrix of MFCC like N x T X no_mfccs (in your case no_mfccs=20); to make it into one single vector, various researchers take statistics such as mean, var, IQR, etc. to reduce the feature dimension. Some also model it using multivariate regression, and some fit it to a Gaussian mixture model. It depends on the next stage. In your case, you can use statistics to convert into a single vector
OR As told by Parthosarathi, you can use LSTM to preserve sequential information across time frames.
However, it is also possible to display colors, so could image recognition also be possible, or is this more aimed at speech, and not speaker recognition?
I will not recommend you to use spectrogram (image) as a feature vector to neural network because visual images and spectrograms do not accumulate visual objects and sound events information in the same manner.
when you feed image to neural network it assumes that features (pixel values) of an image carry the same meaning regardless of their location. But in case of the spectrogram, location of feature matters a lot.
e.g. Moving the frequencies of a male voice upwards could change its meaning from man to child. Therefore, the spatial invariance that 2D CNN provides might not perform as well for this form of data.
To learn more about it refer: What’s wrong with CNNs and spectrograms for audio processing?
You can use MFCCs with dense layers / multilayer perceptron, but probably a Convolutional Neural Network on the mel-spectrogram will perform better, assuming that you have enough training data.

Forecasting using LSTM

How can I use Long Short-term Memory (LSTM) to predict a future value x(t+1) (out of sample prediction) based on a historical dataset. I read and tried many web tutorials for forecasting and prediction using lstm, but still far away from the point. What's the exact procedure to do this prediction? Is it just as simple as shifting the target array (n)steps where n is the number of future predicts and do the prediction operation? or there's another techniques?
please help or leave a suggestion.
Can you provide the framework you are using? tensorflow? pytorch? which web tutorials specifically?
Assuming you are going tensorflow, you can copy and paste code from one of these, test that it works on the provided dataset, then modify the input encoding functions to fit your dataset, then run on your dataset.
https://github.com/llSourcell/How-to-Predict-Stock-Prices-Easily-Demo (best)
https://github.com/sebastianheinz/stockprediction
https://github.com/talolard/MarketVectors/blob/master/preparedata.ipynb (you will have to replace fc layers with lstm, and fiddle with inputs)
In general procedure is something like (assuming tensorflow):
Download Dataset
Create a function to load batches of data
Create a function to encode batch of data (normalization, other transforms)
Create LSTM layer to recieve series of inputs.
Create output layer (usually fully connected) to take last lstm state and predict output of your desired size.
Create a tf session to wire everything together, and hit run.
Some questions to ask conceptually about which network use:
How many inputs to how many outputs - see this excellent http://cs231n.stanford.edu/slides/2016/winter1516_lecture10.pdf by Karpathy
How far back do you consider the stock prices eg {t-100... t} or {t-10 ...t} which may dictate size of hidden layers.
What other information do you think is relevant to the model? does stock A influence stock B? in which case you may have 2 lstms outputing a state to your fully connected layer...

How to classify continuous audio

I have a audio data set and each of them has different length. There are some events in these audios, that I want to train and test but these events are placed randomly, plus the lengths are different, it is really hard to build a machine learning system with using that dataset. I thought fixing a default size of length and build a multilayer NN however, the length's of events are also different. Then I thought about using CNN, like it is used to recognise patterns or multiple humans on an image. The problem for that one is I am really struggling when I try to understand the audio file.
So, my questions, Is there anyone who can give me some tips about building a machine learning system that classifies different types of defined events with training itself on a dataset that has these events randomly(1 data contains more than 1 events and they are different from each other.) and each of them has different lenghts?
I will be so appreciated if anyone helps.
First, you need to annotate your events in the sound streams, i.e. specify bounds and labels for them.
Then, convert your sounds into sequences of feature vectors using signal framing. Typical choices are MFCCs or log-mel filtebank features (the latter corresponds to a spectrogram of a sound). Having done this, you will convert your sounds into sequences of fixed-size feature vectors that can be fed into a classifier. See this. for better explanation.
Since typical sounds have a longer duration than an analysis frame, you probably need to stack several contiguous feature vectors using sliding window and use these stacked frames as input to your NN.
Now you have a) input data and b) annotations for each window of analysis. So, you can try to train a DNN or a CNN or a RNN to predict a sound class for each window. This task is known as spotting. I suggest you to read Sainath, T. N., & Parada, C. (2015). Convolutional Neural Networks for Small-footprint Keyword Spotting. In Proceedings INTERSPEECH (pp. 1478–1482) and to follow its references for more details.
You can use a recurrent neural network (RNN).
https://www.tensorflow.org/versions/r0.12/tutorials/recurrent/index.html
The input data is a sequence and you can put a label in every sample of the time series.
For example a LSTM (a kind of RNN) is available in libraries like tensorflow.

How to find the most important features learned during Deep Learning using CNN?

I followed the tutorial given at this site, which detailed how to perform text classification on the movie dataset using CNN. It utilized the movie review dataset to find predict positive and negative reviews.
My question is, is there any way to find the most important learned features from the model? Does Tensorflow/Theano has any support for this?
Thanks !
A word of warning: if you can trace the classification back to specific input features, it's quite possible that CNN is the wrong ML paradigm for your application. Most text processing uses RNN, bag-of-words, bi-grams, and other simple linear combinations.
The structure of a CNN is generally antithetical to identifying the importance of individual features. Because of the various non-linear layers, it is rarely possible to pick out any one feature as important; rather, the combinations of inputs form small structures of inference, which then convolve to form more complex structures, until the final output is driven by a series of neighbor relationships, cut-offs, poolings, and other items.
This is why back-propagation is so important to running CNNs: the causation chain does not reverse cleanly. Otherwise, we'd reduce the process to a simple linear NN with one hidden layer.
If you want to analyze what's happening, try visualizing your intermediate layers. There are various modules to help with that; for instance, try a search for "+theano +visualize +CNN -news" (the last is to remove the high-traffic references to Cable News Network). There are plenty of examples in image processing; we won't know how much it might help your text processing, until you try it.

Categories