How to apply data-augmentation on acoustic datasets?


I have a small acoustic dataset of human sounds which I would like to augment and later pass to a binary classifier.
I am familiar with data augmentation for images, but how is it done for acoustic datasets?
I've found two related answers, one about autoencoders and one about SpecAugment with PyTorch and TorchAudio,
but I would like to hear your thoughts about the best audio-specific method.

It really depends on what you are trying to achieve, what your classifier is designed for, and how it works.
Depending on that, you can, for example, cut the audio into segments differently (if you are feeding the classifier cut audio segments and that makes sense in your particular case). You can also add background noise, either artificial (such as white noise) or recorded, at different signal-to-noise ratios (SNRs); this should also make the classifier more robust to noise.
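A minimal NumPy sketch of the two ideas above, random cropping and mixing in noise at a chosen SNR, assuming mono clips stored as 1-D float arrays. The function names, the 5 to 20 dB SNR range and the clip lengths are hypothetical choices, not part of the answer:

```python
import numpy as np

def random_crop(signal, crop_len, rng):
    """Return a random contiguous segment of crop_len samples."""
    if len(signal) < crop_len:
        # Zero-pad short clips so every output has the same length.
        signal = np.pad(signal, (0, crop_len - len(signal)))
    start = rng.integers(0, len(signal) - crop_len + 1)
    return signal[start:start + crop_len]

def add_noise_at_snr(signal, snr_db, rng, noise=None):
    """Mix noise into signal so that the result has the requested SNR in dB."""
    if noise is None:
        noise = rng.standard_normal(len(signal))  # artificial white noise
    else:
        # Repeat or trim a recorded noise clip to match the signal length.
        reps = int(np.ceil(len(signal) / len(noise)))
        noise = np.tile(noise, reps)[:len(signal)]
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose scale so that p_signal / (scale**2 * p_noise) == 10 ** (snr_db / 10).
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise

def augment_clip(signal, crop_len, rng, snr_range=(5.0, 20.0)):
    """Hypothetical helper: random crop followed by white noise at a random SNR."""
    clip = random_crop(signal, crop_len, rng)
    snr_db = rng.uniform(*snr_range)
    return add_noise_at_snr(clip, snr_db, rng)
```

Calling augment_clip repeatedly on the same recording gives a different segment and noise level each time, so one recording yields many training examples. Apply it to training clips only.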

Related

Non-image data augmentation

I am looking for an algorithm and/or tutorial about data augmentation, but all of the ones I find are about image augmentation. Is it possible to do this with other kinds of datasets?
I am working on the Parkinson's dataset (https://archive.ics.uci.edu/ml/datasets/parkinsons) and want to create a data augmentation example in Python. Is this possible, or should I use something like MNIST/Fashion-MNIST?

If you had access to the actual voice recordings, you could apply augmentation techniques used in speech recognition and then re-extract features such as the fundamental frequency. However, since you are working directly with the extracted features, augmentation is trickier. You can generate synthetic samples by interpolating between existing ones or by adding noise, but since the features are highly correlated, you need a smart way of doing that (see this paper for a simple approach and this one for a more advanced technique). If you have a class imbalance problem, you can also try simple over- or under-sampling.
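A minimal NumPy sketch of the simple options mentioned: SMOTE-style interpolation between a sample and one of its nearest neighbors of the same class, plus plain random oversampling. This is the generic technique, not the method of either linked paper, and it does not model the feature correlations the answer warns about. Function names and k are hypothetical; the imbalanced-learn package provides tested versions (SMOTE, RandomOverSampler). Standardize the features first so distances are meaningful, and apply this to the training split only.

```python
import numpy as np

def interpolate_minority(X, n_new, k, rng):
    """SMOTE-style: new points on segments between a sample and one of its k nearest neighbors.

    X holds the samples of a single class; assumes k <= len(X) - 1.
    """
    n = len(X)
    # Pairwise Euclidean distances within the class.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    nb = neighbors[base, rng.integers(0, k, size=n_new)]
    lam = rng.uniform(0.0, 1.0, size=(n_new, 1))
    # Each new point is a convex combination of two existing points.
    return X[base] + lam * (X[nb] - X[base])

def random_oversample(X, y, rng):
    """Duplicate randomly chosen samples until every class is as large as the largest one."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, cnt in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - cnt, replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]
```

The rows returned by interpolate_minority get the label of the class they were generated from.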

Siamese Network For Image Similarity

Where can I find details on implementing Siamese networks to measure image similarity and retrieve the most similar image from a dataset?
It is difficult to get a large amount of image data for every class, so only a few images (e.g. 10) are available for most of the classes.
SIFT and ORB seem to perform poorly on some classes.
My project is to differentiate between license plates based on the UAE state (emirate) that issued them. Here are a few example images.

When there is little training data, however annoying it sounds, the best approach is usually to collect more. Deep networks are notoriously data-hungry, and they perform poorly when data is scarce. That said, there are approaches that might help you:
Transfer learning
Data augmentation
In transfer learning, you take a deep net that has already been trained for some other task (e.g. ResNet50 trained on ImageNet), freeze all of its weights except those in the last few layers, and train those layers on your task of interest.
Data augmentation modifies your training data slightly, in predictable ways. In your case, you can rotate the image by a small angle, apply a perspective transformation, scale the image intensities, or slightly change the colors. Each time you use a particular training image, you apply a different set of these operations with different parameters. This way you generate new training examples and enlarge your training set.
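A minimal NumPy sketch of this per-use random augmentation, assuming float RGB images with values in [0, 1]: a small rotation (nearest-neighbor sampling, uncovered corners filled with black), a global intensity scale and a small per-channel color shift, with fresh random parameters on every call. The parameter ranges and function names are hypothetical, and the perspective transformation is left out; dedicated augmentation libraries provide fuller implementations.

```python
import numpy as np

def rotate_nearest(img, angle_deg):
    """Rotate about the image center with nearest-neighbor sampling; uncovered pixels become 0."""
    h, w = img.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    yy, xx = np.mgrid[0:h, 0:w]
    c, s = np.cos(theta), np.sin(theta)
    # Inverse mapping: for every output pixel, find the input pixel it comes from.
    src_x = np.rint(c * (xx - cx) + s * (yy - cy) + cx).astype(int)
    src_y = np.rint(-s * (xx - cx) + c * (yy - cy) + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out

def random_augment(img, rng, max_angle=10.0):
    """Apply a fresh random rotation, intensity scaling and color shift (img: H x W x 3 in [0, 1])."""
    out = rotate_nearest(img, rng.uniform(-max_angle, max_angle))
    out = out * rng.uniform(0.8, 1.2)                        # global intensity scaling
    out = out + rng.uniform(-0.05, 0.05, size=img.shape[2])  # small per-channel color shift
    return np.clip(out, 0.0, 1.0)
```

Calling random_augment inside the training loop, rather than once up front, is what gives each epoch a different variant of every image.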

Data augmentation before splitting

For my exam on data crunching, we've received a small Simpsons dataset of 4 characters (Bart, Homer, Lisa, Marge) to build a convolutional neural network around. However, the dataset contains only around 2200 images, which have to be split into train and test sets.
Since I'm very new to neural networks and deep learning: is it acceptable to augment my data (I'm rotating the images by X degrees, 9 times) and then split it using sklearn's train_test_split function?
Since I made this change, I'm getting a training and test accuracy of around 95% after 50 epochs with my current model. Since that's more than I expected, I started wondering whether augmenting the data before splitting, and thereby also augmenting the test data, is acceptable or whether it leads to a biased or wrong result.
So:
a) Can you augment your data before splitting it with sklearn's train_test_split without distorting your results?
b) If my method is wrong, what other method could I try?
Thanks in advance!

You should augment the data after the train/test split, and to do it correctly you need to make sure you only augment data from the train split.
If you augment the data before splitting the dataset, small variations of training images will likely end up in the test set. The network's accuracy will then be overestimated (and it might be overfitting as well, among other issues).
A good way to avoid this pitfall is to augment the data only after the original dataset has been split.
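A minimal sketch of the safe ordering, assuming square grayscale images stored in one NumPy array: shuffle and split first, then augment only the training part. The split is done with NumPy to keep the example self-contained (sklearn's train_test_split plays the same role), and the 90-degree rotations are a hypothetical stand-in for whatever augmentation you actually use.

```python
import numpy as np

def split_then_augment(X, y, test_fraction, rng):
    """Split (N, S, S) images into train/test, then augment the training split only."""
    perm = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]  # left untouched
    # Augment: original plus 90/180/270-degree rotations (square images assumed).
    aug_X = [X_train] + [np.rot90(X_train, k, axes=(1, 2)) for k in (1, 2, 3)]
    aug_y = [y_train] * 4
    return np.concatenate(aug_X), np.concatenate(aug_y), X_test, y_test
```

Because the test images are set aside before any rotation is generated, no rotated copy of a test image can leak into training, and the test accuracy stays an honest estimate.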
Many libraries implement Python generators that randomly apply one or a combination of image modifications to augment the data. These might include:
Image rotation
Image shearing
Image zoom (cropping and rescaling)
Adding noise
Small hue shifts
Image shifting
Image padding
Image blurring
Image embossing
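A minimal sketch of such a generator in plain NumPy, applying a random subset of three of the listed modifications (shifting, adding noise, blurring) to every image it serves. The probabilities, shift range, noise level and function names are hypothetical; the utilities in imgaug or Keras are more complete.

```python
import numpy as np

def random_shift(img, max_shift, rng):
    """Shift by up to max_shift pixels in y and x, filling the exposed border with zeros."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def box_blur(img):
    """3x3 mean filter with edge padding."""
    pad = [(1, 1), (1, 1)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape[:2]
    acc = np.zeros(img.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0

def augmented_batches(images, labels, batch_size, rng, p=0.5):
    """Endless generator of randomly augmented batches; feed it the training split only."""
    n = len(images)
    while True:
        idx = rng.integers(0, n, size=batch_size)
        batch = []
        for i in idx:
            img = images[i].astype(float)
            if rng.random() < p:
                img = random_shift(img, 2, rng)
            if rng.random() < p:
                img = img + rng.normal(0.0, 0.02, size=img.shape)  # additive Gaussian noise
            if rng.random() < p:
                img = box_blur(img)
            batch.append(img)
        yield np.stack(batch), labels[idx]
```

Each call to next() draws new images and new random choices, so the network rarely sees exactly the same input twice.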
This GitHub library has a good overview of classical image augmentation techniques: https://github.com/aleju/imgaug (I have not used this library, so I cannot vouch for its speed or implementation quality, but the overview in its README.md seems quite comprehensive.)
Some neural network libraries already include utilities for this. For example, Keras has image preprocessing methods: https://keras.io/preprocessing/image/

Convolutional Autoencoder feature learning

I am training a convolutional autoencoder on my own dataset. After training, the network is able to reconstruct the test images from the dataset quite well.
I am now taking the intermediate representation (1648-dim) from the encoder network and trying to cluster the feature vectors into 17 (known up front) different classes using GMM soft clustering. However, the clusters are really bad, and it is not able to group the images into their respective categories.
I am using the sklearn.mixture.GaussianMixture class for clustering, with a regularization of 0.01 and 'full' covariance_type.
My question: why do you think the reconstruction is very decent but the clustering is quite bad? Does it mean the intermediate features learned by the network are not adequate?

Let's turn the question around: why do you think it should have any meaning? You are using clustering, which is just an arbitrary method of splitting data into groups, yet you expect it to discover your classes. Why would it? There is literally nothing forcing the model to do so, and it is probably modeling completely different things (like image patches, textures, etc.). In general, you should never expect clustering to solve the problem of some arbitrary labeling; that is not what clustering is for. To give you more perspective: you have images which come from, say, 10 categories (like cats, dogs, etc.), and you ask:
why does clustering in the feature space not recover the classes?
Note that equally valid questions would be:
why does clustering in the feature space not divide the images into "reddish", "greenish" and "bluish"?
why does clustering in the feature space not divide the images by the size of the object in the image?
why does clustering in the feature space not divide the images by the country they are from?
There are exponentially many labelings that could be assigned to each dataset, and nothing in your training uses any labels (autoencoding is unsupervised, clustering is unsupervised), so expecting the result to magically guess which of these many labelings you have in mind is simply a wild guess, and the fact that it does not do so means nothing. It is neither good nor bad. (Let's also set aside how well a GMM can work in a ~1700-dimensional space.)
If you want a model to perform some task, you have to give it a chance: train it to solve that task. If you want to see whether the learned features are enough to recover the categories, train a classifier on them.
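One common way to do this is a linear probe: keep the encoder features fixed, fit a simple classifier on them using the labels, and measure accuracy on held-out data. A minimal NumPy sketch with multinomial logistic regression trained by gradient descent; the learning rate, epoch count and function names are hypothetical, and sklearn.linear_model.LogisticRegression would do the same job.

```python
import numpy as np

def train_linear_probe(features, labels, n_classes, lr=0.5, epochs=300):
    """Fit softmax (multinomial logistic) regression on frozen features."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0) + 1e-8
    X = (features - mu) / sigma           # standardize each feature dimension
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]         # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = (P - Y) / n                # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b, mu, sigma

def probe_predict(features, W, b, mu, sigma):
    """Predict class indices with the fitted probe."""
    return np.argmax(((features - mu) / sigma) @ W + b, axis=1)
```

High held-out accuracy means the category information is present in the features even if an unsupervised GMM does not find it; low accuracy suggests the features really do not capture it.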

how to prepare image dataset for training model?

I have a project that uses a deep CNN to classify parking lot spaces. My idea is to classify each space by whether there is a car in it or not. My question is: how do I prepare my image dataset to train my model?
I have downloaded the PKLot dataset for training, which includes negative and positive images.
Should I convert all my training images to grayscale? Should I resize all my training images to one fixed size? (But if I resize them to one fixed size, what about the fact that I have both landscape and portrait images?) Thanks :)

This is an extremely vague question, since every image processing algorithm has a different approach to extracting features. However, in your parking lot example, you would probably need RGB-to-grayscale conversion and size normalization, among other image processing techniques.
A great starting point would be in this link: http://www.scipy-lectures.org/advanced/image_processing/
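A minimal NumPy sketch of those two steps, assuming RGB crops as float arrays in [0, 1]: grayscale conversion with the ITU-R BT.601 luma weights, then padding to a square before resizing so that landscape and portrait crops are not distorted. The 64 x 64 target size, nearest-neighbor resizing and function names are hypothetical choices; a library resize (e.g. Pillow or OpenCV) would usually interpolate better.

```python
import numpy as np

def to_grayscale(rgb):
    """RGB (H, W, 3) -> grayscale (H, W) using ITU-R BT.601 luma weights."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])

def pad_to_square(img, fill=0.0):
    """Center the image on a square canvas so resizing keeps its aspect ratio."""
    h, w = img.shape[:2]
    size = max(h, w)
    top, left = (size - h) // 2, (size - w) // 2
    out = np.full((size, size) + img.shape[2:], fill, dtype=float)
    out[top:top + h, left:left + w] = img
    return out

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize by picking evenly spaced source rows and columns."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    return img[rows[:, None], cols]

def preprocess(rgb, size=64):
    """Grayscale, pad to square, then resize to size x size."""
    return resize_nearest(pad_to_square(to_grayscale(rgb)), size, size)
```

Padding instead of stretching is a design choice: the CNN sees cars at their true proportions, at the cost of some empty border. Whether grayscale helps at all is worth checking on a validation split, since color can also be informative.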

First detect the cars present in the image and obtain their size and alignment. Then segment and label the parking lot, fixing a suitable size and alignment for each space.

Since you want to train on the PKLot dataset and test on real data, the best approach is to make both datasets similar and consistent: they should be normalized, fixed-size and grayscale, with parameterized shapes. Then you can use the scale-invariant feature transform (SIFT) for image feature extraction as a basic method. What counts as a feature often depends on the problem or the type of application. Since features are the starting point and main primitives for subsequent algorithms, the overall algorithm will often only be as good as its feature detector. Depending on your problem, you can use these types of image features:
Corners / interest points
Edges
Blobs / regions of interest points
Ridges
...
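SIFT itself needs a library implementation (OpenCV, for instance, ships one), so as a self-contained illustration of the first feature type in the list, corners / interest points, here is a minimal Harris corner response in NumPy. This swaps in the Harris detector rather than SIFT; the window radius, k = 0.04 and function names are hypothetical choices, and the box window stands in for the Gaussian window that is more commonly used.

```python
import numpy as np

def box_sum(a, radius):
    """Sum of a over a (2*radius+1)^2 window centered on each pixel (zero padding)."""
    h, w = a.shape
    padded = np.pad(a, radius, mode="constant")
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def harris_response(gray, k=0.04, radius=2):
    """Harris response R = det(M) - k * trace(M)^2 for a 2-D grayscale image."""
    iy, ix = np.gradient(gray.astype(float))  # derivatives along rows (y) and columns (x)
    sxx = box_sum(ix * ix, radius)
    syy = box_sum(iy * iy, radius)
    sxy = box_sum(ix * iy, radius)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    # Positive near corners, negative along straight edges, zero in flat regions.
    return det - k * trace ** 2
```

On a parking-lot crop, strong positive responses mark places where intensity changes in two directions at once, such as the corners of a car or of the painted lines.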
