how to do speaker identification using voice?

how to do speaker identification using voice? - python

I was wondering if you can detect s using their voice. For example, we give feed some data in our program like this is the voice of a cat and when it detects it, it says hello cat or something.

Sure! Machine learning can do that, I suggest using deep-learning such as CNN, LSTM model, etc. what you need to prepare is the dataset of your cat voices and accurate model architecture.
roughly process
prepare dataset:
for example: 300 audio files each contain 3 seconds of different cat voices.
train model
model evaluation
script to respond to the output of model classification.
say "hello cat" when the prediction output is a cat.
but it's not that easy, hope this example is helpful.
more detail: https://www.analyticsvidhya.com/blog/2022/03/implementing-audio-classification-project-using-deep-learning/

of course, you can use machine learning to solve that problem, you can either deal with it as a signal processing/time series-ish problem and slide a window over your audio then feed it to your model to classify it (cat, dog, ...) LSTMs and RNNs are your go-to in this case. Or transform your audios to spectrogram to transform it into an image, then feed those images to your typical CNN architecture (NASNet, DenseNet, ...).
here is a reference you can check:
UrbanSound (audio/signal):https://www.kaggle.com/datasets/raghavrawat/segregatedurban8ksounds
UrbanSound (Spectrogram, I create it long ago): https://www.kaggle.com/datasets/skywolfmo/urban8kspectograms
There might be other ideas and methods besides the ones I mentioned above.
Good luck

Related

Using tensorflow classification for feature extraction

I am currently working on a system that extracts certain features out of 3D-objects (Voxelgrids to be precise), and i would like to compare those features to automatically made features when it comes to performance (classification) in a tensorflow cNN with some other data, but that is not the point here, just for background.
My idea now was, to take a dataset (modelnet10), train a tensorflow cNN to classify them, and then use what it learned there on my dataset - not to classify, but to extract features.
So i want to throw away everything the cnn does,except for what it takes from the objects.
Is there anyway to get these features? and how do i do that? i certainly have no idea.

Yes, it is possible to train models exclusively for feature extraction. This is called transfer learning where you can either train your own model and then extract the features or you can extract features from pre-trained models and then use it in your task if your task is similar in nature to that of what the pre-trained model was trained for. You can of course find a lot of material online for these topics. However, I am providing some links below which give details on how you can go about it:
https://keras.io/api/applications/
https://keras.io/guides/transfer_learning/
https://machinelearningmastery.com/how-to-use-transfer-learning-when-developing-convolutional-neural-network-models/
https://www.pyimagesearch.com/2019/05/27/keras-feature-extraction-on-large-datasets-with-deep-learning/
https://www.kaggle.com/angqx95/feature-extractor-fine-tuning-with-keras

How to use tensorflow model for predicting my own images

I've just started with tensorflow. I wrote a program that uses Fashion_MNIST dataset to train the model. And then predicts the labels using 'test_images'and it's working good so far.
But what I am curious how can I use my own image of a shoe or shirt for prediction. Because all the test images are of shape 28*28. How can I do this ?

The task you are engaged in is the task of data preparation and preprocessing. Among the things you must do already having a directory with images is the tagging of the images, for this task I recommend labelImg.
If you also need the dimensionality of the input to be of a specific size like the example you give, you can use digital image processing software. The OpenCV library has dimensionality reduction tools that work for this.

Good approach to determine quality of a human posture with TensorFlow

i try to determine the quality of a person's sitting posture. (e.g. sitting upright = good / sitting crouched = bad) with a webcam.
First try:
Image aquisition (with OpenCV python bindings)
Create a dataset with labeled images into good/bad
Feature detection (FAST)
Train a neuronal net on the dataset with that features(ANN_MLP)
The result was ok with a few restrictions:
not invariant to webcam movements, displacement, other persons, objects etc.
iam not sure if FAST features will fit
im pretty new to machine learning and want to try more sophisticated approaches with TensorFlow:
Second try:
I tried human pose detection via Tensorflow PoseNet
And got a mini example working which can determine probabilities of human bodypart positions. So now the challenge would be to detect the quality of a person's sitting posture out of the output of PoseNet.
What is a good way to proceed:
train a second TF model which gets probabilities of human bodypart
positions as input and outputs good/bad posture? (so PoseNet is used as fancy feature detector)
rework the PoseNet model to fit my output needs and retrain it?
transfer learning from PoseNet (i just read about it but have no clue how or if its even applicable here)?
or maybe a complete different approach?

Using machine learning to detect images based on single learning image

I have a use case where I have about 300 images out of 300 different items. I need machine learning to detect an item about once a minute.
I've been using Keras with Sequential to detect images but I'm wondering what I should take into consideration when I have 300 labels and only one image per label for learning.
So in short:
1) Can you do machine learning image detection with one learning image per label?
2) Are there any special things I take into consideration?

If this were a special case -- say, one class in 100 was represented by a single training image -- then you might get away with it. However, a unique image per class is asking for trouble.
A neural network learns by iterative correction, figuring out what features and combinations are important, and which are not, in discriminating the classes from one another. Training starts by a chaotic process that has some similarities to research: look at the available data, form hypotheses, and test then against the real world.
In a NN, the "hypotheses" are the various kernels it develops. Each kernel is a pattern to recognize something important to the discrimination process. If you lack enough examples for the model to generalize and discriminate for each class, then you run the risk (actually, you have the likelihood) of the model making a conclusion that is valid for the one input image, but not others in the same class.
For instance, one acquaintance of mine did the canonical cat-or-dog model, using his own photos, showing the pets of his own household and those of a couple of friends. The model trained well, identified cats and dogs with 100% accuracy on the test data, and he brought it into work ...
... where it failed, having an accuracy of about 65% (random guessing is 50%). He did some analysis and found the problem: his friends have indoor cats, but their preferred dog photos were out of doors. Very simply, the model had learned to identify not cats vs dogs, but rather couches and kitchen cabinets vs outdoor foliage. One of the main filters was of large, textured, green areas. Yes, a dog is a large, textured, green being. :-)
The only way your one-shot training would work is if each of your training images was specifically designed to include exactly those features that differentiate this class from the other 299, and no other visual information. Unfortunately, to identify what features those might be, and to provide canonical training photos, you'd have to know in advance what patterns the model needed to pick.
This entirely defeats the use case of deep learning and model training.

If you were to only train on that image once, it probably wouldn't be able to detect it yet. If you train it more, it will probably over fit and only recognize that one image. If that is what you are trying to do then you should make an algorithm to search the screen for that image (it will be more efficient).

1) You'll probably have problems with the generalization of your models because the lack of training set. In other words, your model will not "learn" about that class.
2) It's good to have a better training set in order to create a better model.

How to get better results when using the TensorFlow object detection API for a custom trained model to recognize hand poses?

in my project I need to train a object detection model which is able to recognize hands in different poses in real-time from a rgb webcam.
Thus, I'm using the TensorFlow object detection API.
What I did so far is training the model based on the ssd_inception_v2 architecture with the ssd_inception_v2_coco model as finetune-checkpoint.
I want to detect 10 different classes of hand poses. Each class has 300 images which are augmented. In total there are 2400 images in for training and 600 for evaluation. The labeling was done with LabelImg.
The Problem is that the model isn't able to detect the different classes properly. Even if it still wasn't good I got much better results by training with the same images, but only with like 3 different classes. It seems like the problem is the SSD architecture. I've read several times that SSD networks are not good in detecting small objects.
Finally, my questions are the following:
Could I receive better results by using a faster_rcnn_inception architecture for this use case?
Has anyone some advice how to optimize the model?
Do I need more images?
Thanks for your answers!

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

how to do speaker identification using voice? - python

I was wondering if you can detect s using their voice. For example, we give feed some data in our program like this is the voice of a cat and when it detects it, it says hello cat or something.

Related

Using tensorflow classification for feature extraction

How to use tensorflow model for predicting my own images

Good approach to determine quality of a human posture with TensorFlow

Using machine learning to detect images based on single learning image

How to get better results when using the TensorFlow object detection API for a custom trained model to recognize hand poses?

Categories

Resources