In school we have to listen to intervals and chords and determine their names. I'm really into neural networks, which is why I want to create a neural network in Python that listens to the audio and gives me the name as an output. I once learned that for music you need an LSTM. Do I also need an LSTM for this purpose, and how/where should I start? Can anybody teach me how to achieve my goal?
First of all, you need to define exactly the task you would like to solve: do you want to classify a whole piece of music/track, or do you want to classify segments of the piece/track? This will influence which architecture you need to use. I will briefly present an approach for each of those tasks.
Classifying a track: Recordings of music are time series, and for each of your recordings you need a label. Your first intuition of using LSTMs (or RNNs in general) is a good one. Just use your recording, transformed into a vector, as the input sequence for your LSTM network and let it output probabilities for each class. As already indicated by a comment, working in frequency space can be beneficial. However, just using the Fourier transform of the whole track will most likely lose important information, since the temporal frequency information is lost. Rather, use the Short-Time Fourier Transform (STFT) or Mel-frequency cepstrum coefficients (MFCC; here is a Python library to calculate them: libROSA). Very much oversimplified, these methods will transform your time series into a kind of 'image', a two-dimensional frequency spectrum, and for image classification tasks Convolutional Neural Networks (CNNs) are the way to go.
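For example, a minimal libROSA sketch of both transforms (the file name and parameter values are placeholders, not from your question):

import librosa

y, sr = librosa.load("recording.wav", sr=None)       # audio as a 1-D float array, placeholder file name
stft = librosa.stft(y, n_fft=2048, hop_length=512)   # complex spectrogram, shape (1025, n_frames)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)   # MFCC matrix, shape (20, n_frames)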
Classifying segments: If you want to classify segments of your track, you need a label for each time-frame of the song. Let's say your song is 3 minutes long and you have a sampling frequency of 60 Hz: your vector representation of the song will have 3 * 60 * 60 = 10800 time-frames, so for each of these entries you need to provide a class label (chord or whatever). Again you can use LSTMs: use your vector as the input sequence, let your network produce an output sequence of the same length as your song, and compare it to the class labels. You could also use the previously mentioned STFT or MFC coefficients as inputs and take advantage of the frequency information; you will then have a spectrum for each time-frame as input. A minimal Keras sketch of this sequence-labeling setup follows below.
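Here is that sketch; the shapes and the class count are assumptions for illustration, not values from your data:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, TimeDistributed, Dense

n_frames, n_features, n_classes = 10800, 20, 24  # e.g. 20 MFCCs per frame, 24 chord classes (assumed)

model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(n_frames, n_features)),
    TimeDistributed(Dense(n_classes, activation="softmax")),  # one class distribution per time-frame
])
model.compile(optimizer="adam", loss="categorical_crossentropy")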
I hope these broad ideas bring you one step closer to solving your task. For implementation details I'd like to point you to the Keras documentation and to the countless tutorials on the internet.
Disclaimer:
My knowledge of music theory is rather limited, so please take my answer with a grain of salt and feel free to correct me or ask for clarification. Have fun!
Related
I'm working on a model that would predict an exam schedule for a given course and term. My input would be the term and the course name, and the output would be the date. I'm currently done with the data cleaning and preprocessing steps; however, I can't wrap my head around a way to make a model whose input is two strings and whose output is two numbers (the day and month of the exam). One approach I thought of would be encoding my course names and writing the term as a binary list, e.g. input: encoded(course), [0, 0, 1]; output: day, month, and then feeding that to a regression model.
I hope someone more experienced can suggest a better approach.
Before I start answering your question:
/rant
I know this sounds dumb and doesn't really help with your question, but why are you using neural networks for this?!
To me, this seems like the classic case of "everybody uses ML/AI in their area, so now I have to, too!" (which is simply not true) /rant over
For string-like inputs, there exist several methods of encoding; choosing the right one may depend on your specific task. Since you have a very "simple" (and predictable) input, i.e. you know in advance that there will not be any new/unseen course titles during testing/inference, and you do not need contextual/semantic information, you can resort to something like scikit-learn's LabelEncoder, which will turn the titles into distinct classes.
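A minimal sketch (the course names are invented):

from sklearn.preprocessing import LabelEncoder

courses = ["Linear Algebra", "Databases", "Linear Algebra", "Ethics"]
encoder = LabelEncoder()
encoded = encoder.fit_transform(courses)  # -> array([2, 0, 2, 1]); classes are sorted alphabetically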
Alternatively, you could also throw a more heavyweight encoding structure at the problem, one that embeds the values in a matrix. Most DL frameworks offer some internal function for this, which basically requires you to pass a unique index for your input data and then actively learns some k-dimensional embedding vector for it. Intuitively, these embedding dimensions correspond to semantic or topical directions. If you have, for example, 3-dimensional embeddings, the first dimension could represent "social sciences course", the second "technical course", and the third "seminar".
Of course, this is just a simplification, but it helps to imagine how it works.
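As a small illustration, in Keras such a learned embedding can be instantiated like this (the vocabulary size and dimensionality are made up):

import numpy as np
from tensorflow.keras.layers import Embedding

emb = Embedding(input_dim=200, output_dim=3)  # 200 possible course indices, each mapped to a 3-d vector
vectors = emb(np.array([[5], [17]]))          # shape (2, 1, 3); the values are learned during training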
For the output, predicting a specific date is actually a really good question. As I have personally never predicted dates myself, I can only point you to tips by other users. A nice answer on dates (as input) is given here.
If you can sacrifice a little bit of accuracy in the result, predicting the calendar week in which the exam happens might be a good idea (see the snippet below). Otherwise, you could simply treat it as two regressed values, but you might end up with invalid combinations (i.e. "negative days/months", or something like "31st February").
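For completeness, the calendar-week target can be derived with the standard library alone (the example date is made up):

from datetime import date

exam = date(2019, 6, 14)
week = exam.isocalendar()[1]  # ISO calendar week, 1..53
print(week)                   # -> 24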
Depending on how much high-quality training data you have, results may vary quite heavily. Lastly, I would again recommend that you reconsider whether you actually need a neural network for this task, or whether simpler methods would do.
Create dummy variables for the text fields and use a RandomForest; that combination handles string-derived features and numerical outputs.
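A minimal sketch of that approach (the toy data is invented):

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.DataFrame({"course": ["Algebra", "Ethics"], "term": ["Fall", "Spring"]})
X = pd.get_dummies(df)                   # dummy variables for both string columns
y = [[14, 6], [21, 1]]                   # (day, month) targets, multi-output regression
model = RandomForestRegressor().fit(X, y)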
I have a list of temporal series of values measured in different places. These measurements may or may not be correlated (mostly depending on their relative positions, but it is plausible that some very close detectors would actually measure decorrelated series). I would like to predict the values of the whole set, taking into account the series of all of them and their correlations through time. If it is of any help, the values should also have a relative periodicity.
EDIT: I have access to the generated power of several solar panels. These solar panels are spread out spatially, and I would like to use them as 'irradiance detectors'. Knowing the sun's illumination in several places in the past, I wish to identify correlations between the signals, which could then be used to make predictions of illumination.
Regardless of the usual patterns of production throughout a day (as seen in the image), what I am interested in is the information I can extract from one panel's past to predict another one's future.
I think I would need a neural network to solve this problem, but I am not sure how to feed it: I thought of using a temporal window and feeding my NN with a few past values from A, B, and C, but I am afraid that's a little weak.
The image shows an example of what my data looks like.
How can I predict the next values of curve A knowing past values of A, B and C?
How to handle this prediction?
I think the easiest way is to train 3 models with the same input, where each predicts one value (A, B, or C).
If you are sure about the correlation between the input variables and their impact on the predicted outputs, you may create one neural network with a common branch (probably an RNN over the 3 stacked inputs) and then 3 different prediction heads, where each produces one prediction: A, B, or C. The Fast R-CNN architecture is a great example of this pattern; a sketch follows below.
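A hedged illustration of that shared-trunk / multi-head idea in the Keras functional API (the window length and layer sizes are assumptions, not tuned values):

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense

inp = Input(shape=(48, 3))             # a window of 48 past time steps, panels A, B, C stacked
trunk = LSTM(32)(inp)                  # the common branch shared by all heads
out_a = Dense(1, name="A")(trunk)      # one prediction head per panel
out_b = Dense(1, name="B")(trunk)
out_c = Dense(1, name="C")(trunk)
model = Model(inp, [out_a, out_b, out_c])
model.compile(optimizer="adam", loss="mse")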
The best way to achieve this task is to use an RNN.
A good tutorial for learning how to develop such a neural network is here :
https://www.tensorflow.org/tutorials/recurrent
I also found this link, where they trained an RNN on a rather similar problem:
http://blog.datatonic.com/2016/11/traffic-in-london-episode-ii-predicting.html
An even better inspiration :
http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
Suppose I have an image of a car taken from my mobile camera and another image of the same car downloaded from the internet.
(For simplicity please assume that both the images contain the same side view projection of the same car.)
How can I detect that both images represent the same object, i.e. the car in this case, using OpenCV?
I've tried template matching, feature matching (ORB), etc., but those are not working and do not provide satisfactory results.
SIFT feature matching might produce better results than ORB. However, the main problem here is that you have only one image of each type (one from the mobile camera and one from the internet). If you have a large number of images of this car model, then you can train a machine learning system using those images. Later you can submit one image of the car to that system, and there is a much higher chance of it recognizing the car.
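If you want to try SIFT, here is a minimal OpenCV sketch with Lowe's ratio test (the file names are placeholders; SIFT ships with opencv-contrib, or with the main package from OpenCV 4.4 on):

import cv2

img1 = cv2.imread("phone_photo.jpg", cv2.IMREAD_GRAYSCALE)     # placeholder file names
img2 = cv2.imread("internet_photo.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe's ratio test
print(len(good), "good matches")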
From a machine learning point of view, using only one image as the master and matching another against it is analogous to teaching a child the letter "A" using only one handwritten letter "A" and expecting him/her to recognize any handwritten "A" written by anyone.
Think about how you can mathematically describe the car's features so that every car is distinguishable. Maybe every car has a different wheel size? Maybe the distance from the door handle to the bottom of the side window is a unique characteristic of every car? Maybe the ratio of the front side window's width to the rear side window's width is an individual feature of each car?
You probably can't answer yes to any of these questions with 100% confidence. But what you can do is combine those measurements into a multidimensional feature vector and perform classification.
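As a toy illustration of that idea (the feature names and all numbers here are invented, not measured):

import numpy as np
from sklearn.svm import SVC

# each row: [wheel_size_ratio, handle_to_window_distance, window_width_ratio] (hypothetical features)
X = np.array([[0.31, 0.42, 1.10],
              [0.29, 0.44, 1.08],
              [0.40, 0.30, 0.95]])
y = ["car_a", "car_a", "car_b"]
clf = SVC().fit(X, y)
print(clf.predict([[0.30, 0.43, 1.09]]))  # likely 'car_a', being close to the first two rows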
Now, the crucial part here is that since you're doing manual feature description, you need to take care to do excellent work and test every step of the way. For example, you need to design features that are scale- and perspective-invariant. Here, I'd recommend reading about how face detection was designed to fulfill that requirement.
Will machine learning be a better solution? That depends greatly on two things: firstly, what kind of data you are planning to throw at the algorithm; secondly, how well you can control the process.
What most people don't realize today is that machine learning is not some magical solution to every problem. It is a tool, and like every tool it needs proper handling to provide results. If I were to give you advice, I'd say you will not be able to handle it very well yet.
My suggestion: get acquainted with basic feature extraction and general image-processing algorithms: edge detection (Canny, Sobel), contour finding, shape description, the Hough transform, morphological operations, masking, etc. Without those at your fingertips, I'd say that in this particular case even machine learning will not save you.
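For orientation, a few of those building blocks in OpenCV (the file name is a placeholder):

import cv2
import numpy as np

img = cv2.imread("car.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name
edges = cv2.Canny(img, 100, 200)                    # Canny edge detection
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 80)  # probabilistic Hough line transform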
I'm sorry: there is no shortcut here. You need to do your homework to make this work. But don't let that scare you. It's a great project. Good luck!
I just started working on an artificial life simulation (again... I lost the other one) in Python and Pygame using PyBrain, and I'm planning how this is going to work. So far I have an environment with some "food pellets"; a food pellet is added every minute. I haven't made my agents (aka "creatures") yet, but I know I want them to have simple feed-forward neural networks with some inputs, and the outputs will be their movement. I want the inputs to show what's in front of them, sort of like they are seeing the simulated world ahead. How should I go about this? I either want them to actually "see" the colors in their line of vision, or just input the nearest object into their NN. Which one would be best, and how would I implement it?
Having a full field of vision is technically possible in a neural network, but it requires a LOT of inputs and massive processing; that is not a direction you should expect to be able to evolve in any meaningful way.
A neural network deals with values and thresholds. I'd recommend using two inputs associated with the nearest individual: one holding the distance (to the nearest) and the other its angle (with zero being directly ahead, less than zero being on the left, and greater than zero being on the right).
Make sure that these values are easy to process into outputs. For example, if one output drives a rotation actuator, make sure that the input values and output values are on the same scale. Then it will be easy to turn either toward or away from a particular individual.
If you want them to be able to see multiple individuals, simply include multiple pairs of inputs. I was going to suggest putting them in distance order, but it might be easier for the creatures if, as soon as an organism sees something, it always comes in on the same inputs until it is no longer tracked. A sketch of computing the distance/angle pair follows below.
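Here is that sketch; the pellet representation as (x, y) tuples and the radian heading convention are assumptions, not from your code:

import math

def nearest_inputs(x, y, heading, pellets):
    """Return (distance, signed angle) to the nearest pellet.

    pellets is assumed to be a list of (px, py) tuples; heading is the
    creature's facing direction in radians.
    """
    px, py = min(pellets, key=lambda p: math.hypot(p[0] - x, p[1] - y))
    distance = math.hypot(px - x, py - y)
    # angle relative to heading, wrapped to [-pi, pi): 0 = dead ahead, < 0 left, > 0 right
    angle = (math.atan2(py - y, px - x) - heading + math.pi) % (2 * math.pi) - math.pi
    return distance, angle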
I'm working on an EEG signal-processing method for recognition of the P300 ERP.
At the moment, I'm training my classifier with a single vector of data that I get by averaging across preprocessed data from a chosen subset of the original 64 channels. I'm using the values from the EEG directly, not frequency features from an FFT. The method actually achieves quite solid performance of around 75% accurate classification.
I would like to improve it by using ICA to clean the EEG data a bit. I have read through a lot of tutorials and papers and I am still kind of confused.
I'm implementing my method in Python, so I chose to use sklearn's FastICA:
from sklearn.decomposition import FastICA

# decompose the 64-channel EEG into 64 independent components
self.ica = FastICA(n_components=64, max_iter=300)
icaSignal = self.ica.fit_transform(self.signal)  # shape: (n_samples, 64)
From a 25256-samples x 64-channels matrix I get a matrix of estimated sources that is also 25256 x 64. The problem is that I'm not quite sure how to use this output.
Averaging those components and training a classifier the same way as with the raw signal reduces performance to less than 30%, so this is probably not the way.
Another approach I read about is rejecting some of the components at this point, namely the ones that represent eye blinks, muscle activity, etc., based on their frequency content and some other heuristics. I'm also not quite confident about how to do that exactly.
After I reject some of the components, what is the next step? Should I try to average the ones that are left and feed the classifier with them, or should I try to reconstruct the EEG signal without them; and if so, how do I do that in Python? I wasn't able to find any information about that reconstruction step. It is probably much easier to do in MATLAB, so nobody bothered to write about it :(
Any suggestions? :)
Thank you very much!
I haven't used Python for ICA, but in terms of the steps, it shouldn't matter whether it's MATLAB or Python.
You are completely right that it's hard to reject ICA components; there is no widely accepted objective measurement. There are certain patterns for eye blinks (high voltage in frontal channels) and muscle artifacts (wide spectrum coverage, because it's EMG, at peripheral channels). If you don't know where to get started, I recommend reading the help of a MATLAB plugin called EEGLAB. This UCSD group has some nice materials to help you start.
https://eeglab.org/
To answer your question on the ICA reconstruction: after rejecting some ICA components, you should reconstruct the original EEG without them, for example as sketched below.
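With scikit-learn's FastICA the reconstruction can be sketched like this (assuming eeg is your 25256 x 64 samples-by-channels array; which component indices to zero out depends on your artifact criteria):

from sklearn.decomposition import FastICA

ica = FastICA(n_components=64, max_iter=300)
sources = ica.fit_transform(eeg)            # (25256, 64): one column per independent component
sources[:, [0, 3]] = 0                      # zero out the rejected components (indices are examples)
eeg_clean = ica.inverse_transform(sources)  # back to the (25256, 64) channel space, artifacts removed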