Neural Network Data Sparsity - python

I am using PyBrain to train a network on music. The input is two notes, and the output is the next two notes.
Each note is represented by an integer mapped to a note (e.g. C# = 11, F = 7), the octave, and the duration. So I was using a dataset like this:
ds = SupervisedDataSet(6, 6)
Which would look like ([note1, octave1, duration1, note2, octave2, duration2], [note1, octave1, duration1, note2, octave2, duration2])
However, I ran into a problem with chords (i.e. more than one note played at once). To solve this, I got rid of the first integer representing a note and replaced it with 22 integers, each set to either one or zero, to indicate which notes are being played. This is still followed by the octave and duration.
So for example, the following
[0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 4, 0.5]
represents a chord of C#, E and A, with an octave of 4 and duration of 0.5.
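The encoding described above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and the slot indices 3, 8 and 17 are simply read off the example vector, since the full note-to-slot mapping is not given in the question.

```python
N_NOTE_SLOTS = 22  # width of the note bit vector used in the question


def encode_chord(note_slots, octave, duration):
    """Return 22 note bits followed by octave and duration."""
    bits = [0] * N_NOTE_SLOTS
    for slot in note_slots:
        if not 0 <= slot < N_NOTE_SLOTS:
            raise ValueError("note slot out of range: %r" % (slot,))
        bits[slot] = 1
    return bits + [octave, duration]


def decode_chord(vector):
    """Inverse of encode_chord: recover the active slots, octave and duration."""
    bits = vector[:N_NOTE_SLOTS]
    octave, duration = vector[N_NOTE_SLOTS:]
    active = [i for i, bit in enumerate(bits) if bit]
    return active, octave, duration


# Slots 3, 8 and 17 are the positions set to 1 in the example vector
# (assumed here to stand for C#, E and A).
chord = encode_chord([3, 8, 17], octave=4, duration=0.5)
print(chord)
```

Each such vector has 24 entries, so a dataset built from pairs of them would need input and output sizes larger than the original 6.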
PyBrain always gives me an output of all zeros after training and testing. I understand why it's doing this but I don't know how to fix it.
Is there a better way to represent the notes/chords so that PyBrain won't have this problem?
EDIT: I have since converted the bit vector into a decimal number, and while the network isn't just giving zeros anymore it's still pretty clear it's not learning the patterns correctly.
I am using a network like this:
net = buildNetwork(6, 24, 6, bias=True, hiddenclass=LSTMLayer, recurrent=True)
and a trainer like this:
trainer = BackpropTrainer(net, ds, verbose = True)
When I train, I get a huge error, on the order of ten thousand or a hundred thousand.

Your problem is not entirely clear to me and could use a more detailed explanation, but from what I understood, I suspect you don't need recurrence in your network. Also try a different activation function in the hidden layer, for example Softmax. I tested this on a sample dataset with 6 input nodes and 6 output nodes and it trains properly, so here is my version:
from pybrain.tools.shortcuts import buildNetwork
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer
ds = SupervisedDataSet(6, 6)
#
# fill dataset
#
net = buildNetwork(6, 24, 6, bias=True, hiddenclass=SoftmaxLayer)
trainer = BackpropTrainer(net, ds)
# train
error = 10
while error > 0.00001:  # choose whatever error threshold you want
    error = trainer.train()
    print(error)  # just for logging
# and activate (replace each * with one of the 6 input values)
print(net.activate([*,*,*,*,*,*]))

Related

q table with gym (using box observation space)

I'm trying to run a q-learning algorithm with this observation space:
self.observation_space = spaces.Box(low=np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), high=np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), dtype=np.flo
When I try to access the Q-table like this:
q_value = q_table[state][action]
I'm getting this error:
IndexError: arrays used as indices must be of integer (or boolean) type
So my question is: how am I supposed to access the Q-table when my observation space is defined using spaces.Box?
In case it's needed, this is how the q_table is defined (it's code I took from the internet and am trying to adapt to my project):
num_box = tuple((env.observation_space.high + np.ones(env.observation_space.shape)).astype(int))
q_table = np.zeros(num_box + (env.action_space.n,))
You don't say what type q_table is. I will assume it's a NumPy array defined as in "OpenAI Gym and Python set up for Q-learning":
action_space_size = env.action_space.n
state_space_size = env.observation_space.n
q_table = np.zeros((state_space_size, action_space_size))
You're getting this error because you're not indexing the elements of the numpy array with an integer. Again, I haven't seen your code, but I believe you are trying to get a specific row of the Q table using a tuple.
Regardless, you should not use a Box observation space when using Q-learning, but rather a Discrete one. When using Q-learning, you need to know the number of states in advance, to initialize the Q-table.
Box spaces are for real values, and the number of dimensions of the space does not define the number of states. For example, if you create a Box space like this:
spaces.Box(low=0, high=1, shape=(2, 2), dtype=np.float16)
you won't have 4 states, but potentially infinite states. The parameters low=0 and high=1 indicate the minimum and maximum value of the four variables in the Box space, but each variable can take many values between 0 and 1 (0.1, 0.2, etc.). For this reason, you cannot estimate the number of states beforehand.
If you use np.uint8 (or any integer type) as dtype, you could potentially count the number of states, but it would still be a stretch to use Box spaces instead of Discrete spaces. Moreover, even using integer values the following will not work:
num_box = tuple((env.observation_space.high + np.ones(env.observation_space.shape)).astype(int))
q_table = np.zeros(num_box + (env.action_space.n,))
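If a Box observation has to be used with a table anyway, one common workaround is to bin each dimension into a fixed number of intervals and use the resulting integer tuple as the index. The sketch below is an assumption layered on top of the answer, not something it proposes: the bin count, the dimensions, the action count and the `discretize` helper are all hypothetical.

```python
import numpy as np

n_bins = 4      # hypothetical number of bins per dimension
n_dims = 3      # hypothetical observation size (the question uses 10)
n_actions = 2   # hypothetical number of actions
low = np.zeros(n_dims)
high = np.ones(n_dims)

# Interior bin edges for each dimension, e.g. [0.25, 0.5, 0.75] for [0, 1].
edges = [np.linspace(low[i], high[i], n_bins + 1)[1:-1] for i in range(n_dims)]


def discretize(obs):
    """Map a float observation to a tuple of integer bin indices."""
    return tuple(int(np.digitize(obs[i], edges[i])) for i in range(n_dims))


# One table axis per observation dimension, plus one axis for the actions.
q_table = np.zeros((n_bins,) * n_dims + (n_actions,))

state = discretize(np.array([0.1, 0.55, 0.99]))
action = 1
q_value = q_table[state][action]  # integer tuple index, so no IndexError
print(state, q_value)
```

The table has n_bins ** n_dims * n_actions entries, so this only stays practical for a handful of dimensions and coarse bins.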

Coin tosses, arithmetic of random variables, and PyMC3

I find myself wanting to perform arithmetic of random variables in Python; for the sake of example, let us consider the experiment of repeatedly tossing two independent fair coins and counting the number of heads.
Sampling from each random variable independently is straightforward with scipy.stats, and we can start getting results right away
In [5]: scipy.stats.bernoulli(0.5).rvs(10) + scipy.stats.bernoulli(0.5).rvs(10)
Out[5]: array([1, 0, 0, 0, 1, 1, 1, 2, 1, 2])
Now, a pessimist would remark that we wouldn't even have to go that far and could instead just do np.random.randint(2, size=10) + np.random.randint(2, size=10), and a cynic would notice that we could just calculate the sum and never have to sample anything.
And they'd be right. So, say that we have many more variables and more complex operations to perform on them, and graphical models quickly become useful. That is, we might want to operate on the random variables themselves and only start sampling when our graph of computation is set up. In lea, which does exactly that (albeit only for discrete distributions), the example above becomes
In [1]: from lea import Lea
In [7]: (Lea.bernoulli(0.5) + Lea.bernoulli(0.5)).random(10)
Out[7]: (0, 2, 0, 2, 0, 2, 1, 1, 1, 2)
Appears to be working like a charm. Enter PyMC3, one of the more popular libraries for probabilistic programming. Now, PyMC3 is intended for usage with MCMC and Bayesian modeling in particular, but it has the building blocks we need for our experiment above. Alas,
In [1]: import pymc3 as pm
In [2]: pm.__version__
Out[2]: '3.2'
In [3]: with pm.Model() as model:
   ...:     x = pm.Bernoulli('x', 0.5)
   ...:     y = pm.Bernoulli('y', 0.5)
   ...:     z = pm.Deterministic('z', x+y)
   ...:     trace = pm.sample(10)
   ...:
Assigned BinaryGibbsMetropolis to x
Assigned BinaryGibbsMetropolis to y
100%|███████████████████████████████████████| 510/510 [00:02<00:00, 254.22it/s]
In [4]: trace['z']
Out[4]: array([2, 0, 2, 0, 2, 0, 2, 0, 2, 0], dtype=int64)
Not exactly random. Unfortunately, I lack the theoretical understanding of why the Gibbs sampler produces this particular result (and really I should probably just hit the books). Using step=pm.Metropolis() instead, we get the correct distribution at the end of the day, even if the individual samples correlate strongly with their neighbours (as is to be expected from MCMC).
In [8]: with pm.Model() as model:
   ...:     x = pm.Bernoulli('x', 0.5)
   ...:     y = pm.Bernoulli('y', 0.5)
   ...:     z = pm.Deterministic('z', x+y)
   ...:     trace = pm.sample(10000, step=pm.Metropolis())
   ...:
100%|██████████████████████████████████████████████████████████████████████████████████████████| 10500/10500 [00:02<00:00, 5161.18it/s]
In [14]: collections.Counter(trace['z'])
Out[14]: Counter({0: 2493, 1: 5024, 2: 2483})
So, maybe I could just go ahead and use pm.Metropolis for simulating my post-arithmetic distribution, but I'd be afraid that I was missing something. So the question finally becomes: why does the step-less simulation above fail, are there any pitfalls in using PyMC3 for ordinary (non-Markov-chain) Monte Carlo, and is what I'm trying to do even possible in PyMC3 in the first place?
Comments by colcarroll:
[Feb. 21, 2018]: Definitely a bug - github.com/pymc-devs/pymc3/issues/2866 . What you are doing should work, but is not the intention of the library. You would use PyMC3 to reason about uncertainty (perhaps observing z and reasoning about the probabilities of x and y). I think your first two approaches, and perhaps the pomegranate library would be more efficient. See stackoverflow.com/questions/46454814/…
[Feb. 25, 2018]: This is now fixed on master (see github.com/pymc-devs/pymc3/pull/2867) by Junpeng Lao. See andrewgelman.com/2018/01/18/… for background on "Anticorrelated draws". I am not sure how stackoverflow wants to handle a question like this.
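As a reference point for the sampled counts in the question, the "cynic's" route of computing the distribution of the sum directly can be sketched with the standard library alone. The `sum_distribution` helper is a hypothetical name, and the approach (plain enumeration of all outcomes) only makes sense for small discrete variables.

```python
from fractions import Fraction
from itertools import product

# A fair coin as a mapping from value to probability.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def sum_distribution(*variables):
    """Exact distribution of the sum of independent discrete variables."""
    dist = {}
    for outcome in product(*(v.items() for v in variables)):
        value = sum(val for val, _ in outcome)
        prob = Fraction(1)
        for _, p in outcome:
            prob *= p  # independence: joint probability is the product
        dist[value] = dist.get(value, Fraction(0)) + prob
    return dist


z = sum_distribution(coin, coin)
for value in sorted(z):
    print(value, z[value])
```

For two fair coins this gives probabilities 1/4, 1/2 and 1/4 for 0, 1 and 2 heads, which is what the Metropolis counts of roughly 2500, 5000 and 2500 out of 10000 draws approximate.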

Scikitlearn - order of fit and predict inputs, does it matter?

Just getting started with this library... I'm having some issues with RandomForestClassifiers (I've read the docs but didn't get clarity).
My question is pretty simple. Say I have a training data set like
A B C
1 2 3
Where A is the dependent variable (y) and B-C are the independent variables (x). Let's say the test set looks the same, however the order is
B A C
1 2 3
When I call forest.fit(train_data[0:,1:],train_data[0:,0])
do I then need to reorder the test set to match this order before running? (Ignoring the fact that I need to remove the already predicted y value (A), so let's just say B and C are out of order...)
Yes, you need to reorder them. Imagine a simpler case, Linear Regression. The algorithm will calculate a weight for each of the features, so for example if feature 1 is unimportant, it will be assigned a weight close to 0.
If at prediction time the order is different, an important feature will be multiplied by this near-zero weight, and the prediction will be totally off.
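The linear-regression argument above can be made concrete with a few lines of plain Python. The weights and the sample are made-up numbers chosen for illustration, not the output of any fitted model.

```python
# Hypothetical learned weights: feature_1 is unimportant, feature_2 matters.
weights = {"feature_1": 0.0, "feature_2": 2.0}
train_order = ["feature_1", "feature_2"]
w = [weights[name] for name in train_order]


def predict(row):
    """Linear prediction: weights are matched to values purely by position."""
    return sum(wi * xi for wi, xi in zip(w, row))


sample = {"feature_1": 5.0, "feature_2": 1.0}

# Same column order as during training: 0.0 * 5.0 + 2.0 * 1.0
right = predict([sample[name] for name in train_order])

# Columns swapped: 0.0 * 1.0 + 2.0 * 5.0
wrong = predict([sample[name] for name in reversed(train_order)])

print(right, wrong)
```

The model itself is unchanged between the two calls; only the position of the values differs, and that alone moves the prediction from 2.0 to 10.0.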
elyase is correct. scikit-learn will simply take the data in whatever order you give it. Hence, you'll have to ensure that the data is in the same order during training and prediction time.
Here's a simple illustrating example:
Training time:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
x = pd.DataFrame({
    'feature_1': [0, 0, 1, 1],
    'feature_2': [0, 1, 0, 1]
})
y = [0, 0, 1, 1]
model.fit(x, y)
# we now have a model that
# (i) predicts 0 when x = [0, 0] or [0, 1], and
# (ii) predicts 1 when x = [1, 0] or [1, 1]
Prediction time:
# positive example
http_request_payload = {
    'feature_1': 0,
    'feature_2': 1
}
input_features = pd.DataFrame([http_request_payload])
model.predict(input_features) # this returns 0, as expected
# negative example
http_request_payload = {
    'feature_2': 1, # notice that the order is jumbled up
    'feature_1': 0
}
input_features = pd.DataFrame([http_request_payload])
model.predict(input_features) # this returns 1, when it should have returned 0.
# scikit-learn doesn't care about the key-value mapping of the features.
# it simply vectorizes the dataframe in whatever order it comes in.
# (note: newer scikit-learn releases remember the DataFrame column names seen
# during fit and reject mismatched names or order at predict time.)
This is how I cache the column order during training so that I can use it during prediction time.
# training
x = pd.DataFrame([...])
column_order = x.columns
model = SomeModel().fit(x, y) # train model
# save the things that we need at prediction time. you can also use pickle if you don't want to pip install joblib
import joblib
joblib.dump(model, 'my_model.joblib')
joblib.dump(column_order, 'column_order.txt')
# load the artifacts from disk
model = joblib.load('my_model.joblib')
column_order = joblib.load('column_order.txt')
# imaginary http request payload
request_payload = { 'feature_1': ..., 'feature_2': ... }
# build a one-row dataframe and put its columns in the training order (using column_order)
# (DataFrame.append was removed in pandas 2.0, so reindex is used instead)
input_features = pd.DataFrame([request_payload]).reindex(columns=column_order)
input_features = input_features.fillna(0) # handle any missing data however you like
model.predict(input_features.values.tolist())

Using prepared data for Sci-kit classification

I am trying to use the scikit-learn Python library to classify a bunch of URLs for the presence of certain keywords matching a user profile. A user has a name, email address ... and a URL assigned to them. I have created a txt file with the result of each profile data match on each link, so it is in the format:
Name Email Address
0 1 0 =>Relevant
1 1 0 =>Relevant
0 1 1 =>Relevant
0 0 0 =>Not Relevant
Where the 0 or 1 signifies whether the attribute was found on the page (each row is a webpage).
How do I give this data to scikit-learn so it can use it to run a classifier? The examples I have seen all have data coming from a predefined scikit-learn dataset such as digits or iris, or data generated in the format I already have. I just don't know how to provide the data format I have to the library.
The above is a toy example and I have many more features than 3.
The data needed is a numpy array (in this case a "matrix") with the shape (n_samples, n_features).
A simple way to read a CSV file into the right format is numpy.genfromtxt. Also refer to this thread.
Let the contents of a csv file (say file.csv in the current working directory) be:
a,b,c,target
1,1,1,0
1,0,1,0
1,1,0,1
0,0,1,1
0,1,1,0
To load it we do
data = np.genfromtxt('file.csv', delimiter=',', skip_header=1)
skip_header=1 skips the first line, so the header row (the a,b,c,target line) is not read as data, and delimiter=',' tells NumPy that the values are comma-separated. Refer to NumPy's documentation for more details.
Once you load the data, you need to do some pre-processing based on your input data format. The preprocessing could be something like splitting the input and the targets (classification) or splitting the whole dataset into a training and validation set (for cross-validation).
To split the input (feature matrix) from the output (target vector) we do
features = data[:, :3]
targets = data[:, 3] # The last column is identified as the target
For the data in the question (encoding Relevant as 1 and Not Relevant as 0), the arrays will look like:
features = array([[ 0, 1, 0],
[ 1, 1, 0],
[ 0, 1, 1],
[ 0, 0, 0]]) # shape = ( 4, 3)
targets = array([ 1, 1, 1, 0]) # shape = ( 4, )
Now these arrays are passed to the estimator object's fit function. If you are using the popular SVM classifier, then:
>>> from sklearn.svm import LinearSVC
>>> linear_svc_model = LinearSVC()
>>> linear_svc_model.fit(X=features, y=targets)
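The loading and splitting steps above can be tried end to end on the sample CSV without touching the disk. This is a minimal sketch that reads the text from an in-memory buffer instead of file.csv; the variable names are only illustrative.

```python
import io

import numpy as np

csv_text = """a,b,c,target
1,1,1,0
1,0,1,0
1,1,0,1
0,0,1,1
0,1,1,0
"""

# genfromtxt accepts a file-like object as well as a filename.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

features = data[:, :-1]             # every column except the last
targets = data[:, -1].astype(int)   # last column holds the class labels

print(features.shape, targets)
```

The resulting `features` array has shape (n_samples, n_features) and `targets` has shape (n_samples,), which is the layout the fit call expects.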

Any python Support Vector Machine library around that allows online learning?

I know there are some libraries that allow using Support Vector Machines from Python code, but I am looking specifically for libraries that allow one to train them online (that is, without having to give them all the data at once).
Are there any?
LibSVM includes a python wrapper that works via SWIG.
Example svm-test.py from their distribution:
#!/usr/bin/env python
from svm import *
# a three-class problem
labels = [0, 1, 1, 2]
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
problem = svm_problem(labels, samples);
size = len(samples)
kernels = [LINEAR, POLY, RBF]
kname = ['linear','polynomial','rbf']
param = svm_parameter(C = 10,nr_weight = 2,weight_label = [1,0],weight = [10,1])
for k in kernels:
    param.kernel_type = k;
    model = svm_model(problem,param)
    errors = 0
    for i in range(size):
        prediction = model.predict(samples[i])
        probability = model.predict_probability
        if (labels[i] != prediction):
            errors = errors + 1
    print "##########################################"
    print " kernel %s: error rate = %d / %d" % (kname[param.kernel_type], errors, size)
    print "##########################################"
param = svm_parameter(kernel_type = RBF, C=10)
model = svm_model(problem, param)
print "##########################################"
print " Decision values of predicting %s" % (samples[0])
print "##########################################"
print "Numer of Classes:", model.get_nr_class()
d = model.predict_values(samples[0])
for i in model.get_labels():
    for j in model.get_labels():
        if j>i:
            print "{%d, %d} = %9.5f" % (i, j, d[i,j])
param = svm_parameter(kernel_type = RBF, C=10, probability = 1)
model = svm_model(problem, param)
pred_label, pred_probability = model.predict_probability(samples[1])
print "##########################################"
print " Probability estimate of predicting %s" % (samples[1])
print "##########################################"
print "predicted class: %d" % (pred_label)
for i in model.get_labels():
    print "prob(label=%d) = %f" % (i, pred_probability[i])
print "##########################################"
print " Precomputed kernels"
print "##########################################"
samples = [[1, 0, 0, 0, 0], [2, 0, 1, 0, 1], [3, 0, 0, 1, 1], [4, 0, 1, 1, 2]]
problem = svm_problem(labels, samples);
param = svm_parameter(kernel_type=PRECOMPUTED,C = 10,nr_weight = 2,weight_label = [1,0],weight = [10,1])
model = svm_model(problem, param)
pred_label = model.predict(samples[0])
Haven't heard of one. But do you really need online learning? I've been using SVMs for quite some time and never encountered a problem where I had to use online learning. Usually I set a threshold on the number of changed training examples (maybe 100 or 1000) and then just batch-retrain on everything.
If your problem is at a scale, where you absolutely have to use online learning, then you might want to take a look at vowpal wabbit.
Reedited below, after comment:
Olivier Grisel suggested using a ctypes wrapper around LaSVM. Since I didn't know about LaSVM before and it looks pretty cool, I'm intrigued to try it on my own problems :).
If you're limited to the Python VM only (embedded device, robot), I'd suggest using a voted/averaged perceptron, which performs close to an SVM, but is easy to implement and "online" by default.
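To show how little code an averaged perceptron needs, here is a minimal pure-Python sketch. The class and method names are hypothetical, labels are assumed to be -1 or +1, and the four training points are made-up, linearly separable toy data.

```python
class AveragedPerceptron(object):
    """Online binary classifier; labels must be -1 or +1."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features      # current weights
        self.b = 0.0                     # current bias
        self.w_sum = [0.0] * n_features  # running sum of weights over all steps
        self.b_sum = 0.0

    @staticmethod
    def _score(w, b, x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    def partial_fit(self, x, y):
        """Consume one example; update only when it is misclassified."""
        if y * self._score(self.w, self.b, x) <= 0:
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
            self.b += y
        # Accumulate after every example, mistake or not.
        self.w_sum = [s + wi for s, wi in zip(self.w_sum, self.w)]
        self.b_sum += self.b

    def predict(self, x):
        # The sign of the summed weights equals the sign of their average.
        return 1 if self._score(self.w_sum, self.b_sum, x) >= 0 else -1


stream = [([2, 0], 1), ([0, 2], -1), ([3, 1], 1), ([1, 3], -1)]
clf = AveragedPerceptron(n_features=2)
for _ in range(5):            # examples arrive one at a time
    for x, y in stream:
        clf.partial_fit(x, y)

predictions = [clf.predict(x) for x, _ in stream]
print(predictions)
```

Because `partial_fit` touches one example at a time and keeps only two weight vectors, the model can be updated as data arrives without storing the history.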
Just saw that Elefant has some online-SVM code.
While there are no python bindings there, the algorithm described at
http://leon.bottou.org/projects/sgd is trained in an online fashion and is easily reimplemented using e.g. numpy.
Pegasos is an online SVM algorithm that performs quite nicely. It's also fairly easy to implement, even without a specific Python binding. There is a C implementation on the author's website that is adaptable or embeddable as well.
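For a sense of how short a Pegasos-style update is, here is a simplified pure-Python sketch. It makes several assumptions beyond the text above: labels are -1 or +1, there is no bias term, the optional projection step is omitted, examples are processed in arrival order instead of being sampled at random, and `lam` and the toy data are arbitrary choices.

```python
def pegasos_step(w, x, y, t, lam):
    """One sub-gradient step on the regularized hinge loss at iteration t (t >= 1)."""
    eta = 1.0 / (lam * t)                      # decaying step size
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    shrink = 1.0 - eta * lam                   # effect of the L2 regularizer
    if margin < 1:
        # Example violates the margin: shrink and move towards y * x.
        return [shrink * wi + eta * y * xi for wi, xi in zip(w, x)]
    return [shrink * wi for wi in w]           # otherwise only shrink


def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


stream = [([2, 0], 1), ([0, 2], -1), ([3, 1], 1), ([1, 3], -1)]
w = [0.0, 0.0]
t = 0
for _ in range(5):            # five passes over the toy stream
    for x, y in stream:
        t += 1
        w = pegasos_step(w, x, y, t, lam=0.1)

predictions = [predict(w, x) for x, _ in stream]
print(w, predictions)
```

Each step needs only the current weight vector and the incoming example, which is what makes this style of training usable online.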
Why would you want to train it online? Adding training instances would usually require re-solving the quadratic programming problem associated with the SVM.
A way to handle this is to train a SVM in batch mode, and when new data is available, check if these data points are in the [-1, +1] margin of the hyperplane. If so, retrain the SVM using all the old support vectors, and the new training data that falls in the margin.
Of course, the results can be slightly different compared to batch training on all your data, as some points can be discarded that would be support vectors later on. So again, why do you want to perform online training of your SVM?
