I am trying to build a simple digit-recognising ANN in Python. It has an input layer with 15 inputs, one hidden layer, and an output layer (10 neurons). I am truly a beginner in this field, but I am an experienced programmer.
When a=1 and b=0, I want outputs f=0 and g=1.
Values at the hidden neurons:
C: 1*1 + (-0.5)*0 = 1
D: 1*0 + 0.1*1 = 0.1
E: -1*0 + (-0.5)*1 = -0.5
Since the sigmoidal function fires only when the value is > 0, I guess only neurons C and D fire. So the output of E will be 0, right?
C: 1, D: 0.1, E: 0
Value at F:
1*1 + 0.1*(-0.3) + 0*0.3 = 0.97 (neuron fires)
Value at G:
1*(-1) + (-0.5)*0.1 + 0*0.1 = -1.05 (neuron does not fire)
So the output seems to be F: 1 and G: 0, which is the opposite of what I want.
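As a sanity check, the forward pass can be sketched in a few lines of Python. The weights below are illustrative placeholders (the actual network diagram is not shown here), but the activation follows the firing rule used above: a neuron passes its weighted sum through only when the sum is positive, otherwise it outputs 0.

```python
def activate(x):
    """Step-gated identity: pass the value through only if it is positive."""
    return x if x > 0 else 0.0

def forward(inputs, weights):
    """Weighted sum of the inputs followed by the threshold activation."""
    s = sum(i * w for i, w in zip(inputs, weights))
    return activate(s)

a, b = 1, 0
# Hypothetical hidden-layer weights, one (w_a, w_b) pair per neuron C, D, E:
hidden_weights = [(1.0, -0.5), (0.1, 0.3), (-1.0, -0.5)]
hidden = [forward((a, b), w) for w in hidden_weights]
print(hidden)  # [1.0, 0.1, 0.0] with these placeholder weights
```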
Now I am really confused about backpropagation. How can I use backpropagation to correct the weights in this case? The math steps would be great.
First I need confirmation that the math is right. After that I have a lot of follow-up questions.
I am using the sigmoidal function as the threshold: if the value is less than 0 there is no output, and if it is greater than 0 the neuron fires.
If a neuron does not fire, then its output is taken as zero, right?
I hope this helps a bit, although our implementations/approaches are different.
In my implementation (a small, simple network) I calculate a node's output as the sum, over all input nodes, of each input node's output times its weight; if an input neuron doesn't fire, it doesn't contribute.
(In my implementation, though, I let negative values pass through to the next level.)
Pseudocode:
for (let idxHidden in hiddensOutput) {
    let sum = 0
    for (let idxInput in inputsOutput) {
        sum = sum + inputsOutput[idxInput] * inputsWeight[idxInput][idxHidden]
    }
    hiddensOutput[idxHidden] = sigmoid(sum)
}
hiddensOutput ... a list of outputs for the nodes in the next layer (these become the inputs to the layer after it)
inputsOutput ... a list of the outputs of the input nodes
inputsWeight ... a matrix of the weights between those nodes (setting this up is the "tricky" part)
So to answer the question: "If a neuron does not fire, then its output is taken as zero right?"
Yes
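Translated to Python, the pseudocode above looks roughly like this (assuming a standard logistic sigmoid; the weight values in the example call are arbitrary):

```python
import math

def sigmoid(x):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs_output, inputs_weight):
    """Compute the next layer's outputs: for each target node, sum
    input * weight over all source nodes, then apply the activation.
    inputs_weight[i][h] is the weight from input node i to hidden node h."""
    n_hidden = len(inputs_weight[0])
    hiddens_output = []
    for idx_hidden in range(n_hidden):
        s = sum(inputs_output[idx_input] * inputs_weight[idx_input][idx_hidden]
                for idx_input in range(len(inputs_output)))
        hiddens_output.append(sigmoid(s))
    return hiddens_output

# Example: 2 inputs feeding 3 hidden nodes (the weights are arbitrary):
print(layer_forward([1.0, 0.0], [[0.5, -0.2, 1.0], [0.3, 0.8, -1.0]]))
```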
Update:
Here are the links from the comment section that I posted earlier:
Course:
http://coursera.org/learn/machine-learning (there are others, some even self-paced)
http://shop.oreilly.com/product/9780596529321.do an interesting book (with lots of explained Python code)
http://en.m.wikipedia.org/wiki/Artificial_neural_network (basic math)
Related
I have a very specific task to complete and I am honestly lost. The goal is to define a function in Python that removes every 1 in a binary input that does not have another 1 next to it. I will show you an example.
Let's take input 0b11010 --> the output would be 0b11000. Another example: 0b10101 --> the output would be 0b00000.
The real twist is that I cannot use any for/while loops, import any library, or use zip, lists, map, etc. The function must be defined purely using bitwise operations and nothing else.
I have already tried to implement some ideas, but those were only blind shots that got nowhere. Any help would be appreciated, thanks!
To break down the condition mathematically, the i-th bit of the output should be 1 if and only if:
The i-th bit of the input is 1.
And either the (i-1)-th bit or the (i+1)-th bit of the input is also 1.
Logically the condition is input[i] and (input[i-1] or input[i+1]) if the input is a bit vector. If the input is simply a number, indexing can be emulated with bit shifting and masking, giving this code:
def remove_lonely_ones(b):
    return b & ((b << 1) | (b >> 1))
Testing shows that it works both on your examples and on edge cases:
print("{: 5b}".format(remove_lonely_ones(0b11111))) # prints 11111
print("{: 5b}".format(remove_lonely_ones(0b11010))) # prints 11000
print("{: 5b}".format(remove_lonely_ones(0b11011))) # prints 11011
print("{: 5b}".format(remove_lonely_ones(0b10101))) # prints 0
print("{: 5b}".format(remove_lonely_ones(0b00000))) # prints 0
I'm trying to implement a really simple AND-gate neuron using Adaline learning, but even after running the algorithm over many epochs, I cannot understand why the answers I get are nowhere close to the real ones.
x1 = [1, 0, 1, 0]
x2 = [1, 1, 0, 0]
o = [1, -1, -1, -1]

w0 = 0.2
w1 = 0.2
w2 = 0.2
learningRate = 0.2

for j in range(0, 200):
    for i in range(0, 4):
        y = x1[i] * w1 + x2[i] * w2 + w0
        w1 += learningRate * (o[i] - y) * x1[i]
        w2 += learningRate * (o[i] - y) * x2[i]
        w0 += learningRate * (o[i] - y)

print(w1)
print(w2)

a = int(input('give input A'))
b = int(input('give input B'))
print(w1 * a + w2 * b + w0)
After training, I expected the output to be really close to 1 when both inputs were 1, and really close to -1 for the other inputs.
This is because there is no exact numerical solution for the mentioned AND gate: a purely linear unit cannot hit the target values exactly. This is why activation functions are used for the output of neurons. For this specific binary/logical case you just need a threshold on the output and you are fine.
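To make that concrete, here is a sketch that repeats the asker's training loop unchanged and then applies a sign threshold at prediction time, so the logical AND output becomes exactly +1/-1 (the predict helper is the addition):

```python
# Same ADALINE weight updates as in the question, unchanged:
x1 = [1, 0, 1, 0]
x2 = [1, 1, 0, 0]
o = [1, -1, -1, -1]
w0 = w1 = w2 = 0.2
learningRate = 0.2

for j in range(200):
    for i in range(4):
        y = x1[i] * w1 + x2[i] * w2 + w0   # linear (pre-threshold) output
        w1 += learningRate * (o[i] - y) * x1[i]
        w2 += learningRate * (o[i] - y) * x2[i]
        w0 += learningRate * (o[i] - y)

def predict(a, b):
    """Threshold the linear output: fire (+1) only if it is positive."""
    return 1 if a * w1 + b * w2 + w0 > 0 else -1

print(predict(1, 1), predict(1, 0), predict(0, 1), predict(0, 0))
# -> 1 -1 -1 -1
```

The linear outputs themselves settle near 0.375 for (1,1) and between -0.5 and -1.5 for the rest, so they never equal the ±1 targets exactly, but the threshold recovers the correct AND behaviour.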
I am a beginner in Python. Currently I'm writing code for a simple solver for non-linear ODE systems with initial values. The equations of the system are as follows.
The function myu is evaluated first to get the value of myu, which is then used in dX/dt, dS/dt, and dDO/dt. At the next step, myu is evaluated again to get its new value based on the new values of S and DO.
I am using the General Linear Method (GLM), proposed by J. C. Butcher. This method uses a transition matrix, whose values and size depend on the numerical method we use. In this case, I use Runge-Kutta Cash-Karp.
While you may find in the equations that D is also a function, here I set the value of D as a constant.
In initialization, the value of h is set first, to get the number of steps. I create a matrix named 'initValue' with 8 rows and 4 columns, consisting of the values of k for each equation (rows 1 to 6), the initial value for the fourth-order Runge-Kutta stage (row 7; I set it to 0 since it just acts as a 'predictor'), and the initial value of each equation (row 8).
The transition matrix is created based on the GLM; the values inside it come from the constants of the stage equations (used to find k1 to k6) and the step equations (used to find the solutions) of Runge-Kutta Cash-Karp.
In the loop, at the very first time I simply add the initial values to an array named 'result'. At the first step, I simply multiply the transition matrix with 'initValue'. At each subsequent step, I rebuild 'initValue' from the previous step's result.
What I'm looking for is the solution which should look like this.
My code works if h is less than 1; I compare my results with the results from scipy.integrate.odeint. But when I set h bigger than 1, it shows a different result than it should. For example, in the code I set h = 100, which means it only displays the initial value and the final value (at time = 100). While X and S should go upward, and DO and Xr downward, mine do the opposite. The result from odeint with h set bigger than 1 matches the expected solution.
I need help to fix my code so it can display the expected solution for any value of h.
Thank you.
Why do you expect any kind of reasonable result for ridiculously large step sizes?
The simplest demonstration is y' = -y and the explicit Euler method. If you use step sizes smaller than 1, you will get qualitatively correct solutions. For step sizes smaller than 0.1, you will also start to get quantitatively correct solutions.
However, if you use a step size of h = 10, then the iteration
y[k+1] = y[k] - h*y[k] = -9*y[k]
will explode. The same happens for higher-order methods: sufficiently small step sizes give quantitatively correct results, medium step sizes can still give a qualitatively correct picture, and large step sizes lead to errors that very quickly become larger than the solution values.
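A minimal sketch of that behaviour (not the asker's GLM code, just explicit Euler on y' = -y with y(0) = 1, whose exact solution is exp(-t)):

```python
def euler(h, t_end=10.0):
    """Explicit Euler for y' = -y, y(0) = 1; exact solution is exp(-t_end)."""
    n = int(round(t_end / h))   # number of steps
    y = 1.0
    for _ in range(n):
        y = y - h * y           # y[k+1] = (1 - h) * y[k]
    return y

print(euler(0.01))  # ~4.3e-5, close to the exact exp(-10) ~ 4.54e-5
print(euler(0.5))   # ~9.5e-7: still decays, qualitatively right
print(euler(10.0))  # -9.0: a single step already overshoots and flips sign
```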
I'm constructing a Naive Bayes text classifier from scratch in Python and I am aware that, upon encountering a product of very small probabilities, using a logarithm over the probabilities is a good choice.
The issue now, is that the mathematical function that I'm using has a summation OVER a product of these extremely small probabilities.
To be specific, I'm trying to calculate the total word probabilities given a mixture component (class) over all classes.
Just plainly adding up the logs of these total probabilities is incorrect, since the log of a sum is not equal to the sum of logs.
To give an example, let's say that I have 3 classes, 2000 words and 50 documents.
Then I have a word probability matrix called wordprob with 2000 rows and 3 columns.
The algorithm for the total word probability in this example would look like this:
sum = 0
for j in range(0, 3):
    prob_product = 1
    for i in words:  # just the index of words from my vocabulary in this document
        prob_product = prob_product * wordprob[i, j]
    sum = sum + prob_product
What ends up happening is that prob_product becomes 0 on many iterations due to many small probabilities multiplying with each other.
Since I can't easily solve this with logs (because of the summation in front) I'm totally clueless.
Any help will be much appreciated.
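The underflow itself is easy to demonstrate (the numbers below are made up for illustration):

```python
import math

# Multiplying many small probabilities drives the product to exactly 0.0
# in double precision, while the sum of the logs stays perfectly finite.
probs = [1e-3] * 200          # 200 word probabilities of 0.001 each
product = 1.0
for p in probs:
    product *= p              # underflows: 1e-600 is far below ~5e-324

log_sum = sum(math.log(p) for p in probs)
print(product)                # 0.0
print(log_sum)                # about -1381.55 (= 200 * ln(1e-3))
```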
I think you may be best off keeping everything in logs. The first part of this, computing the log of the product, is just adding up the logs of the terms. The second bit, computing the log of the sum of the exponentials of the logs, is a bit trickier.
One way would be to store each of the logs of the products in an array, and then you need a function that, given an array L with n elements, will compute
S = log( sum { i=1..n | exp( L[i])})
One way to do this is to find the maximum, M say, of the L's; a little algebra shows
S = M + log( sum { i=1..n | exp( L[i]-M)})
Each of the terms L[i]-M is non-positive, so overflow can't occur. Underflow is not a problem, as for those terms exp will simply return 0. At least one of the terms (the one where L[i] is M) will be zero, so its exp will be one, and we'll end up with something we can safely pass to log. In other words, evaluating the formula will be trouble-free.
If you have the function log1p (log1p(x) = log(1+x)), then you can gain some accuracy by omitting the (just one!) i where L[i] == M from the sum, and passing the sum to log1p instead of log.
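Putting the two steps together (sum the logs of the word probabilities per class, then combine across classes with the max trick), a sketch:

```python
import math

def logsumexp(L):
    """Compute log(sum(exp(L[i]))) without overflow or underflow by
    factoring out the maximum M: result = M + log(sum(exp(L[i] - M)))."""
    M = max(L)
    return M + math.log(sum(math.exp(x - M) for x in L))

# Per-class log-probabilities: the log of each product is the SUM of logs,
# e.g. sum(log(wordprob[i, j]) for i in words). Values here are made up.
log_products = [-800.0, -802.0, -810.0]
total = logsumexp(log_products)
print(total)  # ~ -799.873, even though exp(-800) underflows to 0.0 directly
```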
Your question seems to be on the math side of things rather than the coding of it. I haven't quite figured out what your issue is, but the sum of logs equals the log of the product. Don't know if that helps.
Also, you are calculating one prob_product for every j but only using the last one (and re-initializing it each time). You meant to do one of two things: either initialize it before the j-loop, or use it before you increment j. Finally, it doesn't look like you need to initialize sum unless this is part of yet another loop you are not showing here.
That's all I have for now.
Sorry for the long post and no code.
High school algebra tells you this:
log(A*B*....*Z) = log(A) + log(B) + ... + log(Z) != log(A + B + .... + Z)
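A quick numerical check of that distinction:

```python
import math

A, B = 0.003, 0.005
print(math.log(A) + math.log(B))  # equals log(A*B): about -11.107
print(math.log(A * B))            # about -11.107
print(math.log(A + B))            # about -4.828: a very different number
```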
I am solving homework 1 of the Caltech Machine Learning course (http://work.caltech.edu/homework/hw1.pdf). To solve questions 7-10 we need to implement a PLA. This is my implementation in Python:
import sys, math, random

w = []       # stores the weights
data = []    # stores the vectors X (x1, x2, ...)
output = []  # stores the outputs (y)

# returns 1 if the dot product is more than 0
def sign_dot_product(x):
    global w
    dot = sum([w[i] * x[i] for i in xrange(len(w))])
    if dot > 0:
        return 1
    else:
        return -1

# checks if a point is misclassified
def is_misclassified(rand_p):
    return (True if sign_dot_product(data[rand_p]) != output[rand_p] else False)

# loads data in the following format:
# x1 x2 ... y
# In the present case, for d=2:
# x1 x2 y
def load_data():
    f = open("data.dat", "r")
    global w
    for line in f:
        data_tmp = ([1] + [float(x) for x in line.split(" ")])
        data.append(data_tmp[0:-1])
        output.append(data_tmp[-1])

def train():
    global w
    w = [random.uniform(-1, 1) for i in xrange(len(data[0]))]  # initializes w with random weights
    iter = 1
    while True:
        rand_p = random.randint(0, len(output) - 1)  # randomly picks a point
        check = [0] * len(output)  # check[i] is 1 if the ith point is correctly classified
        while not is_misclassified(rand_p):
            check[rand_p] = 1
            rand_p = random.randint(0, len(output) - 1)
            if sum(check) == len(output):
                print "All points successfully satisfied in ", iter - 1, " iterations"
                print iter - 1, w, data[rand_p]
                return iter - 1
        sign = output[rand_p]
        w = [w[i] + sign * data[rand_p][i] for i in xrange(len(w))]  # changing weights
        if iter > 1000000:
            print "greater than 1000"
            print w
            return 10000000
        iter += 1

load_data()

def simulate():
    # tot_iter = train()
    tot_iter = sum([train() for x in xrange(100)])
    print float(tot_iter) / 100

simulate()
The problem: according to the answer to question 7, it should take around 15 iterations for the perceptron to converge for this size of training set, but my implementation takes an average of 50000 iterations. The training data is supposed to be randomly generated, but I am generating data separated by simple lines such as x=4, y=2, etc. Is this the reason I am getting the wrong answer, or is something else wrong? A sample of my training data (separable using y=2):
1 2.1 1
231 100 1
-232 1.9 -1
23 232 1
12 -23 -1
10000 1.9 -1
-1000 2.4 1
100 -100 -1
45 73 1
-34 1.5 -1
It is in the format x1 x2 output(y)
It is clear that you are doing a great job learning both Python and classification algorithms with your effort.
However, some stylistic inefficiencies in your code make it difficult to help you, and they create a chance that part of the problem could be a miscommunication between you and the professor.
For example, does the professor wish for you to use the Perceptron in "online mode" or "offline mode"? In "online mode" you should move sequentially through the data points and never revisit any point. From the assignment's conjecture that it should require only 15 iterations to converge, I am curious whether this implies that the first 15 data points, in sequential order, would result in a classifier that linearly separates your data set.
By instead sampling randomly with replacement, you might be causing yourself to take much longer (although, depending on the distribution and size of the data sample, this is admittedly unlikely since you'd expect roughly that any 15 points would do about as well as the first 15).
The other issue is that after you detect a correctly classified point (cases where not is_misclassified evaluates to True), if you then draw a new random point that is misclassified, your code drops down into the larger section of the outer while loop and then goes back to the top, where it overwrites the check vector with all 0s.
This means that the only way your code can detect that it has correctly classified all the points is if the particular random sequence in which it evaluates them (in the inner while loop) happens to be a string of correctly classified draws long enough to mark every point, never hitting a misclassified point on that pass through the array.
I can't quite formalize why I think that will make the program take much longer, but it seems like your code is requiring a much stricter form of convergence, where it sort of has to learn everything all at once on one monolithic pass way late in the training stage after having been updated a bunch already.
One easy way to check whether my intuition about this is wrong would be to move the line check=[0]*len(output) outside of the while loop altogether and initialize it only once.
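For comparison, here is a more conventional convergence test: recheck every point after each update and stop when none is misclassified. This is not the asker's exact code; the function name and the tiny made-up data set at the bottom are illustrative, but the update rule is the standard PLA.

```python
import random

def pla_train(data, output):
    """Standard PLA: keep updating until no point is misclassified.
    Each row of data includes the bias term x0 = 1; labels are +1/-1."""
    w = [0.0] * len(data[0])

    def sign(v):
        return 1 if v > 0 else -1

    iters = 0
    while True:
        # Recheck EVERY point against the current weights:
        misclassified = [i for i in range(len(output))
                         if sign(sum(wi * xi for wi, xi in zip(w, data[i]))) != output[i]]
        if not misclassified:
            return w, iters
        i = random.choice(misclassified)  # PLA update on one bad point
        w = [wi + output[i] * xi for wi, xi in zip(w, data[i])]
        iters += 1

# Tiny made-up separable example (bias x0 = 1 prepended to each point):
data = [[1, 2, 3], [1, -2, -3], [1, 3, 1], [1, -1, -4]]
labels = [1, -1, 1, -1]
w, iters = pla_train(data, labels)
print(iters)
```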
Some general advice to make the code easier to manage:
Don't use global variables. Instead, let the function that loads and preps the data return things.
There are a few places where you say, for example,
return (True if sign_dot_product(data[rand_p])!=output[rand_p] else False)
This kind of thing can be simplified to
return sign_dot_product(data[rand_p]) != output[rand_p]
which is easier to read and conveys what criteria you're trying to check for in a more direct manner.
I doubt efficiency plays an important role since this seems to be a pedagogical exercise, but there are a number of ways to refactor your list comprehensions that might be beneficial. And if possible, just use NumPy, which has native array types. Watching some of these operations being expressed with list operations is lamentable. Even if your professor doesn't want you to use NumPy because she or he is trying to teach pure fundamentals, I say just ignore that and go learn NumPy anyway. It will help you with jobs, internships, and practical skill with these kinds of manipulations in Python far more than fighting with the native data types to do something they were not designed for (array computing).
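For instance, the dot-product-and-sign step might look like this with NumPy (a sketch; the array names and numbers are illustrative, not from the assignment):

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])        # weights; bias weight is w[0]
X = np.array([[1.0, 2.0, 3.0],        # each row: [1, x1, x2]
              [1.0, -2.0, 0.5]])
y = np.array([1, -1])

predictions = np.where(X @ w > 0, 1, -1)   # vectorized sign of the dot products
misclassified = predictions != y           # boolean mask, no explicit loops
print(predictions, misclassified.sum())
```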