I am dealing with a fairly large data set on which I need to do binary classification using a kernelized perceptron. I am using this source code: https://gist.github.com/mblondel/656147 .
There are three things here that can be parallelized: 1) the kernel computation, 2) the update rule, and 3) the projection part. I also applied some other speedups, like computing only the upper-triangular part of the kernel and then mirroring it into the full symmetric matrix:
K = np.zeros((n_samples, n_samples))
for index in itertools.combinations_with_replacement(range(n_samples), 2):
    K[index] = self.kernel(X[index[0]], X[index[1]], self.gamma)
# make the full kernel matrix
K = K + np.triu(K, 1).T
I also parallelized the projection part like this:
def parallel_project(self, X):
    """Function for parallelizing prediction."""
    y_predict = np.zeros(self.nOfWorkers, "object")
    pool = mp.Pool(processes=self.nOfWorkers)
    results = [pool.apply_async(prediction_worker,
                                args=(self.alpha, self.sv_y, self.sv, self.kernel, (parts,)))
               for parts in np.array_split(X, self.nOfWorkers)]
    pool.close()
    pool.join()
    i = 0
    for r in results:
        y_predict[i] = r.get()
        i += 1
    return np.hstack(y_predict)
and the worker:
def prediction_worker(alpha, sv_y, sv, kernel, samples):
    """Worker for parallelizing the prediction part."""
    print "starting:", mp.current_process().name
    X = samples[0]
    y_predict = np.zeros(len(X))
    for i in range(len(X)):
        s = 0
        for a1, sv_y1, sv1 in zip(alpha, sv_y, sv):
            s += a1 * sv_y1 * kernel(X[i], sv1)
        y_predict[i] = s
    return y_predict.flatten()
but the code is still too slow. Can you give me any hints regarding parallelization, or any other speedups?
Remark: please provide a general solution; I am not dealing with custom kernel functions.
Thanks
Here's something that should give you an instant speedup. The kernels in Mathieu's example code take single samples, but full Gram matrices are then computed using them:
K = np.zeros((n_samples, n_samples))
for i in range(n_samples):
    for j in range(n_samples):
        K[i, j] = self.kernel(X[i], X[j])
This is slow, and can be avoided by vectorizing the kernel functions:
def linear_kernel(X, Y):
    return np.dot(X, Y.T)

def polynomial_kernel(X, Y, p=3):
    return (1 + np.dot(X, Y.T)) ** p

# the Gaussian RBF kernel is a bit trickier
Now the Gram matrix can be computed as just
K = kernel(X, X)
The project function should be changed accordingly to speed that up as well.
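For the RBF case, a common vectorization (a sketch, not part of Mathieu's code; the `gamma` parameter and the `sv`/`alpha` names in the final comment follow the question's snippet) uses the expansion ||x − y||² = ||x||² + ||y||² − 2·x·y:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    # pairwise squared distances via ||x||^2 + ||y||^2 - 2 x.y,
    # computed for all pairs at once instead of sample by sample
    X = np.atleast_2d(X)
    Y = np.atleast_2d(Y)
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2 * np.dot(X, Y.T))
    return np.exp(-gamma * sq_dists)

# with a vectorized kernel, the projection also collapses to one matrix product:
# y_predict = rbf_kernel(X, sv, gamma).dot(alpha * sv_y)
```

The same trick removes the inner Python loop from `prediction_worker` as well, which usually helps more than multiprocessing does.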
The final step seen on Symbolab is a conversion to decimal to get Radius = 2005.65151, which I'm not sure how to recreate, or whether there's a step in between.
The result I have so far (RadiusD) prints a fraction.
Image: polygon formula where s = side length, n = number of sides, r = radius.
Error in this code:
import math
from fractions import Fraction

def TestRadius():
    HalfSideLen = 120/2
    # 60
    edgeNum2Radians = math.radians(105)  # edge count
    Radians = math.pi/edgeNum2Radians  # correct so far
    # 1.7142857142857142
    Radius = HalfSideLen / math.sin(Radians)
    RadiusD = Fraction(HalfSideLen / math.sin(Radians))
    # 1066491443117295/17592186044416
    # wanting r = 2005.65151
    print(RadiusD)

print(TestRadius())
My math is very poor, thanks for your help.
Corrected by @ytung-dev. Somehow step 3 was returning a correct result, so I didn't look too closely at step 2, which is where the error actually was.
import math

def TestRadius():
    HalfSideLen = 120/2
    edgeNum = 105
    Radians = math.pi/edgeNum
    Radius = HalfSideLen / math.sin(Radians)
    print(Radius)

print(TestRadius())
Try this:
import math

def fx(s, n):
    return s / (2 * math.sin(math.pi/n))

print(fx(120, 105))
# 2005.65
A few things to note:
math.sin() uses radians
sin() in Symbolab uses degrees
the equation in your image uses degrees
180 deg = math.pi rad
What is wrong in your script is that edgeNum is a counting number, not an angle, so you should not convert it to radians. The only degree-to-radian conversion you need to handle is the 180 deg in the equation.
So, to make your equation work in Python, simply replace the 180 deg in the equation with math.pi.
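A quick way to convince yourself that the two forms agree (the side length and side count are taken from the question):

```python
import math

n, s = 105, 120                                     # number of sides, side length
r_rad = s / (2 * math.sin(math.pi / n))             # 180 deg written as pi rad
r_deg = s / (2 * math.sin(math.radians(180.0 / n))) # explicit degree conversion
# both give roughly 2005.65
```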
I am trying to implement Probabilistic Matrix Factorization with Stochastic Gradient Descent updates, in theano, without using a for loop.
I have just started learning the basics of theano; unfortunately, my experiment fails with this error:
UnusedInputError: theano.function was asked to create a function
computing outputs given certain inputs, but the provided input
variable at index 0 is not part of the computational graph needed
to compute the outputs: trainM.
The source code is the following:
def create_training_set_matrix(training_set):
    return np.array([
        [_i, _j, _Rij]
        for (_i, _j), _Rij
        in training_set
    ])

def main():
    R = movielens.small()
    U_values = np.random.random((config.K, R.shape[0]))
    V_values = np.random.random((config.K, R.shape[1]))
    U = theano.shared(U_values)
    V = theano.shared(V_values)
    lr = T.dscalar('lr')
    trainM = T.dmatrix('trainM')

    def step(curr):
        i = T.cast(curr[0], 'int32')
        j = T.cast(curr[1], 'int32')
        Rij = curr[2]
        eij = T.dot(U[:, i].T, V[:, j])
        T.inc_subtensor(U[:, i], lr * eij * V[:, j])
        T.inc_subtensor(V[:, j], lr * eij * U[:, i])
        return {}

    values, updates = theano.scan(step, sequences=[trainM])
    scan_fn = function([trainM, lr], values)

    print "training pmf..."
    for training_set in cftools.epochsloop(R, U_values, V_values):
        training_set_matrix = create_training_set_matrix(training_set)
        scan_fn(training_set_matrix, config.lr)
I realize that this is a rather unconventional way to use theano.scan: do you have a suggestion on how I could implement my algorithm better?
The main difficulty lies in the updates: a single update can depend on all the previous updates. For this reason I defined the latent matrices U and V as shared variables (I hope I did that correctly).
The version of theano I am using is: 0.8.0.dev0.dev-8d6800181bedb03a4bced4f456338e5194524317
Any hint or suggestion is highly appreciated, and I am happy to provide further details.
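For reference, the sequential update loop the question describes can be sketched in plain numpy rather than theano (names are illustrative; the error term Rij − U_i·V_j is the usual PMF residual, which the theano snippet's eij leaves out):

```python
import numpy as np

def sgd_epoch(U, V, training_set_matrix, lr):
    """One pass of sequential SGD updates; each step sees all previous updates."""
    for row in training_set_matrix:
        i, j, Rij = int(row[0]), int(row[1]), row[2]
        eij = Rij - U[:, i].dot(V[:, j])   # prediction error (usual PMF residual)
        U[:, i] += lr * eij * V[:, j]      # in-place updates, applied in order
        V[:, j] += lr * eij * U[:, i]
    return U, V
```

This sequential dependence is exactly what makes the scan-based version awkward: each step must read the shared U and V after every earlier step has written to them.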
I'm trying to translate the matlab code below into Python. The code numerically calculates the para-state of a deuterium molecule and then plots the result. When I try to translate it to Python, I seem to get stuck in the nested for-loop that calculates a sum. I have been searching the internet for the past few days, without success.
Because it's a physics code, I will mention some aspects of it. First we calculate the partition function (Z). Then there is a calculation of the energy, which is the partial derivative of ln(Z) with respect to beta. From this we can calculate the specific heat (approximately) as the derivative of the energy with respect to temperature.
So the matlab code looks like this:
epsilon = 0.0038*1.60217662*10^-19;
k = 1.38*10^-23;
T = 1:.1:2000;
beta = 1./(k*T);

%partitionfunction
clear Z Zodd;
for i = 1:length(T)
    clear p;
    for s = 1:2:31;
        a = 2*s+1;
        b = s^2+s;
        p(s) = 3*a*exp(-b*epsilon*beta(i));
    end
    Zodd(i) = sum(p);
end

%energy
ln_Zodd = log(Zodd);
for i = 1 : (length(T)-1)
    Epara(i) = -(ln_Zodd(i+1)-ln_Zodd(i))/(beta(i+1)-beta(i));
end

%heat capacity
for i = 1 : (length(T)-2)
    Cpara(i) = (Epara(i+1)-Epara(i))/(T(i+1)-T(i));
end

%plot
x = k*T/epsilon;
plot(x(1:6000),Cpara(1:6000)/k, 'r');
axis([0 7 0 1.5]);
ylabel('C_v/k');
xlabel('kT/eps');
The corresponding python code:
import numpy as np
import matplotlib.pyplot as plt
import math

epsilon=0.0038*1.60217662*10**-19
k = 1.38*10**-23
T = np.arange(1,2000,0.1)
beta = 1/(k*T)

#partitionfunction
for i in np.arange(1,len(T)):
    for s in np.arange(1,31,2):
        p[s] = 3*(2*s+1)*math.exp(-(s**2+s)*epsilon*beta(i))
    Zodd[i] = sum(p)

#energy
ln_Zodd = math.log(Zodd)
for i in np.arange(1,(len(T) - 1)):
    Epara[i]=- (ln_Zodd(i + 1) - ln_Zodd(i)) / (beta(i + 1) - beta(i))

#heat capacity
for i in np.arange(1,(len(T) - 2)):
    Cpara[i]=(Epara(i + 1) - Epara(i)) / (T(i + 1) - T(i))

#plot
x = k*T/epsilon
plt.plot(x(np.arange(1,6000)),Cpara(np.arange(1,6000)) / k,'r')
plt.axis([0, 7, 0, 1.5])
plt.ylabel('C_v/k')
plt.xlabel('kT/eps')
plt.show()
This should be the easiest way to approximate this problem, because the analytic expression is far more involved. I'm new to Python, so any suggestions or corrections are appreciated.
I agree with @rayryeng that this question is off-topic. However, as I'm interested in matlab, python, and theoretical physics, I took the time to look through your code.
There are multiple syntactical problems with it, and multiple semantic ones as well. Arrays should always be accessed with [] in python, while you often try to use (). Also, the natural indexing of arrays starts from 0, unlike in matlab.
Here's a syntactically and semantically corrected version of your original code:
import numpy as np
import matplotlib.pyplot as plt
#import math #use np.* if you have it already imported

epsilon=0.0038*1.60217662*10**-19
k = 1.38*10**-23
T = np.arange(1,2000,0.1)
beta = 1.0/(k*T) #changed to 1.0 for safe measure; redundant

#partitionfunction
svec=np.arange(1,31,2)
p=np.zeros(max(svec)) #added pre-allocation
Zodd=np.zeros(len(T)) #added pre-allocation
for i in np.arange(len(T)): #changed to index Zodd from 0
    for s in svec: #changed to avoid magic numbers
        p[s-1] = 3*(2*s+1)*np.exp(-(s**2+s)*epsilon*beta[i]) #changed to index p from 0; changed beta(i) to beta[i]; changed to np.exp
    Zodd[i] = sum(p)

#energy
ln_Zodd = np.log(Zodd) #changed to np.log
Epara=np.zeros(len(T)-2) #added pre-allocation
for i in np.arange(len(T) - 2): #changed to index Epara from 0
    Epara[i]=- (ln_Zodd[i + 1] - ln_Zodd[i]) / (beta[i + 1] - beta[i]) #changed bunch of () to []

#heat capacity
Cpara=np.zeros(len(T)-3) #added pre-allocation
for i in np.arange(len(T) - 3): #changed to index Cpara from 0
    Cpara[i]=(Epara[i + 1] - Epara[i]) / (T[i + 1] - T[i])

#plot
x = k*T/epsilon
plt.plot(x[:6000],Cpara[:6000] / k,'r') #fixed and simplified array indices
plt.axis([0, 7, 0, 1.5])
plt.ylabel('C_v/k')
plt.xlabel('kT/eps')
plt.show()
Take the time to look through the comments I made; they are there to instruct you. If something is not clear, please ask for clarification :)
However, this code is far from efficient. Your double loop especially takes a long time to run (which might explain why you thought it hung). So I also made a fully numpy-based version.
Here's the result:
import numpy as np
import scipy.constants as consts
import matplotlib.pyplot as plt

epsilon=0.0038*consts.eV #changed to eV
k = consts.k #changed
T = np.arange(1,2000,0.1)
beta = 1.0/(k*T) #changed to 1.0 for safe measure; redundant

#partitionfunction
s=np.arange(1,31,2)[:,None]
Zodd = (3*(2*s+1)*np.exp(-(s**2+s)*epsilon*beta)).sum(axis=0)

#energy
ln_Zodd = np.log(Zodd) #changed to np.log
#Epara = - (ln_Zodd[1:]-ln_Zodd[:-1])/(beta[1:]-beta[:-1]) #manual version
Epara = - np.diff(ln_Zodd)/np.diff(beta)

#heat capacity
Cpara=np.diff(Epara)/np.diff(T)[:-1]

#plot
x = k*T/epsilon
plt.plot(x[:len(Cpara)],Cpara / k,'r') #fixed and simplified array indices
plt.axis([0, 7, 0, 1.5])
plt.ylabel('C_v/k')
plt.xlabel('kT/eps')
plt.show()
Again, please review the changes made. I made use of the scipy.constants module to get physical constants to high precision. I also made use of array broadcasting, which allowed me to turn your double loop into the sum of a matrix along one of its dimensions (just as you should have done it in matlab; your original matlab code is also far from efficient).
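As a minimal illustration of the broadcasting trick used above (the shapes here are chosen purely for demonstration):

```python
import numpy as np

s = np.arange(1, 31, 2)[:, None]   # shape (15, 1): one row per odd s
beta = np.array([1.0, 2.0, 3.0])   # shape (3,): one column per temperature
terms = s * beta                   # broadcasts to shape (15, 3)
col_sums = terms.sum(axis=0)       # shape (3,): summed over s, like Zodd
```

Adding the trailing `None` axis is what turns the elementwise product into an outer product, so the whole double loop becomes one array expression followed by a sum along axis 0.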
Here's the common result:
You can see that it seems right: at high temperature you get the Dulong-Petit behaviour, and as T->0 the heat capacity vanishes, in accordance with the third law of thermodynamics. It decays exponentially, which should make sense since the system has a finite energy gap.
I'm trying to design an IIR notch filter in Python, using a numpy array and the scipy library, to remove a sine tone from an imported wave file (I'm using the wave module to do so). My file was generated by Adobe Audition: it is a pure sine at 1.2 kHz, sampled at 48, 96 or 192 kHz, in order to have "pseudo-periodic" data for my circular FFT (just ask if I'm not clear enough).
Here is the code I used to implement the coefficients of my filter (I got the coefficients from the article "Second-order IIR Notch Filter Design and implementation of digital signal processing system" by C. M. Wang & W. C. Xiao):
f_cut = 1200.0
wn = f_cut/rate
r = 0.99
B, A = np.zeros(3), np.zeros(3)
A[0],A[1],A[2] = 1.0, -2.0*r*np.cos(2*np.pi*wn), r*r
B[0],B[1],B[2] = 1.0, -2.0*np.cos(2*np.pi*wn), 1.0
filtered = signal.lfilter(B, A, data_flt_R, axis=0)
Here data_flt_R is a numpy array containing my right channel as float64, and rate is my sampling frequency. I plot the frequency response and the FFT of my data using the matplotlib module to see if everything is OK:
N = len(data_flt_R)
w, h = signal.freqz(B,A, N)
pyplot.subplot(2,1,1)
pyplot.semilogx(w*rate/(2*np.pi), 20*np.log10(np.absolute(h)))
fft1 = fftpack.fft(data_flt_R, N)
fft_abs1 = np.absolute(fft1)
ref = np.nanmax(fft_abs1)
dB_unfiltered = 20*np.log10(fft_abs1/ref)
fft2 = fftpack.fft(filtered, N)
fft_abs2 = np.absolute(fft2)
dB_filtered = 20*np.log10(fft_abs2/ref)
abs = fftpack.fftfreq(N,1.0/rate)
pyplot.subplot(2,1,2)
pyplot.semilogx(abs,dB_unfiltered,'r', label='unfiltered')
pyplot.semilogx(abs,dB_filtered,'b', label='filtered')
pyplot.grid(True)
pyplot.legend()
pyplot.ylabel('power spectrum (in dB)')
pyplot.xlim(10,rate/2)
pyplot.xlabel('frequencies (in Hz)')
And here is what I get :
I don't understand the results and values I get before and after my f_c. Shouldn't I get a plot which looks like the red one, but without the main peak? Why do I have a slope in the high frequencies? Is this linked to windowing?
Moreover, the result changes if I change my sampling frequency and/or the data length (16, 24 or 32 bits). Can anyone enlighten me?
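As a cross-check on the hand-derived coefficients, recent SciPy versions ship a ready-made second-order notch design, scipy.signal.iirnotch (the fs keyword needs SciPy >= 1.2; the 48 kHz rate, 1.2 kHz tone, and r ≈ 0.99 pole radius are taken from the question):

```python
import numpy as np
from scipy import signal

fs = 48000.0   # sampling rate from the question
f0 = 1200.0    # tone to remove
Q = 30.0       # quality factor: f0/Q = 40 Hz is the -3 dB notch bandwidth

b, a = signal.iirnotch(f0, Q, fs=fs)

# evaluate the response at a few frequencies: ~0 at f0, ~1 elsewhere
w, h = signal.freqz(b, a, worN=[100.0, f0, 5000.0], fs=fs)
gains = np.abs(h)
```

Comparing `b, a` against the hand-built `B, A` (after the same normalization) is a quick way to verify the coefficients from the Wang & Xiao article.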
I wonder if anyone has some example code of a neural network in Python. If someone knows of a tutorial with a complete walkthrough, that would be awesome, but just example source would be great as well!
Thanks
Found this interesting discussion on the Ubuntu forums:
http://ubuntuforums.org/showthread.php?t=320257
import time
import random

# Learning rate:
#   Lower = slower
#   Higher = less precise
rate=.2

# Create random weights
inWeight=[random.uniform(0, 1), random.uniform(0, 1)]

# Start neuron with no stimuli
inNeuron=[0.0, 0.0]

# Learning table (or gate)
test =[[0.0, 0.0, 0.0]]
test+=[[0.0, 1.0, 1.0]]
test+=[[1.0, 0.0, 1.0]]
test+=[[1.0, 1.0, 1.0]]

# Calculate response from neural input
def outNeuron(midThresh):
    global inNeuron, inWeight
    s=inNeuron[0]*inWeight[0] + inNeuron[1]*inWeight[1]
    if s>midThresh:
        return 1.0
    else:
        return 0.0

# Display results of test
def display(out, real):
    if out == real:
        print str(out)+" should be "+str(real)+" ***"
    else:
        print str(out)+" should be "+str(real)

while 1:
    # Loop through each lesson in the learning table
    for i in range(len(test)):
        # Stimulate neurons with test input
        inNeuron[0]=test[i][0]
        inNeuron[1]=test[i][1]
        # Adjust weight of neuron #1
        # based on feedback, then display
        out = outNeuron(2)
        inWeight[0]+=rate*(test[i][2]-out)
        display(out, test[i][2])
        # Adjust weight of neuron #2
        # based on feedback, then display
        out = outNeuron(2)
        inWeight[1]+=rate*(test[i][2]-out)
        display(out, test[i][2])
    # Delay
    time.sleep(1)
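For readers on Python 3, the same single-neuron OR-gate trainer can be sketched as follows (a loose port: the prints and delay are dropped, the infinite loop is bounded, the double weight update is collapsed into one pass, and the variable names are my own):

```python
import random

rate = 0.2                                      # learning rate
w = [random.uniform(0, 1), random.uniform(0, 1)]
table = [(0.0, 0.0, 0.0), (0.0, 1.0, 1.0),
         (1.0, 0.0, 1.0), (1.0, 1.0, 1.0)]      # OR gate truth table

def out(x0, x1, thresh=2.0):
    # fire if the weighted sum of stimuli crosses the threshold
    return 1.0 if x0 * w[0] + x1 * w[1] > thresh else 0.0

for _ in range(100):                            # bounded training loop
    for x0, x1, target in table:
        o = out(x0, x1)
        w[0] += rate * (target - o)             # same feedback rule as above
        w[1] += rate * (target - o)
```

After training, every row of the table is classified correctly: the weights grow until each single active input pushes the sum past the threshold.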
EDIT: there is also a framework named chainer
https://pypi.python.org/pypi/chainer/1.0.0
You might want to take a look at Monte:
Monte (python) is a Python framework for building gradient based learning machines, like neural networks, conditional random fields, logistic regression, etc. Monte contains modules (that hold parameters, a cost-function and a gradient-function) and trainers (that can adapt a module's parameters by minimizing its cost-function on training data). Modules are usually composed of other modules, which can in turn contain other modules, etc. Gradients of decomposable systems like these can be computed with back-propagation.
Here is a probabilistic neural network tutorial: http://www.youtube.com/watch?v=uAKu4g7lBxU
And my Python implementation:
import math

data = {'o' : [(0.2, 0.5), (0.5, 0.7)],
        'x' : [(0.8, 0.8), (0.4, 0.5)],
        'i' : [(0.8, 0.5), (0.6, 0.3), (0.3, 0.2)]}

class Prob_Neural_Network(object):
    def __init__(self, data):
        self.data = data

    def predict(self, new_point, sigma):
        res_dict = {}
        np = new_point
        for k, v in self.data.iteritems():
            res_dict[k] = sum(self.gaussian_func(np[0], np[1], p[0], p[1], sigma) for p in v)
        return max(res_dict.iteritems(), key=lambda k : k[1])

    def gaussian_func(self, x, y, x_0, y_0, sigma):
        return math.e ** (-1 *((x - x_0) ** 2 + (y - y_0) ** 2) / ((2 * (sigma ** 2))))

prob_nn = Prob_Neural_Network(data)
res = prob_nn.predict((0.2, 0.6), 0.1)
Result:
>>> res
('o', 0.6132686067117191)