Matlab to Python - python

I have a simple code in MATLAB which I am trying to translate to python, but I am stuck in a simple for loop:
Here is the situation:
Matlab
f0 = constant
fn = (nx1) matrix
b = (nx1) matrix
d and x are constant
mthd = 1 or 2
s = 1:-0.1:0.1;
for i = 1:10
f = fn * s(i)
switch mthd
case 1
v(:,i) = d *(1 + 1./b.*(f0./f)).^x
case 2
v(:,i) = log(f0./f)./b;
v(:,i) = v./(1+v)
end
v(1,:) = min(vp(2,:));
In MATLAB this produces v as an nx1 matrix.
Assuming it is a simple equation with element wise operation in matlab,
I went ahead and wrote a code in python like this:
s = np.linspace(1,0.1,num=10)
for i in range(1,11)
    f = fn * s[i]
    if mthd ==1:
        v = d *(1 + 1/b*(f0/f))^x
    elif mthd ==2:
        v = log(f0/f)/b;
        v = v/(1+v)
Clearly, this is not the right one and I get stuck right from f = fn* s[i]
Any suggestion in this conversion will be of great help.
Thank you

Clearly this is not the right one and I get stuck right from f = fn* s[i]
What error message are you getting here? Make sure your vectors fn and b are numpy arrays and not lists.
for i in range(1,11)
Python uses zero indexing, whereas Matlab uses 1-indexing. Therefore your for loop should use for i in range(10), which iterates from 0 to 9 instead of 1 to 10.
v = d *(1 + 1/b*(f0/f))^x
Assuming fn and b are numpy arrays in your Python implementation, if you really want this to mirror the Matlab code you can still use indexing such as v[:,i]. However you need to initialize v as a numpy array with the correct size first.
v = log(f0/f)/b;
You probably want np.log here.
Hopefully this is helpful, let me know if you still have questions. You may also find this website helpful.
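The points above can be put together in a short sketch. It assumes fn and b are 1-D NumPy arrays and treats method 2 as a per-column update; the numeric values and the helper name compute_v_loop are made up for illustration and are not taken from the question.

```python
import numpy as np

def compute_v_loop(fn, b, f0, d, x, mthd):
    """Fill v column by column, mirroring the MATLAB loop (hypothetical helper)."""
    s = np.linspace(1, 0.1, num=10)       # same values as MATLAB's 1:-0.1:0.1
    v = np.zeros((fn.size, s.size))       # preallocate so v[:, i] can be assigned
    for i in range(s.size):               # zero-based: i runs over 0..9
        f = fn * s[i]
        if mthd == 1:
            # ** is the power operator; ^ is bitwise XOR in Python
            v[:, i] = d * (1 + (1 / b) * (f0 / f)) ** x
        elif mthd == 2:
            w = np.log(f0 / f) / b
            v[:, i] = w / (1 + w)
    return v

# Assumed sample data, only for demonstration
fn = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
b = np.array([9.0, 8.0, 7.0, 6.0, 5.0])
v = compute_v_loop(fn, b, f0=5.0, d=4.0, x=8.0, mthd=1)
print(v.shape)
```

Each column of v then corresponds to one value of s, as v(:,i) does in the MATLAB code.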

The code block below should be closer to what you want. Here are a few things to look out for:
Python arrays are indexed from 0. In base Python you handle powers with **, e.g. 2 ** 2 equals 4.
For element-wise multiplication and division of np.matrix objects, use np.multiply and np.divide, since * between two matrices is a matrix product.
Use np.log for logarithms and np.power for element-wise exponentiation with numpy matrices.
Use np.add to add a scalar to a numpy array.
import numpy as np
f0 = 5 # constant
fn = np.matrix([[5], [4], [3], [2], [1]]) # 5 x 1 matrix
b = np.matrix([[9], [8], [7], [6], [5]]) # 5 x 1 matrix
# d and x are constant
d = 4
x = 8
# mthd = 1 or 2
mthd = 1
s = np.linspace(1,0.1,num=10)
# python arrays are indexed from 0
for i in range(0,len(s)):
    f = fn * s[i]
    if mthd == 1:
        # element-wise d * (1 + (1/b) * (f0/f)) ** x, as in the MATLAB code
        v = np.multiply(d, np.power(1 + np.divide(np.divide(f0, f), b), x))
    elif mthd == 2:
        v = np.divide(np.log(np.divide(f0, f)), b)
        v = np.divide(v, np.add(1, v))
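As a possible alternative, here is a loop-free sketch that relies on broadcasting. It assumes plain 1-D ndarrays instead of np.matrix, so the ordinary operators are already element-wise; the function name compute_v_broadcast and the sample values are illustrative assumptions.

```python
import numpy as np

def compute_v_broadcast(fn, b, f0, d, x, s, mthd):
    """Compute every column of v at once (hypothetical helper)."""
    f = fn[:, None] * s[None, :]      # shape (n, len(s)): one column per s value
    b_col = b[:, None]                # shape (n, 1), broadcasts across columns
    if mthd == 1:
        return d * (1 + (f0 / f) / b_col) ** x
    w = np.log(f0 / f) / b_col
    return w / (1 + w)

# Assumed sample data, only for demonstration
fn = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
b = np.array([9.0, 8.0, 7.0, 6.0, 5.0])
s = np.linspace(1, 0.1, num=10)
v_all = compute_v_broadcast(fn, b, f0=5.0, d=4.0, x=8.0, s=s, mthd=1)
print(v_all.shape)
```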

Related

Is there another way to find item n in this recursive sequence?

The sequence is:
an = an-1 + (2 * an-2)
a0 = 1, a1= 1. Find a100
The way I did it is making a list.
#List 'a' with a0 = 1 , a1 = 1.
a = [1,1]
#To get the a100, implement 'i' as the index value of the list.
for i in range (2,101):
    x = a[i-1] + (2 * a[i-2])
    print( str(len(a)) + (": ") + str(x))
    #Save new value to list
    a.append(x)
Is there another way to do this where you can just directly get the value of a100? Or the value of a10000.. it will take up so much memory.
For this specific case, the sequence appears to be known as the Jacobsthal sequence. Wikipedia gives a closed form expression for a_n that can be expressed as follows:
def J(n):
    return (2**(n+1) - (1 if n % 2 else -1))//3
Slightly more generally, you can use fast matrix exponentiation to find a specific value of a_n in O(log n) matrix operations. The approach here is a slight modification of this.
import numpy as np
def a(n):
    mat = np.array([[1, 2], [1, 0]], dtype=object) # object for large integers
    return np.linalg.matrix_power(mat, n)[0,0]
Here is the value for a_1000:
7143390714575115472989500327066678737076032078036890716291669255802340340832907483287989192104639054183964486117020978834580968571282093623989718383132383202623045183216153990280716403374914094585302788102030983322387960844932511706110362630718041943047464318457694778440286554435082924558137112046251
This recurrence relation has a closed form solution:
a = lambda n: (2**(n+1) + (-1)**n)//3
Then
a(0) == 1
a(1) == 1
a(2) == 3
...
You can use Wolfram Alpha to solve for the closed-form solution.
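On the memory concern in the question: the recurrence only needs the two previous terms, so the list is not required. The sketch below keeps two variables and cross-checks the result against the closed form; the function names a_iter and a_closed are chosen here for illustration.

```python
def a_iter(n):
    """Iterate a_n = a_(n-1) + 2*a_(n-2) while storing only two terms."""
    prev, cur = 1, 1                  # a_0, a_1
    for _ in range(n):
        prev, cur = cur, cur + 2 * prev
    return prev

def a_closed(n):
    """Closed form (2**(n+1) + (-1)**n) / 3, evaluated with exact integers."""
    return (2 ** (n + 1) + (-1) ** n) // 3

print(a_iter(10), a_closed(10))
print(a_iter(100) == a_closed(100))
```

The loop is O(n) in time but O(1) in stored terms, while the closed form avoids the loop altogether.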
For a more general solution, sympy's rsolve can generate a formula for linear recurrences. And then use substitution to find particular values.
from sympy import rsolve, Function, symbols
f = Function('f')
n = symbols('n', integer=True)
T = f(n) - f(n - 1) - 2 * f(n - 2)
sol = rsolve(T, f(n), {f(0): 1, f(1): 1})
print(sol.factor())
for k in range(6):
    print(f'a[{10**k}] = {sol.subs({n:10**k})}')
This finds the formula: ((-1)**n + 2*2**n)/3 and substituting various values gives:
a[1] = 1
a[10] = 683
a[100] = 845100400152152934331135470251
a[1000] = 7143390714575115472989500327066678737076032078036890716291669255802340340832907483287989192104639054183964486117020978834580968571282093623989718383132383202623045183216153990280716403374914094585302788102030983322387960844932511706110362630718041943047464318457694778440286554435082924558137112046251
a[10000] = 133004207792050558992249477512239005588233122125746163656800596656862925534812977546133077893574630652662207529488060828477043275662758540783958572880642159719038200311958630178434977008440399303470333912787955410283390723070787364570060499107264165920603265965586728359610888385670810455396492683712749253768167310959162940311732477513236354819123587744628771837530938418912538404881523567277609841226375876393129759329405606403575118807097476182222626910170437663537354284534899796002239562111009728651821864438504041150546876053294654530715854971225081866915352569915012672229763876364337052864009432226144104897254261713969198460795185338846384904496294153746791718908836684855491928471402492019109286876187554942677494637811270493742797695615497592008325707648701382879948397411975000873285734944722272050706215467741789948589975038942085627076911593009914095042100740598303428022092134686210939717309765040069372305610440480299752446776767079470873361242815172724472670497376119046346076373700455008336040050132281745987061580787029631920486042634950322261479884716029821082511738977420225191373598689421314223291030818003754466249703388278539818739888608762690479183498456732381846252842888143995999179244405389125585586850955218501148491050484965227415295931558739077382821688613165420801317361188546437983172654430208389560906399085227534187902708556510993924603473650539217438826413238467482713628870553839126928797364022699821043888057814039422006025018822770264969295984768383035270068082072982144071689832171605168493242321989988938379586370977590812497129995193443814024675762887572114762078609321486558972315562935139761219006700486184989097003857563340672353252082596492857996938895641058713626394126572100971861180957464658187543063225221347209833214479053409260474855006038845449574803849839476117691437918170766030552699949740190867210237222054200679917839041562290259702727837489338965916841084290457658890129758135848621600629708312821695669337853515158918369176044845
99090827358327607311145704700506065400164526586785514617302254188281302685535172938965970009784445593131997924161090875584262602248970534271757827918474036922817159666073457645479797721100990086996148246631809842046103645478455250800241851505149187576887740797874187195112987924800865762440512367759907023068198581038345298256830912964615391929510632144672034080214910330858779357159414245558929061170945822567007313514409276959727327732103102944890874437957354081499958646666151187821572015407908429716866090505450005466559490856410166587392640154829574782514412057571343645656039081553195235917082324370960357975081345975714019208241045008362225535513352731779100379038105003677818345932796086474225126766610787543447696005152433715459704967280220123536564742545543604882702212692308056024281175802607700426526000495235781464187268985316355546978912530579053491968145752746720495213034211965438416298865678974339803258684849814383125421063166939821410053665460303868944551299858094210708807124261007787849536528397806251

Confusing result with quadratic regression

So, I'm trying to fit some pairs of x,y data with a quadratic regression, a sample formula can be found at http://polynomialregression.drque.net/math.html.
Following is my code that does the regression using that explicit formula and using numpy inbuilt functions,
import numpy as np
x = [6.230825,6.248279,6.265732]
y = [0.312949,0.309886,0.306639472]
toCheck = x[2]
def evaluateValue(coeff,x):
    c,b,a = coeff
    val = np.around( a+b*x+c*x**2,9)
    act = 0.306639472
    error= np.abs(act-val)*100/act
    print("Value = {:.9f} Error = {:.2f}%".format(val,error))
###### Using numpy ######################
coeff = np.polyfit(x,y,2)
evaluateValue(coeff, toCheck)
################# Using explicit formula
def determinant(a,b,c,d,e,f,g,h,i):
    # the matrix is [[a,b,c],[d,e,f],[g,h,i]]
    return a*(e*i - f*h) - b*(d*i - g*f) + c*(d*h - e*g)
a = b = c = d = e = m = n = p = 0
a = len(x)
for i,j in zip(x,y):
    b += i
    c += i**2
    d += i**3
    e += i**4
    m += j
    n += j*i
    p += j*i**2
det = determinant(a,b,c,b,c,d,c,d,e)
c0 = determinant(m,b,c,n,c,d,p,d,e)/det
c1 = determinant(a,m,c,b,n,d,c,p,e)/det
c2 = determinant(a,b,m,b,c,n,c,d,p)/det
evaluateValue([c2,c1,c0], toCheck)
######Using another explicit alternative
def determinantAlt(a,b,c,d,e,f,g,h,i):
    return a*e*i - a*f*h - b*d*i +b*g*f + c*d*h - c*e*g # <- brackets removed
a = b = c = d = e = m = n = p = 0
a = len(x)
for i,j in zip(x,y):
    b += i
    c += i**2
    d += i**3
    e += i**4
    m += j
    n += j*i
    p += j*i**2
det = determinantAlt(a,b,c,b,c,d,c,d,e)
c0 = determinantAlt(m,b,c,n,c,d,p,d,e)/det
c1 = determinantAlt(a,m,c,b,n,d,c,p,e)/det
c2 = determinantAlt(a,b,m,b,c,n,c,d,p)/det
evaluateValue([c2,c1,c0], toCheck)
This code gives this output
Value = 0.306639472 Error = 0.00%
Value = 0.308333580 Error = 0.55%
Value = 0.585786477 Error = 91.03%
As you can see, these differ from each other, and the third one is totally wrong. My questions are:
1. Why is the explicit formula giving a slightly wrong result, and how can I improve it?
2. How does numpy give such an accurate result?
3. In the third case, how can the result change so drastically just from expanding the parentheses?
A few things are going on here that unfortunately undermine the way you are doing this. Take a look at this code:
for i,j in zip(x,y):
    b += i
    c += i**2
    d += i**3
    e += i**4
    m += j
    n += j*i
    p += j*i**2
You are building features such that the x values are not only squared, but cubed and fourth powered.
If you print out each of these values before you put them into the 3 x 3 matrix to solve:
In [35]: a = b = c = d = e = m = n = p = 0
...: a = len(x)
...: for i,j in zip(x,y):
...:     b += i
...:     c += i**2
...:     d += i**3
...:     e += i**4
...:     m += j
...:     n += j*i
...:     p += j*i**2
...: print(a, b, c, d, e, m, n, p)
...:
...:
3 18.744836 117.12356813829001 731.8283056811686 4572.738547313946 0.9294744720000001 5.807505391292503 36.28641270376207
When dealing with floating-point arithmetic, and especially with small values, the order of operations matters. Here, the mix of small and large intermediate values happens to produce a determinant that is very small. When you compute the determinant using the factored form and the expanded form, notice how you get slightly different results, and also look at the magnitude of the values:
In [36]: det = determinant(a,b,c,b,c,d,c,d,e)
In [37]: det
Out[37]: 1.0913403514223319e-10
In [38]: det = determinantAlt(a,b,c,b,c,d,c,d,e)
In [39]: det
Out[39]: 2.3283064365386963e-10
The determinant is on the order of 10^-10! In exact arithmetic both determinant methods would yield the same result, but in floating point they differ slightly because of error propagation. A floating-point number is represented by a finite number of bits, so the order of operations changes how rounding error propagates: removing the parentheses leaves the formula mathematically equivalent, but the sequence of operations used to reach the result is now different. This article is an essential read for any software developer who deals with floating-point arithmetic regularly: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
Therefore, when you solve the system with Cramer's Rule, you inevitably divide by this main determinant. The absolute difference between the two methods is only on the order of 10^-10, but because the determinant itself is that small, dividing by it gives very different coefficients.
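To see how far each floating-point determinant is from the true value, one option is to redo the computation with exact rational arithmetic. The sketch below uses fractions.Fraction from the standard library; the helper names power_sums and normal_matrix_args are introduced here for illustration. For three points, the exact determinant of this matrix equals the product of the squared pairwise differences of the x values, which gives an independent check.

```python
from fractions import Fraction

def determinant(a, b, c, d, e, f, g, h, i):
    # Factored form, for the matrix [[a,b,c],[d,e,f],[g,h,i]]
    return a*(e*i - f*h) - b*(d*i - g*f) + c*(d*h - e*g)

def determinant_alt(a, b, c, d, e, f, g, h, i):
    # Expanded form: same formula with the parentheses multiplied out
    return a*e*i - a*f*h - b*d*i + b*g*f + c*d*h - c*e*g

def power_sums(xs):
    # Sums of x**0 .. x**4 over the data (hypothetical helper)
    return [sum(x**k for x in xs) for k in range(5)]

def normal_matrix_args(xs):
    # Entries of [[a,b,c],[b,c,d],[c,d,e]] in row-major order
    a, b, c, d, e = power_sums(xs)
    return (a, b, c, b, c, d, c, d, e)

x_float = [6.230825, 6.248279, 6.265732]
x_exact = [Fraction(s) for s in ("6.230825", "6.248279", "6.265732")]

det_factored = determinant(*normal_matrix_args(x_float))
det_expanded = determinant_alt(*normal_matrix_args(x_float))
det_exact = determinant(*normal_matrix_args(x_exact))

print("factored (float):", det_factored)
print("expanded (float):", det_expanded)
print("exact (rational):", float(det_exact))
```

With rational inputs both forms agree exactly, so any gap between the two float results comes from rounding alone.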
The reason why NumPy doesn't have this problem is because they solve the system by least-squares and the pseudo-inverse and not using Cramer's Rule. I would not recommend using Cramer's Rule to find regression coefficients mostly due to experience and that there are more robust ways of doing it.
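For comparison, a least-squares solve can be sketched directly with np.linalg.lstsq on the Vandermonde matrix of the data. This is an illustration of the general idea, not a claim about the exact internals of np.polyfit; the variable names are chosen here.

```python
import numpy as np

x = np.array([6.230825, 6.248279, 6.265732])
y = np.array([0.312949, 0.309886, 0.306639472])

# Columns are x**2, x, 1 (highest power first, same order np.polyval expects)
V = np.vander(x, 3)

# Solve min ||V @ coeff - y|| without ever forming a determinant
coeff, residuals, rank, singular_values = np.linalg.lstsq(V, y, rcond=None)

pred = np.polyval(coeff, x[2])
print("predicted:", pred, "actual:", y[2])
```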
However to solve your particular problem, it's good to normalize the data so that the dynamic range is now centered at 0. Therefore, the features you use to construct your coefficient matrix are more sensible and thus the computational process has an easier time dealing with the data. In your case, something as simple as subtracting the data with the mean of the x values should work. As such, if you have new data points you want to predict, you must subtract by the mean of the x data first prior to doing the prediction.
Therefore, at the beginning of your code, perform mean subtraction and regress on this data. I've marked where I modified your source code above:
import numpy as np
x = [6.230825,6.248279,6.265732]
y = [0.312949,0.309886,0.306639472]
# Calculate mean
me = sum(x) / len(x)
# Make new dataset that is mean subtracted
xx = [pt - me for pt in x]
#toCheck = x[2]
# Data point to check is now mean subtracted
toCheck = x[2] - me
def evaluateValue(coeff,x):
    c,b,a = coeff
    val = np.around( a+b*x+c*x**2,9)
    act = 0.306639472
    error= np.abs(act-val)*100/act
    print("Value = {:.9f} Error = {:.2f}%".format(val,error))
###### Using numpy ######################
coeff = np.polyfit(xx,y,2) # Change
evaluateValue(coeff, toCheck)
################# Using explicit formula
def determinant(a,b,c,d,e,f,g,h,i):
    # the matrix is [[a,b,c],[d,e,f],[g,h,i]]
    return a*(e*i - f*h) - b*(d*i - g*f) + c*(d*h - e*g)
a = b = c = d = e = m = n = p = 0
a = len(x)
for i,j in zip(xx,y): # Change
    b += i
    c += i**2
    d += i**3
    e += i**4
    m += j
    n += j*i
    p += j*i**2
det = determinant(a,b,c,b,c,d,c,d,e)
c0 = determinant(m,b,c,n,c,d,p,d,e)/det
c1 = determinant(a,m,c,b,n,d,c,p,e)/det
c2 = determinant(a,b,m,b,c,n,c,d,p)/det
evaluateValue([c2,c1,c0], toCheck)
######Using another explicit alternative
def determinantAlt(a,b,c,d,e,f,g,h,i):
    return a*e*i - a*f*h - b*d*i +b*g*f + c*d*h - c*e*g # <- brackets removed
a = b = c = d = e = m = n = p = 0
a = len(x)
for i,j in zip(xx,y): # Change
    b += i
    c += i**2
    d += i**3
    e += i**4
    m += j
    n += j*i
    p += j*i**2
det = determinantAlt(a,b,c,b,c,d,c,d,e)
c0 = determinantAlt(m,b,c,n,c,d,p,d,e)/det
c1 = determinantAlt(a,m,c,b,n,d,c,p,e)/det
c2 = determinantAlt(a,b,m,b,c,n,c,d,p)/det
evaluateValue([c2,c1,c0], toCheck)
When I run this, I now get:
In [41]: run interp_test
Value = 0.306639472 Error = 0.00%
Value = 0.306639472 Error = 0.00%
Value = 0.306639472 Error = 0.00%
As some final reading for you, this is a similar problem that someone else encountered which I addressed in their question: Fitting a quadratic function in python without numpy polyfit. The summary is that I advised them not to use Cramer's Rule and to use least-squares through the pseudo-inverse. I showed them how to get exactly the same results without using numpy.polyfit. Also, using least-squares generalizes where if you have more than 3 points, you can still fit a quadratic through your points so that the model has the smallest error possible.
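A minimal sketch of least squares through the pseudo-inverse with more than 3 points follows. The five data points are made up for illustration (a quadratic plus small fixed perturbations), and np.polyfit is used only as a cross-check.

```python
import numpy as np

# Hypothetical data: y = 1.5*x**2 - 2*x + 0.5 with small fixed perturbations
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 1.5 * x**2 - 2.0 * x + 0.5 + np.array([0.01, -0.02, 0.0, 0.02, -0.01])

V = np.vander(x, 3)                  # 5 x 3 design matrix: x**2, x, 1
coeff = np.linalg.pinv(V) @ y        # least-squares coefficients

print("pseudo-inverse:", coeff)
print("np.polyfit:    ", np.polyfit(x, y, 2))
```

Because the system is overdetermined (5 equations, 3 unknowns), the quadratic does not pass through every point; it minimizes the sum of squared residuals instead.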

The long integer when using NumPy

I have the following simple code for the sum of two vectors.
However, I get the wrong results when I use NumPy.
The output of the code is as follows:
In [12]: %run -i test.py
The last 2 elements of the sum [7980015996L, 7992002000L]
The last 2 elements of the sum [-609918596 -597932592]
With NumPy the result is not a long integer. Why?
import numpy as np
def numpysum(n):
    a = np.arange(n) ** 2
    b = np.arange(n) ** 3
    c = a + b
    return c
def pythonsum(n):
    a = range(n)
    b = range(n)
    c = []
    for i in range(len(a)):
        a[i] = i ** 2
        b[i] = i ** 3
        c.append(a[i] + b[i])
    return c
size = 2000
c = pythonsum(size)
print "The last 2 elements of the sum", c[-2:]
c = numpysum(size)
print "The last 2 elements of the sum", c[-2:]
Plain Python integers can grow arbitrarily large. Numpy integers cannot; they are limited by the size of the data type. If they get too big, they will wrap around and become negative. It looks like your array dtype is probably int32, which overflows and results in negative results. You can get the correct results in this case by using int64:
a = np.arange(n, dtype=np.int64) ** 2
b = np.arange(n, dtype=np.int64) ** 3
However, it will still overflow eventually (if you make size larger). You could also use float64, which allows even larger numbers, but then you will lose precision.
The cap on integer sizes is the price you pay for the speed numpy gives you.
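The wraparound can be reproduced on any platform by setting the dtype explicitly. In this sketch the dtype parameter is an addition for illustration; the original numpysum takes only n.

```python
import numpy as np

def numpysum(n, dtype):
    # Same computation as in the question, with the integer width made explicit
    a = np.arange(n, dtype=dtype) ** 2
    b = np.arange(n, dtype=dtype) ** 3
    return a + b

print(np.iinfo(np.int32).max)             # largest value an int32 can hold
print(numpysum(2000, np.int32)[-2:])      # 1999**3 exceeds that limit, so it wraps
print(numpysum(2000, np.int64)[-2:])      # fits comfortably in int64
```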

Nested for Loop optimization in python

I want to collapse two for loops into a single for loop. Is there a way to do this, given that the list is very long?
A = [1,4,2 6,9,10,80] #length of list is very large
B = []
for x in A:
    for y in A:
        if x != y:
            B.append(abs(x-y))
print(B)
Not any faster, but more Pythonic:
B = [abs(x-y) for x in A for y in A if x!=y]
Unless you absolutely need the duplicates (abs(a-b) == abs(b-a)), you can halve your list (and thus the computation):
B = [abs(A[i]-A[j]) for i in range(len(A)) for j in range(i+1, len(A))]
Finally, you can use the power of numpy to get a C++-level speedup:
import numpy as np
A = np.array(A)
A.shape = -1,1 # make it a column vector
diff = np.abs(A - A.T) # diff is the matrix of abs differences
# grab upper triangle of order 1 (i.e. less the diagonal)
B = diff[np.triu_indices(len(A), k=1)]
But this will always be O(n^2) no matter what...
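The halved version can also be written with itertools.combinations, which yields each unordered pair once. The sketch below checks it against the numpy upper-triangle approach; the contents of A are assumed sample values.

```python
from itertools import combinations

import numpy as np

A = [1, 4, 2, 6, 9, 10, 80]          # assumed sample data

# Each unordered pair (x, y) appears exactly once
B_pairs = [abs(x - y) for x, y in combinations(A, 2)]

# Same result via broadcasting and the strict upper triangle
arr = np.array(A)
diff = np.abs(arr[:, None] - arr[None, :])
B_numpy = diff[np.triu_indices(len(A), k=1)]

print(len(B_pairs))                  # n*(n-1)/2 entries
print(B_pairs == B_numpy.tolist())
```

Both are still quadratic in len(A); they only avoid computing each difference twice.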

Tridiagonal Matrix Algorithm (TDMA) aka Thomas Algorithm, using Python with NumPy arrays

I found an implementation of the Thomas algorithm (TDMA) in MATLAB.
function x = TDMAsolver(a,b,c,d)
%a, b, c are the column vectors for the compressed tridiagonal matrix, d is the right vector
n = length(b); % n is the number of rows
% Modify the first-row coefficients
c(1) = c(1) / b(1); % Division by zero risk.
d(1) = d(1) / b(1); % Division by zero would imply a singular matrix.
for i = 2:n-1
temp = b(i) - a(i) * c(i-1);
c(i) = c(i) / temp;
d(i) = (d(i) - a(i) * d(i-1))/temp;
end
d(n) = (d(n) - a(n) * d(n-1))/( b(n) - a(n) * c(n-1));
% Now back substitute.
x(n) = d(n);
for i = n-1:-1:1
x(i) = d(i) - c(i) * x(i + 1);
end
end
I need it in Python using numpy arrays. Here is my first attempt at the algorithm in Python.
import numpy
aa = (0.,8.,9.,3.,4.)
bb = (4.,5.,9.,4.,7.)
cc = (9.,4.,5.,7.,0.)
dd = (8.,4.,5.,9.,6.)
ary = numpy.array
a = ary(aa)
b = ary(bb)
c = ary(cc)
d = ary(dd)
n = len(b)## n is the number of rows
## Modify the first-row coefficients
c[0] = c[0]/ b[0] ## risk of Division by zero.
d[0] = d[0]/ b[0]
for i in range(1,n,1):
    temp = b[i] - a[i] * c[i-1]
    c[i] = c[i]/temp
    d[i] = (d[i] - a[i] * d[i-1])/temp
d[-1] = (d[-1] - a[-1] * d[-2])/( b[-1] - a[-1] * c[-2])
## Now back substitute.
x = numpy.zeros(5)
x[-1] = d[-1]
for i in range(-2, -n-1, -1):
    x[i] = d[i] - c[i] * x[i + 1]
They give different results, so what am I doing wrong?
I wrote this because none of the online implementations for Python that I found actually worked. I've tested it against built-in matrix inversion and the results match.
Here a = lower diagonal, b = main diagonal, c = upper diagonal, d = right-hand side vector.
import numpy as np
def TDMA(a,b,c,d):
    n = len(d)
    w= np.zeros(n-1,float)
    g= np.zeros(n, float)
    p = np.zeros(n,float)
    w[0] = c[0]/b[0]
    g[0] = d[0]/b[0]
    for i in range(1,n-1):
        w[i] = c[i]/(b[i] - a[i-1]*w[i-1])
    for i in range(1,n):
        g[i] = (d[i] - a[i-1]*g[i-1])/(b[i] - a[i-1]*w[i-1])
    p[n-1] = g[n-1]
    for i in range(n-1,0,-1):
        p[i-1] = g[i-1] - w[i-1]*p[i]
    return p
For an easy performance boost for large matrices, use numba! This code outperforms np.linalg.inv() in my tests:
import numpy as np
from numba import jit
@jit
def TDMA(a,b,c,d):
    n = len(d)
    w= np.zeros(n-1,float)
    g= np.zeros(n, float)
    p = np.zeros(n,float)
    w[0] = c[0]/b[0]
    g[0] = d[0]/b[0]
    for i in range(1,n-1):
        w[i] = c[i]/(b[i] - a[i-1]*w[i-1])
    for i in range(1,n):
        g[i] = (d[i] - a[i-1]*g[i-1])/(b[i] - a[i-1]*w[i-1])
    p[n-1] = g[n-1]
    for i in range(n-1,0,-1):
        p[i-1] = g[i-1] - w[i-1]*p[i]
    return p
There's at least one difference between the two:
for i in range(1,n,1):
in Python iterates from index 1 to the last index n-1, while
for i = 2:n-1
iterates from index 1 (zero-based) to the last-1 index, since Matlab has one-based indexing.
In your loop, the Matlab version iterates over the second through second-to last elements. To do the same in Python, you want:
for i in range(1,n-1):
(As noted in voithos's comment, this is because the range function excludes the last index, so you need to correct for this in addition to the change to 0 indexing).
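Putting that fix into the question's code gives roughly the following sketch. It keeps the question's convention (a, b, c, d all of length n, with a[0] and c[-1] unused), works on copies so the inputs are not overwritten, and checks the result against the equivalent dense system; the function name tdma_solver is chosen here for illustration.

```python
import numpy as np

def tdma_solver(a, b, c, d):
    """Thomas algorithm; a = sub-, b = main, c = super-diagonal, d = right-hand side."""
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))  # float copies
    n = len(b)
    # Forward sweep (no pivoting: assumes no zero pivots are encountered)
    c[0] = c[0] / b[0]
    d[0] = d[0] / b[0]
    for i in range(1, n - 1):            # MATLAB's 2:n-1
        temp = b[i] - a[i] * c[i - 1]
        c[i] = c[i] / temp
        d[i] = (d[i] - a[i] * d[i - 1]) / temp
    d[-1] = (d[-1] - a[-1] * d[-2]) / (b[-1] - a[-1] * c[-2])
    # Back substitution
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):       # MATLAB's n-1:-1:1
        x[i] = d[i] - c[i] * x[i + 1]
    return x

aa = (0., 8., 9., 3., 4.)
bb = (4., 5., 9., 4., 7.)
cc = (9., 4., 5., 7., 0.)
dd = (8., 4., 5., 9., 6.)

x = tdma_solver(aa, bb, cc, dd)

# Dense matrix equivalent to the three diagonals, for verification only
dense = np.diag(bb) + np.diag(aa[1:], -1) + np.diag(cc[:-1], 1)
print(np.allclose(dense @ x, dd))
```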
Writing something like this in pure Python is going to be really slow. You would be much better off using LAPACK to do the numerical heavy lifting and using Python for everything around it. LAPACK is compiled, so it will run much faster than Python, and it is also much more highly optimised than most of us could feasibly match.
SciPy provides low-level wrappers for LAPACK so that you can call it from Python very simply. The one you are looking for can be found here:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lapack.dgtsv.html#scipy.linalg.lapack.dgtsv
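As a sketch of the compiled route, the example below uses scipy.linalg.solve_banded, a higher-level SciPy routine for banded systems, rather than calling the dgtsv wrapper directly. The diagonals reuse the numbers from the question, and the dense solve is included only to check the result.

```python
import numpy as np
from scipy.linalg import solve_banded

lower = np.array([8., 9., 3., 4.])       # sub-diagonal, length n-1
main = np.array([4., 5., 9., 4., 7.])    # main diagonal, length n
upper = np.array([9., 4., 5., 7.])       # super-diagonal, length n-1
rhs = np.array([8., 4., 5., 9., 6.])

n = len(main)
# Banded storage expected by solve_banded for one lower and one upper diagonal
ab = np.zeros((3, n))
ab[0, 1:] = upper      # row 0: super-diagonal, shifted right by one
ab[1, :] = main        # row 1: main diagonal
ab[2, :-1] = lower     # row 2: sub-diagonal, shifted left by one

x = solve_banded((1, 1), ab, rhs)

dense = np.diag(main) + np.diag(lower, -1) + np.diag(upper, 1)
print(np.allclose(x, np.linalg.solve(dense, rhs)))
```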
