SymPy lambdify gives wrong result, while .subs gives the accurate one - python

Sorry to bother you with this; I have a serious issue and I'm on the clock to solve it, so here is my question.
When I lambdify a quantity, the result differs from the ".subs" result; sometimes it's way off, and sometimes it's a NaN where in reality there is a real number (found by subs).
Here is a small MWE where you can see the issue. Thanks in advance for your time.
import sympy as sy
import numpy as np
##STACK
# some quantities needed before you see the problem
r = sy.Symbol('r', real=True)
th = sy.Symbol('th', real=True)
e_c = 1e51
lf0 = 100
A = 1.6726e-24
# some quantities I define to set up the problem
lfac = lf0+2
rd = 4*3.14/4/sy.pi/A/lfac**2
xi = r/rd #rescaled r
#now to the problem:
#QUANTITY
lfxi = xi**(-3)*(lfac+1)/2*(sy.sqrt( 1 + 4*lfac/(lfac+1)*xi**(3) + (2*xi**(3)/(lfac+1))**2) -1)
#RESULT WITH SUBS
print(lfxi.subs({th:1.00,r:1.00}).evalf())
#RESULT WITH LAMBDIFY
lfxi_l = sy.lambdify((r,th),lfxi)
print(lfxi_l(0.01, 1.00))
## gives 0

The issue is that your mpmath precision needs to be set higher!
By default mpmath uses prec=53 and dps=15, but your expression requires much higher precision than that. Printing the expression shows why:
# print(lfxi)
3.0256512324559e+62*(sqrt(1.09235114769539e-125*pi**6*r**6 + 6.74235013645028e-61*pi**3*r**3 + 1) - 1)/(pi**3*r**3)
...
from mpmath import mp
lfxi_l = sy.lambdify((r,th),lfxi, modules=["mpmath"])
mp.dps = 125
print(lfxi_l(1.00,1.00))
# 101.999... result
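The underlying failure is catastrophic cancellation in float64: for small r, the term under the square root is 1 plus something far below machine epsilon, so sqrt(...) - 1 rounds to exactly 0. A minimal sketch of the effect (eps here is just a hypothetical stand-in for the tiny term in your expression):
import numpy as np
eps = 1e-20                            # far below float64 machine epsilon (~2.2e-16)
naive = np.sqrt(1 + eps) - 1           # 1 + eps rounds to 1.0, so this evaluates to 0.0
stable = eps / (np.sqrt(1 + eps) + 1)  # algebraically identical form, no cancellation
print(naive, stable)                   # 0.0 5e-21
Rewriting sqrt(1 + x) - 1 as x / (sqrt(1 + x) + 1) before lambdifying would also let plain NumPy floats handle it, without mpmath.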

Changing a couple of the constants to "modest" values:
In [89]: e_c=1; A=1
The different methods produce essentially the same thing:
In [91]: lfxi.subs({th:1.00,r:1.00}).evalf()
Out[91]: 1.00000000461176
In [92]: lfxi_l = sy.lambdify((r,th),lfxi)
In [93]: lfxi_l(1.0,1.00)
Out[93]: 1.000000004611762
In [94]: lfxi_m = sy.lambdify((r,th),lfxi, modules=["mpmath"])
In [95]: lfxi_m(1.0,1.00)
Out[95]: mpf('1.0000000046117619')

Related

simple t-test in python with CIs of difference

What is the most straightforward way to perform a t-test in Python and include CIs of the difference? I've seen various posts, but every one is different, and when I tried to calculate the CIs myself the result seemed slightly off. Here:
import numpy as np
from scipy import stats
g1 = np.array([48.7107107, 36.8587287, 67.7129929, 39.5538852, 35.8622661])
g2 = np.array([62.4993857, 49.7434833, 67.7516511, 54.3585559, 71.0933957])
m1, m2 = np.mean(g1), np.mean(g2)
dof = (len(g1)-1) + (len(g2)-1)
MSE = (np.var(g1) + np.var(g2)) / 2
stderr_diffs = np.sqrt((2 * MSE)/len(g1))
tcl = stats.t.ppf([.975], dof)
lower_limit = (m1 - m2) - tcl * stderr_diffs
upper_limit = (m1 - m2) + tcl * stderr_diffs
print(lower_limit, upper_limit)
returns:
[-30.12845447] [-0.57070077]
However, when I run the same test in SPSS, although I have the same t and p values, the CIs are -31.87286, 1.17371, and this is also the case in R. I can't seem to find the correct way to do this and would appreciate some help.
You're subtracting 1 when you compute dof, but when you compute the variance you're not using the sample variance:
MSE = (np.var(g1) + np.var(g2)) / 2
should be
MSE = (np.var(g1, ddof=1) + np.var(g2, ddof=1)) / 2
which gives me
[-31.87286426] [ 1.17370902]
That said, instead of doing the manual implementation, I'd probably use statsmodels' CompareMeans:
In [105]: import statsmodels.stats.api as sms
In [106]: r = sms.CompareMeans(sms.DescrStatsW(g1), sms.DescrStatsW(g2))
In [107]: r.tconfint_diff()
Out[107]: (-31.872864255548553, 1.1737090155485568)
(really we should be using a DataFrame here, not an ndarray, but I'm lazy).
Remember though that you're going to want to consider what assumption you want to make about the variance:
In [110]: r.tconfint_diff(usevar='pooled')
Out[110]: (-31.872864255548553, 1.1737090155485568)
In [111]: r.tconfint_diff(usevar='unequal')
Out[111]: (-32.28794665832114, 1.5887914183211436)
and if your g1 and g2 are representative, the assumption of equal variance might not be a good one.
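For what it's worth, if your SciPy is recent enough (1.10+), the CI is also available directly on the ttest_ind result, which removes the manual step entirely; a quick sketch:
from scipy import stats
res = stats.ttest_ind(g1, g2)                      # pooled (equal-variance) test by default
print(res.confidence_interval())                   # CI of the difference in means
res_w = stats.ttest_ind(g1, g2, equal_var=False)   # Welch's t-test for unequal variances
print(res_w.confidence_interval())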

Python SciPy linprog optimization fails with status 3

Trying to minimize a simple linear function with linprog. The coefficients are the elements of arr2 multiplied by -1. There are only inequality constraints for each variable, such as -1 <= x1 <= 1, -2 <= x2 <= 2 and so on.
If I choose not to specify bounds in linprog:
from scipy.optimize import linprog
import numpy as np
import pandas as pd
numdim = 28
arr1 = np.ones(numdim)
arr1 = - arr1
arr2 = np.array([
    19.53, 128.97, 3538, 931.8, 0.1825, 150.88, 10315,
    0.8109, 3.9475, 3022, 31.77, 10323, 110.93, 220,
    2219.5, 119.2, 703.6, 616, 338, 84.67, 151.13,
    111.28, 29.515, 29.67, 158800, 167.15, 0.06802, 1179
])
constr_a = []
for i in range(numdim):
    constr_default = np.zeros(numdim)
    constr_default[i] = 1
    constr_a.append(constr_default)
for i in range(numdim):
    constr_default = np.zeros(numdim)
    constr_default[i] = -1
    constr_a.append(constr_default)
constr_a = np.asarray(constr_a)
constr_b = np.arange(1, 2*numdim + 1, 1)
constr_b[numdim:] = constr_b[:numdim]
print linprog(np.transpose(arr1 * arr2), constr_a, constr_b, bounds=(None, None))
I get the following result:
fun: -4327476.2887400016
message: 'Optimization failed. The problem appears to be unbounded.'
status: 3
I've tried changing the last row to:
print linprog(np.transpose(arr1 * arr2), constr_a, constr_b, bounds=(-1000, 1000))
The numbers specified as bounds are arbitrary. The output is:
fun: -4327476.2887400296
message: 'Optimization terminated successfully.'
status: 0
which gives us a slightly different result and the desired status.
My question is: am I misusing the library, and if so, in what way? Which answer is correct? I expected this code to work without specifying the 'bounds' parameter; I cannot use it because these simple constraints are unique to each variable.
I use python 2.7 and scipy 0.17.1. Big thanks in advance.
Update
constr_a should be a matrix according to the documentation (https://docs.scipy.org/doc/scipy/reference/optimize.linprog-simplex.html), and it actually is one in the code. To make sure the syntax is correct, we can cut the number of dimensions down to 2:
from scipy.optimize import linprog
import numpy as np
import pandas as pd
numdim = 2
arr1 = np.ones(numdim)
arr1 = - arr1
arr2 = np.array([
19.53,
128.97
])
constr_a = []
for i in range(numdim):
    constr_default = np.zeros(numdim)
    constr_default[i] = 1
    constr_a.append(constr_default)
for i in range(numdim):
    constr_default = np.zeros(numdim)
    constr_default[i] = -1
    constr_a.append(constr_default)
constr_a = np.asarray(constr_a)
constr_b = np.arange(1, 2*numdim + 1, 1)
constr_b[numdim:] = constr_b[:numdim]
print constr_a
print constr_b
print linprog(np.transpose(arr1 * arr2), constr_a, constr_b, bounds=(None, None))
and this will work.
The constr_a list is not properly formed. It is an array of arrays instead of an array of scalars. This might be leading to an improper lower bound, causing the optimization to fail.
Perhaps
constr_a.append(constr_default)
should be
constr_a.append(constr_default[i])
Inspect both bound arrays to make sure they have the proper form and values.
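A side note, hedged by the age of the SciPy in question: linprog's bounds argument also accepts a sequence of (min, max) pairs, one per variable, so per-variable box constraints like these do not have to go through the inequality matrix at all. A sketch reusing arr1, arr2, and numdim from the question:
bounds = [(-(i + 1), i + 1) for i in range(numdim)]  # -1 <= x1 <= 1, -2 <= x2 <= 2, ...
print linprog(arr1 * arr2, bounds=bounds)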

result of coupled ODE in python code is different from mathematica

As far as I can tell, my code is written correctly, but it is not giving me the correct solution (plot). When I solved the same system of ODEs in Mathematica, I got the correct solution, and the two solutions are totally different. I am writing a research project, so I need proper code in Python. Could you please point out the mistake in my code?
[Plot: Python code solution]
[Plot: Mathematica solution]
import numpy as np
import matplotlib.pyplot as plt
import scipy.integrate as si
##Three system
def func(state, T):
    H = state[0]
    P = state[1]
    R = state[2]
    Hd = -(16./3.)*np.pi*P
    Pd = -4.*H*P
    Rd = H*R
    return Hd, Pd, Rd
T = np.linspace(0.1,0.9,50)
state0 = [1,0.0001, 0.1]
s = si.odeint(func, state0, T)
h = np.transpose(s)
plt.plot(T,h[0])
plt.show()
Mathematica code
Clear[H,\[Rho],a]
Eq1=(H'[t] == -16 \[Pi] \[Rho][t]/3)
Eq2= (\[Rho]'[t] == -4 H[t] \[Rho][t])
Eq3 = (a'[t] == H[t] a[t])
sol=NDSolve[{Eq1,Eq2, Eq3,
H[0.1]==0.1, \[Rho][0.1]==0.1, a[0.1]==0.1},
{H[t],\[Rho][t],a[t]}, {t,0.1, 0.9}]
Plot[Evaluate[{H[t]}/.sol],{t,0.1,0.9}]
Both codes are correct. I just turned my laptop off and on again, and now it gives me the correct result (the same as Mathematica).
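One difference between the two listings worth flagging, independent of the reboot: they do not start from the same initial conditions (the Python code uses state0 = [1, 0.0001, 0.1], while the Mathematica call uses H[0.1] == 0.1, ρ[0.1] == 0.1, a[0.1] == 0.1). A sketch matching the Mathematica setup exactly:
import numpy as np
import matplotlib.pyplot as plt
import scipy.integrate as si

def func(state, T):
    H, P, R = state
    return -(16./3.)*np.pi*P, -4.*H*P, H*R

T = np.linspace(0.1, 0.9, 50)
s = si.odeint(func, [0.1, 0.1, 0.1], T)  # same initial conditions as the NDSolve call
plt.plot(T, s[:, 0])
plt.show()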

Interval containing specified percent of values

With numpy or scipy, is there any existing method that will return the endpoints of an interval which contains a specified percent of the values in a 1D array? I realize that this is simple to write myself, but it seems like the kind of thing that might be built in, although I can't find it.
E.g:
>>> import numpy as np
>>> x = np.random.randn(100000)
>>> print(np.bounding_interval(x, 0.68))
Would give approximately (-1, 1)
You can use np.percentile:
In [29]: x = np.random.randn(100000)
In [30]: p = 0.68
In [31]: lo = 50*(1 - p)
In [32]: hi = 50*(1 + p)
In [33]: np.percentile(x, [lo, hi])
Out[33]: array([-0.99206523, 1.0006089 ])
There is also scipy.stats.scoreatpercentile:
In [34]: scoreatpercentile(x, [lo, hi])
Out[34]: array([-0.99206523, 1.0006089 ])
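On newer NumPy (1.15+, if available), np.quantile does the same job with fractions instead of percents:
np.quantile(x, [(1 - p) / 2, (1 + p) / 2])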
I don't know of a built-in function to do it, but you can write one using the math package to compute the approximate indices, like this:
from __future__ import division
import math
import numpy as np

def bound_interval(arr_in, interval):
    lhs = (1 - interval) / 2       # fraction to exclude on the left-hand side
    rhs = 1 - lhs                  # and on the right-hand side
    arr_sorted = np.sort(arr_in)   # avoid shadowing the built-in sorted()
    lower = arr_sorted[int(math.floor(lhs * len(arr_in)))]  # floor to get an index
    upper = arr_sorted[int(math.floor(rhs * len(arr_in)))]
    return (lower, upper)
On your specified array, I got the interval (-0.99072237819851039, 0.98691691784955549). Pretty close to (-1, 1)!
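Usage then mirrors the hypothetical np.bounding_interval call from the question:
x = np.random.randn(100000)
print(bound_interval(x, 0.68))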

Estimate formants using LPC in Python

I'm new to signal processing (and numpy, scipy, and matlab for that matter). I'm trying to estimate vowel formants with LPC in Python by adapting this matlab code:
http://www.mathworks.com/help/signal/ug/formant-estimation-with-lpc-coefficients.html
Here is my code so far:
#!/usr/bin/env python
import sys
import numpy
import wave
import math
from scipy.signal import lfilter, hamming
from scikits.talkbox import lpc
"""
Estimate formants using LPC.
"""
def get_formants(file_path):
    # Read from file.
    spf = wave.open(file_path, 'r') # http://www.linguistics.ucla.edu/people/hayes/103/Charts/VChart/ae.wav
    # Get file as numpy array.
    x = spf.readframes(-1)
    x = numpy.fromstring(x, 'Int16')
    # Get Hamming window.
    N = len(x)
    w = numpy.hamming(N)
    # Apply window and high pass filter.
    x1 = x * w
    x1 = lfilter([1., -0.63], 1, x1)
    # Get LPC.
    A, e, k = lpc(x1, 8)
    # Get roots.
    rts = numpy.roots(A)
    rts = [r for r in rts if numpy.imag(r) >= 0]
    # Get angles.
    angz = numpy.arctan2(numpy.imag(rts), numpy.real(rts))
    # Get frequencies.
    Fs = spf.getframerate()
    frqs = sorted(angz * (Fs / (2 * math.pi)))
    return frqs
print get_formants(sys.argv[1])
Using this file as input, my script returns this list:
[682.18960189917243, 1886.3054773107765, 3518.8326108511073, 6524.8112723782951]
I didn't even get to the last steps where they filter the frequencies by bandwidth because the frequencies in the list aren't right. According to Praat, I should get something like this (this is the formant listing for the middle of the vowel):
Time_s F1_Hz F2_Hz F3_Hz F4_Hz
0.164969 731.914588 1737.980346 2115.510104 3191.775838
What am I doing wrong?
Thanks very much
UPDATE:
I changed this
x1 = lfilter([1., -0.63], 1, x1)
to
x1 = lfilter([1], [1., 0.63], x1)
as per Warren Weckesser's suggestion and am now getting
[631.44354635609318, 1815.8629524985781, 3421.8288991389031, 6667.5030877036006]
I feel like I'm missing something since F3 is very off.
UPDATE 2:
I realized that the order being passed to scikits.talkbox.lpc was off due to a difference in sampling frequency. Changed it to:
Fs = spf.getframerate()
ncoeff = 2 + Fs / 1000
A, e, k = lpc(x1, ncoeff)
Now I'm getting:
[257.86573127888488, 774.59006835496086, 1769.4624576002402, 2386.7093679399809, 3282.387975973973, 4413.0428174593926, 6060.8150432549655, 6503.3090645887842, 7266.5069407315023]
Much closer to Praat's estimation!
The problem had to do with the order being passed to the lpc function: 2 + fs/1000, where fs is the sampling frequency, is the rule of thumb, according to:
http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html
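For completeness, a sketch of the bandwidth-filtering step from the linked MATLAB example that I skipped earlier, assuming rts, angz, and Fs as computed inside get_formants (the 90 Hz / 400 Hz thresholds are the ones the MathWorks page uses):
bw = -0.5 * (Fs / (2 * math.pi)) * numpy.log(numpy.abs(rts))  # bandwidths from root moduli
frqs = angz * (Fs / (2 * math.pi))
formants = sorted(f for f, b in zip(frqs, bw) if f > 90 and b < 400)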
I have not been able to get the results you expect, but I do notice two things which might cause some differences:
Your code uses [1, -0.63] where the MATLAB code from the link you provided has [1 0.63].
Your processing is being applied to the entire x vector at once instead of smaller segments of it (see where the MATLAB code does this: x = mtlb(I0:Iend); ).
Hope that helps.
There are at least two problems:
According to the link, the "pre-emphasis filter is a highpass all-pole (AR(1)) filter". The signs of the coefficients given there are correct: [1, 0.63]. If you use [1, -0.63], you get a lowpass filter.
You have the first two arguments to scipy.signal.lfilter reversed.
So, try changing this:
x1 = lfilter([1., -0.63], 1, x1)
to this:
x1 = lfilter([1.], [1., 0.63], x1)
I haven't tried running your code yet, so I don't know if those are the only problems.
