Getting a more precise answer in Python

I am trying to implement a simple formula in Python. In the case of RLSR there are two approaches to calculating the coefficient vector c. The first is c = k^-1 * y, where k is the kernel matrix and y is the vector of target values.
The second uses the eigenvalues w and eigenvectors u of the kernel matrix and the formula
c = sum_i ( <u(i), y> / w(i) ) * u(i).
To check the result I test whether y = k*c holds. In the first case it does, but in the second it does not, even though the c vectors from the two methods look nearly the same.
Output from the inverse algorithm:
[ 19.49840251 18.82695226 20.08390355 15.01043404 14.79353281
16.75316736 12.88504257 16.92127176 16.77292954 17.81827473
20.90503787 17.09359467 18.76366701 18.14816903 20.03491117
22.56668264 21.45176136 25.44051036 30.40312692 22.61466379
22.86480382 19.34631818 17.0169598 19.85244414 16.63702471
20.35280156 20.58093488 22.42058736 20.54935198 19.35541575
20.39006958 19.74766081 20.41781019 22.33858797 17.57962283
22.61915219 22.54823733 24.96292824 22.82888425 34.18952603
20.7487537 24.82019935 22.40621769 21.15767304 27.58919263
18.39293156 21.55455108 18.69532341]
Output from the second (eigendecomposition) algorithm:
[ 19.25280289 18.73927731 19.77184991 14.7650427 14.87364331
16.2273648 12.29183797 16.52024239 16.66669961 17.59282615
20.83059115 17.02815857 18.3635798 18.16931845 20.50528549
22.67690164 21.40479524 25.54544 30.94618128 22.72992565
23.06289609 17.6485592 15.2758427 19.50578691 16.45571607
20.19960765 20.35352859 22.60091638 20.36586912 18.90760728
20.57141151 19.43677153 20.43437031 22.39310576 17.72296978
22.82139991 22.50744791 25.10496617 22.30462867 34.80540213
20.77064617 25.18071618 22.5500315 20.71481252 27.91939784
18.29868659 22.00800019 18.71266093]
This is how I have implemented it. Let's say we have 48 samples; then k is 48x48 and y is 48x1:
import numpy as np
from numpy import linalg as LA

def cpu_compute(y, k):
    w, u = LA.eigh(k)                              # eigenvalues w, eigenvectors as columns of u
    uDoty = np.dot(u, y)
    temp = np.transpose(np.tile(w, (len(u), 1)))   # w[i] repeated across row i
    div = np.divide(u, temp)                       # divide row i of u by w[i]
    r = np.tile(uDoty, (len(div), 1))
    a = div * r.T
    c = sum(a)                                     # sum over rows
    return c
The result of
print np.allclose(Y, np.dot(K, c))
is False. Also, the norm of the difference from the true result is 3.13014997999.
Now I have no idea how to fix this; I thought it might somehow be a matter of computing a more precise answer.
Appreciate any help!

To solve k c = y with numpy, use numpy.linalg.solve:
c = solve(k, y)
This uses an algorithm that is much more robust than the methods you are trying.
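For illustration, here is a minimal sketch (the test matrix is made up, not the poster's data) showing that for a symmetric positive definite k the eigendecomposition formula and numpy.linalg.solve agree, while solve remains the simpler and more robust choice:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((48, 48))
k = A @ A.T + 48 * np.eye(48)       # symmetric positive definite test kernel
y = rng.standard_normal(48)

c_solve = np.linalg.solve(k, y)     # recommended: solve k c = y directly

w, u = np.linalg.eigh(k)            # eigenvalues w, eigenvectors as columns of u
c_eig = u @ (u.T @ y / w)           # c = sum_i u(i) * <u(i), y> / w(i)

print(np.allclose(c_solve, c_eig))  # True for a well-conditioned k

The key detail is projecting y onto the eigenvectors with u.T before dividing by the eigenvalues; mixing up u and its transpose gives coefficients that look close but do not satisfy y = k c.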

Related

How to change the domain (i.e. polynomial ring) using sympy in Python?

I'm in the middle of a big (and frankly quite hard) project, so while this is my first question, it probably won't be the last. Also: English is not my first language, so 'sorry for bad English', and I'm writing this on my phone, so 'sorry for bad formatting'.
Ok so: I'm trying to implement the General Number Field Sieve in Python, and I'm, at least for now, heavily relying on sympy.
Here is a piece of code where I'm struggling. In the code below, gpc(N,m) is a list of floats.
from sympy import Poly
from sympy.abc import x
g = Poly(gpc(N,m), x) [*]
However, when I do that, I get a polynomial over the domain RR and I would very much like to switch this to another domain D (where D will end up being ZZ['x'] but I would like this function to be general)
I'm aware that I can slightly modify [*] into
g = Poly(gpc(N,m), x, domain = D)
to get what I want. However, this wouldn't be enough. Somewhere else in my code, I need to be able to change the domain of an already constructed polynomial, and this solution wouldn't help.
When I looked it up, I found the change_ring method, so I tried this:
f = g.change_ring(D)
However, upon execution, I get the error message:
'Poly' object has no attribute 'change_ring'
So I guess this function doesn't exist.
Does anyone know how to change the domain of a polynomial?
Thanks a lot !
It looks like creating a new Poly instance is the best approach; there are a few class methods that could help (take a look at the Poly.from_* class methods)
For example:
from sympy import Poly
from sympy.abc import x, a
g = Poly(x**3 + a*x*2 - 5*x + 6, x)
print(g) # Poly(x**3 + (2*a - 5)*x + 6, x, domain='ZZ[a]')
f = Poly.from_poly(g, *g.gens, domain='ZZ[a, b]')
print(f) # Poly(x**3 + (2*a - 5)*x + 6, x, domain='ZZ[a,b]')
I also wonder if rationalizing your floats at some point might help - see e.g. nsimplify.
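For instance, a small sketch of that idea (the coefficients here are made up): nsimplify with rational=True converts float coefficients into Rationals, after which the Poly lands in an exact domain:

from sympy import Poly, nsimplify
from sympy.abc import x

g = Poly([1.0, -3.0, 2.0], x)                       # float coefficients give domain 'RR'
h = Poly(nsimplify(g.as_expr(), rational=True), x)  # floats become exact Rationals
print(h)  # Poly(x**2 - 3*x + 2, x, domain='ZZ')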

Solving simultaneous equations (>2) without conversion to matrices in R or Python

I have a set of 4 simultaneous equations:
0.059z = x
0.06w = y
z+w = 8093
x+y = 422
All the solutions I've found so far seem to be for equations that have all the variables present in each equation, then convert to matrices and use the solve function.
Is there an easier way to do this in R or Python using the equations in their original form?
Also, how can I ensure that only positive numbers are returned in the solution?
Hope this makes sense...many thanks for the help
You can use sympy for this:
from sympy import symbols, linsolve, Eq
x,y,z,w = symbols('x y z w')
linsolve([Eq(0.059*z, x), Eq(0.06*w, y), Eq(z+w, 8093), Eq(x+y, 422)], (x, y, z, w))
Output (values shown rounded):
{(3751.22, -3329.22, 63580.0, -55487.0)}
Regarding your comments about negative values - there is only one solution to the system of equations, and it has negative values for y and w. If there was more than one solution, sympy would return them, and you could filter the solutions from there to only positive values.
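A minimal sketch of that filtering step (it yields an empty list here, since the single solution is not all-positive):

from sympy import symbols, linsolve, Eq

x, y, z, w = symbols('x y z w')
sols = linsolve([Eq(0.059*z, x), Eq(0.06*w, y), Eq(z + w, 8093), Eq(x + y, 422)], (x, y, z, w))
positive = [s for s in sols if all(v > 0 for v in s)]  # keep only all-positive solutions
print(positive)  # [] -- the only solution has negative y and w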
In R, you could try it like below:
library(rootSolve)
library(zeallot)
model <- function(v){
  c(x, y, z, w) %<-% v
  return(c(0.059*z - x, 0.06*w - y, z + w - 8093, x + y - 422))
}
res <- multiroot(f = model, start = c(0,0,0,0))
then you can get the solution as
> res$root
[1] 3751.22 -3329.22 63580.00 -55487.00
There are a few things going on here. First, as CDJB notes: if there were any positive solutions then sympy would find them. I searched for those numbers and found this paper, which suggests you should be using 7088 instead of 8093. We can do a quick sanity check:
def pct(value):
    return f"{value:.1%}"

print(pct(422 / 8093))  # ~5.2%
print(pct(422 / 7088))  # ~6.0%
confirming that you're going to struggle averaging ~5.9% and ~6.0% towards ~5.2%, and explaining the negative solutions in the other answers. Further, these are presumably counts, so all your variables also need to be whole numbers.
Once this corrected denominator is used, I'd note that there are many solutions (11645 by my count), e.g.:
cases = [1, 421]
pop = [17, 7071]
rates = [pct(c / p) for c, p in zip(cases, pop)]
gives the appropriate output, as does:
cases = [2, 420]
pop = [34, 7054]
This is because the data was rounded to two decimal places. You probably also don't want to use either of the above; they're just the first two valid solutions I got.
We can define a Python function to enumerate all solutions:
from math import floor, ceil

def solutions(pop, cases, rate1, rate2, err):
    target = (pct(rate1), pct(rate2))
    for pop1 in range(1, pop):
        pop2 = pop - pop1
        c1_lo = ceil(pop1 * (rate1 - err))
        c1_hi = floor(pop1 * (rate1 + err))
        for c1 in range(c1_lo, c1_hi + 1):
            c2 = cases - c1
            if (pct(c1 / pop1), pct(c2 / pop2)) == target:
                yield c1, c2, pop1, pop2

all_sols = list(solutions(7088, 422, 0.059, 0.060, 0.0005))
which is where I got my count of 11645 above.
Not sure what to suggest with this, but you could maybe do a bootstrap to see how much your statistic varies with different solutions. Another option would be a Bayesian analysis, which would let you put priors over the population sizes and hence cut this down a lot.

How to get individual PCA coordinates in Python? (Results differ between R and Python)

I have searched everywhere and the answers I found don't work.
I am trying to get the coordinates of the variables in Python, but the solutions I tried don't give me the same results as R.
A)
Coordinates_Indiv = pca.fit_transform(df5)
Coordinates_Indiv = pd.DataFrame(Coordinates_Indiv)
B)
irlambdas=1/(5*np.sqrt(Explained_Variance))
mirlambadas=np.diagflat(irlambdas)
ProjectionsVars=df5.dot(df6)
ProjectionsVars2= ProjectionsVars.dot(mirlambadas)
with df6=df5.T
Does anyone have an idea how to solve this?
First, thank you so much for your answer and your time.
I am using the FactoMineR module and summary(pca).
For Python I did it with sklearn (PCA) but also with numpy.
The thing is, I got the same eigenvalues in both cases between R and Python.
(For Python I computed them using:
Correlation_Matrix = np.corrcoef(df)
Correlation_Matrix = np.nan_to_num(Correlation_Matrix)
Correlation_Matrix[np.diag_indices_from(Correlation_Matrix)] = 1
Eigenvalues = np.linalg.eigvals(Correlation_Matrix)
Eigenvalues[Eigenvalues == 1] = 0
Or, more simply:
#Eigenvalues = pca.explained_variance_)
In both cases I got the same results as with R... but when I compute the individual coordinates, I get the same results between numpy and sklearn, yet not the same results as with R. God only knows.
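One hedged sketch that may help pin down the difference (assumptions: the data has individuals in rows and variables in columns, and the goal is FactoMineR-style individual coordinates; FactoMineR standardizes with the population standard deviation, ddof=0, which is a common source of small mismatches with sklearn pipelines):

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))                  # stand-in for the real data (df5)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=0)  # population-sd standardization, FactoMineR style
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]                 # components by decreasing eigenvalue
coords = Z @ eigvecs[:, order]                    # candidate individual coordinates (signs may flip)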

Nested Anova in python with Spm1d. Can't print f statistics and p values

I'm looking for a simple solution to perform multi-factor ANOVA analysis in Python. A 2-factor nested ANOVA is what I'm after, and the spm1d Python module is one way to do that; however, I am having an issue.
http://www.spm1d.org/doc/Stats1D/anova.html#two-way-nested-anova
For any of the nested-approach examples, there is never any F statistic or p value printed, nor can I find any way to print them or send them to a variable.
To go through the motions of running one of their examples, where B is nested inside A, with Y observations:
import numpy as np
from matplotlib import pyplot
import spm1d
dataset = spm1d.data.uv1d.anova2nested.SPM1D_ANOVA2NESTED_3x3()
Y,A,B = dataset.get_data()
#(1) Conduct ANOVA:
alpha = 0.05
FF = spm1d.stats.anova2nested(Y, A, B, equal_var=True)
FFi = FF.inference(0.05)
print( FFi )
#(2) Plot results:
pyplot.close('all')
FFi.plot(plot_threshold_label=True, plot_p_values=True)
pyplot.show()
The only indication of statistical significance provided is whether the h0 hypothesis is rejected or not.
> print( FFi )
SPM{F} inference list
design : ANOVA2nested
nEffects : 2
Effects:
A z=(1x101) array df=(2, 6) h0reject=True
B z=(1x101) array df=(6, 36) h0reject=False
In reality, that should be enough. However, in science, scientists like to think of something as more or less significant, which is actually kind of crap... significance is binary. But that's how they think about it, so I have to play along in order to get work published.
The example code produces a matplotlib plot, and this DOES have the F statistic and p values on it!
#(2) Plot results:
pyplot.close('all')
FFi.plot(plot_threshold_label=True, plot_p_values=True)
pyplot.show()
But I can't seem to get any output which prints it.
FFi.get_p_values
and
FFi.get_f_values
produce the output:
<bound method SPMFiList.get_p_values <kabammi edit -- or get_f_values> of SPM{F} inference list
design : ANOVA2nested
nEffects : 2
Effects:
A z=(1x101) array df=(2, 6) h0reject=True
B z=(1x101) array df=(6, 36) h0reject=False
So I don't know what to do. Clearly FFi.plot can access the p values (with plot_p_values), but FFi.get_p_values can't!? Can anyone lend a hand?
cheers,
K
The easiest way to get the p values is to use the get_p_values method that you mention; you just need to call the method by adding () to the end.
p = FFi.get_p_values()
print(p)
This yields:
([0.016584151119287904], [])
To see more detailed information for each effect in 2+-way ANOVA, including p values, use print along with the individual F statistics like this:
print( FFi[0] )
print( FFi[1] )
The first print statement will produce output like this:
SPM{F} inference field
SPM.effect : Main A
SPM.z : (1x101) raw test stat field
SPM.df : (2, 6)
SPM.fwhm : 11.79254
SPM.resels : (1, 8.47993)
Inference:
SPM.alpha : 0.050
SPM.zstar : 24.30619
SPM.h0reject : True
SPM.p_set : 0.017
SPM.p_cluster : (0.017)
You can retrieve the clusters' p values like this:
p = [F.p for F in FFi]
which gives the same result as calling get_p_values.
Note that there are no p values in this case for FFi[1] because the test statistic fails to cross the alpha-defined threshold (see the "Main B" panel in the figure above). If you need to report p values in this case as well, one option is simply to use "p > alpha". More precise p values are available parametrically up until about p = 0.5, but larger p values than that are not very accurate using parametric methods, so if you need p values for all cases consider using the nonparametric version: spm1d.stats.nonparam.anova2nested.
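If you do need p values in that regime, a minimal hedged sketch of the nonparametric route (the iterations keyword is an assumption based on the spm1d nonparametric examples; check the exact signature against the docs):

import spm1d

# reuse the example dataset from the question
dataset = spm1d.data.uv1d.anova2nested.SPM1D_ANOVA2NESTED_3x3()
Y, A, B = dataset.get_data()

FFn = spm1d.stats.nonparam.anova2nested(Y, A, B)
FFni = FFn.inference(0.05, iterations=1000)  # 'iterations' assumed from the nonparam examples
print(FFni)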

'bounds' doesn't work in the scipy.optimize.minimize function

I have a very strange problem with the minimize function. I wrote the following code and hoped it would output results in (0, 1).
cons = ({'type':'eq','fun':lambda x: 1-sum(x)})
bnds = [(0,1)]*len(w)
minS = minimize(min_S_function, w, method = 'SLSQP', bounds = bnds, constraints = cons)
However, the result contains many extremely small numbers instead of zeros, even though I set the bounds between 0 and 1. Why is that?
In [68]:minS.x
Out[68]:
array([ 2.18674802e-14, -2.31905438e-14, 4.05696128e-01,
1.61295198e-14, 4.98954818e-02, -2.75073615e-14,
3.97195447e-01, 1.09796187e-14, -4.33297358e-15,
2.38805100e-14, 7.73037793e-15, 3.21824430e-14,
-1.42202909e-14, -1.08110329e-14, -1.83513297e-14,
-1.37745269e-14, 3.37854385e-14, 4.69473932e-14,
-1.09088800e-15, -1.57169147e-14, 7.47784562e-02,
1.32782180e-02, 1.64441640e-14, 2.72140153e-15,
5.23069695e-14, 5.91562687e-02, 2.16467506e-15,
-6.50672519e-15, 2.53337977e-15, -6.68019297e-14])
This is an acceptable solution!
Those iterative solvers only guarantee a locally optimal approximate solution, and that's what you got!
Check your numbers and you will see that the distance to the lower bound (of your negative out-of-bound values) is within 10^-14 = 0.00000000000001, an acceptable error (we use floating-point math after all).
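If exact zeros are needed downstream, a small post-processing sketch (minS is the result object from the question; the tolerance 1e-12 is an assumption you may want to tune):

import numpy as np

x = np.clip(minS.x, 0.0, 1.0)   # push tiny out-of-bound values like -2e-14 back onto the bounds
x[np.abs(x) < 1e-12] = 0.0      # treat anything below the tolerance as exactly zero
x /= x.sum()                    # restore the sum-to-one equality constraint after clipping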
