Multinomial Distribution in Python

I am trying to translate some Julia code to Python. It is code for the multinomial distribution, and I am stuck on the last part of it. I don't know how to write it in Python, and I'd like to know whether there is a package that will do what I want. I'm not sure whether scipy.stats can do this, because its documentation seems rather limited.
Here is the part of the Julia code where I'm stuck. x1s and x are arrays, and OrthoNNDist is the name of a struct:
Base.length(d::OrthoNNDist) = length(d.x0)                                # dimension of the distribution
Distributions.rand(d::OrthoNNDist) = rand(d.x1s)                          # draw one of the x1s uniformly at random
Distributions.pdf(d::OrthoNNDist, x::Vector) = x in d.x1s ? d.prob : 0.0  # d.prob on the support, 0 elsewhere
Distributions.pdf(d::OrthoNNDist) = fill(d.prob, size(d.x1s))             # pdf over the whole support
Distributions.logpdf(d::OrthoNNDist, x::Vector) = log(pdf(d, x))
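For reference, these Julia methods describe a uniform distribution over the finite support x1s: rand picks one support vector at random, and pdf returns the constant d.prob on the support and 0 elsewhere. There is no drop-in scipy.stats equivalent for a distribution over a set of vectors, but a minimal hand-rolled sketch is straightforward. The class and method names below mirror the Julia code but are otherwise my own, and I assume prob = 1/len(x1s), which the struct suggests but does not state:

import numpy as np

class OrthoNNDist:
    # Sketch of a Python port: a uniform distribution over a fixed
    # set of vectors x1s, each carrying the same probability `prob`.
    def __init__(self, x0, x1s):
        self.x0 = np.asarray(x0)
        self.x1s = [np.asarray(v) for v in x1s]
        self.prob = 1.0 / len(self.x1s)   # assumed; the Julia struct stores it as a field

    def __len__(self):
        # mirrors Base.length: the dimension of the underlying point x0
        return len(self.x0)

    def rand(self):
        # mirrors Distributions.rand: pick one of the x1s uniformly
        return self.x1s[np.random.randint(len(self.x1s))]

    def pdf(self, x=None):
        if x is None:
            # mirrors pdf(d): the constant probability for every support point
            return np.full(len(self.x1s), self.prob)
        # mirrors pdf(d, x): d.prob if x is in the support, else 0
        in_support = any(np.array_equal(x, v) for v in self.x1s)
        return self.prob if in_support else 0.0

    def logpdf(self, x):
        # log(0.0) yields -inf, matching log(pdf(d, x)) off the support
        return np.log(self.pdf(x))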

Related

How to translate a 2007 sympy/sage/python code to a modern day sympy/python code?

This is me struggling with the General Number Field Sieve: round 2! As usual, English is not my first language, so sorry for any mistakes.
I'm trying to implement this algorithm, and to do it I'm relying heavily on sympy.
But I'm also basing parts of my work on the code of Prof. William Stein (https://wstein.org/misc/aly-lll/gnfs.html), written in python/sympy/sage in 2007.
In it, there is this piece of code (which, of course, happens to be one of the most important):
def find_kernel_vectors(M):
    return M.kernel().basis_matrix().rows()  # [1]
It will be applied to the matrix:
M = Matrix(GF(2), len(sel), ev)  # [2]
where sel is a list and ev holds the coefficients of the matrix, each of which is either 0 or 1.
However, when I copy-paste it, adding from sympy import Matrix, GF, I run into quite a lot of trouble.
Upon execution of [2], I get the error message GF(2) is not an integer.
I looked up how to use matrices in sympy, and from what I gathered, only a list of coefficients is an acceptable input.
So I decided to define M as follows in order to be able to test [1]:
M = Matrix(ev)
And upon execution of [1], I get the error message MutableDenseMatrix has no attribute kernel.
The thing is, this is not the first time something like this has happened, because Prof. Stein was coding against 2007 versions of python/sympy/sage.
I don't really think the issue with [2] is an actual problem because, as I said, the coefficients of my matrix are already 0 or 1 (i.e. elements of GF(2)), though I'm not sure about that. But I am sure that [1] is a big issue, and I have no idea how to solve it.
I believe it all boils down to translating 2007 sage/sympy/python to modern-day python, so if someone knows how to do that, it would save my code.
Thanks a lot !
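For context, Matrix(GF(2), ...) and .kernel() look like Sage APIs rather than SymPy ones, which would explain both errors when importing from sympy. One way to reproduce the kernel-over-GF(2) computation without Sage is plain Gaussian elimination with XOR row operations. The following is a minimal NumPy sketch; the function name gf2_nullspace, the 2-D input convention, and the list-of-vectors return format are my own, not from the original code, but the returned basis plays the same role as [1]'s basis_matrix().rows():

import numpy as np

def gf2_nullspace(A):
    # Basis of the kernel of a 0/1 matrix over GF(2), via Gaussian
    # elimination where row addition is XOR.
    A = np.array(A, dtype=np.uint8) % 2
    rows, cols = A.shape
    pivot_cols = []
    r = 0
    for c in range(cols):
        if r >= rows:
            break
        # find a row at or below r with a 1 in column c
        hits = np.nonzero(A[r:, c])[0]
        if len(hits) == 0:
            continue
        A[[r, r + hits[0]]] = A[[r + hits[0], r]]  # swap pivot into place
        # clear column c in every other row (XOR = addition mod 2)
        for i in range(rows):
            if i != r and A[i, c]:
                A[i] ^= A[r]
        pivot_cols.append(c)
        r += 1
    # each free column yields one kernel basis vector
    free_cols = [c for c in range(cols) if c not in pivot_cols]
    basis = []
    for f in free_cols:
        v = np.zeros(cols, dtype=np.uint8)
        v[f] = 1
        for row, c in enumerate(pivot_cols):
            v[c] = A[row, f]   # over GF(2), -x == x, so no sign flip
        basis.append(v)
    return basis

# e.g. gf2_nullspace([[1, 1, 0], [0, 1, 1]]) returns [array([1, 1, 1], dtype=uint8)]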

How to implement this R Poisson distribution in Python?

I've coded something in R but I can't seem to do the same in Python.
Below is the code; it definitely works in R.
I am having trouble with the Python syntax to achieve the same with numpy.
myMaxAC = qpois(p = as.numeric(0.95),
                lambda = (121412) * (0.005))
For clarity, 0.95 is the confidence level, 121412 is my population size, and 0.005 is a frequency within the population.
I just want to know how to get the same answer in Python, which incidentally is 648.
You can get this using poisson.ppf:
from scipy.stats import poisson
myMaxAC = poisson.ppf(0.95, (121412)*(0.005))
print(myMaxAC)
648.0
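For reference, poisson.ppf is the quantile function (inverse CDF), i.e. SciPy's counterpart of R's qpois. A quick sanity check of the result, as a sketch:

from scipy.stats import poisson

lam = 121412 * 0.005              # Poisson mean, about 607.06
q = poisson.ppf(0.95, lam)        # smallest k with CDF(k) >= 0.95
print(q)                          # 648.0
print(poisson.cdf(q, lam))        # >= 0.95, so 648 covers the 95% level
print(poisson.cdf(q - 1, lam))    # < 0.95, so 647 does not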

Using MATLAB functions in Python for solving a fourth order polynomial

I am solving a fourth-order polynomial with varying coefficients, and thus I want to call a MATLAB function from Python.
I am new to this concept, and I am getting several tracebacks for the test case I wrote before moving on to the actual code.
I am a beginner in both MATLAB and Python.
Here's the Python code:
import matlab.engine
import math
eng = matlab.engine.start_matlab()
D=(eng.hub(1,0,0,-184602.030,-(75.2)**4))
print(D)
Here's the MATLAB code:
function D = hub(a, b, c, d, e)
    coefvct = [a b c d e]; % Coefficient vector
    D = roots(coefvct)     % Solution
end
Here's the traceback I encountered:
I am not familiar with the MATLAB engine, but looking at the error, the first thing you need to correct is to give it floats and not integers, since that is what it is complaining about: eng.hub(1.0, 0.0, 0.0, -184602.030, -(75.2)**4).
Notice the decimal points in the first three arguments.
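Putting that together, a corrected version of the test script might look like this (it assumes hub.m is on the engine's MATLAB path; the float literals are the only substantive change):

import matlab.engine

eng = matlab.engine.start_matlab()
# pass every coefficient as a float so the engine marshals them as MATLAB doubles
D = eng.hub(1.0, 0.0, 0.0, -184602.030, -(75.2) ** 4)
print(D)   # roots of x^4 - 184602.030*x - 75.2^4
eng.quit()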

How to use SimpleITK to get the inverse displacement field

I just moved from MATLAB to Python recently so I can use SimpleITK, and sorry if this is a dumb question.
I have a transformation tx after demons registration using SimpleITK. I wish to get the displacement field and its inverse by doing the following:
disp_field = tx.GetDisplacementField()
disp_field_inv = tx.GetInverseDisplacementField()
It turns out disp_field is exactly what I need: a 256×256×176 image volume. But disp_field_inv is an empty array. Does anyone know why?
Then I tried the following:
disp_field_inv = sitk.InverseDisplacementField(disp_field,
                                               disp_field.GetSize(),
                                               disp_field.GetOrigin(),
                                               disp_field.GetSpacing(),
                                               subsamplingFactor=16)
But Python just runs seemingly forever. Does anybody know how to do this properly?
The following is the specification of the InvertDisplacementField procedural interface:
Image itk::simple::InvertDisplacementField(const Image & image1,
                                           uint32_t maximumNumberOfIterations = 10u,
                                           double maxErrorToleranceThreshold = 0.1,
                                           double meanErrorToleranceThreshold = 0.001,
                                           bool enforceBoundaryCondition = true)
So I think that by passing
disp_field.GetSize(), disp_field.GetOrigin(), disp_field.GetSpacing(), subsamplingFactor=16
as parameters 2 to 5, you are passing arguments that do not match what the interface expects.
Try just running disp_field_inv = sitk.InverseDisplacementField(disp_field)
and see if it iterates to a result!
For what it's worth after all these years, I just wanted to point out that the original question and the (so far only) answer by g.stevo mix up two different filters available in SimpleITK, namely:
sitk.InverseDisplacementField
sitk.InvertDisplacementField
Each of these procedural APIs and their respective image filters take different Execute function arguments.
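To make the distinction concrete, here is a sketch of calling each filter on a toy field. The keyword arguments for InvertDisplacementField come from the signature quoted above; the keyword names for InverseDisplacementField (size, outputOrigin, outputSpacing) are my best reading of its procedural interface, so treat them as assumptions:

import SimpleITK as sitk

# toy 3-D displacement field, all zeros, just for illustration
disp_field = sitk.Image([64, 64, 64], sitk.sitkVectorFloat64)

# InvertDisplacementField: iteratively computes the inverse of a field
inv_a = sitk.InvertDisplacementField(disp_field,
                                     maximumNumberOfIterations=10,
                                     maxErrorToleranceThreshold=0.1,
                                     meanErrorToleranceThreshold=0.001,
                                     enforceBoundaryCondition=True)

# InverseDisplacementField: reconstructs an inverse by scattered-data
# interpolation; its extra arguments describe the output grid
inv_b = sitk.InverseDisplacementField(disp_field,
                                      size=disp_field.GetSize(),
                                      outputOrigin=disp_field.GetOrigin(),
                                      outputSpacing=disp_field.GetSpacing(),
                                      subsamplingFactor=16)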

Python's implementation of Mutual Information

I am having some issues with the mutual information function that Python's machine learning libraries provide, in particular:
sklearn.metrics.mutual_info_score(labels_true, labels_pred, contingency=None)
(http://scikit-learn.org/stable/modules/generated/sklearn.metrics.mutual_info_score.html)
I am trying to implement the example I found on the Stanford NLP tutorial site:
http://nlp.stanford.edu/IR-book/html/htmledition/mutual-information-1.html#mifeatsel2
The problem is I keep getting different results, and I haven't figured out the reason yet.
I get the concept of mutual information and feature selection; I just don't understand how it is implemented in Python. What I do is provide the mutual_info_score method with two arrays based on the NLP site example, but it outputs different results. The other interesting fact is that however you play around and change the numbers in those arrays, you are most likely to get the same result. Am I supposed to use another data structure specific to Python, or what is the issue behind this? If anyone has used this function successfully in the past it would be a great help to me; thank you for your time.
I encountered the same issue today. After a few trials I found the real reason: if you strictly followed the NLP tutorial you take log2, but sklearn.metrics.mutual_info_score uses the natural logarithm (base e, Euler's number). I didn't find this detail in the sklearn documentation...
I verified this with the following:
import numpy as np

def computeMI(x, y):
    sum_mi = 0.0
    x_value_list = np.unique(x)
    y_value_list = np.unique(y)
    Px = np.array([len(x[x == xval]) / float(len(x)) for xval in x_value_list])  # P(x)
    Py = np.array([len(y[y == yval]) / float(len(y)) for yval in y_value_list])  # P(y)
    for i in range(len(x_value_list)):  # range, not the Python 2 xrange
        if Px[i] == 0.:
            continue
        sy = y[x == x_value_list[i]]
        if len(sy) == 0:
            continue
        pxy = np.array([len(sy[sy == yval]) / float(len(y)) for yval in y_value_list])  # P(x,y)
        t = pxy[Py > 0.] / Py[Py > 0.] / Px[i]  # P(x,y) / (P(x) * P(y))
        sum_mi += sum(pxy[t > 0] * np.log2(t[t > 0]))  # sum of P(x,y) * log2(P(x,y) / (P(x) * P(y)))
    return sum_mi
If you change np.log2 to np.log, I think it will give you the same answer as sklearn. The only difference is that when this method returns 0, sklearn will return a number very near 0. (And of course, use sklearn if you don't care about the log base; my piece of code is just a demo, and it performs poorly...)
FYI: 1) sklearn.metrics.mutual_info_score takes lists as well as np.array; 2) sklearn.metrics.cluster.entropy also uses log, not log2.
Edit: as for "same result", I'm not sure what you really mean. In general, the values in the vectors don't really matter; it is the "distribution" of values that matters. You care about P(X=x), P(Y=y), and P(X=x, Y=y), not the values x, y themselves.
The code below should produce the result 0.00011053558610110256:
c = np.concatenate([np.ones(49), np.zeros(27652), np.ones(141), np.zeros(774106)])
t = np.concatenate([np.ones(49), np.ones(27652), np.zeros(141), np.zeros(774106)])
computeMI(c, t)
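As a cross-check of the log-base explanation, sklearn's value divided by ln 2 should match the base-2 result above, since dividing nats by ln 2 converts them to bits. A small self-contained sketch:

from sklearn.metrics import mutual_info_score
import numpy as np

c = np.concatenate([np.ones(49), np.zeros(27652), np.ones(141), np.zeros(774106)])
t = np.concatenate([np.ones(49), np.ones(27652), np.zeros(141), np.zeros(774106)])

mi_nats = mutual_info_score(c, t)   # sklearn works in nats (natural log)
mi_bits = mi_nats / np.log(2)       # convert nats to bits
print(mi_bits)                      # should be close to 0.00011053558610110256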
