Writing a function with a Markov chain matrix - python

I'm learning Python for biologists.
I have this Markov chain of an infectious disease process:
I had to store a dictionary with the states as keys and their indices as values, and then fill in the corresponding transition matrix representing the Markov chain, so I did this:
state_index = {'S':0, 'I':1, 'R':2, 'D':3}
initial_distribution = [0.9999, 0.0001, 0, 0]
# model parameters
r = 0.05
d = 0.02
h = 0.95
S, I, R, D = initial_distribution
T = sp.matrix([[1-(r*I), (r*I), 0,   0       ],
               [0,       1-d,   d*h, d*(1-h) ],
               [0,       0,     1,   0       ],
               [0,       0,     0,   1       ]])
Then I had to write a function named 'step' that takes a list with the state probability distribution (indexed according to the dictionary), the model parameters, and the transition matrix.
The function should return an updated probability distribution (after multiplying it with the transition matrix) and an updated transition matrix (taking into account the change in I).
So I wrote this, but I couldn't understand how to update the transition matrix here:
def step(dist, T, r, d, h):
    next_dist_matrix = dist * T  # updated probability distribution
    next_dist = list(np.array(next_dist_matrix)[0])
    next_T = T  # ? -- this is where I don't know how to update T
    return next_dist, next_T
And now I'm stuck, because I also need to put this function in a for-loop that runs 1000 simulation steps, each time saving the current state distribution to a list.
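For what it's worth, here is a minimal sketch of one way the update and the 1000-step loop could look, assuming (my reading of "taking into account the I change") that only the S-row of T depends on the state, with r*I recomputed from the freshly updated distribution. Plain NumPy arrays are used instead of sp.matrix; the names are my own:

import numpy as np

def step(dist, T, r, d, h):
    # advance the distribution one step, then rebuild T from the new I
    next_dist = dist @ T
    S, I, R, D = next_dist
    next_T = np.array([[1 - r*I, r*I,   0.0, 0.0     ],
                       [0.0,     1 - d, d*h, d*(1-h) ],
                       [0.0,     0.0,   1.0, 0.0     ],
                       [0.0,     0.0,   0.0, 1.0     ]])
    return next_dist, next_T

r, d, h = 0.05, 0.02, 0.95
dist = np.array([0.9999, 0.0001, 0.0, 0.0])
I0 = dist[1]
T = np.array([[1 - r*I0, r*I0,  0.0, 0.0     ],
              [0.0,      1 - d, d*h, d*(1-h) ],
              [0.0,      0.0,   1.0, 0.0     ],
              [0.0,      0.0,   0.0, 1.0     ]])

history = [dist]                 # one saved distribution per step
for _ in range(1000):
    dist, T = step(dist, T, r, d, h)
    history.append(dist)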


Issue with Python scipy optimize minimize fmin_slsqp solver

I am starting out with the optimization functions from scipy.
I tried to create my code by copying the solution from Find optimal vector that minimizes function.
I have an array that contains series in columns. I need to multiply each of them by a weight so that the sum of the last row of these columns, multiplied by the weights, gives a given number (constraint).
The sum of the series multiplied by the weights gives a new series from which I extract the max-draw-down, and I want to minimize this mdd.
I wrote my code as best I can (2 months of Python and 3 hours of scipy) and can't solve the error message on the function used to solve the problem.
Here is my code; any help would be much appreciated:
import numpy as np
from scipy.optimize import fmin_slsqp
# based on: https://stackoverflow.com/questions/41145643/find-optimal-vector-that-minimizes-function
# the number of columns (and so of weights) can vary; it should be generic, regardless the number of columns
def mdd(serie): # finding the max-draw-down of a series (put aside not to create add'l problems)
    min = np.nanargmax(np.fmax.accumulate(serie) - serie)
    max = np.nanargmax((serie)[:min])
    return serie[np.nanargmax((serie)[:min])] - serie[min] # max-draw-down
# defining the input data
# mat is an array of 5 columns containing series of independent data
mat = np.array([[1, 0, 0, 1, 1],[2, 0, 5, 3, 4],[3, 2, 4, 3, 7],[4, 1, 3, 3.1, -6],[5, 0, 2, 5, -7],[6, -1, 4, 1, -8]]).astype('float32')
w = np.ndarray(shape=(5)).astype('float32') # 1D vector for the weights to be used for the columns multiplication
w0 = np.array([1/5, 1/5, 1/5, 1/5, 1/5]).astype('float32') # initial weights (all similar as a starting point)
fixed_value = 4.32 # as a result of constraint nb 1
# testing the operations that are going to be used in the minimization
series = np.sum(mat * w0, axis=1)
# objective:
# minimize the mdd of the series by modifying the weights (w)
def test(w, mat):
    series = np.sum(mat * w, axis=1)
    return mdd(series)
# constraints:
def cons1(last, w, fixed_value): # fixed_value = 4.32
    # the sum of the weights multiplied by the last value of each column must be equal to this fixed_value
    return np.sum(mat[-1, :] * w) - fixed_value
def cons2(w): # the sum of the weights must be equal to 1
    return np.sum(w) - 1
# solution:
# looking for the optimal set of weights (w) values that minimize the mdd with the two contraints and bounds being respected
# all w values must be between 0 and 1
result = fmin_slsqp(test, w0, f_eqcons=[cons1, cons2], bounds=[(0.0, 1.0)]*len(w), args=(mat, fixed_value, w0), full_output=True)
weights, fW, its, imode, smode = result
print(weights)
You weren't that far off the mark. The biggest problem lies in the mdd function: in case there is no draw-down, your function produces an empty array as an intermediate result, which argmax can then no longer handle.
def mdd(serie): # finding the max-draw-down of a series (put aside not to create add'l problems)
    i = np.argmax(np.maximum.accumulate(serie) - serie) # end of the period
    start = serie[:i]
    # check if there is dd at all
    if not start.any():
        return 0
    j = np.argmax(start) # start of period
    return serie[j] - serie[i] # max-draw-down
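A quick sanity check of the fixed function on a toy series (my own example, not part of the original answer; it assumes the corrected mdd above is defined):

import numpy as np

serie = np.array([1.0, 3.0, 2.0, 5.0, 1.5])
# the running maximum peaks at 5.0 and the series then drops to 1.5,
# so the max-draw-down should be 5.0 - 1.5 = 3.5
print(mdd(serie))                        # 3.5
print(mdd(np.array([1.0, 2.0, 3.0])))    # 0, no draw-down at all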
In addition, you must make sure that the parameter list is the same for all functions involved (cost function and constraints).
# objective:
# minimize the mdd of the series by modifying the weights (w)
def test(w, mat, fixed_value):
    series = mat @ w
    return mdd(series)
# constraints:
def cons1(w, mat, fixed_value): # fixed_value = 4.32
    # the sum of the weights multiplied by the last value of each column must be equal to this fixed_value
    return mat[-1, :] @ w - fixed_value
def cons2(w, mat, fixed_value): # the sum of the weights must be equal to 1
    return np.sum(w) - 1
# solution:
# looking for the optimal set of weights (w) values that minimize the mdd with the two contraints and bounds being respected
# all w values must be between 0 and 1
result = fmin_slsqp(test, w0, eqcons=[cons1, cons2], bounds=[(0.0, 1.0)]*len(w), args=(mat,fixed_value), full_output=True)
One more remark: you can make the matrix-vector multiplications much leaner with the @ operator.
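For example, the elementwise multiply-and-sum from the question and the @ product give the same series (a small illustrative check, not part of the original answer):

import numpy as np

mat = np.arange(12.0).reshape(4, 3)   # 4 rows, 3 columns of toy data
w = np.array([0.2, 0.3, 0.5])         # one weight per column
# summing the weighted columns row by row is exactly a matrix-vector product
print(np.allclose(np.sum(mat * w, axis=1), mat @ w))   # True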

How to calculate the distance between all atoms in a PDB file and create a distance matrix from that

I would like to calculate the distances between all atoms in a PDB file and then create a distance matrix from the result.
I currently have all the x, y and z coordinates, but I am struggling to do this distance calculation for all atoms.
distance = sqrt((x1-x2)^2+(y1-y2)^2+(z1-z2)^2)
For example:
Distance between Atom 1 and Atom 2, 3, 4...
Distance between Atom 2 and Atom 3, 4, 5...
And so forth for every Atom in the PDB file. I'm new to coding so any method to achieve the end result would be great.
pdb file in question - https://files.rcsb.org/download/6GCH.pdb
Considering your code, you can do:
x_y_z_ = list()
...
for atom in residue:
    x = (atom.coord[0])
    y = (atom.coord[1])
    z = (atom.coord[2])
    x_y_z_.append([x, y, z])
...
x_y_z_ = np.array(x_y_z_)
print( pairwise_distances(x_y_z_,x_y_z_) )
and then use pairwise_distances from sklearn, like:
from sklearn.metrics import pairwise_distances
import numpy as np
x_y_z_ = np.array([[120, 130, 123],
                   [655, 123, 666],
                   [111, 444, 333],
                   [520, 876, 222]])
print( pairwise_distances(x_y_z_,x_y_z_) )
out:
[[ 0. 762.31423967 377.8584391 852.24233643]
[762.31423967 0. 714.04901793 884.51681725]
[377.8584391 714.04901793 0. 605.1660929 ]
[852.24233643 884.51681725 605.1660929 0. ]]
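For completeness, the same matrix can also be built with plain NumPy broadcasting, directly following the sqrt formula from the question (a sketch using the toy coordinates above; with the PDB data you would pass the coordinate array you collected):

import numpy as np

coords = np.array([[120, 130, 123],
                   [655, 123, 666],
                   [111, 444, 333],
                   [520, 876, 222]], dtype=float)
# diff[i, j] holds the (x, y, z) difference between atoms i and j
diff = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]
dist_matrix = np.sqrt((diff ** 2).sum(axis=-1))
print(dist_matrix)   # same values as pairwise_distances above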

Incorrect output for calculating the compact svd

I am trying to calculate the truncated SVD of a given matrix. I have written the code, but when I test it, it gives incorrect output. I'm not sure where I am going wrong. I think I may be calculating my pos_v incorrectly, but I can't seem to find the issue; can anyone give any guidance?
Here is my code:
def compact_svd(A, tol=1e-6):
    """Compute the truncated SVD of A.
    Parameters:
        A ((m,n) ndarray): The matrix (of rank r) to factor.
        tol (float): The tolerance for excluding singular values.
    Returns:
        ((m,r) ndarray): The orthonormal matrix U in the SVD.
        ((r,) ndarray): The singular values of A as a 1-D array.
        ((r,n) ndarray): The orthonormal matrix V^H in the SVD.
    """
    lambda_, v = sp.linalg.eig(A.conj().T @ A)
    lambda_ = lambda_.real
    sigma = np.sqrt(lambda_)
    indices = np.argsort(sigma)[::-1]
    v = v[:, indices]
    r = 0
    for i in range(len(sigma)):
        if sigma[i] > tol:
            r = r + 1
    pos_sigma = sigma[:r]
    pos_v = v[:, :r]
    U = (A @ pos_v) / pos_sigma
    return U, pos_sigma, pos_v.conj().T
Here is my test matrix:
A = np.array([[9, 9, 9, 3, 2, 9, 3, 7, 7, 8],
              [4, 4, 7, 4, 2, 4, 8, 7, 1, 8],
              [1, 4, 7, 4, 5, 6, 8, 4, 1, 6],
              [5, 5, 1, 8, 9, 4, 9, 4, 2, 7],
              [7, 7, 7, 9, 4, 7, 4, 3, 7, 1]], dtype=float)
print(compact_svd(A))
The correct output:
(array([[ 0.54036027+0.j, 0.58805563+0.j, -0.29423603+0.j,
-0.4346745 +0.j, -0.29442248+0.j],
[ 0.41227593+0.j, -0.21929894+0.j, -0.51747179+0.j,
0.08375491+0.j, 0.71214086+0.j],
[ 0.38514303+0.j, -0.32015959+0.j, -0.24745912+0.j,
0.60060756+0.j, -0.57201156+0.j],
[ 0.43356274+0.j, -0.61204283+0.j, 0.41057641+0.j,
-0.51216171+0.j, -0.080897 +0.j],
[ 0.44914089+0.j, 0.35916564+0.j, 0.64485588+0.j,
0.42544582+0.j, 0.26912684+0.j]]),
array([39.03360665, 11.91940614, 9.3387396 , 5.38285176, 3.33439025]),
array([[ 0.31278916-0.j, 0.34239004-0.j, 0.35924746-0.j,
0.31566457-0.j, 0.24413875-0.j, 0.35101654-0.j,
0.35095554-0.j, 0.28925585-0.j, 0.22009374-0.j,
0.34370454-0.j],
[ 0.29775734-0.j, 0.21717625-0.j, 0.28679345-0.j,
-0.17261926-0.j, -0.41403132-0.j, 0.21480395-0.j,
-0.5556673 -0.j, -0.00587411-0.j, 0.40832611-0.j,
-0.24296833-0.j],
[ 0.17147953-0.j, 0.09198514-0.j, -0.32960263-0.j,
0.55102537-0.j, 0.36556324-0.j, -0.00497598-0.j,
-0.07790604-0.j, -0.33140639-0.j, 0.26883294-0.j,
-0.47752981-0.j],
[-0.47542292-0.j, -0.14068908-0.j, 0.62131114-0.j,
0.21645498-0.j, -0.11266769-0.j, 0.17761373-0.j,
0.23467192-0.j, -0.15350902-0.j, -0.07515751-0.j,
-0.43906049-0.j],
[ 0.33174054-0.j, -0.18290668-0.j, 0.04021533-0.j,
0.43552649-0.j, -0.50269662-0.j, -0.50174342-0.j,
0.17580464-0.j, 0.33582599-0.j, -0.05960136-0.j,
-0.1162055 -0.j]]))
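One way to track down where the routine goes wrong is to compare its output against NumPy's built-in SVD (a debugging sketch, not the intended course solution; it assumes A and compact_svd from the question are in scope):

import numpy as np

U, s, Vh = np.linalg.svd(A, full_matrices=False)
print(s)                                        # reference singular values to compare with pos_sigma
U2, s2, Vh2 = compact_svd(A)
print(np.allclose(U2 @ np.diag(s2) @ Vh2, A))   # does the factorization reproduce A?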

Theano - Sum by group

I'm working on a custom likelihood function for Theano (attempting to fit a conditional logistic regression).
The likelihood requires summing values by group. In R we have the "ave()" function, in Python Pandas we have "groupby()". How would I do something similar in Theano?
Edited for more detail
I want to create a Cox proportional hazards model (same as a conditional logistic regression). The log-likelihood requires the sum of values by group:
In Pandas, this would be:
temp = df.groupby('groupid')['eta'].aggregate(np.sum)
denominator = np.log(temp).sum()
In the data, we have a column with group ID, and the values to be summed
group eta
1 2.1
1 1.8
1 0.9
2 1.2
2 0.75
2 1.42
The output for the group sums would then be:
group sum
1 4.8
2 3.37
Then, the sum of the log of the sums:
log(4.8) + log(3.37) = 2.7835
This is quick and easy to do in Pandas. How can I do something similar in Theano? Sure, I could write a nested loop, but I try to avoid manually coded loops when possible, as they are slow.
Thanks!
Let's say you have "X" (a list of all your etas) with dimension Nx1 (I guess) and a matrix H. H is an NxG matrix that holds a one-hot encoding of the groups.
Then you write something like:
import numpy as np
from numpy import newaxis as na
import theano.tensor as T
X = T.vector()
H = T.matrix()
tmp = T.sum(X[:, na] * H, axis=0)
O = T.sum(T.log(tmp))
x = np.array([5, 10, 10, 0.5, 5, 0.5])
# create a 1-hot encoding
g = np.array([1, 2, 2, 0, 1, 0])
h = np.zeros(shape=(len(x), 3))
for i, j in enumerate(g):
    h[i, j] = 1.0
O.eval({X:x, H: h})
This should work as long as there is at least one eta per group (otherwise the log gives -inf).
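The same one-hot trick can be checked in plain NumPy on the question's example data, independently of Theano (my own check, with the group IDs remapped to 0 and 1):

import numpy as np

eta = np.array([2.1, 1.8, 0.9, 1.2, 0.75, 1.42])
group = np.array([0, 0, 0, 1, 1, 1])        # group 1 -> 0, group 2 -> 1
H = np.zeros((len(eta), 2))
H[np.arange(len(eta)), group] = 1.0         # one-hot encoding of the groups
group_sums = eta @ H                        # [4.8, 3.37]
print(np.log(group_sums).sum())             # ~2.7835, matching the Pandas result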

Translating Matlab (Octave) group coloring code into python (numpy, pyplot)

I want to translate the following group-coloring Octave function to Python and use it with pyplot.
Function input:
x - Data matrix (m x n)
a - A parameter.
index - A vector of size m with values in the range [0, a)
(For example, if a = 4, index can be [random.choice(range(4)) for i in range(m)].)
The values in "index" indicate which of the a groups each of the m data points belongs to.
The function should plot all the data points from x and color them in different colors (the number of different colors is a).
The function in Octave:
p = hsv(a); % This is an a x 3 matrix
colors = p(index, :); % ****This is an m x 3 matrix****
scatter(X(:,1), X(:,2), 10, colors);
I couldn't find a function like hsv in Python, so I wrote it myself (I think I did...):
p = colors.hsv_to_rgb(numpy.column_stack((
        numpy.linspace(0, 1, a), numpy.ones((a, 2)) )))
But I can't figure out how to do the matrix selection p(index, :) in Python (NumPy),
especially because the size of "index" is bigger than "a".
Thanks in advance for your help.
So, you want to take an m x 3 matrix of HSV values and convert each row to RGB?
import numpy as np
import colorsys
mymatrix = np.matrix([[11, 12, 13],
                      [21, 22, 23],
                      [31, 32, 33]])
def to_hsv(x):
    return colorsys.rgb_to_hsv(*x)
# Apply the to_hsv function to each matrix row.
print(np.apply_along_axis(to_hsv, axis=1, arr=mymatrix))
This produces:
[[ 0.5 0. 13. ]
[ 0.5 0. 23. ]
[ 0.5 0. 33. ]]
Following up on your comment:
If I understand correctly, you have a matrix p that is an a x 3 matrix, and you want to randomly select rows from it over and over again until you have a new matrix that is m x 3?
Ok. Let's say you have a matrix p defined as follows:
a = 5
p = np.random.randint(5, size=(a, 3))
Now, make a list of m random integers in the range 0 to a-1 (the index starts at 0 and ends at a-1):
m = 20
index = np.random.randint(a, size=m)
Now access the right indexes and plug them into a new matrix:
p_prime = np.matrix([p[i] for i in index])
Produces a 20 x 3 matrix.
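For reference, NumPy's fancy indexing does the same row selection in one step, which is the direct equivalent of Octave's p(index, :) (a small sketch with made-up sizes):

import numpy as np

a, m = 5, 20
p = np.random.rand(a, 3)               # a x 3 color table (e.g. the hsv_to_rgb output)
index = np.random.randint(a, size=m)   # group id for each of the m data points
colors = p[index]                      # m x 3, rows repeated as needed
print(colors.shape)                    # (20, 3)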
