I am doing some work with Markov chains and I need to look up the transition probability from a transition matrix given a sequence of state changes. How does one do this efficiently in NumPy?
For example:
import numpy as np
#here is the sequence that I need to look up in the transition matrix
sequence = [0, 1, 0, 1, 1]
#the transition matrix that gives the probability to change between each of the states
transition_matrix = np.array([[0.2, 0.8], [0.6, 0.4]])
#desired output
result = [0.8, 0.6, 0.8, 0.4]
So the result is just the probability value that was looked up in the transition matrix. How to do this efficiently when there are many states and the sequence is very long?
Thanks.
Just use zip:
result = []
for (step, next_step) in zip(sequence[:-1], sequence[1:]):
    result.append(transition_matrix[step][next_step])
Result:
[0.8, 0.6, 0.8, 0.4]
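For very long sequences, the Python-level loop can be replaced with a single vectorized lookup: NumPy fancy indexing accepts a pair of index arrays (from-states and to-states) and returns all the transition probabilities at once:

```python
import numpy as np

sequence = [0, 1, 0, 1, 1]
transition_matrix = np.array([[0.2, 0.8], [0.6, 0.4]])

seq = np.asarray(sequence)
# pair each state with its successor and look everything up in one call
result = transition_matrix[seq[:-1], seq[1:]]
print(result)  # [0.8 0.6 0.8 0.4]
```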
I have a Python NxN numpy pairwise array (matrix) of double values. Each array element, e.g. (i, j), is a measurement between the i-th and j-th items. The diagonal, where i == j, is 1, as it's a pairwise measurement of an item with itself. This also means the 2D NxN numpy array is symmetric (one half of the array is identical to the other half across the diagonal), so it can be represented in triangular form.
A truncated representation:
[[1. 0.11428571 0.04615385 ... 0.13888889 0.07954545 0.05494505]
[0.11428571 1. 0.09836066 ... 0.06578947 0.09302326 0.07954545]
[0.04615385 0.09836066 1. ... 0.07843137 0.09821429 0.11711712]
...
[0.13888889 0.06578947 0.07843137 ... 1. 0.34313725 0.31428571]
[0.07954545 0.09302326 0.09821429 ... 0.34313725 1. 0.64130435]
[0.05494505 0.07954545 0.11711712 ... 0.31428571 0.64130435 1. ]]
I want to get out the smallest N values whilst not including the pairwise values twice, as would be the case due to the pair-wise duplication e.g., (5,6) == (6,5), and I do not want to include any of the identical diagonal values of 1 where i == j.
I understand that numpy has the partition method and I've seen plenty of examples for a flat array, but I'm struggling to find anything straightforward for a pair-wise comparison matrix.
EDIT #1
Based on the first response below, I implemented:
seventyPercentInt: int = round((populationSizeInt/100)*70)
upperTriangleArray = dataArray[np.triu_indices(len(dataArray),1)]
seventyPercentArray = upperTriangleArray[np.argpartition(upperTriangleArray,seventyPercentInt)][0:seventyPercentInt]
print(len(np.unique(seventyPercentArray)))
The upperTriangleArray numpy array has 1133265 elements to pick the lowest k from. In this case k is represented by seventyPercentInt, which is around 1054 values. However, when I apply np.argpartition only the value of 0 is returned.
The flat array upperTriangleArray is reduced to a shape (1133265,).
SOLUTION
As per the first reply below (the accepted answer), my code that worked:
upperTriangleArray = dataArray[np.triu_indices(len(dataArray),1)]
seventyPercentInt: int = round((len(upperTriangleArray)/100)*70)
seventyPercentArray = upperTriangleArray[np.argpartition(upperTriangleArray,seventyPercentInt)][0:seventyPercentInt]
I ran into some slight trouble (of my own making) with the seventyPercentInt. Rather than taking 70% of the pairwise elements, I took 70% of the elements to be compared. Two very different values.
You can use np.triu_indices to keep only the values of the upper triangle.
Then you can use np.argpartition as in the example below.
import numpy as np

A = np.array([[1.0, 0.1, 0.2, 0.3],
              [0.1, 1.0, 0.4, 0.5],
              [0.2, 0.4, 1.0, 0.6],
              [0.3, 0.5, 0.6, 1.0]])

A_upper_triangle = A[np.triu_indices(len(A), 1)]
print(A_upper_triangle)
# returns [0.1 0.2 0.3 0.4 0.5 0.6]

k = 2
print(A_upper_triangle[np.argpartition(A_upper_triangle, k)][0:k])
# returns the two smallest values: [0.1 0.2]
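If you also need to know which (i, j) pair each of the k smallest values came from, the index arrays returned by np.triu_indices can be reused. This is a sketch building on the example above, not part of the original answer:

```python
import numpy as np

A = np.array([[1.0, 0.1, 0.2, 0.3],
              [0.1, 1.0, 0.4, 0.5],
              [0.2, 0.4, 1.0, 0.6],
              [0.3, 0.5, 0.6, 1.0]])

rows, cols = np.triu_indices(len(A), 1)
vals = A[rows, cols]

k = 2
smallest = np.argpartition(vals, k)[:k]  # positions of the k smallest values
pairs = list(zip(rows[smallest], cols[smallest]))
print(pairs)  # the (i, j) pairs holding the k smallest off-diagonal values
```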
Problem:
I have a dataset of multiple binary neurons recorded in parallel. Whenever a neuron fires, the time of fire is recorded into a NumPy array, and the ID of the recorded neuron is recorded in a secondary NumPy array. For example, if [0.25, 0.31, 0.41, 0.50] is the array of firing times, and [0, 5, 2, 1] is the array of IDs, then neuron 0 fired at 0.25s, neuron 5 fired at 0.31s, and so on.
Expected Output:
Instead of this, I want an array for each neuron, with the times it fired.
So, if neuron 0 fired at [0.25, 1.25, 4.0], then that'll be the array for neuron 0, and each neuron will have its own array.
What I've tried:
The brute force approach of creating an array for each neuron and looping through and appending to the correct array will work, but appending is slow.
A better approach than looping over individual elements is to use np.where() to get the locations of each ID, and then index into the array of times:
individual_times = []
for i in range(minindex, maxindex+1):
    i_indices = np.where(indices == i)[0]
    individual_times.append(times[i_indices])
This is not very fast, though, so if there's a faster way to do this, I'd love to know. Thanks!
Edit: Reproducible input and output:
#input:
times = np.array([0.25, 0.52, 1.25, 4.0, 6.78])
IDs = np.array([0, 1, 0, 0, 2])
#expected output
individual_times = [np.array([0.25, 1.25, 4.0]), np.array([0.52]), np.array([6.78])]
You can try boolean masking (note that np.unique already returns the IDs sorted):
times = np.array([0.25, 0.52, 1.25, 4.0, 6.78])
IDs = np.array([0, 1, 0, 0, 2])

individual_times = []
for unique_id in np.unique(IDs):
    individual_times.append(times[IDs == unique_id])

print(individual_times)
[array([0.25, 1.25, 4. ]), array([0.52]), array([6.78])]
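If even the per-ID masking loop is too slow (many distinct neurons), the grouping can be done with a single sort instead of one mask per ID — a sketch using np.argsort and np.split:

```python
import numpy as np

times = np.array([0.25, 0.52, 1.25, 4.0, 6.78])
IDs = np.array([0, 1, 0, 0, 2])

order = np.argsort(IDs, kind='stable')   # order spikes by neuron ID
sorted_ids = IDs[order]
sorted_times = times[order]
# positions where the ID changes mark the start of each neuron's block
boundaries = np.flatnonzero(np.diff(sorted_ids)) + 1
individual_times = np.split(sorted_times, boundaries)
print(individual_times)
# [array([0.25, 1.25, 4. ]), array([0.52]), array([6.78])]
```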
So I have a script that reads multiple values from a file, and creates lists with these values.
I have a list of fractional coordinates of atoms (where each element is a list containing x, y, z coords) and their corresponding charges in another list. I also have three values that are scalars and correspond to the dimensions I am working in.
Here is a snippet of how the lists look:
Coords = [[0.982309, 0.927798, 0.458125], [0.017691, 0.072202, 0.958125], [0.482309, 0.572202, 0.458125], [0.517691, 0.427798, 0.958125], [0.878457, 0.311996, 0.227878], [0.121543, 0.688004, 0.727878], [0.378457, 0.188004, 0.227878], [0.621543, 0.811996, 0.727878], [0.586004, 0.178088, 0.37778], [0.413997, 0.821912, 0.87778], [0.086003, 0.321912, 0.37778], ......]
Charges = [0.18, 0.18, 0.18, 0.18, 0.17, 0.17, 0.17, 0.17, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.39, 0.39, 0.39, 0.39, 0.4, 0.4, 0.4, 0.4, 0.17, 0.17, 0.17....]
Then I have the three dimension values that I will call a, b, c.
Here is what I need to calculate:
I need to do an inner (dot) product of the fractional coordinates with the dimensions, for each atom. For each sub-list, I need to multiply the first element by a, the second element by b and the third by c. I also need to multiply each of these components by the corresponding charge. This will give me the dipole moments that I ultimately need to calculate.
Detailed example:
So I want to take each element in the list. We'll start with element 0.
So coords[0] = [0.982309, 0.927798, 0.458125], and charge[0] = 0.18
What I want to do, is take the first element of coords[0], multiply that by a. Then the second by b, and the third by c. Then, I want to multiply each of these elements by charges[0], 0.18, and sum the three together. That gives me the dipole moment due to one atom. I want to repeat this for every atom, so coords[1] with charges[1], coords[2] with charges[2] and so forth. Then, I want to sum all of these together, and divide by a * b * c to give the polarization.
I'm still new to using Python, and so I am quite unsure on even where to start with this! I hope I have explained well enough, but if not I can clarify where needed.
Side note:
I also then need to change some of these fractional coordinates and repeat the calculation. I am fairly certain I know how to do this, here's how it would look.
for x in displacement:
    fracCoord[0] = fracCoord[0] + x
So I would change the relevant values in the fractional coordinates list by constant amount before repeating the calculations.
You could use the numpy package; it is designed for such (and more complex) numerical computations based on matrices and arrays.
Then, you have:
import numpy as np
# Assume you have N atoms.
# N x 3 matrix where each row is the coordinates of an atom
CoordsMat = np.array(Coords)
# N x 1 vector where each value is charge of atom
ChargesVec = np.array(Charges)
# 3 x 1 vector, the weights for each coordinate
dims = np.array([a, b, c])
# N x 1 vector, computes dot product for each atom's coordinates with the weights
positionVecs = np.matmul(CoordsMat, dims)
# N x 1 vector, scales each dot product by the charges vector
dipoleMoments = np.multiply(positionVecs, ChargesVec)
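The snippet above stops at the per-atom dipole moments; to get the polarization described in the question, sum them and divide by a * b * c. A minimal runnable sketch — the coordinate, charge, and dimension values here are placeholders, not the real data:

```python
import numpy as np

# placeholder data for illustration
Coords = [[0.982309, 0.927798, 0.458125],
          [0.017691, 0.072202, 0.958125]]
Charges = [0.18, 0.18]
a, b, c = 10.0, 10.0, 10.0  # placeholder cell dimensions

CoordsMat = np.array(Coords)
ChargesVec = np.array(Charges)
dims = np.array([a, b, c])

# dot product of each atom's coordinates with the dimensions, scaled by charge
dipoleMoments = np.matmul(CoordsMat, dims) * ChargesVec
# sum over atoms and divide by the cell volume
polarization = dipoleMoments.sum() / (a * b * c)
```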
I was playing around with scipy's kmeans2 algorithm until I noticed a problem. Consider the following code:
import numpy as np
from scipy.cluster.vq import kmeans2

x = np.array([[0.1, 0.0], [0.0, 0.1], [1.1, 1.0], [1.0, 1.1]])
c = np.array([[3, 3], [4, 4]])
kmeans2(x, c, minit='matrix', iter=100)
You'd expect this code (rather deviously) to just converge to a solution with the following centroids: [0.05, 0.05] and [1.05, 1.05].
However, the code returns this:
(array([[ 0.55, 0.55],
[ 4. , 4. ]]), array([0, 0, 0, 0], dtype=int32))
It seems like the k-means algorithm takes its initial centroids into account when finding the new centroids. Why is this? How can I prevent this from happening?
I haven't really worked on this for a while, but I randomly had a eureka moment in which I figured out why my problem was occurring:
Although the results seem kind of strange, if you look at how k-means works they are actually easy to explain: in the first epoch of k-means, all four data points are assigned to the [3, 3] centroid, because that centroid is closest to all of them. The mean of those data points is [0.55, 0.55], so that is where the centroid moves. No matter how many epochs you run after that, it will stay there (it isn't 'attracted' to any other data points, because there aren't any), and the other centroid (initialised as [4, 4]) stays put because none of the data points are closer to it than to the other centroid. That's it.
I am having a problem fitting the following data to the range 0.1-1.0:
t=[0.23,0.76,0.12]
Obviously each item in the t-list falls within the range 0.1-1.0, but the output of my code indicates the opposite.
My attempt
import numpy as np
>>> g=np.arange(0.1,1.0,0.1)
>>> t=[0.23,0.76,0.12]
>>> t2=[x for x in t if x in g]
>>> t2
[]
Desired output:[0.23,0.76,0.12]
I clearly understand that using an interval of 0.1 makes it difficult to find any of the t-list items in the specified arange. I could make some adjustments, but my range is fixed and my data is large, which makes it practically impossible to keep adjusting the range.
Any suggestions on how to get around this? Thanks!
Did you try to inspect g?
>>> g
array([ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
So clearly none of your elements is in g.
You are probably looking for something like
>>> [x for x in t if 0.1<=x<=1.0]
[0.23, 0.76, 0.12]
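Since the data is large, the same range check can also be done as a vectorized NumPy boolean mask rather than a list comprehension (a sketch of the equivalent filter):

```python
import numpy as np

t = np.array([0.23, 0.76, 0.12])
t2 = t[(t >= 0.1) & (t <= 1.0)]
print(t2)  # [0.23 0.76 0.12]
```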