I want to try to use pykalman to apply a Kalman filter to data from sensor variables. Now I have a doubt about the observation data: in the example, are the 3 observations two variables measured at three instants of time, or 3 variables measured at a single instant?
>>> from pykalman import KalmanFilter
>>> import numpy as np
>>> kf = KalmanFilter(transition_matrices = [[1, 1], [0, 1]], observation_matrices = [[0.1, 0.5], [-0.3, 0.0]])
>>> measurements = np.asarray([[1,0], [0,0], [0,1]]) # 3 observations
>>> kf = kf.em(measurements, n_iter=5)
>>> (filtered_state_means, filtered_state_covariances) = kf.filter(measurements)
>>> (smoothed_state_means, smoothed_state_covariances) = kf.smooth(measurements)
Let's see:

transition_matrices = [[1, 1], [0, 1]]

means

$$F = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$

So your state vector consists of 2 elements, for example position and velocity:

$$x = \begin{bmatrix} \text{position} \\ \text{velocity} \end{bmatrix}$$

observation_matrices = [[0.1, 0.5], [-0.3, 0.0]]

means

$$H = \begin{bmatrix} 0.1 & 0.5 \\ -0.3 & 0.0 \end{bmatrix}$$

The dimension of an observation matrix should be [n_dim_obs, n_dim_state], so your measurement vector also consists of 2 elements.
Conclusion: the code has 3 observations of two variables measured at 3 different points in time.
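In other words, pykalman expects the measurement array to have shape (n_timesteps, n_dim_obs), so a quick shape check (using the measurements from the snippet above) makes the interpretation explicit:

import numpy as np

measurements = np.asarray([[1, 0], [0, 0], [0, 1]])
print(measurements.shape)  # (3, 2): 3 time steps, 2 observed variables at each step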
You can change the given code so that it processes one measurement per time step: use kf.filter_update() for each measurement instead of kf.filter() on all measurements at once:
from pykalman import KalmanFilter
import numpy as np

kf = KalmanFilter(transition_matrices=[[1, 1], [0, 1]], observation_matrices=[[0.1, 0.5], [-0.3, 0.0]])
measurements = np.asarray([[1, 0], [0, 0], [0, 1]])  # 3 observations
kf = kf.em(measurements, n_iter=5)

# start from the EM-estimated initial state and update once per measurement
filtered_state_means = kf.initial_state_mean
filtered_state_covariances = kf.initial_state_covariance

for m in measurements:
    filtered_state_means, filtered_state_covariances = kf.filter_update(
        filtered_state_means,
        filtered_state_covariances,
        observation=m)

print(filtered_state_means)
Output:
[-1.69112511 0.30509999]
The result is slightly different from the one you get with kf.filter(), because kf.filter() does not perform a prediction step on the first measurement, though I think it should.
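You can see the difference by comparing the final row of the batch filter with the incremental result above (reusing the same kf and measurements):

# the batch filter's last row is its estimate after the final measurement
batch_means, batch_covs = kf.filter(measurements)
print(batch_means[-1])  # close to, but not identical to, the filter_update() loop result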
I have been trying to use fastdtw to calculate similarity.
Here is the working example; the reported similarity is 0.916%.
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean
import numpy as np

dataSetI = [1, 0.5, 2, 2]
dataSetII = [1, 1, 1, 0.51, 2, 1]
x = np.array(dataSetI)
y = np.array(dataSetII)
distance, path = fastdtw(x, y, dist=euclidean)
print("{:.3f}%".format(similarity))  # prints 0.916%; the conversion from distance to similarity is not shown here
But the datasets I am going to compare are multidimensional arrays whose inner lists have varying lengths.
Example:
dataSetI = [[1, 0.5], [2, 2], []]
dataSetII = [[1, 1, 3, 5], [1, 0.51], [2, 1, 5, 6, 7]]
x = np.array(dataSetI)
y = np.array(dataSetII)
distance, path = fastdtw(x, y, dist=euclidean)  # error here
ValueError: setting an array element with a sequence.
So my question is: am I able to do this using fastdtw? Or is there any library able to do this? Please let me know. Thanks.
Nooo! Don't use FastDTW:

"FastDTW is approximate and Generally Slower than the Algorithm it Approximates"
Renjie Wu, Eamonn J. Keogh: ICDE 2021: 2327-2328
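If you need DTW on ragged 1-D sequences, the exact algorithm is short enough to write yourself. Here is a minimal dynamic-programming sketch, assuming absolute difference as the local cost:

import numpy as np

def dtw(a, b):
    # exact DTW distance between two 1-D sequences
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor paths
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw([1, 0.5, 2, 2], [1, 1, 1, 0.51, 2, 1]))  # exact DTW distance

For sequences of equal-length vectors, swap the abs(...) cost for a vector norm such as np.linalg.norm(a[i - 1] - b[j - 1]).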
I have a data frame that looks like the one below. Notice that the index is not sequential.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.array([[0.1, 0.2, 0.1, 1], [0.4, 0.5, 0, 0], [0.2, 0.4, 0.2, 0],
                            [0.3, 0.1, 0.2, 1], [0.4, 0.2, 0.2, 1]]),
                  columns=['a', 'b', 'c', 'manager'])
df = df.set_index(pd.Index([0, 2, 10, 14, 16], name='id'))
I would like to calculate the cosine distance between each row and those that have 1 in manager (excluding itself), then take the average and append it to a new column cos_distance. For example, for row 0, I will get the cosine distance with rows 3 and 4 and then take the average. How do I add the condition to restrict it to rows with 1 in the manager column only?
I tried running the code below, but probably because we don't have sequential indices, it returned an empty list.
from scipy.spatial.distance import cosine as cos

x = df.iloc[:, :3]
manager = df[df['manager'] == 1].iloc[:, :3]
lead_cos = []
for i in range(0):
    person_cos = []
    for j in range(0, len(manager)):
        person_cos.append(cos(x.loc[i], manager.loc[j]))
    lead_cos.append(np.average(person_cos))
lead_cos
Desired output:
This is what I tried. I'm not getting exactly the values in your desired output, probably because for each "manager" I include the manager itself in the cosine calculation (maybe you need to avoid that too, not sure).
EDIT: I managed to avoid repeating the current manager. However, index 14 gives me a value different from yours. I also included rounding to 2 decimal places.
from scipy.spatial.distance import cosine as cos
import pandas as pd
import numpy as np

df = pd.DataFrame(np.array([[0.1, 0.2, 0.1, 1], [0.4, 0.5, 0, 0], [0.2, 0.4, 0.2, 0],
                            [0.3, 0.1, 0.2, 1], [0.4, 0.2, 0.2, 1]]),
                  columns=['a', 'b', 'c', 'manager'])
df = df.set_index(pd.Index([0, 2, 10, 14, 16], name='id'))

n = df.shape[0]
x = df.iloc[:, :3]
manager = df[df['manager'] == 1].iloc[:, :3]
n_man = manager.shape[0]

lead_cos = []
for i in range(n):
    person_cos = []
    for j in range(n_man):
        # skip the pair when the row and the manager are the same record
        if x.index[i] != manager.index[j]:
            person_cos.append(cos(x.values.tolist()[i], manager.values.tolist()[j]))
    lead_cos.append(round(np.average(person_cos), 2))

df['lead_cos'] = lead_cos
print(df)
Output:
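As a side note, the double loop can be replaced with a single cdist call. Here is a vectorized sketch (assuming the same df as above) that masks each manager's own row before averaging:

from scipy.spatial.distance import cdist

feats = df[['a', 'b', 'c']].to_numpy()
is_mgr = df['manager'].to_numpy() == 1

# cosine distance from every row to every manager, shape (n_rows, n_managers)
d = cdist(feats, feats[is_mgr], metric='cosine')

# mask the pairs where the row and the manager are the same record
same = df.index.to_numpy()[:, None] == df.index.to_numpy()[is_mgr][None, :]
d = np.ma.masked_array(d, mask=same)

df['lead_cos'] = np.round(d.mean(axis=1), 2)
print(df)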
I am using the following example from the scipy.spatial.KDTree documentation:
from scipy import spatial
import numpy as np

x, y = np.mgrid[0:5, 2:8]
tree = spatial.KDTree(list(zip(x.ravel(), y.ravel())))
pts = np.array([[0, 0], [2.1, 2.9]])
idx = tree.query(pts)[1]
data = tree.data[??????????]
If I input two arbitrary points (see variable pts), I am looking to return all pairs of coordinates that lie within the rectangle defined by the two points (KDTree finds the closest neighbour). So in this case:
array([[0, 0],
[0, 1],
[0, 2],
[1, 0],
[1, 1],
[1, 2],
[2, 0],
[2, 1],
[2, 2]])
How can I achieve that from the tree data?
Seems that I found a solution:
from scipy import spatial
import numpy as np

x, y = np.mgrid[0:5, 0:5]
tree = spatial.KDTree(list(zip(x.ravel(), y.ravel())))
pts = np.array([[0, 0], [2.1, 2.2]])
idx = tree.query(pts)[1]
data = tree.data[[idx[0], idx[1]]]

# keep every tree point inside the axis-aligned rectangle
# spanned by the two nearest neighbours found above
rectangle = tree.data[np.where(
    (tree.data[:, 0] >= min(data[:, 0])) & (tree.data[:, 0] <= max(data[:, 0])) &
    (tree.data[:, 1] >= min(data[:, 1])) & (tree.data[:, 1] <= max(data[:, 1])))]
However, I would love to see a solution using the query option!
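One option that stays within the query family is query_ball_point() with the Chebyshev norm (p=np.inf), which searches a square around a centre point. A sketch under that assumption, using the tree and pts from above and filtering the square down to the exact rectangle:

# bounding box spanned by the two query points
lo = np.minimum(pts[0], pts[1])
hi = np.maximum(pts[0], pts[1])

# p=np.inf makes the ball a square; use the larger half-extent as its radius
center = (lo + hi) / 2
candidates = tree.data[tree.query_ball_point(center, ((hi - lo) / 2).max(), p=np.inf)]

# keep only the points inside the (possibly non-square) rectangle
inside = candidates[((candidates >= lo) & (candidates <= hi)).all(axis=1)]
print(inside)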
All the examples I've seen in the pykalman documentation work on a given dataset; I was wondering how it could be used by feeding a single observation at a time while taking into account the time delta between observations.
From the documentation:
from pykalman import KalmanFilter
import numpy as np
kf = KalmanFilter(transition_matrices = [[1, 1], [0, 1]], observation_matrices = [[0.1, 0.5], [-0.3, 0.0]])
measurements = np.asarray([[1,0], [0,0], [0,1]]) # 3 observations
kf = kf.em(measurements, n_iter=5)
(filtered_state_means, filtered_state_covariances) = kf.filter(measurements)
(smoothed_state_means, smoothed_state_covariances) = kf.smooth(measurements)
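filter_update() accepts per-step model matrices, so one way to account for the time delta is to rebuild the transition matrix at each step. A sketch, assuming a constant-velocity model with F(dt) = [[1, dt], [0, 1]] and a hypothetical list deltas of time gaps between the observations:

mean = kf.initial_state_mean
cov = kf.initial_state_covariance
deltas = [1.0, 0.5, 2.0]  # hypothetical time gaps, one per measurement

for dt, m in zip(deltas, measurements):
    F = np.asarray([[1, dt], [0, 1]])  # transition matrix for this step
    mean, cov = kf.filter_update(mean, cov, observation=m, transition_matrix=F)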
I'm generating samples in TensorFlow with tf.multinomial, and I'm looking for a way to return the probability associated with each randomly selected element. So in the following case:
logits = [[-1., 0., 1.], [1., 1., 1.], [0., 1., 2.]]
samples = tf.multinomial(logits, 2)

with tf.Session() as sess:
    sess.run(samples)
Instead of having
[[1, 2], [0, 1], [1, 1]]
as result, I'd like to see something like
[[(1, 0.244728), (2, 0.66524)],
[(0, 0.33333), (1, 0.33333)],
[(1, 0.244728), (1, 0.244728)]]
Is there any way to achieve this?
I'm confused: does TensorFlow do some sort of transformation on the inside that turns your logits into probabilities? The multinomial distribution takes as parameters a set of positional probabilities that determine, probabilistically, the likelihood of each outcome (positionally) being sampled, i.e.

# this is all pseudocode
x = multinomial([.2, .3, .5])
y ~ x
# this will give a value of 0 20% of the time,
# a value of 1 30% of the time,
# and a value of 2 50% of the time

Therefore your probabilities might be your logits.

Looking at https://www.tensorflow.org/api_docs/python/tf/multinomial, you see it states they are "unnormalized log probabilities", so if you can apply that transformation, you have the probabilities.
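That transformation is just a softmax over each row. A quick sketch recovering the probabilities from the logits in the question:

import tensorflow as tf

logits = tf.constant([[-1., 0., 1.], [1., 1., 1.], [0., 1., 2.]])
probs = tf.nn.softmax(logits)  # exp(logits), normalized per row

with tf.Session() as sess:
    print(sess.run(probs))  # row 0 is about [0.090, 0.245, 0.665]; row 1 is [1/3, 1/3, 1/3]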
You can try tf.gather_nd; for example:
>>> import tensorflow as tf
>>> tf.enable_eager_execution()
>>> probs = tf.constant([[0.5, 0.2, 0.1, 0.2], [0.6, 0.1, 0.1, 0.1]], dtype=tf.float32)
>>> # tf.multinomial expects unnormalized log-probabilities, so pass the log of the probabilities
>>> idx = tf.multinomial(tf.log(probs), 1)
>>> row_indices = tf.range(probs.get_shape()[0], dtype=tf.int64)
>>> full_indices = tf.stack([row_indices, tf.squeeze(idx)], axis=1)
>>> rs = tf.gather_nd(probs, full_indices)
Or you can use tf.distributions.Multinomial. The advantage is that you do not need to care about the batch_size in the above code: it works under a varying batch_size when you set batch_size=None. Here is a simple example:
multinomial = tf.distributions.Multinomial(
    total_count=tf.constant(1, dtype=tf.float32),  # one draw per distribution
    probs=probs)

sampled_actions = multinomial.sample()  # one-hot sample for each row in the batch
predicted_actions = tf.argmax(sampled_actions, axis=-1)
action_probs = sampled_actions * probs  # one-hot mask keeps only the sampled entry's probability
action_probs = tf.reduce_sum(action_probs, axis=-1)
I think this is what you want to do. I prefer the latter because it is more flexible and elegant.