mlpy - Dynamic Time Warping depends on x? - python

I am trying to get the distance between these two arrays shown below by DTW.
I am using the Python mlpy package that offers
dist, cost, path = mlpy.dtw_std(y1, y2, dist_only=False)
I understand that DTW does take care of the "shifting". In addition, as can be seen from above, the mlpy.dtw_std() only takes in 2 1-D arrays. So I expect that no matter how I left/right shift my curves, the dist returned by the function should never change.
However after shifting my green curve a bit to the right, the dist returned by mlpy.dtw_std() changes!
Before shifting: Python mlpy.dtw_std reports dist = 14.014
After shifting: Python mlpy.dtw_std reports dist = 38.078
Obviously, since the curves are still those two curves, I don't expect the distances to be different!
Why is that? Where did I go wrong?

Let me reiterate what I have understood; please correct me if I am going wrong anywhere. I observe that in both of your plots the blue 1D series stays identical, while the green one is being stretched; how you did that, you explained in your comment of Sep 19 '13 at 9:36. Your premise is that because (1) DTW 'takes care' of time shift and (2) all you are doing is stretching one time series length-wise without affecting its y-values, (inference) you expect the distance to remain the same.
There is a missing link between (1), (2) and the inference: the individual distances corresponding to the mappings WILL change as you change the signals themselves, and this results in a different overall distance. Plot the warping paths and the cost grid to see it for yourself.
Let's take an oversimplified case...
Let
a=range(0,101,5) = [0,5,10,15...95, 100]
and b=range(0,101,5) = [0,5,10,15...95, 100].
Now, intuitively speaking, one would expect a one-to-one correspondence between the two signals (for the DTW mapping), and the distance for every mapping to be 0, since the signals look identical.
Now if we make, b=range(0,101,4) = [0,4,8,12...96,100],
The DTW mapping between a and b would still start with a's 0 mapped to b's 0 and end with a's 100 mapped to b's 100 (boundary constraints). Also, because DTW 'takes care' of time shift, I would expect the 20s, 40s, 60s and 80s of the two signals to be mapped to one another. (I haven't tried DTWing these two myself and am saying this from intuition, so please check. Non-intuitive warpings are possible too, depending on the step patterns allowed and any global constraints, but let's go with the intuitive warping for the sake of simplicity.)
For the remaining data points the mapped distances are clearly non-zero, so the overall distance is non-zero as well: the overall cost has changed from zero to something non-zero.
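A minimal sketch of this toy case, using the same mlpy.dtw_std call as in the question (assuming mlpy is installed; the exact distance value is not the point, only that it comes out non-zero even though both ramps run from 0 to 100):
import numpy as np
import mlpy
a = np.arange(0, 101, 5, dtype=float)   # 21 samples: 0, 5, ..., 100
b = np.arange(0, 101, 4, dtype=float)   # 26 samples: 0, 4, ..., 100
# dist_only=False also returns the accumulated cost matrix and the warping path
dist, cost, path = mlpy.dtw_std(a, b, dist_only=False)
print(dist)                     # non-zero: the mapped values differ pointwise
for i, j in zip(*path):         # inspect which samples were mapped to each other
    print(a[i], '->', b[j])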
Now, this was the case when our signals were very simplistic and linearly increasing. Imagine the variability that comes into the picture when you have real-life non-monotonic signals and need to find the time warping between them. :)
(PS: Please don't forget to upvote answer :D). Thanks.

Obviously, the curves are not identical, and therefore the distance function must not be 0 (otherwise, it is not a distance by definition).
What IS "relatively large"? The distance probably is not infinite, is it?
140 points in time, each with a small delta, this still adds up to a non-zero number.
The distance "New York" to "Beijing" is roughly 11018 km. Or 1101800000 mm.
The distance to Alpha Centauri is small, just 4.34 light-years. That is the nearest other stellar system to us...
Compare with the distance to a non-similar series; that distance should be much larger.

Related

How to integrate discrete function

I need to integrate a certain function that I have specified as discrete values at discrete arguments (I want to compute the area under the graph I get).
I.e., from the earlier part of the code I literally have:
args = [a1, a2, a3, a4]
values = [v1, v2, v3, v4]
where value v1 corresponds to a1, and so on. If it matters, I have args set in advance with a specific discretization width, and I compute the values with a ready-made function.
I am attaching a figure.
And putting this function, which gave me the 'values' array, into integrate.quad() gives me a warning:
IntegrationWarning: The maximum number of subdivisions (50) has been achieved. If increasing the limit yields no improvement it is advised to analyze the integrand in order to determine the difficulties. If the position of a local difficulty can be determined (singularity, discontinuity) one will probably gain from splitting up the interval and calling the integrator on the subranges. Perhaps a special-purpose integrator should be used.
How can I integrate this? I'm mulling over the scipy documentation, but I can't seem to put it together. After all, args is already a finite set of discrete points.
I am guessing that before passing the function to quad you did some kind of interpolation on the data. In general this is a misguided approach.
Integration and interpolation are very closely related. An integral requires you to compute the area under the curve, and for that you must know the value of the function at any given point; hence, starting from a set of data, it is natural to want to interpolate it first. Yet the quad routine does not know that you started with a limited set of data: it assumes the function you gave it is "perfect" and will do its best to compute the area under it. However, the interpolated function is just a guess at the values between the given points, so integrating an interpolated function is a waste of time.
As MB-F said, in the discrete case you should simply sum the points, weighting each by the step size around it. You can do this the naïve way by pretending the function is made of rectangles, or do what MB-F suggested and pretend the data points are connected by straight lines (the trapezoidal rule). Going one step further, you can pretend that the curve connecting the data points is smooth (often true for physical systems) and use the Simpson integration implemented in scipy.
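A sketch of those two options (made-up data; simpson lives in scipy.integrate and is called simps in older SciPy releases):
import numpy as np
from scipy.integrate import simpson   # named simps in older SciPy versions
args = np.array([0.0, 0.1, 0.2, 0.3, 0.4])    # hypothetical discretized arguments
values = np.array([1.0, 1.2, 0.9, 1.1, 1.0])  # hypothetical f(args)
area_trapezoid = np.trapz(values, args)       # straight lines between points
area_simpson = simpson(values, x=args)        # assumes a smooth curve through them
print(area_trapezoid, area_simpson)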
Since you only have a discrete approximation of the function, integration reduces to summation.
As a simple approximation of an integral, try this:
import numpy as np
midpoints = (values[:-1] + values[1:]) / 2   # average of each pair of neighbouring values
steps = np.diff(args)                        # width of each interval
area = np.sum(midpoints * steps)             # sum of the trapezoid areas
(Assuming args and values are numpy arrays and the function is value = f(arg).)
That approach sums the areas of all trapezoids between adjacent data points (the trapezoidal rule).

How to deal with functions that approach infinity in NumPy?

In a book on matplotlib I found a plot of 1/sin(x) that looks similar to this one that I made:
I used the domain
input = np.mgrid[0 : 1000 : 200j]
What confuses me to an extreme extent is that the sine function is simply periodic; I don't understand why the maximal absolute value is decreasing. Plotting the same function in Wolfram Alpha does not show this decreasing effect. Using a different step amount,
input = np.mgrid[0 : 1000 : 300j]
delivers a different result:
where we also have this decreasing tendency in maximal absolute value.
So my questions are:
How can I make a plot like this consistent i.e. independent of step-size/step-amount?
Why does one see this decreasing tendency even though the function is purely periodic?
The period of sin (2π ≈ 6.283) is barely larger than the sampling interval (1000/199 ≈ 5.025), so what you're seeing is aliasing between the sampling frequency and a multiple of the true frequency. Since one of the roots is at 0, the smallest discrepancy that happens to exist between the first few samples and a multiple of π itself scales linearly away from 0, producing a 1/x envelope.
In this example, input[5] is 5·(1000/(200−1)) ≈ 8π − 0.00711, so the function is about −141 there, as shown. input[10] is of course ≈ 16π − 0.01423, so the function is about −70, and so on as long as the discrepancy stays much smaller than π.
It’s possible for some one of the quasi-periodic sample sequences to eventually land even closer to nπ, producing a more complicated pattern like that in the second plot.
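A quick way to check those numbers, on the same grid as in the question:
import numpy as np
x = np.mgrid[0:1000:200j]                  # 200 samples from 0 to 1000, as in the question
print(1000 / (200 - 1))                    # sampling step, about 5.025
print(8*np.pi - x[5], 1/np.sin(x[5]))      # ~0.0071 and ~-141
print(16*np.pi - x[10], 1/np.sin(x[10]))   # ~0.0142 and ~-70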
Why does one see this decreasing tendency even though the function is purely periodic?
Keep in mind that at every multiple of pi the function actually goes to infinity, and the size of the spike you see only reflects the largest of the sampled values at which the function was still finite. So you get a big spike if you happen to sample a point where the function is large but still representable as a float.
To be able to plot anything, matplotlib throws away values that do not make sense, like the np.nan you get at multiples of pi and the ±np.inf you get for values extremely close to them. I believe what happens is that one step away from zero you happen to get a value that is still finite but very large, while near pi and its multiples the largest values get thrown away.
How can I make a plot like this consistent i.e. independent of step-size/step-amount?
You get strange behaviour around the values where your function becomes unreasonably large. Just pick a y-limit to avoid plotting those extremely large values.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(10**-10, 50, 10**4)   # start just above 0 to avoid dividing by sin(0) = 0
plt.plot(x, 1/np.sin(x))
plt.ylim((-15, 15))                   # clip the spikes near the poles at multiples of pi
plt.show()

How to programmatically report reasonable local maxima and minima in a data set?

I have an array of y-values that are evenly spaced along the x-axis, and I need to programmatically find the "troughs". I think either Octave or Python3 is a good language choice for this problem, as I know both have strong math capabilities.
I thought of interpolating the function and looking for where the derivative is 0, but that would require my human eyes to first analyze the resulting graph to know where the maxima and minima already were; I need this entire thing to be automatic, so that it works with an arbitrary dataset.
It dawned on me that this problem likely has an existing solution in a Python3 or Octave function or library, but I could not find one. Does there exist a library to automatically report local maxima and minima within a dataset?
More Info
My current planned approach is to implement a sort of "n-day moving average" with a threshold. After initializing the first day moving average, I'll watch for the next moving average to move above or below it by a threshold. If it moves higher then I'll consider myself in a "rising" period. If it moves lower then I'm in a "falling" period. While I'm in a rising period, I'll update the maximum observed moving average until the current moving average is sufficiently below the previous maximum.
At this point, I'll consider myself in a "falling" period. I'll lock in the point where the moving average was previously highest, and then repeat except using inverse logic for the "falling" period.
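A rough sketch of that rising/falling state machine (the window size and threshold here are made-up parameters, not tuned for any particular data):
import numpy as np
def extrema_from_moving_average(y, window=5, threshold=0.1):
    # Smooth with an n-point moving average, then switch between "rising" and
    # "falling" whenever the average moves past the last extreme by `threshold`.
    avg = np.convolve(y, np.ones(window) / window, mode='valid')
    maxima, minima = [], []
    state = 'rising' if avg[1] >= avg[0] else 'falling'
    best_val, best_idx = avg[0], 0
    for i in range(1, len(avg)):
        a = avg[i]
        if state == 'rising':
            if a >= best_val:
                best_val, best_idx = a, i
            elif best_val - a > threshold:     # dropped far enough: lock in the maximum
                maxima.append(best_idx)
                state, best_val, best_idx = 'falling', a, i
        else:
            if a <= best_val:
                best_val, best_idx = a, i
            elif a - best_val > threshold:     # rose far enough: lock in the minimum
                minima.append(best_idx)
                state, best_val, best_idx = 'rising', a, i
    return maxima, minima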
It seemed to me that this is probably a pretty common problem though, so I'm sure there's an existing solution.
Python answer:
This is a common problem, with existing solutions.
Examples include:
peakutils
scipy find_peaks
see also this question
In all cases, you'll have to tune the parameters to get what you want.
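For instance, a minimal scipy.signal.find_peaks sketch (synthetic data; the prominence value is only illustrative):
import numpy as np
from scipy.signal import find_peaks
x = np.linspace(0, 10, 500)
y = np.sin(2 * x) + 0.1 * np.random.randn(500)   # a noisy oscillation
peaks, _ = find_peaks(y, prominence=0.5)      # indices of local maxima
troughs, _ = find_peaks(-y, prominence=0.5)   # troughs are peaks of the negated signal
print(x[peaks], x[troughs])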
Octave answer:
I believe immaximas and imregionalmax do exactly what you are looking for (depending on which of the two it is exactly that you are looking for - have a look at their documentation to see the difference).
These are part of the image package, but will obviously work on 1D signals too.
For more 'functional' zero-finding functions, there is also fzero etc.

Appropriate encoding using Particle Swarm Optimization

The Problem
I've been doing a bit of research on Particle Swarm Optimization, so I said I'd put it to the test.
The problem I'm trying to solve is the Balanced Partition Problem - or reduced simply to the Subset Sum Problem (where the sum is half of all the numbers).
It seems the generic formula for updating particle velocities is
v_i ← w·v_i + c1·r1·(pbest_i − x_i) + c2·r2·(gbest − x_i)
(the usual inertia, cognitive and social terms, with random factors r1, r2 in [0,1]), but I won't go into too much detail for this question.
Since there's no PSO attempt online for the Subset Sum Problem, I looked at the Travelling Salesman Problem instead.
Their approach for updating velocities involved taking sets of visited towns, subtracting one from another, and doing some manipulation on that.
I saw no relation between that and the formula above.
My Approach
So I scrapped the formula and tried my own approach to the Subset Sum Problem.
I basically used gbest and pbest to determine the probability of removing or adding a particular element to the subset.
i.e. if my problem space is [1,2,3,4,5] (the target is 7 or 8), my current particle (subset) is [1,None,3,None,None], and the gbest is [None,2,3,None,None], then there is a higher probability of keeping 3, adding 2 and removing 1, based on gbest.
I can post code but I don't think it's necessary; you get the idea (I'm using Python, btw, hence the None).
So basically this worked to an extent: I got decent solutions out, but it was very slow on larger data sets and values.
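One possible reading of that add/remove scheme, as a hedged sketch (the probabilities and function name are invented for illustration, not the asker's actual code):
import random
problem = [1, 2, 3, 4, 5]   # the problem space from the example above
def update(particle, pbest, gbest, follow_gbest=0.7, move_prob=0.5):
    # For each position, pick a guide (gbest more often than pbest) and, with
    # some probability, copy that guide's present/absent state for that element.
    new = list(particle)
    for i, value in enumerate(problem):
        guide = gbest if random.random() < follow_gbest else pbest
        if random.random() < move_prob:
            new[i] = value if guide[i] is not None else None
    return new
particle = [1, None, 3, None, None]
gbest = [None, 2, 3, None, None]
print(update(particle, particle, gbest))   # 3 tends to stay, 2 tends to appear, 1 tends to go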
My Question
Am I encoding the problem and updating the particle "velocities" in a smart way?
Is there a way to determine if this will converge correctly?
Is there a resource I can use to learn how to create convergent "update" formulas for specific problem spaces?
Thanks a lot in advance!
Encoding
Yes, you're encoding this correctly: each of your bit-maps (that's effectively what your 5-element lists are) is a particle.
Concept
Your conceptual problem with the equation is because your problem space is a discrete lattice graph, which doesn't lend itself immediately to the update step. For instance, if you want to get a finer granularity by adjusting your learning rate, you'd generally reduce it by some small factor (say, 3). In this space, what does it mean to take steps only 1/3 as large? That's why you have problems.
The main possibility I see is to create 3x as many particles, but then have the transition probabilities all divided by 3. This still doesn't satisfy very well, but it does simulate the process somewhat decently.
Discrete Steps
If you have a very large graph, where a high velocity could give you dozens of transitions in one step, you can utilize a smoother distance (loss or error) function to guide your model. With something this small, where you have no more than 5 steps between any two positions, it's hard to work with such a concept.
Instead, you utilize an error function based on the estimated distance to the solution. The easy one is to subtract the particle's total from the nearer of 7 or 8. A harder one is to estimate distance based on that difference and the particle elements "in play".
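The easy error function from the previous sentence, as a sketch on the [1,2,3,4,5] example with targets 7 and 8:
def error(particle):
    # distance from the particle's subset sum to the nearer balanced target
    total = sum(v for v in particle if v is not None)
    return min(abs(total - 7), abs(total - 8))
print(error([1, None, 3, None, None]))   # total 4 -> error 3
print(error([None, 2, None, None, 5]))   # total 7 -> error 0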
Proof of Convergence
Yes, there is a way to do it, but it requires some functional analysis. In general, you want to demonstrate that the error function is convex over the particle space. In other words, you'd have to prove that your error function is a reliable distance metric, at least as far as relative placement goes (i.e. prove that a lower error does imply you're closer to a solution).
Creating update formulae
No, this is a heuristic field, based on the shape of the problem space as defined by the particle coordinates, the error function, and the movement characteristics.
Extra recommendation
Your current allowable transitions are "add element" and "delete element".
Add "swap elements" to these: trade one present member for an absent one. This will allow the trivial error function to define a convex space for you, and you'll converge in very little time.

Find best interpolation nodes

I am studying physics and ran into a really interesting problem. I'm not an expert on programming, so please take this into account while reading.
I really hope that someone can help me with this problem, because I have been struggling with it for about two months now without any success.
So here is my Problem:
I have a bunch of data sets (more than 2, fewer than 20) from numerical calculations. Each set is given as measurement values against x. I have a set of sensors and want to find the best positions x for my sensors, such that the integral of the interpolation through the sensor readings comes as close as possible to the integral of the numerical data set.
As this sounds like a typical mathematical problem I started to look for some theorems but I did not find anything.
So I started to write a Python program based on the SLSQP minimizer. I chose it because it can handle bounds and constraints. (Note there is always a sensor at 0 and one at 1.)
Constraints: the sensor array must stay sorted at all times, so that x_i < x_{i+1}, and the x interval is normalized to [0,1].
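For reference, a minimal sketch of that kind of setup (synthetic data; the objective, gap size and starting values are invented here, not the asker's code):
import numpy as np
from scipy.optimize import minimize
# hypothetical dense data set on [0, 1] standing in for the numerical calculation
x_data = np.linspace(0.0, 1.0, 1000)
y_data = np.sin(3 * np.pi * x_data) ** 2
target = np.trapz(y_data, x_data)          # integral of the full data set
n_inner = 8                                # free sensors besides the fixed ones at 0 and 1
def objective(x_inner):
    nodes = np.concatenate(([0.0], np.sort(x_inner), [1.0]))
    readings = np.interp(nodes, x_data, y_data)          # sensor values at the nodes
    return (np.trapz(readings, nodes) - target) ** 2     # integral of the linear interpolant
# ordering constraints x_i + gap <= x_{i+1}, as described above
cons = [{'type': 'ineq', 'fun': lambda x, i=i: x[i + 1] - x[i] - 1e-3}
        for i in range(n_inner - 1)]
x0 = np.linspace(0.0, 1.0, n_inner + 2)[1:-1]            # homogeneous starting guess
res = minimize(objective, x0, method='SLSQP',
               bounds=[(0.0, 1.0)] * n_inner, constraints=cons)
print(res.x)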
Before doing an overall optimization I looked for good starting points and searched for the maxima, minima and linear regions of the given data sets. But an optimization over 40 values turned out to deliver bad results.
In my second try I searched for these points, defined certain areas, and optimized each area with 1 to 40 sensors. Then I compared the results and decided which areas were worth putting more sensors in. In the last step I wanted to do an overall optimization again. But this idea did not turn out to be a proper solution either, because that optimization had convergence problems as well.
The big problem was that my optimizer broke the boundaries. I handled this by interrupting the optimization, because once the boundaries were broken the final result was not correct. When this happens I reset to my initial setup, a homogeneous distribution. After that there are normally no more boundary violations, but the result then tends to be a homogeneous distribution too, which is often obviously not the optimal one.
As my algorithm works for simple examples and dies on more complex data, I think there is a general problem and not just some error in my code. Does anyone have an idea how to move on, or know some theory about this matter?
The attached plot shows the areas in different colors. The function is shown at the bottom and the sensor positions are represented as dots. Dots at y=1 are from the optimization with one sensor, dots at y=2 represent the results of the optimization with 2 sensors, and so on. As the program reaches higher sensor numbers, the distribution gets more and more homogeneous.
It is easy to see that as the number of sensors n goes to infinity you get a totally homogeneous distribution. But as far as I can see, this should not happen for just 10 sensors.
