Implementation of a threshold detection function in Python

I want to implement the following trigger function in Python:
Input:
time vector t [n dimensional numpy vector]
data vector y [n dimensional numpy vector] (values correspond to t vector)
threshold tr [float]
Threshold type vector tr_type [m dimensional list of int values]
Output:
Threshold time vector tr_time [m dimensional list of float values]
Function:
I would like to return tr_time, which holds the exact time values at which y crosses tr (preferably interpolated, which the code below does not do yet). Crossing means going from less than tr to greater than tr, or the other way around. The values in tr_time correspond to the entries of tr_type: each element of tr_type gives the number of the crossing and whether it is an upgoing or a downgoing crossing. For example, 1 means the first time y goes from less than tr to greater than tr, and -3 means the third time y goes from greater than tr to less than tr (counted along the time vector t).
For the moment I have the following code:
import numpy as np
import matplotlib.pyplot as plt
def trigger(t, y, tr, tr_type):
    # +1 marks an upward crossing, -1 a downward crossing
    triggermarker = np.diff(1 * (y > tr))
    positiveindices = [i for i, x in enumerate(triggermarker) if x == 1]
    negativeindices = [i for i, x in enumerate(triggermarker) if x == -1]
    triggertime = []
    for i in tr_type:
        if i >= 0:
            triggertime.append(t[positiveindices[i - 1]])
        elif i < 0:
            # -k means the k-th downward crossing, so index with -i - 1
            triggertime.append(t[negativeindices[-i - 1]])
    return triggertime
t = np.linspace(0, 20, 1000)
y = np.sin(t)
tr = 0.5
tr_type = [1, 2, -2]
print(trigger(t, y, tr, tr_type))
plt.plot(t, y)
plt.grid()
Now I'm pretty new to Python so I was wondering if there is a more Pythonic and more efficient way to implement this. For example without for loops or without the need to write separate code for upgoing or downgoing crossings.

You can use two masks: the first separates the values below and above the threshold, and the second applies np.diff to the first mask. If the values at i and i+1 are both below or both above the threshold, np.diff yields 0:
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0, 8 * np.pi, 400)
y = np.sin(t)
th = 0.5
mask = np.diff(1 * (y > th)) != 0
plt.plot(t, y, 'bx', markersize=3)
plt.plot(t[:-1][mask], y[:-1][mask], 'go', markersize=8)
Using the slice [:-1] will yield the index "immediately before" crossing the threshold (you can see that in the chart). If you want the index "immediately after", use [1:] instead of [:-1].
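The question also asks for interpolated crossing times and for a way to pick the k-th upward or downward crossing without separate branches. Here is a minimal sketch of one way to do that, assuming linear interpolation between the two samples that bracket each crossing is good enough. The helper names crossing_times and trigger_interp are made up for this example and are not part of NumPy.

```python
import numpy as np

def crossing_times(t, y, tr):
    """Return interpolated crossing times and their direction (+1 up, -1 down)."""
    step = np.diff((y > tr).astype(int))      # +1 upward, -1 downward, 0 otherwise
    idx = np.flatnonzero(step)                # sample just before each crossing
    # Linear interpolation between samples idx and idx + 1.
    frac = (tr - y[idx]) / (y[idx + 1] - y[idx])
    times = t[idx] + frac * (t[idx + 1] - t[idx])
    return times, step[idx]

def trigger_interp(t, y, tr, tr_type):
    """k > 0 selects the k-th upward crossing, k < 0 the |k|-th downward one."""
    times, direction = crossing_times(t, y, tr)
    up = times[direction == 1]
    down = times[direction == -1]
    return [float(up[k - 1]) if k > 0 else float(down[-k - 1]) for k in tr_type]

t = np.linspace(0, 20, 1000)
y = np.sin(t)
tr_time = trigger_interp(t, y, 0.5, [1, 2, -2])
print(tr_time)
```

For sin(t) with a threshold of 0.5, the upward crossings should land close to pi/6 + 2*pi*k and the downward ones close to 5*pi/6 + 2*pi*k, which makes the result easy to sanity check.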

Related

Calculate derivative of spatial measurements

I have a set of spatially distributed measurements.
For each point p1 = [x1,y1,z1] there is a measurement v1, which is a scalar (e.g. temperature measurements under water).
Let's assume these measurements are on a regular grid.
I would like to find out where the most variation is in this distribution.
That is, at which positions the temperature changes the most.
I think this corresponds to the spatial derivative of the temperature.
Can somebody give me advice on how to proceed?
What methods are there to achieve this?
I tried to implement it with np.gradient(), but I fail at interpreting the result...
This is absolutely not an optimized code, but here is what I came up with, at least to explain how it works.
grid = [[[1, 2], [2, 3]], [[8, 5], [4, 1000]]]
def get_greatest_diff(g, x, y, z):
    value = g[x][y][z]
    try:
        diff_x = abs(value-g[x+1][y][z])
    except IndexError:
        diff_x = -1
    try:
        diff_y= abs(value-g[x][y+1][z])
    except IndexError:
        diff_y = -1
    try:
        diff_z = abs(value-g[x][y][z+1])
    except IndexError:
        diff_z = -1
    if diff_x>=diff_y and diff_x>=diff_z:
        return diff_x, [x+1, y, z]
    if diff_y>diff_x and diff_y>=diff_z:
        return diff_y, [x, y+1, z]
    return diff_z, [x, y, z+1]
greatest_diff = 0
greatest_diff_pos0 = []
greatest_diff_pos1 = []
for x in range(len(grid)):
    for y in range(len(grid[x])):
        for z in range(len(grid[x][y])):
            diff, coords = get_greatest_diff(grid, x, y, z)
            if diff > greatest_diff:
                greatest_diff = diff
                greatest_diff_pos0 = [x, y, z]
                greatest_diff_pos1 = coords
print(greatest_diff, greatest_diff_pos0, greatest_diff_pos1)
The try:...except:... are here to handle the edge conditions. (That's dirty but that's quick!)
For each cell, you will look at the three neighbours x+1 or y+1 or z+1 and you compute the difference with their values. You keep the largest difference in the neighborhood and you return it. (That is the explanation of get_greatest_diff)
In the main loop, you check if the difference in this neighborhood is the greatest of all, if so, store the difference, and the two cells in question.
Finally, return the greatest difference and the cells in question.
Here is a numpy solution that returns the indices in an ndarray that has the biggest total differences with its neighbors.
Say the input array is X and it is 2D. I will create D where D[i,j] = |X[i, j]-X[i-1, j]|+|X[i,j]-X[i, j-1]|. And return the indices of D which give the largest value in D.
import numpy as np
def greatest_diff(X):
    ndim = X.ndim
    # one array of absolute backward differences per axis
    Ds = [np.abs(np.diff(X, axis = i, prepend=0)) for i in range(ndim)]
    D = sum(Ds)
    return np.unravel_index(D.argmax(), D.shape)
X = np.zeros((5,5))
X[2,2] = 1
greatest_diff(X)
# returns (2, 2)
X = np.zeros((5,10,9))
X[2,2,7] = -1
greatest_diff(X)
# returns (2, 2, 7)
Another solution might be calculating the difference between X[i, j] and sum(X[k, l]) where k,l are the neighbors of i, j. You can achieve this by applying a gaussian filter to the X say gX then taking the squared differences: (X-gX)^2.
def greatest_diff_gaussian(X, sigma = 1):
    from scipy.ndimage import gaussian_filter
    gX = gaussian_filter(X, sigma)
    dgX = np.power(X - gX, 2)
    return np.unravel_index(dgX.argmax(), dgX.shape)
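Since the question mentions trouble interpreting np.gradient(), here is a small sketch of how its output could be used. For an N-dimensional array, np.gradient returns one array per axis, each holding the partial derivative along that axis. Combining them into a gradient magnitude gives a single "rate of change" value per grid point. The function name steepest_point and the test field are made up for illustration, and a uniform grid spacing is assumed.

```python
import numpy as np

def steepest_point(values, spacing=1.0):
    """Return the index of the largest gradient magnitude and the magnitude field."""
    grads = np.gradient(values, spacing)   # one array per axis for ndim >= 2
    if values.ndim == 1:
        grads = [grads]                    # 1D input yields a single array
    magnitude = np.sqrt(sum(g ** 2 for g in grads))
    return np.unravel_index(magnitude.argmax(), magnitude.shape), magnitude

# Made-up field: the value grows quadratically along the first axis only.
i = np.arange(5, dtype=float)
field = np.broadcast_to((i ** 2)[:, None, None], (5, 4, 3)).copy()
idx, magnitude = steepest_point(field)
print(idx, magnitude[idx])
```

In this field the steepest change sits at the far end of the first axis, where the one-sided difference 16 - 9 = 7 is larger than any central difference in the interior.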

Fill values into numpy array that depend non-trivially on indices

The problem: I'm trying to fill a 2D array arr with values where the values depend on the indices (i, j) in some nontrivial way. More precisely, i and j together provide a new index k (i, j, and k all have the same range), which I then use to look up a value in some other array (i.e., arr[i,j] = values[k]).
My initial thought was that np.put_along_axis could be used for this. I generated two lists indices and values, such that
nrows, ncols = arr.shape
for i in range(nrows):
    arr[i, indices[i]] = values[i]
In principle this works fine, but when I try
np.put_along_axis(arr, indices, values, axis=1)
I get the following error
AttributeError: 'list' object has no attribute 'dtype'
However, I can't make these lists into arrays because they're ragged; some rows have fewer values that need insertion than others. I am wondering if there is a way to use np.put_along_axis?
In short you probably want to use np.indices.
Since you didn't give an example, I will use indices to calculate polar coordinates and look them up in another picture.
First I have a picture to look up the values later
import matplotlib.pyplot as plt
import matplotlib
import numpy as np
n = 100
func = lambda i,j: np.linalg.norm(np.array([i-n/2,j-n/2]), axis=0)
arr = np.fromfunction(func, (n,n), dtype='int')
arr = (arr < np.median(arr)).astype('int')
plt.imshow(arr, cmap='gray')
Now I calculate polar coordinates on the above picture. In case you need a refresher: polar coordinates identify a point by its distance from a center and an angle. If you go left/right in the picture below, you move around a circle (counterclockwise/clockwise) in the picture above, and going up and down means moving toward and away from the center. In polar coordinates the disk should more or less turn into a rectangle.
r,phi = np.indices(arr.shape, dtype='float')
r *= 50/100
phi *= 2*np.pi/100
def polar2cartesian(r, phi):
    x = r * np.cos(phi)
    y = r * np.sin(phi)
    return x, y
i,j = polar2cartesian(r, phi)
i = (i+50).astype('int')
j = (j+50).astype('int')
out = np.zeros(arr.shape)
out = arr[i,j]
plt.imshow(out, cmap='gray')
plt.xlabel('phi (0 to 2pi)')
plt.ylabel('r (0 to 50)')
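For the ragged case described in the question, another option (not taken from the answer above) is to flatten the ragged lists into matching row and column index arrays and do a single fancy-indexed assignment. This is a sketch with made-up data. The names rows, cols and vals are illustrative.

```python
import numpy as np

arr = np.zeros((3, 4))
# Made-up ragged inputs: row i receives values[i] at columns indices[i].
indices = [[0, 2], [1], [0, 1, 3]]
values = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]

lengths = [len(row) for row in indices]
rows = np.repeat(np.arange(len(indices)), lengths)   # row index for every entry
cols = np.concatenate(indices).astype(int)           # flattened column indices
vals = np.concatenate(values)                        # flattened values

arr[rows, cols] = vals
print(arr)
```

This replaces the Python loop with one assignment. np.put_along_axis is not needed because it expects a rectangular index array.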

How to plot curve with given polynomial coefficients?

Using Python, I have an array with the coefficients of a polynomial, let's say
polynomial = [1,2,3,4]
which means the equation:
y = 4x³ + 3x² + 2x + 1
(so the array is in reversed order)
Now how do I plot this into a visual curve in the Jupyter Notebook?
There was a similar question:
Plotting polynomial with given coefficients
but I didn't understand the answer (like what is a and b?).
And what do I need to import to make this happen?
First, you have to decide the limits for x in your plot. Let's say x goes from -2 to 2. Let's also ask for a hundred points on our curve (this can be any sufficiently large number for your interval so that you get a smooth-looking curve)
Let's create that array:
lower_limit = -2
upper_limit = 2
num_pts = 100
x = np.linspace(lower_limit, upper_limit, num_pts)
Now, let's evaluate y at each of these points. Numpy has a handy polyval() that'll do this for us. Remember that it wants the coefficients ordered by highest exponent to lowest, so you'll have to reverse the polynomial list
poly_coefs = polynomial[::-1] # [4, 3, 2, 1]
y = np.polyval(poly_coefs, x)
Finally, let's plot everything:
plt.plot(x, y, '-r')
You'll need the following imports:
import numpy as np
from matplotlib import pyplot as plt
If you don't want to import numpy, you can also write vanilla python methods to do the same thing:
def linspace(start, end, num_pts):
    step = (end - start) / (num_pts - 1)
    return [start + step * i for i in range(num_pts)]
def polyval(coefs, xvals):
    yvals = []
    for x in xvals:
        y = 0
        for power, c in enumerate(reversed(coefs)):
            y += c * (x ** power)
        yvals.append(y)
    return yvals
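As a side note, the newer numpy.polynomial package expects coefficients ordered from the lowest exponent to the highest, which is exactly the order used in the question, so no reversal is needed there. Here is a short sketch, with the x range chosen arbitrarily as above:

```python
import numpy as np
from numpy.polynomial import polynomial as P

polynomial = [1, 2, 3, 4]           # 1 + 2x + 3x^2 + 4x^3, lowest power first
x = np.linspace(-2, 2, 100)
y = P.polyval(x, polynomial)        # note the argument order: x first, then coefficients

# Same values as np.polyval with the reversed list.
y_legacy = np.polyval(polynomial[::-1], x)
print(np.allclose(y, y_legacy))
```

The resulting x and y can be passed to plt.plot(x, y) in the same way as before.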

Efficiently select elements from an (x,y) field with a 2D mask in Python

I have a large field of 2D-position data, given as two arrays x and y, where len(x) == len(y). I would like to return the array of indices idx_masked at which (x[idx_masked], y[idx_masked]) is masked by an N x N int array called mask. That is, mask[x[idx_masked], y[idx_masked]] == 1. The mask array consists of 0s and 1s only.
I have come up with the following solution, but it (specifically, the last line below) is very slow, given that I have N x N = 5000 x 5000, repeated 1000s of times:
import numpy as np
import matplotlib.pyplot as plt
# example mask of one corner of a square
N = 100
mask = np.zeros((N, N))
mask[0:10, 0:10] = 1
# example x and y position arrays in arbitrary units
x = np.random.uniform(0, 1, 1000)
y = np.random.uniform(0, 1, 1000)
x_bins = np.linspace(np.min(x), np.max(x), N)
y_bins = np.linspace(np.min(y), np.max(y), N)
x_bin_idx = np.digitize(x, x_bins)
y_bin_idx = np.digitize(y, y_bins)
idx_masked = np.ravel(np.where(mask[y_bin_idx - 1, x_bin_idx - 1] == 1))
plt.imshow(mask[::-1, :])
plt.scatter(x, y, color='red')
plt.scatter(x[idx_masked], y[idx_masked], color='blue')
Is there a more efficient way of doing this?
Given that mask overlays your field with identically-sized bins, you do not need to define the bins explicitly. *_bin_idx can be determined at each location from a simple floor division, since you know that each bin is 1 / N in size. I would recommend using 1 - 0 for the total width (what you passed into np.random.uniform) instead of x.max() - x.min(), if of course you know the expected size of the range.
x0 = 0 # or x.min()
x1 = 1 # or x.max()
x_bin = (x1 - x0) / N
x_bin_idx = ((x - x0) // x_bin).astype(int)
# ditto for y
This will be faster and simpler than digitizing, and avoids the extra bin at the beginning.
For most purposes, you do not need np.where. 90% of the questions asking about it (including this one) should not be using where. If you want a fast way to access the necessary elements of x and y, just use a boolean mask. The mask is simply
selection = mask[x_bin_idx, y_bin_idx].astype(bool)
If mask is already a boolean (which it should be anyway), the expression mask[x_bin_idx, y_bin_idx] is sufficient. It results in an array of the same size as x_bin_idx and y_bin_idx (which are the same size as x and y) containing the mask value for each of your points. You can use the mask as
x[selection] # Elements of x in mask
y[selection] # Elements of y in mask
If you absolutely need the integer indices, where is still not your best option.
indices = np.flatnonzero(selection)
OR
indices = selection.nonzero()[0]
If your goal is simply to extract values from x and y, I would recommend stacking them together into a single array:
coords = np.stack((x, y), axis=1)
This way, instead of having to apply indices twice, you can extract the values with just
coords[selection, :]
OR
coords[indices, :]
Depending on the relative densities of mask and x and y, either the boolean masking or linear indexing may be faster. You will have to time some relevant cases to get a better intuition.
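Putting the pieces above together, a self-contained sketch could look like the following. It assumes the data range is known to be 0 to 1 on both axes and uses a boolean mask. The helper bin_index is a made-up name, and the np.clip call is an added safeguard so that a value equal to the upper limit does not produce an out-of-range index.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
mask = np.zeros((N, N), dtype=bool)
mask[0:10, 0:10] = True             # one corner of the square

x = rng.uniform(0, 1, 1000)
y = rng.uniform(0, 1, 1000)

def bin_index(values, lo, hi, n_bins):
    """Map values in [lo, hi] to integer bin indices 0 .. n_bins - 1."""
    idx = ((values - lo) // ((hi - lo) / n_bins)).astype(int)
    return np.clip(idx, 0, n_bins - 1)

x_bin_idx = bin_index(x, 0.0, 1.0, N)
y_bin_idx = bin_index(y, 0.0, 1.0, N)

selection = mask[x_bin_idx, y_bin_idx]    # boolean, same length as x and y
indices = np.flatnonzero(selection)       # integer indices, only if really needed
print(len(indices))
```

With this mask, the selected points are the ones whose x and y both fall in the first ten bins, i.e. below 0.1.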

np.where() to eliminate data, where coordinates are too close to each other

I'm doing aperture photometry on a cluster of stars, and to get easier detection of background signal, I want to only look at stars further apart than n pixels (n=16 in my case).
I have 2 arrays, xs and ys, with the x- and y-values of all the stars' coordinates:
Using np.where I'm supposed to find the indexes of all stars, where the distance to all other stars is >= n
So far, my method has been a for-loop
import numpy as np
# Lists of coordinates w. values between 0 and 2000 for 5000 stars
xs = np.random.rand(5000)*2000
ys = np.random.rand(5000)*2000
# for-loop, wherein the np.where statement in question is situated
n = 16
for i in range(len(xs)):
    index = np.where( np.sqrt( pow(xs[i] - xs,2) + pow(ys[i] - ys,2)) >= n)
Due to the stars being clustered pretty closely together, I expected a severe reduction in data, though even when I tried n=1000 I still had around 4000 datapoints left
Using just numpy (and part of the answer here)
X = np.random.rand(5000,2) * 2000
XX = np.einsum('ij, ij ->i', X, X)
D_squared = XX[:, None] + XX - 2 * X.dot(X.T)
np.fill_diagonal(D_squared, np.inf)  # ignore each star's zero distance to itself
out = np.where(D_squared.min(axis = 0) > n**2)
Using scipy.spatial.distance.pdist
from scipy.spatial.distance import pdist, squareform
D_squared = squareform(pdist(X, metric = 'sqeuclidean'))
np.fill_diagonal(D_squared, np.inf)  # ignore each star's zero distance to itself
out = np.where(D_squared.min(axis = 0) > n**2)
Using a KDTree for maximum speed:
from scipy.spatial import KDTree
X_tree = KDTree(X)
in_radius = np.array(list(X_tree.query_pairs(n))).flatten()
out = np.where(~np.isin(np.arange(X.shape[0]), in_radius))
np.random.seed(seed=1)
xs = np.random.rand(5000,1)*2000
ys = np.random.rand(5000,1)*2000
n = 16
mask = (xs>=0)
for i in range(len(xs)):
    if mask[i]:
        # drop every star within n pixels of star i, then keep star i itself
        index = np.where( np.sqrt( pow(xs[i] - xs,2) + pow(ys[i] - ys,2)) <= n)
        mask[index] = False
        mask[i] = True
x = xs[mask]
y = ys[mask]
print(len(x))
4220
You can use np.subtract.outer to create the pairwise comparisons. Then you check for each row whether the distance is below 16 for exactly one item (which is the comparison with the particular star itself):
distances = np.sqrt(
    np.subtract.outer(xs, xs)**2
    + np.subtract.outer(ys, ys)**2
)
indices = np.nonzero(np.sum(distances < 16, axis=1) == 1)
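To check any of these approaches, it helps to run them on a handful of points where the answer is obvious. Below is a small NumPy-only sketch. The function name isolated_indices and the five test stars are made up. It masks the diagonal so that a star's zero distance to itself is not counted as a close neighbour.

```python
import numpy as np

def isolated_indices(xs, ys, n):
    """Indices of points whose nearest other point is at least n away."""
    dx = xs[:, None] - xs[None, :]
    dy = ys[:, None] - ys[None, :]
    d2 = dx ** 2 + dy ** 2              # squared pairwise distances
    np.fill_diagonal(d2, np.inf)        # ignore the distance of a point to itself
    return np.flatnonzero(d2.min(axis=1) >= n ** 2)

# Stars 0 and 1 are 5 px apart, stars 3 and 4 are 5 px apart, star 2 is alone.
xs = np.array([0.0, 5.0, 100.0, 200.0, 203.0])
ys = np.array([0.0, 0.0, 100.0, 200.0, 204.0])
isolated = isolated_indices(xs, ys, 16)
print(isolated)  # [2]
```

Note that this builds full N x N arrays, so for 5000 stars each temporary array holds 25 million floats (about 200 MB). The KDTree variant avoids that.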
