Matplotlib: Hysteresis loop using Mirrored or Split x axis - python
In an experiment, a load cell advances in equal increments of distance over time, compressing a sample; it stops when a specified distance from the start point is reached, then retracts in equal increments of distance over time back to the starting position.
A plot of pressure (load cell reading) on the y axis against distance on the x axis produces a familiar hysteresis loop. A plot of pressure on the y axis against time on the x axis produces an asymmetric peak with the maximum pressure in the centre, corresponding to the maximum advancement point of the sensor.
Instead of the above, I'd like to plot pressure on the y axis against distance on the x axis, with the additional constraint that the x axis is labelled starting at 0, reaching the maximum distance (where pressure peaks) at the middle, and returning to 0 at the right-hand end. In other words, the curve will be identical in shape to the plot of pressure v time, but will be of pressure v distance: the left half of the plot shows the probe's distance from its starting position during advancement, and the right half shows its distance from the starting position during retraction.
My actual datasets contain thousands of rows of data but by way of illustration, a minimal dummy dataset would look something like the following, where the 3 columns correspond to Time, Distance of probe from origin, and Pressure measured by probe respectively:
[
[0,0,0],
[1,2,10],
[2,4,30],
[3,6,60],
[4,4,35],
[5,2,15],
[6,0,0]
]
I can't work out how to get Matplotlib to construct the x axis so that the range goes from 0 to a maximum, then back to 0 again. I'd be grateful for advice on the simplest and most elegant way to achieve this plot. Many thanks.
Since you have time, you can use it for the x-axis values and just change the x tick labels:
import numpy as np
import matplotlib.pyplot as plt
# Time, Distance, Pressure
data = [[0, 0, 0],
[1, 2, 10],
[2, 4, 30],
[3, 6, 60],
[4, 4, 35],
[5, 2, 15],
[6, 0, 0]]
# convert to array to allow indexing like [i, j]
data = np.array(data)
fig = plt.figure()
ax = fig.add_subplot(111)
max_ticks = 10
skip = (data.shape[0] // max_ticks) + 1  # integer division: the slice step must be an int
ax.plot(data[:, 0], data[:, 2]) # Pressure(time)
ax.set_xticks(data[::skip, 0])
ax.set_xticklabels(data[::skip, 1]) # Pressure(Distance(time)) ?
ax.set_ylabel('Pressure [Pa?]')
ax.set_xlabel('Distance [m?]')
fig.show()
The skip is just so you don't end up with too many ticks on the plot; change it as you like.
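A further option, not part of the original answer: if the advance and retract really are uniform, a matplotlib FuncFormatter can compute the folded distance label directly from the time value, so no manual tick/label bookkeeping is needed. A minimal sketch, assuming the turnaround sits exactly at the midpoint of the time range:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

data = np.array([[0, 0, 0], [1, 2, 10], [2, 4, 30], [3, 6, 60],
                 [4, 4, 35], [5, 2, 15], [6, 0, 0]])
t_max = data[:, 0].max()
d_max = data[:, 1].max()
rate = 2 * d_max / t_max  # advance speed; assumed equal on the way back

def folded_distance(t, pos):
    # distance from origin at time t: rises to d_max, then falls back to 0
    return f"{d_max - abs(d_max - rate * t):g}"

fig, ax = plt.subplots()
ax.plot(data[:, 0], data[:, 2])
ax.xaxis.set_major_formatter(FuncFormatter(folded_distance))
ax.set_xlabel('Distance [m?]')
ax.set_ylabel('Pressure [Pa?]')
plt.show()

This keeps matplotlib's automatic tick placement, which is convenient once the dataset has thousands of rows.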
As said in the comments, the above only holds for uniform changes in distance as a function of time. For non-uniform changes, you'll have to use something like:
data = [[0, 0, 0],
[1, 2, 10],
[2, 4, 30],
[3, 6, 60],
[3.5, 5.4, 40],
[4, 4, 35],
[5, 2, 15],
[6, 0, 0]]
# convert to array to allow indexing like [i, j]
data = np.array(data)
def find_max_pos(data, column=0):
    return np.argmax(data[:, column])

def reverse_unload(data, unload_start):
    # prepare new_data with new column:
    new_shape = np.array(data.shape)
    new_shape[1] += 1
    new_data = np.empty(new_shape)
    # copy all correct data
    new_data[:, 0] = data[:, 0]
    new_data[:, 1] = data[:, 1]
    new_data[:, 2] = data[:, 2]
    new_data[:unload_start+1, 3] = data[:unload_start+1, 1]
    # use gradient to fill the rest
    gradient = -np.gradient(data[:, 1])
    for i in range(unload_start + 1, data.shape[0]):
        new_data[i, 3] = new_data[i-1, 3] + gradient[i]
    return new_data
data = reverse_unload(data, find_max_pos(data, 1))
fig = plt.figure()
ax = fig.add_subplot(111)
max_ticks = 10
skip = (data.shape[0] // max_ticks) + 1  # integer division, as above
ax.plot(data[:, 3], data[:, 2]) # Pressure("Distance")
ax.set_xticks(data[::skip, 3])
ax.set_xticklabels(data[::skip, 1])
ax.grid() # added for clarity
ax.set_ylabel('Pressure [Pa?]')
ax.set_xlabel('Distance [m?]')
fig.show()
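As an aside (a sketch, not from the original answer): the folded x coordinate built by reverse_unload is simply the cumulative absolute change in measured distance, so it can also be computed in one line:

d = data[:, 1]  # measured distance column
folded = np.concatenate(([0], np.cumsum(np.abs(np.diff(d)))))

This matches the stepwise-difference variant of reverse_unload shown further below exactly; the np.gradient version above uses central differences, which give slightly different interior values.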
Because using the measured values as ticks means the labels aren't nice round numbers, I found it easier to map matplotlib's automatic ticks to the correct values:
import numpy as np
import matplotlib.pyplot as plt
data = [[0, 0, 0],
[1, 2, 10],
[2, 4, 30],
[3, 6, 60],
[3.5, 5.4, 40],
[4, 4, 35],
[5, 2, 15],
[6, 0, 0]]
# convert to array to allow indexing like [i, j]
data = np.array(data)
def find_max_pos(data, column=0):
    return np.argmax(data[:, column])

def reverse_unload(data):
    unload_start = find_max_pos(data, 1)
    # prepare new_data with new column:
    new_shape = np.array(data.shape)
    new_shape[1] += 1
    new_data = np.empty(new_shape)
    # copy all correct data
    new_data[:, 0] = data[:, 0]
    new_data[:, 1] = data[:, 1]
    new_data[:, 2] = data[:, 2]
    new_data[:unload_start+1, 3] = data[:unload_start+1, 1]
    # use the stepwise differences to fill the rest
    gradient = data[unload_start:-1, 1] - data[unload_start+1:, 1]
    for i, j in enumerate(range(unload_start + 1, data.shape[0])):
        new_data[j, 3] = new_data[j-1, 3] + gradient[i]
    return new_data

def create_map_function(data):
    """
    Return a function that maps values of the folded axis
    back to distance, folding at the maximum distance reached.
    """
    max_index = find_max_pos(data, 1)
    x0, y0 = data[max_index, 1], data[max_index, 1]
    x1, y1 = 2*data[max_index, 1], 0
    m = (y1 - y0) / (x1 - x0)
    b = y0 - m*x0
    def map_function(x):
        if x < x0:
            return x
        else:
            return m*x + b
    return map_function

def process_data(data):
    data = reverse_unload(data)
    map_function = create_map_function(data)
    fig, ax = plt.subplots()
    ax.plot(data[:, 3], data[:, 2])
    ax.set_xticklabels([map_function(x) for x in ax.get_xticks()])
    ax.grid()
    ax.set_ylabel('Pressure [Pa?]')
    ax.set_xlabel('Distance [m?]')
    fig.show()

if __name__ == '__main__':
    process_data(data)
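A quick sanity check of the fold mapping on the dummy data above: the maximum distance is 6, so the folded axis runs from 0 to 12, and values past 6 map back down.

mf = create_map_function(reverse_unload(data))
print(mf(4), mf(6), mf(8), mf(12))  # 4 6.0 4.0 0.0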
Update: I've found a workaround to the problem of the ticks not being round numbers by using the np.around function, which rounds to a specified number of decimals (default = 0) and rounds exact halfway cases to the nearest even value: e.g. 1.5 and 2.5 both round to 2.0, and -0.5 and 0.5 both round to 0.0. More info here: https://docs.scipy.org/doc/numpy-1.10.4/reference/generated/numpy.around.html
So berna1111's code becomes:
import numpy as np
import matplotlib.pyplot as plt
# Time, Distance, Pressure
data = [[0, 0, 0],
[1, 1.9, 10], # Dummy data including decimals to demonstrate rounding
[2, 4.1, 30],
[3, 6.1, 60],
[4, 3.9, 35],
[5, 1.9, 15],
[6, -0.2, 0]]
# convert to array to allow indexing like [i, j]
data = np.array(data)
fig = plt.figure()
ax = fig.add_subplot(111)
max_ticks = 10
skip = (data.shape[0] // max_ticks) + 1  # integer division: the slice step must be an int
ax.plot(data[:, 0], data[:, 2]) # Pressure(time)
ax.set_xticks(data[::skip, 0])
ax.set_xticklabels(np.absolute(np.around((data[::skip, 1])))) # Pressure(Distance(time)); rounded to nearest integer
ax.set_ylabel('Pressure [Pa?]')
ax.set_xlabel('Distance [m?]')
fig.show()
According to the numpy documentation, np.around should round the final Distance value of -0.2 to '0.0'; however, it rounds to '-0.0' instead. This is because NumPy follows IEEE 754 floating point, in which zero carries a sign and rounding preserves it, so -0.2 rounds to negative zero. Since all my xticklabels in this particular case need to be positive integers or zero, I can correct this behaviour with the np.absolute function as shown above. Everything now seems to work for my requirements, but if I'm missing something, or there's a better solution, please let me know.
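For anyone wanting to see this behaviour in isolation, a minimal demonstration:

import numpy as np

print(np.around(-0.2))               # -0.0: the sign of the zero survives
print(np.absolute(np.around(-0.2)))  # 0.0
print(np.around([0.5, 1.5, 2.5]))    # [0. 2. 2.]: halfway cases round to even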
Related
How to calculate the correlation coefficient on a rolling window of a vector using numpy?
I'm able to calculate a rolling correlation coefficient for a 1D array (data against [0, 1, 2, 3, 4]) using a loop. I'm looking for a smarter solution using numpy (not pandas). Here is my current code:

import numpy as np

data = np.array([10,5,8,9,15,22,26,11,15,16,18,7,4,8,-2,-3,-4,-6,-2,0,10,0,5,8])
x = np.zeros_like(data).astype('float32')
length = 5
for i in range(length, data.shape[0]):
    x[i] = np.corrcoef(data[i - length:i], np.arange(length))[0, 1]
print(x)

x gives:

[ 0.     0.     0.     0.     0.     0.607  0.959  0.98   0.328 -0.287
 -0.61  -0.314 -0.18  -0.8   -0.782 -0.847 -0.811 -0.825 -0.869 -0.283
  0.566  0.863  0.643  0.454]

Any solution without the loop, please?
Use numpy.lib.stride_tricks.sliding_window_view (available in numpy v1.20.0+):

swindow = np.lib.stride_tricks.sliding_window_view(data, (length,))

which gives a view on the data array that looks like so:

array([[10,  5,  8,  9, 15],
       [ 5,  8,  9, 15, 22],
       [ 8,  9, 15, 22, 26],
       [ 9, 15, 22, 26, 11],
       [15, 22, 26, 11, 15],
       [22, 26, 11, 15, 16],
       [26, 11, 15, 16, 18],
       [11, 15, 16, 18,  7],
       [15, 16, 18,  7,  4],
       [16, 18,  7,  4,  8],
       [18,  7,  4,  8, -2],
       [ 7,  4,  8, -2, -3],
       [ 4,  8, -2, -3, -4],
       [ 8, -2, -3, -4, -6],
       [-2, -3, -4, -6, -2],
       [-3, -4, -6, -2,  0],
       [-4, -6, -2,  0, 10],
       [-6, -2,  0, 10,  0],
       [-2,  0, 10,  0,  5],
       [ 0, 10,  0,  5,  8]])

Now, we want to apply the correlation coefficient calculation to each row of this array. Unfortunately, np.corrcoef doesn't take an axis argument: it applies the calculation to the entire matrix and doesn't provide a way to do so for each row or column. However, the correlation coefficient of two vectors x and y is quite simple to compute directly:

r = Σ(xᵢ - x̄)(yᵢ - ȳ) / √( Σ(xᵢ - x̄)² · Σ(yᵢ - ȳ)² )

Applying that here:

def vec_corrcoef(X, y, axis=1):
    Xm = np.mean(X, axis=axis, keepdims=True)
    ym = np.mean(y)
    n = np.sum((X - Xm) * (y - ym), axis=axis)
    d = np.sqrt(np.sum((X - Xm)**2, axis=axis) * np.sum((y - ym)**2))
    return n / d

Now, call this function with our array and arange:

cc = vec_corrcoef(swindow, np.arange(length))

which gives the desired result:

array([ 0.60697698,  0.95894955,  0.98      ,  0.3279521 , -0.28709766,
       -0.61035663, -0.31390158, -0.17995394, -0.80041656, -0.78192905,
       -0.84702587, -0.81091772, -0.82464375, -0.86892667, -0.28347335,
        0.56568542,  0.86304424,  0.64326752,  0.45374261,  0.38135638])

To get your x, just set the appropriate indices of a zeros array of the correct size. Note: I think your x should contain nonzero values starting at index 4 (because that's where the sliding window is full) instead of starting at index 5.

x = np.zeros(data.shape)
x[-len(cc):] = cc

If you are sure that your values should start at index 5, then you can do:

x = np.zeros(data.shape)
x[length:] = cc[:-1]  # ignore the last value in cc

Comparing the runtimes of your original approach with those suggested in the answers here:

- f_OP_loopy is your approach, which implements a sliding window using a loop
- f_PH_numpy is my approach, which uses sliding_window_view and the vectorized row-wise correlation coefficient
- f_RA_numpy is Rontogiannis's approach, which tiles the arange, calculates the correlation coefficient for the entire matrices, and only selects the first len(data) - length rows of the last column
- f_RA_recur is Rontogiannis's recursive approach, which I didn't time because it misses the last correlation coefficient

Unsurprisingly, the numpy-only solution is faster than the loopy approach. My numpy solution, which computes the row-wise correlation coefficient, is faster than the one shown by Rontogiannis below, because the extra work involved in tiling the vector input and calculating the correlation of the entire matrix, only to discard the unwanted elements, is avoided. As the input size increases, this "extra work" in Rontogiannis's approach grows so much that its runtime is worse even than the loopy approach! I am unsure whether this extra time is spent in the np.corrcoef calculation or in the np.tile operation. Note: the timing plot was obtained on my 2.2 GHz i7 MacBook Air with 8 GB RAM, Python 3.10.7 and numpy 1.23.3.
Similar results were obtained on Google Colab. If you're interested in the timing code, here it is:

import timeit
import numpy as np
from matplotlib import pyplot as plt

def time_funcs(funcs, sizes, arg_gen, N=20):
    times = np.zeros((len(sizes), len(funcs)))
    gdict = globals().copy()
    for i, s in enumerate(sizes):
        args = arg_gen(s)
        print(args)
        for j, f in enumerate(funcs):
            gdict.update(locals())
            try:
                times[i, j] = timeit.timeit("f(*args)", globals=gdict, number=N) / N
                print(f"{i}/{len(sizes)}, {j}/{len(funcs)}, {times[i, j]}")
            except ValueError:
                print(f"ERROR in {f}, with args=", *args)
    return times

def plot_times(times, funcs):
    fig, ax = plt.subplots()
    for j, f in enumerate(funcs):
        ax.plot(sizes, times[:, j], label=f.__name__)
    ax.set_xlabel("Array size")
    ax.set_ylabel("Time per function call (s)")
    ax.set_xscale("log")
    ax.set_yscale("log")
    ax.legend()
    ax.grid()
    fig.tight_layout()
    return fig, ax

#%%
def arg_gen(n):
    return [np.random.randint(-100, 100, (n,)), 5]

#%%
def f_OP_loopy(data, length):
    x = np.zeros_like(data).astype('float32')
    for i in range(length-1, data.shape[0]):
        x[i] = np.corrcoef(data[i - length + 1:i+1], np.arange(length))[0, 1]
    return x

def f_PH_numpy(data, length):
    swindow = np.lib.stride_tricks.sliding_window_view(data, (length,))
    cc = vec_corrcoef(swindow, np.arange(length))
    x = np.zeros(data.shape)
    x[-len(cc):] = cc
    return x

def f_RA_recur(data, length):
    return np.concatenate((
        np.zeros([length,]),
        rolling_correlation_recurse(data, 0, length)
    ))

def f_RA_numpy(data, length):
    n = len(data)
    cc = np.corrcoef(np.lib.stride_tricks.sliding_window_view(data, length),
                     np.tile(np.arange(length), (n-length+1, 1)))[:n-length+1, -1]
    x = np.zeros(data.shape)
    x[-len(cc):] = cc
    return x

#%%
def rolling_correlation_recurse(data, i, length):
    assert i+length < data.size
    left = np.array([np.corrcoef(data[i:i+length], np.arange(length))[0, 1]])
    if i+length+1 == data.size:
        return left
    right = rolling_correlation_recurse(data, i+1, length)
    return np.concatenate((left, right))

def vec_corrcoef(X, y, axis=1):
    Xm = np.mean(X, axis=axis, keepdims=True)
    ym = np.mean(y)
    n = np.sum((X - Xm) * (y - ym), axis=axis)
    d = np.sqrt(np.sum((X - Xm)**2, axis=axis) * np.sum((y - ym)**2))
    return n / d

#%%
if __name__ == "__main__":
    #%% Set up sim
    sizes = [5, 10, 50, 100, 500, 1000, 5000, 10_000]  # , 50_000, 100_000]
    funcs = [f_OP_loopy,
             # f_RA_recur,
             f_PH_numpy,
             f_RA_numpy]

    #%% Run timing
    time_fcalls = np.zeros((len(sizes), len(funcs))) * np.nan
    time_fcalls = time_funcs(funcs, sizes, arg_gen)
    fig, ax = plot_times(time_fcalls, funcs)
    ax.set_xlabel(f"Input size")
    plt.show()
    input("Enter x to exit")
Ask and you shall receive. Here is a solution that uses recursion:

import numpy as np

data = np.array([10,5,8,9,15,22,26,11,15,16,18,7,4,8,-2,-3,-4,-6,-2,0,10,0,5,8])
length = 5

def rolling_correlation_recurse(data, i, length):
    assert i+length < data.size
    left = np.array([np.corrcoef(data[i:i+length], np.arange(length))[0, 1]])
    if i+length+1 == data.size:
        return left
    right = rolling_correlation_recurse(data, i+1, length)
    return np.concatenate((left, right))

def rolling_correlation(data, length):
    return np.concatenate((
        np.zeros([length,]),
        rolling_correlation_recurse(data, 0, length)
    ))

print(rolling_correlation(data, length))

Edit: here is a numpy solution too:

n = len(data)
print(np.corrcoef(
    np.lib.stride_tricks.sliding_window_view(data, length),
    np.tile(np.arange(length), (n-length+1, 1)))[:n-length+1, -1])
How to get equally spaced grid points in an irregularly shaped figure?
I have an irregularly shaped image and I want to get equally spaced grid points inside it. An example of the image I have: [image in original post]. I am thinking of using OpenCV to get the corner coordinates, and that is easy. But I do not know how to pass all the corner coordinates, or how to divide my shape into identifiable geometric shapes and do this. Right now, I have hard-coded the coordinates and created a function to pass them:

import numpy as np
import matplotlib.pyplot as plt
import functools

def gridFunc(arr):
    center = np.mean(arr, axis=0)
    x = np.arange(min(arr[:, 0]), max(arr[:, 0]) + 0.04, 0.4)
    y = np.arange(min(arr[:, 1]), max(arr[:, 1]) + 0.04, 0.4)
    a, b = np.meshgrid(x, y)
    points = np.stack([a.reshape(-1), b.reshape(-1)]).T

    def normal(a, b):
        v = b - a
        n = np.array([v[1], -v[0]])
        # the normal needs to point outwards
        if (center - a) @ n > 0:
            n *= -1
        return n

    mask = functools.reduce(
        np.logical_and,
        [((points - a) @ normal(a, b)) < 0 for a, b in zip(arr[:-1], arr[1:])])
    # plt.plot(arr[:, 0], arr[:, 1])
    # plt.gca().set_aspect('equal')
    # plt.scatter(points[mask][:, 0], points[mask][:, 1])
    # plt.show()
    return points[mask]

arr1 = np.array([[0, 7], [3, 10], [3, 4], [0, 7]])
arr2 = np.array([[3, 0], [3, 14], [12, 14], [12, 0], [3, 0]])
arr3 = np.array([[12, 4], [12, 10], [20, 10], [20, 4], [12, 4]])
arr_1 = gridFunc(arr1)
arr_2 = gridFunc(arr2)
arr_3 = gridFunc(arr3)
res = np.append(arr_1, arr_2)
res = np.reshape(res, (-1, 2))
res = np.append(res, arr_3)
res = np.reshape(res, (-1, 2))
plt.scatter(res[:, 0], res[:, 1])
plt.show()

This works, but I am doing it manually, and I want to extend it to other shapes as well. The result I get: [image in original post]
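For the OpenCV step mentioned above, a rough sketch of one way to extract the corner arrays automatically (the filename and threshold are assumptions; cv2.approxPolyDP collapses a contour to its corner points, which could then be fed to gridFunc):

import cv2
import numpy as np

img = cv2.imread('shape.png', cv2.IMREAD_GRAYSCALE)  # hypothetical mask image
_, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
# OpenCV 4.x return signature; 3.x returns an extra leading value
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    eps = 0.01 * cv2.arcLength(cnt, True)
    corners = cv2.approxPolyDP(cnt, eps, True).reshape(-1, 2).astype(float)
    corners = np.vstack([corners, corners[:1]])  # close the polygon like the arrays above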
Creating a contour plot from three data columns
I have two columns of input data that I want as my x and y axes, and a third column of results data relating to the inputs. I have 36 combinations of inputs and 36 results. I want to achieve something like this plot: [image in original post]. I have tried using a cmap, but I get told the z data is 1D and needs to be 2D, and I don't understand how to get around this issue. My attempt is below:

data = excel[['test', 'A_h', 'f_h', 'fore C_T', 'hind C_T', 'fore eff', 'hind eff', 'hind C_T ratio', 'hind eff ratio']]
x = data['A_h']
y = data['f_h']
z = data['hind C_T ratio']
X, Y = np.meshgrid(x, y)
Z = z
plt.pcolor(x, y, z)
If you have arrays [1, 2, 3] and [4, 5, 6], then meshgrid will give you two 3x3 arrays: [[1, 2, 3], [1, 2, 3], [1, 2, 3]] and [[4, 4, 4], [5, 5, 5], [6, 6, 6]]. In your case, this seems already taken care of, since you have 36 each of x, y, z values, so meshgrid isn't necessary. If your arrays are well defined (already in the repeating grid order above), then you can just reshape them:

x = np.reshape(data['A_h'], (6, 6))
y = np.reshape(data['f_h'], (6, 6))
z = np.reshape(data['hind C_T ratio'], (6, 6))
plt.contourf(x, y, z)

See the contourf documentation for details. On the other hand, if your data are irregular (the 36 points do not form a grid), then you will have to use griddata as @obchardon suggested above.
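For illustration, a self-contained sketch with made-up 6x6 data standing in for the 36 rows (the column values and the sine response are placeholders, not the question's data):

import numpy as np
import matplotlib.pyplot as plt

A_h = np.repeat(np.linspace(0.1, 0.6, 6), 6)  # grid-ordered: 111222...
f_h = np.tile(np.linspace(1.0, 6.0, 6), 6)    # grid-ordered: 123456123456...
ratio = np.sin(A_h * f_h)                     # dummy response values

X = A_h.reshape(6, 6)
Y = f_h.reshape(6, 6)
Z = ratio.reshape(6, 6)
plt.contourf(X, Y, Z)
plt.colorbar()
plt.show()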
How to get a list of all leaves under a node in a dendrogram?
I made a dendrogram using scipy.cluster.hierarchy.dendrogram, using the following generated data:

a = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=[100,])
b = np.random.multivariate_normal([0, 20], [[3, 1], [1, 4]], size=[50,])
c = np.random.multivariate_normal([8, 2], [[3, 1], [1, 4]], size=[80,])
X = np.concatenate((a, b, c),)

creating the linkage:

from scipy.cluster.hierarchy import dendrogram, linkage
Z = linkage(X, 'ward')

and then:

dendrogram(
    Z,
    truncate_mode='lastp',  # show only the last p merged clusters
    p=5,
    show_leaf_counts=False,  # otherwise numbers in brackets are counts
    leaf_rotation=90.,
    leaf_font_size=12.,
    show_contracted=True,  # to get a distribution impression in truncated branches
)

Now, I have 230 observations overall in my data, split into p=5 clusters. I want, for each cluster, a list of the row indices of all observations in it. In addition, I'd like to know the structure of the hierarchy above those 5 clusters. Thanks!
I am a newbie to clustering and dendrograms, so feel free to point out any errors.

import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster

# put X in a dataframe
df = pd.DataFrame()
df['col1'] = X[:, 0]
df['col2'] = X[:, 1]
index = []
for i in range(len(X)):
    elem = 'A' + str(i)
    index.append(elem)
df['index'] = index
print(df.shape)
df.head()

Z = linkage(X, 'ward')
dendrogram(
    Z,
    truncate_mode='lastp',  # show only the last p merged clusters
    p=5,
    show_leaf_counts=True,  # otherwise numbers in brackets are counts
    leaf_rotation=90.,
    leaf_font_size=12.,
    show_contracted=True,  # to get a distribution impression in truncated branches
)
plt.show()

# retrieve elements in each cluster
label = fcluster(Z, 5, criterion='maxclust')
df_clst = pd.DataFrame()
df_clst['index'] = df['index']
df_clst['label'] = label

# print them
for i in range(5):
    elements = df_clst[df_clst['label'] == i+1]['index'].tolist()
    size = len(elements)
    print('\n Cluster {}: N = {} {}'.format(i+1, size, elements))
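For the second part of the question, the hierarchy above the 5 flat clusters, one possibility (a sketch, not part of the answer above) is scipy's leaders, which returns the linkage node that roots each flat cluster:

from scipy.cluster.hierarchy import leaders

L, M = leaders(Z, label)  # L: linkage node ids, M: their flat-cluster labels
print(dict(zip(M, L)))
# rows Z[i - len(X)] (for node ids i >= len(X)) then describe how these
# five roots merge as you move further up the tree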
Generating random numbers around a set of coordinates without for loop
I have a set of coordinate means (3D) and a set of accompanying standard deviations (3D), like this:

means = [[x1, y1, z1],
         [x2, y2, z2],
         ...
         [xn, yn, zn]]

stds = [[sx1, sy1, sz1],
        [sx2, sy2, sz2],
        ...
        [sxn, syn, szn]]

so the problem is N x 3. I am looking to generate 1000 coordinate sample sets (N x 3 x 1000) randomly using np.random.normal(). Currently I generate the samples using a for loop:

for i in range(1000):
    samples = np.random.normal(means, stds)

But I have the feeling I can lose the for loop and let numpy do it faster in one call. Does anybody know how I should code that?
Alternatively, use the size argument:

import numpy as np

means = [[0, 0, 0], [1, 1, 1]]
std = [[1, 1, 1], [1, 1, 1]]

# 100 samples
print(np.random.normal(means, std, size=(100, len(means), 3)))
You can repeat your means and stds arrays 1000 times, and then call np.random.normal() once:

import numpy as np

means = [[0, 0, 0], [1, 1, 1]]
stds = [[1, 1, 1], [2, 2, 2]]
means = np.array(means) * np.ones(1000)[:, None, None]
stds = np.array(stds) * np.ones(1000)[:, None, None]
samples = np.random.normal(means, stds)
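Either way, note the question asked for N x 3 x 1000, while both answers put the 1000 samples on the first axis; a sketch moving the sample axis last (the Generator API here is my choice, not part of either answer):

import numpy as np

rng = np.random.default_rng(0)
means = np.array([[0, 0, 0], [1, 1, 1]])
stds = np.array([[1, 1, 1], [2, 2, 2]])
samples = rng.normal(means, stds, size=(1000, *means.shape))
samples = np.moveaxis(samples, 0, -1)
print(samples.shape)  # (2, 3, 1000)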