Detecting pattern in OHLC data in Python [closed] - python
I have the following set of OHLC data:
[[datetime.datetime(2020, 7, 1, 6, 30), '0.00013449', '0.00013866', '0.00013440', '0.00013857', '430864.00000000', 1593579599999, '59.09906346', 1885, '208801.00000000', '28.63104974', '0', 3.0336828016952944], [datetime.datetime(2020, 7, 1, 7, 0), '0.00013854', '0.00013887', '0.00013767', '0.00013851', '162518.00000000', 1593581399999, '22.48036621', 809, '78014.00000000', '10.79595625', '0', -0.02165439584236435], [datetime.datetime(2020, 7, 1, 7, 30), '0.00013851', '0.00013890', '0.00013664', '0.00013780', '313823.00000000', 1593583199999, '43.21919087', 1077, '157083.00000000', '21.62390537', '0', -0.5125983683488642], [datetime.datetime(2020, 7, 1, 8, 0), '0.00013771', '0.00013818', '0.00013654', '0.00013707', '126925.00000000', 1593584999999, '17.44448931', 428, '56767.00000000', '7.79977280', '0', -0.46474475346744676], [datetime.datetime(2020, 7, 1, 8, 30), '0.00013712', '0.00013776', '0.00013656', '0.00013757', '62261.00000000', 1593586799999, '8.54915420', 330, '26921.00000000', '3.69342184', '0', 0.3281796966161107], [datetime.datetime(2020, 7, 1, 9, 0), '0.00013757', '0.00013804', '0.00013628', '0.00013640', '115154.00000000', 1593588599999, '15.80169390', 510, '52830.00000000', '7.24924784', '0', -0.8504761212473579], [datetime.datetime(2020, 7, 1, 9, 30), '0.00013640', '0.00013675', '0.00013598', '0.00013675', '66186.00000000', 1593590399999, '9.02070446', 311, '24798.00000000', '3.38107106', '0', 0.25659824046919455], [datetime.datetime(2020, 7, 1, 10, 0), '0.00013655', '0.00013662', '0.00013577', '0.00013625', '56656.00000000', 1593592199999, '7.71123423', 367, '27936.00000000', '3.80394497', '0', -0.2196997436836377], [datetime.datetime(2020, 7, 1, 10, 30), '0.00013625', '0.00013834', '0.00013625', '0.00013799', '114257.00000000', 1593593999999, '15.70194874', 679, '56070.00000000', '7.70405037', '0', 1.2770642201834814], [datetime.datetime(2020, 7, 1, 11, 0), '0.00013812', '0.00013822', '0.00013630', '0.00013805', '104746.00000000', 1593595799999, '14.39147417', 564, '46626.00000000', '6.39959586', '0', -0.05068056762237037], [datetime.datetime(2020, 7, 1, 11, 30), '0.00013805', '0.00013810', '0.00013720', '0.00013732', '37071.00000000', 1593597599999, '5.10447229', 231, '16349.00000000', '2.25258584', '0', -0.5287939152480996], [datetime.datetime(2020, 7, 1, 12, 0), '0.00013733', '0.00013741', '0.00013698', '0.00013724', '27004.00000000', 1593599399999, '3.70524540', 161, '15398.00000000', '2.11351192', '0', -0.06553557125171522], [datetime.datetime(2020, 7, 1, 12, 30), '0.00013724', '0.00013727', '0.00013687', '0.00013717', '27856.00000000', 1593601199999, '3.81864840', 140, '11883.00000000', '1.62931445', '0', -0.05100553774411102], [datetime.datetime(2020, 7, 1, 13, 0), '0.00013716', '0.00013801', '0.00013702', '0.00013741', '83867.00000000', 1593602999999, '11.54964001', 329, '42113.00000000', '5.80085155', '0', 0.18226888305628908], [datetime.datetime(2020, 7, 1, 13, 30), '0.00013741', '0.00013766', '0.00013690', '0.00013707', '50299.00000000', 1593604799999, '6.90474065', 249, '20871.00000000', '2.86749244', '0', -0.2474346845207872], [datetime.datetime(2020, 7, 1, 14, 0), '0.00013707', '0.00013736', '0.00013680', '0.00013704', '44745.00000000', 1593606599999, '6.13189248', 205, '14012.00000000', '1.92132206', '0', -0.02188662727072625], [datetime.datetime(2020, 7, 1, 14, 30), '0.00013704', '0.00014005', '0.00013703', '0.00013960', '203169.00000000', 1593608399999, '28.26967457', 904, '150857.00000000', '21.00600041', '0', 1.8680677174547595]]
That looks like this:
I'm trying to detect a pattern that looks like the one above in other sets of OHLC data. It doesn't have to be identical, only similar: the number of candles doesn't have to be the same, just the overall shape.
The problem:
I don't know where to start. I know it isn't easy, but I'm sure there is a way to do it.
What I have tried:
Until now, I have only managed to manually cut away the OHLC data I don't need, so that only the patterns I want remain. Then I plotted them from a Pandas DataFrame:
import mplfinance as mpf
import numpy as np
import pandas as pd
df = pd.DataFrame([x[:6] for x in OHLC],
                  columns=['Date', 'Open', 'High', 'Low', 'Close', 'Volume'])
format = '%Y-%m-%d %H:%M:%S'
df['Date'] = pd.to_datetime(df['Date'], format=format)
df = df.set_index(pd.DatetimeIndex(df['Date']))
df["Open"] = pd.to_numeric(df["Open"],errors='coerce')
df["High"] = pd.to_numeric(df["High"],errors='coerce')
df["Low"] = pd.to_numeric(df["Low"],errors='coerce')
df["Close"] = pd.to_numeric(df["Close"],errors='coerce')
df["Volume"] = pd.to_numeric(df["Volume"],errors='coerce')
mpf.plot(df, type='candle', figscale=2, figratio=(50, 50))
What I thought: A possible solution to this problem is using Neural Networks, so I would have to feed images of the patterns I want to a NN and let the NN loop through other charts to see if it can find the patterns I specified. Before going that way, I was looking for simpler solutions, since I don't know much about Neural Networks, what kind of NN I would need, or what tools I would be supposed to use.
Another solution I was thinking about was the following: I would need, somehow, to convert the pattern I want to find into a series of values, so the OHLC data I posted above would be quantified in some way, and on another set of OHLC data I would just need to find values that come close to that pattern. This approach is very empirical for now and I'm not sure how to put it into code properly.
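The closest I can get to expressing this idea is the rough sketch below; the helper names, the choice of using only closing prices and the fixed 20-candle window are just placeholders, and I don't know if it's even a reasonable direction:
import numpy as np

def znorm(x):
    # Normalize so only the shape matters, not the absolute price level
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-12)

def pattern_distance(pattern_close, window_close):
    # Resample the window to the pattern's length, then compare the normalized shapes
    window_close = np.asarray(window_close, dtype=float)
    idx = np.linspace(0, len(window_close) - 1, num=len(pattern_close))
    resampled = np.interp(idx, np.arange(len(window_close)), window_close)
    return np.linalg.norm(znorm(pattern_close) - znorm(resampled))

# e.g. slide a 20-candle window over another series of closes and keep the best score
# scores = [pattern_distance(pattern, closes[i:i + 20]) for i in range(len(closes) - 20)]
# best_start = int(np.argmin(scores))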
A tool I was suggested to use: Stumpy
What I need:
I don't need the exact code; an example, an article, a library or any other kind of source that points me in the right direction for detecting a pattern I specify in an OHLC data set would be enough. I hope I was specific enough; any kind of advice is appreciated!
Stumpy will work for you.
Basic Methodology
The basic gist of the algorithm is to compute a matrix profile of a data stream, and then use that to find areas that are similar. (You can think of the matrix profile as a sliding window that gives a rating of how closely two patterns match, using the Z-normalized Euclidean distance.)
This article explains matrix profiles in a pretty straightforward way. Here's an excerpt that explains what you want:
Simply put, a motif is a repeated pattern in a time series and a discord is an anomaly.
With the Matrix Profile computed, it is simple to find the top-K number of motifs or
discords. The Matrix Profile stores the distances in Euclidean space meaning that a
distance close to 0 is most similar to another sub-sequence in the time series and a
distance far away from 0, say 100, is unlike any other sub-sequence. Extracting the lowest
distances gives the motifs and the largest distances gives the discords.
The benefits of using a matrix profile can be found here.
The gist of what you want to do is compute the matrix profile, then look for minima. A minimum means the window starting there closely matches a window somewhere else.
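As a minimal sketch of that step (assuming a one-dimensional float array of closing prices called close and a window length m you choose; the variable names are placeholders):
import numpy as np
import stumpy

# close: 1-D float array of closing prices, m: window length (both assumed)
mp = stumpy.stump(close, m)                  # self-join matrix profile
distances = mp[:, 0].astype(float)           # column 0: distance to each window's nearest neighbor
motif_idx = int(np.argmin(distances))        # window with the best match somewhere else
neighbor_idx = int(mp[motif_idx, 1])         # column 1: where that closest match starts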
The STUMPY tutorial's steamgen example shows how to use it to find repeating patterns in a single data set:
To reproduce their results, I navigated to the DAT file and downloaded it manually, then opened and read it instead of using their broken urllib calls to get the data.
Replace
context = ssl.SSLContext() # Ignore SSL certificate verification for simplicity
url = "https://www.cs.ucr.edu/~eamonn/iSAX/steamgen.dat"
raw_bytes = urllib.request.urlopen(url, context=context).read()
data = io.BytesIO(raw_bytes)
with
steam_df = None
with open("steamgen.dat", "r") as data:
    steam_df = pd.read_csv(data, header=None, sep=r"\s+")
I also had to add some plt.show() calls since I ran it outside of Jupyter. With those tweaks, you can run their example and see how it works.
Here's the full code I used, so you don't have to repeat what I did:
import pandas as pd
import stumpy
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import urllib
import ssl
import io
import os
def change_plot_size(width, height, plt):
    fig_size = plt.rcParams["figure.figsize"]
    fig_size[0] = width
    fig_size[1] = height
    plt.rcParams["figure.figsize"] = fig_size
    plt.rcParams["xtick.direction"] = "out"

change_plot_size(20, 6, plt)
colnames = ["drum pressure", "excess oxygen", "water level", "steam flow"]
# steamgen.dat was downloaded manually beforehand (see the note above about the broken urllib calls)
steam_df = None
with open("steamgen.dat", "r") as data:
    steam_df = pd.read_csv(data, header=None, sep=r"\s+")
steam_df.columns = colnames
steam_df.head()
plt.suptitle("Steamgen Dataset", fontsize="25")
plt.xlabel("Time", fontsize="20")
plt.ylabel("Steam Flow", fontsize="20")
plt.plot(steam_df["steam flow"].values)
plt.show()
m = 640
mp = stumpy.stump(steam_df["steam flow"], m)
true_P = mp[:, 0]
fig, axs = plt.subplots(2, sharex=True, gridspec_kw={"hspace": 0})
plt.suptitle("Motif (Pattern) Discovery", fontsize="25")
axs[0].plot(steam_df["steam flow"].values)
axs[0].set_ylabel("Steam Flow", fontsize="20")
rect = Rectangle((643, 0), m, 40, facecolor="lightgrey")
axs[0].add_patch(rect)
rect = Rectangle((8724, 0), m, 40, facecolor="lightgrey")
axs[0].add_patch(rect)
axs[1].set_xlabel("Time", fontsize="20")
axs[1].set_ylabel("Matrix Profile", fontsize="20")
axs[1].axvline(x=643, linestyle="dashed")
axs[1].axvline(x=8724, linestyle="dashed")
axs[1].plot(true_P)
plt.show()
def compare_approximation(true_P, approx_P):
    fig, ax = plt.subplots(gridspec_kw={"hspace": 0})
    ax.set_xlabel("Time", fontsize="20")
    ax.axvline(x=643, linestyle="dashed")
    ax.axvline(x=8724, linestyle="dashed")
    ax.set_ylim((5, 28))
    ax.plot(approx_P, color="C1", label="Approximate Matrix Profile")
    ax.plot(true_P, label="True Matrix Profile")
    ax.legend()
    plt.show()
approx = stumpy.scrump(steam_df["steam flow"], m, percentage=0.01, pre_scrump=False)
approx.update()
approx_P = approx.P_
seed = np.random.randint(100000)
np.random.seed(seed)
approx = stumpy.scrump(steam_df["steam flow"], m, percentage=0.01, pre_scrump=False)
compare_approximation(true_P, approx_P)
# Refine the profile
for _ in range(9):
    approx.update()
approx_P = approx.P_
compare_approximation(true_P, approx_P)
# Pre-processing
approx = stumpy.scrump(
    steam_df["steam flow"], m, percentage=0.01, pre_scrump=True, s=None
)
approx.update()
approx_P = approx.P_
compare_approximation(true_P, approx_P)
Self join vs. join against target
Note that this example was a "self join", meaning it was looking for repeated patterns in its own data. You'll want to join with the target you are looking to match.
Looking at the signature of stumpy.stump shows you how to do this:
def stump(T_A, m, T_B=None, ignore_trivial=True):
"""
Compute the matrix profile with parallelized STOMP
This is a convenience wrapper around the Numba JIT-compiled parallelized
`_stump` function which computes the matrix profile according to STOMP.
Parameters
----------
T_A : ndarray
The time series or sequence for which to compute the matrix profile
m : int
Window size
T_B : ndarray
The time series or sequence that contain your query subsequences
of interest. Default is `None` which corresponds to a self-join.
ignore_trivial : bool
Set to `True` if this is a self-join. Otherwise, for AB-join, set this
to `False`. Default is `True`.
Returns
-------
out : ndarray
The first column consists of the matrix profile, the second column
consists of the matrix profile indices, the third column consists of
the left matrix profile indices, and the fourth column consists of
the right matrix profile indices.
"""
What you'll want to do is pass the data (pattern) you want to look for as T_B, and the larger sets you want to look in as T_A. The window size specifies how large a search area you want (this will probably be the length of your T_B data, I'd imagine, or smaller if you want).
Once you have the matrix profile, you will just want to do a simple search and get the indices of the lowest values. Each window starting at one of those indices is a good match. You may also want to define some minimum threshold, so that you only consider it a match if there is at least one value in the matrix profile below that minimum.
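Under those assumptions (a one-dimensional pattern array and a longer one-dimensional target array, say of closing prices), a rough sketch of that search could look like this; the variable names and the threshold value are placeholders to tune on your own data:
import numpy as np
import stumpy

# pattern: 1-D float array holding the shape to look for (assumed)
# target:  1-D float array holding the larger series to search in (assumed)
m = len(pattern)                                  # window size = length of the query pattern
mp = stumpy.stump(T_A=target, m=m, T_B=pattern, ignore_trivial=False)

distances = mp[:, 0].astype(float)                # distance of each target window to the pattern
order = np.argsort(distances)                     # candidate start indices, best matches first
threshold = 3.0                                   # placeholder cutoff
matches = [int(i) for i in order if distances[i] <= threshold]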
Another thing to realize is that your data set is really several correlated series (Open, High, Low, Close, and Volume), and you'll have to decide which of them you want to match. Maybe you want a good match just for the opening prices, or maybe you want a good match for all of them. You'll have to decide what a good match means, calculate the matrix profile for each series, and then decide what to do if only one or a few of them match. For example, one data set may match the opening prices well while the close prices don't, and another set's volume may match and nothing else. Maybe you'll also want to check whether the normalized prices match, meaning you'd only be looking at the shape and not the relative magnitudes (a $1 stock going to $10 would look the same as a $10 one going to $100). All of that is fairly straightforward once you can compute a matrix profile.
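As a sketch of that per-column idea (the column choice and thresholds below are assumptions, and since the distances are z-normalized the relative-magnitude question is already largely handled for you):
import numpy as np
import stumpy

# pattern_df, target_df: DataFrames with numeric OHLC columns, as built above (assumed)
fields = ["Open", "High", "Low", "Close"]
m = len(pattern_df)
profiles = {}
for col in fields:
    mp = stumpy.stump(T_A=target_df[col].to_numpy(dtype=float), m=m,
                      T_B=pattern_df[col].to_numpy(dtype=float), ignore_trivial=False)
    profiles[col] = mp[:, 0].astype(float)

threshold = 3.0                                   # placeholder per-column cutoff
all_below = np.all(np.vstack([profiles[c] <= threshold for c in fields]), axis=0)
match_starts = np.flatnonzero(all_below)          # window starts that match on every field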
Related
Extracting data from a histogram with custom bins in Python
I have a data set of distances between two particles, and I want to bin these data in custom bins. For example, I want to see how many distance values lie in the interval from 1 to 2 micrometers, and so on. I wrote code for this, and it seems to work. This is my code for that part:
#Custom binning of data
bins = [0,1,2,3,4,5,6,7,8,9,10]
fig, ax = plt.subplots(n, m, figsize=(30,10))  # using this because I actually have 5 histograms, but only posted one here
ax.hist(dist_from_spacer1, bins=bins, edgecolor="k")
ax.set_xlabel('Distance from spacer 1 [µm]')
ax.set_ylabel('counts')
plt.xticks(bins)
plt.show()
However, now I wish to extract the data values from those intervals and store them in lists. I tried to use:
np.histogram(dist_from_spacer1, bins=bins)
However, this just gives how many data points are in each bin and the bin edges, like this:
(array([ 0, 0, 44, 567, 481, 279, 309, 202, 117, 0]), array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
How can I get the exact data that belong to each histogram bin?
Yes, np.histogram calculates what you need for a histogram, so the specific data points are not necessary, just the bins' boundaries and the count for each bin. However, the bins' boundaries are sufficient to achieve what you want by using np.digitize:
counts, bins = np.histogram(dist_from_spacer1)
indices = np.digitize(dist_from_spacer1, bins)
lists = [[] for _ in range(len(bins) + 1)]
for i, x in zip(indices, dist_from_spacer1):
    lists[i].append(x)
In your case, the bins' boundaries are predefined, so you can pass them to np.digitize directly.
Is there a way to cut only the first gap from histogram and take all the remain values in Python?
I have a data frame with fields 'unique_years' and 'counts'. I plotted this data frame and I am getting the following histogram (histogram - example). I need to define a start-year variable, but if there are empty gaps at the starting point of the histogram I need to skip them and shift the starting year. I was wondering if there is a pythonic way to do this. In the histogram example plot, I have a non-empty bin at the starting point, but then a big gap of empty bins. So I need to find the point from which the bins are continuously non-empty and define that point as the starting year (for the above sample the starting year should be 1935). The n numpy.ndarray gives me information about empty or non-empty bins, but I need an efficient way to resolve this. Thank you :)
Sample of my data frame:
import pandas as pd
data = {'unique_years': [1907, 1935, 1938, 1939, 1940],
        'counts' : [11, 14, 438, 85, 8]}
df = pd.DataFrame(data, columns = ['unique_years', 'counts'])
Code for the histogram plot:
(n, bins, patches) = plt.hist(df.unique_years, bins=25, label='hst')
plt.show()
The issue with your question is that 'continuous' is not really well defined here. Do you mean that every year should have a non-empty count (that is fairly easy, as you can filter your data for it prior to building your histogram), or should every consecutive bucket be non-empty? If the latter, this means that you must:
Build your histogram
Filter your data on the resulting bins
Either use the filtered histogram or re-bin the remaining data, with bin sizes not guaranteed to stay the same (so it is possible that you have the same issue with the new bins!)
As it is difficult to know exactly what is relevant in your case, I think the best answer is to give you a set of tools that you can use as you see fit for the exact problem you are encountering.
I want to filter my data starting from a certain date:
filtered = df.unique_years[df.unique_years > 1930]
I want to find the second non-empty bin:
(n, x) = np.histogram(df.unique_years, bins=25)
second_nonempty = np.where(n > 0)[0][1]
From there you can re-bin your filtered data:
(n, x) = np.histogram(df.unique_years, bins=25)
second_nonempty = np.where(n > 0)[0][1]
# Re-binning on the filtered data
plt.hist(df.unique_years[df.unique_years >= x[second_nonempty]], bins=25)
Or plot your histogram directly on the filtered bins:
(n, x) = np.histogram(df.unique_years, bins=25)
second_nonempty = np.where(n > 0)[0][1]
# Forcing the bins to take the provided values
plt.hist(df.unique_years, bins=x[second_nonempty:])
The 'second_nonempty' above can of course be replaced by any estimator of where you want to start, e.g.:
# Last empty bin + 1
all_bins_full_after = np.where(n == 0)[0][-1] + 1
Or anything else, really.
This should work to eliminate all the bins that are not consecutive. I am working mainly on the df; you can use this to plot your histogram.
df = pd.DataFrame(data, columns = ['unique_years', 'counts'])
yd = df.unique_years.diff().eq(1)
df[yd | yd.shift(-1)]
This is the result you would get:
Is this an error in the seaborn.lineplot hue parameter?
With this code snippet, I'm expecting a line plot with one line per hue, which has these distinct values: [1, 5, 10, 20, 40].
import math
import pandas as pd
import seaborn as sns
sns.set(style="whitegrid")
TANH_SCALING = [1, 5, 10, 20, 40]
X_VALUES = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
COLUMNS = ['x', 'y', 'hue group']
tanh_df = pd.DataFrame(columns=COLUMNS)
for sc in TANH_SCALING:
    data = {
        COLUMNS[0]: X_VALUES,
        COLUMNS[1]: [math.tanh(x/sc) for x in X_VALUES],
        COLUMNS[2]: len(X_VALUES)*[sc]}
    tanh_df = tanh_df.append(
        pd.DataFrame(data=data, columns=COLUMNS),
        ignore_index=True
    )
sns.lineplot(x=COLUMNS[0], y=COLUMNS[1], hue=COLUMNS[2], data=tanh_df);
However, what I get is a hue legend with values [0, 15, 30, 45], and an additional line, like so:
Is this a bug or am I missing something obvious?
This is a known bug of seaborn when the hue can be cast to integers. You could add a prefix to the hue so that casting to integers fails:
for sc in TANH_SCALING:
    data = {
        COLUMNS[0]: X_VALUES,
        COLUMNS[1]: [math.tanh(x/sc) for x in X_VALUES],
        COLUMNS[2]: len(X_VALUES)*[f'A{sc}']}  # changes here
    tanh_df = tanh_df.append(
        pd.DataFrame(data=data, columns=COLUMNS),
        ignore_index=True
    )
Or, after you have created the data as in the question, manipulate the hue at plot time:
sns.lineplot(x=COLUMNS[0], y=COLUMNS[1],
             hue='A_' + tanh_df[COLUMNS[2]].astype(str),  # change hue here
             data=tanh_df);
As #LudvigH's comment on the other answer says, this isn't a bug, even if the default behavior is surprising in this case. As explained in the docs:
The default treatment of the hue (and to a lesser extent, size) semantic, if present, depends on whether the variable is inferred to represent "numeric" or "categorical" data. In particular, numeric variables are represented with a sequential colormap by default, and the legend entries show regular "ticks" with values that may or may not exist in the data. This behavior can be controlled through various parameters, as described and illustrated below.
Here are two specific ways to control the behavior.
If you want to keep the numeric color mapping but have the legend show the exact values in your data, set legend="full":
sns.lineplot(x=COLUMNS[0], y=COLUMNS[1], hue=COLUMNS[2], data=tanh_df, legend="full")
If you want seaborn to treat the levels of the hue parameter as discrete categorical values, pass a named categorical colormap or either a list or dictionary of the specific colors you want to use:
sns.lineplot(x=COLUMNS[0], y=COLUMNS[1], hue=COLUMNS[2], data=tanh_df, palette="deep")
3D surface plot never shows any data
The 3D surface plot in plotly never shows the data: I get the plot to show up, but nothing appears in it, as if I had plotted an empty DataFrame. At first, I tried something like the solution I found here (Plotly Plot surface 3D not displayed), but had the same result, another plot with no data.
df3 = pd.DataFrame({'x':[1, 2, 3, 4, 5],'y':[10, 20, 30, 40, 50],'z': [5, 4, 3, 2, 1]})
iplot(dict(data=[Surface(x=df3['x'], y=df3['y'], z=df3['z'])]))
So I tried the code from the plotly website (the first cell of this notebook: https://plot.ly/python/3d-scatter-plots/), exactly as it is there, just to see if their example worked, but I get an error. I am getting this: https://lh3.googleusercontent.com/sOxRsIDLVkBGKTksUfVqm3HtaSQAN_ybQq2HLA-aclzEU-9ekmvd1ETdfsswC2SdbysizOI=s151 But I should get this: https://lh3.googleusercontent.com/5Hy2Z-97_vwd3ftKBA6dYZfikJHnA-UMEjd3PHvEvdBzw2m2zeEHBtneLC1jzO3RmE2lyw=s151
Observation: could not post the images because of lack of reputation.
In order to plot a surface you have to provide a value for each point. In this case your x and y are series of size 5, which means your z should have shape (5, 5). If I had a bit more info I could give you more details, but for a minimal working example try passing a (5, 5) DataFrame, numpy array or even a list of lists as the z value of data.
EDIT: In a notebook environment the following code works for me:
from plotly import offline
from plotly import graph_objs as go
offline.init_notebook_mode(connected=False)
df3 = {'x':[1, 2, 3, 4, 5],'y':[10, 20, 30, 40, 50],'z': [[5, 4, 3, 2, 1]]*5}
offline.iplot(dict(data=[go.Surface(x=df3['x'], y=df3['y'], z=df3['z'])]))
as shown here:
I'm using plotly 3.7.0.
matplotlib - extracting data from contour lines
I would like to get data from a single contour of evenly spaced 2D data (image-like data). Based on the example found in a similar question: How can I get the (x,y) values of the line that is ploted by a contour plot (matplotlib)?
>>> import matplotlib.pyplot as plt
>>> x = [1,2,3,4]
>>> y = [1,2,3,4]
>>> m = [[15,14,13,12],[14,12,10,8],[13,10,7,4],[12,8,4,0]]
>>> cs = plt.contour(x,y,m, [9.5])
>>> cs.collections[0].get_paths()
The result of this call to cs.collections[0].get_paths() is:
[Path([[ 4.          1.625     ]
       [ 3.25        2.        ]
       [ 3.          2.16666667]
       [ 2.16666667  3.        ]
       [ 2.          3.25      ]
       [ 1.625       4.        ]], None)]
Based on the plots, this result makes sense and appears to be a collection of (y, x) pairs for the contour line. Other than manually looping over this return value, extracting the coordinates and assembling arrays for the line, are there better ways to get data back from a matplotlib.path object? Are there pitfalls to be aware of when extracting data from a matplotlib.path? Alternatively, are there alternatives within matplotlib, or better yet numpy/scipy, to do a similar thing? The ideal thing would be a high-resolution vector of (x, y) pairs describing the line, which could be used for further analysis, since in general my datasets are not as small or simple as the example above.
For a given path, you can get the points like this:
p = cs.collections[0].get_paths()[0]
v = p.vertices
x = v[:, 0]
y = v[:, 1]
From http://matplotlib.org/api/path_api.html#module-matplotlib.path:
Users of Path objects should not access the vertices and codes arrays directly. Instead, they should use iter_segments() to get the vertex/code pairs. This is important, since many Path objects, as an optimization, do not store a codes array at all, but have a default one provided for them by iter_segments().
Otherwise, I'm not really sure what your question is. zip is a sometimes useful built-in function when working with coordinates.
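A minimal illustration of that recommendation, assuming the cs object from the snippet above:
p = cs.collections[0].get_paths()[0]
points = [vertices for vertices, code in p.iter_segments()]  # each entry holds the point(s) for one segment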
The vertices of all paths can be returned as a numpy array of float64 simply via:
cs.allsegs[i][j]  # for element j, in level i
where cs is defined as in the original question:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 2, 3, 4]
m = [[15, 14, 13, 12], [14, 12, 10, 8], [13, 10, 7, 4], [12, 8, 4, 0]]
cs = plt.contour(x, y, m, [9.5])
In more detail: going through the collections and extracting the paths and vertices is not the most straightforward or fastest thing to do. The returned Contour object actually exposes the segments via cs.allsegs, which returns a nested list of shape [level][element][vertex_coord]:
num_levels = len(cs.allsegs)
num_element = len(cs.allsegs[0])      # in level 0
num_vertices = len(cs.allsegs[0][0])  # of element 0, in level 0
num_coord = len(cs.allsegs[0][0][0])  # of vertex 0, in element 0, in level 0
See reference: https://matplotlib.org/stable/api/contour_api.html
I am facing a similar problem, and stumbled over this matplotlib list discussion. Basically, it is possible to strip away the plotting and call the underlying functions directly, which is not super convenient, but possible. The solution is also not pixel precise, as there is probably some interpolation going on in the underlying code.
import matplotlib.pyplot as plt
import matplotlib._cntr as cntr
import scipy as sp
data = sp.zeros((6,6))
data[2:4,2:4] = 1
plt.imshow(data, interpolation='none')
level = 0.5
X, Y = sp.meshgrid(sp.arange(data.shape[0]), sp.arange(data.shape[1]))
c = cntr.Cntr(X, Y, data.T)
nlist = c.trace(level, level, 0)
segs = nlist[:len(nlist)//2]
for seg in segs:
    plt.plot(seg[:,0], seg[:,1], color='white')
plt.show()