How to helpfully plot time series data in python

I have a two-dimensional set of time series data containing about 1000 samples, i.e. a list of 1000 elements where each entry is a list of two numbers.
It can be thought of as the position (x and y coordinates) of a car, with a picture taken each second for 1000 seconds. When I plot this, as seen below, you get a decent idea of the trajectory, but it's unclear where the car starts or finishes, i.e. which direction it is traveling in. I was thinking about including arrows between each point, but I think this would get quite cluttered (maybe you know a way to overcome that issue?). I also thought of colouring each point with a spectrum that makes it clear to see time increasing, i.e. hotter points to colder points as time goes on. Any idea how to achieve this in matplotlib?

I believe both your ideas would work well; you just need to test which option works best for your case.
Option 1: arrows
To avoid a cluttered plot, you could plot arrows between only a selection of points to show the general direction of your trajectory. In my example below I only plot an arrow between points 1 and 2, 6 and 7, and so on. You might want to increase the spacing between the arrows to make this work for your long series. It is also possible to connect points that are separated by, say, 10 points to make the arrows more clearly visible.
import numpy as np
import matplotlib.pyplot as plt
# example data
x = np.linspace(0, 10, 100)
y = x
plt.figure()
# plot the data points
for i in range(len(x)):
    plt.plot(x[i], y[i], "ro")
# plot arrows between points 1 and 2, 6 and 7 and so on.
for i in range(0, len(x)-1, 5):
    plt.arrow(x[i], y[i], x[i+1] - x[i], y[i+1] - y[i], color="black", zorder=2, width=0.05)
plt.xlabel('x')
plt.ylabel('y')
plt.show()
This yields this plot.
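For a long series, a vectorised variant of the same idea is ax.quiver, which draws all arrows in one call. Below is a minimal sketch; the spiral trajectory and the spacing of 50 samples are made-up placeholders, not values from the question.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for the 1000-sample trajectory: an outward spiral
t = np.linspace(0, 4 * np.pi, 1000)
x = t * np.cos(t)
y = t * np.sin(t)

step = 50                                  # assumed spacing; tune for your data
idx = np.arange(0, len(x) - step, step)    # start index of each arrow
dx = x[idx + step] - x[idx]                # arrow components: from sample i to sample i + step
dy = y[idx + step] - y[idx]

fig, ax = plt.subplots()
ax.plot(x, y, color="lightgray", zorder=1)
# angles="xy", scale_units="xy", scale=1 make each arrow span exactly (dx, dy) in data units
ax.quiver(x[idx], y[idx], dx, dy, angles="xy", scale_units="xy", scale=1,
          color="black", width=0.005, zorder=2)
ax.set_xlabel("x")
ax.set_ylabel("y")
# plt.show()  # uncomment to display the figure
```

Because each arrow connects points that are step samples apart, the arrows stay visible even when consecutive samples are very close together.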
Option 2: colors
You can generate any number of colors from a colormap, meaning you can make a list of 1000 sequential colors. This way you can plot each of your points in an increasingly warm color.
Example:
import numpy as np
import matplotlib.pyplot as plt
# example data
x = np.linspace(0, 10, 100)
y = x
# generate 100 (number of data points) colors from colormap
colors = [plt.get_cmap("coolwarm")(i) for i in np.linspace(0,1, len(x))]
plt.figure()
# plot the data points with the generated colors
for i in range(len(x)):
    plt.plot(x[i], y[i], color=colors[i], marker="o")
plt.xlabel('x')
plt.ylabel('y')
plt.show()
This yields this figure, where the oldest data point is cool (blue) and the newest is red (warm).
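The same colouring can probably be done without a loop by passing the time index to scatter through c, which also makes it easy to add a colorbar as a time legend. A minimal sketch, assuming a made-up trajectory with one sample per second:

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(1000)                  # sample index, one per second (assumed)
x = np.cumsum(np.cos(t / 100.0))     # hypothetical trajectory
y = np.cumsum(np.sin(t / 100.0))

fig, ax = plt.subplots()
# c=t maps each point's time onto the colormap: early = blue, late = red
sc = ax.scatter(x, y, c=t, cmap="coolwarm", s=10)
cbar = fig.colorbar(sc, ax=ax)
cbar.set_label("time (s)")
ax.set_xlabel("x")
ax.set_ylabel("y")
# plt.show()  # uncomment to display the figure
```

With 1000 points a single scatter call is usually noticeably faster than 1000 separate plot calls.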

Related

How do you plot with multiple colours for the same point?

I have some data that I have clustered, and wish to compare it to human-annotations of the same data. The problem I have is that the human-annotator has marked some of the points to have co-existing events (i.e. some points in the space would have two or more labels associated with them). Is there a way that I can show this using matplotlib?
I am thinking something along the lines of the following simplified example:
The main point is that points with events 1 and 2 should not be classified as some new event '3'; instead, points that have two events should display both events independently. I assume this is not an easy task, as there could potentially be more than two coexisting events, but for this example I am only focusing on two.
My plan was to create a one-hot array of shape=(n_points, n_events) and to select colours linearly from a colourmap such as plt.cm.rainbow, one for each unique event. But I have gotten stuck here, as I do not know how to plot the points with more than one label.
The style in which the points are plotted does not strictly matter (i.e. having the side-by-side colours as I have illustrated is not a requirement), any method of displaying them should be adequate so long as points with multiple events are easily identifiable.
I would post my attempt so far, but as I am stuck on such an early step, it only goes as far as generating a random toy dataset of shape (20, 2), and creating the one-hot array of labels as I had previously mentioned.
You can specify color as well as markerfacecoloralt together with fillstyle='left' in order to obtain a side-by-side color plot. For more information and other styles see this tutorial.
import matplotlib.pyplot as plt
import numpy as np
x = np.sort(np.random.random(size=20))
y = x + np.random.normal(scale=0.2, size=x.shape)
i, j = len(x)//2 - 2, len(x)//2 + 3 # separate the points in left and right
colors = ['#1f77b4', '#ff7f0e']
fig, ax = plt.subplots()
ax.plot(x[:i], y[:i], 'o', color=colors[0], ms=15) # left part
ax.plot(x[j:], y[j:], 'o', color=colors[1], ms=15) # right part
ax.plot(x[i:j], y[i:j], 'o', # middle part
        fillstyle='left', color=colors[0], markerfacecoloralt=colors[1], ms=15)
plt.show()
Example plot:
If you don't require side-by-side colors you could plot two points on top of each other, using different sizes.
import matplotlib.pyplot as plt
import numpy as np
x = np.sort(np.random.random(size=20))
y = x + np.random.normal(scale=0.2, size=x.shape)
i, j = len(x)//2 - 2, len(x)//2 + 3 # separate the points in left and right
colors = ['#1f77b4', '#ff7f0e']
fig, ax = plt.subplots()
ax.scatter(x[:j], y[:j], c=colors[0], s=100) # left part (including middle)
ax.scatter(x[j:], y[j:], c=colors[1], s=100) # right part
ax.scatter(x[i:j], y[i:j], c=colors[1], s=20) # middle part (using smaller size)
plt.show()
Example plot:
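The stacked-marker idea can be extended to the one-hot array from the question, so that any number of coexisting events shows up as concentric rings. This is only a sketch under assumptions: the toy label array, the base size and the shrink factor are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical toy data: 6 points, 3 possible events, one row of flags per point
onehot = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 1, 1],
                   [1, 1, 1],
                   [0, 0, 1]])
rng = np.random.default_rng(0)
xy = rng.random((len(onehot), 2))

n_events = onehot.shape[1]
colors = plt.cm.rainbow(np.linspace(0, 1, n_events))   # one RGBA colour per event

# order[p, e] = number of point p's events that come before event e
# (only meaningful where onehot[p, e] == 1)
order = np.cumsum(onehot, axis=1) - 1

base, shrink = 400.0, 0.4   # assumed outer size (points^2) and per-ring shrink factor
fig, ax = plt.subplots()
for e in range(n_events):
    mask = onehot[:, e] == 1
    sizes = base * shrink ** order[mask, e]
    # later events are drawn smaller and on top, so earlier ones remain visible as rings
    ax.scatter(xy[mask, 0], xy[mask, 1], s=sizes, color=colors[e],
               zorder=2 + e, label=f"event {e + 1}")
ax.legend()
# plt.show()  # uncomment to display the figure
```

A point with a single event is drawn as one full-size disc, while a point with three events shows three nested colours.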

How to draw a heat map with a large data set in python

I am trying to plot a sine wave, and the color of the curve at each point is represented by its tangential slope value.
For example, a 3600 * 1000 data frame should be filled:
import numpy as np
import pandas as pd
import seaborn
x_axis = list(range(0, 3601))
y_axis = list(range(-1000, 1001))
wave = pd.DataFrame(index=y_axis, columns=x_axis)
for i in range(0, 3601, 1):
    y = int(round(np.sin(np.radians(i / 10)), 3) * 1000)
    wave.loc[y, i] = -abs(y)
wave = wave.fillna(0)
wave[wave == 0] = np.nan
seaborn.heatmap(wave)
Calling seaborn.heatmap(wave) generates a heatmap like the attached image. But what I am looking for is to draw maybe 50-100 sine waves like this in one picture, so the dataframe will grow to a much larger 360000*10000. With a dataframe of this size I still want to show a similar heatmap, or any type of drawing that can represent the value change for each cell. My workstation seems to freeze when using the seaborn heatmap with this dataset.
Some of my thoughts would be to normalize all the values to 0-255 and use some GLV plotting function, I am still researching it.
You could create a similar plot using plt.scatter:
import matplotlib.pyplot as plt
import numpy as np
x_axis = np.arange(0, 360, 0.1)
y = np.round(np.sin(np.radians(x_axis)), 3) * 1000
plt.scatter(x_axis, y, c=-np.abs(y), s=1, cmap='gist_heat')
plt.show()
To get a wider curve, just increase s. To get rid of the white part of the colormap, you can move the color limits (called vmin and vmax). By default they are the minimum and maximum of the given color values; in this case the minimum is -1000 and the maximum is 0. Setting vmax to +100 leaves the top of the color range (roughly 10%) unused.
plt.scatter(x_axis, y, c=-np.abs(y), vmax=0.1*y.max(), s=10, cmap='gist_heat')
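If the colour should follow the tangential slope rather than the value, one possible alternative is a LineCollection with one colour value per line segment. A minimal sketch, assuming the slope is taken as the analytic derivative cos(x) (up to a constant factor); the colormap and line width are arbitrary choices:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

x = np.linspace(0, 360, 3601)            # angle in degrees
y = np.sin(np.radians(x)) * 1000
slope = np.cos(np.radians(x))            # derivative of sin, up to a constant factor

# One short segment per pair of neighbouring samples: shape (n - 1, 2, 2)
points = np.column_stack([x, y])
segments = np.stack([points[:-1], points[1:]], axis=1)

lc = LineCollection(segments, cmap="gist_heat", norm=plt.Normalize(-1, 1))
lc.set_array(slope[:-1])                 # one colour value per segment
lc.set_linewidth(2)

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()                           # make sure the axes limits cover the collection
fig.colorbar(lc, ax=ax, label="slope")
# plt.show()  # uncomment to display the figure
```

This keeps memory proportional to the number of samples instead of the size of a dense 2D grid, which is what makes the heatmap approach freeze.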

Matplotlib plot has slanted lines

I'm trying to plot projections of coordinates onto a line, but for some reason, Matplotlib is plotting the projections in a slightly slanted manner. Ideally, I would like the (blue) projections to be perpendicular to the (green) line. Here's an image of how it looks with sample data:
As you can see, the angles between the blue lines and the green line are slightly obtuse instead of right. I tried playing around with the rotation parameter to the annotate function, but this did not help. The code for this plot is below, although the data might look a bit different since the random generator is not seeded:
import numpy as np
import matplotlib.pyplot as plt
prefs = {'color':'purple','edgecolors':'black'}
X = np.dot(np.random.rand(2,2), np.random.rand(2,50)).T
pts = np.linspace(-1,1)
v1_m = 0.8076549717643662
plt.scatter(X[:,0],X[:,1],**prefs)
plt.plot(pts, [v1_m*x for x in pts], color='lightgreen')
for x,y in X:
    # slope of connecting line
    # y = mx+b
    m = -np.reciprocal(v1_m)
    b = y-m*x
    # find intersecting point
    zx = b/(v1_m-m)
    zy = v1_m*zx
    # draw line
    plt.annotate('',(zx,zy),(x,y),arrowprops=dict(linewidth=2,arrowstyle='-',color='lightblue'))
plt.show()
The problem lies in the unequal axis scaling, which makes the lines look like they are not at a right angle. Use plt.axis('equal') to get equal scaling on the x- and y-axis, here together with a square figure of equal height and width. plt.axis('scaled') works in a similar way. As pointed out by @CedricZoppolo, you should set the equal aspect ratio before plt.show(). As per the docs, setting the aspect ratio to "equal" means
same scaling from data to plot units for x and y
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(8,8))
# Your code here
plt.axis('equal')
plt.show()
Choosing a square figure is not necessary as it works also with rectangular figures as
fig = plt.figure(figsize=(8,6))
# Your code here
plt.axis('equal')
plt.show()
The blue lines not being perpendicular is due to axis not being equal.
You just need to add below line before plt.show()
plt.gca().set_aspect('equal')
Below you can see the resulted graph:
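To convince yourself that the projections are geometrically perpendicular and only look slanted, you can check the dot product with the line direction. The sketch below recomputes the projection with a unit vector along the line (a reformulation, not the code from the question) and uses a seeded generator as an assumption to make it reproducible:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)                    # seeded so the plot is reproducible
X = (rng.random((2, 2)) @ rng.random((2, 50))).T
v1_m = 0.8076549717643662                         # slope of the green line, from the question

d = np.array([1.0, v1_m]) / np.hypot(1.0, v1_m)   # unit vector along the line
Z = (X @ d)[:, None] * d                          # foot of the perpendicular for each point

# Each connecting vector is orthogonal to the line direction
print(np.abs((X - Z) @ d).max())

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], color="purple", edgecolors="black")
pts = np.linspace(-1, 1)
ax.plot(pts, v1_m * pts, color="lightgreen")
for (x, y), (zx, zy) in zip(X, Z):
    ax.plot([x, zx], [y, zy], color="lightblue", linewidth=2)
ax.set_aspect("equal")                            # same data-to-screen scale on both axes
# plt.show()  # uncomment to display the figure
```

The printed maximum is zero up to floating-point rounding, so the remaining "slant" is purely a display effect that set_aspect("equal") removes.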

Radius of matplotlib scatter plot [duplicate]

In the pyplot document for scatter plot:
matplotlib.pyplot.scatter(x, y, s=20, c='b', marker='o', cmap=None, norm=None,
vmin=None, vmax=None, alpha=None, linewidths=None,
faceted=True, verts=None, hold=None, **kwargs)
The marker size
s:
size in points^2. It is a scalar or an array of the same length as x and y.
What kind of unit is points^2? What does it mean? Does s=100 mean 10 pixel x 10 pixel?
Basically I'm trying to make scatter plots with different marker sizes, and I want to figure out what does the s number mean.
This can be a somewhat confusing way of defining the size but you are basically specifying the area of the marker. This means, to double the width (or height) of the marker you need to increase s by a factor of 4. [because A = WH => (2W)(2H)=4A]
There is a reason, however, that the size of markers is defined in this way. Because of the scaling of area as the square of width, doubling the width actually appears to increase the size by more than a factor 2 (in fact it increases it by a factor of 4). To see this consider the following two examples and the output they produce.
import matplotlib.pyplot as plt
# doubling the width of markers
x = [0,2,4,6,8,10]
y = [0]*len(x)
s = [20*4**n for n in range(len(x))]
plt.scatter(x,y,s=s)
plt.show()
gives
Notice how the size increases very quickly. If instead we have
# doubling the area of markers
x = [0,2,4,6,8,10]
y = [0]*len(x)
s = [20*2**n for n in range(len(x))]
plt.scatter(x,y,s=s)
plt.show()
gives
Now the apparent size of the markers increases roughly linearly in an intuitive fashion.
As for what exactly a 'point' is, it is fairly arbitrary for plotting purposes; you can just scale all of your sizes by a constant until they look reasonable.
Edit: (In response to comment from @Emma)
It's probably confusing wording on my part. The question asked about doubling the width of a circle, so in the first picture each circle (as we move from left to right) has double the width of the previous one; for the area this is an exponential with base 4. Similarly, in the second example each circle has double the area of the last one, which gives an exponential with base 2.
However, it is in the second example (where we are scaling area) that each step appears to make the circle twice as big to the eye. Thus, if we want a circle to appear a factor of n bigger, we would increase the area by a factor n, not the radius, so the apparent size scales linearly with the area.
Edit to visualize the comment by @TomaszGandor:
This is what it looks like for different functions of the marker size:
x = [0,2,4,6,8,10,12,14,16,18]
s_exp = [20*2**n for n in range(len(x))]
s_square = [20*n**2 for n in range(len(x))]
s_linear = [20*n for n in range(len(x))]
plt.scatter(x,[1]*len(x),s=s_exp, label='$s=2^n$', lw=1)
plt.scatter(x,[0]*len(x),s=s_square, label='$s=n^2$')
plt.scatter(x,[-1]*len(x),s=s_linear, label='$s=n$')
plt.ylim(-1.5,1.5)
plt.legend(loc='center left', bbox_to_anchor=(1.1, 0.5), labelspacing=3)
plt.show()
Because other answers here claim that s denotes the area of the marker, I'm adding this answer to clarify that this is not necessarily the case.
Size in points^2
The argument s in plt.scatter denotes the markersize**2. As the documentation says
s : scalar or array_like, shape (n, ), optional
size in points^2. Default is rcParams['lines.markersize'] ** 2.
This can be taken literally. In order to obtain a marker which is x points large, you need to square that number and give it to the s argument.
So the relationship between the markersize of a line plot and the scatter size argument is the square. In order to produce a scatter marker of the same size as a plot marker of size 10 points you would hence call scatter( .., s=100).
import matplotlib.pyplot as plt
fig,ax = plt.subplots()
ax.plot([0],[0], marker="o", markersize=10)
ax.plot([0.07,0.93],[0,0], linewidth=10)
ax.scatter([1],[0], s=100)
ax.plot([0],[1], marker="o", markersize=22)
ax.plot([0.14,0.86],[1,1], linewidth=22)
ax.scatter([1],[1], s=22**2)
plt.show()
Connection to "area"
So why do other answers and even the documentation speak about "area" when it comes to the s parameter?
Of course the units of points**2 are area units.
For the special case of a square marker, marker="s", the area of the marker is indeed directly the value of the s parameter.
For a circle, the area of the circle is area = pi/4*s.
For other markers there may not even be any obvious relation to the area of the marker.
In all cases however the area of the marker is proportional to the s parameter. This is the motivation to call it "area" even though in most cases it isn't really.
Specifying the size of the scatter markers in terms of a quantity proportional to the marker's area makes sense insofar as it is the area of the marker that is perceived when comparing different patches, rather than its side length or diameter. I.e. doubling the underlying quantity should double the area of the marker.
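The relations above can be written down as two small helpers. The function names are hypothetical and only restate the formulas from this answer (width = sqrt(s), circle area = pi/4*s):

```python
import math

def marker_width_pt(s):
    """Width in points (diameter for "o", side length for "s") of a scatter marker of size s."""
    return math.sqrt(s)

def circle_area_pt2(s):
    """Area in points^2 actually covered by a circular marker of size s."""
    return math.pi / 4 * s

for s in (100, 400):
    print(s, marker_width_pt(s), circle_area_pt2(s))
```

Going from s=100 to s=400 doubles the width and quadruples the covered area, which is the proportionality described above.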
What are points?
So far the answer to what the size of a scatter marker means is given in units of points. Points are often used in typography, where fonts are specified in points. Line widths are also often specified in points. The standard size of points in matplotlib is 72 points per inch (ppi), so 1 point is 1/72 inch.
It might be useful to be able to specify sizes in pixels instead of points. If the figure dpi is 72 as well, one point is one pixel. If the figure dpi is different (matplotlib default is fig.dpi=100),
1 point == fig.dpi/72. pixels
While the scatter marker's size in points would hence look different for different figure dpi, one could produce a 10 by 10 pixels^2 marker, which would always have the same number of pixels covered:
import matplotlib.pyplot as plt
for dpi in [72,100,144]:
    fig,ax = plt.subplots(figsize=(1.5,2), dpi=dpi)
    ax.set_title("fig.dpi={}".format(dpi))
    ax.set_ylim(-3,3)
    ax.set_xlim(-2,2)
    ax.scatter([0],[1], s=10**2,
               marker="s", linewidth=0, label="100 points^2")
    ax.scatter([1],[1], s=(10*72./fig.dpi)**2,
               marker="s", linewidth=0, label="100 pixels^2")
    ax.legend(loc=8,framealpha=1, fontsize=8)
    fig.savefig("fig{}.png".format(dpi), bbox_inches="tight")
plt.show()
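The conversion used in the loop above can be pulled out into small helpers if you need it in several places. The function names are made up for illustration; the arithmetic is just 1 point == dpi/72 pixels:

```python
def points_to_pixels(points, dpi):
    """Convert a length in points to pixels at the given figure dpi."""
    return points * dpi / 72.0

def s_for_pixel_width(width_px, dpi):
    """Scatter size s (in points^2) for a marker that is width_px pixels wide at this dpi."""
    width_pt = width_px * 72.0 / dpi
    return width_pt ** 2

for dpi in (72, 100, 144):
    print(dpi, s_for_pixel_width(10, dpi))
```

At dpi=72 a 10-pixel marker needs s=100, while at dpi=144 the same pixel width corresponds to s=25.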
If you are interested in a scatter in data units, check this answer.
You can use markersize to specify the size of the circle in plot method
import numpy as np
import matplotlib.pyplot as plt
x1 = np.random.randn(20)
x2 = np.random.randn(20)
plt.figure(1)
# you can specify the marker size two ways directly:
plt.plot(x1, 'bo', markersize=20) # blue circles with size 20
plt.plot(x2, 'ro', ms=10) # red circles with size 10; ms is just an alias for markersize
plt.show()
From here
It is the area of the marker. I mean if you have s1 = 1000 and then s2 = 4000, the relation between the radius of each circle is: r_s2 = 2 * r_s1. See the following plot:
plt.scatter(2, 1, s=4000, c='r')
plt.scatter(2, 1, s=1000 ,c='b')
plt.scatter(2, 1, s=10, c='g')
I had the same doubt when I saw the post, so I did this example then I used a ruler on the screen to measure the radii.
I also attempted to use 'scatter' initially for this purpose. After quite a bit of wasted time - I settled on the following solution.
import matplotlib.pyplot as plt
input_list = [{'x':100,'y':200,'radius':50, 'color':(0.1,0.2,0.3)}]
output_list = []
for point in input_list:
    output_list.append(plt.Circle((point['x'], point['y']), point['radius'], color=point['color'], fill=False))
ax = plt.gca()
ax.cla()
ax.set_aspect('equal')  # newer matplotlib versions no longer accept plt.gca(aspect='equal')
ax.set_xlim((0, 1000))
ax.set_ylim((0, 1000))
for circle in output_list:
    ax.add_artist(circle)
This is based on an answer to this question
If the size of the circles corresponds to the square of the parameter in s=parameter, then take the square root of each element you append to your size array, like this: s=[1, 1.414, 1.73, 2.0, 2.24]. When these values are squared as they get output to the plot, the result is [1, 2, 3, 4, 5], so the relative size increase becomes a linear progression.
Try a list comprehension: s=[numpy.sqrt(i) for i in s]

Can python/matplotlib's contourf be made to plot a limited region of data?

I am trying to make a contour plot with matplotlib's contourf. Is there a way to zoom in on a particular region of my data and leave the rest unplotted? For instance, maybe my horizontal extent goes from -1 to 101, but I just want to plot the data that's in between 0 and 100 inclusive, and I want the boundary of the plot to be drawn at 0 on the left and 100 on the right. I thought the "extent" keyword would do the job, but it is inactive when X and Y data are given. I know that I can mask the extraneous data in various ways, but that leaves the boundaries of the plot drawn beyond my region of interest, which is not what I want. I guess I could also filter or interpolate my data to my region of interest and then give that filtered data to contourf, but if I can just make contourf focus on a particular region, it would be a lot easier. Thanks.
Perhaps you are looking for plt.xlim:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-1,101,100)
y = np.linspace(-1,101,100)
x, y = np.meshgrid(x, y)
z = x*x+y*y
plt.figure()
plt.xlim(0, 50)
plt.contourf(x, y, z)
plt.show()
Above, plt.xlim(0, 50) was used instead of plt.xlim(0,100) just to emphasize the change. Without plt.xlim(0, 50) the plot looks like this:
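Note that plt.xlim only changes the view: the contour levels are still chosen from the full data set. If that matters, the filtering mentioned in the question is short to write with boolean masks. A minimal sketch, assuming a made-up grid with unit spacing so that 0 and 100 fall exactly on grid points:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 101, 103)     # hypothetical grid: -1, 0, 1, ..., 101
y = np.linspace(-1, 101, 103)
X, Y = np.meshgrid(x, y)
Z = X * X + Y * Y

in_x = (x >= 0) & (x <= 100)      # columns inside the region of interest
in_y = (y >= 0) & (y <= 100)      # rows inside the region of interest
Z_sub = Z[np.ix_(in_y, in_x)]     # rows follow y, columns follow x

fig, ax = plt.subplots()
cs = ax.contourf(x[in_x], y[in_y], Z_sub)
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
fig.colorbar(cs, ax=ax)
# plt.show()  # uncomment to display the figure
```

If the region boundaries do not coincide with grid points, the explicit set_xlim/set_ylim calls still pin the plot frame to 0 and 100.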
