I calculate the eigenvalues of large matrices that depend on a parameter, and I would like to plot each eigenvalue branch in its own color. I do not have analytic functions that I could simply plot in different colors; I only have a set of points that get connected by interpolation. My problem is that the lines should intersect, but that cannot be achieved with this numerical approach, because the eigenvalues are returned in sorted order at every parameter value.
Maybe it is best explained with a small example.
import numpy as np
import numpy.linalg
import matplotlib.pyplot as plt
def mat(x):
    #return np.array([[np.sin(x), 0], [0, -np.sin(x)]])
    return np.array([[np.sin(x), 0, 0], [0, -np.sin(x), 0], [0, 0, np.sin(10*x)+x]])
fig=plt.figure()
fig.suptitle('wrong colors')
ax=fig.add_subplot(111)
# x = np.linspace(-1,1,100) # no, not that easy, the intersection points are difficult to find
x = np.sort(np.random.uniform(low=-1, high=1, size=1000))
#evs = np.zeros((2, len(x)))
evs = np.zeros((3, len(x)))
for i in range(len(x)):
    evs[:, i] = np.linalg.eigvalsh(mat(x[i]))
print(np.shape(evs))
ax.plot(x, evs[0,:], color='C0')
ax.plot(x, evs[1,:], color='C1')
ax.plot(x, evs[2,:], color='C2')
# just reference plot, this is how it should look like
fig2 = plt.figure()
fig2.suptitle('correct colors')
ax2 = fig2.add_subplot(111)
ax2.plot(x, np.sin(x), color='C0')
ax2.plot(x, -np.sin(x), color='C1')
ax2.plot(x, np.sin(10*x)+x, color='C2')
plt.show()
So what I get is this:
What I would like to have is this:
One difficulty is that the intersection point is difficult to calculate and usually not included. That's ok, I don't need the point, as the graphics is purely informative. But the colors should be shown correctly. Any suggestions how I could achieve something like this easily?
To give you an idea of where this is to be used, have a look at the following picture.
Here, the straight lines in the middle should have a different color than the curved ones.
Besides the matrix being a lot more complex, the image is created in the same way as above.
EDIT: My example was not good and clear, I have come up with one which is closer to my real problem. The matrix is numeric and I cannot diagonalize it analytically, i.e. I cannot know whether it is sin, cos or maybe some mean np.sin(2*x+0.2)+np.cos(x)**2.
Here you go:
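Just concatenate the first part of one signal with the last part of the other. This relies on the two branches crossing only at the midpoint of the x range, which holds for this 2x2 example.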
import numpy as np
import numpy.linalg
import matplotlib.pyplot as plt
def mat(x):
    return np.array([[np.sin(x), 0], [0, -np.sin(x)]])
fig=plt.figure()
fig.suptitle('wrong colors')
ax=fig.add_subplot(111)
x = np.linspace(-1,1,100)
evs = np.zeros((2, len(x)))
for i in range(len(x)):
    evs[:, i] = np.linalg.eigvalsh(mat(x[i]))
print(np.shape(evs))
ax.plot(x, np.concatenate((evs[0,:int(len(x)//2)],evs[1,int(len(x)//2):])), color='C0')
ax.plot(x, np.concatenate((evs[1,:int(len(x)//2)],evs[0,int(len(x)//2):])), color='C1')
plt.show()
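When the crossing points are not known in advance, one possible approach is to follow each branch by continuity of its eigenvector rather than by the sorted eigenvalue order. The following is only a sketch under assumptions that are not part of the question: the matrix is Hermitian, the eigenvectors change slowly between neighbouring samples, and no sample falls exactly on a degeneracy. The helper name track_branches is hypothetical, and the use of scipy.optimize.linear_sum_assignment to pair old and new eigenvectors is my own choice.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def mat(x):
    # Same 3x3 test matrix as in the question.
    return np.array([[np.sin(x), 0, 0],
                     [0, -np.sin(x), 0],
                     [0, 0, np.sin(10 * x) + x]])


def track_branches(xs, matrix_fn):
    """Return eigenvalues reordered so that each row follows one branch."""
    vals, prev_vecs = np.linalg.eigh(matrix_fn(xs[0]))
    branches = np.empty((len(vals), len(xs)))
    branches[:, 0] = vals
    for k in range(1, len(xs)):
        vals, vecs = np.linalg.eigh(matrix_fn(xs[k]))
        # overlap[i, j] = |<previous eigenvector i | current eigenvector j>|
        overlap = np.abs(prev_vecs.conj().T @ vecs)
        # Pair every previous branch with exactly one current eigenvector,
        # maximising the total overlap (hence the minus sign).
        _, order = linear_sum_assignment(-overlap)
        branches[:, k] = vals[order]
        prev_vecs = vecs[:, order]
    return branches


# An even number of samples on [-1, 1] skips x = 0, where all three
# eigenvalues of this test matrix coincide.
xs = np.linspace(-1, 1, 200)
branches = track_branches(xs, mat)
```

Each row of branches can then be passed to ax.plot(xs, branches[i], color=f'C{i}'). Close to a true degeneracy the eigenvectors are not uniquely defined, so the assignment there may still be arbitrary.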
I have two lists as below:
latt=[42.0,41.978567980875397,41.96622693388357,41.963791391892457,...,41.972407378075879]
lont=[-66.706920989908909,-66.703116557977069,-66.707351643324543,...-66.718218142021925]
Now I want to plot these as a line, treating every 10 of the 'latt' and 'lont' records as one period and giving each period a unique color.
What should I do?
There are several different ways to do this. The "best" approach will depend mostly on how many line segments you want to plot.
If you're just going to be plotting a handful (e.g. 10) line segments, then just do something like:
import numpy as np
import matplotlib.pyplot as plt
def uniqueish_color():
    """There're better ways to generate unique colors, but this isn't awful."""
    return plt.cm.gist_ncar(np.random.random())
xy = (np.random.random((10, 2)) - 0.5).cumsum(axis=0)
fig, ax = plt.subplots()
for start, stop in zip(xy[:-1], xy[1:]):
    x, y = zip(start, stop)
    ax.plot(x, y, color=uniqueish_color())
plt.show()
If you're plotting something with a million line segments, though, this will be terribly slow to draw. In that case, use a LineCollection. E.g.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
xy = (np.random.random((1000, 2)) - 0.5).cumsum(axis=0)
# Reshape things so that we have a sequence of:
# [[(x0,y0),(x1,y1)],[(x0,y0),(x1,y1)],...]
xy = xy.reshape(-1, 1, 2)
segments = np.hstack([xy[:-1], xy[1:]])
fig, ax = plt.subplots()
coll = LineCollection(segments, cmap=plt.cm.gist_ncar)
coll.set_array(np.random.random(len(segments)))  # one color value per segment
ax.add_collection(coll)
ax.autoscale_view()
plt.show()
For both of these cases, we're just drawing random colors from the "gist_ncar" colormap. Have a look at the colormaps here (gist_ncar is about 2/3 of the way down): http://matplotlib.org/examples/color/colormaps_reference.html
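To color every block of 10 records rather than every single segment, the same idea can be applied per period. This is a minimal sketch, not a definitive implementation: the latt and lont arrays below are made-up stand-ins, because the real lists are elided in the question, and the choice to repeat the first point of the next period (so that the line stays connected) is my own assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data; the real lists are elided in the question.
rng = np.random.default_rng(0)
latt = 42.0 + rng.normal(scale=0.01, size=35).cumsum()
lont = -66.7 + rng.normal(scale=0.01, size=35).cumsum()

period = 10
n_periods = int(np.ceil(len(latt) / period))
# One evenly spaced color per period, taken from the gist_ncar colormap.
colors = plt.cm.gist_ncar(np.linspace(0.05, 0.95, n_periods))

fig, ax = plt.subplots()
for k in range(n_periods):
    start = k * period
    # Include one extra point so that consecutive periods stay connected.
    stop = min(start + period + 1, len(latt))
    ax.plot(lont[start:stop], latt[start:stop], color=colors[k])
```

Call plt.show() afterwards to display the figure. For a very large number of periods, a LineCollection would probably be the faster choice.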
Copied from this example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib.colors import ListedColormap, BoundaryNorm
x = np.linspace(0, 3 * np.pi, 500)
y = np.sin(x)
z = np.cos(0.5 * (x[:-1] + x[1:])) # first derivative
# Create a colormap for red, green and blue and a norm to color
# f' < -0.5 red, f' > 0.5 blue, and the rest green
cmap = ListedColormap(['r', 'g', 'b'])
norm = BoundaryNorm([-1, -0.5, 0.5, 1], cmap.N)
# Create a set of line segments so that we can color them individually
# This creates the points as a N x 1 x 2 array so that we can stack points
# together easily to get the segments. The segments array for line collection
# needs to be numlines x points per line x 2 (x and y)
points = np.array([x, y]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
# Create the line collection object, setting the colormapping parameters.
# Have to set the actual values used for colormapping separately.
lc = LineCollection(segments, cmap=cmap, norm=norm)
lc.set_array(z)
lc.set_linewidth(3)
fig1 = plt.figure()
plt.gca().add_collection(lc)
plt.xlim(x.min(), x.max())
plt.ylim(-1.1, 1.1)
plt.show()
See the answer here to generate the "periods", and then use the matplotlib scatter function as @tcaswell mentioned. Using the plot.hold function you can plot each period, and the colors will increment automatically. (In Matplotlib 3.0 and later, hold has been removed; repeated plotting calls on the same axes overlay by default.)
Cribbing the color choice off of @JoeKington,
import numpy as np
import matplotlib.pyplot as plt
def uniqueish_color(n):
    """There're better ways to generate unique colors, but this isn't awful."""
    return plt.cm.gist_ncar(np.random.random(n))
plt.scatter(latt, lont, c=uniqueish_color(len(latt)))
You can do this with scatter.
I have been searching for a short way to use pyplot's line plot to show a time series coloured by a label feature, without using scatter because of the number of data points.
I came up with the following workaround:
plt.plot(np.where(df["label"]==1, df["myvalue"], None), color="red", label="1")
plt.plot(np.where(df["label"]==0, df["myvalue"], None), color="blue", label="0")
plt.legend()
The drawback is you are creating two different line plots so the connection between the different classes is not shown. For my purposes it is not a big deal. It may help someone.
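If the connection between the classes matters, one option is to combine this with the LineCollection approach from the other answer and color each segment by the label of its starting point. The sketch below is only an illustration: values and labels are hypothetical stand-ins for df["myvalue"] and df["label"], and the blue/red mapping mirrors the colors used above.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib.colors import ListedColormap

# Hypothetical stand-ins for df["myvalue"] and df["label"].
rng = np.random.default_rng(1)
values = rng.normal(size=200).cumsum()
labels = (np.arange(200) // 25) % 2  # alternating blocks of 0 and 1

# Build one two-point segment per pair of neighbouring samples.
t = np.arange(len(values))
points = np.column_stack([t, values]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

# Color each segment by the label of its starting point: 0 -> blue, 1 -> red.
lc = LineCollection(segments, cmap=ListedColormap(["blue", "red"]))
lc.set_array(labels[:-1].astype(float))
lc.set_clim(0, 1)

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale_view()
```

This draws a single connected line, so the transitions between classes remain visible. Call plt.show() to display it.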
I'm very new to Python, and I was trying to use this problem as a learning exercise, but I can't get anywhere with it.
What I want to do is to show that for two random variables that come uniformly distributed within a 200ns window, the probability of them arriving within 7ns of each other is ~5%:
X, Y ~ U[0, 200]
Z = X - Y
P(|Z| < 7) = ?
I wanted to know the most analytical way of doing this, because I thought Python might have some useful libraries to help, and because if I wanted to do a stochastic simulation I would do it in C++ ROOT which would take me far less time!
The way that I've done it is below, but it's different from what I've calculated analytically. Can anyone suggest a better/more accurate way of solving the same problem?
Thanks a lot!
from scipy.stats import uniform, expon
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(1, 1)
a, b = 0, 200
size = 1000000
# Generating uniform distribution
uniform_distribution = uniform(loc=a, scale=b)
x = uniform_distribution.rvs(size=size)
y = uniform_distribution.rvs(size=size)
z=x-y
ax.hist(z)
zsmall=[z for i in z if abs(i)<7]
n=len(zsmall)
print("probability = ",n/size)
Edit: added some code to improve the figure.
Your code is fine, and the results do agree with the analytically derived value. To see this more readily, I have modified your code slightly, scaling the domains of X and Y down to [0, 1] and computing P(|Z| < 7/200), so that this is still equivalent to your original question.
from scipy.stats import uniform
import matplotlib.pyplot as plt
a, b = 0, 1
size = 1000000
# generate uniformly distributed x and y
uniform_distribution = uniform(loc=a, scale=b)
x = uniform_distribution.rvs(size=size)
y = uniform_distribution.rvs(size=size)
z = x - y
# set up figure
fig, ax = plt.subplots(figsize = [16, 8])
ax.set_aspect('equal')
ax.set_xlim([-1, 1])
ax.set_ylim([0, 1])
ax.set_xticks([-1, 0, 1])
ax.set_xticklabels([-1, 0, 1], size=20)
ax.set_yticks([0, 1])
ax.set_yticklabels([0, 1], size=20)
# plot histogram with y-axis scaled to show density,
# increased bin number for better resolution
ax.hist(z, density=True, bins=200, alpha=0.5)
# plot lines around the area we want to estimate
plt.axvline(-7/200, color='black', linestyle='--')
ax.annotate('x = -7/200', xy=(-7/200, 0.4), xytext=(-0.05, 0.4), fontsize=16, ha='right')
plt.axvline( 7/200, color='black', linestyle='--')
ax.annotate('x = 7/200', xy=(7/200, 0.2), xytext=(0.05, 0.2), fontsize=16)
# plot theoretical probability density function
ax.plot([-1, 0], [0, 1], color='gray', linestyle=':')
ax.plot([ 0, 1], [1, 0], color='gray', linestyle=':')
zsmall = [1 for i in z if abs(i) < 7/200]
n = len(zsmall)
print("probability =", n/size)
probability = 0.06857
As you can see, this approaches the theoretically expected triangular distribution (gray dotted lines) pretty closely already. For comparison, we can calculate the theoretical probability, which is the area between the dashed lines and below the dotted lines. We can compute this as the area of the whole rectangle between the dashed lines minus the area of the square consisting of the two small triangles above the dotted lines:
2*(7/200) - (7/200)**2
= 0.068775
So the theoretical value does agree with your simulation result.
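The same comparison can be done on the original [0, 200] scale with a vectorized estimate. This is a small sketch; the helper name coincidence_probability is hypothetical, and the closed form 2r - r**2 with r = delta/window is just the area computed above, written for a general window.

```python
import numpy as np


def coincidence_probability(window, delta):
    """P(|X - Y| < delta) for independent X, Y ~ U[0, window], delta <= window."""
    r = delta / window
    return 2 * r - r**2


rng = np.random.default_rng(seed=0)
size = 1_000_000
x = rng.uniform(0, 200, size)
y = rng.uniform(0, 200, size)

# Fraction of sampled pairs that arrive within 7 ns of each other.
estimate = np.mean(np.abs(x - y) < 7)
exact = coincidence_probability(200, 7)
print("exact =", exact, "estimate =", estimate)
```

The boolean mask avoids the Python-level list comprehension, which matters mostly for speed at this sample size.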
A good way to show the concentration of the data points in a plot is using a scatter plot with non-unit transparency. As a result, the areas with more concentration would appear darker.
# this is synthetic example
N = 10000 # a very very large number
x = np.random.normal(0, 1, N)
y = np.random.normal(0, 1, N)
plt.scatter(x, y, marker='.', alpha=0.1) # an area full of dots, darker wherever the number of dots is more
which gives something like this:
Now imagine we want to emphasize the outliers instead. The situation is almost reversed: we want a plot in which the less concentrated areas are bolder. (There might be a trick that works for my simple example, but imagine a general case where the distribution of the points is not known in advance, or where it is difficult to define a rule for transparency or color weight.)
I was wondering whether there is anything as handy as alpha that is designed specifically for this job. Other ideas for emphasizing outliers are also welcome.
UPDATE: This is what happens when more than one data point is scattered on the same area:
I'm looking for something like the picture below: the more data points, the less transparent the marker.
To answer the question: You can calculate the density of points, normalize it and encode it in the alpha channel of a colormap.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
# this is synthetic example
N = 10000 # a very very large number
x = np.random.normal(0, 1, N)
y = np.random.normal(0, 1, N)
fig, (ax,ax2) = plt.subplots(ncols=2, figsize=(8,5))
ax.scatter(x, y, marker='.', alpha=0.1)
values = np.vstack([x,y])
kernel = stats.gaussian_kde(values)
weights = kernel(values)
weights = weights/weights.max()
cols = plt.cm.Blues([0.8, 0.5])
cols[:,3] = [1., 0.005]
cmap = LinearSegmentedColormap.from_list("", cols)
ax2.scatter(x, y, c=weights, s = 1, marker='.', cmap=cmap)
plt.show()
Left is the original image, right is the image where higher density points have a lower alpha.
Note, however, that this is undesirable, because transparent high-density points are indistinguishable from low-density regions. That is, in the right image it really looks as though there is a hole in the middle of your distribution.
Clearly, a solution with a colormap which does not contain the color of the background is a lot less confusing to the reader.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# this is synthetic example
N = 10000 # a very very large number
x = np.random.normal(0, 1, N)
y = np.random.normal(0, 1, N)
fig, ax = plt.subplots(figsize=(5,5))
values = np.vstack([x,y])
kernel = stats.gaussian_kde(values)
weights = kernel(values)
weights = weights/weights.max()
ax.scatter(x, y, c = weights, s=9, edgecolor="none", marker='.', cmap="magma")
plt.show()
Here, low-density points are still emphasized by a darker color, but at the same time it is clear to the viewer that the highest density lies in the middle.
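For much larger N, evaluating gaussian_kde at every point can become slow. A cruder alternative, which swaps the kernel density estimate for a plain 2D histogram, is sketched below. This is not what the code above does: the helper name histogram_density and the default of 50 bins are assumptions made for illustration.

```python
import numpy as np


def histogram_density(x, y, bins=50):
    """Approximate per-point density by the count of the 2D histogram bin."""
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # Locate the bin of every point; clip so that points lying on the
    # outermost right edge fall into the last bin, as histogram2d does.
    ix = np.clip(np.digitize(x, xedges) - 1, 0, counts.shape[0] - 1)
    iy = np.clip(np.digitize(y, yedges) - 1, 0, counts.shape[1] - 1)
    weights = counts[ix, iy]
    return weights / weights.max()


rng = np.random.default_rng(0)
N = 10000
x = rng.normal(0, 1, N)
y = rng.normal(0, 1, N)
weights = histogram_density(x, y)
```

The resulting weights can be passed to ax.scatter(x, y, c=weights, cmap="magma") in the same way as the KDE values. The result is blockier than a KDE, so the bin count may need tuning.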
As far as I know, there is no "direct" solution to this quite interesting problem. As a workaround, I propose this solution:
import numpy as np
import matplotlib.pyplot as plt
N = 10000 # a very very large number
x = np.random.normal(0, 1, N)
y = np.random.normal(0, 1, N)
fig = plt.figure() # create figure directly to be able to extract the bg color
ax = fig.gca()
ax.scatter(x, y, marker='.') # plot all markers without alpha
bgcolor = ax.get_facecolor() # extract current background color
# plot with alpha, "overwriting" dense points
ax.scatter(x, y, marker='.', color=bgcolor, alpha=0.2)
This plots all points without transparency and then plots all points again with some transparency, "overwriting" the points in the densest regions the most. Setting alpha to a higher value puts more emphasis on the outliers, and vice versa.
Of course the color of the second scatter plot needs to be adjusted to your background color. In my example this is done by extracting the background color and setting it as the new scatter plot's color.
This solution is independent of the kind of distribution. It only depends on the density of the points. However it produces twice the amount of points, thus may take slightly longer to render.
Reproducing the edit in the question, my solution shows exactly the desired behavior. The leftmost point is a single point and is the darkest; the rightmost consists of three points and has the lightest color.
x = [0, 1, 1, 2, 2, 2]
y = [0, 0, 0, 0, 0, 0]
fig = plt.figure() # create figure directly to be able to extract the bg color
ax = fig.gca()
ax.scatter(x, y, marker='.', s=10000) # plot all markers without alpha
bgcolor = ax.get_facecolor() # extract current background color
# plot with alpha, "overwriting" dense points
ax.scatter(x, y, marker='.', color=bgcolor, alpha=0.2, s=10000)
Assuming that the distributions are centered around a specific point (e.g. (0,0) in this case), I would use this:
import numpy as np
import matplotlib.pyplot as plt
N = 500
# 0 mean, 0.2 std
x = np.random.normal(0,0.2,N)
y = np.random.normal(0,0.2,N)
# calculate the distance to (0, 0).
color = np.sqrt((x-0)**2 + (y-0)**2)
plt.scatter(x , y, c=color, cmap='plasma', alpha=0.7)
plt.show()
Results:
I don't know if this helps, because it's not exactly what you asked for, but you can simply color the points whose values exceed some threshold. For example:
import numpy as np
import matplotlib.pyplot as plt
num = 100
threshold = 80
x = np.linspace(0, 100, num=num)
y = np.random.normal(size=num)*45
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(x[np.abs(y) < threshold], y[np.abs(y) < threshold], color="#00FFAA")
ax.scatter(x[np.abs(y) >= threshold], y[np.abs(y) >= threshold], color="#AA00FF")
plt.show()
I am trying to plot a function over a 2D domain using contourf. Unfortunately, my first attempt did not work out very well. There was a region in the plot that was unexpectedly not covered by any contours. For debugging purposes, I have reduced the problem to the smallest dataset I could find that still reveals the issue with missing filled contours:
import matplotlib.pyplot as plt
import numpy as np
v = np.array([0, 1, 2, 3])
x, y = np.meshgrid(v, v)
z = np.array([[5.5e-14, 5.5e-14, 5.5e-14, 5.5e-14],
              [2e-13, 2e-13, 2e-13, 2e-13],
              [2.2e-13, 2.2e-13, 2.2e-13, 2.2e-13],
              [0, 0, 0, 0]])
fig, ax = plt.subplots()
cntr = ax.contourf(x, y, z)
fig.colorbar(cntr, ax=ax)
plt.show()
This gives the following plot:
As you can see, there are missing contours from y = 1.5 to approximately y = 2.0.
Another strange thing I observed: If I normalize the z matrix by multiplying by e.g. 1e14 before plotting it works fine.
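For reference, the rescaling workaround described above can be written as follows. This is only a sketch of that observation, not an explanation of the underlying cause; the variable name scale and the colorbar label are my own additions.

```python
import numpy as np
import matplotlib.pyplot as plt

v = np.array([0, 1, 2, 3])
x, y = np.meshgrid(v, v)
z = np.array([[5.5e-14, 5.5e-14, 5.5e-14, 5.5e-14],
              [2e-13, 2e-13, 2e-13, 2e-13],
              [2.2e-13, 2.2e-13, 2.2e-13, 2.2e-13],
              [0, 0, 0, 0]])

scale = 1e14  # factor mentioned in the question
fig, ax = plt.subplots()
# Plot the rescaled data so that the values are of order 1 to 10.
cntr = ax.contourf(x, y, z * scale)
cbar = fig.colorbar(cntr, ax=ax)
# State the scaling in the label so that the colorbar can still be read correctly.
cbar.set_label("z multiplied by 1e14")
```

Call plt.show() to display the figure.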