I have a dataframe (size: 1008,100). the values of cells are within 0.1 and 1. I would like to visualize it in a surface plot but i cannot really figure out what the x,y and z values are gonna be. I'd like to position the surface plot like the rows(1008) is the aligned with the x axis and the columns(100) is aligned with the y axis.
Any help is much appreciated.
thanks
The x and y you are looking for can be created with meshgrid. A good way to start is to find an example on the matplotlib gallery and make changes from there. As an example:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# create a data frame with 1008x100 values between 0.1 and 1.0
xs = np.arange(0, 1008)
ys = np.arange(0,100)
zs = np.square(xs[:, np.newaxis]) * np.square(ys[np.newaxis, :])
max_value = np.max(zs)
min_value = np.min(zs)
zs = (zs - min_value) / (max_value - min_value) * 0.9 + 0.1
data = pd.DataFrame(zs)
# create X and Y with np.meshgrid and the 2D data frame
# (reusing the scratch variable xs and ys)
xs = np.arange(data.shape[0]) # [0,1,....,1007]
ys = np.arange(data.shape[1]) # [0,1,...,99]
X, Y = np.meshgrid(xs, ys)
# create a surface plot
fig = plt.figure()
ax = fig.gca(projection='3d')
ax.plot_surface(X, Y, data.T)
( note: i needed to transpose the data with .T, don't know why, sometime it is needed...)
I want to plot a piecewise function in Python where there are two variables x and y. This means I would need some kind of contour graph. In Matlab one may use
syms x y
eq1 = 0.1*(x/2)^2-0.3*(y/4)^2;
eq2 = 0.15*(x/3)^2-0.25*(y/2)^2;
ezplot(eq1,[-5 5 -10 10]);
hold on
ezplot(eq2,[-4 4 -5 5]);
where ezplot plots eq1 = 0 over xmin < x < xmax and ymin < y < ymax. Is there any (simple) equivalent function(s) in Python?
I have looked at the solutions in this post. Their problem only involves one variable x so it's not helpful in my case.
You can separate the different pieces by using numpy's masked array.
At first define your functions, the x and y ranges and their 2D versions for the contour plot:
import numpy as np
import matplotlib.pyplot as plt
from numpy.ma import masked_array as marr
def eq1(x, y):
return 0.1*(x/2)**2-0.3*(y/4)**2
def eq2(x, y):
return 0.15*(x/3)**2-0.25*(y/2)**2
x = np.linspace(-5, 5, 101)
y = np.linspace(-10, 10, 201)
xx, yy = np.meshgrid(x, y)
Now we need a 2D mask (or some, depending on the number of pieces of the piecewise function) to separate definition ranges of the different pieces of the function:
maskx = ((xx>=-4) * (xx<=4))
masky = ((yy>=-5) * (yy<=5))
mask = maskx * masky
This can be applied to different masked arrays:
res1 = marr(eq1(xx, yy), mask)
res2 = marr(eq2(xx, yy), ~mask)
Plotting a masked array leaves all the area blank where the mask is True:
fig, axs = plt.subplots(1, 2, sharey=True)
axs[0].contour(xx, yy, res1)
axs[0].contour(xx, yy, res2)
axs[0].set_title('contour')
axs[1].contourf(xx, yy, res1)
axs[1].contourf(xx, yy, res2)
axs[1].set_title('contourf')
I have data of a plot on two arrays that are stored in unsorted way, so the plot jumps from one place to another discontinuously:
I have tried one example of finding the closest point in a 2D array:
import numpy as np
def distance(pt_1, pt_2):
pt_1 = np.array((pt_1[0], pt_1[1]))
pt_2 = np.array((pt_2[0], pt_2[1]))
return np.linalg.norm(pt_1-pt_2)
def closest_node(node, nodes):
nodes = np.asarray(nodes)
dist_2 = np.sum((nodes - node)**2, axis=1)
return np.argmin(dist_2)
a = []
for x in range(50000):
a.append((np.random.randint(0,1000),np.random.randint(0,1000)))
some_pt = (1, 2)
closest_node(some_pt, a)
Can I use it somehow to "clean" my data? (in the above code, a can be my data)
Exemplary data from my calculations is:
array([[ 2.08937872e+001, 1.99020033e+001, 2.28260611e+001,
6.27711094e+000, 3.30392288e+000, 1.30312878e+001,
8.80768833e+000, 1.31238275e+001, 1.57400130e+001,
5.00278061e+000, 1.70752624e+001, 1.79131456e+001,
1.50746185e+001, 2.50095731e+001, 2.15895974e+001,
1.23237801e+001, 1.14860312e+001, 1.44268222e+001,
6.37680265e+000, 7.81485403e+000],
[ -1.19702178e-001, -1.14050879e-001, -1.29711421e-001,
8.32977493e-001, 7.27437322e-001, 8.94389885e-001,
8.65931116e-001, -6.08199292e-002, -8.51922900e-002,
1.12333841e-001, -9.88131292e-324, 4.94065646e-324,
-9.88131292e-324, 4.94065646e-324, 4.94065646e-324,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
-4.94065646e-324, 0.00000000e+000]])
After using radial_sort_line (of Joe Kington) I have received the following plot:
This is actually a problem that's tougher than you might think in general.
In your exact case, you might be able to get away with sorting by the y-values. It's hard to tell for sure from the plot.
Therefore, a better approach for somewhat circular shapes like this is to do a radial sort.
For example, let's generate some data somewhat similar to yours:
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(.2, 1.6 * np.pi)
x, y = np.cos(t), np.sin(t)
# Shuffle the points...
i = np.arange(t.size)
np.random.shuffle(i)
x, y = x[i], y[i]
fig, ax = plt.subplots()
ax.plot(x, y, color='lightblue')
ax.margins(0.05)
plt.show()
Okay, now let's try to undo that shuffle by using a radial sort. We'll use the centroid of the points as the center and calculate the angle to each point, then sort by that angle:
x0, y0 = x.mean(), y.mean()
angle = np.arctan2(y - y0, x - x0)
idx = angle.argsort()
x, y = x[idx], y[idx]
fig, ax = plt.subplots()
ax.plot(x, y, color='lightblue')
ax.margins(0.05)
plt.show()
Okay, pretty close! If we were working with a closed polygon, we'd be done.
However, we have one problem -- This closes the wrong gap. We'd rather have the angle start at the position of the largest gap in the line.
Therefore, we'll need to calculate the gap to each adjacent point on our new line and re-do the sort based on a new starting angle:
dx = np.diff(np.append(x, x[-1]))
dy = np.diff(np.append(y, y[-1]))
max_gap = np.abs(np.hypot(dx, dy)).argmax() + 1
x = np.append(x[max_gap:], x[:max_gap])
y = np.append(y[max_gap:], y[:max_gap])
Which results in:
As a complete, stand-alone example:
import numpy as np
import matplotlib.pyplot as plt
def main():
x, y = generate_data()
plot(x, y).set(title='Original data')
x, y = radial_sort_line(x, y)
plot(x, y).set(title='Sorted data')
plt.show()
def generate_data(num=50):
t = np.linspace(.2, 1.6 * np.pi, num)
x, y = np.cos(t), np.sin(t)
# Shuffle the points...
i = np.arange(t.size)
np.random.shuffle(i)
x, y = x[i], y[i]
return x, y
def radial_sort_line(x, y):
"""Sort unordered verts of an unclosed line by angle from their center."""
# Radial sort
x0, y0 = x.mean(), y.mean()
angle = np.arctan2(y - y0, x - x0)
idx = angle.argsort()
x, y = x[idx], y[idx]
# Split at opening in line
dx = np.diff(np.append(x, x[-1]))
dy = np.diff(np.append(y, y[-1]))
max_gap = np.abs(np.hypot(dx, dy)).argmax() + 1
x = np.append(x[max_gap:], x[:max_gap])
y = np.append(y[max_gap:], y[:max_gap])
return x, y
def plot(x, y):
fig, ax = plt.subplots()
ax.plot(x, y, color='lightblue')
ax.margins(0.05)
return ax
main()
Sorting the data base on their angle relative to the center as in #JoeKington 's solution might have problems with some parts of the data:
In [1]:
import scipy.spatial as ss
import matplotlib.pyplot as plt
import numpy as np
import re
%matplotlib inline
In [2]:
data=np.array([[ 2.08937872e+001, 1.99020033e+001, 2.28260611e+001,
6.27711094e+000, 3.30392288e+000, 1.30312878e+001,
8.80768833e+000, 1.31238275e+001, 1.57400130e+001,
5.00278061e+000, 1.70752624e+001, 1.79131456e+001,
1.50746185e+001, 2.50095731e+001, 2.15895974e+001,
1.23237801e+001, 1.14860312e+001, 1.44268222e+001,
6.37680265e+000, 7.81485403e+000],
[ -1.19702178e-001, -1.14050879e-001, -1.29711421e-001,
8.32977493e-001, 7.27437322e-001, 8.94389885e-001,
8.65931116e-001, -6.08199292e-002, -8.51922900e-002,
1.12333841e-001, -9.88131292e-324, 4.94065646e-324,
-9.88131292e-324, 4.94065646e-324, 4.94065646e-324,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
-4.94065646e-324, 0.00000000e+000]])
In [3]:
plt.plot(data[0], data[1])
plt.title('Unsorted Data')
Out[3]:
<matplotlib.text.Text at 0x10a5c0550>
See x values between 15 and 20 are not sorted correctly.
In [10]:
#Calculate the angle in degrees of [0, 360]
sort_index = np.angle(np.dot((data.T-data.mean(1)), np.array([1.0, 1.0j])))
sort_index = np.where(sort_index>0, sort_index, sort_index+360)
#sorted the data by angle and plot them
sort_index = sort_index.argsort()
plt.plot(data[0][sort_index], data[1][sort_index])
plt.title('Data Sorted by angle relatively to the centroid')
plt.plot(data[0], data[1], 'r+')
Out[10]:
[<matplotlib.lines.Line2D at 0x10b009e10>]
We can sort the data based on a nearest neighbor approach, but since the x and y are of very different scale, the choice of distance metrics becomes an important issue. We will just try all the distance metrics available in scipy to get an idea:
In [7]:
def sort_dots(metrics, ax, start):
dist_m = ss.distance.squareform(ss.distance.pdist(data.T, metrics))
total_points = data.shape[1]
points_index = set(range(total_points))
sorted_index = []
target = start
ax.plot(data[0, target], data[1, target], 'o', markersize=16)
points_index.discard(target)
while len(points_index)>0:
candidate = list(points_index)
nneigbour = candidate[dist_m[target, candidate].argmin()]
points_index.discard(nneigbour)
points_index.discard(target)
#print points_index, target, nneigbour
sorted_index.append(target)
target = nneigbour
sorted_index.append(target)
ax.plot(data[0][sorted_index], data[1][sorted_index])
ax.set_title(metrics)
In [6]:
dmetrics = re.findall('pdist\(X\,\s+\'(.*)\'', ss.distance.pdist.__doc__)
In [8]:
f, axes = plt.subplots(4, 6, figsize=(16,10), sharex=True, sharey=True)
axes = axes.ravel()
for metrics, ax in zip(dmetrics, axes):
try:
sort_dots(metrics, ax, 5)
except:
ax.set_title(metrics + '(unsuitable)')
It looks like standardized euclidean and mahanalobis metrics give the best result. Note that we choose a starting point of the 6th data (index 5), it is the data point this the largest y value (use argmax to get the index, of course).
In [9]:
f, axes = plt.subplots(4, 6, figsize=(16,10), sharex=True, sharey=True)
axes = axes.ravel()
for metrics, ax in zip(dmetrics, axes):
try:
sort_dots(metrics, ax, 13)
except:
ax.set_title(metrics + '(unsuitable)')
This is what happens if you choose the starting point of max. x value (index 13). It appears that mahanalobis metrics is better than standardized euclidean as it is not affected by the starting point we choose.
If we do the assumption that the data are 2D and the x axis should be in an increasing fashion, then you could:
sort the x axis data, e.g. x_old and store the result in a different variable, e.g. x_new
for each element in the x_new find its index in the x_old array
re-order the elements in the y_axis array according to the indices that you got from previous step
I would do it with python list instead of numpy array due to list.index method been more easily manipulated than the numpy.where method.
E.g. (and assume that x_old and y_old are your previous numpy variables for x and y axis respectively)
import numpy as np
x_new_tmp = x_old.tolist()
y_new_tmp = y_old.tolist()
x_new = sorted(x_new_tmp)
y_new = [y_new_tmp[x_new_tmp.index(i)] for i in x_new]
Then you can plot x_new and y_new
Since the complete simulation is to big to post it right here only the code to plot the spectrum is given (I think this is enough)
d = i.sum(axis=2)
pylab.figure(figsize=(15,15))
pylab = imshow(d)
plt.axis('tight')
pylab.show()
This spectrum is given in pixel. But I would like to have this in the units of length. I will hope you may give me some advices.
Do you mean that you want axis ticks to show your custom dimensions instead of the number of pixels in d? If yes, use the extent keyword of imshow:
import numpy
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
d = numpy.random.normal(size=(20, 40))
fig = plt.figure()
s = fig.add_subplot(1, 1, 1)
s.imshow(d, extent=(0, 1, 0, 0.5), interpolation='none')
fig.tight_layout()
fig.savefig('tt.png')
I'm guess a bit at what your problem is, so let's start by stating my interpretation/ You have some 2D data d that you plot using imshow and the units on the x and y axes are in the number of pixels. For example in the following we see the x axis labelled from 0 -> 10 for the number of data points:
import numpy as np
import matplotlib.pyplot as plt
# Generate a fake d
x = np.linspace(-1, 1, 10)
y = np.linspace(-1, 1, 10)
X, Y = np.meshgrid(x, y)
d = np.sin(X**2 + Y**2)
plt.imshow(d)
If this correctly describes your issue, then the solution is to avoid using imshow, which is designed to plot images. Firstly this will help as imshow attemps to interpolate to give a smoother image (which may hide features in the spectrum) and second because it is an image, there is no meaningful x and y data so it doesn't plot it.
The best alternative would be to use plt.pcolormesh which generate a psuedocolor plot of a 2D array and takes as arguments X and Y, which are both 2D arrays of points to which the values of d correspond.
For example:
# Generate a fake d
x = np.linspace(-1, 1, 10)
y = np.linspace(-1, 1, 10)
X, Y = np.meshgrid(x, y)
d = np.sin(X**2 + Y**2)
plt.pcolormesh(X, Y, d)
Now the x and y values correspond to the values of X and Y.
I would like to plot a 3D matrix - essentially a box of numbers, each labelled by an x, y, z triad of coordinates- by assigning a different colour to each of the x, y, z point, according to its magnitude (for example, bigger numbers in red and smaller numbers in blue).
I cannot plot sections of the matrix, I rather need to plot the whole matrix together.
If we call matrix3D my matrix, its elements are built this way:
matrix3D[x][y][z] = np.exp(-(x**2+y**2+z**2))
How can I obtain the desired plot?
EDIT: Using Mayavi2 Contour3D(), I have tried to write the following:
from mayavi import mlab
X = np.arange(0, n_x, 1)
Y = np.arange(0, n_z, 1)
Z = np.arange(0, n_z, 1)
X, Y, Z = np.meshgrid(X, Y, Z)
obj = mlab.contour3d(X, Y, Z, matrix3D, contours=4, transparent=True)
where n_x, n_y, n_z are the dimension of the 3 axes. How can I actually see and/or save the image now?
If you need to plot the whole thing I think you're best taking a look at mayavi. This will let you plot a volume and you should be able to get the results you need.
I know you said you need to plot the whole thing at once, but this might still be of some use. You can use countourf to plot like this:
import numpy as np
import matplotlib.pyplot as plt
matrix3D = np.empty((10, 10, 10))
x = np.arange(10)
y = np.arange(10)
z = np.arange(10)
matrix3D[x][y][z] = np.exp(-(x**2+y**2+z**2))
fig = plt.figure()
ax = fig.add_subplot(plt.subplot(1, 1, 1))
ax.contourf(x, y, matrix3D[:, :, 3])
plt.show()
This gives you a slice of the 3D matrix (in this example the 4th slice).