Let me begin this query by admitting that I am very new to Python. I want to create contour plots of my data in Python to automate a process that can otherwise be carried out easily in Surfer. I have thousands of such data files, and creating the plots manually would be very tedious.
The data I'm using looks as follows: a DataFrame with column headers 0, 1 and 2 and index 0, 1, ..., 279:
0 1 2
0 3 -1 -0.010700
1 4 -1 0.040100
2 5 -1 0.061000
3 6 -1 0.052000
4 7 -1 0.013100
.. .. .. ...
275 30 -9 -1.530100
276 31 -9 -1.362300
277 32 -9 -1.190200
278 33 -9 -1.083600
279 30 -10 -1.864600
[280 rows x 3 columns]
Here,
x=data[0]
y=data[1]
z=data[2]
Matplotlib's contour function requires z to be a 2D array, and this is where the confusion begins. Following several solutions to Stack Overflow questions, I did the following:
import numpy as np
import matplotlib.pyplot as plt
x=np.array(x)
y=np.array(y)
z=np.array(z)
X, Y = np.meshgrid(x, y)
import scipy.interpolate
rbf = scipy.interpolate.Rbf(x, y, z, function='cubic')
Z=rbf(X,Y)
lmin=data[2].min()
lmax=data[2].max()
progn=(lmax-lmin)/20
limit=np.arange(lmin,lmax,progn)
fig, ax = plt.subplots(figsize=(6,2)) #x ranges between 3 to 57, y -1 to -10
ax.contour(X,Y,Z,limit)
ax.set_title('Contour Plot')
plt.show()
The above code produces this plot.
However, this is not what I want. If one looks past the noisy surface lines, there are ordered contour lines underneath, which is what I actually want, as seen in the contour plot generated by Surfer here.
I'd like to reiterate that the same data was used in generating the surfer plot.
Any help in creating the desired plot shall be highly appreciated.
Thanks to @JohanC for the answer. I'd like to put his suggestion into perspective with my query.
Replacing ax.contour with ax.tricontour solves my problem, and ax.tricontourf produces the filled contours. Therefore, the last segment of my code becomes:
fig, ax = plt.subplots(figsize=(6,2)) #x ranges between 3 to 57, y -1 to -10
ax.tricontour(x,y,z,limit)  # the original 1D columns, not the meshgrid arrays X, Y, Z
ax.tricontourf(x,y,z,limit)
ax.set_title('Contour Plot')
plt.show()
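A minimal self-contained sketch of the same idea, assuming synthetic scattered points in place of the real data file (the x range 3 to 57 and y range -1 to -10 follow the comment in the code above; the surface itself is made up). tricontour and tricontourf triangulate the three 1D columns directly, so neither the meshgrid nor the Rbf step is needed:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical scattered data standing in for data[0], data[1], data[2]
rng = np.random.default_rng(0)
x = rng.uniform(3, 57, 280)
y = rng.uniform(-10, -1, 280)
z = np.sin(x / 10) + y / 5  # made-up surface

# 21 level boundaries = 20 intervals, including the maximum value
levels = np.linspace(z.min(), z.max(), 21)

fig, ax = plt.subplots(figsize=(6, 2))
filled = ax.tricontourf(x, y, z, levels)  # filled contours straight from 1D arrays
lines = ax.tricontour(x, y, z, levels, colors="k", linewidths=0.5)  # outlines on top
fig.colorbar(filled, ax=ax)
ax.set_title("Contour Plot")
fig.savefig("contour.png")
```

For thousands of files, calling fig.savefig(...) followed by plt.close(fig) inside the loop avoids keeping every figure open.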
I had a similar issue working with irregularly spaced data, on top of missing data. The suggestion someone made to use a 2D scatter plot is the perfect solution.
plt.figure(figsize=(10,10))
plt.scatter(df.doy,df.UT,c=df.TEC,s=10,cmap="jet")
plt.colorbar()
Likewise, I plotted the same data as a contour plot using plt.tricontourf and got the same result:
plt.figure(figsize=(10,10))
plt.tricontourf(df.doy,df.UT,df.TEC,100,cmap="jet")
plt.colorbar()
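With gaps in the data, a sketch assuming the gaps are NaN values in a DataFrame with the doy, UT and TEC columns used above (the values here are synthetic): drop incomplete rows first, because the triangulation needs finite values, then pass the remaining columns to tricontourf.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# Synthetic stand-in for the real df: day of year, universal time, TEC
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "doy": rng.uniform(1, 365, n),
    "UT": rng.uniform(0, 24, n),
})
df["TEC"] = 20 + 10 * np.sin(df.UT / 24 * 2 * np.pi)
df.loc[rng.choice(n, 50, replace=False), "TEC"] = np.nan  # simulate missing values

# Keep only rows where all three values are present
clean = df.dropna(subset=["doy", "UT", "TEC"])

plt.figure(figsize=(10, 10))
cs = plt.tricontourf(clean.doy, clean.UT, clean.TEC, 100, cmap="jet")
plt.colorbar(cs)
plt.savefig("tec.png")
```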
I have been trying to plot a 3D graph using matplotlib in Python. Here is a copy of the code:
import math
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
#global variable a
a = 1
#time steps = 100
T = 100
#length steps = 50
L = 50
time = [i for i in range(T)]
#checking for 50 points on the vent only
length = [i for i in range(L)]
#creating the concentration matrix for each point in the vent at each point in time. Each inner list is one time stamp; its entries are the points on the vent.
concentration = [ [i for i in range(L)] for j in range(T)]
#Boundary condition at x=0
def h(t):
    return (1/(t+(1/a))) + a
#Boundary condition at x=L
def j(t):
    return (-1/(t+(1/a))) + a
#Initial condition at time 0: the concentration is 2 at the first 5 points of the vent and 0 everywhere else
for i in range(5):
    concentration[0][i] = 2
for i in range(5,L):
    concentration[0][i] = 0
#Setting the boundary conditions
for i in range(T):
    concentration[i][0] = h(i)
    concentration[i][L-1] = j(i)
#Doing the numerical solution based on the formula in Walter Strauss
for j in range(0,L-1): #j is the position column (this loop variable shadows the function j() from here on)
    for n in range(0,T-1): #n is the time row
        concentration[n+1][j] = concentration[n][j+1] + concentration[n][j-1] + (-1)*concentration[n][j]
This part of the code runs fine, and without any errors. However, when I try to plot the data using the following instructions:
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.scatter(time, length, concentration, cmap='binary')
ax.set_xlabel('rod')
ax.set_ylabel('time')
ax.set_zlabel('concentration');
I get an error. The error reads:
ValueError Traceback (most recent call last)
<ipython-input-12-321ea987eeb6> in <module>()
53 fig = plt.figure()
54 ax = plt.axes(projection='3d')
---> 55 ax.scatter(time, length, concentration, cmap='binary')
56 ax.set_xlabel('rod')
57 ax.set_ylabel('time')
2 frames
<__array_function__ internals> in broadcast_arrays(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/numpy/lib/stride_tricks.py in _broadcast_shape(*args)
189 # use the old-iterator because np.nditer does not handle size 0 arrays
190 # consistently
--> 191 b = np.broadcast(*args[:32])
192 # unfortunately, it cannot handle 32 or more arguments directly
193 for pos in range(32, len(args), 31):
ValueError: shape mismatch: objects cannot be broadcast to a single shape
Could someone help me understand where the error is coming from? I don't think this is a computational error in the first half of the code, since it runs just fine. The dimensions of the concentration matrix and of the length and time lists all match up perfectly well. But the 3D plotting gives an error.
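If you want to create a scatter plot of every single point in your concentration list (you might consider a surface plot, such as ax.plot_wireframe(), if you want a continuous surface), you will plot 100x50 points. Your x and y values have to follow this shape (hence the error message you observed), so they cannot stay 1-dimensional with 100 and 50 entries, but need to be 2-dimensional, too. Hence, you have to create a meshgrid that turns both into 100x50 arrays. Note that np.meshgrid defaults to indexing='xy', which returns 50x100 arrays here; the plot still appears, because scatter flattens its inputs, but the points are then paired with the wrong concentration values. indexing='ij' returns 100x50 arrays whose entries line up with concentration[time][position].
After adding...
import numpy as np
time, length = np.meshgrid(time, length, indexing='ij')
... a plot is created and shown.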
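A self-contained sketch of the shape alignment, assuming placeholder concentration values (a made-up decaying profile, not the finite-difference result above) with the same (T, L) layout. The axis labels follow the data actually passed, with time on x and rod position on y:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

T, L = 100, 50
time = np.arange(T)
length = np.arange(L)
# Placeholder values laid out as concentration[time][position], shape (T, L)
concentration = np.outer(np.exp(-time / 50), np.sin(np.pi * length / (L - 1)))

# indexing="ij" gives tt[n, k] == time[n] and ll[n, k] == length[k], shape (T, L)
tt, ll = np.meshgrid(time, length, indexing="ij")

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_wireframe(tt, ll, concentration, rstride=5, cstride=5)
ax.set_xlabel("time")
ax.set_ylabel("rod")
ax.set_zlabel("concentration")
fig.savefig("rod.png")
```

The same three arrays can be passed to ax.scatter instead of ax.plot_wireframe if individual points are preferred.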
I have multidimensional compositional data (all components sum to 1 or 100). I have learned how to use three of the variables to create a 2D ternary plot.
I would like to add a fourth dimension such that my plot looks like this.
I am willing to use Python or R. Right now I am using rpy2 to create the ternary plots in Python via R, but only because that's an easy solution. If the compositional data could be transformed into 3D coordinates, a simple wireframe plot could be used.
This post shows how 3-component compositional data can be transformed into 2D coordinates so that normal plotting methods can be used. One solution would be to do the same thing in 3D.
Here is some sample Data:
c1 c2 c3 c4
0 0.082337 0.097583 0.048608 0.771472
1 0.116490 0.065047 0.066202 0.752261
2 0.114884 0.135018 0.073870 0.676229
3 0.071027 0.097207 0.070959 0.760807
4 0.066284 0.079842 0.103915 0.749959
5 0.016074 0.074833 0.044532 0.864561
6 0.066277 0.077837 0.058364 0.797522
7 0.055549 0.057117 0.045633 0.841701
8 0.071129 0.077620 0.049066 0.802185
9 0.089790 0.086967 0.083101 0.740142
10 0.084430 0.094489 0.039989 0.781093
Well, I solved this myself using a Wikipedia article, an SO post, and some brute force. Sorry for the wall of code, but you have to draw all the plot outlines, labels and so forth.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d, Axes3D
from itertools import combinations
import pandas as pd
def plot_ax(): #plot tetrahedral outline
    verts=[[0,0,0],
           [1,0,0],
           [0.5,np.sqrt(3)/2,0],
           [0.5,0.28867513, 0.81649658]]
    lines=combinations(verts,2)
    for x in lines:
        line=np.transpose(np.array(x))
        ax.plot3D(line[0],line[1],line[2],c='0')
def label_points(): #create labels of each vertex of the simplex
    a=(np.array([1,0,0,0])) # Barycentric coordinates of vertices (A or c1)
    b=(np.array([0,1,0,0])) # Barycentric coordinates of vertices (B or c2)
    c=(np.array([0,0,1,0])) # Barycentric coordinates of vertices (C or c3)
    d=(np.array([0,0,0,1])) # Barycentric coordinates of vertices (D or c4)
    labels=['a','b','c','d']
    cartesian_points=get_cartesian_array_from_barycentric([a,b,c,d])
    for point,label in zip(cartesian_points,labels):
        if 'a' in label:
            ax.text(point[0],point[1]-0.075,point[2], label, size=16)
        elif 'b' in label:
            ax.text(point[0]+0.02,point[1]-0.02,point[2], label, size=16)
        else:
            ax.text(point[0],point[1],point[2], label, size=16)
def get_cartesian_array_from_barycentric(b): #transform from "barycentric" composition space to cartesian coordinates
    verts=[[0,0,0],
           [1,0,0],
           [0.5,np.sqrt(3)/2,0],
           [0.5,0.28867513, 0.81649658]]
    #create transformation array via https://en.wikipedia.org/wiki/Barycentric_coordinate_system
    t = np.transpose(np.array(verts))
    t_array=np.array([t.dot(x) for x in b]) #apply transform to all points
    return t_array
def plot_3d_tern(df,c='1'): #use function "get_cartesian_array_from_barycentric" to plot the scatter points
    #args are df=dataframe to plot and c=scatter point color
    bary_arr=df.values
    cartesian_points=get_cartesian_array_from_barycentric(bary_arr)
    ax.scatter(cartesian_points[:,0],cartesian_points[:,1],cartesian_points[:,2],c=c)
#Create Dataset 1
np.random.seed(123)
c1=np.random.normal(8,2.5,20)
c2=np.random.normal(8,2.5,20)
c3=np.random.normal(8,2.5,20)
c4=[100-x for x in c1+c2+c3] #make sure components sum to 100
#df unnecessary but that is the format of my real data
df1=pd.DataFrame(data=[c1,c2,c3,c4],index=['c1','c2','c3','c4']).T
df1=df1/100
#Create Dataset 2
np.random.seed(1234)
c1=np.random.normal(16,2.5,20)
c2=np.random.normal(16,2.5,20)
c3=np.random.normal(16,2.5,20)
c4=[100-x for x in c1+c2+c3]
df2=pd.DataFrame(data=[c1,c2,c3,c4],index=['c1','c2','c3','c4']).T
df2=df2/100
#Create Dataset 3
np.random.seed(12345)
c1=np.random.normal(25,2.5,20)
c2=np.random.normal(25,2.5,20)
c3=np.random.normal(25,2.5,20)
c4=[100-x for x in c1+c2+c3]
df3=pd.DataFrame(data=[c1,c2,c3,c4],index=['c1','c2','c3','c4']).T
df3=df3/100
fig = plt.figure()
ax = fig.add_subplot(projection='3d') #Create a 3D axes; in recent Matplotlib, Axes3D(fig) no longer adds itself to the figure
plot_ax() #call function to draw tetrahedral outline
label_points() #label the vertices
plot_3d_tern(df1,'b') #call function to plot df1
plot_3d_tern(df2,'r') #...plot df2
plot_3d_tern(df3,'g') #...
plt.show()
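The loop in get_cartesian_array_from_barycentric applies the same linear map to every row, so it can also be written as a single matrix product. A sketch, assuming the input rows already sum to 1 (as df1/100 and the others do); the name barycentric_to_cartesian is hypothetical. The literals 0.28867513 and 0.81649658 are sqrt(3)/6 and sqrt(2/3), which put the fourth vertex above the centroid of the base triangle, giving a regular tetrahedron with unit edges:

```python
import numpy as np

# Same tetrahedron vertices as in the answer above, written exactly
VERTS = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.5, np.sqrt(3) / 2, 0.0],
    [0.5, np.sqrt(3) / 6, np.sqrt(2 / 3)],
])

def barycentric_to_cartesian(b):
    """Map rows of 4-part compositions (each row summing to 1) to 3D points."""
    b = np.asarray(b, dtype=float)
    return b @ VERTS  # row k becomes sum_i b[k, i] * VERTS[i]

corners = barycentric_to_cartesian(np.eye(4))
centroid = barycentric_to_cartesian([[0.25, 0.25, 0.25, 0.25]])
```

With the DataFrames above, cart = barycentric_to_cartesian(df1.values) followed by ax.scatter(*cart.T, c='b') would plot the same points as plot_3d_tern(df1,'b').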
The accepted answer explains how to do this in python but the question was also asking about R.
I've provided an answer in this thread on how to do this 'manually' in R.
Otherwise, you can use the klaR package directly for this:
df <- matrix(c(
0.082337, 0.097583, 0.048608, 0.771472,
0.116490, 0.065047, 0.066202, 0.752261,
0.114884, 0.135018, 0.073870, 0.676229,
0.071027, 0.097207, 0.070959, 0.760807,
0.066284, 0.079842, 0.103915, 0.749959,
0.016074, 0.074833, 0.044532, 0.864561,
0.066277, 0.077837, 0.058364, 0.797522,
0.055549, 0.057117, 0.045633, 0.841701,
0.071129, 0.077620, 0.049066, 0.802185,
0.089790, 0.086967, 0.083101, 0.740142,
0.084430, 0.094489, 0.039989, 0.781093
), byrow = TRUE, nrow = 11, ncol = 4)
# install.packages(c("klaR", "scatterplot3d"))
library(klaR)
#> Loading required package: MASS
quadplot(df)
Created on 2020-08-14 by the reprex package (v0.3.0)
I have dataframes with columns containing x,y coordinates for multiple points. One row can consist of several points.
I'm trying to find an easy way to plot lines between the points, generating one curve for each row of data.
Here is a simplified example where two lines are represented by two points each.
import pandas as pd
line1 = {'p1_x':1, 'p1_y':10, 'p2_x':2, 'p2_y':11 }
line2 = {'p1_x':2, 'p1_y':9, 'p2_x':3, 'p2_y':12 }
df = pd.DataFrame([line1,line2])
df.plot(y=['p1_y','p2_y'], x=['p1_x','p2_x'])
When plotting them, I expect line 1 to start at x=1 and line 2 to start at x=2.
Instead, the x-axis contains two value pairs, (1,2) and (2,3), and both lines have the same start and end points on the x-axis.
How do I get around this problem?
Edit:
Using matplotlib, the following hardcoded values generate the plot I'm interested in:
plt.plot([[1,2],[2,3]],[[10,9],[11,12]])
While I'm sure there is a more succinct way using pure pandas, here's a simple approach using matplotlib and some DataFrames derived from the original df (I hope I understood the question correctly).
Assumption: in df, x values are in the even-numbered columns (0, 2, ...) and y values in the odd-numbered columns (1, 3, ...).
Obtain x values
x = df.loc[:, df.columns[::2]]
x
p1_x p2_x
0 1 2
1 2 3
Obtain y values
y = df.loc[:, df.columns[1::2]]
y
p1_y p2_y
0 10 11
1 9 12
Then plot using a for loop
for i in range(len(df)):
    plt.plot(x.iloc[i,:], y.iloc[i,:])
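If the columns are not strictly alternating x/y, selecting them by name is a little more robust, and plt.plot can draw every row in one call when given 2D arrays (it draws one line per column). A sketch, assuming the columns are named pN_x/pN_y and appear in point order:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

line1 = {'p1_x': 1, 'p1_y': 10, 'p2_x': 2, 'p2_y': 11}
line2 = {'p1_x': 2, 'p1_y': 9, 'p2_x': 3, 'p2_y': 12}
df = pd.DataFrame([line1, line2])

# Select columns by suffix instead of by position; shape is (rows, points)
xs = df.filter(regex=r"_x$").to_numpy()
ys = df.filter(regex=r"_y$").to_numpy()

# Transpose so that each column holds the points of one df row
lines = plt.plot(xs.T, ys.T)
plt.savefig("lines.png")
```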
One does not need to create additional data frames. One can loop through the rows to plot these lines:
import pandas as pd
import matplotlib.pyplot as plt
line1 = {'p1_x':1, 'p1_y':10, 'p2_x':2, 'p2_y':11 }
line2 = {'p1_x':2, 'p1_y':9, 'p2_x':3, 'p2_y':12 }
df = pd.DataFrame([line1,line2])
for i in range(len(df)): # for each row:
    # plt.plot([list of Xs], [list of Ys])
    plt.plot([df.iloc[i,0],df.iloc[i,2]],[df.iloc[i,1],df.iloc[i,3]])
plt.show()
The lines will be drawn in different colors. To draw them all in the same color, add the option c='k' (or whichever color you want):
plt.plot([df.iloc[i,0],df.iloc[i,2]],[df.iloc[i,1],df.iloc[i,3]], c='k')
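I generally don't use pandas plotting because I find it rather limited. If using matplotlib is not an issue, the following code works:
from matplotlib import pyplot as plt
# each column of these transposed arrays holds one df row: (p1, p2) for x and for y
plt.plot(df[['p1_x','p2_x']].to_numpy().T, df[['p1_y','p2_y']].to_numpy().T)
plt.show()
This draws one line per row of df, however many rows there are; if each row has more points, add the extra columns to both lists.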
This question already has answers here:
Matplotlib: how to make imshow read x,y coordinates from other numpy arrays?
(4 answers)
Set Matplotlib colorbar size to match graph
(9 answers)
Closed 5 years ago.
PREAMBLE: I have seen these, but I can't figure out from the answers how to make the plot. Also, I'm new to Python and matplotlib.
I have a data file of the form
X Y Z
0.05 1 z
0.10 1 z
... ... ...
0.95 1 z
0.05 2 z
... ... ...
... ... ...
0.95 10 z
with z in [-0.02:0.5] for each of them. This results in 190 (x,y,z) points.
I acquire the data in this way
with open('tau.txt', 'r') as data_file:
    buffer = data_file.read()
data = [list(map(float, row.split('\t'))) for row in
        buffer.strip().split("\n")]
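NumPy can also parse a tab-separated numeric file in one call. In this sketch io.StringIO with made-up values stands in for open('tau.txt'); with the real file the call would be np.loadtxt('tau.txt', delimiter='\t'), plus skiprows=1 if the file starts with the X Y Z header line:

```python
import io
import numpy as np

# Stand-in for tau.txt: tab-separated X, Y, Z columns (values are made up)
sample = io.StringIO("0.05\t1\t0.1\n0.10\t1\t0.2\n0.05\t2\t0.3\n0.10\t2\t0.4\n")

data = np.loadtxt(sample, delimiter="\t")  # 2D float array, one row per line
x_data, y_data, z_data = data.T            # unpack the three columns
```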
As the link suggests, I convert them into a grid
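import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.mlab import griddata  # matplotlib.mlab.griddata was removed in Matplotlib 3.1
mu = []
alpha = []
tau = []
for elements in data:
    mu.append(elements[0])
    alpha.append(elements[1])
    tau.append(elements[2])
x_data = np.asarray(mu)
y_data = np.asarray(alpha)
z_data = np.asarray(tau)
xi = np.linspace(0.05,0.95,19)
yi = np.linspace(1,10,10)
ar = griddata(x_data,y_data,z_data,xi,yi,interp='nn')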
Then I make the plot. I would like each (x,y) coordinate to have a square centered on it, with a colorbar showing the z value:
cmap = mpl.colors.LinearSegmentedColormap.from_list('my_colormap',
['white','grey','black'],256)
img = plt.imshow(ar,interpolation='nearest',cmap =
cmap,origin='lower')
plt.colorbar(img,cmap=cmap)
I obtain this:
First of all, I want the colourbar to be the same height as the plot itself, and I can't understand how to avoid this.
Moreover, if you look at the file, you immediately see that the ranges are not right: x has to be in [0.05:0.95] and y in [1:10]. y is simply shifted by 1 (the white lines, with all z=0, should be at y=1 and not y=0), while x takes values I can't understand.
I think it is important to note that, apart from these issues, the plot is right, both in the z values and in the trend.
How can I fix my problem(s)?
imshow is meant for plotting images and matrices: by default it draws one cell per array element at integer positions 0, 1, 2, ..., regardless of your actual x and y values. That's why your x- and y-axes look that way (y runs from 0 to 9 instead of 1 to 10, and x from 0 to 18 instead of 0.05 to 0.95).
For what you are trying to do, use pcolormesh or pcolor in combination with numpy.meshgrid to get the correct x and y spacing. These functions also support non-regular grid spacings.
This page has some information on how it works.
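A minimal sketch of the pcolormesh route for the 19 x 10 grid in the question, assuming made-up z values in place of tau.txt. shading="nearest" (available since Matplotlib 3.3) centres one cell on each (x, y) point, which is the "square centered on the coordinate" the question asks for:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

xi = np.linspace(0.05, 0.95, 19)
yi = np.linspace(1, 10, 10)
X, Y = np.meshgrid(xi, yi)   # both (10, 19): rows follow y, columns follow x
Z = np.sin(3 * X) * Y / 20   # hypothetical z values in place of tau.txt

fig, ax = plt.subplots()
# One cell per data point, centred on its (x, y) coordinate
mesh = ax.pcolormesh(X, Y, Z, shading="nearest", cmap="Greys")
fig.colorbar(mesh, ax=ax)    # by default the colorbar spans the axes height
fig.savefig("tau.png")
```

If tau.txt really lists all 190 points with X varying fastest within each Y, as the excerpt suggests, z_data.reshape(10, 19) puts the values in this same layout without the griddata step.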
I have a dataset with three columns and n rows. Column 1 contains a name, column 2 value1, and column 3 value2 (rank2).
I want to make a scatter plot in which the outliers are labelled with their names.
The R commands I am using are:
tiff('scatterplot.tiff')
data<-read.table("scatterplot_data", header=T)
attach(data)
reg1<-lm(A~B)
plot(A,B,col="red")
abline(reg1)
outliers<-data[which(2^(data[,2]-data[,3]) >= 4 | 2^(data[,2]-data[,3]) <=0.25),]
text(outliers[,2], outliers[,3],labels=outliers[,1],cex=0.50)
dev.off()
and I get a figure like this:
What I want is for the labels in the lower half to be one colour and the labels in the upper half another colour, say green and red respectively.
Any suggestions or adjustments to the commands?
You already have a logical test that works to your satisfaction. Just use it in the color spec to text:
text(outliers[,2], outliers[,3],labels=outliers[,1],cex=0.50,
     col=c("red", "green")[
       1 + (2^(outliers[,2]-outliers[,3]) >= 4)] )
It's untested of course because you offered no test case, but my reasoning is this: outliers already holds only the rows that pass one of your two tests, so 1 + (test >= 4) is 2 for the rows with 2^(value1-value2) >= 4 (value2 at least 2 below value1, i.e. below the diagonal) and 1 for the rows with 2^(value1-value2) <= 0.25 (above it). Indexing c("red", "green") with that vector gives one colour per label, in the same order as the rows of outliers.
Using Python, with matplotlib to plot and numpy to fit the data. The trick with numpy is to create an index or mask to filter out the results you want.
EDIT: Want to selectively color the top and bottom outliers? It's a simple combination of both masks that we created:
import numpy as np
import matplotlib.pyplot as plt
# Create some data
N = 1000
X = np.random.normal(5,1,size=N)
Y = X + np.random.normal(0,5.5,size=N)/np.random.normal(5,.1)
NAMES = ["foo"]*N # Customize names here
# Fit a straight line (np.polyfit; the old scipy.polyfit was just an alias for it)
(a,b)=np.polyfit(X,Y,1)
# Find all points above the line
idx = (X*a + b) < Y
# Scatter according to that index
plt.scatter(X[idx],Y[idx], color='r')
plt.scatter(X[~idx],Y[~idx], color='g')
# Find top 10 outliers
err = ((X*a+b) - Y) ** 2
idx_L = np.argsort(err)[-10:]
for i in idx_L:
    plt.text(X[i], Y[i], NAMES[i])
# Color the outliers purple or black
top = idx_L[idx[idx_L]]
bot = idx_L[~idx[idx_L]]
plt.scatter(X[top],Y[top], color='purple')
plt.scatter(X[bot],Y[bot], color='black')
XF = np.linspace(0,10,1000)
plt.plot(XF, XF*a + b, 'k--')
plt.axis('tight')
plt.show()
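The question asks for coloured labels rather than coloured points. A sketch with synthetic data and hypothetical names that colours each outlier's text label by the sign of its residual from the fitted line (red above, green below, as requested):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
N = 200
X = rng.normal(5, 1, N)
Y = X + rng.normal(0, 1, N)
names = [f"pt{i}" for i in range(N)]  # hypothetical labels

a, b = np.polyfit(X, Y, 1)             # least-squares line Y ~ a*X + b
resid = Y - (a * X + b)                # positive above the line, negative below
worst = np.argsort(resid ** 2)[-10:]   # indices of the 10 largest squared residuals

plt.scatter(X, Y, s=5, color="grey")
for i in worst:
    colour = "red" if resid[i] > 0 else "green"  # upper half red, lower half green
    plt.text(X[i], Y[i], names[i], color=colour, fontsize=6)
xf = np.linspace(X.min(), X.max(), 100)
plt.plot(xf, a * xf + b, "k--")
plt.savefig("outliers.png")
```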