How can I plot this data? Its file extension is .xvg - Python

I am new to Python. I have tried this script, but it does not work.
It gives me this error:
Traceback (most recent call last):
File "temp.py", line 11, in <module>
y = [row.split(' ')[1] for row in data]
File "temp.py", line 11, in <listcomp>
y = [row.split(' ')[1] for row in data]
IndexError: list index out of range
The script is:
import numpy as np
import matplotlib.pyplot as plt
with open("data.xvg") as f:
    data = f.read()
data = data.split('\n')
x = [row.split(' ')[0] for row in data]
y = [row.split(' ')[1] for row in data]
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("Plot title...")
ax1.set_xlabel('your x label..')
ax1.set_ylabel('your y label...')
ax1.plot(x,y, c='r', label='the data')
leg = ax1.legend()
plt.show()
The data is:
0.000000 299.526978
1.000000 4.849206
2.000000 0.975336
3.000000 0.853160
4.000000 0.767092
5.000000 0.995595
6.000000 0.976332
7.000000 1.111898
8.000000 1.251045
9.000000 1.346720
10.000000 1.522089
11.000000 1.705517
12.000000 1.822599
13.000000 1.988752
14.000000 2.073061
15.000000 2.242703
16.000000 2.370366
17.000000 2.530256
18.000000 2.714863
19.000000 2.849218
20.000000 3.033373
21.000000 3.185251
22.000000 3.282328
23.000000 3.431681
24.000000 3.668798
25.000000 3.788214
26.000000 3.877117
27.000000 4.032224
28.000000 4.138007
29.000000 4.315784
30.000000 4.504521
31.000000 4.668567
32.000000 4.787213
33.000000 4.973860
34.000000 5.128736
35.000000 5.240545
36.000000 5.392560
37.000000 5.556009
38.000000 5.709351
39.000000 5.793169
40.000000 5.987224
41.000000 6.096015
42.000000 6.158622
43.000000 6.402116
44.000000 6.533816
45.000000 6.711002
46.000000 6.876793
47.000000 7.104519
48.000000 7.237456
49.000000 7.299352
50.000000 7.471975
51.000000 7.691428
52.000000 7.792002
53.000000 7.928269
54.000000 8.014977
55.000000 8.211984
56.000000 8.330894
57.000000 8.530197
58.000000 8.690166
59.000000 8.808934
60.000000 8.996209
61.000000 9.104818
62.000000 9.325309
63.000000 9.389288
64.000000 9.576900
65.000000 9.761865
66.000000 9.807437
67.000000 10.027261
68.000000 10.129250
69.000000 10.392891
70.000000 10.497618
71.000000 10.627769
72.000000 10.811770
73.000000 11.119184
74.000000 11.181286
75.000000 11.156842
76.000000 11.350290
77.000000 11.493779
78.000000 11.720265
79.000000 11.700112
80.000000 11.939404
81.000000 12.293530
82.000000 12.267791
83.000000 12.394929
84.000000 12.545286
85.000000 12.784669
86.000000 12.754122
87.000000 13.129798
88.000000 13.166340
89.000000 13.389514
90.000000 13.436648
91.000000 13.647285
92.000000 13.722875
93.000000 13.992217
94.000000 14.167837
95.000000 14.320843
96.000000 14.450310
97.000000 14.515556
98.000000 14.598526
99.000000 14.807360
100.000000 14.982592
101.000000 15.312892
102.000000 15.280009

If it is an xvg file from GROMACS, it probably has some comment lines starting with #, so without editing the file you can do:
x,y = np.loadtxt("file.xvg",comments="#",unpack=True)
plt.plot(x,y)
unpack=True makes the columns come out as individual arrays that are set to x and y on the left-hand side. Of course you could also parse the comments to get the labels and legends.
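The last remark, parsing the header to recover labels, can be sketched with the standard library alone. This assumes the file follows the usual GROMACS/Grace layout, where `#` lines are comments and `@` lines carry directives such as `title`, `xaxis label` and `yaxis label`; the sample text and the `parse_xvg` helper are hypothetical and only handle two-column data.

```python
import re

# Hypothetical sample; a real GROMACS file may carry more directives.
SAMPLE = '''# This file was created by a GROMACS tool
@    title "RMSD"
@    xaxis  label "Time (ps)"
@    yaxis  label "RMSD (nm)"
@TYPE xy
0.000000 299.526978
1.000000 4.849206

2.000000 0.975336
'''

# Matches the three directives we care about and captures the quoted text.
LABEL_RE = re.compile(r'^@\s*(title|xaxis\s+label|yaxis\s+label)\s+"(.*)"\s*$')


def parse_xvg(text):
    """Return (labels, x, y) from the text of a two-column .xvg file."""
    labels, x, y = {}, [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("@"):
            match = LABEL_RE.match(line)
            if match:
                key = match.group(1).split()[0]  # 'title', 'xaxis' or 'yaxis'
                labels[key] = match.group(2)
            continue  # other directives are ignored in this sketch
        cols = line.split()
        x.append(float(cols[0]))
        y.append(float(cols[1]))
    return labels, x, y


labels, x, y = parse_xvg(SAMPLE)
```

The resulting dictionary could then feed the plot, for example `ax1.set_xlabel(labels.get("xaxis", ""))`.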

Try the following. You need to convert each value to a float before appending it:
import numpy as np
import matplotlib.pyplot as plt
x, y = [], []
with open("data.xvg") as f:
    for line in f:
        cols = line.split()
        if len(cols) == 2:
            x.append(float(cols[0]))
            y.append(float(cols[1]))
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("Plot title...")
ax1.set_xlabel('your x label..')
ax1.set_ylabel('your y label...')
ax1.plot(x,y, c='r', label='the data')
leg = ax1.legend()
plt.show()
This would give you a graph looking like:
The error probably occurs because there is an empty line somewhere in your file. Checking that each line splits into exactly 2 entries ensures that you do not get an index out of range error.
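A likely source of that empty line is the newline at the end of the file. The sketch below uses a two-row stand-in for the file contents (an assumption, not the real file) to show how `split('\n')` leaves an empty trailing row and why indexing it fails.

```python
# Stand-in for f.read(); the text ends with a newline, as most files do.
text = "0.000000 299.526978\n1.000000 4.849206\n"

rows = text.split("\n")
# The trailing newline leaves an empty string as the last element.
last_fields = rows[-1].split(" ")

try:
    last_fields[1]  # only index 0 exists, so this raises
    error = None
except IndexError as exc:
    error = exc

# str.splitlines() does not produce that trailing empty row.
clean_rows = text.splitlines()
```

Using `splitlines()` or iterating over the file object directly avoids the empty row, although skipping lines that do not have two columns remains the more defensive check.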

You can use the Python library or the Windows/Linux executable from the GMXvg package to plot XVG files.
It discovers XVG files and converts them to JPG or any other format supported by Python's matplotlib.

Related

How do I plot a beautiful scatter plot with linear regression?

I want to make a beautiful scatter plot with linear regression line using the data given below. I was able to create a scatter plot but am not satisfied with how it looks. Additionally, I want to plot a linear regression line on the data.
My data and code are below:
x y
117.00 111.0
107.00 110.0
77.22 78.0
112.00 95.4
149.00 150.0
121.00 121.0
121.61 120.0
111.54 140.0
73.00 72.0
70.47 000.0
66.3 72.0
113.00 131.0
81.00 81.0
72.00 00.0
74.20 98.0
84.24 90.0
86.60 88.0
99.00 97.0
90.00 102.0
85.00 000.0
138.0 135.0
96.00 93.0
import numpy as np
import matplotlib.pyplot as plt
print(plt.style.available)
from sklearn.linear_model import LinearRegression
plt.style.use('ggplot')
data = np.loadtxt('test_data', dtype=float, skiprows=1,usecols=(0,1))
x=data[:,0]
y=data[:,1]
plt.xlim(20,200)
plt.ylim(20,200)
plt.scatter(x,y, marker="o",)
plt.show()
Please check the snippet below. You can use numpy.polyfit() with degree 1 to calculate the slope m and y-intercept b of the line y = m*x + b.
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
data = np.loadtxt('test_data.txt', dtype=float, skiprows=1,usecols=(0,1))
x=data[:,0]
y=data[:,1]
plt.xlim(20,200)
plt.ylim(20,200)
plt.scatter(x,y, marker="o",)
m, b = np.polyfit(x, y, 1)
plt.plot(x, m*x + b)
plt.show()
Edit1:
Based on your comment, I added more points; the graph now looks like this, and the line passes through the points.
To make the points transparent, you can use the alpha argument. It takes a value between 0 and 1. Here I set alpha=0.5:
plt.scatter(x,y, marker="o",alpha=0.5)
Edit2: Based on @tmdavison's suggestion
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
data = np.loadtxt('test_data.txt', dtype=float, skiprows=1,usecols=(0,1))
x=data[:,0]
y=data[:,1]
x2 = np.arange(0, 200)
plt.xlim(20,200)
plt.ylim(20,200)
plt.scatter(x,y, marker="o",)
m, b = np.polyfit(x, y, 1)
plt.plot(x2, m*x2 + b)
plt.show()
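To check what `np.polyfit` returns for a degree-1 fit, here is a small sketch on synthetic points that lie exactly on a known line. The data are made up for illustration and are not the question's values.

```python
import numpy as np

# Synthetic points lying exactly on y = 2*x + 1 (not the question's data).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Degree-1 fit: coefficients come back highest power first,
# so the slope is first and the intercept second.
m, b = np.polyfit(x, y, 1)

# Evaluate the line on an evenly spaced grid rather than at the
# scattered x values, as in the Edit2 snippet.
x_line = np.linspace(x.min(), x.max(), 50)
y_line = m * x_line + b
```

Drawing the line over an evenly spaced grid keeps it a single clean segment even when the original x values are unsorted.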

error in dataframe loc when applying these two dataframes

I'm trying to put these two dataframes (data2 and trades) together to make it look like this https://i.stack.imgur.com/pR8bW.png:
data2:
Close
2015-08-28 113.290001
2015-08-31 112.760002
2015-09-01 107.720001
2015-09-02 112.339996
2015-09-03 110.370003
2015-09-04 109.269997
2015-09-08 112.309998
2015-09-09 110.150002
2015-09-10 112.570000
2015-09-11 114.209999
trades:
Trades
2015-08-28 3.0
2015-08-31 3.0
2015-09-01 3.0
2015-09-02 3.0
2015-09-03 2.0
code:
import matplotlib.pyplot as plt
fig = plt.figure()
ax1 = fig.add_subplot(111, ylabel='Portfolio value in $')
data2["Close"].plot(ax=ax1, lw=2.)
ax1.plot(data2.loc[trades.Trades == 2.0].index, data2.total[trades.Trades == 2.0],
'^', markersize=10, color='m')
ax1.plot(data2.loc[trades.Trades == 3.0].index,
data2.total[trades.Trades == 3.0],
'v', markersize=10, color='k')
plt.show()
But this gives the following error:
---------------------------------------------------------------------------
IndexingError Traceback (most recent call last)
<ipython-input-38-9cde686354a8> in <module>()
7 data2["Close"].plot(ax=ax1, lw=2.)
8
----> 9 ax1.plot(data2.loc[trades.Trades == 2.0].index, data2.total[trades.Trades == 2.0],
10 '^', markersize=10, color='m')
11 ax1.plot(data2.loc[trades.Trades == 3.0].index,
3 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in check_bool_indexer(index, key)
2316 if mask.any():
2317 raise IndexingError(
-> 2318 "Unalignable boolean Series provided as "
2319 "indexer (index of the boolean Series and of "
2320 "the indexed object do not match)."
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
The indexes of the two data frames are different. I've taken the approach of defining masks for the data2 dataframe that are based on values in the trades dataframe, and it works.
Additionally, your sample code referred to total, which does not exist. I've updated it to use Close.
import pandas as pd
import io
import matplotlib.pyplot as plt
data2 = pd.read_csv(io.StringIO(""" Close
2015-08-28 113.290001
2015-08-31 112.760002
2015-09-01 107.720001
2015-09-02 112.339996
2015-09-03 110.370003
2015-09-04 109.269997
2015-09-08 112.309998
2015-09-09 110.150002
2015-09-10 112.570000
2015-09-11 114.209999"""), sep=r"\s+")
trades = pd.read_csv(io.StringIO(""" Trades
2015-08-28 3.0
2015-08-31 3.0
2015-09-01 3.0
2015-09-02 3.0
2015-09-03 2.0"""), sep=r"\s+")
# make sure it's dates
data2 = data2.reset_index().assign(index=lambda x: pd.to_datetime(x["index"])).set_index("index")
trades = trades.reset_index().assign(index=lambda x: pd.to_datetime(x["index"])).set_index("index")
fig = plt.figure()
ax1 = fig.add_subplot(111, ylabel='Portfolio value in $')
data2["Close"].plot(ax=ax1, lw=2.)
# select only the dates on which trades has the given value
mask2 = data2.index.isin(trades.index[trades.Trades == 2.0])
mask3 = data2.index.isin(trades.index[trades.Trades == 3.0])
ax1.plot(data2.loc[mask2].index, data2.Close[mask2],
'^', markersize=10, color='m')
ax1.plot(data2.loc[mask3].index,
data2.Close[mask3],
'v', markersize=10, color='k')
plt.show()
output
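Another way to deal with the mismatched indexes is to align the boolean Series to data2 before indexing. The sketch below rebuilds a shortened copy of the two frames (first six rows of data2, all five rows of trades) and uses `Series.reindex` with `fill_value=False`; the names `rows_2` and `rows_3` are illustrative only.

```python
import pandas as pd

dates = pd.to_datetime([
    "2015-08-28", "2015-08-31", "2015-09-01",
    "2015-09-02", "2015-09-03", "2015-09-04",
])
# Shortened copies of the question's frames.
data2 = pd.DataFrame(
    {"Close": [113.290001, 112.760002, 107.720001,
               112.339996, 110.370003, 109.269997]},
    index=dates,
)
trades = pd.DataFrame({"Trades": [3.0, 3.0, 3.0, 3.0, 2.0]}, index=dates[:5])

# Dates missing from trades become False, so each mask has one entry
# per row of data2 and can be passed to .loc without an IndexingError.
mask2 = (trades["Trades"] == 2.0).reindex(data2.index, fill_value=False)
mask3 = (trades["Trades"] == 3.0).reindex(data2.index, fill_value=False)

rows_2 = data2.loc[mask2, "Close"]
rows_3 = data2.loc[mask3, "Close"]
```

The selected Series keep their date index, so they can be passed to `ax1.plot(rows_2.index, rows_2, '^')` in the same way as in the code above.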

plot from pandas dataframe with negative and positive values

I have a dataframe which looks like this:
MM Initial Energy MM Initial Angle QM Energy QM Angle
0 13.029277 120.0 18.048 120.0
1 11.173115 125.0 15.250 125.0
2 9.411475 130.0 12.668 130.0
3 7.762888 135.0 10.309 135.0
4 6.239025 140.0 8.180 140.0
5 4.853004 145.0 6.286 145.0
6 3.617394 150.0 4.633 150.0
7 2.544760 155.0 3.226 155.0
8 1.646335 160.0 2.070 160.0
9 0.934298 165.0 1.166 165.0
10 0.419003 170.0 0.519 170.0
11 0.105913 175.0 0.130 175.0
12 0.000000 -180.0 0.000 -180.0
13 0.105988 -175.0 0.130 -175.0
14 0.420029 -170.0 0.519 -170.0
15 0.937312 -165.0 1.166 -165.0
16 1.650080 -160.0 2.070 -160.0
17 2.548463 -155.0 3.227 -155.0
18 3.621227 -150.0 4.633 -150.0
19 4.856266 -145.0 6.286 -145.0
20 6.236939 -140.0 8.180 -140.0
21 7.760035 -135.0 10.309 -135.0
22 9.409117 -130.0 12.669 -130.0
23 11.170671 -125.0 15.251 -125.0
24 13.033293 -120.0 18.048 -120.0
I want to plot the data with angles on the x-axis and energy on the y-axis. This sounds fairly simple; however, pandas or matplotlib sorts the x-axis values in such a manner that my plot looks split. This is what it looks like:
However, this is how I want it:
My code is as follows:
df=pd.read_fwf('scan_c1c2c3h31_orig.txt', header=None, prefix='X')
df.rename(columns={'X0':'MM Initial Energy',
'X1':'MM Initial Angle',
'X2':'QM Energy', 'X3':'QM Angle'},
inplace=True)
df=df.sort_values(by=['MM Initial Angle'], axis=0, ascending=True)
df=df.reset_index(drop=False)
df2=pd.read_fwf('scan_c1c2c3h31.txt', header=None, prefix='X')
df2.rename(columns={'X0':'MM Energy',
'X1':'MM Angle',
'X2':'QM Energy', 'X3':'QM Angle'},
inplace=True)
df2=df2.sort_values(by=['MM Angle'], axis=0, ascending=True)
df2=df2.reset_index(drop=False)
df
df2
ax = plt.axes()
df.plot(y="MM Initial Energy", x="MM Initial Angle", color='red', linestyle='dashed',linewidth=2.0, ax=ax, fontsize=20, legend=True)
df2.plot(y="MM Energy", x="MM Angle", color='red', ax=ax, linewidth=2.0, fontsize=20, legend=True)
df2.plot(y="QM Energy", x="QM Angle", color='blue', ax=ax, linewidth=2.0, fontsize=20, legend=True)
plt.ylim(-0.05, 6)
ax.xaxis.set_major_locator(MultipleLocator(20))
ax.xaxis.set_minor_locator(MultipleLocator(10))
ax.yaxis.set_minor_locator(MultipleLocator(0.5))
plt.xlabel('Angles (Degrees)', fontsize=25)
plt.ylabel('Energy (kcal/mol)', fontsize=25)
What I am doing is sorting the dataframe by 'MM Angle'/'MM Initial Angle' to avoid plot "scrambling" due to repeating values on the y-axis. The angles vary from -180 to 180, and I want -180 and +180 next to each other.
I have tried sorting the negative values in ascending order and positive values in descending order as suggested in this post, but I still get the same plot where the x-axis ranges from -180 to +180.
I have also tried matplotlib axis spines to recenter the plot, and I have also tried inverting the x-axis as suggested in this post, but I still get the same plot. Additionally, I have also tried the suggestion in another post.
Any help will be appreciated.
If you don't need to rescale the plot, I would plot against the positive angles 0-360 and manually re-label the ticks:
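from matplotlib.ticker import MultipleLocator
fig, ax = plt.subplots()
# map -180..-120 onto 180..240 so the angles increase monotonically
(df.assign(Angle=df['MM Initial Angle'] % 360)
   .plot(x='Angle', y=['QM Energy', 'MM Initial Energy'], ax=ax)
)
ax.xaxis.set_major_locator(MultipleLocator(20))
x_ticks = ax.get_xticks()
# relabel ticks above 180 with their negative equivalents
x_ticks = [t-360 if t>180 else t for t in x_ticks]
ax.set_xticklabels(x_ticks)
plt.show()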
Output:
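The relabelling above relies on Python's modulo returning a non-negative result for a positive divisor, which maps the negative angles to the range above 180. A minimal sketch with a few of the question's angle values shows the round trip; no plotting is involved.

```python
# A few angles from the question, in file order.
angles = [120.0, 175.0, -180.0, -175.0, -120.0]

# Modulo 360 shifts the negative angles past 180, so the sequence
# becomes monotonically increasing and the line is drawn unbroken.
wrapped = [a % 360 for a in angles]

# The tick labels undo the shift for anything above 180.
tick_labels = [w - 360 if w > 180 else w for w in wrapped]
```

Note that -180 itself is plotted and labelled as 180, which is the same physical angle.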

plot a 3d plot using dataframe in matplotlib

I have the following data and I am having trouble plotting a 3D plot similar to the one shown in the Matplotlib examples -> https://matplotlib.org/examples/mplot3d/custom_shaded_3d_surface.html
On the x-axis I want to have the residue column, on the y-axis the first row, and the z-axis should represent the values.
residue 0 1 2 3 4 5 6 \
0 0.0 0.0 1.671928 1.441439 0.808492 1.079337 1.186970 1.445275
1 1.0 0.0 1.348867 1.216174 1.324360 1.965453 2.121130 1.713321
2 2.0 0.0 1.281589 0.794236 1.083470 1.476939 2.011159 2.360246
3 3.0 0.0 0.798151 0.993858 1.020617 0.829792 1.280412 1.653299
4 4.0 0.0 0.789995 1.194215 1.407934 1.291384 1.555449 1.258266
5 5.0 0.0 0.653958 0.910582 1.585495 1.245847 1.620384 1.664490
6 6.0 0.0 0.782577 0.648373 1.284292 1.087762 1.523729 1.631152
7 7.0 0.0 1.094054 1.127248 0.958693 1.168483 0.897470 1.404080
8 8.0 0.0 0.433993 1.165169 0.925521 1.292363 1.075700 1.146139
9 9.0 0.0 1.114398 0.963963 1.062597 1.297358 1.412016 1.422071
10 10.0 0.0 0.706276 1.056272 1.381639 1.682080 1.779487 1.914487
11 11.0 0.0 1.059623 1.000653 1.152697 1.895022 1.562730 1.964862
Is it better not to use a DataFrame in this case?
This is the code I'm using:
z = df.iloc[1:,1:-1]
ff= [i for i in range(1,500)]
y=df["residue"]
print(len(z))
nrows, ncols = z.shape
x = np.linspace(min(ff),max(ff), ncols)
x, y = np.meshgrid(x, y)
fig, ax = plt.subplots(subplot_kw=dict(projection='3d'))
plt.show()
u = """ residue 0 1 2 3 4 5 6
0 0.0 0.0 1.671928 1.441439 0.808492 1.079337 1.186970 1.445275
1 1.0 0.0 1.348867 1.216174 1.324360 1.965453 2.121130 1.713321
2 2.0 0.0 1.281589 0.794236 1.083470 1.476939 2.011159 2.360246
3 3.0 0.0 0.798151 0.993858 1.020617 0.829792 1.280412 1.653299
4 4.0 0.0 0.789995 1.194215 1.407934 1.291384 1.555449 1.258266
5 5.0 0.0 0.653958 0.910582 1.585495 1.245847 1.620384 1.664490
6 6.0 0.0 0.782577 0.648373 1.284292 1.087762 1.523729 1.631152
7 7.0 0.0 1.094054 1.127248 0.958693 1.168483 0.897470 1.404080
8 8.0 0.0 0.433993 1.165169 0.925521 1.292363 1.075700 1.146139
9 9.0 0.0 1.114398 0.963963 1.062597 1.297358 1.412016 1.422071
10 10.0 0.0 0.706276 1.056272 1.381639 1.682080 1.779487 1.914487
11 11.0 0.0 1.059623 1.000653 1.152697 1.895022 1.562730 1.964862"""
import io
import pandas as pd
import numpy as np
df = pd.read_csv(io.StringIO(u), sep=r"\s+")
df = df.set_index("residue")
Setting residue as the index means that the residue column is no longer part of the data.
You can then create the meshgrid from the columns and the index, and plot it as in the linked example.
x,y = np.meshgrid(df.columns.astype(float), df.index)
z = df.values
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.colors import LightSource
fig, ax = plt.subplots(subplot_kw=dict(projection='3d'))
rgb = LightSource(270, 45).shade(z, cmap=plt.cm.gist_earth, vert_exag=0.1, blend_mode='soft')
surf = ax.plot_surface(x, y, z, facecolors=rgb,
linewidth=0, antialiased=False, shade=False)
plt.show()
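`plot_surface` expects x, y and z to be 2D arrays of the same shape, which is what `np.meshgrid` provides here. The sketch below uses small stand-in arrays (hypothetical values, two residues by three columns) to show how the shapes line up.

```python
import numpy as np

# Stand-ins for df.columns.astype(float) and df.index (residue).
columns = np.array([0.0, 1.0, 2.0])
index = np.array([0.0, 1.0])

# Stand-in for df.values: one row per residue, one column per header.
z = np.array([
    [0.0, 1.67, 1.44],
    [0.0, 1.35, 1.22],
])

# meshgrid repeats the column values along rows and the index along columns.
x, y = np.meshgrid(columns, index)
```

Each `z[i, j]` is then drawn at the point `(x[i, j], y[i, j])`, that is, at column `j` and residue `i`.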

How to make axis tick labels visible on the other side of the plot in gridspec?

I am plotting my favourite example dataframe, which looks like this:
x val1 val2 val3
0 0.0 10.0 NaN NaN
1 0.5 10.5 NaN NaN
2 1.0 11.0 NaN NaN
3 1.5 11.5 NaN 11.60
4 2.0 12.0 NaN 12.08
5 2.5 12.5 12.2 12.56
6 3.0 13.0 19.8 13.04
7 3.5 13.5 13.3 13.52
8 4.0 14.0 19.8 14.00
9 4.5 14.5 14.4 14.48
10 5.0 NaN 19.8 14.96
11 5.5 15.5 15.5 15.44
12 6.0 16.0 19.8 15.92
13 6.5 16.5 16.6 16.40
14 7.0 17.0 19.8 18.00
15 7.5 17.5 17.7 NaN
16 8.0 18.0 19.8 NaN
17 8.5 18.5 18.8 NaN
18 9.0 19.0 19.8 NaN
19 9.5 19.5 19.9 NaN
20 10.0 20.0 19.8 NaN
I have two subplots, for some other reasons it is best for me to use gridspec. The plotting code is as follows (it is quite comprehensive, so I would like to avoid major changes in the code that otherwise works perfectly and just doesn't do one unimportant detail):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
import matplotlib as mpl
df = pd.read_csv('H:/DocumentsRedir/pokus/dataframe.csv', delimiter=',')
# setting limits for x and y
ylimit=(0,10)
yticks1=np.arange(0,11,1)
xlimit1=(10,20)
xticks1 = np.arange(10,21,1)
# general plot formatting (axes colour, background etc.)
plt.style.use('ggplot')
plt.rc('axes',edgecolor='black')
plt.rc('axes', facecolor = 'white')
plt.rc('grid', color = 'grey')
plt.rc('grid', alpha = 0.3) # alpha is percentage of transparency
colours = ['g','b','r']
title1 = 'The plot'
# GRIDSPEC INTRO - rows, cols, distance of individual plots
fig = plt.figure(figsize=(6,4))
gs=gridspec.GridSpec(1,2, hspace=0.15, wspace=0.08,width_ratios=[1,1])
## SUBPLOT of GRIDSPEC with lines
# the first plot
axes1 = plt.subplot(gs[0,0])
for count, vals in enumerate(df.columns.values[1:]):
    X = np.asarray(df[vals])
    h = vals
    p1 = plt.plot(X,df.index,color=colours[count],linestyle='-',linewidth=1.5,label=h)
# formatting
p1 = plt.ylim(ylimit)
p1 = plt.yticks(yticks1, yticks1, rotation=0)
p1 = axes1.yaxis.set_minor_locator(mpl.ticker.MultipleLocator(0.1))
p1 = plt.setp(axes1.get_yticklabels(),fontsize=8)
p1 = plt.gca().invert_yaxis()
p1 = plt.ylabel('x [unit]', fontsize=14)
p1 = plt.xlabel("Value [unit]", fontsize=14)
p1 = plt.tick_params('both', length=5, width=1, which='minor', direction = 'in')
p1 = axes1.xaxis.set_minor_locator(mpl.ticker.MultipleLocator(0.1))
p1 = plt.xlim(xlimit1)
p1 = plt.xticks(xticks1, xticks1, rotation=0)
p1 = plt.setp(axes1.get_xticklabels(),fontsize=8)
p1 = plt.legend(loc='best',fontsize = 8, ncol=2) #
# the second plot (something random)
axes2 = plt.subplot(gs[0,1])
for count, vals in enumerate(df.columns.values[1:]):
    nonans = df[vals].dropna()
    result=nonans-0.5
    p2 = plt.plot(result,nonans.index,color=colours[count],linestyle='-',linewidth=1.5)
p2 = plt.ylim(ylimit)
p2 = plt.yticks(yticks1, yticks1, rotation=0)
p2 = axes2.yaxis.set_minor_locator(mpl.ticker.MultipleLocator(0.1))
p2 = plt.gca().invert_yaxis()
p2 = plt.xlim(xlimit1)
p2 = plt.xticks(xticks1, xticks1, rotation=0)
p2 = axes2.xaxis.set_minor_locator(mpl.ticker.MultipleLocator(0.1))
p2 = plt.setp(axes2.get_xticklabels(),fontsize=8)
p2 = plt.xlabel("Other value [unit]", fontsize=14)
p2 = plt.tick_params('x', length=5, width=1, which='minor', direction = 'in')
p2 = plt.setp(axes2.get_yticklabels(), visible=False)
fig.suptitle(title1, size=16)
plt.show()
However, is it possible to show the y tick labels of the second subplot on the right hand side? The current code produces this:
And I would like to know if there is an easy way to get this:
No, ok, found out it is precisely what I wanted.
I want the TICKS to be on BOTH sides, just the LABELS to be on the right. The solution above removes my ticks from the left side of the subplot, which doesn't look good. However, this answer seems to get the right solution :)
To sum up:
to get the ticks on both sides and labels on the right, this is what fixes it:
axes2.yaxis.tick_right()
axes2.yaxis.set_ticks_position('both')
And if you need the same for the x-axis, it's axes2.xaxis.tick_top()
Try something like:
axes2.yaxis.tick_right()
See also the question "Python Matplotlib Y-Axis ticks on Right Side of Plot".
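The two calls from the summary can be tried on a stripped-down gridspec figure. This is a minimal sketch with placeholder data, not the question's full plotting code, and it uses the non-interactive Agg backend so it runs without a display. It also assumes the line that hides the y tick labels of the second subplot has been removed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib import gridspec

fig = plt.figure(figsize=(6, 4))
gs = gridspec.GridSpec(1, 2, wspace=0.08)
axes1 = fig.add_subplot(gs[0, 0])
axes2 = fig.add_subplot(gs[0, 1])

# Placeholder lines standing in for the real data.
for ax in (axes1, axes2):
    ax.plot([10, 20], [0, 10])

# Move ticks and labels to the right, then switch the left ticks back on.
axes2.yaxis.tick_right()
axes2.yaxis.set_ticks_position("both")

fig.canvas.draw()
first_tick = axes2.yaxis.get_major_ticks()[0]
```

The order matters: `tick_right()` moves both ticks and labels, and `set_ticks_position("both")` afterwards only re-enables the tick marks on the left.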
