error in dataframe loc when applying these two dataframes

I'm trying to put these two dataframes (data2 and trades) together to make a plot that looks like this: https://i.stack.imgur.com/pR8bW.png
data2:
Close
2015-08-28 113.290001
2015-08-31 112.760002
2015-09-01 107.720001
2015-09-02 112.339996
2015-09-03 110.370003
2015-09-04 109.269997
2015-09-08 112.309998
2015-09-09 110.150002
2015-09-10 112.570000
2015-09-11 114.209999
trades:
Trades
2015-08-28 3.0
2015-08-31 3.0
2015-09-01 3.0
2015-09-02 3.0
2015-09-03 2.0
code:
import matplotlib.pyplot as plt
fig = plt.figure()
ax1 = fig.add_subplot(111, ylabel='Portfolio value in $')
data2["Close"].plot(ax=ax1, lw=2.)
ax1.plot(data2.loc[trades.Trades == 2.0].index, data2.total[trades.Trades == 2.0],
'^', markersize=10, color='m')
ax1.plot(data2.loc[trades.Trades == 3.0].index,
data2.total[trades.Trades == 3.0],
'v', markersize=10, color='k')
plt.show()
But this gives the following error:
---------------------------------------------------------------------------
IndexingError Traceback (most recent call last)
<ipython-input-38-9cde686354a8> in <module>()
7 data2["Close"].plot(ax=ax1, lw=2.)
8
----> 9 ax1.plot(data2.loc[trades.Trades == 2.0].index, data2.total[trades.Trades == 2.0],
10 '^', markersize=10, color='m')
11 ax1.plot(data2.loc[trades.Trades == 3.0].index,
3 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in check_bool_indexer(index, key)
2316 if mask.any():
2317 raise IndexingError(
-> 2318 "Unalignable boolean Series provided as "
2319 "indexer (index of the boolean Series and of "
2320 "the indexed object do not match)."
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

The indexes of the two data frames are different. I've taken the approach of defining masks for the data2 dataframe that are based on the values in the trades dataframe, and it works.
Additionally, your sample code referred to total, which does not exist. I've updated it to use Close.
import pandas as pd
import io
import matplotlib.pyplot as plt
data2 = pd.read_csv(io.StringIO(""" Close
2015-08-28 113.290001
2015-08-31 112.760002
2015-09-01 107.720001
2015-09-02 112.339996
2015-09-03 110.370003
2015-09-04 109.269997
2015-09-08 112.309998
2015-09-09 110.150002
2015-09-10 112.570000
2015-09-11 114.209999"""), sep=r"\s+")
trades = pd.read_csv(io.StringIO(""" Trades
2015-08-28 3.0
2015-08-31 3.0
2015-09-01 3.0
2015-09-02 3.0
2015-09-03 2.0"""), sep=r"\s+")
# make sure it's dates
data2 = data2.reset_index().assign(index=lambda x: pd.to_datetime(x["index"])).set_index("index")
trades = trades.reset_index().assign(index=lambda x: pd.to_datetime(x["index"])).set_index("index")
fig = plt.figure()
ax1 = fig.add_subplot(111, ylabel='Portfolio value in $')
data2["Close"].plot(ax=ax1, lw=2.)
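# select only the trade dates whose value matches, then test data2's index against them
mask2 = data2.index.isin(trades.index[trades.Trades == 2.0])
mask3 = data2.index.isin(trades.index[trades.Trades == 3.0])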
ax1.plot(data2.loc[mask2].index, data2.Close[mask2],
'^', markersize=10, color='m')
ax1.plot(data2.loc[mask3].index,
data2.Close[mask3],
'v', markersize=10, color='k')
plt.show()
output
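As a hedged alternative to building masks with isin, the Trades column can be reindexed onto data2's index before comparing, so the boolean Series and the indexed object share one index. The sketch below uses the first six posted rows; the names aligned, trade2 and trade3 are illustrative and not from the original post.

```python
import pandas as pd

dates = pd.to_datetime(["2015-08-28", "2015-08-31", "2015-09-01",
                        "2015-09-02", "2015-09-03", "2015-09-04"])
data2 = pd.DataFrame({"Close": [113.290001, 112.760002, 107.720001,
                                112.339996, 110.370003, 109.269997]},
                     index=dates)
# trades covers only the first five dates, as in the question
trades = pd.DataFrame({"Trades": [3.0, 3.0, 3.0, 3.0, 2.0]}, index=dates[:5])

# Align Trades to data2's index; dates missing from trades become NaN,
# and NaN == 2.0 is False, so those rows are simply not selected.
aligned = trades["Trades"].reindex(data2.index)

trade2 = data2.loc[aligned == 2.0, "Close"]
trade3 = data2.loc[aligned == 3.0, "Close"]
print(trade2)
print(trade3)
```

The selected Series can then be passed to ax1.plot(trade2.index, trade2, '^') in the same way as the masked columns above.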

Related

plot and draw curves in python matplotlib without ignoring first and last Nan values from the graph figure

I have a CSV file like the table below:
depth  x1  x2           x3
1000   15  Nan          Nan
1001   10  Nan          Nan
1002   5   Nan          Nan
1003   8   10           Nan
1004   12  11.11111111  Nan
1010   13  17.77777778  14.16666667
1011   14  18.88888889  15
1012   15  20           15.71428571
1013   16  20.55555556  16.42857143
1014   17  21.11111111  17.14285714
1017   20  22.77777778  19.28571429
1018   21  23.33333333  20
1019   22  23.88888889  20.83333333
1024   27  17.5         25
1025   28  15           25
1026   25  Nan          Nan
1027   26  Nan          Nan
1028   7   Nan          Nan
I want to plot the x1, x2, and x3 columns against the depth column, but these columns sometimes contain NaN values at the start and end. I want every curve to span the whole depth range, including the rows where its values are NaN.
The code below is my attempt, but each plot always starts and ends at the first and last valid values and ignores the leading and trailing NaN rows:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
df = pd.read_csv("result.csv")
fig = plt.figure(figsize=(15, 12), dpi=100, tight_layout=True)
gs = gridspec.GridSpec(nrows=1, ncols=5, wspace=0)
fig.add_subplot(gs[0, 1])
plt.plot(df['x1'],df["depth"], linewidth=2, color='black', marker="o", markersize=3)
plt.gca().invert_yaxis()
fig.add_subplot(gs[0,2 ])
plt.plot(df["x2"],df["depth"], linewidth =2, color='black', marker="o", markersize=3)
plt.gca().invert_yaxis()
fig.add_subplot(gs[0,3])
plt.plot(df["x3"],df["depth"], linewidth =2, color='black', marker="o", markersize=3)
plt.gca().invert_yaxis()
plt.show()
The current result:
The desired result is shown in the image below, where the y axes of all curves start from the same depth point:

You need to share the y axis between the subplots so that all three use the same depth limits:
fig, axs = plt.subplots(1, 3, figsize=(15, 12), dpi=100, tight_layout=True, gridspec_kw={'wspace': 0})
axs[0].plot(df.x1, df.depth, '-ok', lw=2, ms=3)
axs[1].plot(df.x2, df.depth, '-ok', lw=2, ms=3)
axs[1].sharey(axs[0])
axs[2].plot(df.x3, df.depth, '-ok', lw=2, ms=3)
axs[2].sharey(axs[0])
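A hedged variant of the same idea: Axes.plot skips NaN points, so an unshared subplot autoscales only to its own valid rows. Passing sharey=True to plt.subplots links the depth axis when the axes are created. The data frame below is a shortened, rounded copy of the posted table, and the Agg backend is an assumption made so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Shortened, rounded sample of the posted table
df = pd.DataFrame({
    "depth": [1000, 1001, 1002, 1003, 1004, 1010],
    "x1": [15, 10, 5, 8, 12, 13],
    "x2": [np.nan, np.nan, np.nan, 10, 11.1, 17.8],
    "x3": [np.nan, np.nan, np.nan, np.nan, np.nan, 14.2],
})

# sharey=True makes all three subplots use one set of depth limits
fig, axs = plt.subplots(1, 3, sharey=True, gridspec_kw={"wspace": 0})
for ax, col in zip(axs, ["x1", "x2", "x3"]):
    ax.plot(df[col], df["depth"], "-ok", lw=2, ms=3)

# Inverting one shared axis inverts all of them, so depth increases downward
axs[0].invert_yaxis()
fig.savefig("shared_depth.png")
```

Because the axes are shared, the x3 panel spans the full depth range even though it has a single valid point in this sample.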

[Python3]How to use Seaborn/Matplotlib to graph pandas dataframe

I'm still having trouble doing this.
Here is what my data looks like:
date positive negative neutral
0 2015-09 23 6 18
1 2016-04 709 288 704
2 2016-08 1478 692 1750
3 2016-09 1881 926 2234
4 2016-10 3196 1594 3956
In my CSV file I don't have the 0-4 index column, only the four columns from 'date' to 'neutral'.
I don't know how to fix my code to make the plot look like this:
Seaborn code
sns.set(style='darkgrid', context='talk', palette='Dark2')
fig, ax = plt.subplots(figsize=(8, 8))
sns.barplot(x=df['positive'], y=df['negative'], ax=ax)
ax.set_xticklabels(['Negative', 'Neutral', 'Positive'])
ax.set_ylabel("Percentage")
plt.show()

To do this in seaborn you'll need to transform your data into long format. You can easily do this via melt:
plotting_df = df.melt(id_vars="date", var_name="sign", value_name="percentage")
print(plotting_df.head())
date sign percentage
0 2015-09 positive 23
1 2016-04 positive 709
2 2016-08 positive 1478
3 2016-09 positive 1881
4 2016-10 positive 3196
Then you can plot this long-format dataframe with seaborn in a straightforward manner:
sns.set(style='darkgrid', context='talk', palette='Dark2')
fig, ax = plt.subplots(figsize=(8, 8))
sns.barplot(x="date", y="percentage", ax=ax, hue="sign", data=plotting_df)

Based on the data you posted, you can also plot directly with pandas:
sns.set(style='darkgrid', context='talk', palette='Dark2')
# fig, ax = plt.subplots(figsize=(8, 8))
df.plot(x="date",y=["positive","neutral","negative"],kind="bar")
plt.xticks(rotation=-360)
# ax.set_xticklabels(['Negative', 'Neutral', 'Positive'])
# ax.set_ylabel("Percentage")
plt.show()

plot from pandas dataframe with negative and positive values

I have a dataframe which looks like this:
MM Initial Energy MM Initial Angle QM Energy QM Angle
0 13.029277 120.0 18.048 120.0
1 11.173115 125.0 15.250 125.0
2 9.411475 130.0 12.668 130.0
3 7.762888 135.0 10.309 135.0
4 6.239025 140.0 8.180 140.0
5 4.853004 145.0 6.286 145.0
6 3.617394 150.0 4.633 150.0
7 2.544760 155.0 3.226 155.0
8 1.646335 160.0 2.070 160.0
9 0.934298 165.0 1.166 165.0
10 0.419003 170.0 0.519 170.0
11 0.105913 175.0 0.130 175.0
12 0.000000 -180.0 0.000 -180.0
13 0.105988 -175.0 0.130 -175.0
14 0.420029 -170.0 0.519 -170.0
15 0.937312 -165.0 1.166 -165.0
16 1.650080 -160.0 2.070 -160.0
17 2.548463 -155.0 3.227 -155.0
18 3.621227 -150.0 4.633 -150.0
19 4.856266 -145.0 6.286 -145.0
20 6.236939 -140.0 8.180 -140.0
21 7.760035 -135.0 10.309 -135.0
22 9.409117 -130.0 12.669 -130.0
23 11.170671 -125.0 15.251 -125.0
24 13.033293 -120.0 18.048 -120.0
I want to plot the data with the angles on the x-axis and the energy on the y-axis. This sounds fairly simple, but pandas or matplotlib sorts the x-axis values in such a manner that my plot looks split. This is what it looks like:
However, this is how I want it:
My code is as follows:
df=pd.read_fwf('scan_c1c2c3h31_orig.txt', header=None, prefix='X')
df.rename(columns={'X0':'MM Initial Energy',
'X1':'MM Initial Angle',
'X2':'QM Energy', 'X3':'QM Angle'},
inplace=True)
df=df.sort_values(by=['MM Initial Angle'], axis=0, ascending=True)
df=df.reset_index(drop=False)
df2=pd.read_fwf('scan_c1c2c3h31.txt', header=None, prefix='X')
df2.rename(columns={'X0':'MM Energy',
'X1':'MM Angle',
'X2':'QM Energy', 'X3':'QM Angle'},
inplace=True)
df2=df2.sort_values(by=['MM Angle'], axis=0, ascending=True)
df2=df2.reset_index(drop=False)
df
df2
ax = plt.axes()
df.plot(y="MM Initial Energy", x="MM Initial Angle", color='red', linestyle='dashed',linewidth=2.0, ax=ax, fontsize=20, legend=True)
df2.plot(y="MM Energy", x="MM Angle", color='red', ax=ax, linewidth=2.0, fontsize=20, legend=True)
df2.plot(y="QM Energy", x="QM Angle", color='blue', ax=ax, linewidth=2.0, fontsize=20, legend=True)
plt.ylim(-0.05, 6)
ax.xaxis.set_major_locator(MultipleLocator(20))
ax.xaxis.set_minor_locator(MultipleLocator(10))
ax.yaxis.set_minor_locator(MultipleLocator(0.5))
plt.xlabel('Angles (Degrees)', fontsize=25)
plt.ylabel('Energy (kcal/mol)', fontsize=25)
What I am doing is sorting the dataframe by 'MM Angle'/'MM Initial Angle' to avoid the plot "scrambling" due to repeating values in the y-axis. The angles vary from -180 to 180, and I want -180 and +180 next to each other.
I have tried sorting the negative values in ascending order and the positive values in descending order, as suggested in this post, but I still get the same plot, where the x-axis ranges from -180 to +180.
I have also tried matplotlib axis spines to recenter the plot, and I have tried inverting the x-axis as suggested in this post, but I still get the same plot. Additionally, I have tried the suggestion in another post.
Any help will be appreciated.

If you don't need to rescale the plot, I would plot against the positive angles 0-360 and manually re-label the ticks:
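import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
fig, ax = plt.subplots()
# map negative angles onto 180-360 so the x values are continuous
(df.assign(Angle=df['MM Initial Angle'] % 360)
   .plot(x='Angle', y=['QM Energy', 'MM Initial Energy'], ax=ax)
)
ax.xaxis.set_major_locator(MultipleLocator(20))
# relabel ticks above 180 with the equivalent negative angle
x_ticks = ax.get_xticks()
x_ticks = [t-360 if t>180 else t for t in x_ticks]
ax.set_xticklabels(x_ticks)
plt.show()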
Output:
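A possible variation on the relabelling step, sketched under assumptions: instead of overwriting the tick labels once, a FuncFormatter computes each label from the tick position, so the labels stay correct if the view is zoomed or the limits change. The energy values here are made up for illustration, wrap_label is a hypothetical helper name, and the rows are sorted by the wrapped angle because a frame already sorted from -180 to 180 would otherwise be drawn out of order.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter, MultipleLocator

# Angles in the same order as the posted frame: 120..175, then -180..-120
angles = list(range(120, 180, 5)) + list(range(-180, -115, 5))
# Made-up energies, only to have something to draw
df = pd.DataFrame({"Angle": angles,
                   "Energy": [0.1 * (180 - abs(a)) for a in angles]})

# Wrap negative angles onto 180-360 and sort so the line is drawn in order
wrapped = df.assign(Wrapped=df["Angle"] % 360).sort_values("Wrapped")

def wrap_label(value, pos):
    """Label tick positions above 180 with the equivalent negative angle."""
    return f"{value - 360:g}" if value > 180 else f"{value:g}"

fig, ax = plt.subplots()
wrapped.plot(x="Wrapped", y="Energy", ax=ax)
ax.xaxis.set_major_locator(MultipleLocator(20))
ax.xaxis.set_major_formatter(FuncFormatter(wrap_label))
fig.savefig("wrapped_angles.png")
```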

X-Axis scales not matching with 2 data sets on same plot

I have 2 datasets that I'm trying to plot on the same figure. They share a common column that I'm using for the X-axis, however one of my sets of data is collected annually and the other monthly so the number of data points in each set is significantly different.
Pyplot is not plotting the X values for each set where I would expect them when I plot both sets on the same graph.
When I plot just my annually collected data set I get:
When I plot just my monthly collected data set I get:
But when I plot the two sets overlayed (code below) I get:
tframe:
10003 Date
0 257 201401
1 216 201402
2 417 201403
3 568 201404
4 768 201405
5 836 201406
6 798 201407
7 809 201408
8 839 201409
9 796 201410
tax_for_zip_data:
TAX BRACKET $1 under $25,000 ... Date
2 5740 ... 201301
0 5380 ... 201401
1 5320 ... 201501
3 5030 ... 201601
So I did as wwii suggested in the comments and converted my Date columns to datetime objects:
tframe:
10003 Date
0 257 2014-01-31
1 216 2014-02-28
2 417 2014-03-31
3 568 2014-04-30
4 768 2014-05-31
5 836 2014-06-30
6 798 2014-07-31
7 809 2014-08-31
8 839 2014-09-30
9 796 2014-10-31
tax_for_zip_data:
TAX BRACKET $1 under $25,000 ... Date
2 5740 ... 2013-01-31
0 5380 ... 2014-01-31
1 5320 ... 2015-01-31
3 5030 ... 2016-01-31
But the dates are still plotted with an offset.
None of my data goes back to 2012; January 2013 is the earliest. The tax_for_zip_data points are all offset by a year. If I plot just that set alone, it plots properly.
fig, ax1 = plt.subplots(sharex = True)
color = "tab:red"
ax1.set_xlabel('Date')
ax1.set_ylabel('Trips', color = color)
tframe.plot(kind = 'line',x = 'Date', y = "10003", ax = ax1, color = color)
ax1.tick_params(axis = 'y', labelcolor = color)
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel('Num Returns', color = color)
tax_for_zip_data.plot(kind = 'line', x = 'Date', y = tax_for_zip_data.columns[:-1], ax = ax2)
ax2.tick_params(axis = 'y', labelcolor = color)
plt.show()

If you can make the DataFrame index a datetime index, plotting is easier.
import io
import pandas as pd
import matplotlib.pyplot as plt
# columns are separated by two or more spaces
s = '''10003  Date
257  201401
216  201402
417  201403
568  201404
768  201405
836  201406
798  201407
809  201408
839  201409
796  201410
'''
df1 = pd.read_csv(io.StringIO(s), delimiter=r'\s{2,}', engine='python')
df1.index = pd.to_datetime(df1['Date'], format='%Y%m')
s = '''TAX BRACKET  $1 under $25,000  Date
2  5740  201301
0  5380  201401
1  5320  201501
3  5030  201601
'''
df2 = pd.read_csv(io.StringIO(s), delimiter=r'\s{2,}', engine='python')
df2.index = pd.to_datetime(df2['Date'], format='%Y%m')
You don't need to specify an argument for plot's x parameter.
fig, ax1 = plt.subplots(sharex = True)
color = "tab:red"
ax1.set_xlabel('Date')
ax1.set_ylabel('Trips', color = color)
df1.plot(kind = 'line',y="10003", ax = ax1, color = color)
ax1.tick_params(axis = 'y', labelcolor = color)
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel('Num Returns', color = color)
df2.plot(kind = 'line', y='$1 under $25,000', ax = ax2)
ax2.tick_params(axis = 'y', labelcolor = color)
plt.show()
plt.close()
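Another option, offered as a sketch rather than as part of the answer above, is to call matplotlib's Axes.plot directly with the datetime index. One possible cause of the offset is the frequency-based date handling that pandas applies to line plots of regular time series; that explanation is an assumption here, and bypassing DataFrame.plot simply avoids the question. The frame names monthly and annual and the column names trips and returns are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import pandas as pd

# Monthly series from the question, indexed by month start
monthly = pd.DataFrame(
    {"trips": [257, 216, 417, 568, 768, 836, 798, 809, 839, 796]},
    index=pd.date_range("2014-01-01", periods=10, freq="MS"),
)
# Annual series from the question, parsed from YYYYMM strings
annual = pd.DataFrame(
    {"returns": [5740, 5380, 5320, 5030]},
    index=pd.to_datetime(["201301", "201401", "201501", "201601"],
                         format="%Y%m"),
)

fig, ax1 = plt.subplots()
ax1.plot(monthly.index, monthly["trips"], color="tab:red")
ax1.set_xlabel("Date")
ax1.set_ylabel("Trips", color="tab:red")

# twinx shares the x axis, so both series are placed on one date scale
ax2 = ax1.twinx()
ax2.plot(annual.index, annual["returns"], color="tab:blue")
ax2.set_ylabel("Num Returns", color="tab:blue")
fig.autofmt_xdate()
fig.savefig("monthly_vs_annual.png")
```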

how can i plt this data? its file extension is .xvg

I am new to Python. I have tried this script, but it does not work.
It gives me this error:
Traceback (most recent call last):
File "temp.py", line 11, in <module>
y = [row.split(' ')[1] for row in data]
File "temp.py", line 11, in <listcomp>
y = [row.split(' ')[1] for row in data]
IndexError: list index out of range
The script is:
import numpy as np
import matplotlib.pyplot as plt
with open("data.xvg") as f:
    data = f.read()
data = data.split('\n')
x = [row.split(' ')[0] for row in data]
y = [row.split(' ')[1] for row in data]
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("Plot title...")
ax1.set_xlabel('your x label..')
ax1.set_ylabel('your y label...')
ax1.plot(x,y, c='r', label='the data')
leg = ax1.legend()
plt.show()
The data is:
0.000000 299.526978
1.000000 4.849206
2.000000 0.975336
3.000000 0.853160
4.000000 0.767092
5.000000 0.995595
6.000000 0.976332
7.000000 1.111898
8.000000 1.251045
9.000000 1.346720
10.000000 1.522089
11.000000 1.705517
12.000000 1.822599
13.000000 1.988752
14.000000 2.073061
15.000000 2.242703
16.000000 2.370366
17.000000 2.530256
18.000000 2.714863
19.000000 2.849218
20.000000 3.033373
21.000000 3.185251
22.000000 3.282328
23.000000 3.431681
24.000000 3.668798
25.000000 3.788214
26.000000 3.877117
27.000000 4.032224
28.000000 4.138007
29.000000 4.315784
30.000000 4.504521
31.000000 4.668567
32.000000 4.787213
33.000000 4.973860
34.000000 5.128736
35.000000 5.240545
36.000000 5.392560
37.000000 5.556009
38.000000 5.709351
39.000000 5.793169
40.000000 5.987224
41.000000 6.096015
42.000000 6.158622
43.000000 6.402116
44.000000 6.533816
45.000000 6.711002
46.000000 6.876793
47.000000 7.104519
48.000000 7.237456
49.000000 7.299352
50.000000 7.471975
51.000000 7.691428
52.000000 7.792002
53.000000 7.928269
54.000000 8.014977
55.000000 8.211984
56.000000 8.330894
57.000000 8.530197
58.000000 8.690166
59.000000 8.808934
60.000000 8.996209
61.000000 9.104818
62.000000 9.325309
63.000000 9.389288
64.000000 9.576900
65.000000 9.761865
66.000000 9.807437
67.000000 10.027261
68.000000 10.129250
69.000000 10.392891
70.000000 10.497618
71.000000 10.627769
72.000000 10.811770
73.000000 11.119184
74.000000 11.181286
75.000000 11.156842
76.000000 11.350290
77.000000 11.493779
78.000000 11.720265
79.000000 11.700112
80.000000 11.939404
81.000000 12.293530
82.000000 12.267791
83.000000 12.394929
84.000000 12.545286
85.000000 12.784669
86.000000 12.754122
87.000000 13.129798
88.000000 13.166340
89.000000 13.389514
90.000000 13.436648
91.000000 13.647285
92.000000 13.722875
93.000000 13.992217
94.000000 14.167837
95.000000 14.320843
96.000000 14.450310
97.000000 14.515556
98.000000 14.598526
99.000000 14.807360
100.000000 14.982592
101.000000 15.312892
102.000000 15.280009

If it is an xvg file from GROMACS, it probably has some comment lines starting with #, so without editing the file you can do:
x,y = np.loadtxt("file.xvg",comments="#",unpack=True)
plt.plot(x,y)
unpack=True makes the columns come out as individual arrays that are set to x and y on the left-hand side. Of course you could also parse the comments to get the labels and legends.
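GROMACS xvg files usually also carry plot directives on lines starting with @, which the call above would try to parse as numbers. A minimal sketch, assuming such lines are present: np.loadtxt accepts a sequence of comment markers, so both kinds of line can be skipped. The header lines in the sample are made up, and an in-memory buffer stands in for the real file.

```python
import io
import numpy as np

# In-memory stand-in for an xvg file; the two header lines are made up
sample = io.StringIO(
    "# comment line written by the tool\n"
    '@    title "made-up header"\n'
    "0.000000 299.526978\n"
    "1.000000 4.849206\n"
    "2.000000 0.975336\n"
)

# Treat both '#' and '@' as comment markers and split the columns into x and y
x, y = np.loadtxt(sample, comments=["#", "@"], unpack=True)
print(x, y)
```

With a real file, the buffer would be replaced by the file name, as in the call above.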

Try the following. You need to convert each of your values to a float before appending it:
import numpy as np
import matplotlib.pyplot as plt
x, y = [], []
with open("data.xvg") as f:
    for line in f:
        cols = line.split()
        if len(cols) == 2:
            x.append(float(cols[0]))
            y.append(float(cols[1]))
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("Plot title...")
ax1.set_xlabel('your x label..')
ax1.set_ylabel('your y label...')
ax1.plot(x,y, c='r', label='the data')
leg = ax1.legend()
plt.show()
This would give you a graph looking like:
The error probably occurs because there is an empty line somewhere in your file. Checking that the split produces exactly two entries ensures that you do not get an index out of range error.
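The failure can be reproduced on a two-line sample, assuming the file ends with a newline (a blank line in the middle would behave the same way). Splitting the whole text on '\n' leaves an empty string as the last row, and splitting that row on ' ' yields a single-element list, so index 1 does not exist. The variable names below are illustrative.

```python
text = "0.000000 299.526978\n1.000000 4.849206\n"

# The trailing newline produces an empty final row
rows = text.split("\n")
print(rows)  # ['0.000000 299.526978', '1.000000 4.849206', '']

try:
    rows[-1].split(" ")[1]
    failed = False
except IndexError:
    failed = True  # ''.split(' ') is [''], which has no index 1

# Skipping blank rows and converting to float avoids the error
pairs = [line.split() for line in text.splitlines() if line.strip()]
x = [float(first) for first, second in pairs]
y = [float(second) for first, second in pairs]
print(x, y)
```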

You can use the Python library or the Windows/Linux executable from the GMXvg package to plot XVG files.
It discovers XVG files and converts them to JPG or any other format supported by Python's matplotlib.
