plot from pandas dataframe with negative and positive values - python

I have a dataframe which looks like this:
    MM Initial Energy  MM Initial Angle  QM Energy  QM Angle
0           13.029277             120.0     18.048     120.0
1           11.173115             125.0     15.250     125.0
2            9.411475             130.0     12.668     130.0
3            7.762888             135.0     10.309     135.0
4            6.239025             140.0      8.180     140.0
5            4.853004             145.0      6.286     145.0
6            3.617394             150.0      4.633     150.0
7            2.544760             155.0      3.226     155.0
8            1.646335             160.0      2.070     160.0
9            0.934298             165.0      1.166     165.0
10           0.419003             170.0      0.519     170.0
11           0.105913             175.0      0.130     175.0
12           0.000000            -180.0      0.000    -180.0
13           0.105988            -175.0      0.130    -175.0
14           0.420029            -170.0      0.519    -170.0
15           0.937312            -165.0      1.166    -165.0
16           1.650080            -160.0      2.070    -160.0
17           2.548463            -155.0      3.227    -155.0
18           3.621227            -150.0      4.633    -150.0
19           4.856266            -145.0      6.286    -145.0
20           6.236939            -140.0      8.180    -140.0
21           7.760035            -135.0     10.309    -135.0
22           9.409117            -130.0     12.669    -130.0
23          11.170671            -125.0     15.251    -125.0
24          13.033293            -120.0     18.048    -120.0
I want to plot the data with the angles on the x-axis and the energy on the y-axis. This sounds fairly simple; however, the x-axis runs numerically from -180 to +180, so the 120 to 175 points end up at the right-hand end, the -180 to -120 points at the left-hand end, and my plot looks split. This is what it looks like:
However, this is how I want it:
My code is as follows:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
# Read the two scan files and name the columns directly
# (read_fwf's prefix= argument was deprecated in pandas 1.4 and removed in 2.0)
df = pd.read_fwf('scan_c1c2c3h31_orig.txt', header=None,
                 names=['MM Initial Energy', 'MM Initial Angle', 'QM Energy', 'QM Angle'])
df = df.sort_values(by=['MM Initial Angle'], axis=0, ascending=True)
df = df.reset_index(drop=False)
df2 = pd.read_fwf('scan_c1c2c3h31.txt', header=None,
                  names=['MM Energy', 'MM Angle', 'QM Energy', 'QM Angle'])
df2 = df2.sort_values(by=['MM Angle'], axis=0, ascending=True)
df2 = df2.reset_index(drop=False)
ax = plt.axes()
df.plot(y="MM Initial Energy", x="MM Initial Angle", color='red', linestyle='dashed', linewidth=2.0, ax=ax, fontsize=20, legend=True)
df2.plot(y="MM Energy", x="MM Angle", color='red', ax=ax, linewidth=2.0, fontsize=20, legend=True)
df2.plot(y="QM Energy", x="QM Angle", color='blue', ax=ax, linewidth=2.0, fontsize=20, legend=True)
plt.ylim(-0.05, 6)
ax.xaxis.set_major_locator(MultipleLocator(20))
ax.xaxis.set_minor_locator(MultipleLocator(10))
ax.yaxis.set_minor_locator(MultipleLocator(0.5))
plt.xlabel('Angles (Degrees)', fontsize=25)
plt.ylabel('Energy (kcal/mol)', fontsize=25)
plt.show()
What I am doing is sorting the dataframe by 'MM Angle'/'MM Initial Angle' to avoid the plot "scrambling" caused by repeated values on the y-axis. The angles vary from -180 to 180, and I want -180 and +180 to sit next to each other.
I have tried sorting the negative values in ascending order and the positive values in descending order, as suggested in this post, but I still get the same plot, with the x-axis ranging from -180 to +180.
I have also tried using matplotlib axis spines to recenter the plot, and inverting the x-axis as suggested in this post, but I still get the same plot. Additionally, I have tried the suggestion in yet another post.
Any help will be appreciated.

If you don't need to rescale the plot, I would plot against the positive angles 0-360 and manually re-label the ticks:
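import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
# df is the question's dataframe; map -180..180 onto 0..360 and sort so the line is continuous
fig, ax = plt.subplots()
(df.assign(Angle=df['MM Initial Angle'] % 360)
   .sort_values('Angle')
   .plot(x='Angle', y=['QM Energy', 'MM Initial Energy'], ax=ax)
)
ax.xaxis.set_major_locator(MultipleLocator(20))
# relabel ticks above 180 back to the negative convention
x_ticks = ax.get_xticks()
x_ticks = [t - 360 if t > 180 else t for t in x_ticks]
ax.set_xticklabels(x_ticks)
plt.show()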
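A variant of the relabelling step, sketched under the assumption that a tick formatter is acceptable instead of fixed labels: matplotlib's FuncFormatter computes each label from the tick position at draw time, so the labels stay correct after zooming or panning. The five rows are copied from the question's table around the -180/+180 seam; wrapped_label is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter, MultipleLocator

# Five rows copied from the question's table, around the -180/+180 seam
df = pd.DataFrame({
    "MM Initial Angle": [170.0, 175.0, -180.0, -175.0, -170.0],
    "QM Energy": [0.519, 0.130, 0.000, 0.130, 0.519],
})


def wrapped_label(t, pos):
    """Hypothetical helper: label a 0-360 position in the -180..180 convention."""
    return f"{t - 360:g}" if t > 180 else f"{t:g}"


# Plot against angle % 360 (sorted) so the seam falls in the middle of the axis
df = df.assign(Angle=df["MM Initial Angle"] % 360).sort_values("Angle")
fig, ax = plt.subplots()
ax.plot(df["Angle"], df["QM Energy"])
ax.xaxis.set_major_locator(MultipleLocator(5))
# The formatter is called for every tick when drawing, so labels follow zooming
ax.xaxis.set_major_formatter(FuncFormatter(wrapped_label))
```

Because the labels are computed rather than fixed, there is no need to call get_xticks() first or to keep the label list in sync with the tick locations.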

Related

plot and draw curves in python matplotlib without ignoring first and last Nan values from the graph figure

I have a CSV file like the table below:
depth  x1  x2           x3
1000   15  Nan          Nan
1001   10  Nan          Nan
1002   5   Nan          Nan
1003   8   10           Nan
1004   12  11.11111111  Nan
1010   13  17.77777778  14.16666667
1011   14  18.88888889  15
1012   15  20           15.71428571
1013   16  20.55555556  16.42857143
1014   17  21.11111111  17.14285714
1017   20  22.77777778  19.28571429
1018   21  23.33333333  20
1019   22  23.88888889  20.83333333
1024   27  17.5         25
1025   28  15           25
1026   25  Nan          Nan
1027   26  Nan          Nan
1028   7   Nan          Nan
I want to plot the x1, x2 and x3 columns against the depth column. Sometimes these columns contain NaN values at the start and end, and I want to plot the whole curves without dropping the leading and trailing NaN rows.
The code below is my attempt, but each plot always starts and ends at the first and last valid values and ignores the leading and trailing NaN values:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
df = pd.read_csv("result.csv")
fig = plt.figure(figsize=(15, 12), dpi=100, tight_layout=True)
gs = gridspec.GridSpec(nrows=1, ncols=5, wspace=0)
fig.add_subplot(gs[0, 1])
plt.plot(df['x1'],df["depth"], linewidth=2, color='black', marker="o", markersize=3)
plt.gca().invert_yaxis()
fig.add_subplot(gs[0,2 ])
plt.plot(df["x2"],df["depth"], linewidth =2, color='black', marker="o", markersize=3)
plt.gca().invert_yaxis()
fig.add_subplot(gs[0,3])
plt.plot(df["x3"],df["depth"], linewidth =2, color='black', marker="o", markersize=3)
plt.gca().invert_yaxis()
plt.show()
The current result:
The desired result is shown in the image below, where the y-axes of all curves start from the same depth point:
You need to share the y-axis between the subplots:
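import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("result.csv")
fig, axs = plt.subplots(1, 3, figsize=(15, 12), dpi=100, tight_layout=True, gridspec_kw={'wspace': 0})
axs[0].plot(df.x1, df.depth, '-ok', lw=2, ms=3)
axs[1].plot(df.x2, df.depth, '-ok', lw=2, ms=3)
axs[1].sharey(axs[0])  # Axes.sharey needs matplotlib >= 3.3
axs[2].plot(df.x3, df.depth, '-ok', lw=2, ms=3)
axs[2].sharey(axs[0])
axs[0].invert_yaxis()  # depth grows downwards; the shared axes follow
plt.show()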
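The same fix can be sketched with sharey=True passed to plt.subplots, which links the y-axes when the subplots are created. This sketch assumes the file literally spells missing values as "Nan", which is not among the strings pandas treats as missing by default, so it is passed via na_values; the rows are a subset of the question's table embedded as text.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# A subset of the question's rows, embedded as CSV text for the sketch
csv_text = """depth,x1,x2,x3
1000,15,Nan,Nan
1001,10,Nan,Nan
1003,8,10,Nan
1010,13,17.77777778,14.16666667
1026,25,Nan,Nan
1028,7,Nan,Nan
"""
# Assumption: missing values are written "Nan", so declare them explicitly
df = pd.read_csv(io.StringIO(csv_text), na_values=["Nan"])

# sharey=True links the y-axes, so every panel spans the full depth range
fig, axs = plt.subplots(1, 3, sharey=True, gridspec_kw={"wspace": 0})
for ax, col in zip(axs, ["x1", "x2", "x3"]):
    ax.plot(df[col], df["depth"], "-ok", lw=2, ms=3)
    ax.set_xlabel(col)
axs[0].invert_yaxis()  # inverting one shared axis inverts all three
```

The NaN points themselves are still not drawn; what changes is that the y-limits are computed from all three panels together, so x2 and x3 get the same depth range as x1.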

Different binning for histplot as JointGrid (x, y) marginal plots

I have a pandas dataframe like this:
    Date        Weight  Year  Month  Day  Week  DayOfWeek
0   2017-11-13  76.1    2017  11     13   46    0
1   2017-11-14  76.2    2017  11     14   46    1
2   2017-11-15  76.6    2017  11     15   46    2
3   2017-11-16  77.1    2017  11     16   46    3
4   2017-11-17  76.7    2017  11     17   46    4
... ...         ...     ...   ...    ...  ...   ...
I created a JointGrid with:
import seaborn as sns
g = sns.JointGrid(data=df,
                  x="Date",
                  y="Weight",
                  marginal_ticks=True,
                  height=6,
                  ratio=2,
                  space=.05)
Then I defined the joint and marginal plots:
g.plot_joint(sns.scatterplot,
             hue=df["Year"],
             alpha=.4,
             legend=True)
g.plot_marginals(sns.histplot,
                 multiple="stack",
                 bins=20,
                 hue=df["Year"])
The result is this:
Now the question is: is it possible to specify different binning for the two histplots that produce the x and y marginal plots?
I don't think there is a built-in way to do that, but you can plot directly on the marginal axes using the plotting function of your choice, like so:
import seaborn as sns
import matplotlib.pyplot as plt
penguins = sns.load_dataset('penguins')  # fetches seaborn's example dataset (needs internet access)
data = penguins
x_col = "bill_length_mm"
y_col = "bill_depth_mm"
hue_col = "species"
g = sns.JointGrid(data=data, x=x_col, y=y_col, hue=hue_col)
g.plot_joint(sns.scatterplot)
# top marginal
sns.histplot(data=data, x=x_col, hue=hue_col, bins=5, ax=g.ax_marg_x, legend=False, multiple='stack')
# right marginal
sns.histplot(data=data, y=y_col, hue=hue_col, bins=40, ax=g.ax_marg_y, legend=False, multiple='stack')
plt.show()

How to make plots with small whitespace separations in Matplotlib or Seaborn?

I'd like to make this type of plot: multiple columns separated by small white gaps, one per category, each holding 3-5 (5 in this example) observations with varying values on the y-axis:
Actually, I can make this plot using ggplot2. For example:
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
library(dplyr)
library(ggplot2)
mtcars %>% reshape2::melt() %>%
ggplot(aes(x = variable, y = value)) +
geom_point() + facet_grid(~ variable) +
theme(axis.text.x = element_blank())
You set a categorical variable in your dataset, then use facet_grid(~ variable). This function splits your plot into multiple panels, one per level of the categorical variable.
Here is an approach to draw a similar plot using Python's matplotlib. The plot has a grey background and white major and minor gridlines to delimit the zones. Getting the dots in the center of each little cell is somewhat tricky: divide each unit interval into n cells of width 1/n and shift the dots by half a cell, 1/(2n). A secondary x-axis can be used to set the labels. A zorder has to be set to have the dots on top of the gridlines.
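A rough Python analogue of facet_grid, sketched here as an assumption rather than an exact port of the R code: seaborn's catplot with col= draws one small panel per category, and subplots_adjust(wspace=...) controls the white gap between panels. The column names category, obs and value are hypothetical, and the values are random.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n, cols = 5, 7  # observations per category, number of categories

# Long-format data: one row per (category, observation), like the melted mtcars
long = pd.DataFrame({
    "category": np.repeat([f"Cat_{c:02d}" for c in range(1, cols + 1)], n),
    "obs": np.tile(np.arange(n), cols),
    "value": rng.uniform(1, 10, n * cols),
})

# col= plays the role of facet_grid(~ variable): one narrow panel per category
g = sns.catplot(data=long, x="obs", y="value", col="category", kind="strip",
                jitter=False, color="crimson", height=3, aspect=0.3)
g.set(xticklabels=[], xlabel="")
g.set_titles("{col_name}")
g.figure.subplots_adjust(wspace=0.05)  # small white gaps between the panels
```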
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import ticker
n = 5
cols = 7
values = [np.random.uniform(1, 10, n) for c in range(cols)]
fig, ax = plt.subplots()
ax.set_facecolor('lightgrey')
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.xaxis.set_minor_locator(ticker.MultipleLocator(1 / (n)))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))
ax.grid(True, which='both', axis='both', color='white')
ax.set_xticklabels([])
ax.tick_params(axis='x', which='both', length=0)
ax.grid(which='major', axis='both', lw=3)
ax.set_xlim(1, cols + 1)
for i in range(1, cols + 1):
    ax.scatter(np.linspace(i, i + 1, n, endpoint=False) + 1 / (2 * n), values[i-1], c='crimson', zorder=2)
ax2 = ax.twiny()
ax2.set_xlim(0.5, cols + 0.5)
ticks = range(1, cols + 1)
ax2.set_xticks(ticks)
ax2.set_xticklabels([f'Cat_{t:02d}' for t in ticks])
bbox = dict(boxstyle="round", ec="limegreen", fc="limegreen", alpha=0.5)
plt.setp(ax2.get_xticklabels(), bbox=bbox)
ax2.tick_params(axis='x', length=0)
plt.show()
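The centering arithmetic from the matplotlib answer can be checked in isolation. In this sketch, cell_centers is a hypothetical helper: np.linspace with endpoint=False gives the n left cell edges, which coincide with the minor gridlines drawn by MultipleLocator(1 / n), and adding 1/(2n) moves each dot to the middle of its cell.

```python
import numpy as np


def cell_centers(i, n):
    """Hypothetical helper: x positions that put n dots in the middle of the
    n equal cells spanning [i, i + 1)."""
    # n left cell edges spaced 1/n apart (endpoint=False drops i + 1) ...
    left_edges = np.linspace(i, i + 1, n, endpoint=False)
    # ... shifted right by half a cell width, 1 / (2 * n)
    return left_edges + 1 / (2 * n)


centers = cell_centers(1, 5)
```

For i = 1 and n = 5 the dots land at 1.1, 1.3, 1.5, 1.7 and 1.9, each halfway between two minor gridlines.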

Expand x axis when x is a string (make xlim wider)

I have the following pandas data frame:
print(so)
Time Minions Crime_rate
0 2018-01 1907 0.147352
1 2018-02 2094 0.165234
2 2018-03 2227 0.148181
3 2018-04 2101 0.135174
4 2018-05 2321 0.132271
5 2018-06 2208 0.128623
6 2018-07 2593 0.140378
7 2018-08 2660 0.145865
8 2018-09 2488 0.149920
9 2018-10 2640 0.152273
10 2018-11 2501 0.138345
11 2018-12 2379 0.134931
I want to plot Time on the x axis, Minions on the y axis and Crime_rate on a secondary y axis. The problem is that the x axis is cropped and I want to expand it. I tried the following code:
so.plot(x="Time", y="Minions", kind="bar", color="orange", legend=False)
plt.ylabel("Number of Minions")
so["Crime_rate"].plot(secondary_y=True, rot=90)
plt.ylabel("Minion crime rate")
plt.ylim(0, 1)
# plt.xlim(min, max)
plt.show()
The code returns the following plot:
I had done this before using plt.xlim(), but so["Time"] is a string, so I cannot subtract or add to the limits. How can I expand the x axis limits to show the first and last bars?
I couldn't find a solution that keeps the x-axis as strings. To solve this, I avoided passing x="Time" to the plot calls and then overwrote the tick labels using set_xticklabels().
import matplotlib.pyplot as plt
fig, ax1 = plt.subplots()
ax1 = so["Minions"].plot(ax=ax1, kind="bar", color="orange", legend=False)
ax2 = ax1.twinx()
so["Crime_rate"].plot(ax=ax2, legend=False)
ax1.set_ylabel("Minions")
ax1.set_xlabel("Time")
ax2.set_ylabel("Minion crime rate")
ax2.set_xlim(-0.5, len(so) - 0.5) # extend the x axis by 0.5 to the left and 0.5 to the right
ax2.set_ylim(0, 1)
ax2.set_xticklabels(so["Time"])
plt.show()
This works because I never set the x-axis in ax1, so it defaulted to the positional index [0, 1, 2, ..., 10, 11]. This way, I could set the x-axis range from -0.5 to 11.5.
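A possible alternative that keeps x="Time", sketched under the assumption that pandas draws the bars at integer positions 0 to n-1 and uses the strings only as tick labels (consistent with the positional index described above). Under that assumption, numeric limits can be set after both plots are drawn; this has not been checked against every pandas version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# The question's data, re-typed from the printed frame
so = pd.DataFrame({
    "Time": [f"2018-{m:02d}" for m in range(1, 13)],
    "Minions": [1907, 2094, 2227, 2101, 2321, 2208,
                2593, 2660, 2488, 2640, 2501, 2379],
    "Crime_rate": [0.147352, 0.165234, 0.148181, 0.135174, 0.132271, 0.128623,
                   0.140378, 0.145865, 0.149920, 0.152273, 0.138345, 0.134931],
})

# Bars sit at integer positions 0..n-1; the Time strings are only tick labels
ax = so.plot(x="Time", y="Minions", kind="bar", color="orange", legend=False)
ax.set_ylabel("Number of Minions")
ax2 = so["Crime_rate"].plot(ax=ax, secondary_y=True, rot=90)
ax2.set_ylabel("Minion crime rate")
ax2.set_ylim(0, 1)
# Widen the shared x-limits by half a bar on each side so the outer bars show
ax2.set_xlim(-0.5, len(so) - 0.5)
```

Setting the limits last matters: the secondary line plot is what leaves the x-range at 0 to 11 and crops the outer bars.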

How to make axis tick labels visible on the other side of the plot in gridspec?

Plotting my favourite example dataframe, which looks like this:
x val1 val2 val3
0 0.0 10.0 NaN NaN
1 0.5 10.5 NaN NaN
2 1.0 11.0 NaN NaN
3 1.5 11.5 NaN 11.60
4 2.0 12.0 NaN 12.08
5 2.5 12.5 12.2 12.56
6 3.0 13.0 19.8 13.04
7 3.5 13.5 13.3 13.52
8 4.0 14.0 19.8 14.00
9 4.5 14.5 14.4 14.48
10 5.0 NaN 19.8 14.96
11 5.5 15.5 15.5 15.44
12 6.0 16.0 19.8 15.92
13 6.5 16.5 16.6 16.40
14 7.0 17.0 19.8 18.00
15 7.5 17.5 17.7 NaN
16 8.0 18.0 19.8 NaN
17 8.5 18.5 18.8 NaN
18 9.0 19.0 19.8 NaN
19 9.5 19.5 19.9 NaN
20 10.0 20.0 19.8 NaN
I have two subplots; for other reasons it is best for me to use gridspec. The plotting code is as follows (it is quite comprehensive, so I would like to avoid major changes to code that otherwise works perfectly and just doesn't handle one minor detail):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
import matplotlib as mpl
df = pd.read_csv('H:/DocumentsRedir/pokus/dataframe.csv', delimiter=',')
# setting limits for x and y
ylimit=(0,10)
yticks1=np.arange(0,11,1)
xlimit1=(10,20)
xticks1 = np.arange(10,21,1)
# general plot formatting (axes colour, background etc.)
plt.style.use('ggplot')
plt.rc('axes',edgecolor='black')
plt.rc('axes', facecolor = 'white')
plt.rc('grid', color = 'grey')
plt.rc('grid', alpha = 0.3) # alpha is percentage of transparency
colours = ['g','b','r']
title1 = 'The plot'
# GRIDSPEC INTRO - rows, cols, distance of individual plots
fig = plt.figure(figsize=(6,4))
gs=gridspec.GridSpec(1,2, hspace=0.15, wspace=0.08,width_ratios=[1,1])
## SUBPLOT of GRIDSPEC with lines
# the first plot
axes1 = plt.subplot(gs[0,0])
for count, vals in enumerate(df.columns.values[1:]):
    X = np.asarray(df[vals])
    h = vals
    p1 = plt.plot(X,df.index,color=colours[count],linestyle='-',linewidth=1.5,label=h)
# formatting
p1 = plt.ylim(ylimit)
p1 = plt.yticks(yticks1, yticks1, rotation=0)
p1 = axes1.yaxis.set_minor_locator(mpl.ticker.MultipleLocator(0.1))
p1 = plt.setp(axes1.get_yticklabels(),fontsize=8)
p1 = plt.gca().invert_yaxis()
p1 = plt.ylabel('x [unit]', fontsize=14)
p1 = plt.xlabel("Value [unit]", fontsize=14)
p1 = plt.tick_params('both', length=5, width=1, which='minor', direction = 'in')
p1 = axes1.xaxis.set_minor_locator(mpl.ticker.MultipleLocator(0.1))
p1 = plt.xlim(xlimit1)
p1 = plt.xticks(xticks1, xticks1, rotation=0)
p1 = plt.setp(axes1.get_xticklabels(),fontsize=8)
p1 = plt.legend(loc='best',fontsize = 8, ncol=2) #
# the second plot (something random)
axes2 = plt.subplot(gs[0,1])
for count, vals in enumerate(df.columns.values[1:]):
    nonans = df[vals].dropna()
    result=nonans-0.5
    p2 = plt.plot(result,nonans.index,color=colours[count],linestyle='-',linewidth=1.5)
p2 = plt.ylim(ylimit)
p2 = plt.yticks(yticks1, yticks1, rotation=0)
p2 = axes2.yaxis.set_minor_locator(mpl.ticker.MultipleLocator(0.1))
p2 = plt.gca().invert_yaxis()
p2 = plt.xlim(xlimit1)
p2 = plt.xticks(xticks1, xticks1, rotation=0)
p2 = axes2.xaxis.set_minor_locator(mpl.ticker.MultipleLocator(0.1))
p2 = plt.setp(axes2.get_xticklabels(),fontsize=8)
p2 = plt.xlabel("Other value [unit]", fontsize=14)
p2 = plt.tick_params('x', length=5, width=1, which='minor', direction = 'in')
p2 = plt.setp(axes2.get_yticklabels(), visible=False)
fig.suptitle(title1, size=16)
plt.show()
However, is it possible to show the y tick labels of the second subplot on the right hand side? The current code produces this:
And I would like to know if there is an easy way to get this:
No, ok, found out it is precisely what I wanted.
I want the TICKS to be on BOTH sides, just the LABELS to be on the right. The solution above removes my ticks from the left side of the subplot, which doesn't look good. However, this answer seems to get the right solution :)
To sum up, to get the ticks on both sides and the labels on the right, this is what fixes it:
axes2.yaxis.tick_right()
axes2.yaxis.set_ticks_position('both')
And if you need the same for the x-axis, use axes2.xaxis.tick_top() followed by axes2.xaxis.set_ticks_position('both').
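A minimal sketch of the summed-up fix on a two-panel gridspec layout (the plotted data are placeholders, not the question's dataframe): tick_right() moves the ticks and their labels to the right, and set_ticks_position('both') then turns the tick marks back on for the left side without moving the labels back.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib import gridspec

fig = plt.figure(figsize=(6, 4))
gs = gridspec.GridSpec(1, 2, wspace=0.08)
axes1 = fig.add_subplot(gs[0, 0])
axes2 = fig.add_subplot(gs[0, 1])
for ax in (axes1, axes2):
    ax.plot([10, 20], [0, 10])  # placeholder data

# Move the y ticks and their labels to the right-hand side ...
axes2.yaxis.tick_right()
# ... then switch the tick marks back on for the left side; labels stay right
axes2.yaxis.set_ticks_position('both')
```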
Try something like:
axes2.yaxis.tick_right()
See also the question "Python Matplotlib Y-Axis ticks on Right Side of Plot".
