Matplotlib violinplots overlap on the same column - python

I want to draw several violin plots on the same graph, each at its own x-position rather than all on the same column.
My data is a list of dataframes, and I want one violin plot per dataframe, drawn from the same column of each. (Ideally, each violin would be labelled with a name that is stored in another column of that dataframe.)
I used this code:
for i in range(0, len(sta_list)):
    plt.violinplot(sta_list[i]['diff_APS_1'])
I know this is wrong: all the violins land on the same position. I want to separate the resulting plots in the figure.

You can specify the x-position of each violin plot with the positions argument:
for i in range(0, len(sta_list)):
    plt.violinplot(sta_list[i]['diff_APS_1'], positions=[i])
A sample answer for demonstration, using the dataset from this post:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
x = np.random.poisson(lam=3, size=100)
y = np.random.choice(["S{}".format(i+1) for i in range(4)], size=len(x))
df = pd.DataFrame({"Scenario": y, "LMP": x})
fig, ax = plt.subplots()
for i, key in enumerate(['S1', 'S2', 'S3', 'S4']):
    ax.violinplot(df[df.Scenario == key]["LMP"].values, positions=[i])
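The question also asks for each violin to be labelled with a name stored in the dataframe itself. A minimal sketch of one way to do that is below. It assumes a hypothetical label column called 'station' (the original post does not name it) and uses random stand-in data for sta_list.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for sta_list: every frame carries its own name
# in a 'station' column (assumed name) next to the 'diff_APS_1' values.
sta_list = [
    pd.DataFrame({"station": name, "diff_APS_1": rng.normal(loc=k, size=50)})
    for k, name in enumerate(["A", "B", "C"])
]

fig, ax = plt.subplots()
labels = []
for i, frame in enumerate(sta_list):
    # positions=[i] places each violin at its own x-position
    ax.violinplot(frame["diff_APS_1"].to_numpy(), positions=[i])
    # take the label from the first row of the frame's name column
    labels.append(frame["station"].iloc[0])

# replace the numeric tick positions with the collected names
ax.set_xticks(range(len(sta_list)))
ax.set_xticklabels(labels)
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() at the end.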

Related

How to force display of x- and y-axis for each subplot in plotly.express

I want to plot a histogram with row and column facets using plotly.express.histogram(), where each subplot gets its own x- and y-axis (for better readability). In the documentation (e.g. the section "Histogram Facet Grids") I can see many examples where the x- and y-axes are repeated, but in my case this is not done automatically.
import numpy as np
import pandas as pd
import plotly.express as px
# create a dummy dataframe with lots of variables
rng = np.random.default_rng(42)
n_vars = 3
n_samples = 10
random_vars = [rng.normal(size=n_samples) for v in range(n_vars)]
m = np.vstack(random_vars).T
columns = pd.MultiIndex.from_tuples([('a','b'),('a','c'),('b','c')],names=['src','tgt'])
df = pd.DataFrame(m,columns=columns)
# convert to long format
df_long = df.melt()
# plot with plotly
fig = px.histogram(df_long,x='value',facet_row='src',facet_col='tgt')
fig.update_layout(yaxis={'side': 'left'})
fig.show()
which gives me:
How do I post-hoc configure the figure so that the x- and y-axis are shown for each subplot?
All you need to do is update each x- and y-axis so that it shows its own tick labels and no longer matches the other axes:
fig.for_each_yaxis(lambda y: y.update(showticklabels=True,matches=None))
fig.for_each_xaxis(lambda x: x.update(showticklabels=True,matches=None))

Multiple boxplot in a single Graphic in Python

I'm a beginner in Python.
In my internship project I am trying to plot boxplots from data contained in a CSV file.
I need to plot boxplots for each of the four variables shown above (AAG, DENS, SRG and RCG). Since each variable has values in the range [001] to [100], there will be 100 boxplots for each variable, which need to be plotted in a single graph as shown in the image.
This is the graph I need to plot, but for each variable there will be 100 boxplots, as each one has 100 columns of values:
The x-axis is the "Year", which ranges from 2025 to 2030, so I need a graph like the one shown in figure 2 for each year; the y-axis is the set of values for each variable.
Using the pandas melt function and the seaborn library I was able to plot only the boxplots of a single column, but that's not what I need:
import pandas as pd
import seaborn as sns
df = pd.read_csv("2DBM_50x50_Central_Aug21_Sim.cliped.csv")
mdf = df.melt(id_vars=['Year'], value_vars='AAG[001]')
print(mdf)
ax = sns.boxplot(x='Year', y='value', width=0.2, data=mdf)
Result of the code above:
What can I try to resolve this?
The following code gives you five subplots, each containing the data of one variable, with one boxplot per year. To change the range of columns used for each variable, change the upper limit in var_range = range(1, 101); to see the outliers, change showfliers to True.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv("2DBM_50x50_Central_Aug21_Sim.cliped.csv")
variables = ["AAG", "DENS", "SRG", "RCG", "Thick"]
period = range(2025, 2031)
var_range = range(1, 101)
fig, axes = plt.subplots(2, 3)
flattened_axes = fig.axes
flattened_axes[-1].set_visible(False)
for i, var in enumerate(variables):
    var_columns = [f"TB_acc_{var}[{j:05}]" for j in var_range]
    data = df.melt(id_vars=["Period"], value_vars=var_columns, value_name=var)
    ax = flattened_axes[i]
    sns.boxplot(x="Period", y=var, width=0.2, data=data, ax=ax, showfliers=False)
plt.tight_layout()
plt.show()
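To see what the melt step in the loop above does, here is a small sketch on made-up data. It assumes the same column naming pattern as the answer ("Period" plus "TB_acc_AAG[00001]" and so on) but uses only three value columns and random numbers instead of the CSV file.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

period = list(range(2025, 2031))                            # six periods
var_columns = [f"TB_acc_AAG[{j:05}]" for j in range(1, 4)]  # three value columns

# Wide format: one row per period, one column per simulated series
wide = pd.DataFrame(rng.random((len(period), len(var_columns))), columns=var_columns)
wide.insert(0, "Period", period)

# Long format: one row per (period, column) pair, all values in a single 'AAG' column
long = wide.melt(id_vars=["Period"], value_vars=var_columns, value_name="AAG")
print(long.shape)  # (18, 3)
```

Each period now has one row per original column, which is why seaborn can draw a single box per period from all of those values together.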

Changing the order of pandas/matplotlib line plotting without changing data order

Given the following example:
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
df.plot(linewidth=10)
The order of plotting puts the last column on top:
How can I make this keep the data & legend order but change the behaviour so that it plots X on top of Y on top of Z?
(I know I can change the data column order and edit the legend order but I am hoping for a simpler easier method leaving the data as is)
UPDATE: final solution used:
(Thanks to r-beginners) I used get_lines() to modify the z-order of each line:
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
fig = plt.figure()
ax = fig.add_subplot(111)
df.plot(ax=ax, linewidth=10)
lines = ax.get_lines()
for i, line in enumerate(lines, -len(lines)):
    line.set_zorder(abs(i))
fig
In a notebook produces:
Get the line artists from the axes and set each one's zorder in the desired order (artists with a higher zorder are drawn on top).
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(2021)
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
ax = df.plot(linewidth=10)
l = ax.get_children()
print(l)
l[0].set_zorder(3)
l[1].set_zorder(1)
l[2].set_zorder(2)
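The same idea can be written without hard-coding one zorder per line. The sketch below is one possible generalisation, not the answer author's code: it gives the first column the highest zorder and the last column the lowest, for any number of columns, and leaves the data and legend order alone.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(2021)
df = pd.DataFrame(np.random.randint(1, 10, size=(8, 3)), columns=list("XYZ"))
ax = df.plot(linewidth=10)

lines = ax.get_lines()
base = lines[0].get_zorder()  # default zorder that matplotlib assigns to lines
for i, line in enumerate(lines):
    # first column gets the highest zorder, last column the lowest
    line.set_zorder(base + len(lines) - i)

zorders = [line.get_zorder() for line in lines]
line_labels = [line.get_label() for line in lines]
```

Because only the drawing order changes, the legend still lists X, Y, Z in the original column order.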
I will just put this answer here because it is a solution to the problem, but probably not the one you are looking for.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# generate data
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
# read columns in reverse order and plot them
# so normally, the legend will be inverted as well, but if we invert it again, you should get what you want
df[df.columns[::-1]].plot(linewidth=10, legend="reverse")
Note that in this example, you don't change the order of your data, you just read it differently, so I don't really know if that's what you want.
You can also make it easier on the eyes by wrapping this in a function.
def plot_dataframe(df: pd.DataFrame) -> None:
    df[df.columns[::-1]].plot(linewidth=10, legend="reverse")
# then you just have to call this
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
plot_dataframe(df)

How to remove certain values before plotting data

I'm using Python for the first time. I have a CSV file with a few columns of data: location, height, density, day, etc. I am plotting height (i_h100) vs. density (i_cd) and have managed to constrain the height to values below 50 with the code below. I now want to restrict the plotted values to those within a certain 'day' range, say 85-260. I can't work out how to do this.
import pandas
import matplotlib.pyplot as plt
data=pandas.read_csv('data.csv')
data.plot(kind='scatter',x='i_h100',y='i_cd')
plt.xlim(right=50)
Use .loc to subset the data going into the graph.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Make some dummy data
np.random.seed(42)
df = pd.DataFrame({'a': np.random.randint(0, 365, 20),
                   'b': np.random.rand(20),
                   'c': np.random.rand(20)})
# all data: plot of 'b' vs. 'c'
df.plot(kind='scatter', x='b', y='c')
plt.show()
# use .loc to subset data displayed based on value in 'a'
# can also use .loc to restrict values of 'b' displayed rather than plt.xlim
df.loc[df['a'].between(85,260) & (df['b'] < 0.5)].plot(kind='scatter', x='b', y='c')
plt.show()
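Applied to the column names from the question, the same filter might look like the sketch below. The data here is randomly generated, and the column name 'day' is an assumption based on the question's description of the CSV file.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Random stand-in for data.csv; 'day' is an assumed column name
data = pd.DataFrame({
    "i_h100": rng.uniform(0, 100, 200),
    "i_cd": rng.random(200),
    "day": rng.integers(1, 366, 200),
})

# between() is inclusive on both ends by default
mask = data["day"].between(85, 260) & (data["i_h100"] < 50)
subset = data.loc[mask]

ax = subset.plot(kind="scatter", x="i_h100", y="i_cd")
```

Filtering the rows this way also removes the need for plt.xlim(right=50), since points with a height of 50 or more never reach the plot.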

Can we plot a particular column of a row against another column of same row in matplotlib

I am using Python to plot my data set. I want to plot one column of a row against another column of the same row. To be precise, I want two of my columns to be the x-axis and y-axis, and then plot on that graph the values for a particular entry chosen by the user.
import matplotlib.pyplot as plt
import pandas
import numpy as np
filename = 'friuts.csv'
raw_data = open(filename, 'rb')
data = pandas.read_csv(raw_data)
mydata = pandas.DataFrame(np.random.randn(10,2), columns=['col1','col2'])
mydata.hist()
plt.show()
My data set has a column with the fruit name and two further columns with their weights. Can those two weights be used as the x- and y-axis? I only want to plot a single row at a time.
So far I have only managed to plot the entire columns, i.e. all the rows at once.
Is this what you are looking for? http://matplotlib.org/examples/shapes_and_collections/scatter_demo.html
plt.scatter(mydata.col1, mydata.col2)
plt.show()
Assuming that you want to plot a single point with the information in a given row:
Select the row using the fruit name; boolean indexing returns a one-row pd.DataFrame
Call the plot method on the result
For example:
import pandas as pd
import matplotlib.pyplot as plt
# Create the data frame
mydata = pd.DataFrame({
    'name': ['banana', 'mango', 'lima', 'apple'],
    'weight': [1, 2, 3, 4]})
# Select the fruit you want to plot. Boolean indexing returns a one-row
# pd.DataFrame including the columns 'name' and 'weight'
to_plot = mydata[mydata['name'] == 'banana']
# Call the plot function, indicating which columns to use for the X and Y axes.
fig, ax = plt.subplots()
to_plot.plot(x='name', y='weight', marker='o', ax=ax)
ax.set_ylabel('Weight')
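The question describes two weight columns per fruit. A minimal sketch for that case is below. It assumes hypothetical column names 'weight_1' and 'weight_2' and made-up values, and plots the selected row as a single point with one weight on each axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data; 'weight_1' and 'weight_2' are assumed column names
mydata = pd.DataFrame({
    "name": ["banana", "mango", "lima", "apple"],
    "weight_1": [1, 2, 3, 4],
    "weight_2": [4, 5, 6, 7],
})

fruit = "mango"  # in practice this could come from input()

# .iloc[0] turns the one-row DataFrame into a pd.Series for that fruit
row = mydata.loc[mydata["name"] == fruit].iloc[0]

fig, ax = plt.subplots()
ax.scatter(row["weight_1"], row["weight_2"])
ax.annotate(row["name"], (row["weight_1"], row["weight_2"]))
ax.set_xlabel("weight_1")
ax.set_ylabel("weight_2")
```

If the name is not found, the selection is empty and .iloc[0] raises an IndexError, so a real script would want to check for that first.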
