This question already has answers here:
How to plot in multiple subplots
(12 answers)
Closed 12 months ago.
How can we create multiple boxplots at once using matplotlib or seaborn? For example, in a data frame I have a numerical variable 'y' and 4 categorical variables. I want 4 box plots, one for each categorical variable against 'y', all at once. I can do them one by one, as in the fourth line of the code below for one categorical variable. I am attaching my code.
# Create boxplot and add palette
# with predefined values like Paired, Set1, etc
#x=merged_df[["MinWarrantyInMonths","MaxWarrantyInMonths"]]
sns.boxplot(x='MinWarrantyInMonths', y="CountSevereAlarm",
            data=merged_df, palette="Set1")
import matplotlib.pyplot as plt
plt.style.use('ggplot')
from ggplot import ggplot, aes, geom_boxplot
import pandas as pd
import numpy as np
data = merged_df
#labels = np.repeat(['A','B'],20)
merged_df[["MinWarrantyInMonths","MaxWarrantyInMonths"]]=labels
data.columns = ['vals','labels']
ggplot(data, aes(x='vals', y='labels')) + geom_boxplot()
I hope I understood correctly what you're asking. If so, I suggest you try a for loop + using plt.subplot to create them together (side by side for example). See this:
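import matplotlib.pyplot as plt
import seaborn as sns
columns = ['col1', 'col2', 'col3', 'col4']
for n, column in enumerate(columns):
    # 1 row, 4 columns; subplot positions are 1-based, hence n + 1
    ax = plt.subplot(1, 4, n + 1)
    sns.boxplot(x=column, y="CountSevereAlarm", data=merged_df, palette="Set1")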
Within plt.subplot you need to specify the number of rows and columns you want. In your situation this is 1 row and 4 columns (because you're interested in 4 box plots). The third argument, n + 1, is the 1-based position of the current plot in that grid. Alternatively, (4, 1, n + 1) gives 4 rows and 1 column, so the box plots appear one below another (not side by side).
I hope this helps. You can also read online about Matplotlib and subplots as there are other options to get the same result as you want.
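One of those other options, as a rough sketch: plt.subplots creates all the axes up front, and each one can be handed to seaborn through the ax argument. The data frame and the column names (y, cat1 to cat4) are made-up placeholders here, not the asker's merged_df.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Placeholder data: 'y' is numeric, cat1..cat4 are hypothetical categorical columns.
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame({"y": rng.normal(size=n_rows)})
cat_columns = ["cat1", "cat2", "cat3", "cat4"]
for col in cat_columns:
    df[col] = rng.choice(["a", "b", "c"], size=n_rows)

# One row, four columns of axes; sharey keeps the y scales comparable.
fig, axes = plt.subplots(1, len(cat_columns), figsize=(16, 4), sharey=True)
for ax, col in zip(axes, cat_columns):
    sns.boxplot(x=col, y="y", data=df, ax=ax)  # draw into this specific axes
    ax.set_title(col)
fig.tight_layout()
```

Compared with calling plt.subplot inside the loop, this keeps a handle on the figure and on every axes, which helps if you want to tweak titles or limits afterwards.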
This question already has answers here:
Changing marker style in scatter plot according to third variable
(3 answers)
Scatter plot with different colors and markers from wide formatted data
(1 answer)
Closed 1 year ago.
I am trying to plot a scatter graph on some data with grouping. They are grouped by the column group and I want them to have different marker styles based on the group.
Minimal working code
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
colors = ['r','g','b','y']
markers = ['o', '^', 's', 'P']
df = pd.DataFrame()
df["index"] = list(range(100))
df["data"] = np.random.randint(100, size=100)
df["group"] = np.random.randint(4, size=100)
df["color"] = df.apply(lambda x: colors[x["group"]], axis=1)
df["marker"] = df.apply(lambda x: markers[x["group"]], axis=1)
plt.scatter(x=df["index"], y=df["data"], c=df["color"])
# What I thought would have worked
# plt.scatter(x=df["index"], y=df["data"], c=df["color"], marker=df["marker"])
plt.show()
What I want
I want the groups to have different marker styles as well. For example the red entries will have marker "o" (big dot), green entries with marker "^" (upward triangle) and so on.
What I tried
I thought
plt.scatter(x=df["index"], y=df["data"], c=df["color"], marker=df["marker"])
would have worked but nope...
TypeError: 'Series' objects are mutable, thus they cannot be hashed
I can loop over the DataFrame and group the entries by their group, then plot each group with the marker argument taken from the list defined above (like plt.scatter(..., marker=markers[group])). That would result in 4 plt.scatter(...) calls, as there are 4 groups in total. But IMO it is ugly to loop through a DataFrame row by row, and I strongly believe there is a better way.
Thanks in advance!
matplotlib
that is ugly IMO to loop through a DataFrame row by row and I strongly believe there is a better way
With matplotlib, I don't think there is a better way than to loop. Note that if you groupby the markers, it does not loop row by row, just group by group (so 4 times in this case).
This will call plt.scatter 4 times (once per marker):
for marker, d in df.groupby('marker'):
    plt.scatter(x=d['index'], y=d['data'], c=d['color'], marker=marker, label=marker)
plt.legend()
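For completeness, here is a self-contained variant of the same idea that groups on the group column directly and looks the color and marker up by group number, so the helper color and marker columns are not needed. The deterministic group assignment is an assumption made only so the sketch is reproducible.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

colors = ['r', 'g', 'b', 'y']
markers = ['o', '^', 's', 'P']

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "index": np.arange(100),
    "data": rng.integers(100, size=100),
    "group": np.arange(100) % 4,  # groups 0..3, assigned deterministically (assumption)
})

fig, ax = plt.subplots()
# One scatter call per group (4 in total), not one per row.
for group, d in df.groupby("group"):
    ax.scatter(d["index"], d["data"],
               c=colors[group], marker=markers[group],
               label=f"group {group}")
ax.legend()
```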
seaborn
As r-beginners commented, sns.scatterplot supports multiple markers via style:
sns.scatterplot(x=df['index'], y=df['data'], c=df['color'], style=df['marker'])
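Note that style on its own only says which column distinguishes the markers; seaborn then picks markers from its own default set. If the exact markers matter, a sketch along these lines should work, passing explicit mappings through palette and markers. The dict names and the deterministic groups are my own choices for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "index": np.arange(100),
    "data": rng.integers(100, size=100),
    "group": np.arange(100) % 4,  # groups 0..3, assigned deterministically (assumption)
})

# Explicit group -> color and group -> marker mappings.
palette = {0: "r", 1: "g", 2: "b", 3: "y"}
marker_map = {0: "o", 1: "^", 2: "s", 3: "P"}

ax = sns.scatterplot(data=df, x="index", y="data",
                     hue="group", style="group",
                     palette=palette, markers=marker_map)
```

All four markers here are filled ones; as far as I know seaborn does not allow mixing filled and unfilled markers in one call.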
I'd like to plot lines from a 3D data frame, the third dimension being an extra level in the column index. But I can't manage to either wrangle the data in a proper format or call the plot function appropriately. What I'm looking for is a plot where many series are plotted in subplots arranged by the outer column index. Let me illustrate with some random data.
import numpy as np
import pandas as pd
n_points_per_series = 6
n_series_per_feature = 5
n_features = 4
shape = (n_points_per_series, n_features, n_series_per_feature)
data = np.random.randn(*shape).reshape(n_points_per_series, -1)
points = range(n_points_per_series)
features = [chr(ord('a') + i) for i in range(n_features)]
series = [f'S{i}' for i in range(n_series_per_feature)]
index = pd.Index(points, name='point')
columns = pd.MultiIndex.from_product((features, series)).rename(['feature', 'series'])
data = pd.DataFrame(data, index=index, columns=columns)
So for this particular data frame, 4 subplots (n_features) should be generated, each containing 5 (n_series_per_feature) series with 6 data points. Since the plot method draws lines along the index and can generate a subplot for each column, I tried some variations:
data.plot()
data.plot(subplots=True)
data.stack().plot()
data.stack().plot(subplots=True)
None of them work. Either too many lines are generated with no subplots, or a subplot is made for each line separately, or, after stacking, the values along the index are joined into one long series. And I think the x and y arguments are not usable here, since converting the index to a column and using it in x just produces a long line jumping all over the place:
data.stack().reset_index().set_index('series').plot(x='point', y=features)
In my experience this sort of thing should be pretty straightforward in pandas, but I'm at a loss. How could this subplot arrangement be achieved? If not with a single function call, are there any more convenient ways than generating subplots in matplotlib and indexing the series for plotting manually?
If you're okay with using seaborn, it can be used to produce subplots from a data frame column, onto which plots with other columns can then be mapped. With the same setup you had I'd try something along these lines:
import seaborn as sns
# Completely stack the data frame
df = data \
    .stack() \
    .stack() \
    .rename("value") \
    .reset_index()
# Create grid and map line plots
g = sns.FacetGrid(df, col="feature", col_wrap=2, hue="series")
g.map_dataframe(sns.lineplot, x="point", y="value")
g.add_legend()
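If you'd rather stay with pandas and matplotlib, a short loop over the outer column level is another option: selecting data[feature] on a MultiIndex column frame returns the 5 series of that feature, which can be plotted into a given axes. This is only a sketch; the 2x2 grid is an assumption that fits the 4 features of the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of frame as in the question: columns indexed by (feature, series).
n_points_per_series, n_features, n_series_per_feature = 6, 4, 5
features = [chr(ord('a') + i) for i in range(n_features)]
series = [f'S{i}' for i in range(n_series_per_feature)]
columns = pd.MultiIndex.from_product((features, series), names=['feature', 'series'])
index = pd.Index(range(n_points_per_series), name='point')
data = pd.DataFrame(np.random.randn(n_points_per_series, len(columns)),
                    index=index, columns=columns)

# One subplot per feature; each gets the 5 series of that feature.
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
for ax, feature in zip(axes.flat, features):
    data[feature].plot(ax=ax, title=feature)
fig.tight_layout()
```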
1 - My goal is to create a bar plot of grades (y axis) against student IDs (x axis).
2 - Add an extra bar with the mean() of the grades in a different color.
What's the best way of doing it?
I could create the first part, but when it came to changing the color of the extra bar (the mean), I couldn't finish it.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
a = pd.read_excel('x.xlsx')
Felipe_stu = a['Teacher'] == 'Felipe'
Felipe_stu.plot(kind = 'bar', figsize = (20,5), color = 'gold')
Example of data (the first 10 rows): [image: data example]
Example of plot: [image]
I've already tried to create a list with all the colors of the respective items on the plot.
Such as:
my_color = []
for c in range(0, len(Jorge_stu)):
    my_color.append('gold')
my_color.append('blue')
That way, the last bar (the mean) would get the color I chose (blue in this case). This didn't work.
Any ideas on how I can put the mean bar on my plot?
Is it better to add an extra bar to the plot, or to add the mean to the dataframe itself and plot it afterwards?
You may need to do something like this:
How to create a matplotlib bar chart with a threshold line?
The threshold value in that example will be your mean line, which can be calculated simply with df[score_column_name].mean().
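A minimal sketch of both variants together, assuming a frame with Teacher, Student and Grade columns (only Teacher appears in the question; the other two names and all the values are made up): the mean is appended as one extra bar with its own color, and also drawn as a horizontal line.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the spreadsheet read with pd.read_excel.
a = pd.DataFrame({
    "Teacher": ["Felipe", "Felipe", "Jorge", "Felipe", "Felipe"],
    "Student": ["s1", "s2", "s3", "s4", "s5"],
    "Grade": [7.0, 8.5, 5.0, 6.0, 9.0],
})

# Use the boolean mask to select rows, rather than plotting the mask itself.
felipe = a[a["Teacher"] == "Felipe"]
mean_grade = felipe["Grade"].mean()

# One bar per student plus a final bar for the mean, colored differently.
labels = list(felipe["Student"]) + ["mean"]
heights = list(felipe["Grade"]) + [mean_grade]
bar_colors = ["gold"] * len(felipe) + ["blue"]

fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(labels, heights, color=bar_colors)
# Optional: the mean as a threshold line across all bars.
ax.axhline(mean_grade, color="blue", linestyle="--", linewidth=1)
ax.set_xlabel("Student")
ax.set_ylabel("Grade")
```

Building the labels, heights and colors as plain lists keeps the mean out of the dataframe, so later statistics on the frame are not skewed by an extra row.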
This question already has an answer here:
seaborn two corner pairplot
(1 answer)
Closed 1 year ago.
I want to do a pairplot with two different dataframes, data_up and data_low, on the upper part and the lower part of the pair grid respectively. Both dataframes have 4 columns, which correspond to the variables.
Looking at PairGrid, I did not find a way to give different data to each triangle.
E.g.:
import numpy as np
import seaborn as sns
import pandas as pd
# Dummy data :
data_up = np.random.uniform(size=(100,4))
data_low = np.random.uniform(size=(100,4))
# The two pairplots I currently use and want to mix:
sns.pairplot(pd.DataFrame(data_up))
sns.pairplot(pd.DataFrame(data_low))
How can I have only the upper triangle of the first one plotted with the lower triangle of the second one? On the diagonal I don't really care what's plotted. Maybe a Q-Q plot between the two corresponding marginals could be nice, but I'll see about that later.
You could try to put all columns together in the dataframe, and then use x_vars=... to tell which columns to use for the x-direction. Similar for y.
import numpy as np
import seaborn as sns
import pandas as pd
# Dummy data :
data_up_down = np.random.uniform(size=(100,8))
df = pd.DataFrame(data_up_down)
# use columns 0..3 for the x and 4..7 for the y
sns.pairplot(df, x_vars=(0,1,2,3), y_vars=(4,5,6,7))
import matplotlib.pyplot as plt
plt.show()
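If what you want is literally the upper triangle from one dataset and the lower triangle from the other, another route is to build the grid by hand with plain matplotlib. This is a rough sketch that does not use PairGrid at all; the overlaid histograms on the diagonal are just one arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Dummy data, as in the question
rng = np.random.default_rng(0)
data_up = rng.uniform(size=(100, 4))
data_low = rng.uniform(size=(100, 4))
n_vars = data_up.shape[1]

fig, axes = plt.subplots(n_vars, n_vars, figsize=(8, 8), sharex="col")
for i in range(n_vars):        # row index -> variable on the y axis
    for j in range(n_vars):    # column index -> variable on the x axis
        ax = axes[i, j]
        if i < j:
            # upper triangle: first dataset
            ax.scatter(data_up[:, j], data_up[:, i], s=5, color="C0")
        elif i > j:
            # lower triangle: second dataset
            ax.scatter(data_low[:, j], data_low[:, i], s=5, color="C1")
        else:
            # diagonal: overlaid histograms of the two marginals
            ax.hist(data_up[:, i], alpha=0.5, color="C0")
            ax.hist(data_low[:, i], alpha=0.5, color="C1")
fig.tight_layout()
```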
This question already has answers here:
Inconsistency when setting figure size using pandas plot method
(2 answers)
Closed 4 years ago.
In the two snippets below, where the only difference seems to be the data source type (pd.Series vs pd.DataFrame), why does plt.figure(num=None, figsize=(12, 3), dpi=80) have an effect in one case but not in the other when plotting with the pandas plot method?
Snippet 1 - Adjusting plot size when data is a pandas Series
# Imports
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# data
np.random.seed(123)
df = pd.Series(np.random.randn(10000),index=pd.date_range('1/1/2000', periods=10000)).cumsum()
print(type(df))
# plot
plt.figure(num=None, figsize=(12, 3), dpi=80)
ax = df.plot()
plt.show()
Output 1
Snippet 2 - Now the data source is a pandas Dataframe
# imports
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# data
np.random.seed(123)
dfx = pd.Series(np.random.randn(100),index=pd.date_range('1/1/2000', periods=100)).cumsum()
dfy = pd.Series(np.random.randn(100),index=pd.date_range('1/1/2000', periods=100)).cumsum()
df = pd.concat([dfx, dfy], axis = 1)
print(type(df))
# plot
plt.figure(num=None, figsize=(12, 3), dpi=80)
ax = df.plot()
plt.show()
The only difference here seems to be the type of the data source. Why would that affect the matplotlib output?
It seems that pd.DataFrame.plot() works a bit differently from pd.Series.plot(). Since the dataframe might have any number of columns, which might require subplots, different axes, etc., pandas defaults to creating a new figure. The way around this is to pass the arguments directly to the plot call, i.e. df.plot(figsize=(12, 3)) (dpi isn't accepted as a keyword argument, unfortunately). You can read more about it in this great answer:
In the first case, you create a matplotlib figure via fig =
plt.figure(figsize=(10,4)) and then plot a single column DataFrame.
Now the internal logic of pandas plot function is to check if there is
already a figure present in the matplotlib state machine, and if so,
use its current axes to plot the column's values to it. This works as
expected.
However in the second case, the data consists of two columns. There
are several options how to handle such a plot, including using
different subplots with shared or non-shared axes etc. In order for
pandas to be able to apply any of those possible requirements, it will
by default create a new figure to which it can add the axes to plot
to. The new figure will not know about the already existing figure and
its size, but rather have the default size, unless you specify the
figsize argument.
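If you also need dpi, one workaround that should behave the same for a Series and a DataFrame is to create the figure and axes yourself and pass the axes to plot via ax. A minimal sketch with data similar to Snippet 2 (the column names x and y are placeholders of mine):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Two-column frame, similar to Snippet 2
np.random.seed(123)
idx = pd.date_range('1/1/2000', periods=100)
df = pd.DataFrame(np.random.randn(100, 2), index=idx, columns=['x', 'y']).cumsum()

# Create the figure explicitly, then tell pandas which axes to draw on.
fig, ax = plt.subplots(figsize=(12, 3), dpi=80)
df.plot(ax=ax)
```

Because the axes is passed explicitly, pandas has no reason to open a new figure, so both the size and the dpi set in plt.subplots apply.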