matplotlib legend order horizontally first - python

Is there a way to make a plot legend run horizontally (left to right) instead of vertically, without specifying the number of columns (ncol=...)?
I'm plotting a varying number of lines (roughly 5-15), and I'd rather not try to calculate the optimal number of columns dynamically (i.e. the number of columns that will fit across the figure without running off, when the labels are varying). Also, when there is more than a single row, the ordering of entries goes top-down, then left-right; if it could default to horizontal, this would also be alleviated.
Related: Matplotlib legend, add items across columns instead of down

It seems that, at this time, matplotlib defaults to a vertical layout. Though not ideal, one workaround is to use half the number of lines as the number of columns, which gives a two-row legend:
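import math
import numpy as np
import matplotlib.pyplot as plt
npoints = 1000
xs = [np.random.randn(npoints) for i in range(10)]
ys = [np.random.randn(npoints) for i in range(10)]
fig, ax = plt.subplots()
for i, (x, y) in enumerate(zip(xs, ys)):
    # Each series needs a label, otherwise the legend has nothing to show
    ax.scatter(x, y, label="series %d" % i)
nlines = len(xs)
# Half the entries (rounded up) per row gives a two-row legend
ncol = int(math.ceil(nlines/2.))
plt.legend(ncol=ncol)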
So here you take the number of lines you're plotting (via nlines = len(xs)), turn that into a number of columns via ncol = int(math.ceil(nlines/2.)), and pass it to plt.legend(ncol=ncol).
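On the ordering point raised in the question: matplotlib fills a multi-column legend column by column. A common workaround, sketched below, is to reorder the handles and labels before passing them to the legend so that they read left to right. The helper name row_major_order is hypothetical (it is not part of matplotlib), and the sketch assumes the legend splits its entries across columns as evenly as possible, with any longer columns first.

```python
def row_major_order(items, ncol):
    """Reorder items so that a column-filled legend reads left to right."""
    # Legend column i should hold items i, i + ncol, i + 2*ncol, ...
    columns = [items[i::ncol] for i in range(ncol)]
    # Concatenate the columns in the order the legend will consume them
    return [item for column in columns for item in column]


labels = ["a", "b", "c", "d", "e", "f"]
reordered = row_major_order(labels, ncol=3)
print(reordered)
```

With an Axes object ax, this would be used roughly as handles, labels = ax.get_legend_handles_labels(), followed by ax.legend(row_major_order(handles, ncol), row_major_order(labels, ncol), ncol=ncol).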

Related

How to dynamically plot multiple subplots in Python?

I need to plot a variable number of plots (at least 1, but the maximum isn't known), and I couldn't come up with a way to dynamically create subplots and assign the graphs to them.
The code looks like this:
check = False
if "node_x_9" in names:
    if "node_x_11" in names:
        plt.plot(df["node_x_9"], df["node_x_11"])
        check = True
elif "node_x_10" in names:
    if "node_x_12" in names:
        plt.plot(df["node_x_10"], df["node_x_12"])
        check = True
if check:
    plt.show()
I thought about presetting a number of subplots (e.g. plt.subplots(3, 3)) but I still could not come up with a way to assign the plots without bounding them to a given subplot position.
My idea would be to create a 2x1 plot if I have two subplots, 1x1 if I have one, 3x1 if I have three, and so on, leaving no subplot space empty.
I've come across cases like this: you want to generate one plot per case, but don't know how many cases exist until you query the data on the day.
I used a square layout as an assumption (alter the below if you require a different aspect ratio), then counted how many cases there are and took the integer square root; that value plus one gives the integer side length of a square grid that is guaranteed to fit all the plots.
Now you can establish a matplotlib GridSpec object with the requisite width and height, referencing it by index to place your individual plots.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import gridspec
import random
# Create some random data: a random number of rows (between 5 and 100), 25 values each
size = random.randint(5, 100)
data_rows = pd.DataFrame([np.random.normal(1, 5, 25) for s in range(0, size)])
# Find the side length of a (near) square grid based on the number of data samples
side_length = int(len(data_rows)**0.5) + 1
print(side_length)
# Create a GridSpec object with side_length cells in both the x and y dimensions
gs = gridspec.GridSpec(side_length, side_length)
fig = plt.figure(figsize=(10, 10))
# Using the index i, populate the GridSpec object with
# one plot per cell.
for i, row in data_rows.iterrows():
    ax = fig.add_subplot(gs[i])
    ax.bar(x=range(0, 25), height=row)
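For the n x 1 layout the question asks for, a minimal sketch is to create exactly as many rows as there are plots and hide any cells left over when more than one column is used. The helper name make_grid and the random data are illustrative assumptions, not part of the original code; the Agg backend is selected only so the sketch runs without a display.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np


def make_grid(n_plots, ncols=1):
    """Return a figure and a flat array holding exactly n_plots visible axes."""
    nrows = math.ceil(n_plots / ncols)
    # squeeze=False keeps axes 2-D even for a 1x1 grid, so ravel() always works
    fig, axes = plt.subplots(nrows, ncols, squeeze=False,
                             figsize=(4 * ncols, 3 * nrows))
    flat = axes.ravel()
    # Hide the unused cells in the last row, if any
    for ax in flat[n_plots:]:
        ax.set_visible(False)
    return fig, flat[:n_plots]


# Hypothetical data: three x/y pairs that turned out to be available
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=20), rng.normal(size=20)) for _ in range(3)]

fig, axes = make_grid(len(pairs))  # 3x1 layout, no empty cells
for ax, (x, y) in zip(axes, pairs):
    ax.plot(x, y)

fig2, axes2 = make_grid(5, ncols=2)  # 3x2 grid with one hidden cell
```

Collecting the available pairs into a list first, as assumed here, means the layout only depends on the length of that list.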

How to adjust space between bars in plot.bar? [duplicate]

I'm trying to make a grouped bar plot in matplotlib, following the example in the gallery. I use the following:
import matplotlib.pyplot as plt
from numpy import arange
plt.figure(figsize=(7,7), dpi=300)
xticks = [0.1, 1.1]
groups = [[1.04, 0.96],
          [1.69, 4.02]]
group_labels = ["G1", "G2"]
num_items = len(group_labels)
ind = arange(num_items)
width = 0.1
s = plt.subplot(1,1,1)
for num, vals in enumerate(groups):
    print("plotting: ", vals)
    group_len = len(vals)
    gene_rects = plt.bar(ind, vals, width,
                         align="center")
    ind = ind + width
num_groups = len(group_labels)
# Make label centered with respect to group of bars
# Is there a less complicated way?
offset = (num_groups / 2.) * width
xticks = arange(num_groups) + offset
s.set_xticks(xticks)
print("xticks: ", xticks)
plt.xlim([0 - width, max(xticks) + (num_groups * width)])
s.set_xticklabels(group_labels)
My questions are:
How can I control the space between the groups of bars? Right now the spacing is huge and it looks silly. Note that I do not want to make the bars wider - I want them to have the same width, but be closer together.
How can I get the labels to be centered below the groups of bars? I tried to come up with some arithmetic calculations to position the xlabels in the right place (see code above) but it's still slightly off... it feels a bit like writing a plotting library rather than using one. How can this be fixed? (Is there a wrapper or built in utility for matplotlib where this is default behavior?)
EDIT: Reply to @mlgill: thank you for your answer. Your code is certainly much more elegant but still has the same issue, namely that the width of the bars and the spacing between the groups are not controlled separately. Your graph looks correct but the bars are far too wide -- it looks like an Excel graph -- and I wanted to make the bars thinner.
Width and margin are now linked, so if I try:
margin = 0.60
width = (1.-2.*margin)/num_items
It makes the bar skinnier, but brings the group far apart, so the plot again does not look right.
How can I make a grouped bar plot function that takes two parameters: the width of each bar, and the spacing between the bar groups, and plots it correctly like your code did, i.e. with the x-axis labels centered below the groups?
I think that since the user has to compute specific low-level layout quantities like margin and width, we are still basically writing a plotting library :)
Actually I think this problem is best solved by adjusting figsize and width; here is my output with figsize=(2,7) and width=0.3:
By the way, this type of thing becomes a lot simpler if you use the pandas wrappers (I've also imported seaborn; it's not necessary for the solution, but it makes the plot a lot prettier and more modern looking, in my opinion):
import matplotlib.pyplot as plt
import pandas as pd
import seaborn
seaborn.set()
# groups and group_labels are the lists defined in the question
df = pd.DataFrame(groups, index=group_labels)
df.plot(kind='bar', legend=False, width=0.8, figsize=(2,5))
plt.show()
The trick to both of your questions is understanding that bar graphs in Matplotlib expect each series (G1, G2) to have a total width of "1.0", counting margins on either side. Thus, it's probably easiest to set margins up and then calculate the width of each bar depending on how many of them there are per series. In your case, there are two bars per series.
Assuming you left align each bar, instead of center aligning them as you had done, this setup will result in series which span from 0.0 to 1.0, 1.0 to 2.0, and so forth on the x-axis. Thus, the exact center of each series, which is where you want your labels to appear, will be at 0.5, 1.5, etc.
I've cleaned up your code as there were a lot of extraneous variables. See comments within.
import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize=(7,7), dpi=300)
groups = [[1.04, 0.96],
          [1.69, 4.02]]
group_labels = ["G1", "G2"]
num_items = len(group_labels)
# This needs to be a numpy range for xdata calculations
# to work.
ind = np.arange(num_items)
# Bar graphs expect a total width of "1.0" per group
# Thus, you should make the sum of the two margins
# plus the sum of the width for each entry equal 1.0.
# One way of doing that is shown below. You can make
# the margins smaller if they're still too big.
margin = 0.05
width = (1.-2.*margin)/num_items
s = plt.subplot(1,1,1)
for num, vals in enumerate(groups):
    print("plotting: ", vals)
    # The position of the xdata must be calculated for each of the two data series
    xdata = ind+margin+(num*width)
    # align="edge" left-aligns the bars, which is what this method of
    # calculating positions assumes (since matplotlib 2.0 the default is "center")
    gene_rects = plt.bar(xdata, vals, width, align="edge")
# You should no longer need to manually set the plot limit since everything
# is scaled to one.
# Also the ticks should be much simpler now that each group of bars extends from
# 0.0 to 1.0, 1.0 to 2.0, and so forth and, thus, are centered at 0.5, 1.5, etc.
s.set_xticks(ind+0.5)
s.set_xticklabels(group_labels)
I read an answer that Paul Ivanov posted on Nabble that might solve this problem with less complexity. Just set the index as below. This will increase the spacing between grouped columns.
ind = np.arange(0,12,2)
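The question asks for a function that takes the bar width and the spacing between groups as two separate parameters. One possible sketch, which drops the assumption that each group spans exactly 1.0 on the x-axis, is below. The name grouped_bar and its parameters bar_width and group_gap are hypothetical, and the data layout follows the question (each inner list is one series, with one value per group). The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np


def grouped_bar(ax, groups, group_labels, bar_width=0.1, group_gap=0.15):
    """Draw grouped bars; groups[s][g] is the value of series s in group g.

    Returns the x positions of the group centres, where the ticks are placed.
    """
    data = np.asarray(groups, dtype=float)
    n_series, n_groups = data.shape
    group_width = n_series * bar_width
    # Left edge of each group: groups are separated by group_gap
    starts = np.arange(n_groups) * (group_width + group_gap)
    for s in range(n_series):
        # align="edge" means x is the left edge of each bar
        ax.bar(starts + s * bar_width, data[s], bar_width, align="edge")
    centers = starts + group_width / 2.0
    ax.set_xticks(centers)
    ax.set_xticklabels(group_labels)
    ax.set_xlim(starts[0] - group_gap, starts[-1] + group_width + group_gap)
    return centers


groups = [[1.04, 0.96],
          [1.69, 4.02]]
fig, ax = plt.subplots(figsize=(3, 5))
centers = grouped_bar(ax, groups, ["G1", "G2"], bar_width=0.1, group_gap=0.15)
```

Because positions are computed in data units, the overall look still depends on figsize, as one of the answers above points out; the two parameters only control the ratio between bar width and group spacing.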

Scatter matrix with few variables with target in Y axis

I am trying to plot each variable against SalePrice. I tried pd.scatter_matrix, but I get a number of unnecessary plots for all the pairwise combinations. What I am looking for is SalePrice on the Y axis and one scatter plot for each variable in the data set. Here is the code I tried:
data_prep_num['Sales_test_data']=data_sales_price_old
att=['Sales_test_data','YearBuilt','LotArea','MSSubClass','BsmtFinSF1','TotalBsmtSF','1stFlrSF','2ndFlrSF','GrLivArea','GarageArea']
pd.scatter_matrix(data_prep_num[att],alpha=.4,figsize=(30,30))
If you want to use pd.plotting.scatter_matrix but only want one of the rows (i.e. the Sales_test_data column), you can iterate over the plotting axes, and hide the combinations you don't want.
Assuming the SalePrice is the very first column (index 0):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
axes = pd.plotting.scatter_matrix(data_prep_num[att], alpha=0.4, figsize=(30,30))
# Hide every row of the matrix except row 0 (the SalePrice row)
for i in range(np.shape(axes)[0]):
    if i != 0:
        for j in range(np.shape(axes)[1]):
            axes[i,j].set_visible(False)
Note: This is obviously not super efficient when you start having lots of columns though.
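An alternative that avoids building the full matrix is to draw only the plots that are needed: one scatter plot per feature, all sharing the target on the y-axis. This is a sketch under assumptions: the DataFrame below is filled with made-up random values purely for illustration (only the column names are borrowed from the question), and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for the real data set
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "SalePrice": rng.normal(180000, 40000, 50),
    "LotArea": rng.normal(10000, 2000, 50),
    "GrLivArea": rng.normal(1500, 300, 50),
    "GarageArea": rng.normal(450, 100, 50),
})

target = "SalePrice"
features = [col for col in df.columns if col != target]

# One column of the figure per feature; sharey keeps the target axis common
fig, axes = plt.subplots(1, len(features), sharey=True, squeeze=False,
                         figsize=(4 * len(features), 4))
for ax, col in zip(axes[0], features):
    ax.scatter(df[col], df[target], alpha=0.4)
    ax.set_xlabel(col)
axes[0, 0].set_ylabel(target)
```

This creates only as many axes as there are features, so it scales linearly with the number of columns rather than quadratically.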

Summing lines in pyplot

I have a pyplot figure with a few lines on it. I would like to be able to draw an extra line, which would be a sum of all others' values. The lines are not plotted against the same x values (they are visually shorter in the plot - see the image). The resulting line would be somewhat above all others.
One idea I have for it requires obtaining a line's y value in a specific x point. Is there such a function? Or does pyplot/matplotlib support summing lines' values?
Superposition is the short answer to your question: read this for more.
Example:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(100) #range of x axis
y1 = np.random.rand(100,1) #some random numbers
y2 = np.random.rand(100,1)
#this will only plot y1 value
plt.plot(x,y1)
plt.show()
#this will plot summation of two elements
plt.plot(x,y1+y2)
plt.show()
I took a second look at your question: your y values have different lengths, so adding them directly as in the example above would not work. What you can do is create four equal-sized lists, filling the positions where a line has no value with zero, and then apply superposition (simply add them all and plot the result).
For the future generations: numpy.interp() was my solution to this problem.
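A minimal sketch of the numpy.interp() approach: resample every line onto a common x grid, treat each line as zero outside its own x range, and add the results. The two lines below are made-up data, and the zero-outside-range choice (left=0.0, right=0.0) is an assumption about the desired behaviour. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Two made-up lines sampled at different x values
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y1 = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
x2 = np.array([1.5, 2.5, 3.5])
y2 = np.array([2.0, 4.0, 6.0])
lines = [(x1, y1), (x2, y2)]

# Common grid: every x value that occurs in any of the lines, sorted
x_common = np.union1d(x1, x2)

# Interpolate each line onto the common grid; outside its range it contributes 0
total = np.zeros_like(x_common)
for x, y in lines:
    total += np.interp(x_common, x, y, left=0.0, right=0.0)

fig, ax = plt.subplots()
for x, y in lines:
    ax.plot(x, y)
ax.plot(x_common, total, linestyle="--", label="sum")
ax.legend()
```

Note that np.interp expects the x values of each line to be increasing.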

Altering individual graphs in Pandas hist groupby plot

I'm plotting frequencies grouped by country in an IPython notebook using:
df['country'].hist(by=df['frequency'], bins=yr_bins)
But the resultant figure is badly formatted;
Things I'd like/like to be able to define:
y axis log or not
sizing of individual graphs
x axis limits
auto layout
spacing between each individual graph so the labels don't over lap
Things I've realised so far:
the call to .hist outputs a 9x9 2d array of matplotlib.axes._subplots.AxesSubplot objects
all of these AxesSubplots are embedded in a single figure
Best working case so far:
For log or not: just using keyword log=True
Sizing of individual graphs and an auto layout:
Determine the number of groups: n = len(df.groupby('country'))
Then use the combination of keywords layout=(rows, columns) and figsize=(width, height):
Hard-code the number of columns to c, and the desired width w and height h of each graph
Then use layout=(math.ceil(n / c), c) and figsize=(c * w, math.ceil(n / c) * h); the number of rows must be an integer, so n / c is rounded up
Setting x axis limits: get the axes array by axes = df... then loop over the axes applying set_xlim(lim)
for row in axes:
    for ax in row:
        ax.set_xlim(lim)
The spacing turned out OK, but if required then do:
plt.subplots_adjust(wspace=0.5, hspace=1)
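Put together, the steps above might look like the following sketch. The DataFrame, its column names ("country", "value"), and the values of c, w, h and lim are made up for illustration; np.ravel is used instead of the nested loop so that the code also works when the returned axes array is not 2-D. The Agg backend is selected only so the sketch runs without a display.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: one numeric value per record, grouped by country
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "country": rng.choice(["AT", "BE", "CH", "DE", "FR"], size=500),
    "value": rng.uniform(0, 100, size=500),
})

n = df["country"].nunique()   # number of groups
c = 3                         # hard-coded number of columns
w, h = 3, 2.5                 # desired width and height of each graph
rows = math.ceil(n / c)

axes = df["value"].hist(by=df["country"], bins=20, log=True,
                        layout=(rows, c), figsize=(c * w, rows * h))

# Apply the same x limits to every subplot
lim = (0, 100)
for ax in np.ravel(axes):
    ax.set_xlim(lim)

plt.subplots_adjust(wspace=0.5, hspace=1)
```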
