I need to draw a variable number of plots (at least one, but the maximum isn't known), and I couldn't come up with a way to dynamically create subplots and assign the graphs to them.
The code looks like this:
check = False
if "node_x_9" in names:
    if "node_x_11" in names:
        plt.plot(df["node_x_9"], df["node_x_11"])
        check = True
elif "node_x_10" in names:
    if "node_x_12" in names:
        # Pass the two columns separately as x and y
        plt.plot(df["node_x_10"], df["node_x_12"])
        check = True
if check:
    plt.show()
I thought about presetting a number of subplots (e.g. plt.subplots(3, 3)), but I still could not come up with a way to assign the plots without binding each one to a fixed subplot position.
My idea would be to create a 1x1 grid if I have one plot, 2x1 if I have two, 3x1 if I have three and so on, leaving no subplot space empty.
I've come across cases like this: you want to generate one plot per case, but you don't know how many cases exist until you query the data on the day.
I assumed a square layout (adjust the code below if you need a different aspect ratio). Count how many cases you have and take the integer square root; adding one gives the side length of a square grid that is guaranteed to have enough cells.
Now you can create a matplotlib GridSpec object with that width and height and index into it to place each individual plot.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import gridspec
import random

# Create some random data: a random number of rows (between 5 and 100), 25 values each
size = random.randint(5, 100)
data_rows = pd.DataFrame([np.random.normal(1, 5, 25) for s in range(size)])

# Find the side length of a (near) square grid based on the number of rows
side_length = int(len(data_rows) ** (1 / 2)) + 1
print(side_length)

# Create a GridSpec object with side_length cells in both x and y dimensions
gs = gridspec.GridSpec(side_length, side_length)
fig = plt.figure(figsize=(10, 10))

# Using the index i, populate the GridSpec object with one plot per cell
for i, row in data_rows.iterrows():
    ax = fig.add_subplot(gs[i])
    ax.bar(x=range(0, 25), height=row)
plt.show()
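For the n x 1 layout the question asks for, one option is to first collect the column pairs that actually exist and only then create the figure, so the number of rows always matches the number of plots. The following is a minimal sketch under that assumption; the helper name plot_available_pairs and the sample DataFrame are hypothetical, not part of the original code.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_available_pairs(df, candidate_pairs):
    """Draw one stacked subplot per (x, y) pair whose columns both exist in df."""
    pairs = [(x, y) for x, y in candidate_pairs
             if x in df.columns and y in df.columns]
    if not pairs:
        return None, []
    # squeeze=False keeps `axes` two-dimensional even when there is one plot
    fig, axes = plt.subplots(len(pairs), 1, squeeze=False,
                             figsize=(6, 3 * len(pairs)))
    for ax, (x, y) in zip(axes[:, 0], pairs):
        ax.plot(df[x], df[y])
        ax.set_xlabel(x)
        ax.set_ylabel(y)
    fig.tight_layout()
    return fig, list(axes[:, 0])


# Hypothetical data: "node_x_12" is missing, so only the first pair is drawn
df = pd.DataFrame({
    "node_x_9": np.arange(5),
    "node_x_11": np.arange(5) ** 2,
    "node_x_10": np.arange(5),
})
fig, axes = plot_available_pairs(
    df, [("node_x_9", "node_x_11"), ("node_x_10", "node_x_12")]
)
```

Call plt.show() afterwards to display the figure. Because the grid is sized after filtering, no subplot cell is left empty.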
Related
1 - My goal is to create a bar plot of grades (y axis) against student IDs (x axis).
2 - Add an extra column with the mean() of the grades in a different color.
What's the best way of doing it?
I managed the first part, but I couldn't work out how to change the color of the extra column (the mean).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
a = pd.read_excel('x.xlsx')
Felipe_stu = a['Teacher'] == 'Felipe'
Felipe_stu.plot(kind = 'bar', figsize = (20,5), color = 'gold')
Example of data (the first 10 rows): [image: data example]
Example of plot: [image]
I've already tried to create a list with all the colors of the respective items on the plot.
Such as:
my_color = []
for c in range(0, len(Jorge_stu)):
    my_color.append('gold')
my_color.append('blue')
So, I would make the last column (the mean) in the color that I chose (blue in this case). This didn't work.
Any ideas how I can put the mean column on my plot?
Is it better to add the extra column to the plot only, or to add it to the dataframe itself and plot it afterwards?
You may need to do something like this:
How to create a matplotlib bar chart with a threshold line?
The threshold value in the above example will be your mean line, and it can be calculated simply with df[score_column_name].mean()
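As a rough illustration of both ideas (an extra bar for the mean in its own color, plus a horizontal line at the mean), here is a minimal sketch. The grades and student labels are made up for the example, and the color list follows the gold/blue scheme from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical grades indexed by student ID
grades = pd.Series([7.5, 6.0, 9.0, 8.0], index=["s1", "s2", "s3", "s4"])
mean_grade = grades.mean()

# Append the mean as one extra entry so it gets its own bar
with_mean = pd.concat([grades, pd.Series({"mean": mean_grade})])

# One color per bar: gold for students, blue for the mean
colors = ["gold"] * len(grades) + ["blue"]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(with_mean.index, with_mean.values, color=colors)
# Optional threshold-style line at the mean value
ax.axhline(mean_grade, color="blue", linestyle="--", linewidth=1)
ax.set_xlabel("Student ID")
ax.set_ylabel("Grade")
```

Passing a list of colors to bar gives each bar its own color, so the color list needs exactly one entry per bar. Call plt.show() to display the result.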
I am trying to plot each variable against SalePrice. I tried pd.scatter_matrix, but I get a lot of unnecessary plots for the various combinations. What I want is SalePrice on the Y axis and one scatter plot for each variable in the data set. Here is the code I tried.
data_prep_num['Sales_test_data']=data_sales_price_old
att=['Sales_test_data','YearBuilt','LotArea','MSSubClass','BsmtFinSF1','TotalBsmtSF','1stFlrSF','2ndFlrSF','GrLivArea','GarageArea']
pd.scatter_matrix(data_prep_num[att],alpha=.4,figsize=(30,30))
If you want to use pd.plotting.scatter_matrix but only want one of the rows (i.e. the Sales_test_data column), you can iterate over the plotting axes, and hide the combinations you don't want.
Assuming the SalePrice is the very first column (index 0):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

axes = pd.plotting.scatter_matrix(data_prep_num[att], alpha=0.4, figsize=(30, 30))
# Hide every row except the first one (Sales_test_data on the y axis)
for i in range(np.shape(axes)[0]):
    if i != 0:
        for j in range(np.shape(axes)[1]):
            axes[i, j].set_visible(False)
plt.show()
Note: This is obviously not super efficient when you start having lots of columns though.
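If drawing the full matrix and hiding most of it becomes too slow, an alternative (not what the answer above does) is to draw only the wanted scatter plots on a plain subplot grid. This is a sketch under that assumption; the function name scatter_against_target and the random demo data are hypothetical.

```python
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def scatter_against_target(df, target, features, ncols=3):
    """Draw one scatter plot per feature, with `target` on the y axis."""
    nrows = math.ceil(len(features) / ncols)
    fig, axes = plt.subplots(nrows, ncols, squeeze=False,
                             figsize=(4 * ncols, 3 * nrows))
    flat = axes.ravel()
    for ax, col in zip(flat, features):
        ax.scatter(df[col], df[target], alpha=0.4)
        ax.set_xlabel(col)
        ax.set_ylabel(target)
    # Hide grid cells that have no feature assigned
    for ax in flat[len(features):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


# Hypothetical demo data standing in for the real housing table
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.normal(size=(50, 5)),
    columns=["SalePrice", "LotArea", "GrLivArea", "GarageArea", "YearBuilt"],
)
fig, axes = scatter_against_target(
    demo, "SalePrice", ["LotArea", "GrLivArea", "GarageArea", "YearBuilt"]
)
```

Only len(features) scatter plots are drawn, so the cost grows linearly with the number of columns rather than quadratically.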
I have a pyplot figure with a few lines on it. I would like to be able to draw an extra line, which would be a sum of all others' values. The lines are not plotted against the same x values (they are visually shorter in the plot - see the image). The resulting line would be somewhat above all others.
One idea I have for it requires obtaining a line's y value in a specific x point. Is there such a function? Or does pyplot/matplotlib support summing lines' values?
Superposition is the short answer to your question: read this for more.
Example:
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(100)        # x values shared by both lines
y1 = np.random.rand(100)  # some random numbers
y2 = np.random.rand(100)

# This plots only the y1 values
plt.plot(x, y1)
plt.show()

# This plots the sum of the two lines
plt.plot(x, y1 + y2)
plt.show()
I took a second look at your question and saw that your y values have different lengths, so they cannot simply be added as in the example above. What you can do is create four lists of equal size, filling in zero wherever a line has no value, and then apply superposition (add them all together and plot the result).
For the future generations: numpy.interp() was my solution to this problem.
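A minimal sketch of how numpy.interp() can be used for this, assuming each line should count as zero outside its own x range (the sample arrays are made up for illustration):

```python
import numpy as np

# Two hypothetical lines sampled at different x positions
x1 = np.array([0.0, 1.0, 2.0, 3.0])
y1 = np.array([0.0, 1.0, 2.0, 3.0])
x2 = np.array([1.0, 2.0, 4.0])
y2 = np.array([10.0, 10.0, 10.0])

# Common grid containing every x value that appears in either line
common_x = np.union1d(x1, x2)

# Linearly interpolate each line onto the common grid;
# left/right=0.0 makes a line contribute nothing outside its own range
y1_common = np.interp(common_x, x1, y1, left=0.0, right=0.0)
y2_common = np.interp(common_x, x2, y2, left=0.0, right=0.0)

total = y1_common + y2_common
print(common_x)  # [0. 1. 2. 3. 4.]
print(total)     # [ 0. 11. 12. 13. 10.]
```

The summed line can then be drawn with plt.plot(common_x, total) alongside the original lines. The same call also answers the other part of the question: np.interp(x0, x, y) returns a line's y value at a specific x point.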
I'm starting out with plotting in Python using the very nice pyplot. I want to show the evolution of two data series over time. Instead of a conventional plot of the data as a function of time, I'd like a scatter plot (data1, data2) where the time component is shown as a color gradient.
In my two-column file, time is given by the line number. It could either be written as a third column in the file or obtained from pyplot directly, if it has a built-in way to get the line number.
Can anyone help me do that?
Thanks a lot.
Nicolas
When plotting with matplotlib.pyplot.scatter you can pass a third array via the keyword argument c. This array determines the color of each scatter point. You then pick an appropriate colormap from matplotlib.cm and pass it with the cmap keyword argument.
This toy example creates two datasets, data1 and data2. It also creates an array colors of continuous values equally spaced between 0 and 1, with the same length as data1 and data2. It doesn't need to know the "line number"; it only needs the total number of data points, over which it spaces the colors equally.
I've also added a colorbar. You can remove this by removing the plt.colorbar() line.
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
N = 500
data1 = np.random.randn(N)
data2 = np.random.randn(N)
colors = np.linspace(0,1,N)
plt.scatter(data1, data2, c=colors, cmap=cm.Blues)
plt.colorbar()
plt.show()
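If the data comes from a two-column text file, as in the question, the line number can be rebuilt with np.arange and passed as c directly. This is a sketch under that assumption; the in-memory "file" below is a made-up stand-in for the real one.

```python
import io

import numpy as np
import matplotlib.pyplot as plt

# Stand-in for a two-column text file (replace with the real file path)
fake_file = io.StringIO("0.0 1.0\n0.5 0.8\n1.0 0.3\n1.5 0.1\n")
data = np.loadtxt(fake_file)
data1, data2 = data[:, 0], data[:, 1]

# The row index plays the role of the line number, i.e. the time
line_numbers = np.arange(len(data))

fig, ax = plt.subplots()
points = ax.scatter(data1, data2, c=line_numbers, cmap="Blues")
fig.colorbar(points, ax=ax, label="line number")
```

With this variant the colorbar is labeled in line numbers rather than in the 0 to 1 range. Call plt.show() to display it.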
Is there a way to make a plot legend run horizontally (left to right) instead of vertically, without specifying the number of columns (ncol=...)?
I'm plotting a varying number of lines (roughly 5-15), and I'd rather not try to calculate the optimal number of columns dynamically (i.e. the number of columns that will fit across the figure without running off, when the labels are varying). Also, when there is more than a single row, the ordering of entries goes top-down, then left-right; if it could default to horizontal, this would also be alleviated.
Related: Matplotlib legend, add items across columns instead of down
It seems that, at this time, matplotlib defaults to a vertical layout. Though not ideal, one workaround is to use half the number of lines as the number of columns:
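import math
import numpy as np
import matplotlib.pyplot as plt

npoints = 1000
xs = [np.random.randn(npoints) for i in range(10)]
ys = [np.random.randn(npoints) for i in range(10)]
fig, ax = plt.subplots()
for i, (x, y) in enumerate(zip(xs, ys)):
    # Each series needs a label to appear in the legend
    ax.scatter(x, y, label="series %d" % i)
nlines = len(xs)
ncol = int(math.ceil(nlines/2.))
plt.legend(ncol=ncol)
plt.show()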
So here you take the number of lines you're plotting (via nlines = len(xs)), turn that into a number of columns via ncol = int(math.ceil(nlines/2.)), and pass it to plt.legend(ncol=ncol).
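For the ordering issue mentioned in the question (entries run top-down, then left-right), one workaround is to reorder the entries before handing them to the legend, so that the column-by-column fill reads left to right. The helper below is a sketch with a hypothetical name, row_major_order; it relies on the legend filling columns first, as described in the question.

```python
import itertools


def row_major_order(items, ncol):
    """Reorder items so a column-by-column legend reads left to right."""
    items = list(items)
    # Take every ncol-th item starting at 0, then at 1, and so on
    return list(itertools.chain.from_iterable(
        items[i::ncol] for i in range(ncol)
    ))


labels = ["a", "b", "c", "d", "e", "f"]
reordered = row_major_order(labels, 3)
print(reordered)  # ['a', 'd', 'b', 'e', 'c', 'f']
```

With three columns, the legend then places a and d in the first column, b and e in the second, and c and f in the third, so the rows read a b c and d e f. In a plot, apply the same reordering to both lists returned by ax.get_legend_handles_labels() and pass them to ax.legend(handles, labels, ncol=ncol).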