I have a three-dimensional array.
The first dimension has 4 elements.
The second dimension has 10 elements.
The third dimension has 5 elements.
I want to plot the contents of this array as follows.
Each element of the first dimension gets its own graph (four graphs on the page)
The values of the second dimension correspond to the y values of the graphs. (there are 10 lines on each graph)
The values of the third dimension correspond to the x values of the graphs (each of the 10 lines has 5 x values)
I'm pretty new to python, and even newer to graphing.
I figured out how to correctly load my array with the data...and I'm not even trying to get the 'four graphs on one page' aspect working.
For now I just want one graph to work correctly.
Here's what I have so far, once my array is set up and correctly loaded. Right now the graph shows up, but it's blank, and the x-axis includes negative values. None of my data is negative.
for n in range(1):
    for m in range(10):
        for o in range(5):
            plt.plot(quadnumcounts[n][m][o])
plt.xlabel("Trials")
plt.ylabel("Frequency")
plt.show()
Any help would be really appreciated!
Edit. Further clarification. Let's say my array is loaded as follows:
myarray[0][1][0] = 22
myarray[0][1][1] = 10
myarray[0][1][2] = 15
myarray[0][1][3] = 25
myarray[0][1][4] = 13
I want there to be a line, with the y values 22, 10, 15, 25, 13, and the x values 1, 2, 3, 4, 5 (since it's 0 indexed, I can just +1 before printing the label)
Then, let's say I have
myarray[0][2][0] = 10
myarray[0][2][1] = 17
myarray[0][2][2] = 9
myarray[0][2][3] = 12
myarray[0][2][4] = 3
I want that to be another line, following the same rules as the first.
Here's how to make the 4 plots with 10 lines in each.
import matplotlib.pyplot as plt

for i, fig_data in enumerate(quadnumcounts):
    # Set current figure to the i'th subplot in the 2x2 grid
    plt.subplot(2, 2, i + 1)
    # Set axis labels for current figure
    plt.xlabel('Trials')
    plt.ylabel('Frequency')
    for line_data in fig_data:
        # Plot a single line (x values run 1..len(line_data))
        xs = [j + 1 for j in range(len(line_data))]
        ys = line_data
        plt.plot(xs, ys)
# Now that we have created all plots, show the result
plt.show()
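For comparison, here is a minimal self-contained sketch of the same idea using the object-oriented interface. The random array is only a placeholder with the 4 x 10 x 5 shape described in the question, and the output file name is an assumption; swap in your own data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data with the shape from the question:
# 4 graphs x 10 lines per graph x 5 points per line
rng = np.random.default_rng(0)
quadnumcounts = rng.integers(0, 30, size=(4, 10, 5))

fig, axes = plt.subplots(2, 2, sharex=True, figsize=(8, 6))
trials = np.arange(1, quadnumcounts.shape[2] + 1)  # x values 1..5

for ax, graph_data in zip(axes.flat, quadnumcounts):
    # graph_data has shape (10, 5); plotting its transpose (5, 10)
    # draws one line per column, i.e. one line per original row
    ax.plot(trials, graph_data.T)
    ax.set_xlabel("Trials")
    ax.set_ylabel("Frequency")

fig.tight_layout()
fig.savefig("quadnumcounts.png")  # hypothetical file name
```

Passing a 2D array to ax.plot avoids the inner loop over lines; with the Agg line removed, plt.show() can replace the savefig call.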
Here is an example of creating subplots of your data. You have not provided the dataset, so I used x as an angle from 0 to 360 degrees and y as trigonometric functions of x (sine and cosine).
Code example:
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 361)  # 0 to 360 degrees
y = []
y.append(1*np.sin(x*np.pi/180.0))
y.append(2*np.sin(x*np.pi/180.0))
y.append(1*np.cos(x*np.pi/180.0))
y.append(2*np.cos(x*np.pi/180.0))
z = [[x, y[0]], [x, y[1]], [x, y[2]], [x, y[3]]]  # 3-dimensional array

# plot graphs
for count, (x_data, y_data) in enumerate(z):
    plt.subplot(2, 2, count + 1)
    plt.plot(x_data, y_data)
    plt.xlabel('Angle')
    plt.ylabel('Amplitude')
    plt.grid(True)
plt.show()
UPDATE:
Using the sample data you provided in your update, you could proceed as follows:
import numpy as np
import matplotlib.pyplot as plt

y1 = (10, 17, 9, 12, 3)
y2 = (22, 10, 15, 25, 13)
y3 = tuple(reversed(y1))  # generated for explanation
y4 = tuple(reversed(y2))  # generated for explanation
mydata = [y1, y2, y3, y4]

# plot graphs
for count, y_data in enumerate(mydata):
    x_data = range(1, len(y_data) + 1)
    print(list(x_data))
    print(y_data)
    plt.subplot(2, 2, count + 1)
    plt.plot(x_data, y_data, '-*')
    plt.xlabel('Trials')
    plt.ylabel('Frequency')
    plt.grid(True)
plt.show()
Note that the dimensions are slightly different from yours. Here they are such that mydata[0][0] == 10, mydata[1][3] == 25, etc. The output is shown below:
I have some data as numpy arrays x, y, v as shown in the code below.
This is actually dummy data for velocity (v) of dust particles in a x-y plane.
I have binned my data into 4 bins and for each bin I have calculated mean of entries in each bin and made a heat map.
Now what I want to do is make a histogram/distribution of v in each bin with 0 as the centre of the histogram.
I do not want to plot the mean anymore, just want to divide my data into the same bins as this code and for each bin I want to generate a histogram of the values in the bins.
How should I do it?
I think this is a way to model the spectrum of an emission line from the gas particles. Any help is appreciated! Thanks!
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
x = np.array([-10,-2,4,12,3,6,8,14,3])
y = np.array([5,5,-6,8,-20,10,2,2,8])
v = np.array([4,-6,-10,40,22,-14,20,8,-10])
x_bins = np.linspace(-20, 20, 3)
y_bins = np.linspace(-20, 20, 3)
H, xedges, yedges = np.histogram2d(x, y, bins = [x_bins, y_bins], weights = v)
pstat = stats.binned_statistic_2d(x, y, v, statistic='mean', bins = [x_bins, y_bins])
plt.xlabel("x")
plt.ylabel("y")
plt.imshow(pstat.statistic.T, origin='lower', cmap='RdBu',
           extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.colorbar().set_label('mean', rotation=270)
EDIT: Please note that my original data is huge. My arrays for x, y, v are very large and I am using a 30x30 grid, that is, not just 4 quadrants but 900 bins. I might also need to increase the number of bins. So, we want to find a way to automatically divide the 'v' data into the regularly spaced bins and then plot a histogram of the 'v' data in each bin.
I would iterate over the zipped x and y, check which quadrant each point falls in, and append the corresponding v to that quadrant's list. After that, you can plot whatever you'd like:
x = np.array([-10,-2,4,12,3,6,8,14,3])
y = np.array([5,5,-6,8,-20,10,2,2,8])
v = np.array([4,-6,-10,40,22,-14,20,8,-10])

q1 = []
q2 = []
q3 = []
q4 = []

for i, (x1,y1) in enumerate(zip(x,y)):
    if x1<0 and y1>=0:
        q1.append(v[i])
    elif x1>=0 and y1>=0:
        q2.append(v[i])
    elif x1>=0 and y1<0:
        q3.append(v[i])
    elif x1<0 and y1<0:
        q4.append(v[i])

print(q1)
print(q2)
print(q3)
print(q4)
#[4, -6]
#[40, -14, 20, 8, -10]
#[-10, 22]
#[]

plt.hist(q1, density=True)
plt.hist(q2, density=True)
plt.hist(q3, density=True)
#q4 is empty
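The hand-written quadrant tests above do not scale to the 30x30 grid mentioned in the question's edit. One possible generalization, sketched here with np.digitize, groups v by bin for any set of edges. The helper name group_by_bin is hypothetical, and note one assumption: unlike np.histogram2d, this sketch drops points that lie exactly on the outermost upper edge.

```python
import numpy as np

x = np.array([-10, -2, 4, 12, 3, 6, 8, 14, 3])
y = np.array([5, 5, -6, 8, -20, 10, 2, 2, 8])
v = np.array([4, -6, -10, 40, 22, -14, 20, 8, -10])

x_bins = np.linspace(-20, 20, 3)  # edges -20, 0, 20; use 31 edges for 30 bins
y_bins = np.linspace(-20, 20, 3)


def group_by_bin(x, y, v, x_edges, y_edges):
    """Return {(ix, iy): array of v values that fall in that bin}."""
    nx, ny = len(x_edges) - 1, len(y_edges) - 1
    # digitize returns 1-based bin numbers; shift to 0-based indices
    ix = np.digitize(x, x_edges) - 1
    iy = np.digitize(y, y_edges) - 1
    # keep only points that land inside the grid
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    flat = ix[inside] * ny + iy[inside]  # one integer label per bin
    v_in = v[inside]
    groups = {}
    for label in np.unique(flat):
        groups[(int(label) // ny, int(label) % ny)] = v_in[flat == label]
    return groups


groups = group_by_bin(x, y, v, x_bins, y_bins)
for key in sorted(groups):
    print(key, groups[key].tolist())
```

Each entry of groups can then be passed to plt.hist, for example plt.hist(groups[(1, 1)], density=True). Only occupied bins appear in the dictionary, so empty bins (like q4 above) are skipped automatically.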
I have the following dataframe, which contains the best equipment in operation ranked from 1 to 300 (1 is the best, 300 is the worst) over a few days (the df columns):
Equipment    21-03-27  21-03-28  21-03-29  21-03-30  21-03-31  21-04-01  21-04-02
P01-INV-1-1         1         1         1         1         1         2         2
P01-INV-1-2         2         2         4         4         5         1         1
P01-INV-1-3         4         4         3         5         6        10        10
I would like to customize a line plot (example found here), but I'm having trouble modifying the example code provided:
import matplotlib.pyplot as plt
import numpy as np

def energy_rank(data, marker_width=0.1, color='blue'):
    y_data = np.repeat(data, 2)
    x_data = np.empty_like(y_data)
    x_data[0::2] = np.arange(1, len(data)+1) - (marker_width/2)
    x_data[1::2] = np.arange(1, len(data)+1) + (marker_width/2)
    lines = []
    lines.append(plt.Line2D(x_data, y_data, lw=1, linestyle='dashed', color=color))
    for x in range(0,len(data)*2, 2):
        lines.append(plt.Line2D(x_data[x:x+2], y_data[x:x+2], lw=2, linestyle='solid', color=color))
    return lines

data = ranks.head(4).to_numpy() #ranks is the above dataframe

artists = []
for row, color in zip(data, ('red','blue','green','magenta')):
    artists.extend(energy_rank(row, color=color))

fig, ax = plt.subplots()
ax.set_xticklabels(ranks.columns) # set X axis to be dataframe columns
ax.set_xticklabels(ax.get_xticklabels(), rotation=35, fontsize = 10)

for artist in artists:
    ax.add_artist(artist)
ax.set_ybound([15,0])
ax.set_xbound([.5,8.5])
When using ax.set_xticklabels(ranks.columns), for some reason it only shows 5 of the 7 days from the ranks columns, dropping the first and last values. I tried duplicating those values, but that did not work either. I end up with the plot below:
In summary, I would like to know if it's possible to do 3 customizations:
1. Show all dates from the ranks columns on the X axis.
2. Invert the Y axis. ax.set_ybound([15,0]) is not working. It would make more sense to see the graph starting with 0 on top, since 1 is the most important rank to look at.
3. Add labels to the end of each line at the last day (the last value on the X axis). I could add the little legend box, but it often gets really messy when you plot more data, so adding just the text at the end of each line would look much cleaner.
Please let me know if those customizations are impossible to do. Any help is really appreciated! Thank you in advance!
To show all the dates, use plt.xticks() and set_xbound to start at 0. To reverse the y axis, use ax.set_ylim(ax.get_ylim()[::-1]). To set the legends the way you described, you can use annotation and set the coordinates of the annotation at your last datapoint for each series.
fig, ax = plt.subplots()
plt.xticks(np.arange(len(ranks.columns)), list(ranks.columns), rotation = 35, fontsize = 10)
plt.xlabel('Date')
plt.ylabel('Rank')
for artist in artists:
    ax.add_artist(artist)
ax.set_ybound([0,15])
ax.set_ylim(ax.get_ylim()[::-1])
ax.set_xbound([0,8.5])
ax.annotate('Series 1', xy =(7.1, 2), color = 'red')
ax.annotate('Series 2', xy =(7.1, 1), color = 'blue')
ax.annotate('Series 3', xy =(7.1, 10), color = 'green')
plt.show()
Here is the plot for the three rows of data in your sample dataframe:
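The hard-coded annotation coordinates above have to be edited whenever the data changes. As a rough sketch of an alternative, the label position can be read from each series' last value. To stay self-contained this example uses a plain dict (copied from the sample dataframe) and ordinary dashed lines instead of the energy_rank markers, so treat it as an illustration of the three customizations rather than a drop-in replacement.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt

# Values copied from the sample dataframe in the question
dates = ["21-03-27", "21-03-28", "21-03-29", "21-03-30",
         "21-03-31", "21-04-01", "21-04-02"]
ranks = {
    "P01-INV-1-1": [1, 1, 1, 1, 1, 2, 2],
    "P01-INV-1-2": [2, 2, 4, 4, 5, 1, 1],
    "P01-INV-1-3": [4, 4, 3, 5, 6, 10, 10],
}

fig, ax = plt.subplots()
positions = list(range(len(dates)))

for (name, values), color in zip(ranks.items(), ("red", "blue", "green")):
    ax.plot(positions, values, color=color, linestyle="dashed", lw=1)
    # Put the label just right of the last data point of this series
    ax.annotate(name, xy=(positions[-1], values[-1]),
                xytext=(5, 0), textcoords="offset points",
                va="center", color=color)

# One tick per date, so no column is dropped
ax.set_xticks(positions)
ax.set_xticklabels(dates, rotation=35, fontsize=10)
ax.invert_yaxis()  # rank 1 at the top
ax.set_xlabel("Date")
ax.set_ylabel("Rank")
```

Setting the tick positions explicitly before the labels is what keeps all seven dates visible; ax.invert_yaxis() has the same effect as reversing the limits by hand.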
Easy question; I have 2 sets of data, approx 500 entries:
iStart, iStop, iStep = 0.2, 100, 0.2
x = list(np.arange(iStart, iStop+iStep, iStep))
y = np.random.uniform(20,25,(500,1))
plt.plot(x, y[::-1])
I want the x-axis to show the vector x in descending order from left to right. When I use [::-1] there, the y values change as well.
There is a Matplotlib example for doing just this.
Edit: OP asked about adjusting x-ticks... I have made a deliberately awful example to show how this works. Passing a list to plt.xticks, as in plt.xticks([100, 20, 5]), tells it which ticks to show. Using np.arange(start, stop, step), for example, would give you an evenly spaced array.
Solution
import numpy as np
import matplotlib.pyplot as plt

iStart, iStop, iStep = 0.2, 100, 0.2
x = list(np.arange(iStart, iStop+iStep, iStep))
y = np.random.uniform(20,25,(500,1))

fig, ax = plt.subplots(nrows = 1, ncols = 1)
ax.plot(x, y[::-1])
# Passing (max, min) as the limits makes the x-axis run in descending order
ax.set_xlim(np.max(x),np.min(x))
plt.xticks([100, 20, 5])
plt.show()
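As a small variation on the solution above, matplotlib's Axes also has an invert_xaxis() method, which flips the axis without computing the limits by hand. The sketch below assumes you want the data itself left untouched, so y is plotted without [::-1]; np.linspace is used here only to guarantee exactly 500 x values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.2, 100, 500)       # 0.2, 0.4, ..., 100.0
y = np.random.uniform(20, 25, 500)   # placeholder data

fig, ax = plt.subplots()
ax.plot(x, y)        # each y value stays paired with its own x value
ax.invert_xaxis()    # x now decreases from left to right

left, right = ax.get_xlim()
print(left > right)
```

Only the direction of the axis changes; the (x, y) pairs are the same as in an uninverted plot.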
I am doing a Kernel Density Estimation in Python and getting the contours and paths as shown below. (here is my sample data: https://pastebin.com/193PUhQf).
from numpy import *
from math import *
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
x_2d = []
y_2d = []
data = {}
data['nodes'] = []
# here is the sample data:
# https://pastebin.com/193PUhQf
X = [.....]
for Picker in xrange(0, len(X)):
    x_2d.append(X[Picker][0])
    y_2d.append(X[Picker][1])
# convert to arrays
m1 = np.array([x_2d])
m2 = np.array([y_2d])
x_min = m1.min() - 30
x_max = m1.max() + 30
y_min = m2.min() - 30
y_max = m2.max() + 30
x, y = np.mgrid[x_min:x_max:200j, y_min:y_max:200j]
positions = np.vstack([x.ravel(), y.ravel()])
values = np.vstack([m1, m2])
kde = stats.gaussian_kde(values)
z = np.reshape(kde(positions).T, x.shape)
fig = plt.figure(2, dpi=200)
ax = fig.add_subplot(111)
pc = ax.pcolor(x, y, z)
cb = plt.colorbar(pc)
cb.ax.set_ylabel('Probability density')
c_s = plt.contour(x, y, z, 20, linewidths=1, colors='k')
ax.plot(m1, m2, 'o', mfc='w', mec='k')
ax.set_title("My Title", fontsize='medium')
plt.savefig("kde.png", dpi=200)
plt.show()
There is a similar way to get the contours using R, which is described here:
http://bl.ocks.org/diegovalle/5166482
Question: how can I achieve the same output using my Python script, or using it as a starting point?
The desired output should be like contours_tj.json, which can be used by the leaflet.js library.
UPDATE:
My input data structure is composed of three columns, comma separated:
first one is the X value
second one is the Y value
third one is the ID of my data, it has no numerical value, it is simply an identifier of the data point.
Update 2:
Question, if simply put, is that I want the same output as in the above link using my input file which is in numpy array format.
update 3:
my input data structure is of list type:
print type(X)
<type 'list'>
and here are the first few lines:
print X[0:5]
[[10.800584, 11.446064, 4478597], [10.576840,11.020229, 4644503], [11.434276,10.790881, 5570870], [11.156718,11.034633, 6500333], [11.054956,11.100243, 6513301]]
geojsoncontour is a Python library that converts matplotlib contours to GeoJSON.
geojsoncontour.contour_to_geojson requires a contour_levels argument. pyplot.contour chooses the levels automatically, but you can access them with c_s._levels.
So, for your example, you could do:
import geojsoncontour

# your code here

c_s = plt.contour(x, y, z, 20, linewidths=1, colors='k')

# Convert matplotlib contour to geojson
geojsoncontour.contour_to_geojson(
    contour=c_s,
    geojson_filepath='out.geojson',
    contour_levels=c_s._levels,
    ndigits=3,
    unit='m'
)
I am new to Python and I am trying to save a huge amount of data as figures in a PDF, using matplotlib's PdfPages and subplots. The problem is that I hit a bottleneck I don't know how to solve. The code goes something like this:
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

with PdfPages('myfigures.pdf') as pdf:
    for i in range(1000):
        f,axarr = plt.subplots(2, 3)
        plt.subplots(2, 3)
        axarr[0, 0].plot(x1, y1)
        axarr[1, 0].plot(x2, y2)

        pdf.savefig(f)
        plt.close('all')
Creating a figure on every loop iteration is very time consuming, but if I move that outside the loop, the plots are not cleared between iterations. Other options I tried, like clear() or clf(), didn't work either or ended up creating multiple different figures. Does anyone have an idea how to restructure this so that it runs faster?
Multipage PDF appending w/ matplotlib
Create an 𝑚-rows × 𝑛-cols array of subplot axes per PDF page, and save (append) each page once its array of subplots is completely full. Then create a new page and repeat.
To fit a large number of subplots into a single multipage PDF, start filling the first page with your plots right away. After each new subplot, check whether it has used up the last free slot in the current page's 𝑚 × 𝑛 subplot layout; if it has, save that page and create a new one.
Here's a way to do it where the dimensions (𝑚 × 𝑛) controlling the number of subplots per page can easily be changed:
import sys

import matplotlib
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
import numpy as np

matplotlib.rcParams.update({"font.size": 6})

# Dimensions for any m-rows × n-cols array of subplots / pg.
m, n = 4, 5

# Don't forget to indent after the with statement
with PdfPages("auto_subplotting.pdf") as pdf:

    """Before beginning the iteration through all the data,
    initialize the layout for the plots and create a
    representation of the subplots that can be easily
    iterated over for knowing when to create the next page
    (and also for custom settings like partial axes labels)"""
    f, axarr = plt.subplots(m, n, sharex="col", sharey="row")
    arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
    subplots = [axarr[index] for index in arr_ij]

    # To conserve needed plotting real estate,
    # only label the bottom row and leftmost subplots
    # as determined automatically using m and n
    splot_index = 0
    for s, splot in enumerate(subplots):
        splot.set_ylim(0, 0.15)
        splot.set_xlim(0, 50)
        last_row = m * n - s < n + 1
        first_in_row = s % n == 0
        if last_row:
            splot.set_xlabel("X-axis label")
        if first_in_row:
            splot.set_ylabel("Y-axis label")

    # Iterate through each sample in the data
    for sample in range(33):

        # As a stand-in for real data, let's just make numpy take 100 random draws
        # from a poisson distribution centered around say ~25 and then display
        # the outcome as a histogram
        scaled_y = np.random.randint(20, 30)
        random_data = np.random.poisson(scaled_y, 100)
        subplots[splot_index].hist(
            random_data,
            bins=12,
            density=True,  # "normed" was removed in newer matplotlib releases
            fc=(0, 0, 0, 0),
            lw=0.75,
            ec="b",
        )

        # Keep collecting subplots (into the mpl-created array;
        # see: [1]) through the samples in the data and increment
        # a counter each time. The page will be full once the count is equal
        # to the product of the user-set dimensions (i.e. m * n)
        splot_index += 1

        """Once an mxn number of subplots have been collected
        you now have a full page's worth, and it's time to
        close and save to pdf that page and re-initialize for a
        new page possibly. We can basically repeat the same
        exact code block used for the first layout
        initialization, but with the addition of 3 new lines:
        +2 for creating & saving the just-finished pdf page,
        +1 more to reset the subplot index (back to zero)"""
        if splot_index == m * n:
            pdf.savefig()
            plt.close(f)
            f, axarr = plt.subplots(m, n, sharex="col", sharey="row")
            arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
            subplots = [axarr[index] for index in arr_ij]
            splot_index = 0
            for s, splot in enumerate(subplots):
                splot.set_ylim(0, 0.15)
                splot.set_xlim(0, 50)
                last_row = (m * n) - s < n + 1
                first_in_row = s % n == 0
                if last_row:
                    splot.set_xlabel("X-axis label")
                if first_in_row:
                    splot.set_ylabel("Y-axis label")

    # Done!
    # But don't forget to save to pdf after the last page
    pdf.savefig()
    plt.close(f)
For any m×n layout, just change the declarations for the values of m and n, respectively. From the code above (where "m, n = 4, 5"), a 4x5 matrix of subplots with a total 33 samples is produced as a two-page pdf output file:
References
Link to matplotlib subplots official docs.
Note:
On the final page of the multipage PDF, some subplots may be left blank. Their number is the page capacity (𝑚 × 𝑛) minus the remainder of your total number of samples divided by that capacity. For example, with m=3 and n=4 you get 3 rows of 4 subplots each, i.e. 12 per page; with 20 samples, a two-page PDF is created with 24 subplots in total, and the last 4 subplots on the second page (the full bottom row in this hypothetical example) are empty.
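The arithmetic in this note can be checked with a few lines of Python. The helper name page_layout is made up for this illustration; it assumes a last page is only written when it holds at least one plot.

```python
import math


def page_layout(num_samples, m, n):
    """Return (number of pages, blank subplots on the last page)."""
    per_page = m * n
    pages = math.ceil(num_samples / per_page)
    blanks = pages * per_page - num_samples
    return pages, blanks


print(page_layout(20, 3, 4))  # (2, 4): the example from the note
print(page_layout(33, 4, 5))  # (2, 7): the 4x5 script with 33 samples
```

When the sample count is an exact multiple of m * n, the helper reports zero blanks and no extra page.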
Using seaborn
For a more advanced (& more "pythonic"*) extension of the implementation above, see below:
The multipage handling should probably be simplified by creating a new_page function; it's better to not repeat code verbatim*, especially if you start customizing the plots in which case you won't want to have to mirror every change and type the same thing twice. A more customized aesthetic based off of seaborn and utilizing the available matplotlib parameters like shown below might be preferable too.
Add a new_page function & some customizations for the subplot style:
import matplotlib.pyplot as plt
import numpy as np

import random
import seaborn as sns

from matplotlib.backends.backend_pdf import PdfPages

# this erases labels for any blank plots on the last page
sns.set(font_scale=0.0)
m, n = 4, 6
datasize = 37
# 37 % (m*n) = 13, (m*n) - 13 = 24 - 13 = 11. Thus 11 blank subplots on final page

# custom colors scheme / palette
ctheme = [
    "k", "gray", "magenta", "fuchsia", "#be03fd", "#1e488f",
    (0.44313725490196076, 0.44313725490196076, 0.88627450980392153), "#75bbfd",
    "teal", "lime", "g", (0.6666674, 0.6666663, 0.29078014184397138), "y",
    "#f1da7a", "tan", "orange", "maroon", "r", ]  # pick whatever colors you wish
colors = sns.blend_palette(ctheme, datasize)
fz = 7  # labels fontsize


def new_page(m, n):
    global splot_index
    splot_index = 0
    fig, axarr = plt.subplots(m, n, sharey="row")
    plt.subplots_adjust(hspace=0.5, wspace=0.15)
    arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
    subplots = [axarr[index] for index in arr_ij]
    for s, splot in enumerate(subplots):
        splot.grid(
            True,  # passed positionally; the old "b" keyword was removed
            which="major",
            color="gray",
            linestyle="-",
            alpha=0.25,
            zorder=1,
            lw=0.5,
        )
        splot.set_ylim(0, 0.15)
        splot.set_xlim(0, 50)
        last_row = m * n - s < n + 1
        first_in_row = s % n == 0
        if last_row:
            splot.set_xlabel("X-axis label", labelpad=8, fontsize=fz)
        if first_in_row:
            splot.set_ylabel("Y-axis label", labelpad=8, fontsize=fz)
    return (fig, subplots)


with PdfPages("auto_subplotting_colors.pdf") as pdf:

    fig, subplots = new_page(m, n)

    for sample in range(datasize):
        splot = subplots[splot_index]
        splot_index += 1
        scaled_y = np.random.randint(20, 30)
        random_data = np.random.poisson(scaled_y, 100)
        splot.hist(
            random_data,
            bins=12,
            density=True,  # "normed" was removed in newer matplotlib releases
            zorder=2,
            alpha=0.99,
            fc="white",
            lw=0.75,
            ec=colors.pop(),
        )
        splot.set_title("Sample {}".format(sample + 1), fontsize=fz)
        # tick fontsize & spacing
        splot.xaxis.set_tick_params(pad=4, labelsize=6)
        splot.yaxis.set_tick_params(pad=4, labelsize=6)

        # make new page:
        if splot_index == m * n:
            pdf.savefig()
            plt.close(fig)
            fig, subplots = new_page(m, n)

    # save the last, partially filled page (if it contains any plots)
    if splot_index > 0:
        pdf.savefig()
        plt.close(fig)
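If the blank subplots on the final page are unwanted, one option is to slice the samples into page-sized chunks and hide any axes that are left over. This is only a sketch under assumed names (hidden_blanks.pdf, the samples list and the two counters are made up for the example), with random Poisson draws standing in for real data as above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, sufficient for writing PDFs
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages

m, n = 2, 3
per_page = m * n
# Stand-in data: 8 samples, so the second page holds only 2 plots
samples = [np.random.poisson(25, 100) for _ in range(8)]

pages_written = 0
hidden_axes = 0

with PdfPages("hidden_blanks.pdf") as pdf:
    # Step through the samples one page-sized chunk at a time
    for start in range(0, len(samples), per_page):
        page = samples[start:start + per_page]
        fig, axarr = plt.subplots(m, n, squeeze=False)
        axes = list(axarr.flat)
        for ax, data in zip(axes, page):
            ax.hist(data, bins=12, density=True)
        # Hide the axes that received no data (only happens on the last page)
        for ax in axes[len(page):]:
            ax.set_visible(False)
            hidden_axes += 1
        pdf.savefig(fig)
        plt.close(fig)
        pages_written += 1

print(pages_written, hidden_axes)
```

Because the loop runs once per page rather than once per sample, there is no subplot counter to reset and no empty trailing page to guard against.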