Put a gap/break in a line plot - python

I have a data set with effectively "continuous" sensor readings, with the occasional gap.
However, there are several periods in which no data was recorded. These gaps are significantly longer than the sample period.
By default, pyplot connects each data point to the next (if I have a line style set); however, I feel that this is slightly misleading when it connects the two data points on either side of a long gap.
I would prefer to simply have no line there; that is, I would like the line to stop and start again after the gap.
I have tried adding an element with the y-value None in these gap sections, but it seems to send the line back to an earlier part of the plot (though, strangely, these lines don't appear at all zoom levels).
The other option I have thought of is to simply plot each piece with a separate call to plot, but this would be a bit ugly and cumbersome.
Is there a more elegant way of achieving this?
Edit: Below is a minimal working example demonstrating the behaviour. The first plot is the joining line I am trying to avoid. The second plot shows that adding a None value appears to work; however, if you pan the view of the plot, you get what is shown in the third figure: a line jumping to an earlier part of the plot.
import numpy as np
import matplotlib.pyplot as plt
t1 = np.arange(0, 8, 0.05)
t2 = np.arange(10, 14, 0.05)
t = np.concatenate([t1, t2])
c = np.cos(t)
fig = plt.figure()
ax = fig.gca()
ax.plot(t, c)
ax.set_title('Undesirable joining line')
t1 = np.arange(0, 8, 0.05)
t2 = np.arange(10, 14, 0.05)
c1 = np.cos(t1)
c2 = np.cos(t2)
t = np.concatenate([t1, t1[-1:], t2])
c = np.concatenate([c1, [None,], c2])
fig = plt.figure()
ax = fig.gca()
ax.plot(t, c)
ax.set_title('Ok if you don\'t pan the plot')
fig = plt.figure()
ax = fig.gca()
ax.plot(t, c)
ax.axis([-1, 12, -0.5, 1.25])
ax.set_title('Strange jumping line')
plt.show()

Masked arrays work well for this. You just need to mask the first of the points you don't want to connect:
import numpy as np
import numpy.ma as ma
import matplotlib.pyplot as plt
t1 = np.arange(0, 8, 0.05)
mask_start = len(t1)
t2 = np.arange(10, 14, 0.05)
t = np.concatenate([t1, t2])
c = np.cos(t) # an aside, but it's better to use numpy ufuncs than list comps
mc = ma.array(c)
mc[mask_start] = ma.masked
plt.figure()
plt.plot(t, mc)
plt.title('Using masked arrays')
plt.show()
At least on my system (OSX, Python 2.7, mpl 1.1.0), I don't have any issues with panning, etc.
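When there are many gaps, the index to mask does not have to be found by hand. A minimal sketch, assuming a threshold `max_gap` (a name introduced here; 0.1 is simply twice the 0.05 sample period of the example): mask every sample whose distance from the previous sample exceeds the threshold. As in the answer above, the first point after each gap is masked, so that point itself is not drawn.

```python
import numpy as np
import numpy.ma as ma
import matplotlib.pyplot as plt

def mask_after_gaps(t, c, max_gap):
    """Return c as a masked array, masking each sample that follows a gap > max_gap in t."""
    t = np.asarray(t)
    # True at index i when t[i] - t[i-1] exceeds the threshold; index 0 never follows a gap
    after_gap = np.concatenate([[False], np.diff(t) > max_gap])
    return ma.masked_where(after_gap, c)

t1 = np.arange(0, 8, 0.05)
t2 = np.arange(10, 14, 0.05)
t = np.concatenate([t1, t2])
mc = mask_after_gaps(t, np.cos(t), max_gap=0.1)  # hypothetical threshold: 2x the sample period

fig, ax = plt.subplots()
ax.plot(t, mc)  # the line stops at the gap and restarts after it
ax.set_title('Masked automatically at gaps')
# call plt.show() to display the figure
```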

The strange lines were a bug in matplotlib 1.1.1.
There is no need to have the t component of the dummy points in chronological order; zero values will also work.
For the c component, I use np.nan instead of None, which (on conversion from a list) forces the dtype to 'float64' instead of 'O' (object).
Dummy points are best inserted at the time of filling the array with samples (or appending to a list), like so:
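import numpy as np
GAP_TOLERANCE = 1.0  # hypothetical: largest spacing that still counts as continuous
incoming_samples = [(0.0, 1.0), (0.5, 0.8), (5.0, 0.2), (5.5, 0.1)]  # hypothetical stand-in for the live stream
samples = [] # (t,c) data pairs.
for current_sample in incoming_samples:  # waiting for samples in a loop
    if samples and current_sample[0] > samples[-1][0] + GAP_TOLERANCE:
        samples.append((0, np.nan))  # dummy point: the NaN breaks the line
    samples.append(current_sample)
t, c = np.array(samples).T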
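If the data has already been collected, the dummy points can also be added afterwards in one vectorised step. A sketch, assuming a threshold `max_gap` and a helper name `break_at_gaps` that are both introduced here: `np.insert` puts a NaN into `c` wherever consecutive `t` values are further apart than the threshold. Unlike masking, no real sample is hidden. The dummy `t` value repeats the last sample before the gap, which keeps `t` sorted; as noted above, other values (e.g. zero) would also work for plotting.

```python
import numpy as np

def break_at_gaps(t, c, max_gap):
    """Insert a NaN sample wherever consecutive t values are more than max_gap apart."""
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    # index i where a gap lies between sample i-1 and sample i
    gap_idx = np.flatnonzero(np.diff(t) > max_gap) + 1
    # np.insert places the new values *before* the given indices
    t_out = np.insert(t, gap_idx, t[gap_idx - 1])  # repeat the last t before each gap
    c_out = np.insert(c, gap_idx, np.nan)          # NaN makes matplotlib break the line
    return t_out, c_out

t = np.concatenate([np.arange(0, 8, 0.05), np.arange(10, 14, 0.05)])
t_plot, c_plot = break_at_gaps(t, np.cos(t), max_gap=0.1)  # hypothetical threshold
# plt.plot(t_plot, c_plot) now draws two separate line pieces
```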

Related

Matplotlib + pandas change xtick label frequency when using period[Q-DEC] [duplicate]

I am trying to fix how python plots my data.
Say:
x = [0,5,9,10,15]
y = [0,1,2,3,4]
matplotlib.pyplot.plot(x,y)
matplotlib.pyplot.show()
The x-axis ticks are plotted in intervals of 5. Is there a way to make them show intervals of 1?
You could explicitly set where you want the tick marks with plt.xticks:
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
For example,
import numpy as np
import matplotlib.pyplot as plt
x = [0,5,9,10,15]
y = [0,1,2,3,4]
plt.plot(x,y)
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
plt.show()
(np.arange was used rather than Python's range function just in case min(x) and max(x) are floats instead of ints.)
The plt.plot (or ax.plot) function will automatically set default x and y limits. If you wish to keep those limits, and just change the stepsize of the tick marks, then you could use ax.get_xlim() to discover what limits Matplotlib has already set.
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, stepsize))
The default tick formatter should do a decent job rounding the tick values to a sensible number of significant digits. However, if you wish to have more control over the format, you can define your own formatter. For example,
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
Here's a runnable example:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
fig, ax = plt.subplots()
ax.plot(x,y)
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, 0.712123))
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
plt.show()
Another approach is to set the axis locator:
import matplotlib.ticker as plticker
loc = plticker.MultipleLocator(base=1.0) # this locator puts ticks at regular intervals
ax.xaxis.set_major_locator(loc)
There are several different types of locator depending upon your needs.
Here is a full example:
import matplotlib.pyplot as plt
import matplotlib.ticker as plticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
fig, ax = plt.subplots()
ax.plot(x,y)
loc = plticker.MultipleLocator(base=1.0) # this locator puts ticks at regular intervals
ax.xaxis.set_major_locator(loc)
plt.show()
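To see which positions a locator would produce without drawing anything, a locator's tick_values(vmin, vmax) method can be called directly. A small sketch comparing MultipleLocator with MaxNLocator, another locator in matplotlib.ticker; the range 0 to 15 is taken from the question, and `ticks_in_range` is a helper introduced here. Locators may return a tick just outside the requested range (Matplotlib clips these to the view limits), so the helper filters them out.

```python
import numpy as np
import matplotlib.ticker as plticker

vmin, vmax = 0, 15  # x range from the question

multiple = plticker.MultipleLocator(base=1.0)        # a tick every 1.0 units
max_n = plticker.MaxNLocator(nbins=4, integer=True)  # a few "nice" integer ticks

def ticks_in_range(locator, lo, hi):
    """Return the locator's tick positions that fall inside [lo, hi]."""
    locs = np.asarray(locator.tick_values(lo, hi))
    return locs[(locs >= lo) & (locs <= hi)]

print(ticks_in_range(multiple, vmin, vmax))
print(ticks_in_range(max_n, vmin, vmax))
```

MultipleLocator fixes the spacing and lets the number of ticks vary with the data range, whereas MaxNLocator fixes (roughly) the number of ticks and picks a spacing to suit.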
I like this solution (from the Matplotlib Plotting Cookbook):
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
tick_spacing = 1
fig, ax = plt.subplots(1,1)
ax.plot(x,y)
ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
plt.show()
This solution gives you explicit control of the tick spacing via the number given to ticker.MultipleLocator(), allows automatic limit determination, and is easy to read later.
In case anyone is interested in a general one-liner, simply get the current ticks and use them to set the new ticks by sampling every other tick.
ax.set_xticks(ax.get_xticks()[::2])
If you just want to set the spacing, here is a simple one-liner with minimal boilerplate:
plt.gca().xaxis.set_major_locator(plt.MultipleLocator(1))
It also works easily for minor ticks:
plt.gca().xaxis.set_minor_locator(plt.MultipleLocator(1))
A bit of a mouthful, but pretty compact.
This is a bit hacky, but by far the cleanest/easiest to understand example that I've found to do this. It's from an answer on SO here:
Cleanest way to hide every nth tick label in matplotlib colorbar?
for label in ax.get_xticklabels()[::2]:
    label.set_visible(False)
Then you can loop over the labels setting them to visible or not depending on the density you want.
Edit: note that sometimes matplotlib sets labels to '', so it might look like a label is not present when in fact it is and just isn't displaying anything. To make sure you're looping through actual visible labels, you could try:
visible_labels = [lab for lab in ax.get_xticklabels() if lab.get_visible() is True and lab.get_text() != '']
plt.setp(visible_labels[::2], visible=False)
This is an old topic, but I stumble over this every now and then and made this function. It's very convenient:
import matplotlib.pyplot as pp
import numpy as np
def resadjust(ax, xres=None, yres=None):
    """
    Send in an axis and I fix the resolution as desired.
    """
    if xres:
        start, stop = ax.get_xlim()
        ticks = np.arange(start, stop + xres, xres)
        ax.set_xticks(ticks)
    if yres:
        start, stop = ax.get_ylim()
        ticks = np.arange(start, stop + yres, yres)
        ax.set_yticks(ticks)
One caveat of controlling the ticks like this is that one no longer enjoys the interactive automagic updating of the max scale after adding a line. In that case, do
pp.gca().set_ylim(top=new_top) # for example
and run the resadjust function again.
I developed an inelegant solution. Consider that we have the X axis and also a list of labels for each point in X.
Example:
import matplotlib.pyplot as plt
x = [0,1,2,3,4,5]
y = [10,20,15,18,7,19]
xlabels = ['jan','feb','mar','apr','may','jun']
Let's say that I want to show tick labels only for 'feb' and 'jun':
xlabelsnew = []
for i in xlabels:
    if i not in ['feb','jun']:
        i = ' '
        xlabelsnew.append(i)
    else:
        xlabelsnew.append(i)
Good, now we have a fake list of labels. First, we plot the original version.
plt.plot(x,y)
plt.xticks(range(0,len(x)),xlabels,rotation=45)
plt.show()
Now, the modified version.
plt.plot(x,y)
plt.xticks(range(0,len(x)),xlabelsnew,rotation=45)
plt.show()
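The loop that builds xlabelsnew can be written as a single list comprehension. A small equivalent sketch; the `keep` set is a name introduced here:

```python
xlabels = ['jan', 'feb', 'mar', 'apr', 'may', 'jun']
keep = {'feb', 'jun'}  # labels to show, as in the example
# same result as the loop: every label not in `keep` becomes a single space
xlabelsnew = [lab if lab in keep else ' ' for lab in xlabels]
print(xlabelsnew)
```

It is then used exactly as before, e.g. plt.xticks(range(0, len(x)), xlabelsnew, rotation=45).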
Pure Python Implementation
Below is a pure Python implementation of the desired functionality that handles any numeric series (int or float) with positive, negative, or mixed values and allows the user to specify the desired step size:
import math
def computeTicks (x, step = 5):
    """
    Computes domain with given step encompassing series x
    # params
    x - Required - A list-like object of integers or floats
    step - Optional - Tick frequency
    """
    xMax, xMin = math.ceil(max(x)), math.floor(min(x))
    dMax, dMin = xMax + abs((xMax % step) - step) + (step if (xMax % step != 0) else 0), xMin - abs((xMin % step))
    return range(dMin, dMax, step)
Sample Output
# Negative to Positive
series = [-2, 18, 24, 29, 43]
print(list(computeTicks(series)))
[-5, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
# Negative to 0
series = [-30, -14, -10, -9, -3, 0]
print(list(computeTicks(series)))
[-30, -25, -20, -15, -10, -5, 0]
# 0 to Positive
series = [19, 23, 24, 27]
print(list(computeTicks(series)))
[15, 20, 25, 30]
# Floats
series = [1.8, 12.0, 21.2]
print(list(computeTicks(series)))
[0, 5, 10, 15, 20, 25]
# Step – 100
series = [118.3, 293.2, 768.1]
print(list(computeTicks(series, step = 100)))
[100, 200, 300, 400, 500, 600, 700, 800]
Sample Usage
import matplotlib.pyplot as plt
x = [0,5,9,10,15]
y = [0,1,2,3,4]
plt.plot(x,y)
plt.xticks(computeTicks(x))
plt.show()
Notice the x-axis has integer values all evenly spaced by 5, whereas the y-axis has a different interval (the matplotlib default behavior, because the ticks weren't specified).
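For an integer step, the same ticks can be obtained by rounding min(x) down and max(x) up to a multiple of step. A shorter sketch under that assumption; `compute_ticks_simple` is a name introduced here, not part of the answer. It reproduces the five sample outputs above.

```python
import math

def compute_ticks_simple(x, step=5):
    """Ticks every `step` units, from min(x) rounded down to max(x) rounded up.

    Assumes `step` is a positive integer, as with computeTicks.
    """
    lo = step * math.floor(min(x) / step)  # largest multiple of step <= min(x)
    hi = step * math.ceil(max(x) / step)   # smallest multiple of step >= max(x)
    return range(lo, hi + step, step)      # + step so that hi itself is included

print(list(compute_ticks_simple([-2, 18, 24, 29, 43])))
print(list(compute_ticks_simple([118.3, 293.2, 768.1], step=100)))
```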
Generalisable one-liner, with only NumPy imported:
ax.set_xticks(np.arange(min(x), max(x)+1, 1))
Set in the context of the question:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = [0,5,9,10,15]
y = [0,1,2,3,4]
ax.plot(x,y)
ax.set_xticks(np.arange(min(x), max(x)+1, 1))
plt.show()
How it works:
fig, ax = plt.subplots() gives the ax object which contains the axes.
np.arange(min(x), max(x)+1, 1) gives an array with interval 1 from the min of x up to and including the max of x (np.arange excludes its stop value, hence the +1). These are the new x ticks that we want.
ax.set_xticks() changes the ticks on the ax object.
This worked for me. If you want ticks between 1 and 5 (inclusive), set length = 5:
length = 5
xmarks = [i for i in range(1, length+1, 1)]
plt.xticks(xmarks)
Since None of the above solutions worked for my use case, here I provide a solution using None (pun!) which can be adapted to a wide variety of scenarios.
Here is a sample piece of code that produces cluttered ticks on both X and Y axes.
# Note the super cluttered ticks on both X and Y axis.
import numpy as np
import matplotlib.pyplot as plt
# inputs
x = np.arange(1, 101)
y = x * np.log(x)
fig = plt.figure() # create figure
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.set_xticks(x) # set xtick values
ax.set_yticks(y) # set ytick values
plt.show()
Now, we clean up the clutter with a new plot that shows only a sparse set of values on both x and y axes as ticks.
# inputs
x = np.arange(1, 101)
y = x * np.log(x)
fig = plt.figure() # create figure
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.set_xticks(x)
ax.set_yticks(y)
# which values need to be shown?
# here, we show every third value from `x` and `y`
show_every = 3
sparse_xticks = [None] * x.shape[0]
sparse_xticks[::show_every] = x[::show_every]
sparse_yticks = [None] * y.shape[0]
sparse_yticks[::show_every] = y[::show_every]
ax.set_xticklabels(sparse_xticks, fontsize=6) # set sparse xtick values
ax.set_yticklabels(sparse_yticks, fontsize=6) # set sparse ytick values
plt.show()
Depending on the use case, one can adapt the above code simply by changing show_every and using that for sampling tick values for the X axis, the Y axis, or both.
If this stepsize based solution doesn't fit, then one can also populate the values of sparse_xticks or sparse_yticks at irregular intervals, if that is what is desired.
You can loop through labels and show or hide those you want:
interval = 3  # keep every 3rd label (example value)
for i, label in enumerate(ax.get_xticklabels()):
    if i % interval != 0:
        label.set_visible(False)

python violin plot regular axis

I want to do a violin plot of binned data but at the same time be able to plot a model prediction and visualize how well the model describes the main part of the individual data distributions. My problem here is, I guess, that the x-axis after the violin plot does not behave like a regular axis with numbers, but more like string values that just happen to be numbers. Maybe not a good description, but in the example I would like to have a "normal" plot of a function, e.g. f(x) = 2*x**2, and at x=1, x=5.2, x=18.3 and x=27 I would like to have the violins in the background.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
np.random.seed(10)
collectn_1 = np.random.normal(1, 2, 200)
collectn_2 = np.random.normal(802, 30, 200)
collectn_3 = np.random.normal(90, 20, 200)
collectn_4 = np.random.normal(70, 25, 200)
ys = [collectn_1, collectn_2, collectn_3, collectn_4]
xs = [1, 5.2, 18.3, 27]
sns.violinplot(x=xs, y=ys)
xx = np.arange(0, 30, 10)
plt.plot(xx, 2*xx**2)
plt.show()
Somehow this code does not plot violins but only bars; this is only a problem in this example, though, not in the original code. In my real code I want to have different "half-violins" on both sides, therefore I use sns.violinplot(x="..", y="..", hue="..", data=.., split=True).
I think that would be hard to do with seaborn because it does not provide an easy way to manipulate the artists that it creates, particularly if there are other things plotted on the same Axes. Matplotlib's violinplot allows setting the position of the violins, but does not provide an option for plotting only half violins. Therefore, I would suggest using statsmodels.graphics.boxplots.violinplot, which does both.
import numpy as np
import matplotlib.ticker
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.graphics.boxplots import violinplot
df = sns.load_dataset('tips')
x_col = 'day'
y_col = 'total_bill'
hue_col = 'smoker'
xs = [1, 5.2, 18.3, 27]
xx = np.arange(0, 30, 1)
yy = 0.1*xx**2
cs = ['C0','C1']
fig, ax = plt.subplots()
ax.plot(xx,yy)
for (_,gr0),side,c in zip(df.groupby(hue_col),['left','right'],cs):
    data = [gr1 for (_,gr1) in gr0.groupby(x_col)[y_col]]
    violinplot(ax=ax, data=data, positions=xs, side=side, show_boxplot=False, plot_opts=dict(violin_fc=c))
# violinplot above messes up which ticks are shown, the line below restores a sensible tick locator
ax.xaxis.set_major_locator(matplotlib.ticker.MaxNLocator())
plt.show()
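If full (not half) violins are enough, Matplotlib's own Axes.violinplot also keeps the x-axis numeric, because its positions= argument places each violin at an arbitrary x value. A sketch using the dummy data from the question; the widths value and the denser curve sampling are choices made here, not from the question.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(10)
# same dummy data as in the question
ys = [np.random.normal(1, 2, 200),
      np.random.normal(802, 30, 200),
      np.random.normal(90, 20, 200),
      np.random.normal(70, 25, 200)]
xs = [1, 5.2, 18.3, 27]

fig, ax = plt.subplots()
# positions= puts each violin at a numeric x value, so the axis stays a regular numeric axis
parts = ax.violinplot(ys, positions=xs, widths=2.0, showmedians=True)
xx = np.linspace(0, 30, 300)
ax.plot(xx, 2 * xx**2, color='k')  # model prediction f(x) = 2*x**2 on the same axis
# call plt.show() to display the figure
```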

How to set fixed spaces between ticks in matplotlib

I am preparing a graph of latency percentile results. This is what my pd.DataFrame looks like:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
result = pd.DataFrame(np.random.randint(133000, size=(5,3)), columns=list('ABC'), index=[99.0, 99.9, 99.99, 99.999, 99.9999])
I am using this function (commented lines are different pyplot methods I have already tried to achieve my goal):
def plot_latency_time_bar(result):
    ind = np.arange(4)
    means = []
    stds = []
    for index, row in result.iterrows():
        means.append(np.mean([row[0]//1000, row[1]//1000, row[2]//1000]))
        stds.append(np.std([row[0]//1000, row[1]//1000, row[2]//1000]))
    plt.bar(result.index.values, means, 0.2, yerr=stds, align='center')
    plt.xlabel('Percentile')
    plt.ylabel('Latency')
    plt.xticks(result.index.values)
    # plt.xticks(ind, ('99.0', '99.9', '99.99', '99.999', '99.99999'))
    # plt.autoscale(enable=False, axis='x', tight=False)
    # plt.axis('auto')
    # plt.margins(0.8, 0)
    # plt.semilogx(basex=5)
    plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
    fig = plt.gcf()
    fig.set_size_inches(15.5, 10.5)
And here is the figure:
As you can see, the bars for all percentiles above 99.0 overlap and are completely unreadable. I would like to set a fixed space between ticks so that they are all equally spaced.
Since you're using pandas, you can do all this from within that library (result is the DataFrame from the question):
means = result.mean(axis=1)/1000
stds = result.std(axis=1)/1000
means.plot.bar(yerr=stds, fc='b')
# Make some room for the x-axis tick labels
plt.subplots_adjust(bottom=0.2)
plt.show()
Not wishing to take anything away from xnx's answer (which is the most elegant way to do things given that you're working in pandas, and therefore likely the best answer for you) but the key insight you're missing is that, in matplotlib, the x positions of the data you're plotting and the x tick labels are independent things. If you say:
nominalX = np.arange( 1, 6 ) ** 2
y = np.arange( 1, 6 ) ** 4
positionalX = np.arange(len(y))
plt.bar( positionalX, y ) # graph y against the numbers 1..n
plt.gca().set(xticks=positionalX + 0.4, xticklabels=nominalX) # ...but superficially label the X values as something else
then that's different from tying positions to your nominal X values:
plt.bar( nominalX, y )
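Note that I added 0.4 to the x position of the ticks, because that's half the default width of the bars, bar( ..., width=0.8 ), so the ticks end up in the middle of the bar. This applies to Matplotlib versions before 2.0, where bars were left-aligned by default; since 2.0, bar() defaults to align='center', so the bars are already centred on positionalX and the +0.4 offset is not needed.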
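Applied to the question's data with Matplotlib 2.0 or later (bars centred on their x positions by default), the same idea looks roughly like this. A sketch with random dummy data shaped like the question's result DataFrame; the bar width of 0.6 is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# dummy data shaped like the question's DataFrame
result = pd.DataFrame(np.random.randint(133000, size=(5, 3)), columns=list('ABC'),
                      index=[99.0, 99.9, 99.99, 99.999, 99.9999])
means = result.mean(axis=1) / 1000
stds = result.std(axis=1) / 1000

positions = np.arange(len(result))  # evenly spaced: 0, 1, 2, ...
fig, ax = plt.subplots()
ax.bar(positions, means, width=0.6, yerr=stds)  # bars centred on `positions`
ax.set_xticks(positions)
ax.set_xticklabels([str(p) for p in result.index])  # percentiles are only labels now
ax.set_xlabel('Percentile')
ax.set_ylabel('Latency')
# call plt.show() to display the figure
```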

Line-based heatmap or 2D line histogram

I have a synthetic dataset with 1000 noisy polynomials of various orders and sin/cos curves that I can plot as lines using Python and seaborn.
Since I have quite a few lines that are overlapping, I'd like to plot some sort of heatmap or histogram of my line graphs.
I've tried iterating over the columns and aggregating the counts to use seaborn's heatmap graph, but with many lines this takes quite a while.
The next best thing, which results in what I want, was a hexbin graph (with seaborn's jointplot).
But it's a compromise between runtime and granularity (the shown graph has gridsize 750). I couldn't find any other graph-type for my problem. But I also don't know exactly what it might be called.
I've also tried with line alpha set to 0.2. This results in a similar graph to what I want. But it's less precise (if more than 5 lines overlap at the same point I already have zero transparency left). Also, it misses the typical coloration of heatmaps.
(Moot search terms were: heatmap, 2D line histogram, line histogram, density plots...)
Does anybody know of packages that plot this more efficiently and in higher quality, or know how to do it with the popular Python plotting libraries (e.g. matplotlib, seaborn, bokeh)? I'm really fine with any package, though.
It took me a while, but I finally solved this using Datashader. If using a notebook, the plots can be embedded into interactive Bokeh plots, which looks really nice.
Anyhow, here is the code for static images, in case someone else is in need of something similar:
# coding: utf-8
import time
import numpy as np
from numpy.polynomial import polynomial
import pandas as pd
import matplotlib.pyplot as plt
import datashader as ds
import datashader.transfer_functions as tf
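plt.style.use("seaborn-v0_8-whitegrid")  # called "seaborn-whitegrid" before Matplotlib 3.6
def create_data():
    # ... (elided in the original: returns a DataFrame with one sample per column)
    ...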
# Each column is one data sample
df = create_data()
# Following will append a nan-row and reshape the dataframe into two columns, with each sample stacked on top of each other
# THIS IS CRUCIAL TO OPTIMIZE SPEED: https://github.com/bokeh/datashader/issues/286
# Append row with nan-values
df = pd.concat([df, pd.DataFrame([np.array([np.nan] * len(df.columns))], columns=df.columns, index=[np.nan])])  # DataFrame.append was removed in pandas 2.0
# Reshape
x, y = df.shape
arr = df.to_numpy().reshape((x * y, 1), order='F')  # as_matrix() was removed in pandas 1.0
df_reshaped = pd.DataFrame(arr, columns=list('y'), index=np.tile(df.index.values, y))
df_reshaped = df_reshaped.reset_index()
df_reshaped = df_reshaped.rename(columns={'index': 'x'})
# Plotting parameters
x_range = (min(df.index.values), max(df.index.values))
y_range = (df.min().min(), df.max().max())
w = 1000
h = 750
dpi = 150
cvs = ds.Canvas(x_range=x_range, y_range=y_range, plot_height=h, plot_width=w)
# Aggregate data
t0 = time.time()
aggs = cvs.line(df_reshaped, 'x', 'y', ds.count())
print("Time to aggregate line data: {}".format(time.time()-t0))
# One colored plot
t1 = time.time()
stacked_img = tf.Image(tf.shade(aggs, cmap=["darkblue", "darkblue"]))
print("Time to create stacked image: {}".format(time.time() - t1))
# Save
f0 = plt.figure(figsize=(w / dpi, h / dpi), dpi=dpi)
ax0 = f0.add_subplot(111)
ax0.imshow(stacked_img.to_pil())
ax0.grid(False)
f0.savefig("stacked.png", bbox_inches="tight", dpi=dpi)
# Heat map: this uses an equalized histogram (the built-in default); there are other options, though.
t2 = time.time()
heatmap_img = tf.Image(tf.shade(aggs, cmap=plt.cm.Spectral_r))
print("Time to create stacked image: {}".format(time.time() - t2))
# Save
f1 = plt.figure(figsize=(w / dpi, h / dpi), dpi=dpi)
ax1 = f1.add_subplot(111)
ax1.imshow(heatmap_img.to_pil())
ax1.grid(False)
f1.savefig("heatmap.png", bbox_inches="tight", dpi=dpi)
With the following run times (in seconds):
Time to aggregate line data: 0.7710442543029785
Time to create stacked image: 0.06000351905822754
Time to create stacked image: 0.05600309371948242
The resulting plots:
Although it seems you have tried this, plotting the counts seems to give a good representation of the data. However, it really depends on what you're trying to find in your data: what is it supposed to tell you?
The long run time is due to plotting so many lines; a heatmap based on the counts, however, will plot fairly quickly.
I created some dummy data for sine waves, based on noise, number of lines, amplitude and shift, and added both a boxplot and a heatmap.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
import random
import pandas as pd
np.random.seed(0)
#create dummy data
N = 200
sinuses = []
no_lines = 200
for i in range(no_lines):
    a = np.random.randint(5, 40)/5 #amplitude
    x = random.choice([int(N/5), int(N/(2/5))]) #random shift
    sinuses.append(np.roll(a * np.sin(np.linspace(0, 2 * np.pi, N)) + np.random.randn(N), x))
fig = plt.figure(figsize=(20 / 2.54, 20 / 2.54))
sins = pd.DataFrame(sinuses, )
ax1 = plt.subplot2grid((3,10), (0,0), colspan=10)
ax2 = plt.subplot2grid((3,10), (1,0), colspan=10)
ax3 = plt.subplot2grid((3,10), (2,0), colspan=9)
ax4 = plt.subplot2grid((3,10), (2,9))
# plot line data
sins.T.plot(ax=ax1, color='lightblue',linewidth=.3)
ax1.legend_.remove()
ax1.set_xlim(0, N)
# try boxplot
sins.plot.box(ax=ax2, showfliers=False)
xticks = ax2.xaxis.get_major_ticks()
for index, label in enumerate(ax2.get_xaxis().get_ticklabels()):
    xticks[index].set_visible(False) # hide ticks where labels are hidden
#make a list of bins
no_bins = 20
bins = list(np.arange(sins.min().min(), sins.max().max(), int(abs(sins.min().min())+sins.max().max())/no_bins))
bins.append(sins.max().max())
# calculate histogram
hists = []
for col in sins.columns:
    count, division = np.histogram(sins.iloc[:,col], bins=bins)
    hists.append(count)
hists = pd.DataFrame(hists, columns=[str(i) for i in bins[1:]])
print(hists.shape, '\n', hists.head())
cmap = mpl.colors.ListedColormap(['white', '#FFFFBB', '#C3FDB8', '#B5EAAA', '#64E986', '#54C571',
'#4AA02C', '#347C17', '#347235', '#25383C', '#254117'])
#heatmap
im = ax3.pcolor(hists.T, cmap=cmap)
cbar = plt.colorbar(im, cax=ax4)
yticks = np.arange(0, len(bins))
yticklabels = hists.columns.tolist()
ax3.set_yticks(yticks)
ax3.set_yticklabels([round(i,1) for i in bins])
ax3.set_title('Count')
yticks = ax3.yaxis.get_major_ticks()
for index, label in enumerate(ax3.get_yaxis().get_ticklabels()):
    if index % 3 != 0: #make some labels invisible
        yticks[index].set_visible(False) # hide ticks where labels are hidden
plt.show()
Although the boxplot is easy to interpret, it doesn't show the actual distribution of the data very well, but knowing where the median and quantiles lie may be helpful.
Increasing the number of lines and the number of values per line will increase plotting time considerably for the line plots; the heatmap, however, is still fairly quick to generate. The boxplot becomes indiscernible, though.
I couldn't exactly replicate your data (or know the actual size of it), but perhaps the heatmap may be helpful.
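The per-column histogram loop above can also be done in a single call with np.histogram2d when all lines share the same x grid. A minimal sketch; `line_density` and the random dummy data are introduced here. Like the loop, it counts sample points per (x, y) cell rather than rasterizing the segments between them, so steep segments between samples are under-represented compared with Datashader's line aggregation.

```python
import numpy as np
import matplotlib.pyplot as plt

def line_density(lines, x=None, y_bins=50):
    """2D histogram of many lines sampled on a shared x grid.

    lines: array of shape (n_lines, n_points). Returns (counts, x_edges, y_edges),
    where counts[i, j] is the number of sample points in x bin i and y bin j.
    """
    lines = np.asarray(lines, dtype=float)
    n_lines, n_points = lines.shape
    if x is None:
        x = np.arange(n_points)
    # pair every y value with its x position, then bin all points at once
    xx = np.broadcast_to(x, lines.shape).ravel()
    return np.histogram2d(xx, lines.ravel(), bins=[n_points, y_bins])

# hypothetical dummy data: 500 noisy sine waves with random amplitudes
rng = np.random.default_rng(0)
N = 200
base = np.sin(np.linspace(0, 2 * np.pi, N))
lines = rng.uniform(1, 8, size=(500, 1)) * base + rng.standard_normal((500, N))

counts, x_edges, y_edges = line_density(lines, y_bins=60)
fig, ax = plt.subplots()
mesh = ax.pcolormesh(x_edges, y_edges, counts.T)  # transpose: rows of C map to y
fig.colorbar(mesh, ax=ax, label='count')
# call plt.show() to display the figure
```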

Defining and plotting a Schechter function: plot problems

I'm currently defining a function in python as:
import numpy
def schechter_fit(logM, phi=5.96E-11, log_M0=11.03, alpha=-1.35, e=2.718281828):
    schechter = phi*(10**((alpha+1)*(logM-log_M0)))*(e**(pow(-10,logM-log_M0)))
    return schechter
schechter_range = numpy.linspace(10.0, 11.9, 10000)
And then plotting said function as:
import numpy
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid.axislines import SubplotZero
schechter_range = numpy.linspace(10, 12, 10000)
fig = plt.figure(1)
ax = SubplotZero(fig, 111)
fig.add_subplot(ax)
ax.plot(schechter_range, schechter_fit(schechter_range), 'k')
The graphical output I am receiving is just a blank plot with no curve plotted. There must be a problem with how I have defined the function, but I can't see it. The plot should look something like this:
I'm new to python functions so perhaps my equation isn't quite right. This is what I am looking to plot and the parameters I am starting with:
The function you describe returns a complex result over most of your input range. Here I added +0j to the input to allow for an imaginary result; if you don't do this you just get a bunch of nans (which mpl doesn't plot). Here are the plots:
import numpy
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid.axislines import SubplotZero
schechter_range = numpy.linspace(10, 12, 10000)
fig = plt.figure(1)
ax = SubplotZero(fig, 111)
fig.add_subplot(ax)
def schechter_fit(logM, phi=5.96E-11, log_M0=11.03, alpha=-1.35, e=2.718281828):
    schechter = phi*(10**((alpha+1)*(logM-log_M0)))*(e**(pow(-10,logM-log_M0)))
    return schechter
y = schechter_fit(schechter_range+0j) # Note the +0j here to allow an imaginary result
ax.plot(schechter_range, y.real, 'b', label="Re Part")
ax.plot(schechter_range, y.imag, 'r', label="Im Part")
ax.legend()
plt.show()
Now that you can see why the data is not plotting, that complex numbers are being generated, and that physically you don't want them, it is reasonable to figure out where they come from. Hopefully, it's obvious that they originate from pow(-10,logM-log_M0), and from there it's clear that this assumes the wrong operator precedence: the equation isn't pow(-10,logM-log_M0), but -pow(10,logM-log_M0). Making this correction gives (after a log is taken, because I can see the log in the plot in the question):
I also extended the lower bound from 10 to 8, so the region of constant slope is clear and it better matches the graph shown in the question. This is still off by a factor on the y-axis, but I'm guessing that's a factor of (SFR/M*) that's not being applied correctly (it's difficult to know without seeing the context and the full y-axis).
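For reference, a sketch of the question's function with the precedence fix applied, written with NumPy so it needs no +0j and returns real values directly. It rewrites 10**((alpha+1)*(logM-log_M0)) as ratio**(alpha+1) with ratio = 10**(logM-log_M0), and uses np.exp in place of the hard-coded e. It does not attempt to explain the remaining y-axis factor mentioned above; the base-10 log at the end is an assumption, since the answer does not say which log was taken.

```python
import numpy as np

def schechter_fit(logM, phi=5.96E-11, log_M0=11.03, alpha=-1.35):
    """Question's function with the fix: exp(-10**(logM - log_M0)), not e**(pow(-10, ...))."""
    logM = np.asarray(logM, dtype=float)
    ratio = 10.0 ** (logM - log_M0)  # M / M0
    return phi * ratio ** (alpha + 1) * np.exp(-ratio)

logM = np.linspace(8, 12, 1000)  # lower bound extended to 8, as in the answer
y = schechter_fit(logM)          # real-valued, no +0j needed
log_y = np.log10(y)              # assumption: base-10 log for plotting
```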
I did almost the same as tom10, except that I took the log of your expression directly, which turns the factors into summands and may make things easier to debug.
I did not really test the formula!
import numpy
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid.axislines import SubplotZero
def log_schechter_fit(logM, SFR_M=5.96E-11, log_M0=11.03,
                      alpha=-1.35):
    schechter = numpy.log(SFR_M)
    schechter += (alpha+1)*(logM-log_M0)*numpy.log(10)
    schechter += pow(-10,logM-log_M0)
    return schechter
schechter_range = numpy.linspace(10, 12, 10000)
# for i in range(10,13):
for i in numpy.linspace(10, 11.03, 10):
    print(i, log_schechter_fit(i+0j))
fig = plt.figure(1)
ax = SubplotZero(fig, 111)
fig.add_subplot(ax)
ax.set_xlim([10,12])
y = log_schechter_fit(schechter_range+0j)
ax.plot(schechter_range, y.real, 'b', label="Re Part")
ax.plot(schechter_range, y.imag, 'r', label="Im Part")
ax.legend()
And I got:
UPDATE
Again, using tom10's comments on operator precedence and changing the last part of the function:
LOG_10 = numpy.log(10)
SFR_M = 5.96E-11
LOG_SFR_M = numpy.log(SFR_M)
def log_schechter_fit(logM, log_SFR_M=LOG_SFR_M, log_M0=11.03,
                      alpha=-1.35):
    schechter = log_SFR_M
    schechter += (alpha+1)*(logM-log_M0)*LOG_10
    schechter -= pow(10,logM-log_M0)
    return schechter
I can reproduce the plot of the accepted answer. The shape of the curve fits, but I cannot explain the discrepancy in the values compared with the original plot posted in the question...
