Python - matplotlib axes limits approximate ticker location - python

When no axes limits are specified, matplotlib chooses default values as nice, round numbers below and above the minimum and maximum values in the list to be plotted.
Sometimes I have outliers in my data and I don't want them included when the axes are selected. I can detect the outliers, but I don't want to actually delete them, just have them be beyond the area of the plot. I have tried setting the axes to be the minimum and maximum value in the list not including the outliers, but that means that those values lie exactly on the axes, and the bounds of the plot do not line up with ticker points.
Is there a way to specify that the axes limits should be in a certain range, but let matplotlib choose an appropriate point?
For example, the following code produces a nice plot with the y-axis limits automatically set to (0.140,0.165):
from matplotlib import pyplot as plt
plt.plot([0.144490353418, 0.142921640661, 0.144511781706, 0.143587888773, 0.146009766101, 0.147241517391, 0.147224266382, 0.151530932135, 0.158778411784, 0.160337332636])
plt.show()
After introducing an outlier in the data and setting the limits manually, the y-axis limits are set to slightly below 0.145 and slightly above 0.160 - not nearly as neat and tidy.
from matplotlib import pyplot as plt
plt.plot([0.144490353418, 0.142921640661, 0.144511781706, 0.143587888773, 500000, 0.146009766101, 0.147241517391, 0.147224266382, 0.151530932135, 0.158778411784, 0.160337332636])
plt.ylim(0.142921640661, 0.160337332636)
plt.show()
Is there any way to tell matplotlib to either ignore the outlier value when setting the limits, or set the axes to 'below 0.142921640661' and 'above 0.160337332636', but let it decide an appropriate location? I can't simply round the numbers up and down, as all my datasets occur on a different scale of magnitude.

You could make your data a masked array:
from matplotlib import pyplot as plt
import numpy as np
data = [0.144490353418, 0.142921640661, 0.144511781706, 0.143587888773, 500000, 0.146009766101, 0.147241517391, 0.147224266382, 0.151530932135, 0.158778411784, 0.160337332636]
data = np.ma.array(data, mask=False)
data.mask = data>0.16
plt.plot(data)
plt.show()

unutbu actually gave me an idea that solves the problem. It's not the most efficient solution, so if anyone has any other ideas, I'm all ears.
EDIT: I was originally masking the data like unutbu said, but that doesn't actually set the axes right. I have to remove the outliers from the data.
After removing the outliers from the data, the remaining values can be plotted and the y-axis limits obtained. Then the data with the outliers can be plotted again, but setting the limits from the first plot.
from matplotlib import pyplot as plt
data = [0.144490353418, 0.142921640661, 0.144511781706, 0.143587888773, 500000, 0.146009766101, 0.147241517391, 0.147224266382, 0.151530932135, 0.158778411784, 0.160337332636]
cleanedData = remove_outliers(data) #Function defined by me elsewhere.
plt.plot(cleanedData)
ymin, ymax = plt.ylim()
plt.clf()
plt.plot(data)
plt.ylim(ymin,ymax)
plt.show()
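A possible alternative to plotting twice, sketched under the assumption that the outliers have already been identified: ask one of matplotlib's tick locators for round tick positions that bracket the cleaned data, and use the outermost ones as limits. Here `nice_limits` is a hypothetical helper name, and the `v < 1` filter merely stands in for a real outlier test.

```python
from matplotlib import pyplot as plt
from matplotlib.ticker import MaxNLocator


def nice_limits(values, nbins=9):
    """Return round (low, high) limits that bracket the given values.

    Hypothetical helper: it reuses the tick positions MaxNLocator picks
    for the range [min(values), max(values)].
    """
    locator = MaxNLocator(nbins=nbins, steps=[1, 2, 2.5, 5, 10])
    ticks = locator.tick_values(min(values), max(values))
    return ticks[0], ticks[-1]


data = [0.144490353418, 0.142921640661, 0.144511781706, 0.143587888773,
        500000, 0.146009766101, 0.147241517391, 0.147224266382,
        0.151530932135, 0.158778411784, 0.160337332636]

# Stand-in for a real outlier filter (assumption for this example only)
cleaned = [v for v in data if v < 1]

ymin, ymax = nice_limits(cleaned)

fig, ax = plt.subplots()
ax.plot(data)              # the outlier is still plotted, just off-screen
ax.set_ylim(ymin, ymax)
```

The limits chosen this way need not match the ones matplotlib's autoscaling would pick, but they should fall on tick positions. Call plt.show() or fig.savefig(...) afterwards to view the result.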

Related

Adjust axes to make space for offset line plot

I would like to plot a series of curves in the same Axes, each having a constant y offset from each other. Because the data I have needs to be displayed in log scale, simply adding a y offset to each curve (as done here) does not give the desired output.
I have tried using matplotlib.transforms to achieve the same, i.e. artificially shifting the curve in Figure coordinates. This achieves the desired result, but requires adjusting the Axes y limits so that the shifted curves are visible. Here is an example to illustrate this, though such data would not require log scale to be visible:
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(1,1)
for i in range(1,19):
    x, y = np.arange(200), np.random.rand(200)
    dy = 0.5*i
    shifted = mpl.transforms.offset_copy(ax.transData, y=dy, fig=fig, units='inches')
    ax.set_xlim(0, 200)
    ax.set_ylim(0.1, 1e20)
    ax.set_yscale('log')
    ax.plot(x, y, transform=shifted, c=mpl.cm.plasma(i/18), lw=2)
The problem is that to make all the shifted curves visible, I would need to adjust the ylim to a very high number, which compresses all the curves so that the features visible because of the log scale cannot be seen anymore.
Since the displayed y axis values are meaningless to me, is there any way to artificially extend the Axes limits to display all the curves, without having to make the Figure very large? Apparently this can be done with seaborn, but if possible I would like to stick to matplotlib.
EDIT:
This is the kind of data I need to plot (an X-ray diffraction pattern varying with temperature):

Python matplotlib contourf plot

I have a question about matplotlib and contourf.
I am using the latest version of matplotlib with Python 3.7. Basically, I have two matrices that I want to plot on the same contour plot, each with a different colormap. One important aspect is that, for instance, if matrixA and matrixB are zero-filled matrices with shape=(10,10), then the positions in which matrixA is non-zero are the positions in which matrixB is zero, and vice versa.
In other words, I want to plot two different masks in different colors.
Thanks for your time.
Edited:
I add an example here
import numpy
import matplotlib.pyplot as plt
matrixA=numpy.random.randn(10,10).reshape(100,)
matrixB=numpy.random.randn(10,10).reshape(100,)
mask=numpy.random.uniform(10,10)
mask=mask.reshape(100,)
indexA=numpy.where(mask[mask>0.5])[0]
indexB=numpy.where(mask[mask<=0.5])[0]
matrixA_masked=numpy.zeros(100,)
matrixB_masked=numpy.zeros(100,)
matrixA_masked[indexA]=matrixA[indexA]
matrixB_masked[indexB]=matrixB[indexB]
matrixA_masked=matrixA_masked.reshape(100,100)
matrixB_masked=matrixB_masked.reshape(100,100)
x=numpy.linspace(0,10,1)
X,Y = numpy.meshgrid(x,x)
plt.contourf(X,Y,matrixA_masked,colormap='gray')
plt.contourf(X,Y,matrixB_masked,colormap='winter')
plt.show()
What I want is to be able to use different colormaps in the same plot. So, for instance, one part of the plot would be assigned to matrixA with its own colormap (and be 0 where matrixB applies), and likewise for matrixB with a different colormap.
In other words, each part of the contourf plot corresponds to one matrix. I am plotting decision surfaces of machine learning models.
I stumbled into some errors in your code so I have created my own dataset.
To have two colormaps on one plot you need to open a figure and define the axes:
import numpy
import matplotlib.pyplot as plt
matrixA=numpy.linspace(1,20,100)
matrixA[matrixA >= 10] = numpy.nan
matrixA_2 = numpy.reshape(matrixA,[50,2])
matrixB=numpy.linspace(1,20,100)
matrixB[matrixB <= 10] = numpy.nan
matrixB_2 = numpy.reshape(matrixB,[50,2])
fig,ax = plt.subplots()
a = ax.contourf(matrixA_2,cmap='copper',alpha=0.5,zorder=0)
fig.colorbar(a,ax=ax,orientation='vertical')
b=ax.contourf(matrixB_2,cmap='cool',alpha=0.5,zorder=1)
fig.colorbar(b,ax=ax,orientation='horizontal')
plt.show()
You'll also see that I've changed the alpha and zorder.
I hope this helps.
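For the case in the question, where every grid point belongs to exactly one of the two matrices, a minimal sketch along the same lines is to mask each matrix with the complement of the other and draw both on the same axes. The data, the region boundary, and all variable names below are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up smooth data on a 50x50 grid
x = np.linspace(0, 10, 50)
X, Y = np.meshgrid(x, x)
matrix_a = np.sin(X) * np.cos(Y)
matrix_b = np.hypot(X, Y)

# True where matrix_a applies; matrix_b covers the rest
region_a = X + Y < 10

# masked_where hides the entries where the condition is True
a_masked = np.ma.masked_where(~region_a, matrix_a)
b_masked = np.ma.masked_where(region_a, matrix_b)

fig, ax = plt.subplots()
cs_a = ax.contourf(X, Y, a_masked, cmap='gray')
cs_b = ax.contourf(X, Y, b_masked, cmap='winter')
fig.colorbar(cs_a, ax=ax, label='matrix_a')
fig.colorbar(cs_b, ax=ax, label='matrix_b')
```

Because contourf only fills grid cells whose corners are all unmasked, a thin unfilled seam can appear along the boundary between the two regions; a finer grid makes it less noticeable.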

How to ensure even spacing between labels on x axis of matplotlib graph?

I have been given some data for which I need to plot a histogram, so I used the pandas hist() function and plotted it using matplotlib. The code runs on a remote server, so I cannot view the plot directly and instead save it as an image. Here is what the image looks like
Here is my code below
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df_hist = pd.DataFrame(np.array(raw_data)).hist(bins=5)  # raw_data is the data supplied to me
plt.savefig('/path/to/file.png')
plt.close()
As you can see the x axis labels are overlapping. So I used this function plt.tight_layout() like so
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df_hist = pd.DataFrame(np.array(raw_data)).hist(bins=5)
plt.tight_layout()
plt.savefig('/path/to/file.png')
plt.close()
There is some improvement now
But still the labels are too close. Is there a way to ensure the labels do not touch each other and there is fair spacing between them? Also I want to resize the image to make it smaller.
I checked the documentation here https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html but I am not sure which parameter to use for savefig.
Since raw_data is not already a pandas dataframe there's no need to turn it into one to do the plotting. Instead you can plot directly with matplotlib.
There are many different ways to achieve what you'd like. I'll start by setting up some data which looks similar to yours:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gamma
raw_data = gamma.rvs(a=1, scale=1e6, size=100)
If we go ahead and use matplotlib to create the histogram we may find the xticks too close together:
fig, ax = plt.subplots(1, 1, figsize=[5, 3])
ax.hist(raw_data, bins=5)
fig.tight_layout()
The xticks are hard to read with all the zeros, regardless of spacing. So, one thing you may wish to do would be to use scientific formatting. This makes the x-axis much easier to interpret:
ax.ticklabel_format(style='sci', axis='x', scilimits=(0,0))
Another option, without using scientific formatting would be to rotate the ticks (as mentioned in the comments):
ax.tick_params(axis='x', rotation=45)
fig.tight_layout()
Finally, you also mentioned altering the size of the image. Note that this is best done when the figure is initialised. You can set the size of the figure with the figsize argument. The following would create a figure 5" wide and 3" in height:
fig, ax = plt.subplots(1, 1, figsize=[5, 3])
I think the two best fixes were mentioned by Pam in the comments.
You can rotate the labels with
plt.xticks(rotation=45)
For more information, look here: Rotate axis text in python matplotlib
The real problem is the many zeros that don't provide any extra info. Numpy arrays are pretty easy to work with, so pd.DataFrame(np.array(raw_data)/1000).hist(bins=5) should remove three zeros from the x-axis tick labels. Then just add a 'kilo' to the axis label.
To change the size of the graph use rcParams.
from matplotlib import rcParams
rcParams['figure.figsize'] = 7, 5.75 #the numbers are the dimensions
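As a variation on dividing the data by 1000, the tick labels alone can be rescaled with a formatter, which leaves the data untouched. This is only a sketch: `kilo` is a hypothetical helper name, and the gamma-distributed `raw_data` is made-up stand-in data.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def kilo(value, pos):
    """Format a tick value in thousands, e.g. 1500000 -> '1500k'."""
    return f"{value / 1000:g}k"


# Made-up stand-in for the data in the question
rng = np.random.default_rng(0)
raw_data = rng.gamma(shape=1.0, scale=1e6, size=100)

fig, ax = plt.subplots(figsize=(5, 3))
ax.hist(raw_data, bins=5)
ax.xaxis.set_major_formatter(FuncFormatter(kilo))  # relabel ticks only
ax.tick_params(axis='x', labelrotation=45)         # rotate to avoid overlap
fig.tight_layout()
```

Saving works as in the question, for example with fig.savefig('/path/to/file.png').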

Setting both axes logarithmic in bar plot matplotlib

I have already binned data to plot a histogram. For this reason I'm using the plt.bar() function. I'd like to set both axes in the plot to a logarithmic scale.
Calling plt.bar(x, y, width=10, color='b', log=True) lets me set the y-axis to log, but I can't set the x-axis to logarithmic.
I've tried plt.xscale('log'), but unfortunately this doesn't work right: the x-axis ticks vanish and the bars no longer have equal width.
I would be grateful for any help.
By default, the bars of a barplot have a width of 0.8. Therefore they appear larger for smaller x values on a logarithmic scale. If instead of specifying a constant width, one uses the distance between the bin edges and supplies this to the width argument, the bars will have the correct width. One would also need to set the align to "edge" for this to work.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(1)
x = np.logspace(0, 5, num=21)
y = (np.sin(1.e-2*(x[:-1]-20))+3)**10
fig, ax = plt.subplots()
ax.bar(x[:-1], y, width=np.diff(x), log=True,ec="k", align="edge")
ax.set_xscale("log")
plt.show()
I cannot reproduce missing ticklabels for a logarithmic scaling. This may be due to some settings in the code that are not shown in the question or due to the fact that an older matplotlib version is used. The example here works fine with matplotlib 2.0.
If the goal is to have equal width bars, assuming datapoints are not equidistant, then the most proper solution is to set width as
plt.bar(x, y, width=c*np.array(x), color='b', log=True) for a constant c appropriate for the plot. Alignment can be anything.
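To see why a width proportional to x gives equal-looking bars: a bar running from x to x + c*x spans log10(1 + c) decades on a log axis, which does not depend on x. A small numeric check of that claim, using made-up positions and an arbitrary value of c:

```python
import numpy as np

x = np.array([1.0, 7.0, 30.0, 450.0, 2e4])  # made-up, non-equidistant positions
c = 0.3                                      # arbitrary width factor

left = x
right = x + c * x                            # right edge of each bar
decades = np.log10(right) - np.log10(left)   # on-screen width on a log axis

print(decades)              # every entry equals log10(1 + c)
print(np.log10(1 + c))
```

The same argument holds for centered bars, which run from x*(1 - c/2) to x*(1 + c/2), so the ratio of the two edges is again independent of x.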
I know this is a very old question and you may have solved it already, but I came to this post with a similar problem on the y-axis, and I managed to solve it using just ax.set_ylim(df['my data'].min()+100, df['my data'].max()+100). My y-axis holds some sensitive information that I thought would be best shown on a log scale, but with the log scale set I couldn't read the numbers properly (the same problem this post describes for the x-axis). So I gave up on the log scale and used the min and max arguments instead, which scales my graph much like a log scale would. I'm still looking for a way that doesn't need the 100 offset in set_ylim.
While this does not actually use pyplot.bar, I think this method could be helpful in achieving what the OP is trying to do. I found this to be easier than trying to calibrate the width as a function of the log-scale, though it's more steps. Create a line collection whose width is independent of the chart scale.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.collections as coll
#Generate data and sort into bins
a = np.random.logseries(0.5, 1000)
hist, bin_edges = np.histogram(a, bins=20, density=False)
x = bin_edges[:-1] # remove the top-end from bin_edges to match dimensions of hist
lines = []
for i in range(len(x)):
    pair=[(x[i],0), (x[i], hist[i])]
    lines.append(pair)
linecoll = coll.LineCollection(lines, linewidths=10, linestyles='solid')
fig, ax = plt.subplots()
ax.add_collection(linecoll)
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlim(min(x)/10,max(x)*10)
ax.set_ylim(0.1,1.1*max(hist)) #since this is an unweighted histogram, the logy doesn't make much sense.
Resulting plot - no frills
One drawback is that the "bars" will be centered, but this could be changed by offsetting the x-values by half of the linewidth value ... I think it would be
x_new = x + (linewidth/2)*10**round(np.log10(x),0).

Scale axes 3d in matplotlib

I'm facing issues scaling the axes of a 3D plot in matplotlib. I have found other questions on this, but somehow their answers do not seem to work. Here is some sample code:
import matplotlib as mpl
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
data=np.array([[0,0,0],[10,1,1],[2,2,2]])
fig=plt.figure()
ax=Axes3D(fig)
ax.set_xlim3d(0,15)
ax.set_ylim3d(0,15)
ax.set_zlim3d(0,15)
ax.scatter(data[:,0],data[:,1],data[:,2])
plt.show()
It seems it just ignores the ax.set commands...
In my experience, you have to set your axis limits after plotting the data, otherwise it will look at your data and adjust whatever axes settings you entered before to fit it all in-frame out to the next convenient increment along the axes in question. If, for instance, you set your x-axis limits to +/-400 but your data go out to about +/-1700 and matplotlib decides to label the x-axis in increments of 500, it's going to display the data relative to an x-axis that goes out to +/-2000.
So in your case, you just want to rearrange that last block of text as:
fig=plt.figure()
ax=Axes3D(fig)
ax.scatter(data[:,0],data[:,1],data[:,2])
ax.set_xlim3d(0,15)
ax.set_ylim3d(0,15)
ax.set_zlim3d(0,15)
plt.show()
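On recent matplotlib releases, constructing Axes3D(fig) directly may no longer attach the axes to the figure, so the same ordering is sketched below with fig.add_subplot(projection='3d') instead. This is an adaptation for newer versions, not part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.array([[0, 0, 0], [10, 1, 1], [2, 2, 2]])

fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # creates the 3D axes and adds it to fig

# Plot first, then fix the limits so autoscaling does not override them
ax.scatter(data[:, 0], data[:, 1], data[:, 2])
ax.set_xlim3d(0, 15)
ax.set_ylim3d(0, 15)
ax.set_zlim3d(0, 15)
```

Finish with plt.show() or fig.savefig(...) as usual.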
ColorOutOfSpace's approach is good. But if you want to automate the scaling, you have to find the minimum and maximum values in the data and scale with those.
data_min = np.amin(data) # lowest number in the array
data_max = np.amax(data) # highest number in the array
ax.set_xlim3d(data_min, data_max)
ax.set_ylim3d(data_min, data_max)
ax.set_zlim3d(data_min, data_max)
