Look at this pretty graph.
Is there a way, in matplotlib, to make parts of the red and green graph invisible (where f(x)=0)?
Not just those, but also the single line segment where the flat part connects to the sine curve.
Basically, is it possible to tell matplotlib to only plot graph on a certain interval and not draw the rest (or vice versa)?
You could try replacing your points of interest with np.nan as shown below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# here is some example data because none was provided in the question;
# it is a quadratic from x=-5:5
x = np.arange(-5, 6)
s = pd.Series(x**2, index=x)
# replace all y values less than 4 with np.nan and store in a new Series object
s_mod = s.apply(lambda y: np.nan if y < 4 else y)
# plot the modified data with the original data
fig, ax = plt.subplots()
s.plot(marker='o', markersize=16, ax=ax, label='original')
s_mod.plot(marker='s', ax=ax, label='modified')
ax.legend()
fig # displays as follows
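The same idea works without pandas. Below is a minimal sketch, assuming the curve in the question looks like a sine wave clipped at zero (the data here is made up, since the original plot is not available). Matplotlib leaves a gap wherever a y value is NaN, and it also skips the segments that connect to a NaN point.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt

# Assumed stand-in for the curve in the question: a sine wave clipped at zero
x = np.linspace(0, 4 * np.pi, 400)
y = np.clip(np.sin(x), 0, None)

# Replace the flat part (y == 0) with NaN so it is not drawn
y_visible = np.where(y > 0, y, np.nan)

fig, ax = plt.subplots()
ax.plot(x, y, color="lightgray", label="original")
ax.plot(x, y_visible, color="red", label="zeros hidden")
ax.legend()
```

Call fig.savefig(...) or plt.show() afterwards to see the result.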
I'm trying to control the y axis order on a matplotlib scatter plot but the ordering of the x and y axes in the data I have is causing the plot to be displayed incorrectly.
Here's some code to illustrate the problem and one sub-optimal attempt to make a solution.
import pandas as pd
from numpy import random
import matplotlib.pyplot as plt
# make some fake data
axes = ['a', 'b', 'c', 'd']
pairs = pd.DataFrame([(x, y) for x in axes for y in axes], columns=['x', 'y'])
pairs['value'] = random.randint(100, size=16) + 100
# remove the diagonal
pairs_nodiag = pairs[pairs['x'] != pairs['y']]
# zero the values for the diagonal
pairs_diag = pairs.copy()
pairs_diag.loc[pairs_diag['x'] == pairs_diag['y'], 'value'] = 0
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(5, 3))
scatter = ax[0].scatter(x=pairs['x'], y=pairs['y'], s=pairs['value'])
scatter = ax[1].scatter(x=pairs_nodiag['x'], y=pairs_nodiag['y'], s=pairs_nodiag['value'])
scatter = ax[2].scatter(x=pairs_diag['x'], y=pairs_diag['y'], s=pairs_diag['value'])
plt.show()
The left most is the raw data. The middle is the plot with the problem; I want the y axis to be the same as the left most plot. The right most plot is what I am after using a sub-optimal workaround. I'm sure there is a way of controlling the ordering on the axes but I'm not expert enough in Python yet to know exactly how to do this.
You need to set up your own StrCategoryConverter with your desired mapping (by default, matplotlib maps strings to numbers in the order in which they occur).
import matplotlib.category as mcat
# insert the following before scatter = ax[1].scatter(...
units = mcat.UnitData(sorted(pairs_nodiag.y.unique()))
ax[1].yaxis.set_units(units)
ax[1].yaxis.set_major_locator(mcat.StrCategoryLocator(units._mapping))
ax[1].yaxis.set_major_formatter(mcat.StrCategoryFormatter(units._mapping))
UPDATE: The following is the official way to do it without using _mapping:
import matplotlib
# insert the following before scatter = ax[1].scatter(...
scc = matplotlib.category.StrCategoryConverter()
units = scc.default_units(sorted(pairs_nodiag.y.unique()), ax[1].yaxis)
axisinfo = scc.axisinfo(units, ax[1].yaxis)
ax[1].yaxis.set_major_locator(axisinfo.majloc)
ax[1].yaxis.set_major_formatter(axisinfo.majfmt)
I'm trying to create a CDF but at the end of the graph, there is a vertical line, shown below:
I've read that this is because matplotlib uses the end of the bins to draw the vertical lines, which makes sense, so I added this to my code:
bins = sorted(X) + [np.inf]
where X is the data set I'm using and set the bin size to this when plotting:
plt.hist(X, bins = bins, cumulative = True, histtype = 'step', color = 'b')
This does remove the line at the end and produce the desired effect, however when I normalise this graph now it produces an error:
ymin = max(ymin*0.9, minimum) if not input_empty else minimum
UnboundLocalError: local variable 'ymin' referenced before assignment
Is there any way to either normalise the data with
bins = sorted(X) + [np.inf]
in my code or is there another way to remove the line on the graph?
An alternative way to plot a CDF would be as follows (in my example, X is a bunch of samples drawn from the unit normal):
import numpy as np
import matplotlib.pyplot as plt
X = np.random.randn(10000)
n = np.arange(1, len(X) + 1) / float(len(X))  # np.float was removed from NumPy; the builtin float works
Xs = np.sort(X)
fig, ax = plt.subplots()
ax.step(Xs,n)
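If you want to keep the binning of plt.hist, a similar result can be had by computing the histogram yourself and plotting the cumulative sum as a step line. This is only a sketch under assumed data (unit normal samples, 100 bins); since it draws a plain line rather than a polygon, there is nothing to close at the right edge.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.standard_normal(10000)  # assumed sample data

# Bin the data, then normalise the running total so the last value is 1
counts, edges = np.histogram(X, bins=100)
cdf = np.cumsum(counts) / counts.sum()

fig, ax = plt.subplots()
# With where="pre", the interval ending at each right bin edge shows that bin's cumulative value
ax.step(edges[1:], cdf, where="pre")
```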
I needed a solution where I would not need to alter the rest of my code (using plt.hist(...) or, with pandas, dataframe.plot.hist(...)) and that I could reuse easily many times in the same jupyter notebook.
I now use this little helper function to do so:
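def fix_hist_step_vertical_line_at_end(ax):
    # requires "import matplotlib as mpl" (see the usage examples below)
    axpolygons = [poly for poly in ax.get_children() if isinstance(poly, mpl.patches.Polygon)]
    for poly in axpolygons:
        # drop the last vertex, which closes the step outline down to the baseline
        poly.set_xy(poly.get_xy()[:-1])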
Which can be used like this (without pandas):
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
X = np.sort(np.random.randn(1000))
fig, ax = plt.subplots()
plt.hist(X, bins=100, cumulative=True, density=True, histtype='step')
fix_hist_step_vertical_line_at_end(ax)
Or like this (with pandas):
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randn(1000))
fig, ax = plt.subplots()
ax = df.plot.hist(ax=ax, bins=100, cumulative=True, density=True, histtype='step', legend=False)
fix_hist_step_vertical_line_at_end(ax)
This works well even if you have multiple cumulative density histograms on the same axes.
Warning: this may not lead to the wanted results if your axes contain other patches falling under the mpl.patches.Polygon category. That was not my case so I prefer using this little helper function in my plots.
Assuming that your intentions are purely aesthetic, add a vertical line of the same color as your plot background:
ax.axvline(x = value, color = 'white', linewidth = 2)
Where "value" stands for the right extreme of the rightmost bin.
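One way to obtain "value" is from the bin edges that hist returns. A minimal sketch, assuming random sample data and 100 bins; taking the color from ax.get_facecolor() instead of hard-coding 'white' is my own variation, not part of the suggestion above.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.standard_normal(1000)  # assumed sample data

fig, ax = plt.subplots()
n, bins, patches = ax.hist(X, bins=100, cumulative=True, density=True,
                           histtype="step", color="b")

# The last bin edge is the right extreme of the rightmost bin
value = bins[-1]

# Paint over the vertical drop with the axes background color
cover = ax.axvline(x=value, color=ax.get_facecolor(), linewidth=2)
```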
I have 3 vectors - x,y,vel each having some 8k values. I also have quite a few files containing these 3 vectors. All the files have different x,y,vel. I want to get multiple scatter plots with the following conditions:
Color coded according to the 3rd variable i.e vel.
Once the ranges have been set for the colors (for the data from the 1st file), they should remain constant for all the remaining files. I don't want the color coding to change dynamically with each new file.
Want to plot a colorbar.
I greatly appreciate all your thoughts!!
I have attached the code for a single file.
import numpy as np
import matplotlib.pyplot as plt
# Create Map
cm = plt.cm.get_cmap('RdYlBu')
x,y,vel = np.loadtxt('finaldata_temp.txt', skiprows=0, unpack=True)
vel = [cm(float(i)/(8000)) for i in xrange(8000)] # 8000 is the no. of values in each of x,y,vel vectors.
# 2D Plot
plt.scatter(x, y, s=27, c=vel, marker='o')
plt.axis('equal')
plt.savefig('testfig.png', dpi=300)
plt.show()
quit()
You will have to iterate over all your data files to get the maximum value for vel, I have added a few lines of code (that need to be adjusted to fit your case) that will do that.
Therefore, your colorbar line has been changed to use the max_vel, allowing you to get rid of that code using the fixed value of 8000.
Additionally, I took the liberty to remove the black edges around the points, because I find that they 'obfuscate' the color of the point.
Lastly, I have adjusted your plot code to use an axis object, which is required to have a colorbar.
import numpy as np
import matplotlib.pyplot as plt
# This is needed to iterate over your data files
import glob
# Loop over all your data files to get the maximum value for 'vel'.
# '<your files>' is a placeholder glob pattern; adjust it to match your files.
max_vel = 0
for fname in glob.glob('<your files>'):
    _, _, file_vel = np.loadtxt(fname, unpack=True)
    max_vel = max(max_vel, file_vel.max())
# Create Map
cm = plt.get_cmap('RdYlBu')  # plt.cm.get_cmap is no longer available in recent matplotlib
x,y,vel = np.loadtxt('finaldata_temp.txt', skiprows=0, unpack=True)
# Plot the data
fig=plt.figure()
fig.patch.set_facecolor('white')
# Here we switch to an axis object
# Additionally, you can plot several of your files in the same figure using
# the subplot option.
ax=fig.add_subplot(111)
s = ax.scatter(x, y, c=vel, cmap=cm, edgecolor='none')
# Here we assign the color bar to the axis object; it takes its colormap from s
cb = plt.colorbar(mappable=s, ax=ax)
# Here we set the range of the color bar based on the maximum observed value.
# NOTE: The limits are set on the scatter object; in current matplotlib the
# colorbar next to the plot follows them. For a standalone colorbar with its
# own range, see ColorbarBase (second code snippet).
s.set_clim(0, max_vel)
cb.set_label('Value of \'vel\'')
plt.show()
Snippet, demonstrating ColorbarBase
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
cm = plt.get_cmap('RdYlBu')  # plt.cm.get_cmap is no longer available in recent matplotlib
x = [1,5,10]
y = [2,6,9]
vel = [7,2,1]
# Plot the data
fig=plt.figure()
fig.patch.set_facecolor('white')
ax=fig.add_subplot(111)
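# Use the same Normalize for the points and the colorbar so their colors agree
norm = mpl.colors.Normalize(vmin=0, vmax=10)
s = ax.scatter(x, y, c=vel, cmap=cm, norm=norm, edgecolor='none')
ax1 = fig.add_axes([0.95, 0.1, 0.01, 0.8])
# The norm already fixes the displayed range to 0..10, so no set_clim call is needed
cb = mpl.colorbar.ColorbarBase(ax1, norm=norm, cmap=cm, orientation='vertical')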
cb.set_label('Value of \'vel\'')
plt.show()
This produces the following plot
For more examples of what you can do with the colorbar, specifically the more flexible ColorbarBase, I would suggest that you check the documentation -> http://matplotlib.org/examples/api/colorbar_only.html
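To keep the color coding identical across several files, one option is to build a single Normalize object and pass it to every scatter call. The sketch below is an assumption-laden illustration: the three random arrays stand in for your data files, and the range is taken over all of them (to pin it to the first file instead, compute vmin and vmax from that file only).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

rng = np.random.default_rng(0)
# Hypothetical stand-ins for three data files, each holding rows x, y, vel
datasets = [rng.random((3, 50)) * scale for scale in (1.0, 2.0, 3.0)]

# One shared color range, computed once over every dataset
vmin = min(vel.min() for _, _, vel in datasets)
vmax = max(vel.max() for _, _, vel in datasets)
norm = Normalize(vmin=vmin, vmax=vmax)

fig, axes = plt.subplots(1, len(datasets), figsize=(9, 3))
for ax, (x, y, vel) in zip(axes, datasets):
    # The same norm and colormap in every panel means equal vel gives equal color
    sc = ax.scatter(x, y, c=vel, cmap="RdYlBu", norm=norm, edgecolor="none")

# A single colorbar serves all panels because they share the norm
fig.colorbar(sc, ax=axes, label="Value of 'vel'")
```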
I am trying to plot multiple lines in a 3D plot using matplotlib. I have 6 datasets with x and y values. What I've tried so far was, to give each point in the data sets a z-value. So all points in data set 1 have z=1 all points of data set 2 have z=2 and so on.
Then I exported them into three files. "X.txt" containing all x-values, "Y.txt" containing all y-values, same for "Z.txt".
Here's the code so far:
#!/usr/bin/python
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
import numpy as np
import pylab
xdata = '/X.txt'
ydata = '/Y.txt'
zdata = '/Z.txt'
X = np.loadtxt(xdata)
Y = np.loadtxt(ydata)
Z = np.loadtxt(zdata)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(X,Y,Z)
plt.show()
What I get looks pretty close to what I need. But when using wireframe, the first point and the last point of each dataset are connected. How can I change the colour of the line for each data set and how can I remove the connecting lines between the datasets?
Is there a better plotting style than wireframe?
Load the data sets individually, and then plot each one individually.
I don't know what formats you have, but you want something like this
from mpl_toolkits.mplot3d.axes3d import Axes3D
import matplotlib.pyplot as plt
fig, ax = plt.subplots(subplot_kw={'projection': '3d'})
datasets = [{"x":[1,2,3], "y":[1,4,9], "z":[0,0,0], "colour": "red"} for _ in range(6)]
for dataset in datasets:
    ax.plot(dataset["x"], dataset["y"], dataset["z"], color=dataset["colour"])
plt.show()
Each time you call plot (or plot_wireframe, but I don't know why you would need that) on an axes object, it will add the data as a new series. If you leave out the color argument matplotlib will choose the colours for you, but it's not too smart: after you add too many series it will loop around and start using the same colours again.
n.b. i haven't tested this - can't remember if color is the correct argument. Pretty sure it is though.
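If your data is already in three flat arrays as described in the question, you could split it by z-value and plot each group as its own line. A rough sketch, where the generated X, Y and Z are hypothetical stand-ins for the contents of X.txt, Y.txt and Z.txt:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3D projection on older matplotlib)

# Hypothetical flat arrays: six data sets of 20 points each, tagged z = 1..6
x_one = np.linspace(0, 1, 20)
X = np.tile(x_one, 6)
Z = np.repeat(np.arange(1, 7), 20).astype(float)
Y = np.sin(2 * np.pi * X) * Z

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")

lines = []
for z_value in np.unique(Z):
    sel = Z == z_value  # boolean mask picking out one data set
    # A separate plot call per data set: no line joins different sets,
    # and each call takes the next color from the default cycle
    (line,) = ax.plot(X[sel], Y[sel], Z[sel], label=f"data set {int(z_value)}")
    lines.append(line)
ax.legend()
```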
I'm trying to do a heat map over a shape file in python. I need to make quite a few of these so don't want to read in the .shp every time.
Instead, I thought I could create a lineCollection instance of the map boundaries and overlay the two images. Problem is - I can't seem to get the two to line up correctly.
Here is the code, where linecol is the lineCollection object.
fig = plt.figure()
ax = fig.add_subplot(111)
ax.contourf(xi,yi,zi)
ax.add_collection(linecol, autolim = False)
plt.show()
Is there an easy way to fix the limits of linecol to match those of the other plot? I've had a play with set_xlim and transforms.Bbox, but can't seem to manage it.
Thank you very much for your help!
Transforms are tricky because of the various coordinate systems involved. See http://matplotlib.sourceforge.net/users/transforms_tutorial.html.
I managed to scale a LineCollection to the appropriate size like this. The key was to realize that I needed to add + ax.transData to the new transform I set on the LineCollection. (When you don't set any transform on an artist object, ax.transData is the default. It converts data coordinates into display coordinates.)
from matplotlib import cm
import matplotlib.pyplot as plt
import matplotlib.collections as mc
import matplotlib.transforms as tx
import numpy as np
fig = plt.figure()
# Heat map spans 1 x 1.
ax = fig.add_subplot(111)
xs = ys = np.arange(0, 1.01, 0.01)
zs = np.random.random((101,101))
ax.contourf(xs, ys, zs, cmap=cm.autumn)
lines = mc.LineCollection([[(5,1), (9,5), (5,9), (1,5), (5,1)]])
# Shape spans 10 x 10. Resize it to 1 x 1 before applying the transform from
# data coords to display coords.
trans = tx.Affine2D().scale(0.1) + ax.transData
lines.set_transform(trans)
ax.add_collection(lines)
plt.show()
(Output here: http://i.stack.imgur.com/hDNN8.png Not enough reputation to post inline.)
It should be easy to modify this if you need the shape translated or scaled unequally on x and y.
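For instance, Affine2D.scale takes separate x and y factors and can be chained with translate. A small sketch with made-up numbers (the factors and the shift are purely illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import matplotlib.collections as mc
import matplotlib.transforms as tx

fig, ax = plt.subplots()
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)

lines = mc.LineCollection([[(5, 1), (9, 5), (5, 9), (1, 5), (5, 1)]])

# Illustrative numbers: scale x by 0.05 and y by 0.1, then shift right by 0.25.
# The operations apply in the order they are chained.
shape_to_data = tx.Affine2D().scale(0.05, 0.1).translate(0.25, 0.0)

# As before, append ax.transData to go from data coords to display coords
lines.set_transform(shape_to_data + ax.transData)
ax.add_collection(lines)
```

With these numbers the shape point (5, 1) lands at (0.5, 0.1) in data coordinates.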