Remove the x-axis spikes with empty labels in matplotlib - python

I have a plot in which I have to divide my data points into several groups, so I made customized ticks for this plot.
For instance, I have to group the data points in blocks of 12; this is what I did:
my_xticks = []
for x_ele in range(len(all_points)):
    if x_ele % 12 == 0:
        my_xticks.append(x_ele//12 + 1)
    else:
        my_xticks.append('')
ax.set_xticks(range(len(my_xticks)))
ax.set_xticklabels(my_xticks)
The x-axis of the plot looks like this:
However, I want to remove the tick marks ("spikes") that have empty labels, circled in red,
so that the final x-axis looks like this:
Any idea? Thanks!

You didn't provide any data, so I solved this using some sample data I created. The idea is to use the range function to create ticks only at a constant spacing, so that no unlabeled ticks are drawn in between.
Here is my code:
from matplotlib import pyplot as plt
import numpy as np
# create sample data
x = np.linspace(1, 60, 100)
y = x*x
# define the space of ticks
space = 12
# get minimum x value
min_val = int(min(x))
# get maximum x value
max_val = int(max(x))
# define our ticks
xticks = list(range(min_val, max_val, space))
# define labels for each tick
xticklabels = list(range(1, len(xticks) + 1, 1))
# create plot
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)
plt.show()
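Applied directly to the question's loop, the same idea means passing only every 12th index to set_xticks, so that no tick, and therefore no empty label, is created in between. A minimal sketch, assuming all_points can be any sequence (random values stand in for it here):

```python
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for the question's data; any sequence of points works here.
all_points = np.random.default_rng(0).normal(size=60)
group_size = 12

fig, ax = plt.subplots()
ax.plot(all_points)

# Place a tick only at the first index of each group of 12 ...
tick_positions = list(range(0, len(all_points), group_size))
# ... and label it with the 1-based group number, as in the question.
tick_labels = [pos // group_size + 1 for pos in tick_positions]

ax.set_xticks(tick_positions)
ax.set_xticklabels(tick_labels)
# plt.show() displays the figure
```

If the x range can change, for example through zooming, matplotlib.ticker.MultipleLocator(12) together with a ticker.FuncFormatter that maps a tick value x to int(x) // 12 + 1 may be a more robust variant, since the locator recomputes the positions for every view.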


how to create a column of dots in python [duplicate]

I'd like to create what my statistics book calls a "dot plot" where the number of dots in the plot equals the number of observations. Here's an example from mathisfun.com:
In the example, there are six dots above the 0 value on the X-axis representing the six observations of the value zero.
It seems that a "dot plot" can have several variations. In looking up how to create this with Matplotlib, I only came across what I know of as a scatter plot with a data point representing the relationship between the X and Y value.
Is the type of plot I'm trying to create possible with Matplotlib?
Suppose you have some data that would produce a histogram like the following:
import numpy as np; np.random.seed(13)
import matplotlib.pyplot as plt
data = np.random.randint(0,12,size=72)
plt.hist(data, bins=np.arange(13)-0.5, ec="k")
plt.show()
You may create your dot plot by calculating the histogram and plotting a scatter plot of all possible points, the color of the points being white if they exceed the number given by the histogram.
import numpy as np; np.random.seed(13)
import matplotlib.pyplot as plt
data = np.random.randint(0,12,size=72)
bins = np.arange(13)-0.5
hist, edges = np.histogram(data, bins=bins)
y = np.arange(1,hist.max()+1)
x = np.arange(12)
X,Y = np.meshgrid(x,y)
plt.scatter(X,Y, c=Y<=hist, cmap="Greys")
plt.show()
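Alternatively, you can set the unwanted points to NaN so that they are not drawn at all:
Y = Y.astype(float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent
Y[Y>hist] = np.nan
plt.scatter(X,Y)
plt.show()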
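Both variants rely on NumPy broadcasting: Y has one row per possible dot height, and comparing it with the 1-D hist array checks every column against that bin's count. A small sketch with a made-up three-bin histogram shows the resulting mask:

```python
import numpy as np

# Hypothetical counts for three bins: 2 dots, 0 dots, 1 dot
hist = np.array([2, 0, 1])

x = np.arange(hist.size)          # bin positions 0, 1, 2
y = np.arange(1, hist.max() + 1)  # possible dot heights 1, 2
X, Y = np.meshgrid(x, y)          # both have shape (2, 3)

# Broadcasting compares each row of Y with hist element-wise:
# True where a dot should be drawn, False where the grid point stays blank.
mask = Y <= hist
print(mask)
# [[ True False  True]
#  [ True False False]]
```

Each column contains exactly as many True entries as its bin count, which is what turns the full grid of candidate points into a dot plot.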
This answer is built on the code posted by eyllanesc in his comment to the question as I find it elegant enough to merit an illustrative example. I provide two versions: a simple one where formatting parameters have been set manually and a second version where some of the formatting parameters are set automatically based on the data.
Simple version with manual formatting
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.2
# Create random data
rng = np.random.default_rng(123) # random number generator
data = rng.integers(0, 13, size=40)
values, counts = np.unique(data, return_counts=True)
# Draw dot plot with appropriate figure size, marker size and y-axis limits
fig, ax = plt.subplots(figsize=(6, 2.25))
for value, count in zip(values, counts):
    ax.plot([value]*count, list(range(count)), 'co', ms=10, linestyle='')
for spine in ['top', 'right', 'left']:
    ax.spines[spine].set_visible(False)
ax.yaxis.set_visible(False)
ax.set_ylim(-1, max(counts))
ax.set_xticks(range(min(values), max(values)+1))
ax.tick_params(axis='x', length=0, pad=8, labelsize=12)
plt.show()
Advanced version with automated formatting
If you plan on using this plot quite often, it can be useful to add some automated formatting parameters to get appropriate figure dimensions and marker size. In the following example, the parameters are defined in a way that works best with the kind of data for which this type of plot is typically useful (integer data with a range of up to a few dozen units and no more than a few hundred data points).
# Create random data
rng = np.random.default_rng(1) # random number generator
data = rng.integers(0, 21, size=100)
values, counts = np.unique(data, return_counts=True)
# Set formatting parameters based on data
data_range = max(values)-min(values)
width = data_range/2 if data_range<30 else 15
height = max(counts)/3 if data_range<50 else max(counts)/4
marker_size = 10 if data_range<50 else np.ceil(30/(data_range//10))
# Create dot plot with appropriate format
fig, ax = plt.subplots(figsize=(width, height))
for value, count in zip(values, counts):
    ax.plot([value]*count, list(range(count)), marker='o', color='tab:blue',
            ms=marker_size, linestyle='')
for spine in ['top', 'right', 'left']:
    ax.spines[spine].set_visible(False)
ax.yaxis.set_visible(False)
ax.set_ylim(-1, max(counts))
ax.set_xticks(range(min(values), max(values)+1))
ax.tick_params(axis='x', length=0, pad=10)
plt.show()
Pass your dataset to this function:
def dot_diagram(dataset):
    values, counts = np.unique(dataset, return_counts=True)
    data_range = max(values)-min(values)
    width = data_range/2 if data_range<30 else 15
    height = max(counts)/3 if data_range<50 else max(counts)/4
    marker_size = 10 if data_range<50 else np.ceil(30/(data_range//10))
    fig, ax = plt.subplots(figsize=(width, height))
    for value, count in zip(values, counts):
        ax.plot([value]*count, list(range(count)), marker='o', color='tab:blue',
                ms=marker_size, linestyle='')
    for spine in ['top', 'right', 'left']:
        ax.spines[spine].set_visible(False)
    ax.yaxis.set_visible(False)
    ax.set_ylim(-1, max(counts))
    ax.set_xticks(range(min(values), max(values)+1))
    ax.tick_params(axis='x', length=0, pad=10)
Let's say this is my data:
data = [5,8,3,7,1,5,3,2,3,3,8,5]
To plot a "dot plot", I need the data values (x-axis) and, for each point, how many times its value has occurred so far (y-axis):
import matplotlib.pyplot as plt

pos = []
keys = {}  # running count of how often each value has been seen so far
# this loop gives every data point its stack height (1 for the first occurrence, 2 for the second, ...)
for num in data:
    if num not in keys:
        keys[num] = 1
        pos.append(1)
    else:
        keys[num] += 1
        pos.append(keys[num])
print(pos)
# [1, 1, 1, 1, 1, 2, 2, 1, 3, 4, 2, 3]
plt.scatter(data, pos)
plt.show()
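For larger arrays, the same running counts can be computed without a Python loop. The following is a sketch using a stable argsort, an alternative technique rather than part of the original answer: sort the values while keeping equal values in their original order, number each element within its run of equal values, then scatter those numbers back to the original positions.

```python
import numpy as np

data = np.array([5, 8, 3, 7, 1, 5, 3, 2, 3, 3, 8, 5])

# A stable sort keeps equal values in their original order of appearance.
order = np.argsort(data, kind="stable")
sorted_data = data[order]

# Index (in sorted_data) at which each run of equal values starts, and run lengths.
run_starts = np.flatnonzero(np.r_[True, sorted_data[1:] != sorted_data[:-1]])
run_lengths = np.diff(np.r_[run_starts, data.size])

# 1-based position of every sorted element within its run.
rank_in_sorted = np.arange(data.size) - np.repeat(run_starts, run_lengths) + 1

# Scatter the ranks back to the original order of data.
pos = np.empty_like(rank_in_sorted)
pos[order] = rank_in_sorted
print(pos)  # [1 1 1 1 1 2 2 1 3 4 2 3]
```

pos can then be passed to plt.scatter(data, pos) exactly like the list built by the loop.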
I recently needed something like this as well and wrote the following for my case; I hope it is helpful.
First, generate the frequency table; then generate one point per observation from it and draw the points with a scatter plot. That's all there is to it.
For example, in your case 6 people have a value of 0 minutes. This frequency can be converted into the points
[(0,1),(0,2),(0,3),(0,4),(0,5),(0,6)]
Then these points simply have to be plotted with pyplot.scatter (the code below starts the stack height at 0 instead of 1).
import numpy as np
import matplotlib.pyplot as plt
def generate_points_for_dotplot(arr):
    freq = np.unique(arr,return_counts=True)
    ls = []
    for (value, count) in zip(freq[0],freq[1]):
        ls += [(value,num) for num in range(count)]
    x = [x for (x,y) in ls]
    y = [y for (x,y) in ls]
    return np.array([x,y])
This function returns an array of two arrays, one of x coordinates and one of y coordinates, because that is the form in which pyplot expects the points. Now that we have a function that generates the required points, let's plot them:
arr = np.random.randint(1,21,size=100)
x,y = generate_points_for_dotplot(arr)
# Plotting
fig,ax = plt.subplots(figsize = (max(x)/3,3)) # feel free to use Patrick's answer to make it more dynamic
ax.scatter(x,y,s=100,facecolors='none',edgecolors='black')
ax.set_xticks(np.unique(x))
ax.yaxis.set_visible(False)
# removing the spines
for spine in ['top', 'right', 'left']:
    ax.spines[spine].set_visible(False)
plt.show()
If the x tick labels become overwhelming, you can rotate them; with many more distinct values, however, even that becomes clumsy.
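The list-of-tuples step in generate_points_for_dotplot can also be written with np.repeat, which may be faster for large inputs because it avoids building Python tuples. A sketch with a hypothetical helper name, dotplot_points; like the original, it starts every stack at height 0:

```python
import numpy as np


def dotplot_points(arr):
    """Hypothetical helper: one (x, y) point per observation, stacks starting at 0."""
    values, counts = np.unique(arr, return_counts=True)
    # Each distinct value, repeated once per occurrence.
    x = np.repeat(values, counts)
    # Offset of each group's first element, repeated for every member of the group,
    # subtracted from a global counter to give 0, 1, ..., count-1 within each group.
    starts = np.cumsum(counts) - counts
    y = np.arange(counts.sum()) - np.repeat(starts, counts)
    return x, y


x, y = dotplot_points([3, 1, 3, 3, 2, 1])
print(x)  # [1 1 2 3 3 3]
print(y)  # [0 1 0 0 1 2]
```

The returned x and y can be passed to ax.scatter in the same way as the output of generate_points_for_dotplot.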

Show custom tick value in plot

Let's say I have made a plot, and in that plot there is a specific point from which I draw a vertical line down to the x-axis. This point has an x-value of, for example, 33.55. However, my tick spacing is something like 10 or 20, from 0 to 100.
So basically: is there a way to add this single custom value to the tick axis, so that it shows up together with all the other values that were there before?
Use np.append to add to the array of ticks:
import numpy as np
from matplotlib import pyplot as plt
x = np.random.rand(100) * 100
y = np.random.rand(100) * 100
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(x, y)
ax.set_xticks(np.append(ax.get_xticks(), 33.55))
Note that if your plot is not big enough, the tick labels may overlap.
If you want the new tick to "clear its orbit", so to speak:
special_value = 33.55
black_hole_radius = 10
new_ticks = [value for value in ax.get_xticks() if abs(value - special_value) > black_hole_radius] + [special_value]
ax.set_xticks(new_ticks)
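Another option, not taken from the answers here, is to put the special value on the minor ticks instead of merging it into the major ticks. The automatic major locator then keeps working (for example after zooming), and the extra tick can be styled separately. A sketch, with random data and a red dashed line standing in for the question's plot; the labels can still overlap if the special value lies close to a major tick:

```python
import numpy as np
from matplotlib import pyplot as plt

special_value = 33.55  # x position of the vertical line from the question

rng = np.random.default_rng(0)
x = rng.random(100) * 100
y = rng.random(100) * 100

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(x, y)
ax.axvline(special_value, color="red", linestyle="--")

# Put the special value on the minor tick locator, so the automatic
# major ticks (for example 0, 20, 40, ...) are left as they were.
ax.set_xticks([special_value], minor=True)
ax.set_xticklabels([f"{special_value:g}"], minor=True)
# Make the minor tick stand out from the major ones.
ax.tick_params(axis="x", which="minor", length=8, labelcolor="red")
# plt.show() displays the figure
```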

How to change the x-axis unit in matplotlib? [duplicate]

I am creating a plot in Python. Is there a way to rescale the axis by a factor? The yscale and xscale commands only seem to let me switch the log scale on or off.
Edit:
For example, if I have a plot where the x values go from 1 nm to 50 nm, the x scale will range from 1x10^(-9) to 50x10^(-9), and I want it to read from 1 to 50 instead. That is, I want the x tick values shown on the plot to be divided by 10^(-9).
As you have noticed, xscale and yscale do not support a simple linear rescaling (unfortunately). As an alternative to Hooked's answer, instead of modifying the data, you can trick the labels like so:
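import matplotlib.ticker as ticker
scale = 1e-9  # one displayed unit (1 nm) expressed in data units (meters)
ticks = ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x/scale))
ax.xaxis.set_major_formatter(ticks)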
A complete example showing both x and y scaling:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
# Generate data
x = np.linspace(0, 1e-9)
y = 1e3*np.sin(2*np.pi*x/1e-9) # one period, 1k amplitude
# setup figures
fig = plt.figure()
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)
# plot two identical plots
ax1.plot(x, y)
ax2.plot(x, y)
# Change only ax2
scale_x = 1e-9
scale_y = 1e3
ticks_x = ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x/scale_x))
ax2.xaxis.set_major_formatter(ticks_x)
ticks_y = ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x/scale_y))
ax2.yaxis.set_major_formatter(ticks_y)
ax1.set_xlabel("meters")
ax1.set_ylabel('volt')
ax2.set_xlabel("nanometers")
ax2.set_ylabel('kilovolt')
plt.show()
Note that, if you have text.usetex: true as I have, you may want to enclose the labels in $, like so: '${0:g}$'.
Instead of changing the ticks, why not change the units? Make a separate array X2 of x values in nm by rescaling X. This way, when you plot the data, it is already in the correct units. Just make sure you add an xlabel to indicate the units (which should always be done anyway).
import numpy as np
import matplotlib.pyplot as plt
# Generate random test data in your range
N = 200
epsilon = 1e-9
X = epsilon*(50*np.random.random(N) + 1)
Y = np.random.random(N)
# X2 now has the "units" of nanometers by scaling X
X2 = X / epsilon
plt.subplot(121)
plt.scatter(X, Y)
plt.xlim(epsilon, 50*epsilon)
plt.xlabel("meters")
plt.subplot(122)
plt.scatter(X2, Y)
plt.xlim(1, 50)
plt.xlabel("nanometers")
plt.show()
To set the range of the x-axis, you can use ax.set_xlim(left, right) (see the Axes.set_xlim documentation).
Update:
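It looks like you want an identical plot and only want to change the tick values. You can do that by reading the current tick locations and relabeling them with whatever you want. For your case it would look like this (your_plot being the Axes you plotted on):
ticks = your_plot.get_xticks()
your_plot.set_xticks(ticks)  # fix the locations so the new labels stay in sync with them
your_plot.set_xticklabels(['{0:g}'.format(t * 1e9) for t in ticks])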
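If writing a custom formatter is more than you need, matplotlib.ticker.EngFormatter (not mentioned in the answers above) is a built-in option: it keeps the data in meters and labels each tick with an SI prefix. Unlike the FuncFormatter approach, the unit then appears in every tick label rather than only in the axis label. A sketch with made-up data covering 1 nm to 50 nm:

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# Made-up data: x values in meters, from 1 nm to 50 nm
x = np.linspace(1e-9, 50e-9, 100)
y = np.sin(x / 1e-8)

fig, ax = plt.subplots()
ax.plot(x, y)

# EngFormatter chooses an SI prefix per tick, so 2e-08 m is labeled with "n" (nano),
# e.g. as "20 nm"
formatter = ticker.EngFormatter(unit="m")
ax.xaxis.set_major_formatter(formatter)
ax.set_xlabel("position")
# plt.show() displays the figure
```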

Combining plt.plot(x,y) with plt.boxplot()

I'm trying to combine a normal matplotlib.pyplot plt.plot(x, y), with y as a function of x, with a boxplot. However, I only want boxplots at certain (variable) x locations, and this does not seem to work in matplotlib. Is it possible?
Are you wanting something like this? The positions kwarg to boxplot allows you to place the boxplots at arbitrary positions.
import matplotlib.pyplot as plt
import numpy as np
# Generate some data...
data = np.random.random((100, 5))
y = data.mean(axis=0)
x = np.random.random(y.size) * 10
x -= x.min()
x.sort()
# Plot a line between the means of each dataset
plt.plot(x, y, 'b-')
# Save the default tick positions, so we can reset them...
locs, labels = plt.xticks()
plt.boxplot(data, positions=x, notch=True)
# Reset the xtick locations.
plt.xticks(locs)
plt.show()
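In recent Matplotlib releases, boxplot installs its own fixed tick formatter when it manages the ticks, so resetting only the tick locations afterwards may leave box numbers as labels instead of the original values. A possibly simpler sketch, assuming Matplotlib 3.1 or newer, asks boxplot not to touch the ticks at all via manage_ticks=False:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100, 5))  # 5 datasets, one per column
y = data.mean(axis=0)
x = np.sort(rng.random(y.size) * 10)
x -= x.min()

fig, ax = plt.subplots()
ax.plot(x, y, "b-")  # line through the means of each dataset

# manage_ticks=False draws the boxes at the given positions without
# replacing the axis' automatic tick locator and formatter.
bp = ax.boxplot(data, positions=x, notch=True, widths=0.3, manage_ticks=False)
# plt.show() displays the figure
```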
This is what has worked for me:
1. plot the box plot;
2. get the box plot's x-axis tick locations;
3. use those tick locations as the x values for the line plot.
# Plot Box-plot
ax.boxplot(data, positions=x, notch=True)
# Get box-plot x-tick locations
locs=ax.get_xticks()
# Plot a line between the means of each dataset
# x-values = box-plot x-tick locations
# y-values = means
ax.plot(locs, y, 'b-')
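The snippet above reuses ax, data, x and y from the previous answer. A self-contained sketch of the same three steps, with random data and hypothetical box positions standing in for the real ones:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100, 5))               # 5 datasets, one per column
y = data.mean(axis=0)                     # mean of each dataset
x = np.array([0.0, 1.5, 4.0, 6.5, 9.0])  # hypothetical box positions

fig, ax = plt.subplots()
# 1. Plot the box plot at the chosen x positions
ax.boxplot(data, positions=x, notch=True)
# 2. boxplot's default tick handling places one tick at each box position
locs = ax.get_xticks()
# 3. Use those locations as x values for the line through the means
ax.plot(locs, y, "b-")
# plt.show() displays the figure
```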
