X-ticks values consistent with bars [duplicate] - python

This question already has answers here:
Aligning rotated xticklabels with their respective xticks
(5 answers)
Closed 1 year ago.
Could someone please help me align my x-ticks with the bars? The bars do not line up with the x-tick time values, as you can see in the image. I have printed the data values of g01 and g02 below, along with my code. I tried the solution from "Python MatplotLib plot x-axis with first x-axis value labeled as 1 (instead of 0)", plt.xticks(np.arange(len(g01)), np.arange(1, len(g01)+1)). With it the bars line up with the x-ticks, but the labels change to the numbers 1 to 28. I want the time period labels shown in my image.
g01 = ['2021-02-01 05:00:31', '2021-02-02 00:01:04', '2021-02-03 00:05:09', '2021-02-04 00:05:15', '2021-02-05 00:03:14', '2021-02-06 00:00:25', '2021-02-07 00:04:09', '2021-02-08 00:04:35', '2021-02-09 00:00:00', '2021-02-10 00:02:00', '2021-02-11 00:01:28', '2021-02-12 00:06:31', '2021-02-13 00:00:30', '2021-02-14 00:03:30', '2021-02-15 00:05:20', '2021-02-16 00:00:13', '2021-02-17 00:00:21', '2021-02-18 00:08:02', '2021-02-19 00:00:31', '2021-02-20 00:00:04', '2021-02-21 00:05:05', '2021-02-22 00:02:18', '2021-02-23 00:00:10', '2021-02-24 00:00:38', '2021-02-25 00:00:47', '2021-02-26 00:00:17', '2021-02-27 00:00:28', '2021-02-28 00:03:00']
g02 = [164, 158, 180, 200, 177, 112, 97, 237, 95, 178, 163, 78, 67, 65, 134, 93, 220, 74, 131, 172, 77, 102, 208, 109, 113, 208, 110, 101]
import matplotlib.pyplot as plt

# plt.subplots already creates the figure, so no separate plt.figure() call is needed
fig, ax1 = plt.subplots(1, 1)
plt.yscale("log")
barlist1 = ax1.bar(g01, g02)
# color the first 21 bars pink
for i in range(21):
    barlist1[i].set_color('pink')
degrees = 70
plt.xticks(rotation=degrees)
plt.xlabel('period', fontsize=14, fontweight="bold")
plt.ylabel('rating values', fontsize=10, fontweight="bold")

While the linked duplicate does improve the alignment with ha='right', the labels will still be slightly off.
First note that the ticks/labels are correctly mapped, which you can see by using rotation=90 (left subplot):
plt.xticks(rotation=90)
If you use rotation=70 with ha='right', notice that the labels are still slightly shifted. This is because matplotlib uses the text's bounding box for alignment, not the text itself (center subplot):
plt.xticks(rotation=70, ha='right')
To tweak the labels more precisely, add a ScaledTranslation transform (right subplot):
from matplotlib.transforms import ScaledTranslation
offset = ScaledTranslation(xt=0.075, yt=0, scale_trans=fig.dpi_scale_trans)
for label in ax1.xaxis.get_majorticklabels():
    label.set_transform(label.get_transform() + offset)
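For reference, here is a minimal self-contained sketch that combines the rotation, the right alignment, and the offset in one place. The seven-day data set and the Agg backend are stand-ins of mine so that it runs without the question's data or a display; the 0.075 inch offset is the value from the snippet above and will probably need tuning for other font sizes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
from matplotlib.transforms import ScaledTranslation

# Stand-in data (hypothetical): seven date strings and their ratings.
dates = [f"2021-02-{day:02d}" for day in range(1, 8)]
ratings = [164, 158, 180, 200, 177, 112, 97]

fig, ax1 = plt.subplots()
ax1.bar(dates, ratings)
ax1.set_yscale("log")

# xt is measured in inches because fig.dpi_scale_trans maps inches to pixels.
offset = ScaledTranslation(xt=0.075, yt=0, scale_trans=fig.dpi_scale_trans)

labels = ax1.xaxis.get_majorticklabels()
for label in labels:
    label.set_rotation(70)   # same angle as in the question
    label.set_ha("right")    # alignment suggested by the linked duplicate
    label.set_transform(label.get_transform() + offset)  # small nudge to the right

fig.canvas.draw()  # render once; use fig.savefig(...) to write an image instead
```

A negative xt shifts the labels to the left instead.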

Related

How to set up a detailed heat map [duplicate]

This question already has answers here:
Change color according to conditions for seaborn heatmaps
(1 answer)
Changing annotation text color in Seaborn heat map
(1 answer)
How to visualize a list of strings on a colorbar in matplotlib
(1 answer)
Masking annotations in seaborn heatmap
(1 answer)
Change certain squares in a seaborn heat map
(2 answers)
Closed 8 months ago.
I'm currently working on a heat map produced by the following data.
df = pd.DataFrame(
data = {
'Set_A' : [91, 91, 91, 90, 91, 91, 91],
'Set_B' : [91, 92, 91, 89, 91, 91, 91],
'Set_C' : [89, 90, 89, 88, 90, 89, 89],
'model' : ['SVM', 'RF', 'LR', 'KNN', 'NB', 'MLP', 'LGB'],
}
)
df = df.set_index('model')
sns.heatmap(df, cmap='Reds', annot=True, vmin=85, vmax=95, annot_kws={'color':'black'}, linewidths=.5);
I have two questions.
How can I change the color of the text in the heat map to white for 91 and above and black for 90 and below?
Currently, the colorbar is 86 to 94. I would like to change this to show 85 to 95.

Plot A Lineplot with Y-Axis as Percentage (Using PercentFormatter)

I am using the following nested dictionary to make a lineplot:
df = {'A':
{'weight': [200, 190, 188, 180, 170],
'days_since_gym': [0, 91, 174, 205, 279],
'days_since_fasting': 40},
'B':
{'weight': [181, 175, 172, 165, 150],
'days_since_gym': [43, 171, 241, 273, 300],
'days_since_fasting': 100}}
While making the lineplot, I want the Y-Axis ticks as the percentage value, for which I'm using PercentFormatter:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import seaborn as sns

# set the plot size
fig, ax = plt.subplots(2, figsize=(10, 6))
for i, x in enumerate(df.keys()):
    sns.lineplot(
        x=df[x]['days_since_gym'],
        y=df[x]['weight'],
        marker="o",
        ax=ax[i],
    )
    ax[i].axvline(df[x]['days_since_fasting'], color='k', linestyle='--', label='Fasting Starts')
    ax[i].set_xlim(left=0, right=365)
    # Percentage y-axis
    ax[i].yaxis.set_major_formatter(mtick.PercentFormatter())
plt.xlabel('Days Since Joined Gym')
plt.ylabel('Relative Weight')
plt.legend(bbox_to_anchor=(1.04, 1), loc="upper left")
plt.show()
However, I don't want the default percentage values (as the figure shows). I want the first value to be the starting percentage and each subsequent value to be the percentage relative to it. For example, the first plot starts at 200%, which I want shown as 0%, and ends at 170%, which I want shown as -something%.
Any suggestions would be appreciated. Thanks!
One way with minor changes to your code is to make the values in y relative to the first value. That is, keep everything as is and replace:
y=df[x]['weight'],
with:
y=[a-df[x]['weight'][0] for a in df[x]['weight']],
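Note that the subtraction above gives the absolute change (170 becomes -30), which PercentFormatter then simply labels as -30%. If a true percentage of the starting value is wanted instead, a small helper along these lines could be used. The function name relative_percent is mine, not part of any library, and the sketch assumes the first value is non-zero.

```python
def relative_percent(values):
    """Percentage change of each value relative to the first one.

    Hypothetical helper for illustration; assumes values[0] is non-zero.
    """
    start = values[0]
    return [(v - start) * 100 / start for v in values]


weights = [200, 190, 188, 180, 170]  # person 'A' from the question

absolute_change = [w - weights[0] for w in weights]  # what the answer above plots
percent_change = relative_percent(weights)           # change as a share of the start

print(absolute_change)  # [0, -10, -12, -20, -30]
print(percent_change)   # [0.0, -5.0, -6.0, -10.0, -15.0]
```

Passing y=relative_percent(df[x]['weight']) to sns.lineplot would then make the first plot run from 0% down to -15%.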

Matplotlib blank space with no color when use fill_between with where option

Update:
I sliced days into 100 points and interpolated the corresponding values of min_temp and max_temp. The result is better, but some areas still have no color. How can I fix this?
days_vals=numpy.linspace(1,10,100)
min_interp=numpy.interp(days_vals,days,min_temp)
max_interp=numpy.interp(days_vals,days,max_temp)
plt.xticks(days)
plt.plot(days_vals,min_interp,c='b',marker='o')
plt.plot(days_vals,max_interp,c='g',marker='o')
plt.fill_between(days_vals,min_interp,max_interp,where=[i>35 for i in min_interp],
facecolor='lightgreen',alpha=0.7,interpolate=False)
plt.fill_between(days_vals,min_interp,max_interp,where=[i<=35 for i in min_interp],
facecolor='lightpink',alpha=0.7,interpolate=False)
==========================================================================
I am using fill_between with the where option to fill the color: green where min_temp > 35 and pink where min_temp <= 35. The result is not what I expected:
there are many blank areas with no color.
I found a question similar to my issue (link).
Its solution is to add additional data points to the series that lie on the axis, but that does not fix my issue.
How can I modify my code so that the color is continuous, with no blank space?
Here's the code:
from matplotlib import pyplot as plt
days=range(1,11)
max_temp=[37, 35, 42, 36, 39, 56, 50, 45, 41, 39]
min_temp=[32, 30, 37, 20, 34, 40, 37, 38, 32, 30]
fig=plt.figure(figsize=(10,8))
font={'weight':'normal',
'color':'cyan',
'fontsize':24,
}
plt.title('Weather 2014',fontdict=font)
plt.xlabel('Month',fontdict=font)
plt.ylabel('Temperature',fontdict=font)
plt.title('Weather 2014',fontdict=font)
plt.xlabel('Month',fontdict=font)
plt.ylabel('Temperature',fontdict=font)
plt.xticks(days)
plt.plot(days,max_temp,marker='o',mfc='red',mec='None',markersize=3,label='Max Temp')
plt.plot(days,min_temp,marker='o',mfc='g',mec='None',markersize=3,label='Min Temp')
'''add additional data points'''
eta=1e-6
plt.fill_between(days,min_temp,max_temp,where=[i+eta>35 for i in min_temp],
facecolor='lightgreen',alpha=0.7)
plt.fill_between(days,min_temp,max_temp,where=[i-eta<=35 for i in min_temp],
facecolor='lightpink',alpha=0.7)
plt.legend(loc='upper left',bbox_to_anchor=(1,1))
fig.autofmt_xdate()
plt.grid(True)
plt.show()
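A likely reason for the blank areas: as documented for fill_between, a segment between two neighbouring points is filled only when where is True at both of its end points. Any segment where the condition flips from True to False (or back) is therefore filled by neither the green call nor the pink call. The following sketch, which uses plain Python and no plotting, lists those segments for the data above; filled_segments is a helper of mine, not a matplotlib function.

```python
min_temp = [32, 30, 37, 20, 34, 40, 37, 38, 32, 30]
days = list(range(1, 11))

green_mask = [t > 35 for t in min_temp]   # condition used for the green fill
pink_mask = [t <= 35 for t in min_temp]   # condition used for the pink fill


def filled_segments(mask):
    """Indices i for which the segment from point i to point i+1 is filled.

    Mirrors the documented rule: both end points must satisfy the mask.
    """
    return [i for i in range(len(mask) - 1) if mask[i] and mask[i + 1]]


green = filled_segments(green_mask)
pink = filled_segments(pink_mask)
unfilled = [i for i in range(len(min_temp) - 1)
            if i not in green and i not in pink]

for i in unfilled:
    print(f"no fill between day {days[i]} and day {days[i + 1]}")
```

This also suggests why the denser interpolation in the update helps without removing the gaps entirely: each gap shrinks to the one short segment in which the condition changes.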

Matplotlib : single line chart with different markers

I have a list of markers for my time series, each depicting a trade. The first element of each inner list is the index where I want a marker on the line chart. I want a different marker for buy and for sell.
[[109, 'sell'],
[122, 'buy'],
[122, 'sell'],
[127, 'buy'],
[131, 'sell'],
[142, 'buy'],
[142, 'sell'],
[150, 'buy']]
code:
fig = plt.figure(figsize=(20,10))
ax = fig.add_subplot(1,1,1)
ax.set_ylim( min(timeSeriesList_1)-0.5, max(timeSeriesList_1)+0.5)
start, end = 0, len(timeSeriesList_index)
stepsize = 10
ax.xaxis.set_ticks(np.arange(start, end, stepsize))
ax.set_xticklabels(timeSeriesList_index2, rotation=50)
## change required here:
ax.plot(timeSeriesList_1, '-gD', markevery= [ x[0] for x in markers_on_list1])
This is how my chart looks:
Please tell me how I can have different markers for buy and sell.
Create two new arrays, one for buy and one for sell, and plot them individually with different markers. To create the two arrays you can use list comprehensions:
buy = [x[0] for x in your_array if x[1]=='buy']
sell = [x[0] for x in your_array if x[1]=='sell']
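A minimal sketch of the plotting step, under a few assumptions of mine: the prices list is made-up stand-in data for timeSeriesList_1, the marker shapes and colors are arbitrary choices, and the Agg backend is used only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed so no display is needed
import matplotlib.pyplot as plt

# Stand-in for timeSeriesList_1 (hypothetical values, long enough for index 150).
prices = [10 + 0.1 * ((i * 7) % 13) for i in range(160)]

markers_on_list1 = [[109, 'sell'], [122, 'buy'], [122, 'sell'], [127, 'buy'],
                    [131, 'sell'], [142, 'buy'], [142, 'sell'], [150, 'buy']]

# Split the trade indices by side.
buy = [x[0] for x in markers_on_list1 if x[1] == 'buy']
sell = [x[0] for x in markers_on_list1 if x[1] == 'sell']

fig, ax = plt.subplots(figsize=(20, 10))
ax.plot(prices, '-g')  # the line itself, without markers

# Draw markers only, one call per side, at the chosen indices.
ax.plot(buy, [prices[i] for i in buy],
        linestyle='none', marker='^', color='blue', label='buy')
ax.plot(sell, [prices[i] for i in sell],
        linestyle='none', marker='v', color='red', label='sell')
ax.legend()
```

Indices 122 and 142 carry both a buy and a sell, so the two markers are drawn on top of each other there.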

Running MatplotLib Python Code ImportErrors

I am trying to run Randy Olson's code - Percentage of Bachelor's Degrees Conferred to Women.
http://www.randalolson.com/2014/06/28/how-to-make-beautiful-data-visualizations-in-python-with-matplotlib/
Full Code (written by Randy Olson and not me, obviously):
from pandas import read_csv
# Read the data into a pandas DataFrame.
gender_degree_data = read_csv("http://www.randalolson.com/wp-content/uploads/percent-bachelors-degrees-women-usa.csv")
# These are the "Tableau 20" colors as RGB.
tableau20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),
(44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),
(148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),
(227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),
(188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]
# Scale the RGB values to the [0, 1] range, which is the format matplotlib accepts.
for i in range(len(tableau20)):
    r, g, b = tableau20[i]
    tableau20[i] = (r / 255., g / 255., b / 255.)
# You typically want your plot to be ~1.33x wider than tall. This plot is a rare
# exception because of the number of lines being plotted on it.
# Common sizes: (10, 7.5) and (12, 9)
figure(figsize=(12, 14))
# Remove the plot frame lines. They are unnecessary chartjunk.
ax = subplot(111)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
# Ensure that the axis ticks only show up on the bottom and left of the plot.
# Ticks on the right and top of the plot are generally unnecessary chartjunk.
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()
# Limit the range of the plot to only where the data is.
# Avoid unnecessary whitespace.
ylim(0, 90)
xlim(1968, 2014)
# Make sure your axis ticks are large enough to be easily read.
# You don't want your viewers squinting to read your plot.
yticks(range(0, 91, 10), [str(x) + "%" for x in range(0, 91, 10)], fontsize=14)
xticks(fontsize=14)
# Provide tick lines across the plot to help your viewers trace along
# the axis ticks. Make sure that the lines are light and small so they
# don't obscure the primary data lines.
for y in range(10, 91, 10):
    plot(range(1968, 2012), [y] * len(range(1968, 2012)), "--", lw=0.5, color="black", alpha=0.3)
# Remove the tick marks; they are unnecessary with the tick lines we just plotted.
plt.tick_params(axis="both", which="both", bottom="off", top="off",
labelbottom="on", left="off", right="off", labelleft="on")
# Now that the plot is prepared, it's time to actually plot the data!
# Note that I plotted the majors in order of the highest % in the final year.
majors = ['Health Professions', 'Public Administration', 'Education', 'Psychology',
'Foreign Languages', 'English', 'Communications\nand Journalism',
'Art and Performance', 'Biology', 'Agriculture',
'Social Sciences and History', 'Business', 'Math and Statistics',
'Architecture', 'Physical Sciences', 'Computer Science',
'Engineering']
for rank, column in enumerate(majors):
    # Plot each line separately with its own color, using the Tableau 20
    # color set in order.
    plot(gender_degree_data.Year.values,
         gender_degree_data[column.replace("\n", " ")].values,
         lw=2.5, color=tableau20[rank])
    # Add a text label to the right end of every line. Most of the code below
    # is adding specific offsets y position because some labels overlapped.
    y_pos = gender_degree_data[column.replace("\n", " ")].values[-1] - 0.5
    if column == "Foreign Languages":
        y_pos += 0.5
    elif column == "English":
        y_pos -= 0.5
    elif column == "Communications\nand Journalism":
        y_pos += 0.75
    elif column == "Art and Performance":
        y_pos -= 0.25
    elif column == "Agriculture":
        y_pos += 1.25
    elif column == "Social Sciences and History":
        y_pos += 0.25
    elif column == "Business":
        y_pos -= 0.75
    elif column == "Math and Statistics":
        y_pos += 0.75
    elif column == "Architecture":
        y_pos -= 0.75
    elif column == "Computer Science":
        y_pos += 0.75
    elif column == "Engineering":
        y_pos -= 0.25
    # Again, make sure that all labels are large enough to be easily read
    # by the viewer.
    text(2011.5, y_pos, column, fontsize=14, color=tableau20[rank])
# matplotlib's title() call centers the title on the plot, but not the graph,
# so I used the text() call to customize where the title goes.
# Make the title big enough so it spans the entire plot, but don't make it
# so big that it requires two lines to show.
# Note that if the title is descriptive enough, it is unnecessary to include
# axis labels; they are self-evident, in this plot's case.
text(1995, 93, "Percentage of Bachelor's degrees conferred to women in the U.S.A."
", by major (1970-2012)", fontsize=17, ha="center")
# Always include your data source(s) and copyright notice! And for your
# data sources, tell your viewers exactly where the data came from,
# preferably with a direct link to the data. Just telling your viewers
# that you used data from the "U.S. Census Bureau" is completely useless:
# the U.S. Census Bureau provides all kinds of data, so how are your
# viewers supposed to know which data set you used?
text(1966, -8, "Data source: nces.ed.gov/programs/digest/2013menu_tables.asp"
"\nAuthor: Randy Olson (randalolson.com / #randal_olson)"
"\nNote: Some majors are missing because the historical data "
"is not available for them", fontsize=10)
# Finally, save the figure as a PNG.
# You can also save it as a PDF, JPEG, etc.
# Just change the file extension in this call.
# bbox_inches="tight" removes all the extra whitespace on the edges of your plot.
savefig("percent-bachelors-degrees-women-usa.png", bbox_inches="tight");
I have all of the dependencies, as I installed Python through Anaconda. I am not sure how to run it through IPython Notebook, though, and am hoping I can work around that. I am having trouble with the imports.
I have:
from pandas import read_csv
from matplotlib import *
from matplotlib.figure import figure
But I keep getting TypeError: 'module' object is not callable or ImportError: cannot import name figure
I know this is a pretty basic Python problem but I'm not sure what to do here. I want a line plot with multiple lines that has an interactive hovertool and this seems like the best example I can find. If anyone knows how to fix this or even knows of other examples of already written interactive lineplots that are easy to manipulate with new data, let me know!
EDIT:
using
from pandas import read_csv
from matplotlib import *
from matplotlib.figure import Figure
import pandas
and the same code:
Full Traceback
runfile('C:/Users/jbyrusb/Documents/Python Scripts/Disputes/WomenDegreesExample.py', wdir='C:/Users/jbyrusb/Documents/Python Scripts/Disputes')
Traceback (most recent call last):
File "<ipython-input-30-1b99e15a9df1>", line 1, in <module>
runfile('C:/Users/jbyrusb/Documents/Python Scripts/Disputes/WomenDegreesExample.py', wdir='C:/Users/jbyrusb/Documents/Python Scripts/Disputes')
File "C:\Users\jbyrusb\AppData\Local\Continuum\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
execfile(filename, namespace)
File "C:\Users\jbyrusb\AppData\Local\Continuum\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/jbyrusb/Documents/Python Scripts/Disputes/WomenDegreesExample.py", line 33, in <module>
figure(figsize=(12, 14))
TypeError: 'module' object is not callable
The example in your link calls %pylab inline, which is an IPython command that, among other things, executes from pylab import *.
This is literally the worst way to demonstrate matplotlib, and if I could wave a magic wand and remove it from the internet and the world, I would.
In short, adding from pylab import * to the top of the original code should solve the problems.
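To illustrate the TypeError from the question: matplotlib.figure is a submodule, so a bare name figure that ends up bound to it cannot be called, whereas the figure() function the script expects lives in matplotlib.pyplot. The sketch below shows the difference and an explicit alternative to the star import. The list of imported names is my reading of which functions the original script calls, and the Agg backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed so no display is needed
import matplotlib.figure         # a module: calling it raises "'module' object is not callable"
import matplotlib.pyplot as plt  # pyplot is where the figure() function is defined

module_is_callable = callable(matplotlib.figure)  # False
function_is_callable = callable(plt.figure)       # True

# Explicit imports of the pyplot functions the script uses as bare names.
from matplotlib.pyplot import (figure, subplot, xlim, ylim, xticks, yticks,
                               plot, text, savefig, tick_params)

fig = figure(figsize=(12, 14))  # now resolves to pyplot's figure function
ax = subplot(111)
```

Explicit imports like these keep the namespace predictable, which is one reason the star import is discouraged.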
Here's the code in modern object-oriented matplotlib:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas
import seaborn
seaborn.set(style='white')
# Read the data into a pandas DataFrame.
url = "http://www.randalolson.com/wp-content/uploads/percent-bachelors-degrees-women-usa.csv"
gender_degree_data = pandas.read_csv(url)
# These are the "Tableau 20" colors as RGB.
tableau20 = np.array([
( 31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),
( 44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),
(148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),
(227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),
(188, 189, 34), (219, 219, 141), ( 23, 190, 207), (158, 218, 229)
]) / 255.
fig, ax = plt.subplots(figsize=(12, 14))
seaborn.despine(ax=ax, left=True, bottom=True)
ax.xaxis.tick_bottom()
ax.yaxis.tick_left()
ax.set_ylim(bottom=0, top=90)
ax.set_xlim(left=1968, right=2014)
ax.set_yticks(range(0, 91, 10))
ax.set_yticklabels([str(x) + "%" for x in range(0, 91, 10)])
for y in range(10, 91, 10):
    ax.plot(range(1968, 2012), [y] * len(range(1968, 2012)), "--",
            lw=0.5, color="black", alpha=0.3)
# Booleans instead of the deprecated "on"/"off" strings.
ax.tick_params(axis="both", which="both", bottom=False, top=False, labelsize=14,
               labelbottom=True, left=False, right=False, labelleft=True)
majors = [
'Health Professions', 'Public Administration', 'Education',
'Psychology','Foreign Languages', 'English',
'Communications\nand Journalism', 'Art and Performance',
'Biology', 'Agriculture', 'Social Sciences and History',
'Business', 'Math and Statistics', 'Architecture',
'Physical Sciences', 'Computer Science','Engineering'
]
offsets = {
"Foreign Languages": +0.5,
"English": -0.5,
"Communications\nand Journalism": +0.75,
"Art and Performance": -0.25,
"Agriculture": +1.25,
"Social Sciences and History": +0.25,
"Business": -0.75,
"Math and Statistics": +0.75,
"Architecture": -0.75,
"Computer Science": +0.75,
"Engineering": -0.25,
}
for rank, column in enumerate(majors):
    ax.plot(gender_degree_data.Year.values,
            gender_degree_data[column.replace("\n", " ")].values,
            lw=2.5, color=tableau20[rank])
    y_pos = gender_degree_data[column.replace("\n", " ")].values[-1] - 0.5
    y_pos += offsets.get(column, 0)
    ax.text(2011.5, y_pos, column, fontsize=14, color=tableau20[rank])
ax.text(1995, 93, "Percentage of Bachelor's degrees conferred to women in the U.S.A."
", by major (1970-2012)", fontsize=17, ha="center")
ax.text(1966, -8, "Data source: nces.ed.gov/programs/digest/2013menu_tables.asp"
"\nAuthor: Randy Olson (randalolson.com / #randal_olson)"
"\nNote: Some majors are missing because the historical data "
"is not available for them", fontsize=10)
fig.savefig("percent-bachelors-degrees-women-usa.png", bbox_inches="tight")
As Paul points out, using %pylab inline is an outdated practice and should no longer be used. Here's the updated code that can be run outside of the IPython Notebook and doesn't add the extra Seaborn dependency.
I've also written an example that uses only matplotlib. You can find it in the matplotlib gallery here.
import matplotlib.pyplot as plt
import pandas as pd
# Read the data into a pandas DataFrame.
gender_degree_data = pd.read_csv("http://www.randalolson.com/wp-content/uploads/percent-bachelors-degrees-women-usa.csv")
# These are the "Tableau 20" colors as RGB.
tableau20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),
(44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),
(148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),
(227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),
(188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]
# Scale the RGB values to the [0, 1] range, which is the format matplotlib accepts.
for i in range(len(tableau20)):
    r, g, b = tableau20[i]
    tableau20[i] = (r / 255., g / 255., b / 255.)
# You typically want your plot to be ~1.33x wider than tall. This plot is a rare
# exception because of the number of lines being plotted on it.
# Common sizes: (10, 7.5) and (12, 9)
plt.figure(figsize=(12, 14))
# Remove the plot frame lines. They are unnecessary chartjunk.
ax = plt.subplot(111)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
# Ensure that the axis ticks only show up on the bottom and left of the plot.
# Ticks on the right and top of the plot are generally unnecessary chartjunk.
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()
# Limit the range of the plot to only where the data is.
# Avoid unnecessary whitespace.
plt.ylim(0, 90)
plt.xlim(1968, 2014)
# Make sure your axis ticks are large enough to be easily read.
# You don't want your viewers squinting to read your plot.
plt.yticks(range(0, 91, 10), [str(x) + "%" for x in range(0, 91, 10)], fontsize=14)
plt.xticks(fontsize=14)
# Provide tick lines across the plot to help your viewers trace along
# the axis ticks. Make sure that the lines are light and small so they
# don't obscure the primary data lines.
for y in range(10, 91, 10):
    plt.plot(range(1968, 2012), [y] * len(range(1968, 2012)), "--", lw=0.5, color="black", alpha=0.3)
# Remove the tick marks; they are unnecessary with the tick lines we just plotted.
# Booleans are used instead of the deprecated "on"/"off" strings.
plt.tick_params(axis="both", which="both", bottom=False, top=False,
                labelbottom=True, left=False, right=False, labelleft=True)
# Now that the plot is prepared, it's time to actually plot the data!
# Note that I plotted the majors in order of the highest % in the final year.
majors = ['Health Professions', 'Public Administration', 'Education', 'Psychology',
'Foreign Languages', 'English', 'Communications\nand Journalism',
'Art and Performance', 'Biology', 'Agriculture',
'Social Sciences and History', 'Business', 'Math and Statistics',
'Architecture', 'Physical Sciences', 'Computer Science',
'Engineering']
for rank, column in enumerate(majors):
    # Plot each line separately with its own color, using the Tableau 20
    # color set in order.
    plt.plot(gender_degree_data.Year.values,
             gender_degree_data[column.replace("\n", " ")].values,
             lw=2.5, color=tableau20[rank])
    # Add a text label to the right end of every line. Most of the code below
    # is adding specific offsets y position because some labels overlapped.
    y_pos = gender_degree_data[column.replace("\n", " ")].values[-1] - 0.5
    if column == "Foreign Languages":
        y_pos += 0.5
    elif column == "English":
        y_pos -= 0.5
    elif column == "Communications\nand Journalism":
        y_pos += 0.75
    elif column == "Art and Performance":
        y_pos -= 0.25
    elif column == "Agriculture":
        y_pos += 1.25
    elif column == "Social Sciences and History":
        y_pos += 0.25
    elif column == "Business":
        y_pos -= 0.75
    elif column == "Math and Statistics":
        y_pos += 0.75
    elif column == "Architecture":
        y_pos -= 0.75
    elif column == "Computer Science":
        y_pos += 0.75
    elif column == "Engineering":
        y_pos -= 0.25
    # Again, make sure that all labels are large enough to be easily read
    # by the viewer.
    plt.text(2011.5, y_pos, column, fontsize=14, color=tableau20[rank])
# matplotlib's title() call centers the title on the plot, but not the graph,
# so I used the text() call to customize where the title goes.
# Make the title big enough so it spans the entire plot, but don't make it
# so big that it requires two lines to show.
# Note that if the title is descriptive enough, it is unnecessary to include
# axis labels; they are self-evident, in this plot's case.
plt.text(1995, 93, "Percentage of Bachelor's degrees conferred to women in the U.S.A."
", by major (1970-2012)", fontsize=17, ha="center")
# Always include your data source(s) and copyright notice! And for your
# data sources, tell your viewers exactly where the data came from,
# preferably with a direct link to the data. Just telling your viewers
# that you used data from the "U.S. Census Bureau" is completely useless:
# the U.S. Census Bureau provides all kinds of data, so how are your
# viewers supposed to know which data set you used?
plt.text(1966, -8, "Data source: nces.ed.gov/programs/digest/2013menu_tables.asp"
"\nAuthor: Randy Olson (randalolson.com / #randal_olson)"
"\nNote: Some majors are missing because the historical data "
"is not available for them", fontsize=10)
# Finally, save the figure as a PNG.
# You can also save it as a PDF, JPEG, etc.
# Just change the file extension in this call.
# bbox_inches="tight" removes all the extra whitespace on the edges of your plot.
plt.savefig("percent-bachelors-degrees-women-usa.png", bbox_inches="tight");
Here's what the final result looks like:
I've updated my blog post with this new code as well. Thanks for bringing this issue to my attention!
