I am trying to run Randy Olson's code - Percentage of Bachelor's Degrees Conferred to Women.
http://www.randalolson.com/2014/06/28/how-to-make-beautiful-data-visualizations-in-python-with-matplotlib/
Full Code (written by Randy Olson and not me, obviously):
from pandas import read_csv
# Read the data into a pandas DataFrame.
gender_degree_data = read_csv("http://www.randalolson.com/wp-content/uploads/percent-bachelors-degrees-women-usa.csv")
# These are the "Tableau 20" colors as RGB.
tableau20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),
(44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),
(148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),
(227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),
(188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]
# Scale the RGB values to the [0, 1] range, which is the format matplotlib accepts.
for i in range(len(tableau20)):
    r, g, b = tableau20[i]
    tableau20[i] = (r / 255., g / 255., b / 255.)
# You typically want your plot to be ~1.33x wider than tall. This plot is a rare
# exception because of the number of lines being plotted on it.
# Common sizes: (10, 7.5) and (12, 9)
figure(figsize=(12, 14))
# Remove the plot frame lines. They are unnecessary chartjunk.
ax = subplot(111)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
# Ensure that the axis ticks only show up on the bottom and left of the plot.
# Ticks on the right and top of the plot are generally unnecessary chartjunk.
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()
# Limit the range of the plot to only where the data is.
# Avoid unnecessary whitespace.
ylim(0, 90)
xlim(1968, 2014)
# Make sure your axis ticks are large enough to be easily read.
# You don't want your viewers squinting to read your plot.
yticks(range(0, 91, 10), [str(x) + "%" for x in range(0, 91, 10)], fontsize=14)
xticks(fontsize=14)
# Provide tick lines across the plot to help your viewers trace along
# the axis ticks. Make sure that the lines are light and small so they
# don't obscure the primary data lines.
for y in range(10, 91, 10):
    plot(range(1968, 2012), [y] * len(range(1968, 2012)), "--", lw=0.5, color="black", alpha=0.3)
# Remove the tick marks; they are unnecessary with the tick lines we just plotted.
plt.tick_params(axis="both", which="both", bottom="off", top="off",
labelbottom="on", left="off", right="off", labelleft="on")
# Now that the plot is prepared, it's time to actually plot the data!
# Note that I plotted the majors in order of the highest % in the final year.
majors = ['Health Professions', 'Public Administration', 'Education', 'Psychology',
'Foreign Languages', 'English', 'Communications\nand Journalism',
'Art and Performance', 'Biology', 'Agriculture',
'Social Sciences and History', 'Business', 'Math and Statistics',
'Architecture', 'Physical Sciences', 'Computer Science',
'Engineering']
for rank, column in enumerate(majors):
    # Plot each line separately with its own color, using the Tableau 20
    # color set in order.
    plot(gender_degree_data.Year.values,
         gender_degree_data[column.replace("\n", " ")].values,
         lw=2.5, color=tableau20[rank])
    # Add a text label to the right end of every line. Most of the code below
    # is adding specific offsets y position because some labels overlapped.
    y_pos = gender_degree_data[column.replace("\n", " ")].values[-1] - 0.5
    if column == "Foreign Languages":
        y_pos += 0.5
    elif column == "English":
        y_pos -= 0.5
    elif column == "Communications\nand Journalism":
        y_pos += 0.75
    elif column == "Art and Performance":
        y_pos -= 0.25
    elif column == "Agriculture":
        y_pos += 1.25
    elif column == "Social Sciences and History":
        y_pos += 0.25
    elif column == "Business":
        y_pos -= 0.75
    elif column == "Math and Statistics":
        y_pos += 0.75
    elif column == "Architecture":
        y_pos -= 0.75
    elif column == "Computer Science":
        y_pos += 0.75
    elif column == "Engineering":
        y_pos -= 0.25
    # Again, make sure that all labels are large enough to be easily read
    # by the viewer.
    text(2011.5, y_pos, column, fontsize=14, color=tableau20[rank])
# matplotlib's title() call centers the title on the plot, but not the graph,
# so I used the text() call to customize where the title goes.
# Make the title big enough so it spans the entire plot, but don't make it
# so big that it requires two lines to show.
# Note that if the title is descriptive enough, it is unnecessary to include
# axis labels; they are self-evident, in this plot's case.
text(1995, 93, "Percentage of Bachelor's degrees conferred to women in the U.S.A."
", by major (1970-2012)", fontsize=17, ha="center")
# Always include your data source(s) and copyright notice! And for your
# data sources, tell your viewers exactly where the data came from,
# preferably with a direct link to the data. Just telling your viewers
# that you used data from the "U.S. Census Bureau" is completely useless:
# the U.S. Census Bureau provides all kinds of data, so how are your
# viewers supposed to know which data set you used?
text(1966, -8, "Data source: nces.ed.gov/programs/digest/2013menu_tables.asp"
"\nAuthor: Randy Olson (randalolson.com / #randal_olson)"
"\nNote: Some majors are missing because the historical data "
"is not available for them", fontsize=10)
# Finally, save the figure as a PNG.
# You can also save it as a PDF, JPEG, etc.
# Just change the file extension in this call.
# bbox_inches="tight" removes all the extra whitespace on the edges of your plot.
savefig("percent-bachelors-degrees-women-usa.png", bbox_inches="tight");
I have all of the dependencies, as I installed Python through Anaconda. I am not sure how to run it through IPython Notebook, though, and am hoping I can work around that. I am having trouble with the imports.
I have:
from pandas import read_csv
from matplotlib import *
from matplotlib.figure import figure
But I keep getting TypeError: 'module' object is not callable or ImportError: cannot import name figure
I know this is a pretty basic Python problem but I'm not sure what to do here. I want a line plot with multiple lines that has an interactive hovertool and this seems like the best example I can find. If anyone knows how to fix this or even knows of other examples of already written interactive lineplots that are easy to manipulate with new data, let me know!
EDIT:
using
from pandas import read_csv
from matplotlib import *
from matplotlib.figure import Figure
import pandas
and the same code:
Full Traceback
runfile('C:/Users/jbyrusb/Documents/Python Scripts/Disputes/WomenDegreesExample.py', wdir='C:/Users/jbyrusb/Documents/Python Scripts/Disputes')
Traceback (most recent call last):
File "<ipython-input-30-1b99e15a9df1>", line 1, in <module>
runfile('C:/Users/jbyrusb/Documents/Python Scripts/Disputes/WomenDegreesExample.py', wdir='C:/Users/jbyrusb/Documents/Python Scripts/Disputes')
File "C:\Users\jbyrusb\AppData\Local\Continuum\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
execfile(filename, namespace)
File "C:\Users\jbyrusb\AppData\Local\Continuum\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/jbyrusb/Documents/Python Scripts/Disputes/WomenDegreesExample.py", line 33, in <module>
figure(figsize=(12, 14))
TypeError: 'module' object is not callable
The example in your link calls %pylab inline, which is an IPython command that, among other things, executes from pylab import *.
This is literally the worst way to demonstrate matplotlib, and if I could wave a magic wand and remove it from the internet and the world, I would.
In short, adding from pylab import * to the top of the original code should solve the problems.
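To see where the two error messages come from, it may help to check what each name actually refers to. The sketch below assumes only that matplotlib is installed (the Agg backend is selected so it runs without a display). It checks that matplotlib.figure is a module, which is why calling it raises "'module' object is not callable", that the class inside it is spelled Figure with a capital F, and that the callable figure() function lives in matplotlib.pyplot, which is what from pylab import * exposes.

```python
import types

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.figure
import matplotlib.pyplot as plt

# matplotlib.figure is a submodule; calling a module raises TypeError.
is_module = isinstance(matplotlib.figure, types.ModuleType)
# The class defined in that module is Figure (capital F).
has_class = isinstance(matplotlib.figure.Figure, type)
# The function the blog post relies on is pyplot.figure.
is_function = callable(plt.figure)

print(is_module, has_class, is_function)

# This is the call the original script intended.
fig = plt.figure(figsize=(12, 14))
ax = fig.add_subplot(111)
```

With import matplotlib.pyplot as plt at the top of the script, the bare calls in the blog post (figure, subplot, ylim, and so on) become plt.figure, plt.subplot, plt.ylim.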
Here's the code in modern object-oriented matplotlib:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas
import seaborn
seaborn.set(style='white')
# Read the data into a pandas DataFrame.
url = "http://www.randalolson.com/wp-content/uploads/percent-bachelors-degrees-women-usa.csv"
gender_degree_data = pandas.read_csv(url)
# These are the "Tableau 20" colors as RGB.
tableau20 = np.array([
( 31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),
( 44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),
(148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),
(227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),
(188, 189, 34), (219, 219, 141), ( 23, 190, 207), (158, 218, 229)
]) / 255.
fig, ax = plt.subplots(figsize=(12, 14))
seaborn.despine(ax=ax, left=True, bottom=True)
ax.xaxis.tick_bottom()
ax.yaxis.tick_left()
ax.set_ylim(bottom=0, top=90)
ax.set_xlim(left=1968, right=2014)
ax.set_yticks(range(0, 91, 10))
ax.set_yticklabels([str(x) + "%" for x in range(0, 91, 10)])
for y in range(10, 91, 10):
    ax.plot(range(1968, 2012), [y] * len(range(1968, 2012)), "--",
            lw=0.5, color="black", alpha=0.3)
# Booleans are the supported way to switch ticks and labels on and off in current matplotlib.
ax.tick_params(axis="both", which="both", bottom=False, top=False, labelsize=14,
               labelbottom=True, left=False, right=False, labelleft=True)
majors = [
'Health Professions', 'Public Administration', 'Education',
'Psychology','Foreign Languages', 'English',
'Communications\nand Journalism', 'Art and Performance',
'Biology', 'Agriculture', 'Social Sciences and History',
'Business', 'Math and Statistics', 'Architecture',
'Physical Sciences', 'Computer Science','Engineering'
]
offsets = {
"Foreign Languages": +0.5,
"English": -0.5,
"Communications\nand Journalism": +0.75,
"Art and Performance": -0.25,
"Agriculture": +1.25,
"Social Sciences and History": +0.25,
"Business": -0.75,
"Math and Statistics": +0.75,
"Architecture": -0.75,
"Computer Science": +0.75,
"Engineering": -0.25,
}
for rank, column in enumerate(majors):
    ax.plot(gender_degree_data.Year.values,
            gender_degree_data[column.replace("\n", " ")].values,
            lw=2.5, color=tableau20[rank])
    y_pos = gender_degree_data[column.replace("\n", " ")].values[-1] - 0.5
    y_pos += offsets.get(column, 0)
    ax.text(2011.5, y_pos, column, fontsize=14, color=tableau20[rank])
ax.text(1995, 93, "Percentage of Bachelor's degrees conferred to women in the U.S.A."
", by major (1970-2012)", fontsize=17, ha="center")
ax.text(1966, -8, "Data source: nces.ed.gov/programs/digest/2013menu_tables.asp"
"\nAuthor: Randy Olson (randalolson.com / #randal_olson)"
"\nNote: Some majors are missing because the historical data "
"is not available for them", fontsize=10)
fig.savefig("percent-bachelors-degrees-women-usa.png", bbox_inches="tight")
As Paul points out, using %pylab inline is an outdated practice and should no longer be used. Here's the updated code that can be run outside of the IPython Notebook and doesn't add the extra Seaborn dependency.
I've also written an example that uses only matplotlib. You can find it in the matplotlib gallery here.
import matplotlib.pyplot as plt
import pandas as pd
# Read the data into a pandas DataFrame.
gender_degree_data = pd.read_csv("http://www.randalolson.com/wp-content/uploads/percent-bachelors-degrees-women-usa.csv")
# These are the "Tableau 20" colors as RGB.
tableau20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),
(44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),
(148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),
(227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),
(188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]
# Scale the RGB values to the [0, 1] range, which is the format matplotlib accepts.
for i in range(len(tableau20)):
    r, g, b = tableau20[i]
    tableau20[i] = (r / 255., g / 255., b / 255.)
# You typically want your plot to be ~1.33x wider than tall. This plot is a rare
# exception because of the number of lines being plotted on it.
# Common sizes: (10, 7.5) and (12, 9)
plt.figure(figsize=(12, 14))
# Remove the plot frame lines. They are unnecessary chartjunk.
ax = plt.subplot(111)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
# Ensure that the axis ticks only show up on the bottom and left of the plot.
# Ticks on the right and top of the plot are generally unnecessary chartjunk.
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()
# Limit the range of the plot to only where the data is.
# Avoid unnecessary whitespace.
plt.ylim(0, 90)
plt.xlim(1968, 2014)
# Make sure your axis ticks are large enough to be easily read.
# You don't want your viewers squinting to read your plot.
plt.yticks(range(0, 91, 10), [str(x) + "%" for x in range(0, 91, 10)], fontsize=14)
plt.xticks(fontsize=14)
# Provide tick lines across the plot to help your viewers trace along
# the axis ticks. Make sure that the lines are light and small so they
# don't obscure the primary data lines.
for y in range(10, 91, 10):
    plt.plot(range(1968, 2012), [y] * len(range(1968, 2012)), "--", lw=0.5, color="black", alpha=0.3)
# Remove the tick marks; they are unnecessary with the tick lines we just plotted.
# Booleans are the supported way to switch ticks and labels on and off in current matplotlib.
plt.tick_params(axis="both", which="both", bottom=False, top=False,
                labelbottom=True, left=False, right=False, labelleft=True)
# Now that the plot is prepared, it's time to actually plot the data!
# Note that I plotted the majors in order of the highest % in the final year.
majors = ['Health Professions', 'Public Administration', 'Education', 'Psychology',
'Foreign Languages', 'English', 'Communications\nand Journalism',
'Art and Performance', 'Biology', 'Agriculture',
'Social Sciences and History', 'Business', 'Math and Statistics',
'Architecture', 'Physical Sciences', 'Computer Science',
'Engineering']
for rank, column in enumerate(majors):
    # Plot each line separately with its own color, using the Tableau 20
    # color set in order.
    plt.plot(gender_degree_data.Year.values,
             gender_degree_data[column.replace("\n", " ")].values,
             lw=2.5, color=tableau20[rank])
    # Add a text label to the right end of every line. Most of the code below
    # is adding specific offsets y position because some labels overlapped.
    y_pos = gender_degree_data[column.replace("\n", " ")].values[-1] - 0.5
    if column == "Foreign Languages":
        y_pos += 0.5
    elif column == "English":
        y_pos -= 0.5
    elif column == "Communications\nand Journalism":
        y_pos += 0.75
    elif column == "Art and Performance":
        y_pos -= 0.25
    elif column == "Agriculture":
        y_pos += 1.25
    elif column == "Social Sciences and History":
        y_pos += 0.25
    elif column == "Business":
        y_pos -= 0.75
    elif column == "Math and Statistics":
        y_pos += 0.75
    elif column == "Architecture":
        y_pos -= 0.75
    elif column == "Computer Science":
        y_pos += 0.75
    elif column == "Engineering":
        y_pos -= 0.25
    # Again, make sure that all labels are large enough to be easily read
    # by the viewer.
    plt.text(2011.5, y_pos, column, fontsize=14, color=tableau20[rank])
# matplotlib's title() call centers the title on the plot, but not the graph,
# so I used the text() call to customize where the title goes.
# Make the title big enough so it spans the entire plot, but don't make it
# so big that it requires two lines to show.
# Note that if the title is descriptive enough, it is unnecessary to include
# axis labels; they are self-evident, in this plot's case.
plt.text(1995, 93, "Percentage of Bachelor's degrees conferred to women in the U.S.A."
", by major (1970-2012)", fontsize=17, ha="center")
# Always include your data source(s) and copyright notice! And for your
# data sources, tell your viewers exactly where the data came from,
# preferably with a direct link to the data. Just telling your viewers
# that you used data from the "U.S. Census Bureau" is completely useless:
# the U.S. Census Bureau provides all kinds of data, so how are your
# viewers supposed to know which data set you used?
plt.text(1966, -8, "Data source: nces.ed.gov/programs/digest/2013menu_tables.asp"
"\nAuthor: Randy Olson (randalolson.com / #randal_olson)"
"\nNote: Some majors are missing because the historical data "
"is not available for them", fontsize=10)
# Finally, save the figure as a PNG.
# You can also save it as a PDF, JPEG, etc.
# Just change the file extension in this call.
# bbox_inches="tight" removes all the extra whitespace on the edges of your plot.
plt.savefig("percent-bachelors-degrees-women-usa.png", bbox_inches="tight");
Here's what the final result looks like:
I've updated my blog post with this new code as well. Thanks for bringing this issue to my attention!
I am using the following nested dictionary to make a lineplot:
df = {'A':
{'weight': [200, 190, 188, 180, 170],
'days_since_gym': [0, 91, 174, 205, 279],
'days_since_fasting': 40},
'B':
{'weight': [181, 175, 172, 165, 150],
'days_since_gym': [43, 171, 241, 273, 300],
'days_since_fasting': 100}}
While making the lineplot, I want the Y-Axis ticks as the percentage value, for which I'm using PercentFormatter:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import seaborn as sns

# set the plot size
fig, ax = plt.subplots(2, figsize=(10, 6))
for i, x in enumerate(df.keys()):
    sns.lineplot(
        x=df[x]['days_since_gym'],
        y=df[x]['weight'],
        marker="o",
        ax=ax[i],
    )
    ax[i].axvline(df[x]['days_since_fasting'], color='k', linestyle='--', label='Fasting Starts')
    ax[i].set_xlim(left=0, right=365)
    # Percentage y-axis
    ax[i].yaxis.set_major_formatter(mtick.PercentFormatter())
plt.xlabel('Days Since Joined Gym')
plt.ylabel('Relative Weight')
plt.legend(bbox_to_anchor=(1.04, 1), loc="upper left")
plt.show()
However, I don't want the default percentage values (as the figure shows). I want the first value to be the starting percentage and each subsequent value to be a percentage relative to it. For example, the first plot starts at 200%, which I want shown as 0%, and ends at 170%, which I want shown as -something%.
Any suggestions would be appreciated. Thanks!
One way with minor changes to your code is to make the values in y relative to the first value. That is, keep everything as is and replace:
y=df[x]['weight'],
with:
y=[a-df[x]['weight'][0] for a in df[x]['weight']],
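Note that the replacement above plots the absolute difference from the first value (170 becomes -30). If what is wanted is the percentage change relative to the first value, which is an assumption about the asker's intent, the difference can also be divided by the first value. A minimal sketch, where relative_percent is a hypothetical helper name and not part of the original code:

```python
def relative_percent(values):
    """Percent change of each value relative to the first one (hypothetical helper)."""
    first = values[0]
    # Multiply before dividing so that round inputs give round outputs.
    return [(v - first) * 100 / first for v in values]


weights_a = [200, 190, 188, 180, 170]  # the 'A' weights from the question
print(relative_percent(weights_a))  # [0.0, -5.0, -6.0, -10.0, -15.0]
```

Passing y=relative_percent(df[x]['weight']) to sns.lineplot would then start each curve at 0%. PercentFormatter() uses xmax=100 by default, so these values are displayed as they are with a percent sign appended.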
This question already has answers here:
Aligning rotated xticklabels with their respective xticks
Could someone please help me align my x-ticks with the bars? The bars are not consistent with the x-tick time values, as you can see in the image. I have printed the data values of g01 and g02 below, along with my code. I tried the solution from "Python MatplotLib plot x-axis with first x-axis value labeled as 1 (instead of 0)", plt.xticks(np.arange(len(g01)), np.arange(1, len(g01)+1)); the bars are then consistent with the x-ticks, but the labels change to the numbers 1 to 28. I want the time period, as in my image.
g01 = ['2021-02-01 05:00:31', '2021-02-02 00:01:04', '2021-02-03 00:05:09', '2021-02-04 00:05:15', '2021-02-05 00:03:14', '2021-02-06 00:00:25', '2021-02-07 00:04:09', '2021-02-08 00:04:35', '2021-02-09 00:00:00', '2021-02-10 00:02:00', '2021-02-11 00:01:28', '2021-02-12 00:06:31', '2021-02-13 00:00:30', '2021-02-14 00:03:30', '2021-02-15 00:05:20', '2021-02-16 00:00:13', '2021-02-17 00:00:21', '2021-02-18 00:08:02', '2021-02-19 00:00:31', '2021-02-20 00:00:04', '2021-02-21 00:05:05', '2021-02-22 00:02:18', '2021-02-23 00:00:10', '2021-02-24 00:00:38', '2021-02-25 00:00:47', '2021-02-26 00:00:17', '2021-02-27 00:00:28', '2021-02-28 00:03:00']
g02 = [164, 158, 180, 200, 177, 112, 97, 237, 95, 178, 163, 78, 67, 65, 134, 93, 220, 74, 131, 172, 77, 102, 208, 109, 113, 208, 110, 101]
import matplotlib.pyplot as plt

# One subplots() call is enough; a separate plt.figure() would leave an extra empty figure.
fig, ax1 = plt.subplots(1, 1)
plt.yscale("log")
barlist1 = ax1.bar(g01, g02)
# Colour the first 21 bars pink.
for i in range(21):
    barlist1[i].set_color('pink')
degrees = 70
plt.xticks(rotation=degrees)
plt.xlabel('period', fontsize=14, fontweight="bold")
plt.ylabel('rating values', fontsize=10, fontweight="bold")
While the linked duplicate does improve the alignment with ha='right', the labels will still be slightly off.
First note that the ticks/labels are correctly mapped, which you can see by using rotation=90 (left subplot):
plt.xticks(rotation=90)
If you use rotation=70 with ha='right', notice that the labels are still slightly shifted. This is because matplotlib uses the text's bounding box for alignment, not the text itself (center subplot):
plt.xticks(rotation=70, ha='right')
To tweak the labels more precisely, add a ScaledTranslation transform (right subplot):
from matplotlib.transforms import ScaledTranslation
offset = ScaledTranslation(xt=0.075, yt=0, scale_trans=fig.dpi_scale_trans)
for label in ax1.xaxis.get_majorticklabels():
    label.set_transform(label.get_transform() + offset)
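Another option, not used in the answer above, is the rotation_mode='anchor' text property, which aligns the unrotated text first and then rotates it around the anchor point instead of aligning the rotated bounding box. The sketch below is a reduced example under stated assumptions: it uses only the first three values of g01 and g02, places the bars at numeric positions, and selects the Agg backend so it runs without a display. Whether the result looks better than the ScaledTranslation offset is a matter of taste.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# First three entries of g01 / g02 from the question, for brevity.
labels = ['2021-02-01 05:00:31', '2021-02-02 00:01:04', '2021-02-03 00:05:09']
values = [164, 158, 180]

fig, ax = plt.subplots()
positions = list(range(len(labels)))
ax.bar(positions, values)

# Fix the tick positions, then attach the labels with the text properties.
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=70, ha='right', rotation_mode='anchor')
```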
I've been playing around with Matplotlib and created a horizontal bar using the following algorithm (Full code and junk data provided at the bottom of this post).
# Version 1
ax.broken_barh([(depth_start[0], thick[0]), (depth_start[1], thick[1]), (depth_start[2], thick[2])], (25, 0.8),
facecolors=('tab:brown', 'tab:blue', 'tab:green'))
which produces the following graphical output:
So I've been trying to make the code more efficient by introducing itertools
I managed to simplify the above code into a version 2:
# Version 2
for i in thick:
    ax.broken_barh([(next(cycle_depth), next(cycle_thick))], (15, 0.8), facecolors=(next(cycle_colour)))
Great, this also produces the above bar in the same order with the same colours.
The Problem
But I'm struggling with my next objective which is to replace facecolors=('tab:brown', 'tab:blue', 'tab:green') with a function that uses a for loop. This function ideally selects the correct colour for each bar based on the thickness. All 3 bars return a brown colour as the function continuously returns the value associated with the else statement (see image below).
I've attempted substituting next(cycle_thick) in place of the variable cycle_thick in the function, but then only one of the colours is correct again.
The colour_checker() function is as follows:
def colour_checker():
    if cycle_thick == 10:
        return 'tab:green'
    elif cycle_thick == 20:
        return 'tab:blue'
    else:
        return 'tab:brown'
# Version 3
for i in thick:
    ax.broken_barh([(next(cycle_depth), next(cycle_thick))], (10, 0.8), facecolors=colour_checker())
Any hints or suggestions welcomed!
Full Code and Junk Data
import itertools
import matplotlib.pyplot as plt
# Junk data in the form of lists
depth_start = [90, 70, 40] # top of lithology
thick = [30, 20, 10] # thickness for each lithology
colour = ('tab:brown', 'tab:blue', 'tab:green')
# Lists to be cycled through
cycle_colour = itertools.cycle(colour)
cycle_depth = itertools.cycle(depth_start)
cycle_thick = itertools.cycle(thick)
#setting up the plot
fig, ax = plt.subplots()
def colour_checker():
    if cycle_thick == [0]:
        return 'tab:green'
    elif cycle_thick == [1]:
        return 'tab:blue'
    else:
        return 'tab:brown'
# Version 1
ax.broken_barh([(depth_start[0], thick[0]), (depth_start[1], thick[1]), (depth_start[2], thick[2])], (25, 0.8),
               facecolors=('tab:brown', 'tab:blue', 'tab:green'))
# Version 2
for i in thick:
    ax.broken_barh([(next(cycle_depth), next(cycle_thick))], (15, 0.8), facecolors=(next(cycle_colour)))
# Version 3
for i in thick:
    ax.broken_barh([(next(cycle_depth), next(cycle_thick))], (10, 0.8), facecolors=colour_checker())
ax.set_ylabel('X_UTM Position')
ax.set_xlabel('MAMSL')
plt.show()
Since the intention of the outcome was ambiguous, I have created examples for all three versions I can imagine.
import matplotlib.pyplot as plt
# Junk data in the form of lists
depth_start = [90, 70, 40, 200, 170, 140] # top of lithology
thick = [30, 20, 10, 20, 10, 30] # thickness for each lithology
colour = ('tab:brown', 'tab:blue', 'tab:green')
#setting up the plot
fig, ax = plt.subplots()
#Version 1: using zip to chain all three lists
for start, length, color in zip(depth_start, thick, colour+colour[::-1]):
    ax.broken_barh([(start, length)], (-0.4, 0.8), facecolors=color)
#Version 2: color cycler repetitive color assignments
from itertools import cycle
cycle_colour = cycle(colour)
for start, length in zip(depth_start, thick):
    ax.broken_barh([(start, length)], (0.6, 0.8), facecolors=next(cycle_colour))
#Version 3: lookup table to color bars of a specific length with a certain color
color_dic = {30: 'tab:brown', 20: 'tab:blue', 10: 'tab:green'}
for start, length in zip(depth_start, thick):
    ax.broken_barh([(start, length)], (1.6, 0.8), facecolors=color_dic[length])
ax.set_yticks(range(3))
ax.set_yticklabels(["Version 1", "Version 2", "Version 3"])
plt.show()
Sample output:
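As for why the original colour_checker() always returned brown: it compared cycle_thick, the itertools.cycle object itself, with a number, and an iterator never equals an int, so the else branch always ran. Calling next(cycle_thick) inside the function would advance the shared iterator a second time for each bar. One way around both problems, sketched here under the assumption that thickness alone determines the colour, is to pass the thickness in as an argument. The parameter name thickness and the bars list are illustrative and not from the question.

```python
def colour_checker(thickness):
    """Return a colour for a bar of the given thickness."""
    if thickness == 10:
        return 'tab:green'
    elif thickness == 20:
        return 'tab:blue'
    return 'tab:brown'


depth_start = [90, 70, 40]  # top of lithology (junk data from the question)
thick = [30, 20, 10]        # thickness for each lithology

# Pair each start with its thickness once, then derive the colour from it.
bars = [(start, length, colour_checker(length))
        for start, length in zip(depth_start, thick)]
print(bars)
# [(90, 30, 'tab:brown'), (70, 20, 'tab:blue'), (40, 10, 'tab:green')]
```

Each tuple can then be drawn with ax.broken_barh([(start, length)], (10, 0.8), facecolors=colour), which is the lookup-table idea of Version 3 expressed as a function.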
I would like to add error bars to my plot so that I can show the min and max of each bar. Can anyone please help me? Thanks in advance.
The min and max values are as follows:
Delay = (53.46 (min 0, max60) , 36.22 (min 12,max 70), 83 (min 21,max 54), 17 (min 12,max 70))
Latency = (38 (min 2,max 70), 44 (min 12,max 87), 53 (min 9,max 60), 10 (min 11,max 77))
import matplotlib.pyplot as plt
import pandas as pd
from pandas import DataFrame
from matplotlib.dates import date2num
import datetime
Delay = (53.46, 36.22, 83, 17)
Latency = (38, 44, 53, 10)
index = ['T=0', 'T=26', 'T=50','T=900']
df = pd.DataFrame({'Delay': Delay, 'Latency': Latency}, index=index)
ax = df.plot.bar(rot=0)
plt.xlabel('Time')
plt.ylabel('(%)')
plt.ylim(0, 101)
plt.savefig('TestX.png', dpi=300, bbox_inches='tight')
plt.show()
In order to plot in the correct location on a bar plot, the patch data for each bar must be extracted.
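Because subplots=True is not used here, df.plot.bar returns a single matplotlib.axes.Axes (an ndarray with one Axes per column is returned only with subplots=True).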
In the case of this figure, ax.patches contains 8 matplotlib.patches.Rectangle objects, one for each segment of each bar.
By using the associated methods for this object, the height, width, and x locations can be extracted, and used to draw a line with plt.vlines.
The height of the bar is used to extract the correct min and max value from dict, z.
Unfortunately, the patch data does not contain the bar label (e.g. Delay & Latency).
import pandas as pd
import matplotlib.pyplot as plt
# create dataframe
Delay = (53.46, 36.22, 83, 17)
Latency = (38, 44, 53, 10)
index = ['T=0', 'T=26', 'T=50','T=900']
df = pd.DataFrame({'Delay': Delay, 'Latency': Latency}, index=index)
# dicts with errors
Delay_error = {53.46: {'min': 0,'max': 60}, 36.22: {'min': 12,'max': 70}, 83: {'min': 21,'max': 54}, 17: {'min': 12,'max': 70}}
Latency_error = {38: {'min': 2, 'max': 70}, 44: {'min': 12,'max': 87}, 53: {'min': 9,'max': 60}, 10: {'min': 11,'max': 77}}
# combine them; providing all the keys are unique
z = {**Delay_error, **Latency_error}
# plot
ax = df.plot.bar(rot=0)
plt.xlabel('Time')
plt.ylabel('(%)')
plt.ylim(0, 101)
for p in ax.patches:
    x = p.get_x()  # get the bottom left x corner of the bar
    w = p.get_width()  # get width of bar
    h = p.get_height()  # get height of bar
    min_y = z[h]['min']  # use h to get min from dict z
    max_y = z[h]['max']  # use h to get max from dict z
    plt.vlines(x+w/2, min_y, max_y, color='k')  # draw a vertical line
If there are non-unique values in the two dicts, so they can't be combined, we can select the correct dict based on the bar plot order.
All the bars for a single label are plotted first.
In this case, indices 0-3 are the Delay bars, and 4-7 are the Latency bars.
for i, p in enumerate(ax.patches):
    print(i, p)
    x = p.get_x()
    w = p.get_width()
    h = p.get_height()
    if i < len(ax.patches)/2:  # select which dictionary to use
        d = Delay_error
    else:
        d = Latency_error
    min_y = d[h]['min']
    max_y = d[h]['max']
    plt.vlines(x+w/2, min_y, max_y, color='k')
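A variation that avoids both the height-keyed dicts and the index arithmetic is to walk ax.containers, where each bar series is stored as a container that carries its label. The sketch below is an assumption-laden rewrite rather than the approach above: it draws the grouped bars with plain matplotlib (ax.bar with an explicit label) instead of pandas, stores the (min, max) pairs in a hypothetical ranges dict in bar order, and uses the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

index = ['T=0', 'T=26', 'T=50', 'T=900']
values = {'Delay': (53.46, 36.22, 83, 17), 'Latency': (38, 44, 53, 10)}
# (min, max) per bar, in the same order as the values (hypothetical layout).
ranges = {'Delay': [(0, 60), (12, 70), (21, 54), (12, 70)],
          'Latency': [(2, 70), (12, 87), (9, 60), (11, 77)]}

fig, ax = plt.subplots()
width = 0.4
for k, (name, heights) in enumerate(values.items()):
    # Shift each series half a bar width left or right of the group centre.
    xs = [i + (k - 0.5) * width for i in range(len(index))]
    ax.bar(xs, heights, width=width, label=name)
ax.set_xticks(list(range(len(index))))
ax.set_xticklabels(index)
ax.set_ylim(0, 101)

# Each container holds the patches of one series plus its label.
for container in ax.containers:
    name = container.get_label()
    for patch, (min_y, max_y) in zip(container.patches, ranges[name]):
        ax.vlines(patch.get_x() + patch.get_width() / 2, min_y, max_y, color='k')
```

Because the lookup goes by series label and bar position, duplicate bar heights within or across series are not a problem.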
Some zipping and stacking will suffice—see bar_min_maxs below. Simplifying and slightly generalizing Trenton's code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# create dataframe
Delay = (53.46, 36.22, 83, 17)
Latency = (38, 44, 53, 10)
index = ['T=0', 'T=26', 'T=50','T=900']
df = pd.DataFrame({'Delay': Delay, 'Latency': Latency,
'Delay_min': (0, 12, 21, 12), # supply min and max
'Delay_max': (60, 70, 54, 70),
'Latency_min': (2, 12, 9, 11),
'Latency_max': (70, 87, 60, 77)},
index=index)
# plot
ax = df[['Delay', 'Latency']].plot.bar(rot=0)
plt.xlabel('Time')
plt.ylabel('(%)')
plt.ylim(0, 101)
# bar_min_maxs[i] is bar/patch i's min, max
bar_min_maxs = np.vstack((list(zip(df['Delay_min'], df['Delay_max'])),
list(zip(df['Latency_min'], df['Latency_max']))))
assert len(bar_min_maxs) == len(ax.patches)
for patch, (min_y, max_y) in zip(ax.patches, bar_min_maxs):
    plt.vlines(patch.get_x() + patch.get_width()/2,
               min_y, max_y, color='k')
And if errorbars are expressed through margins of errors instead of mins and maxs, i.e., the errorbar is centered at the bar's height w/ length 2 x margin of error, then here's code to plot those:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# create dataframe
Delay = (53.46, 36.22, 83, 17)
Latency = (38, 44, 53, 10)
index = ['T=0', 'T=26', 'T=50','T=900']
df = pd.DataFrame({'Delay': Delay, 'Latency': Latency,
'Delay_moe': (5, 15, 25, 35), # supply margin of error
'Latency_moe': (10, 20, 30, 40)},
index=index)
# plot
ax = df[['Delay', 'Latency']].plot.bar(rot=0)
plt.xlabel('Time')
plt.ylabel('(%)')
plt.ylim(0, 101)
# bar_moes[i] is bar/patch i's margin of error, i.e., half the length of an
# errorbar centered at the bar's height
bar_moes = np.ravel(df[['Delay_moe', 'Latency_moe']].values.T)
assert len(bar_moes) == len(ax.patches)
for patch, moe in zip(ax.patches, bar_moes):
    height = patch.get_height()  # of bar
    min_y, max_y = height - moe, height + moe
    plt.vlines(patch.get_x() + patch.get_width()/2,
               min_y, max_y, color='k')
One minor statistical note: if the difference b/t the two groups (Delay and Latency for each T=t) is of interest, then add a plot for the difference with an errorbar for the difference. A plot like the one above is not sufficient for directly analyzing differences; if, e.g., the two errorbars overlap at T=0, this does not imply that the difference b/t Delay and Latency is not statistically significant at whatever level was used. (Though if they don't overlap, then the difference is statistically significant.)
Hi, I have a dict with a 3-int tuple representing a color (as key) and an int representing the number of occurrences of that color in an image (as value).
For example, this is a 4x4-pixel image with 3 colors:
{(87, 82, 44): 1, (255, 245, 241): 11, (24, 13, 9): 4}
I want to plot a pie chart of the list [1, 11, 4] in which each slice is colored with the corresponding color. How can I do this?
Update: the other answer from Paul is much better but there's not really any point in me just editing my original answer until it's essentially the same :) (I can't delete this answer because it's accepted.)
Does this do what you want? I just took an example from the matplotlib documentation and turned your data into parameters that pie() expects:
# This is a trivial modification of the example here:
# http://matplotlib.sourceforge.net/examples/pylab_examples/pie_demo.html
from pylab import *
data = {(87, 82, 44): 1, (255, 245, 241): 11, (24, 13, 9): 4}
colors = []
counts = []
for color, count in data.items():
    colors.append([float(x)/255 for x in color])
    counts.append(count)
figure(1, figsize=(6,6))
pie(counts, colors=colors, autopct='%1.1f%%', shadow=True)
title('Example Pie Chart', bbox={'facecolor':'0.8', 'pad':5})
show()
The result looks like this:
Mark beat me by 5 minutes, so points should go to him, but here's my (nearly identical, but more terse) answer anyway:
from matplotlib import pyplot
data = {(87, 82, 44): 1, (255, 245, 241): 11, (24, 13, 9): 4}
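# list() is needed on Python 3, where keys() and values() return views rather than lists
colors, values = list(data.keys()), list(data.values())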
# matplotlib wants colors as 0.0-1.0 floats, not 0-255 ints
colors = [tuple(i/255. for i in c) for c in colors]
pyplot.pie(values, colors=colors)
pyplot.show()