I really don't understand what's going wrong with this... I've looked through what is pretty simple data several times and have restarted the kernel (running on Jupyter Notebook) and nothing seems to be solving it.
Here's the data frame I have (sorry I know the numbers look a bit silly, this is a really sparse dataset over a long time period, original is reindexed for 20 years):
DATE NODP NVP VP VDP
03/08/2002 0.083623 0.10400659 0.81235517 1.52458E-05
14/09/2003 0.24669167 0.24806379 0.5052293 1.52458E-05
26/07/2005 0.15553726 0.13324796 0.7111538 0.000060983
20/05/2006 0 0.23 0.315 0.455
05/06/2007 0.21280034 0.29139224 0.49579217 1.52458E-05
21/02/2010 0 0.55502195 0.4449628 1.52458E-05
09/04/2011 0.09531311 0.17514162 0.72954527 0
14/02/2012 0.19213217 0.12866237 0.67920546 0
17/01/2014 0.12438848 0.10297326 0.77263826 0
24/02/2017 0.01541347 0.09897548 0.88561105 0
Note that all of the rows add up to 1! I have triple, quadruple checked this...XD
I am trying to produce a stacked bar chart of this data, with the following code, which seems to have worked perfectly for everything else I have been using it for:
NODP = df['NODP']
NVP = df['NVP']
VDP = df['VDP']
VP = df['VP']
ind = np.arange(len(df.index))
width = 5.0
p1 = plt.bar(ind, NODP, width, label = 'NODP', bottom=NVP, color= 'grey')
p2 = plt.bar(ind, NVP, width, label = 'NVP', bottom=VDP, color= 'tan')
p3 = plt.bar(ind, VDP, width, label = 'VDP', bottom=VP, color= 'darkorange')
p4 = plt.bar(ind, VP, width, label = 'VP', color= 'darkgreen')
plt.ylabel('Ratio')
plt.xlabel('Year')
plt.title('Ratio change',x=0.06,y=0.8)
plt.xticks(np.arange(min(ind), max(ind)+1, 6.0), labels=xlabels) #the xticks were cumbersome so not included in this example code
plt.legend()
Which gives me the following plot:
As is evident, 1) NODP is not showing up at all, and 2) the remainder of them are being plotted with the wrong proportions...
I really don't understand what is wrong, it should be really simple, right?! I'm sorry if it is really simple, it's probably right under my nose. Any ideas greatly appreciated!
If you want to create stacked bars this way (so standard matplotlib without using pandas or seaborn for plotting), the bottom needs to be the sum of all the lower bars.
Here is an example with the given data.
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
columns = ['DATE', 'NODP', 'NVP', 'VP', 'VDP']
data = [['03/08/2002', 0.083623, 0.10400659, 0.81235517, 1.52458E-05],
['14/09/2003', 0.24669167, 0.24806379, 0.5052293, 1.52458E-05],
['26/07/2005', 0.15553726, 0.13324796, 0.7111538, 0.000060983],
['20/05/2006', 0, 0.23, 0.315, 0.455],
['05/06/2007', 0.21280034, 0.29139224, 0.49579217, 1.52458E-05],
['21/02/2010', 0, 0.55502195, 0.4449628, 1.52458E-05],
['09/04/2011', 0.09531311, 0.17514162, 0.72954527, 0],
['14/02/2012', 0.19213217, 0.12866237, 0.67920546, 0],
['17/01/2014', 0.12438848, 0.10297326, 0.77263826, 0],
['24/02/2017', 0.01541347, 0.09897548, 0.88561105, 0]]
df = pd.DataFrame(data=data, columns=columns)
ind = pd.to_datetime(df.DATE, format='%d/%m/%Y')  # the dates are day/month/year
NODP = df.NODP.to_numpy()
NVP = df.NVP.to_numpy()
VP = df.VP.to_numpy()
VDP = df.VDP.to_numpy()
width = 120
p1 = plt.bar(ind, NODP, width, label='NODP', bottom=NVP+VDP+VP, color='grey')
p2 = plt.bar(ind, NVP, width, label='NVP', bottom=VDP+VP, color='tan')
p3 = plt.bar(ind, VDP, width, label='VDP', bottom=VP, color='darkorange')
p4 = plt.bar(ind, VP, width, label='VP', color='darkgreen')
plt.ylabel('Ratio')
plt.xlabel('Year')
plt.title('Ratio change')
plt.yticks(np.arange(0, 1.001, 0.1))
plt.legend()
plt.show()
Note that in this case the x-axis is measured in days, and each bar is located at its date. This shows the relative spacing between the dates, in case that is important. If it isn't, the x-positions can be chosen equidistant and labeled with the dates column.
To do so with standard matplotlib, the following lines would change:
ind = range(len(df))
width = 0.8
plt.xticks(ind, df.DATE, rotation=20)
plt.tight_layout() # needed to show the full labels of the x-axis
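The rule above (each bottom is the sum of everything already drawn beneath it) can also be kept as a running total instead of being written out per bar. Here is a minimal sketch, assuming only the first three rows of the question's data; the names `series` and `bottom` are illustrative, and the `Agg` backend line is only there so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np

# First three rows of the question's data, listed in stacking order (lowest first)
series = {
    "VP": np.array([0.81235517, 0.5052293, 0.7111538]),
    "VDP": np.array([1.52458e-05, 1.52458e-05, 0.000060983]),
    "NVP": np.array([0.10400659, 0.24806379, 0.13324796]),
    "NODP": np.array([0.083623, 0.24669167, 0.15553726]),
}
colors = {"VP": "darkgreen", "VDP": "darkorange", "NVP": "tan", "NODP": "grey"}

x = np.arange(3)
bottom = np.zeros(3)  # running height of everything drawn so far

fig, ax = plt.subplots()
for name, heights in series.items():
    ax.bar(x, heights, 0.8, bottom=bottom, label=name, color=colors[name])
    bottom = bottom + heights  # the next layer starts where this one ends
ax.legend()
```

After the loop, `bottom` holds the total height of each stack, which should be close to 1 for every row if the ratios are consistent.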
Plot the dataframe
# using your data above
df.DATE = pd.to_datetime(df.DATE, format='%d/%m/%Y')  # the dates are day/month/year
df.set_index('DATE', inplace=True)
ax = df.plot(stacked=True, kind='bar', figsize=(12, 8))
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
# sets the tick labels so time isn't included
ax.xaxis.set_major_formatter(plt.FixedFormatter(df.index.to_series().dt.strftime("%Y-%m-%d")))
plt.show()
Add labels for clarity
By adding the following code before plt.show() you can add text annotations to the bars
# .patches is everything inside of the chart
for rect in ax.patches:
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()
    # The height of the bar is the data value and can be used as the label
    label_text = f'{height:.2f}'
    label_x = x + width - 0.125
    label_y = y + height / 2
    # don't include label if it's equivalently 0
    if height > 0.001:
        ax.text(label_x, label_y, label_text, ha='right', va='center', fontsize=8)
plt.show()
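The same annotation can probably be written more compactly with `Axes.bar_label`, assuming Matplotlib 3.4 or newer is available. This is only a sketch on two rows of the question's data; the threshold of 0.001 is carried over from the loop above, and `all_labels` is an illustrative name:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Two rows of the question's data, indexed by date
df = pd.DataFrame(
    {"NODP": [0.083623, 0.0],
     "NVP": [0.10400659, 0.23],
     "VP": [0.81235517, 0.315],
     "VDP": [1.52458e-05, 0.455]},
    index=["2002-08-03", "2006-05-20"],
)

ax = df.plot(kind="bar", stacked=True)

all_labels = []
# pandas creates one bar container per column
for container in ax.containers:
    # an empty string suppresses the label for bars that are effectively 0
    labels = [f"{rect.get_height():.2f}" if rect.get_height() > 0.001 else ""
              for rect in container]
    all_labels.extend(labels)
    ax.bar_label(container, labels=labels, label_type="center", fontsize=8)
```

With `label_type="center"` the text is placed in the middle of each segment, so no manual x/y arithmetic is needed.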
I am plotting a grouped bar plot out of the data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'x':[0.716468, 0.799652, 0.611284, 0.700020, 0.745372, 0.717280, 0.212407, 0.225291, 0.443395, 0.649912, 0.756463, 0.588992],
'y':[1.891988, 2.750937, 4.495497, 5.260436, 6.100882, 6.262784, 7.339279, 6.877465, 6.349050, 4.797649, 3.290293, 2.106541],
'x_err':[0.022882, 0.021447, 0.009402, 0.011324, 0.008872, 0.015882, 0.009615, 0.007617, 0.012816, 0.010310, 0.009213, 0.020137],
'y_err':[0.156298, 0.151681, 0.178215, 0.143700, 0.137071, 0.133951, 0.209588, 0.185246, 0.214665, 0.214598, 0.163624, 0.132138]})
with the following code:
fig, ax = plt.subplots()
width = 0.35
df['x'].plot(kind = 'bar',ax = ax ,width = width, position = 0 , yerr = df['x_err'],color = 'red',use_index = True)
ax.set_ylabel('X')
ax1= ax.twinx()
df['y'].plot(kind = 'bar',ax = ax1 ,width = width, position = 1 , yerr = df['y_err'],color = 'blue',use_index = True)
ax1.set_ylabel('Y')
plt.show()
and got the following plot:
The plot is okay except that the red bar in the last group (group 11, shown by the arrow) is only half visible. I know that reducing the width makes it visible, but then the bars become thin, which I do not want. As you can see, there is still plenty of space between successive groups; I want to reduce that gap and show all bars clearly.
Any help would be highly appreciated.
That's because the position argument shifts the bars relative to their tick, so the last bar extends past the default x-limits. Try adjusting the x-limits:
# other plotting functions
# ...
xlims = ax1.get_xlim()
ax1.set_xlim(xlims[0], xlims[1] + width)
plt.show()
Output:
I was trying to reproduce this plot with Matplotlib:
Looking at the documentation, I found that the closest thing is a grouped bar chart. The problem is that I have a different number of "bars" for each category (subject, illumination, ...), whereas the example provided by matplotlib has only 2 classes (M, F) for each category (G1, G2, G3, ...). I don't know exactly where to start; does anyone here have a clue? I think in this case the trick they use to specify the bar locations:
x = np.arange(len(labels)) # the label locations
width = 0.35 # the width of the bars
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, men_means, width, label='Men')
rects2 = ax.bar(x + width/2, women_means, width, label='Women')
does not work at all as in the second class (for example) there is a different number of bars. It would be awesome if anyone could give me an idea. Thank you in advance!
Supposing the data resides in a dataframe, the bars can be generated by looping through the categories:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# first create some test data, similar in structure to the question's
categories = ['Subject', 'Illumination', 'Location', 'Daytime']
rows = []
for cat in categories:
    for _ in range(np.random.randint(2, 7)):
        rows.append({'Category': cat,
                     'Class': "".join(np.random.choice([*'tuvwxyz'], 10)),
                     'Value': np.random.uniform(10, 17)})
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
df = pd.DataFrame(rows, columns=['Category', 'Class', 'Value'])
fig, ax = plt.subplots()
start = 0 # position for first label
gap = 1 # gap between labels
labels = [] # list for all the labels
label_pos = np.array([]) # list for all the label positions
# loop through the categories of the dataframe
# provide a list of colors (at least as long as the expected number of categories)
for (cat, df_cat), color in zip(df.groupby('Category', sort=False), ['navy', 'orange'] * len(df)):
    num_in_cat = len(df_cat)
    # add a text for the category, centered over its bars, using "axes coordinates" for the y-axis
    ax.text(start + (num_in_cat - 1) / 2, 0.95, cat, ha='center', va='top', transform=ax.get_xaxis_transform())
    # positions for the labels of the current category
    this_label_pos = np.arange(start, start + num_in_cat)
    # create bars at the desired positions
    ax.bar(this_label_pos, df_cat['Value'], color=color)
    # store labels and their positions
    labels += df_cat['Class'].to_list()
    label_pos = np.append(label_pos, this_label_pos)
    start += num_in_cat + gap
# set the positions for the labels
ax.set_xticks(label_pos)
# set the labels
ax.set_xticklabels(labels, rotation=30)
# optionally set a new lower position for the y-axis
ax.set_ylim(ymin=9)
# optionally reduce the margin left and right
ax.margins(x=0.01)
plt.tight_layout()
plt.show()
I am creating a figure with a title, 16 subplots, and a legend. I cannot for the life of me get it to save nicely. I am going to try my best to explain my predicament, but my vocabulary may not be correct, so I apologize in advance.
If I run my code (end) I receive the following output:
That is not pretty, everything is overlapping or cut off. If I were to add plt.savefig() that is what I get.
I can drag the corners of the pop-up window and that gives me a very nicely spaced figure, and is precisely what I want:
However, the save function at the bottom of that pop-up window does not always work, and I would much rather create a nice figure in my code that I send to the plt.savefig() function.
In all my searches I keep seeing tight_layout recommended as a fix for this. The issue is that it adjusts my plot sizes rather than the spacing between plots, so my titles overlap and my data is less visible:
I have also tried constrained_layout, with zero success.
I am really hoping there is an obvious solution I am missing, as taking screen shots of the plot isn't really working for me.
eq_csv = r'/here/is/the/file.csv'
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
eq_df = pd.read_csv(eq_csv)
eq_data = eq_df[['LON', 'LAT', 'DEPTH', 'MAG']] # retrieve only the columns I need
eq_data = eq_data.sort_values(['MAG'], ascending=False)
# Get the NSEW boundaries and the Magnitude min and max
nbound = max(eq_data.LAT) + 0.05
sbound = min(eq_data.LAT) - 0.05
ebound = max(eq_data.LON) + 0.01
wbound = min(eq_data.LON)
xlimit = (wbound, ebound)
ylimit = (sbound, nbound)
magmin = min(eq_data.MAG)
magmax = max(eq_data.MAG)
# Loop through depth slices and create a 4 x 4 figure of subplots
fig, axes = plt.subplots(4,4)
for ax in axes.flat:
    for n in list(range(1, 17)):
        km = eq_data[(eq_data.DEPTH > n - 1) & (eq_data.DEPTH <= n)]
        km = km.sort_values(['MAG'], ascending=True)
        plt.subplot(4, 4, n) # plot a 4x4 sub plot at the nth location
        scatter = plt.scatter(km["LON"], km['LAT'], s=10, c=km['MAG'], vmin=magmin, vmax=magmax, alpha = 0.5)
        plt.ylim(sbound, nbound)
        plt.xlim(wbound, ebound)
        plt.tick_params(axis='both', which='major', labelsize=4)
        plt.yticks(rotation = 90)
        plt.ylabel('Latitude', rotation = 90, size = 6)
        plt.xlabel('Longitude', size = 6)
        plt.subplots_adjust(hspace=0.65, wspace=0.25)
        plt.gca().set_title('Depth = ' + str(n - 1) + 'km to ' + str(n) + 'km', size=8, fontweight = 'bold') # set title of subplots
plt.suptitle('Magnitude of Events at Different Depth Slices, 1950 to Today', size = 20, fontweight = 'bold')
plt.tight_layout()
fig.subplots_adjust(right=0.8) #adust location of plot
cbar_ax = fig.add_axes([0.85, 0.15, 0.01, 0.7]) #location of color bar
cbar = fig.colorbar(scatter, cax=cbar_ax)
cbar.set_alpha(1)
cbar.set_label('Magnitude', rotation = 270, labelpad = 10)
cbar.draw_all()
plt.show()
plt.savefig('save/location')
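Two details in the code above are likely to matter. Dragging the window corners only enlarges the figure, which can be requested up front with `figsize`, and calling `plt.savefig()` after `plt.show()` typically writes an empty image because the figure is gone once the window is closed. The following is a rough sketch rather than a tested fix for the original data: it uses randomly generated stand-in values (the real CSV is not available), an assumed figure size of 16 x 12 inches, constrained layout, and a temporary output path.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for the LON / LAT / DEPTH / MAG columns of the CSV
rng = np.random.default_rng(0)
lon = rng.uniform(-120.0, -119.0, 400)
lat = rng.uniform(35.0, 36.0, 400)
depth = rng.uniform(0.0, 16.0, 400)
mag = rng.uniform(1.0, 6.0, 400)

# A large figure plus constrained layout leaves room for titles and labels
fig, axes = plt.subplots(4, 4, figsize=(16, 12), sharex=True, sharey=True,
                         constrained_layout=True)

# One depth slice per subplot: (0, 1], (1, 2], ..., (15, 16]
for n, ax in enumerate(axes.flat, start=1):
    in_slice = (depth > n - 1) & (depth <= n)
    scatter = ax.scatter(lon[in_slice], lat[in_slice], s=10, c=mag[in_slice],
                         vmin=mag.min(), vmax=mag.max(), alpha=0.5)
    ax.set_title(f"Depth = {n - 1}km to {n}km", size=8, fontweight="bold")

fig.suptitle("Magnitude of Events at Different Depth Slices", size=20,
             fontweight="bold")

# One shared colorbar; constrained layout makes room for it next to the grid
cbar = fig.colorbar(scatter, ax=axes, shrink=0.8)
cbar.set_label("Magnitude", rotation=270, labelpad=10)

# Save before any call to plt.show()
out_path = os.path.join(tempfile.gettempdir(), "depth_slices.png")
fig.savefig(out_path, dpi=200)
```

Working on the `Axes` objects returned by `plt.subplots` also avoids the nested loop with `plt.subplot(4, 4, n)`, which redraws every slice once per axes.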
I have this image from Matplotlib :
For each category (cat i, with i in 1 to 10 in the figure), I would like to write the highest value and its corresponding legend entry on the plot.
Below you can see what I would like to achieve:
I don't know whether this is possible, given the way matplotlib draws the bars.
Basically, this is the part of the code for drawing multiple bars :
# create plot
fig, ax = plt.subplots(figsize = (9,9))
index = np.arange(len_category)
if multiple:
    bar_width = 0.3
else:
    bar_width = 1.5
opacity = 1.0
#test_array contains test1 and test2
cmap = get_cmap(len(test_array))
for i in range(len(test_array)):
    count = count + 1
    current_label = test_array[i]
    rects = plt.bar(index-0.2+bar_width*i, score_array[i], bar_width, alpha=opacity, color=np.random.rand(3,1),label=current_label )
plt.xlabel('Categories')
plt.ylabel('Scores')
plt.title('Scores by Categories')
plt.xticks(index + bar_width, categories_array)
plt.legend()
plt.tight_layout()
plt.show()
This is the part I added to try to achieve that. However, it finds the maximum over all bars of each test. For example, the max of test1 will be in cat10 and the max of test2 will be in cat2. Instead, I would like to have the max for each category.
for i in range(len(test_array)):
    count = count + 1
    current_label = test_array[i]
    rects = plt.bar(index-0.2+bar_width*i, score_array[i], bar_width,alpha=opacity,color=np.random.rand(3,1),label=current_label )
    max_score_current = max(score_array[i])
    list_rect = list()
    max_height = 0
    #The id of the rectangle who get the highest score
    max_idx = 0
    for idx,rect in enumerate(rects):
        list_rect.append(rect)
        height = rect.get_height()
        if height > max_height:
            max_height = height
            max_idx = idx
    highest_rect = list_rect[max_idx]
    plt.text(highest_rect.get_x() + highest_rect.get_width()/2.0, max_height, str(test_array[i]),color='blue', fontweight='bold')
    del list_rect[:]
Do you have any idea how I can achieve that?
Thank you
It is usually better to keep data generation and visualization separate. Instead of looping through the bars themselves, get the necessary data prior to plotting. This makes everything much simpler.
So first create a list of labels to use, and then loop over the positions to annotate them. In the code below, the labels are created by mapping the argmax of each row of the array to the test set via a dictionary.
import numpy as np
import matplotlib.pyplot as plt
test1 = [6,4,5,8,3]
test2 = [4,5,3,4,6]
labeldic = {0:"test1", 1:"test2"}
a = np.c_[test1,test2]
maxi = np.max(a, axis=1)
l = ["{} {}".format(i,labeldic[j]) for i,j in zip(maxi, np.argmax(a, axis=1))]
for i in range(a.shape[1]):
    plt.bar(np.arange(a.shape[0])+(i-1)*0.3, a[:,i], width=0.3, align="edge",
            label = labeldic[i])
for i in range(a.shape[0]):
    plt.annotate(l[i], xy=(i,maxi[i]), xytext=(0,10),
                 textcoords="offset points", ha="center")
plt.margins(y=0.2)
plt.legend()
plt.show()
From your question it is not entirely clear what you want to achieve, but assuming that you want the relative height of each bar in one group printed above that bar, here is one way to achieve that:
from matplotlib import pyplot as plt
import numpy as np
score_array = np.random.rand(2,10)
index = np.arange(score_array.shape[1])
test_array=['test1','test2']
opacity = 1
bar_width = 0.25
for i,label in enumerate(test_array):
    rects = plt.bar(index-0.2+bar_width*i, score_array[i], bar_width,alpha=opacity,label=label)
    heights = [r.get_height() for r in rects]
    print(heights)
    rel_heights = [h/max(heights) for h in heights]
    idx = heights.index(max(heights))
    for i,(r,h, rh) in enumerate(zip(rects, heights, rel_heights)):
        plt.text(r.get_x() + r.get_width()/2.0, h, '{:.2}'.format(rh), color='b', fontweight ='bold', ha='center')
plt.show()
The result looks like this:
I have just started learning Python and I am using the Titanic data set to practice.
I am not able to create a grouped bar chart; it is giving me this error:
'incompatible sizes: argument 'height' must be length 2 or scalar'
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("Titanic/train.csv")
top_five = df.head(5)
print(top_five)
column_no = df.columns
print(column_no)
female_count = len([p for p in df["Sex"] if p == 'female'])
male_count = len([i for i in df["Sex"] if i == 'male'])
have_survived= len([m for m in df["Survived"] if m == 1])
not_survived = len([n for n in df["Survived"] if n == 0])
plt.bar([0],female_count, color ='b')
plt.bar([1],male_count,color = 'y')
plt.xticks([0+0.2,1+0.2],['females','males'])
plt.show()
plt.bar([0],not_survived, color ='r')
plt.bar([1],have_survived, color ='g')
plt.xticks([0+0.2,1+0.2],['not_survived','have_survived'])
plt.show()
It works fine up to here, and I get two individual charts.
Instead, I want one chart that displays bars for male and female and color codes the bars based on survival.
This does not seem to work:
N = 2
index = np.arange(N)
bar_width = 0.35
plt.bar(index, have_survived, bar_width, color ='b')
plt.bar(index + bar_width, not_survived, bar_width,color ='r',)
plt.xticks([0+0.2,1+0.2],['females','males'])
plt.legend()
Thanks in advance!!
How about replacing your second block of code (the one that raises a ValueError) with this:
bar_width = 0.35
tot_people_count = (female_count + male_count) * 1.0
plt.bar(0, female_count, bar_width, color ='b')
plt.bar(1, male_count, bar_width, color ='y',)
plt.bar(0, have_survived/tot_people_count*female_count, bar_width, color='r')
plt.bar(1, have_survived/tot_people_count*male_count, bar_width, color='g')
plt.xticks([0+0.2,1+0.2],['females','males'])
plt.legend(['female deceased', 'male deceased', 'female survivors', 'male survivors'],
loc='best')
I get this bar graph as an output,
The reason for the error you get is that the left and height parameters of plt.bar (left is called x in newer Matplotlib versions) must either have the same length as each other, or one (or both) of them must be a scalar. That is why changing index in your code to the simple scalars 0 and 1 fixes the error.
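Note that the scaling above assumes the survival rate is the same for both sexes. If the actual counts per sex are wanted, one possible approach is to cross-tabulate the two columns and let pandas draw the grouped bars. This is a sketch on a tiny made-up table, assuming only the `Sex` and `Survived` columns used in the question; the six rows are hypothetical and are not real Titanic figures:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature table with the two columns used in the question
df = pd.DataFrame({
    "Sex": ["female", "male", "male", "female", "male", "female"],
    "Survived": [1, 0, 0, 1, 1, 0],
})

# Rows: sex, columns: survival flag (0 then 1), values: passenger counts
counts = pd.crosstab(df["Sex"], df["Survived"])
counts.columns = ["not_survived", "have_survived"]

# One group of bars per sex, one colored bar per survival outcome
ax = counts.plot(kind="bar", color=["r", "g"], rot=0)
ax.set_ylabel("Passengers")
```

With the real data, `df` would come from `pd.read_csv("Titanic/train.csv")` as in the question, and the rest should stay the same.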