I have the following dataframe, which ranks the equipment in operation from 1 to 300 (1 is the best, 300 is the worst) over a few days (the dataframe columns):
Equipment 21-03-27 21-03-28 21-03-29 21-03-30 21-03-31 21-04-01 21-04-02
P01-INV-1-1 1 1 1 1 1 2 2
P01-INV-1-2 2 2 4 4 5 1 1
P01-INV-1-3 4 4 3 5 6 10 10
I would like to customize a line plot (example found here), but I'm having trouble modifying the example code provided:
import matplotlib.pyplot as plt
import numpy as np
def energy_rank(data, marker_width=0.1, color='blue'):
    y_data = np.repeat(data, 2)
    x_data = np.empty_like(y_data)
    x_data[0::2] = np.arange(1, len(data)+1) - (marker_width/2)
    x_data[1::2] = np.arange(1, len(data)+1) + (marker_width/2)
    lines = []
    lines.append(plt.Line2D(x_data, y_data, lw=1, linestyle='dashed', color=color))
    for x in range(0,len(data)*2, 2):
        lines.append(plt.Line2D(x_data[x:x+2], y_data[x:x+2], lw=2, linestyle='solid', color=color))
    return lines
data = ranks.head(4).to_numpy() #ranks is the above dataframe
artists = []
for row, color in zip(data, ('red','blue','green','magenta')):
    artists.extend(energy_rank(row, color=color))
fig, ax = plt.subplots()
ax.set_xticklabels(ranks.columns) # set X axis to be dataframe columns
ax.set_xticklabels(ax.get_xticklabels(), rotation=35, fontsize = 10)
for artist in artists:
    ax.add_artist(artist)
ax.set_ybound([15,0])
ax.set_xbound([.5,8.5])
When using ax.set_xticklabels(ranks.columns), for some reason it only shows 5 of the 7 days from the ranks columns, dropping the first and last values specifically. I tried duplicating those values, but that did not work either. This is the plot I end up with:
In summary, I would like to know if it's possible to do 3 customizations:
1. Show all dates from the ranks columns on the X axis.
2. Invert the Y axis. ax.set_ybound([15,0]) is not working. It would make more sense to see the graph starting with 0 at the top, since 1 is the most important rank to look at.
3. Add a label at the end of each line, at the last day (the last value on the X axis). I could add a legend box, but it often gets really messy when you plot more data, so adding just the text at the end of each line would look much cleaner.
Please let me know if those customizations are impossible to do. Any help is really appreciated. Thank you in advance!
To show all the dates, set the tick positions and their labels together with plt.xticks(), and have set_xbound start at 0. To reverse the y axis, use ax.set_ylim(ax.get_ylim()[::-1]). To label the lines the way you described, use annotations and place each one at the last data point of its series.
fig, ax = plt.subplots()
plt.xticks(np.arange(len(ranks.columns)), list(ranks.columns), rotation = 35, fontsize = 10)
plt.xlabel('Date')
plt.ylabel('Rank')
for artist in artists:
    ax.add_artist(artist)
ax.set_ybound([0,15])
ax.set_ylim(ax.get_ylim()[::-1])
ax.set_xbound([0,8.5])
ax.annotate('Series 1', xy =(7.1, 2), color = 'red')
ax.annotate('Series 2', xy =(7.1, 1), color = 'blue')
ax.annotate('Series 3', xy =(7.1, 10), color = 'green')
plt.show()
Here is the plot for the three rows of data in your sample dataframe:
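If it helps, here is a self-contained sketch of the same three customizations that does not depend on the energy_rank helper. It is only an illustration under a few assumptions: the sample rows from the question are typed in by hand, Equipment is assumed to be the index, and the positions variable, the "_" marker and the axis limits are my own choices rather than anything from the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to work interactively
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question; Equipment is assumed to be the index.
ranks = pd.DataFrame(
    {
        "21-03-27": [1, 2, 4],
        "21-03-28": [1, 2, 4],
        "21-03-29": [1, 4, 3],
        "21-03-30": [1, 4, 5],
        "21-03-31": [1, 5, 6],
        "21-04-01": [2, 1, 10],
        "21-04-02": [2, 1, 10],
    },
    index=["P01-INV-1-1", "P01-INV-1-2", "P01-INV-1-3"],
)

positions = list(range(1, len(ranks.columns) + 1))  # x = 1..7, one per day

fig, ax = plt.subplots()
for (name, row), color in zip(ranks.iterrows(), ("red", "blue", "green")):
    # Dashed connector with a short horizontal bar at every day.
    ax.plot(positions, row.to_numpy(), linestyle="dashed", marker="_",
            markersize=12, markeredgewidth=2, color=color)
    # Put the series name just to the right of its last point.
    ax.annotate(name, xy=(positions[-1], row.iloc[-1]), xytext=(8, 0),
                textcoords="offset points", va="center", color=color)

ax.set_xticks(positions)                 # one tick per date
ax.set_xticklabels(ranks.columns, rotation=35, fontsize=10)
ax.set_xlim(0.5, len(positions) + 1.5)   # extra room on the right for the labels
ax.set_ylim(11, 0)                       # larger value first puts rank 1 at the top
```

Setting the tick positions and the tick labels in the same place keeps the dates tied to the data, and passing the limits to set_ylim in descending order is another way to invert the axis. Call plt.show() or fig.savefig(...) afterwards to see the result.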
I have a data frame (my_data) as follows:
0 2017-01 2017-02 2017-03 2017-04
0 S1 2 3 2 2
1 S2 2 0 2 0
2 S3 1 0 2 2
3 S4 3 2 2 2
4 … … … … …
5 … … … … …
6 S10 2 2 3 2
This data frame is the result of a classification problem evaluated on different dates for each sample (S1, ..., S10). To simplify the plotting, I encoded the confusion-matrix outcomes as numbers: 0 means 'TP', 1 means 'FP', 2 means 'TN' and 3 means 'FN'. Now I want to plot this data frame like the image below.
I should mention that I already asked this question, but nobody could help me, so I have tried to make the question easier to understand in the hope of getting help.
Unfortunately, I don't know of a way to plot one set of data with different markers, so you will have to plot over all your data separately.
You can use matplotlib to plot your data. I'm not sure how your data looks, but for a file with these contents:
2017-01,2017-02,2017-03,2017-04
2,3,2,2
2,0,2,0
1,0,2,2
3,2,2,2
2,2,3,2
You can use the following code to get the plot you want:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D  # needed for the legend handles below
fig, ax = plt.subplots()
df = pd.read_csv('dataframe.txt', parse_dates = True)
dates = list(df.columns.values) #get dates
number_of_dates = len(dates)
markers = ["o", "d", "^", "s"] #set marker shape
colors = ["g", "r", "m", "y"] #set marker color
# loop over the data in your dataframe
for i in range(df.shape[0]):
    # get a row of 1s, 2s, ... as you want your
    # data S1, S2, in one line on top of each other
    dataY = (i+1)*np.ones(number_of_dates)
    # get the data that will specify which marker to use
    data = df.loc[i]
    # plot dashed line first, setting it underneath markers with zorder
    plt.plot(dates, dataY, c="k", linewidth=1, dashes=[6, 2], zorder=1)
    # loop over each data point x is the date, y a constant number,
    # and data specifies which marker to use
    for _x, _y, _data in zip(dates, dataY, data):
        plt.scatter(_x, _y, marker=markers[_data], c=colors[_data], s=100, edgecolors="k", linewidths=0.5, zorder=2)
# label your ticks S1, S2, ...
ticklist = list(range(1,df.shape[0]+1))
l2 = [("S%s" % x) for x in ticklist]
ax.set_yticks(ticklist)
ax.set_yticklabels(l2)
# label order follows the encoding in the question: 0=TP, 1=FP, 2=TN, 3=FN
labels = ["TP","FP","TN","FN"]
legend_elements = []
for l,c, m in zip(labels, colors, markers):
    legend_elements.append(Line2D([0], [0], marker=m, color="w", label=l, markerfacecolor=c, markeredgecolor = "k", markersize=10))
ax.legend(handles=legend_elements, loc='upper right')
plt.show()
Plotting idea from this answer.
This results in a plot looking like this:
EDIT Added dashed line and outline for markers to look more like example in question.
EDIT2 Added legend.
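As a possible variation, you can call scatter once per class instead of once per point, which also lets the legend come from the label argument. This is only a sketch: the data is typed in from the sample file above rather than read from dataframe.txt, and the styles dictionary and the use of np.nonzero are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to work interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rows are samples S1, S2, ...; columns are dates.
# Values use the encoding from the question: 0=TP, 1=FP, 2=TN, 3=FN.
df = pd.DataFrame(
    [[2, 3, 2, 2], [2, 0, 2, 0], [1, 0, 2, 2], [3, 2, 2, 2], [2, 2, 3, 2]],
    columns=["2017-01", "2017-02", "2017-03", "2017-04"],
)
styles = {0: ("TP", "o", "g"), 1: ("FP", "d", "r"),
          2: ("TN", "^", "m"), 3: ("FN", "s", "y")}

values = df.to_numpy()
fig, ax = plt.subplots()

# One dashed guide line per sample, drawn underneath the markers.
for row in range(values.shape[0]):
    ax.axhline(row + 1, color="k", linewidth=1, dashes=[6, 2], zorder=1)

# One scatter call per class: find every cell holding that class code.
for code, (label, marker, color) in styles.items():
    rows, cols = np.nonzero(values == code)
    ax.scatter(cols, rows + 1, marker=marker, c=color, s=100,
               edgecolors="k", linewidths=0.5, zorder=2, label=label)

ax.set_xticks(range(values.shape[1]))
ax.set_xticklabels(df.columns)
ax.set_yticks(range(1, values.shape[0] + 1))
ax.set_yticklabels([f"S{n}" for n in range(1, values.shape[0] + 1)])
ax.legend(loc="upper right")
```

With four classes this needs four scatter calls no matter how many samples or dates there are, which may matter for larger data frames.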
Why is it not possible to get subplots of the columns of a pandas DataFrame using a heatmap inside a for-loop?
I am trying to create subplots of the columns of a pandas DataFrame inside a for-loop. I plot the result for every cycle, that is, for each block of 480 values, and I want the 3 subplots belonging to A, B and C side by side in one window. I've found only one answer here, which I'm afraid does not match my case: @euri10 answered by using flat.
My script is as follows:
# Import and call the needed libraries
import numpy as np
import pandas as pd
import os
import seaborn as sns
import matplotlib.pyplot as plt
'''
Take a list and create the formatted matrix
'''
def mkdf(ListOf480Numbers):
    normalMatrix = np.array_split(ListOf480Numbers,8) #Take a list and create 8 array (Sections)
    fixMatrix = []
    for i in range(8):
        lines = np.array_split(normalMatrix[i],6) #Split each section in lines (each line contains 10 cells from 0-9)
        newMatrix = [0,0,0,0,0,0] #Empty array to contain reordered lines
        for j in (1,3,5):
            newMatrix[j] = lines[j] #lines 1,3,5 remain equal
        for j in (0,2,4):
            newMatrix[j] = lines[j][::-1] #lines 2,4,6 are inverted
        fixMatrix.append(newMatrix) #After last update of format of table inverted (bottom-up zig-zag)
    return fixMatrix
'''
Print the matrix with the required format
'''
def print_df(fixMatrix):
    values = []
    for i in range(6):
        values.append([*fixMatrix[4][i], *fixMatrix[7][i]]) #lines form section 6 and 7 are side by side
    for i in range(6):
        values.append([*fixMatrix[5][i], *fixMatrix[6][i]]) #lines form section 4 and 5 are side by side
    for i in range(6):
        values.append([*fixMatrix[1][i], *fixMatrix[2][i]]) #lines form section 2 and 3 are side by side
    for i in range(6):
        values.append([*fixMatrix[0][i], *fixMatrix[3][i]]) #lines form section 0 and 1 are side by side
    df = pd.DataFrame(values)
    return (df)
'''
Normalizing Formula
'''
def normalize(value, min_value, max_value, min_norm, max_norm):
    new_value = ((max_norm - min_norm)*((value - min_value)/(max_value - min_value))) + min_norm
    return new_value
'''
Split data in three different lists A, B and C
'''
dft = pd.read_csv('D:\me4.TXT', header=None)
id_set = dft[dft.index % 4 == 0].astype('int').values
A = dft[dft.index % 4 == 1].values
B = dft[dft.index % 4 == 2].values
C = dft[dft.index % 4 == 3].values
data = {'A': A[:,0], 'B': B[:,0], 'C': C[:,0]}
#df contains all the data
df = pd.DataFrame(data, columns=['A','B','C'], index = id_set[:,0])
'''
Data generation phase
'''
#next iteration create all plots, change the number of cycles
cycles = int(len(df)/480)
print(cycles)
for i in df:
    try:
        os.mkdir(i)
    except:
        pass
    min_val = df[i].min()
    min_nor = -1
    max_val = df[i].max()
    max_nor = 1
    for cycle in range(1): #iterate thriugh all cycles range(1) by ====> range(int(len(df)/480))
        count = '{:04}'.format(cycle)
        j = cycle * 480
        ordered_data = mkdf(df.iloc[j:j+480][i])
        csv = print_df(ordered_data)
        #Print .csv files contains matrix of each parameters by name of cycles respectively
        csv.to_csv(f'{i}/{i}{count}.csv', header=None, index=None)
        if 'C' in i:
            min_nor = -40
            max_nor = 150
            #Applying normalization for C between [-40,+150]
            new_value3 = normalize(df['C'].iloc[j:j+480][i].values, min_val, max_val, -40, 150)
            n_cbar_kws = {"ticks":[-40,150,-20,0,25,50,75,100,125]}
            df3 = print_df(mkdf(new_value3))
        else:
            #Applying normalizayion for A,B between [-1,+1]
            new_value1 = normalize(df['A'].iloc[j:j+480][i].values, min_val, max_val, -1, 1)
            new_value2 = normalize(df['B'].iloc[j:j+480][i].values, min_val, max_val, -1, 1)
            n_cbar_kws = {"ticks":[-1.0,-0.75,-0.50,-0.25,0.00,0.25,0.50,0.75,1.0]}
            df1 = print_df(mkdf(new_value1))
            df2 = print_df(mkdf(new_value2))
        #Plotting parameters by using HeatMap
        plt.figure()
        sns.heatmap(df, vmin=min_nor, vmax=max_nor, cmap ='coolwarm', cbar_kws=n_cbar_kws)
        plt.title(i, fontsize=12, color='black', loc='left', style='italic')
        plt.axis('off')
        #Print .PNG images contains HeatMap plots of each parameters by name of cycles respectively
        plt.savefig(f'{i}/{i}{count}.png')
        #plotting all columns ['A','B','C'] in-one-window side by side
        fig, axes = plt.subplots(nrows=1, ncols=3 , figsize=(20,10))
        plt.subplot(131)
        sns.heatmap(df1, vmin=-1, vmax=1, cmap ="coolwarm", linewidths=.75 , linecolor='black', cbar=True , cbar_kws={"ticks":[-1.0,-0.75,-0.5,-0.25,0.00,0.25,0.5,0.75,1.0]})
        fig.axes[-1].set_ylabel('[MPa]', size=20) #cbar_kws={'label': 'Celsius'}
        plt.title('A', fontsize=12, color='black', loc='left', style='italic')
        plt.axis('off')
        plt.subplot(132)
        sns.heatmap(df2, vmin=-1, vmax=1, cmap ="coolwarm", cbar=True , cbar_kws={"ticks":[-1.0,-0.75,-0.5,-0.25,0.00,0.25,0.5,0.75,1.0]})
        fig.axes[-1].set_ylabel('[Mpa]', size=20) #cbar_kws={'label': 'Celsius'}
        #sns.despine(left=True)
        plt.title('B', fontsize=12, color='black', loc='left', style='italic')
        plt.axis('off')
        plt.subplot(133)
        sns.heatmap(df3, vmin=-40, vmax=150, cmap ="coolwarm" , cbar=True , cbar_kws={"ticks":[-40,150,-20,0,25,50,75,100,125]})
        fig.axes[-1].set_ylabel('[°C]', size=20) #cbar_kws={'label': 'Celsius'}
        #sns.despine(left=True)
        plt.title('C', fontsize=12, color='black', loc='left', style='italic')
        plt.axis('off')
        plt.suptitle(f'Analysis of data in cycle Nr.: {count}', color='yellow', backgroundcolor='black', fontsize=48, fontweight='bold')
        plt.subplots_adjust(top=0.7, bottom=0.3, left=0.05, right=0.95, hspace=0.2, wspace=0.2)
        #plt.subplot_tool()
        plt.savefig(f'{i}/{i}{i}{count}.png')
        plt.show()
So far I couldn't get the proper output: in each cycle, every plot is drawn 3 times in different positions. For example, it draws 'A' on the left, then draws 'A' again under the titles 'B' and 'C' in the middle and on the right of the same window. Then it draws 'B' 3 times instead of once, and in the end it draws 'C' 3 times instead of once, so each column ends up on the left, in the middle and on the right.
The goal is to get the subplots of all 3 columns A, B and C in one window for each cycle (every block of 480 values) in the main for-loop:
1st cycle : 0000 -----> subplots of A,B,C ----> Store it as 0000.png
2nd cycle : 0001 -----> subplots of A,B,C ----> Store it as 0001.png
...
The problem is the usage of df inside the for-loop: it passes the values of A, B or C 3 times, while it should pass the values belonging to each column once. I provide a picture of the unsuccessful output here so that you can see exactly where the problem is.
My desired output is below:
I also provide sample text file of dataset for 3 cycles: dataset
So after looking at your code and your requirements, I think I know what the problem is.
Your for loops are in the wrong order. You want a new figure for each cycle, containing each of 'A', 'B' and 'C' as subplots.
This means your outer loop should go over the cycles and your inner loop over i. With your indentation and loop order, you try to plot all the 'A', 'B', 'C' subplots on your first pass through i (i='A', cycle=1), rather than after the first cycle has been processed for every i (i='A','B','C', cycle=1).
This is also why you get the problem (as mentioned in your comment on this answer) of df3 not being defined. The definition of df3 is in an if block checking if 'C' in i; on your first pass this condition is not met, so df3 is not defined, but you are still trying to plot it!
You also have the same problem as in your other question with the NaN/inf values.
Rearranging the for loops and the indentation and cleaning up the NaN/inf values gets you the following code:
#...
#df contains all the data
df = pd.DataFrame(data, columns=['A','B','C'], index = id_set[:,0])
df = df.replace(np.inf, np.nan)
df = df.fillna(0)
'''
Data generation phase
'''
#next iteration create all plots, change the number of cycles
cycles = int(len(df)/480)
print(cycles)
for cycle in range(cycles): #iterate through all cycles range(1) by ====> range(int(len(df)/480))
    count = '{:04}'.format(cycle)
    j = cycle * 480
    for i in df:
        try:
            os.mkdir(i)
        except:
            pass
        min_val = df[i].min()
        min_nor = -1
        max_val = df[i].max()
        max_nor = 1
        ordered_data = mkdf(df.iloc[j:j+480][i])
        csv = print_df(ordered_data)
        #Print .csv files contains matrix of each parameters by name of cycles respectively
        csv.to_csv(f'{i}/{i}{count}.csv', header=None, index=None)
        if 'C' in i:
            min_nor = -40
            max_nor = 150
            #Applying normalization for C between [-40,+150]
            new_value3 = normalize(df['C'].iloc[j:j+480], min_val, max_val, -40, 150)
            n_cbar_kws = {"ticks":[-40,150,-20,0,25,50,75,100,125]}
            df3 = print_df(mkdf(new_value3))
        else:
            #Applying normalization for A,B between [-1,+1]
            new_value1 = normalize(df['A'].iloc[j:j+480], min_val, max_val, -1, 1)
            new_value2 = normalize(df['B'].iloc[j:j+480], min_val, max_val, -1, 1)
            n_cbar_kws = {"ticks":[-1.0,-0.75,-0.50,-0.25,0.00,0.25,0.50,0.75,1.0]}
            df1 = print_df(mkdf(new_value1))
            df2 = print_df(mkdf(new_value2))
        # #Plotting parameters by using HeatMap
        # plt.figure()
        # sns.heatmap(df, vmin=min_nor, vmax=max_nor, cmap ='coolwarm', cbar_kws=n_cbar_kws)
        # plt.title(i, fontsize=12, color='black', loc='left', style='italic')
        # plt.axis('off')
        # #Print .PNG images contains HeatMap plots of each parameters by name of cycles respectively
        # plt.savefig(f'{i}/{i}{count}.png')
    #plotting all columns ['A','B','C'] in-one-window side by side
    fig, axes = plt.subplots(nrows=1, ncols=3 , figsize=(20,10))
    plt.subplot(131)
    sns.heatmap(df1, vmin=-1, vmax=1, cmap ="coolwarm", linewidths=.75 , linecolor='black', cbar=True , cbar_kws={"ticks":[-1.0,-0.75,-0.5,-0.25,0.00,0.25,0.5,0.75,1.0]})
    fig.axes[-1].set_ylabel('[MPa]', size=20) #cbar_kws={'label': 'Celsius'}
    plt.title('A', fontsize=12, color='black', loc='left', style='italic')
    plt.axis('off')
    plt.subplot(132)
    sns.heatmap(df2, vmin=-1, vmax=1, cmap ="coolwarm", cbar=True , cbar_kws={"ticks":[-1.0,-0.75,-0.5,-0.25,0.00,0.25,0.5,0.75,1.0]})
    fig.axes[-1].set_ylabel('[Mpa]', size=20) #cbar_kws={'label': 'Celsius'}
    #sns.despine(left=True)
    plt.title('B', fontsize=12, color='black', loc='left', style='italic')
    plt.axis('off')
    plt.subplot(133)
    sns.heatmap(df3, vmin=-40, vmax=150, cmap ="coolwarm" , cbar=True , cbar_kws={"ticks":[-40,150,-20,0,25,50,75,100,125]})
    fig.axes[-1].set_ylabel('[°C]', size=20) #cbar_kws={'label': 'Celsius'}
    #sns.despine(left=True)
    plt.title('C', fontsize=12, color='black', loc='left', style='italic')
    plt.axis('off')
    plt.suptitle(f'Analysis of data in cycle Nr.: {count}', color='yellow', backgroundcolor='black', fontsize=48, fontweight='bold')
    plt.subplots_adjust(top=0.7, bottom=0.3, left=0.05, right=0.95, hspace=0.2, wspace=0.2)
    #plt.subplot_tool()
    plt.savefig(f'{i}/{i}{i}{count}.png')
    plt.show()
This gets you the following three images as three separate figures with the data you provided:
Figure 1, Figure 2, Figure 3
Generally speaking, your code is quite messy. I get it: if you're new to programming and just want to analyse your data, you do whatever works, whether or not it is pretty.
However, I think the messy code means you can't properly see the underlying logic of your script, which is how you ran into this problem.
If you get a problem like that again, I would recommend writing out some 'pseudo code' with all of the loops and thinking about what you are trying to accomplish in each loop.
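To make that concrete, here is a stripped-down skeleton of the loop structure. It is a sketch rather than a replacement for your script: the data is random, ax.imshow and fig.colorbar stand in for sns.heatmap, a plain reshape to 24 x 20 stands in for print_df(mkdf(...)), and the names CYCLE_LEN, norm_range and figures are made up for the illustration. Each column is normalized with its own minimum and maximum here, which is my reading of what you intend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to work interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

CYCLE_LEN = 480
rng = np.random.default_rng(0)
# Synthetic stand-in for the real data: two cycles of three columns.
df = pd.DataFrame(rng.normal(size=(2 * CYCLE_LEN, 3)), columns=["A", "B", "C"])
norm_range = {"A": (-1, 1), "B": (-1, 1), "C": (-40, 150)}

def normalize(value, min_value, max_value, min_norm, max_norm):
    return (max_norm - min_norm) * (value - min_value) / (max_value - min_value) + min_norm

figures = []
for cycle in range(len(df) // CYCLE_LEN):       # outer loop: one figure per cycle
    start = cycle * CYCLE_LEN
    fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(20, 10))
    for ax, name in zip(axes, df.columns):      # inner loop: one subplot per column
        lo, hi = norm_range[name]
        scaled = normalize(df[name].iloc[start:start + CYCLE_LEN].to_numpy(),
                           df[name].min(), df[name].max(), lo, hi)
        # Plain reshape stands in for print_df(mkdf(...)) from the question.
        image = ax.imshow(scaled.reshape(24, 20), vmin=lo, vmax=hi, cmap="coolwarm")
        fig.colorbar(image, ax=ax)
        ax.set_title(name, loc="left")
        ax.axis("off")
    fig.suptitle(f"Analysis of data in cycle Nr.: {cycle:04}")
    figures.append(fig)                         # save or show each figure here
```

Written this way, the figure is created once per cycle, and every column is drawn into its own axes object inside the inner loop, so nothing is plotted before its data exists.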
I am trying to plot three lines from different Pandas Dataframes to the same subplot in matplotlib. But, when I did this I found one of the plots shifted along the xaxis (The xrange is different for each line). However, when I plot each line individually the xlimits are correct and none of them are shifted. I have tried to reproduce my problem here:
def plot_dH1(title1, title_sim, a, b, c):
    fig = plt.figure(figsize=(4,4))
    plt.style.use('ggplot')
    sns.set_style('ticks')
    plt.rcParams['font.size'] = 15
    n = 0
    for i in range(len(a)):
        ax = fig.add_subplot(1,1,n+1)
        ax = a[i].groupby(level=(0, 1)).mean()[0].plot(label='$\chi_{180}$')
        ax = b[i].groupby(level=(0, 1)).mean()[0].plot(label='All')
        ax = c[i].groupby(level=(0, 1)).mean()[0].plot(label='$\chi_{90}$')
        ax.set_ylabel('$dH$')
        ax.set_xlabel('$\lambda_{0}$, $\lambda_{1}$')
        ax.set_title(title_sim[i])
        title = title_sim[i]
        sns.despine(offset=10, ax=ax)
        plt.xticks(rotation=90)
        # plt.yticks(range(0,350,20),range(0,350,20))
        n = n+1
    lgd = ax.legend(loc='upper center', bbox_to_anchor=(0.35, -0.8),fancybox=True, shadow=True, ncol=3)
    # plt.tight_layout()
    # fig.savefig('{}.pdf'.format('dHdl_all'))
    fig.savefig('{}.pdf'.format('dHdl_all'),bbox_extra_artists=(lgd,), bbox_inches='tight')
array = [range(10), range(10,20)]
tuples = list(zip(*array))
index = pd.MultiIndex.from_tuples(tuples)
a = [pd.DataFrame(np.random.randn(10,1), index=index)]
b = [pd.DataFrame(np.random.randn(5,1), index=index[5:])]
c = [pd.DataFrame(np.random.randn(8,1), index=index[2:])]
plot_dH1(title1, title_sim, a, b, c)
a, b and c are lists of pandas DataFrames. I am not able to upload an image, but if you run the code you will see the problem. Does anyone know why one of the lines is shifted along the x axis?
You'll get answers quicker and more reliably if you can provide a minimal, working example. Your supplied code was missing several imports that were renamed, the title definitions, and had several commented lines cluttering things. Using the following code, I see that all of the lines start with the same x shift:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
def plot_dH1(title1, title_sim, a, b, c):
    fig = plt.figure(figsize=(4,4))
    plt.style.use('ggplot')
    sns.set_style('ticks')
    plt.rcParams['font.size'] = 15
    n = 0
    for i in range(len(a)):
        ax = fig.add_subplot(1,1,n+1)
        # raw strings keep the LaTeX backslashes from being read as escape sequences
        ax = a[i].groupby(level=(0, 1)).mean()[0].plot(label=r'$\chi_{180}$')
        ax = b[i].groupby(level=(0, 1)).mean()[0].plot(label='All')
        ax = c[i].groupby(level=(0, 1)).mean()[0].plot(label=r'$\chi_{90}$')
        ax.set_ylabel('$dH$')
        ax.set_xlabel(r'$\lambda_{0}$, $\lambda_{1}$')
        ax.set_title(title_sim[i])
        sns.despine(offset=10, ax=ax)
        plt.xticks(rotation=90)
        n = n+1
    lgd = ax.legend(loc='upper center', bbox_to_anchor=(0.35, -0.8),fancybox=True, shadow=True, ncol=3)
    fig.savefig('{}.pdf'.format('dHdl_all'),bbox_extra_artists=(lgd,), bbox_inches='tight')
array = [range(10), range(10,20)]
tuples = list(zip(*array))
index = pd.MultiIndex.from_tuples(tuples)
a = [pd.DataFrame(np.random.randn(10,1), index=index)]
b = [pd.DataFrame(np.random.randn(5,1), index=index[5:])]
c = [pd.DataFrame(np.random.randn(8,1), index=index[2:])]
title_sim = np.arange(10)
title1 = ''
plot_dH1(title1, title_sim, a, b, c)
produces the following plot using Python 3.5.2:
I don't see an x shift of one of the three curves plotted relative to the other two. An x offset is defined, and is controlled by the offset parameter of the line sns.despine(offset=10, ax=ax). Setting it to zero makes all of the lines adjacent to the y-axis:
No, DataFrame b is actually shifted. Look at the blue curve. Its index is defined as index[5:], which means that it should have these values:
[ 0
5 15 1.398019
6 16 0.325211
7 17 0.113059
8 18 0.814993
9 19 0.402437]
So it should start from (5 15) along the X axis, but it is actually starting from (2, 12), which means that it is shifted.
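One likely explanation, offered as an assumption rather than something confirmed in this thread, is that pandas draws a series with a MultiIndex against its row positions 0, 1, 2, ... instead of the index values, so three series of different lengths all start at the left edge of the shared axes. A sketch of a possible workaround is to align the series on a common index before plotting. The series names below reuse the legend labels from the question; the variable aligned is made up for the illustration.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(list(zip(range(10), range(10, 20))))
rng = np.random.default_rng(0)
a = pd.Series(rng.standard_normal(10), index=index, name=r"$\chi_{180}$")
b = pd.Series(rng.standard_normal(5), index=index[5:], name="All")
c = pd.Series(rng.standard_normal(8), index=index[2:], name=r"$\chi_{90}$")

# Outer-join the three series on their index; rows a series lacks become NaN.
aligned = pd.concat([a, b, c], axis=1)

# The first row where "All" actually has data is the (5, 15) entry.
print(aligned["All"].first_valid_index())
```

Calling aligned.plot() on the combined frame should then draw all three columns against the same ten positions, with gaps where a series has no data, so the 'All' line starts at (5, 15) rather than at the left edge.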
I am plotting a DataFrame as a scatter graph using this code:
My dataframe somewhat looks like this -
Sector AvgDeg
0 1 52
1 2 52
2 3 52
3 4 54
4 5 52
... ... ...
df.plot.scatter(x='Sector', y='AvgDeg', s=df['AvgDeg'], color='LightBlue',grid=True)
plt.show()
and I'm getting this result:
What I need is to draw every dot with a different color and with the corresponding legend. For example: -blue dot- 'Sector 1', -red dot- 'Sector 2', and so on.
Do you have any idea how to do this? Thanks!
Pass a list (or array) with one entry per point as the c parameter of the scatter plot.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
txt = ["text1", "text2", "text3", "text4"]
fig, ax = plt.subplots()
x = np.arange(1, 5)
y = np.arange(1, 5)
#c will change the colors of each point
#s is the size of each point...
#cmap is the color map you want to use
ax.scatter(x, y,s = 40, cmap = cmap_light, c=np.arange(1, 5))
for i, j in enumerate(txt):
    #use the below code to display the text for each point
    ax.annotate(j, (x[i], y[i]))
plt.show()
What this gives you as a result is -
To color more points, for example 31, x, y and c must all have 31 entries (and cmap_light needs more entries if you want more than three distinct colors):
ax.scatter(x, y,s = 40, cmap = cmap_light, c=np.arange(1, 32))
Similarly you can annotate those points by changing the txt list above.
I would do it this way:
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.style.use('ggplot')
colorlist = list(mpl.colors.ColorConverter.colors.keys())
ax = df.plot.scatter(x='Sector', y='AvgDeg', s=df.AvgDeg*1.2,
c=(colorlist * len(df))[:len(df)])
df.apply(lambda x: ax.text(x.Sector, x.AvgDeg, 'Sector {}'.format(x.Sector)), axis=1)
plt.show()
Result
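Since the question also asks for a legend entry per sector, here is one more possible approach: draw each row with its own scatter call and give it a label. This is a sketch under assumptions: the five rows are typed in from the sample shown in the question, and the 'tab10' colormap (10 colors, reused in a cycle if there are more sectors) is my own choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to work interactively
import matplotlib.pyplot as plt
import pandas as pd

# First rows of the dataframe shown in the question.
df = pd.DataFrame({"Sector": [1, 2, 3, 4, 5], "AvgDeg": [52, 52, 52, 54, 52]})

cmap = plt.get_cmap("tab10")  # 10 distinct colors; cycled when there are more rows
fig, ax = plt.subplots()
for position, row in enumerate(df.itertuples(index=False)):
    # One scatter call per sector so that each gets its own legend entry.
    ax.scatter(row.Sector, row.AvgDeg, s=row.AvgDeg,
               color=cmap(position % cmap.N), label=f"Sector {row.Sector}")

ax.set_xlabel("Sector")
ax.set_ylabel("AvgDeg")
ax.grid(True)
ax.legend(loc="best")
```

With many sectors the legend gets long, so the text annotations from the answers above may remain the more readable option.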