Different hue for each category seaborn - python

So, I have made a stripplot with seaborn the easiest way, with 5 different categories:
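import matplotlib.pyplot as plt
import seaborn as sns

# set_theme sets style and font scale together; calling sns.set() after set_style would reset the style
sns.set_theme(style='whitegrid', font_scale=3)
plt.figure(figsize=(35,20))
sns.stripplot(x=df.speed, y=df.routeID, hue=df.speed > 50, jitter=0.2, alpha=0.5, size=10, edgecolor='black')
plt.xlabel("Speed", size=40)
plt.ylabel("route ID", size=40)
plt.title("Velocity stripplot", size=50)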
Now, I want a different hue condition for each category: say, speed greater than 50 km/h for the first category, 30 km/h for the second, and so on. Is this possible? I tried passing a list for hue:
hue=([("ROUTE 30">50),("ROUTE 104">0)])
but it raises: SyntaxError: invalid syntax
I want to do it all at once in the same plot (the most obvious answer would be to plot each category separately). How can this be done?
EDIT: I followed the suggested answer and used this code:
plt.figure(figsize=(20,7))
my_palette = ['b' if x > 82 else 'g' for x in df.speed.values]
sns.stripplot(x=df.speed, y=df.routeID, jitter=0.2, alpha=0.5, size=8, edgecolor='black', palette=my_palette)
but it didn't turn out as expected:
I don't understand what is wrong here. Any ideas?

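The palette argument assigns colors to hue levels (or categories), not to individual points, so a per-point color list does not work. Instead, I suggest creating a separate column in df that holds each dot's category, and using it as hue.
Try this: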
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# INITIAL DATA
n = 1000
df = pd.DataFrame()
df['speed'] = np.random.randint(10, 90, n)
df['routeID'] = np.random.choice(['ROUTE_5', 'ROUTE_66', 'ROUTE_95', 'ROUTE_101'], n)
# set hue labels to match your conditions
df['hue'] = 'normal'  # new column with default value
df.loc[df.speed > 50, 'hue'] = 'fast'
df.loc[((df.routeID == "ROUTE_5") & (df.speed > 40)) |
       ((df.routeID == "ROUTE_66") & (df.speed > 30)) |
       ((df.routeID == "ROUTE_95") & (df.speed > 60)),
       'hue'] = 'special'
palette = {'normal': 'g', 'fast': 'r', 'special': 'magenta'}  # one color per hue label
sns.stripplot(data=df, x='speed', y='routeID', size=15, hue='hue', palette=palette)
plt.show()
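If every route has its own speed threshold, one possible variation (not part of the answer above) is to keep the thresholds in a dictionary and compare each row against its own route's limit. This is a minimal sketch: ROUTE_LIMITS, DEFAULT_LIMIT and speed_labels are hypothetical names, and the limits themselves are made-up example values.

```python
import numpy as np
import pandas as pd

# Hypothetical per-route limits in km/h; any route not listed falls back to DEFAULT_LIMIT
ROUTE_LIMITS = {"ROUTE_5": 40, "ROUTE_66": 30, "ROUTE_95": 60}
DEFAULT_LIMIT = 50

def speed_labels(df):
    """Label each row 'fast' if its speed exceeds its own route's limit, else 'normal'."""
    # look up the limit for each row's route; unknown routes get NaN, replaced by the default
    limit = df["routeID"].map(ROUTE_LIMITS).fillna(DEFAULT_LIMIT)
    return np.where(df["speed"] > limit, "fast", "normal")

df = pd.DataFrame({"speed": [45, 35, 55, 55],
                   "routeID": ["ROUTE_5", "ROUTE_66", "ROUTE_95", "ROUTE_101"]})
df["hue"] = speed_labels(df)
print(df["hue"].tolist())  # ['fast', 'fast', 'normal', 'fast']
```

The resulting column can then be passed as hue='hue' together with a palette dict such as {'normal': 'g', 'fast': 'r'}. Keeping the limits in one dictionary avoids a long chain of & / | conditions when more routes are added.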


go.Scatterpolar: trying to render a radar chart with a different color for each line is not working

I am trying to build a radar chart where each line is of different color.
I feel like I have followed the docs closely, but I am now facing a problem I can't seem to solve, especially because NO ERROR is output!
Here is some dummy data I am working with:
r = [52,36,85]
theta = ["analytique", "analogique", "affectif"]
colors = ["blue", "red","yellow"]
Here is what I have for my graph:
import plotly.graph_objects as go

fig_reception = go.Figure()
for i in range(len(theta)):
    # one trace per category, each containing a single point
    fig_reception.add_trace(go.Scatterpolar(
        mode='lines+text',
        theta=[theta[i]],
        r=[r[i]],
        line_color=colors[i],
        fillcolor='#d3d3d3',
        marker=dict(color=colors),
    ))
fig_reception.update_layout(autosize=False,
                            height=305,
                            polar=dict(radialaxis=dict(range=[0, 100], visible=False),
                                       angularaxis=dict(rotation=180, direction="clockwise")),
                            )
fig_reception.update_layout(
    template=None,
    polar=dict(bgcolor="rgba(255, 255, 255, 0.2)"),
)
fig_reception.update_layout(
    font=dict(
        size=16,
        color="black",
        family="Courier New, monospace",
    ),
    title="Réception",
    title_font_family="Courier New, monospace",
    showlegend=False,
)
What's strange is that when I hover over each line, a tooltip with the right color and value shows up.
Here is a picture
I don't have a full solution for you, but I hope my answer leads you in the right way.
Simple start
First, let's simplify and plot a radar/spider plot with default colors:
import plotly.express as px
import pandas as pd
r = [52,36,85]
theta = ["analytique", "analogique", "affectif"]
types = ["one", "two","three"]
df = pd.DataFrame(dict(r=r, theta=theta, type=types))
df
    r       theta   type
0  52  analytique    one
1  36  analogique    two
2  85    affectif  three
Plotting this with plotly.express.line_polar gives:
fig = px.line_polar(df, r='r', theta='theta', line_close=True, markers=True)
fig.show()
Every edge its own color
Now, you want every edge to have its own color. For the sake of this example, I assume you want this color to be based on the column type which I defined earlier.
Simply plotting this straight away will not work: each type has only a single point, so there is nothing to connect and you only get the dots, no lines:
fig = px.line_polar(df, r='r', theta='theta', line_close=True, color='type', markers=True)
fig.show()
You need to duplicate the rows and give each pair of consecutive points the same type, so that every color group contains the two endpoints of one edge.
# First append the df to itself, but only keep the r and theta columns
# This will make the type column NaN for the appended rows
df2 = pd.concat([df, df[['r', 'theta']]]).sort_values(by=['r', 'theta'])
# Now fill the NaN type value by taking the type value of the next row
df2['type'] = df2['type'].bfill()
# The last type value should be equal to the first type value to close the loop
# This needs to be set manually
df2['type'] = df2['type'].fillna(df2['type'].iloc[0])
df2
    r       theta   type
1  36  analogique    two
1  36  analogique    one
0  52  analytique    one
0  52  analytique  three
2  85    affectif  three
2  85    affectif    two
Now if you plot that, you will get a triangle with every edge having a separate color:
fig = px.line_polar(df2, r='r', theta='theta', color='type', line_close=True, markers=True)
fig.show()
I'm not sure why the categories have changed order, but it is probably because df2 is sorted by r, so the categories first appear in a different order than in df; sorting the df2 DataFrame differently should fix that.
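An alternative way to build the same kind of long table, without relying on sorting and back-filling, is to emit the two endpoints of each edge explicitly in the original row order. This is a sketch under the assumption that each edge should take the label of its starting point (the table above pairs the edges slightly differently); edge_segments is a hypothetical helper, not a plotly function.

```python
import pandas as pd

def edge_segments(df, label_col="type"):
    """Two rows per polygon edge (start point and end point), both tagged with the
    start point's label; the last edge wraps back to the first point."""
    n = len(df)
    rows = []
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to close the loop
        label = df[label_col].iloc[i]
        for k in (i, j):
            rows.append({"r": df["r"].iloc[k], "theta": df["theta"].iloc[k], label_col: label})
    return pd.DataFrame(rows)

df = pd.DataFrame(dict(r=[52, 36, 85],
                       theta=["analytique", "analogique", "affectif"],
                       type=["one", "two", "three"]))
seg = edge_segments(df)
print(seg)
```

The result can be passed to px.line_polar(seg, r='r', theta='theta', color='type', markers=True). Because the rows keep the original order, the theta categories first appear as analytique, analogique, affectif, which should keep the original category order on the chart.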
Text labels
If you would like to have text labels in your graph, you'll find in the docs that there is a text parameter:
fig = px.line_polar(df2, r='r', theta='theta', color='type', text='r', line_close=True, markers=True)
fig.update_traces(textposition='top center')

Plot Bar Graph with different Parameters in X Axis

I have a DataFrame like the one below. It has Actual and Predicted columns. I want to compare Actual vs Predicted one-to-one in a bar plot. I have a confidence value for the Predicted column, and the default confidence for Actual is 1. So I want to keep each row in a single bar group: the Actual and Predicted values will be on the X axis, with the corresponding confidence score as the y value.
I am unable to get the expected plot because the X axis values are not aligned or grouped by row.
Actual Predicted Confidence
0 A A 0.90
1 B C 0.30
2 C C 0.60
3 D D 0.75
Expected bar plot:
Any hint would be appreciated. Please let me know if further details are required.
What I have tried so far:
import pandas as pd
import plotly.express as px

df = pd.DataFrame({'Actual': list('ABCD'), 'Predicted': list('ACCD'),
                   'Confidence': [0.90, 0.30, 0.60, 0.75]})

df_actual = pd.DataFrame()
df_actual['Key'] = df['Actual'].copy()
df_actual['Confidence'] = 1
df_actual['Identifier'] = 'Actual'
df_predicted = df[['Predicted', 'Confidence']].rename(columns={'Predicted': 'Key'})
df_predicted['Identifier'] = 'Predicted'
df_combined = pd.concat([df_actual, df_predicted], ignore_index=True)
df_combined
fig = px.bar(df_combined, x="Key", y="Confidence", color='Identifier',
             barmode='group', height=400)
fig.show()
I have found that adjusting the data first makes it easier to get the plot I want. I have used Seaborn; I hope that is OK. Please see if this code works for you. I assume the df mentioned above is already available. I created df2 so that it matches what you had shown in the expected figure, and I used the index as the X-axis column so that the order is maintained. I also made some adjustments so that the xtick names line up and the legend sits outside the plot, as you wanted.
Code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

vals = []
conf = []
# interleave each row's Actual (confidence 1) and Predicted (its confidence) values
for x, y, z in zip(df.Actual, df.Predicted, df.Confidence):
    vals += [x, y]
    conf += [1, z]
df2 = pd.DataFrame({'Values': vals, 'Confidence': conf}).reset_index()
ax = sns.barplot(data=df2, x='index', y='Confidence', hue='Values', dodge=False)
ax.set_xticklabels(['Actual', 'Predicted'] * len(df))
plt.legend(bbox_to_anchor=(1.0, 1))
plt.show()
Plot
Update - grouping Actual and Predicted bars
Hi @Mohammed, as we have already used up hue, I don't think there is an easy way to do this with Seaborn. You would need to use matplotlib and adjust the bar positions, xtick positions, etc. Below is code that does this. You can change Set1 to another colormap to change the colors. I have also added a black outline, as bars of the same color were blending into one another. I also had to rotate the xlabels, as they were on top of one another. You can change this to suit your requirements. Hope this helps...
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

vals = df[['Actual', 'Predicted']].melt(value_name='texts')['texts']  # all Actuals, then all Predicted
conf = [1] * len(df) + list(df.Confidence)
ident = ['Actual', 'Predicted'] * len(df)  # tick labels, in the order of the sorted tick positions
df2 = pd.DataFrame({'Values': vals, 'Confidence': conf, 'Identifier': ident}).reset_index()
uvals, uind = np.unique(df2["Values"], return_inverse=True)
cmap = plt.get_cmap("Set1")  # plt.cm.get_cmap was removed in recent matplotlib
fig, ax = plt.subplots()
l = len(df2)
# Actual bars start at i - 0.4, Predicted bars at i, both 0.4 wide
pos = np.arange(0, l) % (l // 2) + (np.arange(0, l) // (l // 2) - 1) * 0.4
ax.bar(pos, df2["Confidence"], width=0.4, align="edge", ec="k", color=cmap(uind))
handles = [plt.Rectangle((0, 0), 1, 1, fc=cmap(i), ec="k") for i in range(len(uvals))]
ax.legend(handles=handles, labels=list(uvals), prop={'size': 10}, loc=9, ncol=8)
pos = pos + 0.2  # move the ticks to the bar centres
pos.sort()
ax.set_xticks(pos)
ax.set_xticklabels(df2["Identifier"][:l], rotation=45, ha='right', rotation_mode="anchor")
ax.set_ylim(0, 1.2)
plt.show()
Output plot
I updated @Redox's answer to get the exact output.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# three bars per row: Actual (confidence 1), Predicted (its confidence), and a blank spacer (height 0)
df_ = pd.DataFrame({'Labels': df.reset_index()[['Actual', 'Predicted', 'index']].values.ravel(),
                    'Confidence': np.array(list(zip(np.repeat(1, len(df)), df['Confidence'].values, np.repeat(0, len(df))))).ravel()})
df_.loc[df_['Labels'].astype(str).str.isdigit(), 'Labels'] = ''  # spacer bars get an empty label
plt.figure(figsize=(15, 6))
ax = sns.barplot(data=df_, x=df_.index, y='Confidence', hue='Labels', dodge=False, errorbar=None)
ax.set_xticklabels(['Actual', 'Predicted', ''] * len(df))
plt.setp(ax.get_xticklabels(), rotation=90)
ax.tick_params(labelsize=14)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
Output:
Removed the loop to improve performance.
Added blank bars so that the chart looks like a grouped chart.
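As another option, the bar positions can be computed explicitly, so that each row's Actual and Predicted bars sit side by side around the row number and every label keeps one fixed color. This is a sketch assuming the df from the question; paired_bar_layout and color_of are hypothetical names, and the 0.4 bar width is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def paired_bar_layout(df, width=0.4):
    """One row per bar; the Actual and Predicted bars of row i sit at i - width/2 and i + width/2."""
    x = np.arange(len(df))
    actual = pd.DataFrame({"x": x - width / 2, "Label": df["Actual"].to_numpy(),
                           "Confidence": 1.0, "Identifier": "Actual"})
    predicted = pd.DataFrame({"x": x + width / 2, "Label": df["Predicted"].to_numpy(),
                              "Confidence": df["Confidence"].to_numpy(), "Identifier": "Predicted"})
    return pd.concat([actual, predicted]).sort_values("x").reset_index(drop=True)

df = pd.DataFrame({"Actual": list("ABCD"), "Predicted": list("ACCD"),
                   "Confidence": [0.90, 0.30, 0.60, 0.75]})
layout = paired_bar_layout(df)

# one fixed color per label, so A has the same color whether it is Actual or Predicted
labels = sorted(layout["Label"].unique())
cmap = plt.get_cmap("Set1")
color_of = {label: cmap(i) for i, label in enumerate(labels)}

fig, ax = plt.subplots()
ax.bar(layout["x"], layout["Confidence"], width=0.4,
       color=layout["Label"].map(color_of).tolist(), edgecolor="k")
ax.set_xticks(layout["x"])
ax.set_xticklabels(layout["Identifier"], rotation=45, ha="right")
handles = [plt.Rectangle((0, 0), 1, 1, fc=color_of[label], ec="k") for label in labels]
ax.legend(handles, labels, loc="upper center", ncol=len(labels))
ax.set_ylim(0, 1.2)
```

Call plt.show() afterwards to display the figure. Because the layout is plain data, the width or the gap between groups can be changed in one place without touching the plotting code.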

Pandas Python Visualization - ValueError: shape mismatch: ERROR

Edit:
Why does the right plot (the bar chart) show a half-black area, weird numbers and other "garbage"? How can I fix the right plot?
Here is my code:
import matplotlib.pyplot as plt

top_series = all_data.head(50).groupby('Top Rated ')['Top Rated '].count()
top_values = top_series.values.tolist()
top_index = ['Top Rated', 'Not Top Rated']
top_colors = ['#27AE60', '#E74C3C']
rating_series = all_data.head(50).groupby('Rating')['Rating'].count()
rating_values = rating_series.values.tolist()
rating_index = ['High', 'Low']
rating_colors = ['#F1C40F', '#27AE60']
fig, axs = plt.subplots(1, 2, figsize=(16, 5))
axs[0].pie(top_values, labels=top_index, autopct='%1.1f%%', shadow=True, startangle=90,
           explode=(0.05, 0.05), radius=1.5, colors=top_colors, textprops={'fontsize': 15})
axs[1].bar(rating_series.index, rating_series.values, color='b')
axs[1].set_xlabel('Rating')
axs[1].set_ylabel('Amount')
fig.suptitle('Does "Rating" really affect on Top Sellers ? ')
CSV cols:
Output (look at the right plot):
I suppose that keys is a list of all keys, so it can have a different shape than top_values.
If you do:
axs[1].bar(top_series.index, top_series.values, color='b')
It should work well.
But if you just want to plot the counts, there is an even shorter version without temporary objects:
all_data['Top Rated '].value_counts().plot(kind = 'bar', ax=axs[1])
Edit: The Rating column is numeric, not a string column, so grouping by it directly gives one bar per distinct rating value. You have to create a column with the values High and Low. For example:
all_data['Rating_Cat'] = all_data['Rating'].apply(lambda x : 'High' if (x > 10000000 ) else 'Low')
Then use this column for the bar plot.
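Putting that together, here is a sketch of the right-hand subplot built from the new category column. The small all_data frame is hypothetical stand-in data, the 10000000 threshold is the one used above, and the comparison is vectorised instead of using apply. Reindexing makes sure the bars always come out in the High, Low order that rating_index and rating_colors assume, even if one category is missing.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for all_data; only the numeric Rating column matters here
all_data = pd.DataFrame({"Rating": [5_000_000, 12_000_000, 30_000_000, 800_000, 15_000_000]})

# Same threshold as above, but vectorised instead of using apply
all_data["Rating_Cat"] = all_data["Rating"].gt(10_000_000).map({True: "High", False: "Low"})

# Count per category and force the order to match rating_index / rating_colors
rating_counts = all_data["Rating_Cat"].value_counts().reindex(["High", "Low"], fill_value=0)

fig, axs = plt.subplots(1, 2, figsize=(16, 5))
axs[1].bar(rating_counts.index, rating_counts.values, color=["#F1C40F", "#27AE60"])
axs[1].set_xlabel("Rating")
axs[1].set_ylabel("Amount")
```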

Visualizing a third variable with Matplotlib histograms

Excuse my bad English.
I have a DataFrame like the following:
-----------------
|index|var1|var2|
-----------------
There are a lot of rows.
var1 is between 0 and 4000
var2 is between -100 and 100
I'm looking to create a histogram that shows how many rows there are for each range of var1.
On the Y axis we can see the number of rows; for example, for 0 < var1 < 500 there are almost 500k rows.
Now I want to add var2, which shows the quality of a row.
For example, I want the histogram bar from 0 to 500 to be blue and the bar from 500 to 1000 another color, depending on the values of var2 in that bar (e.g. if the mean of var2 in a bar is 100, make it green; if the mean is 0, make it red).
I tried to hardcode this, but as soon as I change the bins or anything else, my code breaks.
I also tried plotting on top of the histogram, but it doesn't work.
My current code (for the screenshot):
plt.hist(var1, bins=10, range=(0,4000), color='orange', alpha=0.7)
plt.title('Var 1',weight='bold', fontsize=18)
plt.yticks(weight='bold')
plt.xticks(weight='bold')
I feel like this is a simple thing to do, but I'm completely stuck in my learning because of it.
Many thanks for your help.
If you create a list containing the color for each bar in your histogram, you can use the following code snippet. It captures the return values of the plt.hist call, which include the individual patches (bars). The color can then be set individually while iterating through those patches.
n, bins, patches = plt.hist(var1, bins=8, range=(0,4000), color="orange", alpha=0.7)
for i, patch in enumerate(patches):
plt.setp(patch, "facecolor", colors[i])
Additionally, here is one possible way to create the mentioned color list based on the kind of data you have:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# create random values and store them in a DataFrame
y1 = np.random.randint(0, 4000, 50)
y2 = np.random.randint(-100, 101, 50)
df = pd.DataFrame({"Var1": y1, "Var2": y2})
var1 = df["Var1"].values

# pd.cut bins the DataFrame into the same 500-wide ranges of Var1 that plt.hist uses below;
# right=False gives [a, b) bins like matplotlib (whose last bin also includes 4000)
# then the mean of Var2 is calculated for each bin (NaN for an empty bin)
edges = np.arange(0, 4000 + 500, 500)
mean = df.groupby(pd.cut(df["Var1"], edges, right=False), observed=False)["Var2"].mean()

# how to color the bars based on Var2:
# -100 <= mean(Var2) < -33: blue
# -33 <= mean(Var2) < 33: red
# 33 <= mean(Var2) <= 100: green
color_bins = np.array([-100, -33, 33, 100])
color_list = ["blue", "red", "green"]
# bin the means of Var2 according to the color_bins we just created;
# clipping keeps a mean of exactly 100 in the last (green) class
inds = np.clip(np.digitize(mean.values, color_bins), 1, len(color_list))
# list that assigns the appropriate color to each patch (grey for empty bins)
colors = ["lightgrey" if np.isnan(m) else color_list[i - 1] for m, i in zip(mean.values, inds)]

n, bins, patches = plt.hist(var1, bins=8, range=(0, 4000), color="orange", alpha=0.7)
for i, patch in enumerate(patches):
    plt.setp(patch, "facecolor", colors[i])
plt.title('Var 1', weight='bold', fontsize=18)
plt.yticks(weight='bold')
plt.xticks(weight='bold')
plt.show()
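If you would rather have the color vary continuously with the mean of var2 instead of using three fixed classes, one possible variation is to push each bin's mean through a colormap and add a colorbar. This is a sketch, not the approach above: bin_means is a hypothetical helper, and it assumes the red-to-green RdYlGn colormap spread over var2's range of -100 to 100 (so a mean of 0 comes out yellow; change vmin to shift that).

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

def bin_means(var1, var2, bins=8, value_range=(0, 4000)):
    """Histogram counts of var1 plus the mean of var2 inside each bin (0 for empty bins)."""
    var1 = np.asarray(var1)
    var2 = np.asarray(var2, dtype=float)
    counts, edges = np.histogram(var1, bins=bins, range=value_range)
    lo, hi = value_range
    keep = (var1 >= lo) & (var1 <= hi)  # np.histogram ignores values outside the range
    # bin index per row, matching np.histogram: [a, b) except the last bin, which includes hi
    idx = np.clip(np.digitize(var1[keep], edges) - 1, 0, len(counts) - 1)
    sums = np.bincount(idx, weights=var2[keep], minlength=len(counts))
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return counts, edges, means

rng = np.random.default_rng(0)
var1 = rng.integers(0, 4000, 500)
var2 = rng.integers(-100, 101, 500)
counts, edges, means = bin_means(var1, var2)

cmap = plt.get_cmap("RdYlGn")          # red (low) -> yellow -> green (high)
norm = Normalize(vmin=-100, vmax=100)  # assumed full range of var2
fig, ax = plt.subplots()
n, bins, patches = ax.hist(var1, bins=edges, alpha=0.7)
for patch, m in zip(patches, means):
    patch.set_facecolor(cmap(norm(m)))
fig.colorbar(plt.cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax, label="mean of var2")
```

Since the means are computed from the same edges that the histogram uses, changing bins only requires changing the argument to bin_means; nothing is hardcoded per bar.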

How to create bar chart with secondary_y from dataframe

I want to create a bar chart of two series (say 'A' and 'B') contained in a Pandas DataFrame. If I just want to plot them using different y-axes, I can use secondary_y:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.uniform(size=10).reshape(5,2),columns=['A','B'])
df['A'] = df['A'] * 100
df.plot(secondary_y=['A'])
but if I want to create bar graphs, the equivalent command is ignored (it doesn't put different scales on the y-axes), so the bars from 'A' are so big that the bars from 'B' cannot be distinguished:
df.plot(kind='bar',secondary_y=['A'])
How can I do this in pandas directly? Or how would you create such a graph?
I'm using pandas 0.10.1 and matplotlib version 1.2.1.
I don't think pandas plotting supports this, so I wrote some manual matplotlib code; you can tweak it further:
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()
ax1.set_ylabel('A')
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax2.set_ylabel('B')
# shift the two series by half a bar width so they sit side by side instead of overlapping
ax1.bar(df.index - 0.2, df.A.values, width=0.4, color='g', align='center')
ax2.bar(df.index + 0.2, df.B.values, width=0.4, color='r', align='center')
ax1.legend(['A'], loc='upper left')
ax2.legend(['B'], loc='upper right')
plt.show()
I am sure there are ways to tweak it further: move the bars further apart, make one series slightly transparent, etc.
OK, I had the same problem recently, and even though it's an old question, I think I can give an answer, in case someone else is losing their mind over this. Joop gave the basis of what to do, and it's easy when you only have (for example) two columns in your DataFrame, but it becomes really nasty when you have a different number of columns for the two axes, because you need to play with the position argument of the pandas plot() function. In my example I use seaborn, but it's optional:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df1 = pd.DataFrame(np.array([[i*99 for i in range(11)]]).transpose(), columns=["100"], index=range(11))
df2 = pd.DataFrame(np.array([[i for i in range(11)], [i*2 for i in range(11)]]).transpose(), columns=["1", "2"], index=range(11))
fig, ax = plt.subplots()
ax2 = ax.twinx()
# we must know the number of columns in each DataFrame.
df1_len = len(df1.columns.values)
df2_len = len(df2.columns.values)
column_width = 0.8 / (df1_len + df2_len)
# we calculate the position of each column in the plot. This value is based on the position definition:
# Specify relative alignments for bar plot layout. From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5 (center)
# http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.plot.html
df1_posi = 0.5 + (df2_len/float(df1_len)) * 0.5
df2_posi = 0.5 - (df1_len/float(df2_len)) * 0.5
# In order to have nice colors, I use the default color palette of seaborn
df1.plot(kind='bar', ax=ax, width=column_width*df1_len, color=sns.color_palette()[:df1_len], position=df1_posi)
df2.plot(kind='bar', ax=ax2, width=column_width*df2_len, color=sns.color_palette()[df1_len:df1_len+df2_len], position=df2_posi)
ax.legend(loc="upper left")
# Pandas (at least in some versions) adds a line at x = 0 for each DataFrame; hide any such lines.
for axis in (ax, ax2):
    for line in axis.lines:
        line.set_visible(False)
# Specific to seaborn, we have to remove the background grid lines
ax2.grid(False)
# We need to add some space, the xlim doesn't account for the new positions
column_length = (ax2.get_xlim()[1] - abs(ax2.get_xlim()[0])) / float(len(df1.index))
ax2.set_xlim([ax2.get_xlim()[0] - column_length, ax2.get_xlim()[1] + column_length])
fig.patch.set_facecolor('white')
plt.show()
And the result: http://i.stack.imgur.com/LZjK8.png
I didn't test every possibility, but it looks like it works fine whatever the number of columns in each DataFrame.
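The same layout can also be computed directly with matplotlib, placing every column at an explicit offset inside its group, which sidesteps the position arithmetic. This is a sketch, not a pandas feature: twin_axis_bars is a hypothetical helper, and it assumes both DataFrames share the same index and that a total group width of 0.8 is acceptable.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def twin_axis_bars(left, right, total_width=0.8):
    """Grouped bars: columns of `left` on the primary y-axis, columns of `right` on a twin axis.
    Assumes both frames share the same index (one group of bars per index value)."""
    fig, ax = plt.subplots()
    ax2 = ax.twinx()
    n_bars = len(left.columns) + len(right.columns)
    width = total_width / n_bars
    x = np.arange(len(left.index))
    # centre offset of the k-th bar inside each group
    offsets = (np.arange(n_bars) - (n_bars - 1) / 2) * width
    colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    series = [(ax, left[c], c) for c in left.columns] + [(ax2, right[c], c) for c in right.columns]
    for k, (axis, values, name) in enumerate(series):
        axis.bar(x + offsets[k], values.to_numpy(), width=width,
                 color=colors[k % len(colors)], label=str(name))
    ax.set_xticks(x)
    ax.set_xticklabels([str(i) for i in left.index])
    # one legend for both axes, drawn on the twin axis so it stays on top
    h1, l1 = ax.get_legend_handles_labels()
    h2, l2 = ax2.get_legend_handles_labels()
    ax2.legend(h1 + h2, l1 + l2, loc="upper left")
    return fig, ax, ax2

# same data as the example above
df1 = pd.DataFrame({"100": [i * 99 for i in range(11)]})
df2 = pd.DataFrame({"1": list(range(11)), "2": [i * 2 for i in range(11)]})
fig, ax, ax2 = twin_axis_bars(df1, df2)
```

Call plt.show() to display it. Because every bar gets its own offset, the number of columns on each axis can differ without any extra xlim adjustment.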
