I have a pandas dataframe, with columns 'groupname', 'result', and 'temperature'. I've plotted a Seaborn swarmplot, where x='groupname' and y='result', which shows the results data separated into the groups.
What I also want to do is to colour the markers according to their temperature, using a colormap, so that for example the coldest are blue and hottest red.
Plotting the chart is very simple:
import pandas as pd
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
data = {'groupname': ['G0', 'G0', 'G0', 'G0', 'G1', 'G1', 'G1'], 'shot': [1, 2, 3, 4, 1, 2, 3], 'temperature': [20, 25, 35, 10, -20, -17, -6], 'result': [10.0, 10.1, 10.5, 15.0, 15.1, 13.5, 10.5]}
df = pd.DataFrame(data)
groupname shot temperature result
0 G0 1 20 10.0
1 G0 2 25 10.1
2 G0 3 35 10.5
3 G0 4 10 15.0
4 G1 1 -20 15.1
5 G1 2 -17 13.5
6 G1 3 -6 10.5
plt.figure()
sns.stripplot(data=df, x="groupname", y="result")
plt.show()
But now I'm stuck trying to colour the points. I've tried a few things, like:
sns.stripplot(data=df, x="groupname", y="result", cmap=matplotlib.cm.get_cmap('Spectral'))
which doesn't seem to do anything.
Also tried:
sns.stripplot(data=df, x="groupname", y="result", hue='temperature')
which does colour the points depending on the temperature, however the colours are random rather than mapped.
I feel like there is probably a very simple way to do this, but haven't been able to find any examples.
Ideally looking for something like:
sns.stripplot(data=df, x="groupname", y="result", colorscale='temperature')
The keyword you are looking for is "palette". Combine it with hue so that the colours follow the temperature.
The call below should work:
sns.stripplot(data=df, x="groupname", y="result", hue='temperature', palette="vlag")
http://man.hubwiz.com/docset/Seaborn.docset/Contents/Resources/Documents/generated/seaborn.stripplot.html
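If the colours should follow the temperature value itself rather than its rank among the hue levels, one option is to build the palette by hand. The sketch below is only an illustration under a few assumptions: the "coolwarm" colormap, the explicit Normalize range and the colorbar are my choices, not part of the answer above, and the exact legend behaviour depends on the installed seaborn version.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

df = pd.DataFrame({
    "groupname": ["G0", "G0", "G0", "G0", "G1", "G1", "G1"],
    "temperature": [20, 25, 35, 10, -20, -17, -6],
    "result": [10.0, 10.1, 10.5, 15.0, 15.1, 13.5, 10.5],
})

# Map each temperature to a colour by value: coldest -> blue, hottest -> red.
cmap = plt.get_cmap("coolwarm")
norm = Normalize(vmin=df["temperature"].min(), vmax=df["temperature"].max())
palette = {t: cmap(norm(t)) for t in df["temperature"].unique()}

fig, ax = plt.subplots()
sns.stripplot(data=df, x="groupname", y="result",
              hue="temperature", palette=palette, ax=ax)

# Replace the per-value legend (if one was drawn) with a continuous colorbar.
if ax.get_legend() is not None:
    ax.get_legend().remove()
fig.colorbar(ScalarMappable(norm=norm, cmap=cmap), ax=ax, label="temperature")
# Call plt.show() to display the figure interactively.
```

Passing a dict as the palette pins every temperature to a fixed colour, so the colorbar and the markers share the same scale.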
Here's the table from the dataframe:
   Points_groups  Qty Contracts  Qty Gones
1  350+                     108        275
2  300-350                  725       1718
3  250-300                  885       3170
4  200-250                 2121      10890
5  150-200                 3120       7925
6  100-150                  653       1318
7  50-100                   101        247
8  0-50                      45        137
I'd like to get something like this out of it:
But with the bars placed along the x axis,
which should be built from the 'Points_groups' column like this
I tried a bunch of options already, but I couldn't get it.
For example:
df.plot(kind ='hist')
plt.xlabel('Points_groups')
plt.ylabel("Number Of Students");
or
sns.distplot(df['Qty Gones'])
sns.distplot(df['Qty Contracts'])
plt.show()
or
df.hist(column='Points_groups', by=['Qty Contracts', 'Qty Gones'], bins=2, grid=False, rwidth=0.9, color='purple', sharex=True);
Since you already have the distribution in your pandas dataframe, the plot you need can be achieved with the following code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Df = pd.DataFrame({'key': ['red', 'green', 'blue'], 'A': [1, 2, 1], 'B': [2, 4, 3]})
X_axis = np.arange(len(Df['key']))
plt.bar(X_axis - 0.2, Df['A'], 0.4, label = 'A')
plt.bar(X_axis + 0.2, Df['B'], 0.4, label = 'B')
X_label = list(Df['key'].values)
plt.xticks(X_axis, X_label)
plt.legend()
plt.show()
Since I don't have access to your data, I made a mock DataFrame. This results in the following figure:
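The same idea can be tried on the table from the question. This is a minimal sketch, assuming the column names shown in that table ('Points_groups', 'Qty Contracts', 'Qty Gones'); it uses the pandas plotting wrapper instead of manual bar offsets, which is a swapped-in convenience rather than the method above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Values copied from the table in the question.
scores = pd.DataFrame({
    "Points_groups": ["350+", "300-350", "250-300", "200-250",
                      "150-200", "100-150", "50-100", "0-50"],
    "Qty Contracts": [108, 725, 885, 2121, 3120, 653, 101, 45],
    "Qty Gones": [275, 1718, 3170, 10890, 7925, 1318, 247, 137],
})

fig, ax = plt.subplots(figsize=(10, 5))
# One pair of bars per row; the x tick labels come from Points_groups.
scores.plot.bar(x="Points_groups", y=["Qty Contracts", "Qty Gones"],
                ax=ax, rot=0)
ax.set_ylabel("Number Of Students")
# Call plt.show() to display the figure interactively.
```

Because the counts are already aggregated, a bar chart is the right tool here; a histogram would try to bin the counts themselves.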
I want to visualize the grade depending on the sex (male/female).
My dataframe:
df = pd.DataFrame(
{
"key": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7", "K8", "K9"],
"grade": [1.0, 2.0, 4.0, 1.0, 5.0, 2.0, 3.0, 1.0, 6.0, 3.0],
"sex": [1, 0, 0, 1, 0,1,0,1,0,0]
}
)
key grade sex
0 K0 1.0 1
1 K1 2.0 0
2 K2 4.0 0
3 K3 1.0 1
4 K4 5.0 0
5 K5 2.0 1
6 K6 3.0 0
7 K7 1.0 1
8 K8 6.0 0
9 K9 3.0 0
My approach was to use a histogram and plot the distribution. However, I don't know how to visualize the distribution depending on the target. There are some examples in Seaborn Documentation, but I failed to apply it to my specific problem.
All I have is this:
plt.hist(df['grade'], bins=10, edgecolor='black');
plt.xlabel('grade');
plt.ylabel('count');
You can do this in matplotlib:
import matplotlib.pyplot as pyplot
x=df.loc[df['sex']==1, 'grade']
y=df.loc[df['sex']==0, 'grade']
bins=list(range(1, 8))  # bin edges 1..7, so each grade from 1 to 6 gets its own bin
pyplot.hist(x, bins, alpha=0.5, label='sex=1')
pyplot.hist(y, bins, alpha=0.5, label='sex=0')
pyplot.legend(loc='upper right')
pyplot.show()
There is also a way for doing this with pandas:
df[df['sex'] == 0]['grade'].plot.hist()
df[df['sex'] == 1]['grade'].plot.hist()
You can also get a smooth curve by using kde():
df[df['sex'] == 0]['grade'].plot.kde()
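Since the grades are discrete, counting them per group and drawing side-by-side bars can be easier to read than overlapping histograms. The following is a sketch only, using the DataFrame from the question; pd.crosstab is my substitution for the histogram calls above.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "key": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7", "K8", "K9"],
    "grade": [1.0, 2.0, 4.0, 1.0, 5.0, 2.0, 3.0, 1.0, 6.0, 3.0],
    "sex": [1, 0, 0, 1, 0, 1, 0, 1, 0, 0],
})

# Rows are grades, columns are the two sex codes, cells are counts.
counts = pd.crosstab(df["grade"], df["sex"])

fig, ax = plt.subplots()
counts.plot.bar(ax=ax, rot=0)  # one bar per sex code within each grade
ax.set_xlabel("grade")
ax.set_ylabel("count")
# Call plt.show() to display the figure interactively.
```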
I have a problem drawing a nested pie chart with Matplotlib in Python. I wrote some code for it, but I have issues with the layout and the labels.
I'd like to draw a nested pie chart like this one, where the outer ring shows SEX and the inner ring shows ALIGN, each with its counts.
Here is my dataframe which is shown below.
ALIGN SEX count
2 Bad Characters Male Characters 1542
5 Good Characters Male Characters 1419
3 Good Characters Female Characters 714
0 Bad Characters Female Characters 419
8 Neutral Characters Male Characters 254
6 Neutral Characters Female Characters 138
1 Bad Characters Genderless Characters 9
4 Good Characters Genderless Characters 4
7 Neutral Characters Genderless Characters 3
9 Reformed Criminals Male Characters 2
Here is my code for drawing the nested pie chart:
fig, ax = plt.subplots(figsize=(24,12))
size = 0.3
ax.pie(dc_df_ALIGN_SEX.groupby('SEX')['count'].sum(), radius=1,
labels=dc_df_ALIGN_SEX['SEX'].drop_duplicates(),
autopct='%1.1f%%',
wedgeprops=dict(width=size, edgecolor='w'))
ax.pie(dc_df_ALIGN_SEX['count'], radius=1-size, labels = dc_df_ALIGN_SEX["ALIGN"],
wedgeprops=dict(width=size, edgecolor='w'))
ax.set(aspect="equal", title='Pie plot with `ax.pie`')
plt.show()
How can I lay the charts out in 4 rows and 4 columns, with one chart in each slot, and show the labels in a legend area?
Since the question has been changed, I'm posting a new answer.
First, I slightly simplified your DataFrame:
import pandas as pd
df = pd.DataFrame([['Bad', 'Male', 1542],
['Good', 'Male', 1419],
['Good', 'Female', 714],
['Bad', 'Female', 419],
['Neutral', 'Male', 254],
['Neutral', 'Female', 138],
['Bad', 'Genderless', 9],
['Good', 'Genderless', 4],
['Neutral', 'Genderless', 3],
['Reformed', 'Male', 2]])
df.columns = ['ALIGN', 'SEX', 'n']
For the numbers in the outer ring, we can use a simple groupby, as you did:
outer = df.groupby('SEX')[['n']].sum()  # sum only the numeric column, not the ALIGN strings
But for the numbers in the inner ring, we need to group by both categorical variables, which results in a MultiIndex:
inner = df.groupby(['SEX', 'ALIGN']).sum()
inner
                       n
SEX        ALIGN
Female     Bad       419
           Good      714
           Neutral   138
Genderless Bad         9
           Good        4
           Neutral     3
Male       Bad      1542
           Good     1419
           Neutral   254
           Reformed    2
We can extract the appropriate labels from the MultiIndex with its get_level_values() method:
inner_labels = inner.index.get_level_values(1)
Now you can turn the above values into one-dimensional arrays and plug them into your plot calls:
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(figsize=(24,12))
size = 0.3
ax.pie(outer.values.flatten(), radius=1,
labels=outer.index,
autopct='%1.1f%%',
wedgeprops=dict(width=size, edgecolor='w'))
ax.pie(inner.values.flatten(), radius=1-size,
labels = inner_labels,
wedgeprops=dict(width=size, edgecolor='w'))
ax.set(aspect="equal", title='Pie plot with `ax.pie`')
plt.show()
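The question also asked for the labels to go into a legend. A possible sketch is shown here, assuming the simplified DataFrame above; the "SEX / ALIGN" label format and the legend position are illustrative choices of mine. Both rings draw from the same default colour cycle, so distinct colours per ring would need to be passed explicitly.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    [["Bad", "Male", 1542], ["Good", "Male", 1419], ["Good", "Female", 714],
     ["Bad", "Female", 419], ["Neutral", "Male", 254], ["Neutral", "Female", 138],
     ["Bad", "Genderless", 9], ["Good", "Genderless", 4],
     ["Neutral", "Genderless", 3], ["Reformed", "Male", 2]],
    columns=["ALIGN", "SEX", "n"],
)

outer = df.groupby("SEX")["n"].sum()
inner = df.groupby(["SEX", "ALIGN"])["n"].sum()

fig, ax = plt.subplots(figsize=(12, 8))
size = 0.3
# Without labels or autopct, ax.pie returns (wedges, texts).
outer_wedges, _ = ax.pie(outer.values, radius=1,
                         wedgeprops=dict(width=size, edgecolor="w"))
inner_wedges, _ = ax.pie(inner.values, radius=1 - size,
                         wedgeprops=dict(width=size, edgecolor="w"))

# Legend entries: outer ring first, then the inner ring as "SEX / ALIGN".
labels = list(outer.index) + [f"{sex} / {align}" for sex, align in inner.index]
ax.legend(outer_wedges + inner_wedges, labels,
          loc="center left", bbox_to_anchor=(1.0, 0.5))
ax.set(aspect="equal")
# Call plt.show() to display the figure interactively.
```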
You define the function percentage_growth(l) in a way that supposes its argument l to be a list (or some other one-dimensional object). But then (to assign colors) you call this function on dc_df_ALIGN_SEX, which is apparently your DataFrame. So the function (in the first iteration of its loop) tries to evaluate dc_df_ALIGN_SEX[0], which throws the key error, because that is not a proper way to index the DataFrame.
Perhaps you want to do something like percentage_growth(dc_df_ALIGN_SEX['count']) instead?
I am using pandas in order to assign a score to some gamers.
I computed, using the same KPIs, some attributes for every gamer and now I have, for each player, a row with the results.
The dataframe looks like this (the only difference is that it has more columns) :
| Name | Speed | ATK |
| G1 | 0.32 | 0.89 |
| G4 | 0.31 | 0.76 |
I thought it would be nice to plot a radar chart (https://en.wikipedia.org/wiki/Radar_chart)
for each row using matplotlib (if possible).
How would you do it?
Is there a better alternative to matplotlib?
Thanks.
To get this spider-look, you need at least three columns. So, I've added a Random column to your dataframe:
import pandas as pd
df = pd.DataFrame({"Name": ["G1", "G4"],
"Speed": [0.32, 0.31],
"ATK": [0.89, 0.76],
"Random": [0.4, 0.8]})
print(df)
# Name Speed ATK Random
#0 G1 0.32 0.89 0.4
#1 G4 0.31 0.76 0.8
Now, let's see how to plot this simple dataframe. The following code is adapted from this blog post:
# import necessary modules
import numpy as np
import matplotlib.pyplot as plt
from math import pi
# obtain df information
categories = list(df)[1:]
values = df.mean(numeric_only=True).values.flatten().tolist()  # skip the non-numeric Name column
values += values[:1] # repeat the first value to close the circular graph
angles = [n / float(len(categories)) * 2 * pi for n in range(len(categories))]
angles += angles[:1]
# define plot
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(8, 8),
subplot_kw=dict(polar=True))
plt.xticks(angles[:-1], categories, color='grey', size=12)
plt.yticks(np.arange(0.5, 2, 0.5), ['0.5', '1.0', '1.5'],
color='grey', size=12)
plt.ylim(0, 2)
ax.set_rlabel_position(30)
# draw radar-chart:
for i in range(len(df)):
    val_c1 = df.loc[i].drop('Name').values.flatten().tolist()
    val_c1 += val_c1[:1]
    ax.plot(angles, val_c1, linewidth=1, linestyle='solid',
            label=df.loc[i]["Name"])
    ax.fill(angles, val_c1, alpha=0.4)
# add legend and show plot
plt.legend(loc='upper right', bbox_to_anchor=(0.1, 0.1))
plt.show()
This results in a graph like this:
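The part that tends to confuse people is why the first angle and the first value are appended again. A small sketch of just that step, using NumPy instead of the list comprehension above (the variable names here are illustrative):

```python
import numpy as np

categories = ["Speed", "ATK", "Random"]
row = [0.32, 0.89, 0.4]  # values for G1 from the DataFrame above

# Evenly spaced angles, one per category, starting at 0 radians.
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False)

# Append the first point again so the outline returns to where it started.
closed_angles = np.append(angles, angles[0])
closed_values = np.append(row, row[0])
```

Without the repeated point, the line drawn by ax.plot would stop at the last category and leave the polygon open.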
I'm trying to interpolate temperature data observed over an urban area formed by 5 locations. I am using cartopy to interpolate and draw the map. However, when I run the script the temperature interpolation is not shown, and I only get the layer of the urban area with the color palette. Can someone help me fix this error? The shapefile components are available here:
https://www.dropbox.com/s/0u76k3yegvr09sx/LimiteAMG.shp?dl=0
https://www.dropbox.com/s/yxsmm3v2ey3ngsp/LimiteAMG.cpg?dl=0
https://www.dropbox.com/s/yx05n31dfkggbb6/LimiteAMG.dbf?dl=0
https://www.dropbox.com/s/a6nk0xczgjeen2d/LimiteAMG.prj?dl=0
https://www.dropbox.com/s/royw7s51n2f0a6x/LimiteAMG.qpj?dl=0
https://www.dropbox.com/s/7k44dcl1k5891qc/LimiteAMG.shx?dl=0
Data
Lat Lon tmax
0 20.8208 -103.4434 22.8
1 20.7019 -103.4728 17.7
2 20.6833 -103.3500 24.9
3 20.6280 -103.4261 NaN
4 20.7205 -103.3172 26.4
5 20.7355 -103.3782 25.7
6 20.6593 -103.4136 NaN
7 20.6740 -103.3842 25.8
8 20.7585 -103.3904 NaN
9 20.6230 -103.4265 NaN
10 20.6209 -103.5004 NaN
11 20.6758 -103.6439 24.5
12 20.7084 -103.3901 24.0
13 20.6353 -103.3994 23.0
14 20.5994 -103.4133 25.0
15 20.6302 -103.3422 NaN
16 20.7400 -103.3122 23.0
17 20.6061 -103.3475 NaN
18 20.6400 -103.2900 23.0
19 20.7248 -103.5305 24.0
20 20.6238 -103.2401 NaN
21 20.4753 -103.4451 NaN
Code:
import cartopy
import cartopy.crs as ccrs
from matplotlib.colors import BoundaryNorm
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import cartopy.io.shapereader as shpreader
from metpy.calc import get_wind_components
from metpy.cbook import get_test_data
from metpy.gridding.gridding_functions import interpolate, remove_nan_observations
from metpy.plots import add_metpy_logo
from metpy.units import units
to_proj = ccrs.PlateCarree()
data=pd.read_csv('/home/borisvladimir/Documentos/Datos/EMAs/EstacionesZMG/RedZMG.csv',usecols=(1,2,3),names=['Lat','Lon','tmax'],na_values=-99999,header=0)
fname='/home/borisvladimir/Dropbox/Diversos/Shapes/LimiteAMG.shp'
adm1_shapes = list(shpreader.Reader(fname).geometries())
lon = data['Lon'].values
lat = data['Lat'].values
xp, yp, _ = to_proj.transform_points(ccrs.Geodetic(), lon, lat).T
x_masked, y_masked, t = remove_nan_observations(xp, yp, data['tmax'].values)
#Interpola temp usando Cressman
tempx, tempy, temp = interpolate(x_masked, y_masked, t, interp_type='cressman', minimum_neighbors=3, search_radius=400000, hres=35000)
temp = np.ma.masked_where(np.isnan(temp), temp)
levels = list(range(-20, 20, 1))
cmap = plt.get_cmap('viridis')
norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
fig = plt.figure(figsize=(15, 10))
view = fig.add_subplot(1, 1, 1, projection=to_proj)
view.add_geometries(adm1_shapes, ccrs.PlateCarree(),edgecolor='black', facecolor='white', alpha=0.5)
view.set_extent([-103.8, -103, 20.3, 21.099 ], ccrs.PlateCarree())
ZapLon,ZapLat=-103.50,20.80
GuadLon,GuadLat=-103.33,20.68
TonaLon,TonaLat=-103.21,20.62
TlaqLon,TlaqLat=-103.34,20.59
TlajoLon,TlajoLat=-103.44,20.47
plt.text(ZapLon,ZapLat,'Zapopan',transform=ccrs.Geodetic())
plt.text(GuadLon,GuadLat,'Guadalajara',transform=ccrs.Geodetic())
plt.text(TonaLon,TonaLat,'Tonala',transform=ccrs.Geodetic())
plt.text(TlaqLon,TlaqLat,'Tlaquepaque',transform=ccrs.Geodetic())
plt.text(TlajoLon,TlajoLat,'Tlajomulco',transform=ccrs.Geodetic())
mmb = view.pcolormesh(tempx, tempy, temp,transform=ccrs.PlateCarree(),cmap=cmap, norm=norm)
plt.colorbar(mmb, shrink=.4, pad=0.02, boundaries=levels)
plt.show()
The problem is in the call to MetPy's interpolate function. With hres=35000, it generates a grid spaced at 35 km. However, your data points are spaced much more closely than that, so the generated grid has only two points, shown as the red points below (black points are the original stations with non-masked data):
Both of those grid points lie outside the bounds of your data points, so they end up masked. If we instead set hres to something much lower, say 5 km (i.e. 5000), a much more sensible result comes out:
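To get a feel for the numbers, here is a rough back-of-the-envelope sketch. It is not MetPy's actual gridding code: grid_steps is a hypothetical helper, the degrees-to-metres conversion is approximate, and hres is treated as metres as in the explanation above.

```python
import math


def grid_steps(span_m, hres_m):
    """Hypothetical helper: rough number of grid points across a span."""
    return math.ceil(span_m / hres_m)


# Bounding box of the stations with a valid tmax (from the table in the question).
lon_min, lon_max = -103.6439, -103.2900
lat_min, lat_max = 20.5994, 20.8208

# Approximate conversion: about 111 km per degree of latitude,
# scaled by cos(latitude) for longitude.
mid_lat = math.radians((lat_min + lat_max) / 2)
width_m = (lon_max - lon_min) * 111_000 * math.cos(mid_lat)
height_m = (lat_max - lat_min) * 111_000

coarse = (grid_steps(width_m, 35_000), grid_steps(height_m, 35_000))
fine = (grid_steps(width_m, 5_000), grid_steps(height_m, 5_000))
print("hres=35000:", coarse, "hres=5000:", fine)
```

Under these assumptions the stations span roughly 37 km by 25 km, so a 35 km spacing leaves only a couple of grid points, while 5 km gives a grid of about 8 by 5.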