Plotting values from two datasets for comparison - python

I would like to plot two dataframes in order to compare the results. My first choice would be to plot line charts based only on one column from the two dataframes.
df
Name Surname P R F
0 B N 0.41 0.76 0.53
1 B L 0.62 0.67 0.61
2 B SV 0.63 0.53 0.52
3 B SG 0.43 0.61 0.53
4 B R 0.81 0.51 0.53
5 T N 0.32 0.82 0.53
6 T L 0.58 0.69 0.62
7 T SV 0.67 0.61 0.64
8 T SG 0.53 0.63 0.57
9 T R 0.74 0.48 0.58
and
data = [['B','N',0.41,0.72,0.51],
['B','L',0.66,0.67,0.62],
['B','SV',0.63,0.51,0.51],
['B','SG',0.44,0.63,0.51],
['B','R',0.81,0.51,0.62],
['T','N',0.33,0.80,0.47],
['T','L',0.58,0.61,0.63],
['T','SV',0.68,0.61,0.64],
['T','SG',0.53,0.63,0.57],
['T','R',0.74,0.48,0.58]]
df1 = pd.DataFrame(data, columns = ['Name','Surname','P','R','F'])
I would like to create a plot based on the F values, keeping the information (in the legend/labels) about B/T and R, N, L, SV, SG.
I have tried bar charts, but they do not take the labels/legend into account.
I am looking for something like this:
fig, ax = plt.subplots()
ax2 = ax.twinx()
df.plot(x="Name", y=["F"], ax=ax)
df1.plot(x="Name", y=["F"], ax=ax2, ls="--")
However this misses labels and legend.
I have also tried with:
ax = df.plot()
l = ax.get_lines()
df1.plot(ax=ax, linestyle='--', color=(i.get_color() for i in l))
But then I cannot distinguish by Name, Surname and dataframe (the x-axis should show Surname).
It would also be OK to plot the values (P, R and F) separately, as follows:
ax = df[['P']].plot()
l = ax.get_lines()
df1[['P']].plot(ax=ax, linestyle='--', color=(i.get_color() for i in l))
I need to compare the F values of the two plots based on Name and Surname.
Any help would be greatly appreciated.

IIUC,
fig, ax = plt.subplots()
ax2 = ax.twinx()
df.plot(x="Name", y=["F"], ax=ax)
df1.plot(x="Name", y=["F"], ax=ax2, ls="--")
fig.legend(loc="upper right", bbox_to_anchor=(1,1), bbox_transform=ax.transAxes)
Output:

The simplest way to add information about other parameters to a graph is to use functions like ax.text or ax.annotate over a loop. The code should look like this:
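import numpy as np
import matplotlib.pyplot as plt
# index and bar_width are not defined in the original snippet; these values are assumptions
index = np.arange(len(df))
bar_width = 8
fig, ax = plt.subplots()
data1 = ax.bar(20*index, df["F"], bar_width)
data2 = ax.bar(20*index+bar_width, df1["F"], bar_width)
for i in index:
    # write Surname at the base of each bar and Name just above it
    ax.text(i*20-5, 0, df['Surname'][i])
    ax.text(i*20-5, 0.05, df['Name'][i])
    ax.text(i*20+bar_width-5, 0, df1['Surname'][i])
    ax.text(i*20+bar_width-5, 0.05, df1['Name'][i])
plt.show()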
Useful link:
Official Documentation for Text in Matplotlib Plots
Edit:
Probably similar problem: Different text at each point
Edit 2:
Code without index:
fig, ax = plt.subplots()
data1 = ax.plot(df["F"])
data2 = ax.plot(df1["F"])
for i in range(len(df)):
    # label each point of both lines with "Name Surname"
    ax.text(i, df["F"][i], df['Name'][i]+" "+df['Surname'][i])
    ax.text(i, df1["F"][i], df1['Name'][i]+" "+df1['Surname'][i])
plt.show()
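Another option, sketched here under the assumption that the goal is one line per Name and per dataframe with Surname on the x-axis, is to group each frame by Name and label every line. The helper name plot_f_comparison and the "df"/"df1" legend tags are made up for illustration, and the frames are trimmed to the columns that are actually plotted.

```python
import pandas as pd
import matplotlib.pyplot as plt

surnames = ["N", "L", "SV", "SG", "R"]
# Trimmed copies of the question's frames (Name, Surname and F only)
df = pd.DataFrame({"Name": ["B"] * 5 + ["T"] * 5,
                   "Surname": surnames * 2,
                   "F": [0.53, 0.61, 0.52, 0.53, 0.53, 0.53, 0.62, 0.64, 0.57, 0.58]})
df1 = pd.DataFrame({"Name": ["B"] * 5 + ["T"] * 5,
                    "Surname": surnames * 2,
                    "F": [0.51, 0.62, 0.51, 0.51, 0.62, 0.47, 0.63, 0.64, 0.57, 0.58]})


def plot_f_comparison(first, second, value="F"):
    """Hypothetical helper: one line per (Name, frame), Surname on the x-axis."""
    fig, ax = plt.subplots()
    # Solid lines for the first frame, dashed lines for the second
    for frame, style, tag in [(first, "-", "df"), (second, "--", "df1")]:
        for name, group in frame.groupby("Name"):
            ax.plot(group["Surname"], group[value], linestyle=style,
                    marker="o", label=f"{name} ({tag})")
    ax.set_xlabel("Surname")
    ax.set_ylabel(value)
    ax.legend()
    return ax


ax = plot_f_comparison(df, df1)
```

Call plt.show() afterwards to display the figure. Passing value="P" or value="R" would compare the other columns the same way, provided the frames contain them.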

Related

Data visualisation using ridge and scatter plot

Background:
I am working in Python and have a lot of data points (in .csv form). So far, my code does the following:
Reads the CSV and its "result" column.
If the value in the "result" column is positive, it plots the corresponding A B C D E F G parameters, with the parameter value on the y-axis and the parameter name on the x-axis.
If there are more than 10 such "result" values, it plots the A B C D E F G parameters for the first 10 only.
An example of the type of dataset is below. (Mine contains around 12000 rows)
The Dataset
A B C D E F G result
1.00 0.85 -0.999 0.27 0.98 0.39 0.80 -0.86
0.89 0.4 -0.6 0.47 0.28 0.29 0.26 0.65
0.65 -1.00 0.26 0.67 -0.88 0.29 0.10 0.50
0.98 -0.98 0.76 0.37 0.68 0.59 0.90 0
0 0.5 0.56 0.27 0.38 0.79 0.48 -0.65
The code :
df = pd.read_csv("result.csv")
df.loc[df.result>0, df.columns[:-1]].T.plot(ls='', marker='o')
Issue :
Sometimes several values are the same, so the dot markers land in the same place and it is hard to see the frequency distribution (for example, columns B and C below look similar, but one value has more points).
What I want is to draw something like a ridge plot on the current graph (as I sketched below) so that the frequency distribution can be seen.
I am a novice in this type of data visualization. Kindly guide me on how it could be done
The density plot type already does pretty much what you want; we just need to superimpose it on your data:
>>> data_to_plot = df.loc[df.result>0, df.columns[:-1]]
>>> data_to_plot.plot(kind='density')
This is trivial if you want horizontal subplots: simply pass subplots=True to either plot call, then zip the returned axes with the columns to superimpose the other plot:
>>> axes = data_to_plot.plot(kind='density', subplots=True, legend=False)
>>> for ax, (colname, series) in zip(axes, data_to_plot.items()):  # iteritems() was removed in pandas 2.0
...     ax.plot(series.values, np.zeros_like(series), ls='', marker='o')
...     ax.set_ylabel(colname)
However if you want them vertically it’s likely we’ll have to compute the Gaussian densities ourselves. Pandas documentation points to scipy.stats.gaussian_kde. For this we’ll need to know at which points to interpolate the kernel. On your example it looks like [-1..1] is a good interval but of course you can take it from data min/max.
>>> from scipy.stats import gaussian_kde
>>> y = np.arange(-1, 1.01, .01)
>>> ridges = data_to_plot.apply(lambda s: gaussian_kde(s)(y))
>>> ridges
A B C D E F G
0 0.001119 0.271510 0.270048 2.029737e-24 0.163222 2.352981e-15 0.000018
1 0.001247 0.272310 0.272122 4.796826e-24 0.164507 3.959987e-15 0.000021
2 0.001389 0.273071 0.274155 1.125941e-23 0.165765 6.637610e-15 0.000025
3 0.001545 0.273794 0.276145 2.624972e-23 0.166995 1.108083e-14 0.000030
4 0.001717 0.274479 0.278093 6.078288e-23 0.168200 1.842365e-14 0.000036
.. ... ... ... ... ... ... ...
196 0.939109 0.307535 0.314227 3.791151e-02 0.436305 3.153771e-01 0.630121
197 0.932996 0.304793 0.310216 3.100156e-02 0.431472 2.913782e-01 0.615406
198 0.926089 0.302012 0.306172 2.518140e-02 0.426576 2.682819e-01 0.600298
199 0.918401 0.299193 0.302097 2.031681e-02 0.421619 2.461581e-01 0.584834
200 0.909948 0.296337 0.297994 1.628194e-02 0.416607 2.250649e-01 0.569049
[201 rows x 7 columns]
Then simply plot with zip, as before. Some adjustment might be needed, but this is what it looks like with your sample data. Note that the ridges are scaled so that they all share the same scale and fit inside a 0.5-wide space on the plot.
>>> ax = data_to_plot.T.plot(ls='', marker='o')
>>> for n, (colname, ridge) in enumerate(ridges.items()):  # iteritems() was removed in pandas 2.0
...     ax.plot(ridge / (-2 * ridges.max().max()) + n, y, color='black')
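The steps above can be put together as a self-contained sketch. The random data below is a made-up stand-in for the filtered CSV rows, and the grid is taken from the data min/max rather than hard-coded to [-1, 1]; both choices are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Made-up stand-in for the rows with result > 0 (values in [-1, 1])
rng = np.random.default_rng(0)
data_to_plot = pd.DataFrame(rng.uniform(-1, 1, size=(50, 3)),
                            columns=list("ABC"))

# Evaluate each column's kernel density on a grid spanning the data range
y = np.linspace(data_to_plot.min().min(), data_to_plot.max().max(), 201)
ridges = pd.DataFrame({col: gaussian_kde(series.to_numpy())(y)
                       for col, series in data_to_plot.items()})

# Scatter the raw values, one x position per column
ax = data_to_plot.T.plot(ls="", marker="o", legend=False)

# Draw each density to the left of its column, at most 0.5 wide
scale = 2 * ridges.max().max()
for n, (colname, ridge) in enumerate(ridges.items()):
    ax.plot(n - ridge / scale, y, color="black")
```

Dividing by twice the global maximum keeps every ridge on a common scale, so the tallest one spans exactly half the distance between two neighbouring columns.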

Create a Radar Chart in python for each row of a Panda dataframe

I am using pandas in order to assign a score to some gamers.
I computed, using the same KPIs, some attributes for every gamer and now I have, for each player, a row with the results.
The dataframe looks like this (the only difference is that it has more columns) :
| Name | Speed | ATK |
| G1 | 0.32 | 0.89 |
| G4 | 0.31 | 0.76 |
I thought it would be nice to plot a radar chart (https://en.wikipedia.org/wiki/Radar_chart)
for each row using matplotlib (if possible).
How would you do it?
Is there a better alternative to matplotlib?
Thanks.
To get this spider-look, you need at least three columns. So, I've added a Random column to your dataframe:
import pandas as pd
df = pd.DataFrame({"Name": ["G1", "G4"],
"Speed": [0.32, 0.31],
"ATK": [0.89, 0.76],
"Random": [0.4, 0.8]})
print(df)
# Name Speed ATK Random
#0 G1 0.32 0.89 0.4
#1 G4 0.31 0.76 0.8
Now, let's see how to plot this simple dataframe. The following code is adapted from this blog post:
# import necessary modules
import numpy as np
import matplotlib.pyplot as plt
from math import pi
# obtain df information
categories = list(df)[1:]
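# average only the numeric columns (df.mean() fails on the string 'Name' column in recent pandas)
values = df[categories].mean().values.flatten().tolist()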
values += values[:1] # repeat the first value to close the circular graph
angles = [n / float(len(categories)) * 2 * pi for n in range(len(categories))]
angles += angles[:1]
# define plot
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(8, 8),
subplot_kw=dict(polar=True))
plt.xticks(angles[:-1], categories, color='grey', size=12)
plt.yticks(np.arange(0.5, 2, 0.5), ['0.5', '1.0', '1.5'],
color='grey', size=12)
plt.ylim(0, 2)
ax.set_rlabel_position(30)
# draw radar-chart:
for i in range(len(df)):
    val_c1 = df.loc[i].drop('Name').values.flatten().tolist()
    val_c1 += val_c1[:1]
    ax.plot(angles, val_c1, linewidth=1, linestyle='solid',
            label=df.loc[i]["Name"])
    ax.fill(angles, val_c1, alpha=0.4)
# add legend and show plot
plt.legend(loc='upper right', bbox_to_anchor=(0.1, 0.1))
plt.show()
This produces a graph like this:
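Since the question asks for a chart for each row, a variation that gives every row its own polar subplot might look like the sketch below. The helper name radar_per_row is hypothetical, and the radial limit of 1 assumes the scores lie in [0, 1], as they do in the sample data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Name": ["G1", "G4"], "Speed": [0.32, 0.31],
                   "ATK": [0.89, 0.76], "Random": [0.4, 0.8]})


def radar_per_row(frame, label_col="Name"):
    """Hypothetical helper: draw one polar subplot per row of `frame`."""
    categories = [c for c in frame.columns if c != label_col]
    # One angle per category, then repeat the first to close the polygon
    angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
    angles += angles[:1]
    fig, axes = plt.subplots(1, len(frame), figsize=(4 * len(frame), 4),
                             subplot_kw=dict(polar=True), squeeze=False)
    for ax, (_, row) in zip(axes[0], frame.iterrows()):
        values = [float(row[c]) for c in categories]
        values += values[:1]
        ax.plot(angles, values, linewidth=1)
        ax.fill(angles, values, alpha=0.4)
        ax.set_xticks(angles[:-1])
        ax.set_xticklabels(categories)
        ax.set_ylim(0, 1)  # assumption: scores are in [0, 1]
        ax.set_title(row[label_col])
    return axes[0]


axes = radar_per_row(df)
```

Call plt.show() to display the figure. With many players, a grid of subplots (several rows and columns) would probably read better than a single row.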

How to plot two colors in the one line by other columns value?

I have a dataframe like this:
df=pd.DataFrame([[1.65, -0.05, 0],
[1.68, -0.01, 0],
[1.70, 0.01, 1],
[1.67, -0.02, 1],
[1.73 , 0.05, 1],
[1.67 , 0.01, 1],
[ 1.67, -0.02, 1],
[1.70 , 0.03, 0],
[ 1.66, -0.01, 0],
[ 1.69 ,-0.01 , 0]
])
df.rename(columns={1: "diff", 2: "label"},inplace=True)
df['label']=df['label'].astype(str)
print(df)
0 diff label
0 1.65 -0.05 0
1 1.68 -0.01 0
2 1.70 0.01 1
3 1.67 -0.02 1
4 1.73 0.05 1
5 1.67 0.01 1
6 1.67 -0.02 1
7 1.70 0.03 0
8 1.66 -0.01 0
9 1.69 -0.01 0
I want to plot the first column and color it according to the 'label' column:
label=1 blue, label=0 red
That is, there are two colors in one line.
I use the following code to plot:
df.iloc[0:2,0].plot(y=df.columns[0],color='r', )
df.iloc[1:7,0].plot(y=df.columns[0],color='b' )
df.iloc[6:10,0].plot(y=df.columns[0],color='r' )
Is there a better way to plot this?
In fact, the real data has 10000 rows.
Essentially, you want to plot the values in column 0 against the index, with each point and the segment leading to it colored according to the label.
My proposed solution is to plot each datapoint individually:
# First, create a new column for color
# (the labels were cast to str in the question, so the keys must be strings)
df['color'] = df['label'].map({'0':'red','1':'blue'})
# Next, import & set up subplot
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1,1, figsize=(10,4))
# Iterate through rows
for idx, row in df[[0,'color']].iterrows():
    v, c = row
    # If you want a scatter plot
    ax.scatter(idx, v, color=c)
    if idx>0:
        # If you want a line plot
        ax.plot([idx-1,idx], [prev_v, v], color=c)
    # Set the previous value
    prev_v = v
# Add a legend
red_patch = mpatches.Patch(color='red', label='Losses')
blue_patch = mpatches.Patch(color='blue', label='Gains')
ax.legend(handles=[red_patch,blue_patch])
plt.show()
You can probably simplify it but as a general solution the following lines should help you grab all of the rows where label is 1 or 0:
# label == 1
df.iloc[df['label'].where(df['label'].astype(int) == 1).dropna().index].plot(y=df.columns[0], color='b')
# label == 0
df.iloc[df['label'].where(df['label'].astype(int) == 0).dropna().index].plot(y=df.columns[0], color='r')
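With around 10000 rows, calling ax.plot once per point can become slow. A different technique, not used in the answer above, is to build all segments at once with matplotlib's LineCollection. The sketch below assumes string labels "0" and "1" as in the question; the helper name colored_line is made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection


def colored_line(values, labels, colors=None):
    """Hypothetical helper: draw one line whose segments are colored by label."""
    colors = colors or {"0": "red", "1": "blue"}
    y = np.asarray(values, dtype=float)
    points = np.column_stack([np.arange(len(y)), y])
    # Segment k joins point k to point k + 1; the shape is (n - 1, 2, 2)
    segments = np.stack([points[:-1], points[1:]], axis=1)
    # Color each segment by the label of its end point, as in the loop above
    seg_colors = [colors[str(label)] for label in list(labels)[1:]]
    fig, ax = plt.subplots(figsize=(10, 4))
    lc = LineCollection(segments, colors=seg_colors)
    ax.add_collection(lc)
    ax.autoscale()  # rescale the view so the collection is visible
    return ax, lc


values = [1.65, 1.68, 1.70, 1.67, 1.73, 1.67, 1.67, 1.70, 1.66, 1.69]
labels = ["0", "0", "1", "1", "1", "1", "1", "0", "0", "0"]
ax, lc = colored_line(values, labels)
```

With the question's dataframe, the call would be colored_line(df[0], df['label']), followed by plt.show().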

Incorrect Python Matplotlib Polar Plotting

I'm trying to plot a polar plot using the code below
ax = plt.subplot(111,polar=True)
plt.scatter(ts,rds,c=cs,
s=[(y*100)+10 for y in ys],
cmap='gist_rainbow')
ax.set_yticklabels([])
for i in range(len(names)):
    ax.text(ts[i],rds[i],"{}".format(i+1),size=12)
plt.show()
There are 42 points, the first 7 of which are
# Theta Radius
1 249.25 0.39
2 66.25 0.40
3 239.09 0.71
4 31.82 1.05
5 114.02 0.54
6 189.15 0.46
7 359.00 0.05
However, the resulting figure/plot at the below link is incorrect
https://i.imgur.com/n5dMuND.png
Before taking a screenshot, I mouse hovered over point 7. You can see that point 7 is plotted at 41.186 degrees and not 359 as it should be. Any idea why this is?
Thanks
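A likely cause, assuming ts holds the angles in degrees as the table suggests: matplotlib's polar axes interpret theta in radians, so the values need converting before plotting. The sketch below uses only the seven points listed in the question; the variable names theta_deg, theta_rad and radius are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# The first seven points from the question (theta in degrees)
theta_deg = np.array([249.25, 66.25, 239.09, 31.82, 114.02, 189.15, 359.00])
radius = np.array([0.39, 0.40, 0.71, 1.05, 0.54, 0.46, 0.05])

# Polar axes expect radians, so convert before plotting
theta_rad = np.deg2rad(theta_deg)

fig, ax = plt.subplots(subplot_kw=dict(polar=True))
ax.scatter(theta_rad, radius)
ax.set_yticklabels([])
# Number the points as in the original code
for i, (t, r) in enumerate(zip(theta_rad, radius), start=1):
    ax.text(t, r, str(i), size=12)
```

After the conversion, point 7 should sit just below the 0 degree direction instead of somewhere in the first quadrant.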

How can I plot a correlation matrix as a set of ellipses, similar to the R open-air package?

The figure below is plotted using the open-air R package:
I know matplotlib has the plt.matshow function,
but it can't clearly show the relation between variables at the same time.
Here is my early work:
df is a pandas dataframe with 7 variables shows like below:
I don't know how to attach a .csv file to StackOverflow.
Using plt.matshow(df.corr(),cmap = plt.cm.Greens), the figure shows like this:
The second figure can't represent the correlation relations of the variables as clearly as the first one.
Edit:
I upload the csv file to Google docs here.
I'm not aware of any existing Python library that does these "ellipse plots", but it's not particularly hard to implement using a matplotlib.collections.EllipseCollection:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.collections import EllipseCollection
def plot_corr_ellipses(data, ax=None, **kwargs):
    M = np.array(data)
    if not M.ndim == 2:
        raise ValueError('data must be a 2D array')
    if ax is None:
        fig, ax = plt.subplots(1, 1, subplot_kw={'aspect':'equal'})
        ax.set_xlim(-0.5, M.shape[1] - 0.5)
        ax.set_ylim(-0.5, M.shape[0] - 0.5)
    # xy locations of each ellipse center
    xy = np.indices(M.shape)[::-1].reshape(2, -1).T
    # set the relative sizes of the major/minor axes according to the strength of
    # the positive/negative correlation
    w = np.ones_like(M).ravel()
    h = 1 - np.abs(M).ravel()
    a = 45 * np.sign(M).ravel()
    # offset_transform was called transOffset before Matplotlib 3.6
    ec = EllipseCollection(widths=w, heights=h, angles=a, units='x', offsets=xy,
                           offset_transform=ax.transData, array=M.ravel(), **kwargs)
    ax.add_collection(ec)
    # if data is a DataFrame, use the row/column names as tick labels
    if isinstance(data, pd.DataFrame):
        ax.set_xticks(np.arange(M.shape[1]))
        ax.set_xticklabels(data.columns, rotation=90)
        ax.set_yticks(np.arange(M.shape[0]))
        ax.set_yticklabels(data.index)
    return ec
For example, using your data:
data = df.corr()
fig, ax = plt.subplots(1, 1)
m = plot_corr_ellipses(data, ax=ax, cmap='Greens')
cb = fig.colorbar(m)
cb.set_label('Correlation coefficient')
ax.margins(0.1)
Negative correlations can be plotted as ellipses with the opposite orientation:
fig2, ax2 = plt.subplots(1, 1)
data2 = np.linspace(-1, 1, 9).reshape(3, 3)
m2 = plot_corr_ellipses(data2, ax=ax2, cmap='seismic', clim=[-1, 1])
cb2 = fig2.colorbar(m2)
ax2.margins(0.3)
Assuming you are interested in showing cluster relations, the seaborn package mentioned in the comments also has a clustermap. Using your correlation matrix (it looks like you want to show correlation coefficients as int in the [-100, 100] range), you could do the following:
corr = df.corr().mul(100).astype(int)
GX HG RM SJ XB XN ZG
GX 100 77 62 71 48 66 57
HG 77 100 69 74 61 61 58
RM 62 69 100 75 48 64 68
SJ 71 74 75 100 50 70 65
XB 48 61 48 50 100 46 51
XN 66 61 64 70 46 100 75
ZG 57 58 68 65 51 75 100
and then use seaborn.clustermap() as follows:
import seaborn as sns
sns.clustermap(data=corr, annot=True, fmt='d', cmap='Greens').savefig('cluster.png')
I just discovered this Python package biokit today. It provides a very handy function to create various kinds of correlation charts. For example:
In [1]: import pandas as pd
In [2]: import matplotlib.pyplot as plt
...: from biokit.viz import corrplot
In [6]: corr
Out[6]:
GX HG RM SJ XB XN ZG
GX 1.00 -0.77 0.62 0.71 0.48 0.66 0.57
HG -0.77 1.00 0.69 0.74 0.61 0.61 0.58
RM 0.62 0.69 1.00 0.75 0.48 0.64 0.68
SJ 0.71 0.74 0.75 1.00 0.50 0.70 0.65
XB 0.48 0.61 0.48 0.50 1.00 -0.46 0.51
XN 0.66 0.61 0.64 0.70 -0.46 1.00 0.75
ZG 0.57 0.58 0.68 0.65 0.51 0.75 1.00
I took Stefan's data and modified it a little bit. Let's assume this is a correlation matrix. Now to create a correlation chart, you can simply do this:
In [7]: c = corrplot.Corrplot(corr)
...: c.plot()
Correlation chart with ellipses
You can read more examples here.
