Incorrect Python Matplotlib Polar Plotting - python

I'm trying to plot a polar plot using the code below
ax = plt.subplot(111,polar=True)
plt.scatter(ts,rds,c=cs,
s=[(y*100)+10 for y in ys],
cmap='gist_rainbow')
ax.set_yticklabels([])
for i in range(len(names)):
ax.text(ts[i],rds[i],"{}".format(i+1),size=12)
plt.show()
There are 42 points, the first 7 of which are
# Theta Radius
1 249.25 0.39
2 66.25 0.40
3 239.09 0.71
4 31.82 1.05
5 114.02 0.54
6 189.15 0.46
7 359.00 0.05
However, the resulting figure/plot at the below link is incorrect
https://i.imgur.com/n5dMuND.png
Before taking a screenshot, I mouse hovered over point 7. You can see that point 7 is plotted at 41.186 degrees and not 359 as it should be. Any idea why this is?
Thanks

Related

Data visulisation using ridge and scatter plot

Background:
I am working on python, I have a lot of data points (in .CSV form) so far what the code I have
Reads the csv and the "result" column
if the value in the "result" column is positive, the code plots the A B C D E F G parameters corresponding to it in such a way that the y-axis is the value of the parameters and x-axis is the name of the parameter.
If the number of such "result" are more than 10 It plots the first 10 A B C D E F G parameters corresponding to the results.
An example of the type of dataset is below. (Mine contains around 12000 rows)
The Dataset
A B C D E F G result
1.00 0.85 -0.999 0.27 0.98 0.39 0.80 -0.86
0.89 0.4 -0.6 0.47 0.28 0.29 0.26 0.65
0.65 -1.00 0.26 0.67 -0.88 0.29 0.10 0.50
0.98 -0.98 0.76 0.37 0.68 0.59 0.90 0
0 0.5 0.56 0.27 0.38 0.79 0.48 -0.65
The code :
df = pd.read_csv("result.csv")
df.loc[df.result>0, df.columns[:-1]].T.plot(ls='', marker='o')
Issue :
Sometimes if the value is the same the dot mark is at the same place thus it's hard to see the frequency distribution(such as in Column B and C below though they look similar one value has more points.
What I want to do is to plot something like a ridge plot on the current graph (as I drew below )so that the frequency distribution can be seen.
I am a novice in this type of data visualization. Kindly guide me on how it could be done
The density plot type already does pretty much what you want, we just need to superpose it to your data:
>>> data_to_plot = df.loc[df.result>0, df.columns[:-1]]
>>> data_to_plot.plot(kind='density')
This is trivial if you want horizontal subplots, you can simply use the subplots=True on either plot (and then zip the returned axes with columns to superpose the other plot):
>>> axes = data_to_plot.plot(kind='density', subplots=True, legend=False)
>>> for ax, (colname, series) in zip(axes, data_to_plot.iteritems()):
... ax.plot(series.values, np.zeros_like(series), ls='', marker='o')
... ax.set_ylabel(colname)
However if you want them vertically it’s likely we’ll have to compute the Gaussian densities ourselves. Pandas documentation points to scipy.stats.gaussian_kde. For this we’ll need to know at which points to interpolate the kernel. On your example it looks like [-1..1] is a good interval but of course you can take it from data min/max.
>>> from scipy.stats import gaussian_kde
>>> y = np.arange(-1, 1.01, .01)
>>> ridges = data_to_plot.apply(lambda s: gaussian_kde(s)(y))
>>> ridges
A B C D E F G
0 0.001119 0.271510 0.270048 2.029737e-24 0.163222 2.352981e-15 0.000018
1 0.001247 0.272310 0.272122 4.796826e-24 0.164507 3.959987e-15 0.000021
2 0.001389 0.273071 0.274155 1.125941e-23 0.165765 6.637610e-15 0.000025
3 0.001545 0.273794 0.276145 2.624972e-23 0.166995 1.108083e-14 0.000030
4 0.001717 0.274479 0.278093 6.078288e-23 0.168200 1.842365e-14 0.000036
.. ... ... ... ... ... ... ...
196 0.939109 0.307535 0.314227 3.791151e-02 0.436305 3.153771e-01 0.630121
197 0.932996 0.304793 0.310216 3.100156e-02 0.431472 2.913782e-01 0.615406
198 0.926089 0.302012 0.306172 2.518140e-02 0.426576 2.682819e-01 0.600298
199 0.918401 0.299193 0.302097 2.031681e-02 0.421619 2.461581e-01 0.584834
200 0.909948 0.296337 0.297994 1.628194e-02 0.416607 2.250649e-01 0.569049
[201 rows x 7 columns]
Then simply ploy with zip, as before. There might be some adjustment needed, but this is how it looks like with your sample data. Note the scaling of ridges so they are all on the same scale and fit inside a 0.5-wide space on the plot.
>>> ax = data_to_plot.T.plot(ls='', marker='o')
>>> for n, (colname, ridge) in enumerate(ridges.iteritems()):
... ax.plot(ridge / (-2 * ridges.max().max()) + n, y, color='black')

Plotting values from two datasets for comparison

I would like to plot two dataframes in order to compare the results. My first choice would be to plot line charts based only on one column from the two dataframes.
df
Name Surname P R F
0 B N 0.41 0.76 0.53
1 B L 0.62 0.67 0.61
2 B SV 0.63 0.53 0.52
3 B SG 0.43 0.61 0.53
4 B R 0.81 0.51 0.53
5 T N 0.32 0.82 0.53
6 T L 0.58 0.69 0.62
7 T SV 0.67 0.61 0.64
8 T SG 0.53 0.63 0.57
9 T R 0.74 0.48 0.58
and
data = [['B','N',0.41,0.72,0.51],
['B','L',0.66,0.67,0.62],
['B','SV',0.63,0.51,0.51],
['B','SG',0.44,0.63,0.51],
['B','R',0.81,0.51,0.62],
['T','N',0.33,0.80,0.47],
['T','L',0.58,0.61,0.63],
['T','SV',0.68,0.61,0.64],
['T','SG',0.53,0.63,0.57],
['T','R',0.74,0.48,0.58]]
df1 = pd.DataFrame(data, columns = ['Name','Surname','P','R','F'])
I would like to create a plot based on F values, keeping information (in legend/labels) of B/T and R,N,L, SV, SG.
I have tried with bar charts, but this does not take into account labels/legend.
I am looking for something like this:
fig, ax = plt.subplots()
ax2 = ax.twinx()
df.plot(x="Name", y=["F"], ax=ax)
df1.plot(x="Name", y=["F"], ax=ax2, ls="--")
However this misses labels and legend.
I have also tried with:
ax = df.plot()
l = ax.get_lines()
df1.plot(ax=ax, linestyle='--', color=(i.get_color() for i in l))
But I cannot distinguish by Name, Surname and dataframe (on the x axis there should be Surname).
It would be also ok to plot separately the values (P, R and F) as follows:
ax = df[['P']].plot()
l = ax.get_lines()
df1[['P']].plot(ax=ax, linestyle='--', color=(i.get_color() for i in l))
I should compare F values of the two plots based on Name and Surname.
Any help would be greatly appreciated.
IIUC,
fig, ax = plt.subplots()
ax2 = ax.twinx()
df.plot(x="Name", y=["F"], ax=ax)
df1.plot(x="Name", y=["F"], ax=ax2, ls="--")
fig.legend(loc="upper right", bbox_to_anchor=(1,1), bbox_transform=ax.transAxes)
Output:
The simplest way to add information about other parameters to a graph is to use functions like ax.text or ax.annotate over a loop. The code should look like this:
fig, ax = plt.subplots()
data1 = ax.bar(20*index, df["F"], bar_width)
data2 = ax.bar(20*index+bar_width, df1["F"],bar_width)
for i in index:
ax.text(i*20-5,0,df['Surname'][i],)
ax.text(i*20-5,0.05,df['Name'][i])
ax.text(i*20+bar_width-5,0,df1['Surname'][i])
ax.text(i*20+bar_width-5,0.05,df1['Name'][i])
plt.show()
Useful link:
Official Documentation for Text in Matplotlib Plots
Edit:
Probably similar problem: Different text at each point
Edit 2:
Code without index:
fig, ax = plt.subplots()
data1 = ax.plot(df["F"])
data2 = ax.plot(df1["F"])
for i in range(1,10):
ax.text(i,df["F"][i],df['Name'][i]+" "+df['Surname'][i],)
ax.text(i,df["F"][i],df['Name'][i]+" "+df['Surname'][i],)
plt.show()

Making Categorical or Grouped Bar Graph with secondary Axis Line Graph

I need to compare different sets of daily data between 4 shifts(categorical / groupby), using bar graphs and line graphs. I have looked everywhere and have not found a working solution for this that doesn't include generating new pivots and such.
I've used both, matplotlib and seaborn, and while I can do one or the other(different colored bars/lines for each shift), once I incorporate the other, either one disappears, or other anomalies happen like only one plot point shows. I have looked all over and there are solutions for representing a single series of data on both chart types, but none that goes into multi category or grouped for both.
Data Example:
report_date wh_id shift Head_Count UTL_R
3/17/19 55 A 72 25%
3/18/19 55 A 71 10%
3/19/19 55 A 76 20%
3/20/19 55 A 59 33%
3/21/19 55 A 65 10%
3/22/19 55 A 54 20%
3/23/19 55 A 66 14%
3/17/19 55 1 11 10%
3/17/19 55 2 27 13%
3/17/19 55 3 18 25%
3/18/19 55 1 23 100%
3/18/19 55 2 16 25%
3/18/19 55 3 12 50%
3/19/19 55 1 28 10%
3/19/19 55 2 23 50%
3/19/19 55 3 14 33%
3/20/19 55 1 29 25%
3/20/19 55 2 29 25%
3/20/19 55 3 10 50%
3/21/19 55 1 17 20%
3/21/19 55 2 29 14%
3/21/19 55 3 30 17%
3/22/19 55 1 12 14%
3/22/19 55 2 10 100%
3/22/19 55 3 17 14%
3/23/19 55 1 16 10%
3/23/19 55 2 11 100%
3/23/19 55 3 13 10%
tm_daily_df = pd.read_csv('fg_TM_Daily.csv')
tm_daily_df = tm_daily_df.set_index('report_date')
fig2, ax2 = plt.subplots(figsize=(12,8))
ax3 = ax2.twinx()
group_obj = tm_daily_df.groupby('shift')
g = group_obj['Head_Count'].plot(kind='bar', x='report_date', y='Head_Count',ax=ax2,stacked=False,alpha = .2)
g = group_obj['UTL_R'].plot(kind='line',x='report_date', y='UTL_R', ax=ax3,marker='d', markersize=12)
plt.legend(tm_daily_df['shift'].unique())
This code has gotten me the closest I've been able to get. Notice that even with stacked = False, they are still stacked. I changed the setting to True, and nothing changes.
All i need is for the bars to be next to each other with the same color scheme representative of the shift
The graph:
Here are two solutions (stacked and unstacked). Based on your questions we will:
plot Head_Count in the left y axis and UTL_R in the right y axis.
report_date will be our x axis
shift will represent the hue of our graph.
The stacked version uses pandas default plotting feature, and the unstacked version uses seaborn.
EDIT
From your request, I added a 100% stacked graph. While it is not quite exactly what you asked in the comment, the graph type you asked may create some confusion when reading (are the values based on the upper line of the stack or the width of the stack). An alternative solution may be using a 100% stacked graph.
Stacked
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])
fig, ax = plt.subplots(figsize=(12,6))
ax2 = ax.twinx()
dfg['Head_Count'].unstack().plot.bar(stacked=True, ax=ax, alpha=0.6)
dfg['UTL_R'].unstack().plot(kind='line', ax=ax2, marker='o', legend=None)
ax.set_title('My Graph')
plt.show()
Stacked 100%
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])
# Create `Head_Count_Pct` column
for date in dfg.index.get_level_values('report_date').unique():
for shift in dfg.loc[date, :].index.get_level_values('shift').unique():
dfg.loc[(date, shift), 'Head_Count_Pct'] = dfg.loc[(date, shift), 'Head_Count'].sum() / dfg.loc[(date, 'A'), 'Head_Count'].sum()
fig, ax = plt.subplots(figsize=(12,6))
ax2 = ax.twinx()
pal = sns.color_palette("Set1")
dfg[dfg.index.get_level_values('shift').isin(['1','2','3'])]['Head_Count_Pct'].unstack().plot.bar(stacked=True, ax=ax, alpha=0.5, color=pal)
dfg['UTL_R'].unstack().plot(kind='line', ax=ax2, marker='o', legend=None, color=pal)
ax.set_title('My Graph')
plt.show()
Unstacked
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])
fig, ax = plt.subplots(figsize=(15,6))
ax2 = ax.twinx()
sns.barplot(x=dfg.index.get_level_values('report_date'),
y=dfg.Head_Count,
hue=dfg.index.get_level_values('shift'), ax=ax, alpha=0.7)
sns.lineplot(x=dfg.index.get_level_values('report_date'),
y=dfg.UTL_R,
hue=dfg.index.get_level_values('shift'), ax=ax2, marker='o', legend=None)
ax.set_title('My Graph')
plt.show()
EDIT #2
Here is the graph as you requested in a second time (stacked, but stack n+1 does not start where stack n ends).
It is slightly more involving as we have to do multiple things:
- we need to manually assign our color to our shift in our df
- once we have our colors assign, we will iterate through each date range and 1) sort or Head_Count values descending (so that our largest sack is in the back when we plot the graph), and 2) plot the data and assign the color to each stacj
- Then we can create our second y axis and plot our UTL_R values
- Then we need to assign the correct color to our legend labels
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
def assignColor(shift):
if shift == 'A':
return 'R'
if shift == '1':
return 'B'
if shift == '2':
return 'G'
if shift == '3':
return 'Y'
# map a color to a shift
df['color'] = df['shift'].apply(assignColor)
fig, ax = plt.subplots(figsize=(15,6))
# plot our Head_Count values
for date in df.report_date.unique():
d = df[df.report_date == date].sort_values(by='Head_Count', ascending=False)
y = d.Head_Count.values
x = date
color = d.color
b = plt.bar(x,y, color=color)
# Plot our UTL_R values
ax2 = ax.twinx()
sns.lineplot(x=df.report_date, y=df.UTL_R, hue=df['shift'], marker='o', legend=None)
# Assign the color label color to our legend
leg = ax.legend(labels=df['shift'].unique(), loc=1)
legend_maping = dict()
for shift in df['shift'].unique():
legend_maping[shift] = df[df['shift'] == shift].color.unique()[0]
i = 0
for leg_lab in leg.texts:
leg.legendHandles[i].set_color(legend_maping[leg_lab.get_text()])
i += 1
How about this?
tm_daily_df['UTL_R'] = tm_daily_df['UTL_R'].str.replace('%', '').astype('float') / 100
pivoted = tm_daily_df.pivot_table(values=['Head_Count', 'UTL_R'],
index='report_date',
columns='shift')
pivoted
# Head_Count UTL_R
# shift 1 2 3 A 1 2 3 A
# report_date
# 3/17/19 11 27 18 72 0.10 0.13 0.25 0.25
# 3/18/19 23 16 12 71 1.00 0.25 0.50 0.10
# 3/19/19 28 23 14 76 0.10 0.50 0.33 0.20
# 3/20/19 29 29 10 59 0.25 0.25 0.50 0.33
# 3/21/19 17 29 30 65 0.20 0.14 0.17 0.10
# 3/22/19 12 10 17 54 0.14 1.00 0.14 0.20
# 3/23/19 16 11 13 66 0.10 1.00 0.10 0.14
fig, ax = plt.subplots()
pivoted['Head_Count'].plot.bar(ax=ax)
pivoted['UTL_R'].plot.line(ax=ax, legend=False, secondary_y=True, marker='D')
ax.legend(loc='upper left', title='shift')

How can i plot this in matplotlib with 3 dimensions?

My data looks like this:
Rate 4 8 16 32 64 128 256
David 0.25 0.176 0.652 0.126 0.123 0.142 0.318
Saul 0.132 0.244 0.142 0.162 0.174 0.244 0.149
Maria 0.145 0.189 0.65 0.42 0.111 0.197 0.182
I need to plot this data 3d. For clarity, I have 3 dimensions, first one is names like David etc. Secondly the rate, and a variable value for every combination of Name and Rate.
Which kind of plot should I use? I am a little confused because rates and variables are in the same dimension but names are different and when I use regular 3d plot like scatter it said xs, ys and zs should be in the same dimension.
This is my code:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig=plt.figure()
ax=fig.add_subplot(111, projection='3d')
X=[1,2,3]
Y=[4,5,6]
Z=[7,8,9]
ax.scatter(X,Y,Z, c='r', marker='o')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
plt.show()
If we assume X, Y and Z as names, rates and variables, then an error arise about x and y and z should be in the same dimension!

How can I plot a correlation matrix as a set of ellipses, similar to the R open-air package?

The figure below is plotted using the open-air R package:
I know matplotlib has the plt.matshow function,
but it can't clearly show the relation between variables at the same time.
Here is my early work:
df is a pandas dataframe with 7 variables shows like below:
I don't know how to attach a .csv file to StackOverflow.
Using plt.matshow(df.corr(),cmap = plt.cm.Greens), the figure shows like this:
The second figure can't represent the correlation relations of the variables as clearly as the first one.
Edit:
I upload the csv file to Google docs here.
I'm not aware of any existing Python library that does these "ellipse plots", but it's not particularly hard to implement using a matplotlib.collections.EllipseCollection:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.collections import EllipseCollection
def plot_corr_ellipses(data, ax=None, **kwargs):
M = np.array(data)
if not M.ndim == 2:
raise ValueError('data must be a 2D array')
if ax is None:
fig, ax = plt.subplots(1, 1, subplot_kw={'aspect':'equal'})
ax.set_xlim(-0.5, M.shape[1] - 0.5)
ax.set_ylim(-0.5, M.shape[0] - 0.5)
# xy locations of each ellipse center
xy = np.indices(M.shape)[::-1].reshape(2, -1).T
# set the relative sizes of the major/minor axes according to the strength of
# the positive/negative correlation
w = np.ones_like(M).ravel()
h = 1 - np.abs(M).ravel()
a = 45 * np.sign(M).ravel()
ec = EllipseCollection(widths=w, heights=h, angles=a, units='x', offsets=xy,
transOffset=ax.transData, array=M.ravel(), **kwargs)
ax.add_collection(ec)
# if data is a DataFrame, use the row/column names as tick labels
if isinstance(data, pd.DataFrame):
ax.set_xticks(np.arange(M.shape[1]))
ax.set_xticklabels(data.columns, rotation=90)
ax.set_yticks(np.arange(M.shape[0]))
ax.set_yticklabels(data.index)
return ec
For example, using your data:
data = df.corr()
fig, ax = plt.subplots(1, 1)
m = plot_corr_ellipses(data, ax=ax, cmap='Greens')
cb = fig.colorbar(m)
cb.set_label('Correlation coefficient')
ax.margins(0.1)
Negative correlations can be plotted as ellipses with the opposite orientation:
fig2, ax2 = plt.subplots(1, 1)
data2 = np.linspace(-1, 1, 9).reshape(3, 3)
m2 = plot_corr_ellipses(data2, ax=ax2, cmap='seismic', clim=[-1, 1])
cb2 = fig2.colorbar(m2)
ax2.margins(0.3)
Assuming you are interested in showing cluster relations, the seaborn package mentioned in the comments also has a clustermap. Using your correlation matrix (looks like you want to show correlation coefficients as int in the [-100, 100] range, you could do the following:
corr = df.corr().mul(100).astype(int)
GX HG RM SJ XB XN ZG
GX 100 77 62 71 48 66 57
HG 77 100 69 74 61 61 58
RM 62 69 100 75 48 64 68
SJ 71 74 75 100 50 70 65
XB 48 61 48 50 100 46 51
XN 66 61 64 70 46 100 75
ZG 57 58 68 65 51 75 100
and then use seaborn.clustermap() as follows:
import seaborn as sns
sns.clustermap(data=corr, annot=True, fmt='d', cmap='Greens').savefig('cluster.png')
I just discovered this Python package biokit today. It provides a very handy function to create various kinds of correlation charts. For example:
In [1]: import pandas as pd
In [2]: import matplotlib.pyplot as plt
...: from biokit.viz import corrplot
In [6]: corr
Out[6]:
GX HG RM SJ XB XN ZG
GX 1.00 -0.77 0.62 0.71 0.48 0.66 0.57
HG -0.77 1.00 0.69 0.74 0.61 0.61 0.58
RM 0.62 0.69 1.00 0.75 0.48 0.64 0.68
SJ 0.71 0.74 0.75 1.00 0.50 0.70 0.65
XB 0.48 0.61 0.48 0.50 1.00 -0.46 0.51
XN 0.66 0.61 0.64 0.70 -0.46 1.00 0.75
ZG 0.57 0.58 0.68 0.65 0.51 0.75 1.00
I took Stefan's data and modified it a little bit. Let's assume this is a correlation matrix. Now to create a correlation chart, you can simply do this:
In [7]: c = corrplot.Corrplot(corr)
...: c.plot()
Correlation chart with ellipses
You can read more examples here.

Categories