Pandas scatter_matrix analog function to pairs(lower.panel, upper.panel)

I need to create a scatter matrix in Python. I tried using scatter_matrix for this, but I would like to keep only the scatter plots above the diagonal.
I'm at the very beginning (I have not got far) and I run into trouble when the columns have names rather than the default integer labels.
Here is my code:
import itertools
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data=pd.DataFrame(np.random.randint(0,100,size=(10, 5)), columns=list('ABCDE')) #THE PROBLEM IS HERE - I WILL HAVE COLUMNS WITH NAMES
d = data.shape[1]
fig, axes = plt.subplots(nrows=d, ncols=d, sharex=True, sharey=True)
for i in range(d):
    for j in range(d):
        ax = axes[i,j]
        if i == j:
            ax.text(0.5, 0.5, "Diagonal", transform=ax.transAxes,
                    horizontalalignment='center', verticalalignment='center',
                    fontsize=16)
        else:
            ax.scatter(data[j], data[i], s=10)

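The problem is in how you select a column from the data frame: data[j] looks up a column labelled j, and your columns are labelled 'A' to 'E', not 0 to 4. You can use iloc to select columns by integer position instead. Change your last line to:
ax.scatter(data.iloc[:,j], data.iloc[:,i], s=10)
This gives the full grid of scatter plots, with the diagonal cells labelled.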
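Since the question asks for scatter plots above the diagonal only, here is a minimal sketch of one way to get there with the same iloc fix. It is not a built-in pandas feature: it simply hides the axes below the diagonal. The seeded random data, the figure size and the choice to drop sharex/sharey (the shared tick labels would sit on the hidden axes) are assumptions made for the illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data with named columns, as in the question (seeded for repeatability)
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 100, size=(10, 5)), columns=list("ABCDE"))
d = data.shape[1]

fig, axes = plt.subplots(nrows=d, ncols=d, figsize=(8, 8))
for i in range(d):
    for j in range(d):
        ax = axes[i, j]
        if i == j:
            # Diagonal: show the column name instead of a plot
            ax.text(0.5, 0.5, data.columns[i], transform=ax.transAxes,
                    ha="center", va="center", fontsize=16)
            ax.set_xticks([])
            ax.set_yticks([])
        elif j > i:
            # Above the diagonal: select columns by position, not by label
            ax.scatter(data.iloc[:, j], data.iloc[:, i], s=10)
        else:
            # Below the diagonal: hide the axes entirely
            ax.set_visible(False)
```

Call plt.show() or fig.savefig(...) afterwards to see the result.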

Related

Dynamic pandas subplots with matplotlib

I need help creating subplots in matplotlib dynamically from a pandas dataframe.
The data I am using is from data.world.
I have already created the viz, but the plots were created manually.
The reason I need it to be dynamic is that I am going to apply a filter dynamically (in Power BI) and I need the graph to adjust to the filter.
This is what I have so far:
I imported the data and got it into the shape I need:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# read file from makeover monday year 2018 week 48
df = pd.read_csv(r'C:\Users\Ruth Pozuelo\Documents\python_examples\data\2018w48.csv', usecols=["city", "category","item", "cost"], index_col=False, decimal=",")
df.head()
this is the table:
I then apply the filter that will come from Power BI dynamically:
df = df[df.category=='Party night']
and then I count the number of plots based on the number of items I get after I apply the filter:
itemCount = df['item'].nunique() #number of plots
If I then plot the subplots:
fig, ax = plt.subplots( nrows=1, ncols=itemCount ,figsize=(30,10), sharey=True)
I get the skeleton:
So far so good!
But now I am stuck on how to feed the x axis to the loop to generate the subcategories. I am trying something like the code below, but nothing works.
#for i, ax in enumerate(axes.flatten()):
# ax.plot(??,cityValues, marker='o',markersize=25, lw=0, color="green") # The top-left axes
As I already have the code for the look and feel of the chart, annotations, etc., I would love to be able to use the plt.subplots method, and I would prefer not to use seaborn if possible.
Any ideas on how to get this working?
Thanks in advance!

I used the data you provided as the basis for the code. I prepared a list of column names and a list of colors and looped over them together with the axes. axes.ravel() can be more memory efficient than axes.flatten(), because ravel() returns a view where possible while flatten() always returns a copy. Either way, the flattened array holds one Axes object per subplot, so every subplot can be configured in a single loop.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
url='https://raw.githubusercontent.com/Curbal-Data-Labs/Matplotlib-Labs/master/2018w48.csv'
dataset = pd.read_csv(url)
dataset.drop_duplicates(['city','item'], inplace=True)
dataset.pivot_table(index='city', columns='item', values='cost', aggfunc='sum', margins = True).sort_values('All', ascending=True).drop('All', axis=1)
df = dataset.pivot_table(index='city', columns='item', values='cost', aggfunc='sum', margins = True).sort_values('All', ascending=True).drop('All', axis=1).sort_values('All', ascending=False, axis=1).drop('All').reset_index()
# comma replace
for c in df.columns[1:]:
    df[c] = df[c].str.replace(',','.').astype(float)
fig, axes = plt.subplots(nrows=1, ncols=5, figsize=(30,10), sharey=True)
colors = ['green','blue','red','black','brown']
col_names = ['Dinner','Drinks at Dinner','2 Longdrinks','Club entry','Cinema entry']
for i, (ax,col,c) in enumerate(zip(axes.ravel(), col_names, colors)):
    ax.plot(df.loc[:,col], df['city'], marker='o', markersize=25, lw=0, color=c)
    ax.set_title(col)
    for i,j in zip(df[col], df['city']):
        ax.annotate('$'+str(i), xy=(i, j), xytext=(i-4,j), color="white", fontsize=8)
    ax.set_xticks([])
    ax.spines[['top', 'right', 'left', 'bottom']].set_visible(False)
    ax.grid(True, axis='y', linestyle='solid', linewidth=2)
    ax.grid(True, axis='x', linestyle='solid', linewidth=0.2)
    ax.xaxis.tick_top()
    ax.xaxis.set_label_position('top')
    ax.set_xlim(xmin=0, xmax=160)
    ax.xaxis.set_major_formatter('${x:1.0f}')
    ax.tick_params(labelsize=8, top=False, left=False)
plt.show()
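The code above hard-codes five columns, while the question asks for a subplot count that follows the filter. A minimal sketch of that part, assuming a long-format frame with city, item and cost columns as in the question: the number of subplots comes from the unique items, and squeeze=False keeps the axes array two-dimensional even when only one item survives the filter. The city names and costs here are made up for illustration and are not taken from the real dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the filtered data (illustrative values only)
df = pd.DataFrame({
    "city": ["Zurich", "Zurich", "Paris", "Paris", "Rome", "Rome"],
    "item": ["Dinner", "Club entry"] * 3,
    "cost": [95.0, 20.0, 70.0, 15.0, 60.0, 12.0],
})

items = df["item"].unique()  # one subplot per item left after filtering

# squeeze=False: `axes` is always a 2-D array, even for a single item
fig, axes = plt.subplots(nrows=1, ncols=len(items),
                         figsize=(4 * len(items), 4),
                         sharey=True, squeeze=False)

for ax, item in zip(axes.ravel(), items):
    subset = df[df["item"] == item]  # rows belonging to this subplot
    ax.plot(subset["cost"], subset["city"],
            marker="o", markersize=12, lw=0, color="green")
    ax.set_title(item)
```

The styling calls from the loop above (grid, spines, formatter) can be added inside this loop unchanged, since each iteration still receives one Axes object.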

Working example below. I used seaborn to plot the bars, but the idea is the same: you can loop through the facets and increment a counter. Start from -1 so that the first count is 0, and use it as the index of the axis to draw on.
import seaborn as sns
fig, ax = plt.subplots( nrows=1, ncols=itemCount ,figsize=(30,10), sharey=True)
df['Cost'] = df['Cost'].astype(float)
count = -1
variables = df['Item'].unique()
fig, axs = plt.subplots(1,itemCount , figsize=(25,70), sharex=False, sharey= False)
for var in variables:
    count += 1
    # plot only the rows for this item on its own axis
    sns.barplot(ax=axs[count], data=df[df['Item'] == var], x='Cost', y='City')

matplotlib: Add AxesSubplot instances to a figure

I'm going insane here ... this should be a simple exercise but I'm stuck:
I have a Jupyter notebook and am using the ruptures Python package. All I want to do is, take the figure or AxesSubplot(s) that the display() function returns and add it to a figure of my own, so I can share the x-axis, have a single image, etc.:
import pandas as pd
import ruptures as rpt
import matplotlib.pyplot as plt
myfigure = plt.figure()
l = len(df.columns)
for index, series in enumerate(df):
    data = series.to_numpy().astype(int)
    algo = rpt.KernelCPD(kernel='rbf', min_size=4).fit(data)
    result = algo.predict(pen=3)
    myfigure.add_subplot(l, 1, index+1)
    rpt.display(data, result)
    plt.title(series.name)
plt.show()
What I get is a figure with the desired number of subplots (all empty) and n separate figures from ruptures, when instead I want the subplots to be filled with those figures ...

I basically had to recreate the plot that ruptures.display(data,result) produces, to get my desired figure:
import pandas as pd
import numpy as np
import ruptures as rpt
import matplotlib.pyplot as plt
from matplotlib.ticker import EngFormatter
fig, axs = plt.subplots(len(df.columns), figsize=(22,20), dpi=300)
for index, series in enumerate(df):
    # ffill() replaces the older pad(), which newer pandas versions no longer provide
    resampled = df[series].dropna().resample('6H').mean().ffill()
    data = resampled.to_numpy().astype(int)
    algo = rpt.KernelCPD(kernel='rbf', min_size=4).fit(data)
    result = algo.predict(pen=3)
    # Create ndarray of tuples from the result
    result = np.insert(result, 0, 0) # Insert 0 as first result
    tuples = np.array([ result[i:i+2] for i in range(len(result)-1) ])
    ax = axs[index]
    # Fill area between results alternating blue/red
    for i, tup in enumerate(tuples):
        if i%2==0:
            ax.axvspan(tup[0], tup[1], lw=0, alpha=.25)
        else:
            ax.axvspan(tup[0], tup[1], lw=0, alpha=.25, color='red')
    ax.plot(data)
    ax.set_title(series)
    ax.yaxis.set_major_formatter(EngFormatter())
plt.subplots_adjust(hspace=.3)
plt.show()
I've wasted more time on this than I can justify, but it's pretty now and I can sleep well tonight :D
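The drawing part of this workaround can be pulled out into a small helper that works on any Axes you own. The sketch below does not call ruptures at all: shade_segments is a hypothetical name, and the breakpoint lists are hard-coded stand-ins for what algo.predict(pen=3) returns in the code above (end indices, with 0 prepended inside the helper just as the answer does).

```python
import matplotlib.pyplot as plt
import numpy as np


def shade_segments(ax, signal, breakpoints):
    """Plot `signal` on `ax` and shade the stretches between breakpoints.

    `breakpoints` is a sorted list of segment end indices (hypothetical input,
    standing in for a change-point detection result). Returns the segment count.
    """
    edges = [0] + list(breakpoints)  # prepend 0, as the answer does
    for k, (start, end) in enumerate(zip(edges[:-1], edges[1:])):
        color = "tab:blue" if k % 2 == 0 else "red"  # alternate blue/red
        ax.axvspan(start, end, lw=0, alpha=0.25, color=color)
    ax.plot(signal)
    return len(edges) - 1


# Synthetic signals with obvious level shifts (illustrative only)
rng = np.random.default_rng(0)
signals = {
    "a": np.concatenate([rng.normal(0, 1, 50), rng.normal(5, 1, 50)]),
    "b": np.concatenate([rng.normal(0, 1, 30), rng.normal(4, 1, 40),
                         rng.normal(-2, 1, 30)]),
}
breakpoints = {"a": [50, 100], "b": [30, 70, 100]}  # hard-coded, not detected

fig, axs = plt.subplots(len(signals), sharex=True, squeeze=False)
counts = []
for ax, (name, sig) in zip(axs.ravel(), signals.items()):
    counts.append(shade_segments(ax, sig, breakpoints[name]))
    ax.set_title(name)
```

Because the helper takes the Axes as an argument, sharing the x axis and keeping everything in one figure is handled by the plt.subplots call rather than by the plotting routine.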

Plot legend shows unknown item/ same legend item shown twice with line different style

I am plotting some routes on a black and white PNG. There is an item in the legend that should not be there. I iterate over a pandas dataframe and identify the different routes by their unique id. I also have a start and an end point right at the beginning of the dataframe, at i=0 and i=1, which I plot with marker='o' so that I can see those single points (single rows in my dataframe) on the plot. All is working fine so far, but as you can see in the legend, there are 2 entries for i=0: first the starting point, and then a second entry with an orange line. How can that be? In the dataframe there is definitely only 1 row with id=0.
Here is my code with an example dataframe:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({'x':[100,60,1,1,1,5,4,4], 'y':[100,125,1,2,3,10,10,9],'id':[0,1,2,2,2,3,3,3]})
for i, g in df.groupby('id'):
    if(i==0):
        g.plot(x='x',y='y',ax=ax,marker='o',title="Alternative Routes",label="Start Punkt")
    if(i==1):
        g.plot(x='x',y='y',ax=ax,marker='o',title="Alternative Routes",label="End Punkt")
    else:
        g.plot(x='x',y='y',ax=ax, title="Alternative Routes",label=i)
plt.show()
Here the resulting plot:

Found the answer myself: it should be an elif instead of an if for i==1.
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({'x':[100,60,1,1,1,5,4,4], 'y':[100,125,1,2,3,10,10,9],'id':[0,1,2,2,2,3,3,3]})
for i, g in df.groupby('id'):
    if(i==0):
        g.plot(x='x',y='y',ax=ax,marker='o',title="Alternative Routes",label="Start Punkt")
    elif(i==1):
        g.plot(x='x',y='y',ax=ax,marker='o',title="Alternative Routes",label="End Punkt")
    else:
        g.plot(x='x',y='y',ax=ax, title="Alternative Routes",label=i)
plt.show()
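To see why the extra legend entry appears, it may help to trace the branching without any plotting. In the sketch below, the two functions are hypothetical stand-ins that record which label would be drawn for each id. With two separate if statements, the else belongs only to the second if, so id 0 falls through to it and is drawn a second time.

```python
def labels_with_two_ifs(ids):
    """Mimic the original branching: `if` / `if` / `else`."""
    drawn = []
    for i in ids:
        if i == 0:
            drawn.append((i, "Start Punkt"))
        if i == 1:
            drawn.append((i, "End Punkt"))
        else:
            # Runs for every i != 1, including i == 0
            drawn.append((i, str(i)))
    return drawn


def labels_with_elif(ids):
    """Mimic the fixed branching: `if` / `elif` / `else`."""
    drawn = []
    for i in ids:
        if i == 0:
            drawn.append((i, "Start Punkt"))
        elif i == 1:
            drawn.append((i, "End Punkt"))
        else:
            drawn.append((i, str(i)))
    return drawn


print(labels_with_two_ifs([0, 1, 2, 3]))
# [(0, 'Start Punkt'), (0, '0'), (1, 'End Punkt'), (2, '2'), (3, '3')]
print(labels_with_elif([0, 1, 2, 3]))
# [(0, 'Start Punkt'), (1, 'End Punkt'), (2, '2'), (3, '3')]
```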

sharey='all' argument in plt.subplots() not passed to df.plot()?

I have a pandas dataframe which I would like to slice, and plot each slice in a separate subplot. I would like to use the sharey='all' and have matplotlib decide on some reasonable y-axis limits, rather than having to search the dataframe for the min and max and add offsets.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig, axes = plt.subplots(nrows=0,ncols=0, sharey='all', tight_layout=True)
for i in range(1, len(df.columns) + 1):
    ax = fig.add_subplot(2,3,i)
    iC = df.iloc[:, i-1]
    iC.plot(ax=ax)
Which gives the following plot:
In fact, it gives that irrespective of what I specify sharey to be ('all', 'col', 'row', True, or False). What I was after with sharey='all' would be something like:
Can somebody perhaps explain to me what I'm doing wrong here?

The following version would only add those axes you need for your df-columns and share their y-scales:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig = plt.figure(tight_layout=True)
ref_ax = None
for i in range(len(df.columns)):
    ax = fig.add_subplot(2, 3, i+1, sharey=ref_ax)
    ref_ax=ax
    iC = df.iloc[:, i]
    iC.plot(ax=ax)
plt.show()
The grid-layout parameters, which are given explicitly as add_subplot(2, 3, ...) here, can of course be calculated from len(df.columns).
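A minimal sketch of that calculation, assuming you fix the number of columns (3 here, an arbitrary choice) and derive the number of rows with a ceiling division:

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(50).reshape((5, 10))).transpose()

n = len(df.columns)
ncols = 3                     # assumed fixed; pick whatever suits the figure
nrows = math.ceil(n / ncols)  # enough rows to hold all columns of df

fig = plt.figure(tight_layout=True)
ref_ax = None
for i in range(n):
    ax = fig.add_subplot(nrows, ncols, i + 1, sharey=ref_ax)
    if ref_ax is None:
        ref_ax = ax           # every later axes shares y with the first one
    df.iloc[:, i].plot(ax=ax)
```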

Your plots are not shared. You create a subplot grid with 0 rows and 0 columns, i.e. no subplots at all, and it is those nonexistent subplots that have their y axes shared. Then you create some other (existing) subplots with add_subplot, which are not shared. Those are the ones that are plotted to.
Instead you need to set nrows and ncols to some useful values and plot to the axes created that way.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig, axes = plt.subplots(nrows=2,ncols=3, sharey='all', tight_layout=True)
for i, ax in zip(range(len(df.columns)), axes.flat):
    iC = df.iloc[:, i]
    iC.plot(ax=ax)
for j in range(len(df.columns),len(axes.flat)):
    axes.flatten()[j].axis("off")
plt.show()
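If you want to check whether two axes actually share their y axis, one option is to ask the axes directly. The sketch below compares axes created by plt.subplots(..., sharey='all') with axes added one by one through add_subplot without a sharey argument. The variable names are made up for the illustration.

```python
import matplotlib.pyplot as plt

# Axes created together with sharey='all' are joined in one sharing group
fig_a, axes_a = plt.subplots(nrows=2, ncols=3, sharey="all")
shared_a = axes_a[0, 0].get_shared_y_axes().joined(axes_a[0, 0], axes_a[1, 2])

# Axes added afterwards with add_subplot (no sharey argument) are independent
fig_b = plt.figure()
b1 = fig_b.add_subplot(2, 3, 1)
b2 = fig_b.add_subplot(2, 3, 2)
shared_b = b1.get_shared_y_axes().joined(b1, b2)

print(shared_a, shared_b)
```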

How to plot heat map in matplotlib with label at both side right and left

UPDATED
I have written the code given below.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv("data_1.csv",index_col="Group")
print(df)
fig,ax = plt.subplots(1)
heatmap = ax.pcolor(df)########
ax.pcolor(df,edgecolors='k')
cbar = plt.colorbar(heatmap)##########
plt.ylim([0,12])
ax.invert_yaxis()
locs_y, labels_y = plt.yticks(np.arange(0.5, len(df.index), 1), df.index)
locs_x, labels_x = plt.xticks(np.arange(0.5, len(df.columns), 1), df.columns)
ax.set_xticklabels(labels_x, rotation=10)
ax.set_yticklabels(labels_y,fontsize=10)
plt.show()
It takes input like the one given below and plots a heat map with labels on two sides, left and bottom.
GP1,c1,c2,c3,c4,c5
S1,21,21,20,69,30
S2,28,20,20,39,25
S3,20,21,21,44,21
I further want to add additional labels on the right side, as in the data given below, and plot a heatmap with labels on three sides: right, left and bottom.
GP1,c1,c2,c3,c4,c5
S1,21,21,20,69,30,V1
S2,28,20,20,39,25,V2
S3,20,21,21,44,21,V3
What changes should I incorporate into the code?
Please help.

You may create a new axis on the right of the plot with twinx(). Then you essentially need to adjust this axis the same way you already did with the first axis.
u = u"""GP1,c1,c2,c3,c4,c5
S1,21,21,20,69,30
S2,28,20,20,39,25
S3,20,21,21,44,21"""
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df= pd.read_csv(io.StringIO(u),index_col="GP1")
fig,ax = plt.subplots(1)
heatmap = ax.pcolor(df, edgecolors='k')
cbar = plt.colorbar(heatmap, pad=0.1)
bx = ax.twinx()
ax.set_yticks(np.arange(0.5, len(df.index), 1))
ax.set_xticks(np.arange(0.5, len(df.columns), 1), )
ax.set_xticklabels(df.columns, rotation=10)
ax.set_yticklabels(df.index,fontsize=10)
bx.set_yticks(np.arange(0.5, len(df.index), 1))
bx.set_yticklabels(["V1","V2","V3"],fontsize=10)
ax.set_ylim([0,12])
bx.set_ylim([0,12])
ax.invert_yaxis()
bx.invert_yaxis()
plt.show()
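The right-hand labels above are hard-coded. If they live in the data file as an extra column, they could be read from there instead. The sketch below assumes the header has a seventh name for that column (called "label" here, which is a made-up name and not in the original data) and sets the y limits from the number of rows rather than the fixed 12.

```python
import io

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical input: same data, with the extra column named "label" in the header
u = """GP1,c1,c2,c3,c4,c5,label
S1,21,21,20,69,30,V1
S2,28,20,20,39,25,V2
S3,20,21,21,44,21,V3"""

df = pd.read_csv(io.StringIO(u), index_col="GP1")
right_labels = list(df.pop("label"))  # take the text column out of the numeric data

fig, ax = plt.subplots(1)
heatmap = ax.pcolor(df.to_numpy(), edgecolors="k")
fig.colorbar(heatmap, ax=ax, pad=0.1)
bx = ax.twinx()  # second y axis on the right-hand side

ticks = np.arange(0.5, len(df.index), 1)  # centre of each row
for axis, labels in ((ax, list(df.index)), (bx, right_labels)):
    axis.set_yticks(ticks)
    axis.set_yticklabels(labels, fontsize=10)
    axis.set_ylim(0, len(df.index))
    axis.invert_yaxis()  # first row at the top on both sides

ax.set_xticks(np.arange(0.5, len(df.columns), 1))
ax.set_xticklabels(df.columns, rotation=10)
```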
