I need to create a heatmap from a CSV file and highlight some cells. My idea was to create a mask from a pandas DataFrame, then iterate through the mask and add a patch for each matching cell.
Unfortunately, even though the mask seems to work correctly, only two patches are placed instead of the four I expect. Does anyone know why?
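from sys import argv
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Rectangle
df = pd.read_csv(argv[1])
df = df.transpose()
mask = df == 3
fig, ax = plt.subplots()
ax = sns.heatmap(df, ax=ax)
for row in range(df.shape[0]):
    for col in range(df.shape[1]):
        if mask[col][row]:
            ax.add_patch(Rectangle((row, col), 1, 1))
plt.show()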
The obtained graph:
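You need to swap the order of your indices when you create the rectangles. Rectangle expects (x, y), which on a heatmap means (column, row); passing (row, col) places the patches at transposed positions, and any that land outside the grid are not visible. Using mask.iloc[row, col] also makes the positional lookup explicit: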
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Rectangle
df = pd.DataFrame([[3,0,0],
                   [2,3,0],
                   [1,2,0],
                   [3,1,0],
                   [2,0,3],
                   [1,0,2],
                   [0,0,1],
                   [0,0,0]], columns = ["reg1","reg2","reg3"])
mask = df == 3
fig, ax = plt.subplots()
ax = sns.heatmap(df, ax=ax)
for row in range(df.shape[0]):
    for col in range(df.shape[1]):
        # iloc is positional [row, col]; Rectangle takes (x, y) = (col, row)
        if mask.iloc[row, col]:
            ax.add_patch(Rectangle((col, row), 1, 1, fill=False, edgecolor='blue', lw=3))
plt.show()
Output:
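A small reusable variant of the same fix, as a sketch: the helper name highlight_cells is hypothetical, and it uses np.argwhere to find the matching cells instead of nested loops. np.argwhere returns (row, col) pairs, so the coordinates still have to be swapped before building each Rectangle.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Rectangle


def highlight_cells(ax, mask, **rect_kwargs):
    """Outline every True cell of a boolean DataFrame on a heatmap axes.

    np.argwhere yields (row, col) pairs, while Rectangle wants (x, y),
    which on a heatmap is (col, row), hence the swap below.
    """
    for row, col in np.argwhere(mask.to_numpy()):
        ax.add_patch(Rectangle((col, row), 1, 1, fill=False, **rect_kwargs))


df = pd.DataFrame([[3, 0, 0], [2, 3, 0], [1, 2, 0], [3, 1, 0],
                   [2, 0, 3], [1, 0, 2], [0, 0, 1], [0, 0, 0]],
                  columns=["reg1", "reg2", "reg3"])

fig, ax = plt.subplots()
sns.heatmap(df, ax=ax)
highlight_cells(ax, df == 3, edgecolor="blue", lw=3)
```

With this data there are four cells equal to 3, so four outlines are drawn.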
Using the code below,
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("population.csv")
df.head()
df["MonthYear"] = df["Month"].map(str) + " " + df["Year"].map(str)
df["MonthYear"] = pd.to_datetime(df["MonthYear"], format="%b %Y")
x = df["MonthYear"]
y = df["Population"]
fig, axs = plt.subplots(nrows=9, ncols=2, figsize = (9,19))
for col, ax in zip(df.columns, axs.flatten()):
    ax.plot(x,y)
fig.tight_layout()
plt.show()
Can someone please help me figure out how to fix this? I've been working on it for days and still can't figure it out.
Below, I create a datetime column and set it as the index, then split the dataset by the distinct values of "Region", so there is one subplot per Region.
EDIT: with real dataset
EDIT: the author of the question has since removed key information from the question and deleted their comments. So, to fully understand this answer:
the dataset is from here
to remove the last (empty) subplot, add fig.delaxes(axs.flat[-1])
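import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('denguecases.csv')
# Combine the Month and Year columns into one datetime column
df['Date'] = pd.to_datetime(df['Month'] + ' ' + df['Year'].astype(str))
df.set_index('Date', inplace=True)
fig, axs = plt.subplots(nrows=9, ncols=2, figsize=(9, 19))
for region, ax in zip(df.Region.unique(), axs.flat):
    # '@region' refers to the local Python variable inside query()
    ax.plot(df.query('Region == @region').Dengue_Cases)
    ax.tick_params(axis='x', labelrotation=45)
    ax.set_title(region)
# Remove any subplots left empty because there are fewer regions than grid cells
for ax in axs.flat[df.Region.nunique():]:
    fig.delaxes(ax)
fig.tight_layout()
plt.show()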
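A more general version of the same idea, as a sketch: it sizes the grid from the number of regions and deletes whatever subplots are left over, so it does not depend on a fixed 9 x 2 layout. The DataFrame below is a small hypothetical stand-in for denguecases.csv that reuses the Region and Dengue_Cases column names, and groupby replaces the query call.

```python
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data: 12 monthly rows for each of three regions.
dates = pd.date_range("2010-01-01", periods=12, freq="MS")
regions = ["Region A", "Region B", "Region C"]
df = pd.DataFrame({
    "Date": np.tile(dates, len(regions)),
    "Region": np.repeat(regions, len(dates)),
    "Dengue_Cases": np.arange(len(dates) * len(regions)),
}).set_index("Date")

ncols = 2
groups = df.groupby("Region")
nrows = math.ceil(groups.ngroups / ncols)
# squeeze=False keeps axs 2-D even when there is only one row
fig, axs = plt.subplots(nrows=nrows, ncols=ncols, figsize=(9, 3 * nrows), squeeze=False)

for ax, (region, group) in zip(axs.flat, groups):
    ax.plot(group.index, group["Dengue_Cases"])
    ax.set_title(region)
    ax.tick_params(axis="x", labelrotation=45)

# Delete the grid cells that did not receive a region
for ax in axs.flat[groups.ngroups:]:
    fig.delaxes(ax)

fig.tight_layout()
```

With three regions the 2 x 2 grid keeps three subplots and drops the fourth.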
Try this instead:
for ax in axs.flatten():
    ax.plot(x,y)
But this will, of course, draw the same plot in every subplot. I'm not sure whether you have separate data for each subplot or expect the same data in all of them.
Update:
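Let's say you have n columns to plot and want one subplot per column:
x = df["MonthYear"]
# Skip the x column itself; plotting it against itself is not useful
column_names = [c for c in df.columns if c != "MonthYear"]
n = len(column_names)  # must be at most 9 * 2 = 18 for the grid below
fig, axs = plt.subplots(nrows=9, ncols=2, figsize=(9, 19))
for i in range(n):
    y = df[column_names[i]]
    axs.flatten()[i].plot(x, y)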
I want to plot normalized, grouped counts with seaborn. At first, I tried the following:
fig, ax = plt.subplots(figsize=(10, 6))
ax = sns.histplot(
    data=df,
    x='age_bins',
    hue='Showup',
    multiple="dodge",
    stat='count',
    shrink=0.4,
)
Original Count
Now I want to normalize each bar relative to the total count of its bin. The only way I managed to do so was this:
fig, ax = plt.subplots(figsize=(10, 6))
ax = sns.histplot(
    data=df,
    x='age_bins',
    hue='Showup',
    multiple="fill",
    stat='count',
    shrink=0.4,
)
multiple = 'fill'
This gives me the values I want, but is there any way to plot the same results with the bars dodged side by side instead of stacked on top of each other?
You can group by age bin and "Showup", count the rows, then unstack "Showup" into separate columns. Next, divide each row by its row total and create a grouped bar plot with pandas:
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import seaborn as sns
import pandas as pd
import numpy as np
ages = ['<10', '<20', '<30', '<40', '<50', '<60', '<70', '++70']
df = pd.DataFrame({'age_bins': np.random.choice(ages, 10000),
'Showup': np.random.choice([True, False], 10000, p=[0.88, 0.12])})
df_counts = df.groupby(['age_bins', 'Showup']).size().unstack().reindex(ages)
df_percentages = df_counts.div(df_counts.sum(axis=1), axis=0) * 100
sns.set() # set default seaborn style
fig, ax = plt.subplots(figsize=(10, 6))
df_percentages.plot.bar(rot=0, ax=ax)
ax.set_xlabel('')
ax.set_ylabel('Percentage per age group')
ax.yaxis.set_major_formatter(PercentFormatter(100))
plt.tight_layout()
plt.show()
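pd.crosstab can compute the same per-bin percentages in one step: with normalize='index' it divides each row (here, each age bin) by that row's total. A sketch with randomly generated sample data, seeded only so the example is repeatable:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

ages = ['<10', '<20', '<30', '<40', '<50', '<60', '<70', '++70']
rng = np.random.default_rng(0)
df = pd.DataFrame({'age_bins': rng.choice(ages, 10000),
                   'Showup': rng.choice([True, False], 10000, p=[0.88, 0.12])})

# One row per age bin, one column per Showup value, each row summing to 1;
# reindex restores the natural age order, * 100 converts to percentages.
df_percentages = pd.crosstab(df['age_bins'], df['Showup'],
                             normalize='index').reindex(ages) * 100

fig, ax = plt.subplots(figsize=(10, 6))
df_percentages.plot.bar(rot=0, ax=ax)  # one dodged bar per Showup value
ax.set_xlabel('')
ax.set_ylabel('Percentage per age group')
ax.yaxis.set_major_formatter(PercentFormatter(100))
fig.tight_layout()
```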
I have created a correlation matrix of a pandas dataframe using seaborn with the following commands:
corrMatrix = df.corr()
#sns.heatmap(corrMatrix, annot=True)
#plt.show()
ax = sns.heatmap(
    corrMatrix,
    vmin=-1, vmax=1, center=0,
    cmap=sns.diverging_palette(20, 220, n=200),
    square=True, annot=True
)
ax.set_xticklabels(
    ax.get_xticklabels(),
    rotation=45,
    horizontalalignment='right'
);
I get the following matrix plot:
How can I mask the correlation matrix so that it only shows its first column? I would also like the legend (colorbar) to appear along the right side.
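Select the column with double brackets so the result stays a one-column DataFrame, then pass it to sns.heatmap. In your case, x is corrMatrix[['# of Prophages']]. With dummy data:
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame({'A': np.random.rand(8), 'B': np.random.rand(8)})
corr = df.corr()
x = corr[['A']]  # double brackets keep a DataFrame rather than a Series
sns.heatmap(x)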
corr:
A B
A 1.000000 -0.192375
B -0.192375 1.000000
sns.heatmap(corr):
sns.heatmap(x):
This might help you:
Credit goes to unutbu
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
flights_long = sns.load_dataset("flights")
# pandas >= 2.0 requires keyword arguments for DataFrame.pivot
flights = flights_long.pivot(index="month", columns="year", values="passengers")
flights = flights.reindex(flights_long.iloc[:12].month)
sns.heatmap(flights)
Result:
columns = [1953]
myflights = flights.copy()
mask = myflights.columns.isin(columns)
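# Zero the columns to keep: sns.heatmap hides every cell whose mask value is truthy
myflights.loc[:, mask] = 0
arr = flights.values
vmin, vmax = arr.min(), arr.max()  # keep the full data range for the color scale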
sns.heatmap(flights, mask=myflights, annot=True, fmt="d", vmin=vmin, vmax=vmax)
Output:
columns = [1953]
myflights = flights.copy()
mask = myflights.columns.isin(columns)
# Alternative: plot only the selected columns, keeping the full-table vmin/vmax
myflights = myflights.loc[:, mask]
arr = flights.values
vmin, vmax = arr.min(), arr.max()
sns.heatmap(myflights, annot=True, fmt="d", vmin=vmin, vmax=vmax)
Output:
This worked for me on dummy data:
df = pd.DataFrame(corrMatrix['# of Prophages'],
index=corrMatrix.index)
sns.heatmap(df, annot=True, fmt="g", cmap='viridis')
plt.show()
This was adapted from this answer: Seaborn Heatmap with single column
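If you would rather keep the full grid and just blank out everything except the first column, sns.heatmap's mask parameter can do that directly: cells where the mask is True are not drawn, and the colorbar sits on the right by default. A sketch with random dummy data standing in for your DataFrame:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20, 4)), columns=["A", "B", "C", "D"])
corr = df.corr()

# True = hidden, so start with everything hidden and reveal column 0 only
mask = np.ones(corr.shape, dtype=bool)
mask[:, 0] = False

fig, ax = plt.subplots()
sns.heatmap(corr, mask=mask, vmin=-1, vmax=1, center=0, annot=True, ax=ax)
```

Unlike slicing, this keeps every row and column label in place, which may or may not be what you want.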
Much searching has not turned up a working solution to a Python matplotlib problem. I'm sure I'm missing something simple...
MWE:
import pandas as pd
import matplotlib.pyplot as plt
#MWE plot
T = [1, 2, 3, 4, 5, 6]
n = len(T)
d1 = list(zip([500]*n, [250]*n))
d2 = list(zip([250]*n, [125]*n))
df1 = pd.DataFrame(data=d1, index=T)
df2 = pd.DataFrame(data=d2, index=T)
fig = plt.figure()
ax = fig.add_subplot(111)
df1.plot(kind='bar', stacked=True, align='edge', width=-0.4, ax=ax)
df2.plot(kind='bar', stacked=True, align='edge', width=0.4, ax=ax)
plt.show()
Generates:
Shifted Plot
No matter what parameters I play around with, that first bar is cut off on the left. If I only plot a single bar (i.e. not clusters of bars), the bars are not cut off and in fact there is nice even white space on both sides.
I hard-coded the data for this MWE; however, I am trying to find a generic way to ensure the correct alignment since I will likely produce a LOT of these plots with varying numbers of items on the x axis and potentially a varying number of bars in each cluster.
How do I shift the bars so that they are spaced correctly on the x axis, with even white space on both sides?
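It all depends on the widths you pass to your plots. Each DataFrame.plot call appears to set the x limits from its own bars only, so after the second call (width=0.4, bars to the right of each tick) there is not enough room on the left for the first call's bars (width=-0.4). Set the x limits explicitly with xlim: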
import pandas as pd
import matplotlib.pyplot as plt
#MWE plot
T = [1, 2, 3, 4, 5, 6]
n = len(T)
d1 = list(zip([500]*n, [250]*n))
d2 = list(zip([250]*n, [125]*n))
df1 = pd.DataFrame(data=d1, index=T)
df2 = pd.DataFrame(data=d2, index=T)
fig = plt.figure()
ax = fig.add_subplot(111)
df1.plot(kind='bar', stacked=True, align='edge', width=-0.4, ax=ax)
df2.plot(kind='bar', stacked=True, align='edge', width=0.4, ax=ax)
plt.xlim(-.4,5.4)
plt.show()
Hope it works!
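To avoid hard-coding the limits for plots with a varying number of clusters, you can derive them from the bar widths. This is a sketch assuming pandas places the clusters at x = 0, 1, ..., n - 1 (its positions for bar plots) and that the left and right bars use align='edge' as in the MWE; left_width, right_width and pad are names introduced here, and pad is an arbitrary margin.

```python
import pandas as pd
import matplotlib.pyplot as plt

T = [1, 2, 3, 4, 5, 6]
n = len(T)
df1 = pd.DataFrame(list(zip([500] * n, [250] * n)), index=T)
df2 = pd.DataFrame(list(zip([250] * n, [125] * n)), index=T)

left_width, right_width = 0.4, 0.4  # widths of the left and right stacked bars
pad = 0.1  # extra white space at each end (arbitrary choice)

fig, ax = plt.subplots()
df1.plot(kind='bar', stacked=True, align='edge', width=-left_width, ax=ax)
df2.plot(kind='bar', stacked=True, align='edge', width=right_width, ax=ax)

# Clusters sit at x = 0 .. n - 1, so the outermost bar edges are at
# -left_width and (n - 1) + right_width; add the same padding on both sides.
ax.set_xlim(-left_width - pad, (n - 1) + right_width + pad)
```

Because the limits depend only on n and the widths, the same line works for any number of clusters.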
I'm using a for loop to scatter more than one DataFrame onto the same axes with DataFrame.plot.scatter, but every pass through the loop draws another colorbar.
How can I get just one colorbar at the end of the loop?
This is my code:
if colormap is None: colormap='jet'
f,ax = plt.subplots()
for i, data in enumerate(wells):
    data.plot.scatter(x,y, c=z, colormap=colormap, ax=ax)
ax.set_xlabel(x); ax.set_xlim(xlim)
ax.set_ylabel(y); ax.set_ylim(ylim)
ax.legend()
ax.grid()
ax.set_title(title)
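This can be achieved by creating the figure and axes once and drawing every DataFrame onto that same axes. Here each DataFrame gets a fixed color instead of a column-mapped c, so pandas does not add a colorbar at all: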
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# created two dataframes with random values
df1 = pd.DataFrame(np.random.rand(25, 2), columns=['a', 'b'])
df2 = pd.DataFrame(np.random.rand(25, 2), columns=['a', 'b'])
And then:
fig = plt.figure()
ax = fig.add_subplot(111)  # create the axes once, outside the loop
for i, data in enumerate([df1, df2]):
    data.plot.scatter(x='a', y='b', ax=ax,
                      c='#00FF00' if i == 0 else '#FF0000')
plt.show()
You can add the labels and other elements as required.
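If you do want the points colored by a column with a single colorbar, a sketch: pass colorbar=False to each DataFrame.plot.scatter call, give every call the same vmin and vmax so all points share one color scale, and add one colorbar after the loop with fig.colorbar. The wells list here is hypothetical random data standing in for yours.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the question's `wells` list of DataFrames
wells = [pd.DataFrame(rng.random((25, 3)), columns=["x", "y", "z"]) for _ in range(3)]

# Shared color limits, so one colorbar is valid for every well
vmin = min(w["z"].min() for w in wells)
vmax = max(w["z"].max() for w in wells)

fig, ax = plt.subplots()
for data in wells:
    # colorbar=False stops pandas from adding a colorbar on each iteration
    data.plot.scatter(x="x", y="y", c="z", colormap="jet",
                      vmin=vmin, vmax=vmax, colorbar=False, ax=ax)

# All scatters share the same colormap and limits, so any one can drive the colorbar
fig.colorbar(ax.collections[0], ax=ax, label="z")
```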