I have data with values of A, B, C and D for different dates. I want to make a strip plot of these points such that data points from recent dates are shaded darker (or have a higher alpha value) than data points from earlier dates.
This is what I have right now. All I need is to shade the points by date within each bucket, but I can't figure out how.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
plt.style.use("ggplot")
data = pd.DataFrame({"Date":pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON"),
"A":[np.random.randint(-5, 50) for _ in range(len(pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")))],
"B":[np.random.randint(-5, 50) for _ in range(len(pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")))],
"C":[np.random.randint(-10, 50) for _ in range(len(pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")))],
"D":[np.random.randint(9, 50) for _ in range(len(pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")))]})
data.set_index("Date", inplace=True)
data.head()
sns.catplot(data=data, aspect=15/6, height=6)
This is the result of the above code
A scatter plot with randomized x-displacements can be used to apply one colormap per column.
To illustrate the effect, the example below uses random data with the most recent values being the largest.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
plt.style.use("ggplot")
dates = pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")
N = len(dates)
data = pd.DataFrame({"Date": dates,
"A": 30 + np.random.uniform(-5, 8, N).cumsum(),
"B": 20 + np.random.uniform(-4, 9, N).cumsum(),
"C": 25 + np.random.uniform(-4, 7, N).cumsum(),
"D": 40 + np.random.uniform(-2, 8, N).cumsum()})
data.set_index("Date", inplace=True)
columns = data.columns
for col_id, (column, cmap) in enumerate(zip(columns, ['Reds', 'Blues', 'Greens', 'Purples'])):
    plt.scatter(col_id + np.random.uniform(-0.2, 0.2, N), data[column], c=range(N), cmap=cmap)
plt.xticks(range(len(columns)), columns)
plt.show()
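Since the question also mentions alpha, here is a minimal sketch of that variant: one base colour per column, with the opacity of each point increasing with its date. The colour names "C0" to "C3", the alpha range 0.15 to 1.0 and the seeded random data are assumptions chosen for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

dates = pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")
N = len(dates)
rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
data = pd.DataFrame({name: rng.integers(-5, 50, N) for name in "ABCD"}, index=dates)

# Opacity grows linearly from 0.15 (oldest date) to 1.0 (most recent date).
alphas = np.linspace(0.15, 1.0, N)

fig, ax = plt.subplots()
for col_id, (column, base) in enumerate(zip(data.columns, ["C0", "C1", "C2", "C3"])):
    colors = np.tile(to_rgba(base), (N, 1))  # N rows of (r, g, b, a)
    colors[:, 3] = alphas                    # overwrite the alpha channel per point
    x = col_id + rng.uniform(-0.2, 0.2, N)   # random horizontal jitter
    ax.scatter(x, data[column], color=colors)
ax.set_xticks(range(len(data.columns)))
ax.set_xticklabels(data.columns)
```

Call plt.show() afterwards to display the figure. Passing a full RGBA array avoids relying on array-valued alpha support in scatter, which is only available in newer matplotlib releases.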
I am drawing boxplots with Python Seaborn package. I have facet grid with both rows and columns. That much I've been able to do with the Seaborn function catplot.
I also want to annotate the outliers. I have found some nice examples at SO for annotating the outliers but without facet structure. That's where I'm struggling.
Here is what I've got (borrows heavily from this post: Boxplot : Outliers Labels Python):
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats
sns.set_style('darkgrid')
Month = np.repeat(np.arange(1, 11), 10)
Id = np.arange(1, 101)
Value = np.random.randn(100)
Row = ["up", "down"]*50
df = pd.DataFrame({'Value': Value, 'Month': Month, 'Id': Id, 'Row': Row})
g = sns.catplot(data=df, x="Month", y="Value", row="Row", kind="box", height=3, aspect=3)
for name, group in df.groupby(["Month", "Row"]):
    fliers = [y for stat in boxplot_stats(group["Value"]) for y in stat["fliers"]]
    d = group[group["Value"].isin(fliers)]
    g.axes.flatten().annotate(d["Id"], xy=(d["Month"] - 1, d["Value"]))
The dataframe d collects all the outliers for each patch. The last line is meant to match d with the patches of the graph g. That doesn't work, and I haven't found a way to flatten the axes into a list where each element corresponds to one grouped dataframe.
I'd be glad to hear alternative versions for achieving this too.
One way to do it:
for name, group in df.groupby(["Month", "Row"]):
    fliers = [y for stat in boxplot_stats(group["Value"]) for y in stat["fliers"]]
    d = group[group["Value"].isin(fliers)]
    for i in range(len(d)):
        ngrid = (0 if d.iloc[i,3]=='up' else 1)
        g.fig.axes[ngrid].annotate(d.iloc[i, 2], xy=(d.iloc[i, 1] - 1, d.iloc[i, 0]))
You can loop through g.axes_dict to visit each of the axes.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats
sns.set_style('darkgrid')
Month = np.repeat(np.arange(1, 11), 10)
Id = np.arange(1, 101)
Value = np.random.randn(100)
Row = ["up", "down"] * 50
df = pd.DataFrame({'Value': Value, 'Month': Month, 'Id': Id, 'Row': Row})
g = sns.catplot(data=df, x="Month", y="Value", row="Row", kind="box", height=3, aspect=3)
for row, ax in g.axes_dict.items():
    for month in np.unique(df["Month"]):
        group = df.loc[(df["Row"] == row) & (df["Month"] == month), :]
        fliers = boxplot_stats(group["Value"])[0]["fliers"]
        if len(fliers) > 0:
            for mon, val, id in zip(group["Month"], group["Value"], group["Id"]):
                if val in fliers:
                    ax.annotate(f' {id}', xy=(mon - 1, val))
plt.tight_layout()
plt.show()
I have a sample dataframe like the following:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(
    0, 10, size=(1000, 11)), columns=list('ABCDEFGHIJK'))
The desired but unpolished output looks like this:
The data of each column in the dataframe is plotted as a subplot with five rows of bars.
I prefer to use matplotlib because it makes it relatively easy to get good-looking graphs, but its performance seems pretty slow.
You can use the bottom parameter of bar to offset the individual rows.
The following unoptimized example demonstrates this approach:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame(np.random.randint(0, 10, size=(1000, 11)), columns=list('ABCDEFGHIJK'))
fig = plt.figure()
for i,c in enumerate(df.columns):
    ax = fig.add_subplot(3, 4, i+1)
    for x,h,b in zip((df.index.to_numpy() % 200).reshape(-1, 200), df[c].to_numpy().reshape(-1, 200), (df.index.to_numpy() // 200 * 10).reshape(-1, 200)):
        ax.set_title(c)
        ax.bar(x, h, bottom=b, color='k' )
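Because bar accepts arrays for x, height and bottom, the inner loop over the five rows can be folded into a single call per subplot. The sketch below is one possible simplification under that assumption; it has not been benchmarked, and since bar still creates one rectangle per bar, a speed-up is not guaranteed.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randint(0, 10, size=(1000, 11)), columns=list("ABCDEFGHIJK"))

idx = df.index.to_numpy()
x = idx % 200             # horizontal position within a row of 200 bars
bottom = idx // 200 * 10  # baseline of each of the five rows (0, 10, 20, 30, 40)

fig = plt.figure()
for i, c in enumerate(df.columns):
    ax = fig.add_subplot(3, 4, i + 1)
    ax.set_title(c)
    # One call draws all 1000 bars of this column, offset row by row via `bottom`.
    ax.bar(x, df[c].to_numpy(), bottom=bottom, color="k")
```

Call plt.show() afterwards to display the figure.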
My code-
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.figure()
cols = ['hops','frequency']
data = [[-13,1],[-8,1],[-5,1],[0,2],[2,1],[4,1],[7,1]]
data = np.asarray(data)
indices = np.arange(0,len(data))
plot_data = pd.DataFrame(data, index=indices, columns=cols)
plt.bar(plot_data['hops'].tolist(),plot_data['frequency'].tolist(),width=0.8)
plt.xlim([-20,20])
plt.ylim([0,20])
plt.ylabel('Frequency')
plt.xlabel('Hops')
Output-
My requirements-
I want the graph to have an x-axis scale of [-20, 20] and a y-axis scale of [0, 18], and the bars should be labelled: in this case the first bar should be numbered 1, and so on.
From your comment above, I am assuming this is what you want. You just need to specify the positions at which you want the x-tick labels.
xtcks = [-20, 20]
plt.xticks(np.insert(xtcks, 1, data[:, 0]))
plt.yticks([0, 18])
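If the bars themselves should carry the numbers 1, 2, 3 and so on, one option is Axes.bar_label, which is available in matplotlib 3.4 and later. The following is a sketch under that reading of the requirement; the label strings and the use of the object-oriented interface are choices made here, not part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.asarray([[-13, 1], [-8, 1], [-5, 1], [0, 2], [2, 1], [4, 1], [7, 1]])

fig, ax = plt.subplots()
bars = ax.bar(data[:, 0], data[:, 1], width=0.8)
ax.set_xlim(-20, 20)
ax.set_ylim(0, 18)
ax.set_xlabel("Hops")
ax.set_ylabel("Frequency")

# Number the bars from left to right, starting at 1, and place each number above its bar.
labels = [str(i) for i in range(1, len(data) + 1)]
texts = ax.bar_label(bars, labels=labels)
```

Call plt.show() afterwards to display the figure.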
Plotting a histogram on a seaborn PairGrid with hue leads to stacking by default. Is there a way to avoid this? (stacked=False is ineffective.)
I tried seaborn.distplot with kde=False, but the bars are too wide in my case, and decreasing rwidth shifts the bars away from the corresponding variable values (which does not happen with plt.hist).
EDIT: code to illustrate the so-called 'shifting away from the corresponding variable values' (plt.hist actually does it too, but it is less obvious).
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
frames = []
for n in ['a', 'b']:
    tmp = pd.DataFrame({'name': [n] * 100,
                        'prior': [1, 10] * 50,
                        'post': [1, 10] * 50})
    frames.append(tmp)
# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
df = pd.concat(frames)
g = sns.PairGrid(df, hue='name', diag_sharey=False)
g.map_offdiag(sns.regplot, fit_reg=False, x_jitter=.1)
g.map_diag(plt.hist, rwidth=0.2, stacked=False)
g = sns.PairGrid(df, hue='name', diag_sharey=False)
g.map_offdiag(sns.regplot, fit_reg=False, x_jitter=.1)
g.map_diag(sns.distplot, kde=False, hist_kws={'rwidth':0.2})
I need to create a Matplotlib heatmap (pcolormesh) using a pandas DataFrame time-series column (df_all.ts) as my x-axis.
How can I convert a pandas time-series column to something that can be used as the x-axis in np.meshgrid(x, y) to create the heatmap? My workaround is to create a Matplotlib drange with the same parameters as the pandas column, but is there a simpler way?
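import datetime as dt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
# df_all and ylen come from the asker's own data and are not defined here
x = pd.date_range(df_all.ts.min(),df_all.ts.max(),freq='H')
xt = mdates.drange(df_all.ts.min(), df_all.ts.max(), dt.timedelta(hours=1))
y = np.arange(ylen)
X,Y = np.meshgrid(xt, y)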
I do not know what you mean by a heat map for a time series, but for a dataframe you can do the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from itertools import product
from string import ascii_uppercase
from matplotlib import patheffects
m, n = 4, 7 # 4 rows, 7 columns
df = pd.DataFrame(np.random.randn(m, n),
columns=list(ascii_uppercase[:n]),
index=list(ascii_uppercase[-m:]))
ax = plt.imshow(df, interpolation='nearest', cmap='Oranges').axes
_ = ax.set_xticks(np.linspace(0, n-1, n))
_ = ax.set_xticklabels(df.columns)
_ = ax.set_yticks(np.linspace(0, m-1, m))
_ = ax.set_yticklabels(df.index)
ax.grid(False)  # the string 'off' is truthy in current matplotlib and would switch the grid on
ax.xaxis.tick_top()
Optionally, to print the actual values in the middle of each square, with some shadow for readability, you may do:
path_effects = [patheffects.withSimplePatchShadow(shadow_rgbFace=(1,1,1))]
for i, j in product(range(m), range(n)):
    _ = ax.text(j, i, '{0:.2f}'.format(df.iloc[i, j]),
                size='medium', ha='center', va='center',
                path_effects=path_effects)
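For the time-series x-axis asked about in the question, one possible route is matplotlib.dates.date2num, which converts pandas timestamps directly to Matplotlib's float date numbers, so no separate drange is needed. The sketch below assumes an hourly series; ts, ylen and Z are hypothetical stand-ins for df_all.ts and the asker's data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical stand-ins for df_all.ts and the heatmap values.
ts = pd.date_range("2020-01-01", periods=48, freq=pd.Timedelta(hours=1))
ylen = 5
Z = np.random.rand(ylen, len(ts))

xt = mdates.date2num(ts)  # float days, usable like any numeric axis
y = np.arange(ylen)
X, Y = np.meshgrid(xt, y)

fig, ax = plt.subplots()
# shading="nearest" centres each cell on its (x, y) point; requires matplotlib 3.3+.
mesh = ax.pcolormesh(X, Y, Z, shading="nearest")
ax.xaxis_date()      # render the float x values as dates
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Call plt.show() afterwards to display the figure.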