I have 15 features in my data set, all of which are time series.
I want to plot them in a pairplot and colour the points according to a sequential colormap.
Early data points would then be a lighter blue than later ones.
One of the columns in my dataframe is called Index, and I tried passing hue='Index' to the plotting function, without any luck.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="ticks", color_codes=True,palette='Blues_d')
#norm = plt.Normalize(df.Index.min(), df.Index.max())
#sm = plt.cm.ScalarMappable(cmap="Reds", norm=norm)
#sm.set_array([])
ax= sns.pairplot(df,vars=['AvgPower','energy_mean',
'ActPower','WindSpeed','NacelleDirection','AvgSpeed','rms','kurt','skewness','signal_mean','Power spectral entropy','B1','B2','B3','B4','B5'],
hue='Index') # I do not include 'Index' in the vars, so it isn't plotted.
ax.get_legend().remove()
ax.figure.colorbar(sm)
plt.show()
How can I get this to work?
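One possible approach, as a rough sketch rather than a definitive answer: build the grid with sns.PairGrid instead of sns.pairplot and colour the off-diagonal scatter plots yourself. The data below is synthetic, only three of the column names are borrowed from the question, and scatter_by_index is a hypothetical helper, not part of seaborn.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the real data (assumption): three random-walk features
rng = np.random.default_rng(0)
cols = ["AvgPower", "WindSpeed", "rms"]
df = pd.DataFrame(rng.normal(size=(100, 3)).cumsum(axis=0), columns=cols)
df["Index"] = np.arange(len(df))

cmap = "Blues"
norm = plt.Normalize(df["Index"].min(), df["Index"].max())

def scatter_by_index(x, y, **kwargs):
    # Hypothetical helper: ignore the colour/label seaborn passes in and
    # colour every point by its 'Index' value instead.
    plt.scatter(x, y, c=df.loc[x.index, "Index"], cmap=cmap, norm=norm, s=10)

g = sns.PairGrid(df, vars=cols)           # 'Index' is not in vars, so it is not plotted
g.map_offdiag(scatter_by_index)
g.map_diag(plt.hist, color="steelblue")

# One shared colorbar for the whole grid
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array([])
g.figure.colorbar(sm, ax=g.axes, label="Index")
```

Call plt.show() afterwards to display the figure. Because there is no hue variable, there is no legend to remove.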
Related
I'm trying to visualize what the filters are learning in a CNN text classification model. To do this, I extracted the feature maps of text samples right after the convolutional layer, and for the size-3 filters I got a tensor of size (filter_num)*(length_of_sentences).
df = pd.DataFrame(-np.random.randn(50,50), index = range(50), columns= range(50))
g= sns.clustermap(df,row_cluster=True,col_cluster=False)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()
The resulting plot does not show all the ticks on the y-axis. I need them all because I need to see which filters learn which information. Is there any way to show every tick on the y-axis?
kwargs from sns.clustermap get passed on to sns.heatmap, which has an option yticklabels, whose documentation states (emphasis mine):
If True, plot the column names of the dataframe. If False, don’t plot the column names. If list-like, plot these alternate labels as the xticklabels. If an integer, use the column names but plot only every n label. If “auto”, try to densely plot non-overlapping labels.
Here, the easiest option is to set it to an integer n, so that every n-th label is plotted. We want every label, so we set it to 1, i.e.:
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
In your complete example:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame(-np.random.randn(50,50), index=range(50), columns=range(50))
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()
I have a dataset of values arriving at timestamped 5-minute intervals that I'm visualising grouped by hour of day.
I want to turn this into a whisker/box plot for the added information. However, the implementations of matplotlib, seaborn and pandas of this plot all want an array of raw data to compute the plot's contents themselves.
Is there a way to create whisker plots from pre-computed/grouped mean, median, std and quartiles? I would like to avoid reinventing the wheel with a comparatively inefficient grouping algorithm to build per-day datasets just for this.
This is some code to produce toy data and a version of the current plot.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# some toy data in a 15-day range
data = [1.5+np.sin(x)*5 for x in np.arange(0, 403.3, .1)]
s = pd.Series(data=data, index=pd.date_range('2019-01-01', '2019-01-15', freq='5min'))
s.groupby(s.index.hour).mean().plot(kind='bar')
plt.show()
Adding to @Quang Hoang's solution: you can use hlines() to display the median as well (ax and data come from that solution, wd is the bar width):
wd = 0.8  # bar width; 0.8 is matplotlib's default
ax.bar(data.index, data['q75'] - data['q25'], bottom=data['q25'], width=wd)
ax.hlines(y=data['median'], xmin=data.index-wd/2, xmax=data.index+wd/2, color='black', linewidth=1)
I don't think there is anything for that. But you can create a whisker plot fairly simply with two plot commands:
# precomputed data:
data = (s.groupby(s.index.hour)
         .agg(['mean','std','median',
               lambda x: x.quantile(.25),
               lambda x: x.quantile(.75)])
       )
data.columns = ['mean','std','median','q25','q75']
# plot the whiskers with `errorbar` from `mean` and `std`
fig, ax = plt.subplots(figsize=(12,6))
ax.errorbar(data.index,data['mean'],
            yerr=data['std']*1.96,
            linestyle='none',
            capsize=5
           )
# plot the boxes with `bar` at bottoms from quantiles
ax.bar(data.index, data['q75']-data['q25'], bottom=data['q25'])
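For what it's worth, matplotlib also has Axes.bxp, which draws box plots from precomputed statistics instead of raw data. A minimal sketch, assuming toy data similar to the question's; using each hour's min and max as the whisker ends is my assumption, so swap in whatever bounds you have precomputed.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy data in a 15-day range, similar in spirit to the question's
idx = pd.date_range("2019-01-01", "2019-01-15", freq="5min")
s = pd.Series(1.5 + 5 * np.sin(0.1 * np.arange(len(idx))), index=idx)

# Precompute the per-hour statistics once
grouped = s.groupby(s.index.hour)
stats = grouped.agg(["min", "max", "median", "mean"])
stats["q25"] = grouped.quantile(0.25)
stats["q75"] = grouped.quantile(0.75)

# bxp expects one dict per box with these keys
bxpstats = [
    {
        "label": str(hour),
        "med": row["median"],
        "q1": row["q25"],
        "q3": row["q75"],
        "whislo": row["min"],   # assumption: whiskers span min..max
        "whishi": row["max"],
        "mean": row["mean"],
        "fliers": [],           # no outliers were precomputed
    }
    for hour, row in stats.iterrows()
]

fig, ax = plt.subplots(figsize=(12, 6))
artists = ax.bxp(bxpstats, positions=list(stats.index),
                 showmeans=True, showfliers=False)
ax.set(xlabel="hour of day")
```

Call plt.show() to display it. The returned dict (artists here) holds the box, whisker, median and mean artists if you want to restyle them.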
The code below filters a dataframe by a string in one column and then plots the values of another column.
I plotted the values as a histogram, and that worked fine until I added the mean, median and standard deviation. Now I just get an empty graph, whereas all of the variables below should be plotted together in one graph with their labels.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv(r'C:/Users/output.csv', delimiter=";", encoding='unicode_escape')
df['Plot_column'] = df['Plot_column'].str.split(',').str[0]
df['Plot_column'] = df['Plot_column'].astype('int64', copy=False)
X=df[df['goal_colum']=='start running']['Plot_column'].values
dev_x= X
mean_=np.mean(dev_x)
median_=np.median(dev_x)
standard_=np.std(dev_x)
plt.hist(dev_x, bins=5)
plt.plot(mean_, label='Mean')
plt.plot(median_, label='Median')
plt.plot(standard_, label='Std Deviation')
plt.title('Data')
https://matplotlib.org/3.1.1/gallery/statistics/histogram_features.html
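There are two main ways to plot in matplotlib: the implicit pyplot interface (the easy way) and the explicit Axes interface (the hard way). The Axes interface lets you customize your plot more, so it is worth moving towards it. Note that calling plot with a single number draws one point at x=0 with no marker, so nothing visible appears; to mark a statistic on the histogram, draw a vertical line with axvline instead. Try something like the following:
num_bins = 50
fig, ax = plt.subplots()
# the histogram of the data
n, bins, patches = ax.hist(dev_x, num_bins, density=True)
# mark the mean and median with labelled vertical lines
ax.axvline(np.mean(dev_x), color='red', label='Mean')
ax.axvline(np.median(dev_x), color='green', label='Median')
# shade one standard deviation either side of the mean
ax.axvspan(np.mean(dev_x) - np.std(dev_x), np.mean(dev_x) + np.std(dev_x), alpha=0.2, label='Std Deviation')
ax.legend()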
# Tweak spacing to prevent clipping of ylabel
fig.tight_layout()
plt.show()
I need to create a number of heatmaps in seaborn with varying data scales. Some range from 0 to 100 and some from -100 to +100. I need the colour grading to stay the same across all graphs. For example, I want anything below 0 to go steadily from dark blue to light blue, and anything above 0 to get progressively darker red.
What I need is a fluid colour transition. I am not sure how seaborn currently works out the colours, as I have just listed a number of colours. Code below:
sns.heatmap(df.T, cmap=ListedColormap(['#000066','#000099','#0000cc','#1a1aff','#6666ff','#b3b3ff','#ffff00','#ffcccc','#ff9999','#ff6666','#ff3333','#ff0000']), annot=False)
Thanks for any advice.
To specify the color normalization, you can use a Normalize instance, plt.Normalize(vmin, vmax) and supply it to the heatmap using the norm keyword (which is routed to the underlying pcolormesh).
To obtain a colormap with gradually changing colors, you may use the static LinearSegmentedColormap.from_list method and supply it with a list of colors.
import numpy as np; np.random.seed(0)
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
x1 = np.random.randint(0,100,size=(12,8))
x2 = np.random.randint(-100,100,size=(12,8))
fig, axes = plt.subplots(ncols=2)
cmap = mcolors.LinearSegmentedColormap.from_list("n",['#000066','#000099','#0000cc','#1a1aff','#6666ff','#b3b3ff',
'#ffff00','#ffcccc','#ff9999','#ff6666','#ff3333','#ff0000'])
norm = plt.Normalize(-100,100)
sns.heatmap(x1, ax=axes[0], cmap=cmap, norm=norm)
sns.heatmap(x2, ax=axes[1], cmap=cmap, norm=norm)
plt.show()
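To see what the shared norm does, you can check where a few values land on the colormap without drawing anything. A small sketch reusing the colour list from above; the variable names are only for illustration.

```python
import matplotlib.colors as mcolors

colours = ['#000066', '#000099', '#0000cc', '#1a1aff', '#6666ff', '#b3b3ff',
           '#ffff00', '#ffcccc', '#ff9999', '#ff6666', '#ff3333', '#ff0000']
cmap = mcolors.LinearSegmentedColormap.from_list("n", colours)
norm = mcolors.Normalize(vmin=-100, vmax=100)

# The norm maps data values linearly onto [0, 1]
low, mid, high = float(norm(-100)), float(norm(0)), float(norm(100))
print(low, mid, high)

# A value of 50 gets the same RGBA colour in every heatmap that uses this
# cmap/norm pair, whatever range the rest of that heatmap's data covers
colour_at_50 = cmap(norm(50))
print(colour_at_50)
```

With this norm, 0 sits at the midpoint of the colour list, so a dataset that only spans 0 to 100 uses just the upper (yellow to red) half of the colormap.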
Hi all, I am trying to reproduce the following type of plot in seaborn with a different data set. The problem is that with a histogram I cannot name the bins (like 2-2.5, 2.5-3, etc.), even though it provides kernel curves. Bar plots don't have a function to draw the normal curve shown in the picture. The image seems to have been made with the SPSS statistical package, which I know little about.
The following is the closest I can get (code attached):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'cat': ['1-1.5', '1.5-2', '2-2.5','2.5-3','3-3.5','3.5-4','4-4.5','4.5-5'],'val': [0,0,1,7,7,33,17,10]})
ax = sns.barplot(y = 'val', x = 'cat',
data = df)
ax.set(xlabel='Categories', ylabel='Frequency')
plt.show()
So the problem is of course that you don't have the original data, but data that has already been binned. One could approximately reverse this binning by placing each count at the centre of its bin, which gives an array of raw data to start from. Then perform the histogramming again and use sns.distplot, which by default shows a KDE plot as well.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
cat = ['1-1.5', '1.5-2', '2-2.5','2.5-3','3-3.5','3.5-4','4-4.5','4.5-5']
val = [0,0,1,7,7,33,17,10]
data = []
for i in range(len(cat)):
    data.extend([1.25+i*0.5]*val[i])
bins = np.arange(1,5.5, 0.5)
ax = sns.distplot(data, bins=bins, hist_kws= dict(edgecolor="k"))
ax.set(xlabel='Categories', ylabel='Frequency')
ax.set_xticks(bins[:-1]+0.25)
ax.set_xticklabels(cat)
plt.show()
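As a quick sanity check on the "reverse the binning" step, you can confirm that histogramming the reconstructed data returns the original counts. A small numpy-only sketch (np.repeat is just a shorter way to write the loop above):

```python
import numpy as np

val = [0, 0, 1, 7, 7, 33, 17, 10]
bins = np.arange(1, 5.5, 0.5)            # bin edges 1.0, 1.5, ..., 5.0
centers = (bins[:-1] + bins[1:]) / 2     # 1.25, 1.75, ..., 4.75

# Put each bin's count at the bin centre
data = np.repeat(centers, val)

# Histogram the reconstructed data with the same edges
counts, _ = np.histogram(data, bins=bins)
print(counts.tolist())  # → [0, 0, 1, 7, 7, 33, 17, 10]
```

The counts survive the round trip, but the spread of values inside each bin is lost, so the KDE is only an approximation of what the raw data would give.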
Use the bw keyword argument of the KDE function to set the smoothness of the curve, e.g. sns.distplot(data, bins=bins, kde_kws=dict(bw=0.5), hist_kws=dict(edgecolor="k")).
Also try bw=0.1, bw=0.25, bw=0.35 and bw=2 to see the differences.