I am using scipy.signal library to find the peaks of a time graph. I inputted the y values of my pandas series. And it gave me the location of the the peaks. Now i am trying to use the locations from the find_peaks function to return the position in time of the peaks. Here is my function:
def turn_peaks_to_time_series(df,t_interval):
df_values = df['l'].values
fig, ax1 = plt.subplots()
x_of_peaks, _ = find_peaks(df_values, height=None)
y_of_peaks = df_values[x_of_peaks]
x_values_to_t_values = lambda x : timedelta(minutes=x) * t_interval
time_initial = np.min(df.index)
t_of_peaks = [ time_initial + x_values_to_t_values(int(i)) for i in x_of_peaks ] #source of issue
ax1.plot(t_of_peaks, y_of_peaks, "rp",label='peak') #plot peaks on graph
ax1.plot(df.index,df.l) # plot df line
plt.show()
However, peaks are not properly aligning
I know the issue is with my x_values_to_t_values function. In addition, any suggesting to optimize my code are very welcomed.
Turns out i was trying to reinvent the wheel. The solution to my problem was extremely simple. Also I adjusted the code to be more general.
def turn_peaks_to_time_series(series):
series_values = series.values
series_index = series.index
fig, ax1 = plt.subplots()
x_of_peaks, _ = find_peaks(series_values, height=None)
y_of_peaks = series_values[x_of_peaks]
ax1.plot(series_index[x_of_peaks], y_of_peaks, "rp",label='peak') #plot peaks on graph
ax1.plot(series_index,series_values) # plot df line
plt.show()
Related
I really like to the look of Seaborn's KDE plot:
I was wondering how can I replicate this for line plot.
In my case I actually have the function to generate the density instead of samples of the data.
So assuming I have the data in a data frame:
x - The value of x per sample.
y - The value of the density function at y.
μσ - Categorical variable to group data from the same density (In the code, I use the mean and standard deviation of a normal distribution).
I can use Seaborn's lineplot to get what I want without the area below the curve as in the image above.
I'm after achieving the look as above for the data I have.
Is there a way to replicate this theme, area under the curve included, for lineplot?
The code below shows what I got so far:
import numpy as np
import scipy as sp
import pandas as pd
from scipy.stats import norm
import matplotlib.pyplot as plt
import seaborn as sns
num_grid_pts = 1000
val_μ = [0, -1, 1, 0]
val_σ = [1, 2, 3, 4]
num_var = len(val_μ) # variations
x = np.linspace(-10, 10, num_grid_pts)
P = np.zeros((num_grid_pts, num_var)) # PDF
μσ = [f'μ = {μ}, σ = {σ}' for μ, σ in zip(val_μ, val_σ)]
for ii, (μ, σ) in enumerate(zip(val_μ, val_σ)):
randVar = norm(μ, σ)
P[:, ii] = randVar.pdf(x)
df_P = pd.DataFrame(data = {'x': np.tile(x, num_var), 'PDF': P.flatten('F'), 'μσ': np.repeat(μσ, len(x))})
f, ax = plt.subplots(figsize=(15, 10))
sns.lineplot(data=df_P, x='x', y='PDF', hue='μσ', ax=ax)
plot_lines = ax.get_lines()
for ii in range(num_var):
ax.fill_between(x=plot_lines[ii].get_xdata(), y1=plot_lines[ii].get_ydata(), alpha=0.25, color=plot_lines[ii].get_color())
ax.set_title(f'Normal Distribution')
ax.set_xlabel(f'Value')
ax.set_ylabel(f'Probability')
plt.show()
I used the lineplot to create the lines and then created the fills. But this is a hack, I was wondering if I can do it more naturally within Seaborn.
I found a way to manually play with the elements do so using the area object:
(
so.Plot(healthexp, "Year", "Spending_USD", color="Country")
.add(so.Area(alpha=.7), so.Stack())
)
The result is:
Yet for some reason the example code doesn't work.
What I did was using Seabron's lineplot() and then manually add fill_between() polygon:
ax = sns.lineplot(data=data_frame, x='data_x', y='data_y', hue='data_color')
plot_lines = ax.get_lines()
for i in range(num_unique_colors):
ax.fill_between(x=plot_lines[i].get_xdata(), y1=plot_lines[i].get_ydata(), alpha=0.25, color=plot_lines[i].get_color())
I'm trying to plot bar hist of interest rates and attach to it a PDF line. I have looked for solutions and found a way with kdeplot.
The result is pretty strange the kdeplot line is much higher than the bars hist and I don't know how to fix it.
After applying kdeplot:
Before applying kdeplot:
Here is the code that I'm using:
df=pd.read_excel('interestrate.xlsx')
k=0.0005
bin_steps = np.arange(start = df['Interest rate Real'].min(), stop = df['Interest rate Real'].max(), step = k)
ax = df['Interest rate Real'].hist(bins = bin_steps, figsize=[10,5])
ax1 = df['Interest rate Real']
vals = ax.get_xticks()
ax.set_xticklabels(['{:,.2%}'.format(x) for x in vals])
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])
ax.set_title("PDF for Real Interest Rate")
#sns.kdeplot(ax1)
The following code snippet should set you in the right direction (just insert your data):
import scipy.stats as st
y = np.random.randn(1000) # your data goes here
plt.hist(y,50, density=True)
mn, mx = plt.xlim()
plt.xlim(mn, mx)
x = np.linspace(mn, mx, 301)
kde = st.gaussian_kde(y)
plt.plot(x, kde.pdf(x));
Alternatively with seaborn:
import seaborn as sns
plt.hist(y,50, density=True)
sns.kdeplot(y);
or as simple as:
sns.distplot(y)
I have a line plot and a scatter plot that are conceptually linked by sample IDs, i.e. each dot on the 2D scatter plot corresponds to a line on the line plot.
While I have done linked plotting before using scatter plots, I have not seen examples of this for the situation above - where I select dots and thus selectively view lines.
Is it possible to link dots on a scatter plot to a line on a line plot? If so, is there an example implementation available online?
Searching the web for bokeh link line and scatter plot yields no examples online, as of 14 August 2018.
I know this is a little late - but maybe this snippet of code will help?
import numpy as np
from bokeh.io import output_file, show
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.models import Circle,MultiLine
def play():
x = np.linspace(0,10,100)
y = np.random.rand(100)
xs = np.random.rand(100,3)
ys = np.random.normal(size=(100,3))
xp = [list(xi) for xi in xs] # Because Multi-List does not like numpy arrays
yp = [list(yi) for yi in ys]
output_file('play.html')
source = ColumnDataSource(data=dict(x=x,y=y,xp=xp,yp=yp))
TOOLS = 'box_select'
left = figure(tools=TOOLS,plot_width=700,plot_height=700)
c1 = left.circle('x','y',source=source)
c1.nonselection_glyph = Circle(fill_color='gray',fill_alpha=0.4,
line_color=None)
c1.selection_glyph = Circle(fill_color='orange',line_color=None)
right = figure(tools=TOOLS,plot_width=700,plot_height=700)
c2 = right.multi_line(xs='xp',ys='yp',source=source)
c2.nonselection_glyph = MultiLine(line_color='gray',line_alpha=0.2)
c2.selection_glyph = MultiLine(line_color='orange')
p = gridplot([[left, right]])
show(p)
As things turn out, I was able to make this happen by using HoloViews rather than Bokeh. The relevant example for making this work comes from the Selection1d tap stream.
http://holoviews.org/reference/streams/bokeh/Selection1D_tap.html#selection1d-tap
I will do an annotated version of the example below.
First, we begin with imports. (Note: all of this assumes work is being done in the Jupyter notebook.)
import numpy as np
import holoviews as hv
from holoviews.streams import Selection1D
from scipy import stats
hv.extension('bokeh')
First off, we set some styling options for the charts. In my experience, I usually build the chart before styling it, though.
%%opts Scatter [color_index=2 tools=['tap', 'hover'] width=600] {+framewise} (marker='triangle' cmap='Set1' size=10)
%%opts Overlay [toolbar='above' legend_position='right'] Curve (line_color='black') {+framewise}
This function below generates data.
def gen_samples(N, corr=0.8):
xx = np.array([-0.51, 51.2])
yy = np.array([0.33, 51.6])
means = [xx.mean(), yy.mean()]
stds = [xx.std() / 3, yy.std() / 3]
covs = [[stds[0]**2 , stds[0]*stds[1]*corr],
[stds[0]*stds[1]*corr, stds[1]**2]]
return np.random.multivariate_normal(means, covs, N)
data = [('Week %d' % (i%10), np.random.rand(), chr(65+np.random.randint(5)), i) for i in range(100)]
sample_data = hv.NdOverlay({i: hv.Points(gen_samples(np.random.randint(1000, 5000), r2))
for _, r2, _, i in data})
The real magic begins here. First off, we set up a scatterplot using the hv.Scatter object.
points = hv.Scatter(data, ['Date', 'r2'], ['block', 'id']).redim.range(r2=(0., 1))
Then, we create a Selection1D stream. It pulls in points from the points object.
stream = Selection1D(source=points)
We then create a function to display the regression plot on the right. There's an empty plot that is the "default", and then there's a callback that hv.DynamicMap calls on.
empty = (hv.Points(np.random.rand(0, 2)) * hv.Curve(np.random.rand(0, 2))).relabel('No selection')
def regression(index):
if not index:
return empty
scatter = sample_data[index[0]]
xs, ys = scatter['x'], scatter['y']
slope, intercep, rval, pval, std = stats.linregress(xs, ys)
xs = np.linspace(*scatter.range(0)+(2,))
reg = slope*xs+intercep
return (scatter * hv.Curve((xs, reg))).relabel('r2: %.3f' % slope)
Now, we create the DynamicMap which dynamically loads the regression curve data.
reg = hv.DynamicMap(regression, kdims=[], streams=[stream])
# Ignoring annotation for average - it is not relevant here.
average = hv.Curve(points, 'Date', 'r2').aggregate(function=np.mean)
Finally, we display the plots.
points * average + reg
The most important thing I learned from building this is that the indices for the points have to be lined up with the indices for the regression curves.
I hope this helps others building awesome viz using HoloViews!
I have a synthetic dataset with 1000 noisy polygons of various orders and sin/cos curves that I can plot as lines using python seaborn.
Since I have quite a few lines that are overlapping, I'd like to plot some sort of heatmap or histogram of my line graphs.
I've tried iterating over the columns and aggregating the counts to use seaborn's heatmap graph, but with many lines this takes quite a while.
The next best thing that results in what I want was a hexbin graph (with seaborn jointgraph).
But it's a compromise between runtime and granularity (the shown graph has gridsize 750). I couldn't find any other graph-type for my problem. But I also don't know exactly what it might be called.
I've also tried with line alpha set to 0.2. This results in a similar graph to what I want. But it's less precise (if more than 5 lines overlap at the same point I already have zero transparency left). Also, it misses the typical coloration of heatmaps.
(Moot search terms were: heatmap, 2D line histogram, line histogram, density plots...)
Does anybody know packages to plot this more efficiently and high(er) quality or knows how to do it with the popular python plotters (i.e. the matplotlib family: matplotlib, seaborn, bokeh). I'm really fine with any package though.
It took me awhile, but I finally solved this using Datashader. If using a notebook, the plots can be embedded into interactive Bokeh plots, which looks really nice.
Anyhow, here is the code for static images, in case someone else is in need of something similar:
# coding: utf-8
import time
import numpy as np
from numpy.polynomial import polynomial
import pandas as pd
import matplotlib.pyplot as plt
import datashader as ds
import datashader.transfer_functions as tf
plt.style.use("seaborn-whitegrid")
def create_data():
# ...
# Each column is one data sample
df = create_data()
# Following will append a nan-row and reshape the dataframe into two columns, with each sample stacked on top of each other
# THIS IS CRUCIAL TO OPTIMIZE SPEED: https://github.com/bokeh/datashader/issues/286
# Append row with nan-values
df = df.append(pd.DataFrame([np.array([np.nan] * len(df.columns))], columns=df.columns, index=[np.nan]))
# Reshape
x, y = df.shape
arr = df.as_matrix().reshape((x * y, 1), order='F')
df_reshaped = pd.DataFrame(arr, columns=list('y'), index=np.tile(df.index.values, y))
df_reshaped = df_reshaped.reset_index()
df_reshaped.columns.values[0] = 'x'
# Plotting parameters
x_range = (min(df.index.values), max(df.index.values))
y_range = (df.min().min(), df.max().max())
w = 1000
h = 750
dpi = 150
cvs = ds.Canvas(x_range=x_range, y_range=y_range, plot_height=h, plot_width=w)
# Aggregate data
t0 = time.time()
aggs = cvs.line(df_reshaped, 'x', 'y', ds.count())
print("Time to aggregate line data: {}".format(time.time()-t0))
# One colored plot
t1 = time.time()
stacked_img = tf.Image(tf.shade(aggs, cmap=["darkblue", "darkblue"]))
print("Time to create stacked image: {}".format(time.time() - t1))
# Save
f0 = plt.figure(figsize=(w / dpi, h / dpi), dpi=dpi)
ax0 = f0.add_subplot(111)
ax0.imshow(stacked_img.to_pil())
ax0.grid(False)
f0.savefig("stacked.png", bbox_inches="tight", dpi=dpi)
# Heat map - This uses a equalized histogram (built-in default), there are other options, though.
t2 = time.time()
heatmap_img = tf.Image(tf.shade(aggs, cmap=plt.cm.Spectral_r))
print("Time to create stacked image: {}".format(time.time() - t2))
# Save
f1 = plt.figure(figsize=(w / dpi, h / dpi), dpi=dpi)
ax1 = f1.add_subplot(111)
ax1.imshow(heatmap_img.to_pil())
ax1.grid(False)
f1.savefig("heatmap.png", bbox_inches="tight", dpi=dpi)
With following run times (in seconds):
Time to aggregate line data: 0.7710442543029785
Time to create stacked image: 0.06000351905822754
Time to create stacked image: 0.05600309371948242
The resulting plots:
Although it seems you have tried this, plotting the counts seems to give a good representation of the data. However, it really depends what you're trying to find in your data, what is it supposed to tell you?
The reason for the long run time is due to plotting so many lines, a heatmap based on the counts however will plot fairly quickly.
I created some dummy data for sinus waves, based on noise, no. of lines, amplitude and shift. Added both a boxplot and heatmap.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
import random
import pandas as pd
np.random.seed(0)
#create dummy data
N = 200
sinuses = []
no_lines = 200
for i in range(no_lines):
a = np.random.randint(5, 40)/5 #amplitude
x = random.choice([int(N/5), int(N/(2/5))]) #random shift
sinuses.append(np.roll(a * np.sin(np.linspace(0, 2 * np.pi, N)) + np.random.randn(N), x))
fig = plt.figure(figsize=(20 / 2.54, 20 / 2.54))
sins = pd.DataFrame(sinuses, )
ax1 = plt.subplot2grid((3,10), (0,0), colspan=10)
ax2 = plt.subplot2grid((3,10), (1,0), colspan=10)
ax3 = plt.subplot2grid((3,10), (2,0), colspan=9)
ax4 = plt.subplot2grid((3,10), (2,9))
# plot line data
sins.T.plot(ax=ax1, color='lightblue',linewidth=.3)
ax1.legend_.remove()
ax1.set_xlim(0, N)
# try boxplot
sins.plot.box(ax=ax2, showfliers=False)
xticks = ax2.xaxis.get_major_ticks()
for index, label in enumerate(ax2.get_xaxis().get_ticklabels()):
xticks[index].set_visible(False) # hide ticks where labels are hidden
#make a list of bins
no_bins = 20
bins = list(np.arange(sins.min().min(), sins.max().max(), int(abs(sins.min().min())+sins.max().max())/no_bins))
bins.append(sins.max().max())
# calculate histogram
hists = []
for col in sins.columns:
count, division = np.histogram(sins.iloc[:,col], bins=bins)
hists.append(count)
hists = pd.DataFrame(hists, columns=[str(i) for i in bins[1:]])
print(hists.shape, '\n', hists.head())
cmap = mpl.colors.ListedColormap(['white', '#FFFFBB', '#C3FDB8', '#B5EAAA', '#64E986', '#54C571',
'#4AA02C', '#347C17', '#347235', '#25383C', '#254117'])
#heatmap
im = ax3.pcolor(hists.T, cmap=cmap)
cbar = plt.colorbar(im, cax=ax4)
yticks = np.arange(0, len(bins))
yticklabels = hists.columns.tolist()
ax3.set_yticks(yticks)
ax3.set_yticklabels([round(i,1) for i in bins])
ax3.set_title('Count')
yticks = ax3.yaxis.get_major_ticks()
for index, label in enumerate(ax3.get_yaxis().get_ticklabels()):
if index % 3 != 0: #make some labels invisible
yticks[index].set_visible(False) # hide ticks where labels are hidden
plt.show()
Although the boxplot is easy to interpret, it doesn't show the actual distribution of the data very well, but knowing where the median and quantiles lie may be helpful.
Increasing the number of lines and amount of values per line will increase plotting time considerably for the line plots, the heatmap is still fairly quick though to generate. The boxplot becomes indiscernible however.
I couldn't exactly replicate your data (or know the actual size of it), but perhaps the heatmap may be helpful.
My question is almost exactly similar to this one. However, I'm not satisfied with the answers, because I want to generate an actual heatmap, without explicitely binning the data.
To be precise, I would like to display the function that is the result of a convolution between the scatter data and a custom kernel, such as 1/x^2.
How should I implement this with matplotlib?
EDIT: Basically, what I have done is this. The result is here. I'd like to keep everything, the axis, the title, the labels and so on. Basically just change the plot to be like I described, while re-implementing as little as possible.
Convert your time series data into a numeric format with matplotlib.dats.date2num. Lay down a rectangular grid that spans your x and y ranges and do your convolution on that plot. Make a pseudo-color plot of your convolution and then reformat the x labels to be dates.
The label formatting is a little messy, but reasonably well documented. You just need to replace AutoDateFormatter with DateFormatter and an appropriate formatting string.
You'll need to tweak the constants in the convolution for your data.
import numpy as np
import datetime as dt
import pylab as plt
import matplotlib.dates as dates
t0 = dt.date.today()
t1 = t0+dt.timedelta(days=10)
times = np.linspace(dates.date2num(t0), dates.date2num(t1), 10)
dt = times[-1]-times[0]
price = 100 - (times-times.mean())**2
dp = price.max() - price.min()
volume = np.linspace(1, 100, 10)
tgrid = np.linspace(times.min(), times.max(), 100)
pgrid = np.linspace(70, 110, 100)
tgrid, pgrid = np.meshgrid(tgrid, pgrid)
heat = np.zeros_like(tgrid)
for t,p,v in zip(times, price, volume):
delt = (t-tgrid)**2
delp = (p-pgrid)**2
heat += v/( delt + delp*1.e-2 + 5.e-1 )**2
fig = plt.figure()
ax = fig.add_subplot(111)
ax.pcolormesh(tgrid, pgrid, heat, cmap='gist_heat_r')
plt.scatter(times, price, volume, marker='x')
locator = dates.DayLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(dates.AutoDateFormatter(locator))
fig.autofmt_xdate()
plt.show()