Detect a given pattern in time series - python

How can I detect this type of change (shown in the linked image) in a time series in Python?
Thanks for your help

There are many ways to do this.
I will show one of the fastest and simplest, which is based on correlation.
First we need some data (the time series) and a template (in our case the template looks like a signum function):
import numpy as np
import matplotlib.pyplot as plt

data = np.concatenate([np.random.rand(70), np.random.rand(30) + 2])
template = np.concatenate([[-1] * 5, [1] * 5])
Before detection I strongly recommend normalizing the data, for example like this:
data = (data - data.mean())/data.std()
Now all we need is the correlation function:
corr_res = np.correlate(data, template,mode='same')
You need to choose a threshold for the result. Pick the value based on your template, since a longer or larger template produces larger correlation values:
th = 9
You can see the results:
plt.figure(figsize=(10,5))
plt.subplot(211)
plt.plot(data)
plt.subplot(212)
plt.plot(corr_res)
plt.plot(np.arange(len(corr_res))[corr_res > th],corr_res[corr_res > th],'ro')
plt.show()
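The steps above can be collected into one small function. This is only a sketch under the same assumptions as the answer (a single upward step, a 10-sample template, a threshold of 9); the name detect_steps and the parameters half_width and threshold are made up for illustration and are not part of NumPy.

```python
import numpy as np

def detect_steps(series, half_width=5, threshold=9.0):
    """Correlate a normalized series with a step template.

    Returns the indices where the correlation exceeds the threshold,
    together with the full correlation trace.
    """
    series = np.asarray(series, dtype=float)
    # Normalize so that the threshold does not depend on the data's scale.
    normalized = (series - series.mean()) / series.std()
    # Step template: half_width samples of -1 followed by half_width of +1.
    template = np.concatenate([-np.ones(half_width), np.ones(half_width)])
    corr = np.correlate(normalized, template, mode="same")
    return np.flatnonzero(corr > threshold), corr

# Synthetic data shaped like the example: a jump of about +2 at index 70.
rng = np.random.default_rng(0)
series = np.concatenate([rng.random(70), rng.random(30) + 2])
hits, corr = detect_steps(series)
peak = int(np.argmax(corr))  # position of the strongest response
```

With mode='same' and an even-length template, the strongest response sits at the first sample after the jump, so peak can be read directly as the change point. If no sample crosses the threshold, hits is empty and the threshold probably needs to be lowered for that data.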

Related

How can I plot only particular values in xarray?

I am using data from cdasws to plot dynamic spectra. I am following the example found here https://cdaweb.gsfc.nasa.gov/WebServices/REST/jupyter/CdasWsExample.html
This is my code which I have modified to obtain a dynamic spectra for STEREO.
from cdasws import CdasWs
from cdasws.datarepresentation import DataRepresentation
import matplotlib.pyplot as plt
cdas = CdasWs()
import numpy as np
datasets = cdas.get_datasets(observatoryGroup='STEREO')
for index, dataset in enumerate(datasets):
    print(dataset['Id'], dataset['Label'])
variables = cdas.get_variables('STEREO_LEVEL2_SWAVES')
for variable_1 in variables:
    print(variable_1['Name'], variable_1['LongDescription'])
data = cdas.get_data('STEREO_LEVEL2_SWAVES', ['avg_intens_ahead'],
                     '2020-07-11T02:00:00Z', '2020-07-11T03:00:00Z',
                     dataRepresentation = DataRepresentation.XARRAY)[1]
print(data)
plt.figure(figsize = (15,7))
# plt.ylim(100,1000)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.yscale('log')
sorted_data.transpose().plot()
plt.xlabel("Time",size=18)
plt.ylabel("Frequency (kHz)",size=18)
plt.show()
This code produces a dynamic spectrum plot.
My question is: is there any way to plot this spectrum for only one particular frequency? For example, I want to plot just the intensity values at 636 kHz. Is there any way I can do that?
Any help is greatly appreciated. I don't understand xarray and have never worked with it before.
Edit -
Using the command,
data_stereo.avg_intens_ahead.loc[:,625].plot()
generates a plot that looks like,
While this is useful, what I need is this:
for the dynamic spectrum, if I choose a particular frequency such as 600 kHz, can it display something like this (I have just added white boxes to clarify what I mean)?
If you still want the plot to be 2D, but to include a subset of your data along one of the dimensions, you can provide an array of indices or a slice object. For example:
data_stereo.avg_intens_ahead.sel(
frequency=[625]
).plot()
Or
# include a 10% band on either side
data_stereo.avg_intens_ahead.sel(
frequency=slice(625*0.9, 625*1.1)
).plot()
Alternatively, if you would actually like your plot to show white space outside this selected area, you could mask your data with where:
data_stereo.avg_intens_ahead.where(
data_stereo.frequency==625
).plot()
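The three selections above can be tried on a small synthetic array before touching the real data. This is only a sketch: the array intensity and its coordinate values are invented for illustration, and only the dimension name frequency is taken from the question. It also uses sel with method="nearest", which helps when the requested value (such as 636 kHz) is not exactly one of the frequency coordinates.

```python
import numpy as np
import xarray as xr

# Made-up stand-in for data_stereo.avg_intens_ahead: 6 time steps x 5 channels.
time = np.arange(6)
frequency = np.array([125.0, 375.0, 625.0, 875.0, 1125.0])
intensity = xr.DataArray(
    np.arange(30, dtype=float).reshape(6, 5),
    coords={"time": time, "frequency": frequency},
    dims=("time", "frequency"),
)

# 1D result: the single channel closest to 636 kHz (here 625 kHz).
nearest = intensity.sel(frequency=636, method="nearest")

# 2D result restricted to a 10% band around 625 kHz.
band = intensity.sel(frequency=slice(625 * 0.9, 625 * 1.1))

# Same shape as the input, with NaN everywhere except the 625 kHz channel.
masked = intensity.where(intensity.frequency == 625)
```

Calling .plot() on nearest gives a line plot over time, while band and masked keep both dimensions and plot as 2D images. The masked version is the one that leaves white space outside the selected channel.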

How to Convert Color-Code Legends from Logarithmic Scale to Actual Values?

What is the best way to display actual values in the color-code legend when using logarithmic color coding in plotly.figure_factory.create_choropleth?
Here is the sample code:
import numpy as np
import plotly.figure_factory as ff
fips = df['fips']
values = np.log10(df['values'])
endpts = list(np.linspace(0, 4, len(colorscale) - 1))
fig = ff.create_choropleth(fips=fips, values=values, scope = ['usa'], binning_endpoints = endpts)
Here is what I have currently:
Here is what I wish to have:
Exactly the same map as above, except that the legend displays actual numbers instead of log10(values). For example, instead of 0.0-0.5 and 0.5-1.0 (meaning 10^0 to 10^0.5 and 10^0.5 to 10^1), I would like to see 1-3, 4-10, and so forth.
I am not familiar with the Plotly API, and since you do not provide a minimal working example it is hard for me to test, but I am quite confident that you can specify the color mapping yourself. If so, you could put the logarithmic spacing into the color mapping while feeding in the numbers in linear scale.
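One way to read this suggestion, sketched with NumPy only: keep the logarithmic spacing of the bins, but express the bin edges in linear units so they match the raw values. Here n_colors is a hypothetical stand-in for len(colorscale) in the question. Whether passing these edges as binning_endpoints together with the untransformed df['values'] gives exactly the legend you want is an assumption that has not been tested here.

```python
import numpy as np

n_colors = 10  # hypothetical stand-in for len(colorscale)

# Bin edges evenly spaced in log10 units, as in the question ...
log_endpoints = np.linspace(0, 4, n_colors - 1)

# ... converted back to linear units and rounded to whole numbers.
linear_endpoints = (10 ** log_endpoints).round().astype(int).tolist()
```

The resulting edges are 1, 3, 10, 32, 100 and so on up to 10000, so bins built from them cover the same ranges as the log10 bins while being labelled in actual values.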

How to make the confidence interval (error bands) show on seaborn lineplot

I'm trying to create a plot of classification accuracy for three ML models, depending on the number of features used from the data (the number of features used is from 1 to 75, ranked according to a feature selection method). I did 100 iterations of calculating the accuracy output for each model and for each "# of features used". Below is what my data looks like (clsf from 0 to 2, timepoint from 1 to 75):
I am then calling the seaborn function as shown in documentation files.
sns.lineplot(x= "timepoint", y="acc", hue="clsf", data=ttest_df, ci= "sd", err_style = "band")
The plot comes out like this:
I wanted there to be confidence intervals for each point on the x-axis, and don't know why it is not working. I have 100 y values for each x value, so I don't see why it cannot calculate/show it.
You could try your data set with Seaborn's pointplot function instead. It is designed to show point estimates together with an indication of the uncertainty around them. By default pointplot connects the values with a line. This is fine if the categorical variable is ordinal in nature, but for nominal data it can be a good idea to remove the line via linestyles = "". (I used join = False in my example.)
I tried to recreate your notebook to give a visual, but wasn't able to get the confidence interval in my plot exactly as you describe. I hope this is helpful for you.
import seaborn as sb

sb.set(style="darkgrid")
sb.pointplot(x = 'timepoint', y = 'acc', hue = 'clsf',
             data = ttest_df, ci = 'sd', palette = 'magma',
             join = False)
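Before switching plot types, it may be worth checking what the band would be computed from. The sketch below builds a made-up ttest_df with the column names from the question (the accuracy values are invented) and summarizes each (clsf, timepoint) group. It is a diagnostic aid, not a fix, and it assumes the real data has the same long format.

```python
import numpy as np
import pandas as pd

# Invented data: 3 classifiers x 75 timepoints x 100 iterations.
rng = np.random.default_rng(0)
clsf, timepoint, _ = np.meshgrid(
    np.arange(3), np.arange(1, 76), np.arange(100), indexing="ij"
)
ttest_df = pd.DataFrame({
    "clsf": clsf.ravel(),
    "timepoint": timepoint.ravel(),
    "acc": 0.5 + 0.005 * timepoint.ravel() + rng.normal(0, 0.02, clsf.size),
})

# One row per (clsf, timepoint): how many values there are and how much they vary.
summary = ttest_df.groupby(["clsf", "timepoint"])["acc"].agg(["count", "mean", "std"])
```

If count is 1 for each group, or std is very small compared with the range of the y-axis, a standard-deviation band will be missing or too thin to see, even though the plotting call itself is fine.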

a nice pystan trace plot for a stan vector parameter

I am doing a multiple regression in Stan.
I want a trace plot of the beta vector parameter for the regressors/design matrix.
When I do the following:
fit = model.sampling(data=data, iter=2000, chains=4)
fig = fit.plot('beta')
I get a pretty horrid image:
I was after something a little more user friendly. I have managed to hack the following which is closer to what I am after.
My hack plugs into the back of pystan as follows.
r = fit.extract() # r for results
from pystan.external.pymc import plots
param = 'beta'
beta = r[param]
name = df.columns.values.tolist()
(rows, cols) = beta.shape
assert(len(df.columns) == cols)
values = {param+'['+str(k+1)+'] '+name[k]:
    beta[:,k] for k in range(cols)}
fig = plots.traceplot(values, values.keys())
for a in fig.axes:
    # shorten the y-labels
    l = a.get_ylabel()
    if l == 'frequency':
        a.set_ylabel('freq')
    if l=='sample value':
        a.set_ylabel('val')
fig.set_size_inches(8, 12)
fig.tight_layout(pad=1)
fig.savefig(g_dir+param+'-trace.png', dpi=125)
plt.close()
My question - surely I have missed something - but is there an easier way to get the kind of output I am after from pystan for a vector parameter?
Discovered that the ArviZ module does this pretty well.
ArviZ can be found here: https://arviz-devs.github.io/arviz/
I also struggled with this and just found a way to extract the parameters for the traceplot (the betas, I already knew).
When you do your fit, you can save it to a dataframe:
fit_df = fit.to_dataframe()
Now you have a new variable, your dataframe. Yes, it took me a while to find that pystan had a straightforward way to save the fit to a dataframe.
With that at hand you can check your dataframe. You can see its header by printing the keys:
fit_df.keys()
the output is something like this:
Index([u'chain', u'chain_idx', u'warmup', u'accept_stat__', u'energy__',
u'n_leapfrog__', u'stepsize__', u'treedepth__', u'divergent__',
u'beta[1,1]', ...
u'eta05[892]', u'eta05[893]', u'eta05[894]', u'eta05[895]',
u'eta05[896]', u'eta05[897]', u'eta05[898]', u'eta05[899]',
u'eta05[900]', u'lp__'],
dtype='object', length=9037)
Now you have everything you need! The betas are in columns, and so are the chain ids. That's all you need to plot the betas and the traceplot, so you can manipulate the data in any way you want and customize your figures as you wish. I'll show you an example of how I did it:
import matplotlib.pyplot as plt
import seaborn as sns

chain_idx = fit_df['chain_idx']
beta11 = fit_df['beta[1,1]']
beta12 = fit_df['beta[1,2]']
plt.figure(figsize=(15,3))
plt.subplot(1,4,1)
sns.kdeplot(beta11)
plt.subplot(1,4,2)
plt.plot(chain_idx, beta11)
plt.subplot(1,4,3)
sns.kdeplot(beta12)
plt.subplot(1,4,4)
plt.plot(chain_idx, beta12)
plt.tight_layout()
plt.show()
The image from the above plot!
I hope it helps (if you still need it) ;)
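If you want one trace line per chain rather than all chains drawn on top of each other, the dataframe can be reshaped first. This is a sketch on made-up numbers: it assumes chain holds the chain id and chain_idx the draw index within that chain, which is how the column names in the output above read, but check this against your own fit_df.

```python
import numpy as np
import pandas as pd

# Invented stand-in for fit.to_dataframe(): 4 chains of 250 draws each.
rng = np.random.default_rng(1)
n_chains, n_draws = 4, 250
fit_df = pd.DataFrame({
    "chain": np.repeat(np.arange(n_chains), n_draws),
    "chain_idx": np.tile(np.arange(1, n_draws + 1), n_chains),
    "beta[1,1]": rng.normal(0.0, 1.0, n_chains * n_draws),
})

# Rows are draws, columns are chains: one trace per column.
per_chain = fit_df.pivot(index="chain_idx", columns="chain", values="beta[1,1]")
```

From here per_chain.plot() draws one line per chain on a shared axis, and the same pivot can be repeated for each beta column.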

How to overplot arrays of different shape?

I'm trying to overplot two arrays with different shapes but I'm unable to project one on the top of the other. For example:
#importing the relevant packages
import numpy as np
import matplotlib.pyplot as plt
def overplot(data1,data2):
    '''
    This function should make a contour plot
    of data2 over the data1 plot.
    '''
    #creating the figure
    fig = plt.figure()
    #adding an axe
    ax = fig.add_axes([1,1,1,1])
    #making the plot for the
    #first dataset
    ax.imshow(data1)
    #overplotting the contours
    #for the second dataset
    ax.contour(data2, projection = data2,
               levels = [0.5,0.7])
    #showing the figure
    plt.show(fig)
    return
if __name__ == '__main__':
    '''
    testing zone
    '''
    #creating two mock datasets
    data1 = np.random.rand(3,3)
    data2 = np.random.rand(9,9)
    #using the overplot
    overplot(data1,data2)
Currently, my output is something like:
While what I actually would like is to project the contours of the second dataset into the first one. This way, if I got images of the same object but with different resolution for the cameras I would be able to do such plots. How can I do that?
Thanks for your time and attention.
It's generally best to make the data match, and then plot it. This way you have complete control over how things are done.
In the simple example you give, you could use repeat along each axis to expand the 3x3 data to match the 9x9 data. That is, you could use data1b = np.repeat(np.repeat(data1, 3, axis=1), 3, axis=0) to give:
But for the more interesting case of images, like you mention at the end of your question, the axes probably won't be integer multiples of each other, and you'll be better served by a spline or another type of interpolation. This difference is an example of why it's better to have control over this yourself, since there are many ways to do this type of mapping.
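The repeat step can be checked on a small array where the result is easy to read. The values here are made up for illustration. Each original pixel becomes a 3x3 block, so the expanded array lines up pixel for pixel with a 9x9 dataset.

```python
import numpy as np

# A 3x3 array with distinct values so the blocks are easy to recognize.
data1 = np.arange(9, dtype=float).reshape(3, 3)

# Repeat every column 3 times, then every row 3 times: 3x3 -> 9x9.
data1b = np.repeat(np.repeat(data1, 3, axis=1), 3, axis=0)
```

After this, imshow(data1b) and contour(data2, levels=[0.5, 0.7]) can be drawn on the same axes because both arrays share the same pixel grid.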
