I have taken upper-air soundings from the University of Wyoming (UWyo) database and am currently calculating the Brunt-Väisälä frequency (the squared version, at the moment) with MetPy across several stations for some basic synoptic purposes.
A minimal, reproducible version of the code looks like this:
import metpy.calc as mpcalc
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
from metpy.units import units, pandas_dataframe_to_unit_arrays
from siphon.simplewebservice.wyoming import WyomingUpperAir
stations = ['RPLI', 'RPUB', '98433', 'RPMP', 'RPVP', 'RPMD'] #6 stations
station_data = {}
date = datetime(2016, 8, 14, 0)
for station in stations:
    print(f'Getting {station}')
    df = pandas_dataframe_to_unit_arrays(WyomingUpperAir.request_data(date, station))
    # Potential temperature from the sounding, then N^2 from the theta profile
    df['theta'] = mpcalc.potential_temperature(df['pressure'], df['temperature'])
    df['bv_squared'] = mpcalc.brunt_vaisala_frequency_squared(df['height'], df['theta'])
    station_data[station] = df
mean_bv = []
for station in stations:
    df = station_data[station]
    # Average N^2 over the 1-5 km layer
    keep_idx = (df['height'] >= 1000 * units.m) & (df['height'] <= 5 * units.km)
    mean_bv.append(np.mean(df['bv_squared'][keep_idx]).m)
plt.title("Atmospheric Stability")
plt.plot(mean_bv)
plt.show()
which produces a simple plot like this
I would like to ask for help on smoothing that line, e.g. by interpolating the data to produce a smooth curve. I'm a bit of a novice, so I look forward to your help and responses.
Essentially what you're looking for is to smooth, or low-pass filter, the data.
One option is to fit the data points to some kind of appropriate curve (polynomial, spline, exponential, etc.) and replace the original data values with ones computed from the curve. You can look at some of the tools in scipy.optimize to do the fit.
For filtering, there are a variety of options, from a moving average to more traditional filters; a good, simple choice here is a Savitzky-Golay filter. scipy.signal has a lot of tools to help you with this.
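For instance, a Savitzky-Golay pass over the six per-station means from the question could look like this (a minimal sketch; window_length must be odd and no longer than the series, hence 5 here):
from scipy.signal import savgol_filter
# mean_bv is the list of six per-station means computed above
smoothed = savgol_filter(mean_bv, window_length=5, polyorder=2)
plt.plot(mean_bv, 'o', label='raw station means')
plt.plot(smoothed, '-', label='Savitzky-Golay')
plt.legend()
plt.show()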
I am having difficulty accessing (the right) data when using holoviews/bokeh, either for connected plots showing different aspects of the dataset, or just for customising a plot with dynamic access to the data as plotted (say, a tooltip).
TL;DR: How do I add a projection plot of my dataset (a different set of dimensions, linked to the main plot, like a marginal distribution but not restricted to a histogram or distribution)? A similar solution would probably also answer a related question I asked here on SO.
Let me give an example (straight from an ipynb, so it should be quite reproducible):
import numpy as np
import random, pandas as pd
import bokeh
import datashader as ds
import holoviews as hv
from holoviews import opts
from holoviews.operation.datashader import datashade, shade, dynspread, spread, rasterize
hv.extension('bokeh')
With the imports set up, let's create a dataset (N target 10e12 ;) to use with datashader. Besides the key dimensions, I really need some value dimensions (here z and z2).
N = int(10e6)
x_r = (0,100)
y_r = (100,2000)
z_r = (0,10e8)
x = np.random.randint(x_r[0]*1000,x_r[1]*1000,size=(N, 1))
y = np.random.randint(y_r[0]*1000,y_r[1]*1000,size=(N, 1))
z = np.random.randint(z_r[0]*1000,z_r[1]*1000,size=(N, 1))
z2 = np.ones((N,1)).astype(int)
df = pd.DataFrame(np.column_stack([x,y,z,z2]), columns=['x','y','z','z2'])
df[['x','y','z']] = df[['x','y','z']].div(1000, axis=0)
df
Now I plot the data, rasterised, and also activate the tooltip to see the defaults. Sure, x/y is trivial, but as I said, I care about the value dimensions. The tooltip shows z2 as x_y z2. I have a related question here on SO about value-dimension access for tooltips with the same sort of data.
from matplotlib.cm import get_cmap
palette = get_cmap('viridis')
# palette_inv = palette.reversed()
p=hv.Points(df,['x','y'], ['z','z2'])
P=rasterize(p, aggregator=ds.sum("z2"),x_range=(0,100)).opts(cmap=palette)
P.opts(tools=["hover"]).opts(height=500, width=500,xlim=(0,100),ylim=(100,2000))
Now I can add a histogram or a marginal distribution, which is pretty close to what I want, but there are issues with this soon past the trivial defaults. (E.g.: P << hv.Distribution(p, kdims=['y']) or P.hist(dimension='y',weight_dimension='x_y z',num_bins = 2000,normed=True))
Both are close, but neither gives me the other value dimension I'd like to visualise. If I try to access the other value dimension ('x_y z'), it fails. Also, the 'x_y z2' naming seems very clumsy; is there a better way?
When I do something like this, my browser/notebook-extension blows up, of course.
transformed = p.transform(x=hv.dim('z'))
P << hv.Curve(transformed)
So how do I access all my data in the right way?
I have a dataset containing around 1000 different time series. Some of these are showing clear periodicity, and some are not.
I want to be able to automatically determine if a time series has clear periodicity in it, so I know if I need to do seasonal decomposition of it before applying some outlier methods.
Here is a signal with daily periodicity; each sample is taken at a 15-minute interval.
To try to determine automatically whether there is daily periodicity, I have tried different methods. The first approach uses the seasonality detectors from the Kats library.
from kats.consts import TimeSeriesData
from kats.detectors.seasonality import FFTDetector, ACFDetector
def detect_seasonality(df, feature, time_col, detector_type):
    # Reshape the series into the (time, value) layout Kats expects
    df_kpi = df[[feature]].reset_index().rename(columns={feature: 'value'})
    ts = TimeSeriesData(df_kpi, time_col_name=time_col)
    if detector_type == 'fft':
        detector = FFTDetector(ts)
    elif detector_type == 'acf':
        detector = ACFDetector(ts)
    else:
        raise Exception("Detector types are fft or acf")
    detection = detector.detector()
    seasonality_presence = detection['seasonality_presence']
    return seasonality_presence
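For example, I call it on one of my series like this (kpi_of_interest and the 'time' column name come from my data, so treat them as placeholders):
# df is indexed by timestamp; reset_index() inside the function exposes it as 'time'
presence = detect_seasonality(df, kpi_of_interest, 'time', 'fft')
print(presence)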
This approach returned "False" for seasonality presence with both the fft and the acf detector.
Another approach uses the FFT directly:
import numpy as np
import scipy.signal
from matplotlib import pyplot as plt
L = np.array(df[kpi_of_interest].values)
L -= np.mean(L)
# Window signal
L *= scipy.signal.windows.hann(len(L))
fft = np.fft.rfft(L, norm="ortho")
plt.figure()
plt.plot(abs(fft))
But here we don't see any clear way to determine the daily periodicity I expected.
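For reference, this is the frequency axis I would expect to read a daily peak from (a sketch assuming the 15-minute sampling interval):
# Sketch: physical frequency axis for the windowed rfft above
dt = 15 * 60                           # sample spacing in seconds
freqs = np.fft.rfftfreq(len(L), d=dt)  # frequencies in Hz
daily = 1.0 / (24 * 3600)              # a daily cycle is ~1.16e-5 Hz
plt.plot(freqs, abs(fft))
plt.axvline(daily, color='r', linestyle='--')
plt.show()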
So, in order to automatically detect the daily periodicity, are there any better methods to apply here? Are there any necessary preprocessing steps I should take beforehand? Or could it simply be a lack of data? I only have around 10 days of data for each time series.
I hope I explain this correctly.
I'm looking for a way to better visualise underwater noise.
I'm not after a full solution (well, maybe I am), but I'm more interested in what the perfect start would be, considering speed is of the essence (so pretty much your opinion on Q1 to Q3).
I'm trying to perform calculations and visualisations of a body of water.
For this I basically want to import the bathymetry (a CSV containing x, y, z) of a substantial area (let's say 50 km x 50 km).
Q1: Should I use a pandas DataFrame or a NumPy array?
Q2: Do you envision this as a mesh, where the column names are x, the row names are y, and the elevations (z) are the fields?
Since z can be positive or negative, the landmass starts where z > 0, which varies with the tide. I want to be able to raise or lower the low and high tide on the fly.
The actual seafloor bottom is also important, depending on the surface, salinity, water temperature per metre, etc.
Q3: Is this where I should go 3D (in a mesh)?
For now I was just focusing on importing the bathymetry and visualising what I imported graphically (and failing a bit).
So far my code looks like below, sorry about the lack of comments.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
import pandas as pd
import tkinter as tk
from tkinter import filedialog
import scipy
from scipy import interpolate
# min_? is minimum bound, max_? is maximum bound,
# dim_? is the granularity in that direction
# Hide the blank root window; we only want the file picker
root = tk.Tk()
root.withdraw()
filename = filedialog.askopenfilename()
df = pd.read_csv(filename, delimiter=',', names=["X", "Y", "Z"])
df = df.sort_values(by=["X"])  # sort_values returns a copy, so reassign
mat = df.to_numpy()
min_x, max_x = df['X'].min(), df['X'].max()
min_y, max_y = df['Y'].min(), df['Y'].max()
min_z, max_z = df['Z'].min(), df['Z'].max()
dim_x = df['X'].count()
dim_y = df['Y'].count()
x = np.linspace(min_x, max_x, dim_x)
y = np.linspace(min_y, max_y, dim_y)
X, Y = np.meshgrid(x, y)
# Interpolate (x,y,z) points [mat] over a normal (x,y) grid [X,Y]
# Depending on your "error", you may be able to use other methods
Z = scipy.interpolate.griddata((mat[:,0], mat[:,1]), mat[:,2], (X,Y),method='linear')
plt.pcolormesh(X,Y,Z)
plt.show()
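For the tide part, what I have in mind is masking the grid against an adjustable water level, roughly like this (just a sketch; tide_level is a placeholder I would eventually tie to a slider):
# Sketch: hide the landmass for a given (hypothetical) tide level
tide_level = 0.5  # metres relative to datum; placeholder value
water = np.ma.masked_where(Z > tide_level, Z)
plt.pcolormesh(X, Y, water)
plt.colorbar(label='z (depth/elevation)')
plt.show()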
For my evaluation, I wanted to use the pykalman library. I have created a very small time series dataset with three columns, formatted as follows. The full dataset is linked here for reproducibility, since I can't attach a file on Stack Overflow:
http://www.mediafire.com/file/el1tkrdun0j2dk4/testdata.csv/file
time X Y
0.040662 1.041667 1
0.139757 1.760417 2
0.144357 1.190104 1
0.145341 1.047526 1
0.145401 1.011882 1
0.148465 1.002970 1
.... ..... .
I have read the pykalman documentation and managed to do simple linear filtering with a Kalman filter; here is my code:
import matplotlib.pyplot as plt
from pykalman import KalmanFilter
import numpy as np
import pandas as pd
df = pd.read_csv('testdata.csv')
print(df)
df = df.replace([np.inf, -np.inf], np.nan)  # treat infinities as missing
df.dropna(inplace=True)
X = df.drop('Y', axis=1)
y = df['Y']
estimated_value= np.array(X)
real_value = np.array(y)
measurements = np.asarray(estimated_value)
kf = KalmanFilter(n_dim_obs=1, n_dim_state=1,
                  transition_matrices=[1],
                  observation_matrices=[1],
                  initial_state_mean=measurements[0, 1],
                  initial_state_covariance=1,
                  observation_covariance=5,
                  transition_covariance=1)
state_means, state_covariances = kf.filter(measurements[:,1])
state_std = np.sqrt(state_covariances[:,0])
print (state_std)
print (state_means)
print (state_covariances)
fig, ax = plt.subplots()
ax.margins(x=0, y=0.05)
plt.plot(measurements[:,0], measurements[:,1], '-r', label='Real Value Input')
plt.plot(measurements[:,0], state_means, '-b', label='Kalman-Filter')
plt.legend(loc='best')
ax.set_xlabel("Time")
ax.set_ylabel("Value")
plt.show()
Which gives the following plot as an output
As we can see from the plot and my dataset, my input is non-linear. Therefore, I wanted to use a Kalman filter to see if I can detect and track the drops in the filtered signal (blue in the plot above). But since I am so new to Kalman filters, I'm having a hard time understanding the mathematical formulation and getting started with the Unscented Kalman Filter. I found a good example of basic PyKalman UKF usage, but it doesn't show how to define the percentage of a drop (from the peaks). I would therefore appreciate any help that at least detects how big the drop from the peaks of the filtered signal is (for example, a drop of 50% or 80% from the previous peak of the blue line in the plot). Any help would be appreciated.
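To be concrete about what I mean, something along these lines would suffice (a rough sketch on the filtered means, with a hypothetical 50% threshold):
from scipy.signal import find_peaks
# Sketch: quantify drops between each peak and the following trough
means = state_means[:, 0]
peaks, _ = find_peaks(means)     # local maxima of the filtered signal
troughs, _ = find_peaks(-means)  # local minima
for p in peaks:
    later = troughs[troughs > p]
    if later.size:
        t = later[0]             # first trough after this peak
        drop = (means[p] - means[t]) / means[p]
        if drop >= 0.5:          # hypothetical 50% threshold
            print(f"{drop:.0%} drop after the peak at index {p}")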
I used np.fft.fft(data), plotted the result, and was expecting to see the frequency I put into the data.
I was expecting to see 50 Hz, but I got something strange.
import numpy as np
import math as m
import matplotlib.pyplot as plt
data = []
for x in range(1000):
    data.append(m.sin(2*m.pi*50*0.001*x))  # 50 Hz sine sampled at 1 kHz (dt = 1 ms)
plt.plot(np.fft.fft(data)/len(data))
plt.show()
What should I do to see 50 Hz as result?
Thank you very much
You need to specify the x axis in your plot.
First, create the data:
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0, 1, 1000)
data = np.sin(2*np.pi*50*t)
Now, get the frequencies:
f = np.fft.fftfreq(len(data), t[1]-t[0]) # length of data, and dt
And plot the magnitude of the fft vs frequencies:
data_fft = np.abs(np.fft.fft(data)) / len(data)
plt.plot(f, data_fft)
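Note that with fftfreq the 50 Hz component shows up twice, at +50 Hz and -50 Hz, because the FFT of a real signal is symmetric; if that bothers you, keep just the non-negative half, e.g. plt.plot(f[:len(f)//2], data_fft[:len(f)//2]).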
This is really a question for the DSP stack exchange (https://dsp.stackexchange.com/).
You are doing two things that are causing the odd result:
You are performing a complex to complex FFT on real data, so you will have your signal mirrored about the Nyquist frequency (Hermitian symmetry).
You are dividing and plotting the complex output, not the Fourier amplitudes or powers. (Matplotlib doesn't "get" complex numbers, so this comes out looking like garbage.)
try this instead:
plt.plot(abs(np.fft.rfft(data))/(len(data)/2))
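For completeness, a small self-contained version that also labels the frequency axis (assuming the 1 kHz sampling rate implied by the question's dt of 0.001 s):
import numpy as np
import matplotlib.pyplot as plt
fs = 1000.0                    # sampling rate: 1000 samples over 1 s
t = np.arange(1000) / fs
data = np.sin(2*np.pi*50*t)
freqs = np.fft.rfftfreq(len(data), d=1/fs)
amps = np.abs(np.fft.rfft(data)) / (len(data)/2)
plt.plot(freqs, amps)          # a single spike at 50 Hz
plt.xlabel("Frequency (Hz)")
plt.ylabel("Amplitude")
plt.show()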