I have a 2D numpy array containing X and Y data. The X axis holds time information with nanosecond resolution. My problem arises because I need to compare a simulated signal with a real signal. The issue with the simulated signal is that the simulator, for optimization purposes, uses different step sizes, as shown in fig. 1.
On the other hand, my real data was acquired by an oscilloscope, and its samples are spaced exactly 1 ns apart. Because of this I need both signals on the same X-axis scale to make a correct comparison. How can I generate the extra points so that my data has a constant step between points?
EDIT 1
I need these new points to fill my array so that the simulated data has a constant step, as shown in fig 2.
The green points show an example of data obtained by extrapolation.
A common way to do this is to simply duplicate some points (adding a point with the same average value doesn't change most statistical measures much).
The drawback is that you have to regenerate the dataset every time you change the scale, which takes time, but it is very easy. If you don't have to change the scale often, it's worth a try.
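A rough sketch of that duplication idea, as a zero-order hold onto a uniform 1 ns grid (the sample arrays here are made up for illustration):

```python
import numpy as np

# Made-up non-uniform simulated samples (times in ns)
t = np.array([0.0, 1.0, 3.0, 7.0, 8.0])
v = np.array([0.1, 0.2, 0.4, 0.8, 0.9])

# Uniform 1 ns grid; each new point duplicates the most recent
# recorded value (zero-order hold)
t_new = np.arange(0, 9, 1.0)
idx = np.searchsorted(t, t_new, side='right') - 1
v_new = v[idx]
```

Every target time simply takes the value of the last simulated sample at or before it, so gaps are filled with repeated values.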
This problem was solved using the scipy.interpolate module, e.g.:
interpolate.py
import matplotlib.pyplot as plt
from scipy import interpolate as inter
import numpy as np
Fs = 0.1
f = 0.01
sample = 10
x = np.arange(sample)
y = np.sin(2 * np.pi * f * x / Fs)
# Build an interpolator from the original samples, then evaluate it
# on a finer, uniformly spaced grid
inte = inter.interp1d(x, y)
new_x = np.arange(0, 9, 0.1)
new_y = inte(new_x)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(new_x,new_y,s=5,marker='.')
ax1.scatter(x,y,s=50,marker='*')
plt.show()
This code gives the following result.
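If a smoother curve between samples is wanted, interp1d also accepts a kind argument; a cubic variant of the example above (same made-up sine data) might look like:

```python
import numpy as np
from scipy import interpolate as inter

Fs = 0.1
f = 0.01
sample = 10
x = np.arange(sample)
y = np.sin(2 * np.pi * f * x / Fs)

# kind='cubic' fits a cubic spline through the samples instead of
# joining them with straight lines
inte = inter.interp1d(x, y, kind='cubic')
new_x = np.arange(0, 9, 0.1)
new_y = inte(new_x)
```

The cubic interpolant still passes through every original sample; only the behaviour between samples changes.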
Related
So basically I have some data that I need to smooth out (so that the line produced from it is smooth rather than jittery). Plotted, the data currently looks like this:
and what I want it to look is like this:
I tried using this numpy method to get the equation of the line, but it did not work for me because the graph repeats (there are multiple readings, so the graph rises, saturates, falls, and then repeats several times), so no single equation can represent it.
I also tried this but it did not work for the same reason as above.
The graph is defined as such:
gx = [] #x is already taken so gx -> graphx
gy = [] #same as above
#Put in data
#Get nice data #[this is what I need help with]
#Plot nice data and original data
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
The method I think would be most applicable is averaging every 2 points and assigning that average to both, but this idea doesn't sit right with me - potentially useful values may be lost.
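For reference, that pairwise-averaging idea is a moving average with window 2, which can be sketched with NumPy's convolve (the data here is made up):

```python
import numpy as np

gy = np.array([1.0, 3.0, 2.0, 6.0, 4.0, 8.0])  # made-up jittery data

window = 2
kernel = np.ones(window) / window
# mode='valid' avoids edge artifacts; output has len(gy) - window + 1 points
smoothed = np.convolve(gy, kernel, mode='valid')
```

Increasing the window smooths more aggressively at the cost of flattening genuine features.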
You could use an infinite-horizon filter (an exponential moving average):
import numpy as np
import matplotlib.pyplot as plt
x = 0.85 # adjust x to use more or less of the previous value
k = np.sin(np.linspace(0.5,1.5,100))+np.random.normal(0,0.05,100)
filtered = np.zeros_like(k)
#filtered = newvalue*x+oldvalue*(1-x)
filtered[0]=k[0]
for i in range(1,len(k)):
    # uses x% of the previous filtered value and (1-x)% of the new value
    filtered[i] = filtered[i-1]*x + k[i]*(1-x)
plt.plot(k)
plt.plot(filtered)
plt.show()
I figured it out: by averaging 4 results I was able to significantly smooth out the graph. Here is a demonstration:
Hope this helps whoever needs it.
I am trying to match the two graphs drawn below as closely as possible by shifting one of them toward the other in Python.
Figure
The two graphs cover different ranges in x, and they are drawn from two array datasets.
What I am thinking now is to shift one of them iteratively, one step at a time, until the difference between the two datasets (or graphs) is minimized.
Yet I have no idea how to start.
What I'd do is try to shift one of the sets such that the root mean square of the difference between it and the other one is minimised. You could also narrow the criterion down to a region of interest in the data (I'm guessing around the peak). To compute the RMS error, you'll need to interpolate the data onto the same x-values. Here's an example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
# Create data
x0 = np.linspace(0, 2.*np.pi, 101)
y0 = np.sin(x0)
x1 = np.linspace(0, 2.*np.pi, 201)
y1 = np.sin(x1+0.1*np.pi)
def target(x):
    # Interpolate set 1 onto the grid of set 0 while shifting it by x.
    y1interp = np.interp(x0, x1+x, y1)
    # Compute the RMS error between the two sets with set 1 shifted by x.
    return np.sqrt(np.mean((y0-y1interp)**2.))
result = minimize(target, method="BFGS", x0=[0.])#, bounds=[(-0.2, 0.2)]) # bounds work with some methods only
print(result)
plt.figure()
plt.plot(x0, y0, "r", x1, y1, "b")
plt.plot(x1+result.x, y1, "k", lw=2)
plt.legend(["set 0", "set 1", "set 1 shifted"])
Result:
Note that scipy.optimize.minimize is quite sensitive to the settings so you'll need to play with them to make them better suited to tackle your problem:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
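As an aside (not part of the answer above): when both signals are sampled on the same uniform grid, the shift can also be estimated without an optimizer, via cross-correlation. A sketch with a made-up pulse signal:

```python
import numpy as np

dx = 0.01
x = np.arange(0.0, 10.0, dx)
shift_true = 0.25

# A localized pulse and a delayed copy of it
a = np.exp(-((x - 4.0) ** 2) / 0.1)
b = np.exp(-((x - 4.0 - shift_true) ** 2) / 0.1)

# The peak of the full cross-correlation marks the lag (in samples)
# at which b best aligns with a
corr = np.correlate(a, b, mode='full')
lag = (len(b) - 1) - np.argmax(corr)  # number of samples b lags behind a
shift_est = lag * dx
```

The peak location is only resolved to the nearest sample, so the optimizer approach is still preferable when sub-sample precision matters or when the grids differ.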
I have a bunch of data containing coordinate intervals within one large region, which I want to plot and then turn into a density plot showing where in the region there are more interval lines than elsewhere.
As a very basic example I've just plotted some horizontal lines for given intervals. I can't really find any good examples of how to plot intervals better. I've looked into seaborn, but I'm not entirely sure about it. So here I've just created a basic example of what I am trying to do.
import numpy as np
import matplotlib.pyplot as plt
x1 = np.linspace(1, 30,100)
x2 = np.linspace(10,40,100)
x3 = np.linspace(2,50,100)
x4 = np.linspace(40,60,100)
x5 = np.linspace(30,78,100)
x6 = np.linspace(82,99,100)
x7 = np.linspace(66,85,100)
x = [x1,x2,x3,x4,x5,x6,x7]
y = np.linspace(1,len(x),len(x))
fig, ax = plt.subplots()
for i in range(len(x)):
    ax.hlines(y[i], xmin=x[i][0], xmax=x[i][-1], linewidth=1)
plt.xlim(-5,105)
plt.show()
And then I would like to create a density plot of the number of overlapping intervals. Does anyone have suggestions on how to proceed with this?
Thanks for your help and suggestions.
This seems to do what you want:
def count(xi):
    samples = np.linspace(0, 100, 101)
    return (xi[0] < samples) & (samples <= xi[-1])
is_in_range = np.apply_along_axis(count, arr=x, axis=1)
density = np.sum(is_in_range, axis=0)
The general idea is to make some output linspace, then check to see if those coordinates are in the ranges in the array x — that's what the function count does. Then apply_along_axis runs this function on every row (i.e. every 1D array) in your array x.
Here's what I get when I plot density:
You might want to adjust the <= and < signs in the count function to handle the edges as you like.
If your actual data have a different format, or if there are multiple intervals in one array, you will need to adjust this.
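Putting it together as a self-contained run (the intervals are the ones from the question's linspace calls, reduced to (start, end) pairs, and broadcasting replaces apply_along_axis — same idea):

```python
import numpy as np

# (start, end) pairs matching the question's linspace intervals
intervals = np.array([(1, 30), (10, 40), (2, 50), (40, 60),
                      (30, 78), (82, 99), (66, 85)], dtype=float)

samples = np.linspace(0, 100, 101)
# Broadcasting: row i is True wherever a sample lies inside interval i
is_in_range = (intervals[:, [0]] < samples) & (samples <= intervals[:, [1]])
density = is_in_range.sum(axis=0)  # overlapping intervals at each x
```

Plotting density against samples then gives the overlap profile directly.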
My code structure for an equation I am working on goes like this:
import matplotlib.pyplot as plt
P = []
Q = []
for x in range(0, 20):
    temp = x * (-1*10**(-9)*(2.73**(x/0.0258*0.8) - 1) + 3.1)
    P.append(temp)
    Q.append(x)
    print(temp)
plt.plot(Q,P)
plt.show()
Printing temp gives me this
4.759377049180328889121938118
-33447.32349862001706705983714
-2238083697441414267.104517188
-1.123028419942448387512537968E+32
-5.009018636753031534804021565E+45
-2.094526332030486492065138064E+59
-8.407952213322881981287736804E+72
-3.281407666305436036872349205E+86
-1.254513385166959745710275399E+100
-4.721184644539803475363811828E+113
-1.754816222227633792004755288E+127
-6.457248346728221564046430946E+140
-2.356455347384037854507854340E+154
-8.539736787129928434375037129E+167
-3.076467506425168063232368199E+181
-1.102652635599075169095479067E+195
-3.934509583907661118429424988E+208
-1.398436369682635574296418585E+222
-4.953240988408539700713401539E+235
-1.749015740500628326472633516E+249
The results shown are highly inaccurate. I know this because the graph obtained is not what I am supposed to get. Quickly plotting the same equation in Google gave me this:
This picture shows the differences between the graphs.
The actual plot is the google.com one.
I'm fairly certain the errors are due to the floating-point calculations. Can someone help me correct the formulated equation?
Beginning at around x = 0.7 your values drop into nothingness. Google is clever enough to figure that out and limits the y-axis to a reasonable scale. In matplotlib you have to set this scale manually.
Also note that you are plotting integers from 0 to 19. When plotting continuous functions, linearly spaced points in an interval often make more sense.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 0.8, 100)
y = x * (-1e-9 *(2.73**(x/0.0258*0.8) - 1) + 3.1)
plt.plot(x,y)
plt.ylim(-0.5, 2)
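As an aside, and only a guess about the intent: 2.73 looks like a rounded Euler's number, so if the formula is meant to be exponential, np.exp is both clearer and more accurate:

```python
import numpy as np

x = np.linspace(0, 0.8, 100)
# np.exp(t) computes e**t; using it instead of 2.73**t removes the
# error introduced by approximating e as 2.73
y = x * (-1e-9 * (np.exp(x / 0.0258 * 0.8) - 1) + 3.1)
```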
I need to (numerically) calculate the first and second derivatives of a function, and to do so I've attempted to use both splrep and UnivariateSpline to create splines for interpolating the function before taking the derivatives.
However, there seems to be an inherent problem in the spline representation itself for functions whose magnitude is of order 10^-1 or lower and which are (rapidly) oscillating.
As an example, consider the following code to create a spline representation of the sine function over the interval (0,6*pi) (so the function oscillates three times only):
import scipy
from scipy import interpolate
import numpy
from numpy import linspace
import math
from math import sin, pi
k = linspace(0, 6.*pi, num=10000) #interval (0,6*pi) in 10'000 steps
y=[]
A = 1.e0 # Amplitude of sine function
for i in range(len(k)):
    y.append(A*sin(k[i]))
tck = interpolate.UnivariateSpline(x, y, w=None, bbox=[None, None], k=5, s=2)
M=tck(k)
Below are the results for M for A = 1.e0 and A = 1.e-2
http://i.imgur.com/uEIxq.png Amplitude = 1
http://i.imgur.com/zFfK0.png Amplitude = 1/100
Clearly the interpolated function created by the splines is totally incorrect! The second graph does not even oscillate at the correct frequency.
Does anyone have any insight into this problem? Or know of another way to create splines within numpy/scipy?
Cheers,
Rory
I'm guessing that your problem is due to aliasing.
What is x in your example?
If the x values that you're interpolating at are less closely spaced than your original points, you'll inherently lose frequency information. This is completely independent from any type of interpolation. It's inherent in downsampling.
Never mind the above bit about aliasing. It doesn't apply in this case (though I still have no idea what x is in your example...).
I just realized that you're evaluating your points at the original input points when you're using a non-zero smoothing factor (s).
By definition, smoothing won't fit the data exactly. Try putting s=0 in instead.
As a quick example:
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
x = np.linspace(0, 6.*np.pi, num=100) # interval (0, 6*pi) in 100 steps
A = 1.e-4 # Amplitude of sine function
y = A*np.sin(x)
fig, axes = plt.subplots(nrows=2)
for ax, s, title in zip(axes, [2, 0], ['With', 'Without']):
    yinterp = interpolate.UnivariateSpline(x, y, s=s)(x)
    ax.plot(x, yinterp, label='Interpolated')
    ax.plot(x, y, 'bo', label='Original')
    ax.legend()
    ax.set_title(title + ' Smoothing')
plt.show()
The reason that you're only clearly seeing the effects of smoothing with a low amplitude is due to the way the smoothing factor is defined. See the documentation for scipy.interpolate.UnivariateSpline for more details.
Even with a higher amplitude, the interpolated data won't match the original data if you use smoothing.
For example, if we just change the amplitude (A) to 1.0 in the code example above, we'll still see the effects of smoothing...
The problem is in choosing suitable values for the s parameter. Its values depend on the scaling of the data.
Reading the documentation carefully, one can deduce that the parameter should be chosen around s = len(y) * np.var(y), i.e. # of data points * variance. Taking for example s = 0.05 * len(y) * np.var(y) gives a smoothing spline that does not depend on the scaling of the data or the number of data points.
EDIT: sensible values for s of course also depend on the noise level in the data. The docs seem to recommend choosing s in the range (m - sqrt(2*m)) * std**2 <= s <= (m + sqrt(2*m)) * std**2, where std is the standard deviation of the "noise" you want to smooth over.
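A small sketch of that scale-invariant choice (the sine data, noise level, and 6*pi interval are taken from the thread; the helper name is made up):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def spline_error(A, n=1000, seed=0):
    """Max deviation of a smoothing spline from A*sin(x) when s is
    chosen from the noise level, as the docs suggest."""
    x = np.linspace(0, 6 * np.pi, n)
    rng = np.random.default_rng(seed)
    noise_std = 0.05 * A                 # assumed known noise level
    y = A * np.sin(x) + rng.normal(0, noise_std, n)
    s = n * noise_std ** 2               # middle of the recommended range
    spl = UnivariateSpline(x, y, s=s)
    return np.max(np.abs(spl(x) - A * np.sin(x)))
```

With this choice the fit quality is comparable at amplitude 1 and amplitude 1e-4, which is exactly the scale-independence described above.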