I need to plot a few exponential curves on the same plot - with the constraint that the plot ends at y=1.
For reference, here is the code:
from numpy import arange
from matplotlib import pyplot as plt
T = arange(60,89)
curve1 = 2**(T - 74)
curve2 = 2**(T - 60)
plt.plot(T, curve1)
plt.plot(T, curve2)
plt.show()
Here's the result:
One curve is just barely visible, since its values are comparatively so low.
The problem I'm having is that these curves blow up to 700000+ fairly rapidly, but I'm only interested in the range (0, 1). How do I plot just these bits, but with nice smooth curves (so that one curve doesn't just stop halfway along)?
As you've found, this is easy to do if you adjust the range (T) for each function you add. However, if you need to change the functions, you'll need to recheck it.
The problem you're dealing with, generically, is calculating the x-range of some functions given their y-range - or, as a mathematician might put it, determining the part of a function's domain that corresponds to a given range of its image. While this is impossible for an arbitrary function, it is possible when the function is injective, as is the case here.
Let's say we have a function y = f(x) and the y-range is [y1, y2]. The x-range will then be [f^(-1)(y1), f^(-1)(y2)] (f^(-1) being the inverse function of f), assuming f is increasing.
Since we have multiple functions to plot, the overall x-range is the union of the individual ranges: its lower bound is the minimum of the individual lower bounds, and its upper bound is the maximum of the individual upper bounds.
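For example, with f(T) = 2**(T - 74) and a y-range of (0.001, 1), the inverse is f^(-1)(y) = 74 + log2(y), so this curve's x-range is (74 + log2(0.001), 74 + log2(1)) ≈ (64.03, 74). Doing the same for 2**(T - 60) gives ≈ (50.03, 60), and combining the two as described yields (50.03, 74).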
Here's some code that exemplifies all this, by taking as a parameter the number of steps, and calculating the proper T over the x-range:
from numpy import arange
from matplotlib import pyplot as plt
from sympy import sympify, solve
f1= '2**(T - 74)' #note these are strings
f2= '2**(T - 60)'
y_bounds= (0.001, 1) #exponential functions never take 0 value, so we use 0.001
mm= (min, max)
x_bounds= [m(solve(sympify(f+"-"+str(y)))[0] for f in (f1,f2)) for y,m in zip(y_bounds, mm)]
print(x_bounds)
N_STEPS=100 #distributed over x_bounds
T = arange(x_bounds[0], x_bounds[1]+0.001, (x_bounds[1]-x_bounds[0])/N_STEPS)
curve1 = eval(f1) #this evaluates the function over the range, by evaluating the string as a python expression
curve2 = eval(f2)
plt.plot(T,curve1)
plt.plot(T,curve2)
plt.ylim([0,1])
plt.show()
The code outputs both the x-range (50.03, 74) and this plot:
This one was really easy: sorry to have wasted your time on it.
Just set the y limits to be [0,1] via
plt.ylim([0,1])
and you're done.
In addition to the other answers, it may be worth plotting on a log-scale, since the growth of your functions is essentially exponential. For example:
from numpy import arange
from matplotlib import pyplot as plt
T = arange(60,89)
curve1 = 2**(T - 74)
curve2 = 2**(T - 60)
plt.semilogy(T, curve1)
plt.semilogy(T, curve2)
plt.show()
I am trying to interpolate a cumulated distribution of e.g. i) number of people against ii) number of owned cars, showing that e.g. the top 20% of people own much more than 20% of all cars - of course, 100% of people own 100% of cars. I also know that there are e.g. 100mn people and 200mn cars.
Now coming to my code:
#import libraries (more than required here)
import pandas as pd
from scipy import interpolate
from scipy.interpolate import interp1d
from sympy import symbols, solve, Eq
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.express as px
curve=pd.read_excel('inputs.xlsx',sheet_name='inputdata')
Input data: Curveplot (cumulated people (x) on the left // cumulated cars (y) on the right)
#Input data in list form (I am not sure how to interpolate from a list for the moment)
cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]
cumulatedcars= [0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1]
x, y = points[:,0], points[:,1]
interpolation = interp1d(x, y, kind = 'cubic')
number_of_people_mn= 100000000
oneperson = 1 / number_of_people_mn
dataset = pd.DataFrame(range(number_of_people_mn + 1))
dataset.columns = ["nr_of_one_person"]
dataset.drop(dataset.index[:1], inplace=True)
#calculating the position of every single person on the cumulated x-axis (between 0 and 1)
dataset["cumulatedpeople"] = dataset["nr_of_one_person"] / number_of_people_mn
#finding the "cumulatedcars" to the "cumulatedpeople" via interpolation (between 0 and 1)
dataset["cumulatedcars"] = interpolation(dataset["cumulatedpeople"])
plt.plot(dataset["cumulatedpeople"], dataset["cumulatedcars"])
plt.legend(['Cubic interpolation'], loc = 'best')
plt.xlabel('Cumulated people')
plt.ylabel('Cumulated cars')
plt.title("People-to-car cumulated curve")
plt.show()
However, when looking at the actual plot, I get the following result, which is wrong: Cubic interpolation
In fact, the curve should look almost like the one from a linear interpolation with the exact same input data - however, that is not accurate enough for my purpose: Linear interpolation
Is there any relevant step I am missing, or what would be the best way to get an accurate interpolation from the inputs that looks almost like the linear interpolation?
Short answer: your code is doing the right thing, but the data is unsuitable for cubic interpolation.
Let me explain. Here is your code, which I simplified for clarity:
import numpy as np
from scipy.interpolate import interp1d
from matplotlib import pyplot as plt
cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]
cumulatedcars= [0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1]
interpolation = interp1d(cumulatedpeople, cumulatedcars, kind = 'cubic')
number_of_people_mn= 100#000000
cumppl = np.arange(number_of_people_mn + 1)/number_of_people_mn
cumcars = interpolation(cumppl)
plt.plot(cumppl, cumcars)
plt.plot(cumulatedpeople, cumulatedcars,'o')
plt.show()
Note the last couple of lines -- I am plotting, on the same graph, both the interpolated results and the input data. Here is the result:
The orange dots are the original data, and the blue line is the cubic interpolation. The interpolator passes through all the points, so technically it is doing the right thing.
Clearly, though, it is not doing what you want.
The reason for this strange behavior lies mostly at the right end, where you have a few x-points that are very close together -- the interpolator produces massive wiggles trying to fit such closely spaced points.
If I remove two right-most points from the interpolator:
interpolation = interp1d(cumulatedpeople[:-2], cumulatedcars[:-2], kind = 'cubic')
it looks a bit more reasonable:
But one could still argue that linear interpolation is better. The wiggles are now at the left end, because the gaps between the initial x-points there are too large.
The moral here is that cubic interpolation should really only be used if the gaps between the x-points are roughly equal.
Your best bet here, I think, is to use something like curve_fit.
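For illustration, here is a minimal sketch of the curve_fit route; the one-parameter power-law model below is purely an assumption on my part (chosen because it is 0 at x=0, 1 at x=1 and monotone), not something your data dictates:
import numpy as np
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt
cumulatedpeople = np.array([0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1])
cumulatedcars = np.array([0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1])
def model(x, a):
    # assumed Lorenz-curve-like shape: 0 at x=0, 1 at x=1, monotone for a > 0
    return x**a
params, _ = curve_fit(model, cumulatedpeople, cumulatedcars, p0=[3.0])
xx = np.linspace(0, 1, 200)
plt.plot(xx, model(xx, *params), label='fitted power law')
plt.plot(cumulatedpeople, cumulatedcars, 'o', label='data')
plt.legend()
plt.show()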
A related discussion can be found here.
Specifically, monotone interpolation, as explained here, yields good results on your data. Copying the relevant bits, you would replace the interpolator with
from scipy.interpolate import pchip
interpolation = pchip(cumulatedpeople, cumulatedcars)
and get a decent-looking fit:
I am using the scipy peakfinder scipy.signal.find_peaks_cwt to find peaks in a signal. All peaks are found reliably, but I always get additional results (so far all of them were at the end of the signal) that are not peaks. I am wondering why this happens...
Here is a complete example with synthetic data:
from scipy.signal import find_peaks_cwt
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline # for jupyter notebooks
x = np.arange(0, 15, 0.1)
y = np.sin(x)
plt.plot(y)
peakinds = find_peaks_cwt(y, np.arange(1, 5))
plt.plot(peakinds, y[peakinds], 'o')
(To run in a normal Python shell, comment out the %matplotlib inline line and add plt.show() at the end.)
Plot with peaks marked as dots:
(The last three dots should not be there)
With my real data the same thing happens:
(Here the last dot is wrong)
Why is this happening?
Your widths parameter in find_peaks_cwt is the problem.
from scipy.signal import find_peaks_cwt
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 15, 0.1)
y = np.sin(x)
fig0 = plt.figure()
ax0 = fig0.add_subplot(111)
ax0.plot(y)
peakinds = find_peaks_cwt(y, np.arange(1, 10)) # Here changed 5 to 10
ax0.plot(peakinds, y[peakinds], 'o')
plt.axis([0, 160, -1.1, 1.1])
From the documentation:
widths : sequence
1-D array of widths to use for calculating the CWT matrix. In general, this range should cover the expected width of peaks of interest.
Edit:
The default wavelet used is the Ricker wavelet. Basically, a convolution is performed between the signal and the wavelet at all specified widths (by calling ricker(width[i])). Therefore, the range you give has to go from small (to precisely localize the peak) to big enough (to detect the peaks of interest), but not too big (to avoid aliasing - remember that wavelets work in the frequency domain).
Excerpt of the documentation: The algorithm is as follows: 1 - Perform a continuous wavelet transform on vector, for the supplied widths. This is a convolution of vector with wavelet(width) for each width in widths.
If you change widths to np.arange(10, 20), you will notice that the peaks are detected but their maxima are not well localized (we are missing the fine scales). If you try again with np.arange(1, 20), the peaks are better localized.
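A quick sketch to see this for yourself on the synthetic sine signal from the question (it just prints where the detected peaks land for each width range):
import numpy as np
from scipy.signal import find_peaks_cwt
x = np.arange(0, 15, 0.1)
y = np.sin(x)
for widths in (np.arange(10, 20), np.arange(1, 20)):
    peakinds = find_peaks_cwt(y, widths)
    print(widths[0], widths[-1], peakinds)  # compare how well the maxima are localized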
Also, if you want to visualize the ricker wavelet:
from scipy.signal import ricker
vec = ricker(100, 10) # (number of points, width parameter a)
fig0 = plt.figure()
ax0 = fig0.add_subplot(111)
ax0.plot(vec)
Edit 2:
As for the extra peaks erroneously detected at the end of the signal, this is most probably due to a border effect. Basically, the window for the convolution goes beyond the last sample of the signal. Usually, padding (zero-padding, signal wrapping, ...) is applied to the signal, but depending on how it is done (or not done at all), this kind of error can occur. Discarding the first few and last few points is generally appropriate when working with these types of methods.
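A small sketch of that last idea; the margin value is an assumption (something on the order of the largest width passed to find_peaks_cwt), not a prescribed constant:
import numpy as np
def drop_border_peaks(peakinds, signal_length, margin):
    # keep only peak indices at least `margin` samples away from either end
    peakinds = np.asarray(peakinds)
    return peakinds[(peakinds >= margin) & (peakinds < signal_length - margin)]
# e.g. with the example from the question: drop_border_peaks(peakinds, len(y), margin=10)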
My code for an equation I am working on goes like this.
import matplotlib.pyplot as plt
P = []
Q = []
for x in range(0, 20):
    temp = x * (-1*10**(-9)*(2.73**(x/0.0258*0.8) - 1) + 3.1)
    P.append(temp)
    Q.append(x)
    print(temp)
plt.plot(Q, P)
plt.show()
Printing temp gives me this
4.759377049180328889121938118
-33447.32349862001706705983714
-2238083697441414267.104517188
-1.123028419942448387512537968E+32
-5.009018636753031534804021565E+45
-2.094526332030486492065138064E+59
-8.407952213322881981287736804E+72
-3.281407666305436036872349205E+86
-1.254513385166959745710275399E+100
-4.721184644539803475363811828E+113
-1.754816222227633792004755288E+127
-6.457248346728221564046430946E+140
-2.356455347384037854507854340E+154
-8.539736787129928434375037129E+167
-3.076467506425168063232368199E+181
-1.102652635599075169095479067E+195
-3.934509583907661118429424988E+208
-1.398436369682635574296418585E+222
-4.953240988408539700713401539E+235
-1.749015740500628326472633516E+249
The results shown are highly inaccurate. I know this because the graph obtained is not what I am supposed to get. A quick plot of the same equation in Google gave me this:
This pic shows the differences between the graphs.
The actual plot is the google.com one.
I'm fairly certain that the errors are due to the floating point calculations. Can someone help me correct the formulated equations?
Beginning from around x = 0.7, your values plunge to huge negative numbers. Google is clever enough to figure that out and limits the y-axis to a reasonable scale; in matplotlib you have to set this scale manually.
Also note that you are plotting integers from 0 to 19. When plotting continuous functions, linearly spaced points in an interval often make more sense.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 0.8, 100)
y = x * (-1e-9 *(2.73**(x/0.0258*0.8) - 1) + 3.1)
plt.plot(x,y)
plt.ylim(-0.5, 2)
What I'm trying to do is calculate the period from a list of x-y points that has a periodic pattern. With my limited mathematics knowledge, I know that the Fourier transform can do this sort of thing.
I'm writing Python code.
I found a related answer here, but it uses an evenly-distributed x axis, i.e. dt is fixed, which isn't the case for me. Since I don't really understand the math behind it, I'm not sure if it would work properly in my code.
My question is, does it work? Or, is there some method in numpy that already does my work? Or, how can I do it?
EDIT: All values are Pythonic float (i.e. double-precision)
For samples that are not evenly spaced, you can use scipy.signal.lombscargle to compute the Lomb-Scargle periodogram. Here's an example, with a signal whose dominant frequency is 2.5 rad/s.
from __future__ import division
import numpy as np
from scipy.signal import lombscargle
import matplotlib.pyplot as plt
np.random.seed(12345)
n = 100
x = np.sort(10*np.random.rand(n))
# Dominant periodic signal
y = np.sin(2.5*x)
# Add some smaller periodic components
y += 0.15*np.cos(0.75*x) + 0.2*np.sin(4*x+.1)
# Add some noise
y += 0.2*np.random.randn(x.size)
plt.figure(1)
plt.plot(x, y, 'b')
plt.xlabel('x')
plt.ylabel('y')
plt.grid()
dxmin = np.diff(x).min()
duration = x.ptp()
freqs = np.linspace(1/duration, n/duration, 5*n)
periodogram = lombscargle(x, y, freqs)
kmax = periodogram.argmax()
print("%8.3f" % (freqs[kmax],))
plt.figure(2)
plt.plot(freqs, np.sqrt(4*periodogram/(5*n)))
plt.xlabel('Frequency (rad/s)')
plt.grid()
plt.axvline(freqs[kmax], color='r', alpha=0.25)
plt.show()
The script prints 2.497 and generates the following plots:
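Since the question asks for the period rather than the angular frequency, a short follow-up to the script above converts the dominant frequency (in rad/s) to a period:
period = 2 * np.pi / freqs[kmax]  # freqs[kmax] is the dominant angular frequency found above
print(period)  # roughly 2.52 for the 2.5 rad/s signal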
As a starting point:
(I assume all coordinates are positive integers; otherwise map them to a reasonable range like 0..4095)
find max coordinates xMax, yMax in list
make 2D array with dimensions yMax, xMax
fill it with zeros
walk through your list and set the array elements corresponding to each pair of coordinates to 1
make 2D Fourier transform
look for peculiarities (peaks) in FT result
This page from SciPy covers the basics of how the Discrete Fourier Transform works:
http://docs.scipy.org/doc/numpy-1.10.0/reference/routines.fft.html
It also documents the API for computing the DFT. For your case, you should look at how to use fft2.
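To make the steps above concrete, here is a minimal sketch with a made-up set of grid points (the coordinates, spacing, and array size are purely illustrative):
import numpy as np
# hypothetical point list with a spacing of 4 in both x and y
points = [(x, y) for x in range(2, 31, 4) for y in range(2, 31, 4)]
x_max = max(p[0] for p in points)
y_max = max(p[1] for p in points)
# 2D array of zeros with a 1 at every point's coordinates
grid = np.zeros((y_max + 1, x_max + 1))
for x, y in points:
    grid[y, x] = 1
# 2D Fourier transform; ignore the zero-frequency bin and look for peaks
spectrum = np.abs(np.fft.fft2(grid))
spectrum[0, 0] = 0
ky, kx = np.unravel_index(spectrum.argmax(), spectrum.shape)
print(ky, kx)  # a non-zero index k corresponds to a period of roughly (array size / k)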
I need to (numerically) calculate the first and second derivatives of a function. I've attempted to use both splrep and UnivariateSpline to create splines in order to interpolate the function and take the derivatives.
However, there seems to be an inherent problem in the spline representation itself for functions whose magnitude is of order 10^-1 or lower and which oscillate (rapidly).
As an example, consider the following code to create a spline representation of the sine function over the interval (0,6*pi) (so the function oscillates three times only):
import scipy
from scipy import interpolate
import numpy
from numpy import linspace
import math
from math import sin, pi
k = linspace(0, 6.*pi, num=10000) #interval (0,6*pi) in 10'000 steps
y=[]
A = 1.e0 # Amplitude of sine function
for i in range(len(k)):
y.append(A*sin(k[i]))
tck =interpolate.UnivariateSpline(x, y, w=None, bbox=[None, None], k=5, s=2)
M=tck(k)
Below are the results for M for A = 1.e0 and A = 1.e-2
http://i.imgur.com/uEIxq.png Amplitude = 1
http://i.imgur.com/zFfK0.png Amplitude = 1/100
Clearly the interpolated function created by the splines is totally incorrect! The second graph does not even oscillate at the correct frequency.
Does anyone have any insight into this problem? Or know of another way to create splines within numpy/scipy?
Cheers,
Rory
I'm guessing that your problem is due to aliasing.
What is x in your example?
If the x values that you're interpolating at are less closely spaced than your original points, you'll inherently lose frequency information. This is completely independent from any type of interpolation. It's inherent in downsampling.
Never mind the above bit about aliasing. It doesn't apply in this case (though I still have no idea what x is in your example...).
I just realized that you're evaluating the spline at the original input points while using a non-zero smoothing factor (s).
By definition, smoothing won't fit the data exactly. Try putting in s=0 instead.
As a quick example:
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
x = np.linspace(0, 6.*np.pi, num=100) #interval (0,6*pi) in 100 steps
A = 1.e-4 # Amplitude of sine function
y = A*np.sin(x)
fig, axes = plt.subplots(nrows=2)
for ax, s, title in zip(axes, [2, 0], ['With', 'Without']):
    yinterp = interpolate.UnivariateSpline(x, y, s=s)(x)
    ax.plot(x, yinterp, label='Interpolated')
    ax.plot(x, y, 'bo', label='Original')
    ax.legend()
    ax.set_title(title + ' Smoothing')
plt.show()
The reason that you're only clearly seeing the effects of smoothing with a low amplitude is due to the way the smoothing factor is defined. See the documentation for scipy.interpolate.UnivariateSpline for more details.
Even with a higher amplitude, the interpolated data won't match the original data if you use smoothing.
For example, if we just change the amplitude (A) to 1.0 in the code example above, we'll still see the effects of smoothing...
The problem is in choosing suitable values for the s parameter. Its values depend on the scaling of the data.
Reading the documentation carefully, one can deduce that the parameter should be chosen around s = len(y) * np.var(y), i.e. # of data points * variance. Taking for example s = 0.05 * len(y) * np.var(y) gives a smoothing spline that does not depend on the scaling of the data or the number of data points.
EDIT: sensible values for s depend of course also on the noise level in the data. The docs seem to recommend choosing s in the range (m - sqrt(2*m)) * std**2 <= s <= (m + sqrt(2*m)) * std**2 where std is the standard deviation associated with the "noise" you want to smooth over.
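As a quick sketch of that rule of thumb applied to the sine example from the question (the 0.05 factor is just the example value above and would need tuning for noisy data):
import numpy as np
from scipy import interpolate
x = np.linspace(0, 6*np.pi, 10000)
for A in (1.0, 1e-2):
    y = A*np.sin(x)
    s = 0.05 * len(y) * np.var(y)  # scales with the data, so both amplitudes behave alike
    spline = interpolate.UnivariateSpline(x, y, k=5, s=s)
    print(A, np.max(np.abs(spline(x) - y)))  # the deviation scales with the amplitude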