Problem when graphing sine waves in python - python

I've written the following Python program to graph two sine waves of different frequencies and to mark the points where they intersect:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use("ggplot")
fig = plt.figure()
ax = plt.axes()
f1 = float(input("Enter first frequency: "))
f2 = float(input("Enter second frequency: "))
t = np.linspace(0, 10, 1000)
y1 = np.sin(2*np.pi*f1*t)
y2 = np.sin(2*np.pi*f2*t)
plt.plot(t,y1, color = "firebrick", label = "sin({}Hz)".format(f1))
plt.plot(t,y2, color = "teal", label = "sin({}Hz)".format(f2))
plt.axhline(y = 0, color = "grey", linestyle = "dashed", label = "y = 0")
idx = np.argwhere(np.diff(np.sign(y1 - y2))).flatten()
plt.plot(t[idx], y1[idx], 'k.')
plt.legend(loc = "best", frameon=True, fancybox = True,
shadow = True, facecolor = "white")
plt.axis([-0.5, 10.5, -1.5, 1.5])
plt.title("Sine Waves")
plt.xlabel("Time")
plt.ylabel("Amplitude")
plt.show()
Sometimes the output looks just as it's supposed to, for example in this screenshot.
However, at other times I obtain an undesired output, such as in this one.
Could someone demonstrate how to fix this problem? Thank you.

I would suggest increasing your time discretization (using more samples), or simply plotting the waves over a number n_T of periods of the highest or lowest frequency, to avoid undersampling problems. For instance, if you are more interested in the lowest frequency, you could modify your code as follows:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use("ggplot")
fig = plt.figure()
ax = plt.axes()
f1 = float(input("Enter first frequency: "))
f2 = float(input("Enter second frequency: "))
n_T = float(input("Enter number of periods of lowest frequency to display: "))
t_max = n_T/min(f1,f2) # change here max or min if you want highest or lowest frequency to be represented on n_T periods
t = np.linspace(0, t_max, 1000)
y1 = np.sin(2*np.pi*f1*t)
y2 = np.sin(2*np.pi*f2*t)
plt.plot(t,y1, color = "firebrick", label = "sin({}Hz)".format(f1))
plt.plot(t,y2, color = "teal", label = "sin({}Hz)".format(f2))
plt.axhline(y = 0, color = "grey", linestyle = "dashed", label = "y = 0")
idx = np.argwhere(np.diff(np.sign(y1 - y2))).flatten()
plt.plot(t[idx], y1[idx], 'k.')
plt.legend(loc = "best", frameon=True, fancybox = True,
shadow = True, facecolor = "white")
plt.axis([-0.05*t_max, 1.05*t_max, -1.5, 1.5])
plt.title("Sine Waves")
plt.xlabel("Time")
plt.ylabel("Amplitude")
plt.show()
which gives for n_T=3 and f1=200 and f2=400 Hz :
and for your problematic case f1=520 and f2=750 Hz:
BONUS: you can also automatically compute the minimum number n_T of periods needed to display every unique intersection between the two oscillating components. First, convert the user inputs f1 and f2 from floats to integers, then find their lowest common multiple lcm (using the greatest common divisor function gcd from math) and divide it by the highest frequency:
from math import gcd
def lcm(a, b):
    """
    Compute the lowest common multiple of a and b
    """
    return a*b//gcd(a, b)
# gcd only accepts integers, so convert the float inputs first
f1, f2 = int(f1), int(f2)
# minimum number n_T of periods needed to visualize every unique intersection of the waves
n_T = lcm(f1, f2)/max(f1, f2)
For instance, for f1=250 and f2=300 Hz, n_T=1500/300=5, which will give:
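As a complement to the advice above about increasing the time discretization, here is a minimal sketch of how the number of samples could be chosen from the frequencies instead of being fixed at 1000. The helper name `samples_for` and the default of 40 points per period are assumptions made for illustration, not part of the original answer.

```python
import math

def samples_for(f1, f2, t_max, points_per_period=40):
    """Return a sample count that gives the faster wave
    `points_per_period` points in each of its cycles over [0, t_max]."""
    f_high = max(f1, f2)
    n_cycles = f_high * t_max  # cycles of the faster wave in the window
    return max(2, math.ceil(n_cycles * points_per_period) + 1)

# The question's setup: a 10 s window with 520 Hz and 750 Hz waves.
print(samples_for(520, 750, 10))  # 300001 samples for a smooth plot
print(1000 / (750 * 10))          # the original 1000 samples give ~0.13 points per cycle
```

The result can be passed as the third argument of np.linspace. With only 1000 samples over 10 s the sampling rate is about 100 Hz, so any frequency above roughly 50 Hz is undersampled, which is consistent with the distorted plot in the question.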

Related

Render y-axis properly when overlaying pandas KDE and histogram

Similar questions to this have been asked before but not using these exact two plotting functions together so here we are:
I have a column from a pandas DataFrame that I am plotting both a histogram and the KDE. However, when I plot them, the y-axis is using the raw data value range instead of discrete number of samples/bin (what I want). How can I fix this? The actual plot is perfect, but the y-axis is wrong.
Data:
t2 = [140547476703.0, 113395471484.0, 158360225172.0, 105497674121.0, 186457736557.0, 153705359063.0, 36826568371.0, 200653068740.0, 190761317478.0, 126529980843.0, 98776029557.0, 132773701862.0, 14780432449.0, 167507656251.0, 121353262386.0, 136377019007.0, 134190768743.0, 218619462126.0, 07912778721.0, 215628911255.0, 147024833865.0, 94136343562.0, 135685803096.0, 165901502129.0, 45476074790.0, 125195690010.0, 113910844263.0, 123134290987.0, 112028565305.0, 93448218430.0, 07341012378.0, 93146854494.0, 132958913610.0, 102326700019.0, 196826471714.0, 122045354980.0, 76591131961.0, 134694468251.0, 120212625727.0, 108456858852.0, 106363042112.0, 193367024628.0, 39578667378.0, 178075400604.0, 155513974664.0, 132834624567.0, 137336282646.0, 125379267464.0]
Code:
fig = plt.figure()
# plot hist + kde
t2[t2.columns[0]].plot.kde(color = "maroon", label = "_nolegend_")
t2[t2.columns[0]].plot.hist(density = True, edgecolor = "grey", color = "tomato", title = t2.columns[0])
# plot mean/stdev
m = t2[t2.columns[0]].mean()
stdev = t2[t2.columns[0]].std()
plt.axvline(m, color = "black", ymax = 0.05, label = "mean")
plt.axvline(m-2*stdev, color = "black", ymax = 0.05, linestyle = ":", label = "+/- 2*Stdev")
plt.axvline(m+2*stdev, color = "black", ymax = 0.05, linestyle = ":")
plt.legend()
What it looks like now:
If you want the real counts, then you'll need to scale the KDE up by the width of the bins multiplied by the number of observations. The trickiest part is accessing the data pandas uses to plot the KDE. (I've removed the parts related to the legend to simplify the problem at hand.)
import matplotlib.pyplot as plt
import numpy as np
# Calculate KDE, get data
axis = t2[t2.columns[0]].plot.kde(color = "maroon", label = "_nolegend_")
# read the curve back through the public Line2D accessors
kde_line = axis.get_lines()[0]
xdata = kde_line.get_xdata()
ydata = kde_line.get_ydata()
plt.clf()
# Real figure
fig, ax = plt.subplots(figsize=(7,5))
# Plot Histogram, no density.
x = ax.hist(t2[t2.columns[0]], edgecolor = "grey", color = "tomato")
# size of the bins * N obs
scale = np.diff(x[1])[0]*len(t2)
# Plot scaled KDE
ax.plot(xdata, ydata*scale, color='blue')
ax.set_ylabel('N observations')
plt.show()
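The scaling step above can also be checked without reading the curve back from a plot. The sketch below computes a Gaussian KDE directly with NumPy and rescales it to counts. It assumes a Scott's-rule bandwidth (std * n**(-1/5)), which I believe matches the default used by pandas, but treat that as an assumption; `kde_in_counts` is a hypothetical helper name.

```python
import numpy as np

def kde_in_counts(values, bins=10, grid_points=200):
    """Gaussian KDE of `values`, rescaled from density to histogram counts."""
    values = np.asarray(values, dtype=float)
    n = values.size
    counts, edges = np.histogram(values, bins=bins)
    bin_width = edges[1] - edges[0]
    # Assumed bandwidth: Scott's rule for one-dimensional data.
    h = values.std(ddof=1) * n ** (-1 / 5)
    grid = np.linspace(values.min() - 3 * h, values.max() + 3 * h, grid_points)
    z = (grid[:, None] - values[None, :]) / h
    density = np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    # density integrates to 1; the histogram bars integrate to n * bin_width
    return grid, density * bin_width * n, counts, edges

rng = np.random.default_rng(0)
sample = rng.normal(size=500)  # stand-in data, not the question's column
grid, scaled_kde, counts, edges = kde_in_counts(sample)
```

The returned `grid` and `scaled_kde` can be drawn with ax.plot on top of ax.hist(sample, bins=10), so that both share a y-axis in numbers of observations.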

Custom Histogram Normalization in matplotlib

I am trying to make a normalized histogram in matplotlib, however I want it normalized such that the total area will be 1000. Is there a way to do this?
I know to get it normalized to 1, you just have to include density=True,stacked=True in the argument of plt.hist(). An equivalent solution would be to do this and multiply the height of each column by 1000, if that would be more doable than changing what the histogram is normalized to.
Thank you very much in advance!
The following approach uses np.histogram to calculate the counts for each histogram bin. Using 1000 / total_count / bin_width as the normalization factor, the total area will be 1000. By contrast, to get the sum of all bar heights to be 1000, a factor of 1000 / total_count would be needed.
plt.bar is used to display the end result.
The example code also draws the same combined histogram with density=True, to compare it with the new histogram whose total area is 1000.
import matplotlib.pyplot as plt
import numpy as np
data = [np.random.randn(100) * 5 + 10, np.random.randn(300) * 4 + 14, np.random.randn(100) * 3 + 17]
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(12, 4))
ax1.hist(data, stacked=True, density=True)
ax1.set_title('Histogram with density=True')
xmin = min([min(d) for d in data])
xmax = max([max(d) for d in data])
bins = np.linspace(xmin, xmax, 11)
bin_width = bins[1] - bins[0]
counts = [np.histogram(d, bins=bins)[0] for d in data]
total_count = sum([sum(c) for c in counts])
# factor = 1000 / total_count # to sum to 1000
factor = 1000 / total_count / bin_width # for an area of 1000
thousands = [c * factor for c in counts]
bottom = 0
for t in thousands:
    ax2.bar(bins[:-1], t, bottom=bottom, width=bin_width, align='edge')
    bottom += t
ax2.set_title('Histogram with total area of 1000')
plt.show()
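To make the difference between the two normalization factors concrete, here is a small numerical check on a single dataset. It is only a sketch; the variable names and the random stand-in data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10, scale=5, size=500)  # stand-in data

target = 1000
counts, edges = np.histogram(data, bins=10)
widths = np.diff(edges)

# Bar heights whose total AREA is 1000 (height * width summed over bins).
heights_area = counts * target / counts.sum() / widths
# Bar heights whose SUM is 1000 (bin width ignored).
heights_sum = counts * target / counts.sum()

print((heights_area * widths).sum())
print(heights_sum.sum())

# Same result as scaling a density histogram by 1000.
density, _ = np.histogram(data, bins=10, density=True)
```

Either set of heights can be drawn with ax.bar(edges[:-1], heights, width=widths, align='edge').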
An easy way to do this is to set up a second y-axis whose tick labels are the original multiplied by 1000, then hide the original axis' ticks:
import matplotlib.pyplot as plt
import numpy as np
data = [np.random.randn(5000)]
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
#hist returns a tuple that contains a list of y values at its 0 index:
y,_,_ = ax1.hist(data, density=True, bins=10, edgecolor = 'black')
#reuse the original y-limits, scaled by 1000, so the new tick labels line up with the bars:
y_lo, y_hi = ax1.get_ylim()
ax2.set_ylim(y_lo*1000, y_hi*1000)
#hide original y-axis ticks:
ax1.axes.yaxis.set_ticks([])
plt.show()

Different shading under Seaborn Distplot

I'm trying to create a plot with shading based on this MIC(1) line: the shading above the line should differ from the shading beneath it.
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms
import seaborn as sns
def createSkewDist(mean, sd, skew, size):
    # calculate the degrees of freedom 1 required to obtain the specific skewness statistic, derived from simulations
    loglog_slope=-2.211897875506251
    loglog_intercept=1.002555437670879
    df2=500
    df1 = 10**(loglog_slope*np.log10(abs(skew)) + loglog_intercept)
    # sample from F distribution
    fsample = np.sort(stats.f(df1, df2).rvs(size=size))
    # adjust the variance by scaling the distance from each point to the distribution mean by a constant, derived from simulations
    k1_slope = 0.5670830069364579
    k1_intercept = -0.09239985798819927
    k2_slope = 0.5823114978219056
    k2_intercept = -0.11748300123471256
    scaling_slope = abs(skew)*k1_slope + k1_intercept
    scaling_intercept = abs(skew)*k2_slope + k2_intercept
    scale_factor = (sd - scaling_intercept)/scaling_slope
    new_dist = (fsample - np.mean(fsample))*scale_factor + fsample
    # flip the distribution if specified skew is negative
    if skew < 0:
        new_dist = np.mean(new_dist) - new_dist
    # adjust the distribution mean to the specified value
    final_dist = new_dist + (mean - np.mean(new_dist))
    return final_dist
desired_mean = 30
desired_skew = 1.5
desired_sd = 20
final_dist = createSkewDist(mean=desired_mean, sd=desired_sd, skew=desired_skew, size=1000000)
# inspect the plots & moments, try random sample
fig, ax = plt.subplots(figsize=(12,7))
sns.distplot(final_dist,
             hist=False,
             ax=ax,
             color='darkred',
             kde_kws=dict(linewidth=4))
l1 = ax.lines[0]
# Get the xy data from the lines so that we can shade
x1 = l1.get_xydata()[:,0]
x1[0] = 0
y1 = l1.get_xydata()[:,1]
y1[0] = 0
ax.fill_between(x1,y1, color="lemonchiffon", alpha=0.3)
ax.set_ylim(0.0001,0.03)
ax.axhline(0.002, ls="--")
ax.set_xlim(1.5, 200)
ax.set_yticklabels([])
ax.set_xticklabels([])
trans = transforms.blended_transform_factory(
    ax.get_yticklabels()[0].get_transform(), ax.transData)
ax.text(0,0.0025, "{}".format("MIC(1) = 1"), color="blue", transform=trans,
        ha="right", va="top", fontsize = 12)
trans_2 = transforms.blended_transform_factory(
    ax.get_xticklabels()[0].get_transform(), ax.transData)
ax.text(84,0, "{}".format("\n84"), color="darkred", transform=trans_2,
        ha="center", va="top", fontsize = 12)
ax.text(1.5,0, "{}".format("\n0"), color="darkred", transform=trans_2,
        ha="center", va="top", fontsize = 12)
ax.axvline(x = 84, ymin = 0, ymax = 0.03, ls = '--', color = 'darkred' )
ax.set_yticks([])
ax.set_xticks([])
ax.spines['top'].set_color(None)
ax.spines['right'].set_color(None)
ax.spines['left'].set_linewidth(2)
ax.spines['bottom'].set_linewidth(2)
ax.set_ylabel("Concentration [mg/L]", labelpad = 80, fontsize = 15)
ax.set_xlabel("Time [h]", labelpad = 80, fontsize = 15)
ax.set_title("AUC/MIC", fontsize = 20, pad = 30)
plt.annotate("AUC/MIC",
             xy=(18, 0.02),
             xytext=(18, 0.03),
             arrowprops=dict(arrowstyle="->"), fontsize = 12)
That's what I have:
And that's what I'd like to have (it's done in paint, so forgive me :) ):
I was experimenting with fill_between and fill_betweenx, but without any satisfying results, and I have run out of ideas. I'd really appreciate any help on this. Best wishes!
Your fill_between works as expected. The problem is that color="lemonchiffon" with alpha=0.3 is barely visible. Try to use a brighter color and/or a higher value for alpha.
So, this colors the part of the graph between zero and the kde curve.
Now, to create a different coloring above and below the horizontal line, where= and np.minimum can be used in fill_between:
pos_hline = 0.002
ax.fill_between(x1, pos_hline, y1, color="yellow", alpha=0.3, where=y1 > pos_hline)
ax.fill_between(x1, 0, np.minimum(y1, pos_hline), color="blue", alpha=0.3)
Without where=y1 > pos_hline, fill_between would also color the region above the curve where the curve falls below that horizontal line.
PS: Note that sns.distplot has been deprecated since Seaborn version 0.11. To only plot the kde curve, you can use sns.kdeplot:
sns.kdeplot(final_dist, ax=ax, color='darkred', linewidth=4)
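To see what the two fill_between calls above actually cover, the following sketch splits a stand-in curve at a horizontal line in the same way and checks that the two regions add up to the whole area under the curve. The Gaussian-shaped curve, the threshold value and all names here are assumptions for illustration; they are not the question's data.

```python
import numpy as np

x = np.linspace(0, 10, 2001)
y = np.exp(-0.5 * (x - 4) ** 2)  # stand-in for the KDE curve
pos_hline = 0.5                  # stand-in for the horizontal line

# Region below the line: from 0 up to min(curve, line).
lower = np.minimum(y, pos_hline)
# Region above the line: only where the curve exceeds it.
upper = np.where(y > pos_hline, y - pos_hline, 0.0)

def area(v):
    """Trapezoidal area under v on the grid x."""
    return float(np.sum((v[1:] + v[:-1]) / 2 * np.diff(x)))

print(area(lower), area(upper), area(y))
```

Since `lower + upper` equals `y` at every point, the two shaded regions tile the area under the curve without overlapping, which is why the `where=` mask is needed only for the upper fill.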

Plotting Fourier Transform Of A Sinusoid In Python

The following Python program plots a sinusoid:
import matplotlib.pyplot as plt
import numpy as np
# Canvas
plt.style.use("ggplot")
# Frequency, Oscillations & Range
f = int(input("Enter frequency: "))
n_o = int(input("Enter number of oscillations: "))
t_max = n_o/f
t = np.linspace(0, t_max, 1000)
# Sine
y_sin = np.sin(2*np.pi*f*t)
# Setting subplots on separate axes
fig, axs = plt.subplots(2, 1, constrained_layout = True)
# Sine axis
axs[0].plot(t, y_sin, color = "firebrick", label = "sin({}Hz)".format(f))
axs[0].axhline(y = 0, color = "grey", linestyle = "dashed", label = "y = 0")
axs[0].legend(loc = "lower left", frameon = True, fancybox = True,
shadow = True, facecolor = "white")
# Title
axs[0].set_title("Sine")
axs[0].set_xlabel("Time(s)")
axs[0].set_ylabel("Amplitude")
# Axis Limits
axs[0].axis([-0.05*t_max, t_max+0.05*t_max, -1.5, 1.5])
plt.show()
How can I plot the Fourier transform of this sinusoid in the second subplot? I have seen various examples, but they only work with small frequencies, whereas I'm working with frequencies above 100 Hz. Thanks.
By correctly applying FFT on your signal you should be just fine:
# FFT
# number of samples
N = len(t)
# time step
dt = t[1]-t[0]
# max number of harmonics to display
H_max = 5
# frequency of each FFT bin (keep the non-negative half)
xf = np.fft.fftfreq(N, dt)[:N//2]
yf = np.fft.fft(y_sin)
axs[1].plot(xf, (2/N)*np.abs(yf[:N//2]))
axs[1].set_xlim([0, H_max*f])
axs[1].set_xlabel('f (Hz)')
axs[1].set_ylabel('$||H_i||_2$')
which gives for inputs f=100 and n_o=3:
Hope this helps.
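As a quick sanity check of the FFT scaling used above, this sketch locates the spectral peak of a 100 Hz sine numerically. It uses np.fft.rfft and np.fft.rfftfreq instead of slicing the full FFT, and samples with endpoint=False so that the window holds a whole number of periods; both choices are mine, made for the illustration.

```python
import numpy as np

f = 100      # frequency in Hz
n_o = 3      # number of oscillations in the window
N = 1000     # number of samples

t_max = n_o / f
# endpoint=False keeps the window exactly n_o periods long
t = np.linspace(0, t_max, N, endpoint=False)
dt = t[1] - t[0]
y_sin = np.sin(2 * np.pi * f * t)

freqs = np.fft.rfftfreq(N, d=dt)            # bin frequencies in Hz
amps = (2 / N) * np.abs(np.fft.rfft(y_sin)) # single-sided amplitude

peak_freq = freqs[np.argmax(amps)]
peak_amp = amps.max()
print(peak_freq, peak_amp)
```

The peak should sit at the input frequency with an amplitude close to 1. The frequency resolution is 1/t_max, so a longer window (more oscillations) gives finer bins, regardless of how high f is.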

Turn Weighted Numbers into Multiple Histograms

I am using the below code to create a weighted list of random numbers within a range.
import csv
import random
import numpy as np
import matplotlib.pyplot as plt
itemsList = []
rnd_numbs = csv.writer(open("rnd_numbs.csv", "wb"))
rnd_numbs.writerow(['number'])
items = [1, 2, 3, 4, 5]
probabilities= [0.1, 0.1, 0.2, 0.2, 0.4]
prob = sum(probabilities)
print prob
c = (1.0)/prob
probabilities = map(lambda x: c*x, probabilities)
print probabilities
ml = max(probabilities, key=lambda x: len(str(x)) - str(x).find('.'))
ml = len(str(ml)) - str(ml).find('.') -1
amounts = [ int(x*(10**ml)) for x in probabilities]
itemsList = list()
for i in range(0, len(items)):
    itemsList += items[i:i+1]*amounts[i]
for item in itemsList:
    rnd_numbs.writerow([item])
What I would like to do is (a) list these numbers randomly down the csv column (I'm not sure why they are coming out pre-sorted), (b) list the numbers down the column instead of as one entry, and (c) create and save multiple histograms at defined intervals, such as the first 100 numbers, then the first 250 numbers, then the first 500 numbers, and so on to the end.
For (c) I would like to create multiple pictures such as this for various cutoffs of the data list.
Attempt at histogram
x = itemsList[0:20]
fig = plt.figure()
ax = fig.add_subplot(111)
# 10 is the number of bins
ax.hist(x, 10, normed=1, facecolor='green', alpha=0.75)
ax.set_xlim(0, 5)
ax.set_ylim(0, 500)
ax.grid(True)
plt.show()
As for the third part of your question, take a look at matplotlib (and numpy.loadtxt() for reading your data). There are many examples to help you learn the basics, as well as advanced features. Here's a quick example of plotting a histogram of a random normal distribution:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.randn(10000)
fig = plt.figure()
ax = fig.add_subplot(111)
# 100 is the number of bins
n = ax.hist(x, 100, facecolor='green', alpha=0.75)
# n[0] is the array of bin heights,
# n[1] is the array of bin edges
xmin = min(n[1]) * 1.1
xmax = max(n[1]) * 1.1
ymax = max(n[0]) * 1.1
ax.set_xlim(xmin, xmax)
ax.set_ylim(0, ymax)
ax.grid(True)
plt.show()
which gives you a nice image:
You can make loops to generate multiple images using different ranges of your data, and save the generated figures in a number of formats, with or without previewing them first.
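The looping idea can be sketched with the standard library alone. The example below draws a weighted random sample (which also avoids the pre-sorted order from the question) and computes the per-value counts for each prefix cutoff. The helper names `weighted_sample` and `counts_at_cutoffs`, the seed and the cutoff values are assumptions for illustration.

```python
import random
from collections import Counter

def weighted_sample(items, probabilities, n, seed=None):
    """Draw n items at random, each with the given relative probability."""
    rng = random.Random(seed)
    return rng.choices(items, weights=probabilities, k=n)

def counts_at_cutoffs(values, cutoffs):
    """Map each cutoff c to the value counts of the first c numbers."""
    return {c: Counter(values[:c]) for c in cutoffs if c <= len(values)}

items = [1, 2, 3, 4, 5]
probabilities = [0.1, 0.1, 0.2, 0.2, 0.4]
values = weighted_sample(items, probabilities, 1000, seed=1)

for cutoff, counts in counts_at_cutoffs(values, [100, 250, 500, 1000]).items():
    print(cutoff, sorted(counts.items()))
```

Inside the loop, each prefix values[:cutoff] could be passed to ax.hist on a fresh figure and written out with fig.savefig, using a file name that includes the cutoff (for example a hypothetical "hist_100.png").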
