Producing a geometric sequence using NumPy

I'm trying to produce a geometric sequence, something similar to 1, 2, 4, 8...
I have the following code:
import numpy as np
lower_price = 1
upper_price = 2
total_grids = 10
grid_box = np.linspace(lower_price, upper_price, total_grids, retstep=True)
print(grid_box)
This outputs:
(array([1. , 1.11111111, 1.22222222, 1.33333333, 1.44444444,
1.55555556, 1.66666667, 1.77777778, 1.88888889, 2. ]), 0.1111111111111111)
This code creates an arithmetic, rather than a geometric sequence. How can I fix this code to produce the latter as opposed to the former?

You're looking for np.logspace, not np.linspace:
For example,
# Lower bound is 2**0 == 1
# Upper bound is 2**10 == 1024
np.logspace(0, 10, 10, base=2)
outputs:
[1.00000000e+00 2.16011948e+00 4.66611616e+00 1.00793684e+01
2.17726400e+01 4.70315038e+01 1.01593667e+02 2.19454460e+02
4.74047853e+02 1.02400000e+03]
If you're trying to get 10 values between 1 and 2, use:
# Lower bound is 2**0 == 1
# Upper bound is 2**1 == 2
np.logspace(0, 1, 10, base=2)
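Alternatively, np.geomspace takes the endpoints directly rather than their exponents; a minimal sketch:
import numpy as np

# 10 values forming a geometric sequence from 1 to 2
print(np.geomspace(1, 2, 10))
This avoids converting your bounds to logarithms by hand.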

'percentage' gives the % increment between each value. You can see that it remains constant for a constant total_grids and changes only if you change total_grids:
import numpy as np
lower_price = 10
upper_price = 2000
total_grids = 10
grid_box = np.linspace(lower_price, upper_price, total_grids, retstep=True)
full_range = upper_price - lower_price
correctedStartValue = grid_box[0][1] - lower_price
percentage = (correctedStartValue * 100) / full_range
print(grid_box)
print(percentage)
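For an actual geometric grid, the constant quantity is the ratio between consecutive values rather than the difference; a minimal sketch, reusing the variable names above:
import numpy as np

lower_price = 10
upper_price = 2000
total_grids = 10

# constant ratio between consecutive values of a geometric grid
ratio = (upper_price / lower_price) ** (1 / (total_grids - 1))
grid_box = lower_price * ratio ** np.arange(total_grids)
print(grid_box)  # same values np.geomspace(lower_price, upper_price, total_grids) would give
print(ratio)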

Spacing points per decade for logarithmic plot

I'm trying to space out a number of points between a start and an end frequency, in the way you can see below:
Startfreq = 1 Hz (variable)
Stopfreq = 5402 Hz (also variable)
stepsperdecade
How I want it to look:
1 - 2 - 3 - 4.. 10 - 20 - 30..100 - 200 - 300.. 1000 - 2000 - 3000 - 4000 - 5000 - 5402
OR
1 - steps based on the stepsperdecade - 10 - steps based on the stepsperdecade - 100 .. 1000 - steps based on the stepsperdecade - 5402.
So I want the spacing to stay the same until it reaches the end frequency.
I tried to do it in the following way in Python.
from math import log10
import numpy as np
startfreq = 1
endfreq = 10000
points_per_decade = 10
numberdecades = log10(endfreq) - log10(startfreq)
print(numberdecades)
points = int(numberdecades) * points_per_decade
points = np.logspace(log10(startfreq), log10(endfreq), num=points, endpoint=True, base=10)
print(points)
But this way doesn't give me the 10 - 100 - 1000 I want in between the steps.
Would anyone know, or could someone hint me in the right direction?
I don't know if this works for you, but using some basic maths I created this while-loop snippet:
from math import log10

startfreq = 1
endfreq = 5402
points_per_decade = 10

points = [startfreq]
ndig = int(log10(startfreq))
# round startfreq up to the next multiple of its decade
point = startfreq - startfreq % 10 ** ndig + 10 ** ndig
while point < endfreq:
    points.append(point)
    ndig = int(log10(point))
    # step by one unit of the current decade, rounding to keep values clean
    point = round(point + 10 ** ndig, ndigits=-ndig)
points.append(endfreq)
print(points)
I edited the answer to fix certain values: e.g. startfreq = 175 should produce 200 as the next value, then continue in steps of +100: [175, 200, 300, ...]
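For the values above (startfreq = 1, endfreq = 5402), this snippet should print, if I've traced it correctly:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100,
 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 5402]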
You could do this comfortably with numpy arrays, by taking an outer product:
import numpy as np

exponents = np.arange(0, 4)    # -> [0, 1, 2, 3]
prefactors = np.arange(1, 10)  # -> [1, 2, ..., 9]
factor_matrix = np.outer(10**exponents, prefactors)
This will give you what you want in matrix form:
[[1, 2, ..., 9],
[10, 20, ..., 90],
...,
[1000, 2000, ..., 9000]]
Of course, you want a flat array that stops before endpoint = 5402, with the endpoint then appended manually:
endpoint = 5402
flattened_array = factor_matrix.flatten()
flattened_array = flattened_array[flattened_array < endpoint]
flattened_array = np.append(flattened_array, endpoint)
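If you want a configurable number of steps per decade rather than the fixed prefactors 1..9, the same outer-product idea generalizes; a sketch (the points_per_decade knob is my addition, not part of the answer above):
import numpy as np

points_per_decade = 10
endpoint = 5402

exponents = np.arange(0, 4)
# evenly spaced prefactors within one decade: 1.0, 1.9, 2.8, ... for 10 steps
prefactors = np.linspace(1, 10, points_per_decade, endpoint=False)
grid = np.outer(10.0**exponents, prefactors).flatten()
grid = np.append(grid[grid < endpoint], endpoint)
print(grid)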

Different results with matlab cumtrapz and scipy.integrate cumtrapz

I'm translating some matlab code to Python and, debugging both codes, I get a different result from a call to the cumtrapz function. I also verified that the input data of both is similar. These are the codes:
Python Code
from numpy import zeros, ceil, array, mean, ptp, abs, sqrt, power
from scipy.integrate import cumtrapz

def step_length_vector(ics_y, fcs_y, acc_y, l, sf):
    step_length_m1 = zeros(int(ceil(len(ics_y)/2))-1)
    for i in range(0, len(ics_y)-2, 2):
        av = acc_y[int(ics_y[i]):int(ics_y[i+2])+1]
        t = array(range(1, int((ics_y[i+2]-ics_y[i])+2)))/sf
        hvel = cumtrapz(t, av - mean(av), initial=0)
        h = cumtrapz(t, hvel - mean(hvel), initial=0)
        hend = ptp(h)
        sl = 6.8*(sqrt(abs(2*l*hend - hend**2)))
        step_length_m1[int(ceil(i/2))] = sl
    return step_length_m1
Matlab Code
function [StepLengthM1] = StepLengthVector(ICsY,FCsY,ACCY,l,sf)
    StepLengthM1 = zeros(1,ceil(length(ICsY)/2)-1);
    for i = 1:2:length(ICsY)-2
        av = ACCY(ICsY(i):ICsY(i+2));
        t = (1:(ICsY(i+2)-ICsY(i))+1)/sf;
        hvel = cumtrapz(t,av-mean(av));
        h = cumtrapz(t,hvel-mean(hvel));
        hend = peak2peak(h);
        sl = 6.8*(sqrt(abs(2*l*hend - hend.^2)));
        StepLengthM1(ceil(i/2)) = sl;
    end
end
The hvel variable is different for the two codes. Maybe I'm using the scipy cumtrapz wrongly, because I assume that the initial value it receives is 0. In both cases the inputs ics_y (ICsY), fcs_y (FCsY) and acc_y (ACCY) are one-dimensional arrays, and l and sf are scalars.
Thanks!!!
(If this question is about cumtrapz, you should simplify your tests to just a single call to cumtrapz with the same input arrays in matlab and Python. Also, be sure you read the matlab and SciPy documentation of each function carefully. The SciPy functions are typically not exact duplicates of the corresponding matlab function.)
The problem is that when you give both x and y values, the order in which they are given in matlab/octave is x, y, but in the SciPy version, it is y, x.
For example,
octave:11> t = [0 1 1.5 4 4.5 6]
t =
0.00000 1.00000 1.50000 4.00000 4.50000 6.00000
octave:12> y = [1 2 3 -2 0 1]
y =
1 2 3 -2 0 1
octave:13> cumtrapz(t, y)
ans =
0.00000 1.50000 2.75000 4.00000 3.50000 4.25000
To get the same result with scipy.integrate.cumtrapz:
In [22]: from scipy.integrate import cumtrapz
In [23]: t = np.array([0, 1, 1.5, 4, 4.5, 6])
In [24]: y = np.array([1, 2, 3, -2, 0, 1])
In [25]: cumtrapz(y, t, initial=0)
Out[25]: array([0. , 1.5 , 2.75, 4. , 3.5 , 4.25])
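Applied to the function from the question, the fix is just swapping the two arguments in each call; a sketch:
# SciPy's cumtrapz takes y first, then x - the reverse of matlab/octave
hvel = cumtrapz(av - mean(av), t, initial=0)
h = cumtrapz(hvel - mean(hvel), t, initial=0)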

numpy random array values between -1 and 1

What is the best way to create a NumPy array of a given size with values randomly and uniformly spread between -1 and 1?
I tried 2*np.random.rand(size)-1
I'm not sure. Try:
s = np.random.uniform(-1, 1, size)
reference: https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.uniform.html
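With the newer Generator API the equivalent would be, as far as I can tell, this sketch:
import numpy as np

size = (3, 4)  # whatever shape you need
rng = np.random.default_rng()
s = rng.uniform(-1, 1, size)
print(s)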
I can use numpy.arange:
import numpy as np
print(np.arange(start=-1.0, stop=1.0, step=0.2, dtype=float))
The step parameter defines the spacing between elements (and hence how many there are); note that this produces evenly spaced values over [-1, 1), not random ones.
In your solution, np.random.rand(size) returns random floats in the half-open interval [0.0, 1.0),
which means 2 * np.random.rand(size) - 1 returns numbers in the half-open interval [0, 2) - 1 = [-1, 1), i.e. a range including -1 but not 1.
If this is what you wish to do, then it is fine.
But if you wish to generate numbers in the open interval (-1, 1), i.e. not including either -1 or 1, may I suggest the following:
from numpy.random import default_rng

rg = default_rng(2)
size = (5, 5)
rand_arr = rg.random(size)             # values in [0, 1)
rand_signs = rg.choice([-1, 1], size)  # random signs
rand_arr = rand_arr * rand_signs       # values in (-1, 1)
print(rand_arr)
I have used the new Generator suggested by NumPy; see https://numpy.org/devdocs/reference/random/index.html#quick-start
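A quick sanity check (my addition, not part of the answer above) that the result stays strictly inside (-1, 1):
import numpy as np

# every value should be strictly greater than -1 and strictly less than 1
assert np.all((rand_arr > -1) & (rand_arr < 1))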
100% working code (note that without a size argument np.random.uniform returns a single float, not an array):
size = (5, 5)  # the shape you want
a = np.random.uniform(-1, 1, size)
print(a)

Finding anomalous values from sinusoidal data

How can I find anomalous values in the following data? I am simulating a sinusoidal pattern. While I can plot the data and spot any anomalies or noise, how can I do it without plotting the data? I am looking for simple approaches, other than machine-learning methods.
import random
import numpy as np
import matplotlib.pyplot as plt

N = 10  # Set signal sample length
t1 = -np.pi  # Simulation begins at t1
t2 = np.pi   # Simulation ends at t2
in_array = np.linspace(t1, t2, N)
print("in_array : ", in_array)
out_array = np.sin(in_array)
plt.plot(in_array, out_array, color='red', marker="o"); plt.title("numpy.sin()")
Inject random noise
noise_input = random.uniform(-.5, .5); print("Noise : ", noise_input)
in_array[random.randint(0, len(in_array)-1)] = noise_input
print(in_array)
plt.plot(in_array, out_array, color='red', marker="o"); plt.title("numpy.sin()")
[Plot: data with noise]
I've thought of the following approach to your problem: since only some values in the time vector are anomalous, the rest of the values follow a regular progression. That means that if we gather all the data points in the vector into clusters and calculate the average step for the biggest cluster (which is essentially the pool of values that represent the real deal), then we can use that average to do a triad detection, within a given threshold, over the vector and detect which of the elements are anomalous.
For this we need two functions: calculate_average_step, which will calculate that average for the biggest cluster of close values, and detect_anomalous_values, which will yield the indexes of the anomalous values in our vector, based on the average calculated earlier.
After we have detected the anomalous values, we can go ahead and replace them with an estimated value, which we can determine from our average step value and by using the adjacent points in the vector.
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def calculate_average_step(array, threshold=5):
    """
    Determine the average step by doing a weighted average based on clustering of averages.
    array: our array
    threshold: the +/- offset for grouping clusters. Applicable on all elements in the array.
    """
    # determine all the steps
    steps = []
    for i in range(0, len(array) - 1):
        steps.append(abs(array[i] - array[i+1]))

    # determine the steps clusters
    clusters = []
    skip_indexes = []
    cluster_index = 0
    for i in range(len(steps)):
        if i in skip_indexes:
            continue
        # determine the cluster band (based on threshold)
        cluster_lower = steps[i] - (steps[i]/100) * threshold
        cluster_upper = steps[i] + (steps[i]/100) * threshold
        # create the new cluster
        clusters.append([])
        clusters[cluster_index].append(steps[i])
        # try to match elements from the rest of the array
        for j in range(i + 1, len(steps)):
            if not (cluster_lower <= steps[j] <= cluster_upper):
                continue
            clusters[cluster_index].append(steps[j])
            skip_indexes.append(j)
        cluster_index += 1  # increment the cluster id

    clusters = sorted(clusters, key=lambda x: len(x), reverse=True)
    biggest_cluster = clusters[0] if len(clusters) > 0 else None
    if biggest_cluster is None:
        return None
    return sum(biggest_cluster) / len(biggest_cluster)  # return our most common average

def detect_anomalous_values(array, regular_step, threshold=5):
    """
    Will scan every triad (3 points) in the array to detect anomalies.
    array: the array to iterate over.
    regular_step: the step around which we form the upper/lower band for filtering
    threshold: +/- variation between the steps of the first and median element and median and third element.
    """
    assert len(array) >= 3  # must have at least 3 elements

    anomalous_indexes = []
    step_lower = regular_step - (regular_step / 100) * threshold
    step_upper = regular_step + (regular_step / 100) * threshold

    # detection will be forward from i (hence 3 elements must be available for the detection)
    for i in range(0, len(array) - 2):
        a = array[i]
        b = array[i+1]
        c = array[i+2]

        first_step = abs(a - b)
        second_step = abs(b - c)

        first_belonging = step_lower <= first_step <= step_upper
        second_belonging = step_lower <= second_step <= step_upper

        # detect that both steps are alright
        if first_belonging and second_belonging:
            continue  # all is good here, nothing to do

        # detect if the first point in the triad is bad
        if not first_belonging and second_belonging:
            anomalous_indexes.append(i)

        # detect if the last point in the triad is bad
        if first_belonging and not second_belonging:
            anomalous_indexes.append(i+2)

        # detect if the mid point in the triad is bad (or everything is bad)
        if not first_belonging and not second_belonging:
            anomalous_indexes.append(i+1)
            # we won't add the others here because they will be detected by
            # the rest of the triad scans

    return sorted(set(anomalous_indexes))  # return unique indexes

if __name__ == "__main__":
    N = 10  # Set signal sample length
    t1 = -np.pi  # Simulation begins at t1
    t2 = np.pi   # Simulation ends at t2
    in_array = np.linspace(t1, t2, N)

    # add some noise
    noise_input = random.uniform(-.5, .5)
    in_array[random.randint(0, len(in_array)-1)] = noise_input
    noisy_out_array = np.sin(in_array)

    # display noisy sin
    plt.figure()
    plt.plot(in_array, noisy_out_array, color='red', marker="o")
    plt.title("noisy numpy.sin()")

    # detect anomalous values
    average_step = calculate_average_step(in_array)
    anomalous_indexes = detect_anomalous_values(in_array, average_step)

    # replace anomalous points with an estimated value based on our calculated average
    for anomalous in anomalous_indexes:
        try:
            # try forward extrapolation
            in_array[anomalous] = in_array[anomalous-1] + average_step
        except IndexError:
            # else try backward extrapolation
            in_array[anomalous] = in_array[anomalous+1] - average_step

    # generate sine wave
    out_array = np.sin(in_array)

    plt.figure()
    plt.plot(in_array, out_array, color='green', marker="o")
    plt.title("cleaned numpy.sin()")
    plt.show()
[Plot: noisy sine]
[Plot: cleaned sine]
Your problem lies in the time vector (which is one-dimensional). You will need to apply some sort of filter on that vector.
First thing that came to mind was medfilt (median filter) from scipy and it looks something like this:
from scipy.signal import medfilt
l1 = [0, 10, 20, 30, 2, 50, 70, 15, 90, 100]
l2 = medfilt(l1)
print(l2)
the output of this will be:
[ 0. 10. 20. 20. 30. 50. 50. 70. 90. 90.]
The problem with this filter, though, is that if we apply some noise values to the edges of the vector, like [200, 0, 10, 20, 30, 2, 50, 70, 15, 90, 100, -50], then the output would be something like [0. 10. 10. 20. 20. 30. 50. 50. 70. 90. 90. 0.], and obviously this is not OK for the sine plot, since it will produce the same artifacts in the sine values array.
A better approach to this problem is to treat the time vector as a y output and its index values as the x input, and do a linear regression on the "time linear function" (note the quotes: it just means we're faking the 2-dimensional model by applying a fake x vector). The code uses scipy's linregress (linear regression) function:
import numpy as np
from scipy.stats import linregress

l1 = [5, 0, 10, 20, 30, -20, 50, 70, 15, 90, 100]
l1_x = np.arange(len(l1))  # the fake x axis: just the indices
slope, intercept, r_val, p_val, std_err = linregress(l1_x, l1)
l1 = intercept + slope * l1_x
print(l1)
whose output will be:
[-10.45454545 -1.63636364 7.18181818 16. 24.81818182
33.63636364 42.45454545 51.27272727 60.09090909 68.90909091
77.72727273]
Now let's apply this to your time vector.
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress

N = 20
# N = 10  # Set signal sample length
t1 = -np.pi  # Simulation begins at t1
t2 = np.pi   # Simulation ends at t2
in_array = np.linspace(t1, t2, N)

# add some noise
noise_input = random.uniform(-.5, .5)
in_array[random.randint(0, len(in_array)-1)] = noise_input

# apply filter on time array
in_array_x = np.arange(len(in_array))  # np.arange so the arithmetic below works
slope, intercept, r_val, p_val, std_err = linregress(in_array_x, in_array)
in_array = intercept + slope * in_array_x

# generate sine wave
out_array = np.sin(in_array)
print("OUT ARRAY")
print(out_array)

plt.plot(in_array, out_array, color='red', marker="o"); plt.title("numpy.sin()")
plt.show()
The output will be the regressed plot (omitted here); the resulting signal is an approximation of the original, as it is with any form of extrapolation/interpolation/regression filtering.

How should I multiply scipy.fftpack output vectors together?

The scipy.fftpack.rfft function returns the DFT as a vector of floats, alternating between the real and complex part. This means that to multiply two DFTs together (for convolution) I will have to do the complex multiplication "manually", which seems quite tricky. This must be something people do often - I presume/hope there is a simple trick to do this efficiently that I haven't spotted?
Basically I want to fix this code so that both methods give the same answer:
import numpy as np
import scipy.fftpack as sfft
X = np.random.normal(size = 2000)
Y = np.random.normal(size = 2000)
NZ = np.fft.irfft(np.fft.rfft(Y) * np.fft.rfft(X))
SZ = sfft.irfft(sfft.rfft(Y) * sfft.rfft(X)) # This multiplication is wrong
NZ
array([-43.23961083, 53.62608086, 17.92013729, ..., -16.57605207,
8.19605764, 5.23929023])
SZ
array([-19.90115323, 16.98680347, -8.16608202, ..., -47.01643274,
-3.50572376, 58.1961597 ])
N.B. I am aware that fftpack contains a convolve function, but I only need to fft one half of the transform - my filter can be fft'd once in advance and then used over and over again.
You don't have to flip back to np.float64 and hstack. You can create an empty destination array, the same shape as sfft.rfft(Y) and sfft.rfft(X), then create a np.complex128 view of it and fill this view with the result of the multiplication. This will automatically fill the destination array as wanted.
If I retake your example:
import numpy as np
import scipy.fftpack as sfft

X = np.random.normal(size=2000)
Y = np.random.normal(size=2000)

Xf = sfft.rfft(X)
Xf_cpx = Xf[1:-1].view(np.complex128)
Yf = sfft.rfft(Y)
Yf_cpx = Yf[1:-1].view(np.complex128)

Zf = np.empty(X.shape)
Zf_cpx = Zf[1:-1].view(np.complex128)

Zf[0] = Xf[0]*Yf[0]
# the [...] is important to use the view as a reference to Zf and not overwrite it
Zf_cpx[...] = Xf_cpx * Yf_cpx
Zf[-1] = Xf[-1]*Yf[-1]

Z = sfft.irfft(Zf)
and that's it!
You can use a simple if statement if you want your code to be more general and handle odd lengths as explained in Jaime's answer.
Here is a function that does what you want:
def rfft_mult(a, b):
    """Multiplies two outputs of scipy.fftpack.rfft"""
    assert a.shape == b.shape
    c = np.empty(a.shape)
    c[..., 0] = a[..., 0] * b[..., 0]
    # To comply with the rfft support of multi-dimensional arrays
    ar = a.reshape(-1, a.shape[-1])
    br = b.reshape(-1, b.shape[-1])
    cr = c.reshape(-1, c.shape[-1])
    # Note that we cannot use ellipses to achieve that because of
    # the way `view` works. If there are many dimensions, one should
    # consider manually performing the complex multiplication with slices.
    if c.shape[-1] & 0x1:  # odd length: only the first term is real
        for i in range(len(ar)):
            ac = ar[i, 1:].view(np.complex128)
            bc = br[i, 1:].view(np.complex128)
            cc = cr[i, 1:].view(np.complex128)
            cc[...] = ac * bc
    else:  # even length: the first and last terms are real
        for i in range(len(ar)):
            ac = ar[i, 1:-1].view(np.complex128)
            bc = br[i, 1:-1].view(np.complex128)
            cc = cr[i, 1:-1].view(np.complex128)
            cc[...] = ac * bc
        c[..., -1] = a[..., -1] * b[..., -1]
    return c
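A quick check (a sketch) that this reproduces the numpy result from the question:
import numpy as np
import scipy.fftpack as sfft

X = np.random.normal(size=2000)
Y = np.random.normal(size=2000)
SZ = sfft.irfft(rfft_mult(sfft.rfft(X), sfft.rfft(Y)))
NZ = np.fft.irfft(np.fft.rfft(X) * np.fft.rfft(Y))
print(np.allclose(SZ, NZ))  # expected: True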
You can take a view of a slice of your return array, e.g.:
>>> scipy.fftpack.fft(np.arange(8))
array([ 28.+0.j , -4.+9.65685425j, -4.+4.j ,
-4.+1.65685425j, -4.+0.j , -4.-1.65685425j,
-4.-4.j , -4.-9.65685425j])
>>> a = scipy.fftpack.rfft(np.arange(8))
>>> a
array([ 28. , -4. , 9.65685425, -4. ,
4. , -4. , 1.65685425, -4. ])
>>> a.dtype
dtype('float64')
>>> a[1:-1].view(np.complex128) # First and last entries are real
array([-4.+9.65685425j, -4.+4.j , -4.+1.65685425j])
You will need to handle even or odd sized FFTs differently:
>>> scipy.fftpack.fft(np.arange(7))
array([ 21.0+0.j , -3.5+7.26782489j, -3.5+2.79115686j,
-3.5+0.79885216j, -3.5-0.79885216j, -3.5-2.79115686j,
-3.5-7.26782489j])
>>> a = scipy.fftpack.rfft(np.arange(7))
>>> a
array([ 21. , -3.5 , 7.26782489, -3.5 ,
2.79115686, -3.5 , 0.79885216])
>>> a.dtype
dtype('float64')
>>> a[1:].view(np.complex128)
array([-3.5+7.26782489j, -3.5+2.79115686j, -3.5+0.79885216j])
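Putting the two cases together, a minimal sketch for taking the complex view of an rfft output of either parity:
import numpy as np
import scipy.fftpack

a = scipy.fftpack.rfft(np.arange(8))
# even length: skip the leading DC term and the trailing Nyquist term;
# odd length: skip only the DC term
cpx = a[1:-1].view(np.complex128) if a.size % 2 == 0 else a[1:].view(np.complex128)
print(cpx)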
