How to define t as a time variable - Python

import numpy as np
import matplotlib.pyplot as plt
from numpy import cos,pi,sin
fm=200
fc=500
t=np.arange(0, 5, 0.5)
y1=cos(2*pi*fc*t+(fc-fm)/fm*sin(2*pi*fm*t))
It just plots a line at the value one in the figure. How can I define this t as a time variable?
Thank you

Something like this:
(np.arange(0, 5, 0.5) * 60).astype('timedelta64[s]')
It gives you:
array([ 0, 30, 60, 90, 120, 150, 180, 210, 240, 270], dtype='timedelta64[s]')
You can choose the units: 's' for seconds, 'm' for minutes, etc.
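If the goal is to plot the FM waveform itself, the flat line comes from the sampling step rather than from the units: with t = np.arange(0, 5, 0.5), each 0.5 s step advances the 500 Hz carrier by exactly 250 cycles, so every sample lands on the same phase and y1 is 1 everywhere. Here is a minimal sketch of a dense time axis in seconds, assuming a short plotting window and a sampling rate fs well above fc (both values are assumptions, not from the question):
import numpy as np
from numpy import cos, sin, pi
import matplotlib.pyplot as plt
fm = 200
fc = 500
fs = 50 * fc                    # assumed sampling rate, well above the carrier
t = np.arange(0, 0.02, 1 / fs)  # 20 ms of signal, sampled every 1/fs seconds
y1 = cos(2*pi*fc*t + (fc - fm)/fm * sin(2*pi*fm*t))
plt.plot(t, y1)
plt.xlabel('time [s]')
plt.show()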

Related

Fitting a curve in Python raises TypeError: only size-1 arrays can be converted to Python scalars

I am trying to fit a curve; this is my code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.optimize import curve_fit
import math
vector = np.vectorize(np.int_)
x_data = np.array([-5.0, -4, -3, -2, -1, 0, 1, 2, 3, 4])
x1 = vector(x_data)
y_data = np.array([77, 81, 171, 303, 409, 302, 139, 115, 88, 89])
y1 = vector(y_data)
def model_f(x, a, b, c, d):
    return a/(math.sqrt(2*math.pi*d**2)) * math.exp( -(x-c)**2/(2*d**2) ) + b
popt, pcov = curve_fit(model_f, x1, y1, p0=[3,2,-16, 2])
This is the error I get:
TypeError: only size-1 arrays can be converted to Python scalars
From what I understand math.sqrt() and math.exp() are causing the problem. I thought that vectorizing the arrays would fix it. Am I missing something?
Don't call vectorize, and don't use the math module; use the np equivalents. Also, your initial values were way off and produced a degenerate solution. Either don't provide initial values at all, or provide ones in the ballpark of what you know to be needed:
import numpy as np
from scipy.optimize import curve_fit
def model_f(x: np.ndarray, a: float, b: float, c: float, d: float) -> np.ndarray:
    return a/d/np.sqrt(2*np.pi) * np.exp(-((x-c)/d)**2 / 2) + b
x1 = np.arange(-5, 5)
y1 = np.array((77, 81, 171, 303, 409, 302, 139, 115, 88, 89))
popt, _ = curve_fit(model_f, x1, y1, p0=(1000, 100, -1, 1))
print('Parameters:', popt)
print('Ideal vs. fit y:')
print(np.stack((y1, model_f(x1, *popt))))
Parameters: [916.86287196 85.71611182 -1.03419295 1.13753421]
Ideal vs. fit y:
[[ 77. 81. 171. 303. 409.
302. 139. 115. 88. 89. ]
[ 86.45393326 96.46010314 157.95219577 309.95808531 407.12196914
298.41481145 150.70663751 94.88484707 86.3133437 85.73407366]]

How to perform unsupervised clustering on numbers in an array using PyTorch

I have this array and I want to cluster/group the numbers into similar values.
An example of an input array:
array([ 57, 58, 59, 60, 61, 78, 79, 80, 81, 82, 83, 101, 102, 103, 104, 105, 106])
Expected result:
array([57,58,59,60,61]), array([78,79,80,81,82,83]), array([101,102,103,104,105,106])
I tried clustering, but I don't think it will work if I don't know how many groups to split the data into.
true = np.where(array>=1)
-> (array([ 57, 58, 59, 60, 61, 78, 79, 80, 81, 82, 83, 101, 102,
103, 104, 105, 106], dtype=int64),)
Dynamic binning requires explicit criteria and is not an easy problem to automate, because each array may require a different set of thresholds to bin it efficiently.
I think Gaussian mixtures with a silhouette-score criterion is your best bet. Here is code for what you are trying to achieve. The silhouette score helps you determine the number of clusters/Gaussians to use, and it is quite accurate and interpretable for 1D data.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score
import scipy.stats
import matplotlib.pyplot as plt
%matplotlib inline
#Sample data
x = [57, 58, 59, 60, 61, 78, 79, 80, 81, 82, 83, 101, 102, 103, 104, 105, 106]
#Fit a model onto the data
data = np.array(x).reshape(-1,1)
#change the number of clusters to check the best silhouette score
print('Silhouette scores')
scores = []
for n in range(2, 11):
    model = GaussianMixture(n).fit(data)
    preds = model.predict(data)
    score = silhouette_score(data, preds)
    scores.append(score)
    print(n, '->', score)
n_best = np.argmax(scores)+2 #because clusters start from 2
model = GaussianMixture(n_best).fit(data) #best model fit
#Get list of means and variances
mu = np.abs(model.means_.flatten())
sd = np.sqrt(np.abs(model.covariances_.flatten()))
#Plotting
extend_window = 50 #this is for zooming into or out of the graph, higher it is , more zoom out
x_values = np.arange(data.min()-extend_window, data.max()+extend_window, 0.1) #For plotting smooth graphs
plt.plot(data, np.zeros(data.shape), linestyle='None', markersize = 10.0, marker='o') #plot the data on x axis
#plot the fitted distributions (n_best of them, 3 in this case)
for i in range(n_best):
    y_values = scipy.stats.norm(mu[i], sd[i])
    plt.plot(x_values, y_values.pdf(x_values))
#split data by clusters
pred = model.predict(data)
output = np.split(x, np.sort(np.unique(pred, return_index=True)[1])[1:])
print(output)
Silhouette scores
2 -> 0.699444729378163
3 -> 0.8962176943475543 #<--- selected as n_best
4 -> 0.7602523591781903
5 -> 0.5835620702692205
6 -> 0.5313888070615105
7 -> 0.4457049486461251
8 -> 0.4355742296918767
9 -> 0.13725490196078433
10 -> 0.2159663865546218
This creates 3 Gaussians with the following distributions and splits the data into clusters.
The output arrays, finally split by similar values:
#output -
[array([57, 58, 59, 60, 61]),
array([78, 79, 80, 81, 82, 83]),
array([101, 102, 103, 104, 105, 106])]
You can apply a kind of differentiation to this array so that you can track changes better. Assume your array is:
A = np.array([ 57, 58, 59, 60, 61, 78, 79, 80, 81, 82, 83, 101, 102, 103, 104, 105, 106])
You can build a difference vector by simply convolving your vector with [-1, 1]:
A_ = abs(np.convolve(A, np.array([-1, 1])))
Then A_ is:
array([ 57,   1,   1,   1,   1,  17,   1,   1,   1,   1,   1,  18,   1,   1,   1,   1,   1, 106])
Now you can define a threshold like 5 and find the cluster boundaries.
THRESHOLD = 5
cluster_bounds = np.argwhere(A_ > THRESHOLD)
Now cluster_bounds is:
array([[ 0], [ 5], [11], [17]], dtype=int32)
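To finish the split, a short sketch using the boundaries above: the first index marks the array start and the last one is the trailing edge of the full convolution, so only the interior indices go to np.split:
import numpy as np
A = np.array([57, 58, 59, 60, 61, 78, 79, 80, 81, 82, 83,
              101, 102, 103, 104, 105, 106])
A_ = abs(np.convolve(A, np.array([-1, 1])))  # full convolution: length len(A) + 1
bounds = np.argwhere(A_ > 5).flatten()       # [0, 5, 11, 17]
clusters = np.split(A, bounds[1:-1])         # split at the interior boundaries
print(clusters)
#[array([57, 58, 59, 60, 61]),
# array([78, 79, 80, 81, 82, 83]),
# array([101, 102, 103, 104, 105, 106])]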

What is the correct way to handle a multidimensional array in Gekko nonlinear regression?

I am trying to do nonlinear regression with the gekko library for Python.
The sample was taken from here:
http://apmonitor.com/wiki/index.php/Main/GekkoPythonOptimization
In my case I need multidimensional regression, so I made some modifications. Here is the result.
import pandas
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
# measurements
xm = np.array([[80435, 33576, 3930495], [63320, 21365, 2515052],
[131294, 46680, 10339497], [64470, 29271, 3272846],
[23966, 7973, 3450144], [19863, 11429, 3427307],
[32139, 13114, 2462822], [78976, 26973, 5619715],
[32857, 10455, 3192817], [29400, 12808, 3665615],
[4667, 2876, 2556650], [21477, 10349, 6005812],
[9168, 4617, 2878631], [385112, 127609, 4063576],
[55522, 29954, 3632023], [155, 197, 507],
[160, 106, 336], [25, 23, 669], [86, 96, 751], [199, 235, 515],
[60, 83, 511], [8, 25, 187], [32, 59, 679], [11, 22, 365],
[322, 244, 2001], [172, 229, 1110], [41, 48, 447], [109, 144, 2386],
[23, 27, 319], [105, 204, 672], [77, 77, 2]])
ym = np.array([90,85,91,90,90,82,81,85,83,83,72,78,
74,92,90,28,26,13,12,22,25,5,10,15,50,54,4,28,10,7,6])
# GEKKO model
m = GEKKO()
# parameters
x = m.Param(value=xm, name='X')
y = m.CV(value=ym)
y.FSTATUS = 1
a1 = m.FV()
a1.STATUS=1
a2 = m.FV()
a2.STATUS=1
a3 = m.FV()
a3.STATUS=1
# regression equation
for i in range(len(x)):
    m.Equation(
        y[i] == np.log10(x[i][0]) * a1 +
                np.log10(x[i][1]) * a2 +
                np.log10(x[i][2]) * a3)
# regression mode
m.options.IMODE = 2
# optimize
m.solve(disp=False, GUI=False)
# print parameters
print('Optimized, a = ', str(a1), str(a2), str(a3))
plt.plot(y.value, ym, 'bo')
# plt.plot(xm, y.value, 'r-')
plt.show()
As a result I get this error:
File "/usr/local/lib/python3.6/dist-packages/gekko/gekko.py", line 1830, in solve
    self._write_csv()
File "/usr/local/lib/python3.6/dist-packages/gekko/gk_write_files.py", line 184, in _write_csv
    raise Exception('Data arrays must have the same length, and match time discretization in dynamic problems')
Exception: Data arrays must have the same length, and match time discretization in dynamic problems
Here is a summary of the modifications:
Use m.log10 instead of np.log10.
Define x as an Array and load each column (e.g. xm[:,0]) into x[i].value separately.
Define the equation only once, not once per data row. IMODE=2 is efficient for large data sets this way because the equation is defined once and every data point is evaluated with that same expression.
Add a red reference line to the plot.
Print a[i].value[0] to display the numeric solution.
import pandas
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
# measurements
xm = np.array([[80435, 33576, 3930495], [63320, 21365, 2515052],
[131294, 46680, 10339497], [64470, 29271, 3272846],
[23966, 7973, 3450144], [19863, 11429, 3427307],
[32139, 13114, 2462822], [78976, 26973, 5619715],
[32857, 10455, 3192817], [29400, 12808, 3665615],
[4667, 2876, 2556650], [21477, 10349, 6005812],
[9168, 4617, 2878631], [385112, 127609, 4063576],
[55522, 29954, 3632023], [155, 197, 507],
[160, 106, 336], [25, 23, 669], [86, 96, 751], [199, 235, 515],
[60, 83, 511], [8, 25, 187], [32, 59, 679], [11, 22, 365],
[322, 244, 2001], [172, 229, 1110], [41, 48, 447], [109, 144, 2386],
[23, 27, 319], [105, 204, 672], [77, 77, 2]])
ym = np.array([90,85,91,90,90,82,81,85,83,83,72,78,
74,92,90,28,26,13,12,22,25,5,10,15,50,54,4,28,10,7,6])
# GEKKO model
m = GEKKO(remote=False)
# parameters
n = np.size(xm,1)
x = m.Array(m.Param,n)
for i in range(n):
    x[i].value = xm[:,i]
y = m.CV(value=ym)
y.FSTATUS = 1
a1 = m.FV()
a1.STATUS=1
a2 = m.FV()
a2.STATUS=1
a3 = m.FV()
a3.STATUS=1
# regression equation
m.Equation(y == m.log10(x[0]) * a1 + \
                m.log10(x[1]) * a2 + \
                m.log10(x[2]) * a3)
# regression mode
m.options.IMODE = 2
# optimize
m.solve(disp=True, GUI=False)
# print parameters
print('Optimized, a = ', str(a1.value[0]), str(a2.value[0]), str(a3.value[0]))
plt.plot(y.value, ym, 'bo')
plt.plot([0,max(ym)],[0,max(ym)],'r-')
plt.show()

Python: Random numbers into a list

Create a 'list' called my_randoms of 10 random numbers between 0 and 100.
This is what I have so far:
import random
my_randoms=[]
for i in range(10):
    my_randoms.append(random.randrange(1, 101, 1))
    print(my_randoms)
Unfortunately Python's output is this:
[34]
[34, 30]
[34, 30, 75]
[34, 30, 75, 27]
[34, 30, 75, 27, 8]
[34, 30, 75, 27, 8, 58]
[34, 30, 75, 27, 8, 58, 10]
[34, 30, 75, 27, 8, 58, 10, 1]
[34, 30, 75, 27, 8, 58, 10, 1, 59]
[34, 30, 75, 27, 8, 58, 10, 1, 59, 25]
It generates the 10 numbers like I ask it to, but it generates them one at a time. What am I doing wrong?
You could use random.sample to generate the list with one call:
import random
my_randoms = random.sample(range(100), 10)
That generates numbers in the (inclusive) range from 0 to 99. If you want 1 to 100, you could use this (thanks to @martineau for pointing out my convoluted solution):
my_randoms = random.sample(range(1, 101), 10)
import random
my_randoms = [random.randrange(1, 101, 1) for _ in range(10)]
Fix the indentation of the print statement so it runs after the loop:
import random
my_randoms = []
for i in range(10):
    my_randoms.append(random.randrange(1, 101, 1))
print(my_randoms)
This is way late, but in case someone finds this helpful:
You could use a list comprehension.
import random
rand = [random.randint(0, 100) for x in range(1, 11)]
print(rand)
Output:
[74, 40, 30, 92, 82, 28, 20, 62, 48, 51]
Cheers!
Here I use the sample method to generate 10 (unique) random numbers between 0 and 99.
Note: I'm using Python 3's range function (not xrange).
import random
print(random.sample(range(0, 100), 10))
The output is placed into a list:
[11, 72, 64, 65, 16, 94, 29, 79, 76, 27]
xrange() will not work for 3.x.
numpy.random.randint().tolist() is a great alternative for integers in a specified interval:
#[In]:
import numpy as np
np.random.seed(123) #option for reproducibility
np.random.randint(low=0, high=100, size=10).tolist()
#[Out:]
[66, 92, 98, 17, 83, 57, 86, 97, 96, 47]
You also have np.random.uniform() for floats:
#[In]:
np.random.uniform(low=0, high=100, size=10).tolist()
#[Out]:
[69.64691855978616,
28.613933495037948,
22.68514535642031,
55.13147690828912,
71.94689697855631,
42.3106460124461,
98.07641983846155,
68.48297385848633,
48.09319014843609,
39.211751819415056]
import random
a = []
n = int(input("Enter number of elements:"))
for j in range(n):
    a.append(random.randint(1, 20))
print('Randomised list is: ', a)
Simple solution:
import random
indices = []
for i in range(0, 10):
    n = random.randint(0, 99)
    indices.append(n)
The one random list generator in the random module not mentioned here is random.choices:
my_randoms = random.choices(range(0, 100), k=10)
It's like random.sample but with replacement. The sequence passed doesn't have to be a range; it doesn't even have to be numbers. The following works just as well:
my_randoms = random.choices(['a','b'], k=10)
If we compare runtimes among the random-module list generators, random.choices is the fastest no matter the size of the list to be created. However, for larger lists/arrays, the numpy options are much faster. So, for example, if you're creating a random list/array to assign to a pandas DataFrame column, np.random.randint is the fastest option.
Code used to produce the benchmark plot:
import perfplot
import numpy as np
import random
perfplot.show(
    setup=lambda n: n,
    kernels=[
        lambda n: [random.randint(0, n*2) for x in range(n)],
        lambda n: random.sample(range(0, n*2), k=n),
        lambda n: [random.randrange(n*2) for i in range(n)],
        lambda n: random.choices(range(0, n*2), k=n),
        lambda n: np.random.rand(n),
        lambda n: np.random.randint(0, n*2, size=n),
        lambda n: np.random.choice(np.arange(n*2), size=n),
    ],
    labels=['random_randint', 'random_sample', 'random_randrange', 'random_choices',
            'np_random_rand', 'np_random_randint', 'np_random_choice'],
    n_range=[2 ** k for k in range(17)],
    equality_check=None,
    xlabel='~n'
)
from random import randint
# n1 and n2 are the inclusive bounds, listsize the desired length
my_randoms = [randint(n1, n2) for x in range(listsize)]

Difference of spline interpolation in IDL and Python

I wrote IDL code:
zz= [ 0, 5, 10, 15, 30, 50, 90, 100, 500]
uz= [ 20, 20, 20, 30, 60, 90, 30, -200, -200]*(-1.)
zp= findgen(120)*500+500
up= spline((zz-10.),uz,(zp/1000.0))
print, up
and IDL gave me values of the up array ranging from about -20 to 500. I did the same in Python:
import numpy as npy
zz = npy.array([ 0, 5, 10, 15, 30, 50, 90, 100, 500])
uz = npy.array([ 20, 20, 20, 30, 60, 90, 30, -200, -200])*(-1.)
zp = npy.arange(0,120)*500+500
from scipy.interpolate import interp1d
cubic_interp_u = interp1d(zz-10., uz, kind='cubic')
up = cubic_interp_u(zp/1000)
print(up)
and it gave me up with values from about -20 to -160. Any idea? Thanks in advance!
Actually, I don't see a problem. I'm using UnivariateSpline here instead of interp1d and cubic_interp_u, but the underlying routines are essentially the same, as far as I can tell:
import numpy as npy
from matplotlib import pyplot as pl
from scipy.interpolate import UnivariateSpline
zz = npy.array([ 0, 5, 10, 15, 30, 50, 90, 100, 500])
uz = npy.array([ 20, 20, 20, 30, 60, 90, 30, -200, -200])*(-1.)
zp = npy.arange(0,120)*500+500
pl.plot(zz, uz, 'ro')
pl.plot(zp/100, UnivariateSpline(zz, uz, s=1, k=3)(zp/100), 'k-.')
pl.plot(zp/1000, UnivariateSpline(zz, uz, s=1, k=3)(zp/1000), 'b-')
The only problem I see is that you limited the interpolation range by using zp/1000. Using zp/100, I get lots of values outside that (-160, -20) range, which you can also see on the graph from the dot-dashed line (zp/100), compared to the blue line (zp/1000).
It looks like scipy is doing a fine job.
By the way, if you want to (spline-)fit such outlying values, you may want to consider working in log-log space instead, or roughly normalizing your data (log-log space kind of does that). Most fitting problems work best if the values are of the same order of magnitude.
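A minimal sketch of that idea, assuming only the x axis is transformed (uz contains negative values, so taking a log of y is not possible here), with log1p used because zz starts at 0; this illustrates the suggestion and is not the original poster's code:
import numpy as np
from scipy.interpolate import UnivariateSpline
zz = np.array([0, 5, 10, 15, 30, 50, 90, 100, 500], dtype=float)
uz = np.array([20, 20, 20, 30, 60, 90, 30, -200, -200]) * (-1.)
# log1p compresses the wide x range (0..500) so the outlying point
# at zz=500 no longer dominates the knot spacing
spl = UnivariateSpline(np.log1p(zz), uz, s=1, k=3)
zp = np.arange(0, 120) * 500 + 500
up = spl(np.log1p(zp / 1000.0))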
