How to go about data modelling?

I've spent the last two hours or so trying to figure out how to apply data modelling to my two variables. I am supposed to demonstrate and explain how I would model the relationship between the following two variables:
Pressure24h DangerLevel24h
1000.2 45
1014.8 90
990.8 14
998.4 95
1002.1 46
1006 21
There are another 185,000 data points to work with; this is just a very small sample. Pressure24h is measured in hectopascals and DangerLevel24h is a percentage. That's the only information I have to work with.
Is there a method I can use to approach this?
I created a scatter plot to show the relationship, but that is as far as I have gotten.
https://i.stack.imgur.com/Ty5Yn.png

Here's my code as discussed in the comments:
def lobf(*cords):
    cords = cords[0]  # the list of (x, y) tuples arrives as the first positional argument
    print(cords)
    x_mean, y_mean = 0, 0
    for x, y in cords:
        x_mean += x  # get x sum
        y_mean += y  # get y sum
    x_mean /= len(cords)  # get x mean
    y_mean /= len(cords)  # get y mean
    # Step 2 from https://www.varsitytutors.com/hotmath/hotmath_help/topics/line-of-best-fit
    sigma_numerator, sigma_denominator = 0, 0
    for xi, yi in cords:
        sigma_numerator += (xi - x_mean)*(yi - y_mean)  # get numerator
        sigma_denominator += (xi - x_mean)**2  # get denominator
    m = sigma_numerator/sigma_denominator  # get slope
    c = y_mean - m*x_mean  # get y-intercept
    return m, c
data_values = [(2,2), (4,4)]  # Sample data values; you can put yours here
# Loop in increments of 5 to avoid the blue blob you got.
# You can change the increment as you like.
predicted_values = []
increment = 5
m, c = lobf(data_values)
# You can consider dangerlevel24h as your x.
# The loop starts one increment past the last x value of your data.
for i in range(data_values[-1][0]+increment, len(data_values)*100, increment):
    predicted_values.append((i, i*m + c))  # appends x, y
print(predicted_values)
You can then plot every value in predicted_values. By stepping through every 5th value (or whatever step you choose), you avoid the blue blob forming. This method also lets you predict values that aren't in your data. You can also look at Pearson's correlation coefficient, which is closely related to this method.
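As a cross-check on the hand-written line of best fit, the same slope and intercept can be obtained from NumPy, together with Pearson's correlation coefficient. This is only a sketch: it assumes NumPy is available and uses just the six sample rows from the question, so the numbers say nothing about the full 185,000-row dataset.

```python
import numpy as np

# The six sample rows from the question: Pressure24h (hPa), DangerLevel24h (%)
pressure = np.array([1000.2, 1014.8, 990.8, 998.4, 1002.1, 1006.0])
danger = np.array([45.0, 90.0, 14.0, 95.0, 46.0, 21.0])

# Least-squares line danger ~ m * pressure + c (the same formula lobf implements)
m, c = np.polyfit(pressure, danger, 1)

# Pearson correlation coefficient r, always between -1 and 1
r = np.corrcoef(pressure, danger)[0, 1]

print(f"slope={m:.3f}, intercept={c:.1f}, r={r:.3f}")
```

A value of r close to 0 would suggest that a straight line is a poor model for the relationship, in which case the fitted slope should not be used for prediction.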

Related

How to center the signals to start around zero

I have been working with some sensor data. Looking at each sensor, I realised that some of the signals do not start around zero. Is there a way to shift the signals so that they are centred on zero? See the image below for the combined and individual signal plots.
An example from one of the sensors can be found in this chat (https://chat.stackoverflow.com/rooms/238608/signal)
Code:
(Get the signal)
for fp in DataPathList:
    k += 1
    # print(k)
    # Load spreadsheet:
    print('Opened file number: {}'.format(fp))
    dataset = np.loadtxt(fname=fp)
    y = dataset[:, column_no]
    y_signal[k] = np.array(y)
    y_S1_max_signal[k] = np.max(np.array(dataset[:, 0]))
    S_F = 1000
    N = np.array(y_signal[k]).shape[0]
    S_T = 1 / S_F
    t_n = S_T * N  # seconds of sampling
    x_time = np.linspace(0, t_n, N)
Edit 1:
I was able to solve this problem by subtracting the mean from each signal, which moved the plot so that it starts around 0. However, I have a question: will this cause a large change in my data?
for fp in DataPathList:
    k += 1
    # print(k)
    # Load spreadsheet:
    print('Opened file number: {}'.format(fp))
    dataset = np.loadtxt(fname=fp)
    y = dataset[:, column_no]
    y_signal[k] = np.array(y)
    y_signal[k] = np.array(y) - np.mean(np.array(y))
    y_S1_max_signal[k] = np.max(np.array(dataset[:, 0]))
    S_F = 1000
    N = np.array(y_signal[k]).shape[0]
    S_T = 1 / S_F
    t_n = S_T * N  # seconds of sampling
    x_time = np.linspace(0, t_n, N)

It helps to post your code so that people can help you better. Generally, what you might want to do is centre your peak at 0. If you have a NumPy array, use np.argmax(array) to find the index of its maximum. Then, when plotting with matplotlib, subtract that index from your x values (the list or array that goes on your x-axis).
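The two ideas in this thread, subtracting the mean and shifting the x-axis to the peak, can be compared on a synthetic signal. The sketch below is an assumption-laden stand-in: the 5 Hz sine with an offset of 3.0 is invented for illustration and is not the sensor data from the question.

```python
import numpy as np

S_F = 1000                      # sampling frequency in Hz, as in the question
t = np.arange(2000) / S_F       # 2 seconds of samples
# Synthetic stand-in for a sensor signal: a 5 Hz sine riding on an offset of 3.0
y = 3.0 + 0.5 * np.sin(2 * np.pi * 5 * t)

# Mean subtraction: removes the constant offset only
y_centred = y - np.mean(y)

# Peak alignment: shift the x-axis so the maximum sits at position 0
peak_index = np.argmax(y)
x_shifted = np.arange(len(y)) - peak_index

print(np.mean(y_centred))                 # very close to 0
print(np.std(y), np.std(y_centred))       # spread is the same before and after
```

Subtracting the mean changes only the vertical offset: sample-to-sample differences, the standard deviation and the peak-to-peak amplitude stay the same, so the shape of the signal is preserved.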

Numerical solutions of unsteady 2D heat equation in python producing error incorrectly

I am trying to implement two numerical solutions, a forward Euler and a second-order Runge-Kutta scheme, for the unsteady 2D heat equation with periodic boundary conditions. I am using a 3-point central difference for the spatial discretization in both cases.
This is my code:
import numpy
from numpy import pi
def analytical(npx,npy,t,alpha=0.1): #Function to create analytical solution data
    X = numpy.linspace(0,1,npx,endpoint=False) #X spatial range
    Y = numpy.linspace(0,1,npy,endpoint=False) #Y Spatial Range
    uShape = (1,numpy.shape(X)[0],numpy.shape(Y)[0]) #Shape of data array
    u = numpy.zeros(uShape) #Allocate data array
    m = 2 #m and n = 2 makes the function periodic in 0->1
    n = 2
    for i,x in enumerate(X): #Looping through x and y to produce analytical solution
        for j,y in enumerate(Y):
            u[0,i,j] = numpy.sin(m*pi*x)*numpy.sin(n*pi*y)*numpy.exp(-(m*m+n*n)*pi*pi*alpha*t)
    return u,X,Y
def numericalHeatComparisonFE(): #Numerical function for forward euler
    arraysize = 10 #Size of simulation array in x and y
    d = 0.1 #value of pde coefficient alpha*dt/dx/dx
    alpha = 1 #thermal diffusivity
    dx = 1/arraysize #getting spatial step
    dt = float(d*dx**2/alpha) #getting time step
    T,x,y = analytical(arraysize,arraysize,0,alpha=alpha) #get analytical solution
    numerical = numpy.zeros((2,)+T.shape) #create numerical data array
    ns = numerical.shape #shape of numerical array aliased
    numerical[0,:,:,:] = T[:,:,:] # assign initial conditions to first element in numerical
    error = [] #create empty error list for absolute error
    courant = alpha*dt/dx/dx #technically not the courant number but the coefficient for PDE
    for i in range(1,20):#looping through twenty times for testing - solving FE each step
        T,x,y = analytical(arraysize,arraysize,i*dt,alpha=alpha)
        for idx,idy in numpy.ndindex(ns[2:]):
            dxx = numerical[0,0,idx-1,idy]+numerical[0,0,(idx+1)%ns[-2],idy]-2*numerical[0,0,idx,idy] #X direction diffusion
            dyy = numerical[0,0,idx,idy-1]+numerical[0,0,idx,(idy+1)%ns[-1]]-2*numerical[0,0,idx,idy] #Y direction diffusion
            numerical[1,0,idx,idy] = courant*(dxx+dyy)+numerical[0,0,idx,idy] #Update formula
        error.append(numpy.amax(numpy.absolute(numerical[1,:,:,:]-T[:,:,:])))#Add max error to error list
        numerical[0,:,:,:] = numerical[1,:,:,:] #Update initial condition
    print(numpy.amax(error))
def numericalHeatComparisonRK2():
    arraysize = 10 #Size of simulation array in x and y
    d = 0.1 #value of pde coefficient alpha*dt/dx/dx
    alpha = 1 #thermal diffusivity
    dx = 1/arraysize #getting spatial step
    dt = float(d*dx**2/alpha) #getting time step
    T,x,y = analytical(arraysize,arraysize,0,alpha=alpha) #get analytical solution
    numerical = numpy.zeros((3,)+T.shape) #create numerical data array
    ns = numerical.shape #shape of numerical array aliased
    numerical[0,:,:,:] = T[:,:,:] # assign initial conditions to first element in numerical
    error = [] #create empty error list for absolute error
    courant = alpha*dt/dx/dx #technically not the courant number but the coefficient for PDE
    for i in range(1,20): #Test twenty time steps -RK2
        T,x,y = analytical(arraysize,arraysize,i*dt,alpha=alpha)
        for idx,idy in numpy.ndindex(ns[2:]): #Intermediate step looping through indices
            #Intermediate
            dxx = numerical[0,0,idx-1,idy]+numerical[0,0,(idx+1)%ns[-2],idy]-2*numerical[0,0,idx,idy]
            dyy = numerical[0,0,idx,idy-1]+numerical[0,0,idx,(idy+1)%ns[-1]]-2*numerical[0,0,idx,idy]
            numerical[1,0,idx,idy] = 0.5*courant*(dxx+dyy)+numerical[0,0,idx,idy]
        for idx,idy in numpy.ndindex(ns[2:]): #Update step looping through indices
            #RK Step
            dxx = numerical[1,0,idx-1,idy]+numerical[1,0,(idx+1)%ns[-2],idy]-2*numerical[1,0,idx,idy]
            dyy = numerical[1,0,idx,idy-1]+numerical[1,0,idx,(idy+1)%ns[-1]]-2*numerical[1,0,idx,idy]
            numerical[2,0,idx,idy] = courant*(dxx+dyy)+numerical[0,0,idx,idy]
        error.append(numpy.amax(numpy.absolute(numerical[2,:,:,:]-T[:,:,:]))) #Add maximum error to list
        numerical[0,:,:,:] = numerical[2,:,:,:] #Update initial conditions
    print(numpy.amax(error))
if __name__ == "__main__":
    numericalHeatComparisonFE()
    numericalHeatComparisonRK2()
When running the code, I expect the maximum error for RK2 to be less than that for FE, but I get
0.0021498590913591187
for FE and
0.011325197051528346
for RK2. I have searched the code pretty thoroughly and haven't found any glaring typos or errors. I feel it has to be something minor that I am missing, but I can't seem to find it. If you happen to spot an error or know something I don't, help or a comment would be appreciated.
Thanks!
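The two update formulas described in the question can also be written with whole-array operations, which makes them easier to check in isolation. The following is only a sketch under the question's assumptions (periodic boundaries, equal grid spacing in x and y, d = alpha*dt/dx**2); the function names laplacian, fe_step and rk2_step are made up for illustration. Because sin(2*pi*x)*sin(2*pi*y) is an eigenmode of the periodic 3-point stencil, a single step should just scale the field by a known factor, which gives a test that does not depend on the analytical solution.

```python
import numpy as np

def laplacian(u):
    """3-point central difference in x and y with periodic wrap-around.
    The 1/dx**2 factor is left out because it is absorbed into d."""
    return (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
            + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1) - 4 * u)

def fe_step(u, d):
    """One forward Euler step, with d = alpha*dt/dx**2."""
    return u + d * laplacian(u)

def rk2_step(u, d):
    """One midpoint (second-order Runge-Kutta) step."""
    u_half = u + 0.5 * d * laplacian(u)
    return u + d * laplacian(u_half)

n, d = 10, 0.1
x = np.linspace(0, 1, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
u0 = np.sin(2 * np.pi * X) * np.sin(2 * np.pi * Y)   # initial condition from the question

u_fe = fe_step(u0, d)
u_rk2 = rk2_step(u0, d)

# Eigenvalue of the discrete Laplacian for this mode
lam = 4 * (np.cos(2 * np.pi / n) - 1)
print(np.allclose(u_fe, (1 + d * lam) * u0))
print(np.allclose(u_rk2, (1 + d * lam + 0.5 * (d * lam) ** 2) * u0))
```

Comparing these per-step factors with exp(-8*pi**2*alpha*dt) from the analytical solution separates the time-stepping error from the spatial discretization error, which is one way to investigate why the two schemes rank differently than expected.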

displaying Mandelbrot set in python using matplotlib.pyplot and numpy

I am trying to plot the Mandelbrot set and am having trouble getting the expected plot.
As I understand it, the Mandelbrot set is made up of the values of c for which the iteration z = z**2 + c does not diverge. I used the initial value z = 0.
Initially, I was getting a straight line. I looked for solutions online to see where I went wrong. Using the following link in particular, I attempted to improve my code:
https://scipy-lectures.org/intro/numpy/auto_examples/plot_mandelbrot.html
Here is my improved code. I don't really understand why np.newaxis is used, or why I am plotting the final z values that converge. Am I misunderstanding the definition of the Mandelbrot set?
# initial values
loop = 50 # number of iterations
div = 600 # divisions
# all possible values of c
c = np.linspace(-2,2,div)[:,np.newaxis] + 1j*np.linspace(-2,2,div)[np.newaxis,:]
z = 0
for n in range(0,loop):
    z = z**2 + c
plt.rcParams['figure.figsize'] = [12, 7.5]
z = z[abs(z) < 2] # removing z values that diverge
plt.scatter(z.real, z.imag, color = "black" ) # plotting points
plt.xlabel("Real")
plt.ylabel("i (imaginary)")
plt.xlim(-2,2)
plt.ylim(-1.5,1.5)
plt.savefig("plot.png")
plt.show()
I got the following image, which looks closer to the Mandelbrot set than anything I had got so far, but it looks more like a starfish with scattered dots around it.
Image
For reference, here is my initial code before improvement:
# initial values
loop = 50
div = 50
clist = np.linspace(-2,2,div) + 1j*np.linspace(-1.5,1.5,div) # range of c values
all_results = []
for c in clist: # for each value of c
    z = 0 # starting point
    for a in range(0,loop):
        negative = 0 # unstable
        z = z**2 + c
        if np.abs(z) > 2:
            negative +=1
        if negative > 2:
            break
    if negative == 0:
        all_results.append([c,"blue"]) #converging
    else:
        all_results.append([c,"black"]) # not converging

Alternatively, with another small change to the code in the question, one can use the values of z to colorize the plot. One can store the value of n where the absolute value of the series becomes larger than 2 (meaning it diverges), and color the points outside the Mandelbrot set with it:
import matplotlib.pyplot as plt
import numpy as np
# initial values
loop = 50 # number of iterations
div = 600 # divisions
# all possible values of c
c = np.linspace(-2,2,div)[:,np.newaxis] + 1j*np.linspace(-2,2,div)[np.newaxis,:]
# array of ones of same dimensions as c
# (np.int was removed from recent NumPy releases; the built-in int is used instead)
ones = np.ones(np.shape(c), int)
# Array that will hold colors for plot, initial value set here will be
# the color of the points in the mandelbrot set, i.e. where the series
# converges.
# For the code below to work, this initial value must at least be 'loop'.
# Here it is loop + 5
color = ones * loop + 5
z = 0
for n in range(0,loop):
    z = z**2 + c
    diverged = np.abs(z)>2
    # Store value of n at which series was detected to diverge.
    # The later the series is detected to diverge, the higher
    # the 'color' value.
    color[diverged] = np.minimum(color[diverged], ones[diverged]*n)
plt.rcParams['figure.figsize'] = [12, 7.5]
# contour plot with real and imaginary parts of c as axes
# and colored according to 'color'
plt.contourf(c.real, c.imag, color)
plt.xlabel("Real($c$)")
plt.ylabel("Imag($c$)")
plt.xlim(-2,2)
plt.ylim(-1.5,1.5)
plt.savefig("plot.png")
plt.show()

The plot doesn't look correct because the code in the question plots z (i.e. the iterated variable). When iterating z = z*z + c, the Mandelbrot set consists of those values of c (pairs of real and imaginary parts) for which the series doesn't diverge. Hence the small change to the code shown below gives the correct Mandelbrot plot:
import matplotlib.pyplot as plt
import numpy as np
# initial values
loop = 50 # number of iterations
div = 600 # divisions
# all possible values of c
c = np.linspace(-2,2,div)[:,np.newaxis] + 1j*np.linspace(-2,2,div)[np.newaxis,:]
z = 0
for n in range(0,loop):
    z = z**2 + c
plt.rcParams['figure.figsize'] = [12, 7.5]
p = c[abs(z) < 2] # removing c values for which z has diverged
plt.scatter(p.real, p.imag, color = "black" ) # plotting points
plt.xlabel("Real")
plt.ylabel("i (imaginary)")
plt.xlim(-2,2)
plt.ylim(-1.5,1.5)
plt.savefig("plot.png")
plt.show()
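On the np.newaxis part of the question: indexing with [:, np.newaxis] turns the real parts into a column of shape (div, 1) and [np.newaxis, :] turns the imaginary parts into a row of shape (1, div), so adding them broadcasts to a full (div, div) grid of complex values. The sketch below shows this on a deliberately tiny 5x5 grid; the helper name in_mandelbrot is hypothetical, and it stops updating points once they escape, which is one way to avoid overflow warnings.

```python
import numpy as np

div = 5
re = np.linspace(-2, 2, div)[:, np.newaxis]   # shape (5, 1): column of real parts
im = np.linspace(-2, 2, div)[np.newaxis, :]   # shape (1, 5): row of imaginary parts
c = re + 1j * im                              # broadcasts to shape (5, 5)

def in_mandelbrot(c, loop=50):
    """Boolean mask: True where |z| has stayed <= 2 for all 'loop' iterations."""
    z = np.zeros_like(c)
    inside = np.ones(c.shape, dtype=bool)
    for _ in range(loop):
        # Only iterate points that have not escaped yet
        z[inside] = z[inside] ** 2 + c[inside]
        inside &= np.abs(z) <= 2
    return inside

mask = in_mandelbrot(c)
print(re.shape, im.shape, c.shape)
print(c[mask])   # the grid values of c that were not detected to diverge
```

Without np.newaxis, adding two 1-D arrays of the same length pairs them element by element and yields only div points along a line, which is consistent with the straight line reported for the initial code.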

Plotting a graph given function definition

I'm currently trying to plot a graph of the iterations of a certain function in Python. I have defined the function as shown below, but I am unsure how to plot the graph so that the y value is on the y-axis and the iteration number is on the x-axis.
I have tried using plt.plot with different values as my x values and logistic(4, 0.7) as the y value.
def logistic(A, x):
    y = A * x * (1 - x)
    return y
But each attempt returns an error. Can anyone shed any light on this? I want to do a total of 1000 iterations.

I don't quite understand what you mean by x being the number of iterations while you show us the call logistic(4, 0.7). As far as I know, an iteration count is an integer, a whole number; you can't iterate halfway or partially.
import matplotlib.pyplot as plt
def logistic(A, x):
    y = A * x * (1 - x)
    return y
A = 1
x_vals = []
y_vals = []
for x in range(1,1000):
    x_vals.append(x)
    y_vals.append(logistic(A,x))
    #plt.plot(x_vals,y_vals) # See every iteration
    #plt.show()
plt.plot(x_vals,y_vals) # See all iterations at once
plt.show()

Ah, the logistic map. Are you trying to make a cobweb plot? If so, your error may be elsewhere. As others have mentioned, you should post the error message and your code, so we can better help you. However, based on what you've given us, you can use NumPy arrays to achieve your desired result.
import numpy as np
import matplotlib.pyplot as plt
start = 0
end = 1
num = 1000
# Create array of 'num' evenly spaced values between 'start' and 'end'
x = np.linspace(start, end, num)
# Initialize y array
y = np.zeros(len(x))
# Logistic function
def logistic(A, x):
    y = A * x * (1 - x)
    return y
# Add values to y array
for i in range(len(x)):
    y[i] = logistic(4, x[i])
plt.plot(x,y)
plt.show()
However, with NumPy arrays, you can omit the for loop and just do
x = np.linspace(start, end, num)
y = logistic(4, x)
and you'll get the same result, but faster.
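If the goal is the iteration number on the x-axis, one reading of the question is that each output of logistic should be fed back in as the next input. The sketch below follows that reading; the helper name iterate_logistic is hypothetical, and A = 4 with a starting value of 0.7 are taken from the logistic(4, 0.7) call mentioned in the question.

```python
import numpy as np

def logistic(A, x):
    return A * x * (1 - x)

def iterate_logistic(A, x0, n_iter):
    """Return n_iter values: x0 followed by repeated applications of the map."""
    values = np.empty(n_iter)
    values[0] = x0
    for n in range(1, n_iter):
        # Feed the previous value back into the function
        values[n] = logistic(A, values[n - 1])
    return values

values = iterate_logistic(4, 0.7, 1000)
iterations = np.arange(len(values))   # 0, 1, ..., 999 for the x-axis
print(values[:5])
```

With matplotlib imported as plt, calling plt.plot(iterations, values) followed by plt.show() would then put the iteration number on the x-axis and the iterated value on the y-axis.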

Mean over a sub-interval Python

I am using Python, but since I am a beginner I can't figure out how to compute the average of a vector every, let's say, 100 elements inside a larger for loop.
My attempt so far, which does not do what I want, is
import numpy as np
r = np.zeros(10000) # declare my vector
for i in range(0,2000): # start the loop
    r[i] = i**2 # some function to compute and save
    if (i%100 == 0): # each time I save 100 elements I want the mean
        av_r = np.mean(r)
        print(av_r)
My code does not work as I want, because I would like to average only 100 elements, then move on to the next 100, compute their mean, and so on.
I tried to reduce the size of the vector and reset it inside the if:
import numpy as np
r = np.zeros(100) # declare my vector
for i in range(0,2000): # start the loop
    r[i] = i**2 # some function to compute and save
    if (i%100 == 0): # each time I save 100 elements I want the mean
        av_r = np.mean(r)
        print(av_r)
        r = np.zeros(100)
Naively, I thought I could save 100 elements, compute the average, reset the vector, and continue the calculation, saving the next elements from 100+1 to 200+1, but it gives me errors. In particular:
IndexError: index 100 is out of bounds for axis 0 with size 100
Many thanks for your help.

Is this what you're looking for? This code will iterate from 0 to 2000 in intervals of 100, mapping some function (x -> x**2) over each interval, calculating the mean and printing the result.
import numpy as np
r = np.zeros(10000)
for i in range(0, 2000, 100):
    interval = [x ** 2 for x in r[i:i + 100]]
    av_r = np.mean(interval)
    print(av_r)
The output from this is just a series of twenty 0.0 values, because r contains only zeros.

The error you have probably encountered is an array index out of bounds (IndexError: index 100 is out of bounds for axis 0 with size 100), because your index ranges from 0 to 1999 and you are doing
r[i] = i**2 # some function to compute and save
on an array of size 100.
Fix:
r[i%100] = i**2 # some function to compute and save
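Putting the modulo fix into the full loop, a sketch might look like the following. Two details are assumptions on my part rather than something stated above: the mean is taken when (i + 1) % 100 == 0, so that the buffer holds exactly 100 fresh values, and the results are collected in a list named means. A loop-free version using reshape is included as a cross-check.

```python
import numpy as np

n_total, block = 2000, 100
r = np.zeros(block)          # reusable buffer for one block of 100 values
means = []

for i in range(n_total):
    r[i % block] = i ** 2    # some function to compute and save
    if (i + 1) % block == 0: # buffer is full: elements i-99 .. i are stored
        means.append(np.mean(r))

# Same result without a loop: reshape into rows of 100 and average each row
values = np.arange(n_total) ** 2
block_means = values.reshape(-1, block).mean(axis=1)

print(means[0], block_means[0])
```

There is no need to reset the buffer to zeros after each mean, because every slot is overwritten before the next mean is taken.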

We Keep Coding
