How to calculate average values of list elements in python?

I'd like to calculate the average of the values in a list. To do so, I wrote a function that takes a list as a parameter, calculates the averages, and returns them as a list.
Here is the signal:
import random
random_data = [10 * random.uniform(0,1) for i in range(1000)]
peak = [100 * random.uniform(0,1) for i in range(50)] + [0] * 950
random.shuffle(peak)
signal = [peak[x] + random_data[x] for x in range(len(random_data))]
And now, I'd like to calculate m as follows.
'''
m1 = 1/(number of signal) * x1
m2 = 1/(number of signal) * (x1+x2)
m3 = 1/(number of signal) * (x1+x2+x3)
...
'''
I wrote the following function to calculate m. How would I change the function to return a list of the m values?
def mean_values(s):
    for i in range(len(s)):
        m[i] = 1/len(s)*s[i]
    return m[i]
mean_values(signal)
#mean_values(np.array(signal)

It makes more sense for m to be a float than a list.
The mean is
s = 1 / n * Σxi
You can update a previous mean incrementally with
s' = s + (x1 - s) / n1
where s is the latest mean, x1 is the new value, and n1 is the new length.
NumPy also has a built-in function, np.mean(), which does this and accepts Python lists too.
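Here is a minimal sketch of both readings, assuming plain Python lists. mean_values follows the question's definition literally (each partial sum divided by the full length of the signal), while running_means is a hypothetical helper name for the incremental update described in the answer (each partial sum divided by the number of values seen so far).

```python
from itertools import accumulate

def mean_values(s):
    """Return [m1, m2, ...] with m_k = (x1 + ... + xk) / len(s), as in the question."""
    n = len(s)
    return [total / n for total in accumulate(s)]

def running_means(s):
    """Return the mean of the first k values for every k, using s' = s + (x - s) / k."""
    means = []
    mean = 0.0
    for k, x in enumerate(s, start=1):
        mean += (x - mean) / k   # incremental update from the answer
        means.append(mean)
    return means

print(mean_values([1, 2, 3]))
print(running_means([1, 2, 3]))
```

Both functions return a list with one entry per input value, so either can be called on signal directly.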

Related

making python code faster using numba, cache or any other optimization

I have defined the following function
def laplacian_2D_array(func_2D):
    func_2nd_derv_x_fin2D = np.zeros((N,N))
    for j in range (0,N):
        func_CD2_x_list = []
        for i in range (0,N):
            value = func_2D[i][j] #func_2D is a NxN matrix
            func_CD2_x_list.append(value)
        func_CD2_x_array = np.array (func_CD2_x_list)[np.newaxis]
        func_2nd_derv_x = matrix_R @ (np.transpose(func_CD2_x_array))
        func_2nd_derv_x_fin2D[j] = np.transpose (np.reshape(func_2nd_derv_x,[N,]))
    func_2nd_derv_y_fin2D = np.zeros((N,N))
    for i in range (0,N):
        func_CD2_y_list = []
        for j in range (0,N):
            value = func_2D[i][j]
            func_CD2_y_list.append(value)
        func_CD2_y_array = np.array (func_CD2_y_list)[np.newaxis]
        func_2nd_derv_y = matrix_S @ (np.transpose(func_CD2_y_array))
        func_2nd_derv_y_fin2D[i] = (np.reshape(func_2nd_derv_y,[N,]))
    return (np.add(func_2nd_derv_x_fin2D, func_2nd_derv_y_fin2D))
In the above code, each column j of the 2D matrix func_2D is extracted as a column vector, multiplied by matrix_R, and stored in row j of func_2nd_derv_x_fin2D. The second block does the same for each row i with matrix_S, and the function finally returns the sum of func_2nd_derv_x_fin2D and func_2nd_derv_y_fin2D.
Here N = 401, and matrix_S and matrix_R are also N x N matrices. This function is called multiple times in a while loop, and a single iteration takes a lot of time. I have tried @njit to make it faster, but I was not successful and got errors. I have also tried using cache.
How can the lists and arrays here be optimised, and what other ways are there to optimise the function?
Here is the code where the function is used.
while (time<timemax):
    #Analytical Solution----------------------------------------------------------------------------
    exact_time = time/a_sec
    omega_t = np.zeros((N,N))
    psi_t = np.zeros((N,N))
    for i in range (0,N):
        for j in range (0,N):
            psi_t[i][j] = np.sin(x_list[i]) * np.sin(y_list[j]) * np.exp((-2*exact_time)/Re)
            omega_t[i][j] = 2*np.sin (x_list[i]) * np.sin(y_list[j]) * np.exp((-2*exact_time)/Re)
    .
    .
    .
    .
    .
    .
    .
    # BiCGSTAB algo
    x0 = psi_0 #initial guess--> psi of previous time step
    r0 = omega_0 - laplacian_2D_array(x0) # r0 = b-Ax0
    r0_hat = r0
    rho_0 = 1
    alpha = 1
    w0 = 1
    v0 = np.zeros((N,N))
    P_0 = np.zeros((N,N))
    tol = 10 ** (-7)
    iteration = 0
    while ((np.max(np.abs(laplacian_2D_array(x0) - omega_0))) < tol):
        rho_prev = rho_0
        rho = np.dot ((np.reshape(r0_hat,(N**2,1))),(np.reshape(r0,(N**2,1))))
        beta = (rho/rho_prev) * (alpha/w0)
        P_0 = r0 + beta * (P_0 - w0 * v0)
        v0 = laplacian_2D_array(P_0)
        alpha = rho / np.dot ((np.reshape(r0_hat,(N**2,1))),(np.reshape(v0,(N**2,1))))
        s = r0 - alpha * v0
        t = laplacian_2D_array(s)
Any suggestions to speed up the code would be highly appreciated.

It turns out that the complicated code of laplacian_2D_array can be simplified to the following implementation:
def laplacian_2D_array(func_2D):
    return (matrix_R @ func_2D + matrix_S @ func_2D.T).T
This is 63 times faster on my machine on random matrices based on your provided inputs. Most of the time should be spent in the matrix multiplication (performed very efficiently in parallel if NumPy/Python/BLAS is installed correctly on the target platform).
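To see why the one-liner matches the loops, note that row j of the x-part is matrix_R applied to column j of func_2D, and row i of the y-part is matrix_S applied to row i of func_2D. The sketch below checks this numerically on small random matrices. The size N = 5, the random stand-ins for matrix_R and matrix_S, and the names laplacian_loops and laplacian_vectorized are assumptions for illustration only.

```python
import numpy as np

N = 5  # small illustrative size; the question uses N = 401
rng = np.random.default_rng(0)
matrix_R = rng.random((N, N))  # random stand-ins for the question's matrices
matrix_S = rng.random((N, N))

def laplacian_loops(func_2D):
    """Condensed version of the question's loops."""
    out_x = np.zeros((N, N))
    out_y = np.zeros((N, N))
    for j in range(N):
        out_x[j] = matrix_R @ func_2D[:, j]   # column j of the input goes to row j
    for i in range(N):
        out_y[i] = matrix_S @ func_2D[i, :]   # row i of the input goes to row i
    return out_x + out_y

def laplacian_vectorized(func_2D):
    """One-line form from the answer."""
    return (matrix_R @ func_2D + matrix_S @ func_2D.T).T

F = rng.random((N, N))
print(np.allclose(laplacian_loops(F), laplacian_vectorized(F)))
```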

How do I calculate standard deviation in python without using numpy?

I'm trying to calculate standard deviation in python without the use of numpy or any external library except for math. I want to get better at writing algorithms and am just doing this as a bit of "homework" as I improve my python skills. My goal is to translate this formula into python but am not getting the correct result.
I'm using an array of speeds where speeds = [86,87,88,86,87,85,86]
When I run:
std_dev = numpy.std(speeds)
print(std_dev)
I get: 0.903507902905. But I don't want to rely on numpy. So...
My implementation is as follows:
import math
speeds = [86,87,88,86,87,85,86]
def get_mean(array):
    sum = 0
    for i in array:
        sum = sum + i
    mean = sum/len(array)
    return mean
def get_std_dev(array):
    # get mu
    mean = get_mean(array)
    # (x[i] - mu)**2
    for i in array:
        array = (i - mean) ** 2
        return array
    sum_sqr_diff = 0
    # get sigma
    for i in array:
        sum_sqr_diff = sum_sqr_diff + i
        return sum_sqr_diff
    # get mean of squared differences
    variance = 1/len(array)
    mean_sqr_diff = (variance * sum_sqr_diff)
    std_dev = math.sqrt(mean_sqr_diff)
    return std_dev
std_dev = get_std_dev(speeds)
print(std_dev)
Now when I run:
std_dev = get_std_dev(speeds)
print(std_dev)
I get: [0] but I am expecting 0.903507902905
What am I missing here?

The problem in your code is the reuse of array and the return in the middle of the loop:
def get_std_dev(array):
    # get mu
    mean = get_mean(array)           # <-- this is 86.4
    # (x[i] - mu)**2
    for i in array:
        array = (i - mean) ** 2      # <-- this is almost 0
        return array                 # <-- this is the value returned
Now let us look at the algorithm you are using. Note that there are two standard deviation formulas in common use: the population form, which divides by n, and the sample form, which divides by n - 1.
sqrt(sum((x - mean)^2) / n)
or
sqrt(sum((x - mean)^2) / (n - 1))
For large values of n, the first formula is often used since the -1 is insignificant. It is also what numpy.std computes by default. The first formula can be reduced to
sqrt(sum(x^2) / n - mean^2)
So how would you do this in Python?
def std_dev1(array):
    n = len(array)
    mean = sum(array) / n
    sumsq = sum(v * v for v in array)
    return (sumsq / n - mean * mean) ** 0.5
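As a rough illustration of the difference between the two formulas, the sketch below computes both by hand and compares them with the standard library's statistics.pstdev (divides by n) and statistics.stdev (divides by n - 1). The helper names population_std and sample_std are made up for this example.

```python
import math
import statistics

speeds = [86, 87, 88, 86, 87, 85, 86]

def population_std(values):
    """sqrt(sum((x - mean)^2) / n)"""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def sample_std(values):
    """sqrt(sum((x - mean)^2) / (n - 1))"""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

# The hand-written versions should agree with the statistics module
print(population_std(speeds), statistics.pstdev(speeds))
print(sample_std(speeds), statistics.stdev(speeds))
```

With only seven values, the two results differ noticeably, so it matters which one is expected.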

speeds = [86,87,88,86,87,85,86]
# Calculate the mean of the values in your list
mean_speeds = sum(speeds) / len(speeds)
# Calculate the variance of the values in your list
# This is 1/N * sum((x - mean(X))^2)
var_speeds = sum((x - mean_speeds) ** 2 for x in speeds) / len(speeds)
# Take the square root of variance to get standard deviation
sd_speeds = var_speeds ** 0.5
>>> sd_speeds
0.9035079029052513

There are some problems in the code; one of them is the return inside the for statement. You can try this:
import math
def get_mean(array):
    return sum(array) / len(array)
def get_std_dev(array):
    n = len(array)
    mean = get_mean(array)
    squares_arr = []
    for item in array:
        squares_arr.append((item - mean) ** 2)
    return math.sqrt(sum(squares_arr) / n)

If you don't want to use numpy, that's fine. Give the statistics package in the Python standard library a try:
import statistics
st_dev = statistics.pstdev(speeds)
print(st_dev)
Or, if you still want a custom solution, I recommend the following approach, which uses a generator expression instead of your more complex, buggy one:
import math
mean = sum(speeds) / len(speeds)
var = sum((l-mean)**2 for l in speeds) / len(speeds)
st_dev = math.sqrt(var)
print(st_dev)

You need to get rid of the return statements inside the for loops.
def get_std_dev(array):
    # get mu
    mean = get_mean(array)
    sum_sqr_diff = 0
    # get sigma
    for i in array:
        sum_sqr_diff = sum_sqr_diff + (i - mean)**2
    # get mean of squared differences
    variance = 1/len(array)
    mean_sqr_diff = (variance * sum_sqr_diff)
    std_dev = math.sqrt(mean_sqr_diff)
    return std_dev

Distance between list of points based on latitude/longitude

I have this list of coordinates and I need the total sum of the distances between them.
track = [[49.16967, 20.21491, 1343],
[49.17066, 20.22002, 1373],
[49.16979, 20.22416, 1408],
[49.17077, 20.22186, 1422],
[49.17258, 20.22094, 1467],
[49.17294, 20.21944, 1460]]
So far I have a basic formula for calculating the distance between two sets of coordinates:
import math
def distance(lat_start, lon_start, lat_ciel, lon_ciel):
    R = 6371000
    lat_start = math.radians(lat_start)
    lon_start = math.radians(lon_start)
    lat_ciel = math.radians(lat_ciel)
    lon_ciel = math.radians(lon_ciel)
    DiffLat = lat_ciel - lat_start
    DiffLon = lon_ciel - lon_start
    a = math.sin(DiffLat/2) ** 2 + math.cos(lat_start) * math.cos(lat_ciel) * math.sin(DiffLon / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return R * c
I am stuck on the next step. I tried creating a different function that uses the existing distance function, takes each set of coordinates, calculates the distance, and adds the resulting numbers together.
Thanks for any help.

import math
from itertools import combinations
def distance(lat_start, lon_start, lat_ciel, lon_ciel):
    R = 6371000
    lat_start = math.radians(lat_start)
    lon_start = math.radians(lon_start)
    lat_ciel = math.radians(lat_ciel)
    lon_ciel = math.radians(lon_ciel)
    DiffLat = lat_ciel - lat_start
    DiffLon = lon_ciel - lon_start
    a = math.sin(DiffLat/2) ** 2 + math.cos(lat_start) * math.cos(lat_ciel) * math.sin(DiffLon / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return R * c
def sum_distance(track):
    return sum(map(lambda p: distance(*p[0][:2], *p[1][:2]), combinations(track, 2)))
my_track = [[49.16967, 20.21491, 1343],
[49.17066, 20.22002, 1373],
[49.16979, 20.22416, 1408],
[49.17077, 20.22186, 1422],
[49.17258, 20.22094, 1467],
[49.17294, 20.21944, 1460]]
print(sum_distance(my_track)) # 5252.0327870706005
Explanation
combinations(...) from https://docs.python.org/2/library/itertools.html#itertools.combinations provides all combinations of pairs
lambda p: distance(*p[0][:2], *p[1][:2]) computes the distance for a pair, with p[0] and p[1] being the first and second elements of the pair
[:2] is a slice to get the first two elements (i.e. lat/long)
*p[x][:2] unpacks the first two elements as arguments to the distance function
map(...) generates the distance for every pair
sum(...) sums up the distances of the pairs
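Note that combinations(track, 2) pairs every point with every other point, so the total above covers all 15 pairs. If the list is meant as a path travelled in order (an assumption, since the question does not say), only consecutive points should be paired. A minimal sketch of that variant follows; path_length is a hypothetical name, and lat_end/lon_end are renamed parameters.

```python
import math

def distance(lat_start, lon_start, lat_end, lon_end):
    """Haversine distance in metres, using the same formula as the question."""
    R = 6371000
    lat_start, lon_start = math.radians(lat_start), math.radians(lon_start)
    lat_end, lon_end = math.radians(lat_end), math.radians(lon_end)
    diff_lat = lat_end - lat_start
    diff_lon = lon_end - lon_start
    a = (math.sin(diff_lat / 2) ** 2
         + math.cos(lat_start) * math.cos(lat_end) * math.sin(diff_lon / 2) ** 2)
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

def path_length(track):
    """Sum the distances between consecutive points; the third column is ignored."""
    return sum(distance(*a[:2], *b[:2]) for a, b in zip(track, track[1:]))

track = [[49.16967, 20.21491, 1343],
         [49.17066, 20.22002, 1373],
         [49.16979, 20.22416, 1408],
         [49.17077, 20.22186, 1422],
         [49.17258, 20.22094, 1467],
         [49.17294, 20.21944, 1460]]
print(path_length(track))
```

zip(track, track[1:]) yields (point 0, point 1), (point 1, point 2), and so on, so a track of n points contributes n - 1 segments.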

Speed up distance calculations, sliding window

I have two time series, A and B. A has length m and B has length n, with m << n. Both have dimension d.
I calculate the distance between A and all subsequences of B by sliding A over B.
In Python the code looks like this.
def sliding_dist(A,B):
    m = len(A)
    n = len(B)
    dist = np.zeros(n)
    for i in range(n-m):
        subrange = B[i:i+m,:]
        distance = np.linalg.norm(A-subrange)
        dist[i] = distance
    return dist
This code takes a lot of time to execute, and I have a great many calculations to do.
I need to speed up the calculations. My guess is that I could do this by using convolutions and multiplication in the frequency domain (FFT). However, I have been unable to implement it.
Any ideas? :) Thanks

norm(A - subrange) isn't a convolution in itself, but it may be expressed as:
sqrt(dot(A, A) + dot(subrange, subrange) - 2 * dot(A, subrange))
How to calculate each term fast:
dot(A, A) - this is just a constant.
dot(subrange, subrange) - this can be calculated in O(1) (per position) using a recursive approach.
dot(A, subrange) - this is a convolution in this context. So this can be calculated in the frequency domain via the convolution theorem [1].
Note, however, that you're unlikely to see a performance improvement if the subrange size is only 10.
[1] AKA fast convolution.
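The three terms can be sketched as follows, assuming 1-D series for simplicity (the question's series have d dimensions). The name sliding_dist_expanded is hypothetical. The sliding dot product uses np.correlate directly, not an FFT; for long A, an FFT-based routine such as scipy.signal.fftconvolve on the reversed A could be swapped in. Unlike the question's loop, this returns all n - m + 1 windows.

```python
import numpy as np

def sliding_dist_expanded(A, B):
    """Distance between 1-D A and every length-m window of 1-D B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    m = len(A)
    aa = np.dot(A, A)                                 # constant term
    csum = np.concatenate(([0.0], np.cumsum(B * B)))  # prefix sums of B squared
    bb = csum[m:] - csum[:-m]                         # sum of squares per window
    ab = np.correlate(B, A, mode="valid")             # dot(A, window) for every window
    d2 = aa + bb - 2.0 * ab
    return np.sqrt(np.maximum(d2, 0.0))               # clip tiny negatives from rounding

rng = np.random.default_rng(0)
A = rng.random(10)
B = rng.random(500)
naive = np.array([np.linalg.norm(A - B[i:i + len(A)]) for i in range(len(B) - len(A) + 1)])
print(np.allclose(naive, sliding_dist_expanded(A, B)))
```

Because the expansion subtracts nearly equal quantities when a window is very close to A, it can lose some precision there, which is why the result is clipped at zero before the square root.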

Here is an implementation with matrix operations, as I mentioned in the comment. The idea is to evaluate the norm step by step. In your case the i'th value is:
d[i] = sqrt((A[0] - B[i])^2 + (A[1] - B[i+1])^2 + ... + (A[m-1] - B[i+m-1])^2)
The first three lines of sd_2 calculate the sum of squares, and the last line takes the sqrt().
The speed-up is ~60x.
import numpy
import time
def sliding_dist(A, B):
    m = len(A)
    n = len(B)
    dist = numpy.zeros(n-m)
    for i in range(n-m):
        subrange = B[i:i+m]
        distance = numpy.linalg.norm(A-subrange)
        dist[i] = distance
    return dist
def sd_2(A, B):
    m = len(A)
    dist = numpy.square(A[0] - B[:-m])
    for i in range(1, m):
        dist += numpy.square(A[i] - B[i:-m+i])
    return numpy.sqrt(dist, out=dist)
A = numpy.random.rand(10)
B = numpy.random.rand(500)
x = 1000
t = time.time()
for _ in range(x):
    d1 = sliding_dist(A, B)
t1 = time.time()
for _ in range(x):
    d2 = sd_2(A, B)
t2 = time.time()
print(numpy.allclose(d1, d2))
print('Orig %0.3f ms, second approach %0.3f ms' % ((t1 - t) * 1000., (t2 - t1) * 1000.))
print('Speedup ', (t1 - t) / (t2 - t1))
Update
This is a 're-implementation' of the norm you need in terms of matrix operations. It is not flexible if you want one of the other norms that numpy offers. A different approach is possible: create a matrix of the sliding windows of B and compute the norm over that whole array, since norm() accepts an axis parameter. Here is an implementation of that approach, but its speed-up is ~40x, which is slower than the previous one.
def sd_3(A, B):
    m = len(A)
    n = len(B)
    bb = numpy.empty((len(B) - m, m))
    for i in range(m):
        bb[:, i] = B[i:-m+i]
    return numpy.linalg.norm(A - bb, axis=1)

How to create an array that can be accessed according to its indices in Numpy?

I am trying to solve the following problem via a Finite Difference Approximation in Python using NumPy:
$u_t = k \, u_{xx}$, on $0 < x < L$ and $t > 0$;
$u(0,t) = u(L,t) = 0$;
$u(x,0) = f(x)$.
I take $u(x,0) = f(x) = x^2$ for my problem.
Programming is not my forte so I need help with the implementation of my code. Here is my code (I'm sorry it is a bit messy, but not too bad I hope):
## This program is to implement a Finite Difference method approximation
## to solve the Heat Equation, u_t = k * u_xx,
## in 1D w/out sources & on a finite interval 0 < x < L. The PDE
## is subject to B.C: u(0,t) = u(L,t) = 0,
## and the I.C: u(x,0) = f(x).
import numpy as np
import matplotlib.pyplot as plt
# definition of initial condition function
def f(x):
    return x^2
# parameters
L = 1
T = 10
N = 10
M = 100
s = 0.25
# uniform mesh
x_init = 0
x_end = L
dx = float(x_end - x_init) / N
#x = np.zeros(N+1)
x = np.arange(x_init, x_end, dx)
x[0] = x_init
# time discretization
t_init = 0
t_end = T
dt = float(t_end - t_init) / M
#t = np.zeros(M+1)
t = np.arange(t_init, t_end, dt)
t[0] = t_init
# Boundary Conditions
for m in xrange(0, M):
    t[m] = m * dt
# Initial Conditions
for j in xrange(0, N):
    x[j] = j * dx
# definition of solution to u_t = k * u_xx
u = np.zeros((N+1, M+1)) # NxM array to store values of the solution
# finite difference scheme
for j in xrange(0, N-1):
    u[j][0] = x**2 #initial condition
for m in xrange(0, M):
    for j in xrange(1, N-1):
        if j == 1:
            u[j-1][m] = 0 # Boundary condition
        else:
            u[j][m+1] = u[j][m] + s * ( u[j+1][m] - #FDM scheme
                        2 * u[j][m] + u[j-1][m] )
        else:
            if j == N-1:
                u[j+1][m] = 0 # Boundary Condition
print u, t, x
#plt.plot(t, u)
#plt.show()
So the first issue I am having is I am trying to create an array/matrix to store values for the solution. I wanted it to be an NxM matrix, but in my code I made the matrix (N+1)x(M+1) because I kept getting an error that the index was going out of bounds. Anyways how can I make such a matrix using numpy.array so as not to needlessly take up memory by creating a (N+1)x(M+1) matrix filled with zeros?
Second, how can I "access" such an array? The real solution u(x,t) is approximated by u(x[j], t[m]), where j is the jth spatial value and m is the mth time value. The finite difference scheme is given by:
u(x[j],t[m+1]) = u(x[j],t[m]) + s * ( u(x[j+1],t[m]) - 2 * u(x[j],t[m]) + u(x[j-1],t[m]) )
(See here for the formulation)
I want to be able to implement the Initial Condition u(x[j],t[0]) = x[j]**2 for all values of j = 0,...,N-1. I also need to implement the Boundary Conditions u(x[0],t[m]) = 0 = u(x[N],t[m]) for all values of m = 0,...,M. Is the nested loop I created the best way to do this? Originally I tried implementing the I.C. and B.C. under two different for loops which I used to calculate the values of the arrays x and t (my code still has comments where I tried to do this).
I think I am just not using the right notation, but I cannot find anywhere in the NumPy documentation how to "call" such an array so as to iterate through each value in the proposed scheme. Can anyone shed some light on what I am doing wrong?
Any help is very greatly appreciated. This is not homework but rather to understand how to program FDM for Heat Equation because later I will use similar methods to solve the Black-Scholes PDE.
EDIT: So when I run my code on line 60 (the last "else" that I use) I get an error that says invalid syntax, and on line 51 (u[j][0] = x**2 #initial condition) I get an error that reads "setting an array element with a sequence." What does that mean?
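On the two errors: "setting an array element with a sequence" is raised because x**2 is a whole array being assigned to the single element u[j][0]; the scalar x[j]**2 is what fits there. The invalid syntax comes from the second else, since an if statement can have only one else branch. The sketch below shows one way the scheme could be laid out under the question's parameters. It treats s = 0.25 as given and uses Python 3 (range in place of xrange). It is an illustration under those assumptions, not a definitive solver. With N intervals there are N + 1 grid points, so an (N+1) x (M+1) array is the natural size and wastes no memory.

```python
import numpy as np

L, N, M, s = 1.0, 10, 100, 0.25   # values taken from the question

x = np.linspace(0.0, L, N + 1)    # N intervals give N + 1 grid points
u = np.zeros((N + 1, M + 1))      # u[j, m] approximates u(x[j], t[m])

u[:, 0] = x ** 2                  # initial condition, one value per grid point
u[0, :] = 0.0                     # boundary condition at x = 0 for all times
u[-1, :] = 0.0                    # boundary condition at x = L for all times

for m in range(M):
    # update all interior points j = 1..N-1 of time level m+1 at once
    u[1:-1, m + 1] = u[1:-1, m] + s * (u[2:, m] - 2 * u[1:-1, m] + u[:-2, m])

print(u[:, -1])                   # solution profile at the final time level
```

Indexing with u[j, m] reads or writes a single value, while slices such as u[1:-1, m] address all interior points of one time level, which removes the need for the inner loop over j and the if/else on the boundaries.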
