best method of making an array - python

I'm new to programming and am a bit unsure about how to write my own for loop. This is what I would like to do:
Let us subdivide the interval [0,1] into n points x_0 = 0, ..., x_{n-1} = 1.
Write a function compute_discrete_u(epsilon, n) that returns two numpy arrays:
x_array contains the coordinates of the n points
u_array contains the discrete values of u at these points.
u(x)=sin(1x+ϵ)
Thank you!

First of all, you do not need a for loop at all. You want to use numpy, so you can use the vectorized operations that numpy is built upon.
Here's the function you are literally asking for (and most likely not how you should solve your problem):
# Do NOT use this.
import numpy as np
def compute_discrete_u(epsilon, n):
    x = np.linspace(0, 1, n)
    return x, np.sin(x + epsilon)
That's quite an awkward API. From a design point-of-view, you are mixing two responsibilities in the function:
Generating a certain x vector
Calculating a u vector based on a mathematical function.
You should not do this for complexity and reusability reasons. What if you want a non-uniform x later on?
So here's what you should do:
import numpy as np
def compute_u(x, epsilon):
    return np.sin(x + epsilon)
x = np.linspace(0, 1, num=101)
u = compute_u(x, epsilon=1e-3)
This is easier to understand because the function is just the mathematical function. Additionally, you can compute u for any x array (or single float) you like. If you do not need compute_u elsewhere, you may even drop it completely and write u = np.sin(x + epsilon).
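Since the question asked about a for loop, here is a minimal sketch comparing an explicit loop with the vectorized call. It assumes u(x) = sin(x + epsilon), as in the answer above; the helper name compute_u_loop is made up for illustration.

```python
import numpy as np

def compute_u(x, epsilon):
    # Vectorized: np.sin is applied to every element of x at once.
    return np.sin(x + epsilon)

def compute_u_loop(x, epsilon):
    # Hypothetical loop version, shown only for comparison.
    u = np.empty(len(x))
    for i, xi in enumerate(x):
        u[i] = np.sin(xi + epsilon)
    return u

x = np.linspace(0, 1, num=5)          # 5 evenly spaced points, endpoints included
u_vec = compute_u(x, epsilon=1e-3)
u_loop = compute_u_loop(x, epsilon=1e-3)
print(np.allclose(u_vec, u_loop))
```

Both versions should give the same values; the vectorized one is shorter and avoids the per-element Python overhead.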

Related

Eliminating Redundancy with (Multiple) Nested For-Loops

for x in range(10):
    for y in range(10):
        for z in range(10):
            if (1111*x + 1111*y + 1111*z) == (10000*y + 1110*x + z):
                print(z)
Is there a way to shorten this code, specifically the first 3 lines where I've used three similar looking for loops? I'm quite new to python so please explain any modules used, if possible.
Well, you're essentially evaluating a function in a 3d coordinate system, with coordinates given by x, y, and z. So let's look at Numpy, which implements arrays in Python. If you're familiar with matlab or IDL, these arrays have similar functionality.
import numpy
x = numpy.arange(10) # Like range, but returns a NumPy array
y = numpy.arange(10)
z = numpy.arange(10)
#Now build a 3d array with every point
#defined by the coordinate arrays
xg, yg, zg = numpy.meshgrid(x,y,z)
#Evaluate your functions
#and store the Boolean result in an array.
mask = (1111*xg + 1111*yg + 1111*zg) == (10000*yg + 1110*xg + zg)
#Print out the z values where the mask is True
print(zg[mask])
Is this more readable? Debatable. Is it shorter? No. But it does leverage array operations which may be faster in certain circumstances.
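If the aim is only to shorten the three nested loops, the standard library's itertools.product is another option. This is a sketch and not part of the answer above; the name solutions is chosen for illustration.

```python
from itertools import product

# product(range(10), repeat=3) yields every (x, y, z) triple in the same
# order as the three nested loops, with x varying slowest.
solutions = [
    (x, y, z)
    for x, y, z in product(range(10), repeat=3)
    if 1111*x + 1111*y + 1111*z == 10000*y + 1110*x + z
]

for x, y, z in solutions:
    print(z)
```

This keeps plain Python integers and needs no third-party module, at the cost of still visiting all 1000 triples one by one.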

Using a function to populate a numpy array

I have made a function that performs a random walk simulation (random_path) and returns a 1D array (of length num_steps +1). I would like to perform a large number of simulations (n_sims) using this function and then examine my results. I can do this using lists and for loops as:
simulations = []
for i in range(0, n_sims):
    current_sim = random_path(x, y, sigma, T, num_steps)
    simulations.append(current_sim)
This works fine. I am wondering if there is a more pythonic way of doing this though? Is it possible to do this using only numpy arrays? That is, instead of setting up simulations as an empty list and then creating a list of arrays with a for loop, can I directly initialise simulations using the function random_path to create an array that I guess ultimately would be of shape (n_sims, num_steps + 1)?
Let's assume that you generate your random walk something like this (and if you don't, you probably should be):
walk = np.r_[0, np.random.normal(scale=sigma, size=N).cumsum()]
To make M simulations, just generate the appropriate number of data points and sum over the correct axis:
walks = np.concatenate((np.zeros((M, 1)), np.random.normal(scale=sigma, size=(M, N)).cumsum(axis=-1)), axis=-1)
You can use a list comprehension:
simulations = [random_path(x, y, sigma, T, num_steps) for i in range(n_sims)]
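If you don't want to explicitly use lists, you can try numpy.vectorize. Because random_path returns an array rather than a scalar, a signature is needed so that the results are stacked into shape (n_sims, num_steps + 1):
import numpy as np
vect_func = np.vectorize(lambda ignored: random_path(x, y, sigma, T, num_steps), signature='()->(n)')
simulations = vect_func(range(n_sims))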
In this particular case, ignored is, in fact, ignored.
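To end up with a single array of shape (n_sims, num_steps + 1), the list of paths can be stacked. The sketch below uses a simplified, hypothetical random_path (its real signature and logic are not shown in the question), so treat the function body and its parameters as assumptions.

```python
import numpy as np

def random_path(sigma, num_steps, rng):
    # Hypothetical stand-in for the asker's random_path: a Gaussian
    # random walk that starts at 0 and has num_steps + 1 points.
    steps = rng.normal(scale=sigma, size=num_steps)
    return np.concatenate(([0.0], np.cumsum(steps)))

rng = np.random.default_rng(0)
n_sims, num_steps, sigma = 5, 100, 0.1

# Each call returns a 1D array; np.stack joins them along a new first axis.
simulations = np.stack([random_path(sigma, num_steps, rng) for _ in range(n_sims)])
print(simulations.shape)
```

The loop over simulations is still a Python loop; only the fully vectorized cumsum approach shown earlier removes it.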

find 2d elements in a 3d array which are similar to 2d elements in another 3d array

I have two 3D arrays and want to identify 2D elements in one array, which have one or more similar counterparts in the other array.
This works in Python 3:
import numpy as np
import random
np.random.seed(123)
A = np.round(np.random.rand(25000,2,2),2)
B = np.round(np.random.rand(25000,2,2),2)
a_index = np.zeros(A.shape[0])
for a in range(A.shape[0]):
    for b in range(B.shape[0]):
        if np.allclose(A[a,:,:].reshape(-1, A.shape[1]), B[b,:,:].reshape(-1, B.shape[1]),
                       rtol=1e-04, atol=1e-06):
            a_index[a] = 1
            break
np.nonzero(a_index)[0]
But of course this approach is awfully slow. Please tell me that there is a more efficient way (and what it is). Thanks.
You are trying to do an all-nearest-neighbor type query. There are special O(n log n) algorithms for this, but I'm not aware of a Python implementation. However, you can use a regular nearest-neighbor search, which is also O(n log n), just a bit slower. For example, scipy.spatial.KDTree or cKDTree.
import numpy as np
import random
np.random.seed(123)
A = np.round(np.random.rand(25000,2,2),2)
B = np.round(np.random.rand(25000,2,2),2)
import scipy.spatial
tree = scipy.spatial.cKDTree(A.reshape(25000, 4))
results = tree.query_ball_point(B.reshape(25000, 4), r=1e-04, p=1)
print([r for r in results if r != []])
# [[14252], [1972], [7108], [13369], [23171]]
query_ball_point() is not an exact equivalent to allclose() but it is close enough, especially if you don't care about the rtol parameter to allclose(). You also get a choice of metric (p=1 for city block, or p=2 for Euclidean).
P.S. Consider using query_ball_tree() for very large data sets. Both A and B have to be indexed in that case.
P.S. I'm not sure what effect the 2d-ness of the elements should have; the sample code I gave treats them as 1d and that is identical at least when using city block metric.
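As a hedged variation on the tree approach: with the Chebyshev metric (p=np.inf) and r equal to atol, a ball query matches exactly those rows whose elements all differ by at most atol, which corresponds to allclose with rtol=0. The sketch below builds the tree on B and queries with A so that the result indexes A directly; the smaller array sizes and coarser rounding are assumptions chosen to keep the example quick.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(123)
A = np.round(rng.random((300, 2, 2)), 1)
B = np.round(rng.random((300, 2, 2)), 1)

# Flatten each 2x2 element into a 4-vector and index B.
tree_b = cKDTree(B.reshape(len(B), -1))

# For every row of A, list the rows of B within max-norm distance atol.
neighbours = tree_b.query_ball_point(A.reshape(len(A), -1), r=1e-6, p=np.inf)

# Indices of A that have at least one counterpart in B.
a_matches = np.array([i for i, hits in enumerate(neighbours) if len(hits) > 0])
print(a_matches)
```

The relative tolerance of allclose is not reproduced here; if rtol matters, the candidates returned by the tree could be re-checked with np.allclose.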
From the docs of np.allclose, we have:
If the following equation is element-wise True, then allclose returns True.
absolute(a - b) <= (atol + rtol * absolute(b))
Using that criterion, we can have a vectorized implementation using broadcasting, customized for the stated problem, like so:
# Setup parameters
rtol,atol = 1e-04, 1e-06
# Use np.allclose criteria to detect true/false across all pairwise elements
mask = np.abs(A[:,None,] - B) <= (atol + rtol * np.abs(B))
# Use the problem context to get final output
out = np.nonzero(mask.all(axis=(2,3)).any(1))[0]
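One practical caveat: the broadcast difference has len(A) * len(B) * 4 elements, which for two 25000-row arrays is 2.5 billion values. A possible workaround, sketched here under the assumption that processing A in blocks is acceptable, is to apply the same criterion chunk by chunk. The function name similar_rows_chunked and the chunk parameter are made up for illustration.

```python
import numpy as np

def similar_rows_chunked(A, B, rtol=1e-4, atol=1e-6, chunk=256):
    """Indices of A whose 2D element is allclose to at least one element of B."""
    tol = atol + rtol * np.abs(B)            # same tolerance term as np.allclose
    hits = []
    for start in range(0, len(A), chunk):
        block = A[start:start + chunk]
        # Shape (len(block), len(B), 2, 2): every block element against every B element.
        mask = np.abs(block[:, None] - B) <= tol
        found = mask.all(axis=(2, 3)).any(axis=1)
        hits.append(start + np.nonzero(found)[0])
    return np.concatenate(hits) if hits else np.array([], dtype=int)

rng = np.random.default_rng(123)
A = np.round(rng.random((300, 2, 2)), 1)
B = np.round(rng.random((300, 2, 2)), 1)
out = similar_rows_chunked(A, B, chunk=64)
print(out)
```

The chunk size trades memory for the number of Python-level iterations; the result should be the same as the one-shot broadcast version.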

Iterative solving of sparse systems of linear equations with (M, N) right-hand side matrix

I would like to solve a sparse system of linear equations A x = b, where A is an (M x M) array, b is an (M x N) array and x is an (M x N) array.
I solve this in three ways:
scipy.linalg.solve(A.toarray(), b.toarray()),
scipy.sparse.linalg.spsolve(A, b),
scipy.sparse.linalg.splu(A).solve(b.toarray()) # returns a dense array
I wish to solve the problem using the iterative scipy.sparse.linalg methods:
scipy.sparse.linalg.cg,
scipy.sparse.linalg.bicg,
...
However, these methods support only a right-hand side b with shape (M,) or (M, 1). Any ideas on how to extend these methods to an (M x N) array b?
A key difference between iterative solvers and direct solvers is that direct solvers can more efficiently solve for multiple right-hand values by using a factorization (usually either Cholesky or LU), while iterative solvers can't. This means that for direct solvers there is a computational advantage to solving for multiple columns simultaneously.
For iterative solvers, on the other hand, there's no computational gain to be had in simultaneously solving multiple columns, and this is probably why matrix solutions are not supported natively in the API of cg, bicg, etc.
Because of this, a direct solution like scipy.sparse.linalg.spsolve will probably be optimal for your case. If for some reason you still desire an iterative solution, I'd just create a simple convenience function like this:
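import numpy as np
from scipy.sparse.linalg import bicg
def bicg_solve(M, B):
    # Solve M x = b for each column b of B; info holds one convergence flag per column
    X, info = zip(*(bicg(M, b) for b in B.T))
    return np.transpose(X), info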
Then you can create some data and call it as follows:
import numpy as np
from scipy.sparse import csc_matrix
# create some matrices
M = csc_matrix(np.random.rand(5, 5))
B = np.random.rand(5, 4)
X, info = bicg_solve(M, B)
print(X.shape)
# (5, 4)
Any iterative solver API which accepts a matrix on the right-hand-side will essentially just be a wrapper for something like this.
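The same column-by-column wrapper can be written for cg. The sketch below is an illustration rather than a recommendation: it assumes a symmetric positive definite matrix (which cg requires), uses a small tridiagonal test matrix invented for the example, and relies on the solver's default tolerance. The name cg_solve_columns is hypothetical.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

def cg_solve_columns(A, B):
    """Solve A X = B one column at a time with conjugate gradients."""
    columns, infos = [], []
    for b in B.T:
        x, info = cg(A, b)   # info == 0 means the solver reports convergence
        columns.append(x)
        infos.append(info)
    return np.column_stack(columns), infos

# Symmetric, diagonally dominant tridiagonal test matrix (assumed for this sketch).
n = 50
A = diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
B = np.random.default_rng(0).random((n, 3))

X, infos = cg_solve_columns(A, B)
print(X.shape, infos)
```

Checking each info flag is worthwhile, since an iterative solver can stop without converging for some columns and not others.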

root mean square in numpy and complications of matrix and arrays of numpy

Can anyone direct me to the section of the NumPy manual that covers functions for root mean square calculations?
(I know this can be done with np.mean and np.abs. Isn't there a built-in? If not, why? Just curious, no offense.)
Can anyone also explain the complications of matrices and arrays, just in the following case:
U is a matrix (T-by-N), and Ue is another matrix (T-by-N).
U[ind,:] is still a matrix.
I define k as a numpy array in the following fashion:
k = np.array(U[ind,:])
When I print k or type k in IPython, it displays the following:
K = array ([[2,.3 .....
......
9]])
You see the double square brackets (which makes it multi-dim i guess)
which gives it the shape = (1,N)
but I can't assign it to array defined in this way
l = np.zeros(N)
shape = (,N) or perhaps (N,) something like that
l[:] = k[:]
error:
matrix dimensions incompatible
Is there a way to accomplish the vector assignment I intend? Please don't tell me to do l = k; that defeats the purpose, and I get different errors in my program. I know the reasons, and I can attach the code if needed.
Writing a loop is the dumb way, which I'm using for the time being.
I hope I was able to explain the problems I'm facing.
Regards
For the RMS, I think this is the clearest:
from numpy import mean, sqrt, square, arange
a = arange(10) # For example
rms = sqrt(mean(square(a)))
The code reads like you say it: "root-mean-square".
For rms, the fastest expression I have found for small x.size (~ 1024) and real x is:
def rms(x):
    return np.sqrt(x.dot(x)/x.size)
This seems to be around twice as fast as the linalg.norm version (ipython %timeit on a really old laptop).
If you want complex arrays handled more appropriately then this also would work:
def rms(x):
    return np.sqrt(np.vdot(x, x)/x.size)
However, this version is nearly as slow as the norm version and only works for flat arrays.
For the RMS, how about
norm(V)/sqrt(V.size)
I don't know why it's not built in. I like
def rms(x, axis=None):
    return sqrt(mean(x**2, axis=axis))
If you have nans in your data, you can do
def nanrms(x, axis=None):
    return sqrt(nanmean(x**2, axis=axis))
Try this:
U = np.zeros((N,N))
ind = 1
k = np.zeros(N)
k[:] = U[ind,:]
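For the case in the question, where U is an np.matrix rather than a plain array, one way to get a true 1D vector is to convert the row and flatten it. This is a sketch with made-up sizes and values; only np.matrix, np.asarray and ravel are real NumPy names here.

```python
import numpy as np

N = 4
U = np.matrix(np.arange(12.0).reshape(3, N))  # np.matrix objects are always 2D
ind = 1

row = U[ind, :]               # still a matrix, shape (1, N)
k = np.asarray(row).ravel()   # plain ndarray, shape (N,)

l = np.zeros(N)
l[:] = k                      # element-wise copy into the existing 1D array
print(row.shape, k.shape, l)
```

The double square brackets in the printed output are the sign of the (1, N) shape; after ravel there is a single pair.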
I use this for RMS, all using NumPy, and let it also have an optional axis similar to other NumPy functions:
import numpy as np
rms = lambda V, axis=None: np.sqrt(np.mean(np.square(V), axis))
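To see that the variants in this thread agree, here is a small comparison sketch. It writes the function with np.abs so that complex input is handled as well; that choice, and the variable names, are assumptions made for this example.

```python
import numpy as np

def rms(x, axis=None):
    # abs() before squaring keeps the result real for complex input.
    return np.sqrt(np.mean(np.abs(x) ** 2, axis=axis))

a = np.arange(10)

rms_mean = rms(a)                                  # sqrt(mean(square))
rms_dot = np.sqrt(a.dot(a) / a.size)               # dot-product form, 1D real input
rms_norm = np.linalg.norm(a) / np.sqrt(a.size)     # norm-based form

print(rms_mean, rms_dot, rms_norm)
print(rms(np.ones((2, 3)), axis=1))                # per-row RMS
```

Which form is fastest depends on array size and dtype, so it is worth timing on your own data.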
If you have complex vectors and are using pytorch, the vector norm is the fastest approach on CPU & GPU:
import torch
batch_size, length = 512, 4096
batch = torch.randn(batch_size, length, dtype=torch.complex64)
scale = 1 / torch.sqrt(torch.tensor(length))
rms_power = batch.norm(p=2, dim=-1, keepdim=True)
batch_rms = batch / (rms_power * scale)
Using a batched vdot like goodboy's approach is 60% slower than the above. Using a naive method similar to deprecated's approach is 85% slower than the above.
