Whether to use Python for huge computations? - python

Sorry for my bad English. I am currently working with Python and I have a problem: filling a matrix of size 10000x10000 is too slow. I am programming a "relaxation method" where I need to test different matrix sizes. What do you think about that? P.S. I wait 3-5 minutes to fill one 10000x10000 matrix.
def CalcMatrix(self):
    for i in range(0, self._n + 1):  # (0, 10000)
        for j in range(0, self._n + 1):  # (0, 10000)
            one = sin(pi * self._x[i])  # x is the vector of size 10000
            two = sin(pi * self._y[j])  # y too
            self._f[i][j] = 2 * pi * pi * one * two  # fill

Native Python loops are very slow. People working with arrays and matrices in Python usually use numpy. There are other tools like Cython and Numba which can dramatically improve the speed in certain circumstances, but the basic idea of numpy is to vectorize the operations and push the hard work down to fast libraries implemented in C and Fortran.
The following code takes only a few seconds on my not-very-fast notebook:
import numpy as np
from numpy import pi
x = np.linspace(0,1,10**4)
y = np.linspace(2,5,10**4)
ans = 2*pi**2 * np.outer(np.sin(pi*x), np.sin(pi*y))
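If you want to keep your class interface, here is a minimal sketch of how the same vectorized fill could drop into CalcMatrix (assuming self._x and self._y are NumPy arrays of grid points; this is an illustration, not your original code):
import numpy as np
from numpy import pi

def CalcMatrix(self):
    # vectorized fill: the matrix is an outer product of two sine vectors
    self._f = 2 * pi**2 * np.outer(np.sin(pi * self._x), np.sin(pi * self._y))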
(PS: If your _n == 10000, then won't your matrix be 10001x10001, not 10000x10000?)

Some improvements are possible. Consider moving part of the computation out of the inner loop:
def CalcMatrix(self):
    for i in range(0, self._n + 1):  # (0, 10000)
        one = sin(pi * self._x[i])  # x is the vector of size 10000
        for j in range(0, self._n + 1):  # (0, 10000)
            two = sin(pi * self._y[j])  # y too
            self._f[i][j] = 2 * pi * pi * one * two  # fill
The value 2 * pi * pi can also be precomputed and stored in a variable so it is not recomputed on every iteration.
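For example, a sketch of the same method with the constant hoisted out of the loops (it still assumes self._x, self._y and self._f exist, as in the question):
from math import sin, pi

def CalcMatrix(self):
    c = 2 * pi * pi  # computed once instead of once per matrix element
    for i in range(0, self._n + 1):
        one = sin(pi * self._x[i])
        for j in range(0, self._n + 1):
            self._f[i][j] = c * one * sin(pi * self._y[j])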
If that is still not enough, consider using a native language like C or Fortran.

Related

Vector Normalization in Python

I'm trying to port this MATLAB function to Python:
fs = 128;
x = (0:1:999)/fs;
y_orig = sin(2*pi*15*x);
y_noised = y_orig + 0.5*randn(1,length(x));
[yseg] = mapstd(y_noised);
I wrote this code (which works, so there are no problems with missing variables or anything else):
Norm_Y = 0
Y_Normalized = []
for i in range(0, len(YSeg), 1):
    Norm_Y = Norm_Y + (pow(YSeg[i], 2))
Norm_Y = sqrt(Norm_Y)
for i in range(0, len(YSeg), 1):
    Y_Normalized.append(YSeg[i] / Norm_Y)
    print("%3d %f" % (i, Y_Normalized[i]))
YSeg is Y_Noised (I assign it in another section of the code).
Now I don't expect the values to be the same between the MATLAB code and mine, because YSeg / Y_Noised are generated from random values, so it's fine that they differ, but they are TOO different.
These are the first 10 values in Matlab:
0.145728655284548
1.41918657039301
1.72322238170491
0.684826842884694
0.125379108969931
-0.188899711186140
-1.03820858801652
-0.402591786430960
-0.844782236884026
0.626897216311757
While these are the first 10 numbers from my Python code:
0.052015
0.051132
0.041209
0.034144
0.034450
0.003812
0.048629
0.016854
0.024484
0.021435
It's as if mine are 100 times lower, so I feel like I've missed a step during the normalization. Can you help?
You can normalize a vector quite easily in python with numpy:
import numpy as np
def normalize_vector(input_vector):
    return input_vector / np.sqrt(np.sum(input_vector**2))
random_vec = np.random.rand(10)
vec_norm = normalize_vector(random_vec)
print(vec_norm)
You can call the provided function with your input vector (YSeg) and check the output; I would expect output similar to MATLAB's.
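For instance, a minimal sketch of that call, using a synthetic noisy signal in place of YSeg (the signal below is made up, only shaped like the one in the question):
import numpy as np

fs = 128
x = np.arange(1000) / fs
y_noised = np.sin(2 * np.pi * 15 * x) + 0.5 * np.random.randn(len(x))

vec_norm = normalize_vector(y_noised)  # normalize_vector as defined above
print(vec_norm[:10])                   # first 10 values, for comparison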
This is an implementation in numpy:
import numpy as np
fs = 127
x = np.arange(10000) / fs
y_orig = np.sin(2 * np.pi * 15 * x)
y_noised = y_orig + 0.5 * np.random.randn(len(x))
yseg = (y_noised - y_noised.mean()) / y_noised.std()
However, why do you consider the values to be "too much different"? After all, the values of y_orig are in range [-1, 1] and you are randomly distorting them by ~0.4 on average.

How to speed up nested for loop in python

I want to calculate the cM (centimorgan) distance between two different windows along a chromosome.
My code has three nested loops.
For this sample, I use random numbers to stand in for the recombination map.
import random

windnr = 54800
w, h = windnr, windnr
recmatrix = [[0 for x in range(w)] for y in range(h)]
# Generate 54800 random numbers between 10 and 30
rec_map = random.sample(range(0, 30), 54800)
for i in range(windnr):
    for j in range(windnr):
        recmatrix[i][j] = 0.25 * rec_map[i]  # mean distance within own window
        if i > j:
            recmatrix[i][j] = recmatrix[i][j] + 0.5 * rec_map[j]  # + mean distance final window
            for k in range(i-1, j, -1):
                recmatrix[i][j] = recmatrix[i][j] + rec_map[k]  # add all windows between i and j
        if i < j:
            recmatrix[i][j] = recmatrix[i][j] + 0.5 * rec_map[j]  # + mean distance final window
            for k in range(i+1, j):
                recmatrix[i][j] = recmatrix[i][j] + rec_map[k]  # add all windows between i and j
        #j += 1
    if i % 10 == 0:
        print("window {}".format(i))
    #i += 1
The calculation takes a lot of time: almost 7 days for my data.
Can I get the nested for loop to finish within 10 hours?
How can I increase the performance?
Although the 2D array has 3 billion items (~96 GB when stored as floats), I would rule out hard-disk swapping issues, since the server doing the computation has 200 GB of RAM.
Using Numpy will make your application much faster. Its core is written in C/C++, so it does not suffer from Python's slow loops.
I'm doing my tests on an old Intel Xeon X5550 with 2 sockets, 8 cores and 96 GB of triple-channel RAM. I don't have much experience with Numpy, so bear with me if the code below is not optimal.
Array initialization
Already the initialization is much faster:
recmatrix = [[0 for x in range(w)] for y in range(h)]
needs 24 GB of RAM (integers) and takes 3:28 minutes on my PC. Whereas
recmatrix = np.zeros((windnr, windnr), dtype=int)
is finished after 50 ms. But since you need floats anyway, start with floats from the beginning:
recmatrix = np.zeros((windnr, windnr), dtype=float)
Random samples
The code
#Generate 54800 random numbers between 10 and 30
rec_map = random.sample(range(0, 30), 54800)
did not work for me (random.sample cannot draw 54800 samples without replacement from a range of only 30 values), so I replaced it and increased k for more stable measurements:
rec_map = random.choices(range(0, 30), k=5480000)
which runs in 2.5 seconds. The numpy replacement
rec_map = np.random.choice(np.arange(0, 30), size=5480000)
is done in 0.1 seconds.
The loop
The loop will need the most work, since with Numpy you avoid Python loops whenever possible.
For example, if you have an array and want to multiply all elements by 2, you would not write a loop but simply multiply the whole array:
import numpy as np
single = np.random.choice(np.arange(0, 10), size=100)
doubled = single * 2
print(single, "\r\n", doubled)
I don't fully understand what the code does, but let's apply that strategy on the first part of the loop. The original is
for i in range(windnr):
    for j in range(windnr):
        recmatrix[i][j] = 0.25 * rec_map[i]  # mean distance within own window
and it takes 18.5 seconds with a reduced windnr = 5480. The numpy equivalent should be:
column = 0.25 * rec_map_np
recmatrix = np.repeat(column, windnr).reshape(windnr, windnr)
and is done within 0.25 seconds. Also note: since we're assigning the variable here, we don't need the zero initialization at all.
For the if i>j: and if i<j: parts, I see that the first line is identical
recmatrix[i][j] = recmatrix[i][j] + 0.5 * rec_map[j]
That means, this calculation is applied to all elements except the ones on the diagonal. You can use a mask for that:
mask = np.ones((windnr, windnr), dtype=bool)
np.fill_diagonal(mask, False)
rec_map_2d = np.repeat(0.5 * rec_map_np, windnr-1)
recmatrix[mask] += rec_map_2d
This took only 1:20 minutes for all 54800 elements, but reached my RAM limit at 93 GB.
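To see the diagonal-mask trick in isolation, here is a tiny self-contained sketch (the window count and map values are made up for illustration):
import numpy as np

windnr = 4
rec_map_np = np.arange(windnr, dtype=float)  # toy recombination map

# row i starts as 0.25 * rec_map[i]
recmatrix = np.repeat(0.25 * rec_map_np, windnr).reshape(windnr, windnr)

# add 0.5 * rec_map[i] to every off-diagonal element of row i
mask = np.ones((windnr, windnr), dtype=bool)
np.fill_diagonal(mask, False)
recmatrix[mask] += np.repeat(0.5 * rec_map_np, windnr - 1)

print(recmatrix)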
Looping in Python usually takes a lot of time, so if possible use map in your case; it can save you a lot of time. Since you are already iterating over a list, it is a good fit for this script.
Example:
def func(value):
    # your code goes here
    return value

nu = (1, 2, 3, 4)
output = map(func, nu)
print(list(output))
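As a concrete illustration, the per-window scaling 0.25 * rec_map[i] from the question could be written with map instead of an index loop (the data below is just a small made-up sample):
import random

rec_map = random.choices(range(0, 30), k=10)      # small toy recombination map
quarter = list(map(lambda r: 0.25 * r, rec_map))  # 0.25 * rec_map[i] for each window
print(quarter)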

How to compute the recurrence formula without using array

I'd like to compute y_{n+1} = (a * y_n * b) + (b * y_n * a), where a and b are matrices.
I coded this recurrence using a list to hold every y_n:
yn = []
for n in range(30):
    if n == 0:
        yn.append(a.dot(b) + b.dot(a))
    else:
        yn.append(a.dot(yn[n-1]).dot(b) + b.dot(yn[n-1]).dot(a))
However, it turned out this code doesn't work well for large matrices because of memory problems. So I want another way to compute this without storing the whole list. Can someone help me solve this?
Just one variable.
y = a.dot(b) + b.dot(a)
for n in range(1, 30):
    y = a.dot(y).dot(b) + b.dot(y).dot(a)
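A minimal runnable sketch of this with small random matrices (the sizes and the use of NumPy arrays here are assumptions for illustration):
import numpy as np

rng = np.random.default_rng(0)
n = 4                                # small size just for the demo
a = rng.standard_normal((n, n))
b = rng.standard_normal((n, n))

y = a.dot(b) + b.dot(a)              # y_0
for _ in range(1, 30):
    y = a.dot(y).dot(b) + b.dot(y).dot(a)

print(y.shape)                       # only the latest y_n is kept in memory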

Efficiently coding gradient of function

I'm currently trying to code this beastie in Python (using the numpy library). The lambda * w term is supposed to be outside the summation.
Currently, I've coded the problem using a for loop with a running total outside it; however, this approach takes a long time.
My vectors for y, w, and x are very large - think 100,000s of elements. I was wondering whether there is a way to vectorize this with matrix operations instead of looping through the vectors element by element.
This is my vectorized code:
xty = xtrain.T.dot(ytrain)
e = math.exp(-w_0.T.dot(xty))
gradient = (-xty * (e / (1 + e)) - lambda_var * w_0)
If I understand your problem correctly, you might just have to bite the bullet and go with the loop:
import numpy as np
wave = 1e3
xs, ys, w = np.arange(1, 4), np.arange(4, 7), np.arange(7, 10)
eps = np.zeros(w.T.shape)
for x, y in zip(xs, ys):
    eps += -y * np.exp(-y * w.T * x) * x / (1 + np.exp(-y * w.T * x))
print(eps + wave * w)
[ 7000. 8000. 9000.]

Speed up this interpolation in python

I have an image processing problem I'm currently solving in python, using numpy and scipy. Briefly, I have an image that I want to apply many local contractions to. My prototype code is working, and the final images look great. However, processing time has become a serious bottleneck in our application. Can you help me speed up my image processing code?
I've tried to boil down our code to the 'cartoon' version below. Profiling suggests that I'm spending most of my time on interpolation. Are there obvious ways to speed up execution?
import cProfile, pstats
import numpy
from scipy.ndimage import interpolation

def get_centered_subimage(
        center_point, window_size, image):
    x, y = numpy.round(center_point).astype(int)
    xSl = slice(max(x-window_size-1, 0), x+window_size+2)
    ySl = slice(max(y-window_size-1, 0), y+window_size+2)
    subimage = image[xSl, ySl]
    interpolation.shift(
        subimage, shift=(x, y)-center_point, output=subimage)
    return subimage[1:-1, 1:-1]

"""In real life, this is experimental data"""
im = numpy.zeros((1000, 1000), dtype=float)

"""In real life, this mask is a non-zero pattern"""
window_radius = 10
mask = numpy.zeros((2*window_radius+1, 2*window_radius+1), dtype=float)

"""The x, y coordinates in the output image"""
new_grid_x = numpy.linspace(0, im.shape[0]-1, 2*im.shape[0])
new_grid_y = numpy.linspace(0, im.shape[1]-1, 2*im.shape[1])

"""The grid we'll end up interpolating onto"""
grid_step_x = new_grid_x[1] - new_grid_x[0]
grid_step_y = new_grid_y[1] - new_grid_y[0]
subgrid_radius = numpy.floor(
    (-1 + window_radius * 0.5 / grid_step_x,
     -1 + window_radius * 0.5 / grid_step_y))
subgrid = (
    window_radius + 2 * grid_step_x * numpy.arange(
        -subgrid_radius[0], subgrid_radius[0] + 1),
    window_radius + 2 * grid_step_y * numpy.arange(
        -subgrid_radius[1], subgrid_radius[1] + 1))
subgrid_points = ((2*subgrid_radius[0] + 1) *
                  (2*subgrid_radius[1] + 1))

"""The coordinates of the set of spots we want to contract. In real
life, this set is non-random:"""
numpy.random.seed(0)
num_points = 10000
center_points = numpy.random.random(2*num_points).reshape(num_points, 2)
center_points[:, 0] *= im.shape[0]
center_points[:, 1] *= im.shape[1]

"""The output image"""
final_image = numpy.zeros(
    (new_grid_x.shape[0], new_grid_y.shape[0]), dtype=float)

def profile_me():
    for m, cp in enumerate(center_points):
        """Take an image centered on each illumination point"""
        spot_image = get_centered_subimage(
            center_point=cp, window_size=window_radius, image=im)
        if spot_image.shape != (2*window_radius+1, 2*window_radius+1):
            continue  # Skip to the next spot
        """Mask the image"""
        masked_image = mask * spot_image
        """Resample the image"""
        nearest_grid_index = numpy.round(
            (cp - (new_grid_x[0], new_grid_y[0])) /
            (grid_step_x, grid_step_y))
        nearest_grid_point = (
            (new_grid_x[0], new_grid_y[0]) +
            (grid_step_x, grid_step_y) * nearest_grid_index)
        new_coordinates = numpy.meshgrid(
            subgrid[0] + 2 * (nearest_grid_point[0] - cp[0]),
            subgrid[1] + 2 * (nearest_grid_point[1] - cp[1]))
        resampled_image = interpolation.map_coordinates(
            masked_image,
            (new_coordinates[0].reshape(subgrid_points),
             new_coordinates[1].reshape(subgrid_points))
            ).reshape(2*subgrid_radius[1]+1,
                      2*subgrid_radius[0]+1).T
        """Add the recentered image back to the scan grid"""
        final_image[
            nearest_grid_index[0]-subgrid_radius[0]:
            nearest_grid_index[0]+subgrid_radius[0]+1,
            nearest_grid_index[1]-subgrid_radius[1]:
            nearest_grid_index[1]+subgrid_radius[1]+1,
        ] += resampled_image

cProfile.run('profile_me()', 'profile_results')
p = pstats.Stats('profile_results')
p.strip_dirs().sort_stats('cumulative').print_stats(10)
Vague explanation of what the code does:
We start with a pixellated 2D image, and a set of arbitrary (x, y) points in our image that don't generally fall on an integer grid. For each (x, y) point, I want to multiply the image by a small mask centered precisely on that point. Next we contract/expand the masked region by a finite amount, before finally adding this processed sub-image to a final image, which may not have the same pixel size as the original image. (Not my finest explanation. Ah well).
I'm pretty sure that, as you said, the bulk of the calculation time happens in interpolation.map_coordinates(…), which gets called once for every iteration over center_points, here 10,000 times. Generally, when working with the numpy/scipy stack, you want the repetitive work over a large array to happen in native Numpy/Scipy functions -- i.e. in a C loop over homogeneous data -- rather than explicitly in Python.
One strategy that might speed up the interpolation, but that will also increase the amount of memory used, is (a rough sketch follows this list):
First, fetch all the subimages (here named masked_image) into a 3-dimensional array (window_radius x window_radius x center_points.size).
Make a ufunc (read up on ufuncs, it's useful) that wraps the work that has to be done on each subimage, using numpy.frompyfunc, which should return another 3-dimensional array (subgrid_radius[0] x subgrid_radius[1] x center_points.size). In short, this creates a vectorized version of the Python function that can be broadcast element-wise over an array.
Build the final image by summing over the third dimension.
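A very rough sketch of that strategy (the shapes, names and process_subimage below are assumptions rather than the original code, and the per-spot step is shown with a plain comprehension instead of numpy.frompyfunc):
import numpy as np

def process_subimage(subimage):
    # placeholder for the real per-spot work (masking + map_coordinates)
    return 2.0 * subimage

num_points, size = 1000, 21
# step 1: all subimages stacked into one 3-D array (num_points x size x size)
subimages = np.random.rand(num_points, size, size)

# step 2: apply the per-spot work to every subimage
processed = np.stack([process_subimage(s) for s in subimages])

# step 3: build the final image by summing over the spot axis
final_image = processed.sum(axis=0)
print(final_image.shape)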
Hope that gets you closer to your goals!
