Python Newaxis vs for loop


I am trying to make my program faster.
I have a matrix and a vector:
import numpy as N
GDES = N.array([[1,2,3,4,5],
                [6,7,8,9,10],
                [11,12,13,14,15],
                [16,17,18,19,20],
                [21,22,23,24,25]])
Ene = N.array([1,2,3,4,5])
NN = len(GDES)
I have defined a function for matrix multiplication:
def Gl(n,np,k,q):
    matrix = GDES[k,np]*GDES[k,n]*GDES[q,np]*GDES[q,n]
    return matrix
and I have made a for loop in my calculation:
SIl = N.zeros((NN,NN),N.float)
for n in xrange(NN):
    for np in xrange(NN):
        SumJ = N.sum(N.sum(Gl(n,np,k,q) for q in xrange(NN)) for k in xrange(NN))
        SIl[n,np]=SumJ
print 'SIl:',SIl
output:
SIl: [[ 731025. 828100. 931225. 1040400. 1155625.]
[ 828100. 940900. 1060900. 1188100. 1322500.]
[ 931225. 1060900. 1199025. 1345600. 1500625.]
[ 1040400. 1188100. 1345600. 1512900. 1690000.]
[ 1155625. 1322500. 1500625. 1690000. 1890625.]]
I want to use newaxis to make it faster:
def G():
    Mknp = GDES[:, :, N.newaxis, N.newaxis]
    Mkn = GDES[:, N.newaxis, :, N.newaxis]
    Mqnp = GDES[:, N.newaxis, N.newaxis, :]
    Mqn = GDES[N.newaxis, :, :, N.newaxis]
    matrix=Mknp*Mkn*Mqnp*Mqn
    return matrix
tmp = G()
MGI = N.sum(N.sum(tmp,axis=3), axis=2)
MGI = N.reshape(MGI,(NN,NN))
print 'MGI:', MGI
output:
MGI: [[ 825 3900 9225 16800 26625]
[ 31200 92400 169600 262800 372000]
[ 146575 413400 722475 1073800 1467375]
[ 403200 1116900 1911600 2787300 3744000]
[ 857325 2352900 3980725 5740800 7633125]]
Any idea how I can get the right answer?

Your problem is a perfect fit for np.einsum:
>>> GDES = np.arange(1, 26).reshape(5, 5)
>>> np.einsum('kj,ki,lj,li->ij', GDES, GDES, GDES, GDES)
array([[ 731025, 828100, 931225, 1040400, 1155625],
[ 828100, 940900, 1060900, 1188100, 1322500],
[ 931225, 1060900, 1199025, 1345600, 1500625],
[1040400, 1188100, 1345600, 1512900, 1690000],
[1155625, 1322500, 1500625, 1690000, 1890625]])
For your particular case, this other syntax may be easier to figure out:
>>> np.einsum(GDES, [2,1], GDES, [2,0], GDES, [3,1], GDES, [3,0], [0,1])
array([[ 731025, 828100, 931225, 1040400, 1155625],
[ 828100, 940900, 1060900, 1188100, 1322500],
[ 931225, 1060900, 1199025, 1345600, 1500625],
[1040400, 1188100, 1345600, 1512900, 1690000],
[1155625, 1322500, 1500625, 1690000, 1890625]])
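The newaxis attempt in the question can also be repaired. In the original `G()`, the same axis stands for different indices in different factors (axis 1 is `np` in `Mknp` but `q` in `Mqn`), so the product mixes up the sums. The following is a minimal sketch, assuming the 4-D intermediate is laid out with axes (n, m, k, q), where `m` stands in for the question's `np` index; the names `GT`, `gram` and `MGI_fast` are illustrative and not from the original post. Because the double sum over k and q factorizes into the square of a single sum, the same matrix is also the element-wise square of `GDES.T @ GDES`, which avoids the 4-D array altogether.

```python
import numpy as np

GDES = np.arange(1, 26).reshape(5, 5)
GT = GDES.T  # GT[n, k] == GDES[k, n]

# Axes of the 4-D product are (n, m, k, q); m plays the role of "np"
# in the question's loop.
Mkm = GT[np.newaxis, :, :, np.newaxis]   # GDES[k, m]
Mkn = GT[:, np.newaxis, :, np.newaxis]   # GDES[k, n]
Mqm = GT[np.newaxis, :, np.newaxis, :]   # GDES[q, m]
Mqn = GT[:, np.newaxis, np.newaxis, :]   # GDES[q, n]

# Sum over k (axis 2) and q (axis 3), leaving the (n, m) result.
MGI = (Mkm * Mkn * Mqm * Mqn).sum(axis=(2, 3))

# The double sum factorizes: sum_k sum_q a_k * a_q == (sum_k a_k) ** 2,
# so the result equals the element-wise square of GDES.T @ GDES.
gram = GDES.T @ GDES
MGI_fast = gram ** 2

reference = np.einsum('kj,ki,lj,li->ij', GDES, GDES, GDES, GDES)
print(np.array_equal(MGI, reference), np.array_equal(MGI_fast, reference))
```

The broadcast version builds an array of NN**4 elements, so for large NN the einsum call or the squared matrix product is likely the better choice.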

Related

How to speed up looping arrays as inputs for pandas calculation?

I have two arrays, x and y, and I want to iterate over them as inputs to a pandas calculation.
Here's an example.
Iterating over each x and y and appending the result to the res list is slow.
The calculation takes the exponential of the column scaled by a (that is, exp(-df/a)), sums the result, and multiplies it by b. This is only an example; it could be replaced by any other calculation.
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0,5,size=(5, 1)),columns=['data'])
x = np.linspace(1, 24, 4)
y = np.linspace(10, 1500, 5)
res = []
for a in x:
    for b in y:
        res.append(np.exp(-df/a).sum().values[0]*b)
res = np.array(res).reshape(4, 5)
expected output:
array([[ 11.67676844, 446.63639283, 881.59601721, 1316.5556416 ,
1751.51526599],
[ 37.52524129, 1435.34047927, 2833.15571725, 4230.97095523,
5628.78619321],
[ 42.79406912, 1636.87314392, 3230.95221871, 4825.0312935 ,
6419.1103683 ],
[ 44.93972433, 1718.94445549, 3392.94918665, 5066.95391781,
6740.95864897]])

You can use numpy broadcasting. Here is the loop result for reference, followed by the broadcast version:
res = np.array(res).reshape(4, 5)
print (res)
[[ 11.67676844 446.63639283 881.59601721 1316.5556416 1751.51526599]
[ 37.52524129 1435.34047927 2833.15571725 4230.97095523 5628.78619321]
[ 42.79406912 1636.87314392 3230.95221871 4825.0312935 6419.1103683 ]
[ 44.93972433 1718.94445549 3392.94918665 5066.95391781 6740.95864897]]
res = np.exp(-df.to_numpy()/x).sum(axis=0)[:, None] * y
print (res)
[[ 11.67676844 446.63639283 881.59601721 1316.5556416 1751.51526599]
[ 37.52524129 1435.34047927 2833.15571725 4230.97095523 5628.78619321]
[ 42.79406912 1636.87314392 3230.95221871 4825.0312935 6419.1103683 ]
[ 44.93972433 1718.94445549 3392.94918665 5066.95391781 6740.95864897]]

I think what you want is:
z = -df['data'].to_numpy()
res = np.exp(z/x[:, None]).sum(axis=1)[:, None]*y
output:
array([[ 11.67676844, 446.63639283, 881.59601721, 1316.5556416 ,
1751.51526599],
[ 37.52524129, 1435.34047927, 2833.15571725, 4230.97095523,
5628.78619321],
[ 42.79406912, 1636.87314392, 3230.95221871, 4825.0312935 ,
6419.1103683 ],
[ 44.93972433, 1718.94445549, 3392.94918665, 5066.95391781,
6740.95864897]])
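Both answers rely on the same observation: each entry is the product of a term that depends only on a and the factor b, so the whole table is an outer product. Here is a small NumPy-only sketch of that idea, assuming a plain array `data` stands in for `df['data'].to_numpy()`; the name `s` and the use of `np.outer` are illustrative choices, not part of the original answers.

```python
import numpy as np

# Stand-in for df['data'].to_numpy(); the values are assumed for illustration.
data = np.array([4, 0, 3, 3, 3])
x = np.linspace(1, 24, 4)
y = np.linspace(10, 1500, 5)

# Reference: the original double loop, written as a nested comprehension.
loop = np.array([[np.exp(-data / a).sum() * b for b in y] for a in x])

# Broadcasting: data[:, None] has shape (5, 1) and x has shape (4,),
# so the quotient has shape (5, 4); summing over axis 0 leaves one value per a.
s = np.exp(-data[:, None] / x).sum(axis=0)   # shape (4,)

# Every entry is s[i] * y[j], i.e. the outer product of s and y.
res = np.outer(s, y)                          # shape (4, 5)

print(np.allclose(res, loop))
```

`np.outer(s, y)` computes the same thing as `s[:, None] * y` in the answers above; it only makes the outer-product structure explicit.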

How to do fancy indexing in tensorflow

I have a Tensorflow tensor A with (let's say) shape (5, 3, 5).
I want to get a tensor B with shape (5, 3) such that
# B = [A[0, :, 0], A[1, :, 1], A[2, :, 2], ...]
I want to achieve this indexing without using any for-loops.
Using numpy one would do:
import numpy as np
# A.shape = (5, 3, 5)
B = A[np.arange(A.shape[0]), :, np.arange(A.shape[2])]
Any suggestions how to do this using Tensorflow?

There are two ways to achieve your goal.
import tensorflow as tf
a = tf.random_normal(shape=(5,3,5))
# method 1: take the diagonal after transpose
b_diag = tf.matrix_diag_part(tf.transpose(a,[1,0,2])) # shape = (3,5)
result1 = tf.transpose(b_diag,[1,0])
# method 2: take the value by indices
indices = tf.stack([tf.range(tf.shape(a)[0])]*2,axis=-1)
# [[0 0]
# [1 1]
# [2 2]
# [3 3]
# [4 4]]
result2 = tf.gather_nd(tf.transpose(a,[0,2,1]),indices)
with tf.Session() as sess:
    val_a,val_result1,val_result2 = sess.run([a,result1,result2])
    print('origin matrix:\n',val_a)
    print('method 1:\n',val_result1)
    print('method 2:\n',val_result2)
origin matrix:
[[[ 0.6905094 0.13725948 -0.42244634 -0.19795062 0.02895796]
[-1.2307093 -0.90263253 0.8939539 0.43943858 0.60205126]
[ 0.1317933 0.7697048 -0.8040689 -0.41206598 -0.66366917]]
[[-0.07341296 -0.83268213 1.1547179 -1.035854 -0.43292868]
[ 0.63890094 -1.9335823 -0.61634874 -3.2909455 -1.1862688 ]
[-1.0031502 -0.07485765 0.53183764 0.55050373 -0.03113765]]
[[ 0.23482691 -0.9363624 0.30995724 -0.02038437 0.65965956]
[ 0.73754835 0.23244548 -1.5190666 0.89143264 -0.47610378]
[ 0.6452583 1.5191171 -0.15525642 0.5060588 1.2310679 ]]
[[ 0.32281107 0.80718434 -0.865543 0.5899832 -0.66145474]
[ 0.45294672 -0.31048244 -0.48481905 -1.1497563 1.4231541 ]
[ 0.2343677 -0.8113462 0.58899856 1.6336825 0.11803629]]
[[ 0.8602735 1.3486015 1.4897087 -1.2132328 -0.70290196]
[-2.635646 -0.3950463 0.19890717 -1.9909118 1.3279002 ]
[-0.88162804 -0.7264523 -0.40416357 -0.7689555 1.33081 ]]]
method 1:
[[ 0.6905094 -1.2307093 0.1317933 ]
[-0.83268213 -1.9335823 -0.07485765]
[ 0.30995724 -1.5190666 -0.15525642]
[ 0.5899832 -1.1497563 1.6336825 ]
[-0.70290196 1.3279002 1.33081 ]]
method 2:
[[ 0.6905094 -1.2307093 0.1317933 ]
[-0.83268213 -1.9335823 -0.07485765]
[ 0.30995724 -1.5190666 -0.15525642]
[ 0.5899832 -1.1497563 1.6336825 ]
[-0.70290196 1.3279002 1.33081 ]]
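To check results like these against the NumPy indexing from the question, the sketch below may help. It uses NumPy only, not TensorFlow, and mirrors method 1 (diagonal after rearranging axes). The array `A` and the names `B_diag` and `B_einsum` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3, 5))

# Target from the question: B[i, :] == A[i, :, i]
B = A[np.arange(A.shape[0]), :, np.arange(A.shape[2])]   # shape (5, 3)

# Analogue of method 1: take the diagonal over axes 0 and 2.
# np.diagonal appends the diagonal axis last, giving shape (3, 5),
# hence the transpose.
B_diag = np.diagonal(A, axis1=0, axis2=2).T

# Equivalent einsum form: the repeated index i selects the diagonal.
B_einsum = np.einsum('iji->ij', A)

print(np.allclose(B, B_diag), np.allclose(B, B_einsum))
```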

efficient numpy array creation

Given x, I want to produce x, log(x) as a numpy array whereby x has shape s, the result has shape (*s, 2). What's the neatest way to do this? x may just be a float, in which case I want a result with shape (2,).
An ugly way to do this is:
import numpy as np
x = np.asarray(x)
result = np.empty((*x.shape, 2))
result[..., 0] = x
result[..., 1] = np.log(x)

It's important to separate aesthetics from performance. Sometimes ugly code is
fast. In fact, that's the case here. Although creating an empty array and then
assigning values to slices may not look beautiful, it is fast.
import numpy as np
import timeit
import itertools as IT
import pandas as pd
def using_empty(x):
    x = np.asarray(x)
    result = np.empty(x.shape + (2,))
    result[..., 0] = x
    result[..., 1] = np.log(x)
    return result
def using_concat(x):
    x = np.asarray(x)
    return np.concatenate([x, np.log(x)], axis=-1).reshape(x.shape+(2,), order='F')
def using_stack(x):
    x = np.asarray(x)
    return np.stack([x, np.log(x)], axis=x.ndim)
def using_ufunc(x):
    return np.array([x, np.log(x)])
using_ufunc = np.vectorize(using_ufunc, otypes=[np.ndarray])
tests = [np.arange(600),
         np.arange(600).reshape(20,30),
         np.arange(960).reshape(8,15,8)]
# check that all implementations return the same result
for x in tests:
    assert np.allclose(using_empty(x), using_concat(x))
    assert np.allclose(using_empty(x), using_stack(x))
timing = []
funcs = ['using_empty', 'using_concat', 'using_stack', 'using_ufunc']
for test, func in IT.product(tests, funcs):
    timing.append(timeit.timeit(
        '{}(test)'.format(func),
        setup='from __main__ import test, {}'.format(func), number=1000))
timing = pd.DataFrame(np.array(timing).reshape(-1, len(funcs)), columns=funcs)
print(timing)
This yields the following timeit results on my machine:
   using_empty  using_concat  using_stack  using_ufunc
0     0.024754      0.025182     0.030244     2.414580
1     0.025766      0.027692     0.031970     2.408344
2     0.037502      0.039644     0.044032     3.907487
So using_empty is the fastest of the options tested on these inputs.
Note that np.stack does exactly what you want, so
np.stack([x, np.log(x)], axis=x.ndim)
looks reasonably pretty, but it is also the slowest of the three options tested.
Note that along with being much slower, using_ufunc returns an array of object dtype:
In [236]: x = np.arange(6)
In [237]: using_ufunc(x)
Out[237]:
array([array([ 0., -inf]), array([ 1., 0.]),
array([ 2. , 0.69314718]),
array([ 3. , 1.09861229]),
array([ 4. , 1.38629436]), array([ 5. , 1.60943791])], dtype=object)
which is not the same as the desired result:
In [240]: using_empty(x)
Out[240]:
array([[ 0. , -inf],
[ 1. , 0. ],
[ 2. , 0.69314718],
[ 3. , 1.09861229],
[ 4. , 1.38629436],
[ 5. , 1.60943791]])
In [238]: using_ufunc(x).shape
Out[238]: (6,)
In [239]: using_empty(x).shape
Out[239]: (6, 2)
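The question also asks about the case where x is a plain float. The following is a minimal sketch, assuming `np.stack` with `axis=-1` is acceptable despite being somewhat slower in the timings above; the helper name `x_and_log` is hypothetical.

```python
import numpy as np

def x_and_log(x):
    """Return an array of shape (*x.shape, 2) holding x and log(x)."""
    x = np.asarray(x, dtype=float)
    # axis=-1 puts the new length-2 axis last, for 0-d and n-d inputs alike.
    return np.stack([x, np.log(x)], axis=-1)

scalar_result = x_and_log(2.0)                              # shape (2,)
array_result = x_and_log(np.arange(1, 7).reshape(2, 3))     # shape (2, 3, 2)
print(scalar_result.shape, array_result.shape)
```

The same scalar handling applies to using_empty: after `np.asarray(x)`, a float becomes a 0-d array with shape `()`, so `x.shape + (2,)` is `(2,)`.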

Using array indexing to apply 2D array function on 3D array

I wrote a function that takes in one set of randomized cartesian coordinates and returns the subset that remains within some spatial domain. To illustrate:
grid = np.ones((5,5))
grid = np.pad(grid, ((10,10), (10,10)), 'constant')
>> np.shape(grid)
(25, 25)
random_pts = np.random.random(size=(100, 2)) * len(grid)
def inside(input):
    # np.int was removed in NumPy 1.24; the builtin int behaves the same here
    idx = np.floor(input).astype(int)
    mask = grid[idx[:,0], idx[:,1]] == 1
    return input[mask]
>> inside(random_pts)
array([[ 10.59441506, 11.37998288],
[ 10.39124766, 13.27615815],
[ 12.28225713, 10.6970708 ],
[ 13.78351949, 12.9933591 ]])
But now I want the ability to simultaneously generate n sets of random_pts and keep n corresponding subsets that satisfy the same functional condition. So, if n=3,
random_pts = np.random.random(size=(3, 100, 2)) * len(grid)
Without resorting to a for loop, how could I index my variables such that inside(random_pts) returns something like
array([[[ 17.73323523, 9.81956681],
[ 10.97074592, 2.19671642],
[ 21.12081044, 12.80412997]],
[[ 11.41995519, 2.60974757]],
[[ 9.89827156, 9.74580059],
[ 17.35840479, 7.76972241]]])

One approach -
def inside3d(input):
    # Get idx in 3D
    idx3d = np.floor(input).astype(int)
    # Create a similar mask as with the 2D case, but in 3D now
    mask3d = grid[idx3d[:,:,0], idx3d[:,:,1]]==1
    # Count of mask matches for each index in 0th dim
    counts = np.sum(mask3d,axis=1)
    # Index into input to get masked matches across all elements in 0th dim
    out_cat_array = input.reshape(-1,2)[mask3d.ravel()]
    # Split the rows based on the counts, as the final output
    return np.split(out_cat_array,counts.cumsum()[:-1])
Verify results -
Create 3D random input:
In [91]: random_pts3d = np.random.random(size=(3, 100, 2)) * len(grid)
With inside3d:
In [92]: inside3d(random_pts3d)
Out[92]:
[array([[ 10.71196268, 12.9875877 ],
[ 10.29700184, 10.00506662],
[ 13.80111411, 14.80514828],
[ 12.55070282, 14.63155383]]), array([[ 10.42636137, 12.45736944],
[ 11.26682474, 13.01632751],
[ 13.23550598, 10.99431284],
[ 14.86871413, 14.19079225],
[ 10.61103434, 14.95970597]]), array([[ 13.67395756, 10.17229061],
[ 10.01518846, 14.95480515],
[ 12.18167251, 12.62880968],
[ 11.27861513, 14.45609646],
[ 10.895685 , 13.35214678],
[ 13.42690335, 13.67224414]])]
With inside:
In [93]: inside(random_pts3d[0])
Out[93]:
array([[ 10.71196268, 12.9875877 ],
[ 10.29700184, 10.00506662],
[ 13.80111411, 14.80514828],
[ 12.55070282, 14.63155383]])
In [94]: inside(random_pts3d[1])
Out[94]:
array([[ 10.42636137, 12.45736944],
[ 11.26682474, 13.01632751],
[ 13.23550598, 10.99431284],
[ 14.86871413, 14.19079225],
[ 10.61103434, 14.95970597]])
In [95]: inside(random_pts3d[2])
Out[95]:
array([[ 13.67395756, 10.17229061],
[ 10.01518846, 14.95480515],
[ 12.18167251, 12.62880968],
[ 11.27861513, 14.45609646],
[ 10.895685 , 13.35214678],
[ 13.42690335, 13.67224414]])
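The comparison above can be automated. The sketch below is a self-contained variant that assumes a recent NumPy (`np.pad`, `astype(int)`, `np.random.default_rng`); the parameter name `points` and the variable `subsets` are illustrative. Because each set keeps a different number of points, the result is a list of arrays and not a single regular 3D array.

```python
import numpy as np

grid = np.pad(np.ones((5, 5)), ((10, 10), (10, 10)), 'constant')

def inside(points):
    # 2D case: keep the rows whose grid cell equals 1.
    idx = np.floor(points).astype(int)
    return points[grid[idx[:, 0], idx[:, 1]] == 1]

def inside3d(points):
    idx = np.floor(points).astype(int)
    mask = grid[idx[..., 0], idx[..., 1]] == 1     # shape (n_sets, n_points)
    counts = mask.sum(axis=1)                      # matches per set
    flat = points[mask]                            # shape (total_matches, 2)
    # Cut the flat array back into one chunk per set.
    return np.split(flat, counts.cumsum()[:-1])

rng = np.random.default_rng(0)
pts = rng.random(size=(3, 100, 2)) * len(grid)

subsets = inside3d(pts)
same = all(np.array_equal(s, inside(p)) for s, p in zip(subsets, pts))
print(same)
```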

Sorting an Array Alongside a 2d Array

So I'm using NumPy's linear algebra routines to do some basic computational quantum mechanics. Say I have a matrix, hamiltonian, and I want its eigenvalues and eigenvectors:
import numpy as np
from numpy import linalg as la
hamiltonian = np.zeros((N, N)) # N is some constant I have defined
# fill up hamiltonian here
energies, states = la.eig(hamiltonian)
Now, I want to sort the energies in increasing order, and I want to sort the states along with them. For example, if I do:
groundStateEnergy = min(energies)
groundStateIndex = np.where(energies == groundStateEnergy)
groundState = states[groundStateIndex, :]
I correctly plot the ground state (eigenvector with the lowest eigenvalue). However, if I try something like this:
energies, states = zip(*sorted(zip(energies, states)))
or even
energies, states = zip(*sorted(zip(energies, states), key = lambda pair:pair[0]))
plotting in the same way no longer plots the correct state. So how can I sort states alongside energies, but only by row? (I.e., I want to associate each row of states with a value in energies, and rearrange the rows so that their order matches the sorted order of the values in energies.)

You can use argsort as follows:
>>> x = np.random.random(10)
>>> x
array([ 0.69719108, 0.75828237, 0.79944838, 0.68245968, 0.36232211,
0.46565445, 0.76552493, 0.94967472, 0.43531813, 0.22913607])
>>> y = np.random.random((10))
>>> y
array([ 0.64332275, 0.34984653, 0.55240204, 0.31019789, 0.96354724,
0.76723872, 0.25721343, 0.51629662, 0.13096252, 0.86220311])
>>> idx = np.argsort(x)
>>> idx
array([9, 4, 8, 5, 3, 0, 1, 6, 2, 7])
>>> xsorted= x[idx]
>>> xsorted
array([ 0.22913607, 0.36232211, 0.43531813, 0.46565445, 0.68245968,
0.69719108, 0.75828237, 0.76552493, 0.79944838, 0.94967472])
>>> ysordedbyx = y[idx]
>>> ysordedbyx
array([ 0.86220311, 0.96354724, 0.13096252, 0.76723872, 0.31019789,
0.64332275, 0.34984653, 0.25721343, 0.55240204, 0.51629662])
And, as suggested in the comments, here is an example where we sort a 2d array by its first column:
>>> x=np.random.random((10,2))
>>> x
array([[ 0.72789275, 0.29404982],
[ 0.05149693, 0.24411234],
[ 0.34863983, 0.58950756],
[ 0.81916424, 0.32032827],
[ 0.52958012, 0.00417253],
[ 0.41587698, 0.32733306],
[ 0.79918377, 0.18465189],
[ 0.678948 , 0.55039723],
[ 0.8287709 , 0.54735691],
[ 0.74044999, 0.70688683]])
>>> idx = np.argsort(x[:,0])
>>> idx
array([1, 2, 5, 4, 7, 0, 9, 6, 3, 8])
>>> xsorted = x[idx,:]
>>> xsorted
array([[ 0.05149693, 0.24411234],
[ 0.34863983, 0.58950756],
[ 0.41587698, 0.32733306],
[ 0.52958012, 0.00417253],
[ 0.678948 , 0.55039723],
[ 0.72789275, 0.29404982],
[ 0.74044999, 0.70688683],
[ 0.79918377, 0.18465189],
[ 0.81916424, 0.32032827],
[ 0.8287709 , 0.54735691]])
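Applied to the eigenvalue problem in the question, the same argsort idea might look like the sketch below. One caveat: `np.linalg.eig` returns the eigenvectors as columns, so `states[:, i]` (not row i) belongs to `energies[i]`, and the columns are what need reordering. The small symmetric matrix here is an assumed stand-in for the real hamiltonian.

```python
import numpy as np

# Illustrative symmetric matrix standing in for the question's hamiltonian.
hamiltonian = np.array([[2.0, 1.0, 0.0],
                        [1.0, 3.0, 1.0],
                        [0.0, 1.0, 1.0]])

energies, states = np.linalg.eig(hamiltonian)

# Eigenvectors are columns: states[:, i] pairs with energies[i].
order = np.argsort(energies)
energies = energies[order]
states = states[:, order]            # reorder columns, not rows

ground_state = states[:, 0]          # eigenvector of the lowest eigenvalue
print(np.allclose(hamiltonian @ ground_state, energies[0] * ground_state))
```

If the matrix is Hermitian (or real symmetric), `np.linalg.eigh` is an alternative that already returns the eigenvalues in ascending order with matching eigenvector columns.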
