python all possible products between columns - python

I have a numpy matrix X and I would like to add to this matrix as new variables all the possible products between 2 columns.
So if X=(x1,x2,x3) I want X=(x1,x2,x3,x1x2,x2x3,x1x3)
Is there an elegant way to do that?
I think a combination of numpy and itertools should work
EDIT:
Very good answers, but do they take into account that X is a matrix, so that x1, x2, x3 are themselves arrays (columns)?
EDIT:
A Real example
a=array([[1,2,3],[4,5,6]])

Itertools should be the answer here.
import itertools
a = [1, 2, 3]
p = (x * y for x, y in itertools.combinations(a, 2))
print(list(itertools.chain(a, p)))
Result:
[1, 2, 3, 2, 3, 6] # 1, 2, 3, 2 x 1, 3 x 1, 3 x 2

I think Samy's solution is pretty good. If you need to use numpy, you could transform it a little like this:
from itertools import combinations
from numpy import prod
x = [1, 2, 3]
# In Python 3, map() returns an iterator, so build a list explicitly
print(x + [int(prod(pair)) for pair in combinations(x, 2)])
Gives the same output as Samy's solution:
[1, 2, 3, 2, 3, 6]

If your arrays are small, then Samy's pure-Python solution using itertools.combinations should be fine:
from itertools import combinations, chain
def all_products1(a):
    p = (x * y for x, y in combinations(a, 2))
    return list(chain(a, p))
But if your arrays are large, then you'll get a substantial speedup by fully vectorizing the computation, using numpy.triu_indices, like this:
import numpy as np
def all_products2(a):
    x, y = np.triu_indices(len(a), 1)
    return np.r_[a, a[x] * a[y]]
Let's compare these:
>>> from timeit import timeit
>>> data = np.random.uniform(0, 100, (10000,))
>>> timeit(lambda:all_products1(data), number=1)
53.745754408999346
>>> timeit(lambda:all_products2(data), number=1)
12.26144006299728
The solution using numpy.triu_indices also works for multi-dimensional data:
>>> np.random.uniform(0, 100, (3,2))
array([[ 63.75071196, 15.19461254],
[ 94.33972762, 50.76916376],
[ 88.24056878, 90.36136808]])
>>> all_products2(_)
array([[ 63.75071196, 15.19461254],
[ 94.33972762, 50.76916376],
[ 88.24056878, 90.36136808],
[ 6014.22480172, 771.41777239],
[ 5625.39908354, 1373.00597677],
[ 8324.59122432, 4587.57109368]])
If you want to operate on columns rather than rows, use:
def all_products3(a):
    x, y = np.triu_indices(a.shape[1], 1)
    return np.c_[a, a[:,x] * a[:,y]]
For example:
>>> np.random.uniform(0, 100, (2,3))
array([[ 33.0062385 , 28.17575024, 20.42504351],
[ 40.84235995, 61.12417428, 58.74835028]])
>>> all_products3(_)
array([[ 33.0062385 , 28.17575024, 20.42504351, 929.97553238,
674.15385734, 575.4909246 ],
[ 40.84235995, 61.12417428, 58.74835028, 2496.45552756,
2399.42126888, 3590.94440122]])
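Applied to the "real example" from the question, the column-wise approach can be sketched as below. The function name add_pairwise_column_products is made up for this illustration, and np.hstack is used in place of np.c_ purely as a matter of taste. Note that the products come out in the order x1x2, x1x3, x2x3, which differs slightly from the order written in the question.

```python
import numpy as np

def add_pairwise_column_products(a):
    """Append the product of every pair of distinct columns to `a`.

    Hypothetical helper name, equivalent in spirit to all_products3.
    """
    a = np.asarray(a)
    # Index pairs (i, j) with i < j, i.e. the strict upper triangle
    i, j = np.triu_indices(a.shape[1], k=1)
    return np.hstack([a, a[:, i] * a[:, j]])

a = np.array([[1, 2, 3], [4, 5, 6]])
result = add_pairwise_column_products(a)
print(result)
# [[ 1  2  3  2  3  6]
#  [ 4  5  6 20 24 30]]
```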

Related

What's the meaning of plt.plot(x[0:-1],y/y[0])?

I am plotting an exponential distribution using the information provided by the tutor.
plt.plot(x[:-1],y/y[0])
plt.plot(tvals,pvals)
plt.show()
But I do not know what x[:-1] and y/y[0] mean.
x[:-1] means all the elements of x except the last one.
y/y[0] simply divides the array y by its first value, y[0], so the result starts at 1.
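The question does not say where x and y come from. One common situation, and it is only an assumption here, is that they were returned by np.histogram, which yields one more bin edge than it yields counts. In that case x[:-1] keeps the left edge of each bin so that both arrays have the same length. A minimal sketch without plotting (the sample data and bin count are invented for the illustration):

```python
import numpy as np

# Assumed setup: exponentially distributed samples binned with np.histogram
rng = np.random.default_rng(0)
samples = rng.exponential(scale=2.0, size=1000)

y, x = np.histogram(samples, bins=20)  # y: counts per bin, x: bin edges

left_edges = x[:-1]   # drop the last edge so lengths match
relative = y / y[0]   # scale counts so the first bin equals 1

print(len(x), len(y))                      # 21 20
print(left_edges.shape == relative.shape)  # True
```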
Code Example
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1, 3, 5, 7])
y = np.array([2, 4, 6])
a = x[:-1] # [1, 3, 5]
b = y/y[0] # [1., 2., 3.] (true division returns floats)
plt.plot(a, b)

Filtering of array elements by another array in numpy

Here is a simple example:
import numpy as np
x=np.random.rand(5,5)
k,p = np.where(x>0.5)
k and p are arrays of row and column indices.
Now I have a list of rows to consider, m=[0,2,4], so I need to find all entries of k that are in m.
I came up with a very simple but horribly inefficient solution:
d = np.array([ (a,b) for a,b in zip(k,p) if a in m])
The solution works but is very slow. I'm looking for a more efficient one. I need to do a few million such operations with a dynamically adjusted m, so efficiency is critical.
Maybe the below is faster:
d=np.dstack((k,p))[0]
print(d[np.isin(d[:,0],m)])
You could use isin() to get a boolean mask which you can use to index k.
>>> x=np.random.rand(3,3)
>>> x
array([[0.74043564, 0.48328081, 0.82396324],
[0.40693944, 0.24951958, 0.18043229],
[0.46623863, 0.53559775, 0.98956277]])
>>> k, p = np.where(x > 0.5)
>>> p
array([0, 2, 1, 2])
>>> k
array([0, 0, 2, 2])
>>> m
array([0, 1])
>>> np.isin(k, m)
array([ True, True, False, False])
>>> k[np.isin(k, m)]
array([0, 0])
How about:
import numpy as np
m = np.array([0, 2, 4])
k, p = np.where(x[m, :] > 0.5)
k = m[k]
print(list(zip(k, p)))  # zip() is lazy in Python 3
This only considers the interesting rows (and then zips them to 2d indices).
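To check that the np.isin mask and the row-subset approach agree, here is a small sketch on a fixed array. The array values and the variable names (pairs_mask, pairs_subset) are chosen only for this illustration.

```python
import numpy as np

x = np.array([[0.9, 0.1],
              [0.8, 0.7],
              [0.2, 0.6]])
m = np.array([0, 2])  # rows of interest

# Approach 1: find all hits, then keep those whose row index is in m
k, p = np.where(x > 0.5)
mask = np.isin(k, m)
pairs_mask = np.column_stack((k[mask], p[mask]))

# Approach 2: search only the interesting rows, then map back to row numbers
k_sub, p_sub = np.where(x[m] > 0.5)
pairs_subset = np.column_stack((m[k_sub], p_sub))

print(pairs_mask)
# [[0 0]
#  [2 1]]
```

The second approach never looks at rows outside m, which may matter when m is small relative to the number of rows. Its output order follows the order of m rather than sorted row order.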

Python NumPy: How to fill a matrix using an equation

I wish to initialise a matrix A, using the equation A_i,j = f(i,j) for some f (It's not important what this is).
How can I do so concisely avoiding a situation where I have two for loops?
numpy.fromfunction fits the bill here.
Example from doc:
>>> import numpy as np
>>> np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4]])
One could also get the indices of your array with numpy.indices and then apply the function f in a vectorized fashion:
import numpy as np
shape = 1000, 1000
Xi, Yj = np.indices(shape)
A = (2*Xi + 3*Yj).astype(int) # or any other function f(Xi, Yj); np.int was removed in NumPy 1.24
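One point worth keeping in mind with both approaches is that f is called once with whole index arrays, not once per element, so it has to be written with array operations. A minimal sketch comparing the two against an explicit double loop (the function f below is a hypothetical stand-in):

```python
import numpy as np

def f(i, j):
    # Hypothetical f; works on scalars and on index arrays alike
    return 2 * i + 3 * j

shape = (3, 4)

A_fromfunction = np.fromfunction(f, shape, dtype=int)

i, j = np.indices(shape)
A_indices = f(i, j)

# Reference result built with the two loops the question wants to avoid
A_loops = np.array([[f(r, c) for c in range(shape[1])]
                    for r in range(shape[0])])

print(np.array_equal(A_fromfunction, A_loops))  # True
print(np.array_equal(A_indices, A_loops))       # True
```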

Sorting an Array Alongside a 2d Array

So I'm using NumPy's linear algebra routines to do some basic computational quantum mechanics. Say I have a matrix, hamiltonian, and I want its eigenvalues and eigenvectors
import numpy as np
from numpy import linalg as la
hamiltonian = np.zeros((N, N)) # N is some constant I have defined
# fill up hamiltonian here
energies, states = la.eig(hamiltonian)
Now, I want to sort the energies in increasing order, and I want to sort the states along with them. For example, if I do:
groundStateEnergy = min(energies)
groundStateIndex = np.where(energies == groundStateEnergy)
groundState = states[groundStateIndex, :]
I correctly plot the ground state (eigenvector with the lowest eigenvalue). However, if I try something like this:
energies, states = zip(*sorted(zip(energies, states)))
or even
energies, states = zip(*sorted(zip(energies, states), key = lambda pair:pair[0]))
plotting in the same way no longer plots the correct state. So how can I sort states alongside energies, but only by row? (i.e., I want to associate each row of states with a value in energies, and I want to rearrange the rows so that the ordering of the rows corresponds to the sorted ordering of the values in energies)
You can use argsort as follows:
>>> x = np.random.random(10)
>>> x
array([ 0.69719108, 0.75828237, 0.79944838, 0.68245968, 0.36232211,
0.46565445, 0.76552493, 0.94967472, 0.43531813, 0.22913607])
>>> y = np.random.random((10))
>>> y
array([ 0.64332275, 0.34984653, 0.55240204, 0.31019789, 0.96354724,
0.76723872, 0.25721343, 0.51629662, 0.13096252, 0.86220311])
>>> idx = np.argsort(x)
>>> idx
array([9, 4, 8, 5, 3, 0, 1, 6, 2, 7])
>>> xsorted = x[idx]
>>> xsorted
array([ 0.22913607, 0.36232211, 0.43531813, 0.46565445, 0.68245968,
0.69719108, 0.75828237, 0.76552493, 0.79944838, 0.94967472])
>>> ysortedbyx = y[idx]
>>> ysortedbyx
array([ 0.86220311, 0.96354724, 0.13096252, 0.76723872, 0.31019789,
0.64332275, 0.34984653, 0.25721343, 0.55240204, 0.51629662])
And, as suggested in the comments, here is an example where we sort a 2d array by its first column:
>>> x=np.random.random((10,2))
>>> x
array([[ 0.72789275, 0.29404982],
[ 0.05149693, 0.24411234],
[ 0.34863983, 0.58950756],
[ 0.81916424, 0.32032827],
[ 0.52958012, 0.00417253],
[ 0.41587698, 0.32733306],
[ 0.79918377, 0.18465189],
[ 0.678948 , 0.55039723],
[ 0.8287709 , 0.54735691],
[ 0.74044999, 0.70688683]])
>>> idx = np.argsort(x[:,0])
>>> idx
array([1, 2, 5, 4, 7, 0, 9, 6, 3, 8])
>>> xsorted = x[idx,:]
>>> xsorted
array([[ 0.05149693, 0.24411234],
[ 0.34863983, 0.58950756],
[ 0.41587698, 0.32733306],
[ 0.52958012, 0.00417253],
[ 0.678948 , 0.55039723],
[ 0.72789275, 0.29404982],
[ 0.74044999, 0.70688683],
[ 0.79918377, 0.18465189],
[ 0.81916424, 0.32032827],
[ 0.8287709 , 0.54735691]])
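Applied to the eigenvalue problem in the question, the same argsort idea could look like the sketch below. Note that numpy.linalg.eig returns the eigenvectors as the columns of its second result, so the reordering is done along axis 1 here rather than along rows. The 3x3 matrix is an invented stand-in for the Hamiltonian.

```python
import numpy as np

# Small symmetric matrix standing in for the Hamiltonian (illustrative values)
hamiltonian = np.array([[2.0, 1.0, 0.0],
                        [1.0, 3.0, 1.0],
                        [0.0, 1.0, 1.0]])

energies, states = np.linalg.eig(hamiltonian)

idx = np.argsort(energies)        # permutation that sorts the eigenvalues
energies_sorted = energies[idx]
states_sorted = states[:, idx]    # eigenvectors are columns: states[:, n]

ground_state = states_sorted[:, 0]

# Each sorted column is still an eigenvector for the matching eigenvalue
print(np.allclose(hamiltonian @ states_sorted,
                  states_sorted * energies_sorted))  # True
```

If the matrix is known to be Hermitian (or real symmetric), np.linalg.eigh is an alternative that already returns the eigenvalues in ascending order.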

numpy ndarray slicing and iteration

I'm trying to slice and iterate over a multidimensional array at the same time. I have a solution that's functional, but it's kind of ugly, and I bet there's a slick way to do the iteration and slicing that I don't know about. Here's the code:
import numpy as np
x = np.arange(64).reshape(4,4,4)
y = [x[i:i+2,j:j+2,k:k+2] for i in range(0,4,2)
                          for j in range(0,4,2)
                          for k in range(0,4,2)]
y = np.array(y)
z = np.array([np.min(u) for u in y]).reshape(y.shape[1:])
Your last reshape doesn't work, because y has no shape defined. Without it you get:
>>> x = np.arange(64).reshape(4,4,4)
>>> y = [x[i:i+2,j:j+2,k:k+2] for i in range(0,4,2)
...                           for j in range(0,4,2)
...                           for k in range(0,4,2)]
>>> z = np.array([np.min(u) for u in y])
>>> z
array([ 0, 2, 8, 10, 32, 34, 40, 42])
But despite that, what you probably want is reshaping your array to 6 dimensions, which gets you the same result as above:
>>> xx = x.reshape(2, 2, 2, 2, 2, 2)
>>> zz = xx.min(axis=-1).min(axis=-2).min(axis=-3)
>>> zz
array([[[ 0, 2],
[ 8, 10]],
[[32, 34],
[40, 42]]])
>>> zz.ravel()
array([ 0, 2, 8, 10, 32, 34, 40, 42])
It's hard to tell exactly what you want in the last mean, but you can use stride_tricks to get a "slicker" way. It's rather tricky.
import numpy.lib.stride_tricks
# This returns a view with custom strides, x2[i,j,k] matches y[4*i+2*j+k]
x2 = numpy.lib.stride_tricks.as_strided(
    x, shape=(2,2,2,2,2,2),
    strides=(numpy.array([32,8,2,16,4,1])*x.dtype.itemsize))
# Reduce over the three trailing (within-block) axes, one at a time
z2 = x2.min(axis=-1).min(axis=-1).min(axis=-1)
Still, I can't say this is much more readable. (Or efficient, as each min call will make temporaries.)
Note, my answer differs from Jaime's because I tried to match your elements of y. You can tell if you replace the min with max.
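As a cross-check, the loop from the question, the strided view, and the plain reshape can be compared directly. This is only a sketch: the strides are derived from x.strides instead of being hard-coded, writeable=False is passed as a precaution, and a tuple of axes is used so that each reduction is a single call.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(64).reshape(4, 4, 4)

# Reference: the explicit 2x2x2 blocks from the question
y = np.array([x[i:i+2, j:j+2, k:k+2] for i in range(0, 4, 2)
                                     for j in range(0, 4, 2)
                                     for k in range(0, 4, 2)])
z_loop = y.min(axis=(1, 2, 3))

# Strided view: the first three axes pick a block, the last three index inside it
s0, s1, s2 = x.strides
blocks = as_strided(x, shape=(2, 2, 2, 2, 2, 2),
                    strides=(2 * s0, 2 * s1, 2 * s2, s0, s1, s2),
                    writeable=False)
z_strided = blocks.min(axis=(3, 4, 5))

# Plain reshape: axes 1, 3 and 5 are the within-block offsets
z_reshape = x.reshape(2, 2, 2, 2, 2, 2).min(axis=(1, 3, 5))

print(z_loop)
# [ 0  2  8 10 32 34 40 42]
```

Swapping min for max in all three lines is a quick way to confirm that the blocks, not just their minima, line up.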
