Python data structure: parameter-dependent arrays

I have a problem where I build some matrices depending on, let's say, two integer parameters. Let's call such a matrix A; it depends on p1 and p2, which take values from 0 to 5.
Is there a way in Python to store the eigenvalues and eigenvectors of A in an "object", called B, such that something like B(1,2)[i] (or B[1,2,i]) will give as a result the eigenvalues (for i=0) or eigenvectors (for i=1) of the matrix A built with p1 = 1 and p2 = 2?
Currently I am storing the eigenvectors in a dictionary as in the simple example below, but I think it is a dirty hack. I would appreciate any suggestions.
Example:
import numpy as np

# Build A matrices
def Amatrix(p1, p2):
    return np.array([[p1, p2/10], [p2/10, -p1]])

# Empty dict
eigenvec_dict = {}
for p1 in range(3):
    for p2 in range(2):
        label = str(p1) + str(p2)
        eigenvec_dict[label] = np.linalg.eigh(Amatrix(p1, p2))

eigenvec_dict.keys()
Out[9]: ['11', '10', '00', '01', '20', '21']
eigenvec_dict["01"][0]
Out[10]: array([-1.,  1.])
eigenvec_dict["01"][1]
Out[11]:
array([[-0.70710678,  0.70710678],
       [ 0.70710678,  0.70710678]])

I would use an object that takes a list of points (a point is better represented as a tuple than as a string) and computes the eighs immediately.
__getitem__ is overridden so that a key like [0, 1, 0] returns the eigenvalues for the point (0, 1): the first two entries of the key select the point and the last selects eigenvalues (0) or eigenvectors (1). The internal data structure is still a dict, but it is wrapped in an object and can be called nicely from outside.
import numpy as np

# class to store eigenvalues / eigenvectors
class EigenH(object):

    def __init__(self, points):
        self.eighstore = self._create_eighstore(points)

    def _create_eighstore(self, points):
        eighstore = {}
        for point in points:
            eighs = np.linalg.eigh(self._get_amatrix(point))
            eighstore[point] = eighs
        return eighstore

    def _get_amatrix(self, point):
        p1, p2 = point
        return np.array([[p1, p2/10.], [p2/10., -p1]])

    def __getitem__(self, key):
        return self.eighstore[key[:2]][key[2]]

    def keys(self):
        return self.eighstore.keys()

# create point list
points = []
for p1 in range(3):
    for p2 in range(2):
        # I prefer tuples over strings in this case
        points.append((p1, p2))

# instantiate class
eigh = EigenH(points)

# get eigenvalues
print(eigh[0, 1, 0])

# get eigenvectors
print(eigh[0, 1, 1])

# all available eighs
print(eigh.keys())
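If lazy evaluation is acceptable, a cached function gives essentially the B(1, 2)[i] call syntax from the question. This is a minimal sketch of my own (not part of the answer above), reusing the question's Amatrix:

from functools import lru_cache

import numpy as np

def Amatrix(p1, p2):
    return np.array([[p1, p2/10], [p2/10, -p1]])

@lru_cache(maxsize=None)
def B(p1, p2):
    # computed once per (p1, p2) pair, then served from the cache
    return np.linalg.eigh(Amatrix(p1, p2))

eigenvalues = B(1, 2)[0]
eigenvectors = B(1, 2)[1]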

All combinations of all elements of a 2D array

So I have matrix A
A = [[0, 0, 1, -1],
     [0, 0, 1, -1],
     [0, 0, 1, -1],
     [0, 0, 1, -1]]
And I want all the possible combinations of these elements, picking one element from each row, so that both rows and columns can vary. In this situation, I would expect 4^4 = 256 possibilities. I have tried:
combs = np.array(list(itertools.product(*A)))
It does produce the desired output shape of (256, 4), but all the rows are equal; I get the vector [0, 0, 1, -1] repeated 256 times.
Here is an example:
output = [[0, 0, 0, 0],
          [0, 0, 0, 1],
          [0, 0, 1, 1],
          [0, 1, 1, 1],
          [1, 1, 1, 1],
          [-1, 1, 1, -1],
          [-1, -1, -1, -1],
          ....
          [0, -1, 0, -1]]
Another example: if
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
The output should be all the possible combinations of arrays that the matrix can form:
Combs = [[1, 1, 1],
         [1, 1, 2],
         [1, 1, 3],
         [1, 1, ...9],
         [2, 1, 1],
         [2, 2, 1],
         [1, 2, 1],
Another example would be: I have the vector layers
layers = [1, 2, 3, 4, 5]
And then I have the vector angle
angle = [0, 90, 45, -45]
Each layer can have one of the angles, so I create a matrix A
A = [[0, 90, 45, -45],
     [0, 90, 45, -45],
     [0, 90, 45, -45],
     [0, 90, 45, -45],
     [0, 90, 45, -45]]
Great, but now I want to know all possible combinations that the layers can have. For example, layer 1 can have an angle of 0º, layer 2 an angle of 90º, layer 3 an angle of 0º, layer 4 an angle of 45º and layer 5 an angle of 0º. This gives the array
Comb = [0, 90, 0, 45, 0]
So all the combinations would be in a matrix
Comb = [[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 90],
        [0, 0, 0, 90, 90],
        [0, 0, 90, 90, 90],
        [0, 90, 90, 90, 90],
        [90, 90, 90, 90, 90],
        ...
        [0, 45, 45, 45, 45],
        [0, 45, 90, -45, 90]]
How can I generalize this process for bigger matrices?
Am I doing something wrong?
Thank you!
It's OK to use np.array in conjunction with list(iterable), especially in your case where the iterable is itertools.product(*A). However, this can be optimised since you know the shape of your output array.
There are many ways to compute a Cartesian product, so here is my list:
Methods of Cartesian Product
import itertools
import numpy as np

def numpy_product_itertools(arr):
    return np.array(list(itertools.product(*arr)))

def numpy_product_fromiter(arr):
    dt = np.dtype([('', np.intp)] * len(arr))  # or np.dtype(','.join('i' * len(arr)))
    indices = np.fromiter(itertools.product(*arr), dt)
    return indices.view(np.intp).reshape(-1, len(arr))

def numpy_product_meshgrid(arr):
    return np.stack(np.meshgrid(*arr), axis=-1).reshape(-1, len(arr))

def numpy_product_broadcast(arr):  # a slightly different type of output
    items = [np.array(item) for item in arr]
    idx = np.where(np.eye(len(arr)), Ellipsis, None)
    out = [x[tuple(i)] for x, i in zip(items, idx)]
    return list(np.broadcast(*out))
Example of usage
A = [[1,2,3], [4,5], [7]]
numpy_product_itertools(A)
numpy_product_fromiter(A)
numpy_product_meshgrid(A)
numpy_product_broadcast(A)
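For reference, with A = [[1,2,3], [4,5], [7]] the itertools-based version yields the six combinations below; the meshgrid variant returns the same rows in a different order:

numpy_product_itertools(A)
# array([[1, 4, 7],
#        [1, 5, 7],
#        [2, 4, 7],
#        [2, 5, 7],
#        [3, 4, 7],
#        [3, 5, 7]])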
Comparison of performance
import benchit
benchit.setparams(rep=1)
%matplotlib inline
sizes = [3,4,5,6,7]
N = sizes[-1]
arr = [np.arange(0,100,10).tolist()] * N
fns = [numpy_product_itertools, numpy_product_fromiter, numpy_product_meshgrid, numpy_product_broadcast]
in_ = {s: (arr[:s],) for s in sizes}
t = benchit.timings(fns, in_, multivar=True, input_name='Cartesian product of N arrays of length=10')
t.plot(logx=False, figsize=(12, 6), fontsize=14)
Note that numba beats the majority of these algorithms, although it is not included here.
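The answer above only mentions numba without code; the following is one possible jitted sketch of my own, restricted to the equal-length-rows case from the question (the base-n_choices digit trick is just one way to enumerate the combinations):

import numpy as np
from numba import njit

@njit
def numba_product_equal(arr):
    # Cartesian product of the rows of a 2D array where all rows have the same length.
    n_rows, n_choices = arr.shape
    n_out = n_choices ** n_rows
    out = np.empty((n_out, n_rows), dtype=arr.dtype)
    for i in range(n_out):
        # interpret i as a base-n_choices number; digit j picks the element of row j
        rem = i
        for j in range(n_rows - 1, -1, -1):
            out[i, j] = arr[j, rem % n_choices]
            rem //= n_choices
    return out

A = np.array([[0, 90, 45, -45]] * 5)
combs = numba_product_equal(A)  # shape (4**5, 5) = (1024, 5)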

How do I use sum()/average() for namedtuple in python?

from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
points = [Point(x=1.0, y=1.0), Point(x=2.0, y=2.0)]
I'd like to compute the average point of the points list, i.e. receive Point(1.5, 1.5) as a result:
point = average(points) # x = 1.5, y = 1.5
E.g. I know there's np.average(points, axis=0) if points.shape is (N, 2), but I'd rather keep a named tuple instead.
Calculate the average coordinate-wise:
import numpy as np
Point(np.average([p.x for p in points]),
      np.average([p.y for p in points]))
# Point(x=1.5, y=1.5)
Or, better, implicitly convert the list of points to a numpy array, take the average, and convert the result back to a Point:
Point(*np.average(points, axis=0))
# Point(x=1.5, y=1.5)
Maybe I am missing something, but if you want to avoid numpy then you can do
>>> Point(sum(p.x for p in points)/len(points),sum(p.y for p in points)/len(points))
Point(x=1.5, y=1.5)
Seems a bit roundabout, though.
Here's a cutesy way, using only Python built-ins:
In [1]: from collections import namedtuple
...: Point = namedtuple('Point', ['x', 'y'])
...: points = [Point(x=1.0, y=1.0), Point(x=2.0, y=2.0)]
...:
In [2]: import statistics
In [3]: Point(*map(statistics.mean, zip(*points)))
Out[3]: Point(x=1.5, y=1.5)
Why are you using numpy to begin with? It doesn't make much sense here.
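The trick here is that zip(*points) transposes the list of points into one tuple per coordinate, which statistics.mean then averages; for the two points above:

list(zip(*points))
# [(1.0, 2.0), (1.0, 2.0)]  -> all x values, then all y values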
This is a subclass of namedtuple with classmethods that construct a new point from the average or the sum of a sequence of points.
from collections import namedtuple

class Point(namedtuple('Point', ['x', 'y'])):

    @classmethod
    def from_average(cls, points):
        from_sum = cls.from_sum(points)
        return cls(from_sum.x / len(points), from_sum.y / len(points))

    @classmethod
    def from_sum(cls, points):
        if not all(isinstance(p, cls) for p in points):
            raise ValueError('All items in sequence must be of type {}'.format(cls.__name__))
        x = sum(p.x for p in points)
        y = sum(p.y for p in points)
        return cls(x, y)

point_1 = Point(1.0, 1.0)
point_2 = Point(2.0, 2.0)

point_3 = Point.from_average([point_1, point_2])
point_3
# Point(x=1.5, y=1.5)

point_4 = Point.from_sum([point_1, point_2])
point_4
# Point(x=3.0, y=3.0)

Python multiprocessing pool.map with multiple arguments

I need some help because I have been trying for two days and I don't know how to do this. I have a function compute_desc that takes multiple arguments (5 to be exact) and I would like to run it in parallel.
I have this for now:
def compute_desc(coord, radius, coords, feat, verbose):
    # Compute here my descriptors
    return my_desc  # numpy array (1x10 dimensions)

def main():
    points = np.random.rand(1000000, 4)
    coords = points[:, 0:3]
    feat = points[:, 3]
    all_features = np.empty((1000000, 10))
    all_features[:] = np.NAN
    scales = [0.5, 1, 2]
    for radius in scales:
        for index, coord in enumerate(coords):
            all_features[index, :] = compute_desc(coord,
                                                  radius,
                                                  coords,
                                                  feat,
                                                  False)
I would like to parallelize this. I saw several solutions with a Pool, but I don't understand how it works.
I tried with pool.map(), but I can only send one argument to the function.
Here is my solution (it doesn't work):
all_features = [pool.map(compute_desc, zip(point, repeat([radius,
                                                          coords,
                                                          feat,
                                                          False])))]
but I doubt it can work with a numpy array.
EDIT
This is my minimum code with a pool (it works now):
import numpy as np
from multiprocessing import Pool
from itertools import repeat

def compute_desc(coord, radius, coords, feat, verbose):
    # Compute here my descriptors
    my_desc = np.random.rand(1, 10)
    return my_desc

def compute_desc_pool(args):
    coord, radius, coords, feat, verbose = args
    return compute_desc(coord, radius, coords, feat, verbose)

def main():
    points = np.random.rand(1000000, 4)
    coords = points[:, 0:3]
    feat = points[:, 3]
    scales = [0.5, 1, 2]
    for radius in scales:
        with Pool() as pool:
            args = zip(points, repeat(radius),
                       repeat(coords),
                       repeat(feat),
                       repeat(False))
            feat_one_scale = pool.map(compute_desc_pool, args)
        feat_one_scale = np.array(feat_one_scale)
        if radius == scales[0]:
            all_features = feat_one_scale
        else:
            all_features = np.hstack([all_features, feat_one_scale])
    # Other stuff
The generic solution is to pass to Pool.map a sequence of tuples, each tuple holding one set of arguments for your worker function, and then to unpack the tuple in the worker function.
So, just change your function to accept only one argument, a tuple of your arguments, which you already prepared with zip and passed to Pool.map. Then simply unpack args to variables:
def compute_desc(args):
    coord, radius, coords, feat, verbose = args
    # Compute here my descriptors
Also, Pool.map should work with numpy types too, since after all, they are valid Python types.
Just be sure to properly zip 5 sequences, so your function receives a 5-tuple. You don't need to iterate over point in coords, zip will do that for you:
args = zip(coords, repeat(radius), repeat(coords), repeat(feat), repeat(False))
# args is a list of [(coords[0], radius, coords, feat, False), (coords[1], ... )]
(If you do iterate and give point as the first sequence to zip, zip will iterate over that point, which in this case is a 3-element array.)
Your Pool.map line should look like:
for radius in scales:
    args = zip(coords, repeat(radius), repeat(coords), repeat(feat), repeat(False))
    feat_one_scale = [pool.map(compute_desc_pool, args)]
    # other stuff
A solution specific to your case, where all arguments except one are fixed, could be to use functools.partial (as the other answer suggests). Furthermore, you don't even need to unpack coords in the first argument; just pass the index [0..n] into coords, since each invocation of your worker function already receives the complete coords array.
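On Python 3 there is also Pool.starmap, which unpacks each argument tuple for you, so no single-argument wrapper is needed. A minimal sketch, assuming the same compute_desc, coords, feat and radius as above:

from itertools import repeat
from multiprocessing import Pool

with Pool() as pool:
    args = zip(coords, repeat(radius), repeat(coords), repeat(feat), repeat(False))
    # each tuple is unpacked into compute_desc(coord, radius, coords, feat, False)
    feat_one_scale = pool.starmap(compute_desc, args)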
I assume from your example that four of those five arguments would be constant to all calls to compute_desc_pool. If so, then you can use partial to do this.
from functools import partial
....
def compute_desc_pool(coord, radius, coords, feat, verbose):
    return compute_desc(coord, radius, coords, feat, verbose)

def main():
    points = np.random.rand(1000000, 4)
    coords = points[:, 0:3]
    feat = points[:, 3]
    feat_one_scale = np.empty((1000000, 10))
    feat_one_scale[:] = np.NAN
    scales = [0.5, 1, 2]
    pool = Pool()
    for radius in scales:
        # bind the constant arguments by keyword so that the mapped coord
        # lands in the first positional slot
        feat_one_scale = [pool.map(partial(compute_desc_pool, radius=radius,
                                           coords=coords, feat=feat,
                                           verbose=False), coords)]

getting elements in an array1 that are not in array2

Main Problem
What is a better/more pythonic way of retrieving the elements of one array that are not found in another array? This is what I have:
idata = [np.column_stack(data[k]) for k in range(len(data)) if data[k] not in final]
idata = np.vstack(idata)
My interest is in performance. My data is an (X,Y,Z) array of size (7000 x 3) and my gdata is an (X,Y) array of (11000 x 2)
Preamble
I am working on an octant search to find the n (e.g. 8) points (+) closest to my circular point (o) in each octant. This means that my points (+) are reduced to only 64 (8 per octant). Then, for each gdata point, I would save the elements that are not found in data.
import tkinter as tk
from tkinter import filedialog
import pandas as pd
import numpy as np
from scipy.spatial.distance import cdist
from collections import defaultdict

root = tk.Tk()
root.withdraw()

file_path = filedialog.askopenfilename()
data = pd.read_excel(file_path)
data = np.array(data, dtype=float)
nrow, cols = data.shape

file_path1 = filedialog.askopenfilename()
gdata = pd.read_excel(file_path1)
gdata = np.array(gdata, dtype=float)
gnrow, gcols = gdata.shape

npoint_per_octant = 8
bins = np.linspace(-np.pi, np.pi, 9)
bins[-1] = np.inf  # handle edge case

for j in range(gnrow):
    delta = gdata[j, ::] - data[:, :2]
    angles = np.arctan2(delta[:, 1], delta[:, 0])
    octantsort = []
    for i in range(8):
        data_i = data[(bins[i] <= angles) & (angles < bins[i+1])]
        if data_i.size > 0:
            dist_order = np.argsort(cdist(data_i[:, :2], gdata[j, ::][np.newaxis]), axis=0)
            if dist_order.size < npoint_per_octant + 1:
                [octantsort.append(data_i[dist_order[:npoint_per_octant][j]]) for j in range(dist_order.size)]
            else:
                [octantsort.append(data_i[dist_order[:npoint_per_octant][j]]) for j in range(npoint_per_octant)]
            final = np.vstack(octantsort)
    idata = [np.column_stack(data[k]) for k in range(len(data)) if data[k] not in final]
    idata = np.vstack(idata)
Is there an efficient and pythonic way of doing this to increase performance in the last two lines of the code?
If I understand your code correctly, then I see the following potential savings:
dedent the final = ... line
don't use arctan; it's expensive, and since you only want octants you can compare the coordinates to zero and to each other
don't do a full argsort; use argpartition instead (see the short example after this list)
make your octantsort an "octantargsort", i.e. store the indices into data, not the data points themselves; this would save you the search in the last-but-one line and allow you to use np.delete for removing
don't use append inside a list comprehension; this produces a list of Nones that is immediately discarded. You can use list.extend outside the comprehension instead
besides, these list comprehensions look like a convoluted way of converting data_i[dist_order[:npoint_per_octant]] into a list; why not simply cast it, or even keep it as an array, since you want to vstack in the end?
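The argsort/argpartition point in a nutshell (a tiny illustration of my own, not from the question's data):

import numpy as np

d = np.array([7., 1., 5., 3., 9.])
np.argsort(d)[:2]          # array([1, 3]) -- full O(n log n) sort just to get the 2 smallest
np.argpartition(d, 2)[:2]  # the same two indices (in some order), found in O(n) time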
Here is some sample code illustrating these ideas:
import numpy as np

def discard_nearest_in_each_octant(eater, eaten, n_eaten_p_eater):
    # build octants
    # start with quadrants ...
    top, left = (eaten < eater).T
    quadrants = [np.where(v & h)[0] for v in (top, ~top) for h in (left, ~left)]
    dcoord2 = (eaten - eater)**2
    dc2quadrant = [dcoord2[q] for q in quadrants]
    # ... and split them
    oct4158 = [q[:, 0] < q[:, 1] for q in dc2quadrant]
    # main loop
    dc2octants = [[q[o], q[~o]] for q, o in zip(dc2quadrant, oct4158)]
    reloap = [[
        np.argpartition(o.sum(-1), n_eaten_p_eater)[:n_eaten_p_eater]
        if o.shape[0] > n_eaten_p_eater else None
        for o in opair] for opair in dc2octants]
    # translate indices
    octantargpartition = [q[so] if oap is None else q[np.where(so)[0][oap]]
                          for q, o, oaps in zip(quadrants, oct4158, reloap)
                          for so, oap in zip([o, ~o], oaps)]
    octantargpartition = np.concatenate(octantargpartition)
    return np.delete(eaten, octantargpartition, axis=0)
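A minimal usage sketch with random data (the array names and sizes are mine, chosen to mirror the question's shapes):

import numpy as np

rng = np.random.default_rng(0)
data = rng.random((7000, 3))     # (X, Y, Z) points
gdata = rng.random((11000, 2))   # (X, Y) query points

# for one query point, drop the (up to) 8 nearest data points in each octant
kept_xy = discard_nearest_in_each_octant(gdata[0], data[:, :2], 8)
print(kept_xy.shape)  # (7000 - removed, 2); at most 64 rows are removed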

Why does numpy.random.dirichlet() not accept multidimensional arrays?

On the numpy page they give the example of
s = np.random.dirichlet((10, 5, 3), 20)
which is all fine and great; but what if you want to generate random samples from a 2D array of alphas?
alphas = np.random.randint(10, size=(20, 3))
If you try np.random.dirichlet(alphas), np.random.dirichlet([x for x in alphas]), or np.random.dirichlet((x for x in alphas)), it results in a
ValueError: object too deep for desired array. The only thing that seems to work is:
y = np.empty(alphas.shape)
for i in range(len(alphas)):
    y[i] = np.random.dirichlet(alphas[i])
print(y)
...which is far from ideal for my code structure. Why is this the case, and can anyone think of a more "numpy-like" way of doing this?
Thanks in advance.
np.random.dirichlet is written to generate samples for a single Dirichlet distribution. That code is implemented in terms of the Gamma distribution, and that implementation can be used as the basis for a vectorized code to generate samples from different distributions. In the following, dirichlet_sample takes an array alphas with shape (n, k), where each row is an alpha vector for a Dirichlet distribution. It returns an array also with shape (n, k), each row being a sample of the corresponding distribution from alphas. When run as a script, it generates samples using dirichlet_sample and np.random.dirichlet to verify that they are generating the same samples (up to normal floating point differences).
import numpy as np

def dirichlet_sample(alphas):
    """
    Generate samples from an array of alpha distributions.
    """
    r = np.random.standard_gamma(alphas)
    return r / r.sum(-1, keepdims=True)

if __name__ == "__main__":
    alphas = 2 ** np.random.randint(0, 4, size=(6, 3))

    np.random.seed(1234)
    d1 = dirichlet_sample(alphas)
    print("dirichlet_sample:")
    print(d1)

    np.random.seed(1234)
    d2 = np.empty(alphas.shape)
    for k in range(len(alphas)):
        d2[k] = np.random.dirichlet(alphas[k])
    print("np.random.dirichlet:")
    print(d2)

    # Compare d1 and d2:
    err = np.abs(d1 - d2).max()
    print("max difference:", err)
Sample run:
dirichlet_sample:
[[ 0.38980834 0.4043844 0.20580726]
[ 0.14076375 0.26906604 0.59017021]
[ 0.64223074 0.26099934 0.09676991]
[ 0.21880145 0.33775249 0.44344606]
[ 0.39879859 0.40984454 0.19135688]
[ 0.73976425 0.21467288 0.04556287]]
np.random.dirichlet:
[[ 0.38980834 0.4043844 0.20580726]
[ 0.14076375 0.26906604 0.59017021]
[ 0.64223074 0.26099934 0.09676991]
[ 0.21880145 0.33775249 0.44344606]
[ 0.39879859 0.40984454 0.19135688]
[ 0.73976425 0.21467288 0.04556287]]
max difference: 5.55111512313e-17
I think you're looking for
y = np.array([np.random.dirichlet(x) for x in alphas])
for your list comprehension. Otherwise you're simply passing a python list or tuple. I imagine the reason numpy.random.dirichlet does not accept your list of alpha values is simply that it's not set up to; it expects a 1-D array of length k, as per the documentation.
