Python Genetic Algorithm "Natural" Selection

How can I perform selection (i.e. deletion of elements) in an array in a way that tends towards lower numbers?
If I have an array of fitnesses sorted lowest to highest, how can I use random number generation that favours the smaller numbers to delete those elements at random?
pop_sorted_by_fitness = [1, 4, 10, 330]
I want to randomly delete one of those smaller elements, so that most of the time it's 1, sometimes 4, rarely 10, and almost never 330. How can I achieve this sort of algorithm?

How about making use of the exponential distribution to sample your indices, using numpy.random.exponential?
import numpy as np
s = [1, 4, 10, 330]
limit = len(s)
scale = 10**int(np.log10(limit))                    # order of magnitude of the index range
index = int(np.random.exponential()*scale) % limit  # wrap large draws back into [0, limit)
Test it
In [37]: sorted([int(np.random.exponential()*scale)%limit for _ in range(20)])
Out[37]: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 3, 3]
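For completeness, here is a minimal sketch (my own wording, not part of the answer above) of applying this to the actual deletion step; clamping the draw to the last index instead of taking a modulo is just one way to keep it in range.
import numpy as np

pop_sorted_by_fitness = [1, 4, 10, 330]
limit = len(pop_sorted_by_fitness)

# Draw an exponentially distributed value and clamp it to a valid index,
# so index 0 is removed most often and the last index only rarely.
index = min(int(np.random.exponential(scale=1.0)), limit - 1)
del pop_sorted_by_fitness[index]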

Related

Define unit vector in Python with s components

I want to define a column vector v = (0, 0, …, 0, 1) in Python. The vector is supposed to have s components (for an arbitrary s), so the first s-1 components should be 0 and the last component 1. How do I do this when s is arbitrary? Thank you in advance for your help!
As I said, I wanted to use np.array, but s is arbitrary, so I do not know how to use np.array here (since the number of components is not known in advance).
numpy.repeat is one option:
import numpy as np

s = 10
v = np.repeat([0, 1], [s - 1, 1])
Output: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
You can generalize to any number of values:
# one "1", zero "2", three "3", two "4"
np.repeat([1, 2, 3, 4], [1, 0, 3, 2])
# array([1, 3, 3, 3, 4, 4])
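An alternative sketch without np.repeat, if all you need is s-1 zeros followed by a single 1 (the reshape to a column vector is optional):
import numpy as np

s = 10
v = np.zeros(s)            # s zeros...
v[-1] = 1                  # ...with the last component set to 1
v_col = v.reshape(-1, 1)   # reshape if you need an explicit column vector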

Efficient way to compute pairwise differences within a 1d numpy array

I have a specific problem.
I wrote this code to compute the differences between pairs of elements in a 1d array:
np.array([j-i for m, i in enumerate(X[:]) for j in X[m+1:]])
For example, for an input X=np.array([0, 1, 2, 0, 1, 2, 0, 1, 2]), this code returns a 9*8/2 = 36 element array:
np.array([1,2,0,1,2,0,1,2,1,-1,0,1,-1,0,1,-2,-1,0,-2,-1,0,1,2,0,1,2,1,-1,0,1,-2,-1,0,1,2,1])
Although I understand that this code is inherently O(n^2), it takes a lot of time for a larger array X (even n ~ 400) and uses a lot of memory. I think the double loop is the cause of this slowdown, and vectorizing this method may make it faster. Do you have any idea, or know of a standard module to compute this?
You can do this (time-)efficiently using broadcasting (which uses vectorization). The solution for X of length 400 is instantaneous on my machine:
import numpy as np

# X = np.random.rand(400)
X = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
X = X.reshape(-1,1)
M = X.T - X
idx = np.triu_indices(len(X), k=1)
solution = M[idx]
array([ 1, 2, 0, 1, 2, 0, 1, 2, 1, -1, 0, 1, -1, 0, 1, -2, -1,
0, -2, -1, 0, 1, 2, 0, 1, 2, 1, -1, 0, 1, -2, -1, 0, 1,
2, 1])
You want to compute the difference between all possible pairs? That's inherently O(n^2).
You're always going to run into trouble at some point, but you can go a lot further by not keeping the entire square matrix in memory, and instead lazily generating and consuming each value as you iterate.
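For example, a lazy version of the original list comprehension might look like this (a sketch of the idea, assuming you can consume each difference as it is produced rather than storing them all):
from itertools import combinations
import numpy as np

X = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])

# Differences are produced one at a time and never stored as a full array.
# As a stand-in workload, count how many pairs differ by more than 1.
count = sum(1 for a, b in combinations(X, 2) if abs(b - a) > 1)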

How to vectorize this pytorch code over (at least) the batch dimension?

I want to implement code to build an adjacency matrix such that (for example):
If X[0] = [0, 1, 2, 0, 1, 0], then
A[0, 1] = 1
A[1, 2] = 1
A[2, 0] = 1
A[0, 1] = 1
A[1, 0] = 1
The following code works fine, but it's too slow. Please help me vectorize this code over at least the batch (first) dimension:
import torch

A = torch.zeros((3, 3, 3), dtype=torch.float)
X = torch.tensor([[0, 1, 2, 0, 1, 0], [1, 0, 0, 2, 1, 1], [0, 0, 2, 2, 1, 1]])
for a, x in zip(A, X):
    for i, j in zip(x, x[1:]):
        a[i, j] = 1
Thanks! :)
I am pretty sure that there is a much simpler way of doing this, but I tried to keep within the realm of torch function calls, to make sure that any gradient operation could be properly tracked.
In case this is not required for backpropagation, I strongly suggest you look into solutions that utilize some numpy functions, because I think there is a better chance of finding something suitable there. But, without further ado, here is the solution I came up with.
It essentially transforms your X vector into a series of tuple entries that correspond to positions in A. For this, we need to align some of the indices (specifically, the first dimension is only implicitly given in X, since the first list in X corresponds to A[0,:,:], the second list to A[1,:,:], and so on).
This is also probably where you can start optimizing the code, because I did not find a reasonable description of such a matrix, and therefore had to come up with my own way of creating it.
# Start by "aligning" your shifted view of X:
# essentially, take all but the last element
# and stack it with all but the first element.
X_shift = torch.stack([X[:, :-1], X[:, 1:]], dim=2)
# X_shift.shape: (3, 5, 2) in your example

# To assign this properly, we need to turn it into a "concatenated" list,
# where each entry corresponds to a 2D tuple in the respective dimension of A.
temp_tuples = X_shift.view(-1, 2).transpose(0, 1)
# temp_tuples.shape: (2, 15) in your example. Below are the values:
# tensor([[0, 1, 2, 0, 1, 1, 0, 0, 2, 1, 0, 0, 2, 2, 1],
#         [1, 2, 0, 1, 0, 0, 0, 2, 1, 1, 0, 2, 2, 1, 1]])

# Now we have to create a matrix to indicate the proper "first dimension index".
fix_dims = torch.repeat_interleave(torch.arange(0, 3, 1), len(X[0]) - 1, 0).unsqueeze(dim=0)
# fix_dims.shape: (1, 15)
# Long story short, this creates the following vector:
# tensor([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]])

# Note that the unsqueeze is necessary to properly concatenate the two matrices.
access_tuples = tuple(torch.cat([fix_dims, temp_tuples], dim=0))
A[access_tuples] = 1
This further assumes that every dimension in X has the same number of tuples changed. If that is not the case, then you have to manually create a fix_dims vector, where each increment is repeated the length of X[i] times. If it is equal as in your example, you can safely use the proposed solution.
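For reference, the same idea can be written a bit more compactly with integer-array indexing (my own sketch, not part of the original answer); it relies on the same assumption that every row of X has the same length:
import torch

X = torch.tensor([[0, 1, 2, 0, 1, 0], [1, 0, 0, 2, 1, 1], [0, 0, 2, 2, 1, 1]])
B, L = X.shape
A = torch.zeros((B, 3, 3), dtype=torch.float)

batch_idx = torch.arange(B).repeat_interleave(L - 1)  # which graph each edge belongs to
src = X[:, :-1].reshape(-1)                           # node at step t
dst = X[:, 1:].reshape(-1)                            # node at step t + 1
A[batch_idx, src, dst] = 1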
Make X a tuple instead of a tensor:
A = torch.zeros((3, 3, 3), dtype = torch.float)
X = ([0, 1, 2, 0, 1, 0], [1, 0, 0, 2, 1, 1], [0, 0, 2, 2, 1, 1])
A[X] = 1
For example, by casting it like this: A[tuple(X)]

Python: Cumulative insertion of values in a sparse matrix (lil_matrix) due to repeated indices

My situation is as follows:
I have an array of results, say
S = np.array([2,3,10,-1,12,1,2,4,4]), which I would like to insert into the last row of a scipy.sparse.lil_matrix M according to an array of column indices with possibly repeated elements (in no specific pattern), e.g.:
j = np.array([3,4,5,14,15,16,3,4,5]).
When column indices are repeated, the sum of their corresponding values in S should be inserted in the matrix M. Thus, in the example above, results [4,7,14] should be placed in columns [3,4,5] of the last row of M. In other words, I would like to achieve something like:
M[-1,j] = np.array([2+2,3+4,10+4,-1,12,1]).
Calculation speed is very important for my program, so I should avoid using loops. Looking forward to your clever solutions! Thanks!
That kind of summation is the normal behavior for sparse matrices, especially in the csr format.
define the 3 input arrays:
In [408]: S = np.array([2,3,10,-1,12,1,2,4,4])
In [409]: j=np.array([3,4,5,14,15,16,3,4,5])
In [410]: i=np.ones(S.shape,int)
The coo format takes those 3 arrays as is, without change:
In [411]: c0=sparse.coo_matrix((S,(i,j)))
In [412]: c0.data
Out[412]: array([ 2, 3, 10, -1, 12, 1, 2, 4, 4])
But when converted to csr format, it sums repeated indices:
In [413]: c1=c0.tocsr()
In [414]: c1.data
Out[414]: array([ 4, 7, 14, -1, 12, 1], dtype=int32)
In [415]: c1.A
Out[415]:
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 4, 7, 14, 0, 0, 0, 0, 0, 0, 0, 0, -1, 12, 1]], dtype=int32)
That summation is also done when converting the coo matrix to dense or array (c0.A), and when converting to lil:
In [419]: cl=c0.tolil()
In [420]: cl.data
Out[420]: array([[], [4, 7, 14, -1, 12, 1]], dtype=object)
In [421]: cl.rows
Out[421]: array([[], [3, 4, 5, 14, 15, 16]], dtype=object)
lil_matrix does not accept the (data,(i,j)) input directly, so you have to go through coo if that is your target.
http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.sparse.coo_matrix.html
By default when converting to CSR or CSC format, duplicate (i,j) entries will be summed together. This facilitates efficient construction of finite element matrices and the like. (see example)
To do this as an insertion into an existing lil matrix, use an intermediate csr:
In [443]: L=sparse.lil_matrix((3,17),dtype=S.dtype)
In [444]: L[-1,:]=sparse.csr_matrix((S,(np.zeros(S.shape),j)))
In [445]: L.A
Out[445]:
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 4, 7, 14, 0, 0, 0, 0, 0, 0, 0, 0, -1, 12, 1]])
This statement is faster than the one using csr_matrix:
L[-1,:]=sparse.coo_matrix((S,(np.zeros(S.shape),j)))
Examine L.__setitem__ if you are really worried about speed. Offhand, it looks like it normally converts the sparse matrix to an array, so
L[-1,:]=sparse.coo_matrix((S,(np.zeros(S.shape),j))).A
takes the same time. With a small test case like this, the overhead of creating an intermediate matrix can swamp any time spent summing the values at duplicate indices.
In general, inserting or appending values to an existing sparse matrix is slow, regardless of whether you do this summation or not. Where possible it is best to create the data, i and j arrays for the whole matrix first, and then make the sparse matrix.
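As a sketch of that recommended pattern (the 3x17 shape is just an assumption matching the earlier example), build the full data, i and j arrays first and construct the matrix once; the coo-to-csr conversion does the duplicate summation:
import numpy as np
from scipy import sparse

S = np.array([2, 3, 10, -1, 12, 1, 2, 4, 4])
j = np.array([3, 4, 5, 14, 15, 16, 3, 4, 5])
i = np.full(S.shape, 2)  # everything goes into the last row of a 3x17 matrix

# coo -> csr sums the duplicate (i, j) entries in one step
M = sparse.coo_matrix((S, (i, j)), shape=(3, 17)).tocsr()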
You could use a defaultdict that maps the M column indices to their accumulated values, and use the map function to update this defaultdict, like so:
from collections import defaultdict

d = defaultdict(int)  # use your array's value type here

def f(col, s):
    d[col] += s

# In Python 3, map is lazy, so consume it (e.g. with list) to run the updates
list(map(f, j, S))

M[-1, list(d.keys())] = list(d.values())  # keys and values are always in the same order
Instead of map, you can use filter if you don't want to build a useless list of None values (note that in Python 3 both map and filter are lazy, so the iterator still has to be consumed):
d = defaultdict(int)  # use your array's value type here

def g(e):
    d[e[1]] += S[e[0]]

# g returns None, so filter keeps nothing; it is run only for its side effect
list(filter(g, enumerate(j)))

M[-1, list(d.keys())] = list(d.values())  # keys and values are always in the same order
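If you prefer to stay in NumPy without Python-level loops, a bincount-based sketch (using the M, j and S from the question) can do the duplicate summation as well:
import numpy as np

# Sum S over repeated column indices in a single vectorized call.
row = np.bincount(j, weights=S, minlength=M.shape[1])  # note: returns floats
cols = np.flatnonzero(row)   # caveat: columns whose values sum to exactly 0 are skipped
M[-1, cols] = row[cols]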

Scipy - find bases of column space of matrix

I'm trying to code up a simple Simplex algorithm, the first step of which is to find a basic feasible solution:
Choose a set B of linearly independent columns of A
Set all components of x corresponding to the columns not in B to zero.
Solve the m resulting equations to determine the components of x. These are the basic variables.
I know the solution will involve using scipy.linalg.svd (or scipy.linalg.lu) and some numpy.argwhere / numpy.where magic, but I'm not sure exactly how.
Does anyone have a pure-Numpy/Scipy implementation of finding a basis (step 1) or, even better, all of the above?
Example:
>>> A
array([[1, 1, 1, 1, 0, 0, 0],
[1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 1, 0],
[0, 3, 1, 0, 0, 0, 1]])
>>> u, s, v = scipy.linalg.svd(A)
>>> non_zero, = numpy.where(s > 1e-7)
>>> rank = len(non_zero)
>>> rank
4
>>> for basis in some_unknown_function(A):
... print(basis)
{3, 4, 5, 6}
{1, 4, 5, 6}
and so on.
A QR decomposition provides an orthogonal basis for the column space of A:
q,r = np.linalg.qr(A)
If the rank of A is n, then the first n columns of q form a basis for the column space of A.
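A minimal illustration of that answer, assuming the A from the question (note this yields orthonormal basis vectors, not indices of A's columns, and the diagonal of R is only a rough rank estimate; use pivoted QR or the SVD for a robust rank):
import numpy as np

q, r = np.linalg.qr(A)
rank = np.sum(np.abs(np.diag(r)) > 1e-7)   # rough numerical rank from R's diagonal
column_space_basis = q[:, :rank]           # orthonormal basis vectors for the column space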
Try using this:
scipy.linalg.orth(A)
This produces an orthonormal basis for the column space of the matrix A.
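Note that scipy.linalg.orth returns orthonormal basis vectors rather than a set of column indices of A. If the goal is a set B of linearly independent columns (as in the question), a column-pivoted QR is one option (a sketch, not part of the answers above):
import numpy as np
import scipy.linalg

A = np.array([[1, 1, 1, 1, 0, 0, 0],
              [1, 0, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0, 1, 0],
              [0, 3, 1, 0, 0, 0, 1]])

q, r, p = scipy.linalg.qr(A, pivoting=True)
rank = np.sum(np.abs(np.diag(r)) > 1e-7)
basis_columns = set(p[:rank])   # indices of a set of linearly independent columns of A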
