Script for making contracted basis set in python - python

I'm trying to make a script which generates a contracted basis set that I need in my scientific calculations. The output file contains basically 2D grid of numbers. In the first column I have exponents and the second column and columns after that contain contraction coefficients. In the example below I have four functions (that correspond to 4 exponents) and in this contraction scheme these four functions are contracted into two functions in such a way that in the first function I have 3 exponents and hence 3 nonzero coefficients. The second function consist only of 1 function and hence 1 nonzero coefficient. So, 4 functions contracted into two as 3 + 1 it would look like this
exponent1 coefficient1 0
exponent2 coefficient2 0
exponent3 coefficient3 0
exponent4 0 coefficient4
so that the non-zero numbers in the column are contracted together. In my script I of course try to make a general scheme so that if I have n exponents I can make a contraction scheme where I can have any kind of scheme between 1 and n functions.
The script is as follows now
#!/usr/bin/python3
def contract(e, i, c, n):
"""
e = list of exponents == number of functions.
Functions e are contracted to i functions whose coefficients
are determined by c and n is a list which tells how the contraction
is done (e.g. n = [3, 1] --> functions contracted into two as 3 + 1.)
"""
num = len(e)
grid = [[0 for i in range(i + 1)] for x in range(num)]
for num1, row1 in enumerate(grid):
row1[0] = e[num1] #add exponents
for g in grid:
print(g)
i = 2
e = [0, 1, 2, 3]
c = [4, 5, 6, 7]
n = [3, 1]
contract(e, i, c, n)
This works so far and the output this produces now is:
[0, 0, 0]
[1, 0, 0]
[2, 0, 0]
[3, 0, 0]
The output should be:
[0, 4, 0]
[1, 5, 0]
[2, 6, 0]
[3, 0, 7]
The problem is that I don't know how I could do the rest of the code. So how could I get the coefficients to correct places? I will then somehow print the numbers in the list so the final output file should in this case look like:
0 4 0
1 5 0
2 6 0
3 0 7
Does anyone have an idea how I could do this?

Related

Error when trying to implement MERGE algorithm merging to sorted lists of integers in python?

I'm new to both algorithms AND programming.
As an intro to the MERGE algorithms the chapter introduces first the MERGE algorithm by itself. It merges and sorts an array consisting of 2 sorted sub-arrays.
I did the pseudocode on paper according to the book:
Source: "Introduction to Algorithms
Third Edition" Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest Clifford Stein
Since I am implementing it in python3 I had to change some lines given that indexing in python starts at 0 unlike in the pseudocode example of the book.
Keep in mind that the input is one array that contains 2 SORTED sub-arrays which are then merged and sorted, and returned. I kept the prints in my code, so you can see my checks...
#!/anaconda3/bin/python3
import math
import argparse
# For now only MERGE slides ch 2 -- Im defining p q and r WITHIN the function
# But for MERGE_SORT p,q and r are defined as parameters!
def merge(ar):
'''
Takes as input an array. This array consists of 2 subarrays that ARE ALLREADY sorted
(small to large). When splitting the array into half, the left
part will be longer by one if not divisible by 2. These subarrays will be
called left and right. Each of the subarrays must already be sorted. Merge() then
merges these sorted arrays into one big sorted array. The sorted array is returned.
'''
print(ar)
p=0 # for now defining always as 0
if len(ar)%2==0:
q=len(ar)//2-1 # because indexing starts from ZERO in py
else:
q=len(ar)//2 # left sub array will be 1 item longer
r=len(ar)-1 # again -1 because indexing starts from ZERO in py
print('p', p, 'q', q, 'r', r)
# lets see if n1 and n2 check out
n_1 = q-p+1 # lenght of left subarray
n_2 = r-q # lenght of right subarray
print('n1 is: ', n_1)
print('n2 is: ', n_2)
left = [0]*(n_1+1) # initiating zero list of lenght n1
right=[0]*(n_2+1)
print(left, len(left))
print(right, len(right))
# filling left and right
for i in range(n_1):# because last value will always be infinity
left[i] = ar[p+i]
for j in range(n_2):
right[j] = ar[q+j+1]
#print(ar[q+j+1])
#print(right[j])
# inserting infinity at last index for each subarray
left[n_1]=math.inf
right[n_2]=math.inf
print(left)
print(right)
# merging: initiating indexes at 0
i=0
j=0
print('p', p)
print('r', r)
for k in range(p,r):
if left[i] <= right[j]:
ar[k]=left[i]
# increase i
i += 1
else:
ar[k]=right[j]
#increase j
j += 1
print(ar)
#############################################################################################################################
# Adding parser
#############################################################################################################################
parser = argparse.ArgumentParser(description='MERGE algorithm from ch 2')
parser.add_argument('-a', '--array', type=str, metavar='', required=True, help='One List of integers composed of 2 sorted halves. Sorting must start from smallest to largest for each of the halves.')
args = parser.parse_args()
args_list_st=args.array.split(',') # list of strings
args_list_int=[]
for i in args_list_st:
args_list_int.append(int(i))
if __name__ == "__main__":
merge(args_list_int)
The problem:
When I try to sort the array as shown in the book the merged array that is returned contains two 6es and the 7 is lost.
$ ./2.merge.py -a=2,4,5,7,1,2,3,6
[2, 4, 5, 7, 1, 2, 3, 6]
p 0 q 3 r 7
n1 is: 4
n2 is: 4
[0, 0, 0, 0, 0] 5
[0, 0, 0, 0, 0] 5
[2, 4, 5, 7, inf]
[1, 2, 3, 6, inf]
p 0
r 7
[1, 2, 2, 3, 4, 5, 6, 6]
This does how ever not happen with arrays of any number higher than 6.
$ ./2.merge.py -a=2,4,5,7,1,2,3,8
[2, 4, 5, 7, 1, 2, 3, 8]
p 0 q 3 r 7
n1 is: 4
n2 is: 4
[0, 0, 0, 0, 0] 5
[0, 0, 0, 0, 0] 5
[2, 4, 5, 7, inf]
[1, 2, 3, 8, inf]
p 0
r 7
[1, 2, 2, 3, 4, 5, 7, 8]
I showed it to a colleague in my class without success. And I've walked it through manually with numbers on paper snippets but withouth success. I hope someone can find my silly mistake because I'm completely stuck.
Thanks
As r is the index of the last value in arr, you need to add one to it to make a range that also includes that final index:
for k in range(p, r + 1):
# ^^^^^
Note that your code could be greatly reduced if you would use list slicing.
Brother you made a very small mistake in this line
for k in range(p,r):
Here you loop is running from p to r-1 and your last index i.e r, will not get iterated.
So you have to use
for k in range(p,r+1):
And in the second testcase a=[2,4,5,7,1,2,3,8]
You are getting the correct output even with your wrong code because you are overwriting the values in array ar and your current code was able to sort the array till index r-1 and the number present at index r will be the same which was present before the execution of your merge function i.e 8
Try using this testcase: [2, 4, 5, 8, 1, 2, 3, 7]
And your output will be [1, 2, 2, 3, 4, 5, 7, 7]
Hope this helped

How can I get exactly the same amount elements replaced in numpy 2D matrix?

I got a symmetrical 2D numpy matrix, it only contains ones and zeros and diagonal elements are always 0.
I want to replace part of the elements from one to zero, and the result need to keep symmetrical too. How many elements will be selected depends on the parameterreplace_rate.
Since it's a symmetrical matrix, I take half of the matrix and select the elements(those values are 1) randomly, change them from 1 to 0. And then with a mirror operation, make sure the whole matrix are still symmetrical.
For example
com = np.array ([[0, 1, 1, 1, 1],
[1, 0, 1, 1, 1],
[1, 1, 0, 1, 1],
[1, 1, 1, 0, 1],
[1, 1, 1, 1, 0]])
replace_rate = 0.1
com = np.triu(com)
mask = np.random.choice([0,1],size=(com.shape),p=((1-replace_rate),replace_rate)).astype(np.bool)
r1 = np.random.rand(*com.shape)
com[mask] = r1[mask]
com += com.T - np.diag(com.diagonal())
com is a (5,5) symmetrical matrix, and 10% of elements (only include those values are 1, the diagonal elements are excluded) will be replaced to 0 randomly.
The question is , how can I make sure the amount of elements changed keep the same each time?
Keep the same replace_rate = 0.1, sometimes I will get result like:
com = np.array([[0 1 1 1 1]
[1 0 1 1 1]
[1 1 0 1 1]
[1 1 1 0 1]
[1 1 1 1 0]])
Actually no one changed this time, and if I repeat it, I got 2 elements changed :
com = np.array([[0 1 1 1 1]
[1 0 1 1 1]
[1 1 0 1 0]
[1 1 1 0 1]
[1 1 0 1 0]])
I want to know how to fix the amount of elements changed with the same replace_rate?
Thanks in advance!!
How about something like this:
def make_transform(m, replace_rate):
changed = [] # keep track of indices we already changed
def get_random():
# Get a random pair of indices which are not equal (i.e. not on the diagonal)
c1, c2 = random.choices(range(len(com)), k=2)
if c1 == c2 or (c1,c2) in changed or (c2,c1) in changed:
return get_random() # Recurse until we find an i,j pair : i!=j , that hasnt already been changed
else:
changed.append((c1,c2))
return c1, c2
n_changes = int(m.shape[0]**2 * replace_rate) # the number of changes to make
print(n_changes)
for _ in range(n_changes):
i, j = get_random() # Get an valid index
m[i][j] = m[j][i] = 0
return m
This is the solution I suggest:
def rand_zero(mat, replace_rate):
triu_mat = np.triu(mat)
_ind = np.where(triu_mat != 0) # gets indices of non-zero elements, not just non-diagonals
ind = [x for x in zip(*_ind)]
chng = np.random.choice(range(len(ind)), # select some indices, at rate 'replace_rate'
size = int(replace_rate*mat.size),
replace = False) # do not select duplicates
mod_mat = triu_mat
for c in chng:
mod_mat[ind[c]] = 0
mod_mat = mod_mat + mod_mat.T
return mod_mat
I use int() to truncate to an integer in size, but you can use round() if that's what you desire.
Hope this gives consistent results!

compute density map D

You are given two integer numbers n and r, such that 1 <= r < n,
a two-dimensional array W of size n x n.
Each element of this array is either 0 or 1.
Your goal is to compute density map D for array W, using radius of r.
The output density map is also two-dimensional array,
where each value represent number of 1's in matrix W within the specified radius.
Given the following input array W of size 5 and radius 1 (n = 5, r = 1)
1 0 0 0 1
1 1 1 0 0
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
Output (using Python):
3 4 2 2 1
4 5 2 2 1
3 4 3 3 2
2 2 2 2 2
1 1 2 2 2
Logic: Input first row, first column value is 1. r value is 1. So we should check 1 right element, 1 left element, 1 top element, top left, top right, bottom , bottom left and bottom right and sum all elements.
Should not use any 3rd party library.
I did it using for loop and inner for loop and check for each element. Any better work around ?
Optimization: For each 1 in W, update count for locations, in whose neighborhood it belongs
Although for W of size nxn, the following algorithm would still take O(n^2) steps, however if W is sparse i.e. number of 1s (say k) << nxn then instead of rxrxnxn steps for approach stated in question, following would take nxn + rxrxk steps, which is much lower if k << nxn
Given r assigned and W stored as
[[1, 0, 0, 0, 1],
[1, 1, 1, 0, 0],
[1, 0, 0, 0, 0],
[0, 0, 0, 1, 1],
[0, 1, 0, 0, 0]]
then following
output = [[ 0 for i in range(5) ] for j in range(5) ]
for i in range(len(W)):
for j in range(len(W[0])):
if W[i][j] == 1:
for off_i in range(-r,r+1):
for off_j in range(-r,r+1):
if (0 <= i+off_i < len(W)) and (0 <= j+off_j < len(W[0])):
output[i+off_i][j+off_j] += 1
stores required values in output
for r = 1, output is as required
[[3, 4, 2, 2, 1],
[4, 5, 2, 2, 1],
[3, 4, 3, 3, 2],
[2, 2, 2, 2, 2],
[1, 1, 2, 2, 2]]

Numpy: Storing standard basis vector in a memory efficient way

This was the only question I found about standard basis vectors in numpy but it's not really related to my question.
I have a numpy array of integers and I want to determine the co-occurrence matrix which stores the number of times indicies have the same value in the same column. This question describes the problem in more detail.
I have a method of solving my problem but it doesn't scale well.
My question then is this:
Is it possible to store standard basis vectors in a numpy array in a memory efficient manner?
I want to be able to do the following:
Given an array
M = e1 e2 e1
e1 e2 e2
e3 e1 e3
e2 e3 e3
where ei is the transposed i-th standard basis vector of the vector space (R3 in this case), perform matrix multiplication with the transpose of M, i.e. determine np.dot(M, M.T). To be clear, the matrix M above could be written as:
M = 1 0 0 0 1 0 1 0 0
1 0 0 0 1 0 0 1 0
0 0 1 1 0 0 0 0 1
0 1 0 0 0 1 0 0 1
(extra spaces added for emphasis).
The issue with representing the matrix like this is that it isn't scalable in memory with the number of rows and dimension of the vector space.
EDIT: I should mention that the number of columns can increase as well. The memory complexity is D * R * C where D is the dimension of the vector space, R is the number of rows and C is the number of columns. In an average working example I have roughly D == 150, R == 2000 and C == 1000 though R can go up to 20,000 and C is unbounded (though 10,000 is a reasonable estimate).
The rules for standard basis vector multiplication are simple (ei * ei.T == 1, ei * ej.T == 0 if i != j) so I was wondering if it's possible to store these rules in a numpy array to save memory.
Let's encode the basis vectors with numbers: e1 -> 1, e2 -> 2, ... This allows a very memory-efficient storage.
M = np.array([[1, 2, 1], [1, 2, 2], [3, 1, 3], [2, 3, 3]], dtype=np.uint8)
# if more than 255 basis vectors, use uint16.
Now we only need to implement a special dot product that works with these basis vectors. Basically we only replace the multiplication with a comparison:
def basis_dot(a, b):
return np.sum(a[:, :, np.newaxis] == b[np.newaxis, :, :], axis=1)
print(basis_dot(M, M.T))
# [[3 2 0 0]
# [2 3 0 0]
# [0 0 3 1]
# [0 0 1 3]]
Let's verify the result:
M = np.array([[1, 0, 0, 0, 1, 0, 1, 0, 0],
[1, 0, 0, 0, 1, 0, 0, 1, 0],
[0, 0, 1, 1, 0, 0, 0, 0, 1],
[0, 1, 0, 0, 0, 1, 0, 0, 1]])
np.dot(M, M.T)
# array([[3, 2, 0, 0],
# [2, 3, 0, 0],
# [0, 0, 3, 1],
# [0, 0, 1, 3]])
A potential drawback with the approach is the large temporary array required in basis_dot. The memory requirement can be reduced by explicitly coding the loops, at the cost of performance (unless you use a jit compiler).
# slower but more memory friendly
def basis_dot(a, b):
out = np.empty((a.shape[0], b.shape[1]))
for i in range(a.shape[0]):
for j in range(b.shape[1]):
out[i, j] = np.sum(a[i, :] == b[:, j])
return out
So, my assumption based on your example is that you're actually working with a higher dimensionality than just 3. My other assumption is that you're not computing any basis vectors, but just auto-generating basis vectors for RN. I'll ignore the question of exactly what you're trying to accomplish or why you're storing vectors that you can easily auto-generate for now.
If all of the above assumptions are accurate then you can likely gain a lot of benefit by storing in a sparse data format. This will only improve storage if you've got a preponderance of zeroes, but that seems like a reasonable assumption. There are a large number of sparse formats which you can view here. My best guess for you would be the coo_matrix class.
from scipy.sparse import coo_matrix
new_matrix = coo_matrix(<your_matrix>)
Then saving new matrix in your format of choice.

Efficient way of re-numbering elements in an array

I am reasonably new to python and am trying to implement a genetic algorithm, but need some assistance with the code for one of the operations.
I have formulated the problem this way:
each individual I is represented by a string of M integers
each element e in I takes a value from 0 to N
every number from 0 - N must appear in I at least once
the value of e is not important, so long as each uniquely valued element takes the same unique value (think of them as class labels)
e is less than or equal to N
N can be different for each I
after applying the crossover operation i can potentially generate children which violate one or more of these constraints, so i need to find a way to re-number the elements so that they retain their properties, but fit with the constraints.
for example:
parent_1 (N=5): [1 3 5 4 2 1|0 0 5 2]
parent_2 (N=3): [2 0 1 3 0 1|0 2 1 3]
*** crossover applied at "|" ***
child_1: [1 3 5 4 2 1 0 2 1 3]
child_2: [2 0 1 3 0 1 0 0 5 2]
child_1 obviously still satisfies all of the constraints, as N = 5 and all values 0-5 appear at least once in the array.
The problem lies with child 2 - if we use the max(child_2) way of calculating N we get a value of 5, but if we count the number of unique values then N = 4, which is what the value for N should be. What I am asking (in a very long winded way, granted) is what is a good, pythonic way of doing this:
child_2: [2 0 1 3 0 1 0 0 5 2]
*** some python magic ***
child_2': [2 0 1 3 0 1 0 0 4 2]
*or*
child_2'': [0 1 2 3 1 2 1 1 4 0]
child_2'' is there to illustrate that the values themselves dont matter, so long as each element of a unique value maps to the same value, the constraints are satisfied.
here is what i have tried so far:
value_map = []
for el in child:
if el not in value_map:
value_map.append(el)
for ii in range(0,len(child)):
child[ii] = value_map.index(child[ii])
this approach works and returns a result similar to child_2'', but i can't imagine that it is very efficient in the way it iterates over the string twice, so i was wondering if anyone has any suggestions of how to make it better.
thanks, and sorry for such a long post for such a simple question!
You will need to iterates the list more than once, I don't think there's any way around this. After all, you first have to determine the number of different elements (first pass) before you can start changing elements (second pass). Note, however, that depending on the number of different elements you might have up to O(n^2) due to the repetitive calls to index and not in, which have O(n) on a list.
Alternatively, you could use a dict instead of a list for your value_map. A dictionary has much faster lookup than a list, so this way, the complexity should indeed be on the order of O(n). You can do this using (1) a dictionary comprehension to determine the mapping of old to new values, and (2) a list comprehension for creating the updated child.
value_map = {el: i for i, el in enumerate(set(child))}
child2 = [value_map[el] for el in child]
Or change the child in-place using a for loop.
for i, el in enumerate(child):
child[i] = value_map[el]
You can do it with a single loop like this:
value_map = []
result = []
for el in child:
if el not in value_map:
value_map.append(el)
result.append(value_map.index(el))
One solution I can think of is:
Determine the value of N and determine unused integers. (this forces you to iterate over the array once)
Go through the array and each time you meet a number superior to N, map it to an unused integer.
This forces you to go through the arrays twice, but it should be faster than your example (that forces you to go through the value_map at each element of the array at each iteration)
child = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
used = set(child)
N = len(used) - 1
unused = set(xrange(N+1)) - used
value_map = dict()
for i, e in enumerate(child):
if e <= N:
continue
if e not in value_map:
value_map[e] = unused.pop()
child[i] = value_map[e]
print child # [2, 0, 1, 3, 0, 1, 0, 0, 4, 2]
I like #Selçuk Cihan answer. It can also be done in place.
>>> child = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
>>>
>>> value_map = []
>>> for i in range(len(child)):
... el = child[i]
... if el not in value_map:
... value_map.append(el)
... child[i] = value_map.index(el)
...
>>> child
[0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
I believe that this works, although I didn't test it for more than the single case that is given in the question.
The only thing that bothers me is that value_map appears three times in the code...
def renumber(individual):
"""
>>> renumber([2, 0, 1, 3, 0, 1, 0, 0, 4, 2])
[0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
"""
value_map = {}
return [value_map.setdefault(e, len(value_map)) for e in individual]
Here is a fast solution, which iterates the list only once.
a = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
b = [-1]*len(a)
j = 0
for i in range(len(a)):
if b[a[i]] == -1:
b[a[i]] = j
a[i] = j
j += 1
else:
a[i] = b[a[i]]
print(a) # [0, 1, 2, 3, 1, 2, 1, 1, 4, 0]

Categories