Efficient way of re-numbering elements in an array - python

I am reasonably new to Python and am trying to implement a genetic algorithm, but need some assistance with the code for one of the operations.
I have formulated the problem this way:
each individual I is represented by a string of M integers
each element e in I takes a value from 0 to N
every number from 0 - N must appear in I at least once
the value of e is not important, so long as each uniquely valued element takes the same unique value (think of them as class labels)
e is less than or equal to N
N can be different for each I
After applying the crossover operation I can potentially generate children which violate one or more of these constraints, so I need to find a way to re-number the elements so that they retain their properties, but fit with the constraints.
for example:
parent_1 (N=5): [1 3 5 4 2 1|0 0 5 2]
parent_2 (N=3): [2 0 1 3 0 1|0 2 1 3]
*** crossover applied at "|" ***
child_1: [1 3 5 4 2 1 0 2 1 3]
child_2: [2 0 1 3 0 1 0 0 5 2]
child_1 obviously still satisfies all of the constraints, as N = 5 and all values 0-5 appear at least once in the array.
The problem lies with child_2: if we use max(child_2) to calculate N we get a value of 5, but there are only five unique values (so the labels should run from 0 to 4), which means N should be 4. What I am asking (in a very long-winded way, granted) is what is a good, Pythonic way of doing this:
child_2: [2 0 1 3 0 1 0 0 5 2]
*** some python magic ***
child_2': [2 0 1 3 0 1 0 0 4 2]
*or*
child_2'': [0 1 2 3 1 2 1 1 4 0]
child_2'' is there to illustrate that the values themselves don't matter: so long as each element of a unique value maps to the same value, the constraints are satisfied.
Here is what I have tried so far:
value_map = []
for el in child:
    if el not in value_map:
        value_map.append(el)
for ii in range(0,len(child)):
    child[ii] = value_map.index(child[ii])
This approach works and returns a result similar to child_2'', but I can't imagine that it is very efficient in the way it iterates over the string twice, so I was wondering if anyone has any suggestions on how to make it better.
Thanks, and sorry for such a long post for such a simple question!

You will need to iterate over the list more than once; I don't think there's any way around this. After all, you first have to determine the number of different elements (first pass) before you can start changing elements (second pass). Note, however, that depending on the number of different elements you might end up with O(n^2) complexity, due to the repeated calls to index and not in, each of which is O(n) on a list.
Alternatively, you could use a dict instead of a list for your value_map. A dictionary has much faster lookup than a list, so this way, the complexity should indeed be on the order of O(n). You can do this using (1) a dictionary comprehension to determine the mapping of old to new values, and (2) a list comprehension for creating the updated child.
value_map = {el: i for i, el in enumerate(set(child))}
child2 = [value_map[el] for el in child]
Or change the child in-place using a for loop.
for i, el in enumerate(child):
    child[i] = value_map[el]
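
One caveat with the set-based mapping above: the iteration order of a set is not guaranteed, so which old label ends up with which new label is arbitrary. If a stable, order-preserving relabelling is wanted (the child_2' form from the question), sorting the unique values first is one option. This is only a minimal sketch, and the function name renumber_sorted is made up for illustration:

```python
def renumber_sorted(child):
    """Relabel each value with its rank among the sorted unique values."""
    # Sorting costs O(k log k) for k distinct labels; each lookup stays O(1).
    rank = {value: new for new, value in enumerate(sorted(set(child)))}
    return [rank[value] for value in child]


child_2 = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
print(renumber_sorted(child_2))  # [2, 0, 1, 3, 0, 1, 0, 0, 4, 2]
```

Values that were already in range keep their label here; only the gap left by the missing 4 is closed.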

You can do it with a single loop like this:
value_map = []
result = []
for el in child:
    if el not in value_map:
        value_map.append(el)
    result.append(value_map.index(el))

One solution I can think of is:
Determine the value of N and the unused integers (this forces you to iterate over the array once).
Go through the array and, each time you meet a number greater than N, map it to an unused integer.
This makes you go through the array twice, but it should be faster than your example (which searches value_map for every element of the array).
child = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
used = set(child)
N = len(used) - 1
unused = set(range(N+1)) - used  # labels in 0..N that never appear
value_map = dict()
for i, e in enumerate(child):
    if e <= N:
        continue  # already a valid label
    if e not in value_map:
        value_map[e] = unused.pop()
    child[i] = value_map[e]
print(child) # [2, 0, 1, 3, 0, 1, 0, 0, 4, 2]

I like @Selçuk Cihan's answer. It can also be done in place.
>>> child = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
>>>
>>> value_map = []
>>> for i in range(len(child)):
...     el = child[i]
...     if el not in value_map:
...         value_map.append(el)
...     child[i] = value_map.index(el)
...
>>> child
[0, 1, 2, 3, 1, 2, 1, 1, 4, 0]

I believe that this works, although I didn't test it for more than the single case that is given in the question.
The only thing that bothers me is that value_map appears three times in the code...
def renumber(individual):
    """
    >>> renumber([2, 0, 1, 3, 0, 1, 0, 0, 4, 2])
    [0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
    """
    value_map = {}
    return [value_map.setdefault(e, len(value_map)) for e in individual]
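
Since the function above was only tried on a single input, a quick check against the question's constraints may be useful. The sketch below repeats renumber so that it runs on its own; satisfies_constraints is a hypothetical helper, not part of the original answer, and it assumes "valid" means the labels in use are exactly 0..N with no gaps:

```python
def renumber(individual):
    """Relabel values in order of first appearance (0, 1, 2, ...)."""
    value_map = {}
    return [value_map.setdefault(e, len(value_map)) for e in individual]


def satisfies_constraints(individual):
    """Hypothetical check: the labels used are exactly 0..N, with no gaps."""
    labels = set(individual)
    return labels == set(range(len(labels)))


child_2 = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
print(satisfies_constraints(child_2))  # False, because 4 is missing
fixed = renumber(child_2)
print(fixed)                           # [0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
print(satisfies_constraints(fixed))    # True
```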

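Here is a fast solution, which iterates over the list only once. It uses a lookup list b indexed by value, so it assumes every value in a is a non-negative integer smaller than len(a).
a = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
b = [-1]*len(a)  # b[old_value] holds the new label, or -1 if not seen yet
j = 0
for i in range(len(a)):
    if b[a[i]] == -1:
        b[a[i]] = j
        a[i] = j
        j += 1
    else:
        a[i] = b[a[i]]
print(a) # [0, 1, 2, 3, 1, 2, 1, 1, 4, 0]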

Related

Script for making contracted basis set in python

I'm trying to make a script which generates a contracted basis set that I need in my scientific calculations. The output file basically contains a 2D grid of numbers. The first column holds the exponents, and the second and later columns hold the contraction coefficients. In the example below I have four functions (corresponding to 4 exponents), and in this contraction scheme these four functions are contracted into two functions in such a way that the first contracted function has 3 exponents and hence 3 nonzero coefficients. The second contracted function consists of only 1 function and hence has 1 nonzero coefficient. So, 4 functions contracted into two as 3 + 1 would look like this:
exponent1 coefficient1 0
exponent2 coefficient2 0
exponent3 coefficient3 0
exponent4 0 coefficient4
so that the non-zero numbers in a column are contracted together. In my script I am of course trying to make this general, so that n exponents can be contracted into any number of functions between 1 and n.
The script is as follows now
#!/usr/bin/python3
def contract(e, i, c, n):
    """
    e = list of exponents == number of functions.
    Functions e are contracted to i functions whose coefficients
    are determined by c and n is a list which tells how the contraction
    is done (e.g. n = [3, 1] --> functions contracted into two as 3 + 1.)
    """
    num = len(e)
    grid = [[0 for i in range(i + 1)] for x in range(num)]
    for num1, row1 in enumerate(grid):
        row1[0] = e[num1] #add exponents
    for g in grid:
        print(g)
i = 2
e = [0, 1, 2, 3]
c = [4, 5, 6, 7]
n = [3, 1]
contract(e, i, c, n)
This works so far and the output this produces now is:
[0, 0, 0]
[1, 0, 0]
[2, 0, 0]
[3, 0, 0]
The output should be:
[0, 4, 0]
[1, 5, 0]
[2, 6, 0]
[3, 0, 7]
The problem is that I don't know how I could do the rest of the code. So how could I get the coefficients to correct places? I will then somehow print the numbers in the list so the final output file should in this case look like:
0 4 0
1 5 0
2 6 0
3 0 7
Does anyone have an idea how I could do this?
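
One possible way to place the coefficients is sketched here. It assumes the blocks in n are laid out consecutively: the first n[0] exponents belong to the first contracted function, the next n[1] to the second, and so on. The names build_contraction and format_grid are illustrative only, and the number of contracted functions is taken from len(n) rather than passed separately as i:

```python
def build_contraction(exponents, coefficients, scheme):
    """Return rows of [exponent, coeff_1, ..., coeff_k] for a block scheme."""
    if sum(scheme) != len(exponents) or len(coefficients) != len(exponents):
        raise ValueError("scheme and coefficients must cover every exponent")
    # One row per exponent, one zero-filled column per contracted function.
    grid = [[exponent] + [0] * len(scheme) for exponent in exponents]
    row = 0
    for column, block_size in enumerate(scheme, start=1):
        # The next block_size rows put their coefficient in this column.
        for _ in range(block_size):
            grid[row][column] = coefficients[row]
            row += 1
    return grid


def format_grid(grid):
    """Join each row with single spaces, one row per line."""
    return "\n".join(" ".join(str(value) for value in row) for row in grid)


grid = build_contraction([0, 1, 2, 3], [4, 5, 6, 7], [3, 1])
print(format_grid(grid))
# 0 4 0
# 1 5 0
# 2 6 0
# 3 0 7
```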

Why does python return index 0 if the item is 0 for the .index method?

[ss]https://i.imgur.com/dyggsaJ.png
howdy guys,
I wrote a list
l = [0, 0, 0, 1]
and asked Python to print the index for every item, like this
for i in l:
    print(l.index(i))
It returns
0
0
0
3
Notice how it returns 0 for all the elements that are 0 but returns the correct index when the item is 1.
Similarly,
l2 = [0, 0, 3, 2, 5]
for i in l2:
    print(l2.index(i))
# returns 0, 0, 2, 3, 4
l3 = [2, 3, 4, 0, 0]
for i in l3:
    print(l3.index(i))
# returns 0, 1, 2, 3, 3. what?
But
l4 = [1, 2, 0, 3, 4]
for i in l4:
    print(l4.index(i))
# returns 0, 1, 2, 3, 4
It seems that the loop goes wacky as soon as there are two elements in a row that are 0. Is there a name for this or an explanation? Did I write the code wrong?
You've got it wrong. If you try:
l = [2, 2, 2, 1]
for i in l:
    print(l.index(i))
you will also get:
0
0
0
3
That is because, during the iterations, you tell Python to print the index of the same value three times. Python only returns the index of the first item it finds that is equal to the value specified.
As seen in the documentation,
list.index(x[, start[, end]])
returns "zero-based index in the list of the first item whose value is equal to x".
So, if there are two or more elements with the same value in your list, you will always get the index of the first occurrence. In other words, your code works as expected.
If you want the index and item of a list while iterating over each entry you can try either of the below 2 options -
Option 1:
for i in range(len(l)):
    print(i)
Option 2:
for i,v in enumerate(l):
    print(i)
Both the above options will give you the result 0 1 2 3
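
If the goal is to find every position of a repeated value rather than only the first, a small sketch along these lines may help. The helper name indices_of is made up for illustration; the optional start argument of list.index is the one from the documentation quoted above:

```python
def indices_of(items, target):
    """Return every index at which target occurs in items."""
    return [i for i, value in enumerate(items) if value == target]


l = [0, 0, 0, 1]
print(indices_of(l, 0))  # [0, 1, 2]
print(l.index(0, 1))     # 1, because the search starts at index 1
```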

I want to take the XOR of all the elements of 1 list with another. How do I do it? [duplicate]

I have a bunch of lists in the form of say [0,0,1,0,1...], and I want to take the XOR of 2 lists and give the output as a list.
Like:
[ 0, 0, 1 ] XOR [ 0, 1, 0 ] -> [ 0, 1, 1 ]
res = []
tmp = []
for i in Employee_Specific_Vocabulary_Dict['Binary Vector']:
    for j in Course_Specific_Vocabulary_Dict['Binary Vector']:
        tmp = [i[index] ^ j[index] for index in range(len(i))]
        res.append(tmp)
The size of each of my lists / vectors is around 3500 elements, so I need something to save time, since this piece of code is taking more than 20 mins to run.
I have 3085 lists, each of which needs an XOR operation with 4089 other lists.
How do I do this without iterating through each list explicitly?
Use map:
import operator
answer = list(map(operator.xor, lst1, lst2))
or zip:
answer = [x ^ y for x,y in zip(lst1, lst2)]
If you need something faster, consider using NumPy instead of Python lists to hold your data.
Assuming a and b are the same size you can use the xor operation (i.e. ^) with simple list indexing:
a = [0, 0, 1]
b = [0, 1, 1]
c = [a[index] ^ b[index] for index in range(len(a))]
print(c) # [0, 1, 0]
or you can use zip with the xor:
a = [0, 0, 1]
b = [0, 1, 1]
c = [x ^ y for x, y in zip(a, b)]
print(c) # [0, 1, 0]
zip will only go to the shortest list (if they are not the same size). If they are not the same size and you want to go to the longer list you can use zip_longest:
from itertools import zip_longest
a = [0, 0, 1, 1]
b = [0, 1, 1]
c = [x ^ y for x, y in zip_longest(a, b, fillvalue=0)]
print(c) # [0, 1, 0, 1]
Using NumPy you should see some performance gains. The function you need is bitwise_xor, like so:
import numpy as np
results = []
for i in Employee_Specific_Vocabulary_Dict['Binary Vector']:
    for j in Course_Specific_Vocabulary_Dict['Binary Vector']:
        results.append(np.bitwise_xor(i, j))
A proof of concept:
a = [1,0,0,1,1]
b = [1,1,0,0,1]
x = np.bitwise_xor(a,b)
print("a\tb\tres")
for i in range(len(a)):
    print("{}\t{}\t{}".format(a[i], b[i], x[i]))
output:
a b res
1 1 0
0 1 1
0 0 0
1 0 1
1 1 0
Edit
Note that if your arrays have the same size, you can simply do one operation and the bitwise_xor will still work, so:
a = [[1,1,0], [0,0,1]]
b = [[0,1,0], [1,0,1]]
res = np.bitwise_xor(a, b)
will still work, and you'll have:
res: [[1, 0, 0], [1, 0, 0]]
In your case, a workaround could possibly be:
results = []
n = len(Course_Specific_Vocabulary_Dict['Binary Vector'])
for a in Employee_Specific_Vocabulary_Dict['Binary Vector']:
    # Get same size array w.r.t. Course_Specific_Vocabulary_Dict['Binary Vector']
    repeated_a = np.repeat([a], n, axis=0)
    results.append(np.bitwise_xor(repeated_a, Course_Specific_Vocabulary_Dict['Binary Vector']))
However I don't know if that would actually improve performance, it is to be checked; for sure it will require some more memory.
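
A different technique from the ones in the answers above, which avoids NumPy altogether, is to pack each binary vector into a single Python integer once and XOR the integers. Each pair then costs one big-integer operation instead of a loop over roughly 3500 elements. This is only a sketch under the assumption that every vector has the same length and contains only 0s and 1s; the names pack, unpack and xor_all_pairs are hypothetical, and whether it is actually faster for the real data would need to be measured:

```python
def pack(bits):
    """Pack a list of 0/1 values into one integer, first element most significant."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def unpack(value, width):
    """Inverse of pack: expand an integer back into a list of width bits."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def xor_all_pairs(rows_a, rows_b):
    """XOR every vector in rows_a with every vector in rows_b."""
    width = len(rows_a[0])
    packed_a = [pack(row) for row in rows_a]  # pack each vector only once
    packed_b = [pack(row) for row in rows_b]
    return [[unpack(a ^ b, width) for b in packed_b] for a in packed_a]


print(xor_all_pairs([[0, 0, 1]], [[0, 1, 0], [0, 1, 1]]))
# [[[0, 1, 1], [0, 1, 0]]]
```

If the unpacked lists are not needed right away, keeping the results as integers and skipping unpack saves both time and memory.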

How to Transfer a number in an array from one position to another

I would like to know how to transfer a number.
For example: [1,0,2,3,4]
Remove the one and transfer the one to two's position
Result: [0,0,1,3,4]
If your manipulations are purely index-based, you can do this:
lst = [1,0,2,3,4]
lst[2] = lst[0]
lst[0] = 0
# [0, 0, 1, 3, 4]
Alternatively, if you need to work out the index of 2:
lst[lst.index(2)] = lst[0]
lst[0] = 0
You have not described the problem precisely: what should happen when the vector contains more than one 1 or more than one 2?
My solution only covers the case where there is a single 1 and a single 2 in the vector, because the .index method always returns the index of the first match, regardless of any later ones.
Since each vector in your dataset contains exactly one 1 and one 2, here is a solution for that case:
data = [[1, 2, 3, 4, 0], [1, 3, 2, 4, 0], [2, 1, 3, 4, 0]]
def replace_(vector_, replace_value, replace_with):
    source = vector_.index(replace_value)  # where the value to move is now
    target = vector_.index(replace_with)   # the position it should take over
    vector_[target] = vector_[source]
    vector_[source] = 0
    return vector_
for i in data:
    print(replace_(i, 1, 2))
If a vector can contain more than one 1 or 2, like [1,0,1,1,2,2], then describe the intended logic and edit your question accordingly.

Compute the length of consecutive true values in a list

Essentially this problem can be split into two parts. I have a set of binary values that indicate whether a given signal is present or not. Given that each value also corresponds to a unit of time (in this case minutes), I am trying to determine how long the signal lasts on average, given its occurrences within the overall list of values for the period I'm analyzing. For example, if I have the following list:
[0,0,0,1,1,1,0,0,1,0,0,0,1,1,1,1,0]
I can see that the signal occurs 3 separate times for variable lengths of time (i.e. in the first case for 3 minutes). If I want to calculate the average length of time for each occurrence however I need an indication of how many independent instances of the signal exist (i.e. 3). I have tried various index based strategies such as:
arb_ops.index(1)
to find the next occurrence of true values and correspondingly finding the next occurrence of 0 to find the length but am having trouble contextualizing this into a recursive function for the entire array.
You could use itertools.groupby() to group consecutive equal elements. To calculate a group's length convert the iterator to a list and apply len() to it:
>>> from itertools import groupby
>>> lst = [0 ,0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0 ,1, 1, 1, 1, 0]
>>> for k, g in groupby(lst):
...     g = list(g)
...     print(k, g, len(g))
...
0 [0, 0, 0] 3
1 [1, 1, 1] 3
0 [0, 0] 2
1 [1] 1
0 [0, 0, 0] 3
1 [1, 1, 1, 1] 4
0 [0] 1
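
Building on the grouping shown above, the average duration asked for in the question can be obtained by keeping only the groups whose key is 1. A minimal sketch; the helper name run_lengths is made up for illustration:

```python
from itertools import groupby


def run_lengths(values, target=1):
    """Lengths of each run of consecutive items equal to target."""
    return [sum(1 for _ in group)
            for key, group in groupby(values) if key == target]


signal = [0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0]
lengths = run_lengths(signal)
print(lengths)                      # [3, 1, 4]
print(len(lengths))                 # 3 separate occurrences
print(sum(lengths) / len(lengths))  # average duration, about 2.67 minutes
```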
Another option may be MaskedArray.count, which counts non-masked elements of an array along a given axis:
import numpy.ma as ma
a = ma.arange(6).reshape((2, 3))
a[1, :] = ma.masked
a
masked_array(data =
[[0 1 2]
[-- -- --]],
mask =
[[False False False]
[ True True True]],
fill_value = 999999)
a.count()
3
You can extend Masked Arrays quite far...
@eugene-yarmash's solution with groupby is decent. However, if you wanted a solution that requires no import, and where you do the grouping yourself (for learning purposes), you could try this:
>>> l = [0,0,0,1,1,1,0,0,1,0,0,0,1,1,1,1,0]
>>> def size(xs):
...     sz = 0
...     for x in xs:
...         if x == 0 and sz > 0:
...             yield sz
...             sz = 0
...         if x == 1:
...             sz += 1
...     if sz > 0:
...         yield sz
...
>>> list(size(l))
[3, 1, 4]
I think this problem is actually pretty simple: you know you have a new signal if you see a value of 1 and the previous value is 0 (or there is no previous value).
The code I provided is kind of long, but super simple, and done without imports.
signal = [0,0,0,1,1,1,0,0,1,0,0,0,1,1,1,1,0]
def find_number_of_signals(signal):
    signal_counter = 0
    signal_duration = 0
    for index, value in enumerate(signal):
        if value == 1:
            signal_duration += 1
            # a new signal starts at index 0 or right after a 0
            if index == 0 or signal[index - 1] == 0:
                signal_counter += 1
    print(signal_counter)
    print(signal_duration)
    print(signal_duration / signal_counter)
find_number_of_signals(signal)
