How to avoid so much indexing in the code - python

I am new to Python and I feel like I am using completely the wrong strategy for programming in it. Here is an example:
I have a list like this:
selected_parents =
[array([[4, 6, 3, 1, 0, 7, 5, 2]]), array([[0, 2, 7, 3, 5, 4, 1, 6]])]
Now I want to apply crossover to the elements of the list (please see the P.S. for what I mean by crossover and how it is done, but again, my question is how I can avoid all the indexing I end up using while programming in Python):
crossed_p1 = np.zeros((len(selected_parents[0][0]))).astype(int)
crossed_p2 = np.zeros((len(selected_parents[0][0]))).astype(int)
co_point = rd.sample(range(len(selected_parents[0][0])),1)
if co_point[0] >= len(selected_parents[0][0])/2:
    crossed_p1[0:co_point[0]] = selected_parents[0][0][0:co_point[0]]
    indeces = []
    for i in range(co_point[0],len(selected_parents[0][0])):
        a = np.where(selected_parents[1][0] == selected_parents[0][0][i])
        indeces.append(a)
    indeces = sorted(indeces)
    for i in range(len(indeces)):
        crossed_p1[i + co_point[0]] = selected_parents[1][0][indeces[i][0][0]]
    crossed_p2[0:co_point[0]] = selected_parents[1][0][0:co_point[0]]
    indeces = []
    for i in range(co_point[0],len(selected_parents[0][0])):
        a = np.where(selected_parents[0][0] == selected_parents[1][0][i])
        indeces.append(a)
    indeces = sorted(indeces)
    for i in range(len(indeces)):
        crossed_p2[i + co_point[0]] = selected_parents[0][0][indeces[i][0][0]]
else:
    crossed_p1[co_point[0]:] = selected_parents[0][0][co_point[0]:]
    indeces = []
    for i in range(co_point[0]):
        a = np.where(selected_parents[1][0] == selected_parents[0][0][i])
        indeces.append(a)
    indeces = sorted(indeces)
    for i in range(len(indeces)):
        crossed_p1[i] = selected_parents[1][0][indeces[i][0][0]]
    crossed_p2[co_point[0]:] = selected_parents[1][0][co_point[0]:]
    indeces = []
    for i in range(co_point[0]):
        a = np.where(selected_parents[0][0] == selected_parents[1][0][i])
        indeces.append(a)
    indeces = sorted(indeces)
    for i in range(len(indeces)):
        crossed_p2[i] = selected_parents[0][0][indeces[i][0][0]]
The code works like a charm, but I hate the way I am writing it! I keep asking myself: do I really have to write something like selected_parents[0][0][indeces[i][0][0]]? Is there a better way of doing what I am doing?
P.S. This is an example of a genetic algorithm, and the two arrays in selected_parents are the first-generation parents. Now I want to apply crossover, which means: a cutting point (co_point in the code), a random integer between 1 and the parents' length (here 8), is selected; the first descendant (crossed_p1) inherits the longer substring from the first parent, and the numbers of the shorter substring are rearranged into the order in which they appear in the second parent. The same procedure is repeated for the second descendant (crossed_p2), with the roles of the parents swapped. For example, with the current selected_parents list and co_point = 5, the first descendant (crossed_p1) inherits the substring 46310 from the first parent, and the remaining substring 752 is replaced by 275, which is the order in which those numbers appear in the second parent. Hence the first descendant (crossed_p1) is 46310275 and the second descendant (crossed_p2) is 02735461.

Most of the indexing is into the elements of the selected_parents list, which are 2d arrays:
selected_parents[0][0][0:co_point[0]]
Arrays can be indexed with a single set of []:
selected_parents[0][0, 0:co_point[0]]
Notationally, it might be convenient to 'name' the 2 elements of the list (unpacking):
p1, p2 = selected_parents
p1[0, 0:co_point[0]]
Generally it is better to use shape than len on an array. Replace
len(selected_parents[0][0])
with
p1.shape[1]
p1.shape is (1,8)
It looks like p1 and p2 have the same shape, in which case
np.stack(selected_parents)
should produce a (2,1,8) array, which could be reshaped to (2,8). Or
np.concatenate(selected_parents, axis=0)
producing a (2,8) array.
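As a rough sketch of the suggestions above (assuming, as in the question, two parents of shape (1, 8); the cut point co = 5 is only an illustrative value), joining the parents into one (2, 8) array lets each parent be unpacked as a 1-D row, so a single index or slice is enough:

```python
import numpy as np

selected_parents = [np.array([[4, 6, 3, 1, 0, 7, 5, 2]]),
                    np.array([[0, 2, 7, 3, 5, 4, 1, 6]])]

# Join the two (1, 8) arrays along the first axis into one (2, 8) array.
parents = np.concatenate(selected_parents, axis=0)

# Unpacking a 2-D array iterates over its rows, so p1 and p2 are 1-D.
p1, p2 = parents

co = 5                       # illustrative cut point, not drawn at random here
head = p1[:co]               # replaces selected_parents[0][0][0:co_point[0]]
n_genes = parents.shape[1]   # replaces len(selected_parents[0][0])
```

With 1-D rows, expressions such as p2[k] replace selected_parents[1][0][k] throughout the original loop.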

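The simplest way to do a crossover seems to be with genes stored as 1D lists:
>>> selected_parents = [[4, 6, 3, 1, 0, 7, 5, 2], [0, 2, 7, 3, 5, 4, 1, 6]]
Let's create two parents and select the crossover point (the snippets here assume something like from numpy import array, append, random, so random.randint is NumPy's and accepts a single argument):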
>>> p1, p2 = selected_parents
>>> cx = random.randint(len(p1))
>>> p1
[4, 6, 3, 1, 0, 7, 5, 2]
>>> p2
[0, 2, 7, 3, 5, 4, 1, 6]
>>> cx
4
The first and second children are concatenations of the two truncated lists:
>>> ch1=p1[:cx]+p2[cx:]
>>> ch1
[4, 6, 3, 1, 5, 4, 1, 6]
>>> ch2=p2[:cx]+p1[cx:]
>>> ch2
[0, 2, 7, 3, 0, 7, 5, 2]
>>>
If you need numpy, that is not a problem. The same idea works:
>>> selected_parents = [array([[4, 6, 3, 1, 0, 7, 5, 2]]), array([[0, 2, 7, 3, 5, 4, 1, 6]])]
>>> p1, p2 = selected_parents
>>> p1
array([[4, 6, 3, 1, 0, 7, 5, 2]])
>>> p2
array([[0, 2, 7, 3, 5, 4, 1, 6]])
>>> cx = random.randint(p1.shape[1])
>>> cx
5
>>> ch1=append(p1[0][:cx],p2[0][cx:])
>>> ch1
array([4, 6, 3, 1, 0, 4, 1, 6])
>>> ch2=append(p2[0][:cx],p1[0][cx:])
>>> ch2
array([0, 2, 7, 3, 5, 7, 5, 2])
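Note that plain concatenation can repeat genes (ch1 in the list example above contains 4, 1 and 6 twice). If the children must remain permutations, as the question's P.S. describes, one possible list-based sketch follows. The function name order_crossover and its cx argument are made up for illustration, and the rule for which part is kept (the first cx genes when cx is at least half the length) is taken from the question's code:

```python
def order_crossover(keep_from, order_from, cx):
    """Keep the longer part of keep_from; reorder the rest as in order_from."""
    n = len(keep_from)
    if cx >= n / 2:
        kept = keep_from[:cx]
        missing = set(keep_from[cx:])
        # Genes that were cut off, listed in the order they occur in order_from.
        return kept + [g for g in order_from if g in missing]
    kept = keep_from[cx:]
    missing = set(keep_from[:cx])
    return [g for g in order_from if g in missing] + kept


p1 = [4, 6, 3, 1, 0, 7, 5, 2]
p2 = [0, 2, 7, 3, 5, 4, 1, 6]
ch1 = order_crossover(p1, p2, 5)   # [4, 6, 3, 1, 0, 2, 7, 5]
ch2 = order_crossover(p2, p1, 5)   # [0, 2, 7, 3, 5, 4, 6, 1]
```

Both results match the worked example in the question (46310275 and 02735461) without any nested indexing.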

Here is a vectorized version of your code. One pleasant side effect of vectorization is that it often does away with most of the indices.
This code assumes that the parent vectors are shuffles of 0, 1, 2, .... If that's not the case some more work is needed:
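import numpy as np
def invperm(p):
    # Inverse permutation along the last axis: out[..., p[..., j]] = j
    out = np.empty_like(p)
    # list(...) so the last entry can be replaced even if ogrid returns a tuple
    idx = list(np.ogrid[tuple(map(slice, p.shape))])
    idx[-1] = p
    # recent NumPy versions require a tuple for multidimensional indexing
    out[tuple(idx)] = np.arange(p.shape[-1])
    return out
def f_pp(selected_parents):
    sp = np.reshape(selected_parents, (2, -1))
    _, N = sp.shape
    co = np.random.randint(0, N)
    out = sp.copy()
    # slc selects the shorter part, which is the one that gets reordered
    slc = np.s_[:co] if 2*co < N else np.s_[co:]
    out[::-1, slc] = out[
        np.c_[:2], np.sort(invperm(sp)[np.c_[:2], sp[::-1, slc]], axis=1)]
    return out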
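For a single 1-D permutation, the scatter assignment that invperm performs can be checked against np.argsort, which also yields the inverse of a permutation of 0, 1, 2, .... A small sketch, using the first parent from the question as sample data:

```python
import numpy as np

p = np.array([4, 6, 3, 1, 0, 7, 5, 2])   # a permutation of 0..7

# Scatter form: the value p[j] sits at position j, so inv[p[j]] = j.
inv_scatter = np.empty_like(p)
inv_scatter[p] = np.arange(p.size)

# Sorting a permutation of 0..n-1 recovers the same mapping.
inv_argsort = np.argsort(p)

print(inv_scatter)   # [4 3 7 2 0 6 1 5]
```

Here inv_scatter[v] is the position of the value v in p, which is what the crossover needs in order to sort genes by where they occur in a parent.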

Related

Shift values in numpy array by differing amounts

I have an array a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9]) that continues like that. I would like to shift this array but I'm not sure if I can use np.roll() here.
The array I would like to produce is [0, 0, 0, 2, 2, 3, 15, 15, 7].
As you can see, the first run of equal numbers in array a (in this case the three '2's) should be replaced with '0's. Everything should then be shifted such that the '3's are replaced with '2's, the '15' is replaced with the '3', etc. Ideally I would like to do this operation without any for loop, as I need it to run quickly.
I realise this operation may be a bit confusing so please ask questions.
If you want to stick with NumPy, you can achieve this using np.unique by returning the counts per unique elements with the return_counts option.
Then, simply roll the values and construct a new array with np.repeat:
>>> s, i, c = np.unique(a, return_index=True, return_counts=True)
(array([ 2, 3, 7, 9, 15]), array([0, 3, 6, 8, 5]), array([3, 2, 2, 1, 1]))
The three outputs are, respectively: the sorted unique elements, the index of the first occurrence of each unique element, and the count per unique element.
np.unique sorts the values, so we need to unsort the values as well as the counts first. We can then shift the values with np.roll:
>>> idx = np.argsort(i)
>>> v = np.roll(s[idx], 1)
>>> v[0] = 0
array([ 0, 2, 3, 15, 7])
Alternatively, use np.append, although this requires a full copy:
>>> v = np.append([0], s[idx][:-1])
array([ 0, 2, 3, 15, 7])
Finally reassemble:
>>> np.repeat(v, c[idx])
array([ 0, 0, 0, 2, 2, 3, 15, 15, 7])
Another, more general, solution also works when values recur in a (np.unique would merge non-adjacent runs of the same value). It relies on np.diff.
You can get the indices at which the value changes, i.e. where a new run starts, with:
>>> i = np.diff(np.append(a, [0])).nonzero()[0] + 1
array([3, 5, 6, 8, 9])
>>> idx = np.append([0], i)
array([0, 3, 5, 6, 8, 9])
The values are then obtained by indexing a copy of a that has a 0 prepended:
>>> v = np.append([0], a)[idx]
array([ 0, 2, 3, 15, 7, 9])
And the counts per element with:
>>> c = np.append(np.diff(i, prepend=0), [0])
array([3, 2, 1, 2, 1, 0])
Finally, reassemble:
>>> np.repeat(v, c)
array([ 0, 0, 0, 2, 2, 3, 15, 15, 7])
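The run-based idea above can be collected into one helper. This is only a sketch: the name shift_runs and the fill parameter are invented here, and run boundaries are found with a direct comparison of neighbours rather than with np.diff, so that it does not depend on the value of the last element:

```python
import numpy as np

def shift_runs(a, fill=0):
    """Replace every run of equal values by the value of the previous run."""
    a = np.asarray(a)
    if a.size == 0:
        return a.copy()
    # Index where each run of equal consecutive values starts.
    starts = np.flatnonzero(np.r_[True, a[1:] != a[:-1]])
    # Length of each run (the last run ends at a.size).
    lengths = np.diff(np.r_[starts, a.size])
    # Value of the preceding run; the first run receives `fill`.
    prev_values = np.r_[fill, a[starts][:-1]]
    return np.repeat(prev_values, lengths)

a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9])
print(shift_runs(a))   # [ 0  0  0  2  2  3 15 15  7]
```

Because runs are detected positionally, an input such as [1, 1, 2, 1] is handled as three separate runs.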
This is not using numpy, but one approach that comes to mind is to use itertools.groupby to collect contiguous runs of the same elements. Then shift all the elements (by prepending a 0) and use the counts to repeat them.
from itertools import chain, groupby
def shift(data):
    # (value, run length) for each contiguous run
    values = [(k, len(list(g))) for k,g in groupby(data)]
    keys = [0] + [i[0] for i in values]
    reps = [i[1] for i in values]
    return list(chain.from_iterable([[k]*rep for k, rep in zip(keys, reps)]))
For example
>>> a = np.array([2,2,2,3,3,15,7,7,9])
>>> shift(a)
[0, 0, 0, 2, 2, 3, 15, 15, 7]
You can try this code:
import numpy as np
a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9])
diff_a=np.diff(a)
idx=np.flatnonzero(diff_a)
val=diff_a[idx]
val=np.insert(val[:-1],0, a[0]) #update value
diff_a[idx]=val
res=np.append([0],np.cumsum(diff_a))
print(res)
You can try this:
import numpy as np
a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9])
z = a - np.pad(a, (1,0))[:-1]
z[m] = np.pad(z[(m := z!=0)], (1,0))[:-1]
print(z.cumsum())
It gives:
[ 0 0 0 2 2 3 15 15 7]

Python: choosing indices from array that correspond to elements of specific value

I have an array that looks like this:
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
I want to write a function that will randomly return some specified number of indices that correspond to a specified number. In other words, if I pass the function the array x, the desired number of indices such as 3, and the target value 1, I would want it to return an array such as:
[0, 7, 13]
Since 0, 7, and 13 are the indices that correspond to 1 in x.
Does anyone know how I might do this efficiently?
You want to use random.sample for this:
import random
def f(arr, target, num):
    return random.sample([i for i, x in enumerate(arr) if x == target], k=num)
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
print(f(x, 1, 3))
Output:
[0, 1, 15]
You can use the sample function from the random module and pass it the list of indices that match the specified value:
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
from random import sample
def randomIndices(a,count,v):
    return sample([i for i,n in enumerate(a) if n==v],count)
print(randomIndices(x,3,1)) # [1,18,15]
Your question asks how to do this efficiently, which depends on how you plan to use this code. As others and I have pointed out, one way is to use enumerate to filter the list for the indices that correspond to the target value. The downside is that each time you pick a new target value or request a new sample, you have to enumerate the list again, which is an O(n) operation.
If you plan on taking multiple samples, you may be better off building a dictionary mapping the target value to the indices upfront. Then you can subsequently use this dictionary to draw random samples more efficiently than enumerating. (The magnitude of the savings would grow as x becomes very large).
First build the dictionary using collections.defaultdict:
from collections import defaultdict
d = defaultdict(list)
for i, val in enumerate(x):
    d[val].append(i)
print(dict(d))
#{1: [0, 1, 7, 13, 15, 16, 18], 2: [2, 5, 6, 8, 10, 12, 14, 17], 3: [3, 4, 9, 11]}
Now you can use d to draw your samples:
from random import sample
def get_random_sample(d, target_value, size):
    return sample(d[target_value], size)
print(get_random_sample(d, target_value=1, size=3))
#[16, 7, 18]
You can do the following:
Get the indices of the items with value equal to 1.
Use random.sample to randomly select only a few of the indices extracted in the previous step (without repetition).
Here is one way to do it (n indicates the number of indices to pick):
from random import sample
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
n = 3
target = 1
# random.sample needs a sequence; sets are rejected since Python 3.11
indices = list(filter(lambda k: x[k] == target, range(len(x))))
out = sample(indices, min(len(indices), n))
print(out)
Note that the number of returned indices could be lower than n (if the number of 1s in the list is less than n)
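If the data already lives in a NumPy array, the same two steps (find the matching indices, then sample without replacement) can be sketched as below. The function name random_indices and its rng parameter are illustrative choices, not something from the answers above:

```python
import numpy as np

def random_indices(values, target, count, rng=None):
    """Pick up to `count` distinct indices i with values[i] == target."""
    rng = np.random.default_rng() if rng is None else rng
    # Positions of all elements equal to the target value.
    matches = np.flatnonzero(np.asarray(values) == target)
    # Never ask for more indices than there are matches.
    size = min(count, matches.size)
    return rng.choice(matches, size=size, replace=False)

x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
picked = random_indices(x, target=1, count=3)
```

Passing a seeded generator, for example rng=np.random.default_rng(0), makes the draw reproducible.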

Combine elements from two lists

I want to merge two arrays in python in a special way.
The entries with an odd index of my output array out shall be the corresponding entries of my first input array in0. The entries with an even index in out shall be the corresponding entries of my second input array in1 (positions are counted from 1 here, as the example shows).
in0, in1 and out are all the same length.
Example:
The input arrays
in0 = [0, 1, 2, 3]
in1 = [4, 5, 6, 7]
shall be merged into the output array
out = [0, 5, 2, 7]
Is there a nicer way than to loop over the whole length of the inputs and fill my out 'by hand'?
You could use a list comprehension and select values from in0 on even indices and in1 on odd indices:
[in0[i] if i % 2 == 0 else in1[i] for i in range(len(in0))]
# [0, 5, 2, 7]
If you're happy to make a full list copy, this is simple with slicing:
>>> in0 = [0, 1, 2, 3]
>>> in1 = [4, 5, 6, 7]
>>> out = in0[:]
>>> out[1::2] = in1[1::2]
>>> out
[0, 5, 2, 7]
If you don't mind some verbosity...
from itertools import cycle
in0 = [0, 1, 2, 3]
in1 = [4, 5, 6, 7]
out = [pair[i] for pair, i in zip(zip(in0, in1), cycle([0,1]))]
How it works:
1. zip(in0, in1) is a sequence of tuples, (0,4), (1,5), (2,6), (3,7).
2. cycle([0,1]) is an endless stream of alternating 0s and 1s to be used as indices into the tuples from step 1.
3. zip(zip(...), cycle(...)) produces pairs of a tuple and an index:
((0,4), 0), ((1,5), 1), ((2,6), 0), ((3,7), 1).
4. The list comprehension takes the correct element from each tuple.
In the end, the list comprehension is a general version of
[(0,4)[0], (1,5)[1], (2,6)[0], (3,7)[1]]
Without using loops, but not in the exact same order you requested:
>>> in0 = [0, 1, 2, 3]
>>> in1 = [4, 5, 6, 7]
>>> out = in0[0::2] + in1[1::2]
>>> out
[0, 2, 5, 7]
EDIT: correcting the output order with itertools:
>>> import itertools
>>> in0 = [0, 1, 2, 3]
>>> in1 = [4, 5, 6, 7]
>>> out = list(itertools.chain(*zip(in0[0::2], in1[1::2])))
>>> out
[0, 5, 2, 7]
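For NumPy arrays, the same selection can be written without an explicit loop by choosing element-wise between the two inputs with np.where. A minimal sketch, assuming both inputs have the same length:

```python
import numpy as np

in0 = np.array([0, 1, 2, 3])
in1 = np.array([4, 5, 6, 7])

# True at 0-based even positions, where the value comes from in0.
take_first = np.arange(in0.size) % 2 == 0

out = np.where(take_first, in0, in1)
print(out)   # [0 5 2 7]
```

Unlike the slice assignment shown earlier, this leaves both inputs untouched and builds the result in one expression.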

Split a list into increasing sequences using itertools

I have a list with mixed sequences like
[1,2,3,4,5,2,3,4,1,2]
I want to know how I can use itertools to split the list into increasing sequences cutting the list at decreasing points. For instance the above would output
[[1, 2, 3, 4, 5], [2, 3, 4], [1, 2]]
This is obtained by noting that the sequence first decreases at the second 2, so we cut the list there, and decreases again at the second 1, so we cut there as well.
Another example is with the sequence
[3,2,1]
the output should be
[[3], [2], [1]]
In the event that the given sequence is increasing we return the same sequence. For example
[1,2,3]
returns the same result. i.e
[[1, 2, 3]]
For a repeating list like
[ 1, 2,2,2, 1, 2, 3, 3, 1,1,1, 2, 3, 4, 1, 2, 3, 4, 5, 6]
the output should be
[[1, 2, 2, 2], [1, 2, 3, 3], [1, 1, 1, 2, 3, 4], [1, 2, 3, 4, 5, 6]]
What I did to achieve this is define the following function
def splitter (L):
    result = []
    tmp = 0
    initialPoint=0
    for i in range(len(L)):
        if (L[i] < tmp):
            tmpp = L[initialPoint:i]
            result.append(tmpp)
            initialPoint=i
        tmp = L[i]
    result.append(L[initialPoint:])
    return result
The function is working 100% but what I need is to do the same with itertools so that I can improve efficiency of my code. Is there a way to do this with itertools package to avoid the explicit looping?
With numpy, you can use numpy.split, which takes the indices of the split positions. Since you want to split where the value decreases, use numpy.diff to compute the differences, check where they are smaller than zero, and use numpy.where to retrieve the corresponding indices. An example with the last case in the question:
import numpy as np
lst = [ 1, 2,2,2, 1, 2, 3, 3, 1,1,1, 2, 3, 4, 1, 2, 3, 4, 5, 6]
np.split(lst, np.where(np.diff(lst) < 0)[0] + 1)
# [array([1, 2, 2, 2]),
# array([1, 2, 3, 3]),
# array([1, 1, 1, 2, 3, 4]),
# array([1, 2, 3, 4, 5, 6])]
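Psidom already has you covered with a good answer, but another NumPy-based solution would be to use scipy.signal.argrelmax to find the local maxima and then np.split. Note that argrelmax only reports strict local maxima by default, so a plateau before a drop (such as 2, 2, 2, 1) or consecutive decreases (such as 3, 2, 1) are not split.
import numpy as np
from scipy.signal import argrelmax
arr = np.random.randint(1000, size=10**6)
splits = np.split(arr, argrelmax(arr)[0]+1)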
Assume your original input array:
a = [1, 2, 3, 4, 5, 2, 3, 4, 1, 2]
First find the places where the splits shall occur:
p = [ i+1 for i, (x, y) in enumerate(zip(a, a[1:])) if x > y ]
Then create slices for each such split:
print([ a[m:n] for m, n in zip([ 0 ] + p, p + [ None ]) ])
This will print this:
[[1, 2, 3, 4, 5], [2, 3, 4], [1, 2]]
I'd suggest using more descriptive names than p, n, m, etc. ;-)
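Since the question asks specifically about itertools, here is one possible sketch using accumulate and groupby. The function name split_increasing is made up for illustration. Each element is labelled with the number of decreases seen so far, and elements that share a label are grouped together:

```python
from itertools import accumulate, groupby

def split_increasing(seq):
    """Split seq into non-decreasing runs, cutting wherever a value drops."""
    seq = list(seq)
    if not seq:
        return []
    # 1 where an element is smaller than its predecessor, else 0.
    drops = [0] + [int(b < a) for a, b in zip(seq, seq[1:])]
    # The running total of drops gives every element the id of its run.
    run_ids = accumulate(drops)
    return [[value for _, value in group]
            for _, group in groupby(zip(run_ids, seq), key=lambda pair: pair[0])]

print(split_increasing([1, 2, 3, 4, 5, 2, 3, 4, 1, 2]))
# [[1, 2, 3, 4, 5], [2, 3, 4], [1, 2]]
```

This still visits every element once, as the explicit loop does, so any gain is in brevity rather than in asymptotic cost.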

Finding differences between all values in an List

I want to find the differences between all values in a numpy array and append it to a new list.
Example: a = [1,4,2,6]
result : newlist= [3,1,5,3,2,2,1,2,4,5,2,4]
i.e. for each value i of a, determine the difference between it and each of the other values in the list.
At this point I have been unable to find a solution
You can do this:
a = [1,4,2,6]
newlist = [abs(i-j) for i in a for j in a if i != j]
Output:
print(newlist)
[3, 1, 5, 3, 2, 2, 1, 2, 4, 5, 2, 4]
I believe what you are trying to do is to calculate absolute differences between elements of the input list, but excluding the self-differences. So, with that idea, this could be one vectorized approach also known as array programming -
import numpy as np
# Input list
a = [1,4,2,6]
# Convert input list to a numpy array
arr = np.array(a)
# Calculate absolute differences between each element
# against all elements to give us a 2D array
sub_arr = np.abs(arr[:,None] - arr)
# Get diagonal indices for the 2D array
N = arr.size
rem_idx = np.arange(N)*(N+1)
# Remove the diagonal elements for the final output
out = np.delete(sub_arr,rem_idx)
Sample run to show the outputs at each step -
In [60]: a
Out[60]: [1, 4, 2, 6]
In [61]: arr
Out[61]: array([1, 4, 2, 6])
In [62]: sub_arr
Out[62]:
array([[0, 3, 1, 5],
       [3, 0, 2, 2],
       [1, 2, 0, 4],
       [5, 2, 4, 0]])
In [63]: rem_idx
Out[63]: array([ 0, 5, 10, 15])
In [64]: out
Out[64]: array([3, 1, 5, 3, 2, 2, 1, 2, 4, 5, 2, 4])
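As a variation on the last step, the diagonal can also be dropped with a boolean mask instead of np.delete. This is only a sketch of an alternative, not the answerer's method; it selects elements by position, so duplicate values in the input are kept:

```python
import numpy as np

a = [1, 4, 2, 6]
arr = np.asarray(a)

# Pairwise absolute differences as an N x N array.
diffs = np.abs(arr[:, None] - arr)

# True everywhere except on the diagonal (the self-differences).
off_diagonal = ~np.eye(arr.size, dtype=bool)

out = diffs[off_diagonal]
print(out)   # [3 1 5 3 2 2 1 2 4 5 2 4]
```

Boolean indexing returns the selected elements in row-major order, which matches the order requested in the question.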
