Fix first element, shuffle the rest of a list/array - python

Is it possible to shuffle only a (continuous) part of a given list (or array in numpy)?
If this is not generally possible, how about the special case where the first element is fixed while the rest of the list/array need to be shuffled? For example, I have a list/array:
to_be_shuffled = [None, 'a', 'b', 'c', 'd', ...]
where the first element should always stay, while the rest are going to be shuffled repeatedly.
One possible way is to shuffle the whole list first and then check the first element: if it is not the special fixed element (e.g. None), swap it with the special element (which requires first looking up the special element's position).
Is there any better way for doing this?

Why not just
import random
rest = to_be_shuffled[1:]
random.shuffle(rest)
shuffled_lst = [to_be_shuffled[0]] + rest
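If you want the shuffled result written back into the original list instead of building a new one, a small sketch using slice assignment (a standard list feature) would be:
rest = to_be_shuffled[1:]
random.shuffle(rest)
to_be_shuffled[1:] = rest  # slice assignment puts the shuffled tail back in place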

numpy arrays don't copy data on slicing, so you can shuffle a slice view in place:
numpy.random.shuffle(a[1:])
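A quick sketch of what that means in practice (basic slicing of an ndarray returns a view, so shuffling the slice shuffles the original array in place):
import numpy as np
a = np.array([0, 1, 2, 3, 4, 5])
np.random.shuffle(a[1:])  # a[1:] is a view into a, so a is shuffled in place
print(a)  # e.g. [0 3 5 1 4 2]; a[0] is still 0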

I thought it would be interesting and educational to try to implement a slightly more general approach than what you're asking for. Here I shuffle the indices to the original list (rather than the list itself), excluding the locked indices, and use that index-list to cherry pick elements from the original list. This is not an in-place solution, but implemented as a generator so you can lazily pick elements.
Feel free to edit if you can improve it.
import random

def partial_shuf(input_list, fixed_indices):
    """Given an input_list, yield its elements in random order,
    except that elements whose indices are in fixed_indices stay put."""
    fixed_indices = sorted(set(i for i in fixed_indices if i < len(input_list)))
    i = 0
    for fixed in fixed_indices:
        aslice = list(range(i, fixed))  # list() so shuffle can work in place
        i = 1 + fixed
        random.shuffle(aslice)
        for j in aslice:
            yield input_list[j]
        yield input_list[fixed]
    aslice = list(range(i, len(input_list)))
    random.shuffle(aslice)
    for j in aslice:
        yield input_list[j]

print('\n'.join(' '.join((str(i), str(n)))
                for i, n in enumerate(partial_shuf(range(4, 36), [0, 4, 9, 17, 25, 40]))))
assert sorted(partial_shuf(range(4, 36), [0, 4, 9, 17, 25, 40])) == list(range(4, 36))

I took the shuffle function from the standard library random module (found in Lib\random.py) and modified it slightly so that it shuffles only a portion of the list specified by start and stop. It does this in place. Enjoy!
from random import randint

def shuffle(x, start=0, stop=None):
    if stop is None:
        stop = len(x)
    for i in reversed(range(start + 1, stop)):
        # pick an element in x[start:i+1] with which to exchange x[i]
        j = randint(start, i)
        x[i], x[j] = x[j], x[i]
For your purposes, calling this function with 1 as the start parameter should do the trick.
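For instance, a minimal usage sketch with the list from the question:
to_be_shuffled = [None, 'a', 'b', 'c', 'd']
shuffle(to_be_shuffled, start=1)
print(to_be_shuffled)  # e.g. [None, 'c', 'a', 'd', 'b']; index 0 is untouched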

Related

Generate Unique Pairs From Two Lists Without Itertools

I am looping over a list twice and want to collect all unique pairs rather than all ordered pairs, i.e. order within the pair doesn't matter.
listy = [0, 1, 2]
out = []
for i in listy:
    for j in listy:
        out.append([i, j])
The output I'm getting is [[0,0],[0,1],[0,2],[1,0],[1,1],[1,2],[2,0],[2,1],[2,2]]
What I am looking for is [[0,0],[0,1],[0,2],[1,1],[1,2],[2,2]]
One possible solution is,
listy = [0, 1, 2]
out = []
for i in listy:
    for j in listy:
        pair = set([i, j])
        if pair not in out:
            out.append(pair)
This produces [{0},{0,1},{0,2},{1},{1,2},{2}]
However this creates inefficiency in what is a heavy script (the lists are long) and I also do not want a list of sets. I want a list of lists.
Is there a better way to achieve this without using itertools? (I want an implementation I can also apply to JavaScript without too much rethinking.)
I don't know JavaScript at all, but if there is something analogous to list comprehension I would try this:
listy = [0,1,2]
pairs = [[i, j] for i in listy for j in listy if i <= j]
The content of pairs is exactly as you wanted:
[[0, 0], [0, 1], [0, 2], [1, 1], [1, 2], [2, 2]]
Option 1 - "minor fix"
A trivial "fix" would be:
listy = [0, 1, 2]
out = set()
for i in listy:
    for j in listy:
        if (i, j) not in out and (j, i) not in out:
            out.add((i, j))
The result is:
{(0, 1), (1, 2), (0, 0), (1, 1), (0, 2), (2, 2)}
However this is not an efficient implementation, because we have to do two membership checks for every pair.
Option 2 - More efficient implementation
You could achieve your goal with a simple scan over the indices:
listy = [0, 1, 2]
out = [(listy[i], listy[j]) for i in range(len(listy)) for j in range(i, len(listy))]
NOTE: I use tuples for the pairs; you could easily change that into a list of lists using:
out = [[listy[i], listy[j]] for i in range(len(listy)) for j in range(i, len(listy))]
The most straightforward loop translation of itertools.combinations_with_replacement(listy, 2) (which matches the behavior you want) is:
listy = [0, 1, 2]
out = []
for idx, i in enumerate(listy):
    for j in listy[idx:]:
        out.append([i, j])
The only changes are the use of enumerate (to get the current index as you iterate) and slicing listy in the inner loop (so it starts at the same index as the current run of the outer loop).
This gets the exact result requested with minimal overhead (it does make shallow copies of the list, of decreasing size, once for each inner loop, but this is fairly quick in Python; unless the list is huge, it should be a pretty minimal cost). If you need to avoid slicing, you can make the inner loop an index-based loop with indexing (but in practice, the overhead of indexing is high enough that it'll often lose to the slicing):
listy = [0, 1, 2]
out = []
for idx, i in enumerate(listy):
    for idxj in range(idx, len(listy)):
        out.append([i, listy[idxj]])
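For reference (the question rules out itertools, but this is useful as a correctness check): the loops above are a hand-rolled translation of
from itertools import combinations_with_replacement
listy = [0, 1, 2]
out = [list(p) for p in combinations_with_replacement(listy, 2)]
print(out)  # [[0, 0], [0, 1], [0, 2], [1, 1], [1, 2], [2, 2]]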

Split sorted list into two lists

I'm trying to split a sorted integer list into two lists. The first list would have all ints under n and the second all ints over n. Note that n does not have to be in the original list.
I can easily do this with:
under = []
over = []
for x in sorted_list:
    if x < n:
        under.append(x)
    else:
        over.append(x)
But it just seems like it should be possible to do this in a more elegant way knowing that the list is sorted. takewhile and dropwhile from itertools sound like the solution but then I would be iterating over the list twice.
Functionally, the best I can do is this:
i = 0
while sorted_list[i] < n:
    i += 1
under = sorted_list[:i]
over = sorted_list[i:]
But I'm not even sure if it is actually better than just iterating over the list twice and it is definitely not more elegant.
I guess I'm looking for a way to get the list returned by takewhile and the remaining list, perhaps, in a pair.
The correct solution here is the bisect module. Use bisect.bisect to find the index to the right of n (or the index where it would be inserted if it's missing), then slice around that point:
import bisect # At top of file
split_idx = bisect.bisect(sorted_list, n)
under = sorted_list[:split_idx]
over = sorted_list[split_idx:]
While any solution is going to be O(n) (you do have to copy the elements after all), the comparisons are typically more expensive than simple pointer copies (and associated reference count updates), and bisect reduces the comparison work on a sorted list to O(log n), so this will typically (on larger inputs) beat simply iterating and copying element by element until you find the split point.
Use bisect.bisect_left (which finds the leftmost index of n) instead of bisect.bisect (equivalent to bisect.bisect_right) if you want n to end up in over instead of under.
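A quick sketch of the difference, using a small sorted list for illustration:
import bisect
sorted_list = [1, 2, 4, 5, 6, 7, 8]
n = 6
print(sorted_list[:bisect.bisect(sorted_list, n)])       # [1, 2, 4, 5, 6] -- n ends up in under
print(sorted_list[bisect.bisect_left(sorted_list, n):])  # [6, 7, 8]       -- n ends up in over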
I would use the following approach, where I find the index and use slicing to create under and over:
sorted_list = [1,2,4,5,6,7,8]
n=6
idx = sorted_list.index(n)
under = sorted_list[:idx]
over = sorted_list[idx:]
print(under)
print(over)
Output (same as with your code):
[1, 2, 4, 5]
[6, 7, 8]
Edit: Since I misunderstood the question, here is an adapted solution that finds the insertion index even when n is not in the list:
import numpy as np
sorted_list = [1,2,4,5,6,7,8]
n=3
idx = np.searchsorted(sorted_list, n)
under = sorted_list[:idx]
over = sorted_list[idx:]
print(under)
print(over)
Output:
[1, 2]
[4, 5, 6, 7, 8]

Merge N lists by randomly picking elements at each index

I have a bajillion paired lists, each pair of equal size. I want to "merge" each by picking a random element from each index, but my current implementation is very slow - even when multiprocessing. (FWIW, my code does need to be threadable).
import random

def rand_merge(l1, l2):
    newl = []
    for i in range(len(l1)):
        q = random.choice([l1, l2])
        newl.append(q[i])
    return newl
Pretty basic, but running it on 20k lists of sizes ~5-25, it takes crazy long - I assume it's random.choice gumming up the works. But I've also tried other versions of random, like creating a string of 0's and 1's to refer to, no go.
EDIT:
More clarity: It's a Genetic Algorithm designed to write sentences by matching up against a corpus. The lists in question are sentences split by word. The GA is "merging" winning fitness "parents" into children, each of which are a merging of the two parent sentences' "genes."
That means that the "lists" do need to match up, and can't pull from a larger list of lists (I don't think).
Here some code...
from multiprocessing import Pool as ThreadPool
import random
def offspring(parents):
    child = []
    p1 = parents[0].split(' ')
    p2 = parents[1].split(' ')
    for i in range(min(len(p1), len(p2))):
        q = random.choice([p1, p2])
        child.append(q[i])
    child = ' '.join(child).strip()
    return child
def nextgen(l):  # l is two lists: previous generation and grammar seed
    oldgen = l[0][:pop]  # population's worth of the previous generation
    gramsent = l[1]  # this is the grammar seed
    newgen = []
    newgen.append(tuple([oldgen[0][0], oldgen[0][0]]))  # keep the winner!
    for i in range(len(oldgen) - len(oldgen)//4):
        ind1 = oldgen[0][0]  # paired off against the winner - for larger pools, this is a random.sample/"tournament"
        ind2 = oldgen[i][0]
        newgen.append(tuple([ind1, ind2]))
    pool = ThreadPool(processes=8)
    newgen = pool.map(offspring, newgen)
    pool.close()
    pool.join()
The populations and generations can get into high numbers together, and each sentence runs through. Since posting the question originally, troubled that it was taking so long for each generation to roll by, I discovered (head-scratcher for me) that the long processing times actually have (almost) nothing to do with the "population" size or number of lists. It was taking ~15 seconds to mutate each generation. I upped the population from 50 to 50000 and the generations went from 15 seconds to 17 or so. So the slowness is apparently hiding elsewhere.
Try merging all 20,000 lists at once, instead of two at a time.
from itertools import zip_longest
from functools import partial
import random
lists = [l1, l2, ...]
idxvals = map(partial(filter, None), zip_longest(*lists))
newl = [random.choice([*i]) for i in idxvals]
Since you want to pick a random element at each index, it makes sense to choose from all 20k lists at once instead of 2 at a time.
>>> lists = [[1, 2, 3], [10], [20, 30, 40, 5]]
zip_longest will zip to the longest list, filling missing values with None.
>>> list(zip_longest(*lists))
[(1, 10, 20), (2, None, 30), (3, None, 40), (None, None, 5)]
These Nones will need to be filtered out before the choose step. filter will help with that.
>>> f = partial(filter, None)
>>> list(map(list, map(f, zip_longest(*lists))))
[[1, 10, 20], [2, 30], [3, 40], [5]]
It should be clear what I'm trying to do. The ith index of the output contains those elements present at l[i], for every l in lists.
Now, iterate over idxvals and choose:
>>> idxvals = map(f, zip_longest(*lists))
>>> [random.choice([*i]) for i in idxvals]
[10, 30, 3, 5]
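Putting it all together as one function, here is a minimal self-contained sketch of the same idea. One caveat: filter(None, ...) drops every falsy value, including legitimate 0 or '' elements, so this version filters on is not None instead:
from itertools import zip_longest
import random

def rand_merge_all(lists):
    # at each index, choose randomly among the lists that still have an element there
    return [random.choice([v for v in col if v is not None])
            for col in zip_longest(*lists)]

print(rand_merge_all([[1, 2, 3], [10], [20, 30, 40, 5]]))  # e.g. [20, 2, 40, 5]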

Python split list by one element

I am analysing data using Python and I have a list of N 2d data arrays. I would like to look at these elements one by one and compare them to the average of the other N-1 elements.
Is there a built-in method in Python to loop over a list and have on one hand a single element and on the other the rest of the list?
I know how to do it the "ugly" way by looping over an integer and joining the left and right part:
for i in xrange(N):
    my_element = my_list[i]
    my_sublist = my_list[:i] + my_list[i+1:]
but is there a Pythonista way of doing it?
We can calculate the sum of all elements once. Then, for each element, the sum of the others is the total minus the current element; to find the average we divide by N - 1:
s = sum(my_list)
for my_element in my_list:
    avg_of_others = (s - my_element) / float(len(my_list) - 1)
    ...
EDIT:
This is an example how it can be extended to numpy:
import numpy as np

l = np.array([(1, 2), (1, 3)])
m = np.array([(3, 1), (2, 4)])
my_list = [l, m]
s = sum(my_list)
for my_element in my_list:
    avg_of_others = (s - my_element) / float(len(my_list) - 1)
Nothing built-in that I know of, but maybe a little less copy-intensive using generators:
def iwithout(pos, seq):
    for i, elem in enumerate(seq):
        if i != pos:
            yield elem

for elem, others in ((elem, iwithout(i, N)) for i, elem in enumerate(N)):
    ...
    # others is now a generator running over N,
    # leaving out elem
I would like to look at these elements one by one and compare them to the average of the other N-1 elements.
For this specific use case, you should just calculate the sum of the entire list once and subtract the current element to calculate the average, as explained in JuniorCompressor's answer.
Is there a built-in method in Python to loop over a list and have on one hand a single element and on the other the rest of the list?
For the more general problem, you could use collections.deque to pop the next element from the one end, giving you that element and the remaining elements from the list, and then add it back to the other end before the next iteration of the loop. Both operations are O(1).
import collections

my_queue = collections.deque(my_list)
for _ in my_list:
    my_element = my_queue.popleft()  # pop the next element
    my_sublist = my_queue  # rest of the queue, without my_element
    print(my_element, my_sublist)  # do stuff...
    my_queue.append(my_element)  # re-insert my_element
Sample output for my_list = range(5):
0 deque([1, 2, 3, 4])
1 deque([2, 3, 4, 0])
2 deque([3, 4, 0, 1])
3 deque([4, 0, 1, 2])
4 deque([0, 1, 2, 3])

Is there a need for range(len(a))?

One frequently finds expressions of this type in python questions on SO. Either for just accessing all items of the iterable
for i in range(len(a)):
    print(a[i])
Which is just a cumbersome way of writing:
for e in a:
    print(e)
Or for assigning to elements of the iterable:
for i in range(len(a)):
    a[i] = a[i] * 2
Which should be the same as:
for i, e in enumerate(a):
    a[i] = e * 2
# Or if it isn't too expensive to create a new iterable
a = [e * 2 for e in a]
Or for filtering over the indices:
for i in range(len(a)):
    if i % 2 == 1: continue
    print(a[i])
Which could be expressed like this:
for e in a[::2]:
    print(e)
Or when you just need the length of the list, and not its content:
for _ in range(len(a)):
    doSomethingUnrelatedToA()
Which could be:
for _ in a:
    doSomethingUnrelatedToA()
In Python we have enumerate, slicing, filter, sorted, etc... Since Python's for constructs are intended to iterate over iterables, not only ranges of integers, are there real-world use cases where you need range(len(a))?
If you need to work with the indices of a sequence, then yes - you use it... e.g. for the equivalent of numpy.argsort...:
>>> a = [6, 3, 1, 2, 5, 4]
>>> sorted(range(len(a)), key=a.__getitem__)
[2, 3, 1, 5, 4, 0]
Short answer: mathematically speaking, no; in practical terms, yes, for example for Intentional Programming.
Technically, the answer would be "no, it's not needed" because it's expressible using other constructs. But in practice, I use for i in range(len(a)) (or for _ in range(len(a)) if I don't need the index) to make it explicit that I want to iterate as many times as there are items in a sequence, without needing to use those items for anything.
So: "Is there a need?"? — yes, I need it to express the meaning/intent of the code for readability purposes.
See also: https://en.wikipedia.org/wiki/Intentional_programming
And obviously, if there is no collection associated with the iteration at all, for ... in range(N) is the only option, so as not to resort to i = 0; while i < N: i += 1 ...
What if you need to access two elements of the list simultaneously?
for i in range(len(a[0:-1])):
    something_new[i] = a[i] * a[i+1]
You can use this, but it's probably less clear:
for i, _ in enumerate(a[0:-1]):
    something_new[i] = a[i] * a[i+1]
Personally I'm not 100% happy with either!
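One more alternative that avoids indices entirely: zip the list against itself shifted by one, so each element is paired with its successor:
a = [1, 2, 3, 4]
something_new = [x * y for x, y in zip(a, a[1:])]
print(something_new)  # [2, 6, 12]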
Going by the comments as well as personal experience, I say no, there is no need for range(len(a)). Everything you can do with range(len(a)) can be done in another (usually far more efficient) way.
You gave many examples in your post, so I won't repeat them here. Instead, I will give an example for those who say "What if I want just the length of a, not the items?". This is one of the only times you might consider using range(len(a)). However, even this can be done like so:
>>> a = [1, 2, 3, 4]
>>> for _ in a:
...     print True
...
True
True
True
True
>>>
Clements' answer (as shown by Allik) can also be reworked to remove range(len(a)):
>>> a = [6, 3, 1, 2, 5, 4]
>>> sorted(range(len(a)), key=a.__getitem__)
[2, 3, 1, 5, 4, 0]
>>> # Note however that, in this case, range(len(a)) is more efficient.
>>> [x for x, _ in sorted(enumerate(a), key=lambda i: i[1])]
[2, 3, 1, 5, 4, 0]
>>>
So, in conclusion, range(len(a)) is not needed. Its only upside is readability (its intention is clear). But that is just preference and code style.
Sometimes matplotlib requires range(len(y)), e.g., while y=array([1,2,5,6]), plot(y) works fine, scatter(y) does not. One has to write scatter(range(len(y)),y). (Personally, I think this is a bug in scatter; plot and its friends scatter and stem should use the same calling sequences as much as possible.)
It's nice to have when you need to use the index for some kind of manipulation and having the current element doesn't suffice. Take for instance a binary tree that's stored in an array. If you have a method that asks you to return a list of tuples that contains each node's direct children, then you need the index.
# 0 -> 1,2 : 1 -> 3,4 : 2 -> 5,6 : 3 -> 7,8 ...
nodes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
children = []
for i in range(len(nodes)):
    leftNode = None
    rightNode = None
    if i*2 + 1 < len(nodes):
        leftNode = nodes[i*2 + 1]
    if i*2 + 2 < len(nodes):
        rightNode = nodes[i*2 + 2]
    children.append((leftNode, rightNode))
Of course if the element you're working on is an object, you can just call a get children method. But yea, you only really need the index if you're doing some sort of manipulation.
Sometimes, you really don't care about the collection itself. For instance, creating a simple model fit line to compare an "approximation" with the raw data:
from math import sqrt

fib_raw = [1, 1, 2, 3, 5, 8, 13, 21]  # Fibonacci numbers
phi = (1 + sqrt(5)) / 2
phi2 = (1 - sqrt(5)) / 2
def fib_approx(n): return (phi**n - phi2**n) / sqrt(5)
x = range(len(fib_raw))
y = [fib_approx(n) for n in x]
# Now plot to compare fib_raw and y
# Compare error, etc.
In this case, the values of the Fibonacci sequence itself were irrelevant. All we needed here was the size of the input sequence we were comparing with.
If you have to iterate over the first len(a) items of an object b (that is larger than a), you should probably use range(len(a)):
for i in range(len(a)):
    do_something_with(b[i])
I have a use case I don't believe any of your examples cover.
boxes = [b1, b2, b3]
items = [i1, i2, i3, i4, i5]
for j in range(len(boxes)):
    boxes[j].putitemin(items[j])
I'm relatively new to python though so happy to learn a more elegant approach.
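For what it's worth, the usual index-free way to walk two lists in lockstep is zip, which pairs the lists element by element and stops at the shorter one:
for box, item in zip(boxes, items):
    box.putitemin(item)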
Very simple example:
def loadById(self, id):
    if id in range(len(self.itemList)):
        self.load(self.itemList[id])
I can't quickly think of a solution that does not use the range-len composition. But this should probably be done with try..except instead, to stay Pythonic, I guess.
One problem with for i, num in enumerate(a) is that num does not change when you change a[i]. For example, this loop:
for i, num in enumerate(a):
    while num > 0:
        a[i] -= 1
will never end.
Of course, you could still use enumerate while swapping each use of num for a[i], but that kind of defeats the whole purpose of enumerate, so using for i in range(len(a)) just becomes more logical and readable.
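A tiny sketch of the difference: the index-based version terminates, because a[i] is re-read on every pass through the while loop:
a = [3, 0, 2]
for i in range(len(a)):
    while a[i] > 0:
        a[i] -= 1
print(a)  # [0, 0, 0]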
Having a range of indices is useful for some more sophisticated problems in combinatorics. For example, to get all possible partitions of a list into three non-empty sections, the most straightforward approach is to find all possible combinations of distinct endpoints between the first and second section and between the second and third section. This is equivalent to ordered pairs of integers chosen from the valid indices into the list (except zero, since that would make the first partition empty). Thus:
>>> from itertools import combinations
>>> def three_parts(sequence):
...     for i, j in combinations(range(1, len(sequence)), 2):
...         yield (sequence[:i], sequence[i:j], sequence[j:])
...
>>> list(three_parts('example'))
[('e', 'x', 'ample'), ('e', 'xa', 'mple'), ('e', 'xam', 'ple'), ('e', 'xamp', 'le'), ('e', 'xampl', 'e'), ('ex', 'a', 'mple'), ('ex', 'am', 'ple'), ('ex', 'amp', 'le'), ('ex', 'ampl', 'e'), ('exa', 'm', 'ple'), ('exa', 'mp', 'le'), ('exa', 'mpl', 'e'), ('exam', 'p', 'le'), ('exam', 'pl', 'e'), ('examp', 'l', 'e')]
My code is:
s = ["9"] * int(input())
for i in range(len(s)):
    while not set(s[i]) <= set('01'):
        s[i] = input(i)
print(bin(sum([int(x, 2) for x in s]))[2:])
It is a binary adder, but I don't think the range-len or the loop inside can be replaced to make it smaller/better.
I think it's useful for tqdm if you have a large loop and you want to track progress. This will output a progress bar:
from tqdm import tqdm
import numpy as np

empty_list = np.full(len(items), np.nan)
for i in tqdm(range(len(items))):
    empty_list[i] = do_something(items[i])
This will not show progress, at least in the case I was using it for:
empty_list = np.full(len(items), np.nan)
for i, _ in tqdm(enumerate(items)):
    empty_list[i] = do_something(items[i])
It just showed the number of iterations, which is not as helpful.
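For what it's worth, tqdm accepts a total keyword, so a sketch like this should show a full progress bar even with enumerate:
for i, item in tqdm(enumerate(items), total=len(items)):
    empty_list[i] = do_something(item)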
