Python, converting a list of indices to slices - python

So I have a list of indices,
[0, 1, 2, 3, 5, 7, 8, 10]
and want to convert it to this,
[[0, 3], [5], [7, 8], [10]]
this will run on a large number of indices.
Also, this technically isn't for slices in python, the tool I am working with is faster when given a range compared to when given the individual ids.
The pattern is based on being in a range, like slices work in python. So in the example, the 1 and 2 are dropped because they are already included in the range of 0 to 3. The 5 would need to be accessed individually since it is not in a range, etc. This is more helpful when a large number of ids get included in a range such as [0, 5000].

Since you want the code to be fast, I wouldn't try to be too fancy. A straight-forward approach should perform quite well:
a = [0, 1, 2, 3, 5, 7, 8, 10]
it = iter(a)
start = next(it)
slices = []
for i, x in enumerate(it):
    if x - a[i] != 1:
        end = a[i]
        if start == end:
            slices.append([start])
        else:
            slices.append([start, end])
        start = x
if a[-1] == start:
    slices.append([start])
else:
    slices.append([start, a[-1]])
Admittedly, that doesn't look too nice, but I expect the nicer solutions I can think of to perform worse. (I did not do a benchmark.)
Here is a slightly nicer, but slower solution:
from itertools import groupby
a = [0, 1, 2, 3, 5, 7, 8, 10]
slices = []
for key, it in groupby(enumerate(a), lambda x: x[1] - x[0]):
    indices = [y for x, y in it]
    if len(indices) == 1:
        slices.append([indices[0]])
    else:
        slices.append([indices[0], indices[-1]])
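The grouping key used above, x[1] - x[0], is the value minus its position. Inside a run of consecutive integers that difference stays constant, and it jumps whenever the list skips a number. A minimal sketch that only prints those keys for the example list (the name keys is just for illustration):

```python
a = [0, 1, 2, 3, 5, 7, 8, 10]

# value minus position: constant inside a run of consecutive integers
keys = [value - position for position, value in enumerate(a)]
print(keys)  # [0, 0, 0, 0, 1, 2, 2, 3]
```

Each distinct key corresponds to one group that groupby hands back, which is why the runs fall out without any explicit comparison of neighbours.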

import itertools

def runs(seq):
    previous = None
    start = None
    # the trailing None is a sentinel that flushes the last run
    for value in itertools.chain(seq, [None]):
        if start is None:
            start = value
        if previous is not None and value != previous + 1:
            if start == previous:
                yield [previous]
            else:
                yield [start, previous]
            start = value
        previous = value

Since performance is an issue, go with the first solution by @SvenMarnach, but here is a fun one-liner split into two lines! :D
>>> from itertools import groupby, count
>>> indices = [0, 1, 2, 3, 5, 7, 8, 10]
>>> [[next(v)] + list(v)[-1:]
...  for k,v in groupby(indices, lambda x,c=count(): x-next(c))]
[[0, 3], [5], [7, 8], [10]]

Below is a simple Python solution using numpy:
import numpy

def list_to_slices(inputlist):
    """
    Convert a flat list to a list of slices (the input list is sorted in place):
    test = [0,2,3,4,5,6,12,99,100,101,102,13,14,18,19,20,25]
    list_to_slices(test)
    -> [(0, 0), (2, 6), (12, 14), (18, 20), (25, 25), (99, 102)]
    """
    inputlist.sort()
    # positions after which the sorted values jump by more than 1
    pointers = numpy.where(numpy.diff(inputlist) > 1)[0]
    pointers = zip(numpy.r_[0, pointers+1], numpy.r_[pointers, len(inputlist)-1])
    slices = [(inputlist[i], inputlist[j]) for i, j in pointers]
    return slices

If your input is a sorted sequence, which I assume it is, you can do it in a minimalistic way in three steps by employing the good old zip() function:
x = [0, 1, 2, 3, 5, 7, 8, 10]
# find beginnings and endings of sequential runs,
# N.B. the first beginning and the last ending are not included
begs_ends_iter = zip(
    *[(x1, x0) for x0, x1 in zip(x[:-1], x[1:]) if x1 - x0 > 1]
)
# handling case when there is only one sequential run
begs, ends = tuple(begs_ends_iter) or ((), ())
# add the first beginning and the last ending,
# combine corresponding beginnings and endings,
# and convert isolated elements into the lists of length one
y = [
    [beg] if beg == end else [beg, end]
    for beg, end in zip(tuple(x[:1]) + begs, ends + tuple(x[-1:]))
]
If your input is unsorted, sort it first; the result is a sorted list, which is a sequence. If you have a sorted iterable and do not want to convert it to a sequence (e.g., because it is too long), you can use the chain() and pairwise() functions from the itertools package (pairwise() is available since Python 3.10):
from itertools import chain, pairwise
x = [0, 1, 2, 3, 5, 7, 8, 10]
# find beginnings and endings of sequential runs,
# N.B. the last beginning and the first ending are None's
begs, ends = zip(
    *[
        (x1, x0)
        for x0, x1 in pairwise(chain([None], x, [None]))
        if x0 is None or x1 is None or x1 - x0 > 1
    ]
)
# removing the last beginning and the first ending,
# combine corresponding beginnings and endings,
# and convert isolated elements into the lists of length one
y = [
    [beg] if beg == end else [beg, end]
    for beg, end in zip(begs[:-1], ends[1:])
]
These solutions are similar to the one proposed by bougui, but do not use numpy. That may be more efficient if the data is not already in a numpy array and is either a fairly small sequence or, at the other extreme, an iterable too large to fit into memory.

Related

How to iterate through a list without including the same element twice?

I was wondering how I could iterate through this list without including the same number twice.
import itertools
def sum_pairs(ints, s):
    indexes = []
    pair = []
    for numbers in itertools.combinations(ints,2):
        if sum(numbers) == s:
            pair.append(numbers)
            for n in numbers:
                indexes.append(ints.index(n))
    print(pair)
    print(indexes)
a = [10, 5, 2, 3, 7, 5]
target = 10
Here's the output:
[(5, 5), (3, 7)]
[1, 1, 3, 4]
'pair' correctly outputs 5 and 5 to equal 10, but when I check where the numbers come from with the variable 'indexes', I can see that the same 5 was used twice and the second five was never taken into consideration. What I'm looking for is how to modify this so that it does not add the same number twice if it comes from the same index. For example, the output of indexes would be [1, 5, 3, 4].
Thank you so much.
Smuggle the index of each value along with it by using enumerate:
import itertools
def sum_pairs(ints, s):
    indexes = []
    pair = []
    for (ix, x), (iy, y) in itertools.combinations(enumerate(ints),2):
        if x + y == s:
            pair.append((x, y))
            indexes += (ix, iy)
    print(pair)
    print(indexes)
a = [10, 5, 2, 3, 7, 5]
target = 10
sum_pairs(a, target)
which outputs:
[(5, 5), (3, 7)]
[1, 5, 3, 4]
To simplify the usage of the values, I unpacked the tuple of tuples to names (x and y are the "real" values, ix is the index of x and iy is the index of y). By attaching the index to the value, you always know exactly where it came from, without having to guess at it.
Your use of the index method didn't work because, for all practical purposes, the two 5s in your input are indistinguishable (on CPython, thanks to the small int optimization, they're actually the same object), and index just returns the first one it finds (and has to needlessly rescan for it every time). By keeping the index with the associated value, you don't have to recheck at all; you already know it.
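To make the difference concrete, here is a small sketch using the list from the question. It only contrasts list.index with enumerate; the variable names are made up for illustration:

```python
a = [10, 5, 2, 3, 7, 5]

# index() always reports the first match, no matter which 5 you meant
first_match = a.index(5)

# enumerate() keeps every position distinct
all_positions = [i for i, value in enumerate(a) if value == 5]

print(first_match)    # 1
print(all_positions)  # [1, 5]
```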
Run combinations on the indices instead. BTW, your indexes list is laid out in an unusual way (one flat list instead of one entry per pair). If you see what I mean, try replacing the extend below with an append.
import itertools

def sum_pairs(ints, s):
    indexes = []
    pair = []
    for numbers in itertools.combinations(range(len(ints)),2):
        if ints[numbers[0]]+ints[numbers[1]] == s:
            indexes.extend(numbers)
            pair.append((ints[numbers[0]],ints[numbers[1]]))
    print(pair)
    print(indexes)

Loop from a specific point in a list of lists Python

I would like to append to a new list all elements of an existing list of lists after a specific point
m = [[1,2,3],[4,5,10],[6,2,1]]
specific point = m[0][2]
newlist = [3,4,5,10,6,2,1]
You can directly slice off the remainder of the first target list and then add on all subsequent elements, eg:
m = [[1,2,3],[4,5,10],[6,2,1]]
y, x = 0, 2
new_list = m[y][x:] + [v for el in m[y+1:] for v in el]
# [3, 4, 5, 10, 6, 2, 1]
Here's a couple of functional approaches for efficiently iterating over your data.
If sublists are evenly sized, and you know the index from where to begin extracting elements, use chain + islice:
from itertools import chain, islice
n = 3 # Sublist size.
i,j = 0,2
newlist = list(islice(chain.from_iterable(m), i*n + j, None))
If you don't know the size of your sublists in advance, you can use next to discard the first portion of your data.
V = chain.from_iterable(m)
next(v for v in V if v == m[i][j])  # consume everything up to and including the target
newlist = list(V)
newlist.insert(0, m[i][j])  # put the target value back at the front
This assumes there is no identical value earlier in the sequence.
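If that assumption is a problem, one option is to address the starting point by position rather than by value. The following is only a sketch (the helper name from_point is made up here); it works with sublists of different lengths and with repeated values:

```python
from itertools import chain

def from_point(m, i, j):
    """Return every element of m starting at m[i][j], in reading order."""
    # rest of the target row first, then all later rows flattened
    return list(chain(m[i][j:], chain.from_iterable(m[i + 1:])))

m = [[1, 2, 3], [4, 5, 10], [6, 2, 1]]
print(from_point(m, 0, 2))  # [3, 4, 5, 10, 6, 2, 1]
print(from_point(m, 2, 1))  # [2, 1]
```

The second call starts at a 2 that also appears earlier in the data, which is exactly the case the value-based search cannot handle.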
You can put a conditional in your iteration and only add based on that condition. Once you hit that specific index, make your condition true. Something like this:
m = [[1,2,3],[4,5,10],[6,2,1]]
specific_point = (0,2)
newlist = [3,4,5,10,6,2,1]
output = []
for i in range(len(m)):
    for j in range(len(m[i])):
        if (i,j) < specific_point:
            continue
        output.append(m[i][j])
output:
[3, 4, 5, 10, 6, 2, 1]
Why not flatten the initial list and go from there?
flat_list = [item for sublist in m for item in sublist]
would return [1,2,3,4,5,10,6,2,1], so what you want is simply flat_list[2:]
Most of the answers only work for this specific shape of nested list, but it's also possible to create a solution that works with any shape of nested list.
def flatten_from(sequence, path=[]):
    start = path.pop(0) if path else 0
    for item in sequence[start:]:
        if isinstance(item, (list, tuple)):
            yield from flatten_from(item, path)
        else:
            yield item
With the example from the question
>>> list(flatten_from([[1, 2, 3], [4, 5, 10], [6, 2, 1]], [0, 2]))
[3, 4, 5, 10, 6, 2, 1]
It also works with any shape and level of nesting of the input data
m = [[1], [[2], [3, 4, 5, 6, 7]], 8, [9, [10, 11]]]
flatten_from(m, []) # 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
flatten_from(m, [2]) # 8, 9, 10, 11
flatten_from(m, [1, 1, 3]) # 6, 7, 8, 9, 10, 11
This is a bit of a bastard algorithm, though. On one hand, it uses nice functional programming concepts: recursion and yield.
On the other hand it relies on the side effect of mutating the path argument with list.pop, so it's not a pure function.
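For comparison, here is a sketch of a variant that leaves the caller's path untouched. The name flatten_from_pure is made up, and the semantics are slightly stricter than the version above: only the element sitting exactly at the path position receives the rest of the path.

```python
def flatten_from_pure(sequence, path=()):
    # split the path into "where to start here" and "what to pass down"
    start, rest = (path[0], path[1:]) if path else (0, ())
    for offset, item in enumerate(sequence[start:]):
        if isinstance(item, (list, tuple)):
            # only the first visited child continues along the path
            yield from flatten_from_pure(item, rest if offset == 0 else ())
        else:
            yield item

m = [[1], [[2], [3, 4, 5, 6, 7]], 8, [9, [10, 11]]]
path = [1, 1, 3]
print(list(flatten_from_pure(m, path)))  # [6, 7, 8, 9, 10, 11]
print(path)                              # [1, 1, 3], unchanged
```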
The solution below works for your case, where the array is restricted to a list of lists and the size of each sublist is the same throughout, i.e. 3 in your case:
m = [[1,2,3],[4,5,10],[6,2,1]] #input 2D array
a, b = 0, 2 #user input --> specific point a and b
flat_list_m = [item for firstlist in m for item in firstlist] #flat the 2D list
print (flat_list_m[len(m[0])*a+b:]) #print from specific position a and b, considering your sublist length is consistent throughout.
I hope this helps! :)

Cycle a list from alternating sides

Given a list
a = [0,1,2,3,4,5,6,7,8,9]
how can I get
b = [0,9,1,8,2,7,3,6,4,5]
That is, produce a new list in which each successive element is alternately taken from the two sides of the original list?
>>> [a[-i//2] if i % 2 else a[i//2] for i in range(len(a))]
[0, 9, 1, 8, 2, 7, 3, 6, 4, 5]
Explanation:
This code picks numbers from the beginning (a[i//2]) and from the end (a[-i//2]) of a, alternatingly (if i%2 else). A total of len(a) numbers are picked, so this produces no ill effects even if len(a) is odd.
[-i//2 for i in range(len(a))] yields 0, -1, -1, -2, -2, -3, -3, -4, -4, -5,
[ i//2 for i in range(len(a))] yields 0, 0, 1, 1, 2, 2, 3, 3, 4, 4,
and i%2 alternates between False and True,
so the indices we extract from a are: 0, -1, 1, -2, 2, -3, 3, -4, 4, -5.
My assessment of pythonicness:
The nice thing about this one-liner is that it's short and shows symmetry (+i//2 and -i//2).
The bad thing, though, is that this symmetry is deceptive:
One might think that -i//2 were the same as i//2 with the sign flipped. But in Python, integer division returns the floor of the result instead of truncating towards zero. So -1//2 == -1.
Also, I find accessing list elements by index less pythonic than iteration.
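The floor-division point is easy to check in isolation. A small sketch (variable names are only for illustration):

```python
i = 1

floored = -i // 2       # unary minus binds first, so this is (-1) // 2
negated = -(i // 2)     # divide first, then negate
truncated = int(-i / 2) # true division, then truncation towards zero

print(floored)    # -1
print(negated)    # 0
print(truncated)  # 0
```

So -i//2 and -(i//2) differ for every odd i, and the one-liner depends on that difference.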
Cycle between getting items from the forward iterator and the reversed one. Just make sure you stop at len(a) with islice.
from itertools import islice, cycle
iters = cycle((iter(a), reversed(a)))
b = [next(it) for it in islice(iters, len(a))]
>>> b
[0, 9, 1, 8, 2, 7, 3, 6, 4, 5]
This can easily be put into a single line but then it becomes much more difficult to read:
[next(it) for it in islice(cycle((iter(a),reversed(a))),len(a))]
Putting it in one line would also prevent you from using the other half of the iterators if you wanted to:
>>> iters = cycle((iter(a), reversed(a)))
>>> [next(it) for it in islice(iters, len(a))]
[0, 9, 1, 8, 2, 7, 3, 6, 4, 5]
>>> [next(it) for it in islice(iters, len(a))]
[5, 4, 6, 3, 7, 2, 8, 1, 9, 0]
A very nice one-liner in Python 2.7:
results = list(sum(zip(a, reversed(a))[:len(a)/2], ()))
# [0, 9, 1, 8, 2, 7, 3, 6, 4, 5]
First you zip the list with its reverse, take half of that list, sum the tuples to form one tuple, and then convert it to a list.
In Python 3, zip returns an iterator rather than a list, so you have to use islice from itertools:
from itertools import islice
results = list(sum(islice(zip(a, reversed(a)),0,int(len(a)/2)),()))
Edit: It appears this only works for even-length lists; odd-length lists will lose the middle element :( Changing int(len(a)/2) to int(len(a)/2) + 1 does not fix it either: it gives you a duplicate middle value, so be warned.
Use the right toolz.
from toolz import interleave, take
b = list(take(len(a), interleave((a, reversed(a)))))
First, I tried something similar to Raymond Hettinger's solution with itertools (Python 3).
from itertools import chain, islice
interleaved = chain.from_iterable(zip(a, reversed(a)))
b = list(islice(interleaved, len(a)))
If you don’t mind sacrificing the source list, a, you can just pop back and forth:
b = [a.pop(-1 if i % 2 else 0) for i in range(len(a))]
Edit:
b = [a.pop(-bool(i % 2)) for i in range(len(a))]
Not terribly different from some of the other answers, but it avoids a conditional expression for determining the sign of the index.
a = range(10)
b = [a[i // (2*(-1)**(i&1))] for i in a]
i & 1 alternates between 0 and 1. This causes (-1)**(i&1) to alternate between 1 and -1, which causes the index divisor to alternate between 2 and -2, which in turn causes the index to alternate from end to end as i increases. The sequence is a[0], a[-1], a[1], a[-2], a[2], a[-3], etc.
(I iterate i over a since in this case each value of a is equal to its index. In general, iterate over range(len(a)).)
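To see the alternation without the list lookup, this sketch prints the divisors and the resulting indices for the first six values of i (n is an arbitrary choice):

```python
n = 6

# (-1) ** (i & 1) is 1 for even i and -1 for odd i
divisors = [2 * (-1) ** (i & 1) for i in range(n)]
indices = [i // d for i, d in zip(range(n), divisors)]

print(divisors)  # [2, -2, 2, -2, 2, -2]
print(indices)   # [0, -1, 1, -2, 2, -3]
```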
The basic principle behind your question is a so-called roundrobin algorithm. The itertools-documentation-page contains a possible implementation of it:
from itertools import cycle, islice
def roundrobin(*iterables):
    """This function is taken from the python documentation!
    roundrobin('ABC', 'D', 'EF') --> A D E B F C
    Recipe credited to George Sakkis"""
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables) # next instead of __next__ for py2
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))
so all you have to do is split your list into two sublists one starting from the left end and one from the right end:
import math
mid = math.ceil(len(a)/2) # Just so that the next line doesn't need to calculate it twice
list(roundrobin(a[:mid], a[:mid-1:-1]))
# Gives you the desired result: [0, 9, 1, 8, 2, 7, 3, 6, 4, 5]
alternatively you could create a longer list (containing alternating items from sequence going from left to right and the items of the complete sequence going right to left) and only take the relevant elements:
list(roundrobin(a, reversed(a)))[:len(a)]
or using it as explicit generator with next:
rr = roundrobin(a, reversed(a))
[next(rr) for _ in range(len(a))]
or the speedy variant suggested by @Tadhg McDonald-Jensen (thank you!):
list(islice(roundrobin(a,reversed(a)),len(a)))
Not sure, whether this can be written more compactly, but it is efficient as it only uses iterators / generators
a = [0,1,2,3,4,5,6,7,8,9]
iter1 = iter(a)
iter2 = reversed(a)
b = [item for n, item in enumerate(
    next(iter) for _ in a for iter in (iter1, iter2)
) if n < len(a)]
For fun, here is an itertools variant:
>>> from itertools import chain, islice, izip  # Python 2; on Python 3, use the built-in zip instead of izip
>>> a = [0,1,2,3,4,5,6,7,8,9]
>>> list(chain.from_iterable(izip(islice(a, len(a)//2), reversed(a))))
[0, 9, 1, 8, 2, 7, 3, 6, 4, 5]
This works when len(a) is even. Odd-length input would need special handling.
Enjoy!
Not at all elegant, but it is a clumsy one-liner:
a = range(10)
[val for pair in zip(a[:len(a)//2],a[-1:(len(a)//2-1):-1]) for val in pair]
Note that it assumes you are doing this for a list of even length. If that breaks, then this breaks (it drops the middle term). Note that I got some of the idea from here.
Two versions not seen yet:
b = list(sum(zip(a, a[::-1]), ())[:len(a)])
and
import itertools as it
b = [a[j] for j in it.accumulate(i*(-1)**i for i in range(len(a)))]
mylist = [0,1,2,3,4,5,6,7,8,9]
result = []
for i in mylist:
    result += [i, mylist.pop()]
Note:
Beware: as @Tadhg McDonald-Jensen has said (see the comment below), this will destroy half of the original list object.
One way to do this for even-sized lists (inspired by this post):
a = range(10)
b = [val for pair in zip(a[:5], a[5:][::-1]) for val in pair]
I would do something like this
a = [0,1,2,3,4,5,6,7,8,9]
b = []
i = 0
j = len(a) - 1
mid = (i + j) / 2
while i <= j:
    if i == mid and len(a) % 2 == 1:
        b.append(a[i])
        break
    b.extend([a[i], a[j]])
    i = i + 1
    j = j - 1
print b
You can partition the list into two parts about the middle, reverse the second half and zip the two partitions, like so:
a = [0,1,2,3,4,5,6,7,8,9]
mid = len(a)//2
l = []
for x, y in zip(a[:mid], a[:mid-1:-1]):
    l.append(x)
    l.append(y)
# if the length is odd
if len(a) % 2 == 1:
    l.append(a[mid])
print(l)
Output:
[0, 9, 1, 8, 2, 7, 3, 6, 4, 5]

Pythonic way to merge two overlapping lists, preserving order

Alright, so I have two lists, as such:
They can and will have overlapping items, for example, [1, 2, 3, 4, 5], [4, 5, 6, 7].
There will not be additional items in the overlap, for example, this will not happen: [1, 2, 3, 4, 5], [3.5, 4, 5, 6, 7]
The lists are not necessarily ordered nor unique. [9, 1, 1, 8, 7], [8, 6, 7].
I want to merge the lists such that existing order is preserved, and to merge at the last possible valid position, and such that no data is lost. Additionally, the first list might be huge. My current working code is as such:
master = [1,3,9,8,3,4,5]
addition = [3,4,5,7,8]
def merge(master, addition):
    n = 1
    while n < len(master):
        if master[-n:] == addition[:n]:
            return master + addition[n:]
        n += 1
    return master + addition
What I would like to know is - is there a more efficient way of doing this? It works, but I'm slightly leery of this, because it can run into large runtimes in my application - I'm merging large lists of strings.
EDIT: I'd expect the merge of [1,3,9,8,3,4,5], [3,4,5,7,8] to be: [1,3,9,8,3,4,5,7,8]. For clarity, I've highlighted the overlapping portion.
[9, 1, 1, 8, 7], [8, 6, 7] should merge to [9, 1, 1, 8, 7, 8, 6, 7]
You can try the following:
>>> a = [1, 3, 9, 8, 3, 4, 5]
>>> b = [3, 4, 5, 7, 8]
>>> matches = (i for i in xrange(len(b), 0, -1) if b[:i] == a[-i:])
>>> i = next(matches, 0)
>>> a + b[i:]
[1, 3, 9, 8, 3, 4, 5, 7, 8]
The idea is we check the first i elements of b (b[:i]) with the last i elements of a (a[-i:]). We take i in decreasing order, starting from the length of b until 1 (xrange(len(b), 0, -1)) because we want to match as much as possible. We take the first such i by using next and if we don't find it we use the zero value (next(..., 0)). From the moment we found the i, we add to a the elements of b from index i.
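The same idea wrapped in a function for Python 3 might look like the sketch below. The name merge_overlap and the min() bound are additions made here, not part of the snippet above; the bound just skips overlap lengths that cannot fit in a.

```python
def merge_overlap(a, b):
    """Append b to a, dropping the longest prefix of b that is a suffix of a."""
    # only overlaps that fit in both lists are worth checking
    longest = min(len(a), len(b))
    matches = (i for i in range(longest, 0, -1) if a[-i:] == b[:i])
    i = next(matches, 0)  # 0 means "no overlap found"
    return a + b[i:]

print(merge_overlap([1, 3, 9, 8, 3, 4, 5], [3, 4, 5, 7, 8]))  # [1, 3, 9, 8, 3, 4, 5, 7, 8]
print(merge_overlap([9, 1, 1, 8, 7], [8, 6, 7]))              # [9, 1, 1, 8, 7, 8, 6, 7]
```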
There are a couple of easy optimizations that are possible.
You don't need to start at master[1], since the longest overlap starts at master[-len(addition)]
If you add a call to list.index you can avoid creating sub-lists and comparing lists for each index:
This approach keeps the code pretty understandable too (and easier to optimize by using cython or pypy):
master = [1,3,9,8,3,4,5]
addition = [3,4,5,7,8]
def merge(master, addition):
    first = addition[0]
    n = max(len(master) - len(addition), 1)  # (1)
    while 1:
        try:
            n = master.index(first, n)       # (2) next index where an overlap could start
        except ValueError:
            return master + addition
        size = len(master) - n               # length of the candidate overlap
        if master[n:] == addition[:size]:
            return master + addition[size:]
        n += 1
This actually isn't too terribly difficult. After all, essentially all you're doing is checking what substring at the end of A lines up with what substring of B.
def merge(a, b):
    max_offset = len(b) # can't overlap with greater size than len(b)
    for i in reversed(range(max_offset+1)):
        # checks for equivalence of decreasing sized slices
        if a[-i:] == b[:i]:
            break
    return a + b[i:]
We can test with your test data by doing:
test_data = [{'a': [1,3,9,8,3,4,5], 'b': [3,4,5,7,8], 'result': [1,3,9,8,3,4,5,7,8]},
{'a': [9, 1, 1, 8, 7], 'b': [8, 6, 7], 'result': [9, 1, 1, 8, 7, 8, 6, 7]}]
all(merge(test['a'], test['b']) == test['result'] for test in test_data)
This runs through every possible combination of slices that could result in an overlap and remembers the result of the overlap if one is found. If nothing is found, it uses the last result of i which will always be 0. Either way, it returns all of a plus everything past b[i] (in the overlap case, that's the non overlapping portion. In the non-overlap case, it's everything)
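One subtlety worth spelling out: for i == 0 the slice a[-0:] is the whole list, not an empty one, so the final comparison fails for any non-empty a and the loop simply runs out with i left at 0. A small sketch of that slicing behaviour (names are for illustration only):

```python
a = [1, 2, 3]

tail_two = a[-2:]   # the last two elements
tail_zero = a[-0:]  # -0 is just 0, so this is the whole list

print(tail_two)         # [2, 3]
print(tail_zero)        # [1, 2, 3]
print(tail_zero == [])  # False
```

The return value is still correct in that case, because b[0:] is all of b.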
Note that we can make a couple optimizations in corner cases. For instance, the worst case here is that it runs through the whole list without finding any solution. You could add a quick check at the beginning that might short circuit that worst case
def merge(a, b):
    if a[-1] not in b:
        return a + b
    ...
In fact you could take that solution one step further and probably make your algorithm much faster
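def merge(a, b):
    start = 0
    while True:
        try:
            idx = b.index(a[-1], start) + 1 # next occurrence of a[-1] in b, left to right
        except ValueError: # no (further) occurrence of a[-1] in b
            return a + b
        if a[-idx:] == b[:idx]:
            return a + b[idx:]
        start = idx # keep searching after this occurrence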
However this might not find the longest overlap in cases like:
a = [1,2,3,4,1,2,3,4]
b = [3,4,1,2,3,4,5,6]
# result should be [1,2,3,4,1,2,3,4,5,6], but
# this algo produces [1,2,3,4,1,2,3,4,1,2,3,4,5,6]
You could fix that by searching for the rightmost occurrence instead of the leftmost one (lists have no rindex method, so you would have to scan from the right yourself) to match the longest slice instead of the shortest, but I'm not sure what that does to your speed. It's certainly slower, but it might be inconsequential. You could also collect all the candidate results and return the shortest one, which might be a better idea.
def merge(a, b):
    results = []
    start = 0
    while True:
        try:
            idx = b.index(a[-1], start) + 1 # next occurrence of a[-1] in b, left to right
        except ValueError: # no (further) occurrence of a[-1] in b
            results.append(a + b)
            break
        if a[-idx:] == b[:idx]:
            results.append(a + b[idx:])
        start = idx # keep searching after this occurrence
    return min(results, key=len)
Which should work since merging the longest overlap should produce the shortest result in all cases.
One trivial optimization is not iterating over the whole master list. I.e., replace while n < len(master) with for n in range(1, min(len(addition), len(master)) + 1) (and don't increment n in the loop). If there is no match, your current code will iterate over the entire master list, even if the slices being compared aren't even of the same length.
Another concern is that you're taking slices of master and addition in order to compare them, which creates two new lists every time, and isn't really necessary. This solution (inspired by Boyer-Moore) doesn't use slicing:
def merge(master, addition):
    overlap_lens = (i + 1 for i, e in enumerate(addition) if e == master[-1])
    for overlap_len in overlap_lens:
        for i in range(overlap_len):
            if master[-overlap_len + i] != addition[i]:
                break
        else:
            return master + addition[overlap_len:]
    return master + addition
The idea here is to generate all the indices of the last element of master in addition, and add 1 to each. Since a valid overlap must end with the last element of master, only those values are lengths of possible overlaps. Then we can check for each of them if the elements before it also line up.
The function currently assumes that master is longer than addition (you'll probably get an IndexError at master[-overlap_len + i] if it isn't). Add a condition to the overlap_lens generator if you can't guarantee it.
It's also non-greedy, i.e. it looks for the smallest non-empty overlap (merge([1, 2, 2], [2, 2, 3]) will return [1, 2, 2, 2, 3]). I think that's what you meant by "to merge at the last possible valid position". If you want a greedy version, reverse the overlap_lens generator.
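A greedy variant could be sketched as follows. This is an illustration under two assumptions, not a drop-in from the answer: the name merge_greedy is made up, and the candidate lengths are capped at min(len(master), len(addition)) so that a short master cannot cause an IndexError.

```python
def merge_greedy(master, addition):
    if not master:
        return list(addition)
    last = master[-1]
    limit = min(len(master), len(addition))
    # try the longest candidate overlap first
    for overlap_len in range(limit, 0, -1):
        # a valid overlap must end with master's last element
        if addition[overlap_len - 1] != last:
            continue
        if all(master[-overlap_len + i] == addition[i] for i in range(overlap_len)):
            return master + addition[overlap_len:]
    return master + addition

print(merge_greedy([1, 2, 2], [2, 2, 3]))  # [1, 2, 2, 3]
```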
I don't offer optimizations but another way of looking at the problem. To me, this seems like a particular case of http://en.wikipedia.org/wiki/Longest_common_substring_problem where the substring would always be at the end of the list/string. The following algorithm is the dynamic programming version.
def longest_common_substring(s1, s2):
    m = [[0] * (1 + len(s2)) for i in xrange(1 + len(s1))]
    longest, x_longest = 0, 0
    for x in xrange(1, 1 + len(s1)):
        for y in xrange(1, 1 + len(s2)):
            if s1[x - 1] == s2[y - 1]:
                m[x][y] = m[x - 1][y - 1] + 1
                if m[x][y] > longest:
                    longest = m[x][y]
                    x_longest = x
            else:
                m[x][y] = 0
    return x_longest - longest, x_longest
master = [1,3,9,8,3,4,5]
addition = [3,4,5,7,8]
s, e = longest_common_substring(master, addition)
if e - s > 1:
    print master[:s] + addition
master = [9, 1, 1, 8, 7]
addition = [8, 6, 7]
s, e = longest_common_substring(master, addition)
if e - s > 1:
    print master[:s] + addition
else:
    print master + addition
[1, 3, 9, 8, 3, 4, 5, 7, 8]
[9, 1, 1, 8, 7, 8, 6, 7]
First of all and for clarity, you can replace your while loop with a for loop:
def merge(master, addition):
    for n in xrange(1, len(master)):
        if master[-n:] == addition[:n]:
            return master + addition[n:]
    return master + addition
Then, you don't have to compare all possible slices, but only those for which master's slice starts with the first element of addition:
def merge(master, addition):
    indices = [len(master) - i for i, x in enumerate(master) if x == addition[0]]
    for n in indices:
        if master[-n:] == addition[:n]:
            return master + addition[n:]
    return master + addition
So instead of comparing slices like this:
1234123141234
            3579
           3579
          3579
         3579
        3579
       3579
      3579
     3579
    3579
   3579
  3579
 3579
3579
you are only doing these comparisons:
1234123141234
  |   |    |
  |   |    3579
  |   3579
  3579
How much this will speed up your program depends on the nature of your data: the fewer repeated elements your lists have, the better.
You could also generate a list of indices for addition so its own slices always end with master's last element, further restricting the number of comparisons.
Based on https://stackoverflow.com/a/30056066/541208:
def join_two_lists(a, b):
    index = 0
    for i in xrange(len(b), 0, -1):
        #if everything from start to ith of b is the
        #same from the end of a at ith append the result
        if b[:i] == a[-i:]:
            index = i
            break
    return a + b[index:]
All the above solutions are similar in that they use a for / while loop for the merging task. I first tried the solutions by @JuniorCompressor and @TankorSmash, but these solutions are way too slow for merging two large-scale lists (e.g. lists with millions of elements).
I found using pandas to concatenate lists with large size is much more time-efficient:
import pandas as pd, numpy as np
trainCompIdMaps = pd.DataFrame( { "compoundId": np.random.permutation( range(800) )[0:80], "partition": np.repeat( "train", 80).tolist()} )
testCompIdMaps = pd.DataFrame( {"compoundId": np.random.permutation( range(800) )[0:20], "partition": np.repeat( "test", 20).tolist()} )
# row-wise concatenation for two pandas
compoundIdMaps = pd.concat([trainCompIdMaps, testCompIdMaps], axis=0)
mergedCompIds = np.array(compoundIdMaps["compoundId"])
What you need is a sequence alignment algorithm like Needleman-Wunsch.
Needleman-Wunsch is a global sequence alignment algorithm based on dynamic programming:
I found this nice implementation to merge arbitrary object sequences in python:
https://github.com/ajnisbet/paired
import paired
seq_1 = 'The quick brown fox jumped over the lazy dog'.split(' ')
seq_2 = 'The brown fox leaped over the lazy dog'.split(' ')
alignment = paired.align(seq_1, seq_2)
print(alignment)
# [(0, 0), (1, None), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5), (7, 6), (8, 7)]
for i_1, i_2 in alignment:
    print((seq_1[i_1] if i_1 is not None else '').ljust(15), end='')
    print(seq_2[i_2] if i_2 is not None else '')
# The            The
# quick
# brown          brown
# fox            fox
# jumped         leaped
# over           over
# the            the
# lazy           lazy
# dog            dog

Getting the indices of the X largest numbers in a list

Please no built-ins besides len() or range(). I'm studying for a final exam.
Here's an example of what I mean.
def find_numbers(x, lst):
lst = [3, 8, 1, 2, 0, 4, 8, 5]
find_numbers(3, lst) # this should return -> (1, 6, 7)
I tried this not fully....couldn't figure out the best way of going about it:
def find_K_highest(lst, k):
    newlst = [0] * k
    maxvalue = lst[0]
    for i in range(len(lst)):
        if lst[i] > maxvalue:
            maxvalue = lst[i]
            newlst[0] = i
Take the first 3 (x) numbers from the list. These are your initial candidates for the largest values. In your case: 3, 8, 1. Their indices are (0, 1, 2). Build (value, index) pairs from them: ((3,0), (8,1), (1,2)).
Now sort them by value, largest first: ((8,1), (3,0), (1,2)).
With this initial list, you can traverse the rest of the list recursively. Compare the smallest value (1, _) with the next element in the list (2, 3). If that is larger (it is), sort it into the list ((8,1), (3,0), (2,3)) and throw away the smallest.
In the beginning you have many changes in the top 3, but later on they become rare. Of course, you also have to keep track of the current position (3, 4, 5, ...) while traversing.
An insertion sort for the top N elements should be pretty performant.
Here is a similar problem in Scala but without the need to report the indexes.
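The steps above can be sketched with nothing but len() and range(), as the question asks. This is one possible reading of the description, written as a loop instead of recursion; the function name top_k_indices is made up, and the result is ordered by value (largest first) rather than by index.

```python
def top_k_indices(lst, k):
    """Indices of the k largest values, largest value first (ties: earlier index first)."""
    top = []  # (value, index) pairs, kept sorted by value, largest first
    for i in range(len(lst)):
        # walk left past every kept value that is strictly smaller
        pos = len(top)
        while pos > 0 and top[pos - 1][0] < lst[i]:
            pos -= 1
        if pos < k:
            # insert at pos, then drop whatever fell off the end
            top = (top[:pos] + [(lst[i], i)] + top[pos:])[:k]
    return [index for value, index in top]

lst = [3, 8, 1, 2, 0, 4, 8, 5]
print(top_k_indices(lst, 3))  # [1, 6, 7]
```

Because only k pairs are kept, each new element costs at most k comparisons, which is the reason an insertion sort is a reasonable fit here.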
I don't know whether it is good to post a solution, but this seems to work:
def find_K_highest(lst, k):
    # escape index error
    if k>len(lst):
        k=len(lst)
    # the output array
    idxs = [None]*k
    # indices still in play (a real list, so that entries can be deleted)
    to_watch = [i for i in range(len(lst))]
    # do it k times
    for i in range(k):
        # guess that max value is at least at idx '0' of to_watch
        to_del=0
        idx = to_watch[to_del]
        max_val = lst[idx]
        # search through the list for bigger value and its index
        for jj in range(len(to_watch)):
            j=to_watch[jj]
            val = lst[j]
            # check that it's bigger than the previously found max
            if val > max_val:
                idx = j
                max_val = val
                to_del=jj
        # append it
        idxs[i] = idx
        del to_watch[to_del]
    # return answer
    return idxs
PS I tried to explain every line of code.
Can you use list methods? (e.g. append, sort, index?). If so, this should work (I think...)
def find_numbers(n,lst):
    ll=lst[:]
    ll.sort()
    biggest=ll[-n:]
    idx=[lst.index(i) for i in biggest] #This has the indices already, but we could have trouble if one of the numbers appeared twice
    idx.sort()
    #check for duplicates. Duplicates will always be next to each other since we sorted.
    for i in range(1,len(idx)):
        if(idx[i-1]==idx[i]):
            #found a duplicate, chop up the input list and find the new index of that number
            #(the +1 converts the position inside the slice back to a position in lst)
            idx[i]=idx[i]+1+lst[idx[i]+1:].index(lst[idx[i]])
    idx.sort()
    return idx
lst = [3, 8, 1, 2, 0, 4, 8, 5]
print find_numbers(3, lst)
Dude. You have two ways you can go with this.
First way is to be clever. Psych your teacher out. What she is looking for is recursion. You can write this with NO recursion and NO built-in functions or methods:
#!/usr/bin/python
lst = [3, 8, 1, 2, 0, 4, 8, 5]
minval=-2**64
largest=[]
def enum(lst):
    for i in range(len(lst)):
        yield i,lst[i]
for x in range(3):
    m=minval
    m_index=None
    for i,j in enum(lst):
        if j>m:
            m=j
            m_index=i
    # compare against None explicitly: index 0 is a valid result but is falsy
    if m_index is not None:
        largest=largest+[m_index]
        lst[m_index]=minval
print largest
This works. It is clever. Take that teacher!!! BUT, you will get a C or lower...
OR -- you can be the teacher's pet. Write it the way she wants. You will need a recursive max of a list. The rest is easy!
def max_of_l(l):
    if len(l) <= 1:
        if not l:
            raise ValueError("Max() arg is an empty sequence")
        else:
            return l[0]
    else:
        m = max_of_l(l[1:])
        return m if m > l[0] else l[0]
print max_of_l([3, 8, 1, 2, 0, 4, 8, 5])
