Clone elements of a list - python

Let's say I have a Python list that looks like this:
list = [ a, b, c, d]
I am looking for the most efficient way, performance-wise, to get this:
list = [ a, a, a, a, b, b, b, c, c, d ]
So if the list is N elements long, then the first element is cloned N-1 times, the second element N-2 times, and so forth; the last element is cloned N-N times, or 0 times. Any suggestions on how to do this efficiently on large lists?

Note that I am testing speed, not correctness. If someone wants to edit in a unit test, I'll get around to it.
pyfunc_fastest: 152.58769989 usecs
pyfunc_local_extend: 154.679298401 usecs
pyfunc_iadd: 158.183312416 usecs
pyfunc_xrange: 162.234091759 usecs
pyfunc: 166.495800018 usecs
Ignacio: 238.87629509 usecs
Ishpeck: 311.713695526 usecs
FabrizioM: 456.708812714 usecs
JohnKugleman: 519.239497185 usecs
Bwmat: 1309.29429531 usecs
Test code here. The second revision is trash because I was rushing to get everybody tested that posted after my first batch of tests. These timings are for the fifth revision of the code.
Here's the fastest version that I was able to get.
def pyfunc_fastest(x):
    t = []
    lenList = len(x)
    extend = t.extend  # cache the bound method to avoid repeated attribute lookups
    for l in xrange(0, lenList):
        extend([x[l]] * (lenList - l))
    return t
Oddly, a version that I modified to avoid indexing into the list by using enumerate ran slower than the original.
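For anyone wanting to reproduce this kind of comparison, a minimal timing sketch using the standard timeit module might look like the following. It is written for Python 3 (range instead of xrange), the names clone_extend and clone_comprehension are made up for this illustration, and the numbers it prints will not match the table above.

```python
import timeit


def clone_extend(x):
    """Build the result with list.extend, indexing into x."""
    t = []
    n = len(x)
    extend = t.extend  # cache the bound method in a local name
    for i in range(n):
        extend([x[i]] * (n - i))
    return t


def clone_comprehension(x):
    """Build the result with a nested list comprehension."""
    n = len(x)
    return [item for i, item in enumerate(x) for _ in range(n - i)]


data = list(range(100))

# Check that both versions agree before timing anything.
assert clone_extend(data) == clone_comprehension(data)

for func in (clone_extend, clone_comprehension):
    # Total seconds for 200 calls, converted to microseconds per call.
    seconds = timeit.timeit(lambda: func(data), number=200)
    print("%s: %.2f usecs" % (func.__name__, seconds / 200 * 1e6))
```

Checking correctness first matters here because a faster function that returns the wrong list is not a useful data point.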

>>> items = ['a', 'b', 'c', 'd']
>>> [item for i, item in enumerate(items) for j in xrange(len(items) - i)]
['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
First we use enumerate to pull out both indexes and values at the same time. Then we use a nested for loop to iterate over each item a decreasing number of times. (Notice that the variable j is never used. It is junk.)
This should be near optimal: enumerate and xrange produce their values lazily, so little memory is used beyond the result list itself.

How about this - A simple one
>>> x = ['a', 'b', 'c', 'd']
>>> t = []
>>> lenList = len(x)
>>> for l in range(0, lenList):
...     t.extend([x[l]] * (lenList - l))
...
>>> t
['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
>>>

Lazy mode:
import itertools
l = ['foo', 'bar', 'baz', 'quux']
for i in itertools.chain.from_iterable(itertools.repeat(e, len(l) - i)
                                       for i, e in enumerate(l)):
    print i
Just shove it through list() if you really do need a list instead.
list(itertools.chain.from_iterable(itertools.repeat(e, len(l) - i)
                                   for i, e in enumerate(l)))

My first instinct..
l = ['a', 'b', 'c', 'd']
nl = []
i = 0
while len(l[i:])>0:
    nl.extend( [l[i]]*len(l[i:]) )
    i+=1
print nl

The trick is in using repeat from itertools
from itertools import repeat
alist = "a b c d".split()
print [ x for idx, value in enumerate(alist) for x in repeat(value, len(alist) - idx) ]
>>>['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']

Use a generator: it's O(1) memory and O(N^2) cpu, unlike any solution that produces the final list which uses O(N^2) memory and cpu. This means it'll be massively faster as soon as the input list is large enough that the constructed list fills memory and swapping starts. It's unlikely you need to have the final list in memory unless this is homework.
def triangle(seq):
    for i, x in enumerate(seq):
        # element i is yielded len(seq) - i times, matching the example output
        for _ in xrange(len(seq) - i):
            yield x
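To see the memory argument in practice, here is a sketch in Python 3 that takes only the first few items from a lazy version without ever building the full result. The name triangle_iter is hypothetical, and it repeats element i exactly len(seq) - i times so that it matches the example in the question.

```python
from itertools import islice


def triangle_iter(seq):
    """Lazily yield seq[i] repeated len(seq) - i times."""
    n = len(seq)
    for i, x in enumerate(seq):
        for _ in range(n - i):
            yield x


# The full output for a million inputs would have about 5 * 10**11 items,
# but islice only pulls the first five from the generator.
big = range(1_000_000)
first_five = list(islice(triangle_iter(big), 5))
print(first_five)  # [0, 0, 0, 0, 0]
```

Wrapping the generator in list() still gives the full list for small inputs.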

To create that new list, list = [ a, a, a, a, b, b, b, c, c, d ] would require O(4n) = O(n) time since for every n elements, you are creating 4n elements in the second array. aaronasterling gives that linear solution.
You could cheat and just not create the new list. Simply, get the index value as input. Divide the index value by 4. Use the result as the index value of the original list.
In pseudocode:
function getElement(int i)
{
int trueIndex = i / 4;
return list[trueIndex]; // Note: that integer division will lead us to the correct index in the original array.
}
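The divide-by-4 lookup above only fits the case where every element is repeated the same number of times. For the decreasing counts in the question (N, N-1, ..., 1), the same idea of never building the big list can be sketched with a table of cumulative counts and a binary search. The names make_lookup and get_element are assumptions made for this illustration, not part of any answer here.

```python
from bisect import bisect_right
from itertools import accumulate


def make_lookup(items):
    """Return (get_element, total) for the virtual cloned list of items."""
    n = len(items)
    # ends[i] is the virtual position just past the last copy of items[i].
    ends = list(accumulate(n - i for i in range(n)))
    total = ends[-1] if ends else 0

    def get_element(k):
        if not 0 <= k < total:
            raise IndexError("virtual index out of range")
        # The first entry of ends greater than k identifies the source index.
        return items[bisect_right(ends, k)]

    return get_element, total


get_element, total = make_lookup(['a', 'b', 'c', 'd'])
print([get_element(k) for k in range(total)])
# ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
```

This keeps O(N) extra memory for the table and answers each lookup in O(log N), instead of storing all N(N+1)/2 elements.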

fwiw:
>>> lst = list('abcd')
>>> [i for i, j in zip(lst, range(len(lst), 0, -1)) for _ in range(j)]
['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']

def gen_indices(list_length):
    for index in range(list_length):
        for _ in range(list_length - index):
            yield index
new_list = [list[i] for i in gen_indices(len(list))]
untested but I think it'll work

Related

Basic Sorting / Order Algorithm

Trying to implement and form a very simple algorithm. This algorithm takes in a sequence of letters or numbers. It first creates an array (list) out of each character or digit. Then it checks each individual character compared with the following character in the sequence. If the two are equal, it removes the character from the array.
For example the input: 12223344112233 or AAAABBBCCCDDAAABB
And the output should be: 1234123 or ABCDAB
I believe the issue stems from the fact I created a counter and increment each loop. I use this counter for my comparison using the counter as an index marker in the array. Although, each time I remove an item from the array it changes the index while the counter increases.
Here is the code I have:
def sort(i):
    iter = list(i)
    counter = 0
    for item in iter:
        if item == iter[counter + 1]:
            del iter[counter]
        counter = counter + 1
    return iter
You're iterating over the same list that you are deleting from. That usually causes behaviour that you would not expect. Make a copy of the list & iterate over that.
However, there is a simpler solution: Use itertools.groupby
import itertools
def sort(i):
    return [x for x, _ in itertools.groupby(list(i))]
print(sort('12223344112233'))
Output:
['1', '2', '3', '4', '1', '2', '3']
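Since the question expects output such as 1234123 or ABCDAB rather than a list, a small variation joins the group keys back into a string. This is only a sketch, and collapse_runs is an invented name.

```python
from itertools import groupby


def collapse_runs(text):
    """Collapse each run of equal adjacent characters to one character."""
    # groupby yields (key, group) pairs for consecutive equal items;
    # only the key is needed here.
    return ''.join(key for key, _group in groupby(text))


print(collapse_runs('12223344112233'))     # 1234123
print(collapse_runs('AAAABBBCCCDDAAABB'))  # ABCDAB
```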
A few alternatives, all using s = 'AAAABBBCCCDDAAABB' as setup:
>>> import re
>>> re.sub(r'(.)\1+', r'\1', s)
'ABCDAB'
>>> p = None
>>> [c for c in s if p != (p := c)]
['A', 'B', 'C', 'D', 'A', 'B']
>>> [c for c, p in zip(s, [None] + list(s)) if c != p]
['A', 'B', 'C', 'D', 'A', 'B']
>>> [c for i, c in enumerate(s) if not s.endswith(c, None, i)]
['A', 'B', 'C', 'D', 'A', 'B']
The other answers are good. This one iterates over the list in reverse to prevent skipping items, and uses the look-ahead type of algorithm the OP described. Quick note, OP: this really isn't a sorting algorithm.
def sort(input_str: str) -> str:
    as_list = list(input_str)
    # walk backwards so deleting an item never shifts the items still to be checked
    for idx in range(len(as_list) - 1, 0, -1):
        if as_list[idx] == as_list[idx - 1]:
            del as_list[idx]
    return ''.join(as_list)

Partition List by indices

So there is a list List = ['a', 'b', 'c', 'd', 'e'] and a list of indices Indices = [1, 2, 4].
I want to partition the list into two lists: one containing the elements at the Indices (['b', 'c', 'e']) and one containing all other elements (['a', 'd']).
For the first list I already have simple solution.
In_List = [List[i] for i in Indices]
However, for the other list I only have a rather ugly solution
Out_List = [List[i] for i in range(len(List)) if i not in Indices]
The Solution I have works, ... But it feels like there should be a more elegant way of doing this.
Any Suggestions?
Edit/Update
It seems that there are 3 suggestions:
One Loop over indices:
In_List = []
Out_List = []
for i in range(len(List)):
    if i in Indices:
        In_List.append(List[i])
    else:
        Out_List.append(List[i])
Loop via enumerate:
In_List = []
Out_List = []
for index, value in enumerate(List):
    if index in Indices:
        In_List += [value]
    else:
        Out_List += [value]
Using Numpy:
Indices = np.array(Indices)
List = np.array(List)
In_List = list(List[Indices])
Out_List = list(np.delete(List, Indices))
Thanks to everybody for the suggestion.
I took these three solutions and my initial solution and compared them for differently sized Lists(range(10, 1000, 10)) picking one eighth of the elements every time - averaged over 100 repetitions. It seems that list comprehension is slightly faster than the loops, but not significantly. Numpy seems slower for short lists but absolutely crushes the other solutions for larger lists.
Edit/Update: made the numpy version also return a list and then updated the graph.
It is not more elegant, but at least you avoid running two for loops (which is quite inefficient if you are dealing with a lot of data).
In_List = []
Out_List = []
for i in range(len(List)):
    if i in Indices:
        In_List.append(List[i])
    else:
        Out_List.append(List[i])
Edit: you can also write the code above in one liner, but it isn't really readable:
in_List = []
out_List = []
[in_List.append(List[j]) if j in Indices else out_List.append(List[j]) for j in range(len(List))]
If you are OK with using numpy, the code will look nicer (though some people may claim using numpy here is using a machine gun to kill a mosquito):
import numpy as np
Indices = np.array(Indices)
List = np.array(List)
In_List = List[Indices]
Out_List = np.delete(List, Indices)
this would also work:
List = ['a', 'b', 'c', 'd', 'e']
Indices = [1, 2, 4]
ret = ([], [])
for i, item in enumerate(List):
    ret[i in Indices].append(item)
Out_List, In_List = ret
where I use i in Indices (False is 0, True is 1) as the index into the tuple ret, and then unpack it in the last line to get Out_List and In_List.
You can achieve the same result with only one pass over your List using enumerate:
List = ['a', 'b', 'c', 'd', 'e']
Indices = [1, 2, 4]
In_List = []
Out_List = []
for index, value in enumerate(List):
    if index in Indices:
        In_List += [value]
    else:
        Out_List += [value]
It would be even more efficient if your Indices variable was a set instead of a list.
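A sketch of that set-based variant, assuming Python 3: membership tests on a set take constant time on average, whereas `index in Indices` on a list scans the list each time. The function name partition_by_indices is hypothetical.

```python
def partition_by_indices(items, indices):
    """Split items into (elements at indices, all other elements)."""
    wanted = set(indices)  # average O(1) membership tests
    in_list, out_list = [], []
    for index, value in enumerate(items):
        # Pick the destination list, then append in a single pass.
        (in_list if index in wanted else out_list).append(value)
    return in_list, out_list


in_list, out_list = partition_by_indices(['a', 'b', 'c', 'd', 'e'], [1, 2, 4])
print(in_list)   # ['b', 'c', 'e']
print(out_list)  # ['a', 'd']
```

Note that in_list follows the order of the original list, not the order in which the indices were given.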
Using numpy boolean mask (with np.in1d):
import numpy as np
lst = np.array(['a', 'b', 'c', 'd', 'e'])
indices = np.array([1, 2, 4])
m = np.in1d(range(lst.size), indices)
in_list, out_list = lst[m], lst[~m] # ['b' 'c' 'e'] ['a' 'd']
You can use itemgetter from the operator module:
from operator import itemgetter
my_list = ['a', 'b', 'c', 'd', 'e']
in_indices = [1, 2, 3]
out_indices = set(range(len(my_list))).difference(in_indices)
# you can also use:
# out_indices = [0, 4]
in_list = list(itemgetter(*in_indices)(my_list ))
out_list = list(itemgetter(*out_indices)(my_list ))
print(in_list)
print(out_list)
output:
['b', 'c', 'd']
['a', 'e']

Reverse a List by Swap Ends

I am trying to reverse a list's order by finding three bugs in this function. This function is supposed to reverse the first and last elements of a list, the second and second to last elements, and so on. I believe I found two, but am having trouble fixing the line of list[j] = y.
def reverse(list):
    """Reverses elements of a list."""
    for i in range(len(list)):
        j = len(list) - i
        x = list[i]
        y = list[j-1]
        list[i] = x
        list[j] = y
l = ['a', 'b', 'c', 'd', 'e']
reverse(l)
print(l)
Homework I suspect...
But - we all need a break from homework. By looping over the whole list you're reversing it twice.
def reverse(list):
    """Reverses elements of a list."""
    for i in range(len(list)//2):
        j = i + 1
        x = list[i]
        y = list[-j]
        list[-j] = x
        list[i] = y
l = ['a', 'b', 'c', 'd', 'e']
reverse(l)  # reverses in place and returns None, so don't reassign l
print(l)
resulting in
['e', 'd', 'c', 'b', 'a']
You have a couple problems. Your first problem is that you use list[j] = y instead of list[j-1] = x. You defined y correctly with j-1, but you should be changing list[j-1] to the other one, x. Another problem is that you are going from the beginning of the list all the way to the end. Once you get to more than half way through the list, you are undoing your work. You also don't need to use len(list)-i because you can just use -i. Here is the updated code:
def reverse(seq):
    """Reverses elements of a list."""
    for i in range(len(seq)//2):
        x = seq[i]
        y = seq[-i-1]
        seq[i] = y
        seq[-i-1] = x
l = ['a', 'b', 'c', 'd', 'e']
reverse(l)
print(l)
Output:
['e', 'd', 'c', 'b', 'a']
You don't even need to define x and y. Instead, do this:
def reverse(seq):
    """Reverses elements of a list."""
    for i in range(len(seq)//2):
        seq[i], seq[-i-1] = seq[-i-1], seq[i]
I also changed your naming. There's probably a better name than seq, but list is unacceptable because it conflicts with the built-in type.
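A quick way to gain confidence in a swap-based reversal is to compare it against slicing for lists of several lengths, including empty and odd-length ones. The sketch below uses the name reverse_in_place (an invented name) to keep it apart from the versions above.

```python
def reverse_in_place(seq):
    """Reverse seq in place by swapping mirrored pairs."""
    # Only walk the first half; going further would swap everything back.
    for i in range(len(seq) // 2):
        seq[i], seq[-i - 1] = seq[-i - 1], seq[i]


# Compare against slicing for lengths 0 through 5.
for n in range(6):
    original = list(range(n))
    work = list(original)
    reverse_in_place(work)
    assert work == original[::-1], (n, work)

letters = ['a', 'b', 'c', 'd', 'e']
reverse_in_place(letters)
print(letters)  # ['e', 'd', 'c', 'b', 'a']
```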
Use this code:
l = ['a', 'b', 'c', 'd', 'e']
l=l[::-1]
print(l)
Why complicate such a simple operation? Or, if you don't want to do it that way, try:
l.reverse()
which reverses the list in place. Python has a lot of functions and methods ready to use.

Increment the next element based on previous element

When looping through a list, you can work with the current item of the list. For example, if you want to replace certain items with others, you can use:
a=['a','b','c','d','e']
b=[]
for i in a:
    if i=='b':
        b.append('replacement')
    else:
        b.append(i)
print b
['a', 'replacement', 'c', 'd', 'e']
However, I wish to replace certain values based not on the item at index i, but on the item at index i+1. I've been trying for ages and I can't seem to make it work. I would like something like this:
c=['a','b','c','d','e']
d=[]
for i in c:
    if i+1=='b':
        d.append('replacement')
    else:
        d.append(i)
print d
d=['replacement','b','c','d','e']
Is there any way to achieve this?
Use a list comprehension along with enumerate
>>> ['replacement' if a[i+1]=='b' else v for i,v in enumerate(a[:-1])]+[a[-1]]
['replacement', 'b', 'c', 'd', 'e']
The code replaces all those elements where the next element is b. However to take care of the last index and prevent IndexError, we just append the last element and loop till the penultimate element.
Without a list comprehension
a=['a','b','c','d','e']
d=[]
for i,v in enumerate(a[:-1]):
    if a[i+1]=='b':
        d.append('replacement')
    else:
        d.append(v)
d.append(a[-1])
print d
It's generally better style to not iterate over indices in Python. A common way to approach a problem like this is to use zip (or the similar izip_longest in itertools) to see multiple values at once:
In [32]: from itertools import izip_longest
In [33]: a=['a','b','c','d','e']
In [34]: b = []
In [35]: for c, next in izip_longest(a, a[1:]):
   ....:     if next == 'd':
   ....:         b.append("replacement")
   ....:     else:
   ....:         b.append(c)
   ....:
In [36]: b
Out[36]: ['a', 'b', 'replacement', 'd', 'e']
I think there's a confusion in your post between the list indices and list elements. In the loop as you have written it i will be the actual element (e.g. 'b') and not the index, thus i+1 is meaningless and will throw a TypeError exception.
I think one of the smallest set of changes you can do to your example to make it work is:
c = ['a', 'b', 'c', 'd', 'e']
d = []
for i, el in enumerate(c[:-1]):
    if c[i + 1] == 'b':
        d.append('replacement')
    else:
        d.append(el)
print d
# Output...
# ['replacement', 'b', 'c', 'd']
Additionally it's undefined how you should deal with the boundaries. Particularly when i points to the last element 'e', what should i+1 point to? There are many possible answers here. In the example above I've chosen one option, which is to end the iteration one element early (so we never point to the last element e).
If I was doing this I would do something similar to a combination of the other answers:
c = ['a', 'b', 'c', 'd', 'e']
d = ['replacement' if next == 'b' else current
for current, next in zip(c[:-1], c[1:]) ]
print d
# Output...
# ['replacement', 'b', 'c', 'd']
where I have used a list comprehension to avoid the loop, and zip on the list and a shifted list to avoid the explicit indices.
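If the last element should be kept, as in the desired output in the question, one option in Python 3 is itertools.zip_longest (the Python 3 name for izip_longest), which pads the shorter, shifted list with None. This is a sketch under that assumption, with replace_before as a hypothetical helper name.

```python
from itertools import zip_longest


def replace_before(items, target, replacement='replacement'):
    """Replace every item whose successor equals target."""
    # zip_longest pairs the final item with None, so it is kept unchanged
    # as long as target itself is not None.
    return [replacement if nxt == target else current
            for current, nxt in zip_longest(items, items[1:])]


print(replace_before(['a', 'b', 'c', 'd', 'e'], 'b'))
# ['replacement', 'b', 'c', 'd', 'e']
```

The caveat is the padding value: if None is a legitimate target, pass a different fillvalue to zip_longest.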
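Try using the index of the current element to check the next element in the list (note that this raises IndexError on the last element and assumes the elements are unique, since index returns the first match).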
Replace
if i+1=='b':
with
if c[c.index(i)+1]=='b':

Python list subtraction operation *respecting the repetitions*

I am looking to do the subtraction of a list from another list but by respecting repetitions:
>>> a = ['a', 'b', 'c','c', 'c', 'c', 'd', 'e', 'e']
>>> b = ['a', 'c', 'e', 'f','c']
>>> a - b
['b', 'c','c', 'd', 'e']
Order of elements does not matter.
There is a question with answers here but it ignores the repetitions. Solutions there would give:
>>> a - b
['b', 'd']
One solution considers duplicates but it alters one of the original list:
[i for i in a if not i in b or b.remove(i)]
I wrote this solution:
a_sub_b = list(a)
b_sub_a = list(b)
for e in a:
    if e in b_sub_a:
        a_sub_b.remove(e)
        b_sub_a.remove(e)
print a_sub_b # a - b
print b_sub_a # b - a
That works for me, but is there a better solution, simpler or more efficient?
If order doesn't matter, use collections.Counter:
from collections import Counter
c = list((Counter(a) - Counter(b)).elements())
Counter(a) - Counter(b) builds a Counter with the count of an element x equal to the number of times x appears in a minus the number of times x appears in b. elements() creates an iterator that yields each element a number of times equal to its count, and list turns that into a list. The whole thing takes O(len(a)+len(b)) time.
Note that depending on what you're doing, it might be best to not work in terms of lists and just keep a, b, and c represented as Counters.
This is going to search every element of b for each element of a. It's also going to do a linear remove on each list for each element that matches. So, your algorithm takes quadratic time—O(max(N, M)^2) where N is the length of a and M is the length of b.
If you just copy b into a set instead of a list, that solves the problem. Now you're just doing a constant-time set lookup for each element in a, and a constant-time set remove instead of a list remove. But you've still got the problem with the linear-time and incorrect removing from the a copy. And you can't just copy a into a set, because that loses duplicates.
On top of that, a_sub_b.remove(e) removes an element matching e. That isn't necessarily the same element as the element you just looked up. It's going to be an equal element, and if identity doesn't matter at all, that's fine… but if it does, then remove may do the wrong thing.
At any rate, performance is already a good enough reason not to use remove. Once you've solved the problems above, this is the only thing making your algorithm quadratic instead of linear.
The easiest way to solve this problem is to build up a new list, rather than copying the list and removing from it.
Solving both problems, you have O(2N+M) time, which is linear.
So, putting the two together:
b_set = set(b)
new_a = []
for element in a:
    if element in b_set:
        b_set.remove(element)
    else:
        new_a.append(element)
However, this still may have a problem. You haven't stated things very clearly, so it's hard to be sure, but can b contain duplicates, and, if so, does that mean the duplicated elements should be removed from a multiple times? If so, you need a multi-set, not a set. The easiest way to do that in Python is with a Counter:
from collections import Counter
b_counts = Counter(b)
new_a = []
for element in a:
    if b_counts[element]:
        b_counts[element] -= 1
    else:
        new_a.append(element)
On the other hand, if the order of neither a nor b matters, this just reduces to multiset difference, which makes it even easier:
new_a = list((Counter(a) - Counter(b)).elements())
But really, if the order of both is meaningless, you probably should have been using a Counter or other multiset representation in the first place, not a list…
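To check the Counter-based loop against the data from the question, here is a self-contained sketch in Python 3. The function name subtract_preserving_order is invented for this example; it keeps the order of the first list and removes one occurrence for each matching element of the second.

```python
from collections import Counter


def subtract_preserving_order(a, b):
    """Return a minus b as a list, respecting repetitions and a's order."""
    remaining = Counter(b)  # how many copies of each element still to remove
    result = []
    for element in a:
        if remaining[element] > 0:
            remaining[element] -= 1  # cancel one occurrence
        else:
            result.append(element)
    return result


a = ['a', 'b', 'c', 'c', 'c', 'c', 'd', 'e', 'e']
b = ['a', 'c', 'e', 'f', 'c']
print(subtract_preserving_order(a, b))  # ['b', 'c', 'c', 'd', 'e']
print(subtract_preserving_order(b, a))  # ['f']
```

Both directions run in time linear in len(a) + len(b), and neither input list is modified.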
The following uses standard library only:
a = ['a', 'b', 'b', 'c', 'c', 'c', 'c', 'd', 'd', 'd', 'e', 'e']
b = ['a', 'c', 'e', 'f','c']
a_set = set(a)
b_set = set(b)
only_in_a = list(a_set - b_set)
diff_list = list()
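for _o in only_in_a:
    tmp = [_o] * a.count(_o)  # list repetition also works for multi-character items
    diff_list.extend(tmp)
for _b in b_set:
    tmp = [_b] * (a.count(_b) - b.count(_b))  # a non-positive count gives an empty list
    diff_list.extend(tmp)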
print diff_list
And gives:
['b', 'b', 'd', 'd', 'd', 'c', 'c', 'e']
as expected.
