Partition List by indices - python

I have a list List = ['a', 'b', 'c', 'd', 'e'] and a list of indices Indices = [1, 2, 4].
I want to partition the list into two lists: one containing the elements at the Indices (['b', 'c', 'e']) and one containing all other elements (['a', 'd']).
For the first list I already have a simple solution:
In_List = [List[i] for i in Indices]
However, for the other list I only have a rather ugly solution:
Out_List = [List[i] for i in range(len(List)) if i not in Indices]
The solution I have works, but it feels like there should be a more elegant way of doing this.
Any suggestions?
Edit/Update
It seems that there are 3 suggestions:
One Loop over indices:
In_List = []
Out_List = []
for i in range(len(List)):
    if i in Indices:
        In_List.append(List[i])
    else:
        Out_List.append(List[i])
Loop via enumerate:
In_List = []
Out_List = []
for index, value in enumerate(List):
    if index in Indices:
        In_List += [value]
    else:
        Out_List += [value]
Using Numpy:
Indices = np.array(Indices)
List = np.array(List)
In_List = list(List[Indices])
Out_List = list(np.delete(List, Indices))
Thanks to everybody for the suggestions.
I took these three solutions and my initial solution and compared them for differently sized lists (lengths in range(10, 1000, 10)), picking one eighth of the elements every time and averaging over 100 repetitions. The list comprehension seems slightly faster than the loops, but not significantly. NumPy seems slower for short lists but is far faster than the other solutions for larger lists.
Edit/Update: made the numpy version also return a list and then updated the graph.

It is not more elegant, but at least you avoid running two for loops (which is quite inefficient if you are dealing with a lot of data):
In_List = []
Out_List = []
for i in range(len(List)):
    if i in Indices:
        In_List.append(List[i])
    else:
        Out_List.append(List[i])
Edit: you can also write the code above as a one-liner, but it isn't really readable:
in_List = []
out_List = []
[in_List.append(List[j]) if j in Indices else out_List.append(List[j]) for j in range(len(List))]
If you are OK with using numpy, the code will look nicer (though some people may claim that using numpy here is using a machine gun to kill a mosquito):
import numpy as np
Indices = np.array(Indices)
List = np.array(List)
In_List = List[Indices]
Out_List = np.delete(List, Indices)

This would also work:
List = ['a', 'b', 'c', 'd', 'e']
Indices = [1, 2, 4]
ret = ([], [])
for i, item in enumerate(List):
    ret[i in Indices].append(item)
Out_List, In_List = ret
Here the boolean i in Indices is used as an index into the tuple ret (False selects ret[0], True selects ret[1]), and the last line unpacks the tuple into Out_List and In_List.

You can achieve the same result with a single pass over your List using the built-in enumerate:
List = ['a', 'b', 'c', 'd', 'e']
Indices = [1, 2, 4]
In_List = []
Out_List = []
for index, value in enumerate(List):
    if index in Indices:
        In_List += [value]
    else:
        Out_List += [value]
It would be even more efficient if your Indices variable was a set instead of a list.
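To make the set suggestion concrete, here is a minimal sketch of the same single-pass partition with the indices converted to a set first. The helper name partition_by_indices is made up for illustration. Note one assumption: the selected elements come out in list order, so this only matches [List[i] for i in Indices] when the indices are sorted and free of duplicates.

```python
def partition_by_indices(items, indices):
    """Split items into (selected, rest) according to a collection of positions."""
    wanted = set(indices)  # O(1) membership tests instead of scanning a list
    selected, rest = [], []
    for position, value in enumerate(items):
        # Append to one of the two lists depending on membership
        (selected if position in wanted else rest).append(value)
    return selected, rest


in_list, out_list = partition_by_indices(['a', 'b', 'c', 'd', 'e'], [1, 2, 4])
print(in_list, out_list)  # ['b', 'c', 'e'] ['a', 'd']
```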

Using a numpy boolean mask (with np.in1d; the NumPy documentation recommends np.isin for new code):
import numpy as np
lst = np.array(['a', 'b', 'c', 'd', 'e'])
indices = np.array([1, 2, 4])
m = np.in1d(range(lst.size), indices)
in_list, out_list = lst[m], lst[~m] # ['b' 'c' 'e'] ['a' 'd']

You can use itemgetter from the operator module:
from operator import itemgetter
my_list = ['a', 'b', 'c', 'd', 'e']
in_indices = [1, 2, 3]
# sorted() makes the order explicit; iterating a set directly has no guaranteed order
out_indices = sorted(set(range(len(my_list))).difference(in_indices))
# alternatively, hard-code them:
# out_indices = [0, 4]
in_list = list(itemgetter(*in_indices)(my_list))
out_list = list(itemgetter(*out_indices)(my_list))
print(in_list)
print(out_list)
output:
['b', 'c', 'd']
['a', 'e']
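One thing to watch with itemgetter: with exactly one index it returns the bare element rather than a tuple, and with no indices at all it raises a TypeError. A small sketch of a wrapper that covers both cases (the name pick is hypothetical, not part of the operator module):

```python
from operator import itemgetter


def pick(items, indices):
    """Return [items[i] for i in indices] via itemgetter, for any number of indices."""
    indices = list(indices)
    if not indices:
        return []  # itemgetter() with no arguments raises TypeError
    picked = itemgetter(*indices)(items)
    # With exactly one index, itemgetter returns the bare element, not a 1-tuple
    return [picked] if len(indices) == 1 else list(picked)


letters = ['a', 'b', 'c', 'd', 'e']
print(pick(letters, [1, 2, 3]))  # ['b', 'c', 'd']
print(pick(letters, [4]))        # ['e']
print(pick(letters, []))         # []
```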

Related

Making list that takes in value if value appears more than once

If my list has values that appear more than once, I want to do the following:
my_list = ['a','b','c','a','a','b']
I want my_list to become ['a','b','c']
and at the same time new_list = ['a','a','b']
I have started on the code but can't manage to finish it:
def func(word):
    tgt = 1
    found = []
    lst = [1,2,3,45,6,1]
    if lst.count(word)> 1:
        found.append(word)
    return found, lst
print(func(1))
You can iterate through the list, store the element in one list if it is not visited or in a new list if it is already visited:
my_list = ['a','b','c','a','a','b']
visited, lst, new_list = set(), [], []
for x in my_list:
    if x not in visited:
        lst.append(x)
        visited.add(x)
    else:
        new_list.append(x)
print(lst, new_list)
# ['a', 'b', 'c'] ['a', 'a', 'b']
my_list = ['a','b','c','a','a','b']
new_list = my_list.copy()
my_list = list(set(my_list))
my_list.sort()
# remove one occurrence of each distinct item from new_list, leaving only the extra copies
for item in my_list:
    new_list.pop(new_list.index(item))
I'm going to use a collections.Counter() to count the occurrences of each item in the list. At that point the keys() become your new my_list, and then we will use some list multiplication to construct your new_list.
import collections
data = ['a','b','c','a','a','b']
counted = collections.Counter(data)
At this point finding your new my_list is dead simple:
my_list = list(counted)
print(my_list)
gives you:
['a', 'b', 'c']
Now we can leverage the fact that ['a'] * 4 == ['a', 'a', 'a', 'a'] to construct a list from the keys of counted based on the number of times they were identified.
new_list = []
for key, value in counted.items():
    if value > 1:
        new_list.extend([key]*(value-1))
print(new_list)
This will give us back:
['a', 'a', 'b']
Full Solution:
import collections
data = ['a','b','c','a','a','b']
counted = collections.Counter(data)
my_list = list(counted)
new_list = []
for key, value in counted.items():
    if value > 1:
        new_list.extend([key]*(value-1))
print(my_list, new_list)
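The Counter approach and the "visited set" approach agree on the example data, but they can order the repeats differently: Counter groups the extra copies by key, while the loop keeps them in the order they were encountered. A small sketch to illustrate, assuming Python 3.7+ (where Counter keeps insertion order); the function names split_seen and split_counter are made up here:

```python
from collections import Counter


def split_seen(data):
    """Encounter-order split: first occurrences vs. every later repeat."""
    seen, firsts, repeats = set(), [], []
    for x in data:
        if x in seen:
            repeats.append(x)
        else:
            seen.add(x)
            firsts.append(x)
    return firsts, repeats


def split_counter(data):
    """Counter-based split: the repeats come out grouped by key."""
    counted = Counter(data)
    repeats = [key for key, n in counted.items() for _ in range(n - 1)]
    return list(counted), repeats


data = ['a', 'b', 'a', 'b', 'a']
print(split_seen(data))     # (['a', 'b'], ['a', 'b', 'a'])
print(split_counter(data))  # (['a', 'b'], ['a', 'a', 'b'])
```

Which ordering is right depends on what new_list is used for afterwards; the question's example does not distinguish the two.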
lst = ["a","b","a","c","d","b"]
new_list=[]
for i in lst:
    if i not in new_list: # keep only the first occurrence of each value
        new_list.append(i)
for i in new_list:
    if i in lst:
        lst.remove(i) # drop that first occurrence from the original list
print(lst,new_list)
First collect each distinct value in a separate list, then remove one occurrence of each from the original list, which leaves only the repeated values there. Here lst ends up as ['a', 'b'] and new_list as ['a', 'b', 'c', 'd'].
l = ['a', 'b', 'c', 'a', 'c']
arr = []
for x in l:
    if x not in arr:
        arr.append(x)
# now arr holds each element of l exactly once, in order of first appearance

List of index where corresponding elements of two lists are same

I want to compare two different lists and return the indexes at which both hold the same string.
For example, if I have two lists like:
grades = ['A', 'B', 'A', 'E', 'D']
scored = ['A', 'B', 'F', 'F', 'D']
My expected output is:
[0, 1, 4] #The indexes of similar strings in both lists
However this is the result I am getting at the moment:
[0, 1, 2, 4] #Problem: The 2nd index being counted again
I have tried coding this using two approaches.
First Approach:
def markGrades(grades, scored):
    indices = [i for i, item in enumerate(grades) if item in scored]
    return indices
Second Approach:
def markGrades(grades, scored):
    indices = []
    for i, item in enumerate(grades):
        if i in scored and i not in indices:
            indices.append(i)
    return indices
The second approach returns correct strings but not the indexes.
You can use enumerate along with zip in a list comprehension to achieve this:
>>> grades = ['A', 'B', 'A', 'E', 'D']
>>> scored = ['A', 'B', 'F', 'F', 'D']
>>> [i for i, (g, s) in enumerate(zip(grades, scored)) if g==s]
[0, 1, 4]
The issue with your code is that you are not comparing the elements at the same index. Instead, by using in, you are checking whether elements of one list are present anywhere in the other list.
Because 'A' at index 2 of grades is present in the scored list, you get index 2 in your resulting list.
Your logic fails in that it doesn't check whether the elements are in the same position, merely that the grades element appears somewhere in scored. If you compare corresponding elements instead, the fix is simple.
Using your second approach:
indices = []
for i, item in enumerate(grades):
    if item == scored[i]:
        indices.append(i)
The solution that Anonymous gives is what I was about to add as the "Pythonic" way to solve the problem.
You can access the two lists in pairs (to avoid the over-generalization of finding a match anywhere in the other array) with zip:
grades = ['A', 'B', 'A', 'E', 'D']
scored = ['A', 'B', 'F', 'F', 'D']
matches = []
for ix, (gr, sc) in enumerate(zip(grades,scored)):
    if gr == sc:
        matches.append(ix)
or, more compactly, with a list comprehension, if that suits your purpose:
matches = [ix for ix, (gr, sc) in enumerate(zip(grades,scored)) if gr == sc]
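One detail the zip-based answers rely on: zip stops at the shorter input, so positions that exist in only one list are silently ignored. If the two lists might differ in length, something like the following sketch makes the behaviour explicit. The function names are made up for illustration, and treating "present in only one list" as a mismatch is an assumption, not something the question specifies.

```python
from itertools import zip_longest


def matching_indices(first, second):
    """Indexes at which both sequences hold equal elements (zip stops at the shorter one)."""
    return [i for i, (a, b) in enumerate(zip(first, second)) if a == b]


def mismatching_indices(first, second):
    """Indexes that differ; positions present in only one input count as mismatches."""
    missing = object()  # sentinel that is not equal to any real element
    pairs = zip_longest(first, second, fillvalue=missing)
    return [i for i, (a, b) in enumerate(pairs) if a != b]


grades = ['A', 'B', 'A', 'E', 'D']
scored = ['A', 'B', 'F', 'F', 'D']
print(matching_indices(grades, scored))     # [0, 1, 4]
print(mismatching_indices(grades, scored))  # [2, 3]
print(mismatching_indices(grades, ['A']))   # [1, 2, 3, 4]
```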

Splitting string values in list into individual values, Python

I have a list of values in which some values are words separated by commas, but are considered single strings as shown:
l = ["a",
"b,c",
"d,e,f"]
#end result should be
#new_list = ['a','b','c','d','e','f']
I want to split those strings and was wondering if there's a one-liner or something short to do such a mutation. So far I was thinking of just iterating through l, calling .split(',') on each element and then merging the results, but that seems like it would take a while to run.
import itertools
new_list = list(itertools.chain(*[x.split(',') for x in l]))
print(new_list)
>>> ['a', 'b', 'c', 'd', 'e', 'f']
Kind of unusual, but you could join all your elements with ',' and then split them:
l = ["a",
"b,c",
"d,e,f"]
newList = ','.join(l).split(',')
print(newList)
Output:
['a', 'b', 'c', 'd', 'e', 'f']
Here's a one-liner using a (nested) list comprehension:
new_list = [item for csv in l for item in csv.split(',')]
Not exactly a one-liner, but 2 lines:
>>> l = ["a",
"b,c",
"d,e,f"]
>>> ll =[]
>>> [ll.extend(x.split(',')) for x in l]
[None, None, None]
>>> ll
['a', 'b', 'c', 'd', 'e', 'f']
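The accumulator has to be created separately because the result of x.split(',') cannot be unpacked inside a comprehension. The comprehension here is used only for its side effect, which is why it evaluates to a list of None values.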
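If the real data is messier than the example (spaces after commas, empty strings, doubled commas), a variant along these lines may be useful. It is only a sketch: the helper name split_items and the choice to drop empty pieces are assumptions beyond what the question asks for. It uses itertools.chain.from_iterable, which avoids building an intermediate list just to unpack it with *.

```python
from itertools import chain


def split_items(values, sep=','):
    """Flatten separator-delimited strings, trimming whitespace and dropping empty pieces."""
    pieces = chain.from_iterable(value.split(sep) for value in values)
    return [piece.strip() for piece in pieces if piece.strip()]


print(split_items(["a", "b,c", "d,e,f"]))     # ['a', 'b', 'c', 'd', 'e', 'f']
print(split_items(["a, b", "", "c,,d "]))     # ['a', 'b', 'c', 'd']
```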

Concise way to remove elements from list by index in Python

I have a list of characters and list of indexes
myList = ['a','b','c','d']
toRemove = [0,2]
and I'd like to get this in one operation
myList = ['b','d']
I could do this, but is there a way to do it faster?
toRemove.reverse()
for i in toRemove:
    myList.pop(i)
Concise answer
>>> myList = ['a','b','c','d']
>>> toRemove = [0,2]
>>>
>>> [v for i, v in enumerate(myList) if i not in toRemove]
['b', 'd']
>>>
You could use a list comprehension as other answers have suggested, but to make it truly faster I would suggest using a set for the set of indices you want removed.
>>> myList = ['a','b','c','d']
>>> toRemove = set([0,2])
>>> [x for i,x in enumerate(myList) if i not in toRemove]
['b', 'd']
Checking every element in myList against every element in toRemove is O(n*m) (where n is the length of myList and m is the length of toRemove). If you use a set, checking for membership is O(1), so the whole procedure becomes O(n). Keep in mind though, the difference in speed will not be noticeable unless toRemove is really big (say more than a thousand).
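To see the effect for yourself, a rough timeit sketch like the one below can be used. The sizes (2000 elements, 1000 indices, 5 runs) are arbitrary choices for illustration, and the absolute numbers will vary by machine, so treat the output as indicative only.

```python
import timeit


def remove_at(items, to_remove):
    """Keep every element whose position is not in to_remove."""
    return [x for i, x in enumerate(items) if i not in to_remove]


items = list(range(2000))
as_list = list(range(0, 2000, 2))  # 1000 indices to drop, stored in a list
as_set = set(as_list)              # the same indices, stored in a set

# Same function, same data; only the container used for membership tests differs
t_list = timeit.timeit(lambda: remove_at(items, as_list), number=5)
t_set = timeit.timeit(lambda: remove_at(items, as_set), number=5)
print("list membership: %.4f s, set membership: %.4f s" % (t_list, t_set))
```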
If you wanted to, you could use numpy.
import numpy as np
myList = ['a','b','c','d']
toRemove = [0,2]
new_list = np.delete(myList, toRemove)
Result:
>>> new_list
array(['b', 'd'],
dtype='|S1')
Note that new_list is a numpy array.
One-liner:
>>>[myList[x] for x in range(len(myList)) if not x in [0,2]]
['b', 'd']
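You could write a function to do it for you. Delete from the highest index down, so that earlier deletions do not shift the positions still to be removed.
def removethese(lst, *args):
    for arg in sorted(args, reverse=True):
        del lst[arg]
Then do
mylist = ['a', 'b', 'c', 'd', 'e']
removethese(mylist, 0, 1, 4)
mylist now is ['c', 'd']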

Clone elements of a list

Let's say I have a Python list that looks like this:
list = [ a, b, c, d]
I am looking for the most efficient way, performance-wise, to get this:
list = [ a, a, a, a, b, b, b, c, c, d ]
So if the list is N elements long, the first element is cloned N-1 times, the second element N-2 times, and so forth; the last element is cloned N-N times, i.e. 0 times. Any suggestions on how to do this efficiently on large lists?
Note that I am testing speed, not correctness. If someone wants to edit in a unit test, I'll get around to it.
pyfunc_fastest: 152.58769989 usecs
pyfunc_local_extend: 154.679298401 usecs
pyfunc_iadd: 158.183312416 usecs
pyfunc_xrange: 162.234091759 usecs
pyfunc: 166.495800018 usecs
Ignacio: 238.87629509 usecs
Ishpeck: 311.713695526 usecs
FabrizioM: 456.708812714 usecs
JohnKugleman: 519.239497185 usecs
Bwmat: 1309.29429531 usecs
Test code here. The second revision is trash because I was rushing to get everybody tested that posted after my first batch of tests. These timings are for the fifth revision of the code.
Here's the fastest version that I was able to get.
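def pyfunc_fastest(x):
    t = []
    lenList = len(x)
    extend = t.extend  # bind the method once to skip the attribute lookup in the loop
    for l in xrange(0, lenList):  # xrange is Python 2; use range on Python 3
        extend([x[l]] * (lenList - l))
    return t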
Oddly, a version that I modified to avoid indexing into the list by using enumerate ran slower than the original.
>>> items = ['a', 'b', 'c', 'd']
>>> [item for i, item in enumerate(items) for j in xrange(len(items) - i)]
['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
First we use enumerate to pull out both indexes and values at the same time. Then we use a nested for loop to iterate over each item a decreasing number of times. (Notice that the variable j is never used. It is junk.)
This should be near optimal, with minimal memory usage thanks to the use of the enumerate and xrange generators.
How about this - A simple one
>>> x = ['a', 'b', 'c', 'd']
>>> t = []
>>> lenList = len(x)
>>> for l in range(0, lenList):
...     t.extend([x[l]] * (lenList - l))
...
>>> t
['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
>>>
Lazy mode:
import itertools
l = ['foo', 'bar', 'baz', 'quux']
for item in itertools.chain.from_iterable(itertools.repeat(e, len(l) - i)
                                          for i, e in enumerate(l)):
    print(item)
Just shove it through list() if you really do need a list instead.
list(itertools.chain.from_iterable(itertools.repeat(e, len(l) - i)
                                   for i, e in enumerate(l)))
My first instinct..
l = ['a', 'b', 'c', 'd']
nl = []
i = 0
while len(l[i:])>0:
    nl.extend( [l[i]]*len(l[i:]) )
    i+=1
print(nl)
The trick is in using repeat from itertools
from itertools import repeat
alist = "a b c d".split()
print([ x for idx, value in enumerate(alist) for x in repeat(value, len(alist) - idx) ])
>>>['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
Use a generator: it's O(1) memory and O(N^2) cpu, unlike any solution that produces the final list which uses O(N^2) memory and cpu. This means it'll be massively faster as soon as the input list is large enough that the constructed list fills memory and swapping starts. It's unlikely you need to have the final list in memory unless this is homework.
def triangle(seq):
    for i, x in enumerate(seq):
        for _ in xrange(len(seq) - i):  # len(seq) - i copies, matching the example output
            yield x
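To show what the lazy approach buys you, here is a Python 3 sketch of the same idea (range instead of xrange) together with itertools.islice, which pulls only the first few values without ever building the full output. The name repeat_triangle is made up for this example, and the repeat count of len(seq) - i per element is taken from the example output in the question.

```python
from itertools import islice


def repeat_triangle(seq):
    """Lazily yield seq[0] len(seq) times, seq[1] len(seq) - 1 times, and so on."""
    n = len(seq)
    for i, x in enumerate(seq):
        for _ in range(n - i):
            yield x


print(list(repeat_triangle('abcd')))
# ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']

# Only five values are ever produced here, even though the full output
# for a million-element input would be far too large to hold in memory.
print(list(islice(repeat_triangle(range(1000000)), 5)))  # [0, 0, 0, 0, 0]
```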
Any solution that actually builds the new list, list = [ a, a, a, a, b, b, b, c, c, d ], has to produce N(N+1)/2 elements for N inputs (10 elements for N = 4), so it takes O(N^2) time. aaronasterling's solution does that work with little overhead.
You could cheat and just not create the new list. Simply take the index into the expanded list as input and map it back to an index into the original list.
In pseudocode:
function getElement(int i)
{
    int trueIndex = 0;
    int count = N; // the first element appears N times, the next N-1 times, ...
    while (i >= count) { i = i - count; count = count - 1; trueIndex = trueIndex + 1; }
    return list[trueIndex];
}
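A runnable Python take on this index-mapping idea, adapted to the triangular repeat counts from the question (first element N times, second N-1 times, and so on), might look like the sketch below. The name make_triangle_lookup is hypothetical; the lookup uses a precomputed list of run boundaries and bisect, so it needs O(N) extra memory rather than the N(N+1)/2 elements of the expanded list.

```python
from bisect import bisect_right
from itertools import accumulate


def make_triangle_lookup(items):
    """Return a function mapping a position in the expanded list to its element."""
    n = len(items)
    # ends[k] is one past the last expanded position occupied by items[k]
    ends = list(accumulate(n - k for k in range(n)))
    total = ends[-1] if ends else 0

    def get_element(pos):
        if not 0 <= pos < total:
            raise IndexError("position out of range")
        # The first boundary strictly greater than pos identifies the run
        return items[bisect_right(ends, pos)]

    return get_element


get_element = make_triangle_lookup(['a', 'b', 'c', 'd'])
print([get_element(p) for p in range(10)])
# ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
```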
fwiw:
>>> lst = list('abcd')
>>> [i for i, j in zip(lst, range(len(lst), 0, -1)) for _ in range(j)]
['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
def gen_indices(list_length):
    for index in range(list_length):
        for _ in range(list_length - index):
            yield index
new_list = [list[i] for i in gen_indices(len(list))]
untested but I think it'll work
