I was wondering what would be an efficient and elegant way of slicing a Python list based on indices. To provide a minimal example:
temp = ['a','b','c','d']
index_needed=[0,2]
How can I slice the list without the loop?
Expected output:
output_list =['a','c']
I have a sense that there would be a way but haven't figured out any. Any suggestions?
First, note that indexing in Python begins at 0, so the indices [0, 2] select the first and third items.
You can then use a list comprehension:
temp = ['a', 'b', 'c', 'd']
idx = [0, 2]
res = [temp[i] for i in idx] # ['a', 'c']
With built-ins, you may find map performs better:
res = map(temp.__getitem__, idx) # ['a', 'c']
Since you are using Python 2.7, this returns a list. For Python 3.x, you would need to pass the map object to list.
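On Python 3, for instance, the same approach might look like this (a minimal sketch):

```python
temp = ['a', 'b', 'c', 'd']
idx = [0, 2]

# map returns a lazy iterator on Python 3, so wrap it in list
res = list(map(temp.__getitem__, idx))
print(res)  # ['a', 'c']
```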
If you are looking to avoid a Python-level loop altogether, you may wish to use a 3rd party library such as NumPy:
import numpy as np
temp = np.array(['a', 'b', 'c', 'd'])
res = temp[idx]
# array(['a', 'c'],
# dtype='<U1')
res2 = np.delete(temp, idx)
# array(['b', 'd'],
# dtype='<U1')
This returns a NumPy array, which can then be converted to a list via res.tolist().
Use this:
temp = ['a','b','c','d']
temp[0:4:2]
#Output
['a', 'c']
Here the first value is the starting index (included), the second is the ending index (excluded), and the third is the step.
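For reuse, the same start/stop/step triple can also be written as a slice object; a small sketch:

```python
temp = ['a', 'b', 'c', 'd']

# start=0 (included), stop=4 (excluded), step=2
print(temp[0:4:2])        # ['a', 'c']

# An equivalent, reusable slice object (stop=None means "to the end")
every_other = slice(0, None, 2)
print(temp[every_other])  # ['a', 'c']
```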
Happy Learning...:)
An alternative that pushes the work to the C layer on CPython (the reference interpreter):
from operator import itemgetter
temp = ['a','b','c','d']
index_needed=[0,2]
output_list = itemgetter(*index_needed)(temp)
That returns a tuple of the values; if a list is necessary, just wrap it in the list constructor:
output_list = list(itemgetter(*index_needed)(temp))
Note that this only works properly if you need at least two indices; itemgetter's return type varies based on how it's initialized, returning the value directly when it's passed a single key to pull, and a tuple of values when passed more than one key.
It's also not particularly efficient for one-off uses. A more common use case is when you have an iterable of sequences (typically tuples, but any sequence works) and only care about some of the values in each. For example, with an input list of:
allvalues = [(1, 2, 3, 4),
(5, 6, 7, 8)]
if you only wanted the values from index 1 and 3, you could write a loop like:
for _, x, _, y in allvalues:
where you unpack all the values but send the ones you don't care about to _ to indicate the lack of interest, or you can use itemgetter and map to strip them down to what you care about before the unpack:
from future_builtins import map # Because Py2's map is terrible; not needed on Py3
for x, y in map(itemgetter(1, 3), allvalues):
The itemgetter based approach doesn't care if you have more than four items in a given element of allvalues, while manual unpacking would always require exactly four; which is better is largely based on your use case.
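Putting that together, a runnable sketch (Python 3 shown, where map is already lazy):

```python
from operator import itemgetter

allvalues = [(1, 2, 3, 4),
             (5, 6, 7, 8)]

# Strip each tuple down to indices 1 and 3 before unpacking
for x, y in map(itemgetter(1, 3), allvalues):
    print(x, y)
# 2 4
# 6 8
```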
Related
I want to get the intersection of two lists without eliminating duplicates, and I'm hoping for a fast method that avoids explicit loops.
Below is my attempt, but it failed because duplicates were removed.
a = ['a','b','c','f']
b = ['a','b','b','o','k']
tmp = list(set(a) & set(b))
>>> tmp
['b', 'a']
I want the result to be ['a', 'b', 'b'].
Here 'a' is a fixed value and 'b' is a variable value; the idea is to extract the values of 'a' from 'b'.
Is there a way to extract the intersection of the lists without removing duplicate values?
A solution could be
good = set(a)
result = [x for x in b if x in good]
There are two loops here: one is the set-building loop inside set (implemented in C, hundreds of times faster than anything you can write in Python); the other is the comprehension, which runs in the interpreter.
The set is built to avoid a linear search through a for each element of b (if a gets big, that becomes a serious problem).
Note that using filter instead is probably not going to gain much (if anything), because although the filter loop runs in C, for each element it has to call back into the interpreter to evaluate the filtering function.
Note that if you really care about speed, Python may not be a good choice; for example PyPy might be better here, and in that case it should be fine to just write the optimal algorithm explicitly (avoiding re-searching a for duplicates when they are consecutive in b, as in your example):
good = set(a)
res = []
i = 0
while i < len(b):
    x = b[i]
    if x in good:
        while i < len(b) and b[i] == x:
            res.append(x)
            i += 1
    else:
        i += 1
Of course, in performance optimization the only real way is to try and measure with real data on the real system; guessing works less and less as technology advances and becomes more complicated.
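A minimal timing sketch along those lines, using timeit on synthetic data (the data and repetition counts here are arbitrary, and absolute numbers will vary by machine):

```python
import timeit

setup = """
a = ['a', 'b', 'c', 'f'] * 250    # synthetic data, purely for illustration
b = ['a', 'b', 'b', 'o', 'k'] * 200
good = set(a)
"""

# Compare the comprehension against filter with a C-level predicate
comp = timeit.timeit("[x for x in b if x in good]", setup=setup, number=500)
filt = timeit.timeit("list(filter(good.__contains__, b))", setup=setup, number=500)
print(f"comprehension: {comp:.4f}s  filter: {filt:.4f}s")
```

Both expressions produce the same list, so only the timing differs.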
If you insist on not using for explicitly then this will work:
>>> list(filter(a.__contains__, b))
['a', 'b', 'b']
But directly calling magic methods like __contains__ is not a recommended practice to the best of my knowledge, so consider this instead:
>>> list(filter(lambda x: x in a, b))
['a', 'b', 'b']
And if you want to improve the lookup in a from O(n) to O(1) then create a set of it first:
>>> a_set = set(a)
>>> list(filter(lambda x: x in a_set, b))
['a', 'b', 'b']
>>> a = ['a','b','c','f']
>>> b = ['a','b','b','o','k']
>>> items = set(a)
>>> found = [i for i in b if i in items]
>>> items
{'f', 'a', 'c', 'b'}
>>> found
['a', 'b', 'b']
This should do the job.
I suspect it's not faster than a loop, and you probably still need a loop to extract the result anyway:
from collections import Counter
a = ['a','a','b','c','f']
b = ['a','b','b','o','k']
count_b = Counter(b)
count_ab = Counter(set(b)-set(a))
count_b - count_ab
#=> Counter({'a': 1, 'b': 2})
If res holds the resulting Counter, you can expand it back into a list with:
[ val for sublist in [ [s] * n for s, n in res.items() ] for val in sublist ]
#=> ['a', 'b', 'b']
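If helpful, Counter.elements() performs that expansion for you, so the final nested comprehension can be avoided; a small sketch of the same Counter approach:

```python
from collections import Counter

a = ['a', 'a', 'b', 'c', 'f']
b = ['a', 'b', 'b', 'o', 'k']

count_b = Counter(b)
count_ab = Counter(set(b) - set(a))  # elements of b absent from a, counted once each
res = count_b - count_ab             # Counter({'b': 2, 'a': 1})

# elements() repeats each key according to its count
print(sorted(res.elements()))  # ['a', 'b', 'b']
```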
It isn't clear how duplicates should be handled when intersecting lists that contain duplicate elements, as you have given only one test case with its expected result and did not explain the duplicate handling.
Going by the expected output, the common elements are 'a' and 'b', and the intersection lists 'a' with multiplicity 1 and 'b' with multiplicity 2: 'a' occurs once in both a and b, but 'b' occurs twice in b. So the intersection lists each common element with multiplicity equal to its maximum multiplicity across the lists.
The answer is yes, though a loop may still be invoked implicitly even if your code uses no explicit loop statements; the algorithm is inherently iterative.
Step 1: Create the intersection set, Intersect, which contains no duplicates (you have already done that). Convert it to a list to allow indexing.
Step 2: Create a second list, IntersectD. For each common element, compute Freq, the maximum number of occurrences across the lists, using count. Then append each element Intersect[k] to IntersectD a number of times given by its corresponding Freq[k].
An example code with 3 lists would be
a = ['a','b','c','1','1','1','1','2','3','o']
b = ['a','b','b','o','1','o','1']
c = ['a','a','a','b','1','2']
intersect = list(set(a) & set(b) & set(c)) # 3-set case
intersectD = []
for k in range(len(intersect)):
    cmn = intersect[k]
    freq = max(a.count(cmn), b.count(cmn), c.count(cmn))  # 3-set case
    for i in range(freq):  # can be done with itertools
        intersectD.append(cmn)
>>> intersectD
['b', 'b', 'a', 'a', 'a', '1', '1', '1', '1']
For cases involving more than two lists, freq for this common element can be computed using a more complex set intersection and max expression. If using a list of lists, freq can be computed using an inner loop. You can also replace the inner i-loop with an itertools expression from How can I count the occurrences of a list item?.
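For example, a hedged sketch of that generalisation for an arbitrary list of lists, using Counter for the per-list counts (the names here are illustrative, not from the original answer):

```python
from collections import Counter

lists = [['a', 'b', 'c', '1', '1', '1', '1', '2', '3', 'o'],
         ['a', 'b', 'b', 'o', '1', 'o', '1'],
         ['a', 'a', 'a', 'b', '1', '2']]

common = set.intersection(*map(set, lists))  # elements present in every list
counts = [Counter(lst) for lst in lists]

intersectD = []
for cmn in common:
    freq = max(c[cmn] for c in counts)       # maximum multiplicity across lists
    intersectD.extend([cmn] * freq)

print(sorted(intersectD))  # ['1', '1', '1', '1', 'a', 'a', 'a', 'b', 'b']
```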
What is the best way to categorize a list in python?
for example:
totalist is below
totalist[1] = ['A','B','C','D','E']
totalist[2] = ['A','B','X','Y','Z']
totalist[3] = ['A','F','T','U','V']
totalist[4] = ['A','F','M','N','O']
Say I want to get the lists whose first two items are ['A','B'], basically list[1] and list[2]. Is there an easy way to get these without iterating one item at a time? Something like this?
if ['A','B'] in totalist
I know that doesn't work.
You could check the first two elements of each list.
for totalist in all_lists:
    if totalist[:2] == ['A', 'B']:
        pass  # Do something.
Note: the one-liner solutions suggested by Kasramvd are quite nice too; I find my solution more readable, though I should say comprehensions are slightly faster than regular for loops (which I tested myself).
Just for fun, itertools solution to push per-element work to the C layer:
from future_builtins import map # Py2 only; not needed on Py3
from itertools import compress
from operator import itemgetter
# Generator
prefixes = map(itemgetter(slice(2)), totalist)
selectors = map(['A','B'].__eq__, prefixes)
# If you need them one at a time, just skip list wrapping and iterate
# compress output directly
matches = list(compress(totalist, selectors))
This could all be one-lined to:
matches = list(compress(totalist, map(['A','B'].__eq__, map(itemgetter(slice(2)), totalist))))
but I wouldn't recommend it. Incidentally, if totalist might be a generator, not a re-iterable sequence, you'd want to use itertools.tee to double it, adding:
totalist, forselection = itertools.tee(totalist, 2)
and changing the definition of prefixes to map over forselection, not totalist; since compress iterates both iterators in parallel, tee won't have meaningful memory overhead.
Of course, as others have noted, even moving to C, this is a linear algorithm. Ideally, you'd use something like a collections.defaultdict(list) to map from two element prefixes of each list (converted to tuple to make them legal dict keys) to a list of all lists with that prefix. Then, instead of linear search over N lists to find those with matching prefixes, you just do totaldict['A', 'B'] and you get the results with O(1) lookup (and less fixed work too; no constant slicing).
Example precompute work:
from collections import defaultdict
totaldict = defaultdict(list)
for x in totalist:
    totaldict[tuple(x[:2])].append(x)
# Optionally, to prevent autovivification later:
totaldict = dict(totaldict)
Then you can get matches effectively instantly for any two element prefix with just:
matches = totaldict['A', 'B']
You could do this.
>>> for i in totalist:
...     if ['A','B'] == i[:2]:
...         print i
Basically you can't do this in python with a nested list. But if you are looking for an optimized approach here are some ways:
Use a simple list comprehension, by comparing the intended list with only first two items of sub lists:
>>> [sub for sub in totalist if sub[:2] == ['A', 'B']]
[['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'X', 'Y', 'Z']]
If you want the indices use enumerate:
>>> [ind for ind, sub in enumerate(totalist) if sub[:2] == ['A', 'B']]
[0, 1]
And here is an approach in NumPy, which is quite optimized when you are dealing with large data sets:
>>> import numpy as np
>>>
>>> totalist = np.array([['A','B','C','D','E'],
... ['A','B','X','Y','Z'],
... ['A','F','T','U','V'],
... ['A','F','M','N','O']])
>>> totalist[(totalist[:,:2]==['A', 'B']).all(axis=1)]
array([['A', 'B', 'C', 'D', 'E'],
['A', 'B', 'X', 'Y', 'Z']],
dtype='|S1')
Also, as an alternative to a list comprehension, if you don't want to use an explicit loop and are looking for a functional way, you can use the filter function, which is not as optimized as a list comprehension:
>>> list(filter(lambda x: x[:2]==['A', 'B'], totalist))
[['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'X', 'Y', 'Z']]
You imply that you are concerned about performance (cost). If you need to do this and are worried about performance, you need a different data structure. This adds a little cost when making the lists, but saves time when filtering them.
If the need to filter on the first two elements is fixed (it doesn't generalise to the first n elements), then I would add each list, as it is made, to a dict whose key is a tuple of the first two elements and whose value is a list of lists.
Then you simply retrieve your lists with a dict lookup. This is easy to do and brings potentially large speed-ups, at almost no cost in memory or time while making the lists.
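A minimal sketch of that structure (the names here are illustrative assumptions, not from the question):

```python
from collections import defaultdict

by_prefix = defaultdict(list)

def add_list(lst):
    # Key each list by a tuple of its first two elements as it is created
    by_prefix[tuple(lst[:2])].append(lst)

add_list(['A', 'B', 'C', 'D', 'E'])
add_list(['A', 'B', 'X', 'Y', 'Z'])
add_list(['A', 'F', 'T', 'U', 'V'])

# Filtering is now a single O(1) dict lookup
print(by_prefix['A', 'B'])
# [['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'X', 'Y', 'Z']]
```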
Assume you have a list
>>> m = ['a','b','c']
I'd like to make a new list n that has everything except for a given item in m (for example the item 'a'). However, when I use
>>> m.remove('a')
>>> m
['b', 'c']
the original list is mutated (the value 'a' is removed from the original list). Is there a way to get a new list sans-'a' without mutating the original? So I mean that m should still be [ 'a', 'b', 'c' ], and I will get a new list, which has to be [ 'b', 'c' ].
I assume you mean that you want to create a new list without a given element, instead of changing the original list. One way is to use a list comprehension:
m = ['a', 'b', 'c']
n = [x for x in m if x != 'a']
n is now a copy of m, but without the 'a' element.
Another way would of course be to copy the list first
m = ['a', 'b', 'c']
n = m[:]
n.remove('a')
If removing a value by index, it is even simpler
n = m[:index] + m[index+1:]
There is a simple way to do that using the built-in function filter.
Here is an example:
a = [1, 2, 3, 4]
b = list(filter(lambda x: x != 3, a))  # [1, 2, 4]
Note that in Python 3 filter returns a lazy iterator, hence the list call.
If the order is unimportant, you can use set (besides, the removal seems to be fast in sets):
list(set(m) - set(['a']))
This will remove duplicate elements from your original list though
We can do it via the built-in copy() method of lists;
however, we should assign a new name to the copy:
m = ['a','b','c']
m_copy=m.copy()
m_copy.remove('a')
print(m)
['a', 'b', 'c']
print(m_copy)
['b', 'c']
You can create a new list without the offending element with a list-comprehension. This will preserve the value of the original list.
l = ['a', 'b', 'c']
[s for s in l if s != 'a']
Another approach besides list comprehension is NumPy:
>>> import numpy
>>> a = [1, 2, 3, 4]
>>> list(numpy.delete(a, a.index(3)))
[1, 2, 4]
We can do it without using the built-in remove function and also without creating a new list variable.
Code:
# List m
m = ['a', 'b', 'c']
# Updated list m, without creating new list variable
m = [x for x in m if x != 'a']
print(m)
Output:
['b', 'c']
The question is useful: I sometimes have a list that I use throughout a script, but at a certain step I need to apply some logic to a subset of its elements. In that case it is handy to reuse the same list, excluding only the element that step doesn't need, without creating a totally new list under a different name. For this you can use either:
list comprehension: say you have l = ['a','b','c'] and want to exclude 'b'; you can use [x for x in l if x != 'b']
set [only if order is unimportant]: list(set(l) - set(['b'])); pay attention here that you pass 'b' as the list ['b']
For some reason, I keep having 'how do I sort this list of tuples' questions. (A prior question of mine: sorting list of tuples by arbitrary key).
Here is some arbitrary raw input:
number_of = 3 # or whatever
tuple_list = [(n, 'a', 'b', 'c') for n in xrange(number_of)] # [(0, 'a', 'b', 'c')...]
ordering_list = random.sample(range(number_of), number_of) # e.g. [1, 0, 2]
Sorting tuple_list by ordering_list using sorted:
ordered = sorted(tuple_list, key=lambda t: ordering_list.index(t[0]))
# ordered = [(1, 'a', 'b', 'c'), (0, 'a', 'b', 'c'), (2, 'a', 'b', 'c')]
I have a slightly awkward approach that seems to be much faster, especially as the number of elements in tuple_list grows. I build a dictionary, list_dict, breaking each tuple into a (tuple[0], tuple[1:]) item. I retrieve the dictionary items using ordering_list as keys, then re-assemble the sequence of (tuple[0], tuple[1:]) into a list of tuples using an idiom I'm still trying to wrap my head around completely: zip(*[iter(_list)] * x), where x is the length of each tuple composed of items from _list. So my question is: is there a version of this approach that manages the disassemble/reassemble part of the code better?
def gen_key_then_values(key_list, list_dict):
    for key in key_list:
        values = list_dict[key]
        yield key
        for n in values:
            yield n
list_dict = {t[0]: t[1:] for t in tuple_list}
ordered = zip(*[gen_key_then_values(ordering_list, list_dict)] * 4)
NOTE: better code, following an obvious comment from Steve Jessop below:
list_dict = {t[0]: t for t in tuple_list}
ordered = [list_dict[k] for k in ordering_list]
My actual project code still requires assembling a tuple for each (k, ['a', 'b' ...]) item retrieved from the list_dict but there was no reason for me to include that part of the code here.
Breaking the elements of tuple_list apart in the dictionary doesn't really gain you anything and requires creating a bunch more tuples for the values. All you're doing is looking up elements in the list according to their first element, so it's probably not worth actually splitting them:
list_dict = { t[0] : t for t in tuple_list }
Note that this only works if the first element is unique, but then the ordering_list only makes sense if the first element is unique, so that's probably OK.
zip(*[iter(_list)] * 4) is just a way of grouping _list into fours, so give it a suitable name and you won't have to worry about it:
def fixed_size_groups(n, iterable):
return zip(*[iter(iterable)] * n)
But all things considered you don't actually need it anyway:
ordered = list(list_dict[val] for val in ordering_list)
The reason your first code is slow, is that ordering_list.index is slow -- it searches through the ordering_list for t[0], and it does this once for each t. So in total it does (number_of ** 2) / 2 inspections of a list element.
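If you do want to keep the sorted-based version from the question, the quadratic cost of ordering_list.index can also be removed by precomputing a rank dictionary; something like:

```python
tuple_list = [(0, 'a', 'b', 'c'), (1, 'a', 'b', 'c'), (2, 'a', 'b', 'c')]
ordering_list = [1, 0, 2]

# One O(n) pass replaces the O(n) .index() search done per element
rank = {key: pos for pos, key in enumerate(ordering_list)}
ordered = sorted(tuple_list, key=lambda t: rank[t[0]])
print(ordered)
# [(1, 'a', 'b', 'c'), (0, 'a', 'b', 'c'), (2, 'a', 'b', 'c')]
```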
I have a list in Python, ('A','B','C','D','E'); how do I get the item at a particular index number?
Example:
Say it was given 0, it would return A.
Given 2, it would return C.
Given 4, it would return E.
What you show, ('A','B','C','D','E'), is not a list but a tuple (the round parentheses instead of square brackets show that). Nevertheless, whether you index a list or a tuple (to get one item at an index), in either case you append the index in square brackets.
So:
thetuple = ('A','B','C','D','E')
print thetuple[0]
prints A, and so forth.
Tuples (differently from lists) are immutable, so you couldn't assign to thetuple[0] etc. (as you could with a list item). However, you can definitely access ("get") an item by indexing in either case.
values = ['A', 'B', 'C', 'D', 'E']
values[0] # returns 'A'
values[2] # returns 'C'
# etc.
You can use the __getitem__(key) method.
>>> iterable = ('A', 'B', 'C', 'D', 'E')
>>> key = 4
>>> iterable.__getitem__(key)
'E'
Same as in many other languages: just pass the index of the element you want to retrieve.
#!/usr/bin/env python
x = [2,3,4,5,6,7]
print(x[5])
You can use pop(), but note that it also removes the item from the list:
x = [2, 3, 4, 5, 6, 7]
print(x.pop(2))
The output is 4, and x is left as [2, 3, 5, 6, 7].