Is there a method like pop that temporarily removes, say, an element of a list, without permanently changing the original list?
One that would do the following:
list = [1,2,3,4]
newpop(list, 0) returns [2,3,4]
list is unchanged
I come from R, where I would just do c(1,2,3,4)[-4] if I wanted to temporarily remove the last element of some list, so forgive me if I'm thinking backwards here.
I know I could write a function like the following:
def newpop(list, index):
    return list[:index] + list[index+1:]
but it seems overly complex. Any tips would be appreciated; I'm trying to learn to think more Python and less R.
I am probably taking the "temporary" bit way too literally, but you could define a context manager to pop the item from the list and insert it back in when you are done working with the list:
from contextlib import contextmanager

@contextmanager
def out(lst, idx):
    x = lst.pop(idx)    # enter 'with': remove the item
    yield               # body of 'with'; no need for 'as' here
    lst.insert(idx, x)  # leave 'with': put it back
lst = list("abcdef")
with out(lst, 2):
    print(lst)
    # ['a', 'b', 'd', 'e', 'f']
print(lst)
# ['a', 'b', 'c', 'd', 'e', 'f']
Note: This does not create a copy of the list. Any changes you make to the list inside the with block are reflected in the original, to the point that inserting the element back might fail if the index is no longer valid.
Also note that popping the element and then putting it back will cost up to O(n) depending on the position, so from a performance point of view this does not make much sense either, except to save the memory of copying the list.
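If you want the element restored even when the with-body raises, a minimal sketch of a more defensive variant (same idea as above, just with try/finally) would be:

from contextlib import contextmanager

@contextmanager
def out(lst, idx):
    x = lst.pop(idx)
    try:
        yield
    finally:
        lst.insert(idx, x)  # restore even if the with-body raises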
More akin to your newpop function, and probably more practical, you could use a list comprehension with enumerate to create a copy of the list without the offending position. This does not create the two temporary slices and might also be a bit more readable and less prone to off-by-one mistakes.
def without(lst, idx):
    return [x for i, x in enumerate(lst) if i != idx]

print(without(lst, 2))
# ['a', 'b', 'd', 'e', 'f']
You could also change this to return a generator expression by simply changing the [...] to (...), i.e. return (x for ...). This is then a sort of read-only "view" of the list, without creating an actual copy.
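For instance, a small sketch of that generator variant (the name without_gen is just illustrative):

def without_gen(lst, idx):
    # lazy "view" that skips position idx; no copy is made
    return (x for i, x in enumerate(lst) if i != idx)

print(list(without_gen(lst, 2)))
# ['a', 'b', 'd', 'e', 'f']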
Related
I need to check whether a given list is equal to the result of substituting some lists for some elements in another list. Concretely, I have a dictionary, say f = {'o': ['a', 'b'], 'l': ['z'], 'x': ['y']} and a list list1 = ['H', 'e', 'l', 'l', 'o'], so I want to check if some list2 is equal to ['H', 'e', 'z', 'z', 'a', 'b'].
Below, I first write a function apply to compute the image of list1 under f. Then, it suffices to write list2 == apply(list1, f). Since this function will be called thousands of times in my program, I need to make it very fast. Therefore, I thought of the second function below, which should be faster but turns out not to be. So my questions (detailed below) are: why? And: is there a faster method?
First function:
def apply(l, f):
    result = []
    for x in l:
        if x in f:
            result.extend(f[x])
        else:
            result.append(x)
    return result
Second function:
def apply_equal(list1, f, list2):
    i = 0
    for x in list1:
        if x in f:
            sublist = f[x]
            length = len(sublist)
            if list2[i:i + length] != sublist:
                return False
            i += length
        else:
            if i >= len(list2) or list2[i] != x:  # bounds check avoids an IndexError
                return False
            i += 1
    return i == len(list2)
I thought the second method would be faster since it does not construct the image of the first list under the function and then check equality with the second list; instead it checks equality "on the fly", without constructing a new list. So I was surprised to see that it is not faster (and is even a bit slower). For the record: list1, list2, and the lists which are values in the dictionary are all small (typically under 50 elements), as is the number of keys of the dictionary.
So my questions are: why isn't the second method faster? And: are there ways to do this faster (possibly using other data structures)?
Edit in response to the comments: list1 and list2 will most often be different, but f may be common to some of them. Typically, there could be around 100,000 checks in batches of around 50 consecutive checks with a common f. The elements of the lists are short strings. It is expected that all checks return True (so the whole lists have to be iterated over).
Without proper data for benchmarking it's hard to say, so I tested various solutions with various "sizes" of data.
Replacing result.extend(f[x]) with result += f[x] always made it faster.
This was faster for longer lists (using itertools.chain):
from itertools import chain

# map(f.get, list1, zip(list1)) calls f.get(x, (x,)), so missing keys
# fall back to a one-element tuple containing the element itself
list2 == list(chain.from_iterable(map(f.get, list1, zip(list1))))
If the data allows it, explicitly storing all possible keys and always accessing with f[x] would speed it up. That is, set f[k] = k, for all "missing" keys in advance, so you don't have to check with in or use get.
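A rough sketch of that idea, assuming it's acceptable to mutate f (the added identity entries map each missing key to a one-element list; apply_fast is a hypothetical name):

# Pre-fill identity entries so every element of list1 is a key:
for x in set(list1) - f.keys():
    f[x] = [x]

def apply_fast(l, f):
    result = []
    for x in l:
        result += f[x]  # no 'in' check or .get fallback needed now
    return result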
You need to use profiling tools like scalene to see what's slow in your code; don't try to guess.
In case you want to read it, I was able to produce an even slower version based on your idea of stopping as soon as possible, while keeping the first, readable apply implementation:
def apply(l, f):
    for x in l:
        if x in f:
            yield from f[x]
        else:
            yield x

def apply_equal(l1, f, l2):
    return all(left == right for left, right in zip(apply(l1, f), l2, strict=True))
Beware it needs Python 3.10 for zip's strict=True.
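On older Pythons, one possible workaround (an assumption on my part, not from the original answer) is itertools.zip_longest with a sentinel fill value, so that a length mismatch shows up as an unequal pair:

from itertools import zip_longest

_SENTINEL = object()  # unique object; never equal to any list element

def apply_equal(l1, f, l2):
    return all(left == right
               for left, right in zip_longest(apply(l1, f), l2, fillvalue=_SENTINEL))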
As the comments noted, speed depends highly on your data here: constructing the whole list may be faster on small datasets, but halting early may win on bigger ones.
I'm trying to figure out a way to find all the elements that appear between two list elements (inclusive) - but to do it without reference to position, and instead with reference to the elements themselves. It's easier to explain with code:
I have a list like this:
['a','b','c','d','e']
And I want a function that would take two arguments corresponding to elements, e.g. f('a','d'), and return the following:
['a','b','c','d']
I'd also like it to wrap around, e.g. f('d','b'):
['d','e','a','b']
I'm not sure how to go about coding this. One hacky way I've thought of is duplicating the list in question (['a','b','c','d','e','a','b','c','d','e']) and then looping through it and flagging when the first element appears and when the last element does and then discarding the rest - but it seems like there would be a better way. Any suggestions?
def foo(a, b):
    s, e = [a.index(x) for x in b]
    if s <= e:
        return a[s:e+1]
    else:
        return a[s:] + a[:e+1]

print(foo(['a','b','c','d','e'], ['a', 'd']))  # --> ['a', 'b', 'c', 'd']
print(foo(['a','b','c','d','e'], ['d', 'b']))  # --> ['d', 'e', 'a', 'b']
The following obviously needs error handling, as indicated below. Also note that the index() function only returns the index of the first occurrence; you have not specified how you want to handle duplicate elements in the list.
def f(mylist, elem1, elem2):
    posn_first = mylist.index(elem1)   # what if it's not in the list?
    posn_second = mylist.index(elem2)  # ditto
    if posn_first <= posn_second:
        return mylist[posn_first:posn_second+1]
    else:
        return mylist[posn_first:] + mylist[:posn_second+1]
This would be a simple approach, given you always want to use the first appearance of the element in the list (note that it does not handle the wrap-around case):

def get_wrapped_values(input_list, start_element, end_element):
    return input_list[input_list.index(start_element):input_list.index(end_element) + 1]
I'm looking to retrieve all but one value from a list:
ll = ['a','b','c']
nob = [x for x in ll if x !='b']
Is there any simpler, more pythonic way to do this, with sets perhaps?
Given that the element is unique in the list, you can use list.index:

i = ll.index('b')
nob = ll[:i] + ll[i+1:]
Another possibility is to use list.remove:

ll.remove('b')  # notice that ll will change underneath here
Whatever you do, you'll always have to step through the list and compare each element, which gets slow for long lists. However, using the index you get the position of the first matching element and can operate with that alone, thus avoiding stepping through the remainder of the list.
list_ = ['a', 'b', 'c']
list_.pop(1)

You can also use .pop and pass it the index of the element you want removed from the list. When you print the list you will see that it stores ['a', 'c'] and 'b' has been "popped" from it. (Note that .pop mutates the original list.)
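Since the question asked for the original to stay unchanged, a small sketch: pop from a copy instead.

list_ = ['a', 'b', 'c']
copy_ = list_.copy()  # shallow copy; the original stays intact
copy_.pop(1)
print(copy_)  # ['a', 'c']
print(list_)  # ['a', 'b', 'c']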
What is the best way to categorize a list in python?
for example:
totalist is below
totalist[1] = ['A','B','C','D','E']
totalist[2] = ['A','B','X','Y','Z']
totalist[3] = ['A','F','T','U','V']
totalist[4] = ['A','F','M','N','O']
Say I want to get the lists where the first two items are ['A','B'], basically list[1] and list[2]. Is there an easy way to get these without iterating one item at a time? Something like this?
if ['A','B'] in totalist
I know that doesn't work.
You could check the first two elements of each list.
for totalist in all_lists:
    if totalist[:2] == ['A', 'B']:
        ...  # do something with the matching list
Note: The one-liner solutions suggested by Kasramvd are quite nice too; I just find mine more readable. Though I should say comprehensions are slightly faster than regular for loops (which I tested myself).
Just for fun, itertools solution to push per-element work to the C layer:
from future_builtins import map # Py2 only; not needed on Py3
from itertools import compress
from operator import itemgetter
# Generator
prefixes = map(itemgetter(slice(2)), totalist)
selectors = map(['A','B'].__eq__, prefixes)
# If you need them one at a time, just skip list wrapping and iterate
# compress output directly
matches = list(compress(totalist, selectors))
This could all be one-lined to:
matches = list(compress(totalist, map(['A','B'].__eq__, map(itemgetter(slice(2)), totalist))))
but I wouldn't recommend it. Incidentally, if totalist might be a generator, not a re-iterable sequence, you'd want to use itertools.tee to double it, adding:
totalist, forselection = itertools.tee(totalist, 2)
and changing the definition of prefixes to map over forselection, not totalist; since compress iterates both iterators in parallel, tee won't have meaningful memory overhead.
Of course, as others have noted, even moving the work to C, this is a linear algorithm. Ideally, you'd use something like a collections.defaultdict(list) to map from the two-element prefixes of each list (converted to tuple to make them legal dict keys) to a list of all lists with that prefix. Then, instead of a linear search over N lists to find those with matching prefixes, you just do totaldict['A', 'B'] and get the results with O(1) lookup (and less fixed work too; no constant slicing).
Example precompute work:
from collections import defaultdict

totaldict = defaultdict(list)
for x in totalist:
    totaldict[tuple(x[:2])].append(x)

# Optionally, to prevent autovivification later:
totaldict = dict(totaldict)
Then you can get matches effectively instantly for any two element prefix with just:
matches = totaldict['A', 'B']
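If a prefix might be absent (especially after the dict(...) conversion above), a .get with a default avoids the KeyError; a one-line sketch:

matches = totaldict.get(('A', 'B'), [])  # empty list when nothing has that prefix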
You could do this.
>>> for i in totalist:
...     if ['A','B'] == i[:2]:
...         print(i)
Basically you can't do that kind of prefix test in Python with a plain in check on a nested list. But if you are looking for an optimized approach, here are some ways:
Use a simple list comprehension, comparing the intended pair with just the first two items of each sublist:
>>> [sub for sub in totalist if sub[:2] == ['A', 'B']]
[['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'X', 'Y', 'Z']]
If you want the indices use enumerate:
>>> [ind for ind, sub in enumerate(totalist) if sub[:2] == ['A', 'B']]
[0, 1]
And here is an approach in NumPy, which is pretty well optimized when you are dealing with large data sets:
>>> import numpy as np
>>>
>>> totalist = np.array([['A','B','C','D','E'],
... ['A','B','X','Y','Z'],
... ['A','F','T','U','V'],
... ['A','F','M','N','O']])
>>> totalist[(totalist[:,:2]==['A', 'B']).all(axis=1)]
array([['A', 'B', 'C', 'D', 'E'],
['A', 'B', 'X', 'Y', 'Z']],
dtype='|S1')
Also, as an alternative to a list comprehension, if you don't want to use a loop and are looking for a functional way, you can use the filter function, which is not as optimized as a list comprehension:
>>> list(filter(lambda x: x[:2]==['A', 'B'], totalist))
[['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'X', 'Y', 'Z']]
You imply that you are concerned about performance (cost). If you need to do this, and if you are worried about performance, you need a different data structure. This will add a little "cost" when you are making the lists, but save you time when filtering them.
If the need to filter based on the first two elements is fixed (it doesn't generalise to the first n elements) then I would add the lists, as they are made, to a dict where the key is a tuple of the first two elements, and the item is a list of lists.
Then you simply retrieve your lists with a dict lookup. This is easy to do and brings potentially large speed-ups, at almost no cost in memory and time while making the lists.
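A minimal sketch of that structure, assuming the lists arrive one at a time (the names index and add_list are illustrative):

index = {}

def add_list(lst):
    # key each list by a tuple of its first two elements
    index.setdefault(tuple(lst[:2]), []).append(lst)

add_list(['A', 'B', 'C', 'D', 'E'])
add_list(['A', 'B', 'X', 'Y', 'Z'])
matches = index.get(('A', 'B'), [])  # O(1) lookup instead of a scan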
from pprint import pprint

sites = [['a','b','c'], ['d','e','f'], [1,2,3]]
pprint(sites)

for site in sites:
    sites.remove(site)

pprint(sites)
outputs:
[['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3]]
[['d', 'e', 'f']]
why is it not None, or an empty list []?
It's because you're modifying a list as you're iterating over it. You should never do that.
For something like this, you should make a copy of the list and iterate over that.
for site in sites[:]:
    sites.remove(site)
Because resizing a collection while iterating over it is the Python equivalent of undefined behaviour in C and C++. You may get an exception or subtly wrong behaviour. Just don't do it. In this particular case, what likely happens under the hood is:
The iterator starts with index 0, stores that it is at index 0, and gives you the item stored at that index.
You remove the item at index 0 and everything afterwards is moved to the left by one to fill the hole.
The iterator is asked for the next item, and faithfully increments the index it's at by one, stores 1 as the new index, and gives you the item at that index. But because of said moving of items caused by the remove operation, the item at index 1 is the item that started out at index 2 (the last item).
You delete that.
The iterator is asked for the next item, but signals end of iteration as the next index (2) is out of range (which is now just 0..0).
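A small demo of that sequence, using an explicit iterator to make the internal index visible:

sites = [['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3]]
it = iter(sites)
print(next(it))        # ['a', 'b', 'c'] (index 0)
sites.remove(['a', 'b', 'c'])
print(next(it))        # [1, 2, 3] (index 1 now holds the former last item)
sites.remove([1, 2, 3])
# next(it) would now raise StopIteration: index 2 is out of range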
Normally I would expect the iterator to bail out because the underlying list was modified; with a dictionary, at least, this would happen.
Why is the d, e, f stuff not removed? I can only guess: probably the iterator has an internal counter (or is even based only on the "fallback iteration protocol" with __getitem__).
I.e., the first item yielded is sites[0], i.e. ['a', 'b', 'c']. This is then removed from the list.
The second one is sites[1] - which is [1, 2, 3], because the indexes have changed. This is removed as well.
And the third would be sites[2] - but as that would be an index error, the iteration stops.