I am wondering why the default list in Python does not have any shift, unshift methods. Maybe there is an obvious reason for it like the way the lists are ordered in memory.
So currently, I know that I can use append to add an item at the end of a list and pop to remove an element from the end. However, I can only use list concatenation to imitate the behavior of a missing shift or unshift method.
>>> a = [1,2,3,4,5]
>>> a = [0] + a # Unshift / Push
>>> a
[0,1,2,3,4,5]
>>> a = a[1:] # Shift / UnPush
>>> a
[1,2,3,4,5]
Did I miss something?
Python lists were optimized for fast fixed-length operations and incur O(n) memory movement costs for pop(0) and insert(0, v) operations which change both the size and position of the underlying data representation. Actually, the "list" datatype in CPython works differently as to what many other languages might call a list (e.g. a linked-list) - it is implemented more similarly to what other languages might call an array, though there are some differences here too.
You may be interested instead in collections.deque, which is a list-like container with fast appends and pops on either end.
Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.
It provides the missing methods you appear to be asking about under the names appendleft and popleft:
appendleft(x)
Add x to the left side of the deque.
popleft()
Remove and return an element from the left side of the deque.
If no elements are present, raises an IndexError.
Of course there are trade-offs: indexing or inserting/removing near the middle of the deque is slow. In fact deque.insert(index, object) wasn't even possible before Python 3.5, you would need to rotate, insert/pop, and rotate back. deques also don't support slicing, for similar functionality you'll have to write something annoying with e.g. itertools.islice instead.
For further discussion of the advantages and disadvantages of deque vs list data structures, see How are deques in Python implemented, and when are they worse than lists?
in Python3
we have insert method on a list.
takes the value you require to add and also the index where you want to add it.
arrayOrList = [1,2,3,4,5]
arrayOrList.insert(0 , 0)
print(arrayOrList)
Related
I have a condition that compares one object to several others, like so:
if 'a' in ('a','b','c','e'):
The sequence was created for this purpose and doesn't exist anywhere else in the function. What are the pros and cons to grouping it as a tuple, list, or set, given that they all seem to work the same and the list is short? Which would be idiomatic?
Use a set until you have good reason not to. (And then use a list.)
I would consider a set to be more idiomatic. It conveys the meaning more clearly, since order doesn't matter, only membership.
And to be clear, a set is a collection but not a "sequence type" (even though it's iterable), because it's semantically "unordered".
Why not use a set?
Sets may only contain hashable types. And, this is important, they will raise a TypeError instead of simply returning False when you ask if an unhashable type is in the set. If you might get an unhashable object on either side of the in operator, you're out of luck. Sometimes you can use hashable elements instead (like frozenset instead of set or tuple instead of list), sometimes you can't.
But tuples and lists don't have to hash their elements.
Why a list over a tuple?
The main advantage of a list that they avoid a syntactic quirk for tuples of one element. Say you have ('foo', 'bar') and later decide to remove the 'bar'. Then you have ('foo'). Oops, see what I did there? It was actually supposed to be ('foo',). It's easy to forget the comma. And the in check still works for strings like ('foo'), since in checks for substrings. This can subtly change the meaning of your program. 'oo' is in ('foo'), but not in ('foo',).
A one-item list like ['foo'] doesn't have that problem. [And as
user2357112 pointed out, a constant list is going to get compiled to a tuple anyway.]
Note that a one-item set, like {'a'} doesn't have that problem either. An empty {} is a dict instead, but that's not going to cause any issues with an in check because it's also an empty collection.
But you should arguably be using == instead of in when comparing against only one element.
That's it for clarity. Now for the micro-optimizations. Early optimization is the root of all evil. Don't optimize at the expense of readability before it's actually necessary.
A set lookup is faster if it's not too small, since a tuple's elements have to be checked one-by-one which (on average) grows with the size of the tuple, while a set is backed by a hashtable (like a dict), which has a small constant overhead. If the distribution of cases isn't uniform, this means that the order of elements in the tuple matters a lot. Putting the more common cases first in the tuple will make the checks much faster than the reverse, on average.
How small does the collection have to be to for the set's constant overhead to matter? Profile and see. Performance can vary based on a lot of factors. It's not just the number of elements, but how long an equality check takes, and where they're located in memory, etc.
A tuple should have a slightly smaller overhead both in memory and construction time than the other collections. But the construction overhead doesn't really matter if the compiler can make it load as a saved constant value. (This can happen when all the elements are themselves constant at compile time. You can use the dis module to confirm this is happening.)
Let's consider this code which iterates over a list while removing an item each iteration:
x = list(range(5))
for i in x:
print(i)
x.pop()
It will print 0, 1, 2. Only the first three elements are printed since the last two elements in the list were removed by the first two iterations.
But if you try something similar on a dict:
y = {i: i for i in range(5)}
for i in y:
print(i)
y.pop(i)
It will print 0, then raise RuntimeError: dictionary changed size during iteration, because we are removing a key from the dictionary while iterating over it.
Of course, modifying a list during iteration is bad. But why is a RuntimeError not raised as in the case of dictionary? Is there any good reason for this behaviour?
I think the reason is simple. lists are ordered, dicts (prior to Python 3.6/3.7) and sets are not. So modifying a lists as you iterate may be not advised as best practise, but it leads to consistent, reproducible, and guaranteed behaviour.
You could use this, for example let's say you wanted to split a list with an even number of elements in half and reverse the 2nd half:
>>> lst = [0,1,2,3]
>>> lst2 = [lst.pop() for _ in lst]
>>> lst, lst2
([0, 1], [3, 2])
Of course, there are much better and more intuitive ways to perform this operation, but the point is it works.
By contrast, the behaviour for dicts and sets is totally implementation specific since the iteration order may change depending on the hashing.
You get a RunTimeError with collections.OrderedDict, presumably for consistency with the dict behaviour. I don't think any change in the dict behaviour is likely after Python 3.6 (where dicts are guaranteed to maintain insertion ordered) since it would break backward compatibility for no real use cases.
Note that collections.deque also raises a RuntimeError in this case, despite being ordered.
It wouldn't have been possible to add such a check to lists without breaking backward compatibility. For dicts, there was no such issue.
In the old, pre-iterators design, for loops worked by calling the sequence element retrieval hook with increasing integer indices until it raised IndexError. (I would say __getitem__, but this was back before type/class unification, so C types didn't have __getitem__.) len isn't even involved in this design, and there is nowhere to check for modification.
When iterators were introduced, the dict iterator had the size change check from the very first commit that introduced iterators to the language. Dicts weren't iterable at all before that, so there was no backward compatibility to break. Lists still went through the old iteration protocol, though.
When list.__iter__ was introduced, it was purely a speed optimization, not intended to be a behavioral change, and adding a modification check would have broken backward compatibility with existing code that relied on the old behavior.
Dictionary uses insertion order with an additional level of indirection, which causes hiccups when iterating while keys are removed and re-inserted, thereby changing the order and internal pointers of the dictionary.
And this problem is not fixed by iterating d.keys() instead of d, since in Python 3, d.keys() returns a dynamic view of the keys in the dict which results in the same problem. Instead, iterate over list(d) as this will produce a list from the keys of the dictionary that will not change during iteration
i wanted to do something like this but this code return list of None (i think it's because list.reverse() is reversing the list in place):
map(lambda row: row.reverse(), figure)
i tried this one, but the reversed return an iterator :
map(reversed, figure)
finally i did something like this , which work for me , but i don't know if it's the right solution:
def reverse(row):
"""func that reverse a list not in place"""
row.reverse()
return row
map(reverse, figure)
if someone has a better solution that i'm not aware of please let me know
kind regards,
The mutator methods of Python's mutable containers (such as the .reverse method of lists) almost invariably return None -- a few return one useful value, e.g. the .pop method returns the popped element, but the key concept to retain is that none of those mutators returns the mutated container: rather, the container mutates in-place and the return value of the mutator method is not that container. (This is an application of the CQS principle of design -- not quite as fanatical as, say, in Eiffel, the language devised by Bertrand Meyer, who also invented CQS, but that's just because in Python "practicality beats purity, cfr import this;-).
Building a list is often costlier than just building an iterator, for the overwhelmingly common case where all you want to do is loop on the result; therefore, built-ins such as reversed (and all the wonderful building blocks in the itertools module) return iterators, not lists.
But what if you therefore have an iterator x but really truly need the equivalent list y? Piece of cake -- just do y = list(x). To make a new instance of type list, you call type list -- this is such a general Python idea that it's even more crucial to retain than the pretty-important stuff I pointed out in the first two paragraphs!-)
So, the code for your specific problem is really very easy to put together based on the crucial notions in the previous paragraphs:
[list(reversed(row)) for row in figure]
Note that I'm using a list comprehension, not map: as a rule of thumb, map should only be used as a last-ditch optimization when there is no need for a lambda to build it (if a lambda is involved then a listcomp, as well as being clearer as usual, also tends to be faster anyway!-).
Once you're a "past master of Python", if your profiling tells you that this code is a bottleneck, you can then know to try alternatives such as
[row[::-1] for row in figure]
applying a negative-step slicing (aka "Martian Smiley") to make reversed copies of the rows, knowing it's usually faster than the list(reversed(row)) approach. But -- unless your code is meant to be maintained only by yourself or somebody at least as skilled at Python -- it's a defensible position to use the simplest "code from first principles" approach except where profiling tells you to push down on the pedal. (Personally I think the "Martian Smiley" is important enough to avoid applying this good general philosophy to this specific use case, but, hey, reasonable people could differ on this very specific point!-).
You can also use a slice to get the reversal of a single list (not in place):
>>> a = [1,2,3,4]
>>> a[::-1]
[4, 3, 2, 1]
So something like:
all_reversed = [lst[::-1] for lst in figure]
...or...
all_reversed = map(lambda x: x[::-1], figure)
...will do what you want.
reversed_lists = [list(reversed(x)) for x in figure]
map(lambda row: list(reversed(row)), figure)
You can also simply do
for row in figure:
row.reverse()
to change each row in place.
The problem
My concern is the following: I am storing a relativity large dataset in a classical python list and in order to process the data I must iterate over the list several times, perform some operations on the elements, and often pop an item out of the list.
It seems that deleting one item out of a Python list costs O(N) since Python has to copy all the items above the element at hand down one place. Furthermore, since the number of items to delete is approximately proportional to the number of elements in the list this results in an O(N^2) algorithm.
I am hoping to find a solution that is cost effective (time and memory-wise). I have studied what I could find on the internet and have summarized my different options below. Which one is the best candidate ?
Keeping a local index:
while processingdata:
index = 0
while index < len(somelist):
item = somelist[index]
dosomestuff(item)
if somecondition(item):
del somelist[index]
else:
index += 1
This is the original solution I came up with. Not only is this not very elegant, but I am hoping there is better way to do it that remains time and memory efficient.
Walking the list backwards:
while processingdata:
for i in xrange(len(somelist) - 1, -1, -1):
dosomestuff(item)
if somecondition(somelist, i):
somelist.pop(i)
This avoids incrementing an index variable but ultimately has the same cost as the original version. It also breaks the logic of dosomestuff(item) that wishes to process them in the same order as they appear in the original list.
Making a new list:
while processingdata:
for i, item in enumerate(somelist):
dosomestuff(item)
newlist = []
for item in somelist:
if somecondition(item):
newlist.append(item)
somelist = newlist
gc.collect()
This is a very naive strategy for eliminating elements from a list and requires lots of memory since an almost full copy of the list must be made.
Using list comprehensions:
while processingdata:
for i, item in enumerate(somelist):
dosomestuff(item)
somelist[:] = [x for x in somelist if somecondition(x)]
This is very elegant but under-the-cover it walks the whole list one more time and must copy most of the elements in it. My intuition is that this operation probably costs more than the original del statement at least memory wise. Keep in mind that somelist can be huge and that any solution that will iterate through it only once per run will probably always win.
Using the filter function:
while processingdata:
for i, item in enumerate(somelist):
dosomestuff(item)
somelist = filter(lambda x: not subtle_condition(x), somelist)
This also creates a new list occupying lots of RAM.
Using the itertools' filter function:
from itertools import ifilterfalse
while processingdata:
for item in itertools.ifilterfalse(somecondtion, somelist):
dosomestuff(item)
This version of the filter call does not create a new list but will not call dosomestuff on every item breaking the logic of the algorithm. I am including this example only for the purpose of creating an exhaustive list.
Moving items up the list while walking
while processingdata:
index = 0
for item in somelist:
dosomestuff(item)
if not somecondition(item):
somelist[index] = item
index += 1
del somelist[index:]
This is a subtle method that seems cost effective. I think it will move each item (or the pointer to each item ?) exactly once resulting in an O(N) algorithm. Finally, I hope Python will be intelligent enough to resize the list at the end without allocating memory for a new copy of the list. Not sure though.
Abandoning Python lists:
class Doubly_Linked_List:
def __init__(self):
self.first = None
self.last = None
self.n = 0
def __len__(self):
return self.n
def __iter__(self):
return DLLIter(self)
def iterator(self):
return self.__iter__()
def append(self, x):
x = DLLElement(x)
x.next = None
if self.last is None:
x.prev = None
self.last = x
self.first = x
self.n = 1
else:
x.prev = self.last
x.prev.next = x
self.last = x
self.n += 1
class DLLElement:
def __init__(self, x):
self.next = None
self.data = x
self.prev = None
class DLLIter:
etc...
This type of object resembles a python list in a limited way. However, deletion of an element is guaranteed O(1). I would not like to go here since this would require massive amounts of code refactoring almost everywhere.
Without knowing the specifics of what you're doing with this list, it's hard to know exactly what would be best in this case. If your processing stage depends on the current index of the list element, this won't work, but if not, it appears you've left off the most Pythonic (and in many ways, easiest) approach: generators.
If all you're doing is iterating over each element, processing it in some way, then either including that element in the list or not, use a generator. Then you never need to store the entire iterable in memory.
def process_and_generate_data(source_iterable):
for item in source_iterable:
dosomestuff(item)
if not somecondition(item):
yield item
You would need to have a processing loop that dealt with persisting the processed iterable (writing it back to a file, or whatever), or if you have multiple processing stages you'd prefer to separate into different generators you could have your processing loop pass one generator to the next.
From your description it sounds like a deque ("deck") would be exactly what you are looking for:
http://docs.python.org/library/collections.html#deque-objects
"Iterate" across it by repeatedly calling pop() and then, if you want to keep the popped item in the deque, returning that item to the front with appendleft(item). To keep up with when you're done iterating and have seen everything in the deque, either put in a marker object like None that you watch for, or just ask for the deque's len() when you start a particular loop and use range() to pop() exactly that many items.
I believe you will find all of the operations you need are then O(1).
Python stores only references to objects in the list - not the elements themselves. If you grow a list item by item, the list (that is the list of references to the objects) will grow one by one, eventually reaching the end of the excess memory that Python preallocated at the end of the list (of references!). It then copies the list (of references!) into a new larger place while your list elements stay at their old location. As your code visits all the elements in the old list anyway, copying the references to a new list by new_list[i]=old_list[i] will be nearly no burden at all. The only performance hint is to allocate all new elements at once instead of appending them (OTOH the Python docs say that amortized append is still O(1) as the number of excess elements grows with the list size). If you are lacking the place for the new list (of references) then I fear you are out of luck - any data structure that would evade the O(n) in-place insert/delete will likely be bigger than a simple array of 4- or 8-byte entries.
A doubly linked list is worse than just reallocating the list. A Python list uses 5 words + one word per element. A doubly linked list will use 5 words per element. Even if you use a singly linked list, it's still going to be 4 words per element - a lot worse than the less than 2 words per element that rebuilding the list will take.
From memory usage perspective, moving items up the list and deleting the slack at the end is the best approach. Python will release the memory if the list gets less than half full. The question to ask yourself is, does it really matter. The list entries probably point to some data, unless you have lots of duplicate objects in the list, the memory used for the list is insignificant compared to the data. Given that, you might just as well just build a new list.
For building a new list, the approach you suggested is not that good. There's no apparent reason why you couldn't just go over the list once. Also, calling gc.collect() is unnecessary and actually harmful - the CPython reference counting will release the old list immediately anyway, and even the other garbage collectors are better off collecting when they hit memory pressure. So something like this will work:
while processingdata:
retained = []
for item in somelist:
dosomething(item)
if not somecondition(item):
retained.append(item)
somelist = retained
If you don't mind using side effects in list comprehensions, then the following is also an option:
def process_and_decide(item):
dosomething(item)
return not somecondition(item)
while processingdata:
somelist = [item for item in somelist if process_and_decide(item)]
The inplace method can also be refactored so the mechanism and business logic are separated:
def inplace_filter(func, list_):
pos = 0
for item in list_:
if func(item):
list_[pos] = item
pos += 1
del list_[pos:]
while processingdata:
inplace_filter(process_and_decide, somelist)
You do not provide enough information I can find to answer this question really well. I don't know your use case well enough to tell you what data structures will get you the time complexities you want if you have to optimize for time. The typical solution is to build a new list rather than repeated deletions, but obviously this doubles(ish) memory usage.
If you have memory usage issues, you might want to abandon using in-memory Python constructs and go with an on-disk database. Many databases are available and sqlite ships with Python. Depending on your usage and how tight your memory requirements are, array.array or numpy might help you, but this is highly dependent on what you need to do. array.array will have all the same time complexities as list and numpy arrays sort of will but work in some different ways. Using lazy iterators (like generators and the stuff in the itertools module) can often reduce memory usage by a factor of n.
Using a database will improve time to delete items from arbitrary locations (though order will be lost if this is important). Using a dict will do the same, but potentially at high memory usage.
You can also consider blist as a drop-in replacement for a list that might get some of the compromises you want. I don't believe it will drastically increase memory usage, but it will change item removal to O(log n). This comes at the cost of making other operations more expensive, of course.
I would have to see testing to believe that the constant factor for memory use for your doubly linked list implementation would be less than the 2 that you get by simply creating a new list. I really doubt it.
You will have to share more about your problem class for a more concrete answer, I think, but the general advice is
Iterate over a list building a new list as you go along (or using a generator to yield the items when you need them). If you actually need a list, this will have a memory factor of 2, which scales fine but doesn't help if you are short on memory period.
If you are running out of memory, rather than microoptimization you probably want an on-disk database or to store your data in a file.
Brandon Craig Rhodes suggests using a collections.deque, which can suit this problem: no additional memory is required for the operation and it is kept O(n). I do not know the total memory usage and how it compares to a list; it's worth noting that a deque has to store a lot more references and I would not be surprised if it isn't as memory intensive as using two lists. You would have to test or study it to know yourself.
If you were to use a deque, I would deploy it slightly differently than Rhodes suggests:
from collections import deque
d = deque(range(30))
n = deque()
print d
while True:
try:
item = d.popleft()
except IndexError:
break
if item % 3 != 0:
n.append(item)
print n
There is no significant memory difference doing it this way, but there's a lot less opportunity to flub up than mutating the same deque as you go.
I was writing a python function that looked something like this
def foo(some_list):
for i in range(0, len(some_list)):
bar(some_list[i], i)
so that it was called with
x = [0, 1, 2, 3, ... ]
foo(x)
I had assumed that index access of lists was O(1), but was surprised to find that for large lists this was significantly slower than I expected.
My question, then, is how are python lists are implemented, and what is the runtime complexity of the following
Indexing: list[x]
Popping from the end: list.pop()
Popping from the beginning: list.pop(0)
Extending the list: list.append(x)
For extra credit, splicing or arbitrary pops.
there is a very detailed table on python wiki which answers your question.
However, in your particular example you should use enumerate to get an index of an iterable within a loop. like so:
for i, item in enumerate(some_seq):
bar(item, i)
The answer is "undefined". The Python language doesn't define the underlying implementation. Here are some links to a mailing list thread you might be interested in.
It is true that Python's lists have
been implemented as contiguous
vectors in the C implementations of
Python so far.
I'm not saying that the O()
behaviours of these things should be
kept a secret or anything. But you
need to interpret them in the context
of how Python works generally.
Also, the more Pythonic way of writing your loop would be this:
def foo(some_list):
for item in some_list:
bar(item)
Lists are indeed O(1) to index - they are implemented as a vector with proportional overallocation, so perform much as you'd expect. The likely reason you were finding this code slower than you expected is the call to "range(0, len(some_list))".
range() creates a new list of the specified size, so if some_list has 1,000,000 items, you will create a new million item list up front. This behaviour changes in python3 (range is an iterator), to which the python2 equivalent is xrange, or even better for your case, enumerate
if you need index and value then use enumerate:
for idx, item in enumerate(range(10, 100, 10)):
print idx, item
Python list actually nothing but arrays. Thus,
indexing takes O(1)
for pop and append again it should be O(1) as per the docs
Check out following link for details:
http://dustycodes.wordpress.com/2012/03/31/pythons-data-structures-complexity-analysis/