Efficient reduction of a list in Python

So I have a list of 85 items. I would like to repeatedly reduce this list by half (essentially a binary search on the items). My question, then: what is the most efficient way to reduce the list? A list comprehension would continually create copies of the list, which is not ideal. I would like in-place removal of ranges of my list until I am left with one element.
I'm not sure if this is relevant, but I'm using collections.deque instead of a standard list. They probably work more or less the same way, so I doubt this matters.

For a mere 85 items, truthfully, almost any method you want to use would be more than fast enough. Don't optimize prematurely.
That said, depending on what you're actually doing, a list may be faster than a deque. A deque is faster for adding and removing items at either end, but it doesn't support slicing.
With a list, if you want to copy or delete a contiguous range of items (say, the first 42) you can do it with a slice. Assuming half the list is eliminated at each pass, copying the survivors to a new list would be slower on average than deleting items from the existing list: deleting requires moving the half of the list that's not being deleted "leftward" in memory, which costs about the same as copying the other half, but you won't always need to do it, since deleting the latter half of a list doesn't move anything at all.
To do this with a deque efficiently, you would have to pop() or popleft() the items one at a time rather than slicing them off (lots of attribute access and method calls, which are relatively expensive in Python), and you'd have to write the loop that controls the operation in Python, which is slower than the native slice operation.
Since you said it's basically a binary search, it is probably fastest to simply find the item you want to keep without modifying the original container at all, and then return a new container holding that single item. A list will be faster for this than a deque, since you will be doing a lot of accessing items by index. Indexing into a deque is O(n) except near the ends, because Python has to walk the deque's internal block structure, while indexing into a list is a simple, fast offset calculation.
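A minimal sketch of that index-based approach, assuming a hypothetical predicate target_in_upper_half() (not part of the original question) that reports which half contains the item you want:

def binary_reduce(items, target_in_upper_half):
    # Track a shrinking window [lo, hi) instead of mutating the list.
    lo, hi = 0, len(items)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if target_in_upper_half(items[mid]):  # hypothetical predicate
            lo = mid          # discard the lower half
        else:
            hi = mid          # discard the upper half
    return [items[lo]]        # new container holding the single remaining item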

collections.deque is implemented as a doubly-linked list of fixed-size blocks, so indexed access away from the ends is O(n). A binary search by index would therefore be slower than a plain linear scan. Rethink your approach.

Not sure that this is what you really need, but:
x = list(range(100))
while len(x) > 1:
    if condition:              # `condition` stands in for your own test
        x = x[:len(x) // 2]    # keep the first half
    else:
        x = x[len(x) // 2:]    # keep the second half

1. 85 items are not even worth thinking about. Computers are fast, really.
2. Why would you delete ranges from the list, instead of simply picking the one result?
3. If there is a good reason why you can't do (2): keep the original list and change two indices only, the start and end index of the sublist you're looking at.

On a previous question I compared a number of techniques for removing items from a list given a predicate (that is, a function which returns True or False for whether to keep a particular item). As I recall, a list comprehension was the fastest. The fact is, copying is really, really cheap.
The only thing you can do to improve the speed depends on which items you are removing, but you haven't indicated anything about that, so I can't suggest anything.

Related

Fast O(n) list subtraction for big lists in Python

I have two big lists of strings in Python. I want to subtract these two lists fast, in O(n). I found some approaches, like removing the second list's elements from the first in a loop, or converting the lists to set() (problem: this changes the order of the list) and using the minus (-) operator, but these methods are not efficient. Is there any way to do this operation?
a=['1','2','3',...,'500000']
b=['1','2','3',...,'200000']
c=a-b
c=['200001','200002',...,'500000']
Your problem, as formulated, is:
Go through A
For each element, search for it in B and take it if it's not found
No assumptions about the elements are made
For arbitrary data, list search is O(N), set search is O(1), converting to set is O(N). Going through A is O(N).
So it's O(N^2) with only lists and O(N) if converting B to a set.
The only way you can speed it up is to make either iterating or searching more efficient, which is impossible without some additional knowledge about your data. For example:
In your example, the data are sequential numbers, so you can just take A[len(B):].
If you are going to use the same B multiple times, you can cache the set.
You can make B a set right off the bat (if order needs to be preserved, you can use an ordered set).
If all the data are of the same type and are short, you can use NumPy arrays and their fast setdiff1d.
And so on.
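For the general case, a minimal sketch of the set-based approach that still preserves the order of A (the values here are just illustrative):

a = [str(i) for i in range(1, 500001)]
b = [str(i) for i in range(1, 200001)]

b_set = set(b)                          # one-time O(len(b)) conversion
c = [x for x in a if x not in b_set]    # O(len(a)) pass, each lookup is O(1)
# c == ['200001', '200002', ..., '500000'], in the original order of a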

Dequeue complexity on using Python list as a queue

A list is used as a queue such that the first element is the head of the queue. I was discussing the complexity of the dequeue operation in such a data structure with my teacher, who said it is O(1). My thinking is, if you're removing the first element of the list, wouldn't that be O(n), since all elements after that first element then need to be shifted over a spot? Am I looking at this the wrong way?
You're correct. A Python list is inefficient at dequeuing, since all items following the item being dequeued need to be copied to their preceding positions, as you say. You can instead use collections.deque, which is implemented as a doubly-linked list (of fixed-size blocks), so its popleft method dequeues in O(1) time.
Your teacher probably thought a Python list is a linked list. It is not. From the docs:
It is also possible to use a list as a queue, where the first element added is the first element retrieved (“first-in, first-out”); however, lists are not efficient for this purpose. While appends and pops from the end of list are fast, doing inserts or pops from the beginning of a list is slow (because all of the other elements have to be shifted by one).
Indeed, the docs suggest using collections.deque for this purpose.
You are correct. Python's list is an array list, so it is not appropriate for popping from the head. Instead, use collections.deque for O(1) add/remove at the first element.
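A minimal sketch of the deque-based queue (the values are just illustrative):

from collections import deque

queue = deque(['a', 'b', 'c'])
queue.append('d')          # enqueue at the tail: O(1)
head = queue.popleft()     # dequeue from the head: O(1); returns 'a'
# A plain list's pop(0) would instead shift every remaining element: O(n).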

Comparison, Python deques or indexes?

I am required to access the leftmost element of a Python list, while popping elements from the left. One example of this usage could be the merge operation of a merge sort.
There are two good ways of doing it.
collections.deque
Using an index: keep an increasing integer that I advance on each pop.
Both these methods seem efficient in complexity terms (I should mention that at the beginning of the program I need to convert the list to a deque, which costs an extra O(n)).
So in terms of style and speed, which is best for Python usage? I did not see an official recommendation and did not find a similar question, which is why I am asking.
Thanks in advance.
Related: Time complexity
Definitely collections.deque. First, it's already written, so you don't have to write it. Second, it's written in C, so it will probably be much faster than a Python re-implementation. Third, using an index leaves the already-consumed head of the list allocated but unused, whereas a deque frees that memory as you go. And if you are also appending while popping from the left via an increasing index, the list only ever grows, so it has to be reallocated more often.
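A minimal sketch of the two approaches side by side, consuming items from the left (the data are just illustrative):

from collections import deque

data = [3, 5, 8, 13]

# Option 1: deque -- popping from the left is O(1).
q = deque(data)            # one-time O(n) conversion
while q:
    item = q.popleft()

# Option 2: an increasing index into the untouched list.
i = 0
while i < len(data):
    item = data[i]         # "pop" by reading, then advancing the index
    i += 1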

What is the fastest way to access elements in a nested list?

I have a list which is made up out of three layers, looking something like this for illustrative purposes:
a = [[['1'],['2'],['3'],['']],[['5'],['21','33']]]
Thus I have a top list which contains several other lists, each of which again contains lists.
The first layer will contain tens of lists. The next layer could contain possibly millions of lists, and the bottom layer will contain either an empty string, a single string, or a handful of values (each a string).
I now need to access the values in the bottom-most layer and store them in a new list in a particular order, which is done inside a loop. What is the fastest way of accessing these values? The amount of memory used is not of primary concern to me (though I obviously don't want to squander it either).
I can think of two ways:
I access list a directly to retrieve the desired value, e.g. a[1][1][0] would return '21'.
I create a copy of the elements of a and then access these to flatten the list a bit more. In this case, e.g.: b = a[0], c = a[1], so instead of accessing a[1][1][0] I would now access c[1][0] to retrieve '21'.
Is there any performance penalty involved in accessing nested lists? That is, is there any benefit to be gained in splitting list a into separate lists, or am I merely incurring a RAM penalty in doing so?
Accessing elements via their index (i.e. a[1][1][0]) is an O(1) operation (see https://wiki.python.org/moin/TimeComplexity). You won't get much quicker than that.
Now, assignment is also an O(1) operation, so there's no difference between the two methods you've described as far as speed goes. The second one doesn't actually incur any memory problem either, because assignments of lists are by reference, not by copy (unless you explicitly request a copy).
The two methods are more or less identical, given that b = a[0] only binds another name to the list at that index. It does not copy the list. The only difference is that, in your second method, in addition to accessing the nested lists, you also perform a few extra name bindings, so in theory it is a tiny bit slower.
As pointed out by @joaquinlpereyra, the Python wiki has a list of the complexity of such operations: https://wiki.python.org/moin/TimeComplexity
So, long answer cut short: Just accessing the list items is faster.
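To illustrate that the second method only creates new names, not copies (using the example list from the question):

a = [[['1'], ['2'], ['3'], ['']], [['5'], ['21', '33']]]
c = a[1]               # binds a new name; nothing is copied
print(c[1][0])         # '21' -- the same element as a[1][1][0]
print(c is a[1])       # True: both names refer to the same list object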

Is it possible and advisable to construct a generator returning unique values from a collection

Currently I'm doing this:
# duplicates is a list
uniques = list(set(duplicates))
However, uniques is often transitory. Would it be better to construct a generator for uniques? If so, how would I do this?
If you don't need a list, just use set(duplicates) instead. That roughly halves your memory use. Sets are iterable.
Alternatively, you can define a generator:
def uniques(it):
    seen = set()
    for x in it:
        if x not in seen:
            yield x
            seen.add(x)
but my hunch is that this will be a lot slower than just constructing a set in one go. In any case, the memory consumption is about the same.
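For what it's worth, unlike list(set(...)), the generator does preserve the order in which values are first seen:

print(list(uniques([3, 1, 3, 2, 1])))   # prints [3, 1, 2]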
It is not entirely clear to me what you're hoping to achieve by using a generator.
One thing is clear: it won't lower memory requirements, since in order to establish whether the current element is unique, the generator would need to know all previously seen unique elements.
Also, the purpose of constructing the list in list(set(...)) is not entirely clear. Why not just stick with the set that you're already constructing?
There are two possible benefits from using generators instead of static collections, of which only one (possibly) applies here:
1. Memory usage. Does not apply here, because to generate uniques you need O(n) memory either way.
2. Time. If you expect to consume only part of the generated output, then you can save time by producing it lazily. So if this is your case, then maybe using a generator will save you some processing power. Of course, to generate uniques lazily you need to remember the set of values already produced (see above) and filter new values against it as you go.
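As a small sketch of that second case, assuming the uniques() generator from the earlier answer and an illustrative duplicates list: itertools.islice stops pulling from the generator as soon as it has enough items, so the tail of the input is never scanned.

from itertools import islice

duplicates = [3, 1, 3, 2, 1, 4, 4, 5]
first_three = list(islice(uniques(duplicates), 3))   # [3, 1, 2]; stops early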
