Search a list of objects in Python - python

I have a list of Item objects that have a date attribute. I also have a single date I am grabbing from the database.
I want to search through the list, find all of the list items that are greater than the date I have returned from the database.
My list of Items objects has over a thousand objects in it, so I want to be as efficient as possible.
I assume that looping over every item in my list and checking if it is greater than the date I returned from the db is not the most efficient way to do this.
class Item(object):
    def __init__(self, title, link, description, date):
        self.title = title
        self.link = link
        self.description = description
        self.date = date

item_list = []
...
# assume I populate the list with 1,000 Item objects
filtered_list = []
for it in item_list:
    if date_from_db > it.date:
        filtered_list.append(it)

List comprehensions are a fairly efficient way to do this outside of a database:
[it for it in item_list if date_from_db > it.date]
Otherwise, you could use the filter builtin:
filter(lambda it: date_from_db > it.date, item_list)
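As a rough, self-contained illustration of both forms, here is a sketch with a trimmed-down Item class and made-up sample dates (the titles and dates are assumptions for the example only). On Python 3, filter returns an iterator, so it is wrapped in list() here.

```python
import datetime

class Item(object):
    """Trimmed-down stand-in for the Item class in the question."""
    def __init__(self, title, date):
        self.title = title
        self.date = date

# Hypothetical sample data, not from the original post.
item_list = [
    Item("old", datetime.date(2009, 1, 1)),
    Item("newer", datetime.date(2009, 6, 1)),
    Item("newest", datetime.date(2010, 1, 1)),
]
date_from_db = datetime.date(2009, 3, 1)

# List comprehension: keep the items dated before date_from_db.
older = [it for it in item_list if date_from_db > it.date]

# Same test with filter; list() materialises the iterator on Python 3.
older_via_filter = list(filter(lambda it: date_from_db > it.date, item_list))

print([it.title for it in older])
```

Both versions still visit every item once; they are just more compact than the explicit loop.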

The only way to avoid looping over every item in your list is to sort it by date and then search it backwards until you reach the first item that is not greater than your target date, adding items to your filtered_list as you go.
Or sort your list descending and search forwards until you reach the first item that is not greater than your target. This would let you easily modify your loop like this:
# item_list is assumed to be sorted by date, newest first
filtered_list = []
for it in item_list:
    if it.date > date_from_db:
        filtered_list.append(it)
    else:
        break
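The same early-exit idea can be sketched with itertools.takewhile. This is only an illustration: the namedtuple Item and the integer "dates" are stand-ins invented for the example, and the list is assumed to be sorted newest first. Keep in mind that sorting itself is O(n log n), so this only pays off when the list is kept sorted and queried more than once.

```python
from collections import namedtuple
from itertools import takewhile
from operator import attrgetter

# Hypothetical stand-in for the Item class; integers play the role of dates.
Item = namedtuple("Item", "title date")
item_list = [Item("a", 5), Item("b", 1), Item("c", 9), Item("d", 7)]
date_from_db = 5

# Sort newest first, then stop at the first item that is not newer.
by_date_desc = sorted(item_list, key=attrgetter("date"), reverse=True)
filtered_list = list(takewhile(lambda it: it.date > date_from_db, by_date_desc))

print([it.title for it in filtered_list])
```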
Alternatively, if you expect more than a few items in your filtered list, it may be faster to do a binary search to find the first item that meets your criteria and use a list slice to copy it to filtered_list:
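# three-way comparison; on Python 2 this could be written cmp(date_from_db, it.date)
first = binary_search(item_list, lambda it: (date_from_db > it.date) - (date_from_db < it.date))
if first == -1:
    return []
return item_list[first:]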
And here's a binary search function I adapted from the link in the last paragraph. I believe it should work:
def binary_search(a, comp, lo=0, hi=None):
    # returns the index of an element for which comp() == 0, or -1 if there is none
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        cmpval = comp(a[mid])
        if cmpval < 0:
            lo = mid+1
        elif cmpval > 0:
            hi = mid
        else:
            return mid
    return -1

In response to a claim that this list comp is confusing, I'll post a way to format it that makes it clear. I've been using this a lot recently.
filtered_list = [item                          # What we're collecting
                 for item in item_list         # What we're collecting it over
                 if date_from_db < item.date]  # the conditions
It does turn what could be a one-liner into a three-liner, much like a regular for loop. But in cases much worse than this (and even here) it improves readability while keeping the efficiency of the comprehension.

You can use filter (on Python 3, wrap it in list() if you need a list rather than an iterator):
filtered_list = filter(lambda item: date_from_db < item.date, item_list)
Or you can use a list comprehension:
filtered_list = [item for item in item_list if date_from_db < item.date]
I believe that people prefer the latter more often, but I like the former. Lambda is just an inline function - you can make that function explicit if you like.
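To illustrate the point about making the function explicit, here is a small sketch in which the lambda is replaced by a named predicate. The helper name newer_than and the sample data are invented for this example.

```python
from collections import namedtuple

# Hypothetical stand-in for the Item class; integers play the role of dates.
Item = namedtuple("Item", "title date")
item_list = [Item("a", 5), Item("b", 1), Item("c", 9)]
date_from_db = 4

def newer_than(cutoff):
    """Build a predicate that is true for items dated after cutoff."""
    def predicate(item):
        return cutoff < item.date
    return predicate

# list() is needed on Python 3, where filter returns an iterator.
filtered_list = list(filter(newer_than(date_from_db), item_list))

print([item.title for item in filtered_list])
```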

Sort the list then bisect it. This works the same if you're looking for one specific item and will also solve the puzzle of finding an object based on its attributes.
import bisect

# add __eq__ and __lt__ functions to your object so they'll sort by date
class Item(object):
    """blah blah as above"""
    def __eq__(self, other):
        return self.date == other.date

    def __lt__(self, other):
        """Order by date ascending"""
        return self.date < other.date

# sort that puppy
item_list = sorted(item_list)
# mock up an item to faux search for
item_from_db = Item(None, None, None, date_from_db)
# find the index of its rightful place
idx = bisect.bisect(item_list, item_from_db)
# abracadabra: everything from idx onwards is later than date_from_db
filtered_list = item_list[idx:]
Implementing an append routine for filtered_list which uses bisect.insort, rather than sorting the whole list in one hit, seems likely to offer performance gains as well. Not tested exactly as posted, but should be enough to get you there.
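If adding comparison methods to Item is not an option, a variation is to bisect over a parallel list of dates instead. The following is a minimal sketch under that assumption; the namedtuple Item and the integer dates are stand-ins made up for the example.

```python
import bisect
from collections import namedtuple

# Hypothetical stand-in for the Item class; integers play the role of dates.
Item = namedtuple("Item", "title date")
item_list = [Item("a", 5), Item("b", 1), Item("c", 9), Item("d", 7), Item("e", 5)]
date_from_db = 5

# Sort once by date, then keep a parallel list of just the dates.
item_list.sort(key=lambda it: it.date)
dates = [it.date for it in item_list]

# bisect_right returns the index just past the last date <= date_from_db,
# so the slice holds only items strictly later than date_from_db.
idx = bisect.bisect_right(dates, date_from_db)
newer = item_list[idx:]

print([it.title for it in newer])
```

The dates list has to be rebuilt (or updated in step) whenever item_list changes, which is the price for leaving Item untouched.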

Related

Which is the cleaner way to get a Python @property as a list with particular conditions?

Now I have the source code below:
class Stats(object):
    def __init__(self):
        self._pending = []
        self._done = []

    @property
    def pending(self):
        return self._pending
The way those lists are filled is not important for my question.
The situation is that I'm getting a sublist of these lists this way:
stats = Stats()
# code to fill the lists
stats.pending[2:10]
The problem here is that I expect to get as many elements as I requested.
In the example above I expect a sublist that contains 8 elements (10-2).
Of course, I'll actually get fewer than 8 elements if the list is shorter.
So, what I need is:
When the list has enough items, it returns the corresponding sublist.
When the list is shorter, it returns a sublist with the expected length, filled with the last elements of the original list and a default value (for example None) for the extra items.
This way, if I did:
pending_tasks = stats.pending[44:46]
And the pending list only contains 30 elements, it should return a list of two default elements, for example [None, None], instead of an empty list ([]), which is the default behaviour of lists.
I guess I already know how to do it inside a normal method/function, but I want to do it in the cleanest way, trying to follow the @property approach, if possible.
Thanks a lot!
This is not easy to do because the slicing operation is what you want to modify, and that happens after the original list has been returned by the property. It's not impossible though, you'll just need to wrap the regular list with another object that will take care of padding the slices for you. How easy or difficult that will be may depend on how much of the list interface you need your wrapper to implement. If you only need indexing and slicing, it's really easy:
class PadSlice(object):
    def __init__(self, lst, default_value=None):
        self.lst = lst
        self.default_value = default_value

    def __getitem__(self, index):
        item = self.lst[index]
        if isinstance(index, slice):
            # assumes the slice gives both a start and a stop
            expected_length = (index.stop - index.start) // (index.step or 1)
            if len(item) != expected_length:
                item.extend([self.default_value] * (expected_length - len(item)))
        return item
This code probably won't work right for negative step slices, or for slices that don't specify one of the end points (it does have logic to detect an omitted step, since that's common). If this was important to you, you could probably fix up those corner cases.
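One way some of those corner cases might be handled is sketched below. It is only an illustration under stated assumptions: the class name PaddedList is invented here, omitted end points fall back to 0 and len(self), and slices with negative bounds or a negative step deliberately keep ordinary list behaviour rather than being padded.

```python
class PaddedList(list):
    """Hypothetical list subclass whose slices are padded to the requested length."""

    def __init__(self, iterable=(), default_value=None):
        super().__init__(iterable)
        self.default_value = default_value

    def __getitem__(self, index):
        if not isinstance(index, slice):
            return list.__getitem__(self, index)
        start = 0 if index.start is None else index.start
        stop = len(self) if index.stop is None else index.stop
        step = 1 if index.step is None else index.step
        if start < 0 or stop < 0 or step <= 0:
            # Assumption: these cases are left to the normal list slicing rules.
            return list.__getitem__(self, index)
        size = len(self)
        # Walk the requested positions; pad the ones past the end of the list.
        return [
            list.__getitem__(self, i) if i < size else self.default_value
            for i in range(start, stop, step)
        ]

pending = PaddedList(range(30))
print(pending[44:46])
print(pending[28:32])
```

Because the positions come straight from range(start, stop, step), the result always has as many elements as the slice asks for, whatever the step.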
This is not easy. How would the object (list) you return know how it will be sliced later? You could subclass list, however, and override __getitem__ (and, on Python 2 only, __getslice__):
class L(list):
    def __getitem__(self, key):
        if isinstance(key, slice):
            # range here; on Python 2 use xrange to avoid building a list
            return [list.__getitem__(self, i) if 0 <= i < len(self) else None
                    for i in range(key.start, key.stop, key.step or 1)]
        return list.__getitem__(self, key)

    def __getslice__(self, i, j):  # only called by Python 2
        return self.__getitem__(slice(i, j))
This pads any slice that has an explicit start and stop with None, including slices with steps != 1, and plain negative indexing still works. And in your property, return an L version of the actual list:
@property
def pending(self):
    return L(self._pending)
You can construct a new class that is a subclass of list, and override the __getitem__ magic method so that the [] operator behaves the way you want. Consider this subclass of list called MyList:
class MyList(list):
    def __getitem__(self, index):
        """Modify index [] operator"""
        result = super(MyList, self).__getitem__(index)
        if isinstance(index, slice):
            # Get the requested sublist length (assumes explicit start and stop).
            if index.step:  # a missing step (None) means a step of 1
                sublist_len = (index.stop - index.start) // index.step
            else:
                sublist_len = (index.stop - index.start)
            # If the slice came back shorter than requested, extend it
            # to the requested length with the default value of None
            if len(result) < sublist_len:
                result.extend([None for _ in range(sublist_len - len(result))])
        return result
Then you can just change the pending method to return a MyList type instead of list.
class Stats(object):
    @property
    def pending(self):
        return MyList(self._pending)
Hopefully this helps.

Python: using a function inside a function?

So for an assignment I have to create a bunch of different functions in one Python file. One of the functions takes a list (sorted_list) and a string from that list (item). The function reads the list and removes any duplicates of the specified string from the list.
def remove_duplicates(sorted_list, item):
    list_real = []
    for i in range(len(sorted_list)):
        # skip item when the previous element is the same item (a duplicate)
        if sorted_list[i] == item and i > 0 and sorted_list[i-1] == item:
            continue
        list_real = list_real + [sorted_list[i]]
    return list_real
So
remove_duplicates(['a','a','a','b','b','c'], 'a') would return ['a','b','b','c']
This probably isn't the most efficient way to do something like this, but that isn't my problem.
The next function I have to define is similar to the one above except it only takes sorted_list and it has to remove duplicates for each item instead of a specified one. The only thing I know is that you have to use a for loop that makes the remove_duplicates run for each item in a given list, but I have no idea how to actually implement a function inside of another function. Can anyone help me out?
This works nicely:
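from itertools import filterfalse  # named ifilterfalse on Python 2

def remove_duplicates(sorted_list, item):
    # keep everything up to and including the first occurrence of item
    idx = sorted_list.index(item)
    list_real = sorted_list[:idx+1]
    if len(list_real) != len(sorted_list):
        # append the rest, skipping any further copies of item
        for other in filterfalse(lambda x: x == item, sorted_list[idx+1:]):
            list_real.append(other)
    return list_real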
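As for calling one function from inside another: a function defined at module level can simply be called by name from any other function in the same file. The following is a minimal sketch of that idea. The name remove_all_duplicates is invented for the example, and remove_duplicates is re-implemented in a simple form so the block stands on its own.

```python
def remove_duplicates(sorted_list, item):
    """Keep only the first occurrence of item; leave other values untouched."""
    result = []
    seen = False
    for value in sorted_list:
        if value == item:
            if seen:
                continue  # a later copy of item: drop it
            seen = True
        result.append(value)
    return result

def remove_all_duplicates(sorted_list):
    """Call remove_duplicates once for every value in the list."""
    result = list(sorted_list)
    for item in sorted_list:
        # An ordinary call to the other function, reusing its result each time.
        result = remove_duplicates(result, item)
    return result

print(remove_duplicates(['a', 'a', 'a', 'b', 'b', 'c'], 'a'))
print(remove_all_duplicates(['a', 'a', 'a', 'b', 'b', 'c']))
```

This repeats work for values that appear more than once, which is fine for an exercise but not the fastest approach for large lists.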

recursive sorting in python

I am trying to run a sorting function recursively in Python. I start with an empty list, but every time I try to print the result I get an empty list. Here is my code. Any help would be greatly appreciated.
def parse(list):
    newParse = []
    if len(list) == 0:
        return newParse
    else:
        x = min(list)
        list.remove(x)
        newParse.append(x)
        return sort(list)
The value of newParse is not preserved between invocations of the function; you're setting it equal to [] (well, you're creating a new variable with the value []).
Since the only time you return is
newParse = []
if len(list) == 0:
    return newParse
you will always be returning [] because that is the value of newParse at that time.
Because you are doing this recursively, you are calling the function anew, without keeping the function's own state. Take a moment to consider the implications of this on your code.
Instead of initialising newParse = [] on every call, add an optional parameter newParse that defaults to a sentinel value such as None, and set newParse = [] only when you receive that sentinel. (If you used [] itself as the default, you would get the same list object on every call, because the default is created once and its contents are then mutated.) Then pass newParse through in your tail call.
You also seem to have the problem that your definition and the supposedly-recursive call refer to different functions.
def sort(list, newParse = None):
    if newParse is None:
        newParse = []
    if len(list) == 0:
        return newParse
    else:
        x = min(list)
        list.remove(x)
        newParse.append(x)
        return sort(list, newParse)
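The reason for the None default, rather than a default of [], can be seen in a small sketch. The function names append_shared and append_fresh are made up for this illustration.

```python
def append_shared(value, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call without a bucket argument mutates the same object.
    bucket.append(value)
    return bucket

def append_fresh(value, bucket=None):
    # None is only a marker; a new list is created on each call that needs one.
    if bucket is None:
        bucket = []
    bucket.append(value)
    return bucket

first = append_shared(1)
second = append_shared(2)
print(first is second, second)

print(append_fresh(1), append_fresh(2))
```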
Here is what I think you are trying to do:
def recursive_sort(a_list):
    def helper_function(list_to_be_sorted, list_already_sorted):
        new = []
        if len(list_to_be_sorted) == 0:
            return list_already_sorted
        else:
            x = min(list_to_be_sorted)
            list_to_be_sorted.remove(x)
            new.append(x)
            return helper_function(list_to_be_sorted, list_already_sorted + new)
    return helper_function(a_list, [])
You shouldn't name variables list, as that is a builtin.
Also, if you are trying to implement a recursive sort function, you might want to look at quicksort, which is a very common (and fast) recursive sorting algorithm. What you have tried to implement is a recursive version of selection sort, which is much slower.
Also, if you actually need a sorting function, rather than just wanting to implement a recursive one, you should use the list method sort or the built-in function sorted (which accepts any iterable), both of which will be a lot faster than anything you could write in pure Python.

Using recursion to create a linked list from a list

How would one go about using recursion to take a list of random values and turn it into a linked list, where each value is a node? As of right now, I've tried implementing the following...
def pyListToMyList(pylst):
    lists = mkMyList()
    lists.head = pyListToMyListRec(pylst)
    return lists

def pyListToMyListRec(pylst):
    if pylst:
        return mkEmptyNode()
    else:
        return mkNode(pylst[0], pyListToMyListRec(pylst[1:]))
The problem is the else statement, which raises an error saying that the index is out of range.
def pyListToMyListRec(pylst):
    if not pylst:
        return mkEmptyNode()
    else:
        return mkNode(pylst[0], pyListToMyListRec(pylst[1:]))
EDIT: Though this is O(n^2) because of all the list copying.
I would do
def pyListToMyListRec(pylst, i=0):
    if i >= len(pylst):
        return mkEmptyNode()
    else:
        return mkNode(pylst[i], pyListToMyListRec(pylst, i+1))
or even more efficient and less likely to overflow stack (though this does not use recursion):
def pyListToMyList(pylst):
    lst = mkEmptyNode()
    for x in reversed(pylst):
        lst = mkNode(x, lst)
    return lst
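Since mkNode, mkEmptyNode and mkMyList are not shown in the question, here is a self-contained sketch of both approaches using a hypothetical Node class, with None standing in for the empty node. All names in it are assumptions made for the example.

```python
class Node(object):
    """Hypothetical linked-list node; None marks the end of the list."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node

def from_list_recursive(values, i=0):
    # Base case: past the last index, so there is nothing left to link.
    if i >= len(values):
        return None
    # Passing an index avoids copying the list on every call.
    return Node(values[i], from_list_recursive(values, i + 1))

def from_list_iterative(values):
    head = None
    # Build from the back so each new node can point at the rest.
    for value in reversed(values):
        head = Node(value, head)
    return head

def to_list(head):
    """Walk the nodes and collect their values, for checking the result."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next_node
    return out

print(to_list(from_list_recursive([3, 1, 2])))
print(to_list(from_list_iterative([3, 1, 2])))
```

The recursive version still uses one stack frame per element, so very long lists can hit Python's recursion limit; the iterative version does not have that problem.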

Sorting through a nested list

I have a list that describes a hierarchy, as such:
[obj1, obj2, [child1, child2, [gchild1, gchild2]], onemoreobject]
Here child1 (and the others) are children of obj2, while gchild1 and gchild2 are children of child2.
Each of these objects has attributes such as date, and I want to sort them according to such attributes. In a regular list I would go like this:
sorted(obj_list, key=attrgetter('date'))
In this case, however, that method won't work, since lists don't have a date attribute. Even if they did, a list whose attribute differed from its parent's would break the hierarchical ordering. Is there a simple and elegant way to do this in Python?
I think you just need to put your key in the sort(key=None) calls and this will work. I tested it with strings and it seems to work. I wasn't sure of the structure of onemoreobject; it was sorted to the beginning with obj1 and obj2. I thought that onemoreobject might represent a new hierarchy, so I enclosed each hierarchy in a list to keep like objects together.
def embededsort(alist):
    islist = False
    temp = []
    for index, obj in enumerate(alist):
        if isinstance(obj,list):
            islist = True
            embededsort(obj)
            temp.append((index,obj))
    if islist:
        for lists in reversed(temp):
            del alist[lists[0]]
        alist.sort(key=None)
        for lists in temp:
            alist.append(lists[1])
    else:
        alist.sort(key=None)
    return alist
>>> l=[['obj2', 'obj1', ['child2', 'child1', ['gchild2', 'gchild1']]], ['obj22', 'obj21', ['child22', 'child21', ['gchild22', 'gchild21']]]]
>>> print(embededsort(l))
[['obj1', 'obj2', ['child1', 'child2', ['gchild1', 'gchild2']]], ['obj21', 'obj22', ['child21', 'child22', ['gchild21', 'gchild22']]]]
This is an implementation of the QuickSort algorithm that relies on Python's polymorphic comparison operators. It should work for ints, floats, lists, nested lists and tuples, as long as the elements can be compared with each other (on Python 2, even dictionaries).
def qsort(items):
    if not items:
        return []
    first = items[0]
    # list comprehensions rather than filter(), which returns an iterator on Python 3
    lesser = [x for x in items[1:] if x < first]
    greater = [x for x in items[1:] if x >= first]
    return qsort(lesser) + [first] + qsort(greater)
Thanks for the answers, as they gave me quite a few ideas and new stuff to learn from. The final code, which seems to work, looks like this. Not as short and elegant as I imagined, but it works:
def sort_by_date(element_list):
    last_item = None
    last_comparisson = None  # no comparison has been made yet
    sorted_list = []
    for item in element_list:
        # if item is a list, recurse and store it right below the last item (its parent)
        if type(item) == list:
            if last_comparisson:
                if last_comparisson == 'greater':
                    sorted_list.append(sort_by_date(item))
                else:
                    sorted_list.insert(1, sort_by_date(item))
        # if not a list, check whether it is greater or smaller than the last item
        else:
            if last_item is None or item.date > last_item:
                last_comparisson = 'greater'
                sorted_list.append(item)
            else:
                last_comparisson = 'smaller'
                sorted_list.insert(0, item)
            last_item = item.date
    return sorted_list
If you want to sort all children of a node without taking into consideration those nodes which are not siblings, go for a tree structure:
class Tree:
    def __init__ (self, payload):
        self.payload = payload
        self.__children = []

    def __iadd__ (self, child):
        self.__children.append (child)
        return self

    def sort (self, attr):
        self.__children = sorted (self.__children, key = lambda x: getattr (x.payload, attr) )
        for child in self.__children: child.sort (attr)

    def __repr__ (self):
        return '{}: {}'.format (self.payload, self.__children)
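If the nested-list layout from the question has to be kept, one possible approach is to pair each object with the child list that follows it, sort the pairs, and recurse into the children. The sketch below is only an illustration under these assumptions: a nested list always directly follows its parent, a parent has at most one child list, and the function name sort_hierarchy, the make helper and the integer dates are all invented for the example.

```python
from operator import attrgetter
from types import SimpleNamespace

def sort_hierarchy(nodes, key):
    """Sort siblings by key; a nested list stays attached to the object before it."""
    groups = []  # each entry: [parent_object, sorted_children_or_None]
    for entry in nodes:
        if isinstance(entry, list):
            if not groups:
                raise ValueError("child list has no preceding parent")
            # Attach the (recursively sorted) children to the previous object.
            groups[-1][1] = sort_hierarchy(entry, key)
        else:
            groups.append([entry, None])
    groups.sort(key=lambda group: key(group[0]))
    result = []
    for parent, children in groups:
        result.append(parent)
        if children is not None:
            result.append(children)
    return result

def make(name, date):
    # Hypothetical objects; integers stand in for real dates.
    return SimpleNamespace(name=name, date=date)

obj1, obj2, extra = make("obj1", 3), make("obj2", 1), make("extra", 2)
child1, child2 = make("child1", 9), make("child2", 4)
hierarchy = [obj1, obj2, [child1, child2], extra]

ordered = sort_hierarchy(hierarchy, key=attrgetter("date"))
print(ordered)
```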
