Sorting through a nested list - python

I have a list that describes a hierarchy, like this:
[obj1, obj2, [child1, child2, [gchild1, gchild2]], onemoreobject]
Here child1 (and the others at that level) are children of obj2, while gchild1 and gchild2 are children of child2.
Each of these objects has attributes such as date, and I want to sort them by such an attribute. With a regular list I would do this:
sorted(obj_list, key=attrgetter('date'))
In this case, however, that won't work, since lists don't have a date attribute. Even if they did, whenever a sublist's date differed from its parent's, the hierarchical ordering would be broken. Is there a simple and elegant way to do this in Python?

I think you just need to put your key in the sort(key=None) calls and this will work. I tested it with strings and it seems to work. I wasn't sure about the structure of onemoreobject; as written, it would be sorted to the beginning along with obj1 and obj2. I thought onemoreobject might represent a new hierarchy, so I enclosed each hierarchy in its own list to keep like objects together.
def embededsort(alist):
    islist = False
    temp = []
    for index, obj in enumerate(alist):
        if isinstance(obj, list):
            islist = True
            embededsort(obj)  # sort the sublist in place
            temp.append((index, obj))
    if islist:
        # take the sublists out (back to front so the indices stay valid),
        # sort the remaining items, then put the sublists back at the end
        for lists in reversed(temp):
            del alist[lists[0]]
        alist.sort(key=None)
        for lists in temp:
            alist.append(lists[1])
    else:
        alist.sort(key=None)
    return alist
>>> l = [['obj2', 'obj1', ['child2', 'child1', ['gchild2', 'gchild1']]], ['obj22', 'obj21', ['child22', 'child21', ['gchild22', 'gchild21']]]]
>>> print(embededsort(l))
[['obj1', 'obj2', ['child1', 'child2', ['gchild1', 'gchild2']]], ['obj21', 'obj22', ['child21', 'child22', ['gchild21', 'gchild22']]]]

This is an implementation of the QuickSort algorithm that relies on Python's polymorphic comparison operators. It should work for ints, floats, strings, lists, nested lists and tuples, as long as the elements are mutually comparable (dictionaries cannot be ordered in Python 3).
def qsort(items):
    if not items:
        return []
    first = items[0]
    # list comprehensions instead of filter(), which returns a lazy iterator in Python 3
    lesser = [x for x in items[1:] if x < first]
    greater = [x for x in items[1:] if x >= first]
    return qsort(lesser) + [first] + qsort(greater)

Thanks for the answers; they gave me quite a few ideas and new things to learn. The final code, which seems to work, looks like this. It's not as short and elegant as I imagined, but it works:
def sort_by_date(element_list):
    last_item = None        # date of the previous non-list item
    last_comparison = None  # whether that item went to the end ('greater') or the front ('smaller')
    sorted_list = []
    for item in element_list:
        # if item is a list, recurse and store it right below the last item (its parent)
        if isinstance(item, list):
            if last_comparison == 'smaller':
                sorted_list.insert(1, sort_by_date(item))
            else:
                sorted_list.append(sort_by_date(item))
        # if not a list, check whether it is greater or smaller than the last item
        else:
            if last_item is None or item.date > last_item:
                last_comparison = 'greater'
                sorted_list.append(item)
            else:
                last_comparison = 'smaller'
                sorted_list.insert(0, item)
            last_item = item.date
    return sorted_list
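The function above compares each item only with the one before it, so it does not sort arbitrary input correctly (for example, dates 3, 1, 2 come out as 1, 3, 2). As a more general alternative, here is a minimal sketch, not taken from any of the answers, that assumes (as in the question) that a sublist always holds the children of the item immediately before it. It groups each item with its children, sorts the groups by the parent's key only, and recurses into the children. The Node class, the integer "dates" and the name sort_hierarchy are hypothetical stand-ins.

```python
from operator import attrgetter


class Node:
    """Hypothetical stand-in for the question's objects (just a name and a date)."""

    def __init__(self, name, date):
        self.name = name
        self.date = date

    def __repr__(self):
        return self.name


def sort_hierarchy(nodes, key):
    """Sort a nested list in which each sublist holds the children of the item
    immediately before it. Siblings are sorted by key; every sublist stays
    directly after its (re-positioned) parent and is sorted recursively."""
    groups = []  # each group: [parent, *sorted child sublists]
    for node in nodes:
        if isinstance(node, list):
            if not groups:
                raise ValueError("a sublist must follow its parent item")
            groups[-1].append(sort_hierarchy(node, key))
        else:
            groups.append([node])
    # sort the groups by their parent only; children never affect sibling order
    groups.sort(key=lambda group: key(group[0]))
    result = []
    for group in groups:
        result.extend(group)  # parent first, then its children sublist(s)
    return result


data = [Node('obj1', 3), Node('obj2', 1),
        [Node('child1', 5), Node('child2', 2), [Node('gchild1', 9), Node('gchild2', 4)]],
        Node('onemoreobject', 2)]
print(sort_hierarchy(data, attrgetter('date')))
# [obj2, [child2, [gchild2, gchild1], child1], onemoreobject, obj1]
```

Because list.sort is stable, siblings with equal dates keep their original relative order.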

If you want to sort the children of each node only among themselves, without comparing them to nodes that are not their siblings, go for a tree structure:
class Tree:
    def __init__(self, payload):
        self.payload = payload
        self.__children = []
    def __iadd__(self, child):
        self.__children.append(child)
        return self
    def sort(self, attr):
        # sort this node's children by the payload attribute, then recurse into each child
        self.__children = sorted(self.__children, key=lambda x: getattr(x.payload, attr))
        for child in self.__children:
            child.sort(attr)
    def __repr__(self):
        return '{}: {}'.format(self.payload, self.__children)

Related

How to find two items of a list with the same return value of a function on their attribute?

Given a basic class Item:
class Item(object):
    def __init__(self, val):
        self.val = val
a list of objects of this class (the number of items can be much larger):
items = [ Item(0), Item(11), Item(25), Item(16), Item(31) ]
and a function compute that processes a value and returns a result.
How do I find two items of this list for which compute returns the same value when applied to the attribute val? If nothing is found, an exception should be raised. If more than two items match, simply return any two of them.
For example, let's define compute:
def compute(x):
    return x % 10
The expected pair would be: (Item(11), Item(31)).
You can check the length of the set of resulting values:
class Item(object):
    def __init__(self, val):
        self.val = val
    def __repr__(self):
        return f'Item({self.val})'
def compute(x):
    return x % 10
items = [Item(0), Item(11), Item(25), Item(16), Item(31)]
c = list(map(lambda x: compute(x.val), items))
if len(set(c)) == len(c):  # no two or more equal values exist in the list
    raise Exception("All elements have unique computational results")
To find the items whose computed results are equal, a dictionary can be used:
from collections import Counter
new_d = {i:compute(i.val) for i in items}
d = Counter(new_d.values())
multiple = [a for a, b in new_d.items() if d[b] > 1]
Output:
[Item(11), Item(31)]
A slightly more efficient way to check whether multiple objects with the same computed value exist is to use all, which needs a single pass over the Counter's values, whereas using a set with len requires several iterations:
if all(b == 1 for b in d.values()):
    raise Exception("All elements have unique computational results")
Assuming the values returned by compute are hashable (e.g., float values), you can use a dict to store results.
And you don't need to do anything fancy, like a multidict storing all items that produce a result. As soon as you see a duplicate, you're done. Besides being simpler, this also means we short-circuit the search as soon as we find a match, without even calling compute on the rest of the elements.
def find_pair(items, compute):
    results = {}
    for item in items:
        result = compute(item.val)
        if result in results:
            return results[result], item
        results[result] = item
    raise ValueError('No pair of items')
A dictionary val_to_it that contains Items keyed by computed val can be used:
val_to_it = {}
for it in items:
    computed_val = compute(it.val)
    # Check if an Item in val_to_it has the same computed val
    dict_it = val_to_it.get(computed_val)
    if dict_it is None:
        # If not, add it to val_to_it so it can be referred to
        val_to_it[computed_val] = it
    else:
        # We found the two elements!
        res = [dict_it, it]
        break
else:
    raise Exception("Can't find two items")
The for block can be rewritten to find n elements with the same computed value (n is the number of matching items you want):
for it in items:
    computed_val = compute(it.val)
    dict_lit = val_to_it.get(computed_val)
    if dict_lit is None:
        val_to_it[computed_val] = [it]
    else:
        dict_lit.append(it)
        # Check if we have the expected number of elements
        if len(dict_lit) == n:
            # Found n elements!
            res = dict_lit
            break
else:
    raise Exception("Can't find %d items" % n)
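The same idea can be wrapped in a self-contained function. This is a sketch rather than code from the answers above: find_group is a hypothetical name, it uses collections.defaultdict instead of dict.get, and, like find_pair, it raises ValueError when no n items share a computed value.

```python
from collections import defaultdict


class Item(object):
    def __init__(self, val):
        self.val = val

    def __repr__(self):
        return f'Item({self.val})'


def compute(x):
    return x % 10


def find_group(items, compute, n=2):
    """Return the first n items whose compute(item.val) results are equal."""
    groups = defaultdict(list)  # computed value -> items seen so far with that value
    for item in items:
        group = groups[compute(item.val)]
        group.append(item)
        if len(group) == n:
            return group  # stop early: the remaining items are never computed
    raise ValueError(f'No {n} items share a computed value')


items = [Item(0), Item(11), Item(25), Item(16), Item(31)]
print(find_group(items, compute))  # [Item(11), Item(31)]
```

With n=2 this behaves like find_pair, except that it returns a list instead of a tuple.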

Which is the cleanest way to get a Python @property as a list with particular conditions?

I have the following source code:
class Stats(object):
    def __init__(self):
        self._pending = []
        self._done = []
    @property
    def pending(self):
        return self._pending
The way those lists are filled is not important for my question.
The situation is that I'm getting a sublist of these lists this way:
stats = Stats()
# code to fill the lists
stats.pending[2:10]
The problem here is that I expect to get as many elements as I requested.
In the example above I expect a sublist that contains 8 elements (10-2).
Of course, I'll actually get fewer than 8 elements if the list is shorter.
So, what I need is:
When the list has enough items, it returns the corresponding sublist.
When the list is shorter, it returns a sublist with the expected length, filled with the last elements of the original list and a default value (for example None) for the extra items.
This way, if I did:
pending_tasks = stats.pending[44:46]
And the pending list only contains 30 elements, it should return a list of two default elements, for example [None, None], instead of an empty list ([]), which is the default behaviour of lists.
I guess I already know how to do it inside a normal method or function, but I want to do it in the cleanest way, following the @property approach if possible.
Thanks a lot!
This is not easy to do because the slicing operation is what you want to modify, and that happens after the original list has been returned by the property. It's not impossible, though: you'll just need to wrap the regular list with another object that takes care of padding the slices for you. How easy or difficult that is may depend on how much of the list interface you need your wrapper to implement. If you only need indexing and slicing, it's really easy:
class PadSlice(object):
    def __init__(self, lst, default_value=None):
        self.lst = lst
        self.default_value = default_value
    def __getitem__(self, index):
        item = self.lst[index]
        if isinstance(index, slice):
            # number of positions the slice asks for (needs an explicit start and stop)
            expected_length = len(range(index.start, index.stop, index.step or 1))
            if len(item) < expected_length:
                item.extend([self.default_value] * (expected_length - len(item)))
        return item
This code probably won't work right for negative indices or negative-step slices, or for slices that don't specify one of the end points (it does handle an omitted step, since that's common). If this is important to you, you could probably fix up those corner cases.
This is not easy. How would the object (list) you return know how it will be sliced later? You could subclass list, however, and override __getitem__ and __getslice__ (this code is Python 2 only):
class L(list):
    def __getitem__(self, key):
        if isinstance(key, slice):
            return [list(self)[i] if 0 <= i < len(self) else None for i in xrange(key.start, key.stop, key.step or 1)]
        return list(self)[key]
    def __getslice__(self, i, j):
        return self.__getitem__(slice(i, j))
This pads with None every requested position past the end of the list, including for steps other than 1; note that it expects the slice to have an explicit start and stop. And in your property, return an L version of the actual list:
@property
def pending(self):
    return L(self._pending)
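Since __getslice__ and xrange no longer exist in Python 3, here is a rough Python 3 sketch of the same idea. PaddedList is a hypothetical name, and the sketch makes its own assumptions: the step must be positive, omitted bounds default to 0 and len(self), and negative bounds are shifted by len(self), the way Python 2 did before calling __getslice__.

```python
class PaddedList(list):
    """List whose slices are padded with None past the end (positive steps only)."""

    def __getitem__(self, key):
        if not isinstance(key, slice):
            return super().__getitem__(key)  # plain indexing is unchanged
        n = len(self)
        step = 1 if key.step is None else key.step
        if step <= 0:
            raise ValueError('PaddedList only supports positive slice steps')
        start = 0 if key.start is None else key.start
        stop = n if key.stop is None else key.stop
        # negative bounds count from the end, as with ordinary lists
        if start < 0:
            start += n
        if stop < 0:
            stop += n
        # one entry per requested position; out-of-range positions become None
        return [list.__getitem__(self, i) if 0 <= i < n else None
                for i in range(start, stop, step)]


class Stats(object):
    def __init__(self):
        self._pending = []
        self._done = []

    @property
    def pending(self):
        return PaddedList(self._pending)


stats = Stats()
stats._pending.extend(range(30))  # stand-in for the real code that fills the list
print(stats.pending[44:46])  # [None, None]
print(stats.pending[28:32])  # [28, 29, None, None]
```

Slices come back as plain lists, so callers can use the result like any other list.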
You can construct a new class that is a subclass of list. Then you can override the __getitem__ magic method to change the behavior of the [] operator. Consider this subclass of list called MyList:
class MyList(list):
    def __getitem__(self, index):
        """Modify the index [] operator"""
        result = super(MyList, self).__getitem__(index)
        if isinstance(index, slice):
            # Get the requested sublist length (assumes an explicit, non-negative
            # start and stop; `or 1` covers an omitted step)
            sublist_len = len(range(index.start, index.stop, index.step or 1))
            # If the list was too short to fill the slice, extend the result
            # to the requested length with the default value None
            if len(result) < sublist_len:
                result.extend([None] * (sublist_len - len(result)))
        return result
Then you can just change the pending method to return a MyList type instead of list.
class Stats(object):
    @property
    def pending(self):
        return MyList(self._pending)
Hopefully this helps.

Multiple "is" in Python

I have a list with a few hundred objects, and I want to check whether a newcomer object has already been added to my list (not an equal object, but this exact instance).
I have a naive implementation like this:
def is_one_of(test_object, all_objects):
    for elm in all_objects:
        if test_object is elm:
            return True
    return False
Can it be made more elegant?
Use any:
if any(x is test_object for x in all_objects):
The example in the python reference looks remarkably similar to your code already :)
Use the any() function:
def is_one_of(test_object, all_objects):
    return any(test_object is elm for elm in all_objects)
It'll stop iterating over the generator expression as soon as a True result is found.
Eh, I ended up doing it by putting id(element) into a set:
def _unit_list(self):
    """
    Returns all units in the order they should be initialized.
    (Performs a breadth-first search from start_point.)
    """
    unit_id_set = set()
    unit_list = []
    unit_id_set.add(id(self.start_point))
    unit_list.append(self.start_point)
    pos = 0
    # unit_list doubles as the BFS queue; pos points at the next unit to expand
    while pos < len(unit_list):
        cur_unit = unit_list[pos]
        for child in cur_unit.links_to:
            if id(child) not in unit_id_set:
                unit_list.append(child)
                unit_id_set.add(id(child))
        pos += 1
    return unit_list
You can use
if any(test_object is x for x in all_objects): ...
If you need to do this test often, however, you may want to keep a set of all object ids instead:
all_ids = set(map(id, all_objects))
Then you can check faster with:
if id(test_object) in all_ids: ...
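If you go the id() route, one detail matters: an id is only unique among objects that are alive at the same time, so a stored id can be reused once its object is garbage-collected. A minimal sketch (IdentityList is a hypothetical helper, not a standard-library class) keeps a reference to every object alongside the set of ids, so that cannot happen:

```python
class IdentityList:
    """List that tests membership by identity (`is`) in O(1) using id()."""

    def __init__(self, objects=()):
        self._objects = []  # holding references keeps every stored id valid
        self._ids = set()
        for obj in objects:
            self.add(obj)

    def add(self, obj):
        """Append obj unless this exact instance is already present."""
        if id(obj) not in self._ids:
            self._ids.add(id(obj))
            self._objects.append(obj)

    def __contains__(self, obj):
        return id(obj) in self._ids

    def __iter__(self):
        return iter(self._objects)

    def __len__(self):
        return len(self._objects)


a, b = [1], [1]  # equal, but distinct instances
seen = IdentityList([a])
print(a in seen, b in seen)  # True False
```

Because __contains__ is defined, the ordinary in operator performs the identity check here, unlike the equality-based in of a plain list.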
Another common solution, where it applies, is to store a flag on the object itself recording whether it has already been added:
# add the object to the list
all_objects.append(x)
x.added = True
...
# Check if already added (objects never added have no 'added' attribute yet)
if getattr(test_object, 'added', False): ...
I think you're looking for the in operator. The equivalent function would be:
def is_one_of(test_object, all_objects):
    return test_object in all_objects
(but you really wouldn't want to write that as a function).
Edit: I'm wrong. According to the Expressions page:
For the list and tuple types, x in y is true if and only if there exists an index i such that x == y[i] is true.
That would work if your class doesn't define __eq__, but that's more fragile than I'd want to rely on. For example:
class ObjWithEq(object):
    def __init__(self, val):
        self.val = val
    def __eq__(self, other):
        return self.val == other.val
a = ObjWithEq(1)
b = ObjWithEq(1)
assert a == b
assert a in [b]
class ObjWithoutEq(object):
    def __init__(self, val):
        self.val = val
a = ObjWithoutEq(1)
b = ObjWithoutEq(1)
assert a != b
assert a not in [b]

Recursive function to count all items in a nested list?

I'm trying to create a method that will count all the items in a nested list, so that count([[3, 2], [2]]) == 3. However, it's a method of a class, so I can't simply do:
def count(L, target):
    s = 0
    for i in L:
        if isinstance(i, list):
            s += count(i, target)
        else:
            if i == target:
                s += 1
    return s
Instead, I tried the following, but I get a maximum recursion depth error and I'm not sure why. Before you look at the code, there are a few things to keep in mind: (1) I expect the base list to contain only lists, so it will have the format [ [], ]; and (2) the sublists will contain nothing but items: [ [item, item], [item] ]:
def count(self, stack=None):
    n = 0
    if stack:
        n += len(stack)
    else:
        for i in self._items:
            if isinstance(i, list):
                n += self.count(i)
    return n
if stack:
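Empty lists are considered false in a boolean context, so an empty sublist makes the call take the else branch and recurse over self._items again, forever. You want if stack is not None.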
Why use recursion, though? You don't need it.
def count(self):
    return sum(len(item) for item in self._items)
If your lists are only nested one level deep, this is easy.
class MyClass:
    def __init__(self, items):
        self.items = items
    def count(self):
        return sum(len(x) for x in self.items)
a = MyClass([[3, 2], [2]])
b = MyClass([[1, 2, 3], [4, 5, 6], [7], []])
print(a.count())  # 3
print(b.count())  # 7
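If the lists can be nested more than one level deep, as the question's title suggests, recursion is the natural fit. A minimal sketch (count_items is a hypothetical name) that counts every non-list element at any depth, using the same isinstance check as the question's first function:

```python
def count_items(nested):
    """Count the non-list elements of a list nested to any depth."""
    total = 0
    for element in nested:
        if isinstance(element, list):
            total += count_items(element)  # recurse into sublists
        else:
            total += 1  # a leaf item
    return total


print(count_items([[3, 2], [2]]))          # 3
print(count_items([1, [2, [3, []]], []]))  # 3
```

Inside the class, the method could then simply be return count_items(self._items). Empty sublists add nothing, which avoids the infinite recursion described above.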

Search a list of objects in Python

I have a list of Item objects that have a date attribute. I also have a single date I am grabbing from the database.
I want to search through the list and find all of the list items that are greater than the date I have returned from the database.
My list of Item objects has over a thousand objects in it, so I want to be as efficient as possible.
I assume that looping over every item in my list and checking if it is greater than the date I returned from the db is not the most efficient way to do this.
class Item(object):
    def __init__(self, title, link, description, date):
        self.title = title
        self.link = link
        self.description = description
        self.date = date
item_list = []
...
# assume I populate the list with 1,000 Item objects
filtered_list = []
for it in item_list:
    if date_from_db > it.date:
        filtered_list.append(it)
List comprehensions are a fairly efficient way to do this outside of a database:
[it for it in item_list if date_from_db > it.date]
Otherwise, you could use the filter builtin (in Python 3 it returns an iterator, hence the list() call):
list(filter(lambda it: date_from_db > it.date, item_list))
The only way to avoid looping over every item in your list is to sort it by date and then search it backwards until you find the last item that's greater than your target date, adding the items to your filtered_list as you go.
Or sort your list in descending order and search forwards, stopping at the first item that isn't greater than your target. This would let you easily modify your loop like this:
filtered_list = []
for it in item_list:  # item_list sorted by date, newest first
    if it.date > date_from_db:
        filtered_list.append(it)
    else:
        break
Alternatively, if you expect more than a few items in your filtered list, it may be faster to do a binary search to find the first item that meets your criteria and use a list slice to copy it to filtered_list:
# cmp() is Python 2 only; (a > b) - (a < b) is the Python 3 equivalent
first = binary_search(item_list, lambda it: (date_from_db > it.date) - (date_from_db < it.date))
if first == -1:
    return []
return item_list[first:]
And here's a binary search function I adapted from the link in the last paragraph. I believe it should work:
def binary_search(a, comp, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        cmpval = comp(a[mid])
        if cmpval < 0:
            lo = mid + 1
        elif cmpval > 0:
            hi = mid
        else:
            return mid
    return -1
In response to a claim that this list comp is confusing, I'll post a way to format it that makes it clear. I've been using this a lot recently.
filtered_list = [item                          # What we're collecting
                 for item in item_list         # What we're collecting it over
                 if date_from_db < item.date]  # The conditions
It does turn what could be a one-liner into a three-liner, as a regular for loop would be, but in cases much worse than this (and even here) it improves readability while keeping the efficiency of the comprehension.
You can use filter:
filtered_list = list(filter(lambda item: date_from_db < item.date, item_list))
Or you can use a list comprehension:
filtered_list = [item for item in item_list if date_from_db < item.date]
I believe that people prefer the latter more often, but I like the former. Lambda is just an inline function - you can make that function explicit if you like.
Sort the list, then bisect it. This works the same if you're looking for one specific item, and it will also solve the puzzle of finding an object based on its attributes.
import bisect
# add __eq__ and __lt__ methods to your object so they'll sort by date
class Item(object):
    """blah blah as above"""
    def __eq__(self, other):
        return self.date == other.date
    def __lt__(self, other):
        """Order by date ascending"""
        return self.date < other.date
# sort that puppy
item_list = sorted(item_list)
# mock up an item to faux search for
item_from_db = Item(None, None, None, date_from_db)
# find the index of its rightful place (bisect places it after any equal dates)
idx = bisect.bisect(item_list, item_from_db)
# abracadabra: everything from idx onwards has a later date
return item_list[idx:]
Implementing an append routine for filtered_list which uses bisect.insort, rather than sorting the whole list in one hit, seems likely to offer performance gains as well. Not tested exactly as posted, but should be enough to get you there.
