List of lists: Changing all references with one assignment? - python

Rationale: I start a script with a list that contains multiple heaps. At every step, I want to "join" heaps, but keep both joined indexes pointing to the same element.
In short, given a list of mutable elements, such that list[a] is list[b] returns True, how can I assign (list[a] = [new]) while keeping list[a] and list[b] references to the same mutable container?
So for example, if each heap was represented by a letter, we would have
t is ['a', 'b', 'c', 'd']
t is ['ac', 'b', 'ac', 'd'] (0 and 2 merge)
t is ['abc', 'abc', 'abc', 'd'] (0 and 1 merge, but since 0 also refers to 2, it is as if 0/2 and 1 merged... into 0/1/2)
At this stage, if I did set(map(id, t)), I would expect it to have only TWO elements.
My problem is that I cannot seem to affect the object being pointed to directly, so I have to iterate over the entire list, picking out any ids that match either merge index, and assign directly.
Is there any way to change the underlying object rather than all pointers to it?
Full Example of desired behavior:
>>> my_list = [['a'], ['b'], ['c']]
>>> merge(my_list, 0, 2) # mutates, returns None
>>> my_list
[['a', 'c'], ['b'], ['a', 'c']]
>>> my_list[0] is my_list[2]
True
>>> merge(my_list, 0, 1)
>>> my_list
[['a', 'c', 'b'], ['a', 'c', 'b'], ['a', 'c', 'b']]
>>> my_list[0] is my_list[1]
True
>>> my_list[1] is my_list[2]
True
The problem is that, if within merge I simply call
my_list[arg1] = my_list[arg2] = my_list[arg1]+my_list[arg2]
it ONLY affects the entries at arg1 and arg2. I want it to affect any other entries that may point to the elements at either my_list[arg1] or my_list[arg2], so that eventually my_list is just a collection of pointers to the same big heap, that has absorbed all the little ones.

This is as close as you are going to get:
def merge(primary, first, second):
    primary[first] += primary[second]
    primary[second] = primary[first]
first = ['a']
second = ['b']
third = ['c']
main = [first, second, third]
print(main)
merge(main, 0, 2)
print(main)
assert main[0] is main[2]
merge(main, 0, 1)
print(main)
assert main[1] is main[2]
print(first, second, third)
and the printout:
[['a'], ['b'], ['c']]
[['a', 'c'], ['b'], ['a', 'c']]
[['a', 'c', 'b'], ['a', 'c', 'b'], ['a', 'c', 'b']]
['a', 'c', 'b'] ['b'] ['c']
As you can see, the list elements all end up being the same object, but the lists that went into the main list are not.
Edit:
Unfortunately, this also fails if you don't include the first list element in every merge.
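To see the failure concretely, here is a small reproduction. It repeats the two-line merge from the answer above so it runs on its own; the merge order (0, 2), then (1, 3), then (0, 1) is just one arbitrary sequence in which index 0 is left out of a merge.

```python
def merge(primary, first, second):
    # extend the heap at `first` in place, then alias `second` to it
    primary[first] += primary[second]
    primary[second] = primary[first]

main = [['a'], ['b'], ['c'], ['d']]
merge(main, 0, 2)  # slots 0 and 2 now share one list
merge(main, 1, 3)  # slots 1 and 3 now share a different list
merge(main, 0, 1)  # slot 1 is re-pointed, but slot 3 is left behind
print(main)  # [['a', 'c', 'b', 'd'], ['a', 'c', 'b', 'd'], ['a', 'c', 'b', 'd'], ['b', 'd']]
print(len(set(map(id, main))))  # 2 distinct objects instead of 1
```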
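So there are no shortcuts here. If you want every element to either be the same element, or just be equal, you are going to have to traverse the list to get it done:
def merge(primary, first, second):
    targets = primary[first], primary[second]
    if targets[0] is targets[1]:
        return  # both slots already share one heap
    result = targets[0] + targets[1]  # new list holding both heaps
    for index, item in enumerate(primary):
        # compare by identity, so an unrelated heap with equal contents is left alone
        if item is targets[0] or item is targets[1]:
            primary[index] = result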

Chain assignment:
>>> my_list = [['a'], ['b'], ['c']]
>>> my_list[0] = my_list[2] = my_list[0]+my_list[2]
>>> my_list
[['a', 'c'], ['b'], ['a', 'c']]
>>> my_list[0] += [1,2,3]
>>> my_list
[['a', 'c', 1, 2, 3], ['b'], ['a', 'c', 1, 2, 3]]
>>>
If you first do a = c = a + c, both a and c refer to a new list object (call it list_001); when you then do a = b = a + b, a and b refer to yet another new list (list_002), while c still refers to list_001, so the result is wrong.
You can't do this even with a pointer if you don't take care of the order of assignment. I think you should maintain a map to get things right.
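One possible reading of the "maintain a map" suggestion, sketched under my own assumptions (the names make_groups and merge_groups are hypothetical): keep one integer label per slot plus a dict from label to merged heap, and build the shared-reference view from those two.

```python
def make_groups(items):
    # label[i] names the group that slot i belongs to
    label = list(range(len(items)))
    # groups maps each live label to the list holding that group's items
    groups = {i: [item] for i, item in enumerate(items)}
    return label, groups

def merge_groups(label, groups, i, j):
    a, b = label[i], label[j]
    if a == b:
        return  # already the same group
    groups[a].extend(groups.pop(b))  # fold group b's items into group a
    # relabel every slot that belonged to b, not just slot j
    for k, key in enumerate(label):
        if key == b:
            label[k] = a

label, groups = make_groups('abcd')
merge_groups(label, groups, 0, 2)
merge_groups(label, groups, 1, 3)
merge_groups(label, groups, 0, 1)
view = [groups[key] for key in label]  # every slot points at the same list
print(view)
```

The relabel loop is still linear in the number of slots, but it compares small integers and never has to reason about which list objects are aliased.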

As far as I can see, I don't think it's possible to do this using the mutability of python lists. If you have elements X and Y, both of which are already amalgamations of other elements, and you want to merge them, list mutability will not be able to telegraph the change to both sides. You can make X, Y and all the elements pointing at the same object as X be the same, or you can make X, Y and all the elements pointing at the same object as Y be the same, but not both.
I think the best bet is to define a custom object to represent the elements, which has a notion of parent nodes and ultimate parent nodes.
class Nodes(object):
    def __init__(self, input1):
        self.parent = None
        self.val = [input1]
    def ultimate_parent(self):
        # follow parent links up to the root of this group
        if self.parent:
            return self.parent.ultimate_parent()
        else:
            return self
    def merge(self, child):
        root, child_root = self.ultimate_parent(), child.ultimate_parent()
        if root is child_root:
            return  # already merged; re-linking would create a cycle
        root.val += child_root.val
        child_root.parent = root
    def __repr__(self):
        # every node displays its root's combined list
        return str(self.ultimate_parent().val)
def merge(node_list, i, j):
    node_list[i].merge(node_list[j])
list1 = [Nodes(x) for x in 'abcd']
print(list1)
merge(list1, 0, 2)
print(list1)
merge(list1, 1, 3)
print(list1)
merge(list1, 0, 1)
print(list1)
This will output:
[['a'], ['b'], ['c'], ['d']]
[['a', 'c'], ['b'], ['a', 'c'], ['d']]
[['a', 'c'], ['b', 'd'], ['a', 'c'], ['b', 'd']]
[['a', 'c', 'b', 'd'], ['a', 'c', 'b', 'd'], ['a', 'c', 'b', 'd'], ['a', 'c', 'b', 'd']]
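The Nodes class above is essentially a disjoint-set (union-find) structure. Here is a minimal index-based sketch of the same idea with the usual path-halving and union-by-size refinements; the class name HeapGroups and its methods are hypothetical, not part of the answer.

```python
class HeapGroups:
    """Union-find over list slots; each root slot owns its group's merged heap."""

    def __init__(self, values):
        self.parent = list(range(len(values)))  # parent[i] == i marks a root
        self.size = [1] * len(values)            # group sizes, valid at roots
        self.heap = [[v] for v in values]        # merged items, valid at roots

    def find(self, i):
        # walk to the root, halving the path as we go
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return  # already in the same group
        if self.size[ri] < self.size[rj]:
            ri, rj = rj, ri  # attach the smaller group under the larger one
        self.parent[rj] = ri
        self.size[ri] += self.size[rj]
        self.heap[ri].extend(self.heap[rj])
        self.heap[rj] = None  # the absorbed root no longer owns a heap

    def __getitem__(self, i):
        # every slot in a group returns the very same list object
        return self.heap[self.find(i)]


groups = HeapGroups('abcd')
groups.union(0, 2)
groups.union(1, 3)
groups.union(0, 1)
print([groups[i] for i in range(4)])
```

Because the smaller group is appended to the larger one, the item order inside a merged heap can differ from the Nodes version when the two groups have different sizes.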

Related

How to efficiently get common items from two lists that may have duplicates?

my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
The common items are:
c = ['a', 'b', 'a']
The code:
c = []
for e in my_list:
    if e in my_list_2:
        c.append(e)
...
If my_list is long, this is very inefficient, since every e in my_list_2 test is a linear scan of my_list_2. If I convert both lists into two sets, then use set's intersection() function to get the common items, I will lose the duplicates in my_list.
How to deal with this efficiently?
A dict is already a hash map, so lookups in it are practically as efficient as in a set, and you may not need to do any extra work collecting the values. If it weren't, you could pack the values into a set and check that before checking the dict.
However, a large improvement may come from writing a generator that yields the matching values, rather than building a new intermediate list, and iterating over it wherever you actually need the values:
def foo(src_dict, check_list):
    for value in check_list:
        if value in src_dict:  # dict membership is a hash lookup
            yield value
With the edit to the question, you may find you're better off packing all the inputs into a set:
def foo(src_list, check_list):
    hashmap = set(src_list)
    for value in check_list:
        if value in hashmap:
            yield value
If you know a lot about the inputs you can do better, but that's an unusual case. For example, if the lists are sorted you could use binary search (bisect), and if one list is huge and the other has very few values, you may gain something from choosing which one to turn into a set.
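For the sorted case mentioned above, a sketch using the standard bisect module. The function name common_sorted is hypothetical, and it assumes the reference list is already sorted.

```python
from bisect import bisect_left

def common_sorted(values, sorted_ref):
    """Yield each item of values that also occurs in sorted_ref.

    Assumes sorted_ref is already sorted; each lookup is a binary search.
    """
    for value in values:
        i = bisect_left(sorted_ref, value)  # leftmost position where value could go
        if i < len(sorted_ref) and sorted_ref[i] == value:
            yield value

my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']  # already sorted
print(list(common_sorted(my_list, my_list_2)))  # ['a', 'b', 'a']
```

This costs O(len(values) * log len(sorted_ref)) and keeps the duplicates from my_list; the items only need to be orderable, not hashable.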
I'm not sure about time efficiency, but personally I would always prefer a list comprehension:
[x for x in my_list if x in my_list_2]
Output
['a', 'b', 'a']
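The comprehension above still scans my_list_2 once per element. Building a set from my_list_2 first keeps the same shape and the duplicates from my_list, while making each membership test an average O(1) hash lookup (this assumes the items are hashable):

```python
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
lookup = set(my_list_2)  # built once, reused for every test
c = [x for x in my_list if x in lookup]
print(c)  # ['a', 'b', 'a']
```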
First, utilize the set.intersection() method to get the intersecting values in the list. Then, use a nested list comprehension to include the duplicates based on the original list's count on each value:
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
c = [x for x in set(my_list).intersection(set(my_list_2)) for _ in range(my_list.count(x))]
print(c)
The above may be slower than just
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
c = []
for e in my_list:
    if e in my_list_2:
        c.append(e)
print(c)
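But when the lists are significantly larger, the code block using the set.intersection() method can be significantly faster, as long as there are few distinct common values: each my_list.count(x) call is a full scan of my_list. Note also that its result is grouped by value, in arbitrary set order, rather than following the original order of my_list.
Sorry for not reading the post carefully; it is now not possible to delete this answer. However, it is an attempt at a solution.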
c = lambda my_list, my_list_2: (my_list, my_list_2, list(set(my_list).intersection(set(my_list_2))))
print("(list_1,list_2,duplicate_items) ->", c(my_list, my_list_2))
Output:
(list_1,list_2,duplicate_items) -> (['a', 'b', 'a', 'd', 'e', 'f'], ['a', 'b', 'c'], ['b', 'a'])
Or simply:
[i for i in my_list if i in my_list_2]
output:
['a', 'b', 'a']

Python: weird behavior of list append function

I'm new to Python.
I'm trying to implement Kruskal's algorithm. Here is my code:
#c={('v','a'):5,('v','g'):5,('g','a'):5,('v','b'):7,('v','e'):4,('f','e'):8,('f','b'):8,('f','c'):11,('c','e'):9,('c','d'):7,('c','b'):9,('e','d'):8}
#n=8
def argmin(c):
    m=1e100
    r=()
    for i in c:
        if c[i]<m:
            m=c[i]
            r=i
    return r
def kruskal (n, c):
    T=[]
    B=[]
    while len(T)<n-1:
        E=argmin(c)
        c.pop(E)
        e=[]
        e+=E
        a0=0
        a1=0
        f0=-1
        f1=-1
        cross=0
        # print('e before',e)
        for i in range(len(B)):
            for j in range(len(B[i])):
                if e[0]==B[i][j]:
                    cross+=1
                    f0=i
                if e[1]==B[i][j]:
                    cross+=1
                    f1=i
            if cross==2: break
            else: cross=0
        if cross==2: continue
        # print('e after',e)
        T.append(e)
        # print('T',T)
        if f0!=-1 and f1!=-1:
            B[f0].extend(B[f1])
            B.pop(f1)
        elif f0!=-1:
            B[f0].extend(e[1])
        elif f1!=-1:
            B[f1].extend(e[0])
        else :
            B.append(e)
        # print('B', B)
    return T
The problem I have is with the line T.append(e): in the result, T[0] is not what I expect.
If I input the following:
c={('v','a'):5,('v','g'):5,('g','a'):5,('v','b'):7,('v','e'):4,('f','e'):8,('f','b'):8,('f','c'):11,('c','e'):9,('c','d'):7,('c','b'):9,('e','d'):8}
n=8
Then I call my function:
kruskal(8, c)
I get:
[['v', 'e', 'g', 'a', 'b', 'f', 'c', 'd'], ['v', 'g'], ['v', 'a'], ['v', 'b'], ['c', 'd'], ['f', 'b'], ['e', 'd']]
Where as I expect the following:
[['v', 'e'], ['v', 'g'], ['v', 'a'], ['v', 'b'], ['c', 'd'], ['f', 'b'], ['e', 'd']]
I haven't gone through all your code, but one thing stands out: you are sometimes appending a reference to a list that is modified later. A simple fix:
from copy import deepcopy
T.append(deepcopy(e)) #in place of T.append(e)
Will give output as
[['v', 'e'], ['g', 'a'], ['v', 'a'], ['v', 'b'], ['c', 'd'], ['f', 'b'], ['e', 'd']]
Example:
>>> a = [1, 2]
>>> b = a
>>> b.append(3)
>>> a
[1, 2, 3]
>>> b
[1, 2, 3]
What is happening here:
>>> a = [1, 2]
>>> b = a
>>> id(a), id(b)
(140526873334272, 140526873334272)
That is, the list [1, 2] is bound to two names, a and b, so any change to the list is visible through every name bound to it.
The answer by itzmeontv is correct about the underlying reason for your problem: the first member of T is a mutable list, which you then modify further on in your code. The code path by which this happens in your case is a little complicated:
On the first iteration through your algorithm, you append e to T, so T[0] refers to the list named e at that point (contents ['v','e']).
On that same iteration, you always hit this block:
else :
    B.append(e)
This means that B[0] now refers to the list named e as well - i.e. T[0] refers to the same list as that in B[0] and any changes to B[0] will be reflected in T[0] as lists are mutable.
The next iteration then creates a new e - but the list referred to in B[0] and T[0] is still that same list i.e. ['v','e']
You then continue to extend this list when you do B[f0].extend(B[f1]) in the cases where f0 is zero.
This is a symptom of assigning a list not making a copy in Python, a detailed treatment of which is given in this question if you want to deepen your understanding. A range of options of how to append your list to T is given along with timings - you might, for instance, like to write T.append(e[:]), with the slice notation implicitly making a copy of e at the point you append it.
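A stripped-down illustration of the aliasing described above and of the slice-copy fix; the vertex names are placeholders, and none of the algorithm's other logic is included.

```python
e = ['v', 'e']
T_alias, B_alias = [], []
T_alias.append(e)          # T_alias[0] and B_alias[0] are the same list
B_alias.append(e)
B_alias[0].extend(['g', 'a'])
print(T_alias[0])          # ['v', 'e', 'g', 'a']: changed through B_alias

e = ['v', 'e']
T_copy, B_copy = [], []
T_copy.append(e[:])        # the slice makes a shallow copy
B_copy.append(e)
B_copy[0].extend(['g', 'a'])
print(T_copy[0])           # ['v', 'e']: unaffected
```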
One thing you might want to consider is whether you need the members of T to be mutable once they are added to T. If you do not - an option may be to append tuples rather than lists - i.e.
T.append(tuple(e))
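For comparison, a minimal sketch of Kruskal's algorithm that tracks components with a disjoint-set (union-find) dict instead of the list of lists B, so no list is ever shared between the result and the bookkeeping. It is not a fix of the code in the question; it only assumes the question's input format, a dict mapping (u, v) tuples to weights.

```python
def kruskal(n, c):
    """Return the minimum spanning tree edges as [u, v] lists, lightest first.

    n is the number of vertices; c maps (u, v) tuples to edge weights,
    as in the question. c is not modified.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)  # unseen vertices start as their own root
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    # sorted() is stable, so equal weights keep the dict's insertion order
    for (u, v), _weight in sorted(c.items(), key=lambda item: item[1]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # u and v are already connected: this edge would close a cycle
        parent[rv] = ru  # join the two components
        tree.append([u, v])  # a fresh list that nothing else refers to
        if len(tree) == n - 1:
            break
    return tree


graph = {('v','a'):5, ('v','g'):5, ('g','a'):5, ('v','b'):7, ('v','e'):4, ('f','e'):8,
         ('f','b'):8, ('f','c'):11, ('c','e'):9, ('c','d'):7, ('c','b'):9, ('e','d'):8}
print(kruskal(8, graph))
```

Ties between equal weights are broken by insertion order here, so the chosen edges can differ from the expected output in the question while having the same total weight (44 for this graph).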

Removing duplicates (not by using set)

My data look like this:
let = ['a', 'b', 'a', 'c', 'a']
How do I remove the duplicates? I want my output to be something like this:
['b', 'c']
When I use the set function, I get:
set(['a', 'c', 'b'])
This is not what I want.
One option would be (as derived from Ritesh Kumar's answer here)
let = ['a', 'b', 'a', 'c', 'a']
onlySingles = [x for x in let if let.count(x) < 2]
which gives
>>> onlySingles
['b', 'c']
Try this:
>>> let
['a', 'b', 'a', 'c', 'a']
>>> dict.fromkeys(let).keys()
['a', 'c', 'b']
>>>
Sort the input, then removing duplicates becomes trivial:
data = ['a', 'b', 'a', 'c', 'a']
def uniq(data):
    last = None
    result = []
    for item in data:
        if item != last:
            result.append(item)
        last = item
    return result
print(uniq(sorted(data)))
# prints ['a', 'b', 'c']
This is basically the shell's cat data | sort | uniq idiom.
The cost is O(N * log N), same as with a tree-based set.
Instead of sorting, or linearly re-scanning the main list to count each item's occurrences, count the occurrences once and then filter on the items that appear exactly once:
>>> from collections import Counter
>>> let = ['a', 'b', 'a', 'c', 'a']
>>> [k for k, v in Counter(let).items() if v == 1]
['b', 'c']
You have to look at the sequence at least once regardless, although it makes sense to limit the number of times you do so.
If you really want to avoid any type of set or other hashed container (perhaps because you can't use them?), then yes, you can sort it and then use:
>>> from itertools import groupby, islice
>>> [k for k,v in groupby(sorted(let)) if len(list(islice(v, 2))) == 1]
['b', 'c']
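If you want the singles in their original order regardless of how Counter happens to order its keys, you can count once and then filter the original list. This is still linear time, assuming hashable items:

```python
from collections import Counter

let = ['a', 'b', 'a', 'c', 'a']
counts = Counter(let)                          # one pass to count
singles = [x for x in let if counts[x] == 1]   # second pass keeps input order
print(singles)  # ['b', 'c']
```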

-Python- Ordering lists based on a format

I'm new to programming in general, so looking to really expand my skills here. I'm trying to write a script that will grab a list of strings from an object, then order them based on a template of my design. Any items not in the template will be added to the end.
Here's how I'm doing it now, but could someone suggest a better/more efficient way?
originalList = ['b', 'a', 'c', 'z', 'd']
listTemplate = ['a', 'b', 'c', 'd']
listFinal = []
for thing in listTemplate:
    if thing in originalList:
        listFinal.append(thing)
        originalList.pop(originalList.index(thing))
for thing in originalList:
    listFinal.append(thing)
    originalList.pop(originalList.index(thing))
Try this:
originalList = ['b', 'a', 'c', 'z', 'd']
listTemplate = ['a', 'b', 'c', 'd']
order = { element:index for index, element in enumerate(listTemplate) }
sorted(originalList, key=lambda element: order.get(element, float('+inf')))
=> ['a', 'b', 'c', 'd', 'z']
This is how it works:
First, we build a dictionary indicating, for each element in listTemplate, its relative order with respect to the others. For example a is 0, b is 1 and so on
Then we sort originalList. If one of its elements is present in the order dictionary, its relative position is used for ordering. If it's not present, the key is positive infinity, which guarantees that the elements not in listTemplate end up at the end; because Python's sort is stable, they keep their original relative order.
The solution in the question produces the right result for this input, but it is not very pythonic. In particular, whenever you have to build a new list, try to use a list comprehension instead of explicit looping and appending. It's also not good practice to "destroy" the input list (using pop() in this case); popping from a list while iterating over it, as the second loop does, can even skip elements.
You can create a dict from the listTemplate list; that way the expensive (O(N)) list.index operations are reduced to O(1) lookups.
>>> lis1 = ['b', 'a', 'c', 'z', 'd']
>>> lis2 = ['a', 'b', 'c', 'd']
Use enumerate to create a dict with the items as keys (assuming the items are hashable) and their indices as values.
>>> dic = { x:i for i,x in enumerate(lis2) }
Now dic looks like:
{'a': 0, 'b': 1, 'c': 2, 'd': 3}
Now, for each item in lis1, we look up its index in dic; if the key is not found, we return float('inf').
Function used as key:
def get_index(key):
    return dic.get(key, float('inf'))
Now sort the list:
>>> lis1.sort(key=get_index)
>>> lis1
['a', 'b', 'c', 'd', 'z']
For the final step of your original code, you can just use:
listFinal += originalList
and it will add the remaining items to the end.
If you don't want to create a new dictionary at all (at the cost of a linear lis2.index search per item):
>>> len_lis1=len(lis1)
>>> lis1.sort(key = lambda x: lis2.index(x) if x in lis2 else len_lis1)
>>> lis1
['a', 'b', 'c', 'd', 'z']
Here is a way that has better computational complexity:
# add all elements of originalList not found in listTemplate to the back of listTemplate
s = set(listTemplate)
listTemplate.extend(el for el in originalList if el not in s)
# now sort
rank = {el:index for index,el in enumerate(listTemplate)}
listFinal = sorted(originalList, key=rank.get)
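A different approach, not taken from the answers above, skips sorting entirely and builds the result in linear time. This sketch assumes hashable items, and the name order_by_template is hypothetical. Duplicates in the input are kept, and items missing from the template stay in their original order.

```python
from collections import Counter

def order_by_template(items, template):
    """Template items first (in template order), then the rest (in input order)."""
    counts = Counter(items)
    # emit each template entry as often as it occurs in items, removing it from counts
    head = [x for x in template for _ in range(counts.pop(x, 0))]
    # whatever is still counted was not in the template
    tail = [x for x in items if x in counts]
    return head + tail

originalList = ['b', 'a', 'c', 'z', 'd']
listTemplate = ['a', 'b', 'c', 'd']
print(order_by_template(originalList, listTemplate))  # ['a', 'b', 'c', 'd', 'z']
```

Unlike the loop in the question, this leaves both input lists untouched.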

Obtain all subtrees in value

Given "a.b.c.d.e" I want to obtain all subtrees, efficiently, e.g. "b.c.d.e" and "c.d.e", but not "a.d.e" or "b.c.d".
Real world situation:
I have foo.bar.baz.example.com and I want all possible subdomain trees.
listed = "a.b.c.d.e".split('.')
subtrees = ['.'.join(listed[idx:]) for idx in range(len(listed))]
Given your sample data, subtrees equals ['a.b.c.d.e', 'b.c.d.e', 'c.d.e', 'd.e', 'e'].
data = "a.b.c.d.e"
items = data.split('.')
['.'.join(items[i:]) for i in range(len(items))]
def parts(s, sep):
    while True:
        yield s
        try:
            # cut the string after the next sep
            s = s[s.index(sep)+1:]
        except ValueError:
            # no `sep` left
            break
print(list(parts("a.b.c.d.e", '.')))
# ['a.b.c.d.e', 'b.c.d.e', 'c.d.e', 'd.e', 'e']
Not sure if this is what you want, but slicing the list from each start position yields that:
>>> x = "a.b.c.d.e"
>>> k = x.split('.')
>>> k
['a', 'b', 'c', 'd', 'e']
>>> l = []
>>> for el in range(len(k)): l.append(k[el+1:])
...
>>> l
[['b', 'c', 'd', 'e'], ['c', 'd', 'e'], ['d', 'e'], ['e'], []]
>>> [".".join(l1) for l1 in l if l1]
['b.c.d.e', 'c.d.e', 'd.e', 'e']
>>>
Of course, the above was meant to illustrate the process; you could combine the steps into a one-liner.
[Edit: I thought this answer, although similar to others here, explains the process well.]
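Combining the two steps from the session above into one expression gives something like the following. Starting the range at 1 skips the full name, matching that session's output.

```python
x = "a.b.c.d.e"
k = x.split('.')
# join every proper suffix of the label list; start at 1 to skip the full name
subtrees = [".".join(k[i:]) for i in range(1, len(k))]
print(subtrees)  # ['b.c.d.e', 'c.d.e', 'd.e', 'e']
```

Applied to the real-world example, "foo.bar.baz.example.com" gives ['bar.baz.example.com', 'baz.example.com', 'example.com', 'com'].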
