Python: weird behavior of list append function

I'm a newbie in Python.
I try to make a Kruskal algo. Here is my code:
#c={('v','a'):5,('v','g'):5,('g','a'):5,('v','b'):7,('v','e'):4,('f','e'):8,('f','b'):8,('f','c'):11,('c','e'):9,('c','d'):7,('c','b'):9,('e','d'):8}
#n=8
def argmin(c):
    m = 1e100
    r = ()
    for i in c:
        if c[i] < m:
            m = c[i]
            r = i
    return r
def kruskal(n, c):
    T = []
    B = []
    while len(T) < n - 1:
        E = argmin(c)
        c.pop(E)
        e = []
        e += E
        a0 = 0
        a1 = 0
        f0 = -1
        f1 = -1
        cross = 0
        # print('e avant', e)
        for i in range(len(B)):
            for j in range(len(B[i])):
                if e[0] == B[i][j]:
                    cross += 1
                    f0 = i
                if e[1] == B[i][j]:
                    cross += 1
                    f1 = i
                if cross == 2: break
                else: cross = 0
        if cross == 2: continue
        # print('e apres', e)
        T.append(e)
        # print('T', T)
        if f0 != -1 and f1 != -1:
            B[f0].extend(B[f1])
            B.pop(f1)
        elif f0 != -1:
            B[f0].extend(e[1])
        elif f1 != -1:
            B[f1].extend(e[0])
        else:
            B.append(e)
        # print('B', B)
    return T
The problem I have is on the line with T.append(e): in the result, T[0] is not what I expect.
If I input the following:
c={('v','a'):5,('v','g'):5,('g','a'):5,('v','b'):7,('v','e'):4,('f','e'):8,('f','b'):8,('f','c'):11,('c','e'):9,('c','d'):7,('c','b'):9,('e','d'):8}
n=8
Then I call my function:
kruskal(8, c)
I get:
[['v', 'e', 'g', 'a', 'b', 'f', 'c', 'd'], ['v', 'g'], ['v', 'a'], ['v', 'b'], ['c', 'd'], ['f', 'b'], ['e', 'd']]
Whereas I expect the following:
[['v', 'e'], ['v', 'g'], ['v', 'a'], ['v', 'b'], ['c', 'd'], ['f', 'b'], ['e', 'd']]

I haven't read all of your code, but one thing stands out: in places you are appending references to the same list. A simple fix:
from copy import deepcopy
T.append(deepcopy(e)) #in place of T.append(e)
Will give output as
[['v', 'e'], ['g', 'a'], ['v', 'a'], ['v', 'b'], ['c', 'd'], ['f', 'b'], ['e', 'd']]
Example
a = [1, 2]
b = a
b.append(3)
>>>a
[1,2,3]
>>>b
[1,2,3]
What is happening here
a = [1,2]
b = a
>>>id(a), id(b)
(140526873334272, 140526873334272)
That is, the list [1,2] is tagged by two variables, a and b. So any change to the list will affect every variable tagged to it.
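To break this aliasing, make a copy at the point of assignment; a minimal sketch using a slice copy:

```python
a = [1, 2]
b = a[:]               # slice copy: b is a new list with the same elements
b.append(3)            # modifies only b, not a
print(a)               # [1, 2]
print(b)               # [1, 2, 3]
print(id(a) == id(b))  # False: two distinct list objects
```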

The answer by itzmeontv is correct regarding the underlying reason for your problem: the first member of T is a mutable list, which you then modify further on in your code. The code path that leads to this in your case is a little complicated:
On the first iteration through your algorithm, you append e to T - so T[0] refers to the list named e at that point (contents ['v','e'])
On that same iteration, you always hit this block:
else:
    B.append(e)
This means that B[0] now refers to the list named e as well - i.e. T[0] refers to the same list as that in B[0] and any changes to B[0] will be reflected in T[0] as lists are mutable.
The next iteration then creates a new e - but the list referred to in B[0] and T[0] is still that same list i.e. ['v','e']
You then continue to extend this list when you do B[f0].extend(B[f1]) in the cases where f0 is zero
This is a symptom of assigning a list not making a copy in Python, a detailed treatment of which is given in this question if you want to deepen your understanding. A range of options of how to append your list to T is given along with timings - you might, for instance, like to write T.append(e[:]), with the slice notation implicitly making a copy of e at the point you append it.
One thing you might want to consider is whether you need the members of T to be mutable once they are added to T. If you do not - an option may be to append tuples rather than lists - i.e.
T.append(tuple(e))
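A minimal side-by-side of the three options discussed above (deep copy, slice copy, tuple snapshot), showing that later mutations of e no longer leak into T:

```python
from copy import deepcopy

e = ['v', 'e']
T = []
T.append(deepcopy(e))   # deep copy: fully independent, even for nested lists
T.append(e[:])          # shallow slice copy: enough here, since e holds strings
T.append(tuple(e))      # immutable snapshot: cannot be modified later
e.extend(['g', 'a'])    # further changes to e do not affect T
print(T)                # [['v', 'e'], ['v', 'e'], ('v', 'e')]
```

Since e only ever contains strings, the cheap slice copy is sufficient here; deepcopy matters only when the copied list itself contains mutable elements.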

Related

How to efficiently get common items from two lists that may have duplicates?

my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
The common items are:
c = ['a', 'b', 'a']
The code:
c = []
for e in my_list:
    if e in my_list_2:
        c.append(e)
...
If the my_list is long, this would be very inefficient. If I convert both lists into two sets, then use set's intersection() function to get the common items, I will lose the duplicates in my_list.
How to deal with this efficiently?
A dict is already a hashmap, so lookups are practically as efficient as with a set, and you may not need to do any extra work collecting the values; if that weren't the case, you could pack the values into a set before checking the dict.
However, a large improvement may be to use a generator for the values, rather than creating a new intermediate list, and iterate over it where you actually need the values:
def foo(src_dict, check_list):
    for value in check_list:
        if value in src_dict:
            yield value
With the edit, you may find you're better off packing all the inputs into a set:
def foo(src_list, check_list):
    hashmap = set(src_list)
    for value in check_list:
        if value in hashmap:
            yield value
If you know a lot about the inputs you can do better, but that's an unusual case (for example, if the lists are ordered you could bisect, or if you have a huge verifying list and very few values to check against it, you may find some efficiency in ordering them and building a set).
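Putting the set idea together for the original lists: converting the lookup list to a set gives O(1) membership tests while the comprehension over my_list preserves its duplicates. A minimal sketch:

```python
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']

lookup = set(my_list_2)   # O(1) membership tests instead of O(len(my_list_2))
c = [x for x in my_list if x in lookup]
print(c)                  # ['a', 'b', 'a'] -- duplicates from my_list survive
```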
I am not sure about time efficiency but, personally speaking, a list comprehension would always be more appealing to me:
[x for x in my_list if x in my_list_2]
Output
['a', 'b', 'a']
First, utilize the set.intersection() method to get the intersecting values in the list. Then, use a nested list comprehension to include the duplicates based on the original list's count on each value:
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
c = [x for x in set(my_list).intersection(set(my_list_2)) for _ in range(my_list.count(x))]
print(c)
The above may be slower than just
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
c = []
for e in my_list:
    if e in my_list_2:
        c.append(e)
print(c)
But when the lists are significantly larger, the code block utilizing the set.intersection() method will be significantly more efficient (faster).
Sorry for not reading the post carefully; it is no longer possible to delete this, but here is an attempt at a solution.
c = lambda my_list, my_list_2: (my_list, my_list_2, list(set(my_list).intersection(set(my_list_2))))
print("(list_1,list_2,duplicate_items) ->", c(my_list, my_list_2))
Output:
(list_1,list_2,duplicate_items) -> (['a', 'b', 'a', 'd', 'e', 'f'], ['a', 'b', 'c'], ['b', 'a'])
Or it can be:
[i for i in my_list if i in my_list_2]
output:
['a', 'b', 'a']

Iterate over list with sequence according to given dependencies from dict

I have a list lst and I want to iteratively transform each element in the list using a function f:
lst = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
def f(x):
    # do something
    return x
I can iteratively apply this transformation over the list:
new_lst = [f(x) for x in lst]
So far so good. Now we want to introduce sequencing for processing elements in the lst. The dict ensure_transformed describes a list of elements that need to be transformed before processing a given element:
ensure_transformed = {'a': ['b', 'e'],
                      'b': ['e', 'f'],
                      'c': ['a', 'd'],
                      'd': [],
                      'e': [],
                      'f': []
                      }
The interpretation is - Before transforming a, transform b and e, before transforming b, transform e and f and so on.
(The meaning of 'd' : [] is that no elements need to be processed before d. g is absent from ensure_transformed, implying that g shouldn't be transformed.)
How can I transform lst using function f while ensuring sequencing from ensure_transformed?
Options on the table:
Use recursion to go through list while keeping track of elements that have already been transformed. This approach has become messy to manage.
Re-order the list using the dict and then iterate over the list. I haven't tried this approach yet, but it seems promising.
I'm open to other approaches.
As mentioned by #CristianCiupitu in the comments, this is now very easy to do with Python 3.9 by using the graphlib module:
import graphlib
lst = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
def f(x): return x.upper() #for example
ensure_transformed = {'a': ['b', 'e'],
                      'b': ['e', 'f'],
                      'c': ['a', 'd'],
                      'd': [],
                      'e': [],
                      'f': []
                      }
ts = graphlib.TopologicalSorter(ensure_transformed)
processed = [f(x) for x in ts.static_order()]
print(processed)
#prints ['E', 'F', 'D', 'B', 'A', 'C']
The ensure_transformed dictionary describes a directed graph where {'a': ['b', 'e']} means that there is an edge b → a and an edge e → a.
Assuming that the graph has no cycles, your task is equivalent to applying topological sorting to it, for which you can find various algorithms.
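For Python versions before 3.9, one such algorithm is Kahn's: repeatedly process nodes whose prerequisites have all been handled. A minimal sketch (the helper name topo_order is mine, not from the original):

```python
from collections import deque

ensure_transformed = {'a': ['b', 'e'], 'b': ['e', 'f'], 'c': ['a', 'd'],
                      'd': [], 'e': [], 'f': []}

def topo_order(deps):
    # deps maps node -> list of prerequisites that must be processed first
    indegree = {node: len(pre) for node, pre in deps.items()}
    dependents = {node: [] for node in deps}
    for node, pre in deps.items():
        for p in pre:
            dependents[p].append(node)    # edge p -> node
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in dependents[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if len(order) != len(deps):
        raise ValueError("cycle detected in dependencies")
    return order

def f(x):
    return x.upper()

print([f(x) for x in topo_order(ensure_transformed)])
# one valid ordering, e.g. ['D', 'E', 'F', 'B', 'A', 'C']
```

Like graphlib's static_order(), this only yields nodes that appear in the dependency dict, so 'g' is skipped.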

Python array value unexpectedly changes after function call

I have a small function which uses one list to populate another. For some reason, the source list gets modified, even though I don't have a single line that manipulates the source list arr. I am probably missing something about the way Python deals with the scope of variables and lists. My expected output is for the list arr to remain the same after the function call.
numTestRows = 5
m = 2
def getTestData():
    data['test'] = []
    size_c = len(arr)
    for i in range(numTestRows):
        data['test'].append(arr[i%size_c])
        for j in range(m):
            data['test'][i].append('xyz')
#just a 2x5 str matrix
arr = [['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'j']]
print('Array before: ')
print( arr)
data = {}
getTestData()
print('Array after: ')
print( arr)
Output
Array before:
[['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'j']]
Array after:
[['a', 'b', 'c', 'd', 'e', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz'], ['f', 'g', 'h', 'i', 'j', 'xyz', 'xyz', 'xyz', 'xyz']]
You've mis-handled the references in your list of lists (not a matrix). Perhaps if we break this down a little more, you can see what's happening. Start your main program with the two char lists as separate variables:
left = ['a', 'b', 'c', 'd', 'e']
right = ['f', 'g', 'h', 'i', 'j']
arr = [left, right]
Now, look at what happens within your function at the critical lines. On this first iteration, size_c is 2, i is 0 ...
data['test'].append(arr[i%size_c])
This will append arr[0] to data[test], which started as an empty list. Now for the critical part: arr[0] is not a new list; rather, it's a reference to the list we now know as left in the main program. There is only one copy of this list.
Now, when we get into the next loop, we hit the statement:
data['test'][i].append('xyz')
data['test'][i] is a reference to the same list as left ... and this explains the appending to the original list.
You can easily copy a list with the suffix [:], making a new slice of the entire list. For instance:
data['test'].append(arr[i%size_c][:])
... and this should solve your reference problem.
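Note that the slice copy is shallow: it is enough here because the rows contain only strings. If the rows themselves held mutable elements, copy.deepcopy would be needed. A small sketch of the difference:

```python
import copy

arr = [['a', 'b'], ['c', 'd']]

shallow = arr[:]            # new outer list, but the same inner lists
deep = copy.deepcopy(arr)   # new outer list AND new inner lists

arr[0].append('x')          # mutate one of the original inner lists
print(shallow[0])           # ['a', 'b', 'x'] -- shallow copy shares inner lists
print(deep[0])              # ['a', 'b']      -- deep copy is fully independent
```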

Removing a string from a list inside a list of lists

What I'm trying to achieve here, in a small local test, is to iterate over an array of arrays of strings inside a parent array.
I'm trying to achieve the following...
1) Get the first array in the parent array
2) Get the rest of the list without the one taken
3) Iterate through the taken array, so I take each of the strings
4) Look for each string taken in all of the rest of the arrays
5) If found, remove it from the array
So far I've tried the following, but I'm struggling with an error whose origin I don't understand...
lines = map(lambda l: str.replace(l, "\n", ""), list(open("PATH", 'r')))
splitLines = map(lambda l: l.split(','), lines)
for line in splitLines:
    for keyword in line:
        print(list(splitLines).remove(keyword))
But I'm getting the following error...
ValueError: list.remove(x): x not in list
Which isn't true, as 'x' is a string included in the given test arrays.
SAMPLE INPUT (Comma separated lines in a text file, so I get an array of strings per line):
[['a', 'b', 'c'], ['e', 'f', 'g'], ['b', 'q', 'a']]
SAMPLE OUTPUT:
[['a', 'b', 'c'], ['e', 'f', 'g'], ['q']]
You can keep track of previously seen strings using a set for fast lookups, and using a simple list comprehension to add elements not found in the previously seen set.
x = [['a', 'b', 'c'], ['e', 'f', 'g'], ['b', 'q', 'a']]
prev = set()
final = []
for i in x:
    final.append([j for j in i if j not in prev])
    prev = prev.union(set(i))
print(final)
Output:
[['a', 'b', 'c'], ['e', 'f', 'g'], ['q']]
inputlist = [['a', 'b', 'c'], ['e', 'f', 'g'], ['b', 'q', 'a']]
scanned = []
res = []
for i in inputlist:
    temp = []
    for j in i:
        if j in scanned:
            pass
        else:
            scanned.append(j)
            temp.append(j)
    res.append(temp)
print(res)
[['a', 'b', 'c'], ['e', 'f', 'g'], ['q']]
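The `j in scanned` check on a list is O(n) per lookup; using a set for the seen strings makes each lookup O(1). A minimal variant of the same idea (the names seen and res are mine):

```python
inputlist = [['a', 'b', 'c'], ['e', 'f', 'g'], ['b', 'q', 'a']]

seen = set()
res = []
for row in inputlist:
    # keep only strings not seen in any earlier row
    res.append([s for s in row if s not in seen])
    seen.update(row)
print(res)  # [['a', 'b', 'c'], ['e', 'f', 'g'], ['q']]
```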

List of lists: Changing all references with one assignment?

Rationale: I start a script with a list that contains multiple heaps. At every step, I want to "join" heaps, but keep both joined indexes pointing to the same element.
In short, given a list of mutable elements, such that list[a] is list[b] returns True, how can I assign (list[a] = [new]) while keeping list[a] and list[b] references to the same mutable container?
So for example, if each heap was represented by a letter, we would have
t is ['a', 'b', 'c', 'd']
t is ['ac', 'b', 'ac', 'd'] (0 and 2 merge)
t is ['abc', 'abc', 'abc', 'd'] (0 and 1 merge, but since 0 also refers to 2, it is as if 0/2 and 1 merged... into 0/1/2)
at this stage, if I did a set(map(id, t)), I would expect it to only have TWO elements.
My problem is that I cannot seem to affect the object being pointed to directly, so I have to iterate over the entire list, picking out any ids that match either merge index, and assign directly.
Is there any way to change the underlying object rather than all pointers to it?
Full Example of desired behavior:
>>> my_list = [['a'], ['b'], ['c']]
>>> merge(my_list, 0, 2) # mutates, returns None
>>> my_list
[['a', 'c'], ['b'], ['a', 'c']]
>>> my_list[0] is my_list[2]
True
>>> merge(my_list, 0, 1)
>>> my_list
[['a', 'c', 'b'], ['a', 'c', 'b'], ['a', 'c', 'b']]
>>> my_list[0] is my_list[1]
True
>>> my_list[1] is my_list[2]
True
The problem is that, if within merge I simply call
my_list[arg1] = my_list[arg2] = my_list[arg1]+my_list[arg2]
it ONLY affects the entries at arg1 and arg2. I want it to affect any other entries that may point to the elements at either my_list[arg1] or my_list[arg2], so that eventually my_list is just a collection of pointers to the same big heap, that has absorbed all the little ones.
This is as close as you are going to get:
def merge(primary, first, second):
    primary[first] += primary[second]
    primary[second] = primary[first]
first = ['a']
second = ['b']
third = ['c']
main = [first, second, third]
print(main)
merge(main, 0, 2)
print(main)
assert main[0] is main[2]
merge(main, 0, 1)
print(main)
assert main[1] is main[2]
print(first, second, third)
and the printout:
[['a'], ['b'], ['c']]
[['a', 'c'], ['b'], ['a', 'c']]
[['a', 'c', 'b'], ['a', 'c', 'b'], ['a', 'c', 'b']]
(['a', 'c', 'b'], ['b'], ['c'])
As you can see, the list elements all end up being the same object, but the lists that went into the main list are not.
edit
Unfortunately, this also fails if you don't include the first list element in every merge.
So there are no short-cuts here. If you want every element to either be the same element, or just be equal, you are going to have to traverse the list to get it done:
def merge(primary, first, second):
    targets = primary[first], primary[second]
    result = primary[first] + primary[second]
    for index, item in enumerate(primary):
        if item in targets:
            primary[index] = result
Chain assignment:
>>> my_list = [['a'], ['b'], ['c']]
>>> my_list[0] = my_list[2] = my_list[0]+my_list[2]
>>> my_list
[['a', 'c'], ['b'], ['a', 'c']]
>>> my_list[0] += [1,2,3]
>>> my_list
[['a', 'c', 1, 2, 3], ['b'], ['a', 'c', 1, 2, 3]]
>>>
If you do a = c = a+c first, it produces a == c == _list_object_001; when you then do a = b = a+b, it produces a == b == _list_object_002, so c is left behind and it goes wrong.
You can't do this, even with a pointer, unless you take care of the order of assignment. I think you should maintain a map to get things right.
As far as I can see, I don't think it's possible to do this using the mutability of python lists. If you have elements X and Y, both of which are already amalgamations of other elements, and you want to merge them, list mutability will not be able to telegraph the change to both sides. You can make X, Y and all the elements pointing at the same object as X be the same, or you can make X, Y and all the elements pointing at the same object as Y be the same, but not both.
I think the best bet is to define a custom object to represent the elements, which has a notion of parent nodes and ultimate parent nodes.
class Nodes(object):
    def __init__(self, input1):
        self.parent = None
        self.val = [input1]
    def ultimate_parent(self):
        if self.parent:
            return self.parent.ultimate_parent()
        else:
            return self
    def merge(self, child):
        self.ultimate_parent().val += child.ultimate_parent().val
        child.ultimate_parent().parent = self.ultimate_parent()
    def __repr__(self):
        return str(self.ultimate_parent().val)

def merge(node_list, i, j):
    node_list[i].merge(node_list[j])
list1 = [Nodes(x) for x in 'abcd']
print(list1)
merge(list1, 0, 2)
print(list1)
merge(list1, 1, 3)
print(list1)
merge(list1, 0, 1)
print(list1)
This will output:
[['a'], ['b'], ['c'], ['d']]
[['a', 'c'], ['b'], ['a', 'c'], ['d']]
[['a', 'c'], ['b', 'd'], ['a', 'c'], ['b', 'd']]
[['a', 'c', 'b', 'd'], ['a', 'c', 'b', 'd'], ['a', 'c', 'b', 'd'], ['a', 'c', 'b', 'd']]