Python array value unexpectedly changes after function call

I have a small function which uses one list to populate another. For some reason, the source list gets modified, even though I don't have a single line that modifies the source list arr. I am probably missing something about how Python handles variable scope and lists. I expect the list arr to remain the same after the function call.
numTestRows = 5
m = 2
def getTestData():
    data['test'] = []
    size_c = len(arr)
    for i in range(numTestRows):
        data['test'].append(arr[i%size_c])
        for j in range(m):
            data['test'][i].append('xyz')
#just a 2x5 str matrix
arr = [['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'j']]
print('Array before: ')
print( arr)
data = {}
getTestData()
print('Array after: ')
print( arr)
Output
Array before:
[['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'j']]
Array after:
[['a', 'b', 'c', 'd', 'e', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz'], ['f', 'g', 'h', 'i', 'j', 'xyz', 'xyz', 'xyz', 'xyz']]

You've mis-handled the references in your list of lists (not a matrix). Perhaps if we break this down a little more, you can see what's happening. Start your main program with the two char lists as separate variables:
left = ['a', 'b', 'c', 'd', 'e']
right = ['f', 'g', 'h', 'i', 'j']
arr = [left, right]
Now, look at what happens within your function at the critical lines. On this first iteration, size_c is 2, i is 0 ...
data['test'].append(arr[i%size_c])
This will append arr[0] to data['test'], which started as an empty list. Now for the critical part: arr[0] is not a new list; rather, it's a reference to the list we now know as left in the main program. There is only one copy of this list.
Now, when we get into the next loop, we hit the statement:
data['test'][i].append('xyz')
data['test'][i] is a reference to the same list as left ... and this explains the appending to the original list.
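To see the aliasing directly, here is a minimal sketch; the name container is hypothetical and stands in for data['test']. The `is` operator checks whether two names refer to the very same object:

```python
left = ['a', 'b', 'c', 'd', 'e']
container = []              # plays the role of data['test']
container.append(left)      # stores a reference to left, not a copy
container[0].append('xyz')  # mutates the one shared list object

print(left)                  # ['a', 'b', 'c', 'd', 'e', 'xyz']
print(container[0] is left)  # True: both names point at one list
```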
You can easily copy a list with the suffix [:], making a new slice of the entire list. For instance:
data['test'].append(arr[i%size_c][:])
... and this should solve your reference problem.
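Putting the fix together, here is a sketch that passes arr and the sizes in as parameters instead of reading globals; the name get_test_data and its signature are assumptions, not the asker's code. list(row) makes the same shallow copy as row[:]. A shallow copy is enough here because the inner elements are immutable strings; rows that themselves contained lists would need copy.deepcopy.

```python
def get_test_data(arr, num_test_rows=5, m=2):
    test = []
    size_c = len(arr)
    for i in range(num_test_rows):
        # Copy the row so the appends below do not touch arr
        test.append(list(arr[i % size_c]))
        for _ in range(m):
            test[i].append('xyz')
    return {'test': test}

arr = [['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'j']]
data = get_test_data(arr)
print(arr)              # [['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'j']]
print(data['test'][0])  # ['a', 'b', 'c', 'd', 'e', 'xyz', 'xyz']
```

Returning the dict instead of filling a global data also makes it obvious at the call site what the function produces.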

Related

Python inserting element to a list varying with each iteration

I am trying to insert an element into a list at several positions. But each insertion changes the length of the list, so the loop never reaches the last element.
my_list = ['a', 'b', 'c', 'd', 'e', 'a']
aq = len(my_list)
for i in range(aq):
    if my_list[i] == 'a':
        my_list.insert(i+1, 'g')
        aq = aq+1
print(my_list)
The output I am getting is -
['a', 'g', 'b', 'c', 'd', 'e', 'a']
The output I am trying to get is -
['a', 'g', 'b', 'c', 'd', 'e', 'a', 'g']
How can I get that?
Changing aq in the loop does not change the range: range(aq) is evaluated once, when the loop starts, and the loop keeps iterating over that original range. There are two ways to do this. The easy way is to build a new list:
newlist = []
for c in my_list:
    newlist.append(c)
    if c == 'a':
        newlist.append('g')
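The trickier way is to modify the list in place: use my_list.index('a', start) (lists have no .find() method) to find the next instance of 'a' at or after position start, insert a 'g' after it, then keep searching from just past the inserted 'g'.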
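A sketch of that in-place approach, assuming a hypothetical helper named insert_after_each; list.index(x, start) raises ValueError once no further match exists, which ends the loop:

```python
def insert_after_each(items, target, value):
    """Insert value after every occurrence of target, modifying items in place."""
    i = 0
    while True:
        try:
            # Search for target starting at position i
            i = items.index(target, i)
        except ValueError:
            break  # no more occurrences
        items.insert(i + 1, value)
        i += 2  # skip the match and the value just inserted
    return items

my_list = ['a', 'b', 'c', 'd', 'e', 'a']
insert_after_each(my_list, 'a', 'g')
print(my_list)  # ['a', 'g', 'b', 'c', 'd', 'e', 'a', 'g']
```

Advancing by 2 after each insert means the search never revisits the inserted value, so it also terminates when value equals target.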
Here is a concise way to write it using itertools.chain.from_iterable from the standard library:
from itertools import chain
my_list = ['a', 'b', 'c', 'd', 'e', 'a']
my_list = list(chain.from_iterable((x, "g") if x == "a" else (x,) for x in my_list))
# ['a', 'g', 'b', 'c', 'd', 'e', 'a', 'g']
Here, every occurrence of "a" is replaced with "a", "g" in the list; every other element is wrapped in a one-item tuple, so it is passed through whole.
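chain.from_iterable iterates over each item the generator yields, so with an `else x` branch every non-"a" element is itself iterated. That only works when those elements are one-character strings (a non-iterable element such as an int would raise TypeError). A quick check with a hypothetical multi-character element shows why the one-item tuple (x,) is the safer form:

```python
from itertools import chain

items = ['a', 'bc', 'a']  # hypothetical input with a longer string
split = list(chain.from_iterable((x, "g") if x == "a" else x for x in items))
kept = list(chain.from_iterable((x, "g") if x == "a" else (x,) for x in items))
print(split)  # ['a', 'g', 'b', 'c', 'a', 'g']  ('bc' was split apart)
print(kept)   # ['a', 'g', 'bc', 'a', 'g']
```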

Python: Combine all dict key values based on one particular key being the same

I know there are a million questions like this, I just can't find an answer that works for me.
I have this:
list1 = [{'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H']}, {'assembly_id': '1', 'asym_id_list': ['C', 'D', 'F', 'I', 'J']}, {'assembly_id':2,'asym_id_list':['D,C'],'auth_id_list':['C','V']}]
If the assembly_ids are the same, I want to combine the values of the other keys the dicts have in common.
In this example, assembly_id 1 appears twice, so the input above would turn into:
[{'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H','C', 'D', 'F', 'I', 'J']},{'assembly_id':2,'asym_id_list':['D,C'],'auth_id_list':['C','V']}]
In theory there can be n assembly_ids (i.e. assembly 1 could appear in the list 10 or 20 times, not just 2) and there can be up to two other lists to combine (asym_id_list and auth_id_list).
I was looking at this method:
new_dict = {}
assembly_list = [] #to keep track of assemblies already seen
for dict_name in list1: #for each dict in the list
    if dict_name['assembly_id'] not in assembly_list: #if the assembly id is new
        new_dict['assembly_id'] = dict_name #this line is wrong, add the entry to new_dict
        assembly_list.append(new_dict['assembly_id']) #append the id to 'assembly_list'
    else:
        new_dict['assembly_id'].append(dict_name) #else if it's already seen, append the dictionaries together, this is wrong
print(new_dict)
The output is wrong:
{'assembly_id': {'assembly_id': 2, 'asym_id_list': ['D,C'], 'auth_id_list': ['C', 'V']}}
But I think the idea is right, that I should open a new list and dict, and if not seen before, append; whereas if it has been seen before...combine? But it's just the specifics I'm not getting?
You are thinking along the right lines. We can use a dictionary m that maps each assembly_id to its merged dictionary, which also keeps track of the assembly_ids already visited. Whenever a new assembly_id is encountered, we add it to m; if m already contains that assembly_id, we just extend its asym_id_list and auth_id_list:
def merge(dicts):
    m = {}  # keeps track of the visited assembly_ids
    for d in dicts:
        key = d['assembly_id']  # assembly_id is used as merge/grouping key
        if key in m:
            # Two independent ifs: one entry may carry both lists
            if 'asym_id_list' in d:
                m[key]['asym_id_list'] = m[key].get('asym_id_list', []) + d['asym_id_list']
            if 'auth_id_list' in d:
                m[key]['auth_id_list'] = m[key].get('auth_id_list', []) + d['auth_id_list']
        else:
            m[key] = dict(d)  # shallow copy so the input dicts are not modified
    return list(m.values())
Result:
# merge(list1)
[
{
'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H', 'C', 'D', 'F', 'I', 'J']
},
{
'assembly_id': 2, 'asym_id_list': ['D,C'], 'auth_id_list': ['C', 'V']
}
]
Use a dict keyed on assembly_id to collect all the data for a given key; you can then go back and generate a list of dicts in the original format if needed.
>>> from collections import defaultdict
>>> from typing import Dict, List
>>> id_lists: Dict[str, List[str]] = defaultdict(list)
>>> for d in list1:
...     id_lists[d['assembly_id']].extend(d['asym_id_list'])
...
>>> combined_list = [{
... 'assembly_id': id, 'asym_id_list': id_list
... } for id, id_list in id_lists.items()]
>>> combined_list
[{'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H', 'C', 'D', 'F', 'I', 'J']}, {'assembly_id': 2, 'asym_id_list': ['D,C']}]
>>>
(edit) didn't see the bit about auth_id_lists because it's hidden in the scroll in the original code -- same strategy applies, just either use two dicts in the first step or make it a dict of some collection of lists (e.g. a dict of dicts of lists, with the outer dict keyed on assembly_id values and the inner dict keyed on the original field name).
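A sketch of the dict-of-dicts-of-lists idea from the edit above, covering both asym_id_list and auth_id_list; the function name merge_id_lists and the fields parameter are hypothetical:

```python
from collections import defaultdict

list1 = [{'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H']},
         {'assembly_id': '1', 'asym_id_list': ['C', 'D', 'F', 'I', 'J']},
         {'assembly_id': 2, 'asym_id_list': ['D,C'], 'auth_id_list': ['C', 'V']}]

def merge_id_lists(dicts, fields=('asym_id_list', 'auth_id_list')):
    # Outer dict keyed on assembly_id, inner dict keyed on the field name
    grouped = defaultdict(lambda: defaultdict(list))
    for d in dicts:
        inner = grouped[d['assembly_id']]
        for field in fields:
            if field in d:
                inner[field].extend(d[field])  # copies items into a new list
    # Rebuild the original list-of-dicts shape
    return [{'assembly_id': key, **inner} for key, inner in grouped.items()]

print(merge_id_lists(list1))
```

Because extend copies the items into fresh lists, the input dicts in list1 are left untouched.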
@Samwise has provided a good answer to the question you asked, and this is not intended to replace that. However, I am going to make a suggestion about the way you are keeping the data after the merge. I would put this in a comment, but there is no way to keep code formatting in a comment and it is a bit too big as well.
Before that, I think that you have a typo in your example data. I think that you meant the 'D,C' in 'assembly_id':2,'asym_id_list':['D,C'] to be separate strings like this: 'assembly_id':2,'asym_id_list':['D', 'C']. I am going to assume that below, but if not it does not change any of the code or comments.
Instead of the merged structure being a list of dictionaries like this:
merge_l = [
{'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H', 'C', 'D', 'F', 'I', 'J']},
{'assembly_id': 2, 'asym_id_list': ['D', 'C'], 'auth_id_list': ['C', 'V']}
]
I would recommend not using a list as the top-level structure, but a dictionary keyed by the value of the assembly_id, so it would be a dictionary whose values are dictionaries, like this:
merge_d = { '1': {'asym_id_list': ['A', 'B', 'E', 'G', 'H', 'C', 'D', 'F', 'I', 'J']},
'2': {'asym_id_list': ['D', 'C'], 'auth_id_list': ['C', 'V']}
}
or if you want to keep the 'assembly_id' as well, like this:
merge_d = { '1': {'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H', 'C', 'D', 'F', 'I', 'J']},
'2': {'assembly_id': 2, 'asym_id_list': ['D', 'C'], 'auth_id_list': ['C', 'V']}
}
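That last one can be achieved by changing the return in @Samwise's merge() method to return m instead of converting the dict to a list. (With the sample data the keys will then be '1' and 2, since assembly_id is a string in one entry and an integer in the other.)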
One other comment on @Samwise's code, just so you are aware of it: the combined lists can contain duplicates. If the original data had 'asym_id_list': ['A', 'B'] in one entry and 'asym_id_list': ['B', 'C'] in another, the combined list would be 'asym_id_list': ['A', 'B', 'B', 'C']. That could be what you want, but if you want to avoid it you could use sets instead of lists as the containers for the asym_id and auth_id values.
In @Samwise's answer, change the code to something like this:
def merge(dicts):
    m = {} # keeps track of the visited assembly_ids
    for d in dicts:
        key = d['assembly_id'] # assembly_id is used as merge/grouping key
        if key in m:
            if 'asym_id_list' in d:
                m[key]['asym_id_list'] = m[key].get('asym_id_list', set()) | set(d['asym_id_list'])
            if 'auth_id_list' in d:
                m[key]['auth_id_list'] = m[key].get('auth_id_list', set()) | set(d['auth_id_list'])
        else:
            m[key] = {'assembly_id': d['assembly_id']}
            if 'asym_id_list' in d:
                m[key]['asym_id_list'] = set(d['asym_id_list'])
            if 'auth_id_list' in d:
                m[key]['auth_id_list'] = set(d['auth_id_list'])
    return m
If you go this way, you might want to reconsider the key names 'asym_id_list' and 'auth_id_list' since they are sets not lists. But that may be constrained by the other code around this and what it expects.
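If the merged values should stay lists (so the key names remain accurate) while still dropping duplicates, one alternative, not taken from either answer, is dict.fromkeys: dict keys are unique and, since Python 3.7, keep insertion order, so the first occurrence of each item survives in its original position. The helper name dedupe_keep_order is hypothetical:

```python
def dedupe_keep_order(values):
    # dict keys are unique and remember insertion order (Python 3.7+)
    return list(dict.fromkeys(values))

combined = ['A', 'B'] + ['B', 'C']
print(dedupe_keep_order(combined))  # ['A', 'B', 'C']
```

It could be applied to each merged list after the grouping step, at the cost of one extra pass per list.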

replace duplicate values in a list with 'x'?

I am trying to understand the process of creating a function that can replace duplicate strings in a list of strings. For example, I want to convert this list
mylist = ['a', 'b', 'b', 'a', 'c', 'a']
to this
mylist = ['a', 'b', 'x', 'x', 'c', 'x']
Initially, I know I need to create my function and iterate through the list:
def replace(foo):
    newlist= []
    for i in foo:
        if foo[i] == foo[i+1]:
            foo[i].replace('x')
    return foo
However, I know there are two problems with this. The first is that I get an error stating
list indices must be integers or slices, not str
so I believe I should instead be iterating over the range of indices of this list, but I'm not sure how to implement it. The other problem is that this would only help me if the duplicate letter comes directly after the current element (i).
Unfortunately, that's as far as my understanding of the problem reaches. If anyone can provide some clarification on this procedure for me, I would be very grateful.
Go through the list, and keep track of what you've seen in a set. Replace things you've seen before in the list with 'x':
mylist = ['a', 'b', 'b', 'a', 'c', 'a']
seen = set()
for i, e in enumerate(mylist):
    if e in seen:
        mylist[i] = 'x'
    else:
        seen.add(e)
print(mylist)
# ['a', 'b', 'x', 'x', 'c', 'x']
Simple Solution.
my_list = ['a', 'b', 'b', 'a', 'c', 'a']
new_list = []
for i in range(len(my_list)):
    if my_list[i] in new_list:
        new_list.append('x')
    else:
        new_list.append(my_list[i])
print(my_list)
print(new_list)
# output
#['a', 'b', 'b', 'a', 'c', 'a']
#['a', 'b', 'x', 'x', 'c', 'x']
The other solutions use indexing, which isn't necessarily required.
Really simply, you could check whether the value is already in the new list: if it is, append 'x'; otherwise append the value itself. If you wanted to use a function:
old = ['a', 'b', 'b', 'a', 'c']
def replace_dupes_with_x(l):
    tmp = list()
    for char in l:
        if char in tmp:
            tmp.append('x')
        else:
            tmp.append(char)
    return tmp
new = replace_dupes_with_x(old)
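Checking `char in tmp` scans the whole result list on every step, so the function above is quadratic for long inputs. A sketch of the same function that keeps a separate set of seen values (as in the first answer) makes each membership test constant time on average; the name replace_dupes_with_set and the marker parameter are additions for illustration:

```python
def replace_dupes_with_set(items, marker='x'):
    seen = set()
    result = []
    for item in items:
        if item in seen:
            result.append(marker)  # repeat: replace it with the marker
        else:
            seen.add(item)         # first occurrence: remember and keep it
            result.append(item)
    return result

print(replace_dupes_with_set(['a', 'b', 'b', 'a', 'c', 'a']))
# ['a', 'b', 'x', 'x', 'c', 'x']
```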
You can use the following solution:
from collections import defaultdict
mylist = ['a', 'b', 'b', 'a', 'c', 'a']
ret, appear = [], defaultdict(int)
for c in mylist:
    appear[c] += 1
    ret.append(c if appear[c] == 1 else 'x')
Which will give you:
['a', 'b', 'x', 'x', 'c', 'x']

Removing a string from a list inside a list of lists

What I'm trying to achieve here, in a small local test, is to iterate over a parent list whose elements are themselves lists of strings.
I'm trying to achieve the following...
1) Get the first array in the parent array
2) Get the rest of the list without the one taken
3) Iterate through the taken array, so I take each of the strings
4) Look for each string taken in all of the rest of the arrays
5) If found, remove it from the array
So far I've tried the following, but I'm getting an error and I don't know where it comes from...
lines = map(lambda l: str.replace(l, "\n", ""),
            list(open("PATH", 'r')))
splitLines = map(lambda l: l.split(','), lines)
for line in splitLines:
    for keyword in line:
        print(list(splitLines).remove(keyword))
But I'm getting the following error...
ValueError: list.remove(x): x not in list
Which isn't true as 'x' isn't a string included in any of the given test arrays.
SAMPLE INPUT (Comma separated lines in a text file, so I get an array of strings per line):
[['a', 'b', 'c'], ['e', 'f', 'g'], ['b', 'q', 'a']]
SAMPLE OUTPUT:
[['a', 'b', 'c'], ['e', 'f', 'g'], ['q']]
You can keep track of previously seen strings in a set for fast lookups, and use a simple list comprehension to keep only the elements not already in that set.
x = [['a', 'b', 'c'], ['e', 'f', 'g'], ['b', 'q', 'a']]  # the sample input
prev = set()
final = []
for i in x:
    final.append([j for j in i if j not in prev])
    prev = prev.union(set(i))
print(final)
Output:
[['a', 'b', 'c'], ['e', 'f', 'g'], ['q']]
inputlist = [['a', 'b', 'c'], ['e', 'f', 'g'], ['b', 'q', 'a']]
scanned=[]
res=[]
for i in inputlist:
    temp=[]
    for j in i:
        if j in scanned:
            pass
        else:
            scanned.append(j)
            temp.append(j)
    res.append(temp)
print(res)
[['a', 'b', 'c'], ['e', 'f', 'g'], ['q']]
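As for the ValueError in the question: the x in the message is just the placeholder name of list.remove's argument, not the string 'x'. splitLines is a map object, which can be iterated only once, so list(splitLines) inside the loop collects the remaining lines (each a list of strings), and removing a single string such as 'a' from a list of lists fails. A sketch that reads the file once into a list and then applies the set-based filter might look like this; read_rows and drop_seen are hypothetical names, and "PATH" is the placeholder from the question:

```python
def read_rows(path):
    # Read every comma-separated line into a list of strings, once
    with open(path) as f:
        return [line.rstrip('\n').split(',') for line in f]

def drop_seen(rows):
    seen = set()
    result = []
    for row in rows:
        # Keep only strings not present in any earlier row
        result.append([s for s in row if s not in seen])
        seen.update(row)
    return result

rows = [['a', 'b', 'c'], ['e', 'f', 'g'], ['b', 'q', 'a']]  # e.g. read_rows("PATH")
print(drop_seen(rows))  # [['a', 'b', 'c'], ['e', 'f', 'g'], ['q']]
```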

Python Splitting String and Sorting Alphabetically

Can somebody please help me to create a Python program whereby the unsorted list is split up into groups of 2, arranged alphabetically within each group of two? The program should then create a new list in alphabetical order by taking the next greatest letter from the correct pair. Please don't tell me to do this in a different way, as my method must work as written above. Thanks :)
unsorted = ['B', 'D', 'A', 'G', 'F', 'E', 'H', 'C']
n = 4
num = float(len(unsorted))/n
l = [ unsorted [i:i + int(num)] for i in range(0, (n-1)*int(num), int(num))]
l.append(unsorted[(n-1)*int(num):])
print(l)
complete = unsorted.split()
print(complete)
If I understand correctly, you are trying to turn unsorted into the following list:
['D', 'G', 'F', 'H']
If that is the case, I have modified your code so that it produces the correct output.
unsorted = ['B', 'D', 'A', 'G', 'F', 'E', 'H', 'C']
n = 4
num = float(len(unsorted))/n
l = [ unsorted [i:i + int(num)] for i in range(0, (n-1)*int(num), int(num))]
l.append(unsorted[(n-1)*int(num):])
# This part has been added in. It sorts each sublist,
# then takes the second element (the character further along the alphabet)
for i in range(len(l)):
    l[i] = sorted(l[i])[1]
print(l)
Assuming you are looking for ['D', 'G', 'F', 'H'], I think the following code is similar to yours but a little clearer:
a = ['B', 'D', 'A', 'G', 'F', 'E', 'H', 'C']
Iterate over pairs of items in the list, a[i:i+2], using a list comprehension, and sort the items in each pair in reverse alphabetical order using sorted(list, reverse=True):
groups = [ sorted(a[i:i+2],reverse=True) for i in range(0,len(a),2)]
Then iterate over the sorted groups, selecting the first item of each pair:
result = [i[0] for i in groups]
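Both answers assume the goal is ['D', 'G', 'F', 'H']. If the question instead means building the fully sorted list by repeatedly taking the smallest remaining letter from whichever sorted pair holds it (a k-way merge, as in merge sort), a sketch of that interpretation could look like this; merge_pairs is a hypothetical name, and the standard library's heapq.merge(*pairs) produces the same ordering:

```python
def merge_pairs(items):
    # Step 1: split into groups of two and sort each group
    pairs = [sorted(items[i:i + 2]) for i in range(0, len(items), 2)]
    # positions[p] is the index of the next unused letter in pairs[p]
    positions = [0] * len(pairs)
    result = []
    while len(result) < len(items):
        best = None
        # Step 2: find the pair whose next unused letter is smallest
        for p, pair in enumerate(pairs):
            if positions[p] < len(pair):
                if best is None or pair[positions[p]] < pairs[best][positions[best]]:
                    best = p
        result.append(pairs[best][positions[best]])
        positions[best] += 1
    return result

unsorted = ['B', 'D', 'A', 'G', 'F', 'E', 'H', 'C']
print(merge_pairs(unsorted))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
```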
