Combining nested lists and removing duplicates - python

Hoping to get some help on a newbie question. I want to sort through a bunch of sublists and if any items inside of the sublists appear twice, remove them, and merge the values in their respective lists into a new sublist.
As an example:
my_currentlist = [[A,B],[C,D],[B,E]]
my_desiredlist = [[A,E],[C,D]]
Any ideas?
My failed attempt (sorry, new to python):
for i in range(len(items)):
j=i+1
for j in range(len(items)):
if items[i][0]==items[j][0]:
items.remove(items[i])
items.remove(items[j])
items.append([items[i][1], items[j][1]])
elif items[i][0]==items[j][1]:
items.remove(items[i])
items.remove(items[j])
items.append([items[i][1], items[j][0]])
elif items[i][1]==items[j][0]:
items.remove(items[i])
items.remove(items[j])
items.append([items[i][0], items[j][1]])
elif items[i][1]==items[j][1]:
items.remove(items[i])
items.remove(items[j])
items.append([items[i][0], items[j][0]])

you need a double loop, the first one to take one element (sublist) from the list, and the second one to check the presence of its the elements in the rest of the sublist
this should do the trick
>>> my_currentlist = [["A","B"],["C","D"],["B","E"]]
>>> my_desiredlist = []
>>> while my_currentlist:
a,b = my_currentlist.pop(0)
for i,temp in enumerate(my_currentlist):
if a in temp:
temp.remove(a)
a = temp[0]
my_currentlist.pop(i)
break
elif b in temp:
temp.remove(b)
b = temp[0]
my_currentlist.pop(i)
break
my_desiredlist.append( [a,b] )
>>> my_desiredlist
[['A', 'E'], ['C', 'D']]
>>>

Related

Python trouble with matching tuples

For reference this is my code:
list1 = [('10.180.13.101', '10.50.60.30', 'STCMGMTUNIX01')]
list2 = [('0.0.0.0', 'STCMGMTUNIX01')]
for i in list1:
for j in list2:
for k in j:
print (k)
if k.upper() in i:
matching_app.add(j)
for i in matching_app:
print (i)
When I run it, it does not match. This list can contain two or three variables and I need it to add it to the matching_app set if ANY value from list2 = ANY value from list1. It does not work unless the tuples are of equal length.
Any direction to how to resolve this logic error will be appreciated.
You can solve this in a few different ways. Here are two approaches:
Looping:
list1 = [('10.180.13.101', '10.50.60.30', 'STCMGMTUNIX01')]
list2 = [('0.0.0.0', 'STCMGMTUNIX01')]
matches = []
for i in list1[0]:
if i in list2[0]:
matches.append(i)
print(matches)
#['STCMGMTUNIX01']
List Comp with a set
merged = list(list1[0] + list2[0])
matches2 = set([i for i in merged if merged.count(i) > 1])
print(matches2)
#{'STCMGMTUNIX01'}
I'm not clear of what you want to do. You have two lists, each containing exactly one tuple. There also seems to be one missing comma in the first tuple.
For finding an item from a list in another list you can:
list1 = ['10.180.13.101', '10.50.60.30', 'STCMGMTUNIX01']
list2 = ['0.0.0.0', 'STCMGMTUNIX01']
for item in list2:
if item.upper() in list1: # Check if item is in list
print(item, 'found in', list1)
Works the same way with tuples.

Return True and stop if first item in list is matches, return false if no item matches

I wrote a function to check a string based on the following conditions:
Check if any word in the string is in list 1.
If string in list 1, return position to see if the item next to it in
the string is in list 2.
If in list 1 and in list 2, return True.
Else return False.
The tricky part is that those lists are really long. So to be efficient, the first occurrence in list 2 will return True and move on to the next string.
Below is the code that I have choked up but I doubt it is working as intended. Efficiency is really the key her as well. I tried creating a full combination of list 1 and list 2 and then loop through but it seems a bit insane looping over 100 million times per string.
1st Code:
def double_loop(words):
all_words = words.split()
indices = [n for n,x in enumerate(allwords) if x in list1]
if len(indices) != 0:
for pos in indices[:-1]:
if all_words[pos+1] in list2:
return True
#the purpose here is not to continue looking #through the whole list of indices, but end as soon as list2 contains an item 1 position after.
break
else:
return False
I am unsure whether the code above is working based on my logic above. In comparison;
2nd Code:
def double_loop(words):
all_words = words.split()
indices = [n for n,x in enumerate(allwords) if x in list1]
if len(indices) != 0:
indices2 = [i for i in indices[:-1] if all_words[i+1] in list2]
if len(indices2) != 0:
return True
else:
return False
else:
return False
has the same run time based on my test data.
I guess the clearer question is, will my first code actually run till it finds the "first" element that meets its criteria and break. Or is it still running through all the elements like the second code.
If I understand your question correctly then you will achieve the fastest lookup time with a precomputed index. This index is the set of all items of list 1, where the following item is in list 2.
# Build index, can be reused
index = set(item1 for i, item1 in enumerate(list1[:-1]) if list1[i+1] in list2)
def double_loop(words):
return any(word in index for word in words.split())
Index lookups will be done in constant time, no matter how long list1 and list2 are. Index building will take longer when list1 and list2 get longer. Note that making list2 a set might speed up index building when list1 is large.
You could iterate the main list only once by combining your criteria in a single list comprehension:
list1 = ['h', 'e', 'l', 'l', 'o']
list2 = ['a', 'h', 'e', 'p', 'l', 'o']
set_list2 = set(list2)
check = [item for x, item in enumerate(list1) if item in set_list2 and list2[x+1] == item]
If you want the function to short-circuit:
def check(list1, list2):
set_list2 = set(list2)
for x, item in enumerate(list1):
if item in set_list2 and list2[x+1] == item:
return True
return False
a = check(list1, list2)
First, your list1 and list2 should not be lists but sets. That is because a set lookup is a hash lookup and a list lookup is a linear search.
And you don't need a nested loop.
def single_loop(words):
all_words = words.split()
for w1, w2 in ((all_words[i],all_words[i+1]) for i in range(len(all_words)-1)):
if w1 in set1 and w2 in set2:
return True
else:
return False

searching a list for duplicate values in python 2.7

list1 = ["green","red","yellow","purple","Green","blue","blue"]
so I got a list, I want to loop through the list and see if the colour has not been mentioned more than once. If it has it doesn't get append to the new list.
so u should be left with
list2 = ["red","yellow","purple"]
So i've tried this
list1 = ["green","red","yellow","purple","Green","blue","blue","yellow"]
list2 =[]
num = 0
for i in list1:
if (list1[num]).lower == (list1[i]).lower:
num +=1
else:
list2.append(i)
num +=1
but i keep getting an error
Use a Counter: https://docs.python.org/2/library/collections.html#collections.Counter.
from collections import Counter
list1 = ["green","red","yellow","purple","green","blue","blue","yellow"]
list1_counter = Counter([x.lower() for x in list1])
list2 = [x for x in list1 if list1_counter[x.lower()] == 1]
Note that your example is wrong because yellow is present twice.
The first problem is that for i in list1: iterates over the elements in the list, not indices, so with i you have an element in your hands.
Next there is num which is an index, but you seem to increment it rather the wrong way.
I would suggest that you use the following code:
for i in range(len(list1)):
unique = True
for j in range(len(list1)):
if i != j and list1[i].lower() == list1[j].lower():
unique = False
break
if unique:
list2.append(list1[i])
How does this work: here i and j are indices, you iterate over the list and with i you iterate over the indices of element you potentially want to add, now you do a test: you check if somewhere in the list you see another element that is equal. If so, you set unique to False and do it for the next element, otherwise you add.
You can also use a for-else construct like #YevhenKuzmovych says:
for i in range(len(list1)):
for j in range(len(list1)):
if i != j and list1[i].lower() == list1[j].lower():
break
else:
list2.append(list1[i])
You can make the code more elegant by using an any:
for i in range(len(list1)):
if not any(i != j and list1[i].lower() == list1[j].lower() for j in range(len(list1))):
list2.append(list1[i])
Now this is more elegant but still not very efficient. For more efficiency, you can use a Counter:
from collections import Counter
ctr = Counter(x.lower() for x in list1)
Once you have constructed the counter, you look up the amount of times you have seen the element and if it is less than 2, you add it to the list:
from collections import Counter
ctr = Counter(x.lower() for x in list1)
for element in list1:
if ctr[element.lower()] < 2:
list2.append(element)
Finally you can now even use list comprehension to make it very elegantly:
from collections import Counter
ctr = Counter(x.lower() for x in list1)
list2 = [element for element in list1 if ctr[element.lower()] < 2]
Here's another solution:
list2 = []
for i, element in enumerate(list1):
if element.lower() not in [e.lower() for e in list1[:i] + list1[i + 1:]]:
list2.append(element)
By combining the built-ins, you can keep only unique items and preserve order of appearance, with just a single empty class definition and a one-liner:
from future_builtins import map # Only on Python 2 to get generator based map
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):
pass
list1 = ["green","red","yellow","purple","Green","blue","blue"]
# On Python 2, use .iteritems() to avoid temporary list
list2 = [x for x, cnt in OrderedCounter(map(str.lower, list1)).items() if cnt == 1]
# Result: ['red', 'yellow', 'purple']
You use the Counter feature to count an iterable, while inheriting from OrderedDict preserves key order. All you have to do is filter the results to check and strip the counts, reducing your code complexity significantly.
It also reduces the work to one pass of the original list, with a second pass that does work proportional to the number of unique items in the list, rather than having to perform two passes of the original list (important if duplicates are common and the list is huge).
Here's yet another solution...
Firstly, before I get started, I think there is a typo... You have "yellow" listed twice and you specified you'd have yellow in the end. So, to that, I will provide two scripts, one that allows the duplicates and one that doesn't.
Original:
list1 = ["green","red","yellow","purple","Green","blue","blue","yellow"]
list2 =[]
num = 0
for i in list1:
if (list1[num]).lower == (list1[i]).lower:
num +=1
else:
list2.append(i)
num +=1
Modified (does allow duplicates):
colorlist_1 = ["green","red","yellow","purple","Green","blue","blue","yellow"]
colorlist_2 = []
seek = set()
i = 0
numberOfColors = len(colorlist_1)
while i < numberOfColors:
if numberOfColors[i].lower() not in seek:
seek.add(numberOfColors[i].lower())
colorlist_2.append(numberOfColors[i].lower())
i+=1
print(colorlist_2)
# prints ["green","red","yellow","purple","blue"]
Modified (does not allow duplicates):
EDIT
Willem Van Onsem's answer is totally applicable and already thorough.

Iterating over lists of lists in Python

I have a list of lists:
lst1 = [["(a)", "(b)", "(c)"],["(d)", "(e)", "(f)", "(g)"]]
I want to iterate over each element and perform some string operations on them for example:
replace("(", "")
I tried iterating over the list using:
for l1 in lst1:
for i in l1:
lst2.append(list(map(str.replace("(", ""), l1)))
I wanted the out result to be the same as original list of lists but without the parenthesis. Also, I am looking for a method in editing lists of lists and not really a specific solution to this question.
Thank you,
Edit:
Yes, you should use normal for-loops if you want to:
Preform multiple operations on each item contained in each sub-list.
Keep both the main list as well as the sub-lists as the same objects.
Below is a simple demonstration of how to do this:
main = [["(a)", "(b)", "(c)"], ["(d)", "(e)", "(f)", "(g)"]]
print id(main)
print id(main[0])
print id(main[1])
print
for sub in main:
for index,item in enumerate(sub):
### Preform operations ###
item = item.replace("(", "")
item = item.replace(")", "")
item *= 2
sub[index] = item # Reassign the item
print main
print
print id(main)
print id(main[0])
print id(main[1])
Output:
25321880
25321600
25276288
[['aa', 'bb', 'cc'], ['dd', 'ee', 'ff', 'gg']]
25321880
25321600
25276288
Use a nested list comprehension:
>>> lst1 = [["(a)", "(b)", "(c)"],["(d)", "(e)", "(f)", "(g)"]]
>>> id(lst1)
35863808
>>> lst1[:] = [[y.replace("(", "") for y in x] for x in lst1]
>>> lst1
[['a)', 'b)', 'c)'], ['d)', 'e)', 'f)', 'g)']]
>>> id(lst1)
35863808
>>>
The [:] will keep the list object the same.
I just did what you did, i used the fact that each element of a list can be assigned a new (or updated) value:
>>> lst1 = [["(a)", "(b)", "(c)"],["(d)", "(e)", "(f)", "(g)"]]
>>> for x in range(len(lst1)):
for y in range(len(lst1[x])):
lst1[x][y] = lst1[x][y].replace("(", "")
>>> lst1
[['a)', 'b)', 'c)'], ['d)', 'e)', 'f)', 'g)']]
EDIT
This is how you do it with the "real problem" that you mentioned in the comment:
a = [[(12.22, 12.122, 0.000)], [(1232.11, 123.1231, 0.000)]]
some_num = 10
for x in range(len(a)):
b = list(a[x][0])
for y in range(len(b)):
b[y] *= some_num
a[x] = tuple(b)
print(a)
OUTPUT:
[(122.2, 121.22, 0.0), (12321.099999999999, 1231.231, 0.0)]
^ All elements have been multiplied by a number and the original format is kept
This is how it works:
So you have the initial list 'a' that has two sublists each with only ONE element (the tuple that contains the x,y,z coordinates). I go through list 'a' and make the tuples a list and set them equal to 'b' (so the fourth line has a value of [12.22, 12.122, 0.000] the first time around (and the next tuple (as a list) the next time around).
Then I go through each of the elements in 'b' (the tuple converted into a list) and multiply each element in that tuple by a number with the use of the increment operator (+=, -=, /=, *=). Once this loop is done, I set that same position in the master list 'a' equal to the tuple of the previously converted tuple. < If this doesn't make sense, what I'm saying is that the initial tuples are converted into lists (then operated on), and then converter back to tuples (since you want it to end up with the same format as before).
Hope this helps!
>>> lst1 = [["(a)", "(b)", "(c)"],["(d)", "(e)", "(f)", "(g)"]]
>>> [[j.strip('()') for j in i] for i in lst1]
[['a', 'b', 'c'], ['d', 'e', 'f', 'g']]
>>> [[j.lstrip('(') for j in i] for i in lst1]
[['a)', 'b)', 'c)'], ['d)', 'e)', 'f)', 'g)']]

Python wont remove items from list

filtered_list = ['PerezHilton', 'tomCruise', 'q', 'p']
#BIO[user]['follows'] is just a list of strings say ['a', 'b', 'katieh']
#specs is also a string say eg. 'katieh'
for user in filtered_list:
if specs not in BIO[user]['follows']:
filtered_list.remove(user)
The above code for some reson gives this error "ValueError: list.remove(x): x not in list" but clearly 'p' is in the list so why is it not detecting 'p' but it is finding 'q'??
Im soo stumped but any help is appreciated, thanks
** SORRY i FIXED IT NOW *
The list comprehension that does this correctly in one line is at the bottom of the post. Here's some insight into the problem first.
Don't do things like:
for item in list_:
list_.remove(item)
because bad and confusing things happen.
>>> list_ = range(10)
>>> for item in list_:
... list_.remove(item)
...
>>> list_
[1, 3, 5, 7, 9]
Every time you remove an item, you change the indexes for the rest of the items which messes up the loop. One good way to remove items from a list while you're traversing it is to do it by index and work backwards so that removals don't affect the rest of the iterations. This is better because if you remove the 9'th element, then the 8'th element is still the 8'th element but the 10'th element becomes the 9'th element. If you've already dealt with that element, then you don't care what its index is.
>>> list_ = range(10)
>>> for i in xrange(len(list_) - 1, -1, -1):
... del list_[i]
...
>>> list_
[]
Or with a while loop:
i = len(list_)
while i:
i -= 1
del list_[i]
So in your case, the code would look something like
users[:] = [user for user in users if specs in BIO[user]['follows']]
because this is a filtering job and those are best done with list comprehensions. The point of the [:] is that it assigns to a slice of the list instead of clobbering the reference to the list. This means that every other reference to the list will be updated. It's essentially in-place, except that a copy is made before overwriting the original list. For the sake of completeness, here's how to do it with a while loop.
i = len(users)
while i:
i -= 1
if specs not in BIO[users[i]]['follows']:
del users[i]
You could do this if you wanted it done in place. No copy of the list is made here.
Why are you iterating?
>>> un = ['PerezHilton', 'tomCruise', 'q', 'p']
>>> un.remove('p')
>>> un
['PerezHilton', 'tomCruise', 'q']

Categories