Find how many lists in list have the same element - python

I am new at Python, so I'm having trouble with something. I have a few string lists in one list.
list=[ [['AA','A0'],['AB','A0']],
[['AA','B0'],['AB','A0']],
[['A0','00'],['00','A0'], [['00','BB'],['AB','A0'],['AA','A0']] ]
]
And I have to find how many lists have the same element. For example, the correct result for the above list is 3 for the element ['AB','A0'] because it is the element that connects the most of them.
I wrote some code, but it's not good: it works for 2 lists in the list, but not for more.
Please help!
This is my code for the above list:
for t in range(0,len(list)-1):
    pattern=[]
    flag=True
    pattern.append(list[t])
    count=1
    rest=list[t+1:]
    for p in pattern:
        for j in p:
            if flag==False:
                break
            pair= j
            for i in rest:
                for y in i:
                    if pair==y:
                        count=count+1
                        break
            if count==len(list):
                flag=False
                break

Since your data structure is rather complex, you might want to build a recursive function, that is a function that calls itself (http://en.wikipedia.org/wiki/Recursion_(computer_science)).
This function is rather simple. You iterate through all items of the original list. If the current item is equal to the value you are searching for, you increment the number of found objects by 1. If the item is itself a list, you will go through that sub-list and find all matches in that sub-list (by calling the same function on the sub-list, instead of the original list). You then increment the total number of found objects by the count in your sub-list. I hope my explanation is somewhat clear.
alist=[[['AA','A0'],['AB','A0']],[['AA','B0'],['AB','A0']],[['A0','00'],['00','A0'],[['00','BB'],['AB','A0'],['AA','A0']]]]
def count_in_list(val, arr):
    val_is_list = isinstance(val, list)
    ct = 0
    for item in arr:
        item_is_list = isinstance(item, list)
        if item == val or (val_is_list and item_is_list and sorted(item) == sorted(val)):
            ct += 1
        if item_is_list:
            ct += count_in_list(val, item)
    return ct
print count_in_list(['AB', 'A0'], alist)
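The snippet above is Python 2 code: it uses the print statement, and sorted() on a list that mixes strings and lists raises a TypeError in Python 3. A minimal Python 3 sketch of the same recursive idea could look like the following. The name count_pair and the assumption that every leaf is a list of strings are mine, not part of the original answer.

```python
def count_pair(target, nested):
    """Count how often `target` occurs at any depth, ignoring element order."""
    wanted = sorted(target)
    total = 0
    for item in nested:
        if isinstance(item, list):
            # Assumption: a leaf is a list that contains only strings,
            # so sorting it is safe in Python 3.
            if all(isinstance(x, str) for x in item) and sorted(item) == wanted:
                total += 1
            else:
                # Not a match: descend one level (a no-op for non-matching leaves).
                total += count_pair(target, item)
    return total


data = [[['AA', 'A0'], ['AB', 'A0']],
        [['AA', 'B0'], ['AB', 'A0']],
        [['A0', '00'], ['00', 'A0'], [['00', 'BB'], ['AB', 'A0'], ['AA', 'A0']]]]

print(count_pair(['AB', 'A0'], data))  # 3
```

Unlike the loop-based version, this does not modify the input list.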

This is an iterative approach that also works in Python 3 and gets the count of all sublists:
from collections import defaultdict
d = defaultdict(int)
def counter(lst, d):
    it = iter(lst)
    nxt = next(it)
    while nxt:
        if isinstance(nxt, list):
            if nxt and isinstance(nxt[0], str):
                d[tuple(nxt)] += 1
                rev = tuple(reversed(nxt))
                if rev in d:
                    d[rev] += 1
            else:
                # nested list: append its items to lst so the iterator reaches them
                lst += nxt
        nxt = next(it,"")
    return d
print(counter(lst, d)['AB', 'A0'])
3
It will only work on data like your input; strings nested beside lists will break the code. Note that it also appends to lst while iterating.
To get a single sublist count is easier:
def counter(lst, ele):
    it = iter(lst)
    nxt = next(it)
    count = 0
    while nxt:
        if isinstance(nxt, list):
            if ele in (nxt, nxt[::-1]):
                count += 1
            else:
                lst += nxt
        nxt = next(it, "")
    return count
print(counter(lst, ['AB', 'A0']))
3

Ooookay - this maybe isn't very nice and straightforward code, but that's how I'd try to solve this. Please don't hurt me ;-)
First, I'd break the problem into three smaller ones:
Get rid of your multiple nested lists,
Count the occurrences of all value pairs in the inner lists and
Extract the most frequent value pair from the counting results.
1.
I'd still use nested lists, but only two levels deep: an outer list to iterate through, and all the two-value lists inside of it. You can find an awful lot of information about how to get rid of nested lists right here. As I'm just a beginner, I couldn't make much out of all that very detailed information - but if you scroll down, you'll find an example similar to mine. This is what I understand, this is how I can do it.
Note that it's a recursive function. As you mentioned in comments that you think this isn't easy to understand: I think you're right. I'll try to explain it somehow:
I don't know if the nesting depth is consistent in your list, and I don't want to extract the values themselves, as you want to work with lists. So this function loops through the outer list. For each element, it checks if it's a list. If not, nothing happens. If it is a list, it'll have a look at the first element inside of that list. It'll check again if it's a list or not.
If the first element inside the current list is another list, the function will be called again - recursively - but this time starting with the current inner list. This is repeated until the function finds a list whose first element is NOT a list.
In your example, it'll dig through the complete list-of-lists until it finds your first string values. Then it takes the list containing this value and puts it in another list, the one that is returned.
Oh boy, that sounds really crazy - tell me if that clarified anything... :-D
"Yo dawg, i herd you like lists, so i put a list in a list..."
def get_inner_lists(some_list):
    inner_lists = []
    for item in some_list:
        if hasattr(item, '__iter__') and not isinstance(item, basestring):
            if hasattr(item[0], '__iter__') and not isinstance(item[0], basestring):
                inner_lists.extend(get_inner_lists(item))
            else:
                inner_lists.append(item)
    return inner_lists
Whatever - call that function and you'll find your list re-arranged a little bit:
>>> foo = [[['AA','A0'],['AB','A0']],[['AA','B0'],['AB','A0']],[['A0','00'],['00','A0'],[['00','BB'],['AB','A0'],['AA','A0']]]]
>>> print get_inner_lists(foo)
[['AA', 'A0'], ['AB', 'A0'], ['AA', 'B0'], ['AB', 'A0'], ['A0', '00'], ['00', 'A0'], ['00', 'BB'], ['AB', 'A0'], ['AA', 'A0']]
2.
Now I'd iterate through those lists and build a string from their values. This will only work with lists of two values, but as this is what you showed in your example it'll do. While iterating, I'd build up a dictionary with the strings as keys and the occurrences as values. That makes it really easy to add new values and raise the counter of existing ones:
def count_list_values(some_list):
    result = {}
    for item in some_list:
        # join the two values into one string key, e.g. 'AB-A0'
        key = item[0]+'-'+item[1]
        if key not in result:
            result[key] = 1
        else:
            result[key] += 1
    return result
There you have it, all the counting is done. I don't know if it's needed, but as a side effect there are all values and all occurrences:
>>> print count_list_values(get_inner_lists(foo))
{'00-A0': 1, '00-BB': 1, 'A0-00': 1, 'AB-A0': 3, 'AA-A0': 2, 'AA-B0': 1}
3.
But you want clear results, so let's loop through that dictionary, list all keys and all values, find the maximum value - and return the corresponding key. Having built the string of two values with a separator (-), it's easy to split it and make a list out of it again:
def get_max_dict_value(some_dict):
    all_keys = []
    all_values = []
    for key, val in some_dict.items():
        all_keys.append(key)
        all_values.append(val)
    return all_keys[all_values.index(max(all_values))].split('-')
If you define these three little functions and call them combined, this is what you'll get:
>>> print get_max_dict_value(count_list_values(get_inner_lists(foo)))
['AB', 'A0']
Ta-Daa! :-)
If you really have such lists with only nine elements, and you don't need to count values that often - do it manually. By reading values and counting with fingers. It'll be so much easier ;-)
Otherwise, here you go!
Or...
...you wait until some Guru shows up and gives you a super fast, elegant one-line python command that i've never seen before, which will do the same ;-)

This is as simple as I can reasonably make it:
from collections import Counter
lst = [ [['AA','A0'],['AB','A0']],
[['AA','B0'],['AB','A0']],
[['A0','00'],['00','A0'], [['00','BB'],['AB','A0'],['AA','A0']] ]
]
def is_leaf(element):
    return (isinstance(element, list) and
            len(element) == 2 and
            isinstance(element[0], basestring)
            and isinstance(element[1], basestring))
def traverse(iterable):
    for element in iterable:
        if is_leaf(element):
            yield tuple(sorted(element))
        else:
            for value in traverse(element):
                yield value
value, count = Counter(traverse(lst)).most_common(1)[0]
print 'Value {!r} is present {} times'.format(value, count)
The traverse() generator yields a series of sorted tuples representing each item in your list. The Counter object counts the number of occurrences of each, and its .most_common(1) method returns the value and count of the most common item.
You've said recursion is too difficult, but I beg to differ: it's the simplest way possible to attack this problem. The sooner you come to love recursion, the happier you'll be. :-)
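The code above targets Python 2 (basestring, print statement). As a rough Python 3 port, assuming the same data shape (leaves are two-element lists of strings), the same idea can be sketched with str and yield from:

```python
from collections import Counter


def is_leaf(element):
    # A leaf is a two-element list whose members are both strings.
    return (isinstance(element, list)
            and len(element) == 2
            and all(isinstance(part, str) for part in element))


def traverse(iterable):
    for element in iterable:
        if is_leaf(element):
            # Sorting makes ['A0', 'AB'] and ['AB', 'A0'] count as the same pair.
            yield tuple(sorted(element))
        elif isinstance(element, list):
            yield from traverse(element)


lst = [[['AA', 'A0'], ['AB', 'A0']],
       [['AA', 'B0'], ['AB', 'A0']],
       [['A0', '00'], ['00', 'A0'], [['00', 'BB'], ['AB', 'A0'], ['AA', 'A0']]]]

value, count = Counter(traverse(lst)).most_common(1)[0]
print('Value {!r} is present {} times'.format(value, count))
# Value ('A0', 'AB') is present 3 times
```

Because each pair is sorted before counting, the reported value is ('A0', 'AB') rather than the original ['AB', 'A0'].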

Hopefully something like this is what you were looking for. It is a bit tenuous, and I would suggest that recursion is better. But since you didn't want it that way, here is some code that might work. I am not super good at Python, but I hope it will do the job:
def Compare(List):
    # Assuming that the list input is a simple list like ["A1","B0"]
    myList = [[['AA','A0'],['AB','A0']],[['AA','B0'],['AB','A0']],[['A0','00'],['00','A0'],[['00','BB'],['AB','A0'],['AA','A0']]]]
    # Create a counter that will count if the elements are the same
    myCounter = 0
    for innerList1 in myList:
        # The flag makes this count how many lists have the same element,
        # not how many elements are the same in the lists
        found = False
        for innerList2 in innerList1:
            if innerList2 == List:
                found = True
            else:
                # look one level deeper (this only handles the nesting shown above)
                for innerList3 in innerList2:
                    if innerList3 == List:
                        found = True
        if found:
            myCounter = myCounter + 1
    return myCounter


Python: Remove Strings in a List that are contained by at least one other String in the same List

I would love to filter my list of strings the following way: I want to exclude strings, if there is at least one other string in the same list that is "in" it. Or to put this differently: I want to maintain strings, if there is no other string of the same list that is in it. Case Sensitivity should play a role here, if possible.
To make this more clear, please find below an example:
My "first" list that contains every string:
elements =["tree","TREE","treeforest","water","waterfall"]
After applying the solution, I would love to receive this list:
elements = ["tree","TREE","water"]
For example: tree is in treeforest. Thus, treeforest is excluded from my list. Same applies for water and waterfall. However, tree, TREE and water should be maintained, because there are no others strings, that are "in" them.
As I'd like to apply this to a "larger" list of strings, more efficient solutions are preferred.
Hope this is understandable. Thanks a lot in advance!! Any help is highly appreciated.
Quite optimized function with 2 loops, which saves a lot of loop iterations:
def filterlist(l):
    # keep track of elements, which will be deleted
    deletelist = [False for _ in l]
    for i, el in enumerate(l):
        # already in deletelist, jump right to the next el
        if deletelist[i]:
            continue
        for j, el2 in enumerate(l):
            # comparing item to itself or el2 already in deletelist?
            # jump to next el2
            if i == j or deletelist[j]:
                continue
            # the comparison everyone expects
            if el in el2:
                deletelist[j] = True
            # also, check the other way around
            # will save loop iterations later
            elif el2 in el:
                deletelist[i] = True
                break # causes jump to next el
    # create new list, keep elements that are not in deletelist
    return [el for i, el in enumerate(l) if not deletelist[i]]
Usually built-in functions are faster, so let's compare it to Ed Ward's solution:
# result of Ed Ward's solution using timeit:
100000 loops, best of 10: 5.38 usec per loop
# filterlist function with loops using timeit:
100000 loops, best of 10: 4.42 usec per loop
Interesting, but to get a really representative result, you should run timeit with a larger element list.
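A rough way to time a solution on a larger list might look like this. It is only a sketch: the word list is randomly generated, the list size and repeat count are arbitrary choices of mine, and filter_with_any is a stand-in written for this example, so the numbers say nothing definitive about either answer.

```python
import random
import string
import timeit


def filter_with_any(elements):
    # Keep an item only if no *other* element is a substring of it.
    return [item for item in elements
            if not any(elem in item for elem in elements if elem != item)]


random.seed(0)  # fixed seed so repeated runs use the same words
words = [''.join(random.choices(string.ascii_lowercase, k=random.randint(2, 8)))
         for _ in range(500)]

# timeit accepts a callable; number=5 keeps the total run time short.
seconds = timeit.timeit(lambda: filter_with_any(words), number=5)
print('{:.4f} s for 5 runs over {} words'.format(seconds, len(words)))
```

Swapping another function into the lambda gives a like-for-like comparison on the same input.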
from copy import deepcopy
def remove_composite_words(e,elements):
    temp = [x for x in elements if e in x]
    temp = set(temp)
    elements = list(set(elements).difference(temp))
    return e,sorted(elements, key=len)
def keep_shortest_root(elements):
    elements = deepcopy(elements)
    elements = list(set(elements))
    elements = sorted(elements, key=len)
    if len(elements[0]) ==0:
        elements = elements[1:]
    results = []
    e = elements[0]
    while elements:
        e,elements = remove_composite_words(e,elements)
        results.append(e)
        if elements:
            e = elements[0]
    return results
elements =["tree","TREE","treeforest","water","waterfall",'forestTREE','tree']
keep_shortest_root(elements)
This should return
['tree', 'TREE', 'water']
How it works:
The function remove_composite_words() tests whether an element is contained in any other element of the list and collects those that match. Then it removes the matching elements from the initial list.
So if you have element 'a' and list ['a','aa','b','c'] the function will return 'a' and the list ['b','c'].
keep_shortest_root() applies remove_composite_words() to the initial list and then to the transformed list (output from remove_composite_words()) until there are no more words left.
Note that keep_shortest_root() first gets the unique words from the input list and then sorts them by length. This, combined with the fact that remove_composite_words() removes the matched words from the initial list, makes the algorithm run faster, since the number of comparisons drops with each iteration.
This is an explanation of the answer I gave in my comment
I used this code:
new_elements = list(filter(lambda item: not any(elem in item for elem in elements if elem != item), elements))
which yields:
['tree', 'TREE', 'water']
I don't know how much you know about Python generator expressions, and filter, so I'll try to explain anyway.
filter is a Python built-in function, which takes a function to use on each item in the supplied iterable (e.g. a list). In our case, the function is this:
lambda item: not any(elem in item for elem in elements if elem != item)
This function takes an item from the list (item), and then iterates over every element in the list (for elem in elements), and for each element (elem) checks if this element is in our string (item). Note that the condition if elem != item skips the element when it is equal to item, because we don't want to compare it with itself.
The function any simply keeps iterating until either the expression returned is True, or it reaches the end. If there were any matches, any returns True, but to tell filter to drop this item, we need to return False, so we invert the output from any.
We also pass to filter our list (elements), and convert the result from filter to another list.
Note: the bonus of using any instead of iterating over every item for every other item is that in the case of finding a match, we don't have to iterate over the entire list: any returns at that point. In theory, this could be faster than two nested for-loops without a break statement.
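To make the short-circuit behaviour concrete, here is a sketch of the same logic written as explicit loops. The function name keep_minimal is made up for this example; the break plays the role of any() returning early.

```python
def keep_minimal(elements):
    kept = []
    for item in elements:
        contains_other = False
        for elem in elements:
            # Skip the item itself, then test the substring relation.
            if elem != item and elem in item:
                contains_other = True
                break  # same early exit that any() performs
        if not contains_other:
            kept.append(item)
    return kept


elements = ["tree", "TREE", "treeforest", "water", "waterfall"]
print(keep_minimal(elements))  # ['tree', 'TREE', 'water']
```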
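I found a somewhat simpler solution than the ones already provided, so I thought I might chip in:
def Remove_Subset(List):
    # Build a de-duplicated copy, so the input is not modified while looping over it
    ListCopy = []
    for Element in List:
        if Element not in ListCopy:
            ListCopy.append(Element)
    for Element1 in List:
        for Element2 in List:
            if (Element1 in Element2) and (Element1 != Element2) and (Element2 in ListCopy):
                ListCopy.remove(Element2)
    return ListCopy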
elements =["treeforest","tree","TREE","treeforest","water","waterfall","tree"]
print(Remove_Subset(elements))
>>> ['tree', 'TREE', 'water']

Check number not a sum of 2 ints on a list

Given a list of integers, I want to check a second list and remove from the first only those which can not be made from the sum of two numbers from the second. So given a = [3,19,20] and b = [1,2,17], I'd want [3,19].
Seems like a cinch with two nested loops - except that I've gotten stuck with break and continue commands.
Here's what I have:
def myFunction(list_a, list_b):
    for i in list_a:
        for a in list_b:
            for b in list_b:
                if a + b == i:
                    break
            else:
                continue
            break
        else:
            continue
        list_a.remove(i)
    return list_a
I know what I need to do, just the syntax seems unnecessarily confusing. Can someone show me an easier way? TIA!
You can do like this,
In [13]: from itertools import combinations
In [15]: [item for item in a if item in [sum(i) for i in combinations(b,2)]]
Out[15]: [3, 19]
combinations gives all possible pairs from b, and sum turns each pair into its total. Then we just keep the items of a whose value is present in that list of sums.
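The one-liner rebuilds the list of sums for every item of a. A small variation, sketched here with a made-up function name (keep_pair_sums), computes the sums once and stores them in a set so that each membership test is cheap:

```python
from itertools import combinations


def keep_pair_sums(a, b):
    # Every sum of two elements taken from different positions of b, computed once.
    pair_sums = {x + y for x, y in combinations(b, 2)}
    # Keep the items of a that equal one of those sums, preserving order.
    return [item for item in a if item in pair_sums]


print(keep_pair_sums([3, 19, 20], [1, 2, 17]))  # [3, 19]
```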
Edit
If you don't want to use itertools, write a function for it. Like this,
def comb(s):
    for i, v1 in enumerate(s):
        for j in range(i+1, len(s)):
            yield [v1, s[j]]
result = [item for item in a if item in [sum(i) for i in comb(b)]]
Comments on code:
It's very dangerous to delete elements from a list while iterating over it. Perhaps you could append items you want to keep to a new list, and return that.
Your current algorithm is O(nm^2), where n is the size of list_a, and m is the size of list_b. This is pretty inefficient, but a good start to the problem.
There are also a lot of unnecessary continue and break statements, which can lead to complicated code that is hard to debug.
You also put everything into one function. You could split up each task into different functions, such as dedicating one function to finding pairs, and one to checking each item in list_a against list_b. This is a way of splitting problems into smaller problems, and using them to solve the bigger problem.
Overall I think your function is doing too much, and the logic could be condensed into much simpler code by breaking down the problem.
Another approach:
Since I found this task interesting, I decided to try it myself. My outlined approach is illustrated below.
1. You can first check if a list has a pair of a given sum in O(n) time using hashing:
def check_pairs(lst, sums):
    lookup = set()
    for x in lst:
        current = sums - x
        if current in lookup:
            return True
        lookup.add(x)
    return False
2. Then you could use this function to check whether any pair in list_b adds up to each number iterated in list_a:
def remove_first_sum(list_a, list_b):
    new_list_a = []
    for x in list_a:
        check = check_pairs(list_b, x)
        if check:
            new_list_a.append(x)
    return new_list_a
This keeps the numbers in list_a that are equal to the sum of two numbers in list_b.
3. The above can also be written with a list comprehension:
def remove_first_sum(list_a, list_b):
    return [x for x in list_a if check_pairs(list_b, x)]
Both of these work as follows:
>>> remove_first_sum([3,19,20], [1,2,17])
[3, 19]
>>> remove_first_sum([3,19,20,18], [1,2,17])
[3, 19, 18]
>>> remove_first_sum([1,2,5,6],[2,3,4])
[5, 6]
Note: check_pairs runs in O(m) time for a list_b of m items, so checking all n items of list_a is O(n*m) overall, which doesn't require anything too complicated. However, this also needs O(m) extra auxiliary space, because a set is kept to record which items have been seen.
You can do it by first creating all possible sum combinations, then filtering out elements which don't belong to that combination list
Define the input lists
>>> a = [3,19,20]
>>> b = [1,2,17]
Next we will define all possible combinations of sum of two elements
>>> y = [i+j for k,j in enumerate(b) for i in b[k+1:]]
Next we apply a function to every element of list a and check whether it is present in the list calculated above. map can be used with a conditional expression (x if ... else None), and it yields None whenever the else branch is taken. To cater for this, we filter the list to remove the None values:
>>> list(filter(None, map(lambda x: x if x in y else None,a)))
The above operation will output:
>>> [3,19]
You can also write a one-line by combining all these lines into one, but I don't recommend this.
you can try something like that:
a = [3,19,20]
b= [1,2,17,5]
n_m_s=[]
data=[n_m_s.append(i+j) for i in b for j in b if i+j in a]
print(set(n_m_s))
print("after remove")
final_data=[]
for j,i in enumerate(a):
    if i not in n_m_s:
        final_data.append(i)
print(final_data)
output:
{19, 3}
after remove
[20]

Creating a function that removes duplicates in list

I'm trying to manually make a function that removes duplicates from a list. I know there is a Python function that does something similar (set()), but I want to create my own. This is what I have:
def remove(lst):
    for i in range(len(lst)):
        aux = lst[0:i] + lst[i+1:len(lst)]
        if lst[i] in aux:
            del(lst[i])
    return lst
I was trying something like creating a sub-list with all the items except the one the for is currently on, and then check if the item is still in the list. If it is, remove it.
The problem is that it gives me an index out of range error. Does the for i in range(len(lst)): line not update every time it starts over? Since I'm removing items from the list, the list will be shorter, so for a list that has 10 items and 2 duplicates, it will go up to index 9 instead of stopping on the 7th.
Is there any way to fix this, or should I just try doing this in another way?
I know this does not fix your current script, but would something like this work?
def remove(lst):
    unique=[]
    for i in lst:
        if i not in unique: unique.append(i)
    return unique
Just simply looping through, creating another list and checking for membership?
The problem is you are manipulating the list as you are iterating over it. This means that when you reach the end of the list, it is now shorter because you've removed elements. You should (generally) avoid removing elements while you are looping over lists.
You got it the first time: len(lst) is evaluated only when you enter the loop. If you want it re-evaluated, try the while version:
i = 0
while i < len(lst):
    ...
    i += 1
Next, you get to worry about another problem: you should increment i only when you don't delete an item. When you do delete, shortening the list already brings the next element to position i.
i = 0
while i < len(lst):
    aux = lst[0:i] + lst[i+1:len(lst)]
    if lst[i] in aux:
        del(lst[i])
    else:
        i += 1
I think that should solve your problem ... using the logic you intended.
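Wrapped in a function (the name remove_in_place is only for illustration), the loop above can be tried out as follows. One thing this sketch makes visible: because an item is deleted whenever it still occurs elsewhere in the list, it is the last occurrence of each duplicate that survives, and the input list itself is modified.

```python
def remove_in_place(lst):
    i = 0
    while i < len(lst):
        # Everything except the item at position i.
        rest = lst[:i] + lst[i + 1:]
        if lst[i] in rest:
            del lst[i]   # the next item slides into position i, so do not advance
        else:
            i += 1
    return lst


print(remove_in_place([1, 2, 3, 3, 4, 4, 5, 6]))  # [1, 2, 3, 4, 5, 6]
print(remove_in_place([1, 2, 1]))                  # [2, 1]
```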
def remove(lst):
    new_list = []
    for i in lst:
        if i not in new_list:
            new_list.append(i)
    return new_list
You should append the values to a secondary list. As Bobbyrogers said, it's not a good idea to iterate over a list that is changing.
You can also try this:
lst = [1,2,3,3,4,4,5,6]
lst2 = []
for i in lst:
    if i not in lst2:
        lst2.append(i)
print(lst2)
[1, 2, 3, 4, 5, 6]

how to convert a set in python into a dictionary

I am new to python and trying to convert a Set into a Dictionary. I am struggling to find a way to make this possible. Any inputs are highly appreciated. Thanks.
Input : {'1438789225', '1438789230'}
Output : {'1438789225':1, '1438789230':2}
Use enumerate() to generate a value starting from 0 and counting upward for each item in the set, and then assign it in a comprehension:
input_set = {'1438789225', '1438789230'}
output_dict = {item:val for val,item in enumerate(input_set)}
Or a traditional loop:
output_dict = {}
for val,item in enumerate(input_set):
    output_dict[item] = val
If you want it to start from 1 instead of 0, use item:val+1 for the first snippet and output_dict[item] = val+1 for the second snippet.
That said, this dictionary would be pretty much the same as a list:
output = list(input_set)
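Two details may be worth a short sketch. enumerate() accepts a start argument, which avoids the val+1 adjustment, and a set has no defined order, so which item gets which number is not guaranteed. Sorting first (my assumption about what is wanted, not something the question states) makes the numbering reproducible:

```python
input_set = {'1438789225', '1438789230'}

# sorted() fixes the order; start=1 makes the numbering begin at 1.
output_dict = {item: index
               for index, item in enumerate(sorted(input_set), start=1)}

print(output_dict)  # {'1438789225': 1, '1438789230': 2}
```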
My one-liner:
output = dict(zip(input_set, range(1, len(input_set) + 1)))
zip pairs up two lists (or sets) element by element: (l1[0], l2[0]), (l1[1], l2[1]), ...
We're feeding it two things:
the input_set
a range from 1 to the length of the set (the + 1 is there because the end of a range is exclusive, and you specified you wanted to count from 1 onwards, not from 0)
The output is a sequence of tuples like [('1438789225', 1), ('1438789230', 2)] which can be turned into a dict simply by feeding it to the dict constructor.
But like TigerhawkT3 said, I can hardly find a use for such a dictionary. If you have your motives, though, there you have another way of doing it. If you take away anything from this post, let it be the existence of zip.
An easy way of doing this is to iterate over the set and populate the result dictionary element by element, using a counter as the dictionary value:
def setToIndexedDict(s):
    counter = 1
    result = dict()
    for element in s:
        result[element] = counter #adding new element to dictionary
        counter += 1 #incrementing the value for the next element
    return result
My Python is pretty rusty, but this should do it:
def indexedDict(oldSet):
    dic = {}
    for elem,n in zip(oldSet, range(len(oldSet))):
        dic[elem] = n
    return dic
If I wrote anything illegal, tell me and I'll fix it. I don't have an interpreter handy.
Basically, I'm just zipping the set with a range object (basically a continuous list of numbers, but more efficient), then using the resulting tuples.
I'd go with Tiger's answer; this is basically a more naive version of his.

How can I make my code be a set?

I have a little code that takes a list of objects, and only outputs the items in the list that are unique.
This is my code
def only_once(a):
    return [x for x in a if a.count(x) == 1]
My teacher requires us to use sets for this function though.
Can someone show me what I can do?
My code has to take an input such as a=[1,4,6,7,3,2,4,5,7,5,6], and output [1, 3, 2]. It has to retain its order also.
[I'm assuming that you're also user1744238 and user1744316 -- please pick a username and stick to it, that way it's easier to check to see what variants of a question you've asked and what you've already tried.]
One set-based approach is to use two sets as a counter. You only care about whether you've seen something once or more than once. For example, here's an easy-to-explain approach:
Make an empty set for once and more.
Loop over every element of your list, and:
If you haven't seen it before, add it to once.
If you've seen it once, remove it from once and add it to more.
Now you know what elements you've seen exactly once, in the set once.
Loop over the elements of the list, and if you've seen it once, add it to the output list, and remove it from the once set so you don't output the same element twice.
This gives me:
In [49]: f([1,4,6,7,3,2,4,5,7,5,6])
Out[49]: [1, 3, 2]
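The answer does not show its f, so the following is only a guess at how the steps above could be sketched. The set names once and more come from the description; the function name and everything else are assumptions.

```python
def only_once(items):
    once, more = set(), set()
    for x in items:
        if x in more:
            continue          # already known to occur more than once
        if x in once:
            once.remove(x)    # second sighting: move it from once to more
            more.add(x)
        else:
            once.add(x)       # first sighting
    # Walk the original list again so the output keeps the input order.
    return [x for x in items if x in once]


print(only_once([1, 4, 6, 7, 3, 2, 4, 5, 7, 5, 6]))  # [1, 3, 2]
```

Since every element left in once occurs exactly once in the list, the final pass cannot output anything twice, so this sketch skips the removal described in the last step.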
To clarify, what you want is a set of items that appear once, and only once.
The best option here is to use collections.Counter(), as it means you only count the items once, rather than once per item, greatly increasing performance:
>>> import collections
>>> {key for key, count in collections.Counter(a).items() if count == 1}
{1, 2, 3}
We simply replace the square brackets with curly braces to signify a set comprehension over a list comprehension, to get a set of results.
If you need to remove any item that is in the list more than once, not just occurrences after the first, you can use:
# without using generators / comprehensions
def only_once(iterable):
    seen = set()
    duplicates = set()
    for item in iterable:
        if item in seen:
            duplicates.add(item)
        seen.add(item)
    result = []
    for item in iterable:
        if item not in duplicates:
            result.append(item)
    return result
For general order-preserving duplicate elimination, see unique_everseen in the itertools recipes:
from itertools import filterfalse  # named ifilterfalse in Python 2
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
