Get list based on occurrences in unknown number of sublists - python

I'm looking for a way to turn a list of lists (a below) into a single list (b below) with 2 conditions:
The order of the new list (b) is based on the number of times each value occurs across the sublists of a.
A value can only appear once.
Basically turn a into b:
a = [[1,2,3,4], [2,3,4], [4,5,6]]
# value 4 occurs 3 times in list a and gets first position
# value 2 occurs 2 times in list a and gets second position and so on...
b = [4,2,3,1,5,6]
I figure one could do this with set and some list magic, but I can't get my head around it when a can contain any number of lists. The list a is created from user input (I guess it can contain between 1 and 20 lists with up to 200-300 items in each).
I'm trying something along the lines of [set(l) for l in a] but don't know how to perform set(l) & set(l) ... to get all matched items.
Is it possible without a for loop iterating sublist count * items in sublist times?

I think this is probably the closest you're going to get:
from collections import defaultdict
d = defaultdict(int)
for sub in outer:   # outer is the list of lists (a in the question)
    for val in sub:
        d[val] += 1
print sorted(d.keys(), key=lambda k: d[k], reverse=True)
# Output: [4, 2, 3, 1, 5, 6]
There is an off chance that the order of elements that appear an identical number of times may be indeterminate - the output of d.keys() is not ordered.
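If deterministic ordering for ties matters, one option (a sketch, not part of the original answer) is to use the value itself as a secondary sort key on the same d built above:
print(sorted(d.keys(), key=lambda k: (-d[k], k)))   # count descending, then value ascending
# [4, 2, 3, 1, 5, 6]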

import itertools
all_items = set(itertools.chain(*a))
b = sorted(all_items, key = lambda y: -sum(x.count(y) for x in a))

Try this -
a = [[1,2,3,4], [2,3,4], [4,5,6]]
s = set()
for l in a:
    s.update(l)
print s
#set([1, 2, 3, 4, 5, 6])
b = list(s)
This adds each list to the set, which gives you a unique set of all elements across all the lists, if that is what you are after.
Edit. To preserve the order of elements in the original list, you can't use sets.
a = [[1,2,3,4], [2,3,4], [4,5,6]]
b = []
for l in a:
    for i in l:
        if i not in b:
            b.append(i)
print b
#[1,2,3,4,5,6] - the same order as the set in this case, since that's the order they appear in the list

import itertools
from collections import defaultdict
def list_by_count(lists):
    data_stream = itertools.chain.from_iterable(lists)
    counts = defaultdict(int)
    for item in data_stream:
        counts[item] += 1
    return [item for (item, count) in
            sorted(counts.items(), key=lambda x: (-x[1], x[0]))]
Having the x[0] in the sort key ensures that items with the same count appear in a consistent (ascending) order as well.
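A quick usage sketch with the question's data:
a = [[1,2,3,4], [2,3,4], [4,5,6]]
print(list_by_count(a))
# [4, 2, 3, 1, 5, 6]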

Related

Save list number within a list only if it contains elements in python

I have a list of lists such as:
my_list_of_list=[['A','B','C','E'],['A','B','C','E','F'],['D','G','A'],['X','Z'],['D','M'],['B','G'],['X','Z']]
As you can see, lists 1 and 2 share the most elements (4). So, I keep a list within my_list_of_list only if it contains at least one of those 4 shared elements (A, B, C or E).
Here I then save within list_shared_number[] only the lists 1, 2, 3 and 6 (indices 0, 1, 2 and 5), since the others do not contain A, B, C or E.
Expected output:
print(list_shared_number)
[0,1,2,5]
Probably suboptimal because I need to iterate 3 times over the lists, but it gives the expected result:
from itertools import combinations
from functools import reduce
common_elements = [set(i).intersection(j)
                   for i, j in combinations(my_list_of_list, r=2)]
common_element = reduce(lambda i, j: i if len(i) >= len(j) else j, common_elements)
list_shared_number = [idx for idx, l in enumerate(my_list_of_list)
                      if common_element.intersection(l)]
print(list_shared_number)
# Output
[0, 1, 2, 5]
Alternative with 2 iterations:
common_element = set()
for i, j in combinations(my_list_of_list, r=2):
    c = set(i).intersection(j)
    common_element = c if len(c) > len(common_element) else common_element
list_shared_number = [idx for idx, l in enumerate(my_list_of_list)
                      if common_element.intersection(l)]
print(list_shared_number)
# Output
[0, 1, 2, 5]
You can find shared elements by using a list comprehension. Checking the lists at index 0 and index 1:
share = [x for x in my_list_of_list[0] if x in my_list_of_list[1]]
print(share)
Assume j is each item, so [j for j in x if j in share] finds the shared inner elements. If the length of this list is more than 0, the outer list should be included in the output.
So final code is like this:
share = [x for x in my_list_of_list[0] if x in my_list_of_list[1]]
my_list = [i for i, x in enumerate(my_list_of_list) if len([j for j in x if j in share]) > 0]
print(my_list)
You can use itertools.combinations and set operations.
In the first line, you find the intersection that is the longest among pairs of lists. In the second line, you iterate over my_list_of_list to identify the lists that contain elements from the set you found in the first line.
from itertools import combinations
comparison = max(map(lambda x: (len(set(x[0]).intersection(x[1])), set(x[0]).intersection(x[1])), combinations(my_list_of_list, 2)))[1]
out = [i for i, lst in enumerate(my_list_of_list) if comparison - set(lst) != comparison]
Output:
[0, 1, 2, 5]
Oh boy, so mine is a bit messy; however, I did not use any imports AND I included the initial "finding" of the two lists which have the most in common with one another. This can easily be optimised, but it does do exactly what you wanted.
my_list_of_list=[['A','B','C','E'],['A','B','C','E','F'],['D','G','A'],['X','Z'],['D','M'],['B','G'],['X','Z']]
my_list_of_list = list(map(set,my_list_of_list))
mostIntersects = [0, (None,)]
for i, IndSet in enumerate(my_list_of_list):
    for j in range(i+1, len(my_list_of_list)):
        intersects = len(IndSet.intersection(my_list_of_list[j]))
        if intersects > mostIntersects[0]:
            mostIntersects = [intersects, (i, j)]

FinalIntersection = set(my_list_of_list[mostIntersects[1][0]]).intersection(my_list_of_list[mostIntersects[1][1]])
skipIndexes = set(mostIntersects[1])
for i, sub_list in enumerate(my_list_of_list):
    [skipIndexes.add(i) for char in sub_list
     if i not in skipIndexes and char in FinalIntersection]
print(*map(list,(mostIntersects, FinalIntersection, skipIndexes)), sep = '\n')
The print provides this:
[4, (0, 1)]
['E', 'C', 'B', 'A']
[0, 1, 2, 5]
This works by first converting the lists to sets using the map function (it has to be turned back into a list so I can use len and iterate properly). I then intersect each list with the others in the list of lists and count how many elements are in each intersection. Each time I find one with a larger count, I set mostIntersects equal to the length and the pair of indexes. Once I have gone through them all, I take the lists at the two indexes (0 and 1 in this case) and intersect them to give the set of elements [A, B, C, E] (var: FinalIntersection). From there, I just iterate over all lists which are not already being used and check whether any of their elements are found in FinalIntersection. If one is, the index of that list is added to skipIndexes. This results in the final list of indexes (indices?) that you were after. Technically the result is a set, but to convert it back you can just use list({0,1,2,5}), which gives you the value you were after.

How do you count the number of same entities in a list? [duplicate]

This question already has answers here:
How do I count the occurrences of a list item?
(29 answers)
Closed 4 years ago.
For example
MyList=[a,a,a,c,c,a,d,d,d,b]
Returns
[4,2,3,1]
from collections import Counter
MyList=['a','a','a','c','c','a','d','d','d','b']
Counter(MyList).values() # Counts the frequency of each element
[4, 2, 1, 3]
Counter(MyList).keys() # The corresponding elements
['a', 'c', 'b', 'd']
Just do it with a dictionary:
counter_dict = {}
for elem in MyList:
    if elem in counter_dict:
        counter_dict[elem] += 1
    else:
        counter_dict[elem] = 1
At the end you have a dictionary whose keys are the elements of the list and whose values are the number of appearances.
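If you then want just the list of counts in first-appearance order (the [4,2,3,1] from the question), a minimal follow-up, assuming Python 3.7+ where dicts preserve insertion order:
print(list(counter_dict.values()))
# [4, 2, 3, 1]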
If you need only a list:
# Your list
l1 = [1,1,2,3,1,2]
# set() gives each element from l1 without duplicates
l2 = set(l1)
# Let's see how the set looks
#print(l2)
# Create a new empty list
l3 = []
# Use a for loop to find the count of each element from the set and store it in l3
for i in l2:
    k = l1.count(i)
    l3.append(k)
# Check how l3 looks now
print(l3)
Return:
[3, 2, 1]
First you should change your list elements to a string type:
my_list = ['a','a','a','c','c','a','d','d','d','b']
Because a, b, c or d without quotes (') are not defined names, you'll get an error otherwise.
Try to use a Python dictionary, as follows:
result = {}
for each in my_list:
    result.setdefault(each, 0)
    result[each] += 1
print result.values()
Then this would be the output:
[4,2,3,1]
A first try would be doing this:
occurrences = {n: a.count(n) for n in set(a)}  # a is the input list (MyList above)
This returns a dictionary which has the element as key, and its occurrences as value. I use set to avoid counting elements more than once.
This is not a one-pass approach and it has quadratic complexity in time, hence this could be really slow.
Here's a way you could do to get the same result with one pass, linear complexity:
def count_occurrences(inputArray):
    result = {}
    for element in inputArray:
        if element not in result:
            result[element] = 1
        else:
            result[element] += 1
    return result
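A quick usage sketch of that function with the question's data, assuming Python 3.7+ so the dict keeps first-appearance order:
MyList = ['a','a','a','c','c','a','d','d','d','b']
counts = count_occurrences(MyList)
print(list(counts.values()))
# [4, 2, 3, 1]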

Indexing a list with a unique index

I have a list say l = [10,10,20,15,10,20]. I want to assign each unique value a certain "index" to get [1,1,2,3,1,2].
This is my code:
a = list(set(l))
res = [a.index(x) for x in l]
Which turns out to be very slow.
l has 1M elements, and 100K unique elements. I have also tried map with lambda and sorting, which did not help. What is the ideal way to do this?
You can do this in O(N) time using a defaultdict and a list comprehension:
>>> from itertools import count
>>> from collections import defaultdict
>>> lst = [10, 10, 20, 15, 10, 20]
>>> d = defaultdict(count(1).next)
>>> [d[k] for k in lst]
[1, 1, 2, 3, 1, 2]
In Python 3 use __next__ instead of next.
If you're wondering how it works:
The default_factory (i.e. count(1).next in this case) passed to defaultdict is called only when Python encounters a missing key. So for the first 10 the value is going to be 1; the next 10 is not a missing key anymore, hence the previously calculated 1 is used; 20 is again a missing key, so Python calls the default_factory again to get its value, and so on.
d at the end will look like this:
>>> d
defaultdict(<method-wrapper 'next' of itertools.count object at 0x1057c83b0>,
{10: 1, 20: 2, 15: 3})
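For reference, a minimal Python 3 sketch of the same idea (using __next__, as noted above):
from itertools import count
from collections import defaultdict

lst = [10, 10, 20, 15, 10, 20]
d = defaultdict(count(1).__next__)   # every missing key gets the next counter value
print([d[k] for k in lst])
# [1, 1, 2, 3, 1, 2]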
The slowness of your code arises because a.index(x) performs a linear search and you perform that linear search for each of the elements in l. So for each of the 1M items you perform (up to) 100K comparisons.
The fastest way to transform one value to another is looking it up in a map. You'll need to create the map and fill in the relationship between the original values and the values you want. Then retrieve the value from the map when you encounter another of the same value in your list.
Here is an example that makes a single pass through l. There may be room for further optimization to eliminate the need to repeatedly reallocate res when appending to it.
res = []
conversion = {}
i = 0
for x in l:
    if x not in conversion:
        value = conversion[x] = i
        i += 1
    else:
        value = conversion[x]
    res.append(value)
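To address the reallocation remark, one sketch (not the original author's code) is to build the mapping in one pass and then produce res with a list comprehension, which is typically faster than repeated append calls:
conversion = {}
for x in l:
    if x not in conversion:
        conversion[x] = len(conversion)   # next unused index, starting at 0
res = [conversion[x] for x in l]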
Well, I guess it depends on whether you want it to return the indexes in that specific order or not. If you want the example to return:
[1,1,2,3,1,2]
then you can look at the other answers submitted. However, if you only care about getting a unique index for each unique number, then I have a fast solution for you.
import numpy as np
l = [10,10,20,15,10,20]
a = np.array(l)
x,y = np.unique(a,return_inverse = True)
and for this example the output of y is:
y = [0,0,2,1,0,2]
I tested this for 1,000,000 entries and it was done essentially immediately.
Your solution is slow because its complexity is O(nm) with m being the number of unique elements in l: a.index() is O(m) and you call it for every element in l.
To make it O(n), get rid of index() and store indexes in a dictionary:
>>> idx, indexes = 1, {}
>>> for x in l:
...     if x not in indexes:
...         indexes[x] = idx
...         idx += 1
...
>>> [indexes[x] for x in l]
[1, 1, 2, 3, 1, 2]
If l contains only integers in a known range, you could also store indexes in a list instead of a dictionary for faster lookups.
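A sketch of that list-based idea, assuming the values are non-negative integers bounded by a known max_value (a hypothetical bound, not from the original answer):
max_value = 20                     # assumed upper bound on the values in l
indexes = [0] * (max_value + 1)    # 0 means "not seen yet"
idx = 1
for x in l:
    if not indexes[x]:
        indexes[x] = idx
        idx += 1
res = [indexes[x] for x in l]
# for l = [10, 10, 20, 15, 10, 20] this gives [1, 1, 2, 3, 1, 2]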
You can use collections.OrderedDict() to preserve the unique items in order, loop over the enumerate of these ordered unique items to build a dict mapping each item to its index (based on order), and then pass that dictionary together with the main list to operator.itemgetter() to get the corresponding index for each item:
>>> from collections import OrderedDict
>>> from operator import itemgetter
>>> itemgetter(*lst)({j:i for i,j in enumerate(OrderedDict.fromkeys(lst),1)})
(1, 1, 2, 3, 1, 2)
For completeness, you can also do it eagerly:
from itertools import count
wordid = dict(zip(set(list_), count(1)))
This uses a set to obtain the unique words in list_, pairs each of those unique words with the next value from count() (which counts upwards), and constructs a dictionary from the results.
Original answer, written by nneonneo.
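A hedged usage sketch of that mapping (set iteration order is arbitrary, so the resulting ids need not follow first appearance):
from itertools import count

list_ = [10, 10, 20, 15, 10, 20]
wordid = dict(zip(set(list_), count(1)))
res = [wordid[x] for x in list_]
# ids are consistent per value, but their numbering depends on set iteration order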

how to generate list of lists from conditional items in another list of lists

I have a list of lists and am trying to make another list of lists from specific items within the first list of lists:
listOne = [[1,1,9,9],[1,4,9,6],[2,1,12,12]]
listTwo = []
For every group of inner lists with the same numbers in positions 0 and 2, append to listTwo only the inner list with the largest value in position 3.
For example, inner list 0 and inner list 1 both have a 1 in position 0 and a 9 in position 2, but inner list 0 has a 9 in position 3 and inner list 1 has a 6 in position 3, so I want to append inner list 0 and not inner list 1 to listTwo. Since inner list 2 is the only list with a 2 in position 0 and a 12 in position 2, it need not be compared to anything else and can be appended to listTwo.
I'm thinking something like:
for items in listOne:
    #for all items where items[0] and items[2] are equal:
        #tempList = []
        #tempList.append(items)
        #tempList.sort(key = lambda x: (x[3]))
        #for value in tempList[0]:
            #listTwo.append(all lists with value in tempList[0])
but I'm not sure how to implement this without a lot of really bad-looking code. Any suggestions for a "pythonic" way of sorting these lists?
If you're looking to write concise python you will want to use list comprehensions wherever possible. Your description was a little confusing, but something like
list_two = [inner_list for inner_list in list_one if inner_list[0] == inner_list[2]]
will get you all of the inner lists in which the values at indices 0 and 2 match. Then you can search all of these to find the one with the largest value at index 3, assuming there aren't any ties:
list_three = [0,0,0,0]
for i in list_two:
    if i[3] > list_three[3]:
        list_three = i
Perhaps throwing everything into a dictionary? Something like this:
def strangeFilter(listOne):
    listTwo = []
    d = {}
    for innerList in listOne:
        positions = (innerList[0], innerList[2])
        if positions not in d:
            d[positions] = []
        d[positions].append(innerList)
    for positions in d:
        listTwo.append(max(d[positions], key=lambda x: x[3]))
    return listTwo
Not sure how much of a 'pythonic' solution this is, but it uses built-in Python structures and has a nice O(n) time complexity.
Sort the list on items zero and two of the inner lists. Using itertools.groupby, extract the item in each group that has the maximum value at position 3.
import operator, itertools
# a couple of useful callables for the key functions
zero_two = operator.itemgetter(0,2)
three = operator.itemgetter(3)
a = [[2,1,12,22],[1,1,9,9],[2,1,12,10],
     [1,4,9,6],[8,8,8,1],[2,1,12,12],
     [1,3,9,8],[2,1,12,15],[8,8,8,0]
    ]
a.sort(key=zero_two)
for key, group in itertools.groupby(a, zero_two):
    print(key, max(group, key=three))
'''
>>>
(1, 9) [1, 1, 9, 9]
(2, 12) [2, 1, 12, 22]
(8, 8) [8, 8, 8, 1]
>>>
'''
result = [max(group, key = three) for key, group in itertools.groupby(a, zero_two)]
You could also sort on items zero, two, three. Then group by items zero and two and extract the last item of the group.
zero_two_three = operator.itemgetter(0,2,3)
zero_two = operator.itemgetter(0,2)
last_item = operator.itemgetter(-1)
a.sort(key = zero_two_three)
for key, group in itertools.groupby(a, zero_two):
    print(key, last_item(list(group)))
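With the same sample data, this variant should print the same winners per group (a quick check, not part of the original answer):
# (1, 9) [1, 1, 9, 9]
# (2, 12) [2, 1, 12, 22]
# (8, 8) [8, 8, 8, 1]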

Finding duplicates in few lists

In my case a duplicate is not just an item that reappears in one list, but one that also appears at the same positions in the other lists. For example:
list1 = [1,2,3,3,3,4,5,5]
list2 = ['a','b','b','c','b','d','e','e']
list3 = ['T1','T2','T3','T4','T3','T4','T5','T5']
So the positions of the real duplicates across all 3 lists are [2,4] and [6,7]: in list1 the 3 is repeated, in list2 'b' is repeated at the same positions as in list1, and in list3 'T3'. In the second case 5, 'e', 'T5' are the duplicated items at the same positions in their lists. I have a hard time presenting the results "automatically" in one step.
1) I find duplicates in the first list
# Find duplicated part numbers (exact matches)
def list_duplicates(seq):
    seen = set()
    seen_add = seen.add
    # adds all elements it doesn't know yet to seen and all others to seen_twice
    seen_twice = set(x for x in seq if x in seen or seen_add(x))
    # turn the set into a list (as requested)
    return list(seen_twice)

# List of duplicated part numbers
D_list1 = list_duplicates(list1)
D_list2 = list_duplicates(list2)
2) Then I find the positions of a given duplicate and look at those positions in the second list
# find the row positions of duplicated part numbers
def list_position_duplicates(data, n, dups):
    position = []
    gen = (i for i, x in enumerate(data) if x == dups[n])
    for i in gen:
        position.append(i)
    return position

# Actual calculation: find the row positions of duplicated part numbers, beginning and end
lpd_part = list_position_duplicates(list1, 1, D_list1)
start = lpd_part[0]
end = lpd_part[-1]
lpd_parent = list_position_duplicates(list2[start:end+1], 0, D_list2)
So in step 2 I need to supply n (the position of the found duplicate in the list). I would like to do this step automatically, to get the positions of duplicated elements that sit at the same positions in all the lists, for all duplicates at once and not one by one "manually". I think it just needs a for loop or an if, but I'm new to Python and the many combinations I tried didn't work.
You can use the items from all 3 lists at the same index as a key and store the corresponding index as a value (in a list). If for any key there is more than one index stored in the list, it is a duplicate:
from itertools import izip

def solve(*lists):
    d = {}
    for i, k in enumerate(izip(*lists)):
        d.setdefault(k, []).append(i)
    for k, v in d.items():
        if len(v) > 1:
            print k, v
solve(list1, list2, list3)
#(3, 'b', 'T3') [2, 4]
#(5, 'e', 'T5') [6, 7]
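A minimal Python 3 sketch of the same idea (izip is gone in Python 3; the built-in zip is already lazy there):
def solve(*lists):
    d = {}
    for i, k in enumerate(zip(*lists)):   # key = tuple of the items at index i
        d.setdefault(k, []).append(i)
    for k, v in d.items():
        if len(v) > 1:
            print(k, v)

solve(list1, list2, list3)
# (3, 'b', 'T3') [2, 4]
# (5, 'e', 'T5') [6, 7]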
