Finding item frequency in list of lists - python

Let's say I have a list of lists and I want to find the frequency in which pairs (or more) of elements appears in total.
For example, if i have [[a,b,c],[b,c,d],[c,d,e]
I want :(a,b) = 1, (b,c) = 2, (c,d) = 2, etc.
I tried finding a usable apriori algorithm that would allow me to do this, but i couldn't find a easy to implement one in python.
How would I approach this problem in a better way?

This is a way to do it:
from itertools import combinations
l = [['a','b','c'],['b','c','d'],['c','d','e']]
d = {}
for i in l:
# for every item on l take all the possible combinations of 2
comb = combinations(i, 2)
for c in comb:
k = ''.join(c)
if d.get(k):
d[k] += 1
else:
d[k] = 1
Result:
>>> d
{'bd': 1, 'ac': 1, 'ab': 1, 'bc': 2, 'de': 1, 'ce': 1, 'cd': 2}

Related

Histogram of lists enteries

I have a number of lists as follows:
list1 = ['a_1','a_2','b_17','c_19']
list2 = ['aa_1','a_12','b_15','d_39']
list3 = ['a_1','a_200','ba_1','u_0']
I wish to generate a histogram based on the labels, ignoring the numbering, that is, a has 4 entries over all the lists, ba 1 entry, u has 1 entry, and so on. The labels, are file names from a specific folder, before adding the numbers, so it is a finite known list.
How can I perform such a count without a bunch of ugly loops? Can I use unique here, somehow?
You cannot acheive it without a loop. But you can instead use list comphrension to make it into a single line. Something like this.
list1 = ['a_1','a_2','b_17','c_19']
list2 = ['aa_1','a_12','b_15','d_39']
list3 = ['a_1','a_200','ba_1','u_0']
lst = [x.split('_')[0] for x in (list1 + list2 + list3)]
print({x: lst.count(x) for x in lst})
You can use a defaultdict initialized to 0 to count the occurrence and get a nice container with the required information.
So, define the container:
from collections import defaultdict
histo = defaultdict(int)
I'd like to split the operation into methods.
First get the prefix from the string, to be used as key in the dictionary:
def get_prefix(string):
return string.split('_')[0]
This works like
get_prefix('az_1')
#=> 'az'
Then a method to update de dictionary:
def count_elements(lst):
for e in lst:
histo[get_prefix(e)] += 1
Finally you can call this way:
count_elements(list1)
count_elements(list2)
count_elements(list3)
dict(histo)
#=> {'a': 5, 'b': 2, 'c': 1, 'aa': 1, 'd': 1, 'ba': 1, 'u': 1}
Or directly
count_elements(list1 + list2 + list3)
To get the unique count, call it using set:
count_elements(set(list1 + list2 + list3))
dict(histo)
{'ba': 1, 'a': 4, 'aa': 1, 'b': 2, 'u': 1, 'd': 1, 'c': 1}

Is it possible to update a dictionary based on a certain condition in python3?

I want update dictionary keys, with different values. I currently have a dictionary like this
{'Dog': 15, 'Cat': 9, 'Rat': 12}
I am trying to update the values such that, the key with greatest value takes an integer number 1, then the key with the second greatest value takes an integer number 2 and so on. I am trying to get the output as
{'Dog': 1, 'Cat': 3, 'Rat': 2}
Here is my code below:
values = [val for k, val in Animals.items()]
keys = [k for k in Animals]
n = len(values)
count = 1
k_index = 0
for i in range(n -1):
for j in range(0, n - i - 1):
if values[j] > values[j+1]:
key = keys[k_index]
key2 = keys[k_index + 1]
Animals[key], Animals[key] = count, (count + 1)
print(Animals)```
You can try the following idea:
First create and sort the list according to the values in the original dict:
data = {'Dog': 15, 'Cat': 9, 'Rat': 12}
temp = sorted(data.items(), key=lambda x: x[1], reverse=True)
#[('Dog', 15), ('Rat', 12), ('Cat', 9)]
Then use that order to create a new dict:
result = {item[0]: i + 1 for i,item in enumerate(temp)}
#{'Dog': 1, 'Rat': 2, 'Cat': 3}

numbered List transfer to bins

If I have a list:
a=[6,3,2,5,1,4]
and a specific size likes 3,
so number 1~2 will tagged 1, 3~4 tagged 2, 5~6 tagged 3, likes list b:
b=[3,2,1,3,1,2]
How can I efficient to do this?
Note that list could be floating number
Sorry for unclear description,
Update with more complex example:
a=[2.5,1.4,1.6,2.1,1.5,0.7]
output will be :
b=[3,1,2,3,2,1]
This is what you need :
import math
a=[6,3,2,5,1,4]
size = 3
b = list(a)
b.sort()
for i in range(len(a)) :
w = len(b)/size #1
a[i] = math.ceil((b.index(a[i]) + 1)/(w)) #2
print(a)
1: we're getting the length of each chunk
2: we're getting in which chunk the number is present in our sorted list and replacing it with the number of the chunk in the original list
You can achieve this by first ordering the list, and splitting it into chunks of the desired size:
a=[6,3,2,5,1,4]
n = 3
a_ord = sorted(a)
def split_list(alist, wanted_parts=1):
length = len(alist)
return [ alist[i*length // wanted_parts: (i+1)*length // wanted_parts]
for i in range(wanted_parts) ]
x = split_list(a_ord,n)
And then crating a dictionary using these chunks as the bins:
d = {j:x.index(i)+1 for i in x for j in i}
print(d)
>>> {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}
And then mapping the values using the dictionary:
b = [d[key] for key in a]
print(b)
>>> [3, 2, 1, 3, 1, 2]
sample_dict = {
1:1,
2:1,
3:2,
4:2,
5:3,
6:3
}
for k in a:
out_arr.append(sample_dict[k])
print (out_arr)

combining sets within a list

Hi so I'm trying to do the following but have gotten a bit stuck. Say I have a list of sets:
A = [set([1,2]), set([3,4]), set([1,6]), set([1,5])]
I want to create a new list which looks like the following:
B = [ set([1,2,5,6]), set([3,4]) ]
i.e create a list of sets with the sets joined if they overlap. This is probably simple but I can't quite get it right this morning.
This also works and is quite short:
import itertools
groups = [{'1', '2'}, {'3', '2'}, {'2', '4'}, {'5', '6'}, {'7', '8'}, {'7','9'}]
while True:
for s1, s2 in itertools.combinations(groups, 2):
if s1.intersection(s2):
break
else:
break
groups.remove(s1)
groups.remove(s2)
groups.append(s1.union(s2))
groups
This gives the following output:
[{'5', '6'}, {'1', '2', '3', '4'}, {'7', '8', '9'}]
The while True does seems a bit dangerous to me, any thoughts anyone?
How about:
from collections import defaultdict
def sortOverlap(listOfTuples):
# The locations of the values
locations = defaultdict(lambda: [])
# 'Sorted' list to return
sortedList = []
# For each tuple in the original list
for i, a in enumerate(listOfTuples):
for k, element in enumerate(a):
locations[element].append(i)
# Now construct the sorted list
coveredElements = set()
for element, tupleIndices in locations.iteritems():
# If we've seen this element already then skip it
if element in coveredElements:
continue
# Combine the lists
temp = []
for index in tupleIndices:
temp += listOfTuples[index]
# Add to the list of sorted tuples
sortedList.append(list(set(temp)))
# Record that we've covered this element
for element in sortedList[-1]:
coveredElements.add(element)
return sortedList
# Run the example (with tuples)
print sortOverlap([(1,2), (3,4), (1,5), (1,6)])
# Run the example (with sets)
print sortOverlap([set([1,2]), set([3,4]), set([1,5]), set([1,6])])
You could use intersection() and union() in for loops:
A = [set([1,2]), set([3,4]), set([1,6]), set([1,5])]
intersecting = []
for someSet in A:
for anotherSet in A:
if someSet.intersection(anotherSet) and someSet != anotherSet:
intersecting.append(someSet.union(anotherSet))
A.pop(A.index(anotherSet))
A.pop(A.index(someSet))
finalSet = set([])
for someSet in intersecting:
finalSet = finalSet.union(someSet)
A.append(finalSet)
print A
Output: [set([3, 4]), set([1, 2, 5, 6])]
A slightly more straightforward solution,
def overlaps(sets):
overlapping = []
for a in sets:
match = False
for b in overlapping:
if a.intersection(b):
b.update(a)
match = True
break
if not match:
overlapping.append(a)
return overlapping
examples
>>> overlaps([set([1,2]), set([1,3]), set([1,6]), set([3,5])])
[{1, 2, 3, 5, 6}]
>>> overlaps([set([1,2]), set([3,4]), set([1,6]), set([1,5])])
[{1, 2, 5, 6}, {3, 4}]
for set_ in A:
new_set = set(set_)
for other_set in A:
if other_set == new_set:
continue
for item in other_set:
if item in set_:
new_set = new_set.union(other_set)
break
if new_set not in B:
B.append(new_set)
Input/Output:
A = [set([1,2]), set([3,4]), set([2,3]) ]
B = [set([1, 2, 3]), set([2, 3, 4]), set([1, 2, 3, 4])]
A = [set([1,2]), set([3,4]), set([1,6]), set([1,5])]
B = [set([1, 2, 5, 6]), set([3, 4])]
A = [set([1,2]), set([1,3]), set([1,6]), set([3,5])]
B = [set([1, 2, 3, 6]), set([1, 2, 3, 5, 6]), set([1, 3, 5])]
This function will do the job, without touching the input:
from copy import deepcopy
def remove_overlapped(input_list):
input = deepcopy(input_list)
output = []
index = 1
while input:
head = input[0]
try:
next_item = input[index]
except IndexError:
output.append(head)
input.remove(head)
index = 1
continue
if head & next_item:
head.update(next_item)
input.remove(next_item)
index = 1
else:
index += 1
return output
Here is a function that does what you want. Probably not the most pythonic one but does the job, most likely can be improved a lot.
from sets import Set
A = [set([1,2]), set([3,4]), set([2,3]) ]
merges = any( a&b for a in A for b in A if a!=b)
while(merges):
B = [A[0]]
for a in A[1:] :
merged = False
for i,b in enumerate(B):
if a&b :
B[i]=b | a
merged =True
break
if not merged:
B.append(a)
A = B
merges = any( a&b for a in A for b in A if a!=b)
print B
What is happening there is the following, we loop all the sets in A, (except the first since we added that to B already. We check the intersection with all the sets in B, if the intersection result anything but False (aka empty set) we perform a union on the set and start the next iteration, about set operation check this page:
https://docs.python.org/2/library/sets.html
& is intersection operator
| is union operator
You can probably go more pythonic using any() etc but wuold have required more processing so I avoided that

Is there a way in Python to index a list of containers (tuples, lists, dictionaries) by an element of a container?

I have been poking around for a recipe / example to index a list of tuples without taking a modification of the decorate, sort, undecorate approach.
For example:
l=[(a,b,c),(x,c,b),(z,c,b),(z,c,d),(a,d,d),(x,d,c) . . .]
The approach I have been using is to build a dictionary using defaultdict of the second element
from collections import defaultdict
tdict=defaultdict(int)
for myTuple in l:
tdict[myTuple[1]]+=1
Then I have to build a list consisting of only the second item in the tuple for each item in the list. While there are a number of ways to get there a simple approach is to:
tempList=[myTuple[1] for myTuple in l]
and then generate an index of each item in tdict
indexDict=defaultdict(dict)
for key in tdict:
indexDict[key]['index']=tempList.index(key)
Clearly this does not seem very Pythonic. I have been trying to find examples or insights thinking that I should be able to use something magical to get the index directly. No such luck so far.
Note, I understand that I can take my approach a little more directly and not generating tdict.
output could be a dictionary with the index
indexDict={'b':{'index':0},'c':{'index':1},'d':{'index':4},. . .}
After learning a lot from Nadia's responses I think the answer is no.
While her response works I think it is more complicated than needed. I would simply
def build_index(someList):
indexDict={}
for item in enumerate(someList):
if item[1][1] not in indexDict:
indexDict[item[1][1]]=item[0]
return indexDict
This will generate the result you want
dict((myTuple[1], index) for index, myTuple in enumerate(l))
>>> l = [(1, 2, 3), (4, 5, 6), (1, 4, 6)]
>>> dict((myTuple[1], index) for index, myTuple in enumerate(l))
{2: 0, 4: 2, 5: 1}
And if you insist on using a dictionary to represent the index:
dict((myTuple[1], {'index': index}) for index, myTuple in enumerate(l))
The result will be:
{2: {'index': 0}, 4: {'index': 2}, 5: {'index': 1}}
EDIT
If you want to handle key collision then you'll have to extend the solution like this:
def build_index(l):
indexes = [(myTuple[1], index) for index, myTuple in enumerate(l)]
d = {}
for e, index in indexes:
d[e] = min(index, d.get(e, index))
return d
>>> l = [(1, 2, 3), (4, 5, 6), (1, 4, 6), (2, 4, 6)]
>>> build_index(l)
{2: 0, 4: 2, 5: 1}
EDIT 2
And a more generalized and compact solution (in a similar definition to sorted)
def index(l, key):
d = {}
for index, myTuple in enumerate(l):
d[key(myTuple)] = min(index, d.get(key(myTuple), index))
return d
>>> index(l, lambda a: a[1])
{2: 0, 4: 2, 5: 1}
So the answer to your question is yes: There is a way in Python to index a list of containers (tuples, lists, dictionaries) by an element of a container without preprocessing. But your request of storing the result in a dictionary makes it impossible to be a one liner. But there is no preprocessing here. The list is iterated only once.
If i think this is what you're asking...
l = ['asd', 'asdxzc']
d = {}
for i, x in enumerate(l):
d[x] = {'index': i}

Categories