Combining two lists with similar elements - python

I have multiple lists, each containing words and a number representing how many times the word showed up in an article. I want to combine these lists, keeping unique words separate and adding the counts of identical words. Example:
list_one = [(u'he':3),(u'she':2),(u'it':1),(u'pineapple':1)]
list_two = [(u'he':4),(u'she':1),(u'it':0)]
Combining list_one and list_two should then return list_three:
list_three = [(u'he':7),(u'she':3),(u'it':1),(u'pineapple':1)]
I got these lists from articles using collections.Counter and have tried using Counter.update to add the two together. I would like to keep the order, meaning the words with the highest counts stay at the front of the list. Any help would be great.

Python Counters can actually be summed! - http://ideone.com/spJMsx
Several mathematical operations are provided for combining Counter objects to produce multisets (counters that have counts greater than zero). Addition and subtraction combine counters by adding or subtracting the counts of corresponding elements.
From the Python documentation
So this:
from collections import Counter
list1 = Counter(['eggs', 'spam', 'spam', 'eggs', 'sausage', 'and spam'])
list2 = Counter(['spam', 'bacon', 'spam', 'eggs', 'sausage', 'and spam'])
print(list1)
print(list2)
print(list1 + list2)  # adds the counts of matching elements
Outputs this:
Counter({'eggs': 2, 'spam': 2, 'sausage': 1, 'and spam': 1})
Counter({'spam': 2, 'bacon': 1, 'eggs': 1, 'sausage': 1, 'and spam': 1})
Counter({'spam': 4, 'eggs': 3, 'sausage': 2, 'and spam': 2, 'bacon': 1})
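Applied to the question's data, here is a minimal sketch, assuming the word/count pairs are stored as (word, count) tuples as in the next answer. Each list becomes a Counter via dict(), the Counters are added, and most_common() returns the result ordered by count, highest first, which covers the ordering requirement:

```python
from collections import Counter

list_one = [(u'he', 3), (u'she', 2), (u'it', 1), (u'pineapple', 1)]
list_two = [(u'he', 4), (u'she', 1), (u'it', 0)]

# Counter accepts a mapping, so dict() turns the (word, count) pairs into one
combined = Counter(dict(list_one)) + Counter(dict(list_two))

# most_common() returns (word, count) tuples sorted by count, highest first
list_three = combined.most_common()
print(list_three)  # [('he', 7), ('she', 3), ('it', 1), ('pineapple', 1)]
```

One caveat from the documentation quoted above: `+` keeps only positive counts, so a word whose combined count is 0 disappears. If zero counts must survive, `combined.update(...)` adds counts without dropping them.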

Let's start with your two lists, adapted slightly to work in Python:
list_one = [(u'he', 3),(u'she', 2),(u'it', 1),(u'pineapple', 1)]
list_two = [(u'he', 4),(u'she', 1),(u'it',0)]
Now, let's combine them:
d = {word: value for word, value in list_one}
for word, value in list_two:
    d[word] = d.get(word, 0) + value
print(d)
This produces the desired numbers in dictionary form:
{u'it': 1, u'pineapple': 1, u'she': 3, u'he': 7}
The above is a dictionary. If you want it back in list-of-tuples form, just use list(d.items()):
[(u'it', 1), (u'pineapple', 1), (u'she', 3), (u'he', 7)]
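Neither version above puts the highest counts first, which the question asks for. One way is to sort the dictionary's items by value. A minimal sketch that rebuilds d from the two lists so it runs on its own:

```python
list_one = [(u'he', 3), (u'she', 2), (u'it', 1), (u'pineapple', 1)]
list_two = [(u'he', 4), (u'she', 1), (u'it', 0)]

d = {word: value for word, value in list_one}
for word, value in list_two:
    d[word] = d.get(word, 0) + value

# Sort (word, count) pairs by count, largest first; sorted() is stable,
# so words with equal counts keep their dict order
list_three = sorted(d.items(), key=lambda item: item[1], reverse=True)
print(list_three)  # [('he', 7), ('she', 3), ('it', 1), ('pineapple', 1)]
```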

Related

How to print non repeating elements with original list

Given a list of integers nums, return a list of all the elements, but no repeated number should appear more than twice.
Example:
input: nums = [1,1,2,3,3,4,4,4,5]
output: [1,1,2,3,3,4,4,5]
A more flexible implementation using itertools:
from itertools import islice, groupby, chain
nums = [1,1,2,3,3,4,4,4,5]
output = (islice(g, 2) for _, g in groupby(nums))
output = list(chain.from_iterable(output))
print(output) # [1, 1, 2, 3, 3, 4, 4, 5]
You can replace 2 in islice(g, 2) to tune the max repeats you want.
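Note that groupby only groups *consecutive* equal values, so the islice version caps each run of repeats rather than each value's total count. That is fine for sorted input like the example. If the input may be unsorted and the limit should apply to the total, a plain loop with a Counter is one option; cap_repeats is a hypothetical helper name used for illustration:

```python
from collections import Counter

def cap_repeats(nums, max_repeats=2):
    """Keep each value at most max_repeats times in total, preserving order."""
    seen = Counter()
    output = []
    for n in nums:
        if seen[n] < max_repeats:
            output.append(n)
        seen[n] += 1
    return output

print(cap_repeats([1, 1, 2, 3, 3, 4, 4, 4, 5]))  # [1, 1, 2, 3, 3, 4, 4, 5]
print(cap_repeats([4, 1, 4, 4]))                 # [4, 1, 4]
```

On [4, 1, 4, 4] the groupby version would return all four items, because the two runs of 4 are separate groups.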
The easiest and, I guess, most straightforward way to get the unique elements is with a set:
list(set(nums)) -> [1, 2, 3, 4, 5]
The downside of this approach is that sets are unordered, so we cannot really depend on how the list will be ordered after the conversion.
If order is important in your case you can do this:
list(dict.fromkeys(nums))
[1, 2, 3, 4, 5]
Since Python 3.7, dicts are guaranteed to preserve insertion order (CPython 3.6 already did so as an implementation detail), and their keys are unique. So with this small trick we get a list of the unique keys of a dictionary while still maintaining the original order!
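The difference only shows when the input is not already sorted. A small sketch on a made-up unsorted list:

```python
nums = [3, 1, 3, 2, 1]

# dict keys are unique and (Python 3.7+) keep insertion order,
# so this keeps the first occurrence of each value
print(list(dict.fromkeys(nums)))  # [3, 1, 2]

# sorted(set(...)) also gives a predictable order, but it is sorted order,
# not the original order
print(sorted(set(nums)))  # [1, 2, 3]
```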

Counting "unique pairs" of numbers into a python dictionary?

EDIT: Edited typos; the key values of the dictionary should be dictionaries, not sets.
I will keep the typos here though, as the questions below address this question. My apologies for the confusion.
Here's the problem:
Let's say I have a list of integers in which there are never repeats:
list1 = [2, 3]
In this case, there is a unique pair 2-3 and 3-2, so the dictionary should be:
{2:{3: 1}, 3:{2: 1}}
That is, there is 1 pair of 2-3 and 1 pair of 3-2.
For larger lists, the pairing is the same, e.g.
list2 = [2, 3, 4]
has the dictionary
{2:{3: 1}, 3:{2: 1}, 3:{4: 1}, 4:{3: 1}, 2:{4: 1}, 4:{2: 1}}
(1) Once the size of the lists become far larger, how would one algorithmically find the "unique pairs" in this format using python data structures?
(2) I mentioned that the lists cannot have repeat integers, e.g.
[2, 2, 3]
is impossible, as there are two 2s.
However, one may have a list of lists:
list3 = [[2, 3], [2, 3, 4]]
whereby the dictionary must be
{2:{3: 2}, 3:{2: 2}, 3:{4: 1}, 4:{3: 1}, 2:{4: 1}, 4:{2: 1}}
as there are two pairs of 2-3 and 3-2. How would one "update" the dictionary given multiple lists within a list?
This is an algorithmic problem, and I don't know of the most efficient solution. My idea would be to somehow cache values in a list and enumerate pairs...but that would be so slow. I'm guessing there's something useful from itertools.
What you want is to count pairs that arise from combinations in your lists. You can find those with a Counter and combinations.
from itertools import combinations
from collections import Counter
list2 = [2, 3, 4]
count = Counter(combinations(list2, 2))
print(count)
Output
Counter({(2, 3): 1, (2, 4): 1, (3, 4): 1})
As for your list of lists, we update the Counter with the result from each sublist.
from itertools import combinations
from collections import Counter
list3 = [[2, 3], [2, 3, 4]]
count = Counter()
for sublist in list3:
    count.update(Counter(combinations(sublist, 2)))
print(count)
Output
Counter({(2, 3): 2, (2, 4): 1, (3, 4): 1})
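The question's edit says the values should be nested dictionaries, e.g. {2: {3: 1}, 3: {2: 1}}, with each pair counted in both directions. A sketch building that shape from the same combinations, assuming that both-directions reading; pair_counts is a hypothetical name:

```python
from collections import Counter, defaultdict
from itertools import combinations

def pair_counts(lists):
    """Count unordered pairs in each sublist, stored in both directions."""
    nested = defaultdict(Counter)
    for sublist in lists:
        for a, b in combinations(sublist, 2):
            nested[a][b] += 1  # pair a-b
            nested[b][a] += 1  # and its mirror b-a
    # Convert to plain dicts for printing
    return {key: dict(inner) for key, inner in nested.items()}

print(pair_counts([[2, 3], [2, 3, 4]]))
# {2: {3: 2, 4: 1}, 3: {2: 2, 4: 1}, 4: {2: 1, 3: 1}}
```

Because every key maps to its own inner dict, this avoids the duplicate-key problem in the question's examples, where 2 and 3 each appear as a key more than once.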
My approach iterates over the input dict (linear complexity) and pairs each key with every integer in its value, flattening one level of nested lists (the cost of this step depends on the exact specs of your question, e.g. whether each list can contain unlimited sub-lists), inserting these pairs into an output dict (constant time per insertion).
def update_results(result_map, tup):
    # Update the dict in place, counting how often each (key, value) pair occurs
    try:
        result_map[tup] += 1
    except KeyError:
        result_map[tup] = 1
def algo(data):
    # Use a dict to count unique pairs while iterating over each
    # (key, v[i]) pair, where v[i] is an integer in the list data[key]
    # (one level of nested lists is flattened)
    result_map = dict()
    for key, val in data.items():
        if isinstance(val, list):
            for x in val:
                if isinstance(x, list):
                    for y in x:
                        update_results(result_map, (key, y))
                else:
                    update_results(result_map, (key, x))
        else:
            update_results(result_map, (key, val))
    # Number of distinct (key, integer) pairs
    return len(result_map)
>>> data = {1: [1, 2], 2: [1, 2, [2, 3]]}
>>> algo(data)
5
I'm pretty sure there's a more refined way to do this (again, it depends on the exact specs of your question), but this could get you started (no imports).

Fast sorting of large nested lists

I am looking to find out the likelihood of parameter combinations using Monte Carlo Simulation.
I've got 4 parameters and each can have about 250 values.
I have randomly generated 250,000 scenarios for each of those parameters using some probability distribution function.
I now want to find out which parameter combinations are the most likely to occur.
To achieve this I have started by filtering out any duplicates from my 250,000 randomly generated samples in order to reduce the length of the list.
I then iterated through this reduced list and checked how many times each scenario occurs in the original 250,000 long list.
I have a large list of 250,000 items which contains lists, as such:
a = [[1,2,5,8],[1,2,5,8],[3,4,5,6],[3,4,5,7],....,[3,4,5,7]]# len(a) is equal to 250,000
I want to find a fast and efficient way of having each list within my list only occurring once.
The end goal is to count the occurrences of each list within list a.
so far I've got:
# Remove duplicates from list a and store the result as a new list, temp
b_set = set(tuple(x) for x in a)
temp = [list(x) for x in b_set]
temp.sort(key=lambda x: a.index(x))
# Then iterate through each possible list (i.e. temp) and count how many times it occurs in a
most_likely_dict = {}
for scenario in temp:
    freq = a.count(scenario)
    most_likely_dict[str(scenario)] = freq
At the moment it takes a good 15 minutes to run. Any suggestion on how to turn that into a few seconds would be greatly appreciated!
You can take out the sorting part, since the order of temp has no effect on the counts in the final dictionary, then use a dict comprehension:
>>> a = [[1,2],[1,2],[3,4,5],[3,4,5], [3,4,5]]
>>> a_tupled = [tuple(i) for i in a]
>>> b_set = set(a_tupled)
>>> {repr(i): a_tupled.count(i) for i in b_set}
{'(1, 2)': 2, '(3, 4, 5)': 3}
Calling list on your tuples adds more overhead, but you can do it if you want to:
>>> {repr(list(i)): a_tupled.count(i) for i in b_set}
{'[3, 4, 5]': 3, '[1, 2]': 2}
Or just use a Counter:
>>> from collections import Counter
>>> Counter(tuple(i) for i in a)
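Building the Counter is a single pass over a, whereas calling a.count() or a_tupled.count() once per unique scenario rescans the whole list each time. A sketch on the question's sample rows (a small stand-in for the 250,000-row list) that also lists the most likely combinations first:

```python
from collections import Counter

a = [[1, 2, 5, 8], [1, 2, 5, 8], [3, 4, 5, 6], [3, 4, 5, 7], [3, 4, 5, 7]]

# One pass over a: lists are unhashable, so convert each scenario to a tuple
scenario_counts = Counter(tuple(scenario) for scenario in a)

# Most likely combinations first; pass n to most_common(n) for just the top n
for scenario, freq in scenario_counts.most_common():
    print(list(scenario), freq)
```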
{str(item):a.count(item) for item in a}
Input :
a = [[1,2,5,8],[1,2,5,8],[3,4,5,6],[3,4,5,7],[3,4,5,7]]
Output :
{'[3, 4, 5, 6]': 1, '[1, 2, 5, 8]': 2, '[3, 4, 5, 7]': 2}

Extending a dictionary by adding to tupled value

(I'm just using things to represent the format my dictionary is like, this is not how it actually looks)
The current dictionary I have is in the format of :
dict={"a":(1,2,3),"b":(2,3,4),"c":(3,4,5)}
dict2={1:(1,1,2),2:(2,2,3),3:(3,3,4)}
I am trying to add on to it. I have a function that calculated an average and now I want to create a second function that would pull that average from the first function and add it to the tuple in the values of my dictionary. So if the average was 4 for "a", then I would want to add it on for it to look like this {"a":(1,2,3,4)} etc.
def func1(dict, dict2, number):
if number in dict:
for num in dict.values():
#return sum after iteration of sum through num[0] to num[2]
def func 2 (dict1, dict2)
if number in dict:
new_val_to_add= func1(dict, dict2, number)
From here I'm not sure where I would go with adding the returned value to the tuple and be able to print back dict with the added value. I was thinking of maybe converting the tuple in the first dictionary into a list and then append to that as I iterate through each key in the dictionary, and finally convert it back into a tuple. Would this be the right way of going about this?
If you want the values to be mutable, why don't you use lists instead of tuples?
You can combine two tuples using +, though:
>>> (1, 2, 3) + (4,)
(1, 2, 3, 4)
I'm having trouble following your example code, but with a dictionary following your description you can do it like this:
>>> d1 = {'a': (1, 2, 3)}
>>> d1['a'] += (4,)
>>> d1
{'a': (1, 2, 3, 4)}
Or, using a list instead:
>>> d1 = {'a': [1, 2, 3]}
>>> d1['a'].append(4)
>>> d1
{'a': [1, 2, 3, 4]}
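Applied to every key, the same `+` idea covers the question's goal of appending a computed average to each tuple. This sketch assumes "average" means the mean of the tuple's own values; the average() helper is a hypothetical stand-in for the question's first function:

```python
d = {"a": (1, 2, 3), "b": (2, 3, 4), "c": (3, 4, 5)}

def average(values):
    # Hypothetical stand-in for the question's averaging function
    return sum(values) / len(values)

# Tuples are immutable, so build a new tuple and store it back under the key
# (replacing values of existing keys while iterating is safe)
for key, values in d.items():
    d[key] = values + (average(values),)

print(d)  # {'a': (1, 2, 3, 2.0), 'b': (2, 3, 4, 3.0), 'c': (3, 4, 5, 4.0)}
```

No conversion to a list and back is needed: concatenation already returns a new tuple.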

Counting occurrences in a Python list

I have a list of integers; for example:
l = [1, 2, 3, 4, 4, 4, 1, 1, 1, 2]
I am trying to make a list of the three elements in l with the highest number of occurrences, in descending order of frequency. So in this case I want the list [1, 4, 2], because 1 occurs the most in l (four times), 4 is next with three instances, and then 2 with two. I only want the top three results, so 3 (with only one instance) doesn't make the list.
How can I generate that list?
Use a collections.Counter:
import collections
l = [1, 2, 3, 4, 4, 4, 1, 1, 1, 2]
x = collections.Counter(l)
print(x.most_common())
# [(1, 4), (4, 3), (2, 2), (3, 1)]
print([elt for elt,count in x.most_common(3)])
# [1, 4, 2]
collections.Counter was introduced in Python 2.7. If you are using an older version, then you could use the implementation here.
l_items = set(l)  # produce the items without duplicates
l_counts = [(l.count(x), x) for x in l_items]
# for every item, create a tuple with the number of times the item appears and
# the item itself
l_counts.sort(reverse=True)
# sort the list of tuples; reversing it puts the highest counts first
l_result = [y for x, y in l_counts][:3]
# get rid of the counts, leaving just the items, and keep the top three
from collections import defaultdict
l = [1, 2, 3, 4, 4, 4, 1, 1, 1, 2]
counter = defaultdict(int)
for item in l:
    counter[item] += 1
inverted_dict = {v: k for k, v in counter.items()}
# highest count first
for count in sorted(inverted_dict, reverse=True):
    print(inverted_dict[count], count)
This prints the items in 'l' from most to least frequent; you would need to restrict it to the first three. Be careful when using inverted_dict (where the keys and values are swapped): items with identical counts overwrite each other, so only one item per count survives in the dict.
Without using collections:
a = reversed(sorted(l,key=l.count))
outlist = []
for element in a:
    if element not in outlist:
        outlist.append(element)
The first line gets you all the original items sorted by count, highest first.
The for loop is necessary to uniquify without losing the order (there may be a better way).
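One alternative for the uniquify step is dict.fromkeys, which drops duplicates while keeping first-seen order (Python 3.7+). A sketch of the same approach, still without collections:

```python
l = [1, 2, 3, 4, 4, 4, 1, 1, 1, 2]

# Sort by count, highest first (sorted() is stable, so ties keep list order)
by_count = sorted(l, key=l.count, reverse=True)

# dict.fromkeys keeps the first occurrence of each item, in order
outlist = list(dict.fromkeys(by_count))
print(outlist[:3])  # [1, 4, 2]
```

Since l.count is called once per element, this is quadratic; for large lists the Counter.most_common(3) answer is the faster choice.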
