EDIT: Edited typos; the key values of the dictionary should be dictionaries, not sets.
I will keep the typos here though, as the questions below address this question. My apologies for the confusion.
Here's the problem:
Let's say I have a list of integers in which there are never repeats:
list1 = [2, 3]
In this case, there is a unique pair 2-3 and 3-2, so the dictionary should be:
{2:{3: 1}, 3:{2: 1}}
That is, there is 1 pair of 2-3 and 1 pair of 3-2.
For larger lists, the pairing is the same, e.g.
list2 = [2, 3, 4]
has the dictionary
{2:{3: 1}, 3:{2: 1}, 3:{4: 1}, 4:{3: 1}, 2:{4: 1}, 4:{2: 1}}
(1) Once the size of the lists become far larger, how would one algorithmically find the "unique pairs" in this format using python data structures?
(2) I mentioned that the lists cannot have repeat integers, e.g.
[2, 2, 3]
is impossible, as there are two 2s.
However, one may have a list of lists:
list3 = [[2, 3], [2, 3, 4]]
whereby the dictionary must be
{2:{3: 2}, 3:{2: 2}, 3:{4: 1}, 4:{3: 1}, 2:{4: 1}, 4:{2: 1}}
as there are two pairs of 2-3 and 3-2. How would one "update" the dictionary given multiple lists within a list?
This is an algorithmic problem, and I don't know of the most efficient solution. My idea would be to somehow cache values in a list and enumerate pairs...but that would be so slow. I'm guessing there's something useful from itertools.
What you want is to count pairs that arise from combinations in your lists. You can find those with a Counter and combinations.
from itertools import combinations
from collections import Counter
list2 = [2, 3, 4]
count = Counter(combinations(list2, 2))
print(count)
Output
Counter({(2, 3): 1, (2, 4): 1, (3, 4): 1})
As for your list of lists, we update the Counter with the result from each sublist.
from itertools import combinations
from collections import Counter
list3 = [[2, 3], [2, 3, 4]]
count = Counter()
for sublist in list3:
    count.update(Counter(combinations(sublist, 2)))
print(count)
Output
Counter({(2, 3): 2, (2, 4): 1, (3, 4): 1})
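If you want the nested {a: {b: count}} format from the question rather than tuple keys, one way (a small sketch building on the Counter above) is to expand it afterwards, recording each pair in both orientations:

from collections import Counter
from itertools import combinations

list3 = [[2, 3], [2, 3, 4]]

count = Counter()
for sublist in list3:
    count.update(combinations(sublist, 2))

# Expand the tuple-keyed Counter into the nested {a: {b: count}} format,
# storing each pair in both directions (2-3 and 3-2).
nested = {}
for (a, b), n in count.items():
    nested.setdefault(a, {})[b] = n
    nested.setdefault(b, {})[a] = n

print(nested)
# {2: {3: 2, 4: 1}, 3: {2: 2, 4: 1}, 4: {2: 1, 3: 1}}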
My approach iterates over the input dict (linear complexity) and pairs each key with each integer found in its value (the exact complexity here depends on the specs of your question - e.g., can each list contain unlimited sub-lists?), inserting these pairs into an output dict (constant time per insertion).
def update_results(result_map, tup):
    # Update the dict in place; no need to return anything
    try:
        result_map[tup] += 1
    except KeyError:
        result_map[tup] = 1

def algo(input):
    # Use a dict to keep count of unique pairs while iterating
    # over each (key, v[i]) pair where v[i] is an integer in
    # list input[key]
    result_map = dict()
    for key, val in input.items():
        if isinstance(val, list):
            for x in val:
                if isinstance(x, list):
                    for y in x:
                        update_results(result_map, (key, y))
                else:
                    update_results(result_map, (key, x))
        else:
            update_results(result_map, (key, val))
    return len(result_map)
>>> input = { 1: [1, 2], 2: [1, 2, [2, 3]] }
>>> algo(input)
5
I'm pretty sure there's a more refined way to do this (again, it would depend on the exact specs of your question), but this could get you started (no imports).
Related
I'm new, so bear with me. I could create two separate for loops, but I like this method since I'll probably iterate through multiple lists and it's less code. Is it possible with itertools? As I understand it, it creates one list out of the two, so I might be out of luck.
import itertools

a = [0, 1, 2]
b = [3, 4, 5]

def finditem(n):
    for i in itertools.chain(a, b):
        if i == n:
            print(n)  # here I want (n) and (a or b)

n = 3
finditem(n)
I think what you want is:
given several lists and an item, find whether the item exists in any of them,
and tell me which list it was found in.
You should probably create a dictionary whose keys are the names of your lists and whose values are the lists themselves:
my_lists = {'a': [0, 1, 2], 'b': [3, 4, 5]}

def find_item(my_lists, item_to_find):
    for key in my_lists:
        if item_to_find in my_lists[key]:
            print(key)

n = 3
find_item(my_lists, n)
There isn't much need for itertools here, but you should store your data in a dictionary if you want to be able to give them labels that you can refer to in the program. To list all of the list names containing n:
lists = {'fruits': [0, 1, 2], 'vegetables': [3, 4, 5]}

def find_item(n, lists):
    for name, list_ in lists.items():
        if n in list_:
            print(name)
or similarly:
print([name for name, list_ in lists.items() if n in list_])
or to print only the first list found that contains n:
print(next(name for name, list_ in lists.items() if n in list_))
If you need this for an if condition, and the number of lists is not very large, then probably don't use a loop in the first place, but a simple condition like this:
fruits = [0, 1, 2]
vegetables = [3, 4, 5]

if n in fruits:
    ...  # do this
elif n in vegetables:
    ...  # do that
I have a numeric list a and I want to output a list with the hierarchical position of every element in a (0 for the highest value, 1 for the second-highest, etc).
I want to know if this is the most Pythonic and efficient way to do this. Perhaps there is a better way?
a = [3,5,6,25,-3,100]
b = sorted(a)
b = b[::-1]
[b.index(i) for i in a]
#ThierryLathuille's answer works only if there are no duplicates in the input list since the answer relies on a dict with the list values as keys. If there can be duplicates in the list, you should sort the items in the input list with their indices generated by enumerate, and map those indices to their sorted positions instead:
from operator import itemgetter
mapping = dict(zip(map(itemgetter(0),
                       sorted(enumerate(a), key=itemgetter(1), reverse=True)),
                   range(len(a))))
mapping becomes:
{5: 0, 3: 1, 2: 2, 1: 3, 0: 4, 4: 5}
so that you can then iterate an index over the length of the list to obtain the sorted positions in order:
[mapping[i] for i in range(len(a))]
which returns:
[4, 3, 2, 1, 5, 0]
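For example, with a hypothetical input list that does contain a duplicate, each occurrence still gets its own rank, in order of appearance:

from operator import itemgetter

a = [3, 5, 5, 1]  # hypothetical example with a duplicate value

mapping = dict(zip(map(itemgetter(0),
                       sorted(enumerate(a), key=itemgetter(1), reverse=True)),
                   range(len(a))))
print([mapping[i] for i in range(len(a))])
# [2, 0, 1, 3] -- the two 5s get ranks 0 and 1, in order of appearance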
You could also use numpy.argsort(-a) (negating because argsort assumes ascending order). It could have better performance for large arrays (though there's no official analysis that I know of).
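A minimal sketch of that numpy route (assuming numpy is available; note that a plain list has to be converted to an array before negating, and argsort has to be applied twice to turn the sort order into ranks):

import numpy as np

a = [3, 5, 6, 25, -3, 100]
arr = np.array(a)

# np.argsort(-arr) gives the indices that would sort the array in descending
# order; applying argsort again inverts that permutation, yielding each
# element's rank (0 for the highest value).
ranks = np.argsort(np.argsort(-arr))
print(ranks.tolist())
# [4, 3, 2, 1, 5, 0]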
One problem with your solution is the repeated use of index, which makes your final comprehension O(n**2), as index has to scan the sorted list each time.
It would be more efficient to build a dict with the rank of each value in the sorted list:
a = [3,5,6,25,-3,100]
ranks = {val:idx for idx, val in enumerate(sorted(a, reverse=True))}
# {100: 0, 25: 1, 6: 2, 5: 3, 3: 4, -3: 5}
out = [ranks[val] for val in a]
print(out)
# [4, 3, 2, 1, 5, 0]
in order to have a final step in O(n).
First, zip a with range(len(a)) to create a list of (element, position) tuples. Sort this list in reverse order, then zip it with range(len(a)) again to record where each element landed after the sort. Now "unsort" this list by sorting on each element's original position, and finally grab each element's sorted position:
>>> a = [3,5,6,25,-3,100]
>>> [i for _,i in sorted(zip(sorted(zip(a, range(len(a))), reverse=True), range(len(a))), key=lambda t:t[0][1])]
[4, 3, 2, 1, 5, 0]
I have been tasked with grouping a list by frequency. This is a very common question on SOF, and so far the forum has been very educational. However, of all the examples given, only one satisfies these requirements:
Sort the given iterable so that its elements end up in the decreasing frequency order.
If two elements have the same frequency, they should end up in the same order as the first appearance in the iterable.
Using these two lists:
[4, 6, 2, 2, 6, 4, 4, 4]
[17, 99, 42]
The following snippets, commonly given as solutions to this question, fail one of these requirements.
from collections import Counter
freq = Counter(items)
# Ex 1
# The items don't stay grouped in the final list :(
sorted(items, key=items.count, reverse=True)
sorted(items, key=lambda x: -freq[x])
[4, 4, 4, 4, 6, 2, 2, 6]
# Ex 2
# The order that the items appear in the list gets rearranged :(
sorted(sorted(items), key=freq.get, reverse=True)
[4, 4, 4, 4, 2, 2, 6, 6]
# Ex 3
# With a list of integers, after the quantity gets sorted,
# the int value gets sorted :(
sorted(items, key=lambda x: (freq[x], x), reverse=True)
[99, 42, 17]
I did find a solution that works great though:
s_list = sorted(freq, key=freq.get, reverse=True)

new_list = []
for num in s_list:
    for rep in range(freq[num]):
        new_list.append(num)

print(new_list)
I can't figure out how the second loop references the number of occurrences though.
I ran the process through Python Tutor to visualize it, and the code seems to simply know that there are four "4"s, two "6"s and two "2"s in the 'items' list. The only explanation I can think of is that Python can reference a list in the global frame without it being named, or perhaps it is able to use the values from the "freq" dictionary. Is this correct?
referenced thread:
Sort list by frequency in python
Yes, the values of freq are the ones making the second loop work.
freq is a Counter:
It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values.
In other words, freq is a dictionary whose keys are the unique elements of items, mapped to the number of times they appeared in items.
And to illustrate your example:
>>> from collections import Counter
>>> items = [4, 6, 2, 2, 6, 4, 4, 4]
>>> freq = Counter(items)
>>> freq
Counter({4: 4, 6: 2, 2: 2})
So when range(freq[num]) is iterated over in your second loop, all it does is iterate as many times as num appeared in items.
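Put differently, the two loops in your solution are equivalent to this comprehension (a sketch; the tie order relies on Counter preserving insertion order, which holds on Python 3.7+):

from collections import Counter

items = [4, 6, 2, 2, 6, 4, 4, 4]
freq = Counter(items)
s_list = sorted(freq, key=freq.get, reverse=True)

# Repeat each num freq[num] times, in decreasing-frequency order.
new_list = [num for num in s_list for _ in range(freq[num])]
print(new_list)
# [4, 4, 4, 4, 6, 6, 2, 2]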
Edit 2019-02-13: Additional information and example for Python Tutor
It looks like Python Tutor represents simple built-in types (integers, strings, ...) as-is, and not as "objects" in their own cell.
You can see the references clearly if you use new objects instead of plain integers. For instance, if you were to wrap the integers like this:
from collections import Counter

class MyIntWrapper:
    def __init__(self, value):
        self.value = value

items = [4, 6, 2, 2, 6, 4, 4, 4]
items_wrapped = [MyIntWrapper(item) for item in items]

freq = Counter(items_wrapped)
s_list = sorted(freq, key=freq.get, reverse=True)

new_list = []
for num in s_list:
    for rep in range(freq[num]):
        new_list.append(num)
This question was previously asked here with an egregious typo: Counting "unique pairs" of numbers into a python dictionary?
This is an algorithmic problem, and I don't know of the most efficient solution. My idea would be to somehow cache values in a list and enumerate pairs...but that would be so slow. I'm guessing there's something useful from itertools.
Let's say I have a list of integers in which there are never repeats:
list1 = [2, 3]
In this case, there is a unique pair 2-3 and 3-2, so the dictionary should be:
{2:{3: 1}, 3:{2: 1}}
That is, there is 1 pair of 2-3 and 1 pair of 3-2.
For larger lists, the pairing is the same, e.g.
list2 = [2, 3, 4]
has the dictionary
{2:{3:1, 4:1}, 3:{2:1, 4:1}, 4:{3:1, 2:1}}
(1) Once the size of the lists become far larger, how would one algorithmically find the "unique pairs" in this format using python data structures?
(2) I mentioned that the lists cannot have repeat integers, e.g.
[2, 2, 3]
is impossible, as there are two 2s.
However, one may have a list of lists:
list3 = [[2, 3], [2, 3, 4]]
whereby the dictionary must be
{2:{3:2, 4:1}, 3:{2:2, 4:1}, 4:{2:1, 3:1}}
as there are two pairs of 2-3 and 3-2. How would one "update" the dictionary given multiple lists within a list?
EDIT: My ultimate use case is, I want to iterate through hundreds of lists of integers, and create a single dictionary with the "counts" of pairs. Does this make sense? There might be another data structure which is more useful.
For the nested list example, you can do the following, making use of itertools.permutations and dict.setdefault:
from itertools import permutations
list3 = [[2, 3], [2, 3, 4]]
d = {}
for l in list3:
    for a, b in permutations(l, 2):
        d[a][b] = d.setdefault(a, {}).setdefault(b, 0) + 1
# {2: {3: 2, 4: 1}, 3: {2: 2, 4: 1}, 4: {2: 1, 3: 1}}
For a flat list l, use only the inner loop and omit the outer one.
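For instance, the flat-list case might look like this (a sketch of the same idea):

from itertools import permutations

list2 = [2, 3, 4]

d = {}
for a, b in permutations(list2, 2):
    d[a][b] = d.setdefault(a, {}).setdefault(b, 0) + 1

print(d)
# {2: {3: 1, 4: 1}, 3: {2: 1, 4: 1}, 4: {2: 1, 3: 1}}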
For this example I'll just use a list with straight numbers and no nested list:
values = [3, 2, 4]
result = dict.fromkeys(values)

for key in result:
    value = {}
    for num in values:
        if num != key:
            value[num] = 1
    result[key] = value  # assign the nested dict back onto the key
This creates a dict with each number as a key. Then, for each key, the value becomes a nested dict whose contents are num: 1 for every number in the original values list that isn't the key itself.
Use defaultdict with permutations:
from collections import defaultdict
from itertools import permutations
d = defaultdict(dict)
for i in permutations([4, 2, 3]):
    d[i[0]] = {k: 1 for k in i[1:]}
output is
In [22]: d
Out[22]: defaultdict(dict, {2: {3: 1, 4: 1}, 4: {2: 1, 3: 1}, 3: {2: 1, 4: 1}})
For the nested list-of-lists case, see https://stackoverflow.com/a/52206554/8060120
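One way to extend this to the nested list-of-lists case (a sketch using defaultdict(Counter), rather than the linked answer) is to accumulate the counts per sublist:

from collections import defaultdict, Counter
from itertools import permutations

list3 = [[2, 3], [2, 3, 4]]

d = defaultdict(Counter)
for sublist in list3:
    for a, b in permutations(sublist, 2):
        d[a][b] += 1

print({k: dict(v) for k, v in d.items()})
# {2: {3: 2, 4: 1}, 3: {2: 2, 4: 1}, 4: {2: 1, 3: 1}}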
I'm trying to wrap my brain around this but it's not flexible enough.
In my Python script I have a dictionary of dictionaries of lists. (Actually it gets a little deeper but that level is not involved in this question.) I want to flatten all this into one long list, throwing away all the dictionary keys.
Thus I want to transform
{1: {'a': [1, 2, 3], 'b': [0]},
2: {'c': [4, 5, 1], 'd': [3, 8]}}
to
[1, 2, 3, 0, 4, 5, 1, 3, 8]
I could probably set up a map-reduce to iterate over items of the outer dictionary to build a sublist from each subdictionary and then concatenate all the sublists together.
But that seems inefficient for large data sets, because of the intermediate data structures (sublists) that will get thrown away. Is there a way to do it in one pass?
Barring that, I would be happy to accept a two-level implementation that works... my map-reduce is rusty!
Update:
For those who are interested, below is the code I ended up using.
Note that although I asked above for a list as output, what I really needed was a sorted list; i.e. the output of the flattening could be any iterable that can be sorted.
def genSessions(d):
    """Given the ipDict, return an iterator that provides all the sessions,
    one by one, converted to tuples."""
    for uaDict in d.itervalues():
        for sessions in uaDict.itervalues():
            for session in sessions:
                yield tuple(session)
...
# Flatten dict of dicts of lists of sessions into a list of sessions.
# Sort that list by start time
sessionsByStartTime = sorted(genSessions(ipDict), key=operator.itemgetter(0))
# Then make another copy sorted by end time.
sessionsByEndTime = sorted(sessionsByStartTime, key=operator.itemgetter(1))
Thanks again to all who helped.
[Update: replaced nthGetter() with operator.itemgetter(), thanks to #intuited.]
I hope you realize that any order you see in a dict is accidental -- it's there only because, when shown on screen, some order has to be picked, but there's absolutely no guarantee.
Net of ordering issues among the various sublists getting catenated,
[x for d in thedict.itervalues()
   for alist in d.itervalues()
   for x in alist]
does what you want without any inefficiency nor intermediate lists.
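An equivalent formulation with itertools, if you prefer it (a sketch; written with .values() for Python 3, where itervalues no longer exists):

from itertools import chain

thedict = {1: {'a': [1, 2, 3], 'b': [0]},
           2: {'c': [4, 5, 1], 'd': [3, 8]}}

# Chain the inner dicts' list values together lazily, then materialize once.
# The result order follows dict insertion order on modern Pythons.
flat = list(chain.from_iterable(
    chain.from_iterable(inner.values() for inner in thedict.values())))
print(flat)
# [1, 2, 3, 0, 4, 5, 1, 3, 8]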
edit: re-read the original question and reworked answer to assume that all non-dictionaries are lists to be flattened.
In cases where you're not sure how far down the dictionaries go, you would want to use a recursive function. #Arrieta has already posted a function that recursively builds a list of non-dictionary values.
This one is a generator that yields successive non-dictionary values in the dictionary tree:
def flatten(d):
    """Recursively flatten dictionary values in `d`.
    >>> hat = {'cat': ['images/cat-in-the-hat.png'],
    ...        'fish': {'colours': {'red': [0xFF0000], 'blue': [0x0000FF]},
    ...                 'numbers': {'one': [1], 'two': [2]}},
    ...        'food': {'eggs': {'green': [0x00FF00]},
    ...                 'ham': ['lean', 'medium', 'fat']}}
    >>> set_of_values = set(flatten(hat))
    >>> sorted(set_of_values)
    [1, 2, 255, 65280, 16711680, 'fat', 'images/cat-in-the-hat.png', 'lean', 'medium']
    """
    try:
        for v in d.itervalues():
            for nested_v in flatten(v):
                yield nested_v
    except AttributeError:
        for list_v in d:
            yield list_v
The doctest passes the resulting iterator to the set function. This is likely to be what you want, since, as Mr. Martelli points out, there's no intrinsic order to the values of a dictionary, and therefore no reason to keep track of the order in which they were found.
You may want to keep track of the number of occurrences of each value; this information will be lost if you pass the iterator to set. If you want to track that, just pass the result of flatten(hat) to some other function instead of set. Under Python 2.7, that other function could be collections.Counter. For compatibility with less-evolved pythons, you can write your own function or (with some loss of efficiency) combine sorted with itertools.groupby.
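A minimal sketch of both counting options, using a small stand-in list for the output of flatten:

from collections import Counter
from itertools import groupby

values = ['lean', 'fat', 'lean', 'medium', 'lean']  # stand-in for flatten(...) output

# Counter (Python 2.7+) tallies occurrences directly.
print(Counter(values))
# Counter({'lean': 3, 'fat': 1, 'medium': 1})

# sorted + groupby works on older Pythons, at the cost of a sort.
print(dict((key, len(list(group))) for key, group in groupby(sorted(values))))
# {'fat': 1, 'lean': 3, 'medium': 1}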
A recursive function may work:
def flat(d, out=[]):
    for val in d.values():
        if isinstance(val, dict):
            flat(val, out)  # recurse into the nested dict
        else:
            out += val
If you try it with:
>>> d = {1: {'a': [1, 2, 3], 'b': [0]}, 2: {'c': [4, 5, 6], 'd': [3, 8]}}
>>> out = []
>>> flat(d, out)
>>> print out
[1, 2, 3, 0, 4, 5, 6, 3, 8]
Notice that dictionaries have no order, so the list is in random order.
You can also return out (at the end of the loop) and not call the function with a list argument.
def flat(d, out=[]):
    # Note: the mutable default list is shared across calls; pass out=[]
    # explicitly if you call this more than once.
    for val in d.values():
        if isinstance(val, dict):
            flat(val, out)  # recurse into the nested dict
        else:
            out += val
    return out
call as:
my_list = flat(d)