I need to append the values from one dictionary (N) to another (M); pseudocode below:
if x in M:
    M[x] = M[x] + N[x]
else:
    M[x] = N[x]
Doing this for every key in N seems like quite untidy code.
What would be the most efficient way to achieve this?
Of course you should already be iterating over your keys as "x" - but a single-line solution is:
M.update({key:((M[key] + value) if key in M else value) for key, value in N.items()})
I'm not entirely sure what your x is (I'm guessing the keys of both M and N), but this might work:
M = {key: M.get(key, 0) + N.get(key, 0) for key in set((*M, *N))}
for the example:
M = {'a': 1, 'c': 3, 'e': 5}
N = {'a': 2, 'b': 4, 'c': 6, 'd': 8}
you get:
print(M) # {'a': 3, 'e': 5, 'd': 8, 'c': 9, 'b': 4}
or please clarify what the desired output for the given example would be.
When you say "append", I assume that means that the values in your dicts are lists. However the techniques below can easily be adapted if they're simple objects like integers or strings.
Python doesn't provide any built-in methods for handling dicts of lists, and the most efficient approach depends on the data: some ways work best when there is a high proportion of shared keys, while others are better when there aren't many shared keys.
If the proportion of shared keys isn't too high, this code is reasonably efficient:
m = {1:[1, 2], 2:[3]}
n = {1:[4], 3:[5]}
for k, v in n.items():
    m.setdefault(k, []).extend(v)
print(m)
Output
{1: [1, 2, 4], 2: [3], 3: [5]}
You can make this slightly faster by caching the .setdefault method lookup. (Note that the empty default list must be created fresh for each key; caching a single empty list would leave every new key sharing one list object.)
def merge(m, n):
    setdef = m.setdefault
    for k, v in n.items():
        setdef(k, []).extend(v)
If you expect a high proportion of shared keys, then it's better to perform set operations on the keys (in Python 3, dict.keys() returns a set-like view object, which is extremely efficient to construct), and handle the shared keys separately from the unique keys of N.
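A minimal sketch of that set-based approach (assuming, as above, that the values are lists):
shared = m.keys() & n.keys()
unique = n.keys() - m.keys()
for k in shared:
    m[k].extend(n[k])
for k in unique:
    m[k] = list(n[k])  # copy, so m doesn't end up aliasing n's lists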
Related
I want to check if there is an overlap between the points in one dictionary and another.
If there is no overlap, then I create a new key in the dictionary with the value.
I am getting an error while implementing the following code.
RuntimeError: dictionary changed size during iteration
for k1 in d1:
    for k2 in d2:
        if overlap(k1, k2) == False:
            d2[len(d2)+1] = d1[k1]
Is there another way to implement this?
Edit:
d1 = {"file1":[2,3],"file2":[11,15]}
d2 = {1:[1,5],2:[6,10]}
Output:
d2 = {1:[1,5],2:[6,10],3:[11,15]}
I believe you are looking for dict.update():
d1 = {'foo': 0,
'bar': 1}
d2 = {'foo': 0,
'bar': 1,
'baz': 2}
d1.update(d2)
This results in:
d1 = {'foo': 0,
'bar': 1,
'baz': 2}
Edit:
import itertools
d1 = {"file1":[2,3],
"file2":[11,15]}
d2 = {1:[1,5],
2:[6,10]}
d2 = {**d2, **{max(d2.keys())+i+1: v for i, (k, v) in enumerate({k: v for k, v in d1.items() if not any(i in range(v[0], v[1]+1) for i in itertools.chain.from_iterable(range(v[0], v[1]+1) for v in d2.values()))}.items())}}
Produces:
d2 = {1: [1, 5],
2: [6, 10],
3: [11, 15]}
😃
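For readability, roughly the same logic can be unrolled into a short loop (a sketch of the idea, not the one-liner verbatim):
import itertools

# Every integer covered by d2's existing ranges
covered = set(itertools.chain.from_iterable(
    range(lo, hi + 1) for lo, hi in d2.values()))

for lo, hi in d1.values():
    # Append d1's range under the next key if it doesn't overlap d2
    if not any(i in covered for i in range(lo, hi + 1)):
        d2[max(d2) + 1] = [lo, hi]
        covered.update(range(lo, hi + 1))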
If you are trying to merge dictionaries, then a fast way is to use one of these:
a = dict(one=1,two=2,three=3,four=4,five=5,six=6)
b = dict(one=1,two=2,three=5,six=6, seven=7, nine=9)
a.update(b)
Which gives {'one': 1, 'two': 2, 'three': 5, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'nine': 9}
Or you can use c = dict(a, **b) (note that this form requires the keys of b to be strings)
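As a side note (not from the original answer): on Python 3.5+ the same merge can be written with dict unpacking, and on 3.9+ with the union operator:
c = {**a, **b}  # Python 3.5+
c = a | b       # Python 3.9+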
You don't need a double loop there. Loop through the keys in d1 and check whether each key is already in d2; if not, add it. Something along these lines:
for k1 in d1:
    if k1 not in d2:
        d2[k1] = d1[k1]
Based on your description, edit, and comments it doesn't sound like dict.update() will give you the results you want.
The error you're hitting is because the loop makes an assumption about the number of items it will be iterating over once it starts. During the loop, you change that number, invalidating the assumption and causing the program to fail.
To fix this, you need to iterate a copy of the items instead. You also mentioned you're concerned with the values and not the keys, so you need to adjust what you're iterating as well.
Now, you didn't actually post what your overlap function does, so I'm assuming something like this based on your comments/sample:
def overlap(left, right):
    return left[0] - right[-1] == 1
For the most part, the actual implementation of your overlap is irrelevant to the question, but the code above at least gave me your expected output.
I've then adjusted your code to the following:
for v1 in list(d1.values()):
    for v2 in list(d2.values()):
        # If v1 overlaps v2
        if overlap(v1, v2):
            # Add v1 to d2 under the next sequential key
            d2[len(d2) + 1] = v1
By using list(d1.values()), I'm no longer iterating the dictionary itself while modifying it - I'm taking a "snapshot" of the contents (the values specifically here) at the start of the loop.
Using this and your sample data, d2 contains {1: [1, 5], 2: [6, 10], 3: [11, 15]} after the loop.
EDIT: A note on the typos below; the values of the dictionary should be dictionaries, not sets.
I will keep the typos here though, as the answers below address the question as originally asked. My apologies for the confusion.
Here's the problem:
Let's say I have a list of integers in which there are never repeats:
list1 = [2, 3]
In this case, there is a unique pair 2-3 and 3-2, so the dictionary should be:
{2:{3: 1}, 3:{2: 1}}
That is, there is 1 pair of 2-3 and 1 pair of 3-2.
For larger lists, the pairing is the same, e.g.
list2 = [2, 3, 4]
has the dictionary
{2:{3: 1}, 3:{2: 1}, 3:{4: 1}, 4:{3: 1}, 2:{4: 1}, 4:{2: 1}}
(1) Once the size of the lists become far larger, how would one algorithmically find the "unique pairs" in this format using python data structures?
(2) I mentioned that the lists cannot have repeat integers, e.g.
[2, 2, 3]
is impossible, as there are two 2s.
However, one may have a list of lists:
list3 = [[2, 3], [2, 3, 4]]
whereby the dictionary must be
{2:{3: 2}, 3:{2: 2}, 3:{4: 1}, 4:{3: 1}, 2:{4: 1}, 4:{2: 1}}
as there are two pairs of 2-3 and 3-2. How would one "update" the dictionary given multiple lists within a list?
This is an algorithmic problem, and I don't know of the most efficient solution. My idea would be to somehow cache values in a list and enumerate pairs...but that would be so slow. I'm guessing there's something useful from itertools.
What you want is to count pairs that arise from combinations in your lists. You can find those with a Counter and combinations.
from itertools import combinations
from collections import Counter
list2 = [2, 3, 4]
count = Counter(combinations(list2, 2))
print(count)
Output
Counter({(2, 3): 1, (2, 4): 1, (3, 4): 1})
As for your list of lists, we update the Counter with the result from each sublist.
from itertools import combinations
from collections import Counter
list3 = [[2, 3], [2, 3, 4]]
count = Counter()
for sublist in list3:
    count.update(Counter(combinations(sublist, 2)))
print(count)
Output
Counter({(2, 3): 2, (2, 4): 1, (3, 4): 1})
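If you need the symmetric nested-dict format from the question, the Counter can be converted afterwards; a sketch (not part of the original answer):
from collections import defaultdict

nested = defaultdict(dict)
for (a, b), n in count.items():
    nested[a][b] = n
    nested[b][a] = n

print(dict(nested))
# {2: {3: 2, 4: 1}, 3: {2: 2, 4: 1}, 4: {2: 1, 3: 1}}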
My approach iterates over the input dict (linear complexity) and pairs each key with every integer in its value list (this part's complexity depends on the exact specs of your question - e.g., can each list contain unlimited sub-lists?), inserting these pairs into an output dict (constant-time insertion).
def update_results(result_map, tup):
    # Update dict in place; the count itself isn't needed later,
    # only the set of keys
    try:
        result_map[tup] += 1
    except KeyError:
        result_map[tup] = 1
def algo(input):
    # Use dict to keep count of unique pairs while iterating
    # over each (key, v[i]) pair where v[i] is an integer in
    # list input[key]
    result_map = dict()
    for key, val in input.items():
        if isinstance(val, list):
            for x in val:
                if isinstance(x, list):
                    for y in x:
                        update_results(result_map, (key, y))
                else:
                    update_results(result_map, (key, x))
        else:
            update_results(result_map, (key, val))
    return len(result_map.keys())
>>> input = {1: [1, 2], 2: [1, 2, [2, 3]]}
>>> algo(input)
5
I'm pretty sure there's a more refined way to do this (again, it would depend on the exact specs of your question), but this could get you started (no imports).
I have a dictionary master which contains around 50000 to 100000 unique lists, which can be simple lists or lists of lists. Every list is assigned to a specific ID (which is the key of the dictionary):
master = {12: [1, 2, 4], 21: [[1, 2, 3], [5, 6, 7, 9]], ...} # len(master) is several ten thousands
Now I have a few hundred dictionaries, each of which again contains around 10000 lists (same as above: they can be nested). Example of one of those dicts:
a = {'key1': [6, 9, 3, 1], 'key2': [[1, 2, 3], [5, 6, 7, 9]], 'key3': [7], ...}
I want to cross-reference this data against my master for every single dictionary, i.e. instead of saving every list within a, I want to store only the ID from the master whenever the list is present in the master.
=> a = {'key1': [6, 9, 3, 1], 'key2': 21, 'key3': [7], ...}
I can do that by looping over all values in a and all values of master and try to match the lists (by sorting them), but that'll take ages.
Now I'm wondering how would you solve this?
I thought of "hashing" every list in master to a unique string and storing it as a key of a new master_inverse reference dict, e.g.:
master_inverse = {hash([1,2,4]): 12, hash([[1, 2, 3], [5, 6, 7, 9]]): 21}
Then it would be very simple to look it up later on:
for k, v in a.items():
    h = hash(v)
    if h in master_inverse:
        a[k] = master_inverse[h]
Do you have a better idea?
What could such a hash look like? Is there a built-in method already which is fast and unique?
EDIT:
Dunno why I didn't come up with this approach instantly:
What do you think of using an MD5 hash of either the pickle or the repr() of any single list?
Something like this:
import hashlib

def myHash(obj):
    # repr() gives a deterministic string for (nested) lists;
    # encode to bytes so hashlib also works on Python 3
    return hashlib.md5(repr(obj).encode()).hexdigest()

master_inverse = {myHash(v): k for k, v in master.items()}

for k, v in a.items():
    h = myHash(v)
    if h in master_inverse:
        a[k] = master_inverse[h]
EDIT2:
I benchmarked it: checking one of the hundred dicts (in my example a, which contains around 20k values for the benchmark) against my master_inverse is very fast; I didn't expect that: 0.08 sec. So I guess I can live with that well enough.
The MD5 approach will work, but you need to be cautious about the very small possibility of hash collisions (see How many random elements before MD5 produces collisions? for more details) when using MD5 hashes.
If you need to be absolutely sure that the program works correctly, you can convert the lists to tuples and create a dictionary whose keys are the tuples you have created and whose values are the keys from your master dictionary (the same as master_inverse, but with full values instead of MD5 hashes).
More info on how to use tuples as dictionary keys: http://www.developer.com/lang/other/article.php/630941/Learn-to-Program-using-Python-Using-Tuples-as-Keys.htm.
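A minimal sketch of that tuple-based variant; totuple is a hypothetical helper for nested lists, not from the original answer:
def totuple(x):
    # Recursively turn (nested) lists into hashable tuples
    return tuple(totuple(i) for i in x) if isinstance(x, list) else x

master_inverse = {totuple(v): k for k, v in master.items()}

for k, v in a.items():
    t = totuple(v)
    if t in master_inverse:
        a[k] = master_inverse[t]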
I have a list
In [4]: a = [1, 2, 3, 3, 2, 4]
from which I would like to remove duplicates via a comprehension using a sentinel list (see below why):
In [8]: [x if x not in seen else seen.append(x) for x in a]
Out[8]: [1, 2, 3, 3, 2, 4]
It seems that seen is not taken into account (neither updated nor checked). Why is it so?
As for why I'm using such a convoluted method: the list I have is of the form
[{'a': 3, 'b': 4}, {'a': 10, 'b': 4}, {'a': 5, 'b': 5}]
and I want to remove duplicates based on the value of a specific key (b in the case above), leaving [{'a': 3, 'b': 4}, {'a': 5, 'b': 5}] (I do not care which dict is removed). The idea is to build a sentinel list with the values of b and keep only the dicts whose b value is not already in that sentinel list.
Since x is not in seen, you are never adding it to seen either; the else branch is not executed when x not in seen is true.
However, you are using a conditional expression, and it always produces a value: either x or the result of seen.append() (which is None). So you are not filtering, you are mapping here.
If you wanted to filter, move the test to an if section after the for clause:
seen = set()
[x for x in a if not (x in seen or seen.add(x))]
Since you were using seen.append() I presume you were using a list; I switched you to a set() instead, as membership tests are way faster using a set.
So x is excluded only if x in seen is true (we have already seen it); otherwise seen.add(x) is executed, and since it returns None (which is not true), the or expression stays falsey and x is kept. Yes, this works, if only a little convoluted.
Demo:
>>> a = [1, 2, 3, 3, 2, 4]
>>> seen = set()
>>> [x for x in a if not (x in seen or seen.add(x))]
[1, 2, 3, 4]
>>> seen
set([1, 2, 3, 4])
Applying this to your specific problem:
>>> a = [{'a': 3, 'b': 4}, {'a': 10, 'b': 4}, {'a': 5, 'b': 5}]
>>> seen = set()
>>> [entry for entry in a if not (entry['b'] in seen or seen.add(entry['b']))]
[{'a': 3, 'b': 4}, {'a': 5, 'b': 5}]
You never execute the else part of the conditional expression, because you never add x to seen when it is encountered the first time. You could do this:
[seen.append(x) or x for x in lst if x not in seen]
This way the or executes the update using append (which always returns None, so the or continues looking for a truthy value) and then returns the last value, x.
Maybe you can use the fact that dict keys form a set for this. Use reversed if you want the first of the duplicates to win (as it does here); drop it to keep the last one:
>>> lst = [{'a': 3, 'b': 4}, {'a': 10, 'b': 4}, {'a': 5, 'b': 5}]
>>> filtered = {item['b']: item for item in reversed(lst)}
>>> filtered.values()
[{'a': 3, 'b': 4}, {'a': 5, 'b': 5}]
This uses 'b' as the key to map a value to, so only a single element can be mapped to each value of 'b', which effectively creates a set over 'b'.
Note: this will return the values in arbitrary order. To fix that nicely for big datasets, I'd create another mapping, of each object to its index in the original list (O(n)), and use that mapping as a sorting function for the final result (O(n*log(n))). That's beyond the scope of this answer.
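For completeness, a sketch of that order-restoring idea (my reading of the note above, not part of the original answer): record each item's position in the original list and sort the filtered values by it.
# id() is used because dicts aren't hashable, and the filtered
# values are the very same objects from lst
order = {id(item): i for i, item in enumerate(lst)}
result = sorted(filtered.values(), key=lambda item: order[id(item)])
# [{'a': 3, 'b': 4}, {'a': 5, 'b': 5}], in original list order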
I'm always queasy making use of operator precedence as execution flow control. I feel that the below is marginally more explicit and palatable, although it does carry the additional cost of tuple creation.
b_values = set()
[(item, b_values.add(item['b']))[0] for item in original_list
 if item['b'] not in b_values]
But really when you're maintaining/updating some sort of state, I think the best format is the simple for-loop:
output_list = []
b_values = set()
for item in original_list:
    if item['b'] not in b_values:
        output_list.append(item)
        b_values.add(item['b'])
I'm trying to wrap my brain around this but it's not flexible enough.
In my Python script I have a dictionary of dictionaries of lists. (Actually it gets a little deeper but that level is not involved in this question.) I want to flatten all this into one long list, throwing away all the dictionary keys.
Thus I want to transform
{1: {'a': [1, 2, 3], 'b': [0]},
2: {'c': [4, 5, 1], 'd': [3, 8]}}
to
[1, 2, 3, 0, 4, 5, 1, 3, 8]
I could probably set up a map-reduce to iterate over items of the outer dictionary to build a sublist from each subdictionary and then concatenate all the sublists together.
But that seems inefficient for large data sets, because of the intermediate data structures (sublists) that will get thrown away. Is there a way to do it in one pass?
Barring that, I would be happy to accept a two-level implementation that works... my map-reduce is rusty!
Update:
For those who are interested, below is the code I ended up using.
Note that although I asked above for a list as output, what I really needed was a sorted list; i.e. the output of the flattening could be any iterable that can be sorted.
def genSessions(d):
    """Given the ipDict, return an iterator that provides all the sessions,
    one by one, converted to tuples."""
    for uaDict in d.itervalues():
        for sessions in uaDict.itervalues():
            for session in sessions:
                yield tuple(session)
...
# Flatten dict of dicts of lists of sessions into a list of sessions.
# Sort that list by start time
sessionsByStartTime = sorted(genSessions(ipDict), key=operator.itemgetter(0))
# Then make another copy sorted by end time.
sessionsByEndTime = sorted(sessionsByStartTime, key=operator.itemgetter(1))
Thanks again to all who helped.
[Update: replaced nthGetter() with operator.itemgetter(), thanks to @intuited.]
I hope you realize that any order you see in a dict is accidental -- it's there only because, when shown on screen, some order has to be picked, but there's absolutely no guarantee.
Net of ordering issues among the various sublists getting catenated,
[x for d in thedict.itervalues()
   for alist in d.itervalues()
   for x in alist]
does what you want without any inefficiency nor intermediate lists.
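The same flattening can also be spelled with itertools, if you prefer (an equivalent sketch, not from the original answer):
from itertools import chain

# Outer chain flattens the lists; inner chain flattens
# the per-sub-dict value iterators
flat = list(chain.from_iterable(chain.from_iterable(
    d.itervalues() for d in thedict.itervalues())))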
edit: re-read the original question and reworked answer to assume that all non-dictionaries are lists to be flattened.
In cases where you're not sure how far down the dictionaries go, you would want to use a recursive function. @Arrieta has already posted a function that recursively builds a list of non-dictionary values.
This one is a generator that yields successive non-dictionary values in the dictionary tree:
def flatten(d):
    """Recursively flatten dictionary values in `d`.

    >>> hat = {'cat': ['images/cat-in-the-hat.png'],
    ...        'fish': {'colours': {'red': [0xFF0000], 'blue': [0x0000FF]},
    ...                 'numbers': {'one': [1], 'two': [2]}},
    ...        'food': {'eggs': {'green': [0x00FF00]},
    ...                 'ham': ['lean', 'medium', 'fat']}}
    >>> set_of_values = set(flatten(hat))
    >>> sorted(set_of_values)
    [1, 2, 255, 65280, 16711680, 'fat', 'images/cat-in-the-hat.png', 'lean', 'medium']
    """
    try:
        for v in d.itervalues():
            for nested_v in flatten(v):
                yield nested_v
    except AttributeError:
        for list_v in d:
            yield list_v
The doctest passes the resulting iterator to the set function. This is likely to be what you want, since, as Mr. Martelli points out, there's no intrinsic order to the values of a dictionary, and therefore no reason to keep track of the order in which they were found.
You may want to keep track of the number of occurrences of each value; this information will be lost if you pass the iterator to set. If you want to track that, just pass the result of flatten(hat) to some other function instead of set. Under Python 2.7, that other function could be collections.Counter. For compatibility with less-evolved pythons, you can write your own function or (with some loss of efficiency) combine sorted with itertools.groupby.
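A sketch of those counting variants (assuming the hat dict from the doctest above):
# Python 2.7+:
from collections import Counter
counts = Counter(flatten(hat))

# Less-evolved pythons: sorted + groupby gives the same information
from itertools import groupby
counts = dict((k, len(list(g))) for k, g in groupby(sorted(flatten(hat))))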
A recursive function may work:
def flat(d, out=[]):
    for val in d.values():
        if isinstance(val, dict):
            flat(val, out)
        else:
            out += val
If you try it with:
>>> d = {1: {'a': [1, 2, 3], 'b': [0]}, 2: {'c': [4, 5, 6], 'd': [3, 8]}}
>>> out = []
>>> flat(d, out)
>>> print out
[1, 2, 3, 0, 4, 5, 6, 3, 8]
Notice that dictionaries have no order, so the list is in random order.
You can also return out (at the end of the function) so you don't have to call it with a list argument. Note the None default below, which avoids the classic mutable-default pitfall when the function is called more than once:
def flat(d, out=None):
    # Use a None default so one list isn't shared across calls
    if out is None:
        out = []
    for val in d.values():
        if isinstance(val, dict):
            flat(val, out)
        else:
            out += val
    return out
call as:
my_list = flat(d)