In Python, count unique key/value pairs in a dictionary

I have a dictionary that is made with a list of values. Some of these values are also keys or values in other key/value pairs in the dictionary. I would simply like to count how many of these unique pairs there are in the dictionary.
Ex.
dict = {'dog':['milo','otis','laurel','hardy'],'cat':['bob','joe'],'milo':['otis','laurel','hardy','dog'],'bob':['cat','joe'],'hardy':['dog']}
I need to count the number of key/value pairs that do not share a key or value with any other pair in the dict. For example, the above should count to only 2: the groups connected to dog and cat. Even though milo is unique to dog, dog is also in the key/value pair 'hardy', so both of these should be counted together (i.e., only 1). (See comments below)
I have tried replacing a key (key A) that appears in the values of another key (key B) with 'key B', but without success, as I cannot specify key B correctly.
for keys, values in dict.iteritems():
    for key, value in dict.iteritems():
        if key in values:
            dict[keys] = dict.pop(key)
Is there an easier method?
Thanks in advance...

If I understand the problem correctly, your dictionary is the adjacency map of a graph and you're trying to find the sets of connected components. The regular algorithm (using a depth- or breadth-first search) may not work correctly since your graph is not undirected (e.g. you have edges from "bob" and "cat" to "joe", but none coming out from "joe").
Instead, I suggest using a disjoint set data structure. It's not hard to build one using a dictionary to handle the mapping of values to parents. Here's an implementation I wrote for a previous question:
class DisjointSet:
    def __init__(self):
        self.parent = {}
        self.rank = {}
    def find(self, element):
        if element not in self.parent: # leader elements are not in `parent` dict
            return element
        leader = self.find(self.parent[element]) # search recursively
        self.parent[element] = leader # compress path by saving leader as parent
        return leader
    def union(self, leader1, leader2):
        rank1 = self.rank.get(leader1,0)
        rank2 = self.rank.get(leader2,0)
        if rank1 > rank2: # union by rank
            self.parent[leader2] = leader1
        elif rank2 > rank1:
            self.parent[leader1] = leader2
        else: # ranks are equal
            self.parent[leader2] = leader1 # favor leader1 arbitrarily
            self.rank[leader1] = rank1+1 # increment rank
And here's how you could use it to solve your problem:
djs = DisjointSet()
all_values = set()
for key, values in my_dict.items():
    all_values.add(key)
    all_values.update(values)
    for val in values:
        l1 = djs.find(key)
        l2 = djs.find(val)
        if l1 != l2:
            djs.union(l1, l2)
roots = {djs.find(x) for x in all_values}
print("The number of disjoint sets is:", len(roots))
The first part of this code does two things. First it builds a set with all the unique nodes found anywhere in the graph. Secondly, it combines the nodes into disjoint sets by doing a union wherever there's an edge.
The second step is to build up a set of "root" elements from the disjoint set.
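For readers who want to try the idea on the question's data, here is a minimal, self-contained sketch of the same union-find approach. It is not the class above: the function name count_components and the variable pets are made up for illustration, the find step is iterative rather than recursive, and union by rank is left out for brevity.

```python
def count_components(adjacency):
    """Count groups of names linked, directly or indirectly, by any key/value edge."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        root = node
        while parent[root] != root:
            root = parent[root]
        # Compress the path so later lookups are shorter.
        while parent[node] != root:
            parent[node], node = root, parent[node]
        return root

    for key, values in adjacency.items():
        root = find(key)
        for value in values:
            # Treat every key -> value edge as undirected by merging both sides.
            parent[find(value)] = root

    return len({find(node) for node in parent})


pets = {
    'dog': ['milo', 'otis', 'laurel', 'hardy'],
    'cat': ['bob', 'joe'],
    'milo': ['otis', 'laurel', 'hardy', 'dog'],
    'bob': ['cat', 'joe'],
    'hardy': ['dog'],
}
print(count_components(pets))  # 2
```

Because each edge is merged regardless of direction, a name such as 'joe' that only ever appears as a value still ends up in the right group.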

Here is one possible solution:
values = {'dog':['milo','otis','laurel','hardy'],
          'cat':['bob','joe'],
          'milo':['otis','laurel','hardy','dog'],
          'bob':['cat','joe'],
          'hardy':['dog']}
result = []
for x in values.items():
    # the key together with all of its values
    y = set([x[0]] + x[1])
    # keep y only if it shares nothing with a group already found
    if not any(z.intersection(y) for z in result):
        result.append(y)
print(len(result))
Note that you shouldn't call a variable dict because you're shadowing the built-in type dict.
Your goal is unclear, but you can modify the construction of the y set to meet your needs.

If I understand your question correctly, you are trying to describe a graph-like structure, and you're looking at whether the keys appear in a value list. Since you are only interested in count, you don't have to worry about future value lists, when iterating through the dict, so this should work:
d = {'dog': ['milo','otis','laurel','hardy'], 'cat': ['bob','joe'], 'milo': ['otis','laurel','hardy','dog'], 'bob': ['cat','joe'], 'hardy': ['dog']}
seen = set()
unique = []
for key, values in d.items():
    if key not in seen:
        unique.append(key)
    seen = seen.union(values)
print(len(unique))
Note that the actual values contained in unique depend on dict ordering, and are only keys, not values. If you are actually trying to do some sort of network or graph analysis, I suggest you make use of a library such as networkx.

Related

Compare list with dictionary (that contains wildcards), return values

I have a list that contains several strings and a dictionary with strings (that contain wildcards) as keys and integers as values.
For example like this:
list1 = ['i', 'like', 'tomatoes']
dict1 = {'tomato*':'3', 'shirt*':'7', 'snowboard*':'1'}
I would like to go through list1 and see if there is a key in dict1 that (with the wildcard) matches the string from list1 and get the respective value from dict1. So in this case 3 for 'tomato*'.
Is there a way to iterate over list1, see if one of the dict1 keys (with wildcards) matches with this particular string and return the value from dict1?
I know I could iterate over dict1 and compare the keys with the elements in list1 this way. But in my case, the dict is very large and in addition, I have a lot of lists to go through. So it would take too much time to loop through the dictionary every time.
I thought about turning the keys into a list as well and get wildcard matches with a list comprehension and fnmatch(), but the returned match wouldn't be able to find the value in the dict (because of the wildcard).

Here is a data structure built on the Python standard library that may help you.
from collections import defaultdict
class Trie(defaultdict):
    def __init__(self, value=None):
        super().__init__(lambda: Trie(value)) # Trie is essentially hash-table within hash-table
        self.__value = value
    def __getitem__(self, key):
        node = self
        if len(key) > 1: # allows you to access the trie like this trie["abc"] instead of trie["a"]["b"]["c"]
            for char in key:
                node = node[char]
            return node
        else: # actual getitem routine
            return defaultdict.__getitem__(self, key)
    def __setitem__(self, key, value):
        node = self
        if len(key) > 1: # allows you to access the trie like this trie["abc"] instead of trie["a"]["b"]["c"]
            for char in key[:-1]:
                node = node[char]
            node[key[-1]] = value
        else: # actual setitem routine
            if type(value) is int:
                value = Trie(int(value))
            defaultdict.__setitem__(self, key, value)
    def __str__(self):
        return str(self.__value)
d = Trie()
d["ab"] = 3
print(d["abcde"])
3
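If every wildcard key ends in a single trailing '*' and contains no other wildcard (an assumption; the question does not say so explicitly), a plain dictionary of prefixes may be enough. The sketch below strips the '*' once, then checks each prefix of a word against that dictionary, so the cost per word depends on the word's length rather than on the size of the dictionary. The names build_prefix_table and lookup are made up for illustration.

```python
def build_prefix_table(patterns):
    """Map each pattern's literal prefix (trailing '*' removed) to its value."""
    return {pattern.rstrip('*'): value for pattern, value in patterns.items()}


def lookup(word, prefix_table):
    """Return the value for the longest prefix of `word` in the table, or None."""
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in prefix_table:
            return prefix_table[prefix]
    return None


list1 = ['i', 'like', 'tomatoes']
dict1 = {'tomato*': '3', 'shirt*': '7', 'snowboard*': '1'}

table = build_prefix_table(dict1)  # built once, reused for every list
matches = {word: lookup(word, table) for word in list1}
print(matches)  # {'i': None, 'like': None, 'tomatoes': '3'}
```

Note that a key without any '*' would also be treated as a prefix here, which may or may not be what you want.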

Get key by more than one value in dictionary?

Maybe the dict is not intended to be used in this way, but I need to add more than one value to the same key. My intention is to use a kind of transitive property. If my dict is A:B and B:C, then I want to have the dict A:[B,C].
Let's make an example in order to explain better what I'd like to do:
numDict={'60':['4869'], '4869':['629'], '13':['2']}
I want it to return:
{'60':['4869','629'], '13':['2']}
For just two elements, it is possible to use something like this:
result={}
for key in numDict.keys():
    if [key] in numDict.values():
        result[list(numDict.keys())[list(numDict.values()).index([key])]]=[key]+numDict[key]
But what if I have more elements? For example:
numDict={'60':['4869'], '4869':['629'], '13':['2'], '629':['427']}
What can I do to get {'60':['4869','629','427'], '13':['2']} returned?

def unchain(d):
    #assemble a collection of keys that are not also values. These will be the keys of the final dict.
    top_level_keys = set(d.keys()) - set(d.values())
    result = {}
    for k in top_level_keys:
        chain = []
        #follow the reference chain as far as necessary.
        value = d[k]
        while True:
            if value in chain: raise Exception("Referential loop detected: {} encountered twice".format(value))
            chain.append(value)
            if value not in d: break
            value = d[value]
        result[k] = chain
    return result
numDict={'60':'4869', '4869':'629', '13':'2', '629':'427'}
print(unchain(numDict))
Result:
{'60': ['4869', '629', '427'], '13': ['2']}
You might notice that I changed the layout of numDict since it's easier to process if the values aren't one-element lists. But if you're dead set on keeping it that way, you can just add d = {k:v[0] for k,v in d.items()} to the top of unchain, to convert from one to the other.
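As a rough sketch of that conversion, the variant below accepts the original layout with one-element lists and flattens it first. The name unchain_lists is hypothetical, and the code assumes every value list holds exactly one item.

```python
def unchain_lists(d):
    """Follow key -> [value] chains, assuming each list has exactly one element."""
    flat = {k: v[0] for k, v in d.items()}      # {'60': '4869', ...}
    top_level_keys = set(flat) - set(flat.values())
    result = {}
    for k in top_level_keys:
        chain = []
        value = flat[k]
        while True:
            if value in chain:
                raise ValueError("Referential loop detected: {} encountered twice".format(value))
            chain.append(value)
            if value not in flat:
                break
            value = flat[value]
        result[k] = chain
    return result


num_dict = {'60': ['4869'], '4869': ['629'], '13': ['2'], '629': ['427']}
print(sorted(unchain_lists(num_dict).items()))
# [('13', ['2']), ('60', ['4869', '629', '427'])]
```

The result is sorted before printing only because the top-level keys come from a set, whose iteration order is not guaranteed.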

You can build your own structure, consisting of a reverse mapping of (values, key), and a dictionary of (key, [values]). Adding a key, value pair consists of following a chain of existing entries via the reverse mapping, until it finds the correct location; in case it does not exist, it introduces a new key entry:
class Groupir:
    def __init__(self):
        self.mapping = {}
        self.reverse_mapping = {}
    def add_key_value(self, k, v):
        self.reverse_mapping[v] = k
        val = v
        key = k
        while True:
            try:
                self.reverse_mapping[val]
                key = val
                val = self.reverse_mapping[val]
            except KeyError:
                try:
                    self.mapping[val].append(v)
                except KeyError:
                    self.mapping[val] = [v]
                break
With this test client:
groupir = Groupir()
groupir.add_key_value(60, 4869)
print(groupir.mapping)
groupir.add_key_value(4869, 629)
print(groupir.mapping)
groupir.add_key_value(13, 2)
print(groupir.mapping)
groupir.add_key_value(629, 427)
print(groupir.mapping)
outputs:
{60: [4869]}
{60: [4869, 629]}
{60: [4869, 629], 13: [2]}
{60: [4869, 629, 427], 13: [2]}
Restrictions:
Cycles as mentioned in comments.
Non unique keys
Non unique values
Probably some corner cases to take care of.

I have written some code for it. See if it helps.
What I have done is keep diving in as far as I can (following each value to the next key) and mark the keys reached as visited, since they will no longer be required. At the end I filter out the root keys.
numDict={'60':['4869'], '4869':['629'], '13':['2'], '629':['427']}
l = list(numDict) # list of keys
l1 = {i:-1 for i in numDict} # to track visited keys (initialized to -1 initially)
for i in numDict:
    # if key is root and diving in is possible
    if l1[i] == -1 and numDict[i][0] in l:
        t = numDict[i][0]
        while(t in l): # dive deeper and deeper
            numDict[i].extend(numDict[t]) # update the value of key
            l1[t] = 1 # mark as visited
            t = numDict[t][0]
# filter the root keys
answer = {i:numDict[i] for i in numDict if l1[i] == -1}
print(answer)
Output:
{'60': ['4869', '629', '427'], '13': ['2']}

Changing only one value of a key in a dictionary that has multiple values

I have a key with multiple values assigned. I would like to ask the user for input (new_population), and replace the old value (current_population) within the key. I would like the other value within the key (num) to remain unaffected.
current_population = 5
num = 3
dict = {}
dict["key"] = [current_population, num]
new_population = input("What is the new population?")
Say (for example sake), the value of new_population is 10. My goal is a final output of:
{'key': [10, 3]}
How would I go about doing this?

To be clear, what you actually have is a dictionary where each element is a list. It is not possible to have multiple elements with the same key, since that would create undefined behaviour (which element would be returned by a lookup?). If you do
current_population = 5
num = 3
mydict = {}
mydict["key"] = [current_population, num]
elem = mydict["key"]
print(elem)
you will see that elem is actually the list [5,3]. So to get or set either value, you need to index into the list you get from indexing into the dictionary.
mydict["key"][0] = new_population
(like in the accepted answer).
If you don't want to keep track of which index is population and which is num, you could make a dictionary of dictionaries instead:
mydict = {}
mydict["key"] = {"pop": current_population, "num": num}
mydict["key"]["pop"] = new_population
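One detail worth keeping in mind: in Python 3, input() returns a string, so the new population probably needs converting before it is stored. The following is a small sketch of the list-based update with that conversion. The helper name set_population is made up, and the string "10" stands in for whatever the user types.

```python
def set_population(record, raw_text):
    """Replace the population (index 0) in place, leaving the other value alone."""
    record[0] = int(raw_text)  # input() yields a str in Python 3, so convert it
    return record


mydict = {"key": [5, 3]}
set_population(mydict["key"], "10")  # "10" stands in for the user's input
print(mydict)  # {'key': [10, 3]}
```

Because the list stored in the dictionary is modified in place, there is no need to assign anything back to mydict["key"].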

dict["key"][0] = new_population

Finding if there are distinct elements in a python dictionary

I have a python dictionary containing n key-value pairs, out of which n-1 values are identical and 1 is not. I need to find the key of the distinct element.
For example: consider a python list [{a:1},{b:1},{c:2},{d:1}]. I need to get 'c' as the output.
I can use a for loop to compare consecutive elements and then use two more for loops to compare those elements with the other elements. But is there a more efficient way to go about it or perhaps a built-in function which I am unaware of?

If you have a dictionary you can quickly check and find the first value which is different from the next two values cycling around the keys of your dictionary.
Here's an example:
def find_different(d):
    k = d.keys()
    for i in xrange(0, len(k)):
        if d[k[i]] != d[k[(i+1)%len(k)]] and d[k[i]] != d[k[(i+2)%len(k)]]:
            return k[i]
>>> mydict = {'a':1, 'b':1, 'c':2, 'd':1}
>>> find_different(mydict)
'c'
Otherwise, if what you have is a list of single-key dictionaries, then you can do it quite nicely mapping your list with a function which "extracts" the values from your elements, then check each one using the same logic.
Here's another working example:
def find_different(l):
    mask = map(lambda x: x[x.keys()[0]], l)
    for i in xrange(0, len(l)):
        if mask[i] != mask[(i+1)%len(l)] and mask[i] != mask[(i+2)%len(l)]:
            return l[i].keys()[0]
>>> mylist = [{'a':1},{'b':1},{'c':2},{'d':1}]
>>> find_different(mylist)
'c'
NOTE: these solutions do not work in Python 3 as the map function doesn't return a list and neither does the .keys() method of dictionaries.
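For Python 3, one possible alternative (a different technique from the neighbour comparison above) is to count how often each value occurs with collections.Counter and return the key whose value occurs once. This is only a sketch: the function names are made up, and it assumes the values are hashable.

```python
from collections import Counter


def find_odd_key(d):
    """Return the first key whose value occurs exactly once, or None."""
    counts = Counter(d.values())
    for key, value in d.items():
        if counts[value] == 1:
            return key
    return None


def find_odd_key_in_list(dicts):
    """Same idea for a list of single-key dictionaries."""
    merged = {key: value for single in dicts for key, value in single.items()}
    return find_odd_key(merged)


print(find_odd_key({'a': 1, 'b': 1, 'c': 2, 'd': 1}))                  # c
print(find_odd_key_in_list([{'a': 1}, {'b': 1}, {'c': 2}, {'d': 1}]))  # c
```

This makes two passes over the data, one to count and one to find the key, and does not depend on the order of the keys.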

Assuming that your "list of pairs" (actually list of dictionaries, sigh) cannot be changed:
from collections import defaultdict
def get_pair(d):
    return (d.keys()[0], d.values()[0])
def extract_unique(l):
    d = defaultdict(list)
    for key, value in map(get_pair, l):
        d[value].append(key)
    return filter(lambda (v,l): len(l) == 1, d.items())[0][1]

If you already have your dictionary, then you make a list of all of the keys: key_list = yourDic.keys(). Using that list, you can then loop through your dictionary. This is easier if you know one of the values, but below I assume that you do not.
yourDic = {'a':1, 'b':4, 'c':1, 'd':1, }
key_list = yourDic.keys()
previous_value = yourDic[key_list[0]] # Making it so loop gets past first test
count = 0
for key in key_list:
    test_value = yourDic[key]
    if (test_value != previous_value) and count == 1: # Checks first key
        print key_list[count - 1]
        break
    elif (test_value != previous_value):
        print key
        break
    else:
        previous_value = test_value
    count += 1
So, once you find the value that is different, it will print the key. If you want it to print the value, too, you just need a print test_value statement.

Python dictionary is not staying in order

I created a dictionary of the alphabet with a value starting at 0, and is increased by a certain amount depending on the word file. I hard coded the initial dictionary and I wanted it to stay in alphabetical order but it does not at all. I want it to return the dictionary in alphabetical order, basically staying the same as the initial dictionary.
How can I keep it in order?
from wordData import*
def letterFreq(words):
    totalLetters = 0
    letterDict = {'a':0,'b':0,'c':0,'d':0,'e':0,'f':0,'g':0,'h':0,'i':0,'j':0,'k':0,'l':0,'m':0,'n':0,'o':0,'p':0,'q':0,
                  'r':0,'s':0,'t':0,'u':0,'v':0,'w':0,'x':0,'y':0,'z':0}
    for word in words:
        totalLetters += totalOccurences(word,words)*len(word)
        for char in range(0,len(word)):
            for letter in letterDict:
                if letter == word[char]:
                    for year in words[word]:
                        letterDict[letter] += year.count
    for letters in letterDict:
        letterDict[letters] = float(letterDict[letters] / totalLetters)
    print(letterDict)
    return letterDict
def main():
    filename = input("Enter filename: ")
    words = readWordFile(filename)
    letterFreq(words)
if __name__ == '__main__':
    main()

Update for Python 3.7+:
Dictionaries now officially maintain insertion order for Python 3.7 and above.
Update for Python 3.6:
Dictionaries maintain insertion order in Python 3.6, however, this is considered an implementation detail and should not be relied upon.
Original answer - up to and including Python 3.5:
Dictionaries are not ordered and don't keep any order for you.
You could use an ordered dictionary, which maintains insertion order:
from collections import OrderedDict
letterDict = OrderedDict([('a', 0), ('b', 0), ('c', 0)])
Or you could just return a sorted list of your dictionary contents
letterDict = {'a':0,'b':0,'c':0}
sortedList = sorted(letterDict.items())
print(sortedList) # [('a', 0), ('b', 0), ('c', 0)]
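On Python 3.7 and later, the insertion-order guarantee means an ordinary dict can cover both cases. The sketch below first builds the counter in alphabetical order so it stays that way, then shows how a dict built in some other order can be re-created in key order. The variable names and the sample word are made up for illustration.

```python
from string import ascii_lowercase

# Build the counter in alphabetical order; Python 3.7+ keeps this insertion order.
letter_counts = dict.fromkeys(ascii_lowercase, 0)
for char in "banana":
    letter_counts[char] += 1

print(list(letter_counts)[:3])  # ['a', 'b', 'c']

# A dict built in some other order can be re-created in key order when needed.
unordered = {'c': 2, 'a': 0, 'b': 1}
by_key = dict(sorted(unordered.items()))
print(by_key)  # {'a': 0, 'b': 1, 'c': 2}
```

Updating the value of an existing key does not move it, which is why the counts can change while the alphabetical order stays put.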

You're only needing the keys in order once, so:
# create letterDict as in your question
keys = list(letterDict)
keys.sort()
for key in keys:
    # do whatever with letterDict[key]
    pass
If you needed them in order more than once, you could use the standard library's collections.OrderedDict. Sometimes that's all you need. It preserves dictionary key order by order of addition.
If you truly need an ordered-by-keys dictionary type, and you don't need it just once (where list_.sort() is better), you could try one of these:
http://stromberg.dnsalias.org/~dstromberg/datastructures/
With regard to the above link, if your keys are getting added in an already-sorted order, you're probably best off with a treap or red-black tree (a treap is better on average, but red-black trees have a lower standard deviation). If your keys are (always) getting added in a randomized order, then the simple binary tree is better.
BTW, current fashion seems to favor sorted(list_) over list_.sort(), but sorted(list_) is a relatively recent addition to the language that we got along fine without before it was added, and it's a little slower. Also, list_.sort() doesn't give rise to one-liner-abuse the way sorted(list_) does.
Oh, and vanilla dictionaries are unordered - that's why they're fast for accessing arbitrary elements (they're built on a hash table). Some of the types at the datastructures URL I gave above are good at dict_.find_min() and dict_.find_max() and obviate keys.sort(), but they're slower (O(log n)) at accessing arbitrary elements.

You can sort your dictionary's keys and iterate over your dict.
>>> for key in sorted(letterDict.keys()):
...     print('{}: {}'.format(key, letterDict.get(key)))
...
a: 0
b: 0
c: 0
d: 0
e: 0
...
OR
This is another possible solution in your case. We can keep all of your dictionary's keys in a list whose sequence doesn't change, and then get the values from your dictionary in that order.
>>> import string
>>> keys = list(string.ascii_lowercase)
>>> letterDict = {'a':0,'b':0,'c':0,'d':0,'e':0,'f':0,'g':0,'h':0,'i':0,'j':0,'k':0,'l':0,'m':0,'n':0,'o':0,'p':0,'q':0,
... 'r':0,'s':0,'t':0,'u':0,'v':0,'w':0,'x':0,'y':0,'z':0}
>>> for key in keys:
...     if key in letterDict:
...         print('{}: {}'.format(key, letterDict.get(key)))
...
a: 0
b: 0
c: 0
d: 0
e: 0
f: 0
g: 0
h: 0
i: 0
j: 0
k: 0
l: 0
m: 0
....

I wouldn't implement it that way. It's pretty hard to read. Something more like this:
# Make sure that division always gives you a float
from __future__ import division
from collections import defaultdict, OrderedDict
from string import ascii_lowercase
...
letterDict = defaultdict(int)
...
# Replace the for char in range(0,len(word)): loop with this
# Shorter, easier to understand, should be equivalent
for year in words[word]:
    for char in word:
        letterDict[char] += year.count
...
# Filter out any non-letters at this point
# Note that this is the OrderedDict constructor given a generator that creates tuples
# Already in order since ascii_lowercase is
letterRatio = OrderedDict((letter, letterDict[letter] / totalLetters) for letter in ascii_lowercase)
print(letterRatio)
return letterRatio
...
Now that you're returning an OrderedDict, the order will be preserved. I do caution you, though. If you really need it to be in order at some point, I would just sort it when you need it in the right order. Don't depend on functions that compute new data to return things in a specific sort order. Sort it when you need it sorted, and not before.
