In Python I currently have a dictionary with composite keys. In this dictionary there are multiple occurrences of these keys. (The parts of each key are comma-separated):
(A,B), (A,C), (A,B), (A,D), (C,A), (A,B), (C,A), (C,B), (C,B)
I already have something that totals the unique occurrences and counts the duplicates which gives me a print-out similar to this:
(A,B) with a count of 4, (A,C) with a count of 2, (B,C) with a count of 6, etc.
I would like to know how to code a loop that would give me the following:
Print out the first occurrence of the first part of the key and its associated values and counts.
Name: A:
Type Count
B 4
C 2
Total 6
Name: B:
Type Count
A 3
B 2
C 3
Total 8
I know I need to create a loop where the first statement = the first statement and do the following, but have no real idea how to approach/code this.
Here's a slightly slow algorithm that'll get it done:
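import collections

def convert(myDict):
    keys = list(myDict.keys())
    answer = collections.defaultdict(dict)
    for key in keys:
        # keys are assumed to be tuples, so compare first elements (tuples have no startswith)
        for k in [k for k in keys if k[0] == key[0]]:
            answer[key[0]][k[1]] = myDict[k]
    return answer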
Ultimately, I think what you're after is a trie.
It's a little misleading to say that your dictionary has multiple values for a given key. Python doesn't allow that. Instead, what you have are keys that are tuples. You want to unpack those tuples and rebuild a nested dictionary.
Here's how I'd do it:
import collections

# rebuild data structure
nested = collections.defaultdict(dict)
for k, v in myDict.items():
    k1, k2 = k  # unpack key tuple
    nested[k1][k2] = v

# print out data in the desired format (with totals)
for k1, inner in nested.items():
    print("%s\tType\tCount" % k1)
    total = 0
    for k2, v in inner.items():
        print("\t%s\t%d" % (k2, v))
        total += v
    print("\tTotal\t%d" % total)
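For reference, here is a self-contained sketch of the same idea that starts from the raw list of pairs in the question rather than from an existing dictionary. The names pairs, group_counts and format_report are made up for illustration, and the report layout is only a guess at the format the question asks for.

```python
from collections import Counter, defaultdict

# Sample pairs taken from the question; duplicates are intentional.
pairs = [("A", "B"), ("A", "C"), ("A", "B"), ("A", "D"), ("C", "A"),
         ("A", "B"), ("C", "A"), ("C", "B"), ("C", "B")]

def group_counts(pairs):
    """Return {first: {second: count}} for an iterable of 2-tuples."""
    nested = defaultdict(dict)
    # Counter tallies each distinct pair; the loop regroups by first element.
    for (first, second), count in Counter(pairs).items():
        nested[first][second] = count
    return dict(nested)

def format_report(nested):
    """Build the 'Name / Type / Count / Total' report as a list of lines."""
    lines = []
    for name in sorted(nested):
        inner = nested[name]
        lines.append("Name: %s" % name)
        lines.append("Type\tCount")
        for kind in sorted(inner):
            lines.append("%s\t%d" % (kind, inner[kind]))
        lines.append("Total\t%d" % sum(inner.values()))
    return lines

grouped = group_counts(pairs)
print("\n".join(format_report(grouped)))
```

Sorting the names and types is a design choice to make the output stable; drop the sorted() calls if insertion order is preferred.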
I have a python dictionary with several keys.
Example:
dicOut = dict(list(zip(keys, values)))
for i in keys:
print(i)
out:
Trees
Cars
People
.... x n keys
I wish to assign a number in front.
Trees
Cars
People
.... x n keys
How do I write the for loop?
So far:
k = len(keys)
x = range(1, k + 1)
for j in x:
    for k in keys:
        n = j, '-', k
        print(n)
However, it prints all the keys once per number, e.g. 3 keys 3 times. How do I stop it at just the, e.g., 3 distinct keys?
for key in dict:
    print(key, '. ', dict[key])
for i, k in enumerate(dicOut, start=1):
    print(i, k)
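A minimal sketch of the enumerate approach with formatted output. The sample values and the "1 - Trees" layout are assumptions, since the question does not show the exact format it wants; it also relies on dictionaries keeping insertion order (Python 3.7+).

```python
# Hypothetical sample data standing in for the question's keys and values.
keys = ["Trees", "Cars", "People"]
values = [10, 20, 30]
dicOut = dict(zip(keys, values))

# enumerate pairs each key with a running number, so one loop is enough.
numbered = ["%d - %s" % (i, k) for i, k in enumerate(dicOut, start=1)]
for line in numbered:
    print(line)
```

The nested loop in the question prints every key once per number; enumerate avoids that by advancing the counter and the key together.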
I have a dictionary which contains 36 data items. I want to replicate each record 100 times, so the total number of records would be 3600.
def createDataReplication(text_list):
    data_item = {}
    print(len(text_list))
    for k, v in text_list.iteritems():
        for i in range(0, 100):
            data_item[k + str(i)] = v
    print(len(data_item))
output
36
3510
Why is it 3510 and not 3600? Am I making a mistake?
The concatenation k+str(i) produces the same string for some combinations of k and i. Dictionary keys must be unique, so the later assignment overwrites the existing key.
I suggest you use tuple keys instead which, in addition, aligns data structure with your logic:
for k, v in text_list.iteritems():
    for i in range(100):
        data_item[(k, i)] = v
Consider that a key like '110' could be created in two ways:
k+str(i) = '1' + str(10) or
k+str(i) = '11' + str(0).
You need to replace k+str(i) with something that is guaranteed to create unique key values. One way to do that is make the key a tuple: (k, i):
data_item[k,i] = v
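To see the collision in isolation, here is a small sketch with made-up input. The three string keys are an assumption (the real 36 keys are not shown), and it uses items() so that it runs on Python 3, where iteritems() no longer exists.

```python
# Hypothetical input: string keys, two of which consist of digits.
text_list = {"1": "a", "11": "b", "x": "c"}

concatenated = {}
tupled = {}
for k, v in text_list.items():
    for i in range(100):
        concatenated[k + str(i)] = v   # '1' + '10' and '11' + '0' both give '110'
        tupled[(k, i)] = v             # ('1', 10) and ('11', 0) stay distinct

print(len(concatenated))  # fewer than 300 because some strings collide
print(len(tupled))        # 300
```

If string keys are required, joining with a separator that cannot occur in k (for example k + "_" + str(i)) would also avoid the ambiguity, provided that assumption about k really holds.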
self.dl = ({'a':1, 'b': 2, 'c': 3}, {'c':13, 'd':14, 'e':15}, {'e':25, 'f':26, 'g':27})
I have this tuple of dictionaries and am trying to get the count of all the distinct keys. I am only able to do this so that all the keys are counted. The output here should be 7 but I am getting 9 because c and e are being counted twice.
I have this so far:
new = []
for d in self.dl:
    for k in d:
        new.append(k)
return len(new)
from operator import or_ as union
from functools import reduce
len(reduce(union, map(dict.keys, self.dl)))
The view returned by keys() already acts like a set. So if you take the union (or_) of all the key sets from your dicts, you get the set of all keys. The length of that set is the number of unique keys.
You could probably try something like this:
new = {}
for d in self.dl:
    for k in d:
        if k not in new:
            new[k] = 1
return len(new)
Sets are definitely the data structure you want here. There are a few different ways you can accumulate the set:
new = set(k for d in self.dl for k in d)
or
from itertools import chain
new = set(chain(*self.dl))
or
new = set()
for d in self.dl:
    new |= d.keys()  # union; &= (intersection) would always leave the set empty
In all cases, len(new) will be the number of unique keys.
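As a quick check, the sketch below runs three set-building variants on the data from the question, using a plain variable dl in place of self.dl. The third variant uses set().union(*dl), which is a substitute for the in-place loop rather than the exact code above.

```python
from itertools import chain

# The tuple of dictionaries from the question, as a plain variable.
dl = ({'a': 1, 'b': 2, 'c': 3},
      {'c': 13, 'd': 14, 'e': 15},
      {'e': 25, 'f': 26, 'g': 27})

# Iterating over a dict yields its keys, so all three collect keys only.
by_comprehension = set(k for d in dl for k in d)
by_chain = set(chain.from_iterable(dl))
by_union = set().union(*dl)

print(len(by_comprehension), len(by_chain), len(by_union))  # 7 7 7
```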
I have a dictionary like this :
d = {'v03':["elem_A","elem_B","elem_C"],'v02':["elem_A","elem_D","elem_C"],'v01':["elem_A","elem_E"]}
How would you return a new dictionary with the elements that are not contained in the list stored under the highest key?
In this case :
d2 = {'v02':['elem_D'],'v01':["elem_E"]}
Thank you,
I prefer to do differences with the builtin data type designed for it: sets.
It is also preferable to write loops rather than elaborate comprehensions. One-liners are clever, but readable code that you can return to and still understand is even better.
d = {'v03':["elem_A","elem_B","elem_C"],'v02':["elem_A","elem_D","elem_C"],'v01':["elem_A","elem_E"]}
last = None
d2 = {}
# walk the keys in sorted order, comparing each key's list with the next key's list
for key in sorted(d.keys()):
    if last:
        if set(d[last]) - set(d[key]):
            d2[last] = sorted(set(d[last]) - set(d[key]))
    last = key
print(d2)
{'v01': ['elem_E'], 'v02': ['elem_D']}
from collections import defaultdict
myNewDict = defaultdict(list)
all_keys = sorted(d.keys())  # works on Python 2 and 3; keys() has no sort() in Python 3
max_value = all_keys[-1]
for key in d:
    if key != max_value:
        for value in d[key]:
            if value not in d[max_value]:
                myNewDict[key].append(value)
You can get fancier with set operations by taking the set difference between the values in d[max_value] and those of each of the other keys, but first I think you should get comfortable working with dictionaries and lists.
defaultdict(<type 'list'>, {'v01': ['elem_E'], 'v02': ['elem_D']})
One reason not to use sets is that the solution does not generalize well, because sets can only hold hashable objects. If your values are lists of lists, the members (sublists) are not hashable, so you can't use a set operation.
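To illustrate the hashability point, here is a sketch that filters against the highest key using only list membership tests. The nested sample data and the function name not_in_highest are invented for the example.

```python
# Hypothetical data: the values are lists of lists, which cannot go into a set.
d = {
    'v03': [["a", 1], ["b", 2]],
    'v02': [["a", 1], ["d", 4]],
    'v01': [["a", 1], ["e", 5]],
}

def not_in_highest(d):
    """Keep, per key, the items absent from the list stored at the highest key."""
    highest = max(d)
    reference = d[highest]
    result = {}
    for key, items in d.items():
        if key == highest:
            continue
        # `in` on a list compares with ==, so unhashable items are fine.
        leftover = [item for item in items if item not in reference]
        if leftover:
            result[key] = leftover
    return result

d2 = not_in_highest(d)
print(d2)
```

The trade-off is speed: each membership test scans the reference list, which is fine for short lists but slower than a set lookup for long ones.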
Depending on your python version, you may be able to get this done with only one line, using dict comprehension:
>>> d2 = {k:[v for v in values if not v in d.get(max(d.keys()))] for k, values in d.items()}
>>> d2
{'v01': ['elem_E'], 'v02': ['elem_D'], 'v03': []}
This builds a copy of dict d in which each list has been stripped of all items stored at the max key. The resulting dict looks more or less like what you are going for.
If you don't want the empty list at key v03, filter the result with another dict comprehension:
>>> {k:v for k,v in d2.items() if len(v) > 0}
{'v01': ['elem_E'], 'v02': ['elem_D']}
EDIT:
If your original dict has a very large key set (or the operation is run frequently), you may also want to replace the expression d.get(max(d.keys())) with a previously assigned list variable for performance, since the expression is otherwise re-evaluated for every item tested. This roughly doubles the speed: the following runs 100,000 times in 1.5 seconds on my machine, whereas the unsubstituted expression takes more than 3 seconds.
>>> bl = d.get(max(d.keys()))
>>> d2 = {k:v for k,v in {k:[v for v in values if not v in bl] for k, values in d.items()}.items() if len(v) > 0}
How can I take a list of values (percentages):
example = [(1,100), (1,50), (2,50), (1,100), (3,100), (2,50), (3,50)]
and return a dictionary:
example_dict = {1:250, 2:100, 3:150}
and recalculate by dividing by sum(example_dict.values())/100:
final_dict = {1:50, 2:20, 3:30}
The methods I have tried for mapping the list of values to a dictionary results in values being iterated over rather than summed.
Edit:
Since it was asked, here are some attempts (after just writing over old values) that went nowhere and demonstrate my 'noviceness' with Python:
{k: +=v if k==w[x][0] for x in range(0,len(w),1)}
invalid
for i in w[x][0] in range(0,len(w),1):
for item in r:
+=v (don't know where I was going on that one)
invalid again.
another similar one that was invalid, nothing on google, then to SO.
You could try something like this:
total = float(sum(v for k, v in example))
example_dict = {}
for k, v in example:
    example_dict[k] = example_dict.get(k, 0) + v * 100 / total
See it working online: ideone
Use the Counter class:
from collections import Counter
totals = Counter()
for k, v in example:
    totals.update({k: v})
total = sum(totals.values())
final_dict = {k: 100 * v // total for k, v in totals.items()}
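For comparison, a sketch of the same two steps (sum per key, then rescale to percentages) using a defaultdict. The names sums and grand_total are illustrative, and the result holds floats rather than the integers produced by the // version above.

```python
from collections import defaultdict

example = [(1, 100), (1, 50), (2, 50), (1, 100), (3, 100), (2, 50), (3, 50)]

# Step 1: add up the values that share a key.
sums = defaultdict(int)
for key, value in example:
    sums[key] += value

# Step 2: rescale so that the values add up to 100.
grand_total = sum(sums.values())
final_dict = {key: 100.0 * value / grand_total for key, value in sums.items()}

print(dict(sums))   # {1: 250, 2: 100, 3: 150}
print(final_dict)   # {1: 50.0, 2: 20.0, 3: 30.0}
```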