How to use defaultdict in dict.fromkeys? - python

I want to build a histogram of a property value (depth, here) for 3 different samples, using a single dictionary.
SamplesList = ('Sa','Sb','Sc')
from collections import defaultdict
DepthCnt = dict.fromkeys(SamplesList, defaultdict(int))
This makes DepthCnt contain the same defaultdict(int) object three times, so I cannot count the samples separately.
How can I do this correctly?
It is OK to use either DepthCnt[sample][depth] or DepthCnt[depth][sample].
I tested these 3 ways:
from collections import defaultdict
DepthCnt = {key:defaultdict(int) for key in SamplesList}
yDepthCnt = defaultdict(lambda: defaultdict(int))
from collections import Counter
cDepthCnt = {key:Counter() for key in SamplesList}
The memory sizes are:
DepthCnt[sample][depth]: 993487
yDepthCnt[depth][sample]: 1953307
cDepthCnt[sample][depth]: 994207
Switching to Counter() looks like a reasonable choice.

Use a dictionary expression/comprehension/display
DepthCnt = {key:defaultdict(int) for key in SamplesList}
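To see why the comprehension works where dict.fromkeys does not, here is a minimal sketch. The sample names come from the question; the depth value 10 is only illustrative.

```python
from collections import defaultdict

SamplesList = ('Sa', 'Sb', 'Sc')

# dict.fromkeys evaluates defaultdict(int) once and reuses that single object.
shared = dict.fromkeys(SamplesList, defaultdict(int))
shared['Sa'][10] += 1
print(shared['Sb'][10])              # 1: 'Sb' sees the count recorded for 'Sa'
print(shared['Sa'] is shared['Sb'])  # True

# The comprehension calls defaultdict(int) once per key.
DepthCnt = {key: defaultdict(int) for key in SamplesList}
DepthCnt['Sa'][10] += 1
print(DepthCnt['Sb'][10])            # 0: each sample has its own counter
```

The second argument of dict.fromkeys is a value, not a factory, which is why every key ends up pointing at the same object.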

It sounds like you're trying to count occurrences of the samples in SamplesList. If so, you're looking for collections.Counter.
Given:
SamplesList = ('Sa','Sb','Sc')
Counter:
from collections import Counter
DepthCnt = Counter(SamplesList)
print(DepthCnt)
#Counter({'Sc': 1, 'Sa': 1, 'Sb': 1})
Edit:
You can always use a Counter instead of a defaultdict as well:
DepthCnt = {key:Counter() for key in SamplesList}
print(DepthCnt)
#{'Sa': Counter(), 'Sb': Counter(), 'Sc': Counter()}
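As a rough sketch of how such a per-sample structure might be filled: this assumes the input arrives as (sample, depth) pairs, and the records list is invented for illustration.

```python
from collections import Counter

SamplesList = ('Sa', 'Sb', 'Sc')
DepthCnt = {key: Counter() for key in SamplesList}

# Hypothetical input: one (sample, depth) pair per observation.
records = [('Sa', 10), ('Sa', 10), ('Sb', 12), ('Sc', 10), ('Sa', 11)]

for sample, depth in records:
    DepthCnt[sample][depth] += 1

print(DepthCnt['Sa'])      # Counter({10: 2, 11: 1})
print(DepthCnt['Sb'][10])  # 0, and no key is added to the 'Sb' counter
```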
P.S.
If you're working with a large dataset, take a closer look at the Counter class. Counter and defaultdict are similar; below is the TL;DR from this great answer to a question on collections.Counter vs defaultdict(int):
Counter supports most of the operations you can do on a multiset. So,
if you want to use those operations then go for Counter.
Counter won't add new keys to the dict when you query for missing
keys. So, if your queries include keys that may not be present in the
dict then better use Counter.
Counter also has a method called most_common that allows you to sort items by their count. To get the same thing in defaultdict you'll have to use sorted.
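The last two points can be checked with a short sketch. The words list is made-up data, not from the original answer.

```python
from collections import Counter, defaultdict

words = ['a', 'b', 'a', 'c', 'a', 'b']  # illustrative data

c = Counter(words)
d = defaultdict(int)
for w in words:
    d[w] += 1

# Querying a missing key: Counter returns 0 without storing it,
# defaultdict stores the default as a side effect.
c['z']
d['z']
print('z' in c)  # False
print('z' in d)  # True

# most_common sorts by count; with defaultdict, sorted() does the same job.
print(c.most_common(2))  # [('a', 3), ('b', 2)]
top_two = sorted(d.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(top_two)           # [('a', 3), ('b', 2)]
```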

Related

how do you append a list to a dictionary value in Python?

So if you have some dictionary like this
dictionary={}
dictionary['a']=1
dictionary['a']=2
print(dictionary)
This prints {'a': 2}; the 1 is replaced.
Is there any way I can add 2 to the key 'a' as part of a list?
I know I can do something like this:
dictionary['a']=[1,2]
but I don't want to do it like this.
Essentially, what I am asking is how I can add the new value to my key using a list instead of replacing the previous value.
Appreciate the help!
dictionary = {}
dictionary['a'] = []
dictionary['a'].append(1)
dictionary['a'].append(2)
print(dictionary)
It would be worth considering using a defaultdict if every value in the dict is/will be a list:
from collections import defaultdict
d = defaultdict(list)
d['a'].append(1)
d['a'].append(2)
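If the import is not wanted, dict.setdefault gives the same append-to-a-list behaviour on a plain dict. A minimal sketch; the pairs list is invented for illustration.

```python
pairs = [('a', 1), ('a', 2), ('b', 3)]  # illustrative input

dictionary = {}
for key, value in pairs:
    # setdefault returns the existing list, or stores and returns a new one
    dictionary.setdefault(key, []).append(value)

print(dictionary)  # {'a': [1, 2], 'b': [3]}
```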

How can I auto-initialise key-value in python dictionary if key-value pair does not exist, without doing IF & ELSE?

How do I automatically initialize my python dictionary key-value (no idea what keys are used yet) as 1 if it does not yet exist and if it exists, just increment? I guess this concept can be used for any other logic.
Example:
The code below raises an error because char_counts[char] is not initialised yet for some keys, and I cannot initialise it up front because I don't know at the start which key-value pairs I will use. (As an aside: if I did know, is there a convenient way in Python to initialise key-value pairs in one shot, other than looping?)
Anyway, my main question is the one below.
for ~some loop~:
    char_counts[char] += 1
This is my current workaround, which seems a little lengthy for a simple operation. Is there a better way to streamline or shorten this?
for ~some loop~:
    if char_counts.get(char, None):
        char_counts[char] += 1
    else:
        char_counts[char] = 1
Thanks in advance!
Use defaultdict from Python's collections module:
from collections import defaultdict
char_counts = defaultdict(lambda :0)
for ~some loop~:
    char_counts[char] += 1
Subscript access on a defaultdict never raises KeyError: if the key is missing, it is initialized with 0.
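One detail worth knowing is that the default is only applied on subscript access. A small sketch of the difference, with arbitrary key names:

```python
from collections import defaultdict

char_counts = defaultdict(lambda: 0)

print(char_counts['x'])      # 0, and 'x' is now stored as a key
print('x' in char_counts)    # True

# .get() does not call the default factory and does not insert anything
print(char_counts.get('y'))  # None
print('y' in char_counts)    # False
```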
You can use get method of a dictionary object.
If the key is not present in the dictionary get will return the default value to which you want to initialize the dictionary.
char_counts = {}
for ~some loop~:
    # If key is not present, 0 will be returned by default
    char_counts[char] = char_counts.get(char,0) + 1
Once you understand how the above code works, read about defaultdict in the collections module and try the code below.
from collections import defaultdict
char_counts = defaultdict(int)
for ~some loop~:
    char_counts[char] += 1
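For a version that runs as-is, the placeholder loop can be replaced by iteration over a string. This is only a sketch: the text variable is an assumption, and Counter is shown as a third option that the answers above do not mention.

```python
from collections import Counter, defaultdict

text = "hello"  # illustrative input

# Plain dict with get()
counts_get = {}
for char in text:
    counts_get[char] = counts_get.get(char, 0) + 1

# defaultdict(int)
counts_dd = defaultdict(int)
for char in text:
    counts_dd[char] += 1

# Counter does the loop itself
counts_counter = Counter(text)

print(counts_get)  # {'h': 1, 'e': 1, 'l': 2, 'o': 1}
print(counts_get == dict(counts_dd) == dict(counts_counter))  # True
```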

Clean way of using a python dictionary to hold program statistics

I often write short programs that collect statistics as they run and report them at the end. I normally gather these stats in a dictionary.
I end up writing code like the simple example below, but I expect there is a cleaner, more Pythonic way to do this. This approach can grow quite large (or nested) when there are several metrics.
stats = {}
def add_result_to_stats(result,func_name):
    if func_name not in stats:
        stats[func_name] = {}
    if result not in stats[func_name]:
        stats[func_name][result] = 1
    else:
        stats[func_name][result] += 1
You could combine defaultdict with Counter which would reduce add_result_to_stats to one line:
from collections import defaultdict, Counter
stats = defaultdict(Counter)
def add_result_to_stats(result, func_name):
    stats[func_name][result] += 1
add_result_to_stats('foo', 'bar')
print(stats) # defaultdict(<class 'collections.Counter'>, {'bar': Counter({'foo': 1})})
If you just have to count func_names and results go with a Counter
import collections
stats = collections.Counter()
def add_result_to_stats(result,func_name):
    stats.update({(func_name, result):1})
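A short usage sketch of the tuple-keyed Counter. The function and result names ('parse', 'load', 'ok', 'error') are made up, and the increment is written with += instead of update(), which has the same effect here.

```python
import collections

stats = collections.Counter()

def add_result_to_stats(result, func_name):
    # Same effect as stats.update({(func_name, result): 1})
    stats[(func_name, result)] += 1

add_result_to_stats('ok', 'parse')
add_result_to_stats('ok', 'parse')
add_result_to_stats('error', 'parse')
add_result_to_stats('ok', 'load')

# Report at the end, most frequent (func_name, result) pair first
for (func_name, result), n in stats.most_common():
    print(func_name, result, n)
```

The flat tuple key keeps the structure to a single level, at the cost of needing a filter or loop to get all results for one function.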

How can I compare the equivalence of keys to values (line 6 of code)

Assume there is a variable, mp_affiliation, that is associated with a dictionary that maps the names of parliament members to party affiliations. Associate with the variable party_size a dictionary that maps party names to the number of members they have.
party_size={}
for i in list(mp_affiliation.values()):
    party_size[i]=0
for k in mp_affiliation:
    for i in party_size:
        if mp_affiliation[k]==i:
            party_size[i]+=1
Try this, it's simpler if we use the built-in Counter class:
from collections import Counter
party_size = Counter(mp_affiliation.values())
Now the party_size variable will contain a dictionary mapping the political parties with the number of parliament members. But if you want to do this by hand, the long answer would be:
party_size = {}
for i in mp_affiliation.values():
    party_size[i] = 0
for i in mp_affiliation.values():
    party_size[i] += 1
Or a bit shorter, using a defaultdict:
from collections import defaultdict
party_size = defaultdict(int)
for i in mp_affiliation.values():
    party_size[i] += 1
There is no need to import anything special or write methods to do this. Loop through the keys and values: if the political party is already in the dict, add 1 to the member count; otherwise add it to the dict and set the member count to 1.
party_size = {}
for (k, v) in mp_affiliation.items():
    if v in party_size:
        party_size[v]+=1
    else:
        party_size[v] = 1
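To check that these approaches agree, here is a small sketch. The mp_affiliation contents are hypothetical; the real dictionary comes from the exercise.

```python
from collections import Counter

# Hypothetical data standing in for the exercise's mp_affiliation.
mp_affiliation = {
    'Alice': 'Red',
    'Bob': 'Blue',
    'Carol': 'Red',
}

by_counter = Counter(mp_affiliation.values())

by_hand = {}
for party in mp_affiliation.values():
    by_hand[party] = by_hand.get(party, 0) + 1

print(by_hand)                      # {'Red': 2, 'Blue': 1}
print(dict(by_counter) == by_hand)  # True
```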

How can you return a default value instead of a key error when accessing a multi-dimensional dictionary in python?

I'm trying to get a value out of a multi-dimensional dictionary, which looks for example like this:
count = {'animals': {'dogs': {'chihuahua': 23}}}
So if I want to know how many chihuahuas I have, I print count['animals']['dogs']['chihuahua'].
But I want to access count['vehicles']['cars']['vw golf'] too, and get 0 back instead of a key error.
Currently I'm doing this:
if not 'vehicles' in count:
    count['vehicles'] = {}
if not 'cars' in count['vehicles']:
    count['vehicles']['cars'] = {}
if not 'vw golf' in count['vehicles']['cars']:
    count['vehicles']['cars']['vw golf'] = 0
How can I do this better?
I'm thinking of some type of class which inherits from dict, but that's just an idea.
You can just do:
return count.get('vehicles', {}).get('cars', {}).get('vw golf', 0)
Basically, this returns an empty dictionary whenever a level is not found, and gets the count at the end.
This works as long as the data is in the specified format, i.e. every intermediate value is a dictionary. It does not raise a KeyError for missing keys, but you might have to tweak it for other data types.
Demo
>>> count = {'animals': {'dogs': {'chihuahua': 23}}}
>>> count.get('vehicles', {}).get('cars', {}).get('vw golf', 0)
0
>>> count = {'vehicles': {'cars': {'vw golf': 100}}}
>>> count.get('vehicles', {}).get('cars', {}).get('vw golf', 0)
100
>>>
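The chained .get() calls can also be wrapped in a small function for arbitrary depth. This is a sketch only: nested_get is a hypothetical helper name, not a standard library function.

```python
def nested_get(mapping, keys, default=0):
    """Hypothetical helper: follow keys through nested dicts, or return default."""
    current = mapping
    for key in keys:
        # Stop as soon as a level is missing or is not a dictionary
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

count = {'animals': {'dogs': {'chihuahua': 23}}}
print(nested_get(count, ['animals', 'dogs', 'chihuahua']))  # 23
print(nested_get(count, ['vehicles', 'cars', 'vw golf']))   # 0
```

Unlike the defaultdict approach, this never modifies count when a key is missing.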
Use a combination of collections.defaultdict and collections.Counter:
from collections import Counter
from collections import defaultdict
counts = defaultdict(lambda: defaultdict(Counter))
Usage:
>>> counts['animals']['dogs']['chihuahua'] = 23
>>> counts['vehicles']['cars']['vw golf'] = 100
>>>
>>> counts['animals']['dogs']['chihuahua']
23
>>> # No fancy cars yet, Counter defaults to 0
... counts['vehicles']['cars']['porsche']
0
>>>
>>> # No bikes yet, empty counter
... counts['vehicles']['bikes']
Counter()
The lambda in the construction of the defaultdict is needed because defaultdict expects a factory. So lambda: defaultdict(Counter) basically creates a function that will return defaultdict(Counter) when called - which is what's required to create the multi-dimensional dictionary you described:
A dictionary whose values default to a dictionary whose values default to an instance of Counter.
The advantage of this solution is that you don't have to keep track of which categories you already defined. You can simply assign two new categories and a new count in one go, and use the same syntax to add a new count for existing categories:
>>> counts['food']['fruit']['bananas'] = 42
>>> counts['food']['fruit']['apples'] = 3
(This assumes that you'll always want exactly three dimensions to your data structure, the first two being category dictionaries and the third being a Counter where the actual counts of things will be stored).
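One side effect to be aware of, shown as a minimal sketch: merely reading a missing path creates the intermediate categories, although the Counter at the end does not store the missing leaf.

```python
from collections import Counter, defaultdict

counts = defaultdict(lambda: defaultdict(Counter))
counts['animals']['dogs']['chihuahua'] = 23

# Reading a missing leaf returns 0. The Counter does not store 'vw golf',
# but the two outer defaultdicts do create 'vehicles' and 'cars'.
print(counts['vehicles']['cars']['vw golf'])    # 0
print('vehicles' in counts)                     # True
print('vw golf' in counts['vehicles']['cars'])  # False
```

If lookups must not change the structure, test membership with in before indexing, or use .get() on the outer levels.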
