Printing a particular subset of keys in a dictionary - python

I have a dictionary in Python where the keys are pathnames. For example:
dict["/A"] = 0
dict["/A/B"] = 1
dict["/A/C"] = 1
dict["/X"] = 10
dict["/X/Y"] = 11
I was wondering, what's a good way to print all "subpaths" given any key.
For example, given a function called "print_dict_path" that does this, something like
print_dict_path("/A")
or
print_dict_path("/A/B")
would print out something like:
"B" = 1
"C" = 1
The only method I can think of is something like using regex and going through the entire dictionary, but I'm not sure if that's the best method (nor am I that well versed in regex).
Thanks for any help.

One possibility without using regex is to just use startswith:
top_path = '/A/B'
for p in d.iterkeys():
    if p.startswith(top_path):
        print d[p]
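A plain startswith test also matches the key itself and any sibling that merely shares the leading characters (a key such as "/AB" would match "/A"). Below is a minimal Python 3 sketch that appends a "/" to the prefix to avoid this and yields the path relative to the prefix. The function name subpaths and the extra "/AB" key are assumptions made for illustration, not part of the question.

```python
def subpaths(d, top_path):
    """Yield (relative_path, value) for every key strictly below top_path."""
    prefix = top_path.rstrip("/") + "/"  # "/A" -> "/A/", so "/AB" does not match
    for path, value in d.items():
        if path.startswith(prefix):
            yield path[len(prefix):], value


# Sample data from the question, plus a hypothetical "/AB" key to show the boundary case.
paths = {"/A": 0, "/A/B": 1, "/A/C": 1, "/AB": 5, "/X": 10, "/X/Y": 11}

for name, value in subpaths(paths, "/A"):
    print('"{0}" = {1}'.format(name, value))
```

This still scans every key, so it is no faster than the loop above; it only tightens what counts as a subpath.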

You can use str.find:
def print_dict_path(prefix, d):
    for k in d:
        if k.find(prefix) == 0:
            print "\"{0}\" = {1}".format(k,d[k])

Well, you'll definitely have to loop through the entire dict.
def filter_dict_path( d, sub ):
    for key, val in d.iteritems():
        if key.startswith(sub): ## or do you want `sub in key` ?
            yield key, val
print dict(filter_dict_path( old_dict, sub ))
You could speed this up by using the appropriate data structure: a Tree.

Is your dictionary structure fixed? It would be nicer to do this using nested dictionaries:
{
    "A": {
        "value": 0,
        "dirs": {
            "B": {
                "value": 1
            },
            "C": {
                "value": 1
            }
        }
    },
    "X": {
        "value": 10,
        "dirs": {
            "Y": {
                "value": 11
            }
        }
    }
}
The underlying data structure here is a tree, but Python doesn't have that built in.
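If the flat dictionary already exists, the nested form can be derived from it. The following is a rough Python 3 sketch, assuming every key is an absolute "/"-separated path; build_tree is a hypothetical helper name, and the "value"/"dirs" layout simply mirrors the structure shown above.

```python
def build_tree(flat):
    """Convert {"/A/B": 1, ...} into nested {"A": {"value": ..., "dirs": {...}}}."""
    root = {"dirs": {}}
    for path, value in flat.items():
        node = root
        for part in path.strip("/").split("/"):
            # Descend one level, creating the "dirs" mapping and the child as needed.
            node = node.setdefault("dirs", {}).setdefault(part, {})
        node["value"] = value
    return root["dirs"]


flat = {"/A": 0, "/A/B": 1, "/A/C": 1, "/X": 10, "/X/Y": 11}
tree = build_tree(flat)

# The direct children of "/A" are now a single lookup instead of a full scan.
for name, child in tree["A"]["dirs"].items():
    print('"{0}" = {1}'.format(name, child["value"]))
```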

This removes one level of indenting, which may make the code in the body of the for loop more readable in some cases:
top_path = '/A/B'
for p in (p for p in d.iterkeys() if p.startswith(top_path)):
    print d[p]
If you find performance to be a problem, consider using a trie instead of the dictionary.
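A full trie is not shown here. As a lighter alternative (a different technique, named plainly: binary search over sorted keys), all keys sharing a prefix sit next to each other once the keys are sorted, so bisect can jump to the first candidate. This is only a sketch in Python 3; keys_with_prefix is a hypothetical name, and the sorted list has to be rebuilt or maintained whenever the dictionary changes.

```python
from bisect import bisect_left


def keys_with_prefix(sorted_keys, prefix):
    """Return the keys in sorted_keys that start with prefix."""
    start = bisect_left(sorted_keys, prefix)  # first key >= prefix
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match
        matches.append(key)
    return matches


d = {"/A": 0, "/A/B": 1, "/A/C": 1, "/X": 10, "/X/Y": 11}
sorted_keys = sorted(d)

for key in keys_with_prefix(sorted_keys, "/A/"):
    print(key, d[key])
```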

Related

How to find the depth of a dictionary that contains a list of dictionaries?

I would like to know the depth of a dict that contains a list of dicts. I wrote some simple code, but the problem is that it increments the depth counter at each step.
This is the input I have as an example:
respons = {
    "root":{
        "Flow":[{
            "Name":"BSB1",
            "Output":[{
                "Name":"BSB2",
                "Output":[{
                    "Name":"BSB5",
                    "Output":[{
                        "Name":"BSB6",
                        "Output":[{
                            "Name":"BSB8",
                            "Output":[]
                        }]
                    },
                    {
                        "Name":"BSB7",
                        "Output":[]
                    }]
                }]
            },
            {
                "Name":"BSB3",
                "Output":[{
                    "Name":"BSB4",
                    "Output":[]
                }]
            }]
        }]
    }
}
def calculate_depth(flow,depth):
    depth+=1
    md = []
    if flow['Output']:
        for o in flow['Output']:
            print(o['BusinessUnit'])
            md.append(calculate_depth(o,depth))
        print(max(md))
        print(md)
        return max(md)
    else:
        return depth
print(calculate_depth(respons['root']['Flow'][0],0))
Normally I want the depth of the longest branch of this dict, not a counter that goes through all of the branches and increments at each step.
EDIT
The desired outcome for this structure is: 5
Why?
It is the longest branch: BSB1 => BSB2 => BSB5 => BSB6 => BSB8
What the depth of this structure is, is debatable. Your code (and the way you indent the data structure) seems to suggest that you don't want to count the intermediate lists as adding a level to a path. Yet to access deep data you would write
respons['root']['Flow'][0]['Output'][0]['Output'][0]
#                      ^^^          ^^^          ^^^ ...not a level?
And taking this to the leaves of this tree: is the deepest [] a level?
Here is code that only counts dicts as adding to the level, and only when they are not empty:
def calculate_depth(thing):
    if isinstance(thing, list) and len(thing):
        return 0 + max(calculate_depth(item) for item in thing)
    if isinstance(thing, dict) and len(thing):
        return 1 + max(calculate_depth(item) for item in thing.values())
    return 0
This prints 5 for the example data:
print(calculate_depth(respons['root']['Flow'][0]))
Adapt to your need.
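One possible adaptation, if a "level" is meant to be exactly one node on a chain of "Output" links: follow only that key and ignore every other value. This is a Python 3 sketch under that assumption; branch_depth is a hypothetical name, and the data below is the question's structure starting at the BSB1 node, written compactly.

```python
def branch_depth(node):
    """Count the nodes on the longest chain that follows "Output" lists."""
    children = node.get("Output", [])
    # default=0 covers leaves, whose "Output" list is empty.
    return 1 + max((branch_depth(child) for child in children), default=0)


flow = {"Name": "BSB1", "Output": [
    {"Name": "BSB2", "Output": [
        {"Name": "BSB5", "Output": [
            {"Name": "BSB6", "Output": [{"Name": "BSB8", "Output": []}]},
            {"Name": "BSB7", "Output": []},
        ]},
    ]},
    {"Name": "BSB3", "Output": [{"Name": "BSB4", "Output": []}]},
]}

print(branch_depth(flow))  # 5: BSB1 => BSB2 => BSB5 => BSB6 => BSB8
```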

Is there a pythonic way of referring to the current object (self-reference) we are declaring with (some of) Python's built-in types?

With (some of) Python's built-in types, is it possible to refer to the object we are declaring?
By "(some) built-in types" I mean, for example, sequence types, mapping types or set types, obviously not numeric types.
I mean without creating a class myself and adding this functionality (without creating a subclass).
So, something like the this keyword as used in the examples below.
For example, for the "dict" Python built-in type, something like this:
a_dictionary = {
    "key_1": "value_1",
    "key_2": "value_2",
    "key_3": this["key_1"] + "_" + this["key_2"] # == "value_1_value_2"
}
or even:
a_dictionary = {
    "sub_dict_1": {
        "key_1": "value_1_1",
        "key_2": "value_1_2",
        "key_3": this["key_1"] + "_" + this["key_2"] # == "value_1_1_value_1_2"
    },
    "sub_dict_2": {
        "key_1": "value_2_1",
        "key_2": "value_2_2",
        "key_3": this["key_1"] + "_" + this["key_2"] # == "value_2_1_value_2_2"
    }
}
I've read :
When doing function chaining in python, is there a way to refer to the "current" object?
What do I do when I need a self referential dictionary?
Reference a dictionary within itself
Self-referencing classes in python?
Is there a way to refer to the current function in python?
Is it possible to access current object while doing list/dict comprehension in Python?
and some others, but they don't match the requirements described at the beginning of my question.
Thanks a lot for your help!
Python provides no way to refer to an object under construction by a literal or a display*. You can (ab)use the assignment expression in Python 3.8 or later to simulate this:
a_dictionary = {
    "key_1": (x := "value_1"),
    "key_2": (y := "value_2"),
    "key_3": x + "_" + y
}
It requires some planning ahead, as you are not referring to a key value directly, rather a pre-defined variable. Notice that x and y remain in scope after the assignment to a_dictionary, so this is just a questionable equivalent of
x = "value_1"
y = "value_2"
a_dictionary = {
    "key_1": x,
    "key_2": y,
    "key_3": x + "_" + y
}
A custom class would really be more appropriate:
class Thing:
    def __init__(self, v1, v2):
        self.key_1 = v1
        self.key_2 = v2
        self.key_3 = v1 + "_" + v2
a_thing = Thing("value_1", "value_2")
* A display is a construct like a literal, but could contain non-literal references. For example, list displays include [1, x, y] and [int(x) for x in foo].
After some digging, and only for the case of a dictionary, I found this other workaround, based on What do I do when I need a self referential dictionary? :
class MyDict(dict):
    def __getitem__(self, item):
        return dict.__getitem__(self, item).format(self)
a_dictionary = MyDict({
    "key_1": "value_1",
    "key_2": "value_2",
    "key_3": "{0[key_1]}" + "_" + "{0[key_2]}" # == "value_1_value_2"
})
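A short usage sketch of this workaround in Python 3, mainly to show its limits. The class is repeated so the snippet runs on its own. Two caveats follow directly from how it is written: every stored value must be a string (anything else has no suitable format method), and only subscription goes through __getitem__, so dict.get returns the raw template.

```python
class MyDict(dict):
    def __getitem__(self, item):
        # Same idea as above: format the stored string against the dict itself.
        return dict.__getitem__(self, item).format(self)


a_dictionary = MyDict({
    "key_1": "value_1",
    "key_2": "value_2",
    "key_3": "{0[key_1]}_{0[key_2]}",
})

print(a_dictionary["key_3"])      # value_1_value_2
print(a_dictionary.get("key_3"))  # {0[key_1]}_{0[key_2]}  (get bypasses __getitem__)
```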

Properly Formatting Set Output in Python

I'm writing a program whose input is a set of sets (or "collection") in Python syntax. The output of the program should be the same collection in proper mathematical syntax. To do so, I've written a recursive function:
collection = set([
    frozenset(["a,b,c"]),
    frozenset(),
    frozenset(["a"]),
    frozenset(["b"]),
    frozenset(["a,b"])
])
def format_set(given_set):
    # for each element of the set
    for i in given_set:
        #if the element is itself a set, begin recursion
        if type(i) == frozenset:
            format_set(i)
        else:
            return "{", i, "},",
calling format_set(collection) gives the output
{ a,b }, { a,b,c }, { b }, { a },
which is missing a pair of braces, and has an extra comma at the end. The correct output would be
{{ a,b }, { a,b,c }, { b }, { a },{}}.
Thus, I would need to add "{" before the first recursion, and "}" after the last, as well as not adding the comma after the last recursion. Is there a way to find the final recursion?
I could always solve the extra parenthesis problem by defining:
def shortcut(x):
    print "{", format_set(x), "}"
However, I feel like that's somewhat inelegant, and still leaves the comma problem.
It will be more straightforward if you check the type first and then do the iteration:
def format_set(given):
    if isinstance(given, (set, frozenset)):
        return '{' + ', '.join(format_set(i) for i in given) + '}'
    else:
        return str(given)
Output:
{{a,b}, {a,b,c}, {}, {b}, {a}}
Also, note that in your example input all sets are actually empty or have 1 element. If you change the input like this...
collection = set([
    frozenset(["a", "b", "c"]),
    frozenset(),
    frozenset(["a"]),
    frozenset(["b"]),
    frozenset(["a", "b"])
])
...you'll get this output:
{{a, c, b}, {}, {b}, {a}, {a, b}}
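Because sets are unordered, the element order in that output can differ between runs and Python versions. If a stable result matters, one option is to sort the formatted pieces before joining them. This is a small variation on the function above, sketched for Python 3; it orders elements by their formatted text, which is an arbitrary but repeatable choice.

```python
def format_set(given):
    """Format nested sets as {...}, ordering elements by their formatted text."""
    if isinstance(given, (set, frozenset)):
        parts = sorted(format_set(item) for item in given)
        return "{" + ", ".join(parts) + "}"
    return str(given)


collection = {
    frozenset(["a", "b", "c"]),
    frozenset(),
    frozenset(["a"]),
    frozenset(["b"]),
    frozenset(["a", "b"]),
}

print(format_set(collection))  # {{a, b, c}, {a, b}, {a}, {b}, {}}
```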

Python: Summing values nested inside different dictionaries in a nested dictionary

I have a nested dictionary called "high_low_teams_in_profile" which looks like this:
{
    m_profile1:
    {
        team_size1:
        {
            low: 1,
            high: 1
        },
        team_size2:
        {
            low: 1,
            high: 1
        }
    },
    m_profile2:
    {
        team_size1:
        {
            low: 1,
            high: 1
        },
        team_size2:
        {
            low: 1,
            high: 1
        }
    }
}
And I want to get {m_profile1: 4, m_profile2: 4}
What is the most elegant way to do it in Python?
Right now I have the following:
new_num_teams_in_profile = {}
for profile in high_low_teams_in_profile:
    new_num_teams_in_profile[profile]= dict((team_size, sum(high_low_teams_in_profile[profile][team_size].values())) for team_size in high_low_teams_in_profile[profile])
new_num_teams_in_profile= dict((profile, sum(new_num_teams_in_profile[profile].values())) for profile in new_num_teams_in_profile)
I'm not sure if I'd say it's the most Pythonic, but it's the most functional:
p = high_low_teams_in_profile
{ prof:sum(p[prof][team][hl]
           for team in p[prof]
           for hl in p[prof][team])
  for prof in p}
The argument of sum is a generator expression, and the outer { prof:sum(...) for prof in p} is a dictionary comprehension.
While this may not be the most pythonic, the following code should work and is more readable than your original version. Note the iteritems() method, which allows access to both the keys and values of the dict, while itervalues(), as the name suggests, only iterates the values of the dict.
final = {}
for key, sizes in high_low_teams_in_profile.iteritems():
    total = 0
    for value in sizes.itervalues():
        s = sum(value.itervalues())
        total += s
    final[key] = total
print final
In addition, you could use the following. While it takes fewer lines, it is slightly more difficult to read.
final = {}
for key, sizes in high_low_teams_in_profile.iteritems():
    total = sum([sum(value.itervalues()) for value in sizes.itervalues()])
    final[key] = total
print final
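The loops above use iteritems and itervalues, which exist only in Python 2. For reference, here is a Python 3 sketch of the same two-level sum using items() and values(). The question shows the keys as bare names, so the string keys below are an assumption made to get runnable data.

```python
# Keys are written as strings here; the question shows them as bare names.
high_low_teams_in_profile = {
    "m_profile1": {
        "team_size1": {"low": 1, "high": 1},
        "team_size2": {"low": 1, "high": 1},
    },
    "m_profile2": {
        "team_size1": {"low": 1, "high": 1},
        "team_size2": {"low": 1, "high": 1},
    },
}

# Inner sum: low + high for one team size; outer sum: all team sizes in a profile.
totals = {
    profile: sum(sum(counts.values()) for counts in sizes.values())
    for profile, sizes in high_low_teams_in_profile.items()
}

print(totals)  # {'m_profile1': 4, 'm_profile2': 4}
```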

Accessing the values of a key

I have a dictionary like:
Data = {
    "weight_factors" : {
        "parameter1" : 10,
        "parameter2" : 30,
        "parameter3" : 30
    },
    "other_info" : {
    }
}
I want to get the sum of all values that are under the key "weight_factors":
sum = Data["weight_factors"]["parameter1"] +
Data["weight_factors"]["parameter2"] +
Data["weight_factors"]["parameter3"]
Currently, in order to avoid entering Data["weight_factors"] repeatedly, I use the following commands:
d = Data["weight_factors"]
d["parameter1"] + d["parameter2"] + d["parameter3"]
But, I guess there should be an operator that does the same thing, without storing Data["weight_factors"] as an intermediate variable. I was wondering if such a command or an operator exists.
Data["weight_factors"]<unknown operator>(["parameter1"] +
["parameter2"] +
...
["parametern"])<unknown operator>
EDIT:
In the example given above, it was just a sum operation. But it could for example be:
Data["weight_factors"]["parameter1"] * Data["weight_factors"]["parameter2"] + Data["weight_factors"]["parameter3"]
But I do not want to enter Data["weight_factors"] repeatedly. That's the thing I am searching for... I don't know whether such an operator exists. (In MATLAB, such a thing exists for cell structures).
No, that kind of operator does not exist for the built-in dict type.
I suppose you could make your own dict type that inherited from dict and overloaded an operator:
class MyDict(dict):
    def __add__(self, other):
        """Overload the + operator."""
        ...
but that is somewhat inefficient and not very good for readability.
If you just want to sum the values, you can use sum and dict.values (dict.itervalues if you are using Python 2.x):
>>> Data = {
...     "weight_factors" : {
...         "parameter1" : 10,
...         "parameter2" : 30,
...         "parameter3" : 30
...     },
...     "other_info" : {
...     }
... }
>>> sum(Data["weight_factors"].values())
70
>>>
Otherwise, I would just use what you have now:
d = Data["weight_factors"]
myvar = d["parameter1"] * d["parameter2"] + d["parameter3"]
It is about as clean and efficient as you can get.
For a general solution to repeatedly get the same item from a mapping or index, I suggest the operator module's itemgetter:
>>> import operator
>>> Data = {
...     "weight_factors" : {
...         "parameter1" : 10,
...         "parameter2" : 30,
...         "parameter3" : 30
...     },
...     "other_info" : {
...     }
... }
Now create our easy getter:
>>> get = operator.itemgetter('weight_factors')
And call it on the object whenever you want your sub-dict:
>>> get(Data)['parameter1']
returns:
10
and
>>> sum(get(Data).values())
returns
70
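itemgetter also accepts several keys at once and then returns a tuple, which covers the non-sum case from the question's edit without repeating the outer lookup. A minimal Python 3 sketch follows; the names p1, p2 and p3 are chosen here purely for illustration.

```python
from operator import itemgetter

Data = {
    "weight_factors": {
        "parameter1": 10,
        "parameter2": 30,
        "parameter3": 30,
    },
    "other_info": {},
}

# With several keys, itemgetter returns a tuple of the matching values.
get_params = itemgetter("parameter1", "parameter2", "parameter3")
p1, p2, p3 = get_params(Data["weight_factors"])

print(p1 * p2 + p3)  # 330
```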
If this is just "how do I access a dict's values easily and repeatedly?" you should just assign them like this, and you can reuse them again and again.
In Python 2:
vals = Data['weight_factors'].values()
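In Python 3, values returns a view object rather than a list. A view can be iterated more than once, but it tracks later changes to the dict and does not support indexing, so materialize it in a list if you want a fixed snapshot: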
vals = list(Data['weight_factors'].values())
and then you can do whatever you want with it:
sum(vals)
max(vals)
min(vals)
etc...
