converting lists to strings and integers in nested dictionary - python

I have a nested dictionary where I want to convert 'protein accession' values from lists into simple strings. Also, I want to convert lists of strings into lists of integers for example in 'sequence length', 'start location', 'stop location'.
'protein accession': ['A0A0H3LJT0_e'] into 'protein accession': 'A0A0H3LJT0_e'
'sequence length': ['102'] into 'sequence length': [102]
and so on
Here is the sample of my dictionary:
{
"A0A0H3LJT0_e": {
"protein accession": ["A0A0H3LJT0_e"],
"sequence length": ["102"],
"analysis": ["SMART"],
"signature accession": ["SM00886"],
"signature description": ["Dabb_2"],
"start location": ["4"],
"stop location": ["98"],
"e-value": ["1.5E-22"],
"interpro accession": ["IPR013097"],
"interpro description": ["Stress responsive alpha-beta barrel"],
"nunique": [2],
"domain_count": [1],
}
}
Could someone help me, please?

You need to iterate through the dictionary and replace the values accordingly.
d is the input dictionary here.
In [1]: data = d['C4QY10_e']
In [2]: result = {}
In [3]: for k,v in data.items():
   ...:     if str(v[0]).isdigit():
   ...:         result[k] = [int(v[0])]
   ...:     else:
   ...:         result[k] = v[0]
   ...:
In [4]: result
Out[4]:
{'protein accession': 'C4QY10_e',
'sequence length': [1879],
'analysis': 'Pfam',
'signature accession': 'PF18314',
'signature description': 'Fatty acid synthase type I helical domain',
'start location': [328],
'stop location': [528],
'e-value': '4.7E-73',
'interpro accession': 'IPR041550',
'interpro description': 'Fatty acid synthase type I',
'nunique': [1],
'domain_count': [5]}
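To iterate through the entire dictionary, keep one result per top-level key so that later entries do not overwrite earlier ones:
result = {}
for main_k, val in d.items():
    result[main_k] = {}
    for k,v in val.items():
        if str(v[0]).isdigit():
            result[main_k][k] = [int(v[0])]
        else:
            result[main_k][k] = v[0]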
If you want to change the dictionary itself you can do this,
for main_k, main_val in d.items():
    for k,v in main_val.items():
        if str(v[0]).isdigit():
            d[main_k][k] = [int(v[0])]
        else:
            d[main_k][k] = v[0]

Assuming you also want to convert the string representation of a floating point number, you could do this (which also allows for list values with more than one element):
sample = {
"A0A0H3LJT0_e": {
"protein accession": ["A0A0H3LJT0_e"],
"sequence length": ["102"],
"analysis": ["SMART"],
"signature accession": ["SM00886"],
"signature description": ["Dabb_2"],
"start location": ["4"],
"stop location": ["98"],
"e-value": ["1.5E-22"],
"interpro accession": ["IPR013097"],
"interpro description": ["Stress responsive alpha-beta barrel"],
"nunique": [2],
"domain_count": [1]
}
}
for sd in sample.values():
    if isinstance(sd, dict):
        for k, v in sd.items():
            if isinstance(v, list):
                try:
                    sd[k] = list(map(int, v))
                except ValueError:
                    try:
                        sd[k] = list(map(float, v))
                    except ValueError:
                        sd[k] = ', '.join(map(str, v))
    print(sd)
Output:
{'protein accession': 'A0A0H3LJT0_e', 'sequence length': [102], 'analysis': 'SMART', 'signature accession': 'SM00886', 'signature description': 'Dabb_2', 'start location': [4], 'stop location': [98], 'e-value': [1.5e-22], 'interpro accession': 'IPR013097', 'interpro description': 'Stress responsive alpha-beta barrel', 'nunique': [2], 'domain_count': [1]}
Note:
Unless every value in a list can be converted to either int or float, the values will be converted into a single string where each element is separated from the other by ', '

When creating a new mapping-type object with nested container objects (e.g. list, dict, set, etc.), a defaultdict from the built-in collections library may be called for.
However, let's assume you are modifying the existing dictionary in place, thus preserving the dict type. We can use two explicit for loops over dict.items():
# assume input is stored to the variable, data
for name, details in data.items():  # Also possible to use for details in data.values():
    for attribute, values in details.items():
        # Use tuple unpacking to raise a ValueError
        # when no values or more than one value unexpectedly appears
        (value,) = values
        # Only consider strings with decimal values
        if isinstance(value, str) and value.isdecimal():
            # details is a reference to data[name]
            details[attribute] = int(value)
        else:
            details[attribute] = value
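If only certain fields should become integers (the question names 'sequence length', 'start location' and 'stop location'), choosing the conversion by key rather than by content may be more predictable. A minimal sketch, assuming the hypothetical names INT_LIST_KEYS, STRING_KEYS, convert_record and convert_all (none of them come from the answers above); it builds a new dictionary and leaves the input untouched:

```python
# Hypothetical field lists taken from the question; adjust them to your data.
INT_LIST_KEYS = {"sequence length", "start location", "stop location"}
STRING_KEYS = {"protein accession"}


def convert_record(record):
    """Return a converted copy of one inner dictionary."""
    converted = {}
    for key, values in record.items():
        if key in INT_LIST_KEYS:
            # List of strings -> list of integers.
            converted[key] = [int(v) for v in values]
        elif key in STRING_KEYS:
            # Tuple unpacking raises ValueError unless there is exactly one value.
            (converted[key],) = values
        else:
            # Everything else is kept as it was.
            converted[key] = values
    return converted


def convert_all(data):
    """Apply convert_record to every inner dictionary."""
    return {name: convert_record(record) for name, record in data.items()}


sample = {
    "A0A0H3LJT0_e": {
        "protein accession": ["A0A0H3LJT0_e"],
        "sequence length": ["102"],
        "start location": ["4"],
        "stop location": ["98"],
        "e-value": ["1.5E-22"],
    }
}
result = convert_all(sample)
print(result["A0A0H3LJT0_e"])
```

With this approach an unexpected non-numeric value in one of the integer fields raises a ValueError instead of being silently left as a string, which may or may not be what you want.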

Related

Python: Building list of lists using dictionary recursion

I have a very complicated problem that I am sort of hoping to rubberduck here:
I have a dictionary:
{
"1": {
"1.1": {
"1.1.1": {}
},
"1.2": {
"1.2.1": {}
}
},
"2": {
"2.1": {
"2.1.1": {}
},
"2.2": {
"2.2.2": {}
}
}
}
whose structure won't always be the same (i.e., there could be further nesting or more keys in any sub-dictionary). I need to be able to generate a specifically ordered list of lists (contained sub-lists need not be ordered) based on some input. The structure of the lists is based on the dictionary. Accounting for all keys in the dictionary, the list of lists would look like:
[['1', '2'], ['1.2', '1.1'], ['1.1.1'], ['1.2.1'], ['2.2', '2.1'], ['2.1.1'], ['2.2.2']]
That is, the first sublist contains the two keys at the highest level of the dictionary. The second sub-list contains the two keys under the first "highest level" key. The third and fourth sub-lists contain the keys available under the "2nd level" of the dictionary. (And so on)
I need a function that, based on input (that is any key in the nested dictionary), will return the correct list of lists. For example(s):
function('2.2.2')
>>> [['2'], None, None, None, ['2.2'], None, ['2.2.2']] # or [['2'], [], [], [], ['2.2'], [], ['2.2.2']]
function('1.1')
>>> [['1'], ['1.1'], None, None, None, None, None] # or [['1'], ['1.1'], [], [], [], [], []]
function('1.2.1')
>>> [['1'], ['1.2'], None, ['1.2.1'], None, None, None] # or [['1'], ['1.2'], [], ['1.2.1'], [], [], []]
It is almost like I need to be able to "know" the structure of the dictionary as I recurse. I keep thinking maybe if I can find the input key in the dictionary and then trace it up, I will be able to generate the list of lists but
how can I recurse "upwards" in the dictionary and
how in the world do I store the information in the lists as I "go along"?
Your master list is just a depth-first list of all the keys in your dict structure. Getting this is fairly easy:
def dive_into(d):
    if d and isinstance(d, dict):
        yield list(d.keys())
        for v in d.values():
            yield from dive_into(v)
d = {
"1": {
"1.1": {
"1.1.1": {}
},
"1.2": {
"1.2.1": {}
}
},
"2": {
"2.1": {
"2.1.1": {}
},
"2.2": {
"2.2.2": {}
}
}
}
master_list = list(dive_into(d))
# [['1', '2'], ['1.1', '1.2'], ['1.1.1'], ['1.2.1'], ['2.1', '2.2'], ['2.1.1'], ['2.2.2']]
Next, your function needs to find all the parent keys of the given key, and only return the keys that are in the path to the given key. Since your keys always have the format <parent>.<child>.<grandchild>, you only need to iterate over this list, and return any elements e for which key.startswith(e) is True:
def function(key):
    lst = [[e for e in keys if key.startswith(e)] for keys in master_list]
    return [item or None for item in lst]
Testing this with your examples:
>>> function('2.2.2')
Out: [['2'], None, None, None, ['2.2'], None, ['2.2.2']]
>>> function('1.1')
Out: [['1'], ['1.1'], None, None, None, None, None]
>>> function('1.2.1')
Out: [['1'], ['1.2'], None, ['1.2.1'], None, None, None]
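The startswith test relies on the dotted naming of the keys, and a key such as '1.10' would also match the prefix '1.1'. A sketch that does not depend on how the keys are spelled, assuming the hypothetical helpers find_path and path_lists and assuming every key is unique across the whole structure: first locate the chain of ancestors, then filter the same depth-first list by membership in that chain.

```python
def dive_into(d):
    """Yield the key list of every non-empty dict, depth first."""
    if d and isinstance(d, dict):
        yield list(d.keys())
        for v in d.values():
            yield from dive_into(v)


def find_path(d, target, path=()):
    """Return the tuple of keys from the top level down to target, or None."""
    for key, child in d.items():
        current = path + (key,)
        if key == target:
            return current
        if isinstance(child, dict):
            found = find_path(child, target, current)
            if found is not None:
                return found
    return None


def path_lists(d, target):
    """Keep only the keys on the path to target; empty sub-lists become None."""
    on_path = set(find_path(d, target) or ())
    return [[k for k in keys if k in on_path] or None for keys in dive_into(d)]


tree = {
    "1": {"1.1": {"1.1.1": {}}, "1.2": {"1.2.1": {}}},
    "2": {"2.1": {"2.1.1": {}}, "2.2": {"2.2.2": {}}},
}
print(path_lists(tree, "2.2.2"))
# [['2'], None, None, None, ['2.2'], None, ['2.2.2']]
```

This also answers the "recurse upwards" part of the question: instead of climbing up from the key, the search carries the path down with it and returns it once the key is found.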

Python - dictionaries

I am new to programming and Python and I don't know how to solve this problem.
my_dict = {'tiger': ['claws', 'sharp teeth', 'four legs', 'stripes'],
           'elephant': ['trunk', 'four legs', 'big ears', 'gray skin'],
           'human': ['two legs', 'funny looking ears', 'a sense of humor']
           }
new_dict = {}
for k, v in my_dict.items():
    new_v = v + "WOW"
    new_dict[k] = new_v
print(new_dict)
I want to make a new dictionary with added phrase but I got an error "can only concatenate list (not "str") to list", but when I am using only one value per key the programme works. Is there any solution to this?
You can concatenate a list to another list as follows:
if __name__ == '__main__':
    my_dict = {'tiger': ['claws', 'sharp teeth', 'four legs', 'stripes'],
               'elephant': ['trunk', 'four legs', 'big ears', 'gray skin'],
               'human': ['two legs', 'funny looking ears', 'a sense of humor']
               }
    new_dict = {}
    for k, v in my_dict.items():
        new_v = v + ["WOW"]
        new_dict[k] = new_v
    print(new_dict)
{'tiger': ['claws', 'sharp teeth', 'four legs', 'stripes', 'WOW'], 'elephant': ['trunk', 'four legs', 'big ears', 'gray skin', 'WOW'], 'human': ['two legs', 'funny looking ears', 'a sense of humor', 'WOW']}
A list can only be concatenated with another list.
# Here you're trying to concatenate a list with a string
my_dict['tiger'] + 'WOW' # this won't work: one operand is a list and the other is a string
my_dict['tiger'] + ['WOW'] # this works: both operands are lists, so they are concatenated
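The same loop can be written as a dictionary comprehension. The sketch below uses a shortened copy of the dictionary and shows two readings of "added phrase"; the names appended and suffixed are only illustrative, and the second variant is a guess at what the question might also have meant:

```python
my_dict = {
    'tiger': ['claws', 'sharp teeth'],
    'human': ['two legs'],
}

# Append one extra element to every list (same result as v + ["WOW"]).
appended = {k: v + ["WOW"] for k, v in my_dict.items()}

# Alternative reading: add the phrase to every string inside each list.
suffixed = {k: [item + " WOW" for item in v] for k, v in my_dict.items()}

print(appended)
print(suffixed)
```

Both comprehensions build new lists, so my_dict itself is left unchanged.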

Extract set of leaf values found in nested dicts and lists excluding None

I have a nested structure read from YAML which is composed of nested lists and/or nested dicts or a mix of both at various levels of nesting. It can be assumed that the structure doesn't contain any recursive objects.
How do I extract from it the leaf values only? Also, I don't want any None value. The leaf values contain strings which is all I care for. It's okay for recursion to be used, considering that the maximum depth of the structure is not large enough to exceed stack recursion limits. A generator would optionally also be fine.
There exist similar questions which deal with flattening lists or dicts, but not a mix of both. Alternatively, if flattening a dict, they also return the flattened keys which I don't really need, and risk name conflicts.
I tried more_itertools.collapse but its examples only show it to work with nested lists, and not with a mix of dicts and lists.
Sample inputs
struct1 = {
"k0": None,
"k1": "v1",
"k2": ["v0", None, "v1"],
"k3": ["v0", ["v1", "v2", None, ["v3"], ["v4", "v5"], []]],
"k4": {"k0": None},
"k5": {"k1": {"k2": {"k3": "v3", "k4": "v6"}, "k4": {}}},
"k6": [{}, {"k1": "v7"}, {"k2": "v8", "k3": "v9", "k4": {"k5": {"k6": "v10"}, "k7": {}}}],
"k7": {
"k0": [],
"k1": ["v11"],
"k2": ["v12", "v13"],
"k3": ["v14", ["v15"]],
"k4": [["v16"], ["v17"]],
"k5": ["v18", ["v19", "v20", ["v21", "v22", []]]],
},
}
struct2 = ["aa", "bb", "cc", ["dd", "ee", ["ff", "gg"], None, []]]
Expected outputs
struct1_leaves = {f"v{i}" for i in range(23)}
struct2_leaves = {f"{s}{s}" for s in "abcdefg"}
Another possibility is to use a generator with recursion:
struct1 = {'k0': None, 'k1': 'v1', 'k2': ['v0', None, 'v1'], 'k3': ['v0', ['v1', 'v2', None, ['v3'], ['v4', 'v5'], []]], 'k4': {'k0': None}, 'k5': {'k1': {'k2': {'k3': 'v3', 'k4': 'v6'}, 'k4': {}}}, 'k6': [{}, {'k1': 'v7'}, {'k2': 'v8', 'k3': 'v9', 'k4': {'k5': {'k6': 'v10'}, 'k7': {}}}], 'k7': {'k0': [], 'k1': ['v11'], 'k2': ['v12', 'v13'], 'k3': ['v14', ['v15']], 'k4': [['v16'], ['v17']], 'k5': ['v18', ['v19', 'v20', ['v21', 'v22', []]]]}}
def flatten(d):
    for i in getattr(d, 'values', lambda :d)():
        if isinstance(i, str):
            yield i
        elif i is not None:
            yield from flatten(i)
print(set(flatten(struct1)))
Output:
{'v10', 'v9', 'v8', 'v7', 'v0', 'v18', 'v16', 'v1', 'v21', 'v11', 'v14', 'v15', 'v12', 'v13', 'v4', 'v2', 'v5', 'v20', 'v6', 'v19', 'v3', 'v22', 'v17'}
struct2 = ["aa", "bb", "cc", ["dd", "ee", ["ff", "gg"], None, []]]
print(set(flatten(struct2)))
Output:
{'cc', 'ff', 'dd', 'gg', 'bb', 'ee', 'aa'}
This is a straightforward reference solution which uses recursion to produce the expected outputs for the sample inputs included in the question.
from typing import Any, Set
def leaves(struct: Any) -> Set[Any]:
    """Return a set of leaf values found in nested dicts and lists excluding None values."""
    # Ref: https://stackoverflow.com/a/59832362/
    values = set()
    if isinstance(struct, dict):
        for sub_struct in struct.values():
            values.update(leaves(sub_struct))
    elif isinstance(struct, list):
        for sub_struct in struct:
            values.update(leaves(sub_struct))
    elif struct is not None:
        values.add(struct)
    return values
This is an adaptation of the reference answer to use an inner function and a single set. It also uses recursion to produce the expected outputs for the sample inputs included in the question. It avoids passing every leaf through the entire call stack.
from typing import Any, Set
def leaves(struct: Any) -> Set[Any]:
    """Return a set of leaf values found in nested dicts and lists excluding None values."""
    # Ref: https://stackoverflow.com/a/59832594/
    values = set()
    def add_leaves(struct_: Any) -> None:
        if isinstance(struct_, dict):
            for sub_struct in struct_.values():
                add_leaves(sub_struct)
        elif isinstance(struct_, list):
            for sub_struct in struct_:
                add_leaves(sub_struct)
        elif struct_ is not None:
            values.add(struct_)
    add_leaves(struct)
    return values
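If recursion depth ever became a concern, the same traversal could be done with an explicit stack. A minimal sketch, assuming a hypothetical generator named iter_leaves; it yields leaves in a different order than the recursive versions, which does not matter once the results are collected into a set:

```python
def iter_leaves(struct):
    """Yield non-None leaf values from nested dicts and lists without recursion."""
    stack = [struct]
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            # Only the values are of interest; keys are ignored.
            stack.extend(current.values())
        elif isinstance(current, list):
            stack.extend(current)
        elif current is not None:
            yield current


struct2 = ["aa", "bb", "cc", ["dd", "ee", ["ff", "gg"], None, []]]
print(set(iter_leaves(struct2)))
```

Because it is a generator, wrap the call in set() (or list()) to collect the values.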

How to insert key-value pair into dictionary at a specified position?

How would I insert a key-value pair at a specified location in a python dictionary that was loaded from a YAML document?
For example if a dictionary is:
dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
I wish to insert the element 'Phone':'1234'
before 'Age', and after 'Name' for example. The actual dictionary I shall be working on is quite large (parsed YAML file), so deleting and reinserting might be a bit cumbersome (I don't really know).
If I am given a way of inserting into a specified position in an OrderedDict, that would be okay, too.
On Python < 3.7 (or CPython < 3.6), you cannot control the ordering of pairs in a standard dictionary.
If you plan on performing arbitrary insertions often, my suggestion would be to use a list to store keys, and a dict to store values.
mykeys = ['Name', 'Age', 'Class']
mydict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'} # order doesn't matter
k, v = 'Phone', '123-456-7890'
mykeys.insert(mykeys.index('Name')+1, k)
mydict[k] = v
for k in mykeys:
    print(f'{k} => {mydict[k]}')
# Name => Zara
# Phone => 123-456-7890
# Age => 7
# Class => First
If you plan on initialising a dictionary with ordering whose contents are not likely to change, you can use the collections.OrderedDict structure which maintains insertion order.
from collections import OrderedDict
data = [('Name', 'Zara'), ('Phone', '1234'), ('Age', 7), ('Class', 'First')]
odict = OrderedDict(data)
odict
# OrderedDict([('Name', 'Zara'),
# ('Phone', '1234'),
# ('Age', 7),
# ('Class', 'First')])
Note that OrderedDict does not support insertion at arbitrary positions (it only remembers the order in which keys are inserted into the dictionary).
You will have to initialize your dict as OrderedDict. Create a new empty OrderedDict, go through all keys of the original dictionary and insert before/after when the key name matches.
from pprint import pprint
from collections import OrderedDict
def insert_key_value(a_dict, key, pos_key, value):
    new_dict = OrderedDict()
    for k, v in a_dict.items():
        if k==pos_key:
            new_dict[key] = value  # insert new key
        new_dict[k] = v
    return new_dict
mydict = OrderedDict([('Name', 'Zara'), ('Age', 7), ('Class', 'First')])
my_new_dict = insert_key_value(mydict, "Phone", "Age", "1234")
pprint(my_new_dict)
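If the OrderedDict has to be changed in place rather than rebuilt, one possible variation uses OrderedDict.move_to_end: add the new key at the end, then move every key from the insertion point onwards behind it. A sketch, assuming a hypothetical helper named insert_before and assuming the new key is not already present:

```python
from collections import OrderedDict


def insert_before(odict, key, value, pos_key):
    """Insert key/value before pos_key, modifying odict in place."""
    keys = list(odict)
    idx = keys.index(pos_key)  # raises ValueError if pos_key is missing
    odict[key] = value         # the new key lands at the end first
    for k in keys[idx:]:
        # Move pos_key and everything after it behind the new key,
        # preserving their relative order.
        odict.move_to_end(k)


mydict = OrderedDict([('Name', 'Zara'), ('Age', 7), ('Class', 'First')])
insert_before(mydict, 'Phone', '1234', 'Age')
print(list(mydict.items()))
# [('Name', 'Zara'), ('Phone', '1234'), ('Age', 7), ('Class', 'First')]
```

This still touches every key after the insertion point, so it is not faster than rebuilding; its only advantage is that other references to the same object see the change.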
Had the same issue and solved this as described below without any additional imports being required and only a few lines of code.
Tested with Python 3.6.9.
1. Get the position of key 'Age', because the new key-value pair should be inserted before it
2. Get the dictionary as a list of key-value pairs
3. Insert the new key-value pair at the specific position
4. Create a dictionary from the list of key-value pairs
mydict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
print(mydict)
# {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
pos = list(mydict.keys()).index('Age')
items = list(mydict.items())
items.insert(pos, ('Phone', '123-456-7890'))
mydict = dict(items)
print(mydict)
# {'Name': 'Zara', 'Phone': '123-456-7890', 'Age': 7, 'Class': 'First'}
Edit 2021-12-20:
Just saw that there is an insert method available in ruamel.yaml, see the example from the project page:
import sys
from ruamel.yaml import YAML
yaml_str = """\
first_name: Art
occupation: Architect # This is an occupation comment
about: Art Vandelay is a fictional character that George invents...
"""
yaml = YAML()
data = yaml.load(yaml_str)
data.insert(1, 'last name', 'Vandelay', comment="new key")
yaml.dump(data, sys.stdout)
This is a follow-up on nurp's answer. Has worked for me, but offered with no warranty.
# Insert dictionary item into a dictionary at specified position:
def insert_item(dic, item={}, pos=None):
    """
    Insert a key, value pair into an ordered dictionary.
    Insert before the specified position.
    """
    from collections import OrderedDict
    d = OrderedDict()
    # abort early if not a dictionary:
    if not item or not isinstance(item, dict):
        print('Aborting. Argument item must be a dictionary.')
        return dic
    # insert anywhere if argument pos not given:
    if not pos:
        dic.update(item)
        return dic
    for item_k, item_v in item.items():
        for k, v in dic.items():
            # insert key at stated position:
            if k == pos:
                d[item_k] = item_v
            d[k] = v
    return d
d = {'A':'letter A', 'C': 'letter C'}
insert_item(['A', 'C'], item={'B'})
## Aborting. Argument item must be a dictionary.
insert_item(d, item={'B': 'letter B'})
## {'A': 'letter A', 'C': 'letter C', 'B': 'letter B'}
insert_item(d, pos='C', item={'B': 'letter B'})
# OrderedDict([('A', 'letter A'), ('B', 'letter B'), ('C', 'letter C')])
Would this be "pythonic"?
def add_item(d, new_pair, old_key):  # insert new_pair (key, value) after old_key
    n = list(d.keys()).index(old_key)
    keys = list(d.keys())[:n+1] + [new_pair[0]] + list(d.keys())[n+1:]
    return {key: d.get(key, new_pair[1]) for key in keys}
INPUT: new_pair=('Phone',1234) , old_key='Age'
OUTPUT: {'Name': 'Zara', 'Age': 7, 'Phone': 1234, 'Class': 'First'}
Simple reproducible example (using zip() for unpacking and packing)
### Task - Insert 'Bangladesh':'Dhaka' after 'India' in the capitals dictionary
## Given dictionary
capitals = {'France':'Paris', 'United Kingdom':'London', 'India':'New Delhi',
'United States':'Washington DC','Germany':'Berlin'}
## Step 1 - Separate into 2 lists containing : 1) keys, 2) values
country, cap = (list(tup) for tup in zip(*capitals.items()))
# or
country, cap = list(map(list, zip(*capitals.items())))
print(country)
#> ['France', 'United Kingdom', 'India', 'United States', 'Germany']
print(cap)
#> ['Paris', 'London', 'New Delhi', 'Washington DC', 'Berlin']
## Step 2 - Find index of item before the insertion point (from either of the 2 lists)
req_pos = country.index('India')
print(req_pos)
#> 2
## Step 3 - Insert new entry at specified position in both lists
country.insert(req_pos+1, 'Bangladesh')
cap.insert(req_pos+1, 'Dhaka')
print(country)
#> ['France', 'United Kingdom', 'India', 'Bangladesh', 'United States', 'Germany']
print(cap)
#> ['Paris', 'London', 'New Delhi', 'Dhaka', 'Washington DC', 'Berlin']
## Step 4 - Zip up the 2 lists into a dictionary
capitals = dict(zip(country, cap))
print(capitals)
#> {'France': 'Paris', 'United Kingdom': 'London', 'India': 'New Delhi', 'Bangladesh': 'Dhaka', 'United States': 'Washington DC', 'Germany': 'Berlin'}
Once you have used load() (without the option Loader=RoundTripLoader) and your data is in a dict(), it is too late, as the order that was available in the YAML file is normally gone (the order depends on the actual keys used and on the Python used: implementation, version and possibly compile options).
What you need to do is use round_trip_load():
import sys
from ruamel import yaml
yaml_str = "{'Name': 'Zara', 'Age': 7, 'Class': 'First'}"
data = yaml.round_trip_load(yaml_str)
pos = list(data.keys()).index('Age') # determine position of 'Age'
# insert before position of 'Age'
data.insert(pos, 'Phone', '1234', comment='This is the phone number')
data.fa.set_block_style() # I like block style
yaml.round_trip_dump(data, sys.stdout)
This will invariably give:
Name: Zara
Phone: '1234' # This is the phone number
Age: 7
Class: First
Under the hood, round_trip_load() transparently gives you back a subclass of ordereddict to make this possible (the actual implementation depends on your Python version).
Since your elements come in pairs, I think this could work.
dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
new_element = { 'Phone':'1234'}
dict = {**dict,**new_element}
print(dict)
This is the output I got:
{'Name': 'Zara', 'Age': 7, 'Class': 'First', 'Phone': '1234'}

python dict. if statement

Can someone assist me with the proper placement of an if statement on my dict? I am trying to flag any user that is in NY and have them come first in the dictionary. I am able to sort by name, which I require as well, but it is not clear to me how to flag any NY users.
names_states = {
'user1': 'CA',
'user2': 'NY',
'user7': 'CA',
'guest': 'MN',
}
>>> for key in sorted(names_states.iterkeys()):
...     print "%s : %s" %(key, names_states[key])
...
user1 : CA
user2 : NY
guest : MN
user7 : CA
sorted(names_states.iteritems(), key=lambda x: (x[1] != 'NY', x[0]))
Here is an approach that first pushes the NY values to the top, while still sorting by user name as the first key, and the state as the secondary key:
{'bill': 'NY',
'frank': 'NY',
'guest': 'MN',
'user1': 'CA',
'user2': 'NY',
'user7': 'CA'}
def keyFunc(x):
    if names_states[x] == 'NY':
        return (False, x)
    return (x, names_states[x])
sorted(names_states.iterkeys(), key=keyFunc)
# ['bill', 'frank', 'user2', 'guest', 'user1', 'user7']
Note: Sticking with the key approach here is faster than defining a custom cmp function. The key function will only be run once for each item, whereas a cmp function will be run for every comparison the sort makes.
This does what you asked for-- All of newyorkers first, sorted by name, and then everyone else sorted by name too.
names_states = {
'user1': 'CA',
'user2': 'NY',
'user7': 'CA',
'guest': 'MN',
}
def ny_compare(x, y):
    if names_states[x] == 'NY':
        if names_states[y] == 'NY':
            return cmp(x,y)
        else:
            return -1
    elif names_states[y] == 'NY':
        return 1
    return cmp(x,y)
for key in sorted(names_states.iterkeys(), cmp=ny_compare):
    print "%s : %s" %(key, names_states[key])
Using tuple unpacking may make it clearer
sorted(names_states.iteritems(), key=lambda (name, state): (state != 'NY', name))
for key, value in sorted(names_states.iteritems(),
        key=lambda (key, value): ((0 if value == 'NY' else 1), key)):
    print((key,value))
Here iteritems() gets us an iterator over the key/value pairs in the dictionary. We can then sort those key/value pairs. The lambda expression creates a set of keys that will sort the list as you say. It does this by returning tuples, whose first element is a number based on whether the value is 'NY' or not. Basically, I'm using the fact that tuples are sorted by their first element and then their second and so on.
That said, this is beginning to look quite messy, so you may want to find a more verbose, but clearer way to do this.
You may wish to subclass dict and add an additional method that exposes the keys, values or key/value pairs in the order you desire.
That way you can have something as simple as:
names_states = MyDict({
'user1': 'CA',
'user2': 'NY',
'user7': 'CA',
'guest': 'MN',
})
for key, value in names_states.iteritems_with_given_values_first('NY'):
...
class MyDict(dict):
    def __init__(self, *args, **kwargs):
        super(MyDict, self).__init__(*args, **kwargs)
    def iteritems_with_given_values_first(self, preferred_value):
        # Sort this dictionary's own items, not the global names_states
        for key, value in sorted(self.iteritems(),
                key=lambda (key, value): ((0 if value == preferred_value else 1), key)):
            yield key, value
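The snippets in this thread are Python 2 (iterkeys, iteritems, cmp, the print statement and tuple-unpacking lambdas are all gone in Python 3). A rough Python 3 equivalent of the key-based approach, assuming a hypothetical key function named ny_first:

```python
names_states = {
    'user1': 'CA',
    'user2': 'NY',
    'user7': 'CA',
    'guest': 'MN',
}


def ny_first(item):
    """Sort key: NY entries first (False sorts before True), then by name."""
    name, state = item
    return (state != 'NY', name)


ordered = sorted(names_states.items(), key=ny_first)
for name, state in ordered:
    print(f"{name} : {state}")
```

A named key function replaces the tuple-unpacking lambda, which Python 3 no longer accepts.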
