I have a nested dictionary where I want to convert 'protein accession' values from lists into simple strings. Also, I want to convert lists of strings into lists of integers for example in 'sequence length', 'start location', 'stop location'.
'protein accession': ['A0A0H3LJT0_e'] into 'protein accession': 'A0A0H3LJT0_e'
'sequence length': ['102'] into 'sequence length': [102]
and so on
Here is the sample of my dictionary:
{
"A0A0H3LJT0_e": {
"protein accession": ["A0A0H3LJT0_e"],
"sequence length": ["102"],
"analysis": ["SMART"],
"signature accession": ["SM00886"],
"signature description": ["Dabb_2"],
"start location": ["4"],
"stop location": ["98"],
"e-value": ["1.5E-22"],
"interpro accession": ["IPR013097"],
"interpro description": ["Stress responsive alpha-beta barrel"],
"nunique": [2],
"domain_count": [1],
}
}
Could someone help me, please?
You need to iterate over the items and replace the values accordingly.
d is the input dictionary here.
In [1]: data = d['C4QY10_e']
In [2]: result = {}
In [3]: for k,v in data.items():
   ...:     if str(v[0]).isdigit():
   ...:         result[k] = [int(v[0])]
   ...:     else:
   ...:         result[k] = v[0]
   ...:
In [4]: result
Out[4]:
{'protein accession': 'C4QY10_e',
'sequence length': [1879],
'analysis': 'Pfam',
'signature accession': 'PF18314',
'signature description': 'Fatty acid synthase type I helical domain',
'start location': [328],
'stop location': [528],
'e-value': '4.7E-73',
'interpro accession': 'IPR041550',
'interpro description': 'Fatty acid synthase type I',
'nunique': [1],
'domain_count': [5]}
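To iterate through the entire dictionary like this, keep one converted record per outer key so that later records do not overwrite earlier ones:
result = {}
for main_k, val in d.items():
    result[main_k] = {}
    for k, v in val.items():
        if str(v[0]).isdigit():
            result[main_k][k] = [int(v[0])]
        else:
            result[main_k][k] = v[0]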
If you want to change the dictionary itself, you can do this:
for main_k, main_val in d.items():
    for k, v in main_val.items():
        if str(v[0]).isdigit():
            d[main_k][k] = [int(v[0])]
        else:
            d[main_k][k] = v[0]
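If the original dictionary should stay untouched, the same rule can be wrapped in a function that builds a new nested dictionary. This is only a sketch: the name convert_record is made up for illustration, the sample data is a shortened copy of the question's record, and the rule is the one used above (a value whose first element is all digits becomes a one-element list of int, anything else is unwrapped to its first element).

```python
def convert_record(record):
    """Return a new dict: digit-only values become [int], others are unwrapped."""
    converted = {}
    for key, values in record.items():
        first = values[0]
        if str(first).isdigit():
            converted[key] = [int(first)]
        else:
            converted[key] = first
    return converted


# Shortened copy of the sample record from the question.
d = {
    "A0A0H3LJT0_e": {
        "protein accession": ["A0A0H3LJT0_e"],
        "sequence length": ["102"],
        "e-value": ["1.5E-22"],
        "nunique": [2],
    }
}

# Build a new outer dict; d itself is not modified.
new_d = {outer_key: convert_record(record) for outer_key, record in d.items()}
print(new_d)
```

Note that str.isdigit() is False for strings such as "1.5E-22" or "-3", so those values stay strings under this rule.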
Assuming you also want to convert the string representation of a floating point number, you could do this (which also allows for list values with more than one element):
sample = {
"A0A0H3LJT0_e": {
"protein accession": ["A0A0H3LJT0_e"],
"sequence length": ["102"],
"analysis": ["SMART"],
"signature accession": ["SM00886"],
"signature description": ["Dabb_2"],
"start location": ["4"],
"stop location": ["98"],
"e-value": ["1.5E-22"],
"interpro accession": ["IPR013097"],
"interpro description": ["Stress responsive alpha-beta barrel"],
"nunique": [2],
"domain_count": [1]
}
}
for sd in sample.values():
    if isinstance(sd, dict):
        for k, v in sd.items():
            if isinstance(v, list):
                try:
                    sd[k] = list(map(int, v))
                except ValueError:
                    try:
                        sd[k] = list(map(float, v))
                    except ValueError:
                        sd[k] = ', '.join(map(str, v))
    print(sd)
Output:
{'protein accession': 'A0A0H3LJT0_e', 'sequence length': [102], 'analysis': 'SMART', 'signature accession': 'SM00886', 'signature description': 'Dabb_2', 'start location': [4], 'stop location': [98], 'e-value': [1.5e-22], 'interpro accession': 'IPR013097', 'interpro description': 'Stress responsive alpha-beta barrel', 'nunique': [2], 'domain_count': [1]}
Note:
Unless every value in a list can be converted to either int or float, the values will be converted into a single string where each element is separated from the other by ', '
When creating a new mapping-type object with nested container objects (e.g. list, dict, set, etc.), a defaultdict from the built-in collections library may be called for.
However, let's assume you are modifying the existing dictionary in place, thus preserving the dict type. We can use two explicit for loops over dict.items():
# assume input is stored to the variable, data
for name, details in data.items():  # Also possible to use for details in data.values():
    for attribute, values in details.items():
        # Use tuple unpacking to raise a ValueError
        # when no values or more than one value unexpectedly appears
        (value,) = values
        # Only consider strings with decimal values
        if isinstance(value, str) and value.isdecimal():
            # details is a reference to data[name]
            details[attribute] = int(value)
        else:
            details[attribute] = value
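The tuple unpacking used above doubles as a length check. A small sketch of that behaviour in isolation, where unwrap_single is a hypothetical helper name chosen only for this illustration:

```python
def unwrap_single(values):
    """Return the only element of values; raise ValueError otherwise."""
    (value,) = values  # fails unless values has exactly one element
    return value


print(unwrap_single(["102"]))  # prints 102 (still a string at this point)

# Empty lists and lists with several elements are both rejected.
for bad in ([], ["4", "98"]):
    try:
        unwrap_single(bad)
    except ValueError as exc:
        print(f"{bad!r} rejected: {exc}")
```

This makes unexpected data fail loudly instead of silently keeping only the first element, which is what indexing with values[0] would do.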
How to make a simple text parser that finds keywords and categorizes them accordingly.
Example: I have two dictionaries
A = {'1': 'USA', '2': 'Canada', '3': 'Germany'}
B = {'t1': "The temp in USA is x", 't2': 'Germany is very cold now', 't3': 'Weather in Canada is good', 't4': 'USA is cold right now'}
Now I want to pick out if the keywords from A are present in B and the result should be something like this.
Result = {'1': ('t1', 't4'), '2' : 't3', '3': 't2'}
I'm a beginner and the logic to get this is very confusing.
You can do this with a dict comprehension:
A = {'1': 'USA', '2': 'Canada', '3': 'Germany'}
B = {'t1': "The temp in USA is x", 't2': 'Germany is very cold now', 't3': 'Weather in Canada is good', 't4': 'USA is cold right now'}
{k: [k_b for k_b, v_b in B.items() if v in v_b.split()] for k, v in A.items()}
# {'1': ['t1', 't4'], '2': ['t3'], '3': ['t2']}
This makes every value in the dict a list rather than some being collections and others strings. That's almost certainly going to be easier to work with than a mixed type dictionary.
If your dicts are going to be large, you might pick up some performance by inverting the B dictionary so you don't need to scan through each value every time.
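One way to read "inverting the B dictionary" is to build a word index once and then look each keyword up in it. The sketch below assumes, as the comprehension above does, that keywords are single whitespace-separated words and that matching is case-sensitive; the names word_index and result are illustrative.

```python
from collections import defaultdict

A = {'1': 'USA', '2': 'Canada', '3': 'Germany'}
B = {'t1': "The temp in USA is x", 't2': 'Germany is very cold now',
     't3': 'Weather in Canada is good', 't4': 'USA is cold right now'}

# Build the index once: each word maps to the keys of B whose text contains it.
word_index = defaultdict(list)
for key_b, text in B.items():
    for word in set(text.split()):  # set() avoids adding key_b twice per text
        word_index[word].append(key_b)

# Each lookup is now a dictionary access instead of a scan over all of B.
result = {key_a: list(word_index.get(keyword, [])) for key_a, keyword in A.items()}
print(result)
# {'1': ['t1', 't4'], '2': ['t3'], '3': ['t2']}
```

Building the index costs one pass over B, so it pays off mainly when A contains many keywords.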
If you want the exact formatting you have shown, you could do something like this. A simple loop that makes a dict of tuples, and then using dict comprehension to reformat the ones that have a length of 1.
Result = dict()
for ka, va in A.items():
    Result[ka] = tuple(kb for kb, vb in B.items() if va in vb)
Result = {k: (v[0] if len(v) == 1 else v) for k, v in Result.items()}
print(Result)
I am trying to sort a dictionary by value, which is a timestamp in the format H:MM:SS (eg "0:41:42") but the code below doesn't work as expected:
album_len = {
'The Piper At The Gates Of Dawn': '0:41:50',
'A Saucerful of Secrets': '0:39:23',
'More': '0:44:53', 'Division Bell': '1:05:52',
'The Wall': '1:17:46',
'Dark side of the moon': '0:45:18',
'Wish you were here': '0:44:17',
'Animals': '0:41:42'
}
album_len = OrderedDict(sorted(album_len.items()))
This is the output I get:
OrderedDict([
('A Saucerful of Secrets', '0:39:23'),
('Animals', '0:41:42'),
('Dark side of the moon', '0:45:18'),
('Division Bell', '1:05:52'),
('More', '0:44:53'),
('The Piper At The Gates Of Dawn', '0:41:50'),
('The Wall', '1:17:46'),
('Wish you were here', '0:44:17')])
It's not supposed to be like that. The first element I expected to see is ('The Wall', '1:17:46'), the longest one.
How do I get the elements sorted the way I intended?
Try converting each value to a datetime and using that as the key:
from collections import OrderedDict
from datetime import datetime
def convert_to_datetime(val):
    return datetime.strptime(val, "%H:%M:%S")
album_len = {'The Piper At The Gates Of Dawn': '0:41:50',
'A Saucerful of Secrets': '0:39:23', 'More': '0:44:53',
'Division Bell': '1:05:52', 'The Wall': '1:17:46',
'Dark side of the moon': '0:45:18',
'Wish you were here': '0:44:17', 'Animals': '0:41:42'}
album_len = OrderedDict(
sorted(album_len.items(), key=lambda i: convert_to_datetime(i[1]))
)
print(album_len)
Output:
OrderedDict([('A Saucerful of Secrets', '0:39:23'), ('Animals', '0:41:42'),
('The Piper At The Gates Of Dawn', '0:41:50'),
('Wish you were here', '0:44:17'), ('More', '0:44:53'),
('Dark side of the moon', '0:45:18'), ('Division Bell', '1:05:52'),
('The Wall', '1:17:46')])
Or in descending order with reverse set to True:
album_len = OrderedDict(
sorted(
album_len.items(),
key=lambda i: convert_to_datetime(i[1]),
reverse=True
)
)
Output:
OrderedDict([('The Wall', '1:17:46'), ('Division Bell', '1:05:52'),
('Dark side of the moon', '0:45:18'), ('More', '0:44:53'),
('Wish you were here', '0:44:17'),
('The Piper At The Gates Of Dawn', '0:41:50'),
('Animals', '0:41:42'), ('A Saucerful of Secrets', '0:39:23')])
Edit: If only insertion order needs to be maintained and OrderedDict-specific methods like move_to_end are not going to be used, then a regular Python dict also works here for Python 3.7+.
Ascending:
album_len = dict(
sorted(album_len.items(), key=lambda i: convert_to_datetime(i[1]))
)
Descending:
album_len = dict(
sorted(album_len.items(), key=lambda i: convert_to_datetime(i[1]),
reverse=True)
)
This is a duplicate of the question "How do I sort a dictionary by value?"
>>> dict(sorted(album_len.items(), key=lambda item: item[1]))
{'A Saucerful of Secrets': '0:39:23',
'Animals': '0:41:42',
'The Piper At The Gates Of Dawn': '0:41:50',
'Wish you were here': '0:44:17',
'More': '0:44:53',
'Dark side of the moon': '0:45:18',
'Division Bell': '1:05:52',
'The Wall': '1:17:46'}
Note: the time format already sorts correctly as a plain string, so you don't need to convert to datetime.
See the comment below from @DarrylG. He's totally right: the remark on lexicographic order holds only as long as the duration does not exceed 9:59:59, unless the hours are padded with a leading zero.
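The limitation is easy to see with a few made-up durations. The sketch below is not taken from either answer: duration_seconds is a hypothetical helper that turns H:MM:SS into a number of seconds, which also works for values of 24 hours or more (strptime's %H only accepts hours 0 to 23).

```python
def duration_seconds(text):
    """Convert 'H:MM:SS' (hours may have any number of digits) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Made-up durations, chosen to cross the 9:59:59 boundary.
durations = {"short": "9:59:59", "long": "10:00:00", "longest": "25:30:00"}

# Plain string comparison puts "10:00:00" before "9:59:59".
by_string = sorted(durations, key=durations.get)
# Comparing seconds gives the intended order.
by_seconds = sorted(durations, key=lambda name: duration_seconds(durations[name]))

print(by_string)   # ['long', 'longest', 'short']
print(by_seconds)  # ['short', 'long', 'longest']
```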
I've made this list; each item is a string that contains commas (in some cases) and colon (always):
dinner = [
'cake,peas,cheese : No',
'duck,broccoli,onions : Maybe',
'motor oil : Definitely Not',
'pizza : Damn Right',
'ice cream : Maybe',
'bologna : No',
'potatoes,bacon,carrots,water: Yes',
'rats,hats : Definitely Not',
'seltzer : Yes',
'sleeping,whining,spitting : No Way',
'marmalade : No'
]
I would like to create a new list from the one above as follows:
['cake : No',
'peas : No',
'cheese : No',
'duck : Maybe',
'broccoli : Maybe',
'onions : Maybe',
'motor oil : Definitely Not',
'pizza : Damn Right',
'ice cream : Maybe',
'bologna : No',
'potatoes : Yes',
'bacon : Yes',
'carrots : Yes',
'water : Yes',
'rats : Definitely Not',
'hats : Definitely Not',
'seltzer : Yes',
'sleeping : No Way',
'whining : No Way',
'spitting : No Way',
'marmalade : No']
But I'd like to know if, and how, it's possible to do so in a line or two of efficient code employing primarily Python's higher-order functions. I've been attempting it:
reduce(lambda x,y: x + y, (map(lambda x: x.split(':')[0].strip().split(','), dinner)))
...produces this:
['cake',
'peas',
'cheese',
'duck',
'broccoli',
'onions',
'motor oil',
'pizza',
'ice cream',
'bologna',
'potatoes',
'bacon',
'carrots',
'water',
'rats',
'hats',
'seltzer',
'sleeping',
'whining',
'spitting',
'marmalade']
...but I'm struggling with appending the piece of each string after the colon back onto each item.
I would create a dict using, zip, map and itertools.repeat:
from itertools import repeat
data = ({k.strip(): v.strip() for _k, _v in map(lambda x: x.split(":"), dinner)
for k, v in zip(_k.split(","), repeat(_v))})
from pprint import pprint as pp
pp(data)
Output:
{'bacon': 'Yes',
'bologna': 'No',
'broccoli': 'Maybe',
'cake': 'No',
'carrots': 'Yes',
'cheese': 'No',
'duck': 'Maybe',
'hats': 'Definitely Not',
'ice cream': 'Maybe',
'marmalade': 'No',
'motor oil': 'Definitely Not',
'onions': 'Maybe',
'peas': 'No',
'pizza': 'Damn Right',
'potatoes': 'Yes',
'rats': 'Definitely Not',
'seltzer': 'Yes',
'sleeping': 'No Way',
'spitting': 'No Way',
'water': 'Yes',
'whining': 'No Way'}
Or using the dict constructor:
from itertools import repeat
data = dict(map(str.strip, t) for _k, _v in map(lambda x: x.split(":"), dinner)
for t in zip(_k.split(","), repeat(_v)))
from pprint import pprint as pp
pp(data)
If you really want a list of strings, we can do something similar using itertools.chain and joining the substrings:
from itertools import repeat, chain
data = chain.from_iterable(map(":".join, zip(_k.split(","), repeat(_v)))
for _k, _v in map(lambda x: x.split(":"), dinner))
from pprint import pprint as pp
pp(list(data))
Output:
['cake: No',
'peas: No',
'cheese : No',
'duck: Maybe',
'broccoli: Maybe',
'onions : Maybe',
'motor oil : Definitely Not',
'pizza : Damn Right',
'ice cream : Maybe',
'bologna : No',
'potatoes: Yes',
'bacon: Yes',
'carrots: Yes',
'water: Yes',
'rats: Definitely Not',
'hats : Definitely Not',
'seltzer : Yes',
'sleeping: No Way',
'whining: No Way',
'spitting : No Way',
'marmalade : No']
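The uneven spacing in that output comes from joining the pieces exactly as split() returns them, including whatever whitespace surrounded the colon. A possible variant, sketched here with a comprehension rather than itertools and on a shortened copy of the list, strips both pieces before joining. It assumes every entry contains exactly one colon.

```python
# Shortened copy of the question's list, including the entry without a space
# before the colon.
dinner = [
    'cake,peas,cheese : No',
    'motor oil : Definitely Not',
    'potatoes,bacon,carrots,water: Yes',
]

result = [
    f"{item.strip()} : {answer.strip()}"
    for items, answer in (entry.split(":") for entry in dinner)
    for item in items.split(",")
]
print(result)
# ['cake : No', 'peas : No', 'cheese : No', 'motor oil : Definitely Not',
#  'potatoes : Yes', 'bacon : Yes', 'carrots : Yes', 'water : Yes']
```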
Assuming you really need a list of strings rather than a dictionary (which looks like the better data structure), you can do this simply with comprehensions:
>>> [[x+':'+y for x in i.split(',')]
... for i, y in map(lambda l: map(str.strip, l.split(':')), dinner)]
[['cake:No', 'peas:No', 'cheese:No'],
['duck:Maybe', 'broccoli:Maybe', 'onions:Maybe'],
['motor oil:Definitely Not'],
...
['marmalade:No']]
Now just add up the lists (in Python 3, reduce has to be imported from functools):
>>> from functools import reduce
>>> from operator import add
>>> reduce(add, ([x+':'+y for x in i.split(',')]
... for i, y in map(lambda l: map(str.strip, l.split(':')), dinner)), [])
['cake:No',
'peas:No',
'cheese:No',
'duck:Maybe',
...
'marmalade:No']
Or just flatten the list:
>>> [a for i, y in map(lambda l: map(str.strip, l.split(':')), dinner)
... for a in (x+':'+y for x in i.split(','))]
['cake:No',
'peas:No',
'cheese:No',
'duck:Maybe',
...
'marmalade:No']
This may work:
def processList(aList):
    finalList = []
    for aListEntry in aList:
        aListEntry_entries = aListEntry.split(':')
        aListEntry_list = aListEntry_entries[0].split(',')
        for aListEntry_list_entry in aListEntry_list:
            finalList.append(aListEntry_list_entry.strip() + ' : ' + aListEntry_entries[1].strip())
    return finalList
List comprehensions are preferred in Python (check eg this), due to better legibility (at least for some;).
The code demonstrates two types of list comprehension nesting, the first is basically chaining the operations, the other produces one list from two nested loops.
If you make your data more consistent by adding one space after water (in the potatoes,bacon,carrots,water entry), you can get rid of two .strip() calls ;)
dinner = [
'cake,peas,cheese : No',
'duck,broccoli,onions : Maybe',
'motor oil : Definitely Not',
'pizza : Damn Right',
'ice cream : Maybe',
'bologna : No',
'potatoes,bacon,carrots,water : Yes',
'rats,hats : Definitely Not',
'seltzer : Yes',
'sleeping,whining,spitting : No Way',
'marmalade : No'
]
prefs = [(pref, items.split(',')) for items, pref in [it.split(" : ") for it in dinner]]
[" : ".join([item, pref]) for pref, items in prefs for item in items]
I have a list
key_list = ['m.title', 'm.studio', 'm.gross', 'm.year']
cols = [
['Titanic', 'The Lord of the Rings: The Return of the King', 'Toy Story 3'],
['Par.', 'NL', 'BV'],
['2186.8', '1119.9', '1063.2'],
['1997', '2003', '2010']
]
I want to construct a dictionary table_dict whose keys are the elements of key_list, and values are respective sublists of cols.
My current code is as follows:
i = 0
for key in key_list:
    table_dict[key] = cols[i]
    i = i + 1
return table_dict
I can't seem to find an error, yet when I run it I get:
dict[key] = cols[i]
IndexError: list index out of range
You can simply zip the keys and values and pass it to the dict. You can read more about constructing dictionaries here
print dict(zip(key_list, cols))
Output
{'m.gross': ['2186.8', '1119.9', '1063.2'], 'm.studio': ['Par.', 'NL', 'BV'], 'm.year': ['1997', '2003', '2010'], 'm.title': ['Titanic', 'The Lord of the Rings: The Return of the King', 'Toy Story 3']}
key_list = ['m.title', 'm.studio', 'm.gross', 'm.year']
cols = [
['Titanic', 'The Lord of the Rings: The Return of the King', 'Toy Story 3'],
['Par.', 'NL', 'BV'],
['2186.8', '1119.9', '1063.2'],
['1997', '2003', '2010']]
for i in cols:
    print dict(zip(key_list, i))
This is for when you want output like this:
{'m.gross': 'Toy Story 3', 'm.studio': 'The Lord of the Rings: The Return of the King', 'm.title': 'Titanic'}
{'m.gross': 'BV', 'm.studio': 'NL', 'm.title': 'Par.'}
{'m.gross': '1063.2', 'm.studio': '1119.9', 'm.title': '2186.8'}
{'m.gross': '2010', 'm.studio': '2003', 'm.title': '1997'}
The example you provided works without an error, so there might be another problem elsewhere in your code. What the error message tells you is that the index i of the list cols is out of bounds. That means that while iterating over key_list (which has 4 elements, so the loop runs 4 times), cols does not have enough items, probably fewer than 4.
To work around this issue, build the dictionary with dict and zip (see the Python docs for dict):
table_dict = dict(zip(key_list, cols))
print table_dict
Output:
{'m.gross': ['2186.8', '1119.9', '1063.2'], 'm.studio': ['Par.', 'NL', 'BV'], 'm.year': ['1997', '2003', '2010'], 'm.title': ['Titanic', 'The Lord of the Rings: The Return of the King', 'Toy Story 3']}
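One caveat worth keeping in mind: zip stops at the shorter input, so it avoids the IndexError but can also hide the mismatch that caused it. The sketch below uses made-up data with only three sublists to show both behaviours; it is written for Python 3 (print as a function), and short_cols is an illustrative name.

```python
key_list = ['m.title', 'm.studio', 'm.gross', 'm.year']
# Made-up data: only three sublists for four keys.
short_cols = [
    ['Titanic', 'Toy Story 3'],
    ['Par.', 'BV'],
    ['2186.8', '1063.2'],
]

# Index-based access fails on the fourth key.
try:
    for i, key in enumerate(key_list):
        short_cols[i]
except IndexError as exc:
    print("IndexError:", exc)

# zip stops at the shorter input, so the last key is silently dropped.
table_dict = dict(zip(key_list, short_cols))
print(sorted(table_dict))  # ['m.gross', 'm.studio', 'm.title']

# An explicit length check makes the mismatch visible.
if len(key_list) != len(short_cols):
    print("expected", len(key_list), "columns, got", len(short_cols))
```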