Extracting dictionary with a subset of values from another dictionary - python

I have a dictionary and want to remove certain values in bad_list from its value list, and return the remainder. Here is the code:
d = {1: ['a', 'c', 'd'], 2: ['b'], 5: ['e']}
bad_list = ['d','e']
ad = {k:d[k].remove(i) for k in d.keys() for sublist in d[k] for i in sublist if i in bad_list}
print 'd =', d
print 'ad =', ad
Unfortunately, this changes the lists in d permanently, and every value in ad is None, because list.remove() modifies the list in place and returns None.
d = {1: ['a', 'c'], 2: ['b'], 5: []}
ad = {1: None, 5: None}
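A minimal illustration of why this happens: list.remove() mutates the list it is called on and returns None, so a comprehension that stores its return value can only ever produce None.

```python
values = ['a', 'c', 'd']
returned = values.remove('d')  # remove() deletes the first matching item in place

print(returned)  # None: remove() has no useful return value
print(values)    # ['a', 'c']: the original list itself was changed
```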
How can I get a dictionary that looks like this:
new_dict = {1: ['a','c'], 2:['b']}
without looping through? I have a much larger dictionary to deal with, and I'd like to do it in the most efficient way.

There is no way to do it without a loop:
d = dict((key, [x for x in value if x not in bad_list]) for key, value in d.iteritems())
or with filter:
d = dict((key, filter(lambda x: x not in bad_list, d[key])) for key in d)
UPDATE
To exclude empty values:
d = dict((key, list(x)) for key in d for x in [set(d[key]).difference(bad_list)] if x)
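For a large dictionary, a minimal Python 3 sketch along the same lines: converting bad_list to a set makes each membership test constant time on average instead of a scan of the whole list, and the loop keeps list order, drops keys whose lists end up empty, and leaves d untouched. The function name filter_values is only illustrative.

```python
def filter_values(mapping, bad):
    bad_set = set(bad)  # set lookups avoid rescanning `bad` for every element
    result = {}
    for key, values in mapping.items():
        kept = [v for v in values if v not in bad_set]  # new list; original is not modified
        if kept:  # skip keys whose list became empty
            result[key] = kept
    return result

d = {1: ['a', 'c', 'd'], 2: ['b'], 5: ['e']}
bad_list = ['d', 'e']
new_dict = filter_values(d, bad_list)
print(new_dict)  # {1: ['a', 'c'], 2: ['b']}
print(d)         # {1: ['a', 'c', 'd'], 2: ['b'], 5: ['e']}
```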

Well, you could just use a list comprehension. This one-liner works, though I find it ugly.
ad = {k:v for k,v in {k:[i for i in v if i not in bad_list] for k,v in d.items()}.items() if v}
I'd rather use a for loop:
ad2 = dict()
for k,v in d.items():
    _data_ = [item for item in v if item not in bad_list]
    if _data_:
        ad2[k]=_data_
Output:
print 'd =', d
print 'ad =', ad
print 'ad2=', ad2
>d = {1: ['a', 'c', 'd'], 2: ['b'], 5: ['e']}
>ad = {1: ['a', 'c'], 2: ['b']}
>ad2= {1: ['a', 'c'], 2: ['b']}

The following code written in Python 3.5 appears to do as requested in your question. Minimal change should be required for it to work with Python 2.x instead. Just use print statements instead of functions.
d = {1: ['a', 'c', 'd'], 2: ['b'], 5: ['e']}
bad_list = ['d', 'e']
ad = {a: b for a, b in ((a, [c for c in b if c not in bad_list]) for a, b in d.items()) if b}
print('d =', d)
print('ad =', ad)

Related

Flatten dictionary of arbitrary depth

I receive from another script a dictionary containing various types, in particular other dictionaries or lists that might contain other dictionaries as values.
Now what I want to do is create a single flat dictionary. Keys might be present multiple times within the nested dictionaries. For me, the innermost key holds the newest information, so I think dict.update is the right routine to apply when digesting an 'inner' dict. By 'inner' dict I mean a dictionary within some value of the outermost dictionary.
Now, I understand how to flatten a dictionary by one level. What I struggle with is flattening it by arbitrarily many levels.
A simple example of the type of dictionary I'm dealing with is:
d = {1: {6: {7: {2: {'a'}}}}, 2: 'b', 3: {4: {2: 'c'}}, 5: ['a', 'b', {1: 'a'}]}
My attempt works ok for a single level of depth:
dd = dict()
for k, v in d.items():
    if isinstance(v, dict):
        dd.update(v)
    elif isinstance(v, list):
        for el in v:
            if isinstance(el, dict):
                dd.update(el)
        dd[k] = [el for el in v if not isinstance(el, dict)]
    else:
        dd[k] = v
This gives me:
Out[56]: {6: {7: {2: {'a'}}}, 2: 'b', 4: {2: 'c'}, 1: 'a', 5: ['a', 'b']}
What it should give is:
{2: 'a', 5: ['a', 'b']}
Note that the value for key 2 should be 'c', not 'b' as I get now, because the innermost value for key 2 is 'c', not 'b'.
I'm not just looking for working code (although that would allow me to continue working); I'd like to understand how such a problem is tackled in Python. I have to admit that I'm a little lost here...
Any help is greatly appreciated!
You can use recursion with a generator and keep a counter to determine the depth:
d = {1: {6: {7: {2: {'a'}}}}, 2: 'b', 3: {4: {2: 'c'}}, 5: ['a', 'b', {1: 'a'}]}
def flatten(_d, _depth=0):
    for a, b in _d.items():
        if isinstance(b, list):
            yield [a, [i for i in b if not isinstance(i, dict)], _depth]
            for c in b:
                if isinstance(c, dict):
                    yield from flatten(c, _depth+1)
        elif isinstance(b, dict):
            yield from flatten(b, _depth+1)
        else:
            yield [a, b, _depth]
_result = {}
for a, b, c in flatten(d):
    if a not in _result:
        _result[a] = [b, c]
    else:
        if _result[a][-1] < c:
            _result[a] = [b, c]
print({a:b for a, [b, c] in _result.items()})
Output:
{2: {'a'}, 5: ['a', 'b'], 1: 'a'}
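If the nesting can be very deep, recursion may run into Python's recursion limit (see sys.getrecursionlimit()). Below is a sketch of the same "deepest value wins" rule that swaps the recursive generator for an explicit stack. The name flatten_deepest is hypothetical, and when two values for a key sit at the same depth, the one processed first is kept, which may differ from the generator version's traversal order.

```python
def flatten_deepest(d):
    best = {}          # key -> (value, depth at which it was found)
    stack = [(d, 0)]   # dicts still to process, with their depth
    while stack:
        current, depth = stack.pop()
        for key, value in current.items():
            if isinstance(value, dict):
                stack.append((value, depth + 1))  # descend later instead of recursing
                continue
            if isinstance(value, list):
                for item in value:
                    if isinstance(item, dict):
                        stack.append((item, depth + 1))
                value = [item for item in value if not isinstance(item, dict)]
            # keep the value found at the greatest depth; on ties the first one seen stays
            if key not in best or depth > best[key][1]:
                best[key] = (value, depth)
    return {key: value for key, (value, _) in best.items()}

d = {1: {6: {7: {2: {'a'}}}}, 2: 'b', 3: {4: {2: 'c'}}, 5: ['a', 'b', {1: 'a'}]}
result = flatten_deepest(d)
print(result)  # {2: {'a'}, 5: ['a', 'b'], 1: 'a'}
```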
Your approach is correct, but you have to update the dict recursively for it to work on any number of levels:
def flatten(d):
    dd = dict()
    for k, v in d.items():
        if isinstance(v, dict):
            dd.update(flatten(v))
        elif isinstance(v, list):
            for el in v:
                if isinstance(el, dict):
                    dd.update(flatten(el))
            dd[k] = [el for el in v if not isinstance(el, dict)]
        else:
            dd[k] = v
    return dd
d = {1: {2: {'a'}}, 2: 'b', 3: {4: {2: 'c'}}, 5: ['a', 'b', {1: 'a'}]}
print flatten(d)
# {2: 'c', 1: 'a', 5: ['a', 'b']}

slicing dictionary with values

I have a dictionary like:
d = {1: 'a', 2:'b', 3:'c', 4:'c', 5:'c', 6:'c'}
I want to slice this dictionary so that, if the values at the end are the same, only the first of them is kept, so the result is:
d = {1: 'a', 2:'b', 3:'c'}
I'm using collections.defaultdict(OrderedDict) to maintain sorting by the keys.
Currently, I'm using a loop. Is there a pythonic way of doing this?
UPDATE
the dictionary values can also be dictionaries:
d = {1: {'a': 'a1', 'b': 'b1'}, 2:{'a': 'a1', 'b': 'b2'}, 3:{'a': 'a1', 'b': 'c1'}, 4:{'a': 'a1', 'b': 'c1'}, 5:{'a': 'a1', 'b': 'c1'}, 6:{'a': 'a1', 'b': 'c1'}}
output:
d = {1: {'a': 'a1', 'b': 'b1'}, 2:{'a': 'a1', 'b': 'b2'}, 3:{'a': 'a1', 'b': 'c1'}}
You can use itertools.groupby with a list comprehension to achieve your result:
>>> from itertools import groupby
>>> d = {1: 'a', 2:'b', 3:'c', 4:'c', 5:'c', 6:'c'}
>>> n = [(min(item[0] for item in g), k) for k, g in groupby(d.items(), key=lambda x: x[1])]
>>> n
[(1, 'a'), (2, 'b'), (3, 'c')]
The above expression can also be written as
>>> from operator import itemgetter
>>> n = [(min(map(itemgetter(0), g)), k) for k, g in groupby(d.items(), key=itemgetter(1))]
You can cast this to dict by simply using
>>> dict(n)
{1: 'a', 2: 'b', 3: 'c'}
A plain dict does not guarantee key order before Python 3.7, so you can use OrderedDict:
>>> from collections import OrderedDict
>>> OrderedDict(sorted(n))
OrderedDict([(1, 'a'), (2, 'b'), (3, 'c')])
If you want to get rid of the for loop, you can do it this way:
{a:b for b,a in {y:x for x,y in sorted(d.iteritems(), reverse=True)}.iteritems()}
But it is neither very pythonic nor very efficient.
Instead of using an ordered dictionary whose keys represent indexes, the more pythonic way is to use a list. That way you use indexes instead of keys and can slice the list more easily.
>>> d = {1: 'a', 2:'b', 3:'c', 4:'c', 5:'c', 6:'c'}
>>> a = list(d.values())
>>> a[:a.index(a[-1])+1]
['a', 'b', 'c']
Just in case, a solution with pandas
import pandas as pd
df = pd.DataFrame(dict(key=list(d.keys()),val=list(d.values())))
print(df)
   key val
0    1   a
1    2   b
2    3   c
3    4   c
4    5   c
5    6   c
df = df.drop_duplicates(subset=['val'])
df.index=df.key
df.val.to_dict()
{1: 'a', 2: 'b', 3: 'c'}
I don't know how it performs on bigger datasets, or whether it is more pythonic.
Nevertheless, no loops.
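You can check whether the last two values are the same (Python 2, where values() returns a list):
from collections import OrderedDict
d = OrderedDict({1: 'a', 2:'b', 3:'c', 4:'c', 5:'c', 6:'c'})
while len(d) > 1 and d.values()[-1] == d.values()[-2]:
    d.popitem()  # removes the last item
print d
# OrderedDict([(1, 'a'), (2, 'b'), (3, 'c')])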
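The popitem() snippet is Python 2, where values() returns an indexable list. A Python 3 sketch of the same idea, assuming Python 3.7+ so that a plain dict keeps insertion order: step back from the end while the previous value equals the last one, then rebuild the dict from the kept items. Because it only compares values with ==, it also handles the dictionary values from the UPDATE. drop_trailing_repeats is a hypothetical name.

```python
def drop_trailing_repeats(d):
    """Return a new dict without the trailing run of repeated values (first one kept)."""
    items = list(d.items())  # relies on insertion-ordered dicts (Python 3.7+)
    end = len(items)
    # shrink the kept prefix while the item just before it repeats the last value
    while end > 1 and items[end - 2][1] == items[-1][1]:
        end -= 1
    return dict(items[:end])

d1 = {1: 'a', 2: 'b', 3: 'c', 4: 'c', 5: 'c', 6: 'c'}
d2 = {1: {'a': 'a1', 'b': 'b1'}, 2: {'a': 'a1', 'b': 'b2'}, 3: {'a': 'a1', 'b': 'c1'},
      4: {'a': 'a1', 'b': 'c1'}, 5: {'a': 'a1', 'b': 'c1'}, 6: {'a': 'a1', 'b': 'c1'}}
print(drop_trailing_repeats(d1))  # {1: 'a', 2: 'b', 3: 'c'}
print(drop_trailing_repeats(d2))
```

Unlike the groupby, pandas and dict-inversion answers, this only removes the run at the end, so a value repeated earlier in the dict is left alone.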

Creating graph using dictionaries and nodes only

I have this data.
CITY1 CITY2
A B
A C
A D
B C
B D
C D
How can I create a dictionary like this from the above data?
x={A:[B,C,D],
B:[A,C,D],
C:[A,B,D],
D:[A,B,C]
}
Thanks
Is it in a csv? It looks like, from the data you provide, you are doing an undirected graph. Assuming that the data is in some kind of "row" type format that you can loop through, (i.e. row[0] is the city1 value, and row[1] is the city2 value):
from collections import defaultdict
def make_graph(data):
    graph = defaultdict(set)
    for a, b in data:
        graph[a].add(b)
        graph[b].add(a)  # delete this line if you want a directed graph
    return graph
data = [
    ['A','B'],
    ['C','D'],
    ['A','C']
]
print make_graph(data)
I was trying to do it without any library import.
I made a simple dictionary first.
x={'A':['B','C','D'],'B':['C','D'],'C':['D']}
for i,j in x.items():
    for p in j:
        if p not in x.keys():
            x[p]=[]
        if p in x[i] and i not in x[p]:
            x[p].append(i)
print x
{'A': ['B', 'C', 'D'], 'C': ['D', 'A', 'B'], 'B': ['C', 'D', 'A'], 'D': ['A', 'C', 'B']}
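A minimal import-free sketch that builds the undirected graph straight from the CITY1/CITY2 pairs instead of from a half-filled dictionary. It assumes the data arrives as whitespace-separated lines with a header row; the string below just reproduces the table from the question, and make_graph here is a hypothetical helper. dict.setdefault creates a city's neighbour list the first time that city is seen.

```python
def make_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)  # drop this line for a directed graph
    return graph

raw = """CITY1 CITY2
A B
A C
A D
B C
B D
C D"""

lines = raw.splitlines()[1:]                       # skip the header row
edges = [line.split() for line in lines if line.strip()]
graph = make_graph(edges)
print(graph)
# {'A': ['B', 'C', 'D'], 'B': ['A', 'C', 'D'], 'C': ['A', 'B', 'D'], 'D': ['A', 'B', 'C']}
```

Lists keep duplicates, so if the same pair can appear twice, sets (as in the defaultdict answer) are the safer choice.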

How do I map values to values with a common key in Python

In the dictionaries below I want to check whether the value in aa matches the value in bb and produce a mapping of the keys of aa to the keys of bb. Do I need to rearrange the dictionaries? I import the data from a tab separated file, so I am not attached to dictionaries. Note that aa is about 100 times bigger than bb (100k lines for aa), but this is to be run infrequently and offline.
Input:
aa = {1: 'a', 3: 'c', 2 : 'b', 4 : 'd'}
bb = {'apple': 'a', 'pear': 'b', 'mango' : 'g'}
Desired output (or any similar data structure):
dd = {1 : 'apple', 2 : 'pear'}
aa = {1:'a', 3:'c', 2:'b', 4:'d'}
bb = {'apple':'a', 'pear':'b', 'mango': 'g'}
bb_rev = dict((value, key)
              for key, value in bb.iteritems())  # bb.items() in python3
dd = dict((key, bb_rev[value])
          for key, value in aa.iteritems()  # aa.items() in python3
          if value in bb_rev)
print dd
You can do something like this:
>>> aa = {1: 'a', 3: 'c', 2 : 'b', 4 : 'd'}
>>> bb = {'apple': 'a', 'pear': 'b', 'mango' : 'g'}
>>> tmp = {v: k for k, v in bb.iteritems()}
>>> dd = {k: tmp[v] for k, v in aa.iteritems() if v in tmp}
>>> dd
{1: 'apple', 2: 'pear'}
but note that this will only work if each value of the aa dictionary appears as a value of the bb dictionary either once or not at all.
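If a value can appear more than once in bb, one option is to map each aa key to the list of all matching bb keys. A Python 3 sketch under that assumption; map_keys is a hypothetical name, and the extra 'apricot' entry is made up only to show the duplicate case.

```python
from collections import defaultdict

def map_keys(aa, bb):
    # reverse bb, collecting every bb key that shares the same value
    rev = defaultdict(list)
    for key, value in bb.items():
        rev[value].append(key)
    # `v in rev` does not insert missing keys into a defaultdict
    return {k: list(rev[v]) for k, v in aa.items() if v in rev}

aa = {1: 'a', 3: 'c', 2: 'b', 4: 'd'}
bb = {'apple': 'a', 'pear': 'b', 'mango': 'g', 'apricot': 'a'}
dd = map_keys(aa, bb)
print(dd)  # {1: ['apple', 'apricot'], 2: ['pear']}
```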

Creating a dictionary with list of lists in Python

I have a huge file (with around 200k inputs). The inputs are in the form:
A B C D
B E F
C A B D
D
I am reading this file and storing it in a list as follows:
text = f.read().split('\n')
This splits the file at every newline. Hence text looks like this:
[[A B C D] [B E F] [C A B D] [D]]
I now have to store these values in a dictionary where the keys are the first element of each list, i.e. the keys will be A, B, C, D.
I am finding it difficult to add the remaining elements of each list as the values, i.e. the dictionary should look like:
{A: [B C D]; B: [E F]; C: [A B D]; D: []}
I have done the following:
inlinkDict = {}
for doc in text:
    adoc= doc.split(' ')
    docid = adoc[0]
    inlinkDict[docid] = inlinkDict.get(docid,0) + {I do not understand what to put in here}
Please help me with how to add the values to my dictionary. The value should be 0 if the line contains no elements other than the key, as in the example for D.
A dictionary comprehension makes short work of this task:
>>> s = [['A','B','C','D'], ['B','E','F'], ['C','A','B','D'], ['D']]
>>> {t[0]:t[1:] for t in s}
{'A': ['B', 'C', 'D'], 'C': ['A', 'B', 'D'], 'B': ['E', 'F'], 'D': []}
Try using a slice:
inlinkDict[docid] = adoc[1:]
This will give you an empty list instead of a 0 for the case where only the key value is on the line. To get a 0 instead, use an or (which always returns one of the operands):
inlinkDict[docid] = adoc[1:] or 0
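Putting the slice and the `or 0` together, a minimal Python 3 sketch of the whole loop that iterates over the file line by line instead of calling read(). io.StringIO stands in for the real file here, and build_inlinks is a hypothetical name.

```python
import io

def build_inlinks(lines):
    inlinks = {}
    for line in lines:
        parts = line.split()  # split() with no argument also drops the trailing newline
        if not parts:
            continue  # skip blank lines instead of failing on parts[0]
        inlinks[parts[0]] = parts[1:] or 0  # 0 when the line holds only the key
    return inlinks

f = io.StringIO("A B C D\nB E F\nC A B D\nD\n")
print(build_inlinks(f))
# {'A': ['B', 'C', 'D'], 'B': ['E', 'F'], 'C': ['A', 'B', 'D'], 'D': 0}
```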
Easier way with a dict comprehension:
>>> with open('/tmp/spam.txt') as f:
...     data = [line.split() for line in f]
...
>>> {d[0]: d[1:] for d in data}
{'A': ['B', 'C', 'D'], 'C': ['A', 'B', 'D'], 'B': ['E', 'F'], 'D': []}
>>> {d[0]: ' '.join(d[1:]) if d[1:] else 0 for d in data}
{'A': 'B C D', 'C': 'A B D', 'B': 'E F', 'D': 0}
Note: dict keys must be unique, so if you have, say, two lines beginning with 'C' the first one will be over-written.
The accepted answer is correct, except that it reads the entire file into memory (which may not be desirable if you have a large file) and it overwrites duplicate keys.
An alternative approach using defaultdict, which has been available since Python 2.5, solves both problems:
from collections import defaultdict
d = defaultdict(list)
with open('/tmp/spam.txt') as f:
    for line in f:
        parts = line.strip().split()
        if not parts:
            continue  # skip blank lines, which would otherwise raise IndexError
        d[parts[0]] += parts[1:]
Input:
A B C D
B E F
C A B D
D
C H I J
Result:
>>> d = defaultdict(list)
>>> with open('/tmp/spam.txt') as f:
...     for line in f:
...         parts = line.strip().split()
...         d[parts[0]] += parts[1:]
...
>>> d['C']
['A', 'B', 'D', 'H', 'I', 'J']
