Find matching keywords in nested dictionary

I have a nested dictionary like:
data = {
'level1a': {'level2a':[1,2,3]},
'level1b': {'level2b':[4,5,6]},
'level1c': {'level2a':[7,8,9]}
}
Now I would like to find and sum up the two lists that share the same level-2 key ('level2a'). The result should be something like:
[8, 10, 12]
Is there an efficient way to do that?

Something like this:
from collections import Counter
from operator import add
data = {
    'level1a': {'level2a': [1, 2, 3]},
    'level1b': {'level2b': [4, 5, 6]},
    'level1c': {'level2a': [7, 8, 9]}
}
c = Counter()  # how many inner dicts contain each level-2 key
dic = {}       # running element-wise sum for each level-2 key
for k, v in data.items():
    for k1, v1 in v.items():
        c[k1] += 1
        val = dic.setdefault(k1, [0] * len(v1))
        dic[k1] = list(map(add, v1, val))
for k, v in c.items():
    if v > 1:
        print(dic[k])
Output:
[8, 10, 12]
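The same idea (sum every level-2 key that occurs in more than one inner dict) can also be sketched without the separate Counter: group the lists per level-2 key with collections.defaultdict, then sum each group column by column. sum_shared_keys is a hypothetical name, and like zip it truncates each group to its shortest list.

```python
from collections import defaultdict

def sum_shared_keys(data):
    """Element-wise sums for every level-2 key found in more than one
    inner dict; keys that appear only once are left out."""
    groups = defaultdict(list)  # level-2 key -> all lists stored under it
    for inner in data.values():
        for key, values in inner.items():
            groups[key].append(values)
    # zip(*lists) yields one tuple per column; sum each column.
    return {key: [sum(column) for column in zip(*lists)]
            for key, lists in groups.items() if len(lists) > 1}

data = {
    'level1a': {'level2a': [1, 2, 3]},
    'level1b': {'level2b': [4, 5, 6]},
    'level1c': {'level2a': [7, 8, 9]},
}
print(sum_shared_keys(data))  # {'level2a': [8, 10, 12]}
```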

Try:
result = []
for first_key, first_value in data.items():
    for second_key, second_value in first_value.items():
        if second_key == 'level2a':
            if result == []:
                result += second_value
            else:
                result = [result[i] + value for i, value in enumerate(second_value)]
This iterates through the outer dict, checking each second-level dict for the wanted key. The if/else block determines whether any items have been added to result yet; if so, each value in result is incremented by the matching value in the next 'level2a' list. This assumes that all second-level lists have the same length: if a later list is shorter, result is cut down to that length, and if it is longer, result[i] raises an IndexError.

You could try something like this:
>>> data = {
... 'level1a': {'level2a':[1,2,3]},
... 'level1b': {'level2b':[4,5,6]},
... 'level1c': {'level2a':[7,8,9]}
... }
>>>
>>> def sum_matches(data, match):
...     inner = data.values()
...     matches = (x[match] for x in inner if match in x)
...     return [sum(x) for x in zip(*matches)]
...
>>>
>>> sum_matches(data, 'level2a')
[8, 10, 12]
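sum_matches relies on zip, which stops at the shortest list. If the lists under the matching key can differ in length, itertools.zip_longest with fillvalue=0 pads the shorter ones instead. A minimal sketch; sum_matches_padded is a hypothetical name.

```python
from itertools import zip_longest

def sum_matches_padded(data, match):
    """Element-wise sum of every inner list stored under `match`.

    Shorter lists are padded with 0, so the result is as long as the
    longest matching list (plain zip would truncate to the shortest).
    """
    matches = (inner[match] for inner in data.values() if match in inner)
    return [sum(column) for column in zip_longest(*matches, fillvalue=0)]

data = {
    'level1a': {'level2a': [1, 2, 3]},
    'level1b': {'level2b': [4, 5, 6]},
    'level1c': {'level2a': [7, 8, 9, 10]},
}
print(sum_matches_padded(data, 'level2a'))  # [8, 10, 12, 10]
```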

Related

In a dictionary where keys are tuples, remove all keys that do not have a specific value in specific positions

I have a dictionary where each key is a tuple of length N. In each position there is either a string or an empty string.
d = {('Word A','Word B','','','Word C',....) : 50,
('Word F', '','','',....,'Word H') : 10,
....
}
I also have a category dictionary that maps each category to a list of indexes. If a key in d has an empty string at every position listed for a category, it belongs to that category. A key can belong to multiple categories.
category_dictionary= { 'Category A':[1,2,3] , 'Category B' : [0,3,4] , .... }
In this example, the second entry in d, ('Word F', '','','',....,'Word H'), belongs to Category A since it has an empty string in positions 1, 2 and 3.
I want to remove all keys in d that do not belong to any category. What would be an efficient way of doing this? Here is code that works but is slow:
filtered_list = []
for current_tuple in list(d.keys()):
    keep_tuple = False
    for category, idxs in category_dictionary.items():
        all_idxs_empty = True
        for idx in idxs:
            if current_tuple[idx] != '':
                all_idxs_empty = False
        if all_idxs_empty:
            filtered_list.append(current_tuple)
            break
d_filter = {k: v for k, v in d.items() if k in filtered_list}
What would be a more efficient way of doing this? If I have M keys, T categories and the maximum length is U, the complexity is between O(M*T) and O(M*T*U).
Is there a way to reduce the complexity somehow?
Example data with N = 4 and 2 categories
d = {('A','','B','H') : 10,
('','','','H') : 20,
('','','F','T') : 30,
('A','C','G','') : 0
}
category_dictionary = { 'Category A':[0,1],'Category B' :[3]}
Expected output
d_filter = {('','','','H'): 20,
('','','F','T'):30,
('A','C','G','') : 0
}

Try this:
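def f(tuple_, category_dictionary):
    # positions of tuple_ that hold an empty string
    l = [i for i in range(len(tuple_)) if tuple_[i] == '']
    # True if every index of some category is among those positions
    return any([set(k) - set(l) == set() for k in category_dictionary.values()])
m = list(d.keys())  # copy the keys so d can be modified while looping
for i in m:
    if not f(i, category_dictionary):
        del d[i]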

Not sure if this is faster, but this is simpler.
filtered_list = []
category_set = [set(x) for x in category_dictionary.values()]
for current_tuple in list(d.keys()):
    # indexes of the empty strings in this key
    current_set = {i for i, x in enumerate(current_tuple) if x == ''}
    # keep the key if some category's indexes are all empty
    in_category = any([x.issubset(current_set) for x in category_set])
    if in_category:
        filtered_list.append(current_tuple)
d_filter = {k: v for k, v in d.items() if k in filtered_list}
print(d_filter)
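To cut the per-key cost further, one option (a swapped-in technique, not from the answers above) is to encode positions as bits: each category becomes an integer mask computed once, each key gets one mask of its empty positions, and "all of the category's indexes are empty" becomes a single AND test. That is roughly O(M*(U + T)) integer operations, and the result dict is built directly instead of testing membership in a list. empty_mask and filter_by_categories are hypothetical names; this is a sketch, not a benchmarked solution.

```python
def empty_mask(key):
    """Integer with bit i set when key[i] is an empty string."""
    mask = 0
    for i, item in enumerate(key):
        if item == '':
            mask |= 1 << i
    return mask

def filter_by_categories(d, category_dictionary):
    """Keep the keys of d that belong to at least one category."""
    # One mask per category, computed once: bit i set for each required index.
    category_masks = set()
    for idxs in category_dictionary.values():
        mask = 0
        for i in idxs:
            mask |= 1 << i
        category_masks.add(mask)
    result = {}
    for key, value in d.items():
        m = empty_mask(key)
        # Every bit of cm is also set in m  <=>  (m & cm) == cm
        if any((m & cm) == cm for cm in category_masks):
            result[key] = value
    return result

d = {('A', '', 'B', 'H'): 10,
     ('', '', '', 'H'): 20,
     ('', '', 'F', 'T'): 30,
     ('A', 'C', 'G', ''): 0}
category_dictionary = {'Category A': [0, 1], 'Category B': [3]}
print(filter_by_categories(d, category_dictionary))
# {('', '', '', 'H'): 20, ('', '', 'F', 'T'): 30, ('A', 'C', 'G', ''): 0}
```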

What can I change in this comprehension to remove brackets and quotation marks?

I am trying to get clean results in rows, but I don't know what my code is doing wrong. The comprehension gives me brackets and quotation marks, which I don't want. I simply want each key and its value together.
For example, with the input rrbbbcc:
r 2
b 3
c 2
I've tried to change this:
print(*[[k,v] for k,v in count.items() if v > 1],sep='\n')
but it's showing my result with brackets which I don't want.
if __name__ == '__main__':
    s = str(input())  # input() = aabbbccde
    count = {}
    for i in s:
        count.setdefault(i, 0)
        count[i] = count[i] + 1
    print([(k, v) for k, v in count.items() if v > 1], sep='')
I expect the output to be like this:
b 3
a 2
c 2

The most readable way would be to unwrap the comprehension and move the print function into the resulting for loop:
for k, v in count.items():
    if v > 1:
        print(k, v)
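If you would rather keep a comprehension, the fix is to turn each (k, v) pair into a plain string before printing, so print never sees a list or tuple. A minimal sketch, assuming Python 3.6+ for the f-strings; format_counts is a hypothetical name.

```python
def format_counts(count):
    """Return one 'key value' line for every key counted more than once."""
    # Each pair becomes a plain string, so no brackets or quotes are printed.
    return '\n'.join(f'{k} {v}' for k, v in count.items() if v > 1)

count = {'a': 2, 'b': 3, 'c': 2, 'd': 1, 'e': 1}
print(format_counts(count))
# The question's own print call also works once each pair is a string:
print(*(f'{k} {v}' for k, v in count.items() if v > 1), sep='\n')
```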

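You can use Counter from the collections module; the f-string below needs Python 3.6 or later.
The code would look something like this:
from collections import Counter
text = 'rrrbbbcc'  # a name other than input, so the built-in is not shadowed
counter = Counter(text)
for key, value in counter.items():
    if value > 1:  # only characters that repeat, as in the question
        print(f'{key} {value}')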

If you want the output sorted, you can build a list first:
x = [(k, v) for k, v in count.items() if v > 1]  # get a list
x.sort(key=lambda tup: tup[1], reverse=True)  # sort by count, highest first; ties keep their order
for k, v in x:
    print(k, v)
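Counter can also do the sorting: most_common() returns (element, count) pairs from the highest count down, and on Python 3.7+ equal counts keep the order in which the elements were first seen. A sketch that reproduces the question's expected output; repeated_counts is a hypothetical name.

```python
from collections import Counter

def repeated_counts(text):
    """(character, count) pairs for characters seen more than once,
    highest count first; equal counts keep first-seen order."""
    return [(ch, n) for ch, n in Counter(text).most_common() if n > 1]

for ch, n in repeated_counts('aabbbccde'):
    print(ch, n)
# b 3
# a 2
# c 2
```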

You can do something like this:
from collections import Counter
x = Counter(list('rrrbbbcc'))
[print(*[k, v]) for k,v in x.items()]
>>>
r 3
b 3
c 2
Edit:
from collections import Counter
x = Counter(list('rrrbbbcc'))
print('',*(' '.join([k,str(v),'\n']) for k,v in x.items()))
>>>
r 3
b 3
c 2
Edit 2:
from collections import Counter
x = Counter(list('rrrbbbcc'))
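# yield inside a comprehension is a SyntaxError since Python 3.8, so flatten with a nested generator instead
print('', *(s for k, v in x.items() for s in (k, str(v) + '\n')))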
>>>
r 3
b 3
c 2

adding empty string while joining the 2 lists - Python

I have 2 lists
mainlist=[['RD-12',12,'a'],['RD-13',45,'c'],['RD-15',50,'e']] and
sublist=[['RD-12',67],['RD-15',65]]
If I join both lists on their first element using the code below:
def combinelist(mainlist, sublist):
    dict1 = {e[0]: e[1:] for e in mainlist}
    for e in sublist:
        try:
            dict1[e[0]].extend(e[1:])
        except KeyError:  # key only in sublist: ignore it
            pass
    result = [[k] + v for k, v in dict1.items()]
    return result
the result looks like this:
[['RD-12',12,'a',67],['RD-13',45,'c',],['RD-15',50,'e',65]]
As there is no element for 'RD-13' in sublist, I want an empty string in that position.
The final output should be
[['RD-12',12,'a',67],['RD-13',45,'c'," "],['RD-15',50,'e',65]]
Please help me.

Your problem can be solved using a while loop that pads each sublist with the wanted string until its length matches the length of the longest sublist:
longest = max(len(l) for l in result)
for row in result:  # 'row' avoids shadowing the built-in list
    while len(row) < longest:
        row.append(" ")

You could just go through the result list and check which rows are one element short (3 instead of 4 with the data above):
for row in result:
    if len(row) == 3:
        row.append(" ")
UPDATE:
If there are more items in each sublist, subtract the keys of sublist from the keys of your dict, and then append the desired string to the remaining entries:
def combinelist(mainlist, sublist):
    dict1 = {e[0]: e[1:] for e in mainlist}
    list2 = [e[0] for e in sublist]
    for e in sublist:
        try:
            dict1[e[0]].extend(e[1:])
        except KeyError:
            pass
    for e in dict1.keys() - list2:  # keys with no match in sublist
        dict1[e].append(" ")
    result = [[k] + v for k, v in dict1.items()]
    return result

You can try something like this:
mainlist=[['RD-12',12],['RD-13',45],['RD-15',50]]
sublist=[['RD-12',67],['RD-15',65]]
empty_val = ''
# Lists to dictionaries
maindict = dict(mainlist)
subdict = dict(sublist)
result = []
# go through all keys
for k in list(set(list(maindict.keys()) + list(subdict.keys()))):
    # pick the value from each key or a default alternative
    result.append([k, maindict.pop(k, empty_val), subdict.pop(k, empty_val)])
# sort by the key
result = sorted(result, key=lambda x: x[0])
You can set up your empty value to whatever you need.
UPDATE
With the updated data, it would look like this:
mainlist=[['RD-12',12,'a'], ['RD-13',45,'c'], ['RD-15',50,'e']]
sublist=[['RD-12',67], ['RD-15',65]]
maindict = {a: [b, c] for a, b, c in mainlist}
subdict = dict(sublist)
result = []
for k in list(set(list(maindict.keys()) + list(subdict.keys()))):
    result.append([k])
    result[-1].extend(maindict.pop(k, [' ', ' ']))  # two placeholders if k is missing from mainlist
    result[-1].append(subdict.pop(k, ' '))
result = sorted(result, key=lambda x: x[0])

Another option is to convert the sublist to a dict, so items are easily and rapidly accessible.
sublist_dict = dict(sublist)
So you can do (it modifies the mainlist):
for i, e in enumerate(mainlist):
    mainlist[i].append(sublist_dict.get(e[0], ""))
#=> [['RD-12', 12, 'a', 67], ['RD-13', 45, 'c', ''], ['RD-15', 50, 'e', 65]]
Or a one liner list comprehension (it produces a new list):
[ e + [sublist_dict.get(e[0], "")] for e in mainlist ]
If you want to skip the missing element:
for i, e in enumerate(mainlist):
    data = sublist_dict.get(e[0])
    if data is not None:  # a plain 'if data:' would also skip a value of 0
        mainlist[i].append(data)
print(mainlist)
#=> [['RD-12', 12, 'a', 67], ['RD-13', 45, 'c'], ['RD-15', 50, 'e', 65]]
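Building on the dict-lookup idea, a sketch that leaves both input lists untouched and fills every missing slot when sublist rows carry more than one value. combine_with_placeholder is a hypothetical name; it assumes all sublist rows have the same number of values and, like combinelist in the question, drops keys that appear only in sublist.

```python
def combine_with_placeholder(mainlist, sublist, placeholder=""):
    """Append each row's sublist values (matched on the first element);
    rows without a match get `placeholder` in every missing slot.

    Returns new lists; mainlist and sublist are not modified.
    """
    lookup = {row[0]: row[1:] for row in sublist}
    # Number of values each sublist row contributes (0 if sublist is empty).
    width = max((len(values) for values in lookup.values()), default=0)
    return [row + lookup.get(row[0], [placeholder] * width) for row in mainlist]

mainlist = [['RD-12', 12, 'a'], ['RD-13', 45, 'c'], ['RD-15', 50, 'e']]
sublist = [['RD-12', 67], ['RD-15', 65]]
print(combine_with_placeholder(mainlist, sublist, " "))
# [['RD-12', 12, 'a', 67], ['RD-13', 45, 'c', ' '], ['RD-15', 50, 'e', 65]]
```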

In Python, How can I get the next and previous key:value of a particular key in a dictionary?

Okay, so this is a little hard to explain, but here goes:
I have a dictionary, which I'm adding content to. The content is a hashed username (key) with an IP address (value).
I was putting the hashes into an order by running them against base 16, and then using collections.OrderedDict.
So, the dictionary looked a little like this:
d = {'1234': '8.8.8.8', '2345':'0.0.0.0', '3213':'4.4.4.4', '4523':'1.1.1.1', '7654':'1.3.3.7', '9999':'127.0.0.1'}
What I needed was a mechanism that would allow me to pick one of those keys and get the key/value items one higher and one lower. So, for example, if I were to pick 2345, the code would return the key:value combinations '1234:8.8.8.8' and '3213:4.4.4.4'.
So, something like:
for i in d:
    while i < len(d)
        if i == '2345':
            print i.nextItem
            print i.previousItem
            break()

Edit: OP now states that they are using OrderedDicts but the use case still requires this sort of approach.
Since dicts are not ordered (dicts keep insertion order only since Python 3.7, and even then they have no next/previous lookup), you cannot do this directly. From your example, you are trying to reference the item as you would in a linked list.
A quick solution would be to extract the keys, sort them, and then iterate over that list:
keyList = sorted(d.keys())
for i, v in enumerate(keyList):
    if v == 'eeee':
        print(d[keyList[i + 1]])
        print(d[keyList[i - 1]])
The keyList holds the order of your items, and you have to go back to it to find out what the next/previous key is to get the next/previous value. You also have to check that i+1 is less than the list length (otherwise you get an IndexError) and that i-1 is not negative (a negative index silently wraps around to the last key).
You can use an OrderedDict similarly but I believe that you still have to do the above with a separate list as OrderedDict doesn't have next/prev methods.
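A minimal sketch of the sorted-keys approach with the border checks described above. neighbours is a hypothetical helper: it returns the previous and next (key, value) pairs, uses None at either end, and raises ValueError for a key that is not in the dict.

```python
def neighbours(d, key):
    """Return (previous_item, next_item) for key, in sorted key order.

    Each item is a (key, value) pair, or None at the start/end.
    """
    keys = sorted(d)     # the order the neighbours are defined by
    i = keys.index(key)  # raises ValueError if key is missing
    prev_item = (keys[i - 1], d[keys[i - 1]]) if i > 0 else None
    next_item = (keys[i + 1], d[keys[i + 1]]) if i + 1 < len(keys) else None
    return prev_item, next_item

d = {'1234': '8.8.8.8', '2345': '0.0.0.0', '3213': '4.4.4.4',
     '4523': '1.1.1.1', '7654': '1.3.3.7', '9999': '127.0.0.1'}
print(neighbours(d, '2345'))
# (('1234', '8.8.8.8'), ('3213', '4.4.4.4'))
```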

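In Python 2, as seen in the OrderedDict source code, you can find the next and previous key of a given key in O(1) through the private __map attribute. This relies on an implementation detail: in Python 3.5 and later, OrderedDict is implemented in C and this attribute does not exist.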
>>> from collections import OrderedDict
>>> d = OrderedDict([('aaaa', 'a',), ('bbbb', 'b'), ('cccc', 'c'), ('dddd', 'd'), ('eeee', 'e'), ('ffff', 'f')])
>>> i = 'eeee'
>>> link_prev, link_next, key = d._OrderedDict__map['eeee']
>>> print 'nextKey: ', link_next[2], 'prevKey: ', link_prev[2]
nextKey: ffff prevKey: dddd
This will give you next and prev by insertion order. If you add items in random order then just keep track of your items in sorted order.
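On Python 3, similar O(1) lookups can be sketched without private attributes by building a separate neighbour index once. build_neighbour_index is a hypothetical helper, not part of collections; the index is a snapshot and has to be rebuilt after keys are inserted or deleted.

```python
from collections import OrderedDict

def build_neighbour_index(d):
    """Map each key to (previous_key, next_key) in d's iteration order.

    Building the index is O(n); each later lookup is an O(1) dict access.
    """
    keys = list(d)
    index = {}
    for i, key in enumerate(keys):
        prev_key = keys[i - 1] if i > 0 else None
        next_key = keys[i + 1] if i + 1 < len(keys) else None
        index[key] = (prev_key, next_key)
    return index

d = OrderedDict([('aaaa', 'a'), ('bbbb', 'b'), ('cccc', 'c'),
                 ('dddd', 'd'), ('eeee', 'e'), ('ffff', 'f')])
links = build_neighbour_index(d)
prev_key, next_key = links['eeee']
print('nextKey:', next_key, 'prevKey:', prev_key)  # nextKey: ffff prevKey: dddd
```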

You could also use the list.index() method.
This function is more generic (you can check positions +n and -n). It catches attempts to search for a key that is not in the dict, and it prints None if there's nothing before or after the key:
def keyshift(dictionary, key, diff):
    if key in dictionary:
        token = object()  # sentinel that pads the end being shifted towards
        keys = [token] * (diff * -1) + sorted(dictionary) + [token] * diff
        newkey = keys[keys.index(key) + diff]
        if newkey is token:
            print(None)
        else:
            print({newkey: dictionary[newkey]})
    else:
        print('Key not found')
keyshift(d, 'bbbb', -1)
keyshift(d, 'eeee', +1)

Try:
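pos = 0
d = {'aaaa': 'a', 'bbbb':'b', 'cccc':'c', 'dddd':'d', 'eeee':'e', 'ffff':'f'}
for i in d:
    if i == 'eeee':
        listForm = list(d.values())
        print(listForm[pos - 1])  # previous value
        print(listForm[pos + 1])  # next value
    pos += 1  # increment after the check so pos is the index of i
As in @AdamKerz's answer, enumerate seems more Pythonic, but if you are a beginner this code might help you understand the idea. It relies on the dict's iteration order (insertion order in Python 3.7+), and it still needs border checks for the first and last keys.
I also think it's faster and shorter than sorting, building a list and then enumerating.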

You could use a generic function, based on iterators, to get a moving window (taken from this question):
import itertools
def window(iterable, n=3):
    it = iter(iterable)
    result = tuple(itertools.islice(it, n))
    if len(result) == n:
        yield result
    for element in it:
        result = result[1:] + (element,)
        yield result
l = range(8)
for i in window(l, 3):
    print(i)
Using the above function with OrderedDict.items() will give you three (key, value) pairs, in order:
d = collections.OrderedDict(...)
for p_item, item, n_item in window(d.items()):
    p_key, p_value = p_item
    key, value = item
    # Or, if you don't care about the next value:
    n_key, _ = n_item
Of course using this function the first and last values will never be in the middle position (although this should not be difficult to do with some adaptation).
I think the biggest advantage is that it does not require table lookups in the previous and next keys, and also that it is generic and works with any iterable.
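The adaptation mentioned above, so that the first and last items also appear in the middle position, can be sketched by padding the iterable with one None on each side before windowing. with_neighbours is a hypothetical helper; window is repeated so the block runs on its own, and the None padding assumes None is never a real item (true for dict.items(), whose items are tuples).

```python
import itertools
from collections import OrderedDict

def window(iterable, n=3):
    """Sliding window of width n over iterable (same as above)."""
    it = iter(iterable)
    result = tuple(itertools.islice(it, n))
    if len(result) == n:
        yield result
    for element in it:
        result = result[1:] + (element,)
        yield result

def with_neighbours(items):
    """Yield (previous, current, next) for every item; None pads the ends."""
    padded = itertools.chain([None], items, [None])
    return window(padded, 3)

d = OrderedDict([('aaaa', 'a'), ('bbbb', 'b'), ('cccc', 'c')])
for prev_item, item, next_item in with_neighbours(d.items()):
    print(prev_item, item, next_item)
```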

Maybe it is overkill, but you can keep track of the inserted keys with a helper class and, according to that list, retrieve the previous or next key. Just don't forget to check for border conditions, such as the object already being the first or last element. This way, you will not need to re-sort the ordered list every time.
from collections import OrderedDict
class Helper(object):
    """Helper class for keeping track of insert order"""
    def __init__(self, arg):
        super(Helper, self).__init__()
    # class-level storage shared by the static methods below
    dictContainer = dict()
    ordering = list()
    @staticmethod
    def addItem(dictItem):
        for key, value in dictItem.items():
            print(key, value)
            Helper.ordering.append(key)
            Helper.dictContainer[key] = value
    @staticmethod
    def getPrevious(key):
        # for the first key, index -1 wraps around to the last one: check borders yourself
        index = Helper.ordering.index(key) - 1
        return Helper.dictContainer[Helper.ordering[index]]
# Your unordered dictionary
d = {'aaaa': 'a', 'bbbb':'b', 'cccc':'c', 'dddd':'d', 'eeee':'e', 'ffff':'f'}
# Create an order over the keys
ordered = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
# Push your ordered dict to your Helper class
Helper.addItem(ordered)
# Get the previous value of 'eeee'
print(Helper.getPrevious('eeee'))  # prints d

You can store the keys and values in temporary variables beforehand and then access the previous and next key/value pairs by index.
It works for any key you query, except at the borders: for the first key, index -1 wraps around to the last entry, and the last key raises an IndexError. Please check this code:
d = {'1234': '8.8.8.8', '2345':'0.0.0.0', '3213':'4.4.4.4', '4523':'1.1.1.1', '7654':'1.3.3.7', '9999':'127.0.0.1'}
ch = input('Please enter your choice: ')
keys = list(d.keys())
values = list(d.values())
for k, v in d.items():
    if k == ch:
        ind = keys.index(k)
        print(keys[ind - 1], ':', values[ind - 1])
        print(keys[ind + 1], ':', values[ind + 1])

I think this is a nice Pythonic way of resolving your problem using a lambda and list comprehension, although it may not be optimal in execution time:
import collections
x = collections.OrderedDict([('a','v1'),('b','v2'),('c','v3'),('d','v4')])
previousItem = lambda currentKey, thisOrderedDict : [
list( thisOrderedDict.items() )[ z - 1 ] if (z != 0) else None
for z in range( len( thisOrderedDict.items() ) )
if (list( thisOrderedDict.keys() )[ z ] == currentKey) ][ 0 ]
nextItem = lambda currentKey, thisOrderedDict : [
list( thisOrderedDict.items() )[ z + 1 ] if (z != (len( thisOrderedDict.items() ) - 1)) else None
for z in range( len( thisOrderedDict.items() ) )
if (list( thisOrderedDict.keys() )[ z ] == currentKey) ][ 0 ]
assert previousItem('c', x) == ('b', 'v2')
assert nextItem('c', x) == ('d', 'v4')
assert previousItem('a', x) is None
assert nextItem('d',x) is None

Another way that seems simple and straightforward: this function returns the key that is offset positions away from k, or None if there is no such key:
from typing import Optional
def get_shifted_key(d: dict, k: str, offset: int) -> Optional[str]:
    l = list(d.keys())
    if k in l:
        i = l.index(k) + offset
        if 0 <= i < len(l):
            return l[i]
    return None

I know how to get the next key:value of a particular key in a dictionary:
flag = 0
for k, v in dic.items():
    if flag == 0:
        code...
        flag += 1
        continue
    code...{next key and value in for}

If I understand correctly:
d = {"a": 1, "b": 2, "c": 3}
l = list(d.keys())  # make a list of the keys
k = "b"  # the actual key
i = l.index(k)  # get index of the actual key
For the next:
i_next = i + 1 if i + 1 < len(l) else 0  # select next index or restart at 0
n = l[i_next]
d[n]
For the previous:
i_prev = i - 1 if i - 1 >= 0 else len(l) - 1  # select previous index or go to the end
p = l[i_prev]
d[p]

What is the fastest way to compare two lists with thousands of entries?

I have
list1 = ["value1;value2;value3;value4;fdsa",]
list2 = ["value1;value2;value3;value4;asdf",]
What I need to do is go through each list2 entry, compare the values at indexes 0, 1, 2 and 3, and if they match, use the entry at index 4 in another method.
Right now I have something like this:
for entry1 in list1:
    for entry2 in list2:
        if entry2.split(';')[0] == entry1.split(';')[0]:  # ... etc, compare first 3 values
            print(entry2.split(';')[4])  # edited out my code
            # do stuff
This obviously works, but it is incredibly slow. I am using Python 2.78

First, create a dictionary from list2's entries, with the first four fields as the key and the fifth field as the value:
dct = dict(x.rsplit(';', 1) for x in list2)
Then loop over list1 and check whether the key exists in that dict:
for x in list1:
    k, v = x.rsplit(';', 1)
    if k in dct:
        val = dct[k]
        # do something with val
In case list2 contains repeated keys with different values, you may need to store them in a list:
from collections import defaultdict
d = defaultdict(list)
for x in list2:
    k, v = x.rsplit(';', 1)
    d[k].append(v)
for x in list1:
    k, v = x.rsplit(';', 1)
    for val in d[k]:
        print(val)  # do something with val

You are splitting the entries from list1 multiple times. If you split them once and store the result in a variable, you can reuse it in the inner loop.

Try like this:
for entry1 in list1:
    colon_sep_list1 = entry1.split(";")
    for entry2 in list2:
        colon_sep_list2 = entry2.split(";")
        # compare the fields at indexes 0-3
        if colon_sep_list1[:4] == colon_sep_list2[:4]:
            print(colon_sep_list2[4])

To avoid splitting the list2 entries every time they are compared with a list1 entry, store the split lists in a separate variable and work with them. (This is Python 2 code, where map returns a list; in Python 3 map returns a one-shot iterator, so use l2 = [x.split(';') for x in list2] instead.)
>>> l2=map(lambda x:x.split(';'),list2)
>>> [j[4] for i in list1 for j in l2 if i.split(';')[0] == j[0]]
['asdf']
Benchmarking:
list1 = ["value1;value2;value3;value4;fdsa",]
list2 = ["value1;value2;value3;value4;asdf",]
def test1():
    l2=map(lambda x:x.split(';'),list2)
    new=[j[4] for i in list1 for j in l2 if i.split(';')[0] == j[0]]
def test2():
    new=[]
    for entry1 in list1:
        for entry2 in list2:
            if entry2.split(';')[0] == entry1.split(';')[0]: #... etc, compare first 3 values
                new.append(entry2.split(';')[4]) # edited out my code
                #do stuff
if __name__ == '__main__':
    import timeit
    print 'test 1 : ',timeit.timeit("test1()", setup="from __main__ import test1")
    print 'test 2 : ',timeit.timeit("test2()", setup="from __main__ import test2")
result :
test 1 : 1.24494791031
test 2 : 1.34099817276
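The benchmark above uses one-element lists, so it says little about lists with thousands of entries. A hedged sketch for timing a pre-split nested loop against the dict lookup on larger synthetic data, assuming Python 3: make_entries, nested_loop and dict_lookup are hypothetical helpers, the data is random, and no timings are claimed here, so run it on your own machine.

```python
import random
import timeit

def make_entries(n, seed):
    """Synthetic 'a;b;c;d;tail' strings; small field ranges create matches."""
    rng = random.Random(seed)
    return [';'.join(str(rng.randrange(3)) for _ in range(4)) + ';t%d' % i
            for i in range(n)]

def nested_loop(list1, list2):
    """O(len1 * len2) comparison, but every entry is split only once."""
    split2 = [e.split(';') for e in list2]
    out = []
    for e1 in list1:
        f1 = e1.split(';')
        for f2 in split2:
            if f1[:4] == f2[:4]:
                out.append(f2[4])
    return out

def dict_lookup(list1, list2):
    """Group list2 tails by their first four fields, then look each
    list1 entry up in O(1) on average."""
    groups = {}
    for e in list2:
        key, tail = e.rsplit(';', 1)
        groups.setdefault(key, []).append(tail)
    out = []
    for e1 in list1:
        out.extend(groups.get(e1.rsplit(';', 1)[0], []))
    return out

list1 = make_entries(500, seed=1)
list2 = make_entries(500, seed=2)
# Both strategies return the same tails in the same order.
assert nested_loop(list1, list2) == dict_lookup(list1, list2)
for fn in (nested_loop, dict_lookup):
    print(fn.__name__, timeit.timeit(lambda: fn(list1, list2), number=3))
```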
