Access defaultdict(dict) based on list containing the keys - python

Ok so I have a project that I am working on and I cannot figure this out.
I apologize if this has been asked before, I've searched and found nothing.
This is my first post.
I have some pandas dataframes that I want to access through a nested dictionary, which I've set up with:
df = defaultdict(lambda: defaultdict(dict))
or
df = defaultdict(dict)
I did this so I could index like df['a']['1'][1] or df['a'][1] depending on the use case.
Note that the shape of the "matrix" will not necessarily be equal. So
df['a']['2'][1] may exist but not df['b']['2'][1].
TLDR
I'd like to access the df using a list like ['a', '2', 1] or ['a', 1]
What I've done:
The old way:
I used to create master lists that I would then iterate through and check. This works but I feel like it is very ugly. It is also different for the two use cases above. I am now trying to make a wrapper around the two use cases above. I would love for the wrapper to not be a big switch for the two use cases.
x_master_list = []
y_master_list = []
for x in df:
    if x not in x_master_list:
        x_master_list.append(x)
    for y in df[x]:
        if y not in y_master_list:
            y_master_list.append(y)
for y in y_master_list:
    for x in x_master_list:
        if x in df:
            if y in df[x]:
                ...  # use df[x][y] here
The newer way:
I found a link discussing using recursion to get all of the keys. It was nice because it preserved the order of the hierarchy.
def iter_leafs(d, keys=()):
    # Walk the nested dict and yield the full key path to every leaf.
    for key, val in d.items():
        if isinstance(val, dict):  # defaultdict is a subclass of dict
            yield from iter_leafs(val, [*keys, key])
        else:
            yield [*keys, key]
I modified the creation of my master lists to:
def create_master_lists(type, df):
    check_type(type)
    lists = master_lists[type]
    key_list = list(iter_leafs(df))
    for key in key_list:
        # "master" rather than "list": rebinding list here would break the list() call above
        for idx, master in enumerate(lists):
            if key[idx] not in master:
                master.append(key[idx])
    return lists
Now I want to do something like the following:
key_list = list(iter_leafs(df))
for y in y_master_list:
    valid_idx_keys = [key for key in key_list if key[-1] == y]
Here key_list looks like [['a','1',0],['a','1',1], etc]
and valid_idx_keys is basically a filtered version.
I want to take each list from the valid_idx_keys and access df. I cannot figure out how to achieve this.
If I do the following it works, but again the point is to make a wrapper around the two use cases which do not have the same number of indexing arguments.
for x, y, z in valid_idx_keys:
    df[x][y][z]
Maybe something with recursion that slowly steps one layer down for each element in the sublists? I am still trying things, but I wanted to post here in case someone has a way to achieve this or a better solution to my problem.

So I ended up with the following. It works but I am open to suggestions.
from collections import defaultdict
def search_dict(d, keys):
    # Step one level down per key until a non-dict value is reached.
    key = keys[0]
    val = d.get(key)
    if isinstance(val, dict):  # defaultdict is a subclass of dict
        yield from search_dict(val, keys[1:])
    else:
        yield val
df = defaultdict(lambda: defaultdict(dict))
df['a']['1'][1] = 0
df['b']['1'][1] = 1
test_key_list = [['a', '1', 1], ['b', '1', 1]]
print(list(search_dict(df, test_key_list[0])))
print(list(search_dict(df, test_key_list[1])))
vals = []
for lis in test_key_list:
    print(lis)
    vals.extend(search_dict(df, lis))
print(vals)
df2 = defaultdict(dict)
df2['a'][1] = 0
df2['b'][1] = 1
test_key_list2 = [['a', 1], ['b', 1]]
vals = []
for lis in test_key_list2:
    print(lis)
    vals.extend(search_dict(df2, lis))
print(vals)
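A non-recursive variant of the same idea, as a minimal sketch: walk the key list one level at a time. The name get_by_path and its default parameter are illustrative and not from the original post. It tests membership before indexing, so a missing path does not silently create empty entries in the defaultdict.

```python
from collections import defaultdict


def get_by_path(d, keys, default=None):
    """Follow `keys` down through nested dicts; return `default` if a step is missing."""
    node = d
    for key in keys:
        # Check membership first: indexing a defaultdict would create the key.
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


df = defaultdict(lambda: defaultdict(dict))
df['a']['1'][1] = 0
df2 = defaultdict(dict)
df2['a'][1] = 0

print(get_by_path(df, ['a', '1', 1]))   # 0
print(get_by_path(df2, ['a', 1]))       # 0
print(get_by_path(df, ['b', '2', 1]))   # None, and 'b' is not added to df
```

The same function handles both nesting depths, because the number of steps is taken from the length of the key list.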

Related

python list of lists to dict when key appear many times

I know how to write something simple and slow with a loop, but I need it to run very fast at large scale.
input:
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
desired output:
d = {1: ["txt1", "txt2"], 2: "txt3"}
Is there something built into Python that makes dict() extend the value for a repeated key instead of replacing it?
dict(list(zip(lst[0], lst[1])))
One option is to use dict.setdefault:
out = {}
for k, v in zip(*lst):
    out.setdefault(k, []).append(v)
Output:
{1: ['txt1', 'txt2'], 2: ['txt3']}
If you want the element itself for singleton lists, one way is adding a condition that checks for it while you build an output dictionary:
out = {}
for k, v in zip(*lst):
    if k in out:
        if isinstance(out[k], list):
            out[k].append(v)
        else:
            out[k] = [out[k], v]
    else:
        out[k] = v
or if lst[0] is sorted (like it is in your sample), you could use itertools.groupby:
from itertools import groupby
out = {}
pos = 0
for k, v in groupby(lst[0]):
    length = len([*v])
    if length > 1:
        out[k] = lst[1][pos:pos+length]
    else:
        out[k] = lst[1][pos]
    pos += length
Output:
{1: ['txt1', 'txt2'], 2: 'txt3'}
But as @timgeb notes, it's probably not something you want: afterwards, you'll have to check the data type each time you access this dictionary (whether the value is a list or not), which is an unnecessary problem that you could avoid by storing all values as lists.
If you're dealing with large datasets it may be useful to add a pandas solution.
>>> import pandas as pd
>>> lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
>>> s = pd.Series(lst[1], index=lst[0])
>>> s
1 txt1
1 txt2
2 txt3
>>> s.groupby(level=0).apply(list).to_dict()
{1: ['txt1', 'txt2'], 2: ['txt3']}
Note that this also produces lists for single elements (e.g. ['txt3']) which I highly recommend. Having both lists and strings as possible values will result in bugs because both of those types are iterable. You'd need to remember to check the type each time you process a dict-value.
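To illustrate the pitfall with mixed value types, here is a small sketch (the dictionary named mixed is a made-up example). Iterating over a bare string yields its characters, so code written with lists in mind misbehaves without raising an error.

```python
mixed = {1: ['txt1', 'txt2'], 2: 'txt3'}

for key, value in mixed.items():
    # A list yields its items, but a bare string yields single characters.
    print(key, [item for item in value])

# Normalising afterwards works, but needs a type check for every value.
normalised = {k: v if isinstance(v, list) else [v] for k, v in mixed.items()}
print(normalised)
```

Keeping every value a list from the start makes the normalising step unnecessary.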
You can use a defaultdict to group the strings by their corresponding key, then make a second pass through the list to extract the strings from singleton lists. Regardless of what you do, you'll need to access every element in both lists at least once, so some iteration structure is necessary (and even if you don't explicitly use iteration, whatever you use will almost definitely use iteration under the hood):
from collections import defaultdict
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
result = defaultdict(list)
for key, value in zip(lst[0], lst[1]):
    result[key].append(value)
for key in result:
    if len(result[key]) == 1:
        result[key] = result[key][0]
print(dict(result)) # Prints {1: ['txt1', 'txt2'], 2: 'txt3'}

Create a dictionary with specific pairs from other dictionaries

This list:
data=[[{'t1':'text1.txt','date1':'class1'}],[{'t2':'text2.txt','date2':'class2'}]]
data
gives
[[{'t1': 'text1.txt', 'date1': 'class1'}],
[{'t2': 'text2.txt', 'date2': 'class2'}]]
and I want to turn it into this:
EDIT brackets added
[[{'text1.txt': 'class1'}], [{'text2.txt': 'class2'}]]
which means:
to create a list where each sublist inside, will be comprised of a dictionary where the key would be the value of the first dictionary in the first sublist and the value would be the value of the second dictionary of the first sublist and so on for the following sublists.
My attempt was this:
se = []
for i in data:
    for j in i:
        jk = j.values()
        se.append(jk)
se
Iterate through each dictionary in the nested list and build a tuple from its values with tuple(d.values()). You can then pass a one-element list containing that tuple to dict() to create the new dictionary: dict([tuple(d.values())]).
Note: This works only if each dictionary has exactly two keys.
res = [[dict([tuple(d.values())]) for d in lst]for lst in data]
print(res)
Output:
[[{'text1.txt': 'class1'}], [{'text2.txt': 'class2'}]]
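A slightly more defensive sketch of the same idea, assuming each inner dictionary is meant to hold exactly two entries and that insertion order is preserved (Python 3.7+). The function name pair_values is hypothetical. It raises an explicit error instead of the less obvious ValueError that dict() produces when a dictionary has a different number of values.

```python
def pair_values(data):
    """Turn each two-entry dict into {first_value: second_value}, keeping the nesting."""
    result = []
    for sublist in data:
        converted = []
        for d in sublist:
            values = list(d.values())
            if len(values) != 2:
                raise ValueError(f"expected exactly 2 values, got {len(values)}: {d!r}")
            converted.append({values[0]: values[1]})
        result.append(converted)
    return result


data = [[{'t1': 'text1.txt', 'date1': 'class1'}], [{'t2': 'text2.txt', 'date2': 'class2'}]]
print(pair_values(data))
# [[{'text1.txt': 'class1'}], [{'text2.txt': 'class2'}]]
```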
Your code does most of the job. You can add another line to get the desired results:
In [108]: se
Out[108]: [dict_values(['text1.txt', 'class1']), dict_values(['text2.txt', 'class2'])]
In [109]: [[{list(x)[0]:list(x)[1]} for x in se]]
Out[109]: [[{'text1.txt': 'class1'}, {'text2.txt': 'class2'}]]
Try this:
data=[[{'t1':'text1.txt','date1':'class1'}],[{'t2':'text2.txt','date2':'class2'}]]
all_vals = []
for i in data:
    for j in i:
        for key in j:
            all_vals.append(j[key])
new_list = []
for i in range(0, len(all_vals) - 1):
    if (i % 2) == 0:
        new_dict = {all_vals[i]: all_vals[i+1]}
        new_list.append(new_dict)
    else:
        continue
print(new_list)
Output:
[{'text1.txt': 'class1'}, {'text2.txt': 'class2'}]
This code works regardless of the length of your list.
The following function should convert the inputs to the outputs you requested.
def convert_to_list_of_list_of_dictionaries(input_dictionaries):
    ret_dictionaries = []
    for inp in input_dictionaries:
        k, v = inp[0].values()
        # wrap each {k: v} dict in its own list to match the requested output
        ret_dictionaries.append([{k: v}])
    return ret_dictionaries
However, a few things about the inputs and outputs are a little concerning and make the data harder to work with.
On the input side, the data is being wrapped in an extra list that in this context, does not provide any function, and forces you to index the first element of the inner list to access the dict k, v = inp[0].values(). On the output side, we're doing the same thing, which makes it harder to iterate over the outputs. It would look something like:
# pseudocode
for kv in reformatted_outputs:
    unwrapped_dict = kv[0]
    key, value = next(iter(unwrapped_dict.items()))
    # process key and value
Instead, if you had an output format like `{'text1.txt': 'class1', 'text2.txt': 'class2'}`, you could process the data like:
for key, value in reformatted_outputs.items():
    # process key and value
If you have the ability to modify the inputs and outputs of what you're working on, this could save you some trouble, and anyone you're working with some head scratches.
If you wanted to modify the output format, your function could look something like:
def convert_to_single_dictionary(input_dictionaries):
    ret = {}
    for inp in input_dictionaries:
        print(inp)
        # it looks like the input data is being wrapped in an extra list
        k, v = inp[0].values()
        ret[k] = v
    return ret
Hope this is helpful and thanks for asking the question!

Removing duplicates in values of dictionary in python

Sorry the topic's title is vague, I find it hard to explain.
I have a dictionary in which each value is a list of items. I wish to remove the duplicated items, so that each item will appear minimum times (preferable once) in the lists.
Consider the dictionary:
example_dictionary = {"weapon1":[1,2,3],"weapon2":[2,3],"weapon3":[2,3]}
'weapon2' and 'weapon3' have the same values, so it should result in:
result_dictionary = {"weapon1":[1],"weapon2":[3],"weapon3":[2]}
since I don't mind the order, it can also result in:
result_dictionary = {"weapon1":[1],"weapon2":[2],"weapon3":[3]}
But when "there's no choice" it should leave the value. Consider this new dictionary:
example_dictionary = {"weapon1":[1,2,3],"weapon2":[2,3],"weapon3":[2,3],"weapon4":[3]}
now, since it cannot assign either '2' or '3' only once without leaving a key empty, a possible output would be:
result_dictionary = {"weapon1":[1],"weapon2":[3],"weapon3":[2],"weapon4":[3]}
I can relax the problem to only the first part and manage, though I prefer a solution to the two parts together
#!/usr/bin/env python3
example_dictionary = {"weapon1":[1,2,3],"weapon2":[2,3],"weapon3":[2,3]}
result = {}
used_values = []
def extract_semi_unique_value(my_list):
    for val in my_list:
        if val not in used_values:
            used_values.append(val)
            return val
    return my_list[0]
for key, value in example_dictionary.items():
    semi_unique_value = extract_semi_unique_value(value)
    result[key] = [semi_unique_value]
print(result)
This is probably not the most efficient solution possible. Because it iterates over all possible combinations, it will run quite slowly for large inputs.
It uses itertools.product() to generate all possible combinations, then picks the combination with the most unique numbers (by testing the length of a set).
from itertools import product
def dedup(weapons):
    # get the keys and values ordered so we can join them back
    # up again at the end
    keys, vals = zip(*weapons.items())
    # because sets remove all duplicates, whichever combo has
    # the longest set is the most unique
    best = max(product(*vals), key=lambda combo: len(set(combo)))
    # combine the keys and whatever we found was the best combo
    return {k: [v] for k, v in zip(keys, best)}
From the examples:
dedup({"weapon1":[1,2,3],"weapon2":[2,3],"weapon3":[2,3]})
#: {'weapon1': [1], 'weapon2': [2], 'weapon3': [3]}
dedup({"weapon1":[1,2,3],"weapon2":[2,3],"weapon3":[2,3],"weapon4":[3]})
#: {'weapon1': [1], 'weapon2': [2], 'weapon3': [2], 'weapon4': [3]}
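If the brute-force search becomes too slow, one alternative is to treat the problem as bipartite matching between keys and values. The sketch below swaps in a standard augmenting-path search in place of itertools.product(); it is not the method from the answer above. The name dedup_matching is hypothetical, and the code assumes every list is non-empty. Keys that cannot get a value of their own fall back to the first item in their list.

```python
def dedup_matching(weapons):
    """Assign one value per key, maximising the number of distinct values used."""
    owner = {}  # value -> key that currently holds it

    def try_assign(key, seen):
        # Try each candidate value; if it is taken, try to move its owner elsewhere.
        for val in weapons[key]:
            if val in seen:
                continue
            seen.add(val)
            if val not in owner or try_assign(owner[val], seen):
                owner[val] = key
                return True
        return False

    for key in weapons:
        try_assign(key, set())

    assigned = {key: val for val, key in owner.items()}
    # Unmatched keys keep the first value of their list ("no choice" case).
    return {k: [assigned.get(k, vals[0])] for k, vals in weapons.items()}


print(dedup_matching({"weapon1": [1, 2, 3], "weapon2": [2, 3], "weapon3": [2, 3]}))
# {'weapon1': [1], 'weapon2': [3], 'weapon3': [2]}
print(dedup_matching({"weapon1": [1, 2, 3], "weapon2": [2, 3], "weapon3": [2, 3], "weapon4": [3]}))
# {'weapon1': [1], 'weapon2': [3], 'weapon3': [2], 'weapon4': [3]}
```

The search is recursive, so very long reassignment chains could hit Python's recursion limit; an iterative version would avoid that.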
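This could help. Note that the step of 4 in the last line (x%4) is specific to this example, which has three keys and three distinct values: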
import itertools
res = {'weapon1': [1, 2, 3], 'weapon2': [2, 3], 'weapon3': [2, 3]}
r = [[x] for x in list(set(list(itertools.chain.from_iterable(res.values()))))]
r2 = [x for x in res.keys()]
r3 = list(itertools.product(r2,r))
r4 = dict([r3[x] for x in range(0,len(r3)) if not x%4])

Python searching a Json with key value

I have a JSON file and I'm trying to search it by value (not by key). Is there a built-in function in Python that does this?
[["2778074170846781111111110", "a33104eb1987bec2519fe051d1e7bd0b4c9e4875"],
["2778074170846781111111111", "f307fb3db3380bfd27901bc591eb025398b0db66"]]
I thought of this approach. Loading the file into a list and start searching. Is there a more efficient way?
import json
def OptionLookUp(keyvalue):
    with open('data.json', 'r') as table:
        x = json.load(table)  # json.load reads from a file object; json.loads expects a string
After your edit, I can say that there is no faster or more efficient way than loading your JSON into a two-dimensional Python list, looping through each entry, and comparing the second field with your "keyvalue".
EDIT: faster/more efficient
your_dict = {'a': 1, 'b': 2, 'c': 'asd'} # the dictionary
your_value = 'asd' # the value to look for
[elem for elem in your_dict.items() if elem[1] == your_value] # look for entries with value == your_value (use iteritems() in Python 2)
Output: [('c', 'asd')]
EDIT:
for a list:
your_list = [['a', 1], ['b', 2], ['c', 'asd']] # the list
your_value = 'asd' # the value to look for
[elem for elem in your_list if elem[1] == your_value] # look for elements with value == your_value
Output: [['c', 'asd']]
I assume you're looking for the key (or keys) associated with a given value.
If your data is guaranteed to be a list of (key, value) pairs, then depending on (1) the volume of data and (2) how many lookups you'll perform on the same dataset, you can either do a plain sequential search:
def slookup(lst, value):
    return [k for k, v in lst if v == value]
or build a reverse index then lookup the index:
from collections import defaultdict
def index(lst):
    d = defaultdict(list)
    for k, v in lst:
        d[v].append(k)
    return d
rindex = index(lst)
print(rindex.get(someval))
print(rindex.get(someotherval))
This second solution only makes sense if you have a lot of lookups to do on the same dataset, obviously...
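As a runnable sketch of the reverse-index approach, using the two sample rows from the question: the JSON text is inlined as a string here rather than read from data.json, and the function name build_reverse_index is illustrative.

```python
import json
from collections import defaultdict

raw = '''[["2778074170846781111111110", "a33104eb1987bec2519fe051d1e7bd0b4c9e4875"],
          ["2778074170846781111111111", "f307fb3db3380bfd27901bc591eb025398b0db66"]]'''

pairs = json.loads(raw)  # a list of [key, value] lists


def build_reverse_index(pairs):
    """Map each value to the list of keys that carry it."""
    index = defaultdict(list)
    for key, value in pairs:
        index[value].append(key)
    return index


rindex = build_reverse_index(pairs)
print(rindex.get("f307fb3db3380bfd27901bc591eb025398b0db66"))
# ['2778074170846781111111111']
print(rindex.get("no-such-value"))  # None; .get() does not add the key
```

Building the index costs one pass over the data; each later lookup is then a single dictionary access.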

access value of a python dict() without knowing the keys

I have a dictionary of a list of dictionaries. something like below:
x = {'a':[{'p':1, 'q':2}, {'p':4, 'q':5}], 'b':[{'p':6, 'q':1}, {'p':10, 'q':12}]}
The length of the lists (values) is the same for all keys of dict x.
I want to get the length of any one value i.e. a list without having to go through the obvious method -> get the keys, use len(x[keys[0]]) to get the length.
my code for this as of now:
val = None
for key in x.keys():
    val = x[key]
    # break after the first iteration, as the length of the lists is the same for any key
    break
try:
    what_i_Want = len(val)
except TypeError:
    print("val wasn't set")
I am not happy with this; I believe it can be made more 'pythonic'.
This is the most efficient way, since we don't create any intermediate lists.
print(len(x[next(iter(x))])) # 2
Note: For this method to work, the dictionary must have at least one key in it.
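What about this:
val = x[list(x.keys())[0]] # in Python 2, x.keys()[0] works without list()
or alternatively:
val = list(x.values())[0] # in Python 2, x.values()[0] works without list()
and then your answer is
len(val)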
Some of the other solutions (posted by thefourtheye and gnibbler) are better because they are not creating an intermediate list. I added this response merely as an easy to remember and obvious option, not a solution for time-efficient usage.
Works ok in Python2 or Python3
>>> x = {'a':[{'p':1, 'q':2}, {'p':4, 'q':5}], 'b':[{'p':6, 'q':1}, {'p':10, 'q':12}]}
>>> next(len(i) for i in x.values())
2
This is better for Python2 as it avoids making a list of the values. Works well in Python3 too
>>> next(len(x[k]) for k in x)
2
Using next and iter:
>>> x = {'a':[{'p':1, 'q':2}, {'p':4, 'q':5}], 'b':[{'p':6, 'q':1}, {'p':10, 'q':12}]}
>>> val = next(iter(x.values()), None) # Use `itervalues` in Python 2.x
>>> val
[{'q': 2, 'p': 1}, {'q': 5, 'p': 4}]
>>> len(val)
2
>>> x = {}
>>> val = next(iter(x.values()), None) # `None`: default value
>>> val is None
True
>>> x = {'a':[{'p':1, 'q':2}, {'p':4, 'q':5}], 'b':[{'p':6, 'q':1}, {'p':10, 'q':12}]}
>>> len(list(x.values())[0])
2
Here, x.values() gives you all the values (a list in Python 2, a view in Python 3, hence the list() call); you can then get the length of any one of them.
