I have a JSON file and I'm trying to search by the values (not the keys). Is there a built-in function in Python that does this?
[["2778074170846781111111110", "a33104eb1987bec2519fe051d1e7bd0b4c9e4875"],
["2778074170846781111111111", "f307fb3db3380bfd27901bc591eb025398b0db66"]]
I thought of this approach: load the file into a list and then search it. Is there a more efficient way?
import json

def OptionLookUp(keyvalue):
    with open('data.json', 'r') as table:
        data = json.load(table)  # json.load reads a file object; json.loads expects a string
    return [pair[0] for pair in data if pair[1] == keyvalue]
After your edit I can say that there is no faster or more efficient way than turning your JSON into a Python two-dimensional list and looping through each entry, comparing its second field with your keyvalue.
EDIT: a faster/more efficient way, using a dictionary:
your_dict = {'a': 1, 'b': 2, 'c': 'asd'}  # the dictionary
your_value = 'asd'  # the value to look for
[elem for elem in your_dict.items() if elem[1] == your_value]  # look for entries with value == your_value
Output: [('c', 'asd')]
EDIT:
for a list:
your_list = [['a', 1], ['b', 2], ['c', 'asd']]  # the list
your_value = 'asd'  # the value to look for
[elem for elem in your_list if elem[1] == your_value]  # look for elements with value == your_value
Output: [['c', 'asd']]
I assume you're looking for the key (or keys) associated with a given value.
If your data is guaranteed to be a list of (key, value) pairs, then depending on (1) the data's volume and (2) how many lookups you'll have to perform on the same dataset, you can either do a plain sequential search:
def slookup(lst, value):
return [k for k, v in lst if v == value]
or build a reverse index then lookup the index:
from collections import defaultdict
def index(lst):
d = defaultdict(list)
for k, v in lst:
d[v].append(k)
return d
rindex = index(lst)
print(rindex.get(someval))
print(rindex.get(someotherval))
This second solution only makes sense if you have a lot of lookups to do on the same dataset, obviously...
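For instance, applied to the JSON data from the first question (a minimal end-to-end sketch of mine, reusing the index() function above; the file name and hash come from that example):

import json

with open('data.json') as f:
    pairs = json.load(f)   # the [[key, value], ...] list from the question

rindex = index(pairs)      # index() as defined above
# Each lookup is now a single dict access instead of a full scan.
print(rindex.get('a33104eb1987bec2519fe051d1e7bd0b4c9e4875'))
# ['2778074170846781111111110']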
I know how to write something simple and slow with a loop, but I need it to run fast at scale.
input:
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
desired output:
d = {1: ["txt1", "txt2"], 2: "txt3"}
Is there something built into Python that makes dict() extend a key's value instead of replacing it?
dict(list(zip(lst[0], lst[1])))
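Note that the dict(zip(...)) attempt above can't work as-is: a plain dict keeps only the last value seen for each repeated key, as a quick check shows (the inner list() call is also redundant, since dict() accepts the zip iterator directly):

lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
print(dict(zip(lst[0], lst[1])))   # {1: 'txt2', 2: 'txt3'} -- 'txt1' is silently lost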
One option is to use dict.setdefault:
out = {}
for k, v in zip(*lst):
out.setdefault(k, []).append(v)
Output:
{1: ['txt1', 'txt2'], 2: ['txt3']}
If you want the element itself for singleton lists, one way is adding a condition that checks for it while you build an output dictionary:
out = {}
for k,v in zip(*lst):
if k in out:
if isinstance(out[k], list):
out[k].append(v)
else:
out[k] = [out[k], v]
else:
out[k] = v
or if lst[0] is sorted (like it is in your sample), you could use itertools.groupby:
from itertools import groupby
out = {}
pos = 0
for k, v in groupby(lst[0]):
length = len([*v])
if length > 1:
out[k] = lst[1][pos:pos+length]
else:
out[k] = lst[1][pos]
pos += length
Output:
{1: ['txt1', 'txt2'], 2: 'txt3'}
But as @timgeb notes, it's probably not what you want: afterwards you'll have to check the data type each time you access the dictionary (is the value a list or not?), an unnecessary problem you can avoid by keeping all values as lists.
If you're dealing with large datasets it may be useful to add a pandas solution.
>>> import pandas as pd
>>> lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
>>> s = pd.Series(lst[1], index=lst[0])
>>> s
1 txt1
1 txt2
2 txt3
>>> s.groupby(level=0).apply(list).to_dict()
{1: ['txt1', 'txt2'], 2: ['txt3']}
Note that this also produces lists for single elements (e.g. ['txt3']) which I highly recommend. Having both lists and strings as possible values will result in bugs because both of those types are iterable. You'd need to remember to check the type each time you process a dict-value.
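To illustrate why (a small sketch of my own, not from the answer): any code that iterates over the dictionary's values behaves very differently for a bare string than for a list, because strings are iterable too:

out = {1: ['txt1', 'txt2'], 2: 'txt3'}
for key, value in out.items():
    for item in value:   # iterates list elements -- or characters!
        print(key, item)
# 1 txt1
# 1 txt2
# 2 t
# 2 x
# 2 t
# 2 3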
You can use a defaultdict to group the strings by their corresponding key, then make a second pass through the list to extract the strings from singleton lists. Regardless of what you do, you'll need to access every element in both lists at least once, so some iteration structure is necessary (and even if you don't explicitly use iteration, whatever you use will almost definitely use iteration under the hood):
from collections import defaultdict
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
result = defaultdict(list)
for key, value in zip(lst[0], lst[1]):
result[key].append(value)
for key in result:
if len(result[key]) == 1:
result[key] = result[key][0]
print(dict(result)) # Prints {1: ['txt1', 'txt2'], 2: 'txt3'}
Ok so I have a project that I am working on and I cannot figure this out.
I apologize if this has been asked before, I've searched and found nothing.
This is my first post.
I have some pandas dataframes that I want to access based on a hash which I've setup with:
df = defaultdict(lambda: defaultdict(dict))
or
df = defaultdict(dict)
I did this so I could index like df['a']['1'][1] or df['a'][1] depending on the use case.
Note that the shape of the "matrix" will not necessarily be equal. So
df['a']['2'][1] may exist but not df['b']['2'][1].
TLDR
I'd like to access the df using a list like ['a', '2', 1] or ['a', 1]
What I've done:
The old way:
I used to create master lists that I would then iterate through and check. This works, but I feel like it is very ugly, and it is different for the two use cases above. I am now trying to make a wrapper around both use cases, and I would love for it not to be a big switch between them.
x_master_list = []
y_master_list = []
for x in df:
    if x not in x_master_list:
        x_master_list.append(x)
    for y in df[x]:
        if y not in y_master_list:
            y_master_list.append(y)

for y in y_master_list:
    for x in x_master_list:
        if x in df:
            if y in df[x]:
                ...  # work with df[x][y]
The newer way:
I found a link discussing using recursion to get all of the keys. It was nice because it preserved the order of the hierarchy.
def iter_leafs(d, keys=[]):
    for key, val in d.items():
        if isinstance(val, dict):  # defaultdict is a dict subclass, so this covers both
            yield from iter_leafs(val, keys + [key])
        else:
            yield keys + [key]
I modified the creation of my master lists to:
def create_master_lists(kind, df):  # 'kind' avoids shadowing the builtin type
    check_type(kind)                # check_type and master_lists are defined elsewhere
    lists = master_lists[kind]
    key_list = list(iter_leafs(df))
    for key in key_list:
        for idx, lst in enumerate(lists):  # 'lst' avoids shadowing the builtin list
            if key[idx] not in lst:
                lst.append(key[idx])
    return lists
Now I want to do something like the following:
key_list = list(iter_leafs(df))
for y in y_master_list:
valid_idx_keys = [key for key in key_list if key[-1] == y]
Here key_list looks like [['a','1',0],['a','1',1], etc]
and valid_idx_keys is basically a filtered version.
I want to take each list from the valid_idx_keys and access df. I cannot figure out how to achieve this.
If I do the following it works, but again the point is to make a wrapper around the two use cases which do not have the same number of indexing arguments.
for x,y,z in valid_idx_keys:
df[x][y][z]
Maybe something with recursion that slowly steps one layer down for each element in the sublists? I am still trying things, but I wanted to post here in case someone has a way to achieve this or a better solution to my problem.
So I ended up with the following. It works but I am open to suggestions.
from collections import defaultdict
def search_dict(d, keys):  # 'keys' avoids shadowing the builtin list
    key = keys[0]
    val = d.get(key)
    if isinstance(val, dict):  # defaultdict is a dict subclass, so this covers both
        yield from search_dict(val, keys[1:])
    else:
        yield val
df = defaultdict(lambda: defaultdict(dict))
df['a']['1'][1] = 0
df['b']['1'][1] = 1
test_key_list = [['a', '1', 1], ['b','1',1]]
print(list(search_dict(df, test_key_list[0])))
print(list(search_dict(df, test_key_list[1])))
vals = []
for lis in test_key_list:
print(lis)
vals = vals + list(search_dict(df, lis))
print(vals)
df2 = defaultdict(dict)
df2['a'][1] = 0
df2['b'][1] = 1
test_key_list2 = [['a', 1], ['b',1]]
vals = []
for lis in test_key_list2:
print(lis)
vals = vals + list(search_dict(df2, lis))
print(vals)
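For comparison, a more compact alternative (a sketch of mine using functools.reduce, not part of the original post) that also works for any nesting depth:

from functools import reduce
from operator import getitem

def get_path(d, keys):
    # Successively index into the nested dicts: d[keys[0]][keys[1]]...
    return reduce(getitem, keys, d)

print(get_path(df, ['a', '1', 1]))   # 0
print(get_path(df2, ['b', 1]))       # 1

One caveat: unlike search_dict, which uses .get(), indexing into a defaultdict this way will silently create missing intermediate keys.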
I have a Python list which holds pairs of key/value:
l = [[1, 'A'], [1, 'B'], [2, 'C']]
I want to convert the list into a dictionary, where multiple values per key would be aggregated into a tuple:
{1: ('A', 'B'), 2: ('C',)}
The iterative solution is trivial:
l = [[1, 'A'], [1, 'B'], [2, 'C']]
d = {}
for pair in l:
    if pair[0] in d:
        d[pair[0]] = d[pair[0]] + (pair[1],)  # (x,) builds a one-element tuple; tuple('AB') would split the string
    else:
        d[pair[0]] = (pair[1],)
print(d)
{1: ('A', 'B'), 2: ('C',)}
Is there a more elegant, Pythonic solution for this task?
from collections import defaultdict
d1 = defaultdict(list)
for k, v in l:
d1[k].append(v)
d = dict((k, tuple(v)) for k, v in d1.items())
d now contains {1: ('A', 'B'), 2: ('C',)}
d1 is a temporary defaultdict with lists as values, which are converted to tuples in the last line. This way you are appending to lists rather than recreating tuples in the main loop.
Using lists instead of tuples as dict values:
l = [[1, 'A'], [1, 'B'], [2, 'C']]
d = {}
for key, val in l:
d.setdefault(key, []).append(val)
print(d)
Using a plain dictionary is often preferable over a defaultdict, in particular if you build it just once and then continue to read from it later in your code:
First, the plain dictionary is faster to build and access.
Second, and more importantly, later read operations will error out if you try to access a key that doesn't exist, instead of silently creating that key. A plain dictionary lets you state explicitly when you want to create a key-value pair, while a defaultdict implicitly creates one on any subscript access, even a plain read.
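A small illustration of that second point (my own sketch):

from collections import defaultdict

plain = {'a': 1}
dd = defaultdict(int, {'a': 1})

# plain['b'] would raise KeyError -- the typo is caught immediately
print(dd['b'])    # 0 -- the read silently created the key
print('b' in dd)  # True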
This method is relatively efficient and quite compact:
reduce(lambda x, (k,v): x[k].append(v) or x, l, defaultdict(list))
In Python3 this becomes (making imports explicit):
dict(functools.reduce(lambda x, d: x[d[0]].append(d[1]) or x, l, collections.defaultdict(list)))
Note that reduce has moved to functools and that lambdas no longer accept tuples. This version still works in 2.6 and 2.7.
Are the keys already sorted in the input list? If that's the case, you have a functional solution:
import itertools
lst = [(1, 'A'), (1, 'B'), (2, 'C')]
dct = dict((key, tuple(v for (k, v) in pairs))
for (key, pairs) in itertools.groupby(lst, lambda pair: pair[0]))
print(dct)
# {1: ('A', 'B'), 2: ('C',)}
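If the keys are not already sorted, you could sort the list first (my addition, not in the original answer; groupby only groups consecutive equal keys):

import itertools
lst = [(2, 'C'), (1, 'A'), (1, 'B')]
lst_sorted = sorted(lst, key=lambda pair: pair[0])
dct = {key: tuple(v for (k, v) in pairs)
       for (key, pairs) in itertools.groupby(lst_sorted, lambda pair: pair[0])}
print(dct)
# {1: ('A', 'B'), 2: ('C',)}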
I had a list of values created as follows:
performance_data = driver.execute_script('return window.performance.getEntries()')
Then I had to store the data (name and duration) in a dictionary with multiple values:
dictionary = {}
for _ in range(3):  # repeat the measurement three times; '_' avoids clobbering performance_data
driver.get(self.base_url)
performance_data = driver.execute_script('return window.performance.getEntries()')
for result in performance_data:
key=result['name']
val=result['duration']
dictionary.setdefault(key, []).append(val)
print(dictionary)
My data was in a pandas DataFrame:
myDict = dict()
for id_ in set(data['id'].values):  # id_ avoids shadowing the builtin id()
    temp = data[data['id'] == id_]
    myDict[id_] = temp['IP_addr'].to_list()
myDict
This gave me a dict mapping each ID key to one or more IP_addrs. At least one IP_addr per key is guaranteed. My code should work even if temp['IP_addr'].to_list() == [].
{'fooboo_NaN': ['1.1.1.1', '8.8.8.8']}
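A more idiomatic pandas variant of the same idea (a sketch, assuming the same data DataFrame with 'id' and 'IP_addr' columns):

myDict = data.groupby('id')['IP_addr'].apply(list).to_dict()

This avoids the explicit loop and the repeated boolean-mask filtering.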
My two cents to toss into this amazing discussion.
I tried to work out a one-line solution using only the standard library. Excuse the two extra imports. Perhaps the code below solves the issue satisfactorily (for Python 3):
from functools import reduce
from collections import defaultdict
a = [1, 1, 2, 3, 1]
b = ['A', 'B', 'C', 'D', 'E']
c = zip(a, b)
print({**reduce(lambda d,e: d[e[0]].append(e[1]) or d, c, defaultdict(list))})
It is a bit hard for me to explain it in words, so I'll show an example:
What I have (data is a dict instance):
data = {'a':[4,5,3], 'b':[1,0,2], 'c':[6,7,8]}
What I need (ordered_data is an OrderedDict instance):
ordered_data = {'b': [0, 1, 2], 'a': [3, 4, 5], 'c': [6, 7, 8]}
The order of keys should be changed with respect to order of items in nested lists
from collections import OrderedDict

tmp = {k: sorted(v) for k, v in data.items()}
ordered_data = OrderedDict((k, v) for k, v in sorted(tmp.items(), key=lambda i: i[1]))
First sort the values. If you don't need the original data, it's OK to do this in place, but I made a temporary variable.
key is a function that returns a key to be sorted on. In this case, the key is the second element of the item tuple (the list), and since lists are comparable, that's good enough.
You can use an OrderedDict, sorting both your items and the values:
>>> from operator import itemgetter
>>> from collections import OrderedDict
>>> d = OrderedDict(sorted([(k, sorted(v)) for k, v in data.items()], key=itemgetter(1)))
>>> d
OrderedDict([('b', [0, 1, 2]), ('a', [3, 4, 5]), ('c', [6, 7, 8])])
Usually, you should not worry about the order of the data in the dictionary itself; instead, just order it when you retrieve the dictionary's contents (i.e. iterate over it):
data = {'a':[4,5,3], 'b':[1,0,2], 'c':[6,7,8]}
for datum in sorted(data.items(), key=lambda item: item[1]):
...
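For example (a concrete, illustrative filling-in of that loop, also sorting the inner lists as the question requires):

data = {'a': [4, 5, 3], 'b': [1, 0, 2], 'c': [6, 7, 8]}
for key, values in sorted(((k, sorted(v)) for k, v in data.items()),
                          key=lambda item: item[1]):
    print(key, values)
# b [0, 1, 2]
# a [3, 4, 5]
# c [6, 7, 8]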
This seems like such an obvious thing that I feel like I'm missing out on something, but how do you find out if two different keys in the same dictionary have the exact same value? For example, if you have the dictionary test with the keys a, b, and c and the keys a and b both have the value of 10, how would you figure that out? (For the point of the question, please assume a large number of keys, say 100, and you have no knowledge of how many duplicates there are, if there are multiple sets of duplicates, or if there are duplicates at all). Thanks.
len(dictionary.values()) == len(set(dictionary.values()))
This is under the assumption that the only thing you want to know is if there are any duplicate values, not which values are duplicates, which is what I assumed from your question. Let me know if I misinterpreted the question.
Basically this is just checking if any entries were removed when the values of the dictionary were casted to an object that by definition doesn't have any duplicates.
If the above doesn't work for your purposes, this should be a better solution:
set(k for k, v in d.items() if list(d.values()).count(v) > 1)
Basically the second version collects every key whose value occurs more than once in the dictionary.
To detect all of these cases:
>>> import collections
>>> d = {"a": 10, "b": 15, "c": 10}
>>> value_to_key = collections.defaultdict(list)
>>> for k, v in d.items():
...     value_to_key[v].append(k)
...
>>> value_to_key
defaultdict(<class 'list'>, {10: ['a', 'c'], 15: ['b']})
@hivert makes the excellent point that this only works if the values are hashable. If that is not the case, there is no nice O(n) solution (sadly). This is the best I can come up with:
d = {"a": [10, 15], "b": [10, 20], "c": [10, 15]}
values = []
for k, v in d.items():
must_insert = True
for val in values:
if val[0] == v:
val[1].append(k)
must_insert = False
break
if must_insert: values.append([v, [k]])
print([v for v in values if len(v[1]) > 1])  # prints [[[10, 15], ['a', 'c']]]
You can tell which are the duplicate values by means of a reverse index - where the key is the duplicate value and the value is the set of keys that have that value (this will work as long as the values in the input dictionary are hashable):
from collections import defaultdict
d = {'w':20, 'x':10, 'y':20, 'z':30, 'a':10}
dd = defaultdict(set)
for k, v in d.items():
dd[v].add(k)
dd = { k : v for k, v in dd.items() if len(v) > 1 }
dd
=> {10: {'a', 'x'}, 20: {'w', 'y'}}
From that last result it's easy to obtain the set of keys with duplicate values:
set.union(*dd.values())
=> {'y', 'x', 'a', 'w'}
dico = {'a': 0, 'b': 0, 'c': 1}
result = {}
for key in dico:
    if dico[key] in result:
        result[dico[key]].append(key)
    else:
        result[dico[key]] = [key]
>>> result
{0: ['a', 'b'], 1: ['c']}
Then you can filter for the keys in result whose value (a list) has more than one element: those values are the duplicates.
Build another dict mapping the values of the first dict to all keys that hold that value:
import collections
inverse_dict = collections.defaultdict(list)
for key in original_dict:
inverse_dict[original_dict[key]].append(key)
keys = set()
for key1 in d:
for key2 in d:
if key1 == key2: continue
if d[key1] == d[key2]:
keys |= {key1, key2}
That is Θ(n²) in the number of keys. The reason is that a dict does not provide Θ(1) search of a key given a value, so you'd better rethink your data structure choices if that's not good enough.
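If the values are unhashable only because they are lists, one common workaround (my suggestion, not part of the answer above) is to convert them to tuples and reuse the Θ(n) reverse-index approach:

from collections import defaultdict

d = {'a': [10, 15], 'b': [10, 20], 'c': [10, 15]}
dd = defaultdict(set)
for k, v in d.items():
    dd[tuple(v)].add(k)  # tuples are hashable; lists are not
dups = {k: v for k, v in dd.items() if len(v) > 1}
print(dups)  # {(10, 15): {'a', 'c'}}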
You can use list in conjunction with dictionary to find duplicate elements!
Here is a simple code demonstrating the same:
d={"val1":4,"val2":4,"val3":5,"val4":3}
l=[]
for key in d:
l.append(d[key])
l.sort()
print(l)
for i in range(len(l)):
if l[i]==l[i+1]:
print("true, there are duplicate elements.")
print("the keys having duplicate elements are: ")
for key in d:
if d[key]==l[i]:
print(key)
break
output:
[3, 4, 4, 5]
true, there are duplicate elements.
the keys having duplicate elements are:
val1
val2
when you sort the elements in the list, you will find that equal values always appear together!
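The same adjacency idea in a more compact form (a sketch of mine): after sorting, any duplicate shows up as a pair of equal neighbours:

l = sorted(d.values())
print(any(x == y for x, y in zip(l, l[1:])))  # True for the sample dict above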