Bit of a rookie scraper here, trying to make a dictionary out of a scraped table.
I scraped a table using Selenium. The table didn't have distinct header and cell elements, so I am now stuck with a flat list I built myself, which contains first the header names and then all the values, such as:
list = [H1, H2, H3, ValueA1, ValueA2, ValueA3, ValueB1, ValueB2, ValueB3 ....]
My desired output is a list of dictionaries that uses the first three items as dictionary keys and each following group of three items as the dictionary values.
Thank you
Though this is a code request, I'll bite:
In [3]: l = ['asdf', 'qwer', 1, 2, 3, 4, 5, 6, 7, 8]
In [4]: n_headers = 2
In [5]: [{k: v for k, v in zip(l[:n_headers], l[i:i + n_headers])}
   ...:  for i in range(n_headers, len(l), n_headers)]
Out[5]:
[{'qwer': 2, 'asdf': 1},
{'qwer': 4, 'asdf': 3},
{'qwer': 6, 'asdf': 5},
{'qwer': 8, 'asdf': 7}]
This'll end up slicing the list quite a few times, which you can avoid with the iter() trick:
In [9]: g = zip(*[iter(l)] * n_headers)
In [10]: hdrs = next(g)
In [11]: hdrs
Out[11]: ('asdf', 'qwer')
In [12]: [{k: v for k, v in zip(hdrs, h)} for h in g]
Out[12]:
[{'qwer': 2, 'asdf': 1},
{'qwer': 4, 'asdf': 3},
{'qwer': 6, 'asdf': 5},
{'qwer': 8, 'asdf': 7}]
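In case the iter() trick looks like magic, here is a minimal sketch of why it works. Multiplying the one-element list repeats the same iterator object, so every tuple that zip builds pulls consecutive items from it. The function name rows_to_dicts is made up for illustration, and the sketch assumes the list is not empty.

```python
def rows_to_dicts(flat, n_headers):
    """Split a flat list into header names followed by rows of values."""
    it = iter(flat)
    # The same iterator object repeated n_headers times: each tuple that
    # zip builds consumes n_headers consecutive items from it.
    groups = zip(*[it] * n_headers)
    headers = next(groups)  # the first group holds the header names
    return [dict(zip(headers, row)) for row in groups]


l = ['asdf', 'qwer', 1, 2, 3, 4, 5, 6, 7, 8]
print(rows_to_dicts(l, 2))
```

Nothing is sliced or copied here; the list is walked exactly once.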
Unclear if this is what you're looking for but for a 'list' of dictionaries:
i = 3
d = {}
result = []
while i < len(list):  # Iterating over list
    d[list[i % 3]] = list[i]
    i += 1
    if i % 3 == 0:  # Add to your list for every third element
        result.append(d)
        d = {}
Output would be along the lines of
[{'H2': 'ValueA2', 'H3': 'ValueA3', 'H1': 'ValueA1'}, {'H2': 'ValueB2', 'H3': 'ValueB3', 'H1': 'ValueB1'}]
Use a combination of zip and iter, assuming 3 headers:
lst = [ 'H1', 'H2', 'H3', 'ValueA1', 'ValueA2', 'ValueA3', 'ValueB1', 'ValueB2', 'ValueB3', 'ValueC1', 'ValueC2', 'ValueC3' ]
grps = list( zip(*([iter(lst)] * 3)) )
[ dict( zip( grps[0], grps[i]) ) for i in range(1,len(grps))]
Output:
[{'H1': 'ValueA1', 'H2': 'ValueA2', 'H3': 'ValueA3'},
{'H1': 'ValueB1', 'H2': 'ValueB2', 'H3': 'ValueB3'},
{'H1': 'ValueC1', 'H2': 'ValueC2', 'H3': 'ValueC3'}]
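One caveat with the zip-based answers: zip stops at the shortest input, so if the scrape missed a cell and the last row is incomplete, that row is dropped without any warning. A hedged sketch of a stricter variant that raises instead (strict_rows_to_dicts is a hypothetical name, not something from the answers above):

```python
def strict_rows_to_dicts(flat, n_headers):
    """Like the answers above, but refuse a list with a partial last row."""
    if n_headers <= 0 or len(flat) % n_headers != 0:
        raise ValueError(
            f"{len(flat)} items cannot be split into rows of {n_headers}"
        )
    headers = flat[:n_headers]
    return [
        dict(zip(headers, flat[i:i + n_headers]))
        for i in range(n_headers, len(flat), n_headers)
    ]


# The last row is missing two cells, so this is rejected.
lst = ['H1', 'H2', 'H3', 'ValueA1', 'ValueA2', 'ValueA3', 'ValueB1']
try:
    strict_rows_to_dicts(lst, 3)
except ValueError as exc:
    print(exc)  # 7 items cannot be split into rows of 3
```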
I am trying to implement a simple task. I have a dictionary with keys (ti, wi)
y={('t1', 'w1'): 1, ('t2', 'w1'): 2, ('t3', 'w1'): 3, ('t1', 'w2'): 4, ('t2', 'w2'): 5, ('t3', 'w2'): 6}
I want to create a new dictionary where the keys are wi and each value is a list of the values for all ti. So I want to have an output dictionary like:
{'w1': [1, 2, 3], 'w2': [4, 5, 6]}
I wrote the following code:
y={('t1', 'w1'): 1, ('t2', 'w1'): 2, ('t3', 'w1'): 3, ('t1', 'w2'): 4, ('t2', 'w2'): 5, ('t3', 'w2'): 6}
y_w={}
y_t=[]
for w in range(1,3):
    y_t.clear()
    for t in range(1,4):
        print('t= ', t, 'w= ', w, 'y=' , y['t{0}'.format(t), 'w{0}'.format(w)])
        y_t.append(y['t{0}'.format(t), 'w{0}'.format(w)])
    print(y_t)
    y_w['w{0}'.format(w)]=y_t
print(y_w)
But the result I am getting is
{'w1': [4, 5, 6], 'w2': [4, 5, 6]}
I cannot understand where the first list disappeared to. Can someone explain where I went wrong? Is there a nicer way to do it, maybe without for loops?
Your problem lies in the assumption that setting the value in the dictionary somehow freezes the list.
It's no accident the lists have the same values: they are the same object, two references to one list. Observe:
>>> a_dict = {}
>>> a_list = []
>>> a_list.append(23)
>>> a_dict["a"] = a_list
>>> a_list.clear()
>>> a_list.append(42)
>>> a_dict["b"] = a_list
>>> a_dict
{'a': [42], 'b': [42]}
You could fix your solution by replacing y_t.clear() with y_t = [], which does create a new list:
y = {('t1', 'w1'): 1, ('t2', 'w1'): 2, ('t3', 'w1'): 3, ('t1', 'w2'): 4, ('t2', 'w2'): 5, ('t3', 'w2'): 6}
y_w = {}
for w in range(1,3):
    y_t = []
    for t in range(1,4):
        print('t= ', t, 'w= ', w, 'y=' , y['t{0}'.format(t), 'w{0}'.format(w)])
        y_t.append(y['t{0}'.format(t), 'w{0}'.format(w)])
    print(y_t)
    y_w['w{0}'.format(w)]=y_t
print(y_w)
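Another way to see the aliasing, and a second possible fix: keep reusing one list but store a copy of it rather than the list itself. This is only a sketch, and the names shared, by_reference and by_copy are illustrative.

```python
shared = []
by_reference = {}
by_copy = {}

for key, values in (("w1", [1, 2, 3]), ("w2", [4, 5, 6])):
    shared.clear()
    shared.extend(values)
    by_reference[key] = shared   # stores the same list object each time
    by_copy[key] = list(shared)  # stores an independent snapshot

print(by_reference)  # {'w1': [4, 5, 6], 'w2': [4, 5, 6]}
print(by_copy)       # {'w1': [1, 2, 3], 'w2': [4, 5, 6]}
print(by_reference["w1"] is by_reference["w2"])  # True
```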
But there are, as you suspect, easier ways of doing this, for example the defaultdict solution shown by Riccardo Bucco.
Try this:
from collections import defaultdict
d = defaultdict(list)
for k, v in y.items():
    d[k[1]].append(v)
d = dict(d)
The line y_t.clear() is causing the problem: it empties the list that is already stored in the dictionary. If you replace it with y_t = [], it will work as you expect.
You could first find all unique keys:
unique_keys = set(list(zip(*y))[1])
and then create the dict with list-values using those:
{u: [v for k, v in y.items() if k[1] == u] for u in unique_keys}
According to your output here's what you can try:
y = {('t1', 'w1'): 1, ('t2', 'w1'): 2, ('t3', 'w1'): 3, ('t1', 'w2'): 4, ('t2', 'w2'): 5, ('t3', 'w2'): 6}
def new_dict_with_keys(dictionary):
    new_dictionary = dict()
    # Go through the dictionary keys to read each key's value
    for tuple_key in dictionary:
        if "w1" in tuple_key or "w2" in tuple_key:
            # Determine which key to use
            if "w1" in tuple_key:
                key = "w1"
            else:
                key = "w2"
            # Check if the new dictionary has "w1" or "w2" as an item
            # If it does not, create a new list
            if new_dictionary.get(key) is None:
                new_dictionary[key] = list()
            # Append the value in the respective key
            new_dictionary[key].append(dictionary[tuple_key])
    # Return the dictionary with the items
    return new_dictionary
print(new_dict_with_keys(y))
# Prints: {'w1': [1, 2, 3], 'w2': [4, 5, 6]}
Here's a solution using itertools.groupby:
import itertools as it
from operator import itemgetter
items = sorted((k, v) for (_, k), v in y.items())
groups = it.groupby(items, key=itemgetter(0))
result = {k: [v for _, v in vs] for k, vs in groups}
# {'w1': [1, 2, 3], 'w2': [4, 5, 6]}
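groupby only merges neighbouring items with equal keys, which is why the answer above sorts first. A small sketch with made-up data (pairs is not from the question) showing what happens without the sort:

```python
from itertools import groupby
from operator import itemgetter

pairs = [("w1", 1), ("w2", 4), ("w1", 2)]

# Without sorting, "w1" shows up as two separate groups.
unsorted_groups = [
    (k, [v for _, v in g]) for k, g in groupby(pairs, key=itemgetter(0))
]
print(unsorted_groups)  # [('w1', [1]), ('w2', [4]), ('w1', [2])]

# After sorting, equal keys are adjacent and end up in one group.
sorted_groups = [
    (k, [v for _, v in g]) for k, g in groupby(sorted(pairs), key=itemgetter(0))
]
print(sorted_groups)    # [('w1', [1, 2]), ('w2', [4])]
```

Feeding the unsorted groups into a dict comprehension would silently keep only the last group for each key.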
I have some processing where, in a loop, I create a dictionary and append each dictionary to a list. Later I do something else with the list. Now, when I append a dictionary to the list, I want to run a check first: if it is true, append; if it is false, just skip and don't append.
I made something simplified to make a point:
k = [1,2,3,4,5]
u = []
for i in k:
    x1 = i
    x2 = i**2
    # print(x1)
    my_dict = dict({'x1':x1,
                    'x2':x2})
    u.append(my_dict)
print(u)
# [{'x1': 1, 'x2': 1}, {'x1': 2, 'x2': 4}, {'x1': 3, 'x2': 9}, {'x1': 4, 'x2': 16}, {'x1': 5, 'x2': 25}]
Can you please help me fix the example above so that it appends to the list only if the value of x2 is 9 or 16, and skips the others? (So I will be checking whether some key (x2) has a value contained in some tuple of values (9, 16), which is passed as an argument to a function.)
From this, I will be able to make it suitable for my case.
Let's try using a list comprehension with an "in" condition.
k = [1,2,3,4,5]
conditions = (9,16)
[{'x1':i, 'x2':i**2} for i in k if i**2 in conditions]
[{'x1': 3, 'x2': 9}, {'x1': 4, 'x2': 16}]
If you want to filter the list of dicts after it has been built, which avoids recalculating x2, you can do this:
k = [1,2,3,4,5]
conditions = (9,16)
u = [{'x1':i, 'x2':i**2} for i in k]
u2 = [i for i in u if i.get('x2') in conditions]
print(u2)
[{'x1': 3, 'x2': 9}, {'x1': 4, 'x2': 16}]
Try
k = [1,2,3,4,5]
u = []
for i in k:
    x1 = i
    x2 = i**2
    # print(x1)
    my_dict = dict({'x1':x1,
                    'x2':x2})
    if my_dict['x2'] in [9, 16]: # check if it equals 9 or 16
        u.append(my_dict)
print(u)
# [{'x1': 3, 'x2': 9}, {'x1': 4, 'x2': 16}]
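Since the question mentions that the allowed values arrive as a function argument, here is a rough sketch of how the same check could be wrapped up. The function name squares_matching and its parameters are hypothetical, chosen only for this example.

```python
def squares_matching(values, allowed):
    """Build {'x1': v, 'x2': v**2} dicts, keeping only those whose x2 is allowed."""
    allowed = set(allowed)  # set membership tests are O(1) on average
    result = []
    for v in values:
        x2 = v ** 2  # computed once per item
        if x2 in allowed:
            result.append({'x1': v, 'x2': x2})
    return result


print(squares_matching([1, 2, 3, 4, 5], (9, 16)))
# [{'x1': 3, 'x2': 9}, {'x1': 4, 'x2': 16}]
```

The dictionary is only built when the check passes, so nothing has to be removed from the list afterwards.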
I have a list of tuples like this:
list = [(1,2),(1,3),(1,5),(0,8),(0,9),(0,1),(3,6),(3,7)]
I want to build a dictionary with sets of associated values like this:
result = {1:{2,3,5},0:{8,9,1},3:{6,7}}
I have this code:
return {x:y for (x,y) in list}
result = {1: 5, 0: 1, 3: 7}
But I get only the last value for each key; I want all associated values in a set.
Thanks in advance
defaultdict lets you add a value for a key without first checking whether the key exists. Since you want sets, use set as the default factory:
from collections import defaultdict
mylist = [(1,2),(1,3),(1,5),(0,8),(0,9),(0,1),(3,6),(3,7)]
result = defaultdict(set)
for item in mylist:
    result[item[0]].add(item[1])
SOLUTION:
This code snippet should solve your problem statement:
lst = [(1,2),(1,3),(1,5),(0,8),(0,9),(0,1),(3,6),(3,7)]
output_dict = dict()
[output_dict[t[0]].add(t[1]) if t[0] in list(output_dict.keys()) else output_dict.update({t[0]: {t[1]}}) for t in lst]
print(output_dict)
OUTPUT:
{1: {2, 3, 5}, 0: {8, 9, 1}, 3: {6, 7}}
This should help you:
lst = [(1,2),(1,3),(1,5),(0,8),(0,9),(0,1),(3,6),(3,7)]
result= {}
[result.setdefault(x, set()).add(y) for x,y in lst]
print(result)
Output:
{1: {2, 3, 5}, 0: {8, 9, 1}, 3: {6, 7}}
In case you want a one-liner:
In [10]: original_list = [(1,2),(1,3),(1,5),(0,8),(0,9),(0,1),(3,6),(3,7),(1,2)]
In [11]: {x: {r for(q, r) in original_list if q == x} for (x, y) in original_list}
Out[11]: {1: {2, 3, 5}, 0: {1, 8, 9}, 3: {6, 7}}
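A note on the one-liner: it rescans the whole list once per element, so the work grows quadratically with the list length. For large inputs a single pass gives the same result. A sketch for comparison, where pairs is just a stand-in name for the input list:

```python
from collections import defaultdict

pairs = [(1, 2), (1, 3), (1, 5), (0, 8), (0, 9), (0, 1), (3, 6), (3, 7), (1, 2)]

# One-liner: for every pair, scan all pairs again.
quadratic = {x: {r for q, r in pairs if q == x} for x, _ in pairs}

# Single pass: each pair is visited exactly once.
grouped = defaultdict(set)
for x, y in pairs:
    grouped[x].add(y)
single_pass = dict(grouped)

print(quadratic == single_pass)  # True
```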
Here it is:
result = {}
for x, y in list:
    if x not in result:
        result[x] = set()
    result[x].add(y)
print(result)
I would like to compare nested dictionaries as following:
d = {'siteA': {'00000000': 3, '11111111': 4, '22222222': 5},
'siteB': {'00000000': 1, '11111111': 2, '22222222': 5}}
e = {'siteA': {'00000000': 5}}
f = {'siteB': {'33333333': 10}}
g = {'siteC': {'00000000': 8}}
d is the full dictionary that e, f and g will be compared against.
If the key in e is found in d (here siteA-00000000), I would like both values (in this case 3 and 5) to be added up to give 8.
If the key in f is not found under its site (which is the case here), I would like to add the entry to d['siteB'].
If the site in g is not found, I would like to add it to d.
Thanks!
collections.Counter is useful for summing values in dictionaries and adding keys where they do not exist. Since Counter is a subclass of dict, this should not break other operations. Apart from a one-off conversion cost, it is efficient and designed specifically for such tasks.
from collections import Counter
# convert d to dictionary of Counter objects
d = {k: Counter(v) for k, v in d.items()}
# add items from e (values for existing keys are summed)
for k, v in e.items():
    if k in d:
        d[k] += Counter(v)
# add items from f (keys missing from an existing site are added to it)
for k, v in f.items():
    if k in d:
        d[k] += Counter(v)
# add item from g if the site is not found
for k, v in g.items():
    if k not in d:
        d[k] = Counter(v)
Result:
print(d)
{'siteA': Counter({'00000000': 8, '22222222': 5, '11111111': 4}),
 'siteB': Counter({'33333333': 10, '22222222': 5, '11111111': 2, '00000000': 1}),
 'siteC': Counter({'00000000': 8})}
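One thing to be aware of when summing with Counter: the + and += operators discard any entry whose total is zero or negative, whereas update() keeps it. That makes no difference for the positive numbers in the question, but it could matter if the values may cancel out. A small sketch with made-up numbers:

```python
from collections import Counter

# += drops entries whose total is not positive.
stock = Counter({'00000000': 3})
stock += Counter({'00000000': -3})
print(stock)  # Counter()

# update() adds the counts and keeps the zero entry.
stock = Counter({'00000000': 3})
stock.update({'00000000': -3})
print(stock)  # Counter({'00000000': 0})
```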
You can use Counter from collections in combination with defaultdict.
As the name suggests, the counter counts the same elements, and a defaultdict lets you access non-existing keys by providing a default value (an empty Counter in this case). Your code then becomes
from collections import Counter, defaultdict
d = defaultdict(Counter)
d['siteA'] = Counter({'00000000': 3, '11111111': 4, '22222222': 5})
d['siteB'] = Counter({'00000000': 1, '11111111': 2, '22222222': 5})
print(d.items())
> dict_items([('siteA', Counter({'22222222': 5, '11111111': 4, '00000000': 3})),
> ('siteB', Counter({'22222222': 5, '11111111': 2, '00000000': 1}))])
# d + e:
d['siteA'].update({'00000000': 5})
print(d.items())
> dict_items([('siteA', Counter({'00000000': 8, '22222222': 5, '11111111': 4})),
> ('siteB', Counter({'22222222': 5, '11111111': 2, '00000000': 1}))])
# d + f
d['siteB'].update({'33333333': 10})
print(d.items())
> dict_items([('siteA', Counter({'00000000': 8, '22222222': 5, '11111111': 4})),
> ('siteB', Counter({'33333333': 10, '22222222': 5, '11111111': 2, '00000000': 1}))])
# d + g
d['siteC'].update({'00000000': 8})
print(d.items())
> dict_items([('siteA', Counter({'00000000': 8, '22222222': 5, '11111111': 4})),
> ('siteB', Counter({'33333333': 10, '22222222': 5, '11111111': 2, '00000000': 1})),
> ('siteC', Counter({'00000000': 8}))])
Assuming your dictionaries have the format dict[site][address], this merge function takes the values from dictFrom and inserts them into dictTo according to your rules.
def merge(dictTo, dictFrom):
    for site in dictFrom:
        if site not in dictTo:
            dictTo[site] = {}
        for address in dictFrom[site]:
            dictTo[site][address] = dictTo[site].get(address, 0) + dictFrom[site][address]
merge(d, e)
merge(d, f)
merge(d, g)
This may be preferable to jpp's answer because the objects at dict[site] are all still basic dicts.
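For reference, a self-contained run of this approach on the data from the question. The function is restated here so the snippet runs on its own, and note that it modifies d in place.

```python
def merge(dictTo, dictFrom):
    """Add the values of dictFrom into dictTo, creating sites and addresses as needed."""
    for site in dictFrom:
        if site not in dictTo:
            dictTo[site] = {}
        for address in dictFrom[site]:
            dictTo[site][address] = dictTo[site].get(address, 0) + dictFrom[site][address]


d = {'siteA': {'00000000': 3, '11111111': 4, '22222222': 5},
     'siteB': {'00000000': 1, '11111111': 2, '22222222': 5}}
e = {'siteA': {'00000000': 5}}
f = {'siteB': {'33333333': 10}}
g = {'siteC': {'00000000': 8}}

for other in (e, f, g):
    merge(d, other)

print(d['siteA']['00000000'])  # 8
print(d['siteB']['33333333'])  # 10
print(d['siteC'])              # {'00000000': 8}
```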
I have a dictionary like so (but much longer):
codes = {
'113110': 7, '113310': 1, '213111': 1,
'213112': 3, '236115': 2, '236220': 1,
'238190': 1, '238330': 1, '238990': 2,
'311612': 1, '321214': 1, }
I want to know the sum value of all keys grouped by the first two digits. So, '11' should be 8. But if I check like the following, an occurrence of '11' anywhere in the key will count.
group_11 = sum([ v for k,v in codes.items() if '11' in k])
# Returns 15 instead of 8
I've tried using startswith, but I'm not sure how it works in this context. Not like this:
group_11 = sum([ v for k,v in codes.items() if any(k.startswith('11')])
I have 20 groups to check against, but I want to be able to total any set of keys grouping by first x characters as the groupings could change in the future.
You can use itertools.groupby to sort (the sorting is important for groupby to work properly) and group your dict's items by the first two key chars and sum the values for each group:
from itertools import groupby
d = {
    k: sum(item[1] for item in g)
    for k, g in groupby(sorted(codes.items()), key=lambda item: item[0][:2])
}
d
{'11': 8, '32': 1, '31': 1, '21': 4, '23': 7}
You could convert all the items in codes to Counter and sum them together:
from collections import Counter
codes = {
    '113110': 7, '113310': 1, '213111': 1,
    '213112': 3, '236115': 2, '236220': 1,
    '238190': 1, '238330': 1, '238990': 2,
    '311612': 1, '321214': 1
}
# In Python 3 use items(); iteritems() only exists in Python 2
sum((Counter({k[:2]: v}) for k, v in codes.items()), Counter())
# Counter({'11': 8, '23': 7, '21': 4, '31': 1, '32': 1})
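Since the question asks for a prefix length that may change later, here is one possible way to make it a parameter, along with the startswith check for a single group. This is a sketch; sum_by_prefix and its length argument are names invented for this example.

```python
from collections import defaultdict


def sum_by_prefix(codes, length=2):
    """Sum the values of all keys that share the same first `length` characters."""
    totals = defaultdict(int)
    for key, value in codes.items():
        totals[key[:length]] += value
    return dict(totals)


codes = {
    '113110': 7, '113310': 1, '213111': 1,
    '213112': 3, '236115': 2, '236220': 1,
    '238190': 1, '238330': 1, '238990': 2,
    '311612': 1, '321214': 1,
}

print(sum_by_prefix(codes))
# {'11': 8, '21': 4, '23': 7, '31': 1, '32': 1}

# A single group with startswith: no any() is needed, since
# startswith already returns a boolean for each key.
group_11 = sum(v for k, v in codes.items() if k.startswith('11'))
print(group_11)  # 8
```

Calling sum_by_prefix(codes, 3) would group by the first three characters instead.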