Modifying a list of dictionaries with the same keys - python

This is from scraping data off of IMDB. I have four lists of items - ratings, rankings, titles, years. I need to take these lists and merge them into a list of dictionaries which would look like:
dict_list = [{'rating': value_from_rating_list,
              'ranking': value_from_ranking_list,
              'year': value_year_list,
              'title': value_from_title_list},
             {entry two},
             {entry three},
             etc...]
The end product should be a list of dictionaries, each with those four keys and with values taken from the four lists of items, so a completed dictionary within the list would look like:
{'rating':8.5, 'ranking':10, 'year':2010, 'title':'Movie Name'}
with each of the key values coming from one of the separate lists.
I've tried generating a dictionary with the key names in place e.g.:
key_names = {'rating':None, 'year':None, 'ranking':None, 'title':None}
lis = []
for i in range(1,20):
    lis.append(key_names)
But I'm not sure after doing that how to update the individual dictionaries within the list with the values from the other four pre-generated lists.
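A side note on the attempt above (a small illustration, not part of the original question): `lis.append(key_names)` appends the *same* dictionary object every time, so filling in one entry changes all of them. Building a fresh dict per element avoids that. The names `keys`, `template`, `shared` and `independent` are illustrative only.

```python
keys = ['rating', 'year', 'ranking', 'title']

# Pitfall: appending one dict object repeatedly makes every element an alias.
template = dict.fromkeys(keys)  # {'rating': None, 'year': None, ...}
shared = []
for _ in range(3):
    shared.append(template)
shared[0]['title'] = 'Movie Name'
print(shared[1]['title'])  # Movie Name  (every element changed)

# Fix: create a new dict for each element.
independent = [dict.fromkeys(keys) for _ in range(3)]
independent[0]['title'] = 'Movie Name'
print(independent[1]['title'])  # None
```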

You say you have 4 pre-generated lists; in that case (assuming the i-th items belong together) you can zip them:
mov = ['a', 'b', 'c']
rat = [1, 2, 3]
year = [1999, 2000, 2010]
[{'title': t, 'year': y, 'rating': r} for t, r, y in zip(mov, rat, year)]
which gives:
[{'rating': 1, 'title': 'a', 'year': 1999},
{'rating': 2, 'title': 'b', 'year': 2000},
{'rating': 3, 'title': 'c', 'year': 2010}]
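The same pattern extends to all four lists from the question. A minimal sketch; the list names `ratings`, `rankings`, `titles`, `years` and their values are made-up stand-ins for the scraped data:

```python
# Hypothetical sample data standing in for the scraped IMDB lists.
ratings = [8.5, 8.1, 7.9]
rankings = [10, 11, 12]
titles = ['Movie Name', 'Another Movie', 'Third Movie']
years = [2010, 2004, 1999]

# zip yields the i-th element of every list together; each tuple becomes one dict.
dict_list = [
    {'rating': r, 'ranking': rk, 'year': y, 'title': t}
    for r, rk, y, t in zip(ratings, rankings, years, titles)
]
print(dict_list[0])
# {'rating': 8.5, 'ranking': 10, 'year': 2010, 'title': 'Movie Name'}
```

Note that `zip` stops at the shortest list, so if one list is missing an item the trailing rows are silently dropped; on Python 3.10+ `zip(..., strict=True)` raises `ValueError` instead.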
However, I would personally prefer an immutable structure such as collections.namedtuple here:
from collections import namedtuple
Movie = namedtuple('Movie', ['title', 'year', 'rating', 'ranking'])
mov = ['a', 'b', 'c']
rat = [1, 2, 3]
rank = [10, 9, 10]
year = [1999, 2000, 2010]
>>> [Movie(t, y, rt, rk) for t, rt, y, rk in zip(mov, rat, year, rank)]
[Movie(title='a', year=1999, rating=1, ranking=10),
Movie(title='b', year=2000, rating=2, ranking=9),
Movie(title='c', year=2010, rating=3, ranking=10)]
That's a matter of preference; namedtuple is just an alternative.
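If the namedtuple route is taken but dicts are still needed somewhere (for example, to match the format in the question), each record can be converted back with the namedtuple method `_asdict()`. A short sketch reusing the sample data above:

```python
from collections import namedtuple

Movie = namedtuple('Movie', ['title', 'year', 'rating', 'ranking'])

mov = ['a', 'b', 'c']
rat = [1, 2, 3]
rank = [10, 9, 10]
year = [1999, 2000, 2010]

movies = [Movie(t, y, rt, rk) for t, rt, y, rk in zip(mov, rat, year, rank)]

# Fields are read by name, much like dict keys.
print(movies[0].title, movies[0].year)  # a 1999

# _asdict() turns a record into a mapping; dict() makes it a plain dict
# (on Python 3.7 and earlier _asdict() returns an OrderedDict).
dict_list = [dict(m._asdict()) for m in movies]
print(dict_list[0])  # {'title': 'a', 'year': 1999, 'rating': 1, 'ranking': 10}
```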

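You don't need to pre-populate keys: keys that don't exist yet are created automatically on update.
key_names.update(your_new_list)
Note that update() expects a dict or an iterable of (key, value) pairs, not a flat list of values.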
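For instance, zipping a list of keys with a list of values gives exactly the `(key, value)` pairs that `dict.update` accepts. A small sketch; `record`, `keys` and `values` are illustrative names:

```python
record = {}  # start empty: no need to pre-populate keys

keys = ['rating', 'ranking', 'year', 'title']
values = [8.5, 10, 2010, 'Movie Name']

# update() inserts every key that is not there yet.
record.update(zip(keys, values))
print(record)  # {'rating': 8.5, 'ranking': 10, 'year': 2010, 'title': 'Movie Name'}

# A second update overwrites existing keys and adds new ones.
record.update({'rating': 8.7, 'genre': 'Drama'})
print(record)
```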

Suppose you have the following four lists, with two elements each:
In [177]: l1 = range(2)
In [178]: l2 = range(8, 10)
In [179]: l3 = range(12, 14)
In [180]: l4 = range(15, 17)
Let's create a list where we'll store the result:
In [181]: l = []
Iterate through the first list and create a new dictionary for each iteration:
In [184]: for ind, i in enumerate(l1):
              d = {"a": i, "b": l2[ind], "c": l3[ind], "d": l4[ind]}
              l.append(d)
In [189]: l
Out[189]: [{'a': 0, 'b': 8, 'c': 12, 'd': 15},
{'a': 1, 'b': 9, 'c': 13, 'd': 16}]
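The index bookkeeping can also be left to `zip`: `zip(*columns)` produces one tuple per row, and `zip(keys, row)` pairs that row with the key names. A sketch with the same data; `keys`, `columns` and `rows` are illustrative names:

```python
# The same four ranges as l1..l4 above, plus the key to use for each one.
keys = ['a', 'b', 'c', 'd']
columns = [range(2), range(8, 10), range(12, 14), range(15, 17)]

# zip(*columns) -> (0, 8, 12, 15), (1, 9, 13, 16); each row becomes a dict.
rows = [dict(zip(keys, row)) for row in zip(*columns)]
print(rows)  # [{'a': 0, 'b': 8, 'c': 12, 'd': 15}, {'a': 1, 'b': 9, 'c': 13, 'd': 16}]
```

This works for any number of columns without changing the loop body.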

Related

Adding to a dictionary based on key and value from lists?

I have a dictionary defined as:
letters = {'a': 2, 'b': 1, 'c': 5}
I want to add values to this dictionary based on two lists: one which contains the keys and another which contains the values.
key_list = [a, c]
value_list = [2, 5]
This should give the output:
{a: 4, b: 1, c: 10}
Any ideas on how I can accomplish this? I am new to working with the dictionary structure so I apologise if this is extremely simple.
Thanks.
You can zip the two lists and then add to the dictionary like so:
letters = {'a': 2, 'b': 1, 'c': 5}
key_list = ['a', 'c']
value_list = [2, 5]
for k, v in zip(key_list, value_list):
    letters[k] = letters.get(k, 0) + v
Using the dictionary's get() method as above allows you to add letters that aren't already in the dictionary.
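To see the difference `get()` makes, here is a sketch where one key (`'d'`, added purely for illustration) is not yet in the dictionary; plain `letters[k] += v` would raise `KeyError` for it:

```python
letters = {'a': 2, 'b': 1, 'c': 5}
key_list = ['a', 'c', 'd']  # 'd' is not in letters yet (illustrative)
value_list = [2, 5, 7]

for k, v in zip(key_list, value_list):
    # get(k, 0) returns 0 for a missing key, so 'd' starts from 0.
    letters[k] = letters.get(k, 0) + v

print(letters)  # {'a': 4, 'b': 1, 'c': 10, 'd': 7}

# Indexing a missing key with += fails instead.
try:
    letters['z'] += 1
except KeyError as exc:
    print('KeyError:', exc)
```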
for i in range(len(key_list)):
    letters[key_list[i]] += value_list[i]
You can simply add to or modify values in a dictionary using the key.
For example:
letters = {'a': 2, 'b': 1, 'c': 5}
letters['a'] += 2
letters['c'] += 5
print(letters)
Output: {'a': 4, 'b': 1, 'c': 10}

How to merge multiple lists into 1 but only those elements that were in all of the initial lists?

I need to merge 5 lists, any of which can be empty, so that only items that were in all 5 initial lists are included in the newly formed list.
for filter in filters:
    if filter == 'M':
        filtered1 = []  # imagine that this is filled
    if filter == 'V':
        filtered2 = []  # imagine that this is filled
    if filter == 'S':
        filtered3 = []  # imagine that this is filled
    if filter == 'O':
        filtered4 = []  # imagine that this is filled
    if filter == 'C':
        filtered5 = []  # imagine that this is filled
filtered = []  # merge all 5 lists from above
So now I need to make a list filtered with merged data from all filtered lists 1-5. How should I do that?
The classic way to merge lists is concatenation (note that this keeps every element, not only the ones common to all lists):
filtered = filtered1 + filtered2 + filtered3 + filtered4 + filtered5
Each + appends one list to the result of the previous one.
So if filtered1 was ['a', 'b'], filtered3 was ['c', 'd'], filtered4 was ['e'] and the others were empty,
then you would get:
filtered = ['a', 'b', 'c', 'd', 'e']
Given some lists xs1, ..., xs5:
xss = [xs1, xs2, xs3, xs4, xs5]
sets = [set(xs) for xs in xss]
merged = set.intersection(*sets)
Note that merged is a set, so its elements come out in no particular order.
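If the order of the first list matters, one order-preserving alternative (a sketch, not taken from the answers here; `intersect_ordered` is a hypothetical helper name) keeps the elements of the first list that occur in every other list. Elements must be hashable, as with the set version.

```python
def intersect_ordered(*lists):
    """Items of the first list present in every list, in first-list order, without duplicates."""
    if not lists:
        return []
    others = [set(xs) for xs in lists[1:]]  # fast membership tests
    seen = set()
    result = []
    for x in lists[0]:
        if x not in seen and all(x in s for s in others):
            seen.add(x)
            result.append(x)
    return result

print(intersect_ordered([3, 1, 2, 3], [1, 3], [3, 2, 1]))  # [3, 1]
print(intersect_ordered([1, 2], [], [2]))  # [] (an empty list empties the result)
```

Unlike `set.intersection(*sets)`, which raises `TypeError` when `sets` is empty, this helper returns `[]` when called with no lists.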
f1, f2, f3, f4, f5 = [1], [], [2, 5], [4, 1], [3]
only_merge = [*f1, *f2, *f3, *f4, *f5]
print("Only merge: ", only_merge)
merge_and_sort = sorted([*f1, *f2, *f3, *f4, *f5])
print("Merge and sort: ", merge_and_sort)
merge_and_unique_and_sort = sorted({*f1, *f2, *f3, *f4, *f5})  # a set has no guaranteed order, so sort explicitly
print("Merge, unique and sort: ", merge_and_unique_and_sort)
Output:
Only merge: [1, 2, 5, 4, 1, 3]
Merge and sort: [1, 1, 2, 3, 4, 5]
Merge, unique and sort: [1, 2, 3, 4, 5]

How to assign certain scores from a list to values in multiple lists and get the sum for each value in python?

Could you explain how to assign certain scores from a list to values in multiple lists and get the total score for each value?
score = [1,2,3,4,5]  (assign a score based on the position in the list)
l_1 = [a,b,c,d,e]
assign a=1, b=2, c=3, d=4, e=5
l_2 = [c,a,d,e,b]
assign c=1, a=2, d=3, e=4, b=5
I am trying to get a result like
{'e':9, 'b': 7, 'd':7, 'c': 4, 'a': 3}
Thank you!
You can zip the values of score to each list, which gives you a tuple of (key, value) for each letter-score combination. Make each zipped object a dict. Then use a dict comprehension to add the values for each key together.
d_1 = dict(zip(l_1, score))
d_2 = dict(zip(l_2, score))
{k: v + d_2[k] for k, v in d_1.items()}
# {'a': 3, 'b': 7, 'c': 4, 'd': 7, 'e': 9}
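The comprehension above assumes every key of `d_1` also appears in `d_2` (otherwise `d_2[k]` raises `KeyError`). If the lists might contain different letters, one option is to iterate over the union of the keys and default missing ones to 0. In this sketch `'x'` replaces `'e'` in `l_2` purely for illustration:

```python
score = [1, 2, 3, 4, 5]
l_1 = ['a', 'b', 'c', 'd', 'e']
l_2 = ['c', 'a', 'd', 'x', 'b']  # 'x' instead of 'e' (illustrative)

d_1 = dict(zip(l_1, score))
d_2 = dict(zip(l_2, score))

# keys() views support |, giving every letter that appears in either list.
totals = {k: d_1.get(k, 0) + d_2.get(k, 0) for k in d_1.keys() | d_2.keys()}
print(sorted(totals.items()))
# [('a', 3), ('b', 7), ('c', 4), ('d', 7), ('e', 5), ('x', 4)]
```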
You can use the zip function:
dic = {'a': 0, 'b': 0, 'c': 0, 'd': 0, 'e': 0}
def score(dic, *args):
    for lst in args:
        for k, v in zip(lst, range(len(lst))):
            dic[k] += v + 1
    return dic
l_1 = ['a','b','c','d','e']
l_2 = ['c','a','d','e','b']
score(dic, l_1, l_2)
Instead of storing your lists in separate variables, you should put them in a list of lists so that you can iterate through it and calculate the sums of the scores according to each key's indices in the sub-lists:
score = [1, 2, 3, 4, 5]
lists = [
['a','b','c','d','e'],
['c','a','d','e','b']
]
d = {}
for l in lists:
    for i, k in enumerate(l):
        d[k] = d.get(k, 0) + score[i]
d would become:
{'a': 3, 'b': 7, 'c': 4, 'd': 7, 'e': 9}
from collections import defaultdict
score = [1,2,3,4,5]  # note 0: no need for this list unless the scores are non-consecutive, e.g. [5,6,9,10,4]
l_1 = ['a','b','c','d','e']
l_2 = ['c','a','d','e','b']
score_dict = defaultdict(int)
'''
note 0:
if your scores are always consecutive,
like score = [2,3,4,5,6] or [5,6,7,8,9]...
you don't need a separate list of scores; you can set
start = the score of the character at index 0
(e.g. start = 2 or start = 5) and pass it to enumerate,
otherwise use this function:
def add2ScoreDict(lst):
    for pos_score, char in zip(score, lst):
        score_dict[char] += pos_score
'''
def add2ScoreDict(lst):
    for pos, char in enumerate(lst, start=1):
        score_dict[char] += pos
# note 1
add2ScoreDict(l_1)
add2ScoreDict(l_2)
#print(score_dict)  # defaultdict(<class 'int'>, {'a': 3, 'b': 7, 'c': 4, 'd': 7, 'e': 9})
score_dict = dict(sorted(score_dict.items(), reverse=True, key=lambda x: x[1]))
print(score_dict)  # {'e': 9, 'b': 7, 'd': 7, 'c': 4, 'a': 3}
edit 1:
If you have multiple lists, put them in a list such as list_of_list = [l_1, l_2] so that you don't have to call add2ScoreDict yourself for each one.
# for note 1
list_of_list = [l_1, l_2]
for lst in list_of_list:
    add2ScoreDict(lst)
You could zip both lists with score into one list, l3, and then use a dictionary comprehension with filter to construct your dictionary. The key is index 1 of each newly formed tuple in l3 (the letter), and the value is the sum of the index-0 entries (the scores) of the tuples in l3 whose index 1 matches that letter.
score = [1,2,3,4,5]
l_1 = ['a', 'b', 'c', 'd', 'e']
l_2 = ['c', 'a', 'd', 'e', 'b']
l3 = [*zip(score, l_1), *zip(score,l_2)]
d = {i[1]: sum([j[0] for j in list(filter(lambda x: x[1] ==i[1], l3))]) for i in l3}
{'a': 3, 'b': 7, 'c': 4, 'd': 7, 'e': 9}
Expanded explanation:
d = {}
for i in l3:
    f = list(filter(lambda x: x[1] == i[1], l3))
    vals = []
    for j in f:
        vals.append(j[0])
    total_vals = sum(vals)
    d[i[1]] = total_vals
The simplest way is probably to use a Counter from the Python standard library.
from collections import Counter
tally = Counter()
scores = [1, 2, 3, 4, 5]
def add_scores(letters):
    for letter, score in zip(letters, scores):
        tally[letter] += score
L1 = ['a', 'b', 'c', 'd', 'e']
add_scores(L1)
L2 = ['c', 'a', 'd', 'e', 'b']
add_scores(L2)
print(tally)
$ python tally.py
Counter({'e': 9, 'b': 7, 'd': 7, 'c': 4, 'a': 3})
zip is used to pair letters and scores, a for loop to iterate over them, and a Counter to collect the results. A Counter is a subclass of dict, so you can write things like
tally['a']
to get the score for letter a or
for letter, score in tally.items():
    print('Letter %s scored %s' % (letter, score))
to print the results, just as you would with a normal dictionary.
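`Counter.most_common()` also gives the descending order the question asked for. It returns `(element, count)` pairs sorted by count, and elements with equal counts keep the order in which they were first encountered, so `'b'` comes before `'d'` here. A short sketch:

```python
from collections import Counter

scores = [1, 2, 3, 4, 5]
tally = Counter()
for letters in (['a', 'b', 'c', 'd', 'e'], ['c', 'a', 'd', 'e', 'b']):
    for letter, score in zip(letters, scores):
        tally[letter] += score

# most_common() with no argument returns all pairs, highest count first.
ranked = dict(tally.most_common())
print(ranked)  # {'e': 9, 'b': 7, 'd': 7, 'c': 4, 'a': 3}
```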
Finally, lowercase l and uppercase O can be troublesome in variable names because they are hard to distinguish from the digits 1 and 0. The Python style guide (PEP 8) recommends never using them as single-character variable names.

Representing observations as a dictionary and filtering based on one key and value set

I have a data set (in a file) composed of multiple observations (rows) with various attributes (columns). For example:
AttrA AttrB AttrC
1 12 'a'
2 43 'd'
3 23 'f'
4 25 'z'
I put this data set into a python dictionary such that:
data = {'AttrA':[1,2,3,4],'AttrB':[12,43,23,25],'AttrC':['a','d','f','z']}
I would like to be able to filter the observations based on a criterion on one of the keys. For example, filter observations for AttrA >= 3, such that:
AttrA AttrB AttrC
3 23 'f'
4 25 'z'
or
reducedData = {'AttrA':[3,4],'AttrB':[23,25],'AttrC':['f','z']}
It seems like you could do something like reduceddata = {(k,v) for k,v in data if (??)}, but I'm not sure what goes after the if. Also, is a dictionary the best data type to use for this example? It seems like it would be easier to filter if the data were in a nested list.
Thank you in advance!!
I would change the structure first:
table = [dict(zip(data.keys(), row)) for row in zip(*data.values())]
It'll look like this:
[{'AttrA': 1, 'AttrB': 12, 'AttrC': 'a'},
{'AttrA': 2, 'AttrB': 43, 'AttrC': 'd'},
{'AttrA': 3, 'AttrB': 23, 'AttrC': 'f'},
{'AttrA': 4, 'AttrB': 25, 'AttrC': 'z'}]
Now, you can filter it exactly like you described:
[row for row in table if row['AttrA'] >= 3]
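If the column-oriented form from the question (`reducedData`) is needed afterwards, the filtered rows can be turned back into columns. A sketch of the full round trip; `filtered` and `reduced` are illustrative names:

```python
data = {'AttrA': [1, 2, 3, 4], 'AttrB': [12, 43, 23, 25], 'AttrC': ['a', 'd', 'f', 'z']}

# Columns -> list of row dicts.
table = [dict(zip(data.keys(), row)) for row in zip(*data.values())]

# Filter rows on one attribute.
filtered = [row for row in table if row['AttrA'] >= 3]

# Row dicts -> columns again.
reduced = {k: [row[k] for row in filtered] for k in data}
print(reduced)  # {'AttrA': [3, 4], 'AttrB': [23, 25], 'AttrC': ['f', 'z']}
```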
def my_filter(data, attr, val):
    ind = [i for i, x in enumerate(data[attr]) if x >= val]
    reducedData = {k: [v[i] for i in ind] for k, v in data.items()}
    return reducedData
data = {'AttrA':[1,2,3,4],'AttrB':[12,43,23,25],'AttrC':['a','d','f','z']}
print(my_filter(data, 'AttrA', 3))
output:
{'AttrA': [3, 4], 'AttrB': [23, 25], 'AttrC': ['f', 'z']}
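The `>=` comparison in `my_filter` is hard-coded. One way to generalize it (a sketch; `filter_columns` is a hypothetical name) is to pass the condition in as a function:

```python
def filter_columns(data, attr, predicate):
    """Keep, in every column, the positions where predicate(data[attr][i]) is true."""
    keep = [i for i, x in enumerate(data[attr]) if predicate(x)]
    return {k: [v[i] for i in keep] for k, v in data.items()}

data = {'AttrA': [1, 2, 3, 4], 'AttrB': [12, 43, 23, 25], 'AttrC': ['a', 'd', 'f', 'z']}

print(filter_columns(data, 'AttrA', lambda x: x >= 3))
# {'AttrA': [3, 4], 'AttrB': [23, 25], 'AttrC': ['f', 'z']}
print(filter_columns(data, 'AttrC', lambda c: c in {'a', 'z'}))
# {'AttrA': [1, 4], 'AttrB': [12, 25], 'AttrC': ['a', 'z']}
```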
I think I would go with:
data = [(12, 'a'), (43, 'd'), (23, 'f'), (25, 'z')]
data_dic = dict(enumerate(data, 1))  # the key plays the role of AttrA (assumes it is a 1-based row number)
reducedData = {k: v for k, v in data_dic.items() if k >= 3}

Python dict from lists with key and valuelist

I have three lists of x elements each:
stat = ["A","B","C"]
X = [1,2,3]
Y = [10,15,20]
Now I'd like to create a dict out of these lists, where 'stat' should be the key and X and Y are value pairs stored in a two-element list. The result could look like:
my_dict = {
"A" : [1,10],
"B" : [2,15],
"C" : [3,20]
}
Or it could even be a nested dict, so that I can index it with my_dict["A"]["X"].
Or is there any other way to get a "named array" in Python?
As I have a second question that is closely related to the first one, I'll add it here instead of opening a new one:
I am actually very used to R's arrays, hence my question: is there anything like a named array in Python? E.g. I have two lists which represent my column and row names:
columns = ["A","B","C"], rows = ["row_a","row_b","row_c"]
Now I'd like to create an array from these two lists:
my_array = columns x rows
which I want to index with the names like:
my_array["A","row_b"]
and assign values to the "cells" (populate the array) in a loop.
Is it possible to do such things in Python in an easy way? Probably this is also best done with a dictionary, to allow indexing with strings.
>>> stat = ["A","B","C"]
>>> X = [1,2,3]
>>> Y = [10,15,20]
>>> dict(zip(stat, map(list, zip(X, Y))))
{'A': [1, 10], 'C': [3, 20], 'B': [2, 15]}
Generator-expressions rule:
dict((key, [v1,v2]) for key, v1, v2 in zip(stat, X, Y))
>>> stat = ["A","B","C"]
>>> X = [1,2,3]
>>> Y = [10,15,20]
>>> {s:[x,y] for s,x,y in zip(stat, X, Y)}
{'A': [1, 10], 'C': [3, 20], 'B': [2, 15]}
To be able to use my_dict["A"]["X"], the structure is slightly different (a nested dict):
>>> {s:{'X':x, 'Y':y} for s,x,y in zip(stat, X, Y)}
{'A': {'Y': 10, 'X': 1}, 'C': {'Y': 20, 'X': 3}, 'B': {'Y': 15, 'X': 2}}
Python 3's extended unpacking makes the first form more concise:
>>> stat = ["A","B","C"]
>>> X = [1,2,3]
>>> Y = [10,15,20]
>>> {k:v for k,*v in zip(stat, X, Y)}
{'A': [1, 10], 'C': [3, 20], 'B': [2, 15]}
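For the second question (the R-style named array), a plain dict keyed by `(column, row)` tuples already supports the requested syntax: `my_array["A", "row_b"]` is shorthand for `my_array[("A", "row_b")]`. A minimal sketch that populates every cell in a loop (the initial value 0 and the value 42 are arbitrary):

```python
from itertools import product

columns = ["A", "B", "C"]
rows = ["row_a", "row_b", "row_c"]

# One entry per (column, row) combination, filled in a loop.
my_array = {}
for col, row in product(columns, rows):
    my_array[col, row] = 0

my_array["A", "row_b"] = 42
print(my_array["A", "row_b"])  # 42
print(len(my_array))  # 9
```

If pandas is available, `pandas.DataFrame(index=rows, columns=columns)` with `df.loc["row_b", "A"]` gives labelled 2-D indexing closer to R's arrays and data frames.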
