For a test program I'm making a simple model of the NFL. I'd like to assign a record (wins and losses) to a team as a value in a dictionary. Is that possible?
For example:
afcNorth = ["Baltimore Ravens", "Pittsburgh Steelers", "Cleveland Browns", "Cincinnati Bengals"]
If the Ravens had 13 wins and 3 losses, can the dictionary account for both of those values? If so, how?
Sure, just make the value a list or tuple:
afc = {'Baltimore Ravens': (10,3), 'Pb Steelers': (3,4)}
If it gets more complicated, you might want to make a more complicated structure than a tuple - for example if you like dictionaries, you can put a dictionary in your dictionary so you can dictionary while you dictionary.
afc = {'Baltimore Ravens': {'wins':10,'losses': 3}, 'Pb Steelers': {'wins': 3,'losses': 4}}
But eventually you might want to move up to classes...
The values in the dictionary can be tuples or, maybe better in this case, lists:
d = {"Baltimore Ravens": [13, 3]}
d["Baltimore Ravens"][0] += 1
print(d)
# {'Baltimore Ravens': [14, 3]}
Well, you can use a tuple (or a list):
records = {}
records["Baltimore Ravens"] = (13, 3)
Or you could be fancy and make a Record class with Record.wins and Record.losses, but that's probably overkill.
(As another answer points out, using a list means that you can do arithmetic on the values, which might be useful.)
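If you do outgrow tuples and lists, a small class is not much extra work. The following is only a sketch: the Record name comes from the answer above, while the dataclass approach and the win_pct helper are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Win/loss record for one team (illustrative sketch)."""
    wins: int = 0
    losses: int = 0

    def win_pct(self) -> float:
        # Hypothetical helper: fraction of games won, 0.0 if no games played.
        games = self.wins + self.losses
        return self.wins / games if games else 0.0


records = {"Baltimore Ravens": Record(13, 3)}
records["Baltimore Ravens"].wins += 1  # attributes are mutable, like list items
print(records["Baltimore Ravens"])
```

Compared with a bare tuple, the named attributes make it harder to mix up which number is wins and which is losses.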
Related
I'm working on a Premier League dataset and I need to create a dictionary where the keys are the teams and the values are their respective points. I have a list of the teams and a function that takes the match results and transforms them into points for each team. Everything else works, but instead of creating one dictionary with all the teams and their scores, my code prints 20 dictionaries, one for each team. What is wrong?
You are creating a new dictionary at each iteration. Instead you should make the dictionary before the loop and then add a new entry at each iteration:
def get_team_points(df, teams):
    team_points = {}
    for team_name in teams:
        num_points = ...  # as you have it but since you posted an image I'm not rewriting it
        team_points[team_name] = num_points
    return team_points
A neater solution is to use a dictionary comprehension:
def get_team_points(df, teams):
    team_points = {team: get_num_points(team, df) for team in teams}
    return team_points
where get_num_points is a function of your num_points = ... line, which again I would type out if you had posted the code as text :)
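As a self-contained illustration of the same pattern without pandas, the sketch below assumes the matches are a plain list of (home, away, home_points, away_points) tuples. That layout and the body of get_num_points are hypothetical stand-ins for the code that was only posted as an image.

```python
def get_num_points(team, matches):
    """Hypothetical stand-in: sum the points a team earned at home and away."""
    total = 0
    for home, away, home_points, away_points in matches:
        if team == home:
            total += home_points
        elif team == away:
            total += away_points
    return total


def get_team_points(matches, teams):
    # One dictionary is built once, with one entry per team.
    return {team: get_num_points(team, matches) for team in teams}


# Made-up sample data, for illustration only.
matches = [("A", "B", 3, 0), ("B", "C", 1, 1), ("C", "A", 0, 3)]
team_points = get_team_points(matches, ["A", "B", "C"])
print(team_points)
```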
Also - please start using better variable names ;) your life will improve if you do. Names like List and Dict are really bad since:
they're not descriptive
they shadow the List and Dict names from the typing module (which you should use)
they violate PEP 8 naming conventions
And speaking of the typing module, here it is in action:
from typing import Dict, List
import pandas as pd

def get_team_points(df: pd.DataFrame, teams: List[str]) -> Dict[str, int]:
    team_points = {team: get_num_points(team, df) for team in teams}
    return team_points
Now you can use a tool like mypy to catch errors before they occur. If you use an IDE instead of Jupyter, it will highlight errors as you go. Your code also becomes much clearer for other developers (including future you) to understand and use.
I think perhaps you want this:
def get_team_points(df, teams):
    Dict = {}
    for team_name in teams:
        num_points = TeamPoints(...)
        Dict[team_name] = num_points
    print(Dict)
In the TeamsPointDict() method, you are creating a new dictionary for each team in the list.
To insert all of them into one dictionary, declare the dictionary outside the for loop.
You want to take the sum of HP for Home teams, and AP for Away teams and add them together by team. Instead of manually separating, you can use two groupby operations and sum the results.
The return of each groupby will be a Series that we can then add together as pandas aligns on the index (teams in this case). Then with Series.to_dict() we get the entire dictionary at once.
import pandas as pd
df = pd.DataFrame({'HomeTeam': list('AABCDA'), 'AwayTeam': list('CBAAAB'),
                   'HP': [4,5,6,7,8,10], 'AP': [0,0,10,11,4,7]})
  HomeTeam AwayTeam  HP  AP
0        A        C   4   0
1        A        B   5   0
2        B        A   6  10
3        C        A   7  11
4        D        A   8   4
5        A        B  10   7
# Fill value so addition works if a team has exclusively home/away games.
s = df.groupby('HomeTeam')['HP'].sum().add(df.groupby('AwayTeam')['AP'].sum(),
                                           fill_value=0).astype(int)
s.to_dict()
{'A': 44, 'B': 13, 'C': 7, 'D': 8}
You should define your dictionary before the loop, then add your values:
dic = {}
for team_name in List:
    dic[team_name] = num_points
Say I have a dictionary like this :
d = {'ben' : 10, 'kim' : 20, 'bob' : 9}
Is there a way to remove a pair like ('bob',9) from the dictionary?
I already know about d.pop('bob') but that will remove the pair even if the value was something other than 9.
Right now the only way I can think of is something like this:
if (d.get('bob', None) == 9):
    d.pop('bob')
But is there an easier way, possibly one that doesn't use if at all?
pop also returns the value, so performance-wise (as negligible as the difference may be) and readability-wise it might be better to use del.
Other than that I don't think there's something easier/better you can do.
from timeit import Timer
def _del():
    d = {'a': 1}
    del d['a']
def _pop():
    d = {'a': 1}
    d.pop('a')
print(min(Timer(_del).repeat(5000, 5000)))
# 0.0005624240000000613
print(min(Timer(_pop).repeat(5000, 5000)))
# 0.0007729860000003086
You want to perform two operations here
1) You want to test the condition d['bob']==9.
2) You want to remove the key along with value if the 1st answer is true.
So we cannot omit the test, which requires an if, altogether. But we can certainly do it in one line.
d.pop('bob') if d.get('bob')==9 else None
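If this comes up often, the test-then-delete step can be wrapped in a small helper. This is only a sketch: remove_pair and the _MISSING sentinel are hypothetical names, and the sentinel is there so that a missing key is never mistaken for a stored value of None.

```python
_MISSING = object()  # unique sentinel that cannot equal any stored value


def remove_pair(d, key, value):
    """Delete d[key] only if the key is present and maps to value.

    Returns True if the pair was removed, False otherwise.
    """
    if d.get(key, _MISSING) == value:
        del d[key]
        return True
    return False


d = {'ben': 10, 'kim': 20, 'bob': 9}
remove_pair(d, 'bob', 9)    # pair matches, so it is removed
remove_pair(d, 'kim', 99)   # value differs, so the entry is left alone
print(d)
```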
I have a dictionary currently set up as
{'name': 'firm', 'name':'firm', etc},
Where keys are analyst names and values are analyst firms.
I am trying to create a new dictionary where the new values are the old k,v pairs and the associated key is simply the index (1, 2, 3, 4, etc).
Current code is below:
num_analysts = len(analysts.keys())
for k,v in analysts.items():
    analysts_dict = dict.fromkeys(range(num_analysts), [k,v])
Current result:
Each numeric key is given the same value (an old k,v pair). What is wrong with my expression?
You can enumerate the items and convert them to a dictionary. Note, however, that before Python 3.7 dictionaries were not guaranteed to preserve insertion order, so on older versions the numbers may be assigned essentially at random.
dict(enumerate(analysts.items(), 1))
#{1: ('name1', 'firm1'), 2: ('name2', 'firm2')}
Use enumerate and a dictionary comprehension for this:
d = {'name1': 'firm1', 'name2': 'firm2'}
d2 = {idx: '{}, {}'.format(item, d[item]) for idx, item in enumerate(d, start = 1)}
{1: 'name1, firm1', 2: 'name2, firm2'}
Others have already posted effective answers, so I will just explain why your own solution doesn't work properly. It may be caused by late binding. There is a good resource at: http://quickinsights.io/python/python-closures-and-late-binding/
Late binding will literally pick up the last item in the dictionary you created. But this last one is not necessarily the last one you wrote; it is determined by the OS. (Other people have already given some explanation of the dict data structure.)
Each time you run it from the Python command line the result may change. If you put the code in a .py file and run it in an IDE, the result will be the same each time (always the last one in the dict).
During each iteration, analysts_dict is assigned value based on the result of dict.items().
However, you should use comprehension to generate the final result in one line,
E.g. [{i: e} for i, e in enumerate(analysts.items())]
analysts = {
    "a": 13,
    "b": 123,
    "c": 1234
}
num_analysts = len(analysts.keys())
analysts_dict = [{i: e} for i, e in enumerate(analysts.items())]
print(analysts_dict)
>> [{0: ('a', 13)}, {1: ('b', 123)}, {2: ('c', 1234)}]
This code
for k,v in analysts.items():
    analysts_dict = dict.fromkeys(range(num_analysts), [k,v])
loops over the original dict and on each loop iteration it creates a new dict using the range numbers as the keys. By the way, every item in that dict shares a reference to a single [k, v] list object. That's generally a bad idea. You should only use an immutable object (e.g. None, a number, or a string) as the value arg to the dict.fromkeys method. The purpose of the method is to let you create a dict with a simple default value for the keys you supply; you can't use it to make a dict with lists as the values if you want those lists to be separate lists.
The new dict object is bound to the name analysts_dict. On the next loop iteration, a new dict is created and bound to that name, replacing the one just created on the previous loop, and the replaced dict is destroyed.
So you end up with an analysts_dict containing a bunch of references to the final [k, v] pair read from the original dict.
To get your desired result, you should use DYZ's code, which I won't repeat here. Note that it stores the old name & firm info in tuples, which is better than using lists for this application.
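The shared-reference behaviour described above is easy to see in a short experiment. This is a minimal sketch with made-up analyst names; the last two lines use the dict(enumerate(...)) approach from the earlier answer for comparison.

```python
analysts = {"name1": "firm1", "name2": "firm2"}

# dict.fromkeys stores the SAME list object under every key.
shared = dict.fromkeys(range(2), ["k", "v"])
shared[0].append("oops")        # mutating one value...
print(shared)                   # ...shows up under every key
print(shared[0] is shared[1])   # the values are one and the same object

# Enumerating the items gives each key its own (name, firm) tuple.
indexed = dict(enumerate(analysts.items(), 1))
print(indexed)
```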
I want to put this data into one list so I can sort it by timestamp. I tried itertools.chain but that didn't really work.
Thank you for your help :)
I'm very bad at making clear what I want to do, so I'm sorry up front if this takes some explaining.
If I try a chain I get the value back like this.
I want to display it on the HTML page like this:
date, name, rating, text (newline)
likes comments
That would work the way I did it, but if I want to sort it by time it wouldn't, so I tried to think of a way to turn it into a sortable list that can be displayed. Is that understandable?
['Eva Simia', 'Peter Alexander', {'scale': 5, 'value': 5}, {'scale': 5, 'value': 5}, 1, 0, 1, 0]
it should look like this:
['Peter Alexander, scale:5, value:5, 1,0]
['Eva Simia, scale:5, value:5, 1,0]
for i in user:
    name.append(i['name'])
for i in next_level:
    rating_values.append(i['rating'])
for i in comment_values:
    comments_count.append(i['count'])
for i in likes_values:
    likes_count.append(i['count'])
for s in rating_values:
    ratings.append(s['value'])
for s in date:
    ratings.append(s['date'])
ab = itertools.chain([name], [rating_values],
                     [comments_count], [likes_values],
                     [comment_values], [date])
list(ab)
Updated after clarification:
The problem as I understand it:
You have a dataset that is split into several lists, one list per field.
Every list has the records in the same order. That is, user[x]'s rating value is necessarily rating_values[x].
You need to merge that information into a single list of composite items. You'd use zip() for that:
merged = zip(user, next_level, comment_values, likes_values, rating_values, date)
# merged is now [(user[0], next_level[0], comment_values[0], ...),
# (user[1], next_level[1], comment_values[1], ...),
# ...]
From there, you can simply sort your list using sorted():
result = sorted(merged, key=lambda i: (i[5], i[0]))
The key argument must be a function. It is called once for each item in the list and must return the key that will be used to compare items. Here, we build a short function on the fly that returns the date and username, so items are sorted first by date and, if the dates are equal, by username.
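Here is a small runnable sketch of the zip-then-sort idea. The sample names, ratings, and dates are made up, and the dates are assumed to be ISO-formatted strings so that plain string comparison orders them chronologically.

```python
# Parallel lists: position x in each list describes the same review.
names = ["Eva Simia", "Peter Alexander"]
ratings = [{"scale": 5, "value": 5}, {"scale": 5, "value": 4}]
dates = ["2015-03-02", "2015-01-15"]  # assumed ISO date strings

# zip() glues the fields of each record together into one tuple.
merged = list(zip(names, ratings, dates))

# Sort by date first, then by name to break ties.
result = sorted(merged, key=lambda rec: (rec[2], rec[0]))

for name, rating, date in result:
    print(date, name, rating["value"])
```

Each element of result is one complete record, so it can be handed to a template and rendered row by row.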
[Past answer about itertools.chain, before the clarification]
ab = list(itertools.chain(
(i['name'] for i in user),
(i['rating'] for i in next_level),
(i['count'] for i in comment_values),
(i['count'] for i in likes_values),
(i['value'] for i in rating_values),
(i['date'] for i in date),
))
The point of using itertools.chain is usually to avoid needless copies and intermediate objects. To do that, you want to pass it iterators.
chain will return an iterator that iterates through each of the given iterators, one at a time, moving to the next iterator when the current one is exhausted.
Note that every generator expression has to be wrapped in parentheses, or Python will complain. Do not use square brackets, as that would cause an intermediate list to be built.
You can join lists by simply using +.
l = name + rating_values + comments_count + ...
date, rating_values,likes_values,comment_values,next_level,user = (list(t) for t in zip(*sorted(zip(date, rating_values,likes_values,comment_values,next_level,user))))
Using Python 2.7.9: I have a list of dictionaries that each hold a 'data' item. How do I collect each item into a list so I can get the mean and standard deviation? Here's an example:
values = [{'colour': 'b.-', 'data': 12.3}, {'colour': 'b.-', 'data': 11.2}, {'colour': 'b.-', 'data': 9.21}]
So far I have:
val = []
for each in values:
    val.append(each.items()[1][1])
print np.mean(val) # gives 10.903
print np.std(val) # gives 1.278
Crude and not very Pythonic(?)
Using list comprehension is probably easiest. You can extract the numbers like this:
numbers = [x['data'] for x in values]
Then you just call NumPy's mean/std/etc. functions on that, just like you're doing.
Apologies for (perhaps) an unnecessary question, I've seen this:
average list of dictionaries in python
vals = [i['data'] for i in values]
np.mean(vals) # gives 10.903
np.std(vals) # gives 1.278
(Pythonic solution?)
It is an exceptionally bad idea to index into a dictionary since it has no guarantee of order. Sometimes the 'data' element could be first, sometimes it could be second. There is no way to know without checking.
When using a dictionary, you should almost always access elements by key. In dictionary notation, this is { key:value, ... } where each key is "unique". Two keys count as the same key when they have the same hash and compare equal, so for example 1 and 1.0 refer to the same entry.
Keeping this in mind, we have the more pythonic:
val = []
for data_dict in values:
    val.append(data_dict['data'])
If you want to be fancy, you can use a list comprehension, which is a compact way of generating a list from a more complex statement.
val = [data_dict['data'] for data_dict in values]
To be even more fancy, you can add a few conditionals to check for errors.
val = [data_dict['data'] for data_dict in values if (data_dict and 'data' in data_dict)]
What this fanciest version does is filter the results of the for data_dict in values iteration with if (data_dict and 'data' in data_dict), so that the only data_dict instances used in data_dict['data'] are the ones that pass the if-check.
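To see the filter in action, here is a sketch that adds two hypothetical malformed entries to the sample data. It uses the standard library's statistics module rather than NumPy so that it runs without extra installs; statistics.pstdev is the population standard deviation, which is what np.std computes by default.

```python
import statistics

values = [
    {'colour': 'b.-', 'data': 12.3},
    {'colour': 'b.-', 'data': 11.2},
    {'colour': 'b.-', 'data': 9.21},
    {'colour': 'r.-'},  # hypothetical entry with no 'data' key
    {},                 # hypothetical empty entry
]

# Entries that are empty or lack a 'data' key are skipped instead of raising KeyError.
val = [data_dict['data'] for data_dict in values if (data_dict and 'data' in data_dict)]

mean = statistics.mean(val)
std = statistics.pstdev(val)  # population std, matching np.std's default
print(round(mean, 3), round(std, 3))
```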
You want a Pythonic one-liner?
data = [k['data'] for k in values]
print("Mean:\t"+ str(np.mean(data)) + "\nstd :\t" + str(np.std(data)))
You could use the one-liner
print("Mean:\t"+ str(np.mean([k['data'] for k in values])) + "\nstd :\t" + str(np.std([k['data'] for k in values])))
but there really is no point, as both print
Mean: 10.9033333333
std : 1.27881021092
and the former is more readable.