Dictionary as key value? - python

I have been searching for my answer, probably just not using the right verbiage, and only come up with using lists as dictionary key values.
I need to take 20 CSV files and anonymize identifying student, teacher, school and district information so the testing data can be used for research. The files range from 20K to 50K rows and from 11 to 20 columns, and they do not all contain the same fields.
One file may have:
studid, termdates, testname, score, standarderr
And another may have:
termdates, studid, studfirstname, studlastname, studdob, ethnicity, grade
And yet another may have:
termdates, studid, teacher, classname, schoolname, districtname
I am putting the varying data into dictionaries, one for each type of file/dataset. Maybe this isn't the best approach, but I am getting stuck when trying to use a dictionary as a key value for cases where a student has taken multiple tests, e.g. Language, Reading, Math, etc.
For instance:
studDict{studid{'newid': 12345, 'dob': 1/1/1, test1:{'score': 50, 'date': 1/1/15}, test2:{'score': 50, 'date': 1/1/15}, 'school': 'Hard Knocks'},
studid1{'newid': 12345, 'dob': 1/1/1, test1:{'score': 50, 'date': 1/1/15}, test2:{'score': 50, 'date': 1/1/15}, 'school': 'Hard Knocks'}}
Any guidance on which libraries or a brief direction to a method would be greatly appreciated. I understand enough Python that I do not need a full hand holding, but helping me get across the street would be great. :D
CLARIFICATION
I have a better chance of winning the lottery than this project does of being used more than once, so the simpler the method the better. If it would be a repeating project I would most likely dump the data into db tables and work from there.

A dictionary cannot be a key, but a dictionary can be a value for some key in another dictionary (a dict-of-dicts). However, instantiating dictionaries of varying length for every tuple is probably going to make your data analysis very difficult.
Consider using Pandas to read the tuples into a DataFrame with null values where appropriate.
dict API: https://docs.python.org/2/library/stdtypes.html#mapping-types-dict
Pandas Data handling package: http://pandas.pydata.org/

You cannot use a dictionary as a key to a dictionary. Keys must be hashable, and dictionaries, being mutable, are not hashable, so they cannot be used as keys.
You can store a dictionary in another dictionary just the same as any other value. You can, for example do
studDict = {studid: {'newid': 12345, 'dob': '1/1/1', 'test1': {'score': 50, 'date': '1/1/15'}, 'test2': {'score': 50, 'date': '1/1/15'}, 'school': 'Hard Knocks'},
            studid1: {'newid': 12345, 'dob': '1/1/1', 'test1': {'score': 50, 'date': '1/1/15'}, 'test2': {'score': 50, 'date': '1/1/15'}, 'school': 'Hard Knocks'}}
assuming you have defined studid and studid1 elsewhere. (The dates and test names are quoted here; left unquoted, 1/1/15 would be evaluated as a division.)
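As a rough sketch of how such a dict-of-dicts could be filled in one test at a time, the following uses made-up student IDs, test names and scores, and a hypothetical helper called add_test. It is only meant to show the nesting, not a finished design.

```python
# Hypothetical sample data; the IDs, test names and values are made up.
students = {}

def add_test(students, studid, test_name, score, date):
    """Record one test result under a student's nested 'tests' dict."""
    # setdefault creates the student's record the first time it is seen.
    record = students.setdefault(studid, {"tests": {}})
    record["tests"][test_name] = {"score": score, "date": date}

add_test(students, "S001", "Reading", 50, "2015-01-01")
add_test(students, "S001", "Math", 62, "2015-01-08")
add_test(students, "S002", "Reading", 71, "2015-01-01")

print(students["S001"]["tests"]["Math"]["score"])  # 62
print(sorted(students["S001"]["tests"]))           # ['Math', 'Reading']
```

Keeping the tests in their own nested dictionary means a student can have any number of them without changing the shape of the outer record.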

If I interpret you correctly, in the end you want a dict with students (i.e. studid) as key and different student related data as value? This is probably not exactly what you want, but I think it will point you in the right direction (adapted from this answer):
import csv
from collections import namedtuple, defaultdict

# files is assumed to be a list of CSV file paths defined elsewhere
D = defaultdict(list)
for filename in files:
    with open(filename, mode="r") as infile:
        reader = csv.reader(infile)
        # Build a namedtuple type from this file's header row
        Data = namedtuple("Data", next(reader))
        for row in reader:
            data = Data(*row)
            D[data.studid].append(data)
In the end that should give you a dict D with studid values as keys and a list of test results as values. Each test result is a namedtuple. This assumes that every file has a studid column!
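A variation on the same idea, sketched with csv.DictReader so that files with different column layouts can share one loop, and with a simple sequential replacement for studid. The in-memory "files", the sample rows and the names new_ids and records are all assumptions for illustration; real files would be opened with open() instead.

```python
import csv
import io
from collections import defaultdict

# Two in-memory stand-ins for CSV files with different column layouts
# (hypothetical sample data).
file_a = io.StringIO("studid,termdates,testname,score\n"
                     "111,2015-01-01,Reading,50\n"
                     "222,2015-01-01,Math,61\n")
file_b = io.StringIO("termdates,studid,schoolname\n"
                     "2015-01-01,111,Hard Knocks\n")

new_ids = {}                 # original studid -> anonymized id
records = defaultdict(list)  # anonymized id -> list of row dicts

for infile in (file_a, file_b):
    for row in csv.DictReader(infile):
        # Remove the identifying column from the row before storing it.
        original = row.pop("studid")
        # Assign the next sequential id the first time a student is seen.
        anon = new_ids.setdefault(original, len(new_ids) + 1)
        records[anon].append(row)

print(dict(records))
```

Because the same new_ids mapping is reused across files, a student keeps the same anonymized id everywhere, which is what allows the datasets to be joined afterwards.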

If you can know the order of a file ahead of time, it's not hard to make a dictionary for it with help from csv.
File tests.csv:
12345,2015-05-19,AP_Bio,96,0.12
67890,2015-04-28,AP_Calc,92,0.17
In a Python file in the same directory as tests.csv:
import csv

with open("tests.csv") as tests:
    # Change the fields for files that follow a different form
    fields = ["studid", "termdates", "testname", "score", "standarderr"]
    students_data = list(csv.DictReader(tests, fieldnames=fields))

# Just a pretty show (exact formatting and key order depend on the Python version)
print(*students_data, sep="\n")
# {'studid': '12345', 'testname': 'AP_Bio', 'standarderr': '0.12', 'termdates': '2015-05-19', 'score': '96'}
# {'studid': '67890', 'testname': 'AP_Calc', 'standarderr': '0.17', 'termdates': '2015-04-28', 'score': '92'}

Please be more explicit; the solution depends on the design.
A district contains schools, and each school contains teachers and students.
First, organize your data by district and school:
districts = {
    "name_district1": {...},
    "name_district2": {...},
    ...,
    "name_districtn": {...},
}
for each district:
# "name_districtn"
{
    "name_school1": {...},
    "name_school2": {...},
    ...,
    "name_schooln": {...},
}
for each school:
# "name_schooln"
{
    id_student1: {...},
    id_student2: {...},
    ...,
    id_studentn: {...}
}
and for each student you define that student's elements.
You can also define a single dictionary for all the students, but in that case you have to design a unique ID for each student, for example:
uniq_Id = "".join(("name_district", "name_school", str(student_id)))
Total = {
    uniq_Id: {'dob': '1/1/1', 'test1': {'score': 50, 'date': '1/1/15'}, 'test2': {'score': 50, 'date': '1/1/15'}, 'school': 'Hard Knocks'},
    ...,
}
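As an alternative to joining strings, a tuple can serve as the composite key, since tuples of hashable values are themselves hashable. This is a minimal sketch with made-up district, school and student values. It also avoids the collisions that plain concatenation can produce (for example "ab" + "c" and "a" + "bc" give the same string).

```python
# Hypothetical district, school and student values for illustration.
students = {}

key = ("District 9", "Hard Knocks", 12345)
students[key] = {"dob": "2001-01-01", "tests": {"Math": {"score": 50}}}

# The same student id in a different school gets its own entry.
students[("District 9", "Easy Street", 12345)] = {"dob": "2002-02-02", "tests": {}}

print(students[key]["tests"]["Math"]["score"])  # 50

# Select every student in one school by unpacking the key.
in_school = [sid for (district, school, sid) in students if school == "Hard Knocks"]
print(in_school)  # [12345]
```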

Related

Given the value of a dictionary field, how can I find a dictionary in a list of dictionaries?

Based on a list of fastfoods (list of dictionaries), for each fastfood (each dictionary), I'm trying to 1) extract the competitor's name and then 2) use that competitor's name to retrieve its shortname.
Currently my solution really doesn't make sense and I have a feeling it might be recursive? I'm really having an issue conceptualizing this.
fastfoods = [{'id': 30, 'name': 'McDonalds', 'shortname': 'MC', 'competitor': 'BurgerKing'}, {'id': 47, 'name': 'BurgerKing', 'shortname': 'BK', 'competitor': None}]
for fastfood in fastfoods:
    competitor_name = fastfood.get('competitor')
    short_name = fastfood.get('shortname')
    for fastfood in fastfoods:
        if competitor_name == short_name:
            print(fastfood.get('shortname'))
Here is what I'm trying to achieve, using this limited example of two dictionaries (the real data has thousands).
So here, I loop over the dictionaries, I reach the first dictionary, I extract the competitor's name ('BurgerKing'). At this point, I want to search for 'BurgerKing' as a 'name' field (not as a competitor field). Once that's found, I access that dictionary where the 'name' field == 'BurgerKing' and extract the shortname ('BK').
I think you're looking for something like this:
byName = {dct['name']: dct for dct in fastfoods}
for fastfood in fastfoods:
    if 'competitor' in fastfood and fastfood['competitor'] is not None:
        competitor = byName[fastfood['competitor']]
        if 'shortname' in competitor:
            print(competitor['shortname'])
        else:
            print(f"competitor {fastfood['competitor']} has no shortname")
Explanation:
create byName, a dictionary that indexes dicts in fastfoods by their name entry
iterate over all dicts in fastfoods
if a given dict has a competitor entry and it's non-null, look up the dict for that competitor by name in byName and if it has a shortname entry print it
otherwise print a message indicating there is no shortname entry for the competitor (you can do something different in this case if you like).
I would first construct a dictionary that maps a name to its shortened version, and then use it. This would be way faster than looking for the competitor in the list over and over again.
fastfoods = [{'id': 30, 'name': 'McDonalds', 'shortname': 'MC', 'competitor': 'BurgerKing'}, {'id': 47, 'name': 'BurgerKing', 'shortname': 'BK', 'competitor': None}]
name_to_short = {dct['name']: dct['shortname'] for dct in fastfoods}
for dct in fastfoods:
    print(f"The competitor of {dct['name']} is: {name_to_short.get(dct['competitor'], 'None')}")
# The competitor of McDonalds is: BK
# The competitor of BurgerKing is: None

Create a dictionary from list of dictionaries selecting specific values

I have a list of dictionaries as below and I'd like to create a dictionary to store specific data from the list.
test_list = [
{
'id':1,
'colour':'Red',
'name':'Apple',
'edible': True,
'price':100
},
{
'id':2,
'colour':'Blue',
'name':'Blueberry',
'edible': True,
'price':200
},
{
'id':3,
'colour':'Yellow',
'name':'Crayon',
'edible': False,
'price':300
}
]
For instance, a new dictionary to store just the {id, name, price} of the various items.
I created several lists:
id_list = []
name_list = []
price_list = []
Then I added the data I want to each list:
for n in test_list:
    id_list.append(n['id'])
    name_list.append(n['name'])
    price_list.append(n['price'])
But I can't figure out how to create a dictionary (or a more appropriate structure?) to store the data in the {id, name, price} format I'd like. Appreciate help!
If you don't have too much data, you can use this nested list/dictionary comprehension:
keys = ['id', 'name', 'price']
result = {k: [x[k] for x in test_list] for k in keys}
That'll give you:
{
'id': [1, 2, 3],
'name': ['Apple', 'Blueberry', 'Crayon'],
'price': [100, 200, 300]
}
I think a list of dictionaries is still the right data format, so this:
test_list = [
{
'id':1,
'colour':'Red',
'name':'Apple',
'edible': True,
'price':100
},
{
'id':2,
'colour':'Blue',
'name':'Blueberry',
'edible': True,
'price':200
},
{
'id':3,
'colour':'Yellow',
'name':'Crayon',
'edible': False,
'price':300
}
]
keys = ['id', 'name', 'price']
limited = [{k: v for k, v in d.items() if k in keys} for d in test_list]
print(limited)
Result:
[{'id': 1, 'name': 'Apple', 'price': 100}, {'id': 2, 'name': 'Blueberry', 'price': 200}, {'id': 3, 'name': 'Crayon', 'price': 300}]
This is nice, because you can access its parts like limited[1]['price'].
However, your use case is perfect for pandas, if you don't mind using a third party library:
import pandas as pd
test_list = [
{
'id':1,
'colour':'Red',
'name':'Apple',
'edible': True,
'price':100
},
{
'id':2,
'colour':'Blue',
'name':'Blueberry',
'edible': True,
'price':200
},
{
'id':3,
'colour':'Yellow',
'name':'Crayon',
'edible': False,
'price':300
}
]
df = pd.DataFrame(test_list)
print(df['price'][1])
print(df)
The DataFrame is perfect for this stuff and selecting just the columns you need:
keys = ['id', 'name', 'price']
df_limited = df[keys]
print(df_limited)
The reason I'd prefer either to a dictionary of lists is that manipulating the dictionary of lists will get complicated and error prone and accessing a single record means accessing three separate lists - there's not a lot of advantages to that approach except maybe that some operations on lists will be faster, if you access a single attribute more often. But in that case, pandas wins handily.
In the comments you asked "Let's say I had item_names = ['Apple', 'Teddy', 'Crayon'] and I wanted to check if one of those item names was in the df_limited variable or I guess the df_limited['name'] - is there a way to do that, and if it is then print say the price, or manipulate the price?"
There are many ways, of course. I recommend looking into some online pandas tutorials; it's a very popular library with excellent documentation and teaching materials online.
However, just to show how easy it would be in both cases, retrieving the matching objects or just the prices for them:
item_names = ['Apple', 'Teddy', 'Crayon']
items = [d for d in test_list if d['name'] in item_names]
print(items)
item_prices = [d['price'] for d in test_list if d['name'] in item_names]
print(item_prices)
items = df[df['name'].isin(item_names)]
print(items)
item_prices = df[df['name'].isin(item_names)]['price']
print(item_prices)
Results:
[{'id': 1, 'colour': 'Red', 'name': 'Apple', 'edible': True, 'price': 100}, {'id': 3, 'colour': 'Yellow', 'name': 'Crayon', 'edible': False, 'price': 300}]
[100, 300]
   id    name  price
0   1   Apple    100
2   3  Crayon    300
0    100
2    300
In the example with the dataframe there are a few things to note. It uses .isin() because the plain in operator doesn't work inside the boolean selection syntax that dataframes support, df[<some condition on df using df>], but pandas has fast and easy-to-use alternatives for all standard operations. More importantly, you can just do the work on the original df, which already has everything you need in it.
And let's say you wanted to double the prices for these products:
df.loc[df['name'].isin(item_names), 'price'] *= 2
This uses .loc for technical reasons (you can't modify just any view of a dataframe), but that's way too much to get into in this answer - you'll learn looking into pandas. It's pretty clean and simple though, I'm sure you agree. (you could use .loc for the previous example as well)
In this trivial example, both run instantly, but you'll find that pandas performs better for very large datasets. Also, try writing the same examples using the method you requested (as provided in the accepted answer) and you'll find that it's not as elegant, unless you start by zipping everything together again:
item_prices = [p for i, n, p in zip(*result.values()) if n in item_names]
Getting out a result that has the same structure as result is trickier still: it involves more zipping and unpacking, or requires you to go over the lists twice.
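To make the zipping and unpacking concrete, here is one possible sketch that filters the dictionary of lists and rebuilds the same structure. It assumes result is the dictionary of lists built earlier, with all lists the same length; name_pos, rows and filtered are illustrative names.

```python
# The dictionary-of-lists structure from the earlier answer.
result = {
    'id': [1, 2, 3],
    'name': ['Apple', 'Blueberry', 'Crayon'],
    'price': [100, 200, 300],
}
item_names = ['Apple', 'Teddy', 'Crayon']

# Position of the 'name' column among the dictionary's keys.
name_pos = list(result).index('name')

# zip(*...) turns the parallel lists into per-item rows.
rows = [row for row in zip(*result.values()) if row[name_pos] in item_names]

# zip(*rows) transposes the surviving rows back into columns.
filtered = {key: list(col) for key, col in zip(result, zip(*rows))}
print(filtered)
# {'id': [1, 3], 'name': ['Apple', 'Crayon'], 'price': [100, 300]}
```

Note that if nothing matches, rows is empty and filtered comes out as an empty dictionary rather than a dictionary of empty lists, which is one of the edge cases that makes this structure awkward to work with.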

How to identify a category and print from a dictionary

New to the world of Python. I am trying to get a list of the categories in this dictionary, i.e. a list of each 'type' and 'subtype'.
I have tried a few different things but no luck, any help would be appreciated
{'accounts': [{'account_id': 'JqRQG4WVV7IMe3LDG7Ebc97Kjoel4asdrRjqX',
'balances': {'available': 100,
'current': 110,
'iso_currency_code': 'USD',
'limit': None,
'unofficial_currency_code': None},
'mask': '0000',
'name': 'Plaid Checking',
'official_name': 'Plaid Gold Standard 0% Interest Checking',
'subtype': 'checking',
'type': 'depository'},
Iterate through the accounts and collect the types and subtypes:
for subdict in original_dictionary['accounts']:
    print('{}:{}'.format(subdict['type'], subdict['subtype']))
If you have to look for types and subtypes in values corresponding to other keys besides the 'accounts' key, you'll have to iterate through the key value pairs of your original dictionary via something like:
for key, value in original_dictionary.items():
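The loop header above is shown without a body. One way it might be filled in, purely as a sketch: the sample data below is a trimmed, partly made-up stand-in for the response in the question (the second account and the 'request_id' key are assumptions), and the loop collects every (type, subtype) pair found under any key whose value is a list of dictionaries.

```python
# Hypothetical data shaped like the response in the question, plus a
# made-up second account and a made-up extra top-level key.
original_dictionary = {
    'accounts': [
        {'name': 'Plaid Checking', 'subtype': 'checking', 'type': 'depository'},
        {'name': 'Plaid Saving', 'subtype': 'savings', 'type': 'depository'},
    ],
    'request_id': 'abc123',
}

categories = set()
for key, value in original_dictionary.items():
    # Only lists can hold account-like entries; skip everything else.
    if not isinstance(value, list):
        continue
    for subdict in value:
        if isinstance(subdict, dict) and 'type' in subdict and 'subtype' in subdict:
            categories.add((subdict['type'], subdict['subtype']))

print(sorted(categories))
# [('depository', 'checking'), ('depository', 'savings')]
```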

Retrieve value in JSON from pandas series object

I need help retrieving a value from a JSON response object in python. Specifically, how do I access the prices-asks-price value? I'm having trouble:
JSON object:
{'prices': [{'asks': [{'liquidity': 10000000, 'price': '1.16049'}],
'bids': [{'liquidity': 10000000, 'price': '1.15989'}],
'closeoutAsk': '1.16064',
'closeoutBid': '1.15974',
'instrument': 'EUR_USD',
'quoteHomeConversionFactors': {'negativeUnits': '1.00000000',
'positiveUnits': '1.00000000'},
'status': 'non-tradeable',
'time': '2018-08-31T20:59:57.748335979Z',
'tradeable': False,
'type': 'PRICE',
'unitsAvailable': {'default': {'long': '4063619', 'short': '4063619'},
'openOnly': {'long': '4063619', 'short': '4063619'},
'reduceFirst': {'long': '4063619', 'short': '4063619'},
'reduceOnly': {'long': '0', 'short': '0'}}}],
'time': '2018-09-02T18:56:45.022341038Z'}
Code:
data = pd.io.json.json_normalize(response['prices'])
asks = data['asks']
asks[0]
Out: [{'liquidity': 10000000, 'price': '1.16049'}]
I want to get the value 1.16049 - but having trouble after trying different things.
Thanks
asks[0] returns a list so you might do something like
asks[0][0]['price']
or
data = pd.io.json.json_normalize(response['prices'])
price = data['asks'][0][0]['price']
The data you have contains dictionaries (JSON objects) and lists nested inside one another, so you need to navigate through them accordingly. Lists are accessed by index (starting from 0) and dictionaries are accessed by key.
price_value=data['prices'][0]['asks'][0]['price']
liquidity_value=data['prices'][0]['asks'][0]['liquidity']
To explain the logic: I assume that your big JSON object is stored in a variable called data. First, access the prices key of this object. Its value is a list, so use index 0 to get the first element. Inside that element there is a key called asks, whose value is again a list, so use index 0 once more. Finally, the keys liquidity and price are found there.
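The same key/index walk can be written as a small loop. This sketch uses a trimmed copy of the response from the question and a hypothetical helper named dig; it operates on the plain dictionary and needs no pandas.

```python
# Trimmed copy of the response from the question.
response = {
    'prices': [{
        'asks': [{'liquidity': 10000000, 'price': '1.16049'}],
        'bids': [{'liquidity': 10000000, 'price': '1.15989'}],
        'instrument': 'EUR_USD',
    }],
    'time': '2018-09-02T18:56:45.022341038Z',
}

def dig(obj, *path):
    """Follow a sequence of dict keys and list indexes into nested data."""
    for step in path:
        obj = obj[step]
    return obj

ask = dig(response, 'prices', 0, 'asks', 0, 'price')
print(ask)         # 1.16049 (still a string)
print(float(ask))  # 1.16049
```

The price arrives as a string in this response, so it has to be converted with float() before any arithmetic.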

Proper way to format Dictionary with multiple entries in Python

I am just curious what is the best/most efficient way to structure a Dictionary in Python with multiple entries. It could be a dictionary of students, or employees etc. For sake of argument a 'Name' key/value and a few other key/value pairs per entry.
For example, this works great if you have just one student...
student_dict = {'name': 'John', 'age': 15, 'grades':[90, 80, 75]}
Is this the right way to do it? One variable per dictionary entry? Somehow I don't think it is:
student_1 = {'name': 'Fred', 'age': 17, 'grades':[85, 80, 75]}
student_2 = {'name': 'Sean', 'age': 16, 'grades':[65, 70, 100]}
student_3 = ...
I'm sure there is a simple way to structure this where it would be easy to add new entries and search existing entries in one location, I just can't wrap my head around it.
Thanks
Use a dictionary or list to store the dictionaries. Since you seem to want to be able to refer to individual dictionaries by name I suggest a dictionary of dictionaries:
students = {'student_1': {'name': 'Fred', 'age': 17, 'grades':[85, 80, 75]},
'student_2': {'name': 'Sean', 'age': 16, 'grades':[65, 70, 100]}}
Now you can refer to individual dictionaries by key:
>>> students['student_1']
{'name': 'Fred', 'age': 17, 'grades': [85, 80, 75]}
If you don't care about names, or you need to preserve the order, use a list:
students = [{'name': 'Fred', 'age': 17, 'grades':[85, 80, 75]},
{'name': 'Sean', 'age': 16, 'grades':[65, 70, 100]}]
Access them by index:
>>> students[0]
{'name': 'Fred', 'age': 17, 'grades': [85, 80, 75]}
Or iterate over them:
for student in students:
    print(student['name'], student['age'], student['grades'])
You need to choose the key which will give you quick access to a student record. The name seems the most useful:
students = {
    'Fred': {'name': 'Fred', 'age': 17, 'grades': [85, 80, 75]},
    'Sean': {'name': 'Sean', 'age': 16, 'grades': [65, 70, 100]},
}
Then you can get the record for Fred with students['Fred'].
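Since the question asks about adding and searching entries in one place, here is a short sketch of both operations on a name-keyed dictionary. The third student and the grade threshold are illustrative choices.

```python
students = {
    'Fred': {'name': 'Fred', 'age': 17, 'grades': [85, 80, 75]},
    'Sean': {'name': 'Sean', 'age': 16, 'grades': [65, 70, 100]},
}

# Adding a new entry is a single assignment.
students['John'] = {'name': 'John', 'age': 15, 'grades': [90, 80, 75]}

# Lookup by key; .get() returns None instead of raising for a missing name.
print(students['Fred']['age'])  # 17
print(students.get('Alice'))    # None

# Searching by anything other than the key means scanning the values.
top = [s['name'] for s in students.values() if max(s['grades']) >= 90]
print(top)  # ['Sean', 'John']
```

One caveat with names as keys: two students with the same name would overwrite each other, so a unique ID is a safer key when that can happen.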
First of all, you should use dictionaries inside of dictionaries. For example:
people = { "John":{"age":15, "school":"asborne high"},
"Alex":{"age":32, "work":"microsoft"},
"Emily":{"age":21, "school":"florida state"} }
Using this method, you can efficiently index any value by its name alone:
print(people["Alex"]["age"])
Second, if you are shooting for readability and ease-of-use, make sure to properly format your multi-dimensional dictionary objects. What I mean by this is you should try to stick to at most two data structures for your custom-objects. If you need to organize a list of dogs, their colors, name, and age, you should use a structure similar to this:
dogs = { "Lisa":{"colors":["brown","white"], "age":4 },
"Spike":{"colors":["black","white"], "age":10} }
Notice how I do not switch between tuples and lists, or dictionaries and lists. Consistency is key.
When organizing numeric data, stick to the same concept.
numbers = { "A":[2132.62, 422.67, 938.2218113, 3772.7026994],
            "B":[5771.11, 799.26, 417.9011772, 8922.0116259],
            "C":[455.778, 592.224, 556.657001, 66.674254323] }
