Proper way to format Dictionary with multiple entries in Python

I am just curious what is the best/most efficient way to structure a Dictionary in Python with multiple entries. It could be a dictionary of students, or employees etc. For sake of argument a 'Name' key/value and a few other key/value pairs per entry.
For example, this works great if you have just one student...
student_dict = {'name': 'John', 'age': 15, 'grades':[90, 80, 75]}
Is this the right way to do it? One variable per dictionary entry? Somehow I don't think it is:
student_1 = {'name': 'Fred', 'age': 17, 'grades':[85, 80, 75]}
student_2 = {'name': 'Sean', 'age': 16, 'grades':[65, 70, 100]}
student_3 = ...
I'm sure there is a simple way to structure this where it would be easy to add new entries and search existing entries in one location, I just can't wrap my head around it.
Thanks

Use a dictionary or list to store the dictionaries. Since you seem to want to be able to refer to individual dictionaries by name I suggest a dictionary of dictionaries:
students = {'student_1': {'name': 'Fred', 'age': 17, 'grades':[85, 80, 75]},
'student_2': {'name': 'Sean', 'age': 16, 'grades':[65, 70, 100]}}
Now you can refer to individual dictionaries by key:
>>> students['student_1']
{'name': 'Fred', 'age': 17, 'grades': [85, 80, 75]}
If you don't care about names, or you need to preserve the order, use a list:
students = [{'name': 'Fred', 'age': 17, 'grades':[85, 80, 75]},
{'name': 'Sean', 'age': 16, 'grades':[65, 70, 100]}]
Access them by index:
>>> students[0]
{'name': 'Fred', 'age': 17, 'grades': [85, 80, 75]}
Or iterate over them:
for student in students:
    print(student['name'], student['age'], student['grades'])
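With the list form, adding an entry is an append and searching means scanning the list. A minimal sketch of both, where find_student is a hypothetical helper name and the 'Ann' record is made-up sample data:

```python
students = [
    {'name': 'Fred', 'age': 17, 'grades': [85, 80, 75]},
    {'name': 'Sean', 'age': 16, 'grades': [65, 70, 100]},
]


def find_student(records, name):
    """Return the first record whose 'name' matches, or None if there is none."""
    return next((record for record in records if record['name'] == name), None)


# Adding a new entry is a plain append (hypothetical sample record).
students.append({'name': 'Ann', 'age': 15, 'grades': [90, 95]})

print(find_student(students, 'Sean')['age'])  # 16
print(find_student(students, 'Zoe'))          # None
```

The scan is linear in the number of students, which is fine for small collections; for frequent lookups a dictionary keyed by some unique value is usually the better fit.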

You need to choose the key which will give you quick access to a student record. The name seems the most useful:
students = {
    'Fred': {'name': 'Fred', 'age': 17, 'grades':[85, 80, 75]},
    'Sean': {'name': 'Sean', 'age': 16, 'grades':[65, 70, 100]}
}
Then you can get the record for Fred with students['Fred'].
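A short sketch of adding and looking up entries in such a dictionary. It assumes names are unique (two students called Fred would overwrite each other), and the 'Ann' record and the extra grade are made-up sample data:

```python
students = {
    'Fred': {'name': 'Fred', 'age': 17, 'grades': [85, 80, 75]},
    'Sean': {'name': 'Sean', 'age': 16, 'grades': [65, 70, 100]},
}

# Add a new entry by assigning to a new key (hypothetical sample record).
students['Ann'] = {'name': 'Ann', 'age': 15, 'grades': [90, 95]}

# Update an existing entry in place.
students['Fred']['grades'].append(88)

# .get() returns None instead of raising KeyError for an unknown name.
missing = students.get('Zoe')

fred_grades = students['Fred']['grades']
average = sum(fred_grades) / len(fred_grades)

print('Ann' in students)  # True
print(missing)            # None
print(average)            # 82.0
```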

First of all, you should use dictionaries inside of dictionaries. For example:
people = { "John":{"age":15, "school":"asborne high"},
"Alex":{"age":32, "work":"microsoft"},
"Emily":{"age":21, "school":"florida state"} }
Using this method, you can efficiently index any value by its name alone:
print(people["Alex"]["age"])
Second, if you are shooting for readability and ease-of-use, make sure to properly format your multi-dimensional dictionary objects. What I mean by this is you should try to stick to at most two data structures for your custom-objects. If you need to organize a list of dogs, their colors, name, and age, you should use a structure similar to this:
dogs = { "Lisa":{"colors":["brown","white"], "age":4 },
"Spike":{"colors":["black","white"], "age":10} }
Notice how I do not switch between tuples and lists, or dictionaries and lists. Consistency is key.
When organizing numeric data, stick to the same concept.
numbers = { "A":[2132.62, 422.67, 938.2218113, 3772.7026994],
"B":[5771.11, 799.26, 417.9011772, 8922.0116259],
"C":[455.778, 592.224, 556.657001, 66.674254323] }

Related

How to get value from a dict that is in a dict

For example, I have this dictionary.
{'count': 1, 'items': [{'date': 1649523732, 'from_id': 269690832, 'id': 190, 'out': 0, 'attachments': [{'type': 'photo', 'photo': {'album_id': -3, 'date': 1649523732, 'id': 457249932, 'owner_id': 269690832, 'access_key': 'df14603asdd3d26e7a1f5'}}]}]}
I want to get the value of the photo's id ('id': 457249932). How do I do this?

The element is nested quite deep -- it's nested inside a list mapped to an items key, a list mapped to an attachments key, and a dictionary mapped to a photo key. So, we can do:
print(data['items'][0]['attachments'][0]['photo']['id'])
where data is the dictionary that you're indexing on.
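If some levels may be missing (an empty items list, no photo key), the direct chain raises KeyError or IndexError. A minimal sketch of a tolerant lookup, where deep_get is a hypothetical helper and not a standard library function:

```python
data = {'count': 1, 'items': [{'date': 1649523732, 'from_id': 269690832, 'id': 190, 'out': 0,
        'attachments': [{'type': 'photo', 'photo': {'album_id': -3, 'date': 1649523732,
        'id': 457249932, 'owner_id': 269690832, 'access_key': 'df14603asdd3d26e7a1f5'}}]}]}


def deep_get(obj, path, default=None):
    """Follow a sequence of keys/indexes into nested dicts and lists.

    Returns default as soon as any step is missing.
    """
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


print(deep_get(data, ['items', 0, 'attachments', 0, 'photo', 'id']))  # 457249932
print(deep_get(data, ['items', 1, 'attachments']))                    # None
```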

pandas split list like object

Hi I have this column of data named labels:
[{'id': 123456,
'name': 'John',
'age': 22,
'pet': None,
'gender': 'male',
'result': [{'id': 'vEo0PIYPEE',
'type': 'choices',
'value': {'choices': ['Same Person']},
'to_name': 'image',
'from_name': 'person_evaluation'}]}]
[{'id': 123457,
'name': 'May',
'age': 21,
'pet': None,
'gender': 'female',
'result': [{'id': 'zTHYuKIOQ',
'type': 'choices',
'value': {'choices': ['Different Person']},
'to_name': 'image',
'from_name': 'person_evaluation'}]}]
......
I am not sure what type this is. I would like to break it down and extract the value [Same Person]; the outcome should look something like this:
0 [Same Person]
1 [Different Person]
....
How should I achieve this?

Based on the limited data that you have provided, would this work?
df['labels_new'] = df['labels'].apply(lambda x: x[0].get('result')[0].get('value').get('choices'))
                                              labels          labels_new
0  [{'id': 123456, 'name': 'John', 'age': 22, 'pe...       [Same Person]
1  [{'id': 123457, 'name': 'May', 'age': 21, 'pet...  [Different Person]
You can use the following as well, but I find dict.get() more versatile: for a missing key it returns None (or a default value you supply) instead of raising a KeyError.
df['labels'].apply(lambda x: x[0]['result'][0]['value']['choices'])
You could consider using pd.json_normalize, but given the current state of your column it would be more complex to extract the data that way than to simply use a lambda function.
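One caveat worth illustrating in plain Python: a chain of .get() calls only tolerates missing keys if each call supplies a default, because .get() on its own returns None and the next .get() then fails. A small sketch, where the second cell is made-up data with no 'value' key and choices_with_defaults is a hypothetical helper name:

```python
full_cell = [{'id': 123456, 'result': [{'type': 'choices',
                                        'value': {'choices': ['Same Person']}}]}]
no_value_cell = [{'id': 123458, 'result': [{'type': 'choices'}]}]  # hypothetical


def choices_with_defaults(cell):
    """Chain .get() calls, passing empty containers as defaults so the chain survives."""
    return cell[0].get('result', [{}])[0].get('value', {}).get('choices')


print(choices_with_defaults(full_cell))      # ['Same Person']
print(choices_with_defaults(no_value_cell))  # None

# Without defaults, the missing 'value' key yields None, and None has no .get().
try:
    no_value_cell[0].get('result')[0].get('value').get('choices')
except AttributeError as exc:
    error_name = type(exc).__name__

print(error_name)  # AttributeError
```

A function like this can be passed to df['labels'].apply(...) in place of the lambda.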

MongoDB complex aggregation pipeline of Age fields

I have a MongoDB collection with documents that may have zero or more of the following fields: DOB (date of birth), YOB (year of birth) and Age. These may contain integers, floats or strings, and they may or may not be mappable to real values. For instance:
{'_id': id1, 'Age': 25},
{'_id': id2, 'Age': 'unknown', 'DOB': 'xxxx-xx-xx'},
{'_id': id3, 'Age': '8', 'DOB': '1988/01/05'},
{'_id': id4, 'YOB': '1995.0'},
{'_id': id5, 'DOB': '5/8/1886'},
{'_id': id6, 'Age': 17, 'YOB': 2003},
{'_id': id7},
...
For each document in the database, I need to extract a single standardized field, Age_Standardized, with the following criteria:
is an integer
is > 12
is < 99
I also need to implement an order of preference in the event multiple of the fields have a viable value, as DOB then YOB then Age.
So, for instance, if Age = 17 but DOB = '1900', Age_Standardized = 17 because, even though DOB exists (and is preferred to Age), it produces an Age_Standardized outside of the viable range (13-98).
If YOB = '1998.0' and Age = '19' then Age_Standardized = 23 because, even though Age exists and is viable, YOB is preferred and also viable.
I need to implement all of this over a large collection, and I was hoping to do it within a single PyMongo aggregation framework. In the examples above, the output would be:
{'_id': id1, 'Age': 25, 'Age_Standardized': 25},
{'_id': id2, 'Age': 'unknown', 'DOB': 'xxxx-xx-xx'},
{'_id': id3, 'Age': '8', 'DOB': '1988/01/05', 'Age_Standardized': 33},
{'_id': id4, 'YOB': '1995.0', 'Age_Standardized': 26},
{'_id': id5, 'DOB': '5/8/1886'},
{'_id': id6, 'Age': 17, 'YOB': 2003, 'Age_Standardized': 18},
{'_id': id7},
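The selection rules can be stated precisely in plain Python before translating them into an aggregation pipeline. This is only a sketch of the criteria, not a MongoDB pipeline: the helper names, the accepted date layouts, and the reference date are assumptions (a date in 2021 is consistent with the expected output above, for example YOB 1995 giving 26).

```python
from datetime import date, datetime

# Assumed date layouts; extend these to match the real data.
DOB_FORMATS = ("%Y/%m/%d", "%Y-%m-%d", "%m/%d/%Y", "%Y")
TODAY = date(2021, 6, 1)  # assumed reference date


def to_int(value):
    """Return value as an int if it is a whole number (25, '8', '1995.0'), else None."""
    try:
        number = float(value)
    except (TypeError, ValueError, OverflowError):
        return None
    return int(number) if number.is_integer() else None


def age_from_dob(value, today):
    """Parse a DOB string with the assumed layouts and return the age in whole years."""
    if not isinstance(value, str):
        return None
    for fmt in DOB_FORMATS:
        try:
            born = datetime.strptime(value, fmt).date()
        except ValueError:
            continue
        # Subtract one year if the birthday has not happened yet this year.
        return today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    return None


def standardized_age(doc, today=TODAY):
    """Return the first viable age in the order DOB, YOB, Age, or None."""
    yob = to_int(doc.get("YOB"))
    candidates = (
        age_from_dob(doc.get("DOB"), today),
        today.year - yob if yob is not None else None,
        to_int(doc.get("Age")),
    )
    return next((age for age in candidates if age is not None and 12 < age < 99), None)


docs = [
    {"_id": "id1", "Age": 25},
    {"_id": "id2", "Age": "unknown", "DOB": "xxxx-xx-xx"},
    {"_id": "id3", "Age": "8", "DOB": "1988/01/05"},
    {"_id": "id4", "YOB": "1995.0"},
    {"_id": "id5", "DOB": "5/8/1886"},
    {"_id": "id6", "Age": 17, "YOB": 2003},
    {"_id": "id7"},
]
results = [standardized_age(doc) for doc in docs]
print(results)  # [25, None, 33, 26, None, 18, None]
```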

How to merge the items of list if they are same and append if items are not same in Python

I have a list tags which contains below dict data
tags = [ {'Time': 1, 'Name': 'John'} ]
I am getting the tags value from a function. Sometimes the values received have the same Name, in which case I simply need to take the new Time value and update the list. For example, if the data below is received:
tags = [ {'Time': 4, 'Name': 'John'} ]
In this case, as Name is same so I will simply get the Time value and update the tags list so output will be:
output = [ {'Time': 4, 'Name': 'John'} ]
Time has changed from 1 to 4. But lets say a new Name is received, for ex below:
tags = [ {'Time': 10, 'Name': 'John'}, {'Time': 6, 'Name': 'Karan'} ]
So in this case John time will be updated and Karan time data will be appended to list so output will be
output = [ {'Time': 10, 'Name': 'John'}, {'Time': 6, 'Name': 'Karan'} ]
So for John we have updated the time and Karan data has been added.
I have output_tags as dict which has a tags as list in it. I am doing the below:
output_tags['Tags'].clear()
for tag in tags:
    output_tags['Tags'].append(tag)
The code above clears whatever was in output_tags['Tags'] and then appends all the new data, so the Time for an existing Name is updated and any new Name is added.
But this also wipes any earlier data. For example, I received Ellis's data some time ago but am not receiving it now; I still need to keep it, yet it gets cleared. Is there another way to handle this? Please help. Thanks

You can merge your two lists of tags:
from collections import defaultdict
tags = [ {'Time': 10, 'Name': 'John'}, {'Time': 6, 'Name': 'Karan'} ]
new_tags = [ {'Time': 30, 'Name': 'Bob'}, {'Time': 40, 'Name': 'Karan'} ]
d = defaultdict(dict)
# with defaultdict(dict), any key you access is initialised with an empty dict: `d['random_key'] == {}` is True
# we fill d with all tags and new_tags using `Name` as key
for list_ in (tags, new_tags):
    for obj in list_:
        # if `obj['Name']` has already been set, it is updated
        # otherwise it is added
        d[obj['Name']].update(obj)
# display only values and make it a list
results = list(d.values())
print(results)
# [{'Time': 10, 'Name': 'John'}, {'Time': 40, 'Name': 'Karan'}, {'Time': 30, 'Name': 'Bob'}]
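The same idea can be applied directly to output_tags['Tags'] so that earlier entries such as Ellis are kept. A minimal sketch, where merge_tags is a hypothetical helper name and the Ellis record is made-up sample data:

```python
def merge_tags(existing, incoming):
    """Update existing in place: refresh Time for known names, append new names."""
    by_name = {tag['Name']: tag for tag in existing}
    for tag in incoming:
        known = by_name.get(tag['Name'])
        if known is not None:
            known['Time'] = tag['Time']
        else:
            new_tag = dict(tag)  # copy so later changes do not touch the caller's dict
            existing.append(new_tag)
            by_name[new_tag['Name']] = new_tag


output_tags = {'Tags': [{'Time': 2, 'Name': 'Ellis'}, {'Time': 1, 'Name': 'John'}]}
tags = [{'Time': 10, 'Name': 'John'}, {'Time': 6, 'Name': 'Karan'}]

merge_tags(output_tags['Tags'], tags)
print(output_tags['Tags'])
# [{'Time': 2, 'Name': 'Ellis'}, {'Time': 10, 'Name': 'John'}, {'Time': 6, 'Name': 'Karan'}]
```

Nothing is cleared, so names that are absent from the latest batch keep their last known Time.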

I have been searching for my answer, probably just not using the right verbiage, and only come up with using lists as dictionary key values.
I need to take 20 csv files and anonymize identifying student, teacher, school and district information for research purposes on testing data. The csv files range anywhere from 20K to 50K rows and 11 to 20 columns, and not all have identical information.
One file may have:
studid, termdates, testname, score, standarderr
And another may have:
termdates, studid, studfirstname, studlastname, studdob, ethnicity, grade
And yet another may have:
termdates, studid, teacher, classname, schoolname, districtname
I am putting the varying data into dictionaries for each type of file/dataset, maybe this isn't the best, but I am getting stuck when trying to use a dictionary as a key value for when a student may have taken multiple tests i.e. Language, Reading, Math etc.
For instance:
studDict{studid{'newid': 12345, 'dob': 1/1/1, test1:{'score': 50, 'date': 1/1/15}, test2:{'score': 50, 'date': 1/1/15}, 'school': 'Hard Knocks'},
studid1{'newid': 12345, 'dob': 1/1/1, test1:{'score': 50, 'date': 1/1/15}, test2:{'score': 50, 'date': 1/1/15}, 'school': 'Hard Knocks'}}
Any guidance on which libraries or a brief direction to a method would be greatly appreciated. I understand enough Python that I do not need a full hand holding, but helping me get across the street would be great. :D
CLARIFICATION
I have a better chance of winning the lottery than this project does of being used more than once, so the simpler the method the better. If it would be a repeating project I would most likely dump the data into db tables and work from there.

A dictionary cannot be a key, but a dictionary can be a value for some key in another dictionary (a dict-of-dicts). However, instantiating dictionaries of varying length for every tuple is probably going to make your data analysis very difficult.
Consider using Pandas to read the tuples into a DataFrame with null values where appropriate.
dict API: https://docs.python.org/2/library/stdtypes.html#mapping-types-dict
Pandas Data handling package: http://pandas.pydata.org/

You cannot use a dictionary as a key to a dictionary. Keys must be hashable, and mutable built-in containers such as dictionaries and lists are not, so they cannot be used as keys.
You can store a dictionary in another dictionary just the same as any other value. You can, for example, do
studDict = { studid: {'newid': 12345, 'dob': '1/1/1', 'test1':{'score': 50, 'date': '1/1/15'}, 'test2':{'score': 50, 'date': '1/1/15'}, 'school': 'Hard Knocks'},
studid1: {'newid': 12345, 'dob': '1/1/1', 'test1':{'score': 50, 'date': '1/1/15'}, 'test2':{'score': 50, 'date': '1/1/15'}, 'school': 'Hard Knocks'}}
assuming you have defined studid and studid1 elsewhere. The dates and test names are quoted here so that they are stored as strings; unquoted, 1/1/15 would be evaluated as a division and test1 would have to be a defined variable.
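A quick illustration of the hashability rule, using made-up ids and test names. If a compound key is needed (say student plus test), a tuple of hashable values works where a dictionary does not:

```python
record = {'score': 50, 'date': '1/1/15'}

# A dict cannot be a key: Python raises TypeError when it tries to hash it.
try:
    bad = {record: 'value'}
except TypeError as exc:
    dict_key_error = str(exc)

# A tuple of hashable values is itself hashable and can serve as a compound key.
scores = {
    ('12345', 'Math'): {'score': 50, 'date': '1/1/15'},
    ('12345', 'Reading'): {'score': 61, 'date': '1/2/15'},
}
print(scores[('12345', 'Reading')]['score'])  # 61

# Immutability alone is not enough: a tuple that contains a list is unhashable.
try:
    hash(('12345', ['Math']))
    tuple_with_list_hashable = True
except TypeError:
    tuple_with_list_hashable = False
print(tuple_with_list_hashable)  # False
```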

If I interpret you correctly, in the end you want a dict with students (i.e. studid) as key and different student related data as value? This is probably not exactly what you want, but I think it will point you in the right direction (adapted from this answer):
import csv
from collections import namedtuple, defaultdict
D = defaultdict(list)
# files is assumed to be a list of CSV file paths defined elsewhere
for filename in files:
    with open(filename, mode="r") as infile:
        reader = csv.reader(infile)
        # the header row of each file supplies the field names
        Data = namedtuple("Data", next(reader))
        for row in reader:
            data = Data(*row)
            D[data.studid].append(data)
In the end that should give you a dict D with studids as keys and a list of test results as values. Each test result is a namedtuple. This assumes that every file has a studid column!

If you can know the order of a file ahead of time, it's not hard to make a dictionary for it with help from csv.
File tests.csv:
12345,2015-05-19,AP_Bio,96,0.12
67890,2015-04-28,AP_Calc,92,0.17
In a Python file in the same directory as tests.csv:
import csv
with open("tests.csv") as tests:
    # Change the fields for files that follow a different form
    fields = ["studid", "termdates", "testname", "score", "standarderr"]
    students_data = list(csv.DictReader(tests, fieldnames=fields))
# Just a pretty show
print(*students_data, sep="\n")
# {'studid': '12345', 'testname': 'AP_Bio', 'standarderr': '0.12', 'termdates': '2015-05-19', 'score': '96'}
# {'studid': '67890', 'testname': 'AP_Calc', 'standarderr': '0.17', 'termdates': '2015-04-28', 'score': '92'}
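For the anonymization itself, one simple option is to keep a single mapping from each original studid to a replacement id and reuse it for every file, so the same student gets the same newid everywhere. A minimal sketch under stated assumptions: the data is read from an in-memory string instead of tests.csv so the example is self-contained, the third row is made-up, and sequential integers are just one possible replacement scheme:

```python
import csv
import io
from itertools import count

# Stand-in for tests.csv; the third row is hypothetical sample data.
tests_csv = io.StringIO(
    "12345,2015-05-19,AP_Bio,96,0.12\n"
    "67890,2015-04-28,AP_Calc,92,0.17\n"
    "12345,2015-05-20,AP_Calc,88,0.15\n"
)
fields = ["studid", "termdates", "testname", "score", "standarderr"]

new_ids = {}        # original studid -> replacement id, shared across all files
next_id = count(1)  # source of fresh replacement ids

anonymized = []
for row in csv.DictReader(tests_csv, fieldnames=fields):
    original = row.pop("studid")  # drop the identifying column
    if original not in new_ids:
        new_ids[original] = next(next_id)
    row["newid"] = new_ids[original]
    anonymized.append(row)

print(new_ids)                 # {'12345': 1, '67890': 2}
print(anonymized[2]["newid"])  # 1
```

Columns such as names or dates of birth would be removed the same way with row.pop(), and the new_ids mapping itself has to be kept private or discarded.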

Please be more explicit. Your solution depends on the design.
In a district you have schools, and in each school you have teachers or students.
First, organize your data by district and school:
districts = {
"name_district1":{...},
"name_district2":{...},
...,
"name_districtn":{...},
}
for each district:
# "name_districtn"
{
"name_school1": {...},
"name_school2": {...},
...,
"name_schooln": {...},
}
for each school:
#"name_schooln"
{
id_student1: {...},
id_student2: {...},
...,
id_studentn: {...}
}
and for each student you define their elements.
You can also define one dictionary for all the students, but in that case you have to design a unique id for each student, for example:
uniq_Id = "".join(("name_district", "name_school", str(student_id)))
Total = {
    uniq_Id: {'dob': '1/1/1', 'test1': {'score': 50, 'date': '1/1/15'}, 'test2': {'score': 50, 'date': '1/1/15'}, 'school': 'Hard Knocks'},
    ...,
}
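One thing to watch with a joined string id: without a separator, different combinations can produce the same string. A small sketch with made-up district and school names; joined_id is a hypothetical helper that mirrors the "".join(...) call above, and a tuple key is shown as one alternative:

```python
def joined_id(district, school, student_id):
    """Build an id by plain concatenation, as in the "".join(...) example."""
    return "".join((district, school, str(student_id)))


# Two different students end up with the same concatenated id.
first = joined_id("north", "1", 23)
second = joined_id("north", "12", 3)
print(first, second, first == second)  # north123 north123 True

# A tuple keeps the parts separate, so the keys stay distinct.
total = {
    ("north", "1", 23): {'school': 'Hard Knocks'},
    ("north", "12", 3): {'school': 'Hard Knocks'},
}
print(len(total))  # 2
```

Joining with a separator that cannot occur in the names would avoid the collision as well.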
