Python json add multiple keys and iterate - python

I am looking to convert a dataframe to json, this is the code I currently have:
my_frame = pd.DataFrame(
    {'Age': [30, 31],
     'Eye': ['blue', 'brown'],
     'Gender': ['male', 'female']})
my_frame = my_frame.to_json(orient='records')
my_frame
Result:
'[{"Age":30,"Eye":"blue","Gender":"male"},{"Age":31,"Eye":"brown","Gender":"female"}]'
I want to add keys to the JSON object and store the entire data converted from the dataframe under an info key.
add_keys = {'id': 101,
'loc': 'NY',
}
add_keys['info'] = my_frame
add_keys
Result:
{'id': 101,
'info': '[{"Age":30,"Eye":"blue","Gender":"male"},
{"Age":31,"Eye":"brown","Gender":"female"}]',
'loc': 'NY'}
I want to print each of the two records within info. However, when I run this loop, it prints each character of the string rather than each record. I believe this may be caused by how I am adding the keys.
for item in add_keys['info']:
    print(item)
Any help greatly appreciated!

It is better to use pandas' built-in functionality here. This is what you need: add_keys['info'] = my_frame.T.to_dict().values() (in Python 3, wrap the call in list(...), because .values() returns a view rather than a list).
Here is the whole code:
>>> my_frame = pd.DataFrame(
... {'Age':[30, 31],
... 'Eye':['blue', 'brown'],
... 'Gender': ['male', 'female']})
>>> my_frame
Age Eye Gender
0 30 blue male
1 31 brown female
>>> add_keys = {'id': 101,
... 'loc': 'NY',
... }
>>> add_keys
{'loc': 'NY', 'id': 101}
>>> add_keys['info'] = my_frame.T.to_dict().values()
>>> add_keys
{'info': [{'Gender': 'male', 'Age': 30L, 'Eye': 'blue'}, {'Gender': 'female', 'Age': 31L, 'Eye': 'brown'}], 'loc': 'NY', 'id': 101}
>>> for item in add_keys['info']:
... print(item)
...
{'Gender': 'male', 'Age': 30L, 'Eye': 'blue'}
{'Gender': 'female', 'Age': 31L, 'Eye': 'brown'}
>>>

When you are using to_json(), pandas is generating a string containing the JSON representation of the dataframe.
If you want to retain the structure of your records in order to manipulate them, use
my_frame = my_frame.to_dict(orient='records')
Then after adding keys, if you want to serialize your data, you can do
json.dumps(add_keys)
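The difference between the JSON string and a list of records can be seen without pandas. The following is a minimal sketch, assuming the string below is exactly what to_json(orient='records') returned in the question; the names json_text, first_three, records and payload are illustrative only.

```python
import json

# The string produced by to_json(orient='records') in the question.
json_text = '[{"Age":30,"Eye":"blue","Gender":"male"},{"Age":31,"Eye":"brown","Gender":"female"}]'

# Iterating over a str yields single characters, which is the
# behaviour described in the question.
first_three = list(json_text)[:3]

# Parsing the string gives back a list of dicts, one per record.
records = json.loads(json_text)

add_keys = {'id': 101, 'loc': 'NY'}
add_keys['info'] = records

# Each item is now a whole record rather than a character.
for item in add_keys['info']:
    print(item)

# Serialize the combined structure once, at the very end.
payload = json.dumps(add_keys)
```

If the dataframe is still available, to_dict(orient='records') avoids the round trip through a string altogether.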

Related

What is the most efficient way to create nested dictionaries in Python?

I currently have a list of over 10k dictionaries that looks like this:
cars = [{'model': 'Ford', 'year': 2010},
{'model': 'BMW', 'year': 2019},
...]
And I have a second list:
car_owners = [{'model': 'BMW', 'name': 'Sam', 'age': 34},
{'model': 'BMW', 'name': 'Taylor', 'age': 34},
.....]
However, I want to join the two together to get something like:
combined = [{'model': 'BMW',
             'year': 2019,
             'owners': [{'name': 'Sam', 'age': 34}, ...]
            }]
What is the best way to combine them? For the moment I am using a For loop but I feel like there are more efficient ways of dealing with this.
** This is just a fake example of data, the one I have is a lot more complex but this helps give the idea of what I want to achieve
Iterate over the first list, creating a dict keyed by model. Then, for each entry in the second list, look up its model and update the matching entry of the first dict if it is found:
cars = [{'model': 'Ford', 'year': 2010}, {'model': 'BMW', 'year': 2019}]
car_owners = [{'model': 'BMW', 'name': 'Sam', 'age': 34}, {'model': 'Ford', 'name': 'Taylor', 'age': 34}]
dd = {x['model']:x for x in cars}
for item in car_owners:
    key = item['model']
    if key in dd:
        del item['model']
        dd[key].update({'car_owners': item})
    else:
        dd[key] = item
print(list(dd.values()))
OUTPUT:
[{'model': 'BMW', 'year': 2019, 'car_owners': {'name': 'Sam', 'age': 34}}, {'model': 'Ford', 'year': 2010, 'car_owners': {'name': 'Taylor',
'age': 34}}]
Really, what you want performance wise is to have dictionaries with the model as the key. That way, you have O(1) lookup and can quickly get the requested element (instead of looping each time in order to find the car with model x).
If you're starting off with lists, I'd first create dictionaries, and then everything is O(1) from there on out.
models_to_cars = {car['model']: car for car in cars}
models_to_owners = {}
for car_owner in car_owners:
    models_to_owners.setdefault(car_owner['model'], []).append(car_owner)
combined = [{
    **car,
    'owners': models_to_owners.get(model, [])
} for model, car in models_to_cars.items()]
Then you'd have
combined = [{'model': 'BMW',
'year': 2019,
'owners': [{'name': 'Sam', 'age': 34}, ...]
}]
as you wanted
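For reference, here is a self-contained sketch of the same grouping idea. It assumes the sample data from the question, groups owners with collections.defaultdict (a swapped-in alternative to setdefault), and drops the 'model' key from each owner so the result matches the shape asked for. The name owners_by_model is illustrative.

```python
from collections import defaultdict

cars = [{'model': 'Ford', 'year': 2010}, {'model': 'BMW', 'year': 2019}]
car_owners = [
    {'model': 'BMW', 'name': 'Sam', 'age': 34},
    {'model': 'BMW', 'name': 'Taylor', 'age': 34},
]

# One pass over the owners: group them by model.
owners_by_model = defaultdict(list)
for owner in car_owners:
    # Copy everything except 'model' so the input dicts are not mutated.
    details = {k: v for k, v in owner.items() if k != 'model'}
    owners_by_model[owner['model']].append(details)

# One pass over the cars: attach the owners, or an empty list if none.
# .get() is used so that cars without owners do not create new keys.
combined = [
    {**car, 'owners': owners_by_model.get(car['model'], [])}
    for car in cars
]
```

Both passes are linear in the size of their input, so the whole join is O(len(cars) + len(car_owners)) rather than a nested loop.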

How can I use list comprehension to separate values in a dictionary?

name=[]
age=[]
address=[]
...
for line in pg:
    for key,value in line.items():
        if key == 'name':
            name.append(value)
        elif key == 'age':
            age.append(value)
        elif key == 'address':
            address.append(value)
.
.
.
Is it possible to use a list comprehension for the code above? I need to separate lots of values in the dicts, and I will use the lists to write to a text file.
Source Data:
a = [{'name': 'paul', 'age': '26.', 'address': 'AU', 'gender': 'male'},
{'name': 'mei', 'age': '26.', 'address': 'NY', 'gender': 'female'},
{'name': 'smith', 'age': '16.', 'address': 'NY', 'gender': 'male'},
{'name': 'raj', 'age': '13.', 'address': 'IND', 'gender': 'male'}]
I don't think list comprehension will be a wise choice because you have multiple lists.
Instead of making multiple lists and appending to them the value if the key matches you can use defaultdict to simplify your code.
from collections import defaultdict
result = defaultdict(list)
for line in pg:
    for key, value in line.items():
        result[key].append(value)
You can get the name list by using result.get('name')
['paul', 'mei', 'smith', 'raj']
This probably won't work the way you want: you're trying to fill three different lists, so you would need three different comprehensions. If your data is large, this would roughly triple your execution time.
Something straightforward, such as
name = [value for line in pg for key, value in line.items() if key == "name"]
seems to be what you'd want ... three times.
You can proceed as follows:
pg=[{"name":"name1","age":"age1","address":"address1"},{"name":"name2","age":"age2","address":"address2"}]
name=[v for line in pg for k,v in line.items() if k=="name"]
age=[v for line in pg for k,v in line.items() if k=="age"]
address=[v for line in pg for k,v in line.items() if k=="address"]
Following on from Vishal's answer, please don't use defaultdict. A defaultdict silently creates a missing key whenever it is indexed, so it is bad practice when you want a KeyError for keys that are absent. Use setdefault instead.
result = dict()
for line in pg:
    for key, value in line.items():
        result.setdefault(key, []).append(value)
Output
{
'name': ['paul', 'mei', 'smith', 'raj'],
'age': ['26.', '26.', '16.', '13.'],
...
}
However, note that if the dicts in pg don't all have the same keys, the lists end up with different lengths and you lose the correspondence between items that came from the same dict.
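One way to keep the lists aligned when some dicts lack a key is to fill the gaps explicitly. This is a sketch under assumptions: the sample pg below is made up (the second entry deliberately has no 'age'), and using None as the placeholder is an arbitrary choice.

```python
# Hypothetical data: the second dict is missing the 'age' key.
pg = [
    {'name': 'paul', 'age': '26.', 'address': 'AU'},
    {'name': 'mei', 'address': 'NY'},
    {'name': 'smith', 'age': '16.', 'address': 'NY'},
]

# Collect every key once, preserving first-seen order.
keys = []
for line in pg:
    for key in line:
        if key not in keys:
            keys.append(key)

# .get() returns None for a missing key, so every list has one slot
# per input dict and position i always refers to pg[i].
columns = {key: [line.get(key) for line in pg] for key in keys}
```

With this layout, columns['name'][1] and columns['age'][1] both describe the same record, even though that record had no age.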
Here is a really simple solution if you want to use pandas:
import pandas as pd
df = pd.DataFrame(a)
name = df['name'].tolist()
age = df['age'].tolist()
address = df['address'].tolist()
print(name)
print(age)
print(address)
Output:
['paul', 'mei', 'smith', 'raj']
['26.', '26.', '16.', '13.']
['AU', 'NY', 'NY', 'IND']
Additionally, if your end result is a text file, you can skip the list creation and write the DataFrame (or parts thereof) directly to a CSV with something as simple as:
df.to_csv('/path/to/output.csv')

Convert multiple lists into dictionary

I want to convert following
Input: Name;class;subject;grade
sam;4;maths;A
tom;5;science;B
kathy;8;biology;A
nancy;9;maths;B
Output: [Name:sam,class:4,subject:maths,grade:A],[Name:tom,class:5,subject:science,grade:B],[Name:kathy,class:8,subject:biology,grade:A],[Name:nancy,class:9,subject:maths,grade:B]
You can create a function that accepts strings in which each piece of data is separated by a character like : or ;.
Then you can use
string.split("the character you used")
to get a list containing each piece of data.
Finally, you can store these elements in a dictionary and append that dictionary to the list you want as your output.
This code I used in my python shell will help you understand these operations better.
>>> input_string = "Tom:6:Maths"
>>> list_of_elements = input_string.split(":")
>>> container_dictionary = {"Name": list_of_elements[0], "class": list_of_elements[1], "subject": list_of_elements[2]}
>>> output_list = []
>>> output_list.append(container_dictionary)
>>> print(output_list)
[{'Name': 'Tom', 'class': '6', 'subject': 'Maths'}]
Basically yours is just a csv text chunk with delimiter ;
It can be as simple as:
input_text = '''<YOUR DATA>'''
lines = input_text.split('\n')
headers = lines[0].split(';')
output = [
    dict(zip(headers, line.split(';')))
    for line in lines[1:]
]
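Filled in with the data from the question, the same approach might look like the sketch below. The extra strip() and blank-line filter are assumptions added to tolerate stray whitespace; they are not part of the original snippet.

```python
input_text = '''Name;class;subject;grade
sam;4;maths;A
tom;5;science;B
kathy;8;biology;A
nancy;9;maths;B'''

# Split into lines, dropping surrounding whitespace and empty lines.
lines = [line.strip() for line in input_text.splitlines() if line.strip()]

# The first line holds the column names.
headers = lines[0].split(';')

# Pair each header with the value in the same position on every row.
output = [dict(zip(headers, line.split(';'))) for line in lines[1:]]
```

All values stay strings here, so 'class' is '4' rather than 4; convert explicitly if numbers are needed.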
Since the text is CSV, you can use the csv library.
>>> import csv
>>>
>>>
>>> foo = '''Name;class;subject;grade
... sam;4;maths;A
... tom;5;science;B
... kathy;8;biology;A
... nancy;9;maths;B'''
>>>
>>> reader = csv.DictReader(foo.splitlines(), delimiter=';')
>>> print([row for row in reader])
[{'Name': 'sam', 'class': '4', 'subject': 'maths', 'grade': 'A'}, {'Name': 'tom', 'class': '5', 'subject': 'science', 'grade': 'B'}, {'Name': 'kathy', 'class': '8', 'subject': 'biology', 'grade': 'A'}, {'Name': 'nancy', 'class': '9', 'subject': 'maths', 'grade': 'B'}]

Json data manipulation using python 2.7[Redhat6.7]

I am fairly new to Python and to manipulating its data structures (dict, list).
I have the following JSON data:
{'Namelist': {'thomas': {'gender': 'male', 'age': '23'}, 'david': {'gender': 'male'}, 'jennie': {'gender': 'female', 'age': '23'}, 'alex': {'gender': 'male'}}, 'selectors': {'naming': 'studentlist', 'code': 16}}
How can I iterate through the data and get a result like this:
if age == 23, return thomas and jennie as output and store them in a variable as a string.
NOTE: It should iterate through the whole data and search for age. I am using a "for each" loop for this, but it is not working.
Any help is appreciated
It looks like you already have the JSON parsed into an object, so you can just iterate through it and check the person's age.
dictionary = {
'Namelist': {
'thomas': {'gender': 'male', 'age': '23'},
'david': {'gender': 'male'},
'jennie': {'gender': 'female', 'age': '23'},
'alex': {'gender': 'male'}},
'selectors': {'naming': 'studentlist', 'code': 16}}
# For Loop Method
name_list = []
for name, person in dictionary['Namelist'].items():
    if person.get('age') == '23':
        name_list.append(name)
print(', '.join(name_list)) # Would print 'thomas, jennie'
# List Comprehension Method
name_list = [name for name, person in dictionary['Namelist'].items() if person.get('age') == '23']
print(', '.join(name_list))
This is a quick and dirty way that I'd do it. You can get into list comprehensions as well, but I thought this was easier for you to understand as a newbie. It also works in Python 3, since I use parentheses with print().
variable = {'Namelist': {'thomas': {'gender': 'male', 'age': '23'},
                         'david': {'gender': 'male'},
                         'jennie': {'gender': 'female', 'age': '23'},
                         'alex': {'gender': 'male'}},
            'selectors': {'naming': 'studentlist', 'code': 16}}
response = list()  # Create a list to use to store the iterations.
for key, value in variable.items():  # Loop through the main dictionary.
    if key == 'Namelist':  # Filter by the Namelist key.
        for theName, subValue in value.items():  # Loop through the dictionaries made for each name.
            # The age key wasn't in every dictionary, so I check if it exists, then I check if it is set to 23.
            if 'age' in subValue and subValue['age'] == '23':
                response.append(theName + ' is 23 ')  # Add it to the response list.
nameString = ''.join(response)  # Turn the list into a string.
print(nameString)  # Print it.
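Since the question asks for the result to be stored in a variable as a string, the lookup can be wrapped in a small function. This is a sketch: names_with_age is a hypothetical helper, and the names are sorted only to make the result independent of dict ordering, which is arbitrary on Python 2.7.

```python
data = {
    'Namelist': {
        'thomas': {'gender': 'male', 'age': '23'},
        'david': {'gender': 'male'},
        'jennie': {'gender': 'female', 'age': '23'},
        'alex': {'gender': 'male'},
    },
    'selectors': {'naming': 'studentlist', 'code': 16},
}


def names_with_age(data, age):
    """Return a comma-separated string of names whose 'age' equals age."""
    matches = [
        name
        for name, person in data['Namelist'].items()
        # .get() returns None when a person has no 'age' key.
        if person.get('age') == age
    ]
    return ', '.join(sorted(matches))


# Ages are stored as strings in this data, so compare against '23'.
result = names_with_age(data, '23')
print(result)
```

Passing the integer 23 instead of the string '23' would match nobody, because the comparison is between a str and an int.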

Converting a dict of dicts into a spreadsheet

My data is a dict of dicts like this (but with a lot more fields and names):
{'Jack': {'age': 15, 'location': 'UK'},
'Jill': {'age': 23, 'location': 'US'}}
I want to export it to a spreadsheet like this:
Name Age Location
Jack 15 UK
Jill 23 US
But apparently csv.DictWriter wants to see a list of dicts like this:
[{'name': 'Jack', 'age': 15, 'location': 'UK'},
{'name': 'Jill', 'age': 23, 'location': 'US'}]
What's the simplest way to get my dict of dicts into a CSV file?
I suppose the entries should be in alphabetical name order, too, but that's easy enough to do in the spreadsheet software.
mydict = {'Jack': {'age': 15, 'location': 'UK'},
'Jill': {'age': 23, 'location': 'US'}}
datalist = []
for name in mydict:
    data = mydict[name]
    data['name'] = name
    datalist.append(data)
And datalist will hold the desired result. Notes:
This also modifies mydict (adding the 'name' key to each datum). You probably don't mind about that. It's a bit more efficient than the alternative (copying).
It's not the most flashy one-line-way-to-do-it, but it's very readable, which IMHO is more important
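You can use a list comprehension to get your new list of dicts (this is Python 2 code; in Python 3, keys() and values() return views that cannot be added to a list, and iteritems() is replaced by items()):
>>> [dict(zip(['name'] + attrs.keys(), [name] + attrs.values())) \
...  for name, attrs in d.iteritems()]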
[{'age': 23, 'location': 'US', 'name': 'Jill'},
{'age': 15, 'location': 'UK', 'name': 'Jack'}]
EDIT: d is your dict:
>>> d
{'Jack': {'age': 15, 'location': 'UK'}, 'Jill': {'age': 23, 'location': 'US'}}
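To go all the way to CSV on Python 3, the rows can be passed to csv.DictWriter. This is a sketch, assuming the sample data from the question; io.StringIO stands in for a real file, and the column order in fieldnames is a choice made here, not something the question specifies.

```python
import csv
import io

mydict = {'Jack': {'age': 15, 'location': 'UK'},
          'Jill': {'age': 23, 'location': 'US'}}

# Build new row dicts instead of mutating mydict.
# sorted() on the items gives alphabetical order by name.
rows = [dict(attrs, name=name) for name, attrs in sorted(mydict.items())]

# With a real file, use open(path, 'w', newline='') instead of StringIO.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['name', 'age', 'location'])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

dict(attrs, name=name) makes a shallow copy with one extra key, so the original mydict is left without a 'name' entry.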
