Json data manipulation using python 2.7[Redhat6.7] - python

I am fairly new to Python and to manipulating data in dicts and lists.
I have the following JSON data:
{'Namelist': {'thomas': {'gender': 'male', 'age': '23'}, 'david': {'gender': 'male'}, 'jennie': {'gender': 'female', 'age': '23'}, 'alex': {'gender': 'male'}}, 'selectors': {'naming': 'studentlist', 'code': 16}}
How can I iterate through the data and get a result like this:
if age == 23, return thomas and jennie as output and store them in a variable as a string.
NOTE: It should iterate through the whole data and search for age. I am using a "for each" loop for this, but it is not working.
Any help is appreciated.

It looks like you already have the JSON parsed into an object, so you can just iterate through it and check the person's age.
dictionary = {
    'Namelist': {
        'thomas': {'gender': 'male', 'age': '23'},
        'david': {'gender': 'male'},
        'jennie': {'gender': 'female', 'age': '23'},
        'alex': {'gender': 'male'}},
    'selectors': {'naming': 'studentlist', 'code': 16}}
# For Loop Method
name_list = []
for name, person in dictionary['Namelist'].items():
    if person.get('age') == '23':
        name_list.append(name)
print(', '.join(name_list))  # Would print 'thomas, jennie'
# List Comprehension Method
name_list = [name for name, person in dictionary['Namelist'].items() if person.get('age') == '23']
print(', '.join(name_list))
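If the data arrives as raw JSON text rather than an already-parsed dict, it has to be decoded first. A minimal sketch, assuming the text is valid JSON (which requires double-quoted strings); `raw` and `names_with_age` are names made up for this example, and the matches are sorted so the output does not depend on dict ordering:

```python
import json

# Hypothetical input: the same data as JSON text (JSON requires double quotes).
raw = '''{"Namelist": {"thomas": {"gender": "male", "age": "23"},
                       "david": {"gender": "male"},
                       "jennie": {"gender": "female", "age": "23"},
                       "alex": {"gender": "male"}},
          "selectors": {"naming": "studentlist", "code": 16}}'''

def names_with_age(data, age):
    """Return a comma-separated string of names whose 'age' equals `age`."""
    people = data.get('Namelist', {})
    # .get('age') returns None for entries without an age, so they are skipped.
    matches = [name for name, person in people.items() if person.get('age') == age]
    return ', '.join(sorted(matches))

result = names_with_age(json.loads(raw), '23')
print(result)  # jennie, thomas
```

Note that the ages in this data are strings, so the comparison has to be against '23' and not the integer 23.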

This is a quick and dirty way to do it. You could use a list comprehension as well, but I thought this would be easier to understand as a newbie. It also works in Python 3, since print() is called with parentheses.
variable = {'Namelist': {'thomas': {'gender': 'male', 'age': '23'},
            'david': {'gender': 'male'}, 'jennie': {'gender': 'female', 'age':
            '23'}, 'alex': {'gender': 'male'}}, 'selectors': {'naming':
            'studentlist', 'code': 16}}
response = list()  # Create a list to store the matches.
for key, value in variable.items():  # Loop through the main dictionary.
    if key == 'Namelist':  # Filter by the Namelist.
        for theName, subValue in value.items():  # Loop through the dictionaries made for each name.
            # The age key isn't in every dictionary, so check that it exists, then check that it is set to '23'.
            if 'age' in subValue and subValue['age'] == '23':
                response.append(theName + ' is 23 ')  # Add it to the response list.
nameString = ''.join(response)  # Turn the list into a string.
print(nameString)  # Print it.

Related

How can i compare and remove nested dictionaries with the same values within the same dictionary?

If I have a dictionary with data in it like the one below, what process should I use, such as an if statement, to delete duplicate entries such as nested dictionaries 1 and 4? Let's say I want to delete 4 because the user entered it, and I'm assuming that people are unique, so no two can have the same demographics: there can't be two John R. Smiths.
people = {1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'},
          2: {'name': 'Marie', 'age': '22', 'sex': 'Female'},
          3: {'name': 'Mariah', 'age': '32', 'sex': 'Female'},
          4: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'}}
I am just learning, so I wouldn't be surprised if there is something simple I was unable to come up with.
I attempted to compare the entries, such as if ['1']['name'] and ['1']['sex'] == ['4']['name'] and ['4']['sex']:
then print['4'] just to test, and the error message told me that I need to be using indexes.
I've also turned it into a list, which was successful, but I was met with another error when trying to compare them in a manner like: if person['name'] and person['age'] and person['sex'] is equal to another row within a for loop, then print a message. I got nowhere.
I've also tried to turn it into a dataframe and use the pandas duplicate function to remove the duplicates, which gave me some error yesterday about 'dict', probably because the dictionaries get nested in the dataframe, in contrast to a list with nested dictionaries, which tends to look like this:
[{1: {'name': 'John', 'age': '27', 'sex': 'Male'},
  2: {'name': 'Marie', 'age': '22', 'sex': 'Female'}}]
You can take advantage of the fact that dict keys are always unique to help de-duplicate. Since dicts are unhashable and can't be used as keys directly, you can convert each sub-dict to a tuple of items first. Use dict.setdefault to keep only the first value for each distinct key:
records = {}
for number, record in people.items():
    records.setdefault(tuple(record.items()), (number, record))
print(dict(records.values()))
Given your sample input, this outputs:
{1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'}, 2: {'name': 'Marie', 'age': '22', 'sex': 'Female'}, 3: {'name': 'Mariah', 'age': '32', 'sex': 'Female'}}
Demo: https://replit.com/#blhsing/LonelyNumbWatch
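One thing to keep in mind with `tuple(record.items())` is that it depends on the key order inside each sub-dict, so two records holding the same fields in a different order would not be treated as duplicates. A sketch of an order-insensitive variant using `frozenset`, assuming all values are hashable; the sample data is made up for illustration:

```python
# Made-up sample: records 1 and 4 hold the same data, in different key order.
people = {
    1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'},
    2: {'name': 'Marie', 'age': '22', 'sex': 'Female'},
    4: {'sex': 'Male', 'age': '27', 'name': 'John R. Smith'},
}

seen = set()
unique_people = {}
for number, record in people.items():
    fingerprint = frozenset(record.items())  # ignores key order
    if fingerprint not in seen:
        seen.add(fingerprint)
        unique_people[number] = record  # the first occurrence is kept

print(sorted(unique_people))  # [1, 2]
```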
One approach is to build a new dictionary by iterating over people and assigning a person to the new dictionary if their data is unique. The following solution uses a set for tracking unique users:
from pprint import pprint
unique_people = {}
unique_ids = set()
for key, data in people.items():
    data_id = tuple(data.values())
    if data_id in unique_ids:
        continue
    unique_people[key] = data
    unique_ids.add(data_id)
pprint(unique_people)
Output:
{1: {'age': '27', 'name': 'John R. Smith', 'sex': 'Male'},
2: {'age': '22', 'name': 'Marie', 'sex': 'Female'},
3: {'age': '32', 'name': 'Mariah', 'sex': 'Female'}}

convert values of list of dictionaries with different keys to new list of dicts

I have this list of dicts:
[{'name': 'aly', 'age': '104'},
{'name': 'Not A name', 'age': '99'}]
I want the name value to be the key and the age value to be the value of new dict.
Expected output:
['aly' : '104', 'Not A name': '99']
If you want the output to be a single dict, you can use a dict comprehension:
output = {p["name"]: p["age"] for p in persons}
>>> {'aly': '104', 'Not A name': '99'}
If you want the output to be a list of dicts, you can use a list comprehension:
output = [{p["name"]: p["age"]} for p in persons]
>>> [{'aly': '104'}, {'Not A name': '99'}]
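Both snippets assume the list is bound to a variable named `persons`. A self-contained sketch with the data from the question; note that in the single-dict form a repeated name would overwrite the earlier entry:

```python
# `persons` is the list from the question, bound to an assumed variable name.
persons = [{'name': 'aly', 'age': '104'},
           {'name': 'Not A name', 'age': '99'}]

as_dict = {p['name']: p['age'] for p in persons}    # one dict for all entries
as_list = [{p['name']: p['age']} for p in persons]  # one single-key dict per entry

print(as_dict)  # {'aly': '104', 'Not A name': '99'}
print(as_list)  # [{'aly': '104'}, {'Not A name': '99'}]
```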
You can initialize the new dict, iterate through the list and add to the new dict:
lst = [{'name': 'aly', 'age': '104'}, {'name': 'Not A name', 'age': '99'}]
newdict = {}
for item in lst:
    newdict[item['name']] = item['age']
This will help you:
d = [
{'name': 'aly', 'age': '104'},
{'name': 'Not A name', 'age': '99'}
]
dict([i.values() for i in d])
# Result
{'aly': '104', 'Not A name': '99'}
# In case if you want a list of dictionary, use this
[dict([i.values() for i in d])]
# Result
[{'aly': '104', 'Not A name': '99'}]
Just a side note:
Your expected answer looks like a list (because of the [ ]), but the values inside it are key: value pairs, which is not valid in a list.
Here is the easiest way to convert the new list of dicts
res = list(map(lambda data: {data['name']: data['age']}, d))
print(res)

How to store information from a loop function?

Could you please help me store the 'name' and 'gender' into a new pandas.DataFrame from the following loop's outcome?
Here's my loop function:
def predict_gender_combined(name_input):
    d_2 = GenderDetector()
    g_2 = d_2.get_gender(name_input)
    g_3 = Genderize().get([name_input])
    print(f'{g_2}\n{g_3}')
    print('---------------')
    return (g_2, g_3)

name_list = ['Anna', 'Maria']
for name in name_list:
    _ = predict_gender_combined(name)
outcome:
Person(title=None, first_name='anna', last_name=None, email=None, gender='f')
[{'name': 'Anna', 'gender': 'female', 'probability': 0.98, 'count': 383713}]
---------------
Person(title=None, first_name='maria', last_name=None, email=None, gender='f')
[{'name': 'Maria', 'gender': 'female', 'probability': 0.98, 'count': 334287}]
---------------
Goal: To create a new pandas.DataFrame, with first column "name" and second column "gender"
name   gender
Anna   f
Maria  f
Attempt:
prediction_list = list()
name_list = ['Anna', 'Maria']
for name in name_list:
    prediction = predict_gender_combined(name)
    prediction_list.append(prediction)
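This is what comprehensions are for. Since you want one record per name, a list comprehension that builds a dict for each name fits:
# This := syntax is an "assignment expression" that is available in Python 3.8+
# predict_gender_combined() returns (g_2, g_3); g_3 is the Genderize list, so predicted[1][0] is its first record.
result = [{"name": (predicted := predict_gender_combined(name))[1][0]["name"], "gender": predicted[1][0]["gender"]} for name in name_list]
It is, however, a lot. Let's write that out so it's a little easier to read:
result = []
for name in name_list:
    predicted = predict_gender_combined(name)
    record = predicted[1][0]  # first dict in the Genderize result
    result.append({"name": record["name"], "gender": record["gender"]})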
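To get from the loop to the two-column table in the question, one option is to collect one row per name. A sketch under stated assumptions: `fake_predict` is a stand-in that returns canned values copied from the output shown in the question, so the example runs without the gender libraries, and the gender is taken from the first returned value because the goal table shows 'f':

```python
# Stand-in for predict_gender_combined(): it returns canned values copied from
# the question's output instead of calling the real libraries.
def fake_predict(name):
    person_gender = 'f'  # the gender reported by the Person(...) object
    genderize_result = [{'name': name, 'gender': 'female', 'probability': 0.98}]
    return person_gender, genderize_result

name_list = ['Anna', 'Maria']
rows = []
for name in name_list:
    person_gender, genderize_result = fake_predict(name)
    rows.append({'name': genderize_result[0]['name'], 'gender': person_gender})

print(rows)
# With pandas installed, pandas.DataFrame(rows, columns=['name', 'gender'])
# would turn these rows into the two-column table.
```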
I'm going to make some assumptions on what you're hoping to do here:
you're trying to get a list of dictionaries
each dictionary holds a name and holds a count for the number of times the names occurred in name_list.
I'm not sure what the probability key is used for, and I don't know what the g_3 is used for in your defined function, so I'll have to leave that up to you. But given these assumptions, here's what I would recommend:
If you really want a list of dictionaries, that's fine, but it would probably be easier to first make a dictionary of dictionaries and then convert it to a list, e.g.,
{
    "Tim": {'name': 'Tim', 'gender': 'M', 'probability': 0.0, 'count': 4},
    "Sam": {'name': 'Sam', 'gender': 'F', 'probability': 0.0, 'count': 5},
    ...
}
Then, you could use the following code:
name_list = list_of_users
name_dict = {}
for name in name_list:
    test_list = predict_gender_combined(name)
    if name not in name_dict:
        # First time this name is seen: create its entry.
        name_dict[name] = {'name': name, 'gender': test_list[0], 'probability': 0.0, 'count': 1}
    else:
        # Name already recorded: just bump its count.
        name_dict[name]['count'] += 1
final_list = list(name_dict.values())
Hope that gets you started.

How can I use list comprehension to separate values in a dictionary?

name=[]
age=[]
address=[]
...
for line in pg:
    for key, value in line.items():
        if key == 'name':
            name.append(value)
        elif key == 'age':
            age.append(value)
        elif key == 'address':
            address.append(value)
.
.
.
Is it possible to use list comprehension for above code because I need to separate lots of value in the dict? I will use the lists to write to a text file.
Source Data:
a = [{'name': 'paul', 'age': '26.', 'address': 'AU', 'gender': 'male'},
{'name': 'mei', 'age': '26.', 'address': 'NY', 'gender': 'female'},
{'name': 'smith', 'age': '16.', 'address': 'NY', 'gender': 'male'},
{'name': 'raj', 'age': '13.', 'address': 'IND', 'gender': 'male'}]
I don't think list comprehension will be a wise choice because you have multiple lists.
Instead of making multiple lists and appending the value to each one when the key matches, you can use defaultdict to simplify your code.
from collections import defaultdict
result = defaultdict(list)
for line in pg:
    for key, value in line.items():
        result[key].append(value)
You can get the name list by using result.get('name')
['paul', 'mei', 'smith', 'raj']
This probably won't work the way you want: you're trying to fill three different lists, so you would need three different comprehensions. If your dict is large, this would roughly triple your execution time.
Something straightforward, such as
name = [value for key, value in line.items() if key == "name"]
seems to be what you'd want ... three times.
You can proceed as follows:
pg=[{"name":"name1","age":"age1","address":"address1"},{"name":"name2","age":"age2","address":"address2"}]
name=[v for line in pg for k,v in line.items() if k=="name"]
age=[v for line in pg for k,v in line.items() if k=="age"]
address=[v for line in pg for k,v in line.items() if k=="address"]
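The three comprehensions above each scan every key of every dict. A sketch of a single dict comprehension that looks the keys up directly instead, assuming every dict in `pg` contains all of the wanted keys; `wanted` and `columns` are names introduced here for illustration:

```python
pg = [{'name': 'paul', 'age': '26.', 'address': 'AU', 'gender': 'male'},
      {'name': 'mei', 'age': '26.', 'address': 'NY', 'gender': 'female'}]

wanted = ('name', 'age', 'address')  # hypothetical: the keys to pull out
# One list per wanted key; line[key] raises KeyError if a dict lacks the key.
columns = {key: [line[key] for line in pg] for key in wanted}

name, age, address = (columns[key] for key in wanted)
print(name)     # ['paul', 'mei']
print(address)  # ['AU', 'NY']
```

Adding another field then only means adding its key to `wanted`.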
In continuation of Vishal's answer, please don't use defaultdict. Using defaultdict is bad practice when you want to catch KeyErrors, because looking up a missing key silently creates it instead of raising. Please use setdefault.
results = dict()
for line in pg:
    for key, value in line.items():
        results.setdefault(key, []).append(value)
Output
{
    'name': ['paul', 'mei', 'smith', 'raj'],
    'age': ['26.', '26.', '16.', '13.'],
    ...
}
However, note that if the dicts in pg don't all have the same keys, you will lose the correspondence between items at the same position in the different lists.
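To illustrate that caveat: when a key is missing from some dicts, the lists end up with different lengths and positions no longer line up. A sketch that keeps the columns aligned by filling the gaps with `None`, using made-up data:

```python
# Made-up data: the second record has no 'age'.
pg = [{'name': 'paul', 'age': '26.'},
      {'name': 'mei'},
      {'name': 'raj', 'age': '13.'}]

# Collect every key that appears anywhere, then fill the gaps with None.
all_keys = sorted({key for line in pg for key in line})
result = {key: [line.get(key) for line in pg] for key in all_keys}

print(result['name'])  # ['paul', 'mei', 'raj']
print(result['age'])   # ['26.', None, '13.']
```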
Here is a really simple solution if you want to use pandas:
import pandas as pd
df = pd.DataFrame(a)
name = df['name'].tolist()
age = df['age'].tolist()
address = df['address'].tolist()
print(name)
print(age)
print(address)
Output:
['paul', 'mei', 'smith', 'raj']
['26.', '26.', '16.', '13.']
['AU', 'NY', 'NY', 'IND']
Additionally, if your end result is a text file, you can skip the list creation and write the DataFrame (or parts thereof) directly to a CSV with something as simple as:
df.to_csv('/path/to/output.csv')
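If pandas is not otherwise needed, the standard library can write the same columns. A sketch using `csv.DictWriter` on a shortened copy of the source data `a`; it writes to an in-memory buffer here, and on Python 3 a real file opened with `open(path, 'w', newline='')` should behave the same way:

```python
import csv
import io

a = [{'name': 'paul', 'age': '26.', 'address': 'AU', 'gender': 'male'},
     {'name': 'mei', 'age': '26.', 'address': 'NY', 'gender': 'female'}]

buffer = io.StringIO()  # stands in for a file opened for writing
# extrasaction='ignore' skips keys (such as 'gender') not listed in fieldnames.
writer = csv.DictWriter(buffer, fieldnames=['name', 'age', 'address'],
                        extrasaction='ignore')
writer.writeheader()
writer.writerows(a)

print(buffer.getvalue())
```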

Python json add multiple keys and iterate

I am looking to convert a dataframe to json, this is the code I currently have:
my_frame = pd.DataFrame(
    {'Age': [30, 31],
     'Eye': ['blue', 'brown'],
     'Gender': ['male', 'female']})
my_frame = my_frame.to_json(orient='records')
my_frame
Result:
'[{"Age":30,"Eye":"blue","Gender":"male"},{"Age":31,"Eye":"brown","Gender":"female"}]'
I want to add keys to the json object and add the key info over the entire data that was converted from the dataframe.
add_keys = {'id': 101,
            'loc': 'NY',
            }
add_keys['info'] = my_frame
add_keys
Result:
{'id': 101,
'info': '[{"Age":30,"Eye":"blue","Gender":"male"},
{"Age":31,"Eye":"brown","Gender":"female"}]',
'loc': 'NY'}
I want to print each of the two records within info, however when I run this iterative code it outputs each character of the string rather than the entire record. I believe this may be an issue from how I am adding the keys.
for item in add_keys['info']:
    print(item)
Any help greatly appreciated!
It is better to use pandas inbuilt functionality here. So, this is what you need: add_keys['info'] = my_frame.T.to_dict().values()
Here is the whole code:
>>> my_frame = pd.DataFrame(
...     {'Age': [30, 31],
...      'Eye': ['blue', 'brown'],
...      'Gender': ['male', 'female']})
>>> my_frame
   Age    Eye  Gender
0   30   blue    male
1   31  brown  female
>>> add_keys = {'id': 101,
...             'loc': 'NY',
...             }
>>> add_keys
{'loc': 'NY', 'id': 101}
>>> add_keys['info'] = my_frame.T.to_dict().values()
>>> add_keys
{'info': [{'Gender': 'male', 'Age': 30L, 'Eye': 'blue'}, {'Gender': 'female', 'Age': 31L, 'Eye': 'brown'}], 'loc': 'NY', 'id': 101}
>>> for item in add_keys['info']:
...     print(item)
...
{'Gender': 'male', 'Age': 30L, 'Eye': 'blue'}
{'Gender': 'female', 'Age': 31L, 'Eye': 'brown'}
>>>
When you are using to_json(), pandas is generating a string containing the JSON representation of the dataframe.
If you want to retain the structure of your records in order to manipulate them, use
my_frame = my_frame.to_dict(orient='records')
Then after adding keys, if you want to serialize your data, you can do
json.dumps(add_keys)
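To see why the original loop printed single characters: iterating over a str yields its characters, and to_json() returned a str. A pandas-free sketch using the JSON string shown in the question; decoding it with `json.loads` gives back a list of records (`records_json` and `serialized` are names introduced here):

```python
import json

# The string produced by my_frame.to_json(orient='records') in the question.
records_json = '[{"Age":30,"Eye":"blue","Gender":"male"},{"Age":31,"Eye":"brown","Gender":"female"}]'

add_keys = {'id': 101, 'loc': 'NY'}
add_keys['info'] = json.loads(records_json)  # a list of dicts, not a string

for item in add_keys['info']:
    print(item)  # one whole record per iteration

serialized = json.dumps(add_keys)  # back to a single JSON string when needed
```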
