Remove duplicates from a list of nested dictionaries - python

I'm writing my first Python program to manage users in Atlassian On Demand using their RESTful API. I call the users/search?username= API to retrieve lists of users, which returns JSON. The result is a list of complex dictionaries that look something like this:
[
{
"self": "http://www.example.com/jira/rest/api/2/user?username=fred",
"name": "fred",
"avatarUrls": {
"24x24": "http://www.example.com/jira/secure/useravatar?size=small&ownerId=fred",
"16x16": "http://www.example.com/jira/secure/useravatar?size=xsmall&ownerId=fred",
"32x32": "http://www.example.com/jira/secure/useravatar?size=medium&ownerId=fred",
"48x48": "http://www.example.com/jira/secure/useravatar?size=large&ownerId=fred"
},
"displayName": "Fred F. User",
"active": false
},
{
"self": "http://www.example.com/jira/rest/api/2/user?username=andrew",
"name": "andrew",
"avatarUrls": {
"24x24": "http://www.example.com/jira/secure/useravatar?size=small&ownerId=andrew",
"16x16": "http://www.example.com/jira/secure/useravatar?size=xsmall&ownerId=andrew",
"32x32": "http://www.example.com/jira/secure/useravatar?size=medium&ownerId=andrew",
"48x48": "http://www.example.com/jira/secure/useravatar?size=large&ownerId=andrew"
},
"displayName": "Andrew Anderson",
"active": false
}
]
I'm calling this multiple times and thus getting duplicate people in my results. I have been searching and reading but cannot figure out how to deduplicate this list. I figured out how to sort this list using a lambda function. I realize I could sort the list, then iterate and delete duplicates. I'm thinking there must be a more elegant solution.
Thank you!

The usernames are unique, right?
Does it have to be a list? Seems like an easy solution would be to make it a dict of dicts instead. Use the usernames as keys, and only the most recent version will be present.
If the values have to be ordered, there is an OrderedDict type you could look into: http://docs.python.org/2/library/collections.html#collections.OrderedDict

Let's say this is what you've got:
JSON = [
{
"name": "fred",
...
},
{
"name": "peter",
...
},
{
"name": "fred",
...
},
]
Converting this list of dicts to a dict of dicts keyed by username removes the duplicates, like so:
r = dict([(user['name'], user) for user in JSON])
In r you will find only one record each for fred and peter (the last one seen for each name).
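If the result needs to be a list again, the same idea can be wrapped in a small helper. This is a minimal sketch, assuming Python 3.7+ (where plain dicts keep insertion order) and that every record has a "name" key. The function name dedupe_by_key and the sample records are illustrative and not part of the Atlassian API.

```python
def dedupe_by_key(records, key="name"):
    """Return records with duplicates (same value under `key`) removed.

    The most recent record for each key wins, but it keeps the position
    at which that key was first seen.
    """
    unique = {}
    for record in records:
        unique[record[key]] = record
    return list(unique.values())


# Hypothetical sample data shaped like the search results above.
users = [
    {"name": "fred", "active": False},
    {"name": "peter", "active": True},
    {"name": "fred", "active": True},
]

deduped = dedupe_by_key(users)
print([u["name"] for u in deduped])  # ['fred', 'peter']
```

Because the dict is keyed by username, results from several API calls can be passed through the same helper after concatenating the lists.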

Related

creating dictionary inside a dictionary for converting it into Json data

My workflow is as follows: I have some dictionaries in different files, and I import them into one file, let's say demo.py. In demo.py I have a variable temp_dict, into which the dictionaries from the different files are added one by one.
Example:
{
"Action": None,
"Parameter": "abc",
"ParameterDescription": "def",
"ExpectedValue": "ghi",
{
"Extensions":"jkl",
"MappedData": "no",
"Parameters": "pqr",
"Labels": "Stu",
}
{
"Recorder": "abc",
"Diagnostics": "efg",
"AdditionalRemarks": ""
}
}
I want this type of structure: I need to add dictionaries inside a dictionary. How can I do that?
Here is the Python code I have so far:
import json

# function to add data to JSON
def write_json(new_data, filename='report.JSON'):
    # new_data is the dictionaries coming from other files; it is converted to JSON and written to a file.
    with open(filename, 'w') as f:
        json_string = json.dumps(new_data)
        f.write(json_string)
Thanks in advance
The data you've provided is not a valid python dictionary, nor valid JSON.
Dictionaries and JSON are key: value pairs. The value might be a nested dict/JSON, however in your example the nested dictionaries do not have a key.
However, something like this would work:
{
"Action": None,
"Parameter": "abc",
"ParameterDescription": "def",
"ExpectedValue": "ghi",
"YOU NEED SOME NAME HERE": {
"Extensions":"jkl",
"MappedData": "no",
"Parameters": "pqr",
"Labels": "Stu",
},
…
}
You might have been thinking of JSON objects/dicts inside arrays. There the dictionaries don't have to be named, but that's because they implicitly have a name: their index (position) in the ordered array.
[
{
"name": "Faboor",
"type": "user"
},
{
"name": "prithvi",
"reputation": 19
}
]
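To make the "give each nested dictionary a key" advice concrete, here is a minimal sketch that builds the structure from the question and serializes it. The key names "Details" and "Recording" are placeholders chosen for illustration; the question does not say what the nested sections should be called.

```python
import json

# Top-level fields from the question.
base = {
    "Action": None,
    "Parameter": "abc",
    "ParameterDescription": "def",
    "ExpectedValue": "ghi",
}

# Dictionaries that would come from the other files.
details = {
    "Extensions": "jkl",
    "MappedData": "no",
    "Parameters": "pqr",
    "Labels": "Stu",
}
recording = {
    "Recorder": "abc",
    "Diagnostics": "efg",
    "AdditionalRemarks": "",
}

# Nest each dictionary under its own (hypothetical) key.
temp_dict = dict(base)
temp_dict["Details"] = details
temp_dict["Recording"] = recording

# json.dumps converts Python's None to JSON null.
json_string = json.dumps(temp_dict, indent=2)
print(json_string)
```

The resulting string can be passed to a function such as write_json from the question, or written directly with json.dump.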

How to iterate through a nested list in python?

I want to iterate through a list that has a lot of dictionaries inside it. The json response I'm trying to iterate looks something like this:
user 1 JSON response:
[
{
"id": "333",
"name": "hello"
},
{
"id": "999",
"name": "hi"
},
{
"id": "666",
"name": "abc"
},
]
user 2 JSON response:
[
{
"id": "555",
"name": "hello"
},
{
"id": "1001",
"name": "hi"
},
{
"id": "26236",
"name": "abc"
},
]
This is not the actual JSON response but it is structured the same way. What I'm trying to do is to find a specific id and store it in a variable. The JSON response I'm trying to iterate is not organized and changes every time depending on the user. So I need to find the specific id which would be easy but there are many dictionaries inside the list. I tried iterating like this:
for guild_info in guilds:
    for guild_ids in guild_info:
This returns the first dictionary which is id: 333. For example, I want to find the value 666 and store it in a variable. How would I do that?
What you have is a list of dictionaries.
When you run for guild_info in guilds: you will iterate through dictionaries, so here each guild_info will be a dictionary. Therefore simply take the key id like so: guild_info['id'].
If what you want to do is find the name corresponding to a specific id, you can use a list comprehension and take its first element, as follows (note that this raises an IndexError if no entry matches):
name = [x['name'] for x in guilds if x['id'] == '666'][0]
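A variant of the one-liner that stops at the first match and does not fail when nothing matches is to pass a generator expression to the built-in next() with a default. This is a sketch using the sample data from the question; the helper name find_name is made up for illustration.

```python
guilds = [
    {"id": "333", "name": "hello"},
    {"id": "999", "name": "hi"},
    {"id": "666", "name": "abc"},
]


def find_name(guilds, wanted_id, default=None):
    # next() pulls the first item from the generator; if the generator
    # is empty (no matching id), it returns `default` instead of raising.
    return next((g["name"] for g in guilds if g["id"] == wanted_id), default)


print(find_name(guilds, "666"))    # abc
print(find_name(guilds, "10000"))  # None
```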
Here's a function that will search only until it finds the matching id and then return, which avoids checking further entries unnecessarily.
def get_name_for_id(user, id_to_find):
    # user is a list, and each guild in it is a dictionary.
    for guild in user:
        if guild['id'] == id_to_find:
            # Once the matching id is found, we're done.
            return guild['name']
    # If the loop completes without returning, then there was no match.
    return None
user = [
{
"id": "333",
"name": "hello"
},
{
"id": "999",
"name": "hi"
},
{
"id": "666",
"name": "abc"
},
]
name = get_name_for_id(user, '666')
print(name)
name2 = get_name_for_id(user, '10000')
print(name2)
Output:
abc
None
If you are looking for a simple approach, this loop iterates over the list of dictionaries and prints every value in each one:
for every_dictionary in List_of_dictionary:
    for every_dictionary_item in every_dictionary.keys():
        print(every_dictionary[every_dictionary_item])

Printing each instance of a single line item from a JSON using python

Does anyone know how to print multiple instances of the same field from a JSON output?
The output I wish to decipher looks something like this:
[
{
"project": {
"id": 6514847,
"name": "Trial_1",
"code": "123",
"created_at": "2014-10-08T04:22:14Z",
"updated_at": "2017-04-11T00:32:43Z",
"starts_on": "2014-10-08"
}
},
{
"project": {
"id": 6514864,
"name": "Trial_2",
"code": "456",
"created_at": "2014-10-08T04:26:39Z",
"updated_at": "2017-04-11T00:32:46Z",
"starts_on": "2014-10-08"
}
},
{
"project": {
"id": 12502453,
"name": "Trial_3",
"code": "789",
"created_at": "2016-12-08T05:14:38Z",
"updated_at": "2017-04-11T00:32:38Z",
"starts_on": "2016-12-08"
}
}
]
This output came from a requests.get() call.
I know I can print a single instance of this using
req = requests.get(url, headers=headers)
read_req = req.json()
trial = read_req[0]['project']['code']  # the response is a list, so index the first entry
print(trial) #123
The final product I wish to see is linking each Project Name to its relevant Project Code.
You have a list of dicts of dicts. To iterate over each "project" dict you just use a for loop.
for entry in read_req:
    trial = entry['project']['code']
    print(trial)
In this case, each time through the loop entry will be a dictionary containing the "project" key.
You need a for loop.
read_req = req.json()
for project in read_req:
    print(project['project']['code'])
This should work for you, assuming jsontxt holds the parsed input data:
for item in jsontxt:
    print(item['project']['name'], item['project']['code'])
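Since the goal is to link each project name to its code, the pairs can also be collected into a dictionary rather than only printed. The sketch below uses a trimmed copy of the sample response in place of the real req.json() result, and assumes project names are unique.

```python
# Trimmed stand-in for the list returned by req.json().
read_req = [
    {"project": {"id": 6514847, "name": "Trial_1", "code": "123"}},
    {"project": {"id": 6514864, "name": "Trial_2", "code": "456"}},
    {"project": {"id": 12502453, "name": "Trial_3", "code": "789"}},
]

# Map each project name to its code.
code_by_name = {
    entry["project"]["name"]: entry["project"]["code"] for entry in read_req
}

for name, code in code_by_name.items():
    print(name, code)
```

If two projects could share a name, the later one would overwrite the earlier one in this mapping, so a list of (name, code) tuples would be the safer structure in that case.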

Python .get nested Json values

I have a json file with the following example json entry:
{
"title": "Test prod",
"leafPage": true,
"type": "product",
"product": {
"title": "test product",
"offerPrice": "$19.95",
"offerPriceDetails": {
"amount": 19.95,
"text": "$19.95",
"symbol": "$"
},
"media": [
{
"link": "http://www.test.com/cool.jpg",
"primary": true,
"type": "image",
"xpath": "/html[1]/body[1]/div[1]/div[3]/div[2]/div[1]/div[1]/div[1]/div[1]/a[1]/img[1]"
}
],
"availability": true
},
"human_language": "en",
"url": "http://www.test.com"
}
I can post this to my test server via a Python script without problems when I use:
"text": entry.get("title"),
"url": entry.get("url"),
"type": entry.get("type"),
However, I cannot get the following nested item to upload its value. How do I structure the Python call to get a nested JSON entry?
I've tried the below without success. I need to use .get because entries in the JSON file have different fields, and it errors out without the .get call.
"Amount": entry.get("product"("offerPrice"))
Any help on how to structure the nested json entry would be very much appreciated.
You need to do:
"Amount": entry.get("product", {}).get("offerPrice")
entry.get("product", {}) returns a product dictionary (or an empty dictionary if there is no product key).
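For deeper nesting, chaining .get calls gets long, and the chained form still fails if an intermediate value is present but is not a dictionary (for example "product": null in the JSON). One possible way around both is a small helper; deep_get is a hypothetical name, not a standard library function.

```python
_MISSING = object()  # sentinel so that stored None values are handled correctly


def deep_get(data, *keys, default=None):
    """Follow `keys` through nested dicts, returning `default` on any miss."""
    current = data
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key, _MISSING)
        if current is _MISSING:
            return default
    return current


# Trimmed version of the entry from the question.
entry = {
    "title": "Test prod",
    "product": {
        "offerPrice": "$19.95",
        "offerPriceDetails": {"amount": 19.95},
    },
}

print(deep_get(entry, "product", "offerPrice"))                   # $19.95
print(deep_get(entry, "product", "offerPriceDetails", "amount"))  # 19.95
print(deep_get({"title": "no product"}, "product", "offerPrice")) # None
```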

Validate dicts in Python

I'm looking for a tool, or examples of how to validate dictionaries in Python.
For example, I have this dict:
test = {'foo' : 'bar', 'nested' : {'foo1' : 'bar1', 'foo2' : 'bar2'} }
Now I must validate it. Let's say the value for key foo must be boolean False or a non-empty string. Next, if key foo1 has the value bar1, then key foo2 must be an int in the range 1..10. I wrote a simple function to do this, but it is not exactly what I want. Sure, I can test every single item in the dict with if..else, but if the dict has more than 50 elements, that becomes unwieldy.
Is there any good tool/lib to do this in Python? I'm not looking for parsers, only a fast and effective way to do this right.
Voluptuous is a nice tool that does this:
http://pypi.python.org/pypi/voluptuous
You can also try the link below:
https://github.com/sunlightlabs/validictory
It's a great package that makes validation easier.
I highly recommend Cerberus for its readability, or jsonschema because it uses the JSON Schema standard.
Webster is a PyPI package that does dictionary validation and value regex validation. This allows you to ensure that the dictionary has all the keys it's supposed to and that the values are more or less what you would expect.
https://pypi.python.org/pypi/Webster
This dict-schema-validator package is a very simple way to validate python dictionaries.
Here is a simple schema representing a Customer:
{
"_id": "ObjectId",
"created": "date",
"is_active": "bool",
"fullname": "string",
"age": ["int", "null"],
"contact": {
"phone": "string",
"email": "string"
},
"cards": [{
"type": "string",
"expires": "date"
}]
}
Validation:
from datetime import datetime
import json
from dict_schema_validator import validator
with open('models/customer.json', 'r') as j:
    schema = json.loads(j.read())
customer = {
"_id": 123,
"created": datetime.now(),
"is_active": True,
"fullname": "Jorge York",
"age": 32,
"contact": {
"phone": "559-940-1435",
"email": "york#example.com",
"skype": "j.york123"
},
"cards": [
{"type": "visa", "expires": "12/2029"},
{"type": "visa"},
]
}
errors = validator.validate(schema, customer)
for err in errors:
    print(err['msg'])
Output:
[*] "_id" has wrong type. Expected: "ObjectId", found: "int"
[+] Extra field: "contact.skype" having type: "str"
[*] "cards[0].expires" has wrong type. Expected: "date", found: "str"
[-] Missing field: "cards[1].expires"
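For comparison, the two rules from the question (foo is False or a non-empty string; if foo1 is bar1 then foo2 is an int in 1..10) can be written without any library. This is only a hand-rolled sketch to show what the libraries above would replace; the function name validate_test_dict and the error messages are made up.

```python
def validate_test_dict(data):
    """Return a list of error messages; an empty list means the dict is valid."""
    errors = []

    # Rule 1: foo must be boolean False or a non-empty string.
    foo = data.get("foo")
    if not (foo is False or (isinstance(foo, str) and foo != "")):
        errors.append("foo must be False or a non-empty string")

    # Rule 2: if nested.foo1 == 'bar1', nested.foo2 must be an int in 1..10.
    nested = data.get("nested")
    if not isinstance(nested, dict):
        nested = {}
    if nested.get("foo1") == "bar1":
        foo2 = nested.get("foo2")
        # bool is a subclass of int, so exclude it explicitly.
        is_int = isinstance(foo2, int) and not isinstance(foo2, bool)
        if not (is_int and 1 <= foo2 <= 10):
            errors.append("nested.foo2 must be an int in 1..10 when nested.foo1 is 'bar1'")

    return errors


test = {"foo": "bar", "nested": {"foo1": "bar1", "foo2": "bar2"}}
print(validate_test_dict(test))
```

With more than a handful of rules this style becomes hard to maintain, which is the point at which a declarative schema library pays off.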
