Count unique values in a JSON - python

I have a json called thefile.json which looks like this:
{
"domain": "Something",
"domain": "Thingie",
"name": "Another",
"description": "Thing"
}
I am trying to write a Python script which would make a set of the values in domain. In this example it would return
{'Something', 'Thingie'}
Here is what I tried:
import json
with open("thefile.json") as my_file:
    data = json.load(my_file)
ids = set(item["domain"] for item in data.values())
print(ids)
I get the error message
unique_ids.add(item["domain"])
TypeError: string indices must be integers
Having looked up answers on Stack Exchange, I'm stumped. Why can't I use a string as an index, seeing as I am using a JSON file whose data type is a dictionary (I think!)? How can I get the values for "domain"?

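So, to start, you can read more about JSON formats here: https://www.w3schools.com/python/python_json.asp
Second, dictionaries must have unique keys, so having two keys named domain will not work: when Python's json module loads an object with a repeated key, it keeps only the last value. That is also where the TypeError comes from: data.values() yields plain strings such as "Thingie", and item["domain"] then tries to index a string with a string. You can read more about Python dictionaries here: https://www.w3schools.com/python/python_dictionaries.asp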
Now, I recommend the following two designs that should do what you need:
Multiple Names, Multiple Domains: In this design, you can access websites and check the domain of each of its values like ids = set(item["domain"] for item in data["websites"])
{
"websites": [
{
"domain": "Something.com",
"name": "Something",
"description": "A thing!"
},
{
"domain": "Thingie.com",
"name": "Thingie",
"description": "A thingie!"
}
]
}
One Name, Multiple Domains: In this design, each website has multiple domains that can be accessed using JVM_Domains = set(data["domains"])
{
"domains": ["Something.com","Thingie.com","Stuff.com"],
"name": "Me Domains",
"description": "A list of domains belonging to Me"
}
I hope this helps. Let me know if I missed any details.
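A minimal sketch of how the two designs above could be read in Python. The JSON is embedded as strings so the example is self-contained; with a real file you would call json.load on the open file instead. The variable names design_one and design_two are illustrative only.

```python
import json

# Design 1: a list of website objects, each with its own "domain".
design_one = json.loads("""
{
  "websites": [
    {"domain": "Something.com", "name": "Something", "description": "A thing!"},
    {"domain": "Thingie.com", "name": "Thingie", "description": "A thingie!"}
  ]
}
""")
ids = set(item["domain"] for item in design_one["websites"])
print(ids)

# Design 2: one object that holds a list of domains.
design_two = json.loads("""
{
  "domains": ["Something.com", "Thingie.com", "Stuff.com"],
  "name": "Me Domains",
  "description": "A list of domains belonging to Me"
}
""")
domains = set(design_two["domains"])
print(domains)
```

In both cases the result is a set, so repeated domains collapse to a single entry.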

You have a problem in your JSON: duplicate keys. I am not sure whether it is forbidden, but I am sure it is badly formatted.
Besides that, it is going to cause you a lot of problems.
A dictionary cannot have duplicate keys. What would a lookup on a duplicate key return?
So, fix your JSON, something like this:
{
"domain": ["Something", "Thingie"],
"name": "Another",
"description": "Thing"
}
Guess what, a good format almost solves your problem (you can still have duplicates in the list, so wrap it in set()) :)
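If the file really cannot be changed, one possible workaround (my own suggestion, not something either answer above proposes) is the object_pairs_hook parameter of json.loads. It hands every key/value pair of an object to a callback before duplicate keys are collapsed. The sketch below assumes a flat object like the one in the question; the callback name collect_domains is made up for illustration.

```python
import json

raw = """
{
  "domain": "Something",
  "domain": "Thingie",
  "name": "Another",
  "description": "Thing"
}
"""

# Default behaviour: only the last value of a repeated key survives.
last_only = json.loads(raw)["domain"]
print(last_only)

def collect_domains(pairs):
    """Return the set of values stored under "domain" in one JSON object."""
    # pairs is a list of (key, value) tuples in document order, duplicates included.
    return {value for key, value in pairs if key == "domain"}

# For a flat object the hook's return value becomes the result of json.loads.
domains = json.loads(raw, object_pairs_hook=collect_domains)
print(domains)
```

The hook is called for every object in the document, so nested data would need a more careful callback.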

Related

DynamoDB Query by nested array

I have a Companies table in DynamoDB that looks like this:
company: {
id: "11",
name: "test",
jobs: [
{
"name": "painter",
"id": 3
},
{
"name": "gardner",
"id": 2
}
]
}
And I want to make a scan query that gets all the companies with the "painter" job inside their jobs array.
I am using Python and boto3.
I tried something like this, but it didn't work:
jobs = ["painter"]
response = self.table.scan(
    FilterExpression=Attr('jobs.name').is_in(jobs)
)
Please help.
Thanks.
It looks like this may not be doable in general; however, the method applied in that link may still be useful. If you know the maximum length of the jobs array over all of your data, you could create an expression for each index and chain them with ORs. Notably, I could not find documentation for handling map and list scan expressions, so I can't really say whether you'd also need to check that you're not going out of bounds.
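A rough sketch of the "one clause per index, chained with ORs" idea. It only builds the keyword arguments for scan() and does not talk to DynamoDB, so it runs without boto3. The helper name build_job_scan_kwargs and the max_jobs bound are assumptions for illustration, and I have not verified how DynamoDB treats an index past the end of a shorter jobs list.

```python
def build_job_scan_kwargs(job_name, max_jobs):
    """Build scan() keyword arguments that test jobs[0] .. jobs[max_jobs - 1]."""
    # "name" is a DynamoDB reserved word, so it is aliased as #n.
    clauses = ["jobs[{}].#n = :job".format(i) for i in range(max_jobs)]
    return {
        "FilterExpression": " OR ".join(clauses),
        "ExpressionAttributeNames": {"#n": "name"},
        "ExpressionAttributeValues": {":job": job_name},
    }

kwargs = build_job_scan_kwargs("painter", max_jobs=3)
print(kwargs["FilterExpression"])

# With a boto3 Table resource (assumed to exist as `table`) this would be used as:
# response = table.scan(**kwargs)
```

A scan still reads every item in the table and filters afterwards, so this can be slow and costly on large tables.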

How to copy a python script which includes dictionaries to a new python script?

I have a Python script which contains dictionaries and is used as input by another Python script which performs calculations. I want to use the first script as a template to create more scripts with the exact same structure in the dictionaries but different values for the keys.
Original Script: Car1.py
Owner = {
"Name": "Jim",
"Surname": "Johnson",
}
Car_Type = {
"Make": "Ford",
"Model": "Focus",
"Year": "2008"
}
Car_Info = {
"Fuel": "Gas",
"Consumption": 5,
"Max Speed": 190
}
I want to be able to create more input files with identical format but for different cases, e.g.
New Script: Car2.py
Owner = {
"Name": "Nick",
"Surname": "Perry",
}
Car_Type = {
"Make": "BMW",
"Model": "528",
"Year": "2015"
}
Car_Info = {
"Fuel": "Gas",
"Consumption": 10,
"Max Speed": 280
}
So far, I have only seen answers that print just the keys and the values in a new file but not the actual name of the dictionary as well. Can someone provide some help? Thanks in advance!
If you really want to do it that way (not recommended, because of the reasons stated in the comment by spectras and the good alternatives available) and import your input Python file:
This question has answers on how to read out the dictionaries names from the imported module. (using the dict() on the module while filtering for variables that do not start with "__")
Then get the new values for the dictionary entries and construct the new dicts.
Finally, you need to write an exporter that takes care of storing the data in a Python-readable form, just like you would construct a normal text file.
I do not see any advantage over just storing it in a storage format.
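A minimal sketch of the exporter step described above, assuming the new values are already available as a dict of named dictionaries. The function name export_dicts and the temporary output path are made up for illustration; pprint.pformat is used because it produces a valid Python literal for plain dicts of strings and numbers.

```python
import os
import pprint
import runpy
import tempfile

def export_dicts(path, named_dicts):
    """Write each dictionary to `path` as a `Name = {...}` assignment."""
    with open(path, "w") as out:
        for name, value in named_dicts.items():
            out.write("{} = {}\n\n".format(name, pprint.pformat(value)))

car2 = {
    "Owner": {"Name": "Nick", "Surname": "Perry"},
    "Car_Type": {"Make": "BMW", "Model": "528", "Year": "2015"},
    "Car_Info": {"Fuel": "Gas", "Consumption": 10, "Max Speed": 280},
}

# Write to a temporary directory so the example does not touch real files.
out_path = os.path.join(tempfile.mkdtemp(), "Car2.py")
export_dicts(out_path, car2)

# Run the generated script and read its globals back to check the round trip.
loaded = runpy.run_path(out_path)
print(loaded["Car_Type"])
```

The generated file keeps the dictionary names, which is the part the question says was missing from other answers.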
Read the file with something like
with open('yourfile.py', 'r') as src:
    text = src.read().split('\n')
and then interpret the list of strings you get. After that, you can save it with something like
with open('newfile.py', 'w') as new_text:
    # split('\n') removed the newlines, so put them back when writing
    new_text.write('\n'.join(text))
As spectras said earlier, this is not ideal, but if that's what you want to do, go for it.

Parsing JSON with Python from URL

So I'm trying to get JSON from a URL. The request works and I get the JSON, but I'm not able to print specific things from it.
request_url = 'http://api.tumblr.com/v2/user/following?limit=1'
r = requests.get(request_url, auth=oauth).json()
r["updated"]
I'm very new to Python. I'm guessing I need to get the JSON into an array, but I have no idea where to even begin.
According to the Tumblr API, I should be able to get something like this:
{
"meta": {
"status": 200,
"msg": "OK"
},
"response": {
"total_blogs": 4965,
"blogs": [
{
"name": "xxxxx",
"title": "xxxxxx",
"description": "",
"url": "http://xxxxxx.tumblr.com/",
"updated": 1439795949
}
]
}
}
I only need the name, url, and updated; I just have no idea how to separate them out.
Just access the levels one by one.
for i in r["response"]["blogs"]:
    print(i["name"], i["url"], i["updated"])
This code prints those three fields for every object inside the blogs list.
To explain how this works:
JSON objects are decoded into dictionaries in Python. Dictionaries are simple key-value pairs. In your example,
r is a dictionary with the following keys:
meta, response
You access the value of a key using r["meta"].
Now meta itself is a dictionary. The keys associated are:
status,msg
So, r["meta"]["status"] gives the status value returned by the request.
You should be able to print values as though they were nested arrays:
r["response"]["blogs"][0]["updated"] should get you the updated bit; don't go straight to it, just work your way down. Note that blogs is an array, so in a normal case you may actually want to work towards r["response"]["blogs"], then loop through it and grab ["updated"] from each item.
Similarly, r["meta"]["msg"] will get you the meta message.
The JSON data is converted to a dict, which your code assigns to r.
To access the value associated with the updated key, you first need to access the levels above it.
First access r["response"], which contains the actual response of the API. From that level, access r["response"]["blogs"] and then loop through that list to find the value of the updated key in each blog.
If there is a single blog, you can do something like r["response"]["blogs"][0]["updated"].
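To make the "work your way down" advice concrete, here is a small sketch that uses a hard-coded dict shaped like the documented response in place of the real request (the values are the placeholders from the question). The names wanted and blogs are illustrative.

```python
# Stand-in for requests.get(...).json(); same shape as the documented response.
r = {
    "meta": {"status": 200, "msg": "OK"},
    "response": {
        "total_blogs": 4965,
        "blogs": [
            {
                "name": "xxxxx",
                "title": "xxxxxx",
                "description": "",
                "url": "http://xxxxxx.tumblr.com/",
                "updated": 1439795949,
            }
        ],
    },
}

# Keep only the three fields of interest for every blog in the list.
wanted = ("name", "url", "updated")
blogs = [{key: blog[key] for key in wanted} for blog in r["response"]["blogs"]]

for blog in blogs:
    print(blog["name"], blog["url"], blog["updated"])
```

Note that r["updated"] from the question fails because updated lives two levels down, inside each item of the blogs list.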

parsing JSON files in python to get specific values [duplicate]

This question already has answers here:
How can I extract a single value from a nested data structure (such as from parsing JSON)?
(5 answers)
Closed 4 years ago.
I'm trying to simply grab information from a JSON file using Python. I've seen many threads that briefly cover this; however, I have some questions that I have not seen answered here.
Let's say I have the following json file:
{
"href" : "http://www.google.com",
"Hosts" : {
"cluster_name" : "MyCluster",
"cpu_count" : 4,
"disk_info" : [
{
"available" : "78473288",
"used" : "15260444"
},
{
"available" : "4026904",
"used" : "8"
},
{
"available" : "110454",
"used" : "73547"
}
]
}
}
I understand that I can have the following code:
import json
#Store JSON data into json_file
json_data = json.loads(json_file)
print json_data["Hosts"]
And the output of this code would give me everything that is in hosts:
"cluster_name" : "MyCluster",
"cpu_count" : 4,
"disk_info" : [
{
"available" : "78473288",
"used" : "15260444",
},
{
"available" : "4026904",
"used" : "8",
},
{
"available" : "110454",
"used" : "73547",
}
]
However, how do I grab specific values from specific lines within this json file that has multiple lines of data embedded in others?
For example, what if I wanted to get the value of "cpu_count" from "Hosts"?
Or for an even more difficult example, how would I get the second listing of "available" that is inside "disk_info" which is inside "Hosts"?
I'm much more interested in finding a way to grab specific values within a json file, not an entire list of data.
Any help is appreciated, and please point it out if there is already a thread that ACTUALLY covers this. Thank you!
json.loads will also decode that nested dictionary, so to access cpu_count, for example, you would write json_data["Hosts"]["cpu_count"].
The json library turns everything into a standard Python data type (dict, list, int, str), and converts those data types back into JSON. To serialize instances of your own classes, you can also supply a custom encoder (for example, through the default parameter of json.dumps).
If you understand that you can do json_data["Hosts"], what's stopping you taking the same principle further?
json_data["Hosts"]["cpu_count"]
json_data["Hosts"]["disk_info"][1]["available"]
and so on.
Note that this has nothing at all to do with JSON: this is a perfectly normal Python data structure consisting of dicts and lists.
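A short runnable sketch of the two lookups asked about, using the sample data from the question with the trailing commas removed so that json.loads accepts it. The variable names cpu_count and second_available are illustrative.

```python
import json

json_file = """
{
  "href": "http://www.google.com",
  "Hosts": {
    "cluster_name": "MyCluster",
    "cpu_count": 4,
    "disk_info": [
      {"available": "78473288", "used": "15260444"},
      {"available": "4026904", "used": "8"},
      {"available": "110454", "used": "73547"}
    ]
  }
}
"""

json_data = json.loads(json_file)

# A dict inside a dict: chain the keys.
cpu_count = json_data["Hosts"]["cpu_count"]

# A list inside a dict: index 1 is the second entry.
second_available = json_data["Hosts"]["disk_info"][1]["available"]

print(cpu_count, second_available)
```

The available values stay strings because they are quoted in the JSON; wrap them in int() if numbers are needed.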

Multiple FOR loops in iterating over dictionary in Python

This is a simplistic example of a dictionary created by a json.load that I have to deal with:
{
"name": "USGS REST Services Query",
"queryInfo": {
"timePeriod": "PT4H",
"format": "json",
"data": {
"sites": [{
"id": "03198000",
"params": "[00060, 00065]"
},
{
"id": "03195000",
"params": "[00060, 00065]"
}]
}
}
}
Sometimes there may be 15-100 sites with unknown sets of parameters at each site. My goal is to either create two lists (one storing "site" IDs and the other storing "params") or a much simplified dictionary from this original dictionary. Is there a way to do this using nested for loops over key, value pairs with the iteritems() method?
What I have tried so far is this:
queryDict = {}
for key,value in WS_Req_dict.iteritems():
    if key == "queryInfo":
        if value == "data":
            for key, value in WS_Req_dict[key][value].iteritems():
                if key == "sites":
                    siteVal = key
                    if value == "params":
                        paramList = [value]
queryDict["sites"] = siteVal
queryDict["sites"]["params"] = paramList
I run into trouble getting the second FOR loop to work. I haven't looked into pulling out lists yet.
I think this may be an overall stupid way of doing it, but I can't see around it yet.
I think you can make your code much simpler by just indexing, when feasible, rather than looping over iteritems.
for site in WS_Req_dict['queryInfo']['data']['sites']:
    queryDict[site['id']] = site['params']
If some of the keys might be missing, dict's get method is your friend:
for site in WS_Req_dict.get('queryInfo',{}).get('data',{}).get('sites',[]):
would let you quietly ignore missing keys. But, this is much less readable, so, if I needed it, I'd encapsulate it into a function -- and often you may not need this level of precaution! (Another good alternative is a try/except KeyError encapsulation to ignore missing keys, if they are indeed possible in your specific use case).
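A sketch of the encapsulation suggested above, written for Python 3 and using the sample dictionary from the question. The helper name get_sites is made up; it wraps the chained get calls so the rest of the code stays readable, and it produces both outputs the question asks for (two lists, or one simplified dictionary).

```python
WS_Req_dict = {
    "name": "USGS REST Services Query",
    "queryInfo": {
        "timePeriod": "PT4H",
        "format": "json",
        "data": {
            "sites": [
                {"id": "03198000", "params": "[00060, 00065]"},
                {"id": "03195000", "params": "[00060, 00065]"},
            ]
        },
    },
}

def get_sites(request):
    """Return the list of sites, or an empty list if any level is missing."""
    return request.get("queryInfo", {}).get("data", {}).get("sites", [])

sites = get_sites(WS_Req_dict)

# Option 1: two parallel lists.
site_ids = [site["id"] for site in sites]
param_list = [site["params"] for site in sites]

# Option 2: one simplified dictionary keyed by site id.
queryDict = {site["id"]: site["params"] for site in sites}

print(site_ids, param_list, queryDict)
```

In this sample each params value is a single string such as "[00060, 00065]", not a list, so it would need its own parsing if the individual codes are required.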
