Iterate over nested JSON in Python

I am receiving JSON data from an API and need to print some specific values from it. The code below is what I am using, but it raises KeyError: 'results'.
import json
import requests
headers = {"Content-Type":"application/json","Accept":"application/json"}
r = requests.get('API', headers=headers)
data=r.text
parse_json=json.loads(data)
for result in parse_json['results']:
    for cpu in result['cpumfs']:
        print(cpu.get('pct_im_utilization'))
Below is the json data :
{'d': {'__count': '0', 'results': [{'ID': '6085', 'Name': 'device1', 'DisplayName': None, 'DisplayDescription': None, 'cpumfs': {'results': [{'ID': '6117', 'Timestamp': '1649157300', 'DeviceItemID': '6085', 'pct_im_Utilization': '4.0'}, {'ID': '6117', 'Timestamp': '1649157600', 'DeviceItemID': '6085', 'pct_im_Utilization': '1.0'}, {'ID': '6117', 'Timestamp': '1649157900', 'DeviceItemID': '6085', 'pct_im_Utilization': '4.0'}, {'ID': '6117', 'Timestamp': '1649158200', 'DeviceItemID': '6085', 'pct_im_Utilization': '1.0'},
I need to print the device Name, the Timestamp, and pct_im_Utilization.
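Judging only from the sample shown, the KeyError probably comes from the top-level key being 'd' rather than 'results': the device list sits under parse_json['d']['results'], and each device keeps its samples under 'cpumfs' -> 'results'. The key is also spelled pct_im_Utilization (capital U) in the data. Below is a minimal sketch under those assumptions; the helper name iter_cpu_samples is made up for illustration, and the data is a trimmed copy of the sample.

```python
# Trimmed copy of the sample response from the question (already parsed).
parse_json = {
    "d": {
        "__count": "0",
        "results": [
            {
                "ID": "6085",
                "Name": "device1",
                "cpumfs": {
                    "results": [
                        {"ID": "6117", "Timestamp": "1649157300", "pct_im_Utilization": "4.0"},
                        {"ID": "6117", "Timestamp": "1649157600", "pct_im_Utilization": "1.0"},
                    ]
                },
            }
        ],
    }
}


def iter_cpu_samples(payload):
    """Yield (device name, timestamp, utilization) for every CPU sample."""
    # Devices live under d -> results, not directly under the top level.
    for device in payload["d"]["results"]:
        # Each device wraps its samples in another "results" list.
        for sample in device["cpumfs"]["results"]:
            yield device["Name"], sample["Timestamp"], sample["pct_im_Utilization"]


rows = list(iter_cpu_samples(parse_json))
for name, timestamp, utilization in rows:
    print(name, timestamp, utilization)
```

If the real response nests things differently from the sample, the two key paths in the helper are the only places that would need to change.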

Related

How to fix an error thrown by python stating indices must be integers

I am pulling from an API in URL, and keep getting string indices must be integers error code thrown in python. I am wondering how to fix this? (The url is replaced with "url").
import urllib.request
import json
link = "url"
f = urllib.request.urlopen(link)
data = f.read()
print (str(data, 'utf-8'))
weather = json.loads(data)
print('/n')
print(weather["name"]["temp"])
Here's a sample of the json data:
{"coord":{"lon":-94.2166,"lat":36.4676},"weather":[{"id":600,"main":"Snow","description":"light snow","icon":"13n"}],"base":"stations","main":{"temp":262.83,"feels_like":255.36,"temp_min":262.04,"temp_max":263.71,"pressure":1025,"humidity":92},"visibility":10000,"wind":{"speed":6.17,"deg":30},"clouds":{"all":90},"dt":1613195709,"sys":{"type":1,"id":5695,"country":"US","sunrise":1613135272,"sunset":1613174070},"timezone":-21600,"id":0,"name":"Bella Vista","cod":200}
Here is the data snippet you shared, formatted so that it is easier to see what is going on. As you can see, weather["name"] is valid, but its value is the string 'Bella Vista', so weather["name"]["temp"] tries to index a string with another string, and that is what produces the error you are seeing. Instead, weather["main"]["temp"] will give you the temperature value in that dictionary.
weather = {'base': 'stations',
           'clouds': {'all': 90},
           'cod': 200,
           'coord': {'lat': 36.4676, 'lon': -94.2166},
           'dt': 1613195709,
           'id': 0,
           'main': {'feels_like': 255.36,
                    'humidity': 92,
                    'pressure': 1025,
                    'temp': 262.83,
                    'temp_max': 263.71,
                    'temp_min': 262.04},
           'name': 'Bella Vista',
           'sys': {'country': 'US',
                   'id': 5695,
                   'sunrise': 1613135272,
                   'sunset': 1613174070,
                   'type': 1},
           'timezone': -21600,
           'visibility': 10000,
           'weather': [{'description': 'light snow',
                        'icon': '13n',
                        'id': 600,
                        'main': 'Snow'}],
           'wind': {'deg': 30, 'speed': 6.17}}
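To make the difference concrete, here is a small sketch using a trimmed copy of the same dictionary. It shows the working lookup and catches the failing one so the message can be inspected; the variable names temperature and error_message are only for illustration.

```python
# Trimmed copy of the parsed response from the question.
weather = {
    "main": {"temp": 262.83, "humidity": 92},
    "name": "Bella Vista",
}

# "main" maps to a dict, so a second key lookup works.
temperature = weather["main"]["temp"]
print(temperature)

# "name" maps to a str; indexing a str with a str raises TypeError.
try:
    weather["name"]["temp"]
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```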

how to ajax post form as json type to python then get data correct way?

js part
$('#btnUpdate').click(function(){
    var formData = JSON.stringify($("#contrast_rule_set").serializeArray());
    $.ajax({
        type: "POST",
        url: "./contrast_rule_set",
        data: formData,
        success: function(){},
        dataType: "json",
        contentType : "application/json"
    });
})
python part
@app.route('/get_test', methods=['GET','POST'])
def get_test():
    web_form_data = request.json
    print(web_form_data)
    print(type(web_form_data))
    print(jsonify(web_form_data))
    print(json.dumps(web_form_data))
The Python console prints:
[{'name': 'logic_1', 'value': '1'}, {'name': 'StudyDescription_1', 'value': ''}, {'name': 'SeriesDescription_1', 'value': 'C\\+'}, {'name': 'ImageComments_1', 'value': ''}, {'name': 'logic_2', 'value': '1'}, {'name': 'StudyDescription_2', 'value': '\\-C'}, {'name': 'SeriesDescription_2', 'value': '\\-C'}, {'name': 'ImageComments_2', 'value': '\\-C'}, {'name': 'logic_3', 'value': '1'}, {'name': 'StudyDescription_3', 'value': ''}, {'name': 'SeriesDescription_3', 'value': '\\+C'}, {'name': 'ImageComments_3', 'value': '\\+C'}]
<class 'list'>
How do I convert this list into JSON-style records? Should I adjust the HTML/JavaScript side or the Python side?
I hope to get the data in a shape like this (the sample is from another JSON file of mine):
{
'Logic': 'AND',
'StudyDescription': '',
'SeriesDescription': 'C\+',
'ImageComments': ''
},
{
'Logic': 'NOT',
'StudyDescription': '\-C',
'SeriesDescription': '\-C',
'ImageComments': '\-C'
},
{
'Logic': 'AND',
'StudyDescription': '',
'SeriesDescription': '\+C',
'ImageComments': '\+C'
}
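One way to get closer to that shape on the Python side is to group the name/value pairs by the numeric suffix of each name. The sketch below assumes every field is named like "field_N"; group_form_fields is a hypothetical helper, not part of Flask. The question does not say how a logic value of '1' maps to 'AND' or 'NOT', so the values are left untouched.

```python
from collections import defaultdict


def group_form_fields(pairs):
    """Group serializeArray()-style pairs by the numeric suffix of each name."""
    grouped = defaultdict(dict)
    for pair in pairs:
        # "StudyDescription_2" -> field "StudyDescription", index "2"
        field, _, index = pair["name"].rpartition("_")
        if field and index.isdigit():
            grouped[int(index)][field] = pair["value"]
    # One dict per suffix, in numeric order.
    return [grouped[i] for i in sorted(grouped)]


# Shortened version of the list printed in the question.
web_form_data = [
    {"name": "logic_1", "value": "1"},
    {"name": "StudyDescription_1", "value": ""},
    {"name": "SeriesDescription_1", "value": "C\\+"},
    {"name": "ImageComments_1", "value": ""},
    {"name": "logic_2", "value": "1"},
    {"name": "StudyDescription_2", "value": "\\-C"},
]

records = group_form_fields(web_form_data)
print(records)
```

The result is a plain list of dicts, so json.dumps(records) will serialize it if a JSON string is what you need.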

How do you pull JSON array from API in Python?

I am having issues pulling a JSON array from http://api.divesites.com.
When I manually set the data as a string in Python 3.7, I can see the array.
import json
def main():
    hardcode = """{"request":{"str":null,"timestamp":1572865590,"loc":{"lat":"50.442","lng":"-4.08279999999999"},"mode":"sites","dist":"9","api":1},"sites":[{"currents":null,"distance":"7.47","hazards":null,"lat":"50.3378","name":"Fort Bovisand","water":null,"marinelife":null,"description":null,"maxdepth":null,"mindepth":null,"predive":null,"id":"24387","equipment":null,"lng":"-4.1285"},{"currents":null,"distance":"7.93","hazards":null,"lat":"50.3352","name":"Plymouth Breakwater","water":null,"marinelife":null,"description":null,"maxdepth":null,"mindepth":null,"predive":null,"id":"24388","equipment":null,"lng":"-4.1485"}],"version":1,"loc":{"lat":"50.442","lng":"-4.08279999999999"},"result":true}
"""
    print("Original Data: ---", get_data(hardcode))
    print("Site Data:----", get_data(hardcode)['sites'])
def get_data(data):
    return json.loads(data)
if __name__=='__main__':
    main()
Output
Original Data: --- {'request': {'str': None, 'timestamp': 1572865590, 'loc': {'lat': '50.442', 'lng': '-4.08279999999999'}, 'mode': 'sites', 'dist': '9', 'api': 1}, 'sites': [{'currents': None, 'distance': '7.47', 'hazards': None, 'lat': '50.3378', 'name': 'Fort Bovisand', 'water': None, 'marinelife': None, 'description': None, 'maxdepth': None, 'mindepth': None, 'predive': None, 'id': '24387', 'equipment': None, 'lng': '-4.1285'}, {'currents': None, 'distance': '7.93', 'hazards': None, 'lat': '50.3352', 'name': 'Plymouth Breakwater', 'water': None, 'marinelife': None, 'description': None, 'maxdepth': None, 'mindepth': None, 'predive': None, 'id': '24388', 'equipment': None, 'lng': '-4.1485'}], 'version': 1, 'loc': {'lat': '50.442', 'lng': '-4.08279999999999'}, 'result': True}
Site Data:---- [{'currents': None, 'distance': '7.47', 'hazards': None, 'lat': '50.3378', 'name': 'Fort Bovisand', 'water': None, 'marinelife': None, 'description': None, 'maxdepth': None, 'mindepth': None, 'predive': None, 'id': '24387', 'equipment': None, 'lng': '-4.1285'}, {'currents': None, 'distance': '7.93', 'hazards': None, 'lat': '50.3352', 'name': 'Plymouth Breakwater', 'water': None, 'marinelife': None, 'description': None, 'maxdepth': None, 'mindepth': None, 'predive': None, 'id': '24388', 'equipment': None, 'lng': '-4.1485'}]
However, when I perform a GET request (using urllib.request.urlopen()), I receive the surrounding objects but the 'sites' array comes back empty.
import urllib.request
import json
def get_jsonparsed_data(url):
    response = urllib.request.urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)
def main():
    url = 'http://api.divesites.com/?mode=sites&lat=-50.350874&lng=175.849890&dist=12'
    print("Original Data: ---", get_jsonparsed_data(url))
    print("Sites: ----", get_jsonparsed_data(url)['sites'])
if __name__=='__main__':
    main()
Output
Original Data: --- {'request': {'str': None, 'timestamp': 1572866211, 'loc': {'lat': '-50.350874', 'lng': '175.849890'}, 'mode': 'sites', 'dist': '12', 'api': 1}, 'sites': [], 'version': 1, 'loc': {'lat': '-50.350874', 'lng': '175.849890'}, 'result': True}
Site Data:---- []
Am I doing something wrong, or do I need an extra step?
Solved it. I spent so long focusing on debugging the code I forgot to check the final API call. Turns out I either didn't have enough distance set, or the wrong starting longitude and latitude. This was different to my hard coded JSON.
The URL should have been: http://api.divesites.com/?mode=sites&lat=50.442&lng=-4.08279999999999&dist=9
Stupid mistake. Thanks for helping though. It got me to look.
First, I'd use the requests library instead and then you can find the solution here
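Since the responses shown above echo the query back under 'request', a quick check of that echo can catch this kind of mistake earlier. The following is only a sketch that assumes the response shape seen in the outputs above; describe_sites is a made-up helper name, and it works on an already parsed dictionary so it does not touch the network.

```python
def describe_sites(payload):
    """Summarize a parsed response shaped like the ones shown above."""
    sites = payload.get("sites", [])
    if sites:
        return "{} site(s): {}".format(len(sites), ", ".join(s["name"] for s in sites))
    # An empty list is a valid answer; show what the API says it searched for.
    request = payload.get("request", {})
    loc = request.get("loc", {})
    return "no sites within dist={} of lat={}, lng={}".format(
        request.get("dist"), loc.get("lat"), loc.get("lng")
    )


# Trimmed copy of the empty response from the question.
empty_response = {
    "request": {"loc": {"lat": "-50.350874", "lng": "175.849890"}, "dist": "12"},
    "sites": [],
    "result": True,
}
summary = describe_sites(empty_response)
print(summary)
```

Seeing lat=-50.350874 next to the hard-coded lat of 50.442 makes it clear that the two requests were not asking about the same place.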

How to flatten nested dict formatted '_source' column of csv, into dataframe

I have a CSV with 500+ rows in which one column, "_source", is stored as JSON, and I want to extract it into a pandas DataFrame with each key as its own column. The file is about 1 MB of online social media data (Facebook, Twitter, web-crawled pages, etc.): approximately 528 rows of posts/tweets/text, each containing many dictionaries nested inside dictionaries. I need every key-value pair in those nested dictionaries to become a column in the DataFrame. A few steps from my Jupyter notebook are included below to give a more complete picture.
Thank you so much this will be a huge help!!!
I have tried changing it to a dataframe by doing this
source = pd.DataFrame.from_dict(source, orient='columns')
And it returns something like this... I thought it might unpack the dictionary but it did not.
#source.head()
#_source
#0 {'sub_organization_id': 'default', 'uid': 'aba...
#1 {'sub_organization_id': 'default', 'uid': 'ab0...
#2 {'sub_organization_id': 'default', 'uid': 'ac0...
below is the shape
#source.shape (528, 1)
Below is what an actual "_source" row looks like when stretched out. There are many dictionaries and key:value pairs, and each key needs to be its own column. Thanks! The actual links have been altered/scrambled for privacy reasons.
{'sub_organization_id': 'default',
'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
'project_veid': 'default',
'campaign_id': 'default',
'organization_id': 'default',
'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
'source': 'Explicit',
'version': '1.1',
'type': 'crawl'},
'results': [{'rule_type': 'hashtag',
'rule_tag': 'Far',
'description': None,
'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
'value': '#Far',
'organization_id': None,
'sub_organization_id': None,
'appid': 'ray',
'project_id': 'CDE2F42-5B87-C594-C900E578C',
'rule_id': '1838',
'node_id': None,
'metadata': {'campaign_title': 'AF',
'project_title': 'AF '}}]}],
'render': [{'attribs': {'website': 'github.com/res',
'version': '1.0',
'type': 'Page Render'},
'results': [{'render_status': 'success',
'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
'url': 'https://discooprdapp.com/',
'load_time': 32}]}]},
'norm_attribs': {'website': 'github.com/res',
'version': '1.1',
'type': 'crawl'},
'project_id': 'default',
'system_timestamp': '2019-02-22T19:04:53.569623',
'doc': {'appid': 'subtter',
'links': [],
'response_url': 'https://discooprdapp.com',
'url': 'https://discooprdapp.com/',
'status_code': 200,
'status_msg': 'OK',
'encoding': 'utf-8',
'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
'timestamp': '2019-02-22T19:04:53.963',
'crawlid': '7fd95-785-4dd259-fcc-8752f'},
'type': 'crawl',
'norm': {'body': '\n',
'domain': 'discordapp.com',
'author': 'crawl',
'url': 'https://discooprdapp.com',
'timestamp': '2019-02-22T19:04:53.961283+00:00',
'id': '7fc5-685-4dd9-cc-8762f'}}
before you post make sure the actual code works for the data attached. Thanks!
I tried the code below, but it did not work; there was a syntax error that I could not figure out.
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
^
SyntaxError: invalid syntax
Whoever can help me with this will be a saint!
I had to do something like that a while back. Basically, I used a function that completely flattened the JSON to identify the keys that would become columns, then iterated through the flattened keys to reconstruct a row and append each row to a "results" dataframe. With the data you provided, it created a row with 52 columns, and looking through it, it appears to have put every key into its own column. Anything nested, for example: 'meta': {'rule_matcher':[{'atribs': {'website': ...]} should then have a column name meta.rule_matcher.atribs.website where the '.' denotes those nested keys
data_source = {'sub_organization_id': 'default',
'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
'project_veid': 'default',
'campaign_id': 'default',
'organization_id': 'default',
'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
'source': 'Explicit',
'version': '1.1',
'type': 'crawl'},
'results': [{'rule_type': 'hashtag',
'rule_tag': 'Far',
'description': None,
'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
'value': '#Far',
'organization_id': None,
'sub_organization_id': None,
'appid': 'ray',
'project_id': 'CDE2F42-5B87-C594-C900E578C',
'rule_id': '1838',
'node_id': None,
'metadata': {'campaign_title': 'AF',
'project_title': 'AF '}}]}],
'render': [{'attribs': {'website': 'github.com/res',
'version': '1.0',
'type': 'Page Render'},
'results': [{'render_status': 'success',
'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
'url': 'https://discooprdapp.com/',
'load_time': 32}]}]},
'norm_attribs': {'website': 'github.com/res',
'version': '1.1',
'type': 'crawl'},
'project_id': 'default',
'system_timestamp': '2019-02-22T19:04:53.569623',
'doc': {'appid': 'subtter',
'links': [],
'response_url': 'https://discooprdapp.com',
'url': 'https://discooprdapp.com/',
'status_code': 200,
'status_msg': 'OK',
'encoding': 'utf-8',
'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
'timestamp': '2019-02-22T19:04:53.963',
'crawlid': '7fd95-785-4dd259-fcc-8752f'},
'type': 'crawl',
'norm': {'body': '\n',
'domain': 'discordapp.com',
'author': 'crawl',
'url': 'https://discooprdapp.com',
'timestamp': '2019-02-22T19:04:53.961283+00:00',
'id': '7fc5-685-4dd9-cc-8762f'}}
Code:
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        # Recurse into dicts and lists, building keys joined by '_'.
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            # Leaf value: strip the trailing '_' from the accumulated key.
            out[name[:-1]] = x

    flatten(y)
    return out

flat = flatten_json(data_source)

import pandas as pd
import re

results = pd.DataFrame()
special_cols = []
columns_list = list(flat.keys())
for item in columns_list:
    try:
        # Keys that passed through a list contain '_<index>_'.
        row_idx = re.findall(r'\_(\d+)\_', item )[0]
    except IndexError:
        # No list index in the key: treat it as a top-level column.
        special_cols.append(item)
        continue
    column = re.findall(r'\_\d+\_(.*)', item )[0]
    column = re.sub(r'\_\d+\_', '.', column)
    row_idx = int(row_idx)
    value = flat[item]
    results.loc[row_idx, column] = value

for item in special_cols:
    results[item] = flat[item]
Output:
print (results.to_string())
atribs_website atribs_source atribs_version atribs_type results.rule_type results.rule_tag results.description results.project_veid results.campaign_id results.value results.organization_id results.sub_organization_id results.appid results.project_id results.rule_id results.node_id results.metadata_campaign_title results.metadata_project_title attribs_website attribs_version attribs_type results.render_status results.path results.image_hash results.url results.load_time sub_organization_id uid project_veid campaign_id organization_id norm_attribs_website norm_attribs_version norm_attribs_type project_id system_timestamp doc_appid doc_response_url doc_url doc_status_code doc_status_msg doc_encoding doc_attrs_uid doc_timestamp doc_crawlid type norm_body norm_domain norm_author norm_url norm_timestamp norm_id
0 github.com/res Explicit 1.1 crawl hashtag Far NaN A7180EA-7078-0C7F-ED5D-86AD7 2A6DA0C-365BB-67DD-B05830920 #Far NaN NaN ray CDE2F42-5B87-C594-C900E578C 1838 NaN AF AF github.com/res 1.0 Page Render success https://east.amanaws.com/rays-ime-store/render... bb7674b8ea3fc05bfd027a19815f82c https://discooprdapp.com/ 32.0 default ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b default default default github.com/res 1.1 crawl default 2019-02-22T19:04:53.569623 subtter https://discooprdapp.com https://discooprdapp.com/ 200 OK utf-8 2ab8f2651cb32261b911c990a8b 2019-02-22T19:04:53.963 7fd95-785-4dd259-fcc-8752f crawl \n discordapp.com crawl https://discooprdapp.com 2019-02-22T19:04:53.961283+00:00 7fc5-685-4dd9-cc-8762f
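As a variation on the flattening idea above, here is a sketch that builds dotted paths such as meta.rule_matcher.0.atribs.website directly, keeping list positions in the path instead of parsing them back out with regular expressions. It is not the code that produced the output above; flatten_paths is a hypothetical helper, and the record is only a small excerpt of the "_source" data.

```python
def flatten_paths(value, prefix=""):
    """Return {dotted_path: leaf_value} for nested dicts and lists."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten_paths(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        # List positions become path segments; empty lists add nothing.
        for index, child in enumerate(value):
            flat.update(flatten_paths(child, f"{prefix}{index}."))
    else:
        # Leaf: drop the trailing separator added by the caller.
        flat[prefix[:-1]] = value
    return flat


# Small excerpt of the "_source" record from the question.
record = {
    "uid": "ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b",
    "meta": {"rule_matcher": [{"atribs": {"website": "github.com/res", "type": "crawl"}}]},
    "doc": {"links": [], "status_code": 200},
}

flat_record = flatten_paths(record)
for path, leaf in flat_record.items():
    print(path, "=", leaf)
```

A list of such flat dictionaries, one per row, can be passed to pd.DataFrame to get one column per path; rows that lack a path would simply get NaN in that column.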

Remove item from nested dictionaries if specified key contains None values

I have a list of dictionaries, and I am trying to remove any dictionary in which the value of a certain key is None.
item_dict = [
{'code': 'aaa0000',
'id': 415294,
'index_range': '10-33',
'location': 'A010',
'type': 'True'},
{'code': 'bbb1458',
'id': 415575,
'index_range': '30-62',
'location': None,
'type': 'True'},
{'code': 'ccc3013',
'id': 415575,
'index_range': '14-59',
'location': 'C041',
'type': 'True'}
]
for item in item_dict:
    filtered = dict((k,v) for k,v in item.iteritems() if v is not None)
# Output Results
# Item - aaa0000 is missing
# {'index_range': '14-59', 'code': 'ccc3013', 'type': 'True', 'id': 415575, 'location': 'C041'}
In my example, the output is missing one of the dictionaries, and if I try to create a new list and append filtered to it, item bbb1458 is included in the list as well.
How can I rectify this?
[item for item in item_dict if None not in item.values()]
Each item in this list is a dictionary. And a dictionary is only appended to this list if None does not appear in the dictionary values.
You can create a new list using a list comprehension, filtering on the condition that all values are not None:
item_dict = [
{'code': 'aaa0000',
'id': 415294,
'index_range': '10-33',
'location': 'A010',
'type': 'True'},
{'code': 'bbb1458',
'id': 415575,
'index_range': '30-62',
'location': None,
'type': 'True'},
{'code': 'ccc3013',
'id': 415575,
'index_range': '14-59',
'location': 'C041',
'type': 'True'}
]
filtered = [d for d in item_dict if all(value is not None for value in d.values())]
print(filtered)
#[{'index_range': '10-33', 'id': 415294, 'location': 'A010', 'type': 'True', 'code': 'aaa0000'}, {'index_range': '14-59', 'id': 415575, 'location': 'C041', 'type': 'True', 'code': 'ccc3013'}]
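Both answers above drop a dictionary if any of its values is None. If only one specified key should decide, as the title suggests, a sketch along these lines may be closer; drop_where_none is a made-up helper name, and the data is a shortened copy of the list from the question.

```python
def drop_where_none(items, key):
    """Keep only the dicts whose value for `key` is not None."""
    # dict.get returns None for a missing key, so dicts without
    # the key are dropped as well.
    return [item for item in items if item.get(key) is not None]


# Shortened copy of the list from the question.
item_dict = [
    {"code": "aaa0000", "id": 415294, "location": "A010"},
    {"code": "bbb1458", "id": 415575, "location": None},
    {"code": "ccc3013", "id": 415575, "location": "C041"},
]

kept = drop_where_none(item_dict, "location")
print([item["code"] for item in kept])
```

If dictionaries that lack the key entirely should be kept, the condition would need to test `key in item` separately.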
