Working with multiple JSONs from API calls in Python

I'm trying to make multiple API calls to retrieve JSON files. The JSONs all follow the same schema. I want to merge all the JSON files together as one file so I can do two things:
1) Extract all the IP addresses from the JSON to work with later
2) Convert the JSON into a Pandas Dataframe
When I first wrote the code, I made a single request and it returned a JSON that I could work with. Now I have used a for loop to collect multiple JSONs and append them to a list called results_list so that the next JSON does not overwrite the previous one I requested.
Here's the code
import json
import requests
headers = {
    'Accept': 'application/json',
    'key': 'MY_API_KEY'
}
query_type = 'QUERY_TYPE'
locations_list = ['London', 'Amsterdam', 'Berlin']
results_list = []
for location in locations_list:
    url = 'https://API_URL'
    r = requests.get(url, params={'query': str(query_type) + str(location)}, headers=headers)
    results_list.append(r.json())  # store the decoded JSON body, not the Response object
with open('my_search_results.json', 'w') as outfile:
    json.dump(results_list, outfile)
The JSON file my_search_results.json has a separate element for each API query, e.g. index 0 is London, 1 is Amsterdam, 2 is Berlin, etc. Like this:
[
{
"complete": true,
"count": 51,
"data": [
{
"actor": "unknown",
"classification": "malicious",
"cve": [],
"first_seen": "2020-03-11",
"ip": "1.2.3.4",
"last_seen": "2020-03-28",
"metadata": {
"asn": "xxxxx",
"category": "isp",
"city": "London",
"country": "United Kingdom",
"country_code": "GB",
"organization": "British Telecommunications PLC",
"os": "Linux 2.2-3.x",
"rdns": "xxxx",
"tor": false
},
"raw_data": {
"ja3": [],
"scan": [
{
"port": 23,
"protocol": "TCP"
},
{
"port": 81,
"protocol": "TCP"
}
],
"web": {}
},
"seen": true,
"spoofable": false,
"tags": [
"some tag"
]
}
(I've redacted any sensitive data. There is a separate row in the JSON for each API request, representing each city, but it's too big to show here)
Now I want to go through the JSON and pick out all the IP addresses:
for d in results_list['data']:
ips = (d['ip'])
print(ips)
However this gives the error:
TypeError: list indices must be integers or slices, not str
When I was working with a single JSON from a single API request, this worked fine, but now it seems that either the JSON is not formatted properly or Python is seeing my big JSON as a list and not a dictionary, even though I used json.dump() on results_list earlier in the script. I'm sure it has to do with the way I took all the API calls and appended them to a list, but I can't work out where I'm going wrong.
I'm struggling to figure out how to pick out the IP addresses or if there is just a better way to collect and merge multiple JSONs. Any advice appreciated.

To get the IP, try:
for result in results_list:  # results_list is a list, so iterate over its dictionaries
    ips = result['data'][0]['ip']  # 'data' is also a list; [0] is its first record
    print(ips)
Why you received the error:
results_list is a list, and the value of each 'data' key is also a list that contains the dictionaries holding the IPs you need. Indexing a list with a string key, as results_list['data'] does, raises the error:
TypeError: list indices must be integers or slices, not str
So if:
results_list= [
{
"complete": True,
"count": 51,
"data": [
{
"actor": "unknown",
"classification": "malicious",
"cve": [],
"first_seen": "2020-03-11",
"ip": "1.2.3.4",
"last_seen": "2020-03-28",
"metadata": {
"asn": "xxxxx",
"category": "isp",
"city": "London",
"country": "United Kingdom",
"country_code": "GB",
"organization": "British Telecommunications PLC",
"os": "Linux 2.2-3.x",
"rdns": "xxxx",
"tor": False
},
"raw_data": {
"ja3": [],
"scan": [
{
"port": 23,
"protocol": "TCP"
},
{
"port": 81,
"protocol": "TCP"
}
],
"web": {}
},
"seen": True,
"spoofable": False,
"tags": [
"some tag",
]
}  # ... (the rest of your data goes here)
]}]
To get all IP addresses, run:
ip_address = []
# this works only if each result is a separate dictionary in results_list
# (it takes the first IP of each result's 'data' list)
for d in results_list:
    ips = d['data'][0]['ip']
    ip_address.append(ips)
    print(ips)
# if all results are within a single 'data' list
for d in results_list[0]['data']:
    ips = d['ip']
    ip_address.append(ips)
    print(ips)
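To see the error from the explanation above in isolation, here is a minimal sketch with made-up stand-in data shaped like results_list (the values are placeholders, not real API output). It shows that indexing a list with a string raises the TypeError, while indexing one level at a time works:

```python
# Stand-in data shaped like the question's results_list: one dict per API call.
results_list = [
    {"complete": True, "count": 1, "data": [{"ip": "1.2.3.4"}]},
]

error_message = None
try:
    results_list["data"]  # a list only accepts integer or slice indices
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # list indices must be integers or slices, not str

# Index one level at a time: int for the outer list, str for the dict,
# int for the inner 'data' list, then str for the record's 'ip' key.
first_ip = results_list[0]["data"][0]["ip"]
print(first_ip)  # 1.2.3.4
```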

results_list is a list, not a dictionary, so results_list['data'] raises an error. Instead, take each dictionary from that list and then access its 'data' key. Since the value for 'data' is also a list, you then need to iterate over the elements of that list:
for result in results_list:
    for d in result["data"]:
        ips = d["ip"]
        print(ips)
If you know that your JSON list only has one element, you may simplify this to:
for d in results_list[0]["data"]:
    ips = d["ip"]
    print(ips)
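To cover both goals from the question (all IP addresses, plus rows for a DataFrame), here is a minimal sketch that merges every response's 'data' list into one flat list. It assumes results_list holds the decoded JSON bodies (r.json()), not Response objects; the stand-in records and the chosen fields (ip, city, country_code) are illustrative assumptions:

```python
# Stand-in data shaped like the question's results_list (one dict per API call).
# The field values here are invented for illustration.
results_list = [
    {"complete": True, "count": 2, "data": [
        {"ip": "1.2.3.4", "metadata": {"city": "London", "country_code": "GB"}},
        {"ip": "5.6.7.8", "metadata": {"city": "London", "country_code": "GB"}},
    ]},
    {"complete": True, "count": 1, "data": [
        {"ip": "9.10.11.12", "metadata": {"city": "Amsterdam", "country_code": "NL"}},
    ]},
]

# Merge: concatenate every response's "data" list into one flat list of records.
all_records = [record for result in results_list for record in result["data"]]

# 1) Every IP address across all responses.
all_ips = [record["ip"] for record in all_records]

# 2) Flat rows (one dict per record) that a DataFrame constructor can consume.
rows = [
    {
        "ip": record["ip"],
        "city": record["metadata"]["city"],
        "country_code": record["metadata"]["country_code"],
    }
    for record in all_records
]

print(all_ips)  # ['1.2.3.4', '5.6.7.8', '9.10.11.12']
print(len(rows))  # 3
```

pandas.DataFrame(rows) should then give one row per record and one column per key. If you'd rather keep all nested fields, pandas.json_normalize(all_records) flattens nested dictionaries into dotted column names such as metadata.city.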

Related

Append to a json file using python

Trying to append to a nested json file
My goal is to append some values to a JSON file.
Here is my original JSON file
{
"web": {
"all": {
"side": {
"tags": [
"admin"
],
"summary": "Generates",
"operationId": "Key",
"consumes": [],
"produces": [
"application/json"
],
"responses": {
"200": {
"description": "YES",
"schema": {
"type": "string"
}
}
},
"Honor": [
{
"presidential": []
}
]
}
}
}
}
I want to add two key/value pairs, "Required": "YES" and "Prepay": "NO", to the dictionary inside the "Honor" list. After adding them, the JSON file should look like this:
{
"web": {
"all": {
"side": {
"tags": [
"admin"
],
"summary": "Generates",
"operationId": "Key",
"consumes": [],
"produces": [
"application/json"
],
"responses": {
"200": {
"description": "YES",
"schema": {
"type": "string"
}
}
},
"Honor": [
{
"presidential": [],
"Required" : "YES",
"Prepay" : "NO"
}
]
}
}
}
}
Below is the Python code that I have written
import json
def write_json(data, filename="exmpleData.json"):
    with open(filename, "w") as f:
        json.dump(data, f, indent=2)
with open("exmpleData.json") as json_files:
    data = json.load(json_files)
    temp = data["Honor"]
    y = {"required": "YES", "type": "String"}
    temp.append(y)
write_json(data)
I am receiving the following error message:
temp = data["Honor"]
KeyError: 'Honor'
I would appreciate any guidance that you can provide to help me achieve my goal. I am running Python 3.7
'Honor' is deeply nested in other dictionaries, and its value is a 1-element list containing a dictionary. Here's how to access:
import json
def write_json(data, filename='exmpleData.json'):
    with open(filename, 'w') as f:
        json.dump(data, f, indent=2)
with open('exmpleData.json') as json_files:
    data = json.load(json_files)
# 'Honor' is deeply nested in other dictionaries
honor = data['web']['all']['side']['Honor']
# Its value is a 1-element list containing another dictionary.
honor[0]['Required'] = 'YES'
honor[0]['Prepay'] = 'NO'
write_json(data)
I'd recommend that you practice your fundamentals a bit more since you're making many mistakes in your data structure handling. The good news is, your JSON load/dump is fine.
The cause of your error message is that data doesn't have an "Honor" key. data only has a "web" key, which contains "all", which contains "side", which contains "Honor", whose value is a list holding the dictionary you are trying to add to. So you want to set temp with temp = data['web']['all']['side']['Honor'][0].
You also cannot use append on Python dictionaries (temp is a dictionary here, not a list). Instead, check out dict.update().
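Following the dict.update() suggestion, here is a minimal sketch on a trimmed copy of the question's JSON (only the path down to "Honor" is kept; reading and writing the file are left out):

```python
import json

# Trimmed copy of the question's JSON: only the path down to "Honor" is kept.
data = {
    "web": {"all": {"side": {"Honor": [{"presidential": []}]}}}
}

# "Honor" holds a list; its single element is the dictionary to extend.
honor_entry = data["web"]["all"]["side"]["Honor"][0]

# dict.update merges several key/value pairs in one call (a dict has no append).
honor_entry.update({"Required": "YES", "Prepay": "NO"})

print(json.dumps(data["web"]["all"]["side"]["Honor"], indent=2))
```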

Populating JSON data from API in Python pandas DataFrame - TypeError and IndexError

I am trying to populate a pandas DataFrame with select information from JSON output fetched from an API.
candidate_list = []
for candidate in candidate_response['data']:
    if 'error' not in candidate_response:
        candidate_list.append([candidate['id'], candidate['attributes']['first_name'],
                               candidate['attributes']['last_name'],
                               candidate['relationships']['educations']['data']['id']])
The DataFrame populates fine until I add candidate['relationships']['educations']['data']['id'], which throws TypeError: list indices must be integers or slices, not str.
When trying to get the values of the indexes for ['id'] by using candidate['relationships']['educations']['data'][0]['id'] instead, I get IndexError: list index out of range.
The JSON output looks something like:
"data": [
{
"attributes": {
"first_name": "Tester",
"last_name": "Testman",
"other stuff": "stuff",
},
"id": "732887",
"relationships": {
"educations": {
"data": [
{
"id": "605372",
"type": "educations"
},
{
"id": "605371",
"type": "educations"
},
{
"id": "605370",
"type": "educations"
}
]
}
},
How would I go about successfully filling a column in the DataFrame with the 'id's under 'relationships'>'educations'>'data'?
Note that when you use candidate['relationships']['educations']['data']['id'], you get that error because the value at 'data' is a list, not a dictionary, and a list cannot be indexed by a key name.
Assuming that what you want is one entry per data[].relationships.educations.data entry, here is complete code that works:
import json
json_string = """{
"data": [
{
"attributes": {
"first_name": "Tester",
"last_name": "Testman",
"other stuff": "stuff"
},
"id": "732887",
"relationships": {
"educations": {
"data": [
{
"id": "605372",
"type": "educations"
},
{
"id": "605371",
"type": "educations"
},
{
"id": "605370",
"type": "educations"
}
]
}
}
}
]
}"""
candidate_response = json.loads(json_string)
candidate_list = []
for candidate in candidate_response['data']:
    if 'error' not in candidate_response:
        for data in candidate['relationships']['educations']['data']:
            candidate_list.append(
                [
                    candidate['id'],
                    candidate['attributes']['first_name'],
                    candidate['attributes']['last_name'],
                    data['id']
                ]
            )
print(candidate_list)
I analyzed your code and ran it in a Jupyter notebook; it all looks good and I get the expected output.
The error list indices must be integers or slices, not str occurs because the value you are looking for is inside a list, so you need to index that list with an integer such as [0] first.
As for IndexError: list index out of range, it may be caused by a typo on your side; otherwise the code is fine.
Here is the code I ran:
candidate_list = []
for candidate in candidate_response['data']:
    if 'error' not in candidate_response:
        candidate_list.append([candidate['id'], candidate['attributes']['first_name'],
                               candidate['attributes']['last_name'],
                               candidate['relationships']['educations']['data'][0]['id']])
Probably, for at least one candidate, candidate['relationships']['educations']['data'] is an empty list, so indexing it with [0] raises IndexError.
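If some candidates really have an empty educations list, a sketch like the following avoids the IndexError by collecting the ids with a comprehension first. The second candidate and its names are hypothetical, added only to show the empty case; storing the first id (or None) and the full list are two possible column choices, not the only ones:

```python
# Stand-in response: the second (hypothetical) candidate has no educations,
# which is the suspected cause of the IndexError when indexing [0].
candidate_response = {
    "data": [
        {
            "id": "732887",
            "attributes": {"first_name": "Tester", "last_name": "Testman"},
            "relationships": {"educations": {"data": [
                {"id": "605372", "type": "educations"},
                {"id": "605371", "type": "educations"},
            ]}},
        },
        {
            "id": "732888",
            "attributes": {"first_name": "No", "last_name": "Education"},
            "relationships": {"educations": {"data": []}},
        },
    ]
}

candidate_list = []
for candidate in candidate_response["data"]:
    educations = candidate["relationships"]["educations"]["data"]
    # Collect every education id; an empty list simply yields [] instead of raising.
    education_ids = [education["id"] for education in educations]
    candidate_list.append([
        candidate["id"],
        candidate["attributes"]["first_name"],
        candidate["attributes"]["last_name"],
        education_ids[0] if education_ids else None,  # first id, or None when missing
        education_ids,  # all ids, kept together in one cell
    ])

for row in candidate_list:
    print(row)
```

Each inner list can then serve as one DataFrame row, with column names of your choosing passed to pandas.DataFrame(candidate_list, columns=...).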

Return nested JSON item that has multiple instances

So I am able to return almost all the data, except that I am not able to capture something like this:
"expand": "schema"
"issues": [
{
"expand": "<>",
"id": "<>",
"self": "<>",
"key": "<>",
"fields": {
"components": [
{
"self": "<>",
"id": "1",
"name": "<>",
"description": "<>"
}
]
}
},
{
"expand": "<>",
"id": "<>",
"self": "<>",
"key": "<>",
"fields": {
"components": [
{
"self": "<>",
"id": "<>",
"name": "<>"
}
]
}
},
I want to return a list that contains both of the 'name' values under 'components'. I have tried using:
list((item['fields']['components']['name']) for item in data['issues'])
but I get TypeError: list indices must be integers or slices, not str when I try to print() the above line of code.
Also, I would appreciate an explanation of what this TypeError means, and why the list expects an integer rather than a str.
EDIT:
url = '<url>'
r = http.request('GET', url, headers=headers)
data = json.loads(r.data.decode('utf-8'))
print([d['name'] for d in item['fields']['components']] for item in data['issues'])
As the commenter points out, you're treating the list like a dictionary. Instead, this will select the name fields from the dictionaries in the list:
list((item['fields']['components'][i]['name'] for i, v in enumerate(item['fields']['components'])))
Or simply:
[d['name'] for d in item['fields']['components']]
You'd then need to apply the above to all the items in the iterable.
EDIT: Full solution to just print the name fields, assuming that "issues" is a key in some larger dictionary structure:
for list_item in data["issues"]:  # issues is a list, so iterate through list items
    for dct in list_item["fields"]["components"]:  # each list_item is a dictionary
        print(dct["name"])  # name is a field in each dictionary
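To get a single flat list of component names across all issues (which is what the question asks for), the two loops can be combined into one nested list comprehension. A sketch with placeholder keys and names:

```python
# Stand-in for the question's data: "issues" is a list, and each issue's
# "components" is a list of dictionaries. Keys and names are placeholders.
data = {
    "expand": "schema",
    "issues": [
        {"key": "ISSUE-1", "fields": {"components": [{"id": "1", "name": "Backend"}]}},
        {"key": "ISSUE-2", "fields": {"components": [{"id": "2", "name": "Frontend"}]}},
    ],
}

# Nested comprehension: the outer loop walks issues, the inner loop walks components.
component_names = [
    component["name"]
    for issue in data["issues"]
    for component in issue["fields"]["components"]
]
print(component_names)  # ['Backend', 'Frontend']
```

In the question's EDIT, print([...] for item in data['issues']) prints a generator object rather than the names, because the outer parentheses form a generator expression; wrapping the whole expression in square brackets instead gives a list of lists, one per issue.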

Accessing nested json objects using python

I am trying to interact with an API and running into issues accessing nested objects. Below is sample json output that I am working with.
{
"results": [
{
"task_id": "22774853-2b2c-49f4-b044-2d053141b635",
"params": {
"type": "host",
"target": "54.243.80.16",
"source": "malware_analysis"
},
"v": "2.0.2",
"status": "success",
"time": 227,
"data": {
"details": {
"as_owner": "Amazon.com, Inc.",
"asn": "14618",
"country": "US",
"detected_urls": [],
"resolutions": [
{
"hostname": "bumbleride.com",
"last_resolved": "2016-09-15 00:00:00"
},
{
"hostname": "chilitechnology.com",
"last_resolved": "2016-09-16 00:00:00"
}
],
"response_code": 1,
"verbose_msg": "IP address in dataset"
},
"match": true
}
}
]
}
The deepest I am able to access is the data portion, which returns too much. Ideally, I just want to access as_owner, asn, country, detected_urls, and resolutions.
When I try to access details, response_code, etc., I get a KeyError. My nested JSON goes deeper than the ones in other questions, and I have tried their logic.
Below is my current code snippet and any help is appreciated!
import requests
import json
headers = {
'Content-Type': 'application/json',
}
params = (
('wait', 'true'),
)
data = '{"target":{"one":{"type": "ip","target": "54.243.80.16", "sources": ["xxx","xxxxx"]}}}'
r=requests.post('https://fakewebsite:8000/api/services/intel/lookup/jobs', headers=headers, params=params, data=data, auth=('apikey', ''))
parsed_json = json.loads(r.text)
#results = parsed_json["results"]
for item in parsed_json["results"]:
    print(item['data'])
You just need to index correctly into the converted JSON. Then you can easily loop over a list of the keys you want to fetch, since they are all in the "details" dictionary.
import json
raw = '''\
{
"results": [
{
"task_id": "22774853-2b2c-49f4-b044-2d053141b635",
"params": {
"type": "host",
"target": "54.243.80.16",
"source": "malware_analysis"
},
"v": "2.0.2",
"status": "success",
"time": 227,
"data": {
"details": {
"as_owner": "Amazon.com, Inc.",
"asn": "14618",
"country": "US",
"detected_urls": [],
"resolutions": [
{
"hostname": "bumbleride.com",
"last_resolved": "2016-09-15 00:00:00"
},
{
"hostname": "chilitechnology.com",
"last_resolved": "2016-09-16 00:00:00"
}
],
"response_code": 1,
"verbose_msg": "IP address in dataset"
},
"match": true
}
}
]
}
'''
parsed_json = json.loads(raw)
wanted = ['as_owner', 'asn', 'country', 'detected_urls', 'resolutions']
for item in parsed_json["results"]:
    details = item['data']['details']
    for key in wanted:
        print(key, ':', json.dumps(details[key], indent=4))
    # Put a blank line at the end of the details for each item
    print()
output
as_owner : "Amazon.com, Inc."
asn : "14618"
country : "US"
detected_urls : []
resolutions : [
{
"hostname": "bumbleride.com",
"last_resolved": "2016-09-15 00:00:00"
},
{
"hostname": "chilitechnology.com",
"last_resolved": "2016-09-16 00:00:00"
}
]
BTW, when you fetch JSON data using requests, there's no need to use json.loads: you can get the converted JSON by calling the .json() method of the returned response object instead of using its .text attribute.
Here's a more robust version of the main loop of the above code. It simply ignores any missing keys. I didn't post this code earlier because the extra if tests make it slightly less efficient, and I didn't know that keys could be missing.
for item in parsed_json["results"]:
    if 'data' not in item:
        continue
    data = item['data']
    if 'details' not in data:
        continue
    details = data['details']
    for key in wanted:
        if key in details:
            print(key, ':', json.dumps(details[key], indent=4))
    # Put a blank line at the end of the details for each item
    print()
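An alternative to the explicit if tests is to chain dict.get() with empty-dict defaults. This is a sketch on a trimmed stand-in for parsed_json, where the second result is a hypothetical item without a "data" key; note that it still assumes "data", when present, is a dictionary:

```python
import json

# Trimmed stand-in for parsed_json; the second item deliberately lacks "data".
parsed_json = {
    "results": [
        {"data": {"details": {"as_owner": "Amazon.com, Inc.", "asn": "14618", "country": "US"}}},
        {"status": "error"},
    ]
}
wanted = ['as_owner', 'asn', 'country', 'detected_urls', 'resolutions']

collected = []  # one dict of selected details per result item
for item in parsed_json["results"]:
    # .get with an empty-dict default lets the chain continue when a level is missing.
    details = item.get("data", {}).get("details", {})
    # Keep only the wanted keys that are actually present.
    selected = {key: details[key] for key in wanted if key in details}
    collected.append(selected)
    print(json.dumps(selected, indent=4))
    print()
```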

Flask python json parsing

Hello, I am completely new to Flask and Python. I am using an API to geocode, and I get this JSON:
"info": {
"copyright": {
"imageAltText": "\u00a9 2015 MapQuest, Inc.",
"imageUrl": "http://api.mqcdn.com/res/mqlogo.gif",
"text": "\u00a9 2015 MapQuest, Inc."
},
"messages": [],
"statuscode": 0
},
"options": {
"ignoreLatLngInput": false,
"maxResults": -1,
"thumbMaps": true
},
"results": [
{
"locations": [
{
"adminArea1": "US",
"adminArea1Type": "Country",
"adminArea3": "",
"adminArea3Type": "",
"adminArea4": "",
"adminArea4Type": "County",
"adminArea5": "",
"adminArea5Type": "City",
"adminArea6": "",
"adminArea6Type": "Neighborhood",
"displayLatLng": {
"lat": 33.663512,
"lng": -111.958849
},
"dragPoint": false,
"geocodeQuality": "ADDRESS",
"geocodeQualityCode": "L1AAA",
"latLng": {
"lat": 33.663512,
"lng": -111.958849
},
"linkId": "25438895i35930428r65831359",
"mapUrl": "http://www.mapquestapi.com/staticmap/v4/getmap?key=&rand=1009123942",
"postalCode": "",
"sideOfStreet": "R",
"street": "",
"type": "s",
"unknownInput": ""
}
],
"providedLocation": {
"city": " ",
"postalCode": "",
"state": "",
"street": "E Blvd"
}
}
]
}
Right now I am doing this:
data=json.loads(r)
return jsonify(data)
This returns all the data shown above. I need to get the latLng object from locations, which is inside results. I have tried
data.get("results").get("locations") and hundreds of similar combinations, but I still can't get it to work. I basically need to store the lat and lng in a session variable. Any help is appreciated.
Assuming you just have one location as in your example:
from __future__ import print_function
import json
r = ...
data = json.loads(r)
latlng = data['results'][0]['locations'][0]['latLng']
latitude = latlng['lat']
longitude = latlng['lng']
print(latitude, longitude) # 33.663512 -111.958849
data.get("results") returns a list. Since a list has no get method, you cannot call data.get("results").get("locations").
Given the JSON you provided, you can do this:
data.get('results')[0].get('locations') # also a list
This will give you the array. Now you can get the lat and lng like this:
data.get('results')[0].get('locations')[0].get('latLng').get('lat') # lat
data.get('results')[0].get('locations')[0].get('latLng').get('lng') # lng
I summarize my comments as follows:
You can treat data as nested dicts and lists.
A quick reference for dict:
A dictionary’s keys are almost arbitrary values.
get(key[, default])
Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
(Source: the official Python documentation on built-in types, stdtypes.)
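Putting the dict.get() reference to use, here is a sketch that collects the latLng pair of every location in every result, falling back to empty lists when a key is missing. The helper name all_lat_lngs and the trimmed response are hypothetical:

```python
# Trimmed stand-in for the geocoding response: only the keys on the path to latLng.
data = {
    "results": [
        {"locations": [{"latLng": {"lat": 33.663512, "lng": -111.958849}}]},
    ]
}


def all_lat_lngs(response):
    """Return (lat, lng) pairs for every location in every result.

    Missing "results", "locations" or "latLng" keys fall back to empty
    containers via dict.get, so an unexpected response yields [] or None
    values instead of raising KeyError.
    """
    pairs = []
    for result in response.get("results", []):
        for location in result.get("locations", []):
            lat_lng = location.get("latLng", {})
            pairs.append((lat_lng.get("lat"), lat_lng.get("lng")))
    return pairs


print(all_lat_lngs(data))  # [(33.663512, -111.958849)]
print(all_lat_lngs({}))    # []
```

In the Flask view, the first pair could then be stored with something like session['lat'], session['lng'] = pairs[0], assuming the list is not empty.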
