Extracting values from JSON containing lists as values for a key - Python

I am currently doing a data science project (beginner) and have the following scenario:
I have a dataframe with pincode, address, and city (approx. 57,000 rows).
I need the geo-coordinates for these rows.
I am trying to use the Bing Maps API to get the coordinates in Python, but I am stuck at parsing the JSON response.
import json
import requests

pincodelist = []
for i in range(5):  # just trying with the first 5 rows
    countryRegion = "IN"
    locality = df_all.iloc[13, 6]     # references the location column
    postalCode = df_all.iloc[13, 12]  # references the pincode column
    addressLine = df_all.iloc[13, 0]  # references the address
    BingMapsKey = 'my api key'
    url = ("http://dev.virtualearth.net/REST/v1/Locations?countryRegion=" + str(countryRegion)
           + "&locality=" + str(locality) + "&postalCode=" + str(postalCode)
           + "&addressLine=" + str(addressLine) + "&key=" + str(BingMapsKey))
    # make the GET request
    results = requests.get(url).json()
    pincodelist.append([
        addressLine,
        postalCode,
        results(['resourceSets']['resources']['bbox'])])
print(pincodelist)
I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-198-396207c04bc6> in <module>
20 addressLine,
21 postalCode,
---> 22 results(['resourceSets']['resources']['bbox'])])
23 print(pincodelist)
24
TypeError: list indices must be integers or slices, not str
Can somebody please help me parse this JSON response? The info I need is "bbox", which contains the coordinates.
{
  "authenticationResultCode": "ValidCredentials",
  "brandLogoUri": "http://dev.virtualearth.net/Branding/logo_powered_by.png",
  "copyright": "Copyright © 2020 Microsoft and its suppliers. All rights reserved. This API cannot be accessed and the content and any results may not be used, reproduced or transmitted in any manner without express written permission from Microsoft Corporation.",
  "resourceSets": [
    {
      "estimatedTotal": 1,
      "resources": [
        {
          "__type": "Location:http://schemas.microsoft.com/search/local/ws/rest/v1",
          "bbox": [
            12.91842713075696,
            77.56459359208381,
            12.926152565898313,
            77.57516165693963
          ],
          "name": "Banashankari, India",
          "point": {
            "type": "Point",
            "coordinates": [
              12.922289848327637,
              77.56987762451172
            ]
          },
          "address": {
            "adminDistrict": "KA",
            "adminDistrict2": "Bengaluru",
            "countryRegion": "India",
            "formattedAddress": "Banashankari, India",
            "locality": "Bengaluru"
          },
          "confidence": "High",
          "entityType": "Neighborhood",
          "geocodePoints": [
            {
              "type": "Point",
              "coordinates": [
                12.922289848327637,
                77.56987762451172
              ],
              "calculationMethod": "Rooftop",
              "usageTypes": [
                "Display"
              ]
            }
          ],
          "matchCodes": [
            "Good"
          ]
        }
      ]
    }
  ],
  "statusCode": 200,
  "statusDescription": "OK",
  "traceId": "4e23d3d9bef84411846539f3113cc06b|DU00000D7F|0.0.0.1|Ref A: F8AB7E576A9B47B1A86B3DE04F1058A9 Ref B: DB3EDGE1616 Ref C: 2020-05-24T11:30:41Z"
}
It would also be helpful if you could recommend any other location data service, considering the number of rows to query. As a student, a paid service is not feasible for me.

resourceSets is a list of objects, as can be seen by the square brackets that follow the key, so you need to specify a numeric index to get an element out of it. The same goes for the resources key.
{'resourceSets': [{'estimatedTotal': 1, 'resources': [{'__type': ...
In your example, there is only one resourceSet, so we can just take the first element:
# resource sets - a list
resource_sets = results['resourceSets']
# resource set object
resource_set = resource_sets[0]
# resources - a list
resources = resource_set['resources']
# first resource
resource = resources[0]
# bbox - a list with 4 numbers inside
bbox = resource['bbox']
# Or in one line:
results['resourceSets'][0]['resources'][0]['bbox']
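
Putting that back into the loop from the question, a minimal sketch might look like the following (this assumes the df_all columns from the question; the params dict and the empty-resources guard are additions, since not every address geocodes successfully):

import requests

pincodelist = []
for i in range(5):  # first 5 rows
    params = {
        'countryRegion': 'IN',
        'locality': df_all.iloc[i, 6],     # location column
        'postalCode': df_all.iloc[i, 12],  # pincode column
        'addressLine': df_all.iloc[i, 0],  # address column
        'key': BingMapsKey,
    }
    # requests builds and URL-encodes the query string from params
    results = requests.get('http://dev.virtualearth.net/REST/v1/Locations',
                           params=params).json()
    resources = results['resourceSets'][0]['resources']
    if resources:  # some rows may not geocode to any resource
        pincodelist.append([params['addressLine'], params['postalCode'],
                            resources[0]['bbox']])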

Parse the response into an object using JSON.parse.
I tried doing it in JavaScript and assigned the JSON response to a variable t.
I was able to extract the value using the reference below. Some keys have lists as their values; that could be the problem you are facing.
t.resourceSets[0].resources[0].bbox
(4) [12.91842713075696, 77.56459359208381, 12.926152565898313, 77.57516165693963]
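For the original Python code, the equivalent access is the same chain with square brackets:
bbox = results['resourceSets'][0]['resources'][0]['bbox']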

Related

Python json pulling list of items

I apologize in advance if this is simple. This is my first go at Python, and I've been searching and trying things all day and just haven't been able to figure out how to accomplish what I need.
I am pulling a list of assets from an API. Below is an example of the result of this request (in reality it will return 50 sensor points).
There is a second request that will pull readings from a specific sensor based on sensorPointId. I need to be able to enter an assetId and pull the readings from each sensor.
{
  "assetId": 1436,
  "assetName": "Pharmacy",
  "groupId": "104",
  "groupName": "West",
  "environment": "Freezer",
  "lastActivityDate": "2021-01-25T18:54:34.5970000Z",
  "tags": [
    "Manager: Casey",
    "State: Oregon"
  ],
  "sensorPoints": [
    {
      "sensorPointId": 126,
      "sensorPointName": "Top Temperature",
      "devices": [
        "23004000080793070793",
        "74012807612084533500"
      ]
    },
    {
      "sensorPointId": 129,
      "sensorPointName": "Bottom Temperature",
      "devices": [
        "86004000080793070956"
      ]
    }
  ]
}
My plan was to go through the list from the first request, make a list of all the sensorPointIds in that asset, then run the second request for each ID on that list. The problem is that no matter which method I try to pull the individual sensorPointIds, it says the object is not subscriptable, even when looking at a string value. Below is everything I've tried; I'm sure it's something silly I'm missing, but all of these I have seen in examples. I've written the full response to a text file just to make sure I'm getting good data, and that works fine.
r = request...
data = r.json
for sensor in data:
    print(data["sensorpointId"])
or
print(["sensorsPoints"]["sensorPointName"])
These give 'method' object is not iterable.
I've also just tried to print a single sensorpointId:
print(data["sensorpointId"][0])
print(data["sensorpointName"][0])
print(data["sensorPoints"][0]["sensorpointId"])
All of these give object is not subscriptable.
print(r["sensorPoints"][0]["sensorpointName"])
'Response' object is not subscriptable
print(data["sensorPoints"][0]["sensorpointName"])
print(["sensorPoints"][0]["sensorpointName"])
string indices must be integers, not 'str'
I got it!
data = r.json()['sensorPoints']
sensors = []
for d in data:
    sensor = d['sensorPointId']
    sensors.append(sensor)
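
For reference, the same collection step fits in a single list comprehension, and feeding the IDs into the second request could look like the sketch below. The readings URL is hypothetical, since the question doesn't show that endpoint:

sensor_ids = [d['sensorPointId'] for d in r.json()['sensorPoints']]
for sensor_id in sensor_ids:
    # Hypothetical endpoint for the second request described above
    readings = requests.get(f'https://api.example.com/readings/{sensor_id}').json()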

How to sort paginated logs by #timestamp with Elasticsearch?

My goal is to sort millions of logs by timestamp that I receive out of Elasticsearch.
Example logs:
{"realIp": "192.168.0.2", "#timestamp": "2020-12-06T02:00:09.000Z"}
{"realIp": "192.168.0.2", "#timestamp": "2020-12-06T02:01:09.000Z"}
{"realIp": "192.168.0.2", "#timestamp": "2020-12-06T02:02:09.000Z"}
{"realIp": "192.168.0.2", "#timestamp": "2020-12-06T02:04:09.000Z"}
Unfortunately, I am not able to get all the logs sorted out of Elastic. It seems like I have to do it by myself.
Approaches I have tried to get the data sorted out of Elasticsearch:
es = Search(index="somelogs-*").using(client).params(preserve_order=True)
for hit in es.scan():
print(hit['#timestamp'])
Another approach:
notifications = (es
    .query("range", **{
        "#timestamp": {
            'gte': 'now-48h',
            'lt': 'now'
        }
    })
    .sort("#timestamp")
    .scan()
)
So I am looking for a way to sort these logs myself or directly through Elasticsearch. Currently, I am saving all the data in a local 'logs.json', and it seems I have to iterate over it and sort it myself.
You should definitely let Elasticsearch do the sorting, then return the data to you already sorted.
The problem is that you are using .scan(). It uses Elasticsearch's scan/scroll API, which unfortunately only applies the sorting params on each page/slice, not the entire search result. This is noted in the elasticsearch-dsl docs on Pagination:
Pagination
...
If you want to access all the documents matched by your query you can
use the scan method which uses the scan/scroll elasticsearch API:
for hit in s.scan():
    print(hit.title)
Note that in this case the results won’t be sorted.
(emphasis mine)
Using pagination is definitely an option, especially when you have "millions of logs" as you said. There is a search_after pagination API:
Search after
You can use the search_after parameter to retrieve the next page of
hits using a set of sort values from the previous page.
...
To get the first page of results, submit a search request with a sort
argument.
...
The search response includes an array of sort values for
each hit.
...
To get the next page of results, rerun the previous search using the last hit’s sort values as the search_after argument. ... The search’s query and sort arguments must remain unchanged. If provided, the from argument must be 0 (default) or -1.
...
You can repeat this process to get additional pages of results.
(omitted the raw JSON requests since I'll show a sample in Python below)
Here's a sample of how to do it with elasticsearch-dsl for Python. Note that I'm limiting the fields and the number of results to make it easier to test. The important parts here are the sort and the extra(search_after=).
from elasticsearch_dsl import Search

search = Search(using=client, index='some-index')

# The main query
search = search.extra(size=100)
search = search.query('range', **{'#timestamp': {'gte': '2020-12-29T09:00', 'lt': '2020-12-29T09:59'}})
search = search.source(fields=('#timestamp',))
search = search.sort({
    '#timestamp': {
        'order': 'desc'
    },
})

# Store all the results (it would be better to wrap all this in a generator to be performant)
hits = []

# Get the 1st page
results = search.execute()
hits.extend(results.hits)
total = results.hits.total
print(f'Expecting {total}')

# Get the next pages
# Real use-case condition should be "until total" or "until no more results.hits"
while len(hits) < 1000:
    print(f'Now have {len(hits)}')
    last_hit_sort_id = hits[-1].meta.sort[0]
    search = search.extra(search_after=[last_hit_sort_id])
    results = search.execute()
    hits.extend(results.hits)

with open('results.txt', 'w') as out:
    for hit in hits:
        out.write(f'{hit["#timestamp"]}\n')
That yields already-sorted data:
# 1st 10 lines
2020-12-29T09:58:57.749Z
2020-12-29T09:58:55.736Z
2020-12-29T09:58:53.627Z
2020-12-29T09:58:52.738Z
2020-12-29T09:58:47.221Z
2020-12-29T09:58:45.676Z
2020-12-29T09:58:44.523Z
2020-12-29T09:58:43.541Z
2020-12-29T09:58:40.116Z
2020-12-29T09:58:38.206Z
...
# 250-260
2020-12-29T09:50:31.117Z
2020-12-29T09:50:27.754Z
2020-12-29T09:50:25.738Z
2020-12-29T09:50:23.601Z
2020-12-29T09:50:17.736Z
2020-12-29T09:50:15.753Z
2020-12-29T09:50:14.491Z
2020-12-29T09:50:13.555Z
2020-12-29T09:50:07.721Z
2020-12-29T09:50:05.744Z
2020-12-29T09:50:03.630Z
...
# 675-685
2020-12-29T09:43:30.609Z
2020-12-29T09:43:30.608Z
2020-12-29T09:43:30.602Z
2020-12-29T09:43:30.570Z
2020-12-29T09:43:30.568Z
2020-12-29T09:43:30.529Z
2020-12-29T09:43:30.475Z
2020-12-29T09:43:30.474Z
2020-12-29T09:43:30.468Z
2020-12-29T09:43:30.418Z
2020-12-29T09:43:30.417Z
...
# 840-850
2020-12-29T09:43:27.953Z
2020-12-29T09:43:27.929Z
2020-12-29T09:43:27.927Z
2020-12-29T09:43:27.920Z
2020-12-29T09:43:27.897Z
2020-12-29T09:43:27.895Z
2020-12-29T09:43:27.886Z
2020-12-29T09:43:27.861Z
2020-12-29T09:43:27.860Z
2020-12-29T09:43:27.853Z
2020-12-29T09:43:27.828Z
...
# Last 3
2020-12-29T09:43:25.878Z
2020-12-29T09:43:25.876Z
2020-12-29T09:43:25.869Z
There are some considerations on using search_after as discussed in the API docs:
Use a Point In Time or PIT parameter
If a refresh occurs between these requests, the order of your results may change, causing inconsistent results across pages. To prevent this, you can create a point in time (PIT) to preserve the current index state over your searches.
You need to first make a POST request to open a point in time and get a PIT ID.
Then add an extra 'pit': {'id': xxxx, 'keep_alive': '5m'} parameter to every request.
Make sure to use the PIT ID from the last response (see the sketch after these steps).
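A rough sketch of those three steps with the Python clients; this is an assumption-laden outline, since open_point_in_time comes from the low-level elasticsearch client and its exact signature may vary by client version:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch()

# 1. Open a PIT against the index
pit = client.open_point_in_time(index='some-index', keep_alive='5m')

# 2. Attach it to every request; a search that uses a PIT must not also name an index
search = Search(using=client)
search = search.extra(pit={'id': pit['id'], 'keep_alive': '5m'})

# 3. Before each next page, carry forward the PIT ID returned in the
#    previous response (the raw response body includes a 'pit_id' field)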
Use a tiebreaker
We recommend you include a tiebreaker field in your sort. This tiebreaker field should contain a unique value for each document. If you don’t include a tiebreaker field, your paged results could miss or duplicate hits.
This would depend on your Document schema
# Add some ID as a tiebreaker to the `sort` call
search = search.sort(
    {'#timestamp': {
        'order': 'desc'
    }},
    {'some.id': {
        'order': 'desc'
    }}
)

# Include both the sort ID and the some.id in `search_after`
last_hit_sort_id, last_hit_route_id = hits[-1].meta.sort
search = search.extra(search_after=[last_hit_sort_id, last_hit_route_id])
Thank you Gino Mempin, it works!
But I also figured out that a simple change does the same job:
by adding .params(preserve_order=True), Elasticsearch will sort all the data.
es = Search(index="somelog-*").using(client)
notifications = (es
.query("range", **{
"#timestamp": {
'gte': 'now-48h',
'lt' : 'now'
}
})
.sort("#timestamp")
.params(preserve_order=True)
.scan()
)
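One caveat: the documentation for the underlying scan helper warns that preserve_order can be an extremely expensive operation, so for "millions of logs" the search_after approach above may scale better.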

Trouble when storing API data in Python list

I'm struggling with the JSON data that I get from an API. I've gone into several API URLs to grab my data, and I've stored it in an empty list. I then want to take out all fields that say "reputation"; I'm only interested in that number. See my code here:
import json
import requests

f = requests.get('my_api_url')
if f.ok:
    data = json.loads(f.content)
    url_list = []  # the list stores a number of urls that I want to request data from
    for items in data:
        url_list.append(items['details_url'])  # grab the urls that I want to enter
    total_url = []  # stores all data from all urls here
    for index in range(len(url_list)):
        url = requests.get(url_list[index])
        if url.ok:
            url_data = json.loads(url.content)
            total_url.append(url_data)
    print(json.dumps(total_url, indent=2))  # only want to see if it's working
Thus far I'm happy: I can request all the URLs and get the data. It's in the next step that I get into trouble. The above code outputs the following JSON data:
[
  [
    {
      "id": 316,
      "name": "storabro",
      "url": "https://storabro.net",
      "customer": true,
      "administrator": false,
      "reputation": 568
    }
  ],
  [
    {
      "id": 541,
      "name": "sega",
      "url": "https://wedonthaveanyyet.com",
      "customer": true,
      "administrator": false,
      "reputation": 45
    },
    {
      "id": 90,
      "name": "Villa",
      "url": "https://brandvillas.co.uk",
      "customer": true,
      "administrator": false,
      "reputation": 6
    }
  ]
]
However, I only want to print out the reputation, and I cannot get it working. If, in my code, I instead use print(total_url['reputation']), it doesn't work and says "TypeError: list indices must be integers or slices, not str", and if I try:
for s in total_url:
    print(s['reputation'])
I get the same TypeError.
It feels like I've tried everything, but I can't find any answers on the web that help me. I understand I still have a lot to learn and that my error will be obvious to some people here. It seems very similar to other things I've done with Python, but this time I'm stuck. To clarify, I'm expecting an output similar to: [568, 45, 6].
Perhaps I used the wrong approach from the beginning and that's why it's not working. I started to code with Python in October and it's still very new to me, but I want to learn. Thank you all in advance!
It looks like your total_url is a list of lists, so you might write a function like:
def get_reputations(data):
    for url in data:
        for obj in url:
            print(obj.get('reputation'))

get_reputations(total_url)
# output:
# 568
# 45
# 6
If you'd rather not work with a list of lists in the first place, you can extend the list with each result instead of appending in the expression used to construct total_url, as in the sketch below.
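For example, a minimal sketch of that flattened version, changing only the collection lines from the question's code:

total_url = []
for index in range(len(url_list)):
    url = requests.get(url_list[index])
    if url.ok:
        # extend() merges each page's list into one flat list of objects
        total_url.extend(json.loads(url.content))

reputations = [obj['reputation'] for obj in total_url]  # [568, 45, 6]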
You can also use json.loads and read the response directly:
from urllib.request import urlopen
import json

def get_rep():
    response = urlopen(api_url)
    r = response.read().decode('utf-8')
    r_obj = json.loads(r)
    for item in r_obj['response']:
        print("Reputation: {}".format(item['reputation']))
(Note that the 'response' key here assumes the API wraps its results in an envelope; with the list-of-lists output shown above, you would iterate the nested lists instead.)

Python Key Value Error (JSON)

I am trying to grab this data and print it into a string of text, and I am having the worst issues getting this to work.
Here is the source I am working with, for a better understanding; I am working on an environmental controller combined with my Sonoff switch:
https://github.com/FirstCypress/LiV/blob/master/software/liv/iotConnectors/sonoff/sonoff.py This code works for two pages once completed, so ignore the keys for temperature etc.
m = json.loads(content)
co2 = m["Value"]
I need the value of "Value" under "TaskValues"; it should be either a 1 or a 0 in almost any case. How would I pull that key in the right form?
"Sensors":[
{
"TaskValues": [
{"ValueNumber":1,
"Name":"Switch",
"NrDecimals":0,
"Value":0
}],
"DataAcquisition": [
{"Controller":1,
"IDX":0,
"Enabled":"false"
},
{"Controller":2,
"IDX":0,
"Enabled":"false"
},
{"Controller":3,
"IDX":0,
"Enabled":"false"
}],
"TaskInterval":0,
"Type":"Switch input - Switch",
"TaskName":"relias",
"TaskEnabled":"true",
"TaskNumber":1
}
],
"TTL":60000
}
You can get it by
m['Sensors'][0]['TaskValues'][0]['Value']
"Value" is nested in your json, as you've mentioned. To get what you want, you'll need to traverse the parent data structures:
m = json.loads(content)
# This is a list
a = m.get('Sensors')
# This is a dictionary
sensor = a[0]
# This is a list
taskvalue = sensor.get('TaskValues')
# Your answer
value = taskvalue[0].get('Value')
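
If there can be more than one sensor or task value, a small loop over the nested lists avoids hard-coding the [0] indexes; a sketch based on the structure shown in the question:

m = json.loads(content)
for sensor in m.get('Sensors', []):
    for task in sensor.get('TaskValues', []):
        # e.g. prints "relias: Switch = 0"
        print('{}: {} = {}'.format(sensor.get('TaskName'),
                                   task.get('Name'), task.get('Value')))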

Raster: How to get elevation at lat/long using python?

I also posted this question in the GIS section of SO. As I'm not sure whether this is rather a 'pure' Python question, I ask it here again.
I was wondering if anyone has experience getting elevation data from a raster without using ArcGIS, instead getting the information as a Python list or dict?
I get my XY data as a list of tuples.
I'd like to loop through the list, or pass it to a function or class method, to get the corresponding elevation for the XY pairs.
I did some research on the topic and the GDAL API sounds promising. Can anyone advise me on how to go about things, pitfalls, sample code? Other options?
Thanks for your efforts, LarsVegas
I recommend checking out the Google Elevation API
It's very straightforward to use:
http://maps.googleapis.com/maps/api/elevation/json?locations=39.7391536,-104.9847034&sensor=true_or_false
{
  "results": [
    {
      "elevation": 1608.637939453125,
      "location": {
        "lat": 39.73915360,
        "lng": -104.98470340
      },
      "resolution": 4.771975994110107
    }
  ],
  "status": "OK"
}
Note that the free version is limited to 2,500 requests per day.
We used this code to get the elevation for a given latitude/longitude (note: we only print the elevation and the rounded lat and long values).
import urllib.request
import json

lati = input("Enter the latitude:")
lngi = input("Enter the longitude:")

# url_params completes the base url with the given latitude and longitude values
ELEVATION_BASE_URL = 'http://maps.googleapis.com/maps/api/elevation/json?'
URL_PARAMS = "locations=%s,%s&sensor=%s" % (lati, lngi, "false")
url = ELEVATION_BASE_URL + URL_PARAMS

with urllib.request.urlopen(url) as f:
    response = json.loads(f.read().decode())

status = response["status"]
result = response["results"][0]
print(float(result["elevation"]))
print(float(result["location"]["lat"]))
print(float(result["location"]["lng"]))
Have a look at altimeter, a wrapper for the Google Elevation API.
Here is another nice API that I've built: https://algorithmia.com/algorithms/Gaploid/Elevation
import Algorithmia

input = {
    "lat": "50.2111",
    "lon": "18.1233"
}

client = Algorithmia.client('YOUR_API_KEY')
algo = client.algo('Gaploid/Elevation/0.3.0')
print(algo.pipe(input))
