Elasticsearch Aggregation to pandas Dataframe - python

I am working with some Elasticsearch data and I would like to generate tables from the aggregations, like in Kibana. A sample output of the aggregation is below, based on the following code:
s.aggs.bucket("name1", "terms", field="field1").bucket(
"name2", "terms", field="innerField1"
).bucket("name3", "terms", field="InnerAgg1")
response = s.execute()
resp_dict = response.aggregations.name.buckets
{
"key": "Locationx",
"doc_count": 12,
"name2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "Sub-Loc1",
"doc_count": 1,
"name3": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "super-Loc1",
"doc_count": 1
}]
}
}, {
"key": "Sub-Loc2",
"doc_count": 1,
"name3": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "super-Loc1",
"doc_count": 1
}]
}
}]
}
}
In this case, the expected output would be a flat, Kibana-style table with one row per innermost bucket (location, sub-location, super-location, doc_count).
Now, I have tried a variety of methods, with a short description of what went wrong for each:
Pandasticsearch: failed completely, even with just one dictionary. The DataFrame was not created because it struggled with the keys, even when each dictionary was handled separately:
for d in resp_dict:
    x = d.to_dict()
    pandas_df = Select.from_dict(x).to_pandas()
    print(pandas_df)
In particular, the error received related to the fact that the dictionary was not built, and thus ['took'] was not a key.
Pandas (pd.DataFrame.from_records()): only gave me the first aggregation, with a column containing the inner dictionary, and using pd.apply(pd.Series) on it gave another table of resulting dictionaries.
Recursive functions from other Stack Overflow posts: the dictionary looks completely different from the examples used, and tinkering led me nowhere unless I drastically changed the input.

Struggling with the same problem, I've come to believe the reason for this is that resp_dict is not a list of normal dicts, but an elasticsearch_dsl.utils.AttrList of elasticsearch_dsl.utils.AttrDict objects.
If you have an AttrList of AttrDicts, it's possible to do:
resp_dict = response.aggregations.name.buckets
new_response = [i._d_ for i in resp_dict]
To get a list of normal dicts instead. This will probably play nicer with other libraries.
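If the nesting depth is fixed and known, as in the three-level name1/name2/name3 aggregation from the question, plain nested loops over the converted dicts may already be enough. A minimal sketch (the column labels below are just illustrative; depending on your elasticsearch_dsl version, response.aggregations.to_dict() or bucket.to_dict() may be an alternative to the _d_ trick):
import pandas as pd

rows = []
for b1 in new_response:                    # name1 buckets (plain dicts)
    for b2 in b1["name2"]["buckets"]:      # name2 buckets
        for b3 in b2["name3"]["buckets"]:  # name3 buckets
            rows.append({
                "field1": b1["key"],
                "innerField1": b2["key"],
                "InnerAgg1": b3["key"],
                "doc_count": b3["doc_count"],
            })

df = pd.DataFrame(rows)
print(df)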
Edit:
I wrote a recursive function which at least handles some cases; it's not extensively tested yet and not wrapped in a nice module or anything, it's just a script. The one_lvl function keeps track of all the siblings and siblings of parents in the tree in a dictionary called tmp, and recurses when it finds a new named aggregation. It assumes a lot about the structure of the data, which I'm not sure is warranted in the general case.
The lvl stuff is necessary, I think, because you might have duplicate names, so key exists at several aggregation levels, for instance.
#!/usr/bin/env python3
from elasticsearch_dsl.query import QueryString
from elasticsearch_dsl import Search, A
from elasticsearch import Elasticsearch
import pandas as pd
PORT = 9250
TIMEOUT = 10000
USR = "someusr"
PW = "somepw"
HOST = "test.com"
INDEX = "my_index"
QUERY = "foobar"
client = Elasticsearch([HOST], port = PORT, http_auth=(USR, PW), timeout = TIMEOUT)
qs = QueryString(query = QUERY)
s = Search(using=client, index=INDEX).query(qs)
s = s.params(size = 0)
agg = {
    "dates": A("date_histogram", field="date", interval="1M", time_zone="Europe/Berlin"),
    "region": A("terms", field="region", size=10),
    "county": A("terms", field="county", size=10),
}
s.aggs.bucket("dates", agg["dates"]). \
    bucket("region", agg["region"]). \
    bucket("county", agg["county"])
resp = s.execute()
data = {"buckets" : [i._d_ for i in resp.aggregations.dates]}
rec_list = ["buckets"] + [*agg.keys()]
def get_fields(i, lvl):
    return {(k + f"{lvl}"): v for k, v in i.items() if k not in rec_list}

def one_lvl(data, tmp, lvl, rows, maxlvl):
    tmp = {**tmp, **get_fields(data, lvl)}
    if "buckets" not in data:
        rows.append(tmp)
    for d in data:
        if d in ["buckets"]:
            for v, b in enumerate(data[d]):
                tmp = {**tmp, **get_fields(data[d][v], lvl)}
                for k in b:
                    if k in agg.keys():
                        one_lvl(data[d][v][k], tmp, lvl + 1, rows, maxlvl)
                    else:
                        if lvl == maxlvl:
                            tmp = {**tmp, (k + f"{lvl}"): data[d][v][k]}
                            rows.append(tmp)
    return rows
rows = one_lvl(data, {}, 1, [], len(agg))
df = pd.DataFrame(rows)

Related

Batch update validation and formatting cells using pygsheets

I am using pygsheets and would like to batch-validate cells instead of looping through each cell and doing it iteratively. I have gone through the pygsheets documentation and have not found an example of this; would this be possible, and if so, how would one do it? I did see an example of batching in the documentation (through unlinking and then linking again), but this did not work for me; instead, no update happened.
Below I have a working example of the code that I am trying to optimise by batching the update.
(Resulting sheet: a header row with the values A, B, C in cells A1:C1.)
import pygsheets
spread_sheet_id = "...insert...spreadsheet...id"
spreadsheet_name = "...spreadsheet_name..."
wks_name_or_pos = "...worksheet_name..."
spreadsheet = pygsheets.Spreadsheet(client=service,id=spread_sheet_id)
wksheet = spreadsheet.worksheet('title',wks_name_or_pos)
header_list = ["A","B","C"]
for index, element in enumerate(header_list):
    cell_string = str(chr(65 + index) + "1")
    wksheet.cell(cell_string).set_text_format('bold', True).value = element
    header_cell = wksheet.cell(cell_string)
    header_cell.color = (0.9529412, 0.9529412, 0.9529412, 0)  # set background color of this cell as a tuple (red, green, blue, alpha)
    header_cell.update()
    wksheet.set_data_validation(
        start=cell_string, end=cell_string,
        condition_type='TEXT_CONTAINS',
        condition_values=[element], inputMessage=f"Value must be {element}", strict=True)
I have realised I can change the values in the cells by passing them in as a list of lists, but I'm not sure how to batch the validation and batch-format the cells.
header_list = ["A","B","C"]
list_of_lists = [[col] for col in header_list]
# update values with list of lists (working)
wksheet.update_cells('A1:C1',list_of_lists)
# batch update to bold, change the colour to grey and make sure values fit in cell (increase cell size) ?
# wksheet.add_conditional_formatting(start='A1', end='C1',
# condition_type='CUSTOM_FORMULA',
# format={'backgroundColor':{'red':0.5,'green':0.5, 'blue':0.5, 'alpha':0}},
# condition_values=['=NOT(ISBLANK(A1))'])
# batch validate multiple cells so that the value is strictly the value provided ?
I also tried just unlinking, running the pygsheets commands and then linking again:
wksheet.unlink()
header_list = ["A","B","C"]
for index, element in enumerate(header_list):
    cell_string = str(chr(65 + index) + "1")
    wksheet.cell(cell_string).set_text_format('bold', True).value = element
    header_cell = wksheet.cell(cell_string)
    header_cell.color = (0.9529412, 0.9529412, 0.9529412, 0)  # set background color of this cell as a tuple (red, green, blue, alpha)
    header_cell.update()
    wksheet.set_data_validation(
        start=cell_string, end=cell_string,
        condition_type='TEXT_CONTAINS', condition_values=[element], inputMessage=f"Value must be {element}", strict=True)
wksheet.link()
I believe your goal is as follows.
Your first script, as shown, works fine.
You want to reduce the processing cost of your script and achieve your multiple requests with one API call.
You want to achieve this using pygsheets for Python.
In this case, how about using batch_update of Sheet API Wrapper as follows?
Modified script:
header_list = ["A", "B", "C"] # This is from your script.
# I modified the below script.
values = [
{
"userEnteredValue": {"stringValue": e},
"userEnteredFormat": {"textFormat": {"bold": True}},
"dataValidation": {
"condition": {"type": "TEXT_CONTAINS", "values": [{"userEnteredValue": e}]},
"inputMessage": "Value must be " + e,
"strict": True,
},
}
for e in header_list
]
requests = [
{
"updateCells": {
"range": {
"sheetId": wksheet.id,
"startRowIndex": 0,
"startColumnIndex": 0,
"endRowIndex": 1,
"endColumnIndex": 3,
},
"rows": [{"values": values}],
"fields": "userEnteredValue,userEnteredFormat,dataValidation",
}
}
]
service.sheet.batch_update(spread_sheet_id, requests)
Here, service is your pygsheets client.
When this script is run, the same result as your first script is obtained with one API call.
References:
Sheet API Wrapper
UpdateCellsRequest
Added:
From your following reply,
I was looking for a solution with the bolding of the cells in the first row, and grey coloring.
I was also hoping to be able to pass the formatting in individual methods without writing dictionaries with strings (if possible, I understand this may be the only way).
How about the following sample script?
Sample script:
class Sample:
    startRange = {}
    values = []
    userEnteredFormat = {"textFormat": {}, "backgroundColor": {}}
    dataValidation = {}

    def setStartCell(self, sheetId, row, col):
        self.startRange = {"sheetId": sheetId, "rowIndex": row, "columnIndex": col}

    def setValues(self, v):
        self.values = v

    def setTextFormat(self, v1, v2):
        self.userEnteredFormat["textFormat"][v1] = v2

    def setBackgroundColor(self, v1):
        self.userEnteredFormat["backgroundColor"] = {
            "red": v1[0],
            "green": v1[1],
            "blue": v1[2],
            "alpha": v1[3],
        }

    def setDataValidation(self, v1, v2):
        self.dataValidation = [v1, v2]

    def create(self):
        values = [
            {
                "userEnteredValue": {"stringValue": e},
                "userEnteredFormat": self.userEnteredFormat,
                "dataValidation": {
                    "condition": {
                        "type": self.dataValidation[0],
                        "values": [{"userEnteredValue": e}],
                    },
                    "inputMessage": self.dataValidation[1].replace("{element}", e),
                    "strict": True,
                },
            }
            for e in self.values
        ]
        return [
            {
                "updateCells": {
                    "start": self.startRange,
                    "rows": [{"values": values}],
                    "fields": "userEnteredValue,userEnteredFormat,dataValidation",
                }
            }
        ]
spread_sheet_id = "...insert...spreadsheet...id"
wks_name_or_pos = "...worksheet_name..."
spreadsheet = pygsheets.Spreadsheet(client=service, id=spread_sheet_id)
wksheet = spreadsheet.worksheet("title", wks_name_or_pos)
header_list = ["A", "B", "C"] # This is from your question.
s = Sample()
s.setStartCell(wksheet.id, 0, 0) # cell "A1" (0, 0) of wksheet.
s.setValues(header_list)
s.setTextFormat("bold", True)
s.setBackgroundColor([0.9529412, 0.9529412, 0.9529412, 0]) # R, G, B, Alpha
s.setDataValidation("TEXT_CONTAINS", "Value must be {element}") # type, inputMessage
service.sheet.batch_update(spread_sheet_id, s.create())
In this sample script, a request body for the batchUpdate method is created by the Sample class, and the created request body is then passed to service.sheet.batch_update of pygsheets.
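As a side note, the fields string ("userEnteredValue,userEnteredFormat,dataValidation") acts as a field mask for the updateCells request in both scripts, so only those cell properties are written and any other existing formatting on the cells should be left untouched.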

How to create a tree using BFS in python?

So I have a flattened tree in JSON like this, as an array of objects:
[{
aid: "id3",
data: ["id1", "id2"]
},
{
aid: "id1",
data: ["id3", "id2"]
},
{
aid: "id2",
nested_data: {aid: "id4", atype: "nested", data: ["id1", "id3"]},
data: []
}]
I want to gather that tree and resolve the ids in data with recursive loops into something like this (say we start from "id3"):
{
"aid":"id3",
"payload":"1",
"data":[
{
"id1":{
"aid":"id1",
"data":[
{
"id3":null
},
{
"id2":null
}
]
}
},
{
"id2":{
"aid":"id2",
"nested_data":{
"aid":"id4",
"atype":"nested",
"data":[
{
"id1":null
},
{
"id3":null
}
]
},
"data":[
]
}
}
]
}
The idea is a breadth-first traversal that resolves each id into "id": {object with that id} on its first occurrence and into "id": null on later occurrences.
How to do such a thing in python 3?
Apart from all the problems that your structure has in terms of syntax (identifiers must be within quotes, etc.), the code below will provide you with the requested answer.
But you should carefully think about what you are doing, and have the following into account:
Using the relations expressed in the flat structure that you provide means you will have endless recursion, since you have items that include other items that in turn include the first ones (like id3 including id1, which in turn includes id3). So you have to define stop criteria, or be sure that this does not occur in your flat structure.
Your initial flat structure is better kept as a dictionary instead of a list of {id, data} pairs. That is why the first thing the code below does is transform it.
Your final, desired structure contains a lot of redundancy in the information it holds. Consider simplifying it.
Finally, you mentioned nothing about the "nested_data" nodes and how they should be treated. I simply assumed that, in case they exist, further expansion is required.
Please consider providing a bit of context in your questions and some real data examples (I believe the data provided is not real, hence the inconsistencies and redundancies), and try it yourself and show your efforts; that's the only way to learn.
from pprint import pprint
def reformat_flat_info(flat):
    reformatted = {}
    for o in flat:
        key = o["aid"]
        del o["aid"]
        reformatted[key] = o
    return reformatted

def expand_data(aid, flat, lvl=0):
    obj = flat[aid]
    if obj is None:
        return {aid: obj}
    obj.update({"aid": aid})
    if lvl > 1:
        return {aid: None}
    for nid, id in enumerate(obj["data"]):
        obj["data"][nid] = expand_data(id, flat, lvl=lvl + 1)
    if "nested_data" in obj:
        for nid, id in enumerate(obj["nested_data"]["data"]):
            obj["nested_data"]["data"][nid] = expand_data(id, flat, lvl=lvl + 1)
    return {aid: obj}
# Provide the flat information structure
flat_info = [
{
"aid": "id3",
"data": ["id1", "id2"]
}, {
"aid": "id1",
"data": ["id3", "id2"]
}, {
"aid": "id2",
"nested_data": {"aid": "id4", "atype": "nested", "data": ["id1", "id3"]},
"data": []
}
]
pprint(flat_info)
print('-'*80)
# Reformat the flat information structure
new_flat_info = reformat_flat_info(flat=flat_info)
pprint(new_flat_info)
print('-'*80)
# Generate the result
starting_id = "id3"
result = expand_data(aid=starting_id, flat=new_flat_info)
pprint(result)
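As a side note, the expansion above is depth-limited recursion rather than a true breadth-first traversal. A queue-based sketch of a genuine BFS could look like the following; expand_bfs is a hypothetical helper name, nested_data expansion is omitted for brevity, and it should be run on a freshly built new_flat_info, since expand_data mutates the mapping in place:
from collections import deque

def expand_bfs(start_id, flat):
    # Breadth-first: the first time an id is reached it is expanded into its
    # full object; any later occurrence is resolved to None.
    seen = {start_id}
    root = {"aid": start_id, **flat[start_id]}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        children = []
        for child_id in node.get("data", []):
            if child_id in seen or child_id not in flat:
                children.append({child_id: None})
            else:
                seen.add(child_id)
                child = {"aid": child_id, **flat[child_id]}
                children.append({child_id: child})
                queue.append(child)
        node["data"] = children
    return root

# result = expand_bfs("id3", new_flat_info)  # assumes new_flat_info was rebuilt from an unmodified flat_info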

How to convert any nested json into a pandas dataframe

I'm currently working on a project that will be analyzing multiple data sources for information. The other data sources are fine, but I am having a lot of trouble with JSON and its sometimes deeply nested structure. I have tried turning the JSON into a Python dictionary, but without much luck, as it starts to struggle as the structure gets more complicated. For example, with this sample JSON file:
{
"Employees": [
{
"userId": "rirani",
"jobTitleName": "Developer",
"firstName": "Romin",
"lastName": "Irani",
"preferredFullName": "Romin Irani",
"employeeCode": "E1",
"region": "CA",
"phoneNumber": "408-1234567",
"emailAddress": "romin.k.irani#gmail.com"
},
{
"userId": "nirani",
"jobTitleName": "Developer",
"firstName": "Neil",
"lastName": "Irani",
"preferredFullName": "Neil Irani",
"employeeCode": "E2",
"region": "CA",
"phoneNumber": "408-1111111",
"emailAddress": "neilrirani#gmail.com"
}
]
}
After converting it to a dictionary, dict.keys() only returns "Employees".
I then resorted to a pandas DataFrame instead, and I could achieve what I wanted by calling json_normalize(dict['Employees'], sep="_"), but my problem is that it must work for ALL JSONs, and looking at the data beforehand is not an option, so my method of normalizing this way will not always work. Is there some way I could write a function that would take in any JSON and convert it into a nice pandas DataFrame? I have searched for about two weeks for answers but with no luck regarding my specific problem. Thanks.
I've had to do that in the past (Flatten out a big nested json). This blog was really helpful. Would something like this work for you?
Note: as others have stated, making this work for EVERY JSON is a tall task; I'm merely offering a way to get started if you have a wider range of JSON objects. I'm assuming they will be relatively CLOSE to what you posted as an example, with hopefully similar structures.
jsonStr = '''{
"Employees" : [
{
"userId":"rirani",
"jobTitleName":"Developer",
"firstName":"Romin",
"lastName":"Irani",
"preferredFullName":"Romin Irani",
"employeeCode":"E1",
"region":"CA",
"phoneNumber":"408-1234567",
"emailAddress":"romin.k.irani#gmail.com"
},
{
"userId":"nirani",
"jobTitleName":"Developer",
"firstName":"Neil",
"lastName":"Irani",
"preferredFullName":"Neil Irani",
"employeeCode":"E2",
"region":"CA",
"phoneNumber":"408-1111111",
"emailAddress":"neilrirani#gmail.com"
}]
}'''
It flattens the entire JSON into a single row, which you can then put into a DataFrame. In this case it creates 1 row with 18 columns. It then iterates through those columns, using the number embedded in each column name to reconstruct multiple rows. If you had a different nested JSON, I'm thinking it theoretically should work, but you'll have to test it out.
import json
import pandas as pd
import re
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out
jsonObj = json.loads(jsonStr)
flat = flatten_json(jsonObj)
results = pd.DataFrame()
columns_list = list(flat.keys())
for item in columns_list:
    row_idx = re.findall(r'\_(\d+)\_', item)[0]
    column = item.replace('_' + row_idx + '_', '_')
    row_idx = int(row_idx)
    value = flat[item]
    results.loc[row_idx, column] = value

print(results)
Output:
print (results)
Employees_userId ... Employees_emailAddress
0 rirani ... romin.k.irani#gmail.com
1 nirani ... neilrirani#gmail.com
[2 rows x 9 columns]
d={
"Employees" : [
{
"userId":"rirani",
"jobTitleName":"Developer",
"firstName":"Romin",
"lastName":"Irani",
"preferredFullName":"Romin Irani",
"employeeCode":"E1",
"region":"CA",
"phoneNumber":"408-1234567",
"emailAddress":"romin.k.irani#gmail.com"
},
{
"userId":"nirani",
"jobTitleName":"Developer",
"firstName":"Neil",
"lastName":"Irani",
"preferredFullName":"Neil Irani",
"employeeCode":"E2",
"region":"CA",
"phoneNumber":"408-1111111",
"emailAddress":"neilrirani#gmail.com"
}]
}
import pandas as pd
df=pd.DataFrame([x.values() for x in d["Employees"]],columns=d["Employees"][0].keys())
print(df)
Output
userId jobTitleName firstName ... region phoneNumber emailAddress
0 rirani Developer Romin ... CA 408-1234567 romin.k.irani#gmail.com
1 nirani Developer Neil ... CA 408-1111111 neilrirani#gmail.com
[2 rows x 9 columns]
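For a flat list of records like d["Employees"], pandas can usually build the frame directly from the list of dicts, which avoids relying on the key ordering of each record; a minimal sketch:
import pandas as pd

# pandas aligns columns by key name, so records with missing keys simply get NaN
df = pd.DataFrame(d["Employees"])
print(df)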
For the particular JSON data given, my approach, which uses the pandas package only, follows:
import pandas as pd
# json as python's dict object
jsn = {
"Employees" : [
{
"userId":"rirani",
"jobTitleName":"Developer",
"firstName":"Romin",
"lastName":"Irani",
"preferredFullName":"Romin Irani",
"employeeCode":"E1",
"region":"CA",
"phoneNumber":"408-1234567",
"emailAddress":"romin.k.irani#gmail.com"
},
{
"userId":"nirani",
"jobTitleName":"Developer",
"firstName":"Neil",
"lastName":"Irani",
"preferredFullName":"Neil Irani",
"employeeCode":"E2",
"region":"CA",
"phoneNumber":"408-1111111",
"emailAddress":"neilrirani#gmail.com"
}]
}
# get the main key, here 'Employees' with index '0'
emp = list(jsn.keys())[0]
# when you have several keys at this level, i.e. 'Employers' for example
# .. you need to handle all of them too (your task)
# get all the sub-keys of the main key[0]
all_keys = jsn[emp][0].keys()
# build dataframe
result_df = pd.DataFrame() # init a dataframe
for key in all_keys:
    col_vals = []
    for ea in jsn[emp]:
        col_vals.append(ea[key])
    # add a new column to the dataframe using sub-key as its header
    # it is possible that values here is a nested object(s)
    # .. such as dict, list, json
    result_df[key] = col_vals

print(result_df.to_string())
Output:
userId lastName jobTitleName phoneNumber emailAddress employeeCode preferredFullName firstName region
0 rirani Irani Developer 408-1234567 romin.k.irani#gmail.com E1 Romin Irani Romin CA
1 nirani Irani Developer 408-1111111 neilrirani#gmail.com E2 Neil Irani Neil CA
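If a reasonably recent pandas is available, pandas.json_normalize (json_normalize from pandas.io.json in older releases) can also flatten this kind of record list, including one level of nested dicts, in a single call; a minimal sketch using the same jsn object:
import pandas as pd

# flattens each employee record; nested dicts would become column names joined with sep
df = pd.json_normalize(jsn["Employees"], sep="_")
print(df.to_string())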

Compare expected vs actual json response for a get request in Python

You will get a small JSON response back when you go to this site: https://reqres.in/api/users/2
I am saving the response in a variable (actual). I have also put the response in another variable (expected). Both responses are the same; I am changing values to test failure cases. The ultimate goal is to compare the two and make sure they match.
I have two functions: one compares the keys and values of both dictionaries, and the other sorts the dictionaries. Code below:
import json
import requests
response = requests.get('https://reqres.in/api/users/2')
#actual_response saves the json as we get it from url above
actual_response= json.loads(response.text)
#expected response is saved after using pretty json that will be used to testing/comparing actual vs expected
expected_response={
"data": {
"id": 2,
"first_name": "Janet",
"last_name": "Weaver",
"avatar": "https://s3.amazonaws.com/uifaces/faces/twitter/josephstein/128.jpg"
}
}
# sort the key values before comparing
def dict_sort(dictA, dictB):
    dictA, dictB = json.dumps(dictA, sort_keys=True), json.dumps(dictB, sort_keys=True)
    dictA == dictB

# if there are any failures due to a mismatch in key or value, the function below will show that
def key_diff(dictA, dictB):
    for key, value in dictA.items():
        for keyB, valueB in dictB.items():
            for k, v in value.items():
                for k2, v2 in valueB.items():
                    if key != keyB:
                        print('Expected', key, ' but got', keyB)
                    if k != k2:
                        print('Expected', k, ' but got', k2)
                    if v != v2:
                        print('Expected', v, ' but got', v2)
                    else:
                        print()
dict_sort(actual_response, expected_response)
if actual_response == expected_response:
    print('Passed')
else:
    print('Failed')
    key_diff(actual_response, expected_response)
Problem: the test passes when there is no difference. However, if there is any difference, the order goes crazy. Here is an example where I changed data to dat inside the expected response:
Expected data but got dat
Expected id but got last_name
Expected 2 but got Weaver
Should the sort function be more specific rather than using sort_keys=True? By the way, I thought about **args, but I don't think that is a good choice in this scenario.
Thank You for your expert comment and time.
I advise using unittest and avoiding so many nested for loops.
from unittest import TestCase
import pandas as pd
import requests
def mocked_server_response():
    expected = {"data": {"id": 2, "first_name": "Janet", "last_name": "Weaver",
                         "avatar": "https://s3.amazonaws.com/uifaces/faces/twitter/josephstein/128.jpg"}}
    data = expected['data']
    df = pd.DataFrame(expected['data'], index=[0])
    return [expected, data, df]
At this point, the DataFrame from mocked_server_response() will get you this:
Out[27]:
id first_name last_name avatar
0 2 Janet Weaver https://s3.amazonaws.com/uifaces/faces/twitter...
Now, you can easily make tests in a class.
class TestServerResponse(TestCase):
    real_response = requests.get('https://reqres.in/api/users/2')

    def setUp(self):
        self.actual_response = self.real_response.json()

    def test_response(self):
        self.assertEqual(self.actual_response, mocked_server_response()[0])

    def test_data_in_response(self):
        self.assertEqual(self.actual_response['data'], mocked_server_response()[1])

    def test_dataframe(self):
        pd.testing.assert_frame_equal(
            pd.DataFrame(self.actual_response['data'], index=[0]),
            mocked_server_response()[2],
        )
Key order is not guaranteed in Python versions under 3.7; you should use collections.OrderedDict when you need to create an object that remembers key order.
In Python 3.7 insertion order is preserved, so your keys will always match.
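As a side note, the dict_sort helper in the question builds the sorted JSON strings but never returns or compares them; a small order-insensitive comparison could be written as:
import json

def dicts_equal(dict_a, dict_b):
    # serialize with sorted keys so key order cannot affect the comparison
    return json.dumps(dict_a, sort_keys=True) == json.dumps(dict_b, sort_keys=True)

if dicts_equal(actual_response, expected_response):
    print('Passed')
else:
    print('Failed')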
import requests
import json
# Here I am converting the expected payload in dictionary
expected_payload = json.loads("""[
{
"key": "data-center",
"value": "All",
"values": [
"1",
"2"
]
},
{
"key": "router",
"value": "All",
"values": [
"cisco",
"juniper"
]
},
{
"key": "virtual-machine",
"value": "All",
"values": [
"dell",
"hp",
"None"
]
}
]""")
def test_get_all_system_attributes():
    url = "http://" + str(ipaddr) + ":" + str(port) + "/Service/" + "system_attribute/"
    payload = {}
    headers = {
        'Content-Type': 'application/json'
    }
    actual_response = requests.request("GET", url, headers=headers, data=payload)
    assert json.loads(actual_response.text) == expected_payload
    # json.loads(actual_response.text) will convert the response into a dictionary
    # using assert I am comparing the actual_response with the expected_payload

if __name__ == '__main__':
    test_get_all_system_attributes()

KeyError: 'Bytes_Written' python

I do not understand why I get this error. Bytes_Written is in the dataset, so why can't Python find it? I am getting this information (see dataset below) from a VM. I want to select Bytes_Written and Bytes_Read, subtract the previous value from the current value, and print a JSON object like this:
{'Bytes_Written': previousValue-currentValue, 'Bytes_Read': previousValue-currentValue}
here is what the data looks like:
{
"Number of Devices": 2,
"Block Devices": {
"bdev0": {
"Backend_Device_Path": "/dev/disk/by-path/ip-192.168.26.1:3260-iscsi-iqn.2010-10.org.openstack:volume-d1c8e7c6-8c77-444c-9a93-8b56fa1e37f2-lun-010.0.0.142",
"Capacity": "2147483648",
"Guest_Device_Name": "vdb",
"IO_Operations": "97069",
"Bytes_Written": "34410496",
"Bytes_Read": "363172864"
},
"bdev1": {
"Backend_Device_Path": "/dev/disk/by-path/ip-192.168.26.1:3260-iscsi-iqn.2010-10.org.openstack:volume-b27110f9-41ba-4bc6-b97c-b5dde23af1f9-lun-010.0.0.146",
"Capacity": "2147483648",
"Guest_Device_Name": "vdb",
"IO_Operations": "93",
"Bytes_Written": "0",
"Bytes_Read": "380928"
}
}
}
This is the complete code that I am running.
import time
import requests

FIELDS = ("Bytes_Written", "Bytes_Read", "IO_Operation")

def counterVolume_one(state):
    url = 'http://url'
    r = requests.get(url)
    data = r.json()
    for field in FIELDS:
        state[field] += data[field]
    return state

state = {"Bytes_Written": 0, "Bytes_Read": 0, "IO_Operation": 0}

while True:
    counterVolume_one(state)
    time.sleep(1)
    for field in FIELDS:
        print("{field:s}: {count:d}".format(field=field, count=state[field]))
    counterVolume_one(state)
Your returned JSON structure does not have any of these FIELDS = ("Bytes_Written", "Bytes_Read", "IO_Operation") keys directly.
You'll need to modify your code slightly.
data = r.json()
for block_device in data['Block Devices']:  # iterate the per-device entries ("bdev0", "bdev1", ...)
    for field in FIELDS:
        # note: the sample data uses the key "IO_Operations", while FIELDS has "IO_Operation"
        state[field] += int(data['Block Devices'][block_device][field])
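If the goal from the question (printing the change in Bytes_Written and Bytes_Read between polls) is still needed, a minimal sketch building on this fix might look like the following; read_counters is a hypothetical helper, the URL is the placeholder from the question, and the one-second interval is just an example:
import time
import requests

FIELDS = ("Bytes_Written", "Bytes_Read")

def read_counters(url):
    # sum each field over all block devices in the response
    data = requests.get(url).json()
    totals = dict.fromkeys(FIELDS, 0)
    for bdev in data['Block Devices'].values():
        for field in FIELDS:
            totals[field] += int(bdev[field])
    return totals

url = 'http://url'  # placeholder from the question
previous = read_counters(url)
while True:
    time.sleep(1)
    current = read_counters(url)
    print({field: current[field] - previous[field] for field in FIELDS})
    previous = current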
