I do not understand why I get this error: Bytes_Written is in the dataset, so why can't Python find it? I am getting this information (see the dataset below) from a VM. I want to select Bytes_Written and Bytes_Read, subtract the previous values from the current ones, and print a JSON object like this:
{'Bytes_Written': previousValue-currentValue, 'Bytes_Read': previousValue-currentValue}
Here is what the data looks like:
{
"Number of Devices": 2,
"Block Devices": {
"bdev0": {
"Backend_Device_Path": "/dev/disk/by-path/ip-192.168.26.1:3260-iscsi-iqn.2010-10.org.openstack:volume-d1c8e7c6-8c77-444c-9a93-8b56fa1e37f2-lun-010.0.0.142",
"Capacity": "2147483648",
"Guest_Device_Name": "vdb",
"IO_Operations": "97069",
"Bytes_Written": "34410496",
"Bytes_Read": "363172864"
},
"bdev1": {
"Backend_Device_Path": "/dev/disk/by-path/ip-192.168.26.1:3260-iscsi-iqn.2010-10.org.openstack:volume-b27110f9-41ba-4bc6-b97c-b5dde23af1f9-lun-010.0.0.146",
"Capacity": "2147483648",
"Guest_Device_Name": "vdb",
"IO_Operations": "93",
"Bytes_Written": "0",
"Bytes_Read": "380928"
}
}
}
This is the complete code that I am running.
import time

import requests

FIELDS = ("Bytes_Written", "Bytes_Read", "IO_Operation")

def counterVolume_one(state):
    url = 'http://url'
    r = requests.get(url)
    data = r.json()
    for field in FIELDS:
        state[field] += data[field]
    return state

state = {"Bytes_Written": 0, "Bytes_Read": 0, "IO_Operation": 0}

while True:
    counterVolume_one(state)
    time.sleep(1)
    for field in FIELDS:
        print("{field:s}: {count:d}".format(field=field, count=state[field]))

counterVolume_one(state)
Your returned JSON structure does not have any of the FIELDS = ("Bytes_Written", "Bytes_Read", "IO_Operation") keys at the top level; they are nested under 'Block Devices'. You'll need to modify your code slightly:
data = r.json()
for block_device in data['Block Devices']:  # .iterkeys() is Python 2 only; iterating a dict yields its keys
    for field in FIELDS:
        state[field] += int(data['Block Devices'][block_device][field])
Note that the sample data uses the key IO_Operations (plural) while FIELDS contains IO_Operation, so that entry also needs to match, or this loop will still raise a KeyError.
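To get the per-interval deltas the question asks for, here is a hedged sketch; it assumes the same nested structure as the sample above and sums each counter across the block devices:
import json
import time

import requests

FIELDS = ("Bytes_Written", "Bytes_Read")

def read_counters(url):
    # Sum each counter across all block devices in the response.
    data = requests.get(url).json()
    totals = dict.fromkeys(FIELDS, 0)
    for bdev in data['Block Devices'].values():
        for field in FIELDS:
            totals[field] += int(bdev[field])
    return totals

previous = read_counters('http://url')  # placeholder URL, as in the question
while True:
    time.sleep(1)
    current = read_counters('http://url')
    # current minus previous, per the question's prose
    delta = {field: current[field] - previous[field] for field in FIELDS}
    print(json.dumps(delta))
    previous = current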
I'm working with an API and I would like to understand why I get the error "AttributeError: 'bytes' object has no attribute 'get'".
import json

import requests
from requests.auth import HTTPBasicAuth

incidents = []
limit = 100
offset = 0
got_all_events = False
while not got_all_events:
    alerts = requests.get(f'***', auth=HTTPBasicAuth('***', '***'))
    print(alerts)
    for alert in alerts:
        incident = {
            'name': alert.get('hostname'),
            'rawJSON': json.dumps(alert),
        }
        incidents.append(incident)
    if len(alerts) == limit:
        offset += limit
    else:
        got_all_events = True
print(incident)
The error refers to this line:
'name': alert.get('hostname'),
The array returned by my API is defined like this:
{
"totalSize": 0,
"data": [
{
"hostname": "string",
"domain": "string",
"tags": "string",
"os": {},
}
],
"size": 0
}
Your dictionary has the hostname key nested in a subdictionary as part of a list, so the get is unable to find it.
The only valid targets for the get on alert are totalSize, data, and size.
You could run a get on alert with key data to return a list, then run another get with key hostname on element 0 of that list to get what you want.
A more straightforward approach would just be:
'name': alert["data"][0]["hostname"]
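A defensive variant of the same lookup (a sketch; the fallback values are my own choice, not from the thread):
# Falls back to None instead of raising if 'data' is missing or empty.
data = alert.get('data') or [{}]
name = data[0].get('hostname')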
alerts is a requests.Response object, not something decoded from JSON. You have to extract the body first. The iterator for a Response yields a sequence of bytes values, which is where the bytes value you think should be a dict comes from.
response = requests.get(f'***', auth=HTTPBasicAuth('***', '***'))
alerts = response.json()

for alert in alerts:
    incident = {
        'name': alert.get('hostname'),
        'rawJSON': json.dumps(alert),
    }
    incidents.append(incident)
Thanks for your help. It is now working like this:
r = alerts.json()
for alert in r['data']:
    incident = {
        'hostName': alert["hostname"],
    }
    incidents.append(incident)

print(json.dumps(incidents, indent=1))
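Putting the thread's fixes together, a complete hedged sketch of the pagination loop; the endpoint URL, credentials, and the limit/offset query-parameter names are placeholders, not the poster's actual API:
import json

import requests
from requests.auth import HTTPBasicAuth

incidents = []
limit = 100
offset = 0
while True:
    response = requests.get(
        'https://api.example.com/alerts',           # hypothetical endpoint
        params={'limit': limit, 'offset': offset},  # hypothetical paging parameters
        auth=HTTPBasicAuth('user', 'password'),
    )
    alerts = response.json()['data']  # the alert records live under the 'data' key
    for alert in alerts:
        incidents.append({
            'name': alert.get('hostname'),
            'rawJSON': json.dumps(alert),
        })
    if len(alerts) < limit:  # a short page means the last page was reached
        break
    offset += limit
print(json.dumps(incidents, indent=1))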
I have a function that takes api_response and tests whether a condition is met (if "meta" not in api_response:). If the condition is met, I extract the percent_complete key/value pair from the response and print it to the console. This value is a percentage and appears only once in the api_response.
My issue is that, when it prints to the console, the result (which should contain a single value, e.g. 0.19) contains the value twice.
E.g., if percent_complete == 0.19, the console will print Your data requested, associated with ID: 2219040 is (0.19, 0.19) complete!.
Is there anything wrong with my code, that might be causing this?
Function -
import datetime as dt
import json
import time

import requests
from requests.auth import HTTPBasicAuth

def api_call():
    # Calling function that returns API authentication details for use in endpoint_initializer()
    key, secret, url = ini_reader()
    # Calling function that makes the initial API POST call and returns the endpoint_url to poll until data is returned.
    endpoint_url = endpoint_initializer()
    # Saving the current date in a variable, for use when printing the user message.
    date = dt.datetime.today().strftime("%Y-%m-%d")
    # Printing endpoint_url and current date.
    print("-------------------------------------\n", "API URL constructed for:", date, "\n-------------------------------------")
    print("-------------------------------------------------------------\n", "Endpoint:", endpoint_url, "\n-------------------------------------------------------------")
    # Loop continuously calls the endpoint URL until data is returned. While data is not ready,
    # the 'percent_complete' value is extracted from the API response to inform the user of progress.
    while True:
        response = requests.get(url=endpoint_url, auth=HTTPBasicAuth(key, secret), headers={"Vendor-firm": "343"})
        api_response = json.loads(response.text)
        # "meta" is only present in the response when the data is ready.
        if "meta" not in api_response:
            id_value = "id"
            res1 = [val[id_value] for key, val in api_response.items() if id_value in val]
            id_value = "".join(res1)
            percent_value = "percent_complete"
            res2 = tuple(api_response["data"]["attributes"].get("percent_complete", '') for key, val in api_response.items())
            print(f' Your data requested, associated with ID: {id_value} is {res2} complete!')
            time.sleep(5)
        # Return the API response once the data is ready.
        elif "meta" in api_response:
            return api_response
Example API response -
{
"data": {
"id": "2219040",
"type": "jobs",
"attributes": {
"job_type": "PORTFOLIO_VIEW_RESULTS",
"started_at": "2021-12-18T17:40:17Z",
"parameters": {
"end_date": "2021-12-14",
"output_type": "json",
"view_id": 304078,
"portfolio_id": 1,
"portfolio_type": "firm",
"start_date": "2021-12-14"
},
"percent_complete": 0.19,
"status": "In Progress"
},
"relationships": {
"creator": {
"links": {
"self": "/v1/jobs/2219040/relationships/creator",
"related": "/v1/jobs/2219040/creator"
},
"data": {
"type": "users",
"id": "731221"
}
}
},
"links": {
"self": "/v1/jobs/2219040"
}
},
"included": []
}
The dictionary in your response contains two items ('data' and 'included'). Your code that creates res2 iterates over all of the items:
res2 = (tuple(api_response["data"]["attributes"].get("percent_complete", '') for key, val in api_response.items()))
so you get the information twice. Since you are only pulling from the 'data' key, there is no reason to iterate over the items at all. Just do:
res2 = api_response["data"]["attributes"].get("percent_complete", '')
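To see why the tuple contains two copies: the generator expression evaluates the same constant lookup once per top-level key. A minimal reproduction:
api_response = {"data": {"attributes": {"percent_complete": 0.19}}, "included": []}

# One iteration per top-level key ('data' and 'included'), so the same
# value is produced twice.
res2 = tuple(api_response["data"]["attributes"].get("percent_complete", '')
             for key, val in api_response.items())
print(res2)  # (0.19, 0.19)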
I am working with some Elasticsearch data and I would like to generate tables from the aggregations, as in Kibana. A sample output of the aggregation is below, based on the following code:
s.aggs.bucket("name1", "terms", field="field1").bucket(
    "name2", "terms", field="innerField1"
).bucket("name3", "terms", field="InnerAgg1")
response = s.execute()
resp_dict = response.aggregations.name1.buckets  # the top-level bucket was named "name1"
{
"key": "Locationx",
"doc_count": 12,
"name2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "Sub-Loc1",
"doc_count": 1,
"name3": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "super-Loc1",
"doc_count": 1
}]
}
}, {
"key": "Sub-Loc2",
"doc_count": 1,
"name3": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "super-Loc1",
"doc_count": 1
}]
}
}]
}
}
In this case, the expected output would be:
Now, I have tried a variety of methods; here is a short description of what went wrong with each:
Pandasticsearch = completely failed, even with just one dictionary. The dictionary was not created, as it was struggling with keys, even when each dictionary was dealt with separately:
for d in resp_dict:
    x = d.to_dict()
    pandas_df = Select.from_dict(x).to_pandas()
    print(pandas_df)
In particular, the error received related to the fact that the dictionary was not created, and thus ['took'] was not a key.
Pandas (pd.DataFrame.from_records()) = only gave me the first aggregation, with a column containing the inner dictionary; using pd.apply(pd.Series) on it gave another table of resulting dictionaries.
Recursive functions from other Stack Overflow posts = the dictionary looks completely different from the examples used, and tinkering led me nowhere unless I drastically changed the input.
Struggling with the same problem, I've come to believe the reason for this is that resp_dict is not a list of normal dicts, but an elasticsearch_dsl.utils.AttrList of elasticsearch_dsl.utils.AttrDict objects.
If you have an AttrList of AttrDicts, it's possible to do:
resp_dict = response.aggregations.name.buckets
new_response = [i._d_ for i in resp_dict]
To get a list of normal dicts instead. This will probably play nicer with other libraries.
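From that list of plain dicts, shaped like the sample above, one way to build the flat table is a pair of nested loops; a minimal sketch, where the names name2/name3 come from the question's aggregation:
import pandas as pd

def flatten_bucket(outer):
    # One row per innermost bucket, carrying the keys of all enclosing buckets.
    rows = []
    for mid in outer["name2"]["buckets"]:
        for inner in mid["name3"]["buckets"]:
            rows.append({
                "name1": outer["key"],
                "name2": mid["key"],
                "name3": inner["key"],
                "doc_count": inner["doc_count"],
            })
    return rows

# new_response is the list of plain dicts built above
df = pd.DataFrame([row for b in new_response for row in flatten_bucket(b)])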
Edit:
I wrote a recursive function which at least handles some cases, though it is not extensively tested yet and not wrapped in a nice module or anything; it's just a script. The one_lvl function keeps track of all the siblings and siblings of parents in the tree in a dictionary called tmp, and recurses when it finds a new named aggregation. It assumes a lot about the structure of the data, which I'm not sure is warranted in the general case.
The lvl suffix is necessary, I think, because you might have duplicate names; key, for instance, exists at several aggregation levels.
#!/usr/bin/env python3

from elasticsearch_dsl.query import QueryString
from elasticsearch_dsl import Search, A
from elasticsearch import Elasticsearch
import pandas as pd

PORT = 9250
TIMEOUT = 10000
USR = "someusr"
PW = "somepw"
HOST = "test.com"
INDEX = "my_index"
QUERY = "foobar"

client = Elasticsearch([HOST], port=PORT, http_auth=(USR, PW), timeout=TIMEOUT)

qs = QueryString(query=QUERY)
s = Search(using=client, index=INDEX).query(qs)
s = s.params(size=0)

agg = {
    "dates": A("date_histogram", field="date", interval="1M", time_zone="Europe/Berlin"),
    "region": A("terms", field="region", size=10),
    "county": A("terms", field="county", size=10)
}

s.aggs.bucket("dates", agg["dates"]). \
    bucket("region", agg["region"]). \
    bucket("county", agg["county"])

resp = s.execute()

data = {"buckets": [i._d_ for i in resp.aggregations.dates]}
rec_list = ["buckets"] + [*agg.keys()]

def get_fields(i, lvl):
    # Collect the non-recursing fields at this level, suffixed with the level number.
    return {(k + f"{lvl}"): v for k, v in i.items() if k not in rec_list}

def one_lvl(data, tmp, lvl, rows, maxlvl):
    # tmp accumulates the fields of all enclosing buckets.
    tmp = {**tmp, **get_fields(data, lvl)}
    if "buckets" not in data:
        rows.append(tmp)
    for d in data:
        if d in ["buckets"]:
            for v, b in enumerate(data[d]):
                tmp = {**tmp, **get_fields(data[d][v], lvl)}
                for k in b:
                    if k in agg.keys():
                        # Recurse into the next named aggregation.
                        one_lvl(data[d][v][k], tmp, lvl + 1, rows, maxlvl)
                    else:
                        if lvl == maxlvl:
                            tmp = {**tmp, (k + f"{lvl}"): data[d][v][k]}
                            rows.append(tmp)
    return rows

rows = one_lvl(data, {}, 1, [], len(agg))
df = pd.DataFrame(rows)
When scrolling in Elasticsearch, it is important to provide the latest scroll_id with each scroll request:
The initial search request and each subsequent scroll request returns
a new scroll_id — only the most recent scroll_id should be used.
The following example (taken from here) puzzles me. First, the scrolling initialization:
rs = es.search(index=['tweets-2014-04-12', 'tweets-2014-04-13'],
               scroll='10s',
               search_type='scan',
               size=100,
               preference='_primary_first',
               body={
                   "fields": ["created_at", "entities.urls.expanded_url", "user.id_str"],
                   "query": {
                       "wildcard": {"entities.urls.expanded_url": "*.ru"}
                   }
               }
               )
sid = rs['_scroll_id']
and then the looping:
tweets = []
while (1):
    try:
        rs = es.scroll(scroll_id=sid, scroll='10s')
        tweets += rs['hits']['hits']
    except:
        break
It works, but I don't see where sid is updated... I believe that it happens internally, in the Python client, but I don't understand how it works...
This is an old question, but for some reason it came up first when searching for "elasticsearch python scroll". The Python module provides a helper method to do all the work for you. It is a generator function that will return each document to you while managing the underlying scroll ids.
https://elasticsearch-py.readthedocs.io/en/master/helpers.html#scan
Here is an example of usage:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan
query = {
    "query": {"match_all": {}}
}

es = Elasticsearch(...)

for hit in scan(es, index="my-index", query=query):
    print(hit["_source"]["field"])
Using Python requests:
import json

import requests

elastic_url = 'http://localhost:9200/my_index/_search?scroll=1m'
scroll_api_url = 'http://localhost:9200/_search/scroll'
headers = {'Content-Type': 'application/json'}

payload = {
    "size": 100,
    "sort": ["_doc"],
    "query": {
        "match": {
            "title": "elasticsearch"
        }
    }
}

r1 = requests.request(
    "POST",
    elastic_url,
    data=json.dumps(payload),
    headers=headers
)

# first batch of data
try:
    res_json = r1.json()
    data = res_json['hits']['hits']
    _scroll_id = res_json['_scroll_id']
except KeyError:
    data = []
    _scroll_id = None
    print('Error: Elastic Search: %s' % str(r1.json()))

while data:
    print(data)
    # scroll to get the next batch of data
    scroll_payload = json.dumps({
        'scroll': '1m',
        'scroll_id': _scroll_id
    })
    scroll_res = requests.request(
        "POST", scroll_api_url,
        data=scroll_payload,
        headers=headers
    )
    try:
        res_json = scroll_res.json()
        data = res_json['hits']['hits']
        _scroll_id = res_json['_scroll_id']
    except KeyError:
        data = []
        _scroll_id = None
        err_msg = 'Error: Elastic Search Scroll: %s'
        print(err_msg % str(scroll_res.json()))
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html#search-request-scroll
In fact the code has a bug in it - in order to use the scroll feature correctly you are supposed to use the new scroll_id returned with each new call in the next call to scroll(), not reuse the first one:
Important
The initial search request and each subsequent scroll request returns
a new scroll_id — only the most recent scroll_id should be used.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html
It's working because Elasticsearch does not always change the scroll_id between calls, and for smaller result sets it can return the same scroll_id as was originally returned for some time. This discussion from last year is between two other users seeing the same issue, the same scroll_id being returned for a while:
http://elasticsearch-users.115913.n3.nabble.com/Distributing-query-results-using-scrolling-td4036726.html
So while your code is working for a smaller result set it's not correct - you need to capture the scroll_id returned in each new call to scroll() and use that for the next call.
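A corrected version of the loop from the question, as a sketch; it keeps the question's variable names and updates sid on every iteration:
sid = rs['_scroll_id']
tweets = []
while True:
    rs = es.scroll(scroll_id=sid, scroll='10s')
    sid = rs['_scroll_id']  # always use the most recently returned scroll_id
    hits = rs['hits']['hits']
    if not hits:  # stop when a page comes back empty
        break
    tweets += hits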
# Attributes set elsewhere in the class:
self._elkUrl = "http://Hostname:9200/logstash-*/_search?scroll=1m"
self._scrollUrl = "http://Hostname:9200/_search/scroll"

def GetDataFromELK(self):
    """
    Get the data from ELK through the scrolling mechanism.
    """
    # Implementing scroll and retrieving data from ELK to get more than 100000 records in one search.
    # Ref: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-scroll.html
    try:
        dataFrame = pd.DataFrame()
        if self._elkUrl is None:
            raise ValueError("_elkUrl is missing")
        if self._username is None:
            raise ValueError("_username for elk is missing")
        if self._password is None:
            raise ValueError("_password for elk is missing")
        response = requests.post(self._elkUrl, json=self.body, auth=(self._username, self._password))
        response = response.json()
        if response is None:
            raise ValueError("response is missing")
        sid = response['_scroll_id']
        hits = response['hits']
        total = hits["total"]
        if total is None:
            raise ValueError("total hits from ELK is none")
        total_val = int(total['value'])
        url = self._scrollUrl
        if url is None:
            raise ValueError("scroll url is missing")
        # start scrolling
        while total_val > 0:
            # keep the search context alive for 2m
            scroll = '2m'
            scroll_query = {"scroll": scroll, "scroll_id": sid}
            response1 = requests.post(url, json=scroll_query, auth=(self._username, self._password))
            response1 = response1.json()
            # The result includes a scroll_id, which should be passed to the scroll API
            # in order to retrieve the next batch of results.
            sid = response1['_scroll_id']
            hits = response1['hits']
            data = response1['hits']['hits']
            if len(data) > 0:
                cleanDataFrame = self.DataClean(data)
                # DataFrame.append was removed in pandas 2.0; pd.concat is equivalent here.
                dataFrame = pd.concat([dataFrame, cleanDataFrame])
            total_val = len(response1['hits']['hits'])
        num = len(dataFrame)
        print('Total records received from ELK =', num)
        return dataFrame
    except Exception as e:
        logging.error('Error while getting the data from elk', exc_info=e)
        sys.exit()
from elasticsearch import Elasticsearch

elasticsearch_user_name = 'es_username'
elasticsearch_user_password = 'es_password'
es_index = "es_index"

es = Elasticsearch(["127.0.0.1:9200"],
                   http_auth=(elasticsearch_user_name, elasticsearch_user_password))

query = {
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "es_datetime": {
                            "gte": "2021-06-21T09:00:00.356Z",
                            "lte": "2021-06-21T09:01:00.356Z",
                            "format": "strict_date_optional_time"
                        }
                    }
                }
            ]
        }
    },
    "fields": [
        "*"
    ],
    "_source": False,
    "size": 2000,
}

resp = es.search(index=es_index, body=query, scroll="1m")
old_scroll_id = resp['_scroll_id']
results = resp['hits']['hits']

while len(results):
    for i, r in enumerate(results):
        # do something with the data
        pass
    result = es.scroll(
        scroll_id=old_scroll_id,
        scroll='1m'  # length of time to keep the search context alive
    )
    # check if there's a new scroll ID
    if old_scroll_id != result['_scroll_id']:
        print("NEW SCROLL ID:", result['_scroll_id'])
    # keep track of the past scroll _id
    old_scroll_id = result['_scroll_id']
    results = result['hits']['hits']
Using Python, how can I extract the field id into a variable? Basically, I want to transform this:
{
"accountWide": true,
"criteria": [
{
"description": "some description",
"id": 7553,
"max": 1,
"orderIndex": 0
}
]
}
to something like:
print("Description is: " + description)
print("ID is: " + id)
print("Max value is: " + max)
Assume you stored that dictionary in a variable called values. To get id into a variable, do:
idValue = values['criteria'][0]['id']
If that JSON is in a file, do the following to load it:
import json

with open('your_filename.json', 'r') as jsonFile:
    values = json.load(jsonFile)
If that JSON is from a URL, do the following to load it:
import json
from urllib.request import urlopen  # urllib.urlopen was Python 2

f = urlopen("http://domain/path/jsonPage")
values = json.load(f)
f.close()
To print ALL of the criteria, you could:
for criteria in values['criteria']:
    for key, value in criteria.items():  # .iteritems() was Python 2
        print(key, 'is:', value)
    print('')
Assuming you are dealing with a JSON string as input, you can parse it using the json package; see the documentation.
In the specific example you posted, you would need:
import json

x = json.loads("""{
"accountWide": true,
"criteria": [
{
"description": "some description",
"id": 7553,
"max": 1,
"orderIndex": 0
}
]
}""")
description = x['criteria'][0]['description']
id = x['criteria'][0]['id']
max = x['criteria'][0]['max']
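To produce the output the question asks for, a small sketch; note that id and max shadow Python built-ins, so renamed variables are safer in real code:
description = x['criteria'][0]['description']
id_value = x['criteria'][0]['id']    # renamed to avoid shadowing the built-in id
max_value = x['criteria'][0]['max']  # renamed to avoid shadowing the built-in max

print(f"Description is: {description}")
print(f"ID is: {id_value}")
print(f"Max value is: {max_value}")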