I do not understand why I get this error: Bytes_Written is in the dataset, so why can't Python find it? I am getting this information (see the dataset below) from a VM. I want to select Bytes_Written and Bytes_Read, subtract the previous values from the current ones, and print a JSON object like this:
{'Bytes_Written': previousValue-currentValue, 'Bytes_Read': previousValue-currentValue}
Here is what the data looks like:
{
"Number of Devices": 2,
"Block Devices": {
"bdev0": {
"Backend_Device_Path": "/dev/disk/by-path/ip-192.168.26.1:3260-iscsi-iqn.2010-10.org.openstack:volume-d1c8e7c6-8c77-444c-9a93-8b56fa1e37f2-lun-010.0.0.142",
"Capacity": "2147483648",
"Guest_Device_Name": "vdb",
"IO_Operations": "97069",
"Bytes_Written": "34410496",
"Bytes_Read": "363172864"
},
"bdev1": {
"Backend_Device_Path": "/dev/disk/by-path/ip-192.168.26.1:3260-iscsi-iqn.2010-10.org.openstack:volume-b27110f9-41ba-4bc6-b97c-b5dde23af1f9-lun-010.0.0.146",
"Capacity": "2147483648",
"Guest_Device_Name": "vdb",
"IO_Operations": "93",
"Bytes_Written": "0",
"Bytes_Read": "380928"
}
}
}
This is the complete code that I am running.
import time

import requests

FIELDS = ("Bytes_Written", "Bytes_Read", "IO_Operation")

def counterVolume_one(state):
    url = 'http://url'
    r = requests.get(url)
    data = r.json()
    for field in FIELDS:
        state[field] += data[field]
    return state

state = {"Bytes_Written": 0, "Bytes_Read": 0, "IO_Operation": 0}

while True:
    counterVolume_one(state)
    time.sleep(1)
    for field in FIELDS:
        print("{field:s}: {count:d}".format(field=field, count=state[field]))

counterVolume_one(state)
Your returned JSON structure does not have any of the FIELDS = ("Bytes_Written", "Bytes_Read", "IO_Operation") keys at the top level; they are nested under 'Block Devices'. You'll need to modify your code slightly:
data = r.json()
for block_device in data['Block Devices']:  # .iterkeys() is Python 2 only; iterating a dict yields its keys
    for field in FIELDS:
        state[field] += int(data['Block Devices'][block_device][field])
Note that the sample data uses the key IO_Operations (plural) while FIELDS contains IO_Operation, so that entry also needs to match, or this loop will still raise a KeyError.
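To get the per-interval deltas the question asks for, here is a hedged sketch; it assumes the same nested structure as the sample above and sums each counter across the block devices:
import json
import time

import requests

FIELDS = ("Bytes_Written", "Bytes_Read")

def read_counters(url):
    # Sum each counter across all block devices in the response.
    data = requests.get(url).json()
    totals = dict.fromkeys(FIELDS, 0)
    for bdev in data['Block Devices'].values():
        for field in FIELDS:
            totals[field] += int(bdev[field])
    return totals

previous = read_counters('http://url')  # placeholder URL, as in the question
while True:
    time.sleep(1)
    current = read_counters('http://url')
    # current minus previous, per the question's prose
    delta = {field: current[field] - previous[field] for field in FIELDS}
    print(json.dumps(delta))
    previous = current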
I'm working with an API and I would like to understand why I get the error "AttributeError: 'bytes' object has no attribute 'get'".
import json

import requests
from requests.auth import HTTPBasicAuth

incidents = []
limit = 100
offset = 0
got_all_events = False
while not got_all_events:
    alerts = requests.get(f'***', auth=HTTPBasicAuth('***', '***'))
    print(alerts)
    for alert in alerts:
        incident = {
            'name': alert.get('hostname'),
            'rawJSON': json.dumps(alert),
        }
        incidents.append(incident)
    if len(alerts) == limit:
        offset += limit
    else:
        got_all_events = True
print(incident)
The error refers to this line:
'name': alert.get('hostname'),
The array returned by my API is defined like this:
{
"totalSize": 0,
"data": [
{
"hostname": "string",
"domain": "string",
"tags": "string",
"os": {},
}
],
"size": 0
}
Your dictionary has the hostname key nested in a subdictionary as part of a list, so the get is unable to find it.
The only valid targets for the get on alert are totalSize, data, and size.
You could run a get on alert with key data to return a list, then run another get with key hostname on element 0 of that list to get what you want.
A more straightforward approach would just be:
'name': alert["data"][0]["hostname"]
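A defensive variant of the same lookup (a sketch; the fallback values are my own choice, not from the thread):
# Falls back to None instead of raising if 'data' is missing or empty.
data = alert.get('data') or [{}]
name = data[0].get('hostname')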
alerts is a requests.Response object, not something decoded from JSON. You have to extract the body first. The iterator for a Response yields a sequence of bytes values, which is where the bytes value you think should be a dict comes from.
response = requests.get(f'***', auth=HTTPBasicAuth('***', '***'))
alerts = response.json()

for alert in alerts:
    incident = {
        'name': alert.get('hostname'),
        'rawJSON': json.dumps(alert),
    }
    incidents.append(incident)
Thanks for your help. It is now working like this:
r = alerts.json()
for alert in r['data']:
    incident = {
        'hostName': alert["hostname"],
    }
    incidents.append(incident)

print(json.dumps(incidents, indent=1))
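Putting the thread's fixes together, a complete hedged sketch of the pagination loop; the endpoint URL, credentials, and the limit/offset query-parameter names are placeholders, not the poster's actual API:
import json

import requests
from requests.auth import HTTPBasicAuth

incidents = []
limit = 100
offset = 0
while True:
    response = requests.get(
        'https://api.example.com/alerts',           # hypothetical endpoint
        params={'limit': limit, 'offset': offset},  # hypothetical paging parameters
        auth=HTTPBasicAuth('user', 'password'),
    )
    alerts = response.json()['data']  # the alert records live under the 'data' key
    for alert in alerts:
        incidents.append({
            'name': alert.get('hostname'),
            'rawJSON': json.dumps(alert),
        })
    if len(alerts) < limit:  # a short page means the last page was reached
        break
    offset += limit
print(json.dumps(incidents, indent=1))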
I have a function that takes api_response and tests whether a condition is met (if "meta" not in api_response:). If the condition is met, I extract the percent_complete key/value pair from the response and print it to the console. This value is a percentage and appears only once in the api_response.
My issue is that, when it prints to the console, the result (which should contain a single value, e.g. 0.19) contains the value twice.
E.g., if percent_complete == 0.19, the console will print Your data requested, associated with ID: 2219040 is (0.19, 0.19) complete!.
Is there anything wrong with my code, that might be causing this?
Function -
import datetime as dt
import json
import time

import requests
from requests.auth import HTTPBasicAuth

def api_call():
    # Calling function that returns API authentication details for use in endpoint_initializer()
    key, secret, url = ini_reader()
    # Calling function that makes the initial API POST call and returns the endpoint_url to poll until data is returned.
    endpoint_url = endpoint_initializer()
    # Saving the current date in a variable, for use when printing the user message.
    date = dt.datetime.today().strftime("%Y-%m-%d")
    # Printing endpoint_url and current date.
    print("-------------------------------------\n", "API URL constructed for:", date, "\n-------------------------------------")
    print("-------------------------------------------------------------\n", "Endpoint:", endpoint_url, "\n-------------------------------------------------------------")
    # Loop continuously calls the endpoint URL until data is returned. While data is not ready,
    # the 'percent_complete' value is extracted from the API response to inform the user of progress.
    while True:
        response = requests.get(url=endpoint_url, auth=HTTPBasicAuth(key, secret), headers={"Vendor-firm": "343"})
        api_response = json.loads(response.text)
        # "meta" is only present in the response when the data is ready.
        if "meta" not in api_response:
            id_value = "id"
            res1 = [val[id_value] for key, val in api_response.items() if id_value in val]
            id_value = "".join(res1)
            percent_value = "percent_complete"
            res2 = tuple(api_response["data"]["attributes"].get("percent_complete", '') for key, val in api_response.items())
            print(f' Your data requested, associated with ID: {id_value} is {res2} complete!')
            time.sleep(5)
        # Return the API response once the data is ready.
        elif "meta" in api_response:
            return api_response
Example API response -
{
"data": {
"id": "2219040",
"type": "jobs",
"attributes": {
"job_type": "PORTFOLIO_VIEW_RESULTS",
"started_at": "2021-12-18T17:40:17Z",
"parameters": {
"end_date": "2021-12-14",
"output_type": "json",
"view_id": 304078,
"portfolio_id": 1,
"portfolio_type": "firm",
"start_date": "2021-12-14"
},
"percent_complete": 0.19,
"status": "In Progress"
},
"relationships": {
"creator": {
"links": {
"self": "/v1/jobs/2219040/relationships/creator",
"related": "/v1/jobs/2219040/creator"
},
"data": {
"type": "users",
"id": "731221"
}
}
},
"links": {
"self": "/v1/jobs/2219040"
}
},
"included": []
}
The dictionary in your response contains two items ('data' and 'included'). Your code that creates res2 iterates over all of the items:
res2 = (tuple(api_response["data"]["attributes"].get("percent_complete", '') for key, val in api_response.items()))
so you get the information twice. Since you are only pulling from the 'data' key, there is no reason to iterate over the items at all. Just do:
res2 = api_response["data"]["attributes"].get("percent_complete", '')
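To see why the tuple contains two copies: the generator expression evaluates the same constant lookup once per top-level key. A minimal reproduction:
api_response = {"data": {"attributes": {"percent_complete": 0.19}}, "included": []}

# One iteration per top-level key ('data' and 'included'), so the same
# value is produced twice.
res2 = tuple(api_response["data"]["attributes"].get("percent_complete", '')
             for key, val in api_response.items())
print(res2)  # (0.19, 0.19)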
I am working with some Elasticsearch data and I would like to generate tables from the aggregations, as in Kibana. A sample output of the aggregation is below, based on the following code:
s.aggs.bucket("name1", "terms", field="field1").bucket(
    "name2", "terms", field="innerField1"
).bucket("name3", "terms", field="InnerAgg1")
response = s.execute()
resp_dict = response.aggregations.name1.buckets  # the top-level bucket was named "name1"
{
"key": "Locationx",
"doc_count": 12,
"name2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "Sub-Loc1",
"doc_count": 1,
"name3": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "super-Loc1",
"doc_count": 1
}]
}
}, {
"key": "Sub-Loc2",
"doc_count": 1,
"name3": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "super-Loc1",
"doc_count": 1
}]
}
}]
}
}
In this case, the expected output would be:
Now, I have tried a variety of methods; here is a short description of what went wrong with each:
Pandasticsearch = completely failed, even with just one dictionary. The dictionary was not created, as it was struggling with keys, even when each dictionary was dealt with separately:
for d in resp_dict:
    x = d.to_dict()
    pandas_df = Select.from_dict(x).to_pandas()
    print(pandas_df)
In particular, the error received related to the fact that the dictionary was not created, and thus ['took'] was not a key.
Pandas (pd.DataFrame.from_records()) = only gave me the first aggregation, with a column containing the inner dictionary; using pd.apply(pd.Series) on it gave another table of resulting dictionaries.
Recursive functions from other Stack Overflow posts = the dictionary looks completely different from the examples used, and tinkering led me nowhere unless I drastically changed the input.
Struggling with the same problem, I've come to believe the reason for this is that resp_dict is not a list of normal dicts, but an elasticsearch_dsl.utils.AttrList of elasticsearch_dsl.utils.AttrDict objects.
If you have an AttrList of AttrDicts, it's possible to do:
resp_dict = response.aggregations.name.buckets
new_response = [i._d_ for i in resp_dict]
To get a list of normal dicts instead. This will probably play nicer with other libraries.
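From that list of plain dicts, shaped like the sample above, one way to build the flat table is a pair of nested loops; a minimal sketch, where the names name2/name3 come from the question's aggregation:
import pandas as pd

def flatten_bucket(outer):
    # One row per innermost bucket, carrying the keys of all enclosing buckets.
    rows = []
    for mid in outer["name2"]["buckets"]:
        for inner in mid["name3"]["buckets"]:
            rows.append({
                "name1": outer["key"],
                "name2": mid["key"],
                "name3": inner["key"],
                "doc_count": inner["doc_count"],
            })
    return rows

# new_response is the list of plain dicts built above
df = pd.DataFrame([row for b in new_response for row in flatten_bucket(b)])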
Edit:
I wrote a recursive function which at least handles some cases, though it is not extensively tested yet and not wrapped in a nice module or anything; it's just a script. The one_lvl function keeps track of all the siblings and siblings of parents in the tree in a dictionary called tmp, and recurses when it finds a new named aggregation. It assumes a lot about the structure of the data, which I'm not sure is warranted in the general case.
The lvl suffix is necessary, I think, because you might have duplicate names; key, for instance, exists at several aggregation levels.
#!/usr/bin/env python3

from elasticsearch_dsl.query import QueryString
from elasticsearch_dsl import Search, A
from elasticsearch import Elasticsearch
import pandas as pd

PORT = 9250
TIMEOUT = 10000
USR = "someusr"
PW = "somepw"
HOST = "test.com"
INDEX = "my_index"
QUERY = "foobar"

client = Elasticsearch([HOST], port=PORT, http_auth=(USR, PW), timeout=TIMEOUT)

qs = QueryString(query=QUERY)
s = Search(using=client, index=INDEX).query(qs)
s = s.params(size=0)

agg = {
    "dates": A("date_histogram", field="date", interval="1M", time_zone="Europe/Berlin"),
    "region": A("terms", field="region", size=10),
    "county": A("terms", field="county", size=10)
}

s.aggs.bucket("dates", agg["dates"]). \
    bucket("region", agg["region"]). \
    bucket("county", agg["county"])

resp = s.execute()

data = {"buckets": [i._d_ for i in resp.aggregations.dates]}
rec_list = ["buckets"] + [*agg.keys()]

def get_fields(i, lvl):
    # Collect the non-recursing fields at this level, suffixed with the level number.
    return {(k + f"{lvl}"): v for k, v in i.items() if k not in rec_list}

def one_lvl(data, tmp, lvl, rows, maxlvl):
    # tmp accumulates the fields of all enclosing buckets.
    tmp = {**tmp, **get_fields(data, lvl)}
    if "buckets" not in data:
        rows.append(tmp)
    for d in data:
        if d in ["buckets"]:
            for v, b in enumerate(data[d]):
                tmp = {**tmp, **get_fields(data[d][v], lvl)}
                for k in b:
                    if k in agg.keys():
                        # Recurse into the next named aggregation.
                        one_lvl(data[d][v][k], tmp, lvl + 1, rows, maxlvl)
                    else:
                        if lvl == maxlvl:
                            tmp = {**tmp, (k + f"{lvl}"): data[d][v][k]}
                            rows.append(tmp)
    return rows

rows = one_lvl(data, {}, 1, [], len(agg))
df = pd.DataFrame(rows)
When scrolling in Elasticsearch, it is important to provide the latest scroll_id with each scroll request:
The initial search request and each subsequent scroll request returns
a new scroll_id — only the most recent scroll_id should be used.
The following example (taken from here) puzzles me. First, the scrolling initialization:
rs = es.search(index=['tweets-2014-04-12', 'tweets-2014-04-13'],
               scroll='10s',
               search_type='scan',
               size=100,
               preference='_primary_first',
               body={
                   "fields": ["created_at", "entities.urls.expanded_url", "user.id_str"],
                   "query": {
                       "wildcard": {"entities.urls.expanded_url": "*.ru"}
                   }
               }
               )
sid = rs['_scroll_id']
and then the looping:
tweets = []
while (1):
    try:
        rs = es.scroll(scroll_id=sid, scroll='10s')
        tweets += rs['hits']['hits']
    except:
        break
It works, but I don't see where sid is updated... I believe that it happens internally, in the Python client, but I don't understand how it works...
This is an old question, but for some reason it came up first when searching for "elasticsearch python scroll". The Python module provides a helper method to do all the work for you. It is a generator function that will return each document to you while managing the underlying scroll ids.
https://elasticsearch-py.readthedocs.io/en/master/helpers.html#scan
Here is an example of usage:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan
query = {
    "query": {"match_all": {}}
}

es = Elasticsearch(...)

for hit in scan(es, index="my-index", query=query):
    print(hit["_source"]["field"])
Using Python requests:
import json

import requests

elastic_url = 'http://localhost:9200/my_index/_search?scroll=1m'
scroll_api_url = 'http://localhost:9200/_search/scroll'
headers = {'Content-Type': 'application/json'}

payload = {
    "size": 100,
    "sort": ["_doc"],
    "query": {
        "match": {
            "title": "elasticsearch"
        }
    }
}

r1 = requests.request(
    "POST",
    elastic_url,
    data=json.dumps(payload),
    headers=headers
)

# first batch of data
try:
    res_json = r1.json()
    data = res_json['hits']['hits']
    _scroll_id = res_json['_scroll_id']
except KeyError:
    data = []
    _scroll_id = None
    print('Error: Elastic Search: %s' % str(r1.json()))

while data:
    print(data)
    # scroll to get the next batch of data
    scroll_payload = json.dumps({
        'scroll': '1m',
        'scroll_id': _scroll_id
    })
    scroll_res = requests.request(
        "POST", scroll_api_url,
        data=scroll_payload,
        headers=headers
    )
    try:
        res_json = scroll_res.json()
        data = res_json['hits']['hits']
        _scroll_id = res_json['_scroll_id']
    except KeyError:
        data = []
        _scroll_id = None
        err_msg = 'Error: Elastic Search Scroll: %s'
        print(err_msg % str(scroll_res.json()))
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html#search-request-scroll
In fact the code has a bug in it - in order to use the scroll feature correctly you are supposed to use the new scroll_id returned with each new call in the next call to scroll(), not reuse the first one:
Important
The initial search request and each subsequent scroll request returns
a new scroll_id — only the most recent scroll_id should be used.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html
It's working because Elasticsearch does not always change the scroll_id between calls, and for smaller result sets it can return the same scroll_id as was originally returned for some time. This discussion from last year is between two other users seeing the same issue, the same scroll_id being returned for a while:
http://elasticsearch-users.115913.n3.nabble.com/Distributing-query-results-using-scrolling-td4036726.html
So while your code is working for a smaller result set it's not correct - you need to capture the scroll_id returned in each new call to scroll() and use that for the next call.
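A corrected version of the loop from the question, as a sketch; it keeps the question's variable names and updates sid on every iteration:
sid = rs['_scroll_id']
tweets = []
while True:
    rs = es.scroll(scroll_id=sid, scroll='10s')
    sid = rs['_scroll_id']  # always use the most recently returned scroll_id
    hits = rs['hits']['hits']
    if not hits:  # stop when a page comes back empty
        break
    tweets += hits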
# Attributes set elsewhere in the class:
self._elkUrl = "http://Hostname:9200/logstash-*/_search?scroll=1m"
self._scrollUrl = "http://Hostname:9200/_search/scroll"

def GetDataFromELK(self):
    """
    Get the data from ELK through the scrolling mechanism.
    """
    # Implementing scroll and retrieving data from ELK to get more than 100000 records in one search.
    # Ref: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-scroll.html
    try:
        dataFrame = pd.DataFrame()
        if self._elkUrl is None:
            raise ValueError("_elkUrl is missing")
        if self._username is None:
            raise ValueError("_username for elk is missing")
        if self._password is None:
            raise ValueError("_password for elk is missing")
        response = requests.post(self._elkUrl, json=self.body, auth=(self._username, self._password))
        response = response.json()
        if response is None:
            raise ValueError("response is missing")
        sid = response['_scroll_id']
        hits = response['hits']
        total = hits["total"]
        if total is None:
            raise ValueError("total hits from ELK is none")
        total_val = int(total['value'])
        url = self._scrollUrl
        if url is None:
            raise ValueError("scroll url is missing")
        # start scrolling
        while total_val > 0:
            # keep the search context alive for 2m
            scroll = '2m'
            scroll_query = {"scroll": scroll, "scroll_id": sid}
            response1 = requests.post(url, json=scroll_query, auth=(self._username, self._password))
            response1 = response1.json()
            # The result includes a scroll_id, which should be passed to the scroll API
            # in order to retrieve the next batch of results.
            sid = response1['_scroll_id']
            hits = response1['hits']
            data = response1['hits']['hits']
            if len(data) > 0:
                cleanDataFrame = self.DataClean(data)
                # DataFrame.append was removed in pandas 2.0; pd.concat is equivalent here.
                dataFrame = pd.concat([dataFrame, cleanDataFrame])
            total_val = len(response1['hits']['hits'])
        num = len(dataFrame)
        print('Total records received from ELK =', num)
        return dataFrame
    except Exception as e:
        logging.error('Error while getting the data from elk', exc_info=e)
        sys.exit()
from elasticsearch import Elasticsearch

elasticsearch_user_name = 'es_username'
elasticsearch_user_password = 'es_password'
es_index = "es_index"

es = Elasticsearch(["127.0.0.1:9200"],
                   http_auth=(elasticsearch_user_name, elasticsearch_user_password))

query = {
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "es_datetime": {
                            "gte": "2021-06-21T09:00:00.356Z",
                            "lte": "2021-06-21T09:01:00.356Z",
                            "format": "strict_date_optional_time"
                        }
                    }
                }
            ]
        }
    },
    "fields": [
        "*"
    ],
    "_source": False,
    "size": 2000,
}

resp = es.search(index=es_index, body=query, scroll="1m")
old_scroll_id = resp['_scroll_id']
results = resp['hits']['hits']

while len(results):
    for i, r in enumerate(results):
        # do something with the data
        pass
    result = es.scroll(
        scroll_id=old_scroll_id,
        scroll='1m'  # length of time to keep the search context alive
    )
    # check if there's a new scroll ID
    if old_scroll_id != result['_scroll_id']:
        print("NEW SCROLL ID:", result['_scroll_id'])
    # keep track of the past scroll _id
    old_scroll_id = result['_scroll_id']
    results = result['hits']['hits']
Using Python, how can I extract the field id into a variable? Basically, I want to transform this:
{
"accountWide": true,
"criteria": [
{
"description": "some description",
"id": 7553,
"max": 1,
"orderIndex": 0
}
]
}
to something like:
print("Description is: " + description)
print("ID is: " + id)
print("Max value is: " + max)
Assume you stored that dictionary in a variable called values. To get id into a variable, do:
idValue = values['criteria'][0]['id']
If that JSON is in a file, do the following to load it:
import json

with open('your_filename.json', 'r') as jsonFile:
    values = json.load(jsonFile)
If that JSON is from a URL, do the following to load it:
import json
from urllib.request import urlopen  # urllib.urlopen was Python 2

f = urlopen("http://domain/path/jsonPage")
values = json.load(f)
f.close()
To print ALL of the criteria, you could:
for criteria in values['criteria']:
    for key, value in criteria.items():  # .iteritems() was Python 2
        print(key, 'is:', value)
    print('')
Assuming you are dealing with a JSON string as input, you can parse it using the json package; see the documentation.
In the specific example you posted, you would need:
import json

x = json.loads("""{
"accountWide": true,
"criteria": [
{
"description": "some description",
"id": 7553,
"max": 1,
"orderIndex": 0
}
]
}""")
description = x['criteria'][0]['description']
id = x['criteria'][0]['id']
max = x['criteria'][0]['max']
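To produce the output the question asks for, a small sketch; note that id and max shadow Python built-ins, so renamed variables are safer in real code:
description = x['criteria'][0]['description']
id_value = x['criteria'][0]['id']    # renamed to avoid shadowing the built-in id
max_value = x['criteria'][0]['max']  # renamed to avoid shadowing the built-in max

print(f"Description is: {description}")
print(f"ID is: {id_value}")
print(f"Max value is: {max_value}")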