KeyError when unpacking nested record - python

I have this code below which aims to unpack a nested record when found. Sometimes it works and sometimes it throws an error.
Would anyone have an idea how to resolve this?
Data (Works):
d = {
    "_id": 245,
    "connId": "3r34b32",
    "roomList": [
        {
            "reportId": 29,
            "siteId": 1
        }
    ]
}
Data (Doesn't work):
d = {
    "_id": 2,
    "connId": 128,
    "Config": {
        "Id": 5203,
        "TemplateId": "587",
        "alertRules": [
            {
                "id": 6,
                "daysOfTheWeek": ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
            }
        ]
    }
}
Code (Dynamic):
import pandas as pd

root = pd.json_normalize(d)
nested_cols = [i for i in root.columns if isinstance(root[i][0], list)]
l = [root.drop(nested_cols, axis=1)]
for i in nested_cols:
    l.append(pd.json_normalize(d, record_path=i))
output = pd.concat(l, axis=1)
print(output)
Traceback Error:
Traceback (most recent call last):
File "c:/Users/Max/Desktop/Azure/TestTimerTrigger/testing.py", line 30, in <module>
l.append(pd.json_normalize(d, record_path=i))
File "c:\Users\Max\Desktop\Azure\.venv\lib\site-packages\pandas\io\json\_normalize.py", line 336, in _json_normalize
_recursive_extract(data, record_path, {}, level=0)
File "c:\Users\Max\Desktop\Azure\.venv\lib\site-packages\pandas\io\json\_normalize.py", line 309, in _recursive_extract
recs = _pull_records(obj, path[0])
File "c:\Users\Max\Desktop\Azure\.venv\lib\site-packages\pandas\io\json\_normalize.py", line 248, in _pull_records
result = _pull_field(js, spec)
File "c:\Users\Max\Desktop\Azure\.venv\lib\site-packages\pandas\io\json\_normalize.py", line 239, in _pull_field
result = result[spec]
KeyError: 'Config.alertRules'
Expected Output:
_id,connid,config.id,config.templateid,id,daysoftheweek
2,128,5203,587,6,[mon,tue,wed,thu,fri,sat,sun]
Note:
I know a KeyError means a key cannot be located in a dictionary; however, I'm unsure how to go about resolving this.
Any help or guidance will be greatly appreciated.

It is looking for the literal key Config.alertRules in your dict, i.e. d["Config.alertRules"]. Since it is a nested dict, the path has to be given as a list of keys so it is indexed like d["Config"]["alertRules"]. How are you passing these keys?
This error does not occur for your first dictionary because there are no nested dicts there (d["roomList"] is a top-level list).
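Building on the comment above, one way to make the dynamic loop work is to split each flattened column name back into a key path before passing it to record_path. A minimal sketch (the meta columns from the expected output are omitted for brevity):

```python
import pandas as pd

d = {
    "_id": 2,
    "connId": 128,
    "Config": {
        "Id": 5203,
        "TemplateId": "587",
        "alertRules": [
            {"id": 6, "daysOfTheWeek": ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]}
        ],
    },
}

root = pd.json_normalize(d)
nested_cols = [c for c in root.columns if isinstance(root[c][0], list)]
parts = [root.drop(columns=nested_cols)]
for col in nested_cols:
    # "Config.alertRules" -> ["Config", "alertRules"], so json_normalize
    # drills into d["Config"]["alertRules"] instead of looking up the
    # literal key d["Config.alertRules"]
    parts.append(pd.json_normalize(d, record_path=col.split(".")))
output = pd.concat(parts, axis=1)
print(output)
```

This still works for the first dictionary too, where "roomList".split(".") is simply ["roomList"].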

how to achieve the following result using dict comprehension?

Hello, I would like to get all the entries with 'Integer' type from a list of dicts:
array_test = [{ "result1" : "date1", "type" : "Integer"},{ "result1" : "date2", "type" : "null"}]
I tried:
test = {'result1':array_test['result1'] for element in array_test if array_test['type'] == "Integer"}
However I got this error:
>>> test = {'result1':array_test['result1'] for element in array_test if array_test['type'] == "Integer"}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <dictcomp>
TypeError: list indices must be integers or slices, not str
So I would like to appreciate support to achieve the following output
test = [{ "result1" : "date1", "type" : "Integer"}]
You need a list comprehension, not a dict comprehension:
array_test = [{ "result1" : "date1", "type" : "Integer"},{ "result1" : "date2", "type" : "null"}]
test = [x for x in array_test if x['type'] == 'Integer']
# [{'result1': 'date1', 'type': 'Integer'}]
Why? Because the required output is a list (a list of dictionaries).
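If only the "result1" values themselves are needed rather than the whole dicts, the same filter works with a different expression on the left of the comprehension:

```python
array_test = [{"result1": "date1", "type": "Integer"},
              {"result1": "date2", "type": "null"}]

# Keep the whole dicts whose type is "Integer"
test = [x for x in array_test if x["type"] == "Integer"]

# Or extract just the "result1" values from the matching dicts
values = [x["result1"] for x in array_test if x["type"] == "Integer"]
```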

MongoDB doesn't handle aggregation with allowDiskUse: True

the data structure is like:
way: {
    _id: '9762264',
    node: ['253333910', '3304026514']
}
and I'm trying to count the frequency of nodes' appearance in ways. Here is my code using pymongo:
node = db.way.aggregate([
        {'$unwind': '$node'},
        {'$group': {
            '_id': '$node',
            'appear_count': {'$sum': 1}
        }},
        {'$sort': {'appear_count': -1}},
        {'$limit': 10}
    ],
    {'allowDiskUse': True}
)
it will report an error:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File ".../OSM Wrangling/explore.py", line 78, in most_passed_node
{'allowDiskUse': True}
File ".../pymongo/collection.py", line 2181, in aggregate
**kwargs)
File ".../pymongo/collection.py", line 2088, in _aggregate
client=self.__database.client)
File ".../pymongo/pool.py", line 464, in command
self.validate_session(client, session)
File ".../pymongo/pool.py", line 609, in validate_session
if session._client is not client:
AttributeError: 'dict' object has no attribute '_client'
However, if I remove the {'allowDiskUse': True} and test it on a smaller data set, it works well. It seems the allowDiskUse option triggers the problem, yet there is no information about this error in the MongoDB docs.
How should I solve this problem and get the answer I want?
This is because in PyMongo 3.6 the method signature for collection.aggregate() changed: an optional session parameter was added, so your second positional argument (the options dict) is being treated as the session, which causes the AttributeError.
The method signature is now:
aggregate(pipeline, session=None, **kwargs)
Applying this to your code example, you can specify allowDiskUse as below:
node = db.way.aggregate(
    pipeline=[
        {'$unwind': '$node'},
        {'$group': {
            '_id': '$node',
            'appear_count': {'$sum': 1}
        }},
        {'$sort': {'appear_count': -1}},
        {'$limit': 10}
    ],
    allowDiskUse=True
)
See also pymongo.client_session if you would like to know more about session.
Note that lowercase true only applies in JavaScript (the mongo shell); in Python the boolean must be spelled True, as in allowDiskUse=True.

Error while loading bulk data into Elasticsearch

I am using Elasticsearch in Python. I have data in a pandas DataFrame (3 columns); I added two columns, _index and _type, and converted the data to JSON, one record per row, with pandas' built-in method.
data = data.to_json(orient='records')
This is my data then,
[{"op_key":99140046678,"employee_key":991400459,"Revenue Results":6625.76480192,"_index":"revenueindex","_type":"revenuetype"},
{"op_key":99140045489,"employee_key":9914004258,"Revenue Results":6691.05435536,"_index":"revenueindex","_type":"revenuetype"},
......
]
My mapping is:
user_mapping = {
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 2
    },
    "mappings": {
        "revenuetype": {
            "properties": {
                "op_key": {"type": "string"},
                "employee_key": {"type": "string"},
                "Revenue Results": {"type": "float", "index": "not_analyzed"}
            }
        }
    }
}
Then facing this error while using helpers.bulk(es,data):
Traceback (most recent call last):
File "/Users/adaggula/Documents/workspace/ElSearchPython/sample.py", line 59, in <module>
res = helpers.bulk(client,data)
File "/Users/adaggula/workspace/python/pve/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 188, in bulk
for ok, item in streaming_bulk(client, actions, **kwargs):
File "/Users/adaggula/workspace/python/pve/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 160, in streaming_bulk
for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
File "/Users/adaggula/workspace/python/pve/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 89, in _process_bulk_chunk
raise e
elasticsearch.exceptions.RequestError: TransportError(400, u'action_request_validation_exception', u'Validation Failed: 1: index is
missing;2: type is missing;3: index is missing;4: type is missing;5: index is
missing;6: ....... type is missing;999: index is missing;1000: type is missing;')
It looks like the _index and _type are reported as missing for every JSON object. How can I overcome this?
The DataFrame-to-JSON conversion is the trick that resolved the problem: to_json returns a single JSON string, so it has to be parsed back into a list of dicts before being passed to helpers.bulk:
import json

data = data.to_json(orient='records')
data = json.loads(data)
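A minimal sketch of the data-preparation step (the sample values are made up, and the actual bulk call is commented out since it needs a running cluster):

```python
import json
import pandas as pd

df = pd.DataFrame({
    "op_key": [99140046678, 99140045489],
    "employee_key": [991400459, 9914004258],
    "Revenue Results": [6625.76, 6691.05],
})
df["_index"] = "revenueindex"
df["_type"] = "revenuetype"

# to_json returns one big string; iterating over it yields single
# characters, none of which carry _index/_type, hence the 400 error.
raw = df.to_json(orient="records")

# json.loads turns it back into a list of dicts, i.e. valid bulk actions
actions = json.loads(raw)
# helpers.bulk(es, actions)  # es: an Elasticsearch client (not shown)
```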

TypeError: list indices must be integers, not str, while parsing JSON

After submitting a request I have received the following json back:
{"type": [
    {"ID": "all", "count": 1, "references": [
        {"id": "Boston,MA,02118", "text": "Boston,MA,02118", "val": "Boston,MA,02118", "type": 1, "zip": "02118", "city": "Boston", "state": "MA", "lt": "42.3369", "lg": "-71.0637", "s": ""}
    ]}
]}
I captured the response in variable j and loaded it as follows,
l = json.loads(j)
Now I have:
>>> type(l)
<type 'dict'>
>>> l['type']['references']
Traceback (most recent call last):
File "C:\PyCharm\helpers\pydev\pydevd_exec.py", line 3, in Exec
exec exp in global_vars, local_vars
File "<input>", line 1, in <module>
TypeError: list indices must be integers, not str
What am I doing wrong?
l['type'] gives you a list, not a dict, so you have to index it like a list:
l['type'][0]['references']
The value of the key type is a list. Try doing type(l['type']) to confirm.
To access that value you would need to use:
l['type'][0]['references']
which would give you another list.
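Putting the answers together with the JSON from the question (truncated to the relevant fields):

```python
import json

j = '''{"type": [
    {"ID": "all", "count": 1, "references": [
        {"id": "Boston,MA,02118", "zip": "02118", "city": "Boston", "state": "MA"}
    ]}
]}'''

l = json.loads(j)
# l["type"] is a list with one element, so index into it first
references = l["type"][0]["references"]
print(references[0]["city"])  # Boston
```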

Elastic Search: pyes.exceptions.IndexMissingException exception from search result

This is a question about Elastic-Search python API (pyes).
I run a very simple testcase through curl, and everything seems to work as expected.
Here is the description of the curl test-case:
The only document that exists in the ES is:
curl 'http://localhost:9200/test/index1' -d '{"page_text":"This is the text that was found on the page!"}
Then I search the ES for all documents that the word "found" exists in. The result seems to be OK:
curl 'http://localhost:9200/test/index1/_search?q=page_text:found&pretty=true'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.15342641,
"hits" : [ {
"_index" : "test",
"_type" : "index1",
"_id" : "uaxRHpQZSpuicawk69Ouwg",
"_score" : 0.15342641, "_source" : {"page_text":"This is the text that was found on the page!"}
} ]
}
}
However, when I run the same query through the Python 2.7 API (pyes), something goes wrong:
>>> import pyes
>>> conn = pyes.ES('localhost:9200')
>>> result = conn.search({"page_text":"found"}, index="index1")
>>> print result
<pyes.es.ResultSet object at 0xd43e50>
>>> result.count()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 1717, in count
return self.total
File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 1686, in total
self._do_search()
File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 1646, in _do_search
doc_types=self.doc_types, **self.query_params)
File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 1381, in search_raw
return self._query_call("_search", body, indices, doc_types, **query_params)
File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 622, in _query_call
return self._send_request('GET', path, body, params=querystring_args)
File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 603, in _send_request
raise_if_error(response.status, decoded)
File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/convert_errors.py", line 83, in raise_if_error
raise excClass(msg, status, result, request)
pyes.exceptions.IndexMissingException: [_all] missing
As you can see, pyes returns a ResultSet object, but for some reason I can't even get the number of results from it.
Does anyone have a guess as to what may be wrong here?
Thanks a lot in advance!
The parameter name changed: it is no longer called index, it is called indices, and it takes a list:
>>> result = conn.search({"page_text":"found"}, indices=["index1"])
