Elastic Search: pyes.exceptions.IndexMissingException exception from search result - python

This is a question about the Elasticsearch Python API (pyes).
I ran a very simple test case through curl, and everything seems to work as expected.
Here is a description of the curl test case.
The only document that exists in ES is:
curl 'http://localhost:9200/test/index1' -d '{"page_text":"This is the text that was found on the page!"}'
Then I search ES for all documents that contain the word "found". The result seems to be OK:
curl 'http://localhost:9200/test/index1/_search?q=page_text:found&pretty=true'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.15342641,
"hits" : [ {
"_index" : "test",
"_type" : "index1",
"_id" : "uaxRHpQZSpuicawk69Ouwg",
"_score" : 0.15342641, "_source" : {"page_text":"This is the text that was found on the page!"}
} ]
}
}
However, when I run the same query through the Python 2.7 API (pyes), something goes wrong:
>>> import pyes
>>> conn = pyes.ES('localhost:9200')
>>> result = conn.search({"page_text":"found"}, index="index1")
>>> print result
<pyes.es.ResultSet object at 0xd43e50>
>>> result.count()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 1717, in count
return self.total
File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 1686, in total
self._do_search()
File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 1646, in _do_search
doc_types=self.doc_types, **self.query_params)
File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 1381, in search_raw
return self._query_call("_search", body, indices, doc_types, **query_params)
File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 622, in _query_call
return self._send_request('GET', path, body, params=querystring_args)
File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/es.py", line 603, in _send_request
raise_if_error(response.status, decoded)
File "/usr/local/pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/pyes/convert_errors.py", line 83, in raise_if_error
raise excClass(msg, status, result, request)
pyes.exceptions.IndexMissingException: [_all] missing
As you can see, pyes returns a result object, but for some reason I can't even get the number of results from it.
Does anyone have a guess as to what may be wrong here?
Thanks a lot in advance!

The name of the parameter has changed: it is no longer called index but indices, and it takes a list:
>>> result = conn.search({"page_text":"found"}, indices=["index1"])


'S3' object has no attribute 'get_object_lock_configuration'

I'm trying to implement the object lock feature, but the functions (get/put_object_lock_configuration) are not available:
>>> import boto3
>>> boto3.__version__
'1.17.64'
>>> client = boto3.client('s3')
>>> client.get_object_lock_configuration
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.6/site-packages/botocore/client.py", line 553, in __getattr__
self.__class__.__name__, item)
AttributeError: 'S3' object has no attribute 'get_object_lock_configuration'
>>> client.get_object_lock_configuration(Bucket='tst', ExpectedBucketOwner='tst')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.6/site-packages/botocore/client.py", line 553, in __getattr__
self.__class__.__name__, item)
AttributeError: 'S3' object has no attribute 'get_object_lock_configuration'
Edit:
The object lock functions do not show up in Python tab completion (tab tab):
>>> client.get_object_
client.get_object_acl( client.get_object_tagging( client.get_object_torrent(
>>> client.put_object
client.put_object( client.put_object_acl( client.put_object_tagging(
get_object_lock_configuration is a method, not a property.
You need to call it like this:
response = client.get_object_lock_configuration(
    Bucket='string',
    ExpectedBucketOwner='string'
)
The syntax to call client.get_object_lock_configuration:
response = client.get_object_lock_configuration(
    Bucket='string',
    ExpectedBucketOwner='string'
)
Syntax to call function client.put_object_lock_configuration:
response = client.put_object_lock_configuration(
Bucket='string',
ObjectLockConfiguration={
'ObjectLockEnabled': 'Enabled',
'Rule': {
'DefaultRetention': {
'Mode': 'GOVERNANCE'|'COMPLIANCE',
'Days': 123,
'Years': 123
}
}
},
RequestPayer='requester',
Token='string',
ContentMD5='string',
ExpectedBucketOwner='string'
)
To know more please refer to this: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.put_object_lock_configuration
Edit:
Example Code:
import json
import boto3
client = boto3.client('s3')
response = client.get_object_lock_configuration(
Bucket='anynewname')
print(response)
Output syntax:
{
'ObjectLockConfiguration': {
'ObjectLockEnabled': 'Enabled',
'Rule': {
'DefaultRetention': {
'Mode': 'GOVERNANCE'|'COMPLIANCE',
'Days': 123,
'Years': 123
}
}
}
}
Note: the call raises an error if no object lock configuration is set on the bucket:
{
"errorMessage": "An error occurred (ObjectLockConfigurationNotFoundError) when calling the GetObjectLockConfiguration operation: Object Lock configuration does not exist for this bucket",
"errorType": "ClientError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 7, in lambda_handler\n response = client.get_object_lock_configuration(\n",
" File \"/var/runtime/botocore/client.py\", line 357, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
" File \"/var/runtime/botocore/client.py\", line 676, in _make_api_call\n raise error_class(parsed_response, operation_name)\n"
]
}
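Both tracebacks in the question are raised from botocore's `__getattr__`, which suggests the client object has no such method at all, so calling it with parentheses would fail the same way. The following is a minimal sketch for checking which operations a client actually exposes. `missing_operations` and `StubClient` are hypothetical names used only for illustration, and the idea that an outdated botocore install is responsible is an assumption, not something confirmed in this thread.

```python
def missing_operations(client, names):
    """Return the operation names that `client` does not expose as callables."""
    return [name for name in names if not callable(getattr(client, name, None))]


class StubClient:
    """Hypothetical stand-in for an S3 client built from an older service model."""

    def get_object_acl(self, **kwargs):
        return {}

    def get_object_tagging(self, **kwargs):
        return {}


wanted = ["get_object_acl", "get_object_lock_configuration"]
print(missing_operations(StubClient(), wanted))
```

With a real client, pass boto3.client('s3') instead of the stub. If the operation is reported missing, comparing boto3.__version__ with botocore.__version__ may help, since the client methods are generated from the installed botocore.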

jsonb join not working properly in sqlalchemy

I have a query that joins on a jsonb column in Postgres, and I want to convert it to SQLAlchemy in Django using the aldjemy package:
SELECT anon_1.key AS tag, count(anon_1.value ->> 'polarity') AS count_1, anon_1.value ->> 'polarity' AS anon_2
FROM feedback f
JOIN tagging t ON t.feedback_id = f.id
JOIN jsonb_each(t.json_content -> 'entityMap') AS anon_3 ON true
JOIN jsonb_each(((anon_3.value -> 'data') - 'selectionState') - 'segment') AS anon_1 ON true
where f.id = 2
GROUP BY anon_1.value ->> 'polarity', anon_1.key;
The json_content field stores data in the following format:
{
"entityMap":
{
"0":
{
"data":
{
"people":
{
"labelId": 5,
"polarity": "positive"
},
"segment": "a small segment",
"selectionState":
{
"focusKey": "9xrre",
"hasFocus": true,
"anchorKey": "9xrre",
"isBackward": false,
"focusOffset": 75,
"anchorOffset": 3
}
},
"type": "TAG",
"mutability": "IMMUTABLE"
},
"1":
{
"data":
{
"product":
{
"labelId": 6,
"polarity": "positive"
},
"segment": "another segment",
"selectionState":
{
"focusKey": "9xrre",
"hasFocus": true,
"anchorKey": "9xrre",
"isBackward": false,
"focusOffset": 138,
"anchorOffset": 79
}
},
"type": "TAG",
"mutability": "IMMUTABLE"
}
}
}
I wrote the following SQLAlchemy code to reproduce the query:
first_alias = aliased(func.jsonb_each(Tagging.sa.json_content["entityMap"]))
print(first_alias)
second_alias = aliased(
    func.jsonb_each(
        first_alias.c.value.op("->")("data")
        .op("-")("selectionState")
        .op("-")("segment")
    )
)
polarity = second_alias.c.value.op("->>")("polarity")
p_tag = second_alias.c.key
_count = (
    Feedback.sa.query()
    .join(
        CampaignQuestion,
        CampaignQuestion.sa.question_id == Feedback.sa.question_id,
        isouter=True,
    )
    .join(Tagging)
    .join(first_alias, true())
    .join(second_alias, true())
    .filter(CampaignQuestion.sa.campaign_id == campaign_id)
    .with_entities(p_tag.label("p_tag"), func.count(polarity), polarity)
    .group_by(polarity, p_tag)
    .all()
)
print(_count)
However, it raises "NotImplementedError: Operator 'getitem' is not supported on this expression" when accessing first_alias.c.
The stack trace:
Traceback (most recent call last):
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/rest_framework/views.py", line 506, in dispatch
response = handler(request, *args, **kwargs)
File "/home/work/api/app/campaign/views.py", line 119, in results_p_tags
d = campaign_service.get_p_tag_count_for_campaign_results(id)
File "/home/work/api/app/campaign/services/campaign.py", line 177, in get_p_tag_count_for_campaign_results
return campaign_selectors.get_p_tag_counts_for_campaign(campaign_id)
File "/home/work/api/app/campaign/selectors.py", line 196, in get_p_tag_counts_for_campaign
polarity = second_alias.c.value.op("->>")("polarity")
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 1093, in __get__
obj.__dict__[self.__name__] = result = self.fget(obj)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/selectable.py", line 746, in columns
self._populate_column_collection()
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/selectable.py", line 1617, in _populate_column_collection
self.element._generate_fromclause_column_proxies(self)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/selectable.py", line 703, in _generate_fromclause_column_proxies
fromclause._columns._populate_separate_keys(
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/base.py", line 1216, in _populate_separate_keys
self._colset.update(c for k, c in self._collection)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/base.py", line 1216, in <genexpr>
self._colset.update(c for k, c in self._collection)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/operators.py", line 434, in __getitem__
return self.operate(getitem, index)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 831, in operate
return op(self.comparator, *other, **kwargs)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/operators.py", line 434, in __getitem__
return self.operate(getitem, index)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/type_api.py", line 75, in operate
return o[0](self.expr, op, *(other + o[1:]), **kwargs)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/default_comparator.py", line 173, in _getitem_impl
_unsupported_impl(expr, op, other, **kw)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/default_comparator.py", line 177, in _unsupported_impl
raise NotImplementedError(
NotImplementedError: Operator 'getitem' is not supported on this expression
Any help would be greatly appreciated.
PS: The SQLAlchemy version I'm using here is 1.4.6.
I used the same SQLAlchemy query expression before in a Flask project with SQLAlchemy 1.3.22, and it worked correctly.
I fixed the issue by using table-valued functions, as mentioned in the docs, and by accessing the ColumnCollection of the function by index instead of by key. The code is as follows:
first_alias = func.jsonb_each(Tagging.sa.json_content["entityMap"]).table_valued(
    "key", "value"
)
second_alias = func.jsonb_each(
    first_alias.c[1].op("->")("data").op("-")("selectionState").op("-")("segment")
).table_valued("key", "value")
polarity = second_alias.c[1].op("->>")("polarity")
p_tag = second_alias.c[0]

KeyError when unpacking nested record

I have this code below which aims to unpack a nested record when found. Sometimes it works and sometimes it throws an error.
Would anyone have an idea how to resolve this?
Data (Works):
d = {
"_id" : 245,
"connId" : "3r34b32",
"roomList" : [
{
"reportId" : 29,
"siteId" : 1
}]
}
Data (Doesn't work):
d = {
"_id" : 2,
"connId" : 128,
"Config" : {
"Id" : 5203,
"TemplateId" : "587",
"alertRules" : [
{
"id" : 6,
"daysOfTheWeek" : [
"mon",
"tue",
"wed",
"thu",
"fri",
"sat",
"sun"
],
}
]
}}
Code (Dynamic):
root = pd.json_normalize(d)
nested_cols = [i for i in root.columns if isinstance(root[i][0], list)]
l = [root.drop(columns=nested_cols)]
for i in nested_cols:
    l.append(pd.json_normalize(d, record_path=i))
output = pd.concat(l, axis=1)
print(output)
Traceback Error:
Traceback (most recent call last):
File "c:/Users/Max/Desktop/Azure/TestTimerTrigger/testing.py", line 30, in <module>
l.append(pd.json_normalize(d, record_path=i))
File "c:\Users\Max\Desktop\Azure\.venv\lib\site-packages\pandas\io\json\_normalize.py", line 336, in _json_normalize
_recursive_extract(data, record_path, {}, level=0)
File "c:\Users\Max\Desktop\Azure\.venv\lib\site-packages\pandas\io\json\_normalize.py", line 309, in _recursive_extract
recs = _pull_records(obj, path[0])
File "c:\Users\Max\Desktop\Azure\.venv\lib\site-packages\pandas\io\json\_normalize.py", line 248, in _pull_records
result = _pull_field(js, spec)
File "c:\Users\Max\Desktop\Azure\.venv\lib\site-packages\pandas\io\json\_normalize.py", line 239, in _pull_field
result = result[spec]
KeyError: 'Config.alertRules'
Expected Output:
_id,connid,config.id,config.templateid,id,daysoftheweek
2,128,5203,587,6,[mon,tue,wed,thu,fri,sat,sun]
Note:
I know a keyerror is when the key in a dictionary cannot be located, however, I'm unsure how to go about resolving this.
Any help or guidance will be greatly appreciated.
json_normalize is looking up the key Config.alertRules in your dict as if it were d["Config.alertRules"]. The column name is flattened, but the dict itself is nested, so the value has to be reached as d["Config"]["alertRules"]. The loop passes the flattened column name straight through as record_path, which is where the lookup goes wrong.
The error does not occur for your first dictionary because there are no nested dicts there (d["roomList"] is a top-level list).
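To make the difference concrete, here is a minimal sketch that resolves a flattened column name against the nested dict one key at a time. `pull_path` is a hypothetical helper introduced for illustration, the sample data is a shortened copy of the failing dict, and the sketch assumes that none of the real keys contain a dot.

```python
def pull_path(record, dotted_name, sep="."):
    """Follow a flattened name such as 'Config.alertRules' through nested dicts."""
    value = record
    for key in dotted_name.split(sep):
        value = value[key]  # raises KeyError if any level is missing
    return value


d = {
    "_id": 2,
    "connId": 128,
    "Config": {
        "Id": 5203,
        "TemplateId": "587",
        "alertRules": [{"id": 6, "daysOfTheWeek": ["mon", "tue"]}],
    },
}

# The flattened name is not a top-level key, which is what triggers the KeyError.
print("Config.alertRules" in d)

# Splitting the name and walking the levels reaches the nested list.
print(pull_path(d, "Config.alertRules"))
```

In the loop from the question, the same split could be handed to pandas as record_path=i.split("."), since record_path also accepts a list of keys. This has not been run against the asker's full data.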

Why MongoEngine/pymongo giving error when trying to access object first time only

I have defined MongoEngine classes that are mapped to MongoDB. When I try to access the data through MongoEngine, one specific statement fails on the first attempt but returns data successfully on the second attempt with the same code. I am executing the code in a Python terminal:
from Project.Mongo import User
user = User.objects(username = 'xyz#xyz.com').first()
from Project.Mongo import Asset
Asset.objects(org = user.org)
The last line of the code generates the following error on the first attempt:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/mongoengine/queryset/manager.py", line 37, in get
queryset = queryset_class(owner, owner._get_collection())
File "/usr/local/lib/python3.5/dist-packages/mongoengine/document.py", line 209, in _get_collection
cls.ensure_indexes()
File "/usr/local/lib/python3.5/dist-packages/mongoengine/document.py", line 765, in ensure_indexes
collection.create_index(fields, background=background, **opts)
File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 1754, in create_index
self.__create_index(keys, kwargs, session, **cmd_options)
File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 1656, in __create_index
session=session)
File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 245, in _command
retryable_write=retryable_write)
File "/usr/local/lib/python3.5/dist-packages/pymongo/pool.py", line 517, in command
collation=collation)
File "/usr/local/lib/python3.5/dist-packages/pymongo/network.py", line 125, in command
parse_write_concern_error=parse_write_concern_error)
File "/usr/local/lib/python3.5/dist-packages/pymongo/helpers.py", line 145, in _check_command_response
raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Index: { v: 2, key: { org: 1, _fts: "text", _ftsx: 1 }, name: "org_1_name_content_text_description_text_content_text_tag_content_text_remote.source_text", ns: "digitile.asset", weights: { content: 3, description: 1, name_content: 10, remote.owner__name: 20, remote.source: 2, tag_content: 2 }, default_language: "english", background: false, language_override: "language", textIndexVersion: 3 } already exists with different options: { v: 2, key: { org: 1, _fts: "text", _ftsx: 1 }, name: "org_1_name_text_description_text_content_text_tag_content_text_remote.source_text", ns: "digitile.asset", default_language: "english", background: false, weights: { content: 3, description: 1, name: 10, remote.owner__name: 20, remote.source: 2, tag_content: 2 }, language_override: "language", textIndexVersion: 3 }
When I run the same last line a second time, it produces the correct result.
I am using:
python 3.5.2
pymongo 3.7.2
mongoengine 0.10.6
The first time you call .objects on a document class, mongoengine tries to create any indexes that don't exist yet.
In this case it fails while creating an index on the asset collection (the index details are taken from your Asset/User Document classes), as you can see in the error message:
pymongo.errors.OperationFailure: Index: {...new index details...} already exists with different options {...existing index details...}.
The second time you make that call, mongoengine assumes the indexes were already created and does not attempt to create them again, which explains why the second call passes.
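The error message in the question contains both index definitions, so the mismatch can be found by comparing them. Below is a minimal sketch that diffs the two weights mappings copied from that message. `diff_options` is a hypothetical helper, not part of pymongo or mongoengine.

```python
def diff_options(new, existing):
    """Return {key: (new_value, existing_value)} for every key whose value differs."""
    changed = {}
    for key in sorted(set(new) | set(existing)):
        if new.get(key) != existing.get(key):
            changed[key] = (new.get(key), existing.get(key))
    return changed


# Weights of the index mongoengine tries to create (from the error message).
new_weights = {
    "content": 3, "description": 1, "name_content": 10,
    "remote.owner__name": 20, "remote.source": 2, "tag_content": 2,
}
# Weights of the index that already exists in the collection.
existing_weights = {
    "content": 3, "description": 1, "name": 10,
    "remote.owner__name": 20, "remote.source": 2, "tag_content": 2,
}

print(diff_options(new_weights, existing_weights))
```

The diff shows that the document class now weights name_content while the stored index weights name. If the new definition is the intended one, dropping the stale index (for example with pymongo's Collection.drop_index) so that it can be recreated is one common remedy; whether that is safe depends on the deployment.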

MongoDB doesn't handle aggregation with allowDiskUsage:True

The data structure looks like this:
way: {
_id:'9762264'
node: ['253333910', '3304026514']
}
I'm trying to count how often each node appears across ways. Here is my code using pymongo:
node = db.way.aggregate([
    {'$unwind': '$node'},
    {
        '$group': {
            '_id': '$node',
            'appear_count': {'$sum': 1}
        }
    },
    {'$sort': {'appear_count': -1}},
    {'$limit': 10}
    ],
    {'allowDiskUse': True}
)
It reports an error:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File ".../OSM Wrangling/explore.py", line 78, in most_passed_node
{'allowDiskUse': True}
File ".../pymongo/collection.py", line 2181, in aggregate
**kwargs)
File ".../pymongo/collection.py", line 2088, in _aggregate
client=self.__database.client)
File ".../pymongo/pool.py", line 464, in command
self.validate_session(client, session)
File ".../pymongo/pool.py", line 609, in validate_session
if session._client is not client:
AttributeError: 'dict' object has no attribute '_client'
However, if I remove the {'allowDiskUse': True} argument and test on a smaller set of data, it works well. It seems that the allowDiskUse argument is what causes the failure, and I could not find anything about this error in the MongoDB docs.
How should I solve this problem and get the answer I want?
This is because the method signature of collection.aggregate() changed in PyMongo v3.6: an optional session parameter was added.
The method signature is now:
aggregate(pipeline, session=None, **kwargs)
Applying this to your code example, you can specify allowDiskUse as a keyword argument:
node = db.way.aggregate(pipeline=[
    {'$unwind': '$node'},
    {'$group': {
        '_id': '$node',
        'appear_count': {'$sum': 1}
        }
    },
    {'$sort': {'appear_count': -1}},
    {'$limit': 10}
    ],
    allowDiskUse=True
)
See also pymongo.client_session if you would like to know more about session.
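The effect of that signature can be reproduced without a database. The sketch below uses a stand-in function with the same parameter layout as the quoted signature. It is not PyMongo's implementation, only an illustration of how Python binds the arguments.

```python
def aggregate(pipeline, session=None, **kwargs):
    """Stand-in with the same parameter layout as the signature quoted above."""
    return {"pipeline": pipeline, "session": session, "options": kwargs}


pipeline = [{"$unwind": "$node"}, {"$limit": 10}]

# A second positional argument is bound to `session`, not treated as options.
positional = aggregate(pipeline, {"allowDiskUse": True})

# A keyword argument lands in **kwargs, where the options are expected.
keyword = aggregate(pipeline, allowDiskUse=True)

print(positional["session"])  # {'allowDiskUse': True}
print(keyword["options"])     # {'allowDiskUse': True}
```

This matches the traceback in the question: the dict was received as the session, and the later check on session._client failed because a dict has no such attribute.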
JavaScript is case sensitive, so in the mongo shell use the lowercase boolean true (in Python code such as the above, the boolean remains True):
{'allowDiskUse': true}
