I am trying to write a query that searches Elasticsearch for documents where a particular field is null. The query is executed in Python using the Python Elasticsearch client.
Query:
{
"_source": ["name"],
"query": {
"nested": {
"path": "experience",
"query": {
"match": {
"experience.resignation_date": {
"query": None
}
}
}
}
}
}
Since this is Python, I used None in the query part, but it throws this error:
elasticsearch.exceptions.RequestError: TransportError(400, 'parsing_exception', '[match] unknown token [VALUE_NULL] after [query]')
The missing query is deprecated (it was removed in Elasticsearch 5.0); you're looking for bool/must_not + exists:
{
"_source": [
"name"
],
"query": {
"nested": {
"path": "experience",
"query": {
"bool": {
"must_not": {
"exists": {
"field": "experience.resignation_date"
}
}
}
}
}
}
}
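For the Python client, the same body can be written as a plain dict. The following is only a minimal sketch: the helper name `nested_field_missing_query` is made up for illustration, and the commented-out `es.search` call assumes an existing client instance `es` and a hypothetical index name `my_index`. Note that inside a nested query this matches documents that have at least one nested object lacking the field.

```python
import json


def nested_field_missing_query(path, field, source_fields):
    """Build a search body matching documents that have at least one nested
    object under `path` in which `field` is absent or null.

    Hypothetical helper for illustration; not part of the Elasticsearch client.
    """
    return {
        "_source": list(source_fields),
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        # exists is false for null or missing values, so negate it
                        "must_not": {"exists": {"field": f"{path}.{field}"}}
                    }
                },
            }
        },
    }


body = nested_field_missing_query("experience", "resignation_date", ["name"])
print(json.dumps(body, indent=2))

# Assumed usage with an existing client `es` and a hypothetical index name:
# es.search(index="my_index", body=body)
```

Because the body never contains None, nothing in it serializes to a JSON null, which is what triggered the original parsing_exception.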
With this expression you're not querying for null; you're saying "use null as the query".
The query must always be one of the query types that Elasticsearch defines, such as the "match" query.
The kind of syntax you wanted to write here was
query: {
match: {
"experience.resignation_date": None
}
}
where you are asserting that the value matches None.
However, there is a more specific query type for matching documents with empty field values, called "missing".
It matches null fields as well as documents that lack the property altogether. Whether this is more appropriate for you depends on your needs.
EDIT: As a subsequent answer points out, "missing" is actually deprecated now. The equivalent is to negate the "exists" query instead:
query: {
  bool: {
    must_not: {
      exists: {
        field: "experience.resignation_date"
      }
    }
  }
}
search = {
"from": str(start),
"size": str(size),
"query": {
"bool": {
"must": {
"multi_match": {
"query":query,
"fields":["name","description","tags","comments","created","creator","transaction","wallet"],
"operator":"or"}
},
"filter": { "term": { "channel": channel } } } } }
This is the Python dict object. It produces the following error:
elasticsearch.BadRequestError: BadRequestError(400, 'parsing_exception', '[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]')
I'm not seeing it. Please help. start, size, query, and channel are all variables.
I have looked at a lot of example Elasticsearch queries. Nothing I've tried has gotten past syntax errors. I've also tried simple_query_string and a simple multi_match. I always need start and size, and I always need to filter on channel.
So the issue was that some of those clauses are arrays and need [] around their contents, specifically must and filter. Adding the appropriate brackets solved the issue. Here's the new format:
search = {
"from": start,
"query": {
"bool": {
"must": [
{ "multi_match": {
"query": query,
"fields": ["name","description","tags","comments","created","creator","transaction","wallet"]
} },
{ "match": {
"channel": channel
} }
]
}
}
}
Notice that I've also dropped the filter and added another match clause instead. I pass size to the search call as one of its parameters.
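The format above can be wrapped in a small function so the variables are filled in at one place. This is a sketch under assumptions: `build_channel_search` is a made-up helper name, and the sample values "coffee" and "general" are placeholders, not values from the post.

```python
import json


def build_channel_search(query, channel, start=0):
    """Return a search body with both conditions in a `must` array.

    Hypothetical helper; `query`, `channel` and `start` mirror the
    variables used in the post.
    """
    fields = ["name", "description", "tags", "comments",
              "created", "creator", "transaction", "wallet"]
    return {
        "from": int(start),  # send a number rather than str(start)
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": query, "fields": fields}},
                    {"match": {"channel": channel}},
                ]
            }
        },
    }


search = build_channel_search("coffee", "general", start=20)
# Round-tripping through json confirms the dict serializes to valid JSON.
assert json.loads(json.dumps(search)) == search
```

Building the dict in one function makes it easier to spot a misplaced brace than a hand-formatted literal does.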
I have some data in the following form:
df = pd.DataFrame([['foo','some text',1, 13],['foo','Another text',2, 4],['foo','Third text',3, 10],['bar','Text1',2, 25], ['bar','Long text',1, 17],['num','short text',3, 0],['num','fifth text',3, 8]], index = range(1,8), columns = ['category','text','label', 'count'])
I've put the documents into an ES index and am trying to search for documents whose "count" is greater than 0 and less than 10 and whose "category" is not "foo".
I tried to use a "none" clause inside the "filter" clause of a boolean query, but it gives the error "no query registered for [none]".
text = "text"
data = json.dumps({
"query":{
"bool":{
"should":[
{
"match":{
"text":text
}
}
],
"filter": [
{
"range": {
"count": {
"gt": 0,
"lt": 10
}
}
},
{
"none": {
"term": {
"category.keyword": "foo"
}
}
}
]
}
}
})
So I am now using the "must_not" clause as below:
text = "text"
data = json.dumps({
"query":{
"bool":{
"should":[
{
"match":{
"text":text
}
}
],
"filter": [
{
"range": {
"count": {
"gt": 0,
"lt": 10
}
}
}
]
,
"must_not":[
{
"term": {
"category.keyword": "foo"
}
}
]
}
}
})
Is there a way to use "none" in the "filter" clause and to make the query work more efficiently? Thank you!
As mentioned in the documentation, scoring is ignored for the must_not clause, so keeping must_not outside the filter clause (as in the query above) has no impact on performance. The documentation describes must_not as follows:
The clause (query) must not appear in the matching documents. Clauses
are executed in filter context meaning that scoring is ignored and
clauses are considered for caching. Because scoring is ignored, a
score of 0 for all documents is returned.
Apart from this, there is no "none" query. There is only the Match None query (match_none), which matches no documents.
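As a minimal sketch of the two equivalent layouts, the function below builds the query either with a top-level must_not or with the negation wrapped in a bool inside filter, which is the closest thing to a "none inside filter". The helper name and its `nest_in_filter` flag are made up for illustration; the strict bounds `gt`/`lt` follow the "greater than 0 and less than 10" requirement in the question.

```python
import json


def build_filtered_search(text, low, high, excluded_category, nest_in_filter=False):
    """Hypothetical helper: match `text`, keep low < count < high,
    and exclude documents whose category equals `excluded_category`."""
    range_clause = {"range": {"count": {"gt": low, "lt": high}}}
    exclusion = {"term": {"category.keyword": excluded_category}}
    bool_query = {
        "should": [{"match": {"text": text}}],
        "filter": [range_clause],
    }
    if nest_in_filter:
        # Negation expressed inside filter via a nested bool query
        bool_query["filter"].append({"bool": {"must_not": [exclusion]}})
    else:
        # Negation as a sibling clause; it also runs in filter context
        bool_query["must_not"] = [exclusion]
    return {"query": {"bool": bool_query}}


data = json.dumps(build_filtered_search("text", 0, 10, "foo"))
nested = build_filtered_search("text", 0, 10, "foo", nest_in_filter=True)
print(data)
```

Both variants exclude the same documents; since must_not already runs in filter context, the nested form is mostly a matter of taste.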
I use elasticsearch-dsl to query Elasticsearch in Python.
I want to search documents by their text field and get all documents whose created field is earlier than datetime.now().
I execute the following query, but Elasticsearch raises an error.
q = "some word"
es.search(
index="comment",
body={
"query": {
"query_string": {
"query": f"*{q}*" ,
"default_field": "text"
},
"range": {"created": {"lt": datetime.now()}}
},
"fields": ["id"],
"size": 10
},
)
The following is the Elasticsearch error:
elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', '[query_string] malformed query, expected [END_OBJECT] but found [FIELD_NAME]')
Note: When I comment "range": {"created": {"lt": datetime.now()}}, the query will work without error (but datetime filter is not applied).
You can't place several query types side by side under "query" like this; combine them with a bool query:
{
    "query": {
        "bool": {
            "must": [{"query_string": {"query": f"*{q}*", "default_field": "text"}}],
            "filter": [{"range": {"created": {"lt": datetime.now()}}}]
        }
    }
}
Notice that the full-text query_string clause goes into the must section and gets scored, while the range clause goes into filter, where Elasticsearch doesn't calculate scores.
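A minimal sketch of the full request body built this way follows. The function name `build_text_before_query` is hypothetical, and converting the datetime with `isoformat()` assumes the `created` field uses a date mapping that accepts ISO 8601 strings.

```python
from datetime import datetime


def build_text_before_query(q, before, size=10):
    """Hypothetical helper: wildcard text search on `text`, restricted to
    documents whose `created` value is earlier than `before`."""
    return {
        "query": {
            "bool": {
                # Scored full-text clause
                "must": [
                    {"query_string": {"query": f"*{q}*", "default_field": "text"}}
                ],
                # Unscored date restriction
                "filter": [
                    {"range": {"created": {"lt": before.isoformat()}}}
                ],
            }
        },
        "size": size,
    }


body = build_text_before_query("some word", datetime(2024, 1, 1, 12, 0, 0))
print(body["query"]["bool"]["filter"])

# Assumed usage with an existing client instance `es`:
# es.search(index="comment", body=body)
```

Passing the cutoff in as an argument, instead of calling datetime.now() inside the dict, keeps the query reproducible when testing.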
I have stored some HTML pages in Elasticsearch. Now I want to match an input string against all the strings present in that HTML and get the exact location of the match. So far I have written this query:
"query": {
"query_string": {
"query": queryText,
"default_field": "html"
}
}
This returns the whole document where the match is found. Is there a way to get the exact location of the match?
You can leverage the Highlight feature, like this:
GET myindex/_search
{
"query": {
"query_string": {
"query": queryText,
"default_field": "html"
}
},
"highlight": {
"fields": {
"html": {}
}
}
}
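Highlighting returns text fragments with the matches wrapped in tags rather than numeric positions. If character offsets are needed, one possible approach (a sketch, not an official recipe) is to request the whole field with `number_of_fragments` set to 0, use distinctive `pre_tags`/`post_tags`, and compute the offsets in Python. The marker strings and both function names below are assumptions made for illustration, and the markers are assumed never to occur in the stored HTML.

```python
PRE, POST = "[[HL]]", "[[/HL]]"  # assumed not to occur in the stored HTML


def build_highlight_body(query_text):
    """Search body that returns the whole `html` field with matches marked."""
    return {
        "query": {"query_string": {"query": query_text, "default_field": "html"}},
        "highlight": {
            "pre_tags": [PRE],
            "post_tags": [POST],
            "fields": {"html": {"number_of_fragments": 0}},
        },
    }


def match_offsets(highlighted):
    """Return (start, end) offsets of each marked span, measured in the
    text with the markers removed."""
    offsets = []
    pos = 0       # position in the marker-free text
    i = 0         # position in the highlighted text
    start = None
    while i < len(highlighted):
        if highlighted.startswith(PRE, i):
            start = pos
            i += len(PRE)
        elif highlighted.startswith(POST, i):
            offsets.append((start, pos))
            i += len(POST)
        else:
            pos += 1
            i += 1
    return offsets


# Hand-written sample of what a highlighted field value might look like
sample = "<p>hello [[HL]]world[[/HL]] and [[HL]]world[[/HL]]</p>"
print(match_offsets(sample))
```

In a real response the highlighted string would come from each hit's `highlight` section for the `html` field; the offsets are only meaningful if the highlighted text is otherwise identical to the stored field.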
I'm curious about the best approach to count the instances of a particular field, across all documents, in a given ElasticSearch index.
For example, if I've got the following documents in index goober:
{
'_id':'foo',
'field1':'a value',
'field2':'a value'
},
{
'_id':'bar',
'field1':'a value',
'field2':'a value'
},
{
'_id':'baz',
'field1':'a value',
'field3':'a value'
}
I'd like to know something like the following:
{
    'index':'goober',
    'field_counts': {
        'field1':3,
        'field2':2,
        'field3':1
    }
}
Is this doable with a single query, or does it take multiple? For what it's worth, I'm using the Python elasticsearch and elasticsearch-dsl clients.
I've successfully issued a GET request to /goober and retrieved the mappings, and am learning how to submit requests for aggregations for each field, but I'm interested in learning how many times a particular field appears across all documents.
Coming from using Solr, still getting my bearings with ES. Thanks in advance for any suggestions.
The query below returns the count of documents that have "field2":
POST /INDEX/_search
{
"size": 0,
"query": {
"bool": {
"filter": {
"exists": {
"field": "field2"
}
}
}
}
}
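With "size": 0 the count is read from the hits total in the response. As a small sketch (the helper name `total_hits` and the sample responses are made up), the function below handles both shapes: Elasticsearch 7 and later report the total as an object with a `value` key, while earlier versions return a plain integer.

```python
def total_hits(response):
    """Return the hit count from a search response dict.

    Elasticsearch 7+ reports hits.total as {"value": N, "relation": ...};
    earlier versions return a plain integer.
    """
    total = response["hits"]["total"]
    if isinstance(total, dict):
        return total["value"]
    return total


# Hand-written sample responses, not real server output
new_style = {"hits": {"total": {"value": 2, "relation": "eq"}, "hits": []}}
old_style = {"hits": {"total": 2, "hits": []}}
print(total_hits(new_style), total_hits(old_style))
```

On large indices the reported value may be a lower bound (relation "gte") unless track_total_hits is enabled in the request.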
And here is an example using multiple filter aggregations on field existence; each aggregation comes back as a bucket with its own count:
POST /INDEX/_search
{
"size": 0,
"aggs": {
"field_has1": {
"filter": {
"exists": {
"field": "field1"
}
}
},
"field_has2": {
"filter": {
"exists": {
"field": "field2"
}
}
}
}
}
The behavior within each agg on the second example will mimic the behavior of the first query. In many cases, you can take a regular search query and nest those lookups within aggregate buckets.
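Quick time-saver based on the existing answer:
import requests

interesting_fields = ['field1', 'field2']
body = {
    'size': 0,
    # One filter aggregation per field; each bucket's doc_count is the number of documents that have the field.
    'aggs': {
        f'has_{field_name}': {
            'filter': {
                # The 'export.' prefix presumably reflects the author's own mapping; drop it for top-level fields.
                'exists': {'field': f'export.{field_name}'}
            }
        }
        for field_name in interesting_fields
    },
}
print(requests.post('http://localhost:9200/INDEX/_search', json=body).json())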
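To get from the aggregation response to the shape asked for in the question, the per-field `doc_count` values can be collected into one dict. This is a minimal sketch: `field_counts` is a hypothetical helper, the `has_` prefix matches the aggregation names used above, and the sample response is hand-written rather than real server output.

```python
def field_counts(index, response, prefix="has_"):
    """Collect doc_count from each `has_<field>` filter aggregation."""
    aggs = response.get("aggregations", {})
    return {
        "index": index,
        "field_counts": {
            name[len(prefix):]: agg["doc_count"]
            for name, agg in aggs.items()
            if name.startswith(prefix)
        },
    }


# Hand-written sample of the relevant part of a search response
sample_response = {
    "aggregations": {
        "has_field1": {"doc_count": 3},
        "has_field2": {"doc_count": 2},
        "has_field3": {"doc_count": 1},
    }
}
result = field_counts("goober", sample_response)
print(result)
```

The field names themselves could be taken from the index mapping, so the whole count needs one mapping request and one search request.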