I'm trying to return certain elements of an array in the document below.
{
"_id": 2,
"awardAmount": 6000,
"url": "www.url.com",
"numAwards": 3,
"award": "Faculty Seed Research Grant",
"Type": "faculty",
"Applicates": [
{
"School": "psu",
"Name": "tom",
"URL": "www.url.com",
"Time": "",
"Research": "",
"Budge": 7500,
"appId": 100,
"citizenship": "us",
"Major": "mat",
"preAwards": "None",
"Advisor": ""
},
{
"School": "ffff",
"Name": "KEVIN",
"URL": "www.url.com",
"Time": "5/5/5-6/6/6",
"Research": "topology",
"Budge": 9850,
"appId": 101,
"citizenship": "us",
"Major": "gym",
"preAwards": "None",
"Advisor": "Dr. cool",
"Evaluators": [
{
"abstractScore": 3,
"goalsObjectivesScore": 4,
"evalNum": 1
},
{
"abstractScore": 545646,
"goalsObjectivesScore": 46546,
"evalNum": 2
}
]
}
]
}
I want only the "Applicates" entries that have an "Evaluators" field. Here is what I tried:
db.coll.find({'Applicates.Evaluators':{'$exists': True }})
This returns the whole document, but I only want the "Applicates" entries that contain the "Evaluators" field, like this:
{
"_id": 2,
"awardAmount": 6000,
"url": "www.url.com",
"numAwards": 3,
"award": "Faculty Seed Research Grant",
"Type": "faculty",
"Applicates": [
{
"School": "ffff",
"Name": "KEVIN",
"URL": "www.url.com",
"Time": "5/5/5-6/6/6",
"Research": "topology",
"Budge": 9850,
"appId": 101,
"citizenship": "us",
"Major": "gym",
"preAwards": "None",
"Advisor": "Dr. cool",
"Evaluators": [
{
"abstractScore": 3,
"goalsObjectivesScore": 4,
"evalNum": 1
},
{
"abstractScore": 545646,
"goalsObjectivesScore": 46546,
"evalNum": 2
}
]
}
]
}
Try this (the key is using the $unwind operator, then filtering the unwound entries and regrouping them):
db.coll.aggregate(
[
{ $match : {'Applicates.Evaluators':{'$exists': true }} },
{ $unwind : "$Applicates" },
{ $match : {'Applicates.Evaluators':{'$exists': true }} },
{ $group : { _id : "$_id",
'Applicates' : {$push : '$Applicates'} ,
awardAmount : {$first : '$awardAmount'},
url : {$first : '$url'},
award : {$first : '$award'},
numAwards : {$first : '$numAwards'},
Type : {$first : '$Type'},
}},
])
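The effect of the pipeline above on a single document can be sketched in plain Python. This is only an illustration of the filtering logic, not MongoDB code; the helper name keep_evaluated_applicants and the trimmed sample document are assumptions made for the example.

```python
def keep_evaluated_applicants(doc):
    """Return a copy of doc whose Applicates list holds only entries with an Evaluators key."""
    kept = [app for app in doc.get("Applicates", []) if "Evaluators" in app]
    # Shallow copy, so the original document is left untouched.
    return {**doc, "Applicates": kept}


# Trimmed-down version of the document from the question (hypothetical subset of fields).
sample = {
    "_id": 2,
    "award": "Faculty Seed Research Grant",
    "Applicates": [
        {"Name": "tom", "appId": 100},
        {"Name": "KEVIN", "appId": 101, "Evaluators": [{"evalNum": 1}, {"evalNum": 2}]},
    ],
}

filtered = keep_evaluated_applicants(sample)
print([app["Name"] for app in filtered["Applicates"]])  # ['KEVIN']
```

The top-level fields are carried over unchanged, which is what the $first accumulators do in the $group stage.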
I have this array of 3 objects.
The field I am interested in is "id", which is nested inside the "categories" attribute.
list = [
{
"title": "\u00c9glise Saint-Julien",
"distance": 1841,
"excursionDistance": 1575,
"categories": [
{
"id": "300-3200-0030",
"name": "\u00c9glise",
"primary": true
},
{
"id": "300-3000-0025",
"name": "Monument historique"
}
]
},
{
"title": "Sevdec",
"distance": 2250,
"excursionDistance": 301,
"categories": [
{
"id": "700-7600-0322",
"name": "Station de recharge",
"primary": true
}
]
},
{
"title": "SIEGE 27",
"distance": 2651,
"excursionDistance": 1095,
"categories": [
{
"id": "700-7600-0322",
"name": "Station de recharge",
"primary": true
}
]
}
]
Then I have these two arrays that contain ids:
mCat1 = ["300-3000-0000","300-3000-0023","300-3000-0030","300-3000-0025","300-3000-0024","300-3100"] # macro cat1 = tourism
mCat2 = ["400-4300","700-7600-0322"]
I need to filter "list" against "mCat1" and store in a new variable the object(s) that have at least one category "id" that appears in "mCat1".
Then I need to do the same with "mCat2".
In this example the expected result would be:
mCat1Result = [{
"title": "\u00c9glise Saint-Julien",
"distance": 1841,
"excursionDistance": 1575,
"categories": [
{
"id": "300-3200-0030",
"name": "\u00c9glise",
"primary": true
},
{
"id": "300-3000-0025",
"name": "Monument historique"
}
]
}]
mCat2Result = [{
"title": "Sevdec",
"distance": 2250,
"excursionDistance": 301,
"categories": [
{
"id": "700-7600-0322",
"name": "Station de recharge",
"primary": true
}
]
},
{
"title": "SIEGE 27",
"distance": 2651,
"excursionDistance": 1095,
"categories": [
{
"id": "700-7600-0322",
"name": "Station de recharge",
"primary": true
}
]
}]
What would be the most efficient way to do this? I am able to do it using nested loops, but that is very resource-intensive on large datasets.
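One possible approach, sketched here under the assumption that the data is already loaded as Python dicts, is to turn each id list into a set so that every membership test is a constant-time lookup on average. The function name filter_by_category and the variable name places (used instead of list, which would shadow the built-in) are made up for this example, and the sample data is trimmed.

```python
def filter_by_category(items, wanted_ids):
    """Return the items that have at least one category id in wanted_ids."""
    wanted = set(wanted_ids)  # set lookup avoids rescanning the id list for every category
    return [
        item for item in items
        if any(cat["id"] in wanted for cat in item.get("categories", []))
    ]


# Trimmed version of the data from the question.
places = [
    {"title": "Église Saint-Julien",
     "categories": [{"id": "300-3200-0030"}, {"id": "300-3000-0025"}]},
    {"title": "Sevdec", "categories": [{"id": "700-7600-0322"}]},
    {"title": "SIEGE 27", "categories": [{"id": "700-7600-0322"}]},
]
mCat1 = ["300-3000-0000", "300-3000-0023", "300-3000-0030",
         "300-3000-0025", "300-3000-0024", "300-3100"]
mCat2 = ["400-4300", "700-7600-0322"]

mCat1Result = filter_by_category(places, mCat1)
mCat2Result = filter_by_category(places, mCat2)
print([p["title"] for p in mCat1Result])  # ['Église Saint-Julien']
print([p["title"] for p in mCat2Result])  # ['Sevdec', 'SIEGE 27']
```

any() stops at the first matching category, so each object is scanned only as far as needed.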
I've never heard of or found an option for what I'm looking for, but maybe someone knows a way.
To collect the data from a JSON response I need to map it manually, like this:
events = response['events']
for event in events:
    tournament_name = event['tournament']['name']
    tournament_slug = event['tournament']['slug']
    tournament_category_name = event['tournament']['category']['name']
    tournament_category_slug = event['tournament']['category']['slug']
    tournament_category_sport_name = event['tournament']['category']['sport']['name']
    tournament_category_sport_slug = event['tournament']['category']['sport']['slug']
    tournament_category_sport_id = event['tournament']['category']['sport']['id']
The complete model is this:
{
"events": [
{
"tournament": {
"name": "Serie A",
"slug": "serie-a",
"category": {
"name": "Italy",
"slug": "italy",
"sport": {
"name": "Football",
"slug": "football",
"id": 1
},
"id": 31,
"flag": "italy",
"alpha2": "IT"
},
"uniqueTournament": {
"name": "Serie A",
"slug": "serie-a",
"category": {
"name": "Italy",
"slug": "italy",
"sport": {
"name": "Football",
"slug": "football",
"id": 1
},
"id": 31,
"flag": "italy",
"alpha2": "IT"
},
"userCount": 586563,
"id": 23,
"hasEventPlayerStatistics": true
},
"priority": 254,
"id": 33
},
"roundInfo": {
"round": 24
},
"customId": "Kdbsfeb",
"status": {
"code": 7,
"description": "2nd half",
"type": "inprogress"
},
"winnerCode": 0,
"homeTeam": {
"name": "Bologna",
"slug": "bologna",
"shortName": "Bologna",
"gender": "M",
"userCount": 39429,
"nameCode": "BOL",
"national": false,
"type": 0,
"id": 2685,
"subTeams": [
],
"teamColors": {
"primary": "#003366",
"secondary": "#cc0000",
"text": "#cc0000"
}
},
"awayTeam": {
"name": "Empoli",
"slug": "empoli",
"shortName": "Empoli",
"gender": "M",
"userCount": 31469,
"nameCode": "EMP",
"national": false,
"type": 0,
"id": 2705,
"subTeams": [
],
"teamColors": {
"primary": "#0d5696",
"secondary": "#ffffff",
"text": "#ffffff"
}
},
"homeScore": {
"current": 0,
"display": 0,
"period1": 0
},
"awayScore": {
"current": 0,
"display": 0,
"period1": 0
},
"coverage": 1,
"time": {
"initial": 2700,
"max": 5400,
"extra": 540,
"currentPeriodStartTimestamp": 1644159735
},
"changes": {
"changes": [
"status.code",
"status.description",
"time.currentPeriodStart"
],
"changeTimestamp": 1644159743
},
"hasGlobalHighlights": false,
"hasEventPlayerStatistics": true,
"hasEventPlayerHeatMap": true,
"id": 9645399,
"statusTime": {
"prefix": "",
"initial": 2700,
"max": 5400,
"timestamp": 1644159735,
"extra": 540
},
"startTimestamp": 1644156000,
"slug": "empoli-bologna",
"lastPeriod": "period2",
"finalResultOnly": false
}
]
}
In my example I am collecting 7 values.
But there are 83 possible values to be collected.
If I want every value that exists in this JSON, is there a way to generate these mapping lines automatically and print them, so I can copy them into my code?
Doing it manually takes too long and is very tiring.
The text printed in the terminal would look something like this:
tournament_name = event['tournament']['name']
tournament_slug = event['tournament']['slug']
...
...
...
And so on, until all 83 object paths with values to collect have been printed.
Then I could copy the printed lines and paste them into my Python file to retrieve the values. Any other way of making the work easier would also be welcome.
If every element of the events array has the same structure, this code works without errors:
def get_prints(recode: dict):
    for key in recode.keys():
        if type(recode[key]) == dict:
            for sub_print in get_prints(recode[key]):
                yield [key] + sub_print
        else:
            yield [key]

class Automater:
    def __init__(self, name: str):
        """
        Params:
            name: name of json
        """
        self.name = name

    def get_print(self, *args):
        """
        Params:
            *args: keys json
        """
        return '_'.join(args) + ' = ' + self.name + ''.join([f"['{arg}']" for arg in args])
For example, this code:
dicts = {
    'tournament': {
        'name': "any name",
        'slug': 'somthing else',
        'sport': {
            'name': 'sport',
            'anotherdict': {
                'yes': True
            }
        }
    }
}
# 'event' is the variable name that will appear in the generated lines
auto = Automater('event')
list_names = get_prints(dicts)
for name in list_names:
    print(auto.get_print(*name))
Gives this output:
tournament_name = event['tournament']['name']
tournament_slug = event['tournament']['slug']
tournament_sport_name = event['tournament']['sport']['name']
tournament_sport_anotherdict_yes = event['tournament']['sport']['anotherdict']['yes']
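As an alternative to generating assignment lines and pasting them back into the source file, the same recursive walk could build a flat dictionary directly. The following is a minimal sketch under the same assumption as above (nested dicts only, with lists treated as leaf values); the function name flatten and the trimmed sample event are made up for illustration.

```python
def flatten(record, prefix=()):
    """Yield (key_path, value) pairs for every leaf in a nested dict."""
    for key, value in record.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Descend into nested objects, extending the key path.
            yield from flatten(value, path)
        else:
            # Anything that is not a dict (including lists) is treated as a leaf.
            yield path, value


# Trimmed version of one element of response['events'].
event = {
    "tournament": {
        "name": "Serie A",
        "slug": "serie-a",
        "category": {"name": "Italy", "sport": {"name": "Football", "id": 1}},
    },
    "winnerCode": 0,
}

flat = {"_".join(path): value for path, value in flatten(event)}
print(flat["tournament_category_sport_name"])  # Football
```

Each key of flat has the same name the generated variable would have had, so values can be looked up by that name without copying any code.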
The following is the Kibana JSON for a single row:
{
"_index": "questionanswers",
"_type": "doc",
"_id": "3",
"_version": 1,
"_score": 0,
"_source": {
"question": {
"id": 3,
"text": "Your first salary",
"answer_type": "FL",
"question_type": "BQ"
},
"candidate": {
"id": 13
},
"job": {
"id": 6
},
"id": 3,
"status": "AN",
"answered_on": "2019-07-12T09:26:01+00:00",
"answer": "12222222"
},
"fields": {
"answered_on": [
"2019-07-12T09:26:01.000Z"
]
}
}
I have an SQL query like this:
Select * from questionanswers where question.id = 3 and answer between 1250 and 1253666
I have converted it to an Elasticsearch query as follows:
{
"size": 1000,
"query": {
"bool": {
"must": [
{
"term": {
"question.id":3
}
},
{
"range": {
"answer": {
"from": 1250,
"to": 1253666999,
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
}
Here answer is declared as a string, but it holds date, float and string values.
"question": {
"id": 3,
"text": "Your first salary",
"answer_type": "FL",
"question_type": "BQ"
},
Here answer_type tells which type of answer is expected.
When I run this query I do not get the desired results; the response contains no hits.
However, there is actually a row that satisfies this query.
What should my Elasticsearch query look like so that I can filter with
question.id = 3, question.answer_type = "FL" and answer between 1250 and 1253666?
Look at your document again. The answer field holds a string value, but your query treats it as a number, so the range query does not match.
Change the mapping for this field to a numeric type.
Here is the document I indexed in a test index; running your query against it works.
Indexing the document (note the answer field):
POST /so-index4/_doc/1
{
"question": {
"id": 3,
"text": "Your first salary",
"answer_type": "FL",
"question_type": "BQ"
},
"candidate": {
"id": 13
},
"job": {
"id": 6
},
"id": 3,
"status": "AN",
"answered_on": "2019-07-12T09:26:01+00:00",
"answer": 12222222,
"fields": {
"answered_on": [
"2019-07-12T09:26:01.000Z"
]
}
}
And the query (the same query that you provided above):
GET /so-index4/_search
{
"size": 1000,
"query": {
"bool": {
"must": [
{
"term": {
"question.id":3
}
},
{
"range": {
"answer": {
"from": 1250,
"to": 1253666999,
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
}
The result:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 2.0,
"hits" : [
{
"_index" : "so-index4",
"_type" : "_doc",
"_id" : "1",
"_score" : 2.0,
"_source" : {
"question" : {
"id" : 3,
"text" : "Your first salary",
"answer_type" : "FL",
"question_type" : "BQ"
},
"candidate" : {
"id" : 13
},
"job" : {
"id" : 6
},
"id" : 3,
"status" : "AN",
"answered_on" : "2019-07-12T09:26:01+00:00",
"answer" : 12222222,
"fields" : {
"answered_on" : [
"2019-07-12T09:26:01.000Z"
]
}
}
}
]
}
}
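Why the string-mapped field misses the range can be illustrated outside Elasticsearch. The sketch below assumes that a range over string terms compares them lexicographically (character by character), which is how Python compares strings; it only illustrates the ordering problem and is not Elasticsearch code.

```python
answer = "12222222"            # the stored value, as a string
low, high = 1250, 1253666999   # the bounds used in the range query

# Lexicographic comparison: "122..." sorts before "125...", so the lower bound fails.
as_strings = str(low) <= answer <= str(high)

# Numeric comparison: 12222222 lies between the two bounds.
as_numbers = low <= int(answer) <= high

print(as_strings, as_numbers)  # False True
```

Once the field is mapped as a number, the comparison is numeric and the document falls inside the range.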
I am relatively new to Python, coming from a Node.js background, and I am having quite a few issues parsing the output I get from get_query_results().
Documentation Link
I have been at this for some hours. I have tried iterating through ['ResultSetMetadata']['ColumnInfo'] to grab the column names, but I don't know how to tie the ['Data'] entries under ['ResultSet']['Rows'] to these names so the code knows which name to apply to each value.
I know I need to select the row headers and then attach the associated values to them, but the logic for doing this in Python escapes me.
I can see that the first column name always lines up with the first ['Data'] entry's ['VarCharValue'], so I can get all the values in order, but if I loop through ['ResultSet']['Rows'], how do I treat the first iteration as the column names and then populate them from each of the other rows?
Or is there a better way to do this?
Here is my json.dumps(ATHENAoutput):
{
"ResultSet": {
"Rows": [{
"Data": [{
"VarCharValue": "postcode"
}, {
"VarCharValue": "CountOf"
}]
}, {
"Data": [{
"VarCharValue": "1231"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "1166"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "3651"
}, {
"VarCharValue": "3"
}]
}, {
"Data": [{
"VarCharValue": "2171"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "4697"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "4450"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "4469"
}, {
"VarCharValue": "1"
}]
}],
"ResultSetMetadata": {
"ColumnInfo": [{
"Scale": 0,
"Name": "postcode",
"Nullable": "UNKNOWN",
"TableName": "",
"Precision": 2147483647,
"Label": "postcode",
"CaseSensitive": true,
"SchemaName": "",
"Type": "varchar",
"CatalogName": "hive"
}, {
"Scale": 0,
"Name": "CountOf",
"Nullable": "UNKNOWN",
"TableName": "",
"Precision": 19,
"Label": "CountOf",
"CaseSensitive": false,
"SchemaName": "",
"Type": "bigint",
"CatalogName": "hive"
}]
}
},
"ResponseMetadata": {
"RetryAttempts": 0,
"HTTPStatusCode": 200,
"RequestId": "18190e7c-901c-40b4-b6ef-10a5013b1a70",
"HTTPHeaders": {
"date": "Mon, 01 Oct 2018 04:51:14 GMT",
"x-amzn-requestid": "18190e7c-901c-40b4-b6ef-10a5013b1a70",
"content-length": "1464",
"content-type": "application/x-amz-json-1.1",
"connection": "keep-alive"
}
}
}
My desired result is a JSON array like the following:
[{
"postcode": "2171",
"CountOf": "2"
}, {
"postcode": "4697",
"CountOf": "2"
}, {
"postcode": "1166",
"CountOf": "2"
},
...
]
>>> def get_var_char_values(d):
...     return [obj['VarCharValue'] for obj in d['Data']]
...
>>> header, *rows = input_data['ResultSet']['Rows']
>>> header = get_var_char_values(header)
>>> result = [dict(zip(header, get_var_char_values(row))) for row in rows]
>>> import json; print(json.dumps(result, indent=2))
[
{
"postcode": "4450",
"CountOf": "2"
},
{
"postcode": "1231",
"CountOf": "2"
},
{
"postcode": "4469",
"CountOf": "1"
},
{
"postcode": "3651",
"CountOf": "3"
},
{
"postcode": "1166",
"CountOf": "2"
},
{
"postcode": "4697",
"CountOf": "2"
},
{
"postcode": "2171",
"CountOf": "2"
}
]
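Since the question mentions ['ResultSetMetadata']['ColumnInfo'], a variant that takes the column names from there rather than from the first row could look like the sketch below. It assumes, as in the output shown in the question, that the first entry of Rows repeats the column names and can be skipped; the function name rows_to_dicts and the shortened sample response are made up for illustration.

```python
def rows_to_dicts(response):
    """Pair each data row with the column names listed in ColumnInfo."""
    result_set = response["ResultSet"]
    names = [col["Name"] for col in result_set["ResultSetMetadata"]["ColumnInfo"]]
    data_rows = result_set["Rows"][1:]  # assumption: the first row is the header row
    return [
        # .get() gives None for a cell that has no VarCharValue key
        {name: cell.get("VarCharValue") for name, cell in zip(names, row["Data"])}
        for row in data_rows
    ]


# Shortened version of the response from the question.
sample = {
    "ResultSet": {
        "Rows": [
            {"Data": [{"VarCharValue": "postcode"}, {"VarCharValue": "CountOf"}]},
            {"Data": [{"VarCharValue": "1231"}, {"VarCharValue": "2"}]},
            {"Data": [{"VarCharValue": "3651"}, {"VarCharValue": "3"}]},
        ],
        "ResultSetMetadata": {
            "ColumnInfo": [{"Name": "postcode"}, {"Name": "CountOf"}]
        },
    }
}

records = rows_to_dicts(sample)
print(records[0])  # {'postcode': '1231', 'CountOf': '2'}
```

If the result is paginated, only the first page would be expected to carry the header row, so the skip would need to be made conditional; that case is not handled here.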
In an Elasticsearch aggregation query, I need to get all the movies watched by users who watched the movie "Frozen". This is what my source document looks like:
{
"_index": "user",
"_type": "user",
"_id": "ovUowmUBREWOv-CU-4RT",
"_version": 4,
"_score": 1,
"_source": {
"movies": [
"Angry birds 1",
"PINNOCCHIO",
"Frozen",
"Hotel Transylvania 3"
],
"user_id": 86
}
}
This is the query I'm using.
{
"query": {
"match": {
"movies": "Frozen"
}
},
"size": 0,
"aggregations": {
"movies_like_Frozen": {
"terms": {
"field": "movies",
"min_doc_count": 1
}
}
}
}
The counts in the buckets are correct, but the movie names are split on whitespace, like this:
"buckets": [
{
"key": "3",
"doc_count": 2
},
{
"key": "hotel",
"doc_count": 2
},
{
"key": "transylvania",
"doc_count": 2
},
{
"key": "1",
"doc_count": 1
},
{
"key": "angry",
"doc_count": 1
},
{
"key": "birds",
"doc_count": 1
}
]
How can I get buckets with "Angry birds 1" and "Hotel Transylvania 3" as keys?
Please help.
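In Elasticsearch 6.x, text fields are analyzed by default, so the terms aggregation counts individual tokens rather than whole titles. To aggregate on whole values, map the field (or a sub-field of it) with the keyword type, which is not analyzed. The "index": "not_analyzed" setting belonged to the old string type and is not accepted for text fields in 6.x. Create the index with a mapping like the one below, index the documents into it, and run the terms aggregation on movies.keyword instead of movies.
In your case,
{
"mappings": {
"user": {
"properties": {
"movies": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"user_id": {
"type": "long"
}
}
}
}
}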
Hope it works.
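The difference between counting analyzed tokens and counting whole values can be sketched in plain Python. This is a simplified simulation, not Elasticsearch code: lowercasing and splitting on whitespace only approximates what an analyzer does, and the function name doc_counts and the second user's movie list are made up for the example.

```python
from collections import Counter


def doc_counts(docs, analyze):
    """Count, for each term, how many documents contain it (like doc_count in a bucket)."""
    counts = Counter()
    for movies in docs:
        terms = set()
        for title in movies:
            terms.update(analyze(title))
        counts.update(terms)  # each term is counted once per document
    return counts


users = [
    ["Angry birds 1", "PINNOCCHIO", "Frozen", "Hotel Transylvania 3"],
    ["Frozen", "Hotel Transylvania 3"],  # hypothetical second user
]

# Analyzed field: every title is broken into lowercase tokens.
tokenized = doc_counts(users, lambda title: title.lower().split())
# Unanalyzed field: every title is kept as a single term.
whole = doc_counts(users, lambda title: [title])

print(tokenized["hotel"], whole["Hotel Transylvania 3"])  # 2 2
```

With the tokenized version the bucket keys are fragments such as "hotel" and "transylvania", while the whole-value version keeps "Hotel Transylvania 3" as a single key, which is the behaviour the question asks for.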