Using $split twice in MongoDB using Python

Say, for simplicity, that my document in MongoDB looks like this:
{'status': {'tat': 'a, b <b>, c, d <d>'}}
I want to separate the values and print them like this:
{bcced_name : 'a'},
{bcced_name : 'b'},
{bcced_name : 'c'},
{bcced_name : 'd'},
Therefore I try to split the data twice. First I split the text on the comma separator, then I split again on the separator <:
# the first split
project = {"$project": {"bcced_name": {"$split": ["$status.tat", ", "]}}}
unwind = {"$unwind": "$bcced_name"}
# the second split
project2 = {"$project": {"bbced_name2": {"$split": ["$cced_name", "<"]}}}
unwind2 = {"unwind": "$bbced2"}
cur = collection.aggregate([project, unwind, project2, unwind2])
Can I use $split twice in one pipeline? The first split works well, but the second doesn't.

You can use the aggregation below in MongoDB 3.4.
Use $split to create an array of string values, followed by $map to output a $substrCP value running from the start of each string to the delimiter <.
The end of each substring is calculated by iterating over the string with $range and $filter to find the position of the " <" sequence.
db.collection_name.aggregate([
  {"$project": {
    "bcced_name": {
      "$map": {
        "input": {"$split": ["$status.tat", ", "]},
        "as": "tat",
        "in": {
          "$cond": [
            {"$eq": [{"$strLenCP": "$$tat"}, 1]},
            "$$tat",
            {"$substrCP": [
              "$$tat",
              0,
              {"$arrayElemAt": [
                {"$filter": {
                  "input": {"$range": [0, {"$strLenCP": "$$tat"}, 1]},
                  "as": "r",
                  "cond": {"$eq": [{"$substrCP": ["$$tat", "$$r", 2]}, " <"]}
                }},
                0
              ]}
            ]}
          ]
        }
      }
    }
  }},
  {"$unwind": "$bcced_name"}
])
Update: (Use $indexOfCP)
db.collection_name.aggregate([
  {"$project": {
    "bcced_name": {
      "$map": {
        "input": {"$split": ["$status.tat", ", "]},
        "as": "tat",
        "in": {
          "$cond": [
            {"$eq": [{"$strLenCP": "$$tat"}, 1]},
            "$$tat",
            {"$substrCP": [
              "$$tat",
              0,
              {"$indexOfCP": ["$$tat", " <"]}
            ]}
          ]
        }
      }
    }
  }},
  {"$unwind": "$bcced_name"}
])

{"$project": {
  "bcced_name": {
    "$map": {
      "input": {"$split": ["$status.tat", ", "]},
      "as": "tat",
      "in": {
        "$cond": [
          {"$gt": [{"$indexOfCP": ["$$tat", " <"]}, 0]},
          {"$arrayElemAt": [{"$split": ["$$tat", " <"]}, 0]},
          "$$tat"
        ]
      }
    }
  }
}}
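For comparison, here is a minimal Python sketch of the two-split approach from the question with the field references made consistent. In the question, the second $project reads "$cced_name" although the first stage writes "bcced_name", and the last stage is spelled "unwind" without the leading $. The function names build_two_split_pipeline and expected_names are hypothetical helpers introduced only for illustration, and the second split keeps the text before " <" via $arrayElemAt, which is one possible choice rather than the only one.

```python
def build_two_split_pipeline():
    """Return the aggregation stages as plain dicts (no server is needed to build them)."""
    return [
        # First split: "a, b <b>, c, d <d>" -> ["a", "b <b>", "c", "d <d>"]
        {"$project": {"bcced_name": {"$split": ["$status.tat", ", "]}}},
        {"$unwind": "$bcced_name"},
        # Second split on " <", keeping only the part before it.
        # The field reference must match the name produced by the first stage.
        {"$project": {
            "bcced_name": {
                "$arrayElemAt": [{"$split": ["$bcced_name", " <"]}, 0]
            }
        }},
    ]


def expected_names(tat):
    """Pure-Python mirror of the pipeline, useful for checking the intended result."""
    return [part.split(" <")[0] for part in tat.split(", ")]


pipeline = build_two_split_pipeline()
# With pymongo this would be run as: collection.aggregate(pipeline)
print(expected_names("a, b <b>, c, d <d>"))
```

Because the second stage no longer produces an array, a second $unwind is not needed in this variant.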

Related

Unable to replicate post_filter query in elasticsearch-dsl

The query I would like to replicate in DSL is as below:
GET /_search
{
"query":{
"bool":{
"must":[
{
"term":{
"destination":"singapore"
}
},
{
"terms":{
"tag_ids":[
"tag_luxury"
]
}
}
]
}
},
"aggs":{
"max_price":{
"max":{
"field":"price_range_from.SGD"
}
},
"min_price":{
"min":{
"field":"price_range_from.SGD"
}
}
},
"post_filter":{
"range":{
"price_range_from.SGD":{
"gte":0.0,
"lte":100.0
}
}
}
}
The above query:
Matches the terms destination and tag_ids
Aggregates the results to find the max and min price from the field price_range_from.SGD
Applies a post_filter to restrict the result set to the price limits
It works perfectly well in the Elastic/Kibana console.
I replicated the above query in elasticsearch-dsl as below:
es_query = []
es_query.append(Q("term", destination="singapore"))
es_query.append(Q("terms", tag_ids=["tag_luxury"]))
final_query = Q("bool", must=es_query)
es_conn = ElasticSearch.instance().get_client()
dsl_client = DSLSearch(using=es_conn, index=index).get_dsl_client()
dsl_client.query = final_query
dsl_client.aggs.metric("min_price", "min", field="price_range_from.SGD")
dsl_client.aggs.metric("max_price", "max", field="price_range_from.SGD")
q = Q("range", **{"price_range_from.SGD":{"gte": 0.0, "lte": 100.0}})
dsl_client.post_filter(q)
print(dsl_client.to_dict())
response = dsl_client.execute()
print(response.to_dict().get("hits", {}))
Although the aggregations are correct, products beyond the price range are also being returned. There is no error returned but it seems like the post_filter query is not applied.
I dived in the dsl_client object to see whether my query is being captured correctly. I see only the query and aggs but don't see the post_filter part in the object. The query when converted to a dictionary using dsl_client.to_dict() is as below -
{
"query":{
"bool":{
"must":[
{
"term":{
"destination":"singapore"
}
},
{
"terms":{
"tag_ids":[
"tag_luxury"
]
}
}
]
}
},
"aggs":{
"min_price":{
"min":{
"field":"price_range_from.SGD"
}
},
"max_price":{
"max":{
"field":"price_range_from.SGD"
}
}
}
}
Please help. Thanks!
You have to re-assign the dsl_client like:
dsl_client = dsl_client.post_filter(q)
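The behaviour in the question is consistent with post_filter returning a new search object instead of modifying the one it is called on. The sketch below illustrates that pattern with a hypothetical MiniSearch class. It is a stand-in written for this example, not elasticsearch-dsl itself, and it only assumes copy-on-call semantics.

```python
import copy


class MiniSearch:
    """Hypothetical stand-in for a search builder whose methods return modified copies."""

    def __init__(self):
        self._body = {}

    def post_filter(self, clause):
        clone = copy.deepcopy(self)  # the original object is left untouched
        clone._body["post_filter"] = clause
        return clone

    def to_dict(self):
        return copy.deepcopy(self._body)


price_filter = {"range": {"price_range_from.SGD": {"gte": 0.0, "lte": 100.0}}}

s = MiniSearch()
s.post_filter(price_filter)       # return value discarded: s is unchanged
discarded = s.to_dict()

s = s.post_filter(price_filter)   # reassigned: the filter is kept
kept = s.to_dict()

print("post_filter" in discarded, "post_filter" in kept)
```

Checking to_dict() after each call, as the question already does, is a quick way to confirm whether a clause actually ended up in the request body.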

How to query first and last objects from an array in MongoDB

So, I have an array of objects like this one:
"coordinates":[
{
"action":"charge",
"position":{
"city":"City A"
}
},
{
"action":"charge",
"position":{
"city":"City B"
}
},
{
"action":"discharge",
"position":{
"city":"City C"
}
},
{
"action":"discharge",
"position":{
"city":"City D"
}
},
(...)
]
This array has a number N of objects, so I don't know the total of objects inside the array.
My question is: How do I query the first and last city from the object position of the coordinates array? I was doing something like this:
db.find({
'coordinates.0.position.city': city_first_name,
'coordinates.position': {'$elemMatch': {'city': city_last_name}
},
{
'coordinates.$.position': {'$slice': -1}})
})
But that didn't work well. It matches the first position, but for the last city it matches any element anywhere in the array. Should I use aggregation, or is there another way using find?
Thanks for any help.
db.collection.aggregate([
  {
    "$project": {
      "f": {
        "$slice": [ // first element
          "$coordinates",
          1
        ]
      },
      "l": {
        "$slice": [ // last element
          "$coordinates",
          -1
        ]
      }
    }
  }
])
You can achieve this using $slice within $project.
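To also filter on the first and last city, as the question asks, one option is to extract both values and then match on them. The following is a minimal Python sketch under the assumption that $addFields is available (MongoDB 3.4 or later). The names build_first_last_pipeline, first_city, last_city and matches are hypothetical and not part of the original answer.

```python
def build_first_last_pipeline(city_first_name, city_last_name):
    """Build stages that keep documents whose first and last city match."""
    return [
        {"$addFields": {
            # "$coordinates.position.city" resolves to the array of all cities;
            # index 0 is the first element and -1 is the last.
            "first_city": {"$arrayElemAt": ["$coordinates.position.city", 0]},
            "last_city": {"$arrayElemAt": ["$coordinates.position.city", -1]},
        }},
        {"$match": {"first_city": city_first_name, "last_city": city_last_name}},
    ]


def matches(doc, city_first_name, city_last_name):
    """Pure-Python mirror of the pipeline for a single document."""
    cities = [c["position"]["city"] for c in doc["coordinates"]]
    return bool(cities) and cities[0] == city_first_name and cities[-1] == city_last_name


sample = {"coordinates": [
    {"action": "charge", "position": {"city": "City A"}},
    {"action": "charge", "position": {"city": "City B"}},
    {"action": "discharge", "position": {"city": "City C"}},
    {"action": "discharge", "position": {"city": "City D"}},
]}

pipeline = build_first_last_pipeline("City A", "City D")
# With pymongo this would be run as: collection.aggregate(pipeline)
print(matches(sample, "City A", "City D"), matches(sample, "City A", "City C"))
```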

MongoDB Python MongoEngine - Returning Document by filter of Embedded Documents Sum of Filtered property

I am using Python and MongoEngine to try and query the below Document in MongoDB.
I need a query to efficiently get the Documents only when they contain Embedded Documents 'Keywords' that match the following criteria:
Keywords Filtered where the Property 'SFR' is LTE '100000'
SUM the filtered keywords
Return the parent documents where SUM of the keywords matching the criteria is Greater than '9'
Example structure:
{
"_id" : ObjectId("5eae60e4055ef0e717f06a50"),
"registered_data" : ISODate("2020-05-03T16:12:51.999+0000"),
"UniqueName" : "SomeUniqueNameHere",
"keywords" : [
{
"keyword" : "carport",
"search_volume" : NumberInt(10532),
"sfr" : NumberInt(20127),
"percent_contribution" : 6.47,
"competing_product_count" : NumberInt(997),
"avg_review_count" : NumberInt(143),
"avg_review_score" : 4.05,
"avg_price" : 331.77,
"exact_ppc_bid" : 3.44,
"broad_ppc_bid" : 2.98,
"exact_hsa_bid" : 8.33,
"broad_hsa_bid" : 9.29
},
{
"keyword" : "party tent",
"search_volume" : NumberInt(6944),
"sfr" : NumberInt(35970),
"percent_contribution" : 4.27,
"competing_product_count" : NumberInt(2000),
"avg_review_count" : NumberInt(216),
"avg_review_score" : 3.72,
"avg_price" : 210.16,
"exact_ppc_bid" : 1.13,
"broad_ppc_bid" : 0.55,
"exact_hsa_bid" : 9.66,
"broad_hsa_bid" : 8.29
}
]
}
From the research I have been doing, I believe an Aggregate type query might do what I am attempting.
Unfortunately, being new to MongoDB / MongoEngine I am struggling to figure out how to structure the query and have failed in finding an example similar to what I am attempting to do (RED FLAG RIGHT????).
I did find an example of an aggregate query but am unsure how to structure my criteria in it. Maybe something like this is getting close, but it does not work.
pipeline = [
{
"$lte": {
"$sum" : {
"keywords" : {
"$lte": {
"keyword": 100000
}
}
}: 9
}
}
]
data = product.objects().aggregate(pipeline)
Any guidance would be greatly appreciated.
Thanks,
Ben
you can try something like this
db.collection.aggregate([
  {
    $project: { // first $project: filter the keywords array
      registered_data: 1,
      UniqueName: 1,
      keywords: {
        $filter: {
          input: "$keywords",
          as: "item",
          cond: { $lte: ["$$item.sfr", 100000] }
        }
      }
    }
  },
  {
    $project: { // second $project: get the length of the filtered keywords array
      registered_data: 1,
      UniqueName: 1,
      keywords: 1,
      keywordsLength: { $size: "$keywords" }
    }
  },
  {
    $match: { // keep documents with more than 9 matching keywords
      keywordsLength: { $gt: 9 }
    }
  }
])
Hope it helps.
Note: for simplicity, I used only the sfr property from the keywords array.
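Since the question uses Python and MongoEngine, the same idea can be written as a list of dicts. This is a minimal sketch, assuming that "SUM" in the question means the number of matching keywords and that $addFields is available (MongoDB 3.4 or later). The names build_keyword_pipeline, matching_count and qualifies are hypothetical helpers introduced here for illustration.

```python
def build_keyword_pipeline(max_sfr=100000, min_count=9):
    """Keep documents with more than min_count keywords whose sfr is <= max_sfr."""
    return [
        {"$addFields": {
            "keywords": {
                "$filter": {
                    "input": "$keywords",
                    "as": "item",
                    "cond": {"$lte": ["$$item.sfr", max_sfr]},
                }
            }
        }},
        {"$addFields": {"matching_count": {"$size": "$keywords"}}},
        {"$match": {"matching_count": {"$gt": min_count}}},
    ]


def qualifies(doc, max_sfr=100000, min_count=9):
    """Pure-Python mirror of the pipeline for a single document."""
    matching = [k for k in doc.get("keywords", []) if k["sfr"] <= max_sfr]
    return len(matching) > min_count


pipeline = build_keyword_pipeline()
# With the question's model this would be passed to product.objects().aggregate(...);
# whether the stages are passed as one list or unpacked depends on the MongoEngine version.

two_keywords = {"keywords": [{"sfr": 20127}, {"sfr": 35970}]}
ten_keywords = {"keywords": [{"sfr": 1000 * i} for i in range(1, 11)]}
print(qualifies(two_keywords), qualifies(ten_keywords))
```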

How to query a string field contains one of array items?

I have documents like this:
{
name: '...'
}
I want to query for documents whose names contain one of:
cities = ['a', 'b', 'c']
Of course it's easy to check for exact match like this:
col_areas = db['areas']
col_areas.find({'name': {'$in': cities}})
I want to use $regex with each item of cities. How can I do that?
I also have tried:
for c in cities:
    cities_query.append('/^%s/' % c)
results = col_areas.find({'name': {'$in': cities_query}})
Maybe there is a better way.
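Sample:
db.col_areas.aggregate([
  { $project: { name: 1 } },
  {
    $match: {
      // one $regex clause per city; 'g' is not a valid $regex option, so it is omitted
      $or: [
        { name: { $regex: 'a' } },
        { name: { $regex: 'b' } },
        { name: { $regex: 'c' } }
      ]
    }
  }
])
Adapt it to your needs.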
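From Python, the per-city clauses can be collapsed into a single alternation pattern built from the list. The sketch below is one way to do this under stated assumptions: build_contains_any_query is a hypothetical helper, and re.escape is used so that city names containing regex metacharacters are matched literally.

```python
import re


def build_contains_any_query(field, terms):
    """Build a filter matching documents whose field contains any of the terms."""
    pattern = "|".join(re.escape(term) for term in terms)
    return {field: {"$regex": pattern}}


cities = ["a", "b", "c"]
query = build_contains_any_query("name", cities)
# With pymongo this would be run as: col_areas.find(query)
print(query)
```

Strings such as '/^a/' placed in an $in list are compared as plain strings, which is why the attempt in the question finds nothing. pymongo also accepts compiled patterns (re.compile(...)) as $in elements if separate patterns are preferred.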

Various queries - MongoDB

This is my table:
unicorns = {'name':'George',
'actions':[{'action':'jump', 'time':123123},
{'action':'run', 'time':345345},
...]}
How can I perform the following queries?
Grab the time of all actions of all unicorns where action=='jump' ?
Grab all actions of all unicorns where time is equal?
e.g. {'action':'jump', 'time':123} and {'action':'stomp', 'time':123}
Help would be amazing =)
For the second query, you need to use MapReduce, which can get a bit hairy. This will work:
map = function() {
    for (var i = 0, j = this.actions.length; i < j; i++) {
        emit(this.actions[i].time, this.actions[i].action);
    }
}
reduce = function(key, value_array) {
    var array = [];
    for (var i = 0, j = value_array.length; i < j; i++) {
        if (value_array[i]['actions']) {
            array = array.concat(value_array[i]['actions']);
        } else {
            array.push(value_array[i]);
        }
    }
    return ({ actions: array });
}
res = db.test.mapReduce(map, reduce)
db[res.result].find()
This would return something like this, where the _id keys are your timestamps:
{ "_id" : 123, "value" : { "actions" : [ "jump" ] } }
{ "_id" : 125, "value" : { "actions" : [ "neigh", "canter" ] } }
{ "_id" : 127, "value" : { "actions" : [ "whinny" ] } }
Unfortunately, mongo doesn't currently support returning an array from a reduce function, thus necessitating the silly {actions: [...]} syntax.
Use dot-separated notation:
db.unicorns.find({'actions.action' : 'jump'})
Similarly for times:
db.unicorns.find({'actions.time' : 123})
Edit: if you want to group the results by time, you'll have to use MapReduce.
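As an alternative to the map-reduce approach above, both queries can also be sketched with the aggregation framework from Python. This is a swapped-in technique, not what the answers above use, and the function names below are hypothetical helpers introduced for illustration.

```python
from collections import defaultdict


def build_jump_times_pipeline():
    """Times of all 'jump' actions, one result per matching array element."""
    return [
        {"$unwind": "$actions"},
        {"$match": {"actions.action": "jump"}},
        {"$project": {"_id": 0, "name": 1, "time": "$actions.time"}},
    ]


def build_group_by_time_pipeline():
    """All actions across all unicorns, grouped by their time value."""
    return [
        {"$unwind": "$actions"},
        {"$group": {"_id": "$actions.time", "actions": {"$push": "$actions.action"}}},
    ]


def group_actions_by_time(unicorns):
    """Pure-Python mirror of the grouping pipeline."""
    grouped = defaultdict(list)
    for unicorn in unicorns:
        for entry in unicorn["actions"]:
            grouped[entry["time"]].append(entry["action"])
    return dict(grouped)


herd = [
    {"name": "George", "actions": [{"action": "jump", "time": 123},
                                   {"action": "run", "time": 345}]},
    {"name": "Molly", "actions": [{"action": "stomp", "time": 123}]},
]
# With pymongo these would be run as: db.unicorns.aggregate(pipeline)
print(group_actions_by_time(herd))
```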
