As a newbie in Python/MongoDB/pymongo, I used Studio 3T to generate a working aggregation pipeline for accessing and processing the content of a MongoDB collection. It works perfectly fine there and outputs the expected result. The problem is that Studio 3T generates a .js file, while I need to write the code in Python within a wrapper.
Code executing correctly on Studio 3T:
db.cln_matching_results.aggregate(
// Pipeline
[
// Stage 1
{
$unwind: {
path : '$_availability',
includeArrayIndex : 'arrayIndex', // optional
preserveNullAndEmptyArrays : false // optional
}
},
// Stage 2
{
$unwind: {
path : '$_availability.availability_data',
includeArrayIndex : 'arrayIndex', // optional
preserveNullAndEmptyArrays : false // optional
}
},
// Stage 3
{
$match: {
'_availability.availability_data.start_date': {$gte: "2017-09-14T00:00:00.000Z", $lte: "2017-09-31T00:00:00.000Z"}
}
},
// Stage 4
{
$group: {
_id : '$_id',
MD_offered_max: { $sum: { $divide: [ "$_availability.availability_data.value", 100 ] } }
}
},
]);
My attempt at replicating the working code within my Python wrapper breaks the pipeline and returns an empty array after execution. I stress that the rest of the Python code works perfectly fine when I comment out the piece of code below:
test_pipeline.extend([
{"$unwind": {'path': '$_availability', 'preserveNullAndEmptyArrays': False}},
{"$unwind": {'path': '$_availability.availability_data', 'preserveNullAndEmptyArrays': False}},
{'$match': {'$_availability.availability_data.start_date': {'$gte': "2017-09-14T00:00:00.000Z", '$lte': "2017-09-16T00:00:00.000Z"}}},
{'$group':{'_id' : '$_id', 'MD_offered_max': { '$sum': { '$divide': [ "$_availability.availability_data.value", 100 ] } }}},
{'$addFields': {'availability_scoring': '$MD_offered_max'}}
])
My question is: where did I go wrong in my attempt at translating the code from a .js format to Python? Thanks for your help.
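The line that stands out is the $match stage. The working JavaScript filters on '_availability.availability_data.start_date', but the Python version uses the key '$_availability.availability_data.start_date'. In a query filter the field name is a plain dotted path; MongoDB treats a top-level key starting with $ as a query operator. The $ prefix belongs only on field *values* inside expressions, as in $unwind's path or $group's "$_id". A minimal sketch of the same stages with that key fixed, assuming the dates are stored as ISO strings as the JavaScript version implies (build_availability_pipeline is a hypothetical name):

```python
def build_availability_pipeline(start_date, end_date):
    """Return the aggregation stages; pass the list to collection.aggregate()."""
    return [
        # Stages 1-2: flatten the nested arrays (includeArrayIndex is optional).
        {"$unwind": {"path": "$_availability", "preserveNullAndEmptyArrays": False}},
        {"$unwind": {"path": "$_availability.availability_data",
                     "preserveNullAndEmptyArrays": False}},
        # Stage 3: in a $match filter the field name is a plain dotted path,
        # with no leading "$".
        {"$match": {"_availability.availability_data.start_date": {
            "$gte": start_date, "$lte": end_date}}},
        # Stage 4: here "$..." is correct, because it refers to a field's value.
        {"$group": {"_id": "$_id",
                    "MD_offered_max": {"$sum": {"$divide": [
                        "$_availability.availability_data.value", 100]}}}},
        {"$addFields": {"availability_scoring": "$MD_offered_max"}},
    ]

pipeline = build_availability_pipeline("2017-09-14T00:00:00.000Z",
                                       "2017-09-16T00:00:00.000Z")
# With pymongo (not executed here):
# results = list(db.cln_matching_results.aggregate(pipeline))
```

Also note that the Python attempt uses a narrower upper bound (2017-09-16) than the Studio 3T version (2017-09-31), so the two can legitimately return different documents even once the key is fixed.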
So I keep running into an issue where the JSON response has to be indexed with an int to get at pieces of data. The code below works, but I want to print every 'name' in the JSON, and for that I'd have to change the [0] to 1, then 2, etc. I tried incrementing it, but that ran into issues too. I could just be overlooking something, but let me know, thanks.
import json
import requests
def BaseTesting(TarLink):
    PayloadToSend = {
    }
    HeadersToSend = {  # make sure to change your token at least once every 30 mins
        'authorization': '',
        'user-agent': ''
    }
    ReqForFriends = requests.post(TarLink, headers=HeadersToSend, data=PayloadToSend).text
    LoadedJSONData = json.loads(ReqForFriends)
    print(LoadedJSONData['friends'][0]['name'])
BaseTesting(TarLink="")
JSON
{
"friends":[
{
"name":"test1",
"user_id":"1132",
"type":2,
"display":"true"
},
{
"name":"test2",
"user_id":"2341",
"type":1,
"display":"true"
},
{
"name":"test3",
"user_id":"1234",
"type":2,
"display":"true"
}
]
}
It seems to work properly if you loop over the indices:
LoadedJSONData = {
"friends":[
{
"name":"test1",
"user_id":"1132",
"type":2,
"display":"true",
},
{
"name":"test2",
"user_id":"2341",
"type":1,
"display":"true",
},
{
"name":"test3",
"user_id":"1234",
"type":2,
"display":"true",
},
]
}
for i in range(0,3):
    print(LoadedJSONData['friends'][i]['name'])  # test1 test2 test3
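Hard-coding range(0,3) breaks as soon as the response contains a different number of friends. A more idiomatic sketch iterates over the list itself; the sample body below is a trimmed copy of the question's JSON, and raw_body/names are illustrative names. With requests, response.json() can also replace json.loads(response.text).

```python
import json

# Sample response body, shaped like the "friends" JSON in the question.
raw_body = """
{"friends": [
  {"name": "test1", "user_id": "1132", "type": 2, "display": "true"},
  {"name": "test2", "user_id": "2341", "type": 1, "display": "true"},
  {"name": "test3", "user_id": "1234", "type": 2, "display": "true"}
]}
"""

loaded = json.loads(raw_body)

# Iterate over the list directly instead of indexing with 0, 1, 2, ...
names = []
for friend in loaded["friends"]:
    print(friend["name"])
    names.append(friend["name"])
```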
I have a collection of accounts, and I am trying to find an account where targetAmount >= totalAmount + N:
{
"_id": {
"$oid": "60d097b761484f6ad65b5305"
},
"targetAmount": 100,
"totalAmount": 0,
"highPriority": false,
"lastTimeUsed": 1624283088
}
Now I just select all accounts, iterate over them and check if the condition is met. But I'm trying to do this all in a query:
amount = 10
tasks = ProviderAccountTaskModel.objects(
__raw__={
'targetAmount': {
'$gte': {'$add': ['totalAmount', amount]}
}
}
).order_by('-highPriority', 'lastTimeUsed')
I have also tried $sum, but neither option works.
Can't these operators be used in a query, or am I going about this the wrong way?
You can use $where. Just be aware that it is fairly slow (it has to execute JavaScript on every document), so combine it with indexed query conditions if you can.
db.getCollection('YourCollectionName').find( { $where: function() { return this.targetAmount >= (this.totalAmount + 10) } })
Or, more compactly:
db.getCollection('YourCollectionName').find( { $where: "this.targetAmount >= this.totalAmount + 10" })
Without $expr, a query filter cannot reference another field of the same document or do arithmetic on it, which is why your query does not work.
Below is an aggregation command that does this with $expr. Convert it into the equivalent MongoEngine command.
db.collection.aggregate([
{
"$match": {
"$expr": {
"$gte": [
"$targetAmount",
{
"$sum": [
"$totalAmount",
10
]
},
],
},
},
},
{
"$sort": {
"highPriority": -1,
"lastTimeUsed": 1,
},
},
])
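The same $expr condition can also be passed as a raw query filter (MongoDB 3.6+), which fits MongoEngine's objects(__raw__=...) interface from the question. Note the question's attempt also wrote 'totalAmount' without the $, which makes it a literal string rather than a field reference. A minimal sketch, assuming MongoEngine forwards __raw__ to PyMongo unchanged; build_target_filter and meets_target are hypothetical helpers:

```python
def build_target_filter(amount):
    """Raw filter: targetAmount >= totalAmount + amount (needs $expr, MongoDB 3.6+)."""
    return {
        "$expr": {
            # "$targetAmount" / "$totalAmount" are field references, not strings.
            "$gte": ["$targetAmount", {"$add": ["$totalAmount", amount]}]
        }
    }

def meets_target(doc, amount):
    """Plain-Python version of the same condition, handy for checking results."""
    return doc["targetAmount"] >= doc["totalAmount"] + amount

# Hypothetical MongoEngine usage (not executed here):
# tasks = ProviderAccountTaskModel.objects(
#     __raw__=build_target_filter(10)
# ).order_by('-highPriority', 'lastTimeUsed')

sample = {"targetAmount": 100, "totalAmount": 0,
          "highPriority": False, "lastTimeUsed": 1624283088}
```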
The query I would like to replicate in DSL is as below:
GET /_search
{
"query":{
"bool":{
"must":[
{
"term":{
"destination":"singapore"
}
},
{
"terms":{
"tag_ids":[
"tag_luxury"
]
}
}
]
}
},
"aggs":{
"max_price":{
"max":{
"field":"price_range_from.SGD"
}
},
"min_price":{
"min":{
"field":"price_range_from.SGD"
}
}
},
"post_filter":{
"range":{
"price_range_from.SGD":{
"gte":0.0,
"lte":100.0
}
}
}
}
The above query:
Matches the terms destination and tag_ids
Aggregates the results to find the min and max price from the field price_range_from.SGD
Applies a post_filter to restrict the returned hits to the price range
It works perfectly well in the Elastic/Kibana console.
I replicated the above query in elasticsearch-dsl as below:
from elasticsearch_dsl import Q
es_query = []
es_query.append(Q("term", destination="singapore"))
es_query.append(Q("terms", tag_ids=["tag_luxury"]))
final_query = Q("bool", must=es_query)
es_conn = ElasticSearch.instance().get_client()
dsl_client = DSLSearch(using=es_conn, index=index).get_dsl_client()
dsl_client.query = final_query
dsl_client.aggs.metric("min_price", "min", field="price_range_from.SGD")
dsl_client.aggs.metric("max_price", "max", field="price_range_from.SGD")
q = Q("range", **{"price_range_from.SGD":{"gte": 0.0, "lte": 100.0}})
dsl_client.post_filter(q)
print(dsl_client.to_dict())
response = dsl_client.execute()
print(response.to_dict().get("hits", {}))
Although the aggregations are correct, products outside the price range are also returned. No error is raised, but the post_filter does not seem to be applied.
I inspected the dsl_client object to check whether my query is captured correctly. I see only the query and aggs, not the post_filter part. The query converted to a dictionary with dsl_client.to_dict() is:
{
"query":{
"bool":{
"must":[
{
"term":{
"destination":"singapore"
}
},
{
"terms":{
"tag_ids":[
"tag_luxury"
]
}
}
]
}
},
"aggs":{
"min_price":{
"min":{
"field":"price_range_from.SGD"
}
},
"max_price":{
"max":{
"field":"price_range_from.SGD"
}
}
}
}
Please help. Thanks!
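post_filter() does not modify the Search object in place; like query() and filter(), it returns a modified copy. You have to reassign dsl_client: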
dsl_client = dsl_client.post_filter(q)
I am using Python and MongoEngine to query the document below in MongoDB.
I need a query that efficiently returns the documents only when they contain embedded 'keywords' documents that match the following criteria:
Keywords filtered where the property 'sfr' is <= 100000
SUM the filtered keywords
Return the parent documents where the SUM of the keywords matching the criteria is greater than 9
Example structure:
{
"_id" : ObjectId("5eae60e4055ef0e717f06a50"),
"registered_data" : ISODate("2020-05-03T16:12:51.999+0000"),
"UniqueName" : "SomeUniqueNameHere",
"keywords" : [
{
"keyword" : "carport",
"search_volume" : NumberInt(10532),
"sfr" : NumberInt(20127),
"percent_contribution" : 6.47,
"competing_product_count" : NumberInt(997),
"avg_review_count" : NumberInt(143),
"avg_review_score" : 4.05,
"avg_price" : 331.77,
"exact_ppc_bid" : 3.44,
"broad_ppc_bid" : 2.98,
"exact_hsa_bid" : 8.33,
"broad_hsa_bid" : 9.29
},
{
"keyword" : "party tent",
"search_volume" : NumberInt(6944),
"sfr" : NumberInt(35970),
"percent_contribution" : 4.27,
"competing_product_count" : NumberInt(2000),
"avg_review_count" : NumberInt(216),
"avg_review_score" : 3.72,
"avg_price" : 210.16,
"exact_ppc_bid" : 1.13,
"broad_ppc_bid" : 0.55,
"exact_hsa_bid" : 9.66,
"broad_hsa_bid" : 8.29
}
]
}
From the research I have been doing, I believe an aggregation query might do what I am attempting.
Unfortunately, being new to MongoDB / MongoEngine, I am struggling to figure out how to structure the query and have failed to find an example similar to what I am attempting to do (RED FLAG, RIGHT????).
I did find an aggregation example, but I'm unsure how to structure my criteria in it. Something like this might be getting close, but it does not work:
pipeline = [
{
"$lte": {
"$sum" : {
"keywords" : {
"$lte": {
"keyword": 100000
}
}
}: 9
}
}
]
data = product.objects().aggregate(pipeline)
Any guidance would be greatly appreciated.
Thanks,
Ben
You can try something like this:
db.collection.aggregate([
{
$project: { // the first project to filter the keywords array
registered_data: 1,
UniqueName: 1,
keywords: {
$filter: {
input: "$keywords",
as: "item",
cond: {
$lte: [
"$$item.sfr",
100000
]
}
}
}
}
},
{
$project: { // the second project to get the length of the keywords array
registered_data: 1,
UniqueName: 1,
keywords: 1,
keywordsLength: {
$size: "$keywords"
}
}
},
{
$match: { // then do the match
keywordsLength: {
$gt: 9 // "greater than 9", as asked in the question
}
}
}
])
You can test it on Mongo Playground.
Hope it helps.
Note: for simplicity, I used only the sfr property from the keywords array.
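For MongoEngine the same idea can be written as Python dicts. This sketch folds the two $project stages into one $addFields (MongoDB 3.4+), so the other fields are kept without listing them, and uses $gt for "greater than 9". The names build_keyword_pipeline, matchingKeywordCount and Product are hypothetical. Newer MongoEngine versions accept the list directly in aggregate(); older ones may need aggregate(*pipeline).

```python
def build_keyword_pipeline(max_sfr=100000, min_count=9):
    """Keep documents with more than `min_count` keywords whose sfr <= max_sfr."""
    return [
        {"$addFields": {
            "matchingKeywordCount": {
                "$size": {
                    "$filter": {
                        # $ifNull guards against documents without a keywords
                        # array, since $size fails on a missing value.
                        "input": {"$ifNull": ["$keywords", []]},
                        "as": "item",
                        "cond": {"$lte": ["$$item.sfr", max_sfr]},
                    }
                }
            }
        }},
        {"$match": {"matchingKeywordCount": {"$gt": min_count}}},
    ]

def matching_keyword_count(doc, max_sfr=100000):
    """Plain-Python mirror of the $filter/$size expression, for checking results."""
    return sum(1 for kw in doc.get("keywords", []) if kw["sfr"] <= max_sfr)

# Hypothetical MongoEngine usage (not executed here):
# data = list(Product.objects().aggregate(build_keyword_pipeline()))
```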
The code below lives in a Python file that I run from the CLI as 'python test.py'.
import pymongo
from pymongo import Connection
connection = Connection('localhost', 27017)
db = connection.school
data = db.students.aggregate(
{ $match : { 'scores.type': 'homework' },
{ $project: { id : $_id,
name : $name,
scores : $scores
},
{ $unwind: "$scores" },
{ $group : {
_id : "$id",
minScore: { $min :"$scores.score" },
maxScore: { $max: "$scores.score" }
}
});
for _id in data:
print _id
# NOTE: this can only be done ONCE...first find lowest_id then
# remove it by uncommenting the line below...then recomment line.
# db.students.remove(data)
When I run this code, I get this error:
File "test.py", line 11
{ $match : { 'scores.type': 'homework' },
^
SyntaxError: invalid syntax
How do I rewrite this code so it works correctly from inside my test.py python file?
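You have a few syntax issues.
First, the pipeline must be an array (a list in Python), but you are passing the pipeline stages as separate arguments.
Second, you need to quote the pipeline operators and field references as strings, e.g. '$match' instead of $match and '$_id' instead of $_id.
Third, the $match and $project stages are each missing a closing brace.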
Here is a page that has some nice examples:
http://blog.pythonisito.com/2012/06/using-mongodbs-new-aggregation.html
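Putting those fixes together: one list of stages, every operator and $field reference quoted, and print() as a function. This sketch also assumes PyMongo 3 or newer, where Connection was replaced by MongoClient and aggregate() returns an iterable cursor. build_score_pipeline and run are hypothetical names, and pymongo is imported inside run() only so the pipeline can be inspected without it installed.

```python
def build_score_pipeline():
    """The shell pipeline from the question, as a Python list of stage dicts."""
    return [
        {"$match": {"scores.type": "homework"}},
        {"$project": {"id": "$_id", "name": "$name", "scores": "$scores"}},
        {"$unwind": "$scores"},
        {"$group": {
            "_id": "$id",
            "minScore": {"$min": "$scores.score"},
            "maxScore": {"$max": "$scores.score"},
        }},
    ]

def run(uri="mongodb://localhost:27017"):
    from pymongo import MongoClient  # Connection no longer exists in PyMongo 3+
    client = MongoClient(uri)
    db = client.school
    # aggregate() takes the whole list and returns a cursor you can iterate.
    for doc in db.students.aggregate(build_score_pipeline()):
        print(doc)
```

Call run() to execute it against a local server. As in the question, $match runs before $unwind, so each matched student still carries all of their scores; if the goal is the min/max homework score only, moving the $match after the $unwind would restrict it to homework entries.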