I'm trying to make the following aggregate in my Python Project:
pipeline = [
    {
        '$group': {
            'date': {'$max': "$date"},
            '_id': {
                'interface': "$interface",
                'message': "$message",
                'server': "$server"
            },
            'record_count': {'$sum': '1'}
        }
    }
]
errors = EntryError.objects.aggregate(pipeline)
But when the aggregate function is executed, it gives me the following error:
pymongo.errors.OperationFailure: Each element of the 'pipeline' array must be an object
But the same pipeline code works on Robo3T and when using mongo shell.
What am I doing wrong?
I figured out what I was doing wrong.
The code that solved everything is this one:
pipeline = {
    "$group": {
        "date": {"$max": "$date"},
        "_id": {
            "interface": "$interface",
            "message": "$message",
            "server": "$server"
        },
        "record_count": {"$sum": 1}
    }
}
errors = EntryError.objects.filter(
    date__gte=start_date,
    date__lte=end_date
).aggregate(pipeline)
"pipeline" as dict instead of list.
After updating the Python package elasticsearch from 7.6.0 to 8.1.0, I started to receive a deprecation warning at this line of code:
count = es.count(index=my_index, body={'query': query['query']})["count"]
The warning message is:
DeprecationWarning: The 'body' parameter is deprecated and will be
removed in a future version. Instead use individual parameters.
I don't understand how to use the above-mentioned "individual parameters".
Here is my query:
query = {
    "bool": {
        "must": [
            {"exists": {"field": "device"}},
            {"exists": {"field": "app_version"}},
            {"exists": {"field": "updatecheck"}},
            {"exists": {"field": "updatecheck_status"}},
            {"term": {"updatecheck_status": "ok"}},
            {"term": {"updatecheck": 1}},
            {
                "range": {
                    "#timestamp": {
                        "gte": from_date,
                        "lte": to_date,
                        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
                    }
                }
            }
        ],
        "must_not": [
            {"term": {"device": ""}},
            {"term": {"updatecheck": ""}},
            {"term": {"updatecheck_status": ""}},
            {
                "terms": {
                    "app_version": ["2.2.1.1", "2.2.1.2", "2.2.1.3", "2.2.1.4", "2.2.1.5",
                                    "2.2.1.6", "2.2.1.7", "2.1.2.9", "2.1.3.2", "0.0.0.0", ""]
                }
            }
        ]
    }
}
In the official documentation, I can't find any examples of how to pass my query in the new versions of Elasticsearch.
Possibly someone has a solution for this case other than reverting to previous versions of Elasticsearch?
According to the documentation, this is now to be done as follows:
# ✅ New usage:
es.search(query={...})
# ❌ Deprecated usage:
es.search(body={"query": {...}})
So the query is passed directly, without "body", substituting whichever API you need to use; in your case, "count" instead of "search".
You can try the following:
# ✅ New usage:
es.count(query={...})
# ❌ Deprecated usage:
es.count(body={"query": {...}})
You can find out more by clicking on the following link:
https://github.com/elastic/elasticsearch-py/issues/1698
For example, if the query were:
GET index-00001/_count
{
    "query": {
        "match_all": {}
    }
}
the Python client call would be:
my_index = "index-00001"
query = {
    "match_all": {}
}
hits = es.count(index=my_index, query=query)
or
hits = es.count(index=my_index, query={"match_all": {}})
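Applied to the original bool query from the question, the pattern is the same. A sketch (the `es` client, index name, and the abbreviated query are placeholders):

```python
# Sketch: the bool query from the question is passed unchanged through
# the query= keyword; only the body={"query": ...} wrapper is gone.
bool_query = {
    "bool": {
        "must": [
            {"term": {"updatecheck_status": "ok"}},
            {"term": {"updatecheck": 1}},
        ]
    }
}
# count = es.count(index="my_index", query=bool_query)["count"]
```
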
Using Elasticsearch 8.4.1, I got the same warning when creating indices via the Python client.
I had to do it this way instead:
settings = {
    "number_of_shards": 2,
    "number_of_replicas": 1
}
mappings = {
    "dynamic": "true",
    "numeric_detection": "true",
    "_source": {"enabled": "true"},
    "properties": {
        "p_text": {"type": "text"},
        "p_vector": {"type": "dense_vector", "dims": 768}
    }
}
es.indices.create(index=index_name, settings=settings, mappings=mappings)
Hope this helps.
I have a collection with the following document format:
{
    "_id": 1234,
    "processes": [
        { "0_0": "aaaa", "0_1": "bbbb" },
        { "1_0": "cccc", "1_1": "dddd" },
        { "2_0": "eeee", "2_1": "ffff" }
    ]
}
{
    "_id": 5678,
    "processes": [
        { "0_0": "gggg", "0_1": "hhhh" },
        { "1_0": "iiii", "1_1": "jjjj" },
        { "2_0": "kkkk", "2_1": "mmmm" }
    ]
}
In another question I asked about the same DB, my colleague #hhharsha36 helped me with the problem I had:
Update with Pymongo boolean field in a subdocument within a list field of a document in a MongoDB collection
Thanks to that my current aggregate query looks like this:
cursor_processes = collection.aggregate([
    {"$unwind": "$processes"},
    {
        "$replaceRoot": {
            "newRoot": {
                "$mergeObjects": [
                    {"_id": "$_id"},
                    "$processes"
                ]
            }
        }
    }
])
At this moment, I want to create a key-value pair whose value depends on whether or not a key named motive already exists. Then:
cursor_processes = collection.aggregate([
    {"$unwind": "$processes"},
    {
        "$replaceRoot": {
            "newRoot": {
                "$mergeObjects": [
                    {"_id": "$_id"},
                    "$processes",
                    {
                        "code": {
                            '$cond': [
                                {"processes.0": {'$exists': True}},
                                {'$concat': ["$processes.0_0", "$processes.0_1", {'$substr': ["$_id", 0, -1]}, "si_00"]},
                                {'$concat': ["$processes.1_0", "$processes.2_0", {'$substr': ["$_id", 0, -1]}, "no_00"]}
                            ]
                        }
                    }
                ]
            }
        }
    }
])
list_proc = [i for i in cursor_processes] #Create a list
When I debug, I get the following error message:
pymongo.errors.OperationFailure: Unrecognized expression '$exists', full error: {'ok': 0.0, 'errmsg': "Unrecognized expression '$exists'", 'code': 168, 'codeName': 'InvalidPipelineOperator'}
$exists is a query operator.
$cond is expecting that first argument to be a boolean expression, not a filter document.
You might try the $ifNull pipeline operator, or perhaps a combination of the $eq and $not pipeline operators.
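As a sketch of the first suggestion (field names taken from the question; the "missing" sentinel is an assumption), the `$exists` test can be replaced by comparing the field against its `$ifNull` fallback:

```python
# Sketch: "$ifNull" returns the fallback value when "$processes.0_0"
# is missing, so comparing against the fallback with "$ne" acts as an
# existence test inside an aggregation expression.
code_expr = {
    "$cond": [
        {"$ne": [{"$ifNull": ["$processes.0_0", "missing"]}, "missing"]},
        {"$concat": ["$processes.0_0", "$processes.0_1", "si_00"]},
        {"$concat": ["$processes.1_0", "$processes.2_0", "no_00"]},
    ]
}
```
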
I have a document that references another document, and I'd like to join these documents and filter based on the contents of an array in the child document:
deployment_machine document:
{
    "_id": 1,
    "name": "Test Machine",
    "machine_status": 10,
    "active": true
}
machine_status document:
{
    "_id": 10,
    "breakdown": [
        {
            "status_name": "Rollout",
            "state": "complete"
        },
        {
            "status_name": "Deploying",
            "state": "complete"
        }
    ]
}
I'm using Mongo 3.6 and am having mixed success with the lookup and pipeline. Here's the object I'm using in the Python MongoEngine being passed to the aggregate function:
pipeline = [
    {'$match': {'breakdown': {'$elemMatch': {'status_name': 'Rollout'}}}},
    {
        '$lookup': {
            'from': 'deployment_machine',
            'let': {'status_id': '$_id'},
            'pipeline': [
                {
                    '$match': {
                        '$expr': {
                            '$and': [
                                {'$eq': ['$machine_status', '$$status_id']},
                            ]
                        },
                    }
                }
            ],
            'as': 'result',
        },
    },
    {
        '$project': {
            'breakdown': {
                '$filter': {
                    'input': '$breakdown',
                    'as': 'breakdown',
                    'cond': {'$eq': ['$$breakdown.status_name', 'Rollout']}
                }
            }
        }
    },
]
result = list(MachineStatus.objects.aggregate(*pipeline))
This works well, but how can I exclude results where the Deployment Machine isn't active? I feel it must go in the project but can't find a condition that works. Any help appreciated.
You can add more conditions in the $lookup pipeline:
pipeline = [
    { $match: { breakdown: { $elemMatch: { status_name: "Rollout" } } } },
    {
        $lookup: {
            from: "deployment_machine",
            let: { status_id: "$_id" },
            pipeline: [
                {
                    $match: {
                        $expr: { $eq: ["$machine_status", "$$status_id"] },
                        active: true
                    }
                }
            ],
            as: "result",
        }
    },
    {
        $project: {
            breakdown: {
                $filter: {
                    input: "$breakdown",
                    as: "breakdown",
                    cond: { $eq: ["$$breakdown.status_name", "Rollout"] },
                }
            }
        }
    }
];
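Translated to Python dicts for the MongoEngine call from the question, a sketch (the `active: True` condition keeps only active machines, and the extra trailing `$match` is an assumption that drops statuses whose lookup found no machine):

```python
# Sketch: the same pipeline as Python literals. The final $match stage
# removes documents whose lookup result array came back empty.
pipeline = [
    {"$match": {"breakdown": {"$elemMatch": {"status_name": "Rollout"}}}},
    {"$lookup": {
        "from": "deployment_machine",
        "let": {"status_id": "$_id"},
        "pipeline": [
            {"$match": {
                "$expr": {"$eq": ["$machine_status", "$$status_id"]},
                "active": True,
            }},
        ],
        "as": "result",
    }},
    {"$match": {"result": {"$ne": []}}},
]
# result = list(MachineStatus.objects.aggregate(*pipeline))
```
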
I'm currently using MongoDB version v3.0, and this is my code:
{'$lookup': {
    'from': 'Matrix',
    'localField': 'account_id',
    'foreignField': 'account_id',
    'as': 'Matrix'
}}
I'm getting this error:
Exception calling application: exception: Unrecognized pipeline stage name: '$lookup'
Note first that the $lookup stage was introduced in MongoDB 3.2, so it is not available on v3.0.
Querying with the aggregation framework through MongoEngine normally requires two connections to MongoDB (one for PyMongo to perform the aggregation query, and a second for the regular query, insert, or update via MongoEngine). But _get_collection() resolves this problem.
In the example below, we use two models, Plans and Addons, which are related in both cases through quote_id:
collection = Plans._get_collection()
pipeline = [
    {
        "$lookup": {
            "from": "addons",
            "localField": "quote_id",
            "foreignField": "quote_id",
            "as": "addons_docs"
        }
    },
    {
        "$match": {
            "addons_docs": {"$ne": []}
        }
    },
    {
        "$addFields": {
            "addons_docs": {"$arrayElemAt": ["$addons_docs", 0]}
        }
    },
    {
        "$replaceRoot": {
            "newRoot": {"$mergeObjects": ["$addons_docs", "$$ROOT"]}
        }
    },
    {
        "$project": {"addons_docs": 0}
    },
    {
        "$sort": {"_id": -1}
    },
    {
        "$limit": 100
    }
]
cursor = collection.aggregate(pipeline)
try:
    for doc in cursor:
        print(doc)
finally:
    cursor.close()
I want to iterate over a MongoDB array field (the TRANSACTION list) and remove a specific item from it using PyMongo.
I created the Mongo collection below using Python and PyMongo. How can I iterate over the array items and remove only the final item in the list?
Data insert code using Python and PyMongo:
# added new method create block chain_structure
def addCoinWiseTransaction(self, senz, coin, format_date):
    self.collection = self.db.block_chain
    coinValexists = self.collection.find({"_id": str(coin)}).count()
    print('coin exists : ', coinValexists)
    if coinValexists > 0:
        print('coin hash exists')
        newTransaction = {"$push": {"TRANSACTION": {
            "SENDER": senz.attributes["#SENDER"],
            "RECIVER": senz.attributes["#RECIVER"],
            "T_NO_COIN": int(1),
            "DATE": datetime.datetime.utcnow()
        }}}
        self.collection.update({"_id": str(coin)}, newTransaction)
    else:
        flag = senz.attributes["#f"]
        print(flag)
        if flag == "ccb":
            print('new coin mined by other miner')
            root = {"_id": str(coin),
                    "S_ID": int(senz.attributes["#S_ID"]),
                    "S_PARA": senz.attributes["#S_PARA"],
                    "FORMAT_DATE": format_date,
                    "NO_COIN": int(1),
                    "TRANSACTION": [{"MINER": senz.attributes["#M_S_ID"],
                                     "RECIVER": senz.attributes["#RECIVER"],
                                     "T_NO_COIN": int(1),
                                     "DATE": datetime.datetime.utcnow()}]
                    }
            self.collection.insert(root)
        else:
            print('new coin mined')
            root = {"_id": str(coin),
                    "S_ID": int(senz.attributes["#S_ID"]),
                    "S_PARA": senz.attributes["#S_PARA"],
                    "FORMAT_DATE": format_date,
                    "NO_COIN": int(1),
                    "TRANSACTION": [{"MINER": "M_1",
                                     "RECIVER": senz.sender,
                                     "T_NO_COIN": int(1),
                                     "DATE": datetime.datetime.utcnow()}]
                    }
            self.collection.insert(root)
    return 'DONE'
To remove the last entry, the general idea (as you have mentioned) is to iterate the array and grab the index of the last element as denoted by its DATE field, then update the collection by removing it using $pull. So the crucial piece of data you need for this to work is the DATE value and the document's _id.
One approach you could take is to first use the aggregation framework to get this data. With this, you can run a pipeline where the first step is filtering the documents in the collection with the $match operator, which uses standard MongoDB queries.
The next stage after filtering the documents is to flatten the TRANSACTION array i.e. denormalise the documents in the list so that you can filter the final item i.e. get the last document by the DATE field. This is made possible with the $unwind operator, which for each input document, outputs n documents where n is the number of array elements and can be zero for an empty array.
After deconstructing the array, in order to get the last document, use the $group operator where you can regroup the flattened documents and in the process use the group accumulator operators to obtain
the last TRANSACTION date by using the $max operator applied to its embedded DATE field.
So, in essence, run the following pipeline and use the results to update the collection:
mongo shell
db.block_chain.aggregate([
    { "$match": { "_id": coin_id } },
    { "$unwind": "$TRANSACTION" },
    {
        "$group": {
            "_id": "$_id",
            "last_transaction_date": { "$max": "$TRANSACTION.DATE" }
        }
    }
])
You can then get the document with the update data from this aggregate operation using the toArray() method or the aggregate cursor and update your collection:
var docs = db.block_chain.aggregate([
    { "$match": { "_id": coin_id } },
    { "$unwind": "$TRANSACTION" },
    {
        "$group": {
            "_id": "$_id",
            "LAST_TRANSACTION_DATE": { "$max": "$TRANSACTION.DATE" }
        }
    }
]).toArray()

db.block_chain.updateOne(
    { "_id": docs[0]._id },
    {
        "$pull": {
            "TRANSACTION": {
                "DATE": docs[0]["LAST_TRANSACTION_DATE"]
            }
        }
    }
)
python
def remove_last_transaction(self, coin):
    self.collection = self.db.block_chain
    pipe = [
        { "$match": { "_id": str(coin) } },
        { "$unwind": "$TRANSACTION" },
        {
            "$group": {
                "_id": "$_id",
                "last_transaction_date": { "$max": "$TRANSACTION.DATE" }
            }
        }
    ]
    # run aggregate pipeline
    cursor = self.collection.aggregate(pipeline=pipe)
    docs = list(cursor)
    # run update (the key must match the one produced by $group above)
    self.collection.update_one(
        { "_id": docs[0]["_id"] },
        {
            "$pull": {
                "TRANSACTION": {
                    "DATE": docs[0]["last_transaction_date"]
                }
            }
        }
    )
Alternatively, you can run a single aggregate operation that also updates your collection, using the $out pipeline stage, which writes the results of the pipeline to the same collection:
If the collection specified by the $out operation already
exists, then upon completion of the aggregation, the $out stage atomically replaces the existing collection with the new results collection. The $out operation does not
change any indexes that existed on the previous collection. If the
aggregation fails, the $out operation makes no changes to
the pre-existing collection.
For example, you could run this pipeline:
mongo shell
db.block_chain.aggregate([
    { "$match": { "_id": coin_id } },
    { "$unwind": "$TRANSACTION" },
    { "$sort": { "TRANSACTION.DATE": 1 } },
    {
        "$group": {
            "_id": "$_id",
            "LAST_TRANSACTION": { "$last": "$TRANSACTION" },
            "FORMAT_DATE": { "$first": "$FORMAT_DATE" },
            "NO_COIN": { "$first": "$NO_COIN" },
            "S_ID": { "$first": "$S_ID" },
            "S_PARA": { "$first": "$S_PARA" },
            "TRANSACTION": { "$push": "$TRANSACTION" }
        }
    },
    {
        "$project": {
            "FORMAT_DATE": 1,
            "NO_COIN": 1,
            "S_ID": 1,
            "S_PARA": 1,
            "TRANSACTION": {
                "$setDifference": ["$TRANSACTION", ["$LAST_TRANSACTION"]]
            }
        }
    },
    { "$out": "block_chain" }
])
python
def remove_last_transaction(self, coin):
    self.db.block_chain.aggregate([
        { "$match": { "_id": str(coin) } },
        { "$unwind": "$TRANSACTION" },
        { "$sort": { "TRANSACTION.DATE": 1 } },
        {
            "$group": {
                "_id": "$_id",
                "LAST_TRANSACTION": { "$last": "$TRANSACTION" },
                "FORMAT_DATE": { "$first": "$FORMAT_DATE" },
                "NO_COIN": { "$first": "$NO_COIN" },
                "S_ID": { "$first": "$S_ID" },
                "S_PARA": { "$first": "$S_PARA" },
                "TRANSACTION": { "$push": "$TRANSACTION" }
            }
        },
        {
            "$project": {
                "FORMAT_DATE": 1,
                "NO_COIN": 1,
                "S_ID": 1,
                "S_PARA": 1,
                "TRANSACTION": {
                    "$setDifference": ["$TRANSACTION", ["$LAST_TRANSACTION"]]
                }
            }
        },
        { "$out": "block_chain" }
    ])
Whilst this approach can be more efficient than the first, it requires knowledge of the existing fields up front, so in some cases it may not be practical.
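As an aside: if the TRANSACTION array is only ever appended to with $push (so the last array element is always the newest transaction), a single $pop update is a much simpler sketch of the same removal:

```python
# Sketch: $pop with value 1 removes the last element of the array in a
# single update, with no aggregation round-trip. This assumes the
# array is kept in insertion (date) order.
pop_last = {"$pop": {"TRANSACTION": 1}}
# self.collection.update_one({"_id": str(coin)}, pop_last)
```
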