Mongo returns a WriteResult on upserts:
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
Is there any way I can access those fields from pymongo? I need this because an update always returns None in pymongo, and I want to know whether the document I was querying was modified, or even whether it exists, without doing an additional query. Can you please tell me how this could be done?
P.S. I know this has been asked before, but that was a few years ago and everything I could find on Google didn't include an example.
While we're at it, is there a way to get fields of the document back from the result of an upsert (or at least the _id)?
Solved: As Neil Lunn suggests, the Bulk API is the way to go if you want to get more data out of what happened with your updates. I'd just like to point out this quick walkthrough of the API.
The newer MongoDB shell implementations from MongoDB 2.6 and upwards actually define their shell helper methods for .update(), .insert(), etc. using the "Bulk operations API" where this is available.
So basically, where the shell is connecting to a MongoDB 2.6 instance or greater, the "Bulk" methods are used "under the hood", even if they are only acting on one document at a time or otherwise effectively only issuing "one" update request.
The general driver interfaces have not yet caught up with this and you need to still invoke explicitly:
bulk = db.test.initialize_ordered_bulk_op()
bulk.find({}).upsert().update({ "$set": { "this": "that" } })
result = bulk.execute()
The "result" returned here matches the "Bulk Write Result" specification you see in the shell, which is different to how the "legacy" implementations which are currently used in the standard driver methods return.
This is the use case: I have a server that receives instructions from many clients. Each client's instructions are handled by its own Session object, which holds all the information about the state of the session and queries mongoengine for the data it needs.
Now, suppose session1 queries mongoengine and gets document "A" as a document object.
Later, session2 also queries and gets document "A", as another separate document object.
Now we have 2 document objects representing document "A", and to keep them consistent I need to call A.update() and A.reload() all the time, which seems unnecessary.
Is there any way I can get a reference to the same document object over the two queries? This way both sessions could make changes to the document object and those changes would be seen by the other sessions, since they would be made to the same python object.
I've thought about making a wrapper for mongoengine that caches the documents we have as document objects at runtime and ensures there are no multiple objects for the same document at any given time. But my knowledge of mongoengine is too rudimentary to do that at the moment.
Any thoughts on this? Is my entire design flawed? Is there any easy solution?
I don't think going in that direction is a good idea. From what I understand, you are in a web application context; you might be able to get something working for threads within a single process, but you won't be able to share instances across different processes (and it gets even worse if you have processes running on different machines).
One way to address this is to use optimistic concurrency validation: you basically maintain a field like a "version-identifier" that gets updated whenever the instance is updated, and whenever you save/update the object you run a query like "update object if version-identifier = ..., else fail".
This means that if there are concurrent requests, one of them will succeed (the first one to be flushed) and the other will fail, because the version-identifier it holds is outdated. MongoEngine has no built-in support for that, but more info can be found here: https://github.com/MongoEngine/mongoengine/issues/1563
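For illustration, a minimal sketch of that pattern with MongoEngine (the Page class, its fields and the database name are hypothetical):

from mongoengine import Document, IntField, StringField, connect

connect("testdb")  # hypothetical database

class Page(Document):
    content = StringField()
    version = IntField(default=0)  # the "version-identifier"

def save_with_version_check(page, new_content):
    # Atomically update only if no one bumped the version since we read it;
    # update_one returns the number of documents it matched and modified.
    updated = Page.objects(id=page.id, version=page.version).update_one(
        set__content=new_content,
        inc__version=1,
    )
    if updated == 0:
        # Another session won the race; our copy of the document is stale.
        raise RuntimeError("stale document: reload and retry")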
PostgresHook in Airflow has a function get_records that returns the result of a query.
The result is a tuple object (one per row). How can we receive the result in dict form?
Also, since the code mentioned here says we can use a dict cursor, how do we do that?
The directions given in your link are technically accurate, but unfortunately not very clear as to what "connection" you should be working with.
Modify the relevant Connection in the Airflow web interface's Connections manager so that its Extra field contains the following JSON: {"cursor": "dictcursor"}.
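For illustration, a minimal sketch assuming that Extra is set on a connection with a hypothetical id of my_postgres (the import path varies across Airflow versions, and the table/query are made up):

from airflow.hooks.postgres_hook import PostgresHook

# With {"cursor": "dictcursor"} in the connection's Extra field, the hook
# opens psycopg2 cursors as psycopg2.extras.DictCursor, so each row
# supports key access as well as index access.
hook = PostgresHook(postgres_conn_id="my_postgres")
rows = hook.get_records("SELECT id, name FROM users LIMIT 5")
for row in rows:
    print(row["id"], row["name"])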
Hi, community.
I have a question/issue about Firestore queries from Firebase.
I have a collection of around 18000 documents. I would like to get the value of a single field, the same one, from some of these documents. I use the Python firestore_v1 library from the google-cloud-python client. So, for example, with list_edges.length = 250:
[db_firestore.document(f"edges/{edge['id']}").get({"distance"}).to_dict()["distance"] for edge in list_edges]
it takes 30+ seconds to evaluate, while the equivalent collection on MongoDB takes no more than 3 seconds doing this, and that loads the whole document rather than just one field:
list(db_mongo["edges"].find({"city_id": {"$eq": city_id}, "id": {"$in": [edge["id"] for edge in list_edges]}}))
Having said that, I thought the solution could be to separate the large collection by city_id, so I created a new collection and copied the corresponding documents into it. Now the query looks like:
[db_firestore.document(f"edges/7/edges/{edge['id']}").get({"distance"}).to_dict()["distance"] for edge in list_edges]
where 7 is a city_id.
However, it takes the same amount of time. So maybe the issue is with the .get() method, but I could not find any optimized solution for my case.
Could you help me with this? Thanks!
EDITED
I got the answer from Firestore support. The problem is that I make 250 requests, doing .get() for each document separately. The idea is to get all the data I want in only one request, so I need to modify the query.
Let's assume I have the following DB:
an edges collection with multiple edge_id documents. For each new request, I use a newly generated list of the edges I need to fetch.
In MongoDB, I can do it with the $in operator (having edge_id inside the document), but in Firestore the in operator only accepts up to 10 equality values.
So, I need to find out another way to do this.
Any ideas? Thanks!
Firebase recently added support for a limited in operator. See:
The blog post announcing the feature.
The documentation on in and array-contains-any queries.
From the latter:
cities_ref = db.collection(u'cities')
query = cities_ref.where(u'country', u'in', [u'USA', u'Japan'])
A few caveats though:
You can have at most 10 values in the in clause, and you can have only one in (or array-contains-any) clause per query. (A way to batch around the 10-value limit is sketched below.)
I am not sure if you can use this operator to select by ID.
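Given those limits, one workaround is to chunk the ids and issue one in query per chunk. A minimal sketch, assuming each edge document stores its id in an id field (as the MongoDB version does) and that list_edges is the list from the question:

from google.cloud import firestore

db_firestore = firestore.Client()
edge_ids = [edge["id"] for edge in list_edges]

distances = []
# Firestore's "in" operator takes at most 10 values, so query in chunks.
for i in range(0, len(edge_ids), 10):
    chunk = edge_ids[i:i + 10]
    query = db_firestore.collection("edges").where("id", "in", chunk)
    for snapshot in query.stream():
        distances.append(snapshot.get("distance"))

This still issues 25 requests for 250 edges, but far fewer round trips than the original 250 individual .get() calls.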
I am trying to use an API which I have used previously for various jobs to query and fetch relevant data. But lately I am unable to do that because of an unusual exception being returned, which I honestly have no idea about.
The CODE:
import SIEMAuth
import requests
alert_id = '144116287822364672|12101929'
query_params = {"id": {"value": alert_id}, "format": {"format": 0}}
print(requests.post(SIEMAuth.url + 'ipsGetAlertPacket', json=query_params, headers=SIEMAuth.session_headers, verify=False).text)
The following exception/traceback response is returned on querying this:
Can not construct instance of com.mcafee.siem.api.data.alert.EsmPacketFormat: no suitable constructor found, can not deserialize from Object value (missing default constructor or creator, or perhaps need to add/enable type information?)
at [Source: java.io.StringReader#1a15fbf; line: 1, column: 2]
Process finished with exit code 0
On trying to surf the internet to learn more about the exception, most of the results relate to the Jackson parser for JSON in the Java programming environment, which is not something I am working with or am aware of.
If anybody could help, I'd be extremely grateful.
Unfortunately, it's as I suggested: basically, one way or another, it's broken. The response from their support is as follows.
I have reached out to my development team about this question and got the response below.
That particular get is not meant to be used in the external API. It should only be used from the interface, and it has been removed since the version of the ESM you are on. If you want to use that externally then you need to submit it as a PER.
I hope this clears up your questions.
Edit: This has actually been expanded on in a thread on their support forums. You need a login to see the original thread.
Name notwithstanding, this API does not return the actual data packet associated with an event. In fact, when aggregation is enabled, not all of the packets associated with a given event are available on the ESM. Raw packet data can be retrieved from the ELM through the UI, but unfortunately there currently is not a way to do that programmatically.
Background Information
The answer to my previous question (In Eve, how can you make a sub-resource of a collection and keep the parent collection's endpoint?) was to use the "multiple endpoints, one datasource" feature of Eve. In the IRC channel, I was speaking with cuibonobo, and she was able to get this working by changing the game_id to be an objectid instead of a string, as shown here:
http://gist.github.com/uunsamp/d969116367181bb30731
However, I didn't get this working, and as you can see from the conversation, I was putting documents into the collection differently:
14:59 < cuibonobo> no. it's just that since your previous settings file saved the game id as a string, the lookup won't work
15:00 < cuibonobo> it will only work on documents where game_id has been saved as an ObjectId
15:01 < cuibonobo> the way Eve currently works, if you set the type to 'objectid', it will convert the string to a Mongo ObjectId before saving it in the database. but that conversion doesn't happen with strings
15:02 < znn> i haven't been using eve for storing objects
15:02 < znn> i've been using the mongo shell interface for inserting items
15:03 < cuibonobo> oh. hmm. that might complicate things. Eve does type conversions and other stuff before inserting documents.
15:04 < cuibonobo> so inserting stuff directly into mongo generally isn't recommended
Question
Which leads me to stackoverflow :)
What is the difference between inserting a document into a collection using the http method POST and using the mongo shell? Will users eventually be able to use either method of document insertion?
Extra information
I was looking through http://github.com/nicolaiarocci/eve/blob/develop/eve/methods/post.py before asking this question, but that could take a while to understand, much longer than just asking someone who may be more familiar with the code than I am.
The quick answer is that Eve adds a few meta fields (etag, updated, created) to every stored document. If you want to store documents locally (not going through HTTP) you can use post_internal:
Intended for internal post calls, this method is not rate limited,
authentication is not checked and pre-request events are not raised.
Adds one or more documents to a resource. Each document is validated
against the domain schema. If validation passes the document is inserted
and ID_FIELD, LAST_UPDATED and DATE_CREATED along with a link to the
document are returned. If validation fails, a list of validation issues
is returned.
Usage example:
from run import app
from eve.methods.post import post_internal

payload = {
    "firstname": "Ray",
    "lastname": "LaMontagne",
    "role": ["contributor"]
}

with app.test_request_context():
    x = post_internal('people', payload)
    print(x)
Documents inserted with post_internal are subject to the same validation and will be stored just as if they had been posted by API clients via HTTP. In 0.5-dev (not yet released), internal methods for PATCH, PUT and DELETE have been added too.
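Along the same lines, a hedged sketch of what the internal PATCH might look like once 0.5 lands, assuming it mirrors post_internal's calling convention (the people resource and person_id are hypothetical, and the exact signature may differ in the released version):

from run import app
from eve.methods.patch import patch_internal

person_id = "54f5367..."  # hypothetical _id of an existing document

with app.test_request_context():
    # The lookup keyword arguments select the document to patch.
    r = patch_internal('people', {"role": ["author"]}, _id=person_id)
    print(r)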