Python Eve - where clause using objectid

I have the following resource defined in settings.py:
builds = {
    'item_title': 'builds',
    'schema': {
        'sources': {
            'type': 'list',
            'schema': {
                'type': 'objectid',
                'data_relation': {
                    'resource': 'sources',
                    'embeddable': True,
                }
            }
        },
        'checkin_id': {
            'type': 'string',
            'required': True,
            'minlength': 1,
        },
    }
}
When I try to filter based on a member whose value is an objectid, I get an empty list.
http://127.0.0.1:5000/builds?where={"sources":"54e328ec537d3d20bbdf2ed5"}
54e328ec537d3d20bbdf2ed5 is the id of a source document.
Is there any way to do this?

Your query should work just fine, assuming that you actually have the 54e328ec537d3d20bbdf2ed5 value included in a sources field within some builds document.
What I mean is, you can't query the builds endpoint for the existence of a document in the sources endpoint (you can of course do that at the sources endpoint). But if you actually store a builds document that references a sources document, then your query will work fine, because what you are actually asking is "get me all builds documents which have a reference to this sources document". For example, if you POST a document like this to the builds endpoint:
{
    "sources": ["54e328ec537d3d20bbdf2ed5"],
    "checkin_id": "A"
}
Then this query:
http://127.0.0.1:5000/builds?where={"sources":"54e328ec537d3d20bbdf2ed5"}
Will return that one document. Of course, since you defined sources as embeddable, you can also do:
http://127.0.0.1:5000/builds?where={"sources":"54e328ec537d3d20bbdf2ed5"}&embedded={"sources":1}
Which will get you referenced documents embedded along with any matching document, like so:
{
    "sources": [{"field1": "hey", "field2": "I'm an embedded source"}],
    "checkin_id": "A"
}
Whereas you would get a 'raw' document without the explicit embed. It is probably worth mentioning that you can also enable predefined embedding of referenced resources, so your clients don't have to explicitly request an embed.
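For a quick sanity check from Python, here is a minimal sketch using the requests library, assuming the Eve app is running locally as in the question:

import requests

# Query builds that reference a given source, with the referenced
# documents embedded; URL and id are taken from the question above.
params = {
    'where': '{"sources": "54e328ec537d3d20bbdf2ed5"}',
    'embedded': '{"sources": 1}',
}
resp = requests.get('http://127.0.0.1:5000/builds', params=params)
print(resp.json())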
Hope this helps.

I'm new to Eve, but I can offer an advance on Nicola's "should work", because my experience is that it does not, and this question is what comes up when you are trying to deal with the frustration of figuring out why...
Debugging the library got me to the point where Eve automagically decides that something with a signature that looks like "54e328ec537d3d20bbdf2ed5" should be cast to an ObjectId, which is all good. However, the comparison of the ObjectId 54e328ec537d3d20bbdf2ed5 against the string "54e328ec537d3d20bbdf2ed5" is not an equality, so your filter returns no results.
The really simple solution is to change checkin_id to objectid. Eve starters can be assured you don't need all the additional decorations, so in the example above just change 'type': 'string' to 'type': 'objectid' and all will be good. Specifically, if you have calling code where this field is defined as a string, you can leave it as it is; the cast will occur within Eve as described above and it will just work as expected.
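For Eve newcomers, a minimal sketch of the corrected field definition from the schema above (the minlength rule applies to strings, so it is dropped here):

'checkin_id': {
    'type': 'objectid',
    'required': True,
},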
edit - See also Eve's schema-level query_objectid_as_string configuration setting, which, upon reading, seems to override this behaviour.
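As a hedged sketch, that setting sits on the resource definition, something like:

builds = {
    'item_title': 'builds',
    'query_objectid_as_string': True,  # compare ObjectId-lookalike filter values as plain strings
    'schema': builds_schema,  # builds_schema: stand-in for the schema dict from the question
}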

Related

How to see what can be set/updated on an issue?

I'm trying to use the JIRA Python API to create and update issues on different projects.
Currently I'm after timetracking, but I've seen other fields that cannot be set on this or that project, yielding the error message:
... cannot be set. It is not on the appropriate screen, or unknown.
I can already set timetracking on some projects like:
issue.update(fields={'timetracking': {'originalEstimate': '4h'}})
But on others I get the mentioned error message although the field is clearly present among the issue fields:
>>> issue.fields.timetracking
<JIRA TimeTracking at 2072336111640>
There seems to be nothing obvious on the object itself that could make me identify the thing as "not set-able".
Here is a post on how to get the fields on the screen via the REST API. I think that's what the Python client is doing in the background. But do I really need to go that way?
Given the path from the REST API question's answer, we can get the data with the private _get_json:

path = 'issue/createmeta?projectKeys={KEY}&expand=projects.issuetypes.fields'
data = jira_connection._get_json(path.format(KEY=project_key))

project_fields = {}
for issuetype in data['projects'][0]['issuetypes']:
    project_fields[issuetype['name']] = dict(
        (f, v['name']) for f, v in issuetype['fields'].items())
This will result in a project_fields dictionary like:
{
    "ISSUE_TYPE_NAME": {
        "FIELD_ID": "FIELD_NAME",
        ...
    },  // for example:
    "Task": {
        "summary": "Summary",
        "issuetype": "Issue Type",
        ...
    }
}
This should do, at least until such a feature exists in the jira package directly.
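For example, a short usage sketch built on the project_fields dict above (issue, project_fields and the field names mirror the question's example):

# Guard the update: only set timetracking if it is on the Task screen.
if 'timetracking' in project_fields.get('Task', {}):
    issue.update(fields={'timetracking': {'originalEstimate': '4h'}})
else:
    print('timetracking cannot be set on Task issues in this project')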

Any suggestions to customize an analyzer when using query match with elasticsearch-py?

I can't apply a custom analyzer when using a match query with elasticsearch-py.
I created a custom analyzer called custom_lowercase_stemmed and used es.indices.put_settings to update the index settings.
However, the analyzer couldn't be found when I ran a search.
I also looked into the analyzer parameter of es.search, but it returns an error:
..unrecognized parameter: [analyzer]
Can I get any suggestions in terms of a customized analyzer? Thank you!
query_body = {
    "query": {
        "match": {
            "employer": {
                "query": txt,
                "fuzziness": 'AUTO',
                "analyzer": 'custom_lowercase_stemmed'
            }
        }
    }
}
es.search(index='hello', body=query_body)
Here is the full error:
RequestError: RequestError(400, 'search_phase_execution_exception', '[match] analyzer [custom_lowercase_stemmed] not found')
I think you have to make sure you have the following:
Your configuration is set properly. In your case, the custom_lowercase_stemmed analyzer must actually be defined in the index settings before it can be referenced at query time; that is also where you can define any stop words you want.
With the Python ES client, you can pass analyzer as a parameter of the .search() method (check the docs), but note it only applies to query-string (q=) searches. Otherwise, you can try to run your query as it is; I haven't played that much with analyzers.
Hope this is helpful! :D
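To illustrate the first point, a minimal sketch of registering the analyzer on an existing index with elasticsearch-py; the tokenizer and token filters here are illustrative assumptions (analysis settings can only be changed while the index is closed):

# Close the index, add the custom analyzer to its settings, then reopen.
es.indices.close(index='hello')
es.indices.put_settings(index='hello', body={
    'analysis': {
        'analyzer': {
            'custom_lowercase_stemmed': {
                'type': 'custom',
                'tokenizer': 'standard',
                'filter': ['lowercase', 'porter_stem']
            }
        }
    }
})
es.indices.open(index='hello')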
Ensure that you have specified your analyzer in your mapping, and ensure you're querying the correct field as well.
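A hedged sketch of that, reusing the index and field from the question (note that an existing field's analyzer cannot be changed in place; set it before indexing any documents, or reindex):

# Map the employer field to use the custom analyzer at index time
# (and, by default, at search time as well).
es.indices.put_mapping(index='hello', body={
    'properties': {
        'employer': {
            'type': 'text',
            'analyzer': 'custom_lowercase_stemmed'
        }
    }
})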
For your question on the matching problem for removing duplicate names: at term level and with short words, the fuzziness and wildcard parameters would be the best fit!
Cheers,
Min Han (:

Best practice for collections in jsons: array vs dict/map

I need to pass data from a Python back end to a front end through an API call, using a JSON format. In the Python back end, the data is in a dictionary structure, which I can easily and directly convert to JSON. But should I?
My front-end developer believes the answer is no, for reasons related to best practice.
But I challenge that:
Is it best to structure the JSON as it is in Python, or should it rather be converted to some other form, such as several arrays (as would be necessary in my example case below)?
Or, differently put:
What should the governing principles be for collections/dicts/maps/arrays when interfacing information through JSON?
I've done some googling for an answer, but I've not come across much that addresses this directly. Links would be appreciated.
(Note about the example below: of course if the data is written to a database, it would probably make most sense for the front-end to access the database directly, but let's assume this is not the case)
Example:
In the back end there is a collection of objects called pets:
each item in the collection has a unique pet_id; some non-optional properties, e.g. name and date_of_birth; some optional properties, e.g. registration_certificate_nr and adopted_from_kennel; some lists like siblings and children; and some objects like medication.
Assuming that the front end needs all of this info at some point, it could be:
{
    "pets": {
        "17-01-24-01": {
            "name": "Buster",
            "date_of_birth": "04/01/2017",
            "registration_certificate_nr": "AAD-1123-1432"
        },
        "17-03-04-01": {
            "name": "Hooch",
            "date_of_birth": "05/02/2015",
            "adopted_from_kennel": "Pretoria Shire",
            "children": [
                "17-05-01-01",
                "17-05-01-02",
                "17-05-01-03"
            ]
        },
        "17-05-01-01": {
            "name": "Snappy",
            "date_of_birth": "17-05-01",
            "siblings": [
                "17-05-01-02",
                "17-05-01-03"
            ]
        },
        "17-05-01-02": {
            "name": "Gizmo",
            "date_of_birth": "17-05-01",
            "siblings": [
                "17-05-01-01",
                "17-05-01-03"
            ]
        },
        "17-05-01-03": {
            "name": "Toothless",
            "date_of_birth": "17-05-01",
            "siblings": [
                "17-05-01-01",
                "17-05-01-02"
            ],
            "medication": [
                {
                    "name": "anti-worm",
                    "code": "aw445",
                    "dosage": "1 pill per day"
                },
                {
                    "name": "disinfectant",
                    "code": "pdi-2",
                    "dosage": "as required"
                }
            ]
        }
    }
}
JSON formatting is a somewhat subjective matter, and related disagreements are usually best settled between colleagues.
That being said, there are some potentially valid criticisms to be made against the JSON format in the question, especially if we are trying to create a consistent, RESTful API.
The 2 pain points that stand out:
A map collection is represented as a JSON object, which isn't really what the JSON standard intends, nor particularly RESTful.
None of the pet objects have an id defined. There is a pet_id mentioned in the question, but it is maintained separately from the pet object itself, as the map key. If a value is accessed in the pets map in the question, a user of the API would have to manually add the pet_id to the provided pet object in order to have the id available further down the line, when the full JSON may no longer be available.
The closest things we have to guiding standards in this situation is the REST architectural style and the JSON standard.
We can start by looking at the JSON standard. Here is a quote from the JSON wiki:
JavaScript syntax defines several native data types that are not included in the JSON standard: Map, Set, Date, Error, Regular Expression, Function, Promise, and undefined.
The key takeaway here is that JSON is not meant to represent the map data type. Python dictionaries are a map implementation, so directly serializing a dictionary to JSON with the intent to represent a map-like collection goes against the intended use of JSON.
For an individual object like a pet, the JSON object is appropriate, but for collections there is one option: the JSON array. There is a usage example with the JSON array further down in this answer.
There may be edge cases where deviating from the standard makes sense, but I don't see a reason in this scenario.
There are also some shortcomings in the JSON format from a RESTful design perspective. RESTful API design is nice because it encourages one to keep things simple and consistent. It also happens to be a de facto industry standard.
In a RESTful HTTP API, this is how fetching a single pet resource should look:
Request: GET /api/pets/17-01-24-01
Response: 200 {
    "id": "17-01-24-01",
    "name": "Buster",
    "date_of_birth": "04/01/2017",
    "registration_certificate_nr": "AAD-1123-1432"
}
The response is a completely defined resource with an explicitly defined id. It is also the simplest complete JSON representation of a pet.
Next, we define what fetching multiple pet resources looks like, assuming only 2 pets are defined:
Request: GET /api/pets
Response: 200 [
    {
        "id": "17-01-24-01",
        "name": "Buster",
        "date_of_birth": "04/01/2017",
        "registration_certificate_nr": "AAD-1123-1432"
    },
    {
        "id": "17-03-04-01",
        "name": "Hooch",
        "date_of_birth": "05/02/2015",
        "adopted_from_kennel": "Pretoria Shire",
        "children": [
            "17-05-01-01",
            "17-05-01-02",
            "17-05-01-03"
        ]
    }
]
The above response format is the most straightforward way to pluralize the single-resource response format, thus keeping the API as simple and consistent as possible (for the sake of brevity, I only used 2 of the sample resources from the question). Once again, the ids are explicitly defined, and belong to their respective pet objects.
Nothing is gained from adding map keys to the above format.
Proponents of the JSON format in the question may suggest just adding the id field into each pet object in order to work around pain point 2, but that would introduce repeated data within the response. Why does the id need to be both inside and outside the object? Surely it should only be on the inside? After eliminating the redundant data, the result will look like the response above.
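In Python terms, flattening the question's dict-of-dicts into this array form is a one-liner; a minimal sketch with the data abbreviated:

import json

pets_by_id = {
    '17-01-24-01': {'name': 'Buster', 'date_of_birth': '04/01/2017'},
    '17-03-04-01': {'name': 'Hooch', 'date_of_birth': '05/02/2015'},
}

# Move each map key inside its object as an explicit "id".
pets = [{'id': pet_id, **fields} for pet_id, fields in pets_by_id.items()]
print(json.dumps(pets, indent=2))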
That is the REST argument. There are use cases where REST doesn't really work, but this is far from that.
PS. Front ends should never access databases directly. The API is responsible for writing to and reading from whatever data persistence infrastructure is used. In a lot of bigger real world systems, there is even an additional BFF layer between the front end and the API(s), separating the front end and the DB even further.

MongoDB Update with Array Filters [duplicate]

I am trying to update a value in the nested array but can't get it to work.
My object is like this
{
    "_id": {
        "$oid": "1"
    },
    "array1": [
        {
            "_id": "12",
            "array2": [
                {
                    "_id": "123",
                    "answeredBy": []  // need to push "success"
                },
                {
                    "_id": "124",
                    "answeredBy": []
                }
            ]
        }
    ]
}
I need to push a value to "answeredBy" array.
In the example below, I tried pushing the string "success" to the "answeredBy" array of the object with _id "123", but it does not work.
callback = function (err, value) {
    if (err) {
        res.send(err);
    } else {
        res.send(value);
    }
};

conditions = {
    "_id": 1,
    "array1._id": 12,
    "array2._id": 123
};

updates = {
    $push: {
        "array2.$.answeredBy": "success"
    }
};

options = {
    upsert: true
};

Model.update(conditions, updates, options, callback);
I found this link, but its answer only says I should use an object-like structure instead of arrays. This cannot be applied in my situation; I really need my objects to be nested in arrays.
It would be great if you can help me out here. I've been spending hours to figure this out.
Thank you in advance!
General Scope and Explanation
There are a few things wrong with what you are doing here. Firstly, your query conditions: you are referring to several _id values where you should not need to, at least one of which is not at the top level.
In order to get at a "nested" value, and also presuming that the _id value is unique and would not appear in any other document, your query form should be like this:
Model.update(
    { "array1.array2._id": "123" },
    { "$push": { "array1.0.array2.$.answeredBy": "success" } },
    function (err, numAffected) {
        // something with the result in here
    }
);
Now that would actually work, but really it is only a fluke that it does as there are very good reasons why it should not work for you.
The important reading is in the official documentation for the positional $ operator under the subject of "Nested Arrays". What this says is:
The positional $ operator cannot be used for queries which traverse more than one array, such as queries that traverse arrays nested within other arrays, because the replacement for the $ placeholder is a single value
Specifically what that means is the element that will be matched and returned in the positional placeholder is the value of the index from the first matching array. This means in your case the matching index on the "top" level array.
So if you look at the query notation as shown, we have "hardcoded" the first ( or 0 index ) position in the top level array, and it just so happens that the matching element within "array2" is also the zero index entry.
To demonstrate this, you can change the matching _id value to "124" and the result will $push a new entry onto the element with _id "123", as they are both in the zero-index entry of "array1" and that is the value returned to the placeholder.
So that is the general problem with nesting arrays. You could remove one of the levels and you would still be able to $push to the correct element in your "top" array, but there would still be multiple levels.
Try to avoid nesting arrays as you will run into update problems as is shown.
The general case is to "flatten" the things you "think" are "levels" and actually make these "attributes" on the final detail items. For example, the "flattened" form of the structure in the question should be something like:
{
    "answers": [
        { "by": "success", "type2": "123", "type1": "12" }
    ]
}
Or even when accepting the inner array is $push only, and never updated:
{
    "array": [
        { "type1": "12", "type2": "123", "answeredBy": ["success"] },
        { "type1": "12", "type2": "124", "answeredBy": [] }
    ]
}
Both of these lend themselves to atomic updates within the scope of the positional $ operator.
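Sticking with Python for a moment (most of this page is Python), a hedged PyMongo sketch of an update against that second, flattened form; $elemMatch pins both conditions to the same element, and the database/collection names are placeholders:

from pymongo import MongoClient

coll = MongoClient()['test']['models']  # placeholder names

# "$" resolves to the index of the first array element matched by the query.
coll.update_one(
    {'array': {'$elemMatch': {'type1': '12', 'type2': '123'}}},
    {'$push': {'array.$.answeredBy': 'success'}},
)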
MongoDB 3.6 and Above
From MongoDB 3.6 there are new features available to work with nested arrays. This uses the positional filtered $[<identifier>] syntax in order to match the specific elements and apply different conditions through arrayFilters in the update statement:
Model.update(
    {
        "_id": 1,
        "array1": {
            "$elemMatch": {
                "_id": "12",
                "array2._id": "123"
            }
        }
    },
    {
        "$push": { "array1.$[outer].array2.$[inner].answeredBy": "success" }
    },
    {
        "arrayFilters": [{ "outer._id": "12" }, { "inner._id": "123" }]
    }
)
The "arrayFilters" as passed to the options for .update() or even
.updateOne(), .updateMany(), .findOneAndUpdate() or .bulkWrite() method specifies the conditions to match on the identifier given in the update statement. Any elements that match the condition given will be updated.
Because the structure is "nested", we actually use "multiple filters" as is specified with an "array" of filter definitions as shown. The marked "identifier" is used in matching against the positional filtered $[<identifier>] syntax actually used in the update block of the statement. In this case inner and outer are the identifiers used for each condition as specified with the nested chain.
This new expansion makes the update of nested array content possible, but it does not really help with the practicality of "querying" such data, so the same caveats apply as explained earlier.
What you typically really "mean" to express is "attributes"; even if your brain initially thinks "nesting", that is usually just a reaction to how you believe the "previous relational parts" come together. In reality, you really need more denormalization.
Also see How to Update Multiple Array Elements in mongodb, since these new update operators actually match and update "multiple array elements" rather than just the first, which has been the previous action of positional updates.
NOTE Somewhat ironically, since this is specified in the "options" argument for .update() and like methods, the syntax is generally compatible with all recent release driver versions.
However this is not true of the mongo shell, since the way the method is implemented there ( "ironically for backward compatibility" ) the arrayFilters argument is not recognized and removed by an internal method that parses the options in order to deliver "backward compatibility" with prior MongoDB server versions and a "legacy" .update() API call syntax.
So if you want to use the command in the mongo shell or other "shell based" products ( notably Robo 3T ) you need a latest version from either the development branch or production release as of 3.6 or greater.
See also positional all $[] which also updates "multiple array elements" but without applying to specified conditions and applies to all elements in the array where that is the desired action.
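And since most of this page is Python, here is a hedged PyMongo (3.6+) sketch of the same nested arrayFilters update; the database/collection names are placeholders:

from pymongo import MongoClient

coll = MongoClient()['test']['models']  # placeholder names

# arrayFilters bind "outer" and "inner" to the matching elements at
# each nesting level, exactly as in the Mongoose example above.
coll.update_one(
    {'_id': 1, 'array1': {'$elemMatch': {'_id': '12', 'array2._id': '123'}}},
    {'$push': {'array1.$[outer].array2.$[inner].answeredBy': 'success'}},
    array_filters=[{'outer._id': '12'}, {'inner._id': '123'}],
)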
I know this is a very old question, but I just struggled with this problem myself and found what I believe to be a better answer.
A way to solve this problem is to use sub-documents. This is done by nesting schemas within your schemas:
// Define the innermost schema first, so each reference already exists.
Array2Schema = new mongoose.Schema({
    answeredBy: [...]
})

Array1Schema = new mongoose.Schema({
    array2: [Array2Schema]
})

MainSchema = new mongoose.Schema({
    array1: [Array1Schema]
})
This way the object will look like the one you show, but now each array is filled with sub-documents. This makes it possible to dot your way into the sub-document you want. Instead of using .update you then use .find or .findOne to get the document you want to update.
Main.findOne({ _id: 1 })
    .exec(function (err, result) {
        result.array1.id(12).array2.id(123).answeredBy.push('success')
        result.save(function (err) {
            console.log(result)
        });
    })
Haven't used the .push() function this way myself, so the syntax might not be right, but I have used both .set() and .remove(), and both work perfectly fine.

Add a validator to a Mongodb collection with pymongo

I am trying to add a validator to a MongoDB collection using pymongo.
The command I would like to run, adapted from here, is equivalent to this:
db.runCommand({
    collMod: "contacts",
    validator: { phone: { $type: 'string' } },
    validationLevel: "moderate"
})
{ "ok" : 1 }
And it will subsequently throw an error if a non-string datatype is inserted in the phone field.
Using python I did the following:
db.command({'collMod': 'contacts',
            'validator': {'phone': {'$type': 'string'}},
            'validationLevel': 'moderate'})
...
InvalidDocument: Cannot encode object: Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'test_table'), 'contacts')
I'm sure that my Python interpretation is wrong, that much is clear; however, I have not been able to find the correct translation, or whether this is even possible in Python.
I eventually found the solution here. Hopefully it can help someone else.
Of course, when all else fails, read the docs...
.. note:: the order of keys in the command document is significant (the "verb" must come first), so commands which require multiple keys (e.g. findandmodify) should use an instance of :class:`~bson.son.SON` or a string and kwargs instead of a Python dict
Also valid is an OrderedDict:

from collections import OrderedDict

query = [('collMod', 'contacts'),
         ('validator', {'phone': {'$type': 'string'}}),
         ('validationLevel', 'moderate')]
query = OrderedDict(query)
db.command(query)
{'ok': 1.0}
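As the note suggests, an instance of bson.son.SON works just as well; a minimal sketch:

from bson.son import SON

db.command(SON([('collMod', 'contacts'),
                ('validator', {'phone': {'$type': 'string'}}),
                ('validationLevel', 'moderate')]))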
EDIT:
Current documentation from which the above note comes. Note this was added after the question was originally answered, so the documentation has changed; however, it should still be relevant.
