I'm currently trying to save some Python objects (websites) via PyRavenDB in a RavenDB database. The data appears to be saved properly, but when I test it by querying the results, some of the attributes come back empty.
The code is simple, and I can't find the problem.
The JSON object in the database is the following (verified via the DB web UI).
{
"htmlCode": "<code>TEST HTML</code>",
"added": "2017-02-21",
"uniqueid": "262e4584f3e546afa2c67045a0096b54",
"url": "www.example.com",
"myHash": "d41d8cd98f00b204e9800998ecf8427e",
"lastaccessed": "2017-02-21"
}
When I use this code to query
from pyravendb.store import document_store
store = document_store.documentstore(url="http://somewhere:someport", database="websites")
store.initialize()
with store.open_session() as session:
    query_result = list(session.query().where_equals("www.example.com", url))
    print query_result
    print type(query_result)
    return query_result
It returns this object:
{
'uniqueid': 'f942e86f965d4709a2d69caca3001f2a',
'url': '',
'myHash': 'd41d8cd98f00b204e9800998ecf8427e',
'htmlCode': '',
'added': '2017-02-21',
'lastaccessed': '2017-02-21'
}
As you can see, url and htmlCode are empty. They should be fine, since they are properly stored in the DB.
Thanks.
The problem here is that you aren't using where_equals correctly.
The first argument of where_equals is the field name you want to query on, and the second is the value (def where_equals(self, field_name, value)).
Just change this line: query_result = list(session.query().where_equals("www.example.com", url))
To this: query_result = list(session.query().where_equals("url", "www.example.com"))
This will fix your problem.
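For reference, a minimal corrected sketch of the querying code, reusing the store setup from the question (the server URL, database name, and lowercase documentstore class are taken from the snippet above):
from pyravendb.store import document_store

store = document_store.documentstore(url="http://somewhere:someport", database="websites")
store.initialize()

with store.open_session() as session:
    # where_equals(field_name, value): the field name comes first, then the value to match
    query_result = list(session.query().where_equals("url", "www.example.com"))
    print(query_result)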
I'm working on a REST application in Python Flask with the pymongo driver, but anyone who knows MongoDB well may be able to answer my question.
Suppose I'm inserting a new document into a collection, say students. I want to get the whole inserted document back as soon as it is saved in the collection. Here is what I've tried so far:
res = db.students.insert_one({
"name": args["name"],
"surname": args["surname"],
"student_number": args["student_number"],
"course": args["course"],
"mark": args["mark"]
})
If I call:
print(res.inserted_id)  # I get the id
How can I get something like:
{
"name": "student1",
"surname": "surname1",
"mark": 78,
"course": "ML",
"student_number": 2
}
from the res object? Because if I print res I get <pymongo.results.InsertOneResult object at 0x00000203F96DCA80>.
Put the data to be inserted into a dictionary variable; on insert, the variable will have the _id added by pymongo.
from pymongo import MongoClient
db = MongoClient()['mydatabase']
doc = {
"name": "name"
}
db.students.insert_one(doc)
print(doc)
prints:
{'name': 'name', '_id': ObjectId('60ce419c205a661d9f80ba23')}
Unfortunately, the commenters are correct. The PyMongo pattern doesn't specifically allow for what you are asking. You are expected to just use the inserted_id from the result, and if you need the full object from the collection later, do a regular query operation afterwards.
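For example, a minimal sketch of that follow-up query (the collection and field names simply mirror the question):
res = db.students.insert_one({
    "name": args["name"],
    "surname": args["surname"],
    "student_number": args["student_number"],
    "course": args["course"],
    "mark": args["mark"]
})

# Fetch the full stored document (including the generated _id) by the returned id.
inserted_doc = db.students.find_one({"_id": res.inserted_id})
print(inserted_doc)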
I'm trying to do a full text search using Atlas for MongoDB. I'm doing this through the PyMongo driver in Python. I'm using the aggregate pipeline, and doing a $search but it seems to return nothing.
cursor = db.collection.aggregate([
    {"$search": {"text": {"query": "hello", "path": "text_here"}}},
    {"$project": {"file_name": 1}}
])
for x in cursor:
    print(x)
What I'm trying to achieve with this code is to search a field in the collection called "text_here" for the term "hello" and return all results that contain that term, listed by their "file_name". However, it returns nothing, and I'm quite confused, as this is almost identical to the example code on the documentation website. The only thing I can think of right now is that the path isn't correct and it can't access the field I've specified. Also, this code raises no errors; it simply returns nothing, as I've verified by looping through the cursor.
I had the same issue. I solved it by also passing the name of the index in the query. For example:
{
    index: "name_of_the_index",
    text: {
        query: 'john doe',
        path: 'name'
    }
}
I followed the tutorials but couldn't get any result back without specifying the "index" name. I wish this was mentioned in the documentation as mandatory.
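In PyMongo terms, the same fix applied to the pipeline from the question would look roughly like this (a sketch; "name_of_the_index" is a placeholder for your actual Atlas Search index name, and Atlas only falls back to an index literally named "default" when the field is omitted):
cursor = db.collection.aggregate([
    {"$search": {
        "index": "name_of_the_index",  # placeholder: use your Atlas Search index name
        "text": {"query": "hello", "path": "text_here"}
    }},
    {"$project": {"file_name": 1}}
])
for x in cursor:
    print(x)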
If you are only doing a find and project, you don't need an aggregate query, just a find(). The syntax you want is:
db.collection.find({'$text': {'$search': 'hello'}}, {'file_name': 1})
Equivalent using aggregate:
cursor = db.collection.aggregate([
    {'$match': {'$text': {'$search': 'hello'}}},
    {'$project': {'file_name': 1}}])
Worked example:
from pymongo import MongoClient, TEXT
db = MongoClient()['mydatabase']
db.collection.create_index([('text_here', TEXT)])
db.collection.insert_one({"text_here": "hello, is it me you're looking for", "file_name": "foo.bar"})
cursor = db.collection.find({'$text': {'$search': 'hello'}}, {'file_name': 1})
for item in cursor:
    print(item)
prints:
{'_id': ObjectId('5fc81ce9a4a46710459de610'), 'file_name': 'foo.bar'}
I use dynamic mapping in Elasticsearch to load my JSON file into Elasticsearch, like this:
import json
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

def extract():
    f = open('tmdb.json')
    if f:
        return json.loads(f.read())

movieDict = extract()

def index(movieDict={}):
    for id, body in movieDict.items():
        es.index(index='tmdb', id=id, doc_type='movie', body=body)

index(movieDict)
How can I update the mapping for a single field? I have a field, title, to which I want to add a different analyzer.
title_settings = {"properties" : { "title": {"type" : "text", "analyzer": "english"}}}
es.indices.put_mapping(index='tmdb', body=title_settings)
This fails.
I know that I cannot update an already existing index, but what is the proper way to reindex the mapping generated from my JSON file? My file has a lot of fields, and creating the mapping/settings manually would be very troublesome.
I am able to specify an analyzer for a query, like this:
query = {"query": {
"multi_match": {
"query": userSearch, "analyzer":"english", "fields": ['title^10', 'overview']}}}
How do I specify it for an index or a field?
I am also able to put an analyzer into the settings after closing and reopening the index:
analysis = {'settings': {'analysis': {'analyzer': 'english'}}}
es.indices.close(index='tmdb')
es.indices.put_settings(index='tmdb', body=analysis)
es.indices.open(index='tmdb')
Copying the exact settings for the english analyzer doesn't 'activate' it for my data.
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/analysis-lang-analyzer.html#english-analyzer
By 'activate' I mean that search results are not returned in a form processed by the english analyzer, i.e. there are still stopwords.
Solved it with a massive amount of googling...
You cannot change the analyzer on already indexed data. This includes closing/opening the index. You can specify a new index, create the new mapping, and load your data into it (quickest way).
Specifying an analyzer for the whole index isn't a good solution, as the 'english' analyzer is specific to 'text' fields. It's better to specify the analyzer per field.
If analyzers are specified per field, you also need to specify the field type.
You need to remember that analyzers can be used at index time and/or search time. Reference: Specifying analyzers
Code:
import time

def create_index(movieDict={}, mapping={}):
    es.indices.create(index='test_index', body=mapping)
    start = time.time()
    for id, body in movieDict.items():
        es.index(index='test_index', id=id, doc_type='movie', body=body)
    print("--- %s seconds ---" % (time.time() - start))
Now I've got the mapping generated by dynamic mapping of my JSON file. I just saved it back to a JSON file for ease of processing (editing), because I have over 40 fields to map and doing it by hand would be tiresome.
mapping = es.indices.get_mapping(index='tmdb')
This is an example of how the title key should be specified to use the english analyzer:
'title': {'type': 'text', 'analyzer': 'english','fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}
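Putting it together, a rough sketch of the workflow (the file name and test_index are just examples, and this assumes the same old client version as above): dump the dynamically generated mapping, edit the title entry by hand as shown, then recreate the index with the edited mapping and reload the documents:
import json

# Dump the dynamically generated mapping so it can be edited by hand.
mapping = es.indices.get_mapping(index='tmdb')
with open('mapping.json', 'w') as f:
    json.dump(mapping['tmdb'], f)  # keep only the body under the index name

# ... edit the 'title' entry in mapping.json as shown above ...

with open('mapping.json') as f:
    edited_mapping = json.load(f)

# Recreate the index with the edited mapping and reload the documents.
create_index(movieDict, edited_mapping)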
I am using Flask/SQLAlchemy to create a web app with a map in it, so naturally I'm using a PostGIS database. The geom column requires an ST_Transform and somehow I need to turn this column and all others into JSON. The general structure of the database is:
from app import login, db
from datetime import datetime
from geoalchemy2 import Geometry
from time import time
from flask import current_app
from sqlalchemy import func

class Streets(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    street = db.Column(db.String(50))
    geom = db.Column(Geometry(geometry_type='LINESTRING'))

    def to_dict(self):
        data = {
            'id': self.id,
            'street': self.street,
            '_geom': func.ST_AsGeoJSON(func.ST_Transform(self.geom, 4326))
        }
        return data
My API route turns this result into a JSON response:
return jsonify(Streets.query.get_or_404(id).to_dict())
But I keep getting this error: NameError: name 'ST_AsGeoJSON' is not defined
I also tried to create my _geom value like this:
data['_geom'] = db.session.query(func.ST_AsGeoJSON(func.ST_Transform(self.geom, 4326)))
The error message is: TypeError: Object of type 'BaseQuery' is not JSON serializable
Finally, I tried an api route like this:
data = Streets.to_dict(
    db.session.query(
        func.ST_AsGeoJSON(
            func.ST_Transform(
                Streets.geom, 4326
            )
        )
    ).filter(Streets.id == id))
return jsonify(data)
And I get a different error:
AttributeError: 'BaseQuery' object has no attribute 'id'
If I run this in flask shell it works:
streets = db.session.query(
    Streets.id,
    Streets.street,
    func.ST_AsGeoJSON(func.ST_Transform(Streets.geom, 4326)))
How can I perform ST_Transform and get JSON to my api route?
UPDATE
I found this in the SQLAlchemy documentation that got me some progress: "orm.column_property() can be used to map a SQL expression". So I tried adding this to my class Streets(db.Model):
coords = db.column_property(func.ST_AsGeoJSON(func.ST_Transform(geom, 4326)))
Then I add it to data like this:
def to_dict(self):
    data = {
        'id': self.id,
        'street': self.street,
        'coords': self.coords
    }
    return data
But now I'm double-encoding my results: once into GeoJSON, and then again when I jsonify it:
return jsonify(Streets.query.get_or_404(id).to_dict())
So my API response ends up full of backslash escapes:
{"coords": "{\"type\":\"MultiLineString\",\"coordinates\":[[[-80.8357132798193,35.2260689001034],[-80.8347602582754,35.2252424284259]]]}"}
And using ST_AsText just turns it into text:
{"coords": "MULTILINESTRING((-80.8357132798193 35.2260689001034,-80.8347602582754 35.2252424284259))"}
I think I'm close with this update, but does anyone have a suggestion for getting correct GeoJSON with the JSON of the other fields of my database?
The first error
NameError: name 'ST_AsGeoJSON' is not defined
means that your example code is not what you were actually using. You had forgotten to access it through func. It would not work after fixing that either, since you'd be mixing the SQL world and the Python world. func.ST_AsGeoJSON(...) creates an SQL function expression object that is supposed to be compiled to SQL and sent to the DB in a query, not passed to jsonify().
The second error
TypeError: Object of type 'BaseQuery' is not JSON serializable
should be somewhat obvious.
data['_geom'] = db.session.query(func.ST_AsGeoJSON(func.ST_Transform(self.geom, 4326)))
creates a Query, and far too broad a query at that, since you haven't limited it to fetch data for the current object. Moreover, the Query object itself is not JSON serializable.
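If you did want to go down that road, the expression would at least have to be scoped to the current row and actually executed, e.g. with .filter() and .scalar(); a rough sketch (the column_property() approach discussed further down is arguably cleaner):
import json
from sqlalchemy import func

def to_dict(self):
    # One scoped query for this row; ST_AsGeoJSON returns a JSON string, so decode it.
    geojson = db.session.query(
        func.ST_AsGeoJSON(func.ST_Transform(Streets.geom, 4326))
    ).filter(Streets.id == self.id).scalar()
    return {
        'id': self.id,
        'street': self.street,
        '_geom': json.loads(geojson),
    }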
In
data = Streets.to_dict(db.session.query(...)...)
you pass the Query object as self to Streets.to_dict(), which then tries to access its id attribute in
'id': self.id,
which fails for obvious reasons – namely passing an unrelated object as the instance to a method.
The column_property() approach produces the doubly encoded JSON because SQLAlchemy does not know that ST_AsGeoJSON returns JSON and by default treats the result as text (which, strictly speaking, is what it returns). Try decoding it manually in between:
# requires `import json` at the top of the module
def to_dict(self):
    data = {
        'id': self.id,
        'street': self.street,
        'coords': json.loads(self.coords)
    }
    return data
I'm attempting to create a web service using MongoDB and Flask (using the pymongo driver). A query to the database returns documents with the "_id" field included, of course. I don't want to send this to the client, so how do I remove it?
Here's a Flask route:
@app.route('/theobjects')
def index():
    objects = db.collection.find()
    return str(json.dumps({'results': list(objects)},
                          default=json_util.default,
                          indent=4))
This returns:
{
"results": [
{
"whatever": {
"field1": "value",
"field2": "value",
},
"whatever2": {
"field3": "value"
},
...
"_id": {
"$oid": "..."
},
...
}
]}
I thought it was a dictionary and I could just delete the element before returning it:
del objects['_id']
But that returns a TypeError:
TypeError: 'Cursor' object does not support item deletion
So it isn't a dictionary, but something I have to iterate over with each result as a dictionary. So I try to do that with this code:
for object in objects:
    del object['_id']
Each object dictionary looks the way I'd like it to now, but the objects cursor is empty. So I try to create a new dictionary and after deleting _id from each, add to a new dictionary that Flask will return:
new_object = {}
for object in objects:
    for key, item in object.items():
        if key == '_id':
            del object['_id']
    new_object.update(object)
This just returns a dictionary with the first-level keys and nothing else.
So this is sort of a standard nested dictionaries problem, but I'm also shocked that MongoDB doesn't have a way to easily deal with this.
The MongoDB documentation explains that you can exclude _id with
{ _id : 0 }
But that does nothing with pymongo. The Pymongo documentation explains that you can list the fields you want returned, but "(“_id” will always be included)". Seriously? Is there no way around this? Is there something simple and stupid that I'm overlooking here?
To exclude the _id field in a find query in pymongo, you can use:
db.collection.find({}, {'_id': False})
The documentation is somewhat misleading on this, as it says the _id field is always included, but you can exclude it as shown above.
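Applied to the route from the question, a sketch might look like this (assuming the remaining fields are plain JSON-serializable types; otherwise keep the json_util.default handler):
@app.route('/theobjects')
def index():
    # Exclude _id at the query level, so nothing needs to be stripped afterwards.
    objects = db.collection.find({}, {'_id': False})
    return json.dumps({'results': list(objects)}, indent=4)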
The above answer fails if we want specific fields and still want to ignore _id. Use the following in such cases (note that the projection is the second argument to find()):
db.collection.find({}, {'required_column_A': 1, 'required_col_B': 1, '_id': False})
You are calling
del objects['_id']
on the cursor object!
The cursor object is obviously an iterable over the result set, not a single document that you can manipulate.
for obj in objects:
    del obj['_id']
is likely what you want.
So your claim is completely wrong as the following code shows:
import pymongo
c = pymongo.Connection()
db = c['mydb']
db.foo.remove({})
db.foo.save({'foo' : 42})
for row in db.foo.find():
    del row['_id']
    print row
$ bin/python foo.py
> {u'foo': 42}
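Note that current PyMongo releases no longer have Connection(), remove(), or save(); a rough modern equivalent of the same demonstration would be:
from pymongo import MongoClient

c = MongoClient()
db = c['mydb']
db.foo.delete_many({})
db.foo.insert_one({'foo': 42})

# _id can still be deleted from each fetched document before use.
for row in db.foo.find():
    del row['_id']
    print(row)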