I have a small json file, with the following lines:
{
"IdTitulo": "Jaws",
"IdDirector": "Steven Spielberg",
"IdNumber": 8,
"IdDecimal": "2.33"
}
An there is a schema in my db collection, named test_dec. This is what I've used to create the schema:
db.createCollection("test_dec",
{validator: {
$jsonSchema: {
bsonType: "object",
required: ["IdTitulo","IdDirector"],
properties: {
IdTitulo: {
"bsonType": "string",
"description": "string type, nombre de la pelicula"
},
IdDirector: {
"bsonType": "string",
"description": "string type, nombre del director"
},
IdNumber : {
"bsonType": "int",
"description": "number type to test"
},
IdDecimal : {
"bsonType": "decimal",
"description": "decimal type"
}
}
}}
})
I've made multiple attempts to insert the data. The problem is in the IdDecimal field value.
Some of the trials, replacing the IdDecimal line by:
"IdDecimal": 2.33
"IdDecimal": {"$numberDecimal": "2.33"}
"IdDecimal": NumberDecimal("2.33")
None of them work. The second one is the formal solution provided by MongoDB manuals (mongodb-extended-json) adn the error is the output I've placed in my question: bson.errors.InvalidDocument: key'$numberDecimal' must not start with '$'.
I am currently using a python to load the json. I've been playing around with this file:
import os,sys
import re
import io
import json
from pymongo import MongoClient
from bson.raw_bson import RawBSONDocument
from bson.json_util import CANONICAL_JSON_OPTIONS,dumps,loads
import bsonjs as bs
#connection
client = MongoClient('localhost',27018,document_class=RawBSONDocument)
db = client['myDB']
coll = db['test_dec']
other_col = db['free']
for fname in os.listdir('/mnt/win/load'):
num = re.findall("\d+", fname)
if num:
with io.open(fname, encoding="ISO-8859-1") as f:
doc_data = loads(dumps(f,json_options=CANONICAL_JSON_OPTIONS))
print(doc_data)
test = '{"idTitulo":"La pelicula","idRelease":2019}'
raw_bson = bs.loads(test)
load_raw = RawBSONDocument(raw_bson)
db.other_col.insert_one(load_raw)
client.close()
I am using a json file. If I try to parse anything like Decimal128('2.33') the output is "ValueError: No JSON object could be decoded", because my json has an invalid format.
The result of
db.other_col.insert_one(load_raw)
Is that the content of "test" is inserted.
But I cannot use doc_data with RawBSONDocument, because it goes like that. It says:
TypeError: unpack_from() argument 1 must be string or buffer, not list:
When I manage to parse the json directly to the RawBSONDocument I got all the trash within and the record in database looks like the sample here:
{
"_id" : ObjectId("5eb2920a34eea737626667c2"),
"0" : "{\n",
"1" : "\t\"IdTitulo\": \"Gremlins\",\n",
"2" : "\t\"IdDirector\": \"Joe Dante\",\n",
"3" : "\t\"IdNumber\": 6,\n",
"4" : "\"IdDate\": {\"$date\": \"2010-06-18T:00.12:00Z\"}\t\n",
"5" : "}\n"
}
It seems it is not that simple to load a extended json into MongoDB. The extended version is because I want to use schema validation.
Oleg pointed out that is numberDecimal and not NumberDecimal as I had it before. I've fixed the json file, but nothing changed.
Executed:
with io.open(fname, encoding="ISO-8859-1") as f:
doc_data = json.load(f)
coll.insert(doc_data)
And the json file:
{
"IdTitulo": "Gremlins",
"IdDirector": "Joe Dante",
"IdNumber": 6,
"IdDecimal": {"$numberDecimal": "3.45"}
}
One more roll of the dice from me. If you are using schema validation as you are, I would recommend defining a class and being explicit with defining each field and how you propose to convert the field to the relevant python datatypes. While your solution is generic, the data structure has to be rigid to match the validation.
IMO this is clearer and you have control over any errors etc within the class.
Just to confirm I ran the schema validation and this works with the supplied validation.
from pymongo import MongoClient
import bson.json_util
import dateutil.parser
import json
class Film:
def __init__(self, file):
data = file.read()
loaded = json.loads(data)
self.IdTitulo = loaded.get('IdTitulo')
self.IdDirector = loaded.get('IdDirector')
self.IdDecimal = bson.json_util.Decimal128(loaded.get('IdDecimal'))
self.IdNumber = int(loaded.get('IdNumber'))
self.IdDateTime = dateutil.parser.parse(loaded.get('IdDateTime'))
def insert_one(self, collection):
collection.insert_one(self.__dict__)
client = MongoClient()
mycollection = client.mydatabase.test_dec
with open('c:/temp/1.json', 'r') as jfile:
film = Film(jfile)
film.insert_one(mycollection)
gives:
> db.test_dec.findOne()
{
"_id" : ObjectId("5eba79eabf951a15d32843ae"),
"IdTitulo" : "Jaws",
"IdDirector" : "Steven Spielberg",
"IdDecimal" : NumberDecimal("2.33"),
"IdNumber" : 8,
"IdDateTime" : ISODate("2020-05-12T10:08:21Z")
}
>
JSON file used:
{
"IdTitulo": "Jaws",
"IdDirector": "Steven Spielberg",
"IdNumber": 8,
"IdDecimal": "2.33",
"IdDateTime": "2020-05-12T11:08:21+0100"
}
JSON with type information is called Extended JSON. Following the examples, construct extended json for your data:
ext_json = '''
{
"IdTitulo": "Jaws",
"IdDirector": "Steven Spielberg",
"IdNumber": 8,
"IdDecimal": {"$numberDecimal":"2.33"}
}
'''
In Python, use json_util to load extended json into a Python dictionary:
from bson.json_util import loads
doc = loads(ext_json)
print(doc)
# {u'IdTitulo': u'Jaws', u'IdDirector': u'Steven Spielberg', u'IdDecimal': Decimal128('2.33'), u'IdNumber': 8}
The result of this load is sometimes referred to as a "BSON document" but it is not BSON, which is binary. "BSON" in this context really means that some values are not of python standard library types. The "document" part basically means the object is a dictionary.
You will notice that IdNumber is of a non-standard library type:
print type(doc['IdDecimal'])
# <class 'bson.decimal128.Decimal128'>
To insert this dictionary into MongoDB, follow pymongo tutorial:
from pymongo import MongoClient
client = MongoClient('localhost', 14420)
db = client.test_database
collection = db.test_collection
collection.insert_one(doc)
print(doc)
Finally, I've got the solution and it is using RawBSONDocument.
First the json file:
{
"IdTitulo": "Dead Snow",
"IdDirector": "Tommy Wirkola",
"IdNumber": 11,
"IdDecimal": {"$numberDecimal": "2.22"}
}
& the validation schema file:
db.createCollection("test_dec",
{validator: {
$jsonSchema: {
bsonType: "object",
required: ["IdTitulo","IdDirector"],
properties: {
IdTitulo: {
"bsonType": "string",
"description": "string type, nombre de la pelicula"
},
IdDirector: {
"bsonType": "string",
"description": "string type, nombre del director"
},
IdNumber : {
"bsonType": "int",
"description": "number type to test"
},
IdDecimal : {
"bsonType": "decimal",
"description": "decimal type"
}
}
}}
})
So, the collection in this case is "test_dec".
And the python script that opens the file ".json", reads it and parses it to be imported into MongoDB.
import json
from bson.raw_bson import RawBSONDocument
from pymongo import MongoClient
import bsonjs
#connection
client = MongoClient('localhost',27018)
db = client['movieDB']
coll = db['test_dec']
#open an read file
with open('1.json', 'r') as jfile:
data = jfile.read()
loaded = json.loads(data)
dumped = json.dumps(loaded, indent=4)
bson_bytes = bsonjs.loads(dumped)
coll.insert_one(RawBSONDocument(bson_bytes))
client.close()
The inserted document:
{
"_id" : ObjectId("5eb971ec6fbab859dfae8a6f"),
"IdTitulo" : "Dead Snow",
"IdDirector" : "Toomy Wirkola",
"IdDecimal" : NumberDecimal("2.22"),
"IdNumber" : 11
}
I don't know how it flipped the fields IdDecimal and IdNumber, but it passes the validation and I am really happy.
I tried a document with 'hello' instead of a number in NumberDecimal and the insertion resulted in:
{
"_id" : ObjectId("5eb973b76fbab859dfae8ecd"),
"IdTitulo" : "Shining",
"IdDirector" : "Stanley Kubrick",
"IdDecimal" : NumberDecimal("NaN"),
"IdNumber" : 19
}
Thanks to all that tried to help. Specially Oleg!!! Thank you for being so patient.
Could you not just use bson.decimal128.Decimal128? Ot am I missing something?
from pymongo import MongoClient
from bson.decimal128 import Decimal128
db = MongoClient()['mydatabase']
data = {
"IdTitulo": "Jaws",
"IdDirector": "Steven Spielberg",
"IdNumber": 8,
"IdDecimal": "2.33"
}
data['IdDecimal'] = Decimal128(data['IdDecimal'])
db.other_col.insert_one(data)
I have a nested json and i want to find a record whose value is equal to a given number. I'm using pymongo in python wih equal operator but getting some errors:
from pymongo import MongoClient
import datetime
import json
import pprint
def connectDatabase(service):
try:
if service=="mongodb":
host = 'mongodb://connecion_string.mlab.com:31989'
database = 'xxx'
user = 'xxx'
password = 'xxx'
client = MongoClient(host)
client.xxx.authenticate(user, password, mechanism='SCRAM-SHA-1')
db = client.xxx
new_posts = [{"author": "Mike", "text": "Another post!",
"tags": [
{
"CSPAccountNo": "414693"
},
{
"CSPAccountNo": "349903"
}]
}]
result=db.posts1.insert_many(new_posts)
print (xxx.posts1.find( {"CSPAccountNo": { $eq: "414693" } } )
except Exception as e:
print(e)
pass
print (connectDatabase("mongodb"))
Error:
File "mongo.py", line 35
print (cstore.posts1.find( {"CSPAccountNo": { $eq: "414693" } } )
^
SyntaxError: invalid syntax
I'm very new to Mongodb. Any help would be appreciated
In Python you have to use Python syntax, not JS syntax like in the Mongo shell. That means that dictionary keys need quotes:
print(cstore.posts1.find( {"CSPAccountNo": { "$eq": "414693" } } )
(Note, in your code you never actually insert the new_posts into the collection, so your find call might not actually find anything.)
It should be
print (xxx.posts1.find( {"tags.CSPAccountNo": { $eq: "414693" } } )
Here is my sample mongodb database
database image for one object
The above is a database with an array of articles. I fetched only one object for simplicity purposes.
database image for multiple objects ( max 20 as it's the size limit )
I have about 18k such entries.
I have to extract the description and title tags present inside the (articles and 0) subsections.
The find() method is the question here.. i have tried this :
for i in db.ncollec.find({'status':"ok"}, { 'articles.0.title' : 1 , 'articles.0.description' : 1}):
for j in i:
save.write(j)
After executing the code, the file save has this :
_id
articles
_id
articles
and it goes on and on..
Any help on how to print what i stated above?
My entire code for reference :
import json
import newsapi
from newsapi import NewsApiClient
import pymongo
from pymongo import MongoClient
client = MongoClient()
db = client.dbasenews
ncollec = db.ncollec
newsapi = NewsApiClient(api_key='**********')
source = open('TextsExtractedTemp.txt', 'r')
destination = open('NewsExtracteddict.txt', "w")
for word in source:
if word == '\n':
continue
all_articles = newsapi.get_everything(q=word, language='en', page_size=1)
print(all_articles)
json.dump(all_articles, destination)
destination.write("\n")
try:
ncollec.insert(all_articles)
except:
pass
Okay, so I checked a little to update my rusty memory of pymongo, and here is what I found.
The correct query should be :
db.ncollec.find({ 'status':"ok",
'articles.title' : { '$exists' : 'True' },
'articles.description' : { '$exists' : 'True' } })
Now, if you do this :
query = { 'status' : "ok",
'articles.title' : { '$exists' : 'True' },
'articles.description' : { '$exists' : 'True' } }
for item in db.ncollect.find(query):
print item
And that it doesn't show anything, the query is correct, but you don't have the right database, or the right tree, or whatever.
But I assure you, that with the database you showed me, that if you do...
query = { 'status' : "ok",
'articles.title' : { '$exists' : 'True' },
'articles.description' : { '$exists' : 'True' } }
for item in db.ncollect.find(query):
save.write(item[0]['title'])
save.write(item[0]['description'])
It'll do what you wished to do in the first place.
Now, the key item[0] might not be good, but for this, I can't really be of any help since it is was you are showing on the screen. :)
Okay, now. I have found something for you that is a bit more complicated, but is cool :)
But I'm not sure if it'll work for you. I suspect you're giving us a wrong tree, since when you do .find( {'status' : 'ok'} ), it doesn't return anything, and it should return all the documents with a 'status' : 'ok', and since you have lots...
Anyways, here is the query, that you should use with .aggregate() method, instead of .find() :
elem = { '$match' : { 'status' : 'ok', 'articles.title' : { '$exists' : 'True'}, 'articles.description' : { '$exists' : 'True'}} }
[ elem, { '$unwind' : '$articles' }, elem ]
If you want an explanation as to how this works, I invite you to read this page.
This query will return ONLY the elements in your array that have a title, and a description, with a status OK. If an element doesn't have a title, or a description, it will be ignored.
I'm using python's elasticsearch module to connect and search through my elasticsearch cluster.
In the cluster, one of the fields in my index is 'message' - I want to query my elastic, from python, for a specific value in this 'message' field.
Here is my basic search which simply returns all logs of a specific index.
es = elasticsearch.Elasticsearch(source_cluster)
doc = {
'size' : 10000,
'query': {
'match_all' : {}
}
}
res = es.search(index='test-index', body=doc, scroll='1m')
How should I change this query in order to find all results with the word 'moved' in their 'message' field?
The equivalent query that does it from Kibana is:
_index:test-index && message: moved
Thanks,
Noam
You need to use the match query. Try this:
doc = {
'size' : 10000,
'query': {
'match' : {
'message': 'moved'
}
}
}
How to use the "suggest" feature in pyes? Cannot seem to figure it out due to poor documentation. Could someone provide a working example? None of what I tried appears to work. In the docs its listed under query, but using:
query = Suggest(fields="fieldname")
connectionobject.search(query=query)
Since version 5:
_suggest endpoint has been deprecated in favour of using suggest via _search endpoint. In 5.0, the _search endpoint has been optimized for suggest only search requests.
(from https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-suggesters.html)
Better way to do this is using search api with suggest option
from elasticsearch import Elasticsearch
es = Elasticsearch()
text = 'ra'
suggest_dictionary = {"my-entity-suggest" : {
'text' : text,
"completion" : {
"field" : "suggest"
}
}
}
query_dictionary = {'suggest' : suggest_dictionary}
res = es.search(
index='auto_sugg',
doc_type='entity',
body=query_dictionary)
print(res)
Make sure you have indexed each document with suggest field
sample_entity= {
'id' : 'test123',
'name': 'Ramtin Seraj',
'title' : 'XYZ',
"suggest" : {
"input": [ 'Ramtin', 'Seraj', 'XYZ'],
"output": "Ramtin Seraj",
"weight" : 34 # a prior weight
}
}
Here is my code which runs perfectly.
from elasticsearch import Elasticsearch
es = Elasticsearch()
text = 'ra'
suggDoc = {
"entity-suggest" : {
'text' : text,
"completion" : {
"field" : "suggest"
}
}
}
res = es.suggest(body=suggDoc, index="auto_sugg", params=None)
print(res)
I used the same client mentioned on the elasticsearch site here
I indexed the data in the elasticsearch index by using completion suggester from here