Mongoengine, retrieving only some of a MapField - python

For example, in MongoDB:
> db.test.findOne({}, {'mapField.BOXFLUX':1})
{
    "_id" : ObjectId("4fb7b248c450190a2000006a"),
    "mapField" : {
        "BOXFLUX" : {
            "a" : "f"
        }
    }
}
The 'mapField' field is a MongoEngine MapField.
It holds a lot of keys and data, but here I retrieved only 'BOXFLUX'.
The equivalent query does not work in MongoEngine.
For example:
BoxfluxDocument.objects( ~~ querying ~~ ).only('mapField.BOXFLUX')
As you can see, neither only('mapField.BOXFLUX') nor only('mapField__BOXFLUX') works.
Both retrieve all of the 'mapField' data, including 'BOXFLUX'.
How can I retrieve only one key of a MapField?

I see there is a ticket for this: https://github.com/hmarr/mongoengine/issues/508
It works for me. Here's an example test case:
def test_only_with_mapfields(self):
    class BlogPost(Document):
        content = StringField()
        author = MapField(field=StringField())
    BlogPost.drop_collection()
    post = BlogPost(content='Had a good coffee today...',
                    author={'name': "Ross", "age": "20"}).save()
    # Only the 'name' key of the map should be loaded.
    obj = BlogPost.objects.only('author__name',).get()
    self.assertEqual(obj.author['name'], "Ross")
    self.assertEqual(obj.author.get("age", None), None)

Try this:
query = BlogPost.objects({your: query})
if name:
    query = query.only('author__' + name)
else:
    query = query.only('author')

I found my mistake! I called only() twice.
For example:
BlogPost.objects.only('author').only('author__name')
I spent a whole day trying to work out what was wrong with MongoEngine.
So my wrong conclusion was to fall back to the raw collection:
BlogPost.objects()._collection.find_one(~~ filtering query ~~, {'author.' + name: 1})
But as you know, that returns raw data, not a MongoEngine query.
After this call I cannot use any MongoEngine methods on the result.
In my case, the query has to be built depending on some conditions,
so it would be great if a later only() call overrode the earlier ones, in my humble opinion.
I hope this feature is included in the next version. Right now I have to write duplicate code.
Not this code:
query = BlogPost.objects()
query = query( query~~).only('author')
if name:
    query = query.only('author__' + name)
But this code:
query = BlogPost.objects()
query = query( query~~).only('author')
if name:
    query = BlogPost.objects().only('author__' + name)
I think the second one looks dirtier than the first.
Of course, the first version returns all of the 'author' data,
because only('author') is still applied alongside only('author__name').
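One way to avoid the double only() call is to decide on the field name first and call only() exactly once. Below is a minimal sketch of that idea; the helper name author_projection is made up for illustration, and the MongoEngine call is shown only as a comment because it needs the BlogPost model and a database.

```python
def author_projection(name=None):
    """Return the field string to pass to a single only() call.

    'author' is the MapField from the example above; `name` is the
    optional map key. This helper is hypothetical, not part of MongoEngine.
    """
    return 'author__' + name if name else 'author'


print(author_projection('name'))
print(author_projection())

# Hypothetical usage with the model from this thread:
# query = BlogPost.objects(**filters).only(author_projection(name))
```

If your MongoEngine version provides QuerySet.all_fields(), which its documentation describes as resetting earlier only() or exclude() calls, that may be another option. I have not checked it against the version discussed here.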

Related

pymongo query for all items containing a unique identifier

I have a mongo collection with data structured in the following way:
content: {'description': {'text': [{'_date': '2019-05-21', '_sectionId': 'a13a', '_objectId': 'f637cee'},
                                   {'_date': '2019-05-21', '_objectId': '8b2ed183', '_source': 'f637cee'},
                                   { etc.... },
                                   {'_date': '2019-05-21', '_sectionId': 'a13a', '_objectId': 'XXXcee'}]
                         },
          'client' : {.....},
         }
I am looking for the way to query the collection to get a list of tuples in the following way:
given a section Id I would like to get the corresponding 'objectId'
In this case the result would be:
('a13a','f637cee'), ('a13a','XXXcee')
I started to do something like this:
import pymongo
myclient = pymongo.MongoClient(mongoconnection)
print('databases names:')
myclient.list_database_names()
# getting the collection:
mydb = myclient["clients"]
query = {'content.description.text._sectionId': 'a13a'}
cur = mydb.find(query)
But I don't know how to extract the information from the cursor.
Some help?
Note the info might be nested in different places, i.e. there are more nodes preceding "content" that can vary.
Thanks a lot
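Use the second parameter of find() to get only the required fields.
Ex:
query = {'content.description.text._sectionId': 'a13a'}
# The fields are nested, so the projection needs the full dotted paths.
projection = {'_id': 0, 'content.description.text._sectionId': 1, 'content.description.text._objectId': 1}
cur = mydb.find(query, projection)
print([(t['_sectionId'], t['_objectId']) for doc in cur
       for t in doc['content']['description']['text'] if t.get('_sectionId') == 'a13a'])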
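Since the question notes that the data might be nested in different places, here is a rough sketch of pulling the pairs out on the Python side instead, by walking each returned document recursively. It works on plain dicts, so it needs no database to try. The function name find_section_objects and the sample document are assumptions made for illustration.

```python
def find_section_objects(node, section_id):
    """Yield (_sectionId, _objectId) pairs found anywhere inside a nested document."""
    if isinstance(node, dict):
        # A matching entry must carry both the wanted section id and an object id.
        if node.get('_sectionId') == section_id and '_objectId' in node:
            yield (section_id, node['_objectId'])
        for value in node.values():
            yield from find_section_objects(value, section_id)
    elif isinstance(node, list):
        for item in node:
            yield from find_section_objects(item, section_id)


# Sample document shaped like the one in the question (illustrative only).
sample = {
    'content': {
        'description': {
            'text': [
                {'_date': '2019-05-21', '_sectionId': 'a13a', '_objectId': 'f637cee'},
                {'_date': '2019-05-21', '_objectId': '8b2ed183', '_source': 'f637cee'},
                {'_date': '2019-05-21', '_sectionId': 'a13a', '_objectId': 'XXXcee'},
            ]
        },
        'client': {},
    }
}

pairs = list(find_section_objects(sample, 'a13a'))
print(pairs)
```

With a real cursor you would call the function once per document, for example inside a loop over the cursor, and collect the results into one list.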

Python JSON scraping - how can I handle missing values?

I'm pretty new to coding, so I'm learning a lot as I go. This problem has me stumped, and even though I can find several similar questions on here, I can't find one that works or has a syntax I recognize.
I'm trying to scrape various user data from a JSON API, and then store those values in a MySQL database I've set up.
The code seems to run fine for the most part, but some users do not have the attributes I'm trying to scrape in the JSON, and thus I'm left with NoneType errors that I can't seem to avoid.
If possible, I'd like to just store "0" in the database where the JSON does not contain the attribute.
In the snippet below this works fine for users that have a job, but for users without a job, jobposition raises a NoneType error and apparently breaks the loop.
response = requests.get("URL")
json_obj = json.loads(response.text)
timer = json_obj['timestamp']
jobposition = json_obj['job']['position']
query = "INSERT INTO users (timer, jobposition) VALUES (%s, %s)"
values = (timer, jobposition)
cursor = db.cursor()
cursor.execute(query, values)
db.commit()
Thanks in advance!
You can use the dictionary's get() method for that, as follows:
timer = json_obj.get('timestamp', 0)
0 is the default value; if there is no 'timestamp' attribute, get() returns 0.
For the job position, you can do:
jobposition = json_obj['job'].get('position', 0) if 'job' in json_obj else 0
Try this:
try:
    jobposition = json_obj['job']['position']
except (KeyError, TypeError):  # 'job' missing, or present but null
    jobposition = 0
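The NoneType error in the question suggests that 'job' may be present but set to null for some users, and in that case the 'job' in json_obj check alone would not be enough. A small sketch of a helper that covers both a missing key and a null value; the name get_nested is made up for this example, and the sample dicts are assumptions about what the API returns.

```python
def get_nested(data, *keys, default=0):
    """Walk nested dicts by key; return `default` if any step is missing or null."""
    current = data
    for key in keys:
        # Stop as soon as we hit something that is not a dict or lacks the key.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return default if current is None else current


with_job = {'timestamp': 1589000000, 'job': {'position': 'Manager'}}
null_job = {'timestamp': 1589000000, 'job': None}
no_job = {'timestamp': 1589000000}

for json_obj in (with_job, null_job, no_job):
    timer = get_nested(json_obj, 'timestamp')
    jobposition = get_nested(json_obj, 'job', 'position')
    print(timer, jobposition)
```

The values can then go into the same INSERT statement as before, so every user produces a row.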
You can declare the data schema more clearly using dataclasses:
from dataclasses import dataclass
from validated_dc import ValidatedDC
@dataclass
class Job(ValidatedDC):
    name: str
    position: int = 0
@dataclass
class Workers(ValidatedDC):
    timer: int
    job: Job
input_data = {
    'timer': 123,
    'job': {'name': 'driver'}
}
workers = Workers(**input_data)
assert workers.job.position == 0
https://github.com/EvgeniyBurdin/validated_dc

Fetching data interactively from TinyDB

I'm trying to figure out how to use the data a user enters as input to get information from a TinyDB DB.
My DB looks something like this:
{"_default": {"1": {"switch": "n9k-c9372px", "names": ["nexus 9372px", "nexus 9372-px", "nexus9372px", "n9372px", "n9k-c9372px"], "fex_comp": ["2224tp", "2232pp"]}, "2": {"switch": "n9k-c9396px", "names": ["nexus 9396px", "nexus 9396-px", "nexus9396px", "n9396px", "n9k-c9396px"], "fex_comp": ["2232tm-e", "2248tp"]}}}
Basically, the DB is the result of two dictionaries with lists, like these:
{"switch": "switch1", "names": ["name1", "name2", "name3"], "fex_comp":["fex1", "fex2", "fex3"]}
My idea is the following:
1) Have a prompt asking for a switch model (q = input("Tell me the model")).
2) Take the input (q) from the user and check whether it matches any of the "names" in the database.
3) If it does, print the whole fex_comp list. Otherwise, print a different message.
I understand how to form the if, else, statements and also how to use for loops, but I haven't managed to figure out how to do what I describe above.
Any help is much appreciated!
Edvard
Like so then?
from tinydb import TinyDB, Query
ql = ['nexus9372px', 'nexus9396px', 'not_there']  # sample inputs to try
def mkdb():
    db = TinyDB('db.json')
    db.truncate()  # start from an empty table (TinyDB >= 4; older releases call this purge())
    db.insert({'switch': 'n9k-c9372px',
               'names': ['nexus 9372px',
                         'nexus 9372-px',
                         'nexus9372px', 'n9372px'],
               'fex_comp': ['2224tp', '2232pp',
                            '2232tm', '2232tm-e']})
    db.insert({"switch": "n9k-c9396px",
               "names": ["nexus 9396px", "nexus 9396-px",
                         "nexus9396px", "n9396px",
                         "n9k-c9396px"],
               "fex_comp": ["2232tm-e", "2248tp"]})
    return db
def get_name():
    return input('Name? ')
def search(name, db):
    Name = Query()
    # any([name]) matches documents whose 'names' list contains exactly this name;
    # passing the bare string would compare by substring instead.
    res = db.search(Name.names.any([name]))
    if res:
        print('fex_comp for {}: {}'.format(name, res[0]['fex_comp']))
    else:
        print('{} not found'.format(name))
db = mkdb()
name = get_name()
search(name, db)
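The lookup above is an exact match, so input such as "Nexus9372PX" or a stray space would not be found. As a rough sketch of the matching step on its own, here is a plain-Python version that normalizes both sides before comparing. It runs on a list of dicts rather than on TinyDB, and the names normalize and find_fex_comp are made up for this example.

```python
records = [
    {"switch": "n9k-c9372px",
     "names": ["nexus 9372px", "nexus 9372-px", "nexus9372px", "n9372px", "n9k-c9372px"],
     "fex_comp": ["2224tp", "2232pp"]},
    {"switch": "n9k-c9396px",
     "names": ["nexus 9396px", "nexus 9396-px", "nexus9396px", "n9396px", "n9k-c9396px"],
     "fex_comp": ["2232tm-e", "2248tp"]},
]


def normalize(text):
    """Lower-case and trim so that 'Nexus9372PX ' matches 'nexus9372px'."""
    return text.strip().lower()


def find_fex_comp(model, records):
    """Return the fex_comp list of the first record whose names include `model`, else None."""
    wanted = normalize(model)
    for record in records:
        if wanted in (normalize(n) for n in record["names"]):
            return record["fex_comp"]
    return None


for q in ["  Nexus9372PX ", "n9396px", "not_there"]:
    result = find_fex_comp(q, records)
    if result:
        print("fex_comp for {}: {}".format(q.strip(), result))
    else:
        print("{} not found".format(q))
```

With TinyDB itself, the records could come from db.all(), which returns the stored documents as a list.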

ElasticSearch and Python - Correct methodology

I am building a search engine for a list of articles I have. A lot of people advised me to use Elasticsearch for full-text search. I wrote the following code. It works, but I have a few issues.
1) If the same article is added twice (that is, indexdoc is run twice for the same article), it accepts it and adds the article twice. Is there a way to have a "unique key" in the search index?
2) How can I change the scoring / ranking function? I want to give more importance to the title.
3) Is this the correct way to do it anyway?
4) How do I show related results if there is a spelling mistake?
from elasticsearch import Elasticsearch
from crsq.models import ArticleInfo
es = Elasticsearch()
def indexdoc(articledict):
    doc = {
        'text': articledict['articlecontent'],
        'title': articledict['articletitle'],
        'url': articledict['url']
    }
    res = es.index(index="article-index", doc_type='article', body=doc)
def searchdoc(keywordstr):
    res = es.search(index="article-index", body={"query": {"query_string": {"query": keywordstr}}})
    print("Got %d Hits:" % res['hits']['total'])
    for hit in res['hits']['hits']:
        print("%(url)s: %(text)s" % hit["_source"])
def indexurl(url):
    articledict = ArticleInfo.objects.filter(url=url).values()
    if len(articledict):
        indexdoc(articledict[0])  # values() yields a sequence of dicts; index the first match
    return
1) You have to specify an id for your document by adding the id parameter when you index it:
res = es.index(index="article-index", doc_type='article', body=doc, id="some_unique_id")
2) There is more than one way to do this, but for example you can boost the title by changing your query a bit:
{"query": {"query_string": {"query": keywordstr, "fields" : ["text", "title^2"]}}}
With this change, the title field carries twice the weight of the text field.
3) As a proof of concept it is not bad.
4) This is a big topic. I think you should check the documentation on suggesters.
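For points 1) and 2), one possible approach is to derive the document id from the article URL, so that indexing the same article again overwrites the existing entry, and to build the boosted query body in one place. This is only a sketch under those assumptions: the helper names article_id and build_query are made up, and the es.index / es.search calls are shown as comments because they need a running cluster.

```python
import hashlib


def article_id(url):
    """Derive a stable id from the URL, so the same article always maps to the same id."""
    return hashlib.sha1(url.encode('utf-8')).hexdigest()


def build_query(keywordstr, title_boost=2):
    """Build a query_string body that weights the title field more than the text field."""
    return {
        "query": {
            "query_string": {
                "query": keywordstr,
                "fields": ["text", "title^%d" % title_boost],
            }
        }
    }


doc_id = article_id("http://example.com/some-article")
body = build_query("coffee")
print(doc_id)
print(body)

# Hypothetical usage with the client from the question:
# es.index(index="article-index", doc_type='article', body=doc, id=doc_id)
# es.search(index="article-index", body=body)
```

Hashing the URL is just one choice of key. Any value that is unique per article, such as a database primary key, would serve the same purpose.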

Mongoengine, after dictionary key field.. Mongoengine cannot convert field names to db_fields

If you try this code, you can see the problem I have:
class Embedded(EmbeddedDocument):
    boxfluxInt = IntField(default=0, db_field='i')
    meta = {'allow_inheritance': False}
class Test(Document):
    boxflux = MapField(field=EmbeddedDocumentField(Embedded), db_field='x')
    meta = {'collection': 'test',
            'allow_inheritance': False}
Test.drop_collection()
newTestDoc = Test()
newTestDoc.boxflux['DICTIONARY_KEY'] = Embedded(boxfluxInt=1)
newTestDoc.save()
Test.objects.update_one(inc__boxflux__DICTIONARY_KEY__boxfluxInt=1)
The result in MongoDB looks like this:
> db.test.findOne()
{
    "_id" : ObjectId("4fbdbbc8c450190a50000001"),
    "x" : {
        "DICTIONARY_KEY" : {
            "boxfluxInt" : 1,
            "i" : 1
        }
    }
}
>
As you can see, I intended to increase 'x.DICTIONARY_KEY.i' by 1,
but instead a new key (boxfluxInt) is created, even though I set the db_field of 'boxfluxInt' to 'i'.
Is it a bug, or am I doing something wrong?
I think the dictionary key ('DICTIONARY_KEY') makes the conversion to MongoDB-style db fields impossible, if I'm correct.
OK, this looks like a bug. The best place to report bugs is on GitHub: http://github.com/mongoengine/mongoengine
This won't get fixed until 0.7, as the fix will break existing users in production, so I'll have to write up migration notes as part of it.
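Until a fix lands, one possible workaround is to write the update with the db_field names yourself and send it to the underlying pymongo collection. The sketch below only builds the raw update document. The helper name build_inc_update is made up, and the commented-out call is an assumption about how you might apply it, so check it against your MongoEngine and pymongo versions.

```python
def build_inc_update(db_path, amount=1):
    """Build a raw MongoDB $inc update for a dotted path of db_field names."""
    return {'$inc': {'.'.join(db_path): amount}}


# 'x' and 'i' are the db_field names from the models above;
# 'DICTIONARY_KEY' is the map key.
update = build_inc_update(['x', 'DICTIONARY_KEY', 'i'])
print(update)

# Hypothetical usage against the raw collection (needs a running MongoDB):
# Test._get_collection().update_one({}, update)
```

This bypasses MongoEngine's field name translation entirely, so the path has to be kept in sync with the db_field values by hand.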
