Data not visible in MongoDB when inserted from PyMongo - Python

I am trying to insert data into a MongoDB collection from Python, but the data is not being stored. This is how I am doing it:
from pymongo import MongoClient
import time
class data2db:
    def __init__(self):
        pass
    def enter_data(self, data):
        client = MongoClient('127.0.0.1', 27017)
        db = client.db
        coll = db.Temperature1
        post = {"auth": data,
                "Time": time.asctime(time.localtime(time.time()))}
        post_ = coll.insert(post)
c = data2db()
c.enter_data("24.3")
When I try to read the data from another method, it returns None. This is how I do it:
client = MongoClient('127.0.0.1', 27017)
db = client.db
coll = db.Temperature1
print(coll.find_one({"_id": 1}))
print(coll.find())
listed = str(coll.find({"_id": 1})).split(' ')
listed = listed[len(listed) - 1].split('>')[0]
listed = {"_id": "ObjectID(\"" + listed + "\")"}
print(coll.find_one(listed))
print(db.command("collstats", "events"))
As you can see, I have already tried several ways, but no matter what, it returns None. If I query the database stats like this:
print(db.command("dbstats"))
I get:
{u'extentFreeList': {u'totalSize': 0, u'num': 0}, u'storageSize': 49152, u'ok': 1.0, u'avgObjSize': 64.90566037735849, u'dataFileVersion': {u'major': 4, u'minor': 5}, u'db': u'db', u'indexes': 4, u'objects': 53, u'collections': 6, u'fileSize': 67108864, u'numExtents': 6, u'dataSize': 3440, u'indexSize': 32704, u'nsSizeMB': 16}
The collection does not even show up in the mongo command-line shell. I'm desperate for help.

Your code "works", but I think you may have copied and pasted something wrong. In particular, you're mixing up find() and find_one().
The enter_data() method calls insert() without specifying an _id, so the driver generates one for you. That _id ends up being an ObjectId similar to this:
{ "_id" : ObjectId("558019749f43b8c19779c106"), "auth" : "24.3", "Time" : "Tue Jun 16 08:41:24 2015" }
Your code later calls print(coll.find_one({"_id": 1})), which returns None because the generated _id is not 1.
find() does not return a record; it returns a cursor, and printing a cursor shows only the Cursor object, not its contents. Try this instead:
for r in coll.find():
    print(r)
Lastly, because db is a special convenience variable in the mongo shell, I'd avoid naming a database db.
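On PyMongo 3.0 and later the same round trip can use insert_one(), which returns an InsertOneResult whose inserted_id can go straight into find_one(). insert() itself was deprecated in PyMongo 3.0 and removed in 4.0. This is a minimal sketch, assuming coll is a pymongo Collection. The function names store_reading and dump_collection are hypothetical.

```python
import time


def store_reading(coll, value):
    """Insert one reading and return the document fetched back by its _id.

    `coll` is assumed to be a pymongo Collection (PyMongo 3.0+).
    """
    doc = {"auth": value, "Time": time.asctime()}  # asctime() defaults to the current local time
    result = coll.insert_one(doc)                   # InsertOneResult
    # Query with the ObjectId the driver generated, not with a literal 1.
    return coll.find_one({"_id": result.inserted_id})


def dump_collection(coll):
    """Return every document; find() yields a cursor that must be iterated."""
    return [doc for doc in coll.find()]
```

Against a local server you could call store_reading(MongoClient("127.0.0.1", 27017)["sensors"]["Temperature1"], "24.3"). The database name sensors is only an example, chosen to follow the advice not to call it db.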

Related

S3 Select Query JSON for nested value when keys are dynamic

I have a JSON object in S3 which follows this structure:
<code> : {
    <client>: <value>
}
For example,
{
    "code_abc": {
        "client_1": 1,
        "client_2": 10
    },
    "code_def": {
        "client_2": 40,
        "client_3": 50,
        "client_5": 100
    },
    ...
}
I am trying to retrieve the numerical value with an S3 Select query, where the "code" and the "client" are populated dynamically with each query.
So far I have tried:
sql_exp = f"SELECT * from s3object[*][*] s where s.{proc}.{client_name} IS NOT NULL"
sql_exp = f"SELECT * from s3object s where s.{proc}[*].{client_name}[*] IS NOT NULL"
I also tried both without the asterisk inside the square brackets, but nothing works. I get: ClientError: An error occurred (ParseUnexpectedToken) when calling the SelectObjectContent operation: Unexpected token found LITERAL:UNKNOWN at line 1, column X (where X depends on the length of the query string)
Within the function defining the object, I have:
resp = s3.select_object_content(
    Bucket=<bucket>,
    Key=<filename>,
    ExpressionType="SQL",
    Expression=sql_exp,
    InputSerialization={'JSON': {"Type": "Document"}},
    OutputSerialization={"JSON": {}},
)
Is there something off in the way I define the object serialization? How can I fix the query so I can retrieve the desired numerical value on the fly when I provide "code" and "client"?
I did some tinkering based on the documentation, and it works!
I need to iterate over the events in the event stream (resp['Payload']) as follows:
import json
event_stream = resp['Payload']
# unpack the successful query response
for event in event_stream:
    if "Records" in event:
        output_str = event["Records"]["Payload"].decode("utf-8")  # bytes to string
        output_dict = json.loads(output_str)  # string to dict
Now the correct SQL expression is:
sql_exp= f"SELECT s['{code}']['{client}'] FROM S3Object s"
where the values of code and client have been determined dynamically beforehand.
For example, based on the dummy JSON structure above, if code = "code_abc" and client = "client_2", I want this S3 Select query to return the value 10.
The f-string resolves to sql_exp = "SELECT s['code_abc']['client_2'] FROM S3Object s", and after running the query we get output_dict = {'client_2': 10}. (I'm not sure whether there is a clean way to get the value by itself without the client key; this is also how it looks in the documentation.)
So, the final step is to retrieve value = output_dict['client_2'], which in our case is equal to 10.
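The steps above can be folded into two helpers. A Records event may carry only part of the output, so joining every Records payload before parsing is safer than parsing each event separately. With the default newline RecordDelimiter, each JSON record ends with "\n". Taking the single value out of each returned object also removes the need to know its key.

This is a minimal sketch, assuming the events have the shape used above (event["Records"]["Payload"] as bytes). The names build_expression and extract_values are hypothetical, and so is the rule that keys must not contain a single quote.

```python
import json


def build_expression(code, client):
    """Build the S3 Select expression for one code/client pair.

    The keys are interpolated into the SQL text as-is, so this sketch
    (as an assumption) rejects keys containing a single quote.
    """
    for key in (code, client):
        if "'" in key:
            raise ValueError(f"single quote not supported in key: {key!r}")
    return f"SELECT s['{code}']['{client}'] FROM S3Object s"


def extract_values(event_stream):
    """Collect the bare values from a select_object_content event stream."""
    # Join all Records payloads first: a record may be split across events.
    raw = b"".join(
        event["Records"]["Payload"] for event in event_stream if "Records" in event
    )
    values = []
    for line in raw.decode("utf-8").splitlines():
        if line.strip():
            record = json.loads(line)       # e.g. {"client_2": 10}
            values.extend(record.values())  # drop the key, keep the value
    return values
```

With the query from above, extract_values(resp["Payload"]) would return [10] for code_abc/client_2, so value = values[0].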

pymongo query for all items containing a unique identifier

I have a Mongo collection with data structured in the following way:
content: {'description': {'text': [{'_date': '2019-05-21', '_sectionId': 'a13a', '_objectId': 'f637cee'},
                                   {'_date': '2019-05-21', '_objectId': '8b2ed183', '_source': 'f637cee'},
                                   { etc....},
                                   {'_date': '2019-05-21', '_sectionId': 'a13a', '_objectId': 'XXXcee'}]},
          'client': {.....},
         }
I am looking for a way to query the collection to get a list of tuples as follows:
given a section ID, I would like to get the corresponding '_objectId'.
In this case the result would be:
('a13a','f637cee'), ('a13a','XXXcee')
I started to do something like this:
import pymongo
myclient = pymongo.MongoClient(mongoconnection)
print('database names:')
print(myclient.list_database_names())
# getting the collection:
mydb = myclient["clients"]
query = {'content.description.text._sectionId': 'a13a'}
cur = mydb.find(query)
But I don't know how to extract the information from the cursor.
Any help?
Note that the info might be nested in different places, i.e. the nodes preceding "content" can vary.
Thanks a lot
Use the second parameter of find() to select the required fields.
For example:
query = {'content.description.text._sectionId': 'a13a'}
cur = mydb.find(query, { "_id": 0, "_sectionId": 1, "_objectId": 1 })
print([tuple(i.values()) for i in cur])
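_sectionId and _objectId sit inside the content.description.text array, so a projection on the top-level names {"_sectionId": 1, "_objectId": 1} does not reach them. The documents come back without those fields. Also, myclient["clients"] returns a Database, so find() has to be called on a collection taken from it.

The sketch below does the filtering in Python instead. It walks each document recursively, so it also copes with the varying nodes above content. section_object_pairs is a hypothetical helper. The cursor is assumed to come from something like mydb["<collection>"].find(query), where the collection name is a placeholder.

```python
def section_object_pairs(documents, section_id):
    """Return (section_id, object_id) tuples for every nested entry whose
    _sectionId matches, wherever it sits in each document.

    `documents` can be any iterable of dicts, e.g. a pymongo cursor.
    """
    pairs = []

    def walk(node):
        if isinstance(node, dict):
            # Record this entry if it is a matching section with an object id.
            if node.get("_sectionId") == section_id and "_objectId" in node:
                pairs.append((node["_sectionId"], node["_objectId"]))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for doc in documents:
        walk(doc)
    return pairs
```

For the sample data, section_object_pairs(cursor, 'a13a') gives [('a13a', 'f637cee'), ('a13a', 'XXXcee')].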

How to get a single value of a collection in MongoDB python

I have the following Python code using PyMongo:
input_1 = object_collection.find({"_id": ObjectId(key_1)})
for i in input_1:
    print(i)
and it returns this:
{'_id': ObjectId('5d949843cc1e1fc0556983bc'), 'x_input': '11', 'y_input': '22'}
I am only interested in x_input and y_input, which I would like to store in order to calculate their sum.
You can use a projection in your query:
from pymongo import MongoClient
from bson import ObjectId
if __name__ == '__main__':
    client = MongoClient("localhost:27017", username="barry", password="barry", authSource="admin", authMechanism="SCRAM-SHA-256")
    with client.start_session(causal_consistency=True) as my_session:
        with my_session.start_transaction():
            db = client.mydatabase
            collection = db.mycollection
            for result in collection.find({"_id": ObjectId("5d97713e11261b4afebe517b")}, {"_id": 0, "x_input": 1}):
                print(str(result))
See this part:
{"_id": 0, "x_input": 1}
...it instructs the query engine to exclude "_id" and include "x_input". Once a projection includes any field, every other field you want to see must also be listed. "_id" is the exception to this rule: it is always returned unless explicitly turned off.
Results:
{u'x_input': u'11'}
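To get at both values and add them, use find_one(), which returns a single dict (or None) rather than a cursor, and list both fields in the projection. The stored values are strings ('11', '22'), so they need converting before summing. The session and transaction in the answer are not needed for a plain read.

This sketch assumes object_collection is a pymongo Collection. sum_inputs is a hypothetical name, and int() assumes whole numbers (use float() otherwise).

```python
def sum_inputs(doc):
    """Add x_input and y_input from one document; the values are stored as strings."""
    if doc is None:
        raise LookupError("no document matched the _id")
    return int(doc["x_input"]) + int(doc["y_input"])


# Typical call (needs a running server, so shown as a comment):
# from bson import ObjectId
# doc = object_collection.find_one({"_id": ObjectId(key_1)},
#                                  {"_id": 0, "x_input": 1, "y_input": 1})
# total = sum_inputs(doc)
```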

PyMongo and Mongodb: using update()

I am trying to use PyMongo to update an existing document:
#!/usr/bin/env python
import pymongo
from pymongo import MongoClient
client = MongoClient()
db = client.alfmonitor
tests = db.tests
post = {
    'name': 'Blah',
    'active': True,
    'url': 'http://www.google.com',
}
tests.insert(post)
tests_list = db.tests.find({'active': True, 'name': 'Blah'})
for test in tests_list:
    test['active'] = False
    test.update(
        test,
    )
print('====================================================')
for test in db.tests.find():
    print(test)  # <- when I print these out, active=True is still showing up
I've been trying to follow documentation and examples I have seen on SO but none of them seem to be working for me. Anyone can explain what I'm doing wrong here?
Thanks!
Use this (don't forget to add multi=True if you want to update all matches):
db.tests.update({'active':True, 'name':'Blah'}, {'$set': {'active': False}}, multi=True)
Why your code isn't working:
for test in tests_list:
    # test is a dict
    test['active'] = False
    # You are calling the dict method update(),
    # which copies all values from dictionary test into dictionary test,
    # so nothing is sent to the database
    test.update(
        test,
    )
When you want to commit changes made to a retrieved doc, use collection.save instead of update:
test['active'] = False
db.tests.save(test)
Both of these worked for me:
db.tests.update(
    {'active': True, 'name': 'Blah'},
    {
        '$set': {'active': False}
    },
    multi=True
)
tests_list = db.tests.find({'active': True, 'name': 'Blah'})
for test in tests_list:
    test['active'] = False
    db.tests.save(test)
Thanks very much traceur and JonnyHK!
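Collection.update() and Collection.save() were deprecated in PyMongo 3.0 and removed in PyMongo 4.0. Their replacements are update_one()/update_many() and replace_one(). The sketch below rewrites both approaches with those methods, assuming tests is a pymongo Collection. The function names are hypothetical.

```python
def deactivate_all(tests, name):
    """Equivalent of update(..., multi=True): flip every matching document."""
    result = tests.update_many(
        {"active": True, "name": name},  # filter
        {"$set": {"active": False}},     # only the 'active' field changes
    )
    return result.modified_count


def deactivate_one_by_one(tests, name):
    """Equivalent of the save() loop: edit each fetched dict, then write it back."""
    changed = 0
    for test in tests.find({"active": True, "name": name}):
        test["active"] = False
        # replace_one overwrites the stored document with the edited dict.
        result = tests.replace_one({"_id": test["_id"]}, test)
        changed += result.modified_count
    return changed
```

update_many() is usually the better choice: the server applies one $set, whereas the loop makes a round trip per document.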

Mongoengine, retrieving only some of a MapField

For example, in MongoDB:
> db.test.findOne({}, {'mapField.FREE':1})
{
    "_id" : ObjectId("4fb7b248c450190a2000006a"),
    "mapField" : {
        "BOXFLUX" : {
            "a" : "f",
        }
    }
}
The 'mapField' field is a MongoEngine MapField. It holds a lot of keys and data, but in the shell I retrieved only 'BOXFLUX'.
The equivalent query does not work in MongoEngine, for example:
BoxfluxDocument.objects( ~~ querying ~~ ).only('mapField.BOXFLUX')
As you can see, neither only('mapField.BOXFLUX') nor only('mapField__BOXFLUX') works: it retrieves all of the 'mapField' data, not just 'BOXFLUX'.
How can I retrieve only one key of a MapField?
I see there is a ticket for this: https://github.com/hmarr/mongoengine/issues/508
Works for me. Here's an example test case:
def test_only_with_mapfields(self):
    class BlogPost(Document):
        content = StringField()
        author = MapField(field=StringField())
    BlogPost.drop_collection()
    post = BlogPost(content='Had a good coffee today...',
                    author={'name': "Ross", "age": "20"}).save()
    obj = BlogPost.objects.only('author__name',).get()
    self.assertEqual(obj.author['name'], "Ross")
    self.assertEqual(obj.author.get("age", None), None)
Try this:
query = BlogPost.objects({your: query})
if name:
    query = query.only('author__' + name)
else:
    query = query.only('author')
I found my mistake! I called only() twice.
For example:
BlogPost.objects.only('author').only('author__name')
I spent a whole day figuring out what was wrong with MongoEngine.
My mistaken workaround was:
BlogPost.objects()._collection.find_one(~~ filtering query ~~, {'author.' + name: 1})
But as you know, that returns raw data, not a MongoEngine query, so afterwards I cannot run any MongoEngine methods on it.
In my case, the query has to depend on some conditions, so in my humble opinion it would be great if a later only() call overrode earlier ones.
I hope this feature will be added in the next version. Right now, I have to write duplicate code.
Not this code:
query = BlogPost.objects()
query( query~~).only('author')
if name:
    query = query.only('author__' + name)
But this code:
query = BlogPost.objects()
query( query~~).only('author')
if name:
    query = BlogPost.objects().only('author__' + name)
I think the second one looks dirtier than the first one.
Of course, the first version returns all of the author data, because it ends up using only('author') rather than only('author__name').
