connection not getting created to the new dbname in mongoengine - python

Using python Mongoengine I am trying to create databases and add documents to different databases. Here's how I am trying to do it:
from mongoengine import *

class MongoDataAccessObject():
    # method to connect to the database and initialize the tables etc.
    def __init__(self, my_env, helperObj):
        print "initializing db for the environment ", my_env
        self.con = None
        self.dbName = my_env
        self.helper_obj = helperObj
        try:
            self.con = connect(db=self.dbName)
        except Exception as e:
            print e

    def addRecord(self, document_object):
        document_object.save()
Now, I pass the names of the different databases I want created while creating objects of the above class, and add the documents like this:
for my_env in list_of_envs:
    dao = MongoDataAccessObject(my_env, helper_object)
    dao.addRecord(myDocument)
Now there are 2 questions here:
For some reason all my documents keep getting added to the same DB (the first one passed during MongoDataAccessObject creation). I would assume that since I am creating a new object every time, passing a different db name each time, a new connection should get created to the newly passed db and documents should get added to the db currently connected to.
To verify whether I am actually connected to the DB in question, I could not find a method like get_database_name() on the connection object. Is there a way to verify that I am getting connected to the DB name being passed?
Ok did some more research and found this:
https://github.com/MongoEngine/mongoengine/issues/605
Tried it out like this in ipython:
from mongoengine import *
import datetime

class Page(Document):
    title = StringField(max_length=200, required=True)
    date_modified = DateTimeField(default=datetime.datetime.now)

def switch(model, db):
    model._meta['db_alias'] = db
    # must set _collection to None so it is re-evaluated
    model._collection = None
    return model

register_connection('default', name='testing')
register_connection('mycon', name='db1')

page = Page(title="Test Page")
page = switch(page, 'mycon')
page.save()
This works and creates a db named db1 and stores the document there.
Now I do this again:
register_connection('mycon2', name='db2')
page = Page(title="Test Page")
page = switch(page, 'mycon2')
page.save()
Contrary to my expectation, this time db2 was not created (checked from both the mongo client and Robomongo); however, the document was saved successfully. Where exactly did the document get saved, then?
So to figure that out, I repeated the above exercise with a small change:
register_connection('mycon2', name='db2')
page = Page(title="Test Page")
page = switch(page, 'mycon2')
x = page.save()
# did a dir(x) and found that there is _get_db, so tried it out as below
x._get_db()
and the output was:
Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, read_preference=Primary()), u'db2')
which I guess means that the document got saved in a database named db2. But where on earth is this db2? Why can't I see it through either the mongo client or Robomongo?

Finally, the only way I could find to achieve the above was through the switch_db context manager provided by MongoEngine, which is well documented here: http://docs.mongoengine.org/guide/connecting.html#switch-database
The way I did it in my code above is something like this:
On the first call to the db, a default db needs to be created, which should have the alias 'default'. Only then can another alias be created; otherwise MongoEngine throws an error saying no default db was found.
So, when the very first db object is created, a False flag is sent to the __init__ of MongoDataAccessObject, which was changed to something like this:
class MongoDataAccessObject():
    # method to connect to the database and initialize the tables etc.
    def __init__(self, my_env, helperObj, is_default_db_set):
        print "initializing db for the environment ", my_env
        self.con = None
        self.dbName = my_env
        self.helper_obj = helperObj
        self.db_alias = my_env
        self.is_default_db_set = is_default_db_set
        try:
            # all of this part with is_default_db_set and register_connection() is needed because
            # MongoEngine does not provide a simple way of changing dbs or switching db connections.
            # The only way to do it is through the switch_db() context manager it provides
            # (which has been used in the addRecord() below)
            if not self.is_default_db_set:
                self.con = connect(db=self.dbName, alias='default')
            else:
                # register_connection(alias_name, db_name)
                register_connection(self.db_alias, self.dbName)
        except Exception as e:
            print e
And the addRecord() was also modified as:
    def addRecord(self, document_object):
        # switch_db() expects the Document class, so derive it from the passed instance
        with switch_db(document_object.__class__, self.db_alias) as model_class:
            document_object = model_class()
            document_object.save()
And this part above:
for my_env in list_of_envs:
    dao = MongoDataAccessObject(my_env, helper_object)
    dao.addRecord(myDocument)
was also modified as:
for my_env in list_of_envs:
    dao = MongoDataAccessObject(my_env, helper_object, mongo_default_db_flag)
    dao.addRecord(myDocument)
And this seemed to do the job for me.

Related

How to delete collection automatically in mongodb with pymongo? (Django doesn't delete mongodb collection)

I have a Django app which creates collections in MongoDB automatically. But when I tried to integrate the delete functionality, the collections are not deleted, while collections that are automatically created are edited successfully. This method is called from another file, with all parameters.
An interesting thing to note: when I manually tried to delete via the python shell, it worked. It just won't delete the collections that are not required anymore.
import pymongo
from .databaseconnection import retrndb  # credentials from another file; all admin rights are given

mydb = retrndb()

class Delete():
    def DeleteData(postid, name):
        PostID = postid
        tbl = name + 'Database'
        liketbl = PostID + 'Likes'
        likecol = mydb[liketbl]
        pcol = mydb[tbl]
        col = mydb['fpost']
        post = {"post_id": PostID}
        ppost = {"PostID": PostID}
        result1 = mydb.commentcol.drop()  # this doesn't work
        result2 = mydb.likecol.drop()  # this doesn't work
        print(result1, '\n', result2)  # returns None for both
        try:
            col.delete_one(post)  # this works
            pcol.delete_one(ppost)  # this works
            return False
        except Exception as e:
            return e
Any solutions? I have been trying to solve this for a week.
Should I change the database engine, since Django doesn't support NoSQL natively? Although I have written whole custom scripts that do CRUD using pymongo.
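One likely cause worth noting: in pymongo, attribute access such as mydb.commentcol refers to a collection literally named "commentcol"; the likecol variable defined above is never used by those two drop() calls. A toy stand-in (plain Python, no MongoDB required) shows the difference between attribute and item access:

```python
class FakeDB(object):
    """Mimics pymongo's Database name lookup only: both attribute and
    item access address a collection by name."""
    def __getattr__(self, name):
        return "collection:" + name   # db.foo   -> the literal name "foo"
    def __getitem__(self, name):
        return "collection:" + name   # db[var]  -> the value held by var

mydb = FakeDB()
liketbl = "42Likes"

# Attribute access ignores the variable and uses the literal name:
print(mydb.liketbl)   # → collection:liketbl
# Item access dereferences the variable, which is what the code intends:
print(mydb[liketbl])  # → collection:42Likes
```

So `mydb[liketbl].drop()` (or `likecol.drop()`) would target the intended collection, while `mydb.likecol.drop()` drops a collection literally named "likecol", which silently succeeds even if it doesn't exist.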

Using mongoengine with multiprocessing - how do you close mongoengine connections?

No matter what I try, I keep hitting the "MongoClient opened before fork" warning about not forking active mongo connections when trying to use multiprocessing with a mongoengine db. The standard mongo advice seems to be to connect to the db only from within the child processes. I think what I'm doing should be functionally equivalent, because I close the database prior to using multiprocessing, but I still hit the problem.
Related questions, either without a minimal example or with inapplicable solutions, are here, here, and, specifically for the case of flask/celery, here.
Minimal example to reproduce the problem:
from mongoengine import connect, Document, StringField, ListField, ReferenceField
from pathos.multiprocessing import ProcessingPool

class Base(Document):
    key = StringField(primary_key=True)
    name = StringField()
    parent = ReferenceField('Parent', required=True)

class Parent(Document):
    key = StringField(primary_key=True)
    name = StringField()
    bases = ListField(ReferenceField('Base'))

def remove_base(key):
    db = connect('mydb')
    mongo_b = Base.objects().get(key=key)
    mongo_b.parent.update(pull__bases=mongo_b)
    mongo_b.delete()

### setup
db = connect('mydb', connect=False)
Base(key='b1', name='test', parent='p1').save()
Base(key='b2', name='test', parent='p1').save()
Base(key='b3', name='test2', parent='p1').save()
p = Parent(key='p1', name='parent').save()
p.update(add_to_set__bases='b1')
p.update(add_to_set__bases='b2')
p.update(add_to_set__bases='b3')

### find objects we want to delete
my_base_objects = Base.objects(name='test')
keys = [b.key for b in my_base_objects]
del my_base_objects

# close db to avoid problems?!
db.close()
del db

# parallel map removing base objects and references from the db
# warning generated here
pp = ProcessingPool(2)
pp.map(remove_base, keys)
Ok so I figured it out. Mongoengine caches connections to the database all over the place. If you manually remove them, the issue is resolved. Adding the following import:
from mongoengine import connection
then adding in:
connection._connections = {}
connection._connection_settings = {}
connection._dbs = {}
Base._collection = None
Parent._collection = None
to the '#close db' section appears to solve the issue.
Complete code:
from mongoengine import connect, Document, StringField, ListField, ReferenceField, connection
from pathos.multiprocessing import ProcessingPool

class Base(Document):
    key = StringField(primary_key=True)
    name = StringField()
    parent = ReferenceField('Parent', required=True)

class Parent(Document):
    key = StringField(primary_key=True)
    name = StringField()
    bases = ListField(ReferenceField('Base'))

def remove_base(key):
    db = connect('mydb', connect=False)
    mongo_b = Base.objects().get(key=key)
    mongo_b.parent.update(pull__bases=mongo_b)
    mongo_b.delete()

def setup():
    Base(key='b1', name='test', parent='p1').save()
    Base(key='b2', name='test', parent='p1').save()
    Base(key='b3', name='test2', parent='p1').save()
    p = Parent(key='p1', name='parent').save()
    p.update(add_to_set__bases='b1')
    p.update(add_to_set__bases='b2')
    p.update(add_to_set__bases='b3')

db = connect('mydb', connect=False)
setup()

### find objects we want to delete
my_base_objects = Base.objects(name='test')
keys = [b.key for b in my_base_objects]
del my_base_objects

### close db to avoid problems?!
db.close()
db = None
connection._connections = {}
connection._connection_settings = {}
connection._dbs = {}
Base._collection = None
Parent._collection = None

### parallel map removing base objects from the db
pp = ProcessingPool(2)
pp.map(remove_base, keys)
This was recently improved: as of MongoEngine>=0.18.0, the methods disconnect() and disconnect_all() should be used to disconnect one or all existing connections, respectively (changelog 0.18.0).
See the official doc.

SqlAlchemy query result outputting

I am trying to query one of the tables in my Postgres database using SqlAlchemy in Python 3. The query runs fine, but as I go through each row in the result and try to use the attribute 'text' (one of my column names), I receive this error:
'str' object has no attribute 'text'
I have printed the attribute like so:
for row in result:
    print(row.text)
This does not give the error. The code that produces the error is below. But first, my environment:
I have two servers running. One is for my database the other is for my python server.
Database Server:
Postgres v9.6 - On Amazon's RDS
Server with Python
Linux 3.13.0-65-generic x86_64 - On an Amazon EC2 Instance
SqlAlchemy v1.1.5
Python v3.4.3
Flask 0.11.1
Files related:
import sqlalchemy as sa
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
import re
from nltk import sent_tokenize

class DocumentProcess:
    def __init__(self):
        ...
        Engine = sa.create_engine(
            CONFIG.POSTGRES_URL,
            client_encoding='utf8',
            pool_size=20,
            max_overflow=0
        )
        # initialize SQLAlchemy
        Base = automap_base()
        # reflect the tables
        Base.prepare(Engine, reflect=True)
        # Define all needed tables
        self.Document = Base.classes.documents
        self.session = Session(Engine)
        ...

    def process_documents(self):
        try:
            offset = 5
            limit = 50
            ###### This is the query in question ##########
            result = self.session.query(self.Document) \
                .order_by(self.Document.id) \
                .offset(offset) \
                .limit(limit)
            for row in result:
                # The print statement below does print out the text
                print(row.text)
                # when passing document.text to sent_tokenize, it
                # gives the following error:
                # 'str' object has no attribute 'text'
                snippets = sent_tokenize(row.text.strip('\n'))  # I have removed strip, but the same problem
        except Exception as e:
            logging.info(format(e))
            raise e
This is my model for Document, in my PostgreSQL database:
class Document(db.Model):
    __tablename__ = "documents"

    id = db.Column(db.Integer, primary_key=True)
    text = db.Column(db.Text)
    tweet = db.Column(db.JSON)
    keywords = db.Column(db.ARRAY(db.String), nullable=True)

    def to_dict(self):
        return dict(
            id=self.id,
            text=self.text,
            tweet=self.tweet,
            keywords=self.keywords
        )

    def json(self):
        return jsonify(self.to_dict())

    def __repr__(self):
        return "<%s %r>" % (self.__class__, self.to_dict())
Things I have tried
Before, I did not have order_by in the Document query, and it was working. However, even removing order_by does not fix it anymore.
Used a raw SELECT statement and went through the result manually, but still hit the same error
What I haven't tried
I am wondering if it's because I named the column 'text'. I noticed that when I write this query out in postgres, it highlights 'text' as a reserved word. I'm confused why my query worked before, but now it doesn't. Could this be the issue?
Any thoughts on this issue would be much appreciated.
It turns out that text is a reserved word in PostgreSQL. I renamed the column name and refactored my code to match. This solved the issue.
You are likely to get this error in PostgreSQL if you are creating a foreign table and one of the column datatypes is text. Change it to character varying() and the error disappears!
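As a general SQL note, a reserved word can usually still be used as an identifier if you double-quote it; renaming, as the accepted answer did, simply avoids the need. A small sketch with stdlib sqlite3 (which reserves order, used here in place of Postgres's text; table name t is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unquoted, the reserved word is a syntax error:
try:
    conn.execute("CREATE TABLE t (order INTEGER)")
    failed = False
except sqlite3.OperationalError:
    failed = True

# Double-quoted, it is a perfectly legal column name:
conn.execute('CREATE TABLE t ("order" INTEGER)')
conn.execute('INSERT INTO t ("order") VALUES (1)')
row = conn.execute('SELECT "order" FROM t').fetchone()
print(failed, row)  # → True (1,)
```

The catch is that every query touching the column must quote it, which is easy to forget in an ORM setting; renaming is the lower-maintenance fix.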

django database inserts not getting picked up

We have a little bit of a complicated setup:
In our normal code, we connect manually to a mysql db. We're doing this because I guess the connections django normally uses are not threadsafe? So we let django make the connection, extract the information from it, and then use a mysqldb connection to do the actual querying.
Our code is largely an update process, so we have autocommit turned off to save time.
For ease of creating test data, I created django models that represent the tables, and use them to create rows to test on. So I have functions like:
def make_thing(**overrides):
    fields = deepcopy(DEFAULT_THING)
    fields.update(overrides)
    s = Thing(**fields)
    s.save()
    transaction.commit(using='ourdb')
    reset_queries()
    return s
However, it doesn't seem to actually be committing! After I make an object, I later have code that executes raw sql against the mysqldb connection:
def get_information(self, value):
    print self.api.rawSql("select count(*) from thing")[0][0]
    query = 'select info from thing where column = %s' % value
    return self.api.rawSql(query)[0][0]
This print statement prints 0! Why?
Also, if I turn autocommit off, I get
TransactionManagementError: This is forbidden when an 'atomic' block is active.
when we try to alter the autocommit level later.
EDIT: I also just tried https://groups.google.com/forum/#!topic/django-users/4lzsQAWYwG0, which did not help.
EDIT2: I checked from a shell against the database--the commit is working, it's just not getting picked up. I've tried setting the transaction isolation level but it isn't helping. I should add that a function further up from get_information uses this decorator:
def single_transaction(fn):
    from django.db import transaction
    from django.db import connection

    def wrapper(*args, **kwargs):
        prior_autocommit = transaction.get_autocommit()
        transaction.set_autocommit(False)
        connection.cursor().execute('set transaction isolation level read committed')
        connection.cursor().execute("SELECT @@session.tx_isolation")
        try:
            result = fn(*args, **kwargs)
            transaction.commit()
            return result
        finally:
            transaction.set_autocommit(prior_autocommit)
            django.db.reset_queries()
            gc.collect()

    wrapper.__name__ = fn.__name__
    return wrapper

instance has no attribute (python)

I have a weird issue, which is probably easy to resolve.
I have a class Database with an __init__ and an executeDictMore method (among others).
class Database():
    def __init__(self, database, server, login, password):
        self.database = database
        my_conv = {FIELD_TYPE.LONG: int}
        self.conn = MySQLdb.Connection(user=login, passwd=password, db=self.database, host=server, conv=my_conv)
        self.cursor = self.conn.cursor()

    def executeDictMore(self, query):
        self.cursor.execute(query)
        data = self.cursor.fetchall()
        if data is None:
            return None
        result = []
        for d in data:
            desc = self.cursor.description
            row_dict = {}
            for (name, value) in zip(desc, d):
                row_dict[name[0]] = value
            result.append(row_dict)
        return result
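The executeDictMore pattern above (zipping cursor.description with each row to build dicts) is portable across DB-API drivers. A runnable illustration with stdlib sqlite3 (the users table and its rows are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

cur = conn.cursor()
cur.execute("SELECT id, name FROM users")

# cursor.description is a sequence of 7-item tuples per the DB-API;
# item 0 of each tuple is the column name
result = []
for row in cur.fetchall():
    result.append({col[0]: value for col, value in zip(cur.description, row)})

print(result)  # → [{'id': 1, 'name': 'alice'}]
```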
Then I instantiate this class in a file db_functions.py:
from Database import Database
db = Database()
And I call the executeDictMore method from a function of db_functions :
def test(id):
    query = "SELECT * FROM table WHERE table_id=%s;" % (id)
    return db.executeDictMore(query)
Now comes the weird part.
If I import db_functions and call db_functions.test(id) from a python console:
import db_functions
t = db_functions.test(12)
it works just fine.
But if I do the same thing from another python file I get the following error :
AttributeError: Database instance has no attribute 'executeDictMore'
I really don't understand what is going on here. I don't think I have another Database class interfering. And I append the folder where the modules are in sys.path, so it should call the right module anyway.
If someone has an idea, it's very welcome.
You have another Database module or package in your path somewhere, and it is getting imported instead.
To diagnose where that other module is living, add:
import Database
print Database.__file__
before the from Database import Database line; it'll print the filename of the module. You'll have to rename one or the other module to not conflict.
You could at least try to avoid SQL injection. Python provides such neat ways to do so:
def executeDictMore(self, query, data=None):
    self.cursor.execute(query, data)
and
def test(id):
    query = "SELECT * FROM table WHERE table_id=%s"
    return db.executeDictMore(query, (id,))
are the ways to do so.
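The same parameter-binding style works with any DB-API driver; a self-contained sketch with stdlib sqlite3 (which uses ? placeholders where MySQLdb uses %s; table t and its data are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (table_id INTEGER, info TEXT)")
db.execute("INSERT INTO t VALUES (?, ?)", (12, "hello"))

# The driver binds the value, so malicious input can't alter the SQL:
row = db.execute("SELECT info FROM t WHERE table_id = ?", (12,)).fetchone()
print(row)  # → ('hello',)

# String formatting, by contrast, splices raw text into the statement:
evil = "0 OR 1=1"
rows = db.execute("SELECT info FROM t WHERE table_id = %s" % evil).fetchall()
print(rows)  # → [('hello',)]  (the injected condition matched every row)
```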
Sorry, this should rather be a comment, but an answer allows for better formatting. I am aware that it doesn't answer your question...
You should insert (not append) into your sys.path if you want it first in the search path:
sys.path.insert(0, '/path/to/your/Database/class')
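Python resolves imports by walking sys.path in order, so position matters. A quick demonstration with two throwaway directories (the module name mymod is made up):

```python
import os
import sys
import tempfile

# two throwaway directories, each containing a module named mymod
first = tempfile.mkdtemp()
second = tempfile.mkdtemp()
with open(os.path.join(first, "mymod.py"), "w") as f:
    f.write("WHICH = 'first'\n")
with open(os.path.join(second, "mymod.py"), "w") as f:
    f.write("WHICH = 'second'\n")

sys.path.append(second)    # appended: searched last
sys.path.insert(0, first)  # inserted at position 0: searched first

import mymod
print(mymod.WHICH)  # → first
```

This is why an appended path can silently lose to a same-named module earlier on the path, as in the question above.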
I'm not too sure what is wrong, but you could try passing the database object to the function as an argument, like db_functions.test(db, 12), with db being your Database instance.
