New to MongoDB, I am trying to optimize bulk writes to the database. I do not understand how to initialize the Bulk() operations, though.
My understanding is that since the inserts will be done on a collection, this is where (or rather "on what") initializeUnorderedBulkOp() should be called:
The code below covers all the cases I can think of:
import pymongo
conn = pymongo.MongoClient('mongodb.example.com', 27017)
db = conn.test
coll = db.testing
# tried all of the following
# this one does not make much sense to me as I insert to a collection, added for completeness
bulk = db.initializeUnorderedBulkOp()
# that one seems to be the most reasonable to me
bulk = coll.initializeUnorderedBulkOp()
# that one is from http://blog.mongodb.org/post/84922794768/mongodbs-new-bulk-api
bulk = db.collection('testing').initializeUnorderedBulkOp()
# the staging and execution
# bulk.find({'name': 'hello'}).update({'name': 'hello', 'who': 'world'})
# bulk.execute()
The exception raised is
TypeError: 'Collection' object is not callable. If you meant to call the 'initializeUnorderedBulkOp' method on a 'Collection' object it is failing because no such method exists.
with 'Database' for the first case and 'Collection' for the last two
How should I use initializeUnorderedBulkOp()?
In Python (using the pymongo module), the method name is not initializeUnorderedBulkOp but initialize_unordered_bulk_op.
You have to call it, as you correctly guessed, on the collection (in your case, coll.initialize_unordered_bulk_op() should work).
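For reference, a minimal sketch of the snake_case call, assuming a PyMongo 2.7–3.x driver where the bulk builder API still exists (it was removed in PyMongo 4.0); the host is the question's placeholder:
import pymongo

conn = pymongo.MongoClient('mongodb.example.com', 27017)
coll = conn.test.testing

bulk = coll.initialize_unordered_bulk_op()   # snake_case, called on the collection
# update documents need $ operators, e.g. $set
bulk.find({'name': 'hello'}).update({'$set': {'who': 'world'}})
result = bulk.execute()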
Related
When using SQLAlchemy (version 1.4.44) to create, drop, or otherwise modify tables, the updates don't appear to be committing. Attempting to solve this, I'm following the docs and using the commit() method. Here's a simple example:
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@connection_string:5432/database_name")

with engine.connect() as connection:
    sql = "create table test as (select count(1) as result from userquery);"
    result = connection.execute(text(sql))
    connection.commit()
This produces the error:
AttributeError: 'Connection' object has no attribute 'commit'
What am I missing?
The comment on the question is correct: you are looking at the 2.0 docs. All you need to do is set future=True when calling create_engine() to use the "commit as you go" functionality provided in 2.0.
See migration-core-connection-transaction:
When using 2.0 style with the create_engine.future flag, “commit as you go” style may also be used, as the Connection features autobegin behavior, which takes place when a statement is first invoked in the absence of an explicit call to Connection.begin().
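Applied to the question's example, a minimal sketch (the connection string is the question's placeholder):
from sqlalchemy import create_engine, text

# future=True opts the 1.4 engine into the 2.0-style Connection,
# which has the commit()/rollback() methods used below
engine = create_engine(
    "postgresql://user:password@connection_string:5432/database_name",
    future=True,
)

with engine.connect() as connection:
    connection.execute(
        text("create table test as (select count(1) as result from userquery);")
    )
    connection.commit()  # "commit as you go"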
The documentation is actually misleading (version 1.4). It shows the Connection.commit() method being used in the section describing inserting rows, but the method doesn't exist.
I managed to find a clear explanation of how to use transactions in the transactions section:
The block managed by each .begin() method has the behavior such that the transaction is committed when the block completes. If an exception is raised, the transaction is instead rolled back, and the exception propagated outwards.
Example from the documentation below; note that there is no call to commit().
# runs a transaction
with engine.begin() as connection:
    r1 = connection.execute(table1.select())
    connection.execute(table1.insert(), {"col1": 7, "col2": "this is some data"})
I recently became aware of the SQLAlchemy raw_connection() method, which gives you the ability to use the DBAPI for the given relational db. I was hoping I could use this method to load an existing database into memory, like so:
import sqlite3

from sqlalchemy import create_engine

engine = create_engine('sqlite://')
connection = engine.raw_connection()
src = sqlite3.connect("Name of existing db")
src.backup(connection)
Unfortunately the object returned by raw_connection() is a _ConnectionFairy object, and the following error occurs: TypeError: backup() argument 1 must be sqlite3.Connection, not _ConnectionFairy
Does anyone know a workaround for this, or a working way to do what I am trying to do?
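One thing that may be worth trying (this rests on an assumption about the pool wrapper, not anything stated in the question): the _ConnectionFairy wraps the real DBAPI connection and exposes it as its .connection attribute, which should be the sqlite3.Connection that backup() expects. The file name below is hypothetical.
import sqlite3

from sqlalchemy import create_engine

engine = create_engine('sqlite://')        # in-memory target database
connection = engine.raw_connection()       # pool wrapper (_ConnectionFairy)

src = sqlite3.connect("existing.db")       # hypothetical path to the existing db
# unwrap the fairy to get at the underlying sqlite3.Connection
src.backup(connection.connection)
connection.close()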
While trying to do the following operation:
for line in blines:
    line.account = get_customer(line.AccountCode)
I am getting an error while trying to assign a value to line.account:
DetachedInstanceError: Parent instance <SunLedgerA at 0x16eda4d0> is not bound to a Session; lazy load operation of attribute 'account' cannot proceed
Am I doing something wrong??
"detached" means you're dealing with an ORM object that is not associated with a Session. The Session is the gateway to the relational database, so anytime you refer to attributes on the mapped object, the ORM will sometimes need to go back to the database to get the current value of that attribute. In general, you should only work with "attached" objects - "detached" is a temporary state used for caching and for moving objects between sessions.
See Quickie Intro to Object States, then probably read the rest of that document too ;).
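In terms of the question's code, the failure mode looks roughly like this (the session handling around the query is an assumption for illustration):
session = Session()
line = session.query(SunLedgerA).first()   # attached: lazy loads work here
session.close()                            # `line` is now detached

line.account   # DetachedInstanceError: the lazy load needs a Session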
I had the same problem with Celery. Adding lazy='subquery' to the relationship solved my problem.
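A minimal sketch of what that looks like on a hypothetical model (Base, Account, and the table/column names are made up for illustration):
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import relationship

class SunLedgerA(Base):
    __tablename__ = 'sun_ledger_a'
    id = Column(Integer, primary_key=True)
    account_id = Column(Integer, ForeignKey('account.id'))

    # lazy='subquery' loads the related rows eagerly in a second query,
    # so the attribute is already populated if the instance later detaches
    account = relationship('Account', lazy='subquery')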
I encountered this type of DetachedInstanceError when I closed the query session prematurely, i.e. code was still dealing with the SQLAlchemy model objects AFTER the session was closed. So one clue is to double-check that you don't close the session until you absolutely no longer need to interact with the model objects, e.g. lazily loaded model attributes.
I had the same problem when unittesting.
The solution was to call everything within the "with" context:
with self.app.test_client() as c:
    res = c.post('my_url/test', data=XYZ, content_type='application/json')
Then it worked.
Adding the lazy attribute didn't work for me.
To access an attribute that is connected to another table, you have to access it within the session.
from contextlib import contextmanager

from sqlalchemy.orm import sessionmaker

@contextmanager
def get_db_session(engine):
    SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
    db = SessionLocal()
    try:
        yield db
    except Exception:
        db.rollback()
        raise
    finally:
        db.close()

with get_db_session(engine) as sess:
    data = sess.query(Groups).all()
    # `group_users` is connected to another table
    print([x.group_users for x in data])  # success: the session is still open

# the session is closed once the block exits, so the lazy load cannot proceed
print([x.group_users for x in data])  # fail
I'm new to SQLAlchemy and have inherited a somewhat messy codebase without access to the original author.
The code is littered with calls to DBSession.flush(), seemingly any time the author wanted to make sure data was being saved. At first I was just following the patterns I saw in this code, but as I read the docs, it seems this is unnecessary, since autoflushing should be in place. Additionally, I've run into a few cases with AJAX calls that generate the error "InvalidRequestError: Session is already flushing".
Under what scenarios would I legitimately want to keep a call to flush()?
This is a Pyramid app, and SQLAlchemy is being set up with:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker
from zope.sqlalchemy import ZopeTransactionExtension

DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension(), expire_on_commit=False))
Base = declarative_base()
The ZopeTransactionExtension on the DBSession in conjunction with the pyramid_tm being active on your project will handle all commits for you. The situations where you need to flush are:
You want to create a new object and get back the primary key.
DBSession.add(obj)
DBSession.flush()
log.info('look, my new object got primary key %d', obj.id)
You want to try to execute some SQL in a savepoint and rollback if it fails without invalidating the entire transaction.
sp = transaction.savepoint()
try:
    foo = Foo()
    foo.id = 5
    DBSession.add(foo)
    DBSession.flush()
except IntegrityError:
    log.error('something already has id 5!!')
    sp.rollback()
In all other cases involving the ORM, the transaction will be aborted for you upon exception, or committed upon success, automatically by pyramid_tm. If you execute raw SQL, you will need to execute transaction.commit() yourself or mark the session as dirty via zope.sqlalchemy.mark_changed(DBSession); otherwise there is no way for the ZTE to know the session has changed.
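For the raw-SQL case, the marking step looks roughly like this (the UPDATE statement and table are just examples):
from sqlalchemy import text
from zope.sqlalchemy import mark_changed

# raw SQL bypasses the ORM, so the ZopeTransactionExtension cannot
# see that anything changed in this session
DBSession.execute(text("UPDATE my_table SET processed = true"))
mark_changed(DBSession)   # tell the ZTE the session has pending work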
Also you should leave expire_on_commit at the default of True unless you have a really good reason.
I am writing a script that requires interacting with several databases (not concurrently). In order to facilitate this, I am maintaining the db-related information (connections etc.) in a dictionary. As an aside, I am using SQLAlchemy for all interaction with the db. I don't know whether that is relevant to this question or not.
I have a function to set up the pool. It looks somewhat like this:
from sqlalchemy import create_engine, MetaData, Table

pooled_objects = {}

def setupPool():
    global pooled_objects
    for name in NAMES:
        engine = create_engine("postgresql+psycopg2://postgres:pwd@localhost/%s" % name)
        metadata = MetaData(engine)
        conn = engine.connect()
        tbl = Table('my_table', metadata, autoload=True)
        info = {'db_connection': conn, 'table': tbl}
        pooled_objects[name] = info
I am not sure if there are any gotchas in the code above, since I am reusing the same variable names, and it's not clear (to me at least) how the underlying pointers to the resources (the connections) are being handled. For example, will creating another engine (to a different db) and assigning it to the engine variable cause the previous instance to be 'harvested' by the GC (since no code is using that reference yet - the pool is still being set up)?
In short, is the code above OK? And if not, why not, i.e. how may I fix it with respect to the issues mentioned above?
The code you have is perfectly good.
Just because you use the same variable name does not mean you are overriding (or freeing) another object that was assigned to that variable. In fact, you can think of the names as temporary labels attached to your objects.
Now, you store the final objects in the global dictionary pooled_objects, which means that until your program is done, or you delete the data from it explicitly, the GC is not going to free them.
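A small illustration of that point, with plain objects standing in for engines:
pooled_objects = {}

engine = object()                     # stands in for the first engine
pooled_objects['db_one'] = engine

engine = object()                     # rebinds the *name* only; the first
pooled_objects['db_two'] = engine     # engine is still held by the dict

print(pooled_objects['db_one'] is pooled_objects['db_two'])   # False: both still alive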