According to the documentation for Twisted's deferToThread (http://twistedmatrix.com/documents/current/api/twisted.internet.threads.deferToThread.html), I can give it one function and its arguments.
I want to limit the number of documents I get back (say, to 3 documents), so I also want to use the limit function from the MongoDB driver (pymongo).
"find" creates a PyMongo Cursor, and does no more work. "find" does not send a message to the MongoDB server, and it does not retrieve any results. The work does not begin unless you iterate the cursor like this:
for doc in cursor:
    print(doc)
Or:
all_docs = list(cursor)
So the way you're doing it is already wrong: you're deferring to a thread the work of creating a Cursor, which does not need to be deferred because it doesn't do network I/O. But you're then using the cursor on the main thread, which you do need to defer.
So I propose something like:
def find_all():
    # find_one() actually does network I/O
    doc1 = self.mongo_pool.database[collection].find_one(self.my_id)
    # creating a cursor does no I/O
    cursor = self.mongo_pool.database[collection].find().limit(3)
    # calling list() on a cursor does network I/O
    return doc1, list(cursor)
stuff_deferred = deferToThread(find_all)
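Since deferToThread returns a Deferred, the results can then be consumed with callbacks back on the reactor thread; a minimal sketch, with purely illustrative handler names:
def on_results(results):
    doc1, docs = results
    print(doc1)
    for doc in docs:
        print(doc)

def on_error(failure):
    # failure wraps whatever exception the thread raised
    print("query failed:", failure.getErrorMessage())

stuff_deferred.addCallbacks(on_results, on_error)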
Is there a simple example of working with the neo4j python driver?
How do I just pass a cypher query to the driver to run and return a cursor?
If I'm reading, for example, this, it seems the demo has a class wrapper with a private member func that I pass to session.write_transaction:
session.write_transaction(self._create_and_return_greeting, ...
That then gets called with a transaction as its first parameter...
def _create_and_return_greeting(tx, message):
that in turn runs the cypher
result = tx.run("CREATE (a:Greeting) "
This seems 10X more complicated than it needs to be.
I did just try a simpler:
def raw_query(query, **kwargs):
    neodriver = neo_connect()  # cached dbconn
    with neodriver.session() as session:
        try:
            result = session.run(query, **kwargs)
            return result.data()
But this results in a socket error on the query, probably because the session goes out of scope?
[dfcx/__init__] ERROR | Underlying socket connection gone (_ssl.c:2396)
[dfcx/__init__] ERROR | Failed to write data to connection IPv4Address(('neo4j-core-8afc8558-3.production-orch-0042.neo4j.io', 7687)) (IPv4Address(('34.82.120.138', 7687)))
Also I can't return a cursor/iterator, just the data()
When the session goes out of scope, the query result seems to die with it.
If I manually open and close a session, then I'd have the same problems?
Python must be the most popular language this DB is used with; does everyone use a different driver?
Py2neo seems cute, but it's completely lacking ORM wrapper functions for most of the cypher language features, so you have to drop down to raw cypher anyway. And I'm not sure it supports **kwargs argument interpolation in the same way.
I guess that big raise should help iron out some kinks :D
Slightly longer version trying to get a working DB wrapper:
def neo_connect() -> Union[neo4j.BoltDriver, neo4j.Neo4jDriver]:
    global raw_driver
    if raw_driver:
        # print('reuse driver')
        return raw_driver
    neoconfig = NEOCONFIG
    raw_driver = neo4j.GraphDatabase.driver(
        neoconfig['url'], auth=(
            neoconfig['user'], neoconfig['pass']))
    if raw_driver is None:
        raise BaseException("cannot connect to neo4j")
    else:
        return raw_driver
def raw_query(query, **kwargs):
    # just get data, no cursor
    neodriver = neo_connect()
    session = neodriver.session()
    # logging.info('neoquery %s', query)
    # with neodriver.session() as session:
    try:
        result = session.run(query, **kwargs)
        data = result.data()
        return data
    except neo4j.exceptions.CypherSyntaxError as err:
        logging.error('neo error %s', err)
        logging.error('failed query: %s', query)
        raise err
    # finally:
    #     logging.info('close session')
    #     session.close()
Update: someone pointed me to this example, which is another way to use the tx wrapper.
https://github.com/neo4j-graph-examples/northwind/blob/main/code/python/example.py#L16-L21
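Based on that example, I guess the wrapper could be rewritten around a transaction function; a rough, untested sketch reusing neo_connect from above:
def raw_query(query, **kwargs):
    def work(tx):
        result = tx.run(query, **kwargs)
        return result.data()  # pull everything out before the transaction closes

    neodriver = neo_connect()  # cached dbconn
    with neodriver.session() as session:
        return session.read_transaction(work)
(For queries that write, session.write_transaction would be the counterpart.)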
def raw_query(query, **kwargs):
    neodriver = neo_connect()  # cached dbconn
    with neodriver.session() as session:
        try:
            result = session.run(query, **kwargs)
            return result.data()
This is perfectly fine and works as intended on my end.
The error you're seeing indicates a connection problem, so there must be something going on between the server and the driver that's outside the driver's influence.
Also, please note that there is a difference between these ways to run a query:
with driver.session() as session:
    result = session.run("<SOME CYPHER>")

def work(tx):
    result = tx.run("<SOME CYPHER>")

with driver.session() as session:
    session.write_transaction(work)
The latter one might be 3 lines longer, and the team working on the drivers has collected some feedback regarding this. However, there are more things to consider here. Firstly, changing the API surface is something that needs careful planning and cannot be done in, say, a patch release. Secondly, there are technical hurdles to overcome. Here are the semantics, anyway:
Auto-commit transaction. Runs only that query as one unit of work.
If you run a new auto-commit transaction within the same session, the previous result will buffer all available records for you (depending on the query, this will consume a lot of memory). This can be avoided by calling result.consume(). However, if the session goes out of scope, the result will be consumed automatically. This means you cannot extract further records from it. Lastly, any error will be raised and needs handling in the application code.
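To make the buffering point concrete, here is a minimal sketch, assuming driver is the object returned by neo4j.GraphDatabase.driver(...) and using placeholder queries:
with driver.session() as session:
    result1 = session.run("MATCH (n:Thing) RETURN n")
    first = next(iter(result1), None)  # take only the record you need
    result1.consume()  # discard the remaining records instead of buffering them

    # a second auto-commit transaction in the same session; without the
    # consume() above, result1 would first buffer all its remaining records
    result2 = session.run("MATCH (m:Other) RETURN m")
    print(first, result2.data())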
Managed transaction. Runs whatever unit of work you want within that function. A transaction is implicitly started and committed (unless you rollback explicitly) around the function.
If the transaction ends (end of function or rollback), the result will be consumed and become invalid. You'll have to extract all records you need before that.
This is the recommended way of using the driver because it will not raise all errors but handle some internally (where appropriate) and retry the work function (e.g. if the server is only temporarily unavailable). Since the function might be executed multiple times, you must make sure it's idempotent.
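And a minimal sketch of the managed-transaction form, extracting records inside the work function before the transaction ends (the labels, properties and driver setup are placeholders):
def get_names(tx, limit):
    result = tx.run("MATCH (p:Person) RETURN p.name AS name LIMIT $limit", limit=limit)
    # materialise the records before the function returns, because the
    # result becomes invalid once the transaction is committed
    return [record["name"] for record in result]

with driver.session() as session:
    # the driver may retry get_names on transient failures, so keep it idempotent
    names = session.read_transaction(get_names, 10)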
Closing thoughts:
Please remember that Stack Overflow is monitored on a best-effort basis, and what can be perceived as hasty comments may get in the way of getting helpful answers to your questions.
Being new to both Python and sqlite, I've been playing around with them recently, trying to figure things out. In particular, with sqlite I've learned how to open/close/commit data to a db. But now I'm trying to clean things up a bit so that I can open/close the db via function calls. For instance, I'd like to do something like:
def open_db():
    conn = sqlite3.connect("path")
    c = conn.cursor()

def close_db():
    c.close()
    conn.close()

def create_db():
    open_db()
    c.execute("CREATE STUFF")
    close_db()
Then when I run the program, before I query or write to the table, I could do something like:
open_db()
c.execute('SELECT * DO STUFF')
OR
c.execute('DELETE * DO OTHER STUFF')
conn.commit()
close_db()
I've read about context managers, but I'm not sure I entirely understand what's going on with them. What would be the easiest way to clean up how I open and close my DB connections so that I'm not always having to retype the cursor commands?
This is because the connection you define is local to the open_db function. Change it as follows:
def open_db():
    conn = sqlite3.connect("path")
    return conn.cursor()
and then
c = open_db()
c.execute('SELECT * DO STUFF')
It should be noted that writing functions like this purely as a learning exercise might be OK, but generally it's not very useful to write a thin wrapper around a database connectivity API.
I don't know that there is an easy way. As already suggested, if you make the name of a database cursor or connection local to a function then these will be lost upon exit from that function. The answer might be to write code using the contextlib module (which is included with the Python distribution, and documented in the help file); I wouldn't call that easy. The documentation for sqlite3 does mention that connection objects can be used as context managers; I suspect you've already noticed that. I also see that there's some sort of context manager for MySQL but I haven't used it.
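For what it's worth, a minimal sketch of such a contextlib-based helper, with a placeholder path and query:
import sqlite3
from contextlib import contextmanager

@contextmanager
def open_db(path="example.db"):
    conn = sqlite3.connect(path)
    try:
        yield conn.cursor()
        conn.commit()  # commit if the block finished without an exception
    finally:
        conn.close()   # always close the connection

# usage: the cursor is only valid inside the with-block
with open_db() as c:
    c.execute("CREATE TABLE IF NOT EXISTS stuff (id INTEGER PRIMARY KEY)")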
I was hoping someone could explain to me how to use offsets or cursors in App Engine. I'm using gcloud to remotely access entities for a huge data migration, and would like to grab data in batches of 100.
I'm guessing there is a very simple way to do this, but the documentation doesn't dive into cursors all too much. Here is what I have so far:
client = datastore.Client(dataset_id=projectID)

# retrieve 100 Articles
query = client.query(kind='Article').fetch(100)

for article in query:
    print article
How could I mark the end of that batch of 100 and then move into the next one? Thanks so much!
Edit:
I should mention that I do not have access to the app engine environment, which is why I'm a bit lost at the moment... :(
I don't have any experience with gcloud, but I don't think this should be too different.
When you query, you will use fetch_page instead of the fetch function. The fetch_page function returns three things (results, cursor, more). The cursor is your bookmark for the query, and more is true if there are probably more results.
Once you've handled your 100 entities, you can pass on the cursor in urlsafe form to your request handler's URI, where you will continue the process starting at the new cursor.
from google.appengine.datastore.datastore_query import Cursor

class printArticles(webapp2.RequestHandler):
    def post(self):
        query = Client.query()
        # Retrieve the cursor.
        curs = Cursor(urlsafe=self.request.get('cursor'))
        # fetch_page returns three things
        articles, next_curs, more = query.fetch_page(100, start_cursor=curs)
        # do whatever you need to do
        for article in articles:
            print article
        # If there are more results to fetch
        if more == True and next_curs is not None:
            # then pass along the cursor
            self.redirect("/job_URI?cursor=" + next_curs.urlsafe())
I'm writing a Python CGI script that will query a MySQL database. I'm using the MySQLdb module. Since the database will be queried repeatedly, I wrote this function....
def getDatabaseResult(sqlQuery, connectioninfohere):
    # connect to the database
    vDatabase = MySQLdb.connect(connectioninfohere)
    # create a cursor, execute an SQL statement and get the result as a tuple
    cursor = vDatabase.cursor()
    try:
        cursor.execute(sqlQuery)
    except:
        cursor.close()
        return None
    result = cursor.fetchall()
    cursor.close()
    return result
My question is... Is this the best practice? Or should I reuse my cursor within my functions? For example, which is better...
def callsANewCursorAndConnectionEachTime():
    result1 = getDatabaseResult(someQuery1)
    result2 = getDatabaseResult(someQuery2)
    result3 = getDatabaseResult(someQuery3)
    result4 = getDatabaseResult(someQuery4)
or do away with the getDatabaseResult function altogether and do something like..
def reusesTheSameCursor():
    vDatabase = MySQLdb.connect(connectionInfohere)
    cursor = vDatabase.cursor()
    cursor.execute(someQuery1)
    result1 = cursor.fetchall()
    cursor.execute(someQuery2)
    result2 = cursor.fetchall()
    cursor.execute(someQuery3)
    result3 = cursor.fetchall()
    cursor.execute(someQuery4)
    result4 = cursor.fetchall()
The MySQLdb developer recommends building an application-specific API that does the DB access stuff for you so that you don't have to worry about the MySQL query strings in the application code. It'll make the code a bit more extendable (link).
As for the cursors my understanding is that the best thing is to create a cursor per operation/transaction. So some check value -> update value -> read value type of transaction could use the same cursor, but for the next one you would create a new one. This is again pointing to the direction of building an internal API for the db access instead of having a generic executeSql method.
Also remember to close your cursors, and commit changes to the connection after the queries are done.
Your getDatabaseResult function doesn't need to make a new connection for every separate query, though. You can share the connection between the queries as long as you act responsibly with the cursors.
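As a rough sketch of that shape, sharing one connection while giving each operation its own cursor (the table, columns and connection parameters are placeholders):
import MySQLdb

def run_report(connection_kwargs):
    conn = MySQLdb.connect(**connection_kwargs)
    try:
        # one cursor per operation
        cursor = conn.cursor()
        try:
            cursor.execute("SELECT id, name FROM users WHERE active = %s", (1,))
            users = cursor.fetchall()
        finally:
            cursor.close()

        # a separate cursor for the next operation
        cursor = conn.cursor()
        try:
            cursor.execute("UPDATE users SET last_report = NOW() WHERE active = %s", (1,))
            conn.commit()  # commit once the write is done
        finally:
            cursor.close()

        return users
    finally:
        conn.close()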
When the following code is executed:
q = MyKind.all()
taskqueue.add(url="/admin/build", params={'cursor': q.cursor()})
I get:
AssertionError: No cursor available.
Why does this happen? Do I need to fetch something first? (I'd rather not; the code is cleaner just to get the query and pass it on.)
I'm using Python on Google App Engine 1.3.5.
Yes, a cursor is only available if you've fetched something; there's no cursor for the first result in the query.
As a workaround, you could wrap the call to cursor() in a try/except and pass on None to the next task if there isn't a cursor available.
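A minimal sketch of that workaround, using the setup from the question (MyKind and the /admin/build handler come from there):
q = MyKind.all()
try:
    cursor = q.cursor()
except AssertionError:
    cursor = None  # nothing has been fetched yet, so no cursor exists

# only pass the cursor along if there is one
params = {'cursor': cursor} if cursor else {}
taskqueue.add(url="/admin/build", params=params)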