I am moving the code that interacts with my Neo4j database into a separate Python module so I can stop repeating myself in other modules.
The problem I am having is that each function in the new module needs its own call to...
db = Graph('http://localhost/db/data')
...to establish a connection to the database.
This seems really silly and does nothing to achieve my goal of reducing the amount of unnecessary code.
Normally I would establish the connection in the main function, but because this module is called from elsewhere I can't do that.
I am looking for a way to establish a variable, local to the module, that persists between function calls, so I can forget about connecting to the db.
Apart from the code bloat, having every function open its own connection to the database is extremely inefficient. If you want to stick with procedural programming, then make the connection a global (module-level) variable. With classes you can either connect in the __init__ method or pass the database connection to __init__ as an argument. Most of my classes support both:
class A:
    def __init__(self, ..., db_connection=None):
        self.db_connection = db_connection or DbConnection()
        ...

    def f(self):
        self.db_connection("SELECT A from B")
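For the procedural, global-variable route, a minimal sketch might look like the following; it assumes py2neo's Graph as in the question, and the module and function names are made up:

# neo_db.py
from py2neo import Graph

# Created once, the first time this module is imported; every function in
# this module (and every importer) reuses the same Graph object.
db = Graph('http://localhost/db/data')

def get_graph():
    # other modules can import this (or db directly) instead of reconnecting
    return db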
I'm building an application that uses Redis as a datastore. Accordingly, I have many functions that interact with Redis, usually as wrappers for a group of Redis commands.
As the application grows past my initial .py file, I'm at a loss for how to handle the Redis connection across multiple modules. Currently, the pointer to the Redis connection is declared at the top of the file, and every function assumes it is present rather than receiving it as an argument. If I spread these functions into multiple files, then each module creates its own Redis pointer, and each instance of the application ends up opening multiple connections to Redis.
I would like one instance to just make use of the same connection.
I don't want to do this:
import redis

class MyApp(object):
    def __init__(self):
        self.r = redis.Redis()

    # (all my app functions that touch redis go here)
I also don't want to pass the Redis pointer as an argument into every function.
Is there some other way I can get functions from different modules to share a single Redis() instance?
Something like this:
Module redis_manager:
import redis

class RedisManager(object):
    def __init__(self):
        # Connect to redis etc.
        self.redis = redis.Redis()

redis_manager = RedisManager()
Then in your other modules, you can do:
from redis_manager import redis_manager
redis_manager.redis.stuff
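As a hypothetical usage sketch (the module and function names here are invented), any other module can then share that single connection:

# worker.py
from redis_manager import redis_manager

def save_score(user, score):
    # every importer gets the same RedisManager object, so the process
    # only ever creates one Redis connection (pool)
    redis_manager.redis.set('score:' + user, score)

This works because Python caches modules in sys.modules, so redis_manager, and the RedisManager instance created when it is first imported, is only initialized once per process.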
So I have a daemon process that talks to Postgres via sqlalchemy. The daemon does something like this:
while True:
    oEngine = setup_new_engine()
    with oEngine.connect() as conn:
        Logger.debug("connection established")
        DBSession = sessionmaker(bind=conn)()
        Logger.debug('DBSession created. id={0}'.format(id(DBSession)))
        # do a bunch of stuff with DBSession
        DBSession.commit()
        Logger.debug('DBSession committed. id={0}'.format(id(DBSession)))
On the first iteration of the forever loop everything works great. For a while. The DBSession successfully makes a few queries to the database. But then one query fails with the error:
OperationalError: (OperationalError) SSL SYSCALL error: Bad file descriptor
This speaks to me of a closed connection or file descriptor being used. But the connections are created and maintained by the daemon so I don't know what this means.
In other words what happens is:
create engine
open connection
setup dbsession
query dbsession => works great
query dbsession => ERROR
The query in question looks like:
(DBSession.query(Login)
    .filter(Login.LFTime == oLineTime)
    .filter(Login.success == self.success)
    .count())
which seems perfectly reasonable to me.
My question is: What kinds of reasons could there be for this kind of behaviour and how can I fix it or isolate the problem?
Let me know if you need more code. There is a heck of a lot of it so I went for the minimalist approach here...
I fixed this by thinking about the session scope instead of the transaction scope.
while True:
    do_stuff()

def do_stuff():
    oEngine = setup_new_engine()
    with oEngine.connect() as conn:
        Logger.debug("connection established")
        DBSession = sessionmaker(bind=conn)()
        # do a bunch of stuff with DBSession
        DBSession.commit()
        DBSession.close()
I would still like to know why this fixed things though...
You are creating the session inside your while loop, which is very ill-advised. With the code the way you had it the first time, you would spawn a new connection on every iteration and leave it open. Before too long you would be bound to hit some kind of limit and be unable to open yet another session. What kind of limit? Hard to say: it could be a memory condition, since DB connections are pretty weighty, or a server-side cap on the number of simultaneous user connections it will accept for performance reasons. It doesn't really matter, because whatever the limit was, it prevented you from using a very wasteful approach and hence worked as intended!
The solution you hit upon fixes the problem because, just as you open a new connection on each loop, you also close it on each loop, freeing the resources and allowing later iterations to create sessions of their own and succeed. However, this is still a lot of unnecessary busywork and a waste of processing resources on both the server and the client. I suspect it would work just as well, and potentially be a lot faster, if you moved the sessionmaker outside the while loop.
def main():
    oEngine = setup_new_engine()
    with oEngine.connect() as conn:
        Logger.debug("connection established")
        DBSession = sessionmaker(bind=conn)()
        apparently_infinite_loop(DBSession)

        # close only after we are done and have somehow exited the infinite loop
        DBSession.close()

def apparently_infinite_loop(DBSession):
    while True:
        # do a bunch of stuff with DBSession
        DBSession.commit()
I don't currently have a working sqlalchemy setup, so there may be some syntax errors in there, but I hope it makes the point about the fundamental underlying issue.
More detail is available here: http://docs.sqlalchemy.org/en/rel_0_9/orm/session.html#session-faq-whentocreate
Some points from the docs to note:
"The Session will begin a new transaction if it is used again". So this is why you don't need to be constantly opening new sessions in order to get transaction scope; a commit is all it takes.
"As a general rule, the application should manage the lifecycle of the session externally to functions that deal with specific data." So your fundamental problem originally (and still) is all of that session management going on right down there inside the while loop right alongside your data processing code.
I want to write a custom OutputWriter for GAE's Mapreduce framework. This OutputWriter should open a direct tcp connection to an open MongoDB port, and write the results of the reduce step directly to this database.
I'm using pymongo to interact with mongodb. The existing Mapreduce library requires output writers to be JSON serializable. Once the output writer has thus established a connection with the mongodb instance like so:
from pymongo import Connection
conn = Connection(host=MONGODB_HOST, port=MONGODB_PORT)
db = conn.test_db
db.authenticate(MONGODB_USERNAME, MONGODB_PASSWD)
I'd like to either serialize Connection (of type pymongo.connection.Connection) or db itself (a pymongo.database.Database). Naturally, those objects aren't JSON serializable, so I thought I could just make a JSON dict with a pickled database inside, but it seems that pymongo doesn't natively support pickling these objects, i.e. neither has a __getstate__ method.
I assume I could simply store the connection and authentication parameters, and reopen a connection when the OutputWriter is deserialized, but that seems overly hacky and time and resource intensive.
Can someone point me to a workaround, or perhaps a different kind of serialization I haven't thought of?
I assume I could simply store the connection and authentication parameters, and reopen a connection when the OutputWriter is deserialized, but that seems overly hacky and time and resource intensive.
What else would you expect to be able to do? A database connection is, in general, a wrapper around objects that live outside of Python (sockets, file handles, instances of opaque objects created by a C library, etc.), so there's no way to just store one and restore it in a later instance of the process, pass it to a different process, and so on. So any general-purpose serialization for a class like this would have to work by storing the connection parameters and re-connecting.
But there are many cases where you wouldn't want to do that. (Also, remember that making something pickleable also makes it copyable, and it's far from clear that you'd always want to copy a database connection by opening a new distinct but equivalent connection.) Which is why most database connection objects and similar things are not pickleable.
Meanwhile, if you're trying to pass these around within a process, while the connection is still alive... then you shouldn't be pickling them in the first place, just pass references to the connection around.
So anyway, I'd suggest you do exactly what you suggested but didn't want to do, but wrap it up by subclassing (or monkeypatching) the two classes so they can be pickled directly, instead of passing a bunch of separate parameters around and making everyone else have to know how to deal with it.
I don't think __getstate__ will work here. That would imply that you can make a database connection by default-constructing the instance and then setting attributes or calling methods after the fact, but most database connection classes require you to pass arguments into the constructor call to be used at __new__ or __init__ time. You could probably do this with just __getnewargs__ (which is actually even simpler than __getstate__), however. If not, you'll need the more complex __reduce__ mechanism.
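A rough sketch of that idea, using __reduce__ and the old pymongo Connection API from the question (the class name and the stored parameters are my own invention):

import pymongo

class PickleableConnection(pymongo.Connection):
    def __init__(self, host, port):
        super(PickleableConnection, self).__init__(host=host, port=port)
        self._init_args = (host, port)

    def __reduce__(self):
        # pickle records the class plus the constructor arguments; unpickling
        # calls PickleableConnection(host, port) again, i.e. it re-connects
        # rather than trying to restore a live socket
        return (self.__class__, self._init_args)

The OutputWriter can then hold one of these, and the reconnect-on-restore behaviour you described is hidden behind the class instead of being spread across everyone who handles the writer.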
I'm developing a python program to monitor and control a game-server. The game-server has many game-cores, and those cores handle the clients.
I have a python class called Server that holds instances of the class Core, and those instances are used to manage the actual game-cores. The Core class needs to connect to the game-core via a TCP socket in order to send commands to that specific game-core. To close those sockets properly, the Core class has a __del__ method which closes the socket.
An example:
class Server(object):
    Cores = []  # list which will be filled with the Core objects

    def __init__(self):
        # detect the game-cores, create the core objects and append them to self.Cores
        pass

class Core(object):
    CoreSocket = None  # when the socket gets created, the socket-object will be bound to this variable

    def __init__(self, coreID):
        # initiate the socket connection between the running game-core and this python object
        pass

    def __del__(self):
        # properly close the socket connection
        pass
Now, when I use the Core class itself, the destructor always gets called properly. But when I use the Server class, the Core objects inside Server.Cores never get destructed. I have read that the gc has a problem with circular references and classes with destructors, but the Core objects never reference the Server object (only the socket-object, in Core.CoreSocket), so no circular references are created.
I usually prefer using the with statement for resource cleanup, but in this case I need to send commands from many different methods in the Server class, so using with won't help. I also tried creating and closing the socket for each command, but that really kills performance when I need to send many commands. Weak references created with the weakref module won't help either, because the destructors then get called immediately after I create the Server object.
Why don't the Core objects get destructed properly when the Server object gets cleaned up by the gc? I guess I'm just forgetting something simple, but I just can't find out what it is.
Or maybe there is a better approach for closing those sockets when the object gets cleaned up?
You've mixed up class and instance members. Unlike in some other languages, defining a variable at class scope creates a class variable, not an instance variable. When a Server instance dies, the Server class is still around and still holds references to the cores. Define self.cores in the __init__ method instead:
class Server(object):
    def __init__(self):
        self.cores = []
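The same applies to Core: make the socket an instance attribute as well, so it lives and dies with each Core object. A sketch, with the host and port invented purely for illustration:

import socket

class Core(object):
    def __init__(self, core_id, host='localhost', port=4000):
        self.core_id = core_id
        # instance attribute, created per Core object
        self.core_socket = socket.create_connection((host, port))

    def __del__(self):
        # guard against a partially constructed object
        sock = getattr(self, 'core_socket', None)
        if sock is not None:
            sock.close()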
I am considering implementing a singleton class following http://code.activestate.com/recipes/52558-the-singleton-pattern-implemented-with-python/ but am wondering about any (b)locking issues. My code is supposed to cache SQL statements and execute all cached statements using cursor.executemany(SQL, list-of-params) when a certain number of cached elements is reached or a specific execute call is made by the user. Implementing a singleton was supposed to make it possible to cache statements application-wide, but I'm afraid I'll run into (b)locking issues.
Any thoughts?
By avoiding lazy initialization, the blocking problem goes away. In the module where your database connection is initialized, import the module that contains the singleton and then immediately create an instance of the singleton without storing it in a variable.
# Do database initialization
import MySingleton
MySingleton.MySingleton()
# Allow threads to be created
Why don't you use the module directly? As pointed out before, modules are singletons. If you create a module like:
# mymodule.py
from mydb import Connection
connection = Connection('host', 'port')
you can use the import mechanism and the connection instance will be the same everywhere.
from mymodule import connection
Of course, you can define a much more complex initialization of connection (possibly via writing your own class), but the point is that Python will only initialize the module once, and provide the same objects for every subsequent call.
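As a sketch of that more complex, class-based variant (all names besides Connection are assumptions), you could delay connecting until first use while still exposing a single shared object:

# mymodule.py
from mydb import Connection

class _LazyConnection(object):
    def __init__(self):
        self._conn = None

    @property
    def conn(self):
        # connect on first access, then keep returning the same object
        if self._conn is None:
            self._conn = Connection('host', 'port')
        return self._conn

connection = _LazyConnection()

Importers still just do from mymodule import connection and use connection.conn; since the module is only initialized once, everyone shares the same underlying Connection.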
I believe the Singleton (or Borg) patterns have very specific applications in Python, and for the most part you should rely on direct imports until proven otherwise.
There should be no problems unless you plan to use that Singleton instance with several threads.
Recently I faced an issue caused by a wrongly implemented cache reloading mechanism: the cache data was first cleared and then refilled. This works fine in a single thread, but produces bugs with multiple threads.
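A sketch of a safer reload for that case (the class and method names are made up): build the replacement data completely first, then swap it in with a single assignment, so readers never observe an empty or half-filled cache.

class StatementCache(object):
    def __init__(self, loader):
        self._loader = loader        # callable that returns a fresh dict
        self._cache = loader()

    def reload(self):
        new_cache = self._loader()   # build the new data completely first
        self._cache = new_cache      # one rebinding; readers see old or new, never empty

    def get(self, key, default=None):
        return self._cache.get(key, default)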
As long as you use CPython, the Global Interpreter Lock should prevent blocking problems. You could also use the Borg pattern.
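For reference, the Borg pattern mentioned here is usually written along these lines: instances stay distinct objects, but they all share one __dict__ and therefore the same state.

class Borg(object):
    _shared_state = {}

    def __init__(self):
        # every instance rebinds its __dict__ to the same shared dict
        self.__dict__ = self._shared_state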