I am writing a script that requires interacting with several databases (not concurrently). To facilitate this, I am maintaining the db-related information (connections etc.) in a dictionary. As an aside, I am using SQLAlchemy for all interaction with the db. I don't know whether that is relevant to this question or not.
I have a function to set up the pool. It looks somewhat like this:
def setupPool():
    global pooled_objects
    for name in NAMES:
        engine = create_engine("postgresql+psycopg2://postgres:pwd@localhost/%s" % name)
        metadata = MetaData(engine)
        conn = engine.connect()
        tbl = Table('my_table', metadata, autoload=True)
        info = {'db_connection': conn, 'table': tbl}
        pooled_objects[name] = info
I am not sure if there are any gotchas in the code above, since I am reusing the same variable names, and it's not clear (to me at least) how the underlying pointers to the resources (connections) are being handled. For example, will creating another engine (to a different db) and assigning it to the 'engine' variable cause the previous instance to be 'harvested' by the GC (since no code is using that reference yet; the pool is still being set up)?
In short, is the code above OK? If not, why not, i.e. how may I fix it with respect to the issues mentioned above?
The code you have is perfectly good.
Just because you use the same variable name does not mean you are overriding (or freeing) another object that was assigned to that variable. In fact, you can look at the names as temporary labels to your objects.
Now, you store the final objects in the global dictionary pooled_objects, which means that until your program is done or you delete data from there explicitly, the GC is not going to free them.
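To see this concretely, here is a minimal sketch that does not use SQLAlchemy at all. Resource is a hypothetical stand-in for an engine or connection, and the weak references are there only to observe whether the objects are still alive after the loop variable has been rebound.

```python
import gc
import weakref


class Resource:
    """Hypothetical stand-in for an engine or connection."""

    def __init__(self, name):
        self.name = name


pool = {}
watchers = []

for name in ("db_a", "db_b"):
    res = Resource(name)            # 'res' is rebound on every iteration
    pool[name] = {"resource": res}  # the dict keeps its own reference
    watchers.append(weakref.ref(res))

del res        # drop the last binding of the loop variable
gc.collect()   # force a collection pass

# Every object is still reachable through the dictionary.
alive = [w() is not None for w in watchers]
print(alive)
```

Rebinding the name only moves the label; the object created in the first iteration survives because the dictionary still refers to it.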
I have a Python app split across different files. One of them, models.py, contains, among PyQt5 table models, several maps referred from several PyQt5 form files:
# first lines:
agents_id_map = \
    {agent.name: agent.id for agent in db.session.query(db.Agent, db.Agent.id)}
# ....
# 2000 lines
I want to keep this kind of maps centralized in a single point. I'm using SQLAlchemy also. Agent class is defined in a db.py file. I use these maps to fulfill the foreign key in another object, say, an invoice, like:
invoice = db.Invoice()
# Here is a reference
invoice.agent_id = models.agents_id_map[agent_combo.currentText()]
# ...
db.session.add(invoice)
db.session.commit()
The problem is that the models.py module gets cached and several parts of the application access old data. If another running instance A of the app creates a new agent, and a running instance B wants to create a new invoice, instance B won't see the new Agent created by A unless it restarts the app. This also happens if a user in the same running instance creates an agent and then wants to create an invoice. My solutions are:
Reload the module, to get the whole code executed again, but this could be very expensive.
Isolate the code building those maps in another file, say maps.py, which would be less expensive to reload and change all code that references it through refactoring.
Is there a solution that would allow me to touch only the code building those maps and the rest of the application remains ignorant of the change, and every time the map is referenced from another module or even the same, the code gets executed, effectively re-building maps with fresh data?
Certainly: put your maps inside a function, or even better, a class.
If I understand this problem correctly, you have stateful data (maps) which need regenerating under some condition (every time they are accessed? Or just every time the db is updated?). I would do something like this:
class Mappings:
    def __init__(self, db):
        self._db = db
        ...  # do any initial db stuff you need to here

    def id_map(self, thing):
        # "agent" -> db.Agent: look up the model class by its capitalised name
        db_thing = getattr(self._db, thing.title())
        return {x.name: x.id for x in self._db.session.query(db_thing, db_thing.id)}

    def other_property_map(self, prop):
        ...  # etc

mapping = Mappings(db)
mapping.id_map("agent")
This assumes that the mapping example you've given is your major use-case, but this model could easily be adapted for almost any other mapping you might want.
You would write a method for every kind of 'mapping' you need, and it would return the desired dictionary. Note that here I've assumed you handle setting up the db elsewhere and pass a fully initialised db access object to the class, which is probably what you want to do---this class is just about encapsulating mapper state, not re-inventing your ORM.
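As a runnable illustration of the same idea, here is a minimal sketch that swaps SQLAlchemy for the standard-library sqlite3 module so it is self-contained. The agent table, its columns and the _TABLES whitelist are assumptions made for the example, not part of the original application.

```python
import sqlite3


class Mappings:
    """Builds name -> id dictionaries on demand instead of at import time."""

    # Whitelist of table names, since identifiers cannot be bound as parameters.
    _TABLES = {"agent"}

    def __init__(self, conn):
        self._conn = conn

    def id_map(self, table):
        if table not in self._TABLES:
            raise ValueError(f"unknown table: {table}")
        rows = self._conn.execute(f"SELECT name, id FROM {table}")
        return {name: row_id for name, row_id in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO agent (name) VALUES ('Alice')")

mappings = Mappings(conn)
before = mappings.id_map("agent")

conn.execute("INSERT INTO agent (name) VALUES ('Bob')")
after = mappings.id_map("agent")   # rebuilt from the current table contents
```

Because the dictionary is built inside the method, every call reflects the rows present at that moment, at the cost of one query per call.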
Caching
I have not provided any caching. But if you have complete control over the db, it is easy enough to run a hook before you do any db commits looking to see if you've touched any particular model, and then state that those need rebuilding. Something like this:
class DbAccess(Mappings):
    def __init__(self, db, models):
        super().__init__(db)
        self._cached_map = {model: {} for model in models}

    def db_update(self, model: str, params: dict):
        if model in self._cached_map:
            self._cached_map[model] = {}  # wipe cache
        self._db.update_with_model(model, params)  # dummy fn

    def id_map(self, thing: str):
        try:
            return self._cached_map[thing]["id"]
        except KeyError:
            # cache miss: rebuild the map and remember it
            self._cached_map.setdefault(thing, {})["id"] = super().id_map(thing)
            return self._cached_map[thing]["id"]
I don't really think DbAccess should inherit from Mappings---put it all in one class, or have a DB class and a Mappings mixin and inherit from both. I just didn't want to write everything out again.
I've not written any real db access routines, (hence my dummy fn) as I don't know how you're doing it (but clearly using an ORM). But the basic idea is just to handle the caching yourself, by storing the mapping every time, but deleting all the stored mappings every time you do any commit transactions involving the model in question (thus rebuilding the cache as needed).
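The invalidate-on-write idea can be sketched without any database at all. In the example below, CachedMaps, load and the source dictionary are hypothetical names; load stands in for whatever query actually builds a map, and the calls list only counts how often it runs.

```python
class CachedMaps:
    """Caches one mapping per model until that model is invalidated."""

    def __init__(self, loader):
        self._loader = loader    # callable: model name -> dict
        self._cache = {}

    def id_map(self, model):
        if model not in self._cache:
            self._cache[model] = self._loader(model)
        return self._cache[model]

    def invalidate(self, model):
        # Call this from whatever code path writes to the model.
        self._cache.pop(model, None)


calls = []
source = {"agent": {"Alice": 1}}   # stands in for the database


def load(model):
    calls.append(model)
    return dict(source[model])


maps = CachedMaps(load)
first = maps.id_map("agent")
second = maps.id_map("agent")      # cache hit, loader not called again
source["agent"]["Bob"] = 2         # simulated write to the model
maps.invalidate("agent")
third = maps.id_map("agent")       # rebuilt after invalidation
```

Note that this only helps within one process; a second running instance of the app would still need some other signal to know its cache is stale.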
Aside
Note that if you really do have 2,000 lines of manually declared mappings of the form thing.name: thing.id you really should generate them at runtime anyhow. Declarative is all very well and good, but writing out 2,000 permutations of the same thing isn't declarative, it's just time-consuming---and doing the job a simple loop putting the data in ram could do for you at startup.
Is there any way to explicitly mark an object as clean in the SQLAlchemy ORM?
This is related partly to a previous question on bulk update strategies.
I want to, within a before_flush event listener, mark a bunch of objects as actually not needing to be flushed. This is due to them being manually synced with the database by other means.
I have tried the strategy below, but it results in the object being removed from the session, which then can cause problems later when a lazy load happens.
@event.listens_for(SignallingSession, 'before_flush')
def before_flush(session, flush_context, instances):
    ledgers = []
    if session.dirty:
        for elem in session.dirty:
            if session.is_modified(elem, include_collections=False):
                if isinstance(elem, Wallet):
                    session.expunge(elem)  # causes problems later
                    ledgers.append(Ledger(id=elem.id, amount=elem.balance))
    if ledgers:
        session.bulk_save_objects(ledgers)
        session.execute('UPDATE wallet w JOIN ledger l on w.id = l.id SET w.balance = l.amount')
        session.execute('TRUNCATE ledger')
I want to do something like:
session.dirty.remove(MyObject)
But that doesn't work as session.dirty is a computed property, not a regular attribute. I've been digging around the instrumentation code, but can't see how I might fool the dirty list to not contain something. I see there is also a history on the object state that will need taking care of as well.
Any ideas? The underlying database is MySQL if that makes any difference.
-Matt
When you modify the database outside of the ORM, you can let the ORM know the current database state by using set_committed_value().
Example:
from sqlalchemy.orm.attributes import set_committed_value

wallet = session.query(Wallet).filter_by(id=123).one()  # .one() returns the instance, not the Query
wallet.balance = 0
session.execute("UPDATE wallet SET balance = 0 WHERE id = 123;")
set_committed_value(wallet, "balance", 0)
session.commit()  # won't issue additional SQL to update wallet
If you really wanted to mark the instance as not dirty, you can muck with the internals of SQLAlchemy:
from sqlalchemy import inspect

state = inspect(p)  # p is an instance attached to the session
session.identity_map._modified.discard(state)
state.modified = False
print(p in session.dirty)  # False
Let me summarize this insanity.
from sqlalchemy.orm import attributes
attributes.instance_state(your_object).committed_state.clear()
Easy. (no)
There's something I'm struggling to understand about SQLAlchemy from its documentation and tutorials.
I see how to autoload classes from a DB table, and I see how to design a class and create from it (declaratively or using the mapper()) a table that is added to the DB.
My question is how does one write code that both creates the table (e.g. on first run) and then reuses it?
I don't want to have to create the database with one tool or one piece of code and have separate code to use the database.
Thanks in advance,
Peter
create_all() does not do anything if a table exists already, so just call it as soon as you set up your engine or connection.
(Note that if you change your table schema, create_all() will not update it! So you still need "another program" to do that.)
This is the usual pattern:
def createEngine(metadata, dsn, **args):
    engine = create_engine(dsn, **args)
    metadata.create_all(engine)
    return engine

def doStuff(engine):
    res = engine.execute('select * from mytable')
    # etc etc

def main():
    engine = createEngine(metadata, 'sqlite:///:memory:')
    doStuff(engine)

if __name__=='__main__':
    main()
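The same create-if-missing behaviour can be tried out without SQLAlchemy. This is only a sketch using the standard-library sqlite3 module, where CREATE TABLE IF NOT EXISTS plays the role that create_all() plays above; open_db, the temporary file path and the label column are assumptions made for the example.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    """Open the database and make sure the schema exists (safe to repeat)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mytable (id INTEGER PRIMARY KEY, label TEXT)"
    )
    return conn


db_path = os.path.join(tempfile.mkdtemp(), "app.db")

# First "run": the table is created and a row is stored.
conn = open_db(db_path)
conn.execute("INSERT INTO mytable (label) VALUES ('first run')")
conn.commit()
conn.close()

# Second "run": the table already exists, so nothing is recreated or lost.
conn = open_db(db_path)
labels = [row[0] for row in conn.execute("SELECT label FROM mytable")]
conn.close()
```

The one setup function serves both the first run and every later run, which is the point of calling the creation step unconditionally.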
I think you're perhaps over-thinking the situation. If you want to create the database afresh, you normally just call Base.metadata.create_all() or equivalent, and if you don't want to do that, you don't call it.
You could try calling it every time and handling the exception if it goes wrong, assuming that the database is already set up.
Or you could try querying for a certain table and if that fails, call create_all() to put everything in place.
Every other part of your app should work in the same way whether you perform the db creation or not.
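The "query for a certain table first" variant might look like the following sketch. It again uses sqlite3 rather than SQLAlchemy, and it checks SQLite's own sqlite_master catalog; table_exists and ensure_schema are hypothetical helper names, and other databases expose their catalogs differently.

```python
import sqlite3


def table_exists(conn, name):
    """Return True if a table with this name is listed in SQLite's catalog."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def ensure_schema(conn):
    """Create the schema only when it is missing; report whether we created it."""
    if table_exists(conn, "mytable"):
        return False
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, label TEXT)")
    return True


conn = sqlite3.connect(":memory:")
created_first = ensure_schema(conn)    # nothing there yet, so it is created
created_second = ensure_schema(conn)   # already present, so nothing happens
```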
I want to know whether db.run_in_transaction() acts as a lock for Datastore operations and helps in the case of concurrent access to the same entity.
In the following code, is it guaranteed that concurrent access will not cause a race, so that a new entity is created rather than an existing one being overwritten?
Is db.run_in_transaction() the correct/best way to do this?
In the following code I am trying to create a new unique entity:
def txn(charmer=None):
    new = None
    key = my_magic() + random_part()
    sk = Snake.get_by_name(key)
    if not sk:
        new = Snake(key_name=key, charmer=charmer)
        new.put()
    return new

db.run_in_transaction(txn, charmer)
That is a safe method. Should the same name get generated twice, only one entity would be created.
It sounds like you have already looked at the transactions documentation. There is also a more detailed description.
Check out the docs (specifically the equivalent code) on Model.get_or_insert, it answers exactly the question you are asking:
"The get and subsequent (possible) put are wrapped in a transaction to ensure atomicity. This means that get_or_insert() will never overwrite an existing entity, and will insert a new entity if and only if no entity with the given kind and name exists."
What you've done is right and sort of duplicates the Model.get_or_insert, like Robert already explained.
I don't know if this can be called a 'lock'... the way this works is optimistic concurrency - the operation will execute assuming that no one else is trying to do the same thing at the same time, and if someone is, it will give you an exception. You'll need to figure out what you want to do in that case. Maybe ask the user to choose a new name?
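To make "optimistic" concrete, here is a toy in-memory model of the idea. It is not the App Engine API: ToyStore, ConflictError and this get_or_insert are hypothetical, and the single version counter is a deliberate simplification. A writer records the version it saw, and the commit is refused if anything changed in the meantime.

```python
class ConflictError(Exception):
    """Raised when the store changed since the transaction began."""


class ToyStore:
    def __init__(self):
        self._data = {}
        self._version = 0

    def begin(self):
        # No lock is taken; we only remember what version we started from.
        return self._version

    def get(self, key):
        return self._data.get(key)

    def commit(self, seen_version, key, value):
        if seen_version != self._version:
            raise ConflictError(key)
        self._data[key] = value
        self._version += 1


def get_or_insert(store, key, value, retries=3):
    for _ in range(retries):
        seen = store.begin()
        existing = store.get(key)
        if existing is not None:
            return existing          # never overwrite an existing entry
        try:
            store.commit(seen, key, value)
            return value
        except ConflictError:
            continue                 # someone else wrote first; look again
    raise ConflictError(key)


store = ToyStore()
first = get_or_insert(store, "cobra", "charmer-1")
second = get_or_insert(store, "cobra", "charmer-2")   # existing value wins

# Two writers that both started from the same version: the second one fails.
stale = store.begin()
store.commit(stale, "viper", "charmer-3")
try:
    store.commit(stale, "mamba", "charmer-4")
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

What to do after the retries are exhausted is an application decision, as noted above.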
Say I want to ask many users to give me their ID number and their name, then save it, so that I can later look up any ID and get the name. Can someone tell me how I can do that by making a class and using the __init__ method?
The "asking" part, as @Zonda's answer says, could use raw_input (or Python 3's input) at a terminal ("command window" in Windows); but it could also use a web application, or a GUI application -- you don't really tell us enough about where the users will be (on the same machine you're using to run your code, or at a browser while your code runs on a server?) and whether GUI or textual interfaces are preferred, so it's impossible to give more precise advice.
For storing and retrieving the data, a SQL engine as mentioned in @aaron's answer is a possibility (though some might consider it overkill if this is all you want to save), but his suggested alternative of using pickle directly makes little sense -- I would instead recommend the shelve module, which offers (just about) the equivalent of a dictionary persisted to disk. (Keys, however, can only be strings -- but even if your IDs are integers instead, that's no problem, just use str(someid) as the key both to store and to retrieve).
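A minimal sketch of that approach, assuming integer IDs and a temporary location for the shelf file (both chosen only for the example):

```python
import os
import shelve
import tempfile

# Hypothetical location; shelve may add its own file extension(s).
path = os.path.join(tempfile.mkdtemp(), "names")

user_id = 42

# Store: keys must be strings, so convert the integer ID.
with shelve.open(path) as db:
    db[str(user_id)] = "Alice"

# Retrieve later (possibly in another run of the program).
with shelve.open(path) as db:
    name = db[str(user_id)]
```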
In a truly weird comment I see you ask...:
"is there any way to do it by making a class? and using the __init__ method?"
Of course there is a way to do "in a class, using the __init__ method" most anything you can do in a function -- at worst, you write all the code that would (in a sensible program) be in the function, in the __init__ method instead (in lieu of return, you stash the result in self.result and then get the .result attribute of the weirdly useless instance you have thus created).
But it makes sense to use a class only when you need special methods, or want to associate state and behavior, and you don't at all explain why either condition should apply here, which is why I call your request "weird" -- you provide absolutely no context to explain why you would at all want that in lieu of functions.
If you can clarify your motivations (ideally by editing your question, or, even better, asking a separate one, but not by extending your question in sundry comments!-) maybe it's possible to help you further.
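For what it's worth, the one reading of the request where a class is arguably justified is "associate state and behavior": the stored names are the state, adding and looking up are the behavior. A minimal sketch under that assumption, where UserRegistry and its method names are hypothetical and the data is held in memory only:

```python
class UserRegistry:
    """Keeps ID -> name pairs together with the operations on them."""

    def __init__(self):
        # __init__ only sets up the empty state; it does not do the work itself.
        self._names = {}

    def add(self, user_id, name):
        self._names[user_id] = name

    def name_for(self, user_id):
        return self._names[user_id]


registry = UserRegistry()
registry.add(101, "Alice")
registry.add(102, "Bob")
found = registry.name_for(102)
```

In a real program the add calls would be fed from whatever input mechanism is used, and the dictionary would be persisted if the data has to survive a restart.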
To get data from a user, use this code (python 3).
ID = input("Enter your id: ")
In python 2, replace input with raw_input.
The same can be done to get the user's name.
This will save it to a variable, which can be used later in the program. If you want to save it to a file, use the following code:
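# raw string so the backslashes in the path are not treated as escapes;
# assumes name was read the same way as ID
with open(r'\path\to\file.txt', 'w') as w:
    w.write(ID + ' ' + name + '\n')  # write() takes a single string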
If you're not concerned with security, you can use the pickle module to pickle a dictionary.
import pickle
data = {}
# whatever you do to collect the data
data[id] = name
with open(filename, 'wb') as f:  # pickle needs a binary file object, not a path
    pickle.dump(data, f)
with open(filename, 'rb') as f:
    new_data = pickle.load(f)
new_name = new_data[id]
# new_name == name
Otherwise, use the sqlite3 module:
import sqlite3
conn = sqlite3.connect(filename)
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS names (id INTEGER, name TEXT)')
# do whatever you do to get the data
cur.execute('INSERT INTO names VALUES (?,?)', (id, name))
conn.commit()  # persist the insert
# to get the name later by id you would do...
cur.execute('SELECT name FROM names WHERE id = ?', (id,))
name = cur.fetchone()[0]