I'm querying an existing read-only database with SQLAlchemy, and wonder if there is a way to cache the queried table object locally (in an automatic way) so that I can use it later. The main reason of this need is not to lock the database while my script is running (e.g. I have to keep a session connected to wait for user's response), and the database is read-only so I really don't need the modified data to be synchronized back.
Right now I'm working through a solution to convert the queried results into pd.DataFrame, but it would be nice to keep the advantage of SQLAlchemy where the queried result can retain it's structure rather than being converted to a flat table in pd.DataFrame.
I'm new to SQLAlchemy and still learning. Any suggestions about the solution or if I miss some major features already provided in SQLAlchemy package would really be appreciated!
Related
I have many models with relational links to each other which I have to use. My code is very complicated so I cannot keep session alive after a query. Instead, I try to preload all the objects:
def db_get_structure():
with Session(my_engine) as session:
deps = {x.id: x for x in session.query(Department).all()}
...
return (deps, ...)
def some_logic(id):
struct = db_get_structure()
return some_other_logic(struct.deps[id].owner)
However, I get the following error anyway regardless of the fact that all the objects are already loaded:
sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <Department at 0x10476e780> is not bound to a Session; lazy load operation of attribute 'owner' cannot proceed
Is it possible to link preloaded objects with each other so that the relations will work after session get closed?
I know about joined queries (.options(joinedload(), but this approach leads to more code lines and bigger DB request, and I think this should be solved simpler, because all the objects are already loaded into Python objects.
It's even possible now to request the related objects like struct.deps[struct.deps[id].owner_id], but I think the ORM should do this and provide shorter notation struct.deps[id].owner using some "cached load".
Whenever you access an attribute on a DB entity that has not yet been loaded from the DB, SQLAlchemy will issue an implicit SQL statement to the DB to fetch that data. My guess is that this is what happens when you issue struct.deps[struct.deps[id].owner_id].
If the object in question has been removed from the session it is in a "detached" state and SQLAlchemy protects you from accidentally running into inconsistent data. In order to work with that object again it needs to be "re-attached".
I've done this already fairly often with session.merge:
attached_object = new_session.merge(detached_object)
But this will reconile the object instance with the DB and potentially issue updates to the DB if necessary. The detached_object is taken as "truth".
I believe you can do the reverse (attaching it by reading from the DB instead of writing to it) by using session.refresh(detached_object), but I need to verify this. I'll update the post if I found something.
Both ways have to talk to the DB with at least a select to ensure the data is consistent.
In order to avoid loading, issue session.merge(..., load=False). But this has some very important cavetas. Have a look at the docs of session.merge() for details.
I will need to read up on your link you added concerning your "complicated code". I would like to understand why you need to throw away your session the way you do it. Maybe there is an easier way?
I know there are ways of storing data/tables from one server to another, such as the instruction provided here. However, due to I use python to scrape, create, and store data, I am wondering that whether I could fulfill this process by directly using SQLAlchemy. More precisely, after I store the scraped data in the database I create through SQLAlchemy in my own computer, can I simultaneously store.copy those database/tables to another computer/server directly through SQLAlchemy? Can anyone help? Thanks so much.
Some devices are asynchronously storing values on a common remote MySQL database server.
I would like to write a supervisor app in Python (and possibly SQLAlchemy) to recognize the external INSERT events on the database and act upon the last rows' data. This is to avoid a long manual test to see if every table is being updated regularly or a logger crashed.
Can somebody just tell me where to search online this kind of info and, even better, an example?
EDIT
I already read all tables periodically using a datetime primary key ({date_time}), loading the last row of each table, and comparing to the previous values:
SELECT * FROM table ORDER BY date_time DESC LIMIT 1
but it looks very cumbersome and doesn't guarantee that I don't lose some rows between successive database checks.
The engine is an old version of INNODB that I cannot upgrade: I cannot use the UPDATE field in schema because it simply doesn't work.
To reword my question:
How to listen any database event with a daemon-like Python application (sleeping thread) and wake up only when something happens?
I want also to avoid SQL triggers because this would be just too heavy to manage: tables are in hundreds and they are added/removed very often according to the active loggers.
I gave a look to SQLAlchemy but all reference I could find, if I don't misunderstood it, are decorators to act on INSERTs made by SQLAlchemy's itself. I didn't find anything about external changes to the database.
About the example request: I am not interested in a copy-and-paste, because first I want to understand how stuff works. I prefer (even incomplete) examples because SQLAlchemy documentation is far too deep for my knowledge and I simply cannot put the pieces together.
I'm completely new to managing data using databases so I hope my question is not too stupid but I did not find anything related using the title keywords...
I want to setup a SQL database to store computation results; these are performed using a python library. My idea was to use a python ORM like SQLAlchemy or peewee to store the results to a database.
However, the computations are done by several people on many different machines, including some that are not directly connected to internet: it is therefore impossible to simply use one common database.
What would be useful to me would be a way of saving the data in the ORM's format to be able to read it again directly once I transfer the data to a machine where the main database can be accessed.
To summarize, I want to do:
On the 1st machine: Python data -> ORM object -> ORM.fileformat
After transfer on a connected machine: ORM.fileformat -> ORM object -> SQL database
Would anyone know if existing ORMs offer that kind of feature?
Is there a reason why some of the machine cannot be connected to the internet?
If you really can't, what I would do is setup a database and the Python app on each machine where data is collected/generated. Have each machine use the app to store into its own local database and then later you can create a dump of each database from each machine and import those results into one database.
Not the ideal solution but it will work.
Ok,
thanks to MAhsan's and Padraic's answers I was able to find the how this can be done: the CSV format is indeed easy to use for import/export from a database.
Here are examples for SQLAlchemy (import 1, import 2, and export) and peewee
New to Pandas & SQL. Haven't found an answer specific to this config, and not sure if standard SQL wisdom applies when introducing pandas to the mix.
Doing a school project that involves ~300 gb of data in ~6gb .csv chunks.
School advised syncing data via dropbox, but this seemed impractical for a 4-person team.
So, current solution is AWS EC2 & RDS instance (MySQL, I think it'll be, 1 table).
What I wanted to confirm before we start setting it up:
If multiple users are working with (and occasionally modifying) the data, can this arrangement manage conflicts? e.g., if user A uses pandas to construct a dataframe from a query, are the records in that query frozen if user B tries to work with them?
My assumption is that the data in the frame are in memory, and the records in the SQL database are free to be modified by others until the dataframe is written back to the db, but I'm hoping that either I'm wrong or there's a simple solution here (like a random sample query for each user or something).
A pandas DataFrame object does not interact directly with the db. Once you read it in it sits in memory locally. You would have to use a method like DataFrame.to_sql to write your changes back to the MySQL DB. For more information on reading and writing to SQL tables, see the pandas documentation here.