I'm trying to store an (admittedly very large) BLOB in an SQLite database using SQLAlchemy.
For the MCVE I use ubuntu-14.04.2-desktop-amd64.iso as BLOB I want to store. Its size:
$ ls -lh ubuntu-14.04.2-desktop-amd64.iso
... 996M ... ubuntu-14.04.2-desktop-amd64.iso
The code
from pathlib import Path
from sqlalchemy import (Column, Integer, String, BLOB, create_engine)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlite3 import dbapi2 as sqlite
SA_BASE = declarative_base()
class DbPath(SA_BASE):
    __tablename__ = 'file'

    pk_path = Column(Integer, primary_key=True)
    path = Column(String)
    data = Column(BLOB, default=None)

def create_session(db_path):
    db_url = 'sqlite+pysqlite:///{}'.format(db_path)
    engine = create_engine(db_url, module=sqlite)
    SA_BASE.metadata.create_all(engine)
    session = sessionmaker(bind=engine)
    return session()

if __name__ == '__main__':
    pth = Path('/home/user/Downloads/iso/ubuntu-14.04.2-desktop-amd64.iso')
    with pth.open('rb') as file_pointer:
        iso_data = file_pointer.read()

    db_pth = DbPath(path=str(pth), data=iso_data)
    db_session = create_session('test.sqlite')
    db_session.add(db_pth)
    db_session.commit()
Running this raises the error
InterfaceError: (InterfaceError) Error binding parameter 1 - probably unsupported
type. 'INSERT INTO file (path, data) VALUES (?, ?)'
('/home/user/Downloads/iso/ubuntu-14.04.2-desktop-amd64.iso', <memory
at 0x7faf37cc18e0>)
I looked at the SQLite limitations but found nothing that should prevent me from doing this. Does SQLAlchemy have a limitation?
All of this works fine with this file:
$ ls -lh ubuntu-14.04.2-server-amd64.iso
... 595M ... ubuntu-14.04.2-server-amd64.iso
Is there a data size limit? Or what do I have to do differently when the file size surpasses a certain limit (and where would that limit be)?
And whatever the answer about the limit: what I'm interested in is how I can store files of this size in SQLite using SQLAlchemy.
I'm testing a FastAPI app with pytest. I've created a client fixture which includes a sqlite DB created from CSVs:
import pytest
from os import path, listdir, remove
from pandas import read_csv
from fastapi.testclient import TestClient
from api.main import app
from api.db import engine, db_url
@pytest.fixture(scope="session")
def client():
    db_path = db_url.split("///")[-1]
    if path.exists(db_path):
        remove(db_path)
    file_path = path.dirname(path.realpath(__file__))
    table_path = path.join(file_path, "mockdb")
    for table in listdir(table_path):
        df = read_csv(path.join(table_path, table))
        df.to_sql(table.split('.')[0], engine, if_exists="append", index=False)
    client = TestClient(app)
    yield client
My DB setup in the FastAPI app:
import os
from sys import modules  # used below to detect a pytest run

from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

dirname = os.path.dirname(__file__)

if "pytest" in modules:
    mock_db_path = os.path.join(dirname, '../test/mockdb/test.db')
    db_url = f"sqlite:///{mock_db_path}"
else:
    db_url = os.environ.get("DATABASE_URL", None)

if "sqlite" in db_url:
    engine = create_engine(db_url, connect_args={"check_same_thread": False})
else:
    engine = create_engine(db_url)

SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base = declarative_base()
This works: I can set up tests for app endpoints that query the DB, and the data I put in the CSVs is returned, e.g. after adding one row to mockdb/person.csv:
from api.db import SessionLocal
db = SessionLocal()
all = db.query(Person).all()
print(all)
[<tables.Person object at 0x7fc829f81430>]
I am now trying to test code which adds new rows to tables in the database.
This only works if I specify the ID (assume this occurs during the pytest run):
db.add(Person(id=2, name="Alice"))
db.commit()
all = db.query(Person).all()
print(all)
[<tables.Person object at 0x7fc829f81430>, <tables.Person object at 0x7fc829f3bdc0>]
The above result is how I'd expect the program to behave. However, if I don't specify the ID, then the result is None:
db.add(Person(name="Alice"))
db.commit()
all = db.query(Person).all()
print(all)
[<tables.Person object at 0x7fc829f81430>, None]
This result is not how I expect the program to behave.
The code that I want to test does not specify IDs; it uses autoincrement, as is good practice. Thus, I am unable to test this code: it simply creates these Nones.
At first, I thought the culprit was not creating the tables with Base.metadata.create_all(). However, I have tried placing this both in my client fixture and after my DB setup (i.e. in the first two code blocks above), but the result is the same: Nones.
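For reference, a rough sketch of what I mean (the Person model here is an illustrative stand-in for the real one in tables.py; it relies on an autoincrementing primary key):
from sqlalchemy import Column, Integer, String
from api.db import Base, engine

class Person(Base):
    __tablename__ = "person"
    # the application code never passes an ID; it relies on autoincrement
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String)

# placed at the start of the client() fixture (and also tried after the DB setup)
Base.metadata.create_all(bind=engine)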
Stepping through with the debugger, when the Person row is added, the following error appears:
sqlalchemy.orm.exc.ObjectDeletedError: Instance '<Person at 0x7fc829f3bdc0>' has been deleted, or its row is otherwise not present.
Why is the resulting row None and how do I solve this error?
The cause of the error was that I had a column type in my DB that was not compatible with SQLite, namely PostgreSQL's ARRAY type. Unfortunately there was no error message hinting at this. The simplest solution is to remove or change the type of this column.
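If you prefer changing the type, one possible approach (a sketch with illustrative names; I did not use this myself) is with_variant(), which keeps ARRAY on PostgreSQL but swaps in an SQLite-friendly type such as JSON for the test database:
from sqlalchemy import Column, Integer, String, JSON
from sqlalchemy.dialects.postgresql import ARRAY

class MyOffendingTable(Base):
    __tablename__ = "my_offending_table"
    id = Column(Integer, primary_key=True)
    # ARRAY on PostgreSQL, JSON whenever the SQLite dialect is in use
    my_offending_column = Column(ARRAY(String).with_variant(JSON(), "sqlite"))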
It is also possible to retain the column and the SQLite fixture by changing client() as follows:
from mytableschema import MyOffendingTable

from api.db import Base  # the declarative Base from the app's DB setup

@pytest.fixture(scope="session")
def client():
    # drop the offending column from the table metadata before creating the schema
    table_meta = MyOffendingTable.metadata.tables[MyOffendingTable.__tablename__]
    table_meta._columns.remove(table_meta._columns["my_offending_column"])
    Base.metadata.create_all(bind=engine)
    db_path = db_url.split("///")[-1]
    if path.exists(db_path):
        remove(db_path)
    file_path = path.dirname(path.realpath(__file__))
    table_path = path.join(file_path, "mockdb")
    for table in listdir(table_path):
        df = read_csv(path.join(table_path, table))
        df.to_sql(table.split('.')[0], engine, if_exists="append", index=False)
    client = TestClient(app)
    yield client
It is now possible to proceed as normal if you remove my_offending_column from the MyOffendingTable CSV. No more Nones!
Sadly, querying the offending table during the test run will still run into issues, as the SELECT statement will look for the nonexistent my_offending_column. For those needing to query said table, I recommend using dialect-specific compilation rules.
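A minimal sketch of such a compilation rule (assuming the offending column uses PostgreSQL's ARRAY type, as in my case) is to render the type as TEXT whenever the SQLite dialect compiles it:
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy.ext.compiler import compiles

@compiles(ARRAY, "sqlite")
def compile_array_for_sqlite(type_, compiler, **kw):
    # SQLite has no ARRAY type, so fall back to TEXT for the test database
    return "TEXT"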
I have a pre-existing database and SQLAlchemy. Using reflection, I would like to query this database, but there is a problem. The table is called 'logs' and it has two foreign keys, both referring to the table 'server'. The 'server' table has a column called 'class', and that name is reserved in Python.
Code:
from sqlalchemy import orm, create_engine
from sqlalchemy.ext.automap import automap_base
from django.conf import settings
base = automap_base()
connection_setup = (
    "{driver}://{user}:{password}@{host}:{port}/{dbname}".format(
        **settings.ALCHEMY_DB))
engine = create_engine(connection_setup, echo=False)
base.prepare(engine, reflect=True)
scoped_session = orm.scoped_session(orm.sessionmaker(bind=engine))
session = scoped_session()
logs = base.classes.logs
server = base.classes.server
local_server = orm.aliased(server, name='local_server')
remote_server = orm.aliased(server, name='remote_server')
query = (
    session
    .query(
        logs, local_server.class, remote_server.class)
    .outerjoin(
        local_server, logs.local_server_id == local_server.id
    )
    .outerjoin(
        remote_server, logs.remote_server_id == remote_server.id
    )
)
rows = query.all()
Exception:
File "ff.py", line 29
logs, local_server.class, remote_server.class)
^
SyntaxError: invalid syntax
How should I approach this problem?
Probably the easiest solution:
getattr(local_server, 'class')
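Applied to the query from the question, that would look roughly like this:
query = (
    session
    .query(
        logs,
        getattr(local_server, 'class'),
        getattr(remote_server, 'class'))
    .outerjoin(
        local_server, logs.local_server_id == local_server.id)
    .outerjoin(
        remote_server, logs.remote_server_id == remote_server.id)
)
rows = query.all()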
Alternatively, it should be possible to explicitly override the column:
https://docs.sqlalchemy.org/en/13/orm/extensions/automap.html#specifying-classes-explicitly
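A rough sketch of that approach (the column type is an assumption; automap fills in the remaining columns during prepare()):
from sqlalchemy import Column, String
from sqlalchemy.ext.automap import automap_base

base = automap_base()

class Server(base):
    __tablename__ = 'server'
    # expose the SQL column "class" under a Python-safe attribute name
    class_ = Column('class', String)

base.prepare(engine, reflect=True)
# queries can then use Server.class_ instead of getattr(server, 'class')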
Tried and tested both of them.
I am transferring some data from one DB to another DB using SQLAlchemy in Python. I want to make a direct and rapid transfer.
I don't know how to use the bulk_insert_mappings() function from SQLAlchemy. (Field-wise, both tables are identical.)
This is what I have tried so far.
from sqlalchemy import create_engine, Column, Integer, String, Date
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
engine_old = create_engine('mysql+pymysql://<id>:<pw>@database_old.amazonaws.com:3306/schema_name_old?charset=utf8')
engine_new = create_engine('mysql+pymysql://<id>:<pw>@database_new.amazonaws.com:3306/schema_name_new?charset=utf8')

data_old = engine_old.execute('SELECT * FROM table_old')

session = sessionmaker()
session.configure(bind=engine_new)
s = session()
How should I handle "s.bulk_insert_mappings(????, data_old)"?
Could anyone help me?
Thank you.
There are many ways to move data from one database to another. The specifics of the method depend on your individual needs and what you already have implemented. Assuming that both the old and the new database already have a schema, you would need two separate bases and engines. The mapping of an existing database's schema is achieved using automap_base(). Below is a short example of how this would look:
from sqlalchemy.orm import Session
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
old_base = automap_base()
old_engine = create_engine("<OLD_DB_URI>", echo=True)
old_base.prepare(old_engine, reflect=True)
TableOld = old_base.classes.table_old
old_session = Session(old_engine)
new_base = automap_base()
new_engine = create_engine("<NEW_DB_URI>", echo=True)
new_base.prepare(new_engine, reflect=True)
TableNew = new_base.classes.table_new
new_session = Session(new_engine)
# here you can write your queries
old_table_results = old_session.query(TableOld).all()
new_data = []
for result in old_table_results:
    new = TableNew()
    new.id = result.id
    new.name = result.name
    new_data.append(new)
new_session.bulk_save_objects(new_data)
new_session.commit()
Now, about your second question: here is a link to examples directly from SQLAlchemy's site: http://docs.sqlalchemy.org/en/latest/_modules/examples/performance/bulk_inserts.html. To answer your question, bulk_insert_mappings takes two parameters: a db model (TableNew or TableOld in the example above) and a list of dictionaries representing instances (i.e. rows) of that model.
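For completeness, a minimal sketch of the bulk_insert_mappings variant using the objects from the example above (each row becomes a plain dict of column values):
mappings = [
    {"id": result.id, "name": result.name}
    for result in old_table_results
]
new_session.bulk_insert_mappings(TableNew, mappings)
new_session.commit()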
I have been struggling with this for a while now and have not found an answer yet, or maybe I have already seen the answer and just didn't get it. However, I hope I am able to describe my problem.
I have an MS SQL database in which the tables are grouped in namespaces (or whatever it is called), denoted by Prefix.Tablename (with a dot). So a native SQL statement to request some content looks like this:
SELECT TOP 100
[Value], [ValueDate]
FROM [FinancialDataBase].[Reporting].[IndexedElements]
How do I map this in SQLAlchemy?
If the "Reporting" prefix were not there, the solution (or one way to do it) would look like this:
from sqlalchemy import *
from sqlalchemy.ext.declarative import declarative_base, declared_attr
from sqlalchemy.orm import sessionmaker
def get_session():
    from urllib.parse import quote_plus as urllib_quote_plus

    server = "FinancialDataBase.sql.local"
    connstr = "DRIVER={SQL Server};SERVER=%s;DATABASE=FinancialDataBase" % server
    params = urllib_quote_plus(connstr)
    base_url = "mssql+pyodbc:///?odbc_connect=%s" % params

    engine = create_engine(base_url, echo=True)
    Session = sessionmaker(bind=engine)
    session = Session()
    return engine, session

Base = declarative_base()

class IndexedElements(Base):
    __tablename__ = "IndexedElements"

    UniqueID = Column(String, primary_key=True)
    ValueDate = Column(DateTime)
    Value = Column(Float)
And then requests can be done and wrapped in a Pandas dataframe for example like this:
import pandas as pd
engine, session = get_session()
query = session.query(IndexedElements.Value,IndexedElements.ValueDate)
data = pd.read_sql(query.statement,query.session.bind)
But the SQL statement that is compiled and actually executed here includes this wrong FROM part:
FROM [FinancialDataBase].[IndexedElements]
Due to the namespace-prefix it would have to be
FROM [FinancialDataBase].[Reporting].[IndexedElements]
Simply expanding the table name to
__tablename__ = "Reporting.IndexedElements"
doesn't fix it, because it changes the compiled SQL statement to
FROM [FinancialDataBase].[Reporting.IndexedElements]
which doesn't work properly.
So how can this be solved?
The answer is given in the comment by Ilja above:
The "namespace" is a so called schema and has to be declarated in the mapped object. Given the example from the opening post, the mapped table has to be defined like this:
class IndexedElements(Base):
    __tablename__ = "IndexedElements"
    __table_args__ = {"schema": "Reporting"}

    UniqueID = Column(String, primary_key=True)
    ValueDate = Column(DateTime)
    Value = Column(Float)
Or define a base class containing this information for different schemas. Check also "Augmenting the Base" in the SQLAlchemy docs:
http://docs.sqlalchemy.org/en/latest/orm/extensions/declarative/mixins.html#augmenting-the-base
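A short sketch of that approach (assuming all mapped tables live in the Reporting schema):
from sqlalchemy import Column, String, DateTime, Float
from sqlalchemy.ext.declarative import declarative_base

class ReportingBase:
    # every mapped class deriving from Base inherits this schema setting
    __table_args__ = {"schema": "Reporting"}

Base = declarative_base(cls=ReportingBase)

class IndexedElements(Base):
    __tablename__ = "IndexedElements"
    UniqueID = Column(String, primary_key=True)
    ValueDate = Column(DateTime)
    Value = Column(Float)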
I'm trying to save an object into a MySQL table. I created a database with a table; in this table there's a text column.
My actual code is:
conn = MySQLdb.connect(host='localhost', user='root', passwd='password',db='database')
x = conn.cursor()
x.execute("""INSERT INTO table (title) VALUES (%s)""", (test,))
where test is the object I created by parsing JSON. After entering this command Python shows 1L, but when in SQL I do
select * from table;
nothing appears. What is wrong?
You need to commit the changes you make to the database. Use:
x.execute(...)
conn.commit()
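Put together with the code from the question, a minimal sketch looks like this (the connection parameters are the ones shown above):
import MySQLdb

conn = MySQLdb.connect(host='localhost', user='root', passwd='password', db='database')
x = conn.cursor()
x.execute("""INSERT INTO table (title) VALUES (%s)""", (test,))
conn.commit()  # without this, the INSERT is never made visible outside this connection
conn.close()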
I'd try one of two things. If I have to go with a full script like that, I don't bother using the conn... I'll do a subprocess call.
# farm_out_to_os.py
import subprocess

# build the SQL statement and hand it to the mysql client via its -e flag
cmd = """INSERT INTO table (title) VALUES ('{}')""".format(test)
subprocess.call('mysql -u{} -p{} -e "{}"'.format(uname, pwd, cmd), shell=True)
But if you want to do it more programmatically, maybe consider using a full ORM like SQLAlchemy
# models.py
import sqlalchemy as db
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class MyTable(Base):
    __tablename__ = 'mytable'

    id = db.Column(db.Integer, primary_key=True)
    val = db.Column(db.Integer)

    def __init__(self, val):
        self.val = val
And the code:
# code.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

import config  # assumed to provide SQLALCHEMY_URL
import models  # the models.py module above

engine = create_engine(config.SQLALCHEMY_URL)
Session = sessionmaker(bind=engine)
session = Session()

newval = models.MyTable(val=5)
session.add(newval)
session.commit()
session.close()
Depends on what you're trying to do :)