Continuing from my previous post, I'm trying to use bulk_save_objects for a list of objects (the objects don't have a PK value, so one should be created for each object). When I use bulk_save_objects I see one INSERT per object instead of a single INSERT for all of them.
The code:
class Product(Base):
    __tablename__ = 'products'

    id = Column('id', BIGINT, primary_key=True)
    barcode = Column('barcode', BIGINT)
    productName = Column('name', TEXT, nullable=False)
    objectHash = Column('objectHash', TEXT, unique=True, nullable=False)

    def __init__(self, productData, picture=None):
        self.barcode = productData[ProductTagsEnum.barcode.value]
        self.productName = productData[ProductTagsEnum.productName.value]
        self.objectHash = md5((str(self.barcode) + self.productName).encode('utf-8')).hexdigest()
Another class contains the following save method:
def saveNewProducts(self, products):
    Session = sessionmaker()
    session = Session()
    productsHashes = [product.objectHash for product in products]
    query = session.query(Product.objectHash).filter(Product.objectHash.in_(productsHashes))
    existedHashes = {h for (h,) in query.all()}
    newProducts = [product for product in products if product.objectHash not in existedHashes]
    # also tried: session.bulk_save_objects(newProducts, preserve_order=False)
    session.bulk_save_objects(newProducts)
UPDATE 1
Following what @Ilja Everilä recommended in the comments, I added a few parameters to the connection string:
engine = create_engine('postgresql://postgres:123@localhost:5432/mydb',
                       pool_size=25, max_overflow=0,
                       executemany_mode='values',
                       executemany_values_page_size=10000,
                       executemany_batch_page_size=500,
                       echo=True)
In the console I saw multiple inserts with the following format :
2019-09-16 16:48:46,509 INFO sqlalchemy.engine.base.Engine INSERT INTO products (barcode, productName, objectHash) VALUES (%(barcode)s, %(productName)s, %(objectHash)s) RETURNING products.id
2019-09-16 16:48:46,509 INFO sqlalchemy.engine.base.Engine {'barcode': '5008251', 'productName': 'ice cream', 'objectHash': 'b2752233ec523f2e874dc95b70020ae5'}
In my case, the solution I used: I deleted the id column and set objectHash as the PK; afterwards bulk_save_objects and add_all worked and actually did a bulk insert. It seems those functions batch the inserts only if the objects already carry their PK values.
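For reference, a minimal sketch of the reworked model (column definitions taken from the class above):
class Product(Base):
    __tablename__ = 'products'

    # objectHash is the primary key now, so SQLAlchemy no longer has to
    # fetch a server-generated id back for each row and can batch the INSERTs
    objectHash = Column('objectHash', TEXT, primary_key=True)
    barcode = Column('barcode', BIGINT)
    productName = Column('name', TEXT, nullable=False)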
I have to build a list of data from two tables: one regular table and one association object table. Currently I am doing something like this:
items = User.query.all()
users_table = UsersTable(items)
and then using this with flask_table module like this:
class UsersTable(Table):
    classes = ['table', 'table-striped']
    id = Col('#')
    name = Col('Name')
How do I write the query for the items object so that it also contains fields from AssocationClass? For example, AssocationClass has a field user_type, and I would like to be able to write:
class UsersTable(Table):
    classes = ['table', 'table-striped']
    id = Col('#')
    name = Col('Name')
    user_type = Col('Type')
and have it work.
P.S. I tried the following, but it doesn't work:
items=User.query.join(AssocationClass, (AssocationClass.id_user == User.id)).all()
If it were plain SQL, I would write something like this:
SELECT id, name, user_type FROM User JOIN AssocationClass ON AssocationClass.id_user = User.id;
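For what it's worth, a sketch of the kind of query I imagine (assuming AssocationClass has id_user and user_type columns):
items = (User.query
         .join(AssocationClass, AssocationClass.id_user == User.id)
         .with_entities(User.id, User.name, AssocationClass.user_type)
         .all())
users_table = UsersTable(items)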
I'm trying to adapt part of a MySQLdb application to SQLAlchemy with the declarative base. I'm only beginning with SQLAlchemy.
The legacy tables are defined something like:
student: id_number*, semester*, stateid, condition, ...
choice: id_number*, semester*, choice_id, school, program, ...
We have 3 tables for each of them (student_tmp, student_year, student_summer, choice_tmp, choice_year, choice_summer), so each (_tmp, _year, _summer) variant contains information for a specific moment.
A plain join of one pair looks like:
select *
from `student_tmp`
inner join `choice_tmp` using (`id_number`, `semester`)
My problem is that the information important to me is the equivalent of the following select:
SELECT t.*
FROM (
(
SELECT st.*, ct.*
FROM `student_tmp` AS st
INNER JOIN `choice_tmp` as ct USING (`id_number`, `semester`)
WHERE (ct.`choice_id` = IF(right(ct.`semester`, 1)='1', '3', '4'))
AND (st.`condition` = 'A')
) UNION (
SELECT sy.*, cy.*
FROM `student_year` AS sy
INNER JOIN `choice_year` as cy USING (`id_number`, `semester`)
WHERE (cy.`choice_id` = 4)
AND (sy.`condition` = 'A')
) UNION (
SELECT ss.*, cs.*
FROM `student_summer` AS ss
INNER JOIN `choice_summer` as cs USING (`id_number`, `semester`)
WHERE (cs.`choice_id` = 3)
AND (ss.`condition` = 'A')
)
) as t
* is used to shorten the select; I'm actually only querying about 7 of the 50 available columns.
This information is used in many flavors: "Do I have new students? Do I still have all the students from a given date? Which students subscribed after the given date?" etc. The result of this select statement is to be saved in another database.
Would it be possible to achieve this with a single view-like class? The information is read-only, so I don't need to be able to modify/create/delete. Or do I have to declare a class for each table (ending up with 6 classes) and, every time I query, remember to filter?
Thanks for any pointers.
EDIT: I don't have modification access to the database (I cannot create a view). Both databases may not be on the same server (so I cannot create a view on my second DB).
My concern is to avoid the full table scan before filtering on condition and choice_id.
EDIT 2: I've set up declarative classes like this:
class BaseStudent(object):
    id_number = sqlalchemy.Column(sqlalchemy.String(7), primary_key=True)
    semester = sqlalchemy.Column(sqlalchemy.String(5), primary_key=True)
    unique_id_number = sqlalchemy.Column(sqlalchemy.String(7))
    stateid = sqlalchemy.Column(sqlalchemy.String(12))
    condition = sqlalchemy.Column(sqlalchemy.String(3))

class Student(BaseStudent, Base):
    __tablename__ = 'student'

    choices = orm.relationship('Choice', backref='student')

#class StudentYear(BaseStudent, Base): ...
#class StudentSummer(BaseStudent, Base): ...

class BaseChoice(object):
    id_number = sqlalchemy.Column(sqlalchemy.String(7), primary_key=True)
    semester = sqlalchemy.Column(sqlalchemy.String(5), primary_key=True)
    choice_id = sqlalchemy.Column(sqlalchemy.String(1))
    school = sqlalchemy.Column(sqlalchemy.String(2))
    program = sqlalchemy.Column(sqlalchemy.String(5))

class Choice(BaseChoice, Base):
    __tablename__ = 'choice'
    __table_args__ = (
        sqlalchemy.ForeignKeyConstraint(
            ['id_number', 'semester'],
            [Student.id_number, Student.semester]),
    )

#class ChoiceYear(BaseChoice, Base): ...
#class ChoiceSummer(BaseChoice, Base): ...
Now, the query that gives me the correct SQL for one set of tables is:
q = session.query(StudentYear, ChoiceYear) \
    .select_from(StudentYear) \
    .join(ChoiceYear) \
    .filter(StudentYear.condition == 'A') \
    .filter(ChoiceYear.choice_id == '4')
but it throws an exception...
"Could not locate column in row for column '%s'" % key)
sqlalchemy.exc.NoSuchColumnError: "Could not locate column in row for column '*'"
How do I use that query to create myself a class I can use?
If you can create this view on the database, then you simply map the view as if it were a table. See Reflecting Views.
# DB VIEW
CREATE VIEW my_view AS -- #todo: your select statements here

# SA
my_view = Table('my_view', metadata, autoload=True)

# define view object
class ViewObject(object):
    def __repr__(self):
        return "ViewObject %s" % str((self.id_number, self.semester,))

# map the view to the object
view_mapper = mapper(ViewObject, my_view)

# query the view
q = session.query(ViewObject)
for obj in q:
    print(obj)
If you cannot create a VIEW on the database level, you could create a selectable and map the ViewObject to it. The code below should give you the idea:
student_tmp = Table('student_tmp', metadata, autoload=True)
choice_tmp = Table('choice_tmp', metadata, autoload=True)

# your SELECT part with the columns you need
qry = select([student_tmp.c.id_number, student_tmp.c.semester,
              student_tmp.c.stateid, choice_tmp.c.school])

# your INNER JOIN condition
qry = qry.where(student_tmp.c.id_number == choice_tmp.c.id_number)
qry = qry.where(student_tmp.c.semester == choice_tmp.c.semester)

# other WHERE clauses
qry = qry.where(student_tmp.c.condition == 'A')
You can create 3 queries like this, then combine them with union_all and use the resulting query in the mapper:
view_mapper = mapper(ViewObject, my_combined_qry)
In both cases you have to ensure that a primary key is properly defined on the view, and you might need to override the autoloaded view and specify the primary key explicitly (see the link above). Otherwise you will either receive an error or get improper results from the query.
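For illustration, a rough sketch of that combination, assuming qry_tmp, qry_year and qry_summer are the three filtered selectables built as above:
from sqlalchemy import union_all
from sqlalchemy.orm import mapper

my_combined_qry = union_all(qry_tmp, qry_year, qry_summer).alias('students_view')

# id_number + semester identify a row, so declare them as the
# composite primary key of the mapping
view_mapper = mapper(ViewObject, my_combined_qry,
                     primary_key=[my_combined_qry.c.id_number,
                                  my_combined_qry.c.semester])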
Answer to EDIT-2:
qry = (session.query(StudentYear, ChoiceYear).
       select_from(StudentYear).
       join(ChoiceYear).
       filter(StudentYear.condition == 'A').
       filter(ChoiceYear.choice_id == '4')
       )
The result will be tuple pairs: (StudentYear, ChoiceYear).
But if you want to create a new mapped class for the query, then you can create a selectable as in the sample above:
student_tmp = StudentTmp.__table__
choice_tmp = ChoiceTmp.__table__
.... (see sample code above)
This is to show what I ended up doing; any comments welcome.
class JoinedYear(Base):
    __table__ = sqlalchemy.select(
        [
            StudentYear.id_number,
            StudentYear.semester,
            StudentYear.stateid,
            ChoiceYear.school,
            ChoiceYear.program,
        ],
        from_obj=StudentYear.__table__.join(ChoiceYear.__table__),
    ) \
        .where(StudentYear.condition == 'A') \
        .where(ChoiceYear.choice_id == '4') \
        .alias('YearView')
and I will elaborate from there...
Thanks @van
I am using SQLAlchemy without the ORM, i.e. using hand-crafted SQL statements to directly interact with the backend database. I am using PG as my backend database (psycopg2 as DB driver) in this instance - I don't know if that affects the answer.
I have statements like this; for brevity, assume that conn is a valid connection to the database:
conn.execute("INSERT INTO user (name, country_id) VALUES ('Homer', 123)")
Assume also that the user table consists of the columns (id [SERIAL PRIMARY KEY], name, country_id)
How may I obtain the id of the new user, ideally, without hitting the database again?
You might be able to use the RETURNING clause of the INSERT statement, like this:
result = conn.execute("INSERT INTO user (name, country_id) "
                      "VALUES ('Homer', 123) RETURNING *")
If you only want the resulting id:
result = conn.execute("INSERT INTO user (name, country_id) "
                      "VALUES ('Homer', 123) RETURNING id")
[new_id] = result.fetchone()
Use lastrowid (note that lastrowid is only meaningful on dialects whose driver provides it, such as SQLite and MySQL; it does not give you the primary key on PostgreSQL):
result = conn.execute("INSERT INTO user (name, country_id) VALUES ('Homer', 123)")
result.lastrowid
Current SQLAlchemy documentation suggests
result.inserted_primary_key should work!
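For example (a sketch using the Core insert() construct, since inserted_primary_key is only populated for single-row Core inserts; the table reflection assumes an engine is at hand):
from sqlalchemy import Table, MetaData

metadata = MetaData()
user = Table('user', metadata, autoload=True, autoload_with=engine)

result = conn.execute(user.insert().values(name='Homer', country_id=123))
new_id = result.inserted_primary_key[0]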
Python + SQLAlchemy
After the commit, the autoincremented primary_key column is populated on your object:
db.session.add(new_usr)
db.session.commit()  # will insert the new_usr data into the database AND retrieve its id
idd = new_usr.usrID  # usrID is the autoincremented primary_key column
return jsonify(idd), 201  # usrID = 12, the correct id from the User table in the database
This question has been asked many times on Stack Overflow, and no answer I have seen is comprehensive. Googling 'sqlalchemy insert get id of new row' brings up a lot of them.
There are three levels to SQLAlchemy.
Top: the ORM.
Middle: Database abstraction (DBA) with Table classes etc.
Bottom: SQL using the text function.
To an OO programmer the ORM level looks natural, but to a database programmer it looks ugly and the ORM gets in the way. The DBA layer is an OK compromise. The SQL layer looks natural to database programmers and would look alien to an OO-only programmer.
Each level has its own syntax, similar but different enough to be frustrating. On top of this, there is almost too much documentation online, making it very hard to find the answer.
I will describe how to get the inserted id AT THE SQL LAYER for the RDBMS I use.
Table: User(user_id integer primary autoincrement key, user_name string)
conn: a Connection obtained through SQLAlchemy to the DBMS you are using.
SQLite
======
insstmt = text(
'''INSERT INTO user (user_name)
VALUES (:usernm) ''' )
# Execute within a transaction (optional)
txn = conn.begin()
result = conn.execute(insstmt, usernm='Jane Doe')
# The id!
recid = result.lastrowid
txn.commit()
MS SQL Server
=============
insstmt = text(
'''INSERT INTO user (user_name)
OUTPUT inserted.user_id
VALUES (:usernm) ''' )
txn = conn.begin()
result = conn.execute(insstmt, usernm='Jane Doe')
# The id!
recid = result.fetchone()[0]
txn.commit()
MariaDB/MySQL
=============
insstmt = text(
'''INSERT INTO user (user_name)
VALUES (:usernm) ''' )
txn = conn.begin()
result = conn.execute(insstmt, usernm='Jane Doe')
# The id!
recid = conn.execute(text('SELECT LAST_INSERT_ID()')).fetchone()[0]
txn.commit()
Postgres
========
insstmt = text(
'''INSERT INTO user (user_name)
VALUES (:usernm)
RETURNING user_id ''' )
txn = conn.begin()
result = conn.execute(insstmt, usernm='Jane Doe')
# The id!
recid = result.fetchone()[0]
txn.commit()
result.inserted_primary_key
Worked for me. The only thing to note is that it returns a list containing the last inserted id.
Make sure you use fetchrow/fetch to receive the returned row:
insert_stmt = user.insert().values(name="homer", country_id="123").returning(user.c.id)
row_id = await conn.fetchrow(insert_stmt)
For Postgres inserts from Python code, it is simple to use the RETURNING keyword with col_id (the name of the column whose last inserted value you want) at the end of the INSERT statement.
Syntax:
INSERT INTO emp_table (col_id, Name, Age)
VALUES (3, 'xyz', 30) RETURNING col_id;
Example (if the col_id column is auto-incremented):
from sqlalchemy import create_engine

conn_string = "postgresql://USERNAME:PSWD@HOSTNAME/DATABASE_NAME"
db = create_engine(conn_string)
conn = db.connect()

insert_sql = """INSERT INTO emp_table (Name, Age)
                VALUES ('xyz', 30) RETURNING col_id;"""
result = conn.execute(insert_sql)
[last_row_id] = result.fetchone()
print(last_row_id)
# output = 3
I would like to store Python objects into a SQLite database. Is that possible?
If so what would be some links / examples for it?
You can't store the object itself in the DB. What you do is to store the data from the object and reconstruct it later.
A good way is to use the excellent SQLAlchemy library. It lets you map your defined class to a table in the database. Every mapped attribute will be stored, and can be used to reconstruct the object. Querying the database returns instances of your class.
With it you can use not only SQLite, but most databases: it currently also supports Postgres, MySQL, Oracle, MS-SQL, Firebird, MaxDB, MS Access, Sybase, Informix and IBM DB2. And you can let your user choose which one to use, since you can basically switch between those databases without changing any code.
There are also a lot of cool features, like automatic JOINs and polymorphism.
A quick, simple example you can run:
from random import choice
from string import ascii_letters

from sqlalchemy import Column, Integer, Unicode, UnicodeText, String
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

engine = create_engine('sqlite:////tmp/teste.db', echo=True)
Base = declarative_base(bind=engine)

class User(Base):
    __tablename__ = 'users'

    id = Column(Integer, primary_key=True)
    name = Column(Unicode(40))
    address = Column(UnicodeText, nullable=True)
    password = Column(String(20))

    def __init__(self, name, address=None, password=None):
        self.name = name
        self.address = address
        if password is None:
            # generate a random password if none was given
            password = ''.join(choice(ascii_letters) for n in range(10))
        self.password = password

Base.metadata.create_all()
Session = sessionmaker(bind=engine)
s = Session()
Then I can use it like this:
# create instances of my user object
u = User('nosklo')
u.address = '66 Some Street #500'
u2 = User('lakshmipathi')
u2.password = 'ihtapimhskal'
# testing
s.add_all([u, u2])
s.commit()
That would run INSERT statements against the database.
# When you query the data back it returns instances of your class:
for user in s.query(User):
    print(type(user), user.name, user.password)
That query would run SELECT users.id AS users_id, users.name AS users_name, users.address AS users_address, users.password AS users_password.
The printed result would be:
<class '__main__.User'> nosklo aBPDXlTPJs
<class '__main__.User'> lakshmipathi ihtapimhskal
So you're effectively storing your object in the database, in the best way possible.
Yes, it's possible, but there are different approaches, and which one is suitable will depend on your requirements.
Pickling
You can use the pickle module to serialize objects, then store these objects in a BLOB in sqlite3 (or in a text field, if the dump is e.g. base64 encoded). Be aware of some possible problems: https://stackoverflow.com/questions/198692/can-i-pickle-a-python-dictionary-into-a-sqlite3-text-field
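A minimal sketch of that approach (table and column names made up for illustration):
import pickle
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE objects (id INTEGER PRIMARY KEY, data BLOB)')

obj = {'name': 'Homer', 'country_id': 123}
# pickle.dumps returns bytes, which sqlite3 stores as a BLOB
con.execute('INSERT INTO objects (data) VALUES (?)', (pickle.dumps(obj),))

blob = con.execute('SELECT data FROM objects').fetchone()[0]
restored = pickle.loads(blob)  # == obj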
Object-Relational-Mapping
You can use object-relational mapping. This creates, in effect, a "virtual object database" that can be used from within the programming language (Wikipedia). For Python, there is a nice toolkit for that: SQLAlchemy.
You can use pickle.dumps; it returns the pickled object as a string, so you don't need to write it to a temporary file. From the docs:
"Return the pickled representation of the object as a string, instead of writing it to a file."
import pickle

class Foo:
    attr = 'a class attr'

picklestring = pickle.dumps(Foo)
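To get the object back, pass the stored string to pickle.loads. Note that pickling a class like this stores a reference to it, so the class must be importable when unpickling:
restored = pickle.loads(picklestring)
assert restored is Foo  # the same class object, resolved by reference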
SQLite 3's adapters and converters
I'm surprised that no one has read the docs for the sqlite3 library, because they say you can do this by creating an adapter and a converter. For example, let's say we have a class called Point, we want to store it, and we want a Point back when selecting it and using the database cursor's fetchone method. Let's make the module aware that what you select from the database is a point:
from sqlite3 import (connect, register_adapter, register_converter,
                     PARSE_DECLTYPES, PARSE_COLNAMES)

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return "(%f;%f)" % (self.x, self.y)

def adapt_point(point):
    return ("%f;%f" % (point.x, point.y)).encode('ascii')

def convert_point(s):
    x, y = list(map(float, s.split(b";")))
    return Point(x, y)

# Register the adapter
register_adapter(Point, adapt_point)

# Register the converter
register_converter("point", convert_point)

p = Point(4.0, -3.2)

# 1) Using declared types
con = connect(":memory:", detect_types=PARSE_DECLTYPES)
con.execute("create table test(p point)")
con.execute("insert into test(p) values (?)", (p,))
cur = con.execute("select p from test")
print("with declared types:", cur.fetchone()[0])
con.close()

# 2) Using column names
con = connect(":memory:", detect_types=PARSE_COLNAMES)
con.execute("create table test(p)")
con.execute("insert into test(p) values (?)", (p,))
cur = con.execute('select p as "p [point]" from test')
print("with column names:", cur.fetchone()[0])
con.close()
You can use pickle to serialize the object; the serialized object can then be inserted into the SQLite DB as a binary field.
with open('object.dump', 'wb') as f:
    pickle.dump(obj, f)
Now read object.dump back from the file and write it to the SQLite DB. You might want to write it as a binary data type; read about storing binary data and BLOBs in SQLite here. Note that, according to this source, SQLite limits the size of such a data field to 1 MB.
I think a better option would be to serialize your object into a file and keep the file name, not its contents, in the database.
Your other choice, instead of pickling, is to use an ORM, which lets you map rows in a database to an object. See http://wiki.python.org/moin/HigherLevelDatabaseProgramming for a starting point. I'd recommend SQLAlchemy or SQLObject.
There is a relatively simple way to store and compare objects, even to index those objects the right way and to restrict (with unique) columns containing them. And all of that without using ORM engines. Objects must be stored using a pickle dump (so performance might be an issue). Here is an example for storing Python tuples, indexing, restricting and comparing them. This method can easily be applied to any other Python class. Everything needed is explained in the Python sqlite3 documentation (somebody already posted the link). Anyway, here it is all put together in the following example:
import sqlite3
import pickle

def adapt_tuple(tup):
    return pickle.dumps(tup)

# cannot use pickle.dumps directly because of its argument signature
sqlite3.register_adapter(tuple, adapt_tuple)
sqlite3.register_converter("tuple", pickle.loads)

def py_cmp(a, b):
    # Python 3 has no builtin cmp(); emulate it
    return (a > b) - (a < b)

def collate_tuple(string1, string2):
    return py_cmp(pickle.loads(string1), pickle.loads(string2))

# 1) Using declared types
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.create_collation("cmptuple", collate_tuple)

cur = con.cursor()
cur.execute("create table test(p tuple unique collate cmptuple)")
cur.execute("create index tuple_collated_index on test(p collate cmptuple)")

######################### Test ########################
cur.execute("select name, type from sqlite_master")  # where type = 'table'
print(cur.fetchall())

p = (1, 2, 3)
p1 = (1, 2)

# stick to tuples of the same shape: tuples of mixed depth, e.g. ((9, 5), 33),
# are not comparable to flat int tuples in Python 3 and would break the collation
cur.execute("insert into test(p) values (?)", (p,))
cur.execute("insert into test(p) values (?)", (p1,))
cur.execute("insert into test(p) values (?)", ((10, 1),))
cur.execute("insert into test(p) values (?)", ((9, 33),))

try:
    # violates the unique constraint, since (9, 33) is already stored
    cur.execute("insert into test(p) values (?)", ((9, 33),))
except Exception as e:
    print(e)

cur.execute("select p from test order by p")
print("\nwith declared types and default collate on column:")
for row in cur:
    print(row)

cur.execute("select p from test order by p collate cmptuple")
print("\nwith declared types collate:")
for row in cur:
    print(row)

con.create_function('pycmp', 2, py_cmp)

print("\nselect greater than using cmp function:")
cur.execute("select p from test where pycmp(p,?) >= 0", ((10,),))
for row in cur:
    print(row)

cur.execute("select p from test where pycmp(p,?) >= 0", ((3,),))
for row in cur:
    print(row)

print("\nselect greater than using collate:")
cur.execute("select p from test where p > ?", ((10,),))
for row in cur:
    print(row)

cur.execute("explain query plan select p from test where p > ?", ((3,),))
for row in cur:
    print(row)

cur.close()
con.close()
Depending on your exact needs, it could be worth looking into Django (www.djangoproject.com) for this task. Django is actually a web framework, but one of the tasks it handles is to allow you to define Models as python objects (inheriting from a base class provided by the framework). It will then automatically create the database tables required to store those objects, and sqlite is among the supported backends. It also provides handy functions to query the database and return one or more matching objects. See for example the documentation about Models in django:
http://docs.djangoproject.com/en/1.9/topics/db/models/
The drawback is, of course, that you have to install a full web framework, and (as far as I remember) you can only store objects whose attribute types are supported by Django. Also, it's made for storing many instances of predefined objects, not one instance each of many different objects. Depending on your needs, this may or may not be practical.
One option is to use an O/R mapper like SQLObject. It will do most of the plumbing to persist the Python object to a database, and it supports SQLite. As mentioned elsewhere, you can also serialize the object using a method such as pickle, which dumps out a representation that can be reconstructed later by reading it back in and parsing it.
As others have mentioned, the answer is yes... but the object needs to be serialized first. I'm the author of a package called klepto that is built to seamlessly store python objects in SQL databases, HDF archives, and other types of key-value stores.
It provides a simple dictionary interface, like this:
>>> from klepto.archives import sqltable_archive as sql_archive
>>> d = sql_archive(cached=False)
>>> d['a'] = 1
>>> d['b'] = '1'
>>> d['c'] = min
>>> squared = lambda x:x*x
>>> d['d'] = squared
>>> class Foo(object):
... def __init__(self, x):
... self.x = x
... def __call__(self):
... return squared(self.x)
...
>>> f = Foo(2)
>>> d['e'] = Foo
>>> d['f'] = f
>>>
>>> d
sqltable_archive('sqlite:///:memory:?table=memo' {'a': 1, 'b': '1', 'c': <built-in function min>, 'd': <function <lambda> at 0x10f631268>, 'e': <class '__main__.Foo'>, 'f': <__main__.Foo object at 0x10f63d908>}, cached=False)
>>>
>>> # min(squared(2), 1)
>>> d['c'](d['f'](), d['a'])
1
>>>
The cached keyword in the archive constructor signifies whether you want to use a local memory cache with the archive set as the cache backend (cached=True), or just use the archive directly (cached=False). Under the covers it can use pickle, json, dill, or other serializers to serialize the objects. Looking at the archive's internals, you can see it's leveraging SQLAlchemy:
>>> d._engine
Engine(sqlite://)
>>> d.__state__
{'serialized': True, 'root': 'sqlite:///:memory:', 'id': Table('memo', MetaData(bind=None), Column('Kkey', String(length=255), table=<memo>, primary_key=True, nullable=False), Column('Kval', PickleType(), table=<memo>), schema=None), 'protocol': 3, 'config': {}}