Accented column names, introspection and mapping

I would like to access a legacy MSSQL database using SQLAlchemy. With basic schema inspection I could already list the columns of the tables I'm interested in. Unfortunately, these column names sometimes contain accented letters (e.g. "Magánszemély", "LevelezésiCímIrányítószám").
My only requirement is to be able to query this DB.
I listed the column names of the table using the following functions:
def inspect_komplex_table():
    table = Table('D_Allomanylista_Komplex_V', meta, autoload=True, autoload_with=engine)
    return table

def get_columns():
    keys = inspect_komplex_table().columns.keys()
    keys.sort()
    txt = '\n'.join(keys)
    open('column_names.txt', 'w').write(txt.encode('utf8'))
This gives a (long) list in the column_names.txt file with lines:
...
JutÉrvKezd
KEZDET
KTVSZAM
KamaraiTagszám
Képviselők
Lejarat
LevelezésiCímIrányítószám
LevelezésiCímUtca
LevelezésiCímVáros
...
I've tried to create a basic mapping without the accented columns first
class BiztositasokModel(object):
    def __init__(self, UgyfKod):
        self.ugyfelkod = UgyfKod
where UgyfKod is one of the columns from the introspection, but
mapper(BiztositasokModel, inspect_komplex_table())
fails with UnicodeEncodeError
Could someone give me an idea on how to handle such a DB?

I've managed to find two answers, one using the declarative syntax, the other using the classical mapping.
Both implement a way to change the default naming scheme of the mapping columns to Python object properties.
The classical mapping answer is: SQLAlchemy mapping table with non-ascii columns to class
The answer using a declarative syntax: http://docs.sqlalchemy.org/en/latest/orm/mapper_config.html#automating-column-naming-schemes-from-reflected-tables

Related

SQLAlchemy: how to refer to filtered fields

I am doing a relatively simple ETL project using SQLAlchemy.
There is a large existing PostgreSQL database with multiple 'schemas' (in the PostgreSQL sub-database sense), one of which is new and the project is to convert the data from schema 'old' to schema 'new'.
I have one set of two 'old' source tables that I have to join together to get the new table ... I can't see how to refer to the fields in the joined/filtered superset of the two tables. For example, if I just loop over one table:
allp = session.query(Permit).all()
for p in allp:
    print p.permit_id
... works as expected.
But if I set up a filter to combine the two tables:
prmp = session.query(Permit,Permit_master).filter(Permit_master.id == Permit.mast_id).all()
for p in prmp:
    print p.permit_id
gives
'result' object has no attribute 'permit_id'
This must be something simple, but I've tried inspecting the object with dir() to no avail.
Help please ...

The results of your query are keyed 2-tuples of Permit and Permit_master. You can access the result entities using either their position or key:
for p in prmp:
    print p.Permit.permit_id
    # or
    print p[0].permit_id

How do I handle database columns with reserved characters in SQLAlchemy ORM?

I'm somewhat new to SQLAlchemy ORM, and I'm trying to select and then store data from a column within a view that has a forward slash in the name of the column.
The databases are mapped using the following:
source_engine = create_engine("...")
base = automap_base()
base.prepare(source_engine, reflect=True)
metadata = MetaData(self.engine)
table_1 = Table("table_1", self.metadata, autoload=True)
The second destination table is mapped the same way.
Then, I connect to this database, and I'm trying to select information from columns to copy into a different database:
source_table_session = Session(source_engine)
dest_table_session = Session(dest_engine)
table_1_data = table_1_session.query(table_1)
for instance in table_1_data:
    newrow = dest_table.base.classes.dest_table()
    newrow.Column1 = instance.Column1 # This works fine, column has normal name
But then, the problem is that there's a column in the view with the name "Slot/Port"
With a direct query, you can do:
select "Slot/Port" from source_database;
But in ORM, you can't just type:
newrow.Slot/Port = instance.Slot/Port
or
newrow.'Slot/Port' = instance.'Slot/Port'
That isn't going to be correct, and the following doesn't work either:
newrow.SlotPort = instance.SlotPort
AttributeError: 'result' object has no attribute 'SlotPort'
I have no control over how columns are named in the source database.
I find the SQLAlchemy documentation to be generally fragmented (only showing small snippets of code) and confusing, so I'm not sure whether this kind of thing is addressed. Is there a way to get around this limitation? Are the columns perhaps already mapped to a valid name without a slash, or is there a way to map them like that?

Thanks to @DeepSpace for helping me find the answer.
Instead of
newrow.whatever = instance.whatever
I needed:
setattr(newrow, 'Slot/Port', getattr(instance, 'Slot/Port'))
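The reason this works can be shown without a database at all. Below is a minimal sketch in plain Python; `Row` is a hypothetical stand-in for a mapped object, not an SQLAlchemy class. Attribute names are just strings in the instance dictionary, so a name that is not a valid identifier can still be read and written through getattr/setattr.

```python
class Row:
    """Hypothetical stand-in for a mapped row object (not an SQLAlchemy class)."""


source = Row()
# "Slot/Port" cannot be written as source.Slot/Port, but setattr accepts any string.
setattr(source, "Slot/Port", "1/3")

dest = Row()
# Copy the value across using the same string name on both sides.
setattr(dest, "Slot/Port", getattr(source, "Slot/Port"))

value = getattr(dest, "Slot/Port")
print(value)
```

Whether a reflected SQLAlchemy class really exposes the column under the key 'Slot/Port' depends on how the mapping was set up, so treat the attribute name as something to check on your own model.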

column names and types for insert operation in sqlalchemy

I am building a sqlite browser in Python/sqlalchemy.
Here is my requirement.
I want to do insert operation on the table.
I need to pass a table name to a function. It should return all columns along with the respective types.
Can anyone tell me how to do this in sqlalchemy ?

You can access all columns of a Table like this:
my_table.c
Which returns a type that behaves similar to a dictionary, i.e. it has values method and so on:
columns = [(item.name, item.type) for item in my_table.c.values()]
You can play around with that to see what you can get from that. Using the declarative extension you can access the table through the class' __table__ attribute. Furthermore, you might find the Runtime Inspection API helpful.

Proper use of MySQL full text search with SQLAlchemy

I would like to be able to full text search across several text fields of one of my SQLAlchemy mapped objects. I would also like my mapped object to support foreign keys and transactions.
I plan to use MySQL to run the full text search. However, I understand that MySQL can only run full text search on a MyISAM table, which does not support transactions and foreign keys.
In order to accomplish my objective I plan to create two tables. My code will look something like this:
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    description = Column(Text)

users_myisam = Table('users_myisam', Base.metadata,
                     Column('id', Integer),
                     Column('name', String(50)),
                     Column('description', Text),
                     mysql_engine='MyISAM')

conn = Base.metadata.bind.connect()
conn.execute("CREATE FULLTEXT INDEX idx_users_ftxt \
    on users_myisam (name, description)")
Then, to search I will run this:
q = 'monkey'
ft_search = users_myisam.select("MATCH (name,description) AGAINST ('%s')" % q)
result = ft_search.execute()
for row in result: print row
This seems to work, but I have a few questions:
Is my approach of creating two tables to solve my problem reasonable? Is there a standard/better/cleaner way to do this?
Is there a SQLAlchemy way to create the fulltext index, or am I best to just directly execute "CREATE FULLTEXT INDEX ..." as I did above?
Looks like I have a SQL injection problem in my search/match against query. How can I do the select the "SQLAlchemy way" to fix this?
Is there a clean way to join the users_myisam select/match against right back to my user table and return actual User instances, since this is what I really want?
In order to keep my users_myisam table in sync with my mapped object user table, does it make sense for me to use a MapperExtension on my User class, and set the before_insert, before_update, and before_delete methods to update the users_myisam table appropriately, or is there some better way to accomplish this?
Thanks,
Michael

Is my approach of creating two tables to solve my problem reasonable?
Is there a standard/better/cleaner way to do this?
I've not seen this use case attempted before, as developers who value transactions and constraints tend to use Postgresql in the first place. I understand that may not be possible in your specific scenario.
Is there a SQLAlchemy way to create the fulltext index, or am I best
to just directly execute "CREATE FULLTEXT INDEX ..." as I did above?
conn.execute() is fine though if you want something slightly more integrated you can use the DDL() construct, read through http://docs.sqlalchemy.org/en/rel_0_8/core/schema.html?highlight=ddl#customizing-ddl for details
Looks like I have a SQL injection problem in my search/match against query. How can I do the
select the "SQLAlchemy way" to fix this?
Note: this recipe is only for MATCH against multiple columns simultaneously. If you have just one column, the match() operator is simpler.
Most basically, you could use the text() construct:
from sqlalchemy import text, bindparam

users_myisam.select(
    text("MATCH (name,description) AGAINST (:value)",
         bindparams=[bindparam('value', q)])
)
more comprehensively you could define a custom construct:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import ClauseElement
from sqlalchemy import literal

class Match(ClauseElement):
    def __init__(self, columns, value):
        self.columns = columns
        self.value = literal(value)

@compiles(Match)
def _match(element, compiler, **kw):
    return "MATCH (%s) AGAINST (%s)" % (
        ", ".join(compiler.process(c, **kw) for c in element.columns),
        compiler.process(element.value)
    )

my_table.select(Match([my_table.c.a, my_table.c.b], "some value"))
docs:
http://docs.sqlalchemy.org/en/rel_0_8/core/compiler.html
Is there a clean way to join the users_myisam select/match against right back
to my user table and return actual User instances, since this is what I really want?
you should probably create a UserMyISAM class, map it just like User, then use relationship() to link the two classes together, then simple operations like this are possible:
query(User).join(User.search_table).\
    filter(Match([UserSearch.x, UserSearch.y], "some value"))
In order to keep my users_myisam table in sync with my mapped object
user table, does it make sense for me to use a MapperExtension on my
User class, and set the before_insert, before_update, and
before_delete methods to update the users_myisam table appropriately,
or is there some better way to accomplish this?
MapperExtensions are deprecated, so you'd at least use the event API, and in most cases we want to try applying object mutations outside of the flush process. In this case, I'd be using the constructor for User, or alternatively the init event, as well as a basic @validates decorator which will receive values for the target attributes on User and copy those values into User.search_table.
Overall, if you've been learning SQLAlchemy from another source (like the O'Reilly book), it's out of date by many years, and I'd be focusing on the current online documentation.

How do you escape strings for SQLite table/column names in Python?

The standard approach for using variable values in SQLite queries is the "question mark style", like this:
import sqlite3

with sqlite3.connect(":memory:") as connection:
    connection.execute("CREATE TABLE foo(bar)")
    connection.execute("INSERT INTO foo(bar) VALUES (?)", ("cow",))
    print(list(connection.execute("SELECT * from foo")))
    # prints [(u'cow',)]
However, this only works for substituting values into queries. It fails when used for table or column names:
import sqlite3

with sqlite3.connect(":memory:") as connection:
    connection.execute("CREATE TABLE foo(?)", ("bar",))
    # raises sqlite3.OperationalError: near "?": syntax error
Neither the sqlite3 module nor PEP 249 mention a function for escaping names or values. Presumably this is to discourage users from assembling their queries with strings, but it leaves me at a loss.
What function or technique is most appropriate for using variable names for columns or tables in SQLite? I would strongly prefer to be able to do this without any other dependencies, since I'll be using it in my own wrapper.
I looked for but couldn't find a clear and complete description of the relevant part of SQLite's syntax, to use to write my own function. I want to be sure this will work for any identifier permitted by SQLite, so a trial-and-error solution is too uncertain for me.
SQLite uses " to quote identifiers but I'm not sure that just escaping them is sufficient. PHP's sqlite_escape_string function's documentation suggests that certain binary data may need to be escaped as well, but that may be a quirk of the PHP library.

To convert any string into a SQLite identifier:
Ensure the string can be encoded as UTF-8.
Ensure the string does not include any NUL characters.
Replace all " with "".
Wrap the entire thing in double quotes.
Implementation
import codecs

def quote_identifier(s, errors="strict"):
    encodable = s.encode("utf-8", errors).decode("utf-8")

    nul_index = encodable.find("\x00")

    if nul_index >= 0:
        error = UnicodeEncodeError("NUL-terminated utf-8", encodable,
                                   nul_index, nul_index + 1, "NUL not allowed")
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace("\x00", replacement)

    return "\"" + encodable.replace("\"", "\"\"") + "\""
Given a single string argument, it will escape and quote it correctly or raise an exception. The second argument can be used to specify any error handler registered in the codecs module. The built-in ones are:
'strict': raise an exception in case of an encoding error
'replace': replace malformed data with a suitable replacement marker, such as '?' or '\ufffd'
'ignore': ignore malformed data and continue without further notice
'xmlcharrefreplace': replace with the appropriate XML character reference (for encoding only)
'backslashreplace': replace with backslashed escape sequences (for encoding only)
This doesn't check for reserved identifiers, so if you try to create a new SQLITE_MASTER table it won't stop you.
Example Usage
import sqlite3

def test_identifier(identifier):
    "Tests an identifier to ensure it's handled properly."

    with sqlite3.connect(":memory:") as c:
        c.execute("CREATE TABLE " + quote_identifier(identifier) + " (foo)")
        assert identifier == c.execute("SELECT name FROM SQLITE_MASTER").fetchone()[0]

test_identifier("'Héllo?'\\\n\r\t\"Hello!\" -☃") # works
test_identifier("北方话") # works
test_identifier(chr(0x20000)) # works

print(quote_identifier("Fo\x00o!", "replace")) # prints "Fo?o!"
print(quote_identifier("Fo\x00o!", "ignore")) # prints "Foo!"
print(quote_identifier("Fo\x00o!")) # raises UnicodeEncodeError
print(quote_identifier(chr(0xD800))) # raises UnicodeEncodeError
Observations and References
SQLite identifiers are TEXT, not binary.
SQLITE_MASTER schema in the FAQ
Python 2 SQLite API yelled at me when I gave it bytes it couldn't decode as text.
Python 3 SQLite API requires queries be strs, not bytes.
SQLite identifiers are quoted using double-quotes.
SQL as Understood by SQLite
Double-quotes in SQLite identifiers are escaped as two double quotes.
SQLite identifiers preserve case, but they are case-insensitive towards ASCII letters. It is possible to enable unicode-aware case-insensitivity.
SQLite FAQ Question #18
SQLite does not support the NUL character in strings or identifiers.
SQLite Ticket 57c971fc74
sqlite3 can handle any other unicode string as long as it can be properly encoded to UTF-8. Invalid strings could cause crashes between Python 3.0 and Python 3.1.2 or thereabouts. Python 2 accepted these invalid strings, but this is considered a bug.
Python Issue #12569
Modules/_sqlite/cursor.c
I tested it a bunch.

The psycopg2 documentation explicitly recommends using normal python % or {} formatting to substitute in table and column names (or other bits of dynamic syntax), and then using the parameter mechanism to substitute values into the query.
I disagree with everyone who is saying "don't ever use dynamic table/column names, you're doing something wrong if you need to". I write programs to automate stuff with databases every day, and I do it all the time. We have lots of databases with lots of tables, but they are all built on repeated patterns, so generic code to handle them is extremely useful. Hand-writing the queries every time would be far more error prone and dangerous.
It comes down to what "safe" means. The conventional wisdom is that using normal python string manipulation to put values into your queries is not "safe". This is because there are all sorts of things that can go wrong if you do that, and such data very often comes from the user and is not in your control. You need a 100% reliable way of escaping these values properly so that a user cannot inject SQL in a data value and have the database execute it. So the library writers do this job; you never should.
If, however, you're writing generic helper code to operate on things in databases, then these considerations don't apply as much. You are implicitly giving anyone who can call such code access to everything in the database; that's the point of the helper code. So now the safety concern is making sure that user-generated data can never be used in such code. This is a general security issue in coding, and is just the same problem as blindly execing a user-input string. It's a distinct issue from inserting values into your queries, because there you want to be able to safely handle user-input data.
So my recommendation is: do whatever you want to dynamically assemble your queries. Use normal python string templating to sub in table and column names, glue on where clauses and joins, all the good (and horrible to debug) stuff. But make sure you're aware that whatever values such code touches have to come from you, not your users[1]. Then you use SQLite's parameter substitution functionality to safely insert user-input values into your queries as values.
[1] If (as is the case for a lot of the code I write) your users are the people who have full access to databases anyway and the code is to simplify their work, then this consideration doesn't really apply; you probably are assembling queries on user-specified tables. But you should still use SQLite's parameter substitution to save yourself from the inevitable genuine value that eventually contains quotes or percent signs.
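A minimal sketch of that split, using only the standard sqlite3 module. The helper names `quote_name` and `insert_value` and the set `ALLOWED_TABLES` are made up for illustration: identifiers come from the programmer and go in via string formatting (quoted, and checked against a known list as an extra safety net), while the value goes through the `?` placeholder.

```python
import sqlite3

# Hypothetical list of table names that the programmer, not the user, controls.
ALLOWED_TABLES = {"foo", "audit log"}


def quote_name(name):
    # SQLite identifier quoting: double embedded double quotes, then wrap in quotes.
    return '"' + name.replace('"', '""') + '"'


def insert_value(connection, table, column, value):
    # Structure (table/column) is templated in; the column is quoted but not
    # whitelisted in this sketch, so it must also come from trusted code.
    if table not in ALLOWED_TABLES:
        raise ValueError("unexpected table name: %r" % (table,))
    sql = "INSERT INTO {} ({}) VALUES (?)".format(quote_name(table), quote_name(column))
    # Data (value) is passed as a parameter and never touches the SQL text.
    connection.execute(sql, (value,))


connection = sqlite3.connect(":memory:")
connection.execute('CREATE TABLE "audit log" ("note" TEXT)')
insert_value(connection, "audit log", "note", 'it\'s 100% "quoted"')
rows = connection.execute('SELECT "note" FROM "audit log"').fetchall()
print(rows)
```

The value containing quotes and a percent sign is stored unchanged, which is exactly the case the footnote warns about.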

If you're quite certain that you need to specify column names dynamically, you should use a library that can do so safely (and complains about things that are wrong). SQLAlchemy is very good at that.
>>> import sqlalchemy
>>> from sqlalchemy import *
>>> metadata = MetaData()
>>> dynamic_column = "cow"
>>> foo_table = Table('foo', metadata,
... Column(dynamic_column, Integer))
>>>
foo_table now represents the table with the dynamic schema, but you can only use it in the context of an actual database connection (so that sqlalchemy knows the dialect, and what to do with the generated sql).
>>> metadata.bind = create_engine('sqlite:///:memory:', echo=True)
You can then issue the CREATE TABLE statement. With echo=True, sqlalchemy will log the generated sql, but in general, sqlalchemy goes out of its way to keep the generated sql out of your hands (lest you consider using it for evil purposes).
>>> foo_table.create()
2011-06-28 21:54:54,040 INFO sqlalchemy.engine.base.Engine.0x...2f4c
CREATE TABLE foo (
cow INTEGER
)
2011-06-28 21:54:54,040 INFO sqlalchemy.engine.base.Engine.0x...2f4c ()
2011-06-28 21:54:54,041 INFO sqlalchemy.engine.base.Engine.0x...2f4c COMMIT
>>>
and yes, sqlalchemy will take care of any column names that need special handling, like when the column name is a sql reserved word
>>> dynamic_column = "order"
>>> metadata = MetaData()
>>> foo_table = Table('foo', metadata,
... Column(dynamic_column, Integer))
>>> metadata.bind = create_engine('sqlite:///:memory:', echo=True)
>>> foo_table.create()
2011-06-28 22:00:56,267 INFO sqlalchemy.engine.base.Engine.0x...aa8c
CREATE TABLE foo (
"order" INTEGER
)
2011-06-28 22:00:56,267 INFO sqlalchemy.engine.base.Engine.0x...aa8c ()
2011-06-28 22:00:56,268 INFO sqlalchemy.engine.base.Engine.0x...aa8c COMMIT
>>>
and can save you from possible badness:
>>> dynamic_column = "); drop table users; -- the evil bobby tables!"
>>> metadata = MetaData()
>>> foo_table = Table('foo', metadata,
... Column(dynamic_column, Integer))
>>> metadata.bind = create_engine('sqlite:///:memory:', echo=True)
>>> foo_table.create()
2011-06-28 22:04:22,051 INFO sqlalchemy.engine.base.Engine.0x...05ec
CREATE TABLE foo (
"); drop table users; -- the evil bobby tables!" INTEGER
)
2011-06-28 22:04:22,051 INFO sqlalchemy.engine.base.Engine.0x...05ec ()
2011-06-28 22:04:22,051 INFO sqlalchemy.engine.base.Engine.0x...05ec COMMIT
>>>
(apparently, some strange things are perfectly legal identifiers in sqlite)

The first thing to understand is that table/column names cannot be escaped in the same sense that you can escape strings stored as database values.
The reason is that you either have to:
accept/reject the potential table/column name, i.e. it is not guaranteed that a string is an acceptable column/table name, unlike a string to be stored in some database; or,
sanitize the string, which will have the same effect as creating a digest: the function used is surjective, not bijective (once again, the opposite holds for a string that is to be stored in some database); so not only can't you be certain of going from the sanitized name back to the original name, but you also risk unintentionally trying to create two columns or tables with the same name.
Having understood that, the second thing to understand is that how you will end up "escaping" table/column names depends on your specific context, and so there is more than one way to do this, but whatever the way, you'll need to dig up to figure out exactly what is or is not an acceptable column/table name in sqlite.
To get you started, here is one condition:
Table names that begin with "sqlite_" are reserved for internal use. It is an error to attempt to create a table with a name that starts with "sqlite_".
Even better, using certain column names can have unintended side effects:
Every row of every SQLite table has a 64-bit signed integer key that
uniquely identifies the row within its table. This integer is usually
called the "rowid". The rowid value can be accessed using one of the
special case-independent names "rowid", "oid", or "_rowid_" in place
of a column name. If a table contains a user defined column named
"rowid", "oid" or "_rowid_", then that name always refers the
explicitly declared column and cannot be used to retrieve the integer
rowid value.
Both quoted texts are from http://www.sqlite.org/lang_createtable.html
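Both rules are easy to check with the standard sqlite3 module. This is only a sketch; the table names `sqlite_mine` and `t` are made up for illustration.

```python
import sqlite3

connection = sqlite3.connect(":memory:")

# Rule 1: a table name starting with "sqlite_" is rejected by SQLite itself.
try:
    connection.execute("CREATE TABLE sqlite_mine (x)")
    reserved_rejected = False
except sqlite3.OperationalError:
    reserved_rejected = True

# Rule 2: a user-defined column called "rowid" takes over that name,
# while another alias such as "_rowid_" still reaches the integer key.
connection.execute('CREATE TABLE t ("rowid" TEXT)')
connection.execute("INSERT INTO t VALUES ('hello')")
shadowed, real = connection.execute("SELECT rowid, _rowid_ FROM t").fetchone()

print(reserved_rejected, repr(shadowed), repr(real))
```

So even a name that quotes correctly can change how the table behaves, which is why quoting alone does not settle whether a name is acceptable.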

From the sqlite faq, question 24 (the formulation of the question of course does not give a clue that the answer may be useful to your question):
SQL uses double-quotes around identifiers (column or table names) that contains special characters or which are keywords. So double-quotes are a way of escaping identifier names.
If the name itself contains double quotes, escape that double quote with another one.

Placeholders are only for values. Column and table names are structural, and are akin to variable names; you can't use placeholders to fill them in.
You have three options:
Appropriately escape/quote the column name everywhere you use it. This is fragile and dangerous.
Use an ORM like SQLAlchemy, which will take care of escaping/quoting for you.
Ideally, just don't have dynamic column names. Tables and columns are for structure; anything dynamic is data and should be in the table rather than part of it.

I did some research because I was unsatisfied with the current unsafe answers, and I would recommend using SQLite's built-in printf function for this. Its %w substitution is made to escape any identifier (table name, column name...) and make it safe for concatenation.
In Python, it should be something like this (I'm not a Python user, so there may be mistakes, but the logic itself works):
table = "bar"
escaped_table = connection.execute("SELECT printf('%w', ?)", (table,)).fetchone()[0]
connection.execute("CREATE TABLE \""+escaped_table+"\" (bar TEXT)")
According to the documentation of %w:
This substitution works like %q except that it doubles all double-quote characters (") instead of single-quotes, making the result suitable for using with a double-quoted identifier name in an SQL statement.
The %w substitution is an SQLite enhancements, not found in most other printf() implementations.
Which means you can alternatively do the same with single quotes using %q:
table = "bar"
escaped_table = connection.execute("SELECT printf('%q', ?)", (table,)).fetchone()[0]
connection.execute("CREATE TABLE '"+escaped_table+"' (bar TEXT)")

If you find that you need a variable entity name (either relvar or field), then you are probably doing something wrong. An alternative pattern would be to use a property map, something like:
CREATE TABLE foo_properties(
    id INTEGER NOT NULL,
    name VARCHAR NOT NULL,
    value VARCHAR,
    PRIMARY KEY(id, name)
);
Then, you just specify the name dynamically when doing an insert instead of a column.
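As a rough sketch of how that looks from Python with the standard sqlite3 module (the helper `set_property` is a made-up name, and INSERT OR REPLACE is SQLite-specific): the "dynamic column" becomes ordinary data, so plain placeholders are enough and no identifier quoting is needed.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE foo_properties(
        id INTEGER NOT NULL,
        name VARCHAR NOT NULL,
        value VARCHAR,
        PRIMARY KEY(id, name)
    )
""")


def set_property(connection, row_id, name, value):
    # The property name is a value here, so it is passed as a parameter.
    connection.execute(
        "INSERT OR REPLACE INTO foo_properties (id, name, value) VALUES (?, ?, ?)",
        (row_id, name, value),
    )


# Even a name that would be awkward as a column is harmless as data.
set_property(connection, 1, 'we"ird; name', "moo")
set_property(connection, 1, "colour", "brown")

# Each result row is a (name, value) pair, so dict() builds the property map.
properties = dict(connection.execute(
    "SELECT name, value FROM foo_properties WHERE id = ?", (1,)))
print(properties)
```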
