Get query result as a tuple to make substitution - python

I build dynamic queries, so I don't know the table name and table fields in advance. I do this in order to programmatically export data from one arbitrary table to another. My algorithm gets the source table name and the destination table name as parameters, and reads the field mapping between the two tables from a system table. I have almost done it. I built a select query for the source table, so I can do
cursor.execute(selectquery)
for row in cursor:
    ...  # do something with each row
I also built a template insert query for the destination table, which looks like
insert into sourcetable (attr1,attr2,attr3) values (%s,%s,%s) # let me call it template_query
Now I want to substitute those %s, %s, %s with the values returned by the select query. Something like this (which does not work, but demonstrates what I want):
cursor.execute(selectquery)
for row in cursor:
    final_query = template_query % row  # <- I want this substitution
    cursor2.execute(final_query)
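Aside: most DB-API drivers can do this substitution safely themselves if the row is passed as the second argument to execute(); a minimal sketch of that approach, assuming a driver that uses %s placeholders (e.g. psycopg2 or MySQLdb) and that cursor2 belongs to the destination connection:
cursor.execute(selectquery)
for row in cursor:
    # the driver binds each value and handles quoting/escaping itself
    cursor2.execute(template_query, tuple(row))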

I use something similar. What you need to do is to decorate/wrap the row with a __getitem__ and then use %(colname)s rather than %s in the template's values.
class Wrapper(object):
    def __init__(self, o):
        self.o = o

    def __getitem__(self, key):
        try:
            return getattr(self.o, key)
        except AttributeError:
            raise KeyError(key)
Then, using the django shell (my model has one column, tagtype)
python manage.py shell
>>> from wrapper import Wrapper
>>> from pssystem.models import MetaTag
>>> o = MetaTag.objects.all()[0]
>>> w = Wrapper(o)
>>> "insert into sourcetable (attr1,attr2,attr3) values ('%(tagtype)s','%(tagtype)s, '%(tagtype)s)" % w
u"insert into sourcetable (attr1,attr2,attr3) values ('PROFILE','PROFILE, 'PROFILE)"
You can get fancier than that (and you definitely should if the source object contains untrusted, user-entered content), but this works fine.
Notice that you need to add quotes around the substitutions if those are character variables. Dates might be fun too!
Hmmm, sorry, just noticed that your source rows are coming from a straight select rather than a fetch from a Django model. The Django tag confused me -- there is very little Django in your question. Well then, it still works, but you first need to do something with the cursor's result rows.
Something like this does the trick:
def fmtRow(cursor, row):
    di = dict()
    for i, col in enumerate(cursor.description):
        di[col[0]] = row[i]  # col[0] is the column name in the cursor description
    return di
and then you can dispense with the Wrapper, because your row has already been converted to a dictionary.
This is a very naive implementation, not suitable for high volumes, but it works.
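Putting the two pieces together, a minimal sketch (it assumes template_query uses %(colname)s placeholders matching the source column names, and a driver such as psycopg2 or MySQLdb that accepts a mapping as the second argument to execute()):
cursor.execute(selectquery)
for row in cursor:
    values = fmtRow(cursor, row)
    # letting the driver bind the mapping avoids manual quoting of strings and dates
    cursor2.execute(template_query, values)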

You can use kwargs to build queryset filters dynamically.
kwargs = {'name': "Jenny", 'color': "Blue"}
print(People.objects.filter(**kwargs))
I'm not sure this helps with the dynamically named table though. Maybe something like this would help: http://dynamic-models.readthedocs.org/en/latest/ (it's where that kwarg example came from).

Related

Inserting data from a CSV file to postgres using SQL

I'm struggling with this Python issue as I'm new to the language and don't have significant experience with it. I currently have a CSV file containing around 20 headers and about as many rows, so listing each one out, as some examples do, is what I'm trying to avoid:
https://www.dataquest.io/blog/loading-data-into-postgres/
My code consists of the following so far:
with open('dummy-data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)
    for row in reader:
        cur.execute('INSERT INTO messages VALUES', (row))
I'm getting a syntax error at the end of the input, so I assume it is linked to the way my execute call is written, but I still don't know what to do to address the issue. Any help?
P.S. I understand that people use %s for that, but if that is the case, can it be avoided? I don't want to duplicate it 20 times on one line.
Basically, you DO have to specify at least the required placeholders - and preferably the field names too - in your query.
If it's a one-shot affair and you know which fields are in the CSV and in which order, then you simply hardcode them in the query, i.e.
SQL = "insert into tablename(field1, field2, field21) values(%s, %s, %s)"
Ok, for 20 or so fields it gets quite boring, so you can also use a list of field names to generate the fieldnames part and the placeholders:
fields = ["field1", "field2", "field21"]
placeholders = ["%s"] * len(fields) # list multiplication, yes
SQL = "insert into tablename({}) values({})".format(", ".join(fields), ", ".join(placeholders))
If by chance the CSV header row contains the exact field names, you can also just use this row as the value for fields - but then you have to trust the CSV.
NB: specifying the fields list in the query is not strictly required, but it can protect you from possible issues with a malformed CSV. Actually, unless you really trust the source (your CSV), you should actively validate the incoming data before sending it to the database.
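Putting it together, a rough sketch (it assumes the CSV header row holds the exact column names of the messages table, and that cur and conn are an open psycopg2 cursor and connection):
import csv

with open('dummy-data.csv', 'r') as f:
    reader = csv.reader(f)
    fields = next(reader)  # header row used as the column list -- only do this if you trust the file
    placeholders = ", ".join(["%s"] * len(fields))
    sql = "insert into messages({}) values({})".format(", ".join(fields), placeholders)
    for row in reader:
        cur.execute(sql, row)  # the db-api driver handles quoting of the values
conn.commit()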
NB2:
%s is for strings I know but would it work the same for timestamps?
In this case, "%s" is not used as a Python string format specifier but as a plain database query placeholder. The choice of the string format specifier here is really unfortunate as it creates a lot of confusion. Note that this is DB vendor specific though, some vendors use "?" instead which is much clearer IMHO (and you want to check your own db-api connector's doc for the correct plaeholder to use BTW).
And since it's not a string format specifier, it will work for any type and doesn't need to be quoted for strings, it's the db-api module's job to do proper formatting (including quoting etc) according to the db column's type.
While we're at it, by all means, NEVER directly use Python string formatting operations when passing values to your queries - unless you want your database to be open-bar for script-kiddies of course.
The problem lies in the insert itself:
cur.execute('INSERT INTO messages VALUES', (row))
The problem is that, since you are not defining any placeholders in the query, it interprets that you literally want to execute INSERT INTO messages VALUES with no parameters, which causes a syntax error; passing the row as a single parameter won't work either, since it will be understood as one parameter instead of one per column.
If you want to create parameters in a more dynamic way, you could try to construct the query string dynamically.
Please, take a look the documentation: http://initd.org/psycopg/docs/cursor.html#cursor.execute
You can use string multiplication to build the placeholders.
import csv
import psycopg2

conn = psycopg2.connect('postgresql://db_user:db_user_password@server_name:port/db_name')
cur = conn.cursor()
multiple_placeholders = ','.join(['%s'] * 20)
with open('dummy-data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)
    for row in reader:
        cur.execute('INSERT INTO public.messages VALUES (' + multiple_placeholders + ')', row)
conn.commit()
If you want to have a single placeholder that covers a whole list of values, you can use a different method, located in "extras", which covers that usage:
psycopg2.extras.execute_values(cur, 'INSERT INTO messages VALUES %s', (row,))
This method can take many rows at a time (which is good for performance), which is why you need to wrap your single row in (...,).
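Since execute_values accepts many rows at once, a sketch that loads the whole file in a single call (assuming the same public.messages table and an open psycopg2 cursor cur and connection conn):
import csv
import psycopg2.extras

with open('dummy-data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    rows = [tuple(r) for r in reader]  # execute_values expects a sequence of row tuples

psycopg2.extras.execute_values(cur, 'INSERT INTO public.messages VALUES %s', rows)
conn.commit()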
The last time I was struggling to insert CSV data into Postgres, I used pgAdmin and it worked. I don't know whether this answer is a solution, but it's an easy way to get along.
You can use the cursor and executemany so that you can skip the iteration, but it's slower than the string-joining parameterized approach.
import pandas as pd

df = pd.read_csv('dummy-data.csv')
df.columns = [<define the headers here>]  # You can skip this line if the headers match the column names
try:
    # Note: the prepare()/:1 bind style below is driver-specific (e.g. cx_Oracle);
    # psycopg2 expects %s placeholders passed directly to executemany()
    cursor.prepare("insert into public.messages(<Column Names>) values(:1, :2, :3, :4, :5)")
    cursor.executemany(None, df.values.tolist())
    conn.commit()
except Exception:
    conn.rollback()
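If you are on psycopg2 (as the question suggests), a roughly equivalent sketch without prepare(); the five %s placeholders are illustrative and must match the column count of public.messages:
import pandas as pd

df = pd.read_csv('dummy-data.csv')
rows = [tuple(r) for r in df.values.tolist()]
try:
    cursor.executemany("insert into public.messages values (%s, %s, %s, %s, %s)", rows)
    conn.commit()
except Exception:
    conn.rollback()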

How to create a SQL query with optional parameters in Python Flask?

I use a HTML form to filter the data from a SQL table. The form has multiple optional fields. How can I create a SQL query based on the filled fields in the form?
For example, if the users fill in 3 fields "name", "amount" and "itemtype", the query is like:
rows = cursor.execute("""SELECT * FROM items WHERE name = ? AND amount=? AND itemtype = ? """, name, amount, itemtype).fetchall()
If they skip "amount", the query is like:
rows = cursor.execute("""SELECT * FROM items WHERE name = ? AND itemtype = ? """, name, itemtype).fetchall()
I prefer to use Python's format string function for this. It's splitting hairs, but format allows you to set names in the string, so it's technically more explicit. However, I would suggest using **kwargs instead of *args, so you don't have to rely on magic.
UPDATE 2019
This was a terrible answer. You never, ever, EVER want to take user-generated data and interpolate it directly into a SQL query. It is imperative that you always sanitize user input in order to protect against SQL injection. Python has defined a database API specification that any database package that is not an ORM like SQLAlchemy should implement. Long story short, you should NEVER, EVER, EVER use str.format(), %, or "f-strings" to interpolate data into your SQL queries.
The database API specification provides a way to safely interpolate data into queries. Every Python database interface should have a Cursor class that is returned from a Connection object. The Cursor class will implement a method named execute. This obviously will execute a query, but it also takes a second argument, usually called args. According to the specification:
Parameters may be provided as sequence or mapping and will be bound to variables in the operation. Variables are specified in a database-specific notation (see the module's paramstyle attribute for details).
By "sequence", it means that args can be a list or tuple, and by "mapping", it means that args can also be a dict. Depending on the package, the way to specify where your data should be interpolated may differ. There are six options for this. Which formatting the package you're using can be found in the paramstyle constant of the package. For instance, PyMySQL(and most implementations of the spec that I've used) uses format and pyformat. A simple example would be:
format
cursor.execute('SELECT * FROM t WHERE a = %s AND b = %s;', (1, 'baz'))
pyformat
cursor.execute('SELECT * FROM t WHERE a = %(foo)s AND b = %(bar)s;', {'foo': 1, 'bar': 'baz'})
Both of these would execute as:
SELECT * FROM t WHERE a = 1 AND b = 'baz';
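To see which placeholder style a given driver expects, check its module-level paramstyle constant; a quick check with sqlite3, which ships with Python:
import sqlite3

print(sqlite3.paramstyle)  # 'qmark' -> sqlite3 wants ? placeholders
# psycopg2, for comparison, reports 'pyformat' (%s / %(name)s placeholders)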
You should make sure to explore the documentation of the database API package you're using. One extremely helpful thing I came across using psycopg2, a PostgreSQL package, was its extras module. For instance, a common problem when trying to insert data securely is encountered when inserting multiple rows at once. psycopg2 has a clever solution to this problem in its execute_values function. Using execute_values, this code:
execute_values(cursor, "INSERT INTO t (a, b) VALUES %s;", ((1, 'foo'), (2, 'baz')))
... is executed as:
"INSERT INTO t (a, b) VALUES (1, 'foo'), (2, 'baz');"
I don't think the current answer actually addresses "what if I have parameters that are optional?", so I have a similar scenario that I'm solving like this -
def some_func(a, b=None, c=None):
    where_clause = f"id = {str(a)}"
    if b:
        where_clause += f" and lower(b) = lower('{b}')"
    if c:
        where_clause += f" and c = {str(c)}"
    query = f"select * from table where {where_clause}"
Not the most scalable solution but it works if you only have a few optional parameters. You could refactor the clause-builder into its own function to build the string and accept parameters for any transforms that need to be applied (lower, etc).
I also assume there are some ORM's with functionality that solves this but while working on a small app this is sufficient for me.
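Given the injection warning earlier in this thread, a hedged sketch of the same idea that keeps the values out of the SQL string by collecting placeholders and parameters side by side (it assumes a ?-style DB-API driver, e.g. pyodbc or sqlite3, and that cursor comes from your connection):
def build_query(name=None, amount=None, itemtype=None):
    clauses, params = [], []
    if name is not None:
        clauses.append("name = ?")
        params.append(name)
    if amount is not None:
        clauses.append("amount = ?")
        params.append(amount)
    if itemtype is not None:
        clauses.append("itemtype = ?")
        params.append(itemtype)
    where = " AND ".join(clauses) if clauses else "1=1"
    return f"SELECT * FROM items WHERE {where}", params

sql, params = build_query(name="Jenny", itemtype="toy")
rows = cursor.execute(sql, params).fetchall()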

Python SQL Alchemy how to query by excluding selected columns

I basically just need to know how to query by excluding selected columns. Is this possible?
Example: I have a table which has id, name, age, address, location, birth, sex... etc.
Instead of citing out the columns to retrieve, I'd like to just exclude some columns in the query (exclude age, for example).
Sample code:
db.session.query(User.username).filter_by(username = request.form['username'], password = request.form['password']).first()
The last thing I want to do is list all the attributes in the query() method, since this would be pretty long, especially when you have lots of attributes; I just want to exclude some columns.
Not sure why you're not just fetching the model. When doing that, you can defer loading of certain columns so that they are only queried on access.
db.session.query(User).options(db.defer('location')).filter_by(...).first()
In this example, accessing User.location the first time on an instance will issue another query to get the data.
See the documentation on column deferral: http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/mapper_config.html?highlight=defer#column-deferral-api
Note that unless you're loading huge amounts of data, you won't see any speedup with this. It might actually make things slower since another query will be issued later. I have queries that load thousands of rows with eager-loaded relationships in less than 200ms, so this might be a case of premature optimization.
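If several columns should be skipped, the same pattern extends by passing several defer options (a sketch assuming the Flask-SQLAlchemy db object from the question; the column names are illustrative):
user = (
    db.session.query(User)
    .options(db.defer('location'), db.defer('address'))
    .filter_by(username=request.form['username'])
    .first()
)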
We can use the Inspection API to get the model's columns, and then create a list of columns that we want.
import sqlalchemy as sa

exclude = {'age', 'registration_date'}
insp = sa.inspect(User)
include = [c for c in insp.columns if c.name not in exclude]

# Traditional ORM style
with Session() as s:
    q = s.query(*include)
    for row in q:
        print(row.id, row.name)
    print()

# 1.4 style
with Session() as s:
    q = sa.select(*include)
    for row in s.execute(q):
        print(row.id, row.name)
    print()
inspect returns the mapper for the model class; to work with non-column attributes like relationships use one of the mapper's other attributes, such as all_orm_descriptors.
If you're using an object serializer like marshmallow, it is easier to omit the unwanted fields during serialization.
https://marshmallow.readthedocs.io/en/latest/api_reference.html#marshmallow.EXCLUDE
The fields to be omitted can be formed dynamically and conditionally excluded. Example:
ModelSchema(exclude=(field1, field2,)).jsonify(records)
I am not aware of a method that does that directly, but you can always get the column keys, exclude your columns, and then query with the resulting list. You don't need to see what is in the list while doing that.
q = db.session.query(blah blah...)
exclude = ['age']
targ_cols = [x for x in q.first().keys() if x not in exclude]
q.with_entities(*targ_cols).all()

Changing where clause without generating subquery in SQLAlchemy

I'm trying to build a relatively complex query and would like to manipulate the where clause of the result directly, without cloning/subquerying the returned query. An example would look like:
session = sessionmaker(bind=engine)()

def generate_complex_query():
    return select(
        columns=[location.c.id.label('id')],
        from_obj=location,
        whereclause=location.c.id > 50
    ).alias('a')

query = generate_complex_query()
# based on this query, I'd like to add additional where conditions, ideally like:
# `query.where(query.c.id < 100)`
# but without subquerying the original query
# this is what I found so far, which is quite verbose and doesn't solve the subquery problem
query = select(
    columns=[query.c.id],
    from_obj=query,
    whereclause=query.c.id < 100
)
# Another option I was considering was to map the query to a class:
# class Location(object): pass
# mapper(Location, query)
# session.query(Location).filter(Location.id < 100)
# which looks more elegant, but also creates a subquery
result = session.execute(query)
for r in result:
    print(r)
This is the generated query:
SELECT a.id
FROM (SELECT location.id AS id
FROM location
WHERE location.id > %(id_1)s) AS a
WHERE a.id < %(id_2)s
I would like to obtain:
SELECT location.id AS id
FROM location
WHERE id > %(id_1)s and
id < %(id_2)s
Is there any way to achieve this? The reason for this is that I think query (2) is slightly faster (not much), and the mapper example (2nd example above) which I have in place messes up the labels (id becomes anon_1_id or a.id if I name the alias).
Why don't you do it like this:
query = generate_complex_query()
query = query.where(location.c.id < 100)
Essentially you can refine any query like this. Additionally, I suggest reading the SQL Expression Language Tutorial which is pretty awesome and introduces all the techniques you need. The way you build a select is only one way. Usually, I build my queries more like this: select(column).where(expression).where(next_expression) and so on. The FROM is usually automatically inferred by SQLAlchemy from the context, i.e. you rarely need to specify it.
Since you don't have access to the internals of generate_complex_query try this:
query = query.where(query.c.id < 100)
This should work in your case I presume.
Another idea:
query = query.where(text("id < 100"))
This uses SQLAlchemy's text expression. This could work for you, however, and this is important: if you want to introduce variables, read the description of the API linked above, because just using format strings instead of bound parameters will open you up to SQL injection, something that normally is a no-brainer with SQLAlchemy but must be taken care of when working with such literal expressions.
Also note that this works because you label the column as id. If you don't do that and don't know the column name, then this won't work either.
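If the comparison value comes from user input, the text() construct should carry a bound parameter rather than an interpolated literal; a minimal sketch, assuming query is a selectable with a where() method as in the examples above:
from sqlalchemy import text

max_id = 100  # e.g. taken from user input
query = query.where(text("id < :max_id").bindparams(max_id=max_id))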

How can I reference columns by their names in python calling SQLite?

I have some code which I've been using to query MySQL, and I'm hoping to use it with SQLite. My real hope is that this will not involve making too many changes to the code. Unfortunately, the following code doesn't work with SQLite:
cursor.execute(query)
rows = cursor.fetchall()
data = []
for row in rows:
    data.append(row["column_name"])
This gives the following error:
TypeError: tuple indices must be integers
Whereas if I change the reference to use a column number, it works fine:
data.append(row[1])
Can I execute the query in such a way that I can reference columns by their names?
In the five years since the question was asked and then answered, a very simple solution has arisen. Any new code can simply wrap the connection object with a row factory. Code example:
import sqlite3
conn = sqlite3.connect('./someFile')
conn.row_factory = sqlite3.Row  # Here's the magic!
cursor = conn.execute("SELECT name, age FROM someTable")
for row in cursor:
    print(row['name'])
Here are some fine docs. Enjoy!
To access columns by name, use the row_factory attribute of the Connection instance. It lets you set a function that takes the arguments cursor and row, and returns whatever you'd like. There are a few built in to pysqlite, namely sqlite3.Row, which does what you've asked.
This can be done by adding a single line after the "connect" statement:
conn.row_factory = sqlite3.Row
Check the documentation here:
http://docs.python.org/library/sqlite3.html#accessing-columns-by-name-instead-of-by-index
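For reference, a tiny sketch of a hand-rolled dictionary row factory, in case sqlite3.Row isn't enough (this mirrors the dict_factory example in the sqlite3 documentation):
import sqlite3

def dict_factory(cursor, row):
    # build a {column_name: value} dict for each fetched row
    return {desc[0]: value for desc, value in zip(cursor.description, row)}

conn = sqlite3.connect('./someFile')
conn.row_factory = dict_factory
for row in conn.execute("SELECT name, age FROM someTable"):
    print(row['name'], row['age'])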
I'm not sure if this is the best approach, but here's what I typically do to retrieve a record set using a DB-API 2 compliant module:
cursor.execute("""SELECT foo, bar, baz, quux FROM table WHERE id = %s;""",
(interesting_record_id,))
for foo, bar, baz, quux in cursor.fetchall():
frobnicate(foo + bar, baz * quux)
The query formatting method is one of the DB-API standards, but happens to be the preferred method for Psycopg2; other DB-API adapters might suggest a different convention which will be fine.
Writing queries like this, where implicit tuple unpacking is used to work with the result set, has typically been more effective for me than trying to worry about matching Python variable names to SQL column names (which I usually only use to drop prefixes, and then only if I'm working with a subset of the column names such that the prefixes don't help to clarify things anymore), and is much better than remembering numerical column IDs.
This style also helps you avoid SELECT * FROM table..., which is just a maintenance disaster for anything but the simplest tables and queries.
So, not exactly the answer you were asking for, but possibly enlightening nonetheless.
The SQLite API supports cursor.description, so you can easily do it like this:
headers = {}
data = []
for record in cursor.fetchall():
    if not headers:
        headers = dict((desc[0], idx) for idx, desc in enumerate(cursor.description))
    data.append(record[headers['column_name']])
A little long winded but gets the job done. I noticed they even have it in the factory.py file under dict_factory.
kushal's answer to this forum works fine:
Use a DictCursor:
import MySQLdb.cursors
# ...
cursor = db.cursor(MySQLdb.cursors.DictCursor)
cursor.execute(query)
rows = cursor.fetchall()
for row in rows:
    print(row['employee_id'])
Please take note that the column name is case sensitive.
Use the cursor description, like so:
rows = c.fetchall()
for row in rows:
    for col_i, col in enumerate(row):
        print("Attribute: {0:30} Value: {1}".format(c.description[col_i][0], col))
