How can I reference columns by their names in Python calling SQLite?

I have some code which I've been using to query MySQL, and I'm hoping to use it with SQLite. My real hope is that this will not involve making too many changes to the code. Unfortunately, the following code doesn't work with SQLite:
cursor.execute(query)
rows = cursor.fetchall()
data = []
for row in rows:
    data.append(row["column_name"])
This gives the following error:
TypeError: tuple indices must be integers
Whereas if I change the reference to use a column number, it works fine:
data.append(row[1])
Can I execute the query in such a way that I can reference columns by their names?

In the five years since the question was asked and then answered, a very simple solution has arisen. Any new code can simply wrap the connection object with a row factory. Code example:
import sqlite3
conn = sqlite3.connect('./someFile')
conn.row_factory = sqlite3.Row  # Here's the magic!
cursor = conn.execute("SELECT name, age FROM someTable")
for row in cursor:
    print(row['name'])
Here are some fine docs. Enjoy!

To access columns by name, use the row_factory attribute of the Connection instance. It lets you set a function that takes the arguments cursor and row and returns whatever you'd like. There are a few built into pysqlite, notably sqlite3.Row, which does what you've asked.
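For illustration, a minimal sketch of such a factory that returns plain dicts (essentially the dict_factory example from the sqlite3 docs):

import sqlite3

def dict_factory(cursor, row):
    # cursor.description holds one 7-tuple per column; item 0 is the column name.
    return {desc[0]: value for desc, value in zip(cursor.description, row)}

conn = sqlite3.connect(':memory:')
conn.row_factory = dict_factory
conn.execute("CREATE TABLE someTable (name TEXT, age INTEGER)")
conn.execute("INSERT INTO someTable VALUES ('Alice', 30)")
for row in conn.execute("SELECT name, age FROM someTable"):
    print(row['name'], row['age'])  # plain dict access by column name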

This can be done by adding a single line after the "connect" statement:
conn.row_factory = sqlite3.Row
Check the documentation here:
http://docs.python.org/library/sqlite3.html#accessing-columns-by-name-instead-of-by-index

I'm not sure if this is the best approach, but here's what I typically do to retrieve a record set using a DB-API 2 compliant module:
cursor.execute("""SELECT foo, bar, baz, quux FROM table WHERE id = %s;""",
(interesting_record_id,))
for foo, bar, baz, quux in cursor.fetchall():
frobnicate(foo + bar, baz * quux)
The %s parameter style is one of the paramstyles the DB-API allows; it happens to be the one Psycopg2 prefers, and other DB-API adapters may suggest a different convention, which is fine.
Writing queries like this, using implicit tuple unpacking to work with the result set, has typically been more effective for me than trying to match Python variable names to SQL column names (which I usually rename only to drop prefixes, and then only when I'm working with a subset of the columns such that the prefixes no longer clarify anything), and it is much better than remembering numeric column indexes.
This style also helps you avoid SELECT * FROM table..., which is just a maintenance disaster for anything but the simplest tables and queries.
So, not exactly the answer you were asking for, but possibly enlightening nonetheless.

The SQLite API supports cursor.description, so you can easily do it like this:
headers = {}
data = []
for record in cursor.fetchall():
    if not headers:
        headers = dict((desc[0], idx) for idx, desc in enumerate(cursor.description))
    data.append(record[headers['column_name']])
A little long-winded, but it gets the job done. I noticed they even have it in the factory.py file, under dict_factory.

kushal's answer on this forum works fine:
Use a DictCursor:
import MySQLdb.cursors
...
cursor = db.cursor(MySQLdb.cursors.DictCursor)
cursor.execute(query)
rows = cursor.fetchall()
for row in rows:
    print(row['employee_id'])
Please take note that the column name is case sensitive.

Use the cursor description, like so:
rows = c.fetchall()
for row in rows:
    for col_i, col in enumerate(row):
        print("Attribute: {0:30} Value: {1}".format(c.description[col_i][0], col))

Related

How to insert user variable into an SQL Update/Select statement using python [duplicate]

This question already has answers here:
How to use variables in SQL statement in Python?
(5 answers)
def update_inv_quant():
    new_quant = int(input("Enter the updated quantity in stock: "))
Hello! I'm wondering how to insert a user variable into an SQL statement so that a record is updated to said variable. Also, it'd be really helpful if you could help me figure out how to print records of the database to the actual Python console. Thank you!
I tried doing something like ("INSERT INTO Inv(ItemName) Value {user_iname)") but I'm not surprised it didn't work.
It would have been more helpful if you specified an actual database.
First method (Bad)
The usual way (which is highly discouraged, as Graybeard said in the comments) is using Python's f-strings. You can google what they are and how to use them more in-depth.
But basically, say you have two variables user_id = 1 and user_name = 'fish'. An f-string turns something like f"INSERT INTO mytable(id, name) values({user_id},'{user_name}')" into the string INSERT INTO mytable(id, name) values(1,'fish').
As we mentioned before, this opens the door to something called SQL injection. There are many good YouTube videos that demonstrate what that is and why it's dangerous.
Second method
The second method is dependent on what database you are using. For example, in Psycopg2 (the driver for the PostgreSQL database), the cursor.execute method uses the following syntax to pass variables: cur.execute('SELECT id FROM users WHERE cookie_id = %s', (cookieid,)). Notice that the variables are passed in a tuple as the second argument.
All databases use similar methods, with minor differences. For example, I believe SQLite3 uses ? instead of psycopg2's %s. That's why I said that specifying the actual database would have been more helpful.
Fetching records
I am most familiar with PostgreSQL and psycopg2, so you will have to read the docs of your database of choice.
To fetch records, you send the query with cursor.execute() like we said before, and then call cursor.fetchone() which returns a single row, or cursor.fetchall() which returns all rows in an iterable that you can directly print.
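As a minimal sketch of both steps together, here is what that looks like with sqlite3 (chosen so it runs without a server; with psycopg2 the placeholder would be %s instead of ?; the table and values are made up for the example):

import sqlite3

# A throwaway in-memory database just to demonstrate the pattern.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER, cookie_id INTEGER)")
cur.execute("INSERT INTO users VALUES (1, 42)")

cookie_id = 42
# sqlite3 uses ? as its placeholder where psycopg2 uses %s.
cur.execute("SELECT id FROM users WHERE cookie_id = ?", (cookie_id,))
print(cur.fetchone())   # one row as a tuple, or None if there was no match

cur.execute("SELECT id FROM users")
print(cur.fetchall())   # all rows in a list you can print directly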
Execute didn't update the database?
Statements executed through drivers are transactional, which is a whole topic by itself that I'm sure people on the internet can explain better than I can. To keep things short: for the statement to physically change the database, you call connection.commit() after cursor.execute().
So finally to answer both of your questions, read the documentation of the database's driver and look for the execute method.
This is what I do (which is for sqlite3 and would be similar for other SQL type databases):
Assuming that you have connected to the database and the table exists (otherwise you need to create the table). For the purpose of the example, I have used a table called trades.
new_quant = 1000
# insert one record (row)
command = f"""INSERT INTO trades VALUES (
'some_ticker', {new_quant}, other_values, ...
) """
cur.execute(command)
con.commit()
print('trade inserted !!')
You can then wrap the above into your function accordingly.
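For instance, the asker's update_inv_quant could be completed along these lines. This is only a sketch: sqlite3 is assumed, and the Inv table's Quantity column name is a guess based on the question.

import sqlite3

def update_inv_quant(conn, item_name):
    new_quant = int(input("Enter the updated quantity in stock: "))
    cur = conn.cursor()
    # The ? placeholders keep the user's input out of the SQL string itself.
    cur.execute("UPDATE Inv SET Quantity = ? WHERE ItemName = ?",
                (new_quant, item_name))
    conn.commit()  # without this, the change never reaches the database file
    # Print the updated record to the console.
    cur.execute("SELECT * FROM Inv WHERE ItemName = ?", (item_name,))
    print(cur.fetchone())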

Inserting data from a CSV file to postgres using SQL

Struggling with this Python issue as I'm new to it, and I don't have significant experience with the language. I currently have a CSV file containing around 20 headers and the same number of rows, so listing each one out like some examples here is what I'm trying to avoid:
https://www.dataquest.io/blog/loading-data-into-postgres/
My code consists of the following so far:
with open('dummy-data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)
    for row in reader:
        cur.execute('INSERT INTO messages VALUES', (row))
I'm getting a syntax error at the end of the input, so I assumed it is linked to the way my execute method has been written but I still don't know what I would do in order to address the issue. Any help?
P.S. I understand people use %s for that, but if that's the case, can it be avoided? I don't want to have it duplicated in a line 20 times.
Basically, you DO have to specify at least the required placeholders - and preferably the field names too - in your query.
If it's a one-shot affair and you know which fields are in the CSV and in which order, then you simply hardcode them in the query, i.e.
SQL = "insert into tablename(field1, field2, field21) values(%s, %s, %s)"
Ok, for 20 or so fields it gets quite boring, so you can also use a list of field names to generate the fieldnames part and the placeholders:
fields = ["field1", "field2", "field21"]
placeholders = ["%s"] * len(fields) # list multiplication, yes
SQL = "insert into tablename({}) values({})".format(", ".join(fields), ", ".join(placeholders))
If by chance the CSV header row contains the exact field names, you can also just use that row as the value for fields - but then you have to trust the CSV.
NB: specifying the fields list in the query is not strictly required, but it can protect you from possible issues with a malformed CSV. Actually, unless you really trust the source (your CSV), you should actively validate the incoming data before sending it to the database.
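Here is a sketch of that idea, taking the field names from the header row but checking them against a whitelist first (the column names and connection string are hypothetical; the file and table come from the question):

import csv
import psycopg2

# Hypothetical whitelist of columns the messages table actually has.
ALLOWED_FIELDS = {"sender", "recipient", "body"}

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()

with open('dummy-data.csv', 'r') as f:
    reader = csv.reader(f)
    fields = next(reader)  # take the field names from the header row
    if not set(fields) <= ALLOWED_FIELDS:
        raise ValueError("unexpected column in CSV header: %s" % fields)
    placeholders = ", ".join(["%s"] * len(fields))
    SQL = "insert into messages({}) values({})".format(", ".join(fields), placeholders)
    for row in reader:
        cur.execute(SQL, row)
conn.commit()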
NB2:
%s is for strings I know but would it work the same for timestamps?
In this case, "%s" is not used as a Python string format specifier but as a plain database query placeholder. The choice of the string-format-specifier syntax here is really unfortunate, as it creates a lot of confusion. Note that this is DB-vendor specific though; some vendors use "?" instead, which is much clearer IMHO (and you want to check your own db-api connector's docs for the correct placeholder to use, BTW).
And since it's not a string format specifier, it will work for any type and doesn't need to be quoted for strings; it's the db-api module's job to do proper formatting (including quoting etc.) according to the db column's type.
While we're at it, by all means, NEVER directly use Python string formatting operations when passing values to your queries - unless you want your database to be open-bar for script-kiddies of course.
The problem lies in the insert itself:
cur.execute('INSERT INTO messages VALUES', (row))
The problem is that, since you are not defining any parameters in the query, it interprets that you literally want to execute INSERT INTO messages VALUES, with no parameters, which causes a syntax error; using a single parameter won't work either, since it would expect a single parameter instead of multiple parameters.
If you want to create parameters in a more dynamic way, you could try to construct the query string dynamically.
Please, take a look the documentation: http://initd.org/psycopg/docs/cursor.html#cursor.execute
You can use string multiplication to build the placeholders.
import csv
import psycopg2

conn = psycopg2.connect('postgresql://db_user:db_user_password@server_name:port/db_name')
cur = conn.cursor()
multiple_placeholders = ','.join(['%s'] * 20)
with open('dummy-data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)
    for row in reader:
        cur.execute('INSERT INTO public.messages VALUES (' + multiple_placeholders + ')', row)
conn.commit()
If you want to have a single placeholder that covers a whole list of values, you can use a different method, located in "extras", which covers that usage:
psycopg2.extras.execute_values(cur, 'INSERT INTO messages VALUES %s', (row,))
This method can take many rows at a time (which is good for performance), which is why you need to wrap your single row in (...,).
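For example, instead of executing row by row as in the earlier loop, you might collect the rows and let execute_values batch them (connection string hypothetical, file and table taken from the question):

import csv
import psycopg2
import psycopg2.extras

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()

with open('dummy-data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)            # skip the header row
    rows = list(reader)     # collect every row up front

# execute_values expands the single %s into a VALUES list,
# inserting up to page_size rows per statement.
psycopg2.extras.execute_values(
    cur, 'INSERT INTO messages VALUES %s', rows, page_size=100)
conn.commit()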
Last time, when I was struggling to insert CSV data into postgres, I used pgAdmin and it worked. I don't know whether this answer is a solution, but it's an easy idea to get along with.
You can use the cursor and executemany so that you can skip the iteration, but it's slower than the string-joining parameterized approach.
import pandas as pd

df = pd.read_csv('dummy-data.csv')
df.columns = [<define the headers here>]  # you can skip this line if the headers match the column names
try:
    # Note: prepare() and :1-style placeholders are driver-specific (cx_Oracle syntax).
    cursor.prepare("insert into public.messages(<Column Names>) values(:1, :2, :3, :4, :5)")
    cursor.executemany(None, df.values.tolist())
    conn.commit()
except:
    conn.rollback()

Get query result as a tuple to make substitution

I build dynamic queries, so I do not know the table name and table fields in advance. I do it in order to export data programmatically from one arbitrary table to another. So, my algorithm takes as parameters the name of the source table and the name of the destination table, and gets the field mapping (from one table to the other) from a system table. I have almost done it. I built a select query from the source table, so I can do
cursor.execute(selectquery)
for row in cursor:
    ...  # do something with rows
And besides I built a template of insert query for the destination table, so it looks like
insert into sourcetable (attr1,attr2,attr3) values (%s,%s,%s) # let me call it template_query
Now I want to substitute those %s, %s, %s with values returned by the select query. Something like this (which does not work, but demonstrates what I want):
cursor.execute(selectquery)
for row in cursor:
    final_query = template_query % row  # <- I want this substitution
    cursor2.execute(final_query)
I use something similar. What you need to do is to decorate/wrap the row with a __getitem__ and then use %(colname)s rather than %s in the template's values.
class Wrapper(object):
    def __init__(self, o):
        self.o = o

    def __getitem__(self, key):
        try:
            return getattr(self.o, key)
        except AttributeError:
            raise KeyError(key)
Then, using the Django shell (my model has one column, tagtype):
python manage.py shell
>>> from wrapper import Wrapper
>>> from pssystem.models import MetaTag
>>> o = MetaTag.objects.all()[0]
>>> w = Wrapper(o)
>>> "insert into sourcetable (attr1,attr2,attr3) values ('%(tagtype)s','%(tagtype)s, '%(tagtype)s)" % w
u"insert into sourcetable (attr1,attr2,attr3) values ('PROFILE','PROFILE, 'PROFILE)"
You can get fancier than that (and you definitely should if the source object contains untrusted, user-entered, content), but this works fine.
Notice that you need to add quotes around the substitutions if they are character variables. Dates might be fun too!
Hmmm, sorry, just noticed that your source rows are coming from a straight select rather than a fetch from a Django model. The Django tag confused me -- there is very little Django in your question. Well then, it still works, but you first need to do something with the cursor's result rows.
Something like this does the trick:
def fmtRow(cursor, row):
    di = dict()
    for i, col in enumerate(cursor.description):
        di[col[0]] = row[i]  # col[0] is the column's name
    return di
and then you can dispense with the Wrapper, because your row is changed to a dictionary already.
This is a very naive implementation, not suitable for high volumes, but it works.
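Usage would then look something like this: a runnable sketch reusing the fmtRow helper above, with a throwaway in-memory table and hypothetical column names, and the template switched to named placeholders as suggested.

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE src (attr1 TEXT, attr2 TEXT, attr3 TEXT)")
conn.execute("INSERT INTO src VALUES ('a', 'b', 'c')")
cursor = conn.execute("SELECT attr1, attr2, attr3 FROM src")

# Named placeholders now match the source column names.
template_query = ("insert into sourcetable (attr1, attr2, attr3) "
                  "values ('%(attr1)s', '%(attr2)s', '%(attr3)s')")
for row in cursor.fetchall():
    final_query = template_query % fmtRow(cursor, row)
    print(final_query)  # or cursor2.execute(final_query)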
You can use kwargs to build querysets dynamically.
kwargs = {'name': "Jenny", 'color': "Blue"}
print(People.objects.filter(**kwargs))
I'm not sure this helps with the dynamically named table though. Maybe something like this would help: http://dynamic-models.readthedocs.org/en/latest/ (it's where that kwarg example came from).

syntax error when attempting to insert data into postgresql

I am attempting to insert parsed dta data into a postgresql database, with each row being a separate variable table, and it was working until I added the second column, "recodeid_fk". The error I now get when attempting to run this code is: pg8000.errors.ProgrammingError: ('ERROR', '42601', 'syntax error at or near "imp"').
Eventually, I want to be able to parse multiple files at the same time and insert the data into the database, but if anyone could help me understand what's going on now, it would be fantastic. I am using Python 2.7.5, the statareader is from the pandas 0.12 development records, and I have very little experience in Python.
dr = statareader.read_stata('file.dta')
a = 2
t = 1
for t in range(1, 10):
    z = str(t)
for date, row in dr.iterrows():
    cur.execute("INSERT INTO tblv00{} (data, recodeid_fk) VALUES({}, {})".format(z, str(row[a]), 29))
    a += 1
    t += 1
conn.commit()
cur.close()
conn.close()
To your specific error...
The syntax error probably comes from strings {} that need quotes around them. execute() can take care of this for you automatically. Replace
execute("INSERT INTO tblv00{} (data, recodeid_fk) VALUES({}, {})".format(z, str(row[a]), 29))
with
execute("INSERT INTO tblv00{} (data, recodeid_fk) VALUES(%s, %s)".format(z), (row[a], 29))
The table name is completed the same way as before, but the values will be filled in by execute, which inserts quotes if they are needed. Maybe execute could fill in the table name too, and we could drop format entirely, but that would be an unusual usage, and I'm guessing execute might (wrongly) put quotes in the middle of the name.
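As an aside, if you were using psycopg2 rather than the pg8000 driver from the question, its psycopg2.sql module offers a supported way to fill in identifiers such as table names safely. A sketch, continuing with the question's variables:

from psycopg2 import sql

# sql.Identifier quotes the table name as an identifier, while %s stays
# a regular value placeholder handled by execute().
query = sql.SQL("INSERT INTO {} (data, recodeid_fk) VALUES (%s, %s)").format(
    sql.Identifier('tblv00' + z))
cur.execute(query, (row[a], 29))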
But there's a nicer approach...
Pandas includes a function for writing DataFrames to SQL tables. Postgresql is not yet supported, but in simple cases you should be able to pretend that you are connected to sqlite or MySQL database and have no trouble.
What do you intend with z here? As it is, you loop z from '1' to '9' before proceeding to the next for loop. Should the loops be nested? That is, did you mean to insert the contents of dr into nine different tables called tblv001 through tblv009?
If you meant that loop to put different parts of dr into different tables, please check the indentation of your code and clarify it.
In either case, the link above should take care of the SQL insertion.
Response to Edit
It seems like t, z, and a are doing redundant things. How about:
import pandas as pd
import string
...
# Loop through columns of dr, and count them as we go.
for i, col in enumerate(dr):
    table_name = 'tblv' + string.zfill(i, 3)  # e.g., tblv001 or tblv010
    df1 = pd.DataFrame(dr[col]).reset_index()
    df1.columns = ['data', 'recodeid_fk']
    pd.io.sql.write_frame(df1, table_name, conn)
I used reset_index to make the index into a column. The new (sequential) index will not be saved by write_frame.

Python: Number of rows affected by cursor.execute("SELECT ...)

How can I access the number of rows affected by:
cursor.execute("SELECT COUNT(*) from result where server_state='2' AND name LIKE '"+digest+"_"+charset+"_%'")
Try using fetchone:
cursor.execute("SELECT COUNT(*) from result where server_state='2' AND name LIKE '"+digest+"_"+charset+"_%'")
result=cursor.fetchone()
result will hold a tuple with one element, the value of COUNT(*).
So to find the number of rows:
number_of_rows=result[0]
Or, if you'd rather do it in one fell swoop:
cursor.execute("SELECT COUNT(*) from result where server_state='2' AND name LIKE '"+digest+"_"+charset+"_%'")
(number_of_rows,)=cursor.fetchone()
PS. It's also good practice to use parametrized arguments whenever possible, because it can automatically quote arguments for you when needed, and protect against sql injection.
The correct syntax for parametrized arguments depends on your python/database adapter (e.g. mysqldb, psycopg2 or sqlite3). It would look something like
cursor.execute("SELECT COUNT(*) from result where server_state= %s AND name LIKE %s",[2,digest+"_"+charset+"_%"])
(number_of_rows,)=cursor.fetchone()
From PEP 249, which is usually implemented by Python database APIs:
Cursor Objects should respond to the following methods and attributes:
[…]
.rowcount
This read-only attribute specifies the number of rows that the last .execute*() produced (for DQL statements like 'select') or affected (for DML statements like 'update' or 'insert').
But be careful—it goes on to say:
The attribute is -1 in case no .execute*() has been performed on the cursor or the rowcount of the last operation cannot be determined by the interface. [7]
Note:
Future versions of the DB API specification could redefine the latter case to have the object return None instead of -1.
So if you've executed your statement, and it works, and you're certain your code will always be run against the same version of the same DBMS, this is a reasonable solution.
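A small defensive sketch of that, assuming a %s-style driver such as MySQLdb or psycopg2 and an already-open cursor against the question's result table:

cursor.execute("SELECT * FROM result WHERE server_state = %s", ('2',))
if cursor.rowcount != -1:
    number_of_rows = cursor.rowcount         # the driver reported a count
else:
    number_of_rows = len(cursor.fetchall())  # fall back to counting fetched rows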
The number of rows affected is returned from execute:
rows_affected = cursor.execute("SELECT ... ")
of course, as AndiDog already mentioned, you can get the row count by accessing the rowcount property of the cursor at any time to get the count for the last execute:
cursor.execute("SELECT ... ")
rows_affected = cursor.rowcount
From the inline documentation of python MySQLdb:
def execute(self, query, args=None):
    """Execute a query.

    query -- string, query to execute on server
    args -- optional sequence or mapping, parameters to use with query.

    Note: If args is a sequence, then %s must be used as the
    parameter placeholder in the query. If a mapping is used,
    %(key)s must be used as the placeholder.

    Returns long integer rows affected, if any
    """
In my opinion, the simplest way to get the number of selected rows is the following:
The cursor object returns a list with the results when using the fetch commands (fetchall(), fetchone(), fetchmany()). To get the number of selected rows, just print the length of this list. But it only makes sense for fetchall(). ;-)
print len(cursor.fetchall())  # Python 2
print(len(cursor.fetchall()))  # Python 3
To get the number of selected rows I usually use the following:
cursor.execute(sql)
count = len(cursor.fetchall())
When using count(*), the result is {'count(*)': 9} -- where 9 represents the number of rows in the table for this instance.
So, in order to fetch just the number, this worked in my case, using MySQL 8:
cursor.fetchone()['count(*)']
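Aliasing the aggregate gives that key a stable, readable name. A sketch, assuming the same dictionary-style cursor as above:

# SELECT COUNT(*) AS cnt lets you look the value up by the alias.
cursor.execute("SELECT COUNT(*) AS cnt FROM result")
row = cursor.fetchone()
print(row['cnt'])  # e.g. 9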
