I just started using sqlite3 with Python.
I would like to know the difference between:
cursor = db.execute("SELECT customer FROM table")
for row in cursor:
    print row[0]
and
cursor = db.execute("SELECT customer FROM table")
for row in cursor.fetchall():
    print row[0]
Apart from the fact that cursor is <type 'sqlite3.Cursor'> and cursor.fetchall() is <type 'list'>, both of them produce the same result.
Is there a difference, a preference, or specific cases where one is preferred over the other?
fetchall() reads all records into memory, and then returns that list.
When iterating over the cursor itself, rows are read only when needed.
This is more efficient when you have a lot of data and can handle the rows one by one.
The main difference is precisely the call to fetchall(). fetchall() returns a list object filled with all the elements remaining in your initial query (all elements, if you haven't fetched anything yet). This has several drawbacks:
Increased memory usage: all the query's elements are stored in a list, which could be huge
Bad performance: filling the list can be quite slow if there are many elements
Process termination: for a huge query, your program might crash by running out of memory
When you instead use the cursor iterator (for row in cursor:), you get the query's rows lazily: they are returned one by one, only as the program requires them.
The output of your two code snippets is certainly the same, but internally there can be a huge performance difference between calling fetchall() and iterating over the cursor directly.
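For example, a minimal sketch of the row-by-row approach (the file name and big_table are made up for illustration):

import sqlite3

db = sqlite3.connect('example.db')  # hypothetical database file
# Iterating over the cursor keeps only one row at hand at a time,
# so even a very large table can be processed safely.
for row in db.execute("SELECT customer FROM big_table"):
    print(row[0])
# By contrast, fetchall() would materialize every row up front:
# rows = db.execute("SELECT customer FROM big_table").fetchall()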
Hope this helps!
Why do I get nothing when I execute cursor.fetchall() twice after a cursor.execute()? Is there any way of preventing this from happening? Do I need to store the information in a variable? Is it supposed to work this way?
fetchall does what it says--it fetches all. There's nothing left after that. To get more results, you'd need to run another query (or the same query again).
From the Python DB-API 2.0 specification:
cursor.fetchall()
Fetch all (remaining) rows of a query result, returning them as a sequence of sequences (e.g. a list of tuples). Note that the cursor's arraysize attribute can affect the performance of this operation.
cursor.fetchone()
Fetch the next row of a query result set, returning a single sequence, or None when no more data is available. [6]
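So if you need the rows more than once, fetch them once into a variable and iterate over that instead. A minimal sketch (table and column names are made up):

cursor.execute("SELECT customer FROM customers")
rows = cursor.fetchall()  # fetch once, keep the list
for row in rows:          # first pass
    print(row[0])
for row in rows:          # second pass works, since rows is a plain list
    print(row[0])
# Calling cursor.fetchall() again here would return an empty list,
# because the cursor has no remaining rows.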
I have a small piece of code which inserts some data into a database. However, the data is being inserted in reverse order.
If I commit after the for loop has run through, it inserts backwards; if I commit as part of the for loop, it inserts in the correct order, but it is much slower.
How can I commit after the for loop but still retain the correct order?
import subprocess, sqlite3

output4 = subprocess.Popen(['laZagne.exe', 'all'], stdout=subprocess.PIPE).communicate()[0]
lines4 = output4.splitlines()
conn = sqlite3.connect('DBNAME')
cur = conn.cursor()
for j in lines4:
    print j
    cur.execute('insert into Passwords (PassString) VALUES (?)', (j,))
conn.commit()
conn.close()
You can't rely on any ordering in SQL database tables. Insertion takes place in an implementation-dependent manner, and where rows end up depends entirely on the storage implementation used and the data that is already there.
As such, no reversing takes place; if you are selecting data from the table again and these rows come back in a reverse order, then that's a coincidence and not a choice the database made.
If rows must come back in a specific order, use ORDER BY when selecting. You could order by ROWID, for example, which usually increases monotonically for new rows and thus gives you an approximation of insertion order. See ROWIDs and the INTEGER PRIMARY KEY.
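For example, a sketch using the Passwords table from the question:

# Rows come back in rowid order, which normally approximates the
# order in which they were inserted.
for (password,) in cur.execute('SELECT PassString FROM Passwords ORDER BY rowid'):
    print(password)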
I am trying to extract some information from one table and store it in another table using Sqlite and Python. Table 1 contains a list of websites in the form of (www.abc.com). I am trying to extract the (abc) part from each row and store it in Table 2, which also maintains a count for each site. If the site already exists in Table 2, then it just increments the count.
Here is the code I have:
p = re.compile(r'^.+\.([a-zA-Z]+)\..+$')
for row in c.execute('SELECT links FROM table1'):
    link = p.match(row[0])
    if link.group(1):
        print(link.group(1))
        c.execute('SELECT EXISTS(SELECT 1 FROM table2 WHERE site_name = ?)', (link.group(1), ))
When I run the script, it will only execute once, then I get:
Traceback (most recent call last):
  File "test.py", line 43, in <module>
    link = p.match(row[0])
TypeError: expected string or buffer
If I comment out the c.execute line, all the site names are printed properly. I am new to Python and Sqlite, so I am not sure what the problem is.
Any help will be great, thanks in advance.
The problem is that you're iterating over a cursor whose rows contain a single string:
for row in c.execute('SELECT links FROM table1'):
… but then, in the middle of the iteration, you change it into a cursor whose rows consist of a single number:
c.execute('SELECT EXISTS(SELECT 1 FROM table2 WHERE site_name = ?)', (link.group(1), ))
So, when you get the next row, it's going to be (1,) instead of ('http://example.com',), which means p.match(row[0]) is passing the number 1 to match, and it's complaining that 1 is not a string or buffer.
For future reference, it's really helpful to debug things by looking at the intermediate values. Whether you run in the debugger, or just add print(row) calls and the like to log what's going on, you'd know that it works the first time through the loop, but that it fails the second time, and that row looked like [1] when it failed. That would make it much easier for you to track down the problem (or allow you to ask a better question on SO, because obviously you still won't be able to find everything yourself.)
You could fix this in (at least) three ways, in increasing order of "goodness if appropriate":
Fetch all of the values from the first query, then loop over those, so your second query doesn't get in the way.
Use a separate cursor for each query, instead of reusing the same one.
Don't make the second query in the first place—it's a SELECT query and you aren't doing anything with the rows, so what good is it doing?
The inner execute is probably stepping on your cursor iterator state. Try creating a second cursor object for that query.
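A minimal sketch of that fix, reusing the names from the question (assuming conn is the sqlite3 connection and p is the compiled regex):

read_cur = conn.cursor()   # drives the outer loop
check_cur = conn.cursor()  # runs the inner EXISTS query
for row in read_cur.execute('SELECT links FROM table1'):
    link = p.match(row[0])
    if link and link.group(1):
        check_cur.execute(
            'SELECT EXISTS(SELECT 1 FROM table2 WHERE site_name = ?)',
            (link.group(1),))
        exists = check_cur.fetchone()[0]  # 1 if the site is already there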
I have a table, and I want to execute a query that will return the values of two rows:
cursor.execute("""SELECT `egg_id`
FROM `groups`
WHERE `id` = %s;""", (req_id))
req_egg = str(cursor.fetchone())
print req_egg
The column egg_id has two rows it can return for that query; however, the above code will only print the first result. I want it to also show the second. How would I get both values?
Edit: Would there be any way to store each one in a separate variable, with fetchmany?
In this case you can use fetchmany to fetch a specified number of rows:
req_egg = cursor.fetchmany(2)
Edit:
But be aware: if you have a table with many rows but only need two, you should also use LIMIT in your SQL query; otherwise all rows are fetched from the database, but only two are used by your program.
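As for the edit: fetchmany returns a plain list, so if you know exactly two rows came back you can unpack it into separate variables. A small sketch (variable names are made up):

rows = cursor.fetchmany(2)
if len(rows) == 2:
    first_egg, second_egg = rows  # each is a row tuple, e.g. (egg_id,)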
Call .fetchone() a second time, and it will return the next result.
Otherwise, if you are 100% sure that your query will always return exactly two results, even in the face of bugs or inconsistent data in the database, then just do a .fetchall() and capture both results.
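For example:

first = cursor.fetchone()   # first row, or None if there were no results
second = cursor.fetchone()  # second row, or None if only one row matched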
Try this:
cursor.fetchmany(size=2)
Documentation for sqlite3 (which also implements the DB-API): http://docs.python.org/library/sqlite3.html#sqlite3.Cursor.fetchmany
There are several ways to iterate over a result set. What are the tradeoffs of each?
The canonical way is to use the built-in cursor iterator.
curs.execute('select * from people')
for row in curs:
    print row
You can use fetchall() to get all rows at once.
for row in curs.fetchall():
    print row
It can be convenient to use this to create a Python list containing the values returned:
curs.execute('select first_name from people')
names = [row[0] for row in curs.fetchall()]
This can be useful for smaller result sets, but can have bad side effects if the result set is large.
You have to wait for the entire result set to be returned to your client process.
You may eat up a lot of memory in your client to hold the built-up list.
It may take a while for Python to construct and deconstruct the list, which you are going to immediately discard anyway.
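If you only need that list of values, you can build it from the cursor iterator directly and skip the intermediate list that fetchall() creates; a small sketch:

curs.execute('select first_name from people')
names = [row[0] for row in curs]  # no intermediate fetchall() list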
If you know there's a single row being returned in the result set you can call fetchone() to get the single row.
curs.execute('select max(x) from t')
maxValue = curs.fetchone()[0]
Finally, you can loop over the result set fetching one row at a time. In general, there's no particular advantage in doing this over using the iterator.
row = curs.fetchone()
while row:
    print row
    row = curs.fetchone()
My preferred way is the cursor iterator, but setting the arraysize property of the cursor first.
curs.execute('select * from people')
curs.arraysize = 256
for row in curs:
    print row
In this example, cx_Oracle will fetch rows from Oracle 256 at a time, reducing the number of network round trips that need to be performed.
There's also the way psycopg seems to do it... From what I gather, it seems to create dictionary-like row proxies that map key lookups into the memory block returned by the query. In that case, fetching the whole answer and working with a similar proxy-factory over the rows seems like a useful idea. Come to think of it though, it feels more like Lua than Python.
Also, this should be applicable to all PEP 249 (DB-API 2.0) interfaces, not just Oracle; or did you mean just the fastest way using Oracle?
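For what it's worth, sqlite3 offers something similar through its row_factory; a minimal self-contained sketch:

import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # rows now support lookup by column name
conn.execute('create table people (first_name text)')
conn.execute("insert into people values ('Alice')")
for row in conn.execute('select first_name from people'):
    print(row['first_name'])  # dictionary-style access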