Python - Regex pattern matching issue after executing a sqlite query - python

I am trying to extract some information from one table and store it in another table using SQLite and Python. Table 1 contains a list of websites in the form www.abc.com. I am trying to extract the abc part from each row and store it in Table 2, which also maintains a count for each site. If the site already exists in Table 2, the script should just increment its count.
Here is the code I have:
p = re.compile('^.+\.([a-zA-Z]+)\..+$')
for row in c.execute('SELECT links FROM table1'):
    link = p.match(row[0])
    if link.group(1):
        print(link.group(1))
        c.execute('SELECT EXISTS(SELECT 1 FROM table2 WHERE site_name = ?)', (link.group(1), ))
When I run the script, it will only execute once, then I get:
Traceback (most recent call last):
  File "test.py", line 43, in <module>
    link = p.match(row[0])
TypeError: expected string or buffer
If I comment out the c.execute line, all the site names are printed properly. I am new to Python and Sqlite, so I am not sure what the problem is.
Any help will be great, thanks in advance.

The problem is that you're iterating over a cursor whose rows contain a single string:
for row in c.execute('SELECT links FROM table1'):
… but then, in the middle of the iteration, you change it into a cursor whose rows consist of a single number:
c.execute('SELECT EXISTS(SELECT 1 FROM table2 WHERE site_name = ?)', (link.group(1), ))
So, when you get the next row, it's going to be (1,) instead of ('www.abc.com',), so p.match(row[0]) passes the number 1 to match, which complains that 1 is not a string or buffer.
For future reference, it's really helpful to debug things by looking at the intermediate values. Whether you run in the debugger, or just add print(row) calls and the like to log what's going on, you'd know that it works the first time through the loop, but that it fails the second time, and that row looked like [1] when it failed. That would make it much easier for you to track down the problem (or allow you to ask a better question on SO, because obviously you still won't be able to find everything yourself.)
You could fix this in (at least) three ways, in increasing order of "goodness if appropriate":
1) Fetch all of the values from the first query, then loop over those, so your second query doesn't get in the way.
2) Use a separate cursor for each query, instead of reusing the same one.
3) Don't make the second query in the first place. It's a SELECT query and you aren't doing anything with the rows, so what good is it doing?
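A minimal sketch of the second option, combined with the counting the question describes. It assumes an in-memory database, a table2 with a hypothetical hits column for the count (the question does not name that column), and made-up sample rows. One cursor is used only for reading table1 and another only for writing table2, so the iteration is never disturbed:

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (links TEXT)")
# "hits" is an assumed column name for the per-site count.
conn.execute("CREATE TABLE table2 (site_name TEXT PRIMARY KEY, hits INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO table1 (links) VALUES (?)",
    [("www.abc.com",), ("www.xyz.org",), ("www.abc.com",), (None,)],
)

pattern = re.compile(r"^.+\.([a-zA-Z]+)\..+$")

read_cur = conn.cursor()   # only iterates over table1
write_cur = conn.cursor()  # only touches table2

for (link,) in read_cur.execute("SELECT links FROM table1"):
    # Skip NULL links and links that do not match the pattern.
    match = pattern.match(link) if link else None
    if match is None:
        continue
    site = match.group(1)
    # Try to bump the counter; insert a fresh row if the site is new.
    write_cur.execute("UPDATE table2 SET hits = hits + 1 WHERE site_name = ?", (site,))
    if write_cur.rowcount == 0:
        write_cur.execute("INSERT INTO table2 (site_name, hits) VALUES (?, 1)", (site,))

conn.commit()
counts = dict(conn.execute("SELECT site_name, hits FROM table2"))
print(counts)
```

The UPDATE followed by a conditional INSERT also makes the SELECT EXISTS query unnecessary, which is the third option.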

The inner execute is probably stepping on your cursor iterator state. Try creating a second cursor object for that query.

Related

python cursor return only those rows which first column is not empty

In Python 3.8, I have a select query.
dbconn.execute("select name, id, date from test_table")
That query always returned the wrong number of rows. After a lot of debugging, I was able to fix it just by swapping the positions of the id and name columns, after which it started working normally.
The issue was that the name column is empty for some rows.
Does this mean the Python cursor returns only those rows whose first column is not empty? Am I missing anything in my conclusion?
Check your database integrity; some of the entries might be corrupted, causing the query to fail. I had an issue before where a query returned the wrong number of rows at some point because of an integrity problem.
Another thing to try: instead of returning each row as a list/tuple, use a dictionary-like row with key-value pairs:
dbconn.row_factory = sqlite3.Row
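As a quick check of the conclusion in the question, here is a small sketch assuming sqlite3 and a made-up test_table with one NULL and one empty name. On a healthy database, rows whose first column is empty are still returned, and sqlite3.Row lets each column be read by name:

```python
import sqlite3

dbconn = sqlite3.connect(":memory:")
dbconn.row_factory = sqlite3.Row  # rows support access by column name
dbconn.execute("CREATE TABLE test_table (name TEXT, id INTEGER, date TEXT)")
# Sample data is hypothetical: one NULL name and one empty-string name.
dbconn.executemany(
    "INSERT INTO test_table (name, id, date) VALUES (?, ?, ?)",
    [("alice", 1, "2020-01-01"), (None, 2, "2020-01-02"), ("", 3, "2020-01-03")],
)

rows = dbconn.execute("SELECT name, id, date FROM test_table ORDER BY id").fetchall()
for row in rows:
    print(row["id"], repr(row["name"]), row["date"])

ids = [row["id"] for row in rows]
```

If all three rows show up here but not in the real application, the cause is more likely in the data or in how the result is consumed than in the column order.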

How do I read input from a file and use it in an sqlplus query?

I am trying something like
select customer_id, order_id from order_table where purchase_id = 10 OR
purchase_id = 25 OR
...
purchase_id = 25432;
Since the query is too big, I am running into a variety of problems. If I run the entire query on a single line, I get the error:
SP2-0027: Input is too long (> 2499 characters) - line ignored
If I split the query across multiple lines, the query gets corrupted because of interference from the line numbers printed for each line of the entered query. If I disable line numbers, the SQL> prompt at each line causes trouble.
I get the same error if I run the query from a text file: SQL> #query.sql
(I did not face such issues with mysql in the past but with sqlplus now).
I am not an expert in shell scripting or in Python. It would be a great help if I could get pointers on how to put all the purchase_ids in a text file, one purchase_id per line, and supply them to the sqlplus query at script runtime.
I did sufficient research, but I still appreciate pointers as well.
1) Syntax change:
Try using 'in (10,25,2542, ...)' instead of a series of 'OR' conditions. It reduces the size of the SQL statement.
2) Logic change:
The syntax change may only delay the inevitable: the error will still occur if there are a lot of IDs to include.
2a)
A straightforward fix is to break the query down into batches. You can issue one select query per 50 purchase IDs until all IDs are covered.
2b)
Or you can look into a more general way to retrieve the same query result. Let's assume what you actually want to see is a list of 'unconfirmed orders'. Then, instead of using a set of purchase IDs in the where clause, you can add a boolean field 'confirmed' to order_table and select based on that criterion.
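The batching idea in 2a can be sketched in Python as follows. This is only an illustration: the helper names read_ids and batched_queries are hypothetical, and an in-memory io.StringIO stands in for the text file of purchase IDs (one per line). Each ID is parsed as an integer before being placed in the statement:

```python
import io


def read_ids(lines):
    """Parse one integer ID per line, skipping blank lines."""
    return [int(line) for line in lines if line.strip()]


def batched_queries(ids, batch_size=50):
    """Yield one SELECT statement per batch of at most batch_size IDs."""
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        id_list = ", ".join(str(i) for i in batch)
        yield (
            "select customer_id, order_id from order_table "
            f"where purchase_id in ({id_list});"
        )


# Stand-in for open("purchase_ids.txt"); the file name would be your own.
sample_file = io.StringIO("10\n25\n\n25432\n")
purchase_ids = read_ids(sample_file)
queries = list(batched_queries(purchase_ids, batch_size=2))
for query in queries:
    print(query)
```

The generated statements could then be written to a script file, one per line, with the batch size chosen so that each line stays under the input length limit.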
Another idea:
Create a table "query_ids" (one column) and insert all the purchase_id values from your WHERE clause into it.
The new query would be:
select customer_id, order_id from order_table where purchase_id in (select * from query_ids);

Is sqlite3 fetchall necessary?

I just started using sqlite3 with Python.
I would like to know the difference between:
cursor = db.execute("SELECT customer FROM table")
for row in cursor:
    print(row[0])
and
cursor = db.execute("SELECT customer FROM table")
for row in cursor.fetchall():
    print(row[0])
Except that cursor is <type 'sqlite3.Cursor'> and cursor.fetchall() is <type 'list'>, both of them produce the same result.
Is there a difference, a preference, or specific cases where one is preferred over the other?
fetchall() reads all records into memory, and then returns that list.
When iterating over the cursor itself, rows are read only when needed.
This is more efficient when you have much data and can handle the rows one by one.
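A small sketch of that difference, assuming an in-memory database and a hypothetical customers table. Pulling one row from the cursor and then calling fetchall() shows that fetchall() returns only the rows not yet consumed, as a list:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (customer TEXT)")
db.executemany(
    "INSERT INTO customers (customer) VALUES (?)",
    [("customer-%d" % i,) for i in range(5)],
)

cursor = db.execute("SELECT customer FROM customers ORDER BY customer")

first = next(cursor)           # the cursor is an iterator: one row at a time
remaining = cursor.fetchall()  # a list holding every row not yet consumed
leftover = cursor.fetchall()   # nothing is left after that

print(first, len(remaining), leftover)
```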
The main difference is precisely the call to fetchall(). Calling fetchall() returns a list object filled with all the rows remaining in your query result (all rows if you haven't fetched anything yet). This has several drawbacks:
Increased memory usage: all of the query's rows are stored in a list, which could be huge.
Bad performance: filling the list can be quite slow if there are many rows.
Process termination: if the query is huge, your program might crash by running out of memory.
When you instead iterate over the cursor (for e in cursor:), you get the query's rows lazily. This means rows are returned one by one, only when the program requires them.
The output of your two code snippets is certainly the same, but internally fetchall() can carry a significant memory and performance cost on large results compared with iterating over the cursor directly.
Hope this helps!

How can I "fetch two" with python-mysql?

I have a table, and I want to execute a query that will return the values of two rows:
cursor.execute("""SELECT `egg_id`
                  FROM `groups`
                  WHERE `id` = %s;""", (req_id,))
req_egg = str(cursor.fetchone())
print(req_egg)
The column egg_id has two rows it can return for that query, however the above code will only print the first result -- I want it to also show the second, how would I get both values?
Edit: Would there be any way to store each one in a separate variable, with fetchmany?
In this case you can use fetchmany to fetch a specified number of rows:
req_egg = cursor.fetchmany(2)
Edit:
But be aware: if you have a table with many rows but only need two, you should also use a LIMIT in your SQL query; otherwise all matching rows are returned from the database, but only two are used by your program.
Call .fetchone() a second time, and it would return the next result.
Otherwise if you are 100% positively sure that your query would always return exactly two results, even if you've had a bug or inconsistent data in the database, then just do a .fetchall() and capture both results.
Try this:
Cursor.fetchmany(size=2)
Documentation for sqlite3 (which also implements dbapi): http://docs.python.org/library/sqlite3.html#sqlite3.Cursor.fetchmany
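To address the edit in the question, here is a sketch of storing each value in its own variable with fetchmany. It uses sqlite3 as a stand-in for the MySQL driver (an assumption made so the example is self-contained), so the placeholder is ? rather than %s, and the sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "groups" (id INTEGER, egg_id INTEGER)')
conn.executemany(
    'INSERT INTO "groups" (id, egg_id) VALUES (?, ?)',
    [(7, 101), (7, 102), (8, 201)],
)

req_id = 7
cursor = conn.cursor()
# LIMIT 2 keeps the database from returning more rows than will be used.
cursor.execute(
    'SELECT egg_id FROM "groups" WHERE id = ? ORDER BY egg_id LIMIT 2',
    (req_id,),
)
rows = cursor.fetchmany(2)
egg_ids = [row[0] for row in rows]

# Unpack only when both rows are really there.
if len(egg_ids) == 2:
    first_egg, second_egg = egg_ids
    print(first_egg, second_egg)
```

Checking the length before unpacking avoids a ValueError when the query returns fewer than two rows.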

lastrowid() alternative or syntax without using execute in sqlite python?

In sqlite3 in Python, I'm trying to write a program that prints out the ID of the next row to be inserted into the table. But I just read in the documentation that an INSERT has to be issued with execute() for lastrowid to be set. The problem is that the program asks the user for his/her information, and the primary key ID that will be assigned to the member must be displayed as his/her ID number. In other words, the execute("INSERT") statement must not be executed first, as the ID keys would then be wrong for the member assignment.
I first thought that lastrowid could be read without running execute("INSERT"), but I noticed that it always gave me the value None. Then I read the sqlite3 documentation for Python and googled alternatives to solve this problem.
I've read somewhere via Google that SELECT last_insert_rowid() can be used, but what is the syntax for it in Python? I've tried coding it like this:
NextID = con.execute("select last_insert_rowid()")
But it just gave me a cursor object as output: ""
I've also been thinking of just making another table that always holds only one value. It would get the lastrowid of the main table whenever new data is inserted into the main table. That value would be written to the other table, overwriting the previous one, so that every time a new set of data needs to be entered in the main table and the next row ID is needed, the program just reads the table with that one value.
Or is there an alternative and easier way of doing this?
Any help is very much appreciated bows deeply
You could guess the next ID by querying your table, before asking the user for his/her information, with
SELECT MAX(ID) + 1 AS NewID FROM DesiredTable
Before inserting the new data (including the new ID), start a transaction.
Roll back only if the insert fails (because another process was faster with the same operation) and ask your user again. If everything is OK, just do a commit.
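A rough sketch of this guess-then-insert approach, assuming a Members table with an ID primary key and a Nick column. The helper names guess_next_id and insert_member are hypothetical. The connection is used as a context manager, which commits on success and rolls back if the insert raises:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Members (ID INTEGER PRIMARY KEY, Nick TEXT)")
con.execute("INSERT INTO Members (ID, Nick) VALUES (1, 'first')")
con.commit()


def guess_next_id(con):
    """Return MAX(ID) + 1, or 1 when the table is empty."""
    (next_id,) = con.execute("SELECT COALESCE(MAX(ID), 0) + 1 FROM Members").fetchone()
    return next_id


def insert_member(con, member_id, nick):
    """Insert with an explicit ID; return False if that ID is already taken."""
    try:
        with con:  # commits on success, rolls back on an exception
            con.execute(
                "INSERT INTO Members (ID, Nick) VALUES (?, ?)",
                (member_id, nick),
            )
        return True
    except sqlite3.IntegrityError:
        return False


next_id = guess_next_id(con)                      # shown to the user up front
ok = insert_member(con, next_id, "newbie")        # succeeds with the guessed ID
duplicate_ok = insert_member(con, next_id, "latecomer")  # same ID again: rejected
print(next_id, ok, duplicate_ok)
```

When insert_member returns False, the program would guess a new ID and ask the user again, as described above.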
Thanks everyone for the answers and suggestions posted, but I ended up doing something like this:
# Only to get the value of NextID to display
TempNick = "ThisIsADummyNickToBeDeleted"
cur.execute("insert into Members (Nick) values (?)", (TempNick, ))
NextID = cur.lastrowid
cur.execute("delete from Members where ID = ?", (NextID, ))
So basically, in order to get the lastrowid, I ended up inserting dummy data; after getting the value of lastrowid, the dummy data is deleted.
lastrowid
This read-only attribute provides the rowid of the last modified row. It is only set if you issued an INSERT statement using the execute() method. For operations other than INSERT or when executemany() is called, lastrowid is set to None.
from https://docs.python.org/2/library/sqlite3.html
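For the syntax asked about in the question, here is a minimal sketch assuming the same Members table. execute() returns a cursor, so the value of SELECT last_insert_rowid() has to be fetched from it. Both approaches report the ID of a row that has already been inserted on this connection, not a future one:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Members (ID INTEGER PRIMARY KEY, Nick TEXT)")

cur = con.cursor()
cur.execute("INSERT INTO Members (Nick) VALUES (?)", ("alice",))
new_id = cur.lastrowid  # set because the last statement on cur was an INSERT

# The SQL function gives the same value, but the row must be fetched.
(sql_id,) = con.execute("SELECT last_insert_rowid()").fetchone()
con.commit()

print(new_id, sql_id)
```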
