Check if entry exists in previous mysql query results - python

I'm using MySQLdb with Python. I execute a SELECT query and store the results in a dictionary, D. The actual query is quite complicated, so it's not clear how to do it in a single query, which is why I'm splitting it into two.
Then I run a second query and would like to add the condition that rows from column b in the second query exist IN D.values(), i.e., I'd like to do something like:
import MySQLdb
db = MySQLdb.connect()
cursor = db.cursor(MySQLdb.cursors.DictCursor)
cursor.execute("SELECT a, b FROM t1;")
results1 = cursor.fetchall()
I'd like to do the following, somehow passing an array of the previous query's results into the SELECT command:
cursor.execute("SELECT c, d FROM t2 WHERE d IN results1.b.values();")
Thank you,

It sounds like you can do what you want with a single query, and a join statement:
cursor.execute("SELECT t1.a, t1.b, t2.c, t2.d FROM t1 JOIN t2 ON t1.b = t2.d;")

You can pass a set of values in an IN clause, but it is not advisable, for two reasons:
Slow: an IN clause is evaluated like a chain of OR comparisons (d = a OR d = b OR ...), which is slow to execute
An IN clause is limited in the number of values you can place in it.
A better solution is to use a subquery:
SELECT c, d FROM t2 WHERE d IN (SELECT b FROM t1);
Or even better (faster) is the JOIN:
SELECT c, d FROM t2
INNER JOIN t1 ON t2.d=t1.b;
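If the two-query approach really is unavoidable, you can still build the IN clause from the first result set by generating one %s placeholder per value, so the driver escapes everything for you. A minimal sketch, reusing cursor and results1 from the question (and assuming results1 is non-empty):
values = [row["b"] for row in results1]  # collect the b values from the first query
placeholders = ", ".join(["%s"] * len(values))  # one %s per value
query = "SELECT c, d FROM t2 WHERE d IN ({});".format(placeholders)
cursor.execute(query, values)  # the values are escaped by the driver
results2 = cursor.fetchall()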

Related

Efficiently delete multiple records

I am trying to execute a delete statement that checks whether the table has any SKU that exists in the SKU column of the dataframe, and if it does, deletes that row. Because I am using a for loop to iterate through the rows and check, the program takes a long time to run for 6,000 rows of data.
I used executemany(), as it was faster than a plain for loop for the delete statement, but I am finding it hard to find an alternative for checking the values in the dataframe.
sname = input("Enter name: ")
cursor = mydb.cursor(prepared=True)
column = df["SKU"]
data = [(sname, x) for x in column]
query="""DELETE FROM price_calculations1 WHERE Name=%s AND SKU=%s"""
cursor.executemany(query,data)
mydb.commit()
cursor.close()
Is there a more efficient code for achieving the same?
You could first run a SELECT id FROM price_calculations1 WHERE Name=%s AND SKU=%s
and then use a MySQL WHILE loop to delete those ids without the need for a cursor, which seems to be more performant.
See: https://www.mssqltips.com/sqlservertip/6148/sql-server-loop-through-table-rows-without-cursor/
A WHILE loop without the preceding SELECT might also work.
See: https://dev.mysql.com/doc/refman/8.0/en/while.html
Rather than looping, try to do all the work in a single call to the database (this guideline is often applicable when working with databases).
Given a list of name / sku pairs:
pairs = [(name1, sku1), (name2, sku2), ...]
create a query that identifies all the matching records and deletes them:
base_query = """DELETE FROM price_calculations1
WHERE id IN (
    SELECT id FROM (
        SELECT t2.id FROM price_calculations1 t2
        WHERE {}
    ) AS matches  -- extra derived-table level avoids MySQL error 1093 (selecting from the delete target)
)"""
# Build the WHERE clause criteria
criteria = " OR ".join(["(name = %s AND sku = %s)"] * len(pairs))
# Create the query
query = base_query.format(criteria)
# "Flatten" the value pairs
values = [i for j in pairs for i in j]
cursor.execute(query, values)
mydb.commit()  # commit() belongs to the connection, not the cursor
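For example, to build the pairs for the question's dataframe (reusing sname and df from above):
pairs = [(sname, sku) for sku in df["SKU"]]
For very large dataframes it may be worth splitting pairs into chunks of a few thousand and running the statement once per chunk, so the generated SQL stays well below MySQL's max_allowed_packet limit.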

SqlAlchemy Outer Join Only Returns One Table

So when I run
select * from table1 t1 left outer join table2 t2 on t1.id = t2.id;
in the sqlite3 terminal, I get the data back as I want and would expect.
However, when I run this in SqlAlchemy
TableOneModel.query.outerjoin(TableTwoModel,TableOneModel.id == TableTwoModel.id)
I only get table1 information back. I don't even get empty columns from table2. Am I missing something silly?
You're probably using Flask-SQLAlchemy, which provides the query property as a shortcut for selecting model entities. Your query is equivalent to
db.session.query(TableOneModel).\
    outerjoin(TableTwoModel, TableOneModel.id == TableTwoModel.id)
Either explicitly query for both entities:
db.session.query(TableOneModel, TableTwoModel).\
    outerjoin(TableTwoModel, TableOneModel.id == TableTwoModel.id)
or add the entity to your original query:
TableOneModel.query.\
    outerjoin(TableTwoModel, TableOneModel.id == TableTwoModel.id).\
    add_entity(TableTwoModel)
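Either way, each result row is then a (TableOneModel, TableTwoModel) pair, with None in the second position for left rows that have no match. A minimal usage sketch, assuming the model names from the question:
for one, two in TableOneModel.query.\
        outerjoin(TableTwoModel, TableOneModel.id == TableTwoModel.id).\
        add_entity(TableTwoModel):
    print(one.id, two.id if two is not None else None)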

Any faster way to do mysql update query in R? in python?

I tried to run this query:
update table1 A
set number = (select count(distinct(id)) from table2 B where B.col1 = A.col1 or B.col2 = A.col2);
but it takes forever because table1 has 1,100,000 rows and table2 has 350,000,000 rows.
Is there any faster way to do this query in R? or in python?
I rewrote your query to split the OR condition into two subqueries combined with UNION DISTINCT, so that each half can use an index on col1 or col2:
UPDATE table1 AS A
SET number = (
    SELECT COUNT(DISTINCT id)
    FROM (
        SELECT B.id AS id
        FROM table2 AS B
        WHERE B.col1 = A.col1   -- condition for col1
        UNION DISTINCT
        SELECT B.id AS id
        FROM table2 AS B
        WHERE B.col2 = A.col2   -- condition for col2
    ) AS matches
);
Note that the correlated references (A.col1, A.col2) inside the derived table require MySQL 8.0.14 or later.
My notes:
Updating all of the rows in table1 doesn't look like a good idea, because we have to touch 1.1M rows. Probably another data structure for storing number would give better performance
Try running the inner part of the query (the derived table in parentheses, with fixed values in place of A.col1 and A.col2) on its own, without the UPDATE, to check its speed
Take a look at EXPLAIN if you need a more general approach to optimizing SQL queries: https://dev.mysql.com/doc/refman/5.7/en/using-explain.html
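Following the last note, a quick way to inspect the plan from Python (the cursor setup is assumed to be as in the other answers here, and the literal values are placeholders):
cursor.execute("""
    EXPLAIN SELECT COUNT(DISTINCT id)
    FROM table2 B
    WHERE B.col1 = %s OR B.col2 = %s
""", ("some_col1_value", "some_col2_value"))
for row in cursor.fetchall():
    print(row)  # watch for type ALL (full table scan) on table2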

python mysqldb query with where

I use MySQLdb to query some data from the database. When using LIKE in SQL, I am confused about the SQL statement.
Since I use LIKE, I construct the SQL below, which gets the correct result:
cur.execute("SELECT a FROM table WHERE b like %s limit 0,10", ("%"+"ccc"+"%",))
Now I want to make column b a variable, as below, but it returns no results:
cur.execute("SELECT a FROM table WHERE %s like %s limit 0,10", ("b", "%"+"ccc"+"%"))
I searched many websites but couldn't find an answer. I am a bit lost.
In the db-api, parameters are for values only, not for columns or other parts of the query. You'll need to insert that using normal string substitution.
column = 'b'
query = "SELECT a FROM table WHERE {} like %s limit 0,10".format(column)
cur.execute(query, ("%"+"ccc"+"%",))
You could make this a bit nicer by using format for the parameter too:
cur.execute(query, ("%{}%".format("ccc"),))
The reason that the second query does not work is that the query that results from the substitution in the parameterised query looks like this:
select a from table where 'b' like '%ccc%' limit 0,10
'b' does not refer to a column, but to the literal string 'b'. If you instead passed the string abcccba into the query, you'd get a query that selects all rows:
cur.execute("SELECT a FROM table WHERE %s like %s limit 0,10", ("abcccba", "%"+"ccc"+"%"))
generates query:
SELECT a FROM table WHERE 'abcccba' like '%ccc%' limit 0,10
From this you should now be able to see why the second query returns an empty result set: the string b is not like %ccc%, so no rows will be returned.
Therefore you cannot set table or column names using parameterised queries; you must use normal Python string substitution:
cur.execute("SELECT a FROM table WHERE {} like %s limit 0,10".format('b'), ("%"+"ccc"+"%",))
which will generate and execute the query:
SELECT a FROM table WHERE b like '%ccc%' limit 0,10
You probably need to rewrite your variable substitution from
cur.execute("SELECT a FROM table WHERE b like %s limit 0,10", ("%"+"ccc"+"%"))
to
cur.execute("SELECT a FROM table WHERE b like %s limit 0,10", ("%"+"ccc"+"%",))
Note the trailing comma: it makes the parenthesised expression a one-element tuple rather than a plain string, which is what the parameter argument of execute() must be. In this example the string concatenation isn't even necessary; the same can be written as:
cur.execute("SELECT a FROM table WHERE b like %s limit 0,10", ("%ccc%",))

sqlite SQL query for unprocessed rows

I'm not quite even sure where / what to search for - so apologies if this is a trivial thing that has been asked before!
I have two tables in sqlite:
table_A = [id, value1, value2]
table_A$foo = [id, foo(value1), foo(value2)]
table_A$bar = [id, bar(value1), bar(value2)]
Where foo() / bar() are arbitrary functions not really relevant here
Now at the moment, I do:
select * from table_A
And use this cursor to compute all the rows for each of the derived tables.
If something goes wrong (or I add new rows to table_A), I'd like a way to compute (within SQL, rather than in Python) which rows are already present in table_A$foo etc., and so select only the remaining rows (so, like an AND NOT) to compute foo() and bar(). I should be able to do this on the id column, as the ids remain the same.
I'm wondering if there is a way to do this in SQLite, which I imagine would be quicker than trying to rig this up in Python.
Many thanks!
It isn't clear whether you consider a match to be based on the value1 columns matching or on a combination of all three columns, so the examples below compare all three.
Using EXISTS to find the rows that are already present:
SELECT *
FROM TABLE_A a
WHERE EXISTS (SELECT NULL
              FROM TABLE_A$foo f
              WHERE a.id = f.id
                AND a.value1 = f.value1
                AND a.value2 = f.value2)
Using NOT EXISTS to find the rows that are not yet present:
SELECT *
FROM TABLE_A a
WHERE NOT EXISTS (SELECT NULL
                  FROM TABLE_A$foo f
                  WHERE a.id = f.id
                    AND a.value1 = f.value1
                    AND a.value2 = f.value2)
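To process only the missing rows from Python, the NOT EXISTS query can drive the computation directly. A minimal sqlite3 sketch (foo() stands in for the real function, the database file name is a placeholder, and matching on id alone follows the question's remark that ids remain the same):
import sqlite3

def foo(x):
    return x  # placeholder for the real computation

conn = sqlite3.connect("data.db")
cur = conn.cursor()
rows = cur.execute("""
    SELECT a.id, a.value1, a.value2
    FROM table_A a
    WHERE NOT EXISTS (SELECT NULL
                      FROM "table_A$foo" f
                      WHERE a.id = f.id)
""").fetchall()
cur.executemany('INSERT INTO "table_A$foo" VALUES (?, ?, ?)',
                [(i, foo(v1), foo(v2)) for i, v1, v2 in rows])
conn.commit()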
