MySQL - Match two tables containing huge amounts of data and find the matching rows - Python

I have two tables in MySQL.
Table 1 contains a lot of data, and Table 2 contains far more.
Here's the code I implemented in Python:
import MySQLdb
db = MySQLdb.connect(host="localhost", user="root", passwd="", db="fak")
cursor = db.cursor()
# Execute SQL statement:
cursor.execute("SELECT invention_title FROM auip_wipo_sample WHERE invention_title IN (SELECT invention_title FROM us_pat_2005_to_2012)")
# Get the result set as a tuple:
result = cursor.fetchall()
# Iterate through results and print:
for record in result:
    print record
print "Finish."
# Finish dealing with the database and close it
db.commit()
db.close()
However, it takes far too long. I have run the Python script for 1 hour, and it still hasn't given me any results.
Please help me.

Do you have an index on invention_title in both tables? If not, create one:
ALTER TABLE auip_wipo_sample ADD KEY (`invention_title`);
ALTER TABLE us_pat_2005_to_2012 ADD KEY (`invention_title`);
Then rewrite your query as a single join that doesn't use a subquery:
SELECT auip_wipo_sample.invention_title FROM auip_wipo_sample
INNER JOIN us_pat_2005_to_2012 ON auip_wipo_sample.invention_title = us_pat_2005_to_2012.invention_title
And let me know about your results.
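For anyone who wants to try the index-plus-join idea without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. The table and column names come from the question; the sample rows and the index names are made up for illustration, and the index syntax is SQLite's rather than MySQL's.

```python
import sqlite3

# SQLite stands in for MySQL here so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Table names come from the question; the rows are made-up sample data.
cur.execute("CREATE TABLE auip_wipo_sample (invention_title TEXT)")
cur.execute("CREATE TABLE us_pat_2005_to_2012 (invention_title TEXT)")
cur.executemany(
    "INSERT INTO auip_wipo_sample VALUES (?)",
    [("Solar panel",), ("Bike lock",), ("Water filter",)],
)
cur.executemany(
    "INSERT INTO us_pat_2005_to_2012 VALUES (?)",
    [("Bike lock",), ("Water filter",), ("Water filter",), ("Drone",)],
)

# Index the join column in both tables (SQLite syntax; index names are hypothetical).
cur.execute("CREATE INDEX idx_wipo_title ON auip_wipo_sample (invention_title)")
cur.execute("CREATE INDEX idx_us_title ON us_pat_2005_to_2012 (invention_title)")

# DISTINCT stops a title from being repeated when the second table
# contains it more than once ("Water filter" appears twice above).
cur.execute(
    """
    SELECT DISTINCT a.invention_title
    FROM auip_wipo_sample AS a
    INNER JOIN us_pat_2005_to_2012 AS u
        ON a.invention_title = u.invention_title
    ORDER BY a.invention_title
    """
)
matches = [title for (title,) in cur.fetchall()]
print(matches)
conn.close()
```

Note that a plain INNER JOIN returns one row per matching pair, so without DISTINCT the result can contain more rows than the IN version. DISTINCT also collapses titles that are duplicated inside the first table, which may or may not be what you want.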

Related

Python MySql.Connector fetchall() is not, in fact, fetching all rows

This question has been asked a dozen times on this site with no real answer.
I use mysql.connector all the time for work, but recently I've discovered that it does not consistently return all results.
sql = ("""SELECT cp.location, cpt.created_ts, cpt.amount FROM
customer_plan_transactions cpt
JOIN customer_plans cp on cp.id = cpt.customer_plan_id
WHERE cpt.created_ts like "2022-09%" """)
cursor = my_db.cursor()
cursor.execute(sql)
rows = cursor.fetchall()
print(len(rows))
4395
Though, if I run this query through phpMyAdmin:
Any ideas? Is there another library I should be using for MySQL?
edit: It must be a bug with mysql.connector. If I simply re-order the fields in the select statement, I suddenly get all the rows I am expecting.
sql = ("""SELECT cpt.created_ts, cp.location, cpt.amount FROM customer_plan_transactions cpt
JOIN customer_plans cp on cp.id = cpt.customer_plan_id
WHERE cpt.created_ts between "2022-09-01" and "2022-10-01" """)
cursor = jax_uc.cursor()
cursor.execute(sql)
rows = cursor.fetchall()
print(len(rows))
63140
So, it's a bug, right?

MySQL 'SHOW TABLES' Returns count instead of list (Python)

I'm troubleshooting a script I am using to query the database. To make sure I had everything working right I stripped it down to a simple 'SHOW TABLES' query. The problem is that it is returning a count of the tables instead of the list of names it should return.
import pymysql
connection = pymysql.connect(host='10.0.0.208', user='admin', passwd='Passwrd')
cursor = connection.cursor()
sqlstring = 'SHOW TABLES;'
cursor.execute('USE CustDB')
x = cursor.execute(sqlstring)
print(x)
This is only returning '17'. What am I missing??
Per the documentation, execute() returns the number of affected rows:
Returns: Number of affected rows
To get the table names, run the query and then loop over the cursor:
cursor.execute('USE CustDB')
cursor.execute('SHOW TABLES')
tables = [c for c in cursor]
or use fetchall():
cursor.execute('USE CustDB')
cursor.execute('SHOW TABLES')
tables = cursor.fetchall()
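If you need this in more than one place, the pattern can be wrapped in a small helper. This is only a sketch: list_tables is a hypothetical name, and it assumes a default tuple-returning cursor (a dictionary cursor would need a different way to pull out the name).

```python
def list_tables(cursor):
    """Return the table names of the current database as a list of strings."""
    # Ignore the return value of execute(); the rows are read from the cursor.
    cursor.execute("SHOW TABLES")
    # Each row is a one-element tuple holding a table name.
    return [row[0] for row in cursor.fetchall()]


# Usage with the connection from the question (not run here):
# cursor.execute('USE CustDB')
# print(list_tables(cursor))
```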

loop over all tables in mysql databases

I am new to MySQL and I need some help, please. I am using MySQL Connector to write scripts.
I have a database containing 7K tables, and I am trying to select some values from some of these tables:
cursor.execute( "SELECT SUM(VOLUME) FROM stat_20030103 WHERE company ='Apple'")
for (Volume,) in cursor:
    print(Volume)
This works for one table, e.g. stats_20030103. However, I want to sum the volume across all tables whose names start with stats_2016 where the company name is Apple. How can I loop over my tables?
I'm not an expert in MySQL, but here is something quick and simple in Python:
# Get all the tables starting with "stats_2016" and store them
cursor.execute("SHOW TABLES LIKE 'stats_2016%'")
tables = [v for (v, ) in cursor]
# Iterate over all tables and collect each table's volume sum
all_volumes = list()
for t in tables:
    cursor.execute("SELECT SUM(VOLUME) FROM %s WHERE company = 'Apple'" % t)
    # The single result row holds the sum; use 0 if SUM() returned NULL (no matching rows)
    all_volumes.append(cursor.fetchone()[0] or 0)
# Print the sum of all volumes
print(sum(all_volumes))
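A possible variation on the loop above is to build one UNION ALL query, so the server does the summing in a single round trip. The sketch below only builds the SQL string. The function name, the table-name pattern (stats_ followed by an 8-digit date), and the two sample table names are assumptions for illustration. Table names cannot be passed as query parameters, so they are checked against the pattern before being placed in the string, while the company value stays a %s placeholder for the driver.

```python
import re

# Hypothetical pattern: "stats_" followed by an 8-digit date, e.g. stats_20160104.
TABLE_NAME = re.compile(r"stats_\d{8}")


def build_total_volume_query(tables):
    """Build one SELECT that sums VOLUME across all the given tables."""
    if not tables:
        raise ValueError("no tables given")
    for name in tables:
        # Table names cannot be sent as query parameters, so check them first.
        if not TABLE_NAME.fullmatch(name):
            raise ValueError("unexpected table name: %r" % name)
    # %%s leaves a literal %s in the SQL as the driver's placeholder.
    parts = ["SELECT VOLUME FROM `%s` WHERE company = %%s" % name for name in tables]
    return "SELECT SUM(VOLUME) FROM (" + " UNION ALL ".join(parts) + ") AS all_rows"


tables = ["stats_20160104", "stats_20160105"]
query = build_total_volume_query(tables)
print(query)

# With a live connection (not run here), pass one value per placeholder:
# cursor.execute(query, ("Apple",) * len(tables))
# total = cursor.fetchone()[0] or 0
```

For a very large number of tables the generated statement gets long, so the per-table loop may still be the more practical choice.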
You can probably use SELECT table_name FROM information_schema.tables to get all the table names into your query.
I'd try a left join:
SELECT tables.*, stat.company, SUM(stat.volume) AS volume
FROM information_schema.tables AS tables LEFT JOIN mydb.stat_20030103 AS stat
WHERE tables.table_schema = "mydb" GROUP BY stat.company;
The idea is to get all results at once. MySQL may not support joining against metadata tables, in which case you could select the names into a temporary table first:
CREATE TEMPORARY TABLE mydb.tables SELECT table_name FROM information_schema.tables WHERE table_schema = "mydb"
See the MySQL documentation on information_schema.tables.

Using instance variables in SQLite3 update?

Ok so basically I'm trying to update an existing SQLite3 Database with instance variables (typ and lvl)
#Set variables
typ = 'Test'
lvl = 6
#Print Database
print("\nHere's a listing of all the records in the table:\n")
for row in cursor.execute("SELECT rowid, * FROM fieldmap ORDER BY rowid"):
    print(row)
#Update Info
sql = """
UPDATE fieldmap
SET buildtype = typ, buildlevel = lvl
WHERE rowid = 11
"""
cursor.execute(sql)
#Print Database
print("\nHere's a listing of all the records in the table:\n")
for row in cursor.execute("SELECT rowid, * FROM fieldmap ORDER BY rowid"):
    print(row)
The error I'm getting is:
sqlite3.OperationalError: no such column: typ
I know the problem is that my variables are inserted with the wrong syntax, but I cannot for the life of me find the correct one. It works just fine with literal strings and ints, like this:
sql = """
UPDATE fieldmap
SET buildtype = 'house', buildlevel = 3
WHERE rowid = 11
"""
But as soon as I switch to the variables it throws the error.
Your query is not actually inserting the values of the variables typ and lvl into the query string. As written the query is trying to reference columns named typ and lvl, but these don't exist in the table.
Try writing it as a parameterised query:
sql = """
UPDATE fieldmap
SET buildtype = ?, buildlevel = ?
WHERE rowid = 11
"""
cursor.execute(sql, (typ, lvl))
The ? acts as a placeholder in the query string which is replaced by the values in the tuple passed to execute(). This is a secure way to construct the query and avoids SQL injection vulnerabilities.
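To see the placeholder approach run end to end, here is a small self-contained sketch using an in-memory database. The two-column fieldmap table and its single row are a hypothetical stand-in for the real table, and the row id is also passed as a parameter rather than hard-coded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical minimal version of the fieldmap table from the question.
cursor.execute("CREATE TABLE fieldmap (buildtype TEXT, buildlevel INTEGER)")
cursor.execute("INSERT INTO fieldmap VALUES ('house', 3)")

typ = "Test"
lvl = 6
row_id = 1  # the question uses rowid 11; this sample table has a single row

# All three values are bound by the driver rather than pasted into the string.
cursor.execute(
    "UPDATE fieldmap SET buildtype = ?, buildlevel = ? WHERE rowid = ?",
    (typ, lvl, row_id),
)
conn.commit()

updated = cursor.execute(
    "SELECT rowid, buildtype, buildlevel FROM fieldmap WHERE rowid = ?", (row_id,)
).fetchone()
print(updated)
conn.close()
```

The second argument to execute() must be a sequence, which is why a single value is written as (row_id,) with a trailing comma.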
I think you should use an ORM to work with a SQL database.
SQLAlchemy is your friend. I use it with SQLite, MySQL, and PostgreSQL. It is fantastic.
It helps you avoid this kind of syntax error, since commas and quotation marks are significant in SQL.
For a quick hard-coded fix, you may try this:
sql = """
UPDATE fieldmap
SET buildtype = '%s', buildlevel = %d
WHERE rowid = 11
""" % (typ, lvl)
This can solve your problem temporarily, but building the query with string formatting leaves it open to SQL injection, so it is not a solution for the long run. An ORM is your friend.
Hope this is helpful!

Put retrieved data from MySQL query into DataFrame pandas by a for loop

I have one database with two tables. Both have a column called barcode. The aim is to retrieve the barcodes from one table and look up the entries in the other, where extra information about each barcode is stored. I would like to save both sets of retrieved data in a DataFrame. The problem is that when I insert the data from the second query into the DataFrame, it stores only the last entry:
import mysql.connector
import pandas as pd
cnx = mysql.connector.connect(user=user, password=password, host=host, database=database)
query_barcode = ("SELECT barcode FROM barcode_store")
cursor = cnx.cursor()
cursor.execute(query_barcode)
data_barcode = cursor.fetchall()
Up to this point everything works smoothly. Here is the part with the problem:
query_info = ("SELECT product_code FROM product_info WHERE barcode=%s")
for each_barcode in data_barcode:
    cursor.execute(query_info % each_barcode)
    pro_info = pd.DataFrame(cursor.fetchall())
pro_info contains only the information for the last matching barcode, while I want to retrieve the information for every barcode in data_barcode.
That's because you are overwriting the existing pro_info with new data on each loop iteration. You should rather do something like:
query_info = ("SELECT product_code FROM product_info")
cursor.execute(query_info)
pro_info = pd.DataFrame(cursor.fetchall())
Making so many SELECTs is redundant since you can get all records in one SELECT and instantly insert them to your DataFrame.
#edit: However, if you need the WHERE clause to fetch only specific products, you need to collect the records in a list and build the DataFrame once after the loop. So your code will eventually look like:
pro_list = []
query_info = ("SELECT product_code FROM product_info WHERE barcode=%s")
for each_barcode in data_barcode:
    # each_barcode is a one-element tuple, so it can be passed as the query parameters
    cursor.execute(query_info, each_barcode)
    # keep every matching row, not just the first one
    pro_list.extend(cursor.fetchall())
pro_info = pd.DataFrame(pro_list)
Cheers!
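Another option, if both tables live in the same database, is to let the server match the barcodes with a single JOIN instead of running one SELECT per barcode. The sketch below uses the built-in sqlite3 module as a stand-in so it runs without a MySQL server. The table and column names follow the question, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Table and column names follow the question; the rows are invented.
cur.execute("CREATE TABLE barcode_store (barcode TEXT)")
cur.execute("CREATE TABLE product_info (barcode TEXT, product_code TEXT)")
cur.executemany("INSERT INTO barcode_store VALUES (?)", [("111",), ("222",)])
cur.executemany(
    "INSERT INTO product_info VALUES (?, ?)",
    [("111", "P-1"), ("222", "P-2"), ("333", "P-3")],
)

# One JOIN replaces the per-barcode SELECT loop.
cur.execute(
    """
    SELECT b.barcode, p.product_code
    FROM barcode_store AS b
    JOIN product_info AS p ON p.barcode = b.barcode
    ORDER BY b.barcode
    """
)
rows = cur.fetchall()
print(rows)
conn.close()

# With pandas installed, the rows can then be loaded in one step:
# pro_info = pd.DataFrame(rows, columns=["barcode", "product_code"])
```

The same SELECT ... JOIN statement should work unchanged through mysql.connector, since it uses no placeholders.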
