Loop over all tables in a MySQL database - Python

I am new to MySQL and I need some help, please. I am using MySQL Connector to write scripts.
I have a database containing 7K tables, and I am trying to select some values from some of these tables:
cursor.execute( "SELECT SUM(VOLUME) FROM stat_20030103 WHERE company ='Apple'")
for (Volume,) in cursor:
print(Volume)
This works for a single table, e.g. stats_20030103. However, I want to sum the volume across all tables whose names start with stats_2016, where the company name is Apple. How can I loop over my tables?

I'm not an expert in MySQL, but here is something quick and simple in Python:
# Get all the tables starting with "stats_2016" and store their names
cursor.execute("SHOW TABLES LIKE 'stats_2016%'")
tables = [v for (v,) in cursor]
# Iterate over all tables and collect each table's volume sum
all_volumes = list()
for t in tables:
    cursor.execute("SELECT SUM(VOLUME) FROM %s WHERE company = 'Apple'" % t)
    # fetchone() returns the single SUM() row; the value is None if no rows matched, hence "or 0"
    all_volumes.append(cursor.fetchone()[0] or 0)
# Print the sum of all volumes
print(sum(all_volumes))

You can probably use select * from information_schema.tables to get all the table names for your query.
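For example, a minimal sketch of that approach (assuming the schema is called mydb and the connection/cursor already exist; adjust the names to your setup):
# Fetch the matching table names from information_schema instead of SHOW TABLES
cursor.execute(
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = 'mydb' AND table_name LIKE 'stats_2016%'")
tables = [name for (name,) in cursor]

total = 0
for t in tables:
    # Table names cannot be bound as query parameters, hence the string formatting
    cursor.execute("SELECT COALESCE(SUM(VOLUME), 0) FROM %s WHERE company = 'Apple'" % t)
    total += cursor.fetchone()[0]
print(total)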

I'd try to left-join.
SELECT tables.*, stat.company, SUM(stat.volume) AS volume
FROM information_schema.tables AS tables LEFT JOIN mydb.stat_20030103 AS stat
WHERE tables.table_schema = "mydb" GROUP BY stat.company;
This will give you all results at once. Maybe MySQL doesn't support joining from metatables, in which case you could select them into a temporary table first.
CREATE TEMPORARY TABLE mydb.tables SELECT table_name FROM information_schema.tables WHERE table_schema = "mydb"
See the MySQL doc on information_schema.tables.

Related

Python MySql.Connector fetchall() is not, in fact, fetching all rows

This question has been asked a dozen times on this site with no real answer.
I use mysql.connector all the time for work, but recently I've discovered that it does not consistently return all results.
sql = ("""SELECT cp.location, cpt.created_ts, cpt.amount FROM
customer_plan_transactions cpt
JOIN customer_plans cp on cp.id = cpt.customer_plan_id
WHERE cpt.created_ts like "2022-09%" """)
cursor = my_db.cursor()
cursor.execute(sql)
rows = cursor.fetchall()
print(len(rows))
4395
Though, if I run this query through phpMyAdmin, the row count is much larger.
Any ideas? Is there another library I should be using for MySQL?
Edit: It must be a bug with mysql.connector. If I simply re-order the fields in the SELECT statement, I suddenly get all the rows I am expecting.
sql = ("""SELECT cpt.created_ts, cp.location, cpt.amount FROM customer_plan_transactions cpt
JOIN customer_plans cp on cp.id = cpt.customer_plan_id
WHERE cpt.created_ts between "2022-09-01" and "2022-10-01" """)
cursor = jax_uc.cursor()
cursor.execute(sql)
rows = cursor.fetchall()
print(len(rows))
63140
So, it's a bug, right?
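One thing worth ruling out before calling it a bug: the two queries above also differ in their WHERE clauses (LIKE "2022-09%" vs. BETWEEN two dates), which do not necessarily match the same rows. Beyond that, mysql.connector's default cursor is unbuffered; a buffered cursor materializes the whole result set on execute(), which is a quick sanity check when row counts look short. A small sketch, assuming the same connection and SQL as above:
# Try a buffered cursor so the full result set is fetched up front
cursor = my_db.cursor(buffered=True)
cursor.execute(sql)
rows = cursor.fetchall()
print(len(rows))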

Separating Destination tables and Source tables from a query

I have a lot of queries from which I need to separate the INSERT INTO (destination) table names as well as the FROM (source) table names.
Do you have any idea, or Python code, to separate them? I have a really large list of Oracle stored procedures.
Doing it manually is really time consuming. If someone has any clue for this, it would be highly appreciated.
I only need to separate the destination tables and the source tables.
Below is a sample query to work on:
Create or replace procedure sfa.dlm_upload
BEGIN
INSERT INTO SFL.DFV_ALERT_INT
SELECT A.PROFILE_ID, A.AGENT_NAME, B.CONTACTR SRC_MSD,
C.PROFILE_ID, B.DSR_PROFILE_ID
FROM EDW_TRD.RETAILER_SO1_DATA A, SFL.SFL_AGENT_DTL B,
SFL.SFL_AGENT_DTL_TEMP C
WHERE A.PROFILE_ID = B.PROFILE_ID
AND B.DSR_ID = C.AGENT_ID
AND C.AGENT_STATUS = 'Active'
AND MONTH_KEY = (SELECT MAX(MONTH_KEY) FROM
EDW_TRD.RETAILER_SO1_DATAMART)
;
INSERT OVERWRITE INTO SFL.MLV_ALERT_INTER
SELECT PROFILE_ID, TRUNC(PROFILE_CREATED_DATE) DATE_,
COUNT(DISTINCT CONTRACT_ID)
FROM
(SELECT PROFILE_ID,PROFILE_CREATED_DATE, CONTRACT_ID
FROM MDW.RTV_PRE_CHANE_SALES
WHERE TRUNC(PROFILE_DATE,'MM') >=
ADD_MONTHS(TRUNC(SYSDATE,'MM'),-2)
UNION ALL
SELECT TO_NUMBER(PMS_ID), PROFILE_CREATED_DATE, CONTRACT_ID
FROM MDW.MTV_POST_CHAN_SALES
WHERE TRUNC(PROFILE_CREATED_DATE,'MM') >=
ADD_MONTHS(TRUNC(SYSDATE,'MM'),-2))
GROUP BY PROFILE_ID, TRUNC(PROFILE_CREATED_DATE);
END;
OUTPUT -
Destination tables
SFL.DFV_ALERT_INT
SFL.MLV_ALERT_INTER
Source tables
EDW_TRD.RETAILER_SO1_DATA
SFL.SFL_AGENT_DTL
SFL.SFL_AGENT_DTL_TEMP
MDW.RTV_PRE_CHANE_SALES
MDW.MTV_POST_CHAN_SALES
Can anyone help me on this?
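As a starting point, here is a rough regex-based sketch in Python (not a full SQL parser: it assumes the statements follow the INSERT [OVERWRITE] INTO ... / FROM ... patterns shown above, and it will also pick up tables referenced inside subqueries; a dedicated parser such as sqlparse would be more robust):
import re

# "INSERT [OVERWRITE] INTO <table>" marks a destination table
INSERT_RE = re.compile(r'INSERT\s+(?:OVERWRITE\s+)?INTO\s+([\w.]+)', re.IGNORECASE)
# A FROM/JOIN list such as "FROM tab1 a, tab2 b, tab3 c" marks source tables
FROM_RE = re.compile(r'\b(?:FROM|JOIN)\s+((?:[\w.]+(?:\s+\w+)?\s*,\s*)*[\w.]+)', re.IGNORECASE)

def split_tables(sql_text):
    destinations = INSERT_RE.findall(sql_text)
    sources = []
    for from_list in FROM_RE.findall(sql_text):
        for item in from_list.split(','):
            sources.append(item.strip().split()[0])  # drop the alias, keep the table name
    # De-duplicate, preserve order, and drop anything that is already a destination
    sources = [t for t in dict.fromkeys(sources) if t not in destinations]
    return destinations, sources

# Hypothetical usage: read the procedure text from a file and print both lists
with open('dlm_upload.sql') as f:
    destinations, sources = split_tables(f.read())
print('Destination tables:', destinations)
print('Source tables:', sources)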

Update multiple rows of SQL table from Python script

I have a massive table (over 100B records) to which I added an empty column. I parse strings from another (string) field, extract an integer from it when the required string is available, and want to write that integer to the new column for all rows that have the string.
At the moment, after the data has been parsed and saved locally in a dataframe, I iterate over it and update the Redshift table row by row with the clean data. This takes approximately 1 second per iteration, which is way too long.
My current code example:
conn = psycopg2.connect(connection_details)
cur = conn.cursor()
clean_df = raw_data.apply(clean_field_to_parse)
for ind, row in clean_df.iterrows():
    update_query = build_update_query(row.id, row.clean_integer1, row.clean_integer2)
    cur.execute(update_query)
where build_update_query is a function that generates the update query:
def build_update_query(id, int1, int2):
    query = """
        update tab_tab
        set
            clean_int_1 = {}::int,
            clean_int_2 = {}::int,
            updated_date = GETDATE()
        where id = {}
        ;
    """
    return query.format(int1, int2, id)
and where clean_df is structured like:
id   field_to_parse      clean_int_1   clean_int_2
1    {'int_1': '2+1'}    3             np.nan
2    {'int_2': '7-0'}    np.nan        7
Is there a way to update specific table fields in bulk, so that there is no need to execute one query at a time?
I'm parsing the strings and running the update statement from Python. The database is stored on Redshift.
As mentioned, consider pure SQL and avoid iterating through billions of rows: push the Pandas data frame to the database as a staging table and then run one single UPDATE across both tables. With SQLAlchemy you can use DataFrame.to_sql to create a table replica of the data frame. You can even add an index on the join field, id, and drop the large staging table at the end.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://myuser:mypwd@myhost/mydatabase")

# PUSH TO THE DATABASE (TABLE GETS THE SAME NAME AS THE DATA FRAME)
clean_df.to_sql(name="clean_df", con=engine, if_exists="replace", index=False)

# SQL UPDATE (USING A TRANSACTION)
with engine.begin() as conn:
    # Index the join column (plain Postgres only: Redshift does not support CREATE INDEX)
    conn.execute(text("CREATE INDEX idx_clean_df_id ON clean_df(id)"))

    sql = """UPDATE tab_tab t
             SET clean_int_1 = c.clean_int_1,
                 clean_int_2 = c.clean_int_2,
                 updated_date = GETDATE()
             FROM clean_df c
             WHERE c.id = t.id
          """
    conn.execute(text(sql))

    # Drop the staging table when done
    conn.execute(text("DROP TABLE IF EXISTS clean_df"))

engine.dispose()

sqlite3 index table in python

I have created this table in Python 2.7. I use it to store unique name/value pairs. In some queries I search by name and in others by value; let's say the SELECT queries are split 50-50. Is there any way to create the table with two indexes (one index on names and another on values) so my program can look up the data faster?
Here is the database and table creation:
import sqlite3
#-------------------------db creation ---------------------------------------#
db1 = sqlite3.connect('/my_db.db')
cursor = db1.cursor()
cursor.execute("DROP TABLE IF EXISTS my_table")
sql = '''CREATE TABLE my_table (
name TEXT DEFAULT NULL,
value INT
);'''
cursor.execute(sql)
sql = ("CREATE INDEX index_my_table ON my_table (name);")
cursor.execute(sql)
Or is there any other structure that would make value lookups faster?
You can create another index...
sql = ("CREATE INDEX index_my_table2 ON my_table (value);")
cursor.execute(sql)
I think the best way to speed up the searches is to create an index on the two fields, like:
sql = ("CREATE INDEX index_my_table ON my_table (name, value)")
These are multi-column indexes (or covering indexes); see the (great) doc here: https://www.sqlite.org/queryplanner.html
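Whichever indexes you create, a quick way to confirm that SQLite actually uses them is EXPLAIN QUERY PLAN. A small sketch against the table above (the literal lookup values are just placeholders):
# Show which index (if any) SQLite picks for each access pattern
for query, params in [("SELECT value FROM my_table WHERE name = ?", ("some_name",)),
                      ("SELECT name FROM my_table WHERE value = ?", (42,))]:
    cursor.execute("EXPLAIN QUERY PLAN " + query, params)
    print(cursor.fetchall())  # e.g. SEARCH my_table USING INDEX index_my_table (...)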

MySQL - Match two tables contains HUGE DATA and find the similar data

I have two tables in MySQL.
Table 1 contains a lot of data, but Table 2 contains a huge amount of data.
Here's the code I implemented using Python:
import MySQLdb

db = MySQLdb.connect(host="localhost", user="root", passwd="", db="fak")
cursor = db.cursor()

# Execute SQL statement:
cursor.execute("SELECT invention_title FROM auip_wipo_sample WHERE invention_title IN (SELECT invention_title FROM us_pat_2005_to_2012)")

# Get the result set as a tuple:
result = cursor.fetchall()

# Iterate through the results and print them:
for record in result:
    print record
print "Finish."

# Finish dealing with the database and close it
db.commit()
db.close()
However, it takes too long. I have run the Python script for an hour, and it still hasn't given me any results.
Please help me.
Do you have an index on invention_title in both tables? If not, create one:
ALTER TABLE auip_wipo_sample ADD KEY (`invention_title`);
ALTER TABLE us_pat_2005_to_2012 ADD KEY (`invention_title`);
Then rewrite your query as a single one that doesn't use a subquery:
SELECT invention_title FROM auip_wipo_sample
INNER JOIN us_pat_2005_to_2012 ON auip_wipo_sample.invention_title = us_pat_2005_to_2012.invention_title
And let me know about your results.
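Additionally, if the matched result set is itself very large, fetchall() will hold it all in memory at once; MySQLdb's server-side cursor (SSCursor) streams rows instead. A sketch combining it with the join above (same connection details as in the question):
import MySQLdb
import MySQLdb.cursors

db = MySQLdb.connect(host="localhost", user="root", passwd="", db="fak",
                     cursorclass=MySQLdb.cursors.SSCursor)
cursor = db.cursor()
cursor.execute("""SELECT a.invention_title
                  FROM auip_wipo_sample a
                  INNER JOIN us_pat_2005_to_2012 u
                      ON a.invention_title = u.invention_title""")
for record in cursor:  # rows are streamed one at a time
    print record
db.close()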
