Separating Destination tables and Source tables from a query - python

I have a lot of queries from which I need to separate the INSERT INTO (destination) table names as well as the FROM (source) table names.
Do you have any idea, or Python code, to separate them? I have a really large list of Oracle stored procedures, and doing this manually is very time consuming. Any clue would be highly appreciated.
I only need to separate the destination tables and the source tables.
Below is a sample procedure to work on:
Create or replace procedure sfa.dlm_upload IS
BEGIN
INSERT INTO SFL.DFV_ALERT_INT
SELECT A.PROFILE_ID, A.AGENT_NAME, B.CONTACTR SRC_MSD,
C.PROFILE_ID, B.DSR_PROFILE_ID
FROM EDW_TRD.RETAILER_SO1_DATA A, SFL.SFL_AGENT_DTL B,
SFL.SFL_AGENT_DTL_TEMP C
WHERE A.PROFILE_ID = B.PROFILE_ID
AND B.DSR_ID = C.AGENT_ID
AND C.AGENT_STATUS = 'Active'
AND MONTH_KEY = (SELECT MAX(MONTH_KEY) FROM
EDW_TRD.RETAILER_SO1_DATAMART)
;
INSERT OVERWRITE INTO SFL.MLV_ALERT_INTER
SELECT PROFILE_ID, TRUNC(PROFILE_CREATED_DATE) DATE_,
COUNT(DISTINCT CONTRACT_ID)
FROM
(SELECT PROFILE_ID,PROFILE_CREATED_DATE, CONTRACT_ID
FROM MDW.RTV_PRE_CHANE_SALES
WHERE TRUNC(PROFILE_DATE,'MM') >=
ADD_MONTHS(TRUNC(SYSDATE,'MM'),-2)
UNION ALL
SELECT TO_NUMBER(PMS_ID), PROFILE_CREATED_DATE, CONTRACT_ID
FROM MDW.MTV_POST_CHAN_SALES
WHERE TRUNC(PROFILE_CREATED_DATE,'MM') >=
ADD_MONTHS(TRUNC(SYSDATE,'MM'),-2))
GROUP BY PROFILE_ID, TRUNC(PROFILE_CREATED_DATE);
END;
Expected output:
Destination tables
SFL.DFV_ALERT_INT
SFL.MLV_ALERT_INTER
Source tables
EDW_TRD.RETAILER_SO1_DATA
SFL.SFL_AGENT_DTL
SFL.SFL_AGENT_DTL_TEMP
MDW.RTV_PRE_CHANE_SALES
MDW.MTV_POST_CHAN_SALES
Can anyone help me with this?
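Not a full answer, but here is a rough regex-based sketch of the kind of Python script that could do this separation. The file name, the pattern details, and the simplification that only schema-qualified names count as source tables are my own assumptions, and real Oracle procedures may need a proper SQL parser (e.g. the sqlparse library) for trickier cases:

import re

def split_tables(sql):
    """Return (destination_tables, source_tables) found in one block of SQL."""
    # Strip string literals and line comments so they cannot produce false matches.
    sql = re.sub(r"'[^']*'", "''", sql)
    sql = re.sub(r"--.*", "", sql)

    # Destination tables: whatever follows INSERT [OVERWRITE] INTO.
    destinations = {
        m.group(1).upper()
        for m in re.finditer(r"INSERT\s+(?:OVERWRITE\s+)?INTO\s+([\w$#.]+)", sql, re.I)
    }

    # Source tables: names after FROM or JOIN, including old-style Oracle
    # comma-separated "table alias" lists; subqueries are skipped here and
    # their inner FROM clauses are picked up on their own.
    sources = set()
    clause = r"\b(?:FROM|JOIN)\s+([^()]+?)(?=\bWHERE\b|\bJOIN\b|\bGROUP\b|\bUNION\b|\)|;|$)"
    for m in re.finditer(clause, sql, re.I):
        for part in m.group(1).split(","):
            tokens = part.split()
            if tokens and "." in tokens[0]:   # keep only schema-qualified names
                sources.add(tokens[0].upper())
    return destinations, sources - destinations

# "procedure.sql" is a placeholder file name holding one stored procedure.
with open("procedure.sql") as f:
    dst, src = split_tables(f.read())
print("Destination tables:", sorted(dst))
print("Source tables:", sorted(src))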

Related

With pyodbc & SQL Server, how to insert multiple foreign keys in a table

I am able to insert a foreign key in a SQL table. However, after doing the same thing for 3 other tables, I will have to insert those 4 FK's in my fact table. I am asking now to know in advance if this is the way to go, database-model-wise.
Code to skip duplicate rows, insert columns and a FK RegionID:
cursor.execute("""
    IF NOT EXISTS (
        SELECT #address1client, #address2client, #cityClient
        INTERSECT
        SELECT address1client, address2client, cityClient
        FROM dbo.AddressClient)
    BEGIN
        INSERT INTO dbo.AddressClient (address1client, address2client, cityClient, RegionID)
        SELECT #address1client, #address2client, #cityClient, RegionID
        FROM dbo.Region
        WHERE province=#province AND country=#country
    END""")
My questions are:
1- Does a BEGIN ... END statement execute all at once? If the answer is yes, would the code below work? I ask because at no point can there be FK_ID columns with null values.
...
BEGIN
    INSERT INTO dbo.Fact (product, saleTotal, saleDate) VALUES (#product, #saleTotal, #saleDate)
    INSERT INTO dbo.Fact (ClientAddressID)
    SELECT ClientAddressID
    FROM dbo.ClientAddress
    WHERE address1c=#address1c AND address2c=#address2c AND cityC=#cityC
    INSERT INTO dbo.Fact (SupplierAddressID)
    SELECT SupplierAddressID
    FROM dbo.SupplierAddress
    WHERE address1s=#address1s AND address2s=#address2s AND cityS=#cityS
    INSERT INTO dbo.Fact (DetailID)
    SELECT DetailID
    FROM dbo.Detail
    WHERE categoryNum=#categoryNum AND type=#type AND nature=#nature
END""")
2- If a BEGIN ... END statement doesn't execute all at once, how do I go about inserting multiple FK's in a table?
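Not an answer from the original thread, but one way to sidestep the question of whether the block runs atomically is to insert the fact row and all of its foreign keys in a single INSERT ... SELECT, so no row ever exists with NULL FK columns. This is only a sketch reusing the table and column names from the question, with ? parameters in place of the # markers; the connection string and the sample values are assumptions:

import pyodbc

# Placeholder connection string; adjust to your SQL Server instance.
conn = pyodbc.connect("DSN=mydsn;UID=user;PWD=password")
cursor = conn.cursor()

# Sample values standing in for the question's #-prefixed parameters.
product, saleTotal, saleDate = "Widget", 99.90, "2020-01-15"
address1c, address2c, cityC = "1 Main St", "", "Montreal"
address1s, address2s, cityS = "2 Dock Rd", "", "Laval"
categoryNum, type_, nature = 10, "A", "retail"

# All three FK lookups are scalar subqueries inside one INSERT, so the fact
# row is created with every FK column already populated (each subquery must
# match exactly one row; zero rows gives NULL, more than one raises an error).
cursor.execute("""
    INSERT INTO dbo.Fact (product, saleTotal, saleDate,
                          ClientAddressID, SupplierAddressID, DetailID)
    SELECT ?, ?, ?,
        (SELECT ClientAddressID FROM dbo.ClientAddress
         WHERE address1c = ? AND address2c = ? AND cityC = ?),
        (SELECT SupplierAddressID FROM dbo.SupplierAddress
         WHERE address1s = ? AND address2s = ? AND cityS = ?),
        (SELECT DetailID FROM dbo.Detail
         WHERE categoryNum = ? AND type = ? AND nature = ?)
""", (product, saleTotal, saleDate,
      address1c, address2c, cityC,
      address1s, address2s, cityS,
      categoryNum, type_, nature))
conn.commit()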

Python Sqlite: how to select non-existing records(rows) based on a column?

Hope everyone's doing well.
Database:
Value Date
---------------------------------
3000 2019-12-15
6000 2019-12-17
What I hope to return:
"Data:3000 on 2019-12-15"
"NO data on 2019-12-16" (non-existing column based on Date)
"Data:6000 on 2019-12-17"
I don't know how to detect non-existing records (rows) based on a column.
Possible boilerplate code:
import sqlite3

db = sqlite3.connect("Database1.db")
cursor = db.cursor()
cursor.execute("""
    SELECT * FROM Table1
    WHERE Date >= '2019-12-15' AND Date <= '2019-12-17'
""")
entry = cursor.fetchall()
for i in entry:
    if i is None:
        print("No entry found:", i)
    else:
        print("Entry found")
db.close()
Any help is much appreciated!
The general way you might handle this problem uses something called a calendar table, which is just a table containing all dates you want to see in your report. Consider the following query:
SELECT
    d.dt,
    t.Value
FROM
(
    SELECT '2019-12-15' AS dt UNION ALL
    SELECT '2019-12-16' UNION ALL
    SELECT '2019-12-17'
) d
LEFT JOIN yourTable t
    ON d.dt = t.Date
ORDER BY
    d.dt;
In practice, if you had a long term need to do this and/or had a large number of dates to cover, you might setup a bona-fide calendar table in your SQLite database for this purpose. The above query is only intended to be a proof-of-concept.
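A rough sketch of how that query could be wired into the question's sqlite3 boilerplate to print the requested lines; the database and table names are taken from the question, and the hard-coded date list stands in for a real calendar table:

import sqlite3

db = sqlite3.connect("Database1.db")
cursor = db.cursor()
cursor.execute("""
    SELECT d.dt, t.Value
    FROM (
        SELECT '2019-12-15' AS dt UNION ALL
        SELECT '2019-12-16' UNION ALL
        SELECT '2019-12-17'
    ) d
    LEFT JOIN Table1 t ON d.dt = t.Date
    ORDER BY d.dt
""")
for dt, value in cursor.fetchall():
    if value is None:
        # The LEFT JOIN produced no match, so there is no row for this date.
        print("NO data on", dt)
    else:
        print("Data:{} on {}".format(value, dt))
db.close()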

Postgresql: Insert from huge csv file, collect the ids and respect unique constraints

In a postgresql database:
class Persons(models.Model):
person_name = models.CharField(max_length=10, unique=True)
The persons.csv file, contains 1 million names.
$cat persons.csv
Name-1
Name-2
...
Name-1000000
I want to:
Create the names that do not already exist
Query the database and fetch the id for each name contained in the csv file.
My approach:
Use the COPY command or the django-postgres-copy application that implements it.
Also take advantage of the new Postgresql-9.5+ upsert feature.
Now, all the names in the csv file, are also in the database.
I need to get their ids from the database, either in memory or in another csv file, in an efficient way:
Use Q objects
list_of_million_q = <iterate csv and append Qs>
million_names = Names.objects.filter(list_of_million_q)
or
Use __in to filter based on a list of names:
list_of_million_names = <iterate csv and append strings>
million_names = Names.objects.filter(
person_name__in=[list_of_million_names]
)
or
?
I do not feel that any of the above approaches for fetching the ids is efficient.
Update
There is a third option, along the lines of this post, which should be a great solution combining all of the above.
Something like:
SELECT * FROM persons;
Make a name: id dictionary out of the names received from the database:
db_dict = {'Harry': 1, 'Bob': 2, ...}
Query the dictionary:
ids = []
for name in list_of_million_names:
    if name in db_dict:
        ids.append(db_dict[name])
This way you're using the quick dictionary indexing as opposed to the slower if x in list approach.
But the only way to really know for sure is to benchmark these 3 approaches.
This post describes how to use RETURNING with ON CONFLICT so that, while the contents of the csv file are being inserted into the database, the ids are saved in another table, either when the insertion succeeds or when, due to the unique constraint, it is skipped.
I have tested it in sqlfiddle, using a setup that resembles the one used for the COPY command, which inserts into the database straight from a csv file while respecting the unique constraints.
The schema:
CREATE TABLE IF NOT EXISTS label (
    id serial PRIMARY KEY,
    label_name varchar(200) NOT NULL UNIQUE
);
INSERT INTO label (label_name) VALUES
    ('Name-1'),
    ('Name-2');
CREATE TABLE IF NOT EXISTS ids (
    id serial PRIMARY KEY,
    label_ids varchar(12) NOT NULL
);
The script:
CREATE TEMP TABLE tmp_table
    (LIKE label INCLUDING DEFAULTS)
    ON COMMIT DROP;

INSERT INTO tmp_table (label_name) VALUES
    ('Name-2'),
    ('Name-3');

WITH ins AS (
    INSERT INTO label
    SELECT *
    FROM tmp_table
    ON CONFLICT (label_name) DO NOTHING
    RETURNING id
)
INSERT INTO ids (label_ids)
SELECT id FROM ins
UNION ALL
SELECT l.id
FROM tmp_table
JOIN label l USING (label_name);
The output:
SELECT * FROM ids;
SELECT * FROM label;
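For reference, a minimal sketch of how the whole flow could be driven from Python with psycopg2, assuming the label and ids tables above already exist; the connection string, the persons.csv path, and the use of copy_expert are my assumptions, not part of the original answer:

import psycopg2

# Placeholder connection parameters; adjust to your environment.
conn = psycopg2.connect("dbname=mydb user=me")
cur = conn.cursor()

# Stage the csv into a temp table shaped like label (dropped on commit).
cur.execute("""
    CREATE TEMP TABLE tmp_table
        (LIKE label INCLUDING DEFAULTS)
        ON COMMIT DROP;
""")
with open("persons.csv") as f:
    # COPY the one-column csv of names straight into the staging table.
    cur.copy_expert("COPY tmp_table (label_name) FROM STDIN", f)

# Upsert into label; collect ids of both freshly inserted and pre-existing names.
cur.execute("""
    WITH ins AS (
        INSERT INTO label (label_name)
        SELECT label_name FROM tmp_table
        ON CONFLICT (label_name) DO NOTHING
        RETURNING id
    )
    INSERT INTO ids (label_ids)
    SELECT id FROM ins
    UNION ALL
    SELECT l.id
    FROM tmp_table
    JOIN label l USING (label_name);
""")
conn.commit()
conn.close()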

Trying to SELECT row by long list of composite primary keys in SQLite

This is my query, using code found while perusing this site:
query="""SELECT Family
FROM Table2
INNER JOIN Table1 ON Table1.idSequence=Table2.idSequence
WHERE (Table1.Chromosome, Table1.hg19_coordinate) IN ({seq})
""".format(seq=','.join(['?']*len(matchIds_list)))
matchIds_list is a list of tuples in (?,?) format.
It works if I just ask for one condition (i.e. just Table1.Chromosome as opposed to both Chromosome and hg19_coordinate) and matchIds_list is just a simple list of single values, but I don't know how to get it to work with a composite key of both columns.
Since you're running SQLite 3.7.17, I'd recommend just using a temporary table.
Create and populate your temporary table.
cursor.executescript("""
CREATE TEMP TABLE control_list (
Chromosome TEXT NOT NULL,
hg19_coordinate TEXT NOT NULL
);
CREATE INDEX control_list_idx ON control_list (Chromosome, hg19_coordinate);
""")
cursor.executemany("""
INSERT INTO control_list (Chromosome, hg19_coordinate)
VALUES (?, ?)
""", matchIds_list)
Just constrain your query to the control list temporary table.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
-- Constrain to control_list.
WHERE EXISTS (
SELECT *
FROM control_list
WHERE control_list.Chromosome = Table1.Chromosome
AND control_list.hg19_coordinate = Table1.hg19_coordinate
)
And finally perform your query (there's no need to format this one).
cursor.execute(query)
# Remove the temporary table since we're done with it.
cursor.execute("""
DROP TABLE control_list;
""")
Short Query (requires SQLite 3.15): You actually almost had it. You need to make the IN ({seq}) a subquery expression.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
WHERE (Table1.Chromosome, Table1.hg19_coordinate) IN (VALUES {seq});
Long Query (requires SQLite 3.8.3): It looks a little complicated, but it's pretty straightforward. Put your control list into a sub-select, and then constrain the main select by the control list.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
-- Constrain to control_list.
WHERE EXISTS (
SELECT *
FROM (
SELECT
-- Name the columns (must match order in tuples).
"" AS Chromosome,
":1" AS hg19_coordinate
FROM (
-- Get control list.
VALUES {seq}
) AS control_values
) AS control_list
-- Constrain Table1 to control_list.
WHERE control_list.Chromosome = Table1.Chromosome
AND control_list.hg19_coordinate = Table1.hg19_coordinate
)
Regardless of which query you use, when formatting the SQL, replace {seq} with (?,?) for each composite key instead of just ?.
query = " ... ".format(seq=','.join(['(?,?)']*len(matchIds_list)))
And finally flatten matchIds_list when you execute the query because it is a list of tuples.
import itertools
cursor.execute(query, list(itertools.chain.from_iterable(matchIds_list)))

loop over all tables in mysql databases

I am new to MySQL and I need some help, please. I am using MySQL Connector to write scripts.
I have a database containing 7K tables, and I am trying to select some values from some of these tables:
cursor.execute( "SELECT SUM(VOLUME) FROM stat_20030103 WHERE company ='Apple'")
for (Volume,) in cursor:
print(Volume)
This works for one table, e.g. stats_20030103. However, I want to sum the volume across all tables whose names start with stats_2016 where the company name is Apple. How can I loop over my tables?
I'm not an expert in MySQL, but here is something quick and simple in Python:
# Get all the tables starting with "stats_2016" and store them
cursor.execute("SHOW TABLES LIKE 'stats_2016%'")
tables = [v for (v,) in cursor]

# Iterate over all tables, storing the volume sums
all_volumes = list()
for t in tables:
    cursor.execute("SELECT SUM(VOLUME) FROM %s WHERE company = 'Apple'" % t)
    # Take the first column of the first row (the sum), or 0 if no rows were found
    all_volumes.append(cursor.fetchone()[0] or 0)

# Print the sum of all volumes
print(sum(all_volumes))
You can probably use SELECT * FROM information_schema.tables to get all the table names into your query.
I'd try to left-join.
SELECT tables.*, stat.company, SUM(stat.volume) AS volume
FROM information_schema.tables AS tables LEFT JOIN mydb.stat_20030103 AS stat
WHERE tables.table_schema = "mydb" GROUP BY stat.company;
This will give you all results at once. Maybe MySQL doesn't support joining from metatables, in which case you might select it into a temporary table.
CREATE TEMPORARY TABLE mydb.tables SELECT table_name FROM information_schema.tables WHERE table_schema = "mydb"
See the MySQL doc on information_schema.tables.
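A hedged sketch combining both answers: fetch the matching table names from information_schema and let the server do the summing in one UNION ALL query. The connection parameters and the mysql.connector usage are assumptions:

import mysql.connector

# Placeholder connection parameters; adjust to your environment.
conn = mysql.connector.connect(user="me", password="secret", database="mydb")
cursor = conn.cursor()

# Find all tables in this schema whose names start with "stats_2016".
cursor.execute("""
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema = 'mydb'
      AND table_name LIKE 'stats_2016%'
""")
tables = [name for (name,) in cursor.fetchall()]

if tables:
    # Build one UNION ALL query so the server sums everything in a single round trip.
    union = " UNION ALL ".join(
        "SELECT VOLUME FROM `%s` WHERE company = 'Apple'" % t for t in tables
    )
    cursor.execute("SELECT SUM(VOLUME) FROM (%s) AS all_stats" % union)
    print(cursor.fetchone()[0])
conn.close()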
