How to create/populate a SQLite table from a JOIN command? - python

I am trying to join two tables on a column and then populate a new table with the query results.
I know that the JOIN command gives me the table data I want, but how do I insert this data into a new table without looping through the results? There are many unique column names, so doing this without a SQLite command would require nested for loops and become computationally expensive (if it even works). Is there a way to do this with a single SQLite command?
Join command that works:
import sqlite3

connection = sqlite3.connect("database1.db")
c = connection.cursor()
c.execute("ATTACH DATABASE 'database1.db' AS db_1")
c.execute("ATTACH DATABASE 'database2.db' AS db_2")
c.execute("SELECT * FROM db_1.Table1Name AS a JOIN db_2.Table2Name AS b WHERE a.Column1 = b.Column2")
Attempted join-and-insert command that does not error but also does not populate the table:
c.execute("INSERT INTO 'NewTableName' SELECT * FROM db_1.Table1Name AS a JOIN db_2.Table2Name AS b WHERE a.Column1 = b.Column2")

The SQL part is:
CREATE TABLE new_table AS
SELECT expressions
FROM existing_tables
[WHERE conditions];
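Applied to the question's setup, a minimal sketch might look like the following. It assumes the joined tables do not share column names (SELECT * into a new table fails on duplicate column names); the table and column names are the question's placeholders. Note the commit at the end: a missing commit is a common reason an otherwise correct INSERT or CREATE appears to do nothing.

import sqlite3

connection = sqlite3.connect("database1.db")
c = connection.cursor()
c.execute("ATTACH DATABASE 'database2.db' AS db_2")
# Create and populate the new table in one statement from the join result
c.execute("""
    CREATE TABLE NewTableName AS
    SELECT *
    FROM Table1Name AS a
    JOIN db_2.Table2Name AS b
        ON a.Column1 = b.Column2
""")
connection.commit()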

Related

Deleting duplicate rows in "large" sqlite table takes too much time (Python)

I have a relatively small sqlite3 database (~2.6GB) with 820k rows and 26 columns (single table). I run an iterative process, and every time new data is generated it is placed in a pandas dataframe and inserted into the sqlite database with the function insert_values_to_table. This process operates fine and is very fast.
After every data insert, the database is sanitized of duplicate row listings (all 26 columns need to be duplicates) with the function sanitize_database. This operation connects to the database in similar fashion, creates a cursor, and executes the following logic: create a new temporary table with only the unique values from the original table --> delete all rows from the original table --> insert all rows from the temporary table into the empty original table --> drop the temporary table.
It works, but the sanitize_database function is extremely slow and can easily take up to an hour for even this small dataset. I tried to set a certain column as a primary key, or to make it unique, but pandas.DataFrame.to_sql does not allow for skipping duplicates: it either inserts the whole dataframe at once or none at all. That functionality can be reviewed here (append_skipdupes).
Is there a method to make this process more efficient?
# Function to insert pandas dataframes into SQLITE3 database
def insert_values_to_table(table_name, output):
    conn = connect_to_db("/mnt/wwn-0x5002538e00000000-part1/DATABASE/table_name.db")
    # If connection exists, perform data insertion
    if conn is not None:
        c = conn.cursor()
        # Add pandas data (output) into sql database
        output.to_sql(name=table_name, con=conn, if_exists='append', index=False)
        # Close connection
        conn.close()
        print('SQL insert process finished')
# To keep only unique rows in SQLITE3 database
def sanitize_database():
    conn = connect_to_db("/mnt/wwn-0x5002538e00000000-part1/DATABASE/table_name.db")
    c = conn.cursor()
    c.executescript("""
        CREATE TABLE temp_table AS SELECT DISTINCT * FROM table_name;
        DELETE FROM table_name;
        INSERT INTO table_name SELECT * FROM temp_table;
        DROP TABLE temp_table;
    """)
    conn.close()
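One way to avoid the hour-long rewrite entirely is to never insert the duplicates in the first place: create a UNIQUE index over the deduplication columns once, stage each new batch in a temporary table, and merge it with INSERT OR IGNORE so SQLite silently drops rows that already exist. A minimal sketch, assuming the connect_to_db helper above; col1, col2, col3 stand in for the real column names:

def insert_unique_rows(table_name, output):
    conn = connect_to_db("/mnt/wwn-0x5002538e00000000-part1/DATABASE/table_name.db")
    c = conn.cursor()
    # One-off: a UNIQUE index across the columns lets SQLite reject duplicates itself
    # (hypothetical column names; list all 26 columns here)
    c.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_unique_rows "
              "ON {}(col1, col2, col3)".format(table_name))
    # Stage the new batch, then merge it, ignoring rows that already exist
    output.to_sql(name='staging', con=conn, if_exists='replace', index=False)
    c.execute("INSERT OR IGNORE INTO {} SELECT * FROM staging".format(table_name))
    c.execute("DROP TABLE staging")
    conn.commit()
    conn.close()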

Is there a way to return a specified table's column titles in Python using mysql.connector?

I'm trying to get a list of column names from a table in a SQL database. For example, my database is called "book_shop" and the table whose columns I want to return is called "books".
It's just the string formatting I'm after. I've tried the following...
SELECT *
from information_schema.columns
WHERE table_schema = 'book_shop'
ORDER BY table_name,ordinal_position
I've got the fetchall and execute calls in place, but MySQL says there's something wrong with my SQL syntax.
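A parameterized version with mysql.connector might look like the sketch below; host, user, and password are placeholder credentials:

import mysql.connector

conn = mysql.connector.connect(host='localhost', user='user', password='password')
cursor = conn.cursor()
# information_schema holds one row per column; parameters avoid quoting mistakes
cursor.execute(
    "SELECT column_name FROM information_schema.columns "
    "WHERE table_schema = %s AND table_name = %s "
    "ORDER BY ordinal_position",
    ('book_shop', 'books')
)
columns = [row[0] for row in cursor.fetchall()]
print(columns)
conn.close()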

Python SQL Server database loop not working

Using Python, I am looping through a number of SQL Server databases and creating tables using SELECT INTO, but when I run the script nothing happens, i.e. there are no error messages and the tables have not been created. Below is an extract example of what I am doing. Can anyone advise?
import pandas as pd

df = pd.DataFrame({'Database': ['Db1', 'Db2']})  # dataframe of database names as example (placeholder values)
for i, x in df.iterrows():
    SQL = """
    Drop table if exists {x}..table
    Select
        Name
    Into
        {y}..table
    From
        MainDatabase..Details
    """.format(x=x['Database'], y=x['Database'])
    cursor.execute(SQL)
    conn.commit()
It looks like your DB driver doesn't support multiple statements in a single execute call; try splitting your query into two single statements, one with the drop and the other with the select:
for i, x in df.iterrows():
    drop_sql = """
    Drop table if exists {x}..table
    """.format(x=x['Database'])
    select_sql = """
    Select
        Name
    Into
        {y}..table
    From
        MainDatabase..Details
    """.format(x=x['Database'], y=x['Database'])
    cursor.execute(drop_sql)
    cursor.execute(select_sql)
    cursor.commit()
A second tip: your x=x['Database'] and y=x['Database'] are the same value; is this correct?

inserting three columns in a table at a time in mysql using select and values

I need to insert into three columns of a MySQL table at a time. The first two columns are filled by selecting data from other tables with a SELECT statement, while the third column needs to be inserted directly and doesn't need any select. I don't know the syntax for this in MySQL. pos is an array and I need to insert it simultaneously.
Here is my SQL command in Python:
sql="insert into quranic_index_2(quran_wordid,translationid,pos) select quranic_words.wordid,quran_english_translations.translationid from quranic_words, quran_english_translation where quranic_words.lemma=%s and quran_english_translations.verse_no=%s and
quran_english_translations.translatorid="%s,values(%s)"
data=l,words[2],var1,words[i+1]
r=cursor.execute(sql,data)
data passes in the variables in which all the values are stored; words[i+1] holds the value for pos.
Try using the sample query below:
INSERT INTO table_name (field_1, field_2, field_3) VALUES
('value_1', (SELECT value_2 FROM user_table), 'value_3')
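Applied to the question's tables, the constant pos value can also travel as a parameter in the SELECT list of an INSERT ... SELECT. A sketch, assuming the table and column names from the question:

sql = """
    INSERT INTO quranic_index_2 (quran_wordid, translationid, pos)
    SELECT w.wordid, t.translationid, %s
    FROM quranic_words AS w, quran_english_translations AS t
    WHERE w.lemma = %s
      AND t.verse_no = %s
      AND t.translatorid = %s
"""
# pos (words[i+1]) is bound first because its placeholder appears first
cursor.execute(sql, (words[i+1], l, words[2], var1))
connection.commit()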

Python - Bulk Select then Insert from one DB to another

I'm looking for some help on how to do this in Python using sqlite3.
Basically I have a process which downloads a DB (temp) and then needs to insert its records into a second, identical DB (the main DB), while ignoring/bypassing any possible duplicate-key errors.
I was thinking of two scenarios but am unsure how best to do this in Python.
Option 1:
create 2 connections and cursor objects, 1 to each DB
select from DB 1 eg:
dbcur.executemany('SELECT * from table1')
rows = dbcur.fetchall()
insert them into DB 2:
dbcur.execute('INSERT INTO table1 VALUES (:column1, :column2)', rows)
dbcon.commit()
This of course does not work as I'm not sure how to do it properly :)
Option 2 (which I would prefer, but not sure how to do):
SELECT and INSERT in 1 statement
Also, I have 4 tables within the DBs, each with varying columns; can I skip naming the columns in the INSERT statement?
As far as the duplicate keys go, I have read I can use 'ON DUPLICATE KEY' to handle them,
eg.
INSERT INTO table1 VALUES (:column1, :column2) ON DUPLICATE KEY UPDATE set column1=column1
You can ATTACH two databases to the same connection with code like this:
import sqlite3
connection = sqlite3.connect('/path/to/temp.sqlite')
cursor=connection.cursor()
cursor.execute('ATTACH "/path/to/main.sqlite" AS master')
There is no ON DUPLICATE KEY syntax in sqlite as there is in MySQL. This SO question contains alternatives.
So to do the bulk insert in one sql statement, you could use something like
cursor.execute('INSERT OR REPLACE INTO master.table1 SELECT * FROM table1')
See this page for information about REPLACE and other ON CONFLICT options.
The code for option 1 is close, but note that the SELECT should go through execute (executemany is for running one parameterized statement many times), and inserting all the fetched rows requires executemany rather than execute.
If you need filtering to bypass duplicate keys, do the insert into a temporary table and then use SQL commands to eliminate the duplicates and merge them into the target table.
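A corrected option 1, as a minimal sketch; the table name, the two-column VALUES list, and the file paths are the placeholders from the question's own code:

import sqlite3

temp_con = sqlite3.connect('/path/to/temp.sqlite')
main_con = sqlite3.connect('/path/to/main.sqlite')

rows = temp_con.execute('SELECT * FROM table1').fetchall()
# INSERT OR IGNORE skips rows that would violate a UNIQUE or PRIMARY KEY constraint
main_con.executemany('INSERT OR IGNORE INTO table1 VALUES (?, ?)', rows)
main_con.commit()

temp_con.close()
main_con.close()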
