I have two different SQLite databases XXX and YYY.
XXX contains table A and YYY contains table B.
A and B have the same structure (columns).
How do I append the rows of B to A using the Python sqlite3 API?
After appending, A should contain its original rows plus the rows of B.
You first get a connection to the database using sqlite3.connect, then create a cursor so you can execute SQL. Once you have a cursor, you can execute arbitrary SQL commands.
Example:
import sqlite3

# Get connections to the databases
db_a = sqlite3.connect('database_a.db')
db_b = sqlite3.connect('database_b.db')

# Get the contents of a table
b_cursor = db_b.cursor()
b_cursor.execute('SELECT * FROM mytable')
output = b_cursor.fetchall()  # Returns the results as a list.

# Insert those contents into another table,
# building one '?' placeholder per column so the statement matches the row width.
a_cursor = db_a.cursor()
for row in output:
    placeholders = ', '.join('?' * len(row))
    a_cursor.execute(f'INSERT INTO myothertable VALUES ({placeholders})', row)

# Cleanup
db_a.commit()
a_cursor.close()
b_cursor.close()
Caveat: I haven't actually tested this, so it might have a few bugs in it, but the basic idea is sound, I think.
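Alternatively, SQLite can do the whole copy in SQL by attaching the second database file to the first connection. A minimal sketch, reusing the file and table names from the example above:

import sqlite3

# Open the destination database and attach the source database to the same connection
con = sqlite3.connect('database_a.db')
con.execute("ATTACH DATABASE 'database_b.db' AS src")

# Copy every row from the attached table into the destination table in one statement
con.execute('INSERT INTO myothertable SELECT * FROM src.mytable')
con.commit()

con.execute('DETACH DATABASE src')
con.close()

This keeps the transfer inside SQLite and avoids loading every row into Python first.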
This is a generalized function and should be customized to your particular environment. For a fixed schema, you can replace the "dynamically determine SQL expression requirements" section with static SQL parameters (rather than querying PRAGMA table_info); this should improve performance.
import sqlite3

def merge_tables(cursor_new: sqlite3.Cursor, cursor_old: sqlite3.Cursor,
                 table_name: str, del_old_table: bool = False) -> None:
    '''
    This function merges the content of a specific table from an old cursor into a new cursor.

    :param cursor_new: [sqlite3.Cursor] the primary cursor
    :param cursor_old: [sqlite3.Cursor] the secondary cursor
    :param table_name: [str] the name of the table
    :param del_old_table: [bool] if True, delete the merged rows from the old database
    :return: None
    '''
    # dynamically determine SQL expression requirements
    column_names = cursor_new.execute(f"PRAGMA table_info({table_name})").fetchall()
    column_names = tuple([x[1] for x in column_names][1:])  # drop the primary key column
    columns_sql = ', '.join(column_names)  # e.g. "col_1, col_2"
    values_placeholders = ', '.join(['?' for _ in column_names])  # one placeholder per column

    # SQL select columns from the old table
    data = cursor_old.execute(f"SELECT {columns_sql} FROM {table_name}").fetchall()

    # insert the data via the primary cursor
    cursor_new.executemany(
        f"INSERT INTO {table_name} ({columns_sql}) VALUES ({values_placeholders})", data)
    cursor_new.connection.commit()

    # With ephemeral RAM connections & testing, deleting the table may be ill-advised
    if del_old_table:
        cursor_old.execute(f"DELETE FROM {table_name}")  # or: cursor_old.execute(f'DROP TABLE {table_name}')
        cursor_old.connection.commit()

    print(f"Table {table_name} merged from {cursor_old.connection} to {cursor_new.connection}")  # Consider logging.info()
    return None
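For instance, a hypothetical call merging table A from database YYY into database XXX (the file names are assumed):

import sqlite3

con_new = sqlite3.connect('XXX.db')  # primary database
con_old = sqlite3.connect('YYY.db')  # secondary database
merge_tables(con_new.cursor(), con_old.cursor(), 'A')
con_new.close()
con_old.close()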
When I try to remove all tables with:
base.metadata.drop_all(engine)
I get the following error:
ERROR:libdl.database_operations:Cannot drop table: (psycopg2.errors.DependentObjectsStillExist) cannot drop sequence <schema>.<sequence> because other objects depend on it
DETAIL: default for table <schema>.<table> column id depends on sequence <schema>.<sequence>
HINT: Use DROP ... CASCADE to drop the dependent objects too.
Is there an elegant one-line solution for that?
One approach is to query the system catalog for all user tables and drop each one with CASCADE:

import psycopg2
from psycopg2 import sql

cnn = psycopg2.connect('...')
cur = cnn.cursor()

# find all ordinary tables outside the system schemas
cur.execute("""
    select s.nspname as s, t.relname as t
    from pg_class t join pg_namespace s on s.oid = t.relnamespace
    where t.relkind = 'r'
      and s.nspname !~ '^pg_' and s.nspname != 'information_schema'
    order by 1, 2
""")
tables = cur.fetchall()  # make sure they are the right ones

for t in tables:
    cur.execute(
        sql.SQL("drop table if exists {}.{} cascade")
           .format(sql.Identifier(t[0]), sql.Identifier(t[1])))

cnn.commit()  # goodbye
I've been trying to use this piece of code:
# df is the dataframe
if len(df) > 0:
    df_columns = list(df)
    # create (col1,col2,...)
    columns = ",".join(df_columns)
    # create VALUES('%s', '%s',...) one '%s' per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))
    # create INSERT INTO table (columns) VALUES('%s',...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)
    cur = conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()
The goal is to connect to a Postgres DB and insert values from a df.
I get the following errors for this code:
LINE 1: INSERT INTO mrr.shipments (mainFreight_freight_motherVesselD...
psycopg2.errors.UndefinedColumn: column "mainfreight_freight_mothervesseldepartdatetime" of relation "shipments" does not exist
For some reason, the column names aren't being matched properly.
What can I do to fix it?
You should not do your own string interpolation; let psycopg2 handle it. From the docs:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
Since you also have dynamic column names, you should use psycopg2.sql to create the statement and then use the standard method of passing query parameters to psycopg2 instead of using format.
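Here is a sketch of what that could look like for this case (the table mrr.shipments and the dataframe df come from the question; the rest is an assumption). Note that sql.Identifier double-quotes the mixed-case column names, so they must match exactly how the columns were created:

from psycopg2 import sql
from psycopg2.extras import execute_batch

# Compose the statement safely: identifiers are quoted, values become parameters
insert_stmt = sql.SQL("INSERT INTO {} ({}) VALUES ({})").format(
    sql.Identifier('mrr', 'shipments'),                       # schema-qualified table
    sql.SQL(', ').join(map(sql.Identifier, df.columns)),      # quoted column names
    sql.SQL(', ').join(sql.Placeholder() * len(df.columns)),  # one placeholder per column
)

cur = conn.cursor()
execute_batch(cur, insert_stmt.as_string(conn), df.values.tolist())
conn.commit()
cur.close()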
I have a massive table (over 100B records) to which I added an empty column. I parse strings from another (string) field; if the required string is present, I extract an integer from that field and want to write it into the new column for all rows containing that string.
At the moment, after the data has been parsed and saved locally in a dataframe, I iterate over it to update the Redshift table with the clean data. This takes approximately 1 second per iteration, which is far too slow.
My current code example:
conn = psycopg2.connect(connection_details)
cur = conn.cursor()
clean_df = raw_data.apply(clean_field_to_parse)
for ind, row in clean_df.iterrows():
    update_query = build_update_query(row.id, row.clean_integer1, row.clean_integer2)
    cur.execute(update_query)
where build_update_query is a function to generate the update query:
def build_update_query(id, int1, int2):
    query = """
        update tab_tab
        set
            clean_int_1 = {}::int,
            clean_int_2 = {}::int,
            updated_date = GETDATE()
        where id = {}
        ;
    """
    return query.format(int1, int2, id)
and where clean_df is structured like:
id | field_to_parse   | clean_int_1 | clean_int_2
 1 | {'int_1': '2+1'} | 3           | np.nan
 2 | {'int_2': '7-0'} | np.nan      | 7
Is there a way to update specific table fields in bulk, so that there is no need to execute one query at a time?
I'm parsing the strings and running the update statement from Python. The database is stored on Redshift.
As mentioned, consider pure SQL and avoid iterating through billions of rows: push the Pandas data frame to the database as a staging table and then run one single UPDATE across both tables. With SQLAlchemy, you can use DataFrame.to_sql to create a table replica of the data frame, add an index on the join field, id, and drop the very large staging table at the end.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://myuser:mypwd@myhost/mydatabase")

# PUSH TO POSTGRES (SAME NAME AS DF)
clean_df.to_sql(name="clean_df", con=engine, if_exists="replace", index=False)

# SQL UPDATE (USING TRANSACTION)
with engine.begin() as conn:
    conn.execute(text("CREATE INDEX idx_clean_df_id ON clean_df(id)"))

    # no alias prefix on the SET columns (Postgres/Redshift reject "SET t.col = ...")
    conn.execute(text("""UPDATE tab_tab t
                         SET clean_int_1 = c.clean_int_1,
                             clean_int_2 = c.clean_int_2,
                             updated_date = GETDATE()
                         FROM clean_df c
                         WHERE c.id = t.id
                      """))

    conn.execute(text("DROP TABLE IF EXISTS clean_df"))

engine.dispose()
I am trying to write 2003 .mdb files from scratch. I already have a file with the tables and column names (I have 112 columns). In my attempt, I read rows from a pandas DataFrame (named sections in my code) and append those rows to the .mdb file. But when using the pyodbc INSERT INTO syntax, it gave me this error:
ProgrammingError: ('42000', "[42000] [Microsoft][Driver ODBC Microsoft Access] Expression syntax error 'Equatorial-TB-BG-CA_IRI-1.0_SNP-1.0_ACA-0_ESAL-1000'. (-3100) (SQLExecDirectW)")
Here is my code:
for k in range(len(sections)):
    cols = tuple(list(sections.columns))
    vals = tuple(list(sections.iloc[k]))
    action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.format(columns = str(cols).replace("'",""), values = str(vals).replace("'",""))
    cursor.execute(action)
conn.commit()
Does anyone know why I am having this kind of problem?
Actually, this is not an Access-specific error but a general SQL error: your string literals are not properly enclosed in quotes. The Access engine therefore assumes they are field names, further complicated by the hyphens, which the engine interprets as subtraction expressions.
To demonstrate the issue, see below, filling in sample values for your unknowns. Notice that the string items passed to VALUES are not quoted:
sections_columns = ['database', 'tool']
cols = tuple(list(sections_columns))
sections_vals = ['ms-access', 'pandas']
vals = tuple(list(sections_vals))
action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.\
format(columns = str(cols).replace("'",""), values = str(vals).replace("'",""))
print(action)
# INSERT INTO SECTIONS (database, tool) VALUES (ms-access, pandas)
Now, you could simply leave in the single quotes that you currently strip from str(vals):
action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.\
format(columns = str(cols).replace("'",""), values = str(vals))
print(action)
# INSERT INTO SECTIONS (database, tool) VALUES ('ms-access', 'pandas')
But even better, consider parameterizing the query with qmark placeholders and passing the values as parameters (second argument of cursor.execute(query, params)). This avoids any need to quote or unquote string or numeric values:
# MOVED OUTSIDE LOOP AS UNCHANGING OBJECTS
cols = tuple(sections.columns)                 # REMOVED UNNEEDED list()
qmarks = tuple(['?' for i in cols])            # NEW OBJECT
action = 'INSERT INTO SECTIONS {columns} VALUES {qmarks}'.\
          format(columns = str(cols).replace("'",""), qmarks = str(qmarks).replace("'",""))
# NOTE: the quotes must be stripped from qmarks too, or '?' becomes a string literal
# INSERT INTO SECTIONS (col1, col2, col3, ...) VALUES (?, ?, ?, ...)

for k in range(len(sections)):
    vals = list(sections.iloc[k])              # REMOVED tuple()
    cursor.execute(action, vals)               # EXECUTE PARAMETERIZED QUERY
conn.commit()
Better still, avoid looping entirely with executemany over DataFrame.values.tolist(), using a prepared statement:
# PREPARED STATEMENT
cols = tuple(sections.columns)
qmarks = tuple(['?' for i in cols])
action = 'INSERT INTO SECTIONS {columns} VALUES {qmarks}'.\
          format(columns = str(cols).replace("'",""), qmarks = str(qmarks).replace("'",""))

# EXECUTE PARAMETERIZED QUERY
cursor.executemany(action, sections.values.tolist())
conn.commit()
Ok, so basically I'm trying to update an existing SQLite3 database with instance variables (typ and lvl):
# Set variables
typ = 'Test'
lvl = 6

# Print database
print("\nHere's a listing of all the records in the table:\n")
for row in cursor.execute("SELECT rowid, * FROM fieldmap ORDER BY rowid"):
    print(row)

# Update info
sql = """
UPDATE fieldmap
SET buildtype = typ, buildlevel = lvl
WHERE rowid = 11
"""
cursor.execute(sql)

# Print database
print("\nHere's a listing of all the records in the table:\n")
for row in cursor.execute("SELECT rowid, * FROM fieldmap ORDER BY rowid"):
    print(row)
As an error I'm getting:
sqlite3.OperationalError: no such column: typ
I know the problem is that my variables are being inserted with the wrong syntax, but I can't for the life of me find the correct one. It works fine with literal strings and ints, like this:
sql = """
UPDATE fieldmap
SET buildtype = 'house', buildlevel = 3
WHERE rowid = 11
"""
But as soon as I switch to the variables it throws the error.
Your query is not actually inserting the values of the variables typ and lvl into the query string. As written the query is trying to reference columns named typ and lvl, but these don't exist in the table.
Try writing it as a parameterised query:
sql = """
UPDATE fieldmap
SET buildtype = ?, buildlevel = ?
WHERE rowid = 11
"""
cursor.execute(sql, (typ, lvl))
The ? acts as a placeholder in the query string which is replaced by the values in the tuple passed to execute(). This is a secure way to construct the query and avoids SQL injection vulnerabilities.
Hey, I think you should use an ORM to manipulate the SQL database.
SQLAlchemy is your friend. I use it with SQLite, MySQL, and PostgreSQL, and it is fantastic.
It can also keep you away from this kind of syntax error, since SQL treats commas and quotation marks as significant.
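For example, a minimal sketch with SQLAlchemy Core and bound parameters (the database file name is an assumption; the table and variables come from the question):

from sqlalchemy import create_engine, text

# 'mydatabase.db' is an assumed file name for illustration
engine = create_engine('sqlite:///mydatabase.db')

with engine.begin() as conn:
    # bound parameters keep the variable values out of the SQL string itself
    conn.execute(
        text("UPDATE fieldmap SET buildtype = :typ, buildlevel = :lvl WHERE rowid = 11"),
        {"typ": typ, "lvl": lvl},
    )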
For hard-coding, you may try this:
sql = """
UPDATE fieldmap
SET buildtype = '%s', buildlevel = 3
WHERE rowid = 11
""" % (typ)
This can solve your problem temporarily, but not in the long run. An ORM is your friend.
Hope this helps!