I have made a table 'temporary'(x, train1, train2, ..., train4) with 5 columns. I want to fill the column 'train1' with calculated data (train.y1-ideal.y1) from the tables 'train'(x, y1) and 'ideal'(x, y1). But the following nested SQL query is giving 'syntax error near SELECT'. What is wrong with it?
train = 1
with engine.connect() as conn:
    while train < 2:
        ideal = 1
        col_train = 'y' + str(train)
        train_no = str(train)
        col_ideal = 'y' + str(ideal)
        query1 = conn.execute(text(("INSERT INTO temporary (train%s) VALUES (SELECT (train.%s-ideal.%s)*(train.%s-ideal.%s) FROM train INNER JOIN ideal ON train.x=ideal.x)") % (train_no, col_train, col_ideal, col_train, col_ideal)))
        train += 1
I believe that your issue is that the SELECT ... should be enclosed in parentheses.
The fix (assuming that I've added the parentheses in the right place; if not, see the demo below):
query1=conn.execute(text(("INSERT INTO temporary (train%s) VALUES ((SELECT (train.%s-ideal.%s)*(train.%s-ideal.%s) FROM train INNER JOIN ideal ON train.x=ideal.x))")%(train_no,col_train,col_ideal,col_train,col_ideal)))
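As a quick sanity check of the parenthesised form, here is a minimal sketch using plain sqlite3 rather than SQLAlchemy's text(); the table contents are made up:

```python
import sqlite3

# Minimal reproduction: one matching row in train and ideal, then the fixed
# INSERT with the scalar subquery wrapped in its own parentheses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE train (x INTEGER PRIMARY KEY, y1 REAL);
    CREATE TABLE ideal (x INTEGER PRIMARY KEY, y1 REAL);
    CREATE TABLE temporary (x REAL, train1 REAL, train2 REAL, train3 REAL, train4 REAL);
    INSERT INTO train VALUES (1, 5.0);
    INSERT INTO ideal VALUES (1, 3.0);
""")

# The subquery must itself be parenthesised inside VALUES (...).
conn.execute(
    "INSERT INTO temporary (train1) VALUES "
    "((SELECT (train.y1-ideal.y1)*(train.y1-ideal.y1) "
    "FROM train INNER JOIN ideal ON train.x=ideal.x))"
)
print(conn.execute("SELECT train1 FROM temporary").fetchone())  # (4.0,)
```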
The following is a demo of the working SQL (albeit that the tables may be different) :-
DROP TABLE IF EXISTS train;
DROP TABLE IF EXISTS ideal;
DROP TABLE IF EXISTS temporary;
CREATE TABLE IF NOT EXISTS train (x INTEGER PRIMARY KEY, train_no INTEGER,col_train TEXT);
CREATE TABLE IF NOT EXISTS ideal (x INTEGER PRIMARY KEY, col_ideal INTEGER, col_train INTEGER);
CREATE TABLE IF NOT EXISTS temporary (train_no INTEGER);
INSERT INTO temporary (train_no) VALUES (
( /*<<<<<<<<<< ADDED */
SELECT (train.col_train-ideal.col_ideal)*(train.col_train-ideal.col_ideal)
FROM train INNER JOIN ideal ON train.x=ideal.x
) /*<<<<<<<<<< ADDED */
);
When executed then:-
INSERT INTO temporary (train_no) VALUES (
( /* ADDED */
SELECT (train.col_train-ideal.col_ideal)*(train.col_train-ideal.col_ideal)
FROM train INNER JOIN ideal ON train.x=ideal.x
) /* ADDED */
)
> Affected rows: 1
> Time: 0.084s
As opposed to (without the parentheses) :-
INSERT INTO temporary (train_no) VALUES (
/*(*/ /* ADDED */
SELECT (train.col_train-ideal.col_ideal)*(train.col_train-ideal.col_ideal)
FROM train INNER JOIN ideal ON train.x=ideal.x
/*)*/ /* ADDED */
)
> near "SELECT": syntax error
> Time: 0s
I have 2 database connections. I want to compare a single table from each connection to the other, and if there are unmatched records, I want to add them to the table in the database where they are missing.
This is what I came up with, but it doesn't seem to do the inserting part. I'm new to Python, so excuse the code. Thanks.
# establishing connections and querying the database
import sqlite3

con1 = sqlite3.connect("database1.db")
cur1 = con1.cursor()
table1 = cur1.execute("SELECT * FROM table1")
fetch_table1 = table1.fetchall()
mylist = list(table1)

con2 = sqlite3.connect("database2.db")
cur2 = con2.cursor()
table2 = cur2.execute("SELECT * FROM table2")
table2 = table2.fetchall()
mylist2 = list(table2)

# finding unmatched elements and inserting them into the database
def non_match_elements(mylist2, mylist):
    non_match = []
    for i in mylist2:
        if i not in mylist:
            non_match.append(i)

non_match = non_match_elements(mylist2, mylist)
cur1.executemany("""INSERT INTO table 1 VALUES (?,?,?)""", non_match)
con1.commit()

res = cur1.execute("select column from table1")
print(res.fetchall())
Thanks again guys
I would suggest ATTACHing one connection to the other; you then have two INSERT INTO table1 SELECT * FROM table2 WHERE ... queries that insert from one table into the other.
Here's an example/demo (not of the ATTACH DATABASE itself, but of aligning two tables with the same schema but with different data):-
/* Cleanup - just in case*/
DROP TABLE IF EXISTS table1;
DROP TABLE IF EXISTS table2;
/* Create the two tables */
CREATE TABLE IF NOT EXISTS table1 (val1 TEXT, val2 TEXT, val3 TEXT);
CREATE TABLE IF NOT EXISTS table2 (val1 TEXT, val2 TEXT, val3 TEXT);
/*****************************************************************/
/* load the two different sets of data and also some common data */
INSERT INTO table1 VALUES ('A','AA','AAA'),('B','BB','BBB'),('C','CC','CCC'),('M','MM','MMM');
INSERT INTO table2 VALUES ('X','XX','XXX'),('Y','YY','YYY'),('Z','ZZ','ZZZ'),('M','MM','MMM');
/*************************************************************/
/* Match each table to the other using an INSERT .... SELECT */
/*************************************************************/
INSERT INTO table1 SELECT * FROM table2 WHERE val1||val2||val3 NOT IN (SELECT(val1||val2||val3) FROM table1);
INSERT INTO table2 SELECT * FROM table1 WHERE val1||val2||val3 NOT IN (SELECT(val1||val2||val3) FROM table2);
/* Output both tables */
SELECT 'T1',* FROM table1;
SELECT 'T2',* FROM table2;
/* Cleanup */
DROP TABLE IF EXISTS table1;
DROP TABLE IF EXISTS table2;
The results of the two SELECTs are identical apart from the first column (T1 or T2), which is just used to indicate which table the SELECT is from.
table1 has the X,Y and Z values rows copied from table2
table2 has the A,B and C values rows copied from table1
the M values row, since it exists in both, remains intact; it is neither duplicated nor deleted.
Thus, data-wise, the two tables are identical.
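A rough sketch of the ATTACH approach itself, using in-memory databases in place of the two database files (the concatenation-based NOT IN test assumes the concatenated values are unambiguous):

```python
import sqlite3

# One connection, with a second database ATTACHed; the two INSERT ... SELECT
# statements copy the missing rows in each direction. ':memory:' stands in
# for database1.db / database2.db.
con = sqlite3.connect(":memory:")
con.executescript("""
    ATTACH DATABASE ':memory:' AS db2;
    CREATE TABLE main.table1 (val1 TEXT, val2 TEXT, val3 TEXT);
    CREATE TABLE db2.table2  (val1 TEXT, val2 TEXT, val3 TEXT);
    INSERT INTO main.table1 VALUES ('A','AA','AAA'),('M','MM','MMM');
    INSERT INTO db2.table2  VALUES ('X','XX','XXX'),('M','MM','MMM');
    INSERT INTO main.table1 SELECT * FROM db2.table2
        WHERE val1||val2||val3 NOT IN (SELECT val1||val2||val3 FROM main.table1);
    INSERT INTO db2.table2 SELECT * FROM main.table1
        WHERE val1||val2||val3 NOT IN (SELECT val1||val2||val3 FROM db2.table2);
""")
print(con.execute("SELECT COUNT(*) FROM main.table1").fetchone()[0])  # 3
print(con.execute("SELECT COUNT(*) FROM db2.table2").fetchone()[0])   # 3
```

Both tables end up with the A, M and X rows; the shared M row is neither duplicated nor deleted.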
I am able to insert a foreign key in a SQL table. However, after doing the same thing for 3 other tables, I will have to insert those 4 FK's in my fact table. I am asking now to know in advance if this is the way to go, database-model-wise.
Code to skip duplicate rows, insert columns and a FK RegionID:
cursor.execute("""
    IF NOT EXISTS (
        SELECT #address1client, #address2client, #cityClient
        INTERSECT
        SELECT address1client, address2client, cityClient
        FROM dbo.AddressClient)
    BEGIN
        INSERT INTO dbo.AddressClient (address1client, address2client, cityClient, RegionID)
        SELECT #address1client, #address2client, #cityClient, RegionID
        FROM dbo.Region
        WHERE province=#province AND country=#country
    END""")
My questions are:
1- Does a BEGIN ... END statement execute all at once? If the answer is yes, would the code below work? I ask because there can at no point be FK_ID columns with null values.
...
BEGIN
    INSERT INTO dbo.Fact (product, saleTotal, saleDate) VALUES (#product, #saleTotal, #saleDate)
    INSERT INTO dbo.Fact (ClientAddressID)
    SELECT ClientAddressID
    FROM dbo.ClientAddress
    WHERE address1c=#address1c AND address2c=#address2c AND cityC=#cityC
    INSERT INTO dbo.Fact (SupplierAddressID)
    SELECT SupplierAddressID
    FROM dbo.SupplierAddress
    WHERE address1s=#address1s AND address2s=#address2s AND cityS=#cityS
    INSERT INTO dbo.Fact (DetailID)
    SELECT DetailID
    FROM dbo.Detail
    WHERE categoryNum=#categoryNum AND type=#type AND nature=#nature
END""")
2- If a BEGIN ... END statement doesn't execute all at once, how do I go about inserting multiple FK's in a table?
I have an SQLite database. I am trying to insert values (users_id, lessoninfo_id) into the table bookmarks, only if that combination does not already exist in a row.
INSERT INTO bookmarks(users_id,lessoninfo_id)
VALUES(
(SELECT _id FROM Users WHERE User='"+$('#user_lesson').html()+"'),
(SELECT _id FROM lessoninfo
WHERE Lesson="+lesson_no+" AND cast(starttime AS int)="+Math.floor(result_set.rows.item(markerCount-1).starttime)+")
WHERE NOT EXISTS (
SELECT users_id,lessoninfo_id from bookmarks
WHERE users_id=(SELECT _id FROM Users
WHERE User='"+$('#user_lesson').html()+"') AND lessoninfo_id=(
SELECT _id FROM lessoninfo
WHERE Lesson="+lesson_no+")))
This gives an error saying:
db error near where syntax.
If you never want to have duplicates, you should declare this as a table constraint:
CREATE TABLE bookmarks(
users_id INTEGER,
lessoninfo_id INTEGER,
UNIQUE(users_id, lessoninfo_id)
);
(A primary key over both columns would have the same effect.)
It is then possible to tell the database that you want to silently ignore records that would violate such a constraint:
INSERT OR IGNORE INTO bookmarks(users_id, lessoninfo_id) VALUES(123, 456)
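A quick round trip showing the constraint and INSERT OR IGNORE together (values are the ones from the example):

```python
import sqlite3

# The UNIQUE(users_id, lessoninfo_id) constraint makes the second, duplicate
# insert a no-op thanks to OR IGNORE.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE bookmarks(
    users_id INTEGER,
    lessoninfo_id INTEGER,
    UNIQUE(users_id, lessoninfo_id))""")
con.execute("INSERT OR IGNORE INTO bookmarks(users_id, lessoninfo_id) VALUES(123, 456)")
con.execute("INSERT OR IGNORE INTO bookmarks(users_id, lessoninfo_id) VALUES(123, 456)")
print(con.execute("SELECT COUNT(*) FROM bookmarks").fetchone()[0])  # 1
```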
If you have a table called memos that has two columns id and text, you should be able to do it like this:
INSERT INTO memos(id,text)
SELECT 5, 'text to insert'
WHERE NOT EXISTS(SELECT 1 FROM memos WHERE id = 5 AND text = 'text to insert');
If the table already contains a row where text is equal to 'text to insert' and id is equal to 5, then the insert operation will be ignored.
I don't know if this will work for your particular query, but perhaps it gives you a hint on how to proceed.
I would advise that you instead design your table so that no duplicates are allowed, as explained in CL's answer.
For a unique column, use this:
INSERT OR REPLACE INTO tableName (...) values(...);
(Note that OR REPLACE deletes any conflicting row and inserts a new one, which is not the same as silently ignoring the insert.)
For more information, see: sqlite.org/lang_insert
insert into bookmarks (users_id, lessoninfo_id)
select 1, 167
EXCEPT
select users_id, lessoninfo_id
from bookmarks
where users_id=1
and lessoninfo_id=167;
This is the fastest way.
For some other SQL engines, you can use a Dummy table containing 1 record.
e.g:
select 1, 167 from ONE_RECORD_DUMMY_TABLE
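For completeness, here is the EXCEPT version run against SQLite with the same made-up values; the second execution inserts nothing because the pair already exists:

```python
import sqlite3

# EXCEPT removes the candidate row from the SELECT when an identical
# (users_id, lessoninfo_id) pair is already present, so re-running the
# statement is harmless.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bookmarks(users_id INTEGER, lessoninfo_id INTEGER)")
sql = """INSERT INTO bookmarks (users_id, lessoninfo_id)
         SELECT 1, 167
         EXCEPT
         SELECT users_id, lessoninfo_id FROM bookmarks
         WHERE users_id=1 AND lessoninfo_id=167"""
con.execute(sql)
con.execute(sql)  # second run inserts nothing
print(con.execute("SELECT COUNT(*) FROM bookmarks").fetchone()[0])  # 1
```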
This is my query using code found perusing this site:
query="""SELECT Family
FROM Table2
INNER JOIN Table1 ON Table1.idSequence=Table2.idSequence
WHERE (Table1.Chromosome, Table1.hg19_coordinate) IN ({seq})
""".format(seq=','.join(['?']*len(matchIds_list)))
matchIds_list is a list of tuples in (?,?) format.
It works if I ask for just one condition (i.e. just Table1.Chromosome as opposed to both Chromosome and hg19_coordinate) and matchIds_list is just a simple list of single values, but I don't know how to get it to work with a composite key over both columns.
Since you're running SQLite 3.7.17, I'd recommend just using a temporary table.
Create and populate your temporary table.
cursor.executescript("""
CREATE TEMP TABLE control_list (
Chromosome TEXT NOT NULL,
hg19_coordinate TEXT NOT NULL
);
CREATE INDEX control_list_idx ON control_list (Chromosome, hg19_coordinate);
""")
cursor.executemany("""
INSERT INTO control_list (Chromosome, hg19_coordinate)
VALUES (?, ?)
""", matchIds_list)
Just constrain your query to the control list temporary table.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
-- Constrain to control_list.
WHERE EXISTS (
SELECT *
FROM control_list
WHERE control_list.Chromosome = Table1.Chromosome
AND control_list.hg19_coordinate = Table1.hg19_coordinate
)
And finally perform your query (there's no need to format this one).
cursor.execute(query)
# Remove the temporary table since we're done with it.
cursor.execute("""
DROP TABLE control_list;
""")
Short Query (requires SQLite 3.15): You actually almost had it. You need to make the IN ({seq}) a subquery
expression.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
WHERE (Table1.Chromosome, Table1.hg19_coordinate) IN (VALUES {seq});
Long Query (requires SQLite 3.8.3): It looks a little complicated, but it's pretty straightforward. Put your control list into a sub-select, and then constrain the main select by that control list.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
-- Constrain to control_list.
WHERE EXISTS (
SELECT *
FROM (
SELECT
-- Name the columns (must match order in tuples).
"column1" AS Chromosome,
"column2" AS hg19_coordinate
FROM (
-- Get control list.
VALUES {seq}
) AS control_values
) AS control_list
-- Constrain Table1 to control_list.
WHERE control_list.Chromosome = Table1.Chromosome
AND control_list.hg19_coordinate = Table1.hg19_coordinate
)
Regardless of which query you use, when formatting the SQL replace {seq} with (?,?) for each composite key instead of just ?.
query = " ... ".format(seq=','.join(['(?,?)']*len(matchIds_list)))
And finally flatten matchIds_list when you execute the query because it is a list of tuples.
import itertools
cursor.execute(query, list(itertools.chain.from_iterable(matchIds_list)))
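Putting the short query together with the flattening step, a self-contained sketch with made-up data might look like this (the row-value VALUES form needs SQLite 3.15+; table and column names follow the question):

```python
import sqlite3
import itertools

# Made-up tables matching the question's schema.
con = sqlite3.connect(":memory:")
cursor = con.cursor()
cursor.executescript("""
    CREATE TABLE Table1 (idSequence INTEGER, Chromosome TEXT, hg19_coordinate TEXT);
    CREATE TABLE Table2 (idSequence INTEGER, Family TEXT);
    INSERT INTO Table1 VALUES (1,'chr1','100'),(2,'chr2','200'),(3,'chr3','300');
    INSERT INTO Table2 VALUES (1,'FamA'),(2,'FamB'),(3,'FamC');
""")

matchIds_list = [('chr1', '100'), ('chr3', '300')]

# One (?,?) group per composite key, inside a VALUES subquery expression.
query = """SELECT Family
           FROM Table2
           INNER JOIN Table1 ON Table1.idSequence = Table2.idSequence
           WHERE (Table1.Chromosome, Table1.hg19_coordinate) IN (VALUES {seq})
        """.format(seq=','.join(['(?,?)'] * len(matchIds_list)))

# Flatten the list of tuples into a flat parameter list.
cursor.execute(query, list(itertools.chain.from_iterable(matchIds_list)))
print(sorted(cursor.fetchall()))  # [('FamA',), ('FamC',)]
```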
This is a follow-up question. Below is a piece of my Python script that reads a constantly growing log file (text) and inserts data into a PostgreSQL DB. A new log file is generated each day. What I do is commit each line, which causes a huge load and really poor performance (it needs 4 hours to insert 30 minutes of the file's data!). How can I improve this code to insert bulks instead of lines? And would this help improve the performance and reduce load? I've read about copy_from but couldn't figure out how to use it in such a situation.
import psycopg2 as psycopg

try:
    connectStr = "dbname='postgis20' user='postgres' password='' host='localhost'"
    cx = psycopg.connect(connectStr)
    cu = cx.cursor()
    logging.info("connected to DB")
except:
    logging.error("could not connect to the database")

import time

file = open('textfile.log', 'r')
while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        time.sleep(1)
        file.seek(where)
    else:
        print line,  # already has newline
        dodecode(line)
------------
def dodecode(fields):
    global cx
    from time import strftime, gmtime
    from calendar import timegm
    import os
    msg = fields.split(',')
    part = eval(msg[2])
    msgnum = int(msg[3:6])
    print "message#:", msgnum
    print fields
    if (part==1):
        if msgnum==1:
            msg1 = msg_1.decode(bv)
            #print "message1 :",msg1
            Insert(msgnum,time,msg1)
        elif msgnum==2:
            msg2 = msg_2.decode(bv)
            #print "message2 :",msg2
            Insert(msgnum,time,msg2)
        elif msgnum==3:
            ....
            ....
            ....
----------------
def Insert(msgnum,time,msg):
    global cx
    try:
        if msgnum in [1,2,3]:
            if msg['type']==0:
                cu.execute("INSERT INTO table1 ( messageid, timestamp, userid, position, text ) SELECT "+str(msgnum)+", '"+time+"', "+str(msg['UserID'])+", ST_GeomFromText('POINT("+str(float(msg['longitude']), '"+text+"')+" "+str(float(msg['latitude']))+")']))+" WHERE NOT EXISTS (SELECT * FROM table1 WHERE timestamp='"+time+"' AND text='"+text+"';")
                cu.execute("INSERT INTO table2 ( field1,field2,field3, time_stamp, pos,) SELECT "+str(msg['UserID'])+","+str(int(msg['UserName']))+","+str(int(msg['UserIO']))+", '"+time+"', ST_GeomFromText('POINT("+str(float(msg['longitude']))+" "+str(float(msg['latitude']))+")')," WHERE NOT EXISTS (SELECT * FROM table2 WHERE field1="+str(msg['UserID'])+");")
                cu.execute("Update table2 SET field3='"+str(int(msg['UserIO']))+"',time_stamp='"+str(time)+"',pos=ST_GeomFromText('POINT("+str(float(msg['longitude']))+" "+str(float(msg['latitude']))+")'),"' WHERE field1='"+str(msg['UserID'])+"' AND time_stamp < '"+str(time)+"';")
            elif msg['type']==1:
                cu.execute("INSERT INTO table1 ( messageid, timestamp, userid, position, text ) SELECT "+str(msgnum)+", '"+time+"', "+str(msg['UserID'])+", ST_GeomFromText('POINT("+str(float(msg['longitude']), '"+text+"')+" "+str(float(msg['latitude']))+")']))+" WHERE NOT EXISTS (SELECT * FROM table1 WHERE timestamp='"+time+"' AND text='"+text+"';")
                cu.execute("INSERT INTO table2 ( field1,field2,field3, time_stamp, pos,) SELECT "+str(msg['UserID'])+","+str(int(msg['UserName']))+","+str(int(msg['UserIO']))+", '"+time+"', ST_GeomFromText('POINT("+str(float(msg['longitude']))+" "+str(float(msg['latitude']))+")')," WHERE NOT EXISTS (SELECT * FROM table2 WHERE field1="+str(msg['UserID'])+");")
                cu.execute("Update table2 SET field3='"+str(int(msg['UserIO']))+"',time_stamp='"+str(time)+"',pos=ST_GeomFromText('POINT("+str(float(msg['longitude']))+" "+str(float(msg['latitude']))+")'),"' WHERE field1='"+str(msg['UserID'])+"' AND time_stamp < '"+str(time)+"';")
            elif msg['type']==2:
                ....
                ....
                ....
    except Exception, err:
        #print('ERROR: %s\n' % str(err))
        logging.error('ERROR: %s\n' % str(err))
        cx.commit()
    cx.commit()
Doing multiple rows per transaction, and per query, will make it go faster. When faced with a similar problem I put multiple rows in the VALUES part of the insert query, but you have complicated insert queries, so you'll likely need a different approach.
I'd suggest creating a temporary table and inserting, say, 10000 rows into it with ordinary multi-row inserts:
insert into temptable values ( /* row1 data */ ), ( /* row2 data */ ), etc...
500 rows per insert is a good starting point. Then join the temp table against the existing data to de-dupe it:
delete from temptable using livetable where /* join condition */ ;
and de-dupe it against itself if that is needed too:
delete from temptable where id not in
( select distinct on ( /* unique columns */ ) id from temptable );
Then use an insert-select to copy the rows from the temporary table into the live table:
insert into livetable ( /* columns */ )
select /* columns */ from temptable;
It looks like you might need an update-from too. Finally, drop the temp table and start again. And since you're writing to two tables, you're going to need to double up all of these operations.
I'd do the insert by maintaining a count and a list of values to insert, and then at insert time building the query by repeating the (%s,%s,%s,%s) part as many times as needed, passing the list of values in separately and letting psycopg2 deal with the formatting.
I'd expect making those changes could get you a speed-up of 5 times or more.
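The multi-row INSERT idea from the last paragraph can be sketched as follows. Only the string-building is shown; build_multi_insert, the table name temptable and the columns are made-up illustrations, and with a live connection you would pass the result to psycopg2's cur.execute so the driver handles the quoting:

```python
def build_multi_insert(table, columns, rows):
    """Build one INSERT whose (%s,...) group is repeated once per row,
    plus the flat parameter list that goes with it."""
    group = "(" + ",".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ",".join(columns), ",".join([group] * len(rows)))
    params = [value for row in rows for value in row]
    return sql, params

rows = [(1, 'a'), (2, 'b'), (3, 'c')]
sql, params = build_multi_insert("temptable", ["id", "text"], rows)
print(sql)     # INSERT INTO temptable (id,text) VALUES (%s,%s),(%s,%s),(%s,%s)
print(params)  # [1, 'a', 2, 'b', 3, 'c']
# With a live connection: cur.execute(sql, params)
```

Accumulate a few hundred rows, build one statement, execute, commit, and repeat; that replaces a commit per line with a commit per batch.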