Can I split an INSERT statement into several ones without repeatedly inserting rows? - python

I have such an INSERT statement:
mtemp = "station, calendar, type, name, date, time"
query = "INSERT INTO table (%s) VALUES ( '%s', '%s', '%s', %s, '%s', '%s' );"
query = query % (mtemp, mstation, mcalendar, mtype, mname, mdate, mtime)
curs.execute(query, )
conn.commit()
The problem is that I cannot get the variables mcalendar, mdate and mtime into this statement: they are not constant values, and I have to access each of them within a for loop. The values of mstation, mtype and mname, however, are fixed. I tried to split the INSERT statement into several ones: one for each of the three variables inside a for loop, and one for the three fixed values in a single for loop. The for loop basically decides when to insert rows: I have a list rows1 and a list rows2, where rows1 is a full list of records and rows2 lacks some of them. I check whether each rows2 record exists in rows1; if it does, I execute the INSERT statement, and if not, I do nothing.
I ran the code and found two problems:
1. It inserts far more rows than it should. It's supposed to insert no more than 240 rows, because there are only 240 time occurrences per day for each sensor, yet it's inserting more than 400 new rows. (I wonder if I wrote too many for loops, so that it keeps inserting rows.)
2. The newly inserted rows only have values in the fixed-value columns. The three columns I fill from the separate for loops have no values at all.
I hope someone can give me a tip here. Thanks in advance! I can post more code if needed; I'm not even sure I'm on the right track.

I'm not sure I understand exactly your scenario, but is this the sort of thing you need?
Pseudo code (as Python):
mstation = "foo"
mtype = "bar"
mname = "baz"
mtemp = "station, calendar, type, name, date, time"
queryTemplate = "INSERT INTO table (%s) VALUES ( '%s', '%s', '%s', '%s', '%s', '%s' );"

for mcalendar in calendars:
    for mdate in dates:
        for mtime in times:
            query = queryTemplate % (mtemp, mstation, mcalendar, mtype, mname, mdate, mtime)
            curs.execute(query)

One INSERT statement always corresponds to one new row in a table. (Unless, of course, there is an error during the insert.) You can INSERT a row and then UPDATE it later to add or change information, but there is no such thing as splitting up an INSERT.
If you have a query that needs to be executed multiple times with changing data, the best option is a prepared statement. A prepared statement "compiles" an SQL query but leaves placeholders that can be set each time it is executed. This improves performance because the statement doesn't need to be parsed each time. You didn't specify which library you're using to connect to Postgres, so I don't know what the exact syntax would be, but it's something to look into.
If you can't or don't want to use prepared statements, you'll have to create the query string once for each insert. Don't substitute the values in before the loop; wait until you know them all before creating the query.
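For example, with psycopg2 (an assumption — the question doesn't name the library), a parameterised version of the loop could look like the sketch below; the DSN, table and loop data are illustrative. psycopg2 doesn't expose server-side prepared statements through execute(), but parameterised queries avoid re-building the string and let the driver handle all quoting:
import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical DSN
curs = conn.cursor()

mstation, mtype, mname = "foo", "bar", "baz"
calendars = ["2013"]           # illustrative loop data
dates = ["2013-01-01"]
times = ["00:00", "00:06"]

query = ("INSERT INTO table (station, calendar, type, name, date, time) "
         "VALUES (%s, %s, %s, %s, %s, %s)")
for mcalendar in calendars:
    for mdate in dates:
        for mtime in times:
            # the driver substitutes and quotes the %s placeholders
            curs.execute(query, (mstation, mcalendar, mtype, mname, mdate, mtime))
conn.commit()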

The following syntax works in SQL Server 2008 but not in SQL Server 2005.
CREATE TABLE Temp (id int, name varchar(10));
INSERT INTO Temp (id, name) VALUES (1, 'Anil'), (2, 'Ankur'), (3, 'Arjun');
SELECT * FROM Temp;
id | name
------------
1 | Anil
2 | Ankur
3 | Arjun

Related

How do you avoid duplicate entries in the database?

I would like to filter database inserts to avoid duplicates, so that only one product is inserted per productId. How do I do this?
This is my insert:
add_data = ("INSERT INTO productdetails"
"(productId, productUrl, discount, evaluateScore, volume, packageType, lotNum, validTime, storeName, storeUrl, allImageUrls, description) "
"VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)")
This is how it's supposed to look in PyMySQL; how do I do the same in mysql.connector?
INSERT INTO producttable (productId, productTitle, salePrice, originalPrice)
SELECT * FROM (SELECT %(productId)s, %(productTitle)s, %(salePrice)s, %(originalPrice)s) AS tmp
WHERE NOT EXISTS (
    SELECT productId FROM producttable WHERE productId = %(productId)s
)
LIMIT 1;
The proper approach is to handle this at the database end. You need to add a unique constraint:
ALTER TABLE productdetails
ADD UNIQUE (productId);
You can then simply do the INSERT, without any WHERE or IF.
Why?
If you keep a set, as suggested by yayati, the set and the processing surrounding it will become a bottleneck.
If you add the constraint, it's left to the database to do fast uniqueness checks, even with millions of rows. You then just check whether the DB returns an error when a row is not unique.
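For instance, with mysql.connector (an assumption, as in the question) the duplicate error can be caught per row; the connection details and columns below are illustrative:
import mysql.connector
from mysql.connector import errorcode

conn = mysql.connector.connect(user='user', password='pass', database='shop')  # hypothetical credentials
cur = conn.cursor()
try:
    # plain INSERT; the UNIQUE constraint rejects duplicate productIds
    cur.execute("INSERT INTO productdetails (productId, productUrl) VALUES (%s, %s)",
                (12345, 'http://example.com/p/12345'))
    conn.commit()
except mysql.connector.IntegrityError as err:
    if err.errno == errorcode.ER_DUP_ENTRY:
        pass  # duplicate productId: the row already exists, skip it
    else:
        raise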
Set the column to UNIQUE, then use an INSERT IGNORE statement: if there is a duplicate entry, the row will simply not be inserted. You can read more about INSERT IGNORE here.
What you could do is build the INSERT statements via string interpolation and keep adding them to a Set. The Set collection only keeps unique strings, so you can then bulk-load the Set of unique SQL INSERT statements into your RDBMS.
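For completeness, on the literal "how do I do this in mysql.connector" question: mysql.connector also accepts pyformat placeholders with a dict of parameters, so the query from the question should work largely unchanged. A minimal sketch, with illustrative connection details and values:
import mysql.connector

conn = mysql.connector.connect(user='user', password='pass', database='shop')  # hypothetical credentials
cur = conn.cursor()
row = {'productId': 1, 'productTitle': 'Widget', 'salePrice': 9.99, 'originalPrice': 19.99}
cur.execute(
    "INSERT INTO producttable (productId, productTitle, salePrice, originalPrice) "
    "SELECT * FROM (SELECT %(productId)s, %(productTitle)s, %(salePrice)s, %(originalPrice)s) AS tmp "
    "WHERE NOT EXISTS (SELECT productId FROM producttable WHERE productId = %(productId)s) "
    "LIMIT 1",
    row,
)
conn.commit()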

python sql statement optimization

This is the only way I found to update only NULL columns in a MySQL db with Python.
I have this kind of statement:
sql = "INSERT INTO `table` VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)\
ON DUPLICATE KEY UPDATE Data_block_1_HC1_sec_voltage=IF(VALUES(Data_block_1_HC1_sec_voltage)IS NULL,Data_block_1_HC1_sec_voltage,VALUES(Data_block_1_HC1_sec_voltage)),\
`Data_block_1_TC1_1`=IF(VALUES(`Data_block_1_TC1_1`)IS NULL,`Data_block_1_TC1_1`,VALUES(`Data_block_1_TC1_1`)),\
`Data_block_1_TC1_2`=IF(VALUES(`Data_block_1_TC1_2`)IS NULL,`Data_block_1_TC1_2`,VALUES(`Data_block_1_TC1_2`)),\
`Data_block_1_TCF1_1`=IF(VALUES(`Data_block_1_TCF1_1`)IS NULL,`Data_block_1_TCF1_1`,VALUES(`Data_block_1_TCF1_1`)),\
`HC1_HC1_output`=IF(VALUES(`HC1_HC1_output`)IS NULL,`HC1_HC1_output`,VALUES(`HC1_HC1_output`)),\
`Data_block_1_HC1_sec_cur`=IF(VALUES(`Data_block_1_HC1_sec_cur`)IS NULL,`Data_block_1_HC1_sec_cur`,VALUES(`Data_block_1_HC1_sec_cur`)),\
`Data_block_1_HC1_power`=IF(VALUES(`Data_block_1_HC1_power`)IS NULL,`Data_block_1_HC1_power`,VALUES(`Data_block_1_HC1_power`)),\
`HC1_HC1_setpoint`=IF(VALUES(`HC1_HC1_setpoint`)IS NULL,`HC1_HC1_setpoint`,VALUES(`HC1_HC1_setpoint`))\
"
The Data_block columns are columns in the db; the primary key is a datetime. Right now there are 8 columns, but I will have a lot more variables (more columns). I am not really good at Python, and I don't like the statement because it's kind of hardcoded. Could I build this statement in a for cycle or something, so it doesn't have to be so long and I don't have to write all the variables manually?
Thanks for your help.
Let's assume that your column names are stored in an array cols. Then in order to generate the "interesting" inner part of the SQL statement above, you could do
',\n'.join('`%(col)s` = IF(VALUES(`%(col)s`) IS NULL, `%(col)s`, VALUES(`%(col)s`))' % {'col': c} for c in cols)
Here, the generator expression produces the corresponding line of the SQL statement for each element of cols, and join then stitches everything together. (Join with ',\n', not a literal backslash: the backslashes in the statement above are Python line continuations, not part of the SQL.)
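Putting it together, the whole statement can be built from the column list; a minimal sketch, where the column names are an illustrative subset and the datetime primary key is assumed to be the first VALUES placeholder, as in the question:
cols = ['Data_block_1_HC1_sec_voltage', 'Data_block_1_TC1_1', 'Data_block_1_TC1_2']

placeholders = ','.join(['%s'] * (len(cols) + 1))  # +1 for the datetime primary key
updates = ',\n'.join(
    '`{0}`=IF(VALUES(`{0}`) IS NULL, `{0}`, VALUES(`{0}`))'.format(c)
    for c in cols
)
sql = "INSERT INTO `table` VALUES ({0})\nON DUPLICATE KEY UPDATE\n{1}".format(placeholders, updates)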

python mysqldb query with where

I use MySQLdb to query some data from the database, and when using LIKE in SQL I got confused about the SQL sentence.
Since I use LIKE, I constructed the SQL below, which gets the correct result:
cur.execute("SELECT a FROM table WHERE b like %s limit 0,10", ("%"+"ccc"+"%",))
Now I want to make column b a variable as well, as below, but it returns nothing:
cur.execute("SELECT a FROM table WHERE %s like %s limit 0,10", ("b", "%"+"ccc"+"%"))
I searched many websites but didn't find an answer. I am a bit dizzy.
In the db-api, parameters are for values only, not for columns or other parts of the query. You'll need to insert that using normal string substitution.
column = 'b'
query = "SELECT a FROM table WHERE {} like %s limit 0,10".format(column)
cur.execute(query, ("%"+"ccc"+"%",))
You could make this a bit nicer by using format for the parameter too:
cur.execute(query, ("%{}%".format("ccc"),))
The reason that the second query does not work is that the query that results from the substitution in the parameterised query looks like this:
select a from table where 'b' like '%ccc%' limit 0,10
'b' does not refer to a column here, but to the static string 'b'. If you instead passed the string abcccba into the query, you'd get a query that selects all rows:
cur.execute("SELECT a FROM table WHERE %s like %s limit 0,10", ("abcccba", "%"+"ccc"+"%"))
generates query:
SELECT a FROM table WHERE 'abcccba' like '%ccc%' limit 0,10
From this you should now be able to see why the second query returns an empty result set: the string b is not like %ccc%, so no rows are returned.
Therefore you cannot set table or column names using parameterised queries; you must use normal Python string substitution:
cur.execute("SELECT a FROM table WHERE {} like %s limit 0,10".format('b'), ("%"+"ccc"+"%",))
which will generate and execute the query:
SELECT a FROM table WHERE b like '%ccc%' limit 0,10
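One caveat worth adding: since the column name is formatted directly into the SQL string, it must never come straight from untrusted input. A simple guard is to check it against a whitelist first; ALLOWED_COLUMNS below is illustrative:
ALLOWED_COLUMNS = {'a', 'b', 'c'}  # the real column names of your table

def search(cur, column, needle):
    if column not in ALLOWED_COLUMNS:
        raise ValueError("unknown column: {!r}".format(column))
    query = "SELECT a FROM table WHERE {} like %s limit 0,10".format(column)
    cur.execute(query, ("%{}%".format(needle),))
    return cur.fetchall()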
You probably need to rewrite your variable substitution from
cur.execute("SELECT a FROM table WHERE b like %s limit 0,10", ("%"+"ccc"+"%"))
to
cur.execute("SELECT a FROM table WHERE b like %s limit 0,10", ("%"+"ccc"+"%",))
Note the trailing comma: it makes the parenthesised expression a one-element tuple rather than a plain string, and execute expects a sequence of parameters. In this example the string concatenation isn't even necessary; this code says the same:
cur.execute("SELECT a FROM table WHERE b like %s limit 0,10", ("%ccc%",))

select all id's and insert missing data

I have a database where I store some values with an auto-generated index key. I also have an n:m mapping table like this:
create table data(id int not null identity(1,1), col1 int not null, col2 varchar(256) not null);
create table otherdata(id int not null identity(1,1), value varchar(256) not null);
create table data_map(dataid int not null, otherdataid int not null);
Every day the data table needs to be updated with a list of new values. Many of them are already present but still need to be inserted into data_map (the key in otherdata is generated fresh each time, so in that table the data is always new).
One way of doing it would be to first try to insert all values, then select the generated ids, then insert into data_map:
mydata = [] # list of tuples
cursor.executemany("if not exists (select * from data where col1 = %d and col2 = %d) insert into data (col1, col2) values (%d, %d)", mydata);
# now select the id's
# [...]
But that is obviously quite bad, because I need to do both the select and the existence check without using the key, so I need the data indexed first; otherwise everything is very slow.
My next approach was to use a hash function (like MD5 or CRC64) to generate my own hash over col1 and col2, so that I can insert all values without a select first and still use an indexed key when inserting the missing values.
Can this be optimized, or is it the best I can do?
The amount of rows is >500k per change, of which maybe ~20-50% are already in the database.
Timing-wise, it looks like calculating the hashes is much faster than inserting the data into the database.
As far as I can tell, you are using mysql.connector. If so, when you run cursor.execute() you should not use %d placeholders. Everything should be just %s, and the connector will handle the type conversions.
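A minimal sketch of that change, assuming mysql.connector and an open connection conn with cursor cursor as in the question; the T-SQL-style IF NOT EXISTS is rewritten as INSERT ... SELECT so it also runs on MySQL, and each tuple then supplies its values twice:
mydata = [(1, 'a', 1, 'a'), (2, 'b', 2, 'b')]  # illustrative (col1, col2, col1, col2) tuples
cursor.executemany(
    "INSERT INTO data (col1, col2) "
    "SELECT %s, %s FROM DUAL "
    "WHERE NOT EXISTS (SELECT 1 FROM data WHERE col1 = %s AND col2 = %s)",
    mydata,
)
conn.commit()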

Bulk insert with returning IDs performance

I've got a table into which I want to insert around 1000 rows per query, and I want to get their PKs after creation, for later use as FKs in other tables.
I've tried inserting them using the RETURNING syntax in PostgreSQL,
but it takes around 10 s to insert:
INSERT INTO table_name (col1, col2, col3) VALUES (a1,a2,a3)....(a(n-2),a(n-1),a(n)) returning id;
By removing RETURNING I get much better performance: ~50 ms.
I think that if I could get an atomic operation that fetches the first id and inserts the rows at the same time, I could remove RETURNING and keep the better performance,
but I don't understand whether that is possible.
Generate the ids using nextval:
http://www.postgresql.org/docs/9.1/static/sql-createsequence.html
CREATE TEMP TABLE temp_val AS (
    VALUES (nextval('table_name_id_seq'), a1, a2, a3),
           (nextval('table_name_id_seq'), a1, a2, a3)
);

INSERT INTO table_name (id, col1, col2, col3)
SELECT column1, column2, column3, column4
FROM temp_val;

SELECT column1 FROM temp_val;
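If the inserts are driven from Python with psycopg2 (an assumption — the question doesn't name a client), the execute_values helper can batch the rows and, with fetch=True (psycopg2 2.8+), still hand back the RETURNING ids; names and data below are illustrative:
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=test")  # hypothetical DSN
cur = conn.cursor()
rows = [('a1', 'a2', 'a3'), ('b1', 'b2', 'b3')]  # ~1000 tuples in practice
ids = execute_values(
    cur,
    "INSERT INTO table_name (col1, col2, col3) VALUES %s RETURNING id",
    rows,
    fetch=True,  # collect one id per inserted row from RETURNING
)
conn.commit()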
