I have a table in MySQL with the columns id (auto-increment), status ('running', 'failed', 'success') and others. I want to author a DAG that starts by inserting a row into this table and, after some other tasks, ends by updating that same row with a new status.
How can I get the last inserted row id from MySqlOperator?
mysql = MySqlOperator(task_id='task_name', sql="insert into table (status, others) VALUES ('running', other)")
Thanks
You can try to get the last insert id after your query:
INSERT INTO table_name (status, others) VALUES ('running', other);
SELECT LAST_INSERT_ID();
and to ensure getting the right id:
START TRANSACTION;
INSERT INTO table_name (status, others) VALUES ('running', other);
SELECT LAST_INSERT_ID();
COMMIT;
Then the operator should return the result as an XCom, which you can access with "{{ ti.xcom_pull(task_ids='task_name') }}"
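A minimal sketch of the insert-then-update pattern described above, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The table name job_runs and the helpers start_run and finish_run are hypothetical; in a DAG the returned id would be passed between tasks (for example via XCom) rather than kept in a local variable.

```python
import sqlite3

# sqlite3 stands in for MySQL here; job_runs and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_runs (id INTEGER PRIMARY KEY AUTOINCREMENT, status TEXT)"
)


def start_run(conn):
    """Insert a 'running' row and return its generated id."""
    cur = conn.execute("INSERT INTO job_runs (status) VALUES (?)", ("running",))
    conn.commit()
    # lastrowid is read from the cursor that executed the INSERT.
    return cur.lastrowid


def finish_run(conn, run_id, status):
    """Update the row created by start_run with its final status."""
    conn.execute("UPDATE job_runs SET status = ? WHERE id = ?", (status, run_id))
    conn.commit()


run_id = start_run(conn)               # first task: remember the id
finish_run(conn, run_id, "success")    # last task: update the same row
final_status = conn.execute(
    "SELECT status FROM job_runs WHERE id = ?", (run_id,)
).fetchone()[0]
```

With a MySQL driver the same shape should apply, with the id coming from LAST_INSERT_ID() or the driver's own last-row-id attribute.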
OK so I'm trying to improve my asp data entry page to ensure that the entry going into my data table is unique.
So in this table I have SoftwareName and SoftwareType. I'm trying to get it so that if the entry page sends an insert query with parameters that match what's already in the table (same title and type), an error is raised and the data isn't entered.
Something like this:
INSERT INTO tblSoftwareTitles(
SoftwareName,
SoftwareSystemType)
VALUES(@SoftwareName,@SoftwareType)
WHERE NOT EXISTS (SELECT SoftwareName
FROM tblSoftwareTitles
WHERE Softwarename = @SoftwareName
AND SoftwareType = @Softwaretype)
So this syntax works great for selecting columns from one table into another without duplicates being entered but doesn't seem to want to work with a parametrized insert query. Can anyone help me out with this?
Edit:
Here's the code I'm using in my ASP insert method
private void ExecuteInsert(string name, string type)
{
//Creates a new connection using the HWM string
using (SqlConnection HWM = new SqlConnection(GetConnectionStringHWM()))
{
//Creates a sql string with parameters
string sql = " INSERT INTO tblSoftwareTitles( "
+ " SoftwareName, "
+ " SoftwareSystemType) "
+ " SELECT "
+ " @SoftwareName, "
+ " @SoftwareType "
+ " WHERE NOT EXISTS "
+ " ( SELECT 1 "
+ " FROM tblSoftwareTitles "
+ " WHERE Softwarename = @SoftwareName "
+ " AND SoftwareSystemType = @Softwaretype); ";
//Opens the connection
HWM.Open();
try
{
//Creates a Sql command
using (SqlCommand addSoftware = new SqlCommand{
CommandType = CommandType.Text,
Connection = HWM,
CommandTimeout = 300,
CommandText = sql})
{
//adds parameters to the Sql command
addSoftware.Parameters.Add("@SoftwareName", SqlDbType.NVarChar, 200).Value = name;
addSoftware.Parameters.Add("@SoftwareType", SqlDbType.Int).Value = type;
//Executes the Sql
addSoftware.ExecuteNonQuery();
}
Alert.Show("Software title saved!");
}
catch (System.Data.SqlClient.SqlException ex)
{
string msg = "Insert Error:";
msg += ex.Message;
throw new Exception(msg);
}
}
}
You could do this using an IF statement:
IF NOT EXISTS
( SELECT 1
FROM tblSoftwareTitles
WHERE Softwarename = @SoftwareName
AND SoftwareSystemType = @Softwaretype
)
BEGIN
INSERT tblSoftwareTitles (SoftwareName, SoftwareSystemType)
VALUES (@SoftwareName, @SoftwareType)
END;
You could do it without IF using SELECT
INSERT tblSoftwareTitles (SoftwareName, SoftwareSystemType)
SELECT @SoftwareName,@SoftwareType
WHERE NOT EXISTS
( SELECT 1
FROM tblSoftwareTitles
WHERE Softwarename = @SoftwareName
AND SoftwareSystemType = @Softwaretype
);
Both methods are susceptible to a race condition, so while I would still use one of the above to insert, you should also safeguard against duplicate inserts with a unique constraint:
CREATE UNIQUE NONCLUSTERED INDEX UQ_tblSoftwareTitles_Softwarename_SoftwareSystemType
ON tblSoftwareTitles (SoftwareName, SoftwareSystemType);
Example on SQL-Fiddle
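To illustrate how a unique index acts as the final safeguard from application code, here is a small sketch in Python. It uses the built-in sqlite3 module instead of SQL Server so that it is self-contained; the index name UQ_titles and the helper add_title are hypothetical, and with SQL Server the driver would raise its own exception type for the constraint violation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblSoftwareTitles (SoftwareName TEXT, SoftwareSystemType INTEGER)"
)
# The unique index rejects a duplicate (name, type) pair even if two
# sessions pass an existence check at the same moment.
conn.execute(
    "CREATE UNIQUE INDEX UQ_titles "
    "ON tblSoftwareTitles (SoftwareName, SoftwareSystemType)"
)


def add_title(conn, name, system_type):
    """Return True if the row was inserted, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO tblSoftwareTitles (SoftwareName, SoftwareSystemType) "
                "VALUES (?, ?)",
                (name, system_type),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised by the unique index when the pair is already present.
        return False


first = add_title(conn, "Dynamics AX", 1)
second = add_title(conn, "Dynamics AX", 1)
row_count = conn.execute("SELECT COUNT(*) FROM tblSoftwareTitles").fetchone()[0]
```

Catching the constraint error lets the page report "already exists" instead of failing with a generic insert error.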
ADDENDUM
In SQL Server 2008 or later you can use MERGE with HOLDLOCK to remove the chance of a race condition (which is still not a substitute for a unique constraint).
MERGE tblSoftwareTitles WITH (HOLDLOCK) AS t
USING (VALUES (@SoftwareName, @SoftwareType)) AS s (SoftwareName, SoftwareSystemType)
ON s.Softwarename = t.SoftwareName
AND s.SoftwareSystemType = t.SoftwareSystemType
WHEN NOT MATCHED BY TARGET THEN
INSERT (SoftwareName, SoftwareSystemType)
VALUES (s.SoftwareName, s.SoftwareSystemType);
Example of Merge on SQL Fiddle
This isn't an answer. I just want to show that the IF NOT EXISTS(...) INSERT method isn't safe. You have to execute Session #1 first and then Session #2. After Session #2 finishes you will see that without a UNIQUE index you can get duplicate pairs (SoftwareName, SoftwareSystemType). The delay in session #1 gives you enough time to execute the second script (session #2). You can reduce this delay.
Session #1 (SSMS > New Query > F5 (Execute))
CREATE DATABASE DemoEXISTS;
GO
USE DemoEXISTS;
GO
CREATE TABLE dbo.Software(
SoftwareID INT PRIMARY KEY,
SoftwareName NCHAR(400) NOT NULL,
SoftwareSystemType NVARCHAR(50) NOT NULL
);
GO
INSERT INTO dbo.Software(SoftwareID,SoftwareName,SoftwareSystemType)
VALUES (1,'Dynamics AX 2009','ERP');
INSERT INTO dbo.Software(SoftwareID,SoftwareName,SoftwareSystemType)
VALUES (2,'Dynamics NAV 2009','SCM');
INSERT INTO dbo.Software(SoftwareID,SoftwareName,SoftwareSystemType)
VALUES (3,'Dynamics CRM 2011','CRM');
INSERT INTO dbo.Software(SoftwareID,SoftwareName,SoftwareSystemType)
VALUES (4,'Dynamics CRM 2013','CRM');
INSERT INTO dbo.Software(SoftwareID,SoftwareName,SoftwareSystemType)
VALUES (5,'Dynamics CRM 2015','CRM');
GO
/*
CREATE UNIQUE INDEX IUN_Software_SoftwareName_SoftareSystemType
ON dbo.Software(SoftwareName,SoftwareSystemType);
GO
*/
-- Session #1
BEGIN TRANSACTION;
UPDATE dbo.Software
SET SoftwareName='Dynamics CRM',
SoftwareSystemType='CRM'
WHERE SoftwareID=5;
WAITFOR DELAY '00:00:15' -- 15 seconds delay; you have less than 15 seconds to switch SSMS window to session #2
UPDATE dbo.Software
SET SoftwareName='Dynamics AX',
SoftwareSystemType='ERP'
WHERE SoftwareID=1;
COMMIT
--ROLLBACK
PRINT 'Session #1 results:';
SELECT *
FROM dbo.Software;
Session #2 (SSMS > New Query > F5 (Execute))
USE DemoEXISTS;
GO
-- Session #2
DECLARE
@SoftwareName NVARCHAR(100),
@SoftwareSystemType NVARCHAR(50);
SELECT
@SoftwareName=N'Dynamics AX',
@SoftwareSystemType=N'ERP';
PRINT 'Session #2 results:';
IF NOT EXISTS(SELECT *
FROM dbo.Software s
WHERE s.SoftwareName=@SoftwareName
AND s.SoftwareSystemType=@SoftwareSystemType)
BEGIN
PRINT 'Session #2: INSERT';
INSERT INTO dbo.Software(SoftwareID,SoftwareName,SoftwareSystemType)
VALUES (6,@SoftwareName,@SoftwareSystemType);
END
PRINT 'Session #2: FINISH';
SELECT *
FROM dbo.Software;
Results:
Session #1 results:
SoftwareID SoftwareName SoftwareSystemType
----------- ----------------- ------------------
1 Dynamics AX ERP
2 Dynamics NAV 2009 SCM
3 Dynamics CRM 2011 CRM
4 Dynamics CRM 2013 CRM
5 Dynamics CRM CRM
Session #2 results:
Session #2: INSERT
Session #2: FINISH
SoftwareID SoftwareName SoftwareSystemType
----------- ----------------- ------------------
1 Dynamics AX ERP <-- duplicate (row updated by session #1)
2 Dynamics NAV 2009 SCM
3 Dynamics CRM 2011 CRM
4 Dynamics CRM 2013 CRM
5 Dynamics CRM CRM
6 Dynamics AX ERP <-- duplicate (row inserted by session #2)
There is a great solution for this problem: you can use SQL's MERGE statement.
Merge MyTargetTable hba
USING (SELECT Id = 8, Name = 'Product Listing Message') temp
ON temp.Id = hba.Id
WHEN NOT matched THEN
INSERT (Id, Name) VALUES (temp.Id, temp.Name);
You can check this with the sample below:
IF OBJECT_ID ('dbo.TargetTable') IS NOT NULL
DROP TABLE dbo.TargetTable
GO
CREATE TABLE dbo.TargetTable
(
Id INT NOT NULL,
Name VARCHAR (255) NOT NULL,
CONSTRAINT PK_TargetTable PRIMARY KEY (Id)
)
GO
-- Id is NOT NULL with no default, so each insert supplies it explicitly
INSERT INTO dbo.TargetTable (Id, Name)
VALUES (1, 'Unknown')
GO
INSERT INTO dbo.TargetTable (Id, Name)
VALUES (2, 'Mapping')
GO
INSERT INTO dbo.TargetTable (Id, Name)
VALUES (3, 'Update')
GO
INSERT INTO dbo.TargetTable (Id, Name)
VALUES (4, 'Message')
GO
INSERT INTO dbo.TargetTable (Id, Name)
VALUES (5, 'Switch')
GO
INSERT INTO dbo.TargetTable (Id, Name)
VALUES (6, 'Unmatched')
GO
INSERT INTO dbo.TargetTable (Id, Name)
VALUES (7, 'ProductMessage')
GO
Merge dbo.TargetTable hba
USING (SELECT Id = 8, Name = 'Listing Message') temp
ON temp.Id = hba.Id
WHEN NOT matched THEN
INSERT (Id, Name) VALUES (temp.Id, temp.Name);
More of a comment with a link for suggested further reading: a really good blog article benchmarks various ways of accomplishing this task.
They use a few techniques: "Insert Where Not Exists", "Merge" statement, "Insert Except", and your typical "left join" to see which way is the fastest to accomplish this task.
The example code used for each technique is as follows (straight copy/paste from their page) :
INSERT INTO #table1 (Id, guidd, TimeAdded, ExtraData)
SELECT Id, guidd, TimeAdded, ExtraData
FROM #table2
WHERE NOT EXISTS (Select Id, guidd From #table1 WHERE #table1.id = #table2.id)
-----------------------------------
MERGE #table1 as [Target]
USING (select Id, guidd, TimeAdded, ExtraData from #table2) as [Source]
(id, guidd, TimeAdded, ExtraData)
on [Target].id =[Source].id
WHEN NOT MATCHED THEN
INSERT (id, guidd, TimeAdded, ExtraData)
VALUES ([Source].id, [Source].guidd, [Source].TimeAdded, [Source].ExtraData);
------------------------------
INSERT INTO #table1 (id, guidd, TimeAdded, ExtraData)
SELECT id, guidd, TimeAdded, ExtraData from #table2
EXCEPT
SELECT id, guidd, TimeAdded, ExtraData from #table1
------------------------------
INSERT INTO #table1 (id, guidd, TimeAdded, ExtraData)
SELECT #table2.id, #table2.guidd, #table2.TimeAdded, #table2.ExtraData
FROM #table2
LEFT JOIN #table1 on #table1.id = #table2.id
WHERE #table1.id is null
It's a good read for those who are looking for speed! On SQL 2014, the Insert-Except method turned out to be the fastest for 50 million or more records.
I know this post is old but I found an original way to insert values into a table with the key words INSERT INTO and EXISTS.
I say original because I did not find it on the Internet.
Here it is :
INSERT INTO targetTable(c1,c2)
select value1,value2
WHERE NOT EXISTS(select 1 from targetTable where c1=value1 and c2=value2 )
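As a hedged illustration of the same INSERT ... SELECT ... WHERE NOT EXISTS pattern with bound parameters, here is a sketch using Python's built-in sqlite3 module (a stand-in for SQL Server, chosen only so the example is runnable). The helper insert_if_missing is hypothetical. As noted earlier in the thread, this check alone does not protect against concurrent sessions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE targetTable (c1 TEXT, c2 TEXT)")

# Named parameters are bound once and reused in both the SELECT list
# and the NOT EXISTS subquery.
INSERT_IF_MISSING = """
INSERT INTO targetTable (c1, c2)
SELECT :c1, :c2
WHERE NOT EXISTS (
    SELECT 1 FROM targetTable WHERE c1 = :c1 AND c2 = :c2
)
"""


def insert_if_missing(conn, c1, c2):
    """Insert (c1, c2) unless an identical row is already present."""
    conn.execute(INSERT_IF_MISSING, {"c1": c1, "c2": c2})
    conn.commit()


insert_if_missing(conn, "a", "b")
insert_if_missing(conn, "a", "b")   # second call inserts nothing
insert_if_missing(conn, "a", "c")
rows = conn.execute("SELECT c1, c2 FROM targetTable ORDER BY c2").fetchall()
```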
Isn't ignoring the duplicate-key error from the unique constraint a solution? (Note that INSERT IGNORE is MySQL syntax.)
INSERT IGNORE INTO tblSoftwareTitles...
How to retrieve inserted id after inserting row in SQLite using Python? I have table like this:
id INTEGER PRIMARY KEY AUTOINCREMENT,
username VARCHAR(50),
password VARCHAR(50)
I insert a new row with example data username="test" and password="test". How do I retrieve the generated id in a transaction safe way? This is for a website solution, where two people may be inserting data at the same time. I know I can get the last read row, but I don't think that is transaction safe. Can somebody give me some advice?
You could use cursor.lastrowid (see "Optional DB API Extensions"):
import sqlite3
connection=sqlite3.connect(':memory:')
cursor=connection.cursor()
cursor.execute('''CREATE TABLE foo (id integer primary key autoincrement ,
username varchar(50),
password varchar(50))''')
cursor.execute('INSERT INTO foo (username,password) VALUES (?,?)',
('test','test'))
print(cursor.lastrowid)
# 1
If two people are inserting at the same time, as long as they are using different cursors, cursor.lastrowid will return the id for the last row that cursor inserted:
cursor.execute('INSERT INTO foo (username,password) VALUES (?,?)',
('blah','blah'))
cursor2=connection.cursor()
cursor2.execute('INSERT INTO foo (username,password) VALUES (?,?)',
('blah','blah'))
print(cursor2.lastrowid)
# 3
print(cursor.lastrowid)
# 2
cursor.execute('INSERT INTO foo (id,username,password) VALUES (?,?,?)',
(100,'blah','blah'))
print(cursor.lastrowid)
# 100
Note that lastrowid returns None when you insert more than one row at a time with executemany:
cursor.executemany('INSERT INTO foo (username,password) VALUES (?,?)',
(('baz','bar'),('bing','bop')))
print(cursor.lastrowid)
# None
All credits to @Martijn Pieters in the comments:
You can use the function last_insert_rowid():
The last_insert_rowid() function returns the ROWID of the last row insert from the database connection which invoked the function. The last_insert_rowid() SQL function is a wrapper around the sqlite3_last_insert_rowid() C/C++ interface function.
SQLite 3.35's RETURNING clause:
CREATE TABLE users (
id INTEGER PRIMARY KEY,
first_name TEXT,
last_name TEXT
);
INSERT INTO users (first_name, last_name)
VALUES ('Jane', 'Doe')
RETURNING id;
returns requested columns of the inserted row in INSERT, UPDATE and DELETE statements. Python usage:
cursor.execute('INSERT INTO users (first_name, last_name) VALUES (?,?)'
' RETURNING id',
('Jane', 'Doe'))
row = cursor.fetchone()
inserted_id = row[0] if row else None
I'm having a bit of trouble trying to fix a problem I'm having in retrieving the last insert id from a query in SQLite3 using Python.
Here's a sample of my code:
import sqlite3
# Setup our SQLite Database
conn = sqlite3.connect('value_serve.db')
conn.execute("PRAGMA foreign_keys = 1") # Enable Foreign Keys
cursor = conn.cursor()
# Create table for Categories
conn.executescript('DROP TABLE IF EXISTS Category;')
conn.execute('''CREATE TABLE Category (
id INTEGER PRIMARY KEY AUTOINCREMENT,
category CHAR(132),
description TEXT,
parent_id INT,
FOREIGN KEY (parent_id) REFERENCES Category (id)
);''')
conn.execute("INSERT INTO Category (category, parent_id) VALUES ('Food', NULL)")
food_category = cursor.lastrowid
conn.execute("INSERT INTO Category (category, parent_id) VALUES ('Beverage', NULL)")
beverage_category = cursor.lastrowid
...
conn.commit() # Commit to Database
No matter what I do, when I try to get the value of 'food_category' I get a return value of 'None'.
Any help would be appreciated, thanks in advance.
The lastrowid value is set per cursor, and only visible to that cursor.
You need to read lastrowid from the same cursor that executed the INSERT. Here you are asking a different cursor, one that never executed the query, for the last row id, and that cursor can't know the value.
If you actually execute the query on the cursor object, it works:
cursor.execute("INSERT INTO Category (category, parent_id) VALUES ('Food', NULL)")
food_category = cursor.lastrowid
The connection.execute() function creates a new (local) cursor for that query and the last row id is only visible on that local cursor. That cursor is returned when you use connection.execute(), so you could get the same value from that return value:
cursor_used = conn.execute("INSERT INTO Category (category, parent_id) VALUES ('Food', NULL)")
food_category = cursor_used.lastrowid
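Putting that together, a small self-contained sketch: the helper insert_category is hypothetical, and it simply reads lastrowid from the cursor that connection.execute() returns. An in-memory database and a simplified Category table are used so the snippet runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Category (id INTEGER PRIMARY KEY AUTOINCREMENT, category TEXT)"
)


def insert_category(conn, name):
    """Insert a category and return the id generated for it."""
    # conn.execute() returns the cursor it created for this statement,
    # so lastrowid is read from the cursor that ran the INSERT.
    cur = conn.execute("INSERT INTO Category (category) VALUES (?)", (name,))
    return cur.lastrowid


food_category = insert_category(conn, "Food")
beverage_category = insert_category(conn, "Beverage")
conn.commit()
```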
I'm trying to get table name for field in result set that I got from database (Python, Postgres). There is a function in PHP to get table name for field, I used it and it works so I know it can be done (in PHP). I'm looking for similar function in Python.
pg_field_table() function in PHP gets results and field number and "returns the name of the table that field belongs to". That is exactly what I need, but in Python.
Simple example - create tables, insert rows, select data:
CREATE TABLE table_a (
id INT,
name VARCHAR(10)
);
CREATE TABLE table_b (
id INT,
name VARCHAR(10)
);
INSERT INTO table_a (id, name) VALUES (1, 'hello');
INSERT INTO table_b (id, name) VALUES (1, 'world');
When using psycopg2 or sqlalchemy I got right data and right field names but without information about table name.
import psycopg2
query = '''
SELECT *
FROM table_a A
LEFT JOIN table_b B
ON A.id = B.id
'''
con = psycopg2.connect('dbname=testdb user=postgres password=postgres')
cur = con.cursor()
cur.execute(query)
data = cur.fetchall()
print('fields', [desc[0] for desc in cur.description])
print('data', data)
The example above prints field names. The output is:
fields ['id', 'name', 'id', 'name']
data [(1, 'hello', 1, 'world')]
I know that there is cursor.description, but it does not contain table name, just the field name.
What I need - some way to retrieve table names for fields in result set when using raw SQL to query data.
EDIT 1: I need to know if "hello" came from "table_a" or "table_b", both fields are named same ("name"). Without information about table name you can't tell in which table the value is.
EDIT 2: I know that there are some workarounds like SQL aliases: SELECT table_a.name AS name1, table_b.name AS name2 but I'm really asking how to retrieve table name from result set.
EDIT 3: I'm looking for solution that allows me to write any raw SQL query, sometimes SELECT *, sometimes SELECT A.id, B.id ... and after executing that query I will get field names and table names for fields in the result set.
It is necessary to query the pg_attribute catalog for the table qualified column names:
query = '''
select
string_agg(format(
'%%1$s.%%2$s as "%%1$s.%%2$s"',
attrelid::regclass, attname
) , ', ')
from pg_attribute
where attrelid = any (%s::regclass[]) and attnum > 0 and not attisdropped
'''
cursor.execute(query, ([t for t in ('a','b')],))
select_list = cursor.fetchone()[0]
query = '''
select {}
from a left join b on a.id = b.id
'''.format(select_list)
print(cursor.mogrify(query).decode())
cursor.execute(query)
print([desc[0] for desc in cursor.description])
Output:
select a.id as "a.id", a.name as "a.name", b.id as "b.id", b.name as "b.name"
from a left join b on a.id = b.id
['a.id', 'a.name', 'b.id', 'b.name']
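The aliasing step can also be sketched in plain Python, which may help when the column lists are already known on the client. This is only an illustration under that assumption: the helpers qualified_select_list and split_qualified are hypothetical, the column names are supplied by hand rather than read from pg_attribute, and no identifier quoting is done for the table and column names.

```python
def qualified_select_list(columns_by_table):
    """Build 'tbl.col as "tbl.col"' entries for every table and column."""
    parts = []
    for table, columns in columns_by_table.items():
        for column in columns:
            parts.append('{0}.{1} as "{0}.{1}"'.format(table, column))
    return ", ".join(parts)


def split_qualified(name):
    """Split a result-set column name like 'a.id' into (table, column)."""
    table, _, column = name.partition(".")
    return table, column


select_list = qualified_select_list({"a": ["id", "name"], "b": ["id", "name"]})
# Names as they would appear in cursor.description after running the query.
origins = [split_qualified(n) for n in ["a.id", "a.name", "b.id", "b.name"]]
```

Each entry in origins then tells you which table a value in the same position of a result row came from.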
I have a table with a million rows. I want to fetch all the rows, do some operation on them, and insert them into another table (newTable).
I figured out that I need to use a server-side cursor, since I cannot fetch all the data into memory.
I also figured out that I need to use two connections, so that when I commit I don't lose the cursor that I made.
But now my problem is that it won't put all the records into newTable, even though the log shows them being inserted.
In the console log I see it trying to insert the 500,000th record into the database:
560530 inserting 20551581 and 2176511
But when I do a count on the created table (while it is running) it shows only about 10,000 rows in the new table.
select count(*) from newTable;
count
-------
10236
And when the program finishes, I only have about 11,000 records in the new table, while the log shows it tried to insert at least 2 million rows. What's wrong with my code?
def fillMyTable(self):
try:
self.con=psycopg2.connect(database='XXXX',user='XXXX',password='XXXX',host='localhost')
cur=self.con.cursor(name="mycursor")
cur.arraysize=1000
cur.itersize=2000
self.con2=psycopg2.connect(database='XXXX',user='XXXX',password='XXXX',host='localhost')
cur2=self.con2.cursor()
q="SELECT id,oldgroups from oldTable;"
cur.execute(q)
i=0
while True:
batch= cur.fetchmany()
if not batch:
break
for row in batch:
userid=row[0]
groupids=self.doSomethingOnGroups(row[1])
for groupid in groupids:
# insert only if it does NOT exist
i+=1
print (str(i)+" inserting "+str(userid)+" and "+str(groupid))
q2="INSERT INTO newTable (userid, groupid) SELECT %s, %s WHERE NOT EXISTS ( SELECT %s FROM newTable WHERE groupid = %s);"%(userid,groupid,userid,groupid)
cur2.execute(q2)
self.con2.commit()
except psycopg2.DatabaseError as e:
self.writeLog(e)
finally:
cur.close()
self.con2.commit()
self.con.close()
self.con2.close()
Update: I also noticed it uses lots of my RAM. Isn't a server-side cursor supposed to avoid that?
Cpu(s): 15.2%us, 6.4%sy, 0.0%ni, 56.5%id, 2.8%wa, 0.0%hi, 0.2%si, 18.9%st
Mem: 1695220k total, 1680496k used, 14724k free, 3084k buffers
Swap: 0k total, 0k used, 0k free, 1395020k cached
If the oldgroups column is in the form 1,3,6,7 this will work:
insert into newTable (userid, groupid)
select id, groupid
from (
select
id,
regexp_split_to_table(oldgroups, ',')::integer as groupid
from oldTable
) o
where
not exists (
select 1
from newTable
where groupid = o.groupid
)
and groupid < 10000000
But I suspect you want to check for the existence of both groupid and id:
insert into newTable (userid, groupid)
select id, groupid
from (
select
id,
regexp_split_to_table(oldgroups, ',')::integer as groupid
from oldTable
) o
where
not exists (
select 1
from newTable
where groupid = o.groupid and userid = o.id
)
and groupid < 10000000
The regexp_split_to_table function will "explode" the oldgroups column in rows doing a cross join with the id column.
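To make the "explode" step concrete, here is a rough Python equivalent, assuming oldgroups holds comma-separated integers such as 1,3,6,7. The helpers explode_groups and unique_pairs are hypothetical and only mirror what the SQL does: one output row per (id, groupid) combination, with pairs that already exist skipped.

```python
def explode_groups(rows):
    """Yield one (user_id, group_id) pair per comma-separated group value."""
    for user_id, oldgroups in rows:
        for piece in oldgroups.split(","):
            piece = piece.strip()
            if piece:  # skip empty fragments such as a trailing comma
                yield user_id, int(piece)


def unique_pairs(pairs):
    """Keep the first occurrence of each (user_id, group_id) pair, in order."""
    seen = set()
    result = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


old_rows = [(1, "1,3,6,7"), (2, "3,7"), (1, "3")]
pairs = unique_pairs(explode_groups(old_rows))
```

Deduplicating on the whole pair corresponds to the second query, which checks both groupid and the user id rather than groupid alone.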