make MariaDB update from Python much faster


I have a Python script that aggregates data from multiple sources into one, for technical reasons.
In this script, I create an employee table, fill it with data, and in a second step fetch each employee's first and last name from another data source. My code is the following:
Create the table and fill it with data:
def createIdentite(mariaConnector, fmsConnector):
    print('Creating table "Identite"...')
    mariadbCursor = mariaConnector.cursor()
    # verify we have the destination tables we need
    print(' Checking for table Identite...')
    if not mariaCheckTableExists(mariaConnector, 'Identite'):
        print(' Table doesn\'t exist, creating it...')
        mariadbCursor.execute("""
            CREATE TABLE Identite (
                PK_FP VARCHAR(50) NOT NULL,
                LieuNaissance TEXT,
                PaysNaissance TEXT,
                Name TEXT,
                LastName TEXT,
                Nationalite TEXT,
                PaysResidence TEXT,
                PersonneAPrevenir TEXT,
                Tel1_PAP TEXT,
                Tel2_PAP TEXT,
                CategorieMutuelle TEXT,
                Ep1_MUTUELLE BOOLEAN,
                TypeMutuelle BOOLEAN,
                NiveauMutuelle BOOLEAN,
                NiveauMutuelle2 BOOLEAN,
                NiveauMutuelle3 BOOLEAN,
                PartMutuelleSalarie FLOAT,
                PartMutuelleSalarieOption FLOAT,
                PRIMARY KEY (PK_FP)
            )
        """)
        mariadbCursor.execute("CREATE INDEX IdentitePK_FP ON Identite(PK_FP)")
    else:
        # flush the table
        print(' Table exists, flushing it...')
        mariadbCursor.execute("DELETE FROM Identite")
    # now fill it with fresh data
    print(' Retrieving the data from FMS...')
    fmsCursor = fmsConnector.cursor()
    fmsCursor.execute("""
        SELECT
            PK_FP,
            Lieu_Naiss_Txt,
            Pays_Naiss_Txt,
            Nationalite_Txt,
            Pays_Resid__Txt,
            Pers_URG,
            Tel1_URG,
            Tel2_URG,
            CAT_MUTUELLE,
            CASE WHEN Ep1_MUTUELLE = 'OUI' THEN 1 ELSE 0 END as Ep1_MUTUELLE,
            CASE WHEN TYPE_MUT = 'OUI' THEN 1 ELSE 0 END as TYPE_MUT,
            CASE WHEN Niv_Mutuelle IS NULL THEN 0 ELSE 1 END as Niv_Mutuelle,
            CASE WHEN NIV_MUTUELLE[2] IS NULL THEN 0 ELSE 1 END as Niv_Mutuelle2,
            CASE WHEN NIV_MUTUELLE[3] IS NULL THEN 0 ELSE 1 END as Niv_Mutuelle3,
            PART_MUT_SAL,
            PART_MUT_SAL_Option
        FROM B_EMPLOYE
        WHERE PK_FP IS NOT NULL
    """)
    print(' Transferring...')
    #for row in fmsCursor:
    insert = """INSERT INTO Identite (
            PK_FP,
            LieuNaissance,
            PaysNaissance,
            Nationalite,
            PaysResidence,
            PersonneAPrevenir,
            Tel1_PAP,
            Tel2_PAP,
            CategorieMutuelle,
            Ep1_MUTUELLE,
            TypeMutuelle,
            NiveauMutuelle,
            NiveauMutuelle2,
            NiveauMutuelle3,
            PartMutuelleSalarie,
            PartMutuelleSalarieOption
        ) VALUES (
            %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s
        )"""
    values = fmsCursor.fetchall()
    mariadbCursor.executemany(insert, values)
    mariaConnector.commit()
    print(' Inserted '+str(len(values))+' values')
    return len(values)
And the part where I retrieve first name and last name:
def updateEmployeeNames(mariaConnector, mssqlConnector):
    print("Updating employee names...")
    mariadbCursor = mariaConnector.cursor()
    mssqlCursor = mssqlConnector.cursor()
    mssqlCursor.execute("SELECT Name, LastName, PK_FP FROM F_Person")
    rows = mssqlCursor.fetchall()
    query = """
        UPDATE Identite
        SET Name = %s, LastName = %s
        WHERE PK_FP = %s
    """
    mariadbCursor.executemany(query, rows)
    mariaConnector.commit()
As you might have guessed, the first function takes almost no time to execute (less than 2 seconds), whereas the second one takes almost 20.
Python is not my strong suit, but there might be another way; the aim is to make it much faster.
I already tried adding the names to each tuple in createIdentite before the executemany call, but MySQL Connector won't let me do that.
Thanks a lot for your help.

So the UPDATE on the existing MariaDB table is the bottleneck; in that case it might be faster to do the update on a pandas DataFrame and then push the result to the MariaDB table using the pandas to_sql method. A simplified example would be ...
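import pandas as pd
import sqlalchemy

df_main = pd.read_sql_query(fms_query, fms_engine, index_col='PK_FP')
df_mssql = pd.read_sql_query(mssql_query, mssql_engine, index_col='PK_FP')
# update() only overwrites columns df_main already has, so add empty name columns first
df_main = df_main.assign(Name=None, LastName=None)
df_main.update(df_mssql)
df_main.to_sql('Identite', mariadb_engine, if_exists='replace',
               dtype={'PK_FP': sqlalchemy.types.String(50)})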
... where fms_query and mssql_query are the queries from your question. fms_engine, mssql_engine, and mariadb_engine would be SQLAlchemy Engine objects.

In all MySQL Python drivers, executemany() is rewritten, because bulk operations are not supported in MySQL. In MariaDB they are supported only via the binary protocol, starting with 10.2; full support (including DELETE and UPDATE) was added later and is available in the latest 10.2, 10.3 and 10.4 versions of MariaDB Server.
For an INSERT, the Python driver iterates over the rows and rewrites the statement to
INSERT INTO t1 VALUES (row1_id, row1_data), (row2_id, row2_data), ..., (rowN_id, rowN_data)
This is quite fast, but SQL syntax doesn't allow this for UPDATE or DELETE. In that case the driver has to execute the statement n times (n = number of rows), sending the values for one row with each statement.
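To make the INSERT rewrite concrete, here is a minimal sketch of what such a rewrite does conceptually: one parameterized INSERT plus N rows becomes a single statement with N value groups and a flattened parameter list. `build_multirow_insert` is a hypothetical helper written for illustration, not the actual code of any driver, and it assumes `%s`-style placeholders.

```python
from typing import Any, List, Sequence, Tuple


def build_multirow_insert(table: str, columns: Sequence[str],
                          rows: Sequence[Sequence[Any]]) -> Tuple[str, List[Any]]:
    """Collapse N rows into one INSERT ... VALUES (...), (...), ... statement.

    Table and column names are inserted verbatim, so they must come from
    trusted code, never from user input. Only the values are parameterized.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    width = len(columns)
    # One "(%s, %s, ...)" placeholder group per row
    group = "(" + ", ".join(["%s"] * width) + ")"
    params: List[Any] = []
    for row in rows:
        if len(row) != width:
            raise ValueError("every row must have one value per column")
        params.extend(row)  # flatten all row values into a single list
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([group] * len(rows)))
    return sql, params
```

There is no equivalent multi-row form for `UPDATE ... WHERE PK_FP = %s`, so without protocol-level bulk support the driver falls back to one round trip per row, which is why the second function in the question is so much slower than the first.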
The MariaDB binary protocol allows the statement to be prepared once and then executed by sending all the data at once (the execute packet also contains the data).
If C is an option, take a look at the bulk unit tests in the MariaDB Connector/C GitHub repository. Otherwise you will have to wait: MariaDB will likely release its own Python driver next year.

Create the index as you create the temp table.
These combined statements work: CREATE TABLE ... SELECT ...; and INSERT INTO table ... SELECT .... However, they may be difficult to perform from Python.
It is unclear whether you need the temp table at all.
Learn how to use JOIN to get information simultaneously from two tables.
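One way to apply the JOIN advice here, given that the names come from a different server, is to copy them into a temporary staging table with a single executemany INSERT (the kind of statement drivers can batch) and then update Identite with one set-based statement instead of one UPDATE per row. The sketch below runs against an in-memory SQLite database as a stand-in for MariaDB, so it uses `?` placeholders (MySQL drivers use `%s`) and a portable correlated-subquery UPDATE; the MariaDB JOIN form is shown in a comment. The staging table name `tmp_names` and the function name are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def update_names_via_staging(conn, rows: Iterable[Tuple[str, str, str]]) -> None:
    """rows are (Name, LastName, PK_FP) tuples, e.g. fetched from F_Person."""
    cur = conn.cursor()
    # Hypothetical staging table; the PRIMARY KEY gives the lookups an index
    cur.execute("CREATE TEMPORARY TABLE tmp_names ("
                "PK_FP VARCHAR(50) PRIMARY KEY, Name TEXT, LastName TEXT)")
    # Bulk INSERT into the staging table
    cur.executemany(
        "INSERT INTO tmp_names (Name, LastName, PK_FP) VALUES (?, ?, ?)", rows)
    # One set-based UPDATE instead of one UPDATE per employee.
    # In MariaDB, the multi-table JOIN form would be:
    #   UPDATE Identite i JOIN tmp_names t ON t.PK_FP = i.PK_FP
    #   SET i.Name = t.Name, i.LastName = t.LastName
    cur.execute("""
        UPDATE Identite
        SET Name = (SELECT t.Name FROM tmp_names t WHERE t.PK_FP = Identite.PK_FP),
            LastName = (SELECT t.LastName FROM tmp_names t WHERE t.PK_FP = Identite.PK_FP)
        WHERE PK_FP IN (SELECT PK_FP FROM tmp_names)
    """)
    cur.execute("DROP TABLE tmp_names")
    conn.commit()


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE Identite (PK_FP VARCHAR(50) PRIMARY KEY, Name TEXT, LastName TEXT)")
    demo.executemany("INSERT INTO Identite (PK_FP) VALUES (?)", [("E1",), ("E2",)])
    update_names_via_staging(demo, [("Ada", "Lovelace", "E1")])
    print(demo.execute("SELECT PK_FP, Name, LastName FROM Identite ORDER BY PK_FP").fetchall())
```

Employees with no match in the staging table are left untouched because of the `WHERE PK_FP IN (...)` filter.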

Related

update the last entered value from a selection of values in a database with python , mysql

Okay, so I have a table with a student ID column, and the student ID is used as the identifier when updating a row. But if the same student borrows a book twice, all of that student's rows get updated, which I don't want. I want only the most recently entered row for that student ID to be updated, and using a Sl.No is not a solution here because it's impractical. I am using the Python connector. Please help :) Thanks in advance.
Code I use right now:
con = mysql.connect(host='localhost', user='root',
                    password='monkey123', database='BOOK')
c = con.cursor()
c.execute(
    f"UPDATE library set `status`='Returned',`date returned`='{str(cal.selection_get())}' WHERE `STUDENT ID`='{e_sch.get()}';")
c.execute('commit')
con.close()
messagebox.showinfo(
    'Success', 'Book has been returned successfully')

If I followed you correctly, you want to update just one record that matches the WHERE condition. To do this reliably, you need a column that defines the ordering of the records. It could be a date, an incrementing ID, or something else. I assume that such a column exists in your table and is called ordering_column.
A simple option is to use ORDER BY and LIMIT in the UPDATE statement, like so:
sql = """
    UPDATE library
    SET `status` = 'Returned', `date returned` = %s
    WHERE `STUDENT ID` = %s
    ORDER BY ordering_column DESC
    LIMIT 1
"""
c = con.cursor()
c.execute(sql, (str(cal.selection_get()), e_sch.get()))
con.commit()
Note that I modified your code so input values are given as parameters rather than concatenated into the query string. This is an important change that makes your code safer and more efficient.
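A small illustration of the "safer" part, using Python's built-in sqlite3 module as a stand-in for the MySQL connector (sqlite3 uses `?` placeholders where MySQL drivers use `%s`). A student ID containing a quote breaks a query built with an f-string, but is passed through unchanged as a parameter. The table layout and the value `O'Brien` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE library (student_id TEXT, status TEXT)")
conn.execute("INSERT INTO library VALUES (?, ?)", ("O'Brien", "Borrowed"))

student = "O'Brien"  # e.g. the value typed into the entry widget

# String interpolation: the quote ends the SQL literal early -> syntax error
try:
    conn.execute(f"UPDATE library SET status = 'Returned' WHERE student_id = '{student}'")
    interpolated_ok = True
except sqlite3.OperationalError:
    interpolated_ok = False

# Parameter binding: the driver sends the value separately from the SQL text
cur = conn.execute("UPDATE library SET status = 'Returned' WHERE student_id = ?", (student,))
conn.commit()
print(interpolated_ok, cur.rowcount)
```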

How to make automatically items id numeration?

I'm trying to insert some data into an SQL database, and I'm really green at this. So the MAIN problem is: how can I sort all the items in the table? I have 3 main columns: ID, CARNUM, TIME. But with this insertion I have to type the ID manually. How can I make the system generate a numeric ID automatically?
Here's the insertion code:
postgres_insert_query = """ INSERT INTO Vartotojai (ID, CARNUM, TIME) VALUES (%s,%s,%s)"""
record_to_insert = (id, car_numb, Reg_Tikslus_Laikas)
cursor.execute(postgres_insert_query, record_to_insert)
connection.commit()
count = cursor.rowcount
print (count, "Record inserted successfully into mobile table")

You could change the data type of ID to serial, which is an auto-incrementing integer. This means you don't have to enter an ID manually when inserting into the database.
Read more about datatype serial: source
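A minimal sketch of the insertion with an auto-generated ID, assuming ID is declared as SERIAL. The table definition below is an assumption: the question doesn't show the real DDL or the type of TIME. The INSERT simply leaves ID out of the column list, and PostgreSQL's RETURNING clause hands back the generated value. The function expects a psycopg2-style connection (cursors usable as context managers, `%s` placeholders); `insert_record` is a hypothetical name.

```python
# Assumed table definition (run once): SERIAL makes ID an auto-incrementing integer.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS Vartotojai (
    ID     SERIAL PRIMARY KEY,
    CARNUM TEXT,
    TIME   TIMESTAMP
)
"""

# ID is omitted from the column list, so PostgreSQL fills it from the sequence
INSERT_SQL = "INSERT INTO Vartotojai (CARNUM, TIME) VALUES (%s, %s) RETURNING ID"


def insert_record(connection, car_numb, reg_time):
    """Insert one row and return the ID PostgreSQL generated for it."""
    with connection.cursor() as cursor:
        cursor.execute(INSERT_SQL, (car_numb, reg_time))
        new_id = cursor.fetchone()[0]  # value produced by RETURNING ID
    connection.commit()
    return new_id
```

With this in place, the insertion from the question becomes `new_id = insert_record(connection, car_numb, Reg_Tikslus_Laikas)`.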

Updating timestamp each time a row is added?

I have code that loops, adding a row of information on each iteration. However, each row does not get a new timestamp; instead it has the same one as the very first row, which leads me to believe that the value of current_timestamp is not updating each time. How do I fix this problem? Here is my code:
if __name__ == "__main__":
    main()
    deleteAll()  # Clears current table
    ID = 0
    while ID < 100:
        insert(ID, 'current_date', 'current_timestamp')
        ID += 1
    conn.commit()
My insert function:
from psycopg2.extensions import AsIs

def insert(ID, date, timestamp):  # Assumes table name is test1
    cur.execute(
        """INSERT INTO test1 (ID, date, timestamp) VALUES (%s, %s, %s);""",
        (ID, AsIs(date), AsIs(timestamp)))
This code is in Python, by the way, and it uses PostgreSQL for the database.

The immediate fix is to commit after each insert; otherwise all of the inserts are done inside a single transaction:
while ID < 100:
    insert(ID, 'current_date', 'current_timestamp')
    ID += 1
    conn.commit()
http://www.postgresql.org/docs/current/static/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT
Since these functions return the start time of the current transaction, their values do not change during the transaction. This is considered a feature: the intent is to allow a single transaction to have a consistent notion of the "current" time, so that multiple modifications within the same transaction bear the same time stamp.
Those functions should not be passed as parameters but included in the SQL statement:
def insert(ID):  # Assumes table name is test1
    cur.execute("""
        INSERT INTO test1 (ID, date, timestamp)
        VALUES (%s, current_date, current_timestamp);
    """, (ID,))
The best practice is to keep the commit outside of the loop to have a single transaction:
while ID < 100:
    insert(ID)
    ID += 1
conn.commit()
and use the statement_timestamp function which, as the name implies, returns the statement timestamp instead of the transaction start timestamp:
INSERT INTO test1 (ID, date, timestamp)
values (%s, statement_timestamp()::date, statement_timestamp())

SQLite insert or ignore and return original _rowid_

I've spent some time reading the SQLite docs, various questions and answers here on Stack Overflow, and this thing, but have not come to a full answer.
I know that there is no way to do something like INSERT OR IGNORE INTO foo VALUES(...) with SQLite and get back the rowid of the original row, and that the closest to it would be INSERT OR REPLACE but that deletes the entire row and inserts a new row and thus gets a new rowid.
Example table:
CREATE TABLE foo(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    data TEXT
);
Right now I can do:
sql = sqlite3.connect(":memory:")
# create database
sql.execute("INSERT OR IGNORE INTO foo(data) VALUES(?);", ("Some text.", ))
the_id_of_the_row = None
for row in sql.execute("SELECT id FROM foo WHERE data = ?", ("Some text.", )):
    the_id_of_the_row = row[0]
But something ideal would look like:
the_id_of_the_row = sql.execute("INSERT OR IGNORE INTO foo(data) VALUES(?)", ("Some text", )).lastrowid
What is the best (read: most efficient) way to insert a row into a table and return the rowid, or to ignore the row if it already exists and just get the rowid? Efficiency is important because this will be happening quite often.
Is there a way to INSERT OR IGNORE and return the rowid of the row that the ignored row was compared to? This would be great, as it would be just as efficient as an insert.

The way that worked best for me was to insert or ignore the values, and then select the rowid, in two separate steps. I used a unique constraint on the data column both to speed up selects and to avoid duplicates.
sql.execute("INSERT OR IGNORE INTO foo(data) VALUES(?);", ("Some text.", ))
last_row_id = sql.execute("SELECT id FROM foo WHERE data = ?;", ("Some text.", )).fetchone()[0]
The select statement isn't as slow as I thought it would be. This, it seems, is due to SQLite automatically creating an index for the unique columns.
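A minimal sketch of this two-step approach as a reusable function, assuming the UNIQUE constraint on data described above (the function name is hypothetical). INSERT OR IGNORE only adds the row if the value is new, and the SELECT then finds the id through the index SQLite creates for the UNIQUE column, whether the row was just inserted or already existed.

```python
import sqlite3


def get_or_create_id(conn: sqlite3.Connection, data: str) -> int:
    """Return the id of the row holding `data`, inserting it first if needed."""
    conn.execute("INSERT OR IGNORE INTO foo(data) VALUES (?)", (data,))
    # Uses the automatic index on the UNIQUE column, so this lookup is cheap
    row = conn.execute("SELECT id FROM foo WHERE data = ?", (data,)).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo(id INTEGER PRIMARY KEY AUTOINCREMENT, data TEXT UNIQUE)")
first = get_or_create_id(conn, "Some text.")
again = get_or_create_id(conn, "Some text.")   # same value: no new row
other = get_or_create_id(conn, "Other text.")  # new value: new row
conn.commit()
print(first, again, other)
```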

INSERT OR IGNORE is for situations where you do not care about the identity of the record; where the goal is only to have some record with that specific value.
If you want to know whether a new record is inserted or not, you have to check by hand:
the_id_of_the_row = None
for row in sql.execute("SELECT id FROM foo WHERE data = ?", ...):
    the_id_of_the_row = row[0]
if the_id_of_the_row is None:
    c = sql.cursor()
    c.execute("INSERT INTO foo(data) VALUES(?)", ...)
    the_id_of_the_row = c.lastrowid
As for efficiency: when SQLite checks the data column for duplicates, it has to do exactly the same lookup that you're doing with the SELECT, and once you've done that, the access path is in the cache, so performance should not be a problem. In any case, you need two separate queries, an INSERT and a SELECT (in either order; both your code and mine work, but yours is simpler).

Python MySQLdb : Duplicate entry '2147483647' for key 1

I am getting this error when I run my program in Python.
Here's the structure of my table:
Field        | Type        | Collation         | Null | Key | Default
articleCode  | varchar(25) | latin1_swedish_ci | NO   | UNI |
dateReceived | datetime    | NULL              | NO   | MUL | 0000-00-00 00:00:00
s100RSD      | datetime    | NULL              | YES  |     | 0000-00-00 00:00:00
remarks      | longtext    | latin1_swedish_ci | YES  |     |
To simplify the problem, here is the isolated part of the program that produces the error:
import MySQLdb

def main():
    dateReceived = '2011-10-07 01:06:30'
    articleCode = 'name'
    s100rsd = '2011-10-07 01:06:30'
    remark_text = 'This is a remark'
    db = MySQLdb.connect('server', 'user', 'passwd', 'table_name', port)
    cur = db.cursor()
    db_query = cur.execute("INSERT INTO tblS100CurrentListing (articleCode, dateReceived, s100RSD, remarks) VALUES ('articleCode', 'dateReceived', 's100rsd', 'remark_text')")
    cur.close()
    db.close()

if __name__ == '__main__':
    main()
Here's the error I get: _mysql_exceptions.IntegrityError: (1062, "Duplicate entry '2147483647' for key 1")
Thanks for all your help!

You seem to be inserting constants into the database, not your actual values. Instead, try something similar to:
db_query = cur.execute("INSERT INTO tblS100CurrentListing " +
                       "(articleCode, dateReceived, s100RSD, remarks) VALUES (%s, %s, %s, %s)",
                       (articleCode, dateReceived, s100rsd, remark_text))
db.commit()  # MySQLdb does not autocommit by default

This happens because of the limit of the key column's type. If it is INTEGER, 2147483647 is the maximum value, so every insert after that point tries to write 2147483647 again. That is why you are having this problem. Change the column to BIGINT or another type larger than INTEGER.
Hope it helps.
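The number in the error message is exactly the largest value a signed 32-bit integer can hold, which is what this answer's reasoning rests on. A quick check of the signed ranges of MySQL's INT (4 bytes) and BIGINT (8 bytes); `signed_range` is just an illustrative helper:

```python
def signed_range(num_bytes: int) -> tuple:
    """Return (min, max) for a signed two's-complement integer of num_bytes bytes."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


INT_MIN, INT_MAX = signed_range(4)        # MySQL INT
BIGINT_MIN, BIGINT_MAX = signed_range(8)  # MySQL BIGINT
print(INT_MAX)     # 2147483647
print(BIGINT_MAX)  # 9223372036854775807
```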

The unique key on the articleCode field prevents MySQL from having two records with the same content in this column. It seems you already inserted one on the first program run.
Remove the previously inserted record with articleCode = 'name', OR remove the UNIQUE KEY on the articleCode field, OR try to insert a different articleCode value.
Hope this helps!

After correcting the code as described in other answers, you should modify the table in order to reset its auto_increment counter.
ALTER TABLE tblS100CurrentListing auto_increment=1
should reset the counter to the lowest possible value.
Removing or repairing the erroneous values from the table is required; otherwise the change won't have any effect.
Also, do you really need to insert a value into a field that is set to auto_increment? Or is this part of a restore process? Otherwise, the two things are redundant: either you get the values automatically or you insert them yourself. Doing both can (as seen) lead to conflicts.
