I am moving data from MySQL to Postgres, and my code is below:
import os, re, time, codecs, glob, sqlite3
from StringIO import StringIO
import psycopg2, MySQLdb, datetime, decimal
from datetime import date
import gc
tables = (['table1' , 27],)
conn = psycopg2.connect("dbname='xxx' user='xxx' host='localhost' password='xxx' ")
curpost = conn.cursor()
db = MySQLdb.connect(host="127.0.0.1", user="root", passwd="root" , unix_socket='/var/mysql/mysql.sock', port=3306 )
cur = db.cursor()
cur.execute('use xxx;')
for t in tables:
    print t
    curpost.execute( "truncate table " + t[0] )
    cur.execute("select * from "+ t[0] )
    a = ','.join( '%s' for i in range(t[1]) )
    qry = "insert into " + t[0] + " values ( " + a +" )"
    print qry
    i = 0
    while True:
        rows = cur.fetchmany(5000)
        if not rows: break
        string = ''
        for row in rows:
            string = string + ('|'.join([str(x) for x in row])) + "\n"
        curpost.copy_from(StringIO(string), t[0], sep="|", null="None" )
        i += curpost.rowcount
        print i , " loaded"
        curpost.connection.commit()
        del string, row, rows
        gc.collect()
curpost.close()
cur.close()
For small tables, the code runs fine. However, for the larger ones (3.6 million records), the moment the MySQL execute (cur.execute("select * from "+ t[0] )) runs, memory utilization on the machine shoots up. This happens even though I use fetchmany, so records should only arrive in batches of 5000. I have also tried batches of 500, with the same result. For large tables, fetchmany does not seem to work as documented.
Edit - I added garbage collection and del statements. Memory still keeps growing until all records have been processed.
Any ideas?
Sorry if I'm wrong, but I understood that you don't want to change the query.
In case you have no other choice, you can try the following.
Replace this fragment:
    cur.execute("select * from "+ t[0] )
    a = ','.join( '%s' for i in range(t[1]) )
    qry = "insert into " + t[0] + " values ( " + a +" )"
    print qry
    i = 0
    while True:
        rows = cur.fetchmany(5000)
with this one:
    a = ','.join( '%s' for i in range(t[1]) )
    qry = "insert into " + t[0] + " values ( " + a +" )"
    print qry
    i = 0
    while True:
        # i is an int, so convert it before concatenating it into the query
        cur.execute("select * from " + t[0] + " LIMIT " + str(i) + ", 5000")
        rows = cur.fetchall()
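A likely cause of the memory growth, as far as I know, is that MySQLdb's default cursor pulls the entire result set into client memory when execute() runs, so fetchmany only slices rows that are already loaded; MySQLdb.cursors.SSCursor streams rows from the server instead. The batching loop itself can be sketched independently of the driver. The helper name iter_batches is hypothetical, and sqlite3 stands in for MySQL here only so that the example runs on its own:

```python
import sqlite3

def iter_batches(cursor, size=5000):
    """Yield lists of at most `size` rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        yield rows

# sqlite3 is only a stand-in so the sketch runs without a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(n, "row%d" % n) for n in range(12)])

cur = conn.cursor()
cur.execute("SELECT * FROM table1 ORDER BY id")
# Each batch holds at most 5 rows; only one batch is kept in a Python list at a time.
batch_sizes = [len(batch) for batch in iter_batches(cur, size=5)]
print(batch_sizes)
conn.close()
```

With MySQLdb, the same loop would be fed by a cursor created with db.cursor(MySQLdb.cursors.SSCursor); that call is not exercised in this sketch.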
I am pretty new to Python development. I have a long Python script that "clones" a database and adds additional stored functions and procedures. "Clone" here means copying only the schema of the DB. These steps work fine.
My question is about pymysql insert execution:
I have to copy some table contents into the new DB. I don't get any SQL error. If I debug or print the generated INSERT INTO command, it is correct (I've tested it in an SQL editor). The insert executes correctly, because the result contains the exact row count... but all rows are missing from the destination table in the destination DB.
(Of course, the DB_* variables have been defined!)
import datetime
import pymysql
liveDbConn = pymysql.connect(DB_HOST, DB_USER, DB_PWD, LIVE_DB_NAME)
testDbConn = pymysql.connect(DB_HOST, DB_USER, DB_PWD, TEST_DB_NAME)
tablesForCopy = ['role', 'permission']
for table in tablesForCopy:
    with liveDbConn.cursor() as liveCursor:
        # Get name of columns
        liveCursor.execute("DESCRIBE `%s`;" % (table))
        columns = '';
        for column in liveCursor.fetchall():
            columns += '`' + column[0] + '`,'
        columns = columns.strip(',')
        # Get and convert values
        values = ''
        liveCursor.execute("SELECT * FROM `%s`;" % (table))
        for result in liveCursor.fetchall():
            data = []
            for item in result:
                if type(item)==type(None):
                    data.append('NULL')
                elif type(item)==type('str'):
                    data.append("'"+item+"'")
                elif type(item)==type(datetime.datetime.now()):
                    data.append("'"+str(item)+"'")
                else: # for numeric values
                    data.append(str(item))
            v = '(' + ', '.join(data) + ')'
            values += v + ', '
        values = values.strip(', ')
        print("### table: %s" % (table))
        testDbCursor = testDbConn.cursor()
        testDbCursor.execute("INSERT INTO `" + TEST_DB_NAME + "`.`" + table + "` (" + columns + ") VALUES " + values + ";")
        print("Result: {}".format(testDbCursor._result.message))
liveDbConn.close()
testDbConn.close()
Result is:
### table: role
Result: b"'Records: 16 Duplicates: 0 Warnings: 0"
### table: permission
Result: b'(Records: 222 Duplicates: 0 Warnings: 0'
What am I doing wrong? Thanks!
You have two main issues here:
You don't call conn.commit() (which here would be liveDbConn.commit() or testDbConn.commit()). Changes to the database are not persisted until you commit them. Note that every statement that modifies data needs a commit, whereas a SELECT, for example, does not.
Your query is open to SQL injection. This is a serious problem.
Table names cannot be parameterized, so there's not much we can do about that, but you'll want to parameterize your values. I've made several corrections to the code, both to the type checking and to the parameterization.
for table in tablesForCopy:
    with liveDbConn.cursor() as liveCursor:
        liveCursor.execute("SELECT * FROM `%s`;" % (table))
        name_of_columns = [item[0] for item in liveCursor.description]
        insert_list = []
        for result in liveCursor.fetchall():
            data = []
            for item in result:
                if item is None: # test identity against the None singleton
                    data.append(None) # the driver sends None as SQL NULL
                elif isinstance(item, str): # Use isinstance to check type
                    data.append(item)
                elif isinstance(item, datetime.datetime):
                    data.append(item.strftime('%Y-%m-%d %H:%M:%S'))
                else: # for numeric values
                    data.append(str(item))
            insert_list.append(data)
    if not insert_list: # nothing to copy for this table
        continue
    testDbCursor = testDbConn.cursor()
    # Backticks quote identifiers only; value placeholders stay as bare %s
    columns = ', '.join('`{}`'.format(name) for name in name_of_columns)
    placeholders = ', '.join(['%s' for item in insert_list[0]])
    testDbCursor.executemany("INSERT INTO `{}`.`{}` ({}) VALUES ({})".format(
        TEST_DB_NAME,
        table,
        columns,
        placeholders),
        insert_list)
    testDbConn.commit()
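If you are using psycopg2 (PostgreSQL) rather than pymysql, note that, according to this GitHub thread, executemany does not batch rows in psycopg2; it sends each entry as a separate query. In that case, use execute_batch instead (PostgreSQL does not accept backtick quoting, so the identifiers are left unquoted here):
from psycopg2.extras import execute_batch
execute_batch(testDbCursor,
              "INSERT INTO {}.{} ({}) VALUES ({})".format(
                  TEST_DB_NAME,
                  table,
                  ', '.join(name_of_columns),
                  ', '.join(['%s'] * len(name_of_columns))),
              insert_list)
testDbConn.commit()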
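As a self-contained illustration of the two points in this answer (parameterized values and an explicit commit), here is a minimal sketch. It uses sqlite3 as a stand-in so it can run without a MySQL server; the table layout, the file name dest.db, and the helper names are assumptions made for the example, and sqlite3 uses ? where pymysql uses %s:

```python
import os
import sqlite3
import tempfile

def count_rows(conn):
    # fetchall() exhausts the statement, so no read lock lingers on the file
    return conn.execute("SELECT COUNT(*) FROM role").fetchall()[0][0]

def demo_commit():
    """Show that a second connection sees inserted rows only after commit()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dest.db")
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        writer.execute("CREATE TABLE role (id INTEGER, name TEXT)")
        writer.commit()
        rows = [(1, "admin"), (2, "O'Brien")]  # the quote needs no manual escaping
        writer.executemany("INSERT INTO role (id, name) VALUES (?, ?)", rows)
        before = count_rows(reader)  # insert not committed yet
        writer.commit()
        after = count_rows(reader)   # now visible to other connections
        writer.close()
        reader.close()
        return before, after

before, after = demo_commit()
print(before, after)
```

The writer reports the rows as inserted in both cases; only the commit makes them visible to anyone else, which matches the symptom in the question.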
How to insert data into a table using Python and pymysql
Find my solution below:
import pymysql
import datetime
# Create a connection object
dbServerName = "127.0.0.1"
port = 8889
dbUser = "root"
dbPassword = ""
dbName = "blog_flask"
# charSet = "utf8mb4"
conn = pymysql.connect(host=dbServerName, user=dbUser, password=dbPassword, db=dbName, port=port)
try:
    # Create a cursor object
    cursor = conn.cursor()
    # Insert rows into the MySQL Table
    now = datetime.datetime.utcnow()
    my_datetime = now.strftime('%Y-%m-%d %H:%M:%S')
    cursor.execute('INSERT INTO posts (post_id, post_title, post_content, \
        filename,post_time) VALUES (%s,%s,%s,%s,%s)',(5,'title2','description2','filename2',my_datetime))
    conn.commit()
except Exception as e:
    print("Exception occurred:{}".format(e))
finally:
    conn.close()
I am moving data from MySQL to MSSQL. However, my INSERT INTO statement fails when a value contains a single quote (').
For the export I used the code below:
import pymssql
import mysql.connector
conn = pymssql.connect(host='XXX', user='XXX',
                       password='XXX', database='XXX')
sqlcursor = conn.cursor()
cnx = mysql.connector.connect(user='root',password='XXX',
                              database='XXX')
cursor = cnx.cursor()
sql= "SELECT Max(ID) FROM XXX;"
cursor.execute(sql)
row=cursor.fetchall()
maxID = str(row)
maxID = maxID.replace("[(", "")
maxID = maxID.replace(",)]", "")
AMAX = int(maxID)
LC = 1
while LC <= AMAX:
    LCC = str(LC)
    sql= "SELECT * FROM XX where ID ='"+ LCC +"'"
    cursor.execute(sql)
    result = cursor.fetchall()
    data = str(result)
    data = data.replace("[(","")
    data = data.replace(")]","")
    data = data.replace("None","NULL")
    #print(row)
    si = "insert into [XXX].[dbo].[XXX] select " + data
    #print(si)
    #sys.exit("stop")
    try:
        sqlcursor.execute(si)
        conn.commit()
    except Exception:
        print("-----------------------")
        print(si)
    LC = LC + 1
print('Import done | total count:', LC)
It works fine until one of my values contains a ':
'N', '0000000000', **"test string'S nice company"**
I would like to avoid splitting the data into columns and then checking each one for a ', as my table has about 500 fields.
Is there a smart way of replacing ' with ''?
Answer:
Added SET QUOTED_IDENTIFIER OFF to the insert statement:
si = "SET QUOTED_IDENTIFIER OFF insert into [TechAdv].[dbo].[aem_data_copy] select " + data
In MSSQL, once you SET QUOTED_IDENTIFIER OFF, double quotes delimit string literals, so a single quote inside a double-quoted string needs no escaping. Inside a single-quoted string, you still escape a single quote by doubling it.
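An alternative that avoids quote handling altogether is to pass the row as parameters and let the driver do the escaping, which also works for wide tables because the placeholder list can be generated from the row length. The sketch below is an illustration under assumptions: build_insert is a hypothetical helper, the three-column table is invented for the example, and sqlite3 (with ? placeholders) stands in for the real target. pymssql, as far as I know, uses %s-style placeholders instead.

```python
import sqlite3

def build_insert(table, column_count):
    """Build an INSERT with one placeholder per column (qmark style for sqlite3)."""
    placeholders = ", ".join(["?"] * column_count)
    return "INSERT INTO {} VALUES ({})".format(table, placeholders)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aem_data_copy (flag TEXT, code TEXT, company TEXT)")

# A row as the source cursor would return it, including an embedded single quote.
row = ("N", "0000000000", "test string'S nice company")
conn.execute(build_insert("aem_data_copy", len(row)), row)
conn.commit()

stored = conn.execute("SELECT company FROM aem_data_copy").fetchall()[0][0]
print(stored)
conn.close()
```

Because the values never become part of the SQL text, None is also sent as NULL without the string replacement used in the question.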
I am trying to fetch records at regular intervals from a database table that keeps growing. I am using Python and its pyodbc package to fetch the records. How can I position the cursor just after the last row that was read, so that each fetch returns only the records inserted since the previous one?
To explain further:
My table has 100 records, and they are fetched.
After an interval, the table has 200 records, and I want to fetch rows 101 to 200. And so on.
Is there a way to do this with a pyodbc cursor?
Any other suggestion would also be very helpful.
Below is the code I am trying:
#!/usr/bin/python
import pyodbc
import csv
import time
conn_str = (
    "DRIVER={PostgreSQL Unicode};"
    "DATABASE=postgres;"
    "UID=userid;"
    "PWD=database;"
    "SERVER=localhost;"
    "PORT=5432;"
)
conn = pyodbc.connect(conn_str)
cursor = conn.cursor()
def fetch_table(**kwargs):
    qry = kwargs['qrystr']
    try:
        #cursor = conn.cursor()
        cursor.execute(qry)
        all_rows = cursor.fetchall()
        rowcnt = cursor.rowcount
        rownum = cursor.description
        #return (rowcnt, rownum)
        return all_rows
    except pyodbc.ProgrammingError as e:
        print ("Exception occurred as :", type(e) , e)
def poll_db():
    for i in [1, 2]:
        stmt = "select * from my_database_table"
        rows = fetch_table(qrystr = stmt)
        print("***** For i = " , i , "******")
        for r in rows:
            print("ROW-> ", r)
        time.sleep(10)
poll_db()
conn.close()
I don't think you can use pyodbc, or any other ODBC package, to find "new" rows. But if there is a 'timestamp' column in your table, or if you can add one (some databases can populate it automatically with the time of insertion, so you don't have to change the insert queries), then you can change your query to select only the rows whose timestamp is greater than the previous timestamp, and update the prev_timestamp variable on each iteration.
import datetime  # needed for datetime.datetime.now()
def poll_db():
    prev_timestamp = ""
    for i in [1, 2]:
        if prev_timestamp == "":
            stmt = "select * from my_database_table"
        else:
            # convert your timestamp str to match the database's format;
            # the value must be quoted to form a valid SQL literal
            stmt = "select * from my_database_table where timestamp > '" + str(prev_timestamp) + "'"
        rows = fetch_table(qrystr = stmt)
        prev_timestamp = datetime.datetime.now()
        print("***** For i = " , i , "******")
        for r in rows:
            print("ROW-> ", r)
        time.sleep(10)
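The same idea can be sketched with a "high-water mark" taken from the rows themselves instead of the client clock, which avoids depending on the client and server clocks agreeing. This is only a sketch under assumptions: it presumes the table has a monotonically increasing id column (not stated in the question), the function name fetch_new_rows and the payload column are hypothetical, and sqlite3 stands in for the ODBC connection so the example is runnable.

```python
import sqlite3

def fetch_new_rows(conn, last_id):
    """Return rows with id greater than last_id, plus the updated high-water mark."""
    rows = conn.execute(
        "SELECT id, payload FROM my_database_table WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    if rows:
        last_id = rows[-1][0]  # remember the largest id seen so far
    return rows, last_id

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_database_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO my_database_table (payload) VALUES (?)", [("a",), ("b",)])

first, mark = fetch_new_rows(conn, 0)      # first poll: everything so far
conn.executemany("INSERT INTO my_database_table (payload) VALUES (?)", [("c",)])
second, mark = fetch_new_rows(conn, mark)  # second poll: only the new row
print(first, second, mark)
conn.close()
```

In a real poller, the two calls would sit inside the loop with time.sleep between them, and mark would be persisted if the process can restart.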
Can someone please explain how I can get the tables in the current database?
I am using postgresql-8.4 psycopg2.
This did the trick for me:
cursor.execute("""SELECT table_name FROM information_schema.tables
       WHERE table_schema = 'public'""")
for table in cursor.fetchall():
    print(table)
pg_class stores all the required information.
Executing the query below returns the user-defined tables as a list of tuples:
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()
cursor.execute("select relname from pg_class where relkind='r' and relname !~ '^(pg_|sql_)';")
print cursor.fetchall()
output:
[('table1',), ('table2',), ('table3',)]
The question is about using Python's psycopg2 to work with Postgres. Here are two handy functions:
def table_exists(con, table_str):
    exists = False
    try:
        cur = con.cursor()
        # pass the table name as a parameter instead of concatenating it into the SQL
        cur.execute("select exists(select relname from pg_class where relname=%s)", (table_str,))
        exists = cur.fetchone()[0]
        print(exists)
        cur.close()
    except psycopg2.Error as e:
        print(e)
    return exists
def get_table_col_names(con, table_str):
    col_names = []
    try:
        cur = con.cursor()
        cur.execute("select * from " + table_str + " LIMIT 0")
        for desc in cur.description:
            col_names.append(desc[0])
        cur.close()
    except psycopg2.Error as e:
        print(e)
    return col_names
Here's a Python 3 snippet that includes the connect() parameters and produces a Python list as output:
conn = psycopg2.connect(host='localhost', dbname='mySchema',
                        user='myUserName', password='myPassword')
cursor = conn.cursor()
cursor.execute("""SELECT relname FROM pg_class WHERE relkind='r'
                  AND relname !~ '^(pg_|sql_)';""") # "rel" is short for relation.
tables = [i[0] for i in cursor.fetchall()] # A list() of tables.
Although Kalu has already answered this, the query mentioned there returns both tables and views from the Postgres database. If you need only tables and not views, you can include table_type in your query, like this:
s = "SELECT"
s += " table_schema"
s += ", table_name"
s += " FROM information_schema.tables"
s += " WHERE"
s += " ("
s += " table_schema = '"+SCHEMA+"'"
s += " AND table_type = 'BASE TABLE'"
s += " )"
s += " ORDER BY table_schema, table_name;"
db_cursor.execute(s)
list_tables = db_cursor.fetchall()
You can use this code for Python 3:
import psycopg2
conn = psycopg2.connect(database="your_database", user="postgres", password="",
                        host="127.0.0.1", port="5432")
cur = conn.cursor()
cur.execute("select * from your_table")
rows = cur.fetchall()
conn.close()
I am trying to create a program using Python 3.1 and sqlite3. The program will open a text file, read parameters to pass to the select query, and write the result to an output text file. I am stuck on the cursor.execute(query) statement. I may be doing everything incorrectly. Any help would be appreciated.
import sqlite3
# Connect to database and test
#Make sure the database is in the same folder as the python script folder
conn = sqlite3.connect("nnhs.sqlite3")
if (conn):
    print ("Connection successful")
else:
    print ("Connection not successful")
# Create a cursor to execute SQL queries
cursor = conn.cursor()
# Read data from a file
data = []
infile = open ("patient_in.txt", "r")
for line in infile:
    line = line.rstrip("\n")
    line = line.strip()
    seq = line.split(' ')
    seq[5] = int(seq[5])
    seq = tuple (seq)
    data.append(seq)
infile.close()
# Check that the data has been read correctly
print ()
print ("Check that the data was read from file")
print (data)
# output file
outfile = open("patient_out.txt", "w")
# select statement
query = "SELECT DISTINCT patients.resnum, patients.facnum, patients.sex, patients.age, patients.rxmed, icd9_1.resnum, icd9_1.code "
query += "from patients "
query += "INNER JOIN icd9 as icd9_1 on (icd9_1.resnum = patients.resnum) AND (icd9_1.code LIKE ':6%') "
query += "INNER JOIN icd9 as icd9_2 on (icd9_2.resnum = patients.resnum) AND (icd9_2.code LIKE ':6%') "
query += "(where patients.age >= :2) AND (patients.age <= :3) "
query += "AND patients.sex = :1 "
query += "AND (patients.rxmed >= :4) AND (patients.rxmed <= :5) "
query += "ORDER BY patients.resnum;"
result = cursor.execute(query)
for row in result:
    ResultNumber = row[0]
    FacNumber = row[1]
    Sex = row[2]
    Age = row[3]
    RxMed = row[4]
    ICDResNum = row[5]
    ICDCode = row[6]
    outfile.write("Patient Id Number: " + str(ResultNumber) + "\t" + " ICD Res Num: " + str(ICDResNum) + "\t" + " Condition: " + str(ICDCode) + "\t" + " Fac ID Num: " + str(FacNumber) + "\t" + " Patient Sex: " + str(Sex) + "\t" + " Patient Age: " + str(Age) + "\t" +" Number of Medications: " + str(RxMed) + "\t" + "\n")
# Close the cursor
cursor.close()
# Close the connection
conn.close()
You have read multiple rows of query parameters and stored them in data and then ... nothing. data is a misleading name. Let's call it queries instead.
You presumably want to iterate over queries and perform one query for each row in queries. So do that: for query_row in queries: .....
Also let's rename query to sql.
You'll need result = cursor.execute(sql, query_row)
You'll also need to decide whether you want to have a different output file for each query_row, or have only one file with a field (or sub-heading) to distinguish what info comes from what query_row.
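The loop described above can be sketched roughly as follows. This is a reduced illustration, not the asker's full query: the three-column patients table and its sample rows are invented for the example, and plain ? placeholders are used so that a tuple can be passed directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (resnum INTEGER, sex TEXT, age INTEGER)")
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)",
                 [(1, "F", 70), (2, "M", 82), (3, "F", 91)])

# One statement, reused for every row of query parameters.
sql = ("SELECT resnum FROM patients "
       "WHERE sex = ? AND age >= ? AND age <= ? ORDER BY resnum")

# Each tuple plays the role of one line read from the input file.
queries = [("F", 65, 75), ("F", 60, 95), ("M", 80, 90)]

results = {}
for query_row in queries:
    # The second argument of execute() supplies the values for the placeholders.
    results[query_row] = [r[0] for r in conn.execute(sql, query_row)]
print(results)
conn.close()
```

Keying the results by query_row is one way to keep track of which output came from which set of parameters when everything goes into a single file.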
Update about parameter passing with sqlite3
It appears not to be documented, but if you use numbered place holders, you can supply a tuple of arguments -- you don't need to supply a dict. The following example presumes a database blah.db with an empty table created by
create table foo (id int, name text, amt int);
>>> import sqlite3
>>> conn = sqlite3.connect('blah.db')
>>> curs = conn.cursor()
>>> sql = 'insert into foo values(:1,:2,:1);'
>>> curs.execute(sql, (42, 'bar'))
<sqlite3.Cursor object at 0x01E3D520>
>>> result = curs.execute('select * from foo;')
>>> print list(result)
[(42, u'bar', 42)]
>>> curs.execute(sql, {'1':42, '2':'bar'})
<sqlite3.Cursor object at 0x01E3D520>
>>> result = curs.execute('select * from foo;')
>>> print list(result)
[(42, u'bar', 42), (42, u'bar', 42)]
>>> curs.execute(sql, {1:42, 2:'bar'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
sqlite3.ProgrammingError: You did not supply a value for binding 1.
>>>
Update 2a You have a bug in this line of your SQL (and the following one):
INNER JOIN icd9 as icd9_1 on (icd9_1.resnum = patients.resnum) AND (icd9_1.code LIKE ':6%')
A placeholder inside a quoted SQL string is not substituted: ':6%' is just a literal pattern that starts with the characters :6, so your parameter is never used there. For the Python string "XYZ", what you want is ... LIKE 'XYZ%'). To get that, put ... LIKE :6) in your SQL, and pass e.g. user_input[5].rstrip("%") + "%" (which ensures exactly one trailing %) as the parameter.
Update 2b You can of course use a dictionary for the parameters, as documented, but it would improve legibility considerably if you used meaningful names instead of digits.
For example, ... LIKE :code) instead of the above, and pass e.g. {'code': user_input[5].rstrip("%") + "%", .....} as the second arg of execute()
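A minimal sketch of the named-parameter LIKE described in these updates, assuming an invented icd9 table with a few sample codes; the helper name like_prefix is hypothetical:

```python
import sqlite3

def like_prefix(user_value):
    """Return a LIKE pattern with exactly one trailing % wildcard."""
    return user_value.rstrip("%") + "%"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE icd9 (resnum INTEGER, code TEXT)")
conn.executemany("INSERT INTO icd9 VALUES (?, ?)",
                 [(1, "250.00"), (2, "250.01"), (3, "401.9")])

# The placeholder stands alone; the wildcard travels inside the bound value.
sql = "SELECT resnum FROM icd9 WHERE code LIKE :code ORDER BY resnum"
matches = [r[0] for r in conn.execute(sql, {"code": like_prefix("250%%")})]
print(matches)
conn.close()
```

Here the input "250%%" is normalized to the pattern 250%, so only the codes beginning with 250 match.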
:2? These are placeholders for parameters, which you're not giving it. Take a look at the module docs for how to call execute with a tuple of parameters:
http://docs.python.org/library/sqlite3.html#sqlite3.Cursor.execute
Yeah, it looks like you need to add the parameters. Also, changing the line query += "(where patients.age >= :2) AND (patients.age <= :3) " to query += "where (patients.age >= :2) AND (patients.age <= :3) " might help.
Here is a slightly more Pythonic way of writing your code, although PEP 8 might say otherwise:
import sqlite3
with sqlite3.connect("nnhs.sqlite3") as conn:
    cursor = conn.cursor()
    data = []
    with open("patient_in.txt", "r") as infile:
        for line in infile:
            line = line.rstrip("\n")
            line = line.strip()
            seq = line.split(' ')
            seq[5] = int(seq[5])
            seq = tuple (seq)
            data.append(seq)
    print(data)
    with open("patient_out.txt", "w") as outfile:
        # Placeholders are not quoted, and "where" sits outside the parentheses
        query = """SELECT DISTINCT patients.resnum, patients.facnum,patients.sex, patients.age, patients.rxmed, icd9_1.resnum, icd9_1.code
        from patients
        INNER JOIN icd9 as icd9_1 on (icd9_1.resnum = patients.resnum) AND (icd9_1.code LIKE :6)
        INNER JOIN icd9 as icd9_2 on (icd9_2.resnum = patients.resnum) AND (icd9_2.code LIKE :6)
        where (patients.age >= :2) AND (patients.age <= :3)
        AND patients.sex = :1
        AND (patients.rxmed >= :4) AND (patients.rxmed <= :5)
        ORDER BY patients.resnum"""
        # sex, age_lowerbound, etc. stand for values you take from one row of data;
        # icd9_1 should already carry its trailing % wildcard
        variables = {'1':sex, '2':age_lowerbound, '3':age_upperbound, '4':rxmed_lowerbound, '5':rxmed_upperbound, '6':icd9_1}
        cursor.execute(query, variables)
        for row in cursor.fetchall():
            outfile.write("Patient Id Number: {0}\t ICD Res Num: {1}\t Condition: {2}\t Fac ID Num: {3}\t Patient Sex: {4}\t Patient Age: {5}\t Number of Medications: {6}\n".format(*row))