Below is a sample of the code I am using to push data from one Postgres server to another Postgres server. I am trying to move 28 million records. This worked perfectly with SQL Server to Postgres, but now that it is Postgres to Postgres it hangs on the line
sourcecursor.execute('select * from "schema"."reallylargetable"; ')
It never reaches any of the other statements to get to the iterator.
I get this message at the select query statement:
psycopg2.DatabaseError: out of memory for query result
# cursors for aiods and ili
sourcecursor = sourceconn.cursor()
destcursor = destconn.cursor()

# name of temp csv file
filenme = 'filename.csv'

# generator that uses fetchmany to iterate through the data in batches
# (default batch size is 1000)
def ResultIterator(cursor, arraysize=1000):
    'iterator using fetchmany and consumes less memory'
    while True:
        results = cursor.fetchmany(arraysize)
        if not results:
            break
        for result in results:
            yield result

# set data for the cursor
print("start get data")
# it does not get past the line below; it errors with "out of memory for query result"
sourcecursor.execute('select * from "schema"."reallylargetable"; ')
print("iterator")
dataresults = ResultIterator(sourcecursor)

# ***** do something with dataresults *****
Please change this line:
sourcecursor = sourceconn.cursor()
to name your cursor (use whatever name pleases you):
sourcecursor = sourceconn.cursor('mysourcecursor')
What this does is direct psycopg2 to open a PostgreSQL server-side named cursor for your query. Without a server-side named cursor, psycopg2 attempts to fetch all of the rows into client memory when the query executes, which is what runs you out of memory on a 28-million-row table.
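For completeness, here is a minimal sketch of how the rest of the question's code fits around the named cursor. It reuses the connection and ResultIterator names from the question, and the loop body is only a placeholder:

# A minimal sketch of the named (server-side) cursor approach. With a named
# cursor, fetchmany() pulls rows from the server in batches instead of the
# whole 28M-row result set being loaded into client memory at execute() time.
sourcecursor = sourceconn.cursor('mysourcecursor')
sourcecursor.execute('select * from "schema"."reallylargetable";')

for row in ResultIterator(sourcecursor, arraysize=10000):
    # ... write the row to the temp csv / push it to the destination ...
    pass

sourcecursor.close()   # also closes the server-side cursor on the Postgres side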
I'm trying to perform an update using Flask-SQLAlchemy, but when it gets to the update script it does not return anything. It seems the script is hanging or not doing anything.
I tried wrapping a try/except around the code that does not complete, but there are no errors.
I gave it 10 minutes to complete the update statement, which only updates one record, and it still does nothing.
When I cancel the script, it reports Communication link failure (0) (SQLEndTran), but I don't think this is the root cause, because other SQL statements in the same script work fine, so the connection to the database is good.
What my script does is get a list of filenames that I need to process (I have no issues with this part). Then, using the retrieved list of filenames, I look in a directory to check whether each file exists. If it does not exist, I update the database to tag the file as not found. This is where I get the issue: it does not perform the update, nor does it provide an error message of any sort.
I even tried creating a new engine just for the update statement, but I still get the same behavior.
I also tried printing out the SQL statement in Python before executing it. I ran the printed SQL command in my SQL browser and it worked fine.
The code is very simple; I'm not really sure why it's having this issue.
#!/usr/bin/env python3
from flask_sqlalchemy import sqlalchemy
import glob

files_directory = "/files_dir/"

sql_string = """
    select *
    from table
    where status is null
"""

# omitted conn_string
engine1 = sqlalchemy.create_engine(conn_string)
result = engine1.execute(sql_string)

for r in result:
    engine2 = sqlalchemy.create_engine(conn_string)
    filename = r[11]
    match = glob.glob(f"{files_directory}/**/{filename}.wav")
    if not match:
        print('no match')
        script = "update table set status = 'not_found' where filename = '" + filename + "' "
        engine2.execute(script)
        engine2.dispose()
        continue

engine1.dispose()
It appears that if I try to loop through all 26k records the script doesn't work, but when I process batches of 2k records per run, it does. So my SQL string becomes (top 2000 added to the query):
sql_string = """
    select top 2000 *
    from table
    where status is null
"""
It's manual, yes, but it works for me since I just need to run this script once (well, 13 times).
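If editing the query by hand gets tedious, a rough sketch of running the same 2k-row batches in a loop is below. It reuses the conn_string, table, and column position from the question; the top syntax implies an MSSQL-style dialect, so adjust the paging clause for your database if needed.

# Hypothetical sketch: run the 2k-row batches in a loop instead of editing the
# query by hand. conn_string, the table, and column index 11 come from the question.
import glob
from flask_sqlalchemy import sqlalchemy

files_directory = "/files_dir/"
batch_sql = """
    select top 2000 *
    from table
    where status is null
"""

engine = sqlalchemy.create_engine(conn_string)
for _ in range(13):                              # roughly 26k rows / 2k per batch
    rows = engine.execute(batch_sql).fetchall()  # materialize the batch before updating
    if not rows:
        break
    for r in rows:
        filename = r[11]
        if not glob.glob(f"{files_directory}/**/{filename}.wav"):
            engine.execute(
                "update table set status = 'not_found' where filename = '" + filename + "'"
            )
engine.dispose()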
I am trying to use a Python function to execute a .sql file.
The SQL file begins with a DROP DATABASE statement.
The first lines of the .sql file look like this:
DROP DATABASE IF EXISTS myDB;
CREATE DATABASE myDB;
The rest of the .sql file defines all the tables and views for 'myDB'
Python Code:
import psycopg2

def connect():
    conn = psycopg2.connect(dbname='template1', user='user01')
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cursor = conn.cursor()
    sqlfile = open('/path/to/myDB-schema.sql', 'r')
    cursor.execute(sqlfile.read())
    db = psycopg2.connect(dbname='myDB', user='user01')
    cursor = db.cursor()
    return db, cursor
When I run the connect() function, I get an error on the DROP DATABASE statement.
ERROR:
psycopg2.InternalError: DROP DATABASE cannot be executed from a function or multi-command string
I spent a lot of time googling this error, and I can't find a solution.
I also tried adding an AUTOCOMMIT statement to the top of the .sql file, but it didn't change anything.
SET AUTOCOMMIT TO ON;
I am aware that PostgreSQL doesn't allow you to drop a database you are currently connected to, but I didn't think that was the problem here, because I begin the connect() function by connecting to the template1 database, and it is from that connection that I create the cursor object which reads the .sql file.
Has anyone else run into this error? Is there any way to execute the .sql file from a Python function?
This worked for me for a file with one SQL query per line:
sql_file = open('file.sql','r')
cursor.execute(sql_file.read())
You are reading in the entire file and passing the whole thing to PostgreSQL as one string (as the error message says, a "multi-command string"). Is that what you are intending to do? If so, it isn't going to work.
Try this:
cursor.execute(sqlfile.readline())
Or, shell out to psql and let it do the work.
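A minimal sketch of the shell-out option is below, assuming psql is on PATH and reusing the file path and user from the question. psql executes the file statement by statement, so the DROP DATABASE / CREATE DATABASE lines are handled without the multi-command restriction.

# Hypothetical sketch: let psql run the schema file.
import subprocess

subprocess.run(
    ["psql", "-U", "user01", "-d", "template1",
     "-v", "ON_ERROR_STOP=1",          # stop on the first failing statement
     "-f", "/path/to/myDB-schema.sql"],
    check=True,
)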
In order to deploy scripts via cron that serve as ETL jobs using .sql files, we had to change how we reference the SQL file itself.
import os

# resolve the .sql file relative to this script's location rather than
# whatever working directory cron happens to use
sql_file = os.path.join(os.path.dirname(__file__), "../sql/ccd_parcels.sql")
sqlcurr = open(sql_file, mode='r').read()
curDest.execute(sqlcurr)
connDest.commit()
This seemed to please the CRON job...
I am currently using a Python program to insert records, and I am using the statement below. The issue is that I am trying to print the number of records inserted in the log file, but it only prints 0, even though I can see the inserted record count in the console while the program runs. Can you help me print the record count in the log file?
I also know that redirecting the Python program's output with > file would capture the record count, but I want to bring all the details into the same log file after the insert statement is done, since I am using a loop for different statements.
log="/fs/logfile.txt"
log_file = open(log,'w')
_op = os.system('psql ' + db_host_connection + ' -c "insert into emp select * from emp1;"')
print date , "printing" , _op
You should probably switch to a "proper" Python module for PostgreSQL interactions.
I haven't used PostgreSQL in Python before, but one of the first search-engine hits leads to:
http://initd.org/psycopg/docs/usage.html
You could then do something along the following lines:
import psycopg2
conn = psycopg2.connect("dbname=test user=postgres")
# create a cursor for interaction with the database
cursor = conn.cursor()
# execute your sql statement
cursor.execute("insert into emp select * from emp1")
# retrieve the number of selected rows
number_rows_inserted = cursor.rowcount
# commit the changes
conn.commit()
This should also make things significantly faster than using os.system calls, especially if you're planning to execute multiple statements.
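To tie this back to the log-file part of the question, a minimal continuation could look like the following (it reuses the log_file handle opened in the question's snippet, which is an assumption about how the surrounding script is laid out):

# Hypothetical continuation: write the count to the same log file the question opens.
log_file.write("inserted {} rows into emp\n".format(number_rows_inserted))
log_file.close()
conn.close()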
I have two concurrent processes:
1.) Writer - inserts new rows into a MySQL database on a regular basis (10-20 rows/sec)
2.) Reader - reads from the same table being inserted into
I notice that the Reader process only seems to see a snapshot of the database from about the time of its startup. Inserts occurring before this startup are found, but inserts occurring after are not. If I shut the Reader process down and restart it (but leave the Writer running), it will sometimes (but not always) see more data, but again it seems to get a point-in-time view of the database.
I'm running a commit after each insert (code snippet below). I investigated whether this was a function of change buffering/pooling, but doing a "set @@global.innodb_change_buffering=none;" had no effect. Also, if I go in through MySQL Workbench, I can query the most current data being inserted by the Writer. So this seems to be a function of how the Python/MySQL connection is set up.
My environment is:
Windows 7
MySQL 5.5.9
Python 2.6.6 -- EPD 6.3-1 (32-bit)
MySQL python connector
The insert code is:
def insert(dbConnection, statement):
    cursor = dbConnection.cursor()
    cursor.execute(statement)
    warnings = cursor.fetchwarnings()
    if warnings:
        print warnings
        rowid = []
    else:
        rowid = cursor.lastrowid
    cursor.close()
    dbConnection.commit()
    return rowid
The reader code is:
def select(dbConnection, statement):
    cursor = dbConnection.cursor()
    cursor.execute(statement)
    warnings = cursor.fetchwarnings()
    if warnings:
        print warnings
        values = []
    else:
        values = np.asarray(cursor.fetchall())
    cursor.close()
    return values
What does the read side look like?
I bet this is a problem with the isolation level on the read side. Most likely your read connection is getting an implicit transaction, and the default InnoDB isolation level is REPEATABLE READ.
Try issuing:
cursor.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")
on the read side.
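A sketch of how this could slot into the question's select() helper is below, assuming the mysql.connector driver listed in the question's environment; the connection parameters and table name here are placeholders.

# Hypothetical reader-side setup. The isolation level is per-session, so it
# only needs to be set once, right after connecting.
import mysql.connector

read_conn = mysql.connector.connect(host="localhost", user="reader",
                                    password="secret", database="mydb")
setup = read_conn.cursor()
setup.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")
setup.close()

# subsequent reads now see rows the Writer committed after the Reader started
latest = select(read_conn, "SELECT * FROM mytable ORDER BY id DESC LIMIT 10")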
I am working on a Windows Vista machine in Python 3.1.1. I am trying to insert a large number of rows into a SQLite3 database. The file exists, and my program properly inserts some rows into the database. However, at some point in the insertion process, the program dies with this message:
sqlite3.OperationalError: unable to open database file
However, before it dies, there are several rows that are properly added to the database.
Here is the code which specifically handles the insertion:
idx = 0
lst_to_ins = []
for addl_img in all_jpegs:
    lst_to_ins.append((addl_img['col1'], addl_img['col2']))
    idx = idx + 1
    if idx % 10 == 0:
        logging.debug('adding rows [%s]', lst_to_ins)
        conn.executemany(ins_sql, lst_to_ins)
        conn.commit()
        lst_to_ins = []
        logging.debug('added 10 rows [%d]', idx)

if len(lst_to_ins) > 0:
    conn.executemany(ins_sql, lst_to_ins)
    conn.commit()
    logging.debug('adding the last few rows to the db')
This code inserts anywhere from 10 to 400 rows, then dies with the error message
conn.executemany(ins_sql, lst_to_ins)
sqlite3.OperationalError: unable to open database file
How is it possible that I can insert some rows, but then get this error?
SQLite does not have record locking; it uses a simple locking mechanism that locks the entire database file briefly during a write. It sounds like you are running into a lock that hasn't cleared yet.
The author of SQLite recommends that you begin a transaction before doing your inserts and complete the transaction at the end. This causes SQLite to queue the insert requests and perform them under a single file lock when the transaction is committed.
In the newest versions of SQLite the locking mechanism has been enhanced, so it may no longer require a full file lock.
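A minimal sketch of that suggestion, reusing the conn, ins_sql, and all_jpegs names from the question (they are assumed to exist exactly as in the original snippet):

# Do all the inserts in one transaction and commit once at the end, so SQLite
# takes the write lock only once. The sqlite3 module opens a transaction
# implicitly before the first INSERT; skipping the intermediate commit() calls
# keeps everything in that single transaction.
rows = [(img['col1'], img['col2']) for img in all_jpegs]
conn.executemany(ins_sql, rows)
conn.commit()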
Same error here on Windows 7 (Python 2.6, Django 1.1.1, and SQLite) after some records were inserted correctly: sqlite3.OperationalError: unable to open database file
I ran my script from Eclipse several times and always got that error. But when I ran it from the command line (after setting PYTHONPATH and DJANGO_SETTINGS_MODULE) it worked like a charm...
Just my 2 cents!