Execute .sql schema in psycopg2 in Python

I have a PostgreSQL schema stored in .sql file. It looks something like:
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    facebook_id TEXT NOT NULL,
    name TEXT NOT NULL,
    access_token TEXT,
    created INTEGER NOT NULL
);
How shall I run this schema after connecting to the database?
My existing Python code works for SQLite databases:
# Create database connection
self.connection = sqlite3.connect("example.db")
# Run database schema
with self.connection as cursor:
    cursor.executescript(open("schema.sql", "r").read())
But the psycopg2 doesn't have an executescript method on the cursor. So, how can I achieve this?

You can just use execute:
with self.connection, self.connection.cursor() as cursor:
    cursor.execute(open("schema.sql", "r").read())
though you may want to set psycopg2 to autocommit mode first so you can use the script's own transaction management.
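A minimal end-to-end sketch of that approach, assuming a plain psycopg2 connection (the DSN and file name are illustrative):
import psycopg2

conn = psycopg2.connect("dbname=example user=postgres")  # placeholder DSN
conn.autocommit = True  # let schema.sql manage its own transactions, if it has any

with conn.cursor() as cursor:
    with open("schema.sql", "r") as f:
        cursor.execute(f.read())

conn.close()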
It'd be nice if psycopg2 offered a smarter mode where it read the file a statement at a time and sent each one to the DB, but at present there's no such mode as far as I know. It'd need a fairly solid parser to do it correctly when faced with $$ quoting (and its $delimiter$ variant, where the delimiter may be any identifier), standard_conforming_strings, E'' strings, nested function bodies, etc.
Note that this will not work with:
anything containing psql backslash commands
COPY .. FROM STDIN
very long input
... and therefore won't work with dumps from pg_dump

I can't reply to comments on the selected answer for lack of reputation, so I'll make an answer to help with the COPY issue.
Depending on the volume of your DB, pg_dump --inserts can help: it outputs INSERT statements instead of COPY.
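As a rough sketch of that workaround (database names and file name are placeholders; it assumes pg_dump is on the PATH and the dump is small enough to read into memory in one go):
import subprocess
import psycopg2

# produce a dump that uses INSERT statements instead of COPY ... FROM stdin
subprocess.run(["pg_dump", "--inserts", "-f", "dump.sql", "sourcedb"], check=True)

conn = psycopg2.connect("dbname=targetdb user=postgres")  # placeholder DSN
conn.autocommit = True
with conn.cursor() as cursor, open("dump.sql", "r") as f:
    cursor.execute(f.read())
conn.close()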

Related

How to set postgres and psycopg2 so that it always searches the schema without having to explicitly mention it?

I have a postgres DB where I have run this command to avoid having to mention schema explicitly:
ALTER DATABASE ibkr_trader SET search_path TO public;
However, when I connect using psycopg2 in python, I still have to type this to access my tables:
select count(*) from "public"."MY_TABLE"
I even tried setting options in psycopg2.connect but it didn't work:
return psycopg2.connect(
    dbname=self.dbname,
    user=self.user,
    password=self.password,
    port=self.port,
    host=self.host,
    options="-c search_path=public"
)
What is the most elegant way to set this up so that I don't have to type "public"."MY_TABLE" for each query? I do not have any other schemas in my DB and I don't want to have to mention the schema explicitly.
Your ALTER DATABASE and your options="-c search_path=public" both work for me. But then again public is usually already in your search path, so neither of them should be needed at all unless you went out of your way to break something.
I suspect you are misinterpreting something. If you try select count(*) from MY_TABLE, that won't work. Not because you are missing the "public", but because you are missing the double quotes around "MY_TABLE" and therefore are searching for the table by a down-cased name of "my_table".
The default search path should look like "$user", public, meaning that tables / views in the public schema may be referenced without specifying the schema name: the server first searches the schema matching the role name, and then the public schema. Evidently this has been altered on your server.
It is possible to set the search_path for each user. The ALTER USER command would look like:
alter user username set search_path = 'public'
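For what it's worth, a small sketch to check both points from psycopg2 (connection details are placeholders): SHOW search_path confirms what the session actually uses, and the quoted vs. unquoted identifier shows the down-casing behaviour described above.
import psycopg2

conn = psycopg2.connect(dbname="ibkr_trader", user="user", password="pw",  # placeholder credentials
                        host="localhost", port=5432,
                        options="-c search_path=public")

with conn.cursor() as cur:
    cur.execute("SHOW search_path;")
    print(cur.fetchone())  # e.g. ('public',)

    # works: the quoted identifier preserves the upper-case name
    cur.execute('SELECT count(*) FROM "MY_TABLE"')

    # fails with 'relation "my_table" does not exist': unquoted names are folded to lower case
    # cur.execute("SELECT count(*) FROM MY_TABLE")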

sql INSERT in python (postgres, cursor, execute)

I had no problem SELECTing data in Python from a postgres database using cursor/execute. I just changed the SQL to INSERT a row, but nothing is inserted into the DB. Can anyone let me know what should be modified? A little confused because everything is the same except for the SQL statement.
@app.route("/addcontact")
def addcontact():
    # this connection/cursor setting showed no problem so far
    conn = pg.connect(conn_str)
    cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    sql = f"INSERT INTO jna (sid, phone, email) VALUES ('123','123','123')"
    cur.execute(sql)
    return redirect("/contacts")
First, look at your table setup and make sure your variables are named correctly, in the right order, format and all that. If you're not logged into the specific database on the SQL server, it won't know where the table is; you might need to send something like 'USE databasename' before you do your insert statement so you're in the right place on the server.
I might not be up to date with the language, but is that 'f' supposed to be right before the quotes? If that's in your code, it'd probably throw an error unless it has a use I'm not aware of, or it's not relevant to the problem.
You have to commit your transaction by adding the line below after execute(sql)
conn.commit()
Ref: Using INSERT with a PostgreSQL Database using Python
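Putting that together, the route might look roughly like this (conn_str and the table layout are taken from the question, the rest of the setup is assumed; values are passed as query parameters rather than formatted into the string):
import psycopg2 as pg
import psycopg2.extras
from flask import Flask, redirect

app = Flask(__name__)
conn_str = "dbname=mydb user=postgres"  # placeholder DSN

@app.route("/addcontact")
def addcontact():
    conn = pg.connect(conn_str)
    cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    cur.execute("INSERT INTO jna (sid, phone, email) VALUES (%s, %s, %s)",
                ("123", "123", "123"))
    conn.commit()  # without this, the INSERT is rolled back when the connection closes
    cur.close()
    conn.close()
    return redirect("/contacts")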

Turn .sql database dump into pandas dataframe

I have a .sql file that contains a database dump. I would prefer to get this file into a pandas dataframe so that I can view the data and manipulate it. Willing to take any solution, but need explicit instructions, I've never worked with a .sql file previously.
The file's structure is as follows:
-- MySQL dump 10.13 Distrib 8.0.11, for Win64 (x86_64)
--
-- Host: localhost Database: somedatabase
-- ------------------------------------------------------
-- Server version 8.0.11
DROP TABLE IF EXISTS `selected`;
CREATE TABLE `selected` (
  `date` date DEFAULT NULL,
  `weekday` int(1) DEFAULT NULL,
  `monthday` int(4) DEFAULT NULL,
  ... [more variables]
) ENGINE=somengine DEFAULT CHARSET=something COLLATE=something;
LOCK TABLES `selected` WRITE;
INSERT INTO `selected` VALUES (dateval, weekdayval, monthdayval), (dateval, weekdayval, monthdayval), ... (dateval, weekdayval, monthdayval);
INSERT INTO `selected` VALUES (...), (...), ..., (...);
... (more insert statements) ...
-- Dump completed on timestamp
You should use the sqlalchemy library for this:
https://docs.sqlalchemy.org/en/13/dialects/mysql.html
Or alternatively you could use this:
https://pynative.com/python-mysql-database-connection/
The second option may be easier for loading your data into MySQL, as you could just take your SQL file text as the query and pass it to the connection.
Something like this:
import mysql.connector

connection = mysql.connector.connect(host='localhost',
                                     database='database',
                                     user='user',
                                     password='pw')

# read your .sql dump into a string (file name is illustrative)
with open('yourSQLfile.sql', 'r') as f:
    query = f.read()

cursor = connection.cursor()
# note: a dump contains many statements; depending on the connector version
# you may need to split them or use the connector's multi-statement support
result = cursor.execute(query)
Once you've loaded your table, you create the engine with sqlalchemy to connect pandas to your database and simply use the pandas read_sql() command to load your table into a dataframe object.
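A brief sketch of that last step, assuming the dump has already been loaded into a local MySQL database (credentials are placeholders and pymysql, or another MySQL driver, is assumed to be installed):
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:pw@localhost/database")  # placeholder credentials

df = pd.read_sql("SELECT * FROM selected", engine)  # `selected` is the table from the dump above
print(df.head())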
Another note: if you just want to manipulate the data, you could take the VALUES statement from the sql file and use it to populate a dataframe manually if you needed to. Just change the "VALUES (....),(....),(....)" to something like mydict = {[....],[....],[....]} and load it into a dataframe. Or you could dump the VALUES statement to Excel, delete the parentheses, do text-to-columns, give it headers, save it, and then load it into a dataframe from Excel. Or just manipulate it in Excel (you could even use a concat formula to recreate the SQL VALUES syntax and replace the data in the sql file). It really depends on exactly what your end goal is here.
Sorry you did not receive a timely answer here.

Redshift COPY operation doesn't work in SQLAlchemy

I'm trying to do a Redshift COPY in SQLAlchemy.
The following SQL correctly copies objects from my S3 bucket into my Redshift table when I execute it in psql:
COPY posts FROM 's3://mybucket/the/key/prefix'
WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey'
JSON AS 'auto';
I have several files named
s3://mybucket/the/key/prefix.001.json
s3://mybucket/the/key/prefix.002.json
etc.
I can verify that the new rows were added to the table with select count(*) from posts.
However, when I execute the exact same SQL expression in SQLAlchemy, execute completes without error, but no rows get added to my table.
session = get_redshift_session()
session.bind.execute("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';")
session.commit()
It doesn't matter whether I do the above or
from sqlalchemy.sql import text
session = get_redshift_session()
session.execute(text("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';"))
session.commit()
I basically had the same problem, though in my case it was more:
engine = create_engine('...')
engine.execute(text("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';"))
By stepping through with pdb, the problem was obviously the lack of a .commit() being invoked. I don't know why session.commit() is not working in your case (maybe the session "lost track" of the sent commands?), so it might not actually fix your problem.
Anyhow, as explained in the sqlalchemy docs:
Given this requirement, SQLAlchemy implements its own “autocommit” feature which works completely consistently across all backends. This is achieved by detecting statements which represent data-changing operations, i.e. INSERT, UPDATE, DELETE [...] If the statement is a text-only statement and the flag is not set, a regular expression is used to detect INSERT, UPDATE, DELETE, as well as a variety of other commands for a particular backend.
So, there are 2 solutions, either:
text("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';").execution_options(autocommit=True).
Or, get a fixed version of the redshift dialect... I just opened a PR about it
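To make the first option concrete, a sketch in the legacy SQLAlchemy 1.x style that the quoted docs and this thread use (the connection string and COPY statement are placeholders):
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:password@redshift-host:5439/dbname")  # placeholder DSN

copy_sql = text(
    "COPY posts FROM 's3://mybucket/the/key/prefix' "
    "WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' "
    "JSON AS 'auto';"
).execution_options(autocommit=True)  # SQLAlchemy 1.x autocommit, as in the quoted docs

engine.execute(copy_sql)  # no explicit commit needed in this mode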
Adding a commit to the end of the copy worked for me:
<your copy sql>;commit;
I have had success using the core expression language and Connection.execute() (as opposed to the ORM and sessions) to copy delimited files to Redshift with the code below. Perhaps you could adapt it for JSON.
def copy_s3_to_redshift(conn, s3path, table, aws_access_key, aws_secret_key, delim='\t', uncompress='auto', ignoreheader=None):
    """Copy a TSV file from S3 into redshift.

    Note the CSV option is not used, so quotes and escapes are ignored. Empty fields are loaded as null.
    Does not commit a transaction.
    :param Connection conn: SQLAlchemy Connection
    :param str uncompress: None, 'gzip', 'lzop', or 'auto' to autodetect from `s3path` extension.
    :param int ignoreheader: Ignore this many initial rows.
    :return: Whatever a copy command returns.
    """
    if uncompress == 'auto':
        uncompress = 'gzip' if s3path.endswith('.gz') else 'lzop' if s3path.endswith('.lzo') else None

    # copy command doesn't like table name or keys single-quoted
    copy = text("""
        copy "{table}"
        from :s3path
        credentials 'aws_access_key_id={aws_access_key};aws_secret_access_key={aws_secret_key}'
        delimiter :delim
        emptyasnull
        ignoreheader :ignoreheader
        compupdate on
        comprows 1000000
        {uncompress};
        """.format(uncompress=uncompress or '', table=text(table), aws_access_key=aws_access_key, aws_secret_key=aws_secret_key))

    return conn.execute(copy, s3path=s3path, delim=delim, ignoreheader=ignoreheader or 0)
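For reference, a hypothetical usage sketch for the function above, in the same legacy SQLAlchemy 1.x calling style (DSN, S3 path, table, and credentials are placeholders; the redshift+psycopg2 dialect comes from the sqlalchemy-redshift package):
from sqlalchemy import create_engine

engine = create_engine("redshift+psycopg2://user:password@host:5439/dbname")  # placeholder DSN
with engine.begin() as conn:  # begin() commits on success, since the function itself does not
    copy_s3_to_redshift(conn, 's3://mybucket/data/events.tsv.gz', 'posts',
                        'myaccesskey', 'mysecretaccesskey', ignoreheader=1)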

Issue with the web.py tutorial when using sqlite3

For the record, I have looked into this, but cannot seem to figure out what is wrong.
So I'm doing the tutorial on web.py, and I get to the database part (can do everything above it). I wanted to use sqlite3 for various reasons. Since I couldn't figure out where to type the
sqlite3 test.db
line, I looked into the sqlite3 module and created a database with that. The code for that is:
import sqlite3
conn = sqlite3.connect("test.db")
print("Opened database successfully");
conn.execute('''CREATE TABLE todo
    (id serial primary key,
     title text,
     created timestamp default now(),
     done boolean default 'f');''')
conn.execute("INSERT INTO todo (title) VALUES ('Learn web.py')");
but I get the error
done boolean default 'f');''')
sqlite3.OperationalError: near "(": syntax error
I've tried looking into this, but cannot figure out for the life of me what the issue is.
I haven't had luck with other databases (I'm new to this, so not sure on the subtleties). I wasn't able to just make the sqlite database directly, so it might be a Python thing, but it matches the tester.py I made with the sqlite-with-Python tutorial...
Thanks if anyone can help me!
The problem causing the error is that you can't use the MySQL now() function here. Try instead
created default current_timestamp
This works:
conn.execute('''CREATE TABLE todo
    (id serial primary key,
     title text,
     created default current_timestamp,
     done boolean default 'f');''')
You are using SQLite but are specifying data types from some other database engine. SQLite accepts only INT, TEXT, REAL, NUMERIC, and NONE. Boolean is most likely being mapped to one of the number types, and therefore DEFAULT 'f' is not valid syntax (although I don't think it would be valid in any version of SQL that does support BOOLEAN as a datatype, since they generally use INTEGER for the underlying storage).
Rewrite the CREATE TABLE statement with SQLite datatypes and allowable default values and your code should work fine.
More details on the (somewhat unusual) SQLite type system: http://www.sqlite.org/datatype3.html
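As a concrete illustration of that advice, a minimal rewrite using SQLite-friendly types and defaults, plus the commit the original snippet was missing (the column mappings are just one reasonable choice):
import sqlite3

conn = sqlite3.connect("test.db")
conn.execute('''CREATE TABLE IF NOT EXISTS todo (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT,
    created TEXT DEFAULT CURRENT_TIMESTAMP,
    done INTEGER DEFAULT 0);''')
conn.execute("INSERT INTO todo (title) VALUES ('Learn web.py')")
conn.commit()
conn.close()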
