Too many server roundtrips w/ psycopg2 - python

I am writing a script that should create a schema for each customer. I fetch all the metadata from a database that defines what each customer's schema should look like, and then create it. Everything is well defined: the types, the names of tables, and so on. A customer has many tables (e.g., address, customers, contact, item), and each table has the same kind of metadata.
My procedure now:
1. Get everything I need from the metadata database.
2. In a for loop, create a table, then ALTER TABLE to add each column from the metadata (this is done for every table).
Right now my script runs for about a minute per customer, which I think is too slow. I suspect it is the loop: for every table I issue a separate ALTER TABLE statement per column.
I think that instead of altering (which might not be such a clever approach), I should do something like the following. Note that this is just a stupid but illustrative example:
for table in tables:
    con.execute("CREATE TABLE IF NOT EXISTS tester.%s (%s, %s);", (table, "last_seen date", "valid_from timestamp"))
But it gives me this error (it seems to interpolate the table name as a quoted string):
psycopg2.errors.SyntaxError: syntax error at or near "'billing'"
LINE 1: CREATE TABLE IF NOT EXISTS tester.'billing' ('last_seen da...

Consider creating each table with only a serial-type (i.e., auto-number) ID column, then use ALTER TABLE for all other columns, combining sql.Identifier for identifiers (schema names, table names, column names, function names, etc.) with regular str.format for the data types, which are not literals in the SQL statement.
from psycopg2 import sql

# CREATE TABLE
query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (ID serial)"""
cur.execute(sql.SQL(query).format(shm=sql.Identifier("tester"),
                                  tbl=sql.Identifier("table")))

# ALTER TABLE
items = [("last_seen", "date"), ("valid_from", "timestamp")]
query = """ALTER TABLE {shm}.{tbl} ADD COLUMN {col} {typ}"""

for item in items:
    # KEEP IDENTIFIER PLACEHOLDERS, INTERPOLATE THE DATA TYPE
    final_query = query.format(shm="{shm}", tbl="{tbl}", col="{col}", typ=item[1])
    cur.execute(sql.SQL(final_query).format(shm=sql.Identifier("tester"),
                                            tbl=sql.Identifier("table"),
                                            col=sql.Identifier(item[0])))
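A further trim that is my own suggestion rather than part of the answer above: PostgreSQL accepts several ADD COLUMN actions in a single ALTER TABLE, so the per-column loop can collapse into one statement, for example:
adds = sql.SQL(", ").join(
    sql.SQL("ADD COLUMN {} " + typ).format(sql.Identifier(name))
    for name, typ in items
)
# one ALTER TABLE carrying every ADD COLUMN action
cur.execute(sql.SQL("ALTER TABLE {}.{} ").format(sql.Identifier("tester"),
                                                 sql.Identifier("table")) + adds)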
Alternatively, use str.join with a list comprehension to build a single CREATE TABLE:
query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (
    "id" serial,
    {vals}
)"""
items = [("last_seen", "date"), ("valid_from", "timestamp")]
val = ",\n    ".join(["{{}} {typ}".format(typ=i[1]) for i in items])

# KEEP IDENTIFIER PLACEHOLDERS
pre_query = query.format(shm="{shm}", tbl="{tbl}", vals=val)
final_query = sql.SQL(pre_query).format(*[sql.Identifier(i[0]) for i in items],
                                        shm=sql.Identifier("tester"),
                                        tbl=sql.Identifier("table"))
cur.execute(final_query)
SQL (sent to the database):
CREATE TABLE IF NOT EXISTS "tester"."table" (
    "id" serial,
    "last_seen" date,
    "valid_from" timestamp
)

However, this becomes heavy as there are too many server roundtrips.
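One way to cut the roundtrips, sketched under the assumption that psycopg2 executes semicolon-separated statements sent in a single call (it does, via the simple query protocol): compose the whole DDL script for a customer client-side and send it once. The tables dict below is hypothetical metadata standing in for what the script fetches.
from psycopg2 import sql

# hypothetical metadata: {table_name: [(column_name, data_type), ...]}
tables = {"billing": [("last_seen", "date"), ("valid_from", "timestamp")]}

parts = []
for tbl, cols in tables.items():
    col_defs = sql.SQL(", ").join(
        sql.SQL("{} " + typ).format(sql.Identifier(name))
        for name, typ in cols
    )
    parts.append(sql.SQL("CREATE TABLE IF NOT EXISTS {}.{} (id serial, ").format(
        sql.Identifier("tester"), sql.Identifier(tbl)) + col_defs + sql.SQL(")"))

with con.cursor() as cur:
    cur.execute(sql.SQL(";\n").join(parts))  # one roundtrip per customer
con.commit()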
How many tables with how many columns are you creating that this is slow? Could you ssh to a machine closer to your server and run the python there?
I don't get that error. Rather, I get an SQL syntax error. A VALUES list is for conveying data, but ALTER TABLE is not about data, it is about metadata, so you can't use a VALUES list there. You need the names of the columns and types in double quotes (or no quotes) rather than single quotes; you can't have a comma between a name and its type; you can't have parentheses around each pair; and each pair needs to be introduced with ADD, you can't have it just once. You are using the wrong tool for the job. execute_values is almost the right tool, except that it will use single quotes rather than double quotes around the identifiers. Perhaps you could add a flag to tell it to use quote_ident.
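A sketch of the quote_ident idea from this comment, assuming an open connection con and the items list from the answer above (my illustration, not code from the thread):
from psycopg2.extensions import quote_ident

stmts = ["ALTER TABLE {}.{} ADD COLUMN {} {}".format(
             quote_ident("tester", con), quote_ident("billing", con),
             quote_ident(name, con), typ)
         for name, typ in items]
with con.cursor() as cur:
    cur.execute(";\n".join(stmts))  # all ALTERs in one roundtrip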
Not only is execute_values the wrong tool for the job, but I think python in general might be as well. Why not just load from a .sql file?
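A minimal sketch of the .sql-file route, assuming the DDL lives in a hypothetical schema.sql and con is an open psycopg2 connection:
with open("schema.sql") as f, con.cursor() as cur:
    cur.execute(f.read())  # the whole script in one call
con.commit()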

Related

Using arbitrary sqlalchemy select results (e.g., from a CTE) to create ORM instances

When I create an instance of an object (in my example below, a Company), I want to automagically create default related objects. One way is to use a per-row after-insert trigger, but I'm trying to avoid that route and use CTEs, which are easier to read and maintain. I have this SQL working (the underlying db is PostgreSQL; the only things you need to know about table company are that its primary key is id SERIAL PRIMARY KEY and that it has one other required column, name VARCHAR NOT NULL):
with new_company as (
-- insert my company row, returning the whole row
insert into company (name)
values ('Acme, Inc.')
returning *
),
other_related as (
-- herein I join to `new_company` and create default related rows
-- in other tables. Here we use, effectively, a no-op - what it
-- actually does is not germane to the issue.
select id from new_company
)
-- Having created the related rows, we return the row we inserted into
-- table `company`.
select * from new_company;
The above works like a charm, and with the recently added Select.add_cte() (in sqlalchemy 1.4.21) I can express it with the following Python:
import sqlalchemy as sa
from myapp.models import Company

new_company = (
    sa.insert(Company)
    .values(name='Acme, Inc.')
    .returning(Company)
    .cte(name='new_company')
)
other_related = (
    sa.select(sa.text('new_company.id'))
    .select_from(new_company)
    .cte('other_related')
)
fetch_company = (
    sa.select(sa.text('* from new_company'))
    .add_cte(other_related)
)
print(fetch_company)
And the output is:
WITH new_company AS
(INSERT INTO company (name) VALUES (:param_1) RETURNING company.id, company.name),
other_related AS
(SELECT new_company.id FROM new_company)
SELECT * from new_company
Perfect! But when I execute the above query I get back a Row:
>>> result = session.execute(fetch_company).fetchone()
>>> print(result)
(26, 'Acme, Inc.')
I can create an instance with:
>>> result = session.execute(fetch_company).fetchone()
>>> company = Company(**result)
But this instance, if added to the session, is in the wrong state (pending), and if I flush and/or commit, I get a duplicate key error because the company row is already in the database.
If I try using Company in the select list, I get a bad query because sqlalchemy automagically sets the from-clause and I cannot figure out how to clear or explicitly set the from-clause to use my CTE.
I'm looking for one of several possible solutions:
annotate an arbitrary query in some way to say, "build an instance of MyModel, but use this table/alias", e.g., query = sa.select(Company).select_from(new_company.alias('company'), reset=True).
tell a session that an instance is persistent regardless of what the session thinks about the instance, e.g., company = Company(**result); session.add(company, force_state='persistent')
Obviously I could do another round-trip to the db with a call to session.merge() (as discussed in early comments of this question) so the instance ends up in the correct state, but that seems terribly inefficient especially if/when used to return lists of instances.
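For what it's worth, two sketches that map onto the solutions wished for above, assuming SQLAlchemy 1.4+ (my own illustration, not part of the question): orm.aliased() can point the Company entity at the CTE, and Session.merge(..., load=False) marks a detached instance persistent without another round trip.
from sqlalchemy.orm import aliased

# Option 1: select the ORM entity from the CTE instead of the real table
company_cte = aliased(Company, new_company, name='company')
fetch_company = sa.select(company_cte).add_cte(other_related)
company = session.execute(fetch_company).scalars().one()

# Option 2: hand the session a detached instance; load=False skips the
# SELECT that merge() would otherwise emit (the primary key must be set,
# which it is here, coming from the RETURNING row)
company = session.merge(Company(**result), load=False)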

ProgrammingError: (psycopg2.errors.UndefinedColumn), while working with sqlalchemy

I have trouble querying a table created with sqlalchemy on a local postgres db.
While I am able to execute and receive query results with:
SELECT * FROM olympic_games
I get an error message when I try to access a single column, or perform any other operation on the table:
SELECT games FROM olympic_games
The error message is (a couple of sentences translated from Polish):
ProgrammingError: (psycopg2.errors.UndefinedColumn) BŁĄD: column "games" does not exist
LINE 1: SELECT COUNT(Sport)
^
HINT: maybe you meant "olympic_games.Games".
[SQL: SELECT games FROM olympic_games LIMIT 5;]
(Background on this error at: http://sqlalche.me/e/f405)
It pretty much boils down to the program not seeing, or not being able to access, the specific column, and reporting that it doesn't exist.
I tried accessing it in table.column format; that didn't work either. I am also able to see the column names via information_schema.columns.
The data (.csv) was loaded with pd.read_csv, and then DataFrame.to_sql. Code below, thanks for the help!
engine = create_engine('postgresql://:#:/olympic_games')
with open('olympic_athletes_2016_14.csv', 'r') as file:
    df = pd.read_csv(file, index_col='ID')
df.to_sql(name='olympic_games', con=engine, if_exists='replace', index_label='ID')
Both execute commands returned the same error:
with engine.connect() as con:
    rs = con.execute("SELECT games FROM olympic_games LIMIT 5;")
    df_fetch = pd.DataFrame(rs.fetchall())

df_fetch2 = engine.execute("""SELECT games FROM olympic_games LIMIT 5;""").fetchall()
Essentially, this is the double quoting issue of column identifiers as mentioned in the PostgreSQL manual:
Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case. For example, the identifiers FOO, foo, and "foo" are considered the same by PostgreSQL, but "Foo" and "FOO" are different from these three and each other.
When any of your Pandas data frame columns have mixed case, DataFrame.to_sql preserves the case sensitivity by creating the columns with double quotes in the CREATE TABLE statement. Specifically, the Python Pandas code below, when using if_exists='replace',
df.to_sql(name='olympic_games', con=engine, if_exists='replace', index_label='ID')
translates to the following in Postgres if Sport was a title-cased column in the data frame:
DROP TABLE IF EXISTS public."olympic_games";
CREATE TABLE public."olympic_games"
(
    ...
    "Sport" varchar(255),
    "Games" varchar(255),
    ...
);
Once an identifier is quoted with mixed case, it must always be referred to in that manner. Therefore sport is not the same as "Sport". Remember that in SQL, double quotes behave differently from single quotes, whereas in Python the two are interchangeable.
To fix this, consider rendering all your Pandas columns in lower case, since "games" is the same as games, Games, or GAMES (but not "Games" or "GAMES"):
df.columns = df.columns.str.lower()
df.to_sql(name='olympic_games', con=engine, if_exists='replace', index_label='ID')
Alternatively, leave as is and quote appropriately:
SELECT "Games" FROM olympic_games
Try SELECT "Games" FROM olympic_games. In some cases PostgreSQL keeps the quotes around column names, for example when a column name contains mixed case. I have to remind you: quoted identifiers in PostgreSQL are case-sensitive.

One of my sqlite rows contains my column names. How do I select it for deletion

I used python version 3.4.3 with the sqlite3 package.
I made a mistake while transferring a load of .txt files into sqlite tables. Some of the .txt files had more than one header line, so somewhere in the resulting table there is a row containing the column names of that table.
For example if I set up a table like this:
import sqlite3
con = sqlite3.connect(path to a db)
con.execute('CREATE TABLE A_table (Id PRIMARY KEY,name TEXT,value INTEGER)')
rows = [('Id','name','value'),(1,'Ted',111),(2,'Thelma',22)]
con.executemany('INSERT INTO A_table (Id,name,value) Values(?,?,?)',rows)
If I try to remove the row like this:
con.execute('DELETE FROM A_table WHERE name = "name"')
It deletes all rows in the table.
In my real database the row that needs to go is not always the first row; it could appear at any point. Short of rebuilding the tables, what should I do?
I am sure this has been asked already, but I don't have a clue what to call this problem, so I have had zero luck finding help.
Edit: I used python. I am not python.
Use a parametrized query:
con.execute("DELETE FROM A_table WHERE name = ?", ('name',))
Note the trailing comma: ('name',) is a one-element tuple, whereas ('name') is just a parenthesized string.
In SQL, strings use single quotes.
Double quotes are used to escape column names, so name = "name" is the same as name = name.
To avoid string formatting problems, it might be a better idea to use parameters:
con.execute("DELETE FROM A_table WHERE name = 'name';")
con.execute("DELETE FROM A_table WHERE name = ?;", ["name"]) # a Python string

Python mysql using variable to select a certain field

Having a slightly tricky issue with python and mysql. To keep it simple, the following code returns whatever is in the variable 'field', which is a string such as 'username' or 'password'.
options = [field, userID]
entries = cursor.execute('select (?) from users where id=(?)', options).fetchall()
print(entries)
This code works correctly if I remove the first (?) and just use the actually name (like 'username') instead. Can anyone provide some input?
Your query is actually formed as:
select "field" from users where id="value"
which returns you a string "field" instead of the actual table field value.
You cannot parameterize column and table names (docs):
"Parameter placeholders can only be used to insert column values. They can not be used for other parts of SQL, such as table names, statements, etc."
Use string formatting for that part:
options = [userID]
query = 'select {field} from users where id=(?)'.format(field=field)
cursor.execute(query, options).fetchall()
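One caveat worth adding here, since string formatting bypasses parameterization: if field can come from user input, check it against a whitelist of known column names first. A minimal sketch, with a hypothetical ALLOWED_FIELDS set:
ALLOWED_FIELDS = {'username', 'password'}  # hypothetical whitelist
if field not in ALLOWED_FIELDS:
    raise ValueError('unexpected field name: {!r}'.format(field))
query = 'select {field} from users where id=(?)'.format(field=field)
entries = cursor.execute(query, [userID]).fetchall()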
Related threads with some more explanations:
pysqlite: Placeholder substitution for column or table names?
Python MySQLdb: Query parameters as a named dictionary

Creating Insert Statement for MySQL in Python

I am trying to construct an insert statement that is built from the results of a query. I run a query that retrieves results from one database, then create an insert statement from those results and insert them into a different database.
The server that is initially queried only returns those fields in the reply that are populated, and this can differ from record to record. The destination database table has all of the possible fields available. This is why I need to construct the insert statement on the fly for each record that is retrieved, and why I cannot use a default list of fields, as I have no control over which ones will be populated in the response.
Here is a sample of the code: I send off a request for the T&C of an ISIN, and the response is a series of name/value pairs.
fields = []
data = []
getTCQ = ("MDH:T&C|" + isin + "|NAME|VALUE")
mdh.execute(getTCQ)
TC = mdh.fetchall()
for values in TC:
    fields.append(values[0])
    data.append(values[1])
insertQ = ("INSERT INTO sp_fields (" + fields + ") VALUES ('" + data + "')")
The problem is with the fields part, mysql is expecting the following:
INSERT INTO sp_fields (ACCRUAL_COUNT,AMOUNT_OUTSTANDING_CALC_DATE) VALUES ('030/360','2014-11-10')
But I am getting the following for insertQ:
INSERT INTO sp_fields ('ACCRUAL_COUNT','AMOUNT_OUTSTANDING_CALC_DATE') VALUES ('030/360','2014-11-10')
and MySQL does not like the single quotes around the field names.
How do I get rid of these, so that it looks like the first insertQ statement, which works?
Many thanks in advance.
You could use ','.join(fields) to create the desired string (without quotes around each field).
Then use parametrized SQL and pass the values as the second argument to cursor.execute:
insertQ = "INSERT INTO sp_fields ({}) VALUES ({})".format(
    ','.join(fields), ','.join(['%s'] * len(data)))
cursor.execute(insertQ, data)
Note that the correct placemarker to use, e.g. %s, depends on the DB adapter you are using. MySQLdb uses %s, but oursql uses ?, for instance.
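If you are unsure which placeholder your adapter expects, the DB-API exposes it as a module-level paramstyle attribute; a quick check, for example:
import MySQLdb
print(MySQLdb.paramstyle)  # 'format', i.e. use %s placeholders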
