Read table in pandas with lowercase column using sqlalchemy - python

I would like to read a table in my database as a pandas dataframe. I am working with SQLAlchemy, and it seems to me that it only executes queries in uppercase.
The table XYZ in my schema has a column named "pred_pred" in lowercase. When I do the following:
import pandas as pd
import cx_Oracle as ora
from sqlalchemy import create_engine
from sqlalchemy.engine import url
connect_url = url.URL(...)
engine = create_engine(connect_url)
connection = engine.connect()
input = pd.read_sql_query('SELECT pred_pred FROM XYZ', connection)
I get the following error:
DatabaseError: ORA-00904: "PRED_PRED": invalid identifier
Is there a workaround?
EDIT: as a workaround, at the moment I am simply importing all columns using * and then working on them in pandas, because the table has only a few columns. I would still like to know whether it's possible to solve this problem in a more direct way.

As also described in the comments, you should wrap your column name in double quotes, since Oracle converts it to uppercase when it is not quoted.
I think you need something like the following:
input = pd.read_sql_query('SELECT "pred_pred" FROM XYZ', connection)
Because you must have created the XYZ table with the column wrapped in double quotes, it is stored as a case-sensitive name, i.e. lowercase.
See this db<>fiddle demo for more clarification.
Cheers!!

It depends how the table was created in the first place. By default, it doesn't matter whether the DDL for that table was written in uppercase or lowercase; Oracle will fold it all to uppercase and store it that way in the database.
That means the DDL statements below are equivalent for Oracle:
create table table1 (column1 VARCHAR2(20));
CREATE TABLE TABLE1 (COLUMN1 VARCHAR2(20));
A table created with either DDL can simply be queried by both:
SELECT COLUMN1 FROM TABLE1;
SELECT column1 FROM table1;
However, a different situation arises when the table name or column is specified with double quotes:
create table table1 ("column1" VARCHAR2(20));
Then every time you query that column, it must be wrapped in those quotes and match the exact casing used at creation:
SELECT "column1" FROM TABLE1;
As for the Python code: in the REPL you can see that double quotes combine easily with single quotes:
>>> input = 'SELECT "pred_pred" FROM XYZ'
>>> input
'SELECT "pred_pred" FROM XYZ'
So you can simply change your code to:
input = pd.read_sql_query('SELECT "pred_pred" FROM XYZ', connection)
To be sure we're addressing the right issue here, you may want to connect to your database via, for example, SQL Developer and run:
SELECT COLUMN_NAME FROM ALL_TAB_COLUMNS WHERE UPPER(TABLE_NAME) = 'XYZ'
If the COLUMN_NAME is not all uppercase, then the column was created with double quotes and must therefore be queried the same way.
Further reading on naming rules: https://docs.oracle.com/database/121/SQLRF/sql_elements008.htm#SQLRF00223
You already mentioned the workaround with * in the comments, but it seems a bad idea to query with * and do the projection on the Python side, since it increases the I/O on the database side.
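Since the rule is purely about casing, a tiny helper (illustrative only, not part of any library) shows how to interpret the names that ALL_TAB_COLUMNS returns:

```python
# Illustrative helper: applies Oracle's folding rule to a name as it is
# stored in the data dictionary (ALL_TAB_COLUMNS).
def needs_quoting(stored_name: str) -> bool:
    """True if the identifier must be double-quoted in queries,
    i.e. it was created quoted and is not stored in all-uppercase."""
    return stored_name != stored_name.upper()

print(needs_quoting('PRED_PRED'))  # False: plain SELECT pred_pred works
print(needs_quoting('pred_pred'))  # True: must write SELECT "pred_pred"
```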

Related

Writing pandas dataframe into SQL server table - no result and no error

Here is my code
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine("connection string")
conn_obj = engine.connect()
my_df = pd.DataFrame({'col1': ['29199'], 'date_created': ['2022-06-29 17:15:49.776867']})
my_df.to_sql('SomeSQLTable', conn_obj, if_exists='append', index = False)
I also created SomeSQLTable with script:
CREATE TABLE SomeSQLTable(
    col1 nvarchar(90),
    date_created datetime2)
GO
Everything runs fine, but no records are inserted into SQL table and no errors are displayed. I am not sure how to troubleshoot. conn_obj works fine, I was able to pull data.
I don't think this is exactly an answer, but I don't have the privilege to comment right now.
First of all, pd.DataFrame.to_sql() returns the number of rows affected by the operation; can you please check that?
Secondly, you are defining the data types in the table creation; it could be a problem of casting the data types. I never create the table through SQL, since pd.DataFrame.to_sql() can create it if needed.
Thirdly, please check the table name; Pascal-case names can cause issues in some databases.
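One more thing worth ruling out is an uncommitted transaction: with newer SQLAlchemy versions, a plain Connection no longer autocommits, so to_sql can appear to succeed while nothing is persisted. A minimal sketch, using an in-memory SQLite engine as a stand-in for the real connection string:

```python
# Minimal sketch: engine.begin() opens a transaction that commits on
# successful exit, so the insert is actually persisted.
# SQLite in-memory is a stand-in for the real SQL Server connection string.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # replace with your connection string
my_df = pd.DataFrame({'col1': ['29199'],
                      'date_created': ['2022-06-29 17:15:49.776867']})

with engine.begin() as conn:  # commits when the block exits without error
    my_df.to_sql('SomeSQLTable', conn, if_exists='append', index=False)

n = pd.read_sql_query('SELECT COUNT(*) AS n FROM SomeSQLTable', engine)['n'][0]
print(n)
```

If the row count comes back as zero on SQL Server, wrapping the to_sql call in engine.begin() (or committing explicitly) is the first thing to try.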

Snowflake table created with SQLAlchemy requires quotes ("") to query

I am ingesting data into Snowflake tables using Python and SQLAlchemy. The tables I have created all require quotation marks to query both the table name and the column names. For example, select * from "database"."schema"."table" where "column" = 2; will run, while select * from database.schema.table where column = 2; will not. The difference is the quotes.
I understand that if a table is created in Snowflake with quotes, then quotes will be required to query it. However, I only loaded an Excel file into a pandas data frame and then used SQLAlchemy and pd.DataFrame.to_sql to create the table. An example of my code:
engine = create_engine(URL(
    account = 'my_account',
    user = 'my_username',
    password = 'my_password',
    database = 'My_Database',
    schema = 'My_Schema',
    warehouse = 'My_Wh',
    role = 'My Role',
))
connection = engine.connect()
df.to_sql('My_Table', con=engine, if_exists='replace', index=False, index_label=None, chunksize=16384)
Does SQLAlchemy automatically create the tables with quotes? Is this a problem with the schema? I did not set that up. Is there a way around this?
From the Snowflake SQLAlchemy GitHub documentation:
Object Name Case Handling
Snowflake stores all case-insensitive object
names in uppercase text. In contrast, SQLAlchemy considers all
lowercase object names to be case-insensitive. Snowflake SQLAlchemy
converts the object name case during schema-level communication, i.e.
during table and index reflection. If you use uppercase object names,
SQLAlchemy assumes they are case-sensitive and encloses the names with
quotes. This behavior will cause mismatches against data dictionary
data received from Snowflake, so unless identifier names have been
truly created as case sensitive using quotes, e.g., "TestDb", all
lowercase names should be used on the SQLAlchemy side.
What I think this is trying to say is that SQLAlchemy treats any name containing capital letters as case-sensitive and automatically encloses it in quotes; conversely, any name in lower case is not quoted. It doesn't look like this behaviour is configurable.
You probably don't have any control over database and possibly schema names, but when creating your table, if you want consistent behaviour whether quoted or unquoted, you should stick to lower-case naming. What you should find is that the table name will then work whether you use "my_table" or my_table.
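You can see the quoting rule directly from SQLAlchemy's identifier preparer. The sketch below uses the generic default dialect as a stand-in; per the documentation quoted above, the Snowflake dialect applies the same rule:

```python
# Sketch of SQLAlchemy's identifier quoting decision, using the generic
# default dialect as a stand-in for the Snowflake dialect.
from sqlalchemy.engine import default

preparer = default.DefaultDialect().identifier_preparer
print(preparer.quote("My_Table"))  # mixed case -> quoted: "My_Table"
print(preparer.quote("my_table"))  # all lowercase -> left bare: my_table
```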

ProgrammingError: (psycopg2.errors.UndefinedColumn), while working with sqlalchemy

I have trouble querying a table created with sqlalchemy on a postgres db (local).
While I am able to execute and receive a query result with:
SELECT * FROM olympic_games
I am getting an error message when trying to access a single column, or perform any other operation on the table:
SELECT games FROM olympic_games
The error message is (a couple of sentences translated from Polish):
ProgrammingError: (psycopg2.errors.UndefinedColumn) BŁĄD: column "games" does not exist
LINE 1: SELECT COUNT(Sport)
^
HINT: maybe you meant "olympic_games.Games".
SQL: SELECT games FROM olympic_games LIMIT 5;]
(Background on this error at: http://sqlalche.me/e/f405)
It pretty much comes down to this: the program doesn't see, or can't access, the specific column, and reports that it doesn't exist.
I tried accessing it in table.column format; that didn't work either. I am, however, able to see the column names via information_schema.columns.
Data (.csv) was loaded with pd.read_csv, and then DataFrame.to_sql. Code below, thanks for help!
engine = create_engine('postgresql://:#:/olympic_games')
with open('olympic_athletes_2016_14.csv', 'r') as file:
    df = pd.read_csv(file, index_col='ID')
df.to_sql(name='olympic_games', con=engine, if_exists='replace', index_label='ID')
Both execute commands returned with same error:
with engine.connect() as con:
    rs = con.execute("SELECT games FROM olympic_games LIMIT 5;")
    df_fetch = pd.DataFrame(rs.fetchall())
df_fetch2 = engine.execute("""SELECT games FROM olympic_games LIMIT 5;""").fetchall()
Essentially, this is the double quoting issue of column identifiers as mentioned in the PostgreSQL manual:
Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case. For example, the identifiers FOO, foo, and "foo" are considered the same by PostgreSQL, but "Foo" and "FOO" are different from these three and each other.
When any of your pandas data frame columns have mixed cases, DataFrame.to_sql preserves the case sensitivity by creating columns with double quotes at the CREATE TABLE stage. Specifically, the Python pandas code below, when using replace,
df.to_sql(name='olympic_games', con=engine, if_exists='replace', index_label='ID')
translates to the following in Postgres, if Sport and Games were title-cased columns in the data frame:
DROP TABLE IF EXISTS public."olympic_games";
CREATE TABLE public."olympic_games"
(
    ...
    "Sport" varchar(255),
    "Games" varchar(255),
    ...
);
Once an identifier is quoted with mixed case, it must always be referred to in that manner. Therefore sport is not the same as "Sport". Remember that in SQL double quotes are different from single quotes, whereas in Python the two are interchangeable.
To fix, consider rendering all your pandas columns to lower case, since "games" is the same as games, Games, or GAMES (but not "Games" or "GAMES"):
df.columns = df.columns.str.lower()
df.to_sql(name='olympic_games', con=engine, if_exists='replace', index_label='ID')
Alternatively, leave as is and quote appropriately:
SELECT "Games" FROM olympic_games
Try SELECT "games" FROM olympic_games. In some cases PostgreSQL creates quotes around column names, for example if the column name contains mixed case. I have to remind you: quoted identifiers in PostgreSQL are case-sensitive.

Too many server roundtrips w/ psycopg2

I am making a script that should create a schema for each customer. I'm fetching all metadata from a database that defines how each customer's schema should look, and then creating it. Everything is well defined: the types, names of tables, etc. A customer has many tables (e.g. address, customers, contact, item), and each table has the same metadata.
My procedure now:
get everything I need from the metadata database.
In a for loop, create a table, and then ALTER TABLE to add each metadata column (this is done for each table).
Right now my script runs in about a minute per customer, which I think is too slow. It has something to do with me having a loop and, in that loop, altering each table.
I think that instead of altering (which might not be so clever an approach), I should do something like the following:
Note that this is just a stupid but valid example:
for table in tables:
    con.execute("CREATE TABLE IF NOT EXISTS tester.%s (%s, %s);", (table, "last_seen date", "valid_from timestamp"))
But it gives me this error (it seems like it reads the table name as a quoted string):
psycopg2.errors.SyntaxError: syntax error at or near "'billing'"
LINE 1: CREATE TABLE IF NOT EXISTS tester.'billing' ('last_seen da...
Consider creating tables with a serial type (i.e., autonumber) ID field and then use alter table for all other fields by using a combination of sql.Identifier for identifiers (schema names, table names, column names, function names, etc.) and regular format for data types which are not literals in SQL statement.
from psycopg2 import sql

# CREATE TABLE
query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (ID serial)"""
cur.execute(sql.SQL(query).format(shm=sql.Identifier("tester"),
                                  tbl=sql.Identifier("table")))

# ALTER TABLE
items = [("last_seen", "date"), ("valid_from", "timestamp")]
query = """ALTER TABLE {shm}.{tbl} ADD COLUMN {col} {typ}"""

for item in items:
    # KEEP IDENTIFIER PLACEHOLDERS
    final_query = query.format(shm="{shm}", tbl="{tbl}", col="{col}", typ=item[1])
    cur.execute(sql.SQL(final_query).format(shm=sql.Identifier("tester"),
                                            tbl=sql.Identifier("table"),
                                            col=sql.Identifier(item[0])))
Alternatively, use str.join with list comprehension for one CREATE TABLE:
query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (
    "id" serial,
    {vals}
)"""
items = [("last_seen", "date"), ("valid_from", "timestamp")]
val = ",\n    ".join(["{{}} {typ}".format(typ=i[1]) for i in items])

# KEEP IDENTIFIER PLACEHOLDERS
pre_query = query.format(shm="{shm}", tbl="{tbl}", vals=val)
final_query = sql.SQL(pre_query).format(*[sql.Identifier(i[0]) for i in items],
                                        shm=sql.Identifier("tester"),
                                        tbl=sql.Identifier("table"))
cur.execute(final_query)
SQL (sent to database)
CREATE TABLE IF NOT EXISTS "tester"."table" (
    "id" serial,
    "last_seen" date,
    "valid_from" timestamp
)
However, this becomes heavy as there are too many server roundtrips.
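If the roundtrips are the bottleneck, note that PostgreSQL also accepts several ADD COLUMN clauses in a single ALTER TABLE, so the per-column loop can collapse into one statement per table, e.g.:

```sql
-- One statement, one roundtrip: multiple ADD COLUMN clauses are allowed.
ALTER TABLE "tester"."table"
    ADD COLUMN "last_seen"  date,
    ADD COLUMN "valid_from" timestamp;
```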
How many tables with how many columns are you creating that this is slow? Could you ssh to a machine closer to your server and run the python there?
I don't get that error; rather, I get an SQL syntax error. A VALUES list is for conveying data, but ALTER TABLE is not about data, it is about metadata, so you can't use a VALUES list there. You need the names of the columns and types in double quotes (or no quotes) rather than single quotes. And you can't have a comma between name and type, you can't have parentheses around each pair, and each pair needs to be introduced with ADD; you can't have it just once. You are using the wrong tool for the job. execute_batch is almost the right tool, except it will use single quotes rather than double quotes around the identifiers. Perhaps you could add a flag to tell it to use quote_ident.
Not only is execute_values the wrong tool for the job, but I think python in general might be as well. Why not just load from a .sql file?

One of my sqlite rows contains my column names. How do I select it for deletion

I used python version 3.4.3 with the sqlite3 package.
I made a mistake while transferring a load of .txt files into sqlite tables. Some of the .txt files had more than one header line, so somewhere in the resulting sql table there is a row containing the column names of that table.
For example if I set up a table like this:
import sqlite3
con = sqlite3.connect(path to a db)
con.execute('CREATE TABLE A_table (Id PRIMARY KEY,name TEXT,value INTEGER)')
rows = [('Id','name','value'),(1,'Ted',111),(2,'Thelma',22)]
con.executemany('INSERT INTO A_table (Id,name,value) Values(?,?,?)',rows)
If I try to remove the row like this:
con.execute('DELETE FROM A_table WHERE name = "name"')
It deletes all rows in the table.
In my real database the row that needs to go is not always the first row it could appear at any point. Short of rebuilding the tables what should I do?
I am sure that this has been asked already, but I don't have a clue what to call this problem, so I have had zero luck finding help.
Edit: I used python. I am not python.
Use a parametrized query:
con.execute("DELETE FROM A_table WHERE name=?", ('name',))
(Note the trailing comma: ('name',) is a one-element tuple, while ('name') is just a string.)
In SQL, strings use single quotes.
Double quotes are used to escape column names, so name = "name" is the same as name = name.
To avoid string formatting problems, it might be a better idea to use parameters:
con.execute("DELETE FROM A_table WHERE name = 'name';")
con.execute("DELETE FROM A_table WHERE name = ?;", ["name"]) # a Python string
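A self-contained demo of the difference, reproducing the table from the question: name = "name" compares the column to itself (true for every non-NULL row), while a bound parameter compares against the string 'name'.

```python
# Demo: deleting the stray header row with a bound parameter removes
# exactly one row, whereas WHERE name = "name" would wipe the table.
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE A_table (Id PRIMARY KEY, name TEXT, value INTEGER)')
rows = [('Id', 'name', 'value'), (1, 'Ted', 111), (2, 'Thelma', 22)]
con.executemany('INSERT INTO A_table (Id, name, value) VALUES (?, ?, ?)', rows)

con.execute("DELETE FROM A_table WHERE name = ?", ('name',))
remaining = con.execute('SELECT COUNT(*) FROM A_table').fetchone()[0]
print(remaining)  # 2 -- only the stray header row was removed
```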
