I'm having trouble with a PostgreSQL query using SQLAlchemy.
I created some large tables using this line of code:
frame.to_sql('Table1', con=engine, method='multi', if_exists='append')
It worked fine. Now, when I want to query data out of it, my first problem is that I have to use quotation marks around each table and column name, and I don't really know why. Maybe somebody can help me out there.
That is not my main problem, though. My main problem is that when querying the data, all numerical WHERE conditions work fine, but not the ones on columns containing strings. I get an error that the column does not exist. I'm using:
df = pd.read_sql_query('SELECT "variable1", "variable2" FROM "Table1" WHERE "variable1" = 123 AND "variable2" = "abc" ', engine)
I think the problem might be that I use "abc" instead of 'abc', but I can't change it because of the ' signs that delimit the query argument. If I change those ' to ", then the column and table names are no longer detected correctly (because of the earlier problem that they have to be in quotation marks).
This is the error message:
ProgrammingError: (psycopg2.errors.UndefinedColumn) ERROR: COLUMN »abc« does not exist
LINE 1: ...er" FROM "Table1" WHERE "variable2" = "abc"
And there is an arrow pointing to the first quotation mark of the "abc".
I'm new to SQL and I would really appreciate it if someone could point me in the right direction.
"Most" SQL dialects (notable exceptions being MS SQL Server and MS Access) strictly differentiate between
single quotes: for string literals, e.g., WHERE thing = 'foo'
double quotes: for object (table, column) names, e.g., WHERE "some col" = 123
PostgreSQL throws in the added wrinkle that table/column names are forced to lower case if they are not (double-)quoted and then uses case-sensitive matching, so if your table is named Table1 then
SELECT * FROM Table1 will fail because PostgreSQL will look for table1, but
SELECT * FROM "Table1" will succeed.
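The folding rule can be mimicked in a few lines of Python. This is only a rough sketch of the behaviour described above, not PostgreSQL's actual parser: `resolve_identifier` is a hypothetical helper, and it ignores details such as escaped quotes inside quoted names.

```python
def resolve_identifier(written: str) -> str:
    """Return the name that would be looked up for an identifier as written in SQL.

    Simplified model: quoted names keep their case, unquoted names are lower-cased.
    """
    if len(written) >= 2 and written.startswith('"') and written.endswith('"'):
        return written[1:-1]   # quoted: taken literally
    return written.lower()     # unquoted: folded to lower case


stored_name = "Table1"  # the name as it was created, case preserved

print(resolve_identifier("Table1") == stored_name)    # unquoted: looks for table1
print(resolve_identifier('"Table1"') == stored_name)  # quoted: looks for Table1
```

The first comparison fails and the second succeeds, which matches the two SELECT statements above.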
The way to avoid confusion in your query is to use query parameters instead of string literals:
# set up test environment
with engine.begin() as conn:
    conn.exec_driver_sql('DROP TABLE IF EXISTS "Table1"')
    conn.exec_driver_sql('CREATE TABLE "Table1" (variable1 int, variable2 varchar(50))')
df1 = pd.DataFrame([(123, "abc"), (456, "def")], columns=["variable1", "variable2"])
df1.to_sql("Table1", engine, index=False, if_exists="append")
# test .read_sql_query() with parameters
import sqlalchemy as sa
sql = sa.text('SELECT * FROM "Table1" WHERE variable1 = :v1 AND variable2 = :v2')
param_dict = {"v1": 123, "v2": "abc"}
df2 = pd.read_sql_query(sql, engine, params=param_dict)
print(df2)
"""
variable1 variable2
0 123 abc
"""
It should be: AND "variable2" = 'abc'.
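On the Python side, the single quotes around 'abc' do not have to clash with the quotes that delimit the Python string. A minimal sketch of three equivalent ways to write the query text (the variable names are only for illustration):

```python
# Triple double quotes as the Python delimiter: both SQL quote styles fit inside.
query_a = """SELECT "variable1", "variable2" FROM "Table1" WHERE "variable1" = 123 AND "variable2" = 'abc'"""

# Single-quoted Python string with the inner single quotes escaped.
query_b = 'SELECT "variable1", "variable2" FROM "Table1" WHERE "variable1" = 123 AND "variable2" = \'abc\''

# Double-quoted Python string with the inner double quotes escaped.
query_c = "SELECT \"variable1\", \"variable2\" FROM \"Table1\" WHERE \"variable1\" = 123 AND \"variable2\" = 'abc'"

# All three spellings produce exactly the same SQL text.
print(query_a == query_b == query_c)
```

Any of these can be passed to pd.read_sql_query unchanged.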
You cannot quote string literals with ", because PostgreSQL will interpret the quoted text as a database object. Btw, you do not need to wrap table and column names in double quotes unless it is really necessary, e.g. for case-sensitive object names or names containing spaces. IMHO it is bad practice and in the long run only leads to confusion. So, provided the table name is lower case (see the rename below), your query could be written as follows:
SELECT variable1, variable2
FROM table1
WHERE variable1 = 123 AND variable2 = 'abc';
Keep in mind that this also applies to other objects, like tables or indexes.
CREATE TABLE Table1 (id int) - nice.
CREATE TABLE "Table1" (id int) - not nice.
CREATE TABLE "Table1" ("id" int) - definitely not nice ;)
In case you want to remove the unnecessary double quotes from your table name:
ALTER TABLE "Table1" RENAME TO table1;
I am trying to export my dataframe to sql database (Postgres).
I created the table as following:
CREATE TABLE dataops.OUTPUT
(
ID_TAIL CHAR(30) NOT NULL,
ID_MODEL CHAR(30) NOT NULL,
ID_FIN CHAR(30) NOT NULL,
ID_GROUP_FIN CHAR(30) NOT NULL,
ID_COMPONENT CHAR(30) NOT NULL,
DT_OPERATION TIMESTAMP NOT NULL,
DT_EXECUTION TIMESTAMP NOT NULL,
FT_VALUE_SENSOR FLOAT NOT NULL,
DT_LOAD TIMESTAMP NOT NULL
);
And I want to write this dataframe into that sql table:
conn = sqlalchemy.create_engine("postgres://root:1234@localhost:5432/postgres")
data = [['ID_1', 'A4_DOOUE_ADM001', '1201MJ52', 'PATH_1', 'LATCHED1AFT',
'2016-06-22 19:10:25', '2020-11-12 17:20:33.616016', 2.9, '2020-11-12 17:54:06.340735']]
output_df=pd.DataFrame(data,columns=["id_tail", "id_model", "id_fin", "id_group_fin", "id_component", "dt_operation",
"dt_execution", "ft_value_sensor", "dt_load"])
But when I run output_df.to_sql to write to the database, I find that a new table "OUTPUT", with double quotes, has been created and the data inserted into it.
output_df.to_sql(cfg.table_names["output_rep27"], conn, cfg.db_parameters["schema"], if_exists='append',index=False)
This is what I see in my database:
But the same table without quotes is empty:
When you deliberately make the insert fail (for example by changing a column name), the error shows that pandas is inserting with double quotes:
How to avoid pandas inserts with double quotes for the table?
Short version: pandas double-quotes identifiers, which is fairly standard. When that happens with an upper-case identifier, you have to double-quote it from then on whenever you use it. Using it unquoted folds the name to lower case and you won't find the table. For more information on this, see Identifier Syntax. You have three choices: do as I suggested in my comment and force the name to lower case, always double-quote identifiers when using them, or modify the pandas source code to not double-quote.
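To check in advance which names will end up needing double quotes, a rough test is whether the name is already a plain lower-case identifier. The sketch below is an approximation under that assumption: it is not the logic pandas or SQLAlchemy actually runs, it ignores reserved words, and `needs_quotes` is a hypothetical helper.

```python
import re

# Plain identifier: starts with a lower-case letter or underscore,
# followed by lower-case letters, digits or underscores.
_PLAIN = re.compile(r"[a-z_][a-z0-9_]*")


def needs_quotes(name: str) -> bool:
    """Approximate check: True if the name is not a plain lower-case identifier."""
    return _PLAIN.fullmatch(name) is None


for name in ["OUTPUT", "output", "id_tail", "my table"]:
    print(name, needs_quotes(name))
```

Names flagged here are the ones worth lower-casing or renaming before calling to_sql.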
I found the same question, and here is its accepted answer:
We need to convert the dataframe column names to lower case before sending them to PostgreSQL, and use a lower-case table name, so that we don't need double quotes when selecting from the table or its columns.
EDIT: I found out that whitespace also forces pandas' to_sql function to write the table or column name with double quotes in PostgreSQL. So if you want the table or column names to be free of double quotes, replace the whitespace with non-whitespace characters or simply delete it from the name.
This is the example from my own case:
import pandas as pd
import re
from sqlalchemy import create_engine
df = pd.read_excel('data.xlsx')
ws = re.compile(r"\s+")
# lower the case, strip leading and trailing white space,
# and substitute the whitespace between words with underscore
df.columns = [ws.sub("_", i.lower().strip()) for i in df.columns]
my_db_name = 'postgresql://postgres:my_password@localhost:5432/db_name'
engine = create_engine(my_db_name)
df.to_sql('lowercase_table_name', engine) #use lower cased table name
This line of code worked for me:
appended_data.columns = map(str.lower, appended_data.columns)
appended_data.to_sql('table_name', con=engine,
schema='public', index=False, if_exists='append',method='multi')
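You need to use upper-case names in pandas in order to get names without quotes in the SQL table. (Note that this is the opposite of what the other answers recommend for PostgreSQL, which folds unquoted names to lower case.)
Use this code on your df:
df.columns = df.columns.str.upper()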
I didn't find a "good" solution, so I wrote my own function to insert the values:
import sqlalchemy
import pandas as pd
conn = sqlalchemy.create_engine("postgres://root:1234@localhost:5432/postgres")
data = [['ID_1', 'A4_DOOUE_ADM001', '1201MJ52', 'PATH_1', 'LATCHED1AFT',
'2016-06-22 19:10:25', '2020-11-12 17:20:33.616016', 2.9, '2020-11-12 17:54:06.340735']]
output_df=pd.DataFrame(data,columns=["id_tail", "id_model", "id_fin", "id_group_fin", "id_component", "dt_operation",
"dt_execution", "ft_value_sensor", "dt_load"])
def to_sql(output_df, table_name, conn, schema):
    # Build "INSERT INTO schema.table (col1, col2, ...) VALUES (%s, %s, ...);"
    columns = ", ".join(output_df.columns)
    placeholders = ", ".join(["%s"] * output_df.shape[1])
    my_query = f"INSERT INTO {schema}.{table_name} ({columns}) VALUES ({placeholders});"
    # Convert every value to str and insert all rows in one call
    record_to_insert = output_df.astype(str).values.tolist()
    conn.execute(my_query, record_to_insert)
to_sql(output_df,table_name,conn,schema)
I hope it is useful for somebody
For those who are still looking for the answer:
Instead of writing
output_df.to_sql(name='some_schema.some_table', con=conn)
you should pass the schema through the corresponding to_sql() parameter:
output_df.to_sql(name='some_table', schema='some_schema', con=conn)
Otherwise 'some_schema.some_table' is treated as a single table name and quoted as a whole.
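The effect can be illustrated without a database. The sketch below uses a hypothetical `quote_ident` helper that wraps a name in double quotes the way SQL quoted identifiers are written. It only illustrates why the dotted name stops being a schema reference and is not the code pandas runs.

```python
def quote_ident(name: str) -> str:
    """Wrap a name in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


# Passed as one name: the dot becomes part of a single table name.
as_one_name = quote_ident("some_schema.some_table")

# Passed separately: schema and table are quoted on their own and joined by a dot.
as_schema_and_table = quote_ident("some_schema") + "." + quote_ident("some_table")

print(as_one_name)           # "some_schema.some_table"
print(as_schema_and_table)   # "some_schema"."some_table"
```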
I think I am meeting the syntax requirements, and I have already tried a lot...
I have the following variables set up:
db_uri = "postgres://{}:{}@{}/{}".format(user, pwd, server, db)
engine = create_engine(db_uri)
con = engine.connect()
What already works:
df_sql = pd.read_sql_table('TABLE', engine)
What also works:
query = 'SELECT * FROM "TABLE" WHERE id_column = 12564993'
df = pd.read_sql_query(query, con)
But when I change the id_column to a date_column nothing works anymore:
query = 'SELECT * FROM "TABLE" WHERE CAST(ts_column as date) = ts_column "2019-06-19"'
df = pd.read_sql_query(query, con)
Whichever syntax variant I try, I get an error:
ProgrammingError: (psycopg2.errors.SyntaxError) syntax error at or near ""2019-06-19""
LINE 1: ...LECT * FROM "TABLE" WHERE CAST(ts_column as date) = ts_column "2019-06-1...
There is a ^ below the " of "2019-06-1... Any idea what to fix? I consulted the docs and searched for any topic on conditional WHERE statements, but I still don't get it. Why can't I just select a specific date to get the matching rows?
What type of syntax is this?
WHERE CAST(ts_column as date) = ts_column "2019-06-19"'
You can write this as:
WHERE CAST(ts_column as date) = '2019-06-19'
Or more colloquially in Postgres as:
WHERE ts_column::date = '2019-06-19'::date
Thanks, the following definitions are working:
query = '''SELECT * FROM "TABLE" WHERE ts_date::date = date '2019-06-19' '''
query = '''SELECT * FROM "TABLE" WHERE ts_date::date = '2019-06-19' '''
Required syntax:
The table name has to be surrounded by double quotes since the table name is entirely upper case. Single quotes don't work here. I found this somewhere but can't find the link anymore. Sorry.
Triple quotes for the statement itself seem to be the only reliable variant.
There has to be whitespace between the single quote that closes the date literal and the closing triple quotes. Without it the query doesn't work, because Python reads the first three of the four consecutive quotes as the end of the string.
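Passing the date as a query parameter sidesteps the quoting question altogether, because the value never appears inside the SQL text. The sketch below uses the standard-library sqlite3 module as a stand-in so that it runs without a server; the table `events` and its rows are made up. With psycopg2 the placeholder would be %s rather than ?, and the cast would be written ts_column::date as in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, ts_column TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2019-06-19 08:30:00"), (2, "2019-06-20 12:00:00")],
)

# The date is supplied separately, so no quotes are needed around it in the SQL.
query = "SELECT id, ts_column FROM events WHERE date(ts_column) = ?"
rows = conn.execute(query, ("2019-06-19",)).fetchall()
print(rows)
conn.close()
```

pd.read_sql_query accepts the same kind of values through its params argument.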
I need to turn static SQL into dynamic SQL and want to do the conversion with Python. For example, given:
DROP TABLE TABLE_NAME PURGE;
CREATE TABLE TABLE_NAME NOLOGGING AS
SELECT DUMMY USER_ID FROM DUAL;
CREATE INDEX I_TABLE_NAME_USER_ID ON TABLE_NAME(USER_ID) NOLOGGING;
I want to convert it to the following format.
First, determine whether there is a DROP TABLE statement and replace it with:
V_TAB_NAME := 'TABLE_NAME';
IF (F_DROP_TAB(V_TAB_NAME) = 1) THEN
V_SQL := '
The final result should look like this:
V_TAB_NAME := 'TABLE_NAME ';
IF (F_DROP_TAB(V_TAB_NAME) = 1) THEN
SJ_SQL := '
CREATE TABLE TABLE_NAME NOLOGGING AS
SELECT DUMMY USER_ID FROM DUAL';
EXECUTE IMMEDIATE SJ_SQL;
COMMIT;
END IF;
SJ_SQL := '
CREATE INDEX I_TABLE_NAME_USER_ID ON TABLE_NAME (USER_ID) NOLOGGING';
EXECUTE IMMEDIATE SJ_SQL;
It may be a bit difficult to understand, but I hope someone can give me an example or idea of how to iterate over the statements as strings and decide how each one should be modified. Thanks.
Here's a simple way to insert variables into SQL statements:
table = 'mytable'
col1 = 'name'
col2 = 'age'
statement = 'select {}, {} from {};'.format(col1, col2, table)
Is this what you're looking for? If not, I don't really understand the question.
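If the table and column names can come from outside the program, str.format on its own is open to SQL injection. A minimal sketch of one way to guard it, assuming the set of permitted names is known in advance; `ALLOWED` and `build_select` are hypothetical names:

```python
ALLOWED = {"mytable", "name", "age"}  # assumed list of known-good identifiers


def build_select(columns, table):
    """Build a SELECT statement, refusing any identifier not in ALLOWED."""
    for ident in [*columns, table]:
        if ident not in ALLOWED:
            raise ValueError(f"unexpected identifier: {ident!r}")
    return "select {} from {};".format(", ".join(columns), table)


print(build_select(["name", "age"], "mytable"))
```

Values, as opposed to identifiers, are still better passed as query parameters than formatted into the string.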
I want to read all of the tables contained in a database into pandas data frames. This answer does what I want to accomplish, but I'd like to use the DBAPI syntax with the ? instead of the %s, per the documentation. However, I ran into an error. I thought this answer may address the problem, but I'm now posting my own question because I can't figure it out.
Minimal example
import pandas as pd
import sqlite3
pd.__version__ # 0.19.1
sqlite3.version # 2.6.0
excon = sqlite3.connect('example.db')
c = excon.cursor()
c.execute('''CREATE TABLE stocks
(date text, trans text, symbol text, qty real, price real)''')
c.execute("INSERT INTO stocks VALUES ('2006-01-05', 'BUY', 'RHAT', 100, 35.14)")
c.execute('''CREATE TABLE bonds
(date text, trans text, symbol text, qty real, price real)''')
c.execute("INSERT INTO bonds VALUES ('2015-01-01', 'BUY', 'RSOCK', 90, 23.11)")
data = pd.read_sql_query('SELECT * FROM stocks', excon)
# >>> data
# date trans symbol qty price
# 0 2006-01-05 BUY RHAT 100.0 35.14
But when I include a ? or a (?) as below, I get the error message pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT * FROM (?)': near "?": syntax error.
Problem code
c.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = c.fetchall()
# >>> tables
# [('stocks',), ('bonds',)]
table = tables[0]
data = pd.read_sql_query("SELECT * FROM ?", excon, params=table)
It's probably something trivial that I'm missing, but I'm not seeing it!
The problem is that you're trying to use parameter substitution for a table name, which is not possible. There's an issue on GitHub that discusses this. The relevant part is at the very end of the thread, in a comment by @jorisvandenbossche:
Parameter substitution is not possible for the table name AFAIK.
The thing is, in sql there is often a difference between string
quoting, and variable quoting (see eg
https://sqlite.org/lang_keywords.html the difference in quoting
between string and identifier). So you are filling in a string, which
is for sql something else as a variable name (in this case a table
name).
Parameter substitution is essential to prevent SQL Injection from unsafe user-entered values.
In this particular example you are sourcing table names directly from the database's own metadata, which is already safe, so it's OK to just use normal string formatting to construct the query, but still good to wrap the table names in quotes.
If you are sourcing user-entered table names, you can also parameterize them first before using them in your normal python string formatting.
e.g.
# assume this is user-entered:
table = '; select * from members; DROP members --'
# cursor.execute() takes the SQL text and a sequence of parameter values
c.execute("SELECT name FROM sqlite_master WHERE type='table' and name = ?;", (table,))
tables = c.fetchall()
In this case the user has entered some malicious input intended to cause havoc, and the parameterized query will cleanse it and the query will return no rows.
If the user entered a clean table e.g. table = 'stocks' then the above query would return that same name back to you, through the wash, and it is now safe.
Then it is fine to continue with normal python string formatting, in this case using f-string style:
table = tables[0][0]  # each fetched row is a 1-tuple such as ('stocks',)
data = pd.read_sql_query(f"""SELECT * FROM "{table}" ;""", excon)
Referring back to your original example, my first step above is entirely unnecessary. I just provided it for context. It is unnecessary, because there is no user input so you could just do something like this to get a dictionary of dataframes for every table.
c.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = c.fetchall()
# >>> tables
# [('stocks',), ('bonds',)]
dfs = dict()
for (t,) in tables:  # unpack the 1-tuple rows returned by fetchall()
    dfs[t] = pd.read_sql_query(f"""SELECT * FROM "{t}" ;""", excon)
Then you can fetch the dataframe from the dictionary using the tablename as the key.
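Put together as a self-contained sketch: to stay dependency-free it uses an in-memory database and plain sqlite3 cursors in place of pd.read_sql_query, the two tables are cut-down versions of the ones in the question, and `quote_ident` is a hypothetical helper for wrapping the table names in double quotes.

```python
import sqlite3


def quote_ident(name):
    """Wrap a name in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (symbol TEXT, qty REAL)")
conn.execute("CREATE TABLE bonds (symbol TEXT, qty REAL)")
conn.execute("INSERT INTO stocks VALUES ('RHAT', 100)")
conn.execute("INSERT INTO bonds VALUES ('RSOCK', 90)")

# Table names come from the database's own metadata, one 1-tuple per row.
names = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]

# Identifiers cannot be bound as parameters, so they are quoted and formatted in.
contents = {}
for name in names:
    contents[name] = conn.execute(f"SELECT * FROM {quote_ident(name)}").fetchall()

print(sorted(contents))
print(contents["stocks"])
```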
This question is similar to drastega's question
I have a similar problem; however, I want to get rid of all quoting characters around names. Here is an example:
CREATE TABLE Resolved (
[Name] TEXT,
[Count] INTEGER,
[Obs_Date] TEXT,
[Bessel_year] REAL,
[Filter] TEXT,
[Comments] TEXT
);
changes to:
CREATE TABLE Resolved (
Name TEXT,
Count INTEGER,
Obs_Date TEXT,
Bessel_year REAL,
Filter TEXT,
Comments TEXT
);
Following the steps from the link above, I have managed to change "[" to quotes. However, I don't want to use any quoting characters. I tried to read the documentation about sqlalchemy's metadata. I know that I need to use the quote=False parameter, but I don't know where to set it. Thank you in advance for your answers.
The code from Joris worked well in my case by just changing the line c.quote = False to c.name.quote = False
with a pandas version 0.23.4, sqlalchemy=1.2.13 and python 3.6 for a postgres database.
It is a bit strange that an sqlite application errors on the quotes (as sqlite should be case insensitive, quotes or not), and I would also be very cautious with some of the special characters you mention in column names. But if you need to insert the data into sqlite with a schema without quotes, you can do the following.
Starting from this:
import pandas as pd
from pandas.io import sql
import sqlalchemy
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
df = pd.DataFrame({'Col1': [1, 2], 'Col2': [0.1, 0.2]})
df.to_sql('test', engine, if_exists='replace', index=False)
So by default sqlalchemy uses quotes (because you have capitals, otherwise no quotes would be used):
In [8]: res = engine.execute("SELECT * FROM sqlite_master;").fetchall()
In [9]: print(res[0][4])
CREATE TABLE test (
"Col1" BIGINT,
"Col2" FLOAT
)
SQLAlchemy has a quote parameter on each Column that you can set to False. To combine this with the pandas function, we have to use a workaround (as pandas already creates the Columns with the default values):
db = sql.SQLDatabase(engine)
t = sql.SQLTable('test', db, frame=df, if_exists='replace', index=False)
for c in t.table.columns:
    c.quote = False
t.create()
t.insert()
The above is equivalent to the to_sql call, but with interacting with the created table object before writing to the database. Now you have no quotes:
In [15]: res = engine.execute("SELECT * FROM sqlite_master;").fetchall()
In [16]: print(res[0][4])
CREATE TABLE test (
Col1 BIGINT,
Col2 FLOAT
)
You can try to use lower case for both table name and column names. Then SQLAlchemy won't quote the table name and column names.