I'd like to retrieve the fully referenced column name from a PyOdbc Cursor. For example, say I have 2 simple tables:
Table_1(Id, < some other fields >)
Table_2(Id, < some other fields >)
and I want to retrieve the joined data
select * from Table_1 t1, Table_2 t2 where t1.Id = t2.Id
using pyodbc, like this:
import pyodbc

conn_string = '<removed>'
connection = pyodbc.connect(conn_string)
cursor = connection.cursor()
query = 'select * from Table_1 t1, Table_2 t2 where t1.Id = t2.Id'
cursor.execute(query)
I then want to get the column names:
for row in cursor.description:
    print(row[0])
BUT if I do this, I get Id twice, which I don't want. Ideally the output would be t1.Id and t2.Id.
Some of the solutions I've thought of (and why I don't really want to implement them):
rename the columns in the query - in my real-world use case there are dozens of tables, some with dozens of columns, and they change far too often
parse my query and automate my SQL query generation (basically checking the query for tables, using the cursor.tables function to get the columns, and then replacing the select * with a set of named columns) - if I have to I'll do this, but it seems like overkill for a testing harness
Is there a better way? Any advice would be appreciated.
The PyOdbc docs offer
# columns in table x
for row in cursor.columns(table='x'):
    print(row.column_name)
The API docs on the pyodbc wiki are useful.
Here's how I do it.
import pyodbc
connection = pyodbc.connect('DSN=vertica_standby', UID='my_user', PWD='my_password', ansi=True)
cursor = connection.cursor()
for row in cursor.columns(table='table_name_in_your_database'):
    print(row.column_name)
You have to have your DSN (data source name) set up via two files: odbc.ini and odbcinst.ini.
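If you would rather not maintain those files, pyodbc also accepts a DSN-less connection string. A minimal sketch - the driver name, server, and credentials here are assumptions for your environment, not from the original post:

import pyodbc

# DSN-less alternative; driver name and server below are placeholders
connection = pyodbc.connect(
    'DRIVER={Vertica};SERVER=vertica-host;DATABASE=mydb;'
    'UID=my_user;PWD=my_password',
    ansi=True
)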
It doesn't seem to be possible to do what I want without writing a decent amount of code to wrap it up. None of the other answers actually answered the question of returning column names qualified by the table they originate from in some relatively automatic fashion.
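For completeness, here is a sketch of the kind of wrapper I mean: it uses cursor.columns() to build an explicit, table-qualified select list so the duplicate names disambiguate themselves. The alias-to-table mapping is supplied by hand, and this is untested scaffolding rather than a finished solution:

import pyodbc

def qualified_select(cursor, tables):
    # tables maps alias -> table name, e.g. {'t1': 'Table_1', 't2': 'Table_2'}
    cols = []
    for alias, table in tables.items():
        for row in cursor.columns(table=table):
            # alias every column as "<table alias>.<column>" to keep names unique
            cols.append('%s.%s AS "%s.%s"' % (alias, row.column_name,
                                              alias, row.column_name))
    return 'select ' + ', '.join(cols)

connection = pyodbc.connect('<removed>')
cursor = connection.cursor()
select_list = qualified_select(cursor, {'t1': 'Table_1', 't2': 'Table_2'})
cursor.execute(select_list + ' from Table_1 t1, Table_2 t2 where t1.Id = t2.Id')
print([col[0] for col in cursor.description])  # e.g. 't1.Id', ..., 't2.Id', ...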
I am trying to parse all the queries executed by users (within a period of time) in a PostgreSQL DB (by querying the pg_stat_statements view) and trying to create a report of which tables users touch with a Select, an Insert, or a Delete query. Basically I am running something like Select query, queryid, userid from pg_stat_statements and then parsing each query to check whether it was a Select, Insert, or Delete, and also extracting the table name from the query.
I am using the sqlparse Python module but am very new to it, so I need help.
I am able to get the table name by using something like:
import sqlparse
from sqlparse.sql import Where, Comparison, Parenthesis, Identifier
for tokens in sqlparse.parse(sql_statement)[0]:
    if isinstance(tokens, Identifier):
        print(str(tokens))
but I am not sure how to get the type of statement (Select/Insert/Delete) together with the name of the table. Also, I need to treat COPY statements as Selects too.
I tried using psqlparse but I did not see much info/help online regarding this module.
Please suggest.
Thanks.
This is not trivial, and I don't think sqlparse really helps very much. INSERT and DELETE are pretty easy, because they usually start out "INSERT INTO table" and "DELETE FROM table", but "SELECT" is the wild wild west. Clearly the tables will be mentioned in a FROM clause, but it could be "FROM table1 t1, table2 t2, table3 t3 WHERE" or "FROM table1 t1 LEFT JOIN table2 t2 ... INNER JOIN table3 t3 ... WHERE".
You might have nested queries, and a SELECT doesn't even have to have a table. Plus, there could be UNIONs that mention further tables. And, of course, "SELECT INTO" is just another way of doing "INSERT". I believe you should start out just doing text processing, looking for the key words. You might get far enough.
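That said, if you do want to lean on sqlparse for the easy cases, here is a rough sketch (my own, untested against your workload) that combines get_type() with a naive scan for FROM/INTO. It handles simple statements only; nested queries, JOIN chains, and UNIONs need the extra handling described above:

import sqlparse
from sqlparse.sql import Identifier, IdentifierList
from sqlparse.tokens import Keyword

def statement_type_and_tables(sql):
    # A rough sketch: handles single tables and comma-separated FROM lists.
    stmt = sqlparse.parse(sql)[0]
    tables = []
    in_from = False
    for token in stmt.tokens:
        if in_from:
            if isinstance(token, Identifier):
                tables.append(token.get_real_name())
                in_from = False
            elif isinstance(token, IdentifierList):
                tables.extend(t.get_real_name()
                              for t in token.get_identifiers()
                              if isinstance(t, Identifier))
                in_from = False
        elif token.ttype is Keyword and token.value.upper() in ('FROM', 'INTO'):
            in_from = True
    # get_type() returns 'SELECT', 'INSERT', 'DELETE', ... or 'UNKNOWN'
    return stmt.get_type(), tables

print(statement_type_and_tables(
    "SELECT a.col1, b.col2 FROM tab1 a, tab2 b WHERE a.id = b.id"))
# -> ('SELECT', ['tab1', 'tab2'])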
I have made a test table in SQL. I extract data from it using the following Python script:
import pandas as pd
import mysql.connector

db = mysql.connector.connect(host="localhost", user="root", passwd="abcdef")
pointer = db.cursor()
pointer.execute("USE holdings")

x = "SELECT * FROM orders WHERE tradingsymbol LIKE 'TATACHEM'"
pointer.execute(x)
rows = pointer.fetchall()

# build a dataframe from the raw rows and pull out the second column
rows = pd.DataFrame(rows)
stock = rows[1]
The production table contains 200 unique trading symbols and has a schema similar to the test table.
My doubt is that for the following statement:
x = "SELECT * FROM orders WHERE tradingsymbol LIKE 'TATACHEM'"
I would have to substitute each of the 200 values of tradingsymbol in turn, which is inefficient.
Is there a more efficient way to do this?
If I understand you correctly, your problem is that you want to avoid sending a separate query for each trading symbol, correct? In that case MySQL's IN operator might help: you can send the database one query containing all the tradingsymbols you want. If you then want to do different things with the various trading symbols, you can select the subsets within pandas.
Another performance improvement could be pandas.read_sql, since it speeds up the creation of the dataframe somewhat.
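Putting both ideas together, a sketch - the symbol list and credentials are placeholders, not from your post:

import pandas as pd
import mysql.connector

db = mysql.connector.connect(host="localhost", user="root",
                             passwd="abcdef", database="holdings")

symbols = ['TATACHEM', 'INFY', 'TCS']  # ... your 200 symbols
placeholders = ', '.join(['%s'] * len(symbols))
query = "SELECT * FROM orders WHERE tradingsymbol IN (%s)" % placeholders

# one round trip for all symbols (pandas may warn when given a raw DBAPI
# connection instead of an SQLAlchemy engine, but it works)
df = pd.read_sql(query, con=db, params=symbols)

# split back into per-symbol frames if you need to treat them differently
per_symbol = {sym: grp for sym, grp in df.groupby('tradingsymbol')}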
Two more things to add for efficiency:
Ensure that tradingsymbols is indexed in MySQL for faster lookups
Make tradingsymbols an ENUM to ensure that no typos or the like are accepted. Otherwise the above-mentioned "IN" method also does not work, since it does a full-text comparison.
Is there a way to return the aliased column names from a SQL query run through JayDeBeApi?
For example, I have the following query:
sql = """ SELECT visitorid AS id_alias FROM table LIMIT 1 """
I then run the following (connect_to_vdm() establishes a connection to my DB):
curs = connect_to_vdm().cursor()
curs.execute(sql)
vals = curs.fetchall()
I normally retrieve column names like so:
desc = curs.description
column_names = [col[0] for col in desc]
This returns the original column name "visitorid" and not the alias specified in the query "id_alias".
I know I could swap the names for the values in Python, but I am hoping to have this done within the query, since the alias is already defined in the Select statement. This behaves as expected in a SQL client, but I cannot seem to get the aliases to return when using Python/JayDeBeApi. Is there a way to do this using JayDeBeApi?
EDIT:
I have discovered that structuring my query with a CTE seems to fix the problem, but I am still wondering if there is a more straightforward solution out there. Here is how I rewrote the same query:
sql = """ WITH cte (id_alias) AS (SELECT visitorid AS id_alias FROM table LIMIT 1) SELECT id_alias from cte"""
I was able to fix this using a CTE (Common Table Expression):
sql = """ WITH cte (id_alias) AS (SELECT visitorid AS id_alias FROM table LIMIT 1) SELECT id_alias from cte"""
Hat tip to pybokeh on GitHub; this worked for me.
According to IBM (here and here), the behavior of JDBC drivers changed at some point. Bizarrely, the column aliases display just fine when using a tool like DBVisualizer, but not by querying through jaydebeapi.
To fix, add the following to the end of your DB URL:
:useJDBC4ColumnNameAndLabelSemantics=false;
Example:
jdbc:db2://[DBSERVER]:[PORT]/[DBNAME]:useJDBC4ColumnNameAndLabelSemantics=false;
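For example, a sketch of a full connection - the driver class, host, port, jar path, and credentials are placeholders for your environment:

import jaydebeapi

url = ('jdbc:db2://dbserver:50000/mydb'
       ':useJDBC4ColumnNameAndLabelSemantics=false;')
conn = jaydebeapi.connect('com.ibm.db2.jcc.DB2Driver', url,
                          ['user', 'password'], 'db2jcc4.jar')
curs = conn.cursor()
curs.execute('SELECT visitorid AS id_alias FROM table LIMIT 1')
print([col[0] for col in curs.description])  # should now report 'id_alias'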
I have 2 tables; we'll call them table1 and table2. table2 has a foreign key to table1. I need to delete the rows in table1 that have zero child records in table2. The SQL to do this is pretty straightforward:
DELETE FROM table1
WHERE 0 = (SELECT COUNT(*) FROM table2 WHERE table2.table1_id = table1.table1_id);
However, I haven't been able to find a way to translate this query to SQLAlchemy. Trying the straightforward approach:
subquery = session.query(sqlfunc.count(Table2).label('t2_count')).select_from(Table2).filter(Table2.table1_id == Table1.table1_id).subquery()
session.query(Table1).filter(0 == subquery.columns.t2_count).delete()
Just yielded an error:
sqlalchemy.exc.ArgumentError: Only deletion via a single table query is currently supported
How can I perform this DELETE with SQLAlchemy?
Python 2.7
PostgreSQL 9.2.4
SQLAlchemy 0.7.10 (Cannot upgrade due to using GeoAlchemy, but am interested if newer versions would make this easier)
I'm pretty sure this is what you want. You should try it out though. It uses EXISTS.
from sqlalchemy.sql import not_

# This fetches rows in Python to determine which ones were removed.
Session.query(Table1).filter(not_(Table1.table2s.any())).delete(
    synchronize_session='fetch')

# If you will not be referencing more Table1 objects in this session, then
# you can just skip synchronizing the session.
Session.query(Table1).filter(not_(Table1.table2s.any())).delete(
    synchronize_session=False)
Explanation of the argument for delete():
http://docs.sqlalchemy.org/en/rel_0_8/orm/query.html#sqlalchemy.orm.query.Query.delete
Example with exists() (the any() used above generates an EXISTS clause):
http://docs.sqlalchemy.org/en/rel_0_8/orm/tutorial.html#using-exists
Here is the SQL that should be generated:
DELETE FROM table1 WHERE NOT (EXISTS (SELECT 1
FROM table2
WHERE table1.id = table2.table1_id))
If you are using declarative, you can get at the underlying Table objects via Table1.__table__ and Table2.__table__, and then use the SQL expression layer of SQLAlchemy to emit exactly the DELETE you want, although you run into the same issue of your Session going out of sync.
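Something like this sketch, assuming declarative models and the 0.7-era select([...]) API (untested):

from sqlalchemy.sql import exists, select

t1 = Table1.__table__
t2 = Table2.__table__

stmt = t1.delete().where(
    ~exists(select([1]).where(t2.c.table1_id == t1.c.table1_id))
)
session.execute(stmt)
session.commit()
# note: this bypasses the ORM, so any Table1 objects already loaded in the
# session will not know they were deleted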
Well, I found one very ugly way to do it. You can do a select with a join to get the rows loaded into memory, then you can delete them individually:
subquery = session.query(Table2.table1_id,
                         sqlalchemy.func.count(Table2.table2_id).label('t1count')) \
    .select_from(Table2) \
    .group_by(Table2.table1_id) \
    .subquery()

rows = session.query(Table1) \
    .select_from(Table1) \
    .outerjoin(subquery, Table1.table1_id == subquery.c.table1_id) \
    .filter(subquery.c.t1count == None) \
    .all()

for r in rows:
    session.delete(r)
This is not only nasty to write, it's also pretty nasty performance-wise. For starters, you have to bring the table1 rows into memory. Second, if you were like me and had a line like this on Table2's class definition:
table1 = orm.relationship(Table1, backref=orm.backref('table2s'))
then SQLAlchemy will actually perform a query to pull the related table2 rows into memory, too (even though there aren't any). Even worse, because you have to loop over the list (I tried just passing in the list; it didn't work), it does so one table1 row at a time. So if you're deleting 10 rows, it's 21 individual queries (1 for the initial select, 1 per row for the relationship pull, and 1 per row for the delete). Maybe there are ways to mitigate that; I would have to go through the documentation to see. All this for rows I don't even want in my database, much less in memory.
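One mitigation I can think of (again, untested): query only the primary keys, reusing the subquery from above, and then issue a single bulk DELETE with IN, which avoids loading full rows and the per-row relationship pulls:

ids = [row.table1_id for row in
       session.query(Table1.table1_id)
              .outerjoin(subquery, Table1.table1_id == subquery.c.table1_id)
              .filter(subquery.c.t1count == None)]
if ids:
    session.query(Table1) \
           .filter(Table1.table1_id.in_(ids)) \
           .delete(synchronize_session=False)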
I won't mark this as the answer. I want a cleaner, more efficient way of doing this, but this is all I have for now.
So my problem is this: I have a query that uses MySQL user-defined variables, like:
SET @X:=0; SELECT @X:=@X+1 FROM some_table
and this query returns a column of values from 1 to 1000.
However, this query doesn't work if I send it through MySQLdb in Python.
connection = MySQLdb.Connect(host='xxx', user='xxx', passwd='xxx', db='xxx')
cursor = connection.cursor()
cursor.execute("""SET @X:=0; SELECT @X:=@X+1 FROM some_table""")
rows = cursor.fetchall()
print(rows)
It prints an empty tuple.
How can I solve this?
Thanks
Try to execute one query at a time:
cursor.execute("SET #X:=0;");
cursor.execute("SELECT #X:=#X+1 FROM some_table");
Try it as two queries.
If you want it to be one query, the examples in the comments to the MySQL User Variables documentation look like this:
SELECT @rownum:=@rownum+1 rownum, t.* FROM (SELECT @rownum:=0) r, mytable t;
or
SELECT if(@a, @a:=@a+1, @a:=1) as rownum
See http://dev.mysql.com/doc/refman/5.1/en/user-variables.html
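Putting that together with the code from the question, a sketch of the single-statement version (connection details are the same placeholders as in the question):

import MySQLdb

connection = MySQLdb.Connect(host='xxx', user='xxx', passwd='xxx', db='xxx')
cursor = connection.cursor()
# initialise @X in a derived table so only one statement needs to be sent
cursor.execute("SELECT @X:=@X+1 AS rownum FROM (SELECT @X:=0) init, some_table")
print(cursor.fetchall())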