I am running a SQL statement and fetching the result's column names (headers). I want to collect only the headers that end in _check, and then use .replace to get a list of those column names without the _check suffix.
Here is my code:
pg_hook = PostgresHook(postgre_conn_id="postgres_default", schema='schema1')
connection = pg_hook.get_conn()
col_query = "select * from schema.table"
cursor = connection.cursor()
cursor.execute(col_query)
# fetchall to dictionary
desc = cursor.description
column_names = [col[0] for col in desc]
data = [dict(zip(column_names, row)) for row in cursor.fetchall()]
for x in column_names:
    if x='updated_check':
        x.replace('_check','')
connection.commit()
connection.close()
Any ideas or suggestions on how to rename the columns that have _check in them and then put them in a list? Any help would be appreciated. I am using Airflow, Python, MySQLdb, and psycopg2.
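One possible approach, as a minimal sketch: strings are immutable, so `x.replace(...)` returns a new string that has to be stored somewhere. The list below is a hypothetical stand-in for `[col[0] for col in cursor.description]`, and `strip_check_suffix` is a name invented for this example.

```python
# Hypothetical column names; in the question these come from cursor.description.
column_names = ["id", "updated_check", "created_check", "check_in_date", "name"]

SUFFIX = "_check"


def strip_check_suffix(names, suffix=SUFFIX):
    """Return the names that end with `suffix`, with that suffix removed."""
    # endswith() avoids matching names that merely contain the text elsewhere,
    # and slicing removes only the trailing occurrence.
    return [name[: -len(suffix)] for name in names if name.endswith(suffix)]


renamed = strip_check_suffix(column_names)
print(renamed)  # ['updated', 'created']
```

Using `endswith` plus slicing rather than `.replace('_check', '')` keeps a name such as `a_check_b_check` from losing both occurrences.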
Related
I am trying to capture only the value returned by a PostgreSQL statement. The select statement outputs one row with a column named updated_at whose value is a timestamp, '2008-01-01 00:50:01'. I want to capture just that value, so that when I reference the variable it outputs '2008-01-01 00:50:01'.
Here is my code:
def get_etl_record():
    pg_hook = PostgresHook(postgre_conn_id="post", schema='schema1')
    connection = pg_hook.get_conn()
    cursor = connection.cursor()
    cursor2 = connection.cursor()
    latest_update_query = "select max(updated_at) from my_table group by updated_at"
    cursor.execute(latest_update_query)
    #results= cursor.fetchall()
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row[0])) for row in cursor.fetchall()]
    print(rows)
However, this code doesn't give me any output.
Any ideas or suggestions?
There is no way to do exactly what you want, but here are three ways to do something very similar:
1.
# Plain cursor: fetchone() returns a tuple, so take its first element.
cursor = connection.cursor()
cursor.execute(sql)
result = cursor.fetchone()
max_updated_at = result[0]
2.
# DictCursor (needs "import psycopg2.extras"): read the aliased column by key.
dict_cur = connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
dict_cur.execute('select max(updated_at) as max_updated_at ...')
result = dict_cur.fetchone()
max_updated_at = result['max_updated_at']
3.
# NamedTupleCursor (also in psycopg2.extras): read the aliased column as an attribute.
nt_cur = connection.cursor(cursor_factory=psycopg2.extras.NamedTupleCursor)
nt_cur.execute('select max(updated_at) as max_updated_at ...')
result = nt_cur.fetchone()
max_updated_at = result.max_updated_at
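To try the index and name access patterns without a PostgreSQL server, here is a small sketch that swaps in the standard-library `sqlite3` module (it is not psycopg2, and `sqlite3.Row` only stands in for `DictCursor`). The table contents are made up. Note that the aggregate is selected without `GROUP BY updated_at`, since grouping by the same column would return one row per distinct timestamp rather than a single maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows support both index and column-name access

# Made-up sample data standing in for my_table.
conn.execute("CREATE TABLE my_table (updated_at TEXT)")
conn.executemany(
    "INSERT INTO my_table (updated_at) VALUES (?)",
    [("2008-01-01 00:50:01",), ("2007-06-15 12:00:00",)],
)

row = conn.execute(
    "SELECT max(updated_at) AS max_updated_at FROM my_table"
).fetchone()

by_index = row[0]                # like option 1
by_name = row["max_updated_at"]  # like option 2
print(by_name)  # 2008-01-01 00:50:01
conn.close()
```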
How can I fetch just one column as an array in Python with pymysql?
For example, given the SQL:
select name from users
I want the data as:
["Tom", "Ben", "Jon"]
cursor = conn.cursor() # where conn is your connection
cursor.execute('select name from users')
rows = cursor.fetchall()
result_list = [row[0] for row in rows]
Here's a rundown of what I'd like to do. I have a list of table names, and I want to run SQL against an Oracle database and pull back the table name and row count for every table in the list. However, not every name in the list necessarily exists in the database, and a missing table makes my code throw a database error. Whenever I come to a table name that is not in the database, I would like to create a dataframe that contains the table name and, instead of count(*), some text such as 'table not found'. At the end of the loop I concatenate all of the dataframes into one. The overall goal is to validate that certain tables exist and that they have the expected row counts.
query_list = []
df_List = []
connstr = '%s/%s@%s' % (username, password, server)
conn = cx_Oracle.connect(connstr)
with conn:
    query_list = ["SELECT '%s' as tbl, count(*) FROM %s." % (elm, database) + elm for elm in table_list]
    df_List = [pd.read_sql(elm, conn) for elm in query_list]
df = pd.concat(df_List)
Consider try/except handling to return query output or table not found output:
def get_table_count(sql, conn, elm):
    try:
        return pd.read_sql(sql, conn)
    except Exception:
        # The query failed (for example, the table does not exist): return a placeholder row.
        return pd.DataFrame({'tbl': elm, 'note': 'table not found'}, index=[0])
with conn:
    sql = "SELECT '{t}' as tbl, count(*) as table_count FROM {d}.{t}"
    df_List = [get_table_count(sql.format(t=elm, d=database), conn, elm)
               for elm in table_list]
df = pd.concat(df_List, ignore_index=True)
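The same try/except idea can be tried out without Oracle or pandas. The sketch below is an illustration only: it uses the standard-library `sqlite3` module as a stand-in database, returns plain tuples instead of DataFrames, and catches `sqlite3.OperationalError` specifically (with another driver the exception class will differ). The table names are made up.

```python
import sqlite3


def get_table_count(conn, table):
    """Return (table, row_count), or (table, 'table not found') if the query fails."""
    try:
        # Identifiers cannot be bound as parameters, so the name is interpolated;
        # this assumes the names come from a trusted list.
        count = conn.execute(f'SELECT count(*) FROM "{table}"').fetchone()[0]
        return (table, count)
    except sqlite3.OperationalError:
        # sqlite3 raises OperationalError ("no such table") for a missing table.
        return (table, "table not found")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])

table_list = ["orders", "missing_table"]
results = [get_table_count(conn, t) for t in table_list]
print(results)  # [('orders', 3), ('missing_table', 'table not found')]
conn.close()
```

Catching a narrow exception class keeps unrelated failures, such as a dropped connection, from being reported as a missing table.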
Get a list of all the Table Names which are in the DB, then create a loop to query each Table to get the row count.
Here is a SQL statement to get a list of all Tables in an Oracle DB:
SQL:
SELECT DISTINCT TABLE_NAME FROM ALL_TAB_COLUMNS ORDER BY TABLE_NAME ASC;
Python (to make list of tables you want row counts for and which exist in the DB):
list(set(tables_that_exist_in_DB) - (set(tables_that_exist_in_DB) - set(list_of_tables_you_want)))
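The expression above is a set intersection written the long way. A short sketch with made-up table names shows the equivalent `&` form, plus an order-preserving variant that also reports which requested tables are missing; the two list names are taken from the line above, the rest are invented for this example.

```python
# Made-up inputs: names returned by the database, and names to validate.
tables_that_exist_in_DB = ["CUSTOMERS", "ORDERS", "PRODUCTS"]
list_of_tables_you_want = ["ORDERS", "INVOICES", "CUSTOMERS"]

existing = set(tables_that_exist_in_DB)
wanted = set(list_of_tables_you_want)

# A - (A - B) is the same set as A & B.
long_form = existing - (existing - wanted)
intersection = existing & wanted

# Sets are unordered; list comprehensions keep the order of the wanted list.
found = [t for t in list_of_tables_you_want if t in existing]
missing = [t for t in list_of_tables_you_want if t not in existing]

print(found)    # ['ORDERS', 'CUSTOMERS']
print(missing)  # ['INVOICES']
```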
import sqlite3
#connect to the sqlite database
conn = sqlite3.connect('database.db')
#create a cursor
c = conn.cursor()
#select query to return a single row
c.execute('SELECT NAME FROM T1')
#row contains the returned result
row = c.fetchone()
#print the result
print(row)
It prints something like --> (u'John',), but I only want John
You are printing the whole row, which is always going to be a tuple.
If you wanted to print just the first column, use subscription:
print(row[0])
In one of my Django views I query the database using plain SQL (not the ORM) and return the results.
sql = "select * from foo_bar"
cursor = connection.cursor()
cursor.execute(sql)
rows = cursor.fetchall()
I am getting the data fine, but not the column names. How can I get the field names of the result set that is returned?
In the Django docs, there's a pretty simple method provided (which does indeed use cursor.description, as Ignacio answered).
def dictfetchall(cursor):
    "Return all rows from a cursor as a dict"
    columns = [col[0] for col in cursor.description]
    return [
        dict(zip(columns, row))
        for row in cursor.fetchall()
    ]
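Because the helper only relies on the DB-API `cursor.description` attribute, it can be tried outside Django. The sketch below assumes the standard-library `sqlite3` module in place of Django's connection; the `foo_bar` columns and rows are made up.

```python
import sqlite3


def dictfetchall(cursor):
    """Return all rows from an executed cursor as a list of dicts."""
    # Each description entry is a sequence whose first item is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo_bar (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO foo_bar VALUES (?, ?)", [(1, "Tom"), (2, "Ben")])

cursor = conn.cursor()
cursor.execute("SELECT * FROM foo_bar ORDER BY id")
rows = dictfetchall(cursor)
print(rows)  # [{'id': 1, 'name': 'Tom'}, {'id': 2, 'name': 'Ben'}]
conn.close()
```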
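According to PEP 249, you can try using cursor.description, but this is not entirely reliable: the attribute is None for operations that do not return rows, or when nothing has been executed on the cursor yet.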
I have found a nice solution in Doug Hellmann's blog:
http://doughellmann.com/2007/12/30/using-raw-sql-in-django.html
from django.db import connection
def query_to_dicts(query_string, *query_args):
    """Run a simple query and produce a generator
    that returns the results as a bunch of dictionaries
    with keys for the column values selected.
    """
    cursor = connection.cursor()
    cursor.execute(query_string, query_args)
    col_names = [desc[0] for desc in cursor.description]
    while True:
        row = cursor.fetchone()
        if row is None:
            break
        # zip replaces Python 2's itertools.izip, which does not exist in Python 3
        row_dict = dict(zip(col_names, row))
        yield row_dict
Example usage:
row_dicts = query_to_dicts("""select * from table""")
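The generator approach fetches rows one at a time, which matters for large result sets. As a rough illustration outside Django, the sketch below assumes a `sqlite3` in-memory database; `iter_dicts` is a name invented here, and it takes an already executed cursor instead of using Django's `connection`.

```python
import sqlite3


def iter_dicts(cursor):
    """Yield one dict per row, fetching rows lazily from an executed cursor."""
    col_names = [desc[0] for desc in cursor.description]
    # iter(callable, sentinel) keeps calling fetchone() until it returns None.
    for row in iter(cursor.fetchone, None):
        yield dict(zip(col_names, row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "Tom"), (2, "Ben"), (3, "Jon")]
)

cursor = conn.cursor()
cursor.execute("SELECT id, name FROM users WHERE id > ? ORDER BY id", (1,))

row_dicts = iter_dicts(cursor)
first = next(row_dicts)      # only one row has been fetched at this point
remaining = list(row_dicts)  # consume the rest
conn.close()
```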
Try the following code:
import pandas as pd
import pyodbc
import sfconfig_1  # the poster's own config module; dbdetails must provide server, database, username, password
def read_data(db_name, tbl_name):
    details = sfconfig_1.dbdetails
    connect_string = (
        'DRIVER=ODBC Driver 17 for SQL Server;SERVER={server}; DATABASE={database};'
        'UID={username};PWD={password};Encrypt=YES;TrustServerCertificate=YES'
    ).format(**details)
    connection = pyodbc.connect(connect_string)  # connecting to the server
    print("connected to db")
    # query syntax
    query = "select top 100 * from [{}].[dbo].[{}] t where t.chargeid = '622102*3';".format(db_name, tbl_name)
    # print(query, "\n")
    df = pd.read_sql_query(query, con=connection)
    print(df.iloc[0])
    return "connected to db...................."
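When the filter value comes from outside the code, it is usually safer to bind it as a query parameter than to concatenate it into the SQL string. pyodbc uses `?` placeholders, and so does the standard-library `sqlite3` module, which is used below as a stand-in so the sketch runs without a SQL Server instance. The `charges` table, its columns, and `read_charges` are assumptions made up for this example, and SQLite's `LIMIT` replaces T-SQL's `TOP`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up table standing in for [db].[dbo].[table].
conn.execute("CREATE TABLE charges (chargeid TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO charges VALUES (?, ?)",
    [("622102*3", 10.5), ("700000*1", 3.0)],
)


def read_charges(conn, charge_id, limit=100):
    """Return up to `limit` rows whose chargeid equals `charge_id`."""
    # The driver binds the values, so quotes inside charge_id cannot alter the query.
    query = "SELECT chargeid, amount FROM charges WHERE chargeid = ? LIMIT ?"
    return conn.execute(query, (charge_id, limit)).fetchall()


rows = read_charges(conn, "622102*3")
print(rows)  # [('622102*3', 10.5)]
conn.close()
```

Table and database names cannot be bound this way, so identifiers such as `db_name` and `tbl_name` still need to come from a trusted source.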