How to retrieve all saved queries from MS Access database using python? - python

Basically, I'm comparing two Access databases using Python.
I cannot open the Access files manually; everything must be done entirely within Python.
I need to retrieve the full list of:
Query names
Associated query code
I will not know the names of the queries ahead of time.
I've tried a number of solutions that nearly worked; I've outlined the three closest below.
Partial Solution 1
I nearly had it working using win32com and the CurrentDb.QueryDefs collection to retrieve each query's code.
However, it appears that the order of the join conditions is not stored deterministically between the two databases.
(It appears to depend on the order of the entries in MSysQueries.)
For example, in one database the text for the join could be
on Table1.ColumnA = Table2.ColumnA & Table1.ColumnB = Table2.ColumnB
and in another
on Table1.ColumnB = Table2.ColumnB & Table1.ColumnA = Table2.ColumnA
Obviously these result in the same join, but not in the exact same query text.
If I compare the text directly, the two do not match. Normalizing the text before comparing seems like a bad idea with lots of corner cases.
Sample Code
from win32com.client import Dispatch
objAccess = Dispatch("Access.Application")
objAccess.Visible = False
counter = 0
query_dicts = {}
for database_path in (new_database_path, old_database_path):
    # Open each database in turn and pull its stored queries into a dict
    objAccess.OpenCurrentDatabase(database_path)
    objDB = objAccess.CurrentDb()
    db_query_dict = {}
    for stored_query in objDB.QueryDefs:
        db_query_dict[stored_query.name] = stored_query.sql
    query_dicts[("New" if counter == 0 else 'Old')] = db_query_dict
    objAccess.CloseCurrentDatabase()
    counter += 1
Partial Solution 2
After the first solution failed, I tried to write a query on MSysQueries and force an ordering. However, pyodbc does not have read access to that table!
It appears you cannot grant read access from Python itself, which is a problem, though I could be wrong here.
Query:
SELECT MSysObjects.Name
, MSysQueries.Attribute
, MSysQueries.Expression
, MSysQueries.Flag
, MSysQueries.Name1
, MSysQueries.Name2
FROM MSysObjects INNER JOIN MSysQueries ON MSysObjects.Id = MSysQueries.ObjectId
order by MSysObjects.Name
, MSysQueries.Attribute
, MSysQueries.Expression
, MSysQueries.Flag
, MSysQueries.Name1
, MSysQueries.Name2
Partial Solution 3
Another thing I tried was to have Python store a VBA module in the database that writes the meta info to a table, and then read that table out via pyodbc.
I could add the module, but the Access database kept prompting for a name for the module. I couldn't find documentation on how to name the module with a method call.
Sample Code:
import win32com.client as win32
import comtypes, comtypes.client
import win32api, time
from win32com.client import Dispatch
strDbName = r'C:\Users\Username\SampleDatabase.mdb'
objAccess = Dispatch("Access.Application")
# objAccess.Visible = False
objAccess.OpenCurrentDatabase(strDbName)
objDB = objAccess.CurrentDb()
xlmodule = objAccess.VBE.VbProjects(1).VBComponents.Add(1) # vbext_ct_StdModule
xlmodule.CodeModule.AddFromString(Constants.ACCESS_QUERY_META_INFO_MACRO)
objAccess.Run("CreateQueryMetaInfoTable")
objAccess.CloseCurrentDatabase()
objAccess.Quit()
Macro I was attempting to add:
Sub CreateQueryMetaInfoTable()
    Dim sql_string As String
    ' Create empty table
    CurrentDb.Execute ("Create Table QueryMetaInfoTable (QueryName text, SqlCode text)")
    Dim qd As QueryDef
    For Each qd In CurrentDb.QueryDefs
        ' Insert values
        sql_string = "Insert into QueryMetaInfoTable (QueryName, SqlCode) values ('" & qd.Name & "', '" & qd.SQL & "')"
        CurrentDb.Execute sql_string
    Next
End Sub

With the help of @Gord Thompson, I have a working solution now.
I needed to connect with OLEDB first to grant read access, generate a non-system table with the required info, and then read that table back with ODBC via pandas.
CONNECTION_STRING_OLEDB = "PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE={};Jet OLEDB:System Database={};"
ACCESS_QUERY_META_INFO_CREATE = """SELECT MSysObjects.Name
, MSysQueries.Attribute
, MSysQueries.Expression
, MSysQueries.Flag
, MSysQueries.Name1
, MSysQueries.Name2
INTO QueryMetaInfo
FROM MSysObjects INNER JOIN MSysQueries ON MSysObjects.Id = MSysQueries.ObjectId
order by MSysObjects.Name
, MSysQueries.Attribute
, MSysQueries.Expression
, MSysQueries.Flag
, MSysQueries.Name1
, MSysQueries.Name2"""
ACCESS_QUERY_META_INFO_READ = """select * from QueryMetaInfo
order by Name
, Attribute
, Expression
, Flag
, Name1
, Name2;"""
ACCESS_QUERY_META_INFO_DROP = "DROP TABLE QueryMetaInfo"
connection = win32com.client.Dispatch(r'ADODB.Connection')
DSN = CONNECTION_STRING_OLEDB.format(database_path, r"C:\Users\C218\AppData\Roaming\Microsoft\Access\System.mdw")
connection.Open(DSN)
cmd = win32com.client.Dispatch(r'ADODB.Command')
cmd.ActiveConnection = connection
cmd.CommandText = "GRANT SELECT ON MSysObjects TO Admin;"
cmd.Execute()
connection.Execute(ACCESS_QUERY_META_INFO_CREATE)
connection.Close()
# connect with odbc to read the query meta info into pandas
connection_string = Constants.CONNECTION_STRING_ACCESS.format(database_path)
access_con = pyodbc.connect(connection_string)
access_cursor = access_con.cursor()
df = pd.read_sql(ACCESS_QUERY_META_INFO_READ, access_con)
# drop table after read
access_cursor.execute(ACCESS_QUERY_META_INFO_DROP)
access_cursor.commit()
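Once the metadata rows from both databases are in memory, comparing them as per-query sets avoids any dependence on row order. The following is a minimal sketch, assuming each row is a tuple of (Name, Attribute, Expression, Flag, Name1, Name2) matching the columns selected above. The function names `group_rows_by_query` and `diff_query_meta` and the sample rows are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def group_rows_by_query(rows):
    """Map each query name to the set of its remaining metadata columns."""
    grouped = defaultdict(set)
    for name, *rest in rows:
        grouped[name].add(tuple(rest))
    return dict(grouped)


def diff_query_meta(old_rows, new_rows):
    """Return (added, removed, changed) query names, ignoring row order."""
    old = group_rows_by_query(old_rows)
    new = group_rows_by_query(new_rows)
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(n for n in set(old) & set(new) if old[n] != new[n])
    return added, removed, changed


# Made-up sample rows: qryA has the same join conditions in a different order.
old_rows = [
    ("qryA", 7, "Table1.ColumnA = Table2.ColumnA", 1, "Table1", "Table2"),
    ("qryA", 7, "Table1.ColumnB = Table2.ColumnB", 1, "Table1", "Table2"),
    ("qryB", 6, "ColumnA", 0, None, None),
]
new_rows = [
    ("qryA", 7, "Table1.ColumnB = Table2.ColumnB", 1, "Table1", "Table2"),
    ("qryA", 7, "Table1.ColumnA = Table2.ColumnA", 1, "Table1", "Table2"),
    ("qryC", 6, "ColumnA", 0, None, None),
]
result = diff_query_meta(old_rows, new_rows)
print(result)
```

Because sets are used, duplicate metadata rows within one query collapse into one; if multiplicity matters, `collections.Counter` could replace the set. With a pandas DataFrame, the rows can probably be obtained via `df.itertuples(index=False, name=None)`, though missing values may need normalizing first.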

Related

Executing a postgresql query with plpg-sql from sqlalchemy

I can't find examples of using PL/pgSQL in raw SQL executed by SQLAlchemy. These were the closest, but neither uses PL/pgSQL:
How to execute raw SQL in Flask-SQLAlchemy app
how to set autocommit = 1 in a sqlalchemy.engine.Connection
I've done research and I'm not sure if this is possible. I'm trying to either INSERT or UPDATE a record and there is no error. It must fail silently because there's no record created/updated in the database and I've explicitly set AutoCommit=True.
Python:
engine = db.create_engine(connstr, pool_size=20, max_overflow=0)
Session = scoped_session(sessionmaker(bind=engine, autocommit=True))
s = Session()
query = """DO $$
declare
ppllastActivity date;
percComplete numeric;
begin
select lastactivity into ppllastActivity FROM feeds WHERE email = :e and courseName=:c and provider = :prov;
IF COALESCE (ppllastActivity, '1900-01-01') = '1900-01-01' THEN
INSERT INTO feeds (email, courseName, completedratio, lastActivity, provider) VALUES (:e, :c, :p, :l, :prov);
ELSEIF ppllastActivity < :l THEN
UPDATE feeds set completedratio = :p,lastActivity = :l WHERE email = :e and courseName = :c and provider = :prov;
END if;
end; $$"""
params = {'e' : item.get('email').replace("'", "''").lower(), 'c' : item.get('courseName').replace("'", "''"), 'p' : item.get('progress'), 'l' : item.get('lastActivity'),'prov' : "ACG" }
result = s.execute(text(query),params)
I'm unable to troubleshoot since it doesn't give me any errors. Am I going down the wrong path? Should I just use psql.exe, or can you run PL/pgSQL in raw SQL with SQLAlchemy?
While typing this question up I found a solution (or a bug).
autocommit=True doesn't work; you have to begin a transaction:
with s.begin():
    result = s.execute(text(query),params)

Extract domain from a link in Python and using SQL

I have a database that I connect to from Python, running SQL statements in the following way:
import ibm_db
conn = ibm_db.connect("DATABASE=ABCD;HOSTNAME=dsomehostname.net;PORT=50001;PROTOCOL=TCPIP;UID=User1_id;PWD=Password; Security = SSL; ConnectTimeout = 30; sslConnection=TRUE","","")
connState = ibm_db.active(conn)
print(connState)
import ibm_db_dbi
# con = ibm_db_dbi.Connection(conn)
sql = "SELECT emails from Database1.Table1 WHERE TIMESTAMP>'2020-08-20' GROUP BY emails; "
stmt = ibm_db.exec_immediate(conn, sql)
dictionary = ibm_db.fetch_both(stmt)
It is giving me emails in the following way :
**https://abc**.ind.analytics.google.com/bs/?perspective=story
**https://abc**.ind.analytics.google.com/bs/
**https://tmb**.ind.analytics.google.com/bs/?perspective=ca-modeller
**https://fgt**.ind.analytics.google.com/bs/?perspective=explore
(null)
**https://abc**.ind.analytics.google.com/bs/?perspective=home
(null)
**https://col**.ind.analytics.google.com/bs/?perspective=classicviewer
**https://prod**.ind.analytics.google.com/bs/
(null)
**https://fcv**.ind.analytics.google.com/bs/?perspective=home
**https://prod**.health-analytics.something-else.com/bs/
(null)
**https://fcv**.health-analytics.something-else?perspective=home
I only want the bold part, i.e. the part before "ind.analytics.google.com/bs/......" and NOT the part before "health-analytics.something-else.com/bs/...":
https://abc
https://tmb
https://fgt
Is there a way I can include a regex in this and run the query? It would be great if someone could help me out with that.
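You can iterate through the fetched rows and collect the part you want into a list.
emailList = []
row = dictionary  # first row, already fetched above
while row:
    email = row[0]
    if email:  # skip NULL values
        emailList.append(email.split('.')[0])  # the wanted prefix is everything before the first dot
    row = ibm_db.fetch_both(stmt)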
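If only the prefixes of the ind.analytics.google.com URLs should be kept, a regular expression applied on the Python side is one option. The sketch below assumes the values look like the samples in the question; the name `extract_prefix` and the exact pattern are illustrative choices, not something from the original post.

```python
import re

# Assumed host suffix, taken from the sample URLs in the question.
PATTERN = re.compile(r"^(https?://[^./]+)\.ind\.analytics\.google\.com(?:/|$)")


def extract_prefix(url):
    """Return the scheme plus first host label, or None if the URL does not match."""
    if not url:
        return None  # NULL values from the database arrive as None
    match = PATTERN.match(url)
    return match.group(1) if match else None


urls = [
    "https://abc.ind.analytics.google.com/bs/?perspective=story",
    "https://tmb.ind.analytics.google.com/bs/?perspective=ca-modeller",
    None,
    "https://prod.health-analytics.something-else.com/bs/",
]
# Deduplicate and sort the prefixes that matched the wanted host.
prefixes = sorted({p for p in map(extract_prefix, urls) if p})
print(prefixes)
```

Filtering in Python keeps the SQL unchanged; whether the same filter can be pushed into the query depends on the regex support of the database in use.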

Python call sql-server stored procedure with table valued parameter

I have a Python script that loads, transforms and calculates data. In SQL Server there's a stored procedure that requires a table-valued parameter, 2 required parameters and 2 optional parameters. In SQL Server I can call this SP like so:
USE [InstName]
GO
DECLARE @return_value int
DECLARE @MergeOnColumn core.MatchColumnTable
INSERT INTO @MergeOnColumn
SELECT 'foo.ExternalInput','bar.ExternalInput'
EXEC @return_value = [core].[_TableData]
@Target = N'[dbname].[tablename1]',
@Source = N'[dbname].[table2]',
@MergeOnColumn = @MergeOnColumn,
@Opt1Param = False,
@Opt2Param = False
SELECT 'Return Value' = @return_value
GO
After a comprehensive search I found the following post:
How to call stored procedure with SQLAlchemy that requires a user-defined-type Table parameter
It suggests using pytds and the SQLAlchemy dialect sqlalchemy-pytds to call an SP with table-valued parameters.
With this post and the documentation I created the following Python script:
import pandas as pd
import pytds
from pytds import login
import sqlalchemy as sa
from sqlalchemy import create_engine
import sqlalchemy_pytds
def connect():
    return pytds.connect(dsn='ServerName',database='DBName', auth=login.SspiAuth())
engine = sa.create_engine('mssql+pytds://[ServerName]', creator=connect)
conn = engine.raw_connection()
with conn.cursor() as cur:
    arg = ("foo.ExternalInput","bar.ExternalInput")
    tvp = pytds.TableValuedParam(type_name="MergeOnColumn", rows=(arg))
    cur.execute('EXEC test_proc %s', ("[dbname].[table2]", "[dbname].[table1]", tvp,))
    cur.fetchall()
When I run this code I get the following error message:
TypeError: not all arguments converted during string formatting
Does anyone know how to pass in the multiple arguments correctly, or have a suggestion for how I could call this SP directly?
Based on the comments on my question, I've managed to get the stored procedure running with table-valued parameters (and to get the return values from the SP).
The final script is as follows:
import pandas as pd
import pytds
from pytds import login
import sqlalchemy as sa
from sqlalchemy import create_engine
import sqlalchemy_pytds
def connect():
    return pytds.connect(dsn='ServerName',database='DBName',autocommit=True, auth=login.SspiAuth())
engine = sa.create_engine('mssql+pytds://[ServerName]', creator=connect)
conn = engine.raw_connection()
with conn.cursor() as cur:
    arg = [["foo.ExternalInput","bar.ExternalInput"]]
    tvp = pytds.TableValuedParam(type_name="core.MatchColumnTable", rows=arg)
    cur.execute("EXEC test_proc @Target = N'[dbname].[tablename1]', @Source = N'[dbname].[table2]', @CleanTarget = 0, @UseColumnsFromTarget = 0, @MergeOnColumn = %s", (tvp,))
    result = cur.fetchall()
    print(result)
autocommit is set on the connection (to commit the transaction opened by the cursor). The table-valued parameter type (core.MatchColumnTable) expects 2 columns, so arg is modified to supply 2 columns per row.
The parameters required besides the TVP are included in the EXEC string. The last parameter in the EXEC string is the name of the TVP parameter (@MergeOnColumn), which is filled with the tvp value.
Optionally, you can retrieve the result status or row count as described in the pytds documentation:
https://python-tds.readthedocs.io/en/latest/index.html
Note: in the stored procedure, make sure that
SET NOCOUNT ON is added; otherwise you won't get any results back in Python.
pytds
Python DBAPI driver for MSSQL using pure Python TDS (Tabular Data Stream) protocol implementation
I used pytds for merge / upsert via a stored procedure targeting a SQL Server.
Example
Here is an example of the basic functions; each row of data is represented by a tuple:
import pytds
def get_connection(instance: str, database: str, user: str, password: str):
    return pytds.connect(
        dsn=instance, database=database, user=user, password=password, autocommit=True
    )
def execute_with_tvp(connection: pytds.Connection, procedure_name: str, type_name: str, rows: list):
    with connection.cursor() as cursor:
        # type_name is the name of the SQL Server table type behind the parameter
        tvp = pytds.TableValuedParam(type_name=type_name, rows=rows)
        # callproc expects a sequence of parameters
        cursor.callproc(procedure_name, (tvp,))
mssql+pyodbc://
pyodbc added support for table-valued parameters (TVPs) in version 4.0.25, released 2018-12-13. Simply supply the TVP value as a list of tuples:
proc_name = "so51930062"
type_name = proc_name + "Type"
# set up test environment
with engine.begin() as conn:
    conn.exec_driver_sql(f"""\
    DROP PROCEDURE IF EXISTS {proc_name}
    """)
    conn.exec_driver_sql(f"""\
    DROP TYPE IF EXISTS {type_name}
    """)
    conn.exec_driver_sql(f"""\
    CREATE TYPE {type_name} AS TABLE (
        id int,
        txt nvarchar(50)
    )
    """)
    conn.exec_driver_sql(f"""\
    CREATE PROCEDURE {proc_name}
    @prefix nvarchar(10),
    @tvp {type_name} READONLY
    AS
    BEGIN
        SET NOCOUNT ON;
        SELECT id, @prefix + txt AS new_txt FROM @tvp;
    END
    """)
# run test
with engine.begin() as conn:
    data = {"prefix": "new_", "tvp": [(1, "foo"), (2, "bar")]}
    sql = f"{{CALL {proc_name} (:prefix, :tvp)}}"
    print(conn.execute(sa.text(sql), data).fetchall())
    # [(1, 'new_foo'), (2, 'new_bar')]

Using Boto3 to interact with amazon Aurora on RDS

I have set up a database in Amazon RDS using Amazon Aurora and would like to interact with the database using Python - the obvious choice is to use Boto.
However, their documentation is awful and does not cover ways in which I can interact with the database to:
Run queries with SQL statements
Interact with the tables in the database
etc
Does anyone have any links to some examples/tutorials, or know how to do these tasks?
When using Amazon RDS offerings (including Aurora), you don't connect to the database via any AWS API (including Boto). Instead you would use the native client of your chosen database. In the case of Aurora, you would connect using the MySQL Command Line client. From there, you can query it just like any other MySQL database.
There's a brief section of the "Getting Started" documentation that talks about connecting to your Aurora database:
Connecting to an Amazon Aurora DB Cluster
Here are a couple examples:
INSERT example:
import boto3
sql = """
INSERT INTO YOUR_TABLE_NAME_HERE
(
your_column_name_1
,your_column_name_2
,your_column_name_3)
VALUES(
:your_param_1_name
,:your_param_2_name
,:your_param_3_name)
"""
param1 = {'name':'your_param_1_name', 'value':{'longValue': 5}}
param2 = {'name':'your_param_2_name', 'value':{'longValue': 63}}
param3 = {'name':'your_param_3_name', 'value':{'stringValue': 'para bailar la bamba'}}
param_set = [param1, param2, param3]
db_clust_arn = 'your_db_cluster_arn_here'
db_secret_arn = 'your_db_secret_arn_here'
rds_data = boto3.client('rds-data')
response = rds_data.execute_statement(
resourceArn = db_clust_arn,
secretArn = db_secret_arn,
database = 'your_database_name_here',
sql = sql,
parameters = param_set)
print(str(response))
READ example:
import boto3
rds_data = boto3.client('rds-data')
db_clust_arn = 'your_db_cluster_arn_here'
db_secret_arn = 'your_db_secret_arn_here'
employee_id = 35853
get_vacation_days_sql = f"""
select vacation_days_remaining
from employees_tbl
where employee_id = {employee_id}
"""
response1 = rds_data.execute_statement(
resourceArn = db_clust_arn,
secretArn = db_secret_arn,
database = 'your_database_name_here',
sql = get_vacation_days_sql)
#recs is a list (of rows returned from Db)
recs = response1['records']
print(f"recs === {recs}")
#recs === [[{'longValue': 57}]]
#single_row is a list of dictionaries, where each dictionary represents a
#column from that single row
for single_row in recs:
    print(f"single_row === {single_row}")
    #single_row === [{'longValue': 57}]
    #one_dict is a dictionary with one key value pair
    #where the key is the data type of the column and the
    #value is the value of the column
    #each additional column is another dictionary
    for single_column_dict in single_row:
        print(f"one_dict === {single_column_dict}")
        # one_dict === {'longValue': 57}
        vacation_days_remaining = single_column_dict['longValue']
        print(f'vacation days remaining === {vacation_days_remaining}')
Source Link:
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html#data-api.calling.python
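The nested `records` structure walked through in the READ example can be flattened into plain Python rows with a small helper. This is a sketch under the assumption that each field is a dict with a single type-tagged entry (for example `longValue` or `stringValue`) or `isNull` set to true for SQL NULL; the name `flatten_records` and the sample data are hypothetical, and no AWS call is made here.

```python
def flatten_records(records):
    """Convert Data API style records into a list of plain Python tuples."""
    rows = []
    for record in records:
        values = []
        for field in record:
            if field.get("isNull"):
                # SQL NULL is reported as {'isNull': True}
                values.append(None)
            else:
                # Each field dict holds one type-tagged entry, e.g. {'longValue': 57}
                values.append(next(iter(field.values())))
        rows.append(tuple(values))
    return rows


# Made-up sample shaped like response['records'] with two columns.
sample = [
    [{"longValue": 57}, {"stringValue": "Ana"}],
    [{"isNull": True}, {"stringValue": "Bo"}],
]
rows = flatten_records(sample)
print(rows)
```

In the READ example this would be called as `flatten_records(response1['records'])`, which avoids hard-coding `longValue` for every column.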

getting only updated data from database

I have to get only the recently added data from a database. To do this, I save the last read row number in a Python shelve. The following code works for a simple query like select * from rows. My code is:
from pyodbc import connect
from peewee import *
import random
import shelve
import connection
d = shelve.open("data.shelve")
db = SqliteDatabase("data.db")
class Rows(Model):
    valueone = IntegerField()
    valuetwo = IntegerField()
    class Meta:
        database = db
def CreateAndPopulate():
    db.connect()
    db.create_tables([Rows],safe=True)
    with db.atomic():
        for i in range(100):
            row = Rows(valueone=random.randrange(0,100),valuetwo=random.randrange(0,100))
            row.save()
    db.close()
def get_last_primary_key():
    return d.get('max_row',0)
def doWork():
    query = "select * from rows" #could be anything
    conn = connection.Connection("localhost","","SQLite3 ODBC Driver","data.db","","")
    max_key_query = "SELECT MAX(%s) from %s" % ("id", "rows")
    max_primary_key = conn.fetch_one(max_key_query)[0]
    print("max_primary_key " + str(max_primary_key))
    last_primary_key = get_last_primary_key()
    print("last_primary_key " + str(last_primary_key))
    if max_primary_key == last_primary_key:
        print("no new records")
    elif max_primary_key > last_primary_key:
        print("There are some datas")
        optimizedQuery = query + " where id>" + str(last_primary_key)
        print(query)
        for data in conn.fetch_all(optimizedQuery):
            print(data)
        d['max_row'] = max_primary_key
        # print d['max_row']
# CreateAndPopulate() # to populate data
doWork()
The code works for a simple query without a where clause, but the query can be anything from simple to complex, with joins and multiple where clauses. In that case, the part where I append a where clause will fail. How can I get only the newly added data from the database, whatever the query is?
PS: I cannot modify database. I just have to fetch from it.
Use an OFFSET clause. For example:
SELECT * FROM [....] WHERE [....] LIMIT -1 OFFSET 1000
In your query, replace 1000 with a parameter bound to your shelve variable. That will skip the top "shelve" number of rows and only grab newer ones. You may want to consider a more robust refactor eventually, but good luck.
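The OFFSET idea can be applied to an arbitrary SELECT by wrapping it as a subquery. Below is a minimal sketch using the standard-library sqlite3 module, assuming the wrapped query returns rows in a stable order and that rows are only ever appended. The helper name `fetch_new_rows` and the table `items` are hypothetical, and the stored offset would come from the shelve in the question.

```python
import sqlite3


def fetch_new_rows(conn, query, rows_already_read):
    """Wrap an arbitrary SELECT and skip the rows that were read previously."""
    # LIMIT -1 means "no limit" in SQLite; OFFSET is bound as a parameter.
    wrapped = f"SELECT * FROM ({query}) LIMIT -1 OFFSET ?"
    return conn.execute(wrapped, (rows_already_read,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, valueone INTEGER)")
conn.executemany("INSERT INTO items (valueone) VALUES (?)", [(v,) for v in range(5)])

# Any query works as long as its row order is stable between runs.
query = "SELECT id, valueone FROM items WHERE valueone >= 1 ORDER BY id"
first_batch = fetch_new_rows(conn, query, 0)

# A new row arrives; only that row should come back on the next read.
conn.execute("INSERT INTO items (valueone) VALUES (10)")
second_batch = fetch_new_rows(conn, query, len(first_batch))
print(second_batch)
```

The count to store after each read is the previous offset plus the number of rows just fetched. If rows can be deleted or updated, or the query has no deterministic ordering, this approach will skip or repeat rows.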
