Using Boto3 to interact with Amazon Aurora on RDS - Python

I have set up a database in Amazon RDS using Amazon Aurora and would like to interact with the database using Python - the obvious choice is to use Boto.
However, their documentation is awful and does not cover ways in which I can interact with the database to:
Run queries with SQL statements
Interact with the tables in the database
etc
Does anyone have any links to some examples/tutorials, or know how to do these tasks?

When using Amazon RDS offerings (including Aurora), you don't connect to the database via any AWS API (including Boto). Instead you would use the native client of your chosen database. In the case of Aurora, you would connect using the MySQL Command Line client. From there, you can query it just like any other MySQL database.
There's a brief section of the "Getting Started" documentation that talks about connecting to your Aurora database:
Connecting to an Amazon Aurora DB Cluster
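For example, here is a minimal connection sketch using the PyMySQL driver; the endpoint, credentials, database, and table names below are placeholders, not values from the question:
import pymysql

# Connect to the Aurora cluster endpoint with a regular MySQL driver
conn = pymysql.connect(
    host='your-cluster-name.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com',
    user='your_username',
    password='your_password',
    database='your_database_name',
)

with conn.cursor() as cur:
    cur.execute("SELECT * FROM your_table LIMIT 10")
    for row in cur.fetchall():
        print(row)

conn.close()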

Here are a couple of examples that use the RDS Data API via boto3's rds-data client:
INSERT example:
import boto3

sql = """
      INSERT INTO YOUR_TABLE_NAME_HERE
      (
        your_column_name_1
        ,your_column_name_2
        ,your_column_name_3
      )
      VALUES (
        :your_param_1_name
        ,:your_param_2_name
        ,:your_param_3_name
      )
      """

param1 = {'name': 'your_param_1_name', 'value': {'longValue': 5}}
param2 = {'name': 'your_param_2_name', 'value': {'longValue': 63}}
param3 = {'name': 'your_param_3_name', 'value': {'stringValue': 'para bailar la bamba'}}
param_set = [param1, param2, param3]

db_clust_arn = 'your_db_cluster_arn_here'
db_secret_arn = 'your_db_secret_arn_here'

rds_data = boto3.client('rds-data')

response = rds_data.execute_statement(
    resourceArn=db_clust_arn,
    secretArn=db_secret_arn,
    database='your_database_name_here',
    sql=sql,
    parameters=param_set)

print(str(response))
READ example:
import boto3

rds_data = boto3.client('rds-data')

db_clust_arn = 'your_db_cluster_arn_here'
db_secret_arn = 'your_db_secret_arn_here'

employee_id = 35853
get_vacation_days_sql = f"""
    select vacation_days_remaining
    from employees_tbl
    where employee_id = {employee_id}
    """

response1 = rds_data.execute_statement(
    resourceArn=db_clust_arn,
    secretArn=db_secret_arn,
    database='your_database_name_here',
    sql=get_vacation_days_sql)

# recs is a list (of rows returned from the DB)
recs = response1['records']
print(f"recs === {recs}")
# recs === [[{'longValue': 57}]]

# single_row is a list of dictionaries, where each dictionary represents a
# column from that single row
for single_row in recs:
    print(f"single_row === {single_row}")
    # single_row === [{'longValue': 57}]

    # one_dict is a dictionary with one key-value pair,
    # where the key is the data type of the column and the
    # value is the value of the column;
    # each additional column is another dictionary
    for single_column_dict in single_row:
        print(f"one_dict === {single_column_dict}")
        # one_dict === {'longValue': 57}

        vacation_days_remaining = single_column_dict['longValue']
        print(f'vacation days remaining === {vacation_days_remaining}')
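As a side note, the f-string interpolation of employee_id works, but the Data API also accepts named parameters (the same mechanism used in the INSERT example above), which keeps the SQL and the values separate. A minimal variant of the read, using the same assumed table and column names:
get_vacation_days_sql = """
    select vacation_days_remaining
    from employees_tbl
    where employee_id = :employee_id
    """

response1 = rds_data.execute_statement(
    resourceArn=db_clust_arn,
    secretArn=db_secret_arn,
    database='your_database_name_here',
    sql=get_vacation_days_sql,
    parameters=[{'name': 'employee_id', 'value': {'longValue': 35853}}])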
Source Link:
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html#data-api.calling.python

Related

How to use Bulk insert to insert data from Dataframe to SQL Server table?

I'm new to Python, so I'm reaching out for help. I have a CSV file in an S3 bucket, and I would like to use Python's pyodbc to import this CSV file into a table in SQL Server. The file is 50 MB (400k records). My code is below. As the code shows, my CSV data is in a DataFrame; how can I use BULK INSERT to insert the DataFrame data into the SQL Server table? If my approach does not work, please advise me on a different approach.
# Connection to S3
s3 = boto3.client(
    service_name='s3',
    region_name='us-gov-west-1',
    aws_access_key_id='ZZZZZZZZZZZZZZZZZZ',
    aws_secret_access_key='AAAAAAAAAAAAAAAAA')

# Connection to SQL Server
server = 'myserver.amazonaws.com'
path = 'folder1/folder2/folder3/myCSVFile.csv'
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE=DB-staging;UID=User132;PWD=XXXXXX')
cursor = cnxn.cursor()

obj_sum = s3.get_object(Bucket='my_bucket', Key=path)
csv_data = pd.read_csv(obj_sum['Body'])
df = pd.DataFrame(csv_data, columns=['SYSTEM_NAME', 'BUCKET_NAME', 'LOCATION', 'FILE_NAME', 'LAST_MOD_DATE', 'FILE_SIZE'])
# print(df.head(n=15).to_string(index=False))

# Insert DataFrame to table
cursor.execute("""truncate table dbo.table1""")
cursor.execute("""BULK INSERT dbo.table1 FROM """ + .....# what do I put here since data is in dataframe??)
I tried to loop through the dataframe and it took 20 minutes to insert 5k records. Code below. Looping through each record is an option but a poor one. This is why I'm moving towards bulk insert if possible.
for i in df.itertuples(index=False):
    if i.FILE_SIZE != 0:
        cursor.execute("""insert into dbo.table1 (SYSTEM_NAME, BUCKET_NAME, X_LOCATION, FILE_NAME, LAST_MOD_DATE, FILE_SIZE)
                          values (?,?,?,?,?,?)""",
                       i.SYSTEM_NAME, i.BUCKET_NAME, i.LOCATION, i.FILE_NAME, i.LAST_MOD_DATE, i.FILE_SIZE)
Lastly, a bonus question: I would like to check if the "FILE_SIZE" column in my DataFrame equals 0; if it does, skip that record and move on to the next one.
Thank you in advance.
Thanks for the help.
Using fast_executemany=True did the job for me.
import sqlalchemy as sal

engine = sal.create_engine(
    "mssql+pyodbc://username:password@" + server + ":1433/db-name?driver=ODBC+Driver+17+for+SQL+Server&Trusted_Connection=yes",
    fast_executemany=True)
conn = engine.connect()
I had to change my code around to use sqlalchemy, but it's working great now.
The call to upload the DataFrame to SQL Server is below (the first argument is the target table name):
df.to_sql('table1', con=engine, index=False, if_exists='replace')
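For the bonus question, the FILE_SIZE filter can be applied to the DataFrame before the upload. Below is a minimal sketch of the whole flow, assuming the bucket, key, credentials, and table names from the question; treat it as an illustration rather than the exact code:
import boto3
import pandas as pd
import sqlalchemy as sal

# Read the CSV straight from S3 (bucket/key are the ones from the question)
s3 = boto3.client('s3')
obj = s3.get_object(Bucket='my_bucket', Key='folder1/folder2/folder3/myCSVFile.csv')
df = pd.read_csv(obj['Body'])

# Bonus question: drop rows where FILE_SIZE is 0 before inserting
df = df[df['FILE_SIZE'] != 0]

# fast_executemany batches the parameterized INSERTs on the driver side
engine = sal.create_engine(
    "mssql+pyodbc://User132:XXXXXX@myserver.amazonaws.com:1433/DB-staging"
    "?driver=ODBC+Driver+17+for+SQL+Server",
    fast_executemany=True)

df.to_sql('table1', con=engine, schema='dbo', index=False, if_exists='replace')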

Extract domain from a link in Python and using SQL

I have a database to which I am connecting using Python, and I am running the SQL statements in the following way.
import ibm_db
conn = ibm_db.connect("DATABASE=ABCD;HOSTNAME=dsomehostname.net;PORT=50001;PROTOCOL=TCPIP;UID=User1_id;PWD=Password; Security = SSL; ConnectTimeout = 30; sslConnection=TRUE","","")
connState = ibm_db.active(conn)
print(connState)
import ibm_db_dbi
# con = ibm_db_dbi.Connection(conn)
sql = "SELECT emails from Database1.Table1 WHERE TIMESTAMP>'2020-08-20' GROUP BY emails; "
stmt = ibm_db.exec_immediate(conn, sql)
dictionary = ibm_db.fetch_both(stmt)
It is giving me emails in the following way:
**https://abc**.ind.analytics.google.com/bs/?perspective=story
**https://abc**.ind.analytics.google.com/bs/
**https://tmb**.ind.analytics.google.com/bs/?perspective=ca-modeller
**https://fgt**.ind.analytics.google.com/bs/?perspective=explore
(null)
**https://abc**.ind.analytics.google.com/bs/?perspective=home
(null)
**https://col**.ind.analytics.google.com/bs/?perspective=classicviewer
**https://prod**.ind.analytics.google.com/bs/
(null)
**https://fcv**.ind.analytics.google.com/bs/?perspective=home
**https://prod**.health-analytics.something-else.com/bs/
(null)
**https://fcv**.health-analytics.something-else?perspective=home
I only want the bold part, i.e. I only want the part before "ind.analytics.google.com/bs/..." and NOT the part before "health-analytics.something-else.com/bs/...":
https://abc
https://tmb
https://fgt
Is there a way I can include a regex in this and fire the query? It would be great if someone could help me out with that.
You can iterate through the dictionary and collect the data you want into a list.
emailList = []
for email in dictionary:
    domain = email.split('.')[0]  # the part before the first '.' is the piece we want
    emailList.append(domain)
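If you also need to keep only the "ind.analytics.google.com" links and skip the (null) rows, a sketch using urllib.parse is shown below; it assumes you have collected the email/URL strings (or None for null rows) into a list first:
from urllib.parse import urlparse

def extract_prefixes(emails):
    prefixes = []
    for email in emails:
        if not email:  # skip (null) / None rows
            continue
        parsed = urlparse(email)
        host = parsed.hostname or ''
        # keep only the *.ind.analytics.google.com links
        if host.endswith('.ind.analytics.google.com'):
            subdomain = host.split('.')[0]
            prefixes.append(f'{parsed.scheme}://{subdomain}')
    return prefixes

# extract_prefixes(['https://abc.ind.analytics.google.com/bs/?perspective=story'])
# -> ['https://abc']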

Python call sql-server stored procedure with table valued parameter

I have a Python script that loads, transforms, and calculates data. In SQL Server there's a stored procedure that requires a table-valued parameter, 2 required parameters, and 2 optional parameters. In SQL Server I can call this SP like so:
USE [InstName]
GO

DECLARE @return_value int
DECLARE @MergeOnColumn core.MatchColumnTable

INSERT INTO @MergeOnColumn
SELECT 'foo.ExternalInput', 'bar.ExternalInput'

EXEC @return_value = [core].[_TableData]
    @Target = N'[dbname].[tablename1]',
    @Source = N'[dbname].[table2]',
    @MergeOnColumn = @MergeOnColumn,
    @Opt1Param = False,
    @Opt2Param = False

SELECT 'Return Value' = @return_value
GO
After a comprehensive search I found the following post:
How to call stored procedure with SQLAlchemy that requires a user-defined-type Table parameter
It suggests using pytds and the SQLAlchemy dialect sqlalchemy-pytds to call an SP with table-valued parameters.
With this post and the documentation I created the following Python script:
import pandas as pd
import pytds
from pytds import login
import sqlalchemy as sa
from sqlalchemy import create_engine
import sqlalchemy_pytds

def connect():
    return pytds.connect(dsn='ServerName', database='DBName', auth=login.SspiAuth())

engine = sa.create_engine('mssql+pytds://[ServerName]', creator=connect)
conn = engine.raw_connection()
with conn.cursor() as cur:
    arg = ("foo.ExternalInput", "bar.ExternalInput")
    tvp = pytds.TableValuedParam(type_name="MergeOnColumn", rows=(arg))
    cur.execute('EXEC test_proc %s', ("[dbname].[table2]", "[dbname].[table1]", tvp,))
    cur.fetchall()
When I run this code I get the following error message:
TypeError: not all arguments converted during string formatting
Does anyone know how to pass in the multiple arguments correctly, or have a suggestion for how I could handle this SP call directly?
On the basis of the comments to my question, I've managed to get the stored procedure running with table-valued parameters (and to get the return values from the SP).
The final script is as follows:
import pandas as pd
import pytds
from pytds import login
import sqlalchemy as sa
from sqlalchemy import create_engine
import sqlalchemy_pytds

def connect():
    return pytds.connect(dsn='ServerName', database='DBName', autocommit=True, auth=login.SspiAuth())

engine = sa.create_engine('mssql+pytds://[ServerName]', creator=connect)
conn = engine.raw_connection()
with conn.cursor() as cur:
    arg = [["foo.ExternalInput", "bar.ExternalInput"]]
    tvp = pytds.TableValuedParam(type_name="core.MatchColumnTable", rows=arg)
    cur.execute("EXEC test_proc @Target = N'[dbname].[tablename1]', @Source = N'[dbname].[table2]', @CleanTarget = 0, @UseColumnsFromTarget = 0, @MergeOnColumn = %s", (tvp,))
    result = cur.fetchall()

print(result)
The autocommit is added in the connection (to commit the transaction in the cursor). The table-valued parameter (MatchColumnTable) expects 2 columns, so the arg is modified to fit 2 columns.
The parameters that are required besides the TVP are included in the EXEC string. The last parameter in the execute string is the name of the TVP parameter (MergeOnColumn), which is filled with the TVP.
Optionally, you can add the result status or row count as described in the pytds documentation:
https://python-tds.readthedocs.io/en/latest/index.html
Note: in the stored procedure you have to make sure that SET NOCOUNT ON is added, otherwise you won't get any results back in Python.
pytds
Python DBAPI driver for MSSQL using pure Python TDS (Tabular Data Stream) protocol implementation
I used pytds for merge / upsert via a stored procedure targeting a SQL Server.
Example
Here is an example of the basic functions; a row of data is represented by a tuple:
def get_connection(instance: str, database: str, user: str, password: str):
    return pytds.connect(
        dsn=instance, database=database, user=user, password=password, autocommit=True
    )

def execute_with_tvp(connection: pytds.Connection, procedure_name: str, rows: list):
    with connection.cursor() as cursor:
        # my_type is the name of the user-defined table type the procedure expects
        tvp = pytds.TableValuedParam(type_name=my_type, rows=rows)
        cursor.callproc(procedure_name, (tvp,))
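A hypothetical usage of the two helpers above; the server, database, credentials, and procedure name are placeholders (the type name is borrowed from the earlier answer), not values from this answer:
my_type = "core.MatchColumnTable"  # user-defined table type expected by the procedure

conn = get_connection("ServerName", "DBName", "username", "password")

# each tuple is one row of the two-column table type
rows = [("foo.ExternalInput", "bar.ExternalInput")]

execute_with_tvp(conn, "my_procedure_name", rows)
conn.close()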
mssql+pyodbc://
pyodbc added support for table-valued parameters (TVPs) in version 4.0.25, released 2018-12-13. Simply supply the TVP value as a list of tuples:
import sqlalchemy as sa

# assumes an existing engine, e.g. engine = sa.create_engine("mssql+pyodbc://...")

proc_name = "so51930062"
type_name = proc_name + "Type"

# set up test environment
with engine.begin() as conn:
    conn.exec_driver_sql(f"""\
DROP PROCEDURE IF EXISTS {proc_name}
""")
    conn.exec_driver_sql(f"""\
DROP TYPE IF EXISTS {type_name}
""")
    conn.exec_driver_sql(f"""\
CREATE TYPE {type_name} AS TABLE (
    id int,
    txt nvarchar(50)
)
""")
    conn.exec_driver_sql(f"""\
CREATE PROCEDURE {proc_name}
    @prefix nvarchar(10),
    @tvp {type_name} READONLY
AS
BEGIN
    SET NOCOUNT ON;
    SELECT id, @prefix + txt AS new_txt FROM @tvp;
END
""")

# run test
with engine.begin() as conn:
    data = {"prefix": "new_", "tvp": [(1, "foo"), (2, "bar")]}
    sql = f"{{CALL {proc_name} (:prefix, :tvp)}}"
    print(conn.execute(sa.text(sql), data).fetchall())
    # [(1, 'new_foo'), (2, 'new_bar')]

Number of entries in SAP R/3 table using Pyrfc

How do you use the Pyrfc Python library to query the number of entries in an SAP R/3 database table?
I know of three methods to do this using Pyrfc. Modify the following example with your SAP R/3 server connection settings and desired table name:
from pyrfc import Connection

params = dict(
    ashost="1.1.1.1",
    sysnr="1",
    client="100",
    user="username",
    passwd="password",
)

table = "MKAL"

with Connection(**params) as conn:
    # Method 1
    result = conn.call("RFC_GET_TABLE_ENTRIES", TABLE_NAME=table, MAX_ENTRIES=1)
    entries = result["NUMBER_OF_ENTRIES"]

    # Method 2
    result = conn.call("EM_GET_NUMBER_OF_ENTRIES", IT_TABLES=[{"TABNAME": table}])
    entries = result["IT_TABLES"][0]["TABROWS"]

    # Method 3
    short_field = "MANDT"  # table field with short data length
    result = conn.call(
        "RFC_READ_TABLE",
        QUERY_TABLE=table,
        ROWCOUNT=0,
        FIELDS=short_field,
    )
    entries = len(result["DATA"])  # DATA holds one entry per returned row

CosmosDB and Python3: how to query?

I am using CosmosDB (Azure documentDB) in my project, written in Python 3.
I have been looking for a while now, but I cannot find out how to query my table. I have seen some example code, but I do not see an example of how to query... all I can do is get all documents (not ideal when my DB is > 80GB).
The GitHub repo shows a very tiny set of operations for database and collections: https://github.com/Azure/azure-documentdb-python/blob/master/samples/CollectionManagement/Program.py
And the following SO post shows how to read all documents... but not how to perform querying such as "WHERE = X;"
I'd really appreciate it if someone can point me in the right direction, and possibly supply an example showing how to run queries.
Based on my understanding, I think you want to know how to perform a SQL-like query using Python to retrieve documents from Azure Cosmos DB (DocumentDB API); please refer to the code below, from here.
A query is performed using SQL
# Query them in SQL
query = {'query': 'SELECT * FROM server s'}
options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2

result_iterable = client.QueryDocuments(collection['_self'], query, options)
results = list(result_iterable)
print(results)
The above code is using the method QueryDocuments.
If you have any concerns, please feel free to let me know.
Update: Combining this with my sample code from the other SO thread you linked, as below.
from pydocumentdb import document_client

uri = 'https://ronyazrak.documents.azure.com:443/'
key = '<your-primary-key>'
client = document_client.DocumentClient(uri, {'masterKey': key})

db_id = 'test1'
db_query = "select * from r where r.id = '{0}'".format(db_id)
db = list(client.QueryDatabases(db_query))[0]
db_link = db['_self']

coll_id = 'test1'
coll_query = "select * from r where r.id = '{0}'".format(coll_id)
coll = list(client.QueryCollections(db_link, coll_query))[0]
coll_link = coll['_self']

query = {'query': 'SELECT * FROM server s'}
docs = client.QueryDocuments(coll_link, query)
print(list(docs))
query = 'SELECT * FROM c'
docs = list(client.QueryItems(coll_link, query))
QueryDocuments has been replaced with QueryItems.
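For completeness, in the current azure-cosmos (v4) Python SDK the same kind of query looks roughly like the sketch below; the database and container names reuse 'test1' from the snippet above, and the WHERE clause is only an illustration, not part of the original answers:
from azure.cosmos import CosmosClient

uri = 'https://ronyazrak.documents.azure.com:443/'
key = '<your-primary-key>'

client = CosmosClient(uri, credential=key)
container = client.get_database_client('test1').get_container_client('test1')

# Parameterized query; the iterator pages through results lazily
items = container.query_items(
    query="SELECT * FROM c WHERE c.id = @id",
    parameters=[{"name": "@id", "value": "some-id"}],
    enable_cross_partition_query=True,
)
for item in items:
    print(item)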
I had a similar problem recently. You can fetch blocks (not the entire query set) by calling fetch_next_block().
query = "select * from c"
options = {'maxItemCount': 1000, 'continuation': True}
q = db_source._client.QueryDocuments(collection_link, query, options)
block1 = q.fetch_next_block()
block2 = q.fetch_next_block()
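If you need every document rather than just the first couple of blocks, a small loop over fetch_next_block() works; this is a sketch against the same q iterator, assuming an empty block signals the end of the result set:
all_docs = []
while True:
    block = q.fetch_next_block()
    if not block:  # assumption: an empty block means the query is exhausted
        break
    all_docs.extend(block)

print(f"fetched {len(all_docs)} documents")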
