CosmosDB and Python3: how to query? - python

I am using CosmosDB (Azure documentDB) in my project, written in Python 3.
I have been looking for a while now, but I cannot find out how to query my table. I have seen some example code, but I do not see an example of how to query... all I can do is get all documents (not ideal when my DB is > 80GB).
The GitHub repo shows a very tiny set of operations for database and collections: https://github.com/Azure/azure-documentdb-python/blob/master/samples/CollectionManagement/Program.py
And the following SO post shows how to read all documents... but not how to run a filtered query such as WHERE x = y.
I'd really appreciate it if someone can point me in the right direction, and possibly supply an example showing how to run queries.

Based on my understanding, you want to know how to perform a SQL-like query from Python to retrieve documents from Azure Cosmos DB (DocumentDB API). Please refer to the code below.
A query is performed using SQL:
# Query them in SQL
query = { 'query': 'SELECT * FROM server s' }
options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2
result_iterable = client.QueryDocuments(collection['_self'], query, options)
results = list(result_iterable)
print(results)
The code above uses the QueryDocuments method.
If you have any concerns, please feel free to let me know.
Update: Combining this with my sample code from the other SO thread you linked:
from pydocumentdb import document_client
uri = 'https://ronyazrak.documents.azure.com:443/'
key = '<your-primary-key>'
client = document_client.DocumentClient(uri, {'masterKey': key})
db_id = 'test1'
db_query = "select * from r where r.id = '{0}'".format(db_id)
db = list(client.QueryDatabases(db_query))[0]
db_link = db['_self']
coll_id = 'test1'
coll_query = "select * from r where r.id = '{0}'".format(coll_id)
coll = list(client.QueryCollections(db_link, coll_query))[0]
coll_link = coll['_self']
query = { 'query': 'SELECT * FROM server s' }
docs = client.QueryDocuments(coll_link, query)
print(list(docs))
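The two lookup queries above can also be skipped entirely: DocumentDB links follow a fixed path format, so the collection link can be built directly from the database and collection ids (a small sketch; the ids are the sample values used above):

```python
def collection_link(db_id, coll_id):
    # Build the collection link directly using the
    # dbs/{database-id}/colls/{collection-id} path format,
    # avoiding the QueryDatabases/QueryCollections round-trips.
    return 'dbs/{}/colls/{}'.format(db_id, coll_id)

# docs = client.QueryDocuments(collection_link('test1', 'test1'), query)
```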

query = 'SELECT * FROM c'
docs = list(client.QueryItems(coll_link,query))
QueryDocuments has been replaced with QueryItems in newer versions of the SDK.
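A filtered query also doesn't have to be assembled by string formatting: the SDK accepts a query spec dict with a `parameters` list and substitutes the values server-side. A sketch (the `@type` parameter and `_type` property are placeholders for your own schema):

```python
def build_query(doc_type):
    # Parameterized Cosmos SQL query spec; the SDK binds @type itself,
    # which avoids quoting and injection issues.
    return {
        'query': 'SELECT * FROM c WHERE c._type = @type',
        'parameters': [{'name': '@type', 'value': doc_type}],
    }

# docs = list(client.QueryItems(coll_link, build_query('User'),
#                               {'enableCrossPartitionQuery': True}))
```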

I ran into a similar problem recently. You can fetch results in blocks (rather than the entire result set) by calling fetch_next_block().
query = "select * from c"
options = {'maxItemCount': 1000, 'continuation': True}
q = db_source._client.QueryDocuments(collection_link, query, options)
block1 = q.fetch_next_block()
block2 = q.fetch_next_block()
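To walk the entire result set page by page, keep calling fetch_next_block() until it comes back empty. A sketch (exercised here with a stand-in object rather than a live client):

```python
def iter_blocks(query_iterable):
    # Yield one block (page) of results at a time; pydocumentdb's
    # fetch_next_block() returns an empty list once the query is exhausted.
    while True:
        block = query_iterable.fetch_next_block()
        if not block:
            break
        yield block

# for block in iter_blocks(q):
#     process(block)
```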

Related

Executing a postgresql query with plpg-sql from sqlalchemy

I can't find examples of PL/pgSQL in raw SQL executed by SQLAlchemy. These were the closest I found, but neither uses PL/pgSQL:
How to execute raw SQL in Flask-SQLAlchemy app
how to set autocommit = 1 in a sqlalchemy.engine.Connection
I've done some research and I'm not sure this is possible. I'm trying to either INSERT or UPDATE a record, and there is no error, but it must be failing silently because no record is created/updated in the database, even though I've explicitly set autocommit=True.
Python:
import sqlalchemy as db
from sqlalchemy import text
from sqlalchemy.orm import scoped_session, sessionmaker

engine = db.create_engine(connstr, pool_size=20, max_overflow=0)
Session = scoped_session(sessionmaker(bind=engine, autocommit=True))
s = Session()
query = """DO $$
declare
    ppllastActivity date;
    percComplete numeric;
begin
    select lastactivity into ppllastActivity FROM feeds WHERE email = :e and courseName = :c and provider = :prov;
    IF COALESCE(ppllastActivity, '1900-01-01') = '1900-01-01' THEN
        INSERT INTO feeds (email, courseName, completedratio, lastActivity, provider) VALUES (:e, :c, :p, :l, :prov);
    ELSEIF ppllastActivity < :l THEN
        UPDATE feeds SET completedratio = :p, lastActivity = :l WHERE email = :e and courseName = :c and provider = :prov;
    END IF;
end; $$"""
params = {'e' : item.get('email').replace("'", "''").lower(), 'c' : item.get('courseName').replace("'", "''"), 'p' : item.get('progress'), 'l' : item.get('lastActivity'),'prov' : "ACG" }
result = s.execute(text(query), params)
I'm unable to troubleshoot since it doesn't give me any errors. Am I going down the wrong path? Should I just use psql.exe or can you do plpg-sql in raw SQL with sqlAlchemy?
While typing this question up I found a solution (or a bug).
autocommit=True doesn't work on its own; you have to explicitly begin a transaction:
with s.begin():
    result = s.execute(text(query), params)
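The same buffer-until-commit behavior can be seen with the stdlib sqlite3 module (a stand-in, not SQLAlchemy itself; the stripped-down feeds table is borrowed from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feeds (email TEXT, completedratio REAL)")

# Like SQLAlchemy's Session, sqlite3 buffers DML inside a transaction;
# the "with conn:" block commits on successful exit, analogous to s.begin().
with conn:
    conn.execute("INSERT INTO feeds VALUES (?, ?)", ("a@b.com", 0.5))

count = conn.execute("SELECT COUNT(*) FROM feeds").fetchone()[0]
print(count)  # 1
```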

How to count using pydocumentdb?

I'm using pydocumentdb as sdk to access my CosmosDB(DocumentDB) database
I'm trying to execute this query
SELECT VALUE COUNT(1) FROM c WHERE c._type="User"
And I'm getting different results each time I execute it using the QueryDocuments method of the DocumentClient class.
Thanks in advance
After following the tip given by @nick-chapsas, I found the solution.
Here is the resulting code:
from pydocumentdb import document_client
DB_HOST = "my-host"
DB_KEY = "my key=="
DB_DATABASE = "my database"
DB_COLLECTION = "my collection"
dbclient = document_client.DocumentClient(DB_HOST, {'masterKey': DB_KEY})
path = 'dbs/{}/colls/{}'.format(DB_DATABASE, DB_COLLECTION)
query = "SELECT VALUE COUNT(1) FROM c JOIN chk0 IN c.communities WHERE chk0.id='bliive' AND c._type='User'"
result = [dta for dta in dbclient.QueryDocuments(path, query)]
print("count:")
print(str(sum(result)))
Thank you!
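The sum() on the last line matters: a cross-partition COUNT comes back as one partial count per physical partition rather than a single number, so the client has to add them up. With hypothetical per-partition results:

```python
# Hypothetical per-partition results of SELECT VALUE COUNT(1) ...
partial_counts = [120, 87, 301]

# The final count is the sum of the partial counts, exactly as the
# sum(result) call does in the code above.
total = sum(partial_counts)
print("count:", total)  # 508
```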

Using Boto3 to interact with amazon Aurora on RDS

I have set up a database in Amazon RDS using Amazon Aurora and would like to interact with the database using Python - the obvious choice is to use Boto.
However, their documentation is awful and does not cover ways in which I can interact with the database to:
Run queries with SQL statements
Interact with the tables in the database
etc
Does anyone have an links to some examples/tutorials, or know how to do these tasks?
When using Amazon RDS offerings (including Aurora), you don't connect to the database via any AWS API (including Boto). Instead you would use the native client of your chosen database. In the case of Aurora, you would connect using the MySQL Command Line client. From there, you can query it just like any other MySQL database.
There's a brief section of the "Getting Started" documentation that talks about connecting to your Aurora database:
Connecting to an Amazon Aurora DB Cluster
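For example, connecting with the standard mysql command-line client (the cluster endpoint below is hypothetical; your writer endpoint is shown on the cluster's page in the RDS console):

```shell
# Aurora MySQL is wire-compatible with MySQL, so the stock client works.
mysql -h mycluster.cluster-abc123.us-east-1.rds.amazonaws.com \
      -P 3306 -u admin -p mydatabase
```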
Here are a couple examples:
INSERT example:
import boto3
sql = """
INSERT INTO YOUR_TABLE_NAME_HERE
(
    your_column_name_1
    ,your_column_name_2
    ,your_column_name_3
)
VALUES (
    :your_param_1_name
    ,:your_param_2_name
    ,:your_param_3_name
)
"""
param1 = {'name':'your_param_1_name', 'value':{'longValue': 5}}
param2 = {'name':'your_param_2_name', 'value':{'longValue': 63}}
param3 = {'name':'your_param_3_name', 'value':{'stringValue': 'para bailar la bamba'}}
param_set = [param1, param2, param3]
db_clust_arn = 'your_db_cluster_arn_here'
db_secret_arn = 'your_db_secret_arn_here'
rds_data = boto3.client('rds-data')
response = rds_data.execute_statement(
    resourceArn = db_clust_arn,
    secretArn = db_secret_arn,
    database = 'your_database_name_here',
    sql = sql,
    parameters = param_set)
print(str(response))
READ example:
import boto3
rds_data = boto3.client('rds-data')
db_clust_arn = 'your_db_cluster_arn_here'
db_secret_arn = 'your_db_secret_arn_here'
employee_id = 35853
# (f-string interpolation shown for brevity; prefer bound parameters,
# as in the INSERT example above, to avoid SQL injection)
get_vacation_days_sql = f"""
select vacation_days_remaining
from employees_tbl
where employee_id = {employee_id}
"""
response1 = rds_data.execute_statement(
    resourceArn = db_clust_arn,
    secretArn = db_secret_arn,
    database = 'your_database_name_here',
    sql = get_vacation_days_sql)
# recs is a list (of rows returned from the DB)
recs = response1['records']
print(f"recs === {recs}")
# recs === [[{'longValue': 57}]]

# single_row is a list of dictionaries, where each dictionary represents a
# column from that single row
for single_row in recs:
    print(f"single_row === {single_row}")
    # single_row === [{'longValue': 57}]

    # each column is a dictionary with one key/value pair, where the key
    # is the data type of the column and the value is the column's value
    for single_column_dict in single_row:
        print(f"one_dict === {single_column_dict}")
        # one_dict === {'longValue': 57}
        vacation_days_remaining = single_column_dict['longValue']
        print(f'vacation days remaining === {vacation_days_remaining}')
Source Link:
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html#data-api.calling.python
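Since the nesting in the records payload ([[{'longValue': 57}]]) is easy to get lost in, a small helper can unwrap single-value results (a sketch; it deliberately handles only the one-row, one-column case):

```python
def scalar_from_records(records):
    # RDS Data API shape: records -> rows -> columns -> {type: value}.
    # Unpack exactly one row containing exactly one column, then its value;
    # anything else raises, which surfaces shape mismatches early.
    [[cell]] = records
    (value,) = cell.values()
    return value

# scalar_from_records(response1['records']) would return 57 for the
# example output shown above.
```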

getting only updated data from database

I need to fetch only the recently updated data from the database. To solve this, I save the last-read row number in a Python shelve. The following code works for a simple query like select * from rows. My code is:
from pyodbc import connect
from peewee import *
import random
import shelve
import connection

d = shelve.open("data.shelve")
db = SqliteDatabase("data.db")

class Rows(Model):
    valueone = IntegerField()
    valuetwo = IntegerField()

    class Meta:
        database = db

def CreateAndPopulate():
    db.connect()
    db.create_tables([Rows], safe=True)
    with db.atomic():
        for i in range(100):
            row = Rows(valueone=random.randrange(0,100), valuetwo=random.randrange(0,100))
            row.save()
    db.close()

def get_last_primary_key():
    return d.get('max_row', 0)

def doWork():
    query = "select * from rows"  # could be anything
    conn = connection.Connection("localhost","","SQLite3 ODBC Driver","data.db","","")
    max_key_query = "SELECT MAX(%s) from %s" % ("id", "rows")
    max_primary_key = conn.fetch_one(max_key_query)[0]
    print "max_primary_key " + str(max_primary_key)
    last_primary_key = get_last_primary_key()
    print "last_primary_key " + str(last_primary_key)
    if max_primary_key == last_primary_key:
        print "no new records"
    elif max_primary_key > last_primary_key:
        print "there is some new data"
        optimizedQuery = query + " where id>" + str(last_primary_key)
        print optimizedQuery
        for data in conn.fetch_all(optimizedQuery):
            print data
        d['max_row'] = max_primary_key

# CreateAndPopulate() # to populate data
doWork()
While this code works for a simple query without a WHERE clause, the query can be anything from simple to complex, with joins and multiple WHERE clauses. In that case, the part where I append the WHERE will fail. How can I fetch only the newest data from the database, whatever the query is?
PS: I cannot modify database. I just have to fetch from it.
Use an OFFSET clause. For example:
SELECT * FROM [....] WHERE [....] LIMIT -1 OFFSET 1000
In your query, bind the offset to the row count saved in your shelve instead of the literal 1000. That skips the rows you have already read and grabs only the newer ones. You may want to consider a more robust refactor eventually, but good luck.
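A runnable sketch with the stdlib sqlite3 module (table and column names borrowed from the question; last_seen stands in for the shelved value):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rows (id INTEGER PRIMARY KEY, valueone INTEGER)")
conn.executemany("INSERT INTO rows (valueone) VALUES (?)",
                 [(i,) for i in range(10)])

last_seen = 7  # would come from the shelve
# LIMIT -1 means "no limit" in SQLite; OFFSET skips the rows already read.
new_rows = conn.execute(
    "SELECT * FROM rows ORDER BY id LIMIT -1 OFFSET ?",
    (last_seen,)).fetchall()
print(new_rows)  # rows with ids 8, 9, 10
```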

How can I get int result from web python sql query?

My code is as follows. My question is, how can I get the int result from the query?
import web
db = web.database(...)
rs = db.query('select max(id) from tablename')
The query method returns a list, and you can access your calculated column through an alias. If you don't want to use an alias, I think you can do rs[0]['max(id)'].
rs = db.query('select max(id) as max_value from tablename')
print rs[0].max_value
This is based on the example used in the web.py docs: http://webpy.org/cookbook/query
Here is the example from the link:
import web
db = web.database(dbn='postgres', db='mydata', user='dbuser', pw='')
results = db.query("SELECT COUNT(*) AS total_users FROM users")
print results[0].total_users # -> prints number of entries in 'users' table
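The aliasing trick is not specific to web.py; any DB-API driver exposes the alias as the result column's name. A stdlib sqlite3 stand-in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # enables access by column name
conn.execute("CREATE TABLE tablename (id INTEGER)")
conn.executemany("INSERT INTO tablename VALUES (?)", [(1,), (5,), (3,)])

# The AS max_value alias becomes the key for the aggregate column.
row = conn.execute("SELECT MAX(id) AS max_value FROM tablename").fetchone()
max_id = row["max_value"]
print(max_id)  # 5
```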
