How to count using pydocumentdb? - python

I'm using pydocumentdb as the SDK to access my CosmosDB (DocumentDB) database.
I'm trying to execute this query:
SELECT VALUE COUNT(1) FROM c WHERE c._type="User"
I get a different result each time I execute it with the QueryDocuments method of the DocumentClient class.
Thanks in advance.

After following the tip given by @nick-chapsas, I found the solution.
Here is the resulting code:
from pydocumentdb import document_client
DB_HOST = "my-host"
DB_KEY = "my key=="
DB_DATABASE = "my database"
DB_COLLECTION = "my collection"
dbclient = document_client.DocumentClient(DB_HOST, {'masterKey': DB_KEY})
path = 'dbs/{}/colls/{}'.format(DB_DATABASE, DB_COLLECTION)
query = "SELECT VALUE COUNT(1) FROM c JOIN chk0 IN c.communities WHERE chk0.id='bliive' AND c._type='User'"
result = [dta for dta in dbclient.QueryDocuments(path, query)]
print("count:")
print(str(sum(result)))
Thank you!
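The working code above sums every value the iterator yields instead of reading only the first one, which suggests the query can return several partial counts. A minimal sketch of that aggregation step follows; `total_count` and the sample numbers are hypothetical stand-ins for the real `QueryDocuments` result, not part of pydocumentdb:

```python
def total_count(partial_counts):
    """Add up every partial COUNT value yielded by a query iterator."""
    return sum(partial_counts)


# Stand-in for dbclient.QueryDocuments(path, query): any iterable of numbers.
fake_result = iter([120, 75, 5])
print("count:", total_count(fake_result))
```

Summing also behaves sensibly when the iterator yields a single value or nothing at all, so the same helper covers both the single-result and the multi-result case.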

Related

Python Script to get multi table counts

I'm trying to write a Python script that gets row counts for some tables for monitoring; it looks a bit like the code below. I want output like the example below. I have tried using Python multi-dimensional arrays but have had no luck.
Expected Output:
('oltptransactions:', [(12L,)])
('oltpcases:', [(24L,)])
Script:
import psycopg2
# Connection with the DataBase
conn = psycopg2.connect(user = "appuser", database = "onedb", host = "192.168.1.1", port = "5432")
cursor = conn.cursor()
sql = """SELECT COUNT(id) FROM appuser.oltptransactions"""
sql2 = """SELECT count(id) FROM appuser.oltpcases"""
sqls = [sql,sql2]
for i in sqls:
    cursor.execute(i)
    result = cursor.fetchall()
    print('Counts:',result)
conn.close()
Current output:
[root@pgenc python_scripts]# python multi_getrcount.py
('Counts:', [(12L,)])
('Counts:', [(24L,)])
Any help is appreciated.
Thanks!
I am a bit reluctant to show this approach, because best practice is to never build a dynamic SQL string and to always use a constant string with parameters. This is one use case where computing the string is legitimate, for two reasons:
a table name cannot be a parameter in SQL;
the input comes only from the program itself and is fully under its control.
Possible code:
sql = """SELECT count(*) from appuser.{}"""
tables = ['oltptransactions', 'oltpcases']
for t in tables:
    cursor.execute(sql.format(t))
    result = cursor.fetchall()
    # Print a (label, rows) pair so the output matches the expected format
    print((t + ':', result))
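The point that table names may be formatted into the SQL text only when they come from the program itself can be sketched as below. This is an illustration under assumptions: it uses the standard-library sqlite3 module with an in-memory database instead of psycopg2 and PostgreSQL, and `ALLOWED_TABLES` and `count_rows` are hypothetical names.

```python
import sqlite3

# Only names listed here are ever formatted into the SQL text.
ALLOWED_TABLES = ("oltptransactions", "oltpcases")


def count_rows(conn, tables):
    """Return {table_name: row_count} for each whitelisted table."""
    counts = {}
    for table in tables:
        if table not in ALLOWED_TABLES:
            raise ValueError("unexpected table name: %r" % table)
        cursor = conn.execute("SELECT COUNT(*) FROM {}".format(table))
        counts[table] = cursor.fetchone()[0]
    return counts


# Demo data in an in-memory SQLite database (not the PostgreSQL schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oltptransactions (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE oltpcases (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO oltptransactions (id) VALUES (?)", [(i,) for i in range(3)])
conn.executemany("INSERT INTO oltpcases (id) VALUES (?)", [(i,) for i in range(5)])

table_counts = count_rows(conn, ALLOWED_TABLES)
for name, count in table_counts.items():
    print("{}: {}".format(name, count))
conn.close()
```

Returning a dictionary keyed by table name keeps each count attached to its label, which is what the expected output in the question asks for.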
I believe something like the following would work. I was unable to test the code because of a certificate issue.
sql = """SELECT 'oltptransactions', COUNT(id) FROM appuser.oltptransactions"""
sql2 = """SELECT 'oltpcases', COUNT(id) FROM appuser.oltpcases"""
sqls = [sql,sql2]
for i in sqls:
    cursor.execute(i)
    # Each row is (table label, count) because the label is selected in the query
    for name, count in cursor:
        print(name + ':', count)
Or
sql = """SELECT 'oltptransactions :'||COUNT(id) FROM appuser.oltptransactions"""
sql2 = """SELECT 'oltpcases :'||COUNT(id) FROM appuser.oltpcases"""
sqls = [sql,sql2]
for i in sqls:
    cursor.execute(i)
    result = cursor.fetchall()
    print(result)

CosmosDB and Python3: how to query?

I am using CosmosDB (Azure documentDB) in my project, written in Python 3.
I have been looking for a while now, but I cannot find out how to query my table. I have seen some example code, but I do not see an example of how to query... all I can do is get all documents (not ideal when my DB is > 80GB).
The GitHub repo shows a very tiny set of operations for database and collections: https://github.com/Azure/azure-documentdb-python/blob/master/samples/CollectionManagement/Program.py
And the following SO post shows how to read all documents... but not how to perform querying such as "WHERE = X;"
I'd really appreciate it if someone can point me in the right direction, and possibly supply an example showing how to run queries.
Based on my understanding, you want to know how to run a SQL-like query from Python to retrieve documents from Azure CosmosDB through the DocumentDB API. Please refer to the code below, taken from the linked sample.
A query is performed using SQL
# Query them in SQL
query = { 'query': 'SELECT * FROM server s' }
options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2
result_iterable = client.QueryDocuments(collection['_self'], query, options)
results = list(result_iterable)
print(results)
The code above uses the QueryDocuments method.
If you have any concerns, please feel free to let me know.
Update: Combined with my sample code from the other SO thread you linked, as below.
from pydocumentdb import document_client
uri = 'https://ronyazrak.documents.azure.com:443/'
key = '<your-primary-key>'
client = document_client.DocumentClient(uri, {'masterKey': key})
db_id = 'test1'
db_query = "select * from r where r.id = '{0}'".format(db_id)
db = list(client.QueryDatabases(db_query))[0]
db_link = db['_self']
coll_id = 'test1'
coll_query = "select * from r where r.id = '{0}'".format(coll_id)
coll = list(client.QueryCollections(db_link, coll_query))[0]
coll_link = coll['_self']
query = { 'query': 'SELECT * FROM server s' }
docs = client.QueryDocuments(coll_link, query)
print(list(docs))
query = 'SELECT * FROM c'
docs = list(client.QueryItems(coll_link,query))
QueryDocuments has been replaced with QueryItems.
I had a similar problem recently. You can fetch the results block by block (rather than the entire result set) by calling fetch_next_block().
query = "select * from c"
options = {'maxItemCount': 1000, 'continuation': True}
q = db_source._client.QueryDocuments(collection_link, query, options)
block1 = q.fetch_next_block()
block2 = q.fetch_next_block()
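Calling fetch_next_block() by hand for every block can be wrapped in a loop. The sketch below assumes that fetch_next_block() returns an empty list once the results are exhausted; `iter_blocks` and `FakeQueryIterable` are hypothetical helpers used here so the example runs without a CosmosDB account.

```python
def iter_blocks(query_iterable):
    """Yield result blocks until fetch_next_block() returns an empty one."""
    while True:
        block = query_iterable.fetch_next_block()
        if not block:
            break
        yield block


class FakeQueryIterable(object):
    """Hypothetical stand-in that serves a list in fixed-size pages."""

    def __init__(self, items, page_size):
        self._items = list(items)
        self._page_size = page_size

    def fetch_next_block(self):
        block = self._items[:self._page_size]
        self._items = self._items[self._page_size:]
        return block


pages = list(iter_blocks(FakeQueryIterable(range(5), page_size=2)))
print(pages)
```

With a real client, the object returned by QueryDocuments would take the place of `FakeQueryIterable`, and each block could be processed and discarded before the next one is fetched.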

How can I use multiple parameters using pandas pd.read_sql_query?

I am trying to pass three variables in a sql query. These are region, feature, newUser. I am using SQL driver SQL Server Native Client 11.0.
Here is my code that works.
query = "SELECT LicenseNo FROM License_Mgmt_Reporting.dbo.MATLAB_NNU_OPTIONS WHERE Region = ?"
data_df = pd.read_sql_query((query),engine,params={region})
output.
LicenseNo
0 12
1 5
Instead, I want to pass in three variables, and this code does not work.
query = "SELECT LicenseNo FROM License_Mgmt_Reporting.dbo.MATLAB_NNU_OPTIONS WHERE Region = ? and FeatureName = ? and NewUser =?"
nnu_data_df = pd.read_sql_query((query),engine,params={region, feature, newUser})
Output returns an empty data frame.
Empty DataFrame
Columns: [LicenseNo]
Index: []
Try passing the parameters as a tuple; you can also drop the extra () around the query.
You could do something like:
query = "SELECT LicenseNo FROM License_Mgmt_Reporting.dbo.MATLAB_NNU_OPTIONS WHERE Region = ? and FeatureName = ? and NewUser =?"
region = 'US'
feature = 'tall'
newUser = 'john'
data_df = pd.read_sql_query(query, engine, params=(region, feature , newUser))
Operator error by me :( I was using the wrong variable and the database returned no results because it didn't exist!
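Positional `?` placeholders are filled in order, so the parameters need an ordered container such as a tuple or list; a set like `{region, feature, newUser}` has no guaranteed order. A minimal sketch of the tuple form, using the standard-library sqlite3 driver (which also uses `?` placeholders) in place of pandas and SQL Server; the `options` table and its rows are made up for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE options (LicenseNo INTEGER, Region TEXT, FeatureName TEXT, NewUser TEXT)"
)
conn.executemany(
    "INSERT INTO options VALUES (?, ?, ?, ?)",
    [(12, "US", "tall", "john"), (5, "US", "tall", "mary"), (7, "EU", "tall", "john")],
)

query = (
    "SELECT LicenseNo FROM options "
    "WHERE Region = ? AND FeatureName = ? AND NewUser = ?"
)
region, feature, new_user = "US", "tall", "john"

# A tuple keeps the values in the same order as the placeholders.
rows = conn.execute(query, (region, feature, new_user)).fetchall()
print(rows)
conn.close()
```

The same tuple can be passed as `params` to `pd.read_sql_query` when the underlying driver uses `?` placeholders.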

getting only updated data from database

I have to get the recently updated data from a database. To solve this, I save the last-read row number in a Python shelve. The following code works for a simple query like select * from rows. My code is:
from pyodbc import connect
from peewee import *
import random
import shelve
import connection
d = shelve.open("data.shelve")
db = SqliteDatabase("data.db")
class Rows(Model):
    valueone = IntegerField()
    valuetwo = IntegerField()
    class Meta:
        database = db
def CreateAndPopulate():
    db.connect()
    db.create_tables([Rows],safe=True)
    with db.atomic():
        for i in range(100):
            row = Rows(valueone=random.randrange(0,100),valuetwo=random.randrange(0,100))
            row.save()
    db.close()
def get_last_primay_key():
    return d.get('max_row',0)
def doWork():
    query = "select * from rows" #could be anything
    conn = connection.Connection("localhost","","SQLite3 ODBC Driver","data.db","","")
    max_key_query = "SELECT MAX(%s) from %s" % ("id", "rows")
    max_primary_key = conn.fetch_one(max_key_query)[0]
    print("max_primary_key " + str(max_primary_key))
    last_primary_key = get_last_primay_key()
    print("last_primary_key " + str(last_primary_key))
    if max_primary_key == last_primary_key:
        print("no new records")
    elif max_primary_key > last_primary_key:
        print("There are some datas")
        optimizedQuery = query + " where id>" + str(last_primary_key)
        print(query)
        for data in conn.fetch_all(optimizedQuery):
            print(data)
        d['max_row'] = max_primary_key
        # print d['max_row']
# CreateAndPopulate() # to populate data
doWork()
The code works for a simple query without a where clause, but the query can be anything from simple to complex, with joins and multiple where clauses. In that case, the part where I append the where clause will fail. How can I get only the most recently updated data from the database, whatever the query is?
PS: I cannot modify database. I just have to fetch from it.
Use an OFFSET clause. For example:
SELECT * FROM [....] WHERE [....] LIMIT -1 OFFSET 1000
In your query, replace 1000 with a parameter bound to your shelve variable. That will skip the top "shelve" number of rows and only grab newer ones. You may want to consider a more robust refactor eventually, but good luck.
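The OFFSET suggestion can be sketched with a bound parameter as follows. This is an illustration under assumptions: it uses sqlite3 directly rather than the `connection` module from the question, the query has no LIMIT clause of its own and ends with a stable ORDER BY, rows are only ever appended, and the stored value is the number of rows already read. `fetch_new_rows` and the `items` table are hypothetical.

```python
import sqlite3


def fetch_new_rows(conn, query, rows_already_read):
    """Run `query`, skipping the first `rows_already_read` result rows."""
    paged_query = query + " LIMIT -1 OFFSET ?"
    return conn.execute(paged_query, (rows_already_read,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, valueone INTEGER)")
conn.executemany("INSERT INTO items (valueone) VALUES (?)", [(v,) for v in (10, 20, 30)])

query = "SELECT id, valueone FROM items WHERE valueone >= 10 ORDER BY id"
first_batch = fetch_new_rows(conn, query, 0)
seen = len(first_batch)  # the value you would persist, e.g. in shelve

conn.execute("INSERT INTO items (valueone) VALUES (40)")
second_batch = fetch_new_rows(conn, query, seen)
print(second_batch)
conn.close()
```

Because the offset counts result rows rather than primary keys, the existing where clauses and joins in the query stay untouched.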

How can I get int result from web python sql query?

My code is as follows. My question is, how can I get the int result from the query?
import web
db = web.database(...)
rs = db.query('select max(id) from tablename')
The query method returns a list. Then you can access your calculated column by using an alias. If you don't want to use an alias, then I think you can do rs[0]['max(id)']
rs = db.query('select max(id) as max_value from tablename')
print rs[0].max_value
This is based on the example used in the web.py docs: http://webpy.org/cookbook/query
Here is the example from the link:
import web
db = web.database(dbn='postgres', db='mydata', user='dbuser', pw='')
results = db.query("SELECT COUNT(*) AS total_users FROM users")
print results[0].total_users # -> prints number of entries in 'users' table
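The alias technique is not specific to web.py. As a rough, self-contained illustration, the sketch below does the same thing with the standard-library sqlite3 module (a swapped-in library, not web.py), reading the aggregate back by its alias as a plain int; the table contents are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be indexed by column name
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO tablename (id) VALUES (?)", [(1,), (2,), (7,)])

# The alias gives the computed column a predictable name.
row = conn.execute("SELECT MAX(id) AS max_value FROM tablename").fetchone()
max_value = row["max_value"]
print(max_value, type(max_value))
conn.close()
```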
