DynamoDB row count via Python boto query

Folks,
I'm trying to get the following bit of code working to return the row count of a table:
import boto
import boto.dynamodb2
from boto.dynamodb2.table import Table
from boto.dynamodb2.fields import HashKey, RangeKey

drivers = Table('current_fhv_drivers')
rowcountquery = drivers.query(
    number='blah',
    expiration='foo',
    count=True,
)
for x in rowcountquery:
    print x['Count']
Error I see is:
boto.dynamodb2.exceptions.UnknownFilterTypeError: Operator 'count' from 'count' is not recognized.
What's the correct syntax to get the row count? :)
Thanks!

That exception is basically telling you that the operator 'count' is not recognized by boto.
If you read the second paragraph on http://boto.readthedocs.org/en/latest/dynamodb2_tut.html#querying you'll see that:
Filter parameters are passed as kwargs & use a __ to separate the fieldname from the operator being used to filter the value.
So I would change your code to:
import boto
import boto.dynamodb2
from boto.dynamodb2.table import Table
from boto.dynamodb2.fields import HashKey, RangeKey

drivers = Table('current_fhv_drivers')
rowcountquery = drivers.query(
    number__eq='blah',
    expiration__eq='foo',
    count__eq=True,
)
for x in rowcountquery:
    print x['Count']

Alternatively, Table.query_count returns just the number of matching items, so there is no need to iterate over the results at all:

from boto.dynamodb2.table import Table

table = Table('current_fhv_drivers')
print(table.query_count(number__eq='blah', expiration__eq='foo'))

Related

How do I compile and bring in multiple outputs from the same worker?

I'm developing a Kubeflow pipeline that takes in a dataset, splits it into two different datasets based on a filter inside the code, and outputs both datasets. That function looks like the following:
def merge_promo_sales(input_data: Input[Dataset],
                      output_data_hd: OutputPath("Dataset"),
                      output_data_shop: OutputPath("Dataset")):
    import pandas as pd
    pd.set_option('display.max_rows', 100)
    pd.set_option('display.max_columns', 500)
    import numpy as np
    from google.cloud import bigquery
    from utils import google_bucket

    client = bigquery.Client("gcp-sc-demand-plan-analytics")
    print("Client creating using default project: {}".format(client.project), "Pulling Data")
    query = """
    SELECT * FROM `gcp-sc-demand-plan-analytics.Modeling_Input.monthly_delivery_type_sales` a
    Left Join `gcp-sc-demand-plan-analytics.Modeling_Input.monthly_promotion` b
    on a.ship_base7 = b.item_no
    and a.oper_cntry_id = b.corp_cd
    and a.dmand_mo_yr = b.dates
    """
    query_job = client.query(
        query,
        # Location must match that of the dataset(s) referenced in the query.
        location="US",
    )  # API request - starts the query

    df = query_job.to_dataframe()
    df.drop(['corp_cd', 'item_no', 'dates'], axis=1, inplace=True)
    df.loc[:, 'promo_objective_increase_margin':] = df.loc[:, 'promo_objective_increase_margin':].fillna(0)
    items = df['ship_base7'].unique()
    df = df[df['ship_base7'].isin(items)]

    df_hd = df[df['location_type'] == 'home_delivery']
    df_shop = df[df['location_type'] != 'home_delivery']

    df_hd.to_pickle(output_data_hd)
    df_shop.to_pickle(output_data_shop)
That part works fine. When I try to feed those two data sets into the next function with the compiler, I hit errors.
I tried the following:
@kfp.v2.dsl.pipeline(name=PIPELINE_NAME)
def my_pipeline():
    merge_promo_sales_nl = merge_promo_sales(input_data=new_launch.output)
    rule_3_hd = rule_3(input_data=merge_promo_sales_nl.output_data_hd)
    rule_3_shop = rule_3(input_data=merge_promo_sales_nl.output_data_shop)
The error I get is the following:
AttributeError: 'ContainerOp' object has no attribute 'output_data_hd'
output_data_hd is the parameter I write that dataset out to, but apparently it's not the name of the attribute Kubeflow is looking for.
I just figured this out. When a component has multiple outputs, you reference them through the .outputs dictionary in the pipeline definition:
rule_3_hd = rule_3(input_data = merge_promo_sales_nl.outputs['output_data_hd'])
rule_3_shop = rule_3(input_data = merge_promo_sales_nl.outputs['output_data_shop'])
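Putting it together, a minimal sketch of the corrected pipeline, assuming PIPELINE_NAME, new_launch, merge_promo_sales and rule_3 are defined as in the question:

import kfp.v2.dsl

@kfp.v2.dsl.pipeline(name=PIPELINE_NAME)
def my_pipeline():
    merge_promo_sales_nl = merge_promo_sales(input_data=new_launch.output)
    # Multi-output components expose each named output through the .outputs
    # dict rather than as attributes on the task object.
    rule_3_hd = rule_3(input_data=merge_promo_sales_nl.outputs['output_data_hd'])
    rule_3_shop = rule_3(input_data=merge_promo_sales_nl.outputs['output_data_shop'])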

Error while creating person group person in microsoft cognitive face API

I have code which uses the older API, and I don't know the new API. Could someone who knows it help me modify the code?
import sys
import cognitive_face as CF
from global_variables import personGroupId
import sqlite3

Key = '###################'
CF.Key.set(Key)
BASE_URL = 'https://region.api.cognitive.microsoft.com/face/v1.0/'
CF.BaseUrl.set(BASE_URL)

if len(sys.argv) != 1:
    res = CF.person.create(personGroupId, str(sys.argv[1]))  # error line
    print(res)
    extractId = str(sys.argv[1])[-2:]
    connect = sqlite3.connect("studentdb")
    cmd = "SELECT * FROM Students WHERE id = " + extractId
    cursor = connect.execute(cmd)
    isRecordExist = 0
    for row in cursor:
        isRecordExist = 1
    if isRecordExist == 1:
        connect.execute("UPDATE Students SET personID = ? WHERE ID = ?", (res['personId'], extractId))
        connect.commit()
        connect.close()
As you mentioned, you are using the older API; you are expected to use the new one. Refer to the official documentation for installing the package and for further reference.
PACKAGE:
pip install --upgrade azure-cognitiveservices-vision-face
Import the following libraries (in addition to your other basic libraries):
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials
from azure.cognitiveservices.vision.face.models import TrainingStatusType, Person, SnapshotObjectType, OperationStatusType
The updated API command is as follows:
res = face_client.person_group_person.create(person_group_id, str(sys.argv[1]))
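Here face_client is an authenticated FaceClient instance; a minimal sketch of creating one is below (the endpoint and key are placeholders, and personGroupId is the value imported from global_variables in the question):

import sys
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials
from global_variables import personGroupId

# Placeholder endpoint and key - substitute your own Cognitive Services values.
ENDPOINT = 'https://<your-region>.api.cognitive.microsoft.com'
KEY = '###################'

face_client = FaceClient(ENDPOINT, CognitiveServicesCredentials(KEY))
res = face_client.person_group_person.create(personGroupId, str(sys.argv[1]))
print(res)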
In addition to what Soorya answered above, for those who want a sample code reference, you can see the latest API sample code here:
def build_person_group(client, person_group_id, pgp_name):
    print('Create and build a person group...')
    # Create empty Person Group. Person Group ID must be lower case, alphanumeric, and/or with '-', '_'.
    print('Person group ID:', person_group_id)
    client.person_group.create(person_group_id=person_group_id, name=person_group_id)
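A hypothetical call, assuming face_client is the authenticated client from the earlier snippet and the group ID follows the lower-case/alphanumeric rule:

# Hypothetical usage of the sample function above.
build_person_group(face_client, 'my-person-group', 'My Person Group')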

Using Python elasticsearch_dsl with nested objects

I want to try to use elasticsearch_dsl with Python for the following:
from elasticsearch import Elasticsearch

es_server = 'my_server_name'
es_port = '9200'
es_index_name = 'my_index_name'
es_connection = Elasticsearch([{'host': es_server, 'port': es_port}])

es_query = '{"query":{"bool":{"must":[{"term":{"data.party.fullName":"john do"}}],"must_not":[],"should":[]}},"from":0,"size":1,"sort":[],"facets":{}}'
my_results = es_connection.search(index=es_index_name, body=es_query)
print my_results

es_query = '{"query": {"nested" : {"filter" : {"term" : {"party.phoneList.phoneFullNumber" : "4081234567"}},"path" : "party.phoneList"}},"from" :0,"size" : 1}'
my_results = es_connection.search(index=es_index_name, body=es_query)
print my_results
I am able to get the first query working with the DSL, but am not sure about the second one:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q
client = Elasticsearch('my_server:9200')
s = Search(using=client, index = "my_index").query("term",fullName="john do ")
response = s.execute()
print response
Not sure how to do the query using DSL for the nested object party.phoneList.phoneFullNumber
I am new to ES and hence could not figure out how to handle the nested objects. I looked at https://github.com/elastic/elasticsearch-dsl-py/issues/28 and could not quite figure it out.
Thanks !
Just use __ instead of . to get around Python's keyword-argument limitations, and use the nested query:
s = Search(using=client, index = "my_index")
s = s.query("nested",
path="party.phoneList",
query=Q("term", party__phoneList__phoneFullNumber="4081234567")
)
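To run it and inspect the hits, a short usage sketch following the question's client and index setup:

response = s.execute()
# Number of matching documents, then the raw source of each hit
print(response.hits.total)
for hit in response:
    print(hit.to_dict())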

How to solve "TypeError: expected string or buffer" when importing json data via api?

I'm trying to import JSON data via an API, and use the imported data to construct a DataFrame.
import json
import pandas as pd
import numpy as np
import requests
api_username = 'acb'
api_password = 'efg'
germany_name = 'Germany'
germany_api_url = "https://api.country_data.com/stats/?country=" + germany_name + "&year=2014"
germany_api_resp = requests.get(germany_api_url,auth=(api_username,api_password))
germany_data_json = json.loads(germany_api_resp)
germany_frame = pd.DataFrame(germany_data_json['data']).set_index('tag')
print(germany_frame) shows me the desired DataFrame.
I want to repeat the process for many countries, not just 'Germany', so I created a Country class like this:
class Country(object):
    def __init__(self, name):
        self.name = name
        self.api_url = "https://api.country_data.com/stats/?country=" + name + "&year=2014"
        self.api_resp = requests.get(self.api_url, auth=(api_username, api_password))
        self.data_json = json.loads(self.api_resp)
        self.frame = pd.DataFrame(self.data_json['data']).set_index('tag')
When I create my first object, like this:
Germany = Country('Germany')
I get an Error message:
TypeError: expected string or buffer
Can someone help me with this issue?
I don't know which versions of Python and requests you're using, but I recommend updating everything. Here is an error I found:
self.data_json = json.loads(self.api_resp)
You are trying to json.loads() a requests Response object, so change it to:
self.data_json = self.api_resp.json()
I replaced your API URL with another one, because yours isn't valid, and it works for me.
See ya !
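Put together, a corrected version of the class might look like this (a sketch that keeps the placeholder credentials and URL from the question):

import requests
import pandas as pd

api_username = 'acb'
api_password = 'efg'

class Country(object):
    def __init__(self, name):
        self.name = name
        self.api_url = "https://api.country_data.com/stats/?country=" + name + "&year=2014"
        self.api_resp = requests.get(self.api_url, auth=(api_username, api_password))
        # Decode the response body instead of passing the Response object to json.loads()
        self.data_json = self.api_resp.json()
        self.frame = pd.DataFrame(self.data_json['data']).set_index('tag')

Germany = Country('Germany')
print(Germany.frame)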

Selecting values from a JSON file in Python

I am getting JIRA data using the following Python code.
How do I store the response for more than one key (my example shows only one key, but in general I get a lot of data) and print only the values corresponding to total, key, customfield_12830, and summary?
import requests
import json
import logging
import datetime
import base64
import urllib
serverURL = 'https://jira-stability-tools.company.com/jira'
user = 'username'
password = 'password'
query = 'project = PROJECTNAME AND "Build Info" ~ BUILDNAME AND assignee=ASSIGNEENAME'
jql = '/rest/api/2/search?jql=%s' % urllib.quote(query)
response = requests.get(serverURL + jql,verify=False,auth=(user, password))
print response.json()
response.json() output:
http://pastebin.com/h8R4QMgB
From the link you pasted to pastebin and from the JSON that I saw, the response has an 'issues' list, where each issue contains key, fields (which holds the custom fields), self, id, and expand.
You can simply iterate through this response and extract the values for the keys you want. You can go like this:
data = response.json()
issues = data.get('issues', list())
x = list()
for issue in issues:
    temp = {
        'key': issue['key'],
        'customfield': issue['fields']['customfield_12830'],
        'total': issue['fields']['progress']['total']
    }
    x.append(temp)
print(x)
x is a list of dictionaries containing the data for the fields you mentioned. Let me know if I have been unclear somewhere or if what I have given is not what you are looking for.
PS: It is always advisable to use dict.get('keyname', None) to get values, as you can supply a default value if the key is not found. For this solution I didn't do it, since I just wanted to show the approach.
Update: In the comments you (OP) mentioned that it gives an AttributeError. Try this code:
data = response.json()
issues = data.get('issues', list())
x = list()
for issue in issues:
    temp = dict()
    key = issue.get('key', None)
    if key:
        temp['key'] = key
    fields = issue.get('fields', None)
    if fields:
        customfield = fields.get('customfield_12830', None)
        temp['customfield'] = customfield
        progress = fields.get('progress', None)
        if progress:
            total = progress.get('total', None)
            temp['total'] = total
    x.append(temp)
print(x)
