Pulling the 'ExternalImageId' data when running search_faces_by_image - python

I'm fairly new to AWS and for the past week I've been following all the helpful documentation on the site.
I am currently stuck on being unable to pull the ExternalImageId data from a Rekognition collection after a search_faces_by_image call. I just need to be able to put that data into a variable or print it; does anybody know how I could do that?
Basically, this is my code:
import boto3

if __name__ == "__main__":
    bucket = 'bucketname'
    collectionId = 'collectionname'
    fileName = 'test.jpg'
    threshold = 90
    maxFaces = 2
    admin = 'test'

    targetFile = "%sTarget.jpg" % admin
    imageTarget = open(targetFile, 'rb')

    client = boto3.client('rekognition')
    response = client.search_faces_by_image(CollectionId=collectionId,
                                            Image={'Bytes': imageTarget.read()},
                                            FaceMatchThreshold=threshold,
                                            MaxFaces=maxFaces)

    faceMatches = response['FaceMatches']
    print('Matching faces')
    for match in faceMatches:
        print('FaceId:' + match['Face']['FaceId'])
        print('Similarity: ' + "{:.2f}".format(match['Similarity']) + "%")
at the end of it, I receive:
Matching faces
FaceId:8081ad90-b3bf-47e0-9745-dfb5a530a1a7
Similarity: 96.12%
Process finished with exit code 0
What I need is the External Image Id instead of the FaceId.
Thanks!
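For reference, the ExternalImageId is returned in each match right next to the FaceId, so the loop above only needs one more lookup. A minimal sketch, assuming the faces in the collection were indexed with an ExternalImageId (it is only present if one was supplied to index_faces):

for match in faceMatches:
    face = match['Face']
    # ExternalImageId is only present if it was set when the face was indexed
    externalImageId = face.get('ExternalImageId', 'not set')
    print('FaceId: ' + face['FaceId'])
    print('ExternalImageId: ' + externalImageId)
    print('Similarity: ' + "{:.2f}".format(match['Similarity']) + "%")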


Refactoring Cognitive Services Code in Python in Databricks

The following code gives the printed output below:
--------Recognizing business card #1--------
Contact First Name: Chris has confidence: 1.0
Contact Last Name: Smith has confidence: 1.0
The code that provides the above output is:
bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg"
poller = form_recognizer_client.begin_recognize_business_cards_from_url(bcUrl)
business_cards = poller.result()

for idx, business_card in enumerate(business_cards):
    print("--------Recognizing business card #{}--------".format(idx + 1))
    contact_names = business_card.fields.get("ContactNames")
    if contact_names:
        for contact_name in contact_names.value:
            print("Contact First Name: {} has confidence: {}".format(
                contact_name.value["FirstName"].value, contact_name.value["FirstName"].confidence
            ))
            print("Contact Last Name: {} has confidence: {}".format(
                contact_name.value["LastName"].value, contact_name.value["LastName"].confidence
            ))
I am trying to refactor the code to output the results to a dataframe, as follows:
import pandas as pd

field_list = ["FirstName", "LastName"]
df = pd.DataFrame(columns=field_list)
bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg"

for blob in container.list_blobs():
    blob_url = container_url + "/" + blob.name
    poller = form_recognizer_client.begin_recognize_business_cards_from_url(bcUrl)
    business_cards = poller.result()
    print("Scanning " + blob.name + "...")
    for idx, business_card in enumerate(business_cards):
        single_df = pd.DataFrame(columns=field_list)
        for field in field_list:
            entry = business_card.fields.get(field)
            if entry:
                single_df[field] = [entry.value]
        single_df['FileName'] = blob.name
        df = df.append(single_df)

df = df.reset_index(drop=True)
df
However, my code does not produce any output. Can someone take a look and let me know why I'm not getting any output?
When I tried to connect to blob storage, I got the same kind of error. I just followed the syntax below for connecting to blob storage, and I also deleted the .json and some other .fott files so that only the PDFs remain in the container. I then ran the same code without any problem and it works fine. Please follow the reference below, which has detailed information:
Install packages
Connect to Azure Storage Container
Enable Cognitive Services
Send files to Cognitive Services
Reference:
https://www.youtube.com/watch?v=hQ2NeO4c9iI&t=458s
Azure Databricks and Form Recognizer - Invalid Image or password protected - Stack Overflow
https://github.com/tomweinandy/form_recognizer_demo
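For the "Connect to Azure Storage Container" step, here is a minimal sketch using the azure-storage-blob package. The connection string and container name are placeholders, and passing the per-blob blob_url into the recognizer (instead of the fixed bcUrl from the question) is my assumption about what the loop intends:

from azure.storage.blob import ContainerClient

# Hypothetical placeholders: substitute your own storage account values
conn_str = "<your-storage-connection-string>"
container = ContainerClient.from_connection_string(conn_str, container_name="<your-container>")
container_url = "https://<your-account>.blob.core.windows.net/<your-container>"

for blob in container.list_blobs():
    blob_url = container_url + "/" + blob.name
    # Assumption: scan each blob's own URL rather than a fixed bcUrl
    poller = form_recognizer_client.begin_recognize_business_cards_from_url(blob_url)
    business_cards = poller.result()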

azure.cognitiveservices.vision.face.models._models_py3.APIErrorException: (InvalidImageSize) Image size is too small

ERROR MESSAGE
I am trying to upload images of reasonable size (around 20 KB), and according to the documentation, images from 1 KB to 6 MB can be uploaded. I suspect some part of the program needs modification to rectify the error.
File "add_person_faces.py", line 46, in <module>
res = face_client.person_group_person.add_face_from_stream(global_var.personGroupId, person_id, img_data)
File "C:\Python\Python36\lib\site-packages\azure\cognitiveservices\vision\face\operations\_person_group_person_operations.py", line 785, in add_face_from_stream
raise models.APIErrorException(self._deserialize, response)
azure.cognitiveservices.vision.face.models._models_py3.APIErrorException: (InvalidImageSize) Image size is too small.
CODE
import os, sys, time
import global_variables as global_var
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials
from azure.cognitiveservices.vision.face.models import TrainingStatusType, Person, SnapshotObjectType, OperationStatusType
import urllib
import sqlite3
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

KEY = global_var.key
ENDPOINT = 'https://centralindia.api.cognitive.microsoft.com'
face_client = FaceClient(ENDPOINT, CognitiveServicesCredentials(KEY))

def get_person_id():
    person_id = ''
    extractId = str(sys.argv[1])[-2:]
    connect = sqlite3.connect("Face-DataBase")
    c = connect.cursor()
    cmd = "SELECT * FROM Students WHERE ID = " + extractId
    c.execute(cmd)
    row = c.fetchone()
    person_id = row[3]
    connect.close()
    return person_id

if len(sys.argv) != 1:
    currentDir = os.path.dirname(os.path.abspath(__file__))
    imageFolder = os.path.join(currentDir, "dataset/" + str(sys.argv[1]))
    person_id = get_person_id()
    for filename in os.listdir(imageFolder):
        if filename.endswith(".jpg"):
            print(filename)
            img_data = open(os.path.join(imageFolder, filename), "rb")
            res = face_client.face.detect_with_stream(img_data)
            if not res:
                print('No face detected from image {}'.format(filename))
                continue
            res = face_client.person_group_person.add_face_from_stream(global_var.personGroupId, person_id, img_data)
            print(res)
            time.sleep(6)
else:
    print("supply attributes please from dataset folder")
After looking through the API calls you are making, I realized there are some elements missing. You might not have posted the entire code, but I will add a sample below that illustrates the steps. Following the steps avoids any image errors, so the 'wrong size' image error was likely caused by the missing steps.
In your code, before you can add an image to a Person Group Person (PGP), you have to create a Person Group (PG) for that PGP to belong to. Then, after you create the Person Group (it is empty at the start), you must create a Person Group Person with that PG ID in it. Once those two things are created, you can start adding images to your Person Group Person.
So here are the steps summarized above:
Create a Person Group with the API call create()
Create a Person Group Person with its API call for create()
Add your image(s) to the Person Group Person with the API call add_face_from_stream()
Once you have added all your images that belong to your Person Group Person, then you can use data from it however you like.
See the code sample below, where a single local image is uploaded and added to a Person Group Person. I'll include the image I am using, in case you want to download it and test.
import os
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials
KEY = os.environ['FACE_SUBSCRIPTION_KEY']
ENDPOINT = os.environ['FACE_ENDPOINT']
face_client = FaceClient(ENDPOINT, CognitiveServicesCredentials(KEY))
person_group_id = 'women_person_group'
person_id = 'women_hats'
image_name = 'woman_with_sunhat.jpg'
# Create empty Person Group. Person Group ID must be lower case, alphanumeric, and/or with '-', '_'.
print('Creating a Person Group:', person_group_id)
face_client.person_group.create(person_group_id=person_group_id, name=person_group_id)
# Create a Person Group Person.
print('Creating the Person Group Person:', person_id)
women_hat_group = face_client.person_group_person.create(person_group_id, person_id)
# Add image to our Person Group Person.
print('Adding face to the Person Group Person:', person_id)
face_image = open(image_name, 'r+b')
face_client.person_group_person.add_face_from_stream(person_group_id, women_hat_group.person_id, face_image)
# Print ID from face.
print('Person ID:', women_hat_group.person_id)
# Since testing, delete the Person Group, so no duplication conflicts if script is run again.
face_client.person_group.delete(person_group_id)
print()
print("Deleted the person group {} from the Azure Face account.".format(person_group_id))

How to delete the snapshots with a tag by setting the retention period

I have a script to delete snapshots after a retention period. It works well and deletes the snapshots that have passed the retention period, but I also need to filter by tags, meaning only the snapshots that have a particular tag should be deleted.
import boto3
from botocore.exceptions import ClientError
import datetime

# Set the global variables
globalVars = {}
globalVars['Owner'] = "Cloud"
globalVars['Environment'] = "Test"
globalVars['REGION_NAME'] = "ap-south-1"
globalVars['tagName'] = "Testing"
globalVars['findNeedle'] = "DeleteOn"
globalVars['RetentionDays'] = "1"
globalVars['tagsToExclude'] = "Do-Not-Delete"

ec2_client = boto3.client('ec2')

"""
This function looks at *all* snapshots that have a "DeleteOn" tag containing
the current day formatted as YYYY-MM-DD. This function should be run at least
daily.
"""
def janitor_for_snapshots():
    account_ids = list()
    account_ids.append(boto3.client('sts').get_caller_identity().get('Account'))
    snap_older_than_RetentionDays = (datetime.date.today() - datetime.timedelta(days=int(globalVars['RetentionDays']))).strftime('%Y-%m-%d')
    delete_today = datetime.date.today().strftime('%Y-%m-%d')
    tag_key = 'tag:' + globalVars['findNeedle']
    filters = [{'Name': tag_key, 'Values': [delete_today]}, ]
    # filters={ 'tag:' + config['tag_name']: config['tag_value'] }

    # Get list of Snaps with Tag 'globalVars['findNeedle']'
    snaps_to_remove = ec2_client.describe_snapshots(OwnerIds=account_ids, Filters=filters)

    # Get the snaps that don't have the tag and are older than Retention days
    all_snaps = ec2_client.describe_snapshots(OwnerIds=account_ids)
    for snap in all_snaps['Snapshots']:
        if snap['StartTime'].strftime('%Y-%m-%d') <= snap_older_than_RetentionDays:
            snaps_to_remove['Snapshots'].append(snap)

    snapsDeleted = {'Snapshots': []}
    for snap in snaps_to_remove['Snapshots']:
        try:
            ec2_client.delete_snapshot(SnapshotId=snap['SnapshotId'])
            snapsDeleted['Snapshots'].append({'Description': snap['Description'], 'SnapshotId': snap['SnapshotId'], 'OwnerId': snap['OwnerId']})
        except ClientError as e:
            if "is currently in use by" in str(e):
                print("Snapshot {} is part of an AMI".format(snap.get('SnapshotId')))

    snapsDeleted['Status'] = '{} Snapshots were Deleted'.format(len(snaps_to_remove['Snapshots']))
    return snapsDeleted

def lambda_handler(event, context):
    return janitor_for_snapshots()

if __name__ == '__main__':
    lambda_handler(None, None)
I want to delete only the snapshots with the "DeleteOn" tag, but this script deletes everything that has passed the retention period; it's not checking the tag part.
Please check and help with this.
Thank you.
If you are asking how to fix the code so that it only deletes snapshots that:
Have the given tag, AND
Have passed the retention period
then look closely at your code.
This part:
# Get list of Snaps with Tag 'globalVars['findNeedle']'
snaps_to_remove = ec2_client.describe_snapshots(OwnerIds=account_ids,Filters=filters)
is obtaining a list of snapshots by tag. Great!
Then this part:
# Get the snaps that don't have the tag and are older than Retention days
all_snaps = ec2_client.describe_snapshots(OwnerIds=account_ids)
for snap in all_snaps['Snapshots']:
    if snap['StartTime'].strftime('%Y-%m-%d') <= snap_older_than_RetentionDays:
        snaps_to_remove['Snapshots'].append(snap)
is getting a NEW list of snapshots and checking the retention.
Then, the resulting snaps_to_remove contains the results from BOTH of them.
You will need to combine your logic so it is only adding snaps that meet both criteria rather than compiling the list of snapshots separately.
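A minimal sketch of that combined logic, reusing the variables already defined in the question (filters, account_ids, snap_older_than_RetentionDays):

# Start from the tag-filtered list only
tagged_snaps = ec2_client.describe_snapshots(OwnerIds=account_ids, Filters=filters)

snaps_to_remove = {'Snapshots': []}
for snap in tagged_snaps['Snapshots']:
    # Keep a snapshot only if it ALSO exceeds the retention period
    if snap['StartTime'].strftime('%Y-%m-%d') <= snap_older_than_RetentionDays:
        snaps_to_remove['Snapshots'].append(snap)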

Reverse-Geocoding Fails when run in Loop

I am trying to do reverse geocoding and extract pincodes for lat-long pairs. The .csv file has around 1 million records.
Below is my problem:
1. The Google API fails to return addresses for large numbers of records and takes a huge amount of time. I will move it to a batch process later, though.
2. I tried splitting the file into chunks and manually ran a few files one by one (1000 records in each file after splitting); surprisingly, I then got a 100% result.
3. Later, when I ran them in a loop one by one, the Google API again failed to give results.
Note: Right now we are looking for free APIs only.
Below is my code:
import requests
import pandas as pd

# api_key, colnames and files are defined elsewhere in the original script

def reverse_geocode(latlng):
    result = {}
    url = 'https://maps.googleapis.com/maps/api/geocode/json?latlng={}'
    request = url.format(latlng)
    key = '&key=' + api_key
    request = request + key
    data = requests.get(request).json()
    if len(data['results']) > 0:
        result = data['results'][0]
    return result

def parse_postal_code(geocode_data):
    if (geocode_data is not None) and ('formatted_address' in geocode_data):
        for component in geocode_data['address_components']:
            if 'postal_code' in component['types']:
                return component['short_name']
    return None

dfinal = pd.DataFrame(columns=colnames)
dmiss = pd.DataFrame(columns=colnames)
for fl in files:
    df = pd.read_csv(fl)
    print('Processing file : ' + fl[36:])
    df['geocode_data'] = ''
    df['Pincode'] = ''
    df['geocode_data'] = df['latlng'].map(reverse_geocode)
    df['Pincode'] = df['geocode_data'].map(parse_postal_code)
    if len(df[df['Pincode'].isnull()]) > 0:
        d0 = df[df['Pincode'].isnull()]
        print("Missing Pincodes : " + str(len(d0)) + " / " + str(len(df)))
        dmiss.append(d0)
        d0 = df[~df['Pincode'].isnull()]
        dfinal.append(d0)
    else:
        dfinal.append(df)
Can anybody help me out with what the problem in my code is? If any additional info is required, please let me know.
You've run into Google API usage limits.
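One common mitigation (a sketch, not part of the original answer) is to pace the requests and retry when the Geocoding API reports OVER_QUERY_LIMIT in the status field of its JSON response; the retry count and backoff below are assumptions to tune against your own quota:

import time
import requests

def reverse_geocode_throttled(latlng, api_key, max_retries=3):
    url = 'https://maps.googleapis.com/maps/api/geocode/json?latlng={}&key={}'.format(latlng, api_key)
    for attempt in range(max_retries):
        data = requests.get(url).json()
        if data.get('status') == 'OVER_QUERY_LIMIT':
            # Back off exponentially before retrying
            time.sleep(2 ** attempt)
            continue
        if data.get('results'):
            return data['results'][0]
        return {}
    return {}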

Reading specific test steps from Quality Center with python

I am working with Quality Center via the OTA COM library. I figured out how to connect to the server, but I am lost in the OTA documentation on how to work with it. What I need is to create a function that takes a test name as input and returns the number of steps in that test from QC.
This is how far I have gotten:
import win32com
from win32com.client import Dispatch
# import codecs  # to store info in additional codecs
import re
import json
import getpass  # for password

qcServer = "***"
qcUser = "***"
qcPassword = getpass.getpass('Password: ')
qcDomain = "***"
qcProject = "***"

td = win32com.client.Dispatch("TDApiOle80.TDConnection.1")

# Starting to connect
td.InitConnectionEx(qcServer)
td.Login(qcUser, qcPassword)
td.Connect(qcDomain, qcProject)
if td.Connected == True:
    print "Connected to " + qcProject
else:
    print "Connection failed"

# Path = "Subject\Regression\C.001_Band_tones"
mg = td.TreeManager
npath = "Subject\Regression"
tsFolder = td.TestSetTreeManager.NodeByPath(npath)
print tsFolder

td.Disconnect()
td.Logout()
print "Disconnected from " + qcProject
Any help with decent Python examples or tutorials would be highly appreciated. For now I found this and this, but they don't help.
Using the OTA API to get data from Quality Center normally means getting some element by path, creating a factory, and then using the factory to search for the object. In your case you need the TreeManager to get a folder in the Test Plan, then a TestFactory to get the test, and finally a DesignStepFactory to get the steps. I'm no Python programmer, but I hope you can get something out of this:
mg = td.TreeManager
npath = "Subject\Test"
tsFolder = mg.NodeByPath(npath)
testFactory = tsFolder.TestFactory
testFilter = testFactory.Filter
testFilter["TS_NAME"] = "Some Test"
testList = testFactory.NewList(testFilter.Text)
test = testList.Item(1)  # There should be only 1 item
print test.Name

stepFactory = test.DesignStepFactory
stepList = stepFactory.NewList("")
for step in stepList:
    print step.StepName
It takes some time to get used to the QC OTA API documentation, but I find it very helpful. Nearly all of my knowledge comes from the examples in the API documentation; for your problem there are examples like "Finding a unique test" or "Get a test object with name and path". Both are examples for the Test object. Even though the examples are in VB, it should be no big thing to adapt them to Python.
I figured out the solution; if there is a better way to do this, you are welcome to post it.
import win32com
from win32com.client import Dispatch
import getpass

def number_of_steps(name):
    qcServer = "***"
    qcUser = "***"
    qcPassword = getpass.getpass('Password: ')
    qcDomain = "***"
    qcProject = "***"
    td = win32com.client.Dispatch("TDApiOle80.TDConnection.1")

    # Starting to connect
    td.InitConnectionEx(qcServer)
    td.Login(qcUser, qcPassword)
    td.Connect(qcDomain, qcProject)
    if td.Connected is True:
        print "Connected to " + qcProject
    else:
        print "Connection failed"

    mg = td.TreeManager  # Tree manager
    folder = mg.NodeByPath("Subject\Regression")
    testList = folder.FindTests(name)  # Make a list of tests matching name (partial match is accepted)
    if testList is not None:
        if len(testList) > 1:
            print "There are multiple tests matching this name, please check input parameter\nTests matching:"
            for test in testList:
                print test.Name
            td.Disconnect()
            td.Logout()
            return False
        if len(testList) == 1:
            print "In test %s there are %d steps" % (testList[0].Name, testList[0].DesStepsNum)
    else:
        print "There is no test with this test name in Quality Center"
        td.Disconnect()
        td.Logout()
        return False

    td.Disconnect()
    td.Logout()
    print "Disconnected from " + qcProject
    return testList[0].DesStepsNum  # Return number of steps for given test
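For completeness, a quick usage sketch; the test name here is taken from the path commented out in the question and is only illustrative:

steps = number_of_steps("C.001_Band_tones")
if steps:
    print "The test has %d design steps" % steps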
