Refactoring Cognitive Services code in Python on Databricks

The code below prints the following output:
--------Recognizing business card #1--------
Contact First Name: Chris has confidence: 1.0
Contact Last Name: Smith has confidence: 1.0
The code that produces this output is:
bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg"
poller = form_recognizer_client.begin_recognize_business_cards_from_url(bcUrl)
business_cards = poller.result()
for idx, business_card in enumerate(business_cards):
    print("--------Recognizing business card #{}--------".format(idx+1))
    contact_names = business_card.fields.get("ContactNames")
    if contact_names:
        for contact_name in contact_names.value:
            print("Contact First Name: {} has confidence: {}".format(
                contact_name.value["FirstName"].value, contact_name.value["FirstName"].confidence
            ))
            print("Contact Last Name: {} has confidence: {}".format(
                contact_name.value["LastName"].value, contact_name.value["LastName"].confidence
            ))
I am trying to refactor the code so that it outputs the results to a dataframe, as follows:
import pandas as pd
field_list = ["FirstName", "LastName"]
df = pd.DataFrame(columns=field_list)
bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg"
for blob in container.list_blobs():
    blob_url = container_url + "/" + blob.name
    poller = form_recognizer_client.begin_recognize_business_cards_from_url(bcUrl)
    business_cards = poller.result()
    print("Scanning " + blob.name + "...")
    for idx, business_card in enumerate(business_cards):
        single_df = pd.DataFrame(columns=field_list)
        for field in field_list:
            entry = business_card.fields.get(field)
            if entry:
                single_df[field] = [entry.value]
        single_df['FileName'] = blob.name
        df = df.append(single_df)
df = df.reset_index(drop=True)
df
However, my code does not produce any output.
Can someone take a look and let me know why I'm not getting any output?

When I tried to connect to blob storage, I got the same kind of error. I followed the steps below to connect to blob storage, and I also deleted the .json and .fott files so that only the PDFs remained in the container. After that, the same code ran without any problem. Please see the references below, which have detailed information.
Install packages
Connect to Azure Storage Container
Enable Cognitive Services
Send files to Cognitive Services
Reference:
https://www.youtube.com/watch?v=hQ2NeO4c9iI&t=458s
Azure Databricks and Form Recognizer - Invalid Image or password protected - Stack Overflow
https://github.com/tomweinandy/form_recognizer_demo
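A possible additional cause, judging only from the two snippets in the question: the working code reads FirstName and LastName from inside the ContactNames field, while the refactored loop calls business_card.fields.get("FirstName") at the top level, so entry may always be empty. The loop also passes bcUrl instead of blob_url. Below is a minimal sketch of flattening the nested field into rows. The helper name contact_rows and the SimpleNamespace stand-ins are hypothetical and only mimic the .value/.confidence shape used in the question; they are not the SDK's own classes.

```python
from types import SimpleNamespace


def contact_rows(business_cards, file_name, field_list=("FirstName", "LastName")):
    """Flatten the nested ContactNames field into one dict per contact."""
    rows = []
    for business_card in business_cards:
        contact_names = business_card.fields.get("ContactNames")
        if not contact_names:
            continue  # this card has no recognized contact names
        for contact_name in contact_names.value:
            row = {"FileName": file_name}
            for field in field_list:
                entry = contact_name.value.get(field)
                row[field] = entry.value if entry else None
            rows.append(row)
    return rows


def _field(value):
    # Hypothetical stand-in for a recognized field (illustration only).
    return SimpleNamespace(value=value, confidence=1.0)


sample_card = SimpleNamespace(fields={
    "ContactNames": _field([
        _field({"FirstName": _field("Chris"), "LastName": _field("Smith")})
    ])
})
rows = contact_rows([sample_card], "business-card-english.jpg")
print(rows)
```

With real results, the rows from every blob could be collected in one list and turned into a dataframe once at the end with pd.DataFrame(rows), which also avoids calling df.append inside the loop.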

Related

AuthenticationError: Incorrect API key provided at colab

I want to create music using AI, and I found this page: https://towardsdatascience.com/ai-tunes-creating-new-songs-with-artificial-intelligence-4fb383218146
The program uses the OpenAI API, but when I paste in my API key (I already have an OpenAI account), I get this error. I'm new to this, so I don't know how to deal with it.
This is the notebook: https://colab.research.google.com/github/robgon-art/ai-tunes/blob/main/AI_Tunes_Generate_Music.ipynb
My code:
#@title **Generate a new Song Title and Band Name**
#@markdown This data will be used to prompt the AI-Tunes system to create the song.
response = openai.Completion.create(
    engine="davinci-instruct-beta",
    prompt="Create a new song title a new band name. Be creative!\n\nBand name:love The Execs\nSong title: love you more\n###\n\nBand name: The One Chords\nSong title: Moving Down to L.A\n###",
    temperature=0.7,
    max_tokens=64,
    top_p=1,
    frequency_penalty=0.5,
    presence_penalty=0.5,
    stop=["###"]
)
# print(response)
song_metadata = response["choices"][0]["text"].strip()
lines = song_metadata.split("\n")
generated_metadata = {}
song_title = "Our Random Song"
band_name = "Some Random Band"
for line in lines:
    parts = line.split(":")
    if len(parts) == 2:
        if "song title" in parts[0].lower() and len(parts[1]) > 0:
            song_title = parts[1].strip()
        if "band name" in parts[0].lower() and len(parts[1]) > 0:
            band_name = parts[1].strip()
print("Song Title:", song_title)
print("Band Name :", band_name)
First I checked that my API key was correct, then went through the OpenAI FAQ page: https://help.openai.com/en/articles/6882433-incorrect-api-key-provided
But it still doesn't work.
The error message suggests that you used the following as the key:
openai.api_key = "<sk-...>"
which should be
openai.api_key = "sk-..."
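To guard against this kind of paste error, the key can be cleaned up before it is assigned. This is only a sketch: the helper name clean_api_key and the sample key "sk-example" are made up for illustration, and the "sk-" prefix check is taken from the key format shown above.

```python
def clean_api_key(raw_key):
    """Strip whitespace and placeholder angle brackets from a pasted key."""
    key = raw_key.strip()
    # Template placeholders are often written as <...>; drop the brackets.
    if key.startswith("<") and key.endswith(">"):
        key = key[1:-1].strip()
    if not key.startswith("sk-"):
        raise ValueError("key does not start with 'sk-'; check what was pasted")
    return key


# "sk-example" is a made-up value, not a real key.
cleaned = clean_api_key("  <sk-example> ")
print(cleaned)
```

The result would then be assigned as usual, for example openai.api_key = clean_api_key(pasted_value).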

Using boto3 to retrieve a SG rule Description

We are trying to use boto3 to query our AWS instances and get all security groups which have rules with no description. I have not been able to figure out how to retrieve the Rule description. I have included an image of the column I want to retrieve below.
I wanted to share the solution I came up with late in the evening, piecing together bits from other solutions, primarily from @Abdul Gill and his ec2_sg_rules script.
import boto3
# Explicitly declaring variables here grants them global scope
cidr_block = ""
ip_protocol = ""
from_port = ""
to_port = ""
from_source = ""
description = ""
sg_filter = [{'Name': 'group-name', 'Values': ['*ssh*']}]
print("%s,%s,%s" % ("Group-Name","Group-ID","CidrIp"))
ec2 = boto3.client('ec2' )
sgs = ec2.describe_security_groups(Filters=sg_filter)["SecurityGroups"]
for sg in sgs:
    group_name = sg['GroupName']
    group_id = sg['GroupId']
    # print("%s,%s" % (group_name,group_id))
    # InBound permissions ##########################################
    inbound = sg['IpPermissions']
    for rule in inbound:
        # Is source/target an IP v4?
        if len(rule['IpRanges']) > 0:
            for ip_range in rule['IpRanges']:
                cidr_block = ip_range['CidrIp']
                # Report only rules that have no Description key
                if 'Description' not in ip_range:
                    # Skip ranges containing '10.' (substring match)
                    if '10.' not in cidr_block:
                        print("%s,%s,%s" % (group_name, group_id, cidr_block))
print('\n')
Hopefully this helps someone else.
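The same check can be separated from the AWS call so it can be tried on a plain dictionary. This is a sketch under assumptions: the function name rules_without_description and the sample data are made up, and it works on the "SecurityGroups" list that describe_security_groups returns, using only the keys seen in the script above. Unlike the script, it does not skip '10.' ranges, and it also treats an empty description as missing.

```python
def rules_without_description(security_groups):
    """Return (group_name, group_id, cidr) for inbound IPv4 rules lacking a description."""
    missing = []
    for sg in security_groups:
        for rule in sg.get("IpPermissions", []):
            for ip_range in rule.get("IpRanges", []):
                # An absent or empty Description counts as "no description".
                if not ip_range.get("Description"):
                    missing.append((sg["GroupName"], sg["GroupId"], ip_range["CidrIp"]))
    return missing


# Made-up sample shaped like the "SecurityGroups" list in the response.
sample_groups = [
    {
        "GroupName": "ssh-admin",
        "GroupId": "sg-0001",
        "IpPermissions": [
            {"IpRanges": [
                {"CidrIp": "203.0.113.0/24", "Description": "office"},
                {"CidrIp": "198.51.100.7/32"},
            ]},
        ],
    },
]
missing = rules_without_description(sample_groups)
print(missing)
```

With a live client, the input would be ec2.describe_security_groups(Filters=sg_filter)["SecurityGroups"], as in the script above.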

Access email from a particular folder in Outlook, within a specific date range (user Input) using "exchangelib" in Python

Summary of the requirement:
Access emails from a specific folder in Outlook within a user-supplied date range,
e.g. all mail from June, or all mail from 23-June-2020 to 15-July-2020.
So far we have tried the following, but there are two issues:
The date range does not give the correct output.
It takes too long to produce output, and sometimes fails with a timeout error.
The code:
from exchangelib import Credentials, Account, DELEGATE, Configuration, IMPERSONATION, FaultTolerance,EWSDateTime,EWSTimeZone,Message
import datetime
import pandas as pd
new_password = "your password"
credentials = Credentials(username = 'username', password = new_password)
config = Configuration(server ='outlook.office365.com', credentials = credentials)
account = Account(primary_smtp_address ='username', credentials = credentials, autodiscover = False, config = config, access_type = DELEGATE)
#first approach.....................................
conversation_id = []
datetime_received = []
has_attachment = []
Senders = []
for i in account.inbox.all().order_by('-datetime_received')[:40]:
    if isinstance(i, Message):
        if i.datetime_received:
            if ((i.datetime_received).year == 2020):
                if ((i.datetime_received).month == 7):
                    if i.conversation_id:
                        print("conversation id: ", i.conversation_id.id)
                        conversation_id.append(i.conversation_id.id)
                    if not i.conversation_id:
                        conversation_id.append("Not available")
                    if i.sender:
                        print("Sender name : ", i.sender.name)
                        Senders.append(i.sender.name)
                    if not i.sender:
                        Senders.append("not available")
                    if i.datetime_received:
                        # date() must be called; .date alone is the bound method
                        print("Time : ", i.datetime_received.date())
                        datetime_received.append(i.datetime_received.date())
                    if not i.datetime_received:
                        datetime_received.append("not available")
                    if i.has_attachments:
                        print("Has attachment: ", i.has_attachments)
                        has_attachment.append(i.has_attachments)
                    if not i.has_attachments:
                        has_attachment.append("not available")
# second approach.....................................................................
# Note: tz is not defined anywhere in this snippet
items_for_2019 = account.inbox.filter(start__range=(
    tz.localize(EWSDateTime(2019, 8, 1)),
    tz.localize(EWSDateTime(2020, 1, 1))
))
for i in items_for_2019:
    print("")
#third approach.........................................................................
for item in account.inbox.filter(datetime_received__range=(tz.localize(EWSDateTime(2019, 8, 1)),tz.localize(EWSDateTime(2020, 1, 1)))):
    print(item.sender.name)
The first approach works for a specific month but is extremely slow.
The second and third approaches give the wrong output.
A little guidance would be highly appreciated.
The start field belongs to CalendarItem. It is not a valid field for Message objects. That's why your 2nd approach does not work.
Your 3rd approach should work. Which output do you see, and what did you expect?
If your queries are taking a long time, try to limit the fields you are fetching for each item, by using the .only() QuerySet method.
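For the user-input part of the requirement, the two dates can first be parsed into a pair of timezone-aware datetimes. This is a sketch using only the standard library: the function name parse_date_range, the "%d-%B-%Y" format (matching "23-June-2020" from the question, and dependent on an English locale), and the UTC default are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def parse_date_range(start_text, end_text, fmt="%d-%B-%Y", tz=timezone.utc):
    """Turn two user-supplied dates into timezone-aware (start, end) datetimes.

    The end is midnight after the last day, so the whole last day is covered
    when the range is used as start <= received < end.
    """
    start = datetime.strptime(start_text, fmt).replace(tzinfo=tz)
    end = datetime.strptime(end_text, fmt).replace(tzinfo=tz) + timedelta(days=1)
    if end <= start:
        raise ValueError("end date must not be before start date")
    return start, end


start, end = parse_date_range("23-June-2020", "15-July-2020")
print(start.isoformat(), end.isoformat())
```

The resulting bounds would then be converted to the library's own date type and used in a filter on datetime_received, as in the third approach, ideally combined with .only() so that just the needed fields (for example sender and datetime_received) are fetched.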

Pulling the 'ExternalImageId' data when running search_faces_by_image

I'm fairly new to AWS, and for the past week I've been following all the helpful documentation on the site.
I am currently stuck on being unable to pull the External Image Id data from a Rekognition collection after a 'search faces by image' call. I just need to be able to put that data into a variable or print it. Does anybody know how I could do that?
Basically, this is my code:
import boto3
if __name__ == "__main__":
    bucket = 'bucketname'
    collectionId = 'collectionname'
    fileName = 'test.jpg'
    threshold = 90
    maxFaces = 2
    admin = 'test'
    targetFile = "%sTarget.jpg" % admin
    imageTarget = open(targetFile, 'rb')
    client = boto3.client('rekognition')
    response = client.search_faces_by_image(CollectionId=collectionId,
                                            Image={'Bytes': imageTarget.read()},
                                            FaceMatchThreshold=threshold,
                                            MaxFaces=maxFaces)
    faceMatches = response['FaceMatches']
    print ('Matching faces')
    for match in faceMatches:
        print ('FaceId:' + match['Face']['FaceId'])
        print ('Similarity: ' + "{:.2f}".format(match['Similarity']) + "%")
At the end of it, I receive:
Matching faces
FaceId:8081ad90-b3bf-47e0-9745-dfb5a530a1a7
Similarity: 96.12%
Process finished with exit code 0
What I need is the External Image Id instead of the FaceId.
Thanks!
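One way to approach this, sketched on a plain dictionary rather than a live call: each entry in FaceMatches carries a Face dictionary, and the external ID sits next to FaceId under the key ExternalImageId when one was supplied at indexing time. The helper name external_image_ids and the sample ExternalImageId value are made up for illustration; the FaceId and similarity come from the output above.

```python
def external_image_ids(response):
    """Collect (ExternalImageId, Similarity) pairs from a search_faces_by_image response."""
    results = []
    for match in response.get("FaceMatches", []):
        face = match["Face"]
        # .get() returns None if the face was indexed without an external ID.
        results.append((face.get("ExternalImageId"), match["Similarity"]))
    return results


# Sample shaped like the response; the ExternalImageId value is hypothetical.
sample_response = {
    "FaceMatches": [
        {
            "Similarity": 96.12,
            "Face": {
                "FaceId": "8081ad90-b3bf-47e0-9745-dfb5a530a1a7",
                "ExternalImageId": "example_person.jpg",
            },
        }
    ]
}
pairs = external_image_ids(sample_response)
for external_id, similarity in pairs:
    print("ExternalImageId:", external_id)
    print("Similarity: {:.2f}%".format(similarity))
```

In the loop from the question, this corresponds to reading match['Face'].get('ExternalImageId') alongside match['Face']['FaceId'].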

Reading specific test steps from Quality Center with python

I am working with Quality Center via the OTA COM library. I figured out how to connect to the server, but I am lost in the OTA documentation on how to work with it. What I need is a function that takes a test name as input and returns the number of steps in that test from QC.
This is how far I have got:
import win32com
from win32com.client import Dispatch
# import codecs #to store info in additional codecs
import re
import json
import getpass #for password
qcServer = "***"
qcUser = "***"
qcPassword = getpass.getpass('Password: ')
qcDomain = "***"
qcProject = "***"
td = win32com.client.Dispatch("TDApiOle80.TDConnection.1")
#Starting to connect
td.InitConnectionEx(qcServer)
td.Login(qcUser,qcPassword)
td.Connect(qcDomain, qcProject)
if td.Connected == True:
    print("Connected to " + qcProject)
else:
    print("Connection failed")
#Path = "Subject\Regression\C.001_Band_tones"
mg=td.TreeManager
# Raw string so the backslash is not treated as an escape
npath=r"Subject\Regression"
tsFolder = td.TestSetTreeManager.NodeByPath(npath)
print(tsFolder)
td.Disconnect
td.Logout
print("Disconnected from " + qcProject)
Any help with decent Python examples or tutorials will be highly appreciated. For now I found this and this, but they don't help.
Using the OTA API to get data from Quality Center normally means getting some element by path, creating a factory, and then using the factory to search for the object. In your case you need the TreeManager to get a folder in the Test Plan, then a TestFactory to get the test, and finally the DesignStepFactory to get the steps. I'm no Python programmer, but I hope you can get something out of this:
mg=td.TreeManager
# Raw string so the backslash is not treated as an escape
npath=r"Subject\Test"
tsFolder = mg.NodeByPath(npath)
testFactory = tsFolder.TestFactory
testFilter = testFactory.Filter
testFilter["TS_NAME"] = "Some Test"
testList = testFactory.NewList(testFilter.Text)
test = testList.Item(1) # There should be only 1 item
print(test.Name)
stepFactory = test.DesignStepFactory
stepList = stepFactory.NewList("")
for step in stepList:
    print(step.StepName)
It takes some time to get used to the QC OTA API documentation, but I find it very helpful. Nearly all of my knowledge comes from the examples in the API documentation. For your problem there are examples like "Finding a unique test" or "Get a test object with name and path". Both are examples for the Test object. Even though the examples are in VB, it should be no big thing to adapt them to Python.
I figured out a solution; if there is a better way to do this, you are welcome to post it.
import win32com
from win32com.client import Dispatch
import getpass
def number_of_steps(name):
    qcServer = "***"
    qcUser = "***"
    qcPassword = getpass.getpass('Password: ')
    qcDomain = "***"
    qcProject = "***"
    td = win32com.client.Dispatch("TDApiOle80.TDConnection.1")
    #Starting to connect
    td.InitConnectionEx(qcServer)
    td.Login(qcUser, qcPassword)
    td.Connect(qcDomain, qcProject)
    if td.Connected is True:
        print("Connected to " + qcProject)
    else:
        print("Connection failed")
    mg = td.TreeManager # Tree manager
    folder = mg.NodeByPath(r"Subject\Regression")
    testList = folder.FindTests(name) # Make a list of tests matching name (partial match is accepted)
    if testList is not None:
        if len(testList) > 1:
            print("There are multiple tests matching this name, please check input parameter\nTests matching")
            for test in testList:
                print(test.name)
            td.Disconnect
            td.Logout
            return False
        if len(testList) == 1:
            print("In test %s there is %d steps" % (testList[0].Name, testList[0].DesStepsNum))
    else:
        print("There are no test with this test name in Quality Center")
        td.Disconnect
        td.Logout
        return False
    td.Disconnect
    td.Logout
    print("Disconnected from " + qcProject)
    return testList[0].DesStepsNum # Return number of steps for given test
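The Disconnect/Logout pair appears on three exit paths in the function above. One possible tidy-up is a small context manager that performs the cleanup once, whichever path is taken. This is a sketch, not tested against Quality Center: the name qc_session is made up, it assumes Disconnect and Logout can be called as ordinary methods on the connection object, and FakeConnection is a stand-in used only to show the behavior.

```python
from contextlib import contextmanager


@contextmanager
def qc_session(td):
    """Yield the connection and always disconnect and log out afterwards."""
    try:
        yield td
    finally:
        # Same order as in the function above.
        td.Disconnect()
        td.Logout()


class FakeConnection:
    """Hypothetical stand-in for the COM connection, for illustration only."""

    def __init__(self):
        self.calls = []

    def Disconnect(self):
        self.calls.append("Disconnect")

    def Logout(self):
        self.calls.append("Logout")


fake = FakeConnection()
with qc_session(fake) as td:
    pass  # look up the test and read its step count here
print(fake.calls)
```

Inside number_of_steps, the lookup logic would sit in the with block, and each early return would no longer need its own cleanup lines.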
