Read text from a GIF image using Azure Cognitive Services - Python

The Azure API works for .jpg images, but when I try a .gif image it fails with Operation returned an invalid status code 'Bad Request'
print("===== Read File - remote =====")
# Get an image with text
read_image_url = "https://ci6.googleusercontent.com/proxy/5NB2CkeM22wqFhiQSmRlJVVinEp3o2nEbZQcy6_8CCKlKst_WW25N0PcsPaYiWAASXO52hufvUAEimUd3IreGowknEXy322x5oYG3lzkBGyctLI0M3eH_w-qHH9qPqtobjpGYooM7AvyNX2CCZtcnEgu8duKlee2GGaswg=s0-d-e1-ft#https://image.e.us.partycity.com/lib/fe301570756406747c1c72/m/10/93d08fa0-c760-4d8b-8e35-ddd5308ec311.gif"
# Call API with URL and raw response (allows you to get the operation location)
read_response = computervision_client.read(read_image_url, raw=True)

You CANNOT send a GIF directly to the Azure Read API, because the documentation states the following:
Request body
Input passed within the POST body. Supported input methods: raw image binary or image URL.
Input requirements:
Supported image formats: JPEG, PNG, BMP, PDF and TIFF.
Please note that MPO (Multi Picture Object) embedded JPEG files are not supported.
For multi-page PDF and TIFF documents:
For the free tier, only the first 2 pages are processed.
For the paid tier, up to 2,000 pages are processed.
Image file size must be less than 50 MB (4 MB for the free tier).
The image/document page dimensions must be at least 50 x 50 pixels and at most 10000 x 10000 pixels.
To handle the GIF, you need to convert it to PNG and then send the raw binary image for recognition, as shown below:
import glob
import time

import requests
from PIL import Image

endpoint = 'https://NAME.cognitiveservices.azure.com/'
subscription_key = 'SUBSCRIPTION_KEY'
read_url = endpoint + "vision/v3.2/read/analyze"

uri = 'https://ci6.googleusercontent.com/proxy/5NB2CkeM22wqFhiQSmRlJVVinEp3o2nEbZQcy6_8CCKlKst_WW25N0PcsPaYiWAASXO52hufvUAEimUd3IreGowknEXy322x5oYG3lzkBGyctLI0M3eH_w-qHH9qPqtobjpGYooM7AvyNX2CCZtcnEgu8duKlee2GGaswg=s0-d-e1-ft#https://image.e.us.partycity.com/lib/fe301570756406747c1c72/m/10/93d08fa0-c760-4d8b-8e35-ddd5308ec311.gif'

# Download the GIF, then convert it to PNG with Pillow
gif = '/tmp/pr0n.gif'
with open(gif, 'wb') as f:
    f.write(requests.get(uri).content)

img = Image.open(gif)
img.save(gif + ".png", 'png', optimize=True)

for filename in sorted(glob.glob("/tmp/pr0n.gif*.png")):
    # Read the image into a byte array
    image_data = open(filename, "rb").read()
    headers = {'Ocp-Apim-Subscription-Key': subscription_key,
               'Content-Type': 'application/octet-stream'}
    response = requests.post(read_url, headers=headers, data=image_data)
    response.raise_for_status()

    # The recognized text isn't immediately available, so poll until the
    # operation completes or fails.
    analysis = {}
    poll = True
    while poll:
        response_final = requests.get(response.headers["Operation-Location"],
                                      headers=headers)
        analysis = response_final.json()
        time.sleep(1)
        if "analyzeResult" in analysis:
            poll = False
        if "status" in analysis and analysis['status'] == 'failed':
            poll = False

    if "analyzeResult" in analysis:
        # Extract the recognized text, with bounding boxes.
        print(analysis["analyzeResult"]["readResults"][0])
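The input requirements quoted above can also be validated client-side before uploading, which avoids round-trips that end in a 400. A minimal sketch using Pillow; the helper name and the free-tier size limit are my own, and PDFs (which Pillow does not open this way) would need a separate check:

```python
import io

from PIL import Image

# Pre-flight check mirroring the documented Read API limits; the helper
# name is hypothetical, the numbers come from the requirements above.
ALLOWED_FORMATS = {"JPEG", "PNG", "BMP", "TIFF"}  # PDF needs a separate check
MAX_BYTES = 4 * 1024 * 1024   # 4 MB free-tier limit
MIN_SIDE, MAX_SIDE = 50, 10000


def is_acceptable_image(data):
    """Return True if the raw bytes satisfy the Read API input requirements."""
    if len(data) > MAX_BYTES:
        return False
    try:
        img = Image.open(io.BytesIO(data))
    except OSError:
        return False  # not an image Pillow understands
    if img.format not in ALLOWED_FORMATS:
        return False  # e.g. a GIF is rejected here
    w, h = img.size
    return MIN_SIDE <= w <= MAX_SIDE and MIN_SIDE <= h <= MAX_SIDE


# A 100x100 PNG passes; the same image saved as GIF does not.
png_buf, gif_buf = io.BytesIO(), io.BytesIO()
Image.new("RGB", (100, 100), "white").save(png_buf, format="PNG")
Image.new("RGB", (100, 100), "white").save(gif_buf, format="GIF")
print(is_acceptable_image(png_buf.getvalue()))  # True
print(is_acceptable_image(gif_buf.getvalue()))  # False
```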

Microsoft also provides a new online portal where you can test this service (among others) and review the input requirements for the Read API.
Link: https://preview.vision.azure.com/demo/OCR

Related

Azure SDK for Python: Reading blobs without downloading

I'm currently using the Azure Blob Storage SDK for Python. For my project I want to read/load the data from a specific blob without having to download it / store it on disk before accessing it.
According to the documentation, loading a specific blob works for me with:
blob_client = BlobClient(blob_service_client.url,
                         container_name,
                         blob_name,
                         credential)
data_stream = blob_client.download_blob()
data = data_stream.readall()
The last readall() call returns the bytes of the blob content (in my case an image).
With:
with open(local_path, "wb") as local_file:
    data_stream.readinto(local_file)
it is possible to save the blob content to disk (a classic download operation).
BUT:
Is it also possible to convert the byte data from data = data_stream.readall() directly into an image?
I already tried image_data = Image.frombytes(mode="RGB", data=data, size=(1080, 1920)),
but it returns the error not enough image data.
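To address the conversion question itself: Image.frombytes expects a raw, uncompressed pixel buffer, while readall() returns encoded bytes (PNG/JPEG/...), which is why Pillow complains about "not enough image data". Decoding works with Image.open over an in-memory buffer; a self-contained sketch where a fabricated PNG stands in for the blob bytes:

```python
import io

from PIL import Image

# Fabricate encoded image bytes; in the question these would come from
# data = data_stream.readall().
buf = io.BytesIO()
Image.new("RGB", (1080, 1920), "white").save(buf, format="PNG")
data = buf.getvalue()

# Image.open decodes PNG/JPEG/... from a file-like object, so no
# temporary file on disk is needed.
image = Image.open(io.BytesIO(data))
print(image.size)  # (1080, 1920)
```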
Here is the sample code for reading the text without downloading the file.
from azure.storage.blob import BlockBlobService, PublicAccess
accountname="xxxx"
accountkey="xxxx"
blob_service_client = BlockBlobService(account_name=accountname,account_key=accountkey)
container_name="test2"
blob_name="a5.txt"
#get the length of the blob file, you can use it if you need a loop in your code to read a blob file.
blob_property = blob_service_client.get_blob_properties(container_name,blob_name)
print("the length of the blob is: " + str(blob_property.properties.content_length) + " bytes")
print("**********")
#get the first 10 bytes data
b1 = blob_service_client.get_blob_to_text(container_name,blob_name,start_range=0,end_range=10)
#you can use the method below to read stream
#blob_service_client.get_blob_to_stream(container_name,blob_name,start_range=0,end_range=10)
print(b1.content)
print("*******")
#get the next range of data
b2=blob_service_client.get_blob_to_text(container_name,blob_name,start_range=10,end_range=50)
print(b2.content)
print("********")
#get the next range of data
b3=blob_service_client.get_blob_to_text(container_name,blob_name,start_range=50,end_range=200)
print(b3.content)
For complete information, you can check the Azure Storage documentation for the Python libraries.
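If you want to walk an arbitrarily large blob instead of hard-coding offsets, the start_range/end_range pairs can come from a small generator. The helper name is my own; check whether your SDK version treats end_range as inclusive before reusing the boundaries:

```python
def byte_ranges(content_length, chunk_size):
    """Yield (start_range, end_range) pairs covering content_length bytes."""
    start = 0
    while start < content_length:
        end = min(start + chunk_size, content_length)
        yield start, end
        start = end


# Each pair would feed get_blob_to_text(container, blob, start_range=s, end_range=e).
print(list(byte_ranges(25, 10)))  # [(0, 10), (10, 20), (20, 25)]
```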

Reading multiple page PDF file using Google Cloud Vision

I am currently trying to read a multi-page PDF file using the Google Cloud Vision API. I am only able to read the first page of the PDF, and I am also getting an error on one line of my code. I have attached pieces of my code below. How can I solve this error, and also read the whole PDF instead of just one page?
Assuming that the Vision API JSON responses in your GCS bucket are correct, you need to take the whole response instead of only the first element of response['responses'], and loop through it to get the annotation for each page. See the code below:
blob_list = list(bucket.list_blobs(prefix=prefix))
print('Output files:')
for obj in blob_list:
    print(obj.name)

for blob in blob_list[1:]:
    json_string = blob.download_as_string()
    response = json.loads(json_string)
    pages_response = response['responses']  # get complete response
    for page in pages_response:  # loop through all pages
        annotation = page['fullTextAnnotation']
        print('Full text:\n')
        print(annotation['text'])
        print('END OF PAGE')
        print('##########################')
I used a Google Vision sample file (gs://cloud-samples-data/vision/document_understanding/custom_0773375000.pdf) and processed 3 pages.
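The per-page loop relies only on the shape of the JSON, so it can be sanity-checked without GCS. A sketch with a fabricated two-page response (the text values are made up; a real response has one entry per page under 'responses'):

```python
import json

# Fabricated Vision output with the same shape as the downloaded JSON.
json_string = json.dumps({
    "responses": [
        {"fullTextAnnotation": {"text": "page one text"}},
        {"fullTextAnnotation": {"text": "page two text"}},
    ]
})

response = json.loads(json_string)
pages = [page["fullTextAnnotation"]["text"] for page in response["responses"]]
print(pages)  # ['page one text', 'page two text']
```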

Format OCR text annotation from Cloud Vision API in Python

I am using the Google Cloud Vision API for Python on a small program I'm using. The function is working and I get the OCR results, but I need to format these before being able to work with them.
This is the function:
# Call to OCR API
def detect_text_uri(uri):
    """Detects text in the file located in Google Cloud Storage or on the Web.
    """
    client = vision.ImageAnnotatorClient()
    image = types.Image()
    image.source.image_uri = uri
    response = client.text_detection(image=image)
    texts = response.text_annotations
    for text in texts:
        textdescription = (" " + text.description)
    return textdescription
I specifically need to slice the text line by line and add four spaces in the beginning and a line break in the end, but at this moment this is only working for the first line, and the rest is returned as a single line blob.
I've been checking the official documentation but didn't really find out about the format of the response of the API.
You are almost there. As you want to slice the text line by line, instead of looping over the text annotations, get the 'description' directly from Google Vision's response, as shown below.
def parse_image(image_path=None):
    """
    Parse the image using the Google Cloud Vision API; detects "document"
    features in an image.
    :param image_path: path of the image
    :return: text content
    :rtype: str
    """
    client = vision.ImageAnnotatorClient()
    # text_detection expects an Image message, not a raw file handle
    with open(image_path, 'rb') as image_file:
        content = image_file.read()
    image = types.Image(content=content)
    response = client.text_detection(image=image)
    text = response.text_annotations
    del response  # to clean up the system memory
    return text[0].description
The above function returns a string with the content in the image, with the lines separated by "\n"
Now, you can add prefix & suffix as you need to each line.
image_content = parse_image(image_path=r"path\to\image")
my_formatted_text = ""
for line in image_content.split("\n"):
    my_formatted_text += "    " + line + "\n"
my_formatted_text is the text you need.
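The prefix/suffix step can be wrapped in a small helper (the function name is my own) so it is easy to test in isolation:

```python
def indent_lines(text, prefix="    "):
    """Prefix every line with four spaces and terminate each with a newline."""
    return "".join(prefix + line + "\n" for line in text.split("\n"))


formatted = indent_lines("first line\nsecond line")
print(formatted)
```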

How to draw image from raw bytes using ReportLab?

All the examples I find on the internet load the image from a URL (either local or on the web). What I want is to draw the image onto the PDF directly from raw bytes.
UPDATE:
@georgexsh Here is my code, based on my understanding of your comment below:
def PDF_view(request):
    response = HttpResponse(content_type='application/pdf')
    ...
    page = canvas.Canvas(response, pagesize=A4)
    page.setTitle("Sample PDF")
    image = StringIO(raw_image_bytes)  # raw_image_bytes is from external source
    image.seek(0)
    page.drawImage(image, 100, 100)
    filename = 'document.pdf'
    page.showPage()
    page.save()
    return response
From the ReportLab Image object source code, a file-like object is acceptable, so you could wrap the image data with StringIO/io.BytesIO and pass it in place of a filename.
updated:
As you're using the drawImage method, it needs an ImageReader object:
from reportlab.lib.utils import ImageReader
import io
image = ImageReader(io.BytesIO(raw_image_bytes))
page.drawImage(image, ...)

Uploading files with google app engine

I've been following the Google App Engine tutorials for uploading an image.
I've set up a simple HTML page for uploading a file using the "file" input type, and the name of the element is "file".
The form enctype is multipart/form-data and the method is 'post'.
Following the example, I would store the image data as a blob with my object:
myfile = MyFile()
myfile.filedata = db.Blob(self.request.get('file'))
myfile.filename = self.request.get('filename')
myfile.put()
But when I look at what's stored in filedata via datastore viewer, it wasn't binary image data, but instead just
236 bytes, SHA-1 = 63540e4ca6dba45e9ff3553787f7f23d330b7791
When the image I uploaded is definitely larger than 236 bytes.
Should the snippet from above retrieve the entire image and put it in a blob?
It seems like all that's being stored in the blob is the request header.
