I am using the Google Cloud Vision API for Python in a small program I'm writing. The function works and I get the OCR results, but I need to format them before I can work with them.
This is the function:
# Call to OCR API
def detect_text_uri(uri):
    """Detects text in the file located in Google Cloud Storage or on the Web."""
    client = vision.ImageAnnotatorClient()
    image = types.Image()
    image.source.image_uri = uri
    response = client.text_detection(image=image)
    texts = response.text_annotations
    for text in texts:
        textdescription = (" " + text.description)
    return textdescription
I specifically need to slice the text line by line, adding four spaces at the beginning and a line break at the end of each line, but at the moment this only works for the first line; the rest is returned as a single blob.
I've been checking the official documentation but couldn't find anything about the format of the API response.
You are almost there. Since you want to slice the text line by line, instead of looping over the individual text annotations, take the description directly from the first annotation in Google Vision's response, as shown below.
from google.cloud import vision

def parse_image(image_path=None):
    """
    Parse the image using the Google Cloud Vision API, detecting text features in an image.
    :param image_path: path of the image
    :return: text content
    :rtype: str
    """
    client = vision.ImageAnnotatorClient()
    with open(image_path, 'rb') as image_file:
        image = vision.types.Image(content=image_file.read())
    response = client.text_detection(image=image)
    text = response.text_annotations
    del response  # to clean up the system memory
    return text[0].description
The above function returns a string with the content of the image, with the lines separated by "\n".
Now you can add a prefix and suffix to each line as needed.
image_content = parse_image(image_path=r"path\to\image")
my_formatted_text = ""
for line in image_content.split("\n"):
    my_formatted_text += "    " + line + "\n"
my_formatted_text is the text you need.
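As a stylistic aside, the same formatting can be written as a single expression with str.join; it behaves identically to the loop above:

# Indent every line by four spaces and terminate it with a newline
my_formatted_text = "".join("    " + line + "\n" for line in image_content.split("\n"))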
I'm currently using the Azure Blob Storage SDK for Python. For my project I want to read/load the data from a specific blob without having to download it / store it on disk first.
According to the documentation, loading a specific blob works for me with:
blob_client = BlobClient(blob_service_client.url,
                         container_name,
                         blob_name,
                         credential=credential)
data_stream = blob_client.download_blob()
data = data_stream.readall()
The last readall() call returns the bytes of the blob content (in my case an image).
With:
with open(local_path, "wb") as local_file:
    data_stream.readinto(local_file)
it is possible to save the blob content to disk (the classic download operation).
BUT:
Is it also possible to convert the byte data from data = data_stream.readall() directly into an image?
I already tried image_data = Image.frombytes(mode="RGB", data=data, size=(1080, 1920))
but it returns the error "not enough image data".
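Regarding the "not enough image data" error: Image.frombytes expects raw, uncompressed pixel data, while readall() returns the encoded file bytes (JPEG, PNG, etc.), which is why the sizes don't match. A minimal sketch of the usual fix, letting Pillow decode the downloaded bytes directly:

import io
from PIL import Image

data = data_stream.readall()               # encoded image bytes from the blob
image_data = Image.open(io.BytesIO(data))  # Pillow detects and decodes the format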
Here is the sample code for reading the text without downloading the file.
from azure.storage.blob import BlockBlobService, PublicAccess
accountname="xxxx"
accountkey="xxxx"
blob_service_client = BlockBlobService(account_name=accountname,account_key=accountkey)
container_name="test2"
blob_name="a5.txt"
#get the length of the blob file, you can use it if you need a loop in your code to read a blob file.
blob_property = blob_service_client.get_blob_properties(container_name,blob_name)
print("the length of the blob is: " + str(blob_property.properties.content_length) + " bytes")
print("**********")
#get the first 10 bytes data
b1 = blob_service_client.get_blob_to_text(container_name,blob_name,start_range=0,end_range=10)
#you can use the method below to read stream
#blob_service_client.get_blob_to_stream(container_name,blob_name,start_range=0,end_range=10)
print(b1.content)
print("*******")
#get the next range of data
b2=blob_service_client.get_blob_to_text(container_name,blob_name,start_range=10,end_range=50)
print(b2.content)
print("********")
#get the next range of data
b3=blob_service_client.get_blob_to_text(container_name,blob_name,start_range=50,end_range=200)
print(b3.content)
For complete information, you can check the Azure Storage documentation for the Python libraries.
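Note that the answer above uses the legacy azure-storage SDK (BlockBlobService). If you are on the v12 azure-storage-blob SDK that your BlobClient snippet comes from, the same ranged read can be done with download_blob's offset and length parameters. A short sketch (account URL and credential are placeholders):

from azure.storage.blob import BlobClient

blob_client = BlobClient(account_url="https://ACCOUNT.blob.core.windows.net",
                         container_name="test2",
                         blob_name="a5.txt",
                         credential="xxxx")

# Read only the first 10 bytes without downloading the whole blob
b1 = blob_client.download_blob(offset=0, length=10).readall()
print(b1)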
I am using Google Vision in a python project.
Up until now it would read images from top to bottom. There is text at various locations in the image, but it would read the first line from the top.
Recently, without me changing the code, it started reading in chunks from left to right, completely altering the resulting text response.
Is there a way to force it to read only "vertically" ?
code snippet:
def detect_text(file):
    """Detects text in the file."""
    x_client = vision.ImageAnnotatorClient()
    with io.open(file, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)
    response = x_client.text_detection(image=image)
    texts = response.text_annotations
    del response
    str_list = []
    for text in texts:
        str_list.append(text.description)
    return str_list
str_list = detect_text(image_url)
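As far as I know the API does not expose a reading-order switch; a common workaround is to sort the individual word annotations by their bounding-box positions yourself. A rough, hypothetical sketch, assuming the response shape above (texts[0] is the full text blob and the rest are individual words with bounding_poly.vertices):

def sort_top_to_bottom(texts, line_tolerance=10):
    # Hypothetical helper: orders word annotations top-to-bottom, then left-to-right.
    # line_tolerance is the max vertical distance (in pixels) for two words
    # to be grouped onto the same line.
    words = texts[1:]  # skip texts[0], the whole-text annotation
    words = sorted(words, key=lambda w: (w.bounding_poly.vertices[0].y // line_tolerance,
                                         w.bounding_poly.vertices[0].x))
    return " ".join(w.description for w in words)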
The Azure API works for .jpg images, but when I tried a gif image it showed Operation returned an invalid status code 'Bad Request'.
print("===== Read File - remote =====")
# Get an image with text
read_image_url = "https://ci6.googleusercontent.com/proxy/5NB2CkeM22wqFhiQSmRlJVVinEp3o2nEbZQcy6_8CCKlKst_WW25N0PcsPaYiWAASXO52hufvUAEimUd3IreGowknEXy322x5oYG3lzkBGyctLI0M3eH_w-qHH9qPqtobjpGYooM7AvyNX2CCZtcnEgu8duKlee2GGaswg=s0-d-e1-ft#https://image.e.us.partycity.com/lib/fe301570756406747c1c72/m/10/93d08fa0-c760-4d8b-8e35-ddd5308ec311.gif"
# Call API with URL and raw response (allows you to get the operation location)
read_response = computervision_client.read(read_image_url, raw=True)
You CANNOT send a gif directly to the Azure Read API, because the documentation states:
Request body
Input passed within the POST body. Supported input methods: raw image binary or image URL.
Input requirements:
Supported image formats: JPEG, PNG, BMP, PDF and TIFF.
Please do note MPO (Multi Picture Objects) embedded JPEG files are not supported.
For multi-page PDF and TIFF documents:
For the free tier, only the first 2 pages are processed.
For the paid tier, up to 2,000 pages are processed.
Image file size must be less than 50 MB (4 MB for the free tier).
The image/document page dimensions must be at least 50 x 50 pixels and at most 10000 x 10000 pixels.
To handle the gif you need to convert it into a png and then send the raw binary image for recognition, as shown below:
import glob
import time
import requests
from PIL import Image
endpoint = 'https://NAME.cognitiveservices.azure.com/'
subscription_key = 'SUBSCRIPTION_KEY'
read_url = endpoint + "vision/v3.2/read/analyze"
uri = 'https://ci6.googleusercontent.com/proxy/5NB2CkeM22wqFhiQSmRlJVVinEp3o2nEbZQcy6_8CCKlKst_WW25N0PcsPaYiWAASXO52hufvUAEimUd3IreGowknEXy322x5oYG3lzkBGyctLI0M3eH_w-qHH9qPqtobjpGYooM7AvyNX2CCZtcnEgu8duKlee2GGaswg=s0-d-e1-ft#https://image.e.us.partycity.com/lib/fe301570756406747c1c72/m/10/93d08fa0-c760-4d8b-8e35-ddd5308ec311.gif'
with open('/tmp/pr0n.gif', 'wb') as f:
    f.write(requests.get(uri).content)

gif = '/tmp/pr0n.gif'
img = Image.open(gif)
img.save(gif + ".png", 'png', optimize=True, quality=70)

for filename in sorted(glob.glob("/tmp/pr0n.gif*.png")):
    # Read the image into a byte array
    image_data = open(filename, "rb").read()
    headers = {'Ocp-Apim-Subscription-Key': subscription_key, 'Content-Type': 'application/octet-stream'}
    params = {'visualFeatures': 'Categories,Description,Color'}
    response = requests.post(read_url, headers=headers, params=params, data=image_data)
    response.raise_for_status()

    # The recognized text isn't immediately available, so poll to wait for completion.
    analysis = {}
    poll = True
    while poll:
        response_final = requests.get(response.headers["Operation-Location"], headers=headers)
        analysis = response_final.json()
        time.sleep(1)
        if "analyzeResult" in analysis:
            poll = False
        if "status" in analysis and analysis['status'] == 'failed':
            poll = False

    polygons = []
    if "analyzeResult" in analysis:
        # Extract the recognized text, with bounding boxes.
        print(analysis["analyzeResult"]["readResults"][0])
There is a brand new online portal provided by Microsoft to test this service (among others) and to check the input requirements for the Read API.
Link: https://preview.vision.azure.com/demo/OCR
I am currently trying to read a multi-page PDF file using the Google Cloud Vision API. At the moment I can only read the first page, and one line of my code also throws an error. I have attached pieces of my code below. How can I solve this error, and also read the whole PDF instead of just one page?
Assuming that the Vision API JSON responses in your GCS bucket are correct, you need to take the whole response instead of only the first element of response['responses'], and loop through it to get the annotation for each page. See the code below:
blob_list = list(bucket.list_blobs(prefix=prefix))
print('Output files:')
for obj in blob_list:
    print(obj.name)

for blob in blob_list[1:]:
    json_string = blob.download_as_string()
    response = json.loads(json_string)
    pages_response = response['responses']  # get complete response
    for page in pages_response:  # loop through all pages
        annotation = page['fullTextAnnotation']
        print('Full text:\n')
        print(annotation['text'])
        print('END OF PAGE')
        print('##########################')
I used a Google Vision sample file (gs://cloud-samples-data/vision/document_understanding/custom_0773375000.pdf) and it processed all 3 pages.
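In case the earlier step is the problem: the JSON files read above are produced by an asynchronous batch request. A minimal sketch of that step, adapted from the official sample for the older google-cloud-vision client (gcs_source_uri and gcs_destination_uri are placeholders you must set):

from google.cloud import vision

client = vision.ImageAnnotatorClient()
feature = vision.types.Feature(type=vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION)
gcs_source = vision.types.GcsSource(uri=gcs_source_uri)
input_config = vision.types.InputConfig(gcs_source=gcs_source, mime_type='application/pdf')
gcs_destination = vision.types.GcsDestination(uri=gcs_destination_uri)
# batch_size controls how many pages go into each output JSON file
output_config = vision.types.OutputConfig(gcs_destination=gcs_destination, batch_size=2)

async_request = vision.types.AsyncAnnotateFileRequest(
    input_config=input_config, features=[feature], output_config=output_config)
operation = client.async_batch_annotate_files(requests=[async_request])
operation.result(timeout=420)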
I have been using the Google Text-to-Speech API because of how great the voices are. The only problem is that I've been trying to figure out how to make it user friendly. The biggest limitation is that Google Text-to-Speech only accepts text with 5000 or fewer characters. The main issue I've found is that currently all I can do is use a single text file, copying and pasting my content into it before saving. Does anyone know how I can process a whole folder of text files to make this quicker, and also save the mp3s without overwriting them?
# [START tts_ssml_address_imports]
from google.cloud import texttospeech
import os
import html

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "path/to/credentials.json"  # placeholder: path to your service account key
# [END tts_ssml_address_imports]


# [START tts_ssml_address_audio]
def ssml_to_audio(ssml_text, outfile):
    """Generates audio from SSML text.

    Given a string of SSML text and an output file name, this function
    calls the Text-to-Speech API. The API returns a synthetic audio
    version of the text, formatted according to the SSML commands. This
    function saves the synthetic audio to the designated output file.

    Args:
        ssml_text: string of SSML text
        outfile: string name of file under which to save audio output

    Returns:
        nothing
    """
    # Instantiates a client
    client = texttospeech.TextToSpeechClient()

    # Sets the text input to be synthesized
    synthesis_input = texttospeech.types.SynthesisInput(text=ssml_text)

    # Builds the voice request, selects the language code ("en-US") and
    # the SSML voice gender ("MALE")
    voice = texttospeech.types.VoiceSelectionParams(
        language_code='en-US',
        name="en-US-Wavenet-D",
        ssml_gender=texttospeech.enums.SsmlVoiceGender.MALE)

    # Selects the type of audio file to return
    audio_config = texttospeech.types.AudioConfig(
        audio_encoding="LINEAR16", pitch=0, speaking_rate=0.9)

    # Performs the text-to-speech request on the text input with the selected
    # voice parameters and audio file type
    response = client.synthesize_speech(synthesis_input, voice, audio_config)

    # Writes the synthetic audio to the output file.
    with open(outfile, 'wb') as out:
        out.write(response.audio_content)
        print('Audio content written to file ' + outfile)
# [END tts_ssml_address_audio]


def main():
    # test example address file
    file = 'input_text.txt'
    with open(file, 'r') as f:
        text = f.read()
    ssml_text = text
    ssml_to_audio(ssml_text, 'file_output_speech.mp3')
# [END tts_ssml_address_test]


if __name__ == '__main__':
    main()
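To answer the folder question: nothing in the API requires a single input file. A minimal sketch (the "texts" and "out" folder names are placeholders) that walks a directory of .txt files and writes one uniquely named mp3 per input, so earlier outputs are never overwritten:

import glob

def folder_to_audio(input_dir, output_dir):
    # Convert every .txt file in input_dir to speech, one mp3 per file.
    os.makedirs(output_dir, exist_ok=True)
    for path in glob.glob(os.path.join(input_dir, "*.txt")):
        with open(path, 'r') as f:
            text = f.read()  # note: must stay under the 5000-character limit, or be split first
        # Name each output after its input file so nothing gets overwritten
        base = os.path.splitext(os.path.basename(path))[0]
        ssml_to_audio(text, os.path.join(output_dir, base + ".mp3"))

folder_to_audio("texts", "out")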