I am fairly new to the Google Cloud Vision API, so my apologies if there is an obvious answer to this. For some images I am getting different OCR results between the Google Cloud Vision API Drag and Drop demo (https://cloud.google.com/vision/docs/drag-and-drop) and local image detection in Python.
My code is as follows:
import io
# Imports the Google Cloud client library
from google.cloud import vision
from google.cloud.vision import types
# Instantiates a client
client = vision.ImageAnnotatorClient()
# The name of the image file to annotate
file_name = "./test0004a.jpg"
# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()
image = types.Image(content=content)
response = client.text_detection(image=image)
texts = response.text_annotations
print('Texts:')
for text in texts:
    # print('\n"{}"'.format(text.description.encode('utf-8')))
    print('\n"{}"'.format(text.description.encode('ascii','ignore')))
    vertices = (['({},{})'.format(vertex.x, vertex.y)
                 for vertex in text.bounding_poly.vertices])
    print('bounds: {}'.format(','.join(vertices)))
A sample image that highlights this is attached.
The Python code above doesn't return anything, but the browser drag-and-drop demo correctly identifies "2340" as the text.
Shouldn't Python and the browser return the same result? If not, why not? Do I need to include additional parameters in the code?
The issue here is that you are using TEXT_DETECTION instead of DOCUMENT_TEXT_DETECTION, which is the feature being used in the Drag and Drop example page that you shared.
By changing the method (to document_text_detection()), you should obtain the desired results (I have tested it with your code, and it did work):
# Using TEXT_DETECTION
response = client.text_detection(image=image)
# Using DOCUMENT_TEXT_DETECTION
response = client.document_text_detection(image=image)
Although both methods can be used for OCR, as presented in the documentation, DOCUMENT_TEXT_DETECTION is optimized for dense text and documents. The image you shared is low quality and the text is not clear, so for this type of image DOCUMENT_TEXT_DETECTION may perform better than TEXT_DETECTION.
Here are some other examples where DOCUMENT_TEXT_DETECTION worked better than TEXT_DETECTION. Note that this is not always the case, and TEXT_DETECTION may still give better results under certain conditions:
Getting Different Data on using Demo and Actual API
Google Vision API text detection strange behaviour
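To check which feature works better on a given image, one option is to run both on the same input and compare the text they return. The helper below is a minimal sketch and not part of the original answer: `compare_ocr` is a hypothetical name, and it only assumes a client object that exposes `text_detection` and `document_text_detection` as in the code above.

```python
def compare_ocr(client, image):
    """Run TEXT_DETECTION and DOCUMENT_TEXT_DETECTION on one image.

    `client` is assumed to be a vision.ImageAnnotatorClient (or any object
    with the same two methods); `image` is the Image built from the file.
    """
    sparse = client.text_detection(image=image)
    dense = client.document_text_detection(image=image)

    # When anything is found, the first text annotation holds the full text.
    annotations = sparse.text_annotations
    sparse_text = annotations[0].description if len(annotations) > 0 else ""

    # DOCUMENT_TEXT_DETECTION exposes the full text on full_text_annotation.
    dense_text = dense.full_text_annotation.text

    return {
        "TEXT_DETECTION": sparse_text,
        "DOCUMENT_TEXT_DETECTION": dense_text,
    }
```

With the client and image from the question, `compare_ocr(client, image)` would return both strings side by side, which makes it easy to see when only one feature finds the text.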
I am trying to create an image thumbnail creation function using python, running in a google cloud platform's function. The image is sent as a base64 string to the cloud function, manipulated and made smaller with Python's Pillow package. It is then uploaded as an image, going from a Pillow Image object, to a BytesIO object, then saved to google cloud storage. This is all done successfully.
The problem here is very strange: Google Cloud Storage does not recognize the image until an access token is created manually. Otherwise, the image is left in an infinite loop, never loading, and never being able to be used.
I have reviewed this SO post, which has a very similar problem to mine (the image there shows exactly my problem: an uploaded file cannot be loaded properly), but it differs in two important ways: 1) they are manipulating the image array directly, while my code never touches it, and 2) they are working in Node.js, where the Firebase SDK is different from the one in Python.
The code to generate the image is as follows:
import base64
from io import BytesIO

from google.cloud import storage
from PIL import Image

# BUCKET_LINK, PATH_TO_FULL_SIZE_IMAGE, PATH_FOR_THUMBNAIL_IMAGE and
# THUMBNAIL_SIZE are assumed to be defined elsewhere in the project.

def thumbnailCreator(request):
    # Setting up the resources we are going to use
    storage_client = storage.Client()
    stor_bucket = storage_client.bucket(BUCKET_LINK)
    # Retrieving the data
    sent_data = request.get_json()['data']
    name = sent_data['name']
    userID = sent_data['userID']
    # Download the full-size image as bytes and wrap it in a file object
    imageString = stor_bucket.blob(PATH_TO_FULL_SIZE_IMAGE).download_as_string()
    imageFile = BytesIO(imageString)
    image = Image.open(imageFile)
    # Resizing the image is the goal
    image = image.resize(THUMBNAIL_SIZE)
    # Go from a Pillow Image object to a file object
    imageFile = BytesIO()
    image.save(imageFile, format='PNG')
    imageBytes = imageFile.getvalue()
    image64 = base64.b64encode(imageBytes)
    # Rewind so the upload reads from the start of the buffer
    imageFile.seek(0)
    # Uploading the data
    other_blob = stor_bucket.blob(PATH_FOR_THUMBNAIL_IMAGE)
    other_blob.upload_from_file(imageFile, content_type = 'image/png')
    return {'data': {'response': 'ok', 'status': 200}}
Again, this works. I have a feeling there is something wrong with the MIME type. I am a novice when it comes to this type of programming/networking/image manipulation, so I'm always looking for a better way to do this. Anyway, thanks for any and all help.
It appears that the premise of this question (that an access token must be created manually for the image to work) is not accurate. Further testing showed that the error came from other parts of the code base I was working in. The above Python script does work for image manipulation. An access token for the image can be generated via code and provided client-side.
Leaving this up in case someone stumbles upon it in the future when they need to work with Pillow/PIL in the Google Cloud Platform.
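The post does not show how the token was generated in code. One commonly used convention, assumed here rather than confirmed by the source, is to store a UUID under the `firebaseStorageDownloadTokens` metadata key of the uploaded object and build the Firebase download URL from it. The function name `firebase_download_url` and its parameters are hypothetical.

```python
import uuid
from urllib.parse import quote


def firebase_download_url(bucket_name, blob_path, token=None):
    """Return (metadata, url) for a token-protected Firebase Storage object.

    Assumption: Firebase reads the download token from the
    'firebaseStorageDownloadTokens' metadata key of the object.
    """
    token = token or str(uuid.uuid4())
    metadata = {"firebaseStorageDownloadTokens": token}
    # The object path is percent-encoded as a single path segment.
    encoded_path = quote(blob_path, safe="")
    url = (
        "https://firebasestorage.googleapis.com/v0/b/"
        f"{bucket_name}/o/{encoded_path}?alt=media&token={token}"
    )
    return metadata, url
```

In the function above, the returned `metadata` would be assigned to `other_blob.metadata` before calling `upload_from_file`, and the URL handed to the client.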
I am trying to scrape images from websites and use the Google Cloud Vision API to detect whether an image on the website is a logo. It works if I provide it a logo like Apple's, but it doesn't seem to work for well-known non-Fortune 500 tech company logos (e.g., LaunchDarkly, LogDNA, etc.) despite the images clearly being logos. Is it supposed to work for any type of logo or only large brands? Is there a solution out there better suited to my needs?
import io
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with io.open('./img.png', 'rb') as image_file:
    content = image_file.read()
image = vision.types.Image(content=content)
response = client.logo_detection(image=image)
logos = response.logo_annotations
for logo in logos:
    print(logo.description)
    print(logo.score)
if response.error.message:
    raise Exception(
        '{}\nFor more info on error messages, check: '
        'https://cloud.google.com/apis/design/errors'.format(
            response.error.message))
As explained in the documentation, Logo Detection detects popular product logos. It is expected that it doesn't detect logos the model has not been trained on.
The solution you can try within GCP is AutoML Vision. This product lets you retrain GCP's models to classify your images according to your own labels. You can create a dataset with the logos you need to detect and retrain the models with it. Its interface is simple enough to use even without machine learning expertise.
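Before falling back to a custom model, it can help to separate "no popular logo recognized" from "recognized with low confidence". The sketch below filters the `logo_annotations` from the question's response by score; `confident_logos` and the 0.5 threshold are hypothetical choices, not values from the documentation.

```python
def confident_logos(response, min_score=0.5):
    """Return (description, score) pairs at or above min_score, best first.

    `response` is assumed to be the result of client.logo_detection(...).
    An empty list means no popular logo was recognized with enough confidence.
    """
    pairs = [
        (logo.description, logo.score)
        for logo in response.logo_annotations
        if logo.score >= min_score
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

An empty result for an image you know is a logo is the case where a model trained on your own labels would be needed.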
I'm using Azure Microsoft Custom Vision.
I've already created my algorithm, and what I need now is the URL of my predicted images.
I'm aware that I can get the training images with methods in the Training API (get_tagged_images), but now I'm trying to get the URL of a prediction image. The Prediction API has no getters.
If I inspect the predicted image in Azure Custom Vision Portal, I can find the blob URL, but I'm unable to get that URL through a method.
How can I get the predicted image URL?
The images are available through the QueryPredictions API in the Training API.
The REST documentation is here.
The Python documentation is here.
Here's what your code might look like:
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.training.models import PredictionQueryToken
# Set your region
endpoint = 'https://<your region>.api.cognitive.microsoft.com'
# Set your Training API key
training_key = '<your training key>'
# Set your Project ID
project_id = '<your project id>'
# Query the stored prediction images
trainer = CustomVisionTrainingClient(training_key, endpoint=endpoint)
token = PredictionQueryToken()
response = trainer.query_predictions(project_id, token)
# Get the image URLs, for example
urls = [result.original_image_uri for result in response.results]
It seems that the links to the API references in your description are not correct. There are several versions of the Azure Custom Vision APIs, as in the figure below; you can see them at https://<your region, such as southcentralus>.dev.cognitive.microsoft.com/docs/services/?page=2. The APIs for getting training images belong to the training stage.
So if you want to get the URLs of the training images, first find out which version of Custom Vision Training you are using. As far as I know, you can see the version information on the Overview and Quick start tabs of your subscription in the Azure portal. For example, my Custom Vision is 1.0, as in the figures below.
Fig 1. Overview tab
Fig 2. Quick start tab; click the API reference to see the documentation for that version
There I can see three APIs that satisfy your needs, as in the figure below.
Here is my sample code to list all tagged images via GetAllTaggedImages(v1.0).
import json
import requests
projectId = "<your project id from project settings of Cognitive portal>"
endpoint = f"https://southcentralus.api.cognitive.microsoft.com/customvision/v1.0/Training/projects/{projectId}/images/tagged/all"
print(endpoint)
headers = {
    'Training-key': '<key from keys tab of Azure portal or project settings of Cognitive portal>',
}
resp = requests.get(endpoint, headers=headers)
print(resp.text)
images = json.loads(resp.text)
image_urls = (image['ImageUri'] for image in images)
for image_url in image_urls:
    print(image_url)
Hope it helps.
I have the following function that passes an image URL to the Google Vision service and returns the letters and numbers (characters) in the image. It works fine with general web URLs, but when I call it to access files stored in Google Cloud Storage, it doesn't work. How can I get this to work? I've looked at examples from googling but I can't work out how to do this.
If it's not possible to use Google Cloud Storage, is there a way to just upload the image rather than storing it on a file system? I have no need to store the image; all I care about is the returned characters.
def detect_text_uri(uri):
    """Detects text in the file located in Google Cloud Storage or on the Web.
    """
    from google.cloud import vision
    client = vision.ImageAnnotatorClient()
    image = vision.types.Image()
    image.source.image_uri = uri
    image.source.gcs_image_uri = uri
    response = client.text_detection(image=image)
    texts = response.text_annotations
    print('Texts:')
    for text in texts:
        print('\n"{}"'.format(text.description))
        vertices = (['({},{})'.format(vertex.x, vertex.y)
                     for vertex in text.bounding_poly.vertices])
        print('bounds: {}'.format(','.join(vertices)))
    return texts
This line doesn't work. It should read an image I've placed in Google Cloud Storage, but all that's returned is a blank response:
detect_text_uri("'source': {'image_uri': 'gs://ocr_storage/meter_reader.jpg'}")
This line works fine :
detect_text_uri('https://upload.wikimedia.org/wikipedia/commons/thumb/4/4a/Transparent_Electricity_Meter_found_in_Israel.JPG/220px-Transparent_Electricity_Meter_found_in_Israel.JPG')
Your function is just expecting the GCS URI:
detect_text_uri('gs://ocr_storage/meter_reader.jpg')
What that function expects is the URI from Google Cloud Storage. Before running that cell, though, you need to authenticate with the JSON file that contains your credentials:
os.environ['GOOGLE_APPLICATION_CREDENTIALS']=r"credentials.json"
You can use a similar function to read text from a local image.
The code is in this link: https://cloud.google.com/vision/docs/ocr#vision_text_detection_gcs-python
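The call in the question failed because the argument was a fragment of a request body rather than a URI. A small validation step can catch that before the request is sent. The sketch below is an illustration only: `build_image_source` is a hypothetical helper that returns the field to set on the image source, choosing `gcs_image_uri` for `gs://` URIs and `image_uri` for web URLs.

```python
from urllib.parse import urlparse


def build_image_source(uri):
    """Return the image `source` mapping for a plain URI string.

    Raises ValueError for anything that is not a gs:// or http(s):// URI,
    such as a stringified request fragment.
    """
    parsed = urlparse(uri)
    if parsed.scheme == "gs" and parsed.netloc and parsed.path.strip("/"):
        # Google Cloud Storage object: gs://bucket_name/object_name
        return {"gcs_image_uri": uri}
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Publicly accessible web image
        return {"image_uri": uri}
    raise ValueError(f"expected a gs:// or http(s):// URI, got {uri!r}")
```

The returned key names match the two fields set in `detect_text_uri`, so the function could set only the one that applies. If you do not want to store the image at all, the alternative is to pass the raw bytes as `content`, as in the local-file example linked above.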
Is there a way to read text from an image, some kind of text recognition method using Python?
I need to read some images and get the text written on them.
I have been looking at libraries such as pytesser, PIL and Pillow, but does anyone know of something else?
This is for Windows and Python 3.6.1.
Thank you,
Marcus
The Google Vision API might help. It is able to pull out what objects are present in an image as well as other information (brands, colors, face detection etc). It can pull out text pretty reliably too.
https://cloud.google.com/vision/
Here is some example code from their website using the Python Client Library:
import io
import os
# Imports the Google Cloud client library
from google.cloud import vision
# Instantiates a client
vision_client = vision.Client()
# The name of the image file to annotate
file_name = os.path.join(
    os.path.dirname(__file__),
    'resources/wakeupcat.jpg')
# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()
    image = vision_client.image(
        content=content)
# Performs label detection on the image file
labels = image.detect_labels()
print('Labels:')
for label in labels:
    print(label.description)