Is there a way to read text from an image, some kind of text-recognition method, using Python?
I need to read some images and get the text written on them.
I have been looking at libraries such as pytesser, PIL and Pillow, but does anyone know of something else?
This is for Windows and Python 3.6.1.
Thank you,
Marcus
The Google Vision API might help. It is able to pull out what objects are present in an image, as well as other information (brands, colors, face detection, etc.). It can pull out text pretty reliably too.
https://cloud.google.com/vision/
Here is some example code from their website using the Python Client Library:
import io
import os

# Imports the Google Cloud client library
from google.cloud import vision

# Instantiates a client
vision_client = vision.Client()

# The name of the image file to annotate
file_name = os.path.join(
    os.path.dirname(__file__),
    'resources/wakeupcat.jpg')

# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

image = vision_client.image(content=content)

# Performs label detection on the image file
labels = image.detect_labels()

print('Labels:')
for label in labels:
    print(label.description)
I already have the body of my code, in which I can create an album and then take a picture directly from Pythonista. After that, I want to take this recently taken picture and move it into the album that I just created. This is what I have:
import photos
import console

console.clear()
nom_album = 'contacts'
photos.create_album(nom_album)
img = photos.capture_image()
The photos.create_album method returns an Asset-Collection object. Asset-Collections have a method, add_assets, that takes a list of “assets”, that is, photos. Specifically, as I read it, an Asset is a photo that is already in the iOS device’s photo library. To add a photo to an album, the photo must already be in the device’s photo library.
The capture_image method does not return Asset objects, however. That is, it does not automatically add the new photo to the device’s photo library. You can verify this in your own code: the images you take using that code should not be in your device’s “recents” album.
Instead, capture_image returns a PIL image. I do not see any way to add a PIL image to the device’s photo library directly. What I was able to do was save the PIL image locally and then convert the saved file into an Asset: (1) add the saved file to the device’s photo library, using create_image_asset; (2) that Asset can then be added to an Asset-Collection.
Here’s an example:
import photos
#a test album in the device’s library
#note that multiple runs of this script will create multiple albums with this name
testAlbum = 'Pythonista Test'
#the filename to save the test photograph to in Pythonista
testFile = testAlbum + '.jpg'
#create an album in the device’s photo library
newAlbum = photos.create_album(testAlbum)
#take a photo using the device’s camera
newImage = photos.capture_image()
#save that photo to a file in Pythonista
newImage.save(testFile)
#add that newly-created file to the device’s photo library
newAsset = photos.create_image_asset(testFile)
#add that newly-created library item to the previously-created album
newAlbum.add_assets([newAsset])
If you don’t want to keep the file around in your Pythonista installation, you can use os.remove to remove it.
import os
os.remove(testFile)
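Note that os.remove raises FileNotFoundError if the file does not exist, so it can be worth guarding the call; a minimal sketch (reusing the filename the example above saves to):

```python
import os

test_file = 'Pythonista Test.jpg'  # the file the example above saves

# os.remove raises FileNotFoundError when the path is missing, so check first
if os.path.exists(test_file):
    os.remove(test_file)

print(os.path.exists(test_file))  # False: the file is gone either way
```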
While saving a PIL image to a local file first and then adding the file to the device’s library seems a convoluted way to get a PIL image into the library, it appears to be the expected approach. In Photo Library Access on iOS, the documentation says:
To add a new image asset to the photo library, use the create_image_asset() function, providing the path to an image file that you’ve saved to disk previously.
I am trying to create an image thumbnail creation function using Python, running in a Google Cloud Platform function. The image is sent as a base64 string to the cloud function, manipulated and made smaller with Python's Pillow package. It is then uploaded as an image, going from a Pillow Image object to a BytesIO object, and then saved to Google Cloud Storage. This is all done successfully.
The problem here is very strange: Google Cloud Storage does not recognize the image until an access token is created manually. Otherwise, the image is left in an infinite loop, never loading, and never being able to be used.
I have reviewed this SO post, which has a very similar problem to mine (the image there shows exactly my problem: an uploaded file cannot be loaded properly), but it differs in two important ways: 1) they manipulate the image array directly, while my code never touches it, and 2) they are working in Node.js, where the Firebase SDK is different from the Python one.
The code to generate the image is as follows:
import base64
from io import BytesIO

from google.cloud import storage
from PIL import Image

def thumbnailCreator(request):
    # Setting up the resources we are going to use
    storage_client = storage.Client()
    stor_bucket = storage_client.bucket(BUCKET_LINK)
    # Retrieving the data
    sent_data = request.get_json()['data']
    name = sent_data['name']
    userID = sent_data['userID']
    # Go from a base64 string, to bytes, to a file object
    imageString = stor_bucket.blob(PATH_TO_FULL_SIZE_IMAGE).download_as_string()
    imageFile = BytesIO(imageString)
    image = Image.open(imageFile)
    # Resizing the image is the goal
    image = image.resize(THUMBNAIL_SIZE)
    # Go from a Pillow Image object back to a file object
    imageFile = BytesIO()
    image.save(imageFile, format='PNG')
    imageBytes = imageFile.getvalue()
    image64 = base64.b64encode(imageBytes)
    imageFile.seek(0)
    # Uploading the data
    other_blob = stor_bucket.blob(PATH_FOR_THUMBNAIL_IMAGE)
    other_blob.upload_from_file(imageFile, content_type='image/png')
    return {'data': {'response': 'ok', 'status': 200}}
Again, this works. I have a feeling there is something wrong with the MIME type. I am a novice when it comes to this type of programming/networking/image manipulation, so I'm always looking for a better way to do this. Anyway, thanks for any and all help.
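The base64-to-bytes-to-file-object steps the question describes can be sketched with just the standard library; the payload below is a stand-in (a PNG signature plus dummy bytes), not a real image:

```python
import base64
from io import BytesIO

# Stand-in for real image bytes: the 8-byte PNG signature plus dummy data
original_bytes = b"\x89PNG\r\n\x1a\n" + b"dummy-image-data"

# What a client would send: the image bytes encoded as a base64 string
image64 = base64.b64encode(original_bytes)

# Inside the cloud function: base64 string -> raw bytes -> file-like object
decoded = base64.b64decode(image64)
image_file = BytesIO(decoded)  # suitable for passing to Image.open()

print(decoded == original_bytes)  # True: the round trip is lossless
print(image_file.read(8))         # b'\x89PNG\r\n\x1a\n'
```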
It appears that the premise of this question - that an access token must be created manually for the image to work - is not accurate. After further testing, the error came from other parts of the code base I was working in. The above Python script does work for image manipulation. An access token to the image can be generated via code and provided client-side.
Leaving this up in case someone stumbles upon it in the future when they need to work with Pillow/PIL in the Google Cloud Platform.
I am fairly new to the Google Cloud Vision API, so my apologies if there is an obvious answer to this. I am noticing that for some images I am getting different OCR results between the Google Cloud Vision API drag-and-drop page (https://cloud.google.com/vision/docs/drag-and-drop) and local image detection in Python.
My code is as follows:
import io

# Imports the Google Cloud client library
from google.cloud import vision
from google.cloud.vision import types

# Instantiates a client
client = vision.ImageAnnotatorClient()

# The name of the image file to annotate
file_name = "./test0004a.jpg"

# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

image = types.Image(content=content)

response = client.text_detection(image=image)
texts = response.text_annotations

print('Texts:')
for text in texts:
    # print('\n"{}"'.format(text.description.encode('utf-8')))
    print('\n"{}"'.format(text.description.encode('ascii', 'ignore')))
    vertices = (['({},{})'.format(vertex.x, vertex.y)
                 for vertex in text.bounding_poly.vertices])
    print('bounds: {}'.format(','.join(vertices)))
A sample image that highlights this is attached: Sample Image
The Python code above doesn't return anything, but in the browser, using drag and drop, it correctly identifies "2340" as the text.
Shouldn't both Python and the browser return the same result? And if not, why not? Do I need to include additional parameters in the code?
The issue here is that you are using TEXT_DETECTION instead of DOCUMENT_TEXT_DETECTION, which is the feature being used in the Drag and Drop example page that you shared.
By changing the method (to document_text_detection()), you should obtain the desired results (I have tested it with your code, and it did work):
# Using TEXT_DETECTION
response = client.text_detection(image=image)
# Using DOCUMENT_TEXT_DETECTION
response = client.document_text_detection(image=image)
Although both methods can be used for OCR, as presented in the documentation, DOCUMENT_TEXT_DETECTION is optimized for dense text and documents. The image you shared is not really a high-quality one, and the text is not clear, so it may be that for this type of image, DOCUMENT_TEXT_DETECTION offers better performance than TEXT_DETECTION.
See some other examples where DOCUMENT_TEXT_DETECTION worked better than TEXT_DETECTION. In any case, please note that this might not always be the case, and TEXT_DETECTION may still give better results under certain conditions:
Getting Different Data on using Demo and Actual API
Google Vision API text detection strange behaviour
I am developing a chat-bot using wit.ai and my own UI instead of Facebook Messenger. I am using Python to implement actions. This post and this post give some insights into how this can be done in Facebook Messenger. However, I want to have image upload and display functionality in my own UI, which uses wit.ai. How can this be done?
My current code can extract an intent named upload and call an uploadImage() function. What should be in the uploadImage() function so that it can upload an image and even display it in the chat UI?
The following works for a general Python program. I am not sure it is the proper way to do it with wit.ai.
If you want to do image processing on the image, I recommend the OpenCV library. Using that and the easygui library, you can prompt the user for an image, read it, and display it. The following code shows how to do it. The dialog box defaults to the folder "c:\" and has filters for png and jpg files. You will need to figure out how to display the image in your UI.
import cv2
import easygui

# Prompt the user to open a file.
file_path = easygui.fileopenbox(msg='Locate an image file',
                                filetypes=["*.png", "*.jpg"],
                                title='Specify the image file to upload',
                                default='c:\*.png')

# Load the image and display it in a window.
img = cv2.imread(file_path)
cv2.imshow('image', img)
cv2.waitKey(0)  # wait for a key press before closing the window
cv2.destroyAllWindows()
I am trying to write a Python program to download images from the glance service. However, I could not find a way to download images from the cloud using the API. The documentation, which can be found here:
http://docs.openstack.org/user-guide/content/sdk_manage_images.html
explains how to upload images, but not how to download them.
The following code shows how to get an image object, but I don't know what to do with it:
import novaclient.v1_1.client as nvclient
name = "cirros"
nova = nvclient.Client(...)
image = nova.images.find(name=name)
Is there any way to download the image file and save it to disk using this "image" object?
Without installing the glance CLI, you can download an image via an HTTP call, as described here:
http://docs.openstack.org/developer/glance/glanceapi.html#retrieve-raw-image-data
For the Python client, you can use
img = client.images.get(IMAGE_ID)
and then call
client.images.data(img) # or img.data()
to retrieve a generator by which you can access the raw data of the image.
Full example (saving an image from glance to disk):
img = client.images.find(name='cirros-0.3.2-x86_64-uec')
file_name = "%s.img" % img.name
image_file = open(file_name, 'wb+')  # binary mode, since the chunks are raw bytes
for chunk in img.data():
    image_file.write(chunk)
image_file.close()
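The pattern of writing a generator of byte chunks to disk can be illustrated with the standard library alone; the chunks below are made up, standing in for what img.data() would yield:

```python
import io

def save_chunks(chunks, fileobj):
    """Write an iterable of byte chunks to an open binary file object."""
    for chunk in chunks:
        fileobj.write(chunk)

# Made-up chunks standing in for the generator returned by img.data()
fake_chunks = [b"\x89PNG", b"\r\n\x1a\n", b"rest-of-image"]

buffer = io.BytesIO()  # in place of open(file_name, 'wb')
save_chunks(fake_chunks, buffer)

print(buffer.getvalue() == b"".join(fake_chunks))  # True
```

Note that the destination must be opened in binary mode ('wb'), because the chunks are raw bytes, not text.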
You can do this using the glance CLI with the image-download command:
glance image-download [--file <FILE>] [--progress] <IMAGE>
You will have to install the glance CLI for this.
Also, depending on the cloud provider/service that you are using, this operation may be disabled for regular users. You might have to check with your provider.