I am trying to compare faces using AWS Rekognition through Python boto3, as instructed in the AWS documentation.
My API call is:
client = boto3.client('rekognition', aws_access_key_id=key, aws_secret_access_key=secret, region_name=region )
source_bytes = open('source.jpg', 'rb')
target_bytes = open('target.jpg', 'rb')
response = client.compare_faces(
SourceImage = {
'Bytes':bytearray(source_bytes.read())
},
TargetImage = {
'Bytes':bytearray(target_bytes.read())
},
SimilarityThreshold = SIMILARITY_THRESHOLD
)
source_bytes.close()
target_bytes.close()
But every time I run this program, I get the following error:
botocore.errorfactory.InvalidParameterException: An error occurred (InvalidParameterException) when calling the CompareFaces operation: Request has Invalid Parameters
I have specified the secret, key, region, and threshold properly. How can I resolve this error and make the request work?
Your code is perfectly fine; however, image dimensions matter when it comes to AWS Rekognition.
Limits in Amazon Rekognition
The following is a list of limits in Amazon Rekognition:
Maximum image size stored as an Amazon S3 object is limited to 15 MB.
The minimum pixel resolution for height and width is 80 pixels.
Maximum image size as raw bytes passed in as a parameter to an API is 5 MB.
Amazon Rekognition supports the PNG and JPEG image formats. That is, the images you provide as input to various API operations, such as DetectLabels and IndexFaces, must be in one of the supported formats.
The maximum number of faces you can store in a single face collection is 1 million.
The maximum number of matching faces the search API returns is 4096.
source: AWS Docs
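If you want to rule these limits out programmatically, a quick pre-check with Pillow might look like the sketch below (the 80 px / 5 MB values come from the list above; the file names are just the ones from the question):

import os
from PIL import Image

MAX_BYTES = 5 * 1024 * 1024   # 5 MB limit for raw bytes passed to the API
MIN_DIMENSION = 80            # minimum height/width in pixels

def within_rekognition_limits(path):
    # Returns True if the image looks acceptable for Rekognition's Bytes API.
    if os.path.getsize(path) > MAX_BYTES:
        return False
    with Image.open(path) as img:
        if img.format not in ("JPEG", "PNG"):
            return False
        width, height = img.size
        return width >= MIN_DIMENSION and height >= MIN_DIMENSION

print(within_rekognition_limits("source.jpg"))
print(within_rekognition_limits("target.jpg"))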
For those still looking for an answer:
I had the same problem. While @mohanbabu pointed to the official docs for what should go into compare_faces, what I realised is that compare_faces looks for faces in both SourceImage and TargetImage. I confirmed this by first detecting faces with AWS's detect_faces and passing the detected faces to compare_faces.
compare_faces failed almost every time the face detected by detect_faces was even a little obscure.
So, to summarize: if either your SourceImage or TargetImage is tightly cropped to the face AND that face is not instantly obvious, compare_faces will fail.
There can be other reasons, but this observation worked for me.
For example (example images omitted): in the first image you could fairly confidently say there is a face in the middle; in the second, it is not so obvious.
This was the reason for me at least; check both your images and you should know.
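A minimal sketch of that pre-check, assuming the same boto3 client setup and local files as in the question:

def has_detectable_face(client, path):
    # Ask Rekognition to detect faces first; if it finds none,
    # compare_faces will almost certainly fail on this image.
    with open(path, 'rb') as f:
        response = client.detect_faces(Image={'Bytes': f.read()})
    return len(response['FaceDetails']) > 0

if has_detectable_face(client, 'source.jpg') and has_detectable_face(client, 'target.jpg'):
    print("Both images contain a detectable face; compare_faces should work.")
else:
    print("At least one image has no clearly detectable face; compare_faces may fail.")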
The way you are opening the file, you don't need to cast to bytearray.
Try this:
client = boto3.client('rekognition', aws_access_key_id=key, aws_secret_access_key=secret, region_name=region )
source_bytes = open('source.jpg', 'rb')
target_bytes = open('target.jpg', 'rb')
response = client.compare_faces(
SourceImage = {
'Bytes':source_bytes.read()
},
TargetImage = {
'Bytes':target_bytes.read()
},
SimilarityThreshold = SIMILARITY_THRESHOLD
)
source_bytes.close()
target_bytes.close()
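If you prefer, a with block closes the files for you; a minimal variant assuming the same client and SIMILARITY_THRESHOLD:

with open('source.jpg', 'rb') as source_file, open('target.jpg', 'rb') as target_file:
    response = client.compare_faces(
        SourceImage={'Bytes': source_file.read()},
        TargetImage={'Bytes': target_file.read()},
        SimilarityThreshold=SIMILARITY_THRESHOLD
    )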
Related
I need to find the optimal way to upload a large number of images (up to a few thousand) of size ~6MB per image on average. Our service is written in Python.
We have the following flow:
There is a service that has a single BlobServiceClient created. We are using CertificateCredentials to authenticate.
The service runs in a container on Linux and is written in Python.
The service receives a message that has 6 to 9 images as NumPy ndarrays plus a JSON metadata object for each.
Every time we get a message, we send all the image files plus the JSON files to storage using a ThreadPoolExecutor with max_workers = 20.
We are NOT using the async version of the library.
Trimmed and simplified code looks like this (the code below will not run, it is just an illustration; azurestorageclient is our wrapper around the Azure Python SDK. It holds a single BlobServiceClient instance that we use to create containers and upload blobs):
def _upload_file(self,
blob_name: str,
data: bytes,
blob_type: BlobType,
length=None):
blob_client = self._upload_container.get_blob_client(blob_name)
return blob_client.upload_blob(data, length=len(data), blob_type=BlobType.BlockBlob)
def _upload(self, executor: ThreadPoolExecutor, storage_client: AzureStorageClient,
image: ndarray, metadata: str) -> (Future, Future):
DEFAULT_LOGGER.info(f"Uploading image blob: {img_blob_name} ...")
img_upload_future = executor.submit(
self.upload_file,
blob_name=img_blob_name, byte_array=image.tobytes(),
content_type="image/jpeg",
overwrite=True,
)
DEFAULT_LOGGER.info(f"Uploading JSON blob: {metadata_blob_name} ...")
metadata_upload_future = executor.submit(
self.upload_file,
blob_name=metadata_blob_name, byte_array=metadata_json_bytes,
content_type="application/json",
overwrite=True,
)
return img_upload_future, metadata_upload_future
def send(storage_client: AzureStorageClient,
image_data: Dict[metadata, ndarray]):
with ThreadPoolExecutor(max_workers=_THREAD_SEND_MAX_WORKERS) as executor:
upload_futures = {
image_metadata: _upload(
executor=executor,
storage_client=storage_client,
image=image,
metadata=metadata
)
for metadata, image in image_data.items()
}
We observe very poor performance from this service when uploading files over a slow network with large signal-strength fluctuations.
We are now trying to find and measure different options for improving performance:
We will store files on disk first and then upload them in bigger batches from time to time.
We think that uploading a single big file should perform better (e.g. 100 files in a zip/tar archive).
We think that reducing the number of parallel jobs when the connection is bad should also help.
We are considering using AzCopy instead of Python.
Does anyone have other suggestions or nice Python code samples for working in such scenarios? Or maybe we should change the service that is used to upload data? For example, use SSH to connect to a VM and upload files that way (I doubt it would be faster, but I have received such suggestions).
Mike
Given your situation, I suggest you zip a batch of files into one big file and upload that big file in chunks. To upload the file in chunks, you can use the methods BlobClient.stage_block and BlobClient.commit_block_list.
For example
import uuid
from azure.storage.blob import BlobBlock

# blob_client is an azure.storage.blob.BlobClient for the target blob
block_list = []
chunk_size = 1024  # bytes per block; in practice use a much larger value (e.g. several MB)
with open('csvfile.csv', 'rb') as f:
    while True:
        read_data = f.read(chunk_size)
        if not read_data:
            break  # done
        blk_id = str(uuid.uuid4())
        blob_client.stage_block(block_id=blk_id, data=read_data)
        block_list.append(BlobBlock(block_id=blk_id))
blob_client.commit_block_list(block_list)
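A minimal sketch of the zipping step, assuming the payloads are already available as bytes in memory (the file names here are purely illustrative):

import io
import zipfile

def build_zip(payloads):
    # payloads: dict mapping archive file name -> bytes
    # Returns the zip archive as bytes, ready to be uploaded in blocks.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in payloads.items():
            zf.writestr(name, data)
    return buffer.getvalue()

archive_bytes = build_zip({
    'image_0.jpg': b'...jpeg bytes...',
    'image_0.json': b'{"camera": "front"}',
})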
I am making a program in Python that scans receipts and relies on an OCR response from the OCRSpace API. It has worked perfectly in the past over a couple hundred tries, but when an image is uploaded to my Flask server from an iPhone instead of a computer, the image's contents do not produce an OCR result. I have tried using the same image on their website and it gives a normal response, but with my Flask app it returns
parsed_results = result.get("ParsedResults")[0]
TypeError: 'NoneType' object is not subscriptable
I am using the code:
import io
import json
import os

import cv2
import requests

img = cv2.imread(file_path)
height, width, _ = img.shape
roi = img[0: height, 0: width]
# encode as JPEG at quality 90 (1 == cv2.IMWRITE_JPEG_QUALITY)
_, compressedimage = cv2.imencode(".jpg", roi, [int(cv2.IMWRITE_JPEG_QUALITY), 90])
file_bytes = io.BytesIO(compressedimage)
url_api = "https://api.ocr.space/parse/image"
result = requests.post(url_api,
files = {os.path.join(r'PATH', file_name): file_bytes},
data = {"apikey": "KEY",
"language": "eng",
#"OCREngine": 2,
"isTable": True})
result = result.content.decode()
result = json.loads(result)
parsed_results = result.get("ParsedResults")[0]
global OCRText
OCRText = parsed_results.get("ParsedText")
Thanks for any help in advance!
As of iOS 11, iPhones and iPads use HEIF as the standard image format. There are no incompatibilities when transferring to a PC or sending, e.g. by sharing, because the images are converted to the widely supported JPEG format; however, incompatibilities arise when using cloud services, e.g. Google Photos.
High Efficiency Image File Format (HEIF)
As @rob247 posted, iPhones use the HEIF format by default (official link here).
So when you upload photos to the script, please try converting them to JPEG before use, since OpenCV does not yet support *.heif, *.avif, or *.heic (see issue #14534); also check the list of supported formats for OpenCV imread if you prefer other formats.
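A minimal conversion sketch, assuming the third-party pillow-heif package is installed (pip install pillow-heif Pillow); the file names are illustrative:

from PIL import Image
from pillow_heif import register_heif_opener

register_heif_opener()  # lets Pillow open .heic/.heif files

def heic_to_jpeg(heic_path, jpeg_path):
    # Convert the HEIC image to JPEG so OpenCV and the OCR API can read it.
    with Image.open(heic_path) as img:
        img.convert("RGB").save(jpeg_path, format="JPEG", quality=90)

heic_to_jpeg("receipt.heic", "receipt.jpg")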
I am attempting to create a function in Python to which I pass a filename and an image object, which I want to upload to a Google Cloud Storage bucket. I have already created the bucket, and I have all the credentials in an environment variable, but I'm confused about the whole process.
Currently I have the following setup:
class ImageStorage:
bucket_name = os.getenv('STORAGE_BUCKET_NAME')
project_name = os.getenv('STORAGE_BUCKET_PROJECT_ID')
client = storage.Client(project=project_name)
bucket = client.get_bucket(bucket_name)
def save_image(self, filename, image):
blob = self.bucket.blob(filename)
blob.upload_from_file(image)
But once I run this, I get the error:
total bytes could not be determined. Please pass an explicit size, or supply a chunk size for a streaming transfer.
I'm not sure how I can provide a byte size for this image object. Do I first need to create a local file from the image object and then upload that?
As per the GitHub issue, you should provide the chunk_size parameter for a streaming upload.
blob = self.bucket.blob(filename, chunk_size=262144) # 256KB
blob.upload_from_file(image)
chunk_size (int) – The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
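An alternative sketch that sidesteps size detection entirely by reading the bytes first (this assumes image is a readable file-like object, as in the question, and that the content type is JPEG):

def save_image(self, filename, image):
    data = image.read()  # read the whole payload into memory
    blob = self.bucket.blob(filename)
    # upload_from_string accepts bytes, so the client knows the exact size
    blob.upload_from_string(data, content_type='image/jpeg')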
This question might be fairly straightforward if you have some experience with Python Flask, Boto3, and Pillow (a.k.a. PIL).
I'm attempting to receive an incoming image from a client (only allowing .jpg, .jpeg, and .tif) and I'd like to read the dimensions of the image before uploading it to Amazon S3 using Boto3.
The code is fairly straightforward:
file = request.files['file']
# produces an instance of FileStorage
asset = models.Asset(file, AssetType.profile_img, donor.id)
# a model managed by the ORM
img = Image.open(BytesIO(file.stream.read()))
# produces a PIL Image object
size = img.size
# read the size of the Image object
asset.width = size[0]
asset.height = size[1]
# set the size to the ORM
response = s3.Object('my-bucket', asset.s3_key()).put(Body=file)
# upload to S3
Here's the catch: I can either (A) read the image OR (B) upload to S3, but I can't do both. Literally, commenting out one or the other produces the desired operation, but not both in combination.
I've narrowed it down to the upload. It's my belief that somewhere along the line, the file.stream.read() operation is causing an issue with the Boto3 upload, but I can't figure it out. Can you?
Thanks in advance.
You're close - changing the byte source for S3 should do it. Roughly, something like this:
file = request.files['file']
# produces an instance of FileStorage
asset = models.Asset(file, AssetType.profile_img, donor.id)
# a model managed by the ORM
image_bytes = BytesIO(file.stream.read())
# save bytes in a buffer
img = Image.open(image_bytes)
# produces a PIL Image object
size = img.size
# read the size of the Image object
asset.width = size[0]
asset.height = size[1]
# set the size to the ORM
image_bytes.seek(0)
response = s3.Object('my-bucket', asset.s3_key()).put(Body=image_bytes)
# upload to S3
Note the call to seek and the use of BytesIO in the call to S3. I can't overstate how useful BytesIO and StringIO are for doing this sort of thing!
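An equivalent approach is to rewind the original stream instead of copying it into a new buffer (a sketch, assuming the same Flask FileStorage object and that its stream is seekable):

img = Image.open(file.stream)   # PIL reads from the incoming stream
asset.width, asset.height = img.size
file.stream.seek(0)             # rewind so S3 uploads the full file, not an empty one
response = s3.Object('my-bucket', asset.s3_key()).put(Body=file.stream)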
I'm trying to store images in a database. This is my code for getting an image:
image = Image.open(...a resource on web...)
imageData = StringIO.StringIO()
image.save(imageData, image.format)
myImage = imageData.getvalue()
But when I try to store it in the database with this:
myTable.create(...some fields , image=myImage)
I catch an exception with this message:
Bad Request: Invalid STRING constant(ffd8ffe0.. and so on...adss4das) for image of type blob
I previously stored images with this code using Cassandra 1.2.9!
But after I installed Cassandra 2.0, this problem appeared!
I checked my code line by line, and I'm sure the error is in how images are stored in Cassandra 2.0 or in how the image is retrieved.
I think you're having problems with this: https://github.com/datastax/python-driver/pull/39. I'm sure that cqlengine isn't updated yet to take advantage of that fix (I just merged the pull request today), but that at least explains what the problem is.
As a workaround, you might be able to do something like:
from binascii import hexlify
hex_image = '0x' + hexlify(myImage)
myTable.create(..., image=hex_image)
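Once a cqlengine release picks up that fix, passing the image bytes directly to the blob column should work again without the hex workaround (a hypothetical sketch mirroring the call above):

# Hypothetical usage after the driver/cqlengine fix is released:
myTable.create(..., image=myImage)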