I have a bunch of videos in my S3 bucket and I want to convert their format using Python, but I'm currently stuck on one issue. My Python script for fetching all objects in the bucket is below:
s3 = boto3.client('s3',
                  region_name=S3_REGION,
                  aws_access_key_id=S3_ACCESS_KEY_ID,
                  aws_secret_access_key=S3_ACCESS_SECRET_KEY)

result = s3.list_objects(Bucket=bucket_name, Prefix='videos/')
for o in result.get('Contents'):
    data = s3.get_object(Bucket=bucket_name, Key=o.get('Key'))
For the format conversion I have used the MoviePy library, which converts the video to mp4:
import moviepy.editor as moviepy
clip = moviepy.VideoFileClip("video-529.webm")
clip.write_videofile("converted-recording.mp4")
But the problem with this library is that it needs an actual file; you cannot pass an S3 object as a file. I don't know how to overcome this issue. If anyone has a better idea, please help. How can I resolve this?
You are correct: such libraries require the video file to be on the local disk, so you should use download_file() instead of get_object().
Alternatively, you could use Amazon Elastic Transcoder to transcode the file 'as a service' rather than doing it in your own code. (Charges apply, based on video length.)
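For the local-disk route, here is a minimal sketch of the download-convert-upload loop, assuming the s3 client and bucket_name from the question are already defined; the /tmp paths and the 'videos-mp4/' output prefix are placeholders:

import os
import moviepy.editor as moviepy

result = s3.list_objects(Bucket=bucket_name, Prefix='videos/')
for o in result.get('Contents', []):
    key = o['Key']
    if not key.endswith('.webm'):
        continue
    local_in = '/tmp/' + os.path.basename(key)        # download target on local disk
    local_out = local_in.rsplit('.', 1)[0] + '.mp4'
    s3.download_file(bucket_name, key, local_in)       # instead of get_object()
    clip = moviepy.VideoFileClip(local_in)
    clip.write_videofile(local_out)
    # put the converted file back under a separate prefix (placeholder)
    s3.upload_file(local_out, bucket_name, 'videos-mp4/' + os.path.basename(local_out))

Note that write_videofile relies on an ffmpeg binary being available to MoviePy in the environment where this runs.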
I've been trying to send an image to Firebase Storage, but when it gets to Storage, Firebase can't render the image.
For now, the image is pure base64.
versions:
Python 3.10.6
firebase==3.0.1
firebase-admin==6.0.1
Flask==2.0.3
The base64 being used is on dontpad.com.
Code:
def filePath(folderPath):
    return f'{folderPath}/{date.today()}'

def fileUpload(file, folderPath):
    fileName = filePath(folderPath)
    from firebase_admin import storage
    bucket = storage.bucket()
    blob = bucket.blob(fileName)
    blob.upload_from_string(file, 'image/jpg')
    blob.make_public()
    return blob.public_url
Additional info will be provided if asked.
Expected:
Result:
What did I try?
Alternative data formats to replace base64 have been considered for the project, but base64 is the only data I'm given for the image, so alternative approaches have been discarded.
Most similar questions use JavaScript, which is not my case, and they use different libraries with different methods and parameters, so that hasn't helped.
Tried adding "data:image/jpeg;base64," to the start of the filename.
Tried replacing content type with "data_url" or "base64".
Tried uploading with and without the extension on the filename.
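One likely cause (an assumption, since the upload itself succeeds) is that the base64 string is stored as-is, so Storage holds encoded text rather than JPEG bytes. A minimal sketch that decodes before uploading, reusing the function names from the question; the data-URL prefix handling is only needed if the string carries one:

import base64
from datetime import date
from firebase_admin import storage

def fileUpload(file, folderPath):
    fileName = f'{folderPath}/{date.today()}'
    # strip a "data:image/jpeg;base64," prefix if present (assumption: plain base64 otherwise)
    if file.startswith('data:') and ',' in file:
        file = file.split(',', 1)[1]
    imageBytes = base64.b64decode(file)               # decode to raw JPEG bytes
    bucket = storage.bucket()
    blob = bucket.blob(fileName)
    blob.upload_from_string(imageBytes, content_type='image/jpeg')
    blob.make_public()
    return blob.public_url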
I started trying Amazon Rekognition to compare faces, called from a Lambda execution. My flow starts with a user uploading an image; S3 then sends an event that triggers a Lambda, which fetches the two closest images in the bucket and compares the faces. But I can't get Lambda to read the images from the S3 URI in order to compare faces, so I had to create a test that reads two images from S3. Do you have a way to get the S3 address into Lambda to compare faces?
This is my test.
{
"sourceImage": "source.jpg",
"targetImage": "target.jpg"
}
This is the main program
import json
import boto3

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    print(event)
    dump = json.loads(json.dumps(event))
    sourceImage = dump['sourceImage']
    targetImage = dump['targetImage']
    bucket = 'your_name'
    client = boto3.client('rekognition')
    faceComparison = client.compare_faces(
        SourceImage={'S3Object': {'Bucket': bucket, 'Name': str(sourceImage)}},
        TargetImage={'S3Object': {'Bucket': bucket, 'Name': str(targetImage)}}
    )
    res = {
        "faceRecognition": faceComparison
    }
    return res
You cannot use a URI to access objects on S3 unless the object/bucket is public. There are two alternatives you can use:
You can use the download_fileobj method from boto3 to stream the object into an in-memory file-like object and pass that to your function.
You can use the download_file method to download the file to the /tmp directory in Lambda and then give the path of that file to your function; a sketch of this option follows below.
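A minimal sketch of the second option, reusing the handler from the question; the bucket name stays a placeholder, and since compare_faces does not accept a local path, the downloaded files are read and passed via Rekognition's Bytes parameter:

import boto3

s3 = boto3.client('s3')
rekognition = boto3.client('rekognition')

def lambda_handler(event, context):
    bucket = 'your_name'                              # placeholder bucket name
    source_path = '/tmp/' + event['sourceImage']
    target_path = '/tmp/' + event['targetImage']
    # /tmp is the only writable location inside Lambda
    s3.download_file(bucket, event['sourceImage'], source_path)
    s3.download_file(bucket, event['targetImage'], target_path)
    with open(source_path, 'rb') as src, open(target_path, 'rb') as tgt:
        faceComparison = rekognition.compare_faces(
            SourceImage={'Bytes': src.read()},        # raw bytes instead of an S3 URI
            TargetImage={'Bytes': tgt.read()}
        )
    return {"faceRecognition": faceComparison}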
If you want a Lambda function that reads objects from Amazon S3, use the Amazon S3 API to read the object bytes and DO NOT use an object URI.
This AWS tutorial performs a very similar use case:
The Lambda function reads all images in a given Amazon S3 bucket. For each object in the bucket, it passes the image to the Amazon Rekognition service to detect PPE information. The results are stored as records in an Amazon DynamoDB table and then emailed to a recipient.
So instead of comparing faces, it detects PPE gear. It's implemented in Java, but you can port it to your programming language. It will point you in the right direction:
Creating an AWS Lambda function that detects images with Personal Protective Equipment
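For reference, the core Rekognition call from that tutorial ported to Python might look like the sketch below; the bucket and key names are placeholders, and the rest of the tutorial (DynamoDB, email) is omitted:

import boto3

rekognition = boto3.client('rekognition')

# detect PPE in a single image stored in S3 (bucket/key are placeholders)
response = rekognition.detect_protective_equipment(
    Image={'S3Object': {'Bucket': 'your_name', 'Name': 'worker.jpg'}},
    SummarizationAttributes={
        'MinConfidence': 80,
        'RequiredEquipmentTypes': ['FACE_COVER', 'HEAD_COVER', 'HAND_COVER']
    }
)
for person in response['Persons']:
    print(person['BodyParts'])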
I'm trying to use the Google Cloud Speech-to-Text API.
I converted an mp3 audio file to .raw format, as I understood from the API documentation, and uploaded it to bucket storage.
Here is my code:
def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    client = speech.SpeechClient()
    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code='en-US')

    operation = client.long_running_recognize(config, audio)
    print('Waiting for operation to complete...')
    response = operation.result()

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(u'Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))

transcribe_gcs("gs://cloudh3-200314.appspot.com/cs.raw")
What am I doing wrong?
I faced a similar issue; it is something to do with the acceptable format. Even though you may have converted to RAW, there could still be something wrong with the format, and the API won't give you output if it can't read the file.
I recently processed a 56-minute audio file that took 17 minutes, so that should give you an idea of how long it takes.
Process your file using sox. I found that the conversion parameters that work are in this command:
sox basefile.mp3 -r 16000 -c 1 newfile.flac
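If the conversion should happen from Python rather than the shell, the same sox invocation can be wrapped in subprocess; the file names below are placeholders, and uploading the resulting FLAC to the bucket is left out:

import subprocess

# convert the MP3 to 16 kHz mono FLAC with the parameters above
subprocess.run(
    ["sox", "basefile.mp3", "-r", "16000", "-c", "1", "newfile.flac"],
    check=True,
)

# after uploading newfile.flac to the bucket, the existing config
# (FLAC encoding, 16000 Hz) matches the file:
transcribe_gcs("gs://cloudh3-200314.appspot.com/newfile.flac")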
I'm trying to save a user uploaded file directly to S3 without saving it locally. This project is using Django 1.9 and Boto3.
The relevant code is:
p = request.FILES['img'].read()
s3 = boto3.resource('s3',
                    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
b = s3.Bucket(settings.AWS_STORAGE_BUCKET_NAME)
b.put_object(Key="media/test.jpg", Body=p)
This correctly uploads a file called 'test.jpg' to the media folder.
However, if I download 'test.jpg' from Amazon and try to open it in an image viewer, I get the message: "Error interpreting JPEG image file (Not a JPEG file: starts with 0xf0 0xef)". The jpg file is also only 26kb whereas the original was 116kb.
What is going wrong? I assume I am passing the wrong data as Body in the put_object method. But what should p be instead?
Update and Solutions
With JordonPhilips's help, I realised that because I had already opened the uploaded image earlier in the view with Pillow, the request.FILES['img'] socket had already been read.
The solution I went with was to remove the Pillow code, leaving the boto upload as the first access of request.FILES['img'].
However, I also figured out a solution if you want to do something to the image first (e.g. in Pillow):
from PIL import Image
import cStringIO as StringIO
import boto3
and then in the view function:
im = Image.open(request.FILES['img'])
# whatever image analysis here
file2 = StringIO.StringIO()
im.save(file2,"jpeg",quality='keep')
s3 = boto3.resource('s3',
                    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
b = s3.Bucket(settings.AWS_STORAGE_BUCKET_NAME)
b.put_object(Key="media/test.jpg", Body=file2.getvalue())
It looks like your problem was that you were trying to read the socket multiple times. You can only read the socket once, so you need to keep a reference to the important information.
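A minimal Python 3 style sketch of that idea: read the upload once into bytes, then reuse those bytes for both the Pillow analysis and the S3 upload (names taken from the question; io.BytesIO stands in for cStringIO):

import io
import boto3
from PIL import Image

data = request.FILES['img'].read()            # read the upload exactly once
im = Image.open(io.BytesIO(data))             # Pillow works on the in-memory copy
# ... whatever image analysis here ...

s3 = boto3.resource('s3',
                    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
s3.Bucket(settings.AWS_STORAGE_BUCKET_NAME).put_object(
    Key="media/test.jpg", Body=data)          # upload the same bytes unchanged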
I want to upload some video clips to Amazon S3. These videos are generated as intermediate results, so I prefer to keep these small video clips (around 400-500 KB each) in memory and then upload each of them to S3.
After uploading, the temporary data can be removed from memory, hence I want to use tempfile. But there are errors in the following code. Please take a look: how do I do this correctly?
import mimetypes
import tempfile
from contextlib import contextmanager

import boto
from boto.s3.key import Key

@contextmanager
def s3upload(key):
    with tempfile.SpooledTemporaryFile(max_size=1021*1000) as buffer:
        yield buffer
        buffer.seek(0)
        # key.send_file(buffer)
        k.set_contents_from_file(buffer)
        k.set_acl('public-read')

conn = boto.connect_s3()
b = conn.get_bucket('cc_test_s3')
k = Key(b)
k.key = '1.flv'
mime = mimetypes.guess_type('1.flv')[0]

with s3upload(k) as out:
    out.write('1.flv')
Output:
The size of the uploaded file is 5 KB, which is much less than the actual size of 1.flv (~400 KB).
I would recommend you use s3fuse, which will basically mount your S3 bucket on your local drive; then you can directly save the files as if you were saving to a local directory. For reference, you can look at s3fuse - google-code.
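Assuming the bucket has been mounted that way (the mount point below is a placeholder), writing a clip becomes an ordinary file copy, so no boto code is needed at all:

import shutil

# /mnt/cc_test_s3 is wherever s3fuse mounted the cc_test_s3 bucket (placeholder path)
shutil.copyfile('1.flv', '/mnt/cc_test_s3/1.flv')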