Boto3 S3 NoSuchKey error when downloading file - python

I am accessing the Project Gutenberg API, and I wrote a Python program that creates a text file from a randomly chosen book. Since the .txt file is different each time, the document in the S3 bucket should start out blank. I want to store an object in S3 that I can constantly write over, then pull it from Flask and put it on the user's computer.
So far I have been using boto3, and I set up an AWS account with a specific bucket. I loaded a trial .txt file into it, but when the program is accessed now it only downloads that original file with its fixed text; it doesn't change based on my program like it should.
Boto seems to be throwing me for a loop so if there's another way I am open to it.
My code right now is a mess. I'm throwing everything I can at it to make it work, but I know I've reached the point where I need some help.
from flask import Flask, render_template
from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers
import random
import os
from flask import send_from_directory
import boto3
from flask import Flask, request
from flask import Flask, Response
from boto3 import client
import botocore

app = Flask(__name__, static_url_path='/static')

def get_client():
    return client(
        's3',
        'us-east-1',
        aws_access_key_id='XXXXXXX',
        aws_secret_access_key='XXXXXXX'
    )

@app.route('/')
def welcome():
    return 'Welcome to the server'

@app.route('/favicon.ico')
def favicon():
    return send_from_directory(os.path.join(app.root_path, 'static'), 'favicon.ico', mimetype='image/vnd.microsoft.icon')

@app.route('/roulette', methods=['POST', 'GET'])
def roulette():
    s3 = get_client()
    f = open("GutProject.txt", "w")
    file = s3.get_object(Bucket='book-roulette', Key='GutProject.txt')
    for x in range(1):
        y = (random.randint(0, 59000))
        text = strip_headers(load_etext(y)).strip()
        s3_client = boto3.client('s3')
        open('GutProject.txt').write(text)
        s3_client.upload_file('GutProject.txt', 'book-roulette', 'Gut-remote.txt')
        s3_client.download_file('book-roulette', 'hello-remote.txt', 'hello2.txt')
        print(open('hello2.txt').read())
        s3_client.download_file('MyBucket', 'hello-remote.txt', 'hello2.txt')
        print(open('hello2.txt').read())
    return Response(
        file['Body'].read(),
        mimetype='text/plain',
        headers={"Content-Disposition": "attachment;filename=GutProject.txt"}
    )

if __name__ == "__main__":
    app.run(debug=True)
My app should let the user click a button on the page, and it will download a random file to their computer. The HTML works great, and the Python/Flask worked before, but the file wasn't downloading (I'm on Heroku).
I keep getting these errors:
botocore.errorfactory.NoSuchKey
botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.

If the error stems from the line s3_client.download_file('book-roulette', 'hello-remote.txt', 'hello2.txt'), then NoSuchKey means it cannot find the object s3://book-roulette/hello-remote.txt in the S3 region you specified.
I would suggest checking that S3 path to make sure it exists, and that the specified bucket and key are correct.
Edit: I notice that you create a new s3_client object within your loop, overwriting the one where you specify your region and credentials, so it's possible it is no longer checking the right region. That would more likely result in an access-denied or bucket-not-found error instead, though.
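For reference, here is a minimal sketch of the write-then-serve pattern the question is after: reuse a single key, overwrite it on every request, and handle the missing-key case explicitly. It assumes the bucket 'book-roulette' and key 'GutProject.txt' from the question, and that credentials and region are already configured for boto3:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3', region_name='us-east-1')

def upload_text(text):
    # Overwrite the same key every time, so there is only ever one object.
    s3.put_object(Bucket='book-roulette', Key='GutProject.txt', Body=text.encode('utf-8'))

def fetch_text():
    try:
        obj = s3.get_object(Bucket='book-roulette', Key='GutProject.txt')
        return obj['Body'].read()
    except ClientError as e:
        if e.response['Error']['Code'] == 'NoSuchKey':
            # Nothing has been uploaded under that key yet (or the bucket/key is wrong).
            return None
        raise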

Related

Google Cloud Function can't be invoked

I have a Cloud Function; the code is fine when I test it locally. However, it doesn't work as a Cloud Function even though it deploys successfully. When deployed, I tried adding allUsers as a Cloud Function Invoker. Ingress settings are set to allow all web traffic.
I get a 500 error, and it says Error: could not handle the request when visiting the URL.
Cloud Scheduler constantly fails, and the logs for the cloud function don't really help give any understanding as to why it fails.
When expanded, the logs give no further detail either.
I've got no idea what else to try and resolve this issue. I just want to be able to invoke my HTTP cloud function on a schedule, the code works fine when run and tested using a service account. Why doesn't it work when added to the function?
Here is the code I'm using:
from bs4 import BeautifulSoup
import pandas as pd
import constants as const
from google.cloud import storage
import os
import json
from datetime import datetime
from google.cloud import bigquery
import re
from flask import escape

# service_account_path = os.path.join("/Users/nbamodel/nba-data-keys.json")
# os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = service_account_path

client = storage.Client()
bucket = client.get_bucket(const.destination_gcs_bucket)

def scrape_team_data(request):
    """HTTP Cloud Function.
    Args:
        request (flask.Request): The request object.
        <http://flask.pocoo.org/docs/1.0/api/#flask.Request>
    Returns:
        The response text, or any set of values that can be turned into a
        Response object using `make_response`
        <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>.
    """
    headers = [
        'Rank',
        'Team',
        'Age',
        'Wins',
        'Losses',
        'PW',
        'PL',
        'MOV',
        'SOS',
        'SRS',
        'ORtg',
        'DRtg',
        'NRtg',
        'Pace',
        'FTr',
        '_3PAr',
        'TS_pct',
        'offense_eFG_pct',
        'offense_TOV_pct',
        'offense_ORB_pct',
        'offense_FT_FGA',
        'defense_eFG_pct',
        'defense_TOV_pct',
        'defense_DRB_pct',
        'defense_FT_FGA',
        'Arena',
        'Attendance',
        'Attendance_Game'
    ]
    r = requests.get('https://www.basketball-reference.com/leagues/NBA_2020.html')
    matches = re.findall(r'id=\"misc_stats\".+?(?=table>)table>', r.text, re.DOTALL)
    find_table = pd.read_html('<table ' + matches[0])
    df = find_table[0]
    df.columns = headers
    filename = 'teams_data_adv_stats'  # + datetime.now().strftime("%Y%m%d")
    df.to_json(filename, orient='records', lines=True)
    print(filename)

    # Push data to GCS
    blob = bucket.blob(filename)
    blob.upload_from_filename(
        filename=filename,
        content_type='application/json'
    )

    # Create BQ table from data in bucket
    client = bigquery.Client()
    dataset_id = 'nba_model'
    dataset_ref = client.dataset(dataset_id)
    job_config = bigquery.LoadJobConfig()
    job_config.create_disposition = 'CREATE_IF_NEEDED'
    job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    uri = "gs://nba_teams_data/{}".format(filename)
    load_job = client.load_table_from_uri(
        uri,
        dataset_ref.table("teams_data"),
        location="US",  # Location must match that of the destination dataset.
        job_config=job_config,
    )  # API request
    print("Starting job {}".format(load_job.job_id))
    load_job.result()  # Waits for table load to complete.
    print("Job finished.")
    destination_table = client.get_table(dataset_ref.table("teams_data"))
    print("Loaded {} rows.".format(destination_table.num_rows))
    return
I have deployed your code as a Cloud Function, and it's failing for two reasons.
First, it's missing the requests dependency, so the line import requests has to be added at the top of the file, with the other imports.
Second, your code is trying to write a file to a read-only file system, which is immediately rejected by the OS, and the function gets terminated. That write is done by the method DataFrame.to_json, which writes content to the file teams_data_adv_stats so it can later be uploaded to a GCS bucket.
There are two ways that you can work around this issue:
Create the file in the temporary folder. As explained in the documentation, you cannot write to the file system, with the exception of the /tmp directory. I managed to succeed with this method using the following modified lines:
filename = 'teams_data_adv_stats'
path = os.path.join('/tmp', filename)
df.to_json(path, orient='records', lines=True)
blob = bucket.blob(filename)
blob.upload_from_filename(
    filename=path,
    content_type='application/json'
)
Avoid creating a file and work with a string. Instead of using upload_from_filename I suggest you work with upload_from_string. I have managed to succeed using this method with the following modified lines:
filename = 'teams_data_adv_stats'
data_json = df.to_json(orient='records', lines=True)
blob = bucket.blob(filename)
blob.upload_from_string(
    data_json,
    content_type='application/json'
)
As a heads-up, you can test your Cloud Functions from the Testing tab on the function's details page. I recommend using it, since it's what I used to troubleshoot your issue, and it's handy to know about. Also bear in mind that there is an ongoing issue with logs on failing Cloud Functions on the python37 runtime that prevents the error message from appearing. I ran into it while working on your CF and used the workaround provided.
As a side note, since you didn't provide a requirements.txt file, I did all the reproduction with the following one in order to deploy and run successfully. I assume this is correct:
beautifulsoup4==4.9.1
Flask==1.1.2
google-cloud-bigquery==1.27.2
google-cloud-storage==1.30.0
lxml==4.5.2
pandas==1.1.1

Serving media file to podcast.app (iOS/MacOS) for download

I'm trying to create a simple podcast hosting web server with flask.
Streaming media files from the server works beautifully. However, whenever I try to download the media for offline viewing, my podcast app throws "Download error".
I've tried all sorts of headers and responses.
Endpoint I'm using, with the relevant code (I hardcoded the filename for testing):
import os
import logging
from streamlink.exceptions import (PluginError)
from flask import Flask, redirect, abort, request
from flask import send_from_directory

app = Flask(__name__)  # Flask app instance (assumed; not shown in the original snippet)

def run_server(host, port, file_dir):
    @app.route('/media', methods=['GET'])
    def media():
        return send_from_directory(file_dir, '6TfLVL5GeE4.mp4')

    app.run(host=host, port=port, debug=True)
Podcast.app error log states:
Download failed due to error: Invalid asset: The original extension and resolved extension were not playable for episode url Optional(http://10.0.0.115:8084/media?id=6TfLVL5GeE4&provider=tw).
I can't for the life of me figure out why it would fail to download but stream perfectly fine.
Please help!
Found out that you cannot use query parameters in a podcast feed URL.
The URL must include a filename.
For example: http://localhost/file.mp3
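A minimal sketch of what that can look like in Flask, putting the filename in the URL path instead of a query string; the route name and directory here are assumptions, not the asker's actual setup:

from flask import Flask, send_from_directory

app = Flask(__name__)
MEDIA_DIR = "/path/to/media"  # hypothetical media directory

@app.route('/media/<path:filename>', methods=['GET'])
def media(filename):
    # The episode URL now ends in the actual filename,
    # e.g. http://localhost:8084/media/6TfLVL5GeE4.mp4
    return send_from_directory(MEDIA_DIR, filename)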

400 Bad Request when uploading files [Flask on Cloud9]

I'm having trouble resolving this issue. I've got a very simple file upload form that I'm getting the following error for:
Bad Request
The browser (or proxy) sent a request that this server could not understand.
I've followed a basic tutorial and copied it exactly. I'm wondering if it's because of some permissions thing as I'm using the Cloud9 IDE and just trying to upload the files into a folder I've created in the root of the site.
application.py is as follows:
from cs50 import SQL
import os
from flask import Flask, jsonify, redirect, render_template, request

# Configure application
app = Flask(__name__)

# get the absolute directory of the server path for file uploads
APP_ROOT = os.path.dirname(os.path.abspath(__file__))

[further in the code]

@app.route("/create_staff", methods=["POST"])
def create_staff():
    # define upload path
    target = os.path.join(APP_ROOT, 'images/')
    # check the folder exists; if it doesn't, create it
    if not os.path.isdir(target):
        os.mkdir(target)
    print("PREPPING TO UPLOAD FILE \n")
    f = request.files['file']
    filename = f.filename
    print(filename)
    destination = "/".join([target, filename])
    f.save(destination)
    return redirect("/addstaff")
Any suggestions on how to troubleshoot this?
I had the exact same issue and haven't found the solution yet, but make sure that:
your <input> HTML tag has name='file', as this is what request.files['file'] refers to
your <form> tag is marked with enctype='multipart/form-data'
https://flask.palletsprojects.com/en/1.1.x/patterns/fileuploads/
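For troubleshooting, here is a small sketch of a more defensive version of the route that avoids the generic 400 page when the file part is missing; the error messages are illustrative only:

from flask import Flask, request, redirect
from werkzeug.utils import secure_filename

app = Flask(__name__)

@app.route("/create_staff", methods=["POST"])
def create_staff():
    # request.files['file'] raises a BadRequest (400) if the form had no part
    # named 'file'; checking first gives a clearer error message.
    if 'file' not in request.files:
        return "No 'file' part in the request: check the input's name attribute and the form's enctype", 400
    f = request.files['file']
    if f.filename == '':
        return "No file selected", 400
    f.save(secure_filename(f.filename))
    return redirect("/addstaff")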

(Flask API), make firebase uploaded images have unique names

I'm building an API that uploads images to Firebase Storage, and everything works as expected in that regard. The problem is that the syntax makes me specify the file name in each upload, and in production the API will receive upload requests from multiple devices, so I need the code to check for an available ID, set it on the "blob()" object, and then do a normal upload, but I have no idea how to do that. A random name is fine too; I don't care as long as it doesn't overwrite another picture.
Here is my current code:
from flask_pymongo import PyMongo
import firebase_admin
from firebase_admin import credentials, auth, storage, firestore
import os
import io

cred = credentials.Certificate('service_account_key.json')
firebase_admin.initialize_app(cred, {'storageBucket': 'MY-DATABASE-NAME.appspot.com'})
bucket = storage.bucket()

# here is where I'm guessing I should put the next available name
blob = bucket.blob("images/newimage.png")

# "apple.png" is a sample image for testing in my directory
with open("apple.png", "rb") as f:
    blob.upload_from_file(f)
As "Klaus D."'s comment said the solution was to implement the "uuid" module
import uuid
.....
.....
blob = bucket.blob("images/" + str(uuid.uuid4()))
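A slightly fuller sketch of the same idea that also keeps the original file extension, so the object name stays recognizable; the extension handling is my addition, not part of the original answer:

import os
import uuid

def upload_unique(bucket, local_path):
    # e.g. "apple.png" -> "images/0d8a6a7e-....png" (collision-free name)
    _, ext = os.path.splitext(local_path)
    blob = bucket.blob("images/" + str(uuid.uuid4()) + ext)
    blob.upload_from_filename(local_path)
    return blob.name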

Google App Engine, Cloud Storage can't read uploaded files in python

I created the bucket and uploaded a file via
gsutil cp -a public-read test.htm gs://BUCKETNAME
I can see the file in the Storage Browser and with gsutil ls
When I try to read it with the following code, I get:
NotFoundError: Expect status [200] from Google Storage. But got status 404.
Code:
import logging
import os
import lib.cloudstorage as gcs
import webapp2
from google.appengine.api import app_identity

my_default_retry_params = gcs.RetryParams(initial_delay=0.2,
                                          max_delay=5.0,
                                          backoff_factor=2,
                                          max_retry_period=50)
gcs.set_default_retry_params(my_default_retry_params)

class MainPage(webapp2.RequestHandler):
    def get(self):
        bucket_name = os.environ.get('BUCKET_NAME',
                                     app_identity.get_default_gcs_bucket_name())
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Demo GCS Application running from Version: '
                            + os.environ['CURRENT_VERSION_ID'] + '\n')
        self.response.write('Using bucket name: ' + bucket_name + '\n\n')
        bucket = '/' + bucket_name
        filename = bucket + '/test.htm'
        self.read_file(filename)

    def read_file(self, filename):
        self.response.write('Abbreviated file content (first line and last 1K):\n')
        gcs_file = gcs.open(filename)
        self.response.write(gcs_file)

app = webapp2.WSGIApplication([
    ('/', MainPage),
], debug=True)
A strange thing happens when I use the sample code from here (https://github.com/GoogleCloudPlatform/appengine-gcs-client/blob/master/python/demo/main.py) in my environment.
The code creates and writes a file in the same bucket, then reads it and deletes it. It works fine, but when I comment out the delete parts, I cannot see the files in the Storage Browser or with gsutil, yet I can still read them with the code.
I checked my bucket permissions in Storage Browser and they seem fine, the service account of the app is bucket owner.
Any help is greatly appreciated.
After revisiting this due to recent demand, I figured it out. gsutil has a bug; I read about it somewhere on GitHub. When I upload the file via the Storage Browser or the Python client libraries, it works.
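For reference, a minimal upload using the google-cloud-storage client library, which mirrors what the gsutil command was meant to do (the bucket name is a placeholder):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket('BUCKETNAME')  # placeholder bucket name
blob = bucket.blob('test.htm')

# Equivalent of `gsutil cp test.htm gs://BUCKETNAME`
blob.upload_from_filename('test.htm', content_type='text/html')

# Rough equivalent of the -a public-read flag
blob.make_public()
print(blob.public_url)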
