I am trying to download the log files that are present in an S3 bucket using boto. The reason for not using s3cmd or similar tools is that I don't want my code to depend on external software, so that others can use my code directly without having to install additional dependencies.
I am getting the following stack trace. I saw various related posts, but none of them solved my problem.
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/fabric/main.py", line 743, in main
*args, **kwargs
File "/Library/Python/2.7/site-packages/fabric/tasks.py", line 405, in execute
results['<local-only>'] = task.run(*args, **new_kwargs)
File "/Library/Python/2.7/site-packages/fabric/tasks.py", line 171, in run
return self.wrapped(*args, **kwargs)
File "/pgbadger/pgbadger_html.py", line 86, in dlogs
s3 = S3()
File "/pgbadger/pgbadger_html.py", line 46, in __init__
self.bucket = self._get_bucket(self.log_bucket)
File "/pgbadger/pgbadger_html.py", line 65, in _get_bucket
return self.s3_conn.get_bucket(bucket)
File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 471, in get_bucket
return self.head_bucket(bucket_name, headers=headers)
File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 518, in head_bucket
response.status, response.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request
I have looked through the code and I don't know why I am getting this error. My code is as follows:
import boto.s3
from fabric.api import task
from fabric.api import env

# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are defined elsewhere
S3_LOG_BUCKET = 'BUCKET-NAME'  # placeholder

class S3(object):
    s3_conn = None
    log_bucket = S3_LOG_BUCKET
    region = 'REGION-NAME'  # placeholder
    bucket = None
    env.host_string = 'REGION-NAME'  # placeholder

    def __init__(self):
        self._s3_connect()
        self.bucket = self._get_bucket(self.log_bucket)

    def _s3_connect(self):
        if not self.s3_conn:
            self.s3_conn = boto.s3.connect_to_region(
                self.region,
                aws_access_key_id=AWS_ACCESS_KEY_ID,
                aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
            )
        if not self.s3_conn:
            raise ValueError('Invalid Region Name: {}'.format(self.region))

    def download_s3_logs(self):
        for l in self.bucket.list():
            key_string = str(l.key)
            l.get_contents_to_filename("/tempLogFiles/" + key_string)
            print l.key

    def _get_bucket(self, bucket):
        return self.s3_conn.get_bucket(bucket)

@task
def dlogs():
    s3 = S3()
    s3.download_s3_logs()
Problem Solved:
In S3_LOG_BUCKET I was specifying the entire path within my bucket. My bucket contains multiple inner folders, so I was passing the full path, but boto does not expect that: get_bucket() takes the bucket name only, not a path.
Previously I was doing -->
log_bucket = Bucket/Inner Folder 1/Inner Folder 2/.../ which was wrong
Correct way of doing it -->
log_bucket = Bucket
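For anyone hitting the same error, here is a minimal sketch of the corrected usage (bucket name, region, folder names, and credentials are placeholders): the bucket name alone goes to get_bucket(), and the inner folders become a key prefix for bucket.list():

# Minimal sketch with placeholder names: get_bucket() takes only the
# bucket name; "folders" inside the bucket are just key prefixes.
import boto.s3

conn = boto.s3.connect_to_region(
    'REGION-NAME',  # placeholder
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
)
bucket = conn.get_bucket('Bucket')  # name only, never 'Bucket/Inner Folder 1'
for key in bucket.list(prefix='Inner Folder 1/Inner Folder 2/'):
    print key.key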
I'm trying to get data from a resource, novel-coronavirus-2019-ncov-cases, on the Humanitarian Data Exchange site. Previously everything was fine; after updating the hdx-python-api library to the latest version (5.9.7), I get the following error:
Traceback (most recent call last):
File "/home/user/dashboard/scripts/jhu.py", line 31, in <module>
Configuration.create(hdx_site="prod", user_agent="A_Quick_Example", hdx_read_only=True)
File "/home/user/anaconda3/lib/python3.8/site-packages/hdx/api/configuration.py", line 647, in create
return cls._create(
File "/home/user/anaconda3/lib/python3.8/site-packages/hdx/api/configuration.py", line 607, in _create
cls._configuration.setup_session_remoteckan(remoteckan, **kwargs)
File "/home/user/anaconda3/lib/python3.8/site-packages/hdx/api/configuration.py", line 471, in setup_session_remoteckan
self._session, user_agent = self.create_session_user_agent(
File "/home/user/anaconda3/lib/python3.8/site-packages/hdx/api/configuration.py", line 436, in create_session_user_agent
session = get_session(
File "/home/user/anaconda3/lib/python3.8/site-packages/hdx/utilities/session.py", line 173, in get_session
retries = Retry(
TypeError: __init__() got an unexpected keyword argument 'allowed_methods'
This apparently points to a configuration error. I only need to download the data, so I'm using the read-only configuration example given in the official documentation.
Example code:
from hdx.api.configuration import Configuration
from hdx.data.dataset import Dataset

Configuration.create(hdx_site='prod', user_agent='A_Quick_Example', hdx_read_only=True)

def save(direct):
    datasets = Dataset.read_from_hdx('novel-coronavirus-2019-ncov-cases')
    print(datasets.get_date_of_dataset())
    resources = Dataset.get_all_resources(datasets)
    for res in resources:
        url, path = res.download(folder=direct)
        print('Resource URL %s downloaded to %s' % (url, path))
Can you help me solve this error?
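Not a full answer, but one thing worth checking: urllib3's Retry only accepts allowed_methods from version 1.26.0 onwards (older releases call that parameter method_whitelist), so this TypeError usually means an older urllib3 is installed alongside the newer hdx-python-api. A quick diagnostic:

# Retry(allowed_methods=...) needs urllib3 >= 1.26.0; an older
# version raises the TypeError shown in the traceback above.
import urllib3
print(urllib3.__version__)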
I am trying to set up a timer trigger so that every 5 minutes the function uploads a file to my blob storage.
When I run my code locally it works, but it fails when deployed on Azure. Any help would be appreciated.
Main Method
# imports used by the snippets below; CONNECTION_STRING is defined elsewhere
import os
import wget
from azure.core.exceptions import AzureError
from azure.iot.device import IoTHubDeviceClient
from azure.storage.blob import BlobClient

device_client = IoTHubDeviceClient.create_from_connection_string(CONNECTION_STRING)
PATH_TO_FILE = wget.download("link-of-something", out=os.getcwd())
device_client.connect()
blob_name = os.path.basename(PATH_TO_FILE)
storage_info = device_client.get_storage_info_for_blob(blob_name)
store_blob(storage_info, PATH_TO_FILE)
device_client.shutdown()
Helper method
def store_blob(blob_info, file_name):
    try:
        sas_url = "https://{}/{}/{}{}".format(
            blob_info["hostName"],
            blob_info["containerName"],
            blob_info["blobName"],
            blob_info["sasToken"]
        )
        print("\nUploading file: {} to Azure Storage as blob: {} in container {}\n".format(file_name, blob_info["blobName"], blob_info["containerName"]))
        # Upload the specified file
        with BlobClient.from_blob_url(sas_url) as blob_client:
            with open(file_name, "rb") as f:
                result = blob_client.upload_blob(f, overwrite=True)
                return (True, result)
    except FileNotFoundError as ex:
        # catch file not found and add an HTTP status code to return in notification to IoT Hub
        ex.status_code = 404
        return (False, ex)
    except AzureError as ex:
        # catch Azure errors that might result from the upload operation
        return (False, ex)
This is the error log (Edited)
Result: Failure Exception: OSError: [Errno 30] Read-only file system: './nasa_graphics_manual_nhb_1430-2_jan_1976.pdfjo243l48.tmp'
Stack:
File "/azure-functions-host/workers/python/3.9/LINUX/X64/azure_functions_worker/dispatcher.py", line 407, in _handle__invocation_request
call_result = await self._loop.run_in_executor(
File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/azure-functions-host/workers/python/3.9/LINUX/X64/azure_functions_worker/dispatcher.py", line 649, in _run_sync_func
return ExtensionManager.get_sync_invocation_wrapper(context,
File "/azure-functions-host/workers/python/3.9/LINUX/X64/azure_functions_worker/extension.py", line 215, in _raw_invocation_wrapper
result = function(**args)
File "/home/site/wwwroot/azure-function-timer/__init__.py", line 134, in main
PATH_TO_FILE = wget.download("https://www.nasa.gov/sites/default/files/atoms/files/nasa_graphics_manual_nhb_1430-2_jan_1976.pdf", out=os.getcwd())  # wget to get the filename and path
File "/home/site/wwwroot/.python_packages/lib/site-packages/wget.py", line 506, in download
(fd, tmpfile) = tempfile.mkstemp(".tmp", prefix=prefix, dir=".")
File "/usr/local/lib/python3.9/tempfile.py", line 336, in mkstemp
return _mkstemp_inner(dir, prefix, suffix, flags, output_type)
File "/usr/local/lib/python3.9/tempfile.py", line 255, in _mkstemp_inner
fd = _os.open(file, flags, 0o600)
What you can do is containerize the function using Docker and place the file inside the container, so that you can read the file later.
If you don't containerize the function, only the function's code is deployed, not the file.
Refer to this documentation for an in-depth explanation.
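For context, the traceback shows wget creating its temporary file in the current working directory (dir="."), which is read-only on Azure Functions; only the temp directory is writable there. As a separate workaround sketch (not part of the containerization approach above), the download can target the temp directory using only the standard library:

# Workaround sketch: download into the writable temp directory
# (/tmp on Azure Functions Linux) instead of the current directory.
# The URL is the placeholder from the question.
import os
import tempfile
import urllib.request

url = "link-of-something"  # placeholder
path_to_file = os.path.join(tempfile.gettempdir(), os.path.basename(url))
urllib.request.urlretrieve(url, path_to_file)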
I have the following code snippet:
import firebase_admin
from firebase_admin import credentials
from firebase_admin import storage
from google.cloud import storage

class firebase_storage():
    def __init__(self, path_to_sak, root_bucket):
        try:
            self.cred = credentials.Certificate(path_to_sak)
            firebase_admin.initialize_app(self.cred)
        except Exception as e:
            print("Firebase App may have already been initialized")
        self.bucket = firebase_admin.storage.bucket(root_bucket)

    def upload(self, key, file_path):
        blob = storage.Blob(key, self.bucket)
        blob.upload_from_filename(file_path)

    def download(self, key, file_path):
        blob = storage.Blob(key, self.bucket)
        blob.download_to_filename(file_path)

    def upload_string(self, key, string, mime_type):
        blob = storage.Blob(key, self.bucket)
        blob.upload_from_string(string, content_type=mime_type)
I'm using the Firebase Emulator for Storage, and I have verified that downloads work via firebase_storage.download().
However, when I try to call upload(), the following exception is thrown:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2348, in upload_from_file
created_json = self._do_upload(
File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2170, in _do_upload
response = self._do_multipart_upload(
File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 1732, in _do_multipart_upload
response = upload.transmit(
File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/requests/upload.py", line 149, in transmit
self._process_response(response)
File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/_upload.py", line 116, in _process_response
_helpers.require_status_code(response, (http_client.OK,), self._get_status_code)
File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/_helpers.py", line 99, in require_status_code
raise common.InvalidResponse(
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 400, 'Expected one of', <HTTPStatus.OK: 200>)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "boot.py", line 55, in <module>
run()
File "boot.py", line 35, in run
fb_storage.upload(key, file)
File "/root/python_db_client/src/firebase_storage.py", line 20, in upload
blob.upload_from_filename(file_path)
File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2475, in upload_from_filename
self.upload_from_file(
File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2364, in upload_from_file
_raise_from_invalid_response(exc)
File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 3933, in _raise_from_invalid_response
raise exceptions.from_http_status(response.status_code, message, response=response)
google.api_core.exceptions.BadRequest: 400 POST http://myserver.com:9194/upload/storage/v1/b/xxxxxx.appspot.com/o?uploadType=multipart: Bad Request: ('Request failed with status code', 400, 'Expected one of', <HTTPStatus.OK: 200>)
My storage.rules look like this:
rules_version = '2';
service firebase.storage {
  match /b/{bucket}/o {
    match /{allPaths=**} {
      allow write, read: if true;
    }
  }
}
So it would appear that public read/write access is allowed.
Everything else is working; my other emulators (Firestore, Auth) work fine, but Storage uploads refuse to work :(
Any help would be greatly appreciated, thank you!
Maybe there is a problem initializing your app. I see you are taking it for granted that the app has already been initialized whenever initialization raises an error. Try checking the connection first; it may help.
The Python Admin SDK does not currently support the Storage emulator, according to the documentation:
https://firebase.google.com/docs/emulator-suite/install_and_configure#admin_sdk_availability
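If you still want to experiment despite that limitation, one unofficial avenue (an assumption on my part, not documented Firebase behavior) is that the underlying google-cloud-storage client honors the STORAGE_EMULATOR_HOST environment variable, so pointing it at the emulator may route requests there:

# Unofficial, hypothetical workaround: the google-cloud-storage client
# used under firebase_admin reads STORAGE_EMULATOR_HOST. The port below
# assumes the default Storage emulator configuration; set the variable
# before the client is created.
import os
os.environ["STORAGE_EMULATOR_HOST"] = "http://localhost:9199"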
I'm having trouble identifying the source of transaction.interfaces.NoTransaction errors within my Pyramid app. I don't see any pattern in when the error happens; to me it seems quite random.
This app is a (semi-)RESTful API that uses SQLAlchemy and MySQL. I'm currently running it within a Docker container that connects to an external (bare-metal) MySQL instance on the same host OS.
Here's the stack trace for a login attempt within the App. This error happened right after another login attempt that was actually successful.
2020-06-15 03:57:18,982 DEBUG [txn.140501728405248:108][waitress-1] new transaction
2020-06-15 03:57:18,984 INFO [sqlalchemy.engine.base.Engine:730][waitress-1] BEGIN (implicit)
2020-06-15 03:57:18,984 DEBUG [txn.140501728405248:576][waitress-1] abort
2020-06-15 03:57:18,985 ERROR [waitress:357][waitress-1] Exception while serving /auth
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/waitress/channel.py", line 350, in service
task.service()
File "/usr/local/lib/python3.8/site-packages/waitress/task.py", line 171, in service
self.execute()
File "/usr/local/lib/python3.8/site-packages/waitress/task.py", line 441, in execute
app_iter = self.channel.server.application(environ, start_response)
File "/usr/local/lib/python3.8/site-packages/pyramid/router.py", line 270, in __call__
response = self.execution_policy(environ, self)
File "/usr/local/lib/python3.8/site-packages/pyramid_retry/__init__.py", line 127, in retry_policy
response = router.invoke_request(request)
File "/usr/local/lib/python3.8/site-packages/pyramid/router.py", line 249, in invoke_request
response = handle_request(request)
File "/usr/local/lib/python3.8/site-packages/pyramid_tm/__init__.py", line 178, in tm_tween
reraise(*exc_info)
File "/usr/local/lib/python3.8/site-packages/pyramid_tm/compat.py", line 36, in reraise
raise value
File "/usr/local/lib/python3.8/site-packages/pyramid_tm/__init__.py", line 135, in tm_tween
userid = request.authenticated_userid
File "/usr/local/lib/python3.8/site-packages/pyramid/security.py", line 381, in authenticated_userid
return policy.authenticated_userid(self)
File "/opt/REDACTED-api/REDACTED_api/auth/policy.py", line 208, in authenticated_userid
result = self._authenticate(request)
File "/opt/REDACTED-api/REDACTED_api/auth/policy.py", line 199, in _authenticate
session = self._get_session_from_token(token)
File "/opt/REDACTED-api/REDACTED_api/auth/policy.py", line 320, in _get_session_from_token
session = service.get(session_id)
File "/opt/REDACTED-api/REDACTED_api/service/__init__.py", line 122, in get
entity = self.queryset.filter(self.Meta.model.id == entity_id).first()
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3375, in first
ret = list(self[0:1])
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3149, in __getitem__
return list(res)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3481, in __iter__
return self._execute_and_instances(context)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3502, in _execute_and_instances
conn = self._get_bind_args(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3517, in _get_bind_args
return fn(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3496, in _connection_from_session
conn = self.session.connection(**kw)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1138, in connection
return self._connection_for_bind(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1146, in _connection_for_bind
return self.transaction._connection_for_bind(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 458, in _connection_for_bind
self.session.dispatch.after_begin(self.session, self, conn)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
fn(*args, **kw)
File "/usr/local/lib/python3.8/site-packages/zope/sqlalchemy/datamanager.py", line 268, in after_begin
join_transaction(
File "/usr/local/lib/python3.8/site-packages/zope/sqlalchemy/datamanager.py", line 233, in join_transaction
DataManager(
File "/usr/local/lib/python3.8/site-packages/zope/sqlalchemy/datamanager.py", line 89, in __init__
transaction_manager.get().join(self)
File "/usr/local/lib/python3.8/site-packages/transaction/_manager.py", line 91, in get
raise NoTransaction()
transaction.interfaces.NoTransaction
The trace shows that execution eventually reaches my project, but only my custom authentication policy, and it fails right where the database should be queried for the user.
What intrigues me here is the third line of the log: it seems Waitress somehow aborted the transaction it had just created? Any clue why?
EDIT: Here's the code where that happens: policy.py:320
def _get_session_from_token(self, token) -> UserSession:
    try:
        session_id, session_secret = self.parse_token(token)
    except InvalidToken as e:
        raise SessionNotFound(e)
    service = AuthService(self.dbsession, None)
    try:
        session = service.get(session_id)  # <---- Service class called here
    except NoResultsFound:
        raise SessionNotFound("Invalid session found in request headers. "
                              "Session id: {}".format(session_id))
    if not service.check_session(session, session_secret):
        raise SessionNotFound("Session signature does not match")
    now = datetime.now(tz=pytz.UTC)
    if session.validity < now:
        raise SessionNotFound(
            "Current session ID {session_id} is expired".format(
                session_id=session.id
            )
        )
    return session
And here is a view of that service class method:
class AuthService(ModelService):
    class Meta:
        model = UserSession
        queryset = Query(UserSession)
        search_fields = []
        order_fields = [UserSession.created_at.desc()]

    # The methods below come from the generic ModelService parent class
    def __init__(self, dbsession: Session, user_id: str):
        self.user_id = user_id
        self.dbsession = dbsession
        self.Meta.queryset = self.Meta.queryset.with_session(dbsession)
        self.logger = logging.getLogger("REDACTED")

    @property
    def queryset(self):
        return self.Meta.queryset

    def get(self, entity_id) -> Base:
        entity = self.queryset.filter(self.Meta.model.id == entity_id).first()
        if not entity:
            raise NoResultsFound(f"Could not find requested ID {entity_id}")
        return entity
As you can see, there's already some exception handling in place. I really don't see what other exception I could try to catch in AuthService.get.
I found the solution to be much simpler than tinkering inside Pyramid or SQLAlchemy.
Debugging my authentication policy closely, I found out that it was keeping a sticky reference to the dbsession: it was stored on the first request that used it and never released.
That first request works as expected, but the following ones fail. My understanding is that the policy object stays in memory while the app is running, outliving the initial transaction. Each later request gets a new connection and a new transaction, but the object in memory still points to the previous one, which ultimately causes this error when used.
What I don't understand is why the exception only happened sometimes; as I mentioned initially, it was seemingly random.
Another thing I struggled with was writing a test case to expose the issue. In my tests the problem never happens, because I have (and I've never seen it done differently) a single connection and a single transaction throughout the entire testing session, as opposed to a new connection/transaction per request, so I have found no way to actually reproduce it.
Please let me know if that makes sense, and whether you can shed light on how to expose the bug in a test case.
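For illustration, here is a minimal sketch of the fix described above (hypothetical names, not the actual project code, and assuming the dbsession is registered as a request attribute as in the Pyramid cookiecutter): the policy resolves the dbsession from the current request each time instead of caching one on itself.

class TokenAuthenticationPolicy(object):
    # Hypothetical sketch: keep no self.dbsession on the instance.
    def _authenticate(self, request):
        # Use the session bound to *this* request, so queries join the
        # transaction pyramid_tm opened for this request rather than a
        # stale session pointing at an already-closed transaction.
        service = AuthService(request.dbsession, None)
        ...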
I'm making a web app with App Engine that uses the Spotify and Reddit APIs. It works locally with dev_appserver.py, but when I upload my project and try the exact same thing on the deployed version, I get an error:
Traceback (most recent call last):
File "/base/alloc/tmpfs/dynamic_runtimes/python27g/350d926c06a7e859/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1535, in __call__
rv = self.handle_exception(request, response, e)
File "/base/alloc/tmpfs/dynamic_runtimes/python27g/350d926c06a7e859/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
...
...
File "/base/data/home/apps/s~kabloombox-219016/20190116t005128.415435515961651480/main.py", line 274, in post
scan_subreddit(language, access_token)
File "/base/data/home/apps/s~kabloombox-219016/20190116t005128.415435515961651480/main.py", line 190, in scan_subreddit
reddit = praw.Reddit(client_id=CLIENT_ID_REDDIT, client_secret=CLIENT_SECRET_REDDIT, user_agent=USER_AGENT)
...
...
File "/base/alloc/tmpfs/dynamic_runtimes/python27g/350d926c06a7e859/python27/python27_dist/lib/python2.7/platform.py", line 165, in libc_ver
f = open(executable,'rb')
IOError: [Errno 2] No such file or directory: '/base/alloc/tmpfs/dynamic_runtimes/python27g/350d926c06a7e859/python27/python27_dist/python'
I get this error after submitting a form, which is then supposed to run a web scraper, but it errors instantly instead. I found a lot of people with the same No such file or directory error on files they had made themselves, who just needed to change app.yaml, but /base/alloc/tmpfs/dynamic_runtimes/python27g/350d926c06a7e859/python27/python27_dist/python is some random runtime file/folder and I have absolutely no idea what I'm supposed to make of it.
You can either change your praw initialization to prevent it from calling platform.platform():
r = praw.Reddit(user_agent='...', disable_update_check=True)
Or patch platform.platform() to return a string literal in appengine_config.py:
import platform

def patch(module):
    def decorate(func):
        setattr(module, func.func_name, func)
        return func
    return decorate

@patch(platform)
def platform():
    return 'AppEngine'
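(For context: appengine_config.py is loaded when a new App Engine instance starts, before any request handlers run, so the patched platform() is already in place by the time praw touches the platform module.)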