Endpoint is weird Amazon Textract Python - python

I'm trying to use textract in python. I got the code from this url: https://github.com/aws-samples/amazon-textract-code-samples/blob/c8f34ca25113100730e0f4db3f6f316b0cff44d6/python/02-detect-text-s3.py.
I only changed s3BucketName and documentName in the code. But when I ran the code, I got this error:
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://textract.USA.amazonaws.com/"
Should I alter the URL manually? If so, how can I do that?

The endpoint URL depends on your AWS region; USA is not a valid AWS region.
You can set the region when creating the boto3 client:
textract = boto3.client('textract', region_name='us-west-1')
will use https://textract.us-west-1.amazonaws.com/ as the endpoint.
Alternatively, the region can come from the profile or environment; see the boto3 configuration docs for more details.
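As a rough illustration of why "USA" fails, the sketch below resolves a region from an explicit argument or the AWS_DEFAULT_REGION environment variable and builds the endpoint URL by hand. The helper name textract_endpoint and the regular expression are assumptions made for this example. boto3 performs its own resolution internally, so this only mimics the idea.

```python
import os
import re

# Loose pattern for region codes such as "us-west-1" or "eu-central-1".
# This is an assumption for illustration, not boto3's actual validation.
_REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def textract_endpoint(region_name=None):
    """Return the Textract endpoint URL for a region (hypothetical helper)."""
    # An explicit argument wins; otherwise fall back to the environment.
    region = region_name or os.environ.get("AWS_DEFAULT_REGION")
    if not region or not _REGION_RE.match(region):
        raise ValueError("not a plausible AWS region code: %r" % (region,))
    return "https://textract.%s.amazonaws.com/" % region


print(textract_endpoint("us-west-1"))
```

Calling textract_endpoint("USA") raises ValueError here, whereas boto3 builds the URL anyway and only fails once it tries to connect.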

Related

boto3: How to interact with DigitalOcean S3 Spaces when CDN is enabled

I'm working with DigitalOcean Spaces (S3 storage protocol) which has enabled CDN.
Any file on s3 can be accessed via direct URL in the given form:
https://my-bucket.fra1.digitaloceanspaces.com/<file_key>
If CDN is enabled, the file can be accessed via additional CDN URL:
https://my-bucket.fra1.cdn.digitaloceanspaces.com/<file_key>
where fra1 is a region_name.
When I'm using boto3 SDK for Python, the file URL is the following (generated by boto3):
https://fra1.digitaloceanspaces.com/my-bucket/<file_key>
# note that the bucket name is no longer part of the domain!
This format also works fine.
But, if CDN is enabled - file url causes an error:
EndpointConnectionError: Could not connect to the endpoint URL: https://fra1.cdn.digitaloceanspaces.com/my-bucket/<file_key>
assuming the endpoint_url was changed from
default_endpoint=https://fra1.digitaloceanspaces.com
to
default_endpoint=https://fra1.cdn.digitaloceanspaces.com
How can I connect to the CDN with a proper URL without getting an error?
And why does boto3 use a different URL format? Is there a workaround that can be applied in this case?
code:
s3_client = boto3.client('s3',
                         region_name=s3_configs['default_region'],
                         endpoint_url=s3_configs['default_endpoint'],
                         aws_access_key_id=s3_configs['bucket_access_key'],
                         aws_secret_access_key=s3_configs['bucket_secret_key'])
s3_client.download_file(bucket_name, key, local_filepath)
boto3 guide for DigitalOcean Spaces.
Here is what I've also tried, but it didn't work:
Generating presigned URLs
UPDATE
Based on @Amit Singh's answer:
As I mentioned before, I've already tried this trick with presigned URLs.
I get URLs like this:
https://fra1.digitaloceanspaces.com/<my-bucket>/interiors/uploaded/images/07IRgHJ2PFhVqVrJDCIpzhghqe4TwK1cSSUXaC4T.jpeg?<presigned-url-params>
The bucket name appears after the endpoint. I had to move it to the domain level manually:
https://<my-bucket>.fra1.cdn.digitaloceanspaces.com/interiors/uploaded/images/07IRgHJ2PFhVqVrJDCIpzhghqe4TwK1cSSUXaC4T.jpeg?<presigned-url-params>
With this URL I can now connect to DigitalOcean, but another error occurs:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
<Error>
<Code>SignatureDoesNotMatch</Code>
<RequestId>tx00000000000008dfdbc88-006005347c-604235a-fra1a</RequestId>
<HostId>604235a-fra1a-fra1</HostId>
</Error>
As a workaround I've tried to use the s3v4 signature version:
s3_client = boto3.client('s3',
                         region_name=configs['default_region'],
                         endpoint_url=configs['default_endpoint'],
                         aws_access_key_id=configs['bucket_access_key'],
                         aws_secret_access_key=configs['bucket_secret_key'],
                         config=boto3.session.Config(signature_version='s3v4'))
but it still fails.
boto3 is a client library for Amazon S3 and not Digital Ocean Spaces. So, boto3 will not recognize the CDN URL fra1.cdn.digitaloceanspaces.com since it is provided by Digital Ocean and the URL with CDN is not one of the supported URI patterns. I don't fully understand how CDNs work internally, so my guess is there might be challenges with implementing this redirection to correct URL.
Now that that's clear, let's see how we can get a pre-signed CDN URL. Suppose, your CDN URL is https://fra1.cdn.digitaloceanspaces.com and your space name is my-space. We want to get a pre-signed URL for an object my-example-object stored in the space.
import os
import boto3
from botocore.client import Config

# Initialize the client
session = boto3.session.Session()
client = session.client('s3',
                        region_name='fra1',
                        endpoint_url='https://fra1.digitaloceanspaces.com',  # Remove `.cdn` from the URL
                        aws_access_key_id=os.getenv('SPACES_KEY'),
                        aws_secret_access_key=os.getenv('SPACES_SECRET'),
                        config=Config(s3={'addressing_style': 'virtual'}))

# Get a presigned URL for the object
url = client.generate_presigned_url(ClientMethod='get_object',
                                    Params={'Bucket': 'my-space',
                                            'Key': 'my-example-object'},
                                    ExpiresIn=300)
print(url)
The pre-signed URL will look something like:
https://my-space.fra1.digitaloceanspaces.com/my-example-object?AWSAccessKeyId=EXAMPLE7UQOTHDTF3GK4&Content-Type=text&Expires=1580419378&Signature=YIXPlynk4BALXE6fH7vqbnwjSEw%3D
If you need the CDN URL, insert cdn after the region, either manually or programmatically, so that your final URL becomes:
https://my-space.fra1.cdn.digitaloceanspaces.com/my-example-object?AWSAccessKeyId=EXAMPLE7UQOTHDTF3GK4&Content-Type=text&Expires=1580419378&Signature=YIXPlynk4BALXE6fH7vqbnwjSEw%3D
This is your CDN URL.
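The "programmatically" part could look roughly like the sketch below, which rewrites only the host of a virtual-hosted Spaces URL and leaves the path and query string alone. The function name to_cdn_url and its region default are assumptions for this example, not part of boto3 or any DigitalOcean SDK. Note that the question above reports SignatureDoesNotMatch after a similar manual edit, so whether the rewritten URL is accepted may depend on the signature version and Space configuration.

```python
from urllib.parse import urlsplit, urlunsplit


def to_cdn_url(url, region="fra1"):
    """Insert '.cdn' after the region label of a Spaces URL (hypothetical helper)."""
    parts = urlsplit(url)
    origin_suffix = ".%s.digitaloceanspaces.com" % region
    # Only virtual-hosted URLs (<bucket>.<region>.digitaloceanspaces.com) qualify.
    if not parts.netloc.endswith(origin_suffix):
        raise ValueError("not a virtual-hosted Spaces URL for region %r" % region)
    bucket = parts.netloc[: -len(origin_suffix)]
    cdn_host = "%s.%s.cdn.digitaloceanspaces.com" % (bucket, region)
    # Path, query string (the signature parameters) and fragment are left untouched.
    return urlunsplit((parts.scheme, cdn_host, parts.path, parts.query, parts.fragment))


presigned = "https://my-space.fra1.digitaloceanspaces.com/my-example-object?Expires=1580419378"
print(to_cdn_url(presigned))
```

A path-style URL such as https://fra1.digitaloceanspaces.com/my-space/... is rejected with ValueError, which is why the client is configured with the virtual addressing style first.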
Based on @Amit Singh's answer, I did some additional research on this issue.
Answers that helped me were found here and here.
To make boto3 presigned URLs work, I made the following changes to the client and the generate_presigned_url() params.
s3_client = boto3.client('s3',
                         region_name=configs['default_region'],
                         endpoint_url=configs['default_endpoint'],
                         aws_access_key_id=configs['bucket_access_key'],
                         aws_secret_access_key=configs['bucket_secret_key'],
                         config=boto3.session.Config(
                             signature_version='s3v4',
                             retries={'max_attempts': 10, 'mode': 'standard'},
                             s3={'addressing_style': 'virtual'}))
...
response = s3_client.generate_presigned_url('get_object',
                                            Params={'Bucket': bucket_name,
                                                    'Key': object_name},
                                            ExpiresIn=3600,
                                            HttpMethod=None)
After that, the .cdn domain part should be added after the region name.

Uploading file with python returns Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>

blob.upload_from_filename(source) gives the error
raise exceptions.from_http_status(response.status_code, message, response=response)
google.api_core.exceptions.Forbidden: 403 POST https://www.googleapis.com/upload/storage/v1/b/bucket1-newsdata-bluetechsoft/o?uploadType=multipart: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)
I am following the example of google cloud written in python here!
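from google.cloud import storage

def upload_blob(bucket, source, des):
    # Note: this client is built from the key file but never used below
    client = storage.Client.from_service_account_json('/path')
    # This second client relies on GOOGLE_APPLICATION_CREDENTIALS instead
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket)
    blob = bucket.blob(des)
    blob.upload_from_filename(source)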
I used gsutil to upload files, which is working fine.
Tried to list the bucket names using the python script which is also working fine.
I have necessary permissions and GOOGLE_APPLICATION_CREDENTIALS set.
None of this was working because the service account I was using in GCP did not have the Storage Admin role.
Granting Storage Admin to my service account solved my problem.
As other answers have indicated, this is a permissions issue. I found the following command a useful way to create default application credentials for the currently logged-in user.
Assuming you got this error while running the code on some machine, the following steps should be sufficient:
SSH to the VM where the code is running or will be running. Make sure you are a user who has permission to upload to Google Cloud Storage.
Run the following command:
gcloud auth application-default login
The command asks you to create a token by opening a URL. Generate the token and paste it into the SSH console.
That's it. Every Python application started as that user will use this as the default credential for interacting with storage buckets.
Happy GCP'ing :)
This question is more appropriate for a support case.
As you are getting a 403, you are most likely missing an IAM permission. The Google Cloud Platform support team will be able to inspect your resources and configuration.
This is what worked for me when the google documentation didn't work. I was getting the same error with the appropriate permissions.
import pathlib
import google.cloud.storage as gcs

client = gcs.Client()

# set target file to write to
target = pathlib.Path("local_file.txt")

# set file to download
FULL_FILE_PATH = "gs://bucket_name/folder_name/file_name.txt"

# open filestream with write permissions
with target.open(mode="wb") as downloaded_file:
    # download and write file locally
    client.download_blob_to_file(FULL_FILE_PATH, downloaded_file)

How to authenticate in Jenkins while remotely accessing its JSON API?

I need to access the Jenkins JSON API from a Python script. The problem is that our Jenkins installation is secured so to log in users have to select a certificate. Sadly, in Jenkins Remote Access Documentation they don't mention a thing about certificates and I tried using the API Token without success.
How can I get to authenticate from a Python script to use their JSON API?
Thanks in advance!
You have to authenticate to the JSON API using HTTP Basic Auth.
To make scripted clients (such as wget) invoke operations that require authorization (such as scheduling a build), use HTTP BASIC authentication to specify the user name and the API token. This is often more convenient than emulating the form-based authentication
https://wiki.jenkins-ci.org/display/JENKINS/Authenticating+scripted+clients
Here is a sample of using Basic Auth with Python.
http://docs.python-requests.org/en/master/user/authentication/
Keep in mind that if you are using a self-signed certificate on an internal Jenkins server, you'll need to either turn off certificate validation or get the certificate from the server and add it to the HTTP request.
http://docs.python-requests.org/en/master/user/advanced/
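For readers who prefer to stay within the standard library, here is a minimal sketch of what HTTP Basic Auth with a user name and API token amounts to. The helper name jenkins_request, the host jenkins.example.com, and the credentials are placeholders invented for this example.

```python
import base64
import urllib.request


def jenkins_request(url, user, api_token):
    """Build a GET request carrying HTTP Basic credentials (hypothetical helper)."""
    # Basic Auth is simply base64("user:token") in the Authorization header.
    credentials = ("%s:%s" % (user, api_token)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


req = jenkins_request("https://jenkins.example.com/api/json", "alice", "secret-token")
print(req.get_header("Authorization"))
```

The request can then be passed to urllib.request.urlopen(). For a self-signed server certificate, supplying context=ssl.create_default_context(cafile=...) with the exported certificate keeps verification on.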
I finally found out how to authenticate to Jenkins using certs and wget. I had to convert my pfx certificates into pem ones, with the cert and key in separate files. For more info about that come here. In the end this is the command I used.
wget --certificate=/home/B/cert.pem --private-key=/home/B/key.pem --no-check-certificate --output-document=jenkins.json https:<URL>
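Since the question asks for a Python script, a rough standard-library counterpart of the wget flags above might look like the following. The function name client_cert_context is an assumption for this sketch, and the file paths in the usage note are just the ones from the wget command.

```python
import ssl


def client_cert_context(cert_file, key_file, verify=True):
    """Create an SSL context that presents a client certificate (hypothetical helper)."""
    context = ssl.create_default_context()
    if not verify:
        # Rough equivalent of wget's --no-check-certificate; avoid outside of testing.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    # Equivalent of --certificate / --private-key: PEM files for cert and key.
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context
```

A call such as urllib.request.urlopen(url, context=client_cert_context("/home/B/cert.pem", "/home/B/key.pem", verify=False)) would then send the certificate during the TLS handshake.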
I'm not completely sure it covers your certificate use case, but since it took me some time to find out, I still want to share this snippet that retrieves the email address for a given user name in Python without special Jenkins libraries. It uses an API token and "supports" (actually ignores) https:
import base64
import json
import ssl
import urllib.request

def _get_email_adress(user):
    request = urllib.request.Request("https://jenkins_server/user/" + user + "/api/json")
    # according to https://stackoverflow.com/a/28052583/4609258 the following is ugly:
    # it disables certificate verification entirely
    context = ssl._create_unverified_context()
    base64string = base64.b64encode(bytes('%s:%s' % ('my user name', 'my API token'), 'ascii'))
    request.add_header("Authorization", "Basic %s" % base64string.decode('utf-8'))
    with urllib.request.urlopen(request, context=context) as url:
        user_data = json.loads(url.read().decode())
    # look for the Mailer property, which holds the address
    for property in user_data['property']:
        if property["_class"] == "hudson.tasks.Mailer$UserProperty":
            return property["address"]

Authenticating connection in PySolr

This is the first time I am using Python and Solr. I have my Solr instance set up within tomcat on GCE. I am trying to connect to it from my Python code using PySolr. However, I am not sure how to send authentication parameters via PySolr.
This is the exception I get:
solr = pysolr.Solr('http://MY INSTANCE IP/solr/News', timeout=10)
Apache Tomcat/7.0.28 - Error report HTTP Status 401 - type Status reportmessage description This request requires HTTP authentication ().Apache Tomcat/7.0.28
Please advise.
solr = pysolr.Solr('http://user:pass@IP:8983/solr/')
That's all you need ...
You can pass Solr authentication as part of the Solr connection parameter.
pySolr has no proper documentation on how to carry out authentication. Since pySolr uses requests internally, you can follow the authentication documentation for requests.
Here is a small example on custom authentication as well.
In the case of Basic Authentication, you can use it as
solr = pysolr.Solr('http://IP:8983/solr/collection',auth=('username','password'))
or
from requests.auth import HTTPBasicAuth
solr = pysolr.Solr('http://IP:8983/solr/collection',auth=HTTPBasicAuth('username','password'))
This is the proper way to authenticate. Passing the username and password as part of the URL is not recommended, because an @ or ' in either of them may break authentication. Refer to this GitHub issue.
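To make the URL problem concrete, the sketch below shows how a reserved character in a password derails URL parsing, and how percent-encoding avoids it if credentials must live in the URL. The host solr.example.com and the helper solr_url_with_credentials are made up for this example. Passing auth= as shown above sidesteps the issue entirely.

```python
from urllib.parse import quote, unquote, urlsplit


def solr_url_with_credentials(host, port, path, user, password):
    """Embed percent-encoded credentials in a URL (hypothetical helper)."""
    return "http://%s:%s@%s:%d%s" % (
        quote(user, safe=""), quote(password, safe=""), host, port, path)


# Unencoded: the '#' starts a URL fragment, so the host is misread.
raw = "http://user:pa#ss@solr.example.com:8983/solr/"
print(urlsplit(raw).hostname)

# Encoded: the parser recovers host and credentials as intended.
safe = solr_url_with_credentials("solr.example.com", 8983, "/solr/", "user", "pa#ss")
parts = urlsplit(safe)
print(parts.hostname, unquote(parts.password))
```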

Basic authentication with jira-python

I'm new to Python, new to the jira-python library, and new to network programming, though I do have quite a bit of experience with application and integration programming and database queries (though it's been a while).
Using Python 2.7 and requests 1.0.3
I'm trying to use this library - http://jira-python.readthedocs.org/en/latest/ to query Jira 5.1 using Python. I successfully connected using an unauthenticated query, though I had to make a change to a line in client.py.
I changed
self._session = requests.session(verify=verify, hooks={'args': self._add_content_type})
to
self._session = requests.session()
I didn't know what I was doing exactly but before the change I got an error and after the change I got a successful list of project names returned.
Then I tried basic authentication so I can take advantage of my Jira permissions and do reporting. That failed initially too, so I made the same change to
def _create_http_basic_session
in client.py, but the problem is not solved. Now I get a different error:
HTTP Status 415 - Unsupported Media Type
type Status report
message Unsupported Media Type
description The server refused this request because the request entity is in
a format not supported by the requested resource for the requested method
(Unsupported Media Type).
So then I decided to do a super simple test just using the requests module, which I believe is being used by the jira-python module and this code seemed to log me in. I got a good response:
import requests
r = requests.get(the_url, auth=(my_username, password))
print(r.text)
Any suggestions?
Here's how I use the jira module with authentication in a Python script:
from jira.client import JIRA
import logging

# Defines a function for connecting to Jira
def connect_jira(log, jira_server, jira_user, jira_password):
    '''
    Connect to JIRA. Return None on error
    '''
    try:
        log.info("Connecting to JIRA: %s" % jira_server)
        jira_options = {'server': jira_server}
        jira = JIRA(options=jira_options, basic_auth=(jira_user, jira_password))
        #                                            ^--- Note the tuple
        return jira
    except Exception as e:
        log.error("Failed to connect to JIRA: %s" % e)
        return None

# create logger
log = logging.getLogger(__name__)

# NOTE: You put your login details in the function call connect_jira(..) below!
# create a connection object, jc
jc = connect_jira(log, "https://myjira.mydom.com", "myusername", "mypassword")

# print names of all projects
projects = jc.projects()
for v in projects:
    print(v)
The Python script below connects to Jira, does basic authentication, and lists all projects.
from jira.client import JIRA

options = {'server': 'Jira-URL'}
jira = JIRA(options, basic_auth=('username', 'password'))
projects = jira.projects()
for v in projects:
    print(v)
It prints a list of all the projects available within your instance of Jira.
Problem:
As of June 2019, Atlassian Cloud users who are using a REST endpoint in Jira or Confluence Cloud with basic or cookie-based authentication will need to update their app or integration processes to use an API token, OAuth, or Atlassian Connect.
After June 5th, 2019 attempts to authenticate via basic auth with an Atlassian account password will return an invalid credentials error.
Reference: Deprecation of basic authentication with passwords for Jira and Confluence APIs
Solution to the Above-mentioned Problem:
You can use an API token to authenticate a script or other process with an Atlassian cloud product. You generate the token from your Atlassian account, then copy and paste it to the script.
If you use two-step verification to authenticate, your script will need to use a REST API token to authenticate.
Steps to Create an API Token from your Atlassian Account:
Log in to https://id.atlassian.com/manage/api-tokens
Click Create API token.
From the dialog that appears, enter a memorable and concise Label for your token and click Create.
Click Copy to clipboard, then paste the token to your script.
Reference: API tokens
Python 3.8 Code Reference
from jira.client import JIRA
jira_client = JIRA(options={'server': JIRA_URL}, basic_auth=(JIRA_USERNAME, JIRA_TOKEN))
issue = jira_client.issue('PLAT-8742')
print(issue.fields.summary)
Don't change the library, instead put your credentials inside the ~/.netrc file.
If you put them there you will also be able to test your calls using curl or wget.
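To check what such a file resolves to, Python's standard netrc module can parse it. The sketch below uses a throwaway file with an invented host and credentials rather than the real ~/.netrc. The helper name read_netrc_credentials is an assumption for this example.

```python
import netrc
import os
import tempfile


def read_netrc_credentials(host, path=None):
    """Return (login, password) for host from a netrc file, or None if absent."""
    # With path=None the netrc module falls back to ~/.netrc.
    entry = netrc.netrc(path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


# Demonstrate with a throwaway file instead of the real ~/.netrc.
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as handle:
    handle.write("machine jira.example.com login alice password s3cret\n")
try:
    print(read_netrc_credentials("jira.example.com", handle.name))
finally:
    os.remove(handle.name)
```

The entry format (machine, login, password) is the same one curl and wget read, which is what makes the cross-tool testing mentioned above possible.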
I am not sure anymore about compatibility with Jira 5.x; only 7.x and 6.4 are currently tested. If you set up an instance for testing, I could modify the integration tests to run against it, too.
My lucky guess is that you broke it with that change.
As of 2019 Atlassian has deprecated authorizing with passwords.
You can easily replace the password with an API Token created here.
Here's a minimalistic example:
pip install jira
from jira import JIRA
jira = JIRA("YOUR-JIRA-URL", basic_auth=("YOUR-EMAIL", "YOUR-API-TOKEN"))
issue = jira.issue("YOUR-ISSUE-KEY (e.g. ABC-13)")
print(issue.fields.summary)
I recommend storing your API Token as an environment variable and accessing it with os.environ[key].
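A small sketch of that recommendation follows. The variable names JIRA_EMAIL and JIRA_API_TOKEN are arbitrary choices for this example, not names the jira library looks for, and the helper jira_basic_auth is hypothetical.

```python
import os


def jira_basic_auth(environ=os.environ):
    """Return an (email, token) tuple read from the environment."""
    try:
        return environ["JIRA_EMAIL"], environ["JIRA_API_TOKEN"]
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError("set the %s environment variable first" % missing.args[0])


# Stand-in mapping for demonstration; a real script would call jira_basic_auth().
print(jira_basic_auth({"JIRA_EMAIL": "me@example.com",
                       "JIRA_API_TOKEN": "token-from-atlassian"}))
```

The returned tuple can be passed directly as basic_auth=jira_basic_auth() when constructing the JIRA client, which keeps the token out of the source file.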
