Find out if the current machine is on AWS in Python

I have a Python script that runs on AWS machines, as well as on other machines.
The functionality of the script depends on whether or not it is on AWS.
Is there a way to programmatically discover whether or not it runs on AWS? (maybe using boto?)

If you want to do that strictly using boto, you could do:
import boto.utils
md = boto.utils.get_instance_metadata(timeout=.1, num_retries=0)
The timeout specifies how long the HTTP client will wait for a response before timing out. The num_retries parameter controls how many times the client will retry the request before giving up and returning an empty dictionary.
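In other words, a quick sketch of turning that into a boolean check (on EC2 the metadata call returns a populated dictionary, elsewhere it comes back empty):
import boto.utils

md = boto.utils.get_instance_metadata(timeout=.1, num_retries=0)
is_on_aws = bool(md)  # empty dict -> False when not on EC2
if is_on_aws:
    print("Running on AWS, instance id:", md.get("instance-id"))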

You can easily use the AWS SDK and check for an instance ID.
Besides that, you can check the AWS IP ranges; check out this link:
https://forums.aws.amazon.com/ann.jspa?annID=1701
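If you want to go the IP-range route, here is a rough sketch; it assumes the machine has outbound internet access and uses AWS's published ip-ranges.json file plus checkip.amazonaws.com to look up the public IP (any similar service would do):
import ipaddress
import requests

def public_ip_is_in_aws_ranges():
    # Look up this machine's public IP.
    my_ip = ipaddress.ip_address(requests.get("https://checkip.amazonaws.com", timeout=5).text.strip())
    # Fetch the published AWS IP ranges and test membership.
    ranges = requests.get("https://ip-ranges.amazonaws.com/ip-ranges.json", timeout=5).json()
    return any(my_ip in ipaddress.ip_network(p["ip_prefix"]) for p in ranges["prefixes"])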

I found a way, using:
import requests

try:
    # The short timeout keeps the call from hanging when the metadata address is unreachable.
    instance_id_resp = requests.get('http://169.254.169.254/latest/meta-data/instance-id', timeout=1)
    is_on_aws = True
except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
    is_on_aws = False

I tried some of the above, and when not running on Amazon I had trouble accessing 169.254.169.254. Maybe it has something to do with the fact that I'm outside the US.
In any case, here's a piece of code that worked for me:
def running_on_amazon():
    import urllib2
    import socket
    # I'm using curlmyip.com, but there are other websites that provide the same service
    ip_finder_addr = "http://curlmyip.com"
    f = urllib2.urlopen(ip_finder_addr)
    my_ip = f.read(100).strip()
    host_addr = socket.gethostbyaddr(my_ip)
    my_public_name = host_addr[0]
    amazon = (my_public_name.find("aws") >= 0)
    return amazon  # returns a boolean value

Related

How to debug PyGitHub not being responsive?

I'm using PyGitHub to update my GitHub.com repos from inside an Ubuntu server using a Python script.
I have noticed there are times when my script just hangs, with no error message indicating what went wrong.
This is my script:
from pathlib import Path
from typing import Optional

import typer
from github import Github, GithubException

app = typer.Typer()

@app.command()
def add_deploy_key(
    token: Optional[str] = None, user_repo: Optional[str] = None, repo_name: Optional[str] = None
):
    typer.echo("Starting to access GitHub.com... ")
    try:
        # using an access token
        g = Github(token)
        # I skipped a bunch of code to save space
        for key in repo.get_keys():
            if str(key.key) == str(pure_public_key):
                typer.echo(
                    "We found an existing public key in " + user_repo + ", so we're NOT adding it"
                )
                return
        rep_key = repo.create_key(
            "DigitalOcean for " + repo_name, current_public_key, read_only=True
        )
        if rep_key:
            typer.echo("Success with adding public key to repo in GitHub.com!")
            typer.echo("")
            typer.echo("The url to the deposited key is: " + rep_key.url)
        else:
            typer.echo("There's some issue when adding public key to repo in GitHub.com")
    except GithubException as e:
        typer.echo("There's some issue")
        typer.echo(str(e))
        return

if __name__ == "__main__":
    app()
The way I trigger it is inside a bash script:
output=$(python /opt/github-add-deploy-keys.py --token="$token" --user-repo="$user_repo" --repo-name="$repo_name")
It works. But sometimes it just hangs there without any output. And since it happens intermittently and not consistently, it's hard to debug.
I cannot be sure if it's a typer issue or a network issue or a GitHub.com issue. There's just nothing.
I want it to fail fast and often. I know there's a timeout and a retry for the GitHub object.
See https://pygithub.readthedocs.io/en/latest/github.html?highlight=retry#github.MainClass.Github
I wonder if I can do anything with these two parameters so that at least I have a visual indication that something is being done. I can add a lot of typer.echo statements, but that would be extremely verbose.
I'm also unfamiliar with the retry object. I wish that, even when a retry is made, there would be some echo output telling me a retry is being attempted.
What can I try?
The timeout should prevent the GitHub request from hanging, and retries should make it work, but based on the documentation there is a default timeout already. Since this question is about debugging, I would suggest using the Python logging library to log the steps your script runs to a file. You can find a good logging tutorial here.
As for the logging style, since your case has a lot of unknowns I would log when the script starts, before the "Create Key" step, after "Create Key", and maybe in case of errors. You can go over the log file when the script hangs.
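As a rough sketch of combining that with the timeout and retry parameters you mentioned (the exact constructor arguments depend on your PyGithub version, so treat the values below as assumptions to adapt), you can make stalls fail fast and let the underlying urllib3 retry messages show up in the log file:
import logging

from github import Github
from urllib3.util.retry import Retry

# Log to a file at DEBUG level so the underlying HTTP connection attempts and retries are visible.
logging.basicConfig(filename="github-add-deploy-keys.log", level=logging.DEBUG)

g = Github(
    token,
    timeout=10,  # seconds before a hung request fails instead of blocking forever
    retry=Retry(total=3, backoff_factor=1),  # retry transient failures with backoff
)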
You can also create a bash script that runs your script with a timeout, get a notification if the program exits with a non-zero exit code, and leave it to run overnight:
timeout 5 python <yourscript>.py

boto3: how to get the log stream from a SageMaker transform job?

I am able to create the job and it fails, using boto3:
import boto3
session = boto3.session.Session()
client = session.client('sagemaker')
describe = client.describe_transform_job(TransformJobName="my_transform_job_name")
In the UI I can see the button to go to the logs, and I can use boto3 to retrieve the logs if I hardcode the group name and the log stream.
But how can I get the log stream for the batch transform job? Shouldn't there be a field with the log stream or something like that in ".describe_transform_job"?
SageMaker doesn't provide a direct way to do it; the way to do it is to also use the CloudWatch Logs client.
Get the log streams corresponding to your batch transform job:
client_logs = boto3.client('logs')
# For batch transform jobs the log group is typically "/aws/sagemaker/TransformJobs".
log_groups = client_logs.describe_log_streams(logGroupName="the_log_group_name", logStreamNamePrefix=transform_job_name)
log_streams_names = []
for i in log_groups["logStreams"]:
    log_streams_names.append(i["logStreamName"])
This will give a list of names of the form "job_name/virtual_machine_id", one entry per machine your code ran on, depending on how many instances you set.
Afterwards, you can run the following for each of the log streams:
for i_stream_name in log_streams_names:
    client_logs.get_log_events(logGroupName="the_log_group_name", logStreamName=i_stream_name)
Now you can loop over and print the lines of the log stream events =)
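For completeness, a small sketch of that final loop (field names follow the CloudWatch Logs get_log_events response; the log group name is still the placeholder from above):
for i_stream_name in log_streams_names:
    events = client_logs.get_log_events(logGroupName="the_log_group_name", logStreamName=i_stream_name)
    for event in events["events"]:
        # Each event has a timestamp (epoch milliseconds) and the raw log line.
        print(event["timestamp"], event["message"])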

delay between requests using github3 in python

I'm using the Python github3 module and I need to set a delay between requests to the GitHub API, because my app puts too much load on the server.
I'm doing things such as:
git = github3.GitHub()
for i in itertools.chain(git.all_repositories(), git.repositories(type='private')):
    do things
I found that github3 uses requests to make requests to the GitHub API.
https://github.com/sigmavirus24/github3.py/blob/3e251f2a066df3c8da7ce0b56d24befcf5eb2d4b/github3/models.py#L233
But I can't figure out what parameter I should pass or what attribute I should change to set some delay between the requests.
Can you advise me something?
github3.py presently has no options to enforce delays between requests. That said, there is a way to get the request metadata which includes the number of requests you have left in your ratelimit as well as when that ratelimit should reset. I suggest you use git.rate_limit()['resources']['core'] to determine what delays you should set for yourself inside your own loop.
I use the following function when I expect to exceed my query limit:
import time
import logging

logger = logging.getLogger(__name__)

def wait_for_karma(gh, min_karma=25, msg=None):
    while gh:
        core = gh.rate_limit()['resources']['core']
        if core['remaining'] < min_karma:
            now = time.time()
            nap = max(core['reset'] - now, 0.1)
            logger.info("napping for %s seconds", nap)
            if msg:
                logger.info(msg)
            time.sleep(nap)
        else:
            break
I'll call it before making a call that I believe is "big" (i.e. could require multiple API calls to satisfy). Based on your code sample, you may want to do this at the bottom of your loop:
git = github3.GitHub()
for i in itertools.chain(git.all_repositories(), git.repositories(type='private')):
    do_things()
    wait_for_karma(git, msg="pausing")

How to know if I used HTTP while HTTPS was available in Python

I am trying to write a script that checks whether HTTPS was available where I used HTTP.
My idea was to collect all of the HTTP links and use urllib2 to open a connection to the server over HTTPS, as follows (please ignore any syntax problems; I have simplified the code so the problem itself is easier to understand):
count = 0
for packet in trafficPackets:
    if packet["http.host"] != None:
        if https_supported(packet["ip.dest"]):
            count += 1
where https_supported is the following function:
from urllib2 import urlopen

def https_supported(ip):
    try:
        if len(urlopen("https://" + ip).read()) > 0:
            return True
    except:
        return False
    return False
I have tried to run the code on a little traffic file which contains an HTTP connection to a site that supports https but the result was unexpected- it was always returned zero.
Where did I go wrong ? Does anyone have an idea of how can I do it?
Thank you!
Using the exact same code with the http.host field instead of the IP seems to work.
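In other words, a minimal sketch of that change (same packet fields as in the question; probing the hostname rather than the destination IP also plays nicer with virtual hosting and SNI):
count = 0
for packet in trafficPackets:
    if packet["http.host"] != None:
        if https_supported(packet["http.host"]):
            count += 1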

LDAP: ldap.SIZELIMIT_EXCEEDED

I am getting an ldap.SIZELIMIT_EXCEEDED error when I run this code:
import ldap
url = 'ldap://<domain>:389'
binddn = 'cn=<username> readonly,cn=users,dc=tnc,dc=org'
password = '<password>'
conn = ldap.initialize(url)
conn.simple_bind_s(binddn,password)
base_dn = "ou=People,dc=tnc,dc=org"
filter = '(objectClass=*)'
attrs = ['sn']
conn.search_s( base_dn, ldap.SCOPE_SUBTREE, filter, attrs )
Where username is my actual username, password is my actual password, and domain is the actual domain.
I don't understand why this is. Can somebody shed some light?
Manual: http://www.python-ldap.org/doc/html/ldap.html
exception ldap.SIZELIMIT_EXCEEDED
An LDAP size limit was exceeded. This could be due to a sizelimit configuration on the LDAP server.
I think your best bet here is to set a size limit on the results you request from the server. You can do that by setting the attribute LDAPObject.sizelimit (deprecated) or by using the sizelimit parameter of search_ext().
You should also make sure your bind was actually successful...
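A minimal sketch of the client-requested limit (keyword name per python-ldap's search_ext_s; note that hitting either the client or the server limit still raises SIZELIMIT_EXCEEDED with the synchronous call, so the paged-results approach below is the more robust fix):
import ldap

conn = ldap.initialize('ldap://<domain>:389')
conn.simple_bind_s(binddn, password)
# Ask the server for at most 100 entries; a server-imposed limit still takes precedence.
results = conn.search_ext_s(
    "ou=People,dc=tnc,dc=org",
    ldap.SCOPE_SUBTREE,
    '(objectClass=*)',
    ['sn'],
    sizelimit=100,
)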
You're encountering that exception most likely because the server you're communicating with has more results than can be returned by a single request. In order to get around this you need to use paged results which can be done by using SimplePagedResultsControl.
Here's a Python3 implementation that I came up with after heavily editing what I found here and in the official documentation. At the time of writing this it works with the pip3 package python-ldap version 3.2.0.
import ldap
from ldap.controls import SimplePagedResultsControl

def get_list_of_ldap_users():
    hostname = "<domain>:389"
    username = "username_here"
    password = "password_here"
    base = "ou=People,dc=tnc,dc=org"
    print(f"Connecting to the LDAP server at '{hostname}'...")
    connect = ldap.initialize(f"ldap://{hostname}")
    connect.set_option(ldap.OPT_REFERRALS, 0)
    connect.simple_bind_s(username, password)
    search_flt = "(objectClass=*)"
    page_size = 500  # how many users to search for in each page, this depends on the server maximum setting (default highest value is 1000)
    searchreq_attrlist = ["sn"]  # change these to the attributes you care about
    req_ctrl = SimplePagedResultsControl(criticality=True, size=page_size, cookie='')
    msgid = connect.search_ext(base=base, scope=ldap.SCOPE_SUBTREE, filterstr=search_flt, attrlist=searchreq_attrlist, serverctrls=[req_ctrl])
    total_results = []
    pages = 0
    while True:  # loop over all of the pages using the same cookie, otherwise the search will fail
        pages += 1
        rtype, rdata, rmsgid, serverctrls = connect.result3(msgid)
        for user in rdata:
            total_results.append(user)
        pctrls = [c for c in serverctrls if c.controlType == SimplePagedResultsControl.controlType]
        if pctrls:
            if pctrls[0].cookie:  # Copy cookie from response control to request control
                req_ctrl.cookie = pctrls[0].cookie
                msgid = connect.search_ext(base=base, scope=ldap.SCOPE_SUBTREE, filterstr=search_flt, attrlist=searchreq_attrlist, serverctrls=[req_ctrl])
            else:
                break
        else:
            break
    return total_results
This will return a list of all users but you can edit it as required to return what you want without hitting the SIZELIMIT_EXCEEDED issue :)
see here for what to do when you get this error:
How to get more search results than the server's sizelimit with Python LDAP?
The filter you provided (objectClass=*) is a presence filter. In this case it limits the results to the search request to objects in the directory at and underneath the base object you supplied - which is every object underneath the base object since every object has at least one objectClass. Restrict your search by using a more restrictive filter, or a tighter scope, or a lower base object, or all three. For more information on the topic of the search request, see Using ldapsearch and LDAP: Programming Practices.
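For instance, a purely illustrative, more restrictive search (hypothetical narrower base and filter; adjust to your directory):
# Only person entries whose surname starts with "Sm", one level under a narrower base.
conn.search_s(
    "ou=Staff,ou=People,dc=tnc,dc=org",  # lower base object (hypothetical)
    ldap.SCOPE_ONELEVEL,                 # tighter scope
    "(&(objectClass=person)(sn=Sm*))",   # more restrictive filter
    ["sn"],
)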
Directory Server administrators are free to impose a server-wide limit on entries that can be returned to LDAP clients, these are known as a server-imposed size limit. There is a time limit which follows the same rules.
LDAP clients should always supply a size limit and time limit with a search request, these limits, known as client-requested limits cannot override the server-imposed limits, however.
Active Directory defaults to returning a max of 1000 results. What is sort of annoying is that rather than return 1000, with an associated error code, it seems to send the error code without the data.
eDirectory starts with no default, and is completely configurable to whatever you like.
Other directories handle it differently. (Edit and add in, if you know).
You must use paged search to achieve this.
The page size would depend on your ldap server, 1000 would work for Active Directory.
Have a look at http://google-apps-for-your-domain-ldap-sync.googlecode.com/svn/trunk/ldap_ctxt.py for an example
