Parse a URL and replace variables if present - python

I have URIs specified in an xls file. I want to read that xls file, get each URI from it, parse it, replace variables (if present) with the corresponding values, and then make an API call to that URI.
For example:
These are a few URIs in the xls sheet:
https://api.something.com/v1/me
https://api.something.com/v1/{user_id}/account
(Here user_id is a variable that has to be replaced by an appropriate value.) Is there an easy way to parse the URI and check whether it contains a variable? If it does, I want to get the value of the variable, form a new string with that value, and use the resulting URI to make an API call. Otherwise I want to use the URI as is.

Field names can be discovered using stdlib string.Formatter:
>>> s = "https://api.something.com/v1/{user_id}/account"
>>> from string import Formatter
>>> parsed = Formatter().parse(s)
>>> field_names = []
>>> for literal_text, field_name, format_spec, conversion in parsed:
...     if field_name is not None:
...         field_names.append(field_name)
...
>>> field_names
['user_id']

Fortunately, Python has a built-in mechanism for handling this!
>>> 'https://api.something.com/v1/{user_id}/account'.format(user_id='my_id', unused_variable='xyzzy')
'https://api.something.com/v1/my_id/account'
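The two answers above can be combined into one helper. This is only a sketch: the name fill_uri and the values dictionary are hypothetical, and it assumes every variable is a plain {name} field.

```python
from string import Formatter


def fill_uri(template, values):
    """Return template with its {fields} filled from values, or unchanged if it has none."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for purely literal chunks.
    field_names = [name for _, name, _, _ in Formatter().parse(template) if name]
    if not field_names:
        return template  # nothing to substitute, use the URI as is
    missing = [name for name in field_names if name not in values]
    if missing:
        raise KeyError("no value supplied for: " + ", ".join(missing))
    # str.format ignores keyword arguments the template does not use
    return template.format(**values)


# Hypothetical lookup table; in practice these values come from your own data.
values = {"user_id": "my_id"}
plain = fill_uri("https://api.something.com/v1/me", values)
filled = fill_uri("https://api.something.com/v1/{user_id}/account", values)
```

Raising on a missing value is a design choice; returning the template untouched would also be reasonable if partially filled URIs are acceptable.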

Related

Extracting data from hyperlink cell in CSV

When I am reading a cell with hyperlink from CSV file I am getting the following:
=HYPERLINK("http://google.com","google") #for example
Is there a way to extract only the "google" without the =hyperlink and the link?
As per @martineau's comment, you have two versions of HYPERLINK.
>>> s1 = '=HYPERLINK("http://google.com","google")'
Or
>>> s2 = '=HYPERLINK("http://google.com")'
You could split the string or use a regex, but both approaches are tricky (what if there is a comma in the URL, or an escaped quote in the name?).
There is a module called ast that parses Python expressions. We can use it because Excel's function call syntax is close to Python's. Here's a version that returns the friendly name if there is one, and the URL otherwise:
>>> import ast
>>> ast.parse(s1[1:]).body[0].value.args[-1].s
'google'
And:
>>> ast.parse(s2[1:]).body[0].value.args[-1].s
'http://google.com'
This is how it works: s1[1:] removes the = sign. Then we take the value of the expression:
>>> v = ast.parse(s1[1:]).body[0].value
>>> v
<_ast.Call object at ...>
It is easy to extract the function name:
>>> v.func.id
'HYPERLINK'
And the args:
>>> [arg.s for arg in v.args]
['http://google.com', 'google']
Just take the last arg (...args[-1].s) to get the friendly name if it exists, and the URL otherwise. You can also check len(args) to do one thing if there is a single arg and something else if there are two.
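The same idea can be wrapped in a small function. This is a sketch, not part of the original answer: the name hyperlink_label is hypothetical, and it reads the arguments with ast.literal_eval rather than the .s attribute, since .s is deprecated on recent Python 3 versions. It assumes every argument is a simple quoted string.

```python
import ast


def hyperlink_label(cell):
    """Return the friendly name of a HYPERLINK formula, else its URL, else the cell unchanged."""
    if not cell.startswith("=HYPERLINK("):
        return cell  # not a HYPERLINK formula, leave it alone
    # Drop the leading "=" and parse the rest as a single expression (a Call node).
    call = ast.parse(cell[1:], mode="eval").body
    # literal_eval turns each string-literal argument node into a Python str.
    args = [ast.literal_eval(arg) for arg in call.args]
    return args[-1] if args else cell


label = hyperlink_label('=HYPERLINK("http://google.com","google")')
url_only = hyperlink_label('=HYPERLINK("http://google.com")')
untouched = hyperlink_label("plain text")
```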

How to parse python b' before dict

How to access id or nickname value with python3:
response._content = b'{"id":44564,"nickname":"Demo"}'
It looks like you're trying to read in a JSON string and convert it to a dict, e.g.:
import json
# response._content = b'{"id":44564,"nickname":"Demo"}'
data = json.loads(response._content.decode('utf-8'))
# data = {'id': 44564, 'nickname': 'Demo'}
This is a byte string that contains JSON, as stated above. Another way to look at it is that it is a dict definition (i.e. Python code), so you can use eval for that:
foo = eval( b'{"id":44564,"nickname":"Demo"}')
foo['nickname']
This is probably not the preferred or secure way to do it, because eval is considered dangerous:
https://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
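If the goal is only to avoid eval, two standard-library options may be worth a look. The sketch below assumes Python 3.6 or later, where json.loads accepts bytes directly, and it reuses the sample payload from the question. ast.literal_eval only evaluates literals, but it needs a str, and it would reject JSON-only tokens such as true or null.

```python
import ast
import json

raw = b'{"id":44564,"nickname":"Demo"}'

# json.loads accepts bytes (Python 3.6+), so no explicit decode is needed.
data = json.loads(raw)

# literal_eval is a restricted alternative to eval; it needs a str, not bytes.
same = ast.literal_eval(raw.decode("utf-8"))

nickname = data["nickname"]
```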

How can I get only the latest file/files created/modified on S3 location through python

Using boto, I tried the code below:
from boto.s3.connection import S3Connection
conn = S3Connection('XXX', 'YYYY')
bucket = conn.get_bucket('myBucket')
file_list = bucket.list('just/a/prefix/')
but I am unable to get the length of the list or the last element of file_list, because it is a BucketListResultSet. Please suggest a solution for this scenario.
You are using the boto library, which is obsolete and no longer maintained; the number of open issues with it keeps growing.
It is better to use boto3, which is actively developed.
First, let us define parameters of our search:
>>> bucket_name = "bucket_of_m"
>>> prefix = "region/cz/"
Import boto3 and create s3, which represents the S3 resource:
>>> import boto3
>>> s3 = boto3.resource("s3")
Get the bucket:
>>> bucket = s3.Bucket(name=bucket_name)
>>> bucket
s3.Bucket(name='bucket_of_m')
Define filter for objects with given prefix:
>>> res = bucket.objects.filter(Prefix=prefix)
>>> res
s3.Bucket.objectsCollection(s3.Bucket(name='bucket_of_m'), s3.ObjectSummary)
and iterate over it:
>>> for obj in res:
...     print(obj.key)
...     print(obj.size)
...     print(obj.last_modified)
...
Each obj is an ObjectSummary (not the Object itself), but it holds enough to learn something about it:
>>> obj
s3.ObjectSummary(bucket_name='bucket_of_m', key=u'region/cz/Ostrava/Nadrazni.txt')
>>> type(obj)
boto3.resources.factory.s3.ObjectSummary
You can get Object from it and use it as you need:
>>> o = obj.Object()
>>> o
s3.Object(bucket_name='bucket_of_m', key=u'region/cz/rodos/fusion/AdvancedDataFusion.xml')
There are not so many options for filtering, but prefix is available.
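To answer the "latest file" part of the question, the summaries can be sorted by their last_modified attribute. This is a sketch only: latest_objects is a hypothetical helper, and the SimpleNamespace objects are stand-ins for ObjectSummary so the example runs without AWS access.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def latest_objects(summaries, count=1):
    """Return the `count` most recently modified items, newest first."""
    return sorted(summaries, key=lambda obj: obj.last_modified, reverse=True)[:count]


# Stand-ins for boto3 ObjectSummary instances (made-up keys and timestamps).
fake_summaries = [
    SimpleNamespace(key="region/cz/a.txt", last_modified=datetime(2018, 1, 5, tzinfo=timezone.utc)),
    SimpleNamespace(key="region/cz/b.txt", last_modified=datetime(2018, 3, 1, tzinfo=timezone.utc)),
    SimpleNamespace(key="region/cz/c.txt", last_modified=datetime(2018, 2, 9, tzinfo=timezone.utc)),
]

newest = latest_objects(fake_summaries)
two_newest = latest_objects(fake_summaries, count=2)

# With boto3 the call would look like this (not run here):
# latest_objects(bucket.objects.filter(Prefix=prefix))
```

Note that this reads the whole listing before sorting, because S3 returns keys in key order rather than by modification time.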
As an addendum to Jan's answer:
Seems that the boto3 library has changed in the meantime and currently (version 1.6.19 at the time of writing) offers more parameters for the filter method:
object_summary_iterator = bucket.objects.filter(
    Delimiter='string',
    EncodingType='url',
    Marker='string',
    MaxKeys=123,
    Prefix='string',
    RequestPayer='requester'
)
Three useful parameters to limit the number of entries for your scenario are Marker, MaxKeys and Prefix:
Marker (string) -- Specifies the key to start with when listing objects in a bucket.
MaxKeys (integer) -- Sets the maximum number of keys returned in the response. The response might contain fewer keys but will never contain more.
Prefix (string) -- Limits the response to keys that begin with the specified prefix.
Two notes:
The key you specify for Marker will not be included in the result, i.e. the listing starts from the key following the one you specify as Marker.
The boto3 library is performing automatic pagination on the results. The size of each page is determined by the MaxKeys parameter of the filter function (defaulting to 1000).
If you iterate over the s3.Bucket.objectsCollection object past the end of a page, it will automatically download the next page. While this is generally useful, it can be surprising when you specify e.g. MaxKeys=10 and expect to iterate over only 10 keys: the iterator will still go over all matched keys, just with a new request to the server every 10 keys.
So, if you just want e.g. the first three results, break out of the loop manually; don't rely on the iterator to stop.
(Unfortunately this is not clear in the docs (it's actually quite wrong), as the library parameter description is copied from the API parameter description, where it actually makes sense: "The response might contain fewer keys but will never contain more.")
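One way to stop early without a manual break is itertools.islice, which stops pulling items once it has the requested number. In this sketch first_n is a hypothetical helper, and fake_listing is a stand-in generator that records how many items were actually fetched.

```python
from itertools import islice


def first_n(collection, n):
    """Return at most n items, without consuming the rest of the iterable."""
    return list(islice(collection, n))


def fake_listing(log):
    """Stand-in for a paginated collection; records every item it hands out."""
    for i in range(1000):
        log.append(i)
        yield "key-%d" % i


consumed = []
first_three = first_n(fake_listing(consumed), 3)

# With boto3 the call would look like this (not run here):
# first_n(bucket.objects.filter(Prefix=prefix), 3)
```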

Python Environment variable within environment variable

I'm trying to set up an environment variable via Python:
os.environ["myRoot"]="/home/myName"
os.environ["subDir"]="$myRoot/subDir"
I expect the subDir environment variable to hold /home/myName/subDir, but it holds the literal string '$myRoot/subDir'. How do I get this behaviour?
(Bigger picture : I'm reading a json file of environment variables and the ones lower down reference the ones higher up)
Use os.environ to fetch the value, and os.path to correctly put slashes in the right places:
os.environ["myRoot"]="/home/myName"
os.environ["subDir"] = os.path.join(os.environ['myRoot'], "subDir")
You can use os.path.expandvars to expand environment variables like so:
>>> import os
>>> print(os.path.expandvars("My home directory is $HOME"))
My home directory is /home/Majaha
>>>
For your example, you might do:
os.environ["myRoot"] = "/home/myName"
os.environ["subDir"] = os.path.expandvars("$myRoot/subDir")
I think @johntellsall's answer is better for the specific example you gave, but I don't doubt you'll find this useful for your JSON work.
Edit: I would now recommend using @johntellsall's answer, as os.path.expandvars() is designed explicitly for use with paths, so using it for arbitrary strings may work but is kinda hacky.
import json
import re

def fix_text(txt, data):
    '''txt is the string to fix, data is the dictionary with the variable names/values'''
    def fixer(m):  # takes a regex match
        match = m.group(1)  # there is only one group, so that is all we worry about
        # return the replacement, or the original "$name" if it is not in the dictionary
        return data.get(match, "$%s" % match)
    # match a literal "$" (escaped, since a bare $ means end of string) followed by 1 or more letters
    return re.sub(r"\$([a-zA-Z]+)", fixer, txt)

with open("some.json") as f:  # open the JSON file to read
    file_text = f.read()
data = json.loads(file_text)  # load it into a dictionary
# try to ensure you evaluate them in the order you found them
keys = sorted(data.keys(), key=file_text.index)
# create a new dictionary by mapping our ordered keys above to "fixed" strings that support simple variables
data2 = dict((k, fix_text(data[k], data)) for k in keys)
# sanity check
print(data2)
[edited to fix a typo that would cause it not to work]
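If later entries should see the already-expanded values of earlier ones (the "bigger picture" in the question), the standard library's string.Template may be enough. This is a sketch under assumptions: expand_in_order is a hypothetical helper, all values are strings, and it relies on json.loads keeping the file's key order, which holds for the built-in dict in Python 3.7+.

```python
import json
from string import Template


def expand_in_order(raw):
    """Expand $name references, letting each entry use the entries before it."""
    expanded = {}
    for name, value in raw.items():
        # safe_substitute leaves unknown $names untouched instead of raising
        expanded[name] = Template(value).safe_substitute(expanded)
    return expanded


# Made-up file contents mirroring the question's example.
text = '{"myRoot": "/home/myName", "subDir": "$myRoot/subDir", "logs": "$subDir/logs", "x": "$unknown/y"}'
resolved = expand_in_order(json.loads(text))
```

The resolved values could then be copied into os.environ with os.environ.update(resolved).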

How to retrieve GET vars in python bottle app

I'm trying to make a simple REST api using the Python bottle app.
I'm facing a problem in retrieving the GET variables from the request global object.
Any suggestions how to retrieve this from the GET request?
They are stored in the request.query object.
http://bottlepy.org/docs/dev/tutorial.html#query-variables
It looks like you can also access them by treating the request.query attribute like a dictionary:
request.query['city']
So dict(request.query) would create a dictionary of all the query parameters.
As @mklauber notes, this will not work for multi-byte characters. It looks like the best method is:
my_dict = request.query.decode()
or:
dict(request.query.decode())
to have a dict instead of a <bottle.FormsDict object at 0x000000000391B...> object.
If you want them all:
from urllib.parse import parse_qs
params = parse_qs(request.query_string)  # avoid shadowing the built-in dict
If you want one:
one = request.GET.get('one', '').strip()
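One thing to keep in mind with parse_qs is that it maps each name to a list of values, because a parameter may appear more than once. The sketch below uses a made-up query string in place of request.query_string so it runs without a Bottle app.

```python
from urllib.parse import parse_qs

# Made-up query string standing in for request.query_string.
query_string = "city=Berlin&tag=a&tag=b"

# Every value is a list, even when the parameter appears only once.
all_values = parse_qs(query_string)

# Keep just the first value of each parameter if single values are enough.
first_values = {name: vals[0] for name, vals in all_values.items()}
```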
Can you try this please:
For this example : http://localhost:8080/command?param_name=param_value
In your code:
param_value = request.query.param_name
From the docs:
name = request.cookies.name
# is a shortcut for:
name = request.cookies.getunicode('name') # encoding='utf-8' (default)
# which basically does this:
try:
    name = request.cookies.get('name', '').decode('utf-8')
except UnicodeError:
    name = u''
So you might prefer the attribute accessor (request.query.variable_name) over request.query.get('variable_name').
Another point: you can use request.params.variable_name, which works for both GET and POST, instead of having to switch between request.query.variable_name and request.forms.variable_name depending on the method.
