Parsing and appending a link - python

I have a link as input below. I need to parse this link and insert "/#/c/" as shown in the output below. Any inputs on how this can be done?
INPUT: https://link.com/617394/
OUTPUT: https://link.com/#/c/617394/

Try something such as:
from urlparse import urlsplit, urlunsplit
s = 'https://link.com/617394/'
split = urlsplit(s)
new_url = urlunsplit(split._replace(path='/#/c' + split.path))
# https://link.com/#/c/617394/

"https://link.com/617394/".replace("m/6","m/#/c/6")
although I suspect your real problem is something else.
"com/#/c/".join("https://link.com/617394/".split("com/"))
may be slightly more applicable to your actual problem statement (which I still don't know).
import re

my_link = "https://link.com/617394/"
print re.sub(r"\.(com|org|net)/", r".\1/#/c/", my_link)
maybe more of what you're actually looking for...
That urlsplit solution of @JonClements is pretty dang sweet too.
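If you are on Python 3, the same urlsplit approach works; only the import location changes. A minimal sketch, assuming you always want '/#/c' prepended to the existing path:

from urllib.parse import urlsplit, urlunsplit

s = 'https://link.com/617394/'
split = urlsplit(s)
# Prepend '/#/c' to the existing path and reassemble the URL
new_url = urlunsplit(split._replace(path='/#/c' + split.path))
print(new_url)  # https://link.com/#/c/617394/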

Related

TypeError: string indices must be integers when making rest api request

When I try to parse REST API data, it raises a TypeError.
This is my code:
import requests

def get_contracts():
    response_object = requests.get(
        "https://testnet-api.phemex.com/md/orderbook?symbol=BTCUSD"
    )
    print(response_object.status_code)
    for contract in response_object.json()["result"]["book"]:
        print(contract["asks"])

get_contracts()
Any tip or solution will be very welcomed. Thanks in advance.
Edit/Update:
For some reason I am not able to select a specific key in the format above; it's only possible if I do it like this:
data = response_object.json()['result']['book']['asks']
print(data)
I will try to work my code around that. Thanks to everyone who helped.
This code review may help you:
import requests
url = "https://testnet-api.phemex.com/md/orderbook?symbol=BTCUSD"
response_object = requests.get(url)
data = response_object.json()
# Printing your data helps to inspect the structure
# print(data)
# This is the list you are looking for:
asks = data['result']['book']['asks']
for ask in asks:
    print(ask)
You need to iterate through asks, not book.
You have a nested dictionary where asks is a nested list.
If you simply click on the link you are requesting, or print out response_object.json(), you will see the structure.
for foo in response_object.json()['result']['book']['asks']:
    print(foo)
Although generally it's better to assign response_object.json() to a variable:
data = response_object.json()
for foo in data['result']['book']['asks']:
    print(foo)
It looks like you are indexing the data in a way that doesn't match its structure, hence the TypeError.
I would debug with a simple print of the JSON object you are getting back and make sure that the keys you are trying to access are actually there.
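If you want to guard against the structure changing, you could also use dict.get() with defaults so a missing key yields an empty result instead of an exception. A minimal sketch, assuming the result/book/asks layout shown above:

import requests

url = "https://testnet-api.phemex.com/md/orderbook?symbol=BTCUSD"
data = requests.get(url).json()

# .get() with a default avoids an exception if a key is missing or renamed
asks = data.get('result', {}).get('book', {}).get('asks', [])
for ask in asks:
    print(ask)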

Python function getBetweenHTML

This is a short question:
Where does the function getBetweenHTML() come from? From urllib2 or somewhere else? I am not sure.
Could anyone also give me an explanation of this function, or suggest better alternatives? Thanks.
Code syntax:
import urllib2
url = urllib2.urlopen('http://google.com').read()
scraped = getBetweenHTML(url, '<div class="question">', "</div>")
print scraped
e.g. prints a question
Edit: I've found the solution. I had actually written my own function, which is why I couldn't find it documented anywhere; it's just plain string slicing. For anyone who needs it:
def getBetweenHTML(strSource, strStart, strEnd):
    start = strSource.find(strStart) + len(strStart)
    end = strSource.find(strEnd, start)
    return strSource[start:end]
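If you do want a library-backed alternative, BeautifulSoup can do the same extraction more robustly than raw string searching. A minimal sketch in the Python 2 / urllib2 style of the question (the div class "question" is just the example from the code above, and bs4 needs to be installed separately):

import urllib2
from bs4 import BeautifulSoup

html = urllib2.urlopen('http://google.com').read()
soup = BeautifulSoup(html, 'html.parser')

# Find the first <div class="question"> and print its text content
div = soup.find('div', class_='question')
if div is not None:
    print div.get_text()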

how do I modify a url that I pick at random in python

I have an app that will show images from reddit. Some images come like this: http://imgur.com/Cuv9oau, but I need to make them look like this: http://i.imgur.com/Cuv9oau.jpg. That is, add "i." at the start of the domain and ".jpg" at the end.
You can use a string replace:
s = "http://imgur.com/Cuv9oau"
s = s.replace("//imgur", "//i.imgur")+(".jpg" if not s.endswith(".jpg") else "")
This sets s to:
'http://i.imgur.com/Cuv9oau.jpg'
This function should do what you need. I expanded on @jh314's response, made the code a little less compact, and checked that the URL starts with http://imgur.com, since the original code would cause issues with other URLs, like the Google search I included below. It also only replaces the first occurrence, since replacing every match could cause issues.
def fixImgurLinks(url):
    if url.lower().startswith("http://imgur.com"):
        url = url.replace("http://imgur", "http://i.imgur", 1)  # Only replace the first instance.
        if not url.endswith(".jpg"):
            url += ".jpg"
    return url

for u in ["http://imgur.com/Cuv9oau", "http://www.google.com/search?q=http://imgur"]:
    print fixImgurLinks(u)
Gives:
>>> http://i.imgur.com/Cuv9oau.jpg
>>> http://www.google.com/search?q=http://imgur
You should use Python's regular expressions to insert the "i."; as for the ".jpg", you can just append it, as in the sketch below.
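A minimal sketch of that regex approach (fix_imgur is just a name I made up, and the pattern deliberately only touches plain imgur.com links, which is an assumption on my part):

import re

def fix_imgur(url):
    # Insert "i." after "http://" only for imgur.com links
    fixed = re.sub(r"^http://imgur\.com/", "http://i.imgur.com/", url)
    if fixed != url and not fixed.endswith(".jpg"):
        fixed += ".jpg"
    return fixed

print fix_imgur("http://imgur.com/Cuv9oau")  # http://i.imgur.com/Cuv9oau.jpg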

Why does my PUT request fail?

Using Python 2.5 and httplib...
I am admittedly a Python novice, but this seems straightforward; why doesn't this work?
from httplib import HTTPConnection

httpConn = HTTPConnection('127.0.0.1', 44789)
httpConn.request('PUT','/ShazaamMon/setmfgdata.cgi?serial=', hwSerialNum)
httpResp = httpConn.getresponse()
xmlResp = httpResp.read()
httpConn.close()
It returns the following response: <HTML><HEAD><TITLE>HTTP 404...
Any clues, anyone?
I think you should replace PUT with GET.
You should also consider sanitizing the input; try:
import urllib
httpConn.request('GET', '/ShazaamMon/setmfgdata.cgi?serial=%s' % urllib.quote(hwSerialNum))
HTTP 404 means that the resource you requested does not exist. Are you sure that the URL is correct?
Moreover, you are passing a variable as the body of the request (the third parameter of request()) when I think it is meant to be a query parameter.
Try the following:
httpConn.request('PUT','/ShazaamMon/setmfgdata.cgi?serial=' + str(hwSerialNum))
or maybe (if GET is required instead of PUT):
httpConn.request('GET','/ShazaamMon/setmfgdata.cgi?serial=' + str(hwSerialNum))
@Angelom's answer is concise and correct. For a nice, example-filled explanation of using PUT in urllib and urllib2, try http://www.voidspace.org.uk/python/articles/urllib2.shtml#data.
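Putting those suggestions together, here is a minimal Python 2 sketch of the corrected call, assuming the CGI endpoint actually expects GET and that hwSerialNum is a string (the serial value below is made up):

import urllib
from httplib import HTTPConnection

hwSerialNum = "ABC12345"  # hypothetical serial number, just for illustration

httpConn = HTTPConnection('127.0.0.1', 44789)
# Send the serial number as a URL-encoded query parameter instead of a request body
httpConn.request('GET', '/ShazaamMon/setmfgdata.cgi?serial=%s' % urllib.quote(hwSerialNum))
httpResp = httpConn.getresponse()
print httpResp.status, httpResp.reason
xmlResp = httpResp.read()
httpConn.close()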

BeautifulSoup question

<parent1>
<span>Text1</span>
</parnet1>
<parent2>
<span>Text2</span>
</parnet2>
<parent3>
<span>Text3</span>
</parnet3>
I'm parsing this with Python & BeautifulSoup. I have a variable soupData which holds a reference to the parsed object. How can I get a reference to parent2, for example, if I have the text Text2? So the problem is to filter span tags by their content. How can I do this?
After correcting the spelling on the end-tags:
[e for e in soup(recursive=False, text=False) if e.span.string == 'Text2']
I don't think there's a way to do it in a single step. So:
for parenttag in soupData:
    if parenttag.span.string == "Text2":
        do_stuff(parenttag)
        break
It's possible to use a generator expression (see the sketch just below), but it's not much shorter.
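For example, a generator expression version of the same filter might look like this (a sketch; soupData and do_stuff are the names used above, and the getattr guard just skips children that have no span):

parent = next((tag for tag in soupData
               if getattr(tag, 'span', None) and tag.span.string == "Text2"), None)
if parent is not None:
    do_stuff(parent)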
Using Python 2.7.6 and BeautifulSoup 4.3.2, I found Marcelo's answer gave an empty list. This worked for me, however:
[x.parent for x in bSoup.findAll('span') if x.text == 'Text2'][0]
Alternatively, for a ridiculously over-engineered solution (to this particular problem at least; it might be useful if you'll be filtering on criteria too long to fit in a readable list comprehension), you could do:
def hasText(text):
    def hasTextFunc(x):
        return x.text == text
    return hasTextFunc
to create a function factory, then
hasTextText2 = hasText('Text2')
filter(hasTextText2,bSoup.findAll('span'))[0].parent
to get the reference to the parent tag that you were looking for.
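For what it's worth, with a reasonably recent BeautifulSoup 4 you can also search on the span text directly and step up to the parent. A minimal sketch, assuming the HTML from the question (with the end-tag typos fixed) is in a string called html:

from bs4 import BeautifulSoup

html = """
<parent1><span>Text1</span></parent1>
<parent2><span>Text2</span></parent2>
<parent3><span>Text3</span></parent3>
"""

soup = BeautifulSoup(html, 'html.parser')
# Find the <span> whose text is exactly 'Text2', then take its parent tag
span = soup.find('span', text='Text2')
print span.parent.name  # parent2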
