This is a short question:
Where does the function getBetweenHTML() come from? Is it from urllib2 or something else? I am not sure.
Could anyone also give me an explanation of this function, or suggest better alternatives? Thanks.
Code syntax:
import urllib2
url = urllib2.urlopen('http://google.com').read()
scraped = getBetweenHTML(url, '<div class="question">', "</div>")
print scraped
e.g. prints a question
Edit: I've found the solution. I had actually written this function myself in bs4, which is why I couldn't find it anywhere. For anyone who needs it:
def getBetweenHTML(strSource, strStart, strEnd):
    start = strSource.find(strStart) + len(strStart)
    end = strSource.find(strEnd, start)
    return strSource[start:end]
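One weakness of the function above is that str.find() returns -1 when a marker is missing, so a failed search quietly returns the wrong slice instead of signalling a problem. Below is a minimal sketch of a stricter variant, written for Python 3. The name get_between and the choice to return None on failure are assumptions of this sketch, not part of the original post.

```python
def get_between(source, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker.

    Returns None when either marker is missing (an assumed convention).
    """
    start = source.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = source.find(end_marker, start)
    if end == -1:
        return None
    return source[start:end]


html = '<div class="question">What is 2 + 2?</div>'
print(get_between(html, '<div class="question">', "</div>"))  # What is 2 + 2?
print(get_between(html, "<span>", "</span>"))  # None
```

For real pages with nested tags, an HTML parser such as BeautifulSoup is likely to be more robust than substring searching.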
When I try to parse data from a REST API, it raises a TypeError.
This is my code:
import requests

def get_contracts():
    response_object = requests.get(
        "https://testnet-api.phemex.com/md/orderbook?symbol=BTCUSD"
    )
    print(response_object.status_code)
    for contract in response_object.json()["result"]["book"]:
        print(contract["asks"])

get_contracts()
Any tip or solution would be very welcome. Thanks in advance.
Edit/Update:
For some reason I am not able to select a specific key in the format above. It's only possible if I do it like this:
data = response_object.json()['result']['book']['asks']
print(data)
I will try to work my code around that. Thanks to everyone who helped.
This code review may help you:
import requests
url = "https://testnet-api.phemex.com/md/orderbook?symbol=BTCUSD"
response_object = requests.get(url)
data = response_object.json()
# Printing your data helps to inspect the structure
# print(data)
# This is the list you are looking for:
asks = data['result']['book']['asks']
for ask in asks:
    print(ask)
You need to iterate through asks, not book.
You have a nested dictionary where asks is a nested list.
If you simply click on the link you are getting, or print out your response_object.json(), you will see the structure.
for foo in response_object.json()['result']['book']['asks']:
    print(foo)
Generally, though, it's better to assign the result of response_object.json() to a variable:
data = response_object.json()
for foo in data['result']['book']['asks']:
    print(foo)
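To see why the original loop fails, it can help to reproduce the structure offline. The sketch below uses made-up numbers. Only the nesting (result, then book, then asks) is taken from the answers here, and the bids key is an assumption added for illustration.

```python
# Made-up sample shaped like the structure described in the answers.
data = {
    "result": {
        "book": {
            "asks": [[87705000, 1000000], [87710000, 200000]],
            "bids": [[87700000, 2000000]],  # assumed sibling key
        }
    }
}

book = data["result"]["book"]

# Iterating over a dict yields its keys, which are plain strings.
keys = list(book)
print(keys)  # ['asks', 'bids']

# Indexing a string with another string is what raises the TypeError.
try:
    keys[0]["asks"]
except TypeError as error:
    print("TypeError:", error)

# Iterating over the nested list gives one entry per ask.
for ask in book["asks"]:
    print(ask)
```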
It looks like you are trying to access something that is not there, hence the error.
I would debug by simply printing the JSON object you get back and making sure that the keys you are trying to access are there.
I am using an API I found online for one of my scripts, and I am wondering if I can change one word from the API to something else. My code is:
import requests
people = requests.get('https://insult.mattbas.org/api/insult')
print("Welcome to the insult machine!\nType somebody you want to insult!")
b = input()
print(people.replace("You", b))
Is replace not a command? If so, what plugin and/or commands would I need to do it? Thanks!
The value returned from requests.get isn’t a string, it’s a response object and that class has no replace method.
Have a look at the structure of that class. For example, you can do r = requests.get(...) and r.text.replace(...).
In other words, you need to operate on the text part of the response object.
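A minimal sketch of that fix, in Python 3. The helper names personalize and fetch_insult_for are hypothetical. The network call is kept inside a function so the string handling can be tried without contacting the API, and requests is a third-party package that has to be installed.

```python
def personalize(insult_text, name):
    """Replace every "You" in the insult with the given name."""
    return insult_text.replace("You", name)


def fetch_insult_for(name):
    """Fetch an insult and personalize it (requires network access)."""
    import requests  # third-party; imported here so the rest runs without it

    response = requests.get("https://insult.mattbas.org/api/insult", timeout=10)
    # response is a Response object; its .text attribute is the str to edit.
    return personalize(response.text, name)


print(personalize("You are a silly goose", "Bob"))  # Bob are a silly goose
```

With this in place, the original script would call fetch_insult_for(b) instead of people.replace("You", b).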
I have a link as input below. I need to parse this link and insert "/#/c/" as shown below. Any input on how this can be done?
INPUT: https://link.com/617394/
OUTPUT: https://link.com/#/c/617394/
Try something such as:
from urlparse import urlsplit, urlunsplit
s = 'https://link.com/617394/'
split = urlsplit(s)
new_url = urlunsplit(split._replace(path='/#/c' + split.path))
# https://link.com/#/c/617394/
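The snippet above targets Python 2, where the module is called urlparse. In Python 3 the same functions live in urllib.parse. The sketch below is one possible port. The function name add_hash_route and the prefix parameter are assumptions, and it stores the moved path in the fragment field, since everything after '#' is the URL fragment.

```python
from urllib.parse import urlsplit, urlunsplit


def add_hash_route(url, prefix="/c"):
    """Move the URL's path behind '#', prefixed with the given route."""
    parts = urlsplit(url)
    # The old path becomes the fragment; the real path is reduced to "/".
    moved = parts._replace(path="/", fragment=prefix + parts.path)
    return urlunsplit(moved)


print(add_hash_route("https://link.com/617394/"))  # https://link.com/#/c/617394/
```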
"https://link.com/617394/".replace("m/6","m/#/c/6")
although I suspect your real problem is something else.
"com/#/c/".join("https://link.com/617394/".split("com/"))
may be slightly more applicable to your actual problem statement (though I still don't know what that is).
import re
my_link = "https://link.com/617394/"
print re.sub(r"\.(com|org|net)/", r".\1/#/c/", my_link)
is maybe more of what you're actually looking for...
That urlsplit solution of @JonClements is pretty dang sweet too.
I have an app that will show images from reddit. Some images come like this http://imgur.com/Cuv9oau, when I need to make them look like this http://i.imgur.com/Cuv9oau.jpg. Just add an (i) at the beginning and (.jpg) at the end.
You can use a string replace:
s = "http://imgur.com/Cuv9oau"
s = s.replace("//imgur", "//i.imgur")+(".jpg" if not s.endswith(".jpg") else "")
This sets s to:
'http://i.imgur.com/Cuv9oau.jpg'
This function should do what you need. I expanded on @jh314's response, made the code a little less compact, and added a check that the URL starts with http://imgur.com, since the original code would cause issues with other URLs, like the Google search URL I included. It also replaces only the first instance, since replacing every instance could cause issues.
def fixImgurLinks(url):
    if url.lower().startswith("http://imgur.com"):
        url = url.replace("http://imgur", "http://i.imgur", 1)  # Only replace the first instance.
        if not url.endswith(".jpg"):
            url += ".jpg"
    return url

for u in ["http://imgur.com/Cuv9oau", "http://www.google.com/search?q=http://imgur"]:
    print fixImgurLinks(u)
Gives:
>>> http://i.imgur.com/Cuv9oau.jpg
>>> http://www.google.com/search?q=http://imgur
You should use Python's regular expressions to place the i. As for the .jpg you can just append it.
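A minimal sketch of the regular-expression approach, in Python 3. The function name to_direct_imgur and the exact pattern are assumptions. The pattern only matches plain imgur.com page links, leaves anything else untouched, and assumes, as the question does, that the image is a .jpg.

```python
import re

# Matches e.g. http://imgur.com/Cuv9oau, but not i.imgur.com or other hosts.
_IMGUR_PAGE = re.compile(r"^(https?://)imgur\.com/(\w+)$")


def to_direct_imgur(url):
    """Turn an imgur page link into a direct i.imgur.com .jpg link."""
    match = _IMGUR_PAGE.match(url)
    if match is None:
        return url  # not a plain imgur page link, leave it alone
    return "{}i.imgur.com/{}.jpg".format(match.group(1), match.group(2))


print(to_direct_imgur("http://imgur.com/Cuv9oau"))  # http://i.imgur.com/Cuv9oau.jpg
```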
I'm trying to pass information to a python page via the url. I have the following link text:
"<a href='complete?id=%s'>" % (str(r[0]))
On the complete page, I have this:
import cgi
import MySQLdb

def complete():
    form = cgi.FieldStorage()
    db = MySQLdb.connect(user="", passwd="", db="todo")
    c = db.cursor()
    c.execute("delete from tasks where id =" + str(form["id"]))
    return "<html><center>Task completed! Click <a href='/chris'>here</a> to go back!</center></html>"
The problem is that when I go to the complete page, I get a KeyError on "id". Does anyone know how to fix this?
EDIT
When I run cgi.test() it gives me nothing.
I think something is wrong with the way I'm using the URL, because the id is not getting passed through.
The URL is basically localhost/chris/complete?id=1
/chris/ is a folder and complete is a function within index.py.
Am I formatting the URL the wrong way?
The error means that form["id"] failed to find the key "id" in cgi.FieldStorage().
To test what keys are in the called URL, use cgi.test():
cgi.test()
Robust test CGI script, usable as main program. Writes minimal HTTP headers and formats all information provided to the script in HTML form.
EDIT: a basic test script (using the python cgi module with Linux path) is only 3 lines. Make sure you know how to run it on your system, then call it from a browser to check arguments are seen on the CGI side. You may also want to add traceback formatting with import cgitb; cgitb.enable().
#!/usr/bin/python
import cgi
cgi.test()
Have you tried printing out the value of form to make sure you're getting what you think you're getting? You do have a little problem with your code though... you should be doing form["id"].value to get the value of the item from FieldStorage. Another alternative is to just do it yourself, like so:
import os
import cgi
query_string = os.environ.get("QUERY_STRING", "")
form = cgi.parse_qs(query_string)
This should result in something like this:
{'id': ['123']}
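In Python 3 the same parser is available as urllib.parse.parse_qs. The sketch below is a Python 3 version of the idea above. The helper name read_query and its environ parameter are assumptions, added so the parsing can be tried outside a web server.

```python
import os
from urllib.parse import parse_qs


def read_query(environ=os.environ):
    """Parse QUERY_STRING from a CGI-style environment mapping."""
    return parse_qs(environ.get("QUERY_STRING", ""))


# Simulate a request to complete?id=123 without running a server.
params = read_query({"QUERY_STRING": "id=123"})
print(params)  # {'id': ['123']}

# Each value is a list, because a key may appear more than once in a URL.
task_id = params.get("id", [None])[0]
print(task_id)  # 123
```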
First off, you should make dictionary lookups via
possibly_none = my_dict.get( "key_name" )
because this assigns None to the variable if the key is not in the dict. You can then use the
if possibly_none is not None:
    do_stuff
idiom (yes, I'm a fan of null checks and defensive programming in general...). The python documentation suggests something along these lines as well.
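The idiom can be sketched with a plain dict standing in for the parsed request parameters. The function name describe_task and the returned messages are hypothetical.

```python
def describe_task(params):
    """Look up "id" defensively instead of indexing with params["id"]."""
    task_id = params.get("id")  # None when "id" is absent, no KeyError
    if task_id is None:
        return "no id supplied"
    return "completing task {}".format(task_id)


print(describe_task({"id": "1"}))  # completing task 1
print(describe_task({}))           # no id supplied
```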
Without digging into the code too much, I think you should reference
form.getvalue( 'id' )
in order to extract the data you seem to be asking for (cgi.FieldStorage has no get() method; getvalue() returns None when the key is missing).