How do I get the response text from a treq request? - python

I am trying to get started with some example code of the treq library, to little avail. While it is easy to get the status code and a few other properties of the response to the request, getting the actual text of the response is a little more difficult. The print_response function available in this example code is not present in the version that I have:
from twisted.internet.task import react
from _utils import print_response
import treq
def main(reactor, *args):
    d = treq.get('http://httpbin.org/get')
    d.addCallback(print_response)
    return d

react(main, [])
Here is the traceback:
Traceback (most recent call last):
  File "test.py", line 2, in <module>
    from _utils import print_response
ModuleNotFoundError: No module named '_utils'
I am not really sure where to go from here...any help would be greatly appreciated.

Now that I look at it, that example is extremely bad, especially if you're new to Twisted. Please give this a try:
import treq
from twisted.internet import defer, task
def main(reactor):
    d = treq.get('https://httpbin.org/get')
    d.addCallback(print_results)
    return d

@defer.inlineCallbacks
def print_results(response):
    content = yield response.content()
    text = yield response.text()
    json = yield response.json()
    print(content)  # raw content in bytes
    print(text)     # decoded text
    print(json)     # JSON-decoded body

task.react(main)
The only thing you really have to know is that .content(), .text(), and .json() return Deferred objects that eventually fire with the body of the response. For this reason, you need to yield them (inside an inlineCallbacks function) or attach callbacks.
Let's say you only want the text content; you could do this:
def main(reactor):
    d = treq.get('https://httpbin.org/get')
    d.addCallback(treq.text_content)
    d.addCallback(print)  # replace print with your own callback function
    return d
The treq.content() family of functions (treq.text_content(), treq.json_content()) makes it easy to get just the body, if that's all you care about, and do stuff with it.
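On a reasonably recent Twisted (16.4 or later), you can also write this with async/await instead of inlineCallbacks. A minimal sketch, assuming your treq and Twisted versions support coroutines:
import treq
from twisted.internet import defer, task

async def fetch_text(url):
    # treq.get() returns a Deferred, which can be awaited inside a coroutine
    response = await treq.get(url)
    text = await response.text()  # also a Deferred, resolving to the decoded body
    print(text)

def main(reactor):
    # task.react expects a Deferred, so wrap the coroutine with ensureDeferred
    return defer.ensureDeferred(fetch_text('https://httpbin.org/get'))

task.react(main)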

Related

Poloniex Api Trouble

So I'm accessing the Poloniex API with Python, and this is my code:
from poloniex import Poloniex
import krakenex
import threading
import pprint
import urllib.request
import json
####POLONIEX####
#FUNCTIONS
polo = Poloniex()
def BTC_USDT_LAST_POLONIEX():
    polo = Poloniex()
    threading.Timer(1.0, BTC_USDT_LAST_POLONIEX).start()  # called every second
    print("BTC Last Price = " + (polo('returnTicker')['USDT_BTC']['last']))

def POLONIEX_ASSET_LIST():
    pprint.pprint(sorted(list(polo('returnTicker'))))
Everything is working so far, and I want to avoid using urllib, as it's a pain to turn an HTTP response into a list. I'm trying to access the order book but get the following error:
>>> polo('returnOrderBook')
Traceback (most recent call last):
  File "<pyshell#27>", line 1, in <module>
    polo('returnOrderBook')
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/poloniex/retry.py", line 15, in wrapped
    return function(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/poloniex/__init__.py", line 183, in __call__
    return self.parseJson(ret.text)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/poloniex/__init__.py", line 197, in parseJson
    raise PoloniexError(jsonout['error'])
poloniex.PoloniexError: Please specify a currency pair.
I've tried specifying the currency pair but have no idea how to plug it in.
Rewrite your code and use the requests module instead of urllib:
import requests

ret = requests.get('http://poloniex.com/public?command=returnOrderBook&currencyPair=BTC_BCN').json()
print(ret)
>>> {u'bids': [[u'0.00000034', 20629605.566027], [u'0.00000033', 43382683.465305], [u'0.00000032', 70007976.087993], [u'0.00000031', 49571221.248027], [u'0.00000030', 77520227.415484], [u'0.00000029', 46037827.046996], [u'0.00000028', 26267440.401662], [u'0.00000027', 22511987.85933], [u'0.00000026', 18885378.040015], [u'0.00000025', 13313109.292994], [u'0.00000024', 6243527.5236432], [u'0.00000023', 7504850.7832509], [u'0.00000022', 8443683.7997507], [u'0.00000021', 8996262.9826951], [u'0.00000020', 24601532.006268], [u'0.00000019', 26853346.478659], [u'0.00000018', 6027262.24889 etc....
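If you want to keep the currency pair configurable rather than hard-coding it into the URL, requests can build the query string for you. A small sketch (BTC_BCN is just an example pair):
import requests

def order_book(pair):
    # The public Poloniex endpoint takes the command and the currency pair as query parameters
    params = {'command': 'returnOrderBook', 'currencyPair': pair}
    return requests.get('http://poloniex.com/public', params=params).json()

print(order_book('BTC_BCN'))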

Unit testing bottle py application that uses request body results in KeyError: 'wsgi.input'

When unit testing a bottle py route function:
from bottle import request, run, post
@post("/blah/<boo>")
def blah(boo):
    body = request.body.readline()
    return "body is %s" % body

blah("booooo!")
The following exception is raised:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in blah
  File "bottle.py", line 1197, in body
    self._body.seek(0)
  File "bottle.py", line 166, in __get__
    if key not in storage: storage[key] = self.getter(obj)
  File "bottle.py", line 1164, in _body
    read_func = self.environ['wsgi.input'].read
KeyError: 'wsgi.input'
The code works when running as a server via bottle's run function; the failure only happens when I call it as a normal Python function, e.g. in a unit test.
What am I missing? How can I invoke this as a normal python func inside a unit test?
I eventually worked out what the problem is. I needed to "fake" the request environment for bottle to play nicely:
from bottle import request, run, post, tob
from io import BytesIO
body = "abc"
request.environ['CONTENT_LENGTH'] = str(len(tob(body)))
request.environ['wsgi.input'] = BytesIO()
request.environ['wsgi.input'].write(tob(body))
request.environ['wsgi.input'].seek(0)
# Now call your route function and assert
Another issue is that Bottle uses thread locals, and it reads the BytesIO object you put into request.environ the first time you access the body property on request. Therefore, if you run multiple tests with POST data, e.g. in a TestCase, the body read in your route callback will only return the value it was initially given, not your updated value.
The solution is to scrub all the values stored on the request object before each test, so in your setUp(self) you can do something like this:
class MyTestCase(TestCase):
    def setUp(self):
        # Flush any values cached on the thread-local request object
        request.bind({})
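Putting both pieces together, a minimal unittest sketch might look like this (blah is the route function from the question; the body value is arbitrary):
import unittest
from io import BytesIO
from bottle import request, tob

class BlahTestCase(unittest.TestCase):
    def setUp(self):
        # Flush values cached on the thread-local request between tests
        request.bind({})

    def _fake_body(self, body):
        request.environ['CONTENT_LENGTH'] = str(len(tob(body)))
        request.environ['wsgi.input'] = BytesIO(tob(body))

    def test_blah_reads_body(self):
        self._fake_body("abc")
        result = blah("booooo!")      # the route function from the question above
        self.assertIn("abc", result)  # "body is abc"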
Check out https://pypi.python.org/pypi/boddle. In your test you could do:
from bottle import request, run, post
from boddle import boddle
@post("/blah/<boo>")
def blah(boo):
    body = request.body.readline()
    return "body is %s" % body

with boddle(body='woot'):
    print blah("booooo!")
Which would print body is woot.
Disclaimer: I authored. (Wrote it for work.)

Nonetype object has no attribute '__getitem__'

I am trying to use an API wrapper downloaded from the net to get results from the new Azure Bing API. I'm trying to implement it as per the instructions but am getting this runtime error:
Traceback (most recent call last):
  File "bingwrapper.py", line 4, in <module>
    bingsearch.request("affirmative action")
  File "/usr/local/lib/python2.7/dist-packages/bingsearch-0.1-py2.7.egg/bingsearch.py", line 8, in request
    return r.json['d']['results']
TypeError: 'NoneType' object has no attribute '__getitem__'
This is the wrapper code:
import requests
URL = 'https://api.datamarket.azure.com/Data.ashx/Bing/SearchWeb/Web?Query=%(query)s&$top=50&$format=json'
API_KEY = 'SECRET_API_KEY'
def request(query, **params):
    r = requests.get(URL % {'query': query}, auth=('', API_KEY))
    return r.json['d']['results']
The instructions are:
>>> import bingsearch
>>> bingsearch.API_KEY='Your-Api-Key-Here'
>>> r = bingsearch.request("Python Software Foundation")
>>> r.status_code
200
>>> r[0]['Description']
u'Python Software Foundation Home Page. The mission of the Python Software Foundation is to promote, protect, and advance the Python programming language, and to ...'
>>> r[0]['Url']
u'http://www.python.org/psf/
This is my code that uses the wrapper (as per the instructions):
import bingsearch
bingsearch.API_KEY='abcdefghijklmnopqrstuv'
r = bingsearch.request("affirmative+action")
I tested this out myself, and it seems what you are missing is to properly URL-encode your query (and wrap it in single quotes). Without it, I was getting a 400 status code.
import urllib2
import requests
# note the single quotes surrounding the query
URL = "https://api.datamarket.azure.com/Data.ashx/Bing/SearchWeb/Web?Query='%(query)s'&$top=50&$format=json"
query = 'affirmative+action'
# query == 'affirmative%2Baction'
r = requests.get(URL % {'query': urllib2.quote(query)}, auth=('', API_KEY))
print r.json['d']['results']
Also, the usage example doesn't quite make sense: your request wrapper returns a list of results, yet the instructions call it and then check a status_code attribute on the return value (which is the list). That attribute exists on the response object, but you don't return it from your wrapper.
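If you want the HTTP error surfaced directly instead of the later NoneType error, here is a sketch of the wrapper that URL-encodes the query, written against requests 1.0+ where .json() is a method (the key and query below are placeholders):
import urllib2
import requests

API_KEY = 'Your-Api-Key-Here'
URL = ("https://api.datamarket.azure.com/Data.ashx/Bing/SearchWeb/Web"
       "?Query='%(query)s'&$top=50&$format=json")

def request(query):
    # URL-encode the query; the API key goes in as the basic-auth password
    r = requests.get(URL % {'query': urllib2.quote(query)}, auth=('', API_KEY))
    r.raise_for_status()             # raise on 4xx/5xx instead of failing later with NoneType
    return r.json()['d']['results']  # .json() is a method in requests >= 1.0

results = request("affirmative action")
print results[0]['Url']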

How do I search for text in a page using regular expressions in Python?

I'm trying to create a simple module for phenny, a simple IRC bot framework in Python. The module is supposed to go to http://www.isup.me/websitetheuserrequested to check if a website is up or down. I assumed I could use regex for the module, seeing as other built-in modules use it too, so I tried creating this simple script, although I don't think I did it right.
import re, urllib
import web
isupuri = 'http://www.isup.me/%s'
check = re.compile(r'(?ims)<span class="body">.*?</span>')
def isup(phenny, input):
    global isupuri
    global cleanup
    bytes = web.get(isupuri)
    quote = check.findall(bytes)
    result = re.sub(r'<[^>]*?>', '', str(quote[0]))
    phenny.say(result)
isup.commands = ['isup']
isup.priority = 'low'
isup.example = '.isup google.com'
It imports the required web packages (I think) and defines the URL string and the text to look for within the page. I really don't know what I did in those four lines; I kinda just ripped the code off another phenny module.
Here is an example of a quotes module that grabs a random quote from some webpage, I kinda tried to use that as a base: http://pastebin.com/vs5ypHZy
Does anyone know what I am doing wrong? If something needs clarified I can tell you, I don't think I explained this enough.
Here is the error I get:
Traceback (most recent call last):
  File "C:\phenny\bot.py", line 189, in call
    try: func(phenny, input)
  File "C:\phenny\modules\isup.py", line 18, in isup
    result = re.sub(r'<[^>]*?>', '', str(quote[0]))
IndexError: list index out of range
try this (from http://docs.python.org/release/2.6.7/library/httplib.html#examples):
import httplib
conn = httplib.HTTPConnection("www.python.org")
conn.request("HEAD", "/index.html")
res = conn.getresponse()
if res.status >= 200 and res.status < 300:
    print "up"
else:
    print "down"
You will also need to add code to follow redirects before checking the response status.
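For example, a minimal redirect-following HEAD check might look like this (a sketch; it does not handle relative Location headers or HTTPS):
import httplib
import urlparse

def head_status(url, max_redirects=5):
    # Issue HEAD requests, following Location headers up to max_redirects times
    for _ in range(max_redirects):
        parts = urlparse.urlparse(url)
        conn = httplib.HTTPConnection(parts.netloc)
        conn.request("HEAD", parts.path or "/")
        res = conn.getresponse()
        if res.status in (301, 302, 303, 307) and res.getheader('location'):
            url = res.getheader('location')
            continue
        return res.status
    return res.status

print "up" if 200 <= head_status("http://www.python.org/index.html") < 300 else "down"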
edit
Alternative that does not need to handle redirects but uses exceptions for logic:
import urllib2

request = urllib2.Request('http://google.com')
request.get_method = lambda: 'HEAD'
try:
    response = urllib2.urlopen(request)
    print "up"
    print response.code
except urllib2.URLError, e:
    # failure
    print "down"
    print e
You should do your own tests and choose the best one.
The error means your regexp wasn't found anywhere on the page (the list quote has no element 0).
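A more defensive version of the isup function would check that before indexing, and also actually interpolate the requested site into the URL (a sketch; input.group(2) is assumed to hold the command argument, per the usual phenny convention):
def isup(phenny, input):
    site = input.group(2)             # assumed: the argument given after ".isup"
    bytes = web.get(isupuri % site)   # fill the %s placeholder with the requested site
    quote = check.findall(bytes)
    if quote:
        result = re.sub(r'<[^>]*?>', '', quote[0]).strip()
        phenny.say(result)
    else:
        phenny.say("Couldn't find a status for %s on isup.me." % site)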

example urllib3 and threading in python

I am trying to use urllib3 in simple thread to fetch several wiki pages.
The script will create one connection for every thread (I don't understand why) and hang forever. Any tip, advice, or simple example of urllib3 and threading would be appreciated.
import threadpool
from urllib3 import connection_from_url

HTTP_POOL = connection_from_url(url, timeout=10.0, maxsize=10, block=True)

def fetch(url, fields):
    kwargs = {'retries': 6}
    return HTTP_POOL.get_url(url, fields, **kwargs)

pool = threadpool.ThreadPool(5)
requests = threadpool.makeRequests(fetch, iterable)
[pool.putRequest(req) for req in requests]
@Lennart's script got this error:
http://en.wikipedia.org/wiki/2010-11_Premier_LeagueTraceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/threadpool.py", line 156, in run
http://en.wikipedia.org/wiki/List_of_MythBusters_episodeshttp://en.wikipedia.org/wiki/List_of_Top_Gear_episodes http://en.wikipedia.org/wiki/List_of_Unicode_characters    result = request.callable(*request.args, **request.kwds)
  File "crawler.py", line 9, in fetch
    print url, conn.get_url(url)
AttributeError: 'HTTPConnectionPool' object has no attribute 'get_url'
(the same traceback is repeated for each of the other threads)
After adding import threadpool, import urllib3, and tpool = threadpool.ThreadPool(4), @user318904's code got this error:
Traceback (most recent call last):
  File "crawler.py", line 21, in <module>
    tpool.map_async(fetch, urls)
AttributeError: ThreadPool instance has no attribute 'map_async'
Here is my take, a more current solution using Python 3 and concurrent.futures.ThreadPoolExecutor.
import urllib3
from concurrent.futures import ThreadPoolExecutor
urls = ['http://en.wikipedia.org/wiki/2010-11_Premier_League',
        'http://en.wikipedia.org/wiki/List_of_MythBusters_episodes',
        'http://en.wikipedia.org/wiki/List_of_Top_Gear_episodes',
        'http://en.wikipedia.org/wiki/List_of_Unicode_characters',
        ]

def download(url, cmanager):
    response = cmanager.request('GET', url)
    if response and response.status == 200:
        print("+++++++++ url: " + url)
        print(response.data[:1024])

connection_mgr = urllib3.PoolManager(maxsize=5)
thread_pool = ThreadPoolExecutor(5)
for url in urls:
    thread_pool.submit(download, url, connection_mgr)
Some remarks
My code is based on a similar example from the Python Cookbook by Beazley and Jones.
I particularly like the fact that you only need a standard module besides urllib3.
The setup is extremely simple, and if you are only going for side-effects in download (like printing, saving to a file, etc.), there is no additional effort in joining the threads.
If you want something different, ThreadPoolExecutor.submit actually returns whatever download would return, wrapped in a Future (see the sketch after these remarks).
I found it helpful to align the number of threads in the thread pool with the number of HTTPConnections in the connection pool (via maxsize). Otherwise you might encounter (harmless) warnings when all threads try to access the same server (as in the example).
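For instance, if download returned the page body instead of printing it, you could pick the results up through the Futures. A small self-contained sketch along the same lines:
import urllib3
from concurrent.futures import ThreadPoolExecutor

urls = ['http://en.wikipedia.org/wiki/List_of_MythBusters_episodes',
        'http://en.wikipedia.org/wiki/List_of_Unicode_characters']

def download(url, cmanager):
    # Return the body so the caller can read it from the Future
    return cmanager.request('GET', url).data

cmanager = urllib3.PoolManager(maxsize=2)
with ThreadPoolExecutor(2) as pool:
    futures = {url: pool.submit(download, url, cmanager) for url in urls}
    for url, fut in futures.items():
        print(url, len(fut.result()), 'bytes')  # .result() blocks until that download finishes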
Obviously it will create one connection per thread; how else should each thread be able to fetch a page? And you are trying to use the same connection, made from one URL, for all URLs. That can hardly be what you intended.
This code worked just fine:
import threadpool
from urllib3 import connection_from_url
def fetch(url):
    kwargs = {'retries': 6}
    conn = connection_from_url(url, timeout=10.0, maxsize=10, block=True)
    print url, conn.get_url(url)
    print "Done!"

pool = threadpool.ThreadPool(4)
urls = ['http://en.wikipedia.org/wiki/2010-11_Premier_League',
        'http://en.wikipedia.org/wiki/List_of_MythBusters_episodes',
        'http://en.wikipedia.org/wiki/List_of_Top_Gear_episodes',
        'http://en.wikipedia.org/wiki/List_of_Unicode_characters',
        ]
requests = threadpool.makeRequests(fetch, urls)
[pool.putRequest(req) for req in requests]
pool.wait()
Thread programming is hard, so I wrote workerpool to make exactly what you're doing easier.
More specifically, see the Mass Downloader example.
To do the same thing with urllib3, it looks something like this:
import urllib3
import workerpool

# Use a separate name for the connection pool so it isn't shadowed by the worker pool below
http_pool = urllib3.connection_from_url("foo", maxsize=3)

def download(url):
    r = http_pool.get_url(url)
    # TODO: Do something with r.data
    print "Downloaded %s" % url

# Initialize a pool, 5 threads in this case
pool = workerpool.WorkerPool(size=5)

# The ``download`` method will be called with a line from the second
# parameter for each job.
pool.map(download, open("urls.txt").readlines())

# Send shutdown jobs to all threads, and wait until all the jobs have been completed
pool.shutdown()
pool.wait()
For more sophisticated code, have a look at workerpool.EquippedWorker (and the tests here for example usage). You can make the pool be the toolbox you pass in.
I use something like this:
# excluding setup for threadpool etc.
upool = urllib3.HTTPConnectionPool('en.wikipedia.org', block=True)
urls = ['/wiki/2010-11_Premier_League',
        '/wiki/List_of_MythBusters_episodes',
        '/wiki/List_of_Top_Gear_episodes',
        '/wiki/List_of_Unicode_characters',
        ]

def fetch(path):
    # add error checking
    return upool.get_url(path).data

tpool = ThreadPool()
tpool.map_async(fetch, urls)
# either wait on the result object or give map_async a callback function for the results
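For reference, the ThreadPool with map_async here is presumably multiprocessing.pool.ThreadPool, not the third-party threadpool package (which is why the question's attempt raised AttributeError), and current urllib3 spells get_url as request. A rough, updated sketch:
from multiprocessing.pool import ThreadPool  # not the third-party threadpool package
import urllib3

upool = urllib3.HTTPConnectionPool('en.wikipedia.org', maxsize=4, block=True)
paths = ['/wiki/List_of_MythBusters_episodes',
         '/wiki/List_of_Unicode_characters']

def fetch(path):
    # request() is the current urllib3 API; very old versions exposed get_url()
    return upool.request('GET', path).data

tpool = ThreadPool(4)
result = tpool.map_async(fetch, paths)
print([len(page) for page in result.get()])  # .get() waits for all fetches to finish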
