NoneType object has no attribute '__getitem__' - Python

I am trying to use an API wrapper downloaded from the net to get results from the new Azure Bing API. I'm trying to implement it as per the instructions, but I get this runtime error:
Traceback (most recent call last):
File "bingwrapper.py", line 4, in <module>
bingsearch.request("affirmative action")
File "/usr/local/lib/python2.7/dist-packages/bingsearch-0.1-py2.7.egg/bingsearch.py", line 8, in request
return r.json['d']['results']
TypeError: 'NoneType' object has no attribute '__getitem__'
This is the wrapper code:
import requests
URL = 'https://api.datamarket.azure.com/Data.ashx/Bing/SearchWeb/Web?Query=%(query)s&$top=50&$format=json'
API_KEY = 'SECRET_API_KEY'
def request(query, **params):
    r = requests.get(URL % {'query': query}, auth=('', API_KEY))
    return r.json['d']['results']
The instructions are:
>>> import bingsearch
>>> bingsearch.API_KEY='Your-Api-Key-Here'
>>> r = bingsearch.request("Python Software Foundation")
>>> r.status_code
200
>>> r[0]['Description']
u'Python Software Foundation Home Page. The mission of the Python Software Foundation is to promote, protect, and advance the Python programming language, and to ...'
>>> r[0]['Url']
u'http://www.python.org/psf/'
This is my code that uses the wrapper (as per the instructions):
import bingsearch
bingsearch.API_KEY='abcdefghijklmnopqrstuv'
r = bingsearch.request("affirmative+action")

I tested this out myself, and it seems what you are missing is properly URL-encoding your query. Without it, I was getting a 400 status code.
import urllib2
import requests
API_KEY = 'SECRET_API_KEY'  # defined as in the wrapper above
# note the single quotes surrounding the query
URL = "https://api.datamarket.azure.com/Data.ashx/Bing/SearchWeb/Web?Query='%(query)s'&$top=50&$format=json"
query = 'affirmative+action'
# urllib2.quote(query) == 'affirmative%2Baction'
r = requests.get(URL % {'query': urllib2.quote(query)}, auth=('', API_KEY))
print r.json['d']['results']
Your example doesn't quite make sense: your request wrapper returns a list of results, yet in your usage example you call it and then check a status_code attribute on the return value (which is that list). status_code is an attribute of the response object, which your wrapper never returns.
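Tying the two issues together, here is a minimal sketch of the corrected wrapper; the raise_for_status() call and the final usage lines are my own additions, not part of the original wrapper:
import urllib2
import requests

URL = "https://api.datamarket.azure.com/Data.ashx/Bing/SearchWeb/Web?Query='%(query)s'&$top=50&$format=json"
API_KEY = 'SECRET_API_KEY'

def request(query, **params):
    # URL-encode the query and fail loudly on a 4xx/5xx response
    r = requests.get(URL % {'query': urllib2.quote(query)}, auth=('', API_KEY))
    r.raise_for_status()
    # on requests >= 1.0 json is a method, r.json(); older versions
    # exposed it as a property, r.json, as in the wrapper above
    return r.json['d']['results']

results = request("affirmative action")
print results[0]['Url']  # the wrapper returns a list, not a response object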

Related

Python - Web Scraping exercise - Attribute Error

I am learning how to scrape web information. Below is a snippet of the actual code solution plus output from DataCamp.
On DataCamp this works perfectly fine, but when I try to run it in Spyder (on my own MacBook), it doesn't work.
This is because on DataCamp the URL has already been pre-loaded into a variable named 'response', whereas in Spyder the response needs to be defined again.
So I first defined the response variable as response = requests.get('https://www.datacamp.com/courses/all') so that the code points to DataCamp's website.
My code looks like:
from scrapy.selector import Selector
import requests
response = requests.get('https://www.datacamp.com/courses/all')
this_url = response.url
this_title = response.xpath('/html/head/title/text()').extract_first()
print_url_title( this_url, this_title )
When I run this in Spyder, I get this error message:
Traceback (most recent call last):
File "<ipython-input-30-6a8340fd3a71>", line 11, in <module>
this_title = response.xpath('/html/head/title/text()').extract_first()
AttributeError: 'Response' object has no attribute 'xpath'
Could someone please guide me? I would really like to know how to get this code working in Spyder. Thank you very much.
The value returned by requests.get('https://www.datacamp.com/courses/all') is a Response object, and this object has no attribute xpath, hence the error: AttributeError: 'Response' object has no attribute 'xpath'.
I assume response in your tutorial source was assigned to a different object (most likely the object returned by etree.HTML), not the value returned by requests.get(url).
You can however do this:
from lxml import etree #import etree
response = requests.get('https://www.datacamp.com/courses/all') #get the Response object
tree = etree.HTML(response.text) #pass the page's source using the Response object
result = tree.xpath('/html/head/title/text()') #extract the value
print(response.url) #url
print(result) #findings
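Alternatively, since you already import Selector from scrapy, you can wrap the page source in a Selector yourself; this is a sketch of the same idea (assuming scrapy is installed), not part of the original answer:
from scrapy.selector import Selector
import requests

response = requests.get('https://www.datacamp.com/courses/all')
# build a Selector from the raw HTML; a requests Response has no .xpath of its own
sel = Selector(text=response.text)
this_title = sel.xpath('/html/head/title/text()').extract_first()
print(response.url)
print(this_title)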

Why does this python script work on my local machine but not on Heroku?

Hi there. I'm building a simple scraping tool. Here's the code that I have for it.
from bs4 import BeautifulSoup
import requests
from lxml import html
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import datetime
scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('Programming 4 Marketers-File-goes-here.json', scope)
site = 'http://nathanbarry.com/authority/'
hdr = {'User-Agent':'Mozilla/5.0'}
req = requests.get(site, headers=hdr)
soup = BeautifulSoup(req.content)
def getFullPrice(soup):
    divs = soup.find_all('div', id='complete-package')
    price = ""
    for i in divs:
        price = i.a
    completePrice = (str(price).split('$', 1)[1]).split('<', 1)[0]
    return completePrice

def getVideoPrice(soup):
    divs = soup.find_all('div', id='video-package')
    price = ""
    for i in divs:
        price = i.a
    videoPrice = (str(price).split('$', 1)[1]).split('<', 1)[0]
    return videoPrice
fullPrice = getFullPrice(soup)
videoPrice = getVideoPrice(soup)
date = datetime.date.today()
gc = gspread.authorize(credentials)
wks = gc.open("Authority Tracking").sheet1
row = len(wks.col_values(1))+1
wks.update_cell(row, 1, date)
wks.update_cell(row, 2, fullPrice)
wks.update_cell(row, 3, videoPrice)
This script runs on my local machine. But, when I deploy it as a part of an app to Heroku and try to run it, I get the following error:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.6/site-packages/gspread/client.py", line 219, in put_feed
r = self.session.put(url, data, headers=headers)
File "/app/.heroku/python/lib/python3.6/site-packages/gspread/httpsession.py", line 82, in put
return self.request('PUT', url, params=params, data=data, **kwargs)
File "/app/.heroku/python/lib/python3.6/site-packages/gspread/httpsession.py", line 69, in request
response.status_code, response.content))
gspread.exceptions.RequestError: (400, "400: b'Invalid query parameter value for cell_id.'")
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "AuthorityScraper.py", line 44, in
wks.update_cell(row, 1, date)
File "/app/.heroku/python/lib/python3.6/site-packages/gspread/models.py", line 517, in update_cell
self.client.put_feed(uri, ElementTree.tostring(feed))
File "/app/.heroku/python/lib/python3.6/site-packages/gspread/client.py", line 221, in put_feed
if ex[0] == 403:
TypeError: 'RequestError' object does not support indexing
What do you think might be causing this error? Do you have any suggestions for how I can fix it?
There are a couple of things going on:
1) The Google Sheets API returned an error: "Invalid query parameter value for cell_id":
gspread.exceptions.RequestError: (400, "400: b'Invalid query parameter value for cell_id.'")
2) A bug in gspread caused an exception upon receipt of the error:
TypeError: 'RequestError' object does not support indexing
Python 3 removed __getitem__ from BaseException, which this gspread error-handling code relies on. This doesn't matter too much, because it would have raised an UpdateCellError exception anyway.
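As an illustration of the pattern (this is a sketch, not the actual gspread source):
try:
    wks.update_cell(row, 1, date)  # any call that may raise
except Exception as ex:
    # Python 2: ex[0] worked because BaseException defined __getitem__
    # Python 3: index the exception's arguments via ex.args instead
    if ex.args and ex.args[0] == 403:
        raise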
My guess is that you are passing an invalid row number to update_cell. It would be helpful to add some debug logging to your script to show, for example, which row it is trying to update.
It may be better to start with a worksheet with zero rows and use append_row instead. However, there does seem to be an outstanding issue in gspread with append_row, and it may actually be the same issue you are running into.
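A sketch of that approach, reusing the names from your script (append_row is a real gspread method; whether it avoids your error depends on the open issue mentioned above):
gc = gspread.authorize(credentials)
wks = gc.open("Authority Tracking").sheet1
# let the API pick the next free row instead of computing an index by hand
wks.append_row([str(date), fullPrice, videoPrice])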
I encountered the same problem. BS4 works fine on a local machine; however, for some reason it is far too slow on the Heroku server, resulting in this error.
I switched to lxml and it is working fine now.
Install it by command:
pip install lxml
A sample code snippet is given below:
from lxml import html
import requests
getpage = requests.get("https://url_here")
gethtmlcontent = html.fromstring(getpage.content)
data = gethtmlcontent.xpath('//div[@class = "class-name"]/text()')
# this is a sample for fetching data from the dummy div
data = data[0:n]  # as per your requirement
# now inject the data into a Django template

Bingsearch returning 'instancemethod' object has no attribute '__getitem__'

I have written this code:
import bingsearch
bingsearch.API_KEY='mykey'
r = bingsearch.request("JohnDalton")
r.status_code
r[0]['Description']
print r[0]['Url']
This is the bingsearch.py file:
import requests
import urllib2
URL = 'https://api.datamarket.azure.com/Data.ashx/Bing/SearchWeb/Web?Query=%(query)s&$top=50&$format=json'
API_KEY = 'mykey'
def request(query, **params):
    r = requests.get(URL % {'query': query}, auth=('', API_KEY))
    return r.json['d']['results']
As I mentioned in the title, it gives me an instancemethod error. How should I fix this?
@Chris Barker was spot on earlier.
You need to change your line return r.json['d']['results'] to return r.json()['d']['results']: in newer versions of requests, json is a method rather than a property, so referencing it without calling it returns the bound method itself, which is exactly the 'instancemethod' object in your error.
You should really do proper error checking on your requests.get result and on the JSON returned. It might not contain the items you expect and it will then raise a KeyError.
For the request errors you might want to check the request documentation which has some basic starting points for possible exceptions.
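A sketch of what that error checking might look like; the exception-handling choices here are my own, not from the original answer:
import requests

URL = 'https://api.datamarket.azure.com/Data.ashx/Bing/SearchWeb/Web?Query=%(query)s&$top=50&$format=json'
API_KEY = 'mykey'

def request(query, **params):
    r = requests.get(URL % {'query': query}, auth=('', API_KEY))
    r.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx response
    try:
        return r.json()['d']['results']
    except (ValueError, KeyError):
        # the body was not JSON, or did not have the expected shape
        raise ValueError('unexpected response: %r' % r.text[:200])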

Python-ldap ldap.initialize rejects a URL that ldapurl considers valid

I want to open a connection to an LDAP directory using an LDAP URL that will be given at run time. For example:
ldap://192.168.2.151/dc=directory,dc=example,dc=com
It is valid as far as I can tell: python-ldap's URL parser, ldapurl.LDAPUrl, accepts it.
url = 'ldap://192.168.2.151/dc=directory,dc=example,dc=com'
parsed_url = ldapurl.LDAPUrl(url)
parsed_url.dn
'dc=directory,dc=example,dc=com'
But if I use it to initialize an LDAPObject, I get an ldap.LDAPError exception:
ldap.initialize(url)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/ldap/functions.py", line 91, in initialize
return LDAPObject(uri,trace_level,trace_file,trace_stack_limit)
File "/usr/lib/python2.7/dist-packages/ldap/ldapobject.py", line 70, in __init__
self._l = ldap.functions._ldap_function_call(ldap._ldap_module_lock,_ldap.initialize,uri)
File "/usr/lib/python2.7/dist-packages/ldap/functions.py", line 63, in _ldap_function_call
result = func(*args,**kwargs)
ldap.LDAPError: (0, 'Error')
I found that if I manually encode the dn part of the URL, it works:
url = 'ldap://192.168.2.151/dc=directory%2cdc=example%2cdc=com'
#url still valid
parsed_url = ldapurl.LDAPUrl(url)
parsed_url.dn
'dc=directory,dc=example,dc=com'
#and will return a valid connection
ldap.initialize(url)
<ldap.ldapobject.SimpleLDAPObject instance at 0x1400098>
How can I ensure robust URL handling in ldap.initialize without encoding parts of the URL myself? (Which, I'm afraid, won't be that robust anyway.)
You can programmatically encode the last part of the URL:
from urllib import quote # works in Python 2.x
from urllib.parse import quote # works in Python 3.x
url = 'ldap://192.168.2.151/dc=directory,dc=paralint,dc=com'
idx = url.rindex('/') + 1
url[:idx] + quote(url[idx:], '=')
=> 'ldap://192.168.2.151/dc=directory%2Cdc=paralint%2Cdc=com'
One can use the LDAPUrl.unparse() method to get a properly encoded version of the URI, like this:
>>> import ldapurl
>>> url = ldapurl.LDAPUrl('ldap://192.168.2.151/dc=directory,dc=example,dc=com')
>>> url.unparse()
'ldap://192.168.2.151/dc%3Ddirectory%2Cdc%3Dexample%2Cdc%3Dcom???'
>>> ldap.initialize(url.unparse())
<ldap.ldapobject.SimpleLDAPObject instance at 0x103d998>
And LDAPUrl.unparse() will not re-encode an already encoded URL:
>>> url = ldapurl.LDAPUrl('ldap://example.com/dc%3Dusers%2Cdc%3Dexample%2Cdc%3Dcom%2F???')
>>> url.unparse()
'ldap://example.com/dc%3Dusers%2Cdc%3Dexample%2Cdc%3Dcom%2F???'
So you can use it blindly on any LDAP URI your program must handle.
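Putting that together, a small helper along these lines (my own sketch, not from the answer) normalizes any incoming URI before connecting:
import ldap
import ldapurl

def connect(uri):
    # round-trip through LDAPUrl so the DN part is percent-encoded;
    # unparse() leaves already-encoded URIs untouched
    normalized = ldapurl.LDAPUrl(uri).unparse()
    return ldap.initialize(normalized)

conn = connect('ldap://192.168.2.151/dc=directory,dc=example,dc=com')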

A Python script to query the MIT START website from a local machine

I'm learning Python, and the project I've currently set myself involves sending a question from my laptop (connected to the net) to the MIT START NLP database: connecting, entering the question, retrieving the response, and displaying it. I've read through the "HOWTO Fetch Internet Resources Using urllib2" at docs.python.org, but I seem to be missing some salient bit of this idea. Here's my code:
import urllib
import urllib2
question = raw_input("What is your question? ")
url = 'http://start.csail.mit.edu/'
values = question
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
print the_page
and here's the error I'm getting:
Traceback (most recent call last):
File "mitstart.py", line 9, in <module>
data = urllib.urlencode(values)
File "/usr/lib/python2.7/urllib.py", line 1298, in urlencode
raise TypeError
TypeError: not a valid non-string sequence or mapping object
So I'm thinking that the way I set question in values was wrong, so I tried
values = {question}
and values = (question)
and values = ('question')
with no joy.
(I know, and my response is "I'm learning, it's late, and suddenly my wife decided she needed to talk to me about something trivial while I was trying to figure this out.")
Can I get some guidance or at least get pointed in the right direction?
Note that your error says: TypeError: not a valid non-string sequence or mapping object
So, while you've created values as a string, you need a non-string sequence or a mapping object.
urlencode() requires key-value pairs (i.e., a mapping object such as a dict), so you generally pass it a dictionary.
Looking at the source for the form, you'll see:
<input type="text" name="query" size="60">
This means you should create a dict, something like:
values = { 'query': 'What is your question?' }
Then you should be able to pass that as the argument to urlencode().
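For example (the question string here is just an illustration):
import urllib
values = {'query': 'What is your question?'}
print urllib.urlencode(values)
# -> query=What+is+your+question%3F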
urllib.urlencode() doesn't accept a string as an argument.
As @ernie said, you should specify the query parameter. Also, the URL is missing the /startfarm.cgi part:
<form method="post" action="startfarm.cgi">
Updated example:
import cgi
from urllib import urlencode
from urllib2 import urlopen
data = urlencode(dict(query=raw_input("What is your question?"))).encode('ascii')
response = urlopen("http://start.csail.mit.edu/startfarm.cgi", data)
# extract encoding from Content-Type and print the response
_, params = cgi.parse_header(response.headers.get('Content-Type', ''))
print response.read().decode(params['charset'])
