I'm using the scrape.py library to scrape a website (the library and its documentation can be found at http://zesty.ca/scrape/).
There is a button on the page that I want the session to press, but I don't understand exactly how to use the submit function. As I understand it, I am supposed to give it a region object for a form. The button itself is an input HTML element. I tried giving it both the form and the input, and I get the same error every time.
My code (on Google App Engine):
s.go(url)
form = s.doc.first(name="form1")
s.submit(region=form)
or
s.go(url)
input = s.doc.first(tagname="input", id="blabla")
s.submit(region=input)
and the error:
ERROR 2011-05-01 23:37:18,673 __init__.py:427] sequence item 0: expected string, NoneType found
Traceback (most recent call last):
File "\appengine\ext\webapp\__init__.py", line 636, in __call__
handler.post(*groups)
File "main.py", line 135, in post
s.submit(region=form)
File "scrape.py", line 342, in submit
return self.go(url, p, redirects)
File "scrape.py", line 288, in go
self.cookiejar)
File "scrape.py", line 176, in fetch
data = urlencode(data)
File "scrape.py", line 409, in urlencode
for key, value in params.items()]
File "scrape.py", line 405, in urlquote
return ''.join(map(urlquoted.get, text))
TypeError: sequence item 0: expected string, NoneType found
I know this question is a year old, but since I am currently using scrape.py and know the answer, I thought I should add it for those who come after.
The problem is in the submit call.
Instead of s.submit(region=form) it should be s.submit(form).
The reason is that the variable form already contains something like <Region 1254:1250>, so you don't need to tell scrape.py that it's a region; it is expected to be one.
So it probably has nothing to do with JavaScript.
My assumption is that the button and the form were wrapped in JavaScript, so scrape probably couldn't work with them. You would need a library that supports JavaScript, like Selenium or Windmill.
I've been trying to launch the example project (the example_setup folder):
https://github.com/OTA-Insight/djangosaml2idp/tree/master/example_setup
I followed the documentation, but it does not work. The first problem, as I understand it, is the expiry date of the metadata in the SP (idp_metadata.xml): validUntil="2020-12-27T12:41:18Z". It is no longer valid, so I changed it to a future date, for example validUntil="2030-12-27T12:41:18Z". But then I got another problem when trying to sign in to the SP (localhost:8000) in my browser:
Error during SAML2 authentication
IncorrectlySigned
While trying to find the problem, I located the place where it occurs. In the original code it is inside a try/except block, so it is not easy to find.
Traceback (most recent call last):
File "/home/dmitriy/projects/djangosaml2idp/example_setup/idp/djangosaml2idp/views.py", line 251, in get
req_info = idp_server.parse_authn_request(request.session['SAMLRequest'], binding)
File "/home/dmitriy/projects/djangosaml2idp/example_setup/idp/venv/lib/python3.8/site-packages/saml2/server.py", line 238, in parse_authn_request
return self._parse_request(enc_request, AuthnRequest,
File "/home/dmitriy/projects/djangosaml2idp/example_setup/idp/venv/lib/python3.8/site-packages/saml2/entity.py", line 1036, in _parse_request
_request = _request.loads(xmlstr, binding, origdoc=enc_request,
File "/home/dmitriy/projects/djangosaml2idp/example_setup/idp/venv/lib/python3.8/site-packages/saml2/request.py", line 110, in loads
return self._loads(xmldata, binding, origdoc, must,
File "/home/dmitriy/projects/djangosaml2idp/example_setup/idp/venv/lib/python3.8/site-packages/saml2/request.py", line 51, in _loads
print(self.signature_check(xmldata, origdoc=origdoc,
File "/home/dmitriy/projects/djangosaml2idp/example_setup/idp/venv/lib/python3.8/site-packages/saml2/sigver.py", line 1662, in correctly_signed_authn_request
return self.correctly_signed_message(decoded_xml, 'authn_request', must, origdoc, only_valid_cert=only_valid_cert)
File "/home/dmitriy/projects/djangosaml2idp/example_setup/idp/venv/lib/python3.8/site-packages/saml2/sigver.py", line 1653, in correctly_signed_message
return self._check_signature(
File "/home/dmitriy/projects/djangosaml2idp/example_setup/idp/venv/lib/python3.8/site-packages/saml2/sigver.py", line 1503, in _check_signature
raise MissingKey(_issuer)
saml2.sigver.MissingKey: http://localhost:8000/saml2/metadata/
Internal Server Error: /idp/login/process/
Some key is missing:
Error during SAML2 authentication
MissingKey
http://localhost:8000/saml2/metadata/
My idp_metadata.xml in the SP is the same as in the example_setup folder of the project; only validUntil has been changed, as I said above. The user in the IDP was created as a superuser. I also tried creating the same user in the SP as in the IDP, but nothing changed.
Can anybody tell me what my problem is?
There is no information about this in the docs, but you need to create the SP inside the IDP from the admin panel. That resolved my problem.
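The expired validUntil value mentioned in the question can be caught before starting the servers. Below is a minimal sketch, using only the Python standard library, that checks the validUntil attribute on the root element of a metadata document. The function name metadata_expired and the sample entityID are made up for illustration; they are not part of djangosaml2idp or pysaml2, and the sketch only handles timestamps in the exact format shown in the question.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def metadata_expired(xml_text, now=None):
    """Return True if the root element's validUntil timestamp has passed.

    Hypothetical helper: assumes the 'YYYY-MM-DDTHH:MM:SSZ' (UTC) format.
    """
    root = ET.fromstring(xml_text)
    valid_until = root.get("validUntil")
    if valid_until is None:
        # No expiry declared, so there is nothing to check.
        return False
    expires = datetime.strptime(valid_until, "%Y-%m-%dT%H:%M:%SZ")
    expires = expires.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires


# Sample document; the entityID is an illustrative placeholder.
SAMPLE = (
    '<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata" '
    'entityID="http://localhost:9000/idp/metadata/" '
    'validUntil="2020-12-27T12:41:18Z"/>'
)

print(metadata_expired(SAMPLE))
```

Running this against idp_metadata.xml would only flag the date problem; the MissingKey error is a separate issue, resolved by registering the SP in the IDP as described above.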
I need to copy the source code from a website into an HTML file stored locally, because parsing from the URL directly does not capture all of the page elements. I am hoping to extract location elements from a table in the source code to be used for geocoding. My program goes through several pages of search results, writing the source code from each to an HTML file stored locally. The address elements are only about a third of the material on each page, so it would be nice to get rid of the additional elements to reduce the file size.
To do this, I would like the program to open a blank HTML doc for writing, write the current page's source code to it, close the doc, reopen it for parsing (in 'r' mode now), open a new doc for writing, and use Beautiful Soup to capture all of the geocoding data from the first doc and write it to the new document. The program will then close the first doc and reopen it in 'w' mode again.
This will be done in a loop, so the first doc will always get overwritten with the current page's source code, while the second doc will stay open and keep having just the geocoding data written to it until there are no more pages.
The looping, navigating, and writing the source code to file all work fine, but I can't get the parsing part figured out. I tried experimenting in an interactive environment with this code:
from bs4 import BeautifulSoup
import html5lib
data = open(r"C:\GIS DataBase\web_resutls_raw_new_test.html",'r').read()
document = html5lib.parse(data)
soup = BeautifulSoup(str(document))
And I get the following error:
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
File "C:\Python27\lib\bs4\__init__.py", line 228, in __init__
self._feed()
File "C:\Python27\lib\bs4\__init__.py", line 289, in _feed
self.builder.feed(self.markup)
File "C:\Python27\lib\bs4\builder\_htmlparser.py", line 219, in feed
raise e
HTMLParseError: malformed start tag, at line 1, column 11
So I tried the following fix:
soup = HTMLParser.handle_starttag(BeautifulSoup(str(document)))
And alas:
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
File "C:\Python27\lib\bs4\__init__.py", line 228, in __init__
self._feed()
File "C:\Python27\lib\bs4\__init__.py", line 289, in _feed
self.builder.feed(self.markup)
File "C:\Python27\lib\bs4\builder\_htmlparser.py", line 219, in feed
raise e
HTMLParseError: malformed start tag, at line 1, column 11
I also tried lxml and etree, and nothing seems to work. I cannot get the elements I need by parsing from the URL directly. I need to parse from the HTML file.
Pass data directly to BeautifulSoup, like this:
soup = BeautifulSoup(data,'html.parser')
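A likely reason for the original error: by default html5lib.parse returns an ElementTree-style element rather than a string (this is my understanding of html5lib's default tree builder, so treat it as an assumption). Calling str() on such an element produces its repr, not markup, which would explain the "malformed start tag" complaint near the start of line 1. The sketch below shows the effect with the standard library's ElementTree alone; the small tree is a stand-in for a parsed page.

```python
import xml.etree.ElementTree as ET

# Stand-in for the object html5lib.parse() is assumed to return.
root = ET.Element("html")
ET.SubElement(root, "body").text = "hello"

# str() gives the object's repr, which is not HTML a parser can read.
as_repr = str(root)
print(as_repr)

# Serializing the tree explicitly produces real markup instead.
as_markup = ET.tostring(root, encoding="unicode")
print(as_markup)
```

Since data already holds the file's text, passing it straight to BeautifulSoup avoids this round trip altogether.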
I'm trying to use Folium to generate an interactive map from locations in a pandas DataFrame. However, when I try to save the map as an HTML file, I get an AssertionError: "You cannot render this Element if it's not in a Figure".
The only relevant information I've been able to find is an older forum post that was closed without enough detail for me to see how to fix it:
https://github.com/python-visualization/folium/issues/495
My code:
data = pd.read_csv(in_file)
route_map = folium.Map(location=[data['"PosLat"'].mean(), data['"PosLon"'].mean()],
                       zoom_start=10, tiles='OpenStreetMap')
for lat, lon, date in zip(data['"PosLat"'], data['"PosLon"'], data['"Date"_"Time"']):
    folium.Marker(location=[lat, lon],
                  icon=folium.Icon(color='blue').add_to(route_map))
out_file = input('Enter file name: ')
if '.html' not in out_file:
    out_file += '.html'
route_map.save(out_file)
Error:
Traceback (most recent call last):
File "interactive_map_gen.py", line 21, in <module>
route_map.save(out_file)
File "C:\Program Files\Python36\lib\site-packages\branca\element.py", line 157, in save
html = root.render(**kwargs)
File "C:\Program Files\Python36\lib\site-packages\branca\element.py", line 301, in render
child.render(**kwargs)
File "C:\Program Files\Python36\lib\site-packages\folium\map.py", line 299, in render
super(LegacyMap, self).render(**kwargs)
File "C:\Program Files\Python36\lib\site-packages\branca\element.py", line 617, in render
element.render(**kwargs)
File "C:\Program Files\Python36\lib\site-packages\branca\element.py", line 598, in render
assert isinstance(figure, Figure), ("You cannot render this Element "
AssertionError: You cannot render this Element if it's not in a Figure.
The only suggested workaround in the above forum thread was to import the colormap from folium instead of branca, but I haven't been able to find anything on how to do that. I've tried re-installing folium, I've tried setting the output file name to a fixed string. I'm at a loss. Everything follows the examples for folium 0.3.0 at https://pypi.python.org/pypi/folium. Is there something I'm missing?
I figured it out - simple syntax error that I missed repeatedly.
In the loop where I'm adding the markers, the proper syntax is
folium.Marker(pos, icon).add_to(map)
As I have it now, I'm trying to add the icon parameter to the map, not the whole marker.
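To make the difference concrete, here is a small sketch with hypothetical stand-in classes (Element, Map, Icon, and Marker below are not folium's real classes). It assumes only that add_to attaches the object it is called on to the parent and returns that object, which is how the chained call in the fix behaves.

```python
class Element:
    """Hypothetical stand-in for a map element; not folium's real class."""

    def __init__(self):
        self.children = []

    def add_to(self, parent):
        # Attach this object to the parent and return it so calls can chain.
        parent.children.append(self)
        return self


class Map(Element):
    pass


class Icon(Element):
    pass


class Marker(Element):
    def __init__(self, icon=None):
        super().__init__()
        self.icon = icon


# Misplaced parenthesis: add_to is called on the Icon, so the map
# receives a bare icon and the marker is never attached to anything.
wrong_map = Map()
Marker(icon=Icon().add_to(wrong_map))

# Correct placement: add_to is called on the finished Marker.
right_map = Map()
Marker(icon=Icon()).add_to(right_map)

print([type(c).__name__ for c in wrong_map.children])  # ['Icon']
print([type(c).__name__ for c in right_map.children])  # ['Marker']
```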
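folium.Marker(location=[lat, lon], icon=folium.Icon(color='blue')).add_to(route_map)
Check where the closing parenthesis goes: .add_to(route_map) must come after the parenthesis that closes folium.Marker(...), not inside it.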
I was running a query for data in one of my Google Docs spreadsheets, and it worked for several months. Starting yesterday or the day before, I noticed that my script no longer works. Has Google updated their API for spreadsheets? Has anybody found a workaround?
My error looks like this:
Traceback (most recent call last):
File "build_packer_image.py", line 311, in <module>
for index, entry in enumerate(client.GetWorksheetsFeed(doc_key).entry):
File "/build/toolchain/mac-10.5-32/lib/python2.7/site-packages/gdata/spreadsheet/service.py", line 129, in GetWorksheetsFeed
converter=gdata.spreadsheet.SpreadsheetsWorksheetsFeedFromString)
File "/build/toolchain/mac-10.5-32/lib/python2.7/site-packages/gdata/service.py", line 1074, in Get
return converter(result_body)
File "/build/toolchain/mac-10.5-32/lib/python2.7/site-packages/gdata/spreadsheet/__init__.py", line 411, in SpreadsheetsWorksheetsFeedFromString
xml_string)
File "/build/toolchain/mac-10.5-32/lib/python2.7/site-packages/atom/__init__.py", line 93, in optional_warn_function
return f(*args, **kwargs)
File "/build/toolchain/mac-10.5-32/lib/python2.7/site-packages/atom/__init__.py", line 127, in CreateClassFromXMLString
tree = ElementTree.fromstring(xml_string.replace('doctype','DOCTYPE'))
File "<string>", line 125, in XML
cElementTree.ParseError: no element found: line 1, column 0
Build step 'Execute shell' marked build as failure
Finished: FAILURE
I am using:
Python 2.7.5
gdata 2.0.18
I am just using a document key and no OAuth in my code, if that makes a difference (I am passing the username and password to the ClientLogin method).
Actually here is the answer to the problem:
The use of client login (using username/password instead of oauth2) is
likely the cause of the error. That protocol was deprecated 3+ years
ago and was just shutdown. If you capture the HTTP response (which
appears to have some HTML content), that might confirm if it is
related to the shutdown. Migrating to OAuth 2 would get your apps
working again.
After sending XML to update the spreadsheet, Google responds with a login page.
That means ClientLogin authentication no longer works for gdata.
https://code.google.com/a/google.com/p/apps-api-issues/issues/detail?id=3851#c2
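Capturing the HTTP response, as the quoted answer suggests, makes the failure easier to diagnose than the bare ParseError. Below is a minimal sketch of a check that could run on a response body before it is handed to an XML parser. The function name classify_body is hypothetical and not part of gdata; it assumes the body has already been decoded to a str.

```python
import xml.etree.ElementTree as ET


def classify_body(body):
    """Roughly classify a decoded HTTP response body (hypothetical helper)."""
    text = body.strip()
    if not text:
        # An empty body yields "no element found: line 1, column 0".
        return "empty"
    if text.lower().startswith(("<!doctype html", "<html")):
        # Probably a login or error page rather than an API feed.
        return "html"
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return "unparseable"
    return "xml"


print(classify_body(""))
print(classify_body("<!DOCTYPE html><html><body>Sign in</body></html>"))
print(classify_body("<feed xmlns='http://www.w3.org/2005/Atom'/>"))
```

Logging this classification alongside the HTTP status code would show whether the service is returning a login page instead of a feed.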
I'm testing the ebaysdk Python library, which lets you connect to eBay. I'm trying the examples from https://github.com/timotheus/ebaysdk-python/
So far I am stuck on this example:
from ebaysdk.shopping import Connection as Shopping
shopping = Shopping(domain="svcs.sandbox.ebay.com", config_file="ebay.yaml")
response = shopping.execute('FindPopularItems',
                            {'QueryKeywords': 'Python'})
print response.dict()
When I run it, it gives me this error:
Traceback (most recent call last):
File "ebay-test.py", line 13, in <module>
{'QueryKeywords': 'Python'})
File "/usr/local/lib/python2.7/dist-packages/ebaysdk-2.1.0-py2.7.egg/ebaysdk/connection.py", line 123, in execute
self.error_check()
File "/usr/local/lib/python2.7/dist-packages/ebaysdk-2.1.0-py2.7.egg/ebaysdk/connection.py", line 193, in error_check
estr = self.error()
File "/usr/local/lib/python2.7/dist-packages/ebaysdk-2.1.0-py2.7.egg/ebaysdk/connection.py", line 305, in error
error_array.extend(self._get_resp_body_errors())
File "/usr/local/lib/python2.7/dist-packages/ebaysdk-2.1.0-py2.7.egg/ebaysdk/shopping/__init__.py", line 188, in _get_resp_body_errors
dom = self.response.dom()
File "/usr/local/lib/python2.7/dist-packages/ebaysdk-2.1.0-py2.7.egg/ebaysdk/response.py", line 229, in dom
return self._dom
File "/usr/local/lib/python2.7/dist-packages/ebaysdk-2.1.0-py2.7.egg/ebaysdk/response.py", line 216, in __getattr__
return getattr(self._obj, name)
AttributeError: 'Response' object has no attribute '_dom'
Am I missing something here, or could it be some kind of bug in the library?
Do you have a config file? I had a lot of problems getting started with this SDK. To get the yaml config file to work, I had to specify the directory that it was in. So in your example, it would be:
shopping = Shopping(domain="svcs.sandbox.ebay.com", config_file=os.path.join(os.path.dirname(os.path.realpath(__file__)), 'ebay.yaml'))
(This needs import os at the top of the script.) You should also be able to turn on debugging by passing debug=True in your Shopping() call, as in Shopping(debug=True).
Make sure, if you have not already, to specify your APP_ID and the other necessary values in the config file.
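The path handling above can be wrapped in a small helper that also fails early when the file is missing. This is only a sketch: config_beside and require_config are hypothetical names, not part of ebaysdk.

```python
import os


def config_beside(script_path, filename="ebay.yaml"):
    """Return the absolute path of `filename` in the directory of `script_path`."""
    script_dir = os.path.dirname(os.path.realpath(script_path))
    return os.path.join(script_dir, filename)


def require_config(script_path, filename="ebay.yaml"):
    """Like config_beside, but raise if the config file does not exist."""
    path = config_beside(script_path, filename)
    if not os.path.isfile(path):
        raise IOError("config file not found: " + path)
    return path
```

In the script itself this would be used as config_file=require_config(__file__), so a missing ebay.yaml is reported immediately instead of surfacing later as an unrelated error.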
You have the wrong domain; it should be open.api.sandbox.ebay.com. See this page on the ebaysdk GitHub.