Python Html table can't find data when running on server


Hi, my code doesn't work when it actually runs online: find returns None. How can I fix this?
This is my code:
import time
import sys
import urllib
import re
from bs4 import BeautifulSoup, NavigableString
print "Initializing Python Script"
print "The passed arguments are "
urls = ["http://tweakers.net/pricewatch/355474/gigabyte-gv-n78toc-3g/specificaties/", "http://tweakers.net/pricewatch/328943/sapphire-radeon-hd-7950-3gb-gddr5-with-boosts/specificaties/", "https://www.alternate.nl/GIGABYTE/GV-N78TOC-3GD-grafische-kaart/html/product/1115798", "http://tweakers.net/pricewatch/320116/raspberry-pi-model-b-(512mb)/specificaties/"]
i =0
regex = '<title>(.+?)</title>'
pattern = re.compile(regex)
word = "tweakers"
alternate = "alternate"
while i<len(urls):
    dataraw = urllib.urlopen(urls[i])
    data = dataraw.read()
    soup = BeautifulSoup(data)
    table = soup.find("table", {"class" : "spec-detail"})
    print table
    i+=1
Here is the outcome:
Initializing Python Script
The passed arguments are
None
None
None
None
Script finalized
I have tried findAll and other methods, but I don't understand why it works on my command line but not on the server itself.
Any help?
Edit
Traceback (most recent call last):
File "python_script.py", line 35, in <module>
soup = BeautifulSoup(urllib2.urlopen(url), 'html.parser')
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 406, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 444, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

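I suspect you are running into differences between parsers: when no parser is specified, BeautifulSoup picks the best one that is installed, and that can differ between your machine and the server.
Specifying the parser explicitly works for me: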
import urllib2
from bs4 import BeautifulSoup
urls = ["http://tweakers.net/pricewatch/355474/gigabyte-gv-n78toc-3g/specificaties/",
"http://tweakers.net/pricewatch/328943/sapphire-radeon-hd-7950-3gb-gddr5-with-boosts/specificaties/",
"https://www.alternate.nl/GIGABYTE/GV-N78TOC-3GD-grafische-kaart/html/product/1115798",
"http://tweakers.net/pricewatch/320116/raspberry-pi-model-b-(512mb)/specificaties/"]
for url in urls:
    soup = BeautifulSoup(urllib2.urlopen(url), 'html.parser')
    table = soup.find("table", {"class": "spec-detail"})
    print table
In this case, I'm using html.parser, but you can play around and specify lxml or html5lib, for example.
Note that the third url doesn't contain a table with class="spec-detail" and, therefore, it prints None for it.
I've also introduced a few improvements:
- removed unused imports
- replaced the index-based while loop with a for loop
- removed unrelated code
- replaced urllib with urllib2
If the server rejects the request (as with the 403 in your edit), you can also use the requests module and set a User-Agent header that mimics a real browser:
from bs4 import BeautifulSoup
import requests
urls = ["http://tweakers.net/pricewatch/355474/gigabyte-gv-n78toc-3g/specificaties/",
"http://tweakers.net/pricewatch/328943/sapphire-radeon-hd-7950-3gb-gddr5-with-boosts/specificaties/",
"https://www.alternate.nl/GIGABYTE/GV-N78TOC-3GD-grafische-kaart/html/product/1115798",
"http://tweakers.net/pricewatch/320116/raspberry-pi-model-b-(512mb)/specificaties/"]
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36'}
for url in urls:
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    table = soup.find("table", {"class": "spec-detail"})
    print table
Hope that helps.
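To tell a blocked request apart from a page that simply lacks the table, it may help to check the HTTP status before parsing. Here is a minimal Python 3 sketch that assumes only the standard library. The names fetch_html and opener are made up for this illustration and are not part of the code above:

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_html(url, opener=urlopen, timeout=10):
    """Return (status, body); body is None when the server replies with an HTTP error."""
    try:
        response = opener(url, timeout=timeout)
    except HTTPError as err:
        # For example 403 Forbidden: the request was refused, so there is nothing to parse.
        return err.code, None
    try:
        return response.getcode(), response.read()
    finally:
        response.close()
```

With this, a (403, None) result points at the server refusing the request, while a 200 status with soup.find still returning None points at the parser or the page content.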

Related

How to print JSON data to a Google Sheet using GSpread

I have tried every possible fix I can find online, unfortunately, I'm new to this and not sure if I'm getting closer or not.
Ultimately, all I am trying to do is print a JSON feed into a Google Sheet.
GSpread is working (I've appended just number values as a test), but I simply cannot get the JSON feed to print there.
I've gotten it printing to terminal, so I know it's accessible, but writing the loop to append the data becomes the issue.
This is my current script:
# import urllib library
import json
from urllib.request import urlopen
import gspread
gc = gspread.service_account(filename='creds.json')
sh = gc.open_by_key('1-1aiGMn2yUWRlh_jnIebcMNs-6phzUNxkktAFH7uY9o')
worksheet = sh.sheet1
# import json
# store the URL in url as
# parameter for urlopen
url = 'https://api.chucknorris.io/jokes/random'
# store the response of URL
response = urlopen(url)
# storing the JSON response
# from url in data
data_json = json.loads(response.read())
# print the json response
# print(data_json)
result = []
for key in data_json:
    result.append([key, data_json[key]])
worksheet.update('a1', result)
I've hit a complete brick wall - any advice would be greatly appreciated
Update - suggested script with new error:
# import urllib library
import json
from urllib.request import urlopen
import gspread
gc = gspread.service_account(filename='creds.json')
sh = gc.open_by_key('1-1aiGMn2yUWRlh_jnIebcMNs-6phzUNxkktAFH7uY9o')
worksheet = sh.sheet1
url = 'https://api.chucknorris.io/jokes/random'
# store the response of URL
response = urlopen(url)
# storing the JSON response
# from url in data
data_json = json.loads(response.read())
# print the json response
# print(data_json)
result = []
for key in data_json:
    result.append([key, data_json[key] if not isinstance(
        data_json[key], list) else ",".join(map(str, data_json[key]))])
worksheet.update('a1', result)
Error:
Traceback (most recent call last):
File "c:\Users\AMadle\NBA-JSON-Fetch\PrintToSheetTest.py", line 17, in <module>
response = urlopen(url)
File "C:\Python\python3.10.5\lib\urllib\request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "C:\Python\python3.10.5\lib\urllib\request.py", line 525, in open
response = meth(req, response)
File "C:\Python\python3.10.5\lib\urllib\request.py", line 634, in http_response
response = self.parent.error(
File "C:\Python\python3.10.5\lib\urllib\request.py", line 563, in error
return self._call_chain(*args)
File "C:\Python\python3.10.5\lib\urllib\request.py", line 496, in _call_chain
result = func(*args)
File "C:\Python\python3.10.5\lib\urllib\request.py", line 643, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Can confirm it is not a permissions issue, the script below prints the same URL to terminal no problem. Also have no problem writing other data to the sheet:
import requests as rq
from bs4 import BeautifulSoup
url = 'https://api.chucknorris.io/jokes/random'
req = rq.get(url, verify=False)
soup = BeautifulSoup(req.text, 'html.parser')
print(soup)

Your script needs to convert the JSON data into a 2-dimensional array. Looking at the value of data_json, I noticed that one of its values is itself an array, and that also has to be handled. I think this might be the cause of your issue. How about the following modification?
From:
result.append([key, data_json[key]])
To:
result.append([key, data_json[key] if not isinstance(data_json[key], list) else ",".join(map(str, data_json[key]))])
With this modification, the array is converted to a string using join.
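The same conversion can be written as a small helper, which may be easier to read than the one-line conditional. This is only a sketch: to_rows is a hypothetical name, and the sample dictionary is made-up data shaped like a flat JSON object with one list value, not the real API response:

```python
def to_rows(record):
    """Turn a flat JSON object into [key, value] rows for a sheet."""
    rows = []
    for key, value in record.items():
        if isinstance(value, list):
            # A single cell cannot hold a list, so join its items into one string.
            value = ",".join(map(str, value))
        rows.append([key, value])
    return rows


# Made-up sample data for illustration only.
sample = {"categories": ["dev", "science"], "id": "abc123", "value": "A joke."}
rows = to_rows(sample)
print(rows)
```

The resulting list of rows can then be passed to worksheet.update in the same way as result in the question.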

BeautifulSoup timing out with certain URL's?

I am brand new to using BeautifulSoup and I am running into an odd issue, likely user error, but I am stumped! I am using BeautifulSoup to parse through a webpage, and return the first a tag with an href attribute. When I use the Wikipedia link, it works as expected! However when I use the BestBuy link, it leads to this timeout...
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import urllib.request
# url = r"https://en.wikipedia.org/wiki/Eastern_Front_(World_War_II)"
url = r"https://www.bestbuy.com/site/nintendo-switch-32gb-console-neon-red-neon-blue-joy-con/6364255.p?skuId=6364255"
html_content = urllib.request.urlopen(url)
soup = BeautifulSoup(html_content, 'html.parser')
link = soup.find('a', href=True)
print(link)
Traceback (most recent call last):
File "scrapper.py", line 8, in <module>
html_content = urllib.request.urlopen(url)
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 542, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1393, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1354, in do_open
r = h.getresponse()
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1347, in getresponse
response.begin()
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
TimeoutError: [Errno 60] Operation timed out
Do you guys have any insight as to why this might be happening with only certain URL's? Thanks in advance!

You cannot scrape every website with BeautifulSoup alone; some websites have restrictions. It is good practice to always send headers:
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'}
url = r"https://www.bestbuy.com/site/nintendo-switch-32gb-console-neon-red-neon-blue-joy-con/6364255.p?skuId=6364255"
# pass headers by keyword; the second positional argument of requests.get is params
req = requests.get(url, headers=headers)
soup = BeautifulSoup(req.content, 'html.parser')
print(soup.prettify())
Output:
<html>
<head>
<title>
Access Denied
</title>
</head>
<body>
<h1>
Access Denied
</h1>
You don't have permission to access "http://www.bestbuy.com/site/nintendo-switch-32gb-console-neon-red-neon-blue-joy-con/6364255.p?" on this server.
<p>
Reference #18.9f01d517.1595655333.b833c
</p>
</body>
</html>
You can achieve this task using Selenium. Follow the steps below:
Step 1: Before downloading the web driver for Chrome, check your Chrome version (browser menu (three vertical dots) -> Help -> About Google Chrome).
Step 2: Download the driver from here according to your Chrome version (mine is 81.0.4044.138).
Step 3: Once downloaded, unzip the file and place chromedriver.exe in the directory where your script is.
Step 4: pip install selenium
Now use the code below:
from selenium import webdriver
from bs4 import BeautifulSoup
#your website url
site = 'https://www.bestbuy.com/site/nintendo-switch-32gb-console-neon-red-neon-blue-joy-con/6364255.p?skuId=6364255'
#your driver path (Selenium 3 style; newer Selenium releases take a Service object instead)
driver = webdriver.Chrome(executable_path = 'chromedriver.exe')
#passing website url
driver.get(site)
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.close()
link = soup.find('a', href=True)
print(link)
Output:
<a href="https://www.bestbuy.ca/en-CA/home.aspx">
<img alt="Canada" src="https://www.bestbuy.com/~assets/bby/_intl/landing_page/images/maps/canada.svg"/>
<h4>Canada</h4>
</a>

Trying to Scrape a website for data but getting an Error 403

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'http://www.csgoanalyst.win'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
page_soup.body
I am trying to scrape hltv.org in order to find out what maps each team bans and picks. However, I keep getting the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/anaconda/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/anaconda/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/anaconda/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/anaconda/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/anaconda/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/anaconda/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
>>> page_html = uClient.read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'uClient' is not defined
>>> uClient.close()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'uClient' is not defined
I tried the script on another website so I know it works. I assume hltv has blocked bots or whatever from doing this and I know I shouldn't particularly be doing it if they don't want people to but I would love to get the data.
Any help will be super helpful.
Thank you.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup
my_url = 'https://www.hltv.org/stats/teams/maps/6665/Astralis'
u_client = uReq(my_url)
soup = BeautifulSoup(u_client, "html.parser")
print(soup)
And if you want to remove the tags:
import bleach
# bleach.clean expects a string, so convert the soup first
print(bleach.clean(str(soup), tags=[], strip=True))

I suggest you use the requests module instead of urllib; it has a simpler API. You're most likely getting 403 Forbidden because urllib identifies itself with its default Python-urllib User-Agent, which many sites block. Try sending a browser-like User-Agent header, something like the following:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'}
my_url = 'https://www.hltv.org/stats/teams/maps/6665/Astralis'
page = requests.get(my_url, headers=headers)
page_html = page.text
requests can be easily installed using pip: pip install requests
You can add headers with urllib as well, but the syntax is slightly more verbose.
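For completeness, here is roughly what the urllib variant looks like. This is a sketch rather than tested scraping code: build_request and fetch_text are hypothetical helper names, the User-Agent string is a placeholder, and UTF-8 is an assumed page encoding:

```python
from urllib.request import Request, urlopen

# Placeholder browser-like User-Agent; substitute whatever your browser sends.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


def build_request(url):
    """Wrap the URL in a Request object that carries the custom headers."""
    return Request(url, headers=HEADERS)


def fetch_text(url, timeout=10):
    """Download the page and decode it, assuming UTF-8."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_text(my_url) would then stand in for page.text in the requests version. Whether the site accepts the request still depends on its own bot protection.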

automated login webpage with credentials and browse to different subpage and download that subpage perl or python

I need help logging in to a webpage with an ID and password, going to a link within the website, and downloading the complete page source returned for that link. I tried Perl and Python but had no luck.
Specifically, I need to log in to www.server.com:7180, save the cookies, and then go to the www.server.com:7180/healthissues page, or directly download whatever response the browser gets into a text file.
import urllib
import urllib2
import webbrowser
import cookielib
data1={
'j_username':'id', 'j_password':'pass'
}
data = urllib.urlencode(data1)
url = 'http://server.intranet.com:7180/cmf/allHealthIssues'
full_url = url + '?' + data
response = urllib2.urlopen(full_url)
with open("results.html", "w") as f:
    f.write(response.read())
webbrowser.open("results.html")
The above code downloads the webpage, but I always end up with the authentication page in the download. I found a lot of packages, but unfortunately I do not have access to install packages or libraries. Any help is appreciated.
I tried with the code suggested by PM 2Ring but I'm getting the error below. I have python 2.6.6 and I'm not sure if that method will work. Please let me know any workaround or way to resolve the error.
Traceback (most recent call last):
File "a.py", line 15, in <module>
handle = urllib2.urlopen(req)
File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib64/python2.6/urllib2.py", line 397, in open
response = meth(req, response)
File "/usr/lib64/python2.6/urllib2.py", line 510, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib64/python2.6/urllib2.py", line 435, in error
return self._call_chain(*args)
File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib64/python2.6/urllib2.py", line 518, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Invalid request

Although you are importing cookielib, you aren't actually using it, so you can't get past the authentication page of the website. The Python docs for cookielib have some simple examples of how to use it, e.g.:
import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
Obviously, your code will need to be a little more complicated, as you need to send the password.
So you'll need to do something like this (untested):
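import cookielib, urllib, urllib2
url = 'http://server.intranet.com:7180/cmf/allHealthIssues'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# install the opener so that urllib2.urlopen keeps and resends cookies
urllib2.install_opener(opener)
data1={'j_username':'id', 'j_password':'pass'}
data = urllib.urlencode(data1)
headers = {'User-agent' : 'Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0'}
# passing data makes this a POST request instead of putting the credentials in the URL
req = urllib2.Request(url, data, headers)
handle = urllib2.urlopen(req)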
It's a shame you can't install Requests, as it makes things so much simpler than using the native Python modules.
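For readers on Python 3, where cookielib became http.cookiejar and urllib2 was merged into urllib.request, the same idea can be sketched as follows. This is untested against the real server. make_session and login_request are hypothetical helper names, and the login URL in the usage note is an assumption, since the question does not say where the form posts to:

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def make_session():
    """Return (opener, jar); the opener stores cookies in jar and resends them."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar)), jar


def login_request(login_url, username, password):
    """Build a POST request carrying the form fields used in the question."""
    form = urlencode({"j_username": username, "j_password": password})
    return Request(
        login_url,
        data=form.encode("ascii"),  # supplying data turns the request into a POST
        headers={"User-Agent": "Mozilla/5.0"},
    )
```

A typical use would be opener, jar = make_session(), then opener.open(login_request(login_url, 'id', 'pass')), then opener.open(page_url).read() for the protected page, with login_url and page_url filled in for the actual site.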

Python getting all links from a google search result page

I want to create a script that returns all the URLs found in a page, a Google results page for example, so I wrote this script (using BeautifulSoup):
import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("https://www.google.dz/search?q=see")
soup = BeautifulSoup(page.read())
links = soup.findAll("a")
for link in links:
    print link["href"]
and it returns this 403 Forbidden result:
Traceback (most recent call last):
File "C:\Python27\sql\sql.py", line 3, in <module>
page = urllib2.urlopen("https://www.google.dz/search?q=see")
File "C:\Python27\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 400, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 513, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 438, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 521, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
Any idea how to avoid this error, or another method to get the URLs from the search results?

No problem using requests:
import requests
from BeautifulSoup import BeautifulSoup
page = requests.get("https://www.google.dz/search?q=see")
soup = BeautifulSoup(page.content)
links = soup.findAll("a")
Some of the hrefs hold two URLs joined by a colon (a webcache link followed by the target URL), so we need to split them using re:
import re
import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.google.dz/search?q=see")
soup = BeautifulSoup(page.content)
# keep only anchors whose href contains Google's /url?q= redirect prefix
for link in soup.find_all("a",href=re.compile(r"(?<=/url\?q=)(htt.*://.*)")):
    print re.split(":(?=http)",link["href"].replace("/url?q=",""))
['https://www.see.asso.fr/&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CBIQFjAA&usg=AFQjCNF2_I8jB98JwR3jcKniLZekSrRO7Q']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:f7M8NX1XmDsJ', 'https://www.see.asso.fr/%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CBUQIDAA&usg=AFQjCNF8WJButjMNXQXvXBbtyXnF1SgiOg']
['https://www.see.asso.fr/3ei&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CBgQ0gIoADAA&usg=AFQjCNGnPL1RiX5TekI_yMUc-w_f2oVXtw']
['https://www.see.asso.fr/node/9587&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CBkQ0gIoATAA&usg=AFQjCNHX-6AzBgLQUF0s8TxFcZjIhxz_Hw']
['https://www.see.asso.fr/ree&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CBoQ0gIoAjAA&usg=AFQjCNGkkd8e1JjiNrhSM4HQYE-M6g6j-w']
['https://www.see.asso.fr/node/130&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CBsQ0gIoAzAA&usg=AFQjCNEkVdpcbXDz5-cV9u2NNYoV6aM8VA']
['http://www.wordreference.com/enfr/see&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CB0QFjAB&usg=AFQjCNHQGwcsGpro26dhxFP6q-fQvwbB0Q']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:ooK-I_HuCkwJ', 'http://www.wordreference.com/enfr/see%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CCAQIDAB&usg=AFQjCNFRlV5Zv_n48Wivr4LeOkTQsA0D1Q']
['http://fr.wikipedia.org/wiki/S%25C3%25A9e&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CCMQFjAC&usg=AFQjCNGmtqmcXPqYZ_nwa0RWL0uYf5PMJw']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:GjcgkyzsUigJ', 'http://fr.wikipedia.org/wiki/S%2525C3%2525A9e%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CCYQIDAC&usg=AFQjCNHesOIBU3OXBspARcONbK_k_8-gnw']
['http://fr.wikipedia.org/wiki/Camille_S%25C3%25A9e&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CCkQFjAD&usg=AFQjCNGO-WIDl4TrBeo88WY9QsopWmsMyQ']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:izhQjC85nOoJ', 'http://fr.wikipedia.org/wiki/Camille_S%2525C3%2525A9e%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CCwQIDAD&usg=AFQjCNEfcIKsKbf026xgWT7NkrAueZvL0A']
['http://de.wikipedia.org/wiki/Zugersee&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CDEQ9QEwBA&usg=AFQjCNHpfJW5-XdsgpFUSP-jEmHjXQUWHQ']
['http://commons.wikimedia.org/wiki/File:Champex_See.jpg&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CDMQ9QEwBQ&usg=AFQjCNEordFWr2QIaob45WlR5Yi-ZvZSiA']
['http://www.all-free-photos.com/show/showphotop.php%3Fidtop%3D4%26lang%3Dfr&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CDUQ9QEwBg&usg=AFQjCNEC24FOIE5cvF4zmEDgq5-5xubM3w']
['http://www.allbestwallpapers.com/travel-zell_am_see,_kaprun,_austria_wallpapers.html&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CDcQ9QEwBw&usg=AFQjCNFkzMZDuthZHvnF-JvyksNUqjt1dQ']
['http://www.see-swe.org/&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CDkQFjAI&usg=AFQjCNF1zbcLfjanxgCXtHoOQXOdMgh_AQ']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:lzh6JxvKUTIJ', 'http://www.see-swe.org/%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CDwQIDAI&usg=AFQjCNFYN6tzzVaHsAc5aOvYNql3Zy4m3A']
['http://fr.wiktionary.org/wiki/see&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CD8QFjAJ&usg=AFQjCNFWYIGc1gj0prytowzqI-0LDFRvZA']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:G9v8lXWRCyQJ', 'http://fr.wiktionary.org/wiki/see%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CEIQIDAJ&usg=AFQjCNENzi4E1n-9qHYsNahY6lQzaW5Xvg']
['http://en.wiktionary.org/wiki/see&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CEUQFjAK&usg=AFQjCNECGZjw-rBUALO43WaTh2yB9BUhDg']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:ywc4URuPdIQJ', 'http://en.wiktionary.org/wiki/see%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CEgQIDAK&usg=AFQjCNE0pykIqXXRl08E-uTtoj03QEpnbg']
['http://see-concept.com/&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CEsQFjAL&usg=AFQjCNGFWjhiH7dEBhITJt01ob_JENlz1Q']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:jHTkOVEoRsAJ', 'http://see-concept.com/%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CE4QIDAL&usg=AFQjCNECPgxt9ZSFmZzK_ker9Hw_FoCi_A']
['http://www.theconjugator.com/la/conjugaison/du/verbe/see.html&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CFEQFjAM&usg=AFQjCNETCTQ0vPDIdV_2Q57qq11dyN0d8Q']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:xD7_Qo7roS8J', 'http://www.theconjugator.com/la/conjugaison/du/verbe/see.html%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CFQQIDAM&usg=AFQjCNF_hBCyDZncivYGnL7je5kYme9hEg']
['http://www.zellamsee-kaprun.com/fr&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CFcQFjAN&usg=AFQjCNFVDeBWrZMDSjK9jKYF4AQlIXa9lA']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:BFBEUp05w7YJ', 'http://www.zellamsee-kaprun.com/fr%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CFoQIDAN&usg=AFQjCNHtrOeEpYWqvT3f0M1p-gxUkYT1IA']
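The results above still carry Google's tracking parameters (&sa=..., &ved=...). If only the target URL is wanted, parsing the href as a URL instead of using string replacement may be cleaner. This is a Python 3 sketch using only the standard library. target_from_redirect is a hypothetical helper name, and the sample hrefs are made up to resemble the ones above:

```python
from urllib.parse import parse_qs, urlsplit


def target_from_redirect(href):
    """Return the URL stored in the q parameter of a /url?q=... href, or None."""
    parts = urlsplit(href)
    if parts.path != "/url":
        return None
    values = parse_qs(parts.query).get("q")
    return values[0] if values else None


# Made-up hrefs shaped like the ones in the output above.
hrefs = [
    "/url?q=https://www.see.asso.fr/&sa=U&ved=0CBIQFjAA",
    "/search?q=see&tbm=isch",
]
targets = [t for t in map(target_from_redirect, hrefs) if t]
print(targets)
```

Because parse_qs also decodes percent-encoding, an encoded target such as http%3A%2F%2Fexample.com comes back as a normal URL.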

A simpler way to do this is to use the third-party google package (pip install google), which provides the googlesearch module (it is not an official Google API).
GeeksforGeeks writes about it here
from googlesearch import search
# to search
query = "see"
links = []
for j in search(query, tld="co.in", num=10, stop=10, pause=2):
    links.append(j)

import urllib.request
from bs4 import BeautifulSoup
page = urllib.request.urlopen("https://www.google.dz/search?q=see")
soup = BeautifulSoup(page.read(), "html.parser")
# href=True skips anchors that have no href attribute
links = soup.find_all("a", href=True)
for link in links:
    print(link["href"])
