How to Scrape Google's quick answer box?

I want to scrape Google's quick answer box (e.g., the selected text).
I've checked other questions on this site about the same thing, but they didn't help. How can I do that?

I think this might help you. I searched for the gold rate as an example:
import requests
from bs4 import BeautifulSoup
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
r = requests.get('https://www.google.com/search?q=gold+rate+india&safe=active&rlz=1C1GCEB_enIN960IN960&ei=9qksYc76FeeS4-EP-8iQ8AY&oq=gold+rate+india&gs_lcp=Cgdnd3Mtd2l6EAMyCAgAEIAEELEDMgUIABCABDIFCAAQgAQyBQgAEIAEMgUIABCABDIHCAAQsQMQCjIFCAAQgAQyBQgAEIAEMgUIABCABDIHCAAQsQMQCjoHCAAQRxCwAzoKCAAQsAMQQxCLAzoNCAAQsAMQyQMQQxCLAzoQCAAQgAQQsQMQyQMQRhCAAjoKCAAQgAQQsQMQCjoNCAAQgAQQsQMQgwEQCjoLCAAQgAQQsQMQgwE6BQgAELEDOgcIABCABBAKOgsIABCABBCxAxDJAzoFCAAQkgM6CgguEIAEELEDEAo6CAgAELEDEIMBOgoIABCxAxCDARAKOhMILhCxAxCDARDHARDRAxBDEJMCOgcIABCxAxBDOgQIABBDOgYIABAKEEM6CggAELEDEIMBEEM6CQgAEMkDEAoQQzoICAAQsQMQkQI6CAguEIAEELEDOgUIABCRAjoOCAAQsQMQgwEQyQMQkQI6CwgAELEDEMkDEJECSgUIOhIBMUoFCDwSATNKBAhBGABQgytY4oQBYN2GAWgFcAJ4BIABiQiIAYRBkgEQMC43LjEwLjQuNC4wLjEuMZgBAKABAbABAMgBCrgBAsABAQ&sclient=gws-wiz&ved=0ahUKEwjOza2jvNjyAhVnyTgGHXskBG4Q4dUDCA8&uact=5', headers=headers)
soup = BeautifulSoup(r.text, 'lxml')
result = soup.find('div', class_='vlzY6d')
print(result.text)

The Beautiful Soup library is well suited for this task. To find the desired element, use the select_one() method, which accepts a CSS selector and returns the first matching HTML element (or None if nothing matches). Here, the target is the span tag inside the div with the .kno-rdesc class, so the selector is .kno-rdesc span. To extract the text from the returned element, use its text attribute.
Below is a code snippet that uses the method described above:
result = soup.select_one(".kno-rdesc span").text
print(result)
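Since select_one() returns None when nothing matches, calling .text directly raises an AttributeError whenever Google changes its class names. A minimal sketch of a guarded lookup follows. The HTML string is made up for illustration (real Google markup will differ), and the helper name extract_description is hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a Google response; real markup will differ.
SAMPLE_HTML = """
<div class="kno-rdesc">
  <h3>Description</h3>
  <span>Example description text.</span>
</div>
"""

def extract_description(html, selector=".kno-rdesc span"):
    """Return the text of the first match, or None if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node is not None else None

print(extract_description(SAMPLE_HTML))
print(extract_description("<p>no answer box here</p>"))
```

Returning None instead of raising lets the caller decide how to handle a missing answer box.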
Also, make sure you send a user-agent request header so the request looks like a "real" user visit. The default requests user-agent is python-requests, so websites can tell that the request most likely comes from a script. Check what your user-agent is.
Code and full example in online IDE:
from bs4 import BeautifulSoup
import requests, lxml
# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "q": "Narendra Modi",
    "hl": "en",  # language
    "gl": "us"   # country of the search, US -> USA
}
# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}
html = requests.get("https://www.google.com/search", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(html.text, "lxml")
result = soup.select_one(".kno-rdesc span").text
print(result)
Output:
Narendra Damodardas Modi is an Indian politician serving as the 14th and current prime minister of India since 2014. Modi was the chief minister of Gujarat from 2001 to 2014 and is the Member of Parliament from Varanasi.
Alternatively, you can use the Google Organic Results API from SerpApi. It's a paid API with a free plan.
The difference is that it bypasses blocks from Google or other search engines, so the end user doesn't have to figure out how to do that or maintain the parser, and only has to think about what data to retrieve.
Example code to integrate:
from serpapi import GoogleSearch
import os
params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    "api_key": os.getenv("API_KEY"),  # your serpapi api key
    "engine": "google",               # search engine
    "q": "Narendra Modi"              # search query
    # other parameters
}
search = GoogleSearch(params) # where data extraction happens on the SerpApi backend
result_dict = search.get_dict() # JSON -> Python dict
result = result_dict["knowledge_graph"]["description"]
print(result)
Output:
Narendra Damodardas Modi is an Indian politician serving as the 14th and current prime minister of India since 2014. Modi was the chief minister of Gujarat from 2001 to 2014 and is the Member of Parliament from Varanasi.
Disclaimer, I work for SerpApi.

Related

Problem with webscraping google python beautiful soup

I am writing code to open some of the pages found in a Google search:
import bs4
import requests
url = 'https://www.google.com/search?q=python'
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')
list_sites = soup.select('a[href]')
print(len(list_sites))
I want to search Google for something like 'python' and then open the first few result links, but I have a problem with the select function. What should I put inside it to find the links to the result pages, such as: Polish Python Coders Group - News, Welcome to Python.org, ...
I tried a[href], a, and the h3 class, but it doesn't work...

Your code uses the wrong selector. Even if it worked, you wouldn't get what you wanted, because a[href] selects every link on the page, not only the ones that lead to websites.
To get these links, you need a selector that matches only them. In our case, this is the .yuRUbf a selector. The select() method returns a list of all matching links.
To iterate over the links, loop over the list that select() returned. Use get('href') or ['href'] to extract the attribute.
for url in soup.select(".yuRUbf a"):
    print(url.get("href"))
Also, make sure you send a user-agent request header so the request looks like a "real" user visit. The default requests user-agent is python-requests, so websites can tell that the request most likely comes from a script. Check what your user-agent is.
Code and full example in online IDE:
from bs4 import BeautifulSoup
import requests, lxml
# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "q": "python",
    "hl": "en",  # language
    "gl": "us"   # country of the search, US -> USA
}
# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36",
}
html = requests.get("https://www.google.com/search", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(html.text, "lxml")
for url in soup.select(".yuRUbf a"):
    print(url.get("href"))
Output:
https://www.python.org/
https://en.wikipedia.org/wiki/Python_(programming_language)
https://www.w3schools.com/python/
https://www.w3schools.com/python/python_intro.asp
https://www.codecademy.com/catalog/language/python
https://www.geeksforgeeks.org/python-programming-language/
If you don't want to figure out how to build a reliable parser from scratch and maintain it, have a look at API solutions. For example Google Organic Results API from SerpApi.
Hello World example:
from serpapi import GoogleSearch
import os
# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    "api_key": os.getenv("API_KEY"),  # your serpapi api key
    "engine": "google",               # search engine
    "q": "python"                     # search query
    # other parameters
}
search = GoogleSearch(params)     # where data extraction happens on the SerpApi backend
result_dict = search.get_dict()   # JSON -> Python dict
for result in result_dict["organic_results"]:
    print(result["link"])
Output:
https://www.python.org/
https://en.wikipedia.org/wiki/Python_(programming_language)
https://www.w3schools.com/python/
https://www.codecademy.com/catalog/language/python
https://www.geeksforgeeks.org/python-programming-language/

Is this what you need?
from bs4 import BeautifulSoup
import requests, urllib.parse
import lxml
def print_extracted_data_from_url(url):
    headers = {
        "User-Agent":
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
    }
    response = requests.get(url, headers=headers).text
    soup = BeautifulSoup(response, 'lxml')
    # print the first link of every organic result container
    for container in soup.find_all('div', class_='tF2Cxc'):
        head_link = container.a['href']
        print(head_link)
    # return the "next page" link node (None if there is no next page)
    return soup.select_one('a#pnnext')
next_page_node = print_extracted_data_from_url('https://www.google.com/search?hl=en-US&q=python')
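The function above returns the "next page" link node, whose href is typically relative. Below is a small sketch of turning such an href into an absolute URL that could be passed back into the function. The helper name absolute_next_url and the sample href are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.google.com/search"

def absolute_next_url(next_href, base=BASE_URL):
    """Turn a relative "next page" href into an absolute URL.

    Returns None when there is no next page (empty or missing href).
    """
    if not next_href:
        return None
    return urljoin(base, next_href)

# Hypothetical href value; in practice it would come from next_page_node["href"].
print(absolute_next_url("/search?q=python&start=10"))
print(absolute_next_url(None))
```

Checking for a missing href first matters because select_one('a#pnnext') returns None on the last results page.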

Request.get() always returning a none value from google.com

I have this function, which is meant to return how many search results were found for a specific word. It was working at one point, but now it only ever returns None. Does anybody have some insight into this issue?
edit: sorry the url is set to "https://www.google.com/search?q="
def pyGoogleSearch(userInput):#Creates a list of values based off the total number of results
    newWord = url + userInput #add the url and userInput into one object
    page = requests.get(newWord)#search the word in google
    soup = BeautifulSoup(page.content,'lxml')#create a soup objects which parses the html
    search = soup.find('div',id="resultStats").text#actually search for the value
    [int(s) for s in search.split() if s.isdigit()] #convert value to a list of values, still broken up
    print(search)#debug
    return search

As others have mentioned in the comments, we don't know what your url is set to, and it's likely that it's either not set or set to a wrong URL.
If you are looking to query sites such as Wikipedia, the solution below is a much simpler approach. It appends the search word to the URL and sends the request. Once the response is fetched and decoded, we iterate through it and count the number of times the word occurs. You can modify this and apply it to your problem.
import urllib.request
def getTopicCount(topic):
    url = "https://en.wikipedia.org/w/api.php?action=parse&section=0&prop=text&format=json&page="
    contents = urllib.request.urlopen(url+topic).read().decode('utf-8')
    count = 0
    pos = contents.find(topic)  # index of the first occurrence, or -1 if not found
    while pos != -1:  # loop until no further occurrence is found
        count += 1
        pos = contents.find(topic, pos+1)  # search again, starting just after the previous hit
    return count
print(getTopicCount("pizza"))  # prints 146
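The counting loop can be tried without a network call. The sketch below isolates it in a hypothetical helper, count_occurrences, and compares it with the built-in str.count(). Because the loop restarts one character after each hit, it also counts overlapping matches, while str.count() counts only non-overlapping ones. The sample strings are made up.

```python
def count_occurrences(text, word):
    """Count occurrences of word in text, including overlapping ones."""
    count = 0
    pos = text.find(word)
    while pos != -1:
        count += 1
        # restart one character after the previous hit
        pos = text.find(word, pos + 1)
    return count

sample = "pizza, more pizza, and pizzapizza"
print(count_occurrences(sample, "pizza"))  # same as sample.count("pizza") here
print(sample.count("pizza"))

# The two approaches differ only when matches can overlap.
print(count_occurrences("aaaa", "aa"), "aaaa".count("aa"))
```

For an ordinary word such as "pizza" the two give the same number, so str.count() is a shorter alternative.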

It's because you haven't specified a user-agent in the HTTP request headers. Learn more about user-agent and request headers.
Basically, the user-agent identifies the browser, its version number, and its host operating system. It represents a person (browser) in a web context and lets servers and network peers tell whether a request comes from a bot. Check what your user-agent is.
Pass the user-agent in the request headers:
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
requests.get("YOUR_URL", headers=headers)
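To check that a custom header is really attached before sending anything, you can build the request object and inspect it. The sketch below swaps in the standard library's urllib.request instead of requests so that it runs without extra packages or network access. The shortened user-agent string and the URL are only illustrative.

```python
import urllib.request

# Illustrative browser-like user-agent string.
CUSTOM_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36"
)

# Build the request object only; nothing is sent over the network here.
req = urllib.request.Request(
    "https://www.google.com/search?q=python",
    headers={"User-Agent": CUSTOM_UA},
)

# urllib stores header names capitalized, i.e. as "User-agent".
print(req.get_header("User-agent"))
```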
Use select_one() instead. CSS selectors are more readable and a bit faster. CSS selectors reference.
soup.select_one('#result-stats nobr').previous_sibling
# About 107,000 results
Code and example in the online IDE:
import requests, lxml
from bs4 import BeautifulSoup
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
params = {
    "q": "fus ro dah definition",  # query
    "gl": "us",                    # country
    "hl": "en"                     # language
}
response = requests.get('https://www.google.com/search',
                        headers=headers,
                        params=params)
soup = BeautifulSoup(response.text, 'lxml')
# .previous_sibling will go to, well, previous sibling removing unwanted part: "(0.38 seconds)"
number_of_results = soup.select_one('#result-stats nobr').previous_sibling
print(number_of_results)
# About 107,000 results
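The extracted value is still a string such as "About 107,000 results". The question's s.isdigit() filter skips "107,000" because of the comma, so here is a small sketch of converting the text to an integer. The helper name parse_result_count is hypothetical, and the pattern assumes English-style thousands separators (as with hl=en).

```python
import re

def parse_result_count(stats_text):
    """Extract the first number from text like "About 107,000 results".

    Returns an int, or None when the text contains no digits.
    Assumes commas are used as thousands separators.
    """
    match = re.search(r"\d[\d,]*", stats_text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))

print(parse_result_count("About 107,000 results"))
print(parse_result_count("No results"))
```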
Alternatively, you can achieve the same thing by using the Google Organic Results API from SerpApi. It's a paid API with a free plan.
The main difference in your case is that you don't have to deal with selecting selectors to extract data or maintain parser over time since it's already done for the end-user. The only thing that needs to be done is just to get the data you want from the structured JSON.
Code to integrate:
import os
from serpapi import GoogleSearch
params = {
    "engine": "google",
    "q": "fus ro dah definition",
    "api_key": os.getenv("API_KEY"),
}
search = GoogleSearch(params)
results = search.get_dict()
result = results["search_information"]['total_results']
print(result)
# 107000
P.S - I wrote a blog post about how to scrape Google Organic Results.
Disclaimer, I work for SerpApi.

Beautiful Soup CSS selector not finding anything

I'm using Python 3. The code below is supposed to let the user enter a search term into the command line, after which it searches Google and runs through the HTML of the results page to find tags matching the CSS selector ('.r a').
Say we search for the term "cats." I know the tags I'm looking for exist on the "cats" search results page since I looked through the page source myself.
But when I run my code, the linkElems list is empty. What is going wrong?
import requests, sys, bs4
print('Googling...')
res = requests.get('http://google.com/search?q=' +' '.join(sys.argv[1:]))
print(res.raise_for_status())
soup = bs4.BeautifulSoup(res.text, 'html5lib')
linkElems = soup.select(".r a")
print(linkElems)

The ".r" class is rendered by JavaScript, so it's not available in the HTML received. You can either render the JavaScript using Selenium or similar, or you can try a more creative solution to extract the links from the tags. First check that the tags exist by finding them without the ".r" class: soup.find_all("a"). Then, as an example, you can use a regex to extract all URLs beginning with "/url?q=":
import re
linkelems = soup.find_all(href=re.compile(r"^/url\?q=.*"))
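Links matched this way still point at Google's "/url?q=..." redirect rather than at the target site. A minimal sketch of unwrapping them with the standard library is shown below. The helper name unwrap_google_redirect is hypothetical, and the sample href is a shortened illustration of the redirect format.

```python
from urllib.parse import urlparse, parse_qs

def unwrap_google_redirect(href):
    """Return the target of a "/url?q=..." link, or href unchanged otherwise."""
    parsed = urlparse(href)
    if parsed.path == "/url":
        # parse_qs maps each query key to a list of values
        target = parse_qs(parsed.query).get("q")
        if target:
            return target[0]
    return href

# Shortened, illustrative redirect href.
href = "/url?q=https://en.wikipedia.org/wiki/Chocolate&sa=U"
print(unwrap_google_redirect(href))
print(unwrap_google_redirect("https://www.python.org/"))
```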

The parts you want to extract are not rendered by JavaScript, contrary to what Matts mentioned, and you don't need regex for such a task.
Make sure you're sending a user-agent, otherwise Google will eventually block your request. That might be the reason why you were getting an empty output: you received completely different HTML. Check what your user-agent is. I have already answered a question about what user-agent and HTTP headers are.
Pass the user-agent in the HTTP headers:
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
requests.get("YOUR_URL", headers=headers)
html5lib is the slowest parser; try lxml instead, which is much faster. If you want an even faster parser, have a look at selectolax.
Code and full example in the online IDE:
from bs4 import BeautifulSoup
import requests
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
params = {
    "q": "selena gomez"
}
html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')
for result in soup.select('.tF2Cxc'):
    link = result.select_one('.yuRUbf a')['href']
    print(link)
----
'''
https://www.instagram.com/selenagomez/
https://www.selenagomez.com/
https://en.wikipedia.org/wiki/Selena_Gomez
https://www.imdb.com/name/nm1411125/
https://www.facebook.com/Selena/
https://www.youtube.com/channel/UCPNxhDvTcytIdvwXWAm43cA
https://www.vogue.com/article/selena-gomez-cover-april-2021
https://open.spotify.com/artist/0C8ZW7ezQVs4URX5aX7Kqx
'''
Alternatively, you can achieve the same thing using Google Organic Results API from SerpApi. It's a paid API with a free plan.
The difference in your case is that you don't have to deal with the parsing part, instead, you only need to iterate over structured JSON and get the data you want, plus you don't have to maintain the parser over time.
Code to integrate:
import os
from serpapi import GoogleSearch
params = {
    "engine": "google",
    "q": "selena gomez",
    "api_key": os.getenv("API_KEY"),
}
search = GoogleSearch(params)
results = search.get_dict()
for result in results["organic_results"]:
    link = result['link']
    print(link)
----
'''
https://www.instagram.com/selenagomez/
https://www.selenagomez.com/
https://en.wikipedia.org/wiki/Selena_Gomez
https://www.imdb.com/name/nm1411125/
https://www.facebook.com/Selena/
https://www.youtube.com/channel/UCPNxhDvTcytIdvwXWAm43cA
https://www.vogue.com/article/selena-gomez-cover-april-2021
https://open.spotify.com/artist/0C8ZW7ezQVs4URX5aX7Kqx
'''
P.S - I wrote a blog post about how to scrape Google Organic Search Results.
Disclaimer, I work for SerpApi.

Using BeautifulSoup to scrape Google top feedback results for phone number

I'm a beginner at Python. I'm trying to run a script that lets a person enter a university name and get a phone number back. The Google feedback result is all I need, for example a search for "university of alabama" followed by the word "phone".
But running the code gives me the result "None".
I need help getting down to the phone number in my scrape using Beautiful Soup.
Any suggestions?

The CSS selectors provided in the answers by QHarr and Bitto Bennichan no longer exist in the current HTML layout of Google organic results, so they will throw an error (if used without a try/except block).
Currently, it's this:
>>> phone = soup.select_one('.mw31Ze').text
"+1 205-348-6010"
Also, it was returning None because no user-agent was specified, so Google blocked your request and you received different HTML with some sort of error.
The default requests user-agent is python-requests. Google recognizes it and blocks the request since it's not a "real" user visit. Check what your user-agent is.
Pass the user-agent into the request headers:
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
requests.get("YOUR_URL", headers=headers)
Code:
import requests, lxml
from bs4 import BeautifulSoup
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
params = {
    "hl": "en",
    "gl": "uk"  # country to search from. uk = United Kingdom. us = United States
}
query = input("What would you like to search: ")
query = f"https://www.google.com/search?q={query} phone"
response = requests.get(query, headers=headers, params=params)
soup = BeautifulSoup(response.text, 'lxml')
try:
    phone = soup.select_one(".X0KC1c").text
except AttributeError:  # select_one() returned None, so there is no .text
    phone = "not found"
print(phone)
'''
What would you like to search: university of alabama
+1 205-348-6010
'''
Alternatively, you can achieve the same thing by using Google Knowledge Graph API from SerpApi. It's a paid API with a free plan.
The difference in your case is that you only need to iterate over structured JSON and get the data you want rather than figuring out why certain things break and don't work as they should, and you don't have to maintain the parser over time if some selectors change and cause the parser to break.
Code to integrate:
from serpapi import GoogleSearch
params = {
    "api_key": "YOUR_API_KEY",
    "engine": "google",
    "q": "university of alabama phone",
    "gl": "uk",
    "hl": "en"
}
search = GoogleSearch(params)
results = search.get_dict()
phone = results['knowledge_graph']['phone']
print(phone)
# +1 205-348-6010
Disclaimer, I work for SerpApi.

You are using the find method wrong. You need to give the name of the tag and then any attribute that identifies the specific tag uniquely. You can use the inspect tool to find the tag that contains the phone number.
Also, you may need to find your user-agent and pass it as a header to the request to get the exact same response from Google. Just search "what is my user agent" in Google to find your user agent.
from bs4 import BeautifulSoup
import requests
headers={
    'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
}
r=requests.get('https://www.google.com/search?q=university+of+alabama+phone',headers=headers)
soup=BeautifulSoup(r.text,'html.parser')
ph_no=soup.find('div',class_='Z0LcW').text
print(ph_no)
Output
+1 205-348-6010
Documentation
find() method - BeautifulSoup

There's no guarantee this holds across all searches, but you can use a CSS class selector to retrieve the first result with select_one, wrapped in a try/except:
import requests
from bs4 import BeautifulSoup
query = input("What would you like to search: ")
query = query.replace(" ","+")
query = "https://www.google.com/search?q=" + query + "+phone"  # "+" separates the query from "phone"
r = requests.get(query)
html_doc = r.text
soup = BeautifulSoup(html_doc, 'lxml')
try:
    s = soup.select_one(".mrH1y").text
except AttributeError:  # select_one() returned None
    s = "not found"
print(s)

extracting href from <a> beautiful soup

I'm trying to extract a link from a google search result. Inspect element tells me that the section I am interested in has "class = r". The first result looks like this:
<h3 class="r" original_target="https://en.wikipedia.org/wiki/chocolate" style="display: inline-block;">
<a href="https://en.wikipedia.org/wiki/Chocolate"
ping="/url?sa=t&source=web&rct=j&url=https://en.wikipedia.org/wiki/Chocolate&ved=0ahUKEwjW6tTC8LXZAhXDjpQKHSXSClIQFgheMAM"
saprocessedanchor="true">
Chocolate - Wikipedia
</a>
</h3>
To extract the "href" I do:
import bs4, requests
res = requests.get('https://www.google.com/search?q=chocolate')
googleSoup = bs4.BeautifulSoup(res.text, "html.parser")
elements= googleSoup.select(".r a")
elements[0].get("href")
But I unexpectedly get:
'/url?q=https://en.wikipedia.org/wiki/Chocolate&sa=U&ved=0ahUKEwjHjrmc_7XZAhUME5QKHSOCAW8QFggWMAA&usg=AOvVaw03f1l4EU9fYd'
Where I wanted:
"https://en.wikipedia.org/wiki/Chocolate"
The attribute "ping" seems to be confusing it. Any ideas?

What's happening?
If you print the response content (i.e. googleSoup.text) you'll see that you're getting a completely different HTML. The page source and the response content don't match.
This is not happening because the content is loaded dynamically; as even then, the page source and the response content are the same. (But the HTML you see while inspecting the element is different.)
A basic explanation for this is that Google recognizes the Python script and changes its response.
Solution:
You can pass a fake User-Agent to make the script look like a real browser request.
Code:
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
r = requests.get('https://www.google.co.in/search?q=chocolate', headers=headers)
soup = BeautifulSoup(r.text, 'lxml')
elements = soup.select('.r a')
print(elements[0]['href'])
Output:
https://en.wikipedia.org/wiki/Chocolate
Resources:
Sending “User-agent” using Requests library in Python
How to use Python requests to fake a browser visit?
Using headers with the Python requests library's get method

As the other answer mentioned, it's because there was no user-agent specified. The default requests user-agent is python-requests, so Google blocks the request because it knows that it's a bot and not a "real" user visit.
Sending a browser user-agent in the HTTP request headers makes the request look like a user visit. It can be done by passing custom headers (check what your user-agent is):
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
requests.get("YOUR_URL", headers=headers)
Additionally, to get more accurate results you can pass URL parameters:
params = {
    "q": "samurai cop, what does katana mean",  # query
    "gl": "in",  # country to search from
    "hl": "en"   # language
    # other parameters
}
requests.get("YOUR_URL", params=params)
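To see roughly what the params dictionary becomes on the wire, the query string can be built by hand with the standard library. This is only a sketch of the encoding step, not what requests does internally in every detail.

```python
from urllib.parse import urlencode

params = {
    "q": "samurai cop, what does katana mean",  # query
    "gl": "in",  # country to search from
    "hl": "en",  # language
}

# urlencode escapes special characters: spaces become "+", commas become "%2C".
url = "https://www.google.com/search?" + urlencode(params)
print(url)
```

Passing a dictionary this way avoids hand-escaping spaces and punctuation in the search query.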
Code and full example in the online IDE (code from another answer will throw an error because of CSS selector change):
from bs4 import BeautifulSoup
import requests, lxml
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
params = {
    "q": "samurai cop what does katana mean",
    "gl": "in",
    "hl": "en"
}
html = requests.get("https://www.google.com/search", headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')
for result in soup.select('.tF2Cxc'):
    title = result.select_one('.DKV0Md').text
    link = result.select_one('.yuRUbf a')['href']
    print(f'{title}\n{link}\n')
-------
'''
Samurai Cop - He speaks fluent Japanese - YouTube
https://www.youtube.com/watch?v=paTW3wOyIYw
Samurai Cop - What does "katana" mean? - Quotes.net
https://www.quotes.net/mquote/1060647
Samurai Cop (1991) - Mathew Karedas as Joe Marshall - IMDb
https://www.imdb.com/title/tt0130236/characters/nm0360481
...
'''
Alternatively, you can achieve the same thing by using Google Organic Results API from SerpApi. It's a paid API with a free plan.
The difference in your case is that you only need to iterate over structured JSON and get the data you want fast, rather than figuring out why certain things don't work as they should and then maintain the parser over time.
Code to integrate:
import os
from serpapi import GoogleSearch
params = {
    "engine": "google",
    "q": "samurai cop what does katana mean",
    "hl": "en",
    "gl": "in",
    "api_key": os.getenv("API_KEY"),
}
search = GoogleSearch(params)
results = search.get_dict()
for result in results["organic_results"]:
    print(result['title'])
    print(result['link'])
    print()
------
'''
Samurai Cop - He speaks fluent Japanese - YouTube
https://www.youtube.com/watch?v=paTW3wOyIYw
Samurai Cop - What does "katana" mean? - Quotes.net
https://www.quotes.net/mquote/1060647
...
'''
Disclaimer, I work for SerpApi.
