I am currently trying to get a list of Google Search results via Python.
Many different packages have stopped working or have been deprecated since Google changed the HTML layout several years ago, e.g. pygoogle, xgoogle.
Searching for "Hiking Trails Los Angeles" on Google, how do I return the top 10 results, ideally with url, title and description or other attributes that are available?
Yes, the Google Search API has been deprecated, and with it pygoogle, which was just a wrapper around that API. At the top of the Search API page there's a deprecation warning, along with:
We encourage you to investigate the Custom Search API, which may
provide an alternative solution.
But using this Custom Search API to search the whole web isn't entirely straightforward. Here I found two detailed guides (SO answers):
Programmatically searching google in Python using custom search
1st step: get Google API key.
2nd step: setup Custom Search Engine so that you can search the entire web.
3rd step: install Google API client for Python.
4th step (bonus): do the search.
So, after setting this up, you can follow the code samples from a few places:
simple example: https://github.com/google/google-api-python-client/blob/master/samples/customsearch/main.py
cse() function docs: https://google-api-client-libraries.appspot.com/documentation/customsearch/v1/python/latest/customsearch_v1.cse.html
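Putting it together, here is a minimal sketch (the key and engine ID below are just placeholders for the credentials you created in steps 1 and 2) that prints the title, URL and description of the top 10 results:

from googleapiclient.discovery import build

# Placeholders for the credentials created in steps 1 and 2.
my_api_key = "YOUR_API_KEY"
my_cse_id = "YOUR_CSE_ID"

# Build the Custom Search service and run the query (num is capped at 10).
service = build("customsearch", "v1", developerKey=my_api_key)
res = service.cse().list(q="Hiking Trails Los Angeles", cx=my_cse_id, num=10).execute()

# Each entry in res["items"] is a dict with "title", "link", "snippet", etc.
for item in res.get("items", []):
    print(item["title"], item["link"], item["snippet"])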
What are the alternatives now that the Google web search API has been deprecated?
Yes, Google Custom Search has now replaced the old Search API, but you
can still use Google Custom Search to search the entire web, although
the steps are not obvious from the Custom Search setup.
To create a Google Custom Search engine that searches the entire web:
From the Google Custom Search homepage (http://www.google.com/cse/), click Create a Custom Search Engine.
Type a name and description for your search engine.
Under Define your search engine, in the Sites to Search box, enter at least one valid URL (for now, just put www.anyurl.com to get past this screen; more on this later).
Select the CSE edition you want and accept the Terms of Service, then click Next.
Select the layout option you want, and then click Next.
Click any of the links under the Next steps section to navigate to your Control panel.
In the left-hand menu, under Control Panel, click Basics.
In the Search Preferences section, select Search the entire web but emphasize included sites.
Click Save Changes.
In the left-hand menu, under Control Panel, click Sites.
Delete the site you entered during the initial setup process.
Google Custom Search is not entirely free. Pricing:
Custom Search Engine (free): the API provides 100 search queries per day for free. If you need more, you may sign up for billing in the API Console. Additional requests cost $5 per 1000 queries, up to 10k queries per day.
Google Site Search (paid): for detailed information on GSS usage limits and quotas, please check the GSS pricing options.
Related
I have a blog with the Google Analytics tag in the various pages. I also have links on my site pointing to pages on my site as well as external pages. I have not set up custom events or anything like that.
For a given URL/page on my site, within a certain date range, I want to programmatically get (ideally from the GA API):
The search words/traffic sources that unique users from outside my website (e.g. organic traffic searching on Google) used to land on and view that page
For specific links on that page - both internal and external - the number of unique users who clicked on the link and the number of clicks
For specific links on that page - both internal and external - the search terms/sources of the users who clicked the links vs. the visitors who didn't click them
Is there a way I can fire a given link on my blog into the Google Analytics API to get this data? I already have a 2-column table that has all of the pages on my site (column 1) and all of the links/urls on those pages (column 2).
I am using Python for all of this btw.
Thanks in advance for any help!
Regarding the information you're looking for:
You won't get organic keywords via the GA API: what you will get most of the time is (not provided) (here is some info and workarounds). You can get this data in the GA UI by linking the search console, but that data won't be exposed via the GA API, only the Search Console API (formerly "webmasters"), which unfortunately you won't be able to link with your GA data.
You will need to implement custom events if you want to track link clicks, as by default GA doesn't do it (here is an example which you can use for both internal and external links). Once you have the events implemented, you can use the ga:eventAction or ga:eventLabel to filter on your links (depending on how you implemented the events), and ga:totalEvents / ga:uniqueEvents to get the total / unique number of clicks.
You will need to create segments in order to define conditions about what users did or did not do. What I advise you to do is to create and test your segments via the UI to make sure they're correct, then simply refer to the segments via ID using the API.
As for the GA API implementation, before coding I advise you to get familiar with the API using:
The query explorer
Google Sheets + GA API plugin
Once you get the results you're looking for, you can automate with the Google API Python client (it's the same client for nearly all Google APIs; GA is just one of the services you build with it), and you'll find some Python samples here.
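For example, here is a minimal sketch of a Core Reporting API (v3) query with that client; the key file, view ID, date range, page path and segment ID are all placeholders you'd swap for your own, and the event dimensions assume you've implemented the click events as described above:

import httplib2
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

# Placeholder service-account key file with read-only Analytics scope.
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    "service-account.json",
    ["https://www.googleapis.com/auth/analytics.readonly"])
analytics = build("analytics", "v3", http=credentials.authorize(httplib2.Http()))

# Placeholder view ID, date range, page path and segment ID.
response = analytics.data().ga().get(
    ids="ga:XXXXXXXX",
    start_date="2018-01-01",
    end_date="2018-01-31",
    metrics="ga:totalEvents,ga:uniqueEvents",
    dimensions="ga:eventAction,ga:eventLabel",
    filters="ga:pagePath==/some-blog-post/",
    segment="gaid::YOUR_SEGMENT_ID").execute()

for row in response.get("rows", []):
    print(row)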
My goal is to create a small script that finds all the results of a Google search, but in a "raw" form.
I don't speak English very well, so I'd rather give an example to show you what I would like:
I type: elephant
The script returns:
www.elephant.com
www.bluelephant.com
www.ebay.com/elephant
.....
I was thinking about urllib.request, but its return value isn't usable for that!
I found some tutorials, but none of them are adapted to what I want!
Like I told you, my goal is to have a .txt file as output which contains all the websites that match my query!
Thanks all
One simple way is to make a request to Google Search and then parse the HTML result. You can use a Python library such as Beautiful Soup to parse the HTML content easily and finally extract the URL links you need.
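Just as a rough sketch of that approach (Google's markup changes regularly and scraping their results may violate their Terms of Service, so treat this purely as an illustration):

import requests
from bs4 import BeautifulSoup

# Pretend to be a browser, otherwise Google may serve a stripped-down page.
resp = requests.get("https://www.google.com/search",
                    params={"q": "elephant"},
                    headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(resp.text, "html.parser")

with open("results.txt", "w") as f:
    for a in soup.find_all("a", href=True):
        href = a["href"]
        # Result links have historically been wrapped as /url?q=<target>&...
        if href.startswith("/url?q="):
            f.write(href[len("/url?q="):].split("&")[0] + "\n")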
These seem to change often, so hopefully this answer remains useful for a little while...
First, you'll need to create a Google Custom Search, by visiting their site or following the instructions provided here https://developers.google.com/custom-search/docs/tutorial/creatingcse.
This will provide you with both
Custom Search Engine ID
API Key
credentials which are needed to use the service.
In your python script, you'll want to import the following package:
from googleapiclient.discovery import build
which will enable you to create a build object:
service = build("customsearch", developerKey=my_api_key)
According to the docs, this constructs a resource for interacting with the API.
When you want to return search results, call execute() on service's cse().list() method:
res = service.cse().list(q=my_search_keyword, cx=my_cse_id, **kwargs).execute()
to return the search results, where res['items'] is a list and each result in it is a dictionary object. The i-th result's URL can be accessed with the 'link' key:
ith_result = res['items'][i]['link']
Note that you can only return 10 results in a single call, so make use of the start keyword argument in .list(), and consider embedding this call in a loop to generate several links at a time.
You should be able to find plenty of SO answers about saving your search results to a text file.
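For example, a sketch that pages through the first 30 results with start and writes the links to a text file (reusing the placeholder service, my_search_keyword and my_cse_id from above):

# Page through the first 30 results, 10 at a time; CSE's start is 1-indexed.
links = []
for start in (1, 11, 21):
    res = service.cse().list(q=my_search_keyword, cx=my_cse_id, start=start).execute()
    links.extend(item["link"] for item in res.get("items", []))

# Dump the collected URLs to a text file.
with open("results.txt", "w") as f:
    f.write("\n".join(links))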
N.B. One more thing that confused me at first - presumably you'll want to search the entire web, and not just a single site. But when creating your CSE you will be asked to specify a single site, or list of sites, to search. Don't worry, just enter any old thing, you can delete it later. Even Google endorses this hack:
Convert a search engine to search the entire web: On the Custom Search
home page, click the search engine you want. Click Setup, and then
click the Basics tab. Select Search the entire web but emphasize
included sites. In the Sites to search section, delete the site you
entered during the initial setup process.
I'll just add 2 points to 9th Dimension's answer.
Use this guide to find your Custom Search Engine ID
A small modification should be made in the second line of the code: the API version ('v1') should be added as an argument, as follows:
service = build('customsearch', 'v1', developerKey=my_api_key)
You have 2 options: use the API, or make a request the way a browser does and then parse the HTML.
The first option is rather tricky to set up and is limited: 100 free queries per day, then $5 per 1,000 queries.
The second option is easier, but it violates Google's Terms of Service.
Right now I am trying to figure out how to take a row of data, maybe 50 entries at most, and enter each entry individually into a search bar. But first I need to understand the basic concepts, so I want to write a practice program that takes info from an Excel sheet and enters it into a Google or YouTube search, for example.
My problem is that there seems to be no resource on how to do this for beginners. All the posts I have read either cover only part of the problem or are about creating a search bar rather than using one. And every post I read suggests 100 plug-ins I could possibly add.
I'm just looking for a consistent explanation so I can grasp how to manipulate code in order to use a site's search function.
To perform a web search (Google, YouTube or whatever) from a program you need to execute the search, either by building up and calling an appropriate search URL or by making a call to an API provided by that site.
The article 'Python - Search for YouTube video' provides a code sample and explanation of how to generate and call a URL to perform a YouTube keyword search. You could do something similar for a Google search by analysing the URL from the result of a Google search, or try searching for 'Python submit google search url'.
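For instance, a tiny sketch of that URL-based route (YouTube's own search box uses the search_query parameter; Google's equivalent would be https://www.google.com/search?q=...):

import webbrowser
from urllib.parse import urlencode

keyword = "funny cats"  # e.g. one entry read from your Excel row
# Build the same URL YouTube's search box produces and open it in a browser;
# you could instead fetch it with requests/urllib and parse the HTML.
url = "https://www.youtube.com/results?" + urlencode({"search_query": keyword})
webbrowser.open(url)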
The above approach is simplistic and relies on the URL structure for a particular site staying the same. A more complex, reliable and flexible approach is to use the API; for YouTube, see the links below and the short sketch that follows them:
YouTube API - Python developers guide
YouTube API - Python code samples - Search by keyword
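A minimal sketch of the API route, assuming you've created an API key in the Google APIs console with the YouTube Data API enabled:

from googleapiclient.discovery import build

# Placeholder key with the YouTube Data API (v3) enabled.
youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")
response = youtube.search().list(q="funny cats", part="snippet", maxResults=10).execute()

# Each item carries a snippet (title, description, ...) and an id object.
for item in response.get("items", []):
    print(item["snippet"]["title"], item["id"].get("videoId"))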
I have a Python program that takes the MD5 & SHA1 hash values of passwords and searches for them on the internet using Google's Custom Search API. The problem is that I'm getting 0 results (which means the hash probably isn't in a rainbow table) when I run the program. But when I search using my browser, I get a whole bunch of results, in fact at least 10 pages of them.
Could the problem lie in the cx value I used? I picked it up from the sample program provided by Google, as I couldn't figure out how to get one for myself. Or does the Custom Search API give only selected results, making it futile to try to get more results from it?
I know it's a pretty old post, but it is still returned very high in Google results, so a little bit of clarification:
You can create your own CSE in here: https://www.google.com/cse/ .
API codes can be created using API console: https://cloud.google.com/ .
Using Google Custom Search you can search the whole Web: go to the system from point 1, from the menu on the left choose the CSE to edit, then in the Configuration -> Basics -> Sites select the option to search the whole Web and finally remove previously specified sites.
Still, using a CSE you might not get the same results as live Google, since it does not include Google features (real-time results, social features, etc.), and once you specify more than 10 sites to search it can actually use a sub-index. More information can be found here: https://support.google.com/customsearch/answer/70392?hl=en
The Google Custom Search API lets you search the Google indexes for a specific website only, and you will not find any results from anywhere else on the internet. The cx parameter tells Google which website you want to search.
From the Google Custom Search Engine page:
With Google Custom Search, add a search box to your homepage to help people find what they need on your website.
You could use the deprecated Google Web Search API (JavaScript API, should work until November 2013), or you'd have to scrape the HTML UI provided to your browser instead (also see What are the alternatives now that the Google web search API has been deprecated?).
I am trying to use the Google Custom Search API, and I don't have a clue where to start. It seems you have to create a "custom search engine" in order to parse search results, and you are limited to 100 queries per day.
What module should I use for this? I believe I start here: http://code.google.com/p/google-api-python-client/
I need an API key or something? Basically I want to be able to do this operation, and Google's documentation is confusing, or perhaps beyond my level.
Pseudocode:
from AwesomeGoogleModule import GoogleSearch
search = GoogleSearch("search term")
results = search.SearchResultsNumber
print results
Basically, I want to scrape that total-results number you get for a particular search term. I don't want to go via Google's front end, because it's very easy to get blocked that way. I don't need to go beyond the 100 searches per day that the API allows. This will only be for 30-50 search terms, maybe 80-100 at MOST.
Sample code for Custom Search using the google-api-python-client library is here:
http://code.google.com/p/google-api-python-client/source/browse/#hg%2Fsamples%2Fcustomsearch
You will need to create your own API Key by visiting:
https://code.google.com/apis/console/
Create a project in the APIs Console, making sure to turn on the Custom Search API for that project, and then you will find the API Key at the bottom of the API Access tab.
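Once the key is created and the Custom Search API is enabled, the total-results figure from your pseudocode is available in the response's searchInformation field; a minimal sketch with placeholder credentials:

from googleapiclient.discovery import build

# Placeholders: your API key and Custom Search Engine ID.
service = build("customsearch", "v1", developerKey="YOUR_API_KEY")
res = service.cse().list(q="search term", cx="YOUR_CSE_ID").execute()

# Estimated total number of results for the query.
print(res["searchInformation"]["totalResults"])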