It's trivial to search for a set of keywords on a certain website within a specific date range: in the Google search box you enter
desired-keywords site:desired-website
then pick the date range from the Tools menu.
E.g. the search term "arab spring" on www.cnn.com between 1st Jan 2011 and 31st Dec 2013:
As you can see in the second picture there are about 773 results!
The search URI looks like this:
https://www.google.co.nz/search?tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2011%2Ccd_max%3A12%2F31%2F2013&ei=iDcnWoy3Jsj38QW514S4Aw&q=arab+spring+site%3Awww.cnn.com&oq=arab+spring+site%3Awww.cnn.com&gs_l=psy-ab.12...0.0.0.6996.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.a4-ff19obY4
The date range can be seen in the cd_min and cd_max values of the tbs parameter (which appears in the URI whenever the Tools menu is used).
I would like to get the same functionality programmatically using Google's Custom Search API client for Python.
I defined a custom search engine:
Then I tried different suggestions I found on the web and Stack Overflow:
This is a related question which is left unanswered.
This post about date range search using the Google Custom Search API, referred to here, suggests using the 'sort' parameter to do the job (sort = 'date:r:yyyymmdd:yyyymmdd'). It did not work: "totalResults" is "44900".
This post suggests using the dateRestrict field, which does not work either.
Well! Any working solution?
I might be late, but for other people searching for the solution, you can try this:
from googleapiclient.discovery import build

my_api_key = "YOUR_API_KEY"
my_cse_id = "YOUR_CSE_ID"

def google_results_count(query):
    # Build a client for version v1 of the Custom Search API.
    service = build("customsearch", "v1", developerKey=my_api_key)
    # The sort parameter restricts results to 2011-01-01 through 2013-12-31.
    result = service.cse().list(q=query, cx=my_cse_id, sort="date:r:20110101:20131231").execute()
    return result["searchInformation"]["totalResults"]

print(google_results_count('arab spring site:www.cnn.com'))
This code will return somewhere above 1,500 results.
That is still far from the web results; Google has an explanation of why.
Also, if you haven't set up your CSE to search the entire web, here's a guide on how to do it.
P.S. If you still want to get the web version's result/data, you can just scrape it using BeautifulSoup or other libraries.
In case you don't want to use the sort parameter, you can put the dates into your query parameter, like this:
https://customsearch.googleapis.com/customsearch/v1?
key=<api_key>&
cx=<search_engine_id>&
q="<your_search_word> after:<YYYY-MM-DD> before:<YYYY-MM-DD>"
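The request above can be assembled in Python with the standard library alone. This is only a minimal sketch: the helper name build_search_url is made up for illustration, and the key and engine ID are placeholders. urlencode takes care of escaping the spaces and colons in the q value.

```python
from urllib.parse import urlencode

BASE_URL = "https://customsearch.googleapis.com/customsearch/v1"

def build_search_url(api_key, cse_id, terms, after, before):
    """Return a Custom Search request URL with the date range embedded in q.

    `after` and `before` are YYYY-MM-DD strings, as in the template above.
    build_search_url is a hypothetical helper, not part of any Google library.
    """
    query = "{} after:{} before:{}".format(terms, after, before)
    params = {"key": api_key, "cx": cse_id, "q": query}
    return BASE_URL + "?" + urlencode(params)

url = build_search_url("YOUR_API_KEY", "YOUR_CSE_ID",
                       "arab spring", "2011-01-01", "2013-12-31")
print(url)
```

The resulting URL can be fetched with any HTTP client once real credentials are filled in.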
Related
My goal is to create a small script that finds all the results of a Google search, but in "raw" form.
I don't speak English very well, so I prefer to give an example to show you what I would like:
I type: elephant
The script returns:
www.elephant.com
www.bluelephant.com
www.ebay.com/elephant
.....
I was thinking about urllib.request, but the return value would not be usable for that!
I found some tutorials, but they are not adapted at all to what I want!
As I said, my goal is to have a .txt file as output which contains all the websites that match my query!
Thanks all
One simple way is to make a request to Google search, then parse the HTML result. You can use a Python library such as Beautiful Soup to parse the HTML content easily and extract the URLs you need.
These seem to change often, so hopefully this answer remains useful for a little while...
First, you'll need to create a Google Custom Search, by visiting their site or following the instructions provided here https://developers.google.com/custom-search/docs/tutorial/creatingcse.
This will provide you with both
Custom Search Engine ID
API Key
credentials which are needed to use the service.
In your python script, you'll want to import the following package:
from googleapiclient.discovery import build
which will enable you to create a build object:
service = build("customsearch", developerKey=my_api_key)
According to the docs, this constructs a resource for interacting with the API.
When you want to return search results, call execute() on service's cse().list() method:
res = service.cse().list(q=my_search_keyword, cx=my_cse_id, **kwargs).execute()
This returns a dictionary whose "items" key holds the list of search results, where each result is itself a dictionary. The i'th result's URL can be accessed with the "link" key:
ithresult = res['items'][i]['link']
Note that you can only return 10 results in a single call, so make use of the start keyword argument in .list(), and consider embedding this call in a loop to generate several links at a time.
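The loop suggested above could look like the following sketch. The function name collect_links is made up for illustration; it assumes that service is the object created with build() earlier, and that each response is a dictionary whose "items" key holds the results of one page.

```python
def collect_links(service, query, cse_id, total=30):
    """Collect up to `total` result URLs, requesting 10 per call.

    `start` is the 1-based index of the first result of each page.
    collect_links is a hypothetical helper, not part of the client library.
    """
    links = []
    for start in range(1, total + 1, 10):
        res = service.cse().list(q=query, cx=cse_id, start=start).execute()
        items = res.get("items", [])  # "items" is missing when a page is empty
        links.extend(item["link"] for item in items)
        if len(items) < 10:  # a short page means there is nothing further
            break
    return links[:total]
```

For example, collect_links(service, "elephant", my_cse_id, total=30) makes at most three calls. Keep in mind that every call counts against the daily query quota.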
You should be able to find plenty of SO answers about saving your search results to a text file.
N.B. One more thing that confused me at first - presumably you'll want to search the entire web, and not just a single site. But when creating your CSE you will be asked to specify a single site, or list of sites, to search. Don't worry, just enter any old thing, you can delete it later. Even Google endorses this hack:
Convert a search engine to search the entire web: On the Custom Search
home page, click the search engine you want. Click Setup, and then
click the Basics tab. Select Search the entire web but emphasize
included sites. In the Sites to search section, delete the site you
entered during the initial setup process.
I'd just add 2 points to the answer by "9th Dimension".
Use this guide to find your Custom Search Engine ID
A small modification should be made to the second line of the code: the version should be added as an argument, as follows:
service = build('customsearch', 'v1', developerKey=my_api_key)
You have 2 options: use the API, or make a request like a browser does and then parse the HTML.
The first option is rather tricky to set up and is limited: 100 free queries/day, then $5 per 1000.
The second option is easier, but it violates Google's ToS.
Right now I am trying to figure out how to take a row of data, maybe like 50 entries max, and enter it individually into a search bar. But first I need to understand the beginning concepts so I want to do a practice program that could take info from an Excel sheet and enter into a Google search or YouTube, for example.
My problem is there seems to be no resource on how to do this for beginners. All posts I have read are either parts of the whole problem or not related to actually using a search bar but instead creating one. Even then every post I read has 100 plug-ins I could possibly add.
I'm just looking for a coherent explanation from which I can grasp how to write code that uses a search bar.
To perform a web search (Google, YouTube or whatever) from a program you need to execute the search, either by building up and calling an appropriate search URL or by making a call to an API provided by that site.
The article 'Python - Search for YouTube video' provides a code sample and explanation of how to generate and call a URL to perform a YouTube keyword search. You could do something similar for a Google search by analysing the URL from the result of a Google search, or try searching for 'Python submit google search url'.
The above approach is simplistic and relies on the URL structure for a particular site staying the same. A more complex, reliable and flexible approach is to use the API. For YouTube:
YouTube API - Python developers guide
YouTube API - Python code samples - Search by keyword
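The simpler URL-building approach can be sketched as follows. This assumes the sheet has been exported from Excel as CSV; the column name "term", the sample rows, and the function name search_urls are all invented for the example. The two URL patterns are what the sites use at the time of writing and, as noted above, may change.

```python
import csv
import io
from urllib.parse import quote_plus

# Stand-in for a sheet exported from Excel as CSV; the column name
# "term" and the rows are invented for this example.
SHEET = "term\npython tutorial\nhow to tie a tie\n"

def search_urls(csv_text):
    """Yield (term, google_url, youtube_url) for each row of the sheet."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        term = row["term"].strip()
        if not term:
            continue  # skip blank cells
        encoded = quote_plus(term)  # spaces become "+", symbols are escaped
        yield (term,
               "https://www.google.com/search?q=" + encoded,
               "https://www.youtube.com/results?search_query=" + encoded)

for term, google_url, youtube_url in search_urls(SHEET):
    print(term, google_url, youtube_url)
```

Passing one of these URLs to webbrowser.open() from the standard library should open the search in the default browser, which is roughly the same as typing the term into the search bar.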
tl;dr: I am using the Amazon Product Advertising API with Python. How can I do a keyword search for a book and get XML results that contain TITLE, ISBN, and PRICE for each entry?
Verbose version:
I am working in Python on a web site that allows the user to search for textbooks from different sites such as eBay and Amazon. Basically, I need to obtain simple information such as titles, ISBNS, and prices for each item from a set of search results from one of those sites. Then, I can store and format that information as needed in my application (e.g, displaying HTML).
In eBay's case, getting the info I needed wasn't too hard. I used urllib2 to make a request based on a sample I found. All I needed was a special security key to add to the URL:
def ebaySearch(keywords): #keywords is a list of strings, e.g. ['moby', 'dick']
    #findItemsAdvanced allows category filter -- 267 is books
    #Of course, I replaced my security appname in the example below
    url = "http://svcs.ebay.com/services/search/FindingService/v1?OPERATION-NAME=findItemsAdvanced&SERVICE-NAME=FindingService&SERVICE-VERSION=1.0.0&SECURITY-APPNAME=[MY-APPNAME]&RESPONSE-DATA-FORMAT=XML&REST-PAYLOAD&categoryId=267&keywords="
    #Complete the url...
    numKeywords = len(keywords)
    for k in range(0, numKeywords-1):
        url += keywords[k]
        url += "%20"
    #There should not be %20 after last keyword
    url += keywords[numKeywords-1]
    request = urllib2.Request(url)
    response = urllib2.urlopen(request) #file like thing (due to library conversion)
    xml_response = response.read()
...
...Then I parsed this with minidom.
In Amazon's case, it doesn't seem to be so easy. I thought I would start out by just looking for an easy wrapper. But their developer site doesn't seem to provide a python wrapper for what I am interested in (the Product Advertising API). One that I have tried, python-amazon-product-api 0.2.5 from https://pypi.python.org/pypi/python-amazon-product-api/, has been giving me some installation issues that may not be worth the time to look into (but maybe I'm just exasperated..). I also looked around and found pyaws and pyecs, but these seem to use deprecated authentication mechanisms.
I then figured I would just try to construct the URLs from scratch as I did for eBay. But Amazon requires a time stamp in the URLs, which I suppose I could programmatically construct (perhaps something like these folks, who go the whole 9 yards with the signature: https://forums.aws.amazon.com/thread.jspa?threadID=10048).
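For reference, constructing the time stamp and signature by hand might look roughly like the sketch below. It follows the general pattern of AWS "Signature Version 2" request signing (sorted, percent-encoded parameters signed with HMAC-SHA256). The host, path, parameter names and the function name signed_query are assumptions for illustration and should be checked against the API documentation before relying on them.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote

def signed_query(params, secret_key, host="webservices.amazon.com",
                 path="/onca/xml", now=None):
    """Return a query string with Timestamp and Signature added.

    The default host and path are assumptions, not verified endpoints.
    """
    now = now or datetime.now(timezone.utc)
    params = dict(params, Timestamp=now.strftime("%Y-%m-%dT%H:%M:%SZ"))
    # Sort parameters by name and percent-encode both names and values.
    canonical = "&".join(
        "{}={}".format(quote(k, safe="-_.~"), quote(str(v), safe="-_.~"))
        for k, v in sorted(params.items()))
    # The signed string covers the HTTP verb, host, path and query.
    string_to_sign = "\n".join(["GET", host, path, canonical])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return canonical + "&Signature=" + quote(signature, safe="-_.~")
```

A real request would also need the access key and associate tag among the parameters, which is part of why a maintained wrapper library is usually less trouble.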
Even if that worked (which I doubt will happen, given the amount of frustration the logistics have given so far), the bottom line is that I want name, price, and ISBN for the books that I search for. I was able to generate a sample URL with the tutorial on the API website, and then see the XML result, which indeed contained titles and ISBNs. But no prices! Gah! After some desperate Google searching, a slight modification to the URL (adding &ResponseGroup=Offers and &MerchantID=All) did the trick, but then there were no titles. (I guess yet another question I would have, then, is where can I find an index of the possible ResponseGroup parameters?)
Overall, as you can see, I really just don't have a solid methodology for this. Is the construct-a-url approach a decent way to go, or will it be more trouble than it is worth? Perhaps the tl;dr at the top is a better representation of the overall question.
Another way could be amazon-simple-product-api:
from amazon.api import AmazonAPI

amazon = AmazonAPI(ACCESS_KEY, SECRET, ASSOC)
results = amazon.search(Keywords="book name", SearchIndex="Books")
for item in results:
    print(item.title, item.isbn, item.price_and_currency)
To install, just clone from GitHub and run
sudo python setup.py install
Hope this helps!
If you have installation issues with python-amazon-product-api, send the details to the mailing list and you will be helped.
I have a Python program that takes the md5 & sha1 hash values of passwords and searches for them on the internet using Google's Custom Search API. The problem is that I'm getting 0 results (which means the hash probably isn't in a rainbow table) when I run the program. But when I search using my browser, I get a whole bunch of results, in fact at least 10 pages of results.
Could the problem lie in the cx value I used? I picked it up from the sample program provided by google as I couldn't figure out how to get one for myself. Or does the custom search api give only selected results and it's futile trying to get more results from it?
I know it's a pretty old post, but it still ranks very high in Google results, so a little bit of clarification:
1. You can create your own CSE here: https://www.google.com/cse/ .
2. API keys can be created using the API console: https://cloud.google.com/ .
3. Using Google Custom Search you can search the whole Web: go to the site from point 1, choose the CSE to edit from the menu on the left, then under Configuration -> Basics -> Sites select the option to search the whole Web, and finally remove the previously specified sites.
4. Even so, with CSE you might not get the same results as with live Google, as it does not include Google features (real-time results, social features etc.), and once you specify more than 10 sites to search it can actually use a sub-index. More information can be found here: https://support.google.com/customsearch/answer/70392?hl=en
The Google Custom Search API lets you search the Google indexes for a specific website only, and you will not find any results from anywhere else on the internet. The cx parameter tells Google what website you want to search.
From the Google Custom Search Engine page:
With Google Custom Search, add a search box to your homepage to help people find what they need on your website.
You could use the deprecated Google Web Search API (JavaScript API, should work until November 2013), or you'd have to scrape the HTML UI provided to your browser instead (also see What are the alternatives now that the Google web search API has been deprecated?).
I am trying to use the Google API Custom Search, and I don't have any clue where to start. It seems you have to create a "custom search engine" in order to parse search results, and you are limited to 100 per day.
What module should I use for this? I believe I start here: http://code.google.com/p/google-api-python-client/
Do I need an API key or something? Basically I want to be able to do this operation, and Google's documentation is confusing, or perhaps beyond my level.
Pseudocode:
from AwesomeGoogleModule import GoogleSearch
search = GoogleSearch("search term")
results = search.SearchResultsNumber
print results
Basically, that number you get of total results for a particular search term? I want to scrape it. I don't want to go via the front-end Google, because that's very easy to get blocked. I don't need to go beyond the 100 searches that the API allows. This will only be for 30-50 search terms, maybe 80-100 at MOST.
Sample code for Custom Search using the google-api-python-client library is here:
http://code.google.com/p/google-api-python-client/source/browse/#hg%2Fsamples%2Fcustomsearch
You will need to create your own API Key by visiting:
https://code.google.com/apis/console/
Create a project in the APIs Console, making sure to turn on the Custom Search API for that project, and then you will find the API Key at the bottom of the API Access tab.
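With the API key and a search engine ID in hand, the result counts asked about in the question could be gathered with something like the sketch below. The function name result_counts is made up for illustration, and service is assumed to be the object returned by build("customsearch", "v1", developerKey=...) from the google-api-python-client library.

```python
def result_counts(service, cse_id, terms):
    """Map each search term to its estimated total number of results.

    result_counts is a hypothetical helper; `service` is the Custom
    Search client object, `cse_id` the search engine ID.
    """
    counts = {}
    for term in terms:
        res = service.cse().list(q=term, cx=cse_id).execute()
        # totalResults arrives as a string, so convert it for arithmetic.
        counts[term] = int(res["searchInformation"]["totalResults"])
    return counts
```

Each term costs one query, so 30 to 50 terms should fit within the free daily allowance mentioned in the question.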