Right now I am trying to figure out how to take a row of data, maybe like 50 entries max, and enter it individually into a search bar. But first I need to understand the beginning concepts so I want to do a practice program that could take info from an Excel sheet and enter into a Google search or YouTube, for example.
My problem is there seems to be no resource on how to do this for beginners. All posts I have read are either parts of the whole problem or not related to actually using a search bar but instead creating one. Even then every post I read has 100 plug-ins I could possibly add.
I'm just looking for a coherent explanation that helps me grasp how to write code that uses a search bar.
To perform a web search (Google, YouTube or whatever) from a program you need to execute the search, either by building up and calling an appropriate search URL or by making a call to an API provided by that site.
The article 'Python - Search for YouTube video' provides a code sample and explanation of how to generate and call a URL to perform a YouTube keyword search. You could do something similar for a Google search by analysing the URL from the result of a Google search, or try searching for 'Python submit google search url'.
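As a rough illustration of the "build up a search URL" idea, here is a minimal sketch using only the standard library. The URL patterns and parameter names (`q` for Google, `search_query` for YouTube) are assumptions read off the address bar after a manual search, not a documented API, and the `row` list is a hypothetical stand-in for cells read from a spreadsheet.

```python
from urllib.parse import urlencode

# Assumed URL patterns, copied from the browser's address bar after a manual
# search. They are not a documented API and may change without notice.
SEARCH_URLS = {
    "google": ("https://www.google.com/search", "q"),
    "youtube": ("https://www.youtube.com/results", "search_query"),
}


def build_search_url(site, term):
    """Return the URL a browser would load after searching `site` for `term`."""
    base, param = SEARCH_URLS[site]
    # urlencode escapes spaces and special characters in the search term.
    return base + "?" + urlencode({param: term})


# Hypothetical row of spreadsheet cells, one search term per cell.
row = ["python tutorial", "how to tie a tie"]
urls = [build_search_url("youtube", term) for term in row]
```

Each resulting URL could then be opened with `webbrowser.open(url)` from the standard library, which is probably the closest thing to "typing into the search bar" without driving a browser.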
The above approach is simplistic and relies on the URL structure for a particular site staying the same. A more complex, reliable and flexible approach is to use the API. For YouTube:
YouTube API - Python developers guide
YouTube API - Python code samples - Search by keyword
Related
As the title states I'm looking for a way to get backlinks for a given url / website using Google APIs, since I already have an api key and I'd rather use it instead of relying on other services.
I have already tested services like ahrefs, majestic, moz, serpstat etc., and they can give me the information I need, but I was wondering if there was a way to do it with Google.
From what I read during past research, Google once offered a way to do this, but it was deprecated and is no longer usable. Did they really take this feature away for good?
I've also noticed that Google offers a similar service with its Google Search Console, but it can only be used for your own website; I'd like to get this kind of information for an arbitrary URL.
I will be using Python in my project, but I don't think there's a package that can deliver this kind of data, or at least I looked for one and didn't find anything.
Any help would be appreciated.
I looked into this question which asked how a bot could input text on a webpage. One of the answers recommended using Selenium, but a comment there suggested using it was an inefficient way of accomplishing that task.
Say I wanted to create a bot that looks up a set of words on Wikipedia (using the search bar on Wikipedia) and gives me the first 20 words in each article. Would Selenium be the best tool for this?
(Note that I'm aware I could do this manually by just looking up https://en.wikipedia.org/wiki/<word I want> for each item in the list, but I'm specifically looking for how a bot would interact with search bars.)
Efficiency and the bot you describe don't really go together: why bother using a framework that renders the entire page as a human would see it when you aren't using any of that visual content? The most efficient way for a Python bot to search Wikipedia is to call the API and parse the JSON results.
Searching Wikipedia using API
There is nothing magical about a search bar - when the input is put in there, the browser is redirected to the other url location as you stated https://en.wikipedia.org/wiki/<word you want>. I believe the inefficiency that is being referenced is this exact fact that you can just search manually without the search bar. Rendering and finding the bar to type something in and then submit takes hundreds of milliseconds. Searching directly on the API can be done in milliseconds - much more efficient.
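To make the API route concrete, here is a minimal sketch that builds a MediaWiki search request and pulls the titles out of the JSON response. The helper names and the User-Agent string are made up for this example; the parameters (`action=query`, `list=search`, `srsearch`, `srlimit`) are the standard MediaWiki search parameters as far as I know.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_url(term, limit=1):
    """Build a MediaWiki full-text search request for `term`."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def titles_from_response(payload):
    """Extract article titles from a decoded search response."""
    return [hit["title"] for hit in payload.get("query", {}).get("search", [])]


def first_words(text, n=20):
    """Return the first `n` whitespace-separated words of `text`."""
    return " ".join(text.split()[:n])


def search_wikipedia(term, limit=1):
    """Perform the network call; the User-Agent value is a placeholder."""
    request = Request(
        build_search_url(term, limit),
        headers={"User-Agent": "search-bar-example/0.1"},
    )
    with urlopen(request) as response:
        return titles_from_response(json.load(response))
```

Calling `search_wikipedia("Alan Turing")` should return a list of matching titles; the article text for each title would need a second request, after which `first_words` gives the first 20 words.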
I realize that versions of this question have been asked and I spent several hours the other day trying a number of strategies.
What I would like to do is use Python to scrape all of the URLs from a Google search so that I can use them in a separate script to do text analysis of a large corpus (news sites mainly). This seems relatively straightforward, but none of my attempts have worked properly.
This is as close as I got:
from google import search
for url in search('site:cbc.ca "kinder morgan" and "trans mountain" and protest*', stop=100):
    print(url)
This returned about 300 URLs before I got kicked. An actual search using these parameters provides about 1000 results and I'd like all of them.
First: is this possible? Second: does anyone have any suggestions to do this? I basically just want a txt file of all the URLs that I can use in another script.
It seems that this package uses screen scraping to retrieve search results from google, so it doesn't play well with Google's Terms of Service which could be the reason why you've been blocked.
The relevant clause in Google's Terms of Service:
Don’t misuse our Services. For example, don’t interfere with our Services or try to access them using a method other than the interface and the instructions that we provide. You may use our Services only as permitted by law, including applicable export and re-export control laws and regulations. We may suspend or stop providing our Services to you if you do not comply with our terms or policies or if we are investigating suspected misconduct.
I haven't been able to find a definite number, but their limit on search queries per day also seems rather strict: the JSON Custom Search API documentation mentions 100 search queries per day.
Nonetheless, there's no harm trying out other alternatives to see if they work better:
BeautifulSoup
Scrapy
ParseHub - this one is not in code, but is a useful piece of software with good documentation. Link to their tutorial on how to scrape a list of URLs.
My goal is to create a small script that finds all the results of a Google search, but in "raw" form.
I don't speak English very well, so I prefer to give an example to show what I would like:
I type: elephant
The script returns:
www.elephant.com
www.bluelephant.com
www.ebay.com/elephant
.....
I was thinking about urllib.request, but its return value is not usable for that!
I found some tutorials, but none of them fit what I want!
As I said, my goal is to have a .txt file as output which contains all the websites that match my query!
Thanks all
One simple way is to make a request to Google search, then parse the HTML result. You can use a Python library such as Beautiful Soup to parse the HTML content easily and extract the URLs you need.
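For the parsing step, here is a minimal sketch that collects absolute links from an HTML string. Note that it uses the standard library's `html.parser` rather than Beautiful Soup, purely so that it runs without extra installs; the `LinkCollector` class and the sample HTML are made up for illustration and do not reflect Google's actual result markup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that points to an absolute URL."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Skip relative links such as "/internal".
            if href and href.startswith("http"):
                self.links.append(href)


def extract_links(html):
    """Return all absolute links found in `html`, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Made-up HTML standing in for a downloaded page.
sample_html = (
    '<a href="https://www.elephant.com">A</a>'
    '<a href="/internal">B</a>'
    '<a href="https://www.ebay.com/elephant">C</a>'
)
links = extract_links(sample_html)
```

A real result page would need extra filtering, since it also contains navigation and tracking links.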
These seem to change often, so hopefully this answer remains useful for a little while...
First, you'll need to create a Google Custom Search, by visiting their site or following the instructions provided here https://developers.google.com/custom-search/docs/tutorial/creatingcse.
This will provide you with both
Custom Search Engine ID
API Key
credentials which are needed to use the service.
In your python script, you'll want to import the following package:
from googleapiclient.discovery import build
which will enable you to create a build object:
service = build("customsearch", developerKey=my_api_key)
According to the docs, this constructs a resource for interacting with the API.
When you want to return search results, call execute() on service's cse().list() method:
res = service.cse().list(q=my_search_keyword, cx=my_cse_id, **kwargs).execute()
to return the response as a dictionary whose "items" key holds the list of search results, where each result is itself a dictionary. The i'th result's URL can be accessed with the "link" key:
ithresult = res['items'][i]['link']
Note that you can only return 10 results in a single call, so make use of the start keyword argument in .list(), and consider embedding this call in a loop to generate several links at a time.
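The loop mentioned above could look roughly like the sketch below. `collect_links` is a hypothetical helper name, and it assumes that `service` was built with `build()` as shown earlier and that the response dictionary keeps its results under the "items" key. As far as I know the API also stops serving results beyond the first 100, so very large `wanted` values may simply end early.

```python
def collect_links(service, query, cse_id, wanted=30, page_size=10):
    """Page through Custom Search results and return up to `wanted` links.

    `service` is assumed to be the object returned by
    googleapiclient.discovery.build("customsearch", "v1", developerKey=...).
    """
    links = []
    start = 1  # the API's result index is 1-based
    while len(links) < wanted:
        res = service.cse().list(
            q=query, cx=cse_id, start=start, num=page_size
        ).execute()
        items = res.get("items", [])
        if not items:
            break  # no more results available
        links.extend(item["link"] for item in items)
        start += page_size
    return links[:wanted]
```

Usage would be something like `collect_links(service, "elephant", my_cse_id, wanted=50)`; keep in mind that each page counts as one query against the daily quota.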
You should be able to find plenty of SO answers about saving your search results to a text file.
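For completeness, saving the links is only a few lines. This is a minimal sketch; `save_links` and the file name in the usage note are hypothetical, and it drops duplicate URLs while keeping their original order.

```python
from pathlib import Path


def save_links(links, path):
    """Write one URL per line to `path` and return how many were written."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(links))
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return len(unique)
```

For example, `save_links(links, "urls.txt")` writes the file into the current working directory.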
N.B. One more thing that confused me at first - presumably you'll want to search the entire web, and not just a single site. But when creating your CSE you will be asked to specify a single site, or list of sites, to search. Don't worry, just enter any old thing, you can delete it later. Even Google endorses this hack:
Convert a search engine to search the entire web: On the Custom Search
home page, click the search engine you want. Click Setup, and then
click the Basics tab. Select Search the entire web but emphasize
included sites. In the Sites to search section, delete the site you
entered during the initial setup process.
I'd just add two points to "9th Dimension"'s answer.
Use this guide to find your Custom Search Engine ID
A small modification should be made to the second line of the code: the API version must be passed as the second argument, as follows:
service = build('customsearch', 'v1', developerKey=my_api_key)
You have two options: use the API, or make a request like a browser does and then parse the HTML.
The first option is rather tricky to set up and is limited: 100 free queries/day, then $5 per 1000 queries.
Second option is easier but it violates Google's ToS.
I have a python program that takes the md5 & sha1 hash values of passwords and searches for them on the internet using Google's custom search api. The problem is that I'm getting 0 results(which means the hash probably isn't in a rainbow table) when I run the program. But when I searched using my browser, I get a whole bunch of results, in fact at least 10 pages of results.
Could the problem lie in the cx value I used? I picked it up from the sample program provided by google as I couldn't figure out how to get one for myself. Or does the custom search api give only selected results and it's futile trying to get more results from it?
I know this is a pretty old post, but it still ranks very high in Google results, so a little clarification:
1. You can create your own CSE here: https://www.google.com/cse/ .
2. API keys can be created using the API console: https://cloud.google.com/ .
3. With Google Custom Search you can search the whole Web: go to the site from point 1, choose the CSE to edit from the menu on the left, then under Configuration -> Basics -> Sites select the option to search the whole Web, and finally remove the previously specified sites.
4. Even so, with a CSE you might not get the same results as with live Google, because it does not include Google features (real-time results, social features etc.), and once you specify more than 10 sites to search it may use a sub-index. More information can be found here: https://support.google.com/customsearch/answer/70392?hl=en
The Google Custom Search API lets you search the Google indexes for a specific website only, and you will not find any results from anywhere else on the internet. The cx parameter tells Google what website you want to search.
From the Google Custom Search Engine page:
With Google Custom Search, add a search box to your homepage to help people find what they need on your website.
You could use the deprecated Google Web Search API (JavaScript API, should work until November 2013), or you'd have to scrape the HTML UI provided to your browser instead (also see What are the alternatives now that the Google web search API has been deprecated?).