Python module for Google search in GAE

I am trying to build an application in GAE using Python. What I need to do is take the query received from the user, pass it to Google search, and return the results to the user in a formatted way. I found lots of similar questions asked here, but couldn't get a clear answer regarding my requirements. My needs are:
I need to process a large number of links. Many of the Google APIs described give only the top four links.
Which module best fits this requirement? Do I need to go for something like Mechanize or urllib? I don't know whether they work in GAE. I also found a Google API, but it gives only a few results.

There is no official library for what you're trying to do, and the Google Terms of Service prohibit using automated tools to 'scrape' search results.

Related

Can I get backlinks for a given URL using Google APIs (Python)?

As the title states, I'm looking for a way to get backlinks for a given URL / website using Google APIs, since I already have an API key and I'd rather use it instead of relying on other services.
I have already tested services like Ahrefs, Majestic, Moz, Serpstat, etc., and they can give me the information I need, but I was wondering if there was a way to do it with Google.
From what I've read in my past research, Google used to offer a way to do this, but it was deprecated and is no longer usable. Did they really take this feature away for good?
I've also noticed that Google offers a similar service with its Google Search Console, but that can only be used for your own website; I'd like to get this kind of information for any given URL.
I will be using Python in my project, but I don't think there's a package able to deliver this kind of data, or at least I looked for one and didn't find anything.
Any help would be appreciated.

Can I scrape all URL results from a Google search using Python without getting blocked?

I realize that versions of this question have been asked before, and I spent several hours the other day trying a number of strategies.
What I would like to do is use Python to scrape all of the URLs from a Google search so that I can use them in a separate script to do text analysis of a large corpus (mainly news sites). This seems relatively straightforward, but none of the attempts I've tried have worked properly.
This is as close as I got:
from google import search
for url in search('site:cbc.ca "kinder morgan" and "trans mountain" and protest*', stop=100):
    print(url)
This returned about 300 URLs before I got kicked. An actual search using these parameters provides about 1000 results and I'd like all of them.
First: is this possible? Second: does anyone have any suggestions to do this? I basically just want a txt file of all the URLs that I can use in another script.
It seems that this package uses screen scraping to retrieve search results from Google, so it doesn't play well with Google's Terms of Service, which could be the reason why you've been blocked.
The relevant clause in Google's Terms of Service:
Don’t misuse our Services. For example, don’t interfere with our Services or try to access them using a method other than the interface and the instructions that we provide. You may use our Services only as permitted by law, including applicable export and re-export control laws and regulations. We may suspend or stop providing our Services to you if you do not comply with our terms or policies or if we are investigating suspected misconduct.
I haven't been able to find a definite number, but it seems their limit on the number of search queries per day is rather strict too: 100 search queries per day, according to the JSON Custom Search API documentation.
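For reference, here is a minimal sketch of querying the JSON Custom Search API directly with the requests library; the API key and search engine ID (cx) are placeholders you would create yourself, and the query is just an example:

import requests

API_KEY = "YOUR_API_KEY"   # placeholder: create one in the Google API console
CX = "YOUR_CSE_ID"         # placeholder: the ID of your Custom Search Engine

# The API returns at most 10 results per request; page through further
# results with the 'start' parameter, within the daily query quota.
params = {"key": API_KEY, "cx": CX, "q": 'site:cbc.ca "kinder morgan"', "start": 1}
resp = requests.get("https://www.googleapis.com/customsearch/v1", params=params)
resp.raise_for_status()
for item in resp.json().get("items", []):
    print(item["link"])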
Nonetheless, there's no harm trying out other alternatives to see if they work better:
BeautifulSoup
Scrapy
ParseHub - this one is not a code library, but a useful piece of software with good documentation; see their tutorial on how to scrape a list of URLs.

Google search in the browser and Google search via the Custom Search API give different results for the same query

I have a Python program that takes the MD5 and SHA1 hash values of passwords and searches for them on the internet using Google's Custom Search API. The problem is that I'm getting 0 results (which would mean the hash probably isn't in a rainbow table) when I run the program. But when I search using my browser, I get a whole bunch of results, in fact at least 10 pages of them.
Could the problem lie in the cx value I used? I picked it up from the sample program provided by Google, as I couldn't figure out how to get one for myself. Or does the Custom Search API give only selected results, making it futile to try to get more results from it?
I know it's a pretty old post, but it is still returned very high in Google results, so a little bit of clarification:
You can create your own CSE here: https://www.google.com/cse/
API keys can be created in the API console: https://cloud.google.com/
Using Google Custom Search you can search the whole Web: go to the control panel from point 1, choose the CSE to edit from the menu on the left, then under Configuration -> Basics -> Sites select the option to search the whole Web, and finally remove the previously specified sites.
Still, using a CSE you might not get the same results as live Google search, as it does not include Google features (real-time results, social features, etc.), and once you specify more than 10 sites to search it may actually use a sub-index. More information can be found here: https://support.google.com/customsearch/answer/70392?hl=en
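To tie the points above together, here is a minimal sketch of querying such a CSE from Python with the google-api-python-client library. The API key and cx value are placeholders for the credentials created in points 1 and 2, and the query is just an example (the MD5 of "password"):

from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY"  # placeholder: from the API console
CX = "YOUR_CSE_ID"        # placeholder: a CSE configured to search the whole Web

service = build("customsearch", "v1", developerKey=API_KEY)
# num is capped at 10 results per call; use the 'start' parameter to page further.
result = service.cse().list(q="5f4dcc3b5aa765d61d8327deb882cf99", cx=CX, num=10).execute()
for item in result.get("items", []):
    print(item["title"], "-", item["link"])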
The Google Custom Search API lets you search only the Google index for a specific website, and you will not find any results from anywhere else on the internet. The cx parameter tells Google which website you want to search.
From the Google Custom Search Engine page:
With Google Custom Search, add a search box to your homepage to help people find what they need on your website.
You could use the deprecated Google Web Search API (JavaScript API, should work until November 2013), or you'd have to scrape the HTML UI provided to your browser instead (also see What are the alternatives now that the Google web search API has been deprecated?).

Python Code samples for GData Document List API

Since the change to Google's developer documentation, I can't seem to find any code samples. In particular, I am looking for usage examples for:
Searching Documents
Export documents in various formats
User impersonation for Google Apps users
I am following the links off the developer pages
https://developers.google.com/google-apps/documents-list/
Does anyone know of links to official examples, or alternatively, other example resources?
Thanks.
The Python documentation is being rewritten; in the meantime you can use the sample included in the library to see how to perform all common tasks:
http://code.google.com/p/gdata-python-client/source/browse/samples/docs/docs_v3_example.py
API docs here: http://packages.python.org/gdata/docs/api.html
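If it helps while the documentation is being rewritten, here is a rough outline of listing documents with the library, based on that sample. The credentials and source string are placeholders, and the method names (DocsClient, ClientLogin, GetAllResources) are taken from the v3 sample and may differ between gdata library versions, so treat this as a sketch rather than tested code:

import gdata.docs.client

client = gdata.docs.client.DocsClient(source="yourCompany-yourApp-v1")  # placeholder source string
client.ssl = True
client.ClientLogin("user@example.com", "password", client.source)  # placeholder credentials

# List every document/resource in the account and print its title.
for resource in client.GetAllResources():
    print(resource.title.text)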

How to get information from Facebook using Python?

I've looked at a lot of questions and libraries and didn't find exactly what I wanted. Here's the thing: I'm developing an application in Python for a user to get all sorts of things from their social network accounts. I'm having trouble with Facebook. I would like, if possible, a step-by-step tutorial on the code and libraries to use to get a user's information, from posts to photo information (including how to handle the user's login, because I've had a lot of problems with authentication).
Thank you
I strongly encourage you to use Facebook's own APIs.
First of all, check out the documentation on Facebook's Graph API: https://developers.facebook.com/docs/reference/api/. If you are not familiar with JSON, DO read a tutorial on it (for instance http://secretgeek.net/json_3mins.asp).
Once you grasp the concepts, start using this API. For Python, there are several alternatives:
facebook/python-sdk https://github.com/facebook/python-sdk
pyFaceGraph https://github.com/iplatform/pyFaceGraph/
It is also fairly straightforward to write a simple HTTP client that uses the Graph API, as sketched below.
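For example, a minimal sketch of such a client using the requests library; the access token is a placeholder you would obtain through Facebook's OAuth login flow, and the requested fields are just examples:

import requests

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"  # placeholder: obtained via Facebook's OAuth login flow

# Ask the Graph API for a few profile fields of the authenticated user.
resp = requests.get(
    "https://graph.facebook.com/me",
    params={"fields": "id,name,posts", "access_token": ACCESS_TOKEN},
)
resp.raise_for_status()
data = resp.json()  # the Graph API responds with JSON
print(data["id"], data["name"])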
I would suggest that you check out the Python libraries, try out the examples in their documentation, and see whether they work and do the things you need.
Only as a last resort would I write a scraper and try to extract data with screen scraping (it is much more painful and breaks more easily).
I have not used this with Facebook, but in the past when I had to scrape a site that required login, I used Mechanize to handle the login and scraping, and Beautiful Soup to parse the resulting HTML.
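In case that route is useful here, a rough sketch of that pattern; the login URL and form field names are hypothetical and would have to match the actual site:

import mechanize
from bs4 import BeautifulSoup

br = mechanize.Browser()
br.set_handle_robots(False)           # some sites' robots.txt would otherwise block mechanize
br.open("https://example.com/login")  # hypothetical login page

# Fill in and submit the first form on the page; the field names are site-specific.
br.select_form(nr=0)
br["username"] = "your_username"
br["password"] = "your_password"
response = br.submit()

# Parse the page behind the login with Beautiful Soup.
soup = BeautifulSoup(response.read(), "html.parser")
print(soup.title.string if soup.title else "no title found")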
