I am building a chatbot using nltk.chat.util pairs, with regular expressions for the patterns. I want one of the responses to be “Visit Google”, where “Google” should be a hyperlink that takes you to https://www.google.com.
It depends on the medium in which you are showing this hyperlink; you must format the output accordingly.
For example, if it's on a web page (and the page allows it), you can return an <a href="https://www.google.com">Google</a> link.
If it's in a CLI, you probably won't be able to, at least not with the OS's standard terminal interface.
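For instance, with nltk's chat utility the anchor tag can simply be embedded in the response string; whether it renders as a clickable link is entirely up to the front end displaying the bot's output (a web page will render it, a plain terminal will print the raw tags). A minimal sketch:

from nltk.chat.util import Chat, reflections

# The response embeds an HTML anchor; the displaying medium decides
# whether it becomes a clickable link.
pairs = [
    [
        r"(.*)google(.*)",
        ['Visit <a href="https://www.google.com">Google</a>'],
    ],
]

chat = Chat(pairs, reflections)
print(chat.respond("Where can I google something?"))
# -> Visit <a href="https://www.google.com">Google</a>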
I would like to create a tool with Python and the Twitter API to be able to create lists of tweets that match certain criteria like "contains the word Python" or "has at least 2 likes". Or simple stats like top posters, most liked, etc.
All of my searching pointed me to the Tweepy project. But for that I need OAuth tokens, so I applied for a developer account and was denied with the comment "we are unable to serve your use case".
Do I have any alternatives?
As a general answer for these situations, you can always use a browser-automation tool: essentially a library that drives a real browser through its remote-control interface, replicating what you would do by hand (opening the website, logging in, and so on), and can then parse the data out of the rendered elements.
Try looking at Selenium; I've used that library in the past to scrape Facebook directly and it worked flawlessly.
Edit: note that this isn't a Twitter-specific library; you will have to find the HTML elements on the login page and use them to log in, and likewise for parsing data.
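A minimal sketch of the idea (the element locators below are hypothetical placeholders; inspect the live login page with the browser's developer tools and substitute whatever you find there):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # needs Chrome; recent Selenium versions fetch the driver themselves
driver.get("https://twitter.com/login")

# Hypothetical locators; replace with the real tags from the page.
driver.find_element(By.NAME, "username").send_keys("me@example.com")
driver.find_element(By.NAME, "password").send_keys("my-password")
driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()

# Once logged in, navigate and parse the rendered elements.
driver.get("https://twitter.com/search?q=python")
for tweet in driver.find_elements(By.CSS_SELECTOR, "article"):
    print(tweet.text)

driver.quit()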
I need to create a web scraper for this website.
However, I need to get the links for the counties, which are stored in the interactive map.
Unfortunately, for some reason, their search engine doesn't provide all the results that the interactive map does.
My question:
Could anyone tell me how to get all the links for all the counties, without manually accessing them?
Thanks
Technically you can use a decompiler to do this job.
There are free (e.g. ActionScript Extractor) and paid (e.g. Sothink SWF Decompiler) tools out there.
You can reference this answer.
Edit:
Most SWF content pulls its external records from either a .xml or a .json file.
Without decompiling, and just using the browser's Developer Tools, we can see that an XML file is indeed accessed (it may contain what you want):
http://www.allpetservices.co.uk/uk_ir_locator.xml.
If you get an error message when opening it, put view-source: in front of the link to read it.
In that XML you want to extract the contents (the xyz) of each and every <link> xyz </link> tag. This will give you the links for every entry on the map.
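A sketch of that extraction, assuming the tag really is named link and is not namespaced, as the snippet above suggests:

import urllib.request
import xml.etree.ElementTree as ET

url = "http://www.allpetservices.co.uk/uk_ir_locator.xml"
with urllib.request.urlopen(url) as resp:
    tree = ET.parse(resp)

# Print the xyz of each and every <link> xyz </link> tag.
for link in tree.iter("link"):
    print(link.text)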
The short answer to your question: there's no way to get the links directly from the site.
The solution: the links you are trying to retrieve are very predictable. They all follow the same structure:
http://www.allpetservices.co.uk/search_map.asp?ccounty={COUNTY_NAME}
So, if you can use another site or data source to get the names of each of the counties, you can formulate each of the links that you need.
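A sketch of that formulation, with a hard-coded sample list standing in for whatever county-name source you find:

from urllib.parse import quote

counties = ["Kent", "Essex", "West Yorkshire"]  # hypothetical sample

base = "http://www.allpetservices.co.uk/search_map.asp?ccounty={}"
for name in counties:
    print(base.format(quote(name)))
# http://www.allpetservices.co.uk/search_map.asp?ccounty=Kent
# ...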
My goal is to create a small script that finds all the results of a Google search, but in "raw" form.
I don't speak English very well, so I prefer to give an example to show what I would like:
I type: elephant
The script returns:
www.elephant.com
www.bluelephant.com
www.ebay.com/elephant
.....
I was thinking about urllib.request, but its return value is not usable for that.
I found some tutorials, but none adapted to what I want.
As I said, my goal is to have a .txt file as output which contains all the websites that match my query.
Thanks all
One simple way is to make a request to Google search, then parse the HTML result. You can use a Python library such as Beautiful Soup to parse the HTML content easily and finally extract the URLs you need.
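A rough sketch of that approach. Google's result markup changes frequently (and scraping it may violate their ToS, as noted further down), so treat the href handling below as a guess to adapt, not a stable interface:

import requests
from bs4 import BeautifulSoup

resp = requests.get(
    "https://www.google.com/search",
    params={"q": "elephant"},
    headers={"User-Agent": "Mozilla/5.0"},  # bare scripts are often blocked
)
soup = BeautifulSoup(resp.text, "html.parser")

with open("results.txt", "w") as f:
    for a in soup.select("a[href]"):
        href = a["href"]
        # Result links are often wrapped in a /url?q=... redirect.
        if href.startswith("/url?q="):
            href = href[len("/url?q="):].split("&")[0]
        if href.startswith("http"):
            f.write(href + "\n")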
These seem to change often, so hopefully this answer remains useful for a little while...
First, you'll need to create a Google Custom Search Engine by visiting their site or following the instructions provided here: https://developers.google.com/custom-search/docs/tutorial/creatingcse.
This will provide you with both a Custom Search Engine ID and an API Key, credentials which are needed to use the service.
In your Python script, you'll want to import the following package:
from googleapiclient.discovery import build
which will enable you to create a build object:
service = build("customsearch", developerKey=my_api_key)
According to the docs, this constructs a resource for interacting with the API.
When you want to return search results, call execute() on the service object's cse().list() method:
res = service.cse().list(q=my_search_keyword, cx=my_cse_id, **kwargs).execute()
This returns the search results, where each result is a dictionary object stored under the response's "items" key. The i'th result's URL can be accessed with the "link" key:
ithresult = res['items'][i]['link']
Note that you can only return 10 results in a single call, so make use of the start keyword argument in .list(), and consider embedding this call in a loop to generate several links at a time.
You should be able to find plenty of SO answers about saving your search results to a text file.
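Putting the pieces together, a sketch that pages through three sets of results and writes the links to a text file (my_api_key and my_cse_id are the credentials from the setup step; note the "v1" version argument, which the answer below this one also points out):

from googleapiclient.discovery import build

my_api_key = "YOUR_API_KEY"
my_cse_id = "YOUR_CSE_ID"

service = build("customsearch", "v1", developerKey=my_api_key)

links = []
for start in (1, 11, 21):  # three pages of up to 10 results each
    res = service.cse().list(q="elephant", cx=my_cse_id, start=start).execute()
    for item in res.get("items", []):
        links.append(item["link"])

with open("results.txt", "w") as f:
    f.write("\n".join(links))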
N.B. One more thing that confused me at first - presumably you'll want to search the entire web, and not just a single site. But when creating your CSE you will be asked to specify a single site, or list of sites, to search. Don't worry, just enter any old thing, you can delete it later. Even Google endorses this hack:
Convert a search engine to search the entire web: On the Custom Search home page, click the search engine you want. Click Setup, and then click the Basics tab. Select Search the entire web but emphasize included sites. In the Sites to search section, delete the site you entered during the initial setup process.
I'll just add two points to 9th Dimension's answer.
Use this guide to find your Custom Search Engine ID
A small modification should be made to the second line of the code: the API version should be added as an argument, as follows:
service = build('customsearch', 'v1', developerKey=my_api_key)
You have two options: use the API, or make a request the way a browser does and then parse the HTML.
The first option is rather tricky to set up and is limited: 100 free queries per day, then $5 per 1000 queries.
The second option is easier, but it violates Google's ToS.
Right now I am trying to figure out how to take a row of data, maybe 50 entries max, and enter each entry individually into a search bar. But first I need to understand the basic concepts, so I want to write a practice program that takes info from an Excel sheet and enters it into a Google or YouTube search, for example.
My problem is that there seems to be no beginner resource on how to do this. All the posts I have read either cover only part of the problem or are about creating a search bar rather than using one. And every post I read suggests a hundred plug-ins I could possibly add.
I'm just looking for a coherent explanation of how to drive a search bar from code.
To perform a web search (Google, YouTube or whatever) from a program you need to execute the search, either by building up and calling an appropriate search URL or by making a call to an API provided by that site.
The article 'Python - Search for YouTube video' provides a code sample and explanation of how to generate and call a URL to perform a YouTube keyword search. You could do something similar for a Google search by analysing the URL from the result of a Google search, or try searching for 'Python submit google search url'.
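As a sketch of that URL approach, combined with the Excel part of your question: the snippet below reads search terms from the first column of a spreadsheet with openpyxl (one choice among several Excel readers; the file name is made up) and opens a YouTube keyword search for each term:

import webbrowser
from urllib.parse import quote_plus
from openpyxl import load_workbook

wb = load_workbook("terms.xlsx")  # hypothetical file
ws = wb.active

# One search term per row in column A.
for (term,) in ws.iter_rows(max_col=1, values_only=True):
    if term:
        webbrowser.open(
            "https://www.youtube.com/results?search_query=" + quote_plus(str(term))
        )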
The above approach is simplistic and relies on the URL structure for a particular site staying the same. A more involved but more reliable and flexible approach is to use the site's API; for YouTube, see the links below and the sketch that follows them:
YouTube API - Python developers guide
YouTube API - Python code samples - Search by keyword
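For example, a minimal keyword search with the YouTube Data API v3 client, following the samples linked above (you need an API key from the Google developer console):

from googleapiclient.discovery import build

youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")

response = youtube.search().list(
    q="python tutorial",
    part="snippet",
    type="video",
    maxResults=10,
).execute()

for item in response["items"]:
    print("https://www.youtube.com/watch?v=" + item["id"]["videoId"],
          item["snippet"]["title"])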
I want to read text from a Google document using the Google Drive API (and Python).
With the "gdata" module, I didn't find a way to get a document object by its title.
A module named gspread does just that.
(I don't know any other module besides gdata; gspread solves exactly this kind of question and simplifies the interaction with the Google API, but it's for spreadsheets, not for documents.)
So far I know how to iterate over a list of all items in the "drive" (with GetDocumentListFeed) and get the relevant doc by comparing each item's title.text.
Is there a way to do something like
item = client.get(title='lalala')
?
It is possible, but using the new Google Drive API: use the list call with a specific search query (see the link below). If you are using the client library, the call is as simple as:
file = client.files().list(q="title='YOUR TITLE'").execute()
This will return the items in your Drive that match that title (as a dictionary). There are a lot more possible search queries; see this page.
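A fuller sketch of the list-and-query approach, assuming v3 of the Drive API (where the metadata field and query term is name rather than v2's title) and that creds already holds valid OAuth credentials:

from googleapiclient.discovery import build

service = build("drive", "v3", credentials=creds)

response = service.files().list(
    q="name = 'lalala'",
    fields="files(id, name)",
).execute()

for f in response.get("files", []):
    print(f["id"], f["name"])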